Ensembl Regulation (funcgen) API Tutorial

Introduction

The Ensembl Regulation team deals with functional genomics data. The API and databases for Ensembl Regulation are called Funcgen.

This tutorial is an introduction to the Funcgen API. Knowledge of the Ensembl Core API and of the coding conventions used in the Ensembl APIs is assumed.

Documentation about the Regulation database schema is available here. While it is not necessary for this tutorial, an understanding of the database tables may help, as many of the adaptor modules are table-specific.

Regulatory Features

RegulatoryFeatures are genomic regions involved in gene regulation, such as:

  • Predicted promoters
  • Predicted enhancer regions
  • CTCF binding sites
  • Transcription factor binding sites
  • Open chromatin regions

They are generated by the Ensembl Regulatory Build.

To fetch RegulatoryFeatures from the Funcgen database, use the corresponding adaptor, the RegulatoryFeatureAdaptor. To obtain all the regulatory features in a given region of the genome, use its fetch_all_by_Slice method:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

# get the SliceAdaptor and Slice
my $slice_adaptor = $registry->get_adaptor('Human', 'Core', 'Slice');
my $slice = $slice_adaptor->fetch_by_region('chromosome', 1, 54_960_000, 54_980_000);

# Get the RegulatoryFeatureAdaptor and fetch all RegulatoryFeatures by Slice
my $regulatory_feature_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'RegulatoryFeature');
my @regulatory_features = @{$regulatory_feature_adaptor->fetch_all_by_Slice($slice)};

# Move through the regulatory features and print information about them
foreach my $current_regulatory_feature (@regulatory_features) {
  print $current_regulatory_feature->stable_id, "\t", $current_regulatory_feature->feature_type->name, "\n";
}
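For a quick overview of a region, you might tally the regulatory features by FeatureType name. The following is a minimal sketch, not part of the Ensembl API: `count_by_feature_type` and `print_feature_type_counts` are hypothetical helper names, and they rely only on the `feature_type->name` calls used in the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: count features per FeatureType name.
# Expects an array ref of RegulatoryFeature objects, such as the one
# returned by fetch_all_by_Slice.
sub count_by_feature_type {
  my ($features) = @_;
  my %count;
  foreach my $feature (@$features) {
    $count{ $feature->feature_type->name }++;
  }
  return \%count;
}

# Hypothetical helper: print the counts as a tab-separated table,
# sorted by FeatureType name.
sub print_feature_type_counts {
  my ($count) = @_;
  foreach my $name (sort keys %$count) {
    print $name, "\t", $count->{$name}, "\n";
  }
}
```

In the script above, you could call `print_feature_type_counts(count_by_feature_type(\@regulatory_features));` after fetching the features.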

You can also restrict the results to a single FeatureType. To do this, first fetch the FeatureType object with the FeatureTypeAdaptor, then pass it to the RegulatoryFeatureAdaptor together with the Slice.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

# get the SliceAdaptor and Slice
my $slice_adaptor = $registry->get_adaptor('Human', 'Core', 'Slice');
my $slice = $slice_adaptor->fetch_by_region('chromosome', 17, 64000000, 64050000);

# Get the FeatureTypeAdaptor and specify the FeatureType
my $feature_type_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'FeatureType');
my $feature_type = $feature_type_adaptor->fetch_by_name("Promoter");

# Get the RegulatoryFeatureAdaptor and fetch all RegulatoryFeatures by Slice
my $regulatory_feature_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'RegulatoryFeature');
my @regulatory_features = @{$regulatory_feature_adaptor->fetch_all_by_Slice_FeatureType($slice, $feature_type)};

# Move through the regulatory features and print information about them
foreach my $current_regulatory_feature (@regulatory_features) {
    print $current_regulatory_feature->stable_id, "\t", $current_regulatory_feature->seq_region_start, "-", $current_regulatory_feature->seq_region_end, "\n";
}
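If you have already fetched every feature in a Slice with fetch_all_by_Slice, a similar restriction can be applied in plain Perl by comparing FeatureType names. This sketch shows that client-side alternative, assuming the names match exactly (for example `Promoter`); unlike the adaptor call above, it fetches every feature first and discards the rest in Perl. `filter_by_feature_type_name` and `format_location` are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the features whose FeatureType has the
# given name. Expects an array ref of RegulatoryFeature objects.
sub filter_by_feature_type_name {
  my ($features, $type_name) = @_;
  return [ grep { $_->feature_type->name eq $type_name } @$features ];
}

# Hypothetical helper: format a feature as "stable_id<TAB>start-end",
# matching the output of the script above.
sub format_location {
  my ($feature) = @_;
  return join("", $feature->stable_id, "\t",
              $feature->seq_region_start, "-", $feature->seq_region_end);
}
```

For example, `print format_location($_), "\n" for @{ filter_by_feature_type_name(\@regulatory_features, 'Promoter') };`.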

Regulatory Activities

For every regulatory feature, the Ensembl Regulatory Build predicts the feature's activity in each of the epigenomes included in the build. In each epigenome, there are five possible activities:

  1. Active
  2. Poised (has both active and repressive marks, "ready to go")
  3. Inactive
  4. Repressed
  5. NA (no data available for this epigenome)

Regulatory activities are represented by their own objects, which can be fetched with the RegulatoryActivityAdaptor:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

# get the regulatory feature by ID
my $regulatory_feature_id = 'ENSR00000358244';
my $regulatory_feature_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'RegulatoryFeature');
my $regulatory_feature = $regulatory_feature_adaptor->fetch_by_stable_id($regulatory_feature_id)
  or die "No regulatory feature found with stable id $regulatory_feature_id\n";

# print information about the feature
print "The ", $regulatory_feature->get_FeatureType->name, " with stable id ", $regulatory_feature->stable_id, " has the following activities:\n";

# Get the activity
my $regulatory_activity_adaptor = $registry->get_adaptor('homo_sapiens', 'funcgen', 'RegulatoryActivity');
my $regulatory_activity_list    = $regulatory_activity_adaptor->fetch_all_by_RegulatoryFeature($regulatory_feature);

# print the activity
foreach my $current_regulatory_activity (@$regulatory_activity_list) {
	print "\tIn the epigenome ", $current_regulatory_activity->get_Epigenome->short_name, " it is ", $current_regulatory_activity->activity, "\n";
}
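Instead of printing one line per epigenome, you may want the epigenomes grouped by activity state. A minimal sketch, assuming only the `activity` and `get_Epigenome->short_name` calls used in the script above; `epigenomes_by_activity` and `print_activity_summary` are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical helper: group epigenome short names by activity state.
# Expects an array ref of RegulatoryActivity objects, such as the one
# returned by fetch_all_by_RegulatoryFeature.
sub epigenomes_by_activity {
  my ($activities) = @_;
  my %by_activity;
  foreach my $activity (@$activities) {
    push @{ $by_activity{ $activity->activity } },
      $activity->get_Epigenome->short_name;
  }
  return \%by_activity;
}

# Hypothetical helper: print one line per activity state with the
# number of epigenomes in that state and their names.
sub print_activity_summary {
  my ($by_activity) = @_;
  foreach my $state (sort keys %$by_activity) {
    my @names = sort @{ $by_activity->{$state} };
    print $state, " (", scalar(@names), "): ", join(", ", @names), "\n";
  }
}
```

In the script above, `print_activity_summary(epigenomes_by_activity($regulatory_activity_list));` would print this grouped view.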

External Features: Externally curated data

Some FeatureSets are curated, entirely or in part, by external groups. Their features are stored as ExternalFeatures and can be accessed using the ExternalFeatureAdaptor.

If you know the name of a FeatureSet, you can fetch it by name using the FeatureSetAdaptor. For example, we store data from the VISTA Enhancer Browser.

The following script fetches the VISTA enhancers for a Slice.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

# Get the Slice, FeatureSet and ExternalFeatureAdaptors
my $slice_adaptor = $registry->get_adaptor('homo_sapiens', 'core', 'Slice');
my $feature_set_adaptor  = $registry->get_adaptor('homo_sapiens', 'funcgen', 'FeatureSet');
my $ex_feat_adaptor  = $registry->get_adaptor('homo_sapiens', 'funcgen', 'ExternalFeature');

# Fetch a Slice and FeatureSet of interest
my $slice = $slice_adaptor->fetch_by_region('chromosome', '13');
my $vista_feature_set = $feature_set_adaptor->fetch_by_name('VISTA enhancer set')
  or die "FeatureSet 'VISTA enhancer set' not found\n";

# Use the ExternalFeatureAdaptor to fetch the Vista enhancers in the Slice
my @vistas = @{ $ex_feat_adaptor->fetch_all_by_Slice_FeatureSets($slice, [$vista_feature_set]) };

# Move through the Vista enhancers and print their locations
while (my $vista = shift @vistas) {
	print $vista->seq_region_name, ":", $vista->seq_region_start, "-", $vista->seq_region_end, "\n";
}
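To load these regions into a genome browser or another tool that reads BED files, you might write them out in BED format. Ensembl coordinates are 1-based and inclusive, whereas BED uses 0-based, half-open intervals, so the start is decreased by one and the end stays the same. A minimal sketch: `feature_to_bed_line` and `print_bed` are hypothetical helpers, and sequence region names are written exactly as Ensembl reports them (e.g. `13`, not `chr13`).

```perl
use strict;
use warnings;

# Hypothetical helper: convert one Ensembl feature to a BED line.
# Ensembl: 1-based start, inclusive end.
# BED:     0-based start, exclusive end.
sub feature_to_bed_line {
  my ($feature, $name) = @_;
  my @fields = (
    $feature->seq_region_name,
    $feature->seq_region_start - 1,   # shift start to 0-based
    $feature->seq_region_end,         # end is unchanged
  );
  push @fields, $name if defined $name;   # optional BED name column
  return join("\t", @fields) . "\n";
}

# Hypothetical helper: print every feature in an array ref as BED.
sub print_bed {
  my ($features) = @_;
  print feature_to_bed_line($_) for @$features;
}
```

In the script above, call `print_bed(\@vistas);` before the `while` loop, because that loop removes the features from `@vistas` with `shift`.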

Feature Types

FeatureTypes provide a biological annotation for features. They are divided into classes that form biologically coherent groups (e.g. transcription factors). This differs from a FeatureSet, which only records the origin of the data. FeatureTypes can be accessed using the FeatureTypeAdaptor.
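To see which biological classes are represented in a list of FeatureTypes, you could group them by class. This sketch assumes that FeatureType objects provide `name` and `class` accessors; check the FeatureType API documentation for your release. How you obtain the list (for example with a `fetch_all` call on the FeatureTypeAdaptor, assuming it provides one as the ArrayAdaptor does) is left open, and `group_feature_types_by_class` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: group FeatureType names by their class.
# Assumes each FeatureType provides name() and class() accessors.
sub group_feature_types_by_class {
  my ($feature_types) = @_;
  my %by_class;
  foreach my $feature_type (@$feature_types) {
    my $class = $feature_type->class;
    $class = 'unclassified' unless defined $class;
    push @{ $by_class{$class} }, $feature_type->name;
  }
  # Sort the names within each class so the output is stable
  @$_ = sort @$_ for values %by_class;
  return \%by_class;
}
```

The result maps each class to a sorted list of FeatureType names, e.g. `print "$_: @{ $by_class->{$_} }\n" for sort keys %$by_class;`.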

External FeatureTypes

FeatureTypes for ExternalFeatures have a meaning that is specific to their FeatureSet. For example, for features in the VISTA FeatureSet, the FeatureType indicates whether the feature was active or inactive in an experiment.

Microarrays and associated information

Several popular commercial microarrays are stored in the Ensembl database, together with mappings of their probes to genomic regions and genes. The arrays themselves are stored as Array objects, which can be fetched with the ArrayAdaptor.

The following script fetches all the arrays for a species and prints information about them:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);
  
# get the array adaptor
my $array_adaptor = $registry->get_adaptor('Human','Funcgen','Array');

# fetch all arrays and move through
my @arrays = @{ $array_adaptor->fetch_all };
foreach my $array (@arrays) {

	# Print some array info
	print "Array:\t", $array->name,"\nType:\t",  $array->type, "\nVendor:\t", $array->vendor, "\n";

	# Get some information about the array chips and print
	my @array_chips   = @{ $array->get_ArrayChips };
	foreach my $array_chip (@array_chips) {
		print "ArrayChip:\t", $array_chip->name, " DesignID:\t", $array_chip->design_id, "\n";
	}
	print "\n";
}
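To see how many arrays each vendor contributes, you might group the arrays returned by `fetch_all` by vendor. A minimal sketch using only the `name` and `vendor` accessors from the script above; `arrays_by_vendor` and `print_arrays_by_vendor` are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical helper: map each vendor to the names of its arrays.
# Expects an array ref of Array objects, such as the one returned by
# $array_adaptor->fetch_all.
sub arrays_by_vendor {
  my ($arrays) = @_;
  my %by_vendor;
  foreach my $array (@$arrays) {
    push @{ $by_vendor{ $array->vendor } }, $array->name;
  }
  return \%by_vendor;
}

# Hypothetical helper: print vendors alphabetically with their array
# count and sorted array names.
sub print_arrays_by_vendor {
  my ($by_vendor) = @_;
  foreach my $vendor (sort keys %$by_vendor) {
    my @names = sort @{ $by_vendor->{$vendor} };
    print $vendor, " (", scalar(@names), "):\t", join(", ", @names), "\n";
  }
}
```

In the script above, `print_arrays_by_vendor(arrays_by_vendor(\@arrays));` gives a compact per-vendor summary.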

Fetch all Probe Features from a specific Array and Probe

Probes are stored as Probe objects, which represent the probe on the array, and ProbeFeature objects, which represent the mapping of the Probe to the genome.

In this example, a Probe from the WholeGenome_4x44k_v1 array is fetched, and the genomic locations of its ProbeFeatures are printed.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

# get the ProbeAdaptor and use it to fetch a probe from the WholeGenome_4x44k_v1 array
my $probe_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'Probe');
my $probe = $probe_adaptor->fetch_by_array_probe_probeset_name('WholeGenome_4x44k_v1', 'A_23_P18656')
  or die "Probe A_23_P18656 not found on array WholeGenome_4x44k_v1\n";

# Fetch the features associated with this probe
my @probe_features = @{ $probe->get_all_ProbeFeatures };

# Print some info about the features
foreach my $probe_feature ( @probe_features ){
	print "ProbeFeature found at:\t", $probe_feature->feature_Slice->name, "\n";
}
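A probe has more than one ProbeFeature if it maps to several places in the genome. To check this, you might count the mappings per sequence region. A minimal sketch: `count_mappings_by_region` and `maps_uniquely` are hypothetical helpers, and `seq_region_name` is assumed to be available on ProbeFeatures, as it is on other Ensembl features.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many ProbeFeatures (genomic mappings)
# fall on each sequence region. Expects an array ref of ProbeFeatures.
sub count_mappings_by_region {
  my ($probe_features) = @_;
  my %count;
  $count{ $_->seq_region_name }++ for @$probe_features;
  return \%count;
}

# Hypothetical helper: true if the probe maps to exactly one location.
sub maps_uniquely {
  my ($probe_features) = @_;
  return scalar(@$probe_features) == 1;
}
```

In the script above, `warn "Probe maps to multiple locations\n" unless maps_uniquely(\@probe_features);` would flag multi-mapping probes.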

Probe mappings to transcripts

ProbeSets represent groups of Probes and are mapped to transcripts.

In this example, the FOXP2 transcript is fetched by its stable_id. Then all ProbeSets that have been mapped to this transcript are fetched and printed.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

my $trans_id = "ENST00000393489";

# get the ProbeSetAdaptor
my $probe_set_adaptor  = $registry->get_adaptor("human", "Funcgen", "ProbeSet");

# Fetch ProbeSets associated with a transcript and move through
my @probesets  = @{ $probe_set_adaptor->fetch_all_by_transcript_stable_id($trans_id) };
foreach my $probeset (@probesets) {
	
	# get all the Arrays the ProbeSet is found on, and collect their names
	my @arraynames = map { $_->name } @{ $probeset->get_all_Arrays };

	# print information about the mapping
	print "Probeset ", $probeset->name, " on array(s) ", join(", ", @arraynames), " maps to ", $trans_id, ".\n";
}
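The script above lists the arrays for each ProbeSet. If you would rather see, for each array, which ProbeSets map to the transcript, the relationship can be inverted. A minimal sketch using only `name` and `get_all_Arrays` from the script above; `probesets_by_array` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: invert the ProbeSet-to-Array relationship so that
# each array name maps to the ProbeSet names found on it.
# Expects an array ref of ProbeSet objects, such as the one returned by
# fetch_all_by_transcript_stable_id.
sub probesets_by_array {
  my ($probesets) = @_;
  my %by_array;
  foreach my $probeset (@$probesets) {
    foreach my $array (@{ $probeset->get_all_Arrays }) {
      push @{ $by_array{ $array->name } }, $probeset->name;
    }
  }
  return \%by_array;
}
```

For example, `my $by_array = probesets_by_array(\@probesets); print "$_: @{ $by_array->{$_} }\n" for sort keys %$by_array;`.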

Further help

For additional information or help, mail the ensembl-dev mailing list. You need to subscribe to the list before you can post to it. More information on subscribing to any Ensembl mailing list is available from the Ensembl Contacts page.