Regulation Overview

Ensembl Regulation provides resources for studying gene expression and its regulation in human and mouse, with a focus on transcriptional and post-transcriptional mechanisms. Our database includes the Ensembl Regulatory Build, an up-to-date and comprehensive summary of regulatory features across the genome, as well as popular curated external resources.

Regulatory Features

Ensembl maintains a catalogue of genomic regions that may be involved in transcriptional regulation in human and mouse, called the Regulatory Build. These regions, called regulatory features, are inferred from publicly available experimental data sets, including:

In short, raw sequencing data are aligned against the reference genome, and a genome segmentation algorithm is then applied to define genomic loci with distinct epigenomic profiles. Finally, each epigenomic profile is classified into one type of regulatory feature using a decision tree that takes into account transcription factor binding sites (TFBSs), gene annotation and histone modification signal enrichment. For more information, please see here.

The different types of regulatory features annotated include:

  • Promoters
  • Promoter flanking regions
  • Enhancers
  • CTCF binding sites
  • Transcription factor binding sites
  • Open chromatin regions
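
The classification step can be pictured as a small decision tree over per-locus evidence. The sketch below is a toy illustration only. The `Locus` fields, the rule order and the `classify` function are hypothetical and do not reproduce the actual Regulatory Build rules. It simply shows how TFBS, gene annotation and histone evidence could route a segmented locus to one of the feature types listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locus:
    """Hypothetical per-locus evidence summary (illustrative fields only)."""
    overlaps_tss: bool    # gene annotation: locus overlaps a transcription start site
    near_tss: bool        # gene annotation: locus lies in a flank around a TSS
    ctcf_peak: bool       # CTCF ChIP-seq peak present
    tf_peak: bool         # peak for some other transcription factor present
    enhancer_marks: bool  # e.g. enriched H3K4me1/H3K27ac signal
    open_chromatin: bool  # DNase-seq peak present


def classify(locus: Locus) -> Optional[str]:
    """Toy decision tree; the rule order is an assumption, not Ensembl's."""
    if locus.overlaps_tss and locus.open_chromatin:
        return "Promoter"
    if locus.near_tss and (locus.open_chromatin or locus.enhancer_marks):
        return "Promoter flanking region"
    if locus.enhancer_marks:
        return "Enhancer"
    if locus.ctcf_peak:
        return "CTCF binding site"
    if locus.tf_peak:
        return "Transcription factor binding site"
    if locus.open_chromatin:
        return "Open chromatin region"
    return None  # no regulatory feature is called for this locus


promoter_like = Locus(overlaps_tss=True, near_tss=True, ctcf_peak=False,
                      tf_peak=False, enhancer_marks=False, open_chromatin=True)
print(classify(promoter_like))  # Promoter
```

Checking the most specific evidence (TSS overlap) first and the least specific (open chromatin alone) last is one reasonable ordering. It is a design choice of this sketch.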

For each cell type, the regulatory features are assigned labels describing their activity level. These include:

  • ACTIVE, when the region displays an active epigenetic signature
  • POISED, when the region displays a poised epigenetic signature
  • REPRESSED, when the region is epigenetically repressed
  • INACTIVE, when the region carries none of the epigenetic marks included in the Regulatory Build, or
  • NA, when no data are available for this feature in the cell type.
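
A minimal sketch of how such labels could be assigned from the marks observed over one feature in one cell type. The mark groupings (`ACTIVATING_MARKS`, `REPRESSIVE_MARKS`) and the rule that treats co-occurring activating and repressive marks as "poised" are assumptions made for illustration. They are not the Regulatory Build's actual criteria.

```python
from enum import Enum
from typing import Optional, Set


class Activity(Enum):
    ACTIVE = "ACTIVE"
    POISED = "POISED"
    REPRESSED = "REPRESSED"
    INACTIVE = "INACTIVE"
    NA = "NA"


# Hypothetical mark groupings, chosen for illustration only.
ACTIVATING_MARKS = {"H3K4me1", "H3K4me3", "H3K27ac"}
REPRESSIVE_MARKS = {"H3K27me3", "H3K9me3"}


def assign_activity(observed: Optional[Set[str]]) -> Activity:
    """Label one feature in one cell type from the set of marks observed over it.

    `observed` is None when the cell type has no data covering the feature.
    """
    if observed is None:
        return Activity.NA
    activating = bool(observed & ACTIVATING_MARKS)
    repressive = bool(observed & REPRESSIVE_MARKS)
    if activating and repressive:
        return Activity.POISED  # bivalent: both kinds of mark present
    if activating:
        return Activity.ACTIVE
    if repressive:
        return Activity.REPRESSED
    return Activity.INACTIVE  # none of the tracked marks present


print(assign_activity({"H3K4me3", "H3K27me3"}).value)  # POISED
```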

Segmentation Features

Genome segmentation datasets are generated for each cell type as part of our Regulatory Build pipeline, using algorithms such as ChromHMM (Ernst et al., 2011) or Segway (Hoffman et al., 2011). These algorithms detect recurring signal patterns, called states, in a collection of genome-wide assays, such as DNase-seq and ChIP-seq, across the different cell types, and then assign a state to each base pair of each epigenome. Each of the 25 resulting states is then assigned a functional label (CTCF, Distal, Heterochromatin, Open Chromatin, Transcription Factor Binding Site, Gene, Predicted Weak enhancer/Cis-reg element, Proximal, Tss, Poised or Repressed) based on a decision tree described here. For more information, please also see our segmentation analyses documentation.
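
A minimal sketch of the step from per-base-pair states to labelled segments, assuming one integer state per base pair as described above. The `STATE_LABELS` mapping is hypothetical: only a handful of states are shown, with made-up IDs. Because several states can share a functional label, adjacent positions with the same label are merged into one segment.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical state-ID -> functional-label mapping; several states may share a label.
STATE_LABELS: Dict[int, str] = {
    1: "Tss",
    2: "Proximal",
    3: "Distal",
    4: "Distal",
    5: "CTCF",
    6: "Heterochromatin",
}


def label_segments(states: List[int], start: int = 0) -> List[Tuple[int, int, str]]:
    """Collapse per-base-pair states into (start, end, label) runs; end is exclusive."""
    labels = (STATE_LABELS.get(s, "Unlabelled") for s in states)
    segments = []
    pos = start
    for label, run in groupby(labels):  # groupby yields runs of consecutive equal labels
        length = sum(1 for _ in run)
        segments.append((pos, pos + length, label))
        pos += length
    return segments


print(label_segments([1, 1, 2, 3, 4, 4, 5], start=100))
# [(100, 102, 'Tss'), (102, 103, 'Proximal'), (103, 106, 'Distal'), (106, 107, 'CTCF')]
```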

DNA Methylation

Ensembl Regulation also provides hyper- and hypo-methylated CpGs, as measured by Reduced Representation Bisulfite Sequencing (RRBS) assays on forty-five cell lines and Whole Genome Bisulfite Sequencing (WGBS) assays on two cell lines.
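
In bisulfite sequencing, unmethylated cytosines are converted and read as T, while methylated cytosines remain C. The methylation level at a CpG is therefore commonly estimated as the fraction of reads still showing C. The sketch below computes that fraction and makes a hyper/hypo call. The `min_reads`, `hyper` and `hypo` cut-offs are hypothetical values for illustration, not the thresholds Ensembl uses.

```python
from typing import Optional


def methylation_level(methylated: int, unmethylated: int) -> Optional[float]:
    """Fraction of reads at a CpG that remain C after bisulfite conversion.

    Unmethylated cytosines are read as T, methylated ones stay C,
    so level = C reads / (C + T reads). Returns None with no coverage.
    """
    total = methylated + unmethylated
    return methylated / total if total else None


def call_cpg(methylated: int, unmethylated: int, min_reads: int = 10,
             hyper: float = 0.8, hypo: float = 0.2) -> Optional[str]:
    """Classify a CpG; the coverage cut-off and thresholds are hypothetical."""
    total = methylated + unmethylated
    if total == 0 or total < min_reads:
        return None  # too little coverage to make a call
    level = methylated / total
    if level >= hyper:
        return "hypermethylated"
    if level <= hypo:
        return "hypomethylated"
    return "intermediate"


print(call_cpg(18, 2))  # hypermethylated
```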

Other Regulatory Data

Ensembl Regulation databases also store data directly imported from external sources:

Oligo probes

Ensembl stores microarray probe mappings for several species and technologies, including:

  • Affymetrix: IVT and ST gene expression arrays
  • Codelink: gene expression array
  • Agilent: whole genome, CGH and SurePrint arrays
  • Illumina: whole genome and Infinium methylation arrays
  • Phalanx: OneArray

Displaying Regulatory Features

Regulation data can be accessed in the browser from various angles:

Location Tab

You can configure the 'Region in detail' panel to display regulation-related tracks. Watch our 'Region in detail' video to find out how to add tracks. The Regulation tracks are grouped into the following subsections:

  • Regulatory Features: these can be visualised using the 'Regulatory Build' track.
  • Activity Levels: select the cell type/line of interest and display the activity levels for each regulatory feature defined in the Regulatory Build.
  • Segmentation Features: select the cell type/line of interest and display the genomic state assignment in the region.
  • Open Chromatin and Transcription Factor Binding Sites (TFBS): display of signal or peaks from assays measuring open chromatin (DNase-seq) or transcription factor binding (ChIP-seq) in various cell types/lines.
  • Histones and Polymerases: display of signal or peaks from experimental ChIP-seq assays measuring histone marks/modifications or binding of RNA polymerases II and III.
  • DNA Methylation: RRBS and WGBS methylation tracks.
  • Other regulatory regions: imported tracks from external databases.

Species-specific microarray probe mappings can be visualised by turning on tracks from the separate 'Oligo probes' section at the bottom of the configuration panel.

Gene tab

Click on the Regulation link in the left-hand menu to view the regulatory features and GTEx SNP-gene associations in the vicinity of your gene. Note that a gene is not necessarily controlled by nearby regulatory elements; it may instead be regulated by distal elements.
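
"In the vicinity" amounts to an interval-overlap query against the gene extended by some flanking distance. The sketch below illustrates this with a hypothetical `Feature` record, made-up feature IDs and a hypothetical `flank` default. Because the query uses a fixed window, it can miss distal regulatory elements. It can also report nearby features that do not actually regulate the gene, as the note above cautions.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    feature_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def features_near(chrom: str, gene_start: int, gene_end: int,
                  features: List[Feature], flank: int = 10_000) -> List[Feature]:
    """Return features overlapping the gene extended by `flank` bp on each side."""
    lo = max(1, gene_start - flank)
    hi = gene_end + flank
    return [f for f in features
            if f.chrom == chrom and f.start <= hi and f.end >= lo]


feats = [Feature("RF_A", "7", 100, 200), Feature("RF_B", "7", 50_000, 50_500),
         Feature("RF_C", "X", 150, 250)]
print([f.feature_id for f in features_near("7", 1_000, 2_000, feats)])  # ['RF_A']
```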

Regulation tab

Clicking on a regulatory feature will open a Regulation tab with information about the evidence supporting that regulatory feature, as well as cell-type-specific activity estimates. Different views can be selected:

  • Summary: this view displays the selected regulatory feature from the Regulatory Build. No cell-type-specific activity levels are displayed by default; to turn them on, click 'Select cells'.
  • Details by cell type: this view displays the activity of the regulatory feature in any selected cell type (HUVEC by default) along with a default set of supporting evidence. You can display more cell types and/or evidence tracks by clicking 'Configure this page' or the 'Select cells'/'Select evidence' buttons above the image.
  • Feature Context: this view displays the regulatory features in a wider region around the chosen regulatory feature, together with their activity levels across all available cell types.
  • Evidence: this view displays, in a table, the complete list of core evidence supporting the chosen regulatory feature, such as the presence of histone modifications, transcription factor binding and open chromatin.


Data Access

In addition to the browser, the regulation data in Ensembl can also be accessed through:

References

  • Daniel R Zerbino, Steven P Wilder, Nathan Johnson, Thomas Juettemann and Paul R Flicek
    The Ensembl Regulatory Build
    Genome Biology 16:56
    doi:10.1186/s13059-015-0621-5

  • Daniel R Zerbino, Nathan Johnson, Thomas Juettemann, Dan Sheppard, Steven P Wilder, Ilias Lavidas, Michael Nuhn, Emily Perry, Quentin Raffaillac-Desfosses, Daniel Sobral, Damian Keefe, Stefan Gräf, Ikhlak Ahmed, Rhoda Kinsella, Bethan Pritchard, Simon Brent, Ridwan Amode, Anne Parker, Steven Trevanion, Ewan Birney, Ian Dunham and Paul Flicek
    Ensembl Regulation Resources
    Database pii: bav119
    doi:10.1093/database/bav119

  • Daniel R. Zerbino, Nathan Johnson, Thomas Juettemann, Steven P. Wilder and Paul Flicek
    WiggleTools: parallel processing of large collections of genome-wide datasets for visualization and statistical analysis
    Bioinformatics 30(7):1008-9
    doi:10.1093/bioinformatics/btt737

  • Andrew Yates, Wasiu Akanni, M. Ridwan Amode, Daniel Barrell, Konstantinos Billis, Denise Carvalho-Silva, Carla Cummins, Peter Clapham, Stephen Fitzgerald, Laurent Gil, Carlos García-Girón, Leo Gordon, Thibaut Hourlier, Sarah E. Hunt, Sophie H. Janacek, Nathan Johnson, Thomas Juettemann, Stephen Keenan, Ilias Lavidas, Fergal J. Martin, Thomas Maurel, William McLaren, Daniel N. Murphy, Rishi Nag, Michael Nuhn, Anne Parker, Mateus Patricio, Miguel Pignatelli, Matthew Rahtz, Harpreet Singh Riat, Daniel Sheppard, Kieron Taylor, Anja Thormann, Alessandro Vullo, Steven P. Wilder, Amonida Zadissa, Ewan Birney, Jennifer Harrow, Matthieu Muffato, Emily Perry, Magali Ruffier, Giulietta Spudich, Stephen J. Trevanion, Fiona Cunningham, Bronwen L. Aken, Daniel R. Zerbino and Paul Flicek
    Ensembl 2016
    Nucleic Acids Research 44(D1):D710-6
    doi: 10.1093/nar/gkv1157