Ensembl Core and OtherFeatures databases

Database Descriptions

Core database

For each organism supported by Ensembl, a species-specific Ensembl Core database stores the bulk of annotation information. Some of the key features are:

  • The unmasked genome sequence is stored at the sequence level, i.e. in the form of BAC clones or whole-genome shotgun scaffolds.
  • Assembly information to generate top-level sequences like chromosomes from sequence-level entities.
  • The non-redundant set of Ensembl gene, transcript and protein models annotated by the automated genome analysis and annotation pipeline.
  • Sequence tagged sites (STS) marker information on physical sequence and genetic map locations.
  • Cytogenetic banding patterns if a reliable banding pattern has been established for the organism.
  • Microarray probe set annotation.
  • External references to other databases of biological information.
  • cDNA and protein alignments supporting the transcript models that have been built. The cDNA alignments found in the core database are a subset of the cDNA alignments found in the cDNA database (for human and mouse) or otherfeatures database (all other species).
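
To illustrate how assembly information can turn sequence-level entities into a top-level sequence, here is a minimal Python sketch. It is not the Ensembl schema or API: the AssemblyRow fields, the contig names and the build_top_level helper are all hypothetical, chosen only to show the idea of placing component pieces at assembled coordinates, with an orientation flag and unplaced gaps filled with N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyRow:
    """Hypothetical mapping of one component piece onto the assembled sequence.

    All coordinates are 1-based and inclusive.
    """
    asm_start: int
    asm_end: int
    cmp_name: str
    cmp_start: int
    cmp_end: int
    ori: int  # 1 = forward strand, -1 = reverse complement


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_top_level(length: int, rows, components) -> str:
    """Assemble a top-level sequence; positions not covered by any row stay 'N'."""
    assembled = ["N"] * length
    for row in rows:
        # Convert 1-based inclusive coordinates to a Python slice.
        piece = components[row.cmp_name][row.cmp_start - 1:row.cmp_end]
        if row.ori == -1:
            piece = reverse_complement(piece)
        if len(piece) != row.asm_end - row.asm_start + 1:
            raise ValueError(f"length mismatch for component {row.cmp_name}")
        assembled[row.asm_start - 1:row.asm_end] = piece
    return "".join(assembled)


# Toy sequence-level entities (invented data, not real contigs).
components = {"contig_A": "ACGTACGT", "contig_B": "GGGTTT"}
rows = [
    AssemblyRow(1, 8, "contig_A", 1, 8, 1),     # placed as-is
    AssemblyRow(11, 14, "contig_B", 1, 4, -1),  # first 4 bases, reversed
]
toy_chromosome = build_top_level(14, rows, components)
print(toy_chromosome)
```

In this toy case the two-base gap between the contigs is left as N, which mirrors how unsequenced regions typically appear in an assembled chromosome.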

cDNA database

Human and mouse each have a separate cDNA database containing alignments of all species-specific cDNAs to the genome sequence. These alignments serve as a source of biological evidence in the Ensembl annotation strategy. They are updated every release and should therefore include even the most recently submitted cDNA sequences.

OtherFeatures database

For human and mouse, the otherfeatures database contains the full set of species-specific EST alignments to the genome. For all other species with enough species-specific data, the otherfeatures database contains the full set of species-specific cDNA and EST alignments to the genome. Some RNA-seq data may also be stored in the otherfeatures database.

For well-characterised species with a reasonable number of biological sequence records as supporting evidence, Ensembl provides an independent EST gene set based solely on EST information.

Please note that Ensembl OtherFeatures databases do not contain genome sequences, which saves disk space. Sequence information can still be retrieved by connecting to the Core and OtherFeatures databases simultaneously.
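
The idea of combining the two databases can be sketched in Python as follows. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the real database servers, and the table names (toy_sequence, toy_est_alignment), column names and the feature_sequences helper are hypothetical and do not reflect the actual Ensembl schema or API. The point is simply that features carry coordinates only, while the sequence is looked up in the Core database.

```python
import sqlite3

# Toy stand-in for a Core database: this is where the sequence lives.
core = sqlite3.connect(":memory:")
core.execute(
    "CREATE TABLE toy_sequence (region_name TEXT PRIMARY KEY, sequence TEXT)"
)
core.execute(
    "INSERT INTO toy_sequence VALUES (?, ?)", ("1", "AACCGGTTAACCGGTT")
)

# Toy stand-in for an OtherFeatures database: coordinates only, no sequence.
otherfeatures = sqlite3.connect(":memory:")
otherfeatures.execute(
    "CREATE TABLE toy_est_alignment ("
    "hit_name TEXT, region_name TEXT, start_pos INTEGER, end_pos INTEGER)"
)
otherfeatures.executemany(
    "INSERT INTO toy_est_alignment VALUES (?, ?, ?, ?)",
    [("EST_0001", "1", 3, 8), ("EST_0002", "1", 11, 14)],
)


def feature_sequences(features_db, sequence_db):
    """Fetch feature coordinates from one database and sequence from the other."""
    result = {}
    rows = features_db.execute(
        "SELECT hit_name, region_name, start_pos, end_pos "
        "FROM toy_est_alignment ORDER BY hit_name"
    ).fetchall()
    for hit_name, region_name, start, end in rows:
        # SQLite's substr() is 1-based, matching the 1-based inclusive
        # coordinates used in this toy example.
        (subseq,) = sequence_db.execute(
            "SELECT substr(sequence, ?, ?) FROM toy_sequence "
            "WHERE region_name = ?",
            (start, end - start + 1, region_name),
        ).fetchone()
        result[hit_name] = subseq
    return result


est_sequences = feature_sequences(otherfeatures, core)
print(est_sequences)
```

Keeping both connections open at once is what makes the lookup possible; with only the feature database connected, the coordinates could be read but not the underlying bases.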

RNA-seq database

The rnaseq database contains RNA-seq based gene models generated using our in-house RNA-seq pipeline. This database may also point to BAM files.

Database Schema

The layout of the Ensembl Core and OtherFeatures databases is identical, so schema descriptions and API access apply equally to both. The database table layout is explained in the following document:

Perl API

A comprehensive Perl Application Program Interface (API) provides efficient access to tables within the Ensembl Core and OtherFeatures databases.

Ensembl Software Support

Ensembl is an open project and we would like to encourage correspondence and discussion on any aspect of Ensembl. Please see the Ensembl Contacts page for suitable options for getting in touch with us.