

Biotypes

  • Biotype: A gene or transcript classification.
    • IG gene: Immunoglobulin gene that undergoes somatic recombination, annotated in collaboration with IMGT http://www.imgt.org/.
      • IG C gene: Constant chain immunoglobulin gene that undergoes somatic recombination before transcription.
      • IG D gene: Diversity chain immunoglobulin gene that undergoes somatic recombination before transcription.
      • IG J gene: Joining chain immunoglobulin gene that undergoes somatic recombination before transcription.
      • IG V gene: Variable chain immunoglobulin gene that undergoes somatic recombination before transcription.
    • Processed transcript: Gene/transcript that doesn't contain an open reading frame (ORF).
      • Long non-coding RNA (lncRNA): A non-coding gene/transcript >200bp in length.
        • 3' overlapping ncRNA: Transcripts where ditag and/or published experimental data strongly supports the existence of long (>200bp) non-coding transcripts that overlap the 3'UTR of a protein-coding locus on the same strand.
        • Antisense: Transcripts that overlap the genomic span (i.e., exons or introns) of a protein-coding locus on the opposite strand.
        • Macro lncRNA: Unspliced lncRNAs that are several kb in size.
        • Non coding: Transcripts that are known from the literature not to be protein-coding.
        • Retained intron: An alternatively spliced transcript believed to contain intronic sequence relative to other, coding, transcripts of the same gene.
        • Sense intronic: A long non-coding transcript in introns of a coding gene that does not overlap any exons.
        • Sense overlapping: A long non-coding transcript that contains a coding gene in its intron on the same strand.
        • lincRNA (long intergenic ncRNA): Transcripts from a long intergenic non-coding RNA locus, with a length >200bp. Classification requires a lack of coding potential; these transcripts may not be conserved between species.
      • ncRNA: A non-coding gene.
        • miRNA: A small RNA (~22bp) that silences the expression of target mRNA.
        • miscRNA: Miscellaneous RNA. A non-coding RNA that cannot be classified.
        • piRNA: An RNA that interacts with piwi proteins involved in genetic silencing.
        • rRNA: The RNA component of a ribosome.
        • siRNA: A small RNA (20-25bp) that silences the expression of target mRNA through the RNAi pathway.
        • snRNA: Small RNA molecules that are found in the cell nucleus and are involved in the processing of pre-messenger RNAs (pre-mRNAs).
        • snoRNA: Small RNA molecules that are found in the cell nucleolus and are involved in the post-transcriptional modification of other RNAs.
        • tRNA: A transfer RNA, which acts as an adaptor molecule for translation of mRNA.
        • vaultRNA: Short non-coding RNA genes that form part of the vault ribonucleoprotein complex.
    • Protein coding: Gene/transcript that contains an open reading frame (ORF).
    • Pseudogene: A gene that has homology to known protein-coding genes but contains a frameshift and/or stop codon(s) that disrupt the ORF. Thought to have arisen through duplication followed by loss of function.
      • IG pseudogene: Inactivated immunoglobulin gene.
      • Polymorphic pseudogene: Pseudogene owing to a SNP/indel, whereas in other individuals/haplotypes/strains the gene is translated.
      • Processed pseudogene: Pseudogene that lacks introns and is thought to arise from reverse transcription of mRNA followed by reinsertion of DNA into the genome.
      • Transcribed pseudogene: Pseudogene where protein homology or genomic structure indicates a pseudogene, but the presence of locus-specific transcripts indicates expression. These can be classified into 'Processed', 'Unprocessed' and 'Unitary'.
      • Translated pseudogene: Pseudogenes that have mass spectrometry data suggesting that they are also translated. These can be classified into 'Processed' and 'Unprocessed'.
      • Unitary pseudogene: A species-specific unprocessed pseudogene without a parent gene, which has an active orthologue in another species.
      • Unprocessed pseudogene: Pseudogene that can contain introns because it arose by gene duplication.
    • Readthrough: A readthrough transcript has exons that overlap exons from transcripts belonging to two or more different loci (in addition to the locus to which the readthrough transcript itself belongs).
    • TEC (To be Experimentally Confirmed): Regions with EST clusters that have polyA features that could indicate the presence of protein coding genes. These require experimental validation, either by 5' RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively-encoded peptide with specific antibodies.
    • TR gene: T cell receptor gene that undergoes somatic recombination, annotated in collaboration with IMGT http://www.imgt.org/.
      • TR C gene: Constant chain T cell receptor gene that undergoes somatic recombination before transcription.
      • TR D gene: Diversity chain T cell receptor gene that undergoes somatic recombination before transcription.
      • TR J gene: Joining chain T cell receptor gene that undergoes somatic recombination before transcription.
      • TR V gene: Variable chain T cell receptor gene that undergoes somatic recombination before transcription.
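The glossary above is a tree: every biotype sits under a broader category (for example, Antisense under lncRNA under Processed transcript). As a minimal sketch, assuming you want to answer "is X a kind of Y?" programmatically, the hierarchy can be stored as a parent-to-children map and walked upwards. The labels are the glossary headwords (lightly shortened, e.g. "lncRNA", "TEC"), not necessarily the identifiers Ensembl uses in its files or APIs, and the names `CHILDREN`, `PARENT`, `lineage` and `is_a` are hypothetical.

```python
from typing import Dict, List

ROOT = "Biotype"

# Hypothetical encoding of the glossary tree (parent -> children).
# Labels are glossary headwords, not official Ensembl identifiers.
CHILDREN: Dict[str, List[str]] = {
    ROOT: ["IG gene", "Processed transcript", "Protein coding", "Pseudogene",
           "Readthrough", "TEC", "TR gene"],
    "IG gene": ["IG C gene", "IG D gene", "IG J gene", "IG V gene"],
    "Processed transcript": ["lncRNA", "ncRNA"],
    "lncRNA": ["3' overlapping ncRNA", "Antisense", "Macro lncRNA", "Non coding",
               "Retained intron", "Sense intronic", "Sense overlapping", "lincRNA"],
    "ncRNA": ["miRNA", "miscRNA", "piRNA", "rRNA", "siRNA", "snRNA", "snoRNA",
              "tRNA", "vaultRNA"],
    "Pseudogene": ["IG pseudogene", "Polymorphic pseudogene", "Processed pseudogene",
                   "Transcribed pseudogene", "Translated pseudogene",
                   "Unitary pseudogene", "Unprocessed pseudogene"],
    "TR gene": ["TR C gene", "TR D gene", "TR J gene", "TR V gene"],
}

# Invert to child -> parent so any term can be walked up to the root.
PARENT: Dict[str, str] = {
    child: parent for parent, kids in CHILDREN.items() for child in kids
}


def lineage(biotype: str) -> List[str]:
    """Return the path from the root down to `biotype`."""
    if biotype != ROOT and biotype not in PARENT:
        raise KeyError(f"unknown biotype: {biotype!r}")
    path = [biotype]
    while path[-1] != ROOT:
        path.append(PARENT[path[-1]])
    return path[::-1]  # reverse so the root comes first


def is_a(biotype: str, ancestor: str) -> bool:
    """True if `ancestor` is `biotype` itself or one of its parent categories."""
    return ancestor in lineage(biotype)


if __name__ == "__main__":
    print(lineage("snoRNA"))  # ['Biotype', 'Processed transcript', 'ncRNA', 'snoRNA']
    print(is_a("Processed pseudogene", "Pseudogene"))  # True
```

A child-to-parent map keeps each lookup a simple walk to the root. Note that this flat tree does not model the further 'Processed'/'Unprocessed'/'Unitary' split that the glossary mentions for transcribed and translated pseudogenes.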
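Several lncRNA subclasses in the glossary (Antisense, Sense intronic, Sense overlapping, lincRNA) are defined by where a transcript sits relative to protein-coding loci: strand, and whether it falls in exons or introns. The sketch below turns those positional definitions into simple interval checks. It is a hypothetical illustration only, not the Ensembl/GENCODE annotation procedure, which also relies on curation and experimental evidence. Assumptions: all transcripts are on one chromosome with inclusive coordinates, the >200bp threshold is applied to spliced (exonic) length, and `coding` lists every protein-coding locus near the transcript. The names `Transcript`, `relation` and `classify_lncrna` are made up, and subclasses needing UTR or evidence data (3' overlapping ncRNA, Macro lncRNA, Non coding, Retained intron) are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Interval = Tuple[int, int]  # inclusive [start, end] on one chromosome


@dataclass
class Transcript:
    strand: str            # "+" or "-"
    exons: List[Interval]  # must be non-empty

    @property
    def span(self) -> Interval:
        # Genomic span: first exon start to last exon end (exons plus introns).
        return (min(s for s, _ in self.exons), max(e for _, e in self.exons))

    @property
    def introns(self) -> List[Interval]:
        # Gaps between consecutive exons, after sorting by start.
        ex = sorted(self.exons)
        return [(a[1] + 1, b[0] - 1) for a, b in zip(ex, ex[1:])]

    @property
    def length(self) -> int:
        # Spliced length: sum of exon lengths.
        return sum(e - s + 1 for s, e in self.exons)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def relation(lnc: Transcript, pc: Transcript) -> Optional[str]:
    """Label `lnc` relative to one protein-coding transcript `pc`, or None."""
    if not overlaps(lnc.span, pc.span):
        return None
    if lnc.strand != pc.strand:
        return "Antisense"          # overlaps pc's exons or introns, opposite strand
    if any(contains(i, lnc.span) for i in pc.introns):
        return "Sense intronic"     # lies inside a pc intron, so touches no pc exon
    if any(contains(i, pc.span) for i in lnc.introns):
        return "Sense overlapping"  # pc lies inside one of lnc's introns
    return None  # other same-strand overlaps fall outside these simplified rules


def classify_lncrna(lnc: Transcript, coding: List[Transcript]) -> Set[str]:
    """Return the simplified positional labels for a candidate lncRNA."""
    if lnc.length <= 200:
        raise ValueError("lncRNA biotypes require a transcript >200 bp")
    if not any(overlaps(lnc.span, pc.span) for pc in coding):
        return {"lincRNA"}  # assumes `coding` lists every nearby locus
    labels = (relation(lnc, pc) for pc in coding)
    return {r for r in labels if r is not None}


if __name__ == "__main__":
    pc = Transcript("+", [(1000, 1200), (5000, 5300)])  # one intron: 1201-4999
    print(classify_lncrna(Transcript("-", [(1100, 1400)]), [pc]))  # {'Antisense'}
    print(classify_lncrna(Transcript("+", [(10000, 10500)]), [pc]))  # {'lincRNA'}
```

Returning a set rather than a single label reflects that a transcript can relate differently to several neighbouring loci; an empty set means none of these simplified rules applied.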