Ensembl Variation - Calculated variant consequences

For each variant that is mapped to the reference genome, we identify all overlapping Ensembl transcripts. We then use a rule-based approach to predict the effects that each allele of the variant may have on each transcript. The table below lists the consequence terms, defined by the Sequence Ontology (SO), that can currently be assigned to each combination of allele and transcript. Note that each allele of each variant may have a different effect in different transcripts.
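As an illustration of what a single rule can look like, the sketch below classifies a length-changing variant that lies entirely within coding sequence, following the SO definitions of frameshift_variant, inframe_insertion and inframe_deletion. It is a simplified example, not the Ensembl implementation: the function name `coding_indel_consequence` is hypothetical, and it ignores cases such as complex substitutions or indels that span exon boundaries or create a stop codon.

```python
def coding_indel_consequence(ref: str, alt: str) -> str:
    """Classify an insertion or deletion lying entirely within coding sequence.

    Hypothetical helper for illustration only. `ref` and `alt` are the
    reference and alternate allele strings for one allele of the variant.
    """
    delta = len(alt) - len(ref)  # net number of bases gained (+) or lost (-)
    if delta == 0:
        raise ValueError("alleles have equal length; not an insertion or deletion")
    # A net change that is not a multiple of three disrupts the reading frame.
    if delta % 3 != 0:
        return "frameshift_variant"
    return "inframe_insertion" if delta > 0 else "inframe_deletion"


print(coding_indel_consequence("A", "ATGC"))  # three bases inserted
print(coding_indel_consequence("ATG", "A"))   # two bases deleted
```

Because the rule is evaluated per allele, a multi-allelic variant can yield a different term for each alternate allele.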

This approach is applied to all germline variants and somatic mutations stored in the Ensembl databases. The resulting consequence type calls, along with information determined as part of the process, such as the cDNA and CDS coordinates and the affected codons and amino acids in coding transcripts, are stored in the Ensembl Variation database and displayed on our website. For human and mouse variants, any overlap with regulatory features is also displayed. For structural variants, consequence terms are calculated on the fly for display on our website or for access through the API. You can use this pipeline to annotate your own data via VEP. By default, VEP includes upstream and downstream annotations for variants within 5kb of a nearby feature; see --distance in the VEP options.
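The 5kb window can be pictured with a short sketch. The code below is an illustrative assumption rather than VEP's own logic: the function `flank_consequence` and the constant `DEFAULT_DISTANCE` are hypothetical names, the variant is treated as a single position, coordinates are taken to be 1-based and inclusive, and a variant exactly 5000 bases away is assumed to fall inside the window.

```python
from typing import Optional

DEFAULT_DISTANCE = 5000  # bp; mirrors the 5kb default described above


def flank_consequence(pos: int, tx_start: int, tx_end: int, strand: int,
                      distance: int = DEFAULT_DISTANCE) -> Optional[str]:
    """Return the upstream/downstream term for a position near a transcript.

    Hypothetical helper. `strand` is 1 (forward) or -1 (reverse). Returns
    None when the position overlaps the transcript or lies outside the window.
    """
    if tx_start <= pos <= tx_end:
        return None  # overlapping variants are handled by other rules
    if pos < tx_start:
        gap = tx_start - pos
        # Lower coordinates are 5' of a forward-strand transcript.
        side = "upstream" if strand == 1 else "downstream"
    else:
        gap = pos - tx_end
        # Higher coordinates are 3' of a forward-strand transcript.
        side = "downstream" if strand == 1 else "upstream"
    if gap > distance:
        return None
    return f"{side}_gene_variant"


print(flank_consequence(9_000, 10_000, 20_000, strand=1))
print(flank_consequence(9_000, 10_000, 20_000, strand=-1))
```

The same genomic position is upstream of a forward-strand transcript and downstream of a reverse-strand one, which is why the strand has to be part of the rule.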

The diagram below shows the location of each display term relative to the transcript structure:
[Figure: Consequence diagram]

The terms in the table below are listed in order of severity, from most to least severe, as estimated by Ensembl; this ordering is used in the website summary views. The ordering is necessarily subjective, and API and VEP users can always retrieve the full set of consequences for each allele and make their own severity judgement. IMPACT is a separate rating, provided for compatibility with other variant annotation tools (e.g. snpEff).

| SO term | SO description | SO accession | Display term | IMPACT |
|---|---|---|---|---|
| transcript_ablation | A feature ablation whereby the deleted region includes a transcript feature | SO:0001893 | Transcript ablation | HIGH |
| splice_acceptor_variant | A splice variant that changes the 2 base region at the 3' end of an intron | SO:0001574 | Splice acceptor variant | HIGH |
| splice_donor_variant | A splice variant that changes the 2 base region at the 5' end of an intron | SO:0001575 | Splice donor variant | HIGH |
| stop_gained | A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript | SO:0001587 | Stop gained | HIGH |
| frameshift_variant | A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three | SO:0001589 | Frameshift variant | HIGH |
| stop_lost | A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript | SO:0001578 | Stop lost | HIGH |
| start_lost | A codon variant that changes at least one base of the canonical start codon | SO:0002012 | Start lost | HIGH |
| transcript_amplification | A feature amplification of a region containing a transcript | SO:0001889 | Transcript amplification | HIGH |
| feature_elongation | A sequence variant that causes the extension of a genomic feature, with regard to the reference sequence | SO:0001907 | Feature elongation | HIGH |
| feature_truncation | A sequence variant that causes the reduction of a genomic feature, with regard to the reference sequence | SO:0001906 | Feature truncation | HIGH |
| inframe_insertion | An inframe non synonymous variant that inserts bases into the coding sequence | SO:0001821 | Inframe insertion | MODERATE |
| inframe_deletion | An inframe non synonymous variant that deletes bases from the coding sequence | SO:0001822 | Inframe deletion | MODERATE |
| missense_variant | A sequence variant, that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved | SO:0001583 | Missense variant | MODERATE |
| protein_altering_variant | A sequence_variant which is predicted to change the protein encoded in the coding sequence | SO:0001818 | Protein altering variant | MODERATE |
| splice_donor_5th_base_variant | A sequence variant that causes a change at the 5th base pair after the start of the intron in the orientation of the transcript | SO:0001787 | Splice donor 5th base variant | LOW |
| splice_region_variant | A sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the exon or 3-8 bases of the intron | SO:0001630 | Splice region variant | LOW |
| splice_donor_region_variant | A sequence variant that falls in the region between the 3rd and 6th base after splice junction (5' end of intron) | SO:0002170 | Splice donor region variant | LOW |
| splice_polypyrimidine_tract_variant | A sequence variant that falls in the polypyrimidine tract at 3' end of intron between 17 and 3 bases from the end (acceptor -3 to acceptor -17) | SO:0002169 | Splice polypyrimidine tract variant | LOW |
| incomplete_terminal_codon_variant | A sequence variant where at least one base of the final codon of an incompletely annotated transcript is changed | SO:0001626 | Incomplete terminal codon variant | LOW |
| start_retained_variant | A sequence variant where at least one base in the start codon is changed, but the start remains | SO:0002019 | Start retained variant | LOW |
| stop_retained_variant | A sequence variant where at least one base in the terminator codon is changed, but the terminator remains | SO:0001567 | Stop retained variant | LOW |
| synonymous_variant | A sequence variant where there is no resulting change to the encoded amino acid | SO:0001819 | Synonymous variant | LOW |
| coding_sequence_variant | A sequence variant that changes the coding sequence | SO:0001580 | Coding sequence variant | MODIFIER |
| mature_miRNA_variant | A transcript variant located within the sequence of the mature miRNA | SO:0001620 | Mature miRNA variant | MODIFIER |
| 5_prime_UTR_variant | A UTR variant of the 5' UTR | SO:0001623 | 5 prime UTR variant | MODIFIER |
| 3_prime_UTR_variant | A UTR variant of the 3' UTR | SO:0001624 | 3 prime UTR variant | MODIFIER |
| non_coding_transcript_exon_variant | A sequence variant that changes non-coding exon sequence in a non-coding transcript | SO:0001792 | Non coding transcript exon variant | MODIFIER |
| intron_variant | A transcript variant occurring within an intron | SO:0001627 | Intron variant | MODIFIER |
| NMD_transcript_variant | A variant in a transcript that is the target of NMD | SO:0001621 | NMD transcript variant | MODIFIER |
| non_coding_transcript_variant | A transcript variant of a non coding RNA gene | SO:0001619 | Non coding transcript variant | MODIFIER |
| coding_transcript_variant | A transcript variant of a protein coding gene | SO:0001968 | Coding transcript variant | MODIFIER |
| upstream_gene_variant | A sequence variant located 5' of a gene | SO:0001631 | Upstream gene variant | MODIFIER |
| downstream_gene_variant | A sequence variant located 3' of a gene | SO:0001632 | Downstream gene variant | MODIFIER |
| TFBS_ablation | A feature ablation whereby the deleted region includes a transcription factor binding site | SO:0001895 | TFBS ablation | MODIFIER |
| TFBS_amplification | A feature amplification of a region containing a transcription factor binding site | SO:0001892 | TFBS amplification | MODIFIER |
| TF_binding_site_variant | A sequence variant located within a transcription factor binding site | SO:0001782 | TF binding site variant | MODIFIER |
| regulatory_region_ablation | A feature ablation whereby the deleted region includes a regulatory region | SO:0001894 | Regulatory region ablation | MODIFIER |
| regulatory_region_amplification | A feature amplification of a region containing a regulatory region | SO:0001891 | Regulatory region amplification | MODIFIER |
| regulatory_region_variant | A sequence variant located within a regulatory region | SO:0001566 | Regulatory region variant | MODIFIER |
| intergenic_variant | A sequence variant located in the intergenic region, between genes | SO:0001628 | Intergenic variant | MODIFIER |
| sequence_variant | A sequence_variant is a non exact copy of a sequence_feature or genome exhibiting one or more sequence_alteration | SO:0001060 | Sequence variant | MODIFIER |

Each term has a corresponding colour in the Ensembl web displays.
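The severity ordering and IMPACT ratings in the table can be used to summarise a set of per-transcript consequences. The following is a minimal sketch that transcribes the table's order and ratings into Python; the names `BY_IMPACT`, `SEVERITY_ORDER`, `IMPACT` and `most_severe` are hypothetical and are not part of the Ensembl API or VEP.

```python
# SO terms grouped by IMPACT, in the same top-to-bottom order as the table.
BY_IMPACT = {
    "HIGH": [
        "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
        "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
        "transcript_amplification", "feature_elongation", "feature_truncation",
    ],
    "MODERATE": [
        "inframe_insertion", "inframe_deletion", "missense_variant",
        "protein_altering_variant",
    ],
    "LOW": [
        "splice_donor_5th_base_variant", "splice_region_variant",
        "splice_donor_region_variant", "splice_polypyrimidine_tract_variant",
        "incomplete_terminal_codon_variant", "start_retained_variant",
        "stop_retained_variant", "synonymous_variant",
    ],
    "MODIFIER": [
        "coding_sequence_variant", "mature_miRNA_variant", "5_prime_UTR_variant",
        "3_prime_UTR_variant", "non_coding_transcript_exon_variant",
        "intron_variant", "NMD_transcript_variant",
        "non_coding_transcript_variant", "coding_transcript_variant",
        "upstream_gene_variant", "downstream_gene_variant", "TFBS_ablation",
        "TFBS_amplification", "TF_binding_site_variant",
        "regulatory_region_ablation", "regulatory_region_amplification",
        "regulatory_region_variant", "intergenic_variant", "sequence_variant",
    ],
}

# Flatten to a single list (dicts preserve insertion order in Python 3.7+).
SEVERITY_ORDER = [term for terms in BY_IMPACT.values() for term in terms]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}
IMPACT = {term: impact for impact, terms in BY_IMPACT.items() for term in terms}


def most_severe(terms):
    """Return the term that appears highest in the table's severity order."""
    return min(terms, key=RANK.__getitem__)


calls = ["intron_variant", "missense_variant", "splice_region_variant"]
worst = most_severe(calls)
print(worst, IMPACT[worst])
```

Since the ordering is subjective, a user who prefers a different ranking only needs to reorder the lists; the lookup logic stays the same.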

Missense variants may be further annotated with their predicted effect on protein function, using a number of algorithms.