Repeats

Repetitive sequence is found throughout genomes. It is important to mask repeats before gene annotation, as repeats will cause non-specific gene hits. Repeats are also useful for studying evolution and for DNA fingerprinting.

Types of repeats

Repeat type	Definition
Centromere	The region of the chromosome at which the two sister chromatids are joined during mitosis and meiosis, mostly composed of satellite DNA.
Low complexity regions	Poly-purine or poly-pyrimidine stretches, or regions of extremely high AT or GC content.
RNA repeats	Non-functional copies of RNA genes which have been reintegrated into the genome with the assistance of a reverse transcriptase.
Satellite repeats	Multiple copies of the same base sequence on a DNA sequence. The repeated pattern can vary in length from a single base to several thousand bases long.
Simple repeats	Duplications of simple sets of DNA bases (typically 1-5bp) such as A, CA, CGG etc.
Tandem repeats	Typically found at the centromeres and telomeres of chromosomes these are duplications of more complex 100-200 base sequences.
LTRs	Long tandem repeats.
Type I Transposons/LINE	Long Interspersed Elements. Retrotransposed elements in the genome containing open reading frames encoding (often inactive) reverse transcription machinery.
Type I Transposons/SINE	Short Interspersed Elements. Retrotransposed elements less than 500 bp that contain tRNA, snRNA and rRNA, which require other mobile elements to be transposed. Alu elements are a type of SINE.
Type II Transposons	Elements that have been transposed and duplicated around the genome by excision and ligation.
Unknown	Repeats that cannot be classified.

Masking the repeats

Ensembl mask repeats using Repeat Masker and Dust.

Repeats can be viewed in the browser and extracted using our APIs. You can also download repeat-masked sequence from our FTP site, either hard-masked (rm) where repeats are replaced with Ns, or soft-masked (sm) where repeats are in lower-case text.