Repeats
Repetitive sequence is found throughout genomes. It is important to mask repeats before gene annotation, because unmasked repeats produce non-specific alignment hits that can be mistaken for genes. Repeats are also useful for studying evolution and for DNA fingerprinting.
Types of repeats
Repeat type | Definition |
---|---|
Centromere | The region of the chromosome at which the two sister chromatids are joined during mitosis and meiosis, mostly composed of satellite DNA. |
Low complexity regions | Poly-purine or poly-pyrimidine stretches, or regions of extremely high AT or GC content. |
RNA repeats | Non-functional copies of RNA genes which have been reintegrated into the genome with the assistance of a reverse transcriptase. |
Satellite repeats | Multiple copies of the same base sequence within a stretch of DNA. The repeated unit can vary in length from a single base to several thousand bases. |
Simple repeats | Duplications of simple sets of DNA bases (typically 1-5 bp), such as A, CA or CGG. |
Tandem repeats | Typically found at the centromeres and telomeres of chromosomes, these are duplications of more complex 100-200 base sequences. |
LTRs | Long terminal repeats: direct repeats found at both ends of LTR retrotransposons and integrated retroviruses. |
Type I Transposons/LINE | Long Interspersed Elements. Retrotransposed elements in the genome containing open reading frames encoding (often inactive) reverse transcription machinery. |
Type I Transposons/SINE | Short Interspersed Elements. Retrotransposed elements shorter than 500 bp, derived from tRNA, snRNA or rRNA sequences, which rely on other mobile elements for their transposition. Alu elements are a type of SINE. |
Type II Transposons | Elements that have been transposed and duplicated around the genome by excision and ligation. |
Unknown | Repeats that cannot be classified. |
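The simple-repeat definition in the table can be illustrated with a short scanner. This is a minimal sketch, not how RepeatMasker or Ensembl actually detect simple repeats: the function name `find_simple_repeats` and its thresholds (`min_copies`, `min_length`) are hypothetical choices for illustration. At each position it tries every unit length from 1 to 5 bp, extends the unit while it repeats exactly, and keeps the longest run.

```python
from typing import List, Tuple


def find_simple_repeats(
    seq: str, max_unit: int = 5, min_copies: int = 3, min_length: int = 10
) -> List[Tuple[int, int, str]]:
    """Return (start, end, unit) for exact tandem runs of a short unit.

    Coordinates are 0-based and half-open. max_unit, min_copies and
    min_length are hypothetical thresholds, not RepeatMasker settings.
    """
    seq = seq.upper()
    hits: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(seq):
        best_end, best_unit = i, ""
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this size
            if "N" in unit:
                continue  # ignore runs of unknown bases
            end = i + k
            # Extend while the next k bases are an exact copy of the unit.
            while seq[end:end + k] == unit:
                end += k
            copies = (end - i) // k
            # Keep the longest run; ties go to the shorter unit found first.
            if copies >= min_copies and end > best_end:
                best_end, best_unit = end, unit
        if best_unit and best_end - i >= min_length:
            hits.append((i, best_end, best_unit))
            i = best_end  # skip past the reported run
        else:
            i += 1
    return hits


print(find_simple_repeats("GGT" + "CA" * 6 + "TTG"))  # [(3, 15, 'CA')]
```

Only exact copies are reported here; real repeat annotation tolerates mismatches and indels within a run, which this sketch deliberately leaves out.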
Masking the repeats
Ensembl masks repeats using RepeatMasker and Dust.
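Dust targets low-complexity sequence. As a rough illustration of the idea, the sketch below scores a window by how often its trinucleotides (triplets) recur, the quantity DUST-style algorithms are built on, and soft-masks windows whose score exceeds a threshold. The function names, the 64 bp window and the 2.0 threshold are illustrative assumptions, not Ensembl's settings; real DUST implementations also refine which bases inside a window get masked, whereas this sketch masks the whole window.

```python
from collections import Counter


def triplet_score(window: str) -> float:
    """Low-complexity score: sum over triplets of c*(c-1)/2, divided by (l - 1),
    where c is a triplet's count and l the number of triplets in the window.
    Repetitive windows score high; varied sequence scores near zero."""
    window = window.upper()
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    l = len(triplets)
    if l < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (l - 1)


def mask_low_complexity(seq: str, window: int = 64, threshold: float = 2.0) -> str:
    """Soft-mask (lower-case) every window whose triplet score exceeds threshold.
    window and threshold are illustrative values, not Ensembl's Dust settings."""
    masked = [False] * len(seq)
    # Slide the window one base at a time; a sequence shorter than the
    # window is scored as a single window.
    for start in range(max(1, len(seq) - window + 1)):
        end = min(start + window, len(seq))
        if triplet_score(seq[start:end]) > threshold:
            for i in range(start, end):
                masked[i] = True
    return "".join(b.lower() if m else b for b, m in zip(seq, masked))


print(triplet_score("A" * 64))          # 31.0: one triplet (AAA) repeated 62 times
print(mask_low_complexity("ACGTTGCA"))  # ACGTTGCA: every triplet differs, so nothing is masked
```

Rescoring every window from scratch costs O(n x window); it keeps the sketch readable, while practical tools update triplet counts incrementally as the window slides.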
Repeats can be viewed in the browser and extracted using our APIs. You can also download repeat-masked sequence from our FTP site, either hard-masked (rm), where repeats are replaced with Ns, or soft-masked (sm), where repeats are in lower-case text.
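Because a soft-masked sequence keeps the underlying bases, it can always be converted to hard-masked form, but not the reverse. The sketch below assumes a sequence already loaded into a Python string (for example, read from a downloaded FASTA file) and shows both masking styles applied to a list of repeat intervals, plus a soft-to-hard conversion. The function names and the 0-based, half-open interval convention are assumptions for illustration, not part of any Ensembl API.

```python
import re
from typing import Iterable, Tuple


def mask_intervals(seq: str, intervals: Iterable[Tuple[int, int]], hard: bool = False) -> str:
    """Mask 0-based, half-open (start, end) repeat intervals.

    Soft-masking lower-cases the repeat bases; hard-masking replaces them with N.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


def soft_to_hard(seq: str) -> str:
    """Convert soft-masked sequence (repeats in lower case) to hard-masked (repeats as N)."""
    return re.sub(r"[a-z]", "N", seq)


def soft_masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked (lower case)."""
    if not seq:
        return 0.0
    return sum(b.islower() for b in seq) / len(seq)


seq = "ACGTACGT"
sm = mask_intervals(seq, [(2, 5)])             # 'ACgtaCGT'
rm = mask_intervals(seq, [(2, 5)], hard=True)  # 'ACNNNCGT'
assert soft_to_hard(sm) == rm
print(sm, rm, soft_masked_fraction(sm))        # ACgtaCGT ACNNNCGT 0.375
```

Ns can also mark assembly gaps, so counting Ns in hard-masked sequence may overstate repeat content; lower-case bases in soft-masked sequence avoid that ambiguity.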