Repetitive sequences are found throughout genomes. Repeats should be masked before gene annotation, because otherwise they produce large numbers of non-specific hits when sequences are aligned against the genome. Repeats are also useful for studying evolution and for DNA fingerprinting.
Types of repeats
| Repeat type | Description |
|---|---|
| Centromere | The region of a chromosome at which the two sister chromatids are joined during mitosis and meiosis; mostly composed of satellite DNA. |
| Low complexity regions | Poly-purine or poly-pyrimidine stretches, or regions of extremely high AT or GC content. |
| RNA repeats | Non-functional copies of RNA genes that have been reintegrated into the genome with the help of a reverse transcriptase. |
| Satellite repeats | Many tandem copies of the same base sequence. The repeated unit can range in length from a single base to several thousand bases. |
| Simple repeats | Duplications of simple sets of DNA bases (typically 1-5 bp), such as A, CA or CGG. |
| Tandem repeats | Duplications of more complex 100-200 base sequences, typically found at the centromeres and telomeres of chromosomes. |
| LTRs | Long terminal repeats. Retrotransposed elements flanked by long direct repeats at each end, structurally similar to retroviruses. |
| Type I Transposons/LINE | Long Interspersed Elements. Retrotransposed elements containing open reading frames that encode (often inactive) reverse-transcription machinery. |
| Type I Transposons/SINE | Short Interspersed Elements. Retrotransposed elements shorter than 500 bp, derived from small RNA genes such as tRNA, snRNA and rRNA genes. They do not encode their own transposition machinery and rely on other mobile elements (typically LINEs) to be transposed. Alu elements are a type of SINE. |
| Type II Transposons | DNA transposons that have moved and duplicated around the genome by a cut-and-paste mechanism of excision and reinsertion. |
| Unknown | Repeats that cannot be classified. |
Masking the repeats
Repeats can be viewed in the browser and extracted using our APIs. You can also download repeat-masked sequence from our FTP site, either hard-masked (rm) where repeats are replaced with Ns, or soft-masked (sm) where repeats are in lower-case text.
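A soft-masked file keeps the original bases, so a hard-masked sequence can be derived from it, but not the other way round. The Python sketch below shows that conversion on FASTA lines and computes the fraction of bases flagged as repeats. The function names are hypothetical. It assumes plain, already decompressed FASTA text in which header lines start with `>`.

```python
from typing import Iterable, Iterator


def soft_to_hard_mask(seq: str) -> str:
    """Turn soft-masked sequence (repeats in lower case) into hard-masked sequence (repeats as N)."""
    return "".join("N" if base.islower() else base for base in seq)


def hard_mask_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: hard-mask FASTA sequence lines; header lines (starting with '>') pass through."""
    for line in lines:
        yield line if line.startswith(">") else soft_to_hard_mask(line)


def repeat_fraction(soft_masked_seq: str) -> float:
    """Fraction of bases marked as repeat (lower case); upper-case N (e.g. assembly gaps) is not counted."""
    if not soft_masked_seq:
        return 0.0
    return sum(base.islower() for base in soft_masked_seq) / len(soft_masked_seq)


if __name__ == "__main__":
    # Toy FASTA record; a real file would be streamed line by line from disk.
    record = [">1 example", "ACGTacgtNNACgt"]
    print(list(hard_mask_fasta(record)))         # ['>1 example', 'ACGTNNNNNNACNN']
    print(round(repeat_fraction(record[1]), 3))  # 0.429
```

Processing line by line keeps memory use flat, which matters for whole-chromosome files.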