

Multiple genome alignments

Multiple alignments are calculated between groups of genomes. These are used to calculate ancestral sequences, age of base, conservation scores and constrained elements.

Alignments available

| Name | Genomes | Method used |
| --- | --- | --- |
| 21 amniota vertebrates | Anole Lizard, Cat, Chicken, Chimpanzee, Cow, Dog, Gorilla, Horse, Human, Macaque, Marmoset, Mouse, Opossum, Orangutan, Pig, Platypus, Rabbit, Rat, Sheep, Turkey, Zebra Finch | Mercator-Pecan |
| 5 teleost fish | Fugu, Medaka, Stickleback, Tetraodon, Zebrafish | EPO |
| 6 primates | Chimpanzee, Gorilla, Human, Macaque, Marmoset, Orangutan | EPO |
| 4 sauropsids | Anole Lizard, Chicken, Turkey, Zebra Finch | EPO |
| 15 eutherian mammals | Cat, Chimpanzee, Cow, Dog, Gorilla, Horse, Human, Macaque, Marmoset, Mouse, Orangutan, Pig, Rabbit, Rat, Sheep | EPO |
| 37 eutherian mammals | Alpaca, Armadillo, Bushbaby, Cat, Chimpanzee, Cow, Dog, Dolphin, Elephant, Ferret, Gibbon, Gorilla, Guinea Pig, Hedgehog, Horse, Human, Hyrax, Kangaroo rat, Macaque, Marmoset, Megabat, Microbat, Mouse, Mouse Lemur, Orangutan, Panda, Pig, Pika, Rabbit, Rat, Sheep, Shrew, Sloth, Squirrel, Tarsier, Tenrec, Tree Shrew | EPO-Low-Coverage |
| 7 sauropsids | Anole Lizard, Chicken, Chinese softshell turtle, Duck, Flycatcher, Turkey, Zebra Finch | EPO-Low-Coverage |
| 10 teleost fish | Cave fish, Cod, Fugu, Medaka, Platyfish, Spotted gar, Stickleback, Tetraodon, Tilapia, Zebrafish | EPO-Low-Coverage |

Alignment methods

PECAN Multiple Alignment

Pecan is used to provide global multiple genomic alignments. First, Mercator is used to build a synteny map between the genomes and then Pecan builds alignments in these syntenic regions.

Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for large numbers of sequences of practically arbitrary length. As input it takes a set of sequences and a phylogenetic tree. Its parameters and heuristics are highly user-configurable. It is written entirely in Java and requires Exonerate to be installed.
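As a rough illustration of the probabilistic consistency idea, the sketch below re-estimates the probability that residue i of sequence x aligns to residue j of sequence y by passing through a third sequence z. This is the general transformation as it is usually written for consistency-based aligners, not Pecan's actual Java implementation; the function name and the toy probability matrices are assumptions made up for this example, and the full method would average over every available intermediate sequence rather than just one.

```python
def consistency_transform(p_xz, p_zy):
    """Toy consistency step (illustrative only, not Pecan's code).

    p_xz[i][k] is the pairwise match probability of x_i with z_k,
    p_zy[k][j] is the pairwise match probability of z_k with y_j.
    Returns P'(x_i ~ y_j) = sum_k P(x_i ~ z_k) * P(z_k ~ y_j).
    """
    n_k = len(p_zy)
    n_j = len(p_zy[0])
    return [
        [sum(row[k] * p_zy[k][j] for k in range(n_k)) for j in range(n_j)]
        for row in p_xz
    ]


# Invented 2x2 match probabilities, for illustration only.
p_xz = [[0.9, 0.1],
        [0.0, 0.8]]
p_zy = [[1.0, 0.0],
        [0.0, 0.5]]

p_xy = consistency_transform(p_xz, p_zy)
```

In this toy case the pair (x_0, y_0) keeps strong support because both x_0 and y_0 align confidently to z_0, while (x_1, y_1) is supported only as far as the weaker z_1 to y_1 match allows.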

EPO Multiple Alignment

The EPO (Enredo, Pecan, Ortheus) pipeline is a three-step pipeline for whole-genome multiple alignments.

  1. Enredo produces colinear segments from extant genomes, handling rearrangements, deletions and duplications.
  2. Pecan, as described above, is used to align these segments.
  3. Finally, Ortheus is used to create genome-wide ancestral sequence reconstructions.

Further details on these methods can be found at: Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs

EPO-LOW-COVERAGE Multiple Alignment

Due to difficulties with running Ortheus on the fragmented assemblies, we have two flavours of the pipeline.

  1. The plain EPO pipeline is run on the chromosome-level genomes; these alignments are listed as EPO in the table above.
  2. The scaffold-level genomes are then projected onto the EPO alignments using LastZ-net alignments; the results are listed as EPO-Low-Coverage.

By construction, each pair of EPO and EPO-LOW-COVERAGE alignments represents exactly the same alignment of the chromosome-level genomes.
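The relationship between the two flavours can be checked on a toy example: restricting a low-coverage block to the chromosome-level species should give back the corresponding EPO block. The sketch below assumes alignment blocks held as simple species-to-row dictionaries; the sequences, the `restrict_alignment` helper and the gap-only column clean-up are all invented for illustration and are not part of the Ensembl pipeline.

```python
def restrict_alignment(block, keep):
    """Keep only the rows for species in `keep`, then drop gap-only columns.

    `block` maps species name -> aligned row (equal-length strings).
    Dropping gap-only columns is a precaution in this toy; in the data
    below no such column arises.
    """
    rows = {sp: seq for sp, seq in block.items() if sp in keep}
    columns = [col for col in zip(*rows.values())
               if any(base != "-" for base in col)]
    return {sp: "".join(col[i] for col in columns)
            for i, sp in enumerate(rows)}


# Toy blocks, invented for illustration only.
epo_block = {
    "human": "ACGT-ACG",
    "mouse": "ACGTTACG",
}
low_coverage_block = {
    "human":  "ACGT-ACG",
    "mouse":  "ACGTTACG",
    "alpaca": "AC-TTACG",  # scaffold-level genome projected onto the block
}

recovered = restrict_alignment(low_coverage_block, keep={"human", "mouse"})
same_alignment = recovered == epo_block
```

Here `same_alignment` is true because the projected alpaca row is simply laid over the existing columns, leaving the human and mouse rows untouched.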

Progressive Cactus

Progressive Cactus is a next-generation aligner that stores whole-genome alignments in a graph structure. Genomes can be added incrementally, which makes it scalable to hundreds of genomes. Further details on these methods can be found in "Algorithms for genome multiple sequence alignment" and "Cactus graphs for genome comparisons".