Multiple genome alignments
|21 amniota vertebrates||Anole Lizard, Cat, Chicken, Chimpanzee, Cow, Dog, Gorilla, Horse, Human, Macaque, Marmoset, Mouse, Opossum, Orangutan, Pig, Platypus, Rabbit, Rat, Sheep, Turkey, Zebra Finch||Mercator-Pecan|
|5 teleost fish||Fugu, Medaka, Stickleback, Tetraodon, Zebrafish||EPO|
|6 primates||Chimpanzee, Gorilla, Human, Macaque, Marmoset, Orangutan||EPO|
|4 sauropsids||Anole Lizard, Chicken, Turkey, Zebra Finch||EPO|
|15 eutherian mammals||Cat, Chimpanzee, Cow, Dog, Gorilla, Horse, Human, Macaque, Marmoset, Mouse, Orangutan, Pig, Rabbit, Rat, Sheep||EPO|
|37 eutherian mammals||Alpaca, Armadillo, Bushbaby, Cat, Chimpanzee, Cow, Dog, Dolphin, Elephant, Ferret, Gibbon, Gorilla, Guinea Pig, Hedgehog, Horse, Human, Hyrax, Kangaroo rat, Macaque, Marmoset, Megabat, Microbat, Mouse, Mouse Lemur, Orangutan, Panda, Pig, Pika, Rabbit, Rat, Sheep, Shrew, Sloth, Squirrel, Tarsier, Tenrec, Tree Shrew||EPO-Low-Coverage|
|7 sauropsids||Anole Lizard, Chicken, Chinese softshell turtle, Duck, Flycatcher, Turkey, Zebra Finch||EPO-Low-Coverage|
|10 teleost fish||Cave fish, Cod, Fugu, Medaka, Platyfish, Spotted gar, Stickleback, Tetraodon, Tilapia, Zebrafish||EPO-Low-Coverage|
PECAN Multiple Alignment
Pecan is used to provide global multiple genomic alignments. First, Mercator is used to build a synteny map between the genomes and then Pecan builds alignments in these syntenic regions.
Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for large numbers of sequences of practically arbitrary length. As input it takes a set of sequences and a phylogenetic tree. Its parameters and heuristics are highly user-configurable. It is written entirely in Java and requires Exonerate to be installed.
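To make the phrase "probabilistic consistency" concrete, the sketch below applies the textbook consistency transform (as popularised by ProbCons) to toy posterior match probabilities for three sequences. It is an illustration under stated assumptions, not Pecan's implementation: the function names, the dense list-of-lists matrices and the toy numbers are all hypothetical, and Pecan relies on its own heuristics to handle long genomic sequences.

```python
# Toy consistency transform. All names and numbers here are illustrative
# assumptions; this is not code from Pecan.

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*m)]


def consistency_transform(post, x, y, names):
    """Re-estimate match probabilities between sequences x and y.

    post[(a, b)][i][j] is the posterior probability that residue i of a
    aligns to residue j of b. Every other sequence z contributes the
    product P_xz * P_zy; the cases z == x and z == y each contribute
    P_xy itself, because P_xx and P_yy are identity matrices.
    """
    def get(a, b):
        # Matrices are stored for one orientation only.
        return post[(a, b)] if (a, b) in post else transpose(post[(b, a)])

    pxy = get(x, y)
    total = [[2.0 * v for v in row] for row in pxy]
    for z in names:
        if z in (x, y):
            continue
        via = matmul(get(x, z), get(z, y))
        for i, row in enumerate(via):
            for j, v in enumerate(row):
                total[i][j] += v
    return [[v / len(names) for v in row] for row in total]


# x and y have two residues each, z has one.
toy_post = {
    ("x", "y"): [[0.5, 0.0], [0.0, 0.5]],
    ("x", "z"): [[1.0], [0.0]],
    ("y", "z"): [[1.0], [0.0]],
}
print(consistency_transform(toy_post, "x", "y", ["x", "y", "z"]))
```

In this toy case the third sequence supports the match between the first residues of x and y, so that entry rises from 0.5 to 2/3, while the unsupported match between the second residues falls to 1/3.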
EPO Multiple Alignment
The EPO (Enredo, Pecan, Ortheus) pipeline is a three-step pipeline for whole-genome multiple alignments.
- Enredo produces colinear segments from extant genomes, handling rearrangements, deletions and duplications.
- Pecan, as described above, is used to align these segments.
- Finally, Ortheus is used to create genome-wide ancestral sequence reconstructions.
Further details on these methods can be found in the paper "Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs".
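As a rough illustration of what a "colinear segment" is, the sketch below finds maximal runs of shared anchors that stay adjacent and in the same order in two genomes. It is a toy under strong assumptions, not the Enredo algorithm: the function name and anchor labels are hypothetical, only two genomes are compared, strand is ignored, and each anchor is assumed to occur at most once per genome (so the duplications that Enredo handles are out of scope here).

```python
# Toy colinearity check. Names and data are illustrative assumptions.

def colinear_runs(genome_a, genome_b):
    """Return maximal runs of anchors that are consecutive in both genomes.

    Each genome is a list of anchor labels in chromosomal order.
    An anchor missing from genome_b (a deletion) closes the current run.
    """
    pos_b = {anchor: i for i, anchor in enumerate(genome_b)}
    runs, current = [], []
    for anchor in genome_a:
        if anchor not in pos_b:
            # Deleted in genome_b: the run cannot continue through it.
            if current:
                runs.append(current)
                current = []
            continue
        if current and pos_b[anchor] == pos_b[current[-1]] + 1:
            # Same order and adjacent in genome_b: extend the run.
            current.append(anchor)
        else:
            # Order is broken (a rearrangement): start a new run.
            if current:
                runs.append(current)
            current = [anchor]
    if current:
        runs.append(current)
    return runs


# Genome B lost anchor a3 and has the two remaining blocks swapped.
genome_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
genome_b = ["a4", "a5", "a6", "a1", "a2"]
print(colinear_runs(genome_a, genome_b))
```

In the EPO pipeline, segments of this kind are what Pecan then aligns and what Ortheus uses for ancestral reconstruction.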
EPO-LOW-COVERAGE Multiple Alignment
Because Ortheus is difficult to run on fragmented assemblies, the pipeline comes in two flavours.
- The plain EPO pipeline is run on the chromosome-level genomes. These alignments are listed as EPO in the table above.
- The scaffold-level genomes are then projected onto the EPO alignments using LastZ-net alignments. These alignments are listed as EPO-Low-Coverage in the table above.
By construction, each pair of EPO and EPO-LOW-COVERAGE alignments contains exactly the same alignment of the chromosome-level genomes.
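The projection idea can be sketched as follows: a pairwise alignment between a reference genome and a scaffold-level genome is used to add one row to an existing alignment block without touching its columns. This is a minimal sketch under stated assumptions, not the production pipeline: the function name, the dictionary layout and the toy sequences are hypothetical, and bases inserted in the scaffold-level genome relative to the reference are simply dropped.

```python
# Toy projection of a pairwise alignment onto a multiple alignment block.
# Names and sequences are illustrative assumptions.

def project_low_coverage(block, ref, pair_ref, pair_lc, lc_name):
    """Add a low-coverage row to a block, keeping existing rows unchanged.

    block    : dict mapping species name to its gapped row (equal lengths)
    ref      : name of the reference species present in the block
    pair_ref : gapped reference row of the pairwise alignment
    pair_lc  : gapped low-coverage row of the pairwise alignment
    """
    if len(pair_ref) != len(pair_lc):
        raise ValueError("pairwise rows must have equal length")
    ref_row = block[ref]
    if pair_ref.replace("-", "") != ref_row.replace("-", ""):
        raise ValueError("pairwise and block reference sequences differ")

    # Low-coverage character aligned to each ungapped reference position.
    # Columns where the reference has a gap are insertions and are skipped.
    lc_at = [q for r, q in zip(pair_ref, pair_lc) if r != "-"]

    out, pos = [], 0
    for c in ref_row:
        if c == "-":
            out.append("-")          # no reference base in this column
        else:
            out.append(lc_at[pos])   # copy the base aligned to this position
            pos += 1

    projected = dict(block)          # existing rows are copied as-is
    projected[lc_name] = "".join(out)
    return projected


epo_block = {"human": "AC-GT", "mouse": "ACTGT"}
result = project_low_coverage(epo_block, "human", "ACG-T", "A-GCT", "alpaca")
for name, row in result.items():
    print(name, row)
```

Because the existing rows are copied unchanged, the chromosome-level part of the projected block is identical to the original block, which mirrors the property stated above.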
Progressive-Cactus Multiple Alignment
Progressive-Cactus is a next-generation aligner that stores whole-genome alignments in a graph structure. Genomes can be added incrementally, which lets it scale to hundreds of genomes. Further details on these methods can be found in "Algorithms for genome multiple sequence alignment" and "Cactus graphs for genome comparisons".