Variation Pattern Finder

The Variation Pattern Finder lets you look for patterns of shared variation between individuals in the same VCF file. Within any specified chromosomal region, different samples will carry different combinations of variations. The finder identifies the distinct variation combinations within the region, as well as the individuals associated with each combination. It considers only variations that change protein-coding sequences, such as missense variants and splice site changes.
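
As a rough illustration of the idea (not the tool's actual implementation), grouping samples by their combination of genotypes can be sketched in Python. The sample names and genotypes are made up, the function name `find_patterns` is hypothetical, and frequency is assumed here to be the fraction of samples sharing a pattern.

```python
from collections import defaultdict

def find_patterns(genotypes_by_sample):
    """Group samples by their combination of genotypes across a region.

    genotypes_by_sample maps a sample name to a tuple of genotype strings,
    one per variant site, in the same site order for every sample.
    Returns a list of (pattern, frequency, samples), most frequent first.
    """
    groups = defaultdict(list)
    for sample, pattern in genotypes_by_sample.items():
        groups[tuple(pattern)].append(sample)

    total = len(genotypes_by_sample)
    results = [
        (pattern, len(samples) / total, sorted(samples))
        for pattern, samples in groups.items()
    ]
    # Most common pattern first; ties are broken by pattern for stable output.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results

# Hypothetical genotypes at two sites for four made-up samples.
example = {
    "sampleA": ("0|0", "0|1"),
    "sampleB": ("0|0", "0|1"),
    "sampleC": ("1|1", "0|0"),
    "sampleD": ("0|0", "0|1"),
}
for pattern, freq, samples in find_patterns(example):
    print(pattern, freq, samples)
```

Each distinct tuple of genotypes is one pattern; the samples that share it are listed alongside the fraction of all samples they represent.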

The VCF format is a tab-delimited format for presenting variation sites and genotype data, and is described at http://vcftools.sourceforge.net/specs.html. This tool accepts both VCF 4.0 and VCF 4.1 files.
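
A VCF file declares its version on its first line, in the form `##fileformat=VCFv4.1`. The following minimal Python sketch checks that line against the two versions named above. The function names are hypothetical, and the tool's own validation may behave differently.

```python
# Versions this page says the tool accepts.
SUPPORTED_VERSIONS = {"VCFv4.0", "VCFv4.1"}

def vcf_version(first_line):
    """Return the version string from a '##fileformat=...' line, or None."""
    prefix = "##fileformat="
    line = first_line.strip()
    if not line.startswith(prefix):
        return None
    return line[len(prefix):]

def is_supported(first_line):
    """True if the header line declares VCF 4.0 or VCF 4.1."""
    return vcf_version(first_line) in SUPPORTED_VERSIONS

print(is_supported("##fileformat=VCFv4.1"))
```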

The Variation Pattern Finder is currently available for GRCh37 only. You can access it from the tools link in the menu bar at the top of every page.

First, the form gives you the option of naming your job.

To select the region you want to slice out, type the chromosome and coordinates in the form chromosome:start-end, e.g. 1:1-50000.
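
If you prepare region strings in a script before pasting them into the form, a small parser can catch malformed input early. This is a sketch only: `parse_region` is a hypothetical helper, and the form's own validation rules are not documented here.

```python
import re

# chromosome:start-end, e.g. 1:1-50000
REGION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")

def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end) with integer coordinates."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"not in chromosome:start-end form: {region!r}")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    return chrom, start, end

print(parse_region("1:1-50000"))
```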

You can now select 1000 Genomes Phase 3 or 1000 Genomes Phase 1, or provide the URL of any publicly visible VCF file (over http or ftp). Each URL must be accompanied by a tabix index (.tbi) of the same name. For more information about creating tabix indexes, see “Tabix: fast retrieval of sequence features from generic TAB-delimited files”. All 1000 Genomes VCF files on the FTP site are accompanied by these indexes.
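
By tabix convention, the index sits next to the data file and has the same name with `.tbi` appended. The Python sketch below derives the expected index URL so you can confirm it exists before submitting a job. The function name and the example.org URL are hypothetical, and only the two schemes named above are accepted.

```python
from urllib.parse import urlsplit

def index_url(vcf_url):
    """Return the URL where the matching tabix index is expected to live."""
    parts = urlsplit(vcf_url)
    if parts.scheme not in ("http", "ftp"):
        raise ValueError(f"expected an http or ftp URL: {vcf_url!r}")
    # Append .tbi to the path so any query string is left untouched.
    return parts._replace(path=parts.path + ".tbi").geturl()

print(index_url("http://example.org/data/calls.vcf.gz"))
```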

After you click Next, the system produces your final file.

The results file will have the following sections:

1. Variation Header:  

  • line 1: the variation's rs number and the reference allele for the site, separated by “:”. When no rs number is available, the chromosomal position of the site is given instead.
  • line 2: the chromosome and chromosomal position of the variation, separated by “:”.
  • line 3 onwards: the functional consequences of the variation on each specified transcript, one transcript per line. For non-synonymous coding variations, the amino acid change is also displayed.


2. Freq column: the frequency of the given variant genotype combination in the file.


3. Sample panel: displays the first two samples from each population that have this pattern of variation. The heading shows which population each sample group is from.


4. Genotype Panel: the individual genotypes as given by the VCF file. If the delimiter symbol is |, the genotype is phased; otherwise it is unphased. “./.” in the expanded view represents a site with no genotype data. “-” in the collapsed view represents a genotype that is either homozygous reference or has no data.
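
The genotype conventions in the Genotype Panel can be sketched in Python as follows. This assumes the standard VCF GT encoding, in which alleles are numeric indices and 0 is the reference allele; the results file itself may display genotypes differently. Both function names are hypothetical, and leaving other genotypes unchanged in the collapsed form is an assumption.

```python
def is_phased(genotype):
    """A '|' delimiter marks a phased genotype; anything else is unphased."""
    return "|" in genotype

def collapsed(genotype):
    """Return '-' for homozygous reference or missing data, as in the collapsed view."""
    alleles = genotype.replace("|", "/").split("/")
    no_data = all(a == "." for a in alleles)
    hom_ref = all(a == "0" for a in alleles)
    if no_data or hom_ref:
        return "-"
    # Assumption: any other genotype is shown unchanged.
    return genotype

for gt in ("0|1", "0/1", "0|0", "./."):
    print(gt, is_phased(gt), collapsed(gt))
```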