
Variant Recoder


Variant Recoder is a tool for translating between different variant names. It accepts as input HGVS descriptions and variant identifiers from databases such as dbSNP, ClinVar, UniProt and PharmGKB (see the full list of accession databases).

Some forms of ambiguous or incorrect HGVS descriptions are supported. These include descriptions that state only the gene name and protein change (e.g. AGT:p.Met268Thr), which are often seen in publications.

For each input variant, variant_recoder reports the variant identifiers held in Ensembl. It also reports HGVS genomic, transcript and protein descriptions on Ensembl, RefSeq and LRG sequences.


REST API

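Ensembl provides a REST API for Variant Recoder with 2 endpoints:
  • GET variant_recoder/:species/:id, for a single variant
  • POST variant_recoder/:species, for a list of variants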
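Any HTTP client can call the two endpoints. The following is a minimal Python sketch. It assumes the public server at https://rest.ensembl.org and a POST body of the form {"ids": [...]}. The helper names recoder_get_url and recoder_post_request are hypothetical. The sketch only builds the requests, so you can try it without network access. HGVS strings contain ':' and '>', so they are percent-encoded in the URL path.

```python
import json
from urllib.parse import quote

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def recoder_get_url(species: str, variant_id: str) -> str:
    """URL for GET variant_recoder/:species/:id, asking for JSON output."""
    # safe='' also encodes ':' and '>' found in HGVS descriptions.
    path = f"/variant_recoder/{quote(species, safe='')}/{quote(variant_id, safe='')}"
    return REST_SERVER + path + "?content-type=application/json"


def recoder_post_request(species: str, variant_ids: list[str]) -> tuple[str, bytes]:
    """URL and JSON body for POST variant_recoder/:species (several IDs at once)."""
    url = f"{REST_SERVER}/variant_recoder/{quote(species, safe='')}"
    body = json.dumps({"ids": variant_ids}).encode("utf-8")
    return url, body


# Sending the POST request (needs network access):
# from urllib.request import Request, urlopen
# url, body = recoder_post_request("homo_sapiens", ["rs699", "rs1042779"])
# req = Request(url, data=body, headers={"Content-Type": "application/json",
#                                        "Accept": "application/json"})
# with urlopen(req) as resp:
#     results = json.load(resp)
```

Use the GET form for a one-off lookup. The POST form sends a batch of variants in one request.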


Download and install

Variant Recoder is part of the VEP package.

Please follow the VEP download and installation instructions.

Note

Variant Recoder relies on database access for identifier lookup and, unlike VEP, cannot run in offline mode. For this reason, you don't need to download the VEP cache files that the installer offers during installation.


Usage

Variant Recoder relies on database access for identifier lookup. Unlike VEP, it cannot be run in offline mode.

Output is written as JSON, so the JSON Perl module must be installed.

# Running on one ID, as a string:
./variant_recoder --id [input_data_string]

# Running on several IDs, in a text file:
./variant_recoder -i [input_file] --species [species]
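When a script drives variant_recoder, passing the arguments as a list avoids shell quoting problems: an unquoted '>' in an HGVS string would otherwise be read as a redirect. The following is a minimal Python sketch that uses only the flags documented on this page. build_recoder_command and run_recoder are hypothetical helper names. Running the command assumes a working VEP installation with database access.

```python
import subprocess
from typing import Optional, Sequence


def build_recoder_command(
    variant_id: Optional[str] = None,
    input_file: Optional[str] = None,
    species: str = "homo_sapiens",
    pretty: bool = False,
    fields: Optional[Sequence[str]] = None,
    script: str = "./variant_recoder",
) -> list[str]:
    """Assemble a variant_recoder argument list; exactly one of --id / -i is used."""
    if (variant_id is None) == (input_file is None):
        raise ValueError("give exactly one of variant_id or input_file")
    cmd = [script]
    if variant_id is not None:
        cmd += ["--id", variant_id]  # passed verbatim, no shell quoting needed
    else:
        cmd += ["-i", input_file]
    cmd += ["--species", species]
    if pretty:
        cmd.append("--pretty")
    if fields:
        cmd += ["--fields", ",".join(fields)]  # e.g. id,hgvsc
    return cmd


def run_recoder(cmd: list[str]) -> str:
    """Run the command without a shell and return its JSON output as text."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, run_recoder(build_recoder_command(variant_id="ENST00000366667.4:c.803T>C")) runs the single-ID form shown above.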

Like VEP, Variant Recoder accepts VCF, variant identifiers and HGVS notations as input, in addition to the VEP default format.


Output

Output is a JSON array of objects, one per input variant, with the following keys:
  • input: input string
  • id: variant identifiers
  • hgvsg: HGVS genomic nomenclature
  • hgvsc: HGVS transcript nomenclature
  • hgvsp: HGVS protein nomenclature
  • warnings: warnings generated, e.g. for invalid HGVS descriptions
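The following is a minimal Python sketch for loading this output into typed records. It assumes the object layout listed above. It also assumes that keys excluded with --fields are absent from each object rather than set to null. RecodedVariant and parse_recoder_output are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RecodedVariant:
    """One object from the variant_recoder JSON array."""
    input: str
    ids: list[str] = field(default_factory=list)
    hgvsg: list[str] = field(default_factory=list)
    hgvsc: list[str] = field(default_factory=list)
    hgvsp: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def parse_recoder_output(text: str) -> list[RecodedVariant]:
    """Parse the JSON array; missing keys (e.g. dropped by --fields) become []."""
    records = []
    for obj in json.loads(text):
        records.append(
            RecodedVariant(
                input=obj["input"],
                ids=obj.get("id", []),
                hgvsg=obj.get("hgvsg", []),
                hgvsc=obj.get("hgvsc", []),
                hgvsp=obj.get("hgvsp", []),
                warnings=obj.get("warnings", []),
            )
        )
    return records
```

Filling absent keys with empty lists lets downstream code treat full and --fields-limited output the same way.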

Tips

Use --pretty to pre-format and indent JSON output.

Example of output, with the --pretty flag:

./variant_recoder --id "AGT:p.Met268Thr" --pretty

[
   {
      "input" : "AGT:p.Met268Thr",
      "id" : [
         "rs699",
         "CM920010"
      ],
      "hgvsg" : [
         "NC_000001.11:g.230710048A>G"
      ],
      "hgvsc" : [
         "ENST00000366667.4:c.803T>C",
         "NM_000029.3:c.803T>C"
      ],
      "hgvsp" : [
         "ENSP00000355627.4:p.Met268Thr",
         "NP_000020.1:p.Met268Thr"
      ],
      "warnings" : [
         "Possible invalid use of gene name 'AGT' as HGVS reference; AGT:p.Met268Thr may resolve to multiple genomic locations"
      ]
   }
]
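The same change is reported on several reference sequences, so it can help to split the hgvsg, hgvsc and hgvsp lists by source. The following sketch classifies each description by the accession before the first ':'. It treats ENS-prefixed IDs as Ensembl, prefixes such as NC_, NM_ and NP_ as RefSeq, and LRG_ as LRG. These patterns are an assumption based on the usual accession formats and are not exhaustive. reference_source and group_by_source are hypothetical helpers.

```python
import re

# Assumed accession prefix patterns (not an exhaustive list).
SOURCE_PATTERNS = [
    ("Ensembl", re.compile(r"^ENS[A-Z]*[GTP]\d")),  # ENST, ENSP, ENSMUST, ...
    ("RefSeq", re.compile(r"^[NX][CGMRP]_\d")),     # NC_, NM_, NP_, XM_, ...
    ("LRG", re.compile(r"^LRG_\d")),
]


def reference_source(hgvs: str) -> str:
    """Classify an HGVS description by its reference accession."""
    accession = hgvs.split(":", 1)[0]
    for source, pattern in SOURCE_PATTERNS:
        if pattern.match(accession):
            return source
    return "other"  # e.g. a gene symbol such as AGT


def group_by_source(descriptions: list[str]) -> dict[str, list[str]]:
    """Group descriptions by source, keeping their original order."""
    grouped: dict[str, list[str]] = {}
    for hgvs in descriptions:
        grouped.setdefault(reference_source(hgvs), []).append(hgvs)
    return grouped
```

Applied to the hgvsc list above, this separates the ENST00000366667.4 description from the NM_000029.3 one.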

Options

Variant Recoder shares many command line flags with VEP.

However, the following flags are specific to variant_recoder:

--input_data [input_string] (alternate: --id)
    A single variant as a string, e.g.:
        --id rs699
        --id "ENST00000366667.4:c.803T>C"
        --id "1  230710048  rs699  A  G"
        --id "1  230710048   230710048  A/G  +"

--input_file [input_file] (alternate: -i)
    Input file containing one or more variants, one per line. All lines must use the same format; mixed formats are not allowed. Supported formats are the VEP default format, VCF, variant identifiers and HGVS notations.

--species
    Species to use. Default value: homo_sapiens

--grch37
    Use the GRCh37 assembly instead of GRCh38.

--genomes
    Set database parameters for Ensembl Genomes species.

--pretty
    Write pre-formatted, indented JSON.

--fields [field1,field2]
    Limit the output fields. Comma-separated list of one or more of: id, hgvsg, hgvsc, hgvsp, e.g.:
        ./variant_recoder --id "AGT:p.Met268Thr" --fields id,hgvsc

--host [db_host]
    Change the database host from the default, ensembldb.ensembl.org (UK). Geographic mirrors are useastdb.ensembl.org (US East Coast) and asiadb.ensembl.org (Asia). Other connection flags such as --user, --port and --pass may also be set.

--pick, --per_gene, --pick_allele, --pick_allele_gene, --pick_order
    Set and customise the transcript selection process; see the VEP documentation.
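An input file must use a single format, so a quick pre-check can catch mixed files before a run. This is a heuristic Python sketch, not part of VEP. guess_format and check_single_format are hypothetical helpers. They tell apart the formats used in the --id examples above by column count and by the '/' in the allele column of the VEP default format. Blank lines and lines starting with '#' (such as VCF headers) are skipped.

```python
import re

# HGVS-like token: reference, ':', a coordinate-type letter and '.', then the change.
HGVS_RE = re.compile(r"^\S+:[cgmnpr]\.\S+$")


def guess_format(line: str) -> str:
    """Heuristically classify one input line (hypothetical helper)."""
    cols = line.split()
    if len(cols) == 1:
        return "hgvs" if HGVS_RE.match(cols[0]) else "id"
    if len(cols) >= 5 and "/" in cols[3]:
        return "vep_default"  # chr start end alleles strand, e.g. A/G
    if len(cols) >= 5:
        return "vcf"          # chrom pos id ref alt ...
    return "unknown"


def check_single_format(lines: list[str]) -> str:
    """Return the one format used by all data lines; raise ValueError if mixed."""
    formats = {
        guess_format(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    }
    if len(formats) != 1:
        raise ValueError(f"expected exactly one input format, found {sorted(formats)}")
    return formats.pop()
```

A file that mixes rs699 with AGT:p.Met268Thr is rejected, because identifiers and HGVS notations count as different formats.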