BioMart Perl API

The BioMart Perl API allows you to go a step further with BioMart by integrating BioMart Perl code into your own Perl scripts.

Summary

  1. Obtaining the BioMart Perl script from the BioMart website
  2. Download the BioMart Perl API
  3. BioMart Perl API set up

Obtaining the BioMart Perl script from the BioMart website

You can obtain a BioMart Perl script directly from the BioMart interface. For example, navigate to the Ensembl gene mart on the Ensembl website, apply your required filters and select the attributes you are interested in. In the example below, the query filters on the human Ensembl Gene ID "ENSG00000139618" and selects the Ensembl Gene and Transcript IDs, HGNC IDs and symbols, and UniProt Swissprot accessions. The Perl script can be downloaded from the BioMart results page, which is accessible via the "Results" button. To get your BioMart query in Perl, click the "Perl" button, as indicated by the red box in the image below:

The "Perl" button opens a new browser window that displays the BioMart query in Perl. The text will be similar to the following image:

Save the content of this page to a new file on your computer, for example 'hgnc_swissprot.pl'.

Download the BioMart Perl API

To run the BioMart Perl script, you will need to download the BioMart Perl API from our GitHub repository (complete documentation can be found on the biomart.org website). To do this, run the command below:

git clone --branch cvs/release-0_7 https://github.com/biomart/biomart-perl

BioMart Perl API set up

By default, the biomart-perl API points to the biomart.org website. This can be changed in the "biomart-perl/conf/martURLLocation.xml" file. The following URL will give you the Ensembl.org mart registry information, for example in release 94:

<MartRegistry>
  <MartURLLocation database="ensembl_mart_94" default="1" displayName="Ensembl Genes 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ENSEMBL" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="mouse_mart_94" default="" displayName="Mouse strains 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_MOUSE" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="sequence_mart_94" default="" displayName="Sequence" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_SEQUENCE" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="ontology_mart_94" default="" displayName="Ontology" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ONTOLOGY" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="genomic_features_mart_94" default="" displayName="Genomic features 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_GENOMIC" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="snp_mart_94" default="" displayName="Ensembl Variation 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_SNP" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="regulation_mart_94" default="" displayName="Ensembl Regulation 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_FUNCGEN" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
</MartRegistry>

Paste the text obtained from that page (similar to the example above) into the biomart-perl/conf/martURLLocation.xml file.
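After pasting, it can be useful to check which marts the registry file actually declares. The following is a minimal sketch, not part of the BioMart distribution: the function name list_marts is hypothetical, and it assumes that each MartURLLocation element sits on a single line, as in the example registry.

```shell
# Hypothetical helper: print "name<TAB>displayName" for every mart declared
# in a registry file. Assumes one <MartURLLocation ... /> element per line.
list_marts() {
  awk '/<MartURLLocation / {
    name = ""; label = ""
    # The leading space keeps name= from matching inside displayName=.
    if (match($0, / name="[^"]*"/))        name  = substr($0, RSTART + 7, RLENGTH - 8)
    if (match($0, / displayName="[^"]*"/)) label = substr($0, RSTART + 14, RLENGTH - 15)
    print name "\t" label
  }' "$1"
}
```

For example, running list_marts biomart-perl/conf/martURLLocation.xml should print one line per mart, which makes a truncated or badly pasted registry easy to spot.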

Add the biomart-perl library to your PERL5LIB environment variable:

For example, in bash:
PERL5LIB=${PERL5LIB}:${HOME}/biomart-perl/lib
export PERL5LIB
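If this setup is sourced more than once, for example from a shell profile, the same directory ends up in PERL5LIB several times. It also leaves an empty leading entry when the variable was previously unset. The sketch below shows one way to avoid both; the function name add_to_perl5lib is hypothetical and only illustrates the idea.

```shell
# Hypothetical helper: append a directory to PERL5LIB only if it is not
# already listed, without leaving an empty entry when PERL5LIB is unset.
add_to_perl5lib() {
  dir=$1
  case ":${PERL5LIB:-}:" in
    *":${dir}:"*) ;;                                   # already present
    *) PERL5LIB="${PERL5LIB:+${PERL5LIB}:}${dir}" ;;   # append
  esac
  export PERL5LIB
}
```

It would be called as add_to_perl5lib "${HOME}/biomart-perl/lib". Assuming the library provides a BioMart::Initializer module, as the generated scripts typically expect, running perl -MBioMart::Initializer -e 1 afterwards is a quick way to confirm that Perl can find it.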

Copy the Perl file previously obtained from the Ensembl website to "biomart-perl/scripts" and make the following modifications:

  1. Set the "$confFile" variable to the path of your registry file (e.g. biomart-perl/conf/martURLLocation.xml), as indicated by the number 1 in the image below.

  2. Set the "$action" variable to "clean", as indicated by the number 2 in the image below. The variable needs to be set to "clean" every time the "biomart-perl/conf/martURLLocation.xml" file is updated, because some data is cached on your computer.
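The two modifications above can also be applied from the command line. This is a sketch under stated assumptions rather than a supported tool: the function name set_biomart_vars is hypothetical, and it assumes that the downloaded script assigns each variable on a single line beginning with "my $confFile" or "my $action". The registry path must not contain the characters | or &.

```shell
# Hypothetical helper: rewrite the $confFile and $action assignments in a
# downloaded BioMart script. Assumes each assignment is on one line that
# starts with "my $confFile" or "my $action".
set_biomart_vars() {
  script=$1 conf=$2 action=$3
  tmp=$(mktemp) || return 1
  # Edit into a temporary file, then write back so the script keeps its permissions.
  sed -e "s|^\(my \$confFile *= *\).*|\1\"${conf}\";|" \
      -e "s|^\(my \$action *= *\).*|\1'${action}';|" \
      "$script" > "$tmp" && cat "$tmp" > "$script"
  status=$?
  rm -f "$tmp"
  return $status
}
```

For example, from the biomart-perl/scripts directory, set_biomart_vars hgnc_swissprot.pl ../conf/martURLLocation.xml clean would point the script at the registry file and force a fresh configuration.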

Run the Perl script to get the data from BioMart:

perl hgnc_swissprot.pl


Connection parameters of [Ensembl Genes 94][ OK ]
Connection parameters of [Mouse strains 94][ OK ]
Connection parameters of [Sequence][ OK ]
Connection parameters of [Ontology][ OK ]
Connection parameters of [Genomic features 94][ OK ]
Connection parameters of [Ensembl Variation 94][ OK ]
Connection parameters of [Ensembl Regulation 94][ OK ]


...
The script will print some output before you get the result
...


Ensembl Gene ID  Ensembl Transcript ID  HGNC symbol  UniProt/SwissProt Accession
ENSG00000139618  ENST00000380152        BRCA2        P51587
ENSG00000139618  ENST00000528762        BRCA2
ENSG00000139618  ENST00000470094        BRCA2
ENSG00000139618  ENST00000544455        BRCA2        P51587
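The result table can be piped straight into other command-line tools. As a minimal sketch, the hypothetical function below keeps only the transcripts that have a UniProt/SwissProt accession. It assumes that the script writes tab-separated columns in the order shown above, and it skips the header and any status lines by requiring the first column to start with "ENSG".

```shell
# Hypothetical helper: read the script's output on standard input and print
# "transcript<TAB>accession" for rows that carry a UniProt/SwissProt accession.
# Assumes tab-separated columns: gene ID, transcript ID, HGNC symbol, accession.
swissprot_transcripts() {
  awk -F '\t' '$1 ~ /^ENSG/ && $4 != "" { print $2 "\t" $4 }'
}
```

It would be used as perl hgnc_swissprot.pl | swissprot_transcripts.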

IMPORTANT: the first run of the script will take some time, since BioMart will cache some data on your computer.

After the first run, you can change the "$action" variable from "clean" to "cached". Subsequent runs will be faster and you should get the following output:

perl hgnc_swissprot.pl

Processing Cached Registry: ../conf/cachedRegistries/martURLLocation.xml.cached

Ensembl Gene ID  Ensembl Transcript ID  HGNC symbol  UniProt/SwissProt Accession
ENSG00000139618  ENST00000380152        BRCA2        P51587
ENSG00000139618  ENST00000528762        BRCA2
ENSG00000139618  ENST00000470094        BRCA2
ENSG00000139618  ENST00000544455        BRCA2        P51587
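
Whether to use "clean" or "cached" can be decided by comparing the registry file with its cached copy. The sketch below is an illustration only: the function name choose_action is hypothetical, and the cache location (a cachedRegistries directory next to the registry file, with ".cached" appended to the file name) is inferred from the "Processing Cached Registry" message above rather than from the API documentation.

```shell
# Hypothetical helper: print "cached" when a cached registry exists and is
# newer than the registry file, otherwise print "clean".
# Assumes the cache lives at <conf dir>/cachedRegistries/<registry name>.cached.
choose_action() {
  registry=$1
  cache="$(dirname "$registry")/cachedRegistries/$(basename "$registry").cached"
  if [ -e "$cache" ] && [ "$cache" -nt "$registry" ]; then
    echo cached
  else
    echo clean
  fi
}
```

For example, choose_action ../conf/martURLLocation.xml prints "clean" when the registry file has been edited since the last run, which is the case where the "$action" variable needs to be reset.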