BioMart Perl API
The BioMart Perl API allows you to go a step further with BioMart and integrate BioMart Perl code into your own custom Perl scripts.
Summary
- Obtaining the BioMart Perl script from the BioMart website
- Download the BioMart PERL API
- BioMart PERL API set up
Obtaining the BioMart Perl script from the BioMart website
You can easily obtain a BioMart Perl script from the BioMart interface. For example, navigate to the Ensembl gene mart on the Ensembl website, apply your required filters and select the attributes you are interested in. In the example below, we filter on the human Ensembl Gene ID "ENSG00000139618" and select the Ensembl Gene ID, Ensembl Transcript ID, HGNC ID and symbol, and UniProt/Swiss-Prot accession attributes. The BioMart Perl script can be downloaded from the BioMart results page, which is accessible via the "Results" button. To get your BioMart query in Perl, click on the "Perl" button, as indicated by the red box in the image below:
The Perl button will open a new browser window and display the BioMart query in Perl. The text will be similar to the following image:
Save the content of this page in a new file on your computer, e.g. 'hgnc_swissprot.pl' in our example.
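For orientation, here is a minimal sketch of the kind of script the "Perl" button produces for this query. It is not a copy of the generated file: the dataset, filter and attribute names are assumptions that may differ between Ensembl releases, and the downloaded script may ship with a placeholder instead of a real registry path. It needs the BioMart Perl API and a registry file, both covered in the sections below.

```perl
use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# Registry file and cache mode. The relative path assumes the script is run
# from biomart-perl/scripts; adjust both values as described in the set up section.
my $confFile = "../conf/martURLLocation.xml";
my $action   = 'clean';

# Load the registry (list of marts) described by the configuration file.
my $initializer = BioMart::Initializer->new(
    'registryFile' => $confFile,
    'action'       => $action,
);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new(
    'registry'          => $registry,
    'virtualSchemaName' => 'default',
);

# Assumed internal names: prefer the ones found in your downloaded script.
$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("ensembl_gene_id", ["ENSG00000139618"]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("hgnc_symbol");
$query->addAttribute("uniprotswissprot");

# Tab-separated output, one row per transcript.
$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
```

If your downloaded script differs from this sketch, trust the downloaded script: it reflects the exact query you built in the interface.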
Download the BioMart PERL API
To run the BioMart Perl script, you will need to download the BioMart Perl API from our GitHub repository (complete documentation can be found on the biomart.org website). To do this, run the following command:
git clone --branch cvs/release-0_7 https://github.com/biomart/biomart-perl
BioMart PERL API set up
By default, the biomart-perl API points to the biomart.org website; this can be changed in the "biomart-perl/conf/martURLLocation.xml" file. The following URL will give you the Ensembl.org Mart registry information. For example, in release 94:
<MartRegistry>
  <MartURLLocation database="ensembl_mart_94" default="1" displayName="Ensembl Genes 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ENSEMBL" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="mouse_mart_94" default="" displayName="Mouse strains 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_MOUSE" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="sequence_mart_94" default="" displayName="Sequence" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_SEQUENCE" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="ontology_mart_94" default="" displayName="Ontology" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ONTOLOGY" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="genomic_features_mart_94" default="" displayName="Genomic features 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_GENOMIC" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="snp_mart_94" default="" displayName="Ensembl Variation 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_SNP" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="regulation_mart_94" default="" displayName="Ensembl Regulation 94" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_FUNCGEN" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
</MartRegistry>
Paste the registry text returned by that URL into the biomart-perl/conf/martURLLocation.xml file.
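As a quick sanity check after pasting, the sketch below lists the marts declared in the registry file. It uses only core Perl. The helper name list_marts and the default file path are illustrative assumptions, and the regular expression only handles the flat, attribute-only layout of the registry shown above, not XML in general.

```perl
use strict;
use warnings;

# Hypothetical helper: return one hash reference of attributes per
# <MartURLLocation> element found in a registry XML string.
sub list_marts {
    my ($xml) = @_;
    my @marts;
    while ($xml =~ /<MartURLLocation\b([^>]*)>/g) {
        my $attrs = $1;
        # Collect attribute="value" pairs into a hash.
        my %attr = $attrs =~ /([\w:-]+)="([^"]*)"/g;
        push @marts, \%attr;
    }
    return @marts;
}

# Assumed location, relative to the directory you run this from.
my $file = @ARGV ? $ARGV[0] : 'biomart-perl/conf/martURLLocation.xml';

if (open my $fh, '<', $file) {
    my $xml = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    for my $mart (list_marts($xml)) {
        printf "%-24s %s\n", $mart->{name} // '?', $mart->{displayName} // '?';
    }
}
else {
    warn "Cannot open $file: $!\n";
}
```

An empty listing usually means the file is empty or the pasted text was not the registry XML.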
Add the biomart-perl library directory to your PERL5LIB environment variable.
E.g. in bash:
PERL5LIB=${PERL5LIB}:${HOME}/biomart-perl/lib
export PERL5LIB
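To confirm that the change took effect, a small check like the one below can help. It is a sketch using core Perl only: it searches @INC (which includes the PERL5LIB directories) for the three BioMart modules that query scripts normally load. The helper name find_module is an illustrative assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return the path of the first file in @INC that
# provides the given module, or undef if none is found.
sub find_module {
    my ($module) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;   # BioMart::Query -> BioMart/Query.pm
    for my $dir (grep { !ref } @INC) {       # skip code-reference hooks in @INC
        my $path = "$dir/$rel";
        return $path if -f $path;
    }
    return;
}

for my $module (qw(BioMart::Initializer BioMart::Query BioMart::QueryRunner)) {
    my $path = find_module($module);
    print defined $path
        ? "found   $module at $path\n"
        : "missing $module (check PERL5LIB)\n";
}
```

Run it in the same shell session in which you exported PERL5LIB; a "missing" line suggests the variable does not point at biomart-perl/lib.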
Copy the Perl file previously obtained from the Ensembl website to "biomart-perl/scripts" and make the following modifications:
Set the "$confFile" variable to the path of your registry file (e.g. biomart-perl/conf/martURLLocation.xml), as indicated by the number 1 in the image below.
Set the "$action" variable to "clean", as indicated by the number 2 in the image below. The variable needs to be set to "clean" every time "biomart-perl/conf/martURLLocation.xml" is updated, because some registry data is cached on your computer.
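After both modifications, the two lines could look like the sketch below. The relative path is an assumption: it only resolves if the script is run from the biomart-perl/scripts directory, so use an absolute path if you run it from elsewhere. The final check is an optional addition, not part of the downloaded script.

```perl
use strict;
use warnings;

# 1. Path to the registry file (assumed relative to biomart-perl/scripts).
my $confFile = "../conf/martURLLocation.xml";

# 2. 'clean' rebuilds the local cache from the registry file. Use it for the
#    first run and after every change to martURLLocation.xml.
my $action = 'clean';

# Optional: fail early with a readable message if the path is wrong.
warn "Registry file not found: $confFile\n" unless -e $confFile;
```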
Run the Perl script to get the data from BioMart:
perl hgnc_swissprot.pl
Connection parameters of [Ensembl Genes 94][ OK ]
Connection parameters of [Mouse strains 94][ OK ]
Connection parameters of [Sequence][ OK ]
Connection parameters of [Ontology][ OK ]
Connection parameters of [Genomic features 94][ OK ]
Connection parameters of [Ensembl Variation 94][ OK ]
Connection parameters of [Ensembl Regulation 94][ OK ]
... The script will print some output before you get the result ...
Ensembl Gene ID    Ensembl Transcript ID    HGNC symbol    UniProt/SwissProt Accession
ENSG00000139618    ENST00000380152          BRCA2          P51587
ENSG00000139618    ENST00000528762          BRCA2
ENSG00000139618    ENST00000470094          BRCA2
ENSG00000139618    ENST00000544455          BRCA2          P51587
IMPORTANT: the first run of the script will take some time, since BioMart will cache some data on your computer.
After the first run, you can change the "$action" variable from "clean" to "cached". Subsequent runs will be faster and you should get the following output:
perl hgnc_swissprot.pl
Processing Cached Registry: ../conf/cachedRegistries/martURLLocation.xml.cached
Ensembl Gene ID    Ensembl Transcript ID    HGNC symbol    UniProt/SwissProt Accession
ENSG00000139618    ENST00000380152          BRCA2          P51587
ENSG00000139618    ENST00000528762          BRCA2
ENSG00000139618    ENST00000470094          BRCA2
ENSG00000139618    ENST00000544455          BRCA2          P51587