BiomaRt, Bioconductor R package

The Bioconductor BiomaRt R package is a quick, easy and powerful way to access BioMart right from your R software terminal.

The following documention is using R 2.2 and Bioconductor version 3.1.

Summary

  1. How to install the Bioconductor BiomaRt R package
  2. Bioconductor BiomaRt R package documentation
  3. Bioconductor BiomaRt R examples with the Ensembl Gene mart

How to install the Bioconductor BiomaRt R package

First make sure you have installed the R software on your computer.

Then, run the following commands to install the Bioconductor BiomaRt R package:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt")

Bioconductor BiomaRt R package documentation

More information regarding the Bioconductor BiomaRt, R package and documentation can be found on the BiomaRt Bioconductor page.

Bioconductor BiomaRt R examples with the Ensembl Gene mart

listEnsembl & listDatasets

To get the list of all the Ensembl mart availables on the ensembl.org website, run the "listEnsembl" function:

> library(biomaRt)

> listEnsembl()

     biomart               version
1    ensembl               Ensembl Genes 79
2        snp               Ensembl Variation 79
3 regulation               Ensembl Regulation 79

You can give an Ensembl archive version as a parameter to get the list of archived Ensembl marts, for example for the Ensembl GRCh37 or release 78 marts:

> listEnsembl("GRCh=37")

     biomart            version
1    ensembl            Ensembl Genes
2        snp            Ensembl Variation
3 regulation            Ensembl Regulation

> listEnsembl(version=78)

     biomart               version
1    ensembl               Ensembl Genes 78
2        snp               Ensembl Variation 78
3 regulation               Ensembl Regulation 78

The "listDatasets" function will give you the list of all the species available (mart datasets) for a given mart:

> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl")
> head(listDatasets(ensembl))
                         dataset                                description version
1         oanatinus_gene_ensembl     Ornithorhynchus anatinus genes (OANA5)   OANA5
2        cporcellus_gene_ensembl            Cavia porcellus genes (cavPor3) cavPor3
3        gaculeatus_gene_ensembl     Gasterosteus aculeatus genes (BROADS1) BROADS1
4         lafricana_gene_ensembl         Loxodonta africana genes (loxAfr3) loxAfr3
5 itridecemlineatus_gene_ensembl Ictidomys tridecemlineatus genes (spetri2) spetri2
6        choffmanni_gene_ensembl        Choloepus hoffmanni genes (choHof1) choHof1

You can also use listDatasets with the Ensembl GRCh37 and archived marts:

> library(biomaRt)
> grch37 = useEnsembl(biomart="ensembl",GRCh=37)
> listDatasets(grch37)[31:35,]
                    dataset                                description      version
31    hsapiens_gene_ensembl            Homo sapiens genes (GRCh37.p13)   GRCh37.p13
32       mfuro_gene_ensembl Mustela putorius furo genes (MusPutFur1.0) MusPutFur1.0
33  tbelangeri_gene_ensembl           Tupaia belangeri genes (tupBel1)      tupBel1
34     ggallus_gene_ensembl              Gallus gallus genes (Galgal4)      Galgal4
35 xtropicalis_gene_ensembl          Xenopus tropicalis genes (JGI4.2)       JGI4.2


> ensembl78 = useEnsembl(biomart="ensembl",version=78)
> listDatasets(ensembl78)[31:35,]

                  dataset                                description      version
31 mlucifugus_gene_ensembl           Myotis lucifugus genes (myoLuc2)      myoLuc2
32   hsapiens_gene_ensembl                Homo sapiens genes (GRCh38)       GRCh38
33   pformosa_gene_ensembl      Poecilia formosa genes (PoeFor_5.1.2) PoeFor_5.1.2
34      mfuro_gene_ensembl Mustela putorius furo genes (MusPutFur1.0) MusPutFur1.0
35 tbelangeri_gene_ensembl           Tupaia belangeri genes (tupBel1)      tupBel1

useEnsembl

The "useEnsembl" function allow you to connect to a an ensembl website mart by specifying a BioMart and dataset parameters. For example, to connect to the Ensembl live gene mart human dataset (GRCh38):

> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

To connect to the human dataset of the Ensembl GRCh37 or release 78 gene marts:

> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl", GRCh=37)


> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl", version=78)

listMarts, listDatasets and useMart for the Ensembl mirrors

You can connect to the following Ensembl mirrors using the listMarts, listDatasets and useMart functions:

For example to connect to the Ensembl US West mirror:

> library(biomaRt)
> listMarts(host="uswest.ensembl.org")
               biomart               version
1 ENSEMBL_MART_ENSEMBL               Ensembl Genes 79
2     ENSEMBL_MART_SNP               Ensembl Variation 79
3 ENSEMBL_MART_FUNCGEN               Ensembl Regulation 79
4    ENSEMBL_MART_VEGA               Vega 59
5                pride               PRIDE (EBI UK)

> ensembl_us_west = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="uswest.ensembl.org")

> head(listDatasets(ensembl_us_west))

                         dataset                                description version
1         oanatinus_gene_ensembl     Ornithorhynchus anatinus genes (OANA5)   OANA5
2        cporcellus_gene_ensembl            Cavia porcellus genes (cavPor3) cavPor3
3        gaculeatus_gene_ensembl     Gasterosteus aculeatus genes (BROADS1) BROADS1
4         lafricana_gene_ensembl         Loxodonta africana genes (loxAfr3) loxAfr3
5 itridecemlineatus_gene_ensembl Ictidomys tridecemlineatus genes (spetri2) spetri2
6        choffmanni_gene_ensembl        Choloepus hoffmanni genes (choHof1) choHof1

Please note that the useMart function will always require a biomart and host parameters when connecting to an Ensembl mirror website.

listFilters & listAttributes

The "listFilters" function will give you the list of available filters for a given mart and species:

> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> head(listFilters(ensembl))
             name     description
1 chromosome_name Chromosome name
2           start Gene Start (bp)
3             end   Gene End (bp)
4      band_start      Band Start
5        band_end        Band End
6    marker_start    Marker Start

The "listAttributes" function will give you the list of the available attributes for a given mart and species:

> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> head(listAttributes(ensembl))
                   name           description
1       ensembl_gene_id       Ensembl Gene ID
2 ensembl_transcript_id Ensembl Transcript ID
3    ensembl_peptide_id    Ensembl Protein ID
4       ensembl_exon_id       Ensembl Exon ID
5           description           Description
6       chromosome_name       Chromosome Name

getBM

The "getBM" function allow you to build a BioMart query using a list of mart filters and attributes.

Example query: Fetch all the Ensembl gene, transcript IDs, HGNC symbols and chromosome locations located on the human chromosome 1

> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> chr1_genes <- getBM(attributes=c('ensembl_gene_id',
'ensembl_transcript_id','hgnc_symbol','chromosome_name','start_position','end_position'), filters =
'chromosome_name', values ="1", mart = ensembl)

> head(chr1_gene)

ensembl_gene_id ensembl_transcript_id hgnc_symbol chromosome_name start_position end_position
1 ENSG00000231510       ENST00000443270                           1        5086459      5090899
2 ENSG00000162444       ENST00000315901        RBP7               1        9997206     10016020
3 ENSG00000162444       ENST00000294435        RBP7               1        9997206     10016020
4 ENSG00000270171       ENST00000602640                           1        7693124      7694844
5 ENSG00000225643       ENST00000412797                           1       25581478     25590356
6 ENSG00000116497       ENST00000530710     S100PBP               1       32816767     32858879

Example query: Fetch Ensembl Gene, Transcript IDs, HGNC symbols and Uniprot Swissprot accessions mapped to the human Ensembl Gene ID "ENSG00000139618"

> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> hgnc_swissprot <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','hgnc_symbol','uniprotswissprot'),filters = 'ensembl_gene_id', values = 'ENSG00000139618', mart = ensembl)
> hgnc_swissprot
  ensembl_gene_id ensembl_transcript_id hgnc_symbol   uniprot_swissprot
1 ENSG00000139618       ENST00000380152       BRCA2 P51587
2 ENSG00000139618       ENST00000528762       BRCA2                  
3 ENSG00000139618       ENST00000470094       BRCA2                  
4 ENSG00000139618       ENST00000544455       BRCA2 P51587