Ensembl Regulation (Funcgen) Schema Documentation

Deprecation note

As of release 113, it is no longer recommended to access funcgen data via the Perl API or the funcgen DB. See Accessing regulatory data for alternative options.

If the Funcgen Perl API or database is important to your research, please let us know about your use case by emailing helpdesk@ensembl.org.

Introduction

This document describes the tables that make up the Ensembl Regulation schema. Tables are grouped logically by their function, and the purpose of each table is explained. This document refers to version 105 of the Ensembl Regulation schema.

List of the tables:

transcription_factor
transcription_factor_complex
transcription_factor_complex_composition
binding_matrix_transcription_factor_complex
external_feature
external_feature_file
probe_feature
probe_feature_transcript
feature_type
associated_feature_type

Set tables

feature_set
alignment
alignment_read_file
alignment_qc_flagstats
data_file
read_file
read_file_experimental_configuration

Array design tables

array
array_chip
probe_set
probe_set_transcript
probe
probe_seq
probe_transcript

Experiment tables

experiment
experimental_group

Ancillary tables

epigenome

Core tables

analysis
analysis_description
meta
meta_coord
associated_xref
associated_group
identity_xref
external_synonym
external_db
ontology_xref
unmapped_reason

Core like tables

xref
object_xref
unmapped_object

Main feature tables

These define the various genomic features and their associated tables.

regulatory_feature

Column	Type	Default value	Description	Index
regulatory_feature_id	int(10)	-	Internal ID	primary key
feature_type_id	int(10)	NULL	feature_type ID	unique key: uniqueness_constraint_idx key: feature_type_idx
seq_region_id	int(10)	-	seq_region ID	unique key: uniqueness_constraint_idx
seq_region_start	int(10)	-	Start position of this feature	unique key: uniqueness_constraint_idx
seq_region_end	int(10)	-	End position of this feature	unique key: uniqueness_constraint_idx
seq_region_strand	tinyint(1)	-	Strand orientation of this feature	unique key: uniqueness_constraint_idx
stable_id	varchar(18)	NULL	Stable ID, including the ENSR prefix	unique key: uniqueness_constraint_idx key: stable_id_idx
bound_start_length	mediumint(3)	-	Distance between the start of the feature and the start of the bound region. Bound regions are used for promoters only and define the flanking regions. It is an area that is predicted t…	unique key: uniqueness_constraint_idx
bound_end_length	mediumint(3)	-	Distance between end of the bound region and end of this feature	unique key: uniqueness_constraint_idx
epigenome_count	smallint(6)	NULL	Integer, number of epigenomes in which this feature is active
regulatory_build_id	int(10)	NULL	regulatory_build ID	unique key: uniqueness_constraint_idx
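The two bound-length columns are easier to interpret as coordinates. The Python sketch below makes an assumption that the table itself does not state: the bound region extends outward from the core feature, so the bound start is seq_region_start minus bound_start_length and the bound end is seq_region_end plus bound_end_length. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegulatoryFeatureRow:
    """Hypothetical in-memory view of a few regulatory_feature columns."""
    seq_region_start: int
    seq_region_end: int
    bound_start_length: int
    bound_end_length: int


def bound_region(row: RegulatoryFeatureRow) -> tuple[int, int]:
    """Return (bound_start, bound_end), assuming the flanks extend outward."""
    # Assumption: bound_start_length is measured upstream of seq_region_start.
    bound_start = row.seq_region_start - row.bound_start_length
    # Assumption: bound_end_length is measured downstream of seq_region_end.
    bound_end = row.seq_region_end + row.bound_end_length
    return bound_start, bound_end


promoter = RegulatoryFeatureRow(
    seq_region_start=1_000_000,
    seq_region_end=1_001_500,
    bound_start_length=200,
    bound_end_length=300,
)
print(bound_region(promoter))  # (999800, 1001800)
```

A feature with zero-length flanks returns its own start and end unchanged.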

Column	Type	Default value	Description	Index
regulatory_activity_id	int(10)	-	regulatory_activity	primary key
attribute_feature_id	int(10)	-	Table ID of attribute feature	primary key key: attribute_feature_idx
attribute_feature_table	ENUM: annotated motif	'annotated'	Table name of attribute feature	primary key key: attribute_feature_idx

regulatory_build

Column	Type	Default value	Description	Index
regulatory_build_id	int(4)	-	Internal ID	primary key
name	text	NULL	Name of the regulatory build
version	varchar(50)	NULL	Version of the regulatory build
initial_release_date	varchar(50)	NULL	Date of initial release
last_annotation_update	varchar(50)	NULL	Date of last annotation update
feature_type_id	int(4)	-	feature_type
analysis_id	smallint(5)	-	analysis
is_current	tinyint(1)	'0'	Set to true if this entry refers to the current regulatory build
sample_regulatory_feature_id	int(10)	NULL	regulatory_feature

regulatory_build_epigenome

Column	Type	Default value	Description	Index
regulatory_build_epigenome_id	int(10)	-	Internal ID	primary key
regulatory_build_id	int(10)	-	regulatory_build
epigenome_id	int(10)	-	epigenome
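The is_current flag and the regulatory_build_epigenome link table are typically used together: first find the current build, then find the epigenomes it covers. The sketch below uses an in-memory SQLite database with simplified column types and made-up rows, so it does not reproduce the production MySQL schema exactly.

```python
import sqlite3

# Simplified, hypothetical copies of the two tables; real column types differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE regulatory_build (
        regulatory_build_id INTEGER PRIMARY KEY,
        name TEXT,
        version TEXT,
        is_current INTEGER DEFAULT 0
    );
    CREATE TABLE regulatory_build_epigenome (
        regulatory_build_epigenome_id INTEGER PRIMARY KEY,
        regulatory_build_id INTEGER,
        epigenome_id INTEGER
    );
    INSERT INTO regulatory_build VALUES
        (1, 'build A', '1.0', 0),
        (2, 'build B', '2.0', 1);
    INSERT INTO regulatory_build_epigenome (regulatory_build_id, epigenome_id)
        VALUES (1, 10), (2, 10), (2, 11), (2, 12);
    """
)

# Find the current build and count the epigenomes linked to it.
row = conn.execute(
    """
    SELECT rb.regulatory_build_id, rb.name, COUNT(rbe.epigenome_id)
    FROM regulatory_build AS rb
    LEFT JOIN regulatory_build_epigenome AS rbe
        ON rbe.regulatory_build_id = rb.regulatory_build_id
    WHERE rb.is_current = 1
    GROUP BY rb.regulatory_build_id, rb.name
    """
).fetchone()
print(row)  # (2, 'build B', 3)
```

The query uses a LEFT JOIN so that a current build with no linked epigenomes would still be returned, with a count of 0.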

segmentation_file

Column	Type	Default value	Description	Index
segmentation_file_id	int(10)	-	Internal ID	primary key
regulatory_build_id	int(4)	NULL	regulatory_build
name	varchar(100)	NULL	A descriptive name of what is in the file.	unique key: name_idx
analysis_id	smallint(5)	-	analysis	key: analysis_idx
epigenome_id	int(10)	NULL	epigenome	key: epigenome_idx
segmentation_id	int(18)	NULL	segmentation

peak

Column	Type	Default value	Description	Index
peak_id	int(10)	-	Internal ID	primary key
peak_calling_id	int(10)	-	peak_calling ID	unique key: seq_region_feature_set_idx key: feature_set_idx
seq_region_id	int(10)	-	seq_region ID	unique key: seq_region_feature_set_idx
seq_region_start	int(10)	-	Start position of this feature	unique key: seq_region_feature_set_idx
seq_region_end	int(10)	-	End position of this feature
seq_region_strand	tinyint(1)	-	Strand orientation of this feature
score	double	NULL	Score derived from software
summit	int(10)	NULL	Represents peak summit for those analyses which provide it (e.g. Swembl)

motif_feature

Column	Type	Default value	Description	Index
motif_feature_id	int(10)	-	Primary key, internal ID	primary key
binding_matrix_id	int(10)	-	binding_matrix table	unique key: unique_idx key: binding_matrix_idx
seq_region_id	int(10)	-	seq_region table	unique key: unique_idx key: seq_region_idx
seq_region_start	int(10)	-	Start position of this feature	unique key: unique_idx key: seq_region_idx
seq_region_end	int(10)	-	End position of this feature
seq_region_strand	tinyint(1)	-	Strand orientation of this feature	unique key: unique_idx
display_label			Text display label
score	double	NULL	Score derived from alignment software (e.g. MOODS)
interdb_stable_id			Unique key, provides linkability between DBs

motif_feature_peak

Column	Type	Default value	Description	Index
motif_feature_peak_id	int(11)	-	Primary key, internal ID	primary key
motif_feature_id	int(11)	-	motif_feature table	key: motif_feature_idx
peak_id	int(11)	-	peak table	key: peak_idx

motif_feature_regulatory_feature

Column	Type	Default value	Description	Index
motif_feature_regulatory_feature_id	int(11)	-	Primary key, internal ID	primary key
motif_feature_id	int(11)	-	motif_feature table	unique key: mf_rf_ep_idx key: motif_feature_idx
regulatory_feature_id	int(11)	-	regulatory_feature table	unique key: mf_rf_ep_idx key: regulatory_feature_idx

mirna_target_feature

Column	Type	Default value	Description	Index
mirna_target_feature_id	int(10)	-	Internal ID	primary key
feature_type_id	int(10)	NULL	feature_type ID	key: feature_type_idx
seq_region_id	int(10)	-	seq_region ID	key: seq_region_idx
accession	varchar(60)	NULL	Accession number given by data source	unique key: unique_idx
display_label	varchar(60)	NULL	Text display label
evidence	varchar(60)	NULL	Evidence level provided by data source	unique key: unique_idx
method	varchar(60)	NULL	Method used to identify miRNA target	unique key: unique_idx
seq_region_start	int(10)	-	Start position of this feature	unique key: unique_idx key: seq_region_idx
seq_region_end	int(10)	-	End position of this feature	unique key: unique_idx
seq_region_strand	tinyint(1)	-	Strand orientation of this feature
supporting_information	varchar(100)	NULL	Additional information which does not fit another category
analysis_id	smallint(10)	NULL	analysis ID
gene_stable_id			Link to gene stable ID

Column	Type	Default value	Description	Index
annotated_feature_id	int(10)	-	annotated_feature table ID	primary key
motif_feature_id	int(10)	-	motif_feature table ID	primary key key: motif_feature_idx

binding_matrix

Column	Type	Default value	Description	Index
binding_matrix_id	int(10)	-	Internal table ID	primary key
name	varchar(200)	-	Name of binding matrix	unique key: name_idx
threshold	double	NULL	Minimum score for Motif Features for this matrix
source	varchar(20)	-	Source of binding matrix
stable_id	varchar(128)	-	Stable ID of binding matrix, e.g. ENSPFM001	unique key: stable_id_idx
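The threshold column is described as the minimum score for motif features of a given matrix. The Python sketch below shows one way to apply it to motif_feature scores. Two choices in it are assumptions rather than documented behaviour: a score equal to the threshold counts as passing, and a NULL threshold means no filtering. The stable ID ENSPFM002 is a made-up example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BindingMatrix:
    """A few binding_matrix columns (hypothetical in-memory view)."""
    binding_matrix_id: int
    stable_id: str
    threshold: Optional[float]  # NULL in the database -> None


@dataclass
class MotifFeature:
    """A few motif_feature columns (hypothetical in-memory view)."""
    motif_feature_id: int
    binding_matrix_id: int
    score: Optional[float]


def passing_motifs(matrices, motifs):
    """Keep motif features whose score reaches their matrix's threshold."""
    thresholds = {m.binding_matrix_id: m.threshold for m in matrices}
    kept = []
    for mf in motifs:
        threshold = thresholds.get(mf.binding_matrix_id)
        if threshold is None:
            # Assumption: a NULL (or unknown) threshold means "do not filter".
            kept.append(mf)
        elif mf.score is not None and mf.score >= threshold:
            # Assumption: a score equal to the threshold counts as passing.
            kept.append(mf)
    return kept


matrices = [
    BindingMatrix(1, "ENSPFM001", 5.0),
    BindingMatrix(2, "ENSPFM002", None),
]
motifs = [
    MotifFeature(100, 1, 6.2),
    MotifFeature(101, 1, 4.9),
    MotifFeature(102, 2, 1.0),
]
print([mf.motif_feature_id for mf in passing_motifs(matrices, motifs)])  # [100, 102]
```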

binding_matrix_frequencies

Column	Type	Default value	Description	Index
binding_matrix_frequencies_id	int(11)	-	Internal table ID	primary key
binding_matrix_id	int(11)	-	binding_matrix ID	key: binding_matrix_id_idx unique key: unique_constraint_idx
position	int(11)	-	Matrix sequence position	unique key: unique_constraint_idx
nucleotide	ENUM: A C G T	-	Nucleotide (A, C, G or T)	unique key: unique_constraint_idx
frequency	int(10)	-	Frequency value
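The unique key on (binding_matrix_id, position, nucleotide) means each cell of a position frequency matrix is stored as one row. The Python sketch below rebuilds the matrix for a single binding_matrix_id from such rows, normalises each position to probabilities and derives a simple consensus. The sample rows are invented. Treating missing cells as zero and breaking ties in ACGT order are choices made for the example.

```python
from collections import defaultdict

# Hypothetical rows: (position, nucleotide, frequency) for one binding_matrix_id.
rows = [
    (1, "A", 8), (1, "C", 1), (1, "G", 1), (1, "T", 0),
    (2, "A", 0), (2, "C", 0), (2, "G", 10), (2, "T", 0),
    (3, "A", 2), (3, "C", 3), (3, "G", 2), (3, "T", 3),
]

NUCLEOTIDES = ("A", "C", "G", "T")


def build_pfm(rows):
    """Return a list of {nucleotide: count} dicts ordered by position."""
    counts = defaultdict(dict)
    for position, nucleotide, frequency in rows:
        counts[position][nucleotide] = frequency
    # Missing nucleotides at a position are treated as zero counts.
    return [
        {nt: counts[pos].get(nt, 0) for nt in NUCLEOTIDES}
        for pos in sorted(counts)
    ]


def to_probabilities(pfm):
    """Normalise each position so its four values sum to 1."""
    probs = []
    for column in pfm:
        total = sum(column.values())
        probs.append({nt: (c / total if total else 0.0) for nt, c in column.items()})
    return probs


def consensus(pfm):
    """Pick the most frequent nucleotide per position (ties: first in ACGT)."""
    return "".join(max(NUCLEOTIDES, key=lambda nt: col[nt]) for col in pfm)


pfm = build_pfm(rows)
print(consensus(pfm))  # AGC
print(to_probabilities(pfm)[0]["A"])  # 0.8
```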

transcription_factor

Column	Type	Default value	Description	Index
transcription_factor_id	int(11)	-	Internal table ID	primary key
name	varchar(120)	-	Name of Transcription Factor	unique key: name_idx
feature_type_id	int(10)	NULL	feature_type ID	key: feature_type_id_idx
gene_stable_id	varchar(128)	NULL	Gene stable ID

transcription_factor_complex

Column	Type	Default value	Description	Index
transcription_factor_complex_id	int(11)	-	Internal table ID	primary key
production_name	varchar(120)	-	Name for internal use in pipelines	unique key: production_name_idx
display_name	varchar(120)	-	Name for external use (Genome Browser, API, etc.)	unique key: display_name_idx

transcription_factor_complex_composition

Column	Type	Default value	Description	Index
transcription_factor_complex_composition_id	int(11)	-	Internal table ID	primary key
transcription_factor_complex_id	int(11)	-	transcription_factor_complex ID	unique key: tfc_id_tf_id_idx key: transcription_factor_complex_id_idx
transcription_factor_id	int(11)	-	transcription_factor ID	unique key: tfc_id_tf_id_idx key: transcription_factor_id_idx

binding_matrix_transcription_factor_complex

Column	Type	Default value	Description	Index
binding_matrix_transcription_factor_complex_id	int(11)	-	Internal table ID	primary key
binding_matrix_id	int(11)	-	binding_matrix ID	unique key: binding_matrix_id_transcription_factor_complex_id_idx key: binding_matrix_id_idx
transcription_factor_complex_id	int(11)	-	transcription_factor_complex ID	unique key: binding_matrix_id_transcription_factor_complex_id_idx key: transcription_factor_complex_id_idx
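The last three tables form a chain of many-to-many links: binding_matrix to transcription_factor_complex, then transcription_factor_complex to transcription_factor. The Python sketch below walks that chain in an in-memory SQLite database to list the transcription factors behind one binding matrix. Column types are simplified. The names TF_A and TF_B and the matrix ID 7 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transcription_factor (
        transcription_factor_id INTEGER PRIMARY KEY,
        name TEXT UNIQUE
    );
    CREATE TABLE transcription_factor_complex (
        transcription_factor_complex_id INTEGER PRIMARY KEY,
        production_name TEXT UNIQUE,
        display_name TEXT UNIQUE
    );
    CREATE TABLE transcription_factor_complex_composition (
        transcription_factor_complex_composition_id INTEGER PRIMARY KEY,
        transcription_factor_complex_id INTEGER,
        transcription_factor_id INTEGER,
        UNIQUE (transcription_factor_complex_id, transcription_factor_id)
    );
    CREATE TABLE binding_matrix_transcription_factor_complex (
        binding_matrix_transcription_factor_complex_id INTEGER PRIMARY KEY,
        binding_matrix_id INTEGER,
        transcription_factor_complex_id INTEGER,
        UNIQUE (binding_matrix_id, transcription_factor_complex_id)
    );
    INSERT INTO transcription_factor VALUES (1, 'TF_A'), (2, 'TF_B');
    INSERT INTO transcription_factor_complex VALUES
        (1, 'tf_a', 'TF_A'), (2, 'tf_a_tf_b', 'TF_A/TF_B');
    INSERT INTO transcription_factor_complex_composition
        (transcription_factor_complex_id, transcription_factor_id)
        VALUES (1, 1), (2, 1), (2, 2);
    INSERT INTO binding_matrix_transcription_factor_complex
        (binding_matrix_id, transcription_factor_complex_id)
        VALUES (7, 2);
    """
)

# List every transcription factor behind binding matrix 7, via its complex(es).
names = [
    name
    for (name,) in conn.execute(
        """
        SELECT DISTINCT tf.name
        FROM binding_matrix_transcription_factor_complex AS bmtfc
        JOIN transcription_factor_complex_composition AS tfcc
            ON tfcc.transcription_factor_complex_id = bmtfc.transcription_factor_complex_id
        JOIN transcription_factor AS tf
            ON tf.transcription_factor_id = tfcc.transcription_factor_id
        WHERE bmtfc.binding_matrix_id = ?
        ORDER BY tf.name
        """,
        (7,),
    )
]
print(names)  # ['TF_A', 'TF_B']
```

DISTINCT matters here. A matrix linked to several complexes that share a member would otherwise list that member more than once.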

external_feature

Column	Type	Default value	Description	Index
external_feature_id	int(10)	-	Internal ID	primary key
feature_set_id	int(10)	-	feature_set ID	key: feature_set_idx
feature_type_id	int(10)	NULL	feature_type ID	key: feature_type_idx
seq_region_id	int(10)	-	seq_region ID	key: seq_region_idx
seq_region_start	int(10)	-	Start position of this feature	key: seq_region_idx
seq_region_end	int(10)	-	End position of this feature
seq_region_strand	tinyint(1)	-	Strand orientation of this feature
display_label	varchar(60)	NULL	Text display label
interdb_stable_id	mediumint(8)	NULL	Unique key, provides linkability between DBs	unique key: interdb_stable_id_idx

external_feature_file

Column	Type	Default value	Description	Index
external_feature_file_id	int(10)	-	Internal ID	primary key
name	varchar(100)	NULL	A name descriptive of the data in the file	unique key: name_idx
analysis_id	smallint(5)	-	analysis	key: analysis_idx
epigenome_id	int(10)	NULL	epigenome	key: epigenome_idx
feature_type_id	int(10)	NULL	feature_type

probe_feature

Column	Type	Default value	Description	Index
probe_feature_id	int(10)	-	Internal ID	primary key key: seq_region_probe_probe_feature_idx
analysis_id	smallint(5)	-	analysis table ID
probe_id	int(10)	-	probe table ID	key: probe_idx key: seq_region_probe_probe_feature_idx
seq_region_id	int(10)	-	seq_region table ID	key: seq_region_probe_probe_feature_idx
seq_region_start	int(10)	-	Start position of this feature	key: seq_region_probe_probe_feature_idx
seq_region_end	int(10)	-	End position of this feature	key: seq_region_probe_probe_feature_idx
seq_region_strand	tinyint(4)	-	Strand orientation of this feature
mismatches	tinyint(4)	-	Integer, the number of bp mismatches in this alignment
cigar_line	varchar(50)	NULL	Extended CIGAR line representation of the alignment, as defined in http://samtools.sourceforge.net/SAM-1.3.pdf. In summary: '=' sequence/alignment match; 'M' alignment match (may be a sequence match or mismatch); 'X' sequence/alignment mismatch; 'D' deletion; 'S' soft clipping, used for overhanging cDNA alignments where the genomic sequence is unknown
hit_id	varchar(255)	NULL	ID of the sequence on which the hit was initially made. Typically this is the stable ID of the transcript or the name of the sequence region.
source	ENUM: genomic transcript	NULL	The source of the sequence on which the probe was found. If set, this can be 'genomic' or 'transcript'
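A cigar_line value can be split into (length, operation) pairs. The Python sketch below assumes SAM-style ordering, with the count before the operation, and also accepts an omitted count as 1. It accepts only the five operations summarised for cigar_line and rejects anything else. It counts mismatches from 'X' operations only, because an 'M' may hide either a match or a mismatch. That count may therefore differ from the mismatches column.

```python
import re

# Assumption: each operation is preceded by an optional count (default 1).
CIGAR_TOKEN = re.compile(r"(\d*)([=MXDS])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split an extended CIGAR string into (length, operation) pairs."""
    ops = []
    pos = 0
    for match in CIGAR_TOKEN.finditer(cigar):
        if match.start() != pos:
            raise ValueError(f"unexpected text in CIGAR: {cigar[pos:match.start()]!r}")
        count, op = match.groups()
        ops.append((int(count) if count else 1, op))
        pos = match.end()
    if pos != len(cigar):
        raise ValueError(f"unexpected text in CIGAR: {cigar[pos:]!r}")
    return ops


def count_mismatches(cigar: str) -> int:
    """Count bases flagged 'X' (sequence mismatch); 'M' ops are ambiguous."""
    return sum(n for n, op in parse_cigar(cigar) if op == "X")


print(parse_cigar("20=1X4="))  # [(20, '='), (1, 'X'), (4, '=')]
print(count_mismatches("20=1X4="))  # 1
```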

probe_feature_transcript

Column	Type	Default value	Description	Index
probe_feature_transcript_id	int(10)	-	Internal ID	primary key key: probe_feature_transcript_id_idx
probe_feature_id	int(10)	NULL	probe_feature table ID	key: probe_feature_id_idx
stable_id	varchar(128)	NULL	Stable ID of the transcript to which it has been mapped
description	varchar(255)	NULL	Transcript description

associated_feature_type

Column	Type	Default value	Description	Index
table_id	int(10)	-	Internal table_id of linked table	primary key
feature_type_id	int(10)	-	Internal table_id of linked feature_type	primary key key: feature_type_index
table_name	ENUM: annotated_feature external_feature regulatory_feature feature_type	-	Name of linked table	primary key

alignment

Column	Type	Default value	Description	Index
alignment_id	int(10)	-	Internal ID	primary key
analysis_id	smallint(5)	-	analysis ID of the aligner used to create this alignment	key: analysis_idx
name	varchar(255)	-	Name of the alignment	unique key: name_idx
bam_file_id	int(11)	NULL	data_file_id of this alignment's BAM file in the data_file table
bigwig_file_id	int(11)	NULL	data_file_id of this alignment's bigWig file in the data_file table

alignment_read_file

Column	Type	Default value	Description	Index
alignment_read_file_id	int(10)	-	Internal ID	primary key
alignment_id	int(10)	-	alignment ID	primary key unique key: rset_table_idname_idx
read_file_id	int(10)	-	read_file ID	unique key: rset_table_idname_idx

alignment_qc_flagstats

Column	Type	Default value	Description	Index
alignment_qc_flagstats_id	int(28)	-	Internal ID	primary key unique key: name_exp_idx
alignment_id	int(10)	-	alignment ID
analysis_id	smallint(5)	NULL	analysis ID
qc_passed_reads
qc_failed_reads
path
bam_file

See also: alignment

data_file

Column	Type	Default value	Description	Index
data_file_id	int(11)	-	Internal ID	primary key unique key: data_file_id
table_id	int(10)	-	Primary key of linked entity e.g. external_feature_file or segmentation_file or alignment	unique key: table_id_name_path_idx
table_name	varchar(32)	-	Name of linked table (external_feature_file, segmentation_file, alignment)	unique key: table_id_name_path_idx
path	varchar(255)	-	Either a full filepath or a directory which the API will use to build the filepath	unique key: table_id_name_path_idx
file_type	ENUM: BAM BAMCOV BIGBED BIGWIG VCF CRAM DIR	'BAM'	Type of data file ('BAM','BAMCOV','BIGBED','BIGWIG','VCF','CRAM','DIR')
md5sum	varchar(45)	NULL	md5sum of data file
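data_file links back to its owner through the pair (table_name, table_id), and alignment also points forward to data_file through bam_file_id and bigwig_file_id. The Python sketch below follows both routes in an in-memory SQLite database. Column types are simplified, and the alignment name and file paths are invented. table_name is a plain varchar, so the database cannot enforce it as a foreign key. Queries on the generic link therefore have to filter on table_name explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE alignment (
        alignment_id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        bam_file_id INTEGER,
        bigwig_file_id INTEGER
    );
    CREATE TABLE data_file (
        data_file_id INTEGER PRIMARY KEY,
        table_id INTEGER,
        table_name TEXT,
        path TEXT,
        file_type TEXT DEFAULT 'BAM',
        UNIQUE (table_id, table_name, path)
    );
    INSERT INTO data_file VALUES
        (1, 5, 'alignment', 'alignments/example.bam', 'BAM'),
        (2, 5, 'alignment', 'alignments/example.bw', 'BIGWIG');
    INSERT INTO alignment VALUES (5, 'example_alignment', 1, 2);
    """
)

# Route 1: follow the explicit bam_file_id stored on alignment.
bam_path = conn.execute(
    """
    SELECT df.path FROM alignment AS a
    JOIN data_file AS df ON df.data_file_id = a.bam_file_id
    WHERE a.name = ?
    """,
    ("example_alignment",),
).fetchone()[0]

# Route 2: use the generic (table_name, table_id) link on data_file.
files = conn.execute(
    """
    SELECT df.file_type, df.path FROM data_file AS df
    JOIN alignment AS a
        ON df.table_name = 'alignment' AND df.table_id = a.alignment_id
    WHERE a.name = ?
    ORDER BY df.file_type
    """,
    ("example_alignment",),
).fetchall()

print(bam_path)  # alignments/example.bam
print(files)  # [('BAM', 'alignments/example.bam'), ('BIGWIG', 'alignments/example.bw')]
```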

read_file_experimental_configuration

Column	Type	Default value	Description	Index
read_file_experimental_configuration_id	int(10)	-	Internal ID	primary key
read_file_id	int(10)	NULL	read_file ID of the read file being described.
experiment_id	int(10)	-	experiment ID of the experiment during which the read file was generated.	unique key: name_exp_idx key: experiment_idx
biological_replicate	tinyint(3)	'1'	The biological replicate number	unique key: name_exp_idx
technical_replicate	tinyint(3)	'1'	The technical replicate number	unique key: name_exp_idx
paired_end_tag	int(11)	NULL	(not used yet)	unique key: name_exp_idx
multiple	int(11)	'1'	(not used yet)	unique key: name_exp_idx
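One plausible use of the replicate columns is to gather read files by experiment and biological replicate, for example before pooling technical replicates. This is an illustration only, not necessarily what the Ensembl pipeline does. The Python sketch below works on invented rows that carry only the four relevant columns and ignores the unused paired_end_tag and multiple columns.

```python
from collections import defaultdict

# Hypothetical rows:
# (read_file_id, experiment_id, biological_replicate, technical_replicate)
rows = [
    (101, 1, 1, 1),
    (102, 1, 1, 2),
    (103, 1, 2, 1),
    (104, 2, 1, 1),
]


def group_by_replicate(rows):
    """Map experiment_id -> biological_replicate -> sorted read_file_ids."""
    grouped = defaultdict(lambda: defaultdict(list))
    for read_file_id, experiment_id, bio_rep, _tech_rep in rows:
        # Technical replicates of the same biological replicate share a bucket.
        grouped[experiment_id][bio_rep].append(read_file_id)
    return {
        exp: {rep: sorted(ids) for rep, ids in sorted(reps.items())}
        for exp, reps in sorted(grouped.items())
    }


print(group_by_replicate(rows))
# {1: {1: [101, 102], 2: [103]}, 2: {1: [104]}}
```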

array

Column	Type	Default value	Description	Index
array_id	int(10)	-	Internal ID	primary key
name	varchar(40)	NULL	Name of array	unique key: vendor_name_idx unique key: class_name_idx
format	varchar(20)	NULL	Format of array e.g. EXPRESSION, TILED
vendor	varchar(40)	NULL	Name of array vendor e.g. AFFY	unique key: vendor_name_idx
description	varchar(255)	NULL	Text description
type	varchar(20)	NULL	Array type e.g. OLIGO, PCR
class	varchar(20)	NULL	Array class e.g. AFFY_ST, ILLUMINA_INFINIUM	unique key: class_name_idx
is_probeset_array	tinyint(1)	'0'	Indicates whether the array is organised into probe sets.
has_sense_interrogation	tinyint(1)	'0'	Indicates whether the array has sense interrogation

array_chip

Column	Type	Default value	Description	Index
array_chip_id	int(10)	-	Internal ID	primary key
array_id	int(10)	-	array ID	unique key: array_design_idx
design_id	varchar(100)	NULL	ID/Accession defined by vendor	unique key: array_design_idx
name	varchar(100)	NULL	Name of array_chip

probe_set

Column	Type	Default value	Description	Index
probe_set_id	int(10)	-	Internal ID	primary key
name	varchar(100)	-	Name of the probe set	key: name
size	smallint(6)	-	Integer size of the probe set, i.e. how many probes it contains
family	varchar(20)	NULL	Generic descriptor for probe_set e.g. ENCODE_REGIONS, RANDOM etc. Not used
array_chip_id	int(10)	NULL	array_chip ID of the array chip to which this probe set belongs.