Yum, beautiful regulatory variants to spot...

Regul@tionSpotter

Documentation


Analysing a single variant

Input

Single Variant

For single queries in RegulationSpotter, you can use the single query interface. Here, you can enter single variants as shown in our single query tutorial. Simply fill in the chromosome and location of the variant along with the reference and alternative allele. Please note that for InDels, you have to use the VCF format, i.e. always start with the last reference base before the variant.
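The VCF convention for InDels mentioned above can be illustrated with a minimal Python sketch. The function names and the toy reference sequence are hypothetical and are not part of RegulationSpotter; positions are 1-based.

```python
from typing import Tuple


def vcf_deletion(reference: str, first_deleted: int, n_deleted: int) -> Tuple[int, str, str]:
    """Return VCF-style (POS, REF, ALT) for a deletion.

    `first_deleted` is the 1-based position of the first deleted base.
    The record starts at the last reference base before the deletion.
    """
    if first_deleted < 2:
        raise ValueError("a reference base before the deletion is required")
    anchor_pos = first_deleted - 1
    anchor = reference[anchor_pos - 1]  # convert to 0-based index
    deleted = reference[first_deleted - 1:first_deleted - 1 + n_deleted]
    return anchor_pos, anchor + deleted, anchor


def vcf_insertion(reference: str, after_pos: int, inserted: str) -> Tuple[int, str, str]:
    """Return VCF-style (POS, REF, ALT) for an insertion after 1-based `after_pos`."""
    anchor = reference[after_pos - 1]
    return after_pos, anchor, anchor + inserted


toy_reference = "ACGTTGCA"  # made-up sequence, positions 1..8
print(vcf_deletion(toy_reference, 4, 2))          # deletes "TT" at positions 4-5
print(vcf_insertion(toy_reference, 3, "GGCAAT"))  # inserts after position 3
```

In both cases the position entered is that of the anchor base, and the reference allele always begins with that base.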

Output

Clicking the submit button leads you to a view of your results. For intragenic alterations and known disease-causing variants, you will be redirected to our conventional MutationTaster output. More information on this can be found in the MutationTaster documentation. Here, we will focus on explaining the detailed RegulationSpotter output.

Screenshot of detailed results view of RegulationSpotter analysis of a single variant.

Results

Likely effect of an alteration

RegulationSpotter treats alterations differently depending on whether they are located within a gene or not. For alterations in protein-coding transcripts of genes, it relies on MutationTaster, which classifies an alteration as one of four possible types:


For more details about the classification process, please refer to our MutationTaster documentation.

Extragenic (extratranscriptic) alterations are assessed by RegulationSpotter directly. The program compiles and combines all the regulatory data and comes up with an estimate of how likely it is for a variant to be located in a regulatory region. RegulationSpotter also assesses public data sources such as ClinVar, 1000G and ExAC in order to reliably classify known variants. The possible outcomes are:
The results overview is followed by a summary section summing up the most important annotations for this variant.

Alteration (phys. location)

The alteration at the "physical", i.e. chromosomal, level (e.g. chr7:91623937_91623938insGGCAAT).

Alteration type

Is either SNV (a single base exchange), an insertion, a deletion or a combination of insertion and deletion.

Alteration region

Extratranscriptic by definition. Extratranscriptic in this context means any position that lies outside of (protein-coding) transcripts, which also includes promoter regions upstream of the gene start (the 5'-most TSS).

Known variant

Any known polymorphism(s) or known disease variant that have been found at the position in question. Our database contains all single nucleotide polymorphisms (SNPs) from the NCBI SNP database (dbSNP). If an alteration is located at the same position as a known dbSNP, RegulationSpotter provides the SNP ID (or rs ID) and a link. Please note that there may be differences between your alteration and the alleles in dbSNP.
Moreover, we have stored all variants from the 1000 Genomes Project [1] (1000G) and from the Exome Aggregation Consortium (ExAC) [2]. For 1000G and ExAC, RegulationSpotter provides detailed information about homozygous/heterozygous hits and numbers of allele carriers. If an alteration was found >4x homozygously in 1000G or >10x homozygously in ExAC, it is automatically regarded as a polymorphism.
We also display known disease variants from NCBI ClinVar. If a variant is marked as probable-pathogenic or pathogenic in ClinVar, it is automatically predicted to be disease-causing, i.e. disease causing automatic (the Region Score is calculated and shown nevertheless). We also provide a link to the respective entry in the ClinVar database.
Moreover, we have integrated the public version of the Human Gene Mutation Database (HGMD) [3]. The data includes the positions of the disease mutations and their HGMD ID. The disease alleles are not included so we cannot use HGMD for automatic predictions. Whenever an HGMD public disease mutation is found at the same position as a variant, this will be written in the summary. We also place a direct hyperlink to the mutation in HGMD into the 'dbSNP / 1000G / HGMD(public) / ClinVar' field, so you can check whether the HGMD mutation has the same allele as your variant (and whether the disease matches). Please note that you must be logged in at the HGMD site to make the hyperlink work - access to the public version is free but requires registration.
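The automatic rules described in this section can be summarised in a short Python sketch. It is an illustration only: the function and constant names are hypothetical, the label returned for polymorphisms is a placeholder, and the order in which the ClinVar and frequency rules are checked is an assumption, since the text does not say which one takes precedence.

```python
from typing import Optional

HOM_1000G_LIMIT = 4   # "> 4x homozygously in 1000G"
HOM_EXAC_LIMIT = 10   # "> 10x homozygously in ExAC"
CLINVAR_DISEASE = {"pathogenic", "probable-pathogenic"}


def automatic_call(hom_1000g: int, hom_exac: int,
                   clinvar: Optional[str] = None) -> Optional[str]:
    """Return an automatic classification, or None if no rule applies.

    Assumption: ClinVar is checked before the frequency rule.
    HGMD public is deliberately not used, because it lacks disease alleles.
    """
    if clinvar is not None and clinvar.lower() in CLINVAR_DISEASE:
        return "disease causing automatic"
    if hom_1000g > HOM_1000G_LIMIT or hom_exac > HOM_EXAC_LIMIT:
        return "polymorphism"  # placeholder label
    return None


print(automatic_call(hom_1000g=5, hom_exac=0))
print(automatic_call(hom_1000g=4, hom_exac=10))
print(automatic_call(hom_1000g=0, hom_exac=0, clinvar="Pathogenic"))
```

Note that both thresholds are strict: exactly 4 homozygous carriers in 1000G or exactly 10 in ExAC do not trigger the automatic polymorphism call.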

Promoters

This section displays all the promoters annotated for your variant. RegulationSpotter gets this information from various sources:

Enhancers

Displays all enhancer annotations found for your variant. Enhancer annotations were obtained from FANTOM5 [5] and VISTA [6].

Epigenetic Marks (RegulationSpotter)

Epigenetic marks (DNase1 hypersensitive sites and H3K4me3 annotations) obtained from Ensembl multicell regulatory features which are annotated in at least 3 cell lines and overlap with a promoter region. Please keep in mind that the coordinates of these marks may differ from the marks taken directly from the Ensembl Regulatory Build, because we show the overlap between different cell lines. This allows for a sharper annotation than the Ensembl Regulatory Features 'Promoter' and 'Promoter flanking region', but is less detailed than the cell-based single-track annotations.
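To illustrate why overlapping marks across cell lines changes the coordinates, here is a minimal Python sketch of one way to find regions covered in at least 3 cell lines (a sweep over interval start and end points). This is not RegulationSpotter's actual implementation; the function names and coordinates are made up, intervals are half-open, and the additional requirement of overlapping a promoter region is left out.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open: [start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals within one cell line."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def regions_shared_by(marks: Dict[str, List[Interval]],
                      min_cell_lines: int = 3) -> List[Interval]:
    """Return regions covered by marks from at least `min_cell_lines` cell lines."""
    events = []
    for intervals in marks.values():
        for start, end in merge(intervals):  # count each cell line once
            events.append((start, 1))
            events.append((end, -1))
    # At equal coordinates, process starts before ends so that adjacent
    # coverage is not split; zero-width regions are dropped below.
    events.sort(key=lambda event: (event[0], -event[1]))

    shared: List[Interval] = []
    depth = 0
    region_start = 0
    for coord, delta in events:
        previous = depth
        depth += delta
        if previous < min_cell_lines <= depth:
            region_start = coord
        elif depth < min_cell_lines <= previous and coord > region_start:
            shared.append((region_start, coord))
    return shared


toy_marks = {
    "GM12878": [(100, 200)],
    "K562": [(150, 260)],
    "HepG2": [(120, 180), (240, 300)],
    "HUVEC": [(170, 250)],
}
print(regions_shared_by(toy_marks))
```

The shared regions are narrower than any single input interval, which matches the sharper annotation described above.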

Histone Modifications, Polymerase, Open Chromatin, Transcription Factor Binding Site

We obtained cell-based annotations on histone modifications, polymerase binding sites, open chromatin and transcription factor binding sites (TFBSs) from the Ensembl Regulatory Build [4].
For each regulatory feature, we display the cell lines that were annotated with it. We grouped the cell lines according to their biological properties, which is indicated by the background colour:

Blood cells: GM12878, GM12865, CMK, GM12891, GM15510, Th2, GM12801, GM12892, GM18507, GM19240, Jurkat, GM12873, GM18526, GM19099, GM19238, GM18951, Th1, GM12874, GM12878-XiMat, GM12864, GM10847, GM12875, NB4, GM19239, GM12872, GM19193, GM18505, K562, K562b, DND-41, Monocytes-CD14+
Bone cells: Osteobl
Brain cells: Medullo
Breast cells: MCF10A-Er-Src, MCF7
Colon cells: Caco-2, HCT116
Embryonic Stem cells: H9ESC, H1ESC, H7ESC
Endothelial cells: HUVEC
Epithelial cells: LHSR, HPAEpiC, HRPEpiC, HCPEpiC, HEEpiC, HAEpiC, HIPEpiC, SAEC, HRE, A549, RPTEC, HRCEpiC, NHBE, HNPCEpiC, HeLa-S3, HMEC
Fetal Membrane cells: Chorion
Gingiva cells: AG09319, HGF
Heart cells: HCF
Kidney cells: HEK 293, HEK293b
Liver cells: HepG2b, HepG2
Lung cells: NHLF, AG04450, IMR90
Monocytes: Monocytes-CD14+
Muscle cells: SKMC, HSMMtube, HSMM
Neuron cells: SKNSHRA, PFSK1, NH-A, SKNMC
Pancreas cells: PanIslets, Panc1
Retina cells: WERIRB1
Skin cells: ProgFib, Melano, BJ, Fibrobl, AG10803, AG04449, NHEK, NHDF-neo, NHDF-Ad, NHDF, AG09309
Not grouped: NTERA-2 cl.D1, DND-41, HCM

Histone Modifications

We used annotations for the following 28 histone modifications from the Ensembl regulatory build:

H2AK5ac: Histone 2A Lysine 5 Acetylation
H2AZ: Histone 2A variant Z
H2BK120ac: Histone 2B Lysine 120 Acetylation
H2BK12ac: Histone 2B Lysine 12 Acetylation
H2BK15ac: Histone 2B Lysine 15 Acetylation
H2BK20ac: Histone 2B Lysine 20 Acetylation
H2BK5ac: Histone 2B Lysine 5 Acetylation
H3K14ac: Histone 3 Lysine 14 Acetylation
H3K18ac: Histone 3 Lysine 18 Acetylation
H3K23ac: Histone 3 Lysine 23 Acetylation
H3K23me2: Histone 3 Lysine 23 di-methylation
H3K27ac: Histone 3 Lysine 27 Acetylation
H3K27me3: Histone 3 Lysine 27 Tri-Methylation
H3K36me3: Histone 3 Lysine 36 Tri-Methylation
H3K4ac: Histone 3 Lysine 4 Acetylation
H3K4me1: Histone 3 Lysine 4 Mono-Methylation
H3K4me2: Histone 3 Lysine 4 Di-Methylation
H3K4me3: Histone 3 Lysine 4 Tri-Methylation
H3K56ac: Histone 3 Lysine 56 Acetylation
H3K79me1: Histone 3 Lysine 79 mono-methylation
H3K79me2: Histone 3 Lysine 79 di-methylation
H3K9ac: Histone 3 Lysine 9 Acetylation
H3K9me1: Histone 3 Lysine 9 mono-methylation
H3K9me3: Histone 3 Lysine 9 Tri-Methylation
H4K20me1: Histone 4 Lysine 20 mono-methylation
H4K5ac: Histone 4 Lysine 5 Acetylation
H4K8ac: Histone 4 Lysine 8 Acetylation
H4K91ac: Histone 4 Lysine 91 Acetylation

Open Chromatin

For annotation of open chromatin, we used DNase I hypersensitive sites from the Ensembl regulatory build.

Polymerase Binding Sites

Indicates that annotations for Polymerase II and Polymerase III were found for your variant's location.

Transcription Factor Binding Sites

We included the TFBSs listed below. TFBSs that are annotated in at least 3 different cell lines are printed in bold. TFBSs can be either confirmed, i.e. found by experimental procedures such as ChIP-seq, or deduced by motif, i.e. the DNA sequence contains a binding motif for a certain TF.

Ap2alpha: Ap2alpha Transcription Factor Binding
Ap2gamma: Ap2gamma Transcription Factor Binding
ATF3: ATF3 Transcription Factor Binding
BAF155: BAF155 Transcription Factor Binding
BAF170: BAF170 Transcription Factor Binding
BATF: BATF Transcription Factor Binding
BCL11A: BCL11A Transcription Factor Binding
BCL3: BCL3 Transcription Factor Binding
BCLAF1: BCLAF1 Transcription Factor Binding
BHLHE40: BHLHE40 Transcription Factor Binding
Brg1: Brg1 Transcription Factor Binding
Cfos: Cfos TF binding
Cjun: Cjun TF binding
Cmyc: Cmyc TF binding
CTCF: CCCTC-binding factor
CTCFL: CTCFL Transcription Factor Binding
E2F1: E2F1 Transcription Factor Binding
E2F4: E2F4 Transcription Factor Binding
E2F6: E2F6 Transcription Factor Binding
EBF1: EBF1 Transcription Factor Binding
Egr1: Egr1 Transcription Factor Binding
ELF1: ELF1 Transcription Factor Binding
ETS1: ETS1 Transcription Factor Binding
FOSL1: FOSL1 Transcription Factor Binding
FOSL2: FOSL2 Transcription Factor Binding
FOXA1: FOXA1 Transcription Factor Binding
FOXA2: FOXA2 Transcription Factor Binding
Gabp: Gabp TF binding
Gata1: Gata1 TF binding
Gata2: Gata2 Transcription Factor Binding
GTF2B: GTF2B Transcription Factor Binding
HDAC2: HDAC2 Transcription Factor Binding
HDAC8: HDAC8 Transcription Factor Binding
HEY1: HEY1 Transcription Factor Binding
HNF4A: HNF4A Transcription Factor Binding
HNF4G: HNF4G Transcription Factor Binding
Ini1: Ini1 Transcription Factor Binding
IRF4: IRF4 Transcription Factor Binding
Junb: Junb Transcription Factor Binding
Jund: Jund TF binding
Max: Max TF binding
MEF2A: MEF2A Transcription Factor Binding
MEF2C: MEF2C Transcription Factor Binding
Nanog: Nanog Transcription Factor Binding
Nfe2: Nfe2 TF binding
NFKB: NFKB Transcription Factor Binding
NR4A1: NR4A1 Transcription Factor Binding
Nrf1: Nrf1 Transcription Factor Binding
Nrsf: Nrsf TF binding
p300: p300 Transcription Factor Binding
Pax5: Pax5 Transcription Factor Binding
Pbx3: Pbx3 Transcription Factor Binding
POU2F2: POU2F2 Transcription Factor Binding
POU5F1: POU5F1 Transcription Factor Binding
PU1: PU1 Transcription Factor Binding
Rad21: Rad21 Transcription Factor Binding
RXRA: RXRA Transcription Factor Binding
SETDB1: SETDB1 Transcription Factor Binding
Sin3Ak20: Sin3Ak20 Transcription Factor Binding
SIX5: SIX5 Transcription Factor Binding
SP1: SP1 Transcription Factor Binding
SP2: SP2 Transcription Factor Binding
Srf: Srf TF binding
TAF1: TAF1 Transcription Factor Binding
TAF7: TAF7 Transcription Factor Binding
Tcf12: Tcf12 Transcription Factor Binding
THAP1: THAP1 Transcription Factor Binding
Tr4: Tr4 Transcription Factor Binding
USF1: USF1 Transcription Factor Binding
XRCC4: XRCC4 Transcription Factor Binding
Yy1: Yy1 Transcription Factor Binding
ZBTB33: ZBTB33 Transcription Factor Binding
ZBTB7A: ZBTB7A Transcription Factor Binding
ZEB1: ZEB1 Transcription Factor Binding
Znf263: Znf263 TF binding
ZNF274: ZNF274 Transcription Factor Binding

In this section, you will also find a link to ePOSSUM, our software for the analysis of transcription factor binding sites.

Genomic Interactions

We integrated data on the interaction of distant genomic elements from three sources: Hi-C experiments by Rao et al. [7], 5C experiments generated for the ENCODE project [8,9] by groups from the University of Massachusetts, and the 4D Genome database. The 5C and Hi-C data were downloaded from NCBI.
For each interaction annotated for your variant, RegulationSpotter displays the gene name and Ensembl gene ID as well as the type of element involved in the interaction: either 'promoter' or 'distant element, interacts w/ promoter' (which might be an enhancer). Note that because a gene can have multiple TSSs, a variant may or may not be considered as affecting the promoter of a certain gene, depending on which transcript / TSS is under scrutiny.
We only display interactions that were present in at least 3 different cell lines, and we list the affected cell lines.
To give you a better understanding of the interaction, RegulationSpotter also displays the interaction as a plot; just try the link given in the figure caption below.

Interaction plot

Screenshot of an interaction plot. This plot is embedded in the single variant output (example: click on 'show interactions as plot'; direct link to the interaction plot).
The image is divided into two parts, which can be resized and scrolled separately to bring the different elements together: the upper part shows the genes or transcripts involved (switch the display by clicking on 'show transcripts instead of genes'), while the lower part shows the interacting regions, one of which contains the analysed variant. The thin red line marks the location of the variant. Interaction elements are depicted as black lines with blue ends; the blue ends represent the genomic elements that were found to interact with each other, e.g. by Hi-C or similar methods. Protein-coding genes or transcripts in the region are shown as red rectangles, while non-protein-coding genes (e.g. pseudogenes) or transcripts (e.g. processed transcripts) are marked with a little green box. The gene view usually results in a condensed picture and the transcript view in an extended one. We recommend switching to the transcript view in order to understand the classification of interacting elements as promoter or distant element (e.g. enhancer). Moreover, you will find a link to explore the region in Ensembl. Below the plot you can find a legend explaining the picture.


PhyloP/PhastCons

Indicates the conservation of the alteration site. Data from phyloP [10] and phastCons [11].
phastCons and phyloP are both methods to determine the degree of conservation of a given nucleotide. RegulationSpotter uses precomputed values provided by UCSC.
phastCons values vary between 0 and 1 and reflect the probability that each nucleotide belongs to a conserved element, based on the multiple alignment of genome sequences of 46 different species (the closer the value is to 1, the more likely it is that the nucleotide is conserved). It considers not just each individual alignment column, but also its flanking columns.
In contrast, phyloP (values between -14 and +6) measures conservation separately at individual columns, ignoring the effects of their neighbours. Moreover, phyloP can measure not only conservation (slower evolution than expected under neutral drift) but also acceleration (faster evolution than expected). Sites predicted to be conserved are assigned positive scores, while sites predicted to be fast-evolving are assigned negative scores.
For deletions, insertions and InDels, the phyloP and phastCons values of all affected bases are not summed; instead, only one phyloP value and one phastCons value are added to the Region Score.
For more information about phyloP and phastCons, please see the cited papers.
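As a reading aid, the interpretation of the two scores can be sketched in Python. The function names are hypothetical, treating a phyloP score of exactly 0 as "neutral" is our own choice, and the sketch does not reproduce how RegulationSpotter feeds the values into the Region Score.

```python
def phylop_trend(score: float) -> str:
    """Interpret the sign of a phyloP score."""
    if score > 0:
        return "conserved"      # slower evolution than expected
    if score < 0:
        return "fast-evolving"  # faster evolution than expected
    return "neutral"            # assumption: 0 means no deviation


def summarise_conservation(phylop: float, phastcons: float) -> str:
    """Combine both scores into a one-line, human-readable summary."""
    if not 0.0 <= phastcons <= 1.0:
        raise ValueError("phastCons is a probability and must lie in [0, 1]")
    return (f"phyloP {phylop:+.2f} ({phylop_trend(phylop)}), "
            f"phastCons {phastcons:.2f} (probability of a conserved element)")


print(summarise_conservation(2.5, 0.98))
print(summarise_conservation(-1.2, 0.01))
```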

CADD

The CADD [12] value for the respective position. Please be aware that we always display the highest value for a certain position, regardless of the actual variant, which means that the CADD value displayed here might differ slightly from the value for the specific variant stored or displayed elsewhere. Moreover, CADD values are displayed for information only and are not included in the score. The integrated version is CADD v1.3 for b37.
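The "highest value per position" behaviour can be illustrated with a small Python sketch. The row layout, the function name and the scores are invented for illustration; the text does not specify whether raw or scaled CADD scores are used.

```python
from typing import Dict, Iterable, Tuple

# (chromosome, position, ref, alt, score); the scores below are made up.
CaddRow = Tuple[str, int, str, str, float]


def highest_cadd_per_position(rows: Iterable[CaddRow]) -> Dict[Tuple[str, int], float]:
    """Keep only the highest score for each (chromosome, position)."""
    best: Dict[Tuple[str, int], float] = {}
    for chrom, pos, _ref, _alt, score in rows:
        key = (chrom, pos)
        if key not in best or score > best[key]:
            best[key] = score
    return best


toy_rows = [
    ("7", 100, "A", "C", 12.3),
    ("7", 100, "A", "G", 15.1),
    ("7", 100, "A", "T", 9.8),
    ("7", 101, "G", "A", 3.2),
]
print(highest_cadd_per_position(toy_rows))
```

In this toy data, a query for the A>T variant at position 100 would be shown the value of the A>G variant, which is why the displayed value can differ from the allele-specific one.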

Chromosome

The chromosome the alteration is located on.

Strand

Is either 1 for the forward strand or -1 for the reverse strand.

Chromosomal position

Gives the last wild-type base before the alteration and the first wild-type base after the alteration in chromosomal sequence context (position relative to the start of the chromosomal reference sequence), e.g. 154,372,337 / 154,372,339 if the altered base is at position 154,372,338.
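The two flanking positions can be derived from a VCF-style record as in the following Python sketch. The function name is hypothetical, and the anchor base "T" in the insertion example is invented, since the actual reference base is not given here. The sketch strips only shared leading bases, not shared trailing ones.

```python
from typing import Tuple


def flanking_positions(pos: int, ref: str, alt: str) -> Tuple[int, int]:
    """Return (last wild-type base before, first wild-type base after).

    `pos`, `ref` and `alt` follow the VCF convention (1-based, InDels
    start with the last reference base before the variant).
    """
    # Skip leading bases shared by both alleles (e.g. the VCF anchor base).
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    first_changed = pos + shared
    n_ref_changed = len(ref) - shared  # 0 for a pure insertion
    return first_changed - 1, first_changed + n_ref_changed


print(flanking_positions(154372338, "A", "G"))       # single base exchange
print(flanking_positions(91623937, "T", "TGGCAAT"))  # insertion, anchor base assumed
print(flanking_positions(3, "GTT", "G"))             # deletion of two bases
```

For the single base exchange this yields 154372337 / 154372339, and for the insertion the two neighbouring positions 91623937 / 91623938, consistent with the notation chr7:91623937_91623938insGGCAAT used above.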

Original chrDNA sequence snippet

Original DNA sequence with the original nucleotide marked in blue.

Altered chrDNA sequence snippet

Altered DNA sequence with the altered nucleotide marked in blue.

Speed

The time that was required for the current analysis.

Contact

In case you discover bugs or have suggestions or questions, please write an e-mail to Jana Marie Schwarz (jana-marie.schwarz AT charite.de) or to Dominik Seelow (dominik.seelow AT charite.de).
We also appreciate hearing about your general experiences using RegulationSpotter.

References

[1] 1000 Genomes Project Consortium: An integrated map of genetic variation from 1,092 human genomes. Nature 2012 Nov 1. PMID: 23128226

[2] Analysis of protein-coding genetic variation in 60,706 humans. Monkol Lek, Konrad J. Karczewski[…]Exome Aggregation Consortium. Nature volume 536, pages 285–291 (18 August 2016)

[3] The Human Gene Mutation Database: 2008 update. Peter D Stenson, Matthew Mort, Edward V Ball, Katy Howells, Andrew D Phillips, Nick ST Thomas and David N Cooper. Genome Medicine 2009.

[4] Zerbino DR, Wilder SP, Johnson N, Huettemann T, Flicek PR. The Ensembl Regulatory Build. Genome Biology 2015. PMID: 25887522

[5] FANTOM Consortium and the RIKEN PMI and CLST (DGT) et al. A promoter-level mammalian expression atlas. Nature 507, 462-470 (2014).

[6] Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA Enhancer Browser - a database of tissue-specific human enhancers. Nucleic Acids Res. 2007. PMID: 17130149

[7] Rao SS, Huntley MH, Durand NC, Stamenova EK et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014. PMID: 25497547

[8] ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).

[9] Sloan, C. A. et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 44, D726-732 (2016).

[10] Pollard KS, Hubisz MJ, Siepel A. Detection of non-neutral substitution rates on mammalian phylogenies. Genome Res. 2009. PMID: 19858363

[11] Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005. PMID: 16024819

[12] CADD: predicting the deleteriousness of variants throughout the human genome. Philipp Rentzsch, Daniela Witten, Gregory M Cooper, Jay Shendure, Martin Kircher. Nucleic Acids Research 2018.