Yum, beautiful regulatory variants to spot...



Example 1 (polymorphism): chr3:11022826G>A (rs956053)

Example 1 is a single base exchange that was found in a homozygous state and at high frequency in the 1000 Genomes Project (1000G) data.

The variant lies in an extragenic region with little annotated function: H3K27me3 histone modification in more than 3 cell types (shown in bold print), H3K4me1 histone modification in only 1 cell type, and a CTCF binding site in numerous cell types. Taken together, however, these annotations do not indicate a region with a distinct regulatory function such as a promoter or an enhancer. PhyloP and phastCons scores are available, and both are low, showing that the position is not conserved.

Based on this combined evidence, RegulationSpotter calculated a low score (6.78) and classified this variant as a polymorphism because it occurs in a homozygous state in the 1000G data.

Example 2 (disease causing variant): chr1:160001799G>C

The detected single base exchange is not listed in either ExAC or 1000G. However, it is a known disease-causing variant in ClinVar and HGMD, responsible for glycosylphosphatidylinositol deficiency. Because it is a known disease-causing mutation, RegulationSpotter labels it as such.

Independently of this classification, which is based on the variant's presence in disease databases, RegulationSpotter calculates the Region Score from the amount of evidence that the variant is located in a regulatory region. The available annotations indicate that the variant lies in a promoter, and the region is also annotated as such in Ensembl Regulation. It also meets the RegulationSpotter criteria for a promoter by position (500 bp upstream / 50 bp downstream) and shows overlapping DNase I and H3K4me3 marks, which are indicative of an active promoter. In addition, there are several other histone modifications and TFBS in this region. The probable functional importance of this location is further emphasized by the relatively high PhyloP / phastCons conservation values.

Taken together, these annotations account for a relatively high total Region Score of 120.64 for this variant.
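The positional promoter criterion mentioned in Example 2 (500 bp upstream / 50 bp downstream) can be illustrated with a small sketch. This is not RegulationSpotter's actual code. The function name, the use of a transcription start site (TSS) as the reference point, the inclusive window boundaries, and the coordinates in the usage lines are all assumptions made for illustration.

```python
def in_promoter_window(variant_pos, tss, strand, upstream=500, downstream=50):
    """Return True if variant_pos falls inside the promoter window around tss.

    Hypothetical helper: the window is measured relative to a transcription
    start site (tss) and its boundaries are treated as inclusive. Both
    choices are assumptions, not documented RegulationSpotter behaviour.
    """
    if strand == "+":
        # Plus strand: upstream means lower genomic coordinates.
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # Minus strand: upstream means higher genomic coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return start <= variant_pos <= end


# Made-up coordinates, for illustration only.
print(in_promoter_window(9_700, tss=10_000, strand="+"))   # 300 bp upstream
print(in_promoter_window(10_200, tss=10_000, strand="+"))  # 200 bp downstream
print(in_promoter_window(10_200, tss=10_000, strand="-"))  # 200 bp upstream on minus strand
```

The strand handling matters because "upstream" refers to the direction of transcription, not to the direction of increasing genomic coordinates.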

Example 3 (Whole Genome Sequencing data set): HG00096_plus_ClinVar

Here you can find a pre-analysed Whole Genome Sequencing data set from the 1000 Genomes Project, interspersed with extratranscriptic disease mutations from the ClinVar database.

You can try out different sorting options, e.g. sort by effect, which brings the ClinVar disease mutations to the top, or sort by position, which shows the first 200 variants starting from chromosome 1, regardless of the variant effect. By default, the first 200 variants matching the sorting and filter criteria are shown.
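The effect of the two sorting options and the 200-variant limit can be mimicked offline with a short sketch. It does not reproduce the RegulationSpotter interface. The record fields (`chrom`, `pos`, `effect`), the effect labels, the tie-breaking by position within the effect sort, and the third record are hypothetical; only the two example coordinates are taken from the text above.

```python
# Chromosome ranks so that chr1 sorts before chr2, chr10, chrX, and so on.
CHROM_ORDER = {str(n): n for n in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})


def position_key(variant):
    """Sort key: chromosome rank first, then position on the chromosome."""
    chrom = variant["chrom"]
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return (CHROM_ORDER.get(chrom, 99), variant["pos"])


def effect_key(variant):
    """Sort key: disease mutations first, then by genomic position (assumed tie-break)."""
    rank = 0 if variant["effect"] == "disease mutation" else 1
    return (rank, position_key(variant))


def first_page(variants, sort_by="position", limit=200):
    """Return the first `limit` variants under the chosen sort order."""
    key = effect_key if sort_by == "effect" else position_key
    return sorted(variants, key=key)[:limit]


variants = [
    {"chrom": "chr3", "pos": 11022826, "effect": "polymorphism"},
    {"chrom": "chr1", "pos": 160001799, "effect": "disease mutation"},
    {"chrom": "chr1", "pos": 1000, "effect": "polymorphism"},  # made-up record
]

by_effect = first_page(variants, sort_by="effect")
by_position = first_page(variants, sort_by="position")
print(by_effect[0])
print(by_position[0])
```

Sorting by effect places the disease mutation first, while sorting by position starts at the lowest coordinate on chromosome 1 regardless of effect.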


If you discover bugs or have suggestions or questions, please send an e-mail to
Jana Marie Schwarz (jana-marie.schwarz AT charite.de) or to
Dominik Seelow (dominik.seelow AT charite.de).
We also appreciate hearing about your general experiences using RegulationSpotter.