BRIEFINGS IN FUNCTIONAL GENOMICS. VOL 13. NO 1. 28–38
doi:10.1093/bfgp/elt031
Gene regulatory elements of the
cardiac conduction system
Karel van Duijvenboden, Jan M. Ruijter and Vincent M. Christoffels
Advance Access publication date 22 August 2013
Abstract
The coordinated contraction of the heart relies on the generation and conduction of the electrical impulse.
Aberrations of the function of the cardiac conduction system have been associated with various arrhythmogenic
disorders and increased risk of sudden cardiac death. The genetics underlying conduction system function have
been investigated using functional studies and genome-wide association studies. Both methods point towards the
involvement of ion channel genes and the transcription factors that govern their activity. A large fraction of disease- and trait-associated sequence variants lie within non-coding sequences, enriched with epigenetic marks indicative
of regulatory DNA. Although sequence conservation as a result of functional constraint has been a useful property
to identify transcriptional enhancers, this identification process has been advanced through the development of
techniques such as ChIP-seq and chromatin conformation capture technologies. The role of variation in gene regulatory elements in the cardiac conduction system has recently been demonstrated by studies on enhancers of
SCN5A/SCN10A and TBX5. In both studies, a region harbouring a functionally implicated single-nucleotide polymorphism was shown to drive reproducible cardiac expression in a reporter gene assay. Furthermore, the risk variant of
the allele abrogated enhancer function in both cases. Functional studies on regulatory DNA will likely receive a
boost through recent developments in genome modification technologies.
Keywords: cardiac conduction system; arrhythmias; enhancers; genetic variation; epigenetic variation; TBX transcription
factors
THE FUNCTION OF THE CARDIAC
CONDUCTION SYSTEM
Sudden cardiac death (SCD) from ventricular fibrillation, often in the context of coronary heart disease,
is a leading cause of death [1, 2]. The occurrence of
arrhythmia in general, including sinus node dysfunction, atrioventricular (AV) block and atrial fibrillation, represents a major worldwide health problem
[3]. The majority of such abnormal heart rhythms
develop with increasing age or in a setting of comorbidity, though a significant portion is present at
birth as a consequence of congenital heart defects,
the most prevalent type of birth defect [4]. The coordinated contractions of the atria and ventricles
require the rhythmic generation of a depolarizing
wave of electricity by the cardiac pacemaker and
the coordinated propagation of this wave by the cardiac conduction system. This process is capacitated
by localized cardiomyocyte specialization in the
different components of the heart. The compartment-specific differences in the expression of ion
channel and gap junction genes define the specific
Corresponding author. Vincent M. Christoffels, Department of Anatomy, Embryology & Physiology, Academic Medical Centre,
L2-108, Meibergdreef 15, 1105 AZ Amsterdam, the Netherlands. Tel.: +31 20 5667821; Fax: +31 20 6976177; E-mail:
[email protected]
Karel van Duijvenboden studied molecular and cellular biology and is currently a PhD student in the Christoffels lab in the
Academic Medical Center, Amsterdam. His research focuses on the development of in silico methods for the identification of
tissue-specific enhancers and analysis of cardiac gene regulatory networks.
Jan M. Ruijter trained as medical biologist and worked in endocrinology, neurobiology, biostatistics, ophthalmology and embryology.
He currently heads a research group studying cardiac gene expression and development with molecular, image analysis and
3D-reconstruction techniques.
Vincent M. Christoffels is head of the Department of Anatomy, Embryology and Physiology of the Academic Medical Center,
University of Amsterdam. He studies transcriptional regulatory mechanisms of development, focusing on the regulation of heart and
conduction system development.
© The Author 2013. Published by Oxford University Press. All rights reserved. For permissions, please email: [email protected]
[Figure 1 labels. ECG parameters: PR, QRS, QT (repolarization). Conduction system components: SAN, AVN, AVB, RBB, LBB, PVCS. Genes listed in the boxes: CAV1, SCN10A, HAND1, SCN5A, MEIS1, TBX20, TBX5, NKX2-5, CACNA1D, KCNQ1, SOX5, TBX3, DKK1, KCNE1, CDKN1A, NOS1AP, NF1A.]
Figure 1: Schematic representation of the surface ECG and components of the cardiac conduction system. The
cardiac conduction system consists of the sino-atrial node (SAN), atrioventricular node (AVN), atrioventricular
bundle (AVB), right bundle branch (RBB), left bundle branch (LBB) and the peripheral ventricular conduction
system (PVCS). The association between the cardiac conduction system components and the ECG parameters is
shown. GWAS identified genes near loci associated with variation in PR interval, QRS complex or QT interval.
The boxes list a selection of these genes.
electrophysiological characteristics of the distinct cell
populations that are involved in the generation of the
action potential and propagation of the impulse [5–7]
(Figure 1).
ECG REFLECTS CARDIAC
CONDUCTION SYSTEM FUNCTION
Variables derived from the electrocardiogram (ECG)
are indicative of the function of the cardiac conduction system. They can describe a range of intermediate phenotypes of arrhythmogenic disorders and
SCD risk [8]. The measurements routinely obtained
with ECG include heart rate, PR interval, QRS duration and QT interval. A strong correlation between
elevated heart rate and cardiovascular morbidity and
mortality has been established [9]. The PR interval
reflects conduction through the AV node. The QRS
complex records the depolarization of the ventricles
through the Purkinje system and ventricular myocardium. Deviations in these measures of electrical
activation have been associated with increased risk
of potentially lethal arrhythmias [10, 11]. The QT
interval reflects myocardial repolarization. Both tails
of the QT interval distribution are linked to high risk
of SCD. Numerous rare mutations in ion channel
genes give rise to known Mendelian long- and
short-QT syndromes [2, 12].
THE POWER AND LIMITATIONS
OF GWAS
In genetic epidemiology, genome-wide association
studies (GWAS) represent a research technique to
examine many common genetic variants in different
individuals to see if any variant is associated with a
trait, such as a parameter derived from the ECG, or a
major disease. These studies typically compare the
genomic sequences of case and control groups to
determine which single-nucleotide polymorphisms
(SNPs) are more frequent in people with the trait.
GWAS can also be used to study continuous traits
that are attributable to polygenic effects. The relative
contribution of a sequence variant to a complex
varying phenotype, such as blood pressure, is then
estimated using quantitative trait locus (QTL) mapping, wherein statistical methods are applied to differentiate between marker genotype groups.
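The comparison between marker genotype groups can be sketched as a regression of a quantitative trait on allele dosage at a single SNP. This is a minimal illustration, assuming an additive model; the function name `additive_effect` and the PR-interval values are hypothetical, and real GWAS pipelines additionally adjust for covariates and correct for multiple testing.

```python
from statistics import mean


def additive_effect(dosages, trait):
    """Least-squares slope of a trait on minor-allele dosage (0, 1 or 2)."""
    if len(dosages) != len(trait):
        raise ValueError("dosages and trait must have equal length")
    mean_x, mean_y = mean(dosages), mean(trait)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    return sxy / sxx


# Hypothetical data: PR interval (ms) of six individuals, two per genotype group.
dosages = [0, 0, 1, 1, 2, 2]
pr_ms = [150.0, 154.0, 158.0, 162.0, 166.0, 170.0]
beta = additive_effect(dosages, pr_ms)  # estimated change in ms per allele copy
```

The slope is the estimated per-allele effect on the trait; a value near zero would indicate no detectable difference between the genotype groups.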
To date a multitude of GWAS have identified
statistical associations between common sequence
variants and ECG variables [13–19]. Strikingly,
though not unexpectedly, the loci identified in
these GWAS are enriched with ion channel genes
and cardiac transcription factors (TFs) that govern
their activity. These TFs are also known to be critical
for the development of the heart and its conduction
system (e.g. [20, 21]). These sequence variants that
are statistically associated with electrophysiological
properties of the heart may thus identify the genetic
components of a gene regulatory network of TFs
and target genes controlling the function of the conduction system.
However, though the GWAS data greatly help in
pointing out the suspect areas of the genome, it is
often challenging to move from these statistical
associations to knowledge of the underlying biological mechanism that explains why a genomic interval correlates with a complex physiological trait.
Further complicating the functional annotation of
the identified genetic variants is the fact that
GWAS studies identify tag SNPs for regions in linkage disequilibrium (LD). In population genetics a
combination of alleles is in LD when their observed
haplotype frequencies deviate from the expected frequencies based on random genetic cross-over. When
alleles are in complete LD, QTL segregation is
impossible as the allele will always be in the same
marker genotype group in each GWAS comparison.
This hampers the identification of the precise variants
in each region that cause the statistical association
with the trait. In some cases the associated risk allele
resides in a protein-coding region, causing a different
amino-acid to be encoded. However, such clear-cut
causative biological mechanisms are a rare find. It has
been reported that between 88% and 93% of disease- and trait-associated variants emerging from GWAS
studies lie within non-coding sequences [22, 23].
Since these variants are reported to be concentrated
in NFkB binding sites [24] and deoxyribonuclease
I (DNaseI) hypersensitive sites [23], both markers
indicative of regulatory DNA, these high percentages
do not merely reflect the increased polymorphism of
non-coding DNA. Instead, such findings suggest a
pervasive involvement of variation in regulatory
DNA in common human disease.
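The notion of LD as a deviation of observed haplotype frequencies from those expected under independence can be made concrete with the standard coefficients D and r². The sketch below is illustrative only: the function name `ld_stats` and the haplotype counts are hypothetical, and the two loci are assumed to be biallelic with phased haplotypes.

```python
def ld_stats(hap_counts):
    """Return (D, r2) for two biallelic loci.

    hap_counts maps the haplotypes 'AB', 'Ab', 'aB', 'ab' to observed counts.
    """
    n = sum(hap_counts.values())
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_ab = hap_counts.get("AB", 0) / n
    p_a = (hap_counts.get("AB", 0) + hap_counts.get("Ab", 0)) / n
    p_b = (hap_counts.get("AB", 0) + hap_counts.get("aB", 0)) / n
    # D: observed haplotype frequency minus the frequency expected
    # if the alleles at the two loci were independent.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical counts: alleles always travel together (complete LD).
d_complete, r2_complete = ld_stats({"AB": 60, "Ab": 0, "aB": 0, "ab": 40})
# Hypothetical counts: all four haplotypes equally frequent (no LD).
d_none, r2_none = ld_stats({"AB": 25, "Ab": 25, "aB": 25, "ab": 25})
```

When r² equals 1, the genotype at one site fully predicts the other, which is why a tag SNP cannot be statistically separated from the causal variant it travels with.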
TRANSCRIPTIONAL REGULATION
BY CIS-REGULATORY ELEMENTS
The stage- and tissue-specific requirements of the
genes responsible for the generation and functioning
of the cardiac conduction system require complex
spatiotemporal gene expression patterns in these
tissues. In metazoans, an important part of gene
regulation occurs at the transcriptional level. The
docking of RNA polymerase to the promoters of
genes to initiate transcription is mediated by the
interactions between TFs and cis-regulatory DNA
sequences. TFs can be cell-lineage-specific, or
activated exclusively in the presence of external
signals, like hormones. Enhancers are a class of cis-regulatory elements that promote gene expression
but can be located in intergenic regions, introns
and exons, tens to hundreds of kilobases from their
target genes (e.g. [25]). The bulk of DNA is compacted into chromatin, but enhancers must be accessible to proteins and are, therefore, localized in
euchromatin regions with exposed DNA. The
regions of the genome harbouring enhancers may
require appropriate stimuli to become accessible.
To that end, histones, the proteins that constitute
nucleosomes, can be modified at different residues
by chromatin remodelling enzymes. A wide assortment of histone modifications allows for dynamic
repositioning of nucleosomes [26, 27]. For example,
the acetylation of lysine 27 at histone 3 (H3K27ac)
has been linked to active enhancers [28]. The acetyltransferase p300 is a catalyst for histone acetylation
and given this role, it has been a popular tool in
enhancer identification [29, 30]. The end result of
enhancer activation by nucleosome repositioning,
acetylation, TF binding and the docking of RNA
polymerase is gene transcription. Mounting evidence
supports a mechanism where enhancers directly contact their target gene promoters and initiate gene
expression by looping out of intervening chromatin
[31–33], reviewed in [34]. Enhancers typically span a
few hundred base pairs (bp) and are composed of
clusters of TF-specific binding sites, which facilitate
interaction with both trans-activating and -repressive
factors. Complex gene expression patterns in different tissues and time points can arise through biological integration of these different cues and physically
supplying them to their targeted gene promoters.
ENHANCER IDENTIFICATION
THROUGH CONSERVATION
It has been hypothesized that enhancers, being
important biological sequences, are conserved between species due to functional constraints. This assumption has been validated by a number of studies
that have been successful in identifying functional
sequences solely using inter-species genomic comparisons [35–38]. For example, a study by Visel
et al. [38] found that half (115/231) of the ultraconserved regions they tested drove reproducible
reporter gene expression in various tissues of the
developing mouse embryo. In contrast to these
findings, approaches using epigenomic cues have
illustrated how poorly conserved enhancers can be in
certain tissues [39–41]. Using extreme conservation
as the guideline, Blow et al. [39] were largely unsuccessful in identifying cardiac enhancers. However,
when they searched for cardiac enhancers using
p300 binding sites from embryonic day 11.5
(e11.5) mouse embryos, they reported a success
rate of 68% (89/130). Furthermore, the cardiac
p300 binding sites they uncovered showed lower
evolutionary conservation than their counterparts
found in brain and limb tissue. Low success rates of
evolutionary approaches in enhancer discovery can
be partly explained by limitations of the used assays,
in terms of selected time points of development, but
intuitively it makes sense that the small size of TF
binding sites (6–12 bp) results in highly localized and,
therefore, limited sequence constraints. On the other
hand, it could be argued that enhancers consisting of
multiple TF binding sites, arranged in modular fashion within large clusters, would be under considerable selection pressure.
CHIP-SEQ IS A POWERFUL TOOL
TO LOCATE ENHANCERS
Chromatin immunoprecipitation (ChIP) followed
by high-throughput DNA sequencing (ChIP-seq)
has become the gold standard approach to map protein–DNA interactions in vivo on a genome-wide
scale [42–44]. The ChIP-seq approach is independent of sequence conservation and directly targets the
epigenetic marks from the tissues of study. The result
of ChIP-seq analysis is the quantified occurrence of
DNA fragments, which reflect the genomic
occupancy by the factor through direct binding or
indirect binding via complex formation. Such quantitative maps of potential cis-regulatory binding
elements are useful for enhancer identification, as
proven through the use of enhancer reporter vectors
in vitro [45] and in vivo [29]. ChIP-seq on tissue-specific TFs provides useful marks for finding enhancers
that are active in the tissue of choice [43, 46–48].
Other commonly used marks for identifying putative
enhancers include p300, RNA polymerase 2
(Pol2), H3K27ac and H3 monomethylated at K4
(H3K4me1). Collective efforts, such as the
ENCODE project, have generated genome-wide
maps of histone modifications, TF binding and
DNaseI hypersensitivity in a variety of cell lines
and tissues [49–51]. Each of these datasets can
potentially be used to identify regulatory elements
in the non-coding genome.
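One simple way to combine several of these genome-wide maps is to intersect their peak intervals, so that regions carrying more than one mark stand out as candidate regulatory elements. The sketch below assumes each dataset has already been reduced to sorted, non-overlapping half-open intervals on a single chromosome; the function name `intersect` and all coordinates are hypothetical.

```python
def intersect(peaks_a, peaks_b):
    """Overlapping segments of two sorted lists of half-open (start, end) intervals."""
    shared = []
    i = j = 0
    while i < len(peaks_a) and j < len(peaks_b):
        start = max(peaks_a[i][0], peaks_b[j][0])
        end = min(peaks_a[i][1], peaks_b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if peaks_a[i][1] < peaks_b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical peak coordinates (bp) for two marks on the same chromosome.
p300_peaks = [(100, 300), (500, 700), (900, 1000)]
h3k27ac_peaks = [(250, 550), (950, 1200)]
candidates = intersect(p300_peaks, h3k27ac_peaks)
```

Intersecting further tracks (for example DNaseI hypersensitivity) amounts to applying the same function again to the result.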
4C-SEQ TECHNOLOGY
DEMONSTRATES REGULATORY
POTENTIAL OF GENOMIC
REGIONS
The development of chromatin conformation capture (3C) technology and its genome-wide derivatives have enabled the unbiased study of the spatial
organization of DNA into chromatin. The strategy
of 3C technology relies on digestion and religation
of fixed chromatin in cells [52]. The subsequent
quantification of ligation junctions allows insights
into DNA contact frequencies. Using this technology, a chromatin loop can be demonstrated to exist if
two distal sites on the same chromosome form more
ligation junctions with each other than with intervening sequences. Circular chromatin conformation
capture (4C) extended 3C technology to enable the
more unbiased screening of the genome from a
single viewpoint [53, 33]. In 4C, the ligated 3C template is processed with a second round of DNA digestion and ligation to create small DNA circles. An
inverse PCR using viewpoint-specific primers then
amplifies all sequences that contacted the chromosome at the chosen viewpoint. An even higher resolution and larger dynamic range can be achieved
when 4C is coupled with high-throughput DNA
sequencing (4C-seq) [54]. It is important to realize
that spatial proximity of a genomic region with a
promoter only conveys the possibility of an interaction between the two; alternatively, it could indicate the location of a poised enhancer or simply be
the result of non-functional stochastic DNA looping.
However, the demonstration of spatial proximity of
candidate enhancer and promoter regions using
4C-seq methodology constitutes a very powerful observation, given the fact that enhancer function
requires such proximity to the target promoter for
direct contact.
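The criterion stated above for a chromatin loop, namely that two distal sites form more ligation junctions with each other than with the intervening sequences, can be written as a short check. This is a deliberately simplified sketch: the function name `suggests_loop`, the fragment indexing and the junction counts are hypothetical, and real 3C/4C analyses model the distance-dependent background rather than comparing raw counts.

```python
def suggests_loop(junctions, viewpoint, target):
    """Apply the simple 3C loop criterion to one viewpoint.

    junctions maps a restriction-fragment index to the number of ligation
    junctions observed between that fragment and the viewpoint fragment.
    Returns True if the target has more junctions than every fragment
    lying strictly between the viewpoint and the target.
    """
    low, high = sorted((viewpoint, target))
    between = [junctions.get(f, 0) for f in range(low + 1, high)]
    if not between:
        return False  # adjacent fragments: nothing intervening to compare against
    return junctions.get(target, 0) > max(between)


# Hypothetical junction counts with the viewpoint at fragment 0.
counts = {1: 20, 2: 12, 3: 7, 4: 5, 5: 26}
distal_contact = suggests_loop(counts, viewpoint=0, target=5)
background_only = suggests_loop(counts, viewpoint=0, target=4)
```

As the text notes, a positive result only indicates spatial proximity; it does not by itself show that the contact is functional.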
GENETIC VARIATION
FUNCTIONALLY AFFECTS
SCN5A/SCN10A ENHANCER
A series of GWAS indicated that genetic variants
at loci harbouring genes for cardiac TFs
(including NKX2-5, TBX3 and TBX5) and ion channels (most notably SCN5A and SCN10A) modulate
cardiac impulse conduction and increase the risk of
arrhythmia [13–19]. These three TFs are essential for
heart development and function of the cardiac conduction system (reviewed in [20, 55, 56]). Several of
the genetic variants are located in non-coding
regions, leading van den Boogaard et al. [48] to hypothesize that these variants affect the function of
enhancers controlling the expression pattern of the
implicated nearby genes. In this study, two T-box-regulated enhancers were identified in the Scn5a/
Scn10a locus [48]. After performing ChIP-seq on
the cardiac TFs TBX3/TBX5, NKX2-5, GATA4
and the enhancer-associated factor p300, overlap
analysis (Figure 2A) revealed that a conserved
region in Scn10a contained multiple TF binding
sites and the SNP rs6801957 implicated by GWAS.
This SNP is directly positioned under a TBX3
ChIP-seq peak and the G–A substitution it represents is located in a highly conserved portion of the
T-box binding consensus sequence (Figure 2C).
When exposing the implicated sequence to TBX5
and TBX3 using in vitro experiments, the reported
responses were consistent with the function of
TBX5 as an activator [57] and TBX3 as a transcriptional repressor [58, 59]. The observed endogenous
expression patterns of Scn5a and Scn10a indeed correlate positively and negatively with those of Tbx5
and Tbx3, respectively. In vivo enhancer activity of
the implicated sequence was proven when the
human and mouse orthologous fragments were
cloned into an Hsp68-LacZ reporter vector. The
LacZ expression pattern driven by the regulatory
fragment closely resembled that of both endogenous
Scn5a and Scn10a, matching the future ventricular
conduction system components (Figure 2C). In contrast, the human fragment cloned with the risk allele
showed decreased enhancer driven reporter activity
in zebrafish and the loss of T-box-mediated response
in vitro, confirming that rs6801957 variation affects
enhancer function by perturbing a T-box site.
Individuals carrying the risk allele of this SNP may
express reduced levels of SCN5A/SCN10A (it has not
been defined which promoters are controlled by this
enhancer) suggesting a likely mechanism to explain
the observed association with the increased risk of
arrhythmia. Further data demonstrating T-box-mediated regulation of the Scn5a locus were provided
by Arnolds et al. [60]. This study shows that an enhancer 15 kb downstream of Scn5a loses the capacity
to drive cardiac LacZ expression upon mutating
three conserved T-box binding sites.
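How a single-nucleotide substitution can perturb a TF binding site may be illustrated by scoring a sequence against a consensus motif. The sketch below is a toy example: the consensus string, both sequences and the function name `best_match` are assumptions made for illustration and are not the T-box consensus or the rs6801957 flanking sequence reported in the cited studies.

```python
def best_match(sequence, consensus):
    """Slide the consensus along the sequence; return (offset, matches) of the best window."""
    k = len(consensus)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the consensus")
    best = (0, -1)
    for offset in range(len(sequence) - k + 1):
        window = sequence[offset:offset + k]
        matches = sum(a == b for a, b in zip(window, consensus))
        if matches > best[1]:
            best = (offset, matches)
    return best


CONSENSUS = "AGGTGTGA"        # toy 8-bp motif, an assumption for illustration
major_seq = "TTCAGGTGTGACC"   # hypothetical major-allele sequence
minor_seq = "TTCAGGTATGACC"   # same sequence carrying a single G>A substitution

major_hit = best_match(major_seq, CONSENSUS)
minor_hit = best_match(minor_seq, CONSENSUS)
```

In practice a position weight matrix would replace the exact-match count, so that substitutions at highly conserved motif positions are penalized more than those at degenerate positions.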
The dosage of a TF can be of critical importance
in development (e.g. Tbx5 [61] and Tbx3 [62]).
Such dosage sensitivity is conferred through the presence of multiple binding sites for a single TF in the
regulatory DNA of the target gene. Such modular
enhancer function allows for the fine-tuning of localized gene expression that is necessary for conduction
system function. Consistent with this paradigm is the
observation that the effect of the rs6801957 variant
on conduction times was very small, implying that
multiple genetic variants together are required to
result in a cumulative, significant effect on sodium
channel gene expression. It is likely that other genetic variants in non-coding DNA identified by
GWAS will influence the cardiac conduction
system in a similar way. Their cumulative effect
may then lead to disease.
GENETIC VARIATION IN TBX5
ENHANCER LEADS TO
CONGENITAL HEART DISEASE
More evidence for the role of regulatory variation in
cardiac function and disease came from a study by
Smemo et al. [63]. Several coding mutations
in TBX5 are known to cause Holt–Oram syndrome
[64]. Holt–Oram syndrome patients invariably show
malformations in the upper limbs and have an
increased risk of congenital heart defects. The incomplete penetrance of the cardiac phenotype suggests that additional triggers are necessary. Due to the
reported dosage-sensitive function of TBX5, Smemo
et al. [63] screened the TBX5 locus for developmental
enhancers. The 700 kb gene desert that encompasses
TBX5 was narrowed down for regulatory potential
using 3 LacZ recombined mouse bacterial artificial
chromosomes (BACs), 2 of which showed cardiac
expression consistent with the endogenous TBX5
expression profile. Cues from evolutionary conservation and from cardiac ChIP-seq datasets (generated
by the Pu laboratory [47]) were used to further pinpoint individual enhancers within the genomic
window indicated by the BACs. Smemo et al. [63]
selected and screened 19 candidate enhancers of
which 18 overlap with genomic areas that show
high conservation between chicken and mouse
(Figure 2B). Using Hsp68-LacZ reporter vectors, six
elements were shown to drive reproducible cardiac
expression. This implies that candidate approaches
relying on evolutionary conservation can still prove
successful to identify cardiac enhancers, despite the
[Figure 2 panel labels. (A, B) Genome browser tracks: Tested elements; Conserved elements with Chicken; DNaseI; p300; NKX2-5; TBX3; GATA4. (C) Mouse Scn5a/Scn10a enhancer F1-2 (major) and human SCN5A/SCN10A enhancer F1-2 (major), with la, lv, ra, rv and ivs marked; bar chart of % hearts expressing GFP (axis 0 to 80) for rs6801957 alleles G and G>A. (D) Human TBX5 enhancer 9 (major, G) and human TBX5 enhancer 9 (minor, G>T). Embryonic stages shown on the panels: E10.5, E11.5, E12.5, E14.5. Embryo counts shown on the panels: 7/9, 8/9, 1/11.]
Figure 2: Overview of the regulatory elements discussed in this review. (A, B) UCSC genome browser overviews
of the Scn5a/Scn10a locus studied in [48] (A) and the Tbx5 locus studied in [63] (B). In both panels, the upper track
shows the tested elements, where arrow heads indicate confirmed regulatory elements; black regions did not drive
detectable expression in the assays used. Functional assays of the elements denoted with a C and D are shown in
panels (C) and (D), respectively. The track containing genomic regions corresponding with high evolutionary conservation between mouse and chicken was obtained from the UCSC genome browser database. The DNase1 track depicts the results of a DNase1 hypersensitivity assay; the p300 track depicts the results of a p300 ChIP-seq. Both
were performed on adult heart and made available by the ENCODE consortium [51]. The NKX2-5, TBX3 and
GATA4 tracks depict the results of ChIP-seq with antibodies for these TFs performed on adult mouse heart by
van den Boogaard et al. [48]. (C, D) Enhancer activity in vivo using reporter constructs. (C) The C element [panel
(A), top track] is shown to drive reproducible expression in the interventricular septum for both the human and
mouse orthologue. The element contains the rs6801957 SNP, whose major allele is highly conserved in evolution. The
minor allele showed diminished enhancer activity in a zebrafish GFP-reporter assay. Pictures courtesy of M. van
den Boogaard. ra, right atrium; la, left atrium; rv, right ventricle; lv, left ventricle; ivs, interventricular septum. (D)
The Tbx5 enhancer D [panel (B), top track] is shown to drive reporter gene expression at E11.5 in ventricular myocardium, consistent with endogenous Tbx5. A patient with a VSD possessed an SNP at an evolutionary conserved
position in this enhancer. The element containing the minor sequence variant showed abrogated cardiac enhancer
function. Pictures adapted from Smemo et al. [63], copyright owned by Oxford University Press.
reported lower degree of evolutionary conservation
of this sub-type of enhancers [39]. Three enhancers
showed patterns largely consistent with cardiac
TBX5 expression (Figure 2D), albeit that, interestingly, the left-ventricular-restricted pattern of the
endogenous gene was not recapitulated.
The genomic regions of the three enhancers were
sequenced in a patient cohort consisting of Brazilian
children born with isolated congenital heart defects.
One patient diagnosed with a non-syndromic ventricular septal defect (VSD) possessed a homozygous
G–T substitution in an enhancer driving expression
in the ventricular septum. This may impact on cardiac conduction as the AV bundle is established by
Tbx5 and Tbx3 in the ventricular septum [60, 65,
66]. After excluding protein-coding mutations in this
patient as the cause for the VSD, this low frequency
variant (0.3% in the Brazilian population) was tested
for enhancer function in vivo. Whereas the wild-type
allele showed reproducible cardiac expression in 7
out of 8 transgenic founder mice, the variant allele
showed cardiac expression in 1 out of 11 cases only.
Moreover, the latter expression was weak and in a
different compartment, the atrial free wall. These
results were confirmed with a zebrafish assay.
Taken together, these findings demonstrate that the
low frequency variant abrogates the correct cardiac
expression of this enhancer, likely explaining its link
to congenital malformations. Furthermore, the
modular function of this enhancer was demonstrated.
The patient with the enhancer variant did not show
any other Holt–Oram syndrome symptoms, indicating that the non-cardiac repertoire of TBX5 function was unaffected, which is consistent with the
observation that this enhancer exclusively drives
cardiac expression in reporter assays.
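The difference between the founder counts quoted above (7 of 8 wild-type versus 1 of 11 variant transgenic founders with cardiac expression) can be checked with a Fisher exact test. This calculation is not reported in the source; it is a sketch using only the counts given in the text, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k "successes" in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Rows: wild-type allele, variant allele.
# Columns: founders with cardiac expression, founders without.
p_value = fisher_exact_two_sided(7, 1, 1, 10)
```

With such small numbers of founders an exact test is preferable to a chi-squared approximation.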
INTEGRATIVE METHODS TO
LOCATE ENHANCERS
As the role of regulatory elements in common
human disease is becoming more established, methods to reliably identify such enhancers are in high
demand. Through ChIP-seq tens of thousands of TF
binding sites and dynamic histone modifications for a
wealth of factors and tissues are found. How many of
the biochemical events we observe in a ChIP-seq
experiment actually contribute to gene regulation
remains an open question. In the absence of a
single mark that identifies all active enhancers in a
given tissue, the challenge to reliably identify these
elements is best undertaken through efficient integration of the various cues available. The majority of
ChIP-seq peak calling algorithms, including MACS
[67], CisGenome [68], PeakSeq [69], SPP [70] and
Sole-Search [71] use so-called input controls to correct for sampling bias in the data. Input controls are
generated by performing ChIP-seq without an antibody and help identify regions where read density is
high despite the absence of biological signal. The use
of input controls can be questioned, as protein–antibody interactions have a pronounced effect on the
bias. In input controls such interactions are absent,
leading to a different bias that cannot be used for
proper modelling of the background of an antibody-driven ChIP-seq assay [72]. Consequently,
the use of input controls can cause genuine peaks to be
incorrectly labelled as false positives and therefore
ignored. The incorrect elimination of such regions
would decrease the power of any integrative
method, which would ultimately rely on the overlap
of a wealth of different cues. Although integrative
analysis on data generated by ChIP-seq has been
applied [47, 73], a systematic framework enabling
such analysis is still lacking.
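The kind of correction an input control provides can be sketched as a per-bin ratio of ChIP signal to input signal after scaling both libraries to the same depth. This is not the algorithm used by MACS or the other tools named above; it is a deliberately simple illustration, and the function name `fold_enrichment`, the bin counts and the pseudocount are assumptions.

```python
def fold_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin ChIP/input ratio after scaling both libraries to reads per million."""
    if len(chip_counts) != len(input_counts):
        raise ValueError("both tracks must cover the same bins")
    chip_total, input_total = sum(chip_counts), sum(input_counts)
    if chip_total == 0 or input_total == 0:
        raise ValueError("each library needs at least one read")
    chip_scale = 1e6 / chip_total
    input_scale = 1e6 / input_total
    # The pseudocount avoids division by zero in bins without input reads.
    return [(c * chip_scale + pseudocount) / (i * input_scale + pseudocount)
            for c, i in zip(chip_counts, input_counts)]


# Hypothetical read counts in three genomic bins.
chip = [10, 10, 80]
control = [30, 30, 40]
enrichment = fold_enrichment(chip, control)
```

A bin with high read density in both tracks receives a ratio near or below 1 and is not called as a peak, which is the intended behaviour; the concern raised in the text is that the input background may differ systematically from the true background of an antibody-driven experiment.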
NEW GENOME EDITING
TECHNOLOGIES TO FURTHER
UNRAVEL THE FUNCTIONS OF
REGULATORY DNA
Perhaps an even harder challenge than completing
the annotation of all enhancers in the genome, is
finding out which ones actually affect fitness.
Until recently, methods to interrogate enhancer
function were very limited, costly and time-consuming. However, powerful new approaches for functional genomic applications have been developed.
With artificial transcription activator-like effector
nucleases (TALENs), single-stranded DNA oligonucleotides can be used to precisely modify
sequences at predefined locations in the genome
through the induction of double-strand breaks and
subsequent repair through non-homologous end
joining (NHEJ) or homology-directed repair
[74, 75].
Bacteria and archaea resist invading viruses and
plasmids through an RNA-based immune system
using clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated
(Cas) proteins [76]. Recently, this CRISPR/Cas
system has been adapted for efficient multiplexed
genome editing [77, 78]. With these possibilities to
efficiently target the genome in a direct way, new
avenues have been opened to generate insights in
enhancer-mediated phenotypes. These developments will also facilitate the functional assessment
of other types of regulatory elements, such as insulators and repressors, which are traditionally hard to
interrogate with functional studies.
CONCLUSIONS
The development and deployment of the most
recent technologies and approaches, such as
GWAS, ChIP-seq, 3C and 4C, TALEN and
CRISPR is continuously advancing the identification and functional characterization of regulatory
[Figure 3 labels. Enhancer identification: sequence constraint; 3C & 4C technology; ChIP-seq; DNaseI hypersensitivity; GWAS; integration. Reporter gene assays: cell culture transfection assay; transient transgenic embryos; high-throughput functional analysis. Genome modification: TALEN; CRISPR/Cas9. Enhancer function. Phenotypic read-outs: light; mRNA sequencing.]
Figure 3: Overview of the enhancer identification and characterization methods described in this review.
Non-coding regions of the genome can be screened for the presence of regulatory regions using a combination of
GWAS data, evolutionary conservation, ChIP-seq, 3C and 4C technologies and DNaseI hypersensitivity assays.
Functional characterization of (putative) enhancers can be performed using in vitro luciferase assays and in vivo
reporter gene assays. With the development of flexible new genome modification techniques, such as TALEN and
CRISPR, regulatory regions can be altered directly, greatly facilitating functional studies on regulatory DNA.
Picture of transgenic embryo adapted from the VISTA Enhancer Browser (http://enhancer.lbl.gov/).
DNA elements (summarized in Figure 3). This has
led to improved understanding of the role that
regulatory DNA plays in transcriptional regulation
and ultimately in the establishment of phenotypes
that could predispose to disease. In line with these
developments, genetic variations in non-coding
DNA have been found that influence cardiac conduction, and the first enhancers have now been identified that mechanistically link these variations to the
functioning of the cardiac conduction system. Such
studies are establishing a model for understanding the
molecular pathology of cardiac conduction system
disease.
Key Points
Functioning of the cardiac conduction system relies on compartment-specific ion channel and gap junction gene expression.
Genetic variation in gene regulatory elements can influence conduction system function through the aberration of fine-tuned cardiac transcription factor networks.
Cataloguing the location and function of all relevant gene regulatory elements is an ongoing effort and is advanced by continuing technological developments.
Acknowledgement
We thank Malou van den Boogaard for kindly providing the
images of Figure 2C.
Funding
KvD is supported by a special fellowship of the Academic
Medical Centre, Amsterdam.
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
Huikuri HV, Castellanos A, Myerburg RJ. Sudden death
due to cardiac arrhythmias. N EnglJ Med 2001;345:1473–82.
Bezzina CR, Pazoki R, Bardai A, et al. Genome-wide
association study identifies a susceptibility locus at 21q21
for ventricular fibrillation in acute myocardial infarction.
Nat Genet 2010;42:688–91.
Shah AJ, Liu X, Jadidi AS, et al. Early management of atrial
fibrillation: from imaging to drugs to ablation. Nat Rev
Cardiol 2010;7:345–54.
Hoffman JI, Kaplan S. The incidence of congenital heart
disease. J Am Coll Cardiol 2002;39:1890–900.
Schram G, Pourrier M, Melnyk P, et al. Differential distribution of cardiac ion channel expression as a basis for regional specialization in electrical function. Circ Res 2002;90:
939–50.
Moorman AFM, Christoffels VM. Cardiac chamber formation: development, genes and evolution. Physiol Rev 2003;
83:1223–67.
Marionneau C, Couette B, Liu J, et al. Specific pattern of
ionic channel gene expression associated with pacemaker
activity in the mouse heart. J Physiol 2005;562:223–34.
Kolder IC, Tanck MW, Bezzina CR. Common genetic
variation modulating cardiac ECG parameters and susceptibility to sudden cardiac death. J Mol Cell Cardiol 2012;52:
620–9.
Jouven X, Empana JP, Schwartz PJ, et al. Heart-rate profile
during exercise as a predictor of sudden death. N EnglJ Med
2005;352:1951–8.
Cheng S, Keyes MJ, Larson MG, et al. Long-term outcomes
in individuals with prolonged PR interval or first-degree
atrioventricular block. JAMA 2009;301:2571–7.
Hesse B, Diaz LA, Snader CE, et al. Complete bundle
branch block as an independent predictor of all-cause mortality: report of 7,073 patients referred for nuclear exercise
testing. AmJ Med 2001;110:253–9.
Newton-Cheh C, Shah R. Genetic determinants of QT
interval variation and sudden cardiac death. Curr Opin
Genet Dev 2007;17:213–21.
Holm H, Gudbjartsson DF, Arnar DO, et al. Several
common variants modulate heart rate, PR interval and
QRS duration. Nat Genet 2010;42:177–22.
Sotoodehnia N, Isaacs A, de Bakker PI, et al. Common
variants in 22 loci are associated with QRS duration and
cardiac ventricular conduction. Nat Genet 2010;42:1068–76.
Newton-Cheh C, Eijgelsheim M, Rice KM, etal. Common
variants at ten loci influence QT interval duration in the
QTGEN Study. Nat Genet 2009;41:399–406.
Chambers JC, Zhao J, Terracciano CM, et al. Genetic variation in SCN10A influences cardiac conduction. Nat Genet
2010;42:149–52.
Pfeufer A, van NC, Marciante KD, et al. Genome-wide
association study of PR interval. Nat Genet 2010;42:153–9.
Butler AM, Yin X, Evans DS, et al. Novel loci associated
with PR interval in a genome-wide association study of 10
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
African American cohorts. Circ Cardiovasc Genet 2012;5:
639–46.
den Hoed M, Eijgelsheim M, Esko T, et al. Identification of
heart rate-associated loci and their effects on cardiac conduction and rhythm disorders. Nat Genet 2013.
Christoffels VM, Smits GJ, Kispert A, et al. Development of
the pacemaker tissues of the heart. Circ Res 2010;106:
240–54.
Munshi NV. Gene regulatory networks in cardiac conduction system development. Circ Res 2012;110:1525–37.
Hindorff LA, Sethupathy P, Junkins HA, et al. Potential
etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci
USA 2009;106:9362–7.
Maurano MT, Humbert R, Rynes E, et al. Systematic
localization of common disease-associated variation in regulatory DNA. Science 2012;337:1190–5.
Karczewski KJ, Dudley JT, Kukurba KR, et al. Systematic
functional regulatory assessment of disease-associated variants. Proc Natl Acad Sci USA 2013.
Kleinjan DA, van Heyningen V. Long-range control of
gene expression: emerging mechanisms and disruption in
disease. AmJ Hum Genet 2005;76:8–32.
Jenuwein T, Allis CD. Translating the histone code. Science
2001;293:1074–80.
Ruthenburg AJ, Li H, Patel DJ, et al. Multivalent engagement of chromatin modifications by linked binding modules. Nat Rev Mol Cell Biol 2007;8:983–94.
Creyghton MP, Cheng AW, Welstead GG, et al. Histone
H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci USA 2010;107:
21931–6.
Visel A, Blow MJ, Li Z, et al. ChIP-seq accurately
predicts tissue-specific activity of enhancers. Nature 2009;
457:854–8.
Heintzman ND, Hon GC, Hawkins RD, et al. Histone
modifications at human enhancers reflect global cell-typespecific gene expression. Nature 2009;459:108–12.
Su W, Porter S, Kustu S, et al. DNA-looping and enhancer
activity: association between DNA-bound NtrC activator
and RNA polymerase at the bacterial glnA promoter. Proc
Natl Acad Sci USA 1990;87:5504–8.
Deng W, Lee J, Wang H, et al. Controlling long-range
genomic interactions at a native locus by targeted tethering
of a looping factor. Cell 2012;149:1233–44.
Simonis M, Klous P, Splinter E, et al. Nuclear organization
of active and inactive chromatin domains uncovered by
chromosome conformation capture-on-chip (4C). Nat
Genet 2006;38:1348–54.
Kulaeva OI, Nizovtseva EV, Polikanov YS, et al. Distant
activation of transcription: mechanisms of enhancer action.
Mol Cell Biol 2012;32:4892–7.
Dubchak I, Brudno M, Loots GG, et al. Active conservation
of noncoding sequences revealed by three-way species comparisons. Genome Res 2000;10:1304–6.
Nobrega MA, Ovcharenko I, Afzal V, et al. Scanning
human gene deserts for long-range enhancers. Science
2003;302:413.
Pennacchio LA, Ahituv N, Moses AM, et al. In vivo enhancer analysis of human conserved non-coding sequences.
Nature 2006;444:499–502.
Gene regulatory elements of the cardiac conduction system
38. Visel A, Prabhakar S, Akiyama JA, et al. Ultraconservation
identifies a small subset of extremely constrained developmental enhancers. Nat Genet 2008;40:158–60.
39. Blow MJ, McCulley DJ, Li Z, et al. ChIP-seq identification
of weakly conserved heart enhancers. Nat Genet 2010;42:
806–10.
40. May D, Blow MJ, Kaplan T, et al. Large-scale discovery of
enhancers from human heart tissue. Nat Genet 2011.
41. Schmidt D, Wilson MD, Ballester B, et al. Five-vertebrate
ChIP-seq reveals the evolutionary dynamics of transcription
factor binding. Science 2010;328:1036–40.
42. Barski A, Cuddapah S, Cui K, et al. High-resolution profiling of histone methylations in the human genome. Cell
2007;129:823–37.
43. Robertson G, Hirst M, Bainbridge M, et al. Genome-wide
profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat
Methods 2007;4:651–7.
44. Johnson DS, Mortazavi A, Myers RM, et al. Genome-wide
mapping of in vivo protein-DNA interactions. Science 2007;
316:1497–502.
45. Heintzman ND, Stuart RK, Hon G, et al. Distinct and predictive chromatin signatures of transcriptional promoters
and enhancers in the human genome. Nat Genet 2007;39:
311–8.
46. Junion G, Spivakov M, Girardot C, et al. A transcription
factor collective defines cardiac cell fate and reflects lineage
history. Cell 2012;148:473–86.
47. He A, Kong SW, Ma Q, et al. Co-occupancy by multiple
cardiac transcription factors identifies transcriptional enhancers active in heart. Proc Natl Acad Sci USA 2011;108:5632–7.
48. van den Boogaard M, Wong LY, Tessadori F, et al. Genetic
variation in T-box binding element functionally affects
SCN5A/SCN10A enhancer. J Clin Invest 2012;122:
2519–30.
49. The ENCODE (ENCyclopedia Of DNA Elements)
Project. Science 2004;306:636–40.
50. Birney E, Stamatoyannopoulos JA, Dutta A, et al.
Identification and analysis of functional elements in 1% of
the human genome by the ENCODE pilot project. Nature
2007;447:799–816.
51. Stamatoyannopoulos JA, Snyder M, Hardison R, et al. An
encyclopedia of mouse DNA elements (Mouse ENCODE).
Genome Biol 2012;13:418.
52. Dekker J, Rippe K, Dekker M, et al. Capturing chromosome conformation. Science 2002;295:1306–11.
53. Zhao Z, Tavoosidana G, Sjolinder M, et al. Circular
chromosome conformation capture (4C) uncovers extensive
networks of epigenetically regulated intra- and interchromosomal interactions. Nat Genet 2006;38:1341–7.
54. Splinter E, de WE, Nora EP, et al. The inactive X chromosome adopts a unique three-dimensional conformation
that is dependent on Xist RNA. Genes Dev 2011;25:
1371–83.
55. Hoogaars WMH, Barnett P, Moorman AFM, et al. T-box
factors determine cardiac design. Cell Mol Life Sci 2007;64:
646–60.
56. McCulley DJ, Black BL. Transcription factor pathways and
congenital heart disease. CurrTop Dev Biol 2012;100:253–77.
57. Bruneau BG, Nemer G, Schmitt JP, et al. A murine model
of Holt-Oram syndrome defines roles of the T-box
58.
59.
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
72.
73.
74.
75.
37
transcription factor TBX5 in cardiogenesis and disease.
Cell 2001;106:709–21.
Hoogaars WM, Engel A, Brons JF, et al. Tbx3 controls the
sinoatrial node gene program and imposes pacemaker function on the atria. Genes Dev 2007;21:1098–112.
Horsthuis T, Buermans HP, Brons JF, et al. Gene expression
profiling of the forming atrioventricular node using a novel
Tbx3-based node-specific transgenic reporter. Circ Res
2009;105:61–9.
Arnolds DE, Liu F, Fahrenbach JP, et al. TBX5 drives Scn5a
expression to regulate cardiac conduction system function.
J Clin Invest 2012;122:2509–18.
Mori AD, Zhu Y, Vahora I, et al. Tbx5-dependent rheostatic control of cardiac gene expression and morphogenesis.
Dev Biol 2006;297:566–86.
Frank DU, Carter KL, Thomas KR, etal. Lethal arrhythmias
in Tbx3-deficient mice reveal extreme dosage sensitivity of
cardiac conduction system function and homeostasis. Proc
Natl Acad Sci USA 2011;109:E154–63.
Smemo S, Campos LC, Moskowitz IP, et al. Regulatory
variation in a TBX5 enhancer leads to isolated congenital
heart disease. Hum Mol Genet 2012;21:3255–63.
Basson CT, Bachinsky DR, Lin RC, et al. Mutations
in human TBX5 (corrected) cause limb and cardiac
malformation in Holt-Oram syndrome. Nat Genet 1997;15:
30–5.
Bakker ML, Boukens BJ, Mommersteeg MTM, et al.
Transcription factor Tbx3 is required for the specification
of the atrioventricular conduction system. Circ Res 2008;102:
1340–9.
Moskowitz IP, Kim JB, Moore ML, et al. A molecular pathway including id2, tbx5, and nkx2-5 required for cardiac
conduction system development. Cell 2007;129:1365–76.
Zhang Y, Liu T, Meyer CA, et al. Model-based analysis of
ChIP-seq (MACS). Genome Biol 2008;9:R137.
Ji H, Jiang H, Ma W, etal. An integrated software system for
analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol
2008;26:1293–300.
Rozowsky J, Euskirchen G, Auerbach RK, et al. PeakSeq
enables systematic scoring of ChIP-seq experiments relative
to controls. Nat Biotechnol 2009;27:66–75.
Valouev A, Johnson DS, Sundquist A, et al. Genome-wide
analysis of transcription factor binding sites based on ChIPseq data. Nat Methods 2008;5:829–34.
Blahnik KR, Dou L, O’Geen H, et al. Sole-Search: an
integrated analysis program for peak detection and functional annotation using ChIP-seq data. Nucleic Acids Res
2010;38:e13.
Cheung MS, Down TA, Latorre I, et al. Systematic bias in
high-throughput sequencing data and its correction by
BEADS. Nucleic Acids Res 2011;39:e103.
Bolouri H, Ruzzo WL. Integration of 198 ChIP-seq datasets reveals human cis-regulatory regions. J Comput Biol
2012;19:989–97.
Cermak T, Doyle EL, Christian M, et al. Efficient design
and assembly of custom TALEN and other TAL effectorbased constructs for DNA targeting. Nucleic Acids Res 2011;
39:e82.
Bedell VM, Wang Y, Campbell JM, et al. In vivo genome
editing using a high-efficiency TALEN system. Nature 2012;
491:114–8.
38
van Duijvenboden et al.
76. Horvath P, Barrangou R. CRISPR/Cas, the immune
system of bacteria and archaea. Science 2010;327:167–70.
77. Cong L, Ran FA, Cox D, et al. Multiplex genome engineering using CRISPR/Cas systems. Science 2013;339:
819–23.
78. Wang H, Yang H, Shivalila CS, et al. One-step generation of mice carrying mutations in multiple genes by
CRISPR/Cas-mediated genome engineering. Cell 2013;
153:910–8.