Download Functional implications of genetic variation in non

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

DNA polymerase wikipedia , lookup

Replisome wikipedia , lookup

DNA nanotechnology wikipedia , lookup

Zinc finger nuclease wikipedia , lookup

DNA repair protein XRCC4 wikipedia , lookup

United Kingdom National DNA Database wikipedia , lookup

Microsatellite wikipedia , lookup

Helitron (biology) wikipedia , lookup

Transcript
Clinical Science (2003) 104, 493–501 (Printed in Great Britain)
GLAXOSMITHKLINE/MRS PAPER
Functional implications of genetic variation in
non-coding DNA for disease susceptibility
and gene regulation*
Julian C. KNIGHT
Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Headington, Oxford OX3 7BN, U.K.
A
B
S
T
R
A
C
T
The role of host genetic variation in determining susceptibility to complex disease traits is the
subject of much research effort, but it often remains unclear whether disease-associated genetic
polymorphisms are themselves functionally relevant or acting only as markers within an
extended haplotype. Experimental approaches to investigate the functional impact of polymorphisms in non-coding regulatory DNA sequences for gene expression are discussed, including
the role of gel-shift assays, DNA footprinting and reporter gene analysis. The limitations of
different experimental approaches are presented together with future prospects for in vivo
analysis. The strategic application of these functional approaches is discussed and illustrated
by analysis of the role of genetic variation in the tumour necrosis factor promoter region in
determining susceptibility to severe malaria.
INTRODUCTION
The genetic factors that determine our individual susceptibility to complex disease traits have proved difficult
to localize despite significant research effort. Gene effects
are likely to involve multiple susceptibility loci of
individually modest magnitude, limiting the application
of linkage analysis. The candidate gene approach has been
widely used to analyse possible associations between
genetic variants and disease outcome, with selection
of genes based on a priori knowledge of disease pathogenesis. However, both linkage analysis and association
studies typically leave open the question of whether a
disease-associated genetic variant is functionally important or serving only as a genetic marker with the functional
locus co-inherited on the polymorphic allele.
Most genetic variation comprises single base changes in
the DNA sequence, known as single nucleotide polymor-
phisms (SNPs), occurring approximately once every 1000
nucleotides [1,2] (Figure 1). Where this occurs in coding
DNA, the implications may be readily apparent as a
change in the amino acid make-up of the translated
protein [3]. Even where the polymorphism remains silent
at the protein level, as in the case of synonymous
mutations, the effect of the polymorphism can be assayed
at the level of mRNA. This is based on the fact that when
a polymorphism occurs in coding DNA it will be present
in the transcribed mRNA, allowing the relative abundance of the two alleles in cells from an individual
heterozygous for a given SNP to be assayed. In contrast,
assaying the functional effect of polymorphisms occurring in non-coding DNA is more problematic. Here,
most attention has focused on SNPs occurring in putative
regulatory regions in which a base change in the DNA
sequence may affect the process of gene expression.
Polymorphisms occurring in promoter regions upstream
* This paper was presented at the GlaxoSmithKline\MRS Young Investigator session at the MRS Meeting, Royal College of
Physicians, London on 4 February 2002.
Key words : gene regulation, genetic susceptibility, polymorphism, tumour necrosis factor.
Abbreviations : CAT, chloramphenicol acetyl transferase ; EMSA, electrophoretic mobility-shift assay ; NF-κB, nuclear factor κB ;
Oct, octamer ; SNP, single nucleotide polymorphism ; TNF, tumour necrosis factor.
Correspondence : Dr Julian Knight (e-mail julian!well.ox.ac.uk).
# 2003 The Biochemical Society
493
494
J. C. Knight
the analysis of the naturally occurring combination of
polymorphisms on a given allele that may, in turn, reveal
their true biological significance.
In this paper, I will outline some of the experimental
approaches used to define the functional significance of
genetic polymorphism and illustrate these using examples
from the published literature and through work on SNPs
in the promoter region of the TNF gene, the gene
encoding tumour necrosis factor. The clinical significance
of this work lies in identifying candidate functional SNPs
that may have a role in determining disease susceptibility.
An increasing amount of genomic information is now
available with large numbers of SNPs in the public
domain through SNP discovery efforts such as the SNP
Consortium [2]. While this facilitates fine mapping of
regions by providing increasing numbers of markers
for linkage disequilibrium mapping, it also highlights
Figure 1 Example of an SNP comprising a G A substitution the current need for more research effort in defining the
Electropherograms produced by fluorescence-based sequencing using an ABI 3700 functional relevance of polymorphisms, so that this can
showing the genomic DNA from an individual homozygous for G at the site of the be used either to allow a rational choice of SNPs for
SNP (a) and an individual homozygous for A (b). The base substitution is denoted inclusion in genetic epidemiological studies or to aid the
by an arrow.
interpretation of apparent disease-associated SNPs.
of genes may potentially affect the process of transcription whereby RNA polymerase, the enzyme responsible for transcribing the DNA into mRNA, is
recruited to the gene and the nascent mRNA synthesized.
This complex process requires the co-ordinated action of
multiple regulatory proteins through complex protein–
DNA and protein–protein interactions [4]. Variation in
the DNA sequence may potentially alter the affinities of
existing protein–DNA interactions or, indeed, recruit
new proteins to bind to the DNA, altering the specificity
and kinetics of the transcriptional process [5–8].
Analysis of putative functional SNPs in regulatory
regions has focused on techniques which assay protein–
DNA interactions, such as DNA footprinting and gelshift assays, and on those measuring gene expression,
such as reporter gene experiments [9,10]. However, such
methods provide predominately in vitro evidence of the
potential effects of a change in DNA sequence on gene
expression. The process of control of gene expression
operates at many levels, not least of which is the
chromatin structure within which DNA is packaged and
whose particular state of modification profoundly affects
accessibility to regulatory transcription factor proteins
and thus levels of expression. Thus, while important
insights into potential modulation of gene expression
may be gained using in vitro techniques, it is only by
analysis of SNPs in their natural in vivo context that it is
possible to gain a clear appreciation of their biological
importance. Furthermore, not only is it important to
investigate the consequences of DNA sequence variation
within the naturally occurring milieu of transcriptional
machinery, but also in the context of the naturally
occurring haplotype within which the SNP occurs. It is
# 2003 The Biochemical Society
APPROACHES IN DEFINING FUNCTIONAL
EFFECTS OF POLYMORPHISMS IN NONCODING DNA
In vitro assays of protein–DNA interactions
A typically postulated mode of action whereby DNA
sequence variation may exert a functional effect on gene
regulation is through an effect on protein–DNA interactions. Regulatory transcription factor proteins show a
high degree of specificity in their DNA-binding domains
for the DNA sequence to which they will bind. This
consensus DNA sequence is typically 5–8 nucleotides
in length and specific to a given family of transcription
factors ; for example, members of the octamer (Oct)
family of transcription factors recognize the DNA
sequence ATGCAAAT [11]. The DNA sequence
corresponding to the two allelic forms of a given SNP
can thus be inspected for its inclusion of any potential consensus binding motifs, for example, derived
using online transcription factor databases, e.g. the
TRANSFAC transcription factor database (http:\\www.
generegulation.de\). Quantitative models predicting the
effects of SNPs on the binding affinity of a transcription
factor for the DNA-binding motifs have been developed,
for example, for the important regulatory transcription
factor nuclear factor κB (NF-κB) [13].
The electrophoretic mobility-shift assay (EMSA ; gelshift assay) [14] provides a comparatively simple, rapid
and extremely sensitive technique for studying protein–
DNA interactions in vitro. In this technique, short DNA
sequences are synthesized, typically 20 nucleotides in
Functional implications of genetic variation in non-coding DNA
Figure 2
Principle of the EMSA
(a) The binding site of interest is synthesized as a short radiolabelled DNA probe which can be used to identify both known and novel factors binding to the candidate
region. Once bound to DNA, a protein–DNA complex is stabilized when subjected to non-denaturing PAGE, allowing resolution of protein–DNA complexes as discrete
bands. (b) The specificity of the interaction may be investigated by competition experiments in which typically 10- or 100-fold excess unlabelled probe is added, which,
in the case of a specific competitor probe, results in progressively less radiolabelled probe bound by the transcription factor protein.
length, which match the SNP in the two allelic forms.
These are radiolabelled and used as bait to which
transcription factors may potentially bind, retarding the
migration of the radiolabelled DNA when these are
electrophoresed through a gel (Figure 2a). Entry of free
DNA and DNA–protein complexes into the gel matrix
by electrophoresis results in physical separation ; in
general, the larger the protein bound, the greater the
retardation of mobility within the gel. Stability is
enhanced by low salt conditions in the gel and a cage
effect created by the gel matrix. Non-specific interactions
are minimized by adding synthetic co-polymers such as
poly(dI-dC)–(dI-dC) [15]. Preliminary experiments may
involve use of a crude nuclear extract prepared from an
appropriate cell line under stimulation conditions of
interest, based, for example, on the putative cell type in
which the SNP may be exerting an effect. In the case of
SNPs in the TNF gene, for example, macrophages are
known to be a major source of TNF production in the
course of infection and may be an appropriate cell type to
study. It is important to be aware that interactions
observed in preliminary experiments may not be specific
and this can be resolved by competition experiments, as
outlined in Figure 2(b). Increasing amounts of unlabelled
competitor to DNA probe are added to a fixed amount of
labelled DNA probe. Excess unlabelled competitor probe
identical with the labelled probe should abolish the
complexes observed from the now relatively small
proportion of labelled probe in the reaction. In contrast,
if the unlabelled competitor probe does not contain a
high-affinity site for the protein it will not affect the
protein–DNA interaction. Further information can also
be obtained using antibodies to specific candidate transcription factors. Binding of antibodies to the DNA# 2003 The Biochemical Society
495
496
J. C. Knight
binding protein results in slower mobility through the
gel, ‘ supershifting ’ the complex : indeed, the complex
may disappear if the antibody binds to a domain of the
transcription factor that is required for DNA binding.
Binding affinities can also be investigated in more detail
in titration experiments with either crude nuclear extract
or recombinant protein. For example, clear differential
binding affinity was observed for NF-κB family members
with the TNF promoter SNP at k863 nt which lies in an
NF-κB-binding site [7]. It is important to remember,
however, that EMSA experiments can provide information that a change in the DNA sequence due to the
SNP has the capacity in vitro to affect protein–DNA
interactions, but not that these are actually occurring
in vivo.
The EMSA technique has been applied to the functional characterization of many SNPs. A number of
important host genetic determinants of susceptibility to
malaria infection have been identified, including possession of the Duffy blood group antigen, which functions both as a chemokine receptor and as a receptor for
erythrocyte invasion by Plasmodium vivax malarial
parasites [16,17]. In the absence of the Duffy antigen,
erythrocytes are resistant to invasion by malarial parasites.
This phenotype, designated Duffy negative, occurs because the DARC gene (encoding Duffy antigen receptor
for chemokines) is not transcribed in erythrocytes.
However, in these individuals, Duffy mRNA and antigen
are expressed in non-erythroid tissues. The molecular
basis for this was elucidated by analysis of regulation of
the DARC gene. In Duffy-negative individuals, a T C
SNP in the DARC promoter region results in erythroidspecific repression of gene expression [5]. EMSA experiments demonstrated that this occurs because the
polymorphism abolishes the binding of the erythroidspecific transcription factor GATA-1, which was associated with a 20-fold reduction in gene expression.
A limitation of the EMSA technique is that the region
of DNA of interest must be known and that the length of
probe used is short (typically 20–30 nucleotides) to allow
good resolution on the gel. This does not allow for cooperation between sites that may be required for binding
and, if the binding site is unknown, it is not an efficient
screening method. The technique of DNA footprinting
offers considerable advantages in this respect as it can be
used to generate a map of sites of protein–DNA interaction occurring in a region, including novel sites of
interaction. In a region containing several SNPs, this
therefore allows the investigator to identify SNPs lying
in or near sites of protein–DNA interaction which merit
further investigation. DNA footprinting (or protection
mapping) is based on cleavage of a DNA sequence with
either a chemical reagent or an enzyme. The DNA region
of interest, which may be several hundred nucleotides in
length, is synthesized as an oligoduplex probe and
radiolabelled at one end. Careful titration ensures that
# 2003 The Biochemical Society
Figure 3
DNaseI footprinting assay
Radiolabelled DNA probes are digested with DNaseI in the absence or presence of
proteins ; protein binding to DNA is seen as a region of protection in the ladder
of DNA fragments. In this example, increasing amounts of crude nuclear extract are
added to the radiolabelled DNA probe prior to addition of enzyme. Regions of
protection (‘ footprints ’) are denoted schematically to the right of the gel.
each molecule of the DNA is cleaved on average only
once at a random site, resulting in a ladder of DNA
fragments of varying size when subjected to denaturing
PAGE (Figure 3). Each band in the DNA ladder is
representative of a specific nucleotide where the cleavage
agent has cut the labelled strand of DNA, and can be
localized by concomitantly running a Maxam–Gilbert
sequencing reaction of the same DNA probe alongside
the cleavage products. In the presence of protein forming
a binding complex with the DNA, the nucleotides to
which the protein is bound will be protected from
cleavage. This produces a gap or ‘ footprint ’ in the ladder
of labelled DNA fragments on electrophoresis, allowing
Functional implications of genetic variation in non-coding DNA
Figure 4
Principles of reporter gene design
Cloning sites for insertion of gene sequence to test are shown (>) 5h and 3h to
the reporter gene. The antibiotic resistance gene and prokaryotic origin of
replication permit propagation and selection of the vector in E. coli. Poly(A),
polyadenylated.
specific localization of the site of protein–DNA interaction.
DNaseI footprinting [18] uses DNaseI as the cleavage
agent of the DNA molecules. This enzyme does not
completely cleave the DNA duplex, but rather will
randomly cut only one strand of the DNA. DNaseI
binds to the minor groove of the DNA and cuts the
phosphodiester backbone of each strand independently
[19]. On binding by proteins, the DNA is protected from
hydrolysis catalysed by DNaseI, producing the characteristic footprint. In addition, structural changes may
make nucleotides adjacent to the bound region more
vulnerable to digestion, appearing more intensely on the
autoradiograph as ‘ hypersensitive ’ sites. The footprinting technique allows for the detection of multiple
binding sites on the DNA fragment. Probes need to have
high specific activity, and protein concentrations must
be high enough to saturate binding to the DNA in order to
ensure that potential ‘ footprint ’ areas are not masked by
unbound digested fragments. Careful titration of DNaseI
concentration and exposure time is required to ensure
one nick occurs per molecule of DNA probe to try to
ensure an even ladder. Unfortunately, preferential sites of
cleavage by DNaseI mean the ladder tends to be uneven,
but differences between protein-protected and naked
DNA ladders are usually clear. The size of the DNaseI
enzyme limits the resolution of the footprints. These may
be defined further using other reagents, such as exonuclease III, while the individual nucleotides involved in
binding can also be assessed by methylation interference
footprinting.
Functional assays of transcriptional activity
The most commonly used assay to investigate the
functional effects of SNPs on gene expression is transient
transfection. This approach involves constructing a plasmid in which the putative regulatory region spanning the
position of the SNP(s) from the gene of interest is linked
to the coding sequence for an unrelated reporter gene [20]
(Figure 4). For the purpose of comparing SNPs occurring
in a gene promoter region, for example, the promoter
fragment can be placed directly upstream of the reporter
gene in a plasmid vector lacking endogenous promoter
activity and then constructs with the two allelic forms of
the SNP are compared. Following transfer into cells,
measurement of the reporter gene product gives an
indirect estimate of the induction in gene expression
directed by the regulatory sequences. A range of reporter
genes are available and generally aim to (i) produce a
protein normally absent from the host cell ; (ii) be
quantifiable in a simple, rapid, reproducible and sensitive
assay with a broad linear range ; and (iii) not alter the
physiology of the recipient cells. The major limitation of
transient transfection assays is that, within a transfected
cell, the plasmid DNA exists in a highly artificial
configuration with variable copy number that can profoundly influence results, leading, for example, to inactivity or dysregulation of regulatory elements [21].
A variety of techniques can be employed to introduce
(‘ transfect ’) the DNA into cells. Careful optimization is
required as every mammalian cell type has a characteristic
set of requirements for introduction of foreign DNA.
Divalent cations, such as calcium and magnesium, were
initially found to promote uptake of DNA into bacteria
and this led to the development of the technique of
calcium phosphate transfection [22,23]. A second chemical method, DEAE-dextran transfection [24,25], is simpler to perform and more reproducible than calcium
phosphate-mediated transfection, although it is not
suitable for production of stably transfected cell lines,
and DEAE-dextran can be directly toxic to cells. The
coupling of a positively charged chemical group DEAE
(diethylaminoethyl) to an inert carbohydrate polymer
(dextran) results in sticking of co-incubated DNA via its
negatively charged phosphate groups to cell surfaces,
from where they may be taken into cells by endocytosis.
In some cell types, transfection may be enhanced by a
glycerol or DMSO shock after the cells are exposed to
DNA–DEAE-dextran complexes [26]. Other methods
available for introducing DNA into cells include electroporation in which cells in solution with DNA are
subjected to a brief electrical pulse that causes pores to
transiently open in their membranes, allowing DNA
to diffuse into the cells [27]. Liposomes can also be used to
mediate transfection with DNA incorporated into artificial lipid vesicles that fuse with the cell membrane [28].
Only very rarely will DNA transferred into a cell nucleus
be stably incorporated into the genome of the transfected
cell (1 in 10$–10' cells) [29]. However, many more cells,
up to 50 %, will take up DNA on transfection but fail to
integrate it, with DNA persisting for several days in the
nucleus and being subject to the regulatory machinery
that controls expression of endogenous genes. This
transient expression allows assays of regulatory elements
when these are incorporated in the transfected DNA
reporter gene construct.
# 2003 The Biochemical Society
497
498
J. C. Knight
Quantification of the reporter gene protein by enzymic
or immunological means allows an indirect estimate of
the transcriptional activity of the reporter vector encoding the protein. One of the earliest assays was based on
the bacterial gene CAT encoding the enzyme chloramphenicol acetyl transferase [30]. The CAT gene is derived
from transposon 9 of Escherichia coli and catalyses
the transfer of the acetyl group from acetylCoA to the
substrate chloramphenicol. Enzymic activity can be
quantified on the basis of acetylated chloramphenicol ;
for example, by following the conversion of ["%C]
chloramphenicol into its acetyl derivatives using TLC on
silica gels by autoradiography or by organic extraction
and scintillation counting, or by non-radioactive methods
such as ELISA. The CAT reporter protein is stable and
not found in eukaryotes, but assays are typically timeconsuming with relatively low sensitivity and a narrow
linear range. Among more recently developed reporter
gene assays, a highly sensitive rapid method measures the
enzyme luciferase and is based on the oxidation of beetle
luciferin with production of photons of light, which are
quantified using a luminometer over a broad linear
range [31].
Transient transfection assays provide valuable information regarding potential modulation of gene expression by DNA sequence variation resulting from SNPs.
However, it must be appreciated that they represent an
in vitro technique in which a naked DNA sequence is
assayed in the absence of what may be its normal
transcriptional machinery and of regulatory elements,
such as enhancers, which may be a significant distance
from the promoter itself. Assays often show a high
degree of variability, a major cause of which is variation
in transfection efficiency [32]. Co-transfection of a
control reporter gene vector can be used to normalize for
transfection efficiency within or between transfection
experiments. The control reporter gene vector has a
different gene product from the vector containing the
DNA of interest and is typically driven by a strong
constitutive promoter. The choice of which portions of
the naturally occurring regulatory regions of the gene
locus, within which the polymorphism is found, to include
in the reporter gene may profoundly influence results.
For example, inclusion of the TNF 3h-untranslated region
(‘ UTR ’) appears to significantly influence observed
results for the TNF− SNP [33]. Similarly, inclusion of
$!)
other SNPs in the reporter construct may be important,
as any functional effect of an SNP may also be dependent
on its naturally occurring haplotype of associated polymorphisms. Results of transient transfection assays have
also been found to be highly context specific with respect
to the choice of cell type to transfect, the stimulus used
for induction of gene expression, the mode and efficiency
of transfection and the DNA plasmid design and preparation. These differences in experimental design may
have significant consequences for data interpretation and
# 2003 The Biochemical Society
underlie much of the apparent controversy in the
literature concerning the effects of a given SNP on
transcription.
Stable transfection offers some solutions to the problems encountered with transient transfection assays.
Here, the plasmid containing the reporter gene with
the DNA region of interest is stably incorporated into the
genome, with selection of cells in which this has occurred
based, for example, on a drug-resistance gene present
within the same plasmid. Incorporation into the genome
offers significant advantages in that this is a more natural
chromatin environment and copy number, but the
approach is more technically demanding and takes several
weeks to complete, because of the requirement for drug
selection and cell expansion. An important consideration
is that the transcriptional activity of the DNA sequence
of interest will be significantly influenced by the accessibility of the site into which it has integrated, for
example, in terms of chromatin configuration. This
requires analysis of multiple clones and may well exclude
its usefulness for most SNPs in which a modest difference
in expression for the two alleles is predicted, as clone-toclone variation is likely to be greater than 2- to 3-fold [9].
An alternative approach to transfection assays is to use
a cell-free system to perform in vitro transcription [34].
This approach is of significant value in the detailed
biochemical characterization of a regulatory element and
may be most appropriate when a hypothesis exists as to
the likely mechanism of action of an SNP, for example, in
terms of recruitment of a specific transcription factor
in the presence of a particular DNA base change. Early
assays utilized a naked DNA template but, if practicable,
it may be most appropriate to recreate a chromatin
template to more fully replicate events occurring in vivo
with this reconstituted system. Even in this situation,
specific regulatory proteins may not be fully active in an
in vitro assay of this type, because of their low concentration and absence of cofactors essential for their
function.
In vivo assays to determine function
One approach to defining the functional significance of
SNPs would be to take cells from individuals of differing
genotype and correlate this with phenotype, for example,
TNF promoter genotype with levels of TNF protein
secreted by peripheral blood mononuclear cells following
induction. This type of experiment is, however, potentially confounded at many levels. Individuals may
differ at a particular SNP of interest, but may be
polymorphic at other positions in the region, confounding the interpretation of the results. Comparison between
individuals should ideally be based on knowledge of the
underlying haplotypes, so that meaningful interpretations can be made. Any observed variation between
individuals for a given phenotype may also arise from
Functional implications of genetic variation in non-coding DNA
Figure 5
TNF promoter region
The 1.2 kb TNF promoter region is highly polymorphic and subject to interaction with multiple regulatory proteins which, in turn, recruit cofactors and RNA polymerase II (RNAP II). Accessibility of the region is dependent on chromatin configuration and, for illustration, a nucleosome is shown in the distal region around which the
DNA is coiled. A cluster of SNPs found in the distal TNF promoter is also shown.
multiple environmental and genetic factors other than the
genotype under study, which may potentially confound
results. Transcribed SNPs offer the opportunity to
compare within cells from an individual heterozygous for
a given SNP the relative abundance of mRNA originating
from the two alleles discriminated by the SNP using
allele-specific reverse transcription–PCR. Unfortunately,
this approach is confined to SNPs occurring in exonic
regions, which limits its applicability.
Research in the field of gene regulation has yielded
several novel in vivo approaches which may aid characterization of SNPs in vivo, but which are, at this time,
technically challenging. One approach is to perform
genomic DNA footprinting in order to visualize sites
of protein–DNA interaction occurring in living cells
[35,36]. Dimethyl sulphate (‘ DMS ’) is a chemical which
rapidly enters cells, modifying exposed guanine residues
in DNA which are then susceptible to cleavage with
piperidine. In the presence of a DNA-binding protein,
DNA is protected from modification and subsequent
cleavage, resulting in a footprint in the ladder of DNA
fragments. The challenge is how to visualize this as, at a
genomic level, the amounts of DNA involved are very
small and the cut fragments not amenable to conventional
PCR. The technique of ligation-mediated PCR has been
successfully employed, ligating a linker oligonucleotide
to enable amplification. The results of such genomic
footprinting are extremely valuable as an in vivo record
of protein–DNA interactions actually occurring, but are
typically not clear-cut and have yet to be successfully
employed in characterization of SNPs to demonstrate an
allele-specific difference. In vivo protein–DNA interactions could also be potentially investigated based on
knowledge of the probable proteins involved using
chromatin immunoprecipitation [37]. In this situation,
specific antibodies would be used to immunoprecipitate
chromatin to which proteins have been previously
crosslinked in vivo using a reagent such as formaldehyde ;
levels of occupancy at a specific gene can then be assessed
using gene-specific primers to amplify the immunoprecipitated DNA.
A strategic approach : functional analysis of
a TNF polymorphism
A variety of experimental methodologies can be adopted
to investigate the functional significance of SNPs. The
TNF promoter region is typical of many such regulatory
regions of DNA in that it contains multiple transcription
factor-binding sites with highly context-specific regulation such that particular elements are more important in
certain cell types and in response to specific stimuli. The
majority of the transcription factor-binding sites lie in
the first 200 nucleotides upstream of the transcriptional
start site, whereas the majority of SNPs lie more distally
(Figure 5). As part of an investigation of genetic determinants of susceptibility to severe malaria, TNF has been
investigated as a candidate gene based on a priori
knowledge of its role in disease pathogenesis. In order to
identify potentially functional SNPs, the initial strategy
was to map out potential sites of protein–DNA interactions in this highly polymorphic region using in vitro
DNA footprinting in a human macrophage cell line. This
revealed a novel region of protein–DNA interactions
spanning a G A SNP at k376 nt relative to the
transcriptional start site (TNF− ) [6]. In the presence of
$('
the adenine substitution, an additional complex was seen
on EMSA which was present constitutively. Further
characterization, including using UV crosslinking to
determine the molecular mass of the complex, led to the
identification of the protein as Oct-1, a member of
the Oct family of transcription factors.
This provided evidence of allele-specific recruitment of
a transcription factor to a novel region of complex
protein–DNA interactions in the TNF promoter. Reporter gene analysis in the same macrophage cell lines
demonstrated that the G A polymorphism at TNF−
$('
resulted in a modest increase in level of gene expression,
either in isolation or as part of its naturally occurring
# 2003 The Biochemical Society
499
500
J. C. Knight
haplotype with the G A SNP at k238 nt. With
experimental evidence of functional modulation of transcriptional regulation, two independent case-control
studies in The Gambia and Kenya were used to first
generate and then confirm the hypothesis that possession
of TNF− was associated with increased susceptibility
$('
to cerebral malaria [6]. In a large case-control study from
The Gambia, multiple logistic regression analysis of
cerebral malaria cases versus controls demonstrated an
odds ratio of 4.3 (95 % confidence intervals 1.5–12.8)
for TNF− A after inclusion of potentially predictive
$('
variables, such as ethnicity, other TNF polymorphisms
and linked HLA types. A similar result was observed in
Kenya, where TNF− A as a predictive variable for
$('
cerebral malaria cases versus controls was associated with
an odds ratio of 4.6 (95 % confidence intervals 1.3–15.7).
Further characterization is required using in vivo approaches, such as genomic footprinting and chromatin
immunoprecipitation, to determine if the postulated
mechanism of allele-specific recruitment of Oct-1 is
actually occurring in living cells.
represents a naturally constructed experiment that may in
the future yield profound insights into transcriptional
regulation.
ACKNOWLEDGMENTS
This work is funded by the Medical Research Council.
REFERENCES
1
2
3
4
5
FUTURE PROSPECTS
The role of host genetics in determining susceptibility to
complex disease traits is the subject of much active
investigation across the field of clinical medicine. The
completion of the genome project, combined with
ongoing large-scale SNP identification projects, will
ensure a huge number of genetic markers for both
genome-wide genetic association studies and for fine
mapping of disease susceptibility loci. High-throughput
genotyping utilizing the most informative markers based
on knowledge of haplotypic structure will facilitate a
logical approach to this, but the scale of research effort
required in terms of the numbers of markers and study
population sizes remains daunting. A clearer understanding of the functional significance of SNPs stands to
significantly benefit this work, both by potentially
defining which SNPs are of most value to include in such
studies as candidates and in the interpretation of results.
Any genetic susceptibility study has as its premise that the
genetic variation functionally affects disease outcome ;
the challenge is how to resolve this. As has been discussed,
a variety of in vitro techniques are available which can
provide important clues as to potential mechanisms of
action. Equally clear, however, is the problem that such
assays may not reflect what is actually occurring in vivo.
The field of transcriptional regulation research is showing
us that DNA exists in a highly dynamic environment
within which multiple complex regulatory mechanisms
are operating. Although this insight represents a significant challenge to our attempts to define the functional
significance of genetic polymorphism, it also offers an
opportunity, as studying the effects of an SNP in vivo
# 2003 The Biochemical Society
6
7
8
9
10
11
12
13
14
15
16
17
Wang, D. G., Fan, J. B., Siao, C. J. et al. (1998) Largescale identification, mapping, and genotyping of singlenucleotide polymorphisms in the human genome. Science
(Washington, D.C.) 280, 1077–1082
Sachidanandam, R., Weissman, D., Schmidt, S. C. et al.
(2001) A map of human genome sequence variation
containing 1.42 million single nucleotide polymorphisms.
Nature (London) 409, 928–933
Ramensky, V., Bork, P.and Sunyaev, S. (2002) Human
non-synonymous SNPs : server and survey. Nucleic
Acids Res. 30, 3894–3900
Orphanides, G. and Reinberg, D. (2002) A unified theory
of gene expression. Cell (Cambridge, Mass.) 108, 439– 451
Tournamille, C., Colin, Y., Cartron, J. P. and Le Van
Kim, C. (1995) Disruption of a GATA motif in the Duffy
gene promoter abolishes erythroid gene expression in
Duffy-negative individuals. Nat. Genet. 10, 224–228
Knight, J. C., Udalova, I., Hill, A. V. S. et al. (1999) A
polymorphism that affects OCT-1 binding to the TNF
promoter region is associated with severe malaria.
Nat. Genet. 22, 145–150
Udalova, I. A., Richardson, A., Denys, A. et al. (2000)
Functional consequences of a polymorphism affecting
NF-κB p50-p50 binding to the TNF promoter region.
Mol. Cell. Biol. 20, 9113–9119
Farzaneh-Far, A., Davies, J. D., Braam, L. A. et al. (2001)
A polymorphism of the human matrix γ-carboxyglutamic
acid protein promoter alters binding of an activating
protein-1 complex and is associated with altered
transcription and serum levels. J. Biol. Chem. 276,
32466–32473
Carey, M. and Smale, S. T. (2000) Transcriptional
regulation in eukaryotes : concepts, strategies and
techniques, Cold Spring Harbor Laboratory Press,
New York
Latchman, D. S. (1999) Transcription factors. A practical
approach (Hames, B. D., ed.), Oxford University Press,
Oxford
Falkner, F. G. and Zachau, H. G. (1984) Correct
transcription of an immunoglobulin κ gene requires an
upstream fragment containing conserved sequence
elements. Nature (London) 310, 71–74
Reference deleted
Udalova, I. A., Mott, R., Field, D. and Kwiatkowski, D.
(2002) Quantitative prediction of NF-κB DNA–protein
interactions. Proc. Natl. Acad. Sci. U.S.A. 99, 8167–8172
Fried, M. and Crothers, D. M. (1981) Equilibria and
kinetics of lac repressor-operator interactions by
polyacrylamide gel electrophoresis. Nucleic Acids Res. 9,
6505–6525
Singh, H., Sen, R., Baltimore, D. and Sharp, P. A. (1986)
A nuclear factor that binds to a conserved sequence motif
in transcriptional control elements of immunoglobulin
genes. Nature (London) 319, 154–158
Miller, L. H., Mason, S. J., Dvorak, J. A., McGinnis,
M. H. and Rothman, K. I. (1975) Erythrocyte receptors
for (Plasmodium knowlesi) malaria : Duffy blood group
determinants. Science (Washington, D.C.) 189, 561–563
Horuk, R., Chitinis, C. E., Darbonne, W. C. et al. (1993)
A receptor for the malarial parasite Plasmodium vivax :
the erythrocyte chemokine receptor. Science
(Washington, D.C.) 261, 1182–1184
Functional implications of genetic variation in non-coding DNA
18
19
20
21
22
23
24
25
26
27
Galas, D. J. and Schmitz, A. (1978) DNAase footprinting :
a simple method for the detection of protein–DNA
binding specificity. Nucleic Acids Res. 5, 3157–3170
Suck, D., Lahm, A. and Oefner, C. (1988) Structure
refined to 2 AH of a nicked DNA octanucleotide complex
with DNaseI. Nature (London) 332, 464– 468
Alam, J. and Cook, J. L. (1990) Reporter genes :
application to the study of mammalian gene transcription.
Anal. Biochem. 188, 245–254
Smith, C. L. and Hager, G. L. (1997) Transcriptional
regulation of mammalian genes in vivo. A tale of two
templates. J. Biol. Chem. 272, 27493–27496
Graham, F. L. and van der Eb, A. J. (1973) A new
technique for the assay of infectivity of human
adenovirus 5 DNA. Virology 52, 456–467
Wigler, M., Pellicer, A., Silverstein, S. and Axel, R. (1978)
Biochemical transfer of single-copy eukaryotic genes
using total cellular DNA as donor. Cell (Cambridge,
Mass.) 14, 725–731
Vaheri, A. and Pagano, J. S. (1965) Infectious poliovirus
RNA : a sensitive method of assay. Science (Washington,
D.C.) 175, 434
McCutchan, J. H. and Pagano, J. S. (1968) Enhancement
of the infectivity of simian virus 40 deoxyribonucleic acid
with diethylaminoethyl-dextran. J. Natl. Cancer Inst. 41,
351–357
Lopata, M. A., Cleveland, D. W. and Sollner-Webb, B.
(1984) High-level expression of a chloramphenicol
acetyltransferase gene by DEAE-dextran-mediated DNA
transfection coupled with a dimethylsulphoxide or
glycerol shock treatment. Nucleic Acids Res. 12,
5707–5717
Wong, T. K. and Neumann, E. (1982) Electric field
mediated gene transfer. Biochem. Biophys. Res.
Commun. 107, 584–587
28
29
30
31
32
33
34
35
36
37
Felgner, P. L., Gadek, T. R., Holm, M. et al. (1987)
Lipofection : a highly efficient lipid-mediated
DNA\transfection proceedure. Proc. Natl. Acad.
Sci. U.S.A. 84, 7413–7417
Chen, C. and Okayama, H. (1987) High efficiency
transformation of mammalian cells by plasmid DNA.
Mol. Cell. Biol. 7, 2745–2752
Gorman, C. M., Moffat, L. F. and Howard, B. H. (1982)
Recombinant genomes which express chloramphenicol
acetyltransferase in mammalian cells. Mol. Cell. Biol. 2,
1044–1051
de Wet, J. R., Wood, K. V., DeLuca, M., Helinski, D. R.
and Subramani, S. (1987) Firefly luciferase gene : structure
and expression in mammalian cells. Mol. Cell. Biol. 7,
725–737
Hollon, T. and Yoshimura, F. K. (1989) Variation
in enzymatic transient gene expression assays.
Anal. Biochem. 182, 411– 418
Allen, R. D. (1999) Polymorphism of the human TNF-α
promoter : random variation or functional diversity ?
Mol. Immunol. 36, 1017–1027
Weil, P. A., Luse, D. S., Segall, J. and Roeder, R. G.
(1979) Selective and accurate initiation of transcription
at the Ad2 major late promoter in a soluble system
dependent on purified RNA polymerase II and DNA.
Cell (Cambridge, Mass.) 18, 469– 484
Mueller, P. R. and Wold, B. (1989) In vivo footprinting
of a muscle specific enhancer by ligation mediated PCR.
Science (Washington, D.C.) 246, 780–786
Becker, P. B., Weih, F. and Schutz, G. (1993)
Footprinting of DNA-binding proteins in intact cells.
Methods Enzymol. 218, 568–587
Orlando, V., Strutt, H. and Paro, R. (1997) Analysis
of chromatin structure by in vivo formaldehyde
cross-linking. Methods 11, 205–214
Received 31 October 2002/3 January 2003; accepted 6 January 2003
Published as Immediate Publication 6 January 2003, DOI 10.1042/CS20020304
# 2003 The Biochemical Society
501