Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Clinical Science (2003) 104, 493–501 (Printed in Great Britain) GLAXOSMITHKLINE/MRS PAPER Functional implications of genetic variation in non-coding DNA for disease susceptibility and gene regulation* Julian C. KNIGHT Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Headington, Oxford OX3 7BN, U.K. A B S T R A C T The role of host genetic variation in determining susceptibility to complex disease traits is the subject of much research effort, but it often remains unclear whether disease-associated genetic polymorphisms are themselves functionally relevant or acting only as markers within an extended haplotype. Experimental approaches to investigate the functional impact of polymorphisms in non-coding regulatory DNA sequences for gene expression are discussed, including the role of gel-shift assays, DNA footprinting and reporter gene analysis. The limitations of different experimental approaches are presented together with future prospects for in vivo analysis. The strategic application of these functional approaches is discussed and illustrated by analysis of the role of genetic variation in the tumour necrosis factor promoter region in determining susceptibility to severe malaria. INTRODUCTION The genetic factors that determine our individual susceptibility to complex disease traits have proved difficult to localize despite significant research effort. Gene effects are likely to involve multiple susceptibility loci of individually modest magnitude, limiting the application of linkage analysis. The candidate gene approach has been widely used to analyse possible associations between genetic variants and disease outcome, with selection of genes based on a priori knowledge of disease pathogenesis. However, both linkage analysis and association studies typically leave open the question of whether a disease-associated genetic variant is functionally important or serving only as a genetic marker with the functional locus co-inherited on the polymorphic allele. Most genetic variation comprises single base changes in the DNA sequence, known as single nucleotide polymor- phisms (SNPs), occurring approximately once every 1000 nucleotides [1,2] (Figure 1). Where this occurs in coding DNA, the implications may be readily apparent as a change in the amino acid make-up of the translated protein [3]. Even where the polymorphism remains silent at the protein level, as in the case of synonymous mutations, the effect of the polymorphism can be assayed at the level of mRNA. This is based on the fact that when a polymorphism occurs in coding DNA it will be present in the transcribed mRNA, allowing the relative abundance of the two alleles in cells from an individual heterozygous for a given SNP to be assayed. In contrast, assaying the functional effect of polymorphisms occurring in non-coding DNA is more problematic. Here, most attention has focused on SNPs occurring in putative regulatory regions in which a base change in the DNA sequence may affect the process of gene expression. Polymorphisms occurring in promoter regions upstream * This paper was presented at the GlaxoSmithKline\MRS Young Investigator session at the MRS Meeting, Royal College of Physicians, London on 4 February 2002. Key words : gene regulation, genetic susceptibility, polymorphism, tumour necrosis factor. Abbreviations : CAT, chloramphenicol acetyl transferase ; EMSA, electrophoretic mobility-shift assay ; NF-κB, nuclear factor κB ; Oct, octamer ; SNP, single nucleotide polymorphism ; TNF, tumour necrosis factor. Correspondence : Dr Julian Knight (e-mail julian!well.ox.ac.uk). # 2003 The Biochemical Society 493 494 J. C. Knight the analysis of the naturally occurring combination of polymorphisms on a given allele that may, in turn, reveal their true biological significance. In this paper, I will outline some of the experimental approaches used to define the functional significance of genetic polymorphism and illustrate these using examples from the published literature and through work on SNPs in the promoter region of the TNF gene, the gene encoding tumour necrosis factor. The clinical significance of this work lies in identifying candidate functional SNPs that may have a role in determining disease susceptibility. An increasing amount of genomic information is now available with large numbers of SNPs in the public domain through SNP discovery efforts such as the SNP Consortium [2]. While this facilitates fine mapping of regions by providing increasing numbers of markers for linkage disequilibrium mapping, it also highlights Figure 1 Example of an SNP comprising a G A substitution the current need for more research effort in defining the Electropherograms produced by fluorescence-based sequencing using an ABI 3700 functional relevance of polymorphisms, so that this can showing the genomic DNA from an individual homozygous for G at the site of the be used either to allow a rational choice of SNPs for SNP (a) and an individual homozygous for A (b). The base substitution is denoted inclusion in genetic epidemiological studies or to aid the by an arrow. interpretation of apparent disease-associated SNPs. of genes may potentially affect the process of transcription whereby RNA polymerase, the enzyme responsible for transcribing the DNA into mRNA, is recruited to the gene and the nascent mRNA synthesized. This complex process requires the co-ordinated action of multiple regulatory proteins through complex protein– DNA and protein–protein interactions [4]. Variation in the DNA sequence may potentially alter the affinities of existing protein–DNA interactions or, indeed, recruit new proteins to bind to the DNA, altering the specificity and kinetics of the transcriptional process [5–8]. Analysis of putative functional SNPs in regulatory regions has focused on techniques which assay protein– DNA interactions, such as DNA footprinting and gelshift assays, and on those measuring gene expression, such as reporter gene experiments [9,10]. However, such methods provide predominately in vitro evidence of the potential effects of a change in DNA sequence on gene expression. The process of control of gene expression operates at many levels, not least of which is the chromatin structure within which DNA is packaged and whose particular state of modification profoundly affects accessibility to regulatory transcription factor proteins and thus levels of expression. Thus, while important insights into potential modulation of gene expression may be gained using in vitro techniques, it is only by analysis of SNPs in their natural in vivo context that it is possible to gain a clear appreciation of their biological importance. Furthermore, not only is it important to investigate the consequences of DNA sequence variation within the naturally occurring milieu of transcriptional machinery, but also in the context of the naturally occurring haplotype within which the SNP occurs. It is # 2003 The Biochemical Society APPROACHES IN DEFINING FUNCTIONAL EFFECTS OF POLYMORPHISMS IN NONCODING DNA In vitro assays of protein–DNA interactions A typically postulated mode of action whereby DNA sequence variation may exert a functional effect on gene regulation is through an effect on protein–DNA interactions. Regulatory transcription factor proteins show a high degree of specificity in their DNA-binding domains for the DNA sequence to which they will bind. This consensus DNA sequence is typically 5–8 nucleotides in length and specific to a given family of transcription factors ; for example, members of the octamer (Oct) family of transcription factors recognize the DNA sequence ATGCAAAT [11]. The DNA sequence corresponding to the two allelic forms of a given SNP can thus be inspected for its inclusion of any potential consensus binding motifs, for example, derived using online transcription factor databases, e.g. the TRANSFAC transcription factor database (http:\\www. generegulation.de\). Quantitative models predicting the effects of SNPs on the binding affinity of a transcription factor for the DNA-binding motifs have been developed, for example, for the important regulatory transcription factor nuclear factor κB (NF-κB) [13]. The electrophoretic mobility-shift assay (EMSA ; gelshift assay) [14] provides a comparatively simple, rapid and extremely sensitive technique for studying protein– DNA interactions in vitro. In this technique, short DNA sequences are synthesized, typically 20 nucleotides in Functional implications of genetic variation in non-coding DNA Figure 2 Principle of the EMSA (a) The binding site of interest is synthesized as a short radiolabelled DNA probe which can be used to identify both known and novel factors binding to the candidate region. Once bound to DNA, a protein–DNA complex is stabilized when subjected to non-denaturing PAGE, allowing resolution of protein–DNA complexes as discrete bands. (b) The specificity of the interaction may be investigated by competition experiments in which typically 10- or 100-fold excess unlabelled probe is added, which, in the case of a specific competitor probe, results in progressively less radiolabelled probe bound by the transcription factor protein. length, which match the SNP in the two allelic forms. These are radiolabelled and used as bait to which transcription factors may potentially bind, retarding the migration of the radiolabelled DNA when these are electrophoresed through a gel (Figure 2a). Entry of free DNA and DNA–protein complexes into the gel matrix by electrophoresis results in physical separation ; in general, the larger the protein bound, the greater the retardation of mobility within the gel. Stability is enhanced by low salt conditions in the gel and a cage effect created by the gel matrix. Non-specific interactions are minimized by adding synthetic co-polymers such as poly(dI-dC)–(dI-dC) [15]. Preliminary experiments may involve use of a crude nuclear extract prepared from an appropriate cell line under stimulation conditions of interest, based, for example, on the putative cell type in which the SNP may be exerting an effect. In the case of SNPs in the TNF gene, for example, macrophages are known to be a major source of TNF production in the course of infection and may be an appropriate cell type to study. It is important to be aware that interactions observed in preliminary experiments may not be specific and this can be resolved by competition experiments, as outlined in Figure 2(b). Increasing amounts of unlabelled competitor to DNA probe are added to a fixed amount of labelled DNA probe. Excess unlabelled competitor probe identical with the labelled probe should abolish the complexes observed from the now relatively small proportion of labelled probe in the reaction. In contrast, if the unlabelled competitor probe does not contain a high-affinity site for the protein it will not affect the protein–DNA interaction. Further information can also be obtained using antibodies to specific candidate transcription factors. Binding of antibodies to the DNA# 2003 The Biochemical Society 495 496 J. C. Knight binding protein results in slower mobility through the gel, ‘ supershifting ’ the complex : indeed, the complex may disappear if the antibody binds to a domain of the transcription factor that is required for DNA binding. Binding affinities can also be investigated in more detail in titration experiments with either crude nuclear extract or recombinant protein. For example, clear differential binding affinity was observed for NF-κB family members with the TNF promoter SNP at k863 nt which lies in an NF-κB-binding site [7]. It is important to remember, however, that EMSA experiments can provide information that a change in the DNA sequence due to the SNP has the capacity in vitro to affect protein–DNA interactions, but not that these are actually occurring in vivo. The EMSA technique has been applied to the functional characterization of many SNPs. A number of important host genetic determinants of susceptibility to malaria infection have been identified, including possession of the Duffy blood group antigen, which functions both as a chemokine receptor and as a receptor for erythrocyte invasion by Plasmodium vivax malarial parasites [16,17]. In the absence of the Duffy antigen, erythrocytes are resistant to invasion by malarial parasites. This phenotype, designated Duffy negative, occurs because the DARC gene (encoding Duffy antigen receptor for chemokines) is not transcribed in erythrocytes. However, in these individuals, Duffy mRNA and antigen are expressed in non-erythroid tissues. The molecular basis for this was elucidated by analysis of regulation of the DARC gene. In Duffy-negative individuals, a T C SNP in the DARC promoter region results in erythroidspecific repression of gene expression [5]. EMSA experiments demonstrated that this occurs because the polymorphism abolishes the binding of the erythroidspecific transcription factor GATA-1, which was associated with a 20-fold reduction in gene expression. A limitation of the EMSA technique is that the region of DNA of interest must be known and that the length of probe used is short (typically 20–30 nucleotides) to allow good resolution on the gel. This does not allow for cooperation between sites that may be required for binding and, if the binding site is unknown, it is not an efficient screening method. The technique of DNA footprinting offers considerable advantages in this respect as it can be used to generate a map of sites of protein–DNA interaction occurring in a region, including novel sites of interaction. In a region containing several SNPs, this therefore allows the investigator to identify SNPs lying in or near sites of protein–DNA interaction which merit further investigation. DNA footprinting (or protection mapping) is based on cleavage of a DNA sequence with either a chemical reagent or an enzyme. The DNA region of interest, which may be several hundred nucleotides in length, is synthesized as an oligoduplex probe and radiolabelled at one end. Careful titration ensures that # 2003 The Biochemical Society Figure 3 DNaseI footprinting assay Radiolabelled DNA probes are digested with DNaseI in the absence or presence of proteins ; protein binding to DNA is seen as a region of protection in the ladder of DNA fragments. In this example, increasing amounts of crude nuclear extract are added to the radiolabelled DNA probe prior to addition of enzyme. Regions of protection (‘ footprints ’) are denoted schematically to the right of the gel. each molecule of the DNA is cleaved on average only once at a random site, resulting in a ladder of DNA fragments of varying size when subjected to denaturing PAGE (Figure 3). Each band in the DNA ladder is representative of a specific nucleotide where the cleavage agent has cut the labelled strand of DNA, and can be localized by concomitantly running a Maxam–Gilbert sequencing reaction of the same DNA probe alongside the cleavage products. In the presence of protein forming a binding complex with the DNA, the nucleotides to which the protein is bound will be protected from cleavage. This produces a gap or ‘ footprint ’ in the ladder of labelled DNA fragments on electrophoresis, allowing Functional implications of genetic variation in non-coding DNA Figure 4 Principles of reporter gene design Cloning sites for insertion of gene sequence to test are shown (>) 5h and 3h to the reporter gene. The antibiotic resistance gene and prokaryotic origin of replication permit propagation and selection of the vector in E. coli. Poly(A), polyadenylated. specific localization of the site of protein–DNA interaction. DNaseI footprinting [18] uses DNaseI as the cleavage agent of the DNA molecules. This enzyme does not completely cleave the DNA duplex, but rather will randomly cut only one strand of the DNA. DNaseI binds to the minor groove of the DNA and cuts the phosphodiester backbone of each strand independently [19]. On binding by proteins, the DNA is protected from hydrolysis catalysed by DNaseI, producing the characteristic footprint. In addition, structural changes may make nucleotides adjacent to the bound region more vulnerable to digestion, appearing more intensely on the autoradiograph as ‘ hypersensitive ’ sites. The footprinting technique allows for the detection of multiple binding sites on the DNA fragment. Probes need to have high specific activity, and protein concentrations must be high enough to saturate binding to the DNA in order to ensure that potential ‘ footprint ’ areas are not masked by unbound digested fragments. Careful titration of DNaseI concentration and exposure time is required to ensure one nick occurs per molecule of DNA probe to try to ensure an even ladder. Unfortunately, preferential sites of cleavage by DNaseI mean the ladder tends to be uneven, but differences between protein-protected and naked DNA ladders are usually clear. The size of the DNaseI enzyme limits the resolution of the footprints. These may be defined further using other reagents, such as exonuclease III, while the individual nucleotides involved in binding can also be assessed by methylation interference footprinting. Functional assays of transcriptional activity The most commonly used assay to investigate the functional effects of SNPs on gene expression is transient transfection. This approach involves constructing a plasmid in which the putative regulatory region spanning the position of the SNP(s) from the gene of interest is linked to the coding sequence for an unrelated reporter gene [20] (Figure 4). For the purpose of comparing SNPs occurring in a gene promoter region, for example, the promoter fragment can be placed directly upstream of the reporter gene in a plasmid vector lacking endogenous promoter activity and then constructs with the two allelic forms of the SNP are compared. Following transfer into cells, measurement of the reporter gene product gives an indirect estimate of the induction in gene expression directed by the regulatory sequences. A range of reporter genes are available and generally aim to (i) produce a protein normally absent from the host cell ; (ii) be quantifiable in a simple, rapid, reproducible and sensitive assay with a broad linear range ; and (iii) not alter the physiology of the recipient cells. The major limitation of transient transfection assays is that, within a transfected cell, the plasmid DNA exists in a highly artificial configuration with variable copy number that can profoundly influence results, leading, for example, to inactivity or dysregulation of regulatory elements [21]. A variety of techniques can be employed to introduce (‘ transfect ’) the DNA into cells. Careful optimization is required as every mammalian cell type has a characteristic set of requirements for introduction of foreign DNA. Divalent cations, such as calcium and magnesium, were initially found to promote uptake of DNA into bacteria and this led to the development of the technique of calcium phosphate transfection [22,23]. A second chemical method, DEAE-dextran transfection [24,25], is simpler to perform and more reproducible than calcium phosphate-mediated transfection, although it is not suitable for production of stably transfected cell lines, and DEAE-dextran can be directly toxic to cells. The coupling of a positively charged chemical group DEAE (diethylaminoethyl) to an inert carbohydrate polymer (dextran) results in sticking of co-incubated DNA via its negatively charged phosphate groups to cell surfaces, from where they may be taken into cells by endocytosis. In some cell types, transfection may be enhanced by a glycerol or DMSO shock after the cells are exposed to DNA–DEAE-dextran complexes [26]. Other methods available for introducing DNA into cells include electroporation in which cells in solution with DNA are subjected to a brief electrical pulse that causes pores to transiently open in their membranes, allowing DNA to diffuse into the cells [27]. Liposomes can also be used to mediate transfection with DNA incorporated into artificial lipid vesicles that fuse with the cell membrane [28]. Only very rarely will DNA transferred into a cell nucleus be stably incorporated into the genome of the transfected cell (1 in 10$–10' cells) [29]. However, many more cells, up to 50 %, will take up DNA on transfection but fail to integrate it, with DNA persisting for several days in the nucleus and being subject to the regulatory machinery that controls expression of endogenous genes. This transient expression allows assays of regulatory elements when these are incorporated in the transfected DNA reporter gene construct. # 2003 The Biochemical Society 497 498 J. C. Knight Quantification of the reporter gene protein by enzymic or immunological means allows an indirect estimate of the transcriptional activity of the reporter vector encoding the protein. One of the earliest assays was based on the bacterial gene CAT encoding the enzyme chloramphenicol acetyl transferase [30]. The CAT gene is derived from transposon 9 of Escherichia coli and catalyses the transfer of the acetyl group from acetylCoA to the substrate chloramphenicol. Enzymic activity can be quantified on the basis of acetylated chloramphenicol ; for example, by following the conversion of ["%C] chloramphenicol into its acetyl derivatives using TLC on silica gels by autoradiography or by organic extraction and scintillation counting, or by non-radioactive methods such as ELISA. The CAT reporter protein is stable and not found in eukaryotes, but assays are typically timeconsuming with relatively low sensitivity and a narrow linear range. Among more recently developed reporter gene assays, a highly sensitive rapid method measures the enzyme luciferase and is based on the oxidation of beetle luciferin with production of photons of light, which are quantified using a luminometer over a broad linear range [31]. Transient transfection assays provide valuable information regarding potential modulation of gene expression by DNA sequence variation resulting from SNPs. However, it must be appreciated that they represent an in vitro technique in which a naked DNA sequence is assayed in the absence of what may be its normal transcriptional machinery and of regulatory elements, such as enhancers, which may be a significant distance from the promoter itself. Assays often show a high degree of variability, a major cause of which is variation in transfection efficiency [32]. Co-transfection of a control reporter gene vector can be used to normalize for transfection efficiency within or between transfection experiments. The control reporter gene vector has a different gene product from the vector containing the DNA of interest and is typically driven by a strong constitutive promoter. The choice of which portions of the naturally occurring regulatory regions of the gene locus, within which the polymorphism is found, to include in the reporter gene may profoundly influence results. For example, inclusion of the TNF 3h-untranslated region (‘ UTR ’) appears to significantly influence observed results for the TNF− SNP [33]. Similarly, inclusion of $!) other SNPs in the reporter construct may be important, as any functional effect of an SNP may also be dependent on its naturally occurring haplotype of associated polymorphisms. Results of transient transfection assays have also been found to be highly context specific with respect to the choice of cell type to transfect, the stimulus used for induction of gene expression, the mode and efficiency of transfection and the DNA plasmid design and preparation. These differences in experimental design may have significant consequences for data interpretation and # 2003 The Biochemical Society underlie much of the apparent controversy in the literature concerning the effects of a given SNP on transcription. Stable transfection offers some solutions to the problems encountered with transient transfection assays. Here, the plasmid containing the reporter gene with the DNA region of interest is stably incorporated into the genome, with selection of cells in which this has occurred based, for example, on a drug-resistance gene present within the same plasmid. Incorporation into the genome offers significant advantages in that this is a more natural chromatin environment and copy number, but the approach is more technically demanding and takes several weeks to complete, because of the requirement for drug selection and cell expansion. An important consideration is that the transcriptional activity of the DNA sequence of interest will be significantly influenced by the accessibility of the site into which it has integrated, for example, in terms of chromatin configuration. This requires analysis of multiple clones and may well exclude its usefulness for most SNPs in which a modest difference in expression for the two alleles is predicted, as clone-toclone variation is likely to be greater than 2- to 3-fold [9]. An alternative approach to transfection assays is to use a cell-free system to perform in vitro transcription [34]. This approach is of significant value in the detailed biochemical characterization of a regulatory element and may be most appropriate when a hypothesis exists as to the likely mechanism of action of an SNP, for example, in terms of recruitment of a specific transcription factor in the presence of a particular DNA base change. Early assays utilized a naked DNA template but, if practicable, it may be most appropriate to recreate a chromatin template to more fully replicate events occurring in vivo with this reconstituted system. Even in this situation, specific regulatory proteins may not be fully active in an in vitro assay of this type, because of their low concentration and absence of cofactors essential for their function. In vivo assays to determine function One approach to defining the functional significance of SNPs would be to take cells from individuals of differing genotype and correlate this with phenotype, for example, TNF promoter genotype with levels of TNF protein secreted by peripheral blood mononuclear cells following induction. This type of experiment is, however, potentially confounded at many levels. Individuals may differ at a particular SNP of interest, but may be polymorphic at other positions in the region, confounding the interpretation of the results. Comparison between individuals should ideally be based on knowledge of the underlying haplotypes, so that meaningful interpretations can be made. Any observed variation between individuals for a given phenotype may also arise from Functional implications of genetic variation in non-coding DNA Figure 5 TNF promoter region The 1.2 kb TNF promoter region is highly polymorphic and subject to interaction with multiple regulatory proteins which, in turn, recruit cofactors and RNA polymerase II (RNAP II). Accessibility of the region is dependent on chromatin configuration and, for illustration, a nucleosome is shown in the distal region around which the DNA is coiled. A cluster of SNPs found in the distal TNF promoter is also shown. multiple environmental and genetic factors other than the genotype under study, which may potentially confound results. Transcribed SNPs offer the opportunity to compare within cells from an individual heterozygous for a given SNP the relative abundance of mRNA originating from the two alleles discriminated by the SNP using allele-specific reverse transcription–PCR. Unfortunately, this approach is confined to SNPs occurring in exonic regions, which limits its applicability. Research in the field of gene regulation has yielded several novel in vivo approaches which may aid characterization of SNPs in vivo, but which are, at this time, technically challenging. One approach is to perform genomic DNA footprinting in order to visualize sites of protein–DNA interaction occurring in living cells [35,36]. Dimethyl sulphate (‘ DMS ’) is a chemical which rapidly enters cells, modifying exposed guanine residues in DNA which are then susceptible to cleavage with piperidine. In the presence of a DNA-binding protein, DNA is protected from modification and subsequent cleavage, resulting in a footprint in the ladder of DNA fragments. The challenge is how to visualize this as, at a genomic level, the amounts of DNA involved are very small and the cut fragments not amenable to conventional PCR. The technique of ligation-mediated PCR has been successfully employed, ligating a linker oligonucleotide to enable amplification. The results of such genomic footprinting are extremely valuable as an in vivo record of protein–DNA interactions actually occurring, but are typically not clear-cut and have yet to be successfully employed in characterization of SNPs to demonstrate an allele-specific difference. In vivo protein–DNA interactions could also be potentially investigated based on knowledge of the probable proteins involved using chromatin immunoprecipitation [37]. In this situation, specific antibodies would be used to immunoprecipitate chromatin to which proteins have been previously crosslinked in vivo using a reagent such as formaldehyde ; levels of occupancy at a specific gene can then be assessed using gene-specific primers to amplify the immunoprecipitated DNA. A strategic approach : functional analysis of a TNF polymorphism A variety of experimental methodologies can be adopted to investigate the functional significance of SNPs. The TNF promoter region is typical of many such regulatory regions of DNA in that it contains multiple transcription factor-binding sites with highly context-specific regulation such that particular elements are more important in certain cell types and in response to specific stimuli. The majority of the transcription factor-binding sites lie in the first 200 nucleotides upstream of the transcriptional start site, whereas the majority of SNPs lie more distally (Figure 5). As part of an investigation of genetic determinants of susceptibility to severe malaria, TNF has been investigated as a candidate gene based on a priori knowledge of its role in disease pathogenesis. In order to identify potentially functional SNPs, the initial strategy was to map out potential sites of protein–DNA interactions in this highly polymorphic region using in vitro DNA footprinting in a human macrophage cell line. This revealed a novel region of protein–DNA interactions spanning a G A SNP at k376 nt relative to the transcriptional start site (TNF− ) [6]. In the presence of $(' the adenine substitution, an additional complex was seen on EMSA which was present constitutively. Further characterization, including using UV crosslinking to determine the molecular mass of the complex, led to the identification of the protein as Oct-1, a member of the Oct family of transcription factors. This provided evidence of allele-specific recruitment of a transcription factor to a novel region of complex protein–DNA interactions in the TNF promoter. Reporter gene analysis in the same macrophage cell lines demonstrated that the G A polymorphism at TNF− $(' resulted in a modest increase in level of gene expression, either in isolation or as part of its naturally occurring # 2003 The Biochemical Society 499 500 J. C. Knight haplotype with the G A SNP at k238 nt. With experimental evidence of functional modulation of transcriptional regulation, two independent case-control studies in The Gambia and Kenya were used to first generate and then confirm the hypothesis that possession of TNF− was associated with increased susceptibility $(' to cerebral malaria [6]. In a large case-control study from The Gambia, multiple logistic regression analysis of cerebral malaria cases versus controls demonstrated an odds ratio of 4.3 (95 % confidence intervals 1.5–12.8) for TNF− A after inclusion of potentially predictive $(' variables, such as ethnicity, other TNF polymorphisms and linked HLA types. A similar result was observed in Kenya, where TNF− A as a predictive variable for $(' cerebral malaria cases versus controls was associated with an odds ratio of 4.6 (95 % confidence intervals 1.3–15.7). Further characterization is required using in vivo approaches, such as genomic footprinting and chromatin immunoprecipitation, to determine if the postulated mechanism of allele-specific recruitment of Oct-1 is actually occurring in living cells. represents a naturally constructed experiment that may in the future yield profound insights into transcriptional regulation. ACKNOWLEDGMENTS This work is funded by the Medical Research Council. REFERENCES 1 2 3 4 5 FUTURE PROSPECTS The role of host genetics in determining susceptibility to complex disease traits is the subject of much active investigation across the field of clinical medicine. The completion of the genome project, combined with ongoing large-scale SNP identification projects, will ensure a huge number of genetic markers for both genome-wide genetic association studies and for fine mapping of disease susceptibility loci. High-throughput genotyping utilizing the most informative markers based on knowledge of haplotypic structure will facilitate a logical approach to this, but the scale of research effort required in terms of the numbers of markers and study population sizes remains daunting. A clearer understanding of the functional significance of SNPs stands to significantly benefit this work, both by potentially defining which SNPs are of most value to include in such studies as candidates and in the interpretation of results. Any genetic susceptibility study has as its premise that the genetic variation functionally affects disease outcome ; the challenge is how to resolve this. As has been discussed, a variety of in vitro techniques are available which can provide important clues as to potential mechanisms of action. Equally clear, however, is the problem that such assays may not reflect what is actually occurring in vivo. The field of transcriptional regulation research is showing us that DNA exists in a highly dynamic environment within which multiple complex regulatory mechanisms are operating. Although this insight represents a significant challenge to our attempts to define the functional significance of genetic polymorphism, it also offers an opportunity, as studying the effects of an SNP in vivo # 2003 The Biochemical Society 6 7 8 9 10 11 12 13 14 15 16 17 Wang, D. G., Fan, J. B., Siao, C. J. et al. (1998) Largescale identification, mapping, and genotyping of singlenucleotide polymorphisms in the human genome. Science (Washington, D.C.) 280, 1077–1082 Sachidanandam, R., Weissman, D., Schmidt, S. C. et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature (London) 409, 928–933 Ramensky, V., Bork, P.and Sunyaev, S. (2002) Human non-synonymous SNPs : server and survey. Nucleic Acids Res. 30, 3894–3900 Orphanides, G. and Reinberg, D. (2002) A unified theory of gene expression. Cell (Cambridge, Mass.) 108, 439– 451 Tournamille, C., Colin, Y., Cartron, J. P. and Le Van Kim, C. (1995) Disruption of a GATA motif in the Duffy gene promoter abolishes erythroid gene expression in Duffy-negative individuals. Nat. Genet. 10, 224–228 Knight, J. C., Udalova, I., Hill, A. V. S. et al. (1999) A polymorphism that affects OCT-1 binding to the TNF promoter region is associated with severe malaria. Nat. Genet. 22, 145–150 Udalova, I. A., Richardson, A., Denys, A. et al. (2000) Functional consequences of a polymorphism affecting NF-κB p50-p50 binding to the TNF promoter region. Mol. Cell. Biol. 20, 9113–9119 Farzaneh-Far, A., Davies, J. D., Braam, L. A. et al. (2001) A polymorphism of the human matrix γ-carboxyglutamic acid protein promoter alters binding of an activating protein-1 complex and is associated with altered transcription and serum levels. J. Biol. Chem. 276, 32466–32473 Carey, M. and Smale, S. T. (2000) Transcriptional regulation in eukaryotes : concepts, strategies and techniques, Cold Spring Harbor Laboratory Press, New York Latchman, D. S. (1999) Transcription factors. A practical approach (Hames, B. D., ed.), Oxford University Press, Oxford Falkner, F. G. and Zachau, H. G. (1984) Correct transcription of an immunoglobulin κ gene requires an upstream fragment containing conserved sequence elements. Nature (London) 310, 71–74 Reference deleted Udalova, I. A., Mott, R., Field, D. and Kwiatkowski, D. (2002) Quantitative prediction of NF-κB DNA–protein interactions. Proc. Natl. Acad. Sci. U.S.A. 99, 8167–8172 Fried, M. and Crothers, D. M. (1981) Equilibria and kinetics of lac repressor-operator interactions by polyacrylamide gel electrophoresis. Nucleic Acids Res. 9, 6505–6525 Singh, H., Sen, R., Baltimore, D. and Sharp, P. A. (1986) A nuclear factor that binds to a conserved sequence motif in transcriptional control elements of immunoglobulin genes. Nature (London) 319, 154–158 Miller, L. H., Mason, S. J., Dvorak, J. A., McGinnis, M. H. and Rothman, K. I. (1975) Erythrocyte receptors for (Plasmodium knowlesi) malaria : Duffy blood group determinants. Science (Washington, D.C.) 189, 561–563 Horuk, R., Chitinis, C. E., Darbonne, W. C. et al. (1993) A receptor for the malarial parasite Plasmodium vivax : the erythrocyte chemokine receptor. Science (Washington, D.C.) 261, 1182–1184 Functional implications of genetic variation in non-coding DNA 18 19 20 21 22 23 24 25 26 27 Galas, D. J. and Schmitz, A. (1978) DNAase footprinting : a simple method for the detection of protein–DNA binding specificity. Nucleic Acids Res. 5, 3157–3170 Suck, D., Lahm, A. and Oefner, C. (1988) Structure refined to 2 AH of a nicked DNA octanucleotide complex with DNaseI. Nature (London) 332, 464– 468 Alam, J. and Cook, J. L. (1990) Reporter genes : application to the study of mammalian gene transcription. Anal. Biochem. 188, 245–254 Smith, C. L. and Hager, G. L. (1997) Transcriptional regulation of mammalian genes in vivo. A tale of two templates. J. Biol. Chem. 272, 27493–27496 Graham, F. L. and van der Eb, A. J. (1973) A new technique for the assay of infectivity of human adenovirus 5 DNA. Virology 52, 456–467 Wigler, M., Pellicer, A., Silverstein, S. and Axel, R. (1978) Biochemical transfer of single-copy eukaryotic genes using total cellular DNA as donor. Cell (Cambridge, Mass.) 14, 725–731 Vaheri, A. and Pagano, J. S. (1965) Infectious poliovirus RNA : a sensitive method of assay. Science (Washington, D.C.) 175, 434 McCutchan, J. H. and Pagano, J. S. (1968) Enhancement of the infectivity of simian virus 40 deoxyribonucleic acid with diethylaminoethyl-dextran. J. Natl. Cancer Inst. 41, 351–357 Lopata, M. A., Cleveland, D. W. and Sollner-Webb, B. (1984) High-level expression of a chloramphenicol acetyltransferase gene by DEAE-dextran-mediated DNA transfection coupled with a dimethylsulphoxide or glycerol shock treatment. Nucleic Acids Res. 12, 5707–5717 Wong, T. K. and Neumann, E. (1982) Electric field mediated gene transfer. Biochem. Biophys. Res. Commun. 107, 584–587 28 29 30 31 32 33 34 35 36 37 Felgner, P. L., Gadek, T. R., Holm, M. et al. (1987) Lipofection : a highly efficient lipid-mediated DNA\transfection proceedure. Proc. Natl. Acad. Sci. U.S.A. 84, 7413–7417 Chen, C. and Okayama, H. (1987) High efficiency transformation of mammalian cells by plasmid DNA. Mol. Cell. Biol. 7, 2745–2752 Gorman, C. M., Moffat, L. F. and Howard, B. H. (1982) Recombinant genomes which express chloramphenicol acetyltransferase in mammalian cells. Mol. Cell. Biol. 2, 1044–1051 de Wet, J. R., Wood, K. V., DeLuca, M., Helinski, D. R. and Subramani, S. (1987) Firefly luciferase gene : structure and expression in mammalian cells. Mol. Cell. Biol. 7, 725–737 Hollon, T. and Yoshimura, F. K. (1989) Variation in enzymatic transient gene expression assays. Anal. Biochem. 182, 411– 418 Allen, R. D. (1999) Polymorphism of the human TNF-α promoter : random variation or functional diversity ? Mol. Immunol. 36, 1017–1027 Weil, P. A., Luse, D. S., Segall, J. and Roeder, R. G. (1979) Selective and accurate initiation of transcription at the Ad2 major late promoter in a soluble system dependent on purified RNA polymerase II and DNA. Cell (Cambridge, Mass.) 18, 469– 484 Mueller, P. R. and Wold, B. (1989) In vivo footprinting of a muscle specific enhancer by ligation mediated PCR. Science (Washington, D.C.) 246, 780–786 Becker, P. B., Weih, F. and Schutz, G. (1993) Footprinting of DNA-binding proteins in intact cells. Methods Enzymol. 218, 568–587 Orlando, V., Strutt, H. and Paro, R. (1997) Analysis of chromatin structure by in vivo formaldehyde cross-linking. Methods 11, 205–214 Received 31 October 2002/3 January 2003; accepted 6 January 2003 Published as Immediate Publication 6 January 2003, DOI 10.1042/CS20020304 # 2003 The Biochemical Society 501