
SPECIAL COMMUNICATION

Implications of the Human Genome for Understanding Human Biology and Medicine

G. Subramanian, MD, PhD; Mark D. Adams, PhD; J. Craig Venter, PhD; Samuel Broder, MD

Clinical researchers, practicing physicians, patients, and the general public now live in a world in which the 2.9 billion nucleotide codes of the human genome are available as a resource for scientific discovery. Some of the findings from the sequencing of the human genome were expected, confirming knowledge presaged by many decades of research in both human and comparative genetics. Other findings are unexpected in their scientific and philosophical implications. In either case, the availability of the human genome is likely to have significant implications, first for clinical research and then for the practice of medicine. This article provides our reflections on what the new genomic knowledge might mean for the future of medicine and how the new knowledge relates to what we knew in the era before the availability of the genome sequence. In addition, practicing physicians in many communities are traditionally also ambassadors of science, called on to translate arcane data or the complex ramifications of biology into a language understood by the public at large. This article also may be useful for physicians who serve in this capacity in their communities. We address the following issues: the number of protein-coding genes in the human genome and certain classes of noncoding repeat elements in the genome; features of genome evolution, including large-scale duplications; an overview of the predicted protein set to highlight prominent differences between the human genome and other sequenced eukaryotic genomes; and DNA variation in the human genome. In addition, we show how this information lays the foundations for ongoing and future endeavors that will revolutionize biomedical research and our understanding of human health.

JAMA. 2001;286:2296-2307
www.jama.com

Author Affiliations: Celera Genomics, Rockville, Md.

Financial Disclosures: Dr Subramanian assisted in Celera patent filings; Dr Adams received honoraria from noncommercial medical organizations, served as advisor to Celera and as vice president of Celera, and was involved with Celera patents; Dr Venter served as president of Celera and was involved with Celera patents; Dr Broder served as executive vice president of Celera, holds 16 issued patents for therapeutic agents (unrelated to Celera activities), and received honoraria/travel expenses from noncommercial medical organizations for continuing medical education programs. Celera is involved in developing assay kits or reagents. All authors owned Celera stock and had Celera stock options. All authors received government research grants from the National Human Genome Research Institute to sequence the rat genome and from the National Institute of Allergy and Infectious Diseases to sequence the Anopheles gambiae genome.

Corresponding Author and Reprints: Samuel Broder, MD, Celera Genomics, 45 W Gude Dr, Rockville, MD 20850 (e-mail: [email protected]).

JAMA, November 14, 2001, Vol 286, No. 18 (Reprinted) ©2001 American Medical Association. All rights reserved.

The capacity to sequence the entire genomes of free-living organisms, and to analyze such genomes in their entirety, has considerable implications for understanding human biology and medicine.1,2 Our genomic sequence provides a unique record of who we are and how we evolved as a species.3 The knowledge fostered by understanding the genome might clarify which human characteristics are innate and which are acquired, as well as the interplay between heredity and environment in defining susceptibility to illness. Such an understanding will make it possible to study how our genomic DNA varies among cohorts of patients, and especially the role of such variation in the causation of important illnesses and responses to pharmaceuticals.4-6 We may also be able to use new approaches to investigating complex aspects of the human condition, such as language, thought, self-awareness, and higher-order consciousness.

The study of the genome (genomics) and the associated protein content (proteomics) of free-living organisms will eventually make it possible to localize and understand the function of every human gene, as well as the regulatory elements that control the timing, organ-site specificity, extent of gene expression, protein levels, and posttranslational modifications that define health or illness. For any given physiological process, we will have a new paradigm for addressing its evolution, development, function, and mechanism in causing disease and in affecting the onset and outcome of disease.7

PREDICTED PROTEIN-CODING GENES

One noteworthy finding is the relatively low number of genes in the human genome.1,2 A gene, in this context, is defined as a locus of cotranscribed exons, which ultimately result in the production of a peptide or protein. There are a number of computational tools used to identify, enumerate, and compare genes within a species and between species.
These computational methods integrate gene prediction models with different types of experimental and computational evidence to impart a stringency requirement to this process, and have been applied by us and others.8-10

The text and subtext of biology prior to the availability of the sequence for the human genome was that the number of genes in an organism would in some fashion reflect its “complexity.” There were expectations that the human genome would contain 100 000 genes or more.11-13 The genomic sequences of 4 multicellular eukaryote genomes have been published over the past 3 years. The approximate gene count for the fruit fly (Drosophila melanogaster) is 14 000 genes10; for the roundworm (Caenorhabditis elegans), 19 000 genes8; and for the mustard plant (Arabidopsis thaliana), 26 000 genes.9 A comparison of gene numbers among the genomes of different species is in many ways more important than the total number of genes found in the genome of any single species. Those who might be tempted to use the number of genes to explain human complexity might then pause to consider that, by this measure, a human being, with approximately 30 000 genes,1,2 is roughly a fly plus a worm or the equivalent of a plant. This assessment of the number of human genes is based on results from analyses in which stringency requirements are used in conjunction with the computational algorithms. Thus, the absolute count may be less important than comparisons with published genomes. Similarly, those who were expecting very large numbers of newly discovered genes as targets for pharmaceutical interventions might need to reassess their expectations.
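As a rough illustration of the comparison above, the following Python sketch simply adds and divides the approximate gene counts quoted in the text. The names `GENE_COUNTS` and `ratio_to_human` are hypothetical and introduced only for this example; the figures are the rounded estimates cited above, not exact counts.

```python
# Approximate gene counts as quoted in the text (rounded estimates).
GENE_COUNTS = {
    "fruit fly": 14_000,
    "roundworm": 19_000,
    "mustard plant": 26_000,
    "human": 30_000,
}


def ratio_to_human(counts, species):
    """Return the human gene count divided by the count for `species`."""
    return counts["human"] / counts[species]


# "A fly plus a worm": the sum of the two invertebrate counts.
fly_plus_worm = GENE_COUNTS["fruit fly"] + GENE_COUNTS["roundworm"]

if __name__ == "__main__":
    print(f"fly + worm: {fly_plus_worm} genes")
    for species in ("fruit fly", "roundworm", "mustard plant"):
        print(f"human / {species}: {ratio_to_human(GENE_COUNTS, species):.2f}")
```

On these rounded figures, the human count is only about twice that of the fly and is close to that of the plant, which is the point the comparison is meant to make.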
The number of genes independently reported by both groups1,2 is far lower than the expectations based on prior experimental analysis of expressed sequence tags (ESTs) or computational analysis, which estimated that humans would have 70 000 to 120 000 genes.12-14 The genomic sequence and the gene complement predicted by both groups may be accessed on the Web (at http://public.celera.com/index/cfm [for noncommercial purposes only1]; and at http://www.ensembl.org/ [data generated by the International Human Genome Sequencing Consortium2]). The sequencing of the human genome suggests that we must look beyond gene number per se (at least protein-coding genes) as we attempt to understand human complexity, future targets for pharmaceutical research, and implications for medical practice (TABLE 1).1,2,4,5,7,15-46

Surveying the landscape of the human genome leads to several other observations. Only about 1% of the genome is spanned by exons (regions that code for proteins), while just under 25% is contained within introns (regions between exons within genes that are spliced out in the creation of messenger RNA and do not code for proteins), and about 75% of the genome is contained in intergenic DNA.1,2 Genes often exist in nonrandom clusters or gene-rich “oases,” separated by what appear to be large “deserts” of several hundreds of thousands of nucleotide codes that do not appear to encode genes. There is no simple explanation for why natural selection has taken this path in the evolution of the human genome, but we believe it is premature to conclude that such “deserts” lack biological or medical importance.

REPEAT ELEMENTS

The human genome is filled with blocks or “elements” of repetitive nucleotide codes whose function is still a mystery.
It has been known for many years, and amply confirmed with the sequencing of the genome, that human DNA contains large and complex families of such repeat elements.1,2 These include the long interspersed repetitive elements (LINEs) and short interspersed repetitive elements (SINEs), which include Alu sequences that arose with the evolution of primates, including humans.47 These sequences represent a distinct class of retrotransposon-amplified repeat DNA. During primate evolution, these DNA elements could be replicated and transposed to new sites in the genome.21,47 They comprise approximately 10% of the human genome.1,2 Their biological function and role in natural selection have remained an enigma. Yet in surveying the landscape of the human genome, a striking and nonrandom distribution of Alu sequences is evident. They appear to preferentially colocate within gene-rich regions of the genome.1,2 One inference is that the biological role of these Alu sequences, the effects of nucleotide variations within such elements,21 and their ability to mediate recombination events17,18 will be important in understanding their regulatory effects19-21 on gene function and disease. Further investigations are required to add to the known examples where Alu sequence variations have been shown to affect biology and clinical conditions.17-21 Such elements had previously been characterized as “selfish” DNA (ie, DNA whose existence seems related to replication purposes only),48 having no direct impact on medicine or natural selection. The availability of the human genome sequence suggests that this view should be revised, since it appears possible that such repeat elements may indeed contribute to the causation of human diseases.
GENOME DUPLICATION

The human genome reveals a remarkable level of duplication.1,2 Although the biological impact of duplication in generating gene superfamilies is well established, the first comprehensive view of the genome-wide landscape has revealed the widespread impact of 2 distinct mechanisms of duplication. These 2 forms of duplication are very different: one is mediated at the DNA level (segmental duplication), and the other is mediated at the RNA level (retrotransposition). Both mechanisms produce paralogs, a term for genes that appear in more than 1 copy in the genome (albeit with possible modifications).

Segmental Duplication

Among humans, the extent of segmental duplication is 10- to 100-fold greater than that observed in the fly and worm genomes. There are more than 3500 genes in over 1000 genomic blocks, ranging in size up to chromosomal lengths, that show duplication with linear preservation of order on another chromosome.1 This is illustrated by an examination of chromosomes 18 and 20 in FIGURE 1 and a more global representation of this phenomenon in FIGURE 2.1,49 This process might be likened to a kind of internal genomic colonization. The clinical relevance of such events is emphasized by our finding of paralogous, disease-causing genes, of validated shared ancestry, on both duplicated segments. A disease-causing gene is defined as a gene in which sequence variants are linked to the causation of a disease. Notable among these are genes involved in hemostasis, complement fixation, transcriptional regulation (such as the homeobox proteins associated with developmental disorders), metabolic disorders, and voltage-gated ion channels associated with cardiovascular conduction abnormalities.1 In many cases, there is a disease-causing gene with a paralog on the duplicated segment, whose linkage to a disease is not currently recognized.1

It is possible that an understanding of segmental duplication will provide new insights into the pathogenesis of disease. To be sure, not every duplication event will lead to a paralog that results in the same pathophysiologic consequences. However, it might well be possible to use genomics to demonstrate a unity for disparate diseases. It may also be possible to understand and explain adverse reactions and side effects of drugs through their previously unknown collateral activities against paralogs.

Table 1. Representative Clinical and Biological Correlates of Genomic “Complexity”

Non-Protein-Coding (Regulation of Gene Expression)

Genomic Information: Endogenous retroviral elements
Relevance to Medicine and Physiology: Position-specific effects on gene expression; likely has a major influence on disease expression
Examples: Fukuyama muscular dystrophy,15 major histocompatibility complex gene diversity16

Genomic Information: Simple repeats (eg, Alu repeat elements, triplet repeats)
Relevance to Medicine and Physiology: Position-specific effects on gene expression or mediate recombination events that alter DNA sequence
Examples: Alu repeats: heme oxygenase-1 deficiency,17 breast and ovarian cancer,18-20 parathyroid hormone,21 nicotinic acetylcholine receptor,21 T-cell CD8 alpha,21 Fc epsilon RI-gamma receptor21. Triplet repeats22,23: Huntington disease, Friedreich ataxia

Genomic Information: Single-nucleotide polymorphisms (SNPs), including those within promoter, enhancer, or intronic regions
Relevance to Medicine and Physiology: Determine disease susceptibility and predict therapeutic efficacy and toxicity4,24 (see also Table 2)
Examples: Disease-associated DNA variants in promoters (malaria25) and introns (calpain-10 type 2 diabetes mellitus26). Therapy-related DNA variants in multidrug resistance gene (MDR-1) and digoxin27; cardiac sodium channel and flecainide28

Genomic Information: Noncoding RNA (includes transfer RNA [tRNA], ribosomal RNA [rRNA], small nucleolar RNA [snoRNA]-methylation of rRNA, small nuclear RNA [snRNA], and X dosage compensation [Xist])
Relevance to Medicine and Physiology: Indirect and direct effects on gene expression with several recent reports of disease association
Examples: SnoRNA (imprinting): Prader-Willi syndrome29. Xist: X chromosome inactivation (antisense mechanism)30

Genomic Information: DNA methylation; other epigenetic phenomena
Relevance to Medicine and Physiology: Regulate gene expression in the absence of alteration in genomic sequence; defects in gene imprinting are involved in several disease conditions31,32
Examples: Prader-Willi syndrome,33 Beckwith-Wiedemann syndrome,34 colorectal cancer, Wilms tumor, hepatoblastoma31,35

Protein-Coding (Generation of Protein Diversity)

Genomic Information: RNA editing
Relevance to Medicine and Physiology: Posttranscriptional process that changes the information content within the RNA to affect a wide range of biological processes
Examples: Antibody diversity,36 embryonic erythropoiesis,37 familial hypercholesterolemia38

Genomic Information: Alternative splicing with protein isoforms
Relevance to Medicine and Physiology: Protein isoforms often show tissue-specific variability; altered splicing patterns are causative or are markers for many disease states39
Examples: Tau isoforms in Alzheimer disease40

Genomic Information: Alternative start site for proteins (multiple start codons, internal ribosomal entry sites)
Relevance to Medicine and Physiology: Generates proteins of varying size and with different functional capabilities; important physiological role in the immune system and with cell cycle regulators
Examples: Acquired immune response41 (interleukin-15); apoptotic and cell cycle proteins42 (c-myc, Apaf-1, XIAP)

Genomic Information: Evolution of new protein domains1,2
Relevance to Medicine and Physiology: Most prominent in proteins involved in hemostasis, acquired immune function, hormonal and nuclear regulation (Figure 4)
Examples: Developmental regulators (bioactive peptide hormones), hemostasis (fibronectin type 1 and 2 domain proteins, C1q-complement component), immune function (cytokines), and nuclear regulators (KRAB domain zinc finger family)

Genomic Information: Domain shuffling (use of “old” domains to generate “new proteins”)1,2
Relevance to Medicine and Physiology: Best appreciated among the plasma serine proteases of the coagulation-complement system (Figure 3A)

Genomic Information: Domain accretion (greater numbers of domains per protein)1,2
Relevance to Medicine and Physiology: Likely serves to enhance the combinatorial diversity of protein interactions and is prominently noted in nuclear regulators (Figure 3B)

Genomic Information: Gene duplication (segmental duplications, gene duplication, intronless paralogs)
Relevance to Medicine and Physiology: Evolutionary phenomena that generate protein diversity by paralogous expansion; have important ramifications for disease gene and therapeutic target identification1,2,5,7

Genomic Information: Posttranslational modifications (eg, phosphorylation, acetylation, glycosylation, sulfation [eg, tyrosine sulfotransferase], proteolytic cleavage)
Relevance to Medicine and Physiology: The full extent to which these key modifications affect protein function and thus pathogenesis of disease remains to be explored; in addition to being targets for therapeutic intervention, these protein modifications play a major role in clinical disease43-46

Retrotransposition

The other remarkable finding is the extent of gene duplication that has resulted from retrovirus-based transposition of gene transcripts.1 The ancestors of humans encountered retroviruses capable of transcribing RNA to DNA (reverse transcription). Indeed, such viruses are not extinct, as diseases such as acquired immunodeficiency syndrome (AIDS) amply confirm.50 The human genome carries the results of many such encounters.
Gene duplication by this process in effect creates paralogs that lack introns and often occur in multiple copies scattered randomly throughout the genome. The medical implications of this form of gene duplication are similar to those that apply to segmental duplication. In addition, the degree of identity between the source gene and the retrotransposed gene is often very high, thus leading to the possibility of confounding in DNA- or protein-based diagnostic tests. It is important to note that changes in coding or noncoding regulatory regions in these paralogs, leading to different functions or expression patterns, may be one way of providing an increased functional repertoire in the human genome.

Figure 1. Example of Segmental Duplication Between Chromosomes in the Human Genome1
Schematic of a large duplicated segment between chromosome 18 (18q22) and 20 (20q13) to show examples of the genes and their predominantly colinear distribution on both duplicated segments, with the gene names of 7 of the 56 gene pairs shown. The chromosome 18 segment represents 13 million base pairs (bp) of genomic DNA sequence, whereas the chromosome 20 segment represents 1.4 million bp of genomic DNA. These genes represent a diverse set of proteins, including nuclear transcription factors (ZNF236 and Kruppel-related: Kruppel family transcription factors; NFATC1 and NFAT-related: nuclear factor of activated T-cells; GATA6 and GATA-related: GATA transcription factors; TALE homeobox family members, involved in nuclear protein transcription), potassium channel-related factors (KCNG1 and KCNG2: potassium voltage-gated channels, subfamily G), and members of the ras oncogene superfamily involved in protein trafficking (RAB31 and Ras [RAB]-related). The precise clinical associations of these proteins with human disease remain to be ascertained, though other members of these protein classes have been implicated in developmental and cardiovascular conduction abnormalities, for example.49

ANALYSIS OF THE PREDICTED PROTEIN SET (PROTEOME)

Earlier, we mentioned that the number of protein-coding genes was considered to be low relative to expectations prior to the sequencing of the genome.12,13,51 Does an analysis of the full set of proteins (ie, the proteome) help us resolve the issue of human beings not appearing to carry many more genes than a fruit fly, a roundworm, or a plant? Indeed, we do note that the average human gene makes more proteins, and more complex proteins, than its invertebrate counterparts. A number of such features are worth detailing. These include the evolution of new protein domains (well-defined regions on a protein that show structural and functional conservation), duplications or expansions of domains (domain accretion), as well as greater combinatorial diversity (domain shuffling) in human beings (FIGURE 3).1,2,52,53 In addition, certain genes produce more than 1 type of protein, using alternative transcriptional start sites and RNA splicing. Finally, posttranslational modifications, wherein the translated protein is subjected to a wide range of biochemical modifications, may potentially give rise to a significantly larger set of functional proteins than would be predicted by the gene count. Table 1 provides a summary of the medical relevance of these features.
Extensive protein domain shuffling is observed in the human proteome, and this would serve to increase or alter combinatorial diversity to provide an exponential increase in protein-protein interactions. Moreover, certain special genes show patterns for generating combinatorial diversity at the protein level. For example, immunoglobulins and the T-cell receptors show clonal DNA shuffling or rearrangements to increase the immune repertoire, while the cadherins show exon trans-splicing (a form of RNA shuffling that mixes and matches exons to create diversity in the final messenger RNA) to generate increased extracellular interactions.54,55 All of these factors taken together contribute to a complexity not captured by examining gene number alone.

Figure 2. Duplications Within the Genome
Segmental duplications comparable to those in chromosomes 18 and 20 (see Figure 1) occur throughout the human genome. The figure labels chromosomes 1 through 22, X, and Y; Chr indicates chromosome.

Many proteins (and protein domains) found in humans evolved early in the animal lineage and hence have orthologs (evolutionary counterparts) in invertebrate genomes. However, several noteworthy vertebrate-specific domains exist, especially within proteins involved in developmental, homeostatic, and nuclear regulation. These proteins have profound implications in understanding human development, malignant transformation, and stem cell biology.
In addition, proteins related to acquired immunity, complement fixation, and hemostasis are either unique or show a considerable expansion in the human genome compared to known invertebrate genomes (FIGURE 4).1,52,53 Thus, we find several instances where evolution has harnessed “old” domains to provide novel distinct domain architectures in the human when compared to the fly or worm; that is, “new” proteins created using old domains (Figure 3B). Examples include the serine proteases, which occur with a widely diverse set of protein domains in the plasma proteases (coagulation, complement, and fibrinolytic systems), and the recruitment of the immunoglobulin fold into molecules of the acquired immune system, eg, antibodies, major histocompatibility complex, and cell adhesion receptors.1,2

Also, in concordance with the greatly increased neuronal complexity in the human compared to the fly and worm, there is an increase in the number of members of protein families involved in neural development, structure, and function (Figure 4).1,56,57 These include neuronal growth regulators, as well as classes of voltage-gated ion channels that play a vital role in neuronal network formation and in electrical coupling. Understanding how these components interact to generate the neuronal infrastructure in humans will have an impact on therapeutic modalities to address neuronal injury, as well as provide insights into new ways to diagnose and treat neuropsychiatric disorders.

Proteins involved in apoptosis (programmed cell death), a central effector mechanism that regulates cellular physiology, are also greatly expanded in humans.1,58 The central role for this process in neurodegenerative diseases,59 malignancy, and inflammatory conditions60 related to extrinsic mediators (eg, pathogens) and intrinsic mediators (eg, cardiovascular disease, inflammatory bowel disease) constitutes an area of intense current investigation.
Therapeutic interventions that can modulate the apoptotic process will likely have major effects on some of the most devastating clinical illnesses that afflict humankind.61

However, a focus on the genomic DNA sequence alone will not be sufficient to resolve all the important problems of medicine and biology. The availability of the human genome sequence will substantially enhance the power of proteomics.6,44 In the near future, once the sequence of any unknown protein is determined in any human fluid or cell culture (eg, by a technology involving mass spectrometry for separation and identification of proteins), there will virtually always be a “hit,” or “match,” between proteins and their genes.44 The applicability of this approach to better understand disease processes in humans, as well as to facilitate drug discovery and development,43,62,63 will undoubtedly increase as genomes of additional model organisms (eg, the mouse, rat, and dog) become available. Such approaches will also enhance the capacity for detecting novel microbes and their protein complements, either pathogens or commensals, both of which have profound implications for enhancing microbial diagnosis and developing improved antimicrobial therapeutics.64

It will also be possible to link proteins and their posttranslational modifications to the pathophysiology of illnesses.43,44 Many of these modifications likely affect the activity and disposition of proteins in health and disease. One special form of posttranslational modification involves protein cleavage, which is essential to the activity of certain proteins (of which insulin and other hormones are classic examples), including those involved in the apoptotic process.
Ultimately, the number, complexity, and modifications of proteins encoded by human genes all contribute to the complexity of human biology, and underscore that not all answers lie at the level of genomic information per se. Advances in proteomics44,62,63,65-67 will thus likely enhance the next generation of diagnostics as well as guide therapeutics in ways that were previously impossible or exceedingly difficult (TABLE 2 and TABLE 3).* peutic agents (even when there is no obvious difference in individual pharmacokinetics or biochemical pharmacology).4,7,24,70 The most common form of DNA variation in the human genome is the DNA VARIATION The study of the genome supports the fundamental unity of human beings. We all share at least 99.9% of the nucleotide code in our genome.1,138 And yet it is remarkable that the diversity of human beings at the genetic level is encoded by less than 0.1% variation in our DNA. In any physician’s practice, patients are predisposed to different conditions, respond to the environment in variable ways, metabolize pharmaceuticals differently,4,7 vary regarding dose-response relationships for common drugs, and have a range of susceptibilities to adverse effects of thera- A protein domain is a structural and functional unit that shows evolutionary conservation and, by convention, is represented as a distinct geometric shape. Thus, proteins are made up of 1 or more such building blocks or “domains” and, depending on the types and numbers of domains, proteins with different biological capabilities are created. Many of these domains have seemingly arbitrary nomenclature that, in many cases, reflects the experimental nuances of their initial description. A library of curated protein domains with their biological descriptions is available through the Pfam52 and SMART53 databases. A, The extensive domain shuffling seen in the plasma proteases of the coagulation and complement systems. 
The “ancient” trypsin family serine protease domain occurs in combination with a myriad of protein interaction domains. Most of these domains are evolutionarily ancient, that is, with the exception of the Gla domain (see below); they are also observed in the fly and the worm. These include: (1) AP: Apple, originally described in the coagulation factors, predicted to possess protein- and/or carbohydrate-binding functions; (2) Kr: Kringle, named after a Danish pastry, has an affinity for lysine-containing peptides; (3) E: epidermal growth factor (EGF)-like; (4) CUB: domain first described in complement proteins and a diverse group of developmental proteins; (5) CCP: complement control protein repeats, also known as “sushi” repeats, first recognized in the complement proteins; and (6) Gla: a hyaluron-binding domain, contains ␥-carboxyglutamate residues, and is seen in proteins associated with the extracellular matrix. Of note is the observation that apolipoprotein (a) likely represents a primatespecific evolutionary event. There is a tremendous expansion of the Kringle domain (dashed segment represents a total of 29 copies of the Kringle domain) in a trypsin family serine protease. B, Examples of domain accretion in nuclear regulators in the human compared with the fly.1,2 Domain accretion refers to greater numbers of a specific domain in a multidomain protein or addition of new domains to a multidomain protein. These domains include: (1) BTB: broad-complex, tramtrack, and bric-a-brac (a name that reflects its early descriptions in Drosophila), a protein interaction domain; (2) Zf: C2H2 class of DNA-binding zinc finger; (3) KRAB: Kruppel-associated box, a vertebrate-specific nuclear protein interaction domain; (4) HD: histone deacetylase, an important class of chromatin-modifying enzymes; (5) U: ubiquitin finger, a domain that targets proteins for proteolytic degradation. 
There is a major expansion of the numbers of C2H2 zinc fingers in the BTB or KRAB transcription factor families in the human (dashed segment represents a total of 3 copies of the Zf domain), a feature that may reflect increased ability to mediate regulatory interactions with DNA.

Figure 3. Prominent Differentiating Features in the Domain Architectures of Representative Human Proteins
[Domain diagrams not reproduced. A, Domain “shuffling”: plasminogen, apolipoprotein (a), urokinase-type plasminogen activator, prostate-specific antigen, coagulation factor XI, coagulation factor X, and complement C1r component. B, Domain accretion (human compared with fly): B-cell lymphoma 6 protein (BCL-6), gonadotropin-inducible transcription repressor-4, and histone deacetylase 6 (Hd6).]

*References 1, 2, 4-6, 22, 24, 25, 31, 43, 44, 62, 63, 65-137.

(Reprinted) JAMA, November 14, 2001—Vol 286, No. 18. ©2001 American Medical Association. All rights reserved.

Figure 4. Representative Examples of the Major Differences Between the Predicted Protein Sets of the Human Compared With the Fly and the Worm
[Bar chart not reproduced. Vertical axis: No. of proteins (0 to 120), plotted for worm, fly, and human. Groups: developmental regulators; neural structure and function; hemostasis, complement system, and immune response. Labels recoverable from the figure: TGF-β, Wnt, ephrin, cadherin, connexin, plexin, semaphorin, synaptotagmin, TSP, voltage-gated ion channels*, myelin proteins, neuropilin, Kringle, FN1, FN2, MACPF, CCP, C1q, cytokine, TNF, and TIR.]
The numbers of proteins containing the specified Pfam domain or protein family for each of the animal genomes were derived by computational analysis.1 Representative protein domains or protein families that show a 2-fold or greater expansion in the human were categorized into cellular processes (eg, developmental regulators; neural structure and function; or hemostasis, complement system, and immune response) for representation. A detailed biological description of each of these protein domains may be obtained from the Pfam52 or SMART53 databases. TGF-β indicates transforming growth factor-β; TSP, thrombospondin; CCP, complement control protein; and TIR, toll interleukin receptor. Notable examples from this list of proteins that are unique to the human (when compared with the fly and worm) include connexins (constitutive subunits of intercellular channels, providing the structural basis for electrical coupling); neuropilin, a key mediator in axonal guidance along with the semaphorins and plexin molecules; fibronectin type 1 (FN1) domain, a fibrin-binding domain found in certain proteins of the coagulation cascade; fibronectin type 2 (FN2) domain, a collagen-binding domain found in a diverse set of hemostatic regulators; membrane-attack complex/perforin (MACPF), a domain found in certain complement proteins; C1q, a domain found in complement 1q and in many collagens; and cytokines and tumor necrosis factor (TNF), 2 of the central families of secreted proteins that mediate a wide spectrum of immune-related functions.
*Voltage-gated (VG) ion channels include VG-sodium, -calcium, and -potassium channels.

Table 2. Immediate Benefits From Whole-Genome Analysis by Genetic Basis of Disease

Genetic mechanism: Standard mendelian patterns of inheritance and X-linked inheritance
Benefits: Improved familial linkage studies in medical genetics; discovery of genes (and regulatory regions) with mutations that result in phenotypes (diseases) that conform to the classic principles of mendelism; better identification of candidate genes; improved functional and positional cloning of genes involved in the causation of disease68,69

Genetic mechanism: Complex (polygenic) inheritance
Benefits: Better identification of disease susceptibility loci and candidate genes; more effective association (population) studies involving the search for alleles that contribute to common diseases such as cardiovascular disease, diabetes, and cancer, in which the phenotype does not conform to the classic principles of mendelism68-70

Genetic mechanism: Inherited disorders involving unstable triplet repeats and the clinical phenomenon of anticipation
Benefits: Better catalogs of repeats and polyglutamine tracts22; better identification of candidate genes71

Genetic mechanism: Genetic imprinting (parent-of-origin effects)
Benefits: More efficient and rapid identification of methylation patterns based on high-throughput mass spectroscopy analysis and correlating with gene expression and clinical phenotypes31,72; comparison of DNA methylation patterns between mouse and human in the context of disease, yielding clearer insights into disease in which pathogenesis is linked to abnormal imprinting and related epigenetic phenomena

Genetic mechanism: Acquired somatic mutations (eg, cancer)
Benefits: A reference for comparing germline and somatic configuration of genes more effectively68

single-nucleotide polymorphism (SNP).69,70 Put simply, an SNP is the substitution of one purine or pyrimidine base for another at a given location in a strand of DNA. Generally, SNPs are biallelic (only 2 choices exist at a given site within a population). The nomenclature defining a mutation (a change in DNA that may affect phenotype7 [http://www.nhgri.nih.gov/DIR/VIP/Glossary/pub_glossary.cgi]) can be somewhat arbitrary and relative. By convention, when a substitution is present in more than 1% of a given target population and causes no discernibly abnormal phenotype, it is called a variant or polymorphism.7,69,70 Single-nucleotide polymorphisms can affect gene function, or they can be neutral. Neutrality is sometimes inferred if an SNP does not alter protein coding (ie, it does not produce a change in an exon that encodes a different amino acid). In practice, this inference can be wrong. It is also worth noting that an SNP may be subtly responsible for an abnormal phenotype, but only in the context of a given environment (or the simultaneous presence of SNPs in other locations), without which an abnormal phenotype is not expressed.69,70

We now have a genome-wide survey of several million variants, with precise nucleotide localization, in an ethnogeographically divergent group of individuals.1,138 In comparing chromosomes from any 2 randomly selected individuals, we know there is an average of 1 variation for every 1250 nucleotides. These variations can occur within exons, with synonymous (no change in amino acid) or nonsynonymous (a change in amino acid) alterations in code, or they can occur outside exons within intronic or intergenic regions of the genome. Less than 1% of all known SNPs encode a direct amino acid change of the ultimate protein product of a gene.1 Therefore, there are only thousands (not millions) of genetic variations that directly contribute to the structural protein diversity of human beings.1 We, and others, are currently performing large-scale resequencing and genotyping to define the frequency of these variations in various populations. While such changes are certainly important to medicine, this finding implies that future medical research will need to also focus on the contributions of polymorphisms in noncoding regions or intergenic regions of the genome, something that was previously difficult or impossible to do.
Thus, SNPs in proximity to various regulatory regions,25 some of which exist at a great distance from the regulated gene in either 5′ or 3′ directions, are likely to be important. By the same token, SNPs in introns may have an unexpected role in the causation of human disease.26,139 Finally, SNPs in genes whose final product is an RNA may also be of unexpected importance.140

An understanding of the human genome and its DNA variation will allow a rapid expansion of the medical applications of pharmacogenetics.4,24 There are a number of clear examples where DNA variation, primarily, but not exclusively, in the form of SNPs has implications for clinical research and medical practice (Tables 1, 2, and 3).4,7,24,25,76-105 These include polymorphisms that influence the clinical course or response to therapy. Thus, angiotensin-II type-1 receptor polymorphisms can have an impact on the severity of congestive heart failure as well as the response to angiotensin-converting enzyme inhibitors141,142; β-adrenergic receptor variants may alter airway hyperreactivity and response to β-agonists administered through metered-dose inhalers143; and the apolipoprotein E4 allele affects onset of disease and the differential response to cholinesterase inhibitors in patients with Alzheimer disease.144 Also, the bioavailability of drugs is affected by polymorphisms in genes that code for proteins regulating drug metabolism and disposition (eg, MDR1, a drug efflux pump which regulates digoxin levels,145 CYP2C19, which regulates omeprazole metabolism,146 and CYP2C9 or 2C19, which regulate tolbutamide and phenytoin metabolism78). These have clinical consequences. The availability of genome-wide data on DNA variation is thus likely to expand progress in prevention, diagnosis, and treatment customized to the needs of a specific patient, rather than to a statistical average. In addition, SNPs provide a new tool for familial linkage and population-based association studies to speed the identification of genes as targets for new diagnostics and therapeutics.4,24,69,70

In this context, it will soon be possible to integrate information on DNA variations in human populations with an understanding of entire networks of genes. Again, this would have been difficult or impossible prior to the sequencing of the entire genome. Since most common human diseases culminate from long-standing interactions between many genes and environmental factors (including lifestyle), predicting the contributions of genes in complex disorders will remain a challenge for medicine for many years to come.

Table 3. Short- and Long-term Research and Clinical Benefits From Whole-Genome Analysis

Genomic technology: Bioinformatics: (1) predicting protein structure, (2) predicting protein function, (3) analysis of genetic variations, (4) impact of variations on structure and function, (5) analysis of expression data, (6) representation and analysis of biomolecular interactions (pathways) to understand disease-gene relationships
Identification of disease genes: Integral component in the structural, functional, and evolutionary analysis of a genome1,2
Drug discovery: Target identification (homologs of known drug targets or key members in a biological pathway); structure-based rational drug design (small molecule or biologics)73-75
Predictive toxicology: Integrative analysis of pathology and clinical data with polymorphism and expression data5,6,24
Clinical trials: Integration of computational biology, clinical data, and polymorphism and expression data5,6,24
Clinical practice: Personalized medicine: adaptation of preventive, diagnostic, and therapeutic approaches to the genotypes and gene expression profiles (especially proteomic profiles) of an individual patient5,6,24

Genomic technology: Resequencing to catalog genetic variations
Identification of disease genes: Genetic approaches for identification of candidate disease genes5,69,70
Drug discovery: Efficient identification of genes involved in causation (or prevention) of disease5,69,70
Predictive toxicology: Identification of reliable surrogate markers of toxicity4,5,7,24 (ie, relevant polymorphisms in genes that are drug targets or drug modifiers)
Clinical trials: Stratification in clinical trials to predict toxicity and efficacy both in a prospective and retrospective manner5,24,76-84
Clinical practice: Assess susceptibility to diseases such as cancer,85-93 infectious disease,25,94-98 and asthma99-105; assess response to therapeutic interventions76-84

Genomic technology: Differential expression: (1) RNA arrays, (2) protein, (3) metabolite (in other eukaryotic systems),106,107 (4) tissue arrays108
Identification of disease genes: Differentially expressed or altered genes109-112 or proteins43,44,113-115
Drug discovery: Target identification, validation63,65-67,115
Predictive toxicology: Identification of surrogate markers for toxicity65,67,112,116
Clinical trials: Identification of surrogate markers to predict toxicity or efficacy65,67,112,116
Clinical practice: Diagnostic and prognostic markers to monitor progression of disease or response to therapeutic intervention62,65,67,116,117,118

Genomic technology: Protein interaction maps of pathways involved in disease: yeast 2-hybrid genetic screen, mass spectroscopy
Identification of disease genes: Identification of genes; unexpected components of complex pathways involved in disease44,118-122
Drug discovery: Identification of increased number of potential drug targets75,123,124; special applications to infectious pathogens125-127
Predictive toxicology: Identification of pathways that are involved in drug toxicity124
Clinical trials: Not applicable
Clinical practice: Not applicable

Genomic technology: Comparative genome analysis (animal models of disease)
Identification of disease genes: Use of the mouse and other animals as models to study human disease128-134
Drug discovery: Use of the mouse, rat, and dog genomes to model efficacy of new therapies128,133,137
Predictive toxicology: Use of mammalian genomes (eg, rats, mice) to create better models for predictive toxicology and toxicogenomics135,136
Clinical trials: Not applicable
Clinical practice: Not applicable
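The distinction drawn earlier between synonymous and nonsynonymous SNPs, and the quoted density of 1 variation per 1250 nucleotides, can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names (classify_coding_snp, expected_differences) are hypothetical and not part of any analysis described in this article, the standard genetic code is assumed, and a change to or from a stop codon is simply counted as nonsynonymous.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution inside a codon (hypothetical helper).

    codon    -- reference codon on the coding strand, e.g. "GAG"
    position -- 0, 1, or 2: which base of the codon is substituted
    alt_base -- the substituted base
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if len(codon) != 3 or any(b not in BASES for b in codon):
        raise ValueError("codon must be 3 bases drawn from A, C, G, T")
    if position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("position must be 0-2 and alt_base one of A, C, G, T")
    if codon[position] == alt_base:
        raise ValueError("alt_base equals the reference base; not a substitution")
    mutated = codon[:position] + alt_base + codon[position + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "nonsynonymous"


def expected_differences(genome_length: int = 2_900_000_000,
                         spacing: int = 1250) -> int:
    """Rough count of differences between 2 chromosome sets at 1 per `spacing` bases."""
    return genome_length // spacing


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 2, "G"))  # Glu -> Glu
    print(classify_coding_snp("GAG", 1, "T"))  # Glu -> Val
    print(expected_differences())
```

With the figures quoted in the text (2.9 billion nucleotides, 1 variation per 1250 nucleotides), the arithmetic gives roughly 2.3 million differences between any 2 randomly selected genomes. As the text cautions, a "synonymous" label from such a table lookup does not establish that a variant is functionally neutral.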
Biological Complexity and the Role of the Genome in the Future of Medicine

The modest number of human genes means that we must explore mechanisms that generate the complexities inherent in human development and the sophisticated signaling systems that maintain homeostasis. There are a large number of ways in which the functions of individual genes and gene products are regulated. An overview of these mechanisms and their relevance to disease and therapeutic intervention is discussed briefly here and enumerated in Table 1.1,2,4,5,7,15-46 The key point is that certain observations at the clinical level provide unique opportunities to understand how the genome functions as an integrated system. Thus, the study of mendelian disorders has led to unique insights regarding the functions of more than 1000 genes.49 However, many common disorders, including cancer, asthma, type 2 diabetes mellitus, cardiovascular abnormalities, and neuropsychiatric illness, cannot be generally explained on the basis of variation in a single gene; that is, they are polygenic in origin.68 Other illnesses are manifestations of (1) the process of creating triplet repeats22,23 (eg, Huntington disease, spinocerebellar ataxia, fragile X syndrome); (2) abnormalities of certain epigenetic phenomena, such as gene imprinting31,35 (eg, Prader-Willi syndrome33 and Beckwith-Wiedemann syndrome34); (3) abnormalities of mitochondrial genes147 (eg, MELAS [mitochondrial myopathy, encephalopathy, lactic acidosis, and stroke-like episodes] syndrome, Kearns-Sayre syndrome); and (4) somatic mutation or mosaicism148,149 (eg, McCune-Albright syndrome, paroxysmal nocturnal hemoglobinuria, cancer).
In addition, there is growing evidence that conditions such as Prader-Willi syndrome are caused by a variant in genes whose product is an RNA molecule, not a protein per se (Table 1).29,140 Thus, understanding the physiological roles of noncoding RNA and its modifications may contribute to understanding of the causation of specific diseases.37,140,150,151

While the use of genomic sequence data to identify genetic determinants of disease has already shown significant progress (Table 2),22,31,68-72 the availability of the genomic sequence, and the development of high-throughput experimental and computational technologies, herald a new era in our understanding of disease processes (Table 3). The use of computational homology–based approaches to identify new drug targets and to predict their structures in facilitating rational drug design is revolutionizing the drug development process.73,74 Efforts to incorporate genomic approaches into various stages of the drug development process for the development of novel and improved therapeutic agents, as well as for optimizing patient stratification and following clinical outcomes in treatment trials, are currently under way (Table 3).5,24,65,67,112,116 The practical benefits in the clinical practice of medicine will become increasingly apparent when there is a complete integration of genomic information with the phenotypes of clinical disease (Tables 2 and 3).

We are in the midst of a major paradigm shift in biology and medicine119; the process of studying genes in isolation has now shifted toward exploring networks of genes involved in cellular processes152 and disease, identifying molecular “portraits” of disease based on tissue or organ involvement, and ultimately defining the biochemical readouts that are specific to clinical conditions.
The era of “personalized” medicine will evolve as a parallel process, in which DNA variations recorded in human populations will be integrated into the above paradigm, to guide a new generation of diagnostic, prognostic, and therapeutic modalities designed to improve patient care (Table 3).153

SOCIETAL CHALLENGES AND LEGACIES FOR GENOMIC RESEARCH

Basic science researchers have the task of deciphering the biological meaning of the 2.9 billion nucleotide codes that comprise the human genome. Physicians may be called on to interpret the scientific implications for their patients. However, physicians may also be called on to address complex historical and societal issues, which are either induced or revived by the sequencing of the human genome, in their everyday practice.

One fundamental issue is the extent to which knowledge of the genomic DNA sequence allows prediction of the essence of who we are, including the determination of risk for illness in various settings. There are some who may view the genome in a deterministic way, believing that the human condition will ultimately be seen entirely as a manifestation of sequence information and computation. We do not subscribe to such a view. Nevertheless, an individual’s DNA is, in a sense, the ultimate personal identifier; thus, some patients may fear that the advent of new genomic technologies will affect their livelihoods and standing in the community.
It is ironic that approximately 1 week prior to the publications of the human genome, an agency of the US government went to court for the first time to block a private employer from compelling its employees to submit to genetic testing in work-related injury inquiries, threatening dismissal for noncompliance.154,155 Thus, physicians and other health care providers are likely to have an interest in state and federal legislation protecting patient privacy and prohibiting discrimination on the basis of genetic testing. Patients who have suffered such discrimination in the past or fear it in the future are perhaps unlikely to view the scientific achievement of sequencing the human genome as an entirely positive accomplishment. We believe legislation to protect genetic privacy and prevent discrimination is essential to progress in genomics research. There is also a complex social and political history related to human genetics. At various times in the past, many societies, including our own, adopted theories of race and genetics as the justification for political oppression against vulnerable groups.156,157 James D. Watson, the founding director of the Human Genome Project at the National Institutes of Health, provides an important perspective on these issues.158 It is possible that medicine, even today, is affected by subtle and unrecognized biases. Thus, it has been argued that the medical community may wish to mark the milestone of the recent sequencing of the human genome as a time to discuss how such biases influence medical education, clinical research, and medical practice.159 In such a discussion, we would offer that an analysis of the genome1,2 reveals a fundamental unity for all human beings. Our task now is to use the tools of modern genomics to prevent, diagnose, and treat illnesses, and, at the same time, to try to ensure that the benefits of genomics research extend fairly to all members of society. 
Acknowledgment: We wish to thank the members of the Celera scientific staff for contributions toward analysis of the human genome sequence, and Beth Hoyle, BA, for her excellent editorial assistance in preparing the manuscript. We also thank Steven L. Salzberg, PhD, for his helpful discussions and assistance in illustrating the segmental duplications within the human genome.

REFERENCES
1. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001;291:1304-1351.
2. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860-921.
3. Cavalli-Sforza LL. The DNA revolution in population genetics. Trends Genet. 1998;14:60-65.
4. Weber WW. Pharmacogenetics. Oxford, England: Oxford University Press; 1997.
5. Roses AD. Pharmacogenetics and the practice of medicine. Nature. 2000;405:857-865.
6. Broder S, Venter JC. Whole genomes. Curr Opin Biotechnol. 2000;11:581-585.
7. Broder S, Venter JC. Sequencing the entire genomes of free-living organisms. Annu Rev Pharmacol Toxicol. 2000;40:97-132.
8. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans. Science. 1998;282:2012-2018.
9. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796-815.
10. Adams MD, Celniker SE, Holt RA, et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185-2195.
11. Dickson D. Gene estimate rises as US and UK discuss freedom of access. Nature. 1999;401:311.
12. Liang F, Holt I, Pertea G, et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat Genet. 2000;25:239-240.
13. Antequera F, Bird A. Number of CpG islands and genes in human and mouse. Proc Natl Acad Sci U S A. 1993;90:11995-11999.
14. Wright FA, Lemon WJ, Zhao WD, et al. A draft annotation and overview of the human genome. Genome Biol. 2001;2:1-18.
15. Kobayashi K, Nakahori Y, Miyake M, et al. An ancient retrotransposal insertion causes Fukuyama-type congenital muscular dystrophy. Nature. 1998;394:388-392.
16. Dawkins R, Leelayuwat C, Gaudieri S, et al. Genomics of the major histocompatibility complex. Immunol Rev. 1999;167:275-304.
17. Saikawa Y, Kaneda H, Yue L, et al. Structural evidence of genomic exon-deletion mediated by Alu-Alu recombination in a human case with heme oxygenase-1 deficiency. Hum Mutat. 2000;16:178-179.
18. Rohlfs EM, Puget N, Graham ML, et al. An Alu-mediated 7.1 kb deletion of BRCA1 exons 8 and 9 in breast and ovarian cancer families that results in alternative splicing of exon 10. Genes Chromosomes Cancer. 2000;28:300-307.
19. Norris J, Fan D, Aleman C, et al. Identification of a new subclass of Alu DNA repeats which can function as estrogen receptor-dependent transcriptional enhancers. J Biol Chem. 1995;270:22777-22782.
20. Sharan C, Hamilton NM, Parl AK, et al. Identification and characterization of a transcriptional silencer upstream of the human BRCA2 gene. Biochem Biophys Res Commun. 1999;265:285-290.
21. Hamdi HK, Nishio H, Tavis J, et al. Alu-mediated phylogenetic novelties in gene regulation and development. J Mol Biol. 2000;299:931-939.
22. Usdin K, Grabczyk E. DNA repeat expansions and human disease. Cell Mol Life Sci. 2000;57:914-931.
23. Lieberman AP, Fischbeck KH. Triplet repeat expansion in neuromuscular disease. Muscle Nerve. 2000;23:843-850.
24. Roses AD. Pharmacogenetics and future drug development and delivery. Lancet. 2000;355:1358-1361.
25. Knight JC, Udalova I, Hill AV, et al. A polymorphism that affects OCT-1 binding to the TNF promoter region is associated with severe malaria. Nat Genet. 1999;22:145-150.
26. Horikawa Y, Oda N, Cox NJ, et al. Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat Genet. 2000;26:163-175.
27. Hoffmeyer S, Burk O, von Richter O, et al. Functional polymorphisms of the human multidrug-resistance gene. Proc Natl Acad Sci U S A. 2000;97:3473-3478.
28. Benhorin J, Taub R, Goldmit M, et al. Effects of flecainide in patients with new SCN5A mutation. Circulation. 2000;101:1698-1706.
29. Cavaille J, Buiting K, Kiefmann M, et al. From the cover: identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. Proc Natl Acad Sci U S A. 2000;97:14311-14316.
30. Mlynarczyk SK, Panning B. X inactivation. Curr Biol. 2000;10:R899-R903.
31. Feinberg AP. DNA methylation, genomic imprinting and cancer. Curr Top Microbiol Immunol. 2000;249:87-99.
32. Wolffe AP, Matzke MA. Epigenetics. Science. 1999;286:481-486.
33. Ohta T, Gray TA, Rogan PK, et al. Imprinting-mutation mechanisms in Prader-Willi syndrome. Am J Hum Genet. 1999;64:397-413.
34. Engel JR, Smallwood A, Harper A, et al. Epigenotype-phenotype correlations in Beckwith-Wiedemann syndrome. J Med Genet. 2000;37:921-926.
35. Cui H, Horon IL, Ohlsson R, et al. Loss of imprinting in normal tissue of colorectal cancer patients with microsatellite instability. Nat Med. 1998;4:1276-1280.
36. Neuberger MS, Scott J. Immunology: RNA editing AIDs antibody diversification? Science. 2000;289:1705-1706.
37. Wang Q, Khillan J, Gadue P, Nishikura K. Requirement of the RNA editing deaminase ADAR1 gene for embryonic erythropoiesis. Science. 2000;290:1765-1768.
38. Yu L, Heere-Ress E, Boucher B, et al. Familial hypercholesterolemia. Atherosclerosis. 1999;146:125-131.
39. Philips AV, Cooper TA. RNA processing and human disease. Cell Mol Life Sci. 2000;57:235-249.
40. Buee L, Bussiere T, Buee-Scherrer V, et al. Tau protein isoforms, phosphorylation and role in neurodegenerative disorders. Brain Res Brain Res Rev. 2000;33:95-130.
41. Bamford RN, Battiata AP, Waldmann TA. IL-15. J Leukoc Biol. 1996;59:476-480.
42. Holcik M, Sonenberg N, Korneluk RG. Internal ribosome initiation of translation and the control of cell death. Trends Genet. 2000;16:469-473.
43. Banks RE, Dunn MJ, Hochstrasser DF, et al. Proteomics. Lancet. 2000;356:1749-1756.
44. Pandey A, Mann M. Proteomics to study genes and genomes. Nature. 2000;405:837-846.
45. Kehoe JW, Bertozzi CR. Tyrosine sulfation. Chem Biol. 2000;7:R57-R61.
46. McKinsey TA, Zhang CL, Lu J, Olson EN. Signal-dependent nuclear export of a histone deacetylase regulates muscle differentiation. Nature. 2000;408:106-111.
47. Hamdi H, Nishio H, Zielinski R, Dugaiczyk A. Origin and phylogenetic distribution of Alu DNA repeats. J Mol Biol. 1999;289:861-871.
48. Howard BH, Sakamoto K. Alu interspersed repeats. New Biol. 1990;2:759-770.
49. Antonarakis SE, McKusick VA. OMIM passes the 1,000-disease-gene mark. Nat Genet. 2000;25:11.
50. Broder S, Merigan TC Jr, Bolognesi D. Textbook of AIDS Medicine. Baltimore, Md: Williams & Wilkins; 1994.
51. Baltimore D. Our genome unveiled. Nature. 2001;409:814-816.
52. Bateman A, Birney E, Durbin R, et al. The Pfam protein families database. Nucleic Acids Res. 2000;28:263-266.
53. Schultz J, Copley RR, Doerks T, et al. SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 2000;28:231-234.
54. Wu Q, Maniatis T. A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell. 1999;97:779-790.
55. Wu Q, Maniatis T. Large exons encoding multiple ectodomains are a characteristic feature of protocadherin genes. Proc Natl Acad Sci U S A. 2000;97:3124-3129.
56. Ranscht B. Cadherins. Int J Dev Neurosci. 2000;18:643-651.
57. Missler M, Sudhof TC. Neurexins. Trends Genet. 1998;14:20-26.
58. Aravind L, Dixit VM, Koonin EV. Apoptotic molecular machinery. Science. 2001;291:1279-1284.
59. Yuan J, Yankner BA. Apoptosis in the nervous system. Nature. 2000;407:802-809.
60. Krammer PH. CD95’s deadly mission in the immune system. Nature. 2000;407:789-795.
61. Nicholson DW. From bench to clinic with apoptosis-based therapeutic agents. Nature. 2000;407:810-816.
62. Smith MA, Bains SK, Betts JC, et al. Use of two-dimensional gel electrophoresis to measure changes in synovial fluid proteins from patients with rheumatoid arthritis treated with antibody to CD4. Clin Diagn Lab Immunol. 2001;8:105-111.
63. Yoshida M, Loo JA, Lepleya RA. Proteomics as a tool in the pharmaceutical drug design process. Curr Pharm Des. 2001;7:291-310.
64. Wren BW. Microbial genome analysis. Nat Rev Genet. 2000;1:30-39.
65. Fung ET, Wright GL Jr, Dalmasso EA. Proteomic strategies for biomarker identification. Curr Opin Mol Ther. 2000;2:643-650.
66. Fung ET, Thulasiraman V, Weinberger SR, Dalmasso EA. Protein biochips for differential profiling. Curr Opin Biotechnol. 2001;12:65-69.
67. Kennedy S. Proteomic profiling from human samples. Toxicol Lett. 2001;120:379-384.
68. Peltonen L, McKusick VA. Genomics and medicine. Science. 2001;291:1224-1229.
69. Risch NJ. Searching for genetic determinants in the new millennium. Nature. 2000;405:847-856.
70. Chakravarti A. Population genetics: making sense out of sequence. Nat Genet. 1999;21(suppl 1):56-60.
71. Hughes RE, Olson JM. Therapeutic opportunities in polyglutamine disease. Nat Med. 2001;7:419-423.
72. Kondo T, Bobek MP, Kuick R, et al. Whole-genome methylation scan in ICF syndrome. Hum Mol Genet. 2000;9:597-604.
73. Sanchez R, Pieper U, Melo F, et al. Protein structure modeling for structural genomics. Nat Struct Biol. 2000;7(suppl):986-990.
74. Vitkup D, Melamud E, Moult J, Sander C. Completeness in structural genomics. Nat Struct Biol. 2001;8:559-566.
75. Teichmann SA, Murzin AG, Chothia C. Determination of protein function, evolution and interactions by structural genomics. Curr Opin Struct Biol. 2001;11:354-363.
76. Arranz MJ, Munro J, Birkett J, et al. Pharmacogenetic prediction of clozapine response. Lancet. 2000;355:1615-1616.
77. Lesch KP, Bengel D, Heils A, et al. Association of anxiety-related traits with a polymorphism in the serotonin transporter gene regulatory region. Science. 1996;274:1527-1531.
78. Inoue K, Yamazaki H, Imiya K, et al. Relationship between CYP2C9 and 2C19 genotypes and tolbutamide methyl hydroxylation and S-mephenytoin 4’-hydroxylation activities in livers of Japanese and Caucasian populations. Pharmacogenetics. 1997;7:103-113.
79. Israel E, Drazen JM, Liggett SB, et al. Effect of polymorphism of the beta(2)-adrenergic receptor on response to regular use of albuterol in asthma. Int Arch Allergy Immunol. 2001;124:183-186.
80. Iwata N, Cowley DS, Radel M, et al. Relationship between a GABAA alpha 6 Pro385Ser substitution and benzodiazepine sensitivity. Am J Psychiatry. 1999;156:1447-1449.
81. Redman AR. Implications of cytochrome P450 2C9 polymorphism on warfarin metabolism and dosing. Pharmacotherapy. 2001;21:235-242.
82. Breen G, Brown J, Maude S, et al. -141 C del/ins polymorphism of the dopamine receptor 2 gene is associated with schizophrenia in a British population. Am J Med Genet. 1999;88:407-410.
83. Gelernter J, Kranzler H, Coccaro E, et al. D4 dopamine-receptor (DRD4) alleles and novelty seeking in substance-dependent, personality-disorder, and control subjects. Am J Hum Genet. 1997;61:1144-1152.
84. Cravchik A, Gejman PV. Functional analysis of the human D5 dopamine receptor missense and nonsense variants. Pharmacogenetics. 1999;9:199-206.
85. El-Omar EM, Carrington M, Chow WH, et al. Interleukin-1 polymorphisms associated with increased risk of gastric cancer. Nature. 2000;404:398-402.
86. Ziv E, Cauley J, Morin PA, et al. Association between the T29→C polymorphism in the transforming growth factor β1 gene and breast cancer among elderly white women. JAMA. 2001;285:2859-2863.
87. Struewing JP, Hartge P, Wacholder S, et al. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med. 1997;336:1401-1408.
88. Woodage T, King SM, Wacholder S, et al. The APC I1307K allele and cancer risk in a community-based study of Ashkenazi Jews. Nat Genet. 1998;20:62-65.
89. Brockmoller J, Cascorbi I, Henning S, et al. Molecular genetics of cancer susceptibility. Pharmacology. 2000;61:212-227.
90. Ma J, Stampfer MJ, Giovannucci E, et al. Methylenetetrahydrofolate reductase polymorphism, dietary interactions, and risk of colorectal cancer. Cancer Res. 1997;57:1098-1102.
91. Rebbeck TR, Kantoff PW, Krithivas K, et al. Modification of BRCA1-associated breast cancer risk by the polymorphic androgen-receptor CAG repeat. Am J Hum Genet. 1999;64:1371-1377.
92. Storey A, Thomas M, Kalita A, et al. Role of a p53 polymorphism in the development of human papillomavirus-associated cancer. Nature. 1998;393:229-234.
93. Hildesheim A, Schiffman M, Brinton LA, et al. p53 polymorphism and risk of cervical cancer. Nature. 1998;396:531-532.
94. Smith MW, Dean M, Carrington M, et al. Contrasting genetic influence of CCR2 and CCR5 variants on HIV-1 infection and disease progression. Science. 1997;277:959-965.
95. Lorenz E, Mira JP, Cornish KL, et al. A novel polymorphism in the toll-like receptor 2 gene and its potential association with staphylococcal infection. Infect Immun. 2000;68:6398-6401.
96. Bellamy R, Ruwende C, Corrah T, et al. Variations in the NRAMP1 gene and susceptibility to tuberculosis in West Africans. N Engl J Med. 1998;338:640-644.
97. Zimmerman PA, Woolley I, Masinde GL, et al. Emergence of FY*A(null) in a Plasmodium vivax-endemic region of Papua New Guinea. Proc Natl Acad Sci U S A. 1999;96:13973-13977.
98. Flores-Villanueva PO, Yunis EJ, Delgado JC, et al. Control of HIV-1 viremia and protection from AIDS are associated with HLA-Bw4 homozygosity. Proc Natl Acad Sci U S A. 2001;98:5140-5145.
99. Grasemann H, Yandava CN, Storm van’s Gravesande K, et al. A neuronal NO synthase (NOS1) gene polymorphism is associated with asthma. Biochem Biophys Res Commun. 2000;272:391-394.
100. Graves PE, Kabesch M, Halonen M, et al. A cluster of seven tightly linked polymorphisms in the IL-13 gene is associated with total serum IgE levels in three populations of white children. J Allergy Clin Immunol. 2000;105:506-513.
101. Martinez FD, Graves PE, Baldini M, et al. Association between genetic polymorphisms of the beta2-adrenoceptor and response to albuterol in children with and without a history of wheezing. J Clin Invest. 1997;100:3184-3188.
102. Dahl M, Tybjaerg-Hansen A, Lange P, Nordestgaard BG. DeltaF508 heterozygosity in cystic fibrosis and susceptibility to asthma. Lancet. 1998;351:1911-1913.
103. Hill MR, Cookson WO. A new variant of the beta subunit of the high-affinity receptor for immunoglobulin E (Fc epsilon RI-beta E237G). Hum Mol Genet. 1996;5:959-962.
104. Drazen JM, Yandava CN, Dube L, et al. Pharmacogenetic association between ALOX5 promoter genotype and the response to anti-asthma treatment. Nat Genet. 1999;22:168-170.
105. Stafforini DM, Numao T, Tsodikov A, et al. Deficiency of platelet-activating factor acetylhydrolase is a severity factor for asthma. J Clin Invest. 1999;103:989-997.
106. Fiehn O, Kopka J, Dormann P, et al. Metabolite profiling for plant functional genomics. Nat Biotechnol. 2000;18:1157-1161.
107. Raamsdonk LM, Teusink B, Broadhurst D, et al. A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat Biotechnol. 2001;19:45-50.
108. Kononen J, Bubendorf L, Kallioniemi A, et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med. 1998;4:844-847.
109. Kitahara O, Furukawa Y, Tanaka T, et al. Alterations of gene expression during colorectal carcinogenesis revealed by cDNA microarrays after laser-capture microdissection of tumor tissues and normal epithelia. Cancer Res. 2001;61:3544-3549.
110. Ross DT, Scherf U, Eisen MB, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet. 2000;24:227-235.
111.
Welsh JB, Zarrinkar PP, Sapinoso LM, et al. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci U S A. 2001;98:1176-1181. 112. Scherf U, Ross DT, Waltham M, et al. A gene ©2001 American Medical Association. All rights reserved. THE HUMAN GENOME expression database for the molecular pharmacology of cancer. Nat Genet. 2000;24:236-244. 113. Naaby-Hansen S, Waterfield MD, Cramer R. Proteomics—post-genomic cartography to understand gene function. Trends Pharmacol Sci. 2001;22:376-384. 114. Soskic V, Gorlach M, Poznanovic S, et al. Functional proteomics analysis of signal transduction pathways of the platelet-derived growth factor beta receptor. Biochemistry. 1999;38:1757-1764. 115. Rohlff C. Proteomics in molecular medicine. Electrophoresis. 2000;21:1227-1234. 116. Steiner S, Gatlin CL, Lennon JJ, et al. Proteomics to display lovastatin-induced protein and pathway regulation in rat liver. Electrophoresis. 2000;21: 2129-2137. 117. Celis JE, Wolf H, Ostergaard M. Bladder squamous cell carcinoma biomarkers derived from proteomics. Electrophoresis. 2000;21:2115-2121. 118. Husi H, Ward MA, Choudhary JS, et al. Proteomic analysis of NMDA receptor-adhesion protein signaling complexes. Nat Neurosci. 2000;3:661669. 119. Vidal M. A biological atlas of functional maps. Cell. 2001;104:333-339. 120. Goldstein LS. Kinesin molecular motors. Proc Natl Acad Sci U S A. 2001;98:6999-7003. 121. Walhout AJ, Vidal M. High-throughput yeast two-hybrid assays for large-scale protein interaction mapping. Methods. 2001;24:297-306. 122. Walhout AJ, Vidal M. Protein interaction maps for model organisms. Nat Rev Mol Cell Biol. 2001; 2:55-63. 123. Zeng J. Mini-review: computational structurebased design of inhibitors that target protein surfaces. Comb Chem High Throughput Screen. 2000; 3:355-362. 124. Stanyon CA, Finley RL Jr. 
Progress and potential of Drosophila protein interaction maps. Pharmacogenomics. 2000;1:417-431. 125. McCraith S, Holtzman T, Moss B, Fields S. Genome-wide analysis of vaccinia virus protein-protein interactions. Proc Natl Acad Sci U S A. 2000;97:48794884. 126. Uetz P, Giot L, Cagney G, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623-627. 127. Rain JC, Selig L, De Reuse H, et al. The proteinprotein interaction map of Helicobacter pylori. Nature. 2001;409:211-215. 128. Blake JA, Eppig JT, Richardson JE, et al. The Mouse Genome Database (MGD). Nucleic Acids Res. 2001;29:91-94. 129. Scalzi JM, Hozier JC. Comparative genome mapping. Genomics. 1998;47:44-51. 130. Ringwald M, Baldock R, Bard J, et al. A database for mouse development. Science. 1994;265: 2033-2034. 131. Marshall E. Genome sequencing: Celera assembles mouse genome; public labs plan new strategy. Science. 2001;292:822. 132. Mody M, Cao Y, Cui Z, et al. Genome-wide gene expression profiles of the developing mouse hippocampus. Proc Natl Acad Sci U S A. 2001;98:88628867. 133. Nadeau JH, Balling R, Barsh G, et al. Sequence interpretation: Functional annotation of mouse genome sequences. Science. 2001;291:1251-1255. 134. Fortini ME, Skupski MP, Boguski MS, Hariharan IK. A survey of human disease gene counterparts in the Drosophila genome. J Cell Biol. 2000;150:F23-F30. 135. Nuwaysir EF, Bittner M, Trent J, et al. Microarrays and toxicology. Mol Carcinog. 1999;24:153159. 136. Kanitz MH, Witzmann FA, Zhu H, et al. Alterations in rabbit kidney protein expression following lead exposure as analyzed by two-dimensional gel electrophoresis. Electrophoresis. 1999;20:2977-2985. 137. Weekes J, Wheeler CH, Yan JX, et al. Bovine dilated cardiomyopathy. Electrophoresis. 1999;20:898906. 138. Sachidanandam R, Weissman D, Schmidt SC, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001;409:928-933. 
139. Kishi F, Fujishima S, Tabuchi M. Dinucleotide repeat polymorphism in the third intron of the NRAMP2/ DMT1 gene. J Hum Genet. 1999;44:425-427. 140. Ridanpaa M, van Eenennaam H, Pelin K, et al. Mutations in the RNA component of RNase MRP cause a pleiotropic human disease, cartilage-hair hypoplasia. Cell. 2001;104:195-203. 141. Andersson B, Blange I, Sylven C. Angiotensin-II type 1 receptor gene polymorphism and long-term survival in patients with idiopathic congestive heart failure. Eur J Heart Fail. 1999;1:363-369. 142. Benetos A, Cambien F, Gautier S, et al. Influence of the angiotensin II type 1 receptor gene polymorphism on the effects of perindopril and nitrendipine on arterial stiffness in hypertensive individuals. Hypertension. 1996;28:1081-1084. 143. Johnson M. The beta-adrenoceptor. Am J Respir Crit Care Med. 1998;158:S146-S153. 144. Poirier J, Delisle MC, Quirion R, et al. Apolipoprotein E4 allele as a predictor of cholinergic deficits and treatment outcome in Alzheimer disease. Proc Natl Acad Sci U S A. 1995;92:12260-12264. 145. Cascorbi I, Gerloff T, Johne A, et al. Frequency of single nucleotide polymorphisms in the Pglycoprotein drug transporter MDR1 gene in white subjects. Clin Pharmacol Ther. 2001;69:169-174. 146. Furuta T, Shirai N, Takashima M, et al. Effect of genotypic differences in CYP2C19 on cure rates for Helicobacter pylori infection by triple therapy with a proton pump inhibitor, amoxicillin, and clarithromycin. Clin Pharmacol Ther. 2001;69:158168. 147. Zeviani M, Tiranti V, Piantadosi C. Mitochondrial disorders. Medicine (Baltimore). 1998;77: 59-72. 148. Aldred MA, Trembath RC. Activating and inactivating mutations in the human GNAS1 gene. Hum Mutat. 2000;16:183-189. 149. Gottlieb B, Beitel LK, Trifiro MA. Somatic mosaicism and variable expressivity. Trends Genet. 2001; 17:79-82. 150. Grosjean H, Benne R. Modification and Editing of RNA. Washington, DC: American Society of Microbiology Press; 1998. 151. Eddy SR. Noncoding RNA genes. 
Curr Opin Genet Dev. 1999;9:695-699. 152. Ideker T, Thorsson V, Ranish JA, et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001; 292:929-934. 153. Collins FS, McKusick VA. Implications of the Human Genome Project for medical science. JAMA. 2001; 285:540-544. 154. Gottlieb S. US employer agrees to stop genetic testing. BMJ. 2001;322:449. 155. Schafer S. Railroad agrees to stop gene-testing workers. Washington Post. April 19, 2001:E01. 156. Buller-Hill B. Murderous Science. Plainview, NY: Cold Spring Harbor Press; 1998. 157. Timberg C. Va. house voices regret for eugenics. Washington Post. February 3, 2001:A01. 158. Watson JD. A Passion for DNA. Plainview, NY: Cold Spring Harbor Laboratory Press; 2000:183195, 213. 159. Schwartz RS. Racial profiling in medical research. N Engl J Med. 2001;344:1392-1393. New at jama.com This human genomics/genetics theme issue includes Webenhanced articles with hypertext links from genetics terms to their definitions in the National Human Genome Research Institute Glossary of Genetic Terms (http://www .nhgri.nih.gov/DIR/VIP/Glossary/pub_glossary.cgi). ©2001 American Medical Association. All rights reserved. (Reprinted) JAMA, November 14, 2001—Vol 286, No. 18 2307
