SPECIAL COMMUNICATION
Implications of the Human Genome for
Understanding Human Biology and Medicine
G. Subramanian, MD, PhD
Mark D. Adams, PhD
J. Craig Venter, PhD
Samuel Broder, MD
Clinical researchers, practicing physicians, patients, and the general public
now live in a world in which the 2.9 billion nucleotide codes of the human
genome are available as a resource for scientific discovery. Some of the findings from the sequencing of the human genome were expected, confirming
knowledge presaged by many decades of research in both human and comparative genetics. Other findings are unexpected in their scientific and philosophical implications. In either case, the availability of the human genome
is likely to have significant implications, first for clinical research and then
for the practice of medicine. This article provides our reflections on what
the new genomic knowledge might mean for the future of medicine and how
the new knowledge relates to what we knew in the era before the availability of the genome sequence. In addition, practicing physicians in many communities are traditionally also ambassadors of science, called on to translate arcane data or the complex ramifications of biology into a language
understood by the public at large. This article also may be useful for physicians who serve in this capacity in their communities. We address the following issues: the number of protein-coding genes in the human genome
and certain classes of noncoding repeat elements in the genome; features
of genome evolution, including large-scale duplications; an overview of the
predicted protein set to highlight prominent differences between the human genome and other sequenced eukaryotic genomes; and DNA variation
in the human genome. In addition, we show how this information lays the
foundations for ongoing and future endeavors that will revolutionize biomedical research and our understanding of human health.
www.jama.com
JAMA. 2001;286:2296-2307
The capacity to sequence the
entire genomes of free-living
organisms, and to analyze such
genomes in their entirety, has
considerable implications for understanding human biology and medicine.1,2 Our genomic sequence provides a unique record of who we are and
how we evolved as a species.3 The
knowledge fostered by understanding
the genome might clarify which human characteristics are innate and
which are acquired, as well as the interplay between heredity and environment in defining susceptibility to illness. Such an understanding will make
it possible to study how our genomic
DNA varies among cohorts of patients, and especially the role of such
variation in the causation of important illnesses and responses to pharmaceuticals.4-6 We may also be able to
use new approaches to investigating
complex aspects of the human condition, such as language, thought, self-awareness, and higher-order consciousness. The study of the genome
(genomics) and the associated protein
content (proteomics) of free-living organisms will eventually make it possible to localize and understand the
function of every human gene, as well
as the regulatory elements that control the timing, organ-site specificity,
extent of gene expression, protein levels, and posttranslational modifications that define health or illness. For
any given physiological process, we will
have a new paradigm for addressing its
evolution, development, function, and
mechanism in causing disease and in
affecting the onset and outcome of
disease.7
Author Affiliations: Celera Genomics, Rockville, Md.
Financial Disclosures: Dr Subramanian assisted in Celera patent filings; Dr Adams received honoraria from
noncommercial medical organizations, served as advisor to Celera and as vice president of Celera, and
was involved with Celera patents; Dr Venter served
as president of Celera and was involved with Celera
patents; Dr Broder served as executive vice president
of Celera, holds 16 issued patents for therapeutic agents
(unrelated to Celera activities), and received honoraria/travel expenses from noncommercial medical
organizations for continuing medical education programs. Celera is involved in developing assay kits or reagents. All authors owned Celera stock and had Celera
stock options. All authors received government research grants from the National Human Genome Research Institute to sequence the rat genome and from
the National Institute of Allergy and Infectious Diseases to sequence the Anopheles gambiae genome.
Corresponding Author and Reprints: Samuel Broder,
MD, Celera Genomics, 45 W Gude Dr, Rockville, MD
20850 (e-mail: [email protected]).
JAMA, November 14, 2001—Vol 286, No. 18 (Reprinted)
©2001 American Medical Association. All rights reserved.
THE HUMAN GENOME
PREDICTED PROTEIN-CODING GENES
One noteworthy finding is the relatively low number of genes in the human
genome.1,2 A gene, in this context, is
defined as a locus of cotranscribed exons,
which ultimately result in the production of a peptide or protein. There are a
number of computational tools used to
identify, enumerate, and compare genes
within a species and between species.
These computational methods integrate gene prediction models with
different types of experimental and computational evidence to impart a stringency requirement to this process, and
have been applied by ourselves and others.8-10 The text and subtext of biology
prior to the availability of the sequence
for the human genome was that the number of genes in an organism would in
some fashion reflect its “complexity.”
There were expectations that the human
genome would contain 100 000 genes or
more.11-13
The genomic sequences of 4 multicellular eukaryote genomes have been
published over the past 3 years. The approximate gene count for the fruit fly
(Drosophila melanogaster) is 14 000
genes10; for the roundworm (Caenorhabditis elegans), 19 000 genes8; and
for the mustard plant (Arabidopsis
thaliana), 26 000 genes.9 A comparison of gene numbers among the genomes of different species is in many
ways more important than the total
number of genes found in a genome of
any single species. Those who might be
tempted to use the number of genes to
explain human complexity might then
pause to consider that, by this measure, a human being, with approximately 30 000 genes,1,2 is roughly a fly
plus a worm or the equivalent of a plant.
This assessment of the number of human genes is based on results from
analyses in which there are stringency
requirements used in conjunction with
the computational algorithms. Thus, absolute count may be less important than
comparisons with published genomes. Similarly, those who were expecting very large numbers of newly
discovered genes as targets for pharmaceutical interventions might need to
reassess their expectations. The number of genes independently reported by
both groups1,2 is far lower than the expectations based on prior experimental analysis of expressed sequence tags
(ESTs) or computational analysis,
which estimated that humans would
have 70 000 to 120 000 genes.12-14 The
genomic sequence and the gene
complement predicted by both groups
may be accessed on the Web (at http://public.celera.com/index/cfm [for noncommercial purposes only1]; and at http://www.ensembl.org/ [data generated by
the International Human Genome Sequencing Consortium2]). The sequencing of the human genome suggests that
we must look beyond gene number per
se (at least protein-coding genes) as we
attempt to understand human complexity, future targets for pharmaceutical research, and implications for
medical practice (TABLE 1).1,2,4,5,7,15-46
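The gene-count comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the rounded estimates quoted in the text; the dictionary and function names are illustrative assumptions, not part of any published analysis.

```python
# Approximate gene counts quoted in the text (rounded estimates, not exact values).
GENE_COUNTS = {
    "fruit fly": 14_000,
    "roundworm": 19_000,
    "mustard plant": 26_000,
    "human": 30_000,
}

def ratio_to_human(species: str) -> float:
    """Return a species' gene count as a fraction of the human estimate."""
    return GENE_COUNTS[species] / GENE_COUNTS["human"]

# "Roughly a fly plus a worm": sum the two invertebrate estimates.
fly_plus_worm = GENE_COUNTS["fruit fly"] + GENE_COUNTS["roundworm"]

for name in ("fruit fly", "roundworm", "mustard plant"):
    print(f"{name}: {ratio_to_human(name):.2f} of the human count")
print(f"fly + worm = {fly_plus_worm} genes")
```

On these rounded figures the fly and worm together come to 33 000 genes, close to the human estimate, which is the point the comparison makes.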
Surveying the landscape of the human genome leads to several other observations. Only about 1% of the genome is spanned by exons (regions that
code for proteins), while just under 25%
is contained within introns (regions between exons within genes that are
spliced out in the creation of messenger RNA and do not code proteins), and
about 75% of the genome is contained
in intergenic DNA.1,2 Thus, genes often
exist in nonrandom clusters or gene-rich “oases,” separated by what appear
to be large “deserts” of several hundreds of thousands of nucleotide codes
that do not appear to encode genes.
There is no simple explanation for why
natural selection has taken this path in
the evolution of the human genome, but
we believe it is premature to conclude
that such “deserts” lack biological or
medical importance.
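To give a sense of scale, the percentages above can be converted into approximate base counts. This is a rough sketch that assumes the 2.9 billion nucleotide figure from the abstract and the rounded percentages in the text; the variable names are hypothetical.

```python
GENOME_SIZE_BP = 2_900_000_000  # "2.9 billion nucleotide codes" (figure from the abstract)

# Rounded percentages from the text. Introns are "just under 25%", so 25 is an
# upper bound, and the three rounded values need not sum to exactly 100.
PERCENT_OF_GENOME = {"exons": 1, "introns": 25, "intergenic": 75}

def approx_bases(percent: int, genome_size: int = GENOME_SIZE_BP) -> int:
    """Convert a whole-number percentage of the genome into a base count."""
    return genome_size * percent // 100

spans = {region: approx_bases(pct) for region, pct in PERCENT_OF_GENOME.items()}

for region, bases in spans.items():
    print(f"{region:>10}: ~{bases / 1e6:,.0f} million bases")
```

Under these assumptions, exons account for only about 29 million of the 2.9 billion bases.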
REPEAT ELEMENTS
The human genome is filled with blocks
or “elements” of repetitive nucleotide
codes whose function is still a mystery.
It has been known for many years, and
amply confirmed with the sequencing of
the genome, that human DNA contains
large and complex families of such repeat
elements.1,2 These include the long interspersed repetitive elements (LINEs) and
short interspersed repetitive elements
(SINEs), which include Alu sequences
that arose with the evolution of primates, including humans.47 These
sequences represent a distinct class of retrotransposon-amplified repeat DNA.
During primate evolution, these DNA elements could be replicated and transposed to new sites in the genome.21,47
They comprise approximately 10% of the
human genome.1,2 Their biological function and role in natural selection have
remained an enigma.
Yet in surveying the landscape of the
human genome, a striking and nonrandom distribution of Alu sequences is
evident. They appear to preferentially
colocate within gene-rich regions of the
genome.1,2 One inference is that the biological role of these Alu sequences, the
effects of nucleotide variations within
such elements,21 and their ability to mediate recombination events17,18 will be
important in understanding their regulatory effects19-21 on gene function and
disease. Further investigations are required to add to the known examples
where Alu sequence variations have
been shown to affect biology and clinical conditions.17-21
Such elements had previously been
characterized as “selfish” DNA (ie, DNA
whose existence seems related to replication purposes only),48 having no direct impact on medicine or natural selection. The availability of the human
genome sequence suggests that this
view should be revised since it appears possible that such repeat elements may indeed contribute to the
causation of human diseases.
GENOME DUPLICATION
The human genome reveals a remarkable level of duplication.1,2 Although the
biological impact of duplication in generating gene superfamilies is well established, the first comprehensive view
of the genome-wide landscape has revealed the widespread impact of 2 distinct mechanisms of duplication.
These 2 forms of duplication are very
different: one form mediated at the
DNA level (segmental duplication), and
another mediated at the RNA level (retrotransposition). Both mechanisms produce paralogs—a term for genes that
make their appearance in more than 1
copy in the genome (albeit with possible modifications).
Segmental Duplication
Among humans, the extent of the segmental duplications is 10- to 100-fold
greater than that observed in the fly and
worm genomes. More than
3500 genes, in over 1000 genomic
blocks ranging in size up to chromosomal lengths, show duplication with linear preservation of order on another chromosome.1 This is
illustrated by an examination of chromosomes 18 and 20 in FIGURE 1 and a
more global representation of this phenomenon in FIGURE 2.1,49
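"Linear preservation of order" can be made concrete with a small calculation. The sketch below is written in Python and is not the method used in the genome analyses cited here. It takes paralogous gene pairs as (position on segment A, position on segment B) and reports the fraction that fall in the longest order-preserving chain, in either orientation. The function names and the example coordinates are hypothetical.

```python
from bisect import bisect_left

def lis_length(values):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for v in values:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def colinear_fraction(pairs):
    """Fraction of gene pairs lying in the longest order-preserving chain.

    pairs: list of (position_on_segment_a, position_on_segment_b).
    Both orientations are checked, since a duplicated block may be inverted.
    """
    ordered_b = [b for _, b in sorted(pairs)]
    forward = lis_length(ordered_b)
    inverted = lis_length([-b for b in ordered_b])
    return max(forward, inverted) / len(pairs)

# Hypothetical coordinates: five gene pairs, one of them out of order.
example_pairs = [(1, 10), (2, 20), (3, 15), (4, 30), (5, 40)]
print(colinear_fraction(example_pairs))
```

A value near 1 would indicate a predominantly colinear block of the kind described for chromosomes 18 and 20; real analyses would also need to handle gene loss, tandem copies, and alignment uncertainty.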
Table 1. Representative Clinical and Biological Correlates of Genomic “Complexity”

Non−Protein-Coding (Regulation of Gene Expression)

Genomic information: Endogenous retroviral elements
Relevance to medicine and physiology: Position-specific effects on gene expression; likely has a major influence on disease expression
Examples: Fukuyama muscular dystrophy,15 major histocompatibility complex gene diversity16

Genomic information: Simple repeats (eg, Alu repeat elements, triplet repeats)
Relevance to medicine and physiology: Position-specific effects on gene expression or mediate recombination events that alter DNA sequence
Examples: Alu repeats: heme oxygenase-1 deficiency,17 breast and ovarian cancer,18-20 parathyroid hormone,21 nicotinic acetylcholine receptor,21 T-cell CD8 alpha,21 Fc epsilon RI-gamma receptor21. Triplet repeats22,23: Huntington disease, Friedreich ataxia

Genomic information: Single-nucleotide polymorphisms (SNPs), including those within promoter, enhancer, or intronic regions
Relevance to medicine and physiology: Determine disease susceptibility and predict therapeutic efficacy and toxicity4,24 (see also Table 2)
Examples: Disease-associated DNA variants in promoters (malaria25) and introns (calpain-10 type 2 diabetes mellitus26). Therapy-related DNA variants in multidrug resistance gene (MDR-1) and digoxin27; cardiac sodium channel and flecainide28

Genomic information: Noncoding RNA (includes transfer RNA [tRNA], ribosomal RNA [rRNA], small nucleolar RNA [snoRNA]-methylation of rRNA, small nuclear RNA [snRNA], and X dosage compensation [Xist])
Relevance to medicine and physiology: Indirect and direct effects on gene expression with several recent reports of disease association
Examples: SnoRNA (imprinting): Prader-Willi syndrome29. Xist: X chromosome inactivation (antisense mechanism)30

Genomic information: DNA methylation; other epigenetic phenomena
Relevance to medicine and physiology: Regulate gene expression in the absence of alteration in genomic sequence; defects in gene imprinting are involved in several disease conditions31,32
Examples: Prader-Willi syndrome,33 Beckwith-Wiedemann syndrome,34 colorectal cancer, Wilms tumor, hepatoblastoma31,35

Protein-Coding (Generation of Protein Diversity)

Genomic information: RNA editing
Relevance to medicine and physiology: Posttranscriptional process that changes the information content within the RNA to affect a wide range of biological processes
Examples: Antibody diversity,36 embryonic erythropoiesis,37 familial hypercholesterolemia38

Genomic information: Alternative splicing with protein isoforms
Relevance to medicine and physiology: Protein isoforms often show tissue-specific variability; altered splicing patterns are causative or are markers for many disease states39
Examples: Tau isoforms in Alzheimer disease40

Genomic information: Alternative start site for proteins (multiple start codons, internal ribosomal entry sites)
Relevance to medicine and physiology: Generates proteins of varying size and with different functional capabilities; important physiological role in the immune system and with cell cycle regulators
Examples: Acquired immune response41 (interleukin-15); apoptotic and cell cycle proteins42 (c-myc, Apaf-1, XIAP)

Genomic information: Evolution of new protein domains1,2
Relevance to medicine and physiology: Most prominent in proteins involved in hemostasis, acquired immune function, hormonal and nuclear regulation (Figure 4)
Examples: Developmental regulators (bioactive peptide hormones), hemostasis (fibronectin type 1 and 2 domain proteins, C1q-complement component), immune function (cytokines), and nuclear regulators (KRAB domain zinc finger family)

Genomic information: Domain shuffling (use of “old” domains to generate “new proteins”)1,2
Relevance to medicine and physiology: Best appreciated among the plasma serine proteases of the coagulation-complement system (Figure 3A)

Genomic information: Domain accretion (greater numbers of domains per protein)1,2
Relevance to medicine and physiology: Likely serves to enhance the combinatorial diversity of protein interactions and is prominently noted in nuclear regulators (Figure 3B)

Genomic information: Gene duplication (segmental duplications, gene duplication, intronless paralogs)
Relevance to medicine and physiology: Evolutionary phenomena that generate protein diversity by paralogous expansion; have important ramifications for disease gene and therapeutic target identification1,2,5,7

Genomic information: Posttranslational modifications (eg, phosphorylation, acetylation, glycosylation, sulfation [eg, tyrosine sulfotransferase], proteolytic cleavage)
Relevance to medicine and physiology: The full extent to which these key modifications affect protein function and thus pathogenesis of disease remains to be explored; in addition to being targets for therapeutic intervention, these protein modifications play a major role in clinical disease43-46

This process might be analogized to
a kind of internal genomic colonization. The clinical relevance of such
events is emphasized by our finding of
paralogous, disease-causing genes, of
validated shared ancestry, on both duplicated segments. A disease-causing
gene is defined as a gene in which sequence variants are linked to the causation of a disease. Notable among these
are genes involved in hemostasis,
complement fixation, transcriptional
regulation (such as the homeobox proteins associated with developmental disorders), metabolic disorders, and voltage-gated ion channels associated with
cardiovascular conduction abnormalities.1 In many cases, there is a disease-causing gene with a paralog on the duplicated segment, whose linkage to a
disease is not currently recognized.1 It
is possible that an understanding of segmental duplication will provide new insights into the pathogenesis of disease.
To be sure, not every duplication event will
lead to a paralog that results in the
same pathophysiologic consequences.
However, it might well be possible to use
genomics to demonstrate a unity for disparate diseases. It may also be possible
to understand and explain adverse reactions and side effects of drugs through
their previously unknown collateral activities against paralogs.
Retrotransposition
The other remarkable finding is the
extent of gene duplication that has
resulted from retrovirus-based transposition of gene transcripts.1 The ancestors of humans encountered retroviruses capable of transcribing RNA to
DNA (reverse transcription). Indeed,
such viruses are not extinct, as diseases such as acquired immunodeficiency syndrome (AIDS) amply confirm.50 The human genome carries the
results of many such encounters. Gene
duplication by this process in effect creates paralogs that lack introns and often
occur in multiple copies scattered randomly throughout the genome. The
medical implications of this form of gene
duplication are similar to those that
apply to segmental duplication. In addition, the degree of identity between the
source gene and the retrotransposed
gene is often very high, thus leading to
the possibility of confounding in DNA-
or protein-based diagnostic tests. It is
important to note that changes in coding or noncoding regulatory regions in
these paralogs, leading to different functions or expression patterns, may be one
way of providing an increased functional repertoire in the human genome.
Figure 1. Example of Segmental Duplication Between Chromosomes in the Human Genome1
Schematic of a large duplicated segment between chromosome 18 (18q22) and 20 (20q13) to show examples
of the genes and their predominantly colinear distribution on both duplicated segments, with the gene names
of 7 of the 56 gene pairs shown. The chromosome 18 segment represents 13 million base pairs (bp) of genomic DNA sequence, whereas the chromosome 20 segment represents 1.4 million bp of genomic DNA. These
genes represent a diverse set of proteins, including nuclear transcription factors (ZNF236 and Kruppel-related:
Kruppel family transcription factors; NFATC1 and NFAT-related: nuclear factor of activated T-cells; GATA6
and GATA-related: GATA transcription factors; TALE homeobox family members, involved in nuclear protein
transcription) as well as potassium channel-related factors (KCNG1 and KCNG2: potassium voltage-gated channels, subfamily G); RAB31 and Ras (RAB)-related: ras oncogene superfamily, involved in protein trafficking.
The precise clinical associations of these proteins with human disease remain to be ascertained, though other
members of these protein classes have been implicated in developmental and cardiovascular conduction abnormalities, for example.49
ANALYSIS OF THE PREDICTED
PROTEIN SET (PROTEOME)
Earlier, we mentioned that the number of protein-coding genes was considered to be low relative to expectations prior to the sequencing of the
genome.12,13,51 Does an analysis of the
full set of proteins (ie, the proteome)
help us resolve the issue of human
beings not appearing to carry many
more genes than a fruit fly, a roundworm, or a plant? Indeed, we do note
that the average human gene makes
more proteins, and more complex proteins, than its invertebrate counterparts. A number of such features are
worth detailing. These include the evolution of new protein domains (well-defined regions on a protein that show
structural and functional conservation), duplications or expansions of domains (domain accretion), as well as
greater combinatorial diversity (domain shuffling) in human beings
(FIGURE 3).1,2,52,53 In addition, certain
genes produce more than 1 type of protein, using alternative transcriptional
start sites and RNA splicing. Finally,
posttranslational modifications,
wherein the translated protein is subjected to a wide range of biochemical
modifications, may potentially give rise
to a significantly larger set of functional proteins than would be predicted by the gene count. Table 1 provides a summary of the medical
relevance of these features.
Figure 2. Duplications Within the Genome
Segmental duplications comparable to those in chromosomes 18 and 20 (see Figure 1) occur throughout the
human genome. Chr indicates chromosome.
Extensive protein domain shuffling is
observed in the human proteome, and
this would serve to increase or alter combinatorial diversity to provide an exponential increase in protein-protein interactions. Moreover, certain special
genes show patterns for generating combinatorial diversity at the protein level.
For example, immunoglobulins and the
T-cell receptors show clonal DNA shuffling or rearrangements to increase the
immune repertoire, while the cadherins show exon transsplicing (a form of
RNA shuffling that mixes and matches
exons to create diversity in the final messenger RNA) to generate increased extracellular interactions.54,55 All of these
factors taken together contribute to a
complexity not captured by examining
gene number alone.
Many proteins (and protein domains)
found in humans evolved early in the
animal lineage and hence have orthologs
(evolutionary counterparts) in invertebrate genomes. However, several noteworthy vertebrate-specific domains
exist, especially within proteins
involved in developmental, homeostatic, and nuclear regulation. These
proteins have profound implications in
understanding human development,
malignant transformation, and stem-cell biology. In addition, proteins related
to acquired immunity, complement
fixation, and hemostasis are either
unique or show a considerable expansion in the human genome compared
to known invertebrate genomes
(FIGURE 4).1,52,53 Thus, we find several
instances where evolution has harnessed “old” domains to provide novel
distinct domain architectures in the
human when compared to the fly or
worm; that is, “new” proteins created
using old domains (Figure 3B).
Examples include the serine proteases, which occur with a widely diverse
set of protein domains in the plasma
proteases (coagulation, complement,
and fibrinolytic systems), and the
recruitment of the immunoglobulin fold
into molecules of the acquired immune
system, eg, antibodies, major histocompatibility complex, and cell adhesion
receptors.1,2 Also, in concordance with
the greatly increased neuronal complexity in the human compared to the
fly and worm, there is an increase in the
number of members of protein families involved in neural development,
structure, and function (Figure 4).1,56,57
These include neuronal growth regulators, as well as classes of voltage-gated ion channels that play a vital role
in neuronal network formation and in
electrical coupling. Understanding how
these components interact to generate
the neuronal infrastructure in humans
will have an impact on therapeutic
modalities to address neuronal injury,
as well as provide insights into new ways
to diagnose and treat neuropsychiatric
disorders. Proteins involved in apoptosis (programmed cell death), a central effector mechanism that regulates
cellular physiology, are also greatly
expanded in humans.1,58 The central role
for this process in neurodegenerative
diseases,59 malignancy, and inflammatory conditions60 related to extrinsic
mediators (eg, pathogens) and intrinsic mediators (eg, cardiovascular disease, inflammatory bowel disease)
constitutes an area of intense current investigation. Therapeutic interventions that
can modulate the apoptotic process will
likely have major effects on some of the
most devastating clinical illnesses that
afflict humankind.61
However, a focus on the genomic
DNA sequence alone will not be sufficient to resolve all the important problems of medicine and biology. The availability of the human genome sequence
will substantially enhance the power of
proteomics.6,44 In the near future, once
the sequence of any unknown protein
is determined in any human fluid or cell
culture (eg, by a technology involving
mass spectrometry for separation and
identification of proteins), there will
virtually always be a “hit,” or “match,”
between proteins and their genes.44 The
applicability of this approach to better
understand disease processes in humans, as well as to facilitate drug discovery and development,43,62,63 will undoubtedly increase as genomes of
additional model organisms (eg, the
mouse, rat, and dog) become available.
Such approaches will also enhance the
capacity for detecting novel microbes
and their protein complements, either
pathogens or commensals, both of which
have profound implications for enhancing microbial diagnosis and developing improved antimicrobial therapeutics.64 It will also be possible to link
proteins and their posttranslational
modifications to the pathophysiology of
illnesses.43,44 Many of these modifications likely affect the activity and
disposition of proteins in health and disease. One special form of posttranslational modification involves protein
cleavage, which is essential to the activity of certain proteins (of which insulin
and other hormones are classic examples), including those involved in the
apoptotic process. Ultimately, the number, complexity, and modifications of
proteins encoded by human genes all
contribute to the complexity of human
biology, and underscore that not all answers lie at the level of genomic information per se. Advances in proteomics44,62,63,65-67 will thus likely enhance the
next generation of diagnostics as well as
guide therapeutics in ways that were previously impossible or exceedingly difficult (TABLE 2 and TABLE 3).*
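The idea of a “hit” between an observed protein fragment and its gene, mentioned earlier in this section, can be illustrated with a toy lookup. The sketch below, written in Python, is a simplification under stated assumptions and not a description of any specific proteomics pipeline. It digests each predicted protein with a simplified trypsin rule (cleave after K or R unless the next residue is P), indexes the resulting peptides, and looks up an observed peptide. The gene names and sequences are hypothetical.

```python
from collections import defaultdict

def tryptic_peptides(seq):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # trailing fragment with no cleavage site
    return peptides

def build_peptide_index(proteins):
    """Map each predicted peptide to the set of genes that could produce it."""
    index = defaultdict(set)
    for gene, seq in proteins.items():
        for peptide in tryptic_peptides(seq):
            index[peptide].add(gene)
    return index

# Hypothetical predicted protein set (illustrative sequences only).
PREDICTED_PROTEINS = {
    "GENE_A": "MKTAYIAKQR",
    "GENE_B": "MSLLTEVETPIRNEWGCR",
}
peptide_index = build_peptide_index(PREDICTED_PROTEINS)

def match(observed_peptide):
    """Return the genes whose predicted proteins contain the observed peptide."""
    return sorted(peptide_index.get(observed_peptide, ()))

print(match("TAYIAK"))
```

With a complete predicted protein set, most observed peptides would be expected to find a match; real searches score measured spectra against predicted ones and must also allow for posttranslational modifications, which an exact string lookup like this one ignores.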
peutic agents (even when there is no
obvious difference in individual pharmacokinetics or biochemical pharmacology).4,7,24,70
The most common form of DNA
variation in the human genome is the
DNA VARIATION
The study of the genome supports the
fundamental unity of human beings.
We all share at least 99.9% of the
nucleotide code in our genome.1,138 And
yet it is remarkable that the diversity
of human beings at the genetic level is
encoded by less than 0.1% variation in
our DNA. In any physician’s practice,
patients are predisposed to different
conditions, respond to the environment in variable ways, metabolize pharmaceuticals differently,4,7 vary regarding dose-response relationships for
common drugs, and have a range of susceptibilities to adverse effects of thera-
A protein domain is a structural and functional unit that shows evolutionary conservation and, by convention,
is represented as a distinct geometric shape. Thus, proteins are made up of 1 or more such building blocks or
“domains” and, depending on the types and numbers of domains, proteins with different biological capabilities are created. Many of these domains have seemingly arbitrary nomenclature that, in many cases, reflects
the experimental nuances of their initial description. A library of curated protein domains with their biological
descriptions is available through the Pfam52 and SMART53 databases.
A, The extensive domain shuffling seen in the plasma proteases of the coagulation and complement systems.
The “ancient” trypsin family serine protease domain occurs in combination with a myriad of protein interaction
domains. Most of these domains are evolutionarily ancient, that is, with the exception of the Gla domain (see
below); they are also observed in the fly and the worm. These include: (1) AP: Apple, originally described in the
coagulation factors, predicted to possess protein- and/or carbohydrate-binding functions; (2) Kr: Kringle, named
after a Danish pastry, has an affinity for lysine-containing peptides; (3) E: epidermal growth factor (EGF)-like; (4)
CUB: domain first described in complement proteins and a diverse group of developmental proteins; (5) CCP:
complement control protein repeats, also known as “sushi” repeats, first recognized in the complement proteins;
and (6) Gla: a hyaluron-binding domain, contains ␥-carboxyglutamate residues, and is seen in proteins associated with the extracellular matrix. Of note is the observation that apolipoprotein (a) likely represents a primatespecific evolutionary event. There is a tremendous expansion of the Kringle domain (dashed segment represents
a total of 29 copies of the Kringle domain) in a trypsin family serine protease.
B, Examples of domain accretion in nuclear regulators in the human compared with the fly.1,2 Domain accretion refers to greater numbers of a specific domain in a multidomain protein or addition of new domains to a
multidomain protein. These domains include: (1) BTB: broad-complex, tramtrack, and bric-a-brac (a name that
reflects its early descriptions in Drosophila), a protein interaction domain; (2) Zf: C2H2 class of DNA-binding zinc
finger; (3) KRAB: Kruppel-associated box, a vertebrate-specific nuclear protein interaction domain; (4) HD: histone deacetylase, an important class of chromatin-modifying enzymes; (5) U: ubiquitin finger, a domain that
targets proteins for proteolytic degradation. There is a major expansion of the numbers of C2H2 zinc fingers in
the BTB or KRAB transcription factor (dashed segment represents a total of 3 copies of the Zf domain) families in
the human, a feature that may reflect increased ability to mediate regulatory interactions with DNA.
*References 1, 2, 4-6, 22, 24, 25, 31, 43, 44, 62, 63,
65-137.
single-nucleotide polymorphism
(SNP).69,70 Put simply, an SNP is the substitution of one purine or pyrimidine base
for another at a given location in a strand
of DNA. Generally, SNPs are biallelic
(only 2 choices exist at a given site within
Figure 3. Prominent Differentiating Features in the Domain Architectures of Representative Human Proteins
[Figure not reproduced. Panel A, Domain "Shuffling" (columns: Protein Domains, Protein Name): plasminogen, apolipoprotein (a), urokinase-type plasminogen activator, prostate-specific antigen, coagulation factor X, complement C1r component, and coagulation factor XI. Panel B, Domain Accretion (human compared with fly; columns: Protein Domains, Protein Name): B-cell lymphoma 6 protein (BCL-6), gonadotropin-inducible transcription repressor-4, and histone deacetylase 6 (Hd6).]
©2001 American Medical Association. All rights reserved.
(Reprinted) JAMA, November 14, 2001—Vol 286, No. 18 2301
THE HUMAN GENOME
Figure 4. Representative Examples of the Major Differences Between the Predicted Protein Sets of the Human Compared With the Fly and the Worm
[Bar chart not reproduced. Vertical axis: No. of Proteins (0 to 120). Series: Worm, Fly, Human. Horizontal-axis groups: Developmental Regulators; Neural Structure and Function; Hemostasis, Complement System, Immune Response. Domain and family labels recoverable from the axis: TGF-β, Wnt, Ephrin, Cadherin, Connexin, Neuropilin, Myelin Proteins, Voltage-Gated Ion Channels*, TSP, Synaptotagmin, Semaphorin, Plexin, FN2, FN1, Kringle, MACPF, CCP, C1q, Cytokine, TIR, TNF.]
The numbers of proteins containing the specified Pfam domain or protein family for each of the animal genomes were derived by computational analysis.1 Representative protein domains or protein families that show a 2-fold or greater expansion in the human were categorized into cellular processes (eg, developmental regulators;
neural structure and function; or hemostasis, complement system, and immune response) for representation. A detailed biological description of each of these protein
domains may be obtained from the Pfam52 or SMART53 databases. TGF-β indicates transforming growth factor-β; TSP, thrombospondin; CCP, complement control
protein; and TIR, toll interleukin receptor.
Notable examples from this list of proteins that are unique to the human (when compared with the fly and worm) include connexins (constitutive subunits of intercellular channels, providing the structural basis for electrical coupling); neuropilin, a key mediator in axonal guidance along with the semaphorins and plexin molecules;
fibronectin type 1 (FN1) domain, a fibrin-binding domain found in certain proteins of the coagulation cascade; fibronectin type 2 (FN2) domain, a collagen-binding
domain found in a diverse set of hemostatic regulators; membrane-attack complex/perforin (MACPF), a domain found in certain complement proteins; C1q, a domain
found in complement 1q and in many collagens; cytokines and tumor necrosis factor (TNF), 2 of the central families of secreted proteins that mediate a wide spectrum
of immune-related functions.
*Voltage-gated (VG) ion channels include VG-sodium, -calcium, and -potassium channels.
Table 2. Immediate Benefits From Whole-Genome Analysis by Genetic Basis of Disease

Genetic Mechanism: Standard mendelian patterns of inheritance and X-linked inheritance
Benefits: Improved familial linkage studies in medical genetics; discovery of genes (and regulatory regions) with mutations that result in phenotypes (diseases) that conform to the classic principles of mendelism; better identification of candidate genes; improved functional and positional cloning of genes involved in the causation of disease68,69

Genetic Mechanism: Complex (polygenic) inheritance
Benefits: Better identification of disease susceptibility loci and candidate genes; more effective association (population) studies involving the search for alleles that contribute to common diseases such as cardiovascular disease, diabetes, and cancer, in which the phenotype does not conform to the classic principles of mendelism68-70

Genetic Mechanism: Inherited disorders involving unstable triplet repeats and the clinical phenomenon of anticipation
Benefits: Better catalogs of repeats and polyglutamine tracts22; better identification of candidate genes71

Genetic Mechanism: Genetic imprinting (parent-of-origin effects)
Benefits: More efficient and rapid identification of methylation patterns based on high-throughput mass spectroscopy analysis and correlating with gene expression and clinical phenotypes31,72; comparison of DNA methylation patterns between mouse and human in the context of disease, yielding clearer insights into disease in which pathogenesis is linked to abnormal imprinting and related epigenetic phenomena

Genetic Mechanism: Acquired somatic mutations (eg, cancer)
Benefits: A reference for comparing germline and somatic configuration of genes more effectively68
a population). The nomenclature defining a mutation (a change in DNA that
may affect phenotype7 [http://www.nhgri.nih.gov/DIR/VIP/Glossary/pub_glossary.cgi]) can be somewhat arbitrary and relative. By convention, when a substitution
is present in more than 1% of a given target population and causes no discernibly abnormal phenotype, it is called a
variant or polymorphism.7,69,70 Single-nucleotide polymorphisms can affect
gene function, or they can be neutral.
Neutrality is sometimes inferred if an
SNP does not alter protein coding (ie, does not produce a
change in an exon that encodes a different amino acid). In practice, this inference can be wrong. It is also worth noting that an SNP may be subtly
responsible for an abnormal phenotype, but only in the context of a given
environment (or the simultaneous presence of SNPs in other locations), without which an abnormal phenotype is not
expressed.69,70
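The distinction drawn above, between a substitution in an exon that changes the encoded amino acid and one that does not, can be checked mechanically for a single codon. The following Python sketch is illustrative only and is not part of the original analysis. It assumes the standard genetic code and a codon read on the coding strand; the helper name classify_coding_snp is hypothetical. As the text cautions, a substitution that leaves the amino acid unchanged is not thereby shown to be neutral.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(codon, position, new_base):
    """Classify a single-base substitution within one codon.

    codon: 3-letter DNA codon on the coding strand (hypothetical input format).
    position: 0, 1, or 2, the index of the substituted base within the codon.
    new_base: the substituted base (A, C, G, or T).
    Returns "synonymous" if the encoded amino acid is unchanged,
    otherwise "nonsynonymous" (this includes changes to or from a stop codon).
    """
    codon = codon.upper()
    new_base = new_base.upper()
    if len(codon) != 3 or not set(codon) <= set(BASES):
        raise ValueError("codon must be 3 DNA bases")
    if position not in (0, 1, 2) or new_base not in BASES:
        raise ValueError("position must be 0-2 and new_base a DNA base")
    if codon[position] == new_base:
        raise ValueError("new_base equals the original base; not a substitution")
    mutated = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


# GAA and GAG both encode glutamate; GAG to GTG replaces glutamate with valine.
print(classify_coding_snp("GAA", 2, "G"))
print(classify_coding_snp("GAG", 1, "T"))
```

A classification of this kind covers only substitutions inside a coding exon; it says nothing about regulatory, intronic, or intergenic variants.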
We now have a genome-wide survey
of several million variants, with precise
nucleotide localization, in an ethnogeographically divergent group of individuals.1,138 In comparing chromosomes from
any 2 randomly selected individuals, we
know there is an average of 1 variation
for every 1250 nucleotides. These variations
can occur within exons, with synonymous (no change in amino acid) or
nonsynonymous (a change in amino
acid) alterations in code, or they can occur outside exons within intronic or intergenic regions of the genome. Less than
1% of all known SNPs encode a direct
amino acid change of the ultimate protein product of a gene.1 Therefore, there
are only thousands (not millions) of genetic variations that directly contribute
to the structural protein diversity of human beings.1 We, and others, are currently performing large-scale resequencing and genotyping to define the
frequency of these variations in various
populations.
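The figures quoted above can be combined in a short back-of-the-envelope calculation. The sketch below is a rough illustration, not a result from the analysis. It assumes a genome length of 2.9 billion nucleotides (the figure used elsewhere in this article) and, purely for illustration, a catalog of 3 million SNPs, since the text says only "several million." The function names are hypothetical.

```python
GENOME_LENGTH = 2_900_000_000      # nucleotides, as quoted elsewhere in this article
NUCLEOTIDES_PER_VARIATION = 1250   # average spacing between variations in 2 individuals
HYPOTHETICAL_SNP_COUNT = 3_000_000 # assumption: stands in for "several million"


def expected_differences(genome_length, spacing):
    """Approximate number of variations between 2 randomly selected genome copies."""
    return genome_length // spacing


def max_coding_changes(total_snps, coding_percent=1):
    """Upper bound on SNPs that change an amino acid, given 'less than 1%'."""
    return total_snps * coding_percent // 100


print(expected_differences(GENOME_LENGTH, NUCLEOTIDES_PER_VARIATION))  # 2320000
print(max_coding_changes(HYPOTHETICAL_SNP_COUNT))                      # 30000
```

Under these assumptions, 2 genome copies differ at roughly 2.3 million positions, while the SNPs that alter an amino acid number in the tens of thousands at most, which is the sense in which the text speaks of thousands rather than millions.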
While such changes are certainly
important to medicine, this finding
implies that future medical research
will need to also focus on the contributions of polymorphisms in noncoding regions or intergenic regions of
the genome, something that was previously difficult or impossible to do.
Thus, SNPs in proximity to various
regulatory regions,25 some of which
exist at a great distance from the regulated gene in either 5′ or 3′ directions,
are likely to be important. By the
same token, SNPs in introns may have
an unexpected role in the causation of
human disease.26,139 Finally, SNPs in
genes whose final product is an RNA
may also be of unexpected importance.140
An understanding of the human genome and its DNA variation will allow
a rapid expansion of the medical applications of pharmacogenetics.4,24
There are a number of clear examples
in which DNA variation, primarily, but not
exclusively, in the form of SNPs, has implications for clinical research and
medical practice (Tables 1, 2, and
3).4,7,24,25,76-105
These include polymorphisms that influence the clinical course or response
to therapy. Thus, angiotensin-II type-1
receptor polymorphisms can have an impact on the severity of congestive heart
Table 3. Short- and Long-term Research and Clinical Benefits From Whole-Genome Analysis

Genomic Technologies: Bioinformatics: (1) predicting protein structure, (2) predicting protein function, (3) analysis of genetic variations, (4) impact of variations on structure and function, (5) analysis of expression data, (6) representation and analysis of biomolecular interactions (pathways) to understand disease-gene relationships
Identification of Disease Genes: Integral component in the structural, functional, and evolutionary analysis of a genome1,2
Drug Discovery: Target identification (homologs of known drug targets or key members in a biological pathway); structure-based rational drug design (small molecule or biologics)73-75
Predictive Toxicology: Integrative analysis of pathology and clinical data with polymorphism and expression data5,6,24
Clinical Trials: Integration of computational biology, clinical data, and polymorphism and expression data5,6,24
Clinical Practice: Personalized medicine: adaptation of preventive, diagnostic, and therapeutic approaches to the genotypes and gene expression profiles (especially proteomic profiles) of an individual patient5,6,24

Genomic Technologies: Resequencing to catalog genetic variations
Identification of Disease Genes: Genetic approaches for identification of candidate disease genes5,69,70
Drug Discovery: Efficient identification of genes involved in causation (or prevention) of disease5,69,70
Predictive Toxicology: Identification of reliable surrogate markers of toxicity4,5,7,24 (ie, relevant polymorphisms in genes that are drug targets or drug modifiers)
Clinical Trials: Stratification in clinical trials to predict toxicity and efficacy both in a prospective and retrospective manner5,24,76-84
Clinical Practice: Assess susceptibility to diseases such as cancer,85-93 infectious disease,25,94-98 and asthma99-105; assess response to therapeutic interventions76-84

Genomic Technologies: Differential expression: (1) RNA arrays, (2) protein, (3) metabolite (in other eukaryotic systems),106,107 (4) tissue arrays108
Identification of Disease Genes: Differentially expressed or altered genes109-112 or proteins43,44,113-115
Drug Discovery: Target identification validation63,65-67,115
Predictive Toxicology: Identification of surrogate markers for toxicity65,67,112,116
Clinical Trials: Identification of surrogate markers to predict toxicity or efficacy65,67,112,116
Clinical Practice: Diagnostic and prognostic markers to monitor progression of disease or response to therapeutic intervention62,65,67,116,117,118

Genomic Technologies: Protein interaction maps of pathways involved in disease: yeast 2-hybrid genetic screen, mass spectroscopy
Identification of Disease Genes: Identification of genes that are components of complex pathways involved in disease44,118-122
Drug Discovery: Identification of increased number of potential drug targets75,123,124; special applications to infectious pathogens125-127
Predictive Toxicology: Identification of unexpected pathways involved in drug toxicity124
Clinical Trials: Not applicable
Clinical Practice: Not applicable

Genomic Technologies: Comparative genome analysis (animal models of disease)
Identification of Disease Genes: Use of the mouse and other animals as models to study human disease128-134
Drug Discovery: Use of the mouse, rat, and dog genomes to model efficacy of new therapies128,133,137
Predictive Toxicology: Use of mammalian genomes (eg, rats, mice) to create better models for predictive toxicology and toxicogenomics135,136
Clinical Trials: Not applicable
Clinical Practice: Not applicable
failure as well as the response to angiotensin-converting enzyme inhibitors141,142; β-adrenergic receptor variants may alter airway hyperreactivity and
response to β-agonists administered
through metered-dose inhalers143; and
the apolipoprotein E4 allele affects onset of disease and the differential response to anticholinergic agents in patients with Alzheimer disease.144 Also,
the bioavailability of drugs is affected by
polymorphisms in genes that code for
proteins regulating drug metabolism and
disposition (eg, MDR1, a drug efflux
pump which regulates digoxin levels,145 CYP2C19, which regulates
omeprazole metabolism,146 and CYP2C9
or 2C19, which regulate tolbutamide
and phenytoin metabolism78). These
have clinical consequences. The availability of genome-wide data on DNA
variation is thus likely to expand
progress in prevention, diagnosis, and
treatment customized to the needs of a
specific patient, rather than to a statistical average. In addition, SNPs provide a new tool for familial linkage and
population-based association studies to
speed the identification of genes as targets for new diagnostics and therapeutics.4,24,69,70 In this context, it will soon
be possible to integrate information on
DNA variations in human populations
with an understanding of entire networks of genes. Again, this would have
been difficult or impossible prior to the
sequencing of the entire genome. Since
most common human diseases culminate from long-standing interactions between many genes and environmental
factors (including lifestyle), predicting
the contributions of genes in complex
disorders will remain a challenge for
medicine for many years to come.
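The gene and drug pairs named earlier in this section can be held in a simple lookup structure, which is one way to picture how genotype information might feed into treatment review. The Python sketch below is an illustration only and is not a clinical tool. The pairings are copied from the examples in the text, the grouping of CYP2C9 and CYP2C19 follows the text's wording, and the names GENE_DRUG_EXAMPLES and drugs_to_review are hypothetical.

```python
# Each record pairs the gene(s) named in the text with the drug(s) whose
# metabolism or disposition the text says they influence.
GENE_DRUG_EXAMPLES = [
    (("MDR1",), ("digoxin",)),
    (("CYP2C19",), ("omeprazole",)),
    (("CYP2C9", "CYP2C19"), ("tolbutamide", "phenytoin")),
]


def drugs_to_review(variant_genes):
    """Return the sorted drugs linked, in the examples above, to any gene in
    which a patient carries a relevant polymorphism (hypothetical input)."""
    variant_genes = set(variant_genes)
    drugs = set()
    for genes, affected_drugs in GENE_DRUG_EXAMPLES:
        if variant_genes & set(genes):
            drugs.update(affected_drugs)
    return sorted(drugs)


print(drugs_to_review({"MDR1"}))
print(drugs_to_review({"CYP2C19"}))
```

A table of this kind captures only single-gene effects on single drugs; as the text notes, most common conditions reflect interactions among many genes and environmental factors.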
Biological Complexity and
the Role of the Genome
in the Future of Medicine
The modest number of human genes
means that we must explore mechanisms that generate the complexities
inherent in human development and
the sophisticated signaling systems
that maintain homeostasis. There are a
large number of ways in which the functions
of individual genes and gene
products are regulated. These mechanisms,
and their relevance to disease and therapeutic
intervention, are discussed briefly here and
enumerated in Table 1.1,2,4,5,7,15-46
The key point is that certain observations at the clinical level provide
unique opportunities to understand
how the genome functions as an integrated system. Thus, the study of mendelian disorders has led to unique insights regarding the functions of more
than 1000 genes.49 However, many
common disorders, including cancer,
asthma, type 2 diabetes mellitus, cardiovascular abnormalities, and neuropsychiatric illness, cannot be generally explained on the basis of variation
in a single gene—that is, they are polygenic in origin.68 Other illnesses are
manifestations of (1) the process of creating triplet repeats22,23 (eg, Huntington disease, spinocerebellar ataxia, fragile X syndrome); (2) abnormalities of
certain epigenetic phenomena, such as
gene imprinting31,35 (eg, Prader-Willi
syndrome33 and Beckwith-Wiedemann syndrome34); (3) abnormalities
of mitochondrial genes147 (eg, MELAS
[mitochondrial myopathy, encephalopathy, lactic
acidosis, and stroke-like episodes]
syndrome, Kearns-Sayre syndrome);
and (4) somatic mutation or mosaicism148,149 (eg, McCune-Albright syndrome, paroxysmal nocturnal hemoglobinuria, cancer). In addition, there
is growing evidence that conditions
such as Prader-Willi syndrome are
caused by a variant in genes whose
product is an RNA molecule, not a protein per se (Table 1).29,140 Thus, understanding the physiological roles of noncoding RNA and its modifications may
contribute to understanding of the causation of specific diseases.37,140,150,151
While the use of genomic sequence
data to identify genetic determinants of
disease has already shown significant
progress (Table 2),22,31,68-72 the availability of the genomic sequence, together with the
development of high-throughput experimental and computational technologies, heralds a new era in our understanding of disease processes (Table
3). The use of computational homology-based approaches to identify new
drug targets and to predict their structures, thereby facilitating rational drug
design, is revolutionizing the drug development process.73,74 Efforts to incorporate genomic approaches into
various stages of the drug development process for the development of
novel and improved therapeutic agents,
as well as for optimizing patient stratification and following clinical outcomes in treatment trials, are currently under way (Table 3).5,24,65,67,112,116
The practical benefits in the clinical
practice of medicine will become increasingly apparent when there is a
complete integration of genomic information with the phenotypes of clinical disease (Tables 2 and 3).
We are in the midst of a major paradigm shift in biology and medicine119;
the process of studying genes in isolation has now shifted toward exploring
networks of genes involved in cellular
processes152 and disease, identifying molecular “portraits” of disease based on
tissue or organ involvement, and ultimately defining the biochemical readouts that are specific to clinical conditions. The era of “personalized”
medicine will evolve as a parallel process, in which DNA variations recorded in human populations will be
integrated into the above paradigm, to
guide a new generation of diagnostic,
prognostic, and therapeutic modalities designed to improve patient care
(Table 3).153
SOCIETAL CHALLENGES
AND LEGACIES FOR
GENOMIC RESEARCH
Basic science researchers have the task
of deciphering the biological meaning
of the 2.9 billion nucleotide codes that
comprise the human genome. Physicians may be called on to interpret the
scientific implications for their patients. However, physicians may also be
called on to address complex historical and societal issues, which are either induced or revived by the sequencing of the human genome, in their
everyday practice.
One fundamental issue is the extent
to which knowledge of the genomic DNA
sequence allows prediction of the essence
of who we are, including the determination of risk for illness in various settings. There are some who may view the
genome in a deterministic way, believing that the human condition will ultimately be seen entirely as a manifestation of sequence information and
computation. We do not subscribe to
such a view. Nevertheless, an individual’s DNA is, in a sense, the ultimate personal identifier; thus, some patients may
fear that the advent of new genomic technologies will affect their livelihoods and
standing in the community. It is ironic
that approximately 1 week prior to the
publications of the human genome,
an agency of the US government went
to court for the first time to block a
private employer from compelling
its employees to submit to genetic testing in work-related injury inquiries,
threatening dismissal for noncompliance.154,155 Thus, physicians and other
health care providers are likely to have
an interest in state and federal legislation protecting patient privacy and prohibiting discrimination on the basis of
genetic testing. Patients who have suffered such discrimination in the past or
fear it in the future are perhaps unlikely
to view the scientific achievement of
sequencing the human genome as an
entirely positive accomplishment. We
believe legislation to protect genetic privacy and prevent discrimination is essential to progress in genomics research.
There is also a complex social and
political history related to human genetics. At various times in the past, many
societies, including our own, adopted
theories of race and genetics as the justification for political oppression against
vulnerable groups.156,157 James D. Watson, the founding director of the Human
Genome Project at the National Institutes of Health, provides an important
perspective on these issues.158 It is possible that medicine, even today, is
affected by subtle and unrecognized
biases. Thus, it has been argued that the
medical community may wish to mark
the milestone of the recent sequencing
of the human genome as a time to discuss how such biases influence medical
education, clinical research, and medical practice.159 In such a discussion, we
would offer that an analysis of the
genome1,2 reveals a fundamental unity
for all human beings. Our task now is
to use the tools of modern genomics to
prevent, diagnose, and treat illnesses,
and, at the same time, to try to ensure
that the benefits of genomics research
extend fairly to all members of society.
Acknowledgment: We wish to thank the members of
the Celera scientific staff for contributions toward analysis of the human genome sequence, and Beth Hoyle,
BA, for her excellent editorial assistance in preparing the
manuscript. We also thank Steven L. Salzberg, PhD, for
his helpful discussions and assistance in illustrating the
segmental duplications within the human genome.
REFERENCES
1. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001;291:
1304-1351.
2. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome.
Nature. 2001;409:860-921.
3. Cavalli-Sforza LL. The DNA revolution in population genetics. Trends Genet. 1998;14:60-65.
4. Weber WW. Pharmacogenetics. Oxford, England: Oxford University Press; 1997.
5. Roses AD. Pharmacogenetics and the practice of
medicine. Nature. 2000;405:857-865.
6. Broder S, Venter JC. Whole genomes. Curr Opin
Biotechnol. 2000;11:581-585.
7. Broder S, Venter JC. Sequencing the entire genomes of free-living organisms. Annu Rev Pharmacol Toxicol. 2000;40:97-132.
8. The C. elegans Sequencing Consortium. Genome
sequence of the nematode C. elegans. Science. 1998;
282:2012-2018.
9. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:
796-815.
10. Adams MD, Celniker SE, Holt RA, et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185-2195.
11. Dickson D. Gene estimate rises as US and UK discuss freedom of access. Nature. 1999;401:311.
12. Liang F, Holt I, Pertea G, et al. Gene index analysis of the human genome estimates approximately
120,000 genes. Nat Genet. 2000;25:239-240.
13. Antequera F, Bird A. Number of CpG islands and
genes in human and mouse. Proc Natl Acad Sci
U S A. 1993;90:11995-11999.
14. Wright FA, Lemon WJ, Zhao WD, et al. A draft
annotation and overview of the human genome.
Genome Biol. 2001;2:1-18.
15. Kobayashi K, Nakahori Y, Miyake M, et al. An ancient retrotransposal insertion causes Fukuyama-type congenital muscular dystrophy. Nature. 1998;
394:388-392.
16. Dawkins R, Leelayuwat C, Gaudieri S, et al. Genomics of the major histocompatibility complex.
Immunol Rev. 1999;167:275-304.
17. Saikawa Y, Kaneda H, Yue L, et al. Structural
evidence of genomic exon-deletion mediated by Alu-Alu recombination in a human case with heme oxygenase-1 deficiency. Hum Mutat. 2000;16:178-179.
18. Rohlfs EM, Puget N, Graham ML, et al. An Alu-mediated 7.1 kb deletion of BRCA1 exons 8 and 9 in
breast and ovarian cancer families that results in alternative splicing of exon 10. Genes Chromosomes
Cancer. 2000;28:300-307.
19. Norris J, Fan D, Aleman C, et al. Identification of
a new subclass of Alu DNA repeats which can function as estrogen receptor-dependent transcriptional enhancers. J Biol Chem. 1995;270:22777-22782.
20. Sharan C, Hamilton NM, Parl AK, et al. Identification and characterization of a transcriptional silencer upstream of the human BRCA2 gene. Biochem Biophys Res Commun. 1999;265:285-290.
21. Hamdi HK, Nishio H, Tavis J, et al. Alu-mediated
phylogenetic novelties in gene regulation and development. J Mol Biol. 2000;299:931-939.
22. Usdin K, Grabczyk E. DNA repeat expansions and
human disease. Cell Mol Life Sci. 2000;57:914-931.
23. Lieberman AP, Fischbeck KH. Triplet repeat expansion in neuromuscular disease. Muscle Nerve. 2000;
23:843-850.
24. Roses AD. Pharmacogenetics and future drug development and delivery. Lancet. 2000;355:1358-1361.
25. Knight JC, Udalova I, Hill AV, et al. A polymorphism that affects OCT-1 binding to the TNF promoter region is associated with severe malaria. Nat
Genet. 1999;22:145-150.
26. Horikawa Y, Oda N, Cox NJ, et al. Genetic variation in the gene encoding calpain-10 is associated with
type 2 diabetes mellitus. Nat Genet. 2000;26:163-175.
27. Hoffmeyer S, Burk O, von Richter O, et al. Functional polymorphisms of the human multidrug-resistance gene. Proc Natl Acad Sci U S A. 2000;97:
3473-3478.
28. Benhorin J, Taub R, Goldmit M, et al. Effects of
flecainide in patients with new SCN5A mutation. Circulation. 2000;101:1698-1706.
29. Cavaille J, Buiting K, Kiefmann M, et al. From the
cover: identification of brain-specific and imprinted
small nucleolar RNA genes exhibiting an unusual genomic organization. Proc Natl Acad Sci U S A. 2000;
97:14311-14316.
30. Mlynarczyk SK, Panning B. X inactivation. Curr
Biol. 2000;10:R899-R903.
31. Feinberg AP. DNA methylation, genomic imprinting and cancer. Curr Top Microbiol Immunol. 2000;
249:87-99.
32. Wolffe AP, Matzke MA. Epigenetics. Science.
1999;286:481-486.
33. Ohta T, Gray TA, Rogan PK, et al. Imprinting-mutation mechanisms in Prader-Willi syndrome. Am
J Hum Genet. 1999;64:397-413.
34. Engel JR, Smallwood A, Harper A, et al. Epigenotype-phenotype correlations in Beckwith-Wiedemann
syndrome. J Med Genet. 2000;37:921-926.
35. Cui H, Horon IL, Ohlsson R, et al. Loss of imprinting in normal tissue of colorectal cancer patients
with microsatellite instability. Nat Med. 1998;4:1276-1280.
36. Neuberger MS, Scott J. Immunology: RNA editing AIDs antibody diversification? Science. 2000;289:
1705-1706.
37. Wang Q, Khillan J, Gadue P, Nishikura K. Requirement of the RNA editing deaminase ADAR1 gene
for embryonic erythropoiesis. Science. 2000;290:
1765-1768.
38. Yu L, Heere-Ress E, Boucher B, et al. Familial hypercholesterolemia. Atherosclerosis. 1999;146:125-131.
39. Philips AV, Cooper TA. RNA processing and human disease. Cell Mol Life Sci. 2000;57:235-249.
40. Buee L, Bussiere T, Buee-Scherrer V, et al. Tau protein isoforms, phosphorylation and role in neurodegenerative disorders. Brain Res Brain Res Rev. 2000;
33:95-130.
41. Bamford RN, Battiata AP, Waldmann TA. IL-15.
J Leukoc Biol. 1996;59:476-480.
42. Holcik M, Sonenberg N, Korneluk RG. Internal ribosome initiation of translation and the control of cell
death. Trends Genet. 2000;16:469-473.
43. Banks RE, Dunn MJ, Hochstrasser DF, et al. Proteomics. Lancet. 2000;356:1749-1756.
44. Pandey A, Mann M. Proteomics to study genes
and genomes. Nature. 2000;405:837-846.
45. Kehoe JW, Bertozzi CR. Tyrosine sulfation. Chem
Biol. 2000;7:R57-R61.
46. McKinsey TA, Zhang CL, Lu J, Olson EN. Signal-dependent nuclear export of a histone deacetylase
regulates muscle differentiation. Nature. 2000;408:
106-111.
47. Hamdi H, Nishio H, Zielinski R, Dugaiczyk A. Origin and phylogenetic distribution of Alu DNA repeats. J Mol Biol. 1999;289:861-871.
48. Howard BH, Sakamoto K. Alu interspersed repeats. New Biol. 1990;2:759-770.
49. Antonarakis SE, McKusick VA. OMIM passes the
1,000-disease-gene mark. Nat Genet. 2000;25:11.
50. Broder S, Merigan TC Jr, Bolognesi D. Textbook
of AIDS Medicine. Baltimore, Md: Williams & Wilkins;
1994.
51. Baltimore D. Our genome unveiled. Nature. 2001;
409:814-816.
52. Bateman A, Birney E, Durbin R, et al. The Pfam
protein families database. Nucleic Acids Res. 2000;
28:263-266.
53. Schultz J, Copley RR, Doerks T, et al. SMART: a
web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 2000;28:231-234.
54. Wu Q, Maniatis T. A striking organization of a
large family of human neural cadherin-like cell adhesion genes. Cell. 1999;97:779-790.
55. Wu Q, Maniatis T. Large exons encoding multiple ectodomains are a characteristic feature of protocadherin genes. Proc Natl Acad Sci U S A. 2000;
97:3124-3129.
56. Ranscht B. Cadherins. Int J Dev Neurosci. 2000;
18:643-651.
57. Missler M, Sudhof TC. Neurexins. Trends Genet.
1998;14:20-26.
58. Aravind L, Dixit VM, Koonin EV. Apoptotic molecular machinery. Science. 2001;291:1279-1284.
59. Yuan J, Yankner BA. Apoptosis in the nervous system. Nature. 2000;407:802-809.
60. Krammer PH. CD95’s deadly mission in the immune system. Nature. 2000;407:789-795.
61. Nicholson DW. From bench to clinic with apoptosis-based therapeutic agents. Nature. 2000;407:
810-816.
62. Smith MA, Bains SK, Betts JC, et al. Use of two-dimensional gel electrophoresis to measure changes
in synovial fluid proteins from patients with rheumatoid arthritis treated with antibody to CD4. Clin Diagn Lab Immunol. 2001;8:105-111.
63. Yoshida M, Loo JA, Lepleya RA. Proteomics as a
tool in the pharmaceutical drug design process. Curr
Pharm Des. 2001;7:291-310.
64. Wren BW. Microbial genome analysis. Nat Rev
Genet. 2000;1:30-39.
65. Fung ET, Wright GL, Jr, Dalmasso EA. Proteomic
strategies for biomarker identification. Curr Opin Mol
Ther. 2000;2:643-650.
66. Fung ET, Thulasiraman V, Weinberger SR, Dalmasso EA. Protein biochips for differential profiling.
Curr Opin Biotechnol. 2001;12:65-69.
67. Kennedy S. Proteomic profiling from human
samples. Toxicol Lett. 2001;120:379-384.
68. Peltonen L, McKusick VA. Genomics and medicine. Science. 2001;291:1224-1229.
69. Risch NJ. Searching for genetic determinants in
the new millennium. Nature. 2000;405:847-856.
70. Chakravarti A. Population genetics: making sense
out of sequence. Nat Genet. 1999;21(suppl 1):56-60.
71. Hughes RE, Olson JM. Therapeutic opportunities in polyglutamine disease. Nat Med. 2001;7:419-423.
72. Kondo T, Bobek MP, Kuick R, et al. Whole-genome methylation scan in ICF syndrome. Hum Mol
Genet. 2000;9:597-604.
73. Sanchez R, Pieper U, Melo F, et al. Protein structure modeling for structural genomics. Nat Struct Biol.
2000;7(suppl):986-990.
74. Vitkup D, Melamud E, Moult J, Sander C. Completeness in structural genomics. Nat Struct Biol. 2001;
8:559-566.
75. Teichmann SA, Murzin AG, Chothia C. Determination of protein function, evolution and interactions by structural genomics. Curr Opin Struct Biol.
2001;11:354-363.
76. Arranz MJ, Munro J, Birkett J, et al. Pharmacogenetic prediction of clozapine response. Lancet. 2000;
355:1615-1616.
77. Lesch KP, Bengel D, Heils A, et al. Association of
anxiety-related traits with a polymorphism in the serotonin transporter gene regulatory region. Science.
1996;274:1527-1531.
78. Inoue K, Yamazaki H, Imiya K, et al. Relationship
between CYP2C9 and 2C19 genotypes and tolbutamide methyl hydroxylation and S-mephenytoin 4'-hydroxylation activities in livers of Japanese and Caucasian populations. Pharmacogenetics. 1997;7:103-113.
79. Israel E, Drazen JM, Liggett SB, et al. Effect of polymorphism of the beta(2)-adrenergic receptor on response to regular use of albuterol in asthma. Int Arch
Allergy Immunol. 2001;124:183-186.
80. Iwata N, Cowley DS, Radel M, et al. Relationship between a GABAA alpha 6 Pro385Ser substitution and benzodiazepine sensitivity. Am J Psychiatry.
1999;156:1447-1449.
81. Redman AR. Implications of cytochrome P450 2C9
polymorphism on warfarin metabolism and dosing.
Pharmacotherapy. 2001;21:235-242.
82. Breen G, Brown J, Maude S, et al. -141 C del/ins
polymorphism of the dopamine receptor 2 gene is associated with schizophrenia in a British population. Am
J Med Genet. 1999;88:407-410.
83. Gelernter J, Kranzler H, Coccaro E, et al. D4 dopamine-receptor (DRD4) alleles and novelty seeking
in substance-dependent, personality-disorder, and control subjects. Am J Hum Genet. 1997;61:1144-1152.
84. Cravchik A, Gejman PV. Functional analysis of the
human D5 dopamine receptor missense and nonsense variants. Pharmacogenetics. 1999;9:199-206.
85. El-Omar EM, Carrington M, Chow WH, et al. Interleukin-1 polymorphisms associated with increased
risk of gastric cancer. Nature. 2000;404:398-402.
86. Ziv E, Cauley J, Morin PA, et al. Association between the T29→C polymorphism in the transforming growth factor β1 gene and breast cancer among
elderly white women. JAMA. 2001;285:2859-2863.
87. Struewing JP, Hartge P, Wacholder S, et al. The
risk of cancer associated with specific mutations of
BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J
Med. 1997;336:1401-1408.
88. Woodage T, King SM, Wacholder S, et al. The
APC I1307K allele and cancer risk in a community-based study of Ashkenazi Jews. Nat Genet. 1998;20:
62-65.
89. Brockmoller J, Cascorbi I, Henning S, et al. Molecular genetics of cancer susceptibility. Pharmacology. 2000;61:212-227.
90. Ma J, Stampfer MJ, Giovannucci E, et al. Methylenetetrahydrofolate reductase polymorphism, dietary interactions, and risk of colorectal cancer. Cancer Res. 1997;57:1098-1102.
91. Rebbeck TR, Kantoff PW, Krithivas K, et al. Modification of BRCA1-associated breast cancer risk by the
polymorphic androgen-receptor CAG repeat. Am J
Hum Genet. 1999;64:1371-1377.
92. Storey A, Thomas M, Kalita A, et al. Role of a p53
polymorphism in the development of human papillomavirus-associated cancer. Nature. 1998;393:229-234.
93. Hildesheim A, Schiffman M, Brinton LA, et al. p53
polymorphism and risk of cervical cancer. Nature. 1998;
396:531-532.
94. Smith MW, Dean M, Carrington M, et al. Contrasting genetic influence of CCR2 and CCR5 variants on HIV-1 infection and disease progression. Science. 1997;277:959-965.
95. Lorenz E, Mira JP, Cornish KL, et al. A novel polymorphism in the toll-like receptor 2 gene and its potential association with staphylococcal infection. Infect Immun. 2000;68:6398-6401.
96. Bellamy R, Ruwende C, Corrah T, et al. Variations in the NRAMP1 gene and susceptibility to tuberculosis in West Africans. N Engl J Med. 1998;338:
640-644.
97. Zimmerman PA, Woolley I, Masinde GL, et al.
Emergence of FY*A(null) in a Plasmodium vivax-endemic region of Papua New Guinea. Proc Natl Acad
Sci U S A. 1999;96:13973-13977.
98. Flores-Villanueva PO, Yunis EJ, Delgado JC, et al.
Control of HIV-1 viremia and protection from AIDS
are associated with HLA-Bw4 homozygosity. Proc Natl
Acad Sci U S A. 2001;98:5140-5145.
99. Grasemann H, Yandava CN, Storm van’s Gravesande K, et al. A neuronal NO synthase (NOS1) gene
polymorphism is associated with asthma. Biochem Biophys Res Commun. 2000;272:391-394.
100. Graves PE, Kabesch M, Halonen M, et al. A cluster of seven tightly linked polymorphisms in the IL-13
gene is associated with total serum IgE levels in three
populations of white children. J Allergy Clin Immunol. 2000;105:506-513.
101. Martinez FD, Graves PE, Baldini M, et al. Association between genetic polymorphisms of the beta2-adrenoceptor and response to albuterol in children with
and without a history of wheezing. J Clin Invest. 1997;
100:3184-3188.
102. Dahl M, Tybjaerg-Hansen A, Lange P, Nordestgaard BG. DeltaF508 heterozygosity in cystic fibrosis and
susceptibility to asthma. Lancet. 1998;351:1911-1913.
103. Hill MR, Cookson WO. A new variant of the beta
subunit of the high-affinity receptor for immunoglobulin E (Fc epsilon RI-beta E237G). Hum Mol Genet.
1996;5:959-962.
104. Drazen JM, Yandava CN, Dube L, et al. Pharmacogenetic association between ALOX5 promoter
genotype and the response to anti-asthma treatment. Nat Genet. 1999;22:168-170.
105. Stafforini DM, Numao T, Tsodikov A, et al. Deficiency of platelet-activating factor acetylhydrolase
is a severity factor for asthma. J Clin Invest. 1999;
103:989-997.
106. Fiehn O, Kopka J, Dormann P, et al. Metabolite
profiling for plant functional genomics. Nat Biotechnol. 2000;18:1157-1161.
107. Raamsdonk LM, Teusink B, Broadhurst D, et al.
A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat Biotechnol. 2001;19:45-50.
108. Kononen J, Bubendorf L, Kallioniemi A, et al.
Tissue microarrays for high-throughput molecular
profiling of tumor specimens. Nat Med. 1998;4:844-847.
109. Kitahara O, Furukawa Y, Tanaka T, et al. Alterations of gene expression during colorectal carcinogenesis revealed by cDNA microarrays after laser-capture microdissection of tumor tissues and normal
epithelia. Cancer Res. 2001;61:3544-3549.
110. Ross DT, Scherf U, Eisen MB, et al. Systematic
variation in gene expression patterns in human cancer cell lines. Nat Genet. 2000;24:227-235.
111. Welsh JB, Zarrinkar PP, Sapinoso LM, et al. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad
Sci U S A. 2001;98:1176-1181.
112. Scherf U, Ross DT, Waltham M, et al. A gene
expression database for the molecular pharmacology
of cancer. Nat Genet. 2000;24:236-244.
113. Naaby-Hansen S, Waterfield MD, Cramer R. Proteomics—post-genomic cartography to understand gene
function. Trends Pharmacol Sci. 2001;22:376-384.
114. Soskic V, Gorlach M, Poznanovic S, et al. Functional proteomics analysis of signal transduction pathways of the platelet-derived growth factor beta receptor. Biochemistry. 1999;38:1757-1764.
115. Rohlff C. Proteomics in molecular medicine. Electrophoresis. 2000;21:1227-1234.
116. Steiner S, Gatlin CL, Lennon JJ, et al. Proteomics to display lovastatin-induced protein and pathway regulation in rat liver. Electrophoresis. 2000;21:
2129-2137.
117. Celis JE, Wolf H, Ostergaard M. Bladder squamous cell carcinoma biomarkers derived from proteomics. Electrophoresis. 2000;21:2115-2121.
118. Husi H, Ward MA, Choudhary JS, et al. Proteomic analysis of NMDA receptor-adhesion protein
signaling complexes. Nat Neurosci. 2000;3:661-669.
119. Vidal M. A biological atlas of functional maps.
Cell. 2001;104:333-339.
120. Goldstein LS. Kinesin molecular motors. Proc Natl
Acad Sci U S A. 2001;98:6999-7003.
121. Walhout AJ, Vidal M. High-throughput yeast
two-hybrid assays for large-scale protein interaction
mapping. Methods. 2001;24:297-306.
122. Walhout AJ, Vidal M. Protein interaction maps
for model organisms. Nat Rev Mol Cell Biol. 2001;
2:55-63.
123. Zeng J. Mini-review: computational structure-based design of inhibitors that target protein surfaces. Comb Chem High Throughput Screen. 2000;
3:355-362.
124. Stanyon CA, Finley RL Jr. Progress and potential of Drosophila protein interaction maps. Pharmacogenomics. 2000;1:417-431.
125. McCraith S, Holtzman T, Moss B, Fields S. Genome-wide analysis of vaccinia virus protein-protein
interactions. Proc Natl Acad Sci U S A. 2000;97:4879-4884.
126. Uetz P, Giot L, Cagney G, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623-627.
127. Rain JC, Selig L, De Reuse H, et al. The protein-protein interaction map of Helicobacter pylori. Nature. 2001;409:211-215.
128. Blake JA, Eppig JT, Richardson JE, et al. The
Mouse Genome Database (MGD). Nucleic Acids Res.
2001;29:91-94.
129. Scalzi JM, Hozier JC. Comparative genome mapping. Genomics. 1998;47:44-51.
130. Ringwald M, Baldock R, Bard J, et al. A database for mouse development. Science. 1994;265:
2033-2034.
131. Marshall E. Genome sequencing: Celera assembles mouse genome; public labs plan new strategy. Science. 2001;292:822.
132. Mody M, Cao Y, Cui Z, et al. Genome-wide gene
expression profiles of the developing mouse hippocampus. Proc Natl Acad Sci U S A. 2001;98:8862-8867.
133. Nadeau JH, Balling R, Barsh G, et al. Sequence
interpretation: Functional annotation of mouse genome sequences. Science. 2001;291:1251-1255.
134. Fortini ME, Skupski MP, Boguski MS, Hariharan
IK. A survey of human disease gene counterparts in the
Drosophila genome. J Cell Biol. 2000;150:F23-F30.
135. Nuwaysir EF, Bittner M, Trent J, et al. Microarrays and toxicology. Mol Carcinog. 1999;24:153-159.
136. Kanitz MH, Witzmann FA, Zhu H, et al. Alterations in rabbit kidney protein expression following
lead exposure as analyzed by two-dimensional gel electrophoresis. Electrophoresis. 1999;20:2977-2985.
137. Weekes J, Wheeler CH, Yan JX, et al. Bovine dilated cardiomyopathy. Electrophoresis. 1999;20:898-906.
138. Sachidanandam R, Weissman D, Schmidt SC, et
al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001;409:928-933.
139. Kishi F, Fujishima S, Tabuchi M. Dinucleotide repeat polymorphism in the third intron of the NRAMP2/DMT1 gene.
J Hum Genet. 1999;44:425-427.
140. Ridanpaa M, van Eenennaam H, Pelin K, et al.
Mutations in the RNA component of RNase MRP cause
a pleiotropic human disease, cartilage-hair hypoplasia. Cell. 2001;104:195-203.
141. Andersson B, Blange I, Sylven C. Angiotensin-II
type 1 receptor gene polymorphism and long-term survival in patients with idiopathic congestive heart failure. Eur J Heart Fail. 1999;1:363-369.
142. Benetos A, Cambien F, Gautier S, et al. Influence of the angiotensin II type 1 receptor gene polymorphism on the effects of perindopril and nitrendipine on arterial stiffness in hypertensive individuals.
Hypertension. 1996;28:1081-1084.
143. Johnson M. The beta-adrenoceptor. Am J Respir
Crit Care Med. 1998;158:S146-S153.
144. Poirier J, Delisle MC, Quirion R, et al. Apolipoprotein E4 allele as a predictor of cholinergic deficits
and treatment outcome in Alzheimer disease. Proc Natl
Acad Sci U S A. 1995;92:12260-12264.
145. Cascorbi I, Gerloff T, Johne A, et al. Frequency
of single nucleotide polymorphisms in the P-glycoprotein drug transporter MDR1 gene in white subjects. Clin Pharmacol Ther. 2001;69:169-174.
146. Furuta T, Shirai N, Takashima M, et al. Effect
of genotypic differences in CYP2C19 on cure rates
for Helicobacter pylori infection by triple therapy
with a proton pump inhibitor, amoxicillin, and
clarithromycin. Clin Pharmacol Ther. 2001;69:158-168.
147. Zeviani M, Tiranti V, Piantadosi C. Mitochondrial disorders. Medicine (Baltimore). 1998;77:
59-72.
148. Aldred MA, Trembath RC. Activating and inactivating mutations in the human GNAS1 gene. Hum
Mutat. 2000;16:183-189.
149. Gottlieb B, Beitel LK, Trifiro MA. Somatic mosaicism and variable expressivity. Trends Genet. 2001;
17:79-82.
150. Grosjean H, Benne R. Modification and Editing
of RNA. Washington, DC: American Society of Microbiology Press; 1998.
151. Eddy SR. Noncoding RNA genes. Curr Opin
Genet Dev. 1999;9:695-699.
152. Ideker T, Thorsson V, Ranish JA, et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001;
292:929-934.
153. Collins FS, McKusick VA. Implications of the Human Genome Project for medical science. JAMA. 2001;
285:540-544.
154. Gottlieb S. US employer agrees to stop genetic
testing. BMJ. 2001;322:449.
155. Schafer S. Railroad agrees to stop gene-testing
workers. Washington Post. April 19, 2001:E01.
156. Müller-Hill B. Murderous Science. Plainview, NY:
Cold Spring Harbor Press; 1998.
157. Timberg C. Va. house voices regret for eugenics. Washington Post. February 3, 2001:A01.
158. Watson JD. A Passion for DNA. Plainview, NY:
Cold Spring Harbor Laboratory Press; 2000:183-195, 213.
159. Schwartz RS. Racial profiling in medical research. N Engl J Med. 2001;344:1392-1393.
New at jama.com
This human genomics/genetics theme issue includes Web-enhanced articles with hypertext links from genetics terms
to their definitions in the National Human Genome
Research Institute Glossary of Genetic Terms
(http://www.nhgri.nih.gov/DIR/VIP/Glossary/pub_glossary.cgi).
©2001 American Medical Association. All rights reserved.
(Reprinted) JAMA, November 14, 2001—Vol 286, No. 18 2307