Download assessing the function of genetic variants in candidate gene

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
REVIEWS
ASSESSING THE FUNCTION OF
GENETIC VARIANTS IN CANDIDATE
GENE ASSOCIATION STUDIES
Timothy R. Rebbeck*, Margaret Spitz ‡ and Xifeng Wu‡
Knowledge of inherited genetic variation has a fundamental impact on understanding human
disease. Unfortunately, our understanding of the functional significance of many inherited genetic
variants is limited. New approaches to assessing functional significance of inherited genetic
variation, which combine molecular genetics, epidemiology and bioinformatics, promise to enhance
reproducibility and plausibility of associations between genotypes and disease.
*Department of Biostatistics
and Epidemiology, and
Abramson Cancer Center,
University of Pennsylvania
School of Medicine, 904
Blockley Hall, 423 Guardian
drive, Philadelphia,
Pennsylvania 19104, USA.
‡
Department of
Epidemiology,
M. D. Anderson Cancer
Center, 1515 Halcombe
Boulevard, Houston,
Texas 77030, USA.
Correspondence to T.R.R.
e-mail: trebbeck@
cceb.med.upenn.edu
doi:10.1038/nrg1403
NATURE REVIEWS | GENETICS
The human genome contains about ten million SNPs,
with an estimated two common missense variants per
gene1. At least five million SNPs have already been
reported in public databases2,3. Many of these variants
might be involved in human disease aetiology, but it is
often difficult to assess their function on the basis of
nucleotide sequence alone. This is particularly true
when variants do not alter an amino acid or do not disrupt a well-characterized motif that affects protein
function or structure. In addition, only a small subset
of variants that affect the phenotype will confer small
to moderate effects on phenotypes that are causally
related to disease risk. So, an important challenge that
faces molecular epidemiological association studies of
candidate disease-susceptibility genes is to define the
variants that are functionally implicated in disease.
This is particularly urgent because the amount of
genomic information that is available greatly exceeds
the information about the function of variants that are
used in human disease studies. There is insufficient
guidance for molecular epidemiologists to optimally
select variants for an epidemiology study, highlighting
the need for methods that prioritize the choice of
genetic variants to be genotyped in molecular epidemiological studies4. Here, we bring together examples of
experimental population genetics and evolutionary
approaches to assessing variant function, and evaluate
the potential for using this information to improve
inferences from disease association studies.
Genomic variation and molecular epidemiology
The protein coding sequences of the human genome
contain approximately 100,000–300,000 common
SNPs, and additional SNPs lie within putative regulatory regions of genes that might be relevant for studies
of human health and disease1. Regulatory and coding
SNPs are of particular interest to molecular epidemiological association studies. Non-synonymous SNPs
(nsSNPs), or missense variants, translate into aminoacid polymorphisms in the proteins they encode.
Regulatory SNPs (rSNPs) can affect the expression,
tissue-specificity or function of relevant proteins. It
seems that both nsSNPs and rSNPs are relatively rare
compared with the total number of SNPs in the human
genome5. The rarity of nsSNPs might be a consequence
of selection against the functional disruptions of
amino-acid variation. Some molecular functional diversity is attributable to the effects on protein function
caused by nsSNPs. For example, the kinetic parameters
of enzymes, the DNA-binding properties of proteins
that regulate transcription, the signal transduction
activities of transmembrane receptors, and the architectural roles of structural proteins are all susceptible to
perturbation by nsSNPs and their associated amino-acid
polymorphisms.
In addition to ongoing efforts to identify and characterize these genetic variants, molecular epidemiological
disease association studies are under way to better understand the role of inherited genetic variation in disease.
VOLUME 5 | AUGUST 2004 | 5 8 9
REVIEWS
A major challenge for epidemiologists undertaking
candidate gene–disease association studies is to choose
target SNPs that are most likely to affect the phenotype
and that ultimately contribute to disease development.
Variants in biologically plausible candidate genes are
usually selected for study on the basis of both variant
allele frequency and the functional effect of the variant on
relevant traits. Although there is often sufficient information to assess the allele frequency of a candidate variant,
understanding the functional significance of genetic
variants is usually more difficult.
Knowledge of gene and SNP function is crucial to
direct the appropriate design and interpretation of candidate gene association studies. This is in contrast to
genome-wide scans and other approaches that rely on
LINKAGE DISEQUILIBRIUM between loci to identify genotypedisease associations, for which knowledge of function is
not required. Most epidemiologists are not trained in
experimental laboratory methods and do not maintain
laboratories in which detailed laboratory assessment of
variant function in target tissues or individuals of interest
can be made. They must therefore often rely on making
an assessment of the function of a candidate gene or
SNP from the published literature. However, published
experimental evidence might not be adequate to guide
the design or interpretation of a molecular epidemiological study. For example, inadequate information
about the function of a genetic variant impedes the ability to evaluate whether the association is consistent with
a causative event that is consistent with a biological
mechanism, whether the association is a reflection of
linkage disequilibrium with a truly causative variant, or
whether the association represents a false positive result.
The lack of reproducibility of many association studies
might reflect the number of studies that involve genetic
variants with no functional significance6,7.
Experimental approaches
LINKAGE DISEQUILIBRIUM
The observation that two or
more alleles, usually at loci that
are physically close together on a
chromosome, are not inherited
independently but are observed
to occur together more
frequently than predicted under
Mendel’s law of independent
assortment.
NIFEDIPINE
A calcium-blocker drug (also
called Procardia) that was one of
the first drugs recognized to be
metabolized by CYP3A4, and for
which a regulatory element
specific to the CYP3A4 gene was
named.
MENARCHE
The first occurrence of
menstruation in a woman.
590
Inferences about function of a genetic variant can be
made using experimental systems, including in vitro systems and in vivo animal models. These include the effect
of variants on regulatory-region control of DNA
expression, RNA stability or degradation, protein structure, protein denaturing, protein expression, other measures of in vivo and in vitro control of protein levels, and
tissue specificity8–10. Because the scope of experimental
approaches for assessing SNP function is large, the purpose of this section is not to provide a comprehensive
review of experimental approaches for evaluating SNP
function, but instead to provide a context in which epidemiological studies can use experimental data in the
design and interpretation association studies that
involve candidate genes.
These approaches can provide the strongest evidence
for the functional role of a genetic variant, but can also be
difficult to interpret in the context of studies of complex
human traits. The extent of the effects of genetic variants
on relevant phenotypes involved in the disease process is
probably small. These effects may be highly dependent
on the context in which the genetic variant is acting,
including influence from environmental exposure,
| AUGUST 2004 | VOLUME 5
tissue-specific effects or experimental conditions. For
example, the functional effect of a variant might be small
in magnitude in an experimental system, but might
become more important in a specific human tissue or on
exposure to relevant environmental agents. Similarly,
large experimental effects might be observed that reflect
negligible in vivo effects in humans. Similarly, effects that
are small in magnitude in a single time point-experiment
might be amplified or only have a phenotypically relevant effect over long time periods. Indeed, most genetic
variants that are studied in complex traits and disease
would be expected to have small effects on function. So,
the use of model systems that are appropriate to detect
large effects (for example, as might be observed in high
penetrance ‘inborn errors of metabolism’ settings) might
not be optimal.
As outlined above, experimental evidence of genetic
variant function might not be consistent with results of
epidemiological studies. For example, CYP3A4 metabolizes drugs and other compounds, including steroid
hormones, that are important in the aetiology of many
common diseases11. A variant has been identified in the
CYP3A4 promoter (denoted CYP3A4*1B) that consists
of an A→G nucleotide substitution at position –290
(denoted A–290G) in the NIFEDIPINE-specific element
(NFSE)12. Subsequently, the CYP3A4*1B variant was
epidemiologically associated with various characteristics, including a more advanced stage of prostate
tumours13,14, decreased risk for treatment-related
leukaemia15, early age of MENARCHE16,17 and increased
plasma levels of insulin-like growth factor-I among
users of oral contraception18. Although these data provide evidence for functionally significant effects of
CYP3A4*1B, the basic science literature has not consistently supported the hypothesis that CYP3A4*1B has a
functionally significant effect. Hashimoto et al.12 identified several regulatory elements, including a putative
repressor fragment and a NFSE element in the CYP3A4
promoter, which indicates that variants in this promoter
region might affect CYP3A4 transcription (FIG. 1).
Lamba et al.19 reported that CYP3A4*1B alleles were
found significantly more frequently in Caucasians with
low CYP3A4 protein levels than in those with higher
levels of the protein.
Many authors20–26 have studied the relationship of
CYP3A4 expression or function to the CYP3A4*1B promoter (FIG. 1). Most of them concluded that there were
no biologically meaningful effects, given the small magnitude of effects that were observed. However, most
studies reported consistently elevated expression associated with CYP3A4*1B (a 20%–200% increase over the
consensus CYP3A4*1A)20–26.
On the basis of these data, it is possible that
CYP3A4*1B has only a small to moderate phenotypic
effect. These small phenotypic effects will probably not
have a clinically meaningful impact on drug disposition. It is not clear, however, whether this magnitude of
phenotypic perturbation is sufficient to alter metabolism on environmental exposure to steroid hormones
or other agents that might confer disease risk over the
lifetime of an individual. For example, would a 20%
www.nature.com/reviews/genetics
REVIEWS
*1B
Putative repressor region
ATG
NFSE
Cell system
Effect of CYP3A4*1B versus CYP3A4*1A
Refs
+10
Human hepatocytes
190% Increase in testosterone 6 β-oxidation
20
–1019
–6
MCF7
90% Increase in luciferase expression
21
–1019
–6
HepG2
40% Increase in luciferase expression
21
22
–490
–570
+22
Human hepatocytes
40% Increase in nifedipine oxidase activity
–570
+22
Human hepatocytes
110% Increase in CYP3A4 protein expression
22
+53
HepG2
No change in luciferase expression
23
+53
–362
HepG2 + enhancer
20% Increase in luciferase expression
23
–572
–6
MCF7
20% Increase in luciferase expression
25
–572
–6
HepG2
40% Increase in luciferase expression
25
–572
–6
Human hepatocytes
20–90% Increase in luciferase expression
25
–362
In response to xenobiotics:
–1203
–61
HepG2
20–117% Increase in transcriptional activation
26
–1203
–61
HuH7
75–147% Increase in transcriptional activation
26
–1203
–61
CaCo-2
4–19% Increase in transcriptional activation
26
Figure 1 | In vitro studies of the effect of CYP3A4*1B compared with CYP3A4*1A. At the top of the figure on the left is a
schematic representation of the structure of the CYP3A4 region. Blue bars represent the genomic regions that are used in the
CYP3A4-containing constructs for each study. The postive and negative numbers on each of the bars represent the postion of the
terminal basepairs of these regions relative to the postion of the SNP. The studies summarized here encompass substantial
variability in experimental conditions. Nonetheless, there is a small but consistent effect of increased CYP3A4 expression in the
presence of CYP3A4*1B. NFSE, nifedipine-specific element.
greater metabolism of testosterone by CYP3A4*1B over
the course of a man’s lifetime (as indicated by the data
in FIG. 1) sufficiently increase the risk of prostate cancer
to the extent that it could be detected in an epidemiological context? If so, the apparent discrepancy between
epidemiological associations of this genetic variant and
the functional effects of this variant in experimental
systems might not exist.
Variation in experimental approaches used to assess
the function of genetic variants in complex metabolic
systems could affect inferences about function (see FIG. 1
and BOX 1). Results might vary depending on the specific experimental conditions, unrecognized effects of
regulatory elements, and post-transcriptional or posttranslational processing. Hashimoto et al.12 suggest
that removing the repressor region upstream of the
CYP3A4*1B variant sequence might unmask a promoter
effect. Studies that evaluate regulatory region genetic
variants in CYP3A4 might differ substantially depending
on whether this region is present or absent in the experimental systems. In fact, various constructs were used to
evaluate in vitro functional effects of CYP3A4*1B (FIG. 1),
and this experimental variability could have influenced
the inferences made in these studies. The inclusion of
enhancer elements that might be required in expression
assays can similarly affect expression. Regulation of
expression might only become apparent under exposure
to the relevant compounds in the proper cellular context.
The use of primary cells versus cell culture systems can
also affect inferences about function. The use of different
cell or tissue types (for example see FIG. 1) might reflect
different regulatory influences that affect the functional
assessment of a genetic variant. These effects might be
NATURE REVIEWS | GENETICS
further compounded if the magnitude of functional
effects of individual variants is small, leading to a greater
potential for artificial masking or enhancement of functional effects in experimental systems. Therefore, the
ability to detect functionally relevant effects of genetic
variants might be highly dependent on the context of
the experimental system used, and experimental systems might not reflect relevant in vivo effects in
humans. Given the differences between the relevant
phenotypes in humans and experimental systems,
information obtained from the latter might not be consistent with results of epidemiological association studies. Therefore, it might not be possible to make clear
inferences about in vivo genotype function in humans
from experimental data.
Inconsistencies between experimental data and
epidemiological studies can also reflect a potential
study bias that is inherent to some epidemiological
investigations. Poor study design, including insufficient statistical power, could influence the results of an
epidemiological study to produce false positive or
false negative inferences (BOX 1).
In addition to more traditional experimental
approaches that are used to assess SNP function, a wide
variety of novel approaches for assessing gene function
have been proposed, including gene tagging27, gene trapping28, N-ethyl-N-nitrosourea (ENU) mutagenesis29,
proteomics methods30 and evaluation of epigenetic
mechanisms31. The HaploCHIP method32 is an illustrative example. HaploCHIP analyzes SNPs that affect
gene regulation in vivo using chromatin immunoprecipitation and mass spectrometry to identify differential
protein–DNA binding in vivo that is associated with
VOLUME 5 | AUGUST 2004 | 5 9 1
REVIEWS
Box 1 | Factors influencing consistency of gene–disease associations
Variables affecting inferences from experimental studies:
pleiotropic effects on disease risk. This should be a main
focus of future studies attempting to assess the functional significance of genetic variants.
• In vitro or in vivo system studied
Population genetics approaches
• Cell type studied
• Cultured versus fresh cells studied
• Genetic background of the system
• DNA constructs
• DNA segments that are included in functional (for example, expression) constructs
• Use of additional promoter or enhancer elements
• Exposures
• Use of compounds that induce or repress expression
• Influence of diet or other exposures on animal studies
Variables affecting epidemiological inferences:
• Inclusion/exclusion criteria for study subject selection
• Sample size and statistical power
• Candidate gene choice
• A biologically plausible candidate gene
• Functional relevance of the candidate genetic variant
• Frequency of allelic variant
• Statistical analysis
• Consideration of confounding variables, including ethincity, gender or age.
• Whether an appropriate statistical model was applied (for example, were interactions
considered in addition to main effects of genes?)
• Violation of model assumptions
HARDY-WEINBERG
PROPORTIONS
The binomial distribution of
genotypes (that is, frequencies of
genotypes AA, Aa and aa will be
p2, 2pq, and q2, respectively,
where p is the frequency of
allele A, and q is the frequency
of allele a) that result in a
population when there are no
external pressures that cause
deviations from p2, 2pq and q2.
TEST STATISTIC
A quantity whose value is used
to decide whether or not the null
hypothesis should be rejected,
usually based on quantities
computed using observed data.
RESTENOSIS
The constriction, narrowing or
blockage of a coronary artery
after an initial treatment such as
angioplasty aimed at removing
this blockage.
ANGIOPLASTY
An operation that is used to
repair a damaged blood vessel or
unblock a coronary artery.
592
allelic variants of a gene as a surrogate measure of transcriptional activity. This allele-specific quantification
method uses haplotype-specific chromatin immunoprecipitation (CHIP) to measure the amount of phosphorylated RNA polymerase II that is bound to different alleles,
thereby estimating the differences in protein binding
between the alleles. This assay can be adapted for highthroughput analysis because of its sensitivity and ability
to quantify the relative abundance of two different alleles
in a sample of immunoprecipitated chromatin.
Using this approach, Knight et al.32 showed that
there was a close correlation between the level of bound
phosphorylated (active) RNA polymerase II at the
imprinted small nuclear ribonucleoprotein polypeptide
N locus and allele-specific expression. They also used
this method to identify an SNP of the cytokine tumour
necrosis factor (TNF), which plays a pivotal role in
inflammation, immunity and apoptosis, although the
involvement of this SNP in modulating TNF transcription is yet to be shown. Application of the HaploCHIP
approach to the TNF/lymphotoxin-α (LTA) locus identified functionally important haplotypes that correlate
with allele-specific transcription of LTA32.
Despite reports of both traditional and new ways of
assessing the function of a genetic variant, little has been
done to determine whether genotypic differences that
lead to small phenotypic perturbations are functionally
significant in the context of complex human disease. No
criteria have been established to evaluate the importance of genetic variants with small but potentially
| AUGUST 2004 | VOLUME 5
Knowledge of population genetic structure might
provide insight into the functional relevance of a
genetic variant on a disease trait. For example, genetic
variants with a functionally significant impact on relevant phenotypes, including disease endpoints, might be
more likely to deviate from expected allele and genotype
frequencies compared with alleles that are functionally
neutral. It remains unclear whether there is evidence of
selection for or against low penetrance alleles over evolutionary time, although it is clear that selection has
influenced the pattern of genetic variability in the
human genome33,34. However, a number of diseases that
could confer selective pressures on genes of interest are
relatively new with respect to evolutionary genetics history. For example, diabetes and obesity might confer
disease risks that could lead to selective pressure on
allele or genotype frequencies, but these effects still
occur relatively late in life and have been a problem to
human health on a population level for only a few generations. So, whilst substantial new information about
evolutionary genetics history and low penetrance SNPs
is becoming available, the link between this knowledge
and the functional importance of specific SNPs has yet
to be fully elucidated.
Deviations from expected allele or genotype frequencies across relevant phenotypic groups could be used to
identify alleles that are more likely to be functionally associated with disease aetiology. For example, alleles that are
truly causative of a disease state would be expected to be
disproportionately over-represented among cases with
the disease but under-represented among the diseasefree controls35,36. As a result, deviations from HARDYWEINBERG PROPORTIONS or other measures of population
frequency might provide clues to the role of genes in the
aetiology of this disease. For example, Hoh et al.37 developed a so- called ‘set association’ approach that evaluates sets of SNPs at various positions in the genome.
Information about allelic association and HardyWeinberg equilibrium are combined over multiple
markers in the genome. A genome-wide TEST STATISTIC is
generated by summing-up contributions from many
SNPs located in different genomic regions. The method
performs a significance test on several sets of loci
simultaneously, while using conservative measures of
inference to control for the potential of false positive
associations. Hoh et al.37 applied their approach to an
SNP association study of RESTENOSIS. Among 779 patients
with heart disease, 342 showed restenosis (cases) 6
months after ANGIOPLASTY. The remaining patients, on
whom angioplasty was not performed, were the controls. Eighty nine SNP markers were genotyped in 62
candidate genes for each individual. The algorithm identified nine genes that conferred susceptibility to the disease. Unfortunately, despite promising results, the
method is sensitive to genotyping errors. Population
ADMIXTURE, in which cases and controls belong to different
www.nature.com/reviews/genetics
REVIEWS
ethnic groups with different SNP allele frequencies, can
also adversely distort the test results. Therefore,
although incorporating population genetics information can provide further information to a genotype–disease association study, additional research is required to
determine the conditions under which these approaches
will be optimally applied.
Evolutionary and structural approaches
ADMIXTURE
Combining two or more
populations into a single group.
Combining two populations has
implications for studies of
genotype–disease associations if
the component populations
have different genotypic
distributions.
CELL CYCLE CHECKPOINTS
Steps in the normal sequence of
development and division in the
cell. Disruption can lead to
uncontrolled cell growth, and
possibly cancer.
ODDS RATIO
A measure of relative risk that is
usually estimated from case
control studies.
NATURE REVIEWS | GENETICS
Natural selection shapes patterns of genetic variation in
populations such that favourable variants increase in
frequency relative to less favourable variants over time.
Mutations in functionally important sites can be eliminated or kept at low frequencies. So, nucleotide or
amino-acid residues that have been conserved across
species or within a gene family are more likely to be
involved in regulation of vital functions, expression or
tissue specificity than residues that are not conserved.
Knowledge of the functional domains can therefore be
useful when assessing the functional impact of an
amino-acid change. The value of this knowledge was
demonstrated early on for the effects of mutations in the
primary sequence of haemoglobin38. In this case, the molecular basis of the clinical effects caused by mutations
could be inferred as soon as the structural information
became available. These pioneering studies recognized
crucial links between the putative functional motifs and
potential effects of mutations on function.
Recently, a number of computational algorithms
have been developed to predict the impact of nucleotide
or amino-acid substitutions on protein structure,
expression and function. Evolutionary conservation of
the amino-acid sequence can be determined by aligning amino-acid sequences of related proteins from
unrelated organisms or across gene families. A number
of algorithms have been proposed that use DNA or
amino-acid sequence data to identify potentially functional residues or domains in a comparison with
sequences in the public databases. These include methods that are based on protein three-dimensional (3D)
structure39–44, methods that are based on evolutionary
considerations45,46, and machine-learning approaches47.
As examples of these approaches, we present two algorithms, SIFT (‘sorting intolerant from tolerant’; REFS 48–50;
see Further information for website) and PolyPhen
(‘polymorphism phenotyping’; REF. 51;see Further information for website). Both are based entirely around
amino-acid sequence, and are useful for predicting the
impact of exonic amino-acid substitutions on disease
risk. A third algorithm, known as CODDLE (‘choosing
codons to optimize discover of deleterious lesions’; see
Further information for website) uses nucleotide
sequence (for example, a PCR amplicon) to identify all
potentially deleterious nucleotide substitutions, including
nonsense, frameshift, missense and splicing mutations.
The SIFT algorithm. This algorithm is based on the
assumption that amino-acid positions that are important for the correct biological function of the protein
are conserved across the protein family and/or across
evolutionary history. SIFT uses protein sequence and
multiple alignment information of these sequences to
estimate ‘tolerance indices’ that predict tolerated and
deleterious (that is, intolerant) substitutions for every
position of the query sequence. Substitutions at each
position with normalized tolerance indices that are
below a chosen cut-off point are predicted to be deleterious. Substitutions that are greater than or equal to the
cut-off point are predicted as being tolerated (that is,
putatively non-functional). Using three examples
(LacI, HIV-I protease and bacteriophage T4 lysozyme),
Ng et al.48 showed that a high proportion of substitutions that are predicted to be deleterious by SIFT did
affect phenotypes in experimental assays. However,
some positions that were predicted to be intolerant by
SIFT were tolerated in experimental assays. In these
cases, the positions are usually involved in an unknown
function that the assay does not detect.
There is also evidence that SIFT fails to identify
residues that are vital for protein function but that have
not been conserved throughout the family48. SIFT predictions are based on sequence data alone and do not require
knowledge of protein structure or function. So, substitutions in uncharacterized proteins can be evaluated by
SIFT only when homologous sequences are provided.
Recently, Zhu et al.6 compared the relationship of
tolerance to amino-acid change, as predicted by SIFT,
with the reported association with SNPs in cancerrelated genes (FIG. 2). For the study, 166 published
case-control studies that reported associations in 46
SNPs, in 39 different genes, in 16 different cancer
sites, were chosen. All of these SNPs are located in
biologically plausible cancer-related genes, such as
those involved in DNA repair, carcinogen metabolism
or CELL CYCLE CHECKPOINTS. The putative functional significance of these affected SNPs was calculated using tolerance indices from SIFT by comparing sequences
from different species. The analyses showed a significant inverse correlation between estimates of cancer
risk as assessed by the ODDS RATIO (OR) and the tolerance indices of the amino-acid variants. These findings
indicate that alterations in conserved amino-acids in
SNPs are more likely to be associated with cancer susceptibility. So, using a molecular evolutionary approach
might help identify SNPs to be genotyped in future
molecular epidemiological studies.
Bayesian phylogenetic analysis is another comparative evolutionary method that identifies potential functionally important amino-acid sites that disrupt gene
function52. This approach aligns amino-acid sequences
for different species and constructs a phylogenetic tree.
It then obtains maximum likelihood estimates of
nucleotide substitution rates, identifies conserved
regions and calculates ancestral sequences. Finally,
regions that evolve under positive selection are identified and compared with the distribution of conserved sites with missense changes that have been
reported in public database. Fleming et al.52 used this
approach, as well as the SIFT program, to identify missense changes in exon 11 of the BRCA1 gene. The phylogenetic approach inferred that 38 of 139 missense
changes affected function in exon 11. By contrast, SIFT
VOLUME 5 | AUGUST 2004 | 5 9 3
REVIEWS
a
3
Log2ORs
2
1
0
1
0
2
PSIC score
b
4.0
3.5
3.0
Log2ORs
2.5
2.0
1.5
1.0
0.5
0.0
0
0.2
0.4
0.6
0.8
1.0
1.2
Tolerance index
Figure 2 | Relationship of in silico indices of SNP function
and associations (measured by log2-transformed odds
ratios) taken from the literature. a | The relationship
between the position-specific independent count (PSIC) score
difference for studied SNPs from PolyPhen (‘polymorphism
phenotyping’) analyses and log2-transformed odds ratios
(Log2ORs) from published associations. b | The relationship
between the tolerance index for studied SNPs from SIFT
(‘sorting intolerant from tolerant’) analysis and log2ORs from
associations. The data indicate that SNPs that are inferred as
functional are more likely to be associated with higher oddsratio effects. By inference, they are more likely to be causative
of disease. Modified, with permission, from REF. 6 © (2004) The
American Association for Cancer Research.
identified 36 of these 38 changes, and an additional 34.
Fleming et al.52 hypothesized that SIFT predicted more
changes because substitutions in all taxa are given the
identical weight in SIFT. By contrast, under the assumption of the phylogenetic method, if substitutions are not
shared by sister taxa, they are more likely to be sequencing errors. In addition, non-conservative substitutions
maintained at sites in one pair of sister taxa are unlikely
to be functionally significant. Applying this approach,
Fleming et al.52 successfully predicted >85% of the
594
| AUGUST 2004 | VOLUME 5
known detrimental missense changes in several
domains of BRCA1, showing the utility of this approach
in identifying genetic variants that could be prioritized
in further association studies.
Concerted efforts to further understand the functional role of genomic elements, including SNPs, are
also underway. For example, the National Institutes of
Health has set up a public research consortium known
as ‘Encode’ (‘encyclopaedia of DNA elements’; REF. 53),
with the goal of identifying and categorizing DNA functional elements including transcriptional regulatory
sequences and determinants of chromosome structure
and function. This initiative involves comparing existing
computational and experimental approaches, and
developing new methods for the identification and
characterization of function.
Although these and other approaches can be used to
assist molecular epidemiologists in identification of
genetic variants in specific nucleotides or residues that
are most likely to be functionally significant, they will
probably provide only limited information about the
true function of a genetic variant. The methods
described above generally cannot identify functional
effects in non-coding regions, and cannot evaluate the
effect of other factors on function, such as exposures,
that might be involved in the aetiology of the disease.
Recently, Rogan et al.54 proposed a computational
approach to evaluate the putative effect of splicing
mutations. They used information theory-based models
to evaluate the relationship between variants and predicted splice sites and relevant phenotypes. The results
presented by Rogan et al.54 corresponded to known
functions of these genes and serve as a model for additional studies that evaluate the effect of both coding and
non-coding variation on functionality (see also REF. 55).
Having interpreted the results from SIFT and
related approaches, it might not be possible to identify
whether a single variant within a putative functional
domain is sufficient to disrupt function. Such approaches require sequence data and might therefore be
limited by the sequence information that is available in
public databases. Therefore, inferences of conservation
(or the lack thereof) might simply reflect data limitations. Inconsistencies of inference using different classes
of sequence data might also arise. For example, an
analysis of sequences among human gene families
might provide inferences that are different from acrossspecies sequence analysis. Although potentially helpful
in uncovering the role of a particular genetic region or
variant, such differences might also obscure inferences
about putative functional significance.
The PolyPhen algorithm. Missense variants might affect
protein folding, binding or interaction sites, as well as
the solubility or stability of the protein. These effects can
beestimated from physical considerations and from the
contextof an amino-acid replacement within the family
of homologous proteins. Sunyaev et al.56 demonstrated
that a significantfraction of missense variants (nsSNPs)
are likely to affect protein structure or function. Their
approach, implemented in PolyPhen algorithm uses
www.nature.com/reviews/genetics
REVIEWS
amino-acid sequence, phylogenetic and structural
information to characterize the potential functional
significance of a missense change.
PolyPhen was developed to identify functionally
important SNPs by predicting whether an amino-acid
change is likely to be deleterious for the protein on the
basis of 3D structure and multiple alignmentof homologous sequences57. The possibly damaging effect of variants in an amino-acid sequence can bedetermined if the
substitution is in an annotated active or binding site,
affects interaction withligands present in the crystallographic structure, leadsto hydrophobicity or electrostatic
charge change in a buriedsite, destroys a disulfide bond,
affects the protein’s solubility, inserts proline in an
α-helix, or is incompatiblewith the profile of amino-acid
substitutions observed at this site in the set of homologous proteins. Mapping the amino-acid substitution tothe
known 3D structure reveals if the change is likely to
destroy the hydrophobic core of a protein, electrostatic
interactions, interactions with ligands, or other features
of a protein.
Briefly, PolyPhen uses protein sequence and variant
data to search homologous sequence data from publicly
available protein databases. Only sequences with 30%
or more identity to the protein of interest are considered. On the basis of the alignment of these homologous sequences, profile scores can be computed for
allelic variants. Profile scores, known as position-specific
independent counts (PSICs), are logarithmic ratios of
the likelihood that a given amino acid occurs at a particular site to the background probability of the amino
acid occurring at random at a given position. Large differences in PSIC values for specific genetic variants
might indicate that the substitution of interest is rarely
or never observed in the protein family. Finally,
PolyPhen maps the amino-acid substitution to the
known 3D structure of the protein to examine whether
the substitution might destroy the protein’s hydrophobic core, electrostatic interactions or interactions with
ligands, or other important features of a protein based
on the analysis of several structural parameters, and
also on the analysis of several contact parameters.
Therefore, Polyphen can provide information about
whether an nsSNP is probably damaging, possibly
damaging, benign or whether its function is unknown.
Using this approach, Sunyaev et al.57 estimated that
20% of human nsSNPs might affect protein function,
although the proportion of SNPs that are truly deleterious is probably substantially lower. Sunayev et al.57
further estimated that there are 20,000–60,000 functionally relevant nsSNPs in the human genome. By
comparison, Fay et al.33 used a different model-based
approach to estimate that 20–45% of nsSNPs are
slightly deleterious and reach population frequencies of
1–10%, although as many as 80% of nsSNPs might
have some functional impact.
Zhu et al.6 applied PolyPhen to the 166 molecular epidemiological studies mentioned above to examine the
correlation between PSIC score and the OR associated
with a particular nsSNP (FIG. 2). They found a significant
inverse correlation between the ORs and PSIC score
NATURE REVIEWS | GENETICS
difference, and between tolerance index and PSIC
score difference. So, the more probable it is that an SNP
is functionally relevant, the higher the OR effect and the
more likely it is that causative associations will be identified. These results imply that using a method to assess
functional significance, such as PolyPhen, can optimize
the ability to identify meaningful and reproducible
molecular epidemiological associations.
In addition to identifying important genetic variants
for research prioritization, genotyping efforts could be
reduced by eliminating amino-acid substitutions that
have been deemed ‘neutral’ by algorithms such as
PolyPhen. In some cases, some genes could be removed
from further consideration if all of the variant alleles
were deemed to be non-functional. Because prediction
algorithms provide numerical data, it is feasible to further subdivide the variant alleles of a gene into those
that might have a moderate, modest or no impact. In
particular, when a gene is known to have many genetic
variants, prediction algorithms can also reduce the
effective number of alleles that need to be considered in
association studies. Therefore, the use of these approaches should limit the number of genotypes that
need to be considered, thereby limiting the potential for
false positive associations that might result from performing unnecessary hypothesis tests. The drawback of
these approaches is that potential interactions among
combinations of substitutions at a gene might be
ignored, and the dependence of functional impact on
genotype at other genes or on exposure might not be
addressed. Nonetheless, the algorithms for predicting
genotype impact on allele function provide an initial
and important step towards simplifying the seemingly
overwhelming complexity of the genotype data. The
concept of equivalent alleles provides a basis for additional steps towards reducing the complexity of the
data for molecular epidemiological studies.
Optimizing molecular epidemiological studies
To improve the probability of obtaining biologically
and clinically meaningful associations between genotypes and disease outcomes, the design and interpretation of molecular epidemiological studies should
include an assessment of functional significance. As
outlined above and in TABLE 1, a number of criteria
and/or approaches can be used to assess functional significance of a genetic variant. These include genetic
characteristics such as the type of genetic change (missense, frameshift, nonsense, regulatory, splicing, disruption of a known functional motif, etc.), population
genetics characteristics and evolutionary conservation
of nucleotide sequences, experimental evidence from in
vitro or animal model studies that consider reproducibility of functional findings, experimental conditions (such as cell/animal system, inducers/repressors)
that are used to evaluate a functional effect, the magnitude of the inferred biological effects, and the relevance
of the experimental model to human systems, including knowledge of the effect of the variant in the target
tissue (that is, tissue specificity) and consideration of
timing, dose or duration of relevant exposures.
VOLUME 5 | AUGUST 2004 | 5 9 5
REVIEWS
Table 1 | Criteria for assessing the functional significance of a genetic variant in candidate gene association studies
Criteria
Strong support for
functional significance
Moderate support for
functional significance
Evidence against
functional significance
Nucleotide sequence
Variant disrupts a known functional
or structural motif
Variant is a missense change or disrupts a
putative functional motif; changes to protein
structure might occur
Variant disrupts a non-coding
region with no known functional or
structural motif
Evolutionary
conservation
Consistent evidence from multiple
approaches for conservation across
species and multigene families
Evidence for conservation across species
or multigene families
Nucleotide or amino-acid residue
not conserved
Population genetics
In the absence of laboratory error, strong
deviations from expected population
frequencies in cases and/or controls in a
particular ethnicity
In the absence of laboratory error, moderate Population genetics data indicates
to small deviations from expected population no deviations from expected
frequencies in cases and/or controls; effects proportions
are not well characterized by ethnicity
Experimental evidence
Consistent effects from multiple lines of
experimental evidence; effect in human
context is established; effect in target
tissue is known
Some (possibly inconsistent) evidence for
function from experimental data; effect in
human context or target tissue is unclear
Experimental evidence consistently
indicates no functional effect
Variant might affect metabolism of the
exposure or one of its components;
effect in target tissue might not be known
Variant does not affect metabolism
of exposure of interest
Exposures (for example, Variant is known to affect the
genotype–environment metabolism of the exposure in
interaction studies)
the relevant target tissue
Epidemiological
evidence
TYPE I ERROR
Incorrectly rejecting a null
hypothesis when the null
hypothesis is correct. Similarly,
the false positive rate.
596
Consistent and reproducible reports of
Reports of association exist;
moderate-to-large magnitude associations replication studies are not available
Information from previously published epidemiological investigations indicating the effect of an SNP can
also be considered for studies that attempt to replicate
association findings. For example, an ad hoc measure of
support for the functional significance of a particular
variant could be created by weighting in favour of a
functionally significant effect if the criteria for ‘strong
support’ or ‘moderate support’ for functional significance are available, and therefore might be prioritized
for association studies. Similarly, variants for which
there is no or neutral functional information might be
ranked lower in priority for association studies. Variants
for which there is evidence against functional significance should not be considered in association studies
unless the hypothesis and association study methods
consider linkage disequilibrium approaches to identify
candidate regions of interest. In this case, the study
would assume that the variants under investigation are
in linkage disequilibrium with the causative allele(s).
Variants would be chosen for analysis on the basis of
polymorphic content and haplotype or population
genetics structure at that locus.
How would this approach be applied for a specific
candidate SNP? For example, applying the criteria
outlined in TABLE 1 for CYP3A4*1B, we learn first that
CYP3A4*1B disrupts a regulatory motif (NFSE) 12.
Because the function of NFSE is not clear, we might
therefore conclude that there is ‘moderate support’
for function on the basis of the ‘nucleotide sequence’
criterion.
The regulatory region of CYP3A4 is similar, at the
sequence level, to many members of the CYP3A multigene family, so we might conclude that there is strong
support for function on the basis of the ‘evolutionary
conservation’ criterion. However, because the role of
putative regulatory domains in CYP3A genes58 is still
debatable, the assessment of ‘evolutionary conservation’
is not completely clear.
| AUGUST 2004 | VOLUME 5
Prior studies show no effect of
variant
The population genetics of CYP3A4*1B has been
well described59, but no data have been reported about
deviation of frequencies from Hardy-Weinberg proportions in cases versus controls, so this criterion provides
little support for or against function.
The ‘experimental evidence’ about CYP3A4*1B
function (FIG. 1) has been controversial, but there seems
to be a small but consistent increase in expression associated with CYP3A4*1B. This relatively small effect
might be insufficient to confer major effects on drug
metabolism, but indicates possible associations with
altered disease risk. We could therefore conclude that
the ‘experimental evidence’ criterion suggests ‘moderate
support’ for functional significance.
It is well known that CYP3A4 metabolizes compounds that are strongly associated with disease risk, such
as testosterone metabolism in prostate cancer aetiology11.
Therefore, there is strong support that the product of
this gene metabolizes relevant aetiological exposures,
including steroid hormones and carcinogens11.
Finally, there are numerous epidemiological studies
that report an association of CYP3A4*1B with various
phenotypes, thereby providing strong support for function on the basis of the ‘epidemiological evidence’ criterion. So, these criteria suggest moderate to strong support
for the hypothesis that CYP3A4*1B is a functionally
meaningful DNA change.
By applying these or similar criteria, molecular
epidemiologists might be able to maximize the
chance that association studies involve functionally
significant genetic variants, and therefore could
reduce the likelihood of TYPE I ERRORS, increase reproducibility of candidate gene association studies, and
facilitate interpretation of positive associations. Even
in the absence of a definitive function of a genetic
variant, candidate gene association studies should
not be undertaken without some consideration of
function.
www.nature.com/reviews/genetics
REVIEWS
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
Cargill, M. et al. Characterization of single-nucleotide
polymorphisms in coding regions of human genes. Nature
Genet. 22, 231–238 (1999).
Salisbury, B. A. et al. SNP and haplotype variation in the
human genome. Mutat. Res. 526, 53–61 (2003).
Schneider, J. A. et al. DNA variability of human genes.
Mech. Ageing Dev. 124, 17–25 (2003).
Schork, N. J., Fallin, D. & Lanchbury, J. S. Single nucleotide
polymorphisms and the future of genetic epidemiology.
Clin. Genet. 58, 250–264 (2000).
Sachidanandam, R. et al. A map of human genome
sequence variation containing 1.42 million single nucleotide
polymorphisms. Nature 409, 928–933 (2001).
Zhu, Y. et al. An evolutionary perspective on SNP screening
in molecular cancer epidemiology. Cancer Res. 64,
2251–2257 (2004).
The first comprehensive evaluation and comparison
of SIFT and PolyPhen algorithms in molecular
epidemiological association studies.
Lohmueller, K. E., Pearce, C. L., Pike, M., Lander, E. S. &
Hirschhorn, J. N. Meta-analysis of genetic association
studies supports a contribution of common variants to
susceptibility to common disease. Nature Genet. 33,
177–182 (2003).
A comprehensive evaluation of the consistency of
association studies that demonstrates the need for
functional correlates in achieving consistency in
association study results.
Botstein, D. & Risch, N. Discovering genotypes underlying
human phenotypes: past successes for mendelian disease,
future approaches for complex disease. Nature Genet. 33
(Suppl.), 228–237 (2003).
Stoilov, P. et al. Defects in pre-mRNA processing as causes
of and predisposition to diseases. DNA Cell Biol. 21,
803–818 (2002).
Knight, J. C. Functional implications of genetic variation in
non-coding DNA for disease susceptibility and gene
regulation. Clin. Sci. (Lond.) 104, 493–501 (2003).
Li, A. P., Kaminski, D. L. & Rasmussen, A. R. Substrates of
human hepatic cytochrome P450 3A4. Toxicology 104, 1–8
(1995).
Hashimoto, H. et al. Gene structure of CYP3A4, an adultspecific form of cytochrome P450 in human livers and its
transcriptional control. Eur. J. Biochem. 218, 585–595 (1993).
Rebbeck, T. R., Jaffe, J. M., Walker, A. H., Wein, A. J. &
Malkowicz, S. B. Modification of clinical presentation of
prostate tumors by a novel genetic variant in CYP3A4.
J. Natl Cancer Inst. 90, 1225–1229 (1998).
Paris, P. L. et al. Association between a CYP3A4 genetic
variant and clinical presentation in African-American prostate
cancer patients. Cancer Epidemiol. Biomarkers Prev. 8,
901–905 (1999).
Felix, C. A., et al. Association of CYP3A4 genotype with
treatment-related leukemia. Proc. Natl Acad. Sci. 95,
13176–13181 (1998).
Kadlubar, F. F. et al. The putative high activity variant,
CYP3A4*1B, predicts the onset of puberty in young girls.
Cancer Epidemiol. Biomarkers Prev. 12, 327–331 (2003).
Lai, J., Vesprini, D., Chu, W., Jernstrom, H. & Narod, S. A.
CYP gene polymorphisms and early menarche. Mol. Genet.
Metab. 74, 449–457 (2001).
Jernstrom, H. et al. Genetic factors related to racial variation
in plasma levels of insulin-like growth factor-1: implications
for premenopausal breast cancer risk. Mol. Genet. Metab.
72, 144–154 (2001).
Lamba, J. K. et al. Common allelic variants in cytochrome
P4503A4 and their prevalence in different populations.
Pharmacogenetics 12, 121–132 (2002).
Westlind, A., Lofberg, L., Tindberg, N., Andersson, T. B. &
Ingelman-Sundberg, M. Interindividual differences in hepatic
expression of CYP3A4: relationship to genetic
polymorphism in the 5′-upstream regulatory region.
Biochem. Biophys. Res. Commun. 259, 201–205 (1999).
Amirimani, B., Walker, A. H., Weber, B. L. & Rebbeck, T. R.
Response: re: modification of clinical presentation of
prostate tumors by a novel genetic variant in CYP3A4.
J. Natl Cancer Inst. 91, 1588–1590 (1999).
NATURE REVIEWS | GENETICS
22. Ando, Y. et al. Re: modification of clinical presentation of
prostate tumors by a novel genetic variant in CYP3A4.
J. Natl Cancer Inst. 91, 1587–1590 (1999).
23. Spurdle, A. B. et al. The CYP3A4*1B polymorphism has no
functional significance and is not associated with risk of
breast or ovarian cancer. Pharmacogenetics 12, 355–366
(2002).
24. Floyd, M. D. et al. Genotype-phenotype associations for
common CYP3A4 and CYP3A5 variants in the basal and
induced metabolism of midazolam in European- and
African-American men and women. Pharmacogenetics 13,
595–606 (2003).
25. Amirimani, B. et al. Transcriptional activity effects of a
CYP3A4 promoter variant. Environ. Mol. Mutagen. 42,
299–305 (2003).
26. Hamzeiy, H., Bombail, V., Plant, N., Gibson, G. & Goldfarb, P.
Transcriptional regulation of cytochrome P4503A4 gene
expression: effects of inherited mutations in the 5′-flanking
region. Xenobiotica 33, 1085–1095 (2003).
27. Jeon, J. & An, G. Gene tagging in rice: a high throughput
system for functional genomics. Plant Sci. 161, 211–219
(2001).
28. Cecconi, F. & Meyer, B. I. Gene trap: a way to identify novel
genes and unravel their biological function. FEBS Lett. 480,
63–71 (2000).
29. Adams, M. D. ENU mutagenesis for pharma. Drug Discov.
Today 8, 199–200 (2003)
30. Lee, Y. S. & Mrksich, M. Protein chips: from concept to
practice. Trends Biotechnol. 20 (Suppl.), S14–18 (2002).
31. Nikaido, I. et al. EICO (expression-based imprint candidate
organizer): finding disease-related imprinted genes. Nucleic
Acids Res. 32 (database issue), D548–551 (2004).
32. Knight, J. C., Keating, B. J., Rockett, K. A. & Kwiatkowski,
D. P. In vivo characterization of regulatory polymorphisms by
allele-specific quantification of RNA polymerase loading.
Nature Genet. 33, 469–475 (2003).
The authors report a new method and application of
experimental approaches to assessing genotype
function.
33. Fay, J. C., Wyckoff, G. J. & Wu, C. I. Positive and negative
selection on the human genome. Genetics 158, 1227–1234
(2001).
34. Akey, J. M., Zhang, G., Zhang, K., Jin, L. & Shriver, M. D.
Interrogating a high-density SNP map for signatures of
natural selection. Genome Res. 12, 1805–1814 (2002).
35. Feder, J. N. et al. A novel MHC class I-like gene is mutated
in patients with hereditary haemochromatosis. Nature
Genet. 13, 399–408 (1996).
36. Nielsen, D. M., Ehm, M. G. & Weir, B. S. Detecting markerdisease association by testing for Hardy-Weinberg
disequilibrium at a marker locus. Am. J. Hum. Genet. 63,
1531–1540 (1998).
37. Hoh, J., Wille, A. & Ott, J. Trimming, weighting, and
grouping SNPs in human case-control association studies.
Genome Res. 11, 2115–2119 (2001).
The authors propose a novel approach to association
studies that incorporates both association and
population genetics information in identifying disease
genes, including the possibility of genome-wide
associations.
38. Perutz, M. F. Structure and function of haemoglobin.
I. A tentative atomic model of horse oxyhaemoglobin.
J. Mol. Biol. 13, 646–668 (1965).
39. Wang, Z. & Moult, J. Three-dimensional structural
location and molecular functional effects of missense SNPs
in the T cell receptor Vβ domain. Proteins 53, 748–757
(2003).
40. Wang, Z. & Moult, J. SNPs, protein structure, and disease.
Hum. Mutat. 17, 263–270 (2001).
41. Chasman, D. & Adams, R. M. Predicting the functional
consequences of non-synonymous single nucleotide
polymorphisms: structure-based assessment of amino acid
variation. J. Mol. Biol. 307, 683–706 (2001).
42. Ferrer-Costa, C., Orozco, M. & de la Cruz, X.
Characterization of disease-associated single amino acid
polymorphisms in terms of sequence and structure
properties. J. Mol. Biol. 315, 771–786 (2002).
43. Saunders, C. T. & Baker, D. Evaluation of structural and
evolutionary contributions to deleterious mutation prediction.
J. Mol. Biol. 322, 891–901 (2002).
44. Herrgard, S. et al. Prediction of deleterious functional effects
of amino acid mutations using a library of structure-based
function descriptors. Proteins 53, 806–816 (2003).
45. Miller, M. P. & Kumar, S. Understanding human disease
mutations through the use of interspecific genetic variation.
Hum. Mol. Genet. 10, 2319–2328 (2001).
46. Koref, M. E. S., Gangeswaran, R., Koref, I. P. S., Shanahan, N.
& Hancock, J. M. A phylogenetic approach to assessing the
significance of missense mutations in disease genes. Hum.
Mutat. 22, 51–58 (2003).
47. Krishnan, V. G. & Westhead, D. R. A comparative study of
machine-learning methods to predict the effects of single
nucleotide polymorphisms on protein function.
Bioinformatics 19, 2199–2209 (2003).
48. Ng, P. C. & Henikoff, S. Predicting deleterious amino acid
substitutions. Genome Res. 11, 863–874 (2001).
An outline of the SIFT approach to assessing missense
variant function using evolutionary similarity.
49. Ng, P. C. & Henikoff, S. Accounting for human
polymorphisms predicted to affect protein function. Genome
Res. 12, 436–446 (2002).
50. Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes
that affect protein function. Nucleic Acids Res. 31,
3812–3814 (2003).
51. Ramensky, V., Bork, P. & Sunyaev, S. Human nonsynonymous SNPs: server and survey. Nucleic Acids Res.
30, 3894–3900 (2002).
An outline of the PolyPhen methodology for using
evolutionary and structure data to assess SNP function.
52. Fleming, M. A., Potter, J. D., Ramirez, C. J., Ostrander, G. K.
& Ostrander, E. A. Understanding missense mutations in the
BRCA1 gene: an evolutionary approach. Proc. Natl Acad.
Sci. USA 100, 1151–1156 (2003).
53. National Institutes of Health. The ENCODE Project:
ENCyclopedia Of DNA Elements [online],
<http://www.genome.gov/10005107> (2003).
54. Rogan, P. K., Svojanovsky, S. & Leeder, J. S. Information
theory-based analysis of CYP2C19, CYP2D6 and CYP3A5
splicing mutations. Pharmacogenetics 13, 207–218 (2003).
55. Pagani, F. & Baralle, F. E. Genomic variants in exons and
introns: identifying the splicing spoilers. Nature Rev. Genet.
5, 389–396 (2004).
56. Sunyaev, S., Ramensky, V. & Bork, P. Towards a structural
basis of human non-synonymous single nucleotide
polymorphisms. Trends Genet. 16, 198–200 (2000).
57. Sunyaev, S. et al. Prediction of deleterious human alleles.
Hum. Mol. Genet. 10, 591–597 (2001).
58. Schuetz, E. G. Lessons from the CYP3A4 promoter. Mol.
Pharmacol. 65, 279–281 (2004).
59. Zeigler-Johnson, C. M. et al. Ethnic differences in the
frequency of prostate cancer susceptibilty alleles at SRD5A2
and CYP3A4. Hum. Hered. 54, 13–21 (2002).
Acknowledgements
Some of the work discussed in this review was supported by
grants from the Public Health Service and the University of
Pennsylvania Cancer Center.
Competing interests statement
The authors declare that they have no competing financial interests.
Online links
DATABASES
The following terms in this article are linked online to:
Entrez Gene:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
CYP3A4 | TNF
FURTHER INFORMATION
CODDLE: http://www.proweb.org/coddle
PolyPhen: http://www.bork.embl-heidelberg.de/PolyPhen
SIFT: http://blocks.fhcrc.org/%7Epauline/SIFT
Access to this interactive links box is free online.
VOLUME 5 | AUGUST 2004 | 5 9 7