Carcinogenesis vol.28 no.3 pp.648–656, 2007
doi:10.1093/carcin/bgl182
Advance Access publication September 28, 2006
Constraints for genetic association studies imposed by attributable fraction and
familial risk
Kari Hemminki1,2,* and Justo Lorenzo Bermejo1
1Division of Molecular Genetic Epidemiology, German Cancer Research
Center (DKFZ), Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
and 2Center for Family Medicine, Karolinska Institute, 141 83 Huddinge,
Sweden
*To whom correspondence should be addressed.
Email: [email protected]
Candidate gene studies have become very popular but
some of their implicit constraints, such as the familial risk
and the population attributable fraction (PAF) conferred
by the gene under study, are poorly understood. We
model here these parameters for susceptibility genes in
terms of genotype relative risk (GRR), allele frequency
and statistical power in simulated genetic association
studies, assuming 500 or 2000 case–control pairs and
different modes of inheritance. The results show that
common association studies on genes with minor allele
frequency >10% have sufficient power to detect disease-causing
variants conferring PAFs >10%, which can be
compared to known genes, such as BRCA1 with a PAF of
1.8%. Yet, common low-risk variants confer low familial
relative risks (FRRs), typically <1.1. The models show that
candidate gene studies may be able to identify genes
conferring close to 100% of the PAF, but they may not
explain the empirical FRRs. In order to explain FRRs,
rare, high-penetrant genes or interacting combinations of
common variants need to be uncovered. However,
candidate gene studies for common alleles do not target
this class of genes. The results may challenge the common
disease–common variant hypothesis, which posits common
variants with low GRRs and large PAFs but fails
to accommodate the empirical FRRs.
Introduction
The strategy for dissecting genetics of complex diseases has
been debated, particularly regarding the assumptions of the
common disease–common variant (CDCV) hypothesis (1–6).
Some statistical issues of concern in candidate gene and
genome-wide studies have been sample size, level of significance, marker densities, population stratification and replication of results (2,5,7,8). However, little attention has been
paid to the population attributable fractions (PAFs) and the
familial relative risks (FRR) conferred by the tested gene,
both of which limit the domain of the possible genotype
relative risks (GRRs) and allele frequencies, and vice versa.
We illustrate these concepts in Figure 1. The attractive
aspects of considering PAFs include, first, the possibility to
compare with PAFs of known genes, as shown here, and,
second, the independence of PAFs from the number of
unobserved genes and their interactions with the gene under
study. The PAF for the gene under study merely states the
contribution of the measured gene to disease etiology,
regardless of unmeasured genetic effects. The combined
PAFs of the independent susceptibility genes cannot exceed
100%, yet they have to explain completely the
familial aggregation observed in family studies (9). Obviously,
frequencies of variant alleles vary between populations
and the corresponding PAFs vary accordingly.
Abbreviations: CDCV, common disease–common variant; FRR, familial
relative risk; GRR, genotype relative risk; PAF, population attributable
fraction; SNP, single nucleotide polymorphism.
Familial aggregation is usually measured as FRR, which
compares the risk of disease for relatives of patients to that
for the general population. PAF and FRR address some
underpinnings of the CDCV paradigm and they should be
used a priori, to assess the feasibility of association studies
and a posteriori, to evaluate the consistency of the results
with population based data (10,11). The present study
explores the constraints posed by allele frequency (q),
GRR, PAF, FRR and statistical power on genetic association
studies. Throughout the presentation, we show how the high-penetrant
breast cancer gene BRCA1 and the novel low-risk,
relatively common chromosome 8 prostate cancer-susceptibility
locus DG8S737 (12) fit into the calculations.
Methods
The models denote the frequency of a susceptibility allele A by q,
the relative risk for variant homozygotes compared to wild-type genotypes
by GRR, the proportion of cases attributable to the susceptibility alleles by
PAF, the familial relative risk by FRR, the prevalence of the disease in the
population by k and the prevalence of the disease among individuals with
wild-type genotypes by f. We first explored the relationship between q and
GRR for a fixed PAF. The following calculations are based on a dominant
model, which assumes that the risk of heterozygotes (Aa) equals that of the
variant homozygotes (AA). Similar calculations were done for the recessive
model, in which the risk of heterozygotes equals that of the wild-type
homozygotes, for the additive model, in which the risk of heterozygotes is
the mean of the two homozygotes, and for the multiplicative model, in which
the relative risk for heterozygotes is $\mathrm{GRR}^{1/2}$. The probability that
an individual in the population is a wild-type homozygote (G = aa) and is affected
(D = 1) is:
\[ \Pr(G = aa, D = 1) = (1 - q)^2 f \]
Similarly,
\[ \Pr(G = Aa, D = 1) = 2q(1 - q)\,\mathrm{GRR}\, f \]
\[ \Pr(G = AA, D = 1) = q^2\,\mathrm{GRR}\, f \]
The prevalence of the disease in the population is then:
\[ k = \Pr(G = aa, D = 1) + \Pr(G = Aa, D = 1) + \Pr(G = AA, D = 1), \]
and the PAF is:
\[ \mathrm{PAF} = (k - f)/k \]
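The PAF calculation above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' R code from the supplementary material; the function names and the `mode` argument are assumptions introduced here. It works with the ratio k/f, so f itself cancels out.

```python
def genotype_relative_risks(grr, mode):
    """Relative risks of (aa, Aa, AA) compared with wild-type homozygotes."""
    het = {
        "dominant": grr,                  # Aa carries the same risk as AA
        "recessive": 1.0,                 # Aa carries the same risk as aa
        "additive": (1.0 + grr) / 2.0,    # Aa is the mean of aa and AA
        "multiplicative": grr ** 0.5,     # Aa is the square root of GRR
    }[mode]
    return 1.0, het, grr


def prevalence_ratio(q, grr, mode="dominant"):
    """k / f: the sum of Pr(G, D = 1) / f over the three genotypes (HWE)."""
    r_aa, r_het, r_hom = genotype_relative_risks(grr, mode)
    return (1 - q) ** 2 * r_aa + 2 * q * (1 - q) * r_het + q ** 2 * r_hom


def paf(q, grr, mode="dominant"):
    """PAF = (k - f) / k, rewritten as (k/f - 1) / (k/f)."""
    ratio = prevalence_ratio(q, grr, mode)
    return (ratio - 1.0) / ratio


if __name__ == "__main__":
    # Parameters quoted in the text for BRCA1 (dominant) and DG8S737
    # (multiplicative); printed as percentages.
    print(round(100 * paf(0.001, 10.0, "dominant"), 1))
    print(round(100 * paf(0.078, 3.13, "multiplicative"), 1))
```

Called with the parameters quoted in the Results for BRCA1 and DG8S737, the sketch gives values close to the reported PAFs of 1.8% and 11%.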
The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
Fig. 1. Illustration of the concepts of PAF and FRR conferred by genes. The calculations assume the allele frequency q = 0.29, the GRR = 2.0 and a
dominant mode of inheritance, i.e. the risk allele A confers an equal risk to AA homozygotes and to Aa heterozygotes.
We also investigated the relationship between q and the power of a case–control
study of the susceptibility gene for a fixed PAF. Note that the
expected distribution of cases according to genotype is given by:
\[ \Pr(G = aa \mid D = 1) = \Pr(G = aa, D = 1)/k \]
\[ \Pr(G = Aa \mid D = 1) = \Pr(G = Aa, D = 1)/k \]
\[ \Pr(G = AA \mid D = 1) = \Pr(G = AA, D = 1)/k \]
Analogous calculations gave the expected distribution
among controls. In order to estimate the power of a case–control study,
genotypes were simulated for 500 cases/500 controls and 2000 cases/2000
controls according to their expected distributions using the R function
runif. We assumed Hardy–Weinberg equilibrium and k = 5%, but the
results were practically independent of the prevalence of the disease in the
population. For each allele frequency between 1 and 90%, 10 000 datasets
were generated and the association between genotype and disease was
analyzed by logistic regression using Wald tests. The power of the study
was estimated as the proportion of simulated datasets that resulted in
significant genotype effects at the 5% significance level.
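The simulation procedure can be illustrated with a small Python sketch. It is not the authors' R code and makes several simplifying assumptions: only the dominant model is covered, genotypes are collapsed to carrier versus non-carrier, 400 replicates are used instead of 10 000, and all function names are hypothetical. For a single binary covariate the logistic-regression Wald statistic equals the log odds ratio of the 2 x 2 table divided by its standard error, which is what `wald_significant` computes.

```python
import math
import random


def carrier_probs(q, grr, k=0.05):
    """Carrier probability among cases and among controls (dominant model)."""
    carrier = 1 - (1 - q) ** 2                 # Pr(Aa or AA) under HWE
    ratio = (1 - carrier) + carrier * grr      # k / f
    f = k / ratio                              # risk of wild-type homozygotes
    if grr * f >= 1:
        raise ValueError("GRR * f must stay below 1 for this prevalence")
    p_case = carrier * grr / ratio             # Pr(carrier | D = 1)
    p_control = carrier * (1 - grr * f) / (1 - k)  # Pr(carrier | D = 0)
    return p_case, p_control


def wald_significant(a, b, c, d, z_crit=1.959964):
    """Two-sided 5% Wald test of the log odds ratio.

    a, b: carrier and non-carrier cases; c, d: carrier and non-carrier controls.
    """
    if min(a, b, c, d) == 0:                   # avoid log(0) and division by 0
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return abs(log_or / se) > z_crit


def simulate_power(q, grr, n_pairs=500, n_sim=400, seed=1):
    """Proportion of simulated case-control datasets with a significant effect."""
    rng = random.Random(seed)
    p_case, p_control = carrier_probs(q, grr)
    hits = 0
    for _ in range(n_sim):
        a = sum(rng.random() < p_case for _ in range(n_pairs))
        c = sum(rng.random() < p_control for _ in range(n_pairs))
        hits += wald_significant(a, n_pairs - a, c, n_pairs - c)
    return hits / n_sim


if __name__ == "__main__":
    print(simulate_power(0.1, 3.0))   # strong effect
    print(simulate_power(0.1, 1.0))   # no effect: estimates the type I error
```

Because the genotype coding and the number of replicates differ from the original analysis, power estimates from this sketch should not be expected to reproduce the published figures exactly.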
Since the FRR reflects the effects of susceptibility genes transmitted from
parents to their offspring, we analyzed the relationship between q and the
FRR for a fixed GRR. The familial relative risk for parents and their children
is given by:
\[ \mathrm{FRR} = 1 + \frac{1}{2}\, V_a / K^2, \]
where $V_a$ is the additive genetic variance divided by $f^2$ and K = k/f.
Under a dominant model,
\[ V_a = 2q(1 - q)\,[(1 - q)(1 - \mathrm{GRR})]^2, \]
under additive inheritance
\[ V_a = 2q(1 - q)\,[(1 - \mathrm{GRR})/2]^2, \]
and in the recessive model
\[ V_a = 2q(1 - q)\,[q(1 - \mathrm{GRR})]^2 \]
(13,14). Data on GRR and q for BRCA1
mutations were taken from the literature (15). Results on the prostate cancer
susceptibility locus DG8S737 with a confirmed effect in several populations
have been recently published (12). The code for calculation of PAF, FRR and
power using the free software environment R (www.r-project.org) is provided
in the supplementary material.
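As an illustration, the FRR formula can be evaluated directly for the three inheritance modes for which the text gives an expression for the additive variance. The following Python sketch is an assumption-laden illustration rather than the supplementary R code; the function name `frr` and its arguments are hypothetical.

```python
def frr(q, grr, mode="dominant"):
    """Parent-offspring FRR = 1 + (1/2) * Va / K^2 for a single gene."""
    # Relative risk of heterozygotes under each mode of inheritance.
    het = {
        "dominant": grr,
        "additive": (1.0 + grr) / 2.0,
        "recessive": 1.0,
    }[mode]
    # Bracketed term of Va as given in the text for each mode.
    bracket = {
        "dominant": (1 - q) * (1 - grr),
        "additive": (1 - grr) / 2.0,
        "recessive": q * (1 - grr),
    }[mode]
    v_a = 2 * q * (1 - q) * bracket ** 2      # additive variance divided by f^2
    big_k = (1 - q) ** 2 + 2 * q * (1 - q) * het + q ** 2 * grr   # K = k / f
    return 1 + 0.5 * v_a / big_k ** 2


if __name__ == "__main__":
    # BRCA1 parameters quoted in the text: GRR = 10, q = 0.001, dominant.
    print(round(frr(0.001, 10.0, "dominant"), 2))
```

With the BRCA1 parameters the sketch returns a value close to the FRR of 1.08 quoted in the Results.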
PAFs and FRRs were also calculated for multiple interacting genes. We
present here the formulas for two dominant alleles and multiplicative
gene–gene interactions; their extension to other models/additional genes is
straightforward. We know that:
\[ \mathrm{PAF} = (k - f)/k = [(k/f) - 1]/(k/f), \]
where (k/f) is the population prevalence divided by the prevalence among
wild-type individuals. Similarly, it is easy to show that
\[ \mathrm{FRR} = (\Pr(P = 1, D = 1)/f^2)/(k/f)^2, \]
where Pr(P = 1, D = 1) denotes the probability that both the parent (P) and
his/her offspring (D) are affected. We have seen that:
\[ \Pr(G = aa, D = 1)/f = (1 - q)^2 = a_1 \]
\[ \Pr(G = Aa, D = 1)/f = 2q(1 - q)\,\mathrm{GRR} = a_2 \]
\[ \Pr(G = AA, D = 1)/f = q^2\,\mathrm{GRR} = a_3 \]
If we consider two unlinked genes, the term needed to calculate the
PAF is:
\[ k/f = \sum_{i=1}^{3} \sum_{j=1}^{3} a_i a_j \]
Moreover, assuming random mating:
\[ \Pr(G_P = aa, G_O = aa, P = 1, D = 1) = 0.5\,\Pr(G = aa, D = 1)\Pr(G = Aa, D = 1) + \Pr(G = aa, D = 1)^2 = b_1 \]
\[ \Pr(G_P = aa, G_O = Aa, P = 1, D = 1) = \Pr(G = AA, D = 1)\Pr(G = aa, D = 1) + 0.5\,\Pr(G = aa, D = 1)\Pr(G = Aa, D = 1) = b_2 \]
\[ \Pr(G_P = Aa, G_O = aa, P = 1, D = 1) = 0.25\,\Pr(G = Aa, D = 1)^2 + 0.5\,\Pr(G = Aa, D = 1)\Pr(G = aa, D = 1) = b_3 \]
\[ \Pr(G_P = Aa, G_O = Aa, P = 1, D = 1) = 0.5\,\Pr(G = Aa, D = 1)\Pr(G = AA, D = 1) + 0.5\,\Pr(G = Aa, D = 1)^2 + 0.5\,\Pr(G = Aa, D = 1)\Pr(G = aa, D = 1) = b_4 \]
\[ \Pr(G_P = Aa, G_O = AA, P = 1, D = 1) = 0.5\,\Pr(G = Aa, D = 1)\Pr(G = AA, D = 1) + 0.25\,\Pr(G = Aa, D = 1)^2 = b_5 \]
\[ \Pr(G_P = AA, G_O = Aa, P = 1, D = 1) = 0.5\,\Pr(G = AA, D = 1)\Pr(G = Aa, D = 1) + \Pr(G = AA, D = 1)\Pr(G = aa, D = 1) = b_6 \]
\[ \Pr(G_P = AA, G_O = AA, P = 1, D = 1) = \Pr(G = AA, D = 1)^2 + 0.5\,\Pr(G = AA, D = 1)\Pr(G = Aa, D = 1) = b_7 \]
where $G_P$ represents the genotype of the parent and $G_O$ the genotype of his/
her offspring. Finally,
\[ \Pr(P = 1, D = 1)/f^2 = \sum_{i=1}^{7} \sum_{j=1}^{7} b_i b_j \]
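For n identical, unlinked genes the multi-gene quantities can also be obtained without enumerating genotype combinations. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes that under multiplicative gene–gene interaction both k/f and the FRR multiply across genes, and that under additive interaction the excess risks (k/f − 1) and the additive variances add across genes. The function names are hypothetical.

```python
def single_gene(q, grr, mode):
    """Return (K, Va) for one gene: K = k/f and Va = additive variance / f^2."""
    het = {"dominant": grr, "additive": (1.0 + grr) / 2.0, "recessive": 1.0}[mode]
    bracket = {
        "dominant": (1 - q) * (1 - grr),
        "additive": (1 - grr) / 2.0,
        "recessive": q * (1 - grr),
    }[mode]
    big_k = (1 - q) ** 2 + 2 * q * (1 - q) * het + q ** 2 * grr
    v_a = 2 * q * (1 - q) * bracket ** 2
    return big_k, v_a


def combined(n_genes, q, grr, mode="dominant", interaction="multiplicative"):
    """PAF and FRR for n identical unlinked genes (assumed combination rules)."""
    k1, va1 = single_gene(q, grr, mode)
    if interaction == "multiplicative":
        big_k = k1 ** n_genes                               # k/f multiplies
        frr = (1 + 0.5 * va1 / k1 ** 2) ** n_genes          # FRR multiplies
    elif interaction == "additive":
        big_k = 1 + n_genes * (k1 - 1)                      # excess risks add
        frr = 1 + 0.5 * n_genes * va1 / big_k ** 2          # variances add
    else:
        raise ValueError("interaction must be 'multiplicative' or 'additive'")
    paf = (big_k - 1) / big_k
    return paf, frr


if __name__ == "__main__":
    paf, frr = combined(10, 0.1, 2.0, "dominant", "multiplicative")
    print(round(100 * paf, 1), round(frr, 2))
```

For q = 0.1 and GRR = 2 these shortcuts give values that agree, to the reported precision, with the entries of Table II.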
Results
Figure 2 shows the dependence of GRR on q at various PAFs
under three different modes of inheritance. We show a
vertical line at q = 0.1 because many association studies do
not test rarer variants. For a dominant gene with a PAF
<10%, the GRRs are <2.0 at any q > 0.1. We have marked
the coordinates of the breast cancer-susceptibility gene in
Figure 1. BRCA1 mutations with a GRR of 10 and q =
0.001 explain a PAF of 1.8%. The novel prostate cancer-susceptibility
locus, DG8S737 with a GRR of 1.77 × 1.77 =
3.13 and q = 0.078 confers a PAF of 11%. A comparison
among the three modes of inheritance (from dominant to
additive and to recessive) shows a gradual shifting of the
curves towards higher GRRs for a fixed allele frequency.
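The curves relating GRR to q for a fixed PAF follow from inverting the PAF formula: since PAF = (K − 1)/K with K = k/f, the excess K − 1 equals PAF/(1 − PAF), and for each mode K − 1 is a known multiple of (GRR − 1). The Python sketch below illustrates this inversion; the function names are hypothetical and the code is not taken from the supplementary material.

```python
def risk_weight(q, mode):
    """Coefficient c such that k/f - 1 = c * (GRR - 1) under Hardy-Weinberg."""
    return {
        "dominant": 1 - (1 - q) ** 2,   # all carriers have relative risk GRR
        "additive": q,                  # q(1 - q) + q^2 simplifies to q
        "recessive": q ** 2,            # only AA homozygotes are at raised risk
    }[mode]


def grr_for_paf(q, paf, mode="dominant"):
    """GRR needed for an allele of frequency q to confer the given PAF."""
    excess = paf / (1 - paf)            # k/f - 1
    return 1 + excess / risk_weight(q, mode)


if __name__ == "__main__":
    for mode in ("dominant", "additive", "recessive"):
        print(mode, round(grr_for_paf(0.1, 0.10, mode), 2))
```

At q = 0.1 and a PAF of 10% the required GRR rises from the dominant to the additive to the recessive model, which is the shift of the curves described above.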
Figure 3 assesses the statistical power to detect a
significant GRR (significance level 0.05) in case–control
studies using either 500 cases and controls or 2000 cases and
controls. Under dominance, a study based on 500 cases and
500 controls reaches 80% power (shown by the horizontal
line) only for rare alleles (q < 0.1) when the PAF is 5 or
10%; larger studies based on 2000 cases and controls
reach 80% power with q = 0.1 for a PAF of 5% and
with q = 0.3 for a PAF of 10%. The analysis of 2000 cases
and 2000 controls does not provide reasonable power for a
PAF of 1%. Using 500 cases and controls, the effect of
BRCA1 mutations is detected with a power of 59% and
the effect of DG8S737 with a power of 93%. Again, the
curves for additive and recessive modes shift systematically
to the right, indicating higher power at higher allele
frequencies.
The relationship between allele frequency and FRR
according to a fixed GRR is shown in Figure 4. Dominant
genes result in increased FRRs at low allele frequencies; for
example a GRR of 10 results in a FRR of 2.0 for q = 0.05.
The FRRs conferred by BRCA1 (1.08) and DG8S737 (1.04)
are also represented in the figure. Again, the curves and their
maxima are right-shifted for additive and recessive genes.
For an additive gene, a GRR of 10 and q = 0.1 confers a
familial risk of 1.5; under recessive inheritance, GRR = 10
and q = 0.4 results in FRR = 1.5.
Table I shows PAFs and FRRs for various values of q
and GRR. PAFs increase continuously with the allele
frequency. By contrast, the FRR increases with the allele
frequency to a maximum, and decreases thereafter. For
example, a dominant allele with a GRR of 3 shows the
maximal FRR (1.15) at q = 0.1 and it confers a PAF of
27.5%; when q = 0.5, the FRR decreases to 1.04, but the
PAF increases to 60%. Rare variants explain relatively more
of FRR than of PAF (e.g. BRCA1 mutations: FRR = 1.08,
PAF = 1.8%), in contrast to common variants (DG8S737:
FRR = 1.04, PAF = 11%).
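The contrast between the monotonic PAF and the peaked FRR can be explored by scanning the allele frequency on a finer grid than the one tabulated. The Python sketch below does this for a dominant allele; it is an illustration only, with hypothetical function names and a grid step chosen arbitrarily.

```python
def paf_dominant(q, grr):
    """PAF of a dominant allele with frequency q."""
    big_k = (1 - q) ** 2 + (1 - (1 - q) ** 2) * grr      # K = k / f
    return (big_k - 1) / big_k


def frr_dominant(q, grr):
    """Parent-offspring FRR of a dominant allele with frequency q."""
    big_k = (1 - q) ** 2 + (1 - (1 - q) ** 2) * grr
    v_a = 2 * q * (1 - q) * ((1 - q) * (1 - grr)) ** 2   # additive variance / f^2
    return 1 + 0.5 * v_a / big_k ** 2


def argmax_frr_dominant(grr, step=0.001):
    """Allele frequency on a grid in (0, 0.9) at which the FRR is largest."""
    grid = [i * step for i in range(1, int(round(0.9 / step)))]
    return max(grid, key=lambda q: frr_dominant(q, grr))


if __name__ == "__main__":
    q_star = argmax_frr_dominant(3.0)
    print(q_star, frr_dominant(q_star, 3.0), paf_dominant(q_star, 3.0))
```

On the coarse grid of Table I the maximum for GRR = 3 falls at q = 0.1; a finer grid places it in the same neighbourhood, while the PAF keeps rising towards q = 0.5.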
Table II shows PAFs and FRRs for some basic interaction
models, considering up to 10 genes with q = 0.1 and
GRR = 2. The assumed allele–allele interactions were
dominant, additive or recessive; multiplicative and additive
gene–gene interactions were considered. Multiplicative interactions of genes with dominant alleles cause the highest
PAFs and FRRs; with 10 interacting genes, the PAF was
82.4% and the FRR was 1.65. Additive interactions of genes
with dominant alleles and multiplicative/additive interactions
of genes with additive alleles contributed relatively more to
the PAF than to the FRR. The impact of recessive alleles on
PAFs and FRRs was relatively small.
Fig. 2. Relationship between allele frequency (q) and GRR for a fixed PAF, according to the inheritance mode (A, dominant; B, additive; and C,
recessive). The genetic parameters for DG8S737 were calculated assuming a multiplicative model.
Fig. 3. Relationship between allele frequency (q) and statistical power to detect a significant gene–disease association (Type I error = 5%) for a
fixed PAF, according to the inheritance mode and the sample size. The genetic parameters for DG8S737 were calculated assuming a multiplicative model.
Fig. 4. Relationship between the allele frequency (q) and the FRR for a fixed GRR, according to the inheritance mode (A, dominant; B, additive; and C,
recessive). The genetic parameters for DG8S737 were calculated assuming a multiplicative model.

Table I. PAF (%) and FRR according to allele frequency (q) and GRR

          GRR 1.5        GRR 2        GRR 3        GRR 5       GRR 10       GRR 20      GRR 100
q       PAF   FRR    PAF   FRR    PAF   FRR    PAF   FRR    PAF   FRR    PAF   FRR    PAF   FRR
Dominant allele
0.01    1.0  1.00    2.0  1.01    3.8  1.04    7.4  1.13   15.2  1.57   27.4  2.84   66.3  11.8
0.05    4.6  1.01    8.9  1.04   16.3  1.12   28.1  1.36   46.7  1.99   64.9  2.90   90.6  4.70
0.1     8.7  1.02   16.0  1.05   27.5  1.15   43.2  1.38   63.1  1.80   78.3  2.24   95.0  2.82
0.2    15.3  1.02   26.5  1.06   41.9  1.14   59.0  1.28   76.4  1.46   87.2  1.60   97.3  1.75
0.3    20.3  1.02   33.8  1.05   50.5  1.10   67.1  1.18   82.1  1.27   90.6  1.33   98.1  1.38
0.5    27.3  1.01   42.9  1.02   60.0  1.04   75.0  1.06   87.1  1.08   93.4  1.10   98.7  1.11
Additive allele
0.01    0.5  1.00    1.0  1.00    2.0  1.01    3.8  1.04    8.3  1.17   16.0  1.63   49.7  7.13
0.05    2.4  1.00    4.8  1.01    9.1  1.04   16.7  1.13   31.0  1.46   48.7  2.13   83.2  4.29
0.1     4.8  1.01    9.1  1.02   16.7  1.06   28.6  1.18   47.4  1.50   65.5  1.97   90.8  2.86
0.2     9.1  1.01   16.7  1.03   28.6  1.08   44.4  1.20   64.3  1.41   79.2  1.63   95.2  1.91
0.3    13.0  1.01   23.1  1.03   37.5  1.08   54.5  1.17   73.0  1.31   85.1  1.42   96.7  1.55
0.5    20.0  1.01   33.3  1.03   50.0  1.06   66.7  1.11   81.8  1.17   90.5  1.20   98.0  1.24
Recessive allele
0.01    0.0  1.00    0.0  1.00    0.0  1.00    0.0  1.00    0.1  1.00    0.2  1.00    1.0  1.01
0.05    0.1  1.00    0.2  1.00    0.5  1.00    1.0  1.00    2.2  1.01    4.5  1.04   19.8  1.75
0.1     0.5  1.00    1.0  1.00    2.0  1.00    3.8  1.01    8.3  1.06   16.0  1.23   49.7  3.23
0.2     2.0  1.00    3.8  1.01    7.4  1.02   13.8  1.08   26.5  1.28   43.2  1.75   79.8  3.55
0.3     4.3  1.00    8.3  1.02   15.3  1.05   26.5  1.16   44.8  1.47   63.1  1.93   89.9  2.89
0.5    11.1  1.01   20.0  1.04   33.3  1.11   50.0  1.25   69.2  1.48   82.6  1.68   96.1  1.92

Table II. PAF (%) and FRR according to the number of interacting genes

                    Gene–gene interaction model
                    Multiplicative     Additive
Number of genes       PAF   FRR       PAF   FRR
Dominant allele/s
1                    16.0  1.05      16.0  1.05
2                    29.4  1.11      27.5  1.08
3                    40.7  1.16      36.3  1.09
4                    50.1  1.22      43.2  1.09
5                    58.1  1.29      48.7  1.10
10                   82.4  1.65      65.5  1.09
Additive allele/s
1                     9.1  1.02       9.1  1.02
2                    17.4  1.04      16.7  1.03
3                    24.9  1.06      23.1  1.04
4                    31.7  1.08      28.6  1.05
5                    37.9  1.10      33.3  1.05
10                   61.5  1.20      50.0  1.06
Recessive allele/s
1                     1.0  1.00       1.0  1.00
2                     2.0  1.00       2.0  1.00
3                     2.9  1.00       2.9  1.00
4                     3.9  1.00       3.9  1.00
5                     4.9  1.00       4.8  1.00
10                    9.5  1.01       9.1  1.01

The assumed parameters were: allele frequency, q = 0.1, GRR = 2.

Discussion
FRR is an indicator of heritability, assuming that the contribution
of shared environmental risk factors to the familial
aggregation of disease is small. FRRs for most types of cancer
are 2.0 (16), and it is likely that the familial aggregation
of cancer is mostly due to heritable causes (17,18). The FRR
of colorectal cancer is 1.5, when known conditions, such
as hereditary non-polyposis colorectal cancer (HNPCC) and
familial adenomatous polyposis are excluded (19,20). The
combined PAF of the related high-penetrance genes, mismatch
repair genes and APC, for colorectal cancer has been
estimated to range from 1 to 3% in Western populations (21).
According to Figure 4, a FRR of 1.5 can be explained by a
dominant gene (or the combined effect of many genes) of
GRR = 10 with allele frequencies of 0.01 or 0.2; for
additive or recessive genes with GRR = 10, the allele
frequency should be 0.1 or 0.4, respectively. According to
Table I, a dominant gene with the described parameters
would show a PAF of 15% (q = 0.01) or 76% (q = 0.2),
compared with a PAF of 47% for an additive gene and a PAF
of 50% for a recessive gene.
Many candidate gene association studies focus on
genes for which the minor allele frequency is 0.1 or higher
because the likelihood of identifying a significant effect
(statistical power) is higher for common alleles, assuming a
constant GRR (22,23). In the HapMap Project, the minor
allele frequency was restricted to >0.05 (24). However,
assuming a constant PAF, the statistical power to detect the
effect of rarer alleles is higher than that of common alleles,
because rarer alleles would have higher GRRs (Figure 3).
In fact, a case–control study based on 500 cases and
500 controls only has a reasonable power (>80%) to detect
dominant and additive variants with PAFs over 10%, but
a recessive gene could be identified even at a lower PAF.
If 500 cases and 500 controls were used to investigate
the effects of rare variants, the likelihood of detecting
the effect of BRCA1 mutations would be 59%; the DG8S737
locus would be detected with a power of 93% (Figure 3).
BRCA1 was originally identified in linkage studies, but the
effects of specific mutations were statistically significant in a
case–control study based on 2000 breast cancer patients and
4000 controls from Poland (15), in agreement with the
present data. The novel prostate cancer-susceptibility locus
DG8S737 was mapped in 323 Icelandic families with a
suggestive linkage signal (lod score 2.11) and confirmed in a
case–control study on 869 cases and 596 controls. It was
subsequently replicated in three other populations, one Swedish,
one European American and one African American (12). The
predicted power for an association study with 500 cases and
500 controls was 93%, in perfect agreement with the actual
data. It should be pointed out that this locus was not detected
in 1233 families collected by an international consortium
(25). The reasons for the discrepancies in the prostate cancer
linkage results are not known, but one of them may be the
genetic homogeneity of the Icelandic population.
Our data show that, in concert with the CDCV paradigm,
common risk alleles confer high PAFs but modest FRRs. For
example, it has been speculated that the five genes HRAS1,
NAT2, GSTT1, TNF-alpha and MTHFR would explain
54–64% of the cases of colorectal cancer, assuming that the
GRRs were replicated (11). According to a large association
study, the two genes NAT2 and GSTM1 would explain 31%
of cases of bladder cancer (26). All these variants showed
small GRRs (<2.0) and thus, they would only marginally
contribute to the familial risk. This is a dilemma of the
CDCV paradigm: candidate gene studies may be able to
identify genes conferring close to 100% of the PAF, but they
may not explain the empirical FRRs. In order to explain
FRRs, rare, high-penetrant genes need to be uncovered.
However, the candidate gene studies for common alleles do
not target this class of genes. Even the presently known
susceptibility genes have a limited effect on FRRs (in breast
cancer: BRCA1 1.08, Figure 4) because the mutant alleles are
rare. The true test for the CDCV hypothesis will come in
diseases of high FRR, such as multiple sclerosis, for which
no high-penetrant genes are known to date. The FRR for
multiple sclerosis is about 8.0, which is almost impossible to
explain by any number of low-risk genes, assuming that
genes, and not environmental sharing through e.g. infective
agents, explain the risk (27,28). Interacting combinations of
common variants are expected to be rare and the associated GRRs
could be high, thus bringing gene–gene interactions into the
realm of rare Mendelian-type genes. There are numerous
possibilities for such interactions, and we tested some basic
models in Table II. Multiplicative gene–gene interactions
increase FRRs but they also tend to increase PAF. Little
empirical data are available on gene–gene interactions,
simply because the sample size requirements for even binary
interactions become challenging, and they would vastly
exceed those used in the present exercise (5,29,30). Low
GRRs pose problems, even if they are formally significant,
because the carriers are not enriched among familial cases
and, thus, they cannot be properly replicated on familial
cases (31,32) and because it will be difficult to provide
experimental support in their favor. Furthermore, their
significance for individual clinical genetic counseling
remains doubtful because of low risk. Testing of rarer alleles
may help to find genes of higher risk but larger sample sizes
are required. Another option is to use familial cases, which
are likely to harbor variants of higher risk.
The present concepts have many useful applications in
practical gene identification work. Before launching a
study, it would be useful to check the empirical FRRs of the
particular disease. To explain a high familial risk, such as
8.0 for multiple sclerosis, rare high-penetrant genes need to
be found and linkage studies may be the suitable approach.
For a low familial risk, such as 1.5 for colorectal cancer,
association studies appear justified but sample sizes should
also allow for inclusion of rare alleles. After the study, PAFs
may be used to assess the relevance of the findings in terms
of disease etiology. The PAF reflects the overall effect of the
gene under study by taking into account possible interactions
with unknown genes. The a posteriori calculation of the
FRRs is also advantageous; there are examples from the early
genotyping studies reporting such high GRRs for common
alleles that the resulting FRRs would exceed the empirical
FRRs (33).
Acknowledgements
The Family-Cancer Database was created by linking registries maintained by
Statistics Sweden and the Swedish Cancer Register, and supported by the
Deutsche Krebshilfe, the Swedish Cancer Society, the Swedish Council for
Working Life and Social Research, the EU, LSHC-LT-2004-503465 and EU
Food-CT-2005-016320.
Conflict of Interest Statement: None declared.
References
1. Lander,E. and Kruglyak,L. (1995) Genetic dissection of complex traits:
guidelines for interpreting and reporting linkage results. Nat. Genet., 11,
241–247.
2. Risch,N. (2000) Searching for genetic determinants in the new
millennium. Nature, 405, 847–856.
3. Pritchard,J.K. and Cox,N.J. (2002) The allelic architecture of human
disease genes: common disease–common variant . . . or not? Hum. Mol.
Genet., 11, 2417–2423.
4. Terwilliger,J.D. and Weiss,K.M. (2003) Confounding, ascertainment bias,
and the blind quest for a genetic ‘fountain of youth’. Ann. Med., 35,
532–544.
5. Wang,W.Y., Barratt,B.J., Clayton,D.G. and Todd,J.A. (2005) Genome-wide
association studies: theoretical and practical concerns. Nat. Rev.
Genet., 6, 109–118.
6. Terwilliger,J.D. and Hiekkalinna,T. (2006) An utter refutation of the
‘Fundamental Theorem of the HapMap’. Eur. J. Hum. Genet., 14,
426–437.
7. Wacholder,S., Chanock,S., Garcia-Closas,M., El Ghormli,L. and
Rothman,N. (2004) Assessing the probability that a positive report is
false: an approach for molecular epidemiology studies. J. Natl Cancer
Inst., 96, 434–442.
8. Pharoah,P.D., Dunning,A.M., Ponder,B.A. and Easton,D.F. (2004)
Association studies for finding cancer-susceptibility genetic variants.
Nat. Rev. Cancer, 4, 850–860.
9. Hemminki,K., Lorenzo Bermejo,J. and Försti,A. (2007) Human disease:
heritable and environmental aetiology. Nat. Rev. Cancer, in press.
10. Hemminki,K., Försti,A. and Lorenzo Bermejo,J. (2005) Single nucleotide
polymorphisms (SNPs) are inherited from parents and they measure
heritable events. Carcinogenesis, 4, 2.
11. Yang,Q., Khoury,M., Friedman,J., Little,J. and Flanders,W. (2005) How
many genes underlie the occurrence of common complex diseases in the
population? Int. J. Epidemiol., 34, 1129–1137.
12. Amundadottir,L.T., Sulem,P., Gudmundsson,J. et al. (2006) A common
variant associated with prostate cancer in European and African
populations. Nat. Genet., 38, 652–658.
13. James,J. (1971) Frequency in relatives for an all-or-none trait. Ann. Hum.
Genet., 35, 47–49.
14. Risch,N. (1990) Linkage strategies for genetically complex traits. I.
Multilocus models. Am. J. Hum. Genet., 46, 222–228.
15. Gorski,B., Cybulski,C., Huzarski,T. et al. (2005) Breast cancer
predisposing alleles in Poland. Breast Cancer Res. Treat., 92, 19–24.
16. Hemminki,K., Li,X. and Czene,K. (2004) Familial risk of cancer: data for
clinical counseling and cancer genetics. Int. J. Cancer, 108, 109–114.
17. Hopper,J.L. and Carlin,J.B. (1992) Familial aggregation of a disease
consequent upon correlation between relatives in a risk factor measured
on a continuous scale. Am. J. Epidemiol., 136, 1138–1147.
18. Lorenzo Bermejo,J. and Hemminki,K. (2005) Familial lung cancer and
aggregation of smoking habits: a simulation of the effect of shared
environmental factors on the familial risk of cancer. Cancer Epidemiol.
Biomarkers Prev., 14, 1738–1740.
19. Johns,L.E. and Houlston,R.S. (2001) A systematic review and
meta-analysis of familial colorectal cancer risk. Am. J. Gastroenterol., 96,
2992–3003.
20. Hemminki,K., Granstrom,C. and Chen,B. (2005) The Swedish
Family-Cancer: update, application to colorectal cancer and clinical relevance.
Heredit. Cancer Clin. Pract., 3, 7–18.
21. De la Chapelle,A. (2005) The incidence of Lynch syndrome. Fam.
Cancer, 4, 233–237.
22. Hemminki,K., Försti,A. and Lorenzo Bermejo,J. (2006) Gene–
environment interactions in cancer: do they exist? Ann. NY Acad. Sci., in
press.
23. Wang,W.Y.S., Cordell,H.J. and Todd,J.A. (2003) Association mapping of
complex diseases in linked regions: estimation of genetic effects and
feasibility of testing rare variants. Genet. Epidemiol., 24, 36–42.
24. The International HapMap Consortium. (2005) A haplotype map of the
human genome. Nature, 437, 1299–1320.
25. Xu,J., Dimitrov,L. and Chang,B.L. (2005) A combined genomewide
linkage scan of 1,233 families for prostate cancer-susceptibility genes
conducted by the international consortium for prostate cancer genetics.
Am. J. Hum. Genet., 77, 219–229.
26. Garcia-Closas,M., Malats,N. and Silverman,D. (2005) NAT2 slow
acetylation, GSTM1 null genotype, and risk of bladder cancer: results
from the Spanish Bladder Cancer Study and meta-analyses. Lancet, 366,
649–659.
27. Hemminki,K., Li,X., Johansson,S., Sundquist,K. and Sundquist,J. (2006)
Re. ‘Familial risk of multiple sclerosis: a nationwide cohort study’. Am. J.
Epidemiol., 163, 873–874.
28. Nielsen,N.M., Westergaard,T., Rostgaard,K., Frisch,M., Hjalgrim,H.,
Wohlfahrt,J., Koch-Henriksen,N. and Melbye,M. (2005) Familial risk of
multiple sclerosis: a nationwide cohort study. Am. J. Epidemiol.,
162, 1–5.
29. Brennan,P. (2002) Gene–environment interactions and aetiology of
cancer: what does it mean and how can we measure it. Carcinogenesis,
23, 381–387.
30. Davey Smith,G., Ebrahim,S., Lewis,S., Hansell,A.L., Palmer,L.J. and
Burton,P.R. (2005) Genetic epidemiology and public health: hope, hype,
and future prospects. Lancet, 366, 1484–1498.
31. Antoniou,A.C. and Easton,D.F. (2003) Polygenic inheritance of breast
cancer: implications for design of association studies. Genet. Epidemiol.,
25, 190–202.
32. Houlston,R.S. and Peto,J. (2003) The future of association studies of
common cancers. Hum. Genet., 112, 434–435.
33. Hemminki,K. and Lorenzo Bermejo,J. (2005) Relationships between
familial risks of cancer and the effects of heritable genes and their SNP
variants. Mutat. Res., 592, 6–17.
Received July 26, 2006; revised September 13, 2006;
accepted September 15, 2006