Carcinogenesis vol.28 no.3 pp.648–656, 2007
doi:10.1093/carcin/bgl182
Advance Access publication September 28, 2006

Constraints for genetic association studies imposed by attributable fraction and familial risk

Kari Hemminki1,2 and Justo Lorenzo Bermejo1

1 Division of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany and 2 Center for Family Medicine, Karolinska Institute, 141 83 Huddinge, Sweden

To whom correspondence should be addressed. Email: [email protected]

Candidate gene studies have become very popular but some of their implicit constraints, such as the familial risk and the population attributable fraction (PAF) conferred by the gene under study, are poorly understood. We model here these parameters for susceptibility genes in terms of genotype relative risk (GRR), allele frequency and statistical power in simulated genetic association studies, assuming 500 or 2000 case–control pairs and different modes of inheritance. The results show that the common association studies on genes with minor allele frequency >10% have sufficient power to detect disease-causing variants conferring PAFs >10%, which can be compared to known genes, such as BRCA1 with a PAF of 1.8%. Yet, common low-risk variants confer low familial relative risks (FRRs), typically <1.1. The models show that candidate gene studies may be able to identify genes conferring close to 100% of the PAF, but they may not explain the empirical FRRs. In order to explain FRRs, rare, high-penetrant genes or interacting combinations of common variants need to be uncovered. However, the candidate gene studies for common alleles do not target this class of genes. The results may challenge the common disease–common variant hypothesis, which posits common variants with low GRRs and large PAFs but fails to accommodate the empirical FRRs.
Introduction

The strategy for dissecting genetics of complex diseases has been debated, particularly regarding the assumptions of the common disease–common variant (CDCV) hypothesis (1–6). Some statistical issues of concern in candidate gene and genome-wide studies have been sample size, level of significance, marker densities, population stratification and replication of results (2,5,7,8). However, little attention has been paid to the population attributable fractions (PAFs) and the familial relative risks (FRRs) conferred by the tested gene, both of which limit the domain of the possible genotype relative risks (GRRs) and allele frequencies, and vice versa. We illustrate these concepts in Figure 1. The attractive aspects of considering PAFs include, first, the possibility to compare with PAFs of known genes, as shown here, and, second, the independence of PAFs from the number of unobserved genes and their interactions with the gene under study. The PAF for the gene under study merely states the contribution of the measured gene to disease etiology, regardless of unmeasured genetic effects. The combined PAFs of the independent susceptibility genes cannot exceed 100%, yet they have to be able to explain completely the familial aggregation observed in family studies (9). Obviously, frequencies of variant alleles vary between populations and the corresponding PAFs vary accordingly. Familial aggregation is usually measured as FRR, which compares the risk of disease for relatives of patients to that for the general population. PAF and FRR address some underpinnings of the CDCV paradigm and they should be used a priori, to assess the feasibility of association studies, and a posteriori, to evaluate the consistency of the results with population-based data (10,11).

Abbreviations: CDCV, common disease–common variant; FRR, familial relative risk; GRR, genotype relative risk; PAF, population attributable fraction; SNP, single nucleotide polymorphism.
The present study explores the constraints posed by allele frequency (q), GRR, PAF, FRR and statistical power on genetic association studies. Along the presentation, we show how the high-penetrant breast cancer gene BRCA1 and the novel low-risk, relatively common chromosome 8 prostate cancer-susceptibility locus DG8S737 (12) fit into the calculations.

Methods

The developed models denote the frequency of a susceptibility allele A by q, the relative risk for variant homozygotes compared to wild-type genotypes by GRR, the proportion of cases attributable to the susceptibility alleles by PAF, the familial relative risk by FRR, the prevalence of the disease in the population by k and the prevalence of the disease among individuals with wild-type genotypes by f.

We explored first the relationship between q and GRR for a fixed PAF. The following calculations are based on a dominant model, which assumes that the risk of heterozygotes (Aa) equals that of the variant homozygotes (AA). Similar calculations were done for the recessive model, in which the risk of heterozygotes equals that of the wild-type homozygotes, for the additive model, in which the risk of heterozygotes is the mean of the two homozygotes, and for the multiplicative model, in which the risk for heterozygotes is $\mathrm{GRR}^{1/2}$. The probability that an individual in the population is a wild-type homozygote (G = aa) and is affected (D = 1) is:

$$\Pr(G = aa, D = 1) = (1 - q)^2 f$$

Similarly,

$$\Pr(G = Aa, D = 1) = 2q(1 - q)\,\mathrm{GRR}\,f$$

$$\Pr(G = AA, D = 1) = q^2\,\mathrm{GRR}\,f$$

The prevalence of the disease in the population is then:

$$k = \Pr(G = aa, D = 1) + \Pr(G = Aa, D = 1) + \Pr(G = AA, D = 1),$$

and the PAF is:

$$\mathrm{PAF} = (k - f)/k$$

Fig. 1. Illustration of the concepts of PAF and FRR conferred by genes. The calculations assume the allele frequency q = 0.29, the GRR = 2.0 and a dominant mode of inheritance, i.e. the risk allele A confers an equal risk to AA homozygotes and to Aa heterozygotes.

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

We also investigated the relationship between q and the power of a case–control study of the susceptibility gene for a fixed PAF. Note that the expected distribution of cases according to genotype is given by:

$$\Pr(G = aa \mid D = 1) = \Pr(G = aa, D = 1)/k$$

$$\Pr(G = Aa \mid D = 1) = \Pr(G = Aa, D = 1)/k$$

$$\Pr(G = AA \mid D = 1) = \Pr(G = AA, D = 1)/k$$

Analogous calculations gave the expected distribution among controls. In order to estimate the power of a case–control study, genotypes were simulated for 500 cases/500 controls and 2000 cases/2000 controls according to their expected distributions using the R function runif. We assumed Hardy–Weinberg equilibrium and k = 5%, but the results were practically independent of the prevalence of the disease in the population. For each allele frequency between 1 and 90%, 10 000 datasets were generated and the association between genotype and disease was analyzed by logistic regression using Wald tests. The power of the study was estimated as the proportion of simulated datasets that resulted in significant genotype effects at the 5% significance level.

Since the FRR reflects the effects of susceptibility genes transmitted from parents to their offspring, we analyzed the relationship between q and the FRR for a fixed GRR. The familial relative risk for parents and their children is given by:

$$\mathrm{FRR} = 1 + \frac{1}{2}\,\frac{V_a}{K^2},$$

where $V_a$ is the additive genetic variance divided by $f^2$ and $K = k/f$. $V_a$ equals $2q(1 - q)[(1 - q)(1 - \mathrm{GRR})]^2$ under a dominant model, $V_a = 2q(1 - q)[(1 - \mathrm{GRR})/2]^2$ under additive inheritance, and $V_a = 2q(1 - q)[q(1 - \mathrm{GRR})]^2$ in the recessive model (13,14).

Data on GRR and q for BRCA1 mutations were taken from the literature (15).
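The single-locus PAF and FRR expressions above can be checked numerically. The following is a minimal Python sketch, not the authors' R code from the supplementary material. The function name `single_locus` and its arguments are hypothetical. As an assumption of this sketch, the three mode-specific formulas for Va are replaced by the general average-effect form Va = 2q(1 − q)α², with α = q(r_AA − r_Aa) + (1 − q)(r_Aa − r_aa), which reduces to the dominant, additive and recessive expressions given in Methods and also covers the multiplicative model.

```python
def single_locus(q, grr, mode="dominant"):
    """Return (PAF, FRR) for one biallelic locus under Hardy-Weinberg equilibrium.

    q    : frequency of the susceptibility allele A
    grr  : relative risk of AA homozygotes versus aa homozygotes
    mode : how the heterozygote risk is derived from grr
    """
    # Relative risk of heterozygotes under each mode of inheritance.
    het_risk = {
        "dominant": grr,
        "recessive": 1.0,
        "additive": (1.0 + grr) / 2.0,
        "multiplicative": grr ** 0.5,
    }[mode]
    r_aa, r_het, r_hom = 1.0, het_risk, grr

    # Hardy-Weinberg genotype frequencies for aa, Aa, AA.
    p_aa, p_het, p_hom = (1 - q) ** 2, 2 * q * (1 - q), q ** 2

    # K = k/f: population prevalence relative to the wild-type prevalence.
    big_k = p_aa * r_aa + p_het * r_het + p_hom * r_hom
    paf = (big_k - 1.0) / big_k  # same as (k - f) / k

    # Additive variance (scaled by f^2) from the average effect of the allele.
    # Assumption of this sketch: general form instead of three separate formulas.
    alpha = q * (r_hom - r_het) + (1 - q) * (r_het - r_aa)
    v_a = 2 * q * (1 - q) * alpha ** 2

    frr = 1.0 + 0.5 * v_a / big_k ** 2  # parent-offspring familial relative risk
    return paf, frr


if __name__ == "__main__":
    for label, args in [
        ("BRCA1 (dominant, q=0.001, GRR=10)", (0.001, 10.0, "dominant")),
        ("DG8S737 (multiplicative, q=0.078, GRR=1.77^2)", (0.078, 1.77 ** 2, "multiplicative")),
    ]:
        paf, frr = single_locus(*args)
        print(f"{label}: PAF = {100 * paf:.1f}%, FRR = {frr:.2f}")
```

With the parameters quoted in the text, this sketch gives values that round to the reported PAF and FRR for BRCA1 (1.8%, 1.08) and DG8S737 (11%, 1.04).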
Results on the prostate cancer susceptibility locus DG8S737 with a confirmed effect in several populations have been recently published (12). The code for calculation of PAF, FRR and power using the free software environment R (www.r-project.org) is provided in the supplementary material.

PAFs and FRRs were also calculated for multiple interacting genes. We present here the formulas for two dominant alleles and multiplicative gene–gene interactions; their extension to other models or additional genes is straightforward. We know that:

$$\mathrm{PAF} = (k - f)/k = [(k/f) - 1]/(k/f),$$

where (k/f) is the population prevalence divided by the prevalence among wild-type individuals. Similarly, it is easy to show that

$$\mathrm{FRR} = \left(\Pr(P = 1, D = 1)/f^2\right)/(k/f)^2,$$

where Pr(P = 1, D = 1) denotes the probability that both the parent (P) and his/her offspring (D) are affected. We have seen that:

$$\Pr(G = aa, D = 1)/f = (1 - q)^2 = a_1$$

$$\Pr(G = Aa, D = 1)/f = 2q(1 - q)\,\mathrm{GRR} = a_2$$

$$\Pr(G = AA, D = 1)/f = q^2\,\mathrm{GRR} = a_3$$

If we consider two unlinked genes, the term needed to calculate the PAF is:

$$k/f = \sum_{i=1}^{3} \sum_{j=1}^{3} a_i a_j$$

Moreover, assuming random mating:

$$\Pr(G_P = aa, G_O = aa, P = 1, D = 1) = 0.5\,\Pr(G = aa, D = 1)\Pr(G = Aa, D = 1) + \Pr(G = aa, D = 1)^2 = b_1$$

$$\Pr(G_P = aa, G_O = Aa, P = 1, D = 1) = \Pr(G = AA, D = 1)\Pr(G = aa, D = 1) + 0.5\,\Pr(G = aa, D = 1)\Pr(G = Aa, D = 1) = b_2$$

$$\Pr(G_P = Aa, G_O = aa, P = 1, D = 1) = 0.25\,\Pr(G = Aa, D = 1)^2 + 0.5\,\Pr(G = Aa, D = 1)\Pr(G = aa, D = 1) = b_3$$

$$\Pr(G_P = Aa, G_O = Aa, P = 1, D = 1) = 0.5\,\Pr(G = Aa, D = 1)\Pr(G = AA, D = 1) + 0.5\,\Pr(G = Aa, D = 1)^2 + 0.5\,\Pr(G = Aa, D = 1)\Pr(G = aa, D = 1) = b_4$$

$$\Pr(G_P = Aa, G_O = AA, P = 1, D = 1) = 0.5\,\Pr(G = Aa, D = 1)\Pr(G = AA, D = 1) + 0.25\,\Pr(G = Aa, D = 1)^2 = b_5$$

$$\Pr(G_P = AA, G_O = Aa, P = 1, D = 1) = 0.5\,\Pr(G = AA, D = 1)\Pr(G = Aa, D = 1) + \Pr(G = AA, D = 1)\Pr(G = aa, D = 1) = b_6$$

$$\Pr(G_P = AA, G_O = AA, P = 1, D = 1) = \Pr(G = AA, D = 1)^2 + 0.5\,\Pr(G = AA, D = 1)\Pr(G = Aa, D = 1) = b_7$$

where $G_P$ represents the genotype of the parent and $G_O$ the genotype of his/her offspring.
Finally,

$$\Pr(P = 1, D = 1)/f^2 = \sum_{i=1}^{7} \sum_{j=1}^{7} b_i b_j$$

Results

Figure 2 shows the dependence of GRR on q at various PAFs under three different modes of inheritance. We show a vertical line at q = 0.1 because many association studies do not test rarer variants. For a dominant gene with a PAF <10%, the GRRs are <2.0 at any q > 0.1. We have marked the coordinates of the breast cancer-susceptibility gene in Figure 1. BRCA1 mutations with a GRR of 10 and q = 0.001 explain a PAF of 1.8%. The novel prostate cancer-susceptibility locus, DG8S737, with a GRR of 1.77 × 1.77 = 3.13 and q = 0.078, confers a PAF of 11%. A comparison among the three modes of inheritance (from dominant to additive and to recessive) shows a gradual shifting of the curves towards higher GRRs for a fixed allele frequency.

Figure 3 assesses the statistical power to detect a significant GRR (significance level 0.05) in case–control studies using either 500 cases and controls or 2000 cases and controls. Under dominance, a study based on 500 cases and 500 controls reaches an 80% power (shown by the horizontal line) only for rare alleles (q < 0.1) when the PAF is 5 or 10%; larger studies based on 2000 cases and controls reach an 80% power with q = 0.1 for a PAF of 5% and with q = 0.3 for a PAF of 10%. The analysis of 2000 cases and 2000 controls does not provide reasonable power for a PAF of 1%. Using 500 cases and controls, the effect of BRCA1 mutations is detected with a power of 59% and the effect of DG8S737 with a power of 93%. Again, the curves for additive and recessive modes shift systematically to the right, indicating higher power at higher allele frequencies.

The relationship between allele frequency and FRR according to a fixed GRR is shown in Figure 4. Dominant genes result in increased FRRs at low allele frequencies; for example, a GRR of 10 results in a FRR of 2.0 for q = 0.05. The FRRs conferred by BRCA1 (1.08) and DG8S737 (1.04) are also represented in the figure.
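The power estimates reported for Figure 3 come from the simulation described in Methods. A minimal Python sketch of that procedure is given here under several stated assumptions: only the dominant model is covered, so genotypes collapse to carrier versus non-carrier; controls are taken to be unaffected individuals; the Wald test of the log odds ratio from the 2 × 2 table stands in for the logistic regression Wald test (the two coincide for a single binary covariate); and datasets with an empty cell are counted as non-significant. All function names are hypothetical, and the sketch is not the authors' R code.

```python
import math
import random


def carrier_probabilities(q, grr, k=0.05):
    """Carrier (Aa or AA) probabilities among cases and controls, dominant model."""
    carrier = 1 - (1 - q) ** 2             # Pr(carrier) under Hardy-Weinberg
    big_k = (1 - carrier) + carrier * grr  # K = k/f
    f = k / big_k                          # disease risk of aa individuals
    p_case = carrier * grr / big_k         # Pr(carrier | D = 1)
    # Assumption: controls are unaffected, so weight genotypes by (1 - risk).
    p_ctrl = carrier * (1 - grr * f) / (1 - k)
    return p_case, p_ctrl


def wald_significant(a, b, c, d, z_crit=1.959964):
    """Two-sided 5% Wald test of the log odds ratio for a 2x2 table.

    a, b: carrier / non-carrier cases; c, d: carrier / non-carrier controls.
    """
    if min(a, b, c, d) == 0:
        return False  # estimate undefined; counted as non-significant (assumption)
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return abs(log_or / se) > z_crit


def simulate_power(q, grr, n_pairs=500, n_sim=1000, k=0.05, seed=1):
    """Proportion of simulated case-control datasets with a significant effect."""
    rng = random.Random(seed)
    p_case, p_ctrl = carrier_probabilities(q, grr, k)
    hits = 0
    for _ in range(n_sim):
        # Draw carrier status from uniform random numbers, as with runif in R.
        a = sum(rng.random() < p_case for _ in range(n_pairs))
        c = sum(rng.random() < p_ctrl for _ in range(n_pairs))
        hits += wald_significant(a, n_pairs - a, c, n_pairs - c)
    return hits / n_sim


if __name__ == "__main__":
    print("power, q=0.1, GRR=2:", simulate_power(0.1, 2.0, n_sim=500))
    print("type I error, GRR=1:", simulate_power(0.1, 1.0, n_sim=500))
```

Setting GRR = 1 gives a quick sanity check, because the rejection rate should then stay near the nominal 5%. Exact power values depend on the seed, the number of simulated datasets and the handling of empty cells, so they need not match the published figures.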
Again, the curves and their maxima are right-shifted for additive and recessive genes. For an additive gene, a GRR of 10 and q = 0.1 confers a familial risk of 1.5; under recessive inheritance, GRR = 10 and q = 0.4 results in FRR = 1.5.

Table I shows PAFs and FRRs for various values of q and GRRs. PAFs increase continuously with the allele frequency. By contrast, the FRR increases with the allele frequency to a maximum, and decreases thereafter. For example, a dominant allele with a GRR of 3 shows the maximal FRR (1.15) at q = 0.1 and it confers a PAF of 27.5%; when q = 0.5, the FRR decreases to 1.04, but the PAF increases to 60%. Rare variants explain relatively more of FRR than of PAF (e.g. BRCA1 mutations: FRR = 1.08, PAF = 1.8%), in contrast to common variants (DG8S737: FRR = 1.04, PAF = 11%).

Table II shows PAFs and FRRs for some basic interaction models, considering up to 10 genes with q = 0.1 and GRR = 2. The assumed allele–allele interactions were dominant, additive or recessive; multiplicative and additive gene–gene interactions were considered. Multiplicative interactions of genes with dominant alleles cause the highest PAFs and FRRs; with 10 interacting genes, the PAF was 82.4% and the FRR was 1.65. Additive interactions of genes with dominant alleles and multiplicative/additive interactions of genes with additive alleles contributed relatively more to the PAF than to the FRR. The impact of recessive alleles on PAFs and FRRs was relatively small.

Fig. 2. Relationship between allele frequency (q) and GRR for a fixed PAF, according to the inheritance mode (A, dominant; B, additive; and C, recessive). The genetic parameters for DG8S737 were calculated assuming a multiplicative model.

Fig. 3. Relationship between allele frequency (q) and statistical power to detect a significant gene–disease association (Type I error = 5%) for a fixed PAF, according to the inheritance mode and the sample size. The genetic parameters for DG8S737 were calculated assuming a multiplicative model.

Fig. 4. Relationship between the allele frequency (q) and the FRR for a fixed GRR, according to the inheritance mode (A, dominant; B, additive; and C, recessive). The genetic parameters for DG8S737 were calculated assuming a multiplicative model.

Table I. PAF (%) and FRR according to allele frequency (q) and GRR

                   GRR 1.5      GRR 2        GRR 3        GRR 5        GRR 10       GRR 20       GRR 100
Allele freq. (q)   PAF   FRR    PAF   FRR    PAF   FRR    PAF   FRR    PAF   FRR    PAF   FRR    PAF   FRR
Dominant allele
  0.01             1.0   1.00   2.0   1.01   3.8   1.04   7.4   1.13   15.2  1.57   27.4  2.84   66.3  11.8
  0.05             4.6   1.01   8.9   1.04   16.3  1.12   28.1  1.36   46.7  1.99   64.9  2.90   90.6  4.70
  0.1              8.7   1.02   16.0  1.05   27.5  1.15   43.2  1.38   63.1  1.80   78.3  2.24   95.0  2.82
  0.2              15.3  1.02   26.5  1.06   41.9  1.14   59.0  1.28   76.4  1.46   87.2  1.60   97.3  1.75
  0.3              20.3  1.02   33.8  1.05   50.5  1.10   67.1  1.18   82.1  1.27   90.6  1.33   98.1  1.38
  0.5              27.3  1.01   42.9  1.02   60.0  1.04   75.0  1.06   87.1  1.08   93.4  1.10   98.7  1.11
Additive allele
  0.01             0.5   1.00   1.0   1.00   2.0   1.01   3.8   1.04   8.3   1.17   16.0  1.63   49.7  7.13
  0.05             2.4   1.00   4.8   1.01   9.1   1.04   16.7  1.13   31.0  1.46   48.7  2.13   83.2  4.29
  0.1              4.8   1.01   9.1   1.02   16.7  1.06   28.6  1.18   47.4  1.50   65.5  1.97   90.8  2.86
  0.2              9.1   1.01   16.7  1.03   28.6  1.08   44.4  1.20   64.3  1.41   79.2  1.63   95.2  1.91
  0.3              13.0  1.01   23.1  1.03   37.5  1.08   54.5  1.17   73.0  1.31   85.1  1.42   96.7  1.55
  0.5              20.0  1.01   33.3  1.03   50.0  1.06   66.7  1.11   81.8  1.17   90.5  1.20   98.0  1.24
Recessive allele
  0.01             0.0   1.00   0.0   1.00   0.0   1.00   0.0   1.00   0.1   1.00   0.2   1.00   1.0   1.01
  0.05             0.1   1.00   0.2   1.00   0.5   1.00   1.0   1.00   2.2   1.01   4.5   1.04   19.8  1.75
  0.1              0.5   1.00   1.0   1.00   2.0   1.00   3.8   1.01   8.3   1.06   16.0  1.23   49.7  3.23
  0.2              2.0   1.00   3.8   1.01   7.4   1.02   13.8  1.08   26.5  1.28   43.2  1.75   79.8  3.55
  0.3              4.3   1.00   8.3   1.02   15.3  1.05   26.5  1.16   44.8  1.47   63.1  1.93   89.9  2.89
  0.5              11.1  1.01   20.0  1.04   33.3  1.11   50.0  1.25   69.2  1.48   82.6  1.68   96.1  1.92

Table II. PAF (%) and FRR according to the number of interacting genes

                   Gene–gene interaction model
                   Multiplicative   Additive
Number of genes    PAF    FRR       PAF    FRR
Dominant allele/s
  1                16.0   1.05      16.0   1.05
  2                29.4   1.11      27.5   1.08
  3                40.7   1.16      36.3   1.09
  4                50.1   1.22      43.2   1.09
  5                58.1   1.29      48.7   1.10
  10               82.4   1.65      65.5   1.09
Additive allele/s
  1                9.1    1.02      9.1    1.02
  2                17.4   1.04      16.7   1.03
  3                24.9   1.06      23.1   1.04
  4                31.7   1.08      28.6   1.05
  5                37.9   1.10      33.3   1.05
  10               61.5   1.20      50.0   1.06
Recessive allele/s
  1                1.0    1.00      1.0    1.00
  2                2.0    1.00      2.0    1.00
  3                2.9    1.00      2.9    1.00
  4                3.9    1.00      3.9    1.00
  5                4.9    1.00      4.8    1.00
  10               9.5    1.01      9.1    1.01

The assumed parameters were: allele frequency, q = 0.1, GRR = 2.

Discussion

FRR is an indicator of heritability, assuming that the contribution of shared environmental risk factors to the familial aggregation of disease is small. FRRs for most types of cancer are 2.0 (16), and it is likely that the familial aggregation of cancer is mostly due to heritable causes (17,18). The FRR of colorectal cancer is 1.5, when known conditions, such as hereditary non-polyposis colorectal cancer (HNPCC) and familial adenomatous polyposis, are excluded (19,20). The combined PAF of the related high-penetrance genes (mismatch repair genes and APC) for colorectal cancer has been estimated to range from 1 to 3% in Western populations (21).

According to Figure 4, a FRR of 1.5 can be explained by a dominant gene (or the combined effect of many genes) of GRR = 10 with allele frequencies of 0.01 or 0.2; for additive or recessive genes with GRR = 10, the allele frequency should be 0.1 or 0.4, respectively. According to Table I, a dominant gene with the described parameters would show a PAF of 15% (q = 0.01) or 76% (q = 0.2), compared with a PAF of 47% for an additive gene and a PAF of 50% for a recessive gene.
Many candidate gene association studies focus on genes for which the minor allele frequency is 0.1 or higher because the likelihood of identifying a significant effect (statistical power) is higher for common alleles, assuming a constant GRR (22,23). In the HapMap Project, the minor allele frequency was restricted to >0.05 (24). However, assuming a constant PAF, the statistical power to detect the effect of rarer alleles is higher than that of common alleles, because rarer alleles would have higher GRRs (Figure 3). In fact, a case–control study based on 500 cases and 500 controls only has a reasonable power (>80%) to detect dominant and additive variants with PAFs over 10%, but a recessive gene could be identified even at a lower PAF. If 500 cases and 500 controls were used to investigate the effects of rare variants, the likelihood of detecting the effect of BRCA1 mutations would be 59%; the DG8S737 locus would be detected with a power of 93% (Figure 3).

BRCA1 was originally identified in linkage studies, but the effects of specific mutations were statistically significant in a case–control study based on 2000 breast cancer patients and 4000 controls from Poland (15), in agreement with the present data. The novel prostate cancer-susceptibility locus DG8S737 was mapped in 323 Icelandic families with a suggestive linkage signal (lod score 2.11) and confirmed in a case–control study on 869 cases and 596 controls. It was subsequently replicated in three other populations, one Swedish and two others, European and African American (12). The predicted power for an association study with 500 cases and 500 controls was 93%, in perfect agreement with the actual data. It should be pointed out that this locus was not detected in 1233 families collected by an international consortium (25).
The reasons for the discrepancies in the prostate cancer linkage results are not known, but one of them may be the genetic homogeneity of the Icelandic population.

Our data show that, in concert with the CDCV paradigm, common risk alleles confer high PAFs but modest FRRs. For example, it has been speculated that the five genes HRAS1, NAT2, GSTT1, TNF-alpha and MTHFR would explain 54–64% of the cases of colorectal cancer, assuming that the GRRs were replicated (11). According to a large association study, the two genes NAT2 and GSTM1 would explain 31% of cases of bladder cancer (26). All these variants showed small GRRs (<2.0) and thus, they would only marginally contribute to the familial risk. This is a dilemma of the CDCV paradigm: candidate gene studies may be able to identify genes conferring close to 100% of the PAF, but they may not explain the empirical FRRs. In order to explain FRRs, rare, high-penetrant genes need to be uncovered. However, the candidate gene studies for common alleles do not target this class of genes. Even the presently known susceptibility genes have a limited effect on FRRs (in breast cancer: BRCA1 1.08, Figure 4) because the mutant alleles are rare.

The true test for the CDCV hypothesis will come in diseases of high FRR, such as multiple sclerosis, for which no high-penetrant genes are known to date. The FRR for multiple sclerosis is about 8.0, which is almost impossible to explain by any number of low-risk genes, assuming that genes, and not environmental sharing through e.g. infective agents, explain the risk (27,28). Interacting combinations of common variants are expected to be rare and the associated GRRs could be high, thus bringing gene–gene interactions into the realm of rare Mendelian-type genes. There are numerous possibilities for such interactions, and we tested some basic models in Table II. Multiplicative gene–gene interactions increase FRRs but they also tend to increase PAF.
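The interaction models of Table II can be explored with a short Python sketch. It rests on assumptions that are not spelled out in this form in the text: all genes are unlinked and share the same q and GRR; under multiplicative gene–gene interaction the per-gene K = k/f and the per-gene FRR multiply across genes; under additive interaction the excess risks (K − 1) and the additive variances add. The function names are hypothetical.

```python
def locus_terms(q, grr, mode="dominant"):
    """Return (K, Va) for one locus: K = k/f and additive variance scaled by f^2."""
    het_risk = {
        "dominant": grr,
        "recessive": 1.0,
        "additive": (1.0 + grr) / 2.0,
    }[mode]
    # Hardy-Weinberg genotype frequencies weighted by relative risks (aa = 1).
    big_k = (1 - q) ** 2 + 2 * q * (1 - q) * het_risk + q ** 2 * grr
    # Average effect of the allele; matches the three Va formulas in Methods.
    alpha = q * (grr - het_risk) + (1 - q) * (het_risk - 1.0)
    v_a = 2 * q * (1 - q) * alpha ** 2
    return big_k, v_a


def interacting_genes(n, q=0.1, grr=2.0, mode="dominant", interaction="multiplicative"):
    """Return (PAF, FRR) for n identical, unlinked interacting genes."""
    k1, v_a = locus_terms(q, grr, mode)
    if interaction == "multiplicative":
        # Risks multiply across genes, so K and the parent-offspring FRR factorize.
        big_k = k1 ** n
        frr = (1.0 + 0.5 * v_a / k1 ** 2) ** n
    elif interaction == "additive":
        # Assumption: excess risks add, and additive variances add across genes.
        big_k = 1.0 + n * (k1 - 1.0)
        frr = 1.0 + 0.5 * n * v_a / big_k ** 2
    else:
        raise ValueError("interaction must be 'multiplicative' or 'additive'")
    return (big_k - 1.0) / big_k, frr


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10):
        paf_m, frr_m = interacting_genes(n, interaction="multiplicative")
        paf_a, frr_a = interacting_genes(n, interaction="additive")
        print(f"{n:2d} genes: mult PAF {100 * paf_m:.1f}% FRR {frr_m:.2f} | "
              f"add PAF {100 * paf_a:.1f}% FRR {frr_a:.2f}")
```

For dominant alleles with q = 0.1 and GRR = 2, the sketch gives a PAF of about 82.4% and an FRR of about 1.65 for 10 multiplicatively interacting genes, in line with the values quoted in Results. It also shows the pattern discussed here: the PAF approaches 100% much faster than the FRR rises.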
Little empirical data are available on gene–gene interactions, simply because the sample size requirements for even binary interactions become challenging, and they would vastly exceed those used in the present exercise (5,29,30).

Low GRRs pose problems, even if they are formally significant, because the carriers are not enriched among familial cases and, thus, they cannot be properly replicated on familial cases (31,32) and because it will be difficult to provide experimental support in their favor. Furthermore, their significance for individual clinical genetic counseling remains doubtful because of low risk. Testing of rarer alleles may help to find genes of higher risk but larger sample sizes are required. Another option is to use familial cases, which are likely to harbor variants of higher risk.

The present concepts have many useful applications in practical gene identification work. Before launching a study, it would be useful to check the empirical FRRs of the particular disease. To explain a high familial risk, such as 8.0 for multiple sclerosis, rare high-penetrant genes need to be found and linkage studies may be the suitable approach. For a low familial risk, such as 1.5 for colorectal cancer, association studies appear justified but sample sizes should also allow for inclusion of rare alleles. After the study, PAFs may be used to assess the relevance of the findings in terms of disease etiology. The PAF reflects the overall effect of the gene under study by taking into account possible interactions with unknown genes. The a posteriori calculation of the FRRs is also advantageous; there are examples from the early genotyping studies reporting such high GRRs for common alleles that the resulting FRRs would exceed the empirical FRRs (33).
Acknowledgements

The Family-Cancer Database was created by linking registries maintained by Statistics Sweden and the Swedish Cancer Register, and supported by the Deutsche Krebshilfe, the Swedish Cancer Society, the Swedish Council for Working Life and Social Research, the EU, LSHC-LT-2004-503465 and EU Food-CT-2005-016320.

Conflict of Interest Statement: None declared.

References

1. Lander,E. and Kruglyak,L. (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet., 11, 241–247.
2. Risch,N. (2000) Searching for genetic determinants in the new millennium. Nature, 405, 847–856.
3. Pritchard,J.K. and Cox,N.J. (2002) The allelic architecture of human disease genes: common disease–common variant . . . or not? Hum. Mol. Genet., 11, 2417–2423.
4. Terwilliger,J.D. and Weiss,K.M. (2003) Confounding, ascertainment bias, and the blind quest for a genetic ‘fountain of youth’. Ann. Med., 35, 532–544.
5. Wang,W.Y., Barratt,B.J., Clayton,D.G. and Todd,J.A. (2005) Genome-wide association studies: theoretical and practical concerns. Nat. Rev. Genet., 6, 109–118.
6. Terwilliger,J.D. and Hiekkalinna,T. (2006) An utter refutation of the ‘Fundamental Theorem of the HapMap’. Eur. J. Hum. Genet., 14, 426–437.
7. Wacholder,S., Chanock,S., Garcia-Closas,M., El Ghormli,L. and Rothman,N. (2004) Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J. Natl Cancer Inst., 96, 434–442.
8. Pharoah,P.D., Dunning,A.M., Ponder,B.A. and Easton,D.F. (2004) Association studies for finding cancer-susceptibility genetic variants. Nat. Rev. Cancer, 4, 850–860.
9. Hemminki,K., Lorenzo Bermejo,J. and Försti,A. (2007) Human disease: heritable and environmental aetiology. Nat. Rev. Cancer, in press.
10. Hemminki,K., Försti,A. and Lorenzo Bermejo,J. (2005) Single nucleotide polymorphisms (SNPs) are inherited from parents and they measure heritable events. Carcinogenesis, 4, 2.
11. Yang,Q., Khoury,M., Friedman,J., Little,J. and Flanders,W. (2005) How many genes underlie the occurrence of common complex diseases in the population? Int. J. Epidemiol., 34, 1129–1137.
12. Amundadottir,L.T., Sulem,P., Gudmundsson,J. et al. (2006) A common variant associated with prostate cancer in European and African populations. Nat. Genet., 38, 652–658.
13. James,J. (1971) Frequency in relatives for an all-or-none trait. Ann. Hum. Genet., 35, 47–49.
14. Risch,N. (1990) Linkage strategies for genetically complex traits. I. Multilocus models. Am. J. Hum. Genet., 46, 222–228.
15. Gorski,B., Cybulski,C., Huzarski,T. et al. (2005) Breast cancer predisposing alleles in Poland. Breast Cancer Res. Treat., 92, 19–24.
16. Hemminki,K., Li,X. and Czene,K. (2004) Familial risk of cancer: data for clinical counseling and cancer genetics. Int. J. Cancer, 108, 109–114.
17. Hopper,J.L. and Carlin,J.B. (1992) Familial aggregation of a disease consequent upon correlation between relatives in a risk factor measured on a continuous scale. Am. J. Epidemiol., 136, 1138–1147.
18. Lorenzo Bermejo,J. and Hemminki,K. (2005) Familial lung cancer and aggregation of smoking habits: a simulation of the effect of shared environmental factors on the familial risk of cancer. Cancer Epidemiol. Biomarkers Prev., 14, 1738–1740.
19. Johns,L.E. and Houlston,R.S. (2001) A systematic review and meta-analysis of familial colorectal cancer risk. Am. J. Gastroenterol., 96, 2992–3003.
20. Hemminki,K., Granstrom,C. and Chen,B. (2005) The Swedish Family-Cancer: update, application to colorectal cancer and clinical relevance. Heredit. Cancer Clin. Pract., 3, 7–18.
21. De la Chapelle,A. (2005) The incidence of Lynch syndrome. Fam. Cancer, 4, 233–237.
22. Hemminki,K., Försti,A. and Lorenzo Bermejo,J. (2006) Gene–environment interactions in cancer: do they exist? Ann. NY Acad. Sci., in press.
23. Wang,W.Y.S., Cordell,H.J. and Todd,J.A. (2003) Association mapping of complex diseases in linked regions: estimation of genetic effects and feasibility of testing rare variants. Genet. Epidemiol., 24, 36–42.
24. The International HapMap Consortium. (2005) A haplotype map of the human genome. Nature, 437, 1299–1320.
25. Xu,J., Dimitrov,L. and Chang,B.L. (2005) A combined genomewide linkage scan of 1,233 families for prostate cancer-susceptibility genes conducted by the international consortium for prostate cancer genetics. Am. J. Hum. Genet., 77, 219–229.
26. Garcia-Closas,M., Malats,N. and Silverman,D. (2005) NAT2 slow acetylation, GSTM1 null genotype, and risk of bladder cancer: results from the Spanish Bladder Cancer Study and meta-analyses. Lancet, 366, 649–659.
27. Hemminki,K., Li,X., Johansson,S., Sundquist,K. and Sundquist,J. (2006) Re. ‘Familial risk of multiple sclerosis: a nationwide cohort study’. Am. J. Epidemiol., 163, 873–874.
28. Nielsen,N.M., Westergaard,T., Rostgaard,K., Frisch,M., Hjalgrim,H., Wohlfahrt,J., Koch-Henriksen,N. and Melbye,M. (2005) Familial risk of multiple sclerosis: a nationwide cohort study. Am. J. Epidemiol., 162, 1–5.
29. Brennan,P. (2002) Gene–environment interactions and aetiology of cancer: what does it mean and how can we measure it. Carcinogenesis, 23, 381–387.
30. Davey Smith,G., Ebrahim,S., Lewis,S., Hansell,A.L., Palmer,L.J. and Burton,P.R. (2005) Genetic epidemiology and public health: hope, hype, and future prospects. Lancet, 366, 1484–1498.
31. Antoniou,A.C. and Easton,D.F. (2003) Polygenic inheritance of breast cancer: implications for design of association studies. Genet. Epidemiol., 25, 190–202.
32. Houlston,R.S. and Peto,J. (2003) The future of association studies of common cancers. Hum. Genet., 112, 434–435.
33. Hemminki,K. and Lorenzo Bermejo,J. (2005) Relationships between familial risks of cancer and the effects of heritable genes and their SNP variants. Mutat. Res., 592, 6–17.
Received July 26, 2006; revised September 13, 2006; accepted September 15, 2006