American Journal of Epidemiology
Copyright © 1998 by The Johns Hopkins University School of Hygiene and Public Health. All rights reserved.
Vol. 148, No. 9. Printed in U.S.A.

Distinguishing the Effects of Maternal and Offspring Genes through Studies of "Case-Parent Triads"

Allen J. Wilcox,1 Clarice R. Weinberg,2 and Rolv Terje Lie3

A gene variant that increases disease risk will be overrepresented among diseased persons, even compared with their own biologic parents. This insight has led to tests based solely on the asymmetric distribution of a variant allele among cases and their parents (e.g., the transmission/disequilibrium test). Existing methods focus on effects of alleles that operate through the offspring genotype. Alleles can also operate through the mother's genotype, particularly for conditions such as birth defects that have their origins in fetal life. An allele working through the mother would have higher frequency in case-mothers than in case-fathers. The authors develop a log-linear method for estimating relative risks for alleles in the context of case-parent triads. This method is able to detect the effects of genes working through the offspring, the mother, or both. The authors assume Mendelian inheritance, but Hardy-Weinberg equilibrium is unnecessary. Their approach uses standard software, and simulations demonstrate satisfactory power and confidence interval coverage. This method is valid with a self-selected or hospital-based series of cases and helps to protect against misleading inference that can result when cases and controls are randomly sampled from a population not in Hardy-Weinberg equilibrium. Am J Epidemiol 1998;148:893-901.
abnormalities; alleles; case-control studies; epidemiologic methods; genetic markers; linkage disequilibrium; models, genetic; models, statistical

Low-penetrance genes may not produce a high absolute risk of disease, but their relative risk can be substantial. The availability of molecular genetic tools to study low-penetrance genes has created new possibilities for the estimation of gene relative risk. One ingenious approach requires no controls in the usual sense but relies instead on allele frequencies among diseased persons and their biologic parents. The key observation, made by Rubinstein et al. (1) in 1981, is that alleles associated with a given disease will occur more often in diseased persons than would have been expected based on the allele distribution in their parents. Various statistical methods have been proposed to use this observation for inferring increased risk, the best-known being the transmission/disequilibrium test (1-6). Although the terminology may be misleading, geneticists understand that these methods were not developed to detect distortions in transmission from parent to child but, rather, to detect the asymmetries in allele distribution that can occur among affected offspring and their parents.

The transmission/disequilibrium test and related methods for analysis of case-parent triads are useful for many diseases, but they have an inherent limitation for the study of diseases that originate during fetal life. The mother plays a crucial role as not only genetic parent but also fetal environment. Thus, a maternal allele may damage a fetus through effects on the intrauterine milieu, regardless of whether the allele is passed to the fetus. Consider the gene for the metabolic enzyme 5,10-methylenetetrahydrofolate reductase (MTHFR), which regulates a key step in the metabolism of folic acid. Low maternal intake of folic acid has been shown to increase the risk of neural tube defects in offspring (7). By extension, mothers who carry a variant of the MTHFR gene have been hypothesized to be at increased risk of bearing a child with neural tube defects (8). None of the current methods for analyzing case-parent triads would be able to detect this maternal genetic risk. We propose a simple method of analysis, based on genotypes of cases and their parents, that estimates relative risks associated with both the mother's and the offspring's genotypes.

Received for publication August 29, 1997, and accepted for publication March 25, 1998.
Abbreviation: MTHFR, 5,10-methylenetetrahydrofolate reductase.
1 Epidemiology Branch, NIEHS, Research Triangle Park, NC.
2 Biostatistics Branch, NIEHS, Research Triangle Park, NC.
3 Division for Medical Statistics, University of Bergen, Bergen, Norway.
Reprint requests to Dr. Allen Wilcox, Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709.

MATERIALS AND METHODS

Any disease or condition that has its origins in fetal life would be eligible for study using the method described here. No rare-disease assumption is necessary. For this illustration, we assume the condition under study is a type of birth defect. We assume that an allele suspected of increasing the risk of the birth defect has been identified. We designate this allele as the "variant."

Consider two possible biologic scenarios. In scenario A, the allele works through the fetal genotype to increase the susceptibility of the fetus to a particular birth defect. This risk occurs regardless of which parent transmits the allele. This is the mode of action most often assumed in genetic epidemiology studies. Under scenario A, the allele will be more frequent among case offspring than predicted by the allele's distribution in the parents. Statistical tests such as the transmission/disequilibrium test are designed specifically to detect this pattern (4).
Scenario B assumes that a variant allele increases the risk of birth defect when carried by the mother but does no further damage if inherited by the fetus. The allele therefore would not be present in the affected offspring in case-families more than would have been predicted based on the parents. Thus, tests that compare parental and offspring genotype (e.g., transmission/disequilibrium test) could not detect the influence of this maternal variant allele. However, a comparison of the two case parents would show an excess of the allele among mothers compared with fathers.

A mixed scenario involving both scenarios A and B may also occur. For example, if a maternal variant of the MTHFR gene disturbs the folic acid environment of the fetus, the effect might be compounded if the variant allele is also carried by the fetus. Therefore, it may be necessary to test simultaneously for the effects of the variant allele in both the mother and the offspring.

Assumptions and basic notation

We refer to the family grouping of a case and two genetic parents as the case-parent triad. Cases for whom genetic parents are not available would have to be excluded (e.g., cases who are adopted, conceived by artificial insemination or oocyte donation, or conceived by a man other than the father of record). We assume that the variant allele is transmitted by Mendelian inheritance (i.e., the parents' fertility and the survival of the fetus to diagnosis are unrelated to the genotype under study). Under this condition, children carry a random sample of the alleles of their parents. We allow the possibility of a "gene-dose" effect, such that fetuses who inherit two copies of the allele can be at higher risk than those carrying one copy. We designate the number of copies of the variant allele carried by the mother, father, and child as "M," "F," and "C." The number of copies can be zero, one, or two.
Only 15 of the 3³ (= 27) possible combinations of M, F, and C are genetically possible (as listed in table 1). The frequency distribution of case families among these 15 possible triads will provide the basis for analysis. We begin by describing this multinomial distribution for the child-parent triads in the population at large and then develop the distribution conditional on the child having the disease under study.

If there has been random mating (with respect to the allele under study) for several generations, the population will be in Hardy-Weinberg equilibrium with respect to that allele. Under such an assumption, the distribution of triads made up of random children and their parents can be described by simple polynomials in p (where p is the population prevalence of the variant allele) (9).

The Hardy-Weinberg equilibrium assumption may be invalid if the population is made up of a mixture of subpopulations with varying gene prevalences that preferentially mate with others from the same subpopulation (a "structured" population). This can be thought of as a "melting pot" that hasn't fully mixed. Structured populations can be a problem if the subpopulations also have different background risks of disease, with risk variations not causally related to the gene under study. Under these conditions, confounding can lead to a noncausal association between genotype and risk for genes inherited by the child (scenario A) or genes carried by the mother (scenario B).

TABLE 1. Frequencies in case-parent triads under scenario A (informative mating types are 2, 4, and 5)

Triad genotype (copies of the variant
in mother, father, and case (MFC))      Mating type    Theoretical frequency
222                                     1              R2 γ1
212                                     2              R2 γ2
211                                     2              R1 γ2
122                                     2              R2 γ2
121                                     2              R1 γ2
201                                     3              R1 γ3
021                                     3              R1 γ3
112                                     4              R2 γ4
111                                     4              2 R1 γ4
110                                     4              γ4
101                                     5              R1 γ5
100                                     5              γ5
011                                     5              R1 γ5
010                                     5              γ5
000                                     6              γ6
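The count of 15 genetically possible triads can be checked by brute force. The following is a minimal sketch, not part of the original paper: the helper name `mendel` and the `MATING_TYPE` lookup are illustrative, and the labels 1 (both parents homozygous for the variant) and 6 (neither parent carries it) are inferred from the statement that there are six mating types.

```python
from itertools import product

def mendel(m, f, c):
    """Pr[C = c | M = m, F = f] under Mendelian inheritance."""
    pm, pf = m / 2, f / 2  # probability that each parent transmits the variant
    return {0: (1 - pm) * (1 - pf),
            1: pm * (1 - pf) + (1 - pm) * pf,
            2: pm * pf}[c]

# Mating-type labels keyed by (M, F); labels 1 and 6 are inferred (assumption).
MATING_TYPE = {(2, 2): 1, (2, 1): 2, (1, 2): 2, (2, 0): 3, (0, 2): 3,
               (1, 1): 4, (1, 0): 5, (0, 1): 5, (0, 0): 6}

# Keep only the (M, F, C) combinations that Mendelian inheritance allows.
triads = [(m, f, c) for m, f, c in product((2, 1, 0), repeat=3)
          if mendel(m, f, c) > 0]

for m, f, c in triads:
    print(f"{m}{f}{c}  type {MATING_TYPE[(m, f)]}  "
          f"Pr[C|M,F] = {mendel(m, f, c):.2f}")
print(len(triads), "possible triads out of", 3 ** 3)
```

The transmission probability of 1/2 for the (1,1,1) triad, against 1/4 for (1,1,2) and (1,1,0), is the source of the coefficient 2 attached to the (1,1,1) cell in the scenario A frequencies.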
A strategy that geneticists have used to avoid such sources of bias in association studies (and assuming scenario A) is to condition the analysis on "mating types" (3). Mating type is defined by the number of copies of the allele carried by each of the two parents (e.g., table 1). For example, a couple falls into mating type 2 if one is homozygous and the other is heterozygous for the allele under study. For one biallelic gene, there are six possible mating types.

An implicit assumption in the stratification by mating type is that mating is symmetric with regard to genotype; for example, the frequency of heterozygous mothers married to homozygous variant fathers is the same as the frequency of heterozygous fathers married to homozygous variant mothers, and so on. In addition, we assume that, within each mating type, there is genetic symmetry in the probability that a couple agrees to be studied and in their fertility. We return to this assumption in the Discussion.

The probability (Pr) for the triad category (M,F,C) can be expressed as Pr[C | M,F] Pr[M,F]. Thus, the probability for each possible category is the product of the mating-type probability times a factor that depends only on Mendelian inheritance. To write this without constraining the relative frequencies of the mating types, we use six unspecified probabilities for the six mating types. The population distribution for the triad allele counts (M,F,C) then depends on the relative proportions for the six mating types, together with the algebra of simple Mendelian inheritance from parent to child. The sum across the 15 triad categories is constrained to be one. Under scenario A, analytical stratification on mating type prevents the bias that can result from population stratification or other violations of Hardy-Weinberg equilibrium.

Scenario A

Under scenario A, the risk of the defect is increased in the offspring carrying the variant allele.
In a study of case-parent triads, there will be an excess of triad combinations in which cases carry the allele. Such distortions are related to the gene relative risk in a mathematically simple way. Let R1 denote the relative risk with one copy of the variant allele (compared with no copies) and R2 denote the relative risk from carrying two copies. Let D denote the presence of the defect or disease in the offspring. The multinomial distribution for case-parent triads arises from an application of Bayes' theorem:

Pr[M,F,C | D] = Pr[D | M,F,C] Pr[C | M,F] Pr[M,F] / Pr[D]

In this way, the probability distribution for the triads is multiplicatively dependent on the relative risk for a particular triad genotype, times the probability of occurrence of that triad. Under scenario A, Pr[D | M,F,C] = Pr[D | C]. The frequency of a particular case-parent triad therefore depends on the parental mating type (for which, again, we assume Pr[M,F] = Pr[F,M]), Mendelian transmission of genes to the child, and the relative risk corresponding to the count, C. Table 1 shows the expected proportions, where the γ_j serve as the mating-type stratum parameters. Thus, the relative risks can be estimated directly from the frequency distribution of the case-parent triads.

Computation of relative risks R1 and R2

A maximum likelihood approach can be used to estimate the relative risks associated with the variant allele. The theoretical multinomial distribution in table 1 can be fitted to the observed counts of case-parent triads by maximum likelihood to yield estimates of R1 and R2. Confidence intervals for the relative risks can then be developed in the usual way based on the estimated standard errors. This model can be fit using standard software (e.g., GLIM (Numerical Algorithms Group, Downers Grove, Illinois) or SAS (SAS Institute, Inc., Cary, North Carolina)) for log-linear count data.
The model fully conditions on mating type and makes the appropriate comparisons within each informative mating type (even though irrelevant data are included from the noninformative mating types, such as the 000 triad). Under scenario A, the informative mating types are 2, 4, and 5. This Poisson model assumes that the expected count for each cell with mating type j is given by the following formula (where I{C=s} is a "dummy" indicator variable that is one if the case carries s copies of the variant allele and zero otherwise):

$$E[n_{MFC}] = \exp\left[\alpha_j + \beta_1 I_{\{C=1\}} + \beta_2 I_{\{C=2\}} + \ln(2)\, I_{\{M=F=C=1\}}\right]$$

The α_j (= ln(γ_j)) parameters serve only to stratify on mating type and, as in log-linear modeling generally, effectively constrain the fitted total count to equal the observed total count. They are not themselves of interest. The term ln(2) I{M=F=C=1} is included to allow for the "2" coefficient (see table 1) for the (1,1,1) outcome. (This arises because under Mendelian inheritance the child with one copy could have gotten that copy from the mother or from the father, with equal likelihood.) To fit this model in SAS (GENMOD procedure) or GLIM software, one needs to declare an "offset" defined as ln(2) I{M=F=C=1}, to allow this term to be included with its coefficient constrained to be 1. The relative risk R1 is estimated by exp(β1) and R2 is estimated by exp(β2). The goodness-of-fit can be assessed by the usual chi-squared statistic.

Under a dominance model (1 < R1 = R2), the model requires only a single β, the coefficient of the sum of the two dummy variables, treated as a single dummy. The model can easily be adapted to test the possibility of a recessive allele (1 = R1 < R2). Under the recessive model, only the second dummy, indicating that the fetus carries two copies of the variant allele, is predictive.
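As an illustration of the scenario A log-linear model, the sketch below fits a Poisson regression with six mating-type indicators, two dummies for the child's genotype, and the ln(2) offset for the (1,1,1) cell. It is not the authors' GLIM or SAS code: it uses a hand-written iteratively reweighted least squares loop in plain Python, and the names `fit_scenario_a`, `solve`, and the stratum values in `GAMMA` are hypothetical. The input is a set of exact expected counts generated with R1 = 2 and R2 = 3, so the fit should simply return those values.

```python
import math
from itertools import product

# Mating-type labels keyed by (M, F); labels 1 and 6 are inferred (assumption).
MATING_TYPE = {(2, 2): 1, (2, 1): 2, (1, 2): 2, (2, 0): 3, (0, 2): 3,
               (1, 1): 4, (1, 0): 5, (0, 1): 5, (0, 0): 6}

def possible(m, f, c):
    """True if a child with c copies can arise from parents with m and f copies."""
    from_m = {0: {0}, 1: {0, 1}, 2: {1}}[m]
    from_f = {0: {0}, 1: {0, 1}, 2: {1}}[f]
    return any(a + b == c for a in from_m for b in from_f)

TRIADS = [t for t in product((2, 1, 0), repeat=3) if possible(*t)]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_scenario_a(counts, n_iter=25):
    """Fit the scenario A model to {(M, F, C): count}; all 15 cells required.

    Returns the estimates (R1, R2) = (exp(beta1), exp(beta2)).
    """
    rows, offsets, ys = [], [], []
    for (m, f, c), y in counts.items():
        x = [0.0] * 8
        x[MATING_TYPE[(m, f)] - 1] = 1.0      # stratum (mating-type) indicator
        if c == 1:
            x[6] = 1.0                        # dummy for one copy in the child
        if c == 2:
            x[7] = 1.0                        # dummy for two copies in the child
        rows.append(x)
        offsets.append(math.log(2) if (m, f, c) == (1, 1, 1) else 0.0)
        ys.append(y)
    mu = [y + 0.5 for y in ys]                # starting values for fitted counts
    eta = [math.log(v) for v in mu]
    for _ in range(n_iter):                   # iteratively reweighted least squares
        z = [e - o + (y - v) / v for e, o, y, v in zip(eta, offsets, ys, mu)]
        xtwx = [[sum(v * r[i] * r[j] for r, v in zip(rows, mu))
                 for j in range(8)] for i in range(8)]
        xtwz = [sum(v * r[i] * zi for r, v, zi in zip(rows, mu, z))
                for i in range(8)]
        coef = solve(xtwx, xtwz)
        eta = [sum(b * xi for b, xi in zip(coef, r)) + o
               for r, o in zip(rows, offsets)]
        mu = [math.exp(e) for e in eta]
    return math.exp(coef[6]), math.exp(coef[7])

# Hypothetical stratum parameters and true relative risks for the illustration.
GAMMA = {1: 2.0, 2: 10.0, 3: 8.0, 4: 20.0, 5: 30.0, 6: 50.0}
RISK = {0: 1.0, 1: 2.0, 2: 3.0}
counts = {t: GAMMA[MATING_TYPE[t[:2]]] * RISK[t[2]] * (2 if t == (1, 1, 1) else 1)
          for t in TRIADS}

r1_hat, r2_hat = fit_scenario_a(counts)
print(round(r1_hat, 4), round(r2_hat, 4))
```

With real data, cells with zero counts can push estimates toward infinity, so a production analysis would more reasonably rely on an established GLM routine that accepts an offset term.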
When the variant allele is uncommon, the cells containing individuals who are homozygous for the variant may be too sparse to fit a recessive model.

Under scenario A, the theoretical distribution of triads among mating-type categories recalls the situation first proposed by Rubinstein et al. (1), and our analytical approach resembles the maximum likelihood methods developed by Schaid and Sommer (5), as we have discussed (9). The maximum likelihood estimates conditional on parental genotype are, in fact, identical to those developed here and also identical to what would be estimated under the Cox-like model proposed by Self et al. (10). However, our approach has an advantage over previous methods developed for scenario A in that it requires only standard software.

The likelihood ratio test based on the scenario A model can be viewed as a competitor for the transmission/disequilibrium test, because both test the same null hypothesis that there is no linkage disequilibrium between the allele under study and a disease gene. Both are insensitive to a possible noncausal association at the population level because of genetic population structure. These properties are well known for the transmission/disequilibrium test (11). If there is neither linkage nor association, then simple Mendelian inheritance determines the distributions within each mating type, and R1 and R2 must both equal one in table 1. It follows that the log-linear model offers a valid test of this joint null hypothesis.

Based on simulations reported elsewhere, the (2 df) likelihood ratio test based on the log-linear model provides better power than does the (1 df) transmission/disequilibrium test (9) under either a dominant or a recessive model for a candidate gene. The transmission/disequilibrium test offered better power only under the gene-dose scenario in which R2 = R1². Thus, even under scenario A, the proposed method offers advantages over standard methods.
Moreover, the log-linear approach readily generalizes to handle scenario B.

Scenario B

Under scenario B, the variant allele produces a birth defect through the maternal genotype rather than through the fetal genotype. In this situation, mothers who carry the variant allele will be overrepresented among case families, compared with a null model in which the maternal and paternal allele counts are symmetric within each mating type. As in scenario A, the asymmetric distribution among the case-parent triads permits the estimation of the allele relative risks, in this case risks associated with maternal alleles.

The expected frequencies of case-parent triads under scenario B are shown in table 2, where the relative risks are now denoted S1 and S2. As in scenario A, parameters can be estimated using maximum likelihood techniques under a classical log-linear model, and the goodness-of-fit can be assessed by the usual chi-squared statistic. Here the informative mating types are 2, 3, and 5, though all of the data can be used. The only modification to the log-linear model is that the two indicator variables now refer to the maternal genotype rather than to the fetal genotype. Again the genetic mechanism can be taken to be dominant, recessive, or neither.

TABLE 2. Frequencies in case-parent triads under scenario B (informative mating types are 2, 3, and 5)

Triad genotype (copies of the variant
in mother, father, and case (MFC))      Mating type    Theoretical frequency
222                                     1              S2 γ1
212                                     2              S2 γ2
211                                     2              S2 γ2
122                                     2              S1 γ2
121                                     2              S1 γ2
201                                     3              S2 γ3
021                                     3              γ3
112                                     4              S1 γ4
111                                     4              2 S1 γ4
110                                     4              S1 γ4
101                                     5              S1 γ5
100                                     5              S1 γ5
011                                     5              γ5
010                                     5              γ5
000                                     6              γ6

Either scenario A or scenario B

In the typical setting, one does not know a priori whether scenario A or scenario B applies. When both scenarios are possible, the preferred model would be a composite:

$$E[n_{MFC}] = \exp\left[\alpha_j + \beta_1 I_{\{C=1\}} + \beta_2 I_{\{C=2\}} + \ln(S_1)\, I_{\{M=1\}} + \ln(S_2)\, I_{\{M=2\}} + \ln(2)\, I_{\{M=F=C=1\}}\right]$$

In effect, this model allows simultaneously for effects of the fetal and the maternal genotype. One could also include interaction terms to allow for the possibility that the relative risk is greater or less than multiplicative when the variant allele is carried by both mother and fetus.

Using the model with both C (case) and M (mother), one can test for significant loss of fit when either one is omitted, using the likelihood ratio test. This allows a test of whether the case's genotype carries any predictive information once the maternal allele count has been accounted for, and vice versa. Two different tests could be envisioned: one that adjusts the child's contribution for a possible maternal contribution through scenario B, and one that tests for the child's contribution against a baseline of no genetic effects at all. One surprising feature of the likelihood-based approach is that these two tests are identical. The contributions of the mother and the child are completely orthogonal, in that the estimation of the maternal parameters (S1, S2) has no effect on the estimation of the child's parameters (R1, R2) or on their standard errors. Similarly, adjustment for the potential contribution of maternal genotype has no effect on the likelihood ratio test for the child's contribution to risk. This is true despite the correlation between the child's and mother's genotype in the population. This orthogonality arises because, under the multiplicative model, C and M are independent within each stratum defined by parental mating type. In a log-linear analysis that stratifies on parental mating type, there is a uniquely definable likelihood ratio test to assess the contribution of the child's (or mother's) genotype to risk.

Estimating risk in the presence of Hardy-Weinberg equilibrium

One advantage of the analysis described above is that no assumption of Hardy-Weinberg equilibrium is necessary. If conditions of Hardy-Weinberg equilibrium are plausible (e.g., in an ethnically homogeneous population), then estimates of risk can also be made by an alternative approach (9) that is also log-linear.
This approach can be followed using standard software, requires fewer parameters, and has the added feature of providing an estimate of the population prevalence of the allele. This is despite being based on only cases and their parents. However, Hardy-Weinberg equilibrium is a strong assumption, one that most investigators will not want to rely on in practice.

Simulations

We used simulations to explore practical aspects of analysis, using the NAG Fortran library (Numerical Algorithms Group) to generate 1,000 data sets for each of several parameter-value configurations. Each data set contained 100 case-parent triads. All simulations were based on a mixture of two subpopulations that differed in allele frequency and baseline risk. One subpopulation was 20 percent of the total population and the other was 80 percent. In the smaller group, the baseline disease risk was 0.05 (among those not carrying the variant allele), and the variant allele frequency was 0.30. The larger subpopulation had a baseline risk of 0.01 and a variant allele frequency of 0.10. Because we extracted only cases and their parents from the simulations, absolute values of the two baseline risks have no effect on the distributions of case-parent triads; only the ratio of the baseline risks is relevant. Each subpopulation was assumed to be in Hardy-Weinberg equilibrium.

The simulated population as a whole was not in Hardy-Weinberg equilibrium, and the genetic stratification would produce marked confounding of the allele effect under a conventional case-control approach. Suppose that affected babies are compared with unaffected babies and that there is no true effect of the variant allele on risk. A spurious "gene dose" effect will be evident, with an odds ratio of 1.6 for babies carrying one copy of the variant allele and 2.5 for those carrying two copies.
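The spurious odds ratios quoted above can be checked with a few lines of arithmetic. The sketch below assumes a case-control comparison of babies in the mixed population, Hardy-Weinberg genotype frequencies within each subpopulation, and no true effect of the allele; the function and variable names are illustrative only.

```python
def hwe(p):
    """Genotype frequencies for 0, 1, and 2 copies under Hardy-Weinberg equilibrium."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

# (share of total population, baseline risk, variant allele frequency), from the text
SUBPOPS = [(0.20, 0.05, 0.30), (0.80, 0.01, 0.10)]

cases = [0.0, 0.0, 0.0]      # probability mass of affected babies by genotype
controls = [0.0, 0.0, 0.0]   # probability mass of unaffected babies by genotype
for share, risk, p in SUBPOPS:
    for copies, freq in enumerate(hwe(p)):
        # No true allele effect: risk depends only on the subpopulation.
        cases[copies] += share * risk * freq
        controls[copies] += share * (1 - risk) * freq

def odds_ratio(copies):
    """Odds ratio for carrying `copies` variant alleles versus none."""
    return (cases[copies] / cases[0]) / (controls[copies] / controls[0])

or1, or2 = odds_ratio(1), odds_ratio(2)
print(f"OR for one copy: {or1:.2f}; OR for two copies: {or2:.2f}")
```

Under these assumptions the odds ratios come out at roughly 1.6 and 2.5, in line with the values given in the text, even though the allele has no causal effect.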
Similar bias would appear in a case-control comparison of mothers of affected babies compared with mothers of unaffected babies, carried out by an investigator concerned about scenario B types of mechanisms. The same gene-dose effect would be evident. In both designs, the investigator would be led astray by the presence of genetic population stratification. Bias due to the simulated population stratification completely disappears in our analyzed simulations of case-parent triads because of the conditioning on parental mating type.

We provide results for one set of simulations in which the variant allele raises the risk 2.5-fold through a dominant mechanism of action (i.e., in the presence of either one or two copies of the variant allele). When the allele is carried by the fetus, the relative risks are designated as R1 and R2, and when the allele is carried by the mother, the relative risks are S1 and S2. Under the dominance assumption, R1 = R2 and S1 = S2. (These equalities were not assumed in the subsequent analyses.)

Under these conditions, we generated three types of data: scenario A, in which the variant allele has its effect only when present in the offspring (R1 = R2 = 2.5, S1 = S2 = 1); scenario B, in which the variant allele has its effect only when present in the mother (R1 = R2 = 1, S1 = S2 = 2.5); and scenario A + B, in which the variant allele raises the risk equally whether carried by the offspring or the mother (R1 = R2 = 2.5, S1 = S2 = 2.5). One thousand independent replicates of 100 case-parent triads were generated for each of the three scenarios. Even though none of these scenarios actually involves four different risks, this would not be known in a real setting and, thus, we are obliged in the analysis to estimate each risk separately. We fitted full models to each data set, estimating all four relative risks (parameters R1, R2, S1, S2).
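To make the simulated design concrete, the sketch below computes the cell probabilities of case-parent triads in the two-subpopulation mixture under scenario A + B and draws one data set of 100 triads. This is not the authors' NAG Fortran code. It assumes random mating within each subpopulation and a multiplicative risk model (baseline risk times the child's relative risk times the mother's relative risk); all names, and the random seed, are hypothetical.

```python
import random
from itertools import product

def hwe(p):
    """Genotype frequencies for 0, 1, and 2 copies under Hardy-Weinberg equilibrium."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def mendel(m, f, c):
    """Pr[C = c | M = m, F = f] under Mendelian inheritance."""
    pm, pf = m / 2, f / 2
    return {0: (1 - pm) * (1 - pf),
            1: pm * (1 - pf) + (1 - pm) * pf,
            2: pm * pf}[c]

# (share of total population, baseline risk, variant allele frequency), from the text
SUBPOPS = [(0.20, 0.05, 0.30), (0.80, 0.01, 0.10)]

def case_triad_distribution(r, s):
    """Pr[M, F, C | D] for relative risks r[C] (child) and s[M] (mother)."""
    weights = {}
    for m, f, c in product(range(3), repeat=3):
        w = 0.0
        for share, risk, p in SUBPOPS:
            g = hwe(p)
            # Random mating within the subpopulation: Pr[M, F] = g[m] * g[f].
            w += share * risk * g[m] * g[f] * mendel(m, f, c) * r[c] * s[m]
        if w > 0:
            weights[(m, f, c)] = w
    total = sum(weights.values())
    return {triad: w / total for triad, w in weights.items()}

# Scenario A + B with a dominant 2.5-fold effect through child and mother.
R = {0: 1.0, 1: 2.5, 2: 2.5}
S = {0: 1.0, 1: 2.5, 2: 2.5}
dist = case_triad_distribution(R, S)

random.seed(1998)  # arbitrary seed, for reproducibility of the illustration
sample = random.choices(list(dist), weights=list(dist.values()), k=100)

# Within-mating-type ratios depend only on the relative risks, not on the
# population structure: 201 vs. 021 isolates S2, and 111 vs. 110 gives 2 * R1.
print(dist[(2, 0, 1)] / dist[(0, 2, 1)], dist[(1, 1, 1)] / dist[(1, 1, 0)])
```

The two printed ratios illustrate why conditioning on parental mating type removes the confounding: the subpopulation shares and baseline risks cancel within each mating type.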
We computed nominal 95 percent confidence limits (standard error based) for each data set and checked whether the true parameter value was within those limits.

RESULTS

Simulation results are provided in table 3. Estimates showed no evidence of bias under the null and (for this small sample size) a slight upward bias under alternatives. All observed coverage rates of confidence limits were consistent with the assumed 95 percent confidence level.

Using the same sets of data, we attempted to exclude an effect of either the child's or mother's genes by reducing the model by two parameters. Standard methods lead to χ² likelihood ratio tests with 2 df. Results are given in table 4. With relative risks of 2.5 and a sample size of 100, there was 79 percent power to detect the effect of the child's variant allele under scenario A and 79 percent power to detect the effect of the mother's allele under scenario B. An allelic effect was misattributed to the mother only 6 percent of the time when the effect was through the child (scenario A) and to the child only 5 percent of the time when the effect was through the mother (scenario B), with both rates consistent with the nominal type I error rate. In the combined scenarios A + B, both allelic effects were correctly identified 69 percent of the time, with 98 percent power to reject the composite null hypothesis of no genetic effects. Collapsing R1 and R2 into a single R improved power further (results not shown).

TABLE 3. Estimates of relative risks produced by the variant allele in the offspring (scenario A), the mother (scenario B), or both, with estimates for each scenario based on simulated data with 1,000 independent samples of 100 case-parent triads

Scenario    Parameter    True value    Estimated value*      No. covered
A           R1           2.5           2.57 (2.52-2.63)†     963
A           R2           2.5           2.51 (2.42-2.60)      950
A           S1           1             0.99 (0.98-1.01)      951
A           S2           1             0.99 (0.95-1.03)      959
B‡          R1           1             0.99 (0.97-1.01)      961
B‡          R2           1             0.95 (0.92-0.99)      952
B‡          S1           2.5           2.57 (2.51-2.63)      954
B‡          S2           2.5           2.67 (2.55-2.79)      963
A + B§      R1           2.5           2.56 (2.51-2.61)      952
A + B§      R2           2.5           2.51 (2.44-2.59)      949
A + B§      S1           2.5           2.59 (2.53-2.64)      945
A + B§      S2           2.5           2.67 (2.58-2.77)      964

* Transformed mean parameter values from 1,000 replicates with coverage (number of 1,000 simulated studies) of nominal 95% confidence intervals. For coverage counts, two standard errors would be about 14 counts, based on a rate of 0.95, in 1,000 simulated studies.
† Numbers in parentheses, 95% confidence interval.
‡ In the estimates, one was excluded because of an infinite R1 estimate and six because of an infinite S1 estimate, due to small numbers in some simulated cells.
§ In estimates, two were excluded because of infinite S1 estimates, due to small numbers in some cells.

TABLE 4. Results (testing each of the two null hypotheses: R1 = R2 = 1 and S1 = S2 = 1) for three sets of simulated data, each with 1,000 independent simulated studies, each of which included 100 case-parent triads (see text for details)

                                    Effect of child's gene?
Effect of mother's gene?            Yes        No         Total
Scenario A
  Yes                               53         7          60
  No                                739        201        940
  Total                             792        208        1,000
Scenario B
  Yes                               39         748        787
  No                                12         201        213
  Total                             51         949        1,000
Scenario A + B
  Yes                               688        130        818
  No                                157        25         182
  Total                             845        155        1,000

DISCUSSION

Low-penetrance genes may affect offspring through alleles carried by the offspring and through maternal alleles acting via the intrauterine environment. Previous statistical approaches for the study of genetic risk in case families have focused on the alleles inherited by the offspring as the crucial determinant of risk (1-6). The possibility that maternal genes may play an independent role in the etiology of birth defects has been recognized (8). However, the only analytical approach proposed thus far requires data from both maternal grandparents as well as mothers (12), which can be impractical.
We propose an approach based on the genotypes of cases and their biologic parents. This approach estimates the separate effects of an allele carried by the mother or by the affected child. Adjustment for confounding factors is feasible, as is the exploration of gene-environment interaction. In principle, the same method could be used to test more than two alleles of a given gene, although the analysis becomes more complicated. In work reported separately (9), we show that the likelihood ratio test based on the log-linear model outperforms the transmission/disequilibrium test under either the recessive or dominant genetic model for scenario A, and we also extend the model to handle parental imprinting scenarios. These parameter estimates and tests of hypotheses must be interpreted with caution. Although the relative risk estimates are derived from a single log-linear modeling structure, there are some important distinctions between inference related to scenario A and inference related to scenario B. Tests of scenario A can be considered as simultaneously tests of association and linkage, that is, of linkage disequilibrium. For scenario A the informative asymmetry is discerned against a null background of simple Mendelian inheritance from parent to child. Symmetry of allele counts (mother vs. father) within the parental mating types is not actually needed for such tests. By contrast, tests of scenario B rely on the assumed symmetry of allele counts for mothers versus fathers within parental mating types. Thus, stronger assumptions are needed for estimation and testing under scenario B than under scenario A; moreover, rejection of the symmetry expected in the absence of maternal effects does not necessarily imply linkage of the variant allele to a genetic factor that confers risk through the mother but only implies association. 
Linkage could be strictly inferred only in a study that also included genotyping the baby's maternal grandparents, as proposed by Mitchell (12).

While our approach to the analysis of case-parent triad data is not biased under scenario A by asymmetry within parental mating type or by the presence of genetic population structure, the resulting associations may still not be directly causal. An allele that is associated with a disease outcome under this design may only be in linkage disequilibrium with a gene that is important. Thus, the method cannot be expected to distinguish between a genetic marker proximal to a disease gene and a disease gene with incomplete penetrance.

Evidence for maternal effects must be interpreted with particular caution. Suppose the population is structured, such that certain subpopulations have higher baseline risk for the disease and also higher prevalence of the allele. The variation in baseline risk across subpopulations could be due in part to unmeasured exposures or to deleterious genes unrelated to the gene under study. To the extent that there is some intermarriage across the distinct subpopulations (in contrast to what was assumed in our simulations), mothers from high-risk subpopulations may be overrepresented among case-parent triads (compared with fathers) because they bring to the marriage both their likelihood of carrying the allele and their deleterious exposure. On the other hand, if intermarriage is common enough to produce serious bias, the population structure itself should disappear within several generations, thus removing the source of the problem.

Another kind of distortion can be caused by a gene that affects metabolism of a certain exposure and, as a consequence, indirectly affects the propensity to be exposed. An example might be a gene that affects the metabolism of ethanol, where carriers of the gene may have a higher or lower alcohol intake on average.
However, such mechanisms simply serve to illustrate the point that genes can operate either directly or indirectly through their influence on exposures.

In the simulated samples of 100 case-parent triads, we found nearly 80 percent power to detect the effect (relative risk = 2.5) of an allele carried by the offspring. This power is similar to that of a case-control study under the same conditions, with 100 cases and 100 controls (13). Using the case-parent-triad approach, power was equally high for detecting an allele working through the maternal genotype. Furthermore, the power to distinguish alleles working through the mother or offspring was very high; complete misattribution of a real genetic effect to the wrong member of the family occurred only about 1 percent of the time. When the variant allele had effects through both the mother and the offspring, both effects were estimated with little bias.

The method that we have described can readily be generalized to incorporate possible effects of parental imprinting, where the effect of an inherited allele is different depending on the parental source of the allele. The fitting of the model is more difficult, as missing data methods must be applied, but simulations reveal very good power for detecting imprinting effects (9).

The case-parent-triad analysis requires assumptions that may not hold in particular situations. Disruption of Mendelian transmission (e.g., if homozygous carriers of the variant allele do not survive) could lead to situations where the apparent risk with two alleles (R2) is lower than the risk with one allele (R1). Fetal survival might also be related to the condition under study, for example, in fetuses affected with a neural tube defect. The log-linear method does not implicitly assume, however, that survival to clinical detection is unrelated to genotype. In fact, only a much weaker assumption is needed.
This can be shown by expanding the Bayesian expression used earlier. Because babies can be included in a study only if they survive to clinical detection, we need the probability of the joint event where the defect occurs and the fetus survives to birth. If we denote the survival of the fetus as "S," then:

Pr[M,F,C | D,S] = Pr[D,S | M,F,C] Pr[C | M,F] Pr[M,F] / Pr[D,S].

Now we can rewrite Pr[D,S | M,F,C] = Pr[D | M,F,C] Pr[S | D,M,F,C]. If the probability of survival among fetuses who have developed the defect is independent of the three allele counts, then Pr[S | D,M,F,C] = Pr[S | D]. The denominator factors in the same way, Pr[D,S] = Pr[D] Pr[S | D], so Pr[S | D] cancels between the numerator and denominator of the above expression, which removes any effect of possible differential survival. Thus, the distribution of alleles would still be the same among the case-parent triads based on surviving cases. In this way, the method is applicable even to life-threatening conditions.

Our strategy for distinguishing maternal from fetal genetic effects could apply to conditions of pregnancy (such as preeclampsia) or infancy (such as birth defects or developmental problems). This strategy might also be useful for the study of conditions among older children and even adults. There are unexplained associations between infant characteristics (such as birth weight) and adult diseases such as breast cancer and heart disease (14, 15), consistent with the possibility that genes that influence the development of the fetus may also have effects on the risk of adult diseases. A limitation in applying this method to diseases of adulthood is that parents of cases may be dead and their genetic information inaccessible.

Compared with a case-control design, the case-parent-triad design has both advantages and disadvantages. Cases and parents of cases are more likely to consent to genetic testing than are healthy controls or their parents.
Moreover, even in a population where genotypes for healthy controls can be obtained, case-control studies are still vulnerable to confounding due to population structure, as in our simulations. Furthermore, the study of case-parent triads can work well in settings where cases are highly selected (e.g., drawn from a clinic or support-group setting). Case-parent triads can produce valid estimates of relative risks with selected cases because the parents of cases provide inherently well-matched genetic controls. The stratification on parental mating type absorbs variations in recruitment rate that lead to overrepresentation of certain parental mating types (e.g., because of cultural factors). These advantages of the case-parent-triad design may make it the method of choice for preliminary studies of candidate alleles related to conditions of pregnancy or early life.

Case-control studies have the advantage of being able to detect nongenetic risk factors and to estimate the population prevalence of a variant allele. Case-control studies are also less susceptible to the potentially distorting effects of non-Mendelian transmission. Mendelian transmission can be tested directly in a case-control study, if genotypes are obtained for control parents for a subset (not necessarily random) of the controls. It may be possible to take advantage of the respective strengths of the two approaches by developing hybrid designs in which data from case parents and some control parents are collected as part of a case-control study.

The case-parent triad approach has broader use than has been appreciated. Until now, its application has been limited to the study of genes working through the case's genotype and has depended on specialized software. We show that case-parent triad data can be used to detect the effect of maternal genes, with an analytical approach that uses widely available software. Genetic relative risks can readily be estimated under the proposed method.
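
The survival argument made earlier in this Discussion can be checked numerically. The following is a minimal Python sketch and is not part of the original analysis. It assumes Hardy-Weinberg genotype frequencies and random mating purely for convenience (the log-linear method itself does not require them), an illustrative allele frequency of 0.3, an illustrative offspring relative risk of 2.5 for one or two copies, and no maternal effect. All function names and survival values are hypothetical.

```python
from itertools import product


def genotype_freq(p):
    """Genotype frequencies for 0, 1, 2 variant copies (Hardy-Weinberg assumed here)."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def child_prob(m, f, c):
    """Mendelian Pr[C = c | M = m, F = f] for allele counts m, f, c in {0, 1, 2}."""
    tm, tf = m / 2.0, f / 2.0  # chance each parent transmits the variant allele
    return {
        0: (1 - tm) * (1 - tf),
        1: tm * (1 - tf) + (1 - tm) * tf,
        2: tm * tf,
    }[c]


def triad_distribution(p, child_rr, mother_rr, survival):
    """Return Pr[M,F,C | D,S] as a dict keyed by (m, f, c).

    child_rr[c] * mother_rr[m] stands in for Pr[D | M,F,C] up to a constant,
    and survival(m, f, c) stands in for Pr[S | D,M,F,C].
    """
    geno = genotype_freq(p)
    weights = {}
    for m, f, c in product(range(3), repeat=3):
        pc = child_prob(m, f, c)
        if pc == 0.0:
            continue  # triad impossible under Mendelian inheritance
        risk = child_rr[c] * mother_rr[m]
        weights[(m, f, c)] = geno[m] * geno[f] * pc * risk * survival(m, f, c)
    total = sum(weights.values())  # proportional to Pr[D,S]
    return {cell: w / total for cell, w in weights.items()}


child_rr = [1.0, 2.5, 2.5]   # illustrative offspring relative risks
mother_rr = [1.0, 1.0, 1.0]  # no maternal effect in this example

# Everyone survives, survival depends only on D, survival depends on the child's genotype.
no_loss = triad_distribution(0.3, child_rr, mother_rr, lambda m, f, c: 1.0)
flat_loss = triad_distribution(0.3, child_rr, mother_rr, lambda m, f, c: 0.4)
geno_loss = triad_distribution(0.3, child_rr, mother_rr,
                               lambda m, f, c: [0.9, 0.6, 0.2][c])

max_diff_flat = max(abs(no_loss[k] - flat_loss[k]) for k in no_loss)
max_diff_geno = max(abs(no_loss[k] - geno_loss[k]) for k in no_loss)
print(max_diff_flat, max_diff_geno)
```

Under these assumptions, `max_diff_flat` is zero up to floating-point rounding: when survival among affected fetuses does not depend on the three allele counts, the triad distribution among surviving cases is unchanged. In contrast, `max_diff_geno` is clearly nonzero, which illustrates the kind of genotype-dependent survival that would violate the weaker assumption.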
ACKNOWLEDGMENTS

The authors are grateful to Drs. Norman Kaplan, Stephanie London, and David Umbach for helpful suggestions on earlier drafts of this paper.

REFERENCES

1. Rubinstein P, Walker M, Carpenter C, et al. Genetics of HLA disease associations: the use of the haplotype relative risk (HRR) and the "haplo-delta" (Dh) estimates in juvenile diabetes from three racial groups. (Abstract). Hum Immunol 1981;3:384.
2. Falk CT, Rubinstein P. Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Am J Hum Genet 1987;51:227-33.
3. Schaid D, Sommer S. Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am J Hum Genet 1993;53:1114-26.
4. Spielman R, McGinnis R, Ewens W. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506-16.
5. Schaid D, Sommer S. Comparison of statistics for candidate-gene association studies using cases and parents. Am J Hum Genet 1994;55:402-9.
6. Spielman R, Ewens W. Invited editorial: the TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 1996;59:983-9.
7. Oakley G. Folic acid-preventable spina bifida and anencephaly. JAMA 1993;269:1292-3.
8. van der Put NM, Steegers-Theunissen RP, Frosst P, et al. Mutated methylenetetrahydrofolate reductase as a risk factor for spina bifida. Lancet 1995;346:1070-1.
9. Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent-triad data: assessing effects of disease genes that act directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 1998;62:969-78.
10. Self S, Longton G, Kopecky K, et al. On estimating HLA-disease association with application to a study of aplastic anemia. Biometrics 1991;47:53-61.
11. Ewens W, Spielman R. The transmission/disequilibrium test: history, subdivision, and admixture. Am J Hum Genet 1995;57:455-64.
12. Mitchell L. Differentiating between fetal and maternal genotypic effects, using the transmission test for linkage disequilibrium. (Letter). Am J Hum Genet 1997;60:1006-7.
13. Schlesselman JJ. Sample size requirements in cohort and case-control studies of disease. Am J Epidemiol 1974;99:381-4.
14. Barker DJ. Maternal and fetal origins of coronary heart disease. J R Coll Physicians Lond 1994;28:544-51.
15. Michels K, Trichopoulos D, Robins J, et al. Birthweight as a risk factor for breast cancer. Lancet 1996;348:1542-6.