American Journal of Epidemiology
Copyright © 1998 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 148, No. 9
Printed in U.S.A.
Distinguishing the Effects of Maternal and Offspring Genes through Studies
of "Case-Parent Triads"
Allen J. Wilcox,1 Clarice R. Weinberg,2 and Rolv Terje Lie3
A gene variant that increases disease risk will be overrepresented among diseased persons, even compared
with their own biologic parents. This insight has led to tests based solely on the asymmetric distribution of a
variant allele among cases and their parents (e.g., the transmission/disequilibrium test). Existing methods
focus on effects of alleles that operate through the offspring genotype. Alleles can also operate through the
mother's genotype, particularly for conditions such as birth defects that have their origins in fetal life. An allele
working through the mother would have higher frequency in case-mothers than in case-fathers. The authors
develop a log-linear method for estimating relative risks for alleles in the context of case-parent triads. This
method is able to detect the effects of genes working through the offspring, the mother, or both. The authors
assume Mendelian inheritance, but Hardy-Weinberg equilibrium is unnecessary. Their approach uses standard
software, and simulations demonstrate satisfactory power and confidence interval coverage. This method is
valid with a self-selected or hospital-based series of cases and helps to protect against misleading inference
that can result when cases and controls are randomly sampled from a population not in Hardy-Weinberg
equilibrium. Am J Epidemiol 1998;148:893-901.
abnormalities; alleles; case-control studies; epidemiologic methods; genetic markers; linkage
disequilibrium; models, genetic; models, statistical
Low-penetrance genes may not produce a high absolute risk of disease, but their relative risk can be
substantial. The availability of molecular genetic tools
to study low-penetrance genes has created new possibilities for the estimation of gene relative risk. One
ingenious approach requires no controls in the usual
sense but relies instead on allele frequencies among
diseased persons and their biologic parents. The key
observation, made by Rubinstein et al. (1) in 1981, is
that alleles associated with a given disease will occur
more often in diseased persons than would have been
expected based on the allele distribution in their parents.
Various statistical methods have been proposed to
use this observation for inferring increased risk, the
best-known being the transmission/disequilibrium test
(1-6). Although the terminology may be misleading,
geneticists understand that these methods were not
developed to detect distortions in transmission from
parent to child but, rather, to detect the asymmetries in
allele distribution that can occur among affected offspring and their parents.
The transmission/disequilibrium test and related
methods for analysis of case-parent triads are useful
for many diseases, but they have an inherent limitation
for the study of diseases that originate during fetal life.
The mother plays a crucial role as not only genetic
parent but also fetal environment. Thus, a maternal
allele may damage a fetus through effects on the
intrauterine milieu, regardless of whether the allele is
passed to the fetus. Consider the gene for the metabolic enzyme 5,10-methylenetetrahydrofolate reductase (MTHFR), which regulates a key step in the
metabolism of folic acid. Low maternal intake of folic
acid has been shown to increase the risk of neural tube
defects in offspring (7). By extension, mothers who
carry a variant of the MTHFR gene have been hypothesized to be at increased risk of bearing a child with
neural tube defects (8). None of the current methods
for analyzing case-parent triads would be able to detect this maternal genetic risk. We propose a simple
method of analysis, based on genotypes of cases and
their parents, that estimates relative risks associated
with both the mother's and the offspring's genotypes.
Received for publication August 29, 1997, and accepted for
publication March 25, 1998.
Abbreviation: MTHFR, 5,10-methylenetetrahydrofolate reductase.
1 Epidemiology Branch, NIEHS, Research Triangle Park, NC.
2 Biostatistics Branch, NIEHS, Research Triangle Park, NC.
3 Division for Medical Statistics, University of Bergen, Bergen, Norway.
Reprint requests to Dr. Allen Wilcox, Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle
Park, NC 27709.
MATERIALS AND METHODS
Any disease or condition that has its origins in fetal
life would be eligible for study using the method
described here. No rare-disease assumption is necessary. For this illustration, we assume the condition
under study is a type of birth defect. We assume that
an allele suspected of increasing the risk of the birth
defect has been identified. We designate this allele as
the "variant."
Consider two possible biologic scenarios. In scenario A, the allele works through the fetal genotype to
increase the susceptibility of the fetus to a particular
birth defect. This risk occurs regardless of which parent transmits the allele. This is the mode of action
most often assumed in genetic epidemiology studies.
Under scenario A, the allele will be more frequent
among case offspring than predicted by the allele's
distribution in the parents. Statistical tests such as the
transmission/disequilibrium test are designed specifically to detect this pattern (4).
Scenario B assumes that a variant allele increases
the risk of birth defect when carried by the mother but
does no further damage if inherited by the fetus. The
allele therefore would not be present in the affected
offspring in case-families more than would have been
predicted based on the parents. Thus, tests that compare parental and offspring genotype (e.g., transmission/disequilibrium test) could not detect the influence
of this maternal variant allele. However, a comparison
of the two case parents would show an excess of the
allele among mothers compared with fathers.
A mixed scenario involving both scenarios A and B
may also occur. For example, if a maternal variant of
the MTHFR gene disturbs the folic acid environment
of the fetus, the effect might be compounded if the
variant allele is also carried by the fetus. Therefore, it
may be necessary to test simultaneously for the effects of
the variant allele in both the mother and the offspring.
Assumptions and basic notation
We refer to the family grouping of a case and two
genetic parents as the case-parent triad. Cases for
whom genetic parents are not available would have to
be excluded (e.g., cases who are adopted, conceived
by artificial insemination or oocyte donation, or
conceived by a man other than the father of record).
We assume that the variant allele is transmitted by
Mendelian inheritance (i.e., the parents' fertility and
the survival of the fetus to diagnosis are unrelated to
the genotype under study). Under this condition, children carry a random sample of the alleles of their
parents. We allow the possibility of a "gene-dose" effect,
such that fetuses who inherit two copies of the allele can
be at higher risk than those carrying one copy.
We designate the number of copies of the variant
allele carried by the mother, father, and child as "M,"
"F," and "C." The number of copies can be zero, one,
or two. Only 15 of the 3^3 (= 27) possible combinations
of M, F, and C are genetically possible (as listed in
table 1). The frequency distribution of case families
among these 15 possible triads will provide the basis
for analysis. We begin by describing this multinomial
distribution for the child-parent triads in the population at large and then develop the distribution conditional on the child having the disease under study.
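The count of 15 feasible triads can be checked by brute force. The following Python sketch (the function and variable names are illustrative, not from the paper) enumerates every (M, F, C) combination that Mendelian transmission allows, assuming each parent passes on exactly one of his or her two alleles.

```python
from itertools import product

def child_counts(m, f):
    """Variant-allele counts a child can inherit from parents with m and f copies."""
    # A parent with 0 copies transmits 0, with 2 copies transmits 1,
    # and a heterozygous parent transmits either 0 or 1.
    transmit = {0: (0,), 1: (0, 1), 2: (1,)}
    return sorted({a + b for a in transmit[m] for b in transmit[f]})

# All genetically possible (M, F, C) triads.
triads = [(m, f, c)
          for m, f in product(range(3), repeat=2)
          for c in child_counts(m, f)]

print(len(triads))  # 15
```

Combinations such as (2, 2, 0) never appear, because two homozygous variant parents cannot have a child with no copies.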
If there has been random mating (with respect to the
allele under study) for several generations, the population will be in Hardy-Weinberg equilibrium with
respect to that allele. Under such an assumption, the
distribution of triads made up of random children and
their parents can be described by simple polynomials
in p (where p is the population prevalence of the
variant allele) (9).
The Hardy-Weinberg equilibrium assumption may
be invalid if the population is made up of a mixture of
subpopulations with varying gene prevalences that
preferentially mate with others from the same subpopulation (a "structured" population). This can be
thought of as a "melting pot" that hasn't fully mixed.
Structured populations can be a problem if the subpopulations also have different background risks of
disease, with risk variations not causally related to the
gene under study. Under these conditions, confounding can lead to a noncausal association between genotype and risk for genes inherited by the child (scenario
A) or genes carried by the mother (scenario B).
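A small numerical sketch shows why a structured population departs from Hardy-Weinberg equilibrium. It assumes two subpopulations that are each in equilibrium and do not intermarry; the mixing shares (20 and 80 percent) and allele frequencies (0.30 and 0.10) are the ones used later in the Simulations section, and all names are illustrative.

```python
def hwe(p):
    """Genotype frequencies (0, 1, 2 copies) under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]

# (share of total population, variant allele frequency)
subpops = [(0.20, 0.30), (0.80, 0.10)]

# Genotype frequencies in the pooled population.
mixed = [sum(share * hwe(p)[g] for share, p in subpops) for g in range(3)]

# Allele frequency in the pooled population, and what Hardy-Weinberg
# equilibrium would predict for a population with that allele frequency.
p_mix = mixed[1] / 2.0 + mixed[2]
expected = hwe(p_mix)

print([round(x, 4) for x in mixed])
print([round(x, 4) for x in expected])
```

The pooled population carries more homozygotes and fewer heterozygotes than equilibrium predicts for its overall allele frequency of 0.14, which is the signature of an incompletely mixed population.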
TABLE 1. Frequencies in case-parent triads under scenario A (informative mating types are 2, 4, and 5). Triad genotype is the number of copies of the variant in mother, father, and case (MFC).

Triad genotype (MFC)   Mating type   Theoretical frequency
222                    1             R2 μ1
212                    2             R2 μ2
211                    2             R1 μ2
122                    2             R2 μ2
121                    2             R1 μ2
201                    3             R1 μ3
021                    3             R1 μ3
112                    4             R2 μ4
111                    4             2 R1 μ4
110                    4             μ4
101                    5             R1 μ5
100                    5             μ5
011                    5             R1 μ5
010                    5             μ5
000                    6             μ6

A strategy that geneticists have used to avoid such
sources of bias in association studies (and assuming
scenario A) is to condition the analysis on "mating
types" (3). Mating type is defined by the number of
copies of the allele carried by each of the two parents
(e.g., table 1). For example, a couple falls into mating
type 2 if one is homozygous and the other is heterozygous for the allele under study. For one biallelic gene,
there are six possible mating types. An implicit assumption in the stratification by mating type is that
mating is symmetric with regard to genotype; for
example, the frequency of heterozygous mothers married to homozygous variant fathers is the same as the
frequency of heterozygous fathers married to homozygous variant mothers, and so on. In addition, we assume that, within each mating type, there is genetic
symmetry in the probability that a couple agrees to be
studied and in their fertility. We return to this assumption in the Discussion.
The probability (Pr) for the triad category (M,F,C)
can be expressed as Pr[C | M,F] Pr[M,F]. Thus, the
probability for each possible category is the product of
the mating-type probability times a factor that depends
only on Mendelian inheritance. To write this without
constraining the relative frequencies of the mating
types, we use six unspecified probabilities for the six
mating types. The population distribution for the triad
allele counts (M,F,C) then depends on the relative
proportions for the six mating types, together with the
algebra of simple Mendelian inheritance from parent
to child. The sum across the 15 triad categories is
constrained to be one. Under scenario A, analytical
stratification on mating type prevents the bias that can
result from population stratification or other violations
of Hardy-Weinberg equilibrium.
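The factorization Pr[M,F,C] = Pr[C | M,F] Pr[M,F] can be written out directly. The sketch below is an illustration under stated assumptions: the six mating-type probabilities are hypothetical, mating types 1 and 6 are taken to be the two homozygous pairings (both parents with two copies, both with none), and the function names are not from the paper.

```python
from fractions import Fraction

# Probability that a parent with g copies transmits the variant allele.
TRANSMIT = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1)}

def pr_child(c, m, f):
    """Pr[C = c | M = m, F = f] under Mendelian inheritance."""
    a, b = TRANSMIT[m], TRANSMIT[f]
    return {0: (1 - a) * (1 - b), 1: a * (1 - b) + (1 - a) * b, 2: a * b}[c]

def mating_type(m, f):
    """Mating type (1-6) of an unordered parental pair."""
    key = tuple(sorted((m, f)))
    return {(2, 2): 1, (1, 2): 2, (0, 2): 3, (1, 1): 4, (0, 1): 5, (0, 0): 6}[key]

def triad_distribution(mating_probs):
    """Pr[M,F,C] from six mating-type probabilities, assuming Pr[M,F] = Pr[F,M]."""
    dist = {}
    for m in range(3):
        for f in range(3):
            # A mating type with m != f splits evenly over its two orderings.
            pr_mf = mating_probs[mating_type(m, f)] / (1 if m == f else 2)
            for c in range(3):
                p = pr_child(c, m, f) * pr_mf
                if p > 0:
                    dist[(m, f, c)] = p
    return dist

# Hypothetical mating-type probabilities (they sum to one).
mating_probs = {1: Fraction(1, 100), 2: Fraction(4, 100), 3: Fraction(5, 100),
                4: Fraction(10, 100), 5: Fraction(30, 100), 6: Fraction(50, 100)}
dist = triad_distribution(mating_probs)
print(len(dist), sum(dist.values()))  # 15 1
```

Because the mating-type probabilities are left unspecified, nothing in this construction requires Hardy-Weinberg equilibrium.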
Scenario A
Under scenario A, the risk of the defect is increased
in the offspring carrying the variant allele. In a study
of case-parent triads, there will be an excess of triad
combinations in which cases carry the allele. Such
distortions are related to the gene relative risk in a
mathematically simple way.
Let R1 denote the relative risk with one copy of the
variant allele (compared with no copies) and R2 denote
the relative risk from carrying two copies. Let D
denote the presence of the defect or disease in the
offspring. The multinomial distribution for case-parent
triads arises from an application of Bayes' theorem:
\[ \Pr[M,F,C \mid D] = \frac{\Pr[D \mid M,F,C] \, \Pr[C \mid M,F] \, \Pr[M,F]}{\Pr[D]} \]
In this way, the probability distribution for the triads
is multiplicatively dependent on the relative risk for a
particular triad genotype, times the probability of occurrence of that triad. Under scenario A, Pr[D|M,F,C]
= Pr[D|C]. The probability of occurrence of a particular triad depends on the parental mating type
(for which, again, we assume Pr[M,F] = Pr[F,M]),
Mendelian transmission of genes to the child, and the
relative risk corresponding to the count, C. Table 1
shows the expected proportions, where the μj serve as
the mating-type stratum parameters. Thus, the relative
risks can be estimated directly from the frequency
distribution of the case-parent triads.
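The way the relative risks distort the triad distribution can be seen in a short calculation. This sketch applies the Bayes' theorem expression above with Pr[D | M,F,C] proportional to the relative risk for C copies; the parental genotype distribution and the values R1 = 2 and R2 = 3 are hypothetical, chosen only to make the ratios easy to read.

```python
from fractions import Fraction

TRANSMIT = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1)}

def pr_child(c, m, f):
    """Pr[C = c | M = m, F = f] under Mendelian inheritance."""
    a, b = TRANSMIT[m], TRANSMIT[f]
    return {0: (1 - a) * (1 - b), 1: a * (1 - b) + (1 - a) * b, 2: a * b}[c]

def case_triads(pr_mf, r1, r2):
    """Pr[M,F,C | D] under scenario A, with Pr[D | C] proportional to R_C."""
    risk = {0: Fraction(1), 1: Fraction(r1), 2: Fraction(r2)}
    weight = {(m, f, c): risk[c] * pr_child(c, m, f) * p
              for (m, f), p in pr_mf.items() for c in range(3)}
    total = sum(weight.values())  # proportional to Pr[D]
    return {k: w / total for k, w in weight.items() if w > 0}

# Hypothetical symmetric distribution of ordered parental genotypes (M, F).
pr_mf = {(1, 1): Fraction(2, 10), (1, 0): Fraction(2, 10),
         (0, 1): Fraction(2, 10), (0, 0): Fraction(4, 10)}
dist = case_triads(pr_mf, r1=2, r2=3)

# Within mating type 4 (both parents heterozygous) the cells are in the
# ratio R2 : 2 R1 : 1 for C = 2, 1, 0.
print(dist[(1, 1, 2)] / dist[(1, 1, 0)])  # 3
print(dist[(1, 1, 1)] / dist[(1, 1, 0)])  # 4
```

The mating-type probabilities cancel out of these within-stratum ratios, which is why the relative risks can be estimated without knowing them.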
Computation of relative risks R1 and R2
A maximum likelihood approach can be used to
estimate the relative risks associated with the variant
allele. The theoretical multinomial distribution in table
1 can be fitted to the observed counts of case-parent
triads by maximum likelihood to yield estimates of R1
and R2. Confidence intervals for the relative risks can
then be developed in the usual way based on the
estimated standard errors. This model can be fit using
standard software (e.g., GLIM (Numerical Algorithms
Group, Downers Grove, Illinois) or SAS (SAS Institute, Inc., Cary, North Carolina)) for log-linear count
data. The model fully conditions on mating type and
makes the appropriate comparisons within each informative mating type (even though irrelevant data are
included from the noninformative mating types, such
as the 000 triad). Under scenario A, the informative
mating types are 2, 4, and 5.
This Poisson model assumes that the expected count
for each cell with mating type j is given by the
following formula (where I{C=s} is a "dummy" indicator variable that is one if the case carries s copies of
the variant allele and zero otherwise):
\[ \mathrm{E}[\text{count} \mid M,F,C] = \exp\left[\gamma_j + \beta_1 I_{\{C=1\}} + \beta_2 I_{\{C=2\}} + \ln(2)\, I_{\{M=F=C=1\}}\right] \]
The γj (= ln(μj)) parameters serve only to stratify on
mating type and, as in log-linear modeling generally,
effectively constrain the fitted total count to equal the
observed total count. They are not themselves of interest. The term ln(2) I{M=F=C=1} is included to allow
for the "2" coefficient (see table 1) for the (1,1,1)
outcome. (This arises because under Mendelian inheritance the child with one copy could have gotten that
copy from the mother or from the father, with equal
likelihood.) To fit this model in SAS (GENMOD
procedure) or GLIM software, one needs to declare an
"offset" defined as ln(2) I{M=F=C=1}, to allow this
term to be included with its coefficient constrained to
be 1. The relative risk R1 is estimated by exp(β1) and
R2 is estimated by exp(β2). The goodness-of-fit can be
assessed by the usual chi-squared statistic.
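For readers without SAS or GLIM, the fit can be sketched in plain Python. This is not the authors' implementation: instead of a Poisson regression routine, it maximizes the equivalent likelihood conditional on mating type (obtained by profiling out the stratum parameters) with Newton's method. The triad counts are hypothetical and were constructed to match R1 = 2 and R2 = 3 exactly; all names are illustrative.

```python
import math

# Mating type of each feasible (M, F, C) triad.
MATING_TYPE = {
    (2, 2, 2): 1,
    (2, 1, 2): 2, (2, 1, 1): 2, (1, 2, 2): 2, (1, 2, 1): 2,
    (2, 0, 1): 3, (0, 2, 1): 3,
    (1, 1, 2): 4, (1, 1, 1): 4, (1, 1, 0): 4,
    (1, 0, 1): 5, (1, 0, 0): 5, (0, 1, 1): 5, (0, 1, 0): 5,
    (0, 0, 0): 6,
}

def fit_scenario_a(counts, max_iter=50, tol=1e-12):
    """Estimate (R1, R2) from triad counts, conditioning on mating type."""
    beta = [0.0, 0.0]  # beta[s - 1] = ln(R_s)
    for _ in range(max_iter):
        grad = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for j in range(1, 7):
            cells = [k for k, t in MATING_TYPE.items() if t == j]
            n_j = sum(counts.get(k, 0) for k in cells)
            # Relative fitted weight of each cell: exp(beta_C + offset),
            # where the offset is ln(2) for the (1,1,1) cell only.
            weight = {k: (2.0 if k == (1, 1, 1) else 1.0)
                         * math.exp(beta[k[2] - 1] if k[2] else 0.0)
                      for k in cells}
            total = sum(weight.values())
            # Fitted share of the stratum with C = 1 and with C = 2.
            share = [sum(w for k, w in weight.items() if k[2] == s) / total
                     for s in (1, 2)]
            for a in range(2):
                n_a = sum(counts.get(k, 0) for k in cells if k[2] == a + 1)
                grad[a] += n_a - n_j * share[a]
                for b in range(2):
                    diag = share[a] if a == b else 0.0
                    info[a][b] += n_j * (diag - share[a] * share[b])
        # Newton step: solve the 2 x 2 system info * step = grad.
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        step = [(info[1][1] * grad[0] - info[0][1] * grad[1]) / det,
                (info[0][0] * grad[1] - info[1][0] * grad[0]) / det]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if abs(step[0]) + abs(step[1]) < tol:
            break
    return math.exp(beta[0]), math.exp(beta[1])

# Hypothetical counts of case-parent triads.
counts = {(2, 2, 2): 6,
          (2, 1, 2): 12, (2, 1, 1): 8, (1, 2, 2): 12, (1, 2, 1): 8,
          (2, 0, 1): 4, (0, 2, 1): 4,
          (1, 1, 2): 15, (1, 1, 1): 20, (1, 1, 0): 5,
          (1, 0, 1): 20, (1, 0, 0): 10, (0, 1, 1): 20, (0, 1, 0): 10,
          (0, 0, 0): 30}
r1_hat, r2_hat = fit_scenario_a(counts)
print(round(r1_hat, 6), round(r2_hat, 6))
```

Mating types 1, 3, and 6 contribute nothing to the gradient here because the child's genotype does not vary within them, which matches the statement that only types 2, 4, and 5 are informative under scenario A. The sketch omits standard errors; they could be taken from the inverse of the final information matrix.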
Under a dominance model (1 < R1 = R2), the model
requires only a single β, the coefficient of the sum of
the two dummy variables, treated as a single dummy.
The model can easily be adapted to test the possibility
of a recessive allele (1 = R1 < R2). Under the recessive model, only the second dummy, indicating that
the fetus carries two copies of the variant allele, is
predictive. When the variant allele is uncommon, the
cells containing individuals who are homozygous for
the variant may be too sparse to fit a recessive model.
Under scenario A, the theoretical distribution of
triads among mating-type categories recalls the situation first proposed by Rubinstein et al. (1), and our
analytical approach resembles the maximum likelihood methods developed by Schaid and Sommer (5),
as we have discussed (9). The maximum likelihood
estimates conditional on parental genotype are, in fact,
identical to those developed here and also identical to
what would be estimated under the Cox-like model
proposed by Self et al. (10). However, our approach
has an advantage over previous methods developed for
scenario A in that it requires only standard software.
The likelihood ratio test based on the scenario A
model can be viewed as a competitor for the transmission/disequilibrium test, because both test the same
null hypothesis that there is no linkage disequilibrium
between the allele under study and a disease gene.
Both are insensitive to a possible noncausal association at the population level because of genetic population structure. These properties are well known for
the transmission/disequilibrium test (11). If there is
neither linkage nor association, then simple Mendelian
inheritance determines the distributions within each
mating type, and R1 and R2 must both equal one in
table 1. It follows that the log-linear model offers a
valid test of this joint null hypothesis. Based on simulations reported elsewhere, the (2 df) likelihood ratio
test based on the log-linear model provides better
power than does the (1 df) transmission/disequilibrium
test (9) under either a dominant or a recessive model
for a candidate gene. The transmission/disequilibrium
test offered better power only under the gene-dose
scenario in which R2 = (R1)^2. Thus, even under scenario
A, the proposed method offers advantages over standard methods. Moreover, the log-linear approach
readily generalizes to handle scenario B.
Scenario B
Under scenario B, the variant allele produces a birth
defect through the maternal genotype rather than
through the fetal genotype. In this situation, mothers
who carry the variant allele will be overrepresented
among case families, compared with a null model in
which the maternal and paternal allele counts are symmetric
within each mating type. As in scenario A, the
asymmetric distribution among the case-parent triads
permits the estimation of the allele relative risks, in
this case risks associated with maternal alleles. The
expected frequencies of case-parent triads under scenario B are shown in table 2, where the relative risks
are now denoted S1 and S2.
As in scenario A, parameters can be estimated using
maximum likelihood techniques under a classical log-linear model, and the goodness-of-fit can be assessed
by the usual chi-squared statistic. Here the informative
mating types are 2, 3, and 5, though all of the data can
be used. The only modification to the log-linear model
is that the two indicator variables now refer to the
maternal genotype rather than to the fetal genotype.
Again the genetic mechanism can be taken to be
dominant, recessive, or neither.
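The asymmetry that scenario B produces between mothers and fathers can be illustrated numerically. The sketch below assumes Pr[D | M,F,C] is proportional to the maternal relative risk alone; the parental genotype distribution and the values S1 = 2 and S2 = 4 are hypothetical, and the names are illustrative.

```python
from fractions import Fraction

TRANSMIT = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1)}

def pr_child(c, m, f):
    """Pr[C = c | M = m, F = f] under Mendelian inheritance."""
    a, b = TRANSMIT[m], TRANSMIT[f]
    return {0: (1 - a) * (1 - b), 1: a * (1 - b) + (1 - a) * b, 2: a * b}[c]

def case_triads_b(pr_mf, s1, s2):
    """Pr[M,F,C | D] under scenario B, with Pr[D | M] proportional to S_M."""
    risk = {0: Fraction(1), 1: Fraction(s1), 2: Fraction(s2)}
    weight = {(m, f, c): risk[m] * pr_child(c, m, f) * p
              for (m, f), p in pr_mf.items() for c in range(3)}
    total = sum(weight.values())
    return {k: w / total for k, w in weight.items() if w > 0}

# Hypothetical symmetric distribution of ordered parental genotypes (M, F).
pr_mf = {(2, 0): Fraction(1, 20), (0, 2): Fraction(1, 20),
         (1, 0): Fraction(2, 10), (0, 1): Fraction(2, 10),
         (0, 0): Fraction(5, 10)}
dist = case_triads_b(pr_mf, s1=2, s2=4)

# Mating type 5: heterozygous mothers versus heterozygous fathers.
het_mothers = dist[(1, 0, 1)] + dist[(1, 0, 0)]
het_fathers = dist[(0, 1, 1)] + dist[(0, 1, 0)]
print(het_mothers / het_fathers)          # 2
# Mating type 3: homozygous variant mothers versus fathers.
print(dist[(2, 0, 1)] / dist[(0, 2, 1)])  # 4
# The child's genotype within a parental pair stays Mendelian.
print(dist[(1, 0, 1)] / dist[(1, 0, 0)])  # 1
```

The last ratio shows why a test that compares offspring with parents has nothing to detect under scenario B, while the mother-to-father ratios recover S1 and S2.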
Either scenario A or scenario B
In the typical setting, one does not know a priori
whether scenario A or scenario B applies. When both
scenarios are possible, the preferred model would be a
composite:
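\[ \mathrm{E}[\text{count} \mid M,F,C] = \exp\left[\gamma_j + \beta_1 I_{\{C=1\}} + \beta_2 I_{\{C=2\}} + \alpha_1 I_{\{M=1\}} + \alpha_2 I_{\{M=2\}} + \ln(2)\, I_{\{M=F=C=1\}}\right] \]
where S1 = exp(α1) and S2 = exp(α2).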
In effect, this model allows simultaneously for effects
of the fetal and the maternal genotype. One could also
include interaction terms to allow for the possibility
that the relative risk is greater or less than multiplicative when the variant allele is carried by both mother
and fetus.

TABLE 2. Frequencies in case-parent triads under scenario B (informative mating types are 2, 3, and 5). Triad genotype is the number of copies of the variant in mother, father, and case (MFC).

Triad genotype (MFC)   Mating type   Theoretical frequency
222                    1             S2 μ1
212                    2             S2 μ2
211                    2             S2 μ2
122                    2             S1 μ2
121                    2             S1 μ2
201                    3             S2 μ3
021                    3             μ3
112                    4             S1 μ4
111                    4             2 S1 μ4
110                    4             S1 μ4
101                    5             S1 μ5
100                    5             S1 μ5
011                    5             μ5
010                    5             μ5
000                    6             μ6

Using the model with both C (case) and M (mother),
one can test for significant loss of fit when either one
is omitted, using the likelihood ratio test. This allows
a test of whether the case's genotype carries any
predictive information once the maternal allele count
has been accounted for, and vice versa.
Two different tests could be envisioned: one that
adjusts the child's contribution for a possible maternal
contribution through scenario B, and one that tests for
the child's contribution against a baseline of no genetic effects at all. One surprising feature of the likelihood-based approach is that these two tests are identical. The contributions of the mother and the child are
completely orthogonal, in that the estimation of the
maternal parameters (S1, S2) has no effect on the
estimation of the child's parameters (R1, R2) or on
their standard errors. Similarly, adjustment for the
potential contribution of maternal genotype has no
effect on the likelihood ratio test for the child's contribution to risk. This is true despite the correlation
between the child's and mother's genotype in the
population. This orthogonality arises because, under
the multiplicative model, C and M are independent
within each stratum defined by parental mating type.
In a log-linear analysis that stratifies on parental mating type, there is a uniquely definable likelihood ratio
test to assess the contribution of the child's (or mother's) genotype to risk.
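The independence of C and M within mating types can be verified exactly for any set of relative risks. The following sketch builds the case-triad distribution under the multiplicative model and checks, stratum by stratum, that the joint distribution of (M, C) equals the product of its margins. The relative risks and parental genotype probabilities are hypothetical, and the names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

TRANSMIT = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1)}

def pr_child(c, m, f):
    """Pr[C = c | M = m, F = f] under Mendelian inheritance."""
    a, b = TRANSMIT[m], TRANSMIT[f]
    return {0: (1 - a) * (1 - b), 1: a * (1 - b) + (1 - a) * b, 2: a * b}[c]

def case_triads(pr_mf, r, s):
    """Pr[M,F,C | D] with Pr[D | M,F,C] proportional to r[C] * s[M]."""
    weight = {(m, f, c): r[c] * s[m] * pr_child(c, m, f) * p
              for (m, f), p in pr_mf.items() for c in range(3)}
    total = sum(weight.values())
    return {k: w / total for k, w in weight.items() if w > 0}

def independent_within_mating_types(dist):
    """True if M and C are independent in every parental mating type."""
    strata = defaultdict(lambda: defaultdict(Fraction))
    for (m, f, c), p in dist.items():
        strata[tuple(sorted((m, f)))][(m, c)] += p
    for cells in strata.values():
        total = sum(cells.values())
        pm, pc = defaultdict(Fraction), defaultdict(Fraction)
        for (m, c), p in cells.items():
            pm[m] += p
            pc[c] += p
        # joint / total == (pm / total) * (pc / total), cross-multiplied.
        if any(cells.get((m, c), 0) * total != pm[m] * pc[c]
               for m in pm for c in pc):
            return False
    return True

# Hypothetical symmetric parental genotype probabilities (in hundredths).
hundredths = {(2, 2): 2, (2, 1): 4, (1, 2): 4, (2, 0): 3, (0, 2): 3,
              (1, 1): 14, (1, 0): 15, (0, 1): 15, (0, 0): 40}
pr_mf = {k: Fraction(v, 100) for k, v in hundredths.items()}
r = {0: Fraction(1), 1: Fraction(2), 2: Fraction(3)}      # child's relative risks
s = {0: Fraction(1), 1: Fraction(3, 2), 2: Fraction(4)}   # mother's relative risks
dist = case_triads(pr_mf, r, s)
print(independent_within_mating_types(dist))  # True
```

Exact rational arithmetic is used so that the equality check is not blurred by floating-point rounding.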
Estimating risk in the presence of Hardy-Weinberg equilibrium
One advantage of the analysis described above is
that no assumption of Hardy-Weinberg equilibrium is
necessary. If conditions of Hardy-Weinberg equilibrium are plausible (e.g., in an ethnically homogeneous
population), then estimates of risk can also be made by
an alternative approach (9) that is also log-linear. This
approach can be followed using standard software,
requires fewer parameters, and has the added feature
of providing an estimate of the population prevalence
of the allele. This is despite being based on only cases
and their parents. However, Hardy-Weinberg equilibrium is a strong assumption, one that most investigators will not want to rely on in practice.
Simulations
We used simulations to explore practical aspects of
analysis, using the NAG Fortran library (Numerical
Algorithms Group) to generate 1,000 data sets for each
of several parameter-value configurations. Each data
set contained 100 case-parent triads.
All simulations were based on a mixture of two
subpopulations that differed in allele frequency and
baseline risk. One subpopulation was 20 percent of the
total population and the other was 80 percent. In the
smaller group, the baseline disease risk was 0.05
(among those not carrying the variant allele), and the
variant allele frequency was 0.30. The larger subpopulation had a baseline risk of 0.01 and a variant allele
frequency of 0.10. Because we extracted only cases
and their parents from the simulations, absolute values
of the two baseline risks have no effect on the distributions of case-parent triads; only the ratio of the
baseline risks is relevant. Each subpopulation was
assumed to be in Hardy-Weinberg equilibrium.
The simulated population as a whole was not in
Hardy-Weinberg equilibrium, and the genetic stratification would produce marked confounding of the allele effect under a conventional case-control approach.
Suppose that affected babies are compared with unaffected babies and that there is no true effect of the
variant allele on risk. A spurious "gene dose" effect
will be evident, with an odds ratio of 1.6 for babies
carrying one copy of the variant allele and 2.5 for
those carrying two copies. Similar bias would appear
in a case-control comparison of mothers of affected
babies compared with mothers of unaffected babies,
carried out by an investigator concerned about scenario B types of mechanisms. The same gene-dose
effect would be evident. In both designs, the investigator would be led astray by the presence of genetic
population stratification. Bias due to the simulated
population stratification completely disappears in our
analyzed simulations of case-parent triads because of
the conditioning on parental mating type.
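The spurious odds ratios quoted above follow from the stated simulation parameters. This sketch pools the two subpopulations, assumes no true effect of the variant allele, and compares genotype frequencies in affected and unaffected babies; variable names are illustrative.

```python
def hwe(p):
    """Genotype frequencies (0, 1, 2 copies) under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]

# (share of population, baseline disease risk, variant allele frequency)
subpops = [(0.20, 0.05, 0.30),
           (0.80, 0.01, 0.10)]

cases = [0.0, 0.0, 0.0]     # joint probability of genotype and disease
controls = [0.0, 0.0, 0.0]  # joint probability of genotype and no disease
for share, risk, p in subpops:
    for g, freq in enumerate(hwe(p)):
        cases[g] += share * risk * freq          # risk does not depend on g
        controls[g] += share * (1.0 - risk) * freq

def odds_ratio(g):
    """Odds ratio for g copies versus no copies, cases against controls."""
    return (cases[g] / cases[0]) / (controls[g] / controls[0])

print(round(odds_ratio(1), 1), round(odds_ratio(2), 1))  # 1.6 2.5
```

Because each subpopulation is in Hardy-Weinberg equilibrium, mothers have the same genotype distribution as babies, so the same calculation gives the bias in a case-control comparison of mothers.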
We provide results for one set of simulations in
which the variant allele raises the risk 2.5-fold through
a dominant mechanism of action (i.e., in the presence
of either one or two copies of the variant allele). When
the allele is carried by the fetus, the relative risks are
designated as R1 and R2, and when the allele is carried
by the mother, the relative risks are S1 and S2. Under
the dominance assumption, R1 = R2 and S1 = S2.
(These equalities were not assumed in the subsequent
analyses.)
Under these conditions, we generated three types of
data: scenario A, in which the variant allele has its
effect only when present in the offspring (R1 = R2 =
2.5, S1 = S2 = 1); scenario B, in which the variant
allele has its effect only when present in the mother
(R1 = R2 = 1, S1 = S2 = 2.5); and scenario A + B,
in which the variant allele raises the risk equally
whether carried by the offspring or the mother (R1 =
R2 = 2.5, S1 = S2 = 2.5). One thousand independent
replicates of 100 case-parent triads were generated for
each of the three scenarios.
Even though none of these scenarios actually involves four different risks, this would not be known in
a real setting and, thus, we are obliged in the analysis
to estimate each risk separately. We fitted full models
to each data set, estimating all four relative risks
(parameters R1, R2, S1, S2). We computed nominal 95
percent confidence limits (standard error based) for
each data set and checked whether the true parameter
value was within those limits.
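One replicate of such a simulation can be sketched as follows. This is not the authors' NAG Fortran code: it uses Python's standard random module, draws triads directly from the exact case-triad distribution, and assumes random mating within each subpopulation, no intermarriage between subpopulations, and a multiplicative risk model. Function names and the seed are illustrative.

```python
import random
from collections import Counter
from itertools import product

TRANSMIT = {0: 0.0, 1: 0.5, 2: 1.0}

def hwe(p):
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]

def pr_child(c, m, f):
    """Pr[C = c | M = m, F = f] under Mendelian inheritance."""
    a, b = TRANSMIT[m], TRANSMIT[f]
    return ((1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b)[c]

def case_triad_probs(subpops, r, s):
    """Pr[M,F,C | D] for a mixture of subpopulations, each in equilibrium."""
    probs = {}
    for share, base_risk, p in subpops:
        g = hwe(p)
        for m, f, c in product(range(3), repeat=3):
            # Joint probability of subpopulation, triad genotype, and disease.
            w = share * base_risk * r[c] * s[m] * g[m] * g[f] * pr_child(c, m, f)
            if w > 0:
                probs[(m, f, c)] = probs.get((m, f, c), 0.0) + w
    total = sum(probs.values())
    return {k: w / total for k, w in probs.items()}

def simulate(probs, n_triads, rng):
    """Draw n_triads case-parent triads as (M, F, C) tuples."""
    cells = list(probs)
    return rng.choices(cells, weights=[probs[k] for k in cells], k=n_triads)

# (share, baseline risk, allele frequency), as described in the text.
subpops = [(0.20, 0.05, 0.30), (0.80, 0.01, 0.10)]
# Scenario A: R1 = R2 = 2.5 and S1 = S2 = 1, indexed by number of copies.
probs = case_triad_probs(subpops, r=(1.0, 2.5, 2.5), s=(1.0, 1.0, 1.0))

rng = random.Random(1)  # arbitrary seed for reproducibility
sample = simulate(probs, 100, rng)
print(Counter(sample).most_common(3))
```

Tallying each replicate with `Counter` gives the 15 cell counts that a log-linear fit would take as input; repeating the draw 1,000 times would mimic one of the simulated series.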
RESULTS
Simulation results are provided in table 3. Estimates
showed no evidence of bias under the null and (for this
small sample size) a slight upward bias under alternatives. All observed coverage rates of confidence limits
were consistent with the assumed 95 percent confidence level.
Using the same sets of data, we attempted to exclude
an effect of either the child's or mother's genes by
reducing the model by two parameters. Standard methods lead to chi-squared likelihood ratio tests with 2 df. Results
are given in table 4.
With relative risks of 2.5 and a sample size of 100,
there was 79 percent power to detect the effect of the
child's variant allele under scenario A and 79 percent
power to detect the effect of the mother's allele under
scenario B. An allelic effect was misattributed to the
mother only 6 percent of the time when the effect was
through the child (scenario A) and to the child only 5
percent of the time when the effect was through the
mother (scenario B), with both rates consistent with
the nominal type I error rate. In the combined scenarios A + B, both allelic effects were correctly identified 69 percent of the time, with 98 percent power to
reject the composite null hypothesis of no genetic
effects. Collapsing R1 and R2 into a single R improved
power further (results not shown).

TABLE 4. Results (testing each of the two null hypotheses: R1 = R2 = 1 and S1 = S2 = 1) for three sets of simulated data, each with 1,000 independent simulated studies, each of which included 100 case-parent triads (see text for details)

                              Effect of child's gene?
Effect of mother's gene?      Yes       No        Total
Scenario A
  Yes                          53        7           60
  No                          739      201          940
  Total                       792      208        1,000
Scenario B
  Yes                          39      748          787
  No                           12      201          213
  Total                        51      949        1,000
Scenario A + B
  Yes                         688      130          818
  No                          157       25          182
  Total                       845      155        1,000

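The percentages quoted in the Results text can be recovered from the cell counts of table 4. The sketch below only sums the relevant cells out of 1,000 simulated studies per scenario; the dictionary layout and function name are illustrative.

```python
# Cell counts from table 4, ordered as:
# (mother yes & child yes, mother yes & child no,
#  mother no & child yes,  mother no & child no)
table4 = {
    "A":   (53, 7, 739, 201),
    "B":   (39, 748, 12, 201),
    "A+B": (688, 130, 157, 25),
}

def summarize(cells):
    """Number of simulated studies (out of the total) rejecting each null."""
    both, mother_only, child_only, neither = cells
    total = sum(cells)
    return {
        "child": both + child_only,    # child's effect detected
        "mother": both + mother_only,  # mother's effect detected
        "both": both,                  # both effects detected
        "either": total - neither,     # at least one effect detected
        "total": total,
    }

for scenario, cells in table4.items():
    print(scenario, summarize(cells))
```

Under scenario A, 792 of 1,000 studies detect the child's effect (79 percent) and 60 attribute an effect to the mother (6 percent). Under scenario A + B, 688 detect both effects (69 percent) and 975 detect at least one.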
DISCUSSION
Low-penetrance genes may affect offspring through
alleles carried by the offspring and through maternal
alleles acting via the intrauterine environment. Previous statistical approaches for the study of genetic risk
in case families have focused on the alleles inherited
by the offspring as the crucial determinant of risk
(1-6). The possibility that maternal genes may play an
independent role in the etiology of birth defects has
been recognized (8). However, the only analytical
approach proposed thus far requires data from both
maternal grandparents as well as mothers (12), which
can be impractical.

TABLE 3. Estimates of relative risks produced by the variant allele in the offspring (scenario A), the mother (scenario B), or both,
with estimates for each scenario based on simulated data with 1,000 independent samples of 100 case-parent triads

Scenario   Parameter   True value   Estimated value* (95% CI)†   No. covered
A          R1          2.5          2.57 (2.52-2.63)             963
A          R2          2.5          2.51 (2.42-2.60)             950
A          S1          1            0.99 (0.98-1.01)             951
A          S2          1            0.99 (0.95-1.03)             959
B‡         R1          1            0.99 (0.97-1.01)             961
B‡         R2          1            0.95 (0.92-0.99)             952
B‡         S1          2.5          2.57 (2.51-2.63)             954
B‡         S2          2.5          2.67 (2.55-2.79)             963
A + B§     R1          2.5          2.56 (2.51-2.61)             952
A + B§     R2          2.5          2.51 (2.44-2.59)             949
A + B§     S1          2.5          2.59 (2.53-2.64)             945
A + B§     S2          2.5          2.67 (2.58-2.77)             964

* Transformed mean parameter values from 1,000 replicates, with coverage (number of 1,000 simulated studies) of nominal 95% confidence intervals. For
coverage counts, two standard errors would be about 14 counts, based on a rate of 0.95, in 1,000 simulated studies.
† Numbers in parentheses, 95% confidence interval.
‡ In the estimates, one was excluded because of an infinite R2 estimate and six because of an infinite S2 estimate, due to small numbers in some simulated
cells.
§ In estimates, two were excluded because of infinite S2 estimates, due to small numbers in some cells.

We propose an approach based on the genotypes of
cases and their biologic parents. This approach estimates the separate effects of an allele carried by the
mother or by the affected child. Adjustment for confounding factors is feasible, as is the exploration of
gene-environment interaction. In principle, the same
method could be used to test more than two alleles of
a given gene, although the analysis becomes more
complicated. In work reported separately (9), we show
that the likelihood ratio test based on the log-linear
model outperforms the transmission/disequilibrium
test under either the recessive or dominant genetic
model for scenario A, and we also extend the model to
handle parental imprinting scenarios.
These parameter estimates and tests of hypotheses
must be interpreted with caution. Although the relative
risk estimates are derived from a single log-linear
modeling structure, there are some important distinctions between inference related to scenario A and
inference related to scenario B. Tests of scenario A
can be considered as simultaneously tests of association and linkage, that is, of linkage disequilibrium. For
scenario A the informative asymmetry is discerned
against a null background of simple Mendelian inheritance from parent to child. Symmetry of allele counts
(mother vs. father) within the parental mating types is
not actually needed for such tests. By contrast, tests of
scenario B rely on the assumed symmetry of allele
counts for mothers versus fathers within parental mating types. Thus, stronger assumptions are needed for
estimation and testing under scenario B than under
scenario A; moreover, rejection of the symmetry expected in the absence of maternal effects does not
necessarily imply linkage of the variant allele to a
genetic factor that confers risk through the mother but
only implies association. Linkage could be strictly
inferred only in a study that also included genotyping
the baby's maternal grandparents, as proposed by
Mitchell (12).
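To make the Mendelian null background concrete, consider the transmission/disequilibrium test (4), which compares how often heterozygous parents transmit versus do not transmit the variant allele to the affected child. The sketch below is not the log-linear likelihood ratio test proposed here; it is the simpler McNemar-type statistic, shown only to illustrate a test against 50:50 Mendelian transmission. The function name and the counts are hypothetical.

```python
import math


def tdt_statistic(transmitted, not_transmitted):
    """Return the McNemar-type TDT statistic and its 1-df chi-square p-value.

    transmitted / not_transmitted: numbers of heterozygous parents who did /
    did not pass the variant allele to the affected child (illustrative inputs).
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no heterozygous parents: the test is uninformative")
    chi2 = (transmitted - not_transmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, not data from this paper.
chi2, p_value = tdt_statistic(transmitted=60, not_transmitted=40)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Note that this statistic uses only transmissions from parent to child, so it does not depend on mothers and fathers having symmetric allele counts within mating types.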
While our approach to the analysis of case-parent
triad data is not biased under scenario A by asymmetry
within parental mating type or by the presence of
genetic population structure, the resulting associations
may still not be directly causal. An allele that is
associated with a disease outcome under this design
may only be in linkage disequilibrium with a gene that
is important. Thus, the method cannot be expected to
distinguish between a genetic marker proximal to a
disease gene and a disease gene with incomplete penetrance.
Evidence for maternal effects must be interpreted
with particular caution. Suppose the population is
structured, such that certain subpopulations have
higher baseline risk for the disease and also higher
prevalence of the allele. The variation in baseline risk
across subpopulations could be due in part to unmeasured exposures or to deleterious genes unrelated to
the gene under study. To the extent that there is some
intermarriage across the distinct subpopulations (in
contrast to what was assumed in our simulations),
mothers from high risk subpopulations may be overrepresented among case-parent triads (compared with
fathers) because they bring to the marriage both their
likelihood of carrying the allele and their deleterious
exposure. On the other hand, if intermarriage is common enough to produce serious bias, the population
structure itself should disappear within several generations, thus removing the source of the problem.
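The mechanism described in this paragraph can be sketched numerically. The calculation below assumes two subpopulations in Hardy-Weinberg equilibrium, an allele with no effect at all, and a baseline risk that depends only on the mother's subpopulation; the allele frequencies, risks, and couple frequencies are hypothetical values chosen for illustration. It returns the ratio of (mother one copy, father none) to (mother none, father one copy) among case families, which would equal 1 under the symmetry assumed for scenario B.

```python
def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1, or 2 copies at allele frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def parental_asymmetry(allele_freq, baseline_risk, couple_weights):
    """Ratio Pr[M=1, F=0 | case] / Pr[M=0, F=1 | case] for an allele with no effect.

    allele_freq, baseline_risk: dicts keyed by subpopulation label.
    couple_weights: maps (mother_subpop, father_subpop) to the frequency of
    that kind of couple. Risk is assumed to follow the mother's subpopulation.
    """
    def case_weight(m, f):
        return sum(
            w
            * baseline_risk[a]
            * genotype_probs(allele_freq[a])[m]
            * genotype_probs(allele_freq[b])[f]
            for (a, b), w in couple_weights.items()
        )

    return case_weight(1, 0) / case_weight(0, 1)


# Hypothetical inputs: the "high" subpopulation has more of the allele and more risk.
freq = {"high": 0.4, "low": 0.1}
risk = {"high": 0.02, "low": 0.01}
no_mixing = {("high", "high"): 0.5, ("low", "low"): 0.5}
some_mixing = {
    ("high", "high"): 0.4,
    ("low", "low"): 0.4,
    ("high", "low"): 0.1,
    ("low", "high"): 0.1,
}

ratio_no_mixing = parental_asymmetry(freq, risk, no_mixing)
ratio_some_mixing = parental_asymmetry(freq, risk, some_mixing)
print(ratio_no_mixing, ratio_some_mixing)
```

Under these assumptions the ratio stays at 1 without intermarriage and exceeds 1 once some couples cross subpopulations, which would resemble a maternal effect even though the allele does nothing.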
Another kind of distortion can be caused by a gene
that affects metabolism of a certain exposure and, as a
consequence, indirectly affects the propensity to be
exposed. An example might be a gene that affects the
metabolism of ethanol, where carriers of the gene may
have a higher or lower alcohol intake on average.
However, such mechanisms simply serve to illustrate
the point that genes can operate either directly or
indirectly through their influence on exposures.
In the simulated samples of 100 case-parent triads,
we found nearly 80 percent power to detect the effect
(relative risk = 2.5) of an allele carried by the offspring. This power is similar to that of a case-control
study under the same conditions, with 100 cases and
100 controls (13). Using the case-parent-triad approach, power was equally high for detecting an allele
working through the maternal genotype. Furthermore,
the power to distinguish alleles working through the
mother or offspring was very high; complete misattribution of a real genetic effect to the wrong member of
the family occurred only about 1 percent of the time.
When the variant allele had effects through both the
mother and the offspring, both effects were estimated
with little bias.
The method that we have described can readily be
generalized to incorporate possible effects of parental
imprinting, where the effect of an inherited allele is
different depending on the parental source of the allele. The fitting of the model is more difficult, as
missing data methods must be applied, but simulations
reveal very good power for detecting imprinting effects (9).
The case-parent-triad analysis requires assumptions
that may not hold in particular situations. Disruption
of Mendelian transmission (e.g., if homozygous carriers of the variant allele do not survive) could lead to
situations where the apparent risk with two alleles (R2)
is lower than the risk with one allele (R1). Fetal
survival might also be related to the condition under
study, for example, in fetuses affected with a neural
tube defect. The log-linear method does not implicitly
assume, however, that survival to clinical detection is
unrelated to genotype. In fact, only a much weaker
assumption is needed. This can be shown by expanding the Bayesian expression used earlier. Because babies can be included in a study only if they survive to
clinical detection, we need the probability of the joint
event where the defect occurs and the fetus survives to
birth. If we denote the survival of the fetus as "S,"
then:
Pr[M,F,C | D,S]
= Pr[D,S | M,F,C] Pr[C | M,F] Pr[M,F] / Pr[D,S].
Now we can rewrite Pr[D,S | M,F,C] = Pr[D | M,F,C]
Pr[S | D,M,F,C]. If the probability of survival among
fetuses who have developed the defect is independent
of the three allele counts, then Pr[S | D,M,F,C] = Pr[S | D].
Because the denominator factors as Pr[D,S] = Pr[D] Pr[S | D], this factor
cancels between the numerator and denominator of the above expression, which removes any
effect of possible differential survival. Thus, the distribution of alleles would still be the same among the
case-parent triads based on surviving cases. In this
way, the method is applicable even to life-threatening
conditions.
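The cancellation argument can be checked numerically. The following is a minimal sketch, assuming random mating in Hardy-Weinberg equilibrium (so that Pr[M,F] is a product of genotype frequencies) and multiplicative relative risks for the child's and the mother's allele counts; the allele frequency, the relative risks, and the survival probability are hypothetical values, not estimates from this study.

```python
from itertools import product


def hwe(p):
    """Hardy-Weinberg probabilities of carrying 0, 1, or 2 copies."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def mendel(m, f):
    """Pr[C = c | M = m, F = f] for c = 0, 1, 2 under Mendelian transmission."""
    tm, tf = m / 2, f / 2  # chance that each parent transmits the variant allele
    return [(1 - tm) * (1 - tf), tm * (1 - tf) + (1 - tm) * tf, tm * tf]


def triad_distribution(p, R, S, survival_given_defect=1.0):
    """Pr[M,F,C | D,S] by Bayes' rule, with survival independent of genotype.

    R[c] and S[m] are relative risks for the child's and mother's allele
    counts (R[0] = S[0] = 1); the baseline risk cancels on normalization.
    """
    g = hwe(p)
    weights = {}
    for m, f, c in product(range(3), repeat=3):
        pr_c = mendel(m, f)[c]
        if pr_c == 0:
            continue  # triad impossible under Mendelian inheritance
        pr_d = R[c] * S[m]  # Pr[D | M,F,C] up to the baseline risk
        weights[(m, f, c)] = pr_d * survival_given_defect * pr_c * g[m] * g[f]
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


# Hypothetical parameter values for illustration only.
R, S = [1.0, 2.5, 4.0], [1.0, 1.5, 2.0]
all_survive = triad_distribution(0.3, R, S, survival_given_defect=1.0)
few_survive = triad_distribution(0.3, R, S, survival_given_defect=0.3)
max_diff = max(abs(all_survive[k] - few_survive[k]) for k in all_survive)
print(len(all_survive), max_diff)
```

Because the survival factor multiplies every possible triad equally, it drops out when the weights are normalized, and the two distributions agree up to rounding error.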
Our strategy for distinguishing maternal from fetal
genetic effects could apply to conditions of pregnancy
(such as preeclampsia) or infancy (such as birth defects or developmental problems). This strategy might
also be useful for the study of conditions among older
children and even adults. There are unexplained associations between infant characteristics (such as birth
weight) and adult diseases such as breast cancer and
heart disease (14, 15), consistent with the possibility
that genes that influence the development of the fetus
may also have effects on the risk of adult diseases. A
limitation in applying this method to diseases of adulthood is that parents of cases may be dead and their
genetic information inaccessible.
Compared with a case-control design, the case-parent-triad design has both advantages and disadvantages. Cases and parents of cases are more likely to
consent to genetic testing than are healthy controls or
their parents. Moreover, even in a population where
genotypes for healthy controls can be obtained, case-control studies are still vulnerable to confounding due
to population structure, as in our simulations. Furthermore, the study of case-parent triads can work well in
settings where cases are highly selected (e.g., drawn
from a clinic or support-group setting). Case-parent
triads can produce valid estimates of relative risks with
selected cases because the parents of cases provide
inherently well-matched genetic controls. The stratification on parental mating type absorbs variations in
recruitment rate that lead to overrepresentation of certain parental mating types (e.g., because of cultural
factors). These advantages of the case-parent-triad design may make it the method of choice for preliminary
studies of candidate alleles related to conditions of
pregnancy or early life.
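A simplified illustration of why stratifying on parental mating type absorbs differential recruitment: within the single mating type in which both parents carry one copy, Mendelian inheritance gives children 0, 1, or 2 copies with probabilities 1/4, 1/2, and 1/4, so case counts are expected to be proportional to 1/4, R1/2, and R2/4 (any maternal factor is shared by all three cells and cancels). The sketch below uses only this one stratum rather than the full log-linear model, and the counts are hypothetical.

```python
def relative_risks_from_het_parents(n0, n1, n2):
    """Estimate (R1, R2) from cases whose parents both carry one copy.

    n0, n1, n2: numbers of such cases carrying 0, 1, or 2 copies.
    Expected counts are proportional to 1/4, R1/2, R2/4, so
    R1 = n1 / (2 * n0) and R2 = n2 / n0.
    """
    if n0 == 0:
        raise ValueError("no cases with 0 copies: the estimates are infinite")
    return n1 / (2 * n0), n2 / n0


# Hypothetical counts for one mating type.
baseline = relative_risks_from_het_parents(20, 100, 60)

# The same stratum recruited three times as heavily: every cell scales equally.
over_recruited = relative_risks_from_het_parents(3 * 20, 3 * 100, 3 * 60)

print(baseline, over_recruited)
```

Both calls return the same pair of estimates, because a recruitment factor common to a mating type multiplies numerator and denominator alike.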
Case-control studies have the advantage of being
able to detect nongenetic risk factors and to estimate
the population prevalence of a variant allele. Case-control studies are also less susceptible to the potentially distorting effects of non-Mendelian transmission. Mendelian transmission can be tested directly in
a case-control study if genotypes are obtained for
the parents of a subset (not necessarily random)
of the controls. It may be possible to take advantage of
the respective strengths of the two approaches by
developing hybrid designs in which data from case
parents and some control parents are collected as part
of a case-control study.
The case-parent triad approach has broader use than
has been appreciated. Until now, its application has
been limited to the study of genes working through the
case's genotype and has depended on specialized software. We show that case-parent triad data can be used
to detect the effect of maternal genes, with an analytical approach that uses widely available software. Genetic relative risks can readily be estimated under the
proposed method.
ACKNOWLEDGMENTS
The authors are grateful to Drs. Norman Kaplan,
Stephanie London, and David Umbach for helpful suggestions on earlier drafts of this paper.
REFERENCES
1. Rubinstein P, Walker M, Carpenter C, et al. Genetics of HLA
disease associations: the use of the haplotype relative risk
(HRR) and the "haplo-delta" (Dh) estimates in juvenile diabetes from three racial groups. (Abstract). Hum Immunol
1981;3:384.
2. Falk CT, Rubinstein P. Haplotype relative risks: an easy
reliable way to construct a proper control sample for risk
calculations. Ann Hum Genet 1987;51:227-33.
3. Schaid D, Sommer S. Genotype relative risks: methods for
design and analysis of candidate-gene association studies.
Am J Hum Genet 1993;53:1114-26.
4. Spielman R, McGinnis R, Ewens W. Transmission test for
linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;
52:506-16.
5. Schaid D, Sommer S. Comparison of statistics for candidate-gene association studies using cases and parents. Am J Hum
Genet 1994;55:402-9.
6. Spielman R, Ewens W. Invited editorial: the TDT and other
family-based tests for linkage disequilibrium and association.
Am J Hum Genet 1996;59:983-9.
7. Oakley G. Folic acid-preventable spina bifida and anencephaly. JAMA 1993;269:1292-3.
8. van der Put NM, Steegers-Theunissen RP, Frosst P, et al.
Mutated methylenetetrahydrofolate reductase as a risk factor
for spina bifida. Lancet 1995;346:1070-1.
9. Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to
case-parent-triad data: assessing effects of disease genes that act
directly or through maternal effects and that may be subject to
parental imprinting. Am J Hum Genet 1998;62:969-78.
10. Self S, Longton G, Kopecky K, et al. On estimating HLA-disease association with application to a study of aplastic
anemia. Biometrics 1991;47:53-61.
11. Ewens W, Spielman R. The transmission/disequilibrium test:
history, subdivision, and admixture. Am J Hum Genet 1995;
57:455-64.
12. Mitchell L. Differentiating between fetal and maternal genotypic effects, using the transmission test for linkage disequilibrium. (Letter). Am J Hum Genet 1997;60:1006-7.
13. Schlesselman JJ. Sample size requirements in cohort and
case-control studies of disease. Am J Epidemiol 1974;99:
381-4.
14. Barker DJ. Maternal and fetal origins of coronary heart disease. J R Coll Physicians Lond 1994;28:544-51.
15. Michels K, Trichopoulos D, Robins J, et al. Birthweight as
a risk factor for breast cancer. Lancet 1996;348:1542-6.