Download Method to Detect Genotype-Environment Interactions for

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
American Journal of Epidemiology
Copyright O 1999 by The Johns Hopkins University School of Hygiene and Public Health
AH rights reserved
Vol. 150, No. 11
Printed In USA.
Method to Detect Genotype-Environment Interactions for Quantitative Trait
Loci in Association Studies
Edwin J. C. G. van den O o r d u
Khoury et al. {Am J Hum Genet 1988;42:89-95 and Am J Epidemiol 1993; 137:1241-50) presented an
epidemiologic approach to examine genotype-environment interaction in situations where the disease is either
present or absent. In this article, the author extends the approach of Khoury et al. to quantitative outcome
variables. This extension is relevant for diseases that are extremes on a continuum or when continuous risk
factors are studied. To account for a possible admixture of subgroups in the sample, tests for genotypeenvironment interaction are discussed for designs with parents as controls as well as without parents as
controls. Assuming two environmental conditions, the author demonstrates how the power of these tests can be
calculated and used to estimate the sample sizes needed to detect genotype-environment interaction in a variety
of conditions. In addition, he analyzes simulated data to demonstrate the detection of different mechanisms of
genotype-environment interaction and to study the effectiveness of this approach to identify the correct
mechanism. Finally, extensions to multiple environmental conditions and designs with multiple subjects per
family are discussed. Am J Epidemiol 1999; 150:1179-87
epidemiologic methods; genetics; relative risk; statistics
Insight into the interplay between genes and environments is important to obtain a better understanding
of disease mechanisms (1). A classical example is
phenylketonuria (PKU), a metabolic disorder involving a rare gene (approximately 1 in 10,000 newborns)
which gives rise to a mental handicap only when
phenylaline is present in the diet. Because phenylaline
is present in many foods, under normal circumstances
all children with the genetic defect will get PKU retardation. For this reason the role of environmental factors was initially even overlooked (2). However, the
recognition of the true mechanism brought along the
insight that putting the children with the genetic defect
on a special diet could prevent the adverse effects of
PKU.
PKU is an example of a genotype-environment
interaction. Genotype-environment interaction refers
to the differential effects of the same environment on
individuals with different genotypes (or the differential
effects of different environments on individuals with
the same genotype). The environmental component
does not necessarily have to involve harmful events
but can, for instance, also be a demographic characteristic such as when the expression of a gene is age
dependent. To examine genotype-environment interaction Khoury et al. (3, 4) advocated an epidemiologic
approach. By using epidemiologic definitions these
authors showed how the risk of disease is a function of
the gene effects, the frequency of the exposure to the
environmental agent, and the strength of the interaction between the genotype and the agent. In addition,
they delineated six biologically plausible patterns of
genotype-interactions and proposed empirical tests to
discriminate among these patterns. In subsequent
work, their approach was extended by demonstrating
how to estimate the sample sizes needed to detect the
interactions and by developing strategies to avoid spurious findings caused by an admixture of subgroups
from the population (5-7).
The approach developed by Khoury et al. (3, 4)
applies to qualitative dependent variables where the
disease is either present or absent. However, sometimes the dependent variable is continuous. Examples
are hypertension or the commoner varieties of psychopathology where the disease may be an extreme on
a continuum. In other situations, the dependent variable may be a continuous trait associated with a
chronic disease, such as cholesterol is a quantitative
Received for publication July 16,1998, and accepted for publication March 1, 1999.
Abbreviations: AIC, Akaike's Information Criterion; ANOVA, analysis of variance; PKU, phenylketonuria; QTL, quantitative trait locus;
SEM, structural equation modeling; TDT, transmission-disequilibrium
test; TLI, Tucker-Lewis Index.
1
1nstitute of Psychiatry, London, England.
2
Department of Child and Adolescent Studies, Utrecht University,
Utrecht The Netherlands.
1179
1180
vandenOord
risk factor for heart disease. To examine genotypeenvironment involving quantitative variables we
extended the structural equation modeling approach
outlined by Fulker et al. (8) and Van den Oord
(Institute of Psychiatry, unpublished manuscript).
Some advantages of this approach are: the model parameters have a clear interpretation and known relation
to the additive genetic and dominance variance; it is
not necessary to assume equal variances in all genotype groups; and structural equation modeling is very
flexible so that it can easily be extended to include
multiple subjects per family or to study different
mechanisms of genotype environment interaction.
Traditionally, qualitative traits are studied using casecontrol designs. In this design, it is tested whether subjects with the high-risk allele of genotype are more frequently a case than a control. When the "dependent
variable" is a continuous trait instead of a dichotomous
(case or control) variable, the corresponding test consists of examining whether groups of subjects with different genotypes have different trait means. In addition
to Type I errors, significant differences in means can be
caused by: 1) the marker allele plays a causal role in
determining the trait, 2) the marker allele is in disequilibrium with an unobserved allele that itself is causal
(disequilibrium can occur when the marker and causal
gene He very close to each other on the genome so that
they are transmitted together for many generations), or
3) population admixture which means that there are subgroups, e.g., ethnic groups, that differ from each other
with respect to marker frequencies as well as trait
means. Significant differences due to population admixture are a problem because in this case the marker is
unrelated to a causal gene and not informative about the
differential expression of a gene in different environments. Therefore, if population admixture is present it
needs to be taken into account to avoid inaccurate conclusions. In the context of qualitative traits this can be
done by using the transmission-disequilibrium test
(TDT) (9, 10), which tests whether within parental mating type groups the high risk allele is transmitted more
frequently to cases than other parental alleles. Its quantitative counterpart consists of testing whether within
parental mating type groups the transmission of the
high-risk allele is associated with higher means than the
transmission of the other parental alleles. Although population admixture may affect the differences in allele
frequencies or means between parental mating types,
within parental mating type groups there can be no preferential transmission of an allele or differences in means
unless the marker is related to the causal gene. To
account for a possible population admixture, we present
here tests for genotype-environment interaction with
parents as controls as well as without parents as controls.
After tests with and without parents are introduced,
we show how the power can be calculated and used to
estimate the sample sizes needed to detect genotypeenvironment interactions in a variety of conditions.
Next, we analyze simulated data to demonstrate the
detection of different mechanisms of genotypeenvironment interaction and to study the effectiveness
of our approach to identify the correct mechanism.
TESTS FOR GENOTYPE-ENVIRONMENT
INTERACTION IN SITUATIONS WITH AND
WITHOUT PARENTS AS CONTROLS
To derive tests for genotype-environment interaction
that involve quantitative traits we used structural equation modeling (SEM). To a certain extent, the approach
resembles an analysis of variance (ANOVA) where the
genotypes and environmental conditions are the factors and the quantitative trait the dependent variable.
The analysis consists of modeling the means for each
combination of groups as a function of the genetic
effects, environmental effects, and interaction effects.
Significance tests can be performed by examining the
deterioration in the fit when the interaction parameters
are fixed to zero.
For the test without parents as controls, let o be a
constant that is the same for all groups in the analysis,
gj the effect of genotype j , ek the effect of environmental condition k, ge-lk the interaction effect between the
genotype and environment, and riJk a residual score of
subject i with genotype j in environmental condition k
consisting of the effects of other unlinked loci and
environmental factors. The trait score xijk of subject i
with genotype; in environmental condition k can then
be written as:
= o + j + ek + gejk + rijk
(1)
We assume two alleles, A, and A2, and two environmental conditions. Such restrictions are not required
but are made to simplify the discussion of the tests. For
a biallelic locus, there are three possible genotypes. To
model the effects of these genotypes, we used the parameterization discussed by Falconer (11) in which a
score a is assigned to A,A, subjects, d to A,A2 subjects, and -a to A2A2 subjects. Thus, the genotypic values are expressed on a scale where both homozygotes
have additive genetic scores that differ a from the center of the scale. If d is zero, there is no dominance so
that the heterozygote is exactly in the middle of the
two homozygotes. If d is positive, there is dominance
of the A, allele over the A2 allele and the value of the
heterozygote is relatively closer to the A, A, genotype.
If d is negative, there is dominance of the A2 allele
Am J Epidemiol Vol. 150, No. 11, 1999
Genotype-Environment Interactions for Quantitative Trait Loci 1181
over the A, allele and the value of the heterozygote is
relatively closer to the A2A2 genotype. The environmental effect was modeled by specifying parameter e
in environmental condition 2 that represented the
increase or decrease of the mean in condition 2 compared to condition 1. Finally, interaction effects were
modeled by assigning an interaction effect / to A,A,
subjects in environmental condition 2, an interaction
effect n to AjA2 subjects in environmental condition 2,
and an interaction effect -i to A2A2 subjects in environmental condition 2. Thus, i represented the difference between the value of a homozygote in condition
2 compared to the value of a homozygote in condition
1 after the environmental effect e has been taken into
account. Parameter n represents different amounts of
dominance in both conditions. The expected means in
the six groups are summarized in table 1.
Table 1 shows that the means are modeled with six
parameters o, a, d, e, i, and n. With two environmental
conditions and three genotypes, there are also six
observed means. The model is therefore the saturated
model and an unique value can be estimated for each
parameter. To test for interaction effects, parameters, i,
n, or both, can be fixed to zero. The test statistic equals
the difference between the %2 of the restricted model
and the x2 of the unrestricted model, and is X2 distributed with the number of fixed interaction parameters
as the degrees of freedom. By constraining the variances across the three groups, o^ = a2, such a x2 difference test can also be used to examine whether the
group variances have to be modeled by specifying a
separate variance parameter <Xj in each group or a single variance parameter a 2 for all groups.
In the presence of population admixture, an unbiased test for causal effects of the locus can be obtained
TABLE 1. Expected group means for the tests wttti and without parents as controls
Mean
Without parents as controls
Genotype A,A,
Genotype A,A,
Genotype A^A,
With parents as controls
Mating type A,A,; A.A,
Genotype A,A,
Genotype A^A,
Mating type A ^ A,A,
Genotype A,A,
Genotype AjA,
Mating type A,A,; A.A,
Genotype A,A,
Genotype A,Aj
Genotype AJA,
Environ mental
corKfitton
1
Environmental
condition
2
o+a
o+ d
o- a
o+d+e+ n
o-a+e-l
o, + a
o, + d
o, + a + 0+ i
o, + d + o + n
ot + d
o, + d + 6 + n
o,-a
ot- a
o, + a
o, + d
o, + a +0+1
o, + d+0+n
o,-a + e - i
o,-a
Am J Epidemiol Vol. 150, No. 11, 1999
o+a+ e+/
+0-1
by examining whether within parental mating type
groups subjects with different genotypes have different
means. For such TDT-like tests, only subjects with at
least one heterozygote parent can be used because otherwise all subjects within that group have the same
genotype (e.g., A(AI;A1A1 matings always yield A!A,
genotypes). We want to note that this test is not identical to an ANOVA by parental mating type. This is
because the test allows subjects in different parental
mating type groups to have different means, and constrains only the means of genotypes within the same
parental mating type group to be equal. The mean in
each mating group equals ot = o + mh in which o is
the overall constant and ml the effect of admixture on
the mean in mating type group /. The admixture effect
m depends on the differences in allele frequencies and
trait means among subgroups in the sample. It does not
affect the differences between genotype groups within
mating type groups because in epidemiologic samples
(also in selected samples when there are no causal
genetic effects) the probabilities of getting different
genotypes of offspring are identical in all subgroups. To
account for admixture, formula 1 can therefore be
rewritten as:
(2)
rijk
An additional assumption of this model is that the
interaction effect gejk is independent of the admixture
effect ml and identical in each subgroup in the population. With two alleles, there are three mating types that
involve at least one heterozygote parent (/ = 1..3). The
expected means of the possible genotypes wihin each
of the three mating type groups are shown in the second
part of table 1.
The whole test specifies eight parameters for the
means: ou o2, o3, a, d, e, i, and n. To test for interaction
effects, parameters /, n, or both, can again be fixed to
zero and the x2 difference test with 1 or 2 degrees of
freedom can be used to examine whether the interaction effects are significant. This difference test can also
be used to examine how the variances in the seven
groups have to be modeled.
Example scripts for the tests discussed in this section are available from the author. The scripts involve
the commonly used and excellent SEM program Mx
(12), which can be downloaded from Internet site
http://opal.vcu.edu/html/mx/mxhomepage.html free of
charge.
gejk
POWER TO DETECT GENOTYPE-ENVIRONMENT
INTERACTION
In this section, we demonstrate how the power of
these tests can be calculated and used to estimate the
sample sizes needed to detect genotype-environment
1182
van den Oord
interactions. The sample sizes needed to reject the
incorrect hypothesis assuming no interaction were
computed for 80 percent power with alpha (Type 1
error) is 5 percent. The power was calculated using the
approach discussed by Satorra and Saris (13). Under
the incorrect hypothesis, the test statistic is distributed
approximately as a non-central X2 distribution with
non-centrality parameter X. To determine X, we first
computed the expected means and variances using the
chosen values for the parameters a, d, e, i, n, and X2Next, a model was fitted to these input statistics that
assumed no interaction (i and n were fixed to zero).
The x 2 obtained from this analysis was used as the
value of A, and the number of fixed interaction parameters (two) as the degrees of freedom of the noncentral x 2 distribution.
Different patterns of genotype-environment
interaction
For the power calculations, we used the six types of
biologically plausible patterns of genotype-environment
interaction discussed by Khoury et al. (3, 4). An elaborate discussion and examples can be found in their articles and we will confine ourselves here to a short
description. Because these authors only addressed situations without dominance, we added two subtypes that
involved parameters d and n.
The different types of interaction are depicted in figure 1 and the specification using our model parameters
is shown in table 2. The first type is shown in figure la
and shows the situation where the trait is increased
only in the presence of both the high-risk genotype and
high-risk environmental condition. Thus, the corresponding row in table 2 indicates that parameters a and
e are zero and that i is larger than zero, implying higher
means for genotypes with the A! allele in condition 2.
Figure lb is a variation on interaction type 1 in which
the interaction effect n for the heterozygote is also
larger than zero. In this specific case, we assumed that
the interaction effect for the heterozygote equaled the
interaction effect for the homozygotes {n = i) so that
in condition 2 the mean for A,A2 subjects equaled the
mean of the AjA! subjects.
The second interaction type is depicted in figure lc
and applies when the trait is increased by the high-risk
genotype in the high-risk environmental condition, as
well as by exposure alone. Thus, parameter e is now
larger than zero and figure lc shows that in comparison
to figure la the means are higher for all genotypes in
condition 2. Type 3 interaction is shown in figure Id and
involves the situation where the trait is increased by the
genotype alone but not by the exposure alone. In this situation, parameter a is larger than zero so that in com-
parison to the previous figures there are now also differences between the genotypes in condition 1. A variation including additional dominance effects is depicted
in figure le. Figure le shows that because of the dominance effects the mean of the heterozygotes has shifted
d from the middle of the two homozygotes in both environmental conditions. Because we assumed that dominance parameter d equaled additive genetic parameter a,
in condition 1 the means of the heterozygotes and A ^
homozygotes are the same. The fourth type of interaction is shown in figure If and assumes an effect of the
environment alone, an effect of the genotype alone, and
an interaction effect consisting of higher means for the
high-risk genotype in the high-risk environment Thus,
in terms of the model parameters shown in table 2 this
implies that e, a, and i are all larger than zero. A special
case of type 4 interaction is the multiplicative model
where the interaction effect equals the product of
genetic and environmental effect (i = e x a). Type 5
interaction assumes that the genotype is protective or
decreases the mean in the absence of the exposure but
becomes deleterious or increases the mean in the presence of the exposure. This implies that the value of a is
larger than zero and that interaction parameter i is negative. An example is given in figure lg. Because the
interaction parameter is larger than the additive genetic
values (i = -2 x a) in figure lg, the lines cross and the
ranking of the means are reversed in both conditions.
Finally, the sixth type is identical to type 5 interaction
except that there is also an effect of exposure alone. This
situation is depicted in figure lh. A comparison with figure lg shows that due to the environmental effect e all
genotypes have higher means in condition 2.
Power calculations
To compute the input statistics, we assigned a value of
five to all non-zero parameters that were specified for a
given interaction type (see table 2). For interaction types
5 and 6, we assumed that i was -10. The proportion of
subjects in condition 1, b, and the frequency of the A,
allele, p, were both assumed to be 0.5. For all interaction
types, we fixed in the situation without parents as controls, the proportion of explained variance o^,, to 10
percent of the total variance so that the variance in the
genotype groups equaled a 2 = 10 x <£xpl — a2^,,. The
variances in the situation with parents as controls were
assumed to be equal for all seven groups and identical to
the variance in the situation without parents as controls:
Strictly speaking, the power to detect interaction
depends on the proportion of interaction variance and
not the type of interaction. However, given the same
amount of explained variance, some types will be easAm J Epidemiol
Vol. 150, No. 11, 1999
Genotype-Environment Interactions for Quantitative Trait Loci
Figure 1b: Type 1b interaction
Figure 1a: Type 1a interaction
H
H
Condition 2
Condition 1
1183
Condition 1
Condition 2
Figure 1d: Type 3a interaction
Figure 1c: Type 2 interaction
AjA,
Condition 1
Condition 2
Condition 2
Figure 1h: Type 6 interaction
Condition 2
FIGURE 1. Six types of genotype-environment interaction.
Am J Epidemiol Vol. 150, No. 11, 1999
Condition 2
Condition 1
Figure 1g: Type 5 interaction
Condition 1
Condition 2
Figure 1f Type 4ab interaction
Figure 1e: Type 3b interaction
Condition 1
Condition 1
Condition 1
Condition 2
1184
van den Oord
TABLE 2.
Specification of different kinds of genotype-environment Interactions*
Interaction
type
Hi
Type 1a
Type 1b
Type 2
Type 3a
Type 3b
Type 4a
Type 4b
Type 5
Type 6
0
0
x>0
0
0
x>0
x>0
0
x>0
X
X
X
X
X
X
X
X
X
0
0
0
x>0
x>0
x>0
x>0
x>0
x>0
0
0
0
0
x>0
0
0
0
0
x>0
x>0
x>0
x>0
x>0
x>0
= e*a
x<0
x<0
0
x>0
0
0
0
0
0
0
0
• x denotes estimated parameter, 0 denotes a parameter that is fixed to zero.
ier to detect because they imply a larger proportion of
interaction variance. For instance, in the situation
without parents as controls the explained variance o2^
equals:
y Ow - U)2
o2 , = y
J
(3)
k
= O2 + O2k + O2k
with \ijk as given in table 1, and M-
=
2
j
*
\ij = ^ ^*My*» a r | d M* = 2 P/My*- Th e proportion of
interaction variance can now be computed with
OjiJ(p2 + o^i), where o2 is the variance within the
groups. Substituting the chosen parameter values in
formula 3 shows that the proportions of interaction
variance equal 5.00 for interaction type la, 4.29 for
type lb, 2.50 for type 2, 1.00 for type 3a, 0.83 for type
3b, 0.83 for type 4a, 3.29 for type 4b, 10.00 for type
5, and 6.67 for type 6.
Formula 3 indicates that the explained variance
equals the sum of the variance due to the main effect
of the genotypic groups oj, the main effect of the environmental groups of, and the interaction variance ojk.
It is important to realize that the effects of the genetic
and environmental parameters seen in formula 1 are
not equivalent to the main and interaction effects in
formula 3. For instance, assume type 1 interaction in
which there are no main effects and there is only a contribution of interaction parameter i, that the interaction
parameter i equals 5, and that b = p = 0.5. For this sit-
uation, the group means \ij for the A,A,, A,A2, and
A2A2 genotypes are 2.5, 0, and -2.5, and of equals
3.125. This demonstrates that part of an interaction
effect can show up as a main effect in the traditional
ANOVA context.
Formula 3 shows that the proportion of subjects in
condition 1 and the frequency of the A j allele affect the
interaction variance. To assess the impact of these factors, the power calculations were repeated with b =
0.1/0.9 and p = 0.1/0.9. In these calculations, the
within-group variance for each type of interaction was
fixed to the within group variance in the situation with
b = p = 0.50. Results of the power calculations are
shown in table 3. Forp = b = 0.5, the required sample size to detect interaction with 80 percent power and
alpha is 5 percent was on average 450 and ranged from
100 for interaction type 5 (10 percent interaction variance) to 1,055 for interaction types 3b and 4a (0.83
percent interaction variance). Sample sizes increased
when p and b deviated from 0.5. The required sample
sizes were about the same for conditions with b = 0.1
and b = 0.9. This symmetry implied that it does not
matter whether the proportion of subjects deviates
from 0.5 in the direction of the high-risk or low-risk
environmental condition. Despite small differences in
interaction variance, results for p = 0.1 and p = 0.9
were also symmetric and showed that the required
sample sizes did not depend on whether the A, or A2
allele was more frequent. An exception was the test for
interaction type lb that involved interaction parameter
n for the heterozygote and was much more powerful
withp = 0.1 than withp = 0.9.
With respect to the power for the test with parents as
controls, a first remark is that it will always be lower
than in the situation without parents as controls
because only subjects with at least one heterozygous
parent can be used. The proportion of selected subjects
increases when the allele frequencies become more
equal and the number of alleles become larger. For
instance, in a random sample and with two alleles
Am J Epidemiol
Vol. 150, No. 11, 1999
Genotype-Environment Interactions for Quantitative Trait Loci
1185
TABLE 3. Sample sizes required to detect different types of genotype-environment Interaction for 80%
power at alpha Is 5% In situations with and without parents as controls*,!
6 = 0.5
b = 0.Mb = 0.i
Interaction type
p = 0.1
p = 0.5
p = o.9
p = 0.1
p = 0.5
p = 0.9
Without parents as controls
1a
1b
2
3a
3b
4a
4b
5
6
511
268
994
2,440
2,922
2,922
762
269
390
187
218
361
881
1,055
1,055
278
100
144
511
4,292
994
2,440
2,922
2,922
762
269
390
1,538
836
2,885
6,907
8,246
8,246
2,240
858
1,199
563
666
1,045
2,491
2,973
2,973
814
320
1,538
14,373
2,885
6,907
8,246
8,246
2,240
858
1,199
With parents as controls
1a
1b
2
3a
3b
4a
4b
5
6
297
162
580
1,428
1,711
1,711
444
154
226
187
218
361
881
1,055
1,055
278
100
144
297
1,444
580
1,428
1,711
1,711
444
154
226
859
466
1,646
3,999
4,783
4,783
1,269
460
660
563
441
666
1,045
2,491
2,973
2,973
814
320
441
859
4,755
1,646
3,999
4,783
4,783
1,269
460
660
* For the situation with parents as controls, trie required sample sizes refer to the number of subjects with at
least one heterozygote parent.
t All power calculations assumed two degrees of freedom.
about 75 percent of the subjects can be selected when
p = 0.5 and about 32.8 percent can be selected when/?
= 0.1 or p — 0.9. To examine the power as such and
avoid obvious sample size effects caused by the selection of subjects, the sample sizes reported in table 3 for
the situation with parents as controls referred to the
selected number of subjects with a heterozygous parent. Thus, to obtain the initial pool parents and subjects
that would need to be genotyped, the sample sizes
reported in table 3 need to be divided by 0.75 when p
= 0.5 and by 0.328 whenp = 0.1 or p = 0.9.
The required sample sizes reported in table 3 show
that with p = b = 0.5 the tests in the situation with and
without parents as controls are equally powerful to
detect interaction. With p = b = 0.1 or 0.9 sample
sizes were lower in the situation with parents as controls. This indicated the more severe selection of subjects was partly compensated by the fact that more
informative subjects were selected.
DISCRIMINATIVE POWER
The tests discussed in the previous section address
the question whether or not interaction exists. Once
this has been established, it is of interest to identify the
type of interaction. This can be examined by fitting
models that represent different types of interactions to
Am J Epidemiol
Vol. 150, No. 11, 1999
the data and choosing the model that yields the best fit.
For this purpose, constraints need to be imposed on
parameters a, d, e, i, n. These can be equality constraints in which parameters are fixed to zero, boundary constraints in which parameters are constrained to
lie between certain values, and non-linear constraints
in which parameters are constrained to be non-linear
functions of other parameters.
To illustrate the identification of the correct mechanism, we again used the interaction types discussed by
Khoury et al. (3, 4). The constraints involved were
those reported in table 2. Data were simulated on the
basis of each interaction type and analyzed by fitting
models representing all interaction types. Ideally, the
model used to simulate the data gives the best fit and
is selected. To select the best fitting model, we
employed three different fit indices. First, we selected
the model with the highest p value of the %2 goodness
of fit index. The second index was Akaike's
Information Criterion (14), AIC = %2 - 2 degrees of
freedom (df), which has proven to be a good index in
many situations (15). Smaller values of AIC indicate a
better fit and we therefore selected the model with the
lowest AIC. Finally, we used the Tucker-Lewis Index
(TLI),
TLI =
- X,2/df,)/(}rj/df6 - 1),
1186
van den Oord
the true interaction type under which the data were
simulated and the columns to the interaction models
used to analyze the simulated data. Results showed
that there were marked differences. For instance, type
6 interaction was correctly identified in 99.7 percent of
the 2,000 simulations, whereas type 4b interaction was
correctly identified in 44.95 percent of the 2,000 simulations. Another salient finding was that interaction
type 2 was relatively often mistakenly identified as
interaction type 4b (in 33.35 percent of the 2,000 simulations), 4a as 4b (37.66 percent), and 4b as la (25.50
percent).
A final remark is that the accuracy to detect the correct interaction model will depend on the factors that
determine the power of the tests such as the number of
observations, the proportion of subjects in both environmental conditions, and the allele frequencies. For
instance, with the number of observations equal to 250
instead of 500, the average proportion of correctly
identified interaction types using the p value, AIC, and
TLI dropped in the situation without parents as controls from 75.8 percent, 78.7 percent, and 75.2 percent
to 67.3 percent, 67.3 percent, and 66.8 percent.
which reflects the improvement in fit of the target
model (subscript t) compared to a baseline model (subscript b). Our baseline model assumed no group differences in means. The index ranges from 0 to about 1.
Because larger values imply a better fit, we selected
the model with the highest TLI. We included the TLI
because research has shown that it possesses some
favorable features such as being relatively insensitive
to sample size (16).
We wrote a Pascal program to simulate for each
interaction type 2,000 data sets of 500 subjects. The
parameter values used for this simulation were identical to the ones reported above for the power calculations with p = b — 0.5. This condition was chosen
because our calculations showed that in this situation
there is generally sufficient power to detect genotypeenvironment interaction so that it makes sense to try to
discriminate between the various types.
Results indicated that the proportion of correctly
identified models on the basis of the p value, AIC, and
TLI was on average 75.8 percent, 78.7 percent, and
75.2 percent in the situation without parents as controls,
and 75.6 percent, 77.4 percent, and 75.2 percent in the
situation with parents as controls. This suggested that if
genotype-environment interaction can be detected, it
will generally be possible to identify the correct interaction type. Compared to the other indices, slightly better results were obtained with the AIC. The differences
between the situations with and without parents as controls were small. This indicated that these two designs
did not differ substantially in identifying the mechanism of interaction given equal sample sizes.
Because of the similarity of the results for the three
different indices and the two tests, in table 4 we show
only the complete results for the AIC in the situation
without parents as controls. The rows in table 4 refer to
TABLE 4.
DISCUSSION
The present paper shows how genotype-environment interaction can be studied for quantitative traits.
Although we have confined ourselves to two environmental conditions and three genotypes, the extension
to e environmental conditions and g genotypes is
straightforward. Analogously, the e x g means can be
modeled with one overall mean, e - 1 environmental
effects, g - 1 genotype effects, and {e - \){g - 1) interaction parameters. The test can also be extended to
designs with multiple siblings. Recently, Fulker et al.
Discriminative power to Identify the correct type of genotype-environment Interaction*,t,+
Interaction type
Without parents as
controls
1a
1b
2
3a
3b
4a
4b
5
6
Interaction type
1a
1b
2
3a
3b
4a
4b
79.41
5.65
97.35
0.60
0.45
1.50
0.60
6.65
6.45
0.90
64.80
0.10
6.50
1.70
1.10
0.25
0.05
1.25
7.60
0.20
58.94
4.45
0.05
0.60
33.35
1.55
3.20
37.66
44.95
2.50
0.25
0.15
25.50
0.70
8.50
79.90
3.45
0.20
7.15
6.90
91.30
0.25
2.80
5
6
0.95
0.05
0.05
0.05
0.05
1.45
92.30
0.30
7.70
99.70
* Results were based on 2,000 replications.
t The interaction types in the rows represent the true interaction type under which the data were simulated; the interaction types in the
columns represent the type that was used to analyze the data.
$ Bold numbers indicate percent of correctly identified models.
Am J Epidemiol
Vol. 150, No. 11, 1999
Genotype-Environment Interactions for Quantitative Trait Loci
(8) showed how SEM can be used with sibling pairs by
creating groups consisting of unique combinations of
sibling pair genotypes, modeling the means of the siblings in these groups as a function of genetic effects,
and estimating the correlations among siblings in a
family to model the dependency in the data. To study
interaction, this model could be extended by specifying a parameter e for all siblings in environmental condition 2 to estimate the main environmental effect,
assign an interaction effect / to siblings with genotype
A,A] in environmental condition 2, an interaction
effect n to siblings with genotype A[A2 in environmental condition 2, and an interaction effect -i to siblings with genotype A2A2 in environmental condition
2. This sibling design also provides an alternative way
to control for population admixture (15). That is,
instead of testing whether within parental mating type
groups the means of subjects with different genotypes
differ, it can be tested whether within sibling pair
genotypes groups the means of siblings with different
genotypes differ.
For all power calculations, we assumed that the noncentral x distribution had two degrees of freedom
regardless of whether both interaction parameters were
larger than zero. In a sense, this is a somewhat conservative approach. Under the assumption that a specific
interaction effect for heterozygotes is unlikely, parameter n could be fixed to zero in advance. Genotypeenvironment interaction could then be tested by fixing
parameter i to zero. This test is more powerful because
it has only one degree of freedom so that fewer subjects will be required to detect interaction. Instead of
assuming that parameter n is zero, it is also possible to
test this empirically although it might be useful to
determine the power of this test to exclude the possibility that a nonsignificant result may indicate a lack of
power.
Studying genotype-environment interaction may be
important to obtain a better understanding of the disease mechanisms. In this respect, our power calculations and simulation study were encouraging because
they suggested that genotype-environment interaction
can be detected with realistic sample sizes and that
Am J Epidemiol Vol. 150, No. 11, 1999
1187
once it is found it will generally be possible to identify
the mechanism of interaction.
REFERENCES
1. Khoury MJ. Genetic and epidemiologic approaches to the
search for gene-environment interaction: the case of osteoporosis. Am J Epidemiol 1998; 147:1-2
2. McCall R. So many interactions, so little evidence: why? In:
Wachs TD, Plomin R, eds. Conceptualization and measurement of organism-environment interaction. Washington, DC:
American Psychological Association, 1991:142-61
3. Khoury MJ, Adams MJ, Flanders WD. An epidemiologic
approach to ecogenetics. Am J Hum Genet 1988;42:89-95.
4. Khoury MJ, James LM. Population and family relative risks of
disease associated with environmental factors in the presence
of gene-environment interaction. Am J Epidemiol 1993;137:
1241-50.
5. Hwang SJ, Beaty TH, Liang KY, et al. Minimum sample size
estimation to detect gene-environment interaction in case-control designs. Am J Epidemiol 1994; 140:1029-37.
6. Khoury MJ, Flanders WD. Nontraditional epidemiologic
approaches in the analysis of gene-environment interaction:
case-control studies with no controls! Am J Epidemiol 1996;
144:207-13.
7. Yang Q, Khoury MJ, Flanders WD. Sample size requirements
in case-only designs to detect gene-environment interaction.
Am J Epidemiol 1997; 146:713-20.
8. Fulker DW, Cherny SS, Sham PC, et al. Combined linkage and
association sib-pair analysis for quantitative traits. Am J Hum
Genet 1999;64:259-67.
9. Rubinstein P, Walker M, Carpenter C, et al. Genetics of HLA
disease associations: the use of the haplotype relative risk
(HRR) and the "haplo=delta" (DH) estimates in juvenile diabetes from three racial groups. (Abstract). Hum Immunol
1981;3:384.
10. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for
linkage disequilibrium: the insulin gene region and insulindependent diabetes mellitus (IDDM). Am J Hum Genet 1993;
52:506-16.
11. Falconer DS. Introduction to quantitative genetics. Essex,
England: Longman, 1989
12. Neale MC. Mx: statistical modeling. Richmond, VA:
University of Virginia, Department of Psychiatry, 1994.
13. Satorra A, Saris WE. Power of the likelihood ratio test in
covariance structure analysis. Psychomet 1985 ;50:83-90.
14. Akaike H. Factor analysis and AIC. Psychomet 1987;52:317-32.
15. Williams LJ, Holahan PJ. Parsimony-based fit indices for multiple indicator models: do they work? Struct Equat Model
1994;l:161-89.
16. Marsh HW, Balla JR, McDonald RP. Goodness-of-fit indexes
in confirmatory factor analysis: the effect of sample size.
Psychol Bull 1988; 103:39^10