Download Assumptions of twin modeling

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Biology and consumer behaviour wikipedia , lookup

Genetic engineering wikipedia , lookup

Genetic drift wikipedia , lookup

Heritability of autism wikipedia , lookup

Genome (book) wikipedia , lookup

Population genetics wikipedia , lookup

Microevolution wikipedia , lookup

Genetic testing wikipedia , lookup

Public health genomics wikipedia , lookup

Irving Gottesman wikipedia , lookup

Quantitative trait locus wikipedia , lookup

Human genetic variation wikipedia , lookup

Behavioural genetics wikipedia , lookup

Heritability of IQ wikipedia , lookup

Twin study wikipedia , lookup

Transcript
Karri Silventoinen
University of Helsinki
Osaka University



Genetic twin modeling, such as all statistical modeling,
makes certain assumptions
In twin modeling, there are two types of assumptions
Some of the assumptions are technical and are based on
the way we specify the model
◦ They can be tested statistically
◦ Violations may indicate complexity of the world not captured by
the used model

However some assumptions do not affect the technical
model specifications but rather how to interpret the
results
◦ Violation of these assumptions may not affect the model fit but
can lead to wrong conclusions
◦ In many cases, they cannot be tested directly
◦ The effects need to be considered when interpreting the results


The most important technical assumption for twin
modeling is that the variances are the for MZ and DZ twins
as well as for the first and second twin in the pair
The technical reason for this was presented already in the
previous presentations
◦ Same path coefficients are used both for MZ and DZ twins
◦ Necessary assumption for making possible to estimate the model
◦ The exception is sib-sib interaction we will treat later in this
presentation

A real problem is, however, that greater variation, for
example, in MZ twins may indicate that there is additional
source of variation
◦ For example, intrauterine conditions may lead to greater variation
of birth weight in MZ twins
◦ We cannot estimate the sources of this extra variation and thus
cannot take it into account when specifying the model


The assumption of similar variances in MZ and DZ
twins is closely related to the equal environment
assumption
This means that the same amount of environmental
variation affects MZ and DZ twins
◦ MZ and DZ twins are thus samples from the same basic
population

However this does not mean that MZ co-twins could
not share more environmental variation than DZ twins
◦ It is very likely that environment of MZ twins is more similar
than DZ co-twins
◦ Environment is partly reflection of genetic factors
◦ This part of environmental variation is (correctly) modeled
as genetic variation




In many cases twin models are written in the
way that equal means are also expected
However because in twin studies we are
interested in decomposing variation, this
does not violate the basic assumptions
Different mean parameters can be given for
different groups
Different means for MZ and DZ twins are
typical for many anthropometric traits
especially in childhood



Take the script “ACE univariate model.R”
Modify the model in a way that it allows
different means for MZ and DZ twins
When comparing the fit indexes, does the
model indicates that different means are
needed for MZ and DZ twins?

A standard way to test the technical assumptions of genetic twin
model is to compare the fit of twin model to the fit of saturated
model
◦ This is the case for OpenMx
◦ For example, MPlus allows directly compare the model fit to observed data

Saturated model does not make these assumptions but estimates
all statistics
◦ In practice it estimates means and variances for MZ and DZ twins as well
as for first and second co-twin in the pair
◦ Genetic twin model, such as ACE model, can be interpreted to be a submodel of the saturated model
◦ This allows to use -2LL statistics to compare models

If the fit of genetic twin model is not statistically significantly
worse than saturated model, it indicates that these assumptions
are fulfilled
◦ Using saturated model, it is possible to test these assumption separately




Take the script “Saturated model.R”
Check what assumptions the model is testing
Run the model and check whether these
assumptions are violated
Finally compare the fit of full saturated model
to fit of ACE twin model

Saturated model estimates 10 parameters

ACE model estimates 4 parameters



◦ Means for MZ and DZ and first and second co-twin (=4
parameters)
◦ Variances for MZ and DZ and first and second twin (=4
parameters)
◦ Covariances for MZ and DZ twins
◦ A, C and E components and one mean parameter
So the difference in -2LL follows χ2-distribution with
6 degrees of freedom
Naturally these numbers will be different, for
example, for sex-limitation model
Good always to think beforehand the number of
estimated parameters because discrepancy may
indicate problems how the model is specified




In addition to the assumptions related to the
building the model, there are assumptions
related to the interpretation of results
The violations of these assumption do not
create technical problems but may affect how
to interpret the results
The most important assumptions are random
mating and lack of gene-environment
interactions
Many models also assume that there is no
sib-sib interactions




As seen in the previous lectures, twin models
assume that genetic correlation is 0.5 in DZ twins
This is because 50% chance that DZ twins have
inherited the same allele from the same parents
However if there exists genetic correlation
between parents, it inflates the genetic
correlation of DZ twins
In practice this leads to the overestimation of
common environmental factors
◦ Common environmental component is estimated if DZ
correlation is more than half of MZ correlation
◦ If genetic correlation is more than 0.5 in DZ twins, this is
estimated as common environmental influence if there is
no dominance genetic inflluence


However, assortative mating does not necessarily create
genetic correlation between spouses
Trait correlation between spouses can be because of
environmental factors
◦ For example there are clear differences in height between
educational categories
◦ So if there is educational correlation between the spouses, it may
lead to correlation in height which is totally caused by
environmental factors

However there is also clear evidence that assortative
mating can be directly based on traits
◦ For example it is very likely that height affects directly spouse
selection
◦ This creates genetic correlation between spouses

These different hypotheses are difficult to test even if
information on parents are available in data
◦ Two generation models are available but in practice complicated


Twin models make usually the assumption of
lack of gene environment interactions
However there is accumulating evidence that
genetic and environmental factors interact
◦ Tomorrow a lecture on this specific topic


It is not possible to test all possible G*E
interactions
However when interpreting the results, the
possible role of G*E needs to be considered
Contributions of Genetic, Shared Environment, Genotype x Shared
Environment Interaction Effects to Twin/Sib Resemblance
Shared
Environment
Additive
Genetic Effects
Genotype x Shared
Environment
Interaction
MZ Pairs
1
1
1x1=1
DZ Pairs/Full Sibs
1
½
1x½=½
In other words—if gene-(shared) environment interaction is not explicitly
modeled, it will be subsumed into the A term in the classic twin model.
Contributions of Genetic, Unshared Environment, Genotype x
Unshared Environment Interaction Effects to Twin/Sib Resemblance
Unshared
(Unique)
Environment
Additive
Genetic Effects
Genotype x
Unshared
Environment
Interaction
MZ Pairs
0
1
0x1=0
DZ Pairs/Full Sibs
0
½
0x½=0
If gene-(unshared) environment interaction is not explicitly modeled,
it will be subsumed into the E term in the classic twin model.



If G-E interaction is not modeled it naturally does not mean that it would
not affect the results
In many cases we have not measured relevant environmental exposures,
but we have to speculate whether they can still explain the found results
G-E interaction may well be one reason why common environmental
influences are rarely seen even in the case when this in counterintuitive
◦ For example, the lack of common environmental effect in many psychological traits
◦ It may reflect rather that the effect of family related factors is modified by genetic
factors than the lack of this effect
◦ This has important scientific and policy implications


However even if G*E effects are not modeled, it does not mean that the
parameter estimates do not have interpretation
They describe heritability of the trait at the level of the environmental
factors in the population
◦ This is comparable to the situation of all other statistics
◦ For example if we have the estimate of mean height in the Finnish population it
applies in the given sex distribution but is different if the sex distribution will change







Usually twin models assume that there is no interaction
between co-twins
In some psychological trait this may not be the case
There can be collaboration and competition between twins
It can be shown that collaboration increases the estimates
of common environmental effects whereas competition
may increases the effect of dominance genetic effects
Sib-interaction effect can be modeled because
collaboration increases MZ variation more and competition
less as compared to DZ twins
So difference in variances between MZ and DZ twins may
be because of sib-sib interaction and this can be tested
This is, however, a rare finding
Sib-sib interaction in ACE model
1
1 / 0.5
1
1
1
1
A
C
1
A
E
C
c
a
1
c
a
e
e
s
BMITWIN1
E
s
BMITWIN2



Gene-gene interactions do not violate the
assumptions of twin modeling, but their effect cannot
be estimated
Gene-gene interactions are modeled in unknown
proposition depending on linkage disequilibrium
If the interaction is between two genes closely linked,
i.e. inherited together, it is modeled as part of
additive genetic variation
◦ In the classical twin design they form only one unit

However if the genes are weakly linked, i.e. usually
inherited separately, this effect mimics dominance
genetic variation
◦ This is understandable because in that case the genes will
be reshuffled between generations such as in the case for
dominance genetic effect







In the recent years epigenetics has become an important
area in genetics
Discordant MZ twin pairs have been very important to
study these issues
By using the classical twin design, it is also possible to
study the heritability of RNA transcription of single genes
However otherwise epigenetics has only minor impact on
twin design
Epigenetic variation is (correctly) modeled as part of A, C
or E component related to factors behind this variation
The same applies to all environmental factors
Thus from the classical twin design point of view
epigenetic is only one source of environmental variation
◦ Such as all environmental variation, part of it may be caused by
genetic factors
Population Research Unit
Department of Social Research
University of Helsinki