Intro to Mx Scripts - Virginia Commonwealth University

Introduction to Genetic Epidemiology
HGEN619, 2006
Hermine H. Maes
Genetic Epidemiology
- Establishing / quantifying the role of genes and environment in variation in disease and complex traits: answering questions about the importance of nature and nurture in individual differences
- Finding those genes and environmental factors
Genes & Environment
- How much of the variation in a trait is accounted for by genetic factors?
- Do shared environmental factors contribute significantly to the trait variation?
- The first of these questions addresses heritability, defined as the proportion of the total variance explained by genetic factors.
Nature-nurture question
- Sir Francis Galton: comparing the similarity of identical and fraternal twins yields information about the relative importance of heredity vs environment in individual differences
- Gregor Mendel: classical experiments demonstrated that the inheritance of model traits in carefully bred material agreed with a simple theory of particulate inheritance
- Ronald Fisher: first coherent account of how the 'correlations between relatives' could be explained 'on the supposition of Mendelian inheritance'
People and Ideas
- Darwin (1858, 1871): natural selection, sexual selection, evolution
- Galton (1865-ish): correlation, family resemblance, twins, ancestral heredity
- Mendel (1865): particulate inheritance; genes single in gamete, double in zygote; segregation ratios
- Spearman (1904): common factor analysis
- Fisher (1918): correlation & Mendel, maximum likelihood, ANOVA: partition of variance
- Wright (1921): path analysis
- Thurstone (1930's): multiple factor analysis
- Mather (1949) & Jinks (1971): biometrical genetics, model fitting (plants)
- Watson & Crick (1953): molecular genetics
- Joreskog (1960): covariance structure analysis, LISREL
- Jinks & Fulker (1970): model fitting applied to humans
- Morton (1974): path analysis & family resemblance
- Population genetics
- Elston etc (19..): segregation, linkage
- Rao, Rice, Reich, Cloninger (1970's): assortment, cultural inheritance
- Martin & Eaves (1977): genetic analysis of covariance structure
- Neale (1990): Mx
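Biometrical Model
Genotypic values, measured on a scale centred at the midpoint m between the two homozygotes:
aa = m - d    Aa = m + h    AA = m + d
(d: deviation of each homozygote from the midpoint; h: deviation of the heterozygote, i.e. dominance)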
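Combining these genotypic values with genotype frequencies gives the population mean and the split of genetic variance into additive and dominance parts. The sketch below assumes Hardy-Weinberg proportions (p², 2pq, q²) and uses the standard single-locus results VA = 2pq[d + h(q - p)]² and VD = (2pqh)², written in the d/h notation above; the function names are illustrative only.

```python
def genotype_table(p: float, d: float, h: float) -> list[tuple[float, float]]:
    """(frequency, value from midpoint m) for AA, Aa and aa.

    ASSUMPTION: Hardy-Weinberg proportions, p = freq(A), q = 1 - p = freq(a).
    """
    q = 1.0 - p
    return [(p * p, d), (2 * p * q, h), (q * q, -d)]


def direct_moments(p: float, d: float, h: float) -> tuple[float, float]:
    """Mean deviation from m and total genetic variance, computed directly."""
    table = genotype_table(p, d, h)
    mean = sum(f * x for f, x in table)
    var = sum(f * (x - mean) ** 2 for f, x in table)
    return mean, var


def variance_components(p: float, d: float, h: float) -> tuple[float, float]:
    """Additive (VA) and dominance (VD) variance for a single locus."""
    q = 1.0 - p
    alpha = d + h * (q - p)  # average effect of substituting a for A
    va = 2 * p * q * alpha ** 2
    vd = (2 * p * q * h) ** 2
    return va, vd


if __name__ == "__main__":
    p, d, h = 0.3, 1.0, 0.5  # hypothetical locus with partial dominance
    mean, vg = direct_moments(p, d, h)
    va, vd = variance_components(p, d, h)
    print(f"mean deviation from m = {mean:.4f}")  # equals d(p - q) + 2pqh
    print(f"VG = {vg:.4f}, VA + VD = {va + vd:.4f}")
```

With p = 0.3, d = 1 and h = 0.5 both routes give a total genetic variance of 0.6489, illustrating VG = VA + VD; with h = 0 the dominance variance vanishes.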
To make the simple two-allele model concrete, let us imagine that we are talking about genes that influence adult stature. Let us assume that the normal range of height for males is from 4 feet 10 inches to 6 feet 8 inches; that is, about 22 inches. And let us assume that each somatic chromosome has one gene of roughly equivalent effect. Then, roughly speaking, we are thinking in terms of loci for which the homozygotes contribute ±1/2 inch (from the midpoint), depending on whether they are AA, the increasing homozygote, or aa, the decreasing homozygote. In reality, although some loci may contribute greater effects than this, others will almost certainly contribute less; thus we are talking about the kind of model in which any particular polygene is having an effect that would be difficult to detect by the methods of classical genetics.
(From the Biometrical Genetics chapter in Methodology for Genetic Studies of Twins and Families)
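The arithmetic in the quoted passage can be checked directly: 22 loci (one per somatic chromosome), each homozygote at ±1/2 inch, give extremes 11 inches either side of the midpoint, i.e. the 22-inch range from 58 in (4'10") to 80 in (6'8"). The sketch below additionally assumes no dominance (heterozygotes at the midpoint), an increasing-allele frequency of 0.5 and independent loci; these are illustrative assumptions, not part of the quoted model.

```python
from math import comb, sqrt

N_LOCI = 22                 # one locus per somatic chromosome (from the passage)
HOMOZYGOTE_EFFECT = 0.5     # inches from the midpoint for AA (+) and aa (-)
MIDPOINT = (58 + 80) / 2    # midpoint of 4'10" (58 in) and 6'8" (80 in)
P_INCREASING = 0.5          # ASSUMPTION: frequency of the increasing allele


def height(k_increasing: int) -> float:
    """Height for a genotype carrying k increasing alleles out of 2 * N_LOCI.

    ASSUMPTION: no dominance, so a locus with n_A increasing alleles
    contributes (n_A - 1) * 0.5 inch.
    """
    return MIDPOINT + (k_increasing - N_LOCI) * HOMOZYGOTE_EFFECT


def height_distribution() -> dict[float, float]:
    """Exact probability of each height class (binomial count of increasing alleles)."""
    n = 2 * N_LOCI
    return {height(k): comb(n, k) * P_INCREASING ** k * (1 - P_INCREASING) ** (n - k)
            for k in range(n + 1)}


if __name__ == "__main__":
    dist = height_distribution()
    print("range:", min(dist), "to", max(dist), "inches")
    print("P(tallest genotype) =", dist[max(dist)])
    sd = HOMOZYGOTE_EFFECT * sqrt(2 * N_LOCI * P_INCREASING * (1 - P_INCREASING))
    print(f"genetic SD = {sd:.2f} inches")
```

The extreme genotypes each occur with probability 0.5^44 under these assumptions, which is why almost everyone falls well inside the 22-inch range.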
Polygenic Traits
- 1 gene: 3 genotypes, 3 phenotypes
- 2 genes: 9 genotypes, 5 phenotypes
- 3 genes: 27 genotypes, 7 phenotypes
- 4 genes: 81 genotypes, 9 phenotypes
[Figure: number of genotypes in each phenotypic class for 1, 2, 3 and 4 genes]
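The counts on this slide follow from enumeration: n two-allele genes give 3^n genotypes, and if every gene has an equal, purely additive effect the phenotype depends only on the number of increasing alleles (0 to 2n), giving 2n + 1 classes. A minimal sketch under that equal-additive-effects assumption:

```python
from collections import Counter
from itertools import product


def phenotype_classes(n_genes: int) -> Counter:
    """Number of genotypes in each phenotypic class for n additive two-allele genes.

    ASSUMPTION: equal, purely additive gene effects, so the phenotype is just
    the count of increasing alleles (0, 1 or 2 per gene).
    """
    increasing_alleles = {"aa": 0, "Aa": 1, "AA": 2}
    classes = Counter()
    for genotype in product(increasing_alleles, repeat=n_genes):
        classes[sum(increasing_alleles[g] for g in genotype)] += 1
    return classes


if __name__ == "__main__":
    for n in range(1, 5):
        classes = phenotype_classes(n)
        print(f"{n} gene(s): {sum(classes.values())} genotypes, "
              f"{len(classes)} phenotypes, genotypes per class: "
              f"{[classes[k] for k in sorted(classes)]}")
```

For two genes the classes hold 1, 2, 3, 2 and 1 genotypes; as genes are added the profile fills in towards a bell shape.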
Stature in adolescent twins
[Figure: histogram of stature (cm) in women: mean = 169.1, SD = 6.40, N = 1785]
Individual differences
- Physical attributes (height, eye color)
- Disease susceptibility (asthma, anxiety)
- Behavior (intelligence, personality)
- Life outcomes (income, children)
Polygenic Model
- Polygenic model: variation for a trait caused by a large number of individual genes, each inherited in strict conformity to Mendel's laws
- Multifactorial model: many genes and many environmental factors, each of small and equal effect
- The effects of many small factors combine to give a normal (Gaussian) distribution of trait values, according to the central limit theorem
Central Limit Theorem
- The normal distribution is to be expected whenever variation is produced by the addition of a large number of effects, none of them predominant
- This holds quite often, for example for quantitative traits
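A quick simulation shows the central limit theorem at work: adding many small, independent, non-predominant effects yields an approximately normal trait. The sketch assumes each factor contributes a value drawn uniformly from [-1, 1]; that choice is arbitrary, and the factors could equally be genes or environmental influences.

```python
import random
import statistics


def simulate_trait(n_people: int, n_factors: int, seed: int = 1) -> list[float]:
    """Trait values as the sum of many small, independent effects.

    ASSUMPTION: each factor is uniform on [-1, 1] (variance 1/3), so no
    single factor predominates.
    """
    rng = random.Random(seed)
    return [sum(rng.uniform(-1.0, 1.0) for _ in range(n_factors))
            for _ in range(n_people)]


if __name__ == "__main__":
    values = simulate_trait(n_people=20_000, n_factors=50)
    mu, sd = statistics.fmean(values), statistics.stdev(values)
    within_1sd = sum(abs(v - mu) <= sd for v in values) / len(values)
    print(f"mean = {mu:.3f}, SD = {sd:.3f} (theory: 0 and {(50 / 3) ** 0.5:.3f})")
    print(f"share within 1 SD = {within_1sd:.3f} (normal: about 0.683)")
```

Although each factor is flat (uniform), the share of simulated values within one SD of the mean comes out close to the normal-distribution value of about 68%.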
Continuous or Categorical?
- Body Mass Index vs “obesity”
- Blood pressure vs “hypertensive”
- Bone Mineral Density vs “fracture”
- Bronchial reactivity vs “asthma”
- Neuroticism vs “anxious/depressed”
- Reading ability vs “dyslexic”
- Aggressive behavior vs “delinquent”
Multifactorial Threshold Model of Disease
[Figure: a single threshold on the disease liability scale separating unaffected from affected; multiple thresholds separating normal, mild, moderate and severe]
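Under the threshold model, the proportion of people above a threshold on a normally distributed liability gives the prevalence, and several thresholds split the population into ordered categories. A minimal sketch, assuming a standard normal liability; the thresholds used in the example are hypothetical.

```python
from statistics import NormalDist

# ASSUMPTION: liability is scaled to a standard normal distribution.
LIABILITY = NormalDist(mu=0.0, sigma=1.0)


def threshold_for_prevalence(prevalence: float) -> float:
    """Single threshold: liability value above which a proportion
    `prevalence` of the population is affected."""
    return LIABILITY.inv_cdf(1.0 - prevalence)


def category_proportions(thresholds: list[float]) -> list[float]:
    """Multiple thresholds: expected proportion in each ordered category
    (e.g. normal, mild, moderate, severe)."""
    cuts = [float("-inf"), *sorted(thresholds), float("inf")]
    return [LIABILITY.cdf(upper) - LIABILITY.cdf(lower)
            for lower, upper in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    print(f"threshold for 5% prevalence: {threshold_for_prevalence(0.05):.3f}")
    # Hypothetical thresholds separating normal | mild | moderate | severe.
    for label, share in zip(["normal", "mild", "moderate", "severe"],
                            category_proportions([0.0, 1.0, 2.0])):
        print(f"{label}: {share:.3f}")
```

A 5% prevalence corresponds to a threshold about 1.645 standard deviations above the mean liability.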
Genetically Complex Diseases
- Imprecise phenotype
- Phenocopies / sporadic cases
- Low penetrance
- Locus heterogeneity / polygenic effects
Complex Trait Model
[Figure: complex trait model relating a marker (through linkage and linkage disequilibrium, studied by linkage and association analysis), Gene1, Gene2, Gene3, polygenic background, common environment and individual environment to the disease phenotype (mode of inheritance)]
Causes of Variation
- pre-1990: estimation of 'anonymous' genetic and environmental components of phenotypic variation
  - genetic epidemiologic studies
- post-1990: identification of QTLs (quantitative trait loci) contributing to genetic variation in complex (quantitative) traits
  - linkage and association studies
Stages of Genetic Mapping
- Are there genes influencing this trait? → Genetic epidemiological studies
- Where are those genes? → Linkage analysis
- What are those genes? → Association analysis
Partitioning Variation
- Phenotypic variance (VP) is partitioned into genetic (VG) and environmental (VE) variance: VP = VG + VE
- Assumptions: additivity and independence of genetic and environmental effects
- Heritability (h2): proportion of variance due to genetic influences (h2 = VG / VP)
  - a property of a group (not an individual), thus specific to a group in place and time
Sources of Variance
- Genetic factors:
  - Additive (A)
  - Dominance (D)
- Environmental factors:
  - Common / Shared (C)
  - Specific / Unique (E)
  - Measurement error, confounded with E
Genetic Factors
- Additive genetic factors (A): sum of all the effects of individual loci
- Non-additive genetic factors: result of interactions between alleles at the same locus (dominance, D) or between alleles at different loci (epistasis)
Environmental Factors
- Shared [common or between-family] environmental factors (C): aspects of the environment shared by members of the same family or by people who live together; they contribute to similarity between relatives
- Non-shared [specific, unique or within-family] environmental factors (E): unique to an individual; they contribute to variation among family members but not to their covariation
Estimating Components
- Estimate phenotypic variance components from data on covariances of related individuals
- Different types of relative pairs share different amounts of phenotypic variance
- Biometrical genetics theory specifies these amounts in terms of genetic and environmental variances
- Three major types of study: family, adoption and twin studies
Designs to disentangle G+E
- Resemblance between relatives caused by:
  - Shared genes (G = A + D)
  - Shared environment, common to family members (C)
- Differences between relatives caused by:
  - Non-shared genes
  - Unique environment (E)
Informative Designs
- Family studies: G + C confounded
- MZ twins alone: G + C confounded
- MZ twins reared apart: rare, atypical, selective placement?
- Adoption studies: increasingly rare, atypical, selective placement?
- MZ and DZ twins reared together
- Extended twin design
Classical Twin Study
- MZ and DZ twins reared together
  - MZ twins are genetically identical
  - DZ twins share on average half their genes
- Equal Environments Assumption
  - MZ and DZ twins share relevant environmental influences to the same extent
Zygosity
- Identity at marker loci, except for rare mutations
[Figure: MZ and DZ twin pairs: determining zygosity using ABI Profiler™ genotyping (9 STR markers + sex)]
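The zygosity rule above can be written as a small classifier: a same-sex pair identical at every genotyped marker is called MZ, anything else DZ. This is a sketch, not the genotyping lab's procedure; the marker names and genotypes are hypothetical, and `max_mismatches` is an illustrative allowance for a rare mutation or genotyping error.

```python
Genotype = tuple[int, int]  # two STR allele sizes at one marker (order irrelevant)


def classify_zygosity(twin1: dict[str, Genotype], twin2: dict[str, Genotype],
                      sex1: str, sex2: str, max_mismatches: int = 0) -> str:
    """Return "MZ", "DZ" or "undetermined" for a twin pair.

    MZ twins should match at every marker and for sex; max_mismatches
    (hypothetical parameter) tolerates a rare mutation or typing error.
    """
    if sex1 != sex2:
        return "DZ"  # opposite-sex pairs cannot be MZ
    markers = twin1.keys() & twin2.keys()
    if not markers:
        return "undetermined"
    mismatches = sum(sorted(twin1[m]) != sorted(twin2[m]) for m in markers)
    return "MZ" if mismatches <= max_mismatches else "DZ"


if __name__ == "__main__":
    # Hypothetical genotypes at three hypothetical markers.
    t1 = {"marker_1": (12, 14), "marker_2": (8, 9), "marker_3": (20, 22)}
    t2 = {"marker_1": (14, 12), "marker_2": (8, 9), "marker_3": (20, 22)}
    t3 = {"marker_1": (12, 15), "marker_2": (8, 8), "marker_3": (20, 22)}
    print(classify_zygosity(t1, t2, "F", "F"))  # identical at all markers
    print(classify_zygosity(t1, t3, "F", "F"))  # differs at two markers
```

Allele order is ignored because a genotype (12, 14) and (14, 12) describe the same pair of alleles.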
MZ & DZ Correlations
- rMZ > rDZ: indicates G (heritability)
- C increases both rMZ and rDZ
- The relative magnitude of the MZ and DZ correlations indicates the contributions of additive genetic (G) and shared environmental (C) factors
- 1 - rMZ: indicates the importance of specific environmental (E) factors
Twin Correlations
[Figure: example patterns of MZ and DZ twin correlations and the contributions of A, C and E they imply]
Example
- If VP = VA + VC + VE = 2.0, CovMZ = VA + VC = 1.6 and CovDZ = 1/2 VA + VC = 1.2,
- then, by algebra, VA = 2(CovMZ - CovDZ) = 0.8, VC = CovMZ - VA = 0.8 and VE = VP - CovMZ = 0.4
- But it isn't always so simple: consider VP = 1.0, CovMZ = 0.6 and CovDZ = 0.65
- then VA = -0.1, VC = 0.7 and VE = 0.4, a nonsensical negative variance component
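The algebra on this slide can be written as a direct solver: subtracting the DZ equation from the MZ equation gives 1/2 VA = CovMZ - CovDZ, and the other components follow. Because nothing constrains the solution, the second set of statistics produces the negative VA. The function name is illustrative.

```python
def ace_from_covariances(v_p: float, cov_mz: float, cov_dz: float) -> dict[str, float]:
    """Solve VP = VA + VC + VE, CovMZ = VA + VC, CovDZ = 1/2 VA + VC."""
    v_a = 2.0 * (cov_mz - cov_dz)  # CovMZ - CovDZ = 1/2 VA
    v_c = cov_mz - v_a             # equivalently 2 CovDZ - CovMZ
    v_e = v_p - cov_mz             # VP - CovMZ = VE
    return {"VA": v_a, "VC": v_c, "VE": v_e}


if __name__ == "__main__":
    for v_p, cov_mz, cov_dz in [(2.0, 1.6, 1.2), (1.0, 0.6, 0.65)]:
        estimates = ace_from_covariances(v_p, cov_mz, cov_dz)
        flag = "negative component!" if min(estimates.values()) < 0 else ""
        print({k: round(v, 3) for k, v in estimates.items()}, flag)
```

Solving exactly like this ignores sampling error and admissible ranges, which is one motivation for the model-fitting approach on the following slides.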
Observed Statistics
- Trait variance and the MZ and DZ covariances as unique observed statistics
- Estimate the contributions of additive genes (A), shared (C) and specific (E) environmental factors, according to the genetic model
- Path analysis is a useful tool for generating the expectations for the variances and covariances under a model
Path Analysis
- Allows us to represent linear models for the relationships between variables diagrammatically
- Makes it easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model
- Permits translation into matrix formulation
Variance Components
P = eE + aA + cC + dD
[Figure: path diagram in which the phenotype P receives paths e, a, c and d from unique environment (E), additive genetic (A), shared environment (C) and dominance genetic (D) factors]
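ACE Model Path Diagram for MZ & DZ Twins
[Figure: twin 1 (PT1) and twin 2 (PT2) each receive paths a, c and e from their own A, C and E factors; the A factors correlate 1.0 in MZ and 0.5 in DZ pairs]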
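Tracing the paths in the ACE diagram gives the expected statistics: each twin's variance is a² + c² + e², and the twin covariance is a² + c² for MZ pairs and 0.5a² + c² for DZ pairs. A minimal sketch, assuming standardized latent factors, a C factor fully shared by both twins (correlation 1.0) and uncorrelated E factors; the path values are chosen to reproduce the Example slide.

```python
import math


def expected_twin_covariance(a: float, c: float, e: float, r_a: float) -> list[list[float]]:
    """2x2 expected covariance matrix of (PT1, PT2) under the ACE path model.

    r_a is the correlation between the twins' A factors (1.0 MZ, 0.5 DZ).
    ASSUMPTION: C is shared (correlation 1.0), E is uncorrelated between twins.
    """
    variance = a ** 2 + c ** 2 + e ** 2      # paths from PT1 back to PT1
    covariance = r_a * a ** 2 + c ** 2       # paths via correlated A and shared C
    return [[variance, covariance], [covariance, variance]]


if __name__ == "__main__":
    a, c, e = math.sqrt(0.8), math.sqrt(0.8), math.sqrt(0.4)
    for label, r_a in [("MZ", 1.0), ("DZ", 0.5)]:
        sigma = expected_twin_covariance(a, c, e, r_a)
        print(label, [[round(x, 3) for x in row] for row in sigma])
```

Writing the expectations as matrices is the step that lets a package like Mx compare them with observed covariance matrices.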
Model Fitting
- Evaluate the significance of variance components: depends on effect size and sample size
- Evaluate the goodness-of-fit of the model: closeness of observed and expected values
- Compare fit under alternative models
- Obtain maximum likelihood estimates
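The maximum likelihood idea can be sketched with the standard ML discrepancy function for covariance matrices, F = ln|Σ| + tr(SΣ⁻¹) - ln|S| - p, summed over the MZ and DZ groups and weighted by sample size. This is an illustrative stand-in, not how Mx itself optimizes: it uses a coarse grid search over non-negative variance components, so it can never return a negative estimate like the one in the Example. The observed matrices are the Example statistics; the sample sizes are hypothetical.

```python
import math
from itertools import product

Matrix = list[list[float]]


def det2(m: Matrix) -> float:
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m: Matrix) -> Matrix:
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def ml_discrepancy(s: Matrix, sigma: Matrix) -> float:
    """F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p, with p = 2 (twin 1, twin 2)."""
    sigma_inv = inv2(sigma)
    trace = sum(s[i][k] * sigma_inv[k][i] for i in range(2) for k in range(2))
    return math.log(det2(sigma)) + trace - math.log(det2(s)) - 2


def expected(va: float, vc: float, ve: float, r_a: float) -> Matrix:
    """Expected twin covariance matrix; r_a = 1.0 for MZ, 0.5 for DZ pairs."""
    var, cov = va + vc + ve, r_a * va + vc
    return [[var, cov], [cov, var]]


def fit_ace(s_mz: Matrix, s_dz: Matrix, n_mz: int, n_dz: int,
            step: float = 0.05, upper: float = 2.0) -> tuple[float, float, float, float]:
    """Grid search for (F, VA, VC, VE) minimizing the sample-size-weighted F."""
    grid = [i * step for i in range(int(round(upper / step)) + 1)]
    best = (math.inf, 0.0, 0.0, 0.0)
    for va, vc, ve in product(grid, repeat=3):
        if ve <= 0.0:
            continue  # VE > 0 keeps both expected matrices positive definite
        f = (n_mz * ml_discrepancy(s_mz, expected(va, vc, ve, 1.0))
             + n_dz * ml_discrepancy(s_dz, expected(va, vc, ve, 0.5)))
        if f < best[0]:
            best = (f, va, vc, ve)
    return best


if __name__ == "__main__":
    s_mz = [[2.0, 1.6], [1.6, 2.0]]  # observed statistics from the Example slide
    s_dz = [[2.0, 1.2], [1.2, 2.0]]
    f, va, vc, ve = fit_ace(s_mz, s_dz, n_mz=100, n_dz=100)  # hypothetical N
    print(f"F = {f:.6f}, VA = {va:.2f}, VC = {vc:.2f}, VE = {ve:.2f}")
```

Because the Example statistics fit the ACE model exactly, the search recovers VA = 0.8, VC = 0.8 and VE = 0.4 with a discrepancy of essentially zero; a real analysis would use a proper numerical optimizer and standard errors.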
Mx
- Structural equation modeling package
  - Software: www.vcu.edu/mx
  - Manual: Neale et al. 2006
  - Free

Structural equation modeling
- Both continuous and categorical variables
- Systematic approach to hypothesis testing
- Tests of significance
- Can be extended to:
  - More complex questions
  - Multiple variables
  - Other relatives
SEM: more complex questions I
- Are the same genes acting in males and females? (sex limitation)
- Role of age on (a) the mean, (b) the variance and (c) the variance components
- Are G & E equally important in age and country cohorts? (heterogeneity)
- Are G & E the same in other strata (e.g. married/unmarried)? (G x E interaction)
SEM: more complex questions II
- Do the same genes account for variation in multiple phenotypes? (multivariate analysis)
- Do the same genes account for variation in phenotypes measured at different ages? (longitudinal analysis)
- Do specific genes account for variation/covariation in phenotypes? (linkage/association)
Linkage & Association Analysis
Stages of Genetic Mapping
- Are there genes influencing this trait? → Epidemiological studies
- Where are those genes? → Linkage analysis
- What are those genes? → Association analysis
Linkage Analysis
- Sharing between relatives
- Identifies large regions, which include several candidate genes
- Complex disease: scans on sets of small families are popular
- No strong assumptions about disease alleles
- Low power
- Limited resolution
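Linkage analysis works with allele sharing identical by descent (IBD) between relatives: at any locus, full sibs share 0, 1 or 2 parental alleles IBD with probabilities 1/4, 1/2 and 1/4, and linkage is suggested where trait-similar relatives share more than this. The sketch enumerates all transmissions exactly, assuming the four parental alleles are distinguishable so that sharing is counted by descent rather than by state.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sib_ibd_distribution() -> dict[int, Fraction]:
    """Exact IBD distribution at one locus for a full-sib pair.

    Each parent carries two distinguishable alleles (labelled 0 and 1) and
    transmits one of them to each child with probability 1/2, independently.
    """
    counts = Counter()
    # paternal allele to sib 1, to sib 2; maternal allele to sib 1, to sib 2
    for pat1, pat2, mat1, mat2 in product((0, 1), repeat=4):
        counts[(pat1 == pat2) + (mat1 == mat2)] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


if __name__ == "__main__":
    dist = sib_ibd_distribution()
    print({k: str(p) for k, p in dist.items()})
    print("expected proportion IBD:", sum(k * p for k, p in dist.items()) / 2)
```

The expected proportion of alleles shared IBD is 1/2, the "on average half their genes" figure used for DZ twins and other full sibs.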
[Figure: linkage scan]
Association Analysis
- Sharing between unrelated individuals: trait alleles originate in a common ancestor
- High resolution: recombination since the common ancestor
- Large number of independent tests
- Powerful if assumptions are met: same disease haplotype shared by many patients
- Sensitive to population structure
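One common single-marker association test is a chi-square test on a 2x2 table of allele counts in cases and controls; with a large number of tests, the significance level must be corrected, and Bonferroni (alpha divided by the number of tests) is the simplest, conservative option. Both are swapped-in illustrations rather than methods from these slides, and the counts and number of tests below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allelic_chi_square(cases: tuple[int, int],
                       controls: tuple[int, int]) -> tuple[float, float]:
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    cases and controls are (count of allele 1, count of allele 2).
    Returns (chi-square statistic, p-value).
    """
    table = [cases, controls]
    n = sum(cases) + sum(controls)
    row_totals = [sum(row) for row in table]
    col_totals = [cases[j] + controls[j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) for a standard normal Z.
    p_value = 2.0 * (1.0 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p_value


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level keeping the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    chi2, p = allelic_chi_square(cases=(60, 40), controls=(40, 60))  # hypothetical
    threshold = bonferroni_threshold(0.05, n_tests=100_000)          # hypothetical
    print(f"chi2 = {chi2:.2f}, p = {p:.4g}, significant after correction: {p < threshold}")
```

Here a result that looks convincing for a single test (p of roughly 0.005) is far from significant once 100,000 tests are taken into account. Allele-count tests also ignore population structure, one reason association results are sensitive to it.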
[Figure: association scan]
Proof of Concept: genes/regions from genome scans
- Breast cancer: DLC-1, Chr 8q, Chr 13q
- Lung cancer: CD44, Chr 22q
- Melanoma: B-RAF
- Type 2 diabetes: PPAR, PPP1R3A
- HDL-C plasma level: CETP, LPL
- Osteoarthritis: AGC1
- Schizophrenia: DDC, FOXA2, Chr 1q
First (unequivocal) positional cloning of a complex disease QTL!
[Figure: number of genes identified from QTL by year. From R. Korstanje & B. Paigen, "From QTL to gene: the harvest begins", Nature Genetics 31, 235-236 (2002)]