Download AA - Virginia Institute for Psychiatric and Behavioral Genetics

Intro to Quantitative Genetics
HGEN502, 2011
Hermine H. Maes
Intro to Quantitative Genetics





1/18: Course introduction; Introduction to Quantitative Genetics & Genetic Model Building
1/20: Study Design and Genetic Model Fitting
1/25: Basic Twin Methodology
1/27: Advanced Twin Methodology and Scope of Genetic Epidemiology
2/1: Quantitative Genetics Problem Session
Aims of this talk
Historical Background
Genetical Principles
- Genetic Parameters: additive, dominance
- Biometrical Model
Statistical Principles
- Basic concepts: mean, variance, covariance
- Path Analysis
- Likelihood
Quantitative Genetics Principles

Analysis of patterns and mechanisms underlying variation in continuous traits to resolve and identify their genetic and environmental causes
- Continuous traits have a continuous phenotypic range; often polygenic & influenced by environmental effects
- Ordinal traits are expressed in whole numbers; can be treated as approximately discontinuous or as threshold traits
- Some qualitative traits can be treated as having an underlying quantitative basis, expressed as a threshold trait (or multiple thresholds)
Types of Genetic Influence

Mendelian Disorders
- Single gene, highly penetrant, severe, small % affected (e.g., Huntington’s Disease)
Chromosomal Disorders
- Insertions, deletions of chromosomal sections, severe, small % affected (e.g., Down’s Syndrome)
Complex Traits
- Multiple genes (of small effect), environment, large % of population, susceptibility, not destiny (e.g., depression, alcohol dependence, etc.)
Genetic Disorders
Great 19th Century Biologists

Gregor Mendel (1822-1884): Mathematical rules of particulate inheritance (“Mendel’s Laws”)
Charles Darwin (1809-1882): Evolution depends on differential reproduction of inherited variants
Francis Galton (1822-1911): Systematic measurement of family resemblance
Karl Pearson (1857-1936): “Pearson Correlation”; graduate student of Galton
Family Measurements
Standardize Measurement
Pearson and Lee’s diagram for measurement of “span” (finger-tip to finger-tip distance)
Parent Offspring Correlations
From Pearson and Lee (1903) p.378
Sibling Correlations
From Pearson and Lee (1903) p.387
Nuclear Family Correlations
© Lindon Eaves, 2009
Quantitative Genetic Strategies

Family Studies
- Does the trait aggregate in families?
- The (Really!) Big Problem: Families are a mixture of genetic and environmental factors
Twin Studies
- Galton’s solution: Twins
- One (Ideal) solution: Twins separated at birth
- But unfortunately MZAs (MZ twins reared apart) are rare
- Easier solution: MZ & DZ twins reared together
Twin Studies Reared Apart

Minnesota Study of Twins Reared Apart (T. Bouchard et al., 1979)
>100 sets of reared-apart twins from across the US & UK
All pairs spent formative years apart (but vary tremendously in amount of contact prior to study)
56 MZAs participated
Types of Twins

Monozygotic (MZ; “identical”): result from fertilization of a single egg by a single sperm; share 100% of genetic material
Dizygotic (DZ; “fraternal” or “non-identical”): result from independent fertilization of two eggs by two sperm; share on average 50% of their genes
Logic of Classical Twin Study





MZs share 100% of their genes, DZs (on avg) 50%
Both twin types are assumed to share 100% of their common (family) environment
If rMZ > rDZ, then genetic factors are important
If rDZ > ½ rMZ, then growing up in the same home is important
If rMZ < 1, then non-shared environmental factors are important
Causes of Twinning


For MZs, appears to be random
For DZs,
- Increases with mother’s age (follicle stimulating hormone, FSH, levels increase with age)
- Hereditary factors (FSH)
- Fertility treatment
- Rates of twins/multiple births are increasing, currently ~3% of all births
Zygosity of Twins
Chorionicity of Twins
100% of DZ twins are dichorionic
~1/3 of MZ twins are dichorionic and
~2/3 are monochorionic
Twin Correlations
Virginia Twin Study of Adolescent Behavioral Development
[Figure: scatterplots of age- and sex-corrected stature in twins; axes HTDEV1 and HTDEV2. Panel "MZ Stature": r = 0.924. Panel "DZ Stature": r = 0.535.]
© Lindon Eaves, 2009
Ronald Fisher (1890-1962)





1918: On the Correlation Between Relatives on the Supposition of Mendelian Inheritance
1921: Introduced concept of “likelihood”
1930: The Genetical Theory of Natural Selection
1935: The Design of Experiments
Fisher developed mathematical theory that reconciled Mendel’s work with Galton and Pearson’s correlations
Fisher (1918): Basic Ideas







Continuous variation caused by lots of genes (polygenic inheritance)
Each gene followed Mendel’s laws
Environment smoothed out genetic differences
Genes may show different degrees of dominance
Genes may have many forms (multiple alleles)
Mating may not be random (assortative mating)
Showed that correlations obtained by Pearson & Lee were explained well by polygenic inheritance
[“Mendelian” Crosses with Quantitative Traits]
Biometrical Genetics
Lots of credit to:
Manuel Ferreira, Shaun Purcell
Pak Sham, Lindon Eaves
Building a Genetic Model
Revisit common genetic parameters, such as allele frequencies, genetic effects, dominance, variance components, etc.
Use these parameters to construct a biometrical genetic model
Model that expresses the (1) Mean, (2) Variance and (3) Covariance between individuals for a quantitative phenotype as a function of genetic parameters.
Genetic Concepts
[Figure: genotypes (G) and phenotypes (P) at three levels]
Population level: allele and genotype frequencies
Transmission level: Mendelian segregation; genetic relatedness
Phenotype level: biometrical model; additive and dominance components
Population level
1. Allele frequencies
A single locus, with two alleles
- Biallelic / diallelic
- Single nucleotide polymorphism, SNP
Alleles A and a
- Frequency of A is p
- Frequency of a is q = 1 - p
Every individual inherits two alleles
- A genotype is the combination of the two alleles
- e.g. AA, aa (the homozygotes) or Aa (the heterozygote)
Population level
2. Genotype frequencies (Random mating)
                 Allele 2
Allele 1         A (p)          a (q)
A (p)            AA ($p^2$)     Aa ($pq$)
a (q)            aA ($qp$)      aa ($q^2$)

Hardy-Weinberg Equilibrium frequencies
P(AA) = $p^2$
P(Aa) = $2pq$
P(aa) = $q^2$
$p^2 + 2pq + q^2 = 1$
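The Hardy-Weinberg frequencies above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `hardy_weinberg` and the allele frequency 0.3 are hypothetical choices for illustration.

```python
def hardy_weinberg(p):
    """Genotype frequencies (AA, Aa, aa) under random mating for allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p  # frequency of allele a
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical allele frequency, chosen only for illustration.
freqs = hardy_weinberg(0.3)
for genotype, freq in freqs.items():
    print(genotype, round(freq, 4))
print("total", round(sum(freqs.values()), 4))
```

The three frequencies always sum to 1 because they are the terms of the expansion of (p + q) squared with p + q = 1.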
Transmission level
Mendel’s experiments
Pure Lines: AA × aa
F1: Aa
Intercross: Aa × Aa
Offspring: AA, Aa, Aa, aa
3:1 Segregation Ratio
Transmission level
Back cross: F1 (Aa) × Pure line (aa)
Offspring: Aa, aa
1:1 Segregation ratio
Transmission level
Mendel’s law of segregation
Segregation (meiosis) of each parent's alleles into gametes:
                        Mother (A3A4) gametes
Father (A1A2) gametes   A3 (½)        A4 (½)
A1 (½)                  A1A3 (¼)      A1A4 (¼)
A2 (½)                  A2A3 (¼)      A2A4 (¼)
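As an illustration of the segregation table, the sketch below enumerates the four equally likely gamete combinations for any pair of parental genotypes. It is an added example, not from the slides; `offspring_distribution` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(father, mother):
    """Offspring genotype probabilities; each parent passes on either allele with probability 1/2."""
    dist = Counter()
    for paternal, maternal in product(father, mother):
        genotype = tuple(sorted((paternal, maternal)))  # order of alleles is irrelevant
        dist[genotype] += Fraction(1, 4)
    return dict(dist)


cross = offspring_distribution(("A1", "A2"), ("A3", "A4"))   # table on the slide
intercross = offspring_distribution(("A", "a"), ("A", "a"))  # F1 x F1
backcross = offspring_distribution(("A", "a"), ("a", "a"))   # F1 x pure line

for name, dist in (("cross", cross), ("intercross", intercross), ("backcross", backcross)):
    print(name, {"".join(g): str(prob) for g, prob in dist.items()})
```

With A dominant, the intercross gives AA + Aa = ¾ against aa = ¼ (the 3:1 ratio), and the back cross gives Aa = ½ against aa = ½ (the 1:1 ratio).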
Phenotype level
1. Classical Mendelian traits
Dominant trait (D - presence, R - absence)
- AA, Aa: D
- aa: R
Recessive trait (D - absence, R - presence)
- AA, Aa: D
- aa: R
Codominant trait (X, Y, Z)
- AA: X
- Aa: Y
- aa: Z
Phenotype level
2. Dominant Mendelian inheritance
                      Mother (Dd) gametes
Father (Dd) gametes   D (½)      d (½)
D (½)                 DD (¼)     Dd (¼)
d (½)                 dD (¼)     dd (¼)
Phenotype level
3. Dominant Mendelian inheritance with incomplete
penetrance and phenocopies
                      Mother (Dd) gametes
Father (Dd) gametes   D (½)      d (½)
D (½)                 DD (¼)     Dd (¼)
d (½)                 dD (¼)     dd (¼)
Incomplete penetrance: some carriers of D (DD, Dd, dD) are unaffected
Phenocopies: some dd individuals are affected
Phenotype level
4. Recessive Mendelian inheritance
                      Mother (Dd) gametes
Father (Dd) gametes   D (½)      d (½)
D (½)                 DD (¼)     Dd (¼)
d (½)                 dD (¼)     dd (¼)
Phenotype level
Two kinds of differences

Continuous
- Graded, no distinct boundaries
- e.g. height, weight, blood-pressure, IQ, extraversion
Categorical
- Yes/No
- Normal/Affected (Dichotomous)
- None/Mild/Severe (Multicategory)
- Often called “threshold traits” because people are “affected” if they fall above some level of a measured or hypothesized continuous trait
Phenotype level
Polygenic Traits
Mendel’s Experiments in Plant Hybridization showed how discrete particles (particulate theory of inheritance) behaved mathematically: all-or-nothing states (round/wrinkled, green/yellow), “Mendelian” disease
How do these particles produce a continuous trait like stature or liability to a complex disorder?
1 Gene: 3 Genotypes, 3 Phenotypes
2 Genes: 9 Genotypes, 5 Phenotypes
3 Genes: 27 Genotypes, 7 Phenotypes
4 Genes: 81 Genotypes, 9 Phenotypes
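The genotype and phenotype counts on this slide can be reproduced by brute-force enumeration. The sketch below is an added illustration and assumes biallelic genes with equal, purely additive effects and no environmental influence, so that a phenotype is simply the number of increasing alleles carried; the function name is hypothetical.

```python
from itertools import product


def count_genotypes_and_phenotypes(n_genes):
    """Count multi-locus genotypes and distinct phenotypes for n biallelic genes.

    Assumption: every gene contributes 0, 1 or 2 increasing alleles with equal
    additive effect, so the phenotype is the total allele count.
    """
    genotypes = list(product((0, 1, 2), repeat=n_genes))
    phenotypes = {sum(genotype) for genotype in genotypes}
    return len(genotypes), len(phenotypes)


for n in (1, 2, 3, 4):
    print(n, "gene(s):", count_genotypes_and_phenotypes(n))
```

Under these assumptions the counts follow 3^n genotypes and 2n + 1 phenotypes, which is why the phenotype distribution looks increasingly continuous as genes are added.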
Phenotype level
Quantitative traits
[Figure: histograms (Fraction) of the quantitative trait qt by genotype code g (g = -1, 0, 1), with panels labeled AA, Aa and aa; qt axis runs from -3.90647 to 2.7156]
Phenotype level
Biometric Model
[Figure: distributions P(X) of trait X for genotypes aa, Aa and AA]
Genotype           aa       Aa       AA
Genotypic effect   -a       d        +a
Genotypic means    m - a    m + d    m + a
Very Basic Statistical Concepts
1. Mean (X)
2. Variance (X)
3. Covariance (X,Y)
4. Correlation (X,Y)
Mean, variance, covariance
1. Mean (X)
$\mu_x = E(X) = \frac{\sum_i x_i}{n} = \sum_i x_i f(x_i)$
Mean, variance, covariance
2. Variance (X)
$\sigma_x^2 = Var(X) = E[(X - \mu_x)^2] = \frac{\sum_i (x_i - \mu_x)^2}{n - 1} = \sum_i (x_i - \mu_x)^2 f(x_i)$
Mean, variance, covariance
3. Covariance (X,Y)
$Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = \frac{\sum_i (x_i - \mu_X)(y_i - \mu_Y)}{n - 1} = \sum_i (x_i - \mu_X)(y_i - \mu_Y) f(x_i, y_i)$
Mean, variance, covariance (& correlation)
4. Correlation (X,Y)
$r_{x,y} = \frac{cov_{x,y}}{s_x s_y}$
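The four statistics can be computed directly from paired measurements. This is a small added sketch using the sample (n - 1) forms; the data for twin 1 and twin 2 are made-up numbers for illustration only, not values from the slides.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def variance(xs):
    # The variance is the covariance of a variable with itself.
    return covariance(xs, xs)


def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))


# Hypothetical paired scores (one pair per family), for illustration only.
twin1 = [1.0, 2.0, 3.0, 4.0]
twin2 = [2.0, 1.0, 4.0, 3.0]

print("means:", mean(twin1), mean(twin2))
print("variances:", variance(twin1), variance(twin2))
print("covariance:", covariance(twin1, twin2))
print("correlation:", correlation(twin1, twin2))
```

The correlation is the covariance rescaled by the two standard deviations, which is why it is bounded between -1 and 1 and comparable across traits measured in different units.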
Biometrical model for single biallelic QTL
Biallelic locus
- Genotypes: AA, Aa, aa
- Genotype frequencies: $p^2$, $2pq$, $q^2$
Alleles at this locus are transmitted from parent to offspring (P-O) according to Mendel’s law of segregation
Genotypes for this locus influence the expression of a quantitative trait X (i.e. the locus is a quantitative trait locus, QTL)
The biometrical genetic model estimates the contribution of this QTL towards the (1) Mean, (2) Variance and (3) Covariance between individuals for this quantitative trait X
Biometrical model for single biallelic QTL
1. Contribution of the QTL to the Mean (X)
$\mu = \sum_i x_i f(x_i)$
Genotypes           AA        Aa        aa
Effect, x           a         d         -a
Frequencies, f(x)   $p^2$     $2pq$     $q^2$
Mean (X) = $a(p^2) + d(2pq) - a(q^2)$
         = $a(p-q) + 2pqd$
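The step between the two lines of the mean is worth spelling out. It is added here as a worked derivation and uses only p + q = 1:

```latex
\begin{aligned}
\text{Mean}(X) &= a\,p^2 + d\,(2pq) - a\,q^2 \\
               &= a\,(p^2 - q^2) + 2pqd \\
               &= a\,(p - q)(p + q) + 2pqd \\
               &= a\,(p - q) + 2pqd \qquad \text{since } p + q = 1 .
\end{aligned}
```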
Biometrical model for single biallelic QTL
2. Contribution of the QTL to the Variance (X)
$Var(X) = \sum_i (x_i - \mu)^2 f(x_i)$
Genotypes           AA        Aa        aa
Effect, x           a         d         -a
Frequencies, f(x)   $p^2$     $2pq$     $q^2$
$Var(X) = (a-m)^2 p^2 + (d-m)^2\,2pq + (-a-m)^2 q^2 = V_{QTL}$, where m = Mean (X) from step 1
Broad-sense heritability of X at this locus = $V_{QTL} / V_{Total}$
Broad-sense total heritability of X = $\sum V_{QTL} / V_{Total}$
Biometrical model for single biallelic QTL
$Var(X) = (a-m)^2 p^2 + (d-m)^2\,2pq + (-a-m)^2 q^2$
$= 2pq[a+(q-p)d]^2 + (2pqd)^2$
$= V_{A_{QTL}} + V_{D_{QTL}}$
Additive effects: the main effects of individual alleles
Dominance effects: represent the interaction between alleles
[Figure: genotypic values of aa (-a), Aa (d) and AA (+a) on a scale with midpoint m, shown for three cases: d = 0 (Aa at m), d > 0 (Aa between m and AA), d < 0 (Aa between aa and m)]
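The decomposition of the QTL variance into additive and dominance parts can be verified numerically. The sketch below is an added illustration; the function name and the parameter values (p = 0.3, a = 1, d = 0.5) are hypothetical.

```python
def qtl_variance_components(p, a, d):
    """Mean, total variance, additive and dominance variance for one biallelic QTL."""
    q = 1.0 - p
    freqs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}  # Hardy-Weinberg frequencies
    effects = {"AA": a, "Aa": d, "aa": -a}               # genotypic effects
    m = sum(freqs[g] * effects[g] for g in freqs)        # Mean (X)
    total = sum(freqs[g] * (effects[g] - m) ** 2 for g in freqs)
    alpha = a + (q - p) * d                              # a + (q - p)d
    v_a = 2 * p * q * alpha ** 2                         # additive variance
    v_d = (2 * p * q * d) ** 2                           # dominance variance
    return m, total, v_a, v_d


# Hypothetical parameter values, for illustration only.
m, total, v_a, v_d = qtl_variance_components(p=0.3, a=1.0, d=0.5)
print("mean:", round(m, 4))
print("total variance:", round(total, 4))
print("VA + VD:", round(v_a + v_d, 4))
```

The total computed from the genotype frequencies and the sum of the two closed-form components agree, which is the identity stated on the slide.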
Biometrical model for single biallelic QTL
[Figure: genotypic values (-a, d, +a, around m) plotted against genotype (aa, Aa, AA)]
Var (X) = Regression Variance + Residual Variance
= Additive Variance + Dominance Variance
Biometrical model for single biallelic QTL
$Var(X) = 2pq[a+(q-p)d]^2 + (2pqd)^2 = V_{A_{QTL}} + V_{D_{QTL}}$
Demonstrate:
2A. Average allelic effect
2B. Additive genetic variance
NOTE: Additive genetic variance depends on allele frequency (p) & additive genetic value (a) as well as dominance deviation (d)
Additive genetic variance typically greater than dominance variance
Biometrical model for single biallelic QTL
2A. Average allelic effect (α)
The deviation of the allelic mean from the population mean
Population mean: Mean (X) = $a(p-q) + 2pqd$
            AA (a)   Aa (d)   aa (-a)   Allelic mean   Average allelic effect (α)
Allele A    p        q                  $ap + dq$      $\alpha_A = q(a + d(q-p))$
Allele a             p        q         $dp - aq$      $\alpha_a = -p(a + d(q-p))$
Biometrical model for single biallelic QTL
Denote the average allelic effects
- αA = q(a+d(q-p))
- αa = -p(a+d(q-p))
If only two alleles exist, we can define the average effect of
allele substitution
- α = αA - αa
- α = (q-(-p))(a+d(q-p)) = (a+d(q-p))
Therefore:
- αA = qα
- αa = -pα
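The relations between the allelic means and the average effects can be checked numerically. This is an added sketch; the function name and the values p = 0.3, a = 1, d = 0.5 are hypothetical.

```python
def average_allelic_effects(p, a, d):
    """Average effects of alleles A and a as deviations of allelic means from the population mean."""
    q = 1.0 - p
    pop_mean = a * (p - q) + 2 * p * q * d
    mean_A = a * p + d * q   # allele A is paired with A (prob p, value a) or a (prob q, value d)
    mean_a = d * p - a * q   # allele a is paired with A (prob p, value d) or a (prob q, value -a)
    return mean_A - pop_mean, mean_a - pop_mean


# Hypothetical parameter values, for illustration only.
p, a, d = 0.3, 1.0, 0.5
q = 1.0 - p
alpha_A, alpha_a = average_allelic_effects(p, a, d)
alpha = a + d * (q - p)  # average effect of allele substitution

print("alpha_A:", round(alpha_A, 4), "q*alpha:", round(q * alpha, 4))
print("alpha_a:", round(alpha_a, 4), "-p*alpha:", round(-p * alpha, 4))
print("alpha_A - alpha_a:", round(alpha_A - alpha_a, 4), "alpha:", round(alpha, 4))
```

Note that the frequency-weighted average of the two effects, p·αA + q·αa, is zero, as expected for deviations from the population mean.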
Biometrical model for single biallelic QTL
2A. Average allelic effect (α): $\alpha_A = q\alpha$, $\alpha_a = -p\alpha$
2B. Additive genetic variance
The variance of the average allelic effects
Genotype   Freq.     Additive effect
AA         $p^2$     $2\alpha_A = 2q\alpha$
Aa         $2pq$     $\alpha_A + \alpha_a = (q-p)\alpha$
aa         $q^2$     $2\alpha_a = -2p\alpha$
$V_{A_{QTL}} = (2q\alpha)^2 p^2 + ((q-p)\alpha)^2\,2pq + (-2p\alpha)^2 q^2$
$= 2pq\alpha^2$
$= 2pq[a + d(q-p)]^2$
d = 0: $V_{A_{QTL}} = 2pqa^2$
p = q: $V_{A_{QTL}} = \frac{1}{2}a^2$
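The simplification to 2pqα² skips a few steps. They are added here as a worked derivation, again using only p + q = 1:

```latex
\begin{aligned}
V_{A_{QTL}} &= (2q\alpha)^2 p^2 + \big((q-p)\alpha\big)^2\, 2pq + (-2p\alpha)^2 q^2 \\
            &= 4p^2q^2\alpha^2 + 2pq\,(q-p)^2\alpha^2 + 4p^2q^2\alpha^2 \\
            &= 2pq\,\alpha^2 \left[\, 4pq + (q-p)^2 \,\right] \\
            &= 2pq\,\alpha^2\, (p+q)^2 \\
            &= 2pq\,\alpha^2 .
\end{aligned}
```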
Biometrical model for single biallelic QTL
1. Contribution of the QTL to the Mean (X)
2. Contribution of the QTL to the Variance (X)
2A. Average allelic effect (α)
2B. Additive genetic variance
3. Contribution of the QTL to the Covariance (X,Y)
Biometrical model for single biallelic QTL
3. Contribution of the QTL to the Cov (X,Y)
$Cov(X,Y) = \sum_i (x_i - \mu_X)(y_i - \mu_Y) f(x_i, y_i)$
             AA (a-m)            Aa (d-m)            aa (-a-m)
AA (a-m)     $(a-m)^2$
Aa (d-m)     $(a-m)(d-m)$        $(d-m)^2$
aa (-a-m)    $(a-m)(-a-m)$       $(d-m)(-a-m)$       $(-a-m)^2$
Biometrical model for single biallelic QTL
3A. Contribution of the QTL to the Cov (X,Y): MZ twins
$Cov(X,Y) = \sum_i (x_i - \mu_X)(y_i - \mu_Y) f(x_i, y_i)$
             AA (a-m)                  Aa (d-m)                  aa (-a-m)
AA (a-m)     $p^2 (a-m)^2$
Aa (d-m)     $0 \cdot (a-m)(d-m)$      $2pq\,(d-m)^2$
aa (-a-m)    $0 \cdot (a-m)(-a-m)$     $0 \cdot (d-m)(-a-m)$     $q^2 (-a-m)^2$
$Cov(X_i,X_j) = (a-m)^2 p^2 + (d-m)^2\,2pq + (-a-m)^2 q^2$
$= 2pq[a+(q-p)d]^2 + (2pqd)^2$
$= V_{A_{QTL}} + V_{D_{QTL}}$
Biometrical model for single biallelic QTL
3B. Contribution of the QTL to the Cov (X,Y): Parent-Offspring
             AA (a-m)                  Aa (d-m)                  aa (-a-m)
AA (a-m)     $p^3 (a-m)^2$
Aa (d-m)     $p^2 q\,(a-m)(d-m)$       $pq\,(d-m)^2$
aa (-a-m)    $0 \cdot (a-m)(-a-m)$     $pq^2 (d-m)(-a-m)$        $q^3 (-a-m)^2$
Biometrical model for single biallelic QTL
e.g. given an AA father, an AA offspring can come from either AA x AA or AA x Aa parental mating types
AA x AA
- will occur with probability $p^2 \times p^2 = p^4$
- and have AA offspring with Prob() = 1
AA x Aa
- will occur with probability $p^2 \times 2pq = 2p^3q$
- and have AA offspring with Prob() = 0.5
- and have Aa offspring with Prob() = 0.5
therefore, P(AA father & AA offspring)
$= p^4 + p^3 q$
$= p^3 (p+q)$
$= p^3$
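The same calculation can be written as a sum over the mother's genotype. This is an added sketch under the slide's random-mating assumption; the function name is hypothetical.

```python
def prob_father_AA_offspring_AA(p):
    """P(father is AA and offspring is AA) under random mating."""
    q = 1.0 - p
    p_father_AA = p ** 2
    mother_freqs = {"AA": p ** 2, "Aa": 2 * p * q, "aa": q ** 2}
    # Probability that the offspring is AA given an AA father and each maternal genotype.
    p_child_AA = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}
    return sum(p_father_AA * mother_freqs[g] * p_child_AA[g] for g in mother_freqs)


# Compare against the closed form p^3 for a few hypothetical allele frequencies.
for p in (0.1, 0.3, 0.8):
    print(p, round(prob_father_AA_offspring_AA(p), 6), round(p ** 3, 6))
```

The other cells of the parent-offspring table follow the same pattern, with the appropriate offspring probabilities for each mating type.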
Biometrical model for single biallelic QTL
3B. Contribution of the QTL to the Cov (X,Y): Parent-Offspring
Summing the cells of the parent-offspring table (each off-diagonal cell counts twice, once for each ordering):
$Cov(X_i,X_j) = (a-m)^2 p^3 + \ldots + (-a-m)^2 q^3$
$= pq[a+(q-p)d]^2$
$= \frac{1}{2} V_{A_{QTL}}$
Biometrical model for single biallelic QTL
3C. Contribution of the QTL to the Cov (X,Y): Unrelated individuals
             AA (a-m)                     Aa (d-m)                   aa (-a-m)
AA (a-m)     $p^4 (a-m)^2$
Aa (d-m)     $2p^3 q\,(a-m)(d-m)$         $4p^2 q^2 (d-m)^2$
aa (-a-m)    $p^2 q^2 (a-m)(-a-m)$        $2pq^3 (d-m)(-a-m)$        $q^4 (-a-m)^2$
$Cov(X_i,X_j) = (a-m)^2 p^4 + \ldots + (-a-m)^2 q^4$
$= 0$
Biometrical model for single biallelic QTL
3D. Contribution of the QTL to the Cov (X,Y): DZ twins and full sibs
# identical alleles inherited from parents   Share of genome   Covariance as for
2                                            ¼ (2 alleles)     MZ twins
1 (father) + 1 (mother)                      ¼ + ¼ = ½ (1 allele)   P-O
0                                            ¼ (0 alleles)     Unrelateds
$Cov(X_i,X_j) = \frac{1}{4} Cov(MZ) + \frac{1}{2} Cov(P\text{-}O) + \frac{1}{4} Cov(Unrel)$
$= \frac{1}{4}(V_{A_{QTL}} + V_{D_{QTL}}) + \frac{1}{2}(\frac{1}{2} V_{A_{QTL}}) + \frac{1}{4}(0)$
$= \frac{1}{2} V_{A_{QTL}} + \frac{1}{4} V_{D_{QTL}}$
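The covariances for parent-offspring pairs, full sibs (DZ twins) and unrelated pairs can also be obtained by enumerating all parental mating types, without using the closed-form results. The sketch below is an added illustration assuming random mating and a single biallelic QTL; all function names and the values p = 0.3, a = 1, d = 0.5 are hypothetical.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")
GAMETES = {"AA": {"A": 1.0}, "Aa": {"A": 0.5, "a": 0.5}, "aa": {"a": 1.0}}
JOIN = {"AA": "AA", "Aa": "Aa", "aA": "Aa", "aa": "aa"}


def child_distribution(father, mother):
    """Offspring genotype probabilities for one mating type (Mendelian segregation)."""
    dist = {g: 0.0 for g in GENOTYPES}
    for (g1, p1), (g2, p2) in product(GAMETES[father].items(), GAMETES[mother].items()):
        dist[JOIN[g1 + g2]] += p1 * p2
    return dist


def relative_covariances(p, a, d):
    """Parent-offspring, full-sib and unrelated covariances by enumeration."""
    q = 1.0 - p
    freq = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    value = {"AA": a, "Aa": d, "aa": -a}
    m = sum(freq[g] * value[g] for g in GENOTYPES)
    dev = {g: value[g] - m for g in GENOTYPES}

    cov_po = cov_sib = 0.0
    for father, mother in product(GENOTYPES, repeat=2):
        mating = freq[father] * freq[mother]          # random mating
        kids = child_distribution(father, mother)
        for k1 in GENOTYPES:
            cov_po += mating * kids[k1] * dev[father] * dev[k1]
            for k2 in GENOTYPES:                      # sibs are independent given the parents
                cov_sib += mating * kids[k1] * kids[k2] * dev[k1] * dev[k2]

    cov_unrel = sum(freq[g1] * freq[g2] * dev[g1] * dev[g2]
                    for g1, g2 in product(GENOTYPES, repeat=2))
    return cov_po, cov_sib, cov_unrel


# Hypothetical parameter values, for illustration only.
p, a, d = 0.3, 1.0, 0.5
q = 1.0 - p
v_a = 2 * p * q * (a + (q - p) * d) ** 2
v_d = (2 * p * q * d) ** 2
cov_po, cov_sib, cov_unrel = relative_covariances(p, a, d)

print("P-O:", round(cov_po, 6), "expected:", round(0.5 * v_a, 6))
print("sibs:", round(cov_sib, 6), "expected:", round(0.5 * v_a + 0.25 * v_d, 6))
print("unrelated:", round(cov_unrel, 6))
```

The enumeration treats the two sibs as independent draws given their parents' genotypes, which is what produces the ¼ weight on the dominance variance.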
Summary
Biometrical model predicts contribution of a QTL to the mean, variance and covariances of a trait
1 QTL
$Var(X) = V_{A_{QTL}} + V_{D_{QTL}}$
$Cov(MZ) = V_{A_{QTL}} + V_{D_{QTL}}$
$Cov(DZ) = \frac{1}{2} V_{A_{QTL}} + \frac{1}{4} V_{D_{QTL}}$
Multiple QTL
$Var(X) = \sum V_{A_{QTL}} + \sum V_{D_{QTL}} = V_A + V_D$
$Cov(MZ) = \sum V_{A_{QTL}} + \sum V_{D_{QTL}} = V_A + V_D$
$Cov(DZ) = \sum \frac{1}{2} V_{A_{QTL}} + \sum \frac{1}{4} V_{D_{QTL}} = \frac{1}{2} V_A + \frac{1}{4} V_D$
Summary
Biometrical model underlies the variance components estimation
performed in Mx
Var (X) = VA + VD + VE
Cov (MZ) = VA + VD
Cov (DZ) = ½VA + ¼VD
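To make these expectations concrete, the sketch below builds the expected 2 x 2 covariance matrices for MZ and DZ pairs from given variance components. It is an added illustration, not the Mx implementation; the function name and the component values are hypothetical.

```python
def expected_twin_covariances(v_a, v_d, v_e):
    """Expected twin-pair covariance matrices under the VA + VD + VE model."""
    var = v_a + v_d + v_e
    cov_mz = v_a + v_d
    cov_dz = 0.5 * v_a + 0.25 * v_d
    return {
        "MZ": [[var, cov_mz], [cov_mz, var]],
        "DZ": [[var, cov_dz], [cov_dz, var]],
    }


# Hypothetical variance components, for illustration only.
expected = expected_twin_covariances(v_a=0.5, v_d=0.2, v_e=0.3)
for zygosity, matrix in expected.items():
    correlation = matrix[0][1] / matrix[0][0]
    print(zygosity, matrix, "r =", round(correlation, 3))
```

Model fitting then amounts to choosing the components so that these expected matrices are as close as possible to the observed MZ and DZ covariance matrices.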
Path Analysis
HGEN502, 2011
Hermine H. Maes
Model Building

Write equations for means, variances and covariances of different types of relatives
or
Draw path diagrams for easy derivation of expected means, variances and covariances and translation to mathematical formulation
Method of Path Analysis



Allows us to represent linear models for the relationship between variables in diagrammatic form, e.g. a genetic model; a factor model; a regression model
Makes it easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model
Permits easy translation into matrix formulation as used by statistical programs
Path Diagram Variables




Squares or rectangles denote observed variables
Circles or ellipses denote latent (unmeasured) variables
Upper-case letters are used to denote variables
Lower-case letters (or numeric values) are used to denote covariances or path coefficients
[Figure: example diagram labeling latent variables and observed variables]
Path Diagram Arrows


Single-headed arrows or paths (–>) are used to represent causal relationships between variables under a particular model, where the variable at the tail is hypothesized to have a direct influence on the variable at the head
Double-headed arrows (<–>) represent a covariance between two variables, which may arise through common causes not represented in the model. They may also be used to represent the variance of a variable
[Figure: example diagram labeling double-headed arrows and single-headed arrows]
Path Analysis Tracing Rules


Trace backwards, change direction at a 2-headed arrow, then trace forwards (implies that we can never trace through more than one two-headed arrow in the same chain).
The expected covariance between two variables, or the expected variance of a variable, is computed by multiplying together all the coefficients in a chain, and then summing over all possible chains.
Non-genetic Example
Cov AB
Cov AB = kl + mqn + mpl
Expectations
Cov AB = $kl + mqn + mpl$
Cov BC = $no$
Cov AC = $mqo$
Var A = $k^2 + m^2 + 2kpm$
Var B = $l^2 + n^2$
Var C = $o^2$
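The tracing-rule expectations can be cross-checked with matrix algebra. The path diagram itself did not survive in this transcript, so the structure used below is an assumption inferred from the listed expectations: three latent variables (called D, E and F here, hypothetical names) with unit variance, where D and E correlate p, E and F correlate q, A = kD + mE, B = lD + nF and C = oF. The numeric path values are also hypothetical.

```python
def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


# Hypothetical path coefficients and latent correlations, for illustration only.
k, l, m, n, o, p, q = 0.5, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1

# Assumed structure (inferred, not shown in the transcript):
# rows = observed A, B, C; columns = latent D, E, F.
loadings = [[k, m, 0.0],
            [l, 0.0, n],
            [0.0, 0.0, o]]
latent_cov = [[1.0, p, 0.0],
              [p, 1.0, q],
              [0.0, q, 1.0]]

# Expected covariance matrix of A, B, C: loadings * latent_cov * loadings'.
sigma = matmul(matmul(loadings, latent_cov), transpose(loadings))

traced = {
    "Cov AB": k * l + m * q * n + m * p * l,
    "Cov BC": n * o,
    "Cov AC": m * q * o,
    "Var A": k ** 2 + m ** 2 + 2 * k * p * m,
    "Var B": l ** 2 + n ** 2,
    "Var C": o ** 2,
}
from_matrix = {
    "Cov AB": sigma[0][1], "Cov BC": sigma[1][2], "Cov AC": sigma[0][2],
    "Var A": sigma[0][0], "Var B": sigma[1][1], "Var C": sigma[2][2],
}
for name in traced:
    print(name, round(traced[name], 6), round(from_matrix[name], 6))
```

Under this assumed diagram every chain found by the tracing rules corresponds to one term of the matrix product, which is the sense in which path diagrams translate directly into matrix formulation.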

Genetic Examples
MZ Twins Reared Together
DZ Twins Reared Together
MZ Twins Reared Apart
DZ Twins Reared Apart
Parents & Offspring

MZ Twins Reared Together
Expected covariance, MZ Twins RT:
           Twin 1                            Twin 2
Twin 1     $a^2 + c^2 + e^2$ (variance)      $a^2 + c^2$ (covariance)
Twin 2     $a^2 + c^2$ (covariance)          $a^2 + c^2 + e^2$ (variance)
DZ Twins Reared Together
Expected covariance, DZ Twins RT:
           Twin 1                  Twin 2
Twin 1     $a^2 + c^2 + e^2$       $.5a^2 + c^2$
Twin 2     $.5a^2 + c^2$           $a^2 + c^2 + e^2$
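If the total variance is standardized to 1, the MZ and DZ expectations above can be solved for the three components by simple algebra (often called Falconer's formulas). The sketch below is an added illustration, not the model-fitting approach of the course; it uses the stature correlations reported earlier for the Virginia twin sample (r = 0.924 for MZ, r = 0.535 for DZ), and the function name is hypothetical.

```python
def ace_from_twin_correlations(r_mz, r_dz):
    """Solve r_MZ = a2 + c2 and r_DZ = 0.5*a2 + c2 with a2 + c2 + e2 = 1."""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic variance
    c2 = 2.0 * r_dz - r_mz     # shared (common) environment
    e2 = 1.0 - r_mz            # non-shared environment
    return a2, c2, e2


a2, c2, e2 = ace_from_twin_correlations(r_mz=0.924, r_dz=0.535)
print("a2 =", round(a2, 3), "c2 =", round(c2, 3), "e2 =", round(e2, 3))
```

These are rough point estimates only; they give no standard errors and ignore dominance, which is why formal model fitting is preferred in practice.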
MZ Twins Reared Apart
DZ Twins Reared Apart
Twins and Parents
Role of model mediating
between theory and data