Queensland Institute of
Medical Research
GWAS for quantitative traits
Peter M. Visscher
[email protected]
Overview
• Darwin and Mendel
• Background: population genetics
• Background: quantitative genetics
• GWAS
– Examples
– Analysis
– Statistical power
[Galton, 1889]
Mendelian Genetics
Following a single (or several)
genes that we can directly score
Phenotype highly informative
as to genotype
Darwin & Mendel
• Darwin (1859) Origin of Species
– Instant Classic, major immediate impact
– Problem: Model of Inheritance
• Darwin assumed blending inheritance
• Offspring = average of both parents
• zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out the problem
– Var(zo) = Var[(zm + zf)/2] = (1/2) Var(parents)
– Hence, under blending inheritance, half the variation is removed each generation and this must somehow be replenished by mutation.
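Jenkin's argument can be checked numerically. The following is a minimal simulation sketch, assuming random mating, a normally distributed starting trait and no new mutation; the function names, population size and seed are illustrative choices, not part of the original slides.

```python
import random
import statistics


def blend_generation(parents, rng):
    """Each offspring is the average of two randomly drawn parents."""
    return [
        (rng.choice(parents) + rng.choice(parents)) / 2
        for _ in range(len(parents))
    ]


def variance_trajectory(n=20000, generations=5, seed=1):
    """Trait variance in each generation under blending inheritance."""
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]
    variances = [statistics.pvariance(population)]
    for _ in range(generations):
        population = blend_generation(population, rng)
        variances.append(statistics.pvariance(population))
    return variances


if __name__ == "__main__":
    for t, v in enumerate(variance_trajectory()):
        print(f"generation {t}: variance = {v:.4f}")
```

Up to sampling noise, each printed variance should be about half of the previous one.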
Mendel
• Mendel (1865), Experiments in Plant Hybridization
• No impact, paper essentially ignored
– Ironically, Darwin had an apparently unread copy in his
library
– Why ignored? Perhaps too mathematical for 19th
century biologists
• Rediscovery in 1900 (by three independent
groups)
• Mendel’s key idea: Genes are discrete particles
passed on intact from parent to offspring
The height vs. pea debate (early 1900s)
Biometricians
Mendelians
Do quantitative traits have the same
hereditary and evolutionary properties as
discrete characters?
[Figure: trait distributions for genotypes qq, Qq and QQ, with genotypic means m-a, m+d and m+a]
RA Fisher (1918). Transactions of the Royal Society of Edinburgh 52: 399-433.
Population Genetics
• Allele and genotype frequencies
• Hardy-Weinberg Equilibrium
• Linkage (dis)equilibrium
Allele and Genotype Frequencies
Given genotype frequencies, we can always compute allele
frequencies, e.g.,
$p_i = \mathrm{freq}(A_i A_i) + \frac{1}{2} \sum_{j \neq i} \mathrm{freq}(A_i A_j)$
The converse is not true: given allele frequencies we
cannot uniquely determine the genotype frequencies
For n alleles, there are n(n+1)/2 genotypes
If we are willing to assume random mating, genotype frequencies follow the Hardy-Weinberg proportions:
$\mathrm{freq}(A_i A_j) = p_i^2$ for $i = j$
$\mathrm{freq}(A_i A_j) = 2 p_i p_j$ for $i \neq j$
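The two directions above can be sketched in a few lines. This is an illustrative example only: the dictionary layout (unordered allele pairs as keys) and the function names are assumptions made for the sketch.

```python
from itertools import combinations_with_replacement


def allele_freqs(genotype_freqs):
    """Allele frequencies from genotype frequencies.

    genotype_freqs maps an unordered allele pair (i, j) to its frequency.
    Homozygotes count fully, heterozygotes count half for each allele.
    """
    p = {}
    for (i, j), f in genotype_freqs.items():
        if i == j:
            p[i] = p.get(i, 0.0) + f
        else:
            p[i] = p.get(i, 0.0) + f / 2
            p[j] = p.get(j, 0.0) + f / 2
    return p


def hardy_weinberg(p):
    """Genotype frequencies under random mating: p_i^2 or 2 p_i p_j."""
    alleles = sorted(p)
    return {
        (i, j): p[i] ** 2 if i == j else 2 * p[i] * p[j]
        for i, j in combinations_with_replacement(alleles, 2)
    }


if __name__ == "__main__":
    # A population with no heterozygotes at all.
    observed = {("A", "A"): 0.5, ("A", "a"): 0.0, ("a", "a"): 0.5}
    p = allele_freqs(observed)
    print(p)
    print(hardy_weinberg(p))
```

The example population and its Hardy-Weinberg counterpart share the same allele frequencies but have different genotype frequencies, which is why allele frequencies alone do not determine genotype frequencies.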
Hardy-Weinberg
• Prediction of genotype frequencies from allele freqs
• Allele frequencies remain unchanged over generations, provided:
– Infinite population size (no genetic drift)
– No mutation
– No selection
– No migration
• NB: QC in GWAS studies
• Under HW conditions, a single generation of random
mating gives genotype frequencies in Hardy-Weinberg
proportions, and they remain forever in these proportions
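The slide notes that Hardy-Weinberg proportions are used for QC in GWAS but does not say which test. One common choice, assumed here, is a 1-df chi-square goodness-of-fit test on genotype counts at a SNP. The function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test (1 df) of observed genotype counts against HW proportions."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df, written via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(250, 500, 250))  # counts exactly in HW proportions
    print(hwe_chi_square(300, 400, 300))  # deficit of heterozygotes
```

A very small p-value at a SNP is typically treated as a flag for genotyping problems rather than as evidence about the trait.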
Linkage equilibrium
Random mating and recombination eventually change gamete frequencies so that they are in linkage equilibrium (LE).
Once in LE, gamete frequencies do not change (unless acted on by other forces)
At LE, alleles in gametes are independent of each other:
freq(AB) = freq(A)*freq(B)
freq(ABC) = freq(A) * freq(B) * freq(C)
Linkage disequilibrium
When linkage disequilibrium (LD) is present, alleles are no longer independent: knowing that one allele is in the gamete provides information on alleles at other loci:
freq(AB) ≠ freq(A) * freq(B)
The disequilibrium between alleles A and B is given by
DAB = freq(AB) – freq(A)*freq(B)
GWAS relies on LD between markers and causal variants
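As a small worked illustration, D can be computed directly from gamete and allele frequencies. The sketch also returns the squared correlation r2 = D^2 / [pA(1-pA) pB(1-pB)], a standard LD measure that is not defined on this slide; the function name is an assumption.

```python
def ld_stats(freq_AB, freq_A, freq_B):
    """Return (D, r2) for alleles A and B at two loci.

    D  = freq(AB) - freq(A) * freq(B)
    r2 = D^2 / [pA (1 - pA) pB (1 - pB)]  (standard definition, assumed here)
    """
    d = freq_AB - freq_A * freq_B
    denom = freq_A * (1 - freq_A) * freq_B * (1 - freq_B)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


if __name__ == "__main__":
    print(ld_stats(0.25, 0.5, 0.5))  # independence: D = 0
    print(ld_stats(0.50, 0.5, 0.5))  # A always travels with B
```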
[Figure: two panels, "Linkage equilibrium" and "Linkage disequilibrium", showing gametes that carry QTL alleles Q1/Q2 together with marker alleles M1/M2]
The Decay of Linkage Disequilibrium
The frequency of the AB gamete is given by
freq(AB) = freq(A)*freq(B) + DAB
If the recombination frequency between the A and B loci is c, the disequilibrium in generation t is
$D(t) = D(0)(1 - c)^t$
Note that D(t) -> zero, although the approach can be slow when c is very small
NB: Gene mapping & GWAS
[Figure: LD (0 to 1) plotted against generation (0 to 100) for c = 0.10, c = 0.01 and c = 0.001]
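The decay formula is easy to evaluate for the three recombination fractions shown in the plot. A minimal sketch follows; the helper names are illustrative, and the half-life function simply solves (1 - c)^t = fraction for t.

```python
import math


def ld_after(d0, c, t):
    """Disequilibrium after t generations: D(t) = D(0) * (1 - c)^t."""
    return d0 * (1 - c) ** t


def generations_to_fraction(c, fraction=0.5):
    """Generations needed for D to fall to `fraction` of its starting value."""
    return math.log(fraction) / math.log(1 - c)


if __name__ == "__main__":
    for c in (0.10, 0.01, 0.001):
        print(
            f"c = {c}: D(100)/D(0) = {ld_after(1.0, c, 100):.4f}, "
            f"half-life = {generations_to_fraction(c):.1f} generations"
        )
```

For tightly linked loci (small c) the half-life runs to hundreds of generations, which is why markers close to a causal variant remain informative.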
Forces that Generate LD
• Drift (finite population size)
• Selection
• Migration (admixture)
• Mutation
• Population structure (stratification)
Effective population size determines the
number of markers needed for GWAS
Quantitative Genetics
The analysis of traits whose variation is
determined by both a number of genes and
environmental factors
[Figure: overlapping trait distributions for genotypes qq, Qq and QQ, with genotypic means m-a, m+d and m+a]
Phenotype is highly uninformative as to
underlying genotype
Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation
in the trait
• May be a single gene strongly influenced by
environmental factors
• May be the result of a number of genes of equal
(or differing) effect
• Most likely, a combination of both multiple genes
and environmental factors.
• Example: Blood pressure, cholesterol levels, IQ,
height, etc.
Basic model of Quantitative Genetics
Basic model: P = G + E
G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]
G x E interaction: G values are different across environments. The basic model now becomes P = G + E + GE
Biometrical model for single diallelic Quantitative Trait Locus (QTL)
Contribution of the QTL to the Mean (X)
$\mu = \sum_i x_i f(x_i)$

Genotypes            AA      Aa      aa
Effect, x            a       d       -a
Frequencies, f(x)    p^2     2pq     q^2

Mean (X)
$= a p^2 + d(2pq) - a q^2$
$= a(p - q) + 2pqd$
Example: Apolipoprotein E & Alzheimer’s
Genotype                ee      Ee      EE
Average age of onset    68.4    75.5    84.3

2a = G(EE) - G(ee) = 84.3 - 68.4 = 15.9 --> a = 7.95
d = G(Ee) - [G(EE) + G(ee)]/2 = 75.5 - 76.35 = -0.85
d/a = -0.11
Only a small amount of dominance
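The arithmetic of this example can be reproduced with a short helper. The function name is illustrative; the three input values are the genotype means from the table above.

```python
def additive_dominance(g_low, g_het, g_high):
    """Additive (a) and dominance (d) values from three genotype means.

    a is half the difference between the homozygotes;
    d is the heterozygote's deviation from the homozygote midpoint.
    """
    a = (g_high - g_low) / 2
    d = g_het - (g_high + g_low) / 2
    return a, d


if __name__ == "__main__":
    a, d = additive_dominance(68.4, 75.5, 84.3)
    print(f"a = {a:.2f}, d = {d:.2f}, d/a = {d / a:.3f}")
```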
Biometrical model for single diallelic QTL
Contribution of the QTL to the Variance (X)
$\mathrm{Var} = \sum_i (x_i - \mu)^2 f(x_i)$

Genotypes            AA      Aa      aa
Effect, x            a       d       -a
Frequencies, f(x)    p^2     2pq     q^2     (HW proportions)

Var (X)
$= (a-m)^2 p^2 + (d-m)^2 2pq + (-a-m)^2 q^2$
$= V_{QTL}$
(here m denotes the mean of X, written µ in the definition)
Biometrical model for single diallelic QTL
Var (X)
$= (a-m)^2 p^2 + (d-m)^2 2pq + (-a-m)^2 q^2$
$= 2pq[a + (q-p)d]^2 + (2pqd)^2$
$= V_{A,QTL} + V_{D,QTL}$
Additive effects: the main effects of individual alleles
Dominance effects: represent the interaction between alleles
Biometrical model for single biallelic QTL
[Figure, after Fisher 1918: genotypic values -a, d and a (relative to m) for genotypes aa, Aa and AA]
Var (X) = Regression Variance + Residual Variance
= Additive Variance + Dominance Variance
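The decomposition can be verified numerically: the variance computed directly from the genotype table should equal the sum of the additive and dominance terms. This is a minimal sketch; the function names and the example values of a, d and p are arbitrary choices for illustration.

```python
def qtl_moments(a, d, p):
    """Mean and variance of X computed directly from the genotype table."""
    q = 1 - p
    values = (a, d, -a)                 # AA, Aa, aa
    freqs = (p * p, 2 * p * q, q * q)   # HW proportions
    mean = sum(x * f for x, f in zip(values, freqs))
    var = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))
    return mean, var


def variance_components(a, d, p):
    """Additive and dominance variance from the closed-form expressions."""
    q = 1 - p
    v_a = 2 * p * q * (a + (q - p) * d) ** 2
    v_d = (2 * p * q * d) ** 2
    return v_a, v_d


if __name__ == "__main__":
    a, d, p = 1.0, 0.5, 0.3  # arbitrary illustrative values
    mean, var = qtl_moments(a, d, p)
    v_a, v_d = variance_components(a, d, p)
    print(f"mean = {mean:.4f}, Var(X) = {var:.4f}")
    print(f"VA = {v_a:.4f}, VD = {v_d:.4f}, VA + VD = {v_a + v_d:.4f}")
```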
Association (GWAS)
• State of play
• Model
• Analysis method
• Power of detection
Disease                              Number of loci   Percent of heritability measure explained   Heritability measure
Age-related macular degeneration     5                50%                                         Sibling recurrence risk
Crohn’s disease                      32               20%                                         Genetic risk (liability)
Systemic lupus erythematosus         6                15%                                         Sibling recurrence risk
Type 2 diabetes                      18               6%                                          Sibling recurrence risk
HDL cholesterol                      7                5.2%                                        Phenotypic variance
Height                               40               5%                                          Phenotypic variance
Early onset myocardial infarction    9                2.8%                                        Phenotypic variance
Fasting glucose                      4                1.5%                                        Phenotypic variance
• GWAS works
• Effect sizes are typically small
– Disease: OR ~1.1 to ~1.3
– Quantitative traits: % var explained <<1%
Effect sizes QT (104 SNPs)
[Figure: histogram of effect sizes; x-axis: % variance explained, quantitative traits; y-axis: frequency]
Linear model for single SNP
• Allelic
– Additive model
– Y = µ + b*x + e
– x = 0, 1, 2 for genotypes aa, Aa and AA
• Genotypic
– Additive + dominance model
– Y = µ + Gi + e
– Gi = effect of the genotype group corresponding to genotypes aa, Aa and AA
Method
• Linear regression
• ANOVA
• (other: maximum likelihood, Bayesian)
Test statistic (allelic model)
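$T = \hat{b} / \sigma(\hat{b}) \sim t_{N-2} \approx N(0,1)$
$T^2 = \hat{b}^2 / \mathrm{var}(\hat{b}) \sim F_{1,N-2} \approx \chi^2_1$
$\mathrm{var}(\hat{b}) = \sigma_e^2 / [N \,\mathrm{var}(x)] = \sigma_e^2 / [N \, 2p(1-p)]$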
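The allelic test amounts to an ordinary least-squares regression of phenotype on allele count. The sketch below uses only the standard library; the function name and the tiny data set are made up for illustration, and the residual variance is estimated with N - 2 degrees of freedom.

```python
import math


def allelic_test(x, y):
    """Regress phenotype y on allele count x (0, 1, 2).

    Returns (b_hat, var_b_hat, T) with T = b_hat / sqrt(var_b_hat).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # equals N * var(x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    mu = mean_y - b * mean_x
    rss = sum((yi - mu - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma_e2 = rss / (n - 2)   # residual variance estimate
    var_b = sigma_e2 / sxx     # sigma_e^2 / (N var(x))
    return b, var_b, b / math.sqrt(var_b)


if __name__ == "__main__":
    genotypes = [0, 1, 2, 0, 1, 2]                # hypothetical allele counts
    phenotypes = [0.0, 1.0, 2.0, 0.2, 1.0, 1.8]   # hypothetical trait values
    b, var_b, t = allelic_test(genotypes, phenotypes)
    print(f"b = {b:.3f}, var(b) = {var_b:.5f}, T = {t:.2f}, T^2 = {t * t:.1f}")
```

In a real GWAS the same regression is run once per SNP, and T or T^2 is compared with a genome-wide significance threshold.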
Statistical Power (additive model)
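$q^2 = 2p(1-p)[a + d(1-2p)]^2 / \sigma_p^2$ (the proportion of phenotypic variance explained by the QTL)
Non-centrality parameter of the $\chi^2$ test:
$\lambda = N q^2 / (1 - q^2) \approx N q^2$
Required sample size given type-I ($\alpha$) and type-II ($\beta$) error rates:
$N = [(1 - q^2)/q^2] (z_{1-\alpha/2} + z_{1-\beta})^2 \approx (z_{1-\alpha/2} + z_{1-\beta})^2 / q^2$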
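Given the non-centrality parameter, power for a fixed sample size follows from the normal distribution, because a 1-df non-central chi-square is the square of a shifted standard normal. A minimal sketch using the standard library follows; the function name and the example values (q2 = 1%, alpha = 5e-8) are illustrative assumptions.

```python
from statistics import NormalDist


def power_chi2_1df(n, q2, alpha):
    """Power of the 1-df chi-square test for a QTL explaining q2 of the variance.

    Uses lambda = N q2 / (1 - q2) and the identity
    P(chi2_1(lambda) > crit) = Phi(sqrt(lambda) - z) + Phi(-sqrt(lambda) - z),
    where z is the two-sided critical value z_(1 - alpha/2).
    """
    nd = NormalDist()
    ncp = n * q2 / (1 - q2)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    root = ncp ** 0.5
    return nd.cdf(root - z_crit) + nd.cdf(-root - z_crit)


if __name__ == "__main__":
    for n in (1000, 2000, 4000, 8000):
        print(f"N = {n}: power = {power_chi2_1df(n, 0.01, 5e-8):.3f}")
```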
LD again
$r^2$ = LD correlation between QTL and genotyped SNP
Proportion of variance explained at SNP $= r^2 q^2$
Required sample size for detection
$N \approx (z_{1-\alpha/2} + z_{1-\beta})^2 / (r^2 q^2)$
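The approximate sample-size formula can be evaluated directly, which makes the cost of incomplete LD visible: halving r2 doubles the required N. This is a sketch of the approximation on the slide, not a replacement for a full power calculator; the function name and example inputs are assumptions.

```python
from statistics import NormalDist


def required_n(q2, alpha, power, r2=1.0):
    """Approximate N = (z_(1-alpha/2) + z_(1-beta))^2 / (r2 * q2).

    `power` is 1 - beta, so z_(1-beta) is the normal quantile at `power`.
    r2 = 1 corresponds to genotyping the causal variant itself.
    """
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return z_sum ** 2 / (r2 * q2)


if __name__ == "__main__":
    for r2 in (1.0, 0.8, 0.5):
        n = required_n(q2=0.01, alpha=5e-8, power=0.8, r2=r2)
        print(f"r2 = {r2}: N is roughly {n:.0f}")
```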
Genetic Power Calculator (Shaun Purcell)
http://pngu.mgh.harvard.edu/~purcell/gpc/
Serum bilirubin: if all GWAS were so simple…
[Figure: phenotype (95% CI) by genotype at RS2070959_A (0, 1, 2)]
38% of phenotypic variance explained
1984