Queensland Institute of Medical Research
GWAS for quantitative traits
Peter M. Visscher

Overview
• Darwin and Mendel
• Background: population genetics
• Background: quantitative genetics
• GWAS
  – Examples
  – Analysis
  – Statistical power

[Figure: Galton, 1889]

Mendelian Genetics
• Following a single (or several) genes that we can directly score
• Phenotype highly informative as to genotype

Darwin & Mendel
• Darwin (1859), Origin of Species
  – Instant classic, major immediate impact
  – Problem: model of inheritance
• Darwin assumed blending inheritance
• Offspring = average of both parents: $z_o = (z_m + z_f)/2$
• Fleeming Jenkin (1867) pointed out the problem
  – $\mathrm{Var}(z_o) = \mathrm{Var}[(z_m + z_f)/2] = \tfrac{1}{2}\,\mathrm{Var}(\text{parents})$
  – Hence, under blending inheritance, half the variation is removed each generation, and this must somehow be replenished by mutation.

Mendel
• Mendel (1865), Experiments in Plant Hybridization
• No impact, paper essentially ignored
  – Ironically, Darwin had an apparently unread copy in his library
  – Why ignored? Perhaps too mathematical for 19th century biologists
• Rediscovery in 1900 (by three independent groups)
• Mendel’s key idea: genes are discrete particles passed on intact from parent to offspring

The height vs. pea debate (early 1900s)
Biometricians vs. Mendelians
Do quantitative traits have the same hereditary and evolutionary properties as discrete characters?
[Figure: trait values by genotype: qq = m − a, Qq = m + d, QQ = m + a]
RA Fisher (1918). Transactions of the Royal Society of Edinburgh 52: 399-433.
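Jenkin's objection to blending inheritance can be checked with a few lines of arithmetic. The sketch below is only an illustration and is not part of the original slides: it assumes the two parents have equal variance and are uncorrelated (random mating), and the function name `blending_variance` is made up for this example.

```python
def blending_variance(v0, generations):
    """Variance of the trait in each generation under blending inheritance.

    Assumes z_o = (z_m + z_f) / 2 with uncorrelated parents of equal
    variance, so Var(z_o) = Var(parents) / 2 in every generation.
    """
    variances = [v0]
    for _ in range(generations):
        # Each round of blending removes half of the remaining variance.
        variances.append(variances[-1] / 2.0)
    return variances


if __name__ == "__main__":
    for t, v in enumerate(blending_variance(1.0, 10)):
        print(f"generation {t}: variance = {v:.5f}")
```

Under these assumptions the variance left after t generations is V(0) / 2^t, which is why blending inheritance needs a very large input of new variation each generation to keep a trait variable.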
Population Genetics
• Allele and genotype frequencies
• Hardy-Weinberg Equilibrium
• Linkage (dis)equilibrium

Allele and Genotype Frequencies
Given genotype frequencies, we can always compute allele frequencies, e.g.,
$p_i = \mathrm{freq}(A_i A_i) + \frac{1}{2} \sum_{j \neq i} \mathrm{freq}(A_i A_j)$
The converse is not true: given allele frequencies, we cannot uniquely determine the genotype frequencies.
For n alleles, there are n(n+1)/2 genotypes.
If we are willing to assume random mating, the genotype frequencies are in Hardy-Weinberg proportions:
$\mathrm{freq}(A_i A_j) = p_i^2$ for $i = j$
$\mathrm{freq}(A_i A_j) = 2 p_i p_j$ for $i \neq j$

Hardy-Weinberg
• Prediction of genotype frequencies from allele frequencies
• Allele frequencies remain unchanged over generations, provided:
  – Infinite population size (no genetic drift)
  – No mutation
  – No selection
  – No migration
• Under HW conditions, a single generation of random mating gives genotype frequencies in Hardy-Weinberg proportions, and they remain in these proportions forever
• NB: QC in GWAS studies

Linkage equilibrium
• Random mating and recombination eventually change gamete frequencies so that they are in linkage equilibrium (LE).
• Once in LE, gamete frequencies do not change (unless acted on by other forces)
• At LE, alleles in gametes are independent of each other:
  freq(AB) = freq(A) * freq(B)
  freq(ABC) = freq(A) * freq(B) * freq(C)

Linkage disequilibrium
• When linkage disequilibrium (LD) is present, alleles are no longer independent: knowing that one allele is in the gamete provides information on alleles at other loci
  freq(AB) ≠ freq(A) * freq(B)
• The disequilibrium between alleles A and B is given by
  $D_{AB} = \mathrm{freq}(AB) - \mathrm{freq}(A)\,\mathrm{freq}(B)$
• GWAS relies on LD between markers and causal variants

[Figure: haplotypes carrying marker alleles (M1, M2) and QTL alleles (Q1, Q2) under linkage equilibrium and under linkage disequilibrium]

The Decay of Linkage Disequilibrium
• The frequency of the AB gamete is given by
  $\mathrm{freq}(AB) = \mathrm{freq}(A)\,\mathrm{freq}(B) + D_{AB}$
• If the recombination frequency between the A and B loci is c, the disequilibrium in generation t is
  $D(t) = D(0)\,(1 - c)^t$
• Note that D(t) -> zero, although the approach can be slow when c is very small
• NB: Gene mapping & GWAS
[Figure: LD (0.00 to 1.00) against generation (0 to 100) for c = 0.10, c = 0.01 and c = 0.001]

Forces that Generate LD
• Drift (finite population size)
• Selection
• Migration (admixture)
• Mutation
• Population structure (stratification)
Effective population size determines the number of markers needed for GWAS

Quantitative Genetics
The analysis of traits whose variation is determined by both a number of genes and environmental factors
[Figure: trait values by genotype: qq = m − a, Qq = m + d, QQ = m + a]
Phenotype is highly uninformative as to underlying genotype

Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation in the trait
• May be a single gene strongly influenced by environmental factors
• May be the result of a number of genes of equal (or differing) effect
• Most likely, a combination of both multiple genes and environmental factors
• Examples: blood pressure, cholesterol levels, IQ, height, etc.
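Looking back at the population-genetics slides, the Hardy-Weinberg proportions, the disequilibrium coefficient D_AB and its decay D(t) = D(0)(1 − c)^t are easy to evaluate numerically. The following is a minimal sketch for a biallelic locus; the function names are hypothetical and chosen only for this illustration.

```python
import math


def hardy_weinberg(p):
    """Genotype frequencies (AA, Aa, aa) at a biallelic locus under random mating."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def disequilibrium(freq_ab, freq_a, freq_b):
    """D_AB = freq(AB) - freq(A) * freq(B); equals zero at linkage equilibrium."""
    return freq_ab - freq_a * freq_b


def ld_after(d0, c, t):
    """D(t) = D(0) * (1 - c)**t for recombination frequency c."""
    return d0 * (1.0 - c) ** t


def generations_to_halve(c):
    """Generations needed for D to fall to half of its starting value."""
    return math.log(0.5) / math.log(1.0 - c)


if __name__ == "__main__":
    # The three recombination frequencies shown in the decay figure.
    for c in (0.10, 0.01, 0.001):
        print(f"c = {c}: D(100)/D(0) = {ld_after(1.0, c, 100):.3f}, "
              f"half-life = {generations_to_halve(c):.1f} generations")
```

The half-life makes the point of the decay slide concrete: with loose linkage LD disappears within a handful of generations, while between tightly linked loci it persists for hundreds, which is what lets a genotyped marker tag a nearby causal variant.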
Basic model of Quantitative Genetics
• Basic model: P = G + E
• G = average phenotypic value for that genotype if we were able to replicate it over the universe of environmental values, G = E[P]
• G x E interaction: G values differ across environments. The basic model then becomes P = G + E + GE

Biometrical model for single diallelic Quantitative Trait Locus (QTL)
Contribution of the QTL to the Mean (X): $\mu = \sum_i x_i f(x_i)$
Genotypes            AA     Aa     aa
Effect, x            a      d      -a
Frequencies, f(x)    p^2    2pq    q^2
$\mathrm{Mean}(X) = a p^2 + d(2pq) - a q^2 = a(p - q) + 2pqd$

Example: Apolipoprotein E & Alzheimer’s
Genotype                ee      Ee      EE
Average age of onset    68.4    75.5    84.3
2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95
d = G(Ee) - [G(EE) + G(ee)]/2 = -0.85
d/a ≈ -0.11
Only a small amount of dominance

Biometrical model for single diallelic QTL
Contribution of the QTL to the Variance (X): $\mathrm{Var} = \sum_i (x_i - \mu)^2 f(x_i)$
Genotypes            AA     Aa     aa
Effect, x            a      d      -a
Frequencies, f(x)    p^2    2pq    q^2
With m = Mean(X) and HW proportions:
$\mathrm{Var}(X) = (a - m)^2 p^2 + (d - m)^2\, 2pq + (-a - m)^2 q^2 = V_{QTL}$

Biometrical model for single diallelic QTL
$\mathrm{Var}(X) = (a - m)^2 p^2 + (d - m)^2\, 2pq + (-a - m)^2 q^2$
$= 2pq[a + (q - p)d]^2 + (2pqd)^2$
$= V_{A,QTL} + V_{D,QTL}$
• Additive effects: the main effects of individual alleles
• Dominance effects: represent the interaction between alleles

Biometrical model for single biallelic QTL
[Figure: genotypic values (-a, d, a; midpoint m) plotted against genotypes aa, Aa, AA; Fisher 1918]
Var(X) = Regression Variance + Residual Variance
       = Additive Variance + Dominance Variance

Association (GWAS)
• State of play
• Model
• Analysis method
• Power of detection

Disease                              Number of loci   Percent of heritability measure explained   Heritability measure
Age-related macular degeneration     5                50%                                         Sibling recurrence risk
Crohn’s disease                      32               20%                                         Genetic risk (liability)
Systemic lupus erythematosus         6                15%                                         Sibling recurrence risk
Type 2 diabetes                      18               6%                                          Sibling recurrence risk
HDL cholesterol                      7                5.2%                                        Phenotypic variance
Height                               40               5%                                          Phenotypic variance
Early onset myocardial infarction    9                2.8%                                        Phenotypic variance
Fasting glucose                      4                1.5%                                        Phenotypic variance

• GWAS works
• Effect sizes are typically small
  – Disease: OR ~1.1 to ~1.3
  – Quantitative traits: % var explained <<1%

Effect sizes QT (104 SNPs)
[Figure: histogram of effect sizes; x-axis: % variance explained, quantitative traits; y-axis: frequency]

Linear model for single SNP
• Allelic: additive model
  Y = µ + b*x + e
  x = 0, 1, 2 for genotypes aa, Aa and AA
• Genotypic: additive + dominance model
  Y = µ + G_i + e
  G_i = effect of the genotype group corresponding to genotypes aa, Aa and AA

Method
• Linear regression
• ANOVA
• (other: maximum likelihood, Bayesian)

Test statistic (allelic model)
$T = \hat{b} / \sigma(\hat{b}) \sim t_{N-2} \approx N(0, 1)$
$T^2 = \hat{b}^2 / \mathrm{var}(\hat{b}) \sim F_{1, N-2} \approx \chi^2_1$
$\mathrm{var}(\hat{b}) = \sigma_e^2 / [N\, \mathrm{var}(x)] = \sigma_e^2 / [N\, 2p(1 - p)]$

Statistical Power (additive model)
Proportion of phenotypic variance explained by the QTL:
$q^2 = 2p(1 - p)[a + d(1 - 2p)]^2 / \sigma_p^2$
Non-centrality parameter of the $\chi^2$ test:
$\lambda = N q^2 / (1 - q^2) \approx N q^2$
Required sample size given type-I (α) and type-II (β) error:
$N = [(1 - q^2)/q^2]\,(z_{1-\alpha/2} + z_{1-\beta})^2 \approx (z_{1-\alpha/2} + z_{1-\beta})^2 / q^2$

LD again
$r^2$ = LD correlation between QTL and genotyped SNP
Proportion of variance explained at SNP = $r^2 q^2$
Required sample size for detection:
$N \approx (z_{1-\alpha/2} + z_{1-\beta})^2 / (r^2 q^2)$

Genetic Power Calculator (Shaun Purcell)
http://pngu.mgh.harvard.edu/~purcell/gpc/

Serum bilirubin: if all GWAS were so simple…
[Figure: phenotype mean with 95% CI by genotype (0, 1 or 2 copies of RS2070959_A); 38% of phenotypic variance explained]
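The sample-size approximation from the power slides, N ≈ (z(1-α/2) + z(1-β))² / (r² q²), can be turned into a small calculator. The sketch below is not the Genetic Power Calculator and is not taken from the slides: the function names are hypothetical, and the default significance threshold of 5e-8 is an assumption (a commonly used genome-wide level) rather than a value given in the deck.

```python
from statistics import NormalDist


def qtl_variance_explained(p, a, d, var_p):
    """q^2 = 2p(1-p)[a + d(1-2p)]^2 / sigma_p^2 (additive variance of the QTL)."""
    alpha_sub = a + d * (1.0 - 2.0 * p)  # average effect of allele substitution
    return 2.0 * p * (1.0 - p) * alpha_sub ** 2 / var_p


def required_sample_size(q2, alpha=5e-8, power=0.8, r2=1.0):
    """Approximate N needed to detect a QTL explaining q2 of the variance.

    alpha: two-sided type-I error rate (5e-8 is an assumed default).
    power: 1 - beta, where beta is the type-II error rate.
    r2:    LD (squared correlation) between the QTL and the genotyped SNP.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = std_normal.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (r2 * q2)


if __name__ == "__main__":
    # A QTL explaining 0.1% of phenotypic variance, tagged perfectly or imperfectly.
    for r2 in (1.0, 0.8, 0.5):
        n = required_sample_size(0.001, r2=r2)
        print(f"r2 = {r2}: N of roughly {n:,.0f}")
```

Because N scales with 1 / (r² q²), halving either the LD with the causal variant or the variance explained doubles the required sample size, which is why small effect sizes push GWAS towards very large cohorts.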