Association analysis
Shaun Purcell
Boulder Twin Workshop 2004

Overview
• Candidate gene association
• Haplotypes and linkage disequilibrium
• Linkage and association
• Family-based association

What is association?
• Categorical traits – disease susceptibility genes
• Continuous traits – quantitative trait loci (QTL)

Disease traits
Is there a difference in allele/genotype frequency between cases and controls?

              Case   Control
    AA        n1     n2
    Aa        n3     n4
    aa        n5     n6

Disease traits (example)

              AA       Aa         aa
    Case      30       50         20
    Control   25       50         25
              p²       2p(1-p)    (1-p)²

Test for independence: χ², p-value

Disease traits: genetic models

General model (2 df)
              Case   Control
    AA        n1     n2
    Aa        n3     n4
    aa        n5     n6

Additive model (1 df)
              Case       Control
    A         2n1+n3     2n2+n4
    a         2n5+n3     2n6+n4

Dominant model for A (1 df; A* = AA or Aa)
              Case       Control
    A*        n1+n3      n2+n4
    aa        n5         n6

Effect sizes are calculated as odds ratios.

Quantitative traits
[Figure: trait value plotted by genotype (aa, Aa, AA)]

Y = aA + dD + e

    ID    Y      G     A    D
    001   0.34   aa   -1    0
    002   1.23   Aa    0    1
    003   1.66   Aa    0    1
    004   2.74   AA    1    0
    005   1.33   AA    1    0
    …

A codes the additive effect (aa = -1, Aa = 0, AA = 1); D codes the dominance deviation (1 for heterozygotes, 0 for homozygotes).

Some web resources
• BGIM http://statgen.iop.kcl.ac.uk/bgim/
  Introductory tutorials on twin analysis, primer on maximum likelihood, Mx language.
• GxE moderator models http://statgen.iop.kcl.ac.uk/gxe/
• Power calculation http://statgen.iop.kcl.ac.uk/gpc/
• Case/control association tools http://statgen.iop.kcl.ac.uk/gpc/model/

Relative risk

    Genotype   P(D|G)     RR
    AA         P(D|AA)    P(D|AA)/P(D|aa)
    Aa         P(D|Aa)    P(D|Aa)/P(D|aa)
    aa         P(D|aa)    1

P(D|AA) / P(D|aa) is labelled RR(AA); P(D|Aa) / P(D|aa) is labelled RR(Aa).

Genetic models

    Model            RR(Aa)   RR(AA)
    General          x        y
    Multiplicative   x        x²
    Dominant         x        x
    Recessive        1.000    x
    No effect        1.000    1.000

Tests

    Question                                     Alternate        Null
    Any effect?                                  General          No effect
    Any effect assuming a multiplicative gene?   Multiplicative   No effect
    Any effect assuming a dominant gene?         Dominance        No effect
    Any effect assuming a recessive gene?        Recessive        No effect
    Can we assume a multiplicative effect?       General          Multiplicative
    Can we assume a dominant effect?             General          Dominance
    Can we assume a recessive effect?            General          Recessive

Multiple samples
• Constrain frequencies across samples
• Constrain effects across samples
  – Can test genetic models with effects and/or frequencies constrained to be equal
  – Can perform tests of homogeneity of effects and/or frequencies across samples

An example: 2 case/control samples
• Population frequency 5%

              Sample 1           Sample 2
              Case   Control     Case   Control
    AA        17     11          37     10
    Aa        35     59          67     43
    aa        24     40          37     20

Homogeneous effects across samples, homogeneous allele frequencies across samples

    Model   p (1)   p (2)   RR(Aa) (1)  RR(Aa) (2)  RR(AA) (1)  RR(AA) (2)   -2LL
    Gen     0.367   0.367   1.979       1.979       3.663       3.663        793.143
    Mult    0.367   0.367   1.911       1.911       3.651       3.651        793.199
    Dom     0.401   0.401   1.990       1.990       1.990       1.990        802.927
    Rec     0.405   0.405   1.000       1.000       1.921       1.921        805.064
    None    0.442   0.442   1.000       1.000       1.000       1.000        815.628

Heterogeneous effects across samples, homogeneous allele frequencies across samples

    Model   p (1)   p (2)   RR(Aa) (1)  RR(Aa) (2)  RR(AA) (1)  RR(AA) (2)   -2LL
    Gen     0.367   0.367   1.235       2.890       2.136       5.547        786.498
    Mult    0.367   0.367   1.440       2.282       2.073       5.208        788.262
    Dom     0.401   0.401   1.216       2.936       1.216       2.936        796.422
    Rec     0.405   0.405   1.000       1.000       1.519       2.195        803.849
    None    0.443   0.443   1.000       1.000       1.000       1.000        815.628

TESTS OF GENETIC MODELS -- ASSUMING EQ EFFECTS & EQ FREQS
=========================================================
Gen  vs None (2 df) : 22.485   p = 0.000
Mult vs None (1 df) : 22.429   p = 0.000
Dom  vs None (1 df) : 12.701   p = 0.000
Rec  vs None (1 df) : 10.564   p = 0.001
Gen  vs Mult (1 df) :  0.056   p = 0.813
Gen  vs Dom  (1 df) :  9.784   p = 0.002
Gen  vs Rec  (1 df) : 11.921   p = 0.001

TESTS OF GENETIC MODELS -- ASSUMING UNEQ EFFECTS & EQ FREQS
===========================================================
Gen  vs None (4 df) : 29.130   p = 0.000
Mult vs None (2 df) : 27.366   p = 0.000
Dom  vs None (2 df) : 19.205   p = 0.000
Rec  vs None (2 df) : 11.779   p = 0.003
Gen  vs Mult (2 df) :  1.764   p = 0.414
Gen  vs Dom  (2 df) :  9.925   p = 0.007
Gen  vs Rec  (2 df) : 17.351   p = 0.000

TESTS OF EQUAL EFFECTS -- ASSUMING
EQ FREQS
===========================================
w/ Gen model  (2 df) : 6.645   p = 0.036
w/ Mult model (1 df) : 4.938   p = 0.026
w/ Dom model  (1 df) : 6.505   p = 0.011
w/ Rec model  (1 df) : 1.215   p = 0.270

Indirect association
[Figure: genotyped markers, ungenotyped markers and the QTL]

Recombination
[Figure: homologous chromosomes (paternal and maternal) in one parent]
• Recombination event during meiosis: the recombinant gamete is transmitted, harboring the mutation.
• No recombination event during meiosis: the nonrecombinant gamete is transmitted, not harboring the mutation.

Linkage: affected sib pairs
[Figure: paternal and maternal chromosomes of one parent and two affected offspring]
• First affected offspring: no recombination
• Second affected offspring: recombinant gamete
• IBD sharing from this one parent (0 or 1)

Association analysis
• Mutation occurs on a 'red' chromosome
• Association due to 'linkage disequilibrium'

Haplotypes

             M     m
    A        AM    Am
    a        aM    am

• An individual with aa and Mm genotypes has am and aM haplotypes.
• An individual with Aa and Mm genotypes may have AM and am haplotypes … but given only genotype data, this is consistent with Am/aM as well as AM/am.
• An individual with AA and Mm genotypes has AM and Am haplotypes.

Equilibrium haplotype frequencies

             M         m
    A        pr        ps        p
    a        qr        qs        q
             r         s

Linkage disequilibrium

             M         m
    A        pr + D    ps - D    p
    a        qr - D    qs + D    q
             r         s

    D_MAX = min(ps, qr) if D > 0;  D_MAX = min(pr, qs) if D < 0
    D' = D / D_MAX
    r² = D² / pqrs

Haplotype analysis
1. Estimate haplotypes from genotypes
2. Associate haplotypes with trait

    Haplotype   Freq.   Odds ratio
    AAGG        40%     1.00*
    AAGT        30%     2.21
    CGCG        25%     1.07
    AGCT         5%     0.92
    * baseline, fixed to 1.00

Linkage vs. association
[Figure: Linkage panel: sib correlation plotted against IBD sharing (0, 1, 2) at the QTL and at the marker, related through the recombination fraction (RF). Association panel: trait plotted against QTL genotype (aa, Aa, AA) and against marker genotype, related through LD.]

Variance Components
• Means: M1, M2 (ASSOCIATION)
• Variance-covariance matrix (LINKAGE):
    V1     C12
    C21    V2

Variance Components
• Means (ASSOCIATION):
    M1 + βG1,  M2 + βG2
  β = regression coefficient; G = individual's genotype
• Variance-covariance matrix (LINKAGE):
    V1                 C12 + θ(π - ½)
    C21 + θ(π - ½)     V2
  θ = regression coefficient; π = IBD sharing (0, ½, 1)

Components of a Genetic Theory
• POPULATION MODEL
  – Allele & genotype frequencies
  – Demographics & population history
  – Linkage disequilibrium, haplotype structure
• TRANSMISSION MODEL
  – Mendelian segregation
  – Identity by descent & genetic relatedness
• PHENOTYPE MODEL
  – Biometrical model of quantitative traits
  – Additive & dominance components
[Figure: genotypes (G) transmitted across generations over time, with phenotypes (P)]

Linkage without association
[Figure: two families with marker genotypes]
Both families are 'linked' with the marker… …but a different allele is involved.
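The case/control tests and LD measures above can be sketched in Python. This is a minimal illustration under stated assumptions, not the software that produced the output tables: the -2LL model fits in those tables come from maximum likelihood, which this sketch does not reproduce. It uses Pearson chi-square tests on genotype, allele and dominant-coded counts (counts ordered (AA, Aa, aa); all function names are hypothetical), a closed-form chi-square tail probability that reproduces reported p-values such as 0.813 for Gen vs Mult (0.056, 1 df), and Lewontin's D' with D_max = min(ps, qr) for D > 0 and min(pr, qs) for D < 0.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2-1} y^k / k!
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k-1/2) / Gamma(k+1/2)
    extra = sum(y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * extra


def pearson_chi2(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def genotype_tests(case, control):
    """General (2 df), allelic (1 df) and dominant-for-A (1 df) tests.

    case, control: genotype counts in the order (AA, Aa, aa) (assumed).
    """
    general = pearson_chi2([list(case), list(control)])
    # Allele counts: A = 2*n(AA) + n(Aa), a = 2*n(aa) + n(Aa)
    alleles = [[2 * g[0] + g[1], 2 * g[2] + g[1]] for g in (case, control)]
    allelic = pearson_chi2(alleles)
    # Dominant coding for A: carriers (AA + Aa) versus aa
    dominant = pearson_chi2([[g[0] + g[1], g[2]] for g in (case, control)])
    # Allelic odds ratio: odds of A among case alleles / among control alleles
    odds_ratio = (alleles[0][0] * alleles[1][1]) / (alleles[0][1] * alleles[1][0])
    return {"general": general, "allelic": allelic,
            "dominant": dominant, "allelic_or": odds_ratio}


def ld_measures(f_AM, p, r):
    """D, D' and r^2 from the AM haplotype frequency f_AM, the A allele
    frequency p and the M allele frequency r (assumes 0 < p, r < 1)."""
    q, s = 1.0 - p, 1.0 - r
    d = f_AM - p * r
    # Lewontin's D_max depends on the sign of D
    d_max = min(p * s, q * r) if d > 0 else min(p * r, q * s)
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p * q * r * s)
    return d, d_prime, r2


if __name__ == "__main__":
    res = genotype_tests((30, 50, 20), (25, 50, 25))
    for name in ("general", "allelic", "dominant"):
        stat, df = res[name]
        print(f"{name}: chi2 = {stat:.3f}, df = {df}, p = {chi2_sf(stat, df):.3f}")
    print(f"allelic OR = {res['allelic_or']:.3f}")
    # p-values for two rows of the TESTS OF GENETIC MODELS output
    print(round(chi2_sf(0.056, 1), 3), round(chi2_sf(1.764, 2), 3))
    # Hypothetical haplotype and allele frequencies
    print(ld_measures(f_AM=0.30, p=0.40, r=0.50))
```

Taking cases as 30/50/20 and controls as 25/50/25, the general test gives χ² ≈ 1.01 (2 df, p ≈ 0.60), the allelic test χ² ≈ 1.00 (1 df) and an allelic odds ratio of about 1.22. The hypothetical LD input (AM frequency 0.30, p = 0.4, r = 0.5) gives D = 0.1, D' = 0.5 and r² ≈ 0.167.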
Linkage and association
[Figure: several families with marker genotypes]
All families are 'linked' with the marker… … and allele 6 is 'associated' with disease.
Linkage is just association within families.

Association without linkage
[Figure: cases and controls drawn from two populations, with marker genotypes]
Allele 6 is more common in the GREEN population.
The disease is more common in the GREEN population.
… a 'spurious association'

TDT
• Transmission disequilibrium test – test for linkage and association
[Figure: parent-offspring trios with genotypes]

TDT: "A" disease allele

    Parents      AA x Aa   AA x Aa   aa x Aa   aa x Aa
    Offspring    AA        Aa        Aa        aa
    Additive     +         -         +         -
    Dominant     0.5       0.5       +         -
    Recessive    +         -         0.5       0.5

Between and within components
[Figure: Sib1 and Sib2]
Sib1 = B + W
Sib2 = B - W

Between and within components
• Fulker et al (1999)

    S1   S2   S1   S2   B     W
    AA   AA   1     1   1     0
    AA   Aa   1     0   0.5   0.5
    AA   aa   1    -1   0     1

S1 = B + W, S2 = B - W
Note: W = S1 - B

Parental genotypes
• Use parental genotypes to generate B
• Examples
  – AA from AA x AA: W = 0
  – Aa from AA x Aa: W = -0.5
  – Aa from Aa x Aa: W = 0

    B (mean of parental codes)
                 Mat = 1   Mat = 0   Mat = -1
    Pat = 1      1         0.5       0
    Pat = 0      0.5       0         -0.5
    Pat = -1     0         -0.5      -1

assoc.mx
• Sibling pair sample
• B and W components precalculated in input file
• Single SNP genotype
• Quantitative trait

assoc.dat

    s1       s2       g1   g2   b      w1     w2
    -0.007   -0.972   -1   0    -0.5   -0.5    0.5
    -0.829   -0.196    1   1     1      0      0
     0.369    0.645    1   1     1      0      0
     0.318    1.55     0   1     0.5   -0.5    0.5
     1.52     0.910    0   0     0      0      0
    -0.948   -1.55     1   1     1      0      0
     0.596   -0.394    1   0     0.5    0.5   -0.5
    -1.91    -0.905    0   1     0.5   -0.5    0.5
     0.499    0.940    1   0     0.5    0.5   -0.5
    -1.17    -1.29     1   0     0.5    0.5   -0.5
    -0.16    -1.81     1   1     1      0      0

    ! Mx script for QTL association: sib pairs, univariate
    Group 1 : Calc NG=2
    Begin Matrices;
    ! ** Parameters
     B Full 1 1 free   ! association : between component
     W Full 1 1 free   ! association : within component
     M Full 1 1 free   ! mean
     S Full 1 1 free   ! Shared residual variance
     N Full 1 1 free   ! Nonshared residual variance
    ! ** Definition variables **
     C Full 1 1        ! association : between
     X Full 1 1        ! association : within, sib 1
     Y Full 1 1        ! association : within, sib 2
    End Matrices;
    ! ** Uncomment for B=W model
    ! Equate W 1 1 1 B 1 1 1
    ! Starting values
    Matrix B 0
    Matrix W 0
    Matrix M 0
    Matrix S 0.5
    Matrix N 0.5
    End

    Group2 : Data Group
    Data NI=7 NO=0
    RE file=assoc.dat
    Labels Sib1 Sib2 g1 g2 b w1 w2
    Select Sib1 Sib2 b w1 w2 /
    Definition b w1 w2 /
    Matrices = Group 1
    Means M + B*C + W*X | M + B*C + W*Y /
    Covariance S + N | S _
               S     | S + N /
    Specify C b /
    Specify X w1 /
    Specify Y w2 /
    End

Models (matrix declarations that differ between models)

    B&W     B Full 1 1 free   W Full 1 1 free   !Equate W 1 1 1 B 1 1 1
    B=W     B Full 1 1 free   W Full 1 1 free   Equate W 1 1 1 B 1 1 1
    B       B Full 1 1 free   W Full 1 1        !Equate W 1 1 1 B 1 1 1
    B=W=0   B Full 1 1        W Full 1 1        !Equate W 1 1 1 B 1 1 1

Tests

    Test                        HA      H0
    Standard association test   B=W     B=W=0
    Test of stratification      B&W     B=W
    Robust association test     B&W     B

assoc.mx results

    Model    B         W              -2LL      df
    B&W      -0.478    -0.365         2103.96   795
    B=W      -0.420    -0.420         2105.05   796
    B        -0.4778   0 (fixed)      2127.01   796
    B=W=0    0 (fixed) 0 (fixed)      2163.34   797

Test of total association
    HA: B=W (2105.05)   H0: B=W=0 (2163.34)
    Δ-2LL = 58.29, df = 1, p ≈ 2e-14

Test of stratification
    HA: B&W (2103.96)   H0: B=W (2105.05)
    Δ-2LL = 1.09, df = 1, p ≈ 0.30

Test of within association
    HA: B&W (2103.96)   H0: B (2127.01)
    Δ-2LL = 23.06, df = 1, p ≈ 1.6e-6

Implementation
• QTDT – Abecasis et al (2001) AJHG
  – extends between/within model to general pedigrees
  – multiple alleles
  – covariates
  – combined test of linkage and association
  – discrete as well as quantitative traits

Linkage
• families
• detectable over large distances (> 10 cM)
• detects only large effects (OR > 3, variance > 10%)

Association
• unrelateds or families
• detectable over small distances (< 1 cM)
• can detect small effects (OR < 2, variance < 1%)
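The between/within coding, the likelihood-ratio tests and the TDT can be checked with a short Python sketch. It assumes the genotype coding of assoc.dat (aa = -1, Aa = 0, AA = 1), computes B as the sib-pair mean and W as each sib's deviation from it (so B, w1, w2 match the assoc.dat columns), and converts -2LL differences between nested models into 1-df chi-square p-values. For the TDT it uses the standard McNemar form (b - c)² / (b + c), where b and c count transmissions and non-transmissions of A from heterozygous parents; the slides name the test but do not write this formula out. The function names and the transmission counts (60 vs 40) are hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))


def between_within(g1: float, g2: float):
    """Between/within decomposition for a sib pair: B = pair mean, W_i = g_i - B."""
    b = (g1 + g2) / 2.0
    return b, g1 - b, g2 - b


def between_from_parents(pat: float, mat: float) -> float:
    """B generated from parental genotype codes: the mean of the two codes."""
    return (pat + mat) / 2.0


def tdt(transmitted: int, untransmitted: int):
    """McNemar-form TDT statistic and its 1-df p-value (hypothetical helper)."""
    stat = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    return stat, chi2_sf_1df(stat)


def lrt(m2ll_null: float, m2ll_alt: float):
    """Likelihood-ratio test for nested models that differ by one parameter."""
    delta = m2ll_null - m2ll_alt
    return delta, chi2_sf_1df(delta)


if __name__ == "__main__":
    # Genotype pairs taken from assoc.dat rows: (g1, g2) -> (b, w1, w2)
    for g1, g2 in [(-1, 0), (1, 1), (0, 1), (1, 0)]:
        print((g1, g2), between_within(g1, g2))
    # Aa offspring (code 0) of an AA x Aa mating: W = 0 - B
    print(0 - between_from_parents(1, 0))
    # -2LL values as reported in the assoc.mx results table
    fits = {"B&W": 2103.96, "B=W": 2105.05, "B": 2127.01, "B=W=0": 2163.34}
    for alt, null in [("B=W", "B=W=0"), ("B&W", "B=W"), ("B&W", "B")]:
        delta, p = lrt(fits[null], fits[alt])
        print(f"{alt} vs {null}: delta -2LL = {delta:.2f}, p = {p:.2g}")
    print(tdt(60, 40))  # hypothetical transmission counts
```

With the rounded -2LL values, the three comparisons give Δ-2LL of 58.29, 1.09 and 23.05, with 1-df p-values of roughly 2e-14, 0.30 and 1.6e-6. The hypothetical 60/40 transmission split gives a TDT statistic of 4.0 (p ≈ 0.046).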