Analysis of whole genome association studies in pedigreed populations

Goutam Sahana
Genetics and Biotechnology, Faculty of Agricultural Sciences
Aarhus University, 8830 Tjele, Denmark

Concept of mapping
• Identification of the genetic variant underlying disease susceptibility or a trait value
• Evidence for the location of the gene = causal variant

Approaches to mapping
1. Candidate gene studies
   • Association
   • Resequencing approaches
2. Genome-wide studies
   • Linkage analysis
   • Genome-wide association studies (linkage disequilibrium, LD mapping)

Linkage mapping
• Look for marker alleles that are correlated with the phenotype within a pedigree
• Different alleles can be connected with the trait in different pedigrees

Association mapping
• Marker alleles are correlated with a trait at the population level
• Can detect association by looking at unrelated individuals from a population
• Does not necessarily imply that markers are linked to (are close to) genes influencing the trait

Linkage vs. association
[Figure: effect size against frequency of the causal variant, with regions labelled "Linkage analysis", "Association study", "Very difficult" and "Unlikely to exist". Modified from D. Altshuler.]

Linkage vs. association
Potential advantage                                                      Linkage   Association
No prior information regarding gene function required                    +         +
Localization to small genomic region                                     -         +
Not susceptible to effects of stratification                             +         -/+
Sufficient power to detect common alleles of modest effect (MAF > 5%)    -/+       +
Ability to detect rare alleles (MAF < 1%)                                +         -
Tools for analysis available                                             +         +/-
Source: Hirschhorn & Daly, Nature Rev. Genet. 2005

Allelic association
• Direct association: the allele of interest is itself involved in the phenotype
• Indirect association: the allele itself is not involved, but is in LD with the functional variant
• Spurious association: confounding factors (e.g., population stratification)

Linkage disequilibrium
• Non-random association between alleles at different loci
• Loci are in LD if alleles are present on haplotypes in different proportions than expected based on allele frequencies
• Two alleles that are in LD occur together more often than would be expected by chance

Linkage disequilibrium
Locus A: alleles A and a, frequencies $p_A$ and $p_a$
Locus B: alleles B and b, frequencies $p_B$ and $p_b$

Haplotype   Expected frequency   Observed frequency
AB          $p_A p_B$            $p_{AB}$
Ab          $p_A p_b$            $p_{Ab}$
aB          $p_a p_B$            $p_{aB}$
ab          $p_a p_b$            $p_{ab}$

The loci are in LD when $D = p_{AB} - p_A p_B \neq 0$.

LD variation across genome
• The extent of LD is highly variable across the genome
• The determinants of LD are not fully understood
• Factors that are believed to influence LD:
   - Genetic drift
   - Population growth
   - Admixture or migration
   - Selection
   - Variable recombination rates

Haplotype
              Locus1  Locus2  Locus3  Locus4  Locus5  Locus6
Genotypes     2       1       3       4       2       1
              4       3       2       1       3       2
Haplotype 1   4       1       3       1       2       2
Haplotype 2   2       3       2       4       3       1
Identification of phase: PHASE, BEAGLE

Haplotype-based analysis
• Increased ability to identify regions that are shared identical by descent among affected individuals
• Haplotypes may be the causative 'composite allele' rather than a particular nucleotide at a particular SNP
• Haplotype analysis is meaningful only if the SNPs are themselves in LD

Monogenic versus complex traits
Monogenic trait
• A mutation in a single gene is both necessary and sufficient to produce the phenotype or to cause the disease
• The impact of the gene on genetic risk is the same in all families
• Follows a clear segregation pattern in families
• Typically rare in the population
Complex trait
• Multiple genes lead to genetic predisposition to a phenotype
• The pedigree reveals no Mendelian pattern
• Any particular gene mutation is neither sufficient nor necessary to explain the phenotype
• Environment makes a major contribution
• We study the relative impact of individual genes on the phenotype

Some examples
Disease              Mendelian (M) / Complex (C)   No. of genes   Incidence (per 100,000)
Cystic fibrosis      M                             1              40
Huntington disease   M                             1              5-10
Diabetes, type 2     C                             ?              10,000 – 20,000
Alzheimer            C                             ?              20,000
Schizophrenia        C                             ?              1000

Quantitative trait
• A biological trait that shows continuous variation rather than falling into distinct categories
• Quantitative trait locus (QTL): a genetic locus that is associated with variation in such a quantitative trait

Assessing genetic contributions to complex traits
• Continuous characters (weight, blood pressure)
   - Heritability: proportion of observed variance in phenotype explained by genetic factors
• Discrete characters (disease)
   - Relative risk ratio: λ = risk to a relative of an affected individual / risk in the general population
   - λ encompasses all genetic and environmental effects, not just those due to any single locus

Factors that influence identification of allelic association
• Effect size
• Linkage disequilibrium
• Disease and marker allele frequencies
• Sample size
Reviewed by Zondervan & Cardon, Nature Rev. Genet. 2004

Sample size by odds ratio
Disease allele freq.   Marker allele freq.   Odds ratio 3.0   Odds ratio 2.0   Odds ratio 1.3
0.2                    0.2                   150              360              2,900
0.2                    0.5                   430              1,250            11,000
0.05                   0.2                   1,170            4,150            40,000
0.05                   0.5                   4,200            15,000           160,000
No. of cases = no. of controls; D' = 0.7; power 80%; α = 0.001
Zondervan & Cardon (Nature Rev. Genet. 2004)

Population stratification
Consider two case/control samples, genotyped at a marker with alleles M and m.

Sample A      M      m      Freq.
Affected      50     50     0.10
Unaffected    450    450    0.90
Freq.         0.50   0.50
$\chi^2$: not significant (NS)

Sample B      M      m      Freq.
Affected      1      9      0.01
Unaffected    99     891    0.99
Freq.         0.10   0.90
$\chi^2$: not significant (NS)

Samples A and B pooled
              M      m      Freq.
Affected      51     59     0.055
Unaffected    549    1341   0.945
Freq.         0.30   0.70
$\chi^2 = 14.8$, P < 0.001

Dealing with population structure
• Genomic control (Devlin and Roeder, 1999)
   - Stratification inflates the distribution of the test statistic by a factor λ
   - λ is estimated from the data
   - Adjust the test statistics
[Figure: $\chi^2$ at the test locus and at unlinked 'null' markers, compared with $E(\chi^2)$, without and with stratification.]

Dealing with population structure
• Structured association (Pritchard et al., 2000)
   - Discover structure from a set of unlinked markers, i.e. assign probabilities of ancestry from k populations to each individual, and then control for it

Association analysis approaches
• Case–control studies
   - Marker allele frequencies are determined in a group of affected individuals and compared with allele frequencies in a control population
• Family-based methods
   - Based on unequal transmission of alleles from parents to a single affected child in each family. Associations are summed over many unrelated families

Case-control studies: $\chi^2$ test
Genotypes (2x3 contingency table)
         11     12     22     Total
Case     n11    n12    n22    N
Ctrl     m11    m12    m22    M
Total    T11    T12    T22    N+M

Alleles (2x2 contingency table)
         1      2      Total
Case     n1     n2     2N
Ctrl     m1     m2     2M
Total    T1     T2     2(N+M)

Test of independence: $\chi^2 = \sum (O - E)^2 / E$, with 2 df (genotypes) or 1 df (alleles)

Family-based tests
• Genotypes from independent family trios where the child is affected
• Use the non-transmitted genotypes or alleles as internal controls to the transmitted ones

Family-based association studies
[Figure: trio with parental genotypes 1/2 and 3/4 and an affected child with genotype 1/4; alleles 1 and 4 are transmitted, and the non-transmitted alleles 2 and 3 serve as the control.]
Is an allele transmitted more often than it is not transmitted to affected offspring?

TDT: Transmission Disequilibrium Test
[Figure: trio with parents G/g and G/G and an affected child G/g.]
                 Non-transmitted
Transmitted      G      g
G                a      b
g                c      d
$TDT_G = (T_G - NT_G)^2 / (T_G + NT_G) = (b - c)^2 / (b + c) \sim \chi^2_1$

TDT: Transmission Disequilibrium Test
• Multiallelic markers: ETDT (Sham & Curtis, 1995)
• Missing parent genotypes: TRANSMIT (Clayton, 1999)
• Haplotypes: TDTHAP (Clayton & Jones, 1999)
• Sibs: TDT/STDT (Spielman & Ewens, 1998)
• Pedigrees: PBAT (Martin et al., 2000)
• Quantitative traits: QTDT (Abecasis et al., 2000)

Some limitations
• Subjects: random or structured families
• Parents not available
• Difficult when there are very many genes, individually of small effect
• Environmental influence may obscure genetic effects
• Genetic heterogeneity underlying the disease phenotype
• Hidden (unaccounted) relationships

Rare allele
[Figure: a single segregating family with alleles A, a, B and b, split into offspring group I and offspring group II.]

Complex pedigree & quantitative traits
Complex pedigree
• Non-independence among pedigree members
• The polygenic relationship alone is not sufficient
• Association analysis should account for the point-wise relationship among individuals
• Identical-by-descent probabilities

Methods
• Combined linkage and LD
• Generalized linear models
• Mixed model (Yu et al. 2006)
• Bayesian approach

Combined linkage and LD
Phenotype = Fixed factors + Polygene + Haplotype
• Polygene: the whole relationship in the pedigree is used
• Identical-by-descent coefficients were estimated for the point-wise relationship
Phase determination: GDQTL
QTL mapping: DMU

QTL for clinical mastitis in cattle
[Figure, three panels: LRT (0 to 16) against position in Morgan (0.0 to 1.0). The first panel shows the LA profile, the second adds LD, and the third adds LD/LA.]

Simulation
• 100 half-sib families (dairy cattle pedigree)
• 2000 progeny
• 5 chromosomes of 100 cM each
• 5000 SNPs
• 15 QTL (1 QTL at 10%, 4 QTL at 5%, 10 QTL at 2%), together 50% of the genetic variance
• Heritability: 30%

Generalized linear models
Phenotype = Sire-family + genotype
Software: TASSEL http://www2.maizegenetics.net/index.php?page=bioinformatics/tassel
[Figure, three panels: -ln(p) (0 to 120) against position (0 to 5).]

Mixed model (Yu et al. 2006)
Phenotype = Fixed factors + SNP + Population + Polygene
[Diagram annotations on the model terms: "0 1 2", "STRUCTURE", "Relationship".]
Software: SAS mixed model (Gael Pressoir)
[Figure, three panels: -ln(p) (0 to 120) against position (0 to 5).]

Bayesian approach
Phenotype = Fixed factors + Polygene + Allele or Haplotype
• All markers are fitted simultaneously, searching for the marker combination that explains the trait variation
• Avoids multiple testing
Software: iBays (Janss LLG, 2007)
[Figure, two panels: posterior probability (0 to 1) against position (0 to 5).]

Multiple testing
• Performing one test at an alpha level of 0.05 implies a 5% chance of rejecting a true null hypothesis (false positive, FP)
• When performing 100 tests at α = 0.05 with all 100 H0 true, we expect 5 of the tests to give FP results
• $\Pr(\text{at least one FP}) = 1 - \Pr(\text{no FP}) = 1 - (0.95)^{100} = 0.994$ (if the tests are independent)

Multiple testing
• Bonferroni correction
   - Rejection level of each test is $\alpha_i = \alpha / m$
• Permutation test
• False discovery rate (FDR)
   - What proportion of rejections occur when H0 is true?
   - Of all the times you reject H0, how often is H0 true?
   - q value (Storey et al. PNAS 2003)

Summary
4 methods:
• LD and linkage
• GLM
• Mixed model
• Bayesian approach

Project team
Goutam Sahana
Bernt Guldbrandtsen
Luc Janss
Mogens Sandø Lund
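
Worked check (not part of the original slides)
Several formulas quoted on the slides can be checked numerically. The block below is a minimal Python sketch, assuming a plain Pearson χ² without continuity correction and independent tests. All function names are hypothetical helpers introduced for illustration. The only data used are the population stratification tables, the 100 tests at α = 0.05, and the 5000 SNPs of the simulation.

```python
"""Numerical checks of formulas from the slides (illustrative helpers only)."""


def ld_coefficient(p_ab, p_a, p_b):
    """D = p_AB - p_A * p_B; D != 0 indicates linkage disequilibrium."""
    return p_ab - p_a * p_b


def chi_square_2x2(table):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def tdt_statistic(b, c):
    """TDT = (b - c)^2 / (b + c), with b and c the discordant transmission counts."""
    if b + c == 0:
        raise ValueError("TDT is undefined without informative transmissions")
    return (b - c) ** 2 / (b + c)


def familywise_error_rate(alpha, m):
    """Pr(at least one false positive) over m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test rejection level alpha / m."""
    return alpha / m


if __name__ == "__main__":
    # Rows: affected, unaffected. Columns: allele M, allele m (from the slides).
    sample_a = [[50, 50], [450, 450]]
    sample_b = [[1, 9], [99, 891]]
    pooled = [[51, 59], [549, 1341]]

    print(f"chi2 sample A: {chi_square_2x2(sample_a):.1f}")  # 0.0
    print(f"chi2 sample B: {chi_square_2x2(sample_b):.1f}")  # 0.0
    print(f"chi2 pooled:   {chi_square_2x2(pooled):.1f}")    # 14.8

    print(f"FWER, 100 tests at 0.05: {familywise_error_rate(0.05, 100):.3f}")  # 0.994
    print(f"Bonferroni level, 5000 SNPs: {bonferroni_threshold(0.05, 5000):.1e}")  # 1.0e-05
```

Neither sample shows any association on its own, yet the pooled table gives χ² = 14.8. The spurious signal arises only because the two samples differ in both allele frequency and disease frequency.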