Computational Challenges in Whole-Genome Association Studies
Ion Mandoiu
Computer Science and Engineering Department
University of Connecticut

Approaches to Disease Gene Mapping
Linkage analysis: $\mathrm{LOD}(\theta) := \log_{10}\left( L(\theta) / L(1/2) \right)$, where $\theta$ is the recombination fraction.
Very successful for Mendelian diseases (cystic fibrosis, Huntington's, ...).
Low power to detect genes with small relative risk in complex diseases [RischMerikangas'96].
Association analysis: cases vs. controls, compared with a $\chi^2$-test.
Genome-wide scans are made possible by recent progress in SNP genotyping technologies.

Computational Challenges
Detecting genotyping errors
Imputation of missing genotypes
Imputation of untyped genotypes based on a reference population (e.g., HapMap)
Haplotype inference and haplotype-based association tests
Modeling gene-gene interactions
Handling structural variation data provided by new sequencing technologies
Optimal multi-stage study design

Genotype Error Detection
Genotyping errors remain a real problem despite advances in technology.
In [KMP07] we proposed efficient methods for error detection in trio data, based on a log-likelihood ratio (LLR) approach combined with an HMM model of haplotype diversity.
[Figure panels: Parents-COMBINED and Children-COMBINED (NO_ERR vs. ERR counts, log scale, with #FP=#FN line); Sensitivity vs. FP rate for TotalProb-TRIO, TotalProb-COMBINED and FAMHAP]
In ongoing work we seek to improve error detection accuracy by using low-level data such as genotyping confidence scores.

Genotype Imputation
Current genotyping platforms cover fewer than 1 million of the ~10 million SNPs, so the causal variant is unlikely to be assayed directly.
Untyped SNPs can be imputed based on linkage disequilibrium (LD) information inferred from high-density datasets such as HapMap.
Maximum likelihood approach: probabilities are computed using an HMM.
[Figure panels: Allele frequency, typed genotypes; Allele frequency, imputed genotypes]

Acknowledgements & Advertisement
Justin Kennedy, Bogdan Pasaniuc
NSF funding (Awards 0546457 and 0543365)

DIMACS Workshop on Computational Issues in Genetic Epidemiology
August 21-22, 2008
DIMACS Center, CoRE Building, Rutgers University
Presented under the auspices of the DIMACS/BioMaPS/MB Center Special Focus on Information Processing in Biology.
Organizers: Andrew Scott Allen, Duke University; Ion Mandoiu, University of Connecticut; Dan Nicolae, University of Chicago; Yi Pan, Georgia State University; Alex Zelikovsky, Georgia State University
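The linkage-analysis LOD score on the "Approaches to Disease Gene Mapping" slide, $\mathrm{LOD}(\theta) = \log_{10}(L(\theta)/L(1/2))$ with $\theta$ the recombination fraction, can be made concrete in the simplest textbook setting. The minimal sketch below assumes phase-known, fully informative meioses, where R of N meioses are recombinant, so that $L(\theta) = \theta^R (1-\theta)^{N-R}$. The function names are illustrative only. Real linkage software must also handle pedigrees, unknown phase and penetrance models.

```python
import math
from typing import Tuple


def log10_likelihood(theta: float, recombinants: int, meioses: int) -> float:
    """log10 L(theta) with L(theta) = theta^R * (1 - theta)^(N - R).

    Assumes phase-known, fully informative meioses (illustrative setting).
    """
    nonrecombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a recombinant is impossible at theta = 0
    total = 0.0
    if recombinants > 0:
        total += recombinants * math.log10(theta)
    if nonrecombinants > 0:
        total += nonrecombinants * math.log10(1.0 - theta)
    return total


def lod_score(theta: float, recombinants: int, meioses: int) -> float:
    """LOD(theta) = log10(L(theta) / L(1/2))."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    return (log10_likelihood(theta, recombinants, meioses)
            - log10_likelihood(0.5, recombinants, meioses))


def max_lod(recombinants: int, meioses: int) -> Tuple[float, float]:
    """Maximum-likelihood theta (R/N, capped at 1/2) and its LOD score."""
    if meioses <= 0:
        raise ValueError("need at least one meiosis")
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(theta_hat, recombinants, meioses)


# Example: 2 recombinants among 20 informative meioses
theta_hat, lod = max_lod(recombinants=2, meioses=20)
print(f"theta_hat = {theta_hat:.2f}, LOD = {lod:.2f}")  # theta_hat = 0.10, LOD = 3.20
```

With 20 meioses the maximum LOD only just passes the conventional threshold of 3. This illustrates why linkage has limited power for the small relative risks typical of complex diseases.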
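For the association analysis on the same slide, a common single-SNP version of the case/control $\chi^2$-test is Pearson's test on a 2×2 table of allele counts. The slides do not say whether allele or genotype counts are used, so the allelic form here is an assumption. The sketch uses only the standard library: for 1 degree of freedom the $\chi^2$ survival function equals `erfc(sqrt(x / 2))`. The counts in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple


def allelic_chi2_test(case_counts: Sequence[int],
                      control_counts: Sequence[int]) -> Tuple[float, float]:
    """Pearson chi-square test of allele counts, cases vs. controls.

    case_counts, control_counts: (copies of allele A, copies of allele B).
    Returns (statistic, p-value) for 1 degree of freedom.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column of the 2x2 table needs a nonzero total")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 200 cases and 200 controls, i.e. 400 alleles per group
stat, p = allelic_chi2_test(case_counts=(240, 160), control_counts=(200, 200))
print(f"chi2 = {stat:.2f}")  # chi2 = 8.08
print(f"p = {p:.2g}")
```

In a genome-wide scan this test is repeated at every typed SNP, so each p-value has to be judged against a multiple-testing-corrected threshold rather than 0.05.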
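The "Genotype Error Detection" slide combines an LLR score with an HMM of haplotype diversity [KMP07]. The sketch below keeps only the LLR idea and swaps the HMM for a much cruder single-SNP model (Hardy-Weinberg founder genotypes plus Mendelian transmission). It is an illustration under stated assumptions, not the published method. For the tested trio member it compares the best likelihood obtained by changing that member's genotype with the likelihood of the genotypes as observed. This particular ratio is chosen here for illustration. Genotypes are coded 0/1/2 as copies of allele 1, and `p` is an assumed frequency of allele 1.

```python
import math
from typing import Tuple

Trio = Tuple[int, int, int]  # (father, mother, child); genotype = copies of allele 1


def founder_prob(genotype: int, p: float) -> float:
    """Hardy-Weinberg probability of a founder genotype (assumed model)."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2][genotype]


def transmission_prob(child: int, father: int, mother: int) -> float:
    """Mendelian probability of the child's genotype given the parents'."""
    tf, tm = father / 2, mother / 2  # prob. that each parent transmits allele 1
    return [(1 - tf) * (1 - tm), tf * (1 - tm) + (1 - tf) * tm, tf * tm][child]


def trio_prob(trio: Trio, p: float) -> float:
    """Joint probability of the trio genotypes under the single-SNP model."""
    father, mother, child = trio
    return (founder_prob(father, p) * founder_prob(mother, p)
            * transmission_prob(child, father, mother))


def error_llr(trio: Trio, member: int, p: float) -> float:
    """log10(best likelihood after changing `member`'s genotype / observed likelihood).

    member: 0 = father, 1 = mother, 2 = child. Large values suggest an error.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency p must lie in (0, 1)")
    observed = trio_prob(trio, p)
    best_alternative = 0.0
    for g in range(3):
        if g != trio[member]:
            changed = list(trio)
            changed[member] = g
            best_alternative = max(best_alternative, trio_prob(tuple(changed), p))
    if observed == 0.0:
        return math.inf  # Mendelian inconsistency
    if best_alternative == 0.0:
        return -math.inf  # every alternative genotype is inconsistent
    return math.log10(best_alternative / observed)


# Rare-homozygous father (p = 0.1): calling him heterozygous is 9x more likely here
print(round(error_llr((2, 0, 1), member=0, p=0.1), 3))  # 0.954
```

Under this single-SNP model every Mendelian inconsistency gets an infinite score for all three members. A Mendelian-consistent error can only be flagged through the allele-frequency prior. Using the surrounding haplotype context is what an HMM of haplotype diversity is meant to add.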
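The "Genotype Imputation" slide states that imputation probabilities are computed with an HMM but does not give the model. As an illustration only, the sketch below uses a simple haploid copying model in the style of Li and Stephens:

- The hidden state is the reference haplotype being copied at each SNP.
- Switches between haplotypes are uniform, with a hypothetical `switch_rate`.
- Allele mismatches occur with a hypothetical `error_rate`.

The forward-backward algorithm then gives the probability of allele 1 at each untyped SNP, and the maximum-likelihood imputed allele is the more probable one. The sketch imputes one phased haplotype rather than an unphased genotype, and it is not the model used in this work.

```python
from typing import List, Optional


def impute_haplotype(reference: List[List[int]],
                     target: List[Optional[int]],
                     switch_rate: float = 0.01,
                     error_rate: float = 0.01) -> List[float]:
    """Posterior P(allele = 1) at every SNP of a target haplotype.

    reference: K reference haplotypes (0/1 alleles, length M each).
    target: length-M haplotype; None marks untyped SNPs to impute.
    switch_rate, error_rate: hypothetical HMM parameters, not fitted here.
    """
    if not (0.0 < switch_rate <= 1.0 and 0.0 < error_rate < 1.0):
        raise ValueError("need 0 < switch_rate <= 1 and 0 < error_rate < 1")
    k, m = len(reference), len(target)

    def emission(state: int, pos: int) -> float:
        observed = target[pos]
        if observed is None:  # untyped SNP carries no evidence
            return 1.0
        return 1.0 - error_rate if reference[state][pos] == observed else error_rate

    def transition(weights: List[float]) -> List[float]:
        # Keep copying the same haplotype w.p. 1 - switch_rate, else jump uniformly.
        # The transition matrix is symmetric, so this serves both passes.
        total = sum(weights)
        return [(1.0 - switch_rate) * w + switch_rate * total / k for w in weights]

    # Forward pass, renormalized at each SNP to avoid underflow
    forward: List[List[float]] = []
    for pos in range(m):
        prior = [1.0 / k] * k if pos == 0 else transition(forward[-1])
        alpha = [prior[s] * emission(s, pos) for s in range(k)]
        norm = sum(alpha)
        forward.append([a / norm for a in alpha])

    # Backward pass, also renormalized per SNP
    backward: List[List[float]] = [[1.0] * k for _ in range(m)]
    for pos in range(m - 2, -1, -1):
        beta = transition([emission(s, pos + 1) * backward[pos + 1][s] for s in range(k)])
        norm = sum(beta)
        backward[pos] = [b / norm for b in beta]

    # Posterior over copied haplotypes -> probability of allele 1
    allele1_probs = []
    for pos in range(m):
        post = [forward[pos][s] * backward[pos][s] for s in range(k)]
        norm = sum(post)
        allele1_probs.append(sum(post[s] for s in range(k) if reference[s][pos] == 1) / norm)
    return allele1_probs


panel = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]  # toy reference panel
probs = impute_haplotype(panel, [1, 1, None, 1])
imputed_allele = int(probs[2] >= 0.5)  # most likely allele at the untyped SNP
print(imputed_allele)  # 1: the target tracks the second panel haplotype
```

The per-SNP renormalization changes only constant factors at each position, so the normalized products of forward and backward values are still the exact posteriors.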