An integrative genomics approach to infer causal associations between gene expression and disease
Schadt, E. E., Lamb, J., Yang, X., Zhu, J., Edwards, S., Guhathakurta, D., Sieberts, S. K., Monks, S., Reitman, M., Zhang, C., Lum, P. Y., Leonardson, A., Thieringer, R., Metzger, J. M., Yang, L., Castle, J., Zhu, H., Kash, S. F., Drake, T. A., Sachs, A., and Lusis, A. J.
Nature Genetics (37): 710-717

Speaker: Yen-Yi Ho
Advisor: Giovanni Parmigiani
Department of Biostatistics, Johns Hopkins University

Outline
• Introduction
  – Background & Definitions
  – Scientific Questions
• Previous eQTL Studies
  – Gene Expression Data in Humans
  – Statistical Analytic Approaches
  – Results
• Schadt et al. 2005: An Integrative Approach
  – Causality Models
  – Application: Gene Expression in BXD Mice
  – Results from Application
• Discussion of New Approach

QTL (Quantitative Trait Locus)
L → D
Genetic locus (QTL; L), Disease (D)
• More than 1000 genes controlling monogenic Mendelian diseases have been identified using the traditional gene mapping approach.
• Multiple genes, environmental factors, and their interactions have limited the success of this approach for complex human traits (such as cancer, diabetes, asthma).

Introduction
We have more information…
• DNA: genotype data (SNPs)
• mRNA: gene expression data

Expression QTL (eQTL)
Goal: Identify genomic locations where genotype significantly affects gene expression.

Cis-, trans-, master trans-eQTLs
[Figure: cis-eQTL, trans-eQTL, master trans-eQTL]

Constructing regulatory networks (hypothetical example)
[Figure: genetic loci and expression traits 1-6]
1. 1 (B) = cis
2. 2 (A) = cis, controlled by 1 (B)
3. No controls
4. 4 (D) = cis, controlled by 3 (F)
5. Not a cis, controlled by 1 2 4 3
6. Not a cis, controlled by all
Jansen, R.C. & Nap, J.P. (2001) Trends Genet, 17, 388-391

Scientific Questions
• What is the variation and heritability of gene expression?
• Are there associations between genetic loci and target gene expression?
• What is the proportion of cis-/trans-eQTLs?
• How do we verify cis-eQTLs?
• Are there any master trans-eQTLs?
• What are the annotations and functional categories of cis-, trans-, and master trans-eQTLs (KEGG, GO,…)?

Scientific questions and goals
• Transcript abundance may act as an intermediate phenotype between genetic loci and the clinical phenotype.
Secondary goal
• Incorporate genotype, expression, and clinical trait information together to construct regulatory networks and to improve understanding of disease etiologies.

Data
• The previous human eQTL studies all used lymphoblastoid cell lines from CEPH families to measure expression.
Differences
1. Selection of expression traits
2. Platforms used to measure and preprocess expression
3. SNP marker density
4. Statistical approaches

Statistical methods of human eQTL mapping studies
Linkage
• Nonparametric linkage analysis
  1. Sib-pair analysis for quantitative traits (ASP)
  2. Variance component analysis (VC)
Association (linkage disequilibrium)
• Family-based association analysis (QTDT)
• Population-based association analysis (GWA)
Generally, the resolution of the association approach is greater than that of linkage.
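To make the population-based association idea concrete, here is a minimal sketch, not the method of any of the studies reviewed. It regresses one expression trait on genotype dosage (0, 1, 2 copies of an allele) and labels a marker-gene pair as cis or trans. The function names, the toy data, and the `cis_window` threshold are all assumptions made for illustration; the slides do not give a distance cutoff.

```python
import math
from statistics import mean


def genotype_expression_association(genotypes, expression):
    """Simple linear regression of an expression trait on genotype dosage.

    Returns (slope, r_squared, t_statistic); the t statistic tests
    slope == 0 with n - 2 degrees of freedom.
    """
    n = len(genotypes)
    g_bar, e_bar = mean(genotypes), mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    syy = sum((e - e_bar) ** 2 for e in expression)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    if r2 >= 1.0:
        t_stat = math.copysign(math.inf, slope)  # perfect fit
    else:
        t_stat = math.copysign(math.sqrt(r2 * (n - 2) / (1.0 - r2)), slope)
    return slope, r2, t_stat


def classify_eqtl(marker_chrom, marker_pos, gene_chrom, gene_pos,
                  cis_window=1_000_000):
    """Label an eQTL as cis or trans.

    cis_window (in base pairs) is a hypothetical threshold chosen for
    this example only.
    """
    same_chrom = marker_chrom == gene_chrom
    if same_chrom and abs(marker_pos - gene_pos) <= cis_window:
        return "cis"
    return "trans"


# Toy data (invented): genotype dosage and expression for 10 individuals.
genotypes = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
expression = [1.0, 1.2, 2.1, 1.9, 3.2, 2.8, 0.9, 2.0, 3.1, 2.2]

slope, r2, t_stat = genotype_expression_association(genotypes, expression)
print(f"slope={slope:.3f}  r2={r2:.3f}  t={t_stat:.2f}")
print(classify_eqtl("chr1", 1_000_000, "chr1", 1_200_000))
```

In a genome-wide scan the same test would be repeated for every marker-trait pair, so the significance threshold would need a multiple-testing correction; a master trans-eQTL would then appear as one marker associated with many distant expression traits.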
Comparison of resolution between linkage and association analysis
[Figure: resolution of linkage vs. association analysis]

Literature Review
[Table: comparison of previous studies; surviving entries on expression trait selection: genes with between-/within-individual variation > 1; heritability; none]

eQTL findings from previous studies
• Hit rate: the proportion of expression traits significantly linked to eQTLs (ranging from 0.8% to 4%)
• The proportion of cis-eQTLs is about 30%
• 2 master trans-eQTLs were identified

Master trans-eQTLs
• 14q32
• 20q13

An Integrative Approach: Schadt et al., Nature Genetics, 2005
An integrative approach
• Models for causality
  – Causal Model: L → mRNA → Disease
  – Reactive Model: L → Disease → mRNA
  – Independent Model: mRNA ← L → Disease

Notation: L: Genotype; R: mRNA level; D: Disease

M1 Likelihood
• Causal Model: L → mRNA → Disease
  – Joint probability, using $p(D \mid R, L) = p(D \mid R)$:
    $$p(L, R, D) = p(L)\, p(R \mid L)\, p(D \mid R)$$
  – Likelihood:
    $$L(\theta \mid M_1) = \prod_{i=1}^{N} \sum_{j=1}^{3} p(L_j)\, L(r_i \mid L_j)\, L(d_i \mid r_i)$$
    $$L(r \mid L) = \frac{1}{\sqrt{2\pi\sigma_R^2}} \exp\left\{ -\frac{(r - \mu_{R|L})^2}{2\sigma_R^2} \right\}$$
    $$L(d \mid r) = \frac{1}{\sqrt{2\pi\sigma_{D|R}^2}} \exp\left\{ -\frac{(d - \mu_{D|R})^2}{2\sigma_{D|R}^2} \right\}$$

M2 Likelihood
• Reactive Model: L → Disease → mRNA
  – Joint probability, using $p(R \mid D, L) = p(R \mid D)$:
    $$p(L, R, D) = p(L)\, p(D \mid L)\, p(R \mid D)$$
  – Likelihood:
    $$L(\theta \mid M_2) = \prod_{i=1}^{N} \sum_{j=1}^{3} p(L_j)\, L(d_i \mid L_j)\, L(r_i \mid d_i)$$
    $$L(d \mid L) = \frac{1}{\sqrt{2\pi\sigma_D^2}} \exp\left\{ -\frac{(d - \mu_{D|L})^2}{2\sigma_D^2} \right\}$$
    $$L(r \mid d) = \frac{1}{\sqrt{2\pi\sigma_{R|D}^2}} \exp\left\{ -\frac{(r - \mu_{R|D})^2}{2\sigma_{R|D}^2} \right\}$$

M3 Likelihood
• Independent Model: mRNA ← L → Disease
  – Joint probability:
    $$p(L, R, D) = p(L)\, p(R \mid L)\, p(D \mid R, L)$$
  – Likelihood:
    $$L(\theta \mid M_3) = \prod_{i=1}^{N} \sum_{j=1}^{3} p(L_j)\, L(r_i \mid L_j)\, L(d_i \mid r_i, L_j)$$
    $$L(r \mid L) = \frac{1}{\sqrt{2\pi\sigma_R^2}} \exp\left\{ -\frac{(r - \mu_{R|L})^2}{2\sigma_R^2} \right\}$$
    $$L(d \mid r, L) = \frac{1}{\sqrt{2\pi\sigma_{D|R}^2}} \exp\left\{ -\frac{(d - \mu_{D|R,L})^2}{2\sigma_{D|R}^2} \right\}$$

Model Selection
• Likelihood-based Causality Model Selection (LCMS)
  – Calculate the likelihood of each model from the data.
  – The model best supported by the data is the one with the smallest AIC (Akaike Information Criterion), where p is the number of parameters:
    $$\mathrm{AIC} = -2 \ln L(\hat{\theta}) + 2p$$

Simulation study
[Figure: simulation diagram relating locus L and traits T1, T2, labeled with R²(L,T1), R²(T1,T2), R²(L,T2)]
The model with an AIC significantly smaller than the AICs of the competing models was noted.

Application to BXD mice data
The data
BXD mice: F2 offspring from C57BL/6J (B6) and DBA/2J (DBA).
• C57BL/6J: the ob mutation in the C57BL/6J mouse background (B6-ob/ob) causes obesity, but only mild and transient diabetes (Coleman and Hummel, 1973).
• DBA/2J: mice show a low susceptibility to developing atherosclerotic aortic lesions.
Gene expression
• Liver extracted at 16 months of age
• Expression of 23,574 genes measured using Agilent arrays
Genetic loci
• 139 autosomal genetic loci (microsatellite markers, 13 cM)
Disease
• Omental fat pad mass (OFPM) trait

Filtering
[Figure: L, mRNA, and Disease with undetermined (?) relationships]
• Identify 4 candidate regions for the OFPM trait: chr1 at 95 cM, chr6 at 43 cM, chr9 at 8 cM, chr19 at 28 cM.
• Select expression traits significantly correlated with OFPM: 440 intermediate expression traits were selected (P < 0.001).
• Select expression traits with significant eQTL linkage at the candidate regions: 113 expression traits and 267 eQTLs were identified.
• Perform LCMS model selection for the 113 expression traits and rank them by the percentage of genetic variation in OFPM causally explained by each trait.

Results from Application
• Zfp90: zinc finger protein 90
• Hsd11b1: 11-beta hydroxysteroid dehydrogenase isoform 1
• C3ar1: complement component 3a receptor 1
• Tgfbr2: transforming growth factor, beta receptor II
[Figure: C3ar1 -/- knockout mice (n=5-7), 10 weeks of age; Tgfbr2 +/- knockout mice (n=5-7)]

Discussion
L → mRNA → Disease
• Fails to discriminate between highly correlated traits.
• Multiple filtering steps are involved.
• Needs further development before it can be applied automatically to general data sets.
• Measurement error in mRNA exceeds that in D.
• Advantage of constructing eQTL networks: the direction Disease → L is less likely.
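The model selection step (fit the causal, reactive, and independent models, then keep the one with the smallest AIC) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. Genotypes are treated as observed dosages 0/1/2, so the sum over genotype probabilities p(L_j) is dropped and the p(L) term, common to all three models, is omitted. Each conditional density is a linear-Gaussian regression. The parameter counts (6, 6, 7), the function names, and the simulated data are assumptions of this sketch.

```python
import math
import random
from statistics import mean


def residuals(y, x):
    """Residuals of the least-squares line y ~ a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def gaussian_loglik(resid):
    """Maximized Gaussian log-likelihood for residuals (ML variance RSS / n)."""
    n = len(resid)
    sigma2 = sum(e * e for e in resid) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def lcms_aic(L, R, D):
    """AIC = -2 ln L + 2p for the three models, omitting the shared p(L) term."""
    r_given_l = residuals(R, L)
    d_given_l = residuals(D, L)
    # D | R, L: regressing the L-adjusted D on the L-adjusted R gives the
    # same residuals as the two-predictor regression D ~ R + L.
    d_given_rl = residuals(d_given_l, r_given_l)
    fits = {
        # M1 causal: p(R | L) p(D | R)
        "M1": (gaussian_loglik(r_given_l) + gaussian_loglik(residuals(D, R)), 6),
        # M2 reactive: p(D | L) p(R | D)
        "M2": (gaussian_loglik(d_given_l) + gaussian_loglik(residuals(R, D)), 6),
        # M3 independent: p(R | L) p(D | R, L)
        "M3": (gaussian_loglik(r_given_l) + gaussian_loglik(d_given_rl), 7),
    }
    return {name: -2.0 * loglik + 2 * p for name, (loglik, p) in fits.items()}


# Simulated data following the causal chain L -> R -> D (invented example).
random.seed(0)
n = 500
L = random.choices([0, 1, 2], weights=[1, 2, 1], k=n)  # F2-like 1:2:1 ratio
R = [l + random.gauss(0, 1) for l in L]
D = [r + random.gauss(0, 1) for r in R]

aic = lcms_aic(L, R, D)
best = min(aic, key=aic.get)
for name in sorted(aic):
    print(name, round(aic[name], 1))
print("smallest AIC:", best)
```

Because M1 is nested in M3 in this formulation, the AIC of M3 can exceed that of M1 by at most 2, so the two can be hard to separate on a single data set. The clearer contrast on data simulated this way is between M1 and M2.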
References
• Morley, M.; Molony, C.M.; Weber, T.M.; Devlin, J.L.; Ewens, K.G.; Spielman, R.S. & Cheung, V.G. Genetic analysis of genome-wide variation in human gene expression. Nature, 2004, 430, 743-747
• Monks, S.A.; Leonardson, A.; Zhu, H.; Cundiff, P.; Pietrusiak, P.; Edwards, S.; Phillips, J.W.; Sachs, A. & Schadt, E.E. Genetic inheritance of gene expression in human cell lines. Am J Hum Genet, 2004, 75, 1094-1105
• Cheung, V.G.; Spielman, R.S.; Ewens, K.G.; Weber, T.M.; Morley, M. & Burdick, J.T. Mapping determinants of human gene expression by regional and genome-wide association. Nature, 2005, 437, 1365-1369
• Stranger, B.E.; Forrest, M.S.; Clark, A.G.; Minichiello, M.J.; Deutsch, S.; Lyle, R.; Hunt, S.; Kahl, B.; Antonarakis, S.E.; Tavaré, S.; Deloukas, P. & Dermitzakis, E.T. Genomewide associations of gene expression variation in humans. PLoS Genet, 2005, 1, e78
• Deutsch, S.; Lyle, R.; Dermitzakis, E.T.; Attar, H.; Subrahmanyan, L.; Gehrig, C.; Parand, L.; Gagnebin, M.; Rougemont, J.; Jongeneel, C.V. & Antonarakis, S.E. Gene expression variation and expression quantitative trait mapping of human chromosome 21 genes. Hum Mol Genet, 2005, 14, 3741-3749
• Jansen, R.C. & Nap, J.P. Genetical genomics: the added value from segregation. Trends Genet, 2001, 17, 388-391
• Schadt, E.E.; Lamb, J.; Yang, X.; Zhu, J.; Edwards, S.; Guhathakurta, D.; Sieberts, S.K.; Monks, S.; Reitman, M.; Zhang, C.; Lum, P.Y.; Leonardson, A.; Thieringer, R.; Metzger, J.M.; Yang, L.; Castle, J.; Zhu, H.; Kash, S.F.; Drake, T.A.; Sachs, A. & Lusis, A.J. An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet, 2005, 37, 710-717

Thank you ☺