Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
networking in biochemistry: building a mouse model of diabetes Brian S. Yandell, UW-Madison October 2008 www.stat.wisc.edu/~yandell Real knowledge is to know the extent of one’s ignorance. Confucius (on a bench in Seattle) October 2008 BMI Chair Talk © Brian S. Yandell 1 outline 1. 2. 3. 4. 5. how did I got here? what problems caught my eye? what have I done, anyway? how do I work in teams? what challenges remain? October 2008 BMI Chair Talk © Brian S. Yandell 2 how did I get here? • Biostatistics, School of Public Health, UC-Berkeley 1981 – RA/TA with EL Scott, J Neyman, CL Chiang, S Selvin – PhD 1981 • non-parametric inference for hazard rates (Kjell A Doksum) – Annals of Statistics (1983) 50 citations to date (2 in 2008) • research evolution – early career focus on survival analysis – shift to non-parametric regression (1984-99) – shift to statistical genomics (1991--) • joined Biometry Program at UW-Madison in 1982 – attracted by chance to blend statistics, computing and biology – valued balance of mathematical theory against practice – enjoyed developing methodology driven by collaboration October 2008 BMI Chair Talk © Brian S. Yandell 3 Yandell “Lab” Projects • Bayesian QTL Model Selection – R software development (Whipple Neely) – collaboration with UAB & Jackson Labs – data analysis of SCD1, ins10 • meta-analysis for fine mapping Sorcs1 – Chr 19 QTL introgressed as congenic lines – combined analysis across to increase power • QTL-based causal biochemical networks – algorithm development (Elias Chaibub) – data analysis with Christine Ferrara, Duke U October 2008 BMI Chair Talk © Brian S. Yandell 4 UAB: Allison, Yi Jax: Churchill, von Smith stat/hort: Yandell Duke: Newgaard, Ferrara biochem: Attie, Keller, Zhu BMI: Kendziorski, Broman, Craven Rosetta: Schadt, Zhang, Zhu October 2008 BMI Chair Talk © Brian S. Yandell 5 Pareto diagram of QTL effects 3 (modifiers) minor QTL polygenes 1 2 major QTL 0 3 additive effect major QTL on linkage map 2 1 October 2008 0 4 5 5 10 15 20 25 30 rank order of QTL BMI Chair Talk © Brian S. Yandell 6 problems of single QTL approach • wrong model: biased view – fool yourself: bad guess at locations, effects – detect ghost QTL between linked loci – miss epistasis completely • low power • bad science – use best tools for the job – maximize scarce research resources – leverage already big investment in experiment October 2008 BMI Chair Talk © Brian S. Yandell 7 advantages of multiple QTL approach • improve statistical power, precision – increase number of QTL detected – better estimates of loci: less bias, smaller intervals • improve inference of complex genetic architecture – patterns and individual elements of epistasis – appropriate estimates of means, variances, covariances • asymptotically unbiased, efficient – assess relative contributions of different QTL • improve estimates of genotypic values – less bias (more accurate) and smaller variance (more precise) – mean squared error = MSE = (bias)2 + variance October 2008 BMI Chair Talk © Brian S. Yandell 8 QTL mapping idea • observe phenotype y , marker genotypes m • genetic architecture identifies model – number and location of QTL – gene action and epistasis (pairwise interactions) • missing data: genotypes q at may be unknown – pr(q | m, , ) – form of genotype model well known • phenotype y depends on genotype q – pr(y | q, µ, ) – often linear model in q – possible interactions among QTL (epistasis) October 2008 BMI Chair Talk © Brian S. Yandell 9 October 2008 BMI Chair Talk © Brian S. Yandell 10 how does phenotype y improve guess of QTL genotypes q? D4Mit41 D4Mit214 what are probabilities for genotype q between markers? 120 bp 110 recombinants AA:AB 100 all 1:1 if ignore y and if we use y? 90 AA AA AB AA AA AB AB AB Genotype October 2008 BMI Chair Talk © Brian S. Yandell 11 Gibbs sampler for loci indicators • QTL at pseudomarkers • loci indicators = 1 if QTL present = 0 if no QTL present • Gibbs sampler on loci indicators – relatively easy to incorporate epistasis – Yi et al. (2005, 2007 Genetics) • (earlier work of Yi, Ina Hoeschele) ( q ) ( q ) ( q , q ) , 0 , 1 q October 2008 0 1 11 2 22 12 12 12 k BMI Chair Talk © Brian S. Yandell 12 likelihood and posterior • likelihood relates “known” data (y,m,q) to unknown values of interest (,,) – pr(y,q|m,,,) = pr(y|q,,) pr(q|m,,) – mix over unknown genotypes (q) • posterior turns likelihood into a distribution – weight likelihood by priors – rescale to sum to 1.0 – posterior = likelihood * prior / constant October 2008 BMI Chair Talk © Brian S. Yandell 13 Bayes theorem for QTLs likelihood *prior posterior constant phenotype likelihood *[prior for q , , ,] posterior for q , , , constant pr (y|q , ,)*[pr (q|m , ,)pr (|)pr (|m ,)pr ()] pr (q , , ,| y,m ) pr (y|m ) October 2008 BMI Chair Talk © Brian S. Yandell 14 why use a Bayesian approach? • first, do both classical and Bayesian – always nice to have a separate validation – each approach has its strengths and weaknesses • classical approach works quite well – selects large effect QTL easily – directly builds on regression ideas for model selection • Bayesian approach is comprehensive – samples most probable genetic architectures – formalizes model selection within one framework – readily (!) extends to more complicated problems October 2008 BMI Chair Talk © Brian S. Yandell 15 Markov chain sampling • construct Markov chain around posterior – posterior is stable distribution of Markov chain – use MC samples to estimate posterior • sample QTL model unknowns from full conditionals – update unknowns one at a time or in batches ( , q , , ) ( , q , , ) ( , q , , ) ( , q ,,) ~ pr ( , q ,,|y , m ) 1 October 2008 2 BMI Chair Talk © Brian S. Yandell N 16 Bayes posterior vs. maximum likelihood • LOD: classical Log ODds – maximize likelihood over effects µ – R/qtl scanone/scantwo: method = “em” • LPD: Bayesian Log Posterior Density – average posterior over effects µ – R/qtl scanone/scantwo: method = “imp” LOD ( ) log {max (y|m , , )} c 10 pr LPD ( ) log { pr ( |m )pr (y|m , , )pr ( ) d } C 10 likelihood mixes over missing QTL genotypes : pr (y|m , , ) pr (y|q , )pr (q |m , ) q October 2008 BMI Chair Talk © Brian S. Yandell 17 LOD & LPD: 1 QTL n.ind = 100, 10 cM marker spacing October 2008 BMI Chair Talk © Brian S. Yandell 18 marginal LOD or LPD • what is contribution of a QTL adjusting for all others? – improvement in LPD due to QTL at locus – contribution due to main effects, epistasis, GxE? • how does adjusted LPD differ from unadjusted LPD? – raised by removing variance due to unlinked QTL – raised or lowered due to bias of linked QTL – analogous to Type III adjusted ANOVA tests • can ask these same questions using classical LOD – see Broman’s newer tools for multiple QTL inference October 2008 BMI Chair Talk © Brian S. Yandell 19 1-QTL LOD vs. marginal LPD 1-QTL LOD October 2008 BMI Chair Talk © Brian S. Yandell 20 hyper data: scanone October 2008 BMI Chair Talk © Brian S. Yandell 21 what is best estimate of QTL? • find most probable pattern – 1,4,6,15,6:15 has posterior of 3.4% • estimate locus across all nested patterns – Exact pattern seen ~100/3000 samples – Nested pattern seen ~2000/3000 samples • estimate 95% confidence interval using quantiles > best <- qb.best(qbHyper) > summary(best)$best 247 245 248 246 chrom locus locus.LCL locus.UCL n.qtl 1 69.9 24.44875 95.7985 0.8026667 4 29.5 14.20000 74.3000 0.8800000 6 59.0 13.83333 66.7000 0.7096667 15 19.5 13.10000 55.7000 0.8450000 > plot(best) October 2008 Manichaikul et al. 2008 Genetics (in review) BMI Chair Talk © Brian S. Yandell 22 what patterns are “near” the best? • size & shade ~ posterior • distance between patterns – – – – sum of squared attenuation match loci between patterns squared attenuation = (1-2r)2 sq.atten in scale of LOD & LPD • multidimensional scaling – MDS projects distance onto 2-D – think mileage between cities October 2008 BMI Chair Talk © Brian S. Yandell 23 Software for Bayesian QTLs R/qtlbim: www.qtlbim.org • Properties – cross-compatible with R/qtl – new MCMC algorithms • Gibbs with loci indicators; no reversible jump – epistasis, fixed & random covariates, GxE – extensive graphics • Software history – initially designed (Satagopan, Yandell 1996) – major revision and extension (Gaffney 2001) – R/bim to CRAN (Wu, Gaffney, Jin, Yandell 2003) – R/qtlbim to CRAN (Yi, Yandell et al. 2006) • Publications – Yi et al. (2005); Yandell et al. (2007); Yi et al. (2007ab) October 2008 BMI Chair Talk © Brian S. Yandell 24 BTBR mouse is insulin resistant B6 is not make both obese… glucose (courtesy AD Attie) October 2008 BMI Chair Talk © Brian S. Yandell insulin 25 studying diabetes in an F2 • mouse model: segregating panel from inbred lines – B6.ob x BTBR.ob F1 F2 – selected mice with ob/ob alleles at leptin gene (Chr 6) – sacrificed at 14 weeks, tissues preserved • physiological study (Stoehr et al. 2000 Diabetes) – mapped body weight, insulin, glucose at various ages • gene expression studies – RT-PCR for a few mRNA on 108 F2 mice liver tissues • (Lan et al. 2003 Diabetes; Lan et al. 2003 Genetics) – Affymetrix microarrays on 60 F2 mice liver tissues • U47 A & B chips, RMA normalization • design: selective phenotyping (Jin et al. 2004 Genetics) October 2008 BMI Chair Talk © Brian S. Yandell 26 log10(ins10) Chr 19 black=all blue=male red=female purple=sexadjusted solid=512 mice dashed=311 mice October 2008 BMI Chair Talk © Brian S. Yandell 27 Sorcs1 study in mice: 11 sub-congenic strains marker regression meta-analysis within-strain permutations Nature Genetics 2006 Clee, Yandell et al. October 2008 BMI Chair Talk © Brian S. Yandell 28 Interaction plot for D19Mit58 and D8Mit289 we were lucky! 2.0 AA AB BB 1.8 BTBR background needed to see SORCS1 1.6 logins10 epistatic interaction of chr 19 and 8 … discovered much later D19Mit58 1.4 1.2 1.0 0.8 AA AB BB D8Mit289 October 2008 BMI Chair Talk © Brian S. Yandell 29 Sorcs1 gene & SNPs October 2008 BMI Chair Talk © Brian S. Yandell 30 Sorcs1 study in humans Diabetes 2007 Goodarzi et al. October 2008 BMI Chair Talk © Brian S. Yandell 31 2M observations 30,000 traits 60 mice October 2008 BMI Chair Talk © Brian S. Yandell 32 experimental context • B6 x BTBR obese mouse cross – model for diabetes and obesity – 500+ mice from intercross (F2) – collaboration with Rosetta/Merck • genotypes – 5K SNP Affymetrix mouse chip – care in curating genotypes! (map version, errors, …) • phenotypes – clinical phenotypes (>100 / mouse) – gene expression traits (>40,000 / mouse / 4-6 tissues) – other molecular traits (proteomic, miRNA, metabolomic) October 2008 BMI Chair Talk © Brian S. Yandell 33 QTL mapping thousands of gene expression traits PLoS Genetics 2006 Lan, Chen et al. October 2008 BMI Chair Talk © Brian S. Yandell 34 QTLs on chr n gray scale for variance red=trans blue=cis October 2008 BMI Chair Talk © Brian S. Yandell 35 Chaibub Neto et al. (2008) Genetics October 2008 BMI Chair Talk © Brian S. Yandell 36 causal phenotype networks • goal: mimic biochemical pathways with directed (causal) networks • problem: association (correlation) does not imply causation • resolution: bring in driving causes – genotypes (at conception) – processes earlier in time October 2008 BMI Chair Talk © Brian S. Yandell 37 Causal vs Reactive? (Elias Chaibub, Brian Yandell) y1 causes y2: y1 ~ g1 and y2 ~ g2*y1 October 2008 BMI Chair Talk © Brian S. Yandell 38 Ferrara et al. October 2008 BMI Chair Talk © Brian S. Yandell 39 inferring phenotype networks • build in prior pathway knowledge (PPI, TF) – co-map correlated traits • Banerjee, Yandell, Yi (2008 Genetics) – pathways induce correlation structure • ramp up to 100s, 1000s of phenotypes? – danger of mixing unrelated pathways – want closely linked upstream (causal) drivers October 2008 BMI Chair Talk © Brian S. Yandell 40 UAB: Allison, Yi Jax: Churchill, von Smith stat/hort: Yandell Duke: Newgaard, Ferrara biochem: Attie, Keller, Zhu BMI: Kendziorski, Broman, Craven Rosetta: Schadt, Zhang, Zhu October 2008 BMI Chair Talk © Brian S. Yandell 41 why build Web eQTL tools? • common storage/maintainence of data – one well-curated copy – central repository – reduce errors, ensure analysis on same data • automate commonly used methods – biologist gets immediate feedback – statistician can focus on new methods – codify standard choices October 2008 BMI Chair Talk © Brian S. Yandell 42 how does one build tools? • no one solution for all situations • use existing tools wherever possible – new tools take time and care to build! – downloaded databases must be updated regularly • human component is key – need informatics expertise – need continual dialog with biologists • build bridges (interfaces) between tools – Web interface uses PHP – commands are created dynamically for R • continually rethink & redesign organization October 2008 BMI Chair Talk © Brian S. Yandell 43 steps in using Web tools • • • • user enters data on Web page PHP tool interprets user data PHP builds R script R run on script – creates plots, summaries, warnings • PHP grabs results & displays on page • user examines, saves • user modifies data and reruns October 2008 BMI Chair Talk © Brian S. Yandell 44 raw data or fancy results? • raw data flexible but slow – LOD profiles for 100 (1000) traits? • fancy results from sophisticated analysis – IM, MIM, BIM, MOM analysis – too complicated to put in biologists’ hands? • methods are unrefined, state-of-art, research tools • use of methods involved many subtle choices – batch computation over weeks • compute once, save, display many times October 2008 BMI Chair Talk © Brian S. Yandell 45 October 2008 BMI Chair Talk © Brian S. Yandell 46 LOD profiles: many traits October 2008 BMI Chair Talk © Brian S. Yandell 47 1.5 LOD interval approximate 95% CI October 2008 BMI Chair Talk © Brian S. Yandell 48 QTLs on chr n gray scale for variance red=trans blue=cis October 2008 BMI Chair Talk © Brian S. Yandell 49 what challenges remain? • from eQTL to candidate pathways – statistical issues • networks, correlated traits • better model selection approaches – biological evidence (Weiss 2007 Genetics) • Mouse to human to mouse • KOs, etc. • upgrade informatics environment – harden local code (R, Python, PHP, …) – build on other high throughput systems • Swertz, Jansen (2007); Stein (2008) Nat Rev Gen October 2008 BMI Chair Talk © Brian S. Yandell 50 many thanks Karl Broman Jackson Labs Gary Churchill Hao Wu Randy von Smith U AL Birmingham David Allison Nengjun Yi Tapan Mehta Samprit Banerjee Ram Venkataraman Daniel Shriner Tom Osborn David Butruille Marcio Ferrera Josh Udahl Pablo Quijada Alan Attie Michael Newton Hyuna Yang Daniel Sorensen Daniel Gianola Liang Li my students Jonathan Stoehr Hong Lan Susie Clee Jessica Byers Mark Keller Jaya Satagopan Fei Zou Patrick Gaffney Chunfang Jin Elias Chaibub Neto W Whipple Neely Jee Young Moon USDA Hatch, NIH/NIDDK (Attie), NIH/R01 (Yi, Broman) October 2008 BMI Chair Talk © Brian S. Yandell 51