Computational Diagnostics Based on Large-Scale Gene Expression Profiles Using MCMC

Rainer Spang, Max Planck Institute for Molecular Genetics, Berlin
Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks, Joe Nevins, Mike West
Duke Medical Center & Duke University

Estrogen Receptor Status
• 7000 genes
• 49 breast tumors
• 25 ER+
• 24 ER-

Tumor -> Chip -> 7000 Numbers
Given: 7000 numbers
Wanted: the probability that the tumor is ER+, e.g. 89%

7000 Numbers Are More Numbers Than We Need
Predict ER status based on the expression levels of super-genes.

Singular Value Decomposition
$X = A\,D\,F$
$X$: data
$A$: loadings
$D$: singular values
$F$: expression levels of super-genes, orthogonal matrix

Probit Model
$P[Y_i = 1 \mid \beta] = \Phi\left(\beta_0 + \sum_{j \in \text{super-genes}} \beta_j x_{ij}\right)$
$Y_i$: class of tumor $i$
$\Phi$: distribution function of a standard normal
$\beta_j$: regression weight for super-gene $j$
$x_{ij}$: expression level of super-gene $j$ in tumor $i$

Overfitting
• Using only a small number of super-genes is not robust at all
• When using many (or all) super-genes, the linear model is easily saturated, i.e. several models fit perfectly well
• Consequence: for a new patient, some of these models predict that she is ER+ and others that she is ER-

Given the Few Profiles With Known Diagnosis:
• The uncertainty about the right model is high
• The variance of the model weights is large
• The likelihood landscape is flat
• We need additional model assumptions to solve the problem

Informative Priors
[Figure: likelihood, prior, posterior]

If the Prior Is Chosen Badly:
• We cannot reproduce the diagnosis of the training profiles any more
• We still cannot identify the model
• The diagnosis is driven mostly by the additional assumptions and not by the data

The Prior Needs to Be Designed in 49 Dimensions
• Shape?
• Center?
• Orientation?
• Not too narrow ... not too wide

Shape
Multidimensional normal, for simplicity

Center
$\beta_j$ and $P[Y_i = 1 \mid \beta]$: assumptions on the model correspond to assumptions on the diagnosis

Orientation
Orthogonal super-genes!

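
The probit model and the overfitting argument from the slides above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the two-super-gene toy data, the weight vectors `WEIGHTS_A` and `WEIGHTS_B`, and the function names. It shows two weight vectors that both reproduce the training diagnoses yet disagree on a new patient, which is the saturation problem described on the Overfitting slide.

```python
import math


def standard_normal_cdf(z):
    """Distribution function Phi of a standard normal, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(beta0, beta, x):
    """P[Y = 1 | beta] = Phi(beta0 + sum_j beta_j * x_j) for one tumor."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    linear = beta0 + sum(b * xj for b, xj in zip(beta, x))
    return standard_normal_cdf(linear)


def fits_training_data(beta, train):
    """True if the model puts every training tumor on the correct side of 0.5."""
    return all(
        (probit_probability(0.0, beta, x) > 0.5) == (y == 1) for x, y in train
    )


# Hypothetical toy data: (super-gene levels, class), 1 = ER+ and 0 = ER-.
TRAIN = [([1.0, 0.0], 1), ([-1.0, 0.0], 0)]

# Two hypothetical weight vectors; the training data never constrain the
# second super-gene, so both fit equally well.
WEIGHTS_A = [5.0, 5.0]
WEIGHTS_B = [5.0, -5.0]

# A hypothetical new patient whose profile differs only in the second super-gene.
NEW_PATIENT = [0.0, 1.0]

if __name__ == "__main__":
    for name, w in (("A", WEIGHTS_A), ("B", WEIGHTS_B)):
        print(
            name,
            "fits training data:", fits_training_data(w, TRAIN),
            "P[ER+] for new patient:", probit_probability(0.0, w, NEW_PATIENT),
        )
```

Both models pass `fits_training_data`, but they place the new patient on opposite sides of 0.5. With 49 tumors and up to 49 super-genes the same ambiguity appears in many directions at once, which is why the slides turn to informative priors.
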
Not Too Narrow ... Not Too Wide
Auto-adjusting model: scales are hyperparameters with their own priors

Prior Given the Hyperparameter
$p(\beta \mid \tau) = \prod_{j=1}^{n} N(\beta_j \mid 0, \tau_j^2 / d_j^2)$
• $\tau_j$: hyperparameter
• Rescaling by the singular values $d_j$
• Independent super-genes
• Unbiased prior

A Prior for the Hyperparameters
$\tau_j^{-2} \sim \mathrm{Gamma}(k/2, k/2)$, with $k = 2$ or $k = 3$
• Conjugate prior
• Flexibility for $\beta_j$
• Symmetric U-shaped prior for $P[Y_i = 1 \mid \beta]$

Latent Variable
$P[Y_i = 1 \mid \beta] = \Phi\left(\beta_0 + \sum_{j \in \text{super-genes}} \beta_j x_{ij}\right)$
$h_i = \beta_0 + \sum_{j} \beta_j x_{ij} + \varepsilon_i, \quad \varepsilon_i \sim N(0, 1)$
$Y_i = 1 \iff h_i > 0$
Albert & Chib 1993

MCMC: Gibbs Sampler
Sequential updates of conditional distributions:
$p(\beta \mid X, h, \tau)$: normal
$p(\tau \mid X, h, \beta)$: gamma
$p(h \mid X, \beta, \tau)$: truncated normal
All conditional posteriors can be calculated analytically.
West 2001, Albert & Chib 1993

What Are the Additional Assumptions That Came In With the Prior?
• The model cannot be dominated by only a few super-genes (genes!)
• The diagnosis is based on global changes in the expression profiles, influenced by many genes
• The assumptions are neutral with respect to the individual diagnosis

Which Genes Have Driven the Prediction?
Gene                          Weight
nuclear factor 3 alpha        0.853
cysteine rich heart protein   0.842
estrogen receptor             0.840
intestinal trefoil factor     0.840
x box binding protein 1       0.835
gata 3                        0.818
ps 2                          0.818
liv1                          0.812
... many many more ...        ...

Thank you!
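
As a supplement to the Gibbs sampler slide, the three sequential updates (normal for the weights, gamma for the scales, truncated normal for the latent variables) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a square orthogonal super-gene matrix `F` (so the weight updates decouple), no separate intercept term, and a Gamma prior with shape and rate k/2 on the inverse squared scale `lam[j]`. The toy matrix `F_TOY`, the singular values `D_TOY`, the labels `Y_TOY`, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, provides cdf and inv_cdf


def sample_truncated_normal(mean, positive, rng):
    """Draw from N(mean, 1) restricted to (0, inf) if positive, else (-inf, 0]."""
    p0 = STD_NORMAL.cdf(-mean)  # probability that mean + z <= 0
    lo, hi = (p0, 1.0) if positive else (0.0, p0)
    u = lo + rng.random() * (hi - lo)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    h = mean + STD_NORMAL.inv_cdf(u)
    # Guard against rounding in the extreme tails.
    return max(h, 1e-12) if positive else min(h, 0.0)


def linear_predictors(beta, F):
    """mu_i = sum_j beta_j * F[j][i], one value per tumor."""
    n = len(beta)
    return [sum(beta[j] * F[j][i] for j in range(n)) for i in range(n)]


def gibbs_probit(F, d, y, k=2, n_iter=3000, burn_in=500, seed=0):
    """Gibbs sampler for a probit model with latent variables.

    F : n x n orthogonal matrix, F[j][i] = level of super-gene j in tumor i
    d : one (hypothetical) singular value per super-gene
    y : class labels, 1 (ER+) or 0 (ER-)
    Returns the posterior mean of P[Y_i = 1 | beta] for each tumor.
    """
    rng = random.Random(seed)
    n = len(y)
    beta = [0.0] * n
    lam = [1.0] * n  # lam[j] plays the role of the inverse squared scale
    prob_sum = [0.0] * n
    for it in range(n_iter):
        # 1) latent h_i | beta, y: truncated normal around mu_i
        mu = linear_predictors(beta, F)
        h = [sample_truncated_normal(mu[i], y[i] == 1, rng) for i in range(n)]
        # 2) beta_j | h, lam: normal, independent across j because F F^T = I
        for j in range(n):
            precision = 1.0 + d[j] ** 2 * lam[j]
            mean = sum(F[j][i] * h[i] for i in range(n)) / precision
            beta[j] = rng.gauss(mean, math.sqrt(1.0 / precision))
        # 3) lam_j | beta: gamma (conjugate update); gammavariate takes a scale
        for j in range(n):
            shape = (k + 1) / 2.0
            rate = (k + d[j] ** 2 * beta[j] ** 2) / 2.0
            lam[j] = rng.gammavariate(shape, 1.0 / rate)
        if it >= burn_in:
            for i, m in enumerate(linear_predictors(beta, F)):
                prob_sum[i] += STD_NORMAL.cdf(m)
    return [s / (n_iter - burn_in) for s in prob_sum]


# Hypothetical toy problem: 4 tumors, 4 orthonormal super-genes (rows).
F_TOY = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]
D_TOY = [2.0, 1.5, 1.0, 0.5]
Y_TOY = [1, 1, 0, 0]

if __name__ == "__main__":
    print(gibbs_probit(F_TOY, D_TOY, Y_TOY))
```

The orthogonality assumption is what keeps step 2 a set of independent one-dimensional normal draws; with a non-orthogonal design the weight update would need a full multivariate normal. The returned values are Monte Carlo estimates and depend on the seed and the number of iterations.
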