Toward a Characterization of Gene Expression in Single Tumor Samples
JOSEPH LUCAS

The Power of Microarrays
• Promise of “personalized medicine”
• Lack of consistency/reproducibility
• Problem with overfitting
• Microarrays from lab bench to clinic
• Data collection bias and quality control
• In vitro -> in vivo
• In vivo -> in vitro
• Translation of meta-genes from in vitro to in vivo

Overview
• Laboratory bias
• Experiments on the lab bench
• Correction doping controls
• Modeling to alleviate bias
• Tumor expression
• Factors as markers of pathway activity
• Biological relevance
• Clinical relevance

Beyond Oncogene Upregulation
• Human Mammary Epithelial Cells (HMEC)
• 9 upregulated oncogenes and one set of controls
• Data collected in three batches
• Demonstrates collection bias
• Bild et al., “Oncogenic pathway signatures in human cancers as a guide to targeted therapies”, Nature 439 (19), 2006

Collection Bias
• Doping control
• Should be identical across all observations

Collection Bias
• Consistent errors across many genes
• May obscure interesting biology

Modeling to Correct Collection Bias

Systematic Errors
[Figure: NFE2L1, Before Subtracting Error]

Systematic Errors, Corrected
[Figure: NFE2L1; panels: Before Subtracting Error, After Subtracting Error; label: Upregulation of MYC]
• NFE2L1 – regulation of apoptosis
• MYC binding sequence in promoter
• GENES & DEVELOPMENT (2003-01-15)

Single Sample – Factor Modeling
• Personalized medicine
• Need to deal with one array at a time
• Cannot use the same correction technique
• Relative levels of genes within a sample should be informative
[Equation labels: Design Matrix, Latent Factors]

Latent factors for Correction of Lab Bias
• Microenvironment Experiments
• Chen et al., “Genomic analysis of response to lactic acidosis in human cancers”
• Exact same conditions as oncogene experiment
• 24 “control” arrays
  split across 2 labs and 4 time points
• Uncorrelated measurements of gene expression?

Correlation between Two Different Samples
[Figure; axis label: Microenvironment, array #1]

Consistently Correlated across all Pairs
• Correlation ≈ -0.6
[Figure; axis label: Microenvironment, array #1]

Factor Model almost Eliminates Correlation
[Figure panels: Before Correction, After Correction; axis labels: Microenvironment, array #1; Oncogene, array #7]

Microarray Quality Control (MAQC)
• 120 arrays, also U133+ 2.0
• 6 different labs
• 5 repetitions per group
• 4 groups
• Universal Human Reference RNA
• Human Brain Reference RNA
• Titration of RNA to form groups
• Nature Biotechnology, all of volume 24 (2006)

Example 2
• We believe these are collection errors
• Due to pH, temperature, duration before washing, etc.
• Errors should be universal for U133+ 2.0 arrays
• Keep all oncogene and microenvironment control observations
• Keep all 120 observations from MAQC
• Mean expression for each gene is different between MAQC and HMECs
• Refit model, but assume error correction is same!

Retain Ability to Correct Bias in HMEC

Improved Fidelity
• Have we improved the fidelity?
[Figure panels: Raw data, Labs 1,2,3,5; Raw data, Lab 4; Raw data, Lab 6; Corrected data. Axis groups: UH, 75% UH, 25% UH, HB]
UH – Universal Human Reference RNA
HB – Human Brain Reference RNA

Improved Fidelity
• Very different error types, both corrected
[Figure: Differentially Expressed Gene. Axis groups: UH, 75% UH, 25% UH, HB]

Defining Success
• By design, should be monotone ordering
• Does probability of correctly ordering increase?
[Figure panels: Before Correction, After Correction. Axis groups: UH, 75% UH, 25% UH, HB. Annotations: “Red points are not monotone → failure”; “Red points are monotone → success”]

MAQC Experiment

More than Error Correction?
• Can correct biases from vastly different experiments
• Aggregate data from multiple labs across multiple time points
• Analyze and incorporate new data as it comes in
• Metagenes discovered in vitro can be used as in vivo phenotypes, however
  • Signatures developed in cloned cells
  • Lack biological variability
  • In vivo, other pathways will be active/inactive

Factor Evolution
• Break down into multiple pathways in vivo
• Evolutionary factor search to dissect and enhance signatures
• Carvalho, et al., “High-dimensional sparse factor modelling: Applications in gene expression genomics”, submitted
• Consider behavior of genes in vivo
• Miller, et al., “An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival”, PNAS, 102, 13550-13555 (2005)

in vitro → in vivo
[Diagram labels: Expression differences from lactic acidosis experiment; Highly differentially expressed genes; Initial genes; New genes; factors; * Mean()]
Probe IDs shown: '225378_at', '225399_at', '225407_at', '225493_at', '225527_at', '225681_at', '225768_at', . . .

P53 Wild Type versus Mutant
• Each factor is a collection of genes that are expressed together across all samples

P53 Wild Type versus Mutant
• Combinations of factors are predictive of important phenotypes

Tamoxifen
[Figure legend: Dark Blue, Light Blue; groups: Didn’t receive Tamoxifen, Treated with Tamoxifen]
• All patients receiving Tamoxifen were ER positive
• Tamoxifen sensitivity independent of ER status

Endothelial Cell Signature?
• Contains 143 of the 188 genes in a known microvascular endothelial cell signature
  Chi et al., “Endothelial cell diversity revealed by global expression profiling”, PNAS, 100 (19), 2003
• “independently of ER status of tumor cells, Tam could affect the microvessel structure through the antagonism with endothelial cells ER”
  Clinical Cancer Research Vol. 7, 2656-2661, September 2001
• [Tamoxifen] “inhibited tube formation by rat microvascular endothelial cells”
  Gen Pharmacol (2000) 34: 107-16

Estrogen Receptors
Trained on Miller
Predictive on others (Wang)

Breast Tumor Factors in Lung
• Factor behavior in Lung tissue
• Endothelial cell factor
• Estrogen Receptor factor

Endothelial Cell Factor
[Figure panels: Breast Cancer Samples, Lung Cancer Samples]

Estrogen Receptor Factor
[Figure panels: Breast Cancer Samples, Lung Cancer Samples]

Summary
• Correction of laboratory biases
• Allows aggregation of multiple data sets
• Discovery of conserved metagenes relevant to
  • Survival
  • Cellular phenotypes, ER, PgR, P53
• Identification of novel biology
• Within a framework that allows identification of meta-genes on single arrays
• Beyond?
• Concurrent modeling of multiple different tumor types

Collaborators
Statistics: Mike West, Carlos Carvalho, Dan Merl, Quanli Wang
Biology: Jen-Tsan Ashley Chi, Joe Nevins, Julia Ling-Yu Chen, Andrea Bild

Microarrays to Identify Phenotypes
[Diagram labels: Disease Diagnosis; Cancer; Alzheimer's; Survival prediction; Infection; Metastasis prediction; Psoriatic Arthritis; Drug susceptibility; Leber’s Congenital Amaurosis; Usher syndrome; . . . ; Development; Embryonic development; Cellular differentiation; Radial symmetry; Internal structure; . . .]

Obesity
Oligo GEArray® Mouse Obesity Microarray: OMM-017
• PharmaFrontier Co., Ltd.
• Genetel Pharmaceuticals
• Hong Kong DNA Chips
. . .
• Liver Int. 2005 Dec;25(6):1091-6.
• Obesity Research 11:188-194 (2003)
• Physiological Genomics 20:224-232 (2005)
. . .

Alzheimer's
Oligo GEArray® Human Alzheimer's Disease Microarray
• PNAS 2004 Feb 17;101(7):2173-8.
  Epub 2004 Feb 9
• The Journal of Neuroscience, Feb 9, 2005, 25(6):1571-1578
• Ann Neurol. 2005 Dec;58(6):909-19
. . .
• Primorigen Biosciences
• ProteomTech
• Ciphergen Biosystems, Inc

Cancer

Disease Diagnosis
• “Incipient Alzheimer's disease: Microarray correlation analyses reveal major transcriptional and tumor suppressor responses”, PNAS, February 17, 2004, vol. 101 no. 7, 2173-2178
• “Microarray Analyses of Peripheral Blood Cells Identifies Unique Gene Expression Signature in Psoriatic Arthritis”, Mol Med. 2005 Jan–Dec; 11(1-12): 21–29

Microarrays at IGSP
• Joseph Nevins – Rb/E2F pathway
• Jen-Tsan Ashley Chi – tumor microenvironment
• Phil Febbo – gene expression as phenotypes
• Anil Potti – individualized chemotherapy
• Tom Kepler – activation of dendritic cells
• Gregory Wray – development in echinoderms
• Paul Magwene – co-expression in microorganisms
• Geoffrey Ginsburg – expression in peripheral blood
• John Olson – surgical oncology
• Ornit Chiba-Falek –
• Philip Benfey – development and cell differentiation in Arabidopsis
>95 papers published by the IGSP microarray facility since 1999

Expanding the Role of
• Need not include only collection bias
• Identifying signatures in other samples
• Factors associated with:
• Lactic acidosis
• Hypoxia
• Various oncogenes
• Other sources
• Gene lists
Simple experiment
Change in g,j

Estrogen Receptors
Trained on Miller
Predictive on others (Wang)

Progesterone Receptors
Trained on Miller
Predictive on others (Massague)
• Makes use of the ER factor and a new PgR specific factor

Predicting Survival
Trained on Miller
Predictive on the others (Wang)

TGF-β

Progesterone Receptor
[Figure panels: Breast Cancer Samples, Lung Cancer Samples]

P53 Mutants, Breast vs. Lung
[Figure panels: Breast Cancer Samples, Lung Cancer Samples]

TGF-β
[Figure panels: Breast Cancer Samples, Lung Cancer Samples]

Estrogen Receptor, Breast vs. Ovarian
[Figure panels: Breast Cancer Samples, Ovarian Cancer Samples]
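
Worked sketch: the monotone-ordering criterion

The “Defining Success” slide asks whether expression follows a monotone ordering across the MAQC titration groups (UH, 75% UH, 25% UH, HB) and whether correction makes that more likely. A minimal Python sketch of such a check follows. It is an illustration under assumptions, not the analysis used in the talk: the function names, the choice to compare per-group means, and the example numbers are all hypothetical.

```python
from statistics import mean

# Titration groups in their designed order (labels taken from the slides).
GROUP_ORDER = ["UH", "75% UH", "25% UH", "HB"]


def is_monotone(values):
    """True if values are entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def fraction_monotone(expression):
    """Share of genes whose group means are monotone across GROUP_ORDER.

    `expression` maps gene -> {group label: list of replicate measurements}.
    Comparing group means is an assumption made for this sketch.
    """
    hits = 0
    for groups in expression.values():
        group_means = [mean(groups[g]) for g in GROUP_ORDER]
        if is_monotone(group_means):
            hits += 1
    return hits / len(expression)


# Invented numbers for illustration only (not MAQC data).
before = {
    "gene_a": {"UH": [8.0, 8.4], "75% UH": [7.1, 7.3],
               "25% UH": [7.6, 7.8], "HB": [5.0, 5.2]},
    "gene_b": {"UH": [3.0, 3.2], "75% UH": [4.0, 4.1],
               "25% UH": [5.1, 5.0], "HB": [6.0, 6.2]},
}
after = {
    "gene_a": {"UH": [8.0, 8.4], "75% UH": [7.6, 7.8],
               "25% UH": [7.1, 7.3], "HB": [5.0, 5.2]},
    "gene_b": before["gene_b"],
}

print(fraction_monotone(before), fraction_monotone(after))
```

In this toy input, gene_a breaks the ordering before correction and follows it afterwards, so the monotone fraction rises from one gene in two to both genes.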