Microarray Experiment Design and Data Interpretation
Susan Hester, Ph.D.
Environmental Carcinogenesis Division
Toxicogenomic Core Facility
US EPA
[email protected]
919-541-1320

Presentation Outline
• Traditional biology versus genomics
• Basics of genomics
• Data mining goals and approaches using parallel analyses: some examples
• Interpreting changes in gene expression to identify altered molecular pathways
• Evaluating pathway alterations in concert with traditional toxicology data for a better understanding of mode of action

Traditional Biology
• Measure one tree at a time
• Measure one element in 10 to 50 samples

“Omic” Biology
• Measure tens of thousands of elements in 2 to 4 samples
• Measure forests (groups of trees)

Genomic Research Is a Data-Rich Technology
• Microarrays are also called chips or arrays
• They take advantage of the natural property of DNA to pair with its complementary strand
• One strand is built into the array and serves as a probe for the complementary strand in the biological sample
• Binding confirms the presence of the mRNA or cDNA in the sample

Genomic Profiling: Find “Significantly Changed Genes”
• From: all probe sets (a typical experiment is ~1M data points)
• To: a much smaller number of “meaningful genes”

Finding Genes in Samples: 1st Step
[Figure: one gene chip with a cell location highlighted; the sample is applied to the chip]

2nd Step
• Tagged DNA fragments that base-pair with the probes will glow
• Shine light on the chip, capture the final image, and export a text file with gene intensities

Experimental Design
• Use adequate controls
• Sample collection
• Choose time points and doses
• Hybridization schemes: 1 or 2 colors

Data Quality and Data Mining
• RNA quality
• Scans
• Summary statistics

RNA Quality
• Agilent 2100 Bioanalyzer
• Measures RNA quality and quantity
• Uses a small sample size and takes minutes
[Figure: Agilent gel image, good-quality RNA versus degraded RNA]

QC Assessment of Scanned Slide
• Good dynamic range of signal intensity
• Low background signal
[Figure: poor scan versus good scan]

Summary Statistics for Each Array
• Raw gene intensity distribution for each array
• After normalization, the distributions show reduced variance
[Figure: max, median and min intensity per array, groups 1 to 6]

Example of Within-Group Outliers
[Figure: two outlier arrays, one with a high and one with a low median value]

Goals of Data Mining
• Reduce the large dataset by first excluding “unchanging genes”
• Early microarray papers used a simple “fold change” to find differences
• Most analyses now rely on statistical tests, supervised or unsupervised, to identify changed genes
• Find the genes that distinguish the biological classes (“significant genes”)

General Approach: From Many Genes to a Few
• Start: 28,000 rat genes or 34,000 mouse genes
• Normalize the data to compare across arrays
• Analysis begins here, either supervised (prior knowledge: t test, ANOVA, etc.) or unsupervised (no prior knowledge: PCA, KNN, clustering)
• Associate the resulting genes with gene names, using databases to assign gene function
• Characterize the genes into pathways
• Explore the pathways by combining them into networks

Array Image Inspection Confirms the Induction of Many Genes
[Figure: array images at 1 µM and 50 µM As]

Statistical Filter Shows More Significant Genes at Higher Doses
[Figure: filtered genes at 1 µM and 50 µM As]
• Filter: genes with a >1.5-fold change and p < 0.05

Many Views of the Data
• Table of filtered genes
• Principal component analysis (PCA)
• Venn diagrams: gene level
• Correlate transcription with functional assays
• Map genes to pathways
• Venn diagram: pathway level

Table View: Significantly Altered Genes by Chemical, Day and Dose in Rat Liver
[Table: number of significantly altered genes for Myclobutanil, Propiconazole and Triadimefon at 4 days (low, mid, high dose), 30 days (low, mid, high dose) and 90 days (mid, high dose). Values in extracted order: 4, 3, 228, 0, 1, 396, 5, 7, 1275, 381, 1164, 220, 2536, 1395, 419, 1033, 2522, 1134, 2, 36, 272, 8452, 10446, 10337]

Principal Component Analysis
• Identifies a dose response, if present
• Assesses the experiment: is it worth analyzing?
• Identifies outliers (bad chips)
• Finds samples with similar expression patterns
What it looks like: [Figure: example PCA plot]
What it does:
• Uses all samples and genes
• Reduces the data statistically and plots it
• Helps visualize the data in 2 or 3 dimensions
What it tells:
• Groups samples or genes with similar profiles
• Differentiates treatment or exposure groups

Principal Component Analysis
[Figure: rat liver principal component plot, 30-day high dose (30d HD), three principal-component axes; groups: control, Myclobutanil, Propiconazole, Triadimefon]

Numbers of Common and Unique Genes Over Time (High Dose), Rat Liver

Dose Response Corresponds to Functional Assays
Functional assays:
[Figure: EROD and PROD activities in 30-day conazole-treated livers, fold induction at low, mid and high dose for Triadimefon, Propiconazole and Myclobutanil]
Better description of dose response by genomics:
[Figure: stress response genes, 30-day dose curve for T/C, fold change at low, mid and high dose for Cyp2b15, Cyp4a12, Cyp1a1, Gsta2, Aldh1a1, Ces2 and Udpgtr2]

Mapping Genes to Pathways
Pathway | Process | p-value | Genes expressed | Genes in pathway | % expressed
Transcription of Retinoid-Target genes | Cell signaling/Regulation of transcription | 7.56E-09 | 68 | 125 | 54
Regulation activity of EIF2 | Cell signaling/Translation regulation | 5.86E-05 | 31 | 56 | 55
IGF-R signaling | Growth and differentiation | 4.57E-06 | 40 | 72 | 56
AKT signaling | Growth and differentiation | 9.50E-06 | 33 | 57 | 58
PTEN pathway | Growth and differentiation | 3.65E-05 | 31 | 55 | 56
Tryptophan metabolism | Metabolic maps/Amino acid metabolism | 3.99E-05 | 17 | 24 | 71
Cholesterol Biosynthesis | Metabolic maps/Steroid metabolism | 6.25E-06 | 16 | 22 | 82
GTP-XTP metabolism | Metabolic maps/Nucleotide metabolism | 4.58E-07 | 34 | 54 | 63
CTP/UTP metabolism | Metabolic maps/Nucleotide metabolism | 1.32E-05 | 34 | 60 | 57
ATP/ITP metabolism | Metabolic maps/Nucleotide metabolism | 1.49E-05 | 36 | 65 | 55

Pathway Venn Diagram
Unique and common pathways over time

Pathway and Network Visualizations
• Cellular
• Molecular
• Network
• Metabolic
• Transcription

Example of a Molecular Pathway with Gene Intensity Values Added
[Figure: oxidative phosphorylation pathway, including ATPase, oxidoreductase, NADH dehydrogenase, succinate dehydrogenase complex and cytochrome c oxidase subunit]
Legend: red = gene induced, green = gene repressed, rainbow = mixed

Cellular Pathway
[Figure: cellular pathway with extracellular, cytoplasmic and nuclear compartments]
• Note the repression of c-Jun, JNK1 and ERK1*
Expression legend: green = decreased, red = increased, rainbow = mixed

Gene Network: One Transcription Factor

Network Objects Mapped to Cellular Localization

Conclusions
Steps for a successful microarray experiment:
• Experiment design: focus your research question
• Data quality assessment
• Supervised and unsupervised analyses
• Integrating gene expression results with other phenotypic endpoints
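
The Summary Statistics and outlier slides compare per-array intensity distributions before and after normalization and point out arrays with unusually high or low medians. The slides do not name the normalization method, so the sketch below assumes simple median centering on the log2 scale; quantile normalization is a common alternative. The function names, the toy intensities and the 1.0 log2-unit outlier cutoff are hypothetical.

```python
import math
from statistics import median


def median_normalize(arrays):
    """Log2-transform each array and shift it so all arrays share one median.

    arrays: dict mapping array name -> list of raw probe intensities (same probe order).
    Returns (normalized arrays, per-array log2 medians before normalization).
    """
    logged = {name: [math.log2(v) for v in values] for name, values in arrays.items()}
    medians = {name: median(values) for name, values in logged.items()}
    target = median(medians.values())  # common reference level across arrays
    normalized = {
        name: [v - medians[name] + target for v in values]
        for name, values in logged.items()
    }
    return normalized, medians


def flag_outlier_arrays(medians, max_dev=1.0):
    """Flag arrays whose log2 median lies more than max_dev units from the across-array median.

    max_dev=1.0 (a 2-fold shift) is a hypothetical cutoff, not a value from the slides.
    """
    center = median(medians.values())
    return sorted(name for name, m in medians.items() if abs(m - center) > max_dev)


if __name__ == "__main__":
    # Hypothetical raw intensities for four arrays; A3 is much brighter overall.
    raw = {
        "A1": [100, 200, 400],
        "A2": [110, 220, 440],
        "A3": [1000, 2000, 4000],
        "A4": [90, 180, 360],
    }
    normalized, medians = median_normalize(raw)
    print(flag_outlier_arrays(medians))  # ['A3']
```

Flagging is done on the medians recorded before normalization, because after median centering every array has the same median by construction.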
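
The Statistical Filter slide keeps genes that change more than 1.5-fold with p < 0.05. The slides name t tests and ANOVA for this step. To stay within the Python standard library, the sketch below substitutes an exact two-sample permutation test on log2 values, which is a different test from the t test. Gene names, sample layout and function names are hypothetical. With very small groups a permutation test cannot reach p < 0.05 (3 versus 3 samples gives a minimum two-sided p of 0.1), so the example uses 4 samples per group.

```python
import itertools
import math
from statistics import mean


def perm_pvalue(a, b):
    """Exact two-sided permutation p-value for a difference in group means.

    Enumerates every relabelling of the pooled samples into groups of the
    original sizes, so it is only practical for small groups.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    hits = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so the observed labelling always counts as a hit.
        if abs(mean(grp_a) - mean(grp_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


def filter_genes(expr, control, treated, min_fold=1.5, alpha=0.05):
    """Keep genes changed more than min_fold (either direction) with p < alpha.

    expr: dict gene -> list of log2 intensities; control/treated: column indices.
    Returns dict gene -> (fold change, p-value); a fold below 1 means repression.
    """
    cutoff = math.log2(min_fold)
    kept = {}
    for gene, values in expr.items():
        t_vals = [values[i] for i in treated]
        c_vals = [values[i] for i in control]
        log2_fc = mean(t_vals) - mean(c_vals)
        if abs(log2_fc) <= cutoff:
            continue  # fails the fold-change filter; skip the costlier test
        p = perm_pvalue(t_vals, c_vals)
        if p < alpha:
            kept[gene] = (2 ** log2_fc, p)
    return kept


if __name__ == "__main__":
    # Hypothetical log2 data: columns 0-3 are controls, 4-7 are treated.
    expr = {
        "gene_up": [8.0, 8.2, 8.1, 7.9, 10.0, 10.3, 10.1, 9.9],
        "gene_flat": [5.0, 5.1, 4.9, 5.0, 5.1, 5.0, 5.0, 4.9],
        "gene_noisy": [2.0, 12.0, 3.0, 11.0, 9.0, 8.0, 10.0, 7.0],
    }
    print(filter_genes(expr, control=[0, 1, 2, 3], treated=[4, 5, 6, 7]))
```

Applying the cheap fold-change cut first mirrors the slide's order of filters and avoids running the test on genes that could never pass.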
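
The PCA slides project every sample onto a few principal components so that treatment groups and bad chips stand out. As a pure-Python illustration, the sketch below computes only the first component. It runs power iteration on the sample-by-sample Gram matrix of gene-centered data, which yields the same PC1 scores as PCA on the gene covariance. Real analyses would use a linear-algebra library. The function name, iteration count, starting vector and toy data are assumptions.

```python
import math


def first_principal_component(samples, iterations=500):
    """Return (PC1 score per sample, fraction of total variance on PC1).

    samples: list of equal-length lists, one row of log2 expression values per sample.
    """
    n, g = len(samples), len(samples[0])
    # Center each gene (column) on its mean across samples.
    col_means = [sum(row[j] for row in samples) / n for j in range(g)]
    x = [[row[j] - col_means[j] for j in range(g)] for row in samples]
    # Gram matrix G = X X^T (n x n); its leading eigenvector is proportional to PC1 scores.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    total = sum(gram[i][i] for i in range(n))  # total sum of squares
    if total == 0:
        return [0.0] * n, 0.0
    # Start away from the all-ones vector, which lies in G's null space after centering.
    u = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        v = [sum(gram[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0:
            return [0.0] * n, 0.0
        u = [c / norm for c in v]
    # Rayleigh quotient gives the leading eigenvalue (the squared first singular value).
    lam = sum(u[i] * sum(gram[i][k] * u[k] for k in range(n)) for i in range(n))
    scores = [c * math.sqrt(max(lam, 0.0)) for c in u]
    return scores, lam / total


if __name__ == "__main__":
    # Hypothetical data: three controls and three treated samples, three genes each.
    samples = [
        [1.0, 2.0, 3.0], [1.1, 2.0, 2.9], [0.9, 2.1, 3.0],
        [3.0, 2.0, 1.0], [3.1, 1.9, 1.0], [2.9, 2.0, 1.1],
    ]
    scores, frac = first_principal_component(samples)
    print([round(s, 2) for s in scores], round(frac, 3))
```

The sign of a principal component is arbitrary, so only the separation between groups (controls on one side, treated samples on the other) is meaningful. Power iteration can fail if the starting vector happens to be orthogonal to the leading eigenvector.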
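
The Venn diagram slides count genes (or pathways) that are unique to one time point or shared across time points. Once each time point has a list of significant genes, those counts are plain set operations. The sketch below assumes each list is a Python set of gene identifiers; the function name and gene IDs are hypothetical.

```python
from itertools import combinations


def venn_counts(groups):
    """Count items in each exclusive region of a Venn diagram.

    groups: dict mapping a label (e.g. a time point) -> set of gene IDs.
    Returns dict mapping a frozenset of labels -> number of genes found in
    exactly those groups and in no other group.
    """
    labels = list(groups)
    counts = {}
    for r in range(1, len(labels) + 1):
        for combo in combinations(labels, r):
            inside = set.intersection(*(groups[lab] for lab in combo))
            outside = set().union(*(groups[lab] for lab in labels if lab not in combo))
            counts[frozenset(combo)] = len(inside - outside)
    return counts


if __name__ == "__main__":
    # Hypothetical significant-gene sets for three time points.
    groups = {
        "4d": {"g1", "g2", "g3"},
        "30d": {"g2", "g3", "g4"},
        "90d": {"g3", "g5"},
    }
    for region, count in sorted(venn_counts(groups).items(), key=lambda kv: len(kv[0])):
        print(sorted(region), count)
```

The exclusive regions always sum to the number of distinct genes across all time points, which is a quick consistency check on a Venn diagram.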
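
The Mapping Genes to Pathways table gives, for each pathway, the number of its genes that were expressed (changed), the pathway size, and a p-value. The slides do not say which test the pathway tool used. A one-sided hypergeometric over-representation test is a common choice, and the sketch below assumes it. The total number of genes on the array and the size of the significant-gene list are not given on the slide, so the sketch does not try to reproduce the listed p-values. The function names are hypothetical.

```python
from math import comb


def enrichment_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes measured on the array, K: genes in the pathway,
    n: significant genes overall, k: significant genes that fall in the pathway.
    """
    upper = min(K, n)
    # Exact integer numerator; a single true division at the end.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def percent_expressed(k, K):
    """Share of a pathway's genes that are changed, rounded as in the table's % column."""
    return round(100 * k / K)


if __name__ == "__main__":
    # First table row: 68 of the 125 retinoid-target pathway genes were expressed.
    print(percent_expressed(68, 125))  # 54
```

Computing the numerator with exact integers avoids the underflow that summing tiny floating-point probabilities can cause for large pathways.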