BioConductor - R for Microarray Analysis
Claudio Lottaz
Computational Diagnostics Group
Computational Molecular Biology Department
Max Planck Institute for Molecular Genetics
23-May-17

Overview
• Introduction
• File Formats
• Data Structures
• Analysis Methods
• Summary

The R-Project
• S/Splus: commercial statistics software package
  • origins in the academic community
  • now commercialized
  • serious effort in graphical user interface and the like
• R: freely available (GPL) statistics software package
  • based on the public roots of S
  • still compatible with the S language
  • command-line user interface

BioConductor
• R is extensible through packages
  • packages may be written in R (the language)
  • programming interface to C available
• BioConductor is a collection of packages
  • various contributors
  • various methods on different types of data
  • heterogeneous usage

File Formats
• Importing red/green experiment data
  • intensities from image-processing output: .spot or .gpr files (Spot or GenePix software)
  • textual information on probes and targets: .gal and .gdl files generated by GenePix
• Importing Affymetrix data
  • reads Affymetrix CEL files
  • needs copyright-protected CDF files for interpretation
• Exporting tab-delimited ASCII files

Red/Green Specific Data Structures
• marrayLayout objects: contain information on
  • probes and their locations
  • housekeeping genes
• marrayRaw objects: intensities for a batch of arrays
  • red/green, foreground/background
  • information on applied targets
• marrayNorm objects: post-normalization data
  • average log intensities, normalized log ratios
  • normalization factors
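The marrayNorm contents listed above (average log intensities, normalized log ratios, normalization factors) are derived from the red/green foreground and background intensities held in a marrayRaw object. The sketch below illustrates that relationship in plain Python; it is not the marray API. The function names ma_values and median_normalize are hypothetical, background is handled by simple subtraction, and the normalization is a global median shift rather than the loess or spatial methods of Yang et al.

```python
import math
from statistics import median


def ma_values(r_fg, r_bg, g_fg, g_bg):
    """Per-spot log ratio M and average log intensity A (base 2).

    Hypothetical helper: background is simply subtracted, and spots whose
    corrected intensity is not positive get None, so the output stays
    index-aligned with the input spots.
    """
    m_vals, a_vals = [], []
    for rf, rb, gf, gb in zip(r_fg, r_bg, g_fg, g_bg):
        red, green = rf - rb, gf - gb
        if red <= 0 or green <= 0:
            m_vals.append(None)  # log undefined for this spot
            a_vals.append(None)
            continue
        m_vals.append(math.log2(red / green))        # M = log2(R/G)
        a_vals.append(0.5 * math.log2(red * green))  # A = 0.5 * log2(R*G)
    return m_vals, a_vals


def median_normalize(m_vals):
    """Hypothetical global location normalization.

    Subtracts the median of the valid M values; returns the normalized
    log ratios together with the normalization factor that was used.
    """
    factor = median(m for m in m_vals if m is not None)
    normalized = [None if m is None else m - factor for m in m_vals]
    return normalized, factor


if __name__ == "__main__":
    m, a = ma_values([164, 116, 104, 150], [100] * 4,
                     [116, 116, 116, 90], [100] * 4)
    print(m)  # [2.0, 0.0, -2.0, None]
    print(a)  # [5.0, 4.0, 3.0, None]
    print(median_normalize([1.5, 0.5, -0.5, None]))  # ([1.0, 0.0, -1.0, None], 0.5)
```

Keeping None placeholders rather than dropping spots preserves the link between each value and its probe location, which is what the layout information in marrayLayout relies on.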

Affymetrix Specific Data Structures
• Cdf objects: chip description
• Cel objects: contain the probe data of one chip
• Cel.container objects: a set of Cel objects
• PPSet objects: all probes for a particular target
• PPSet.container objects: a set of PPSet objects
• For convenience: Plobs (probe level objects)
  • contain a Cdf and a Cel.container object
  • simple to use, less flexible access

Common Data Structures
• exprSet objects: hold expression data
  • matrix of expression data and standard errors
  • link to phenotype data and gene annotations
  • geneNames to identify the genes
• phenoData objects: hold phenotype/patient data
  • list of variables for each phenotype
  • matrix of data: one row per case, one column per variable
• Some packages use their own data structures

Utilities
• Utilities for resampling
• Aggregators
  • e.g. accumulate results in a cross-validation
• Summary statistics
• Convenient methods for graphical output
  • histograms, scatter plots, gene location, boxplots...
  • on various subsets of data

Red/Green Specific Analysis
• Diagnostic plots to find printing, hybridization or scanning artifacts
  • boxplots, scatter plots and spatial images
  • of foreground, background, log-ratio...
• Normalization (Yang et al. 2001, 2002)
  • location normalization: local weighted regression, intensity-dependent or 2D spatial
  • scale normalization: median absolute deviation (MAD)

Affymetrix Specific Analysis
• Exploring probe level data (package affy)
  • probe names, perfect match/mismatch intensities,...
• Normalization (on probe data)
  • MVA plots for Affymetrix data
  • various methods, default is quantile normalization
• Determining expression levels
  • various methods: Affymetrix (1999), Li & Wong (2001), Irizarry (2002)
  • standard errors are determined per expression value

Common Analysis
• Gene filtering, e.g.
  • find highly expressed genes
  • find differentially expressed genes (also for more than two groups)
• Find genes with expression patterns similar to a given gene of interest
• Receiver operating characteristic (ROC)
• Annotation: chromosome location, gene ontology

Common Analysis (continued)
• Expression density diagnostics
  • compare distributional shapes gene by gene to find differences between groups
• Multiple hypothesis testing
  • family-wise error rates, false discovery rate
  • minP and maxT procedures, step-up procedures
  • based on various statistics (t-, F-, Wilcoxon...)
  • adjusted p-values for genes declared differentially expressed, obtained through permutation

Summary
• Freely available software, reproducible methods
• Open source, references to publications
• Sophisticated methods available
• Rather specific input formats needed; licensing problem with Affymetrix chip description files
• Some heterogeneity in implementation
• Blurry definition of the R language
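The multiple-testing slide lists false discovery rate control and step-up procedures. As an illustration of one such procedure, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment in plain Python. It is not the multtest implementation: the names bh_adjust and declared_genes are hypothetical, and the functions take p-values that were already computed (for instance from t-statistics or by permutation) instead of computing them.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the p-value of ascending rank k (1-based) among m tests, the adjusted
    value is min over j >= k of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def declared_genes(gene_ids, pvalues, fdr=0.05):
    """Hypothetical helper: genes whose adjusted p-value is at most the FDR level."""
    return [g for g, q in zip(gene_ids, bh_adjust(pvalues)) if q <= fdr]


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.2]
    print(declared_genes(["g1", "g2", "g3", "g4"], p))  # ['g1']
```

With p = [0.01, 0.04, 0.03, 0.2] the adjusted values are 0.04, about 0.0533, about 0.0533 and 0.2, so only g1 passes at an FDR of 0.05. Permutation-based minP/maxT adjustments, as listed on the slide, control the family-wise error rate instead and are not covered by this sketch.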