Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
c JČMF 2006 ROBUST 2006 Poster Section Gene Expression Data Analysis for In Vitro Toxicology PETR ŠIMEČEK [email protected] Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Prague SUMMARY The poster introduces an analysis of microarrays including preprocessing, identification of outliers and statistical tests. The methods are demonstrated on a problem of identification of genes whose expression is affected by exposure to the allergens but not by the irritants. The inference is based on a dataset containing 72 microarrays. Each microarray comprises CD34–DC sample that has been in contact with one of the 6 chemical compounds (4 allergens + 2 irritants). BIOLOGICAL BACKGROUND QUALITY ASSESSEMENT A cDNA microarray consists of a large number of single stranded DNA spots arranged in a grid. Microarrays are used to measure the expression levels of large numbers of different genes (encoding different proteins) simultaneously. From inspected cells mRNA is extracted, purified, amplified, reverse transcribed and indirectly labeled with fluorescent dyes Cy5 (red) and Cy3 (green). During hybridisation the labeled cDNA sequences present in the pooled mixture bind to their complementary sequences on the microarray. Unhybridized cDNA is washed off and the microarray is scanned in a laser scanner. Quality of slides is usually graphically examined using scatter plots. This becomes difficult when the number of arrays is large. A helpful solution is proposed in [3]. (1) (2) For a given array, let Xi and Xi denote the log–intensities of ith gene for the first and the second dye, respectively, and let Ai be a correction term computed by loess regression. The idea is to divide Ai into two parts: (1) (1) e Xi (λ) = Xi − λ · Ai , e (2)(λ) = X (2) + (1 − λ) · Ai . X i i N Ii = max(T, RIi − BIi) • Variance of signals must be stabilized (e.g. by log–transformation). • Some normalization technique (e.g. linear, loess, lts or quantile regression) must be use to transform arrays to the same scale. An usual assumption is that only a small number of genes is differently expressed. 0.4 0.8 Gamma 0.4 0.3 0.2 10 0.1 0.0 0.8 0.7 0.6 0.5 0.0 22 32 43 33 81 1 4 55 50 24 51 47 38 59 35 39 25 37 70 48 6560 82 42 14 23 78 213 8 67 79 56 326 31 73 12 68 34 65 15 19 41 72 57 66 11 69 44 74 63 40 27 752 6 62 7 73 21 49 75 16 29 64 20 1718 71 70 80 0.00 0.10 0.20 1 − Mean correlation STATISTICAL TESTS When thousands of genes are tested for a change in expression due to an exposure, it can easily happen that a gene is marked as significant just by a chance. Extra attention must be therefore paid to multiplicity adjustment of the test level. Several statistical tests have been performed (e.g. paired test and ANOVA + their nonparametric equivalents) and 68 (of 11395) genes have been found significantly differently expressed after exposure to allergens compared to irritants. 2.0 Acknowledgement: The poster was supported by the grant GAČR 201/05/H007. 1.8 References: [1] Amaratunga D. and Cabrera J. (2003). Exploration and Analysis of DNA Microarray and Protein Array Data. Wiley & Sons. [2] Draghici S. (2003). Data Analysis Tools for DNA Microarrays. Chapman & Hall. [3] Park T. et al. (2005). Diagnostic Plots for Detecting Outlying Slides in a cDNA Microarray Experiment. BioTechniques 38. 1.6 second dye log−intensity 2.2 Array 32 0.4 Due to technology imperfections, substantial differences in intensity occur even among microarrays that are generated under exactly the same conditions. The purpose of preprocessing is to avoid such errors (cf. [1] and [2]). • A signal and a background must be separated. The raw intensity of the spots is strongly associated with the background intensity. That calls for a background adjustment. Let us denote RIi the raw spot intensity of the ith gene, and BIi the mean background intensity for the ith gene. Array correlation 0.9 PREPROCESSING Diagnostic plot 2 Maximum difference of correlations 1.0 Diagnostic plot 1 1.6 1.8 2.0 first dye log−intensity 2.2