© JČMF 2006
ROBUST 2006 Poster Section
Gene Expression Data Analysis for
In Vitro Toxicology
PETR ŠIMEČEK
[email protected]
Institute of Information Theory and Automation,
Academy of Sciences of the Czech Republic, Prague
SUMMARY
The poster introduces an analysis of microarray data, including preprocessing, identification of outliers and statistical tests. The methods are demonstrated on the problem of identifying genes whose expression is affected by exposure to allergens but not by irritants. The inference is based on a dataset of 72 microarrays. Each microarray comprises a CD34–DC sample that has been in contact with one of 6 chemical compounds (4 allergens + 2 irritants).
BIOLOGICAL BACKGROUND
A cDNA microarray consists of a large number of single-stranded DNA spots arranged in a grid. Microarrays are used to measure the expression levels of large numbers of different genes (encoding different proteins) simultaneously. mRNA is extracted from the inspected cells, purified, amplified, reverse transcribed and indirectly labeled with the fluorescent dyes Cy5 (red) and Cy3 (green). During hybridisation, the labeled cDNA sequences present in the pooled mixture bind to their complementary sequences on the microarray. Unhybridized cDNA is washed off and the microarray is scanned in a laser scanner.
QUALITY ASSESSMENT
The quality of slides is usually examined graphically using scatter plots. This becomes difficult when the number of arrays is large. A helpful solution is proposed in [3].
For a given array, let $X_i^{(1)}$ and $X_i^{(2)}$ denote the log–intensities of the $i$th gene for the first and the second dye, respectively, and let $A_i$ be a correction term computed by loess regression. The idea is to divide $A_i$ into two parts:
$$\tilde{X}_i^{(1)}(\lambda) = X_i^{(1)} - \lambda \cdot A_i,$$
$$\tilde{X}_i^{(2)}(\lambda) = X_i^{(2)} + (1 - \lambda) \cdot A_i.$$
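The split can be sketched in a few lines of Python. This is a minimal illustration, assuming the log–intensities and the correction terms $A_i$ are already available as plain lists; the function name `split_correction` and the toy numbers are hypothetical, and the loess fit that would produce $A_i$ is not shown.

```python
def split_correction(x1, x2, a, lam):
    """Distribute the correction a[i] between the two dye channels.

    x1, x2 : log-intensities of the first and second dye (one value per gene)
    a      : correction terms A_i (assumed precomputed, e.g. by loess)
    lam    : share of the correction subtracted from the first channel
    """
    if not (len(x1) == len(x2) == len(a)):
        raise ValueError("x1, x2 and a must have the same length")
    x1_adj = [u - lam * c for u, c in zip(x1, a)]
    x2_adj = [v + (1.0 - lam) * c for v, c in zip(x2, a)]
    return x1_adj, x2_adj


# Toy data (made up for illustration only).
x1 = [1.9, 2.1, 1.7]
x2 = [1.8, 2.2, 1.6]
a = [0.10, -0.05, 0.20]

for lam in (0.0, 0.5, 1.0):
    y1, y2 = split_correction(x1, x2, a, lam)
    # The corrected log-ratio does not depend on lam: it is x1 - x2 - a.
    print(lam, [round(p - q, 6) for p, q in zip(y1, y2)])
```

Whatever value of $\lambda$ is chosen, the corrected log-ratio equals $X_i^{(1)} - X_i^{(2)} - A_i$; $\lambda$ only decides how the correction is shared between the two channels.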
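Background-adjusted intensity: $NI_i = \max(T,\; RI_i - BI_i)$, where $RI_i$ is the raw spot intensity of the $i$th gene, $BI_i$ is its mean background intensity and $T$ is a lower bound on the adjusted intensity.
• The variance of the signals must be stabilized (e.g. by log–transformation).
• Some normalization technique (e.g. linear, loess, lts or quantile regression) must be used to transform the arrays to the same scale. A usual assumption is that only a small number of genes are differentially expressed.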
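The preprocessing steps above can be chained as in the following sketch. It is only an illustration under stated assumptions: the function names, the threshold value and the toy intensities are hypothetical, and a simple median centering is swapped in for the normalization step, where the poster names linear, loess, lts or quantile regression.

```python
import math
import statistics


def background_adjust(raw, background, threshold):
    """NI_i = max(T, RI_i - BI_i): subtract background, floor at threshold T."""
    return [max(threshold, r - b) for r, b in zip(raw, background)]


def log_transform(intensities):
    """Variance-stabilising log2 transform (requires positive intensities)."""
    return [math.log2(v) for v in intensities]


def median_center(log_values):
    """Shift one channel so that its median is zero.

    Assumption: this simple location shift stands in for the normalization
    step; loess, lts or quantile regression would replace this function.
    """
    m = statistics.median(log_values)
    return [v - m for v in log_values]


# Toy data (made up): raw spot and mean background intensities for 5 genes.
raw = [520.0, 130.0, 2100.0, 90.0, 800.0]
background = [100.0, 120.0, 100.0, 110.0, 96.0]
T = 16.0  # hypothetical threshold; must be positive so the log is defined

ni = background_adjust(raw, background, T)
normalized = median_center(log_transform(ni))
print(ni)
print([round(v, 3) for v in normalized])
```

Flooring at a positive $T$ keeps spots whose background exceeds the raw signal from producing zero or negative values, which the log transform could not handle.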
[Figure: diagnostic plot with points labelled by array number; recovered axis labels: Gamma, 1 − Mean correlation.]
STATISTICAL TESTS
When thousands of genes are tested for a change in expression due to an exposure, it can easily happen that a gene is marked as significant just by chance. Extra attention must therefore be paid to multiplicity adjustment of the test level.
Several statistical tests have been performed (e.g. paired test and ANOVA + their nonparametric equivalents) and 68 (of 11395) genes have been found to be significantly differentially expressed after exposure to allergens compared to irritants.
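To see why the adjustment matters: with 11395 genes tested at level 0.05 and no true effects at all, about 570 genes would be flagged on average. The poster does not say which adjustment was used; the sketch below shows Holm's step-down procedure as one common choice, with a hypothetical function name and made-up p-values.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Toy p-values (made up), one per gene.
p = [0.0001, 0.04, 0.03, 0.0004, 0.20]
adj = holm_adjust(p)
significant = [i for i, q in enumerate(adj) if q <= 0.05]
print(adj)
print(significant)
```

In this toy example the genes with raw p-values 0.03 and 0.04 would pass an unadjusted 0.05 threshold but are no longer significant after the adjustment.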
Acknowledgement: The poster was supported by the grant GAČR 201/05/H007.
References:
[1] Amaratunga D. and Cabrera J. (2003). Exploration and Analysis of DNA Microarray and Protein Array Data. Wiley & Sons.
[2] Draghici S. (2003). Data Analysis Tools for DNA Microarrays. Chapman & Hall.
[3] Park T. et al. (2005). Diagnostic Plots for Detecting Outlying Slides in a cDNA Microarray Experiment. BioTechniques 38.
[Figure: panel titled "Array 32"; recovered axis label: second dye log−intensity.]
PREPROCESSING
Due to technology imperfections, substantial differences in intensity occur even among microarrays that are generated under exactly the same conditions. The purpose of preprocessing is to correct for such errors (cf. [1] and [2]).
• A signal and a background must be separated. The raw intensity of the spots is strongly associated with the background intensity, which calls for a background adjustment. Let us denote by $RI_i$ the raw spot intensity of the $i$th gene and by $BI_i$ the mean background intensity for the $i$th gene.
[Figures: Diagnostic plot 1 and Diagnostic plot 2; recovered axis labels: Array correlation, Maximum difference of correlations, first dye log−intensity.]