Download Using lumi, a package processing Illumina Microarray

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Time series wikipedia , lookup

Transcript
Using lumi, a package processing Illumina
Microarray
Pan Du‡∗, Warren A. Kibbe‡†, Simon Lin‡‡
January 1, 2007
Robert H. Lurie Comprehensive Cancer Center
Northwestern University, Chicago, IL, 60611, USA
‡
Contents
1 Overview of lumi
1
2 Object models of major classes
2
3 Data preprocessing
3.1 Input data . . . . . . . . . . . . . .
3.2 Quality control of the raw data . .
3.3 Variance stabilizing transform . . .
3.4 Data normalization . . . . . . . . .
3.5 Quality control after normalization
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
2
3
4
10
13
13
4 Gene annotation
14
4.1 Examples of nuID . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Illumina microarray annotation package . . . . . . . . . . . . . . 20
4.3 Transfer Illumina microarray data as nuID annotated . . . . . . 21
5 A use case: from raw data to functional
5.1 Preprocess the Illumina data . . . . . .
5.2 Identify differentiate genes . . . . . . . .
5.3 Gene Ontology annotation . . . . . . . .
6 Reference
1
analysis
21
. . . . . . . . . . . . . . 22
. . . . . . . . . . . . . . 22
. . . . . . . . . . . . . . 23
25
Overview of lumi
lumi R package is designed to preprocess the Illumina microarray data. It includes data input, quality control, variance stabilization, normalization and gene
annotation part. The package can be easily integrated with other microarray
∗ [email protected][email protected][email protected]
1
class: ExpressionSet
Slots
assayData
exprs: gene expression (bead replicate mean)
featureData: gene information
experimentData: experiment meta data
phenoData: expreriment design
......
class: LumiBatch
Slots
assayData
se.exprs: bead replicate standard deviation
beadNum: bead number of each gene
detection: detection probability
history: operation tracking
......
class: LumiQC
Slots
assayData
cv: the coefficients of variance
mean, std, sampleCor,
detectionRate, sampleRelation,
outlier, history
......
Major methods
lumiR: initialization by reading raw data
lumiT: variance stabilizing transform
lumiN: normalization
lumiQ: quality control
getHistory
......
Major methods
plot:
plot QC plots: MAplot,
pairs, boxplot, sample relation,
outlier, cv, densityPlot
getHistory
......
Figure 1: Object models in lumi package
data analysis, like differentiated gene identification, gene ontology analysis or
clustering analysis.
2
Object models of major classes
The lumi package has two major class: LumiBatch and LumiQC. Both classes
are inherited from ExpressionSet class in Bioconductor for better compatibility. Their relations are shown in Figure 1. LumiBatch class includes se.exprs,
beadNum and detection in assayData slot for additional informations unique
to Illumina microarrays. Class LumiQC keeps the quality control information. The S4 function plot supports different kinds of plots by specifying the
specific plot type of LumiQC object. See help of plot-methods function for
details. The history slot records all the operations made on the LumiBatch or
LumiQC objects. This provides data provenance. Function getHistory is to
retrieve the history slot. Please see the help files of LumiBatch and LumiQC
class for more details.
3
Data preprocessing
The first thing is to load the lumi package.
2
Figure 2: An example of the input data format
> library(lumi)
3.1
Input data
The lumiR function supports directly reading the Illumina raw data output
of the Illumina Bead Studio toolkit. It can automatically detect the format
of the text file and create a new LumiBatch object for it. An example of
the input data format is shown in in Figure 2. For simplicity, only part of
the data of first sample is shown. The data in the highlighted columns are
kept in the corresponding slots of LumiBatch object, as shown in Figure 2.
The lumiR function will automatically determine the starting line of the data.
The columns started with AVG_Signal and BEAD_STDEV are required for the
LumiBatch object. The sample IDs and sample labels are extracted from
the column names of the data file. For example, based on the column name:
AVG_Signal-1304401001_A, we will extract "1304401001" as the sample ID
and "A" as the sample label (The function assumes the separation of the sample
ID and the sample label is "_" if it exists in the column name.). The algorithm
will check the uniqueness of sample IDs. If the sample ID is not unique, the
entire portion after "AVG_Signal-" will be used as a sample ID. A file format
error will be reported if the uniqueness still cannot be satisfied in this way.
>
>
>
>
## specify the file name
# fileName <- 'Barnes_gene_profile.txt'
## load the data
# x.lumi <- lumiR(fileName)
# Not Run
# Not Run
Here, we just load the pre-saved example data, which is a LumiBatch
object, example.lumi
>
>
>
>
## load example data
data(example.lumi)
## summary of the example data
example.lumi
Data Information:
Illumina Inc.
Normalization
Array Content
Error Model =
BeadStudio version 1.4.0.1
= none
= 11188230_100CP_MAGE-ML.XML
none
3
DateTime = 2/3/2005 3:21 PM
Local Settings = en-US
Major Operation History:
submitted
finished
command
1 2006-12-27 14:34:38 2006-12-27 14:36:57 lumiR("../data/Barnes_gene_profile.txt")
2 2006-12-27 15:14:37 2006-12-27 15:14:38 Subsetting 8000 features and 4 samples.
Object Information:
LumiBatch (storageMode: lockedEnvironment)
assayData: 8000 features, 4 samples
element names: beadNum, detection, exprs, se.exprs
phenoData
rowNames: A01, A02, B01, B02
varLabels and varMetadata:
sampleID: The unique Illumina microarray Id
label: The label of the sample
featureData
rowNames: GI_21687152-S, GI_20357509-A, ..., GI_4557600-S (8000 total)
varLabels and varMetadata:
targetID: The Illumina microarray gene ID
presentCount: The number of detectable measurements of the gene
experimentData: use 'experimentData(object)'
Annotation character(0)
3.2
Quality control of the raw data
The quality control of a LumiBatch object includes the gene expression value
and its coefficient of variance, estimating the mean and standard deviation of the
microarrays, microarray correlation, detectable probe ratio of each microarray,
density of each microarray, sample (microarray) relations, detecting outliers of
samples(microarrays).
LumiQ function will do these calculation based on a LumiBatch object and
organize the results in a LumiQC object.
>
>
>
>
## Do quality control estimation
lumi.Q <- lumiQ(example.lumi)
## summary of the quality control
lumi.Q
Class: LumiQC
Data dimension:
8000 genes x 4 samples
Summary of Samples:
A01
A02
B01
B02
mean
8.3300 8.5680 8.2630 8.347
standard deviation 1.5870 1.7140 1.7490 1.695
detection rate
0.5387 0.5574 0.5722 0.571
distance to center 12.1300 11.5800 12.0100 11.990
4
Note: the detection threshold is 0.99, "center" is the mean of all samples.
No outliers detected based on distance-to-center threshold 24.
Major Operation History:
submitted
1 2006-12-27 14:34:38 2006-12-27
2 2006-12-27 15:14:37 2006-12-27
3 2007-01-01 23:17:19 2007-01-01
finished
command
14:36:57 lumiR("../data/Barnes_gene_profile.txt")
15:14:38 Subsetting 8000 features and 4 samples.
23:17:27
lumiQ(x.lumi = example.lumi)
The S4 method plot can plot the quality control estimation results kept in
a LumiQC object. The quality control plots includes: the density plot (Figure
3), box plot (Figure 4), pairwise correlation between microarrays (Figure 5),
pairwise MAplot between microarrays (Figure 6), density plot of coefficient of
varience, (Figure 7), and the sample relations (Figure 8). More details are in the
help of plot,LumiQC-method function. Most of these plots can also be plotted
by the extended general functions: hist (for density plot), boxplot, MAplot,
pairs.
Plot the density plot of the LumiQC object. See Figure 3.
>
>
>
>
>
>
## plot the density
plot(lumi.Q, what='density')
## or
hist(lumi.Q)
## or
hist(example.lumi)
Plot the box plot of the LumiQC object. See Figure 4.
>
>
>
>
>
>
## plot the box plot
plot(lumi.Q, what='boxplot')
## or
boxplot(lumi.Q)
## or
boxplot(example.lumi)
Plot the pairwise sample correlation of the LumiQC object. See Figure 5.
>
>
>
>
>
>
## plot the pair plot
plot(lumi.Q, what='pair')
## or
pairs(lumi.Q)
## or
pairs(example.lumi)
The MA plot of the LumiQC object. See Figure 6.
>
>
>
>
>
>
## plot the MAplot
plot(lumi.Q, what='MAplot')
## or
MAplot(lumi.Q)
## or
MAplot(example.lumi)
5
Density plot of intensity
0.3
0.2
0.1
0.0
density
0.4
0.5
0.6
A01
A02
B01
B02
6
8
10
12
14
16
intensity
Figure 3: Density plot of Illumina microarrays before normalization
6
12
10
8
B02
B01
A02
A01
6
amplitude
14
16
Boxplot of microarray intensity
Figure 4: Density plot of Illumina microarrays before normalization
7
Pairwise plot with sample correlation
●
●
Cor = 0.97
0 (> 2, up)
210 (> 2, down)
Cor = 0.97
3 (> 2, up)
152 (> 2, down)
Cor = 0.99
6 (> 2, up)
0 (> 2, down)
14
16
16
●●
●●
●● ●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●●● ● ●
●
●
●●
●
● ● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
12
14
10
12
●●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●●
●●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
B02
Cor = 0.97
1 (> 2, up)
167 (> 2, down)
6
8
10
12
14
16
Cor = 0.97
223 (> 2, up)
0 (> 2, down)
10
●
●
●●
●●●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●● ●●● ●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●● ●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●●●● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●●
●
●
●
●
●
●
●
●
●
B01
Cor = 0.99
5 (> 2, up)
0 (> 2, down)
8
16
●
●
●●
●
●●
●
●
●●
●
●●
●
●●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
● ●●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●● ●
●
●
●
●
●
●
●
●
●
●
●● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●● ● ●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
A02
8
10
12
14
16
A01
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
8
6
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●●●●
● ●●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
● ●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
● ● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●●●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
● ●●●● ●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●
● ● ●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●●●● ●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
14
16
12
14
10
12
8
10
6
8
8
10
12
14
16
6
8
10
12
14
16
Figure 5: Pairwise plot with microarray correlation before normalization
8
10
12
14
16
−4
−6
8
10
12
14
16
8
10
12
14
16
8
10
12
14
16
8
10
12
14
16
8
10
12
14
16
0.0
A02
Median: 0.247
IQR: 0.295
B01
−8
−6
Median: 0.0738
IQR: 0.386
−4
−2
0
M
−1.0
0
−0.5
2
Median: −0.188
IQR: 0.350
4
0.5
6
1.0
8
8
−1.0 −0.5
−6
0.0
−4
0.5
A01
−2
−2
1.0
1.5
0
0
2.0
Pairwise MA plots between samples
6
Median: 0.0198
IQR: 0.322
Median: 0.23
IQR: 0.243
Median: −0.0206
IQR: 0.316
A
Figure 6: Pairwise MAplot before normalization
9
B02
> plot(lumi.Q, what = "cv")
Density plot of coefficient of variance
10
0
5
density
15
20
A01
A02
B01
B02
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
coefficient of variance
Figure 7: Density Plot of Coefficient of Varience
The density plot of the coefficient of variance of the LumiQC object. See
Figure 7.
Plot sample relations using hierarchical clustering, see Figure 8
Plot the sampleRelation using MDS, see Figure 9. The color of the sample is
based on the sample type, which is '100US', '95US:5P', '100US', '95US:5P' for
the sample data. Please see the help of getSampleRelation and plot-methods
for the setting of ”parameterList”.
3.3
Variance stabilizing transform
Variance stabilization is critical for subsequent statistical inference to identify
differential genes from microarray data. We devised a variance-stabilizing transformation (VST) by taking advantages of larger number of technical replicates
available on the Illumina microarray. Please see [1] for details of the algorithm.
Function lumiT performs variance stabilizing transform with both input and
output being LumiBatch object.
10
> plot(lumi.Q, what = "sampleRelation")
10
B02
A02
Sample
hclust (*, "average")
Figure 8: Sample relations before normalization
11
B01
A01
5
Height
15
Clusters of the samples based on 1104 genes with sd/mean > 0.1
> plot(lumi.Q, what = "sampleRelation", parameterList = list(method = "mds", col = c("100US
+
"95US:5P", "100US", "95US:5P")))
Sample relations based on 1104 genes with sd/mean > 0.1
0
A02
−1
B02
−2
Principle component 2
1
2
A01
B01
−10
−5
0
5
Principle component 1
Figure 9: Sample relations before normalization
12
10
Do default VST variance stabilizing transform
> lumi.T <- lumiT(example.lumi)
2007-01-01
2007-01-01
2007-01-01
2007-01-01
23:18:17
23:18:17
23:18:18
23:18:18
,
,
,
,
processing
processing
processing
processing
array
array
array
array
1
2
3
4
Function lumiT also provides options to do "log2" or "cubicRoot" transform. See help of lumiT for details.
3.4
Data normalization
We proposed a robust spline normalization (RSN) algorithm, which combines
the features of quanitle and loess nor-malization, is designed to normalize the
variance-stabilized data. Please see [1] for details of the algorithm.
Function lumiN performs robust spline normalization (RSN) algorithm with
both input and output being LumiBatch object. lumiN also provides options to
do "loess", "quantile", "VSN" normalization. See help of lumiN for details.
Do default RSN between microarray normaliazation
> lumi.N <- lumiN(lumi.T)
2007-01-01
2007-01-01
2007-01-01
2007-01-01
23:18:21
23:18:22
23:18:23
23:18:23
,
,
,
,
processing
processing
processing
processing
array
array
array
array
1
2
3
4
User can also easily select other normalization method. For example, the
following command will run quantile between microarray normaliazation.
> lumi.N <- lumiN(lumi.T, method = "quantile")
3.5
Quality control after normalization
To make sure the data quality meets our requirement, we do a second round of
quality control of normalized data with different QC plots. Compare the plots
before and after normalization, we can clearly see the improvements.
> lumi.N.Q <- lumiQ(lumi.N)
> lumi.N.Q
Class: LumiQC
Data dimension:
8000 genes x 4 samples
Summary of Samples:
A01
A02
B01
B02
mean
8.9420 8.9400 8.9400 8.940
standard deviation 1.2550 1.2540 1.2540 1.254
detection rate
0.5387 0.5574 0.5722 0.571
distance to center 10.7900 10.6000 10.8800 11.000
13
> plot(lumi.N.Q, what = "density")
Density plot of intensity
0.6
0.0
0.2
0.4
density
0.8
1.0
1.2
A01
A02
B01
B02
8
10
12
14
16
intensity
Figure 10: Density plot of Illumina microarrays after normalization
Note: the detection threshold is 0.99, "center" is the mean of all samples.
No outliers detected based on distance-to-center threshold 21.7.
Major Operation History:
submitted
1 2006-12-27 14:34:38 2006-12-27
2 2006-12-27 15:14:37 2006-12-27
3 2007-01-01 23:18:17 2007-01-01
4 2007-01-01 23:18:19 2007-01-01
5 2007-01-01 23:18:24 2007-01-01
4
finished
command
14:36:57 lumiR("../data/Barnes_gene_profile.txt")
15:14:38 Subsetting 8000 features and 4 samples.
23:18:19
lumiT(x.lumi = example.lumi)
23:18:24
lumiN(x.lumi = lumi.T)
23:18:30
lumiQ(x.lumi = lumi.N)
Gene annotation
Because the TargetID used by Illumina microarray is not consistent among
different versions of arrays. For instance, the same 50mer sequence has two different TargetIDs used by Illumina: "GI_21070949-S" in the Mouse_Ref-8_V1
chip and "scl022190.1_154-S" in the Mouse-6_V1 chip. This causes difficulties
when combining clinical microarray data collected over time using different ver14
> plot(lumi.N.Q, what = "boxplot")
12
10
B02
B01
A02
A01
8
amplitude
14
16
Boxplot of microarray intensity
Figure 11: Density plot of Illumina microarrays after normalization
15
> plot(lumi.N.Q, what = "pair")
Pairwise plot with sample correlation
●
●
14
10
12
●●
●
●
●●●
●●
●
●●
●
●●●
●
●
●
●●
●
●
●●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●●
●●
●
●
●●
●
●
●
●
●●
●●
●
●
●●
●
●
● ●
●
8
●
●●
●●
●●
●●
●●
●●
●●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●●
●●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
Cor = 0.97
0 (> 2, up)
85 (> 2, down)
Cor = 1
0 (> 2, up)
1 (> 2, down)
●●
●
●
●
●
●●
●
●●
●●
●●
●
● ●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●
●●
●
● ●●●● ●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●
●●
● ●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
16
14
12
●
●
16
Cor = 0.97
81 (> 2, up)
0 (> 2, down)
16
●
●
●●
●
●
●
● ●●
●●●
●
●
●●
●
8
●
●●
●●
●●●
●
●●●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●●
●
●●● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●● ●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●● ●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ● ● ●
●
●
●
●
●
B01
Cor = 1
1 (> 2, up)
0 (> 2, down)
14
10
12
●
●
A02
Cor = 0.97
0 (> 2, up)
86 (> 2, down)
10
16
●
8
●
●
●●●
●●
●
●
●●
●●
●●
●●
●●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●
●●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
16
●●●
●●
●
●●
●●
●●
●●
●
●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
● ●
●
●
●●
●
●● ●
●
●●
●●
●●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●● ●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
● ●● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
16
●
●●
●
●●
●
●
●●
●●
●
●
●●
●
14
14
12
12
10
A01
10
8
8
8
10
12
14
B02
Cor = 0.97
0 (> 2, up)
79 (> 2, down)
8
10
12
14
16
8
10
12
14
16
Figure 12: Pairwise plot with microarray correlation after normalization
16
> plot(lumi.N.Q, what = "MAplot")
0.0
−0.2
−0.6
−0.05
−0.6
0.00
−0.4
A01
−0.4
−0.2
0.05
0.0
0.10
Pairwise MA plots between samples
3.2
3.4
3.6
3.8
4.0
3.0
3.2
3.4
3.6
3.8
4.0
3.0
3.2
3.4
3.6
3.8
4.0
3.4
3.6
3.8
4.0
3.0
3.2
3.4
3.6
3.8
4.0
3.0
3.2
3.4
3.6
3.8
4.0
−0.05
−0.10
0.2
B01
−0.4
Median: −0.00402
IQR: 0.0241
−0.6
Median: −0.000408
IQR: 0.0192
−0.2
0.0
M
3.2
0.00
0.4
A02
0.0
Median: 0.00361
IQR: 0.0242
3.0
0.05
0.6
3.0
Median: 0.00344
IQR: 0.0248
Median: 0.000141
IQR: 0.0194
Median: 0.00440
IQR: 0.0234
A
Figure 13: Pairwise MAplot after normalization
17
B02
> plot(lumi.N.Q, what = "sampleRelation")
10
Sample
hclust (*, "average")
Figure 14: Sample relations after normalization
18
B01
A01
B02
A02
5
Height
15
Clusters of the samples based on 1132 genes with sd/mean > 0.1
> plot(lumi.N.Q, what = "sampleRelation", parameterList = list(method = "mds", col = c("100
+
"95US:5P", "100US", "95US:5P")))
Sample relations based on 1132 genes with sd/mean > 0.1
1.5
B01
0.0
−0.5
−1.0
A02
−1.5
Principle component 2
0.5
1.0
B02
A01
−10
−5
0
5
Principle component 1
Figure 15: Sample relations after normalization
19
10
sions of the chips. Also the targetID is not meaningful for the user. We designed
a nucleotide universal identifier (nuID), which encodes the 50mer oligonucleotide
sequence and contains error checking and self-identification code. For details,
please read [2].
4.1
>
>
>
>
>
Examples of nuID
## provide an arbitrary nucleotide sequence as an example
seq <- 'ACGTAAATTTCAGTTTAAAACCCCCCG'
## create a nuID for it
id <- seq2id(seq)
print(id)
[1] "YGwP0vwBVW"
The original nucleotide sequence can be easily recovered by id2seq
> id2seq(id)
[1] "ACGTAAATTTCAGTTTAAAACCCCCCG"
The nuID is self-identifiable. is.nuID can check the sequence is nuID or not.
A real nuID
> is.nuID(id)
[1] TRUE
An random sequence
> is.nuID("adfqeqe")
[1] FALSE
4.2
Illumina microarray annotation package
Because all the Illumina microarrays use 50-mers, by using the nuID universal
identifier, we are able to build one annotation database for different versions of
the human (or other species) chips. Moreover, the nuID can be directly converted to the probe sequence, and used to get the most updated refSeq matches
and annotations. Currently, annotation packages for human (Human.lumi) and
mouse (Mouse.lumi) have been created. Packages for more species will be created in the future.
The Illumina annotation packages are produced by using AnnBuilder with
small modification. As a result, the format of the package is the same as
Affymetrix annotation package, lots of packages designed for Affymetrix can
also be used for Illumina annotation package.
Here is some examples:
> ## load lumi annotation package
> lib <- 'Human.lumi'
# Huamn lumi annotation package
> if(require(GO) & require(annotate) & require(lib, character.only=TRUE)) {
+
GOId <- 'GO:0004816'
# asparagine-tRNA ligase activity
20
+
+
+
+ }
probe <- lookUp(GOId, lib, 'GO2ALLPROBES')
# probes under 'GO:0004816' category
print(probe)
$`GO:0004816`
IEA
TAS
"WVUU7XyNw3ucXzwdEk" "WVUU7XyNw3ucXzwdEk"
>
>
>
+
+
+
# specify a nuID
nuId <- 'WVUU7XyNw3ucXzwdEk'
if (require(annotate) & require(lib, character.only=TRUE)) {
# get the gene symbol of nuId
getSYMBOL(nuId, lib)
}
WVUU7XyNw3ucXzwdEk
"NARS"
4.3
Transfer Illumina microarray data as nuID annotated
For Illumina microarray data, there are two ways to transfer the TargetID as
nuID. The first way is to use the annotation file for the specific chip which is
provided by Illumina company. 'Human_RefSeq-8.csv' is an example annotation file for Human RefSeq-8 microarray. We recommend using this method
because it can make sure the matching of TargetID and nucleotide sequence.
addNuId2lumi implements the function.
>
>
>
>
## specify the annotation file for the Illumina chip
annotationFile <- 'Human_RefSeq-8.csv'
## Replace the Target ID with nuID
lumi.N <- addNuId2lumi(lumi.N, annotationFile)
# Not run
# Not run
An alternative way is to load the Annotation library and match the targetID
with nuID
> if (require(Human.lumi)) {
+
lumi.N <- addNuId2lumi(lumi.N, lib = "Human.lumi")
+ }
5
A use case: from raw data to functional analysis
Figure 16 shows the data processing flow chart of the use case. Since the classes
in lumi package are inherited from class ExpressionSet, packages and functions
compatible with class ExpressionSet or accepting matrix as input all can be
used for lumi results. Here we just give two examples: using limma to identify
differentiated genes and using GOstats to annotate the significant genes.
We use the Barnes data set [3] as an example. The Barnes data setmeasured
a dilution series of two human tis-sues, brain and placenta, at the titration
rate of 100%, 95%, 75%, 50%, 25% and 0%. The samples were hybridized on
21
Figure 16: Flow chart of the use case
HumanRef-8 BeadChip (Illumina, Inc) in duplicate. We used this data set to
evaluate the detection of differential expressions.
5.1
Preprocess the Illumina data
>
>
>
>
>
library(lumi)
## specify the file name
# fileName <- 'Barnes_gene_profile.txt'
## load the data
# example.lumi <- lumiR(fileName)
>
>
>
>
## load saved data
load(example.lumi)
## Transfer Illumina microarray data as nuID annotated
example.lumi <- addNuId2lumi(example.lumi, lib='Human.lumi')
#
Not run
# Not run
> ## Quality control based on the raw data
> lumi.Q <- lumiQ(example.lumi)
# (optional)
> ## Do default VST variance stabilizing transform
> lumi.T <- lumiT(example.lumi)
> ## Do lumi between microarray normaliazation
> lumi.N <- lumiN(lumi.T)
> ## Quality control after normalization
> lumi.N.Q <- lumiQ(lumi.N)
5.2
# (optional)
Identify differentiate genes
Identify the differentiated genes based on moderated t-test using limma.
Retrieve the normalized data
> dataMatrix <- exprs(lumi.N)
22
To speed up the processing and reduce FDR, remove the unexpressed genes
> presentCount <- pData(featureData(lumi.N))$presentCount
> selDataMatrix <- dataMatrix[presentCount > 0, ]
> selProbe <- rownames(selDataMatrix)
>
>
>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
## Specify the sample type
sampleType <- c('100US', '95US:5P', '100US', '95US:5P')
if(require(limma)) {
## compare '95US:5P' and '100US'
design <- model.matrix(~ factor(sampleType))
colnames(design) <- c('100US', '95US:5P-100US')
fit <- lmFit(selDataMatrix, design)
fit <- eBayes(fit)
## print the top 10 genes
topTable(fit, coef='95US:5P-100US', adjust='fdr', number=10)
## get significant gene list with FDR adjusted p.values less than 0.01
p.adj <- p.adjust(fit$p.value[,2])
sigGene.adj <- selProbe[ p.adj < 0.01]
## without FDR adjustment
sigGene <- selProbe[ fit$p.value[,2] < 0.001]
}
5.3
Gene Ontology annotation
Based on the significant genes identified using limma or t-test, we can further
do Gene Ontology annotation. We can use package GOstats to do the analysis.
Do Hypergeometric test of Gene Ontology based on the significant gene list
(for e. Table 1 shows the significant GO terms of Molecular Function with pvalue less than 0.01. Here only show the significant GO terms of MF (Molecular
Function). For other categories(BP, CC), just follow the same procedure.
> if (require(GOstats) & require(Human.lumi)) {
+
+
## Get the locuslink Id of the gene
+
sigLL <- unique(unlist(mget(sigGene, env=Human.lumiLOCUSID, ifnotfound=NA)))
+
sigLL <- as.character(sigLL[!is.na(sigLL)])
+
params <- new("GOHyperGParams",
+
geneIds= sigLL,
+
annotation="Human.lumi",
+
ontology="BP",
+
pvalueCutoff= 0.01,
+
conditional=FALSE,
+
testDirection="over")
+
+
hgOver <- hyperGTest(params)
+
+
## Get the p-values of the test
+
gGhyp.pv <- pvalues(hgOver)
23
+
+
+
+
+
+
+
+ }
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
## select the Go terms with p-value less than 0.01
sigGO.ID <- names(gGhyp.pv[gGhyp.pv < 0.01])
## Here only show the significant GO terms of BP (Molecular Function)
##
For other categories, just follow the same procedure.
sigGO.Term <- getGOTerm(sigGO.ID)[["BP"]]
GO ID
GO:0006817
GO:0006692
GO:0006693
GO:0015698
GO:0006954
GO:0006820
GO:0007229
GO:0009613
GO:0051707
GO:0006690
GO:0001516
GO:0046457
GO:0006950
GO:0015884
GO:0007155
GO:0009611
GO:0007565
GO:0006517
GO:0007179
GO:0006955
GO:0051181
GO:0051706
GO:0046456
GO:0009605
GO:0008632
GO:0051704
GO:0008361
GO:0016049
GO:0007243
GO:0007167
GO:0043174
GO:0043101
GO:0006166
Term
phosphate transpo...
prostanoid metabo...
prostaglandin met...
inorganic anion t...
inflammatory resp...
anion transport
integrin-mediated...
response to pest,...
response to other...
icosanoid metabol...
prostaglandin bio...
prostanoid biosyn...
response to stres...
folic acid transp...
cell adhesion
response to wound...
pregnancy
protein deglycosy...
transforming grow...
immune response
cofactor transpor...
physiological int...
icosanoid biosynt...
response to exter...
apoptotic program
interaction betwe...
regulation of cel...
cell growth
protein kinase ca...
enzyme linked rec...
nucleoside salvag...
purine salvage
purine ribonucleo...
p-value
7.1814e-05
7.4988e-05
7.4988e-05
0.00014391
0.00046099
0.00046297
0.00066863
0.00082834
0.00099687
0.0010893
0.0011223
0.0011223
0.0014567
0.001786
0.0020978
0.0021351
0.0021352
0.0033714
0.0038755
0.0040391
0.0057251
0.0059084
0.0060734
0.0063917
0.0064967
0.0068181
0.0088256
0.0088256
0.0092369
0.0094688
0.0097239
0.0097239
0.0097239
Significant Genes No.
11
6
6
15
21
16
9
41
41
7
4
4
67
3
41
31
8
2
6
52
3
8
5
35
7
10
13
13
21
17
2
2
2
Table 1: GO terms, p-values and counts.
24
Total Genes No.
48
14
14
88
163
108
43
425
429
29
9
9
808
5
446
311
41
2
27
620
7
48
21
390
39
70
107
107
208
157
3
3
3
6
Reference
1. Lin, S.M., Du, P., Kibbe, W.A., ”Model-based Variance-stabilizing Transformation for Illumina Mi-croarray Data”, submitted
2. Du, P., Kibbe, W.A. and Lin, S.M., ”nuID: A Novel Identifier for Oligos,
Ideal for Oligonucleotide-based Microarrays”, submitted.
3. Barnes, M., Freudenberg, J., Thompson, S., Aronow, B. and Pav-lidis,
P. (2005) ”Experimental comparison and cross-validation of the Affymetrix and
Illumina gene expression analysis platforms”, Nucleic Acids Res, 33, 5914-5923.
25