Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Using lumi, a package processing Illumina Microarray Pan Du‡∗, Warren A. Kibbe‡†, Simon Lin‡‡ January 1, 2007 Robert H. Lurie Comprehensive Cancer Center Northwestern University, Chicago, IL, 60611, USA ‡ Contents 1 Overview of lumi 1 2 Object models of major classes 2 3 Data preprocessing 3.1 Input data . . . . . . . . . . . . . . 3.2 Quality control of the raw data . . 3.3 Variance stabilizing transform . . . 3.4 Data normalization . . . . . . . . . 3.5 Quality control after normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 3 4 10 13 13 4 Gene annotation 14 4.1 Examples of nuID . . . . . . . . . . . . . . . . . . . . . . . . . . 20 4.2 Illumina microarray annotation package . . . . . . . . . . . . . . 20 4.3 Transfer Illumina microarray data as nuID annotated . . . . . . 21 5 A use case: from raw data to functional 5.1 Preprocess the Illumina data . . . . . . 5.2 Identify differentiate genes . . . . . . . . 5.3 Gene Ontology annotation . . . . . . . . 6 Reference 1 analysis 21 . . . . . . . . . . . . . . 22 . . . . . . . . . . . . . . 22 . . . . . . . . . . . . . . 23 25 Overview of lumi lumi R package is designed to preprocess the Illumina microarray data. It includes data input, quality control, variance stabilization, normalization and gene annotation part. The package can be easily integrated with other microarray ∗ [email protected] † [email protected] ‡ [email protected] 1 class: ExpressionSet Slots assayData exprs: gene expression (bead replicate mean) featureData: gene information experimentData: experiment meta data phenoData: expreriment design ...... class: LumiBatch Slots assayData se.exprs: bead replicate standard deviation beadNum: bead number of each gene detection: detection probability history: operation tracking ...... class: LumiQC Slots assayData cv: the coefficients of variance mean, std, sampleCor, detectionRate, sampleRelation, outlier, history ...... Major methods lumiR: initialization by reading raw data lumiT: variance stabilizing transform lumiN: normalization lumiQ: quality control getHistory ...... Major methods plot: plot QC plots: MAplot, pairs, boxplot, sample relation, outlier, cv, densityPlot getHistory ...... Figure 1: Object models in lumi package data analysis, like differentiated gene identification, gene ontology analysis or clustering analysis. 2 Object models of major classes The lumi package has two major class: LumiBatch and LumiQC. Both classes are inherited from ExpressionSet class in Bioconductor for better compatibility. Their relations are shown in Figure 1. LumiBatch class includes se.exprs, beadNum and detection in assayData slot for additional informations unique to Illumina microarrays. Class LumiQC keeps the quality control information. The S4 function plot supports different kinds of plots by specifying the specific plot type of LumiQC object. See help of plot-methods function for details. The history slot records all the operations made on the LumiBatch or LumiQC objects. This provides data provenance. Function getHistory is to retrieve the history slot. Please see the help files of LumiBatch and LumiQC class for more details. 3 Data preprocessing The first thing is to load the lumi package. 2 Figure 2: An example of the input data format > library(lumi) 3.1 Input data The lumiR function supports directly reading the Illumina raw data output of the Illumina Bead Studio toolkit. It can automatically detect the format of the text file and create a new LumiBatch object for it. An example of the input data format is shown in in Figure 2. For simplicity, only part of the data of first sample is shown. The data in the highlighted columns are kept in the corresponding slots of LumiBatch object, as shown in Figure 2. The lumiR function will automatically determine the starting line of the data. The columns started with AVG_Signal and BEAD_STDEV are required for the LumiBatch object. The sample IDs and sample labels are extracted from the column names of the data file. For example, based on the column name: AVG_Signal-1304401001_A, we will extract "1304401001" as the sample ID and "A" as the sample label (The function assumes the separation of the sample ID and the sample label is "_" if it exists in the column name.). The algorithm will check the uniqueness of sample IDs. If the sample ID is not unique, the entire portion after "AVG_Signal-" will be used as a sample ID. A file format error will be reported if the uniqueness still cannot be satisfied in this way. > > > > ## specify the file name # fileName <- 'Barnes_gene_profile.txt' ## load the data # x.lumi <- lumiR(fileName) # Not Run # Not Run Here, we just load the pre-saved example data, which is a LumiBatch object, example.lumi > > > > ## load example data data(example.lumi) ## summary of the example data example.lumi Data Information: Illumina Inc. Normalization Array Content Error Model = BeadStudio version 1.4.0.1 = none = 11188230_100CP_MAGE-ML.XML none 3 DateTime = 2/3/2005 3:21 PM Local Settings = en-US Major Operation History: submitted finished command 1 2006-12-27 14:34:38 2006-12-27 14:36:57 lumiR("../data/Barnes_gene_profile.txt") 2 2006-12-27 15:14:37 2006-12-27 15:14:38 Subsetting 8000 features and 4 samples. Object Information: LumiBatch (storageMode: lockedEnvironment) assayData: 8000 features, 4 samples element names: beadNum, detection, exprs, se.exprs phenoData rowNames: A01, A02, B01, B02 varLabels and varMetadata: sampleID: The unique Illumina microarray Id label: The label of the sample featureData rowNames: GI_21687152-S, GI_20357509-A, ..., GI_4557600-S (8000 total) varLabels and varMetadata: targetID: The Illumina microarray gene ID presentCount: The number of detectable measurements of the gene experimentData: use 'experimentData(object)' Annotation character(0) 3.2 Quality control of the raw data The quality control of a LumiBatch object includes the gene expression value and its coefficient of variance, estimating the mean and standard deviation of the microarrays, microarray correlation, detectable probe ratio of each microarray, density of each microarray, sample (microarray) relations, detecting outliers of samples(microarrays). LumiQ function will do these calculation based on a LumiBatch object and organize the results in a LumiQC object. > > > > ## Do quality control estimation lumi.Q <- lumiQ(example.lumi) ## summary of the quality control lumi.Q Class: LumiQC Data dimension: 8000 genes x 4 samples Summary of Samples: A01 A02 B01 B02 mean 8.3300 8.5680 8.2630 8.347 standard deviation 1.5870 1.7140 1.7490 1.695 detection rate 0.5387 0.5574 0.5722 0.571 distance to center 12.1300 11.5800 12.0100 11.990 4 Note: the detection threshold is 0.99, "center" is the mean of all samples. No outliers detected based on distance-to-center threshold 24. Major Operation History: submitted 1 2006-12-27 14:34:38 2006-12-27 2 2006-12-27 15:14:37 2006-12-27 3 2007-01-01 23:17:19 2007-01-01 finished command 14:36:57 lumiR("../data/Barnes_gene_profile.txt") 15:14:38 Subsetting 8000 features and 4 samples. 23:17:27 lumiQ(x.lumi = example.lumi) The S4 method plot can plot the quality control estimation results kept in a LumiQC object. The quality control plots includes: the density plot (Figure 3), box plot (Figure 4), pairwise correlation between microarrays (Figure 5), pairwise MAplot between microarrays (Figure 6), density plot of coefficient of varience, (Figure 7), and the sample relations (Figure 8). More details are in the help of plot,LumiQC-method function. Most of these plots can also be plotted by the extended general functions: hist (for density plot), boxplot, MAplot, pairs. Plot the density plot of the LumiQC object. See Figure 3. > > > > > > ## plot the density plot(lumi.Q, what='density') ## or hist(lumi.Q) ## or hist(example.lumi) Plot the box plot of the LumiQC object. See Figure 4. > > > > > > ## plot the box plot plot(lumi.Q, what='boxplot') ## or boxplot(lumi.Q) ## or boxplot(example.lumi) Plot the pairwise sample correlation of the LumiQC object. See Figure 5. > > > > > > ## plot the pair plot plot(lumi.Q, what='pair') ## or pairs(lumi.Q) ## or pairs(example.lumi) The MA plot of the LumiQC object. See Figure 6. > > > > > > ## plot the MAplot plot(lumi.Q, what='MAplot') ## or MAplot(lumi.Q) ## or MAplot(example.lumi) 5 Density plot of intensity 0.3 0.2 0.1 0.0 density 0.4 0.5 0.6 A01 A02 B01 B02 6 8 10 12 14 16 intensity Figure 3: Density plot of Illumina microarrays before normalization 6 12 10 8 B02 B01 A02 A01 6 amplitude 14 16 Boxplot of microarray intensity Figure 4: Density plot of Illumina microarrays before normalization 7 Pairwise plot with sample correlation ● ● Cor = 0.97 0 (> 2, up) 210 (> 2, down) Cor = 0.97 3 (> 2, up) 152 (> 2, down) Cor = 0.99 6 (> 2, up) 0 (> 2, down) 14 16 16 ●● ●● ●● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● 12 14 10 12 ●● ● ● ●●● ● ● ● ● ●● ● ● ● ● ●● ●●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● B02 Cor = 0.97 1 (> 2, up) 167 (> 2, down) 6 8 10 12 14 16 Cor = 0.97 223 (> 2, up) 0 (> 2, down) 10 ● ● ●● ●●● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ●●● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● B01 Cor = 0.99 5 (> 2, up) 0 (> 2, down) 8 16 ● ● ●● ● ●● ● ● ●● ● ●● ● ●● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● A02 8 10 12 14 16 A01 ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● 8 6 ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●● ●●●● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●●●●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●●● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●●●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 14 16 12 14 10 12 8 10 6 8 8 10 12 14 16 6 8 10 12 14 16 Figure 5: Pairwise plot with microarray correlation before normalization 8 10 12 14 16 −4 −6 8 10 12 14 16 8 10 12 14 16 8 10 12 14 16 8 10 12 14 16 8 10 12 14 16 0.0 A02 Median: 0.247 IQR: 0.295 B01 −8 −6 Median: 0.0738 IQR: 0.386 −4 −2 0 M −1.0 0 −0.5 2 Median: −0.188 IQR: 0.350 4 0.5 6 1.0 8 8 −1.0 −0.5 −6 0.0 −4 0.5 A01 −2 −2 1.0 1.5 0 0 2.0 Pairwise MA plots between samples 6 Median: 0.0198 IQR: 0.322 Median: 0.23 IQR: 0.243 Median: −0.0206 IQR: 0.316 A Figure 6: Pairwise MAplot before normalization 9 B02 > plot(lumi.Q, what = "cv") Density plot of coefficient of variance 10 0 5 density 15 20 A01 A02 B01 B02 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 coefficient of variance Figure 7: Density Plot of Coefficient of Varience The density plot of the coefficient of variance of the LumiQC object. See Figure 7. Plot sample relations using hierarchical clustering, see Figure 8 Plot the sampleRelation using MDS, see Figure 9. The color of the sample is based on the sample type, which is '100US', '95US:5P', '100US', '95US:5P' for the sample data. Please see the help of getSampleRelation and plot-methods for the setting of ”parameterList”. 3.3 Variance stabilizing transform Variance stabilization is critical for subsequent statistical inference to identify differential genes from microarray data. We devised a variance-stabilizing transformation (VST) by taking advantages of larger number of technical replicates available on the Illumina microarray. Please see [1] for details of the algorithm. Function lumiT performs variance stabilizing transform with both input and output being LumiBatch object. 10 > plot(lumi.Q, what = "sampleRelation") 10 B02 A02 Sample hclust (*, "average") Figure 8: Sample relations before normalization 11 B01 A01 5 Height 15 Clusters of the samples based on 1104 genes with sd/mean > 0.1 > plot(lumi.Q, what = "sampleRelation", parameterList = list(method = "mds", col = c("100US + "95US:5P", "100US", "95US:5P"))) Sample relations based on 1104 genes with sd/mean > 0.1 0 A02 −1 B02 −2 Principle component 2 1 2 A01 B01 −10 −5 0 5 Principle component 1 Figure 9: Sample relations before normalization 12 10 Do default VST variance stabilizing transform > lumi.T <- lumiT(example.lumi) 2007-01-01 2007-01-01 2007-01-01 2007-01-01 23:18:17 23:18:17 23:18:18 23:18:18 , , , , processing processing processing processing array array array array 1 2 3 4 Function lumiT also provides options to do "log2" or "cubicRoot" transform. See help of lumiT for details. 3.4 Data normalization We proposed a robust spline normalization (RSN) algorithm, which combines the features of quanitle and loess nor-malization, is designed to normalize the variance-stabilized data. Please see [1] for details of the algorithm. Function lumiN performs robust spline normalization (RSN) algorithm with both input and output being LumiBatch object. lumiN also provides options to do "loess", "quantile", "VSN" normalization. See help of lumiN for details. Do default RSN between microarray normaliazation > lumi.N <- lumiN(lumi.T) 2007-01-01 2007-01-01 2007-01-01 2007-01-01 23:18:21 23:18:22 23:18:23 23:18:23 , , , , processing processing processing processing array array array array 1 2 3 4 User can also easily select other normalization method. For example, the following command will run quantile between microarray normaliazation. > lumi.N <- lumiN(lumi.T, method = "quantile") 3.5 Quality control after normalization To make sure the data quality meets our requirement, we do a second round of quality control of normalized data with different QC plots. Compare the plots before and after normalization, we can clearly see the improvements. > lumi.N.Q <- lumiQ(lumi.N) > lumi.N.Q Class: LumiQC Data dimension: 8000 genes x 4 samples Summary of Samples: A01 A02 B01 B02 mean 8.9420 8.9400 8.9400 8.940 standard deviation 1.2550 1.2540 1.2540 1.254 detection rate 0.5387 0.5574 0.5722 0.571 distance to center 10.7900 10.6000 10.8800 11.000 13 > plot(lumi.N.Q, what = "density") Density plot of intensity 0.6 0.0 0.2 0.4 density 0.8 1.0 1.2 A01 A02 B01 B02 8 10 12 14 16 intensity Figure 10: Density plot of Illumina microarrays after normalization Note: the detection threshold is 0.99, "center" is the mean of all samples. No outliers detected based on distance-to-center threshold 21.7. Major Operation History: submitted 1 2006-12-27 14:34:38 2006-12-27 2 2006-12-27 15:14:37 2006-12-27 3 2007-01-01 23:18:17 2007-01-01 4 2007-01-01 23:18:19 2007-01-01 5 2007-01-01 23:18:24 2007-01-01 4 finished command 14:36:57 lumiR("../data/Barnes_gene_profile.txt") 15:14:38 Subsetting 8000 features and 4 samples. 23:18:19 lumiT(x.lumi = example.lumi) 23:18:24 lumiN(x.lumi = lumi.T) 23:18:30 lumiQ(x.lumi = lumi.N) Gene annotation Because the TargetID used by Illumina microarray is not consistent among different versions of arrays. For instance, the same 50mer sequence has two different TargetIDs used by Illumina: "GI_21070949-S" in the Mouse_Ref-8_V1 chip and "scl022190.1_154-S" in the Mouse-6_V1 chip. This causes difficulties when combining clinical microarray data collected over time using different ver14 > plot(lumi.N.Q, what = "boxplot") 12 10 B02 B01 A02 A01 8 amplitude 14 16 Boxplot of microarray intensity Figure 11: Density plot of Illumina microarrays after normalization 15 > plot(lumi.N.Q, what = "pair") Pairwise plot with sample correlation ● ● 14 10 12 ●● ● ● ●●● ●● ● ●● ● ●●● ● ● ● ●● ● ● ●● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● 8 ● ●● ●● ●● ●● ●● ●● ●● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●●● ●● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Cor = 0.97 0 (> 2, up) 85 (> 2, down) Cor = 1 0 (> 2, up) 1 (> 2, down) ●● ● ● ● ● ●● ● ●● ●● ●● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ●● ● ● ●●●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ●● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 16 14 12 ● ● 16 Cor = 0.97 81 (> 2, up) 0 (> 2, down) 16 ● ● ●● ● ● ● ● ●● ●●● ● ● ●● ● 8 ● ●● ●● ●●● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ●● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ●● ● ●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● B01 Cor = 1 1 (> 2, up) 0 (> 2, down) 14 10 12 ● ● A02 Cor = 0.97 0 (> 2, up) 86 (> 2, down) 10 16 ● 8 ● ● ●●● ●● ● ● ●● ●● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ●● ●● ●● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 16 ●●● ●● ● ●● ●● ●● ●● ● ● ● ●● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ●● ●● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 16 ● ●● ● ●● ● ● ●● ●● ● ● ●● ● 14 14 12 12 10 A01 10 8 8 8 10 12 14 B02 Cor = 0.97 0 (> 2, up) 79 (> 2, down) 8 10 12 14 16 8 10 12 14 16 Figure 12: Pairwise plot with microarray correlation after normalization 16 > plot(lumi.N.Q, what = "MAplot") 0.0 −0.2 −0.6 −0.05 −0.6 0.00 −0.4 A01 −0.4 −0.2 0.05 0.0 0.10 Pairwise MA plots between samples 3.2 3.4 3.6 3.8 4.0 3.0 3.2 3.4 3.6 3.8 4.0 3.0 3.2 3.4 3.6 3.8 4.0 3.4 3.6 3.8 4.0 3.0 3.2 3.4 3.6 3.8 4.0 3.0 3.2 3.4 3.6 3.8 4.0 −0.05 −0.10 0.2 B01 −0.4 Median: −0.00402 IQR: 0.0241 −0.6 Median: −0.000408 IQR: 0.0192 −0.2 0.0 M 3.2 0.00 0.4 A02 0.0 Median: 0.00361 IQR: 0.0242 3.0 0.05 0.6 3.0 Median: 0.00344 IQR: 0.0248 Median: 0.000141 IQR: 0.0194 Median: 0.00440 IQR: 0.0234 A Figure 13: Pairwise MAplot after normalization 17 B02 > plot(lumi.N.Q, what = "sampleRelation") 10 Sample hclust (*, "average") Figure 14: Sample relations after normalization 18 B01 A01 B02 A02 5 Height 15 Clusters of the samples based on 1132 genes with sd/mean > 0.1 > plot(lumi.N.Q, what = "sampleRelation", parameterList = list(method = "mds", col = c("100 + "95US:5P", "100US", "95US:5P"))) Sample relations based on 1132 genes with sd/mean > 0.1 1.5 B01 0.0 −0.5 −1.0 A02 −1.5 Principle component 2 0.5 1.0 B02 A01 −10 −5 0 5 Principle component 1 Figure 15: Sample relations after normalization 19 10 sions of the chips. Also the targetID is not meaningful for the user. We designed a nucleotide universal identifier (nuID), which encodes the 50mer oligonucleotide sequence and contains error checking and self-identification code. For details, please read [2]. 4.1 > > > > > Examples of nuID ## provide an arbitrary nucleotide sequence as an example seq <- 'ACGTAAATTTCAGTTTAAAACCCCCCG' ## create a nuID for it id <- seq2id(seq) print(id) [1] "YGwP0vwBVW" The original nucleotide sequence can be easily recovered by id2seq > id2seq(id) [1] "ACGTAAATTTCAGTTTAAAACCCCCCG" The nuID is self-identifiable. is.nuID can check the sequence is nuID or not. A real nuID > is.nuID(id) [1] TRUE An random sequence > is.nuID("adfqeqe") [1] FALSE 4.2 Illumina microarray annotation package Because all the Illumina microarrays use 50-mers, by using the nuID universal identifier, we are able to build one annotation database for different versions of the human (or other species) chips. Moreover, the nuID can be directly converted to the probe sequence, and used to get the most updated refSeq matches and annotations. Currently, annotation packages for human (Human.lumi) and mouse (Mouse.lumi) have been created. Packages for more species will be created in the future. The Illumina annotation packages are produced by using AnnBuilder with small modification. As a result, the format of the package is the same as Affymetrix annotation package, lots of packages designed for Affymetrix can also be used for Illumina annotation package. Here is some examples: > ## load lumi annotation package > lib <- 'Human.lumi' # Huamn lumi annotation package > if(require(GO) & require(annotate) & require(lib, character.only=TRUE)) { + GOId <- 'GO:0004816' # asparagine-tRNA ligase activity 20 + + + + } probe <- lookUp(GOId, lib, 'GO2ALLPROBES') # probes under 'GO:0004816' category print(probe) $`GO:0004816` IEA TAS "WVUU7XyNw3ucXzwdEk" "WVUU7XyNw3ucXzwdEk" > > > + + + # specify a nuID nuId <- 'WVUU7XyNw3ucXzwdEk' if (require(annotate) & require(lib, character.only=TRUE)) { # get the gene symbol of nuId getSYMBOL(nuId, lib) } WVUU7XyNw3ucXzwdEk "NARS" 4.3 Transfer Illumina microarray data as nuID annotated For Illumina microarray data, there are two ways to transfer the TargetID as nuID. The first way is to use the annotation file for the specific chip which is provided by Illumina company. 'Human_RefSeq-8.csv' is an example annotation file for Human RefSeq-8 microarray. We recommend using this method because it can make sure the matching of TargetID and nucleotide sequence. addNuId2lumi implements the function. > > > > ## specify the annotation file for the Illumina chip annotationFile <- 'Human_RefSeq-8.csv' ## Replace the Target ID with nuID lumi.N <- addNuId2lumi(lumi.N, annotationFile) # Not run # Not run An alternative way is to load the Annotation library and match the targetID with nuID > if (require(Human.lumi)) { + lumi.N <- addNuId2lumi(lumi.N, lib = "Human.lumi") + } 5 A use case: from raw data to functional analysis Figure 16 shows the data processing flow chart of the use case. Since the classes in lumi package are inherited from class ExpressionSet, packages and functions compatible with class ExpressionSet or accepting matrix as input all can be used for lumi results. Here we just give two examples: using limma to identify differentiated genes and using GOstats to annotate the significant genes. We use the Barnes data set [3] as an example. The Barnes data setmeasured a dilution series of two human tis-sues, brain and placenta, at the titration rate of 100%, 95%, 75%, 50%, 25% and 0%. The samples were hybridized on 21 Figure 16: Flow chart of the use case HumanRef-8 BeadChip (Illumina, Inc) in duplicate. We used this data set to evaluate the detection of differential expressions. 5.1 Preprocess the Illumina data > > > > > library(lumi) ## specify the file name # fileName <- 'Barnes_gene_profile.txt' ## load the data # example.lumi <- lumiR(fileName) > > > > ## load saved data load(example.lumi) ## Transfer Illumina microarray data as nuID annotated example.lumi <- addNuId2lumi(example.lumi, lib='Human.lumi') # Not run # Not run > ## Quality control based on the raw data > lumi.Q <- lumiQ(example.lumi) # (optional) > ## Do default VST variance stabilizing transform > lumi.T <- lumiT(example.lumi) > ## Do lumi between microarray normaliazation > lumi.N <- lumiN(lumi.T) > ## Quality control after normalization > lumi.N.Q <- lumiQ(lumi.N) 5.2 # (optional) Identify differentiate genes Identify the differentiated genes based on moderated t-test using limma. Retrieve the normalized data > dataMatrix <- exprs(lumi.N) 22 To speed up the processing and reduce FDR, remove the unexpressed genes > presentCount <- pData(featureData(lumi.N))$presentCount > selDataMatrix <- dataMatrix[presentCount > 0, ] > selProbe <- rownames(selDataMatrix) > > > + + + + + + + + + + + + + + + ## Specify the sample type sampleType <- c('100US', '95US:5P', '100US', '95US:5P') if(require(limma)) { ## compare '95US:5P' and '100US' design <- model.matrix(~ factor(sampleType)) colnames(design) <- c('100US', '95US:5P-100US') fit <- lmFit(selDataMatrix, design) fit <- eBayes(fit) ## print the top 10 genes topTable(fit, coef='95US:5P-100US', adjust='fdr', number=10) ## get significant gene list with FDR adjusted p.values less than 0.01 p.adj <- p.adjust(fit$p.value[,2]) sigGene.adj <- selProbe[ p.adj < 0.01] ## without FDR adjustment sigGene <- selProbe[ fit$p.value[,2] < 0.001] } 5.3 Gene Ontology annotation Based on the significant genes identified using limma or t-test, we can further do Gene Ontology annotation. We can use package GOstats to do the analysis. Do Hypergeometric test of Gene Ontology based on the significant gene list (for e. Table 1 shows the significant GO terms of Molecular Function with pvalue less than 0.01. Here only show the significant GO terms of MF (Molecular Function). For other categories(BP, CC), just follow the same procedure. > if (require(GOstats) & require(Human.lumi)) { + + ## Get the locuslink Id of the gene + sigLL <- unique(unlist(mget(sigGene, env=Human.lumiLOCUSID, ifnotfound=NA))) + sigLL <- as.character(sigLL[!is.na(sigLL)]) + params <- new("GOHyperGParams", + geneIds= sigLL, + annotation="Human.lumi", + ontology="BP", + pvalueCutoff= 0.01, + conditional=FALSE, + testDirection="over") + + hgOver <- hyperGTest(params) + + ## Get the p-values of the test + gGhyp.pv <- pvalues(hgOver) 23 + + + + + + + + } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 ## select the Go terms with p-value less than 0.01 sigGO.ID <- names(gGhyp.pv[gGhyp.pv < 0.01]) ## Here only show the significant GO terms of BP (Molecular Function) ## For other categories, just follow the same procedure. sigGO.Term <- getGOTerm(sigGO.ID)[["BP"]] GO ID GO:0006817 GO:0006692 GO:0006693 GO:0015698 GO:0006954 GO:0006820 GO:0007229 GO:0009613 GO:0051707 GO:0006690 GO:0001516 GO:0046457 GO:0006950 GO:0015884 GO:0007155 GO:0009611 GO:0007565 GO:0006517 GO:0007179 GO:0006955 GO:0051181 GO:0051706 GO:0046456 GO:0009605 GO:0008632 GO:0051704 GO:0008361 GO:0016049 GO:0007243 GO:0007167 GO:0043174 GO:0043101 GO:0006166 Term phosphate transpo... prostanoid metabo... prostaglandin met... inorganic anion t... inflammatory resp... anion transport integrin-mediated... response to pest,... response to other... icosanoid metabol... prostaglandin bio... prostanoid biosyn... response to stres... folic acid transp... cell adhesion response to wound... pregnancy protein deglycosy... transforming grow... immune response cofactor transpor... physiological int... icosanoid biosynt... response to exter... apoptotic program interaction betwe... regulation of cel... cell growth protein kinase ca... enzyme linked rec... nucleoside salvag... purine salvage purine ribonucleo... p-value 7.1814e-05 7.4988e-05 7.4988e-05 0.00014391 0.00046099 0.00046297 0.00066863 0.00082834 0.00099687 0.0010893 0.0011223 0.0011223 0.0014567 0.001786 0.0020978 0.0021351 0.0021352 0.0033714 0.0038755 0.0040391 0.0057251 0.0059084 0.0060734 0.0063917 0.0064967 0.0068181 0.0088256 0.0088256 0.0092369 0.0094688 0.0097239 0.0097239 0.0097239 Significant Genes No. 11 6 6 15 21 16 9 41 41 7 4 4 67 3 41 31 8 2 6 52 3 8 5 35 7 10 13 13 21 17 2 2 2 Table 1: GO terms, p-values and counts. 24 Total Genes No. 48 14 14 88 163 108 43 425 429 29 9 9 808 5 446 311 41 2 27 620 7 48 21 390 39 70 107 107 208 157 3 3 3 6 Reference 1. Lin, S.M., Du, P., Kibbe, W.A., ”Model-based Variance-stabilizing Transformation for Illumina Mi-croarray Data”, submitted 2. Du, P., Kibbe, W.A. and Lin, S.M., ”nuID: A Novel Identifier for Oligos, Ideal for Oligonucleotide-based Microarrays”, submitted. 3. Barnes, M., Freudenberg, J., Thompson, S., Aronow, B. and Pav-lidis, P. (2005) ”Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms”, Nucleic Acids Res, 33, 5914-5923. 25