Data Analysis addendum on normalization methods

Establishing the biological relevance of large gene expression data sets is of paramount importance. We used the Affymetrix Laboratory Information Management System (LIMS) to manage data and track projects by organizing and relating various types of data. The LIMS allowed for the absolute comparison of experimental data generated by both microarray analysis and quantitative RT-PCR validation studies. Prior to comparative analysis and biological inquiry, all microarray data sets must be normalized so that valid comparisons can be made. Five elementary normalization methods (provided by Silicon Genetics) are described here and were applied to both Affymetrix microarray data sets and TaqMan real-time quantitative PCR data sets.

Whole experiment normalization relative to a set of negative controls is one example of a potential normalization protocol. In this case, the median signal of a negative control (or group of controls) is subtracted from each gene’s expression level. In the best-case scenario, the negative control value will equal the background value (relative to the noise of the system) when compared to the rest of the data set. More specifically, the negative control for all data sets within the scope of this project included two polyadenylated amino acid sequences that were represented on every chip.

The second example of whole experiment normalization involves comparison to a positive control. The positive control can be a sequence spiked into the starting RNA sample or into the target solution prior to hybridization. Due to sensitivity concerns (i.e., absolute levels of detection for rare transcripts), we incorporated two fundamentally different positive controls into all samples. First, a staggered set of unlabeled mRNA spikes was added to every RNA sample prior to processing.
The spiked transcripts were unlabeled and underwent all subsequent labeling steps with the sample. This approach controlled for specific (endo-/exonuclease) or non-specific (heavy metals) degradation of the starting RNA and also assessed the efficiency of label incorporation from the enriched samples. All data sets were normalized for the recovery and fidelity of these transcripts. The second positive control consisted of a set of staggered biotin-labeled cRNAs that hybridize to bacterial sequences on every chip. These spikes provide information about non-specific factors that affect hybridization conditions and lead to chip-to-chip variability. Normalization to both sets of positive controls may be combined by an affine or linear transformation.

Normalization across multiple genes is the third example of whole experiment normalization. This strategy scales the data so that the median is 1 for all genes, or for the genes in a certain class. It assumes that the distribution of expression levels over different classes is similar. Once again, the normalization can be affine or linear. This analysis allowed for the reconciliation of samples that were processed at different times or that vary because of slight differences in starting material (i.e., cDNA), either of which can lead to discordant results within the same treatment group.

One normalization strategy that can be applied to array data validation studies performed by real-time quantitative PCR is single gene normalization. Single gene normalization across multiple experiments can be useful for multiplex TaqMan analysis, which was employed to verify genes selected from the qualitative assessment. We used a single gene (either B2m or GAPDH) that does not change as a function of experimental manipulation as a direct control for comparative analysis. The threshold cycle (Ct) for the gene of interest is divided by the control Ct, producing a ratio. The control Ct is kept as a measurement of overall expression.
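The single gene normalization described above can be illustrated with a minimal Python sketch. The function names, the gene label "GeneX", and the Ct values are hypothetical and chosen only for illustration; they are not taken from the study. The sketch divides the Ct of the gene of interest by the control Ct and keeps the control Ct alongside the ratio.

```python
def ct_ratio(ct_target, ct_control):
    """Divide the Ct of the gene of interest by the Ct of the control gene."""
    if ct_control <= 0:
        raise ValueError("control Ct must be positive")
    return ct_target / ct_control


def normalize_samples(samples, target, control):
    """Return, per sample, the Ct ratio and the control Ct it was based on.

    samples: {sample_name: {gene_name: Ct}}
    """
    results = {}
    for name, cts in samples.items():
        results[name] = {
            "ratio": ct_ratio(cts[target], cts[control]),
            # The control Ct is retained as a measure of overall expression.
            "control_ct": cts[control],
        }
    return results


# Hypothetical Ct values, for illustration only.
samples = {
    "untreated": {"GeneX": 24.0, "B2m": 20.0},
    "treated": {"GeneX": 22.0, "B2m": 20.0},
}
normalized = normalize_samples(samples, target="GeneX", control="B2m")
```

Because a lower Ct corresponds to a more abundant template, a lower ratio in this sketch indicates higher expression of the gene of interest relative to the control.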
By applying the same strategy to our array data sets, we compared the relative differences in gene expression from both conditions to establish the relative efficiency/amount of change for any gene detected by either analysis.

A gene-by-gene normalization across multiple experiments is another approach to analyzing the expression of individual genes across large data sets. This procedure scaled the readings of a single gene over many experiments to a median of one. This scaling factor is used to determine the overall gene strength, assuming that the median of expression levels over each data set is statistically average. With a per-gene normalization strategy, one can plot several graphs on one set of axes for the same gene across multiple experiments.

The normalization procedures described above can be applied in a series of combinations to ensure mathematical significance. For example, normalization to a negative control is often followed either by normalization to a positive control and then normalization of each gene to itself, or by normalization of each experiment to itself and then normalization of each gene to itself. Another example of combinatorial normalization begins with positive control normalization, which can be used in concert with normalizing each experiment to itself or normalizing each gene to itself.

The GeneSpring software package supports all of these normalization procedures for both Affymetrix GeneChip data sets and Perkin-Elmer 7900 real-time quantitative RT-PCR data sets. Due to the novelty of this experimental approach, all normalization schemata were used to try to achieve a consensus set of genes from the microarray data sets.
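One of the combinations described above (negative control, then each experiment to itself, then each gene to itself) can be sketched in Python as follows. This is an illustrative sketch and not the GeneSpring implementation: the function names, probe identifiers, and signal values are hypothetical, only the linear (scaling) form is shown, and flooring background-subtracted signals at zero is an added assumption.

```python
from statistics import median


def subtract_negative_control(chip, negative_ids):
    """Subtract the median negative-control signal from every probe on one chip."""
    background = median(chip[g] for g in negative_ids)
    # Assumption: floor at zero so background-level signals do not go negative.
    return {g: max(v - background, 0.0) for g, v in chip.items()}


def scale_to_median_one(values):
    """Linearly scale a mapping of values so that its median becomes 1."""
    m = median(values.values())
    if m == 0:
        raise ValueError("median is zero; cannot scale")
    return {k: v / m for k, v in values.items()}


def normalize(chips, negative_ids):
    """chips: {experiment: {probe: signal}}; returns {experiment: {gene: value}}."""
    # Steps 1 and 2: background-correct each chip, then scale each chip
    # (each experiment) to a median of 1 over its non-control genes.
    per_chip = {}
    for exp, chip in chips.items():
        corrected = subtract_negative_control(chip, negative_ids)
        genes_only = {g: v for g, v in corrected.items() if g not in negative_ids}
        per_chip[exp] = scale_to_median_one(genes_only)

    # Step 3: scale each gene to a median of 1 across all experiments.
    genes = list(next(iter(per_chip.values())))
    result = {exp: {} for exp in per_chip}
    for g in genes:
        across = scale_to_median_one({exp: per_chip[exp][g] for exp in per_chip})
        for exp, v in across.items():
            result[exp][g] = v
    return result


# Hypothetical signals, for illustration only.
chips = {
    "exp1": {"neg1": 10, "neg2": 12, "geneA": 111, "geneB": 211, "geneC": 411},
    "exp2": {"neg1": 20, "neg2": 20, "geneA": 220, "geneB": 420, "geneC": 1220},
    "exp3": {"neg1": 5, "neg2": 5, "geneA": 105, "geneB": 55, "geneC": 205},
}
result = normalize(chips, negative_ids={"neg1", "neg2"})
```

After the final step, every gene has a median of 1 across the three experiments, so the profiles of different genes can be plotted on one set of axes.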