
Data Analysis addendum on normalization methods
Applying biological relevance to large gene expression data sets is of paramount
importance. We utilized the Affymetrix Laboratory Information Management
System (LIMS) to manage data and track projects by organizing and relating
various types of data. The LIMS allowed for the absolute comparison of
experimental data generated by both microarray analysis and quantitative RT-PCR
validation studies. Prior to comparative analysis and biological inquiry, all
microarray data sets must be normalized so that valid comparisons can be made.
Five elementary normalization methods (provided by Silicon Genetics) are
described below and have been applied to both Affymetrix microarray data sets
and TaqMan real-time quantitative PCR data sets.
Whole experiment normalization, which allows for the normalization of data
relative to a set of negative controls, is one example of a potential normalization
protocol. In this case, the median signal of a negative control (or group of
controls) is subtracted from each gene’s expression level. In the best-case
scenario, the negative control value will equal the background value (relative to
the noise of the system) when compared to the rest of the data set. More
specifically, the negative control for all data sets within the scope of this project
included two polyadenylated amino acid sequences that were represented on
every chip.
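The subtraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the probe names, and the signal values are hypothetical and are not taken from the LIMS or from GeneSpring.

```python
from statistics import median

def subtract_negative_control(signals, control_ids):
    """Subtract the median negative-control signal from every probe on one chip."""
    background = median(signals[c] for c in control_ids)
    return {probe: value - background for probe, value in signals.items()}

# Hypothetical raw signals for a single chip (names and values are made up).
chip = {"neg_ctrl_1": 40.0, "neg_ctrl_2": 60.0, "gene_a": 550.0, "gene_b": 50.0}
corrected = subtract_negative_control(chip, ["neg_ctrl_1", "neg_ctrl_2"])
```

In this made-up example a gene whose raw signal equals the control median ends up at zero, which is the best-case situation in which the negative control matches the background.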
The second example of whole experiment normalization involves comparison to
a positive control. The positive control can be a sequence spiked into the
starting RNA sample or into the target solution prior to hybridization. Due to
sensitivity concerns (i.e., absolute levels of detection for rare transcripts), we
have incorporated two fundamentally different positive controls into all samples.
First, a staggered set of unlabeled mRNA spikes was added to every RNA
sample prior to processing. The spiked transcripts underwent all subsequent
labeling steps with the sample. This approach
controlled for specific (endo-/exonuclease) or non-specific (heavy metals)
degradation of starting RNA in addition to assessing the efficiency of label
incorporation from the enriched samples. All data sets were normalized for the
recovery and fidelity of these transcripts. The second positive control consisted
of a set of staggered biotin-labeled cRNAs that hybridize to bacterial sequences
on every chip. These spikes provide information about non-specific factors that
affect hybridization conditions, leading to chip-to-chip variability. Combining
normalization to both sets of positive controls may be done by an affine or linear
transformation.
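To make the distinction between a linear and an affine transformation concrete, the sketch below fits both to a staggered spike series by least squares. It is an assumption-laden illustration, not the procedure used by the software: a linear transformation is taken to be a pure scale factor, an affine one a scale factor plus an offset, and the spike amounts and signals are hypothetical.

```python
def fit_linear(observed, expected):
    """Least-squares scale a such that expected is approximately a * observed."""
    return sum(o * e for o, e in zip(observed, expected)) / sum(o * o for o in observed)

def fit_affine(observed, expected):
    """Least-squares slope a and offset b such that expected is approximately a * observed + b."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(expected) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    sxy = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, expected))
    slope = sxy / sxx
    return slope, mean_e - slope * mean_o

# Hypothetical staggered spikes: known input amounts and the signals read from one chip.
spike_expected = [10.0, 20.0, 40.0, 80.0]
spike_observed = [30.0, 50.0, 90.0, 170.0]  # here, observed = 2 * expected + 10

scale = fit_linear(spike_observed, spike_expected)
slope, offset = fit_affine(spike_observed, spike_expected)

def apply_affine(signal):
    """Map a raw signal onto the spike scale using the affine fit."""
    return slope * signal + offset
```

Because the made-up signals carry a constant offset, only the affine fit recovers the spike amounts exactly; the scale-only fit has to absorb the offset into its single factor.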
Normalization across multiple genes is the third example of whole experiment
normalization. This strategy scales the data so that the median signal of all
genes, or of the genes in a certain class, equals 1. This analysis assumes that the
distribution of expression levels over different classes is similar. Once again, normalization can be
affine or linear. This analysis allowed for the reconciliation of samples processed
at different times or that may vary due to slight differences in starting material
(i.e., cDNA) that can lead to a discordant result from the same treatment group.
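A minimal sketch of the linear form of this median scaling follows, assuming each chip is held as a mapping from gene name to signal. The function name and the values are hypothetical.

```python
from statistics import median

def normalize_chip_to_median(signals):
    """Divide every signal on one chip by that chip's median, so the median becomes 1."""
    chip_median = median(signals.values())
    return {gene: value / chip_median for gene, value in signals.items()}

# Hypothetical chips from the same treatment group, processed on different days;
# the second has uniformly higher signal (for example, more starting material).
chip_day1 = {"gene_a": 100.0, "gene_b": 200.0, "gene_c": 400.0}
chip_day2 = {"gene_a": 150.0, "gene_b": 300.0, "gene_c": 600.0}

norm_day1 = normalize_chip_to_median(chip_day1)
norm_day2 = normalize_chip_to_median(chip_day2)
```

After scaling, the two made-up chips carry identical values, which is how this step reconciles samples that differ only by an overall intensity factor.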
One normalization strategy that can be applied to array data validation studies
being completed by real-time quantitative PCR is single gene normalization.
Single gene normalization across multiple experiments can be useful for
multiplex TaqMan analysis, which has been employed to verify genes selected
from the qualitative assessment. We used a single gene (either B2m or GAPDH)
that does not change as a function of experimental manipulation as a direct
control for comparative analysis. The threshold cycle (Ct) for the gene of interest
is divided by the control Ct, producing a ratio. The control Ct is kept as a
measurement of overall expression. By applying the same strategy to our array
data sets we compared the relative differences in gene expression from both
conditions to establish the relative efficiency/amount of change for any gene
detected by either analysis.
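The ratio described above amounts to a single division per sample. The sketch below follows the text literally (gene-of-interest Ct divided by control Ct); the Ct values and the function name are hypothetical.

```python
def single_gene_ratio(ct_target, ct_control):
    """Return the target Ct divided by the control-gene Ct, as described in the text."""
    return ct_target / ct_control

# Hypothetical threshold cycles for one gene of interest in two conditions,
# each paired with the control gene (for example B2m) measured in the same sample.
untreated = {"target": 24.0, "control": 20.0}
treated = {"target": 22.0, "control": 20.0}

ratio_untreated = single_gene_ratio(untreated["target"], untreated["control"])
ratio_treated = single_gene_ratio(treated["target"], treated["control"])
```

The control Ct values are kept alongside the ratios, since the text retains them as a measure of overall expression.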
A gene-by-gene normalization across multiple experiments is another approach
to analyzing the expression of individual genes across large data sets. This
procedure scales the readings of a single gene over many experiments to a
median of one. The scaling factor is used to determine the overall gene strength,
assuming that the median of expression levels over each data set is statistically
average. By applying a per-gene normalization strategy, one can plot several
graphs on one set of axes for the same gene across multiple experiments.
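As an illustration of per-gene scaling, the sketch below divides each gene's readings by that gene's own median across experiments. The gene names and values are hypothetical.

```python
from statistics import median

def normalize_gene_to_median(readings):
    """Scale one gene's readings across experiments so that their median is 1."""
    gene_median = median(readings)
    return [reading / gene_median for reading in readings]

# Hypothetical readings for two genes over the same three experiments.
# gene_x is expressed far more strongly than gene_y, but both double at each step.
gene_x = [200.0, 400.0, 800.0]
gene_y = [10.0, 20.0, 40.0]

profile_x = normalize_gene_to_median(gene_x)
profile_y = normalize_gene_to_median(gene_y)
```

Both made-up genes reduce to the same profile, which is why genes of very different absolute strength can share one set of axes after this step.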
The normalization procedures described above can be applied in a series of
combinations to ensure mathematical significance. For example, normalization to
a negative control is often followed either by normalization to a positive control
and then normalization of each gene to itself, or by normalization of each
experiment to itself and then normalization of each gene to itself.
Another example of combinatorial normalization can be illustrated via positive
control normalization. This normalization can be used in concert with normalizing
each experiment to itself or normalizing each gene to itself. The GeneSpring
software package allows for all of these normalization procedures for both
Affymetrix GeneChip data sets and Perkin-Elmer 7900 real-time quantitative
RT-PCR data sets. Due to the novelty of this experimental approach, all
normalization schemata were utilized to try to achieve a consensus set of
genes from the microarray data sets.
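One of the combinations described above (negative control, then each experiment to itself, then each gene to itself) can be sketched as a short pipeline. This is a hypothetical illustration with made-up names and values, not the GeneSpring implementation.

```python
from statistics import median

def normalize_pipeline(chips, control_ids):
    """Apply three steps in order: negative-control subtraction,
    per-chip median scaling, then per-gene median scaling across chips.

    chips: list of dicts mapping probe name to raw signal (same probes on every chip).
    Returns a dict mapping gene name to its list of normalized values, one per chip.
    """
    per_chip = []
    for chip in chips:
        # Step 1: subtract the median of the negative controls on this chip.
        background = median(chip[c] for c in control_ids)
        genes = {g: v - background for g, v in chip.items() if g not in control_ids}
        # Step 2: scale this chip so that its median signal is 1.
        chip_median = median(genes.values())
        per_chip.append({g: v / chip_median for g, v in genes.items()})

    result = {}
    for gene in per_chip[0]:
        # Step 3: scale each gene so that its median across chips is 1.
        values = [chip[gene] for chip in per_chip]
        gene_median = median(values)
        result[gene] = [v / gene_median for v in values]
    return result

# Hypothetical raw data for two chips with one negative control each.
chips = [
    {"neg": 50.0, "gene_a": 150.0, "gene_b": 250.0, "gene_c": 450.0},
    {"neg": 100.0, "gene_a": 300.0, "gene_b": 300.0, "gene_c": 900.0},
]
normalized = normalize_pipeline(chips, ["neg"])
```

The sketch assumes that no median is zero; real data would need a guard for that case before dividing.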