Download Lecture slides

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene prediction wikipedia , lookup

DNA repair wikipedia , lookup

Restriction enzyme wikipedia , lookup

Zinc finger nuclease wikipedia , lookup

DNA barcoding wikipedia , lookup

Gene wikipedia , lookup

Replisome wikipedia , lookup

Designer baby wikipedia , lookup

Virtual karyotype wikipedia , lookup

Metagenomics wikipedia , lookup

Gene expression profiling wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Transformation (genetics) wikipedia , lookup

Gel electrophoresis of nucleic acids wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

SNP genotyping wikipedia , lookup

Comparative genomic hybridization wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

DNA vaccination wikipedia , lookup

Molecular cloning wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Non-coding DNA wikipedia , lookup

United Kingdom National DNA Database wikipedia , lookup

Bisulfite sequencing wikipedia , lookup

Real-time polymerase chain reaction wikipedia , lookup

History of genetic engineering wikipedia , lookup

DNA supercoil wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Bioinformatics wikipedia , lookup

Transcript
Normalization
Getting the numbers comparable
DNA Microarray Bioinformatics - #27612
The DNA Array Analysis Pipeline
Question
Experimental Design
Array design
Probe design
Sample Preparation
Hybridization
Buy Chip/Array
Image analysis
Normalization
Expression Index
Calculation
Comparable
Gene Expression Data
Statistical Analysis
Fit to Model (time series)
Advanced Data Analysis
Clustering
Meta analysis
PCA
Classification
Survival analysis
Promoter Analysis
Regulatory Network
DNA Microarray Bioinformatics - #27612
Expression intensities are not just target
concentrations
•
•
•
•
•
•
•
Sample contamination
RNA quality
Sample preparation
Dye effect (cy3/cy5)
Probe affinity
Hybridization
Unspecific signal
(background)
• Saturation
• Spotting
• Other issues related to
array manufacturing
• Image segmentation
• Array spatial effects
DNA Microarray Bioinformatics - #27612
Two kinds of variation in the signal
Global variation
RNA quality
Sample preparation
Dye
Hybridization
Photodetection
Systematic
Gene-specific variation
Spotting (size and shape)
Cross-hybridization
Dye
Biological variation
– Effect
– Noise
Stochastic
DNA Microarray Bioinformatics - #27612
Sources of variation
Global variation:
Gene-specific variation:
Systematic
Stochastic
• Similar effect on many
• Too random to be explicitly
measurements
• Corrections can be
estimated from data
accounted for
• “noise”
Normalization
Statistical testing
DNA Microarray Bioinformatics - #27612
Calibration = Normalization = Scaling
DNA Microarray Bioinformatics - #27612
Nonlinear normalization
DNA Microarray Bioinformatics - #27612
Lowess Normalization
*
M
*
*
*
* *
*
A
One of the most commonly utilized normalization
techniques is the LOcally Weighted Scatterplot
Smoothing (LOWESS) algorithm.
DNA Microarray Bioinformatics - #27612
The Qspline method
From the empirical distribution, a number of quantiles are calculated for
each of the channels to be normalized (one channel shown in red) and for
the reference distribution (shown in black)
A QQ-plot is made and a normalization curve is constructed by fitting a
cubic spline function
As reference one can use an artificial “median array” for a set of arrays
or use a log-normal distribution, which is a good approximation.
DNA Microarray Bioinformatics - #27612
Once again…qspline
Accumulating quantiles
When many microarrays are to be
normalized to each other an average
array can be used as target
DNA Microarray Bioinformatics - #27612
Invariant set normalization (Li and Wong)
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and a
TIFF (LZW ) decompressor
are needed to see this picture.
A invariant set of probes is used
-Probes that does does not change intensity rank between arrays
-A piecewise linear median line is calculated
-This curve is used for normalization
DNA Microarray Bioinformatics - #27612
Spatial normalization
Raw data
After intensity
normalization
Spatial bias
estimate
After spatial
normalization
DNA Microarray Bioinformatics - #27612
The DNA Array Analysis Pipeline
Question
Experimental Design
Array design
Probe design
Sample Preparation
Hybridization
Buy Chip/Array
Image analysis
Normalization
Expression Index
Calculation
Comparable
Gene Expression Data
Statistical Analysis
Fit to Model (time series)
Advanced Data Analysis
Clustering
Meta analysis
PCA
Classification
Survival analysis
Promoter Analysis
Regulatory Network
DNA Microarray Bioinformatics - #27612
Expression index value
Some microarrays have multiple probes addressing
the expression of the same target
– Affymetrix
GeneChips
have 11-20 probe
pairs pr. Gene
However
for downstream
analysis
we often want to deal- Perfect
with Match
only(PM)
one
- MisMatch (MM)
value pr. gene.
Therefore we want to collapse the
PM:intensities
CGATCAATTGCACTATGTCATTTCT
from many probes into
MM: CGATCAATTGCAGTATGTCATTTCT
one value:
a gene expression index value
QuickTime™ and a
TIFF (Uncompressed) decompressor
are needed to see this picture.
DNA Microarray Bioinformatics - #27612
Expression index calculation
Simplest method? Median
But more sophisticated methods exists:
dChip, RMA and MAS 5
DNA Microarray Bioinformatics - #27612
dChip (Li & Wong)
Model:
PMij = qifj + eij
Outlier removal:
– Identify extreme residuals
– Remove
– Re-fit
– Iterate
Distribution of errors eij assumed
independent of signal strength
(Li and Wong, 2001)
DNA Microarray Bioinformatics - #27612
RMA
Robust Multi-array Average (RMA) expression
measure (Irizarry et al., Biostatistics, 2003)
For each probe set, re-write PMij = qifj as:
log(PMij)= log(qi ) + log(fj)
Fit this additive model by iteratively re-weighted
least-squares or median polish
DNA Microarray Bioinformatics - #27612
MAS. 5
MicroArray Suite version 5 uses
Signal = TukeyBiweight{log(PMj - MM*j)}
MM* is an adjusted MM that is never bigger than PM
Tukey biweight is a robust average procedure with weights
and outlier rejection
DNA Microarray Bioinformatics - #27612
Methods compared on expression variance
Standard deviation of gene measures
from 20 replicate arrays
Std Dev of gene measures from 20 replicate arrays
Expression level
RMA: Blue and Red
MAS5: Green
dChip: Black
From Terry speed
DNA Microarray Bioinformatics - #27612
Robustness
MAS 5.0
Log fold change estimate from 20ug cRNA
MAS5.0
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
Log fold change estimate from 1.25ug cRNA
(Irizarry et al., Biostatistics, 2003)
DNA Microarray Bioinformatics - #27612
Robustness
dChip
Log fold change estimate from 20ug cRNA
dChip
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
Log fold change estimate from 1.25ug cRNA
(Irizarry et al., Biostatistics, 2003)
DNA Microarray Bioinformatics - #27612
Robustness
RMA
Log fold change estimate from 20ug cRNA
RMA
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
Log fold change estimate from 1.25ug cRNA
(Irizarry et al., Biostatistics, 2003)
DNA Microarray Bioinformatics - #27612
All of this is implemented in…
R
In the BioConductor packages ‘affy’
(Gautier et al., 2003).
DNA Microarray Bioinformatics - #27612
References
Li and Wong, (2001). Model-based analysis of oligonucleotide arrays: Model
validation, design issues and standard error application.
Genome Biology 2:1–11.
Irizarry, Bolstad, Collin, Cope, Hobbs and Speed, (2003) Summaries of Affymetrix
GeneChip probe level data.
Nucleic Acids Research 31(4):e15.)
Affymetrix. Affymetrix Microarray Suite User Guide. Affymetrix, Santa Clara, CA,
version 5 edition, 2001.
Gautier, Cope, Bolstad, and Irizarry, (2003). affy - an r package for the analysis of
affymetrix genechip data at the probe level. Bioinformatics
DNA Microarray Bioinformatics - #27612