An Introduction to
Affymetrix Microarrays
Naomi Altman
Dept. of Statistics and
Bioinformatics Consulting Centre
PSU
Oct. 14, 2004
DNA 100
A Statistician’s Simplification
Introns are excised – regulatory elements
Exon
Exons are the coding regions of the gene
DNA 100
A Statistician’s Simplification
cDNA
What is a microarray probe?
A probe is a spot on an array representing a
gene or part of a gene
On “cDNA” arrays, the probes are actual
pieces of cDNA originally extracted from a
cell
We may not know the genetic sequence of a
cDNA
What is a microarray probe?
If we know the genetic sequence of the
cDNA, we can artificially synthesize a
strand of DNA with the same sequence.
This is called an oligo(nucleotide).
Oligos may be “spotted” on the array like
cDNA or may be synthesized on the array
cDNA versus Oligos
cDNAs have different hybridization
properties due to their biochemistry
Oligos may be chosen to have similar
hybridization properties
- and to represent maximally unique parts of
genes
- or to represent common domains
cDNA versus Oligos
cDNAs are maintained in cDNA libraries
which are expensive to maintain and may
be mislabeled or contaminated
Oligos are synthesized from genomic
sequence information which can be
subject to error
Format of an Affymetrix Array
•Each gene is represented by a “probe set”
•Each “probe set” is 16-20 pairs of oligos
•Each oligo is 25 nucleotides
•A PM (perfect match) probe matches a
strand of cDNA
•The corresponding MM (mismatch) probe
differs from the PM by a change in the central
nucleotide
•The probe pairs are spatially dispersed
•Control probes are printed
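The PM/MM pairing above can be illustrated with a small Python sketch. The helper name `mismatch_probe` and the example sequence are hypothetical, and replacing the central base with its complement is an assumption about how the "change in the central nucleotide" is made.

```python
# Hypothetical helper; complementing the central base is an assumption.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm: str) -> str:
    """Return an MM probe: the PM sequence with its central base complemented."""
    if len(pm) % 2 == 0:
        raise ValueError("probe length must be odd to have a central base")
    mid = len(pm) // 2  # index 12 for a 25-mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

pm = "ACGTACGTACGTACGTACGTACGTA"  # made-up 25-mer, not a real probe
mm = mismatch_probe(pm)
# Number of positions at which PM and MM differ
n_diff = sum(a != b for a, b in zip(pm, mm))
```

The pair differs at exactly one position, which is why the MM probe is expected to pick up non-specific hybridization but little of the specific signal.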
Format of an Affymetrix Array
Printing the Slide
Heuristics for “Probe Sets”
MM probe is supposed to control for:
•Variation in chemical composition
•Abundance of cross-hybridizing
fragments from other genes
By combining PM and MM information
from many probes, gene to gene
differences should be minimized.
Heuristics for “Probe Sets”
Initially it was thought that this approach
would make it possible to consider the
measured intensities as a proxy for actual
mRNA concentration.
Gene Expression
The objective is to combine information from the
PM and MM probe set to quantify gene
expression.
Comparability for a single gene across arrays is
essential.
Comparability among genes on a single array is
desirable.
Steps in Data Extraction
An intensity is read for each probe.
The intensities are corrected for background noise.
The intensities are normalized.
The intensities are combined into gene expression
values.
Why Normalize?
Normalization is supposed to remove
systematic measurement errors, both
within and between arrays.
This should improve our power to detect
differential expression.
Affymetrix Expression Values
(MAS 5.0)
Uses a weighted average of (PM-MM).
PM-MM is truncated to a small positive
value if it is negative.
Weights are small if (PM-MM) is an outlier
for the gene, and bigger for central
values.
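A rough Python sketch of the idea on this slide: truncate negative PM-MM values, then take a weighted average that downweights outliers. It uses a one-step Tukey biweight on the log2 scale as the weighting scheme. The function name, the floor value, and the tuning constants `c` and `eps` are assumptions for illustration; this does not reproduce the actual MAS 5.0 algorithm.

```python
import math
from statistics import median

def biweight_signal(pm, mm, floor=1.0, c=5.0, eps=1e-4):
    """Robust probe-set summary of PM-MM (illustrative, not the real MAS 5.0)."""
    # Truncate non-positive differences to a small positive value (assumed floor)
    diffs = [max(p - m, floor) for p, m in zip(pm, mm)]
    logs = [math.log2(d) for d in diffs]
    center = median(logs)
    mad = median([abs(x - center) for x in logs])
    weights = []
    for x in logs:
        u = (x - center) / (c * mad + eps)
        # Tukey biweight: weight near 1 for central values, 0 for outliers
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    log_signal = sum(w * x for w, x in zip(weights, logs)) / sum(weights)
    return 2 ** log_signal

# Made-up intensities: the 4th pair is a high outlier, the 5th has MM > PM
pm = [200, 220, 210, 5000, 190]
mm = [100, 110, 105, 100, 300]
signal = biweight_signal(pm, mm)
```

In this made-up example the two extreme pairs receive zero weight, so the summary is driven by the three central pairs.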
Affymetrix Expression Values
(MAS 5.0)
Based on the 12 PM-MM pairs, a
Wilcoxon test is done to determine if the
gene is expressed in the sample.
“Affy p-value”
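A minimal sketch of a one-sided Wilcoxon signed-rank test on the probe pairs, in plain Python. The discrimination score (PM-MM)/(PM+MM) and the threshold `tau = 0.015` are assumptions here and are not taken from the slides; the function name is hypothetical. The exact null distribution is enumerated, which is practical for a dozen pairs.

```python
def signed_rank_pvalue(pm, mm, tau=0.015):
    """Exact one-sided p-value for 'scores tend to exceed tau' (illustrative)."""
    # Assumed score: relative PM/MM difference, shifted by tau (needs p + m > 0)
    scores = [(p - m) / (p + m) - tau for p, m in zip(pm, mm)]
    scores = [s for s in scores if s != 0]  # zero differences carry no sign
    n = len(scores)
    order = sorted(range(n), key=lambda i: abs(scores[i]))
    # Ranks are stored doubled so that averaged tie ranks stay integers
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(scores[order[j + 1]]) == abs(scores[order[i]]):
            j += 1
        avg2 = (i + 1) + (j + 1)  # twice the mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks2[order[k]] = avg2
        i = j + 1
    w_obs = sum(r for r, s in zip(ranks2, scores) if s > 0)
    # Under the null, each rank is positive with probability 1/2:
    # count the subsets of ranks that reach each possible sum
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for w in range(total, r - 1, -1):
            counts[w] += counts[w - r]
    return sum(counts[w_obs:]) / 2 ** n

# Made-up probe set: PM clearly above MM in all 12 pairs
pm = [200 + 10 * i for i in range(12)]
mm = [100] * 12
p_value = signed_rank_pvalue(pm, mm)
```

A small p-value suggests the PM probes are systematically brighter than their MM partners, which is read as evidence that the gene is expressed.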
Objections to Affymetrix
Normalization
The Affymetrix approach is under attack
because between 15% and 30% of the MM
intensities are greater than the corresponding PM.
The word from the grapevine is that the
newest arrays:
have PM only
or
“poor” MMs have been replaced
Do we need to normalize across
arrays?
2 Biological Samples
Other Expression Extraction
Methods
RMA (robust multi-array analysis)
After background correction,
probes are normalized using the quantile method.
Only PM probes are combined into gene
expression values, using the “median polish” method,
which recognizes outliers for the probe across
arrays and within the probe set on one array.
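The quantile method mentioned above can be sketched in a few lines of Python: every array is forced to share the same intensity distribution, namely the mean of the sorted intensities across arrays. The function name and the toy data are hypothetical, and ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length lists of probe intensities, one per array."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this array
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # a probe keeps its rank, not its value
        normalized.append(out)
    return normalized

# Two made-up arrays of three probes each
arrays = [[2, 4, 6], [30, 10, 20]]
normalized = quantile_normalize(arrays)
```

Because the reference distribution depends on every array in the batch, adding an array changes the normalized values of all the others.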
Other Expression Extraction
Methods
dChip (Li and Wong)
Uses a robust regression of PM on MM for
each probe set.
Other Expression Extraction
Methods
GCRMA
Assumes that the probe intensity is a function of
optical noise, non-specific binding, specific
binding, whether PM or MM and the probe
sequence.
Probe intensities are adjusted by this model.
Signal is then estimated using median polish.
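Median polish, used by both RMA and GCRMA to summarize a probe set, can be sketched as follows. The table holds log2 intensities with one row per probe and one column per array; the fit is overall + probe effect + array effect + residual, and the expression value for an array is taken as overall + array effect. The function name, the fixed iteration count, and the toy table are assumptions for illustration.

```python
from statistics import median

def median_polish(table, n_iter=10):
    """Tukey median polish of a probes-by-arrays table (illustrative sketch)."""
    rows, cols = len(table), len(table[0])
    resid = [list(r) for r in table]
    overall = 0.0
    row_eff = [0.0] * rows  # probe effects
    col_eff = [0.0] * cols  # array effects
    for _ in range(n_iter):
        # Sweep out row (probe) medians
        for i in range(rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [x - m for x in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep out column (array) medians
        for j in range(cols):
            m = median([resid[i][j] for i in range(rows)])
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid

# Made-up log2 intensities: 3 probes (rows) on 3 arrays (columns)
table = [[6, 7, 9],
         [7, 8, 10],
         [8, 9, 11]]
overall, row_eff, col_eff, resid = median_polish(table)
expression = [overall + c for c in col_eff]  # one value per array
```

Medians are used instead of means so that a single aberrant probe on a single array ends up in the residuals rather than distorting the array's expression value.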
Does the Normalization Matter For
Assessing Differential Expression?
P-values
Message of Figure 4
The lines represent the probe set for a single gene.
X-axis – concentration of spiked sample.
Y-axis – observed reflectance.
Probes do not hybridize at the same rate.
MM hybridizes at about the same rate as PM for this
gene.
The problem with equalizing
methods
You cannot add a new array without
renormalizing all of the arrays.
The problem with baseline
methods
Choice of baseline affects the
normalization.
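To see why the choice of baseline matters, here is a deliberately simple baseline method in Python: each array is rescaled so that its median matches the median of a chosen baseline array. Median scaling is an assumption picked for illustration and is not necessarily the baseline method discussed in the talk; the function name and data are hypothetical.

```python
from statistics import median

def scale_to_baseline(arrays, baseline_index=0):
    """Rescale every array so its median equals the baseline array's median."""
    target = median(arrays[baseline_index])
    return [[x * target / median(a) for x in a] for a in arrays]

# Two made-up arrays; the second is uniformly twice as bright as the first
arrays = [[100, 200, 300], [200, 400, 600]]
using_first = scale_to_baseline(arrays, baseline_index=0)
using_second = scale_to_baseline(arrays, baseline_index=1)
```

Both choices make the two arrays agree with each other, but the normalized values themselves differ by a factor of two depending on which array serves as the baseline.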
What do we recommend at BCC?
Of the microarrays we have seen – Affy arrays are
the most consistent across samples
But – we have seen failures
We always do quality control plots:
1. Array image
2. Pairwise probe plots of replicates
We use RMA.
Software
We use the affy package in R: free, state
of the art.
Normalization is easy.
Other R packages are available for
assessing differential expression and
clustering
(next week – an Intro to R)
Conclusions
•Affy arrays do need normalization
•Probe-wise normalization is
recommended.
•Equalization methods are best but
baseline matching is OK with careful
choice of baseline array.
•After normalization, PM probes or
PM/MM can be combined to estimate
expression