Microarray Pitfalls
Stem Cell Network
Microarray Course, Unit 3
October 2006
Goals
• To provide some guidelines on Affymetrix
microarrays:
– How to use them
– How not to use them
– Things to keep in mind when designing
experiments and analyzing data
• This is a general discussion of issues and
is by no means exhaustive
Inconsistent Annotations
• Affymetrix-provided probeset annotations
change over time
• The gene symbol associated with a given
probeset is not necessarily stable
• This is due to changes in gene prediction
as new information becomes available.
Inconsistent Annotations (2)
An inconsistently annotated probeset
• Perez-Iratxeta, C. and M.A. Andrade.
2005. Inconsistencies over time in 5% of
NetAffx probe-to-gene annotations. BMC
Bioinformatics. 6, 183.
– 5% of probesets have gene identifiers that
changed over the two-year time span covered
by this analysis
Inconsistent Annotations (3)
• How do we deal with this?
– Always note the annotation version used in the
analysis, especially when it is for publication
– Report probeset name as well as gene
symbol
– Remember that re-analysis with later
annotations may yield different results
– Keep your annotation files up to date
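The checks above can be sketched in a few lines of Python. This is only an illustration: the probeset IDs, gene symbols, and release labels are made up, and the dictionaries stand in for whatever annotation files you actually use.

```python
# Hypothetical probeset-to-gene annotations from two releases.
# All IDs, symbols, and version labels are invented for illustration.
annotation_old = {
    "probeset_A": "GENE1",
    "probeset_B": "GENE2",
    "probeset_C": "GENE3",
}
annotation_new = {
    "probeset_A": "GENE1",
    "probeset_B": "GENE2B",  # symbol changed between releases
    "probeset_C": None,      # no longer assigned to a gene
}


def changed_probesets(old, new):
    """Return {probeset: (old_symbol, new_symbol)} where the symbol differs."""
    return {
        ps: (old[ps], new.get(ps))
        for ps in old
        if old[ps] != new.get(ps)
    }


def report_row(probeset, annotation, version):
    """Format a result row carrying probeset ID, gene symbol, and annotation version."""
    symbol = annotation.get(probeset) or "unannotated"
    return f"{probeset}\t{symbol}\t{version}"


changes = changed_probesets(annotation_old, annotation_new)
for ps, (before, after) in sorted(changes.items()):
    print(f"{ps}: {before} -> {after}")
print(report_row("probeset_B", annotation_new, "release-2"))
```

Reporting the probeset ID next to the symbol means a reader can still trace a result even if the symbol changes in a later release.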
Old chips, new data
• Expression microarrays are designed
based on the best available model of the
genome of interest
• The model for the HG-U133 microarrays
was a human genome assembly that was
only 25% complete!
• The human assembly is >99% complete
now
Old chips, new data (2)
• How do we deal with this?
– A number of groups provide re-mappings of
probes to probesets based upon the latest
data available, for example:
• Dai M, et al. Evolving gene/transcript definitions
significantly alter the interpretation of GeneChip
data. Nucleic Acids Res. 2005;33:e175
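The idea behind such re-mappings can be sketched as follows. This is a toy illustration, not the procedure used by Dai et al.: the probe IDs, gene names, and the rule of keeping only probes that match exactly one gene are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical probe records: (probe_id, original_probeset).
probes = [
    ("probe_1", "ps_X"), ("probe_2", "ps_X"), ("probe_3", "ps_X"),
    ("probe_4", "ps_Y"), ("probe_5", "ps_Y"),
]

# Hypothetical result of re-aligning each probe to a current gene model:
# probe_id -> list of genes the probe sequence now matches.
current_hits = {
    "probe_1": ["GENE_A"],
    "probe_2": ["GENE_A"],
    "probe_3": ["GENE_B"],            # now matches a different gene
    "probe_4": ["GENE_B"],
    "probe_5": ["GENE_A", "GENE_B"],  # ambiguous: matches two genes
}


def remap(probes, hits):
    """Group probes by the single gene they currently match; drop the rest."""
    new_sets = defaultdict(list)
    dropped = []
    for probe_id, _old_probeset in probes:
        genes = hits.get(probe_id, [])
        if len(genes) == 1:
            new_sets[genes[0]].append(probe_id)
        else:
            dropped.append(probe_id)  # unmatched or ambiguous probe
    return dict(new_sets), dropped


new_probesets, dropped = remap(probes, current_hits)
print(new_probesets)
print(dropped)
```

In this toy case the original probeset ps_X is split across two genes, which is exactly the kind of change that can alter the interpretation of an old data set.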
Multiple Testing Corrections
• A single expression microarray experiment
actually consists of hundreds of thousands
of simultaneous parallel experiments
• This means you can test many hypotheses
simultaneously
• This is not free: the significance of any
given result decreases as a function of
the number of hypotheses tested
Multiple Testing Corrections (2)
• How do we deal with this?
– Limit the number of hypotheses you are testing
instead of just ‘fishing’ in the whole data set.
– Do this by selecting a set of candidate genes
ahead of time based on your knowledge of
the biology of the system.
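A rough sketch of why a short candidate list helps, using the Bonferroni correction as one simple example of a multiple testing correction (the slides do not prescribe a specific method). The probeset count, candidate list size, and p-value below are assumed numbers chosen for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected false positives if every null is true and each test uses cutoff alpha."""
    return alpha * n_tests


alpha = 0.05
n_whole_array = 50_000  # assumed number of probesets, for illustration only
n_candidates = 50       # assumed size of a pre-selected candidate list
p_value = 0.0005        # hypothetical p-value for one gene of interest

print("uncorrected false positives expected:",
      expected_false_positives(alpha, n_whole_array))

for label, n_tests in [("whole array", n_whole_array), ("candidates", n_candidates)]:
    cutoff = bonferroni_threshold(alpha, n_tests)
    print(f"{label}: cutoff={cutoff:g}, significant={p_value <= cutoff}")
```

The same hypothetical p-value passes the cutoff when only the candidate list is tested and fails it when the whole array is tested. The candidate list has to be fixed before looking at the data for this to be valid.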
Multiple Testing Corrections (3)
• Sandrine Dudoit, Juliet Popper Shaffer and
Jennifer C. Boldrick. Multiple Hypothesis Testing
in Microarray Experiments. Statistical Science.
2003, Vol. 18, No. 1, 71–103.
– “The biological question of differential expression can
be restated as a problem in multiple hypothesis
testing: the simultaneous test for each gene of the
null hypothesis of no association between the
expression levels and the responses”
• Talk to a statistician if you have doubts
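When testing every gene is unavoidable, one widely used option is to control the false discovery rate instead of the family-wise error rate. The sketch below implements the Benjamini-Hochberg step-up procedure as an example; the choice of this procedure and the p-values are assumptions for illustration, not a recommendation from the slides.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return sorted indices of hypotheses rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_k = rank
    # Reject the hypotheses with the largest_k smallest p-values.
    return sorted(order[:largest_k])


# Hypothetical p-values for ten genes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

rejected = benjamini_hochberg(p_values, fdr=0.05)
bonferroni_rejected = [i for i, p in enumerate(p_values) if p <= 0.05 / len(p_values)]
print("Benjamini-Hochberg rejects:", rejected)
print("Bonferroni rejects:", bonferroni_rejected)
```

On these made-up values the false discovery rate procedure rejects more hypotheses than Bonferroni, at the cost of a weaker error guarantee. Which guarantee suits a given experiment is the kind of question to take to a statistician.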
Not everything is in the array
• Probesets are designed with a bias
towards the 3’ end of the gene.
• They won’t distinguish splice variants
• They won’t pick up alternative 3’ ends
Not everything is in the array (2)
• What can we do about this?
– You should be aware of this, but not much can
be done.
– Use other technologies to complement your
microarray results (PCR, sequencing)
What are you measuring?
• Remember that you are detecting the
average mRNA over a population of cells.
• Is your sample homogeneous?
• If it’s not homogeneous then what are you
measuring? How many types of cells in
what state?
• Time series of differentiating cells are
particularly problematic.
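A small numerical sketch shows why mixed populations are hard to interpret. It assumes the measured signal is simply the fraction-weighted average of per-cell expression; the cell types, fractions, and expression levels are invented.

```python
def bulk_signal(fractions, per_cell_expression):
    """Population-average expression: sum of (cell-type fraction * per-cell level)."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(fractions[ct] * per_cell_expression[ct] for ct in fractions)


# Hypothetical per-cell expression of one gene (arbitrary units).
# Neither cell type changes its expression over time in this example.
expression = {"stem": 10.0, "differentiated": 100.0}

# Hypothetical composition of the culture at two time points.
early = {"stem": 0.9, "differentiated": 0.1}
late = {"stem": 0.5, "differentiated": 0.5}

early_signal = bulk_signal(early, expression)
late_signal = bulk_signal(late, expression)
print(early_signal, late_signal)
```

The measured signal rises between the two time points even though no cell changed its expression of the gene. Only the mix of cell types changed, which is why time series of differentiating cells need care.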
Inhomogeneous Samples?
• Many sources of inhomogeneity
– Source organism gender
– Cell cycle
– Tissue source
– Diet
• Some can be eliminated
• All should be documented where possible
Chips don’t detect protein
• Central assumption of microarray analysis:
The level of mRNA is positively correlated
with protein expression levels.
– Higher mRNA levels mean higher protein
expression, lower mRNA means lower protein
expression
• Other factors:
– Protein degradation, mRNA degradation,
polyadenylation, codon preference, translation
rates, etc.
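To see how these other factors can break the link between mRNA and protein, consider a deliberately simplified model in which protein is made at a rate proportional to mRNA and degraded at a constant rate. The model and all parameter values are assumptions for illustration only.

```python
def steady_state_protein(mrna, translation_rate, protein_degradation_rate):
    """Steady state of dP/dt = translation_rate * mrna - protein_degradation_rate * P."""
    return translation_rate * mrna / protein_degradation_rate


# Two hypothetical genes with identical mRNA levels (arbitrary units)
# whose proteins are degraded at different rates.
gene_a = steady_state_protein(mrna=100.0, translation_rate=2.0,
                              protein_degradation_rate=0.1)
gene_b = steady_state_protein(mrna=100.0, translation_rate=2.0,
                              protein_degradation_rate=1.0)
print(gene_a, gene_b)
```

In this toy model the two genes would look identical on a chip, yet their protein levels differ tenfold.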
Conclusion
• This is a general discussion of issues;
it doesn’t cover all pitfalls.
• Please contact [email protected] if you
have any comments, corrections or
questions.
• See associated bibliography for references
from this presentation and further reading.
• Thanks for your attention!