Microarray Pitfalls
Stem Cell Network Microarray Course, Unit 3
October 2006

Goals
• To provide some guidelines on Affymetrix microarrays:
  – How to use them
  – How not to use them
  – Things to keep in mind when designing experiments and analyzing data
• This is a general discussion of issues and is by no means exhaustive

Inconsistent Annotations
• Affymetrix-provided probeset annotations change over time
• The gene symbol associated with a given probeset is not necessarily stable
• This is due to changes in gene prediction as new information becomes available

Inconsistent Annotations (2)
An inconsistently annotated probeset
• Perez-Iratxeta, C. and M.A. Andrade. 2005. Inconsistencies over time in 5% of NetAffx probe-to-gene annotations. BMC Bioinformatics. 6, 183.
  – 5% of probesets have gene identifiers that change over the two-year time span covered by this analysis

Inconsistent Annotations (3)
• How do we deal with this?
  – Always note the annotation version used in an analysis, especially when it is for publication
  – Report the probeset name as well as the gene symbol
  – Remember that re-analysis with later annotations may yield different results
  – Keep your annotation files up to date

Old chips, new data
• Expression microarrays are designed based on the best available model of the genome of interest
• The model for the HG-U133 microarrays was a human genome assembly that was only 25% complete!
• The human assembly is >99% complete now

Old chips, new data (2)
• How do we deal with this?
  – A number of groups provide re-mappings of probes to probesets based upon the latest data available, for example:
    • Dai M, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res.
      2005;33:e175

Multiple Testing Corrections
• A single expression microarray experiment actually consists of hundreds of thousands of simultaneous parallel experiments
• This means you can test many hypotheses simultaneously
• This is not free: the significance of any given result decreases as the number of hypotheses tested grows

Multiple Testing Corrections (2)
• How do we deal with this?
  – Limit the number of hypotheses you are testing instead of just ‘fishing’ in the whole data set
  – Do this by selecting a set of candidate genes ahead of time, based on your knowledge of the biology of the system

Multiple Testing (3)
• Sandrine Dudoit, Juliet Popper Shaffer and Jennifer C. Boldrick. Multiple Hypothesis Testing in Microarray Experiments. Statistical Science 2003, Vol. 18, No. 1, 71–103.
  – “The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses”
• Talk to a statistician if you have doubts

Not everything is in the array
• Probesets are designed with a bias towards the 3’ end of the gene
• They won’t distinguish splice variants
• They won’t pick up alternative 3’ endings

Not everything is in the array (2)
• What can we do about this?
  – You should be aware of this, but not much can be done
  – Use other technologies to complement your microarray results (PCR, sequencing)

What are you measuring?
• Remember that you are detecting the average mRNA level over a population of cells
• Is your sample homogeneous?
• If it’s not homogeneous, then what are you measuring? How many types of cells, and in what state?
• Time series of differentiating cells are particularly problematic

Inhomogeneous Samples?
• Many sources of inhomogeneity
  – Source organism gender
  – Cell cycle
  – Tissue source
  – Diet
• Some can be eliminated
• All should be documented where possible

Chips don’t detect protein
• Central assumption of microarray analysis: the level of mRNA is positively correlated with protein expression levels
  – Higher mRNA levels mean higher protein expression; lower mRNA levels mean lower protein expression
• Other factors:
  – Protein degradation, mRNA degradation, polyadenylation, codon preference, translation rates, …

Conclusion
• This is a general discussion of issues; it doesn’t cover all pitfalls
• Please contact [email protected] if you have any comments, corrections or questions
• See associated bibliography for references from this presentation and further reading
• Thanks for your attention!
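
Worked example: multiple testing
The multiple testing slides note that significance drops as more hypotheses are tested, and that a pre-selected candidate gene set helps. The sketch below illustrates this with the Bonferroni correction, which is only one simple and conservative choice; the slides do not name a specific method. All numbers (array size, candidate set size, observed p-value) are hypothetical values chosen for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if p_value survives a Bonferroni correction for n_tests hypotheses."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# Illustrative assumptions, not values taken from the slides.
ALPHA = 0.05             # desired family-wise error rate
N_WHOLE_ARRAY = 50_000   # hypothetical number of probesets tested when 'fishing'
N_CANDIDATES = 50        # hypothetical size of a pre-selected candidate gene set
P_OBSERVED = 1e-4        # hypothetical raw p-value for one probeset

if __name__ == "__main__":
    for label, n_tests in (("whole array", N_WHOLE_ARRAY),
                           ("candidate set", N_CANDIDATES)):
        cutoff = bonferroni_threshold(ALPHA, n_tests)
        verdict = is_significant(P_OBSERVED, ALPHA, n_tests)
        print(f"{label}: cutoff={cutoff:.2e}, significant={verdict}")
```

Under these assumed numbers the same raw p-value passes the cutoff for the candidate set but not for the whole array. Less conservative procedures that control the false discovery rate are also in common use; a statistician can advise on which fits a given design.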