Some views on microarray experimental design

Rainer Breitling
Molecular Plant Science Group & Bioinformatics Research Centre
University of Glasgow, Scotland, UK

Personal Background
• University of Glasgow, Scotland, UK
• Molecular Plant Sciences Group
• Bioinformatics Research Centre
• Functional Genomics Facility

Some common questions in microarray experimental design
• How many arrays will I need?
• Should I pool my samples?
• Which arrays should I choose?
• Which samples should I put together on one array?

Why are microarrays special?
• produce large amounts of data instantaneously
• can look for unexpected effects
• are still quite expensive
  → almost never repeated
  → careful design necessary before you start

How many replicates?
• as many as possible
Statistics says: The more replicates, the better your estimate of expression (the improvement is asymptotic, so adding even a few replicates has a really strong effect)

How many replicates?
$n = \frac{4\,(z_{1-\alpha/2} + z_{1-\beta})^{2}}{(\delta/\sigma)^{2}}$
• α: significance level (probability of detecting a false positive, FP)
• 1-β: power to detect differences (probability of detecting a true positive, TP)
• σ: standard deviation of the log-ratios
• δ: detectable difference between class mean log-ratios
• z: percentile of the standard normal distribution
• n: required number of arrays (reference design)

How many replicates?
• Five
Experience shows: For most common experiments you get a reasonable list of differentially expressed genes with 5 replicates

How many replicates?
• Three
One to convince yourself, one to convince your boss, one just in case...

How many replicates?
• It depends on
  – the quality of the sample
  – the magnitude of the expected effect
  – the experimental design
  – the method of analysis

The quality of the sample
• smaller samples (single cells) are more noisy than large samples (tissue homogenates)
• cell cultures are less noisy than patient biopsies
• sample pooling can decrease noise
  – if individual variation is not of interest

The magnitude of the effect
• Microarrays are very sensitive
• To keep effects small:
  – use early time points, gentle stimuli
  – never compare dogs and donuts
• if you get a list of 2000 genes that are significantly changed, your experiment failed!

The magnitude of the effect
• some problematic cases
  – stably transfected cell lines (are they still the same cells?)
  – knock-out organisms (even the same tissue can be different)
  – local changes may be diluted → cell isolation will increase noise

The experimental design
• Three major options:
  – reference design (flexible)
  – balanced block design (efficient)
  – loop design (elegant)

The experimental design
• loop designs can save samples...
[Figure: a reference design (samples A, B, C and D, each paired with a reference R) next to a loop design connecting A, B, C and D]
• ...but they can cause interpretation nightmares in less simple cases (use for large studies, if you have a full-time statistician in the team)

The method of analysis
• Golub et al. (1999) data set
• 38 leukemia patient bone marrow samples, hybridized individually to Affymetrix microarrays
• Differential expression between two leukemia types was examined, using random subsets of the complete dataset

The method of analysis
[Figure: iterative GroupAnalysis (iGA) results for the time points 0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h and 20.5h. The Gene Ontology classes listed per time point include purine base metabolism, tricarboxylic acid cycle, heat shock protein activity, response to stress, respiratory chain complexes II, III and IV, oxidative phosphorylation (succinate to ubiquinone), succinate dehydrogenase (ubiquinone) activity, cytochrome c oxidase activity, glycogen metabolism, glyoxylate cycle, polyamine transport and hexose transport]
[Figure: Graph-based iterative GroupAnalysis (GiGA), with the labels respiratory chain complex II, glyoxylate cycle, citrate (TCA) cycle, oxidative phosphorylation, respiratory chain complex III and (complex V)]

What is a good replicate?
• The experiment your competitor at the other side of the globe would do to see if your results are reproducible
• Vary “all” parameters – challenge your results
• Prepare new samples, from new cultures, using new buffers and new graduate students
• Remember to produce matched controls

What is a “bad” replicate?
• technical replicates (i.e. hybridizing the same sample repeatedly)
• dye-swapping experiments (usually gene-specific dye bias is not a big issue, and dye balancing is more efficient anyway)
• pooled samples, hybridized repeatedly
• the same preparation, only labelled twice

Should samples be pooled?
• most samples are already pooled – they come from multiple cells
• pool to increase amount of mRNA, but only as much as necessary
• prepare independent pools to assess variation
• problems: bias, “contamination”, outliers, information loss...

Which arrays are the best?
• Standard arrays: compare and exchange data easily
• Whole-genome arrays: detect unexpected effects, increase confidence
• Single-color arrays (Affymetrix GeneChip): for more complex comparisons
• Annotated arrays

Further reading
• Dobbin, Shih & Simon (2003) J. Natl. Cancer Inst. 95: 1362.
• Yang & Speed (2002) Nature Rev. Genet. 3: 579.
• Breitling (2004) http://www.brc.dcs.gla.ac.uk/~rb106x/microarray_tips.htm

Contact
Rainer Breitling
Bioinformatics Research Centre
Davidson Building A416
[email protected]
http://www.brc.dcs.gla.ac.uk/~rb106x
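
Worked example: evaluating the sample-size formula
The formula on the "How many replicates?" slide, n = 4 (z_{1-α/2} + z_{1-β})² / (δ/σ)², is easy to evaluate numerically. The following is a minimal sketch, not part of the original slides: the function name `arrays_needed` and all input values are illustrative assumptions, and the result is rounded up to the next whole array.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def arrays_needed(alpha: float, power: float, sigma: float, delta: float) -> int:
    """Number of arrays from n = 4 (z_{1-alpha/2} + z_{1-beta})^2 / (delta/sigma)^2.

    alpha: significance level
    power: 1 - beta, the power to detect a difference
    sigma: standard deviation of the log-ratios
    delta: detectable difference between class mean log-ratios
    (hypothetical helper; the name and signature are assumptions)
    """
    if not (0.0 < alpha < 1.0 and 0.0 < power < 1.0):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    if sigma <= 0.0 or delta <= 0.0:
        raise ValueError("sigma and delta must be positive")

    z = NormalDist().inv_cdf          # percentile of the standard normal
    z_alpha = z(1.0 - alpha / 2.0)    # z_{1-alpha/2}
    z_power = z(power)                # z_{1-beta}, since power = 1 - beta
    n = 4.0 * (z_alpha + z_power) ** 2 / (delta / sigma) ** 2
    return math.ceil(n)               # round up to whole arrays


# Illustrative inputs only (assumed, not taken from the slides):
# a two-fold change on the log2 scale (delta = 1) with sigma = 0.5.
print(arrays_needed(alpha=0.001, power=0.95, sigma=0.5, delta=1.0))
# A larger detectable difference needs fewer arrays.
print(arrays_needed(alpha=0.001, power=0.95, sigma=0.5, delta=2.0))
```

Because n scales with (σ/δ)², halving the noise or doubling the detectable difference cuts the required number of arrays by a factor of four, which is why sample quality and effect size dominate the answer.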