cDNA Microarray Design and Pre-processing
By H. Bjørn Nielsen

Why Experimental Design
1. To enable statistical hypothesis verification/falsification
2. To balance the effects from undesired controllable effects
3. To ensure sufficient statistical power

1. To enable statistical hypothesis verification/falsification
Typically, we want to identify differentially expressed genes between a set of conditions using t-test or ANOVA-like statistics. This implies that we replicate sampling from a set of fixed conditions.
• Control vs. Treatment
• Treatment 1, Treatment 2, Treatment 3
• Multi-factorial: Control, Mutant, Treatment, Mutant Treated

1. To enable statistical hypothesis verification/falsification
But we may also fit to a trend using alternative statistics (Bayesian fit, bootstrapping, ANOVA, etc.)
• Series: T0, T1, T2, ..., Tn
  – The length of the series or the sampling density may be most important
• Control vs. Treatment
• Treatment 1, Treatment 2, Treatment 3
• Multi-factorial: Control, Mutant, Treatment, Mutant Treated
  – Replication is essential

2. To balance the effects from undesired controllable effects
Typical controllable effects:
• Labeling dye
• Microarray slide
• Sampling time
• Growth conditions
Minimize and balance
Typical uncontrollable effects:
• Random effects
• Unintended deviations in sample handling, growth conditions, etc.

3. To ensure sufficient statistical power
An appropriate number of replicates is required for distinguishing noise from 'effect'.
Gene expression studies typically require 3+ replicates.
Make sure to replicate over the most important sources of variance.
Typical order of noise contributions:
1. Biological variation
2. Sample preparation batch
3. Hybridization/slide effect
4. Dye effect/Spot effect
[Equation: t-statistic; the formula is not preserved in the transcript]

An example
Aim: Identify differentially expressed genes between ill and healthy patients.
Samples: 4 ill and 4 healthy patients
Using a two-channel cDNA array. How should we do it?

Slide    Dye  Condition
Slide 1  Cy3  ill
Slide 1  Cy5  ...
...      ...  ...
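One way to approach the question in the example above is to put one ill and one healthy sample on each slide and alternate the dye assignment from slide to slide, so that dye and slide are balanced against condition. The sketch below is only an assumption about one reasonable layout, not the answer given in the slides; the sample labels (`ill_1`, `healthy_1`, ...) and the function name `paired_dye_swap_design` are hypothetical.

```python
from collections import Counter


def paired_dye_swap_design(ill, healthy):
    """Pair each ill sample with a healthy sample on one slide and
    alternate which condition is labeled with Cy3, so that dye is
    balanced over condition. Hypothetical helper, for illustration only."""
    if len(ill) != len(healthy):
        raise ValueError("need equally many ill and healthy samples")
    rows = []
    for i, (ill_sample, healthy_sample) in enumerate(zip(ill, healthy), start=1):
        if i % 2 == 1:
            # Odd-numbered slides: ill in Cy3, healthy in Cy5.
            cy3, cy5 = ("ill", ill_sample), ("healthy", healthy_sample)
        else:
            # Even-numbered slides: dyes swapped.
            cy3, cy5 = ("healthy", healthy_sample), ("ill", ill_sample)
        rows.append((f"Slide {i}", "Cy3", cy3[0], cy3[1]))
        rows.append((f"Slide {i}", "Cy5", cy5[0], cy5[1]))
    return rows


# Made-up sample labels for the 4 ill + 4 healthy patients.
ill = [f"ill_{k}" for k in range(1, 5)]
healthy = [f"healthy_{k}" for k in range(1, 5)]
design = paired_dye_swap_design(ill, healthy)

# Count how often each (dye, condition) combination occurs.
balance = Counter((dye, condition) for _, dye, condition, _ in design)

for row in design:
    print(*row, sep="\t")
```

With 4 + 4 samples this layout uses four slides, and each condition is labeled twice with each dye. Other layouts, such as hybridizing every sample against a common reference, are equally possible; which one is preferable depends on which sources of variance matter most.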
Another example
Aim: Identify differentially expressed genes between ill and healthy patients.
Samples: 4 ill (2x M + 2x F) and 4 healthy (2x M + 2x F)
Using a two-channel cDNA array. How should we do it?

Slide    Dye  Sex  Condition
Slide 1  Cy3  M    ill
Slide 1  Cy5  ...  ...
...      ...  ...  ...

Yet another example
Aim: Identify genes differentially affected by starving in obese and lean people.
Samples: 4 obese (2x starving + 2x not starving) and 4 lean (2x starving + 2x not starving)
Using a one-channel GeneChip. How should we do it?

Chip #  BMI  Food
1       O    S
2       L    N
...     ...  ...

cDNA pre-processing
• Background correction
• Normalization
  – Within slide
  – Between slide

Background correction
Is it meaningful?
Methods:
– subtraction
– movingmin (3x3)
– normexp
– none
Ritchie et al. 2007, Bioinformatics

Normalization within array
Correct for any bias that follows an undesired uncontrollable effect:
– Print tip
– Microtiter plate
– Printing order
– Spatial trends (uneven hybridization)
As well as intensity-dependent biases

Normalization between array
Correction for intensity-dependent biases:
– Lowess
– Qspline
– Quantiles
– And more
[Figure: plot of M against A]
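The M and A quantities and the quantile method listed above can be sketched as follows. This is a minimal illustration, assuming the conventional definitions M = log2(R/G) and A = 0.5 * log2(R * G) for the red (Cy5) and green (Cy3) intensities of a spot, and a plain quantile normalization without special handling of ties. The function names and the toy intensities are hypothetical; a real analysis would normally rely on an established package rather than this code.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists for paired two-channel intensities.
    M = log2(R/G) is the log ratio; A = 0.5 * log2(R * G) is the
    average log intensity. Intensities must be positive."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def quantile_normalize(arrays):
    """Give all arrays (equal-length lists) the same distribution:
    the k-th smallest value of every array is replaced by the mean of
    the k-th smallest values across arrays. Ties are not averaged."""
    n = len(arrays[0])
    if any(len(x) != n for x in arrays):
        raise ValueError("all arrays must have the same length")
    sorted_arrays = [sorted(x) for x in arrays]
    rank_means = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    result = []
    for x in arrays:
        # Indices of x ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: x[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result


# Toy intensities, made up for illustration.
red = [400.0, 100.0, 1600.0]
green = [100.0, 100.0, 400.0]
m, a = ma_values(red, green)

normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After quantile normalization every array contains the same set of values, only in a different order, which is why the method removes between-array differences in overall intensity distribution. Intensity-dependent corrections such as lowess work differently: they fit a smooth curve to M as a function of A and subtract it.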