DNA Microarrays data analysis - 2006
Differential Gene Expression
Mauro Delorenzi, September 06

Differentially Expressed Genes
Goal:
- Simple case: identify genes with different expression levels in two conditions (between two arrays or two groups of arrays).
- Generally: identify genes associated with a covariate or response of interest.
Examples:
- Qualitative covariates or factors: treatment, type of diet, cell type, tumor class
- Quantitative covariates: dose of drug, age
- Responses: metastasis-free survival, cholesterol level

Approaches
1. Visual, exploratory inspection of one (or more) slides
2. Compute a test statistic $T_j$ for the effect of each gene $j$
3. Rank the genes according to $T$
4. Estimate a reasonable cutoff (statistical significance)
5. Adjust for multiple hypothesis testing

Test statistics
- Qualitative covariates (groups): e.g. the two-sample t-statistic, the non-parametric Wilcoxon statistic, or the F-statistic
- Quantitative covariates: e.g. a standardized regression coefficient
- Survival response: e.g. the likelihood ratio for a Cox model

What effects to "believe" in?
[Figures: spike-in examples showing that different microarray probes have different properties. Annotations: correct ratio with low variance at 100-1000 pg; differential expression missed completely at 20-40 pg (false negative, FN); approximately correct ratio with somewhat higher variance at 1.6-2.4 pg; a 0-spike at a ratio of 2; multiple outliers simulating high differential expression.]

Single-slide methods
- Model-dependent rules for deciding whether a value pair (R,G) corresponds to a differentially expressed gene.
- This amounts to drawing two curves in the (R,G)-plane and calling a gene differentially expressed if it falls outside the region between the two curves.
- At this time, not enough is known about the systematic and random variation within a microarray experiment to justify these strong modeling assumptions.
- n = 1 slide may not be enough!
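Steps 2 and 3 of the approaches listed above can be sketched in a few lines. The first step computes one test statistic per gene and the second ranks the genes by its absolute value. This minimal sketch assumes two groups of log-expression values per gene. It uses the two-sample t-statistic with separate group variances. The toy data and the helper names `welch_t` and `rank_genes` are hypothetical.

```python
import math
from statistics import mean, stdev

def welch_t(group1, group2):
    """Two-sample t-statistic with separate group variances:
    (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(stdev(group1) ** 2 / len(group1) + stdev(group2) ** 2 / len(group2))
    return (mean(group1) - mean(group2)) / se

def rank_genes(expr, n1):
    """Score every gene and rank the genes by decreasing |t|.

    expr maps a gene name to its log-expression values; the first n1
    values belong to condition 1 and the rest to condition 2.
    """
    scores = {gene: welch_t(values[:n1], values[n1:]) for gene, values in expr.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data: 3 genes, 3 arrays per condition.
expr = {
    "geneA": [8.1, 8.3, 8.2, 6.0, 6.2, 5.9],  # higher in condition 1
    "geneB": [7.0, 7.4, 6.8, 7.1, 6.9, 7.2],  # same mean in both conditions
    "geneC": [5.0, 5.2, 4.9, 5.8, 6.1, 5.9],  # higher in condition 2
}
for gene, t in rank_genes(expr, n1=3):
    print(f"{gene}: t = {t:.2f}")
```

Swapping in a Wilcoxon statistic or a regression coefficient changes only the scoring function. The ranking step stays the same.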
Difficulty in assigning valid p-values based on a single slide

Single-slide methods
- Chen et al.: each (R,G) is assumed to be normally and independently distributed with constant CV; decision based on R/G only (purple).
- Newton et al.: Gamma-Gamma-Bernoulli hierarchical model for each (R,G) (yellow).
- Roberts et al.: each (R,G) is assumed to be normally and independently distributed with variance depending linearly on the mean.
- Sapir & Churchill: each log R/G is assumed to be distributed according to a mixture of normal and uniform distributions; decision based on R/G only (turquoise).

Informal methods
- If there is no replication (i.e. only a single array), there are not many options.
- Common methods include:
  - (log) fold change exceeding some threshold, e.g. more than 2 (or less than -2)
  - graphical assessment, e.g. a QQ plot
- However, the threshold is rather arbitrary.

Which genes are DE?
- Significance is difficult to judge:
  - massive multiple testing problem
  - the null distribution of M is unknown
  - genes are dependent
- Strategy:
  - aim to rank genes
  - assume most genes are not DE (depending on the type of experiment and array)
  - find genes separated from the majority

Sample QQ plots
A QQ plot is used to assess whether a sample follows a particular (e.g. normal) distribution, or to compare the distributions of two samples. It is also a method for looking for outliers.
[Figure: the sample value at quantile 0.125 is plotted against the value from the theoretical normal distribution that yields a quantile of 0.125.]

Typical deviations from straight-line patterns
- Outliers
- Curvature at both ends (long or short tails)
- Convex/concave curvature (asymmetry)
- Horizontal segments, plateaus, gaps
[Example figures: outliers; long tails; short tails; asymmetry; plateaus/gaps; QQ plot.]

DE in a QQ plot
In this case, the ratios are from a self-self hybridization, i.e. NO genes are truly DE!
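The two informal single-array checks above can be sketched in a few lines. The first flags genes whose $|M| = |\log_2(R/G)|$ exceeds a threshold. Here the threshold is 1.0, i.e. a 2-fold change, which is as arbitrary as any other choice. The second computes the point coordinates of a normal QQ plot. This is a minimal sketch. The toy M values, the helper names and the plotting positions $(i - 0.5)/n$ are assumptions for illustration. It relies on `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

def fold_change_calls(m_values, threshold=1.0):
    """Return indices of genes with |M| = |log2(R/G)| above the threshold.

    threshold=1.0 corresponds to a 2-fold change in either direction.
    """
    return [i for i, m in enumerate(m_values) if abs(m) > threshold]

def normal_qq_points(values):
    """Return (theoretical quantile, sample value) pairs for a normal QQ plot.

    The i-th smallest value (i = 1..n) is paired with the standard normal
    quantile at plotting position (i - 0.5) / n.
    """
    n = len(values)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(values), start=1)]

# Hypothetical M values for 8 genes from a single array.
m = [0.1, -0.3, 2.4, 0.05, -1.6, 0.2, -0.1, 0.4]
print(fold_change_calls(m))  # [2, 4]
for theoretical, sample in normal_qq_points(m):
    print(f"{theoretical:6.2f}  {sample:5.2f}")
```

Plotting the returned pairs gives the QQ plot. Genes whose points fall well off the straight line through the bulk of the data are the candidates.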
Decision table
- Positive: classified as differentially expressed
- Negative: classified as non-DE

Replicated experiments
- We have n replicates.
- For each gene we have n values of M = log2 fold change, one from each array.
- Summarize $M_1, \dots, M_n$ for each gene by
  - $M = \mathrm{average}(M_1, \dots, M_n)$
  - $s = \mathrm{SD}(M_1, \dots, M_n)$
- Rank genes in order of strength of evidence in favor of DE.
- How might we do this?

Ranking criteria
- Genes $i = 1, \dots, p$.
- $M_i$ = average log2 fold change for gene $i$.
  - Problem: genes with large variability are likely to be selected, even if they are not DE.
- Fix this by taking variability into account: use $t_i = M_i/(s_i/\sqrt{n})$.
  - Problem: genes with extremely small variances produce very large $t$.
  - When the number of replicates is small, the smallest $s_i$ are likely to be underestimates.
[Figures: t-statistics give many false positive calls when the number of replicates is small. Panel 1: y = average / SD (3 replicates) against x = A. Panel 2: y = average / SD, with the SD estimated by regression on A, against x = A. Points are marked as true positives or false positives.]

Shrinkage estimators
- Idea: borrow information across genes.
- Here, we "shrink" the $t_i$ towards zero by modifying the $s_i$ in some way (obtaining $s_i^*$): moderated $t_i = t_i^* = M_i/(s_i^*/\sqrt{n})$.
- There are many ways to obtain a value for $s_i^*$.
- We will use the version implemented in the Bioconductor package limma.
[Figure: $t_i$ and $t_i^*$ plotted against $M_i$.]

Comparison of statistics
[Figures: comparison of statistics; B against average M. Genes are coded by the statistics in which they are extreme:
1: extreme in B only (B > -0.5)
2: extreme in t only (|t| > 4.5)
3: extreme in B and t only
4: extreme in M only (|M| > 0.5)
5: extreme in M and B only
6: extreme in M and t only
7: extreme in M, B and t]

Testing
Classical hypothesis testing is set up for a single null and a single alternative hypothesis. The "truth" is that the null is either true or not, but we are not able to know the truth.
Based on the collected data, we can make one of two possible decisions: reject the null or do not reject the null. The data cannot tell us whether the null is true, only whether what we see is consistent with the null.

Testing (continued)
There are two types of errors we can make in this framework: rejecting the null when it is really true (a Type I error), or not rejecting the null when it is really false (a Type II error). The Type I error rate is the probability, conditional on the null being true, that the null is rejected, i.e. the probability that the test statistic falls into the rejection region. The rejection region is chosen so that the Type I error rate does not exceed a user-defined level (often 5%, but this level is not required). One can also report a p-value: the probability, conditional on the null being true, of observing a test statistic as extreme as, or more extreme than (in the direction of the alternative), the one actually obtained.

Significance of results
- Assessing significance is difficult, due to the complicated (and unknown) dependence structure between genes and the unknown distribution of the log ratios.
- The B statistic does not yield absolute cutoff values, because p, the proportion of differentially expressed genes, is not estimated (p is needed for the calibration).
- It is possible to compute approximate adjusted p-values by resampling methods.
- Conclusion: use the moderated t or the B statistic for ranking genes, and regard the associated p-values as rough estimates.

The B statistic: an empirical Bayes method
- The approach implemented in limma is based on an empirical Bayes procedure. The resulting measure is a moderated t-statistic.
- Improved SD estimates are obtained by using not only the replicate measurements of single genes but also by pooling information across genes. This pulls each individual gene's SD closer to the overall SD.
- We may equivalently look at the log posterior odds, B:
  $B_i = \log \dfrac{P(\mu_i \neq 0 \mid x)}{P(\mu_i = 0 \mid x)} = \log \dfrac{p(x)}{1 - p(x)}$,
  where $p(x) = P(\mu_i \neq 0 \mid x)$ is the posterior probability, given the data $x$, that gene $i$ is differentially expressed.
- The log-odds formulation is most useful as a relative rather than an absolute measure, because it is difficult to calibrate.
- Other quantities to look at:
  - the absolute values of the moderated t-statistic
  - the (adjusted) p-values (FDR)

A p-value can be described as the probability that a truly null statistic is "as or more extreme" than the one observed. An FDR of 1% means that, on average, 1% of all features called significant are truly null.
[Figure: corresponding cutoffs P adjusted < 0.01; B > 0.0171; |t| > 4.23.]

Example: Apo AI experiment (Callow et al., Genome Research, 2000)
Goal: identify genes with altered expression in the livers of one line of mice with very low HDL cholesterol levels, compared to inbred control mice.
Experiment:
- Apo AI knock-out mouse model
- 8 knockout (ko) mice and 8 control (ctl) mice (C57Bl/6)
- 16 hybridisations: mRNA from each of the 16 mice is labelled with Cy5; pooled mRNA from the control mice is labelled with Cy3
Probes: ~6,000 cDNAs, including 200 related to lipid metabolism

Which genes have changed?
This method can be used with replicated data:
1. For each gene and each hybridisation (8 ko + 8 ctl), use $M = \log_2(R/G)$.
2. For each gene, form the t-statistic
   $t = \dfrac{\bar{M}_{\mathrm{ko}} - \bar{M}_{\mathrm{ctl}}}{\sqrt{\frac{1}{8}s_{\mathrm{ko}}^2 + \frac{1}{8}s_{\mathrm{ctl}}^2}}$,
   where $\bar{M}_{\mathrm{ko}}$ and $\bar{M}_{\mathrm{ctl}}$ are the averages, and $s_{\mathrm{ko}}$ and $s_{\mathrm{ctl}}$ the SDs, of the 8 ko and 8 ctl M values.
3. Form a histogram of the 6,000 t values.
4. Make a normal Q-Q plot; look for values "off the line".
5. Adjust for multiple testing.
[Figures: histogram and Q-Q plot of the ApoA1 t-statistics; plots of t-statistics.]

The multiple testing problem
Multiplicity problem: thousands of hypotheses are tested simultaneously, which increases the chance of false positives. Choose p = 0.01 as the p-value cutoff. A gene that follows the null distribution of no DE will pass the cutoff with probability p. Given n genes being tested, on average $n \cdot p$ genes will therefore pass the cutoff. For example, suppose n = 30,000 and not a single gene is differentially expressed.
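The scenario just described can be simulated directly. This is a minimal sketch under two assumptions. First, each gene's statistic is standard normal under the null, so its two-sided p-value is uniform on (0, 1). Second, correlation between genes comes from a single shared factor with weight `rho`. That factor is a hypothetical model chosen for illustration, not something taken from the slides.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def count_false_calls(n_genes, cutoff, rho, rng):
    """Simulate one experiment in which no gene is differentially expressed.

    Every gene's z-statistic is standard normal. A hypothetical shared
    factor with weight sqrt(rho) induces correlation between genes
    (rho = 0 means independent genes). Returns the number of genes whose
    p-value falls below the cutoff, i.e. the number of false calls.
    """
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    shared = rng.gauss(0.0, 1.0)
    count = 0
    for _ in range(n_genes):
        z = a * shared + b * rng.gauss(0.0, 1.0)
        if two_sided_p(z) < cutoff:
            count += 1
    return count

rng = random.Random(2006)
n_genes, cutoff, n_experiments = 30_000, 0.01, 10
independent = [count_false_calls(n_genes, cutoff, 0.0, rng) for _ in range(n_experiments)]
correlated = [count_false_calls(n_genes, cutoff, 0.3, rng) for _ in range(n_experiments)]
print("expected per experiment:", n_genes * cutoff)
print("independent genes:", sorted(independent))
print("correlated genes: ", sorted(correlated))
```

Under this model every gene is still marginally null, so the average count is $n \cdot p$ in both settings. Only the spread from one simulated experiment to the next changes, and with correlated genes it becomes much wider.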
If the genes were independent, we would expect 300 genes to be wrongly called differentially expressed. Individual p-values of, e.g., 0.01 no longer correspond to significant findings with high likelihood. Many such values are expected even if the data had been produced by a generator of independent random numbers with no difference between the conditions being compared. This number can fluctuate strongly because of correlation between the genes, so it is not simple to base conclusions on the number of genes that pass a given p-value cutoff.

Multiple testing
In the multiple testing situation, there are several possible ways to define an error rate to be controlled. One possibility is the family-wise error rate (FWER): the probability of at least one Type I error among the entire family of tests. The Bonferroni procedure is an example that provides (strong) control of this error rate.

Error rate control can be strong or weak. Weak control only guarantees control under the complete null, i.e. only if all nulls are true. Strong control guarantees control under any combination of true and false nulls. In the case of microarrays, it is extremely unlikely that all nulls are true (e.g. that no genes are differentially expressed), so weak control is not satisfactory in this situation.

Assigning unadjusted p-values to measures of change
- Estimate p-values for each comparison (gene) by using the permutation distribution of the t-statistics.
- For each of the $\binom{16}{8} = 12{,}870$ possible permutations of the trt/ctl labels, compute the two-sample t-statistic $t^*$ for each gene.
- The unadjusted p-value for a particular gene is estimated by the proportion of $t^*$ values greater than the observed $t$ in absolute value.
[Table: Apo AI, adjusted and unadjusted p-values for the 50 genes with the largest absolute t-statistics.]

Permutations
For paired data, permutations are obtained by switching the characteristic profiles within each pair, yielding $2^n$ possible permutations for n pairs of specimens. For the unpaired or multi-group case, permutations are performed by shuffling the group membership labels. Note that in each case the characteristic profiles measured on any given specimen remain intact, so as to preserve the correlation among the measured characteristics. With a small number of specimens, it may be possible to enumerate all possible permutations. Typically, however, the number of permutations is very large, so they are randomly sampled. For example, for paired breast tumor cases, permutations are performed by switching, with probability 1/2, the before and after gene expression profiles within each pair.

Type I (false positive) error rates
(FP = number of false positives; P = number of genes called positive; $H_0^C$ = the complete null hypothesis)
- Family-wise error rate: $\mathrm{FWER} = \Pr(FP \ge 1)$
- False discovery rate (BH): $\mathrm{FDR} = E(FP/P)$ (FDR = 0 if P = 0)
- False discovery rate (SAM): q-value $= E(FP \mid H_0^C)/P$
- False discovery proportion: $\mathrm{FDP} = \#FP/\#P$ (FDP = 0 if P = 0)

FWER
Traditional methods seek strong control of the family-wise Type I error rate (FWER). Control is strong in the sense that the error rate is controlled regardless of which variables satisfy the null hypothesis. If there are no effects at all, this controls the probability that any hypothesis is falsely rejected. For example, the Bonferroni correction provides strong control. It delivers an upper bound for the probability of a Type I error, that is, of rejecting the null hypothesis (accepting that there is an effect) by mistake (when there is no effect). The Bonferroni correction is conservative: the adjusted p-value can be much higher than the correct p-value. This can be seen with an extreme example.
If we were (unknowingly) measuring the same variable 1000 times and obtaining the same values each time, the Bonferroni-adjusted p-value would incorrectly be 1000 times higher than its actual value.

Control of the FWER
- Bonferroni single-step adjusted p-values: $p_j^* = \min(m\,p_j, 1)$
- Procedures that take into account the joint distribution of the test statistics:
  - Westfall & Young (1993) step-down minP adjusted p-values
  - Westfall & Young (1993) step-down maxT adjusted p-values
- Step-down procedures make successively smaller adjustments at each step and are less conservative than Bonferroni.

Stepwise procedures
The Bonferroni procedure is an example of a single-step procedure. It applies the same p-value correction, based on the total number of hypotheses, regardless of the ordering of the unadjusted p-values. It has long been recognized that stepwise (sequential) procedures can provide more power while maintaining error rate control. Stepwise procedures (step-down or step-up) allow each p-value to have its own correction, based not only on the number of hypotheses but also on the outcomes of the other hypothesis tests. Stepwise procedures start with the unadjusted p-values ordered either from most significant to least significant (the step-down order) or from least significant to most significant (the step-up order). The p-values are then successively adjusted, with each adjustment depending on the outcome of the previous tests.

Other proposals
Numerous methods are available for controlling the family-wise Type I error rate (FWE). However, the multiplicity problem in microarray data does not require protection against even a single Type I error, so the severe loss of power involved in such protection is not justified. Instead, it may be more appropriate to emphasize the proportion of errors among the identified differentially expressed genes. The expectation of this proportion is the false discovery rate (FDR). Korn et al.
proposed two stepwise permutation-based procedures that control, with specified confidence, the actual number of false discoveries and (approximately) the actual proportion of false discoveries, rather than their expected number and proportion. Simulation studies demonstrate a gain in sensitivity (power) to detect truly differentially expressed genes, even when allowing as few as one or two false discoveries. Applying the methods allows statements such as "with 95% confidence, the number of false discoveries does not exceed 2" or "with approximate 95% confidence, the proportion of false discoveries does not exceed 0.10".

Assessing significance under multiple testing via permutations: Korn's method
- Estimate p-values for each comparison (gene) by using the permutation distribution of the rank-specific t-statistics.
- For each of the $\binom{16}{8} = 12{,}870$ possible permutations of the trt/ctl labels, compute the two-sample t-statistic $t^*$ for each gene. Compute the distribution of the rank-k highest statistic $t^*_{(k)}$ across the permutations.
- The p-value for a false discovery count of k for a gene is estimated by the proportion of $t^*_{(k)}$ values greater than the observed $t$ in absolute value.
[Figure: family-wise significance with false discovery counts (FDC).]

FDR
Another type of error rate to consider in the multiple testing context is the false discovery rate (FDR). Here, one is prepared to tolerate some Type I errors, as long as their number is small compared to the total number of rejected null hypotheses. The FDR is defined as the expected proportion of rejected nulls that are false discoveries, i.e. nulls that are in fact true (or 0 if there are no rejected nulls). This error rate leads to procedures that are less conservative. These are therefore typically more appealing to scientists when the multiplicity is very high.

Control of the FDR
- Realized false discovery rate: $FP/P$
- Benjamini & Hochberg (1995): step-up procedure that controls the expected FDR under some dependency structures
- Benjamini & Yekutieli (2001): conservative step-up procedure that controls the expected FDR under general dependency structures
- "Significance Analysis of Microarrays (SAM)" q-value (2 versions):
  - Efron et al. (2000): weak control
  - Tusher et al. (2001): strong control
- "Korn's method" controls the false discovery counts and approximately controls the FDR:
  - Korn et al. (2003, 2004)

Benjamini & Hochberg
The BH procedure is a step-up procedure that provides strong control of the FDR. The key to interpreting it is to understand the meaning of the FDR. The FDR is the expected (average) proportion of "discoveries" (i.e. rejected null hypotheses) that are "false discoveries" (i.e. the null is really true). A rejected null corresponds to a gene identified by the test as "interesting", while a true null represents a gene that is in reality biologically "uninteresting". This means that if the BH-adjusted p-value is 0.05 (say), then for every 20 rejected nulls ("interesting" genes) you expect 1 to correspond in fact to a true null ("uninteresting" gene). The error rate control applies to what should happen "on average". The actual number of false discoveries per 20 rejected nulls may therefore be larger or smaller than 1. Also, the FDR alone cannot tell us which of the rejected nulls are the false discoveries.

One way to use the FDR is to prioritize genes for further follow-up. If there are no genes with a small FDR, you should be prepared for several of the "discoveries" not to hold up under further scrutiny. If the FDR for a gene is too large, it may be decided to concentrate resources on other genes or on other types of experiments.

Benjamini & Hochberg (continued)
Can power be improved while maintaining control over a meaningful measure of error?
Benjamini & Hochberg (1995) define a sequential p-value procedure that controls the expected FDR, $E(FP/P)$. Specifically, the BH procedure guarantees
$\mathrm{FDR} = E(FP/P) \le \frac{m_0}{m}\,\alpha \le \alpha$,
where $m$ is the number of hypotheses tested and $m_0$ the number of true null hypotheses. That is, for a pre-specified $0 < \alpha < 1$, the procedure gives a cutoff at which the expected FDR does not exceed the desired value. Benjamini and Hochberg advocated that the FDR should be controlled at some desirable level while maximizing the number of discoveries made. They offered the linear step-up procedure as a simple and general procedure that controls the FDR. The linear step-up procedure only makes use of the m p-values $p_1, \dots, p_m$, so it can be applied to any statistic that yields (single-test) p-values.

FDR estimation by SAM
[Figure]

R / Bioconductor
- limma: differential expression for designed experiments, moderated t and B statistics, FDR
  http://bioinf.wehi.edu.au/limma/
  http://bioinf.wehi.edu.au/limmaGUI/
- samr: SAM methods for differential expression and q-value (older: siggenes)
  http://www-stat.stanford.edu/~tibs/SAM/index.html
- multtest: adjustments for multiple hypotheses

Acknowledgements
Slides and text contributions by:
- Darlene Goldstein
- Terry Speed
- Asa Wirapati
- Members of the BCF