Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Canadian Bioinformatics Workshops www.bioinformatics.ca Module #: Title of Module 2 Module 3 Hypothesis testing Exploratory Data Analysis and Essential Statistics using R † Boris Steipe Toronto, September 8–9 2011 DEPARTMENT OF BIOCHEMISTRY DEPARTMENT OF MOLECULAR GENETICS † Includes material originally developed by Oedipus ponders the riddle of the Sphinx. Classical (~400 BCE) Raphael Gottardo introduction • One sample and two sample t-tests are used to test a hypothesis about the mean(s) of a distribution. • Gene expression: Is the mean expression level under condition 1 different from the mean expression level under condition 2? • Assume that the data are from a normal distribution. Module 3: Hypothesis testing bioinformatics.ca one sample t-test text Module 3: Hypothesis testing bioinformatics.ca example – HIV data Is gene 1 differentially expressed? Is the mean log ratio equal to zero? # One sample t-test data <- log(read.table(file="hiv.raw.data.24h.txt", sep="\t", header=TRUE)) # Compute the log ratios M <- (data[, 1:4] - data[, 5:8]) gene1 <- t.test(M[1, ], mu=0) x <- seq(-4, 4, 0.1) f <- dt(x, df=3) plot(x, f, xlab="x", ylab="density", type="l", lwd=5) segments(gene1$stat, 0, gene1$stat, dt(gene1$stat, df=3), col=3, lwd=5) segments(-gene1$stat, 0, -gene1$stat, dt(gene1$stat, df=3), col=2, lwd=5) segments(x[abs(x)>abs(gene1$stat)], 0, x[abs(x)>abs(gene1$stat)], f[abs(x)>abs(gene1$stat)], col=4, lwd=1) > gene1 One Sample t-test data: M[1, ] t = 0.7433, df = 3, p-value = 0.5112 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: -1.771051 2.850409 sample estimates: mean of x 0.539679 Module 3: Hypothesis testing p–value bioinformatics.ca example – HIV data Is gene 4 differentially expressed? Is the mean log ratio equal to zero? # One sample t-test data <- log(read.table(file="hiv.raw.data.24h.txt", sep="\t", header=TRUE)) # Compute the log ratios M <- (data[, 1:4] - data[, 5:8]) gene4 <- t.test(M[4, ], mu=0) x <- seq(-4, 4, 0.1) f <- dt(x, df=3) plot(x, f, xlab="x", ylab="density", type="l", lwd=5) segments(gene4$stat, 0, gene4$stat, dt(gene4$stat, df=3), col=3, lwd=5) segments(-gene4$stat, 0, -gene4$stat, dt(gene4$stat, df=3), col=2, lwd=5) segments(x[abs(x)>abs(gene4$stat)], 0, x[abs(x)>abs(gene4$stat)], f[abs(x)>abs(gene4$stat)], col=4, lwd=1) > gene4 One Sample t-test data: M[4, ] t = 3.2441, df = 3, p-value = 0.0477 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: 0.01854012 1.93107647 sample estimates: mean of x 0.9748083 Module 3: Hypothesis testing p–value bioinformatics.ca what is a p–value? a) A measure of how much evidence we have against the alternative hypothesis. b) The probability of making an error. c) Something that biologists want to be below 0.05 . d) The probability of observing a value as extreme or more extreme by chance alone. e) All of the above. Module 3: Hypothesis testing bioinformatics.ca two–sample t–test Test if the means of two distributions are the same. The data yi1, ..., yin are independent and normally distributed with mean μi and variance σ2, N (μi,σ2), where i=1,2. In addition, we assume that the data in the two groups are independent and that the variance is the same. H0: μ1 = μ2 Module 3: Hypothesis testing H1: μ1 = μ2 bioinformatics.ca two–sample t–test Module 3: Hypothesis testing bioinformatics.ca example revisited – HIV data Is gene 1 differentially expressed? Are the means equal? # Two sample t-test data <- log(read.table(file="hiv.raw.data.24h.txt", sep="\t", header=TRUE)) gene1 <- t.test(data[1,1:4], data[1,5:8], var.equal=TRUE) x <- seq(-4, 4, 0.1) f <- dt(x, df=6) plot(x, f, xlab="x", ylab="density", type="l", lwd=5) segments(gene1$stat, 0, gene1$stat, dt(gene1$stat, df=6), col=3, lwd=5) segments(-gene1$stat, 0, -gene1$stat, dt(gene1$stat, df=6), col=2, lwd=5) segments(x[abs(x)>abs(gene1$stat)], 0, x[abs(x)>abs(gene1$stat)], f[abs(x)>abs(gene1$stat)], col=4, lwd=1) > gene1 Two Sample t-test data: data[1, 1:4] and data[1, 5:8] t = 0.6439, df = 6, p-value = 0.5434 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -1.511041 2.590398 sample estimates: mean of x mean of y 2.134678 1.594999 Module 3: Hypothesis testing p–value bioinformatics.ca example revisited – HIV data Is gene 4 differentially expressed? Are the means equal? # Two sample t-test data <- log(read.table(file="hiv.raw.data.24h.txt", sep="\t", header=TRUE)) gene4 <- t.test(data[4,1:4], data[4,5:8], var.equal=TRUE) x <- seq(-4, 4, 0.1) f <- dt(x, df=6) plot(x, f, xlab="x", ylab="density", type="l", lwd=5) segments(gene4$stat, 0, gene4$stat, dt(gene4$stat, df=6), col=3, lwd=5) segments(-gene4$stat, 0, -gene4$stat, dt(gene4$stat, df=6), col=2, lwd=5) segments(x[abs(x)>abs(gene4$stat)], 0, x[abs(x)>abs(gene4$stat)], f[abs(x)>abs(gene4$stat)], col=4, lwd=1) > gene1 Two Sample t-test data: data[4, 1:4] and data[4, 5:8] t = 2.4569, df = 6, p-value = 0.04933 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 0.003964227 1.945652365 sample estimates: mean of x mean of y 7.56130 6.588321 Module 3: Hypothesis testing p–value bioinformatics.ca t–test assumptions Normality: The data need to be Normal. If not, one can use a transformation or a non parametric test. If the sample size is large enough (n>30), the t-test will work just fine (CLT). Independence: Usually satisfied. If not independent, more complex modeling is required. Independence between groups: In the two sample t- test, the groups need to be independent. If not, one can use a paired ttest. Equal variances: If the variances are not equal in the two groups, use Welch's t-test (default in R). Module 3: Hypothesis testing bioinformatics.ca Exercise Try Welch's t-test and the paired t-test. # Two sample t-test (Welch's) gene1 <- t.test(data[1, 1:4], data[1, 5:8]) x <- seq(-4, 4, 0.1) f <- dt(x, df=6) plot(x, f, xlab="x", ylab="density", type="l", lwd=5) segments(gene1$stat, 0, gene1$stat, dt(gene1$stat, df=6), col=2, lwd=5) segments(-gene1$stat, 0, -gene1$stat, dt(gene1$stat, df=6), col=2, lwd=5) segments(x[abs(x)>abs(gene1$stat)], 0, x[abs(x)>abs(gene1$stat)], f[abs(x)>abs(gene1$stat)], col=2, lwd=1) gene1 # Paired t-test gene1 <- t.test(as.double(data[4, 1:4]), as.double(data[4, 5:8]), paired=TRUE) x <- seq(-4, 4, 0.1) f <- dt(x, df=6) plot(x, f, xlab="x", ylab="density", type="l", lwd=5) segments(gene1$stat, 0, gene1$stat, dt(gene1$stat, df=6), col=2, lwd=5) segments(-gene1$stat, 0, -gene1$stat, dt(gene1$stat, df=6), col=2, lwd=5) segments(x[abs(x)>abs(gene1$stat)], 0, x[abs(x)>abs(gene1$stat)], f[abs(x)>abs(gene1$stat)], col=2, lwd=1) gene1 Compare with the previous results. Module 3: Hypothesis testing bioinformatics.ca non–parametric tests Constitute a flexible alternative to the t-tests where there is no distributional assumption. There exist several non parametric alternatives including the Wilcoxon and Mann-Whitney tests. In cases where a parametric test would be appropriate, non-parametric tests have less power. Module 3: Hypothesis testing bioinformatics.ca Exercise Use R to perform a non–parametric test (Wilcoxon) on the gene 1 data and on the gene 4 data. Hint:Type help.search("wilcoxon") Module 3: Hypothesis testing bioinformatics.ca permutation test When computing the p-value, all we need to know is the distribution of our statistics under the null hypothesis. How can we estimate the null distribution? In the two sample case, one could simply randomly permute the group labels and recompute the statistics to simulate the null distribution. Repeat for a number of permutations and compute the number of times you observed a value as extreme or more extreme. Module 3: Hypothesis testing bioinformatics.ca permutation test • Select a statistic (e.g. mean difference, t statistic) • Compute the statistic for the original data t. • For a number of permutations, Np • Randomly permute the labels and compute the associated statistic, t 0i • p-value = { # |t 0i| > |t | } / Np Module 3: Hypothesis testing bioinformatics.ca Exercise Try the permutation test. Interpret the result. # Permutation test gene 1 set.seed(100) Np <- 100 t0 <- rep(0, Np) # Here we use the difference in means, but any statistics could be used! t <- mean(as.double(data[1, 1:4]))-mean(as.double(data[1, 5:8])) for(i in 1:Np) { perm <- sample(1:8) newdata <- data[1, perm] t0[i] <- mean(as.double(newdata[1:4]))-mean(as.double(newdata[5:8])) } pvalue <- sum(abs(t0)>abs(t))/Np pvalue Note: this kind of test is also called "bootstrapping". Module 3: Hypothesis testing bioinformatics.ca the Bootstrap The basic idea is to resample the data we have observed and compute a new value of the statistic/estimator for each resampled data set. Then one can assess the estimator by looking at the empirical distribution across the resampled data sets. Module 3: Hypothesis testing set.seed(100) x <- rnorm(15) muHat <- mean(x) sigmaHat <- sd(x) Nrep <- 100 muHatNew <- rep(0, Nrep) for(i in 1:Nrep) { xNew <- sample(x, replace=TRUE) muHatNew[i] <- median(xNew) } se <- sd(muHatNew) muHat se bioinformatics.ca Error rates Truth Decision Accept H0 H0 H1 1- "False negative" Reject H0 1- "False positive" Module 3: Hypothesis testing bioinformatics.ca Error rates (statistician's version) Truth Decision Accept H0 H0 H1 1- "Type II error" Reject H0 1- "Type I error" Module 3: Hypothesis testing bioinformatics.ca statistical "power" The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is false (i.e. that it will not make a Type II error, or a false negative decision). As the power increases, the chances of a Type II error occurring decrease. The probability of a Type II error occurring is referred to as the false negative rate (β). Therefore power is equal to 1 − β, which is also known as the sensitivity. Power analysis can be used to calculate the minimum sample size required so that one can be reasonably likely to detect an effect of a given size. Power analysis can also be used to calculate the minimum effect size that is likely to be detected in a study using a given sample size. In addition, the concept of power is used to make comparisons between different statistical testing procedures: for example, between a parametric and a nonparametric test of the same hypothesis. From Wikipedia – Statistical_Power Module 3: Hypothesis testing bioinformatics.ca One sample t-test – power calculation 1 sample t-test: If the mean is μ0, t follows a t-distribution with n-1 degrees of freedom. If the mean is not μ0, t follows a non central t-distribution with n-1 degrees of freedom and noncentrality parameter (μ1-μ0) x (s/√n). Module 3: Hypothesis testing bioinformatics.ca Power, error rates and decision Power calculation in R: > power.t.test(n = 5, delta = 1, sd=2, alternative="two.sided", type="one.sample") One-sample t test power calculation n=5 delta = 1 sd = 2 sig.level = 0.05 power = 0.1384528 alternative = two.sided Other tests are available – see ??power. Module 3: Hypothesis testing bioinformatics.ca Power, error rates and decision PR(False Negative) PR(Type II error) μ0 μ 1 PR(False Positive) PR(Type I error) Module 3: Hypothesis testing bioinformatics.ca Power, error rates and decision Module 3: Hypothesis testing bioinformatics.ca multiple testing Single hypothesis testing • Fix the False Positive error rate (eg. α = 0.05). • Minimize the False Negative (maximize sensitivity) This is what traditional testing does. What if we perform many tests at once? Does this afect our False Positive rate? Module 3: Hypothesis testing bioinformatics.ca multiple testing With high-throughput methods, we usually look at a very large number of decisions for each experiment. For example, we ask for every gene on an array whether it is significantly up- or downregulated. This creates a multiple testing paradox. The more data we collect, the harder it is for every observation to appear significant. Therefore: • We need ways to assess error probability in multiple testing situations correctly; • We need approaches that address the paradox. Module 3: Hypothesis testing bioinformatics.ca multiple testing example 1000 t-tests, all null hypotheses are true (μ = 0). For one test, Pr of a False Positive is 0.05. For 1000 tests, Pr of at least one False Positive is 1–(1–0.05)1000 ≈ 1 > set.seed(100) > y <- matrix(rnorm(100000), 1000, 5) > myt.test <- function(y){ + t.test(y, alternative="two.sided")$p.value +} > P <- apply(y, 1, myt.test) > sum(P<0.05) [1] 44 Module 3: Hypothesis testing bioinformatics.ca FWER The FamilyWise Error Rate is the probability of having at least one False Positive (making at least one type I error) in a "family" of observations. Example: Bonferroni multiple adjustment. p̃g = N x pg If p̃g ≤ α then FWER ≤ α This is simple and conservative, but there are many other (more powerful) FWER procedures. Module 3: Hypothesis testing bioinformatics.ca False Discovery Rate (FDR) The FDR is the proportion of False Positives among the genes called Differentially Expressed (DE). Order the p-values for each of N observations: p(1) ≤ . . . ≤ p(i ) ≤ . . . ≤ p(N) Let k be the largest i such that p(i ) ≤ i / N x α ... then the FDR for genes 1 ... k is controlled at α. Hypotheses need to be independent! FDR: Benjamini and Hochberg (1995) Module 3: Hypothesis testing bioinformatics.ca FDR example # FDR set.seed(100) N <- 10000 alpha <- 0.05 y1 <- matrix(rnorm(9000*4, 0, 1), 9000, 4) y2 <- matrix(rnorm(1000*4, 5, 1), 1000, 4) y <- rbind(y1, y2) myt.test <- function(y){ t.test(y, alternative="two.sided")$p.value } P <- apply(y, 1, myt.test) sum(P<alpha) Psorted <- sort(P) plot(Psorted, xlim=c(0, 1000), ylim=c(0, .01)) abline(b=alpha/N, a=0, col=2) p <- p.adjust(P, method="bonferroni") sum(p<0.05) p <- p.adjust(P, method="fdr") sum(p<0.05) # Calculate the true FDR sum(p[1:9000]<0.05)/sum(p<0.05) Module 3: Hypothesis testing bioinformatics.ca FDR – applied to the HIV dataset # Lab on HIV data data <- log(read.table(file="hiv.raw.data.24h.txt", sep="\t", header=TRUE)) M <- data[, 1:4]-data[, 5:8] A <- (data[, 1:4]+data[, 5:8])/2 Bonferroni p-values < 0.05 # Here I compute the mean over the four replicates M.mean <- apply(M, 1, "mean") A.mean <- apply(A, 1, "mean") n <- nrow(M) # Basic normalization for(i in 1:4) M[, i] <- M[, i]-mean(M[, i]) p.val <- rep(0, n) for(j in 1:n) p.val[j] <- t.test(M[j, ], alternative = c("two.sided"))$p.value p.val.tilde <- p.adjust(p.val, method="bonferroni") plot(A.mean, M.mean, pch=".") points(A.mean[p.val<0.05], M.mean[p.val<0.05], col=3) points(A.mean[p.val.tilde<0.05], M.mean[p.val.tilde<0.05], col=4) Module 3: Hypothesis testing bioinformatics.ca FDR – applied to the HIV dataset FDR < 0.1 p.val.tilde <- p.adjust(p.val, method="fdr") plot(A.mean, M.mean, pch=".") points(A.mean[p.val<0.05], M.mean[p.val<0.05], col=3) points(A.mean[p.val.tilde<0.1], M.mean[p.val.tilde<0.1], col=2) Module 3: Hypothesis testing bioinformatics.ca FDR – applied to the HIV dataset M.sd <- apply(M, 1, "sd") FDR < 0.1 plot(M.mean, M.sd) points(M.mean[p.val.tilde<0.1], M.sd[p.val.tilde<0.1], col=2) The samples included under the FDR criterion are predominantly characterized by a small variance. Is this biologically meaningful? Module 3: Hypothesis testing bioinformatics.ca t–test revisited 1 sample t-test: If the mean is zero, tg approximately follows a t-distribution with R – 1 degrees of freedom. Problems: 1) For many genes S is small! 2) Is t really t-distributed? Module 3: Hypothesis testing bioinformatics.ca t–test revisited 1) For many genes S is small! Use a modified t–statistic. tg = Mg (Sg + c) R (Positive constant) 2) Is t really t-distributed? Estimate the null distribution (through permutation). If there is no differential expression (i.e. Sample and Control are drawn from the same distribution and H0 is true) permuting the columns of the data matrix should not change the statistics. Module 3: Hypothesis testing bioinformatics.ca SAM Module 3: Hypothesis testing bioinformatics.ca SAM library(samr) y <- c(1, 1, 1, 1) data.list <- list(x=M, y=y, geneid=as.character(1:nrow(M)), genenames=paste("g", as.character(1:nrow(M)), sep=""), logged2=TRUE) res <- samr(data.list, resp.type=c("One class"), s0=NULL, s0.perc=NULL, nperms=100, center.arrays=FALSE, testStatistic=c("standard")) delta.table <- samr.compute.delta.table(res, nvals=500) kk <- 1 while(delta.table[kk, 5]>0.1) kk <- kk+1 delta <- delta.table[kk, 1] siggenes.table <- samr.compute.siggenes.table(res, delta, data.list, delta.table) ind.diff <- sort(as.integer(c(siggenes.table$genes.up[, 3], siggenes.table$genes.lo[, 3]))) n1 <- dim(data)[1] ind.log <- rep(FALSE, n1) ind.log[ind.diff] <- TRUE plot(A.mean, M.mean, pch=".") points(A.mean[ind.log], M.mean[ind.log], col=3) Module 3: Hypothesis testing bioinformatics.ca SAM FDR 10% Module 3: Hypothesis testing bioinformatics.ca LIMMA Exercise: perform the same analysis with LIMMA. ### Limma ### source("http://bioconductor.org/biocLite.R") biocLite("limma") ## Compute necessary parameters (mean and standard deviations) library(limma) fit <- lmFit(M) ## Regularize the t-test fit <- eBayes(fit) ## Default adjustment is BH (FDR) topTable(fit, p.value=0.05, number=100) Module 3: Hypothesis testing bioinformatics.ca Multiple conditions Module 3: Hypothesis testing bioinformatics.ca other alternatives • Non parametric versions of the t/F–test • Modified versions of the t/F–test (SAM) • Empirical and fully Bayesian approaches Module 3: Hypothesis testing bioinformatics.ca summary Sample size Number of tests p=1 p>1 n < 30 non-parametric t-test/F- test regularized t-test/F-test (e.g. SAM, limma) + multiple testing n ≥ 30 t-test, F- test t-test, F- test + multiple testing Multiple testing: If hypotheses are independent or weakly dependent use an FDR correction, otherwise use Bonferroni's FWER. For more complex hypotheses, try an anova (p=1) or limma (p>1). Module 3: Hypothesis testing bioinformatics.ca We are on a Coffee Break & Networking Session Module 3: Hypothesis testing bioinformatics.ca