Download Slide - Wellcome Trust Centre for Human Genetics

Dealing With Statistical Uncertainty Richard Mott Wellcome Trust Centre for Human Genetics Synopsis • • • • • • • • Hypothesis testing P-values Confidence intervals T-test Non-parametric tests Permutations Bootstrap Contingency Tables Hypothesis testing Null Hypothesis H0 Examples of H0: – Mean of a population is 3.0 – In a genetic association study, there is no association between disease state and the genotypes of a particular SNP Alternative Hypothesis H1 • Examples of H1: – Mean is 4.0 – Mean > 3.0 – Mean != 3.0 – There is an association between disease and genotype The Normal Distribution X ~ N(m,s2) • Mean m • Variance s2 • Density function – exp(-(y-m)2)/(2ps2)0.5 • Many quantitative traits have Normal Distributions – Height – Weight • Central Limit Theorem – average of many random quantities usually has a Normal Distribution even if the individual observations do not – Mean(X) ~ N(m,s2/n) P-values • The P-value of a statistic Z is the probability of sampling a value more extreme than that observed value z if the null hypothesis is true. • For a one-sided test, the p-value would be a = Prob( Z > z | H0) • For a two-sided test it is a = Prob( |Z| > z | H0) • Note this is not the probability density at x, but the area of the tails Upper and lower 2.5% tails of the N(0,1) distribution, corresponding to |z|=1.96 Power • The power of a statistic Z is the probability of sampling a value more extreme than the observed value z if the alternative hypothesis is true. • For a one-sided test, the power would be b = Prob( Z > z | H1) • For a two-sided test it is b = Prob( |Z| > z | H1) • Note this is not the probability density at x, but area in the tails H0: N(0,1) H1: N(1.5,1) H2: N(4,1) 0.05 = Prob( Z> 1.644|H0) (5% upper tail) 0.44 = Prob( Z> 1.644|H1) 0.99 = Prob( Z> 1.644|H2) The Likelihood • • Likelihood = “probability of the data given the model” Basis of parametric statistical inference • Different hypotheses can often be expressed in terms of different values of the parameters q • Often use log of the likelihood • Normal distribution likelihood, q = (m,s): • Hypothesis testing equivalent to comparing the likelihoods for different q L(q ) = Õ Pr(y i | q ) i The Likelihood Ratio Test • General Framework for constructing hypothesis tests: S = Likelihood( data | H1) / Likelihood( data | H0) Reject H0 if S > s(H0, H1) Threshold s is chosen such that there is a probability a of making a false rejection under H0. a is the size of the test (or false positive rate or Type I error) e.g. a=0.05 b is the power of the test, the probability of correctly rejecting H0 when H1 is true e.g. b=0.8 1- b is the type II error, the false negative rate Generally, for fixed sample size n, if we fix a then we can’t fix b If we fix a and b then we must let n vary. The Neyman Pearson Lemma states that the likelihood ratio test is the most powerful of all tests of a given size Type III error: "correctly rejecting the null hypothesis for the wrong reason". Example: The mean of a Normal Distribution H0: mean = m0 vs H1: mean = m1 data: y1, … yn independently and identically distributed with a Normal distribution N(m,s2) Assume variance is the same and is known log(Lik. ratio) µ 1 y i + f ( m0, mi ,s ) å n i Therefore we base all inferences on the sample mean compared to the difference in the hypothesised means Note the distribution of the sample mean is N(m,s2/n) The Normal and T distributions • For a sample from a Normal distribution with known mean and variance, the sample mean can be standardised to follow the standard Normal distribution • And the 95% confidence interval for the mean is given by • But we often wish to make inferences about samples from a Normal Distribution when the variance is unknown: The variance must be estimated from the data as the sample variance • Because of this uncertainty the distribution of the sample mean is broader, and follows the Tn-1 distribution The T distribution • The Tn and Normal distributions are almost identical for n>20, • As a rule of thumb, an approximate 95% confidence interval for n>20 is T-tests • The T-test compares the means of samples • The T-test is optimal (in Likelihood-ratio sense) if the data are sampled from a Normal distribution • Even if the data are not Normal the test is useful in large samples • There are several versions, depending on the details T tests for the comparison of sample means One Sample Test • One sample y1 ….. yn – H0: the sample mean = m0 – H1: the sample mean != m0 – Reject H0 if – tn-1(a/2) is the quantile of the Tn-1 distribution such that the probability of exceeding tn-1(a/2) is a/2 – Two sided test – in R, tn-1(0.025) = qt(0.025,n-1) T tests for the comparison of sample means Paired T-test • Two samples of paired data (x1,y1), (x2,y2) …. (xn,yn) • Example: – two experimental replicates taken from n individuals, we wish to test if the replicates are statistically similar • • • • H0: mean(x) = mean(y) Take differences d1= x1 - y1 … dn= xn - yn H0: mean(d) = 0 One sample T-test with m0=0 T tests for the comparison of sample means Two-Sample T-test • Two samples of unpaired data x1, x2 ….. xn y1, y2 ….. ym • • H0: mean(x) = mean(y) If we assume the variance is the same in each group then we can estimate the variance as the pooled estimator • The test statistic is (The case with unequal variances is more complicated) T tests in R t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...) • • • • • • • • • • x - a numeric vector of data values. y - an optional numeric vector data values. alternative - a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". mu - a number indicating the true value of the mean paired - a logical indicating whether you want a paired t-test. var.equal - a logical variable indicating whether to treat the two variances as being equal. conf.level - confidence level of the interval. formula - a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups. data - an optional data frame containing the variables in the model formula. subset - an optional vector specifying a subset of observations to be used. T tests in R samp1 <- rnorm( 20, 0, 1) # sample 20 numbers from N(0,1) distribution samp2 <- rnorm( 20, 1.5, 1) # sample from N(1.5,1) t.test( samp1, samp2,var.equal=TRUE ) Two Sample t-test data: samp1 and samp2 t = -3.5892, df = 38, p-value = 0.000935 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -1.9272037 -0.5372251 sample estimates: mean of x mean of y 0.4040969 1.6363113 Outliers A small number of contaminating observations with different distribution • Outliers can destroy statistical genuine significance Reason: the estimate of the variance is inflated, reducing the T statistic • And sometimes create false significance : Type III error: "correctly rejecting the null hypothesis for the wrong reason". • There are Robust Alternatives to the T-test: Wilcoxon tests > samp3 <- samp2 > samp3[1] = 40 # add one outlier > t.test( samp1, samp3,var.equal=TRUE ) Two Sample t-test data: samp1 and samp3 t = -1.5911, df = 38, p-value = 0.1199 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -7.0519594 0.8450837 sample estimates: mean of x mean of y 0.4040969 3.5075347 > var(samp2) [1] 1.160251 > var(samp3) [1] 74.88977 Non-parametric tests Non-parametric tests: Wilcoxon signed rank test (alternative to Paired T-test) • • • • • • • • Two samples of paired data (x1,y1), (x2-y2) …. (xn,yn) Take differences d1= x1 - y1 … dn= xn – yn H0: distribution of d’s is symmetric about 0 Compute the ranks of the absolute differences, rank(|d1|) etc Compute the signs of the differences Compute W, the sum of the ranks with positive signs If H0 true then W should be close to the mean rank n/2 The distribution of W is known and for n>20 a Normal approximation is used. • in R wilcox.test(x, y, alternative = c("two.sided", "less", "greater"), mu = 0, paired = TRUE, exact = NULL, correct = TRUE, conf.int = FALSE, conf.level = 0.95, ...) Wilcoxon rank sum test (alternative to unpaired T-test) • Two samples of unpaired data x1, x2 ….. xn y1, y2 ….. ym • • Arrange all the observations into a single ranked series of length N = n+m. R1 = sum of the ranks for the observations which came from sample 1. – • • The sum of ranks in sample 2 follows by calculation, since the sum of all the ranks equals N(N + 1)/2 Compute U = min(R1 ,R2) - n(n + 1)/2. z = (U –m)/s Normal approximation m = nm/2 s2 = nm(n=m+1)/12 • The Wilcoxon Rank sum test is only about 5% less efficient than the unpaired T-test (in terms of the sample size required to achieve the same power) in large samples. Wilcoxon rank sum test (alternative to unpaired T-test) >wilcox.test(samp1,samp2) Wilcoxon rank sum test data: samp1 and samp2 W = 82, p-value = 0.001041 alternative hypothesis: true mu is not equal to 0 > wilcox.test(samp1,samp3) # with outliers Wilcoxon rank sum test data: samp1 and samp3 W = 82, p-value = 0.001041 alternative hypothesis: true mu is not equal to 0 Binomial Sign Test (T-test on one sample) • One sample y1 ….. yn H0: the sample mean = m0 • Count M, the number of y’s that are > m0 • If H0 is true, this should be about n/2, and should be distributed as a Binomial random variable B(1/2,n) The Binomial Distribution B(n,p) • • • • • • • • • • Probability distribution of getting r successes in n independent trials Discrete Distribution (takes values 0,1,….,n) p = probability of success in one trial coin tossing with a fair coin: p=0.5 P(r|n,p) = pr(1-p)n-r n!/r!(n-r)! n!/r!(n-r)! is the number of ways of choosing r objects from a set of n pr(1-p)n-r is the probability that in a particular choice, there are exactly r successes Mean = p Variance = p(1-p) Normal Approximation for the proportion of successes r/n: – valid in limit for large n – r/n ~ N( p, p(1-p)/n ) Permutation Tests Another non-parametric Alternative • Permutation is a general principle with many applications to hypothesis testing. • But it is not useful for parameter estimation • The method is best understood by an example: – Suppose a data set is classified into N groups. We are interested in whether the differences between group means (eg if N=2 then the two sample T test or Wilcoxon Rank Sum test is appropriate) – Compute a statistic S (eg the difference between the maximum and minimum group mean) – If there are no differences between groups then any permutation of the group labellings between individuals should produce much the same result for S – Repeat i = 1 to M times: • permute the data • compute the statistic on the permuted data, Si – The Permutation p-value is the fraction of permutations that exceed S • It’s that simple! Bootstrapping • Given a set of n things S = {x1 …. x2}, a bootstrap sample is formed by sampling with replacement from S, to create a new set S* of size n • • Original Resample • Some elements of S will be missing in S* and some will be present multiply often (about 1/e = 63% of the elements will be present in S* on average) • S* can be used as a replacement for S in any algorithm that works on S, and we can repeatedly draw further bootstrap samples. • Any statistic Z that can be calculated on S can also be calculated on S*, and so we can construct the bootstrap sampling distribution of Z from repeated samples. • From this we can make inferences about the variation of Z, eg construct a 95% confidence interval (if Z is numeric) • Can be used for parameter estimation Bootstrapping • Bootstrapping is a general way to evaluate the uncertainty in estimates of statistics whilst making few assumptions about the data • It is particularly useful for complex statistics whose distribution cannot be evaluated analytically • Example: – In a gene expression study, we wish to find a minimal set of SNPs that in combination will predict the variation in a phenotype. – Suppose we have a deterministic algorithm A that will find us a plausible set of SNPs (ie a model). • This is likely to be a complex programme and contain some arbitrary steps. • It is also unlikely to give any measure of uncertainty to the model. – There may be many possible sets of SNPs with similar claims, so we would like a way to average across different models, and evaluate the evidence for a SNP as the fraction of models in which the SNP occurs • To do this, we need a way to perturb the data, to generate new data sets on which we can apply the procedure A • Bootstrapping is one way to do this Contingency Tables Smoker Non-smoker Male 10 10 20 Female 5 12 17 15 22 37 Do Males and Females have different rates of smoking? Contingency Tables • Data are a table of counts, with the row and column margins fixed. n11 n12 r1c1/N r1c2/N n21 n22 r2c1/N r2c2/N c1 = n11+ n21 c2 = n12+ n22 r1 = n11+ n12 r2 = n21+ n22 N • Ho: the counts in each cell are consistent with the rows and columns acting independently The Chi-Squared statistics 2 (O E ) å ij ij / Eij X2 = ij log LikRatio = 2åOij log(Oij / Eij ) ij »c 2 (R-1)(C -1) Fisher’s Exact Test for 2 x 2 Contingency tables • Similar idea to a permutation test (see below) • Given the marginal totals in a 2x2 table, we can compute the probability P of getting the observed cell counts assuming the null hypothesis of no associations between classifying factors. – hypergeometric distribution. • Find all tables with same margins and with probabilities no bigger than P • P-value of the table is the sum of these probabilities Fisher’s Exact Test Male Female Smoker a c a+c Non-smoker b d b+d a+b c+d n Male Female Smoker 10 5 15 Non-smoker 10 12 22 20 17 37 æ a + böæ c + d ö ç ÷ç ÷ è a øè c ø P= æ n ö ç ÷ èa + cø R implementation: m <- matrix(c(10,10,5,12),nrow=2) fisher.test(m) Fisher's Exact Test for Count Data data: m p-value = 0.3152 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 0.513082 11.942154 sample estimates: odds ratio 2.342796 >

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Slide - Wellcome Trust Centre for Human Genetics