Lecture 9. Hypothesis testing
Mathematical Statistics and Discrete Mathematics
November 30th, 2015

Motivating example

• A politician running for the presidency in the USA made a controversial statement on national TV. He is rightfully worried that the statement had a negative influence on the number of people who will vote for him. He wants us to conduct a poll to check whether fewer people support him now compared to the previous study. The previous study showed that 20% will vote for the politician.

• Let p be the true proportion of the politician's supporters in the population. We define two mutually exclusive hypotheses: H0 : p ≥ 0.2 and H1 : p < 0.2. We assume that each person either supports the candidate or does not. We plan to ask 225 randomly chosen people for their opinion. If H0 is true, then on average around 45 or more of them should support the candidate. We want to reject H0 if the number of supporters is too small compared to this prediction.

• The question is: what constitutes "too small"? The answer is: this is up to us. We decide to reject H0 if the number of supporters is smaller than 35.

• We conduct the poll and observe 37 supporters. The result is hence not convincing enough (for us), and we (sadly) do not reject H0.

Hypothesis testing (1)

(1) In a hypothesis testing situation, we are interested in a population parameter θ, and we have a preconceived notion concerning its value. Based on this, we define two mutually exclusive hypotheses. The one that we hope the evidence will support is called the research hypothesis and is denoted by H1. Its negation is called the null hypothesis and is denoted by H0. The hypotheses are usually of one of the forms

    H0 : θ ≤ θ0,  H1 : θ > θ0,
    H0 : θ ≥ θ0,  H1 : θ < θ0,
    H0 : θ = θ0,  H1 : θ ≠ θ0.

The value θ0 is called the null value and is included in H0.
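The decision rule of the motivating example can be sketched with Python's standard library (a sketch, not part of the lecture; the exact binomial tail printed at the end previews the significance-level exercise that appears later):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(T = k) for T ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(T <= k) for T ~ Binom(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p0, cutoff = 225, 0.2, 35  # poll size, boundary of H0, "too small" cutoff

def decide(observed: int) -> str:
    """Reject H0 : p >= 0.2 when the supporter count is smaller than the cutoff."""
    return "reject H0" if observed < cutoff else "do not reject H0"

print(decide(37))                    # the observed poll: 37 supporters
print(binom_cdf(cutoff - 1, n, p0))  # chance the rule rejects when p is exactly 0.2
```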
Recall that, in contrast to hypothesis testing, in a typical estimation problem there is no preconceived notion concerning the value of the parameter θ.

(1) A producer of cookies claims that less than 10% of packs contain broken cookies. We question the claim and want to perform hypothesis testing. Our null value is p0 = 0.1, and we formulate the relevant hypotheses: H0 : p ≤ 0.1, H1 : p > 0.1.

Hypothesis testing (2)

(2) We have to consider the statistical assumptions concerning the distribution of the data. We need to decide on a test statistic T whose distribution we can establish under the assumption that θ = θ0.

(2) To a pack of cookies we assign a random variable X which follows a Bernoulli distribution. The event {X = 1} means that the pack contains broken cookies, and the event {X = 0} means that there are none. We assume that H0 is true, which means that X ∼ Bernoulli(p0) = Bernoulli(0.1). We are planning to take a sample of 100 packs of cookies. Our test statistic is T = X1 + X2 + ... + X100 ∼ Binom(100, 0.1).

Hypothesis testing (3)

(3) Before collecting the data we select the significance level α, a probability threshold below which the null hypothesis will be rejected. Common values are 5% and 1%. We then compute a critical region of values of T such that the probability of observing a value t of T in this region, under the assumption that θ = θ0, is α.

(3) We set α = 0.05. We use the central limit theorem to approximate T ∼ Binom(100, 0.1) by a normal variable with mean 10 and variance 9. We want to find a number a0.05 such that P(T > a0.05) = 0.05. Using that P(Z > 1.64) = 0.05, we obtain that P(T > 14.92) = 0.05. Our critical region is hence the set {t : t ≥ 15}.

Hypothesis testing (4)

(4) We compute the observed value t of the statistic T. If it is in the critical region, then we reject H0. Otherwise, we do not reject it.
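The cutoff computation in step (3) can be reproduced with Python's standard library (a sketch: `statistics.NormalDist.inv_cdf` replaces the Z-table lookup, using the slide's normal approximation):

```python
from math import ceil, sqrt
from statistics import NormalDist

n, p0, alpha = 100, 0.1, 0.05

# Normal approximation of T ~ Binom(100, 0.1) from the slide:
# mean n*p0 = 10 and variance n*p0*(1 - p0) = 9, i.e. standard deviation 3.
approx = NormalDist(mu=n * p0, sigma=sqrt(n * p0 * (1 - p0)))

# Smallest a with P(T > a) = alpha: the 95th percentile of the approximation.
a = approx.inv_cdf(1 - alpha)
critical_region_start = ceil(a)
print(a, critical_region_start)  # a is about 14.9, so the region is {t : t >= 15}
```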
(4) We buy 100 packs of cookies and 17 of them contain broken cookies. 17 belongs to the critical region, hence the result is statistically significant at the 5% significance level, and we reject the null hypothesis.

Possible outcomes

The possible outcomes of hypothesis testing are:
• We correctly fail to reject H0 when H0 is true.
• We reject H0 when H0 is true, and we make a type I error.
• We correctly reject H0 when H1 is true.
• We fail to reject H0 when H1 is true, and we make a type II error.

The probability of making a type I error is called the significance level of the test and is denoted by α. The probability of making a type II error is denoted by β. The number 1 − β is called the power of the test and is the probability of rejecting H0 when H1 is true.

X Compute the significance level of our test for the American politician. Use the normal approximation of the binomial distribution Binom(225, 0.2). Our critical region was set to be {0, 1, ..., 34}.
X Compute the power of our test for the American politician if the true proportion of supporters is p = 0.1.

Significance testing

Significance testing is an alternative (often more convenient) strategy to hypothesis testing:
• It follows the same first two steps (1) and (2) as hypothesis testing.
• The difference is that step (3), which involves choosing α, is skipped.
• After step (2), we evaluate the observed value t of the test statistic T.
• We then evaluate the probability of observing values as extreme as t under the assumption that θ = θ0. This probability is referred to as the p-value of the test. Note that the interpretation of the phrase "as extreme" depends on the form of H0.

The p-value of the test is the smallest level at which we could have set α and still have been able to reject H0.

X Compute the p-value for the American politician example: Binom(225, 0.2), t = 37.
X Compute the p-value for the cookie example: Binom(100, 0.1), t = 17.
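The four exercises above can be checked numerically with exact binomial tail sums (a sketch, standard library only; the critical region {0, ..., 34} follows the "smaller than 35" rule of the motivating example):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(T = k) for T ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def left_tail(t, n, p):
    """P(T <= t) for T ~ Binom(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(t + 1))

def right_tail(t, n, p):
    """P(T >= t) for T ~ Binom(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(t, n + 1))

# Significance level: probability of landing in {0, ..., 34} when p = 0.2.
alpha = left_tail(34, 225, 0.2)

# Power at p = 0.1: probability of (correctly) rejecting with the same region.
power = left_tail(34, 225, 0.1)

# p-values: left-tailed for the politician (t = 37), right-tailed for the cookies (t = 17).
p_politician = left_tail(37, 225, 0.2)
p_cookies = right_tail(17, 100, 0.1)
print(alpha, power, p_politician, p_cookies)
```

The exact sums land close to the normal approximations the slides use, which is expected for these sample sizes.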
X A street crook is accused of using a biased coin while claiming that it is fair. Since we do not know which side is favoured, we set H0 to be the hypothesis that the coin is fair, that is, p = 1/2. We toss the coin 40 times and observe 28 heads. Compute the p-value of the two-sided test.

Hypothesis and significance tests on the mean

The form of the critical region is aligned with the form of H1:

    H0 : µ ≤ µ0,  H1 : µ > µ0,  CR = {t : t > tc}                      right-tailed test
    H0 : µ ≥ µ0,  H1 : µ < µ0,  CR = {t : t < tc}                      left-tailed test
    H0 : µ = µ0,  H1 : µ ≠ µ0,  CR = {t : t > tc} ∪ {t : t < t′c}      two-tailed test

As with confidence intervals, we will consider the following cases:
• normal data and variance known,
• arbitrary data, large sample size and variance known (central limit theorem),
• normal data and variance unknown (small samples "≤ 30": t-distribution; large samples "> 30": normal distribution).

Note that in the previous examples we always considered binomial distributions, where the variance is determined by the mean, and hence we did not have to consider the scenarios above.

Doctors think that the average weight µ of a newborn baby in Sweden has grown compared to the previous estimate of 3.5 kg. The standard deviation is thought not to have changed and is assumed to be 0.8 kg. We set the hypotheses:

    H0 : µ ≤ 3.5,  H1 : µ > 3.5.

We assume that the data is normally distributed and consider the test statistic

    T = (X̄ − µ0)/(σ/√n) ∼ N(0, 1).

We weigh a random sample of 10 babies and obtain the results: 2.43, 3.84, 3.92, 3.88, 4.78, 1.95, 4.65, 4.20, 3.35, 4.40. We have x̄ = 3.74, and the value of the test statistic is

    t = (x̄ − µ0)/(σ/√n) = (3.74 − 3.5)/(0.8/√10) = 0.95.

The p-value is P(T > t) = P(Z > 0.95) = 1 − 0.83 = 0.17.

Consider the same problem but with the assumption that the variance is unknown.
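Before treating the unknown-variance case, here is a sketch (standard library only) that computes the two-sided coin p-value and reproduces the z-test above; exact binomial tails and `NormalDist` replace the table lookups:

```python
from math import comb, sqrt
from statistics import NormalDist, mean

def right_tail(t, n, p):
    """P(T >= t) for T ~ Binom(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))

# Two-sided coin test: T ~ Binom(40, 1/2) under H0, observed t = 28 heads.
# By symmetry of Binom(40, 1/2), "as extreme as 28" means <= 12 or >= 28 heads.
p_coin = 2 * right_tail(28, 40, 0.5)
print(p_coin)

# One-sample z-test for the newborn weights (sigma = 0.8 assumed known).
weights = [2.43, 3.84, 3.92, 3.88, 4.78, 1.95, 4.65, 4.20, 3.35, 4.40]
mu0, sigma = 3.5, 0.8
t_stat = (mean(weights) - mu0) / (sigma / sqrt(len(weights)))
p_weights = 1 - NormalDist().cdf(t_stat)  # right-tailed: H1 : mu > 3.5
print(t_stat, p_weights)
```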
We consider the statistic

    T = (X̄ − µ0)/(S/√n) ∼ T9,

a t-distribution with 10 − 1 = 9 degrees of freedom. Using that s = 0.92, we compute the value of this statistic for the same data:

    t = (x̄ − µ0)/(s/√n) = (3.74 − 3.5)/(0.92/√10) = 0.824.

The p-value is P(T > t) = P(T9 > 0.824). Since 0.824 lies between t0.25 = 0.703 and t0.1 = 1.383, the p-value is between 0.1 and 0.25.
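A sketch of the same computation (standard library only): Python's standard library has no t-distribution CDF, so the slide's table values for 9 degrees of freedom are hard-coded to bracket the p-value.

```python
from math import sqrt
from statistics import mean, stdev

weights = [2.43, 3.84, 3.92, 3.88, 4.78, 1.95, 4.65, 4.20, 3.35, 4.40]
mu0 = 3.5
n = len(weights)
s = stdev(weights)  # sample standard deviation, about 0.92
t_stat = (mean(weights) - mu0) / (s / sqrt(n))
print(s, t_stat)

# Table values for T9 quoted on the slide: right-tail areas 0.25 and 0.10.
t_025, t_010 = 0.703, 1.383
assert t_025 < t_stat < t_010  # hence 0.10 < p-value < 0.25
```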