5. Hypothesis Testing

Using Data to Test Hypotheses
"When it is not in our power to determine what is true, we ought to follow what is most probable."
• Confidence intervals are one of the two most common types of formal statistical inference. They are appropriate when our goal is to estimate a population parameter (for example, intervals such as [68.5, 70.5] or [1222, 1300]).
• The second common type of inference is directed at a different goal: to assess the evidence provided by the data in favour of some claim about the population. For example, for a 'mechanical aptitude' score, is μ > 75?

Hypotheses
• In Statistics, a hypothesis proposes a model. Then we look at the data.
• If the data are consistent with that model, we have no reason to disbelieve the hypothesis. Data consistent with the model lend support to the hypothesis, but do not prove it.
• But if the data are inconsistent with the model, we must decide whether they are inconsistent enough to disbelieve the model. If they are, we can reject the model.

Hypotheses (cont.)
• Think about the logic of jury trials: To prove someone is guilty, we start by assuming they are innocent. We retain that hypothesis until the facts make it unlikely beyond a reasonable doubt. Then, and only then, do we reject the hypothesis of innocence and declare the person guilty.
• The same logic is used in statistical tests of hypotheses: We begin by assuming that a hypothesis is true. Next we consider whether the data are consistent with the hypothesis. If they are, all we can do is retain the hypothesis we started with. If they are not, then, like a jury, we ask whether they are unlikely beyond a reasonable doubt.

Testing Hypotheses
• In Statistics, we can quantify our doubt by finding the probability that data like ours could occur under our hypothesized model.
• The null hypothesis, which we denote H0, specifies a population model parameter of interest and proposes a value for that parameter.
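The "quantify our doubt" idea can be sketched numerically. The block below uses made-up numbers, since the slides give no data at this point: it assumes H0: μ = 75 for the mechanical aptitude score and a hypothetical sample of n = 25 scores with mean 78 and known σ = 10, then asks how often a sample mean at least that large would occur if H0 were true.

```python
from statistics import NormalDist

# Hypothetical numbers (not from the slides): H0: mu = 75,
# a sample of n = 25 scores with mean 78, and known sigma = 10.
mu0, sigma, n, xbar = 75, 10, 25, 78

# Under H0 the sample mean is approximately N(mu0, sigma/sqrt(n)),
# so standardise it and find the chance of a mean at least this large.
z = (xbar - mu0) / (sigma / n ** 0.5)   # z = 1.5
p_value = 1 - NormalDist().cdf(z)       # P(Z >= 1.5), about 0.067

print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```

A probability of roughly 0.067 says such a sample mean is somewhat unusual under H0, but not extremely so; how unusual is "unusual enough" is exactly what the rest of the chapter formalises.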
• We might have, for example, H0: μ = 75 (tested against HA: μ > 75), as in the mechanical aptitude example.
• We want to compare our data to what we would expect given that H0 is true.
• We then ask how likely it is to get results from a sample like ours if the null hypothesis were true.
– If the results seem consistent with what we would expect from natural sampling variability, we retain the hypothesis.
– If the probability of seeing results like our data is really low, we reject the hypothesis.

Strategy
1. State a hypothesis.
2. Take a random sample from the population of interest and calculate a suitable statistic.
3. Investigate how likely the value of that statistic is if your specified hypothesis is true.
4. Make a decision as to whether your hypothesis is true, given (3).

1. State a hypothesis
– The null hypothesis: To perform a hypothesis test, we must first translate our question of interest into a statement about model parameters. In general, we have H0: parameter = value.
– The alternative hypothesis: The alternative hypothesis, HA, contains the values of the parameter we accept if we reject the null. HA comes in three basic forms:
• HA: parameter < value
• HA: parameter ≠ value
• HA: parameter > value

4. Decision
– The decision in a hypothesis test is always a statement about the null hypothesis.
– The decision must state either that we reject or that we fail to reject the null hypothesis.

(a) Tests Concerning Means
(i) One Sample Test
Given a random sample of size n from a population with mean μ and standard deviation σ, how do we test H0: μ = μ0 against various alternative hypotheses?

Example 1
A 'production process' is in control if μ = 35.50 mm, with σ = 0.45 mm. In a random sample of size 40, a mean of 35.62 mm is observed. Should the process be shut down? What do you advise?

Case 1: n ≥ 30
1. Hypotheses: H0: μ = μ0 against HA: μ ≠ μ0, μ > μ0 or μ < μ0.
2. TS:
z = (x̄ − μ0) / (σ/√n)
(σ unknown? Use s.)
3.
Distribution of TS if H0 true: N(0,1).

P-values in Hypothesis Tests
• Once we have our test statistic, we can calculate a P-value: the probability of observing a value of the test statistic at least as far from the hypothesized value as the value actually observed, if the null hypothesis is true.
• The smaller the P-value, the more evidence we have against the null hypothesis.

Alpha Levels
• Sometimes we need to make a firm decision about whether or not to reject the null hypothesis.
• When the P-value is small, it tells us that our data are rare given H0. How rare is "rare"?
• We can define "rare event" arbitrarily by setting a threshold for our P-value. If our P-value falls below that point, we reject H0. We call such results statistically significant. The threshold is called an alpha level, denoted by α.

Alpha Levels (cont.)
• Common alpha levels are 0.10, 0.05, and 0.01. The alpha level is also called the significance level. (When we reject the null hypothesis, we say that the test is "significant at that level.")
• You need to consider your alpha level carefully and choose an appropriate one for the situation.

Critical Values
[Figure: hypothesis testing at α = 0.01 (large samples, two-tailed test). On the standard normal curve, a test statistic z* beyond −2.58 or 2.58 (each tail of area 0.005) is significant at α = 0.01; a z* in the central region of area 0.99 is not.]
• Rather than looking up your test statistic value in the table, you can check it directly against these critical values.
– Any test statistic larger in magnitude than a particular critical value leads us to reject H0.
– Any test statistic smaller in magnitude than a particular critical value leads us to fail to reject H0.
Critical values of z

Sig. level α   One-tailed test    Two-tailed test
0.10           −1.28 or 1.28      −1.65 or 1.65
0.05           −1.65 or 1.65      −1.96 or 1.96
0.01           −2.33 or 2.33      −2.58 or 2.58

P-Values and Decisions: What to Tell About a Hypothesis Test
• How small should the P-value be in order for you to reject the null hypothesis?
• It turns out that our decision criterion is context-dependent.
– When we're screening for a disease and want to be sure we treat all those who are sick, we may be willing to reject the null hypothesis of no disease with a fairly large P-value.
– A long-standing hypothesis, believed by many to be true, needs stronger evidence (and a correspondingly small P-value) to reject it.
• Another factor in choosing a P-value is the importance of the issue being tested.

P-Values and Decisions (cont.)
• Your conclusion about any null hypothesis should be accompanied by the P-value of the test.
• Don't just declare the null hypothesis rejected or not rejected; report the P-value to show the strength of the evidence against the hypothesis. This lets each reader decide whether or not to reject the null hypothesis.

Exercise
Calculate a 95% C.I. for the population mean using the sample data and see whether the resulting interval 'agrees' with the hypothesis test you just carried out.

Alternative Alternatives
• As stated earlier, there are three possible alternative hypotheses:
– HA: parameter < value
– HA: parameter ≠ value
– HA: parameter > value
• HA: parameter ≠ value is known as a two-sided alternative because we are equally interested in deviations on either side of the null hypothesis value. For two-sided alternatives, the P-value is the probability of deviating in either direction from the null hypothesis value.
Alternative Alternatives (cont.)
• The other two alternative hypotheses are called one-sided alternatives. A one-sided alternative focuses on deviations from the null hypothesis value in only one direction. Thus, the P-value for a one-sided alternative is the probability of deviating only in the direction of the alternative, away from the null hypothesis value.

Example 2
The current mean check-in time is 3.8 mins. In a random sample of 50 under the 'new system', a mean of 3.3 mins (s = 1.1) is observed. Should the new system be implemented? What do you advise?

Case 2: n < 30
1. Hypotheses: H0: μ = μ0 against HA: μ ≠ μ0, μ > μ0 or μ < μ0.
2. TS:
t = (x̄ − μ0) / (s/√n)
(σ unknown, so use s.)
3. Distribution of TS if H0 true: t with n − 1 df.

Critical region (CR) at significance level α, with t* the observed test statistic and t(n − 1 df) the distribution under H0:

HA            CR at sig. level α
μ ≠ μ0        t* < −t(α/2) or t* > t(α/2)   (two-sided)
μ > μ0        t* > t(α)                     (one-sided)
μ < μ0        t* < −t(α)                    (one-sided)

Example 3
In a sample of 23, a mean speed of 31 mph (s = 4.25) is observed. Are cars obeying the speed limit in general?
[Figure: histogram of the 23 observed speeds, ranging from about 24 to 40 mph, with frequency on the vertical axis.]

Summary
1. A significance test is a formal procedure for comparing observed data with a hypothesis whose truth we want to assess.
2. The hypothesis is a statement about the parameter(s) in a population or model.
3. The results of a test are expressed in terms of a probability that measures how well the data and the hypothesis agree.

Strategy Revisited
1. State the null and alternative hypotheses.
2. Collect a random sample of data.
3. Calculate an appropriate test statistic (TS).
4. Determine the distribution of the TS when H0 is true.
5. Decide on the significance level α and the corresponding critical region (CR).
6.
Check whether the value of the TS is in the critical region, report the P-value, and make a decision.
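The strategy above can be sketched in code. The function below is an illustrative implementation of the large-sample (Case 1) one-sample z-test; the function name and its `alternative` parameter are choices made here, not notation from the slides. It is applied to Example 1 (production process: μ0 = 35.50, σ = 0.45, n = 40, x̄ = 35.62, two-sided) and Example 2 (check-in time: μ0 = 3.8, s = 1.1, n = 50, x̄ = 3.3, one-sided lower, using s for σ since n ≥ 30).

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided", alpha=0.05):
    """Large-sample (n >= 30) one-sample z-test of H0: mu = mu0.

    alternative: 'two-sided', 'greater', or 'less'.
    Returns the test statistic, the P-value, and the decision.
    """
    z = (xbar - mu0) / (sigma / n ** 0.5)   # step 3: test statistic
    std_normal = NormalDist()               # step 4: N(0,1) under H0
    if alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - std_normal.cdf(z)
    else:                                   # 'less'
        p = std_normal.cdf(z)
    # step 6: compare P-value with the significance level alpha
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision

# Example 1: is the process still in control?  H0: mu = 35.50, HA: mu != 35.50
z1, p1, d1 = z_test(35.62, 35.50, 0.45, 40, "two-sided")
print(f"Example 1: z = {z1:.2f}, P = {p1:.3f} -> {d1} at alpha = 0.05")

# Example 2: does the new system cut check-in time?  H0: mu = 3.8, HA: mu < 3.8
z2, p2, d2 = z_test(3.3, 3.8, 1.1, 50, "less")
print(f"Example 2: z = {z2:.2f}, P = {p2:.4f} -> {d2} at alpha = 0.05")
```

With these numbers, Example 1 gives z ≈ 1.69 and P ≈ 0.09, so at α = 0.05 there is insufficient evidence to shut the process down, while Example 2 gives z ≈ −3.21 and P ≈ 0.0007, strong evidence that the new system reduces the mean check-in time. Example 3 (Case 2, n < 30) would instead need critical values from the t distribution with n − 1 df, which the standard library does not provide.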