5.2 Tests of Significance

Example 5.7. Diet colas use artificial sweeteners to avoid sugar. Colas with artificial sweeteners gradually lose their sweetness over time. Manufacturers therefore test new colas for loss of sweetness before marketing them. Trained tasters sip the cola along with drinks of standard sweetness and score the cola on a “sweetness score” of 1 to 10. The cola is then stored for a month at high temperature to imitate the effect of four months’ storage at room temperature. After a month, each taster scores the stored cola. This is a matched pairs experiment. Our data are the differences (score before storage minus score after storage) in the tasters’ scores. The bigger these differences, the bigger the loss of sweetness. Here are the sweetness losses for a new cola, as measured by 10 trained tasters:

2.0, 0.4, 0.7, 2.0, −0.4, 2.2, −1.3, 1.2, 1.1, 2.3

Most are positive. That is, most tasters found a loss of sweetness. But the losses are small, and two tasters (the negative scores) thought the cola gained sweetness. Are these data good evidence that the cola lost sweetness in storage?

The Reasoning of a Significance Test

Note. The average sweetness loss for our cola is given by the sample mean,

$$\bar{x} = \frac{2.0 + 0.4 + \cdots + 2.3}{10} = 1.02.$$

That’s not a large loss. Ten different tasters would almost surely give a different result. Maybe it’s just chance that produced this result. A test of significance asks: “Does the sample result x̄ = 1.02 reflect a real loss of sweetness?” or “Could we easily get the outcome x̄ = 1.02 just by chance?”

Note. Next, state the null hypothesis. The null hypothesis says that there is no effect or no change in the population. If the null hypothesis is true, the sample result is just chance at work. Here, the null hypothesis says that the cola does not lose sweetness (no change). We can write that in terms of the mean sweetness loss µ in the population as H0 : µ = 0. We write H0, read “H-nought,” to indicate the null hypothesis.
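The sample mean x̄ = 1.02 in Example 5.7 can be checked directly from the ten sweetness losses. What follows is a minimal sketch in Python, which the notes themselves do not use. It relies only on the standard library's `statistics.mean`, and the variable names are illustrative.

```python
from statistics import mean

# Sweetness losses (score before storage minus score after) for the 10 tasters
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

xbar = mean(losses)                              # sample mean x-bar
n_positive = sum(1 for d in losses if d > 0)     # tasters who found a loss

print(f"x-bar = {xbar:.2f}; {n_positive} of {len(losses)} tasters found a loss")
# prints: x-bar = 1.02; 8 of 10 tasters found a loss
```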
The effect we suspect is true, the alternative to “no effect” or “no change,” is described by the alternative hypothesis. We suspect that the cola does lose sweetness. In terms of the mean sweetness loss µ, the alternative hypothesis is Ha : µ > 0.

Note. The reasoning of a significance test goes like this.
• Suppose for the sake of argument that the null hypothesis is true, that on the average there is no loss of sweetness.
• Is the sample outcome x̄ = 1.02 surprisingly large under that supposition? If it is, that’s evidence against H0 and in favor of Ha.
To answer the question, we use our knowledge of how the sample mean x̄ would vary in repeated samples if H0 really were true. That’s the sampling distribution of x̄ once again.

Note. From long experience we also know that the standard deviation for all individual tasters is σ = 1. (It is not realistic to suppose that we know the population standard deviation σ. We will eliminate this assumption in the next chapter.) The sampling distribution of x̄ from 10 tasters is then normal with mean µ = 0 and standard deviation

$$\frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{10}} = .316.$$

We can judge whether any observed x̄ is surprising by locating it on this distribution. Figure 5.8 (and TM-86) shows the sampling distribution with the observed values of x̄ for two types of cola.
• One cola had x̄ = .3 for a sample of 10 tasters. It is clear from Figure 5.8 (TM-86) that an average x̄ this large could easily occur just by chance when the population mean is µ = 0. That 10 tasters find x̄ = .3 is not evidence of a sweetness loss.
• The tasters for our cola produced x̄ = 1.02. That’s way out on the normal curve in Figure 5.8 (TM-86), so far out that an observed value this large would almost never occur just by chance if the true µ were 0. This observed value is good evidence that in fact the true µ is greater than 0, that is, that the cola lost sweetness. The manufacturer must reformulate the cola and try again.

Note. Look again at Figure 5.8 (TM-86).
If the alternative hypothesis is true, there is a sweetness loss and we expect the mean loss x̄ found by the tasters to be positive. The farther out x̄ is in the positive direction, the more convinced we are that the population mean µ is not zero but positive. We measure the strength of the evidence against H0 by the probability under the normal curve in Figure 5.8 (TM-86) to the right of the observed x̄. This probability is called the P-value. It is the probability of a result at least as far out as the result we actually got. The lower this probability, the more surprising our result, and the stronger the evidence against the null hypothesis.

Note. Notice:
• For one new cola, our 10 tasters gave x̄ = .3. Figure 5.9 (and TM-87) shows the P-value for this outcome. It is the probability to the right of 0.3. This probability is about 0.17, that is, 17%.
• Our cola showed a larger sweetness loss, x̄ = 1.02. The probability of a result this large or larger is only 0.0006.

Note. Small P-values are evidence against H0, because they say that the observed result is unlikely to occur just by chance. Large P-values fail to give evidence against H0. A P-value of 0.05 is a common rule-of-thumb cutoff: a result with a small P-value, say less than 0.05, is called statistically significant. That’s just a way of saying that chance alone would rarely produce so extreme a result.

Outline of a Test

Note. Here is the reasoning of a significance test in outline form:
1. Describe the effect you are searching for in terms of a population parameter like the mean µ.
2. The null hypothesis is the statement that this effect is not present in the population.
3. From the data, calculate a statistic like x̄ that estimates the parameter.
4. The P-value says how unlikely a result at least as extreme as the one we observed would be if the null hypothesis were true. Results with small P-values would rarely occur if the null hypothesis were true.
We call such results statistically significant.

More Detail: Stating Hypotheses

Definition. The statement being tested in a test of significance is called the null hypothesis. The test of significance is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of “no effect” or “no difference.”

Note. The first step in a test of significance is to state a claim that we will try to find evidence against. The alternative hypothesis Ha is the claim about the population that we are trying to find evidence for.

Note. In Example 5.7, we were seeking evidence of a loss in sweetness. The null hypothesis says “no loss” on the average in a large population of tasters. The alternative hypothesis says “there is a loss.” So the hypotheses are H0 : µ = 0 and Ha : µ > 0. This alternative hypothesis is one-sided because we are interested only in deviations from the null hypothesis in one direction.

Definition. If no direction of difference is mentioned in a problem, and the null hypothesis is H0 : µ = 0, then the alternative hypothesis is two-sided: Ha : µ ≠ 0.

More Detail: P-Values and Statistical Significance

Note. A significance test uses data in the form of a test statistic. The test statistic is usually based on a statistic that estimates the parameter that appears in the hypotheses.

Definition. The probability, computed assuming that H0 is true, that the test statistic would take a value as extreme as or more extreme than that actually observed is called the P-value of the test. The smaller the P-value is, the stronger the evidence against H0 provided by the data.

Example 5.9. In Example 5.7 the observations are an SRS of size n = 10 from a normal population with σ = 1. The observed mean sweetness loss for one cola was x̄ = .3. The P-value for testing H0 : µ = 0 against Ha : µ > 0 is therefore P(x̄ ≥ .3), calculated assuming that H0 is true.
When H0 is true, x̄ has the normal distribution with mean 0 and standard deviation σ/√n = 1/√10 = .316. Find the P-value by a normal probability calculation. Start by drawing a picture that shows the P-value as an area under a normal curve. Figure 5.10 (and TM-88) is the picture for this example. Then standardize x̄ to get a standard normal Z and use Table A (see TM-139, TM-140):

$$P(\bar{x} \ge .3) = P\left(\frac{\bar{x} - 0}{.316} \ge \frac{.3 - 0}{.316}\right) = P(Z \ge .95) = 1 - .8289 = .1711$$

Note. We can compare the P-value with a fixed value that we regard as decisive. This amounts to announcing in advance how much evidence against H0 we will insist on. The decisive value of P is called the significance level. We write it as α, the Greek letter alpha. If we choose α = .05, we are requiring that the data give evidence against H0 so strong that it would happen no more than 5% of the time when H0 is true.

Definition. If the P-value is as small as or smaller than α, we say that the data are statistically significant at level α.

Tests for a Population Mean

Note. We have an SRS of size n drawn from a normal population with unknown mean µ. We want to test the hypothesis that µ has a specified value. Call the specified value µ0. The null hypothesis is H0 : µ = µ0. The test is based on the sample mean x̄. Because normal calculations require standardized variables, we will use as our test statistic the standardized sample mean

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$

This z test statistic has the standard normal distribution when H0 is true. If the alternative is one-sided on the high side, Ha : µ > µ0, then the P-value is the probability that a standard normal variable Z takes a value at least as large as the observed z. That is, P = P(Z ≥ z).

Example 5.10. Suppose that the z test statistic for a two-sided test is z = 1.7. The two-sided P-value is the probability that Z ≤ −1.7 or Z ≥ 1.7. Figure 5.11 (and TM-89) shows this probability as areas under the standard normal curve.
Because the standard normal distribution is symmetric, we can calculate this probability by finding P(Z ≥ 1.7) and doubling it:

$$P(Z \le -1.7 \text{ or } Z \ge 1.7) = 2P(Z \ge 1.7) = 2(1 - .9554) = .0892.$$

We would make exactly the same calculation if we observed z = −1.7. It is the absolute value |z| that matters, not whether z is positive or negative.

Definition. To test the hypothesis H0 : µ = µ0 based on an SRS of size n from a population with unknown mean µ and known standard deviation σ, compute the z test statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$

In terms of a variable Z having the standard normal distribution, the P-value for a test of H0 against
Ha : µ > µ0 is P(Z ≥ z)
Ha : µ < µ0 is P(Z ≤ z)
Ha : µ ≠ µ0 is 2P(Z ≥ |z|).
These P-values are exact if the population distribution is normal and are approximately correct for large n in other cases.

Example 5.11. The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 and the standard deviation in this population is 15. The medical director of a large company looks at the medical records of 72 executives in this age group and finds that the mean systolic blood pressure in this sample is x̄ = 126.07. Is this evidence that the company’s executives have a different mean blood pressure from the general population?

As usual in this chapter, we make the unrealistic assumption that we know the population standard deviation. Assume that executives have the same σ = 15 as the general population of middle-aged males.

Step 1: Hypotheses. The null hypothesis is “no difference” from the national mean µ0 = 128. The alternative is two-sided, because the medical director did not have a particular direction in mind before examining the data. So H0 : µ = 128 and Ha : µ ≠ 128.

Step 2: Test Statistic. The z test statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{126.07 - 128}{15/\sqrt{72}} = -1.09.$$

Step 3: P-Value.
You should still draw a picture to help find the P-value, but now you can sketch the standard normal curve with the observed value of z. Figure 5.12 (and TM-90) shows that the P-value is the probability that a standard normal variable Z takes a value at least 1.09 away from zero. From Table A (and TM-139, TM-140) we find that this probability is

$$P = 2P(Z \ge 1.09) = 2(1 - .8621) = .2758.$$

Conclusion: More than 27% of the time, an SRS of size 72 from the general male population would have a mean blood pressure at least as far from 128 as that of the executive sample. The observed x̄ = 126.07 is therefore not good evidence that executives differ from other men.

Tests with Fixed Significance Level

Example 5.13. In Example 5.12, we examined whether the mean NAEP quantitative score of young men is less than 275. The hypotheses are H0 : µ = 275 and Ha : µ < 275. The z statistic takes the value z = −1.45. Is the evidence against H0 statistically significant at the 5% level? To determine significance, we need only compare the observed z = −1.45 with the 5% critical value z* = 1.645 from Table C (and TM-142). Because z = −1.45 is not farther from 0 than −z* = −1.645, it is not significant at level α = .05.

Definition. To test the hypothesis H0 : µ = µ0 based on an SRS of size n from a population with unknown mean µ and known standard deviation σ, compute the z test statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$

Reject H0 at significance level α against a one-sided alternative
Ha : µ > µ0 if z ≥ z*
Ha : µ < µ0 if z ≤ −z*
where z* is the upper α critical value from Table C (and TM-142). Reject H0 at significance level α against a two-sided alternative
Ha : µ ≠ µ0 if |z| ≥ z*
where z* is the upper α/2 critical value from Table C (TM-142).

Example 5.14. The analytical laboratory of Example 5.4 is asked to evaluate the claim that the concentration of the active ingredient in a specimen is 0.86%. The lab makes 3 repeated analyses of the specimen. The mean result is x̄ = .8404.
The true concentration is the mean µ of the population of all analyses of the specimen. The standard deviation of the analysis process is known to be σ = .0068. Is there significant evidence at the 1% level that µ ≠ .86?

Step 1: Hypotheses. The hypotheses are H0 : µ = .86 and Ha : µ ≠ .86.

Step 2: Test Statistic. The z statistic is

$$z = \frac{.8404 - .86}{.0068/\sqrt{3}} = -4.99.$$

Step 3: Significance. Because the alternative is two-sided, we compare |z| = 4.99 with the α/2 = .005 critical value from Table C (and TM-142). This critical value is z* = 2.576. Figure 5.15 (and TM-93) illustrates the values of z that are statistically significant. Because |z| > 2.576, we reject the null hypothesis and conclude (at the 1% significance level) that the concentration is not as claimed.

Note. The P-value is the smallest level α at which the data are significant. Knowing the P-value allows us to assess significance at any level.

Tests from Confidence Intervals

Note. A level α two-sided significance test rejects a hypothesis H0 : µ = µ0 exactly when the value µ0 falls outside a level 1 − α confidence interval for µ.
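The z procedures of this section can be collected into a short Python sketch: the test statistic, the one- and two-sided P-values, the fixed-level rejection rule, and the confidence-interval duality. This is an illustrative sketch and is not part of the original notes. The standard library's `statistics.NormalDist` stands in for Table A and Table C. The function names `z_statistic`, `p_value`, `reject_at_level`, and `confidence_interval` are hypothetical, and, like the text, the sketch assumes σ is known.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def z_statistic(xbar, mu0, sigma, n):
    """Standardized sample mean z = (xbar - mu0) / (sigma / sqrt(n)).

    Assumes the population standard deviation sigma is known.
    """
    return (xbar - mu0) / (sigma / sqrt(n))


def p_value(z, alternative):
    """P-value of an observed z for H0: mu = mu0.

    alternative is "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    if alternative == "greater":
        return 1 - STD_NORMAL.cdf(z)               # P(Z >= z)
    if alternative == "less":
        return STD_NORMAL.cdf(z)                   # P(Z <= z)
    if alternative == "two-sided":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))    # 2 P(Z >= |z|)
    raise ValueError(f"unknown alternative: {alternative!r}")


def reject_at_level(z, alpha, alternative):
    """Fixed-level test: compare z with the critical value z*."""
    if alternative == "two-sided":
        z_star = STD_NORMAL.inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
        return abs(z) >= z_star
    z_star = STD_NORMAL.inv_cdf(1 - alpha)          # upper alpha critical value
    if alternative == "greater":
        return z >= z_star
    if alternative == "less":
        return z <= -z_star
    raise ValueError(f"unknown alternative: {alternative!r}")


def confidence_interval(xbar, sigma, n, level):
    """Level `level` z confidence interval: xbar +/- z* sigma / sqrt(n)."""
    z_star = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Example 5.11: executives' blood pressure, two-sided test of mu = 128
    z_bp = z_statistic(126.07, 128, 15, 72)
    print(f"z = {z_bp:.2f}, P = {p_value(z_bp, 'two-sided'):.4f}")

    # Example 5.14: concentration claim mu = .86, alpha = .01, two-sided
    z_lab = z_statistic(0.8404, 0.86, 0.0068, 3)
    print(f"z = {z_lab:.2f}, reject at 1%: {reject_at_level(z_lab, 0.01, 'two-sided')}")

    # Duality: H0 is rejected at level .01 exactly when .86 is outside the 99% interval
    lo, hi = confidence_interval(0.8404, 0.0068, 3, 0.99)
    print(f"99% CI = ({lo:.4f}, {hi:.4f})")
```

For Example 5.11 this gives z ≈ −1.09 and a two-sided P-value of about 0.275. The table value .2758 in the text comes from rounding z to −1.09 before the lookup. For Example 5.14 the sketch gives z ≈ −4.99 and rejects H0 at the 1% level. The 99% interval, roughly (0.830, 0.851), excludes .86, as the duality note predicts.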