HYPOTHESIS TESTING I

INTUITION OF HYPOTHESIS TESTING

Purpose: µ (or some other parameter) is unknown; on the basis of a sample, we wish to determine whether µ is or is not equal to some particular value.
- Contrast with confidence interval estimation: in hypothesis testing, one particular value is significant; it matters whether that specific value holds.

Example: A produce buyer will buy a farmer’s truckload of potatoes if she can be sure that the average weight of the potatoes is 8 oz; that is, µ = 8 oz. The population standard deviation for this variety of potatoes is 1.5 oz (σ = 1.5 oz), and weights are normally distributed. From the truckload, the buyer selects a sample of 25 potatoes. The sample mean is X̄ = 7.2 oz. Should the buyer accept the load of potatoes?
- Sample means vary; half the possible sample means are less than the population mean. Therefore X̄ = 7.2 is not by itself conclusive evidence that the population mean is less than 8 oz. The buyer does not want to reject the load without reason.
- The buyer cannot be absolutely sure of the average weight of the population of potatoes.
  - Assume µ = 8 oz: how probable is the result X̄ = 7.2 oz? Put another way: how probable is it that a sample mean will differ from the population mean by 0.8 oz or more?
  - This question can be answered by reference to the sampling distribution.
- If the population mean is 8 oz:
  - the sampling distribution of X̄ is a normal distribution
  - µ_X̄ = 8 oz
  - σ_X̄ = σ/√n = 1.5 ÷ 5 = 0.3 oz
- Calculating P(X̄ ≤ 7.2 oz) is a simple normal probability problem: =NORMDIST(7.2,8,0.3,TRUE) = 0.0038.

BUT: there is an equal probability of erring by the same amount on the upside, so the probability of differing by 0.8 oz or more in either direction is 2 × 0.0038 = 0.0076, or about eight chances in a thousand. This number is called the p-value of the test.

Conclusion: the actual sampling result is quite unlikely if in fact µ = 8. In most such problems, the buyer would reject the lot of potatoes.
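As a cross-check on the Excel arithmetic above, here is a minimal Python sketch that reproduces the two-tailed p-value, assuming σ is known and X̄ is normally distributed. It uses the standard library's `statistics.NormalDist` in place of NORMDIST; the function name `two_tailed_p_value` is illustrative, not something from the notes.

```python
from statistics import NormalDist


def two_tailed_p_value(xbar, mu0, sigma, n):
    """Two-tailed p-value for a sample mean when sigma is known and X-bar is normal."""
    se = sigma / n ** 0.5              # sigma_Xbar = sigma / sqrt(n)
    sampling = NormalDist(mu0, se)     # sampling distribution of X-bar if mu = mu0
    lower = sampling.cdf(xbar)         # same quantity as NORMDIST(xbar, mu0, se, TRUE)
    # Double the smaller tail: a miss of the same size on the other side is equally likely
    return 2 * min(lower, 1 - lower)


if __name__ == "__main__":
    p = two_tailed_p_value(7.2, 8, 1.5, 25)
    print(f"p-value = {p:.4f}")
```

This prints about 0.0077 rather than 0.0076 because it doubles the unrounded tail area (about 0.00383) instead of the rounded 0.0038; the conclusion is the same.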
FORMALIZING
- State the null hypothesis and the alternative hypothesis; here,
  H0 : µ = 8
  H1 : µ ≠ 8
  This is called a two-tailed test.
- The procedure is arranged to test whether a sample gives sufficient evidence to reject the null hypothesis; if not, we fail to reject the null hypothesis.
  - We do not ordinarily speak of “accepting the alternative hypothesis.”
  - The issue hinges on the probability of getting a particular sample; if the sample is too improbable under H0, then we reject H0.
- Probability never drops to zero; when is improbable too improbable?
  - Suppose the next farmer’s truckload has X̄ = 7.8. A sample mean that small or smaller has a probability of about 25% (=NORMDIST(7.8,8,0.3,TRUE)), so a sample mean below 7.8 or above 8.2 is roughly a fifty-fifty chance.

The sample probability low enough to cause us to reject H0 is denoted α. The quantity α is also called the significance level of the test.
- The appropriate α value depends on circumstances; here let us select a 5% significance level, α = 0.05.

The p-value of the test is the probability of a sample result as extreme as that actually obtained if the null hypothesis were true. Our decision rule, then, is to reject the null hypothesis whenever p-value < α. As above, if the sample mean is 7.2, the p-value = 0.0076 < 0.05. The buyer should reject the null hypothesis and the truckload of potatoes.

Alternate formulation: Critical Value Approach
  - Find z_C so that P(z > z_C or z < −z_C) = 0.05: z_C = 1.96.
  - Calculate z = (X̄ − µ_X̄) ÷ σ_X̄.
  - Reject the null hypothesis whenever z < −1.96 or z > +1.96; otherwise, fail to reject.
- In this case, with a known sampling distribution, we could alternatively find the values of X̄ that produce a z value of ±1.96:
  - solve for X̄ in the expression z = (X̄ − µ_X̄) ÷ σ_X̄, substituting ±1.96 for z; or
  - use Excel’s NORMINV function with probability = 0.025 and 0.975: =NORMINV(0.025,8,0.3) ≈ 7.41 and =NORMINV(0.975,8,0.3) ≈ 8.59.

Thus in the critical value approach, we may state our decision rule in either of two ways:
- Calculate z = (X̄ − 8) ÷ 0.3 and reject H0 if z < −1.96 or z > +1.96.
- Reject H0 if X̄ < 7.41 or X̄ > 8.59.
Either way, with X̄ = 7.2 (z = −2.67), we reject H0 : µ = 8. The buyer should reject both the null hypothesis and the load of potatoes.

ONE-TAILED VS. TWO-TAILED TESTS
Tests we have looked at to this point are two-tailed: being off in either direction matters. Sometimes diverging from the hypothesized value in one direction is costly, but diverging in the other direction is harmless.
- One-tailed tests involve an inequality and are stated as
  H0 : µ ≤ µ0, H1 : µ > µ0
  or
  H0 : µ ≥ µ0, H1 : µ < µ0
- Such tests are appropriate when being over the hypothesized value (or under, as the case may be) is harmless or costless.
- Use a one-tailed test whenever the requirement is stated as “at least,” “no more than,” or other words to that effect.
- EXAMPLE: Fly-by-Night Couriers (FBN) wish to guarantee delivery on a new route within twelve hours, and they have decided to use a sample of 49 deliveries to test whether their delivery time will meet the desired guarantee. From previous experience, FBN knows that delivery times are normally distributed and that the standard deviation of delivery times is 0.7 hours. Thus they can safely guarantee delivery in twelve hours if the mean delivery time is 10.6 hrs or less (10.6 hrs is two standard deviations, 1.4 hrs, below the twelve-hour limit). This is a one-tailed test, since being under 10.6 hours is harmless. As worded,
  H0 : µ ≤ 10.6
  H1 : µ > 10.6
  Under these circumstances, the sampling distribution is normal. Let us set α = 0.05.
  - Our decision rule is: reject H0 if the sample result has probability less than 0.05.
  - Note carefully: the 5% is all in one tail; we reject only if X̄ is too big.
  Here, σ_X̄ = σ/√n = 0.7 ÷ 7 = 0.1. Suppose X̄ = 10.75 hrs. In the one-tailed case, the p-value = P(X̄ > 10.75) = 1 − NORMDIST(10.75,10.6,0.1,TRUE) = 0.067 > 0.05, so FBN fails to reject H0.
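The p-value rule and the critical-value rule worked through above for the potatoes and for FBN can be sketched together in Python for the known-σ case, again using `statistics.NormalDist` as a stand-in for NORMDIST and NORMINV. The helper names `z_test_p_value` and `rejection_bounds` and the `tail` labels are assumptions made for this illustration. For a one-tailed null such as H0 : µ ≤ 10.6, the p-value is computed at the boundary value µ0, as in the notes.

```python
from statistics import NormalDist


def sampling_dist(mu0, sigma, n):
    """Normal sampling distribution of X-bar when mu = mu0 and sigma is known."""
    return NormalDist(mu0, sigma / n ** 0.5)


def z_test_p_value(xbar, mu0, sigma, n, tail):
    """tail: 'two' (H1: mu != mu0), 'upper' (H1: mu > mu0) or 'lower' (H1: mu < mu0)."""
    below = sampling_dist(mu0, sigma, n).cdf(xbar)  # like NORMDIST(xbar, mu0, se, TRUE)
    if tail == "upper":
        return 1 - below
    if tail == "lower":
        return below
    if tail == "two":
        return 2 * min(below, 1 - below)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


def rejection_bounds(mu0, sigma, n, alpha, tail):
    """Critical values of X-bar; None marks a side with no rejection region."""
    dist = sampling_dist(mu0, sigma, n)
    if tail == "upper":
        return None, dist.inv_cdf(1 - alpha)  # like NORMINV(1 - alpha, mu0, se)
    if tail == "lower":
        return dist.inv_cdf(alpha), None
    if tail == "two":
        return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


if __name__ == "__main__":
    # Potatoes: H0: mu = 8, sigma = 1.5, n = 25, alpha = 0.05
    lo, hi = rejection_bounds(8, 1.5, 25, 0.05, "two")
    print(f"reject H0 if X-bar < {lo:.2f} or X-bar > {hi:.2f}")
    # FBN couriers: H0: mu <= 10.6, sigma = 0.7, n = 49, observed X-bar = 10.75
    p = z_test_p_value(10.75, 10.6, 0.7, 49, "upper")
    print(f"FBN p-value = {p:.3f}; reject H0: {p < 0.05}")
```

With these inputs the sketch prints rejection bounds of 7.41 and 8.59 and an FBN p-value of 0.067 (not rejected), matching the Excel results. Returning `None` for the open side keeps one-tailed and two-tailed regions in the same shape.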
Having failed to reject H0, FBN can safely implement the guarantee.
- NOTE CAREFULLY: The critical z values for two-tailed tests are different from the critical z values for one-tailed tests at the same significance level (at α = 0.05, ±1.96 for a two-tailed test versus 1.64 for a one-tailed test).
  - A two-tailed test splits the universe of possible X̄’s into three regions. For example, testing H0 : µ = 100 vs. H1 : µ ≠ 100, sample means below 60 or above 140 cause H0 to be rejected; the area in the two tails combined is equal to α.
  - A one-tailed test splits the universe of X̄’s into only two regions. Testing H0 : µ ≤ 100 vs. H1 : µ > 100, sample means greater than 130 cause the null hypothesis to be rejected, and the area in the right tail by itself is equal to α.

FIVE FORMAL STEPS IN HYPOTHESIS TESTING
1. State the null and alternative hypotheses.
   - The null and alternative hypotheses must be mutually exclusive: we do not permit things like H0 : µ ≥ 8 vs. H1 : µ ≤ 8,
   - and they must be exhaustive: we do not permit H0 : µ ≥ 8 vs. H1 : µ < 7.9.
   - The null must always contain an equality: we do not permit H0 : µ < 8 vs. H1 : µ ≥ 8.
     - The choice of which is the null and which the alternative is determined by the nature of the problem.
     - If you can’t decide, test whatever is stated or claimed. Above, for example:
       - the potatoes must average 8 oz; that is, µ = 8. Since the equality must be in the null, the choice is made for you;
       - delivery within twelve hours ⇒ H0 : µ ≤ 10.6.
     - Sometimes this means starting from the alternative: “The average student studies more than 6 hrs a week” ⇒ H1 : µ > 6 and H0 : µ ≤ 6.
   - The statement of H0 affects the stringency of the test. Above, if we had used H0 : µ ≥ 10.6, we would reverse the consequences of the test: we would reject H0 only when X̄ is somewhat less than 10.6; that is, we’d implement the guarantee only if the sample mean were somewhat less than FBN’s actual requirement.
   - Rejecting the null should indicate taking some specific action, like rejecting the load of potatoes in the example above.
2. Select a test statistic: decide what the sampling distribution of the statistic is.
   - This depends on knowing the appropriate sampling distribution, given the conditions of the sampling procedure.
   - The sampling distribution is normal (approximately so in the large-sample cases) if
     - σ is known AND the population is normally distributed, OR
     - σ is known, the population is irregular, AND n ≥ 30, OR
     - the problem involves a sample proportion.
   - Use a t value if
     - σ is NOT known AND the population is normally distributed, OR
     - σ is NOT known, the population is irregular, AND n ≥ 30.
   - The test statistics are calculated as
     z = (X̄ − µ0)/σ_X̄
     t = (X̄ − µ0)/s_X̄
     where µ0 is the hypothesized value of µ.
3. Choose a significance level and (optionally) find the critical value(s).
   - This should be done before the sample is taken, to avoid distorting results through wishful thinking.
     - The proper value of α depends on balancing the costs of being wrong; whatever value is selected amounts to saying that anything that improbable is too improbable to believe.
     - Typical values are 5% or 1%.
   - Improbable samples are improbable, not impossible; it is possible to reject the null hypothesis even when it is true.
     - If the potatoes in Farmer Jones’s truckload have µ = 8 oz, some samples will have a mean less than 7.41 or more than 8.59 oz, and we’d reject H0 even though it is true.
   - Type I error: rejecting a true null hypothesis. Dr. McRae’s terminology: a rejection error.
     - It is also possible to accept a null hypothesis that is false.
   - Type II error: accepting a false null hypothesis. Dr. McRae’s terminology: an acceptance error.
     Example: suppose above that µ = 7.3, so the potatoes are below standard. From Excel we calculate P(X̄ > 7.41 | µ = 7.3) = 0.36: the probability of getting an X̄ that causes us to accept H0 and buy a truckload of substandard potatoes.
   - Either sort of error imposes costs; the trick is to balance the two.
   - State the decision rule:
     - reject H0 if p-value < α;
     - (optional) find critical values from the appropriate table: the values that cut off α of the distribution.
       - For a lower one-tail z test at 5% significance, we want the z value such that only 5% of the distribution is less than that value: −1.64.
       - For an upper one-tail test, we want the z value that cuts off the α most improbably large values; at 5% significance, that is the z value such that 95% of the distribution is less than it: +1.64.
     - It is often useful to state the decision rule explicitly. For an upper one-tail test, reject H0 if z > z_C; thus if α = 0.05, the decision rule is: reject H0 if z > 1.64.
   - In some cases, we can define the rejection region before taking the sample:
     - use the Excel NORMINV function, or
     - substitute the critical z value into z = (X̄ − µ0)/σ_X̄ and solve for X̄_C. Then the rejection rule is: reject H0 if X̄ > X̄_C.
     - This was applied above to find that the hypothesis that Farmer Jones’s potatoes had an average weight of 8 oz would be rejected whenever X̄ < 7.41 or X̄ > 8.59.
4. Choose a sample, compare the p-value to α (or compare the results to the critical values found above), and reject or fail to reject H0.
5. Take the action implied by the results of the test.

Example: The FDA requires that cans of tomato sauce contain on average no more than 1 milligram of DHTA. DHTA content is normally distributed, and the population (or “process”) standard deviation is 0.2 mg.
A sample of 18 cans is drawn from a canning run of 100,000 cans, and the cans in the sample are tested for DHTA content; the sample average content is 1.07 mg. Should the lot of 100,000 cans be thrown out for excessive preservatives?
1. State the null and alternative hypotheses. What does “no more than” mean?
   H0 : µ ≤ 1
   H1 : µ > 1
2. Select a test statistic:
   - the population is normally distributed, and
   - σ is known ⇒ the sample means are normally distributed: use NORMDIST to find a p-value for the sample mean.
3. Choose a significance level: set α = 0.01. The decision rule is: reject H0 if the p-value of the test is less than 0.01.
4. Choose a sample, calculate statistics, and compare the p-value to α. With X̄ = 1.07,
   p-value = 1 − NORMDIST(1.07,1,0.2/SQRT(18),TRUE) = 0.069 > 0.01
   - We fail to reject H0.
5. FDA inspectors should allow this lot of cans to be shipped and sold.

Alternatively, following the critical value approach:
2. The test statistic is z = (X̄ − µ0)/σ_X̄.
3. Determine the critical value of the test statistic. We will reject H0 if we get an improbably large sample mean, one whose probability, given the truth of the null hypothesis, is 1% or less. Find z_C such that P(z ≥ z_C) = 0.01. Since a cumulative table gives lower-tail areas, find z_C such that P(z ≤ z_C) = 0.99 ⇒ z_C = +2.33 (or use =NORMSINV(0.99)).
   - Since σ is known, we can easily find the critical value of X̄ by substituting 2.33 into the formula for z and solving for X̄: 2.33 = (X̄_C − 1)/(0.2/√18). Solving, X̄_C = 1.1098.
   - Decision rule, either:
     - calculate z and reject H0 if z ≥ 2.33, OR
     - calculate X̄ and reject H0 if X̄ > 1.11.
4. z = (1.07 − 1)/0.0471 = 1.48 < 2.33, and X̄ = 1.07 < 1.11 ⇒ fail to reject H0.

Comment: we fail to reject even though X̄ > 1. Choosing a larger α, say α = 0.1 with z_C = 1.28, would expand the rejection region and cause us to reject some samples (including this one, since 1.48 > 1.28) that we would not reject at 1% significance. The 1% test is very conservative: we reject only if quite sure that µ > 1.
Put the other way around, this test is not very stringent: we accept fairly weak evidence that the content of food preservative doesn’t exceed safe levels.
- What is a rejection error? What are its costs?
- What is an acceptance error? What are its costs?

HYPOTHESIS TESTING II

EXAMPLES:
- Cigarette filters must be 18 mm long; if they are much longer or much shorter, they will jam the machinery, so the test must assure that the filters coming from our machinery average 18 mm.
  - State the appropriate null and alternative hypotheses.
  - Find the critical z value for a test at the 5% significance level.
  - State the decision rule(s).
  - If n = 25 and σ = 0.6, what decision should be made if X̄ = 18.3 mm?
- An environmentalist claims that in actual use, the average mileage of a particular large SUV is less than 10 mpg. To test the claim, a sample of 36 large SUVs is selected, and mileage in the sample group is carefully recorded over five hundred miles; according to the manufacturer, the standard deviation of mileage for these vehicles is 1 mpg.
  - State the null and alternative hypotheses.
  - What is the appropriate test statistic?
  - State the decision rule if α = 0.05.
  - If the sample mean = 9.97 mpg, what is the decision?
- Air conditioning equipment in a shopping mall must maintain an average temperature of 70°F for the comfort of customers. If we wish to sample at several times during the day,
  - state the appropriate null and alternative hypotheses;
  - find the correct z value to use for a test at the 1% significance level;
  - suppose it is known that the standard deviation of temperature for this equipment is 1° and that the temperature is to be recorded at 25 different times over a day: find the region within which H0 will be rejected;
  - if X̄ = 70.4, find the p-value of the test.
- Air conditioning equipment in a greenhouse must maintain a relative humidity averaging at least 40% over the day. The standard deviation is 4%, and sampling will be done at a random moment once an hour.
  - State the appropriate null and alternative hypotheses to test whether the equipment is working properly.
  - What must we assume in order to use normal probabilities in this problem?
  - If the significance level of the test = 0.1, what is the decision rule if using Excel to calculate probabilities? State the appropriate Excel formula if X̄ = 38.2%.
- A bank has calculated that a major campaign to expand the number of credit card customers will succeed if the average balance on credit cards is more than $500. The bank decides to take a sample and conduct a hypothesis test to see whether in fact average balances exceed $500.
  - State the appropriate null and alternative hypotheses. (Note: there is not necessarily a right and a wrong answer here, but how the hypotheses are stated will affect the stringency of the test.)

HYPOTHESIS TESTS USING THE t DISTRIBUTION (σ UNKNOWN)
Conditions:
- The population standard deviation is unknown and must be estimated with s, the sample standard deviation:
  - population known to be normally distributed: any sample size;
  - population not normal: we require n ≥ 30.
(These are the same conditions determining the use of t in constructing confidence intervals.)
In these cases, the test statistic is given by

$$t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

- The calculated value of t may be compared to a critical value found in the t table or from Excel’s TINV(area in both tails, degrees of freedom) function, OR
- the p-value of the test may be found using TDIST(ABS(t-value), degrees of freedom, number of tails).
  - Note that ABS, which returns the absolute value of the quantity in parentheses, is necessary because Excel expects the t value to be a positive number.
  - In these problems, t has n − 1 degrees of freedom.
Note: when the t distribution is used, we don’t know σ, so we cannot calculate a rejection region in terms of X̄ before the sample is taken (s is known only after sampling).
- The t values in the table are for one-tailed tests and are always positive: be careful to think about whether the test is one- or two-tailed and whether the critical value is positive or negative.
  - For two-tailed tests, take the table t value corresponding to α/2, where α is the significance level of the test.
- The t values generated by TINV are for two-tailed tests and are always positive: be careful to think about whether the test is one- or two-tailed and whether the critical value is positive or negative.
  - For one-tailed tests, enter a probability double the desired significance level; to find the t value for α = 0.01 with 20 degrees of freedom, enter TINV(0.02, 20).

Example: It is claimed that automotive tune-ups will increase average gasoline mileage by at least 5 mpg. To test this claim, a consumer research laboratory measures the gas mileage of 38 of its employees’ cars before and after a tune-up. The distribution of increases appears to have significant downward skew. As it turns out, the average increase in gas mileage is 4.1 mpg, with sample standard deviation 2.2 mpg. Use this information to test the original claim.
1. State the hypotheses:
   H0 : µ ≥ 5 mpg
   H1 : µ < 5 mpg
2. Select the test statistic:
   - the population standard deviation σ is unknown;
   - the population is not normal, but n ≥ 30 ⇒ we use a t test.
3. Select the significance level: let us take α = 0.05. Find the critical value of the test statistic:
   - for a lower one-tailed test with n − 1 = 37 degrees of freedom, t_C = −1.687;
   - decision rule: reject H0 if the calculated t = (X̄ − µ0)/s_X̄ < −1.687, OR if the p-value from TDIST(ABS(t),37,1) < 0.05.
4. Take a sample, calculate t, and compare it to the critical value: s_X̄ = s ÷ √n = 2.2 ÷ √38 = 0.3569; t = (4.1 − 5) ÷ 0.3569 = −2.5218.
   - t = −2.5218 < −1.687, so reject H0.
   - TDIST(ABS(−2.5218),37,1) = 0.008 < 0.05.
5. The actual increase in fuel mileage appears to be less than claimed.

Example: A biologist wishes to test the proposition that the average number of fly larvae in a particular trout stream is 43 per square foot of streambed. She will conduct her test by randomly selecting 25 one-foot-square areas from a half-mile of the stream. Past experience indicates that the distribution of larvae is normal.
1. H0 : µ = 43; H1 : µ ≠ 43
2. Under the conditions given (σ unknown, population normal), the test statistic is a t.
3. Let’s take α = 0.01. This is a two-tailed test; each tail will contain an area of α/2 = 0.005. Accordingly, with 24 degrees of freedom, from the table or from TINV(0.01, 24), the critical t value = ±2.797.
4. In her sample, the biologist counted X̄ = 45 larvae per square foot with s = 6.2. s_X̄ = 6.2/5 = 1.24, so t = (45 − 43)/1.24 = 1.613 < +2.797, OR TDIST(1.613, 24, 2) = 0.1198 > 0.01.
   - Fail to reject H0.
5. Conclude that the data are consistent with an average larvae density of 43 per square foot.

Spreadsheet Functions: To work repeated problems of the same sort, it’s worth a few minutes to set up a spreadsheet to automate the calculations. Below is a simple example of a spreadsheet to use in t problems. Notice that this worksheet is assumed to begin in row 1 and column A; if you do not begin in the same position, all cell addresses in the formulas must be adjusted accordingly. Also note that the row numbers and column letters in the example are for reference purposes; they should not be entered in your spreadsheet.
     A                       B
1    Hypothesized mean =     14
2    x-bar =                 12
3    s =                     4
4    n =                     21
5    tails =                 2
6    st. error =             =B3/SQRT(B4)
7    t =                     =(B2-B1)/B6
8    p-value of test =       =TDIST(ABS(B7),B4-1,B5)

Entered in a spreadsheet, this should give you the answer 0.032936. Notice that to work another problem, all you need do is enter the data for the new problem.
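For readers who prefer a script to a worksheet, here is a minimal, self-contained Python sketch of the same calculation. Python’s standard library has no t distribution, so this example assumes a swapped-in technique: it integrates the t density numerically with Simpson’s rule to play the role of TDIST, and uses a bisection search to play the role of TINV. The names `t_p_value`, `t_critical`, and `one_sample_t` are hypothetical helpers, not Excel or library functions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_area(t, df, steps=2000):
    """P(-|t| < T < |t|) via composite Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # doubled: the density is symmetric about 0


def t_p_value(t, df, tails):
    """Analogue of Excel's TDIST(ABS(t), df, tails)."""
    one_tail = (1 - t_central_area(t, df)) / 2
    return tails * one_tail


def t_critical(alpha, df, tails):
    """Positive t whose p-value equals alpha (like TINV(alpha, df) when tails = 2)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_p_value(mid, df, tails) > alpha:
            lo = mid  # p-value still too large: the critical t lies further out
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(mu0, xbar, s, n, tails):
    """Mirror of the worksheet rows 6-8: (standard error, t, p-value)."""
    se = s / math.sqrt(n)
    t = (xbar - mu0) / se
    return se, t, t_p_value(t, n - 1, tails)


if __name__ == "__main__":
    se, t, p = one_sample_t(mu0=14, xbar=12, s=4, n=21, tails=2)
    print(f"st. error = {se:.5f}, t = {t:.4f}, p-value = {p:.6f}")
    print(f"critical t (alpha = 0.05, 37 df, one tail) = {t_critical(0.05, 37, 1):.3f}")
```

With the worksheet’s inputs this should give a p-value of about 0.0329, in line with the spreadsheet’s 0.032936, and a one-tailed critical value of about 1.687 for 37 degrees of freedom, as in the tune-up example. As with the worksheet, only the inputs change from problem to problem.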