Download Chapter 9 - The WA Franke College of Business
Chapter 9 Hypothesis Testing for Population Means and Proportions

Hypothesis testing is another method of statistical inference. In constructing confidence intervals we wanted to get a good estimate of the location of a population mean. We did not have any idea about what value that mean should have; we just wanted to know what it was. Often, however, we have an idea of what value we want the mean to have or an idea of what value it should have. Go back to Chapter 1 and read the tire manufacturer example. In that case the manufacturer has a definite idea about what the mean lifetime of the tires should be. The managers of the tire company don't give these instructions to the engineers: “Go ahead and develop a new tire and then we'll see what it's like”. No, the instructions will be like this: “Develop a tire with an average lifetime of 60,000 miles”. They know what mean they want. The question to be examined is whether the tire developed is the one they want. The company will want to market the tire if they feel it is a good one (i.e. $\mu \ge 60000$) and redesign it if they feel it is a bad one (i.e. $\mu < 60000$). Note that the decision to be made is whether the tire is a good one or a bad one.

9.1 A Slightly More Precise Statement

We have to select between believing one of two things: (a) the tire is good; or (b) the tire is bad. One of these beliefs will be called the null hypothesis and designated as $H_0$. The other will be called the alternative hypothesis and designated as $H_A$. We could represent the problem as

$H_0$: The tire is good
$H_A$: The tire is bad

We could also represent it as

$H_0$: The tire is bad
$H_A$: The tire is good

Both these representations can be put into a mathematical form. Recall that the value of $\mu$ that we are interested in is $\mu_0 = 60000$, where $\mu_0$ indicates the hypothesized value of the mean.
So we could write the problem as

$H_0: \mu = \mu_0 = 60000$ (The tire is good)
$H_A: \mu < \mu_0$ (The tire is bad)

or as

$H_0: \mu < \mu_0$ (The tire is bad)
$H_A: \mu = \mu_0 = 60000$ (The tire is good)

Does it matter which of the two representations we use? Yes it does. There is nothing that causes beginning students more grief than determining which should be the null and which should be the alternative hypothesis. The author suspects that much of this grief has been caused by lifting the hypothesis testing method wholesale from the scientific community. Hypothesis testing there involves verification of scientific theories. We are not doing anything so elegant; we just want to know what to do about the tire. Unfortunately, a lot of the nomenclature used in applying hypothesis testing came from the scientific community along with the technique. For this course we will adopt a simple rule: choose as the null hypothesis the decision with a specific value. So for the tire problem we would write the decisions as

$H_0: \mu = \mu_0 = 60000$ (The tire is good)
$H_A: \mu < \mu_0$ (The tire is bad)

or, more generally, as

$H_0: \mu = \mu_0 =$ some value (decision 1)
$H_A: \mu < \mu_0$ (decision 2), or
$H_A: \mu > \mu_0$ (decision 2), or
$H_A: \mu \ne \mu_0$ (decision 2)

Note that the “=” is always in the null and that decision 2 can take one of three different forms.

9.2 Type I and Type II Errors

In the hypothesis testing decision process two different errors are possible. The first kind of error is to reject the null hypothesis when it is true. In this case the tire is a good one ($H_0$ is true), but for some reason we decide that it is false. If this happens we must redesign the tire, lose the sales we could have had from marketing a good tire, and repeat the costly test procedure. The second kind of mistake possible is to believe the null hypothesis to be true when it actually is false. In this case we would decide to market a bad tire ($H_0$ is false).
Later on the company is cursed by consumer activists, the president gets to testify before House and Senate committees, class action lawsuits are filed, and customers no longer believe the company's claims about the quality of its tires. Both of these errors have formal statements:

Type I error = rejecting $H_0$ if it is true
Type II error = accepting $H_0$ if it is false

It will be convenient for what follows to write these as

Type I error = rejecting $H_0$ | $H_0$ true
Type II error = accepting $H_0$ | $H_0$ false

where the symbol | can be read as “if” (it is the notation used for conditional probability).

9.3 Decision Rules

A decision rule is a criterion used to reject or accept a hypothesis. Such a rule might look like this: take a sample of n = 100 tires, calculate the sample mean $\bar{X}$, and reject $H_0$ if $\bar{X} < 57500$. Notice that the decision rule says to reject the null hypothesis if $\bar{X} < 57500$; it does not say to reject the null only if it is false. It says to reject the null whether it is true or false. You cannot make a decision rule that says to reject only if the null is false. To do that you would have to know whether it was true, and if you knew that, it would be pretty silly to test it. Again, the decision rule says to reject the null if $\bar{X} < 57500$ and to accept it if $\bar{X} \ge 57500$. If the null is true and we just happen to get $\bar{X} < 57500$, we will reject the null hypothesis and commit a Type I error.

We have seen what a decision rule looks like. Next we want to set some criteria for determining whether a decision rule is a good one or a bad one. The above decision rule might be a good one, but on the other hand it could be perfectly rotten. To evaluate the decision rule we need to introduce a couple of new concepts. Let

$\alpha = P(\text{rejecting } H_0 \mid H_0 \text{ true})$ and $\beta = P(\text{accepting } H_0 \mid H_0 \text{ false})$

For the tire problem this becomes

$\alpha = P(\bar{X} < 57500 \mid \mu = \mu_0 = 60000)$ and $\beta = P(\bar{X} \ge 57500 \mid \mu < \mu_0)$

where $\alpha$ is the lower case Greek letter “alpha” and $\beta$ is the lower case Greek letter “beta”.
We can give the following interpretation for $\alpha$: $\alpha$ is the probability that the decision rule will lead to the rejection of $H_0$ if it is true. We can calculate the value of $\alpha$ using material already learned. Suppose n = 100, so that there is reason to think the Central Limit Theorem holds, and that we know the population standard deviation for tire lifetimes is $\sigma = 10000$, so that $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1000$. Later we will consider the more realistic case where the population standard deviation is not known. Then

$\alpha = P(\bar{X} < 57500 \mid \mu = \mu_0 = 60000)$

and we have $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1000$, so that

$Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{57500 - 60000}{1000} = -2.50$

$\alpha = P(\bar{X} < 57500 \mid \mu = \mu_0 = 60000) = P(Z < -2.50) = 0.0062$

So the probability that this decision rule will lead to the rejection of the null hypothesis when it is true is 0.0062. The region in which the hypothesis is rejected (here $\bar{X} < 57500$) is often called the rejection or critical region. Another way of looking at the above results is the following: suppose that the null hypothesis is true and that we performed the test a large number of times. Then we would end up with a result in the critical region about 0.62% of the time for this problem if we use this decision rule.

We have not yet considered the calculation of $\beta$, which is a more complicated proposition than calculating $\alpha$. Consider again these terms for the tire problem:

$\alpha = P(\bar{X} < 57500 \mid \mu = \mu_0 = 60000)$
$\beta = P(\bar{X} \ge 57500 \mid \mu < \mu_0)$

Note that we can calculate $\alpha$ because we know what value of $\mu$ to use; that is, we use $\mu = 60000$ in the calculation of $\alpha$. But what value of $\mu$ should be used in the calculation of $\beta$? There exists a value of $\beta$ for every value of $\mu$ less than 60000. In other words, there are an infinite number of values of $\beta$, one for each value of $\mu < 60000$. What we must do is calculate $\beta$ for several alternative values of $\mu$, a “what if” sort of exercise. So we would calculate $\beta$ for $\mu$ = 59000, $\beta$ for $\mu$ = 58000, $\beta$ for $\mu$ = 57000, etc.
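The “what if” exercise just described can be sketched in a few lines of Python. This is only an illustration, assuming the normal approximation used in the text; the function name `alpha_beta` and its parameters are made up for this sketch and use nothing beyond the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def alpha_beta(cutoff, mu0, sigma, n, mu_alt):
    """Return (alpha, beta) for the rule 'reject H0 if the sample mean < cutoff'.

    Hypothetical helper: assumes the sample mean is normally distributed
    with standard error sigma / sqrt(n), as in the tire example.
    """
    se = sigma / sqrt(n)  # standard error of the sample mean
    # alpha: probability of landing in the rejection region when mu = mu0
    alpha = NormalDist(mu0, se).cdf(cutoff)
    # beta: probability of landing outside the rejection region when mu = mu_alt
    beta = 1.0 - NormalDist(mu_alt, se).cdf(cutoff)
    return alpha, beta

# Tire problem: reject H0 if the sample mean is below 57500 miles
for mu_alt in (59000, 58000, 57000):
    a, b = alpha_beta(57500, 60000, 10000, 100, mu_alt)
    print(f"mu = {mu_alt}: alpha = {a:.4f}, beta = {b:.4f}")
```

Changing the `cutoff` or `n` arguments shows how a different decision rule or sample size moves both probabilities.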
Suppose $H_A: \mu = 59000$, then

$\beta = P(\text{accepting } H_0 \mid H_0 \text{ false}) = P(\bar{X} \ge 57500 \mid \mu = 59000)$

Again we use the Z-score to find this probability:

$Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{57500 - 59000}{1000} = -1.50$ and $\beta = P(Z \ge -1.5) = 1 - 0.0668 = 0.9332$

Now suppose the alternative is $H_A: \mu = 57000$; then

$Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{57500 - 57000}{1000} = 0.50$ and

$\beta = P(\text{accepting } H_0 \mid H_0 \text{ false}) = P(\bar{X} \ge 57500 \mid \mu = 57000) = P(Z \ge 0.5) = 1 - P(Z < 0.5) = 0.3085$

In the first case we can say that the probability that the decision rule will lead us to falsely accept the null hypothesis if $\mu$ = 59000 is 0.9332. But if $\mu$ actually has a value of 57000, there is a probability of 0.3085 that the decision rule will lead us to accept the null hypothesis, making a Type II error.

Two things should be apparent from the above discussion. First, the calculation of $\beta$ can get complicated. Second, $\alpha$ and $\beta$ do not sum to one. So far we have

Decision rule: Reject $H_0$ if $\bar{X} < 57500$; $\alpha = 0.0062$; for $H_A: \mu = 57000$, $\beta = 0.3085$

Nothing says we can't change the decision rule to see if we might get results that we like better. After all, the value of $\beta$ is much larger than that of $\alpha$. Let's try a different decision rule: reject $H_0$ if $\bar{X} < 58000$.

$\alpha = P(\bar{X} < 58000 \mid \mu = \mu_0 = 60000)$, with $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{58000 - 60000}{1000} = \dfrac{-2000}{1000} = -2$, so $\alpha = P(Z < -2) = 0.0228$.

$\beta = P(\bar{X} \ge 58000 \mid \mu = 57000)$, with $Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{58000 - 57000}{1000} = \dfrac{1000}{1000} = 1$, so $\beta = P(Z \ge 1) = 0.1587$.

Putting the information for both decision rules in one table:

Decision rule: Reject $H_0$ if $\bar{X} < 57500$; $\alpha = 0.0062$; for $H_A: \mu = 57000$, $\beta = 0.3085$
Decision rule: Reject $H_0$ if $\bar{X} < 58000$; $\alpha = 0.0228$; for $H_A: \mu = 57000$, $\beta = 0.1587$

The new decision rule reduces the size of $\beta$ at the cost of increasing $\alpha$. Is there any way that we can make both of them smaller at the same time? Yes: increase the sample size.

9.4 Selecting a Good Decision Rule

This section will discuss some of the considerations which should be used to determine a good critical region. As seen in the discussion and problems above, changing the critical region changes both $\alpha$ and $\beta$. These two probabilities do not sum to one, but generally reducing $\alpha$ will tend to increase $\beta$ and vice versa.
In other words, decreasing the probability of a Type I error tends to increase the probability of a Type II error. The costs of these errors should enter into the consideration of the decision rule. If the cost of a Type I error is large relative to that of a Type II error, this suggests that $\alpha$ should be small relative to $\beta$. On the other hand, if the situation is reversed, this indicates that $\alpha$ should be large relative to $\beta$. After all, if the cost of a Type I error is zero, why would you care if you made one? So determining the costs of these errors is a very important part of the decision making process. How this is done is beyond the scope of this course and will not be pursued further, except in the most general terms. It should always be kept in mind, however. Determining these costs is management's responsibility. If management does not know these costs then it should not engage in this sort of testing.

The level of significance is a term often given to $\alpha$. So far we have determined the decision rule and then calculated the level of significance. In practice this process is reversed. The level of significance is first determined (after careful consideration of the costs of Type I and Type II errors, of course). The level of significance is then used to determine the critical region.

To give you a feel for these considerations we will consider two examples without using numbers. What we want is to gain a feeling for the sort of issues involved in the selection of $\alpha$ and $\beta$. Suppose we know that a drug is effective against a certain deadly disease, say AIDS. We don't know if the drug is safe, and this is what we want to test. The hypotheses might be

$H_0$: The drug is safe
$H_A$: The drug is not safe

What sort of considerations should go into selecting the level of significance? What are the costs of a Type I error? If we make a Type I error here we deny the public a safe and effective drug against a vicious disease.
The company that has manufactured the drug will lose substantial profits. This would argue for a very low value of $\alpha$: we want a very small probability of rejecting the null hypothesis if it is true. The cost of a Type II error is also very high. If the drug turns out to be unsafe a number of people may be harmed. The company will likely face expensive lawsuits. This would argue for a very small value of $\beta$. It may require extensive sampling to get both $\alpha$ and $\beta$ small. To make the picture even more murky, people may be denied a safe and effective drug during a lengthy testing process. There are often no clear and easy answers in this business.

Suppose, however, we know the drug to be safe. What we don't know is whether it is effective.

$H_0$: The drug is effective
$H_A$: The drug is not effective

The same sort of arguments will hold for a Type I error here as in the preceding case. Again it seems that $\alpha$ should be made small. But what about the Type II error and $\beta$? How severe are the costs of a Type II error? Would patients really be harmed if we believe the drug is effective and it is not? Perhaps not, if there is no alternative treatment. If there is one, this would change the cost calculation. But in any case, should we insist on as small a value for $\beta$ as we did in the previous case? My guess is that the costs here are not as high, and we might use a larger value for $\beta$. I am not a medical expert, however. The choice of $\beta$ is a management responsibility.

Look again at some of the inputs needed in using hypothesis testing for decision making in the drug examples. Inputs will be needed from the legal department, the medical department, the marketing department, and the statistical department. Other opinions might be needed as well. It is management's responsibility to gather these inputs and organize the decision making process.

To understand what comes next we will recapitulate some of the material presented in earlier chapters.
If the null hypothesis is true, the sampling distribution of the $\bar{X}$'s will be normal with mean $\mu_{\bar{X}}$ and standard deviation $\sigma_{\bar{X}}$. Also recall Z-scores, $Z = (\bar{X} - \mu_{\bar{X}})/\sigma_{\bar{X}}$.

Now let's see what this means for the tire problem. If the null hypothesis is true then $\mu$ = 60000, and if we took a sample of n = 100 tires we would expect to get a sample mean near 60000. Let's suppose that $\sigma = 10000$ and that when we sample we get $\bar{X}$ = 59995. This gives

$Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{59995 - 60000}{1000} = -0.005$

But suppose the null hypothesis is false and $\mu$ = 56000. Again we expect to get an $\bar{X}$ near $\mu$; suppose we get $\bar{X}$ = 56040. Now the Z-score is

$Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{56040 - 60000}{1000} = -3.96$

Note that in both cases we subtract $\mu_0$ from $\bar{X}$ rather than subtracting $\mu$. This is because we know what $\mu_0$ is; we know because we have stated it as the hypothesized value. We can't subtract $\mu$ because we don't know what $\mu$ is. A very important observation can be made from this example. Small values of Z (near zero) can be taken as evidence that the null hypothesis is true. Large positive or negative values of Z are evidence that it is false. This is summarized below.

If $H_0$ true: $\mu = \mu_0$, so $\bar{X} \approx \mu_0$ and $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} \approx 0$
If $H_0$ false: $\mu \ne \mu_0$, so $\bar{X}$ is NOT $\approx \mu_0$ and $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} \ll 0$ or $Z \gg 0$

9.5 A Hypothesis Testing Framework

In this section we will state a formal framework for testing hypotheses about the value of $\mu$. Again recall that small values of Z tend to favor the null while large values tend to favor the alternative.

1. State the null and alternative hypotheses:
$H_0: \mu = \mu_0 =$ some value
$H_A: \mu < \mu_0$, or $H_A: \mu > \mu_0$, or $H_A: \mu \ne \mu_0$
2. Choose a level of significance $\alpha$.
3. Using this value of $\alpha$, determine the rejection region.
4. Calculate the test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}$.
5. If Z is in the rejection region, reject the null hypothesis; if Z is not in the rejection region, do not reject the null hypothesis.

[Figure 9.1: Two tailed hypothesis test, with a rejection region of area alpha/2 in each tail.]

The location of the rejection region depends on the form of $H_A$. Suppose that $H_A: \mu \ne \mu_0$. If $\bar{X} \ll \mu_0$ (which makes $Z \ll 0$) we would tend to doubt that the null hypothesis is true.
Likewise if $\bar{X} \gg \mu_0$ (which makes $Z \gg 0$) we would also tend to doubt the truth of the null hypothesis. This means that large positive or negative values of Z would make us tend to reject the null. So the rejection region is split into two parts, one to the left of Z = 0 and the other to the right of Z = 0. In other words, we split $\alpha$ into two parts and put one part in each tail. Because the rejection region is located in each tail, such tests are called two tailed tests. The rejection region for a two tailed test is shown in Figure 9.1.

[Figure 9.2: The rejection region for $H_A: \mu < \mu_0$ (a one tailed test).]

Suppose that the alternative is $H_A: \mu < \mu_0$. In this case values of $\bar{X} \ll \mu_0$ will make us tend to favor the alternative over the null. But values of $\bar{X} \ll \mu_0$ lead to values of $Z \ll 0$. So in this case it is large negative values of Z that will make us consider rejecting the null in favor of the alternative. All of $\alpha$ is located in one of the tails, and the test is called a one tailed test. The rejection region for this one tailed test is shown in Figure 9.2.

[Figure 9.3: The rejection region for $H_A: \mu > \mu_0$ (a one tailed test).]

Suppose that the alternative is $H_A: \mu > \mu_0$. In this case values of $\bar{X} \gg \mu_0$ will make us tend to favor the alternative over the null. But values of $\bar{X} \gg \mu_0$ lead to values of $Z \gg 0$. So in this case it is large positive values of Z that will make us consider rejecting the null in favor of the alternative. This is also a one tailed test. The rejection region is shown in Figure 9.3.

9.6 Hints on Solving Problems

The first suggestion is that you follow the testing framework presented above. In particular, you must state the null and alternative hypotheses. If you don't do that, no one else will know or care about what you are doing. If you don't state the hypotheses you aren't doing anything. Statement of the hypotheses is absolutely vital, and it is what gives very many students a whole lot of trouble. Here I will give you some hints.

1. First of all, follow the advice that the null hypothesis will look like this: $H_0: \mu = \mu_0 =$ some value.
2. State what you would do if you believed the null hypothesis to be true.
3. State what you would do if you believed the null hypothesis to be false.
4. Use Step 3 to decide whether the alternative should be $H_A: \mu < \mu_0$, $H_A: \mu > \mu_0$, or $H_A: \mu \ne \mu_0$.

Following this hint you will set up the hypotheses as follows:

$H_0$: Step 1 (Step 2)
$H_A$: Step 4 (Step 3)

This outline may seem a little backwards in that you might expect Step 3 to precede Step 4. Actually Step 3 is decided before Step 4, but we will write Step 4 before Step 3.

Example 9.1 Let's consider the tire problem again. We will only look at how the null and alternative hypotheses should be stated. We will look at numerical values later. Suppose the tire manufacturer has stated that the company's tires will have an average lifetime of at least 60,000 miles. In Step 1 of the hint we said that an “=” sign would be needed. The value we see is 60,000 miles. So we would make Step 1 the following: $H_0: \mu = \mu_0 = 60000$. In Step 2 we have to state what we would do if we believe the null to be true. We could write this as “the claim is true” or as “the tire is good”. What you need is an expression in English of what you mean by the null being true. Here comes the tricky but useful part. In Step 3 we state in English what we mean by the null being false. For this example this must mean “the claim is false” or “the tire is bad”. In Step 4 we translate the English into mathematics. If “the tire is bad”, this must mean $\mu < 60000$. All right, let's summarize this.

Step 1: $H_0: \mu = \mu_0 = 60000$
Step 2: The tire is good
Step 3: The tire is bad
Step 4: $H_A: \mu < \mu_0$

We will write this as:

$H_0: \mu = \mu_0 = 60000$ (the tire is good)
$H_A: \mu < \mu_0$ (the tire is bad)

Now let's try a slight change to the manufacturer's claim. Suppose the statement is “Our tires have an average lifetime that exceeds 60,000 miles.” Again we need a specific value for the null, and that is $H_0: \mu = \mu_0 = 60000$.
That takes care of Step 1. But what does that mean for Step 2? Is the manufacturer's claim true or not? The claim is that the average lifetime is greater than 60,000 miles. So if the null is true the claim must be FALSE. So we could write for Step 2 “The claim is false”. Step 3 then means “The claim is true”. This means the alternative (Step 4) should be written as $H_A: \mu > \mu_0$. So we would set up the hypotheses as:

$H_0: \mu = \mu_0 = 60000$ (the claim is false)
$H_A: \mu > \mu_0$ (the claim is true)

Example 9.2 A bolt manufacturer claims that the pitch of the threads of a certain bolt is 1 mm. If the pitch is too large or too small the threads of the bolt will strip, causing damage to the part using the bolt. A sample of n = 100 bolts is measured and the mean pitch of the sample bolts is 0.98 mm. Suppose it is known that the population standard deviation is 0.01 mm. Test the manufacturer's claim with a level of significance of 5%.

The first thing to do is find the hypotheses. Step 1 is $\mu = \mu_0 = 1$ mm. Step 2 is found as follows: if we think $\mu$ really is equal to 1 mm then the claim is true. Step 3 must be that the claim is false. In Step 4 we must determine whether we are dealing with a one or two tailed test. If the claim had been that the pitch was greater than some value or less than some value we would have a one tailed test. But the claim is that the pitch is equal to some value. If the claim is false the pitch is not equal to this value, so the alternative hypothesis must be $\mu \ne \mu_0$. Because we would reject the null for very large positive or negative values of Z (see the table), this is a two tailed test and we split $\alpha$ into two parts, one part in each tail. The rejection region for this hypothesis test is shown in Figure 9.4. The problem can be stated more formally in what follows.
[Figure 9.4: The rejection region for Example 9.2, with alpha/2 = 0.025 in each tail: reject below -1.96 and above 1.96.]

$H_0: \mu = \mu_0 = 1$ (the claim is true)
$H_A: \mu \ne \mu_0$ (the claim is false)
$\alpha = 0.05$; reject $H_0$ if $Z < -1.96$ or $Z > 1.96$
Data: $\bar{X} = 0.98$, $\sigma = 0.01$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.001$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{0.98 - 1}{0.001} = -20$
Conclusion: Because $Z = -20 < -1.96$ is in the rejection region we will reject the null hypothesis. This data gives sufficient evidence to deny the manufacturer's claim.

Example 9.3 Work the same problem supposing that the standard deviation is 0.15 mm.

$H_0: \mu = \mu_0 = 1$ (the claim is true)
$H_A: \mu \ne \mu_0$ (the claim is false)
$\alpha = 0.05$; reject $H_0$ if $Z < -1.96$ or $Z > 1.96$
Data: $\bar{X} = 0.98$, $\sigma = 0.15$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.015$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{0.98 - 1}{0.015} = -1.33$
Conclusion: Because Z is not in the rejection region we cannot reject the null hypothesis. This data does not give sufficient evidence to deny the manufacturer's claim.

Example 9.4 A light bulb manufacturer claims that the average lifetime of its bulbs exceeds 750 hours. A sample of 36 bulbs is taken which gives a sample mean of 755 hours. The population standard deviation is known to be 20 hours. Test the manufacturer's claim with a level of significance of 5%.

In Step 1 set $\mu = \mu_0 = 750$. In Step 2 we decide whether the claim is true if $\mu = 750$. The claim is that the average exceeds 750 hours, so the null hypothesis must be that the claim is false. If the claim had been that the bulbs last at least 750 hours, the null would be that the claim is true. Step 3: the claim is true. Step 4: if the claim is true, $\mu > 750$. The rejection region is shown in Figure 9.5.

[Figure 9.5: The rejection region for Example 9.4, with alpha = 0.05 in the right tail: reject above 1.65.]

$H_0: \mu = \mu_0 = 750$ (the claim is false)
$H_A: \mu > \mu_0$ (the claim is true)
$\alpha = 0.05$; reject $H_0$ if $Z > 1.65$
Data: $\bar{X} = 755$, $\sigma = 20$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 3.33$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{755 - 750}{3.33} = 1.50$
Conclusion: Because Z is not in the rejection region we cannot reject the null hypothesis.
This data does not give sufficient evidence to support the manufacturer's claim.

Example 9.5 Use the above example, but let the manufacturer's claim be that the bulbs' average lifetime is at least 750 hours.

[Figure 9.6: The rejection region for Example 9.5, with alpha = 0.05 in the left tail: reject below -1.65.]

$H_0: \mu = \mu_0 = 750$ (the claim is true)
$H_A: \mu < \mu_0$ (the claim is false)
$\alpha = 0.05$; reject $H_0$ if $Z < -1.65$
Data: $\bar{X} = 755$, $\sigma = 20$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 3.33$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{755 - 750}{3.33} = 1.50$
Conclusion: Because Z is not in the rejection region we cannot reject the null hypothesis. This data does not give sufficient evidence to deny the manufacturer's claim.

Note that in these two examples we reach different conclusions about the manufacturer's claims, but the claims were different.

9.8 Hypothesis Tests for Small Samples

The same considerations that applied to confidence intervals for small n also hold for hypothesis tests. If $\sigma$ is not known it must be replaced with s, and the t distribution must be used instead of the Z distribution. In this case the test statistic will be

$t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}}$, where $s_{\bar{X}} = s/\sqrt{n}$.

Example 9.6 The Flagstaff Chamber of Commerce claims that the average wage for contract construction workers in Coconino County is $11.25 per hour. To test this claim the wages of 15 construction workers were sampled. The sample mean is $12.75 and the sample standard deviation is $2.45. Do these data present sufficient evidence to reject the Chamber's claim? Test using a level of significance of 1%.

$H_0: \mu = \mu_0 = 11.25$ (the claim is true)
$H_A: \mu \ne \mu_0$ (the claim is false)
$\alpha = 0.01$; reject $H_0$ if $t < -2.977$ or $t > 2.977$, df = 14
Data: $\bar{X} = 12.75$, $s = 2.45$, $s_{\bar{X}} = s/\sqrt{n} = 0.633$
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}} = \dfrac{12.75 - 11.25}{0.633} = 2.37$
Conclusion: Because t is not in the rejection region we cannot reject the null hypothesis. This data does not give sufficient evidence to deny the Chamber's claim.

Example 9.7 The cost of producing a central processor chip for a computer chip manufacturer has been $98.00 per chip. A new process has been developed which promises to reduce chip costs.
To test this a sample of 20 chips is made, and the average cost per chip in the sample is $94.00 while the sample standard deviation is $5.00. Do these data provide sufficient evidence to conclude the new process has reduced chip costs? Test using a level of significance of 5%.

$H_0: \mu = \mu_0 = 98$ (no change in chip cost)
$H_A: \mu < \mu_0$ (the chip cost has been reduced)
$\alpha = 0.05$; reject $H_0$ if $t < -1.729$
Data: $\bar{X} = 94$, $s = 5$, $s_{\bar{X}} = s/\sqrt{n} = 1.12$, df = 19
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}} = \dfrac{94 - 98}{1.12} = -3.57$
Conclusion: Because t is in the rejection region we can reject the null hypothesis. This data gives sufficient evidence to conclude the new process has reduced chip costs.

9.9 p-values

An easier way of reporting hypothesis testing results is to use p-values. In the case of a one tailed test, a p-value is the area in the tail to the right or left of the actual value of the test statistic t. In the case of a two tailed test the p-value is the area in both tails.

[Figure 9.7: The p-value for a two tailed test. The test statistic will give a value of t or -t but the area is put in both tails. Here we could not reject the null hypothesis.]

A plot of a p-value for a two tailed test is shown in Figure 9.7. Here is an example where $p > \alpha$. This can only be true if the value of t is not in the rejection region.

[Figure 9.8: The p-value for a two tailed test where the null hypothesis would be rejected.]

Figure 9.8 shows the situation where $p < \alpha$. This can only occur if the value of t is in the rejection region. If $p > \alpha$ then we cannot reject the null hypothesis. If $p < \alpha$ we can reject the null hypothesis.

Caution: some computer packages produce p-values for two tailed tests and some produce p-values for one tailed tests. Make sure you find out which is which. Excel gives you a choice of one or two tailed p-values. The Excel worksheet function to use is TDIST(t value, degrees of freedom, tails).
Here tails = 1 for a one tailed test and tails = 2 for a two tailed test. Also note that you must use a positive value for the t value (if it is negative use the absolute value).

Example 9.6 (again, but this time use p-values)

$H_0: \mu = \mu_0 = 11.25$ (the claim is true)
$H_A: \mu \ne \mu_0$ (the claim is false)
$\alpha = 0.01$
Data: $\bar{X} = 12.75$, $s = 2.45$, $s_{\bar{X}} = s/\sqrt{n} = 0.633$
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}} = \dfrac{12.75 - 11.25}{0.633} = 2.37$

Note this is a two tailed test, so tails = 2 in Excel:

p-value = 0.032692 =TDIST(2.37,14,2)

So $p = 0.0327 > 0.01$ and we cannot reject the null hypothesis.

Example 9.7 (again, but this time use p-values)

$H_0: \mu = \mu_0 = 98$ (no change in chip cost)
$H_A: \mu < \mu_0$ (the chip cost has been reduced)
$\alpha = 0.05$
Data: $\bar{X} = 94$, $s = 5$, $s_{\bar{X}} = s/\sqrt{n} = 1.12$, df = 19
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}} = \dfrac{94 - 98}{1.12} = -3.57$

Note this is a one tailed test, so tails = 1 and

p-value = 0.001022 =TDIST(3.57,19,1)

Here $p = 0.0010 < 0.05$ so we can reject the null hypothesis.

Example 9.8 A new process has been developed to produce synthetic diamonds. The process is quite costly and will only be profitable if it can produce diamonds with an average weight of at least 1 carat. Because of the cost, a sample of only n = 5 diamonds is produced, with weights of X = {0.99, 1.01, 0.97, 0.98, 0.99} carats. Use Excel and p-values to determine if the process is profitable. Use a 1% level of significance.

We have

$H_0: \mu = \mu_0 = 1$ (the process is profitable)
$H_A: \mu < \mu_0$ (the process is not profitable)
$\alpha = 0.01$

x bar: 0.988 =AVERAGE(0.99,1.01,0.97,0.98,0.99)
S: 0.014832397 =STDEV(0.99,1.01,0.97,0.98,0.99)
S xbar: 0.006633072 =0.014832/SQRT(5)
t: -1.809136137 =(0.988-1)/0.006633
p-value: 0.072345948 =TDIST(1.80914,4,1)

Because $p = 0.0723 > 0.01$, this data does not provide sufficient evidence to reject the null hypothesis at this level of significance.

Suppose, however, the data had been X = {0.99, 0.98, 0.97, 0.98, 0.99} carats. In this case $p = 0.00429 < 0.01$ and the data does provide sufficient evidence to reject the null hypothesis.
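For readers without Excel, the diamond calculations can be cross-checked with a short Python sketch. It is only an illustration: the helper names `t_pdf`, `t_tail_area`, and `one_sample_t` are made up here, and the tail area is approximated by Simpson's rule on the t density using only the standard library (a statistics package would normally supply the t distribution directly). Like TDIST, `t_tail_area` works with the absolute value of t, so you still have to check that the sign of t points toward the alternative.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail_area(t, df, tails=1, steps=2000):
    """Area beyond |t| in one or both tails (steps must be even)."""
    t = abs(t)
    h = t / steps
    # Simpson's rule for the area between 0 and |t|
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    one_tail = 0.5 - total * h / 3  # half the distribution lies above 0
    return tails * one_tail

def one_sample_t(data, mu0):
    """Return (t, df) for testing H0: mu = mu0 with unknown sigma."""
    n = len(data)
    se = stdev(data) / sqrt(n)  # s / sqrt(n), using the sample standard deviation
    return (mean(data) - mu0) / se, n - 1

# The two diamond samples from Example 9.8, tested against mu0 = 1 carat
for weights in ([0.99, 1.01, 0.97, 0.98, 0.99], [0.99, 0.98, 0.97, 0.98, 0.99]):
    t, df = one_sample_t(weights, 1.0)
    print(f"t = {t:.4f}, df = {df}, one-tailed p = {t_tail_area(t, df):.5f}")
```

Because no intermediate value is rounded, the results may differ from the Excel figures in the last decimal places.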
The Excel calculations for this second data set are:

x bar: 0.982 =AVERAGE(0.99,0.98,0.97,0.98,0.99)
S: 0.0083666 =STDEV(0.99,0.98,0.97,0.98,0.99)
S xbar: 0.003741657 =0.0083666/SQRT(5)
t: -4.810702852 =(0.982-1)/0.003741657
p-value: 0.004290467 =TDIST(4.8107,4,1)

9.10 Tests of Proportions

We can also use this framework to perform hypothesis tests on the binomial parameter p. We do again need to make some changes in notation. The null hypothesis will have the form $H_0: p = p_0 =$ some value. The test statistic will be

Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}}$, where $\sigma_{p_0} = \sqrt{\dfrac{p_0 q_0}{n}}$ and $p_0 = 1 - q_0$.

Note that we are using Z as the test statistic, which implies that we are assuming that the CLT holds. Here our check to see if the CLT holds is

CLT check: $n p_0 \ge 5$ and $n q_0 \ge 5$.

Example 9.9 Radio station KFUD claims that at least 35% of the listening audience in its coverage area listens to KFUD's 2:00 PM rock and roll music program. To test this claim, 500 radio listeners are asked if they listen to KFUD at the stated time. Of those sampled, 160 say they do listen to KFUD at that time. Do these data provide sufficient evidence to reject KFUD's claim? Test using a 5% level of significance.

$H_0: p = p_0 = 0.35$ (KFUD's claim is true)
$H_A: p < p_0$ (KFUD's claim is exaggerated)
$\alpha = 0.05$; reject $H_0$ if $Z < -1.65$
Check CLT: $n p_0 = 500(0.35) = 175 \ge 5$ and $n q_0 = 500(0.65) = 325 \ge 5$
Data: X = 160, n = 500, $\hat{p} = \dfrac{X}{n} = \dfrac{160}{500} = 0.32$, $\sigma_{p_0} = \sqrt{\dfrac{p_0 q_0}{n}} = \sqrt{\dfrac{(0.35)(0.65)}{500}} = 0.021$
Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}} = \dfrac{0.32 - 0.35}{0.021} = -1.43$
Conclusion: Because Z is not in the rejection region we cannot reject the null hypothesis. These data do not give sufficient evidence to conclude that KFUD's claim is false. This does not mean that KFUD's claim is true, just that the data do not provide sufficient evidence to reject the claim using a 5% level of significance.

Note the role of $\alpha$ here. With such a small value, $\alpha = 0.05$, we are being very careful to avoid rejecting KFUD's claim unless we are fairly sure the claim is untrue. We could have chosen $\alpha = 0.50$ and rejected the claim any time we got a Z value of $Z < 0$.
However, if KFUD is telling the truth we would then have a 50% probability of falsely rejecting the claim.

We could use p-values here as well. We need to compute the area to the left of $Z = -1.43$ and compare that area to $\alpha$.

p-value = 0.076359 =NORMDIST(-1.43,0,1,TRUE)

Since $p = 0.0764 > 0.05$ we cannot conclude that KFUD's claim is not true using this level of significance. Had we made $\alpha = 0.10$ then we would have rejected the null hypothesis. (Note we put all of $\alpha$ and all of the p-value in one tail.)

Example 9.10 Suppose that the sample results had been X = 150 listeners of KFUD's program. In that case

$\hat{p} = \dfrac{150}{500} = 0.30$

and the test statistic would be

$Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}} = \dfrac{0.30 - 0.35}{0.021} = -2.38$

at which point we would say that we have sufficient evidence ($Z = -2.38 < -1.65$ is in the rejection region) to reject the null hypothesis and say we do not believe KFUD's claim. We do not say that KFUD's claim is untrue, just that we don't believe it to be true given this evidence. The p-value is

p-value = 0.008656 =NORMDIST(-2.38,0,1,TRUE)

Now $p = 0.0087 < 0.05$ and we would reject the null hypothesis using the p-value as well. We would have rejected the null in this case even if $\alpha$ had been as small as 0.01.

Example 9.11 Suppose that a certain process used to make computer chips is not perfect (what is?). The process is considered to be working properly if the proportion of defectives is 5% or less. If the proportion of defectives is greater than 5% the process is shut down for repairs. A shutdown is very costly (expense of repair, loss of output, etc.). Because of this, management feels that they should set $\alpha$ at a very low value to avoid this cost if the process is really working properly. A sample of 200 chips is taken and 19 are found to be defective. Using an $\alpha$ level of 0.02, should the process be halted for repairs?
$H_0: p = p_0 = 0.05$ (the process is working OK)
$H_A: p > p_0$ (the process should be shut down)
$\alpha = 0.02$; reject $H_0$ if $Z > 2.05$
Check CLT: $n p_0 = 200(0.05) = 10 \ge 5$ and $n q_0 = 200(0.95) = 190 \ge 5$
Data: $X = 19$, $n = 200$, $\hat{p} = \dfrac{X}{n} = \dfrac{19}{200} = 0.095$, $\sigma_{p_0} = \sqrt{\dfrac{p_0 q_0}{n}} = \sqrt{\dfrac{(0.05)(0.95)}{200}} = 0.0154$
Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}} = \dfrac{0.095 - 0.05}{0.0154} = 2.92$
Conclusion: Because Z is in the rejection region we will reject the null hypothesis. This means we shut down the process for repairs (possibly we have made an error here, but it is not very likely). The p-value we get from this is
p-value   0.00175   =1-NORMDIST(2.92,0,1,TRUE)
and since $p = 0.00175 < 0.02$ we would reject the null using the p-value as well. Note that we had to use the rightmost tail of the distribution.

Example 9.12
Suppose that we had found X = 15 defective items in the sample for the previous problem. What would the conclusion have been then?
$H_0: p = p_0 = 0.05$ (the process is working OK)
$H_A: p > p_0$ (the process should be shut down)
$\alpha = 0.02$; reject $H_0$ if $Z > 2.05$
Check CLT: $n p_0 = 200(0.05) = 10 \ge 5$ and $n q_0 = 200(0.95) = 190 \ge 5$
Data: $X = 15$, $n = 200$, $\hat{p} = \dfrac{X}{n} = \dfrac{15}{200} = 0.075$, $\sigma_{p_0} = \sqrt{\dfrac{(0.05)(0.95)}{200}} = 0.0154$
Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}} = \dfrac{0.075 - 0.05}{0.0154} = 1.62$
The value of the test statistic is not in the rejection region, so these data do not present sufficient evidence to reject the null. The p-value in this case is
p-value   0.052616   =1-NORMDIST(1.62,0,1,TRUE)
and $p = 0.0526 > 0.02$, so the p-value likewise does not give sufficient evidence to reject the null hypothesis at this level of significance.
You must be careful when using p-values here to account for whether you have a one or two tailed test. The t distribution function (TDIST) gave you a facility for selecting a one or two tailed test. The Z distribution function (NORMDIST) does not. You have to do it on your own.

Example 9.13
A gambling house in Las Vegas is concerned that a certain pair of dice is not honest. If the dice are honest, the number 7 should occur on 1/6 of the rolls. To test this they roll the dice 500 times and count the number of times a 7 shows.
This occurred 100 times out of the 500 rolls. Test using a 1% level of significance.
$H_0: p = p_0 = 0.167$ (the dice are honest)
$H_A: p \ne p_0$ (something funny is going on)
$\alpha = 0.01$; reject $H_0$ if $Z < -2.58$ or $Z > 2.58$
Check CLT: $n p_0 = 500(0.167) = 83.5 \ge 5$ and $n q_0 = 500(0.833) = 416.5 \ge 5$
Data: $X = 100$, $n = 500$, $\hat{p} = \dfrac{X}{n} = \dfrac{100}{500} = 0.2$, $\sigma_{p_0} = \sqrt{\dfrac{p_0 q_0}{n}} = \sqrt{\dfrac{(0.167)(0.833)}{500}} = 0.0167$
Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}} = \dfrac{0.2 - 0.167}{0.0167} = 1.97$
Using the Z table, the test statistic is not in the rejection region, so we would conclude that there is not sufficient evidence to say the dice are dishonest. Using the p-value we must recall that this is a two tailed test, so we must calculate the area to the left of -1.97 and the area to the right of +1.97.
p/2 =         0.024419   =NORMDIST(-1.97,0,1,TRUE)
p/2 =         0.024419   =1-NORMDIST(1.97,0,1,TRUE)
p = p/2+p/2   0.048838   =0.024419+0.024419
or
p = 2*(p/2)   0.048838   =2*0.024419
Note that you can either add the areas in the two tails together or save a little time by multiplying the area in one of the tails by two. Since $p = 0.0488 > 0.01$, the p-value also does not lead to rejection of the null hypothesis.
Suppose that the number 7 had shown up X = 120 times, so that $\hat{p} = \dfrac{X}{n} = \dfrac{120}{500} = 0.24$ and
$Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}} = \dfrac{0.24 - 0.167}{0.0167} = 4.37$
The Z value would lead to the rejection of the null hypothesis in this case. Let's see what the p-value suggests:
p =   1.24344E-05   =2*NORMDIST(-4.37,0,1,TRUE)
and $p = 1.24 \times 10^{-5} < 0.01$, which would also lead to the rejection of the null hypothesis.

Problems
9.1 Calculate $\alpha$ for the tire problem with $H_0: \mu = \mu_0 = 60000$, where $\sigma = 10000$, if the decision rule is to take a sample of 100 tires and reject the null hypothesis if the resulting sample mean is less than 58500 miles.
9.2 Calculate $\alpha$ for the tire problem if the decision rule is to take a sample of 100 tires and reject the null hypothesis if the resulting sample mean is less than 59000 miles.
9.3 Calculate $\alpha$ for the tire problem if the decision rule is to take a sample of 1000 tires and reject the null hypothesis if the resulting sample mean is less than 59000 miles.
9.4 Calculate $\beta$ for problem 9.1 for $H_A: \mu = 58000$ miles.
9.5 Calculate $\beta$ for problem 9.2 if $H_A: \mu = 58000$ miles.
Note what happens to the values of $\alpha$ and $\beta$ when the decision rule changes.
9.6 A cattle herd has been producing 2 gallons of milk per day per cow. The manager of the dairy operation believes that the standard deviation of the milk production is 0.8 gallon (per cow per day). The manager decides to try a new kind of feed in the hope of boosting milk production. He uses the new feed and collects 36 days' worth of data, which gives a sample mean of $\bar{X} = 2.2$ gallons of milk per day per cow. Do these data give sufficient evidence to conclude that the new feed has increased milk production? Test with a 5% level of significance.
9.7 A particular airline has observed an average of 198 passengers on a certain flight. The airline believes that it can increase the number of passengers by offering increased frequent flyer miles. On its first 16 flights after initiation of the program, the airline observed an average of 205 passengers per flight with a standard deviation of 8 passengers. Do these data present sufficient evidence to indicate that the frequent flyer program has been effective in increasing the airline's passenger usage between these cities? Test using a 1% level of significance. Assume the distribution of passengers to be normal.
9.8 Suppose that the local electric company announces that it wishes to raise its rates to 30 cents per kilowatt-hour. The company claims that this is the same as the average rate for all electric companies in the country. A public interest group disputes this and conducts a survey of rates for 200 different companies. The result of the survey is that the average rate for the surveyed companies is 25 cents per kilowatt-hour with a standard deviation of 5 cents per kilowatt-hour. Do these data present sufficient evidence to conclude that the company's claim is valid and that they do not charge more than similar companies? Test using a 10% level of significance.
9.9 An automobile manufacturer claims its cars have an average gasoline mileage in excess of 30 mpg. A sample of n = 100 cars gives a sample mean usage of 30.5 mpg. Assume the population standard deviation is 2 mpg. Do these data indicate that the company's claim is true? Evaluate using a level of significance of 10%.
9.10 A vendor of soap powder wants to know if the machines filling the boxes with its product are operating properly. If too much soap is being put in the boxes the company is losing money; if too little soap is being put in the boxes the customers are being cheated. Suppose a machine is picked which is supposed to fill the boxes with one pound of powder. A sample of 25 boxes is taken, resulting in a sample mean of 0.981 pound and a standard deviation of 0.037 pound. Do the data give sufficient evidence to conclude that the machine is not operating properly? Use a 1% level of significance.
Using Excel
9.11 An automobile manufacturer claims that the average pollution generated by its automobiles does not exceed 100 ppm (parts per million) for each mile driven. To test this claim 40 automobiles are selected, driven, and the pollution produced is measured. The mean of the sample is 100.451 ppm and the standard deviation is 1.224 ppm. Test the claim using a 10% level of significance.
9.12 The supporters of Proposition 222 on an election ballot claim that a majority of registered voters favor the proposition. A local newspaper questions 1200 registered voters and 602 of them say they support the proposition. Test the claim using a 10% level of significance.
9.13 A college administrator claims that at least 60% of the students at his institution graduate within 4 years. A statistics professor at that college examines the records of 300 students and finds that 160 of them graduated within 4 years. Test the administrator's claim using a 20% level of significance.
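Problems 9.1 through 9.5 all ask for the probability that a decision rule on the sample mean rejects (or fails to reject) the null hypothesis. A minimal sketch of that calculation in Python is shown here for the setup of problems 9.1 and 9.4, assuming the sample mean is approximately normal with standard error $\sigma/\sqrt{n}$. The function name `reject_probability` is illustrative only and is not part of the chapter, which works with Z tables and Excel.

```python
from math import sqrt
from statistics import NormalDist


def reject_probability(true_mean, cutoff, sigma, n):
    """P(sample mean < cutoff) when the population mean equals true_mean.

    Assumes the sample mean is normal with standard error sigma / sqrt(n).
    """
    standard_error = sigma / sqrt(n)
    z = (cutoff - true_mean) / standard_error
    return NormalDist().cdf(z)


# Problem 9.1 setup: H0 mean 60000, sigma 10000, n = 100,
# reject H0 if the sample mean is below 58500.
# alpha is the probability of rejecting H0 when H0 is true.
alpha = reject_probability(true_mean=60000, cutoff=58500, sigma=10000, n=100)

# Problem 9.4 setup: same rule, but the true mean is 58000.
# beta is the probability of failing to reject H0 under that alternative.
beta = 1 - reject_probability(true_mean=58000, cutoff=58500, sigma=10000, n=100)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Changing `cutoff` or `n` in the two calls gives the remaining problems, and shows how moving the cutoff trades one error probability against the other.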
Answers
9.1 Calculate $\alpha$ for the tire problem with $H_0: \mu = \mu_0 = 60000$, where $\sigma = 10000$, if the decision rule is to take a sample of 100 tires and reject the null hypothesis if the resulting sample mean is less than 58500 miles.
9.2 Calculate $\alpha$ for the tire problem if the decision rule is to take a sample of 100 tires and reject the null hypothesis if the resulting sample mean is less than 59000 miles.
9.3 Calculate $\alpha$ for the tire problem if the decision rule is to take a sample of 1000 tires and reject the null hypothesis if the resulting sample mean is less than 59000 miles.
9.4 Calculate $\beta$ for problem 9.1 for $H_A: \mu = 58000$ miles.
9.5 Calculate $\beta$ for problem 9.2 if $H_A: \mu = 58000$ miles. Note what happens to the values of $\alpha$ and $\beta$ when the decision rule changes.
9.6 A cattle herd has been producing 2 gallons of milk per day per cow. The manager of the dairy operation believes that the standard deviation of the milk production is 0.8 gallon (per cow per day). The manager decides to try a new kind of feed in the hope of boosting milk production. He uses the new feed and collects 36 days' worth of data, which gives a sample mean of $\bar{X} = 2.2$ gallons of milk per day per cow. Do these data give sufficient evidence to conclude that the new feed has increased milk production? Test with a 5% level of significance. Also test using the p-value.
$H_0: \mu = \mu_0 = 2$ (no change in milk production)
$H_A: \mu > \mu_0$ (increase in milk production)
$\alpha = 0.05$; reject $H_0$ if $Z > 1.65$
Data: $\bar{X} = 2.2$, $\sigma = 0.8$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.133$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \dfrac{2.2 - 2.0}{0.133} = 1.504$
For the p-value, note that all the area is in the right tail. So, using Excel,
p-value   0.06629071   = 1 - NORMDIST(1.504,0,1,TRUE)
The Z test for the problem gives $Z = 1.504 < 1.65$, so this does not provide sufficient evidence to reject the null hypothesis at a 5% level of significance. The p-value is $p = 0.066 > 0.05$ and this also indicates that there is not sufficient evidence to reject the null.
9.7 A particular airline has observed an average of 198 passengers on a certain flight.
The airline believes that it can increase the number of passengers by offering increased frequent flyer miles. On its first 16 flights after initiation of the program, the airline observed an average of 205 passengers per flight with a standard deviation of 8 passengers. Do these data present sufficient evidence to indicate that the frequent flyer program has been effective in increasing the airline's passenger usage between these cities? Test using a 1% level of significance. Assume the distribution of passengers to be normal.
$H_0: \mu = \mu_0 = 198$ (no change in the number of passengers)
$H_A: \mu > \mu_0$ (increase in number of passengers)
$\alpha = 0.01$; reject $H_0$ if $t > 2.602$ (15 df)
Data: $\bar{X} = 205$, $s = 8$, $s_{\bar{X}} = s/\sqrt{n} = 2$
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}} = \dfrac{205 - 198}{2} = 3.50$
p-value   0.001612   = TDIST(3.5,15,1)
The p-value $p = 0.0016 < 0.01$ shows we have sufficient evidence to reject the null hypothesis. Likewise $t = 3.50 > 2.602$ is sufficient evidence to reject the null. We would conclude that the program has increased the number of passengers.
9.8 Suppose that the local electric company announces that it wishes to raise its rates to 30 cents per kilowatt-hour. The company claims that this is the same as the average rate for all electric companies in the country. A public interest group disputes this and conducts a survey of rates for 200 different companies. The result of the survey is that the average rate for the surveyed companies is 25 cents per kilowatt-hour with a standard deviation of 5 cents per kilowatt-hour. Do these data present sufficient evidence to conclude that the company's claim is valid and that they do not charge more than similar companies? Test using a 10% level of significance.
$H_0: \mu = \mu_0 = 30$ (the claim is true)
$H_A: \mu < \mu_0$ (the claim is false)
$\alpha = 0.10$; reject $H_0$ if $t < -1.298$ (199 df)
Data: $\bar{X} = 25$, $s = 5$, $s_{\bar{X}} = s/\sqrt{n} = 0.354$
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}} = \dfrac{25 - 30}{0.354} = -14.12$
p-value   4.03648E-32   = TDIST(14.12,199,1)
There is plenty of evidence to reject the null hypothesis here. The t statistic is huge.
The p-value is extremely small, almost equal to zero.
Suppose that the sample mean had been 29.5 cents per kilowatt-hour rather than 25. How would that have changed the analysis?
$H_0: \mu = \mu_0 = 30$ (the claim is true)
$H_A: \mu < \mu_0$ (the claim is false)
$\alpha = 0.10$; reject $H_0$ if $t < -1.298$ (199 df)
Data: $\bar{X} = 29.5$, $s = 5$, $s_{\bar{X}} = s/\sqrt{n} = 0.354$
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}} = \dfrac{29.5 - 30}{0.354} = -1.41$
p-value   approximately 0.080   = TDIST(1.41,199,1)
We would reject the null here with this value of t, since $-1.41 < -1.298$, and with the p-value, since $p \approx 0.080 < 0.10$ at a 10% level of significance. We would not reject if the level of significance were 5% ($p \approx 0.080 > 0.05$).
9.9 An automobile manufacturer claims its cars have an average gasoline mileage in excess of 30 mpg. A sample of n = 100 cars gives a sample mean usage of 30.5 mpg. Assume the population standard deviation is 2 mpg. Do these data indicate that the company's claim is true? Evaluate using a level of significance of 10%.
$H_0: \mu = \mu_0 = 30$ (the claim is false)
$H_A: \mu > \mu_0$ (the claim is true)
$\alpha = 0.10$; reject $H_0$ if $t > 1.296$ (99 df)
Data: $\bar{X} = 30.5$, $s = 2$, $s_{\bar{X}} = s/\sqrt{n} = 0.2$
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}} = \dfrac{30.5 - 30}{0.2} = 2.5$
p-value   0.007031   = TDIST(2.5,99,1)
We would reject the null hypothesis here using the test statistic because $2.5 > 1.296$. We would also reject the null hypothesis using the p-value because $p = 0.007 < 0.10$.
9.10 A vendor of soap powder wants to know if the machines filling the boxes with its product are operating properly. If too much soap is being put in the boxes the company is losing money; if too little soap is being put in the boxes the customers are being cheated. Suppose a machine is picked which is supposed to fill the boxes with one pound of powder. A sample of 25 boxes is taken, resulting in a sample mean of 0.981 pound and a standard deviation of 0.037 pound. Do the data give sufficient evidence to conclude that the machine is not operating properly? Use a 1% level of significance.
Using Excel
x bar     0.981
s         0.037
n         25
s xbar    0.0074     = 0.037 / SQRT(25)
t         -2.56757   = (0.981 - 1) / 0.0074
p-value   0.016896   = TDIST(2.56757, 24, 2)
$H_0: \mu = \mu_0 = 1$ (the machine is operating properly)
$H_A: \mu \ne \mu_0$ (it is not operating properly)
$\alpha = 0.01$
Because $p = 0.016896 > 0.01$ we do not have sufficient evidence to reject the null hypothesis at a 1% level of significance. Note that we would have rejected the null at a 2% level of significance.
9.11 An automobile manufacturer claims that the average pollution generated by its automobiles does not exceed 100 ppm (parts per million) for each mile driven. To test this claim 40 automobiles are selected, driven, and the pollution produced is measured. The mean of the sample is 100.451 ppm and the standard deviation is 1.224 ppm. Test the claim using a 10% level of significance.
x bar     100.451
s         1.224
s xbar    0.193531393   = 1.224 / SQRT(40)
t         2.330376012   = (100.451 - 100) / 0.193531
p-value   0.012525585   = TDIST(2.330388, 39, 1)
$H_0: \mu = \mu_0 = 100$ (claim is true)
$H_A: \mu > \mu_0$ (claim is false)
$\alpha = 0.10$
Conclusion: Because $p = 0.012526 < 0.10$ these data give sufficient evidence to reject the null hypothesis. We would have come to a different conclusion if the level of significance had been 1%.
9.12 The supporters of Proposition 222 on an election ballot claim that a majority of registered voters favor the proposition. A local newspaper questions 1200 registered voters and 602 of them say they support the proposition. Test the claim using a 10% level of significance.
$H_0: p = p_0 = 0.50$ (not a majority)
$H_A: p > p_0$ (a majority)
$\alpha = 0.10$; reject $H_0$ if $Z > 1.28$
CLT: $n p_0 = 1200(0.5) = 600$ and $n q_0 = 1200(0.5) = 600$, so the CLT holds
Data: $X = 602$, $\hat{p} = \dfrac{X}{n} = \dfrac{602}{1200} = 0.502$, $\sigma_{p_0} = \sqrt{p_0 q_0 / n} = \sqrt{0.5 \times 0.5 / 1200} = 0.0144$
Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}} = \dfrac{0.502 - 0.500}{0.0144} = 0.14$
So there is not sufficient evidence to reject the null hypothesis at the 10% level of significance.
Looking at Excel
po         0.5
qo         0.5
sigma po   0.014433757   = SQRT( (0.5)*(0.5)/1200)
p hat      0.501666667
Z          0.115491201   = (0.501667 - 0.5) / 0.014434
p-value    0.454028325   = 1 - NORMDIST(0.11549, 0, 1, TRUE)
The p-value is $p = 0.454 > 0.10$ and this indicates insufficient evidence to reject the null.
9.13 A college administrator claims that at least 60% of the students at his institution graduate within 4 years. A statistics professor at that college examines the records of 300 students and finds that 160 of them graduated within 4 years. Test the administrator's claim using a 20% level of significance.
$H_0: p = p_0 = 0.60$ (claim true)
$H_A: p < p_0$ (claim false)
$\alpha = 0.20$; reject $H_0$ if $Z < -0.84$
CLT: $n p_0 = 300(0.6) = 180$ and $n q_0 = 300(0.4) = 120$, so the CLT holds
Data: $X = 160$, $\hat{p} = \dfrac{X}{n} = \dfrac{160}{300} = 0.53$, $\sigma_{p_0} = \sqrt{p_0 q_0 / n} = \sqrt{0.6 \times 0.4 / 300} = 0.0283$
Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sigma_{p_0}} = \dfrac{0.53 - 0.6}{0.0283} = -2.5$
Using Excel
n         300
po        0.6
qo        0.4
sigma     0.028284   = SQRT( 0.6 * 0.4 / 300)
p hat     0.533333   = 160 / 300
Z         -2.4735
p-value   0.00669    = NORMDIST( -2.4735, 0, 1, TRUE)
Does CLT hold?
n*po =    180        = 300 * 0.6
n*qo =    120        = 300 * 0.4
The Z shown here comes from the rounded values $\hat{p} = 0.53$ and $\sigma_{p_0} = 0.0283$; with the unrounded $\hat{p} = 0.5333$ the statistic is about -2.36, which leads to the same decision.
Conclusion: $p = 0.00669 < 0.20$, so we can reject the null hypothesis at a 20% level of significance.
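The proportion tests in this chapter all follow the same steps: check the CLT condition, compute $\hat{p}$ and $\sigma_{p_0}$, form Z, and take the tail area that matches the alternative hypothesis. Those steps can be sketched in Python as follows. This is only an illustration, not the chapter's method, which uses Z tables and Excel's NORMDIST; the function name `proportion_z_test` and its `tail` argument are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, tail):
    """One-sample Z test for a proportion.

    tail is "left", "right", or "two" and must match the alternative
    hypothesis (p < p0, p > p0, or p != p0). Returns (z, p_value).
    """
    q0 = 1 - p0
    # CLT check used in the chapter: n*p0 >= 5 and n*q0 >= 5.
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("CLT check failed: need n*p0 >= 5 and n*q0 >= 5")

    p_hat = x / n
    sigma_p0 = sqrt(p0 * q0 / n)
    z = (p_hat - p0) / sigma_p0

    # Area to the left of z under the standard normal curve,
    # the same quantity as NORMDIST(z, 0, 1, TRUE).
    left_area = NormalDist().cdf(z)
    if tail == "left":
        p_value = left_area
    elif tail == "right":
        p_value = 1 - left_area
    elif tail == "two":
        p_value = 2 * min(left_area, 1 - left_area)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value


# Problem 9.13: 160 of 300 students, claimed proportion 0.60, left-tailed test.
z, p_value = proportion_z_test(x=160, n=300, p0=0.60, tail="left")
print(f"Z = {z:.3f}, p-value = {p_value:.4f}")
```

Because the function does not round $\hat{p}$ or $\sigma_{p_0}$ before dividing, its Z for problem 9.13 is about -2.36 rather than the -2.47 obtained from rounded intermediate values. The decision at the 20% level is the same either way.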