252y0811 3/7/08

ECO252 QBA2 FIRST EXAM February 28, 2008 Version 1
Name: KEY    Class hour: _____________    Student number: __________

Show your work! Make Diagrams! Include a vertical line in the middle! Exam is normed on 50 points. Answers without reasons are not usually acceptable.

I. (8 points) Do all the following. $x \sim N(7, 11)$ (But you can't buy donuts there!)

1. $P(x \le 0) = P\left(z \le \frac{0-7}{11}\right) = P(z \le -0.64) = P(z \le 0) - P(-0.64 \le z \le 0) = .5 - .2389 = .2611$

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the entire area below -0.64. Because this is entirely on the left side of zero, we must subtract the area between -0.64 and zero from the larger area below zero. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 7. Indicate the mean by a vertical line! Shade the entire area below zero. This area is entirely on the left side of the mean (7), so we subtract the smaller area between zero and the mean from the larger entire area (.5) below the mean.

2. $P(14 \le x \le 42) = P\left(\frac{14-7}{11} \le z \le \frac{42-7}{11}\right) = P(0.64 \le z \le 3.18) = P(0 \le z \le 3.18) - P(0 \le z \le 0.64) = .4993 - .2389 = .2604$

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the area between 0.64 and 3.18. Because this is entirely on the right side of zero, we must subtract the area between zero and 0.64 from the larger area between zero and 3.18. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 7. Indicate the mean by a vertical line! Shade the area between 14 and 42. This area is entirely on the right side of the mean (7), so we subtract the smaller area between 14 and the mean from the larger area between 42 and the mean.

3. $P(-30 \le x \le 30) = P\left(\frac{-30-7}{11} \le z \le \frac{30-7}{11}\right) = P(-3.36 \le z \le 2.09) = P(-3.36 \le z \le 0) + P(0 \le z \le 2.09) = .4996 + .4817 = .9813$

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the entire area between -3.36 and 2.09. Because this is on both sides of zero, we must add the area between -3.36 and zero to the area between zero and 2.09. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 7. Indicate the mean by a vertical line! Shade the entire area between -30 and +30. This area is on both sides of the mean (7), so we add the area between -30 and the mean to the slightly smaller area between the mean and 30.

4. $x_{.0005}$ (Do not try to use the t table to get this.) (I only need one answer; you may find more than one possibility.)

For z make a diagram. Draw a Normal curve with a mean at 0. $z_{.0005}$ is the value of z with 0.05% of the distribution above it. Since 100 - 0.05 = 99.95, it is also the .9995 fractile. Since 50% of the standardized Normal distribution is below zero, your diagram should show that the probability between $z_{.0005}$ and zero is 99.95% - 50% = 49.95%, or $P(0 \le z \le z_{.0005}) = .4995$. If we check this against the Normal table, we would usually find the probability closest to .4995, but we actually have a choice this time. We can say $P(0 \le z \le 3.27) = .4995$, but we also could say $P(0 \le z \le 3.32) = .4995$. (Actually the table in the text says $P(z \le 3.29) = .99950$.) Any value between 3.27 and 3.32 is acceptable here. So $3.27 \le z_{.0005} \le 3.32$, with values closer to 3.29 best. This is the value of z that you need for a 99.9% confidence interval.

To get from $z_{.0005}$ to $x_{.0005}$, use the formula $x = \mu + z\sigma$, which is the opposite of $z = \frac{x - \mu}{\sigma}$. $x = 7 + 3.29(11) = 43.19$, but any answer from $x = 7 + 3.27(11) = 42.97$ to $x = 7 + 3.32(11) = 43.52$ is acceptable. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 7. Show that 50% of the distribution is below the mean (7). If 0.05% of the distribution is above $x_{.0005}$, it must be above the mean and have 49.95% of the distribution between it and the mean.

Check: $P(x \ge 43.19) = P\left(z \ge \frac{43.19 - 7}{11}\right) = P(z \ge 3.29) = P(z \ge 0) - P(0 \le z \le 3.29) = .5 - .4995 = .0005$. This is identical to the way you normally get a p-value for a right-sided test.
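The four answers in Part I can be cross-checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for the Normal table; the variable names are illustrative and not part of the exam. Because the key rounds z to two decimals before using the table, the computed values agree with the key only to about two decimal places.

```python
from statistics import NormalDist

# x ~ N(7, 11): mean 7, standard deviation 11, as given in Part I.
x = NormalDist(mu=7, sigma=11)

# 1. P(x <= 0)
p1 = x.cdf(0)

# 2. P(14 <= x <= 42)
p2 = x.cdf(42) - x.cdf(14)

# 3. P(-30 <= x <= 30)
p3 = x.cdf(30) - x.cdf(-30)

# 4. x_.0005: the value with 0.05% of the distribution above it,
#    i.e. the .9995 fractile.
x_0005 = x.inv_cdf(1 - 0.0005)

print(f"P(x <= 0)         = {p1:.4f}")
print(f"P(14 <= x <= 42)  = {p2:.4f}")
print(f"P(-30 <= x <= 30) = {p3:.4f}")
print(f"x_.0005           = {x_0005:.2f}")
```

The small differences from the table answers (.2611, .2604, .9813, 43.19) come from rounding z, not from a different method.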
II. (9 points, 2 point penalty for not trying part a.) (Langley) A copier has been turning out 45 copies a minute. After a repair, the copier is tested 6 times. In these 6 runs the output per minute is 46, 47, 48, 47, 47, 46.

a. Treating the above numbers as a random sample of size 6 from the Normal distribution, compute the sample standard deviation, s, of output. Show your work! (2)
b. Compute a 90% confidence interval for the mean output per minute. (2)
c. Redo b) when you find out that there were only 20 runs to pick the sample of 6 from. (2)
d. Assume that the population standard deviation is 0.7 and create a 99.9% two-sided confidence interval for the mean. (2)
e. Use your results in a) to test the hypothesis that the mean is above 45 at the 90% level. (3) State your null and alternative hypotheses clearly!
f. (Extra Credit) Test the hypothesis that the population standard deviation is 0.70 at the 99.9% confidence level assuming that a random sample of 50 yielded a sample standard deviation of 0.75.

Solution:

a. Compute the sample standard deviation, s. (2)

Row     x     x²
1       46    2116
2       47    2209
3       48    2304
4       47    2209
5       47    2209
6       46    2116
Total   281   13163

$\sum x = 281$ and $\sum x^2 = 13163$. So $\bar x = \frac{\sum x}{n} = \frac{281}{6} = 46.83333$,

$s^2 = \frac{\sum x^2 - n\bar x^2}{n-1} = \frac{13163 - 6(46.83333)^2}{5} = \frac{2.8352}{5} = 0.5670$ and $s = \sqrt{0.5670} = 0.7530$.

Note: carried without intermediate rounding, $\sum x^2 - n\bar x^2 = 2.8333$, so $s^2 = 0.5667$ and $s = 0.7528$. Parts b), c) and e) below use $s^2 = 0.5679$, which changes the results only in the third decimal place.

b. Compute a 90% confidence interval for the mean output per minute. (2)

The table below is part of Table 3.

Interval for: Mean ($\sigma$ known)
  Confidence interval: $\mu = \bar x \pm z_{\alpha/2}\,\sigma_{\bar x}$
  Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
  Test ratio: $z = \frac{\bar x - \mu_0}{\sigma_{\bar x}}$
  Critical value: $\bar x_{cv} = \mu_0 \pm z_{\alpha/2}\,\sigma_{\bar x}$
  where $\sigma_{\bar x} = \frac{\sigma}{\sqrt n}$

Interval for: Mean ($\sigma$ unknown)
  Confidence interval: $\mu = \bar x \pm t_{\alpha/2}\,s_{\bar x}$, $DF = n - 1$
  Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
  Test ratio: $t = \frac{\bar x - \mu_0}{s_{\bar x}}$
  Critical value: $\bar x_{cv} = \mu_0 \pm t_{\alpha/2}\,s_{\bar x}$
  where $s_{\bar x} = \frac{s}{\sqrt n}$

At this point the population standard deviation is unknown, so we must use the formula for $\sigma$ unknown. $df = n - 1 = 5$. $\alpha = .10$. $t^{(n-1)}_{\alpha/2} = t^{(5)}_{.05} = 2.015$. $s_{\bar x} = \frac{s}{\sqrt n} = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{0.5679}{6}} = \sqrt{0.09465} = 0.30765$. So $\mu = \bar x \pm t_{.05}\,s_{\bar x} = 46.8333 \pm 2.015(0.30765) = 46.8333 \pm 0.6199$, or 46.213 to 47.453.
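Parts a) and b) can be reproduced in a few lines. This is a sketch under stated assumptions: the critical value $t^{(5)}_{.05} = 2.015$ is hard-coded from the t table quoted in the key (the Python standard library has no t-distribution quantile function), and the variable names are illustrative. The code carries full precision, so it gives $s \approx 0.7528$ instead of the key's rounded 0.7530.

```python
from math import sqrt
from statistics import mean, stdev

# Copier output per minute in the six test runs.
runs = [46, 47, 48, 47, 47, 46]
n = len(runs)

x_bar = mean(runs)        # sample mean
s = stdev(runs)           # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)          # standard error of the mean, s / sqrt(n)

# Assumption: t value for df = 5 and alpha/2 = .05, copied from the t table.
T_05_DF5 = 2.015

half_width = T_05_DF5 * se
ci = (x_bar - half_width, x_bar + half_width)

print(f"mean = {x_bar:.4f}, s = {s:.4f}, s_xbar = {se:.4f}")
print(f"90% CI: {ci[0]:.3f} to {ci[1]:.3f}")
```

The interval agrees with the key's 46.213 to 47.453 to within rounding.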
c. Redo b) when you find out that there were only 20 runs to pick the sample of 6 from. (2)

$s_{\bar x} = \frac{s}{\sqrt n}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{s^2}{n}\cdot\frac{N-n}{N-1}} = \sqrt{\frac{0.5679}{6}\cdot\frac{20-6}{20-1}} = \sqrt{0.06974} = 0.26409$

$\mu = \bar x \pm t_{.05}\,s_{\bar x} = 46.8333 \pm 2.015(0.26409) = 46.8333 \pm 0.5321$, or 46.301 to 47.365.

Incidentally $\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{20-6}{20-1}} = \sqrt{0.7368} = .8584$.

d. Assume that the population standard deviation is 0.7 and create a 99.9% two-sided confidence interval for the mean. (2)

$\alpha = 1 - .999 = .001$. We found on page 1 that $z_{\alpha/2} = z_{.0005} = 3.29$ (or something similar). $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{0.7^2}{6}} = \sqrt{0.081667} = 0.2858$. So $\mu = \bar x \pm z_{.0005}\,\sigma_{\bar x} = 46.8333 \pm 3.29(0.2858) = 46.833 \pm 0.940$, or 45.893 to 47.773.

e. Use your results in a) to test the hypothesis that the mean is above 45 at the 90% level. (3) State your null and alternative hypotheses clearly!

Since the statement 'the mean is above 45' does not contain an equality, it must be an alternative hypothesis. Our hypotheses are $H_0: \mu \le 45$ and $H_1: \mu > 45$. $\alpha = .10$, $\bar x = 46.83333$, $n = 6$, $\mu_0 = 45$ and $s_{\bar x} = 0.30765$. Since we are worrying about the mean being too large, this is a right-sided test and we want a single critical value for the sample mean above 45. $t^{(n-1)}_{\alpha} = t^{(5)}_{.10} = 1.476$.

Critical value: The two-sided formula is $\bar x_{cv} = \mu_0 \pm t_{\alpha/2}\,s_{\bar x}$ and this becomes $\bar x_{cv} = \mu_0 + t_{\alpha}\,s_{\bar x} = 45 + 1.476(0.30765) = 45.4541$. Make a diagram. Make a Normal curve centered at $\mu_0 = 45$. (45, of course, is in your 'do not reject' zone.) The 'reject' zone is the area under the curve above 45.4541. Shade it. Since $\bar x = 46.83333$ is in the reject zone, reject the null hypothesis. We can say that the copier's speed has improved. You should, of course, only do this one way; I have to do it three ways.

Test ratio: If you choose to use a test ratio, the formula $t = \frac{\bar x - \mu_0}{s_{\bar x}}$ gives us $t = \frac{46.8333 - 45}{0.30765} = 5.959$. Make a diagram. Make a Normal curve centered at zero. (Zero, of course, is in your 'do not reject' zone.) The 'reject' zone is the area under the curve above $t^{(n-1)}_{\alpha} = t^{(5)}_{.10} = 1.476$. Shade it. Since $t = 5.959$ is in the reject zone, reject the null hypothesis.

p-value: If you want a p-value for the null hypothesis, remember $t = 5.959$ and $df = 5$. The highest value on the $df = 5$ row of the t-table is $t^{(5)}_{.001} = 5.893$. Since the significance level falls as t gets larger, we can say that p-value < .001. Since the p-value is below the .10 significance level, reject the null hypothesis.

Confidence interval: If you choose to do a confidence interval for the population mean, the formula $\mu = \bar x \pm t_{\alpha/2}\,s_{\bar x}$ must be replaced by a one-sided interval in the same direction as the alternative hypothesis $H_1: \mu > 45$. This means that the interval is $\mu \ge \bar x - t_{\alpha}\,s_{\bar x} = 46.8333 - 1.476(0.30765) = 46.379$. Make a diagram. Make a Normal curve centered at $\bar x = 46.83333$. ($\bar x$, of course, is in the confidence interval.) The confidence interval is the area under the curve above 46.379. Shade it. Since $\mu_0 = 45$ is not in the confidence interval, reject the null hypothesis.

f. (Extra Credit) Test the hypothesis that the population standard deviation is 0.70 at the 99.9% confidence level assuming that a random sample of 50 yielded a sample standard deviation of 0.75.

The table below is part of Table 3.

Interval for: Variance, small sample
  Confidence interval: $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
  Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
  Test ratio: $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
  Critical value: $s^2_{cv} = \frac{\chi^2\,\sigma_0^2}{n-1}$, using $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$

Interval for: Variance, large sample
  Confidence interval: $\sigma = \frac{s\sqrt{2\,DF}}{\pm z_{\alpha/2} + \sqrt{2\,DF}}$
  Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
  Test ratio: $z = \sqrt{2\chi^2} - \sqrt{2\,DF - 1}$
  Critical value: $s_{cv} = \frac{\sigma_0\sqrt{2\,DF}}{\pm z_{\alpha/2} + \sqrt{2\,DF}}$

Our hypotheses are $H_0: \sigma = 0.70$ and $H_1: \sigma \ne 0.70$. $\alpha = 1 - .999 = .001$, $s = 0.75$, $n = 50$ and $\sigma_0 = 0.70$. The test ratio, which is what most people use, is $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{49(0.75)^2}{(0.70)^2} = 56.25$. Because the degrees of freedom, $df = n - 1 = 50 - 1 = 49$, are too high for the chi-squared table, we must use $z = \sqrt{2\chi^2} - \sqrt{2\,df - 1} = \sqrt{2(56.25)} - \sqrt{2(49) - 1} = \sqrt{112.50} - \sqrt{97} = 10.6066 - 9.8489 = 0.7577$. We could make a diagram showing a Normal curve centered at zero with one 'reject' zone below $-z_{\alpha/2} = -z_{.0005} = -3.29$ and a second 'reject' zone above $z_{\alpha/2} = z_{.0005} = 3.29$. Since $z = 0.7577$ does not fall in these zones, do not reject the null hypothesis. However the easiest way for me is to say p-value $= 2P(z \ge 0.7577) = 2[P(z \ge 0) - P(0 \le z \le 0.76)] = 2(.5 - .2764) = .4472$.
Since this p-value is above any significance level that we would ever use, do not reject the null hypothesis.

III. Do as many of the following problems as you can. (2 points each unless marked otherwise, adding to 13+ points). Show your work except in multiple choice questions. (Actually, it doesn't hurt there either.) If the answer is 'None of the above,' put in the correct answer if possible. ( ) gives points for the question. [ ] gives a running total.

1) If I want to test to see if the population mean of x is smaller than 5 my null hypothesis is:
i) $\mu$ … 5
ii) $\mu$ … 5
iii) * $\mu \ge 5$
iv) $\mu$ … 5
vi) None of the above. (So what is it?)
vii) Any of i)-iv) could be right. We need more information.
Explanation: The statement "The population mean of x is smaller than 5" translates as $\mu < 5$. Since this does not contain an equality, it is an alternative hypothesis. The opposite is $H_0: \mu \ge 5$ and must be the null hypothesis.

2) Assuming that you have a sample mean of 100 based on a sample of 36 taken from a population of 900 and you are testing to see if the population mean is 90 with a known population standard deviation of 80, the 99% critical values for the sample mean are
a) $100 \pm 2.576\,\frac{80}{\sqrt{36}}$
b) $100 \pm 2.626\,\frac{80}{\sqrt{36}}$
c) $100 \pm 2.576\,\frac{80}{\sqrt{900}}$
d) * $90 \pm 2.576\,\frac{80}{\sqrt{36}}$
e) $90 \pm 2.626\,\frac{80}{\sqrt{36}}$
f) $90 \pm 2.576\,\frac{80}{\sqrt{900}}$
Explanation: The statement "The population mean is 90" translates as $H_0: \mu = 90$. So we have $\alpha = .01$, $\mu_0 = 90$, $n = 36$ and $\sigma = 80$. The other two statements, $\bar x = 100$ and $N = 900$, are irrelevant since 1) a critical value is based on the null hypothesis and 2) 900 is more than 20 times 36 so no finite population correction is required. On page 2 of this exam we quote Table 3 to say $\bar x_{cv} = \mu_0 \pm z_{\alpha/2}\,\sigma_{\bar x}$. Degrees of freedom are irrelevant since the population standard deviation is known. The t-table says $z_{\alpha/2} = t^{(\infty)}_{.005} = 2.576$.

3) Which of the following is a Type 1 error?
a) Rejecting the null hypothesis when the null hypothesis is false.
b) *Rejecting the null hypothesis when the null hypothesis is true.
c) Not rejecting the null hypothesis when the null hypothesis is true.
d) Not rejecting the null hypothesis when the null hypothesis is false.
e) All of the above
f) None of the above.

4) (Langley) It is generally believed that 15% of white Australians are allergic to penicillin. A doctor believes that the allergy occurs in a lower proportion of Native Australians. To test that belief a random sample is gathered of 50 Native Australians and it is found that only one (2% of the sample) is allergic to penicillin. The doctor creates a p-value to compare against a significance level of 5%. What do we mean by a p-value? [8]
a) P-value is the 2% proportion.
b) * P-value is $P(\bar p \le .02)$.
c) P-value is $P(\bar p \ge .02)$.
d) P-value is $2P(\bar p \le .02)$.
e) P-value is $2P(\bar p \ge .02)$.
f) P-value is $P(\bar p \le .15)$.
g) P-value is $P(\bar p \ge .15)$.
h) P-value is $2P(\bar p \le .15)$.
i) P-value is $2P(\bar p \ge .15)$.
j) P-value is $2P(\bar p$ … $.05)$.
Explanation: The statement "The allergy occurs in a lower proportion" translates as $H_1: p < .15$. The p-value is a measure of the credibility of the null hypothesis and is defined as the probability that a test statistic or ratio as extreme as or more extreme than the observed statistic or ratio could occur, assuming that the null hypothesis is true. The observed statistic that we are using is $\bar p = \frac{1}{50} = .02$ and the alternative hypothesis says that this is a left-sided test. (We would reject the null hypothesis $H_0: p \ge .15$ if $\bar p$ is too far below .15.) Thus the p-value is $P(\bar p \le .02)$.

Exhibit 1: (Langley) Langley's daughter loved to play chutes and ladders. She told her daddy, however, that the one die she used seemed to come up a six when she didn't want a six. A fair die is equally likely to come up with each of its six faces on top. Langley now suspected that the die was coming up a six more often than it should. As an experiment, he rolled the die 108 times and it came up a six 25 times.

5) We wish to test whether Langley's suspicion in Exhibit 1 is correct.
To do so, we must do which of the following: [10]
a) A z-test of the population mean.
b) *A z-test of a population proportion.
c) A t-test of the population mean.
d) A $\chi^2$-test of the population variance.
e) A test of the population median.
f) None of the above (To get full credit propose a test type.)
Explanation: If we are trying to compare the proportion of times that six comes up with $\frac{1}{6}$, we are testing proportions. This could be a binomial test, but the only test anyone uses for a proportion with a large sample is a z-test. A number of you said this involved a mean. Ask yourself of what you would take the mean.

6) In Exhibit 1, what are the null and alternative hypotheses that Langley should be testing? (2)

According to Table 3 the test for a proportion involves the following.

Interval for: Proportion
  Confidence interval: $p = \bar p \pm z_{\alpha/2}\,s_{\bar p}$, where $s_{\bar p} = \sqrt{\frac{\bar p\,\bar q}{n}}$ and $\bar q = 1 - \bar p$
  Hypotheses: $H_0: p = p_0$, $H_1: p \ne p_0$
  Test ratio: $z = \frac{\bar p - p_0}{\sigma_{\bar p}}$
  Critical value: $\bar p_{cv} = p_0 \pm z_{\alpha/2}\,\sigma_{\bar p}$, where $\sigma_{\bar p} = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$

"A fair die is equally likely to come up with each of its six faces on top. Langley now suspected that the die was coming up a six more often than it should," means that Langley believed $H_1: p > \frac{1}{6}$, which is an alternative hypothesis because it does not contain an equality. So the null hypothesis is the opposite, which is $H_0: p \le \frac{1}{6}$, and we have $H_0: p \le \frac{1}{6}$, $H_1: p > \frac{1}{6}$. ($\frac{1}{6} = .1667$)

7) In Exhibit 1, what is the value of the test ratio that you would use to test your hypotheses in 6)? Show your work. (Note that this could be right even if the answer to 6) is wrong.) (3)

Solution: $n = 108$ and $x = 25$, so $\bar p = \frac{25}{108} = .2315$. $p_0 = \frac{1}{6}$ from the null hypothesis. $\sigma_{\bar p} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{\frac{1}{6}\cdot\frac{5}{6}}{108}} = \sqrt{.001286} = .03586$. Thus our test ratio is $z = \frac{\bar p - p_0}{\sigma_{\bar p}} = \frac{.2315 - .1667}{.03586} = 1.807$.

8) Using a 95% confidence level, explain, using your hypotheses, whether the die was fair. (2) [17]

Solution: Since z has the standardized Normal distribution and since our alternative hypothesis is $H_1: p > \frac{1}{6}$, we are worried about $\bar p$ being too large, so we have a right-sided test. With a 95% confidence level and a 1-sided test, we use $z_{.05} = 1.645$ and test our computed value of z. Make a diagram. The Normal curve is centered at zero and the 'reject' region is all points above 1.645. Shade the 'reject' region and note that 1.807 falls in the 'reject' region, so we reject our null hypothesis and feel that we have demonstrated that the null hypothesis is false.

Exhibit 2: (Ng) The manager of the credit department believes that the average balance held by credit card holders is $75. A random sample of 29 accounts is selected and she finds that the sample mean of the amount owed is $83.40 and the sample standard deviation is $23.40. It is believed that the distribution of the population is approximately Normal.

9) We wish to test whether the manager's belief in Exhibit 2 is correct. To do so, we must do which of the following: (1) [18]
a) A z-test of the population mean.
b) A z-test of a population proportion.
c) *A t-test of the population mean.
d) A $\chi^2$-test of the population variance.
e) A test of the population median.
f) None of the above (To get full credit propose a test type.)
Explanation: If we are trying to compare the sample mean of $83.40 with a population mean of $75 and the only information we have about the variance is a sample standard deviation, the population variance is unknown so we must do a t-test.

10) a) State your null and alternative hypotheses to test the manager's belief in Exhibit 2. (1) b) Give an appropriate critical value or values (for a mean, proportion, variance or median). (2) [21]

Solution: a) $H_0: \mu = 75$, $H_1: \mu \ne 75$. b) On page 2 of this exam we quote Table 3 to say that $\bar x_{cv} = \mu_0 \pm t_{\alpha/2}\,s_{\bar x}$, where $s_{\bar x} = \frac{s}{\sqrt n}$. We are given the following information in the exhibit: $\mu_0 = 75$, $n = 29$, $\bar x = 83.40$ and $s = 23.40$. We will assume in the absence of any other value that $\alpha = .05$ and use $t^{(n-1)}_{\alpha/2} = t^{(28)}_{.025} = 2.048$. The standard error is $s_{\bar x} = \frac{s}{\sqrt n} = \frac{23.40}{\sqrt{29}} = \sqrt{\frac{23.40^2}{29}} = \sqrt{18.8814} = 4.3453$. If we fill in the critical value formula, $\bar x_{cv} = \mu_0 \pm t_{\alpha/2}\,s_{\bar x} = 75 \pm 2.048(4.3453) = 75 \pm 8.899$. The critical values are 66.101 and 83.899.

Note: You were not asked for a conclusion to this problem, but $\bar x = 83.40$ is barely below the upper critical value, so that we do not reject the null hypothesis.

11) The manager of the credit department believes that the median balance held by credit card holders is above $75 and that the population does not have a Normal distribution. A random sample of 100 accounts is selected and 60 of the accounts have balances above $75. Which of the following is the null hypothesis that the manager will end up testing? (To protect yourself, you might want to explain what p is. Otherwise I will use my own assumption.) (2) [23]
a) $p$ … .5
b) $p$ … .5
c) $p$ … .5
d) $p$ … .5
e) * $p \le .5$
f) None of the above (To get full credit propose a null hypothesis.)

Solution: Writing $\eta$ for the population median, $H_0: \eta \le 75$, $H_1: \eta > 75$. Note that if the distribution was Normal, we could test for the mean. Other things being equal, we can assume that p is the proportion of accounts that have balances above $75. The quick and dirty way to do this is to copy from the outline. The relevant part is starred.

Hypotheses about a median and the corresponding hypotheses about a proportion:

Median: $H_0: \eta \ge \eta_0$, $H_1: \eta < \eta_0$
  If p is the proportion above $\eta_0$: $H_0: p \ge .5$, $H_1: p < .5$
  If p is the proportion below $\eta_0$: $H_0: p \le .5$, $H_1: p > .5$

Median: $H_0: \eta \le \eta_0$, $H_1: \eta > \eta_0$
  If p is the proportion above $\eta_0$: * $H_0: p \le .5$, $H_1: p > .5$
  If p is the proportion below $\eta_0$: $H_0: p \ge .5$, $H_1: p < .5$

Note: I assumed that by p you meant the proportion of accounts that had balances above 75. Telling me that p is a proportion of the accounts doesn't tell me anything.

ECO252 QBA2 FIRST EXAM February 28, 2008 TAKE HOME SECTION
Name: _________________________
Student Number and class time: _________________________

IV.
Do at least 3 problems (at least 7 each) (or do sections adding to at least 20 points. Anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where appropriate. You have not done a hypothesis test unless you have stated your hypotheses, run the numbers and stated your conclusion. (Use a 95% confidence level unless another level is specified.) Answers without reasons usually are not acceptable. Neatness and clarity of explanation are expected. This must be turned in when you take the in-class exam. Note that answers without reasons and citation of appropriate statistical tests receive no credit. Failing to be transparent about which section of which problem you are doing can lose you credit. Many answers require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing a table value or a p-value. If you haven't done it lately, take a fast look at ECO 252 - Things That You Should Never Do on a Statistics Exam (or Anywhere Else).

Problem 1: (Doane and Seward) A fast food restaurant has just started serving hot cocoa. The management wishes to serve cocoa of an average temperature of 142 degrees. 24 measurements of the temperature in 10 stores are taken. You are manager of store a and will use the corresponding column, where a is the second to last digit of your student number. (For example, Seymour Butz' student number is 543987 so he uses column x8.) If that number is zero, use column 10. You are testing to see if the mean for your store is 142. There will be a penalty if you do not make it clear what column you are using.
Row   x1   x2   x3   x4   x5   x6   x7   x8   x9   x10
1     140  142  142  143  144  142  143  146  143  143
2     142  143  143  138  139  142  144  145  144  139
3     141  141  141  140  144  142  144  144  141  145
4     142  142  142  140  143  140  145  144  144  144
5     141  139  142  142  141  141  146  145  142  145
6     141  142  140  139  142  141  142  144  140  142
7     145  144  143  142  141  137  145  141  142  141
8     142  145  142  139  138  142  142  141  141  140
9     142  143  141  145  144  139  145  144  146  142
10    142  143  136  139  145  141  143  144  140  142
11    137  141  142  142  139  141  142  139  143  142
12    139  139  138  138  141  143  142  142  144  146
13    139  143  142  142  145  141  141  141  142  142
14    144  144  139  141  142  142  147  142  143  143
15    140  140  140  142  144  140  141  144  142  142
16    141  138  143  141  145  142  137  145  141  140
17    140  141  139  141  142  142  142  139  142  144
18    140  140  139  140  144  142  140  142  144  143
19    140  142  139  142  136  139  144  143  144  141
20    139  140  141  141  138  142  142  146  145  144
21    146  138  143  143  141  143  147  142  145  143
22    138  139  141  141  142  143  146  144  141  141
23    139  142  140  140  140  142  140  144  143  139
24    142  141  143  140  141  140  143  146  142  142

Assume that the Normal distribution applies to the data and use a 98% confidence level.
a. Find the sample mean and sample standard deviation of the temperatures in your data, showing your work. (1) (Your mean should be between 140 and 146 and your sample standard deviation should be around 2.)
b. State your null and alternative hypotheses. (1)
c. Test the hypothesis using a test ratio. (1)
d. Test the hypothesis using a critical value for a sample mean. (1)
e. Test the hypothesis using a confidence interval. (1)
f. Find an approximate p-value for the null hypothesis. (1)
g. On the basis of your tests, is the mean temperature correct in your restaurant? Why? (1)
h. How do your conclusions change if the random sample of 24 temperatures is taken on a day in which only 48 cups of cocoa are sold? (2)
i. Assume that the Normal distribution does not apply and test to see if the median is 142. Be careful!
What should you do with numbers that are exactly 142? (2) [12]
j. (Extra Credit) Do a 98% confidence interval for the median. (2)

Problem 2: Once again assume that the Normal distribution applies to the data in Problem 1, but that we know that the population standard deviation is 2. Our confidence level remains 98%, but we are now testing the hypothesis that the mean is below 143 degrees.
a. State your null and your alternative hypotheses. (1)
b. Find the value of z that you need for a critical value for a 1-sided test if the confidence level is 98%. (1)
c. Find a critical value for the sample mean to test if the mean is below 143 degrees. (1)
d. Test the hypothesis that the mean is below 143 degrees using an appropriate confidence interval. (2)
e. Using your critical value from 2b, create a power curve for your test. (6)
f. Assume that the population standard deviation is 2. How large a sample do you need to get a two-sided 98% confidence interval with an error not exceeding 0.5 degrees? (2) [22]

Problem 3: According to Doane and Seward about 13% of goods bought at a department store are returned. An organization called Return Exchange will sell you a software product called Verify-1 for which it makes the claims below.

Verify-1® is quickly operational. And it authorizes returns even quicker
Verify-1® identifies fraud and abuse at the point of return before they become liabilities to your brand equity or profits. In stand-alone mode, this easy-to-use, turnkey solution can be operational in 30 days and will reduce your return rate immediately, without disrupting your business or IT configuration. Verify-1® also integrates easily into your existing POS platform.

You set the policy, Verify-1® enforces it
With Verify-1®, your returns are dealt with consistently utilizing advanced statistical modeling in combination with state return laws and your existing return policies. At the point of return, using the customer's driver's license or other valid identification, Verify-1® automatically checks prior return behavior and authorizes or declines the transaction. Customers identified as risks for presenting fraudulent returns are declined, while legitimate returns are speedily accepted.

You take a sample of n items and find that there were x returns (about 9%). You are the manager of store a, where a is the last digit of your student number. (For example, Seymour Butz' student number is 543987 so he manages store 7.) The sample size and number of returns for your store is given below. On the basis of this sample, can you now say that the return rate is now below 13%? Use a confidence level of 95%.

Store   1    2    3    4    5    6    7    8    9    10
n       275  250  225  200  175  150  125  100  75   50
x       25   22   20   18   16   13   11   9    7    4

a) State your null and alternative hypotheses. (1) Make sure I know which store you manage.
b) Test the hypothesis using a test ratio or a critical value for the observed proportion. (1) Make a diagram showing clearly where your 'reject' region is. (Do not round excessively. If you compute proportions carry at least 3 significant figures.)
c) Find a p-value for your null hypothesis. (1)
d) Test your hypothesis using an appropriate confidence interval. (2) [5]
e) Using the 13% proportion as an estimate of the true proportion, find out how large a sample you need to create a 95% confidence interval with an error of no more than 1%. (2)
f) (Extra credit) Remember that the method that you have been using to deal with proportions substitutes the Normal distribution for the binomial distribution. In general the p-values that you have computed are higher than you would get if you used the binomial distribution. Verify this by making a continuity correction as described in the outline and repeating your test in b). (2)
g) (Extra credit) Using 13%, your critical value, a point between your critical value and 13% and one or two other points on the side of the critical value implied by the alternative hypothesis (only one point on this side may give a reasonable value for a proportion) put together a power curve for your test. Remember that your standard error will change if the true proportion changes. (8)
h) Go back to the test in parts a), b) and c) of this problem. Take your values of n and x and multiply them by 1.6, rounding your values to the nearest whole number (or numbers) if necessary. Find the new value of the test ratio and get a p-value. What does the change in p-value between parts c) and h) suggest about the effect of increased sample size on the power of the test? (3) [32]

Problem 4: According to Doane and Seward both the mean and the standard deviation of pH (a measure of acidity) are of interest to winemakers. Assume that your firm (store from the last problem) has gotten into the wine business. A sample of 16 wine bottles is taken. Your column has the same number as your store. Minitab has calculated all sorts of sample statistics on your data. These are listed below. Use them.
Row   C1    C2    C3    C4    C5    C6    C7    C8    C9    C10
1     3.41  3.44  3.61  3.39  3.41  3.43  3.40  3.56  3.53  3.17
2     3.45  3.42  3.59  3.37  3.39  3.41  3.38  3.53  3.56  3.21
3     3.51  3.45  3.63  3.41  3.43  3.45  3.42  3.59  3.63  3.27
4     3.52  3.48  3.65  3.44  3.46  3.47  3.45  3.63  3.65  3.28
5     3.68  3.68  3.87  3.66  3.69  3.69  3.68  3.95  3.82  3.44
6     3.29  3.45  3.62  3.41  3.43  3.44  3.42  3.58  3.39  3.05
7     3.39  3.42  3.59  3.37  3.39  3.41  3.38  3.53  3.50  3.15
8     3.57  3.50  3.67  3.45  3.48  3.49  3.47  3.65  3.70  3.33
9     3.38  3.41  3.58  3.36  3.38  3.40  3.37  3.52  3.49  3.14
10    3.14  3.36  3.52  3.30  3.32  3.34  3.31  3.43  3.23  2.90
11    3.61  3.69  3.87  3.66  3.70  3.70  3.68  3.95  3.75  3.37
12    3.23  3.40  3.57  3.35  3.37  3.39  3.36  3.51  3.32  2.99
13    3.48  3.48  3.66  3.44  3.46  3.48  3.45  3.63  3.59  3.24
14    3.39  3.48  3.65  3.43  3.45  3.47  3.44  3.62  3.51  3.15
15    3.49  3.45  3.62  3.40  3.42  3.44  3.41  3.57  3.61  3.25
16    3.50  3.63  3.81  3.60  3.63  3.64  3.62  3.87  3.62  3.26

Variable  N   N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
C1        16  0   3.4400  0.0347   0.1387  3.1400   3.3825  3.4650  3.5175  3.6800
C2        16  0   3.4837  0.0245   0.0980  3.3600   3.4200  3.4500  3.4950  3.6900
C3        16  0   3.6569  0.0259   0.1037  3.5200   3.5900  3.6250  3.6675  3.8700
C4        16  0   3.4400  0.0268   0.1072  3.3000   3.3700  3.4100  3.4475  3.6600
C5        16  0   3.4631  0.0281   0.1124  3.3200   3.3900  3.4300  3.4750  3.7000
C6        16  0   3.4781  0.0265   0.1061  3.3400   3.4100  3.4450  3.4875  3.7000
C7        16  0   3.4525  0.0278   0.1110  3.3100   3.3800  3.4200  3.4650  3.6800
C8        16  0   3.6325  0.0388   0.1553  3.4300   3.5300  3.5850  3.6450  3.9500
C9        16  0   3.5562  0.0382   0.1528  3.2300   3.4925  3.5750  3.6450  3.8200
C10       16  0   3.2000  0.0347   0.1387  2.9000   3.1425  3.2250  3.2775  3.4400

You must state $H_0$ and $H_1$ where applicable to get credit for any of the tests below. Make sure that I know which column you are using!
a) The acceptable standard deviation for wine pH is 0.10. Using the data for your store, test the hypothesis that the standard deviation is 0.10 using a 95% confidence level. (2)
b) Test the hypothesis that the standard deviation is below .14. (1)
c) Repeat a) and b) using the sample (mean and) variance you used in a) and b) but assuming a sample size of 100. Find p-values. (4)
d) Find a 2-sided 95% confidence interval for the standard deviation using data from your store and assuming a sample size of 16. (2)
e) Repeat d) for a sample size of 100. (1) [41]
f) Here's the easiest question on the exam. By now you should have figured out that you don't have to understand a statistical test at all if you know i) what it assumes, ii) what the null hypothesis is and iii) what the p-value is associated with the null hypothesis. So, I am going to do a test that the standard deviation is 0.1 on the following data set.

C11: 3.53 3.51 3.54 3.57 3.78 3.54 3.51 3.59 3.50 3.44 3.78 3.49 3.57 3.57 3.54 3.72

Then I am going to run a Lilliefors test on these data using Minitab. The null hypothesis of the Lilliefors test is that the sample comes from the Normal distribution. The test makes no assumptions about the mean and standard deviation of the population and computes these as sample statistics from the data. After it printed 'Probability plot of C11,' the computer printed a graph of the data, but the only thing I looked at was the p-value, which was less than .01. After the Lilliefors test, the computer printed out the results of two versions of a statistical test on the standard deviation. The 'Standard' version is the method that you learned and is only applicable if the data comes from a Normal distribution. The 'Adjusted' version is for all other cases. So explain what p-value I look at and what it tells me.

MTB > NormTest c11;
SUBC> KSTest.

Probability Plot of C11

MTB > OneVariance c11;
SUBC> Test .1;
SUBC> Confidence 95.0;
SUBC> Alternative 0;
SUBC> StDeviation.

Test and CI for One Standard Deviation: C11

Method
Null hypothesis         Sigma = 0.1
Alternative hypothesis  Sigma not = 0.1
The standard method is only for the normal distribution.
The adjusted method is for any continuous distribution.
Statistics Variable N C11 16 StDev 0.100 Variance 0.0100 95% Confidence Intervals Variable C11 Tests Variable C11 Method Standard Adjusted CI for StDev (0.074, 0.155) (0.071, 0.170) Method Standard Adjusted Chi-Square 15.06 11.12 CI for Variance (0.0055, 0.0240) (0.0050, 0.0288) DF 15.00 11.07 P-Value 0.895 0.880 252y0811 3/7/08 13 Solutions Problem 1: (Doane and Seward) A fast food restaurant has just started serving hot cocoa. The management wishes to serve cocoa of an average temperature of 142 degrees. 24 measurements of the temperature in 10 stores are taken. You are manager of store a and will use the corresponding column, where a is the second to last digit of your student number. (For example, Seymour Butz’ student number is 543987 so he uses column x8.) If that number is zero, use column 10. You are testing to see if the mean for your store is 142. There will be a penalty if you do not make it clear what column you are using. Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 x1 140 142 141 142 141 141 145 142 142 142 137 139 139 144 140 141 140 140 140 139 146 138 139 142 x2 142 143 141 142 139 142 144 145 143 143 141 139 143 144 140 138 141 140 142 140 138 139 142 141 x3 142 143 141 142 142 140 143 142 141 136 142 138 142 139 140 143 139 139 139 141 143 141 140 143 x4 143 138 140 140 142 139 142 139 145 139 142 138 142 141 142 141 141 140 142 141 143 141 140 140 x5 144 139 144 143 141 142 141 138 144 145 139 141 145 142 144 145 142 144 136 138 141 142 140 141 x6 142 142 142 140 141 141 137 142 139 141 141 143 141 142 140 142 142 142 139 142 143 143 142 140 x7 143 144 144 145 146 142 145 142 145 143 142 142 141 147 141 137 142 140 144 142 147 146 140 143 x8 146 145 144 144 145 144 141 141 144 144 139 142 141 142 144 145 139 142 143 146 142 144 144 146 x9 143 144 141 144 142 140 142 141 146 140 143 144 142 143 142 141 142 144 144 145 145 141 143 142 x10 143 139 145 144 145 142 141 140 142 142 142 146 142 143 142 140 144 143 141 144 143 141 
139 142

Assume that the Normal distribution applies to the data and use a 98% confidence level ($\alpha = .02$).

a. Find the sample mean and sample standard deviation of the temperatures in your data, showing your work. (1) (Your mean should be between 140 and 146 and your sample standard deviation should be around 2.)

Row  x4    x4^2    x8    x8^2
1    143   20449   146   21316
2    138   19044   145   21025
3    140   19600   144   20736
4    140   19600   144   20736
5    142   20164   145   21025
6    139   19321   144   20736
7    142   20164   141   19881
8    139   19321   141   19881
9    145   21025   144   20736
10   139   19321   144   20736
11   142   20164   139   19321
12   138   19044   142   20164
13   142   20164   141   19881
14   141   19881   142   20164
15   142   20164   144   20736
16   141   19881   145   21025
17   141   19881   139   19321
18   140   19600   142   20164
19   142   20164   143   20449
20   141   19881   146   21316
21   143   20449   142   20164
22   141   19881   144   20736
23   140   19600   144   20736
24   140   19600   146   21316
Sum  3381  476363  3437  492301

Column 4: $\sum x_4 = 3381$, $\sum x_4^2 = 476363$ and $n_4 = 24$.
$\bar{x}_4 = \frac{\sum x_4}{n_4} = \frac{3381}{24} = 140.875$
$s_4^2 = \frac{\sum x_4^2 - n\bar{x}_4^2}{n-1} = \frac{476363 - 24(140.875)^2}{23} = \frac{64.6250}{23} = 2.8098$, so $s_4 = \sqrt{2.8098} = 1.6762$.

Column 8: $\sum x_8 = 3437$, $\sum x_8^2 = 492301$ and $n_8 = 24$.
$\bar{x}_8 = \frac{\sum x_8}{n_8} = \frac{3437}{24} = 143.20833$
$s_8^2 = \frac{\sum x_8^2 - n\bar{x}_8^2}{n-1} = \frac{492301 - 24(143.20833)^2}{23} = \frac{93.9583}{23} = 4.0851$, so $s_8 = \sqrt{4.0851} = 2.0212$.

If you used the definitional formula, your answers should be identical except for rounding error.

For later use the standard errors are $s_{\bar{x}_4} = \frac{s_4}{\sqrt{n}} = \sqrt{\frac{s_4^2}{n}} = \sqrt{\frac{2.8098}{24}} = 0.3422$ and $s_{\bar{x}_8} = \frac{s_8}{\sqrt{n}} = \sqrt{\frac{s_8^2}{n}} = \sqrt{\frac{4.0851}{24}} = 0.4126$.

b. State your null and alternative hypotheses (1)
$H_0: \mu = 142$
$H_1: \mu \ne 142$
Note: $\mu_0 = 142$, $\alpha = .02$ and $df = n - 1 = 23$. For a 2-sided test use $t_{.01}^{(23)} = 2.500$. The table below is part of Table 3.

Interval for:         Mean ($\sigma$ unknown)
Confidence Interval:  $\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$, $DF = n - 1$
Hypotheses:           $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
Test Ratio:           $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$
Critical Value:       $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$
where $s_{\bar{x}} = \frac{s}{\sqrt{n}}$.

c. Test the hypothesis using a test ratio (1)
Make a diagram. Make a Normal curve with a center at zero. Indicate two 'reject' zones, one below $-t_{.01}^{(23)} = -2.500$ and one above $t_{.01}^{(23)} = 2.500$.
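The arithmetic in part a and the test ratio from the Table 3 excerpt can be checked with a short script. This is a minimal sketch that starts from the column sums quoted above; the function name `summarize` and the tuple layout are illustrative choices, not part of the exam or of Minitab.

```python
import math

def summarize(sum_x, sum_x2, n, mu0):
    """Return mean, variance, std dev, standard error and t ratio from column sums."""
    mean = sum_x / n
    # Computational formula: (sum of squares - n * mean^2) / (n - 1)
    var = (sum_x2 - n * mean ** 2) / (n - 1)
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)       # standard error of the sample mean
    t = (mean - mu0) / se        # test ratio for H0: mu = mu0
    return mean, var, sd, se, t

# Sums for columns 4 and 8 as given in the worked table; mu0 = 142
col4 = summarize(3381, 476363, 24, 142)
col8 = summarize(3437, 492301, 24, 142)

T_CRIT = 2.500  # t(.01, 23) as quoted in the text, for a 2-sided test at alpha = .02
for name, (mean, var, sd, se, t) in (("x4", col4), ("x8", col8)):
    verdict = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"{name}: mean={mean:.3f} var={var:.4f} sd={sd:.4f} se={se:.4f} t={t:.3f} -> {verdict}")
```

Because the script keeps full precision for the standard error, its test ratios can differ from the hand-rounded ones in the third decimal place.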
For Column 4 the sample mean is $\bar{x}_4 = 140.875$ and the standard error is $s_{\bar{x}_4} = 0.3422$. So the value of the test ratio is $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{140.875 - 142}{0.3422} = -3.288$. This is in the lower 'reject' zone so reject the null hypothesis.

For Column 8 the sample mean is $\bar{x}_8 = 143.20833$ and the standard error is $s_{\bar{x}_8} = 0.4126$. So the value of the test ratio is $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{143.20833 - 142}{0.4126} = 2.9286$. This is in the upper 'reject' zone so reject the null hypothesis.

d. Test the hypothesis using a critical value for a sample mean. (1)
$\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$ where $t_{.01}^{(23)} = 2.500$ and $\mu_0 = 142$.

For Column 4, the standard error is $s_{\bar{x}_4} = 0.3422$ so the critical values are $\bar{x}_{cv} = 142 \pm 2.500(0.3422) = 142 \pm 0.856$. Make a diagram. Make a Normal curve with a center at $\mu_0 = 142$. Indicate two 'reject' zones, one below 141.145 and one above 142.856. Since the sample mean of $\bar{x}_4 = 140.875$ falls in the lower 'reject' zone, reject the null hypothesis.

For Column 8 the standard error is $s_{\bar{x}_8} = 0.4126$, so the critical values are $\bar{x}_{cv} = 142 \pm 2.500(0.4126) = 142 \pm 1.032$. Make a diagram. Make a Normal curve with a center at $\mu_0 = 142$. Indicate two 'reject' zones, one below 140.969 and one above 143.032. Since the sample mean of $\bar{x}_8 = 143.20833$ falls in the upper 'reject' zone, reject the null hypothesis.

e. Test the hypothesis using a confidence interval (1)
$\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$ where $t_{.01}^{(23)} = 2.500$.

For Column 4 the sample mean is $\bar{x}_4 = 140.875$ and the standard error is $s_{\bar{x}_4} = 0.3422$. So $\mu = 140.875 \pm 2.500(0.3422) = 140.875 \pm 0.856$. Make a diagram. Make a Normal curve with a center at $\bar{x}_4 = 140.875$. Shade the confidence interval between 140.019 and 141.731. Since $\mu_0 = 142$ does not fall in the confidence interval, reject the null hypothesis.

For Column 8 the sample mean is $\bar{x}_8 = 143.20833$ and the standard error is $s_{\bar{x}_8} = 0.4126$. So $\mu = 143.20833 \pm 2.500(0.4126) = 143.208 \pm 1.032$. Make a diagram. Make a Normal curve with a center at $\bar{x}_8 = 143.20833$. Shade the confidence interval between 142.176 and 144.240.
Since $\mu_0 = 142$ does not fall in the confidence interval, reject the null hypothesis.

f. Find an approximate p-value for the null hypothesis. (1)
Here is the $df = n - 1 = 23$ line of the t table.

df  .45    .40    .35    .30    .25    .20    .15    .10    .05    .025   .01    .005   .001
23  0.127  0.256  0.390  0.532  0.685  0.858  1.060  1.319  1.714  2.069  2.500  2.807  3.485

For Column 4 the sample mean is $\bar{x}_4 = 140.875$ and the test ratio is $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = -3.288$. Since 3.288 is between 2.807 and 3.485, we can say $.005 > P(t \ge 3.288) > .001$. Since this is a 2-sided test, we double the p-value and say p-value $= 2P(\bar{x} \le 140.875) = 2P(t \le -3.288) = 2P(t \ge 3.288)$. Thus $.01 >$ p-value $> .002$.

For Column 8 the sample mean is $\bar{x}_8 = 143.20833$ and the test ratio is $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = 2.9286$. Since 2.9286 is between 2.807 and 3.485, we can say $.005 > P(t \ge 2.9286) > .001$. Since this is a 2-sided test, we double the p-value and say p-value $= 2P(\bar{x} \ge 143.20833) = 2P(t \ge 2.9286)$. Thus $.01 >$ p-value $> .002$.

It is time to give the p-values for all of the columns as computed by Minitab. The p-value is the last entry (P) in each block.

MTB > Onet c1;
SUBC> Test 142.
One-Sample T: x1
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x1        24  140.917  2.104  0.430    (140.028, 141.805)  -2.52  0.019

MTB > Onet c2;
SUBC> Test 142.
One-Sample T: x2
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x2        24  141.333  1.926  0.393    (140.520, 142.147)  -1.70  0.103

MTB > Onet c3;
SUBC> Test 142.
One-Sample T: x3
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x3        24  140.875  1.849  0.377    (140.094, 141.656)  -2.98  0.007

MTB > Onet c4;
SUBC> Test 142.
One-Sample T: x4
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x4        24  140.875  1.676  0.342    (140.167, 141.583)  -3.29  0.003

MTB > Onet c5;
SUBC> Test 142.
One-Sample T: x5
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x5        24  141.708  2.476  0.505    (140.663, 142.754)  -0.58  0.569

MTB > Onet c6;
SUBC> Test 142.
One-Sample T: x6
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x6        24  141.208  1.444  0.295    (140.599, 141.818)  -2.69  0.013

MTB > Onet c7;
SUBC> Test 142.
One-Sample T: x7
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x7        24  143.042  2.404  0.491    (142.026, 144.057)  2.12   0.045

MTB > Onet c8;
SUBC> Test 142.
One-Sample T: x8
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x8        24  143.208  2.021  0.413    (142.355, 144.062)  2.93   0.008

MTB > Onet c9;
SUBC> Test 142.
One-Sample T: x9
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x9        24  142.667  1.606  0.328    (141.988, 143.345)  2.03   0.054

MTB > Onet c10;
SUBC> Test 142.
One-Sample T: x10
Test of mu = 142 vs not = 142
Variable  N   Mean     StDev  SE Mean  95% CI              T      P
x10       24  142.292  1.829  0.373    (141.519, 143.064)  0.78   0.443

g. On the basis of your tests, is the mean temperature correct in your restaurant? Why? (1)
Our statistical tests ($\alpha = .02$) show that columns 1, 3, 4, 6, and 8 give p-values below the significance level, so we reject the hypothesis that the mean temperature is 142 in the corresponding restaurants. However, the p-values for columns 2, 5, 7, 9 and 10 are above the significance level, so we cannot reject the hypothesis.

h. How do your conclusions change if the random sample of 24 temperatures is taken on a day in which only 48 cups of cocoa are sold? (2)
This is pushing things, because you might contend that the samples given are samples from an indefinitely long string of cocoas. But if we assume that the machine is shut down and recalibrated after the 48 cups are taken, we need a finite population correction because the sample of 24 cups is being taken from a population that is far less than 20 times the sample size. Let's try this for Column 4.

$s_{\bar{x}_4} = \frac{s_4}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{s_4^2}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{2.8098}{24}}\sqrt{\frac{48-24}{48-1}} = 0.3422\sqrt{0.51064} = 0.3422(0.71459) = 0.2445$.

If we put this into the t-ratio, which was $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = -3.288$, we are dividing the old ratio by 0.71459, which makes the t-ratio larger in absolute value, in this case -4.60. This value is off the t-table, indicating p-value $< .002$. Any column that resulted in a rejection of the null hypothesis will still reject it, with a lower p-value. Remember that $t_{.01}^{(23)} = 2.500$ was the value of t that we used in c) above. If we check the Minitab output for column 2, $t = -1.70$ becomes -2.38 so we still do not reject $H_0$. For column 5, $t = -0.58$ becomes -0.81 so we still do not reject $H_0$. For column 7, $t = 2.12$ becomes 2.97 so we reject $H_0$. For column 9, $t = 2.03$ becomes 2.84 so we reject $H_0$. For column 10, $t = 0.78$ becomes 1.09 so we still do not reject $H_0$.

i. Assume that the Normal distribution does not apply and test to see if the median is 142. Be careful! What should you do with numbers that are exactly 142? (2) [12]
The following comes from the outline ($\eta$ denotes the median).

Hypotheses about a median    Hypotheses about a proportion
                             If p is the proportion above $\eta_0$    If p is the proportion below $\eta_0$
$H_0: \eta = \eta_0$         $H_0: p = .5$                            $H_0: p = .5$
$H_1: \eta \ne \eta_0$       $H_1: p \ne .5$                          $H_1: p \ne .5$

The numbers have been placed in order below and the 142s eliminated. At the end of each column we find x, the number of items above 142, and n, the total count of the remaining column.
x1:  137 138 139 139 139 139 140 140 140 140 140 141 141 141 141 144 145 146                  x = 3,  n = 18
x2:  138 138 139 139 139 140 140 140 141 141 141 141 143 143 143 143 144 144 145              x = 7,  n = 19
x3:  136 138 139 139 139 139 140 140 140 141 141 141 141 143 143 143 143 143                  x = 5,  n = 18
x4:  138 138 139 139 139 140 140 140 140 140 141 141 141 141 141 143 143 145                  x = 3,  n = 18
x5:  136 138 138 139 139 140 141 141 141 141 141 143 144 144 144 144 144 145 145 145          x = 9,  n = 20
x6:  137 139 139 140 140 140 141 141 141 141 141 143 143 143                                  x = 3,  n = 14
x7:  137 140 140 141 141 143 143 143 144 144 144 145 145 145 146 146 147 147                  x = 13, n = 18
x8:  139 139 141 141 141 143 144 144 144 144 144 144 144 144 145 145 145 146 146 146          x = 15, n = 20
x9:  140 140 141 141 141 141 143 143 143 143 144 144 144 144 144 145 145 146                  x = 12, n = 18
x10: 139 139 140 140 141 141 141 143 143 143 143 144 144 144 145 145 146                      x = 10, n = 17

The outline says that for relatively small values of n, a continuity correction is advisable, so try $z = \frac{2x \pm 1 - n}{\sqrt{n}}$, where the + applies if $x < \frac{n}{2}$, and the - applies if $x > \frac{n}{2}$.

Column 1:  x = 3, n = 18:   p-value $= 2P\left(z \le \frac{2(3) + 1 - 18}{\sqrt{18}}\right) = 2P(z \le -2.59) = 2(.5 - .4952) = .0096$
Column 2:  x = 7, n = 19:   p-value $= 2P\left(z \le \frac{2(7) + 1 - 19}{\sqrt{19}}\right) = 2P(z \le -0.92) = 2(.5 - .3212) = .3576$
Column 3:  x = 5, n = 18:   p-value $= 2P\left(z \le \frac{2(5) + 1 - 18}{\sqrt{18}}\right) = 2P(z \le -1.65) = 2(.5 - .4505) = .0990$
Column 4:  x = 3, n = 18:   p-value $= 2P\left(z \le \frac{2(3) + 1 - 18}{\sqrt{18}}\right) = 2P(z \le -2.59) = 2(.5 - .4952) = .0096$
Column 5:  x = 9, n = 20:   p-value $= 2P\left(z \le \frac{2(9) + 1 - 20}{\sqrt{20}}\right) = 2P(z \le -0.22) = 2(.5 - .0871) = .8258$
Column 6:  x = 3, n = 14:   p-value $= 2P\left(z \le \frac{2(3) + 1 - 14}{\sqrt{14}}\right) = 2P(z \le -1.87) = 2(.5 - .4693) = .0614$
Column 7:  x = 13, n = 18:  p-value $= 2P\left(z \ge \frac{2(13) - 1 - 18}{\sqrt{18}}\right) = 2P(z \ge 1.65) = 2(.5 - .4505) = .0990$
Column 8:  x = 15, n = 20:  p-value $= 2P\left(z \ge \frac{2(15) - 1 - 20}{\sqrt{20}}\right) = 2P(z \ge 2.01) = 2(.5 - .4778) = .0444$
Column 9:  x = 12, n = 18:  p-value $= 2P\left(z \ge \frac{2(12) - 1 - 18}{\sqrt{18}}\right) = 2P(z \ge 1.18) = 2(.5 - .3810) = .2380$
Column 10: x = 10, n = 17:  p-value $= 2P\left(z \ge \frac{2(10) - 1 - 17}{\sqrt{17}}\right) = 2P(z \ge 0.49) = 2(.5 - .1879) = .6242$

There are actually 2 ways to do this. Remember $\alpha = .02$ and $z_{\alpha/2} = z_{.01} = 2.327$, so reject the null hypothesis if z is not between $\pm 2.327$ or if the p-value is below .02. The only rejections we have here are for columns 1 and 4.

j. (Extra Credit) Do a 98% confidence interval for the median. (2)
Actually, this is a lot easier than the last section. If we don't mind being a bit sloppy, the outline recommends using $k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2}$. For the original data $n = 24$ and $z_{\alpha/2} = z_{.01} = 2.327$, so $k = \frac{24 + 1 - 2.327\sqrt{24}}{2} = 6.80$. This must be rounded down, so use the 6th number from the bottom and the 6th number from the top, which is the 19th (24 + 1 - 6) number from the bottom. The columns appear below in order with the 6th and 19th numbers in brackets.

x1:  137 138 139 139 139 [139] 140 140 140 140 140 141 141 141 141 142 142 142 [142] 142 142 144 145 146
x2:  138 138 139 139 139 [140] 140 140 141 141 141 141 142 142 142 142 142 143 [143] 143 143 144 144 145
x3:  136 138 139 139 139 [139] 140 140 140 141 141 141 141 142 142 142 142 142 [142] 143 143 143 143 143
x4:  138 138 139 139 139 [140] 140 140 140 140 141 141 141 141 141 142 142 142 [142] 142 142 143 143 145
x5:  136 138 138 139 139 [140] 141 141 141 141 141 142 142 142 142 143 144 144 [144] 144 144 145 145 145
x6:  137 139 139 140 140 [140] 141 141 141 141 141 142 142 142 142 142 142 142 [142] 142 142 143 143 143
x7:  137 140 140 141 141 [142] 142 142 142 142 142 143 143 143 144 144 144 145 [145] 145 146 146 147 147
x8:  139 139 141 141 141 [142] 142 142 142 143 144 144 144 144 144 144 144 144 [145] 145 145 146 146 146
x9:  140 140 141 141 141 [141] 142 142 142 142 142 142 143 143 143 143 144 144 [144] 144 144 145 145 146
x10: 139 139 140 140 141 [141] 141 142 142 142 142 142 142 142 143 143 143 143 [144] 144 144 145 145 146

Problem 2: Once again assume that the Normal distribution applies to the data in Problem 1, but that we know that the population standard deviation is 2. Our confidence level remains 98%, but we are now testing the hypothesis that the mean is below 143 degrees.

a. State your null and your alternative hypotheses.
(1)
Our hypotheses are $H_0: \mu \ge 143$ and $H_1: \mu < 143$. $\alpha = .02$, $\mu_0 = 143$ and this is a left-sided test because we are worrying about our sample mean being below 143. Recall that $n = 24$, so that our standard error is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{24}} = \sqrt{\frac{4}{24}} = \sqrt{0.16667} = 0.40825$. The table below is part of Table 3.

Interval for:         Mean ($\sigma$ known)
Confidence Interval:  $\mu = \bar{x} \pm z_{\alpha/2} \sigma_{\bar{x}}$
Hypotheses:           $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
Test Ratio:           $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$
Critical Value:       $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2} \sigma_{\bar{x}}$
where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

b. Find the value of z that you need for a critical value for a 1-sided test if the confidence level is 98%. (or, for less credit, 99%)
Actually, I should have had you do the test. Since we are doing a left-sided test, we must compute $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - 143}{0.40825}$ and, if $\alpha = .01$, reject our hypothesis if we find that z is below $-z_{.01} = -2.327$. If $\alpha = .02$, use $z_{.02} = 2.054$ or something similar. (Find $z_{.02} = 2.054$ by figuring out that $P(0 \le z \le z_{.02}) = .4800$.)

c. Find a critical value for the sample mean to test if the mean is below 143 degrees. (1)
If $\alpha = .01$ use $\bar{x}_{cv} = \mu_0 - z_{\alpha} \sigma_{\bar{x}} = 143 - 2.327(0.40825) = 142.050$.
If $\alpha = .02$ use $\bar{x}_{cv} = \mu_0 - z_{\alpha} \sigma_{\bar{x}} = 143 - 2.054(0.40825) = 142.161$.

d. Test the hypothesis that the mean is below 143 degrees using an appropriate confidence interval. (2)
Our alternative hypothesis is $H_1: \mu < 143$, so we use $\mu \le \bar{x} + z_{\alpha} \sigma_{\bar{x}} = \bar{x} + z_{\alpha}(0.40825)$.
If $\alpha = .01$, the upper limit runs from $143.21 + 2.327(0.40825) = 144.16$ to $140.87 + 2.327(0.40825) = 141.82$, depending on the column. ($z_{.005} = 2.576$, so a 2-sided interval would be $\pm 1.05$.) If your sample mean was 142.05 or lower you will contradict and reject $H_0: \mu \ge 143$.
If $\alpha = .02$, the upper limit runs from $143.21 + 2.054(0.40825) = 144.05$ to $140.87 + 2.054(0.40825) = 141.71$. ($z_{.01} = 2.327$, so a 2-sided interval would be $\pm 0.95$.) If your sample mean was 142.16 or lower you will contradict and reject $H_0: \mu \ge 143$.

e. Using your critical value from 2b, create a power curve for your test. (6)
If $\alpha = .01$, you are using $\bar{x}_{cv} = \mu_0 - z_{\alpha} \sigma_{\bar{x}} = 142.050$ and you will not reject the null hypothesis if $\bar{x} \ge 142.050$. A group of suggested points for computing $\beta$ are 143, 142.5, 142.050, 141.5 and 141.

$\mu_1 = 143$:     $\beta = P\left(z \ge \frac{142.050 - 143}{0.40825}\right) = P(z \ge -2.33) = .5 + .4901 = .9901 \approx 99\%$. This was just a check. Power = .01
$\mu_1 = 142.5$:   $\beta = P\left(z \ge \frac{142.050 - 142.5}{0.40825}\right) = P(z \ge -1.10) = .5 + .3643 = .8643$. Power = 13.6%
$\mu_1 = 142.050$: $\beta = P\left(z \ge \frac{142.050 - 142.050}{0.40825}\right) = P(z \ge 0) = .5$. Power = 50%
$\mu_1 = 141.5$:   $\beta = P\left(z \ge \frac{142.050 - 141.5}{0.40825}\right) = P(z \ge 1.35) = .5 - .4115 = .0885$. Power = 91.2%
$\mu_1 = 141$:     $\beta = P\left(z \ge \frac{142.050 - 141}{0.40825}\right) = P(z \ge 2.57) = .5 - .4949 = .0051$. Power = 99.5%

If $\alpha = .02$, you are using $\bar{x}_{cv} = 142.161$ and you will not reject the null hypothesis if $\bar{x} \ge 142.161$. A group of suggested points for computing $\beta$ are 143, 142.6, 142.161, 141.8 and 141.4.

$\mu_1 = 143$:     $\beta = P\left(z \ge \frac{142.161 - 143}{0.40825}\right) = P(z \ge -2.06) = .5 + .4803 = .9803 \approx 98\%$. This was just a check. Power = .02
$\mu_1 = 142.6$:   $\beta = P\left(z \ge \frac{142.161 - 142.6}{0.40825}\right) = P(z \ge -1.07) = .5 + .3577 = .8577$. Power = 14.2%
$\mu_1 = 142.161$: $\beta = P\left(z \ge \frac{142.161 - 142.161}{0.40825}\right) = P(z \ge 0) = .5000$. Power = 50%
$\mu_1 = 141.8$:   $\beta = P\left(z \ge \frac{142.161 - 141.8}{0.40825}\right) = P(z \ge 0.88) = .5 - .3106 = .1894$. Power = 81.1%
$\mu_1 = 141.4$:   $\beta = P\left(z \ge \frac{142.161 - 141.4}{0.40825}\right) = P(z \ge 1.86) = .5 - .4686 = .0314$. Power = 96.9%

Graph this neatly, with the means on the x axis and probabilities between zero and one on the y axis.

f. Assume that the population standard deviation is 2. How large a sample do you need to get a two-sided 98% confidence interval with an error not exceeding 0.5 degrees? (2) [22]
The formula is $n = \frac{z^2 \sigma^2}{e^2}$.
If $\alpha = .01$, $z_{.005} = 2.576$ and $n = \frac{2.576^2 (2^2)}{0.5^2} = 106.17$. Use 107 or more.
If $\alpha = .02$, $z_{.01} = 2.327$ and $n = \frac{2.327^2 (2^2)}{0.5^2} = 86.64$. Use 87 or more.

Problem 3: According to Doane and Seward about 13% of goods bought at a department store are returned. An organization called Return Exchange will sell you a software product called Verify-1 for which it makes the claims below.

Verify-1® is quickly operational. And it authorizes returns even quicker
Verify-1® identifies fraud and abuse at the point of return before they become liabilities to your brand equity or profits. In stand-alone mode, this easy-to-use, turnkey solution can be operational in 30 days and will reduce your return rate immediately, without disrupting your business or IT configuration. Verify-1® also integrates easily into your existing POS platform.
You set the policy, Verify-1® enforces it
With Verify-1®, your returns are dealt with consistently utilizing advanced statistical modeling in combination with state return laws and your existing return policies. At the point of return, using the customer's driver's license or other valid identification, Verify-1® automatically checks prior return behavior and authorizes or declines the transaction. Customers identified as risks for presenting fraudulent returns are declined, while legitimate returns are speedily accepted.

You take a sample of n items and find that there were x returns (about 9%). You are the manager of store a. (a is the last digit of your student number. For example, Seymour Butz' student number is 543987 so he manages store 7.) The sample size and number of returns for your store is given below. On the basis of this sample, can you say that the return rate is now below 13%? Use a confidence level of 95%.

Store  1    2    3    4    5    6    7    8    9    10
n      275  250  225  200  175  150  125  100  75   50
x      25   22   20   18   16   13   11   9    7    4

a) State your null and alternative hypotheses. (1) Make sure I know which store you manage.
According to Table 3 the test for a proportion involves the following.

Interval for:         Proportion
Confidence Interval:  $p = \bar{p} \pm z_{\alpha/2} s_p$, where $s_p = \sqrt{\frac{\bar{p}\bar{q}}{n}}$ and $\bar{q} = 1 - \bar{p}$
Hypotheses:           $H_0: p = p_0$, $H_1: p \ne p_0$
Test Ratio:           $z = \frac{\bar{p} - p_0}{\sigma_p}$
Critical Value:       $\bar{p}_{cv} = p_0 \pm z_{\alpha/2} \sigma_p$, where $\sigma_p = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$

You are trying to find out if your returns are significantly below 13%. This is an alternative hypothesis. Your hypotheses are $H_0: p \ge .13$ and $H_1: p < .13$. This is a left-sided test since values of the observed proportion below .13 are the only values that could lead to a rejection of the null hypothesis. $p_0 = .13$.

b) Test the hypothesis using a test ratio or a critical value for the observed proportion. (1) Make a diagram showing clearly where your 'reject' region is. (Do not round excessively. If you compute proportions carry at least 3 significant figures.)
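The test ratio and critical value from the Table 3 excerpt can be sketched in a few lines for any store in the table. This is only an illustration under the stated hypotheses ($H_0: p \ge .13$, left-sided, $\alpha = .05$); the function name `proportion_test` and the returned dictionary keys are hypothetical, and the Normal quantile comes from Python's standard library rather than from the printed z table, so results can differ slightly from table-based answers.

```python
import math
from statistics import NormalDist

def proportion_test(x, n, p0=0.13, alpha=0.05):
    """Left-sided z test of H0: p >= p0 against H1: p < p0."""
    p_bar = x / n                                # observed proportion
    sigma_p = math.sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_bar - p0) / sigma_p                   # test ratio
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # about 1.645 for alpha = .05
    p_cv = p0 - z_alpha * sigma_p                # critical value for p_bar
    p_value = NormalDist().cdf(z)                # P(Z <= z) for a left-sided test
    return {"p_bar": p_bar, "sigma_p": sigma_p, "z": z,
            "p_cv": p_cv, "p_value": p_value, "reject": z < -z_alpha}

store1 = proportion_test(25, 275)   # store 1: n = 275, x = 25
store10 = proportion_test(4, 50)    # store 10: n = 50, x = 4
for name, r in (("store 1", store1), ("store 10", store10)):
    print(name, {k: (round(v, 4) if isinstance(v, float) else v) for k, v in r.items()})
```

Rejecting when `z < -z_alpha` is the same decision as rejecting when `p_bar < p_cv`, so either form of the test can be read from the output.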
(Store 1) The problem states that we are testing $p < .13$, when $n = 275$ and $\alpha = .05$. $\bar{p} = \frac{x}{n} = \frac{25}{275} = .09091$ and we can compute $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.13(.87)}{275}} = \sqrt{.000411} = 0.02028$, where $p_0$ comes from our null hypothesis and $q_0 = 1 - p_0$, or $s_p = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{.09091(.90909)}{275}} = \sqrt{.000301} = 0.017336$.

It is most expedient to use a test ratio, $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{.09091 - .13}{.02028} = -1.928$. Make a diagram. Show a normal curve with a mean at zero and an area of 5% below $-z_{.05} = -1.645$. The area below -1.645 is the 'rejection zone.' Since our value of z is below -1.645, it is in the 'rejection zone,' and we reject the null hypothesis.

If we want a critical value use $p_0 - z_{.05} \sigma_p = .13 - 1.645(.02028) = .0966$. We reject our null hypothesis if $\bar{p} < .0966$. $\bar{p} = \frac{25}{275} = .09091$ is in the 'rejection zone,' and we reject the null hypothesis.

(Store 10) The problem states that we are testing $p < .13$, when $n = 50$ and $\alpha = .05$. $\bar{p} = \frac{x}{n} = \frac{4}{50} = .08000$ and we can compute $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.13(.87)}{50}} = \sqrt{.002262} = 0.04756$, where $p_0$ comes from our null hypothesis and $q_0 = 1 - p_0$, or $s_p = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{.08(.92)}{50}} = \sqrt{.001472} = 0.038361$.

It is most expedient to use a test ratio, $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{.08 - .13}{.04756} = -1.051$. Make a diagram. Show a normal curve with a mean at zero and an area of 5% below $-z_{.05} = -1.645$. The area below -1.645 is the 'rejection zone.' Since our value of z is not below -1.645, it is not in the 'rejection zone,' and we do not reject the null hypothesis.

If we want a critical value, use $p_0 - z_{.05} \sigma_p = .13 - 1.645(.04756) = .05176$. We reject our null hypothesis if $\bar{p} < .05176$. $\bar{p} = \frac{4}{50} = .08000$ is not in the 'rejection zone,' and we do not reject the null hypothesis.

c) Find a p-value for your null hypothesis. (1)
(Store 1) p-value $= P(x \le 25) = P(\bar{p} \le .09091) = P(z \le -1.928) = .5 - P(-1.93 \le z \le 0) = .5 - .4732 = .0268$. Since this is below $\alpha = .05$, we reject the null hypothesis.
(Store 10) p-value $= P(x \le 4) = P(\bar{p} \le .08) = P(z \le -1.051) = .5 - P(-1.05 \le z \le 0) = .5 - .3531 = .1469$. Since this is above $\alpha = .05$, we cannot reject the null hypothesis.

d) Test your hypothesis using an appropriate confidence interval.
(2) [5]
Our hypotheses are $H_0: p \ge .13$ and $H_1: p < .13$, and the formula for a two-sided confidence interval is $p = \bar{p} \pm z_{\alpha/2} s_p$. If we form a one-sided interval by mimicking the alternative hypothesis, we find $p \le \bar{p} + z_{\alpha} s_p$. Remember that a confidence interval always includes $\bar{p}$. $z_{.05} = 1.645$.

(Store 1) $\bar{p} = \frac{25}{275} = .09091$ and $s_p = 0.017336$, so $p \le .09091 + 1.645(0.017336) = .11943$.
(Store 10) $\bar{p} = \frac{4}{50} = .08000$ and $s_p = 0.038361$, so $p \le .0800 + 1.645(0.038361) = .1431$.

Make a diagram. Draw a Normal curve with a mean at your value of $\bar{p}$. Shade the area to the left of the value just computed. For store 1, the interval $p \le .11943$ does not include $p_0 = .13$, so reject the null hypothesis. For store 10, the interval $p \le .1431$ includes $p_0 = .13$, so do not reject the null hypothesis.

e) Using the 13% proportion as an estimate of the true proportion, find out how large a sample you need to create a 95% confidence interval with an error of no more than 1%. (2)
This is a two-sided interval, so we can use $z_{.025} = 1.960$. Then $n = \frac{pqz^2}{e^2} = \frac{.13(.87)(1.960)^2}{.01^2} = 4344.8$. The sample must have a size of at least 4345.

f) (Extra credit) Remember that the method that you have been using to deal with proportions substitutes the Normal distribution for the binomial distribution. In general the p-values that you have computed are lower than you would get if you used the binomial distribution. Verify this by making a continuity correction as described in the outline and repeating your test in c). (2)
According to the outline, the continuity corrected version of z is $z = \frac{\bar{p} \pm \frac{.5}{n} - p_0}{\sigma_p}$. The rule of thumb with this is to use + if $\bar{p} < p_0$ and to use - if $\bar{p} > p_0$.

(Store 1) $\bar{p} = \frac{25}{275} = .09091$. We had p-value $= P(z \le -1.928) = .5 - .4732 = .0268$. Since this was below $\alpha = .05$, we rejected the null hypothesis. With continuity correction we get $z = \frac{\bar{p} + \frac{.5}{275} - p_0}{\sigma_p} = \frac{.092727 - .13}{.02028} = -1.838$. We now find p-value $= P(z \le -1.838) = .5 - .4671 = .0329$.

(Store 10) $\bar{p} = \frac{4}{50} = .08000$. We had p-value $= P(z \le -1.051) = .5 - .3531 = .1469$.
Since this was above $\alpha = .05$, we could not reject the null hypothesis. With continuity correction we get $z = \frac{\bar{p} + \frac{.5}{50} - p_0}{\sigma_p} = \frac{.09 - .13}{.04756} = -0.841$. We now find p-value $= P(z \le -0.841) = .5 - .2995 = .2005$. For the values found by Minitab with and without the correction see the Appendix.

g) (Extra credit) Using 13%, your critical value, a point between your critical value and 13% and one or two other points on the side of the critical value implied by the alternative hypothesis (only one point on this side may give a reasonable value for a proportion) put together a power curve for your test. Remember that your standard error will change if the true proportion changes. (8)
We are testing $H_0: p \ge .13$ against $H_1: p < .13$ with $\alpha = .05$.

(Store 1) If we want a critical value use $p_0 - z_{.05} \sigma_p = .13 - 1.645(.02028) = .0966$. We do not reject our null hypothesis if $\bar{p} \ge .0966$. For a power curve we will try proportions $p_1$ of .13, .115, .0966, .085 and .07. Recall that $\sigma_p = \sqrt{\frac{p_1 q_1}{n}}$ and that $\beta = P(\bar{p} \ge \bar{p}_{cv} \mid p = p_1) = P\left(z \ge \frac{\bar{p}_{cv} - p_1}{\sigma_p}\right)$, or $P(\bar{p} \ge .0966) = P\left(z \ge \frac{.0966 - p_1}{\sigma_p}\right)$.

If $p_1 = .13$, $\sigma_p = \sqrt{\frac{.13(.87)}{275}} = 0.02028$ and $P(\bar{p} \ge .0966) = P\left(z \ge \frac{.0966 - .13}{0.02028}\right) = P(z \ge -1.65) = .5 + .4505 = .9505 \approx .95$. This is automatic, but serves as a check. Power = .050
If $p_1 = .115$, $\sigma_p = \sqrt{\frac{.115(.885)}{275}} = 0.01924$ and $P(\bar{p} \ge .0966) = P\left(z \ge \frac{.0966 - .115}{0.01924}\right) = P(z \ge -0.96) = .5 + .3315 = .8315$. Power = .169
If $p_1 = .0966$, $\sigma_p$ is not needed: $P(\bar{p} \ge .0966) = P\left(z \ge \frac{.0966 - .0966}{\sigma_p}\right) = P(z \ge 0) = .5000$. Power = .500
If $p_1 = .085$, $\sigma_p = \sqrt{\frac{.085(.915)}{275}} = 0.016817$ and $P(\bar{p} \ge .0966) = P\left(z \ge \frac{.0966 - .085}{0.016817}\right) = P(z \ge 0.69) = .5 - .2549 = .2451$. Power = .755
If $p_1 = .07$, $\sigma_p = \sqrt{\frac{.07(.93)}{275}} = 0.01539$ and $P(\bar{p} \ge .0966) = P\left(z \ge \frac{.0966 - .07}{0.01539}\right) = P(z \ge 1.73) = .5 - .4582 = .0418$. Power = .958

(Store 10) If we want a critical value, use $p_0 - z_{.05} \sigma_p = .13 - 1.645(.04756) = .05176$. We do not reject our null hypothesis if $\bar{p} \ge .05176$. For a power curve we will try proportions $p_1$ of .13, .09, .05176, and .01. I also tried .02 to verify my results. Recall that $\sigma_p = \sqrt{\frac{p_1 q_1}{n}}$ and that $\beta = P(\bar{p} \ge \bar{p}_{cv} \mid p = p_1) = P\left(z \ge \frac{\bar{p}_{cv} - p_1}{\sigma_p}\right)$, or $P(\bar{p} \ge .05176) = P\left(z \ge \frac{.05176 - p_1}{\sigma_p}\right)$. Note how much faster power rises than in the store 1 version.

If $p_1 = .13$, $\sigma_p = \sqrt{\frac{.13(.87)}{50}} = 0.04756$ and $P(\bar{p} \ge .05176) = P\left(z \ge \frac{.05176 - .13}{.04756}\right) = P(z \ge -1.645) = .5 + .4500 = .9500$. Power = .050
If $p_1 = .09$, $\sigma_p = \sqrt{\frac{.09(.91)}{50}} = 0.04047$ and $P(\bar{p} \ge .05176) = P\left(z \ge \frac{.05176 - .09}{.04047}\right) = P(z \ge -0.94) = .5 + .3264 = .8264$. Power = .174
If $p_1 = .05176$, $\sigma_p$ is not needed: $P(\bar{p} \ge .05176) = P\left(z \ge \frac{.05176 - .05176}{\sigma_p}\right) = P(z \ge 0) = .5000$. Power = .500
If $p_1 = .02$, $\sigma_p = \sqrt{\frac{.02(.98)}{50}} = 0.01980$ and $P(\bar{p} \ge .05176) = P\left(z \ge \frac{.05176 - .02}{.01980}\right) = P(z \ge 1.60) = .5 - .4452 = .0548$. Power = .945
If $p_1 = .01$, $\sigma_p = \sqrt{\frac{.01(.99)}{50}} = 0.01407$ and $P(\bar{p} \ge .05176) = P\left(z \ge \frac{.05176 - .01}{.01407}\right) = P(z \ge 2.97) = .5 - .4985 = .0015$. Power = .9985

h) Go back to the test in parts a) b) and c) of this problem. Take your values of n and x and multiply them by 1.6, rounding your values to the nearest whole number (or numbers) if necessary. Find the new value of the test ratio and get a p-value. What does the change in p-value between parts c) and g) suggest about the effect of increased sample size on the power of the test? (3) [32]
Sorry!!! I multiplied by 1.5. It really doesn't matter if we are trying to make a point. The point is that for any given population mean, raising the sample size will increase the power of the test.
We are testing $H_0: p \ge .13$ against $H_1: p < .13$ with $\alpha = .05$.

(Store 1) $\bar{p} = \frac{25}{275} = .09091$. We had p-value $= P(z \le -1.928) = .5 - .4732 = .0268$. Since this was below $\alpha = .05$, we rejected the null hypothesis. If we multiply both the numerator and the denominator by 1.5, we get $\bar{p} = \frac{38}{413} = .09201$ or $\bar{p} = \frac{37}{412} = .08981$. If we take the first ratio, $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.13(.87)}{413}} = \sqrt{.0002738} = 0.01655$ and $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{.09201 - .13}{.01655} = -2.295$. We now find p-value $= P(z \le -2.295) = .5 - .4890 = .0110$.

(Store 10) $\bar{p} = \frac{4}{50} = .08000$. We had p-value $= P(z \le -1.051) = .5 - .3531 = .1469$. Since this was above $\alpha = .05$, we could not reject the null hypothesis. If we multiply both the numerator and the denominator by 1.5, we get $\bar{p} = \frac{6}{75} = .08000$. We can now compute $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.13(.87)}{75}} = \sqrt{.001508} = 0.03883$ and $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{.0800 - .13}{.03883} = -1.288$. We now find p-value $= P(z \le -1.288) = .5 - .4015 = .0985$.

In both cases, the rise of n serves to make the standard error smaller, which increases the absolute value of z and thus decreases the p-value.
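The power calculations in g) and the sample-size effect discussed in h) can be explored with a small function. This is a sketch under the same Normal approximation used above, with the left-sided test of $H_0: p \ge .13$ at $\alpha = .05$; the name `power` and its parameters are illustrative, and because it uses an exact Normal quantile instead of the rounded table values, its results agree with the hand calculations only to about two or three decimals.

```python
import math
from statistics import NormalDist

def power(p1, n, p0=0.13, alpha=0.05):
    """Power of the left-sided test of H0: p >= p0 when the true proportion is p1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Critical value for the sample proportion, using the standard error under H0
    p_cv = p0 - z_alpha * math.sqrt(p0 * (1 - p0) / n)
    # Standard error under the alternative p1 (it changes with the true proportion)
    sigma_1 = math.sqrt(p1 * (1 - p1) / n)
    # Power = P(p_bar < p_cv | p = p1) = 1 - beta
    return NormalDist().cdf((p_cv - p1) / sigma_1)

# Store 1 (n = 275) at the trial proportions used in part g)
for p1 in (0.13, 0.115, 0.085, 0.07):
    print(f"n=275 p1={p1:.3f} power={power(p1, 275):.3f}")

# Effect of sample size at a fixed true proportion of .09
for n in (50, 275, 413):
    print(f"n={n} p1=0.090 power={power(0.09, n):.3f}")
```

At $p_1 = p_0$ the function returns $\alpha$, which mirrors the "automatic" check in the worked solution, and for a fixed $p_1$ below $p_0$ the power grows as n grows.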
The lower the p-value is, the closer we get to rejection at any confidence level.

Problem 4: According to Doane and Seward both the mean and the standard deviation of pH (a measure of acidity) are of interest to winemakers. Assume that your firm (store from the last problem) has gotten into the wine business. A sample of 16 wine bottles is taken. Your column has the same number as your store. Minitab has calculated all sorts of sample statistics on your data. These are listed below. Use them.

Row   C1    C2    C3    C4    C5    C6    C7    C8    C9    C10
1     3.41  3.44  3.61  3.39  3.41  3.43  3.40  3.56  3.53  3.17
2     3.45  3.42  3.59  3.37  3.39  3.41  3.38  3.53  3.56  3.21
3     3.51  3.45  3.63  3.41  3.43  3.45  3.42  3.59  3.63  3.27
4     3.52  3.48  3.65  3.44  3.46  3.47  3.45  3.63  3.65  3.28
5     3.68  3.68  3.87  3.66  3.69  3.69  3.68  3.95  3.82  3.44
6     3.29  3.45  3.62  3.41  3.43  3.44  3.42  3.58  3.39  3.05
7     3.39  3.42  3.59  3.37  3.39  3.41  3.38  3.53  3.50  3.15
8     3.57  3.50  3.67  3.45  3.48  3.49  3.47  3.65  3.70  3.33
9     3.38  3.41  3.58  3.36  3.38  3.40  3.37  3.52  3.49  3.14
10    3.14  3.36  3.52  3.30  3.32  3.34  3.31  3.43  3.23  2.90
11    3.61  3.69  3.87  3.66  3.70  3.70  3.68  3.95  3.75  3.37
12    3.23  3.40  3.57  3.35  3.37  3.39  3.36  3.51  3.32  2.99
13    3.48  3.48  3.66  3.44  3.46  3.48  3.45  3.63  3.59  3.24
14    3.39  3.48  3.65  3.43  3.45  3.47  3.44  3.62  3.51  3.15
15    3.49  3.45  3.62  3.40  3.42  3.44  3.41  3.57  3.61  3.25
16    3.50  3.63  3.81  3.60  3.63  3.64  3.62  3.87  3.62  3.26

Variable  N   N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3
C1        16  0   3.4400  0.0347   0.1387  3.1400   3.3825  3.4650  3.5175
C2        16  0   3.4837  0.0245   0.0980  3.3600   3.4200  3.4500  3.4950
C3        16  0   3.6569  0.0259   0.1037  3.5200   3.5900  3.6250  3.6675
C4        16  0   3.4400  0.0268   0.1072  3.3000   3.3700  3.4100  3.4475
C5        16  0   3.4631  0.0281   0.1124  3.3200   3.3900  3.4300  3.4750
C6        16  0   3.4781  0.0265   0.1061  3.3400   3.4100  3.4450  3.4875
C7        16  0   3.4525  0.0278   0.1110  3.3100   3.3800  3.4200  3.4650
C8        16  0   3.6325  0.0388   0.1553  3.4300   3.5300  3.5850  3.6450
C9        16  0   3.5562  0.0382   0.1528  3.2300   3.4925  3.5750  3.6450
C10       16  0   3.2000  0.0347   0.1387  2.9000   3.1425  3.2250  3.2775
Maximum is removed since it is irrelevant.

You must state $H_0$ and $H_1$ where applicable to get credit for any of the tests below. Make sure that I know which column you are using!

The usual excerpt from the formula table follows.

Interval for:         Variance (small sample)
Confidence Interval:  $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
Hypotheses:           $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
Test Ratio:           $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
Critical Value:       $s^2_{cv} = \frac{\chi^2 \sigma_0^2}{n-1}$, with $\chi^2$ taken at $1 - \alpha/2$ and $\alpha/2$

Interval for:         Variance (large sample)
Confidence Interval:  $\sigma = \frac{s\sqrt{2DF}}{\pm z_{\alpha/2} + \sqrt{2DF}}$
Hypotheses:           $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
Test Ratio:           $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$
Critical Value:       $s_{cv} = \frac{\sigma_0\sqrt{2DF}}{\pm z_{\alpha/2} + \sqrt{2DF}}$

I will work out the solutions for Stores 2 and 8 since they have the smallest and largest variances.

a) The acceptable standard deviation for wine pH is 0.10. Using the data for your store, test the hypothesis that the standard deviation is 0.10 using a 95% confidence level. (2)
Our hypotheses are $H_0: \sigma = 0.10$ and $H_1: \sigma \ne 0.10$. $\alpha = .05$ and $n = 16$. So $df = n - 1 = 15$. We will use the test ratio method only. Make a diagram! If you are fussy, the diagram should be skewed to the right with a mode at 13, a median at 14.3333 (as computed by Minitab) and a mean at $df = n - 1 = 15$. Look up $\chi^{2(15)}_{.025} = 27.4884$ and $\chi^{2(15)}_{.975} = 6.2621$. Add a vertical line to indicate the mean or the median and shade the 'reject' zones below 6.2621 and above 27.4884 (the area that isn't shaded on the diagram below).

We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{15(.009604)}{.10^2} = 14.406$. Since this value is between $\chi^{2(15)}_{.975} = 6.2621$ and $\chi^{2(15)}_{.025} = 27.4884$, do not reject the null hypothesis.
We have for store 8 $s_8 = 0.1553$ or $s_8^2 = 0.024118$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{15(.024118)}{.10^2} = 36.177$. Since this value is not between $\chi^{2(15)}_{.975} = 6.2621$ and $\chi^{2(15)}_{.025} = 27.4884$, reject the null hypothesis.

b) Test the hypothesis that the standard deviation is below .14. (1)
Our hypotheses are $H_0: \sigma \ge 0.14$ and $H_1: \sigma < 0.14$. $\alpha = .05$ and $n = 16$. So $df = n - 1 = 15$. We will use the test ratio method only. This is a left-sided test because we will only reject the null hypothesis if the sample standard deviation is too small. Make a diagram! (If you are fussy, the diagram should be skewed to the right with a mode at 13, a median at 14.3333 (as computed by Minitab) and a mean at $df = n - 1 = 15$.) You want a value of $\chi^{2(15)}$ that will cut off the bottom 5% of the distribution, so look up $\chi^{2(15)}_{.95} = 7.2609$. Add a vertical line to indicate the mean or the median and shade the 'reject' zone below 7.2609.

We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{15(.009604)}{.14^2} = 7.350$. Since this value is not below $\chi^{2(15)}_{.95} = 7.2609$, do not reject the null hypothesis.
We have for store 8 $s_8 = 0.1553$ or $s_8^2 = 0.024118$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{15(.024118)}{.14^2} = 18.458$. Since this value is not below $\chi^{2(15)}_{.95} = 7.2609$, do not reject the null hypothesis.

c) Repeat a) and b) using the sample (mean and) variance you used in a) and b) but assuming a sample size of 100. Find p-values. (4)
Our hypotheses are $H_0: \sigma = 0.10$ and $H_1: \sigma \ne 0.10$. $\alpha = .05$ and $n = 100$. So $df = n - 1 = 99$. Because of the large number of degrees of freedom, we must use z. Make a diagram of a Normal curve with a mean at zero. Indicate two 'reject' zones, one below $-z_{.025} = -1.960$ and one above $z_{.025} = 1.960$.

We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{99(.009604)}{.10^2} = 95.0796$ and $z = \sqrt{2\chi^2} - \sqrt{2DF - 1} = \sqrt{2(95.0796)} - \sqrt{2(99) - 1} = \sqrt{190.1592} - \sqrt{197} = 13.7898 - 14.0357 = -0.246$. This is not in a 'reject' zone, so do not reject the null hypothesis.
We have for store 8 $s_8 = 0.1553$ or $s_8^2 = 0.024118$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{99(.024118)}{.10^2} = 238.7682$ and $z = \sqrt{2\chi^2} - \sqrt{2DF - 1} = \sqrt{2(238.7682)} - \sqrt{2(99) - 1} = \sqrt{477.5364} - \sqrt{197} = 21.8526 - 14.0357 = 7.817$. This is in a 'reject' zone, so reject the null hypothesis.

Our hypotheses are $H_0: \sigma \ge 0.14$ and $H_1: \sigma < 0.14$. $\alpha = .05$ and $n = 100$. So $df = n - 1 = 99$. This is a left-sided test because we will only reject the null hypothesis if the sample standard deviation is too small. Because of the large number of degrees of freedom, we must use z. Make a diagram of a Normal curve with a mean at zero. Indicate one 'reject' zone below $-z_{.05} = -1.645$.

We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{99(.009604)}{.14^2} = 48.5100$ and $z = \sqrt{2\chi^2} - \sqrt{2DF - 1} = \sqrt{2(48.5100)} - \sqrt{2(99) - 1} = \sqrt{97.0200} - \sqrt{197} = 9.8499 - 14.0357 = -4.186$. This is in the 'reject' zone, so reject the null hypothesis.
We have for store 8 $s_8 = 0.1553$ or $s_8^2 = 0.024118$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{99(.024118)}{.14^2} = 121.8205$ and $z = \sqrt{2\chi^2} - \sqrt{2DF - 1} = \sqrt{2(121.8205)} - \sqrt{2(99) - 1} = \sqrt{243.6410} - \sqrt{197} = 15.6090 - 14.0357 = 1.5733$. This is not in a 'reject' zone, so do not reject the null hypothesis.

d) Find 2-sided 95% confidence interval for the standard deviation using data from your store and assuming a sample size of 16. (2)
The formula for a Confidence Interval for the variance is $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$. We have already found $\chi^{2(15)}_{.975} = 6.2621$ and $\chi^{2(15)}_{.025} = 27.4884$. So $\frac{15}{27.4884} = .5457$ and $\frac{15}{6.2621} = 2.3954$, which means $\sqrt{\frac{15}{27.4884}} = 0.7387$ and $\sqrt{\frac{15}{6.2621}} = 1.5477$.

We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. So the interval for the variance is $\frac{15(0.009604)}{27.4884} \le \sigma^2 \le \frac{15(0.009604)}{6.2621}$. If we take square roots, we have $0.7387(0.0980) \le \sigma \le 1.5477(0.0980)$ or $0.07239 \le \sigma \le 0.1517$.
We have for store 8 $s_8 = .1553$ or $s_8^2 = .024118$. So the interval for the variance is $\frac{15(0.024118)}{27.4884} \le \sigma^2 \le \frac{15(0.024118)}{6.2621}$. If we take square roots, we have $0.7387(0.1553) \le \sigma \le 1.5477(0.1553)$ or $0.1147 \le \sigma \le 0.2404$.

e) Repeat d) for a sample size of 100. (1) [41]
The large sample formula for a Confidence Interval for the standard deviation is $\frac{s\sqrt{2df}}{z_{\alpha/2} + \sqrt{2df}} \le \sigma \le \frac{s\sqrt{2df}}{-z_{\alpha/2} + \sqrt{2df}}$. We have already found $z_{\alpha/2} = z_{.025} = 1.960$, and $\sqrt{2df} = \sqrt{2(99)} = 14.0712$. So $\frac{\sqrt{2df}}{1.960 + \sqrt{2df}} = \frac{14.0712}{1.960 + 14.0712} = 0.8777$ and $\frac{\sqrt{2df}}{-1.960 + \sqrt{2df}} = \frac{14.0712}{-1.960 + 14.0712} = 1.1618$.

We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. So the interval is $.0980(0.8777) \le \sigma \le .0980(1.1618)$ or $0.0860 \le \sigma \le 0.1139$.
We have for store 8 $s_8 = .1553$ or $s_8^2 = .024118$.
So the interval is .1553 0.8777 .1553 1.1618 or 0.1363 0.1804 . .0980 299 1.96 2df .1553 299 1.96 2df .0980 299 1.96 299 .1553 299 1.96 299 or or f) Here’s the easiest question on the exam. By now you should have figured out that you don’t have to understand a statistical test at all if you know i) what it assumes, ii) what the null hypothesis is and iii) what the p-value is associated with the null hypothesis. So, I am going to do a test that the standard deviation is 0.1 on the following data set. C11 3.53 3.44 3.51 3.78 3.54 3.49 3.57 3.57 3.78 3.57 3.54 3.54 3.51 3.72 3.59 3.50 Then I am going to run a Lilliefors test on these data using Minitab. The null hypothesis of the Lilliefors test is that the sample comes from the Normal distribution. The test makes no assumptions about the mean and standard deviation of the population and computes these as sample statistics from the data. After it printed ‘Probability plot of C11,’ the computer printed a graph of the data, but the only thing I looked at was the p-value which was less than .01. After the Lilliefors test, the computer printed out the results of two versions of a statistical test on the standard deviation. The ‘Standard’ version is the method that you learned and is only applicable if the data comes from a Normal distribution. The ‘Adjusted’ version is for all other cases. So explain what p-value I look at and what it tells me. So, if it’s a test for the Normal distribution and we know that the null hypothesis is that it’s Normal and the p-value is less than .01, the p-value is less than any significance level we might use and we can be very sure that the data do not follow a Normal distribution. MTB > NormTest c11; SUBC> KSTest. Probability Plot of C11 MTB > OneVariance c11; SUBC> Test .1; SUBC> Confidence 95.0; SUBC> Alternative 0; SUBC> StDeviation. 
Test and CI for One Standard Deviation: C11

Method
Null hypothesis         Sigma = 0.1
Alternative hypothesis  Sigma not = 0.1

The standard method is only for the normal distribution.
The adjusted method is for any continuous distribution.

Statistics
Variable   N  StDev  Variance
C11       16  0.100    0.0100

95% Confidence Intervals
Variable  Method    CI for StDev    CI for Variance
C11       Standard  (0.074, 0.155)  (0.0055, 0.0240)
          Adjusted  (0.071, 0.170)  (0.0050, 0.0288)

Tests
Variable  Method    Chi-Square     DF  P-Value
C11       Standard       15.06  15.00    0.895
          Adjusted       11.12  11.07    0.880

The output says that it is testing $H_0: \sigma = 0.10$ against $H_1: \sigma \ne 0.10$. The adjusted method works if data is not Normal and gives us a p-value of .880. Since this p-value is above any significance level we are likely to use, we cannot possibly reject the null hypothesis that the standard deviation is 0.10.
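If you want to check the arithmetic in a) through e) by machine, the following is a minimal sketch in Python using only the standard library. It is not part of the exam or of Minitab. The function names are made up for illustration, the chi-squared critical values are typed in from the table lookups above rather than computed, and the Normal p-values for the large-sample ratios in c) come from the error function.

```python
import math


def chi2_ratio(s, sigma0, n):
    """Test ratio chi^2 = (n - 1) s^2 / sigma0^2."""
    return (n - 1) * s ** 2 / sigma0 ** 2


def large_sample_z(chi2, df):
    """Large-sample ratio z = sqrt(2 chi^2) - sqrt(2 df - 1)."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)


def normal_cdf(z):
    """P(Z <= z) for a standard Normal variable, via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def two_sided_p(z):
    """p-value for a two-sided z test."""
    return 2 * normal_cdf(-abs(z))


def left_sided_p(z):
    """p-value for a left-sided z test."""
    return normal_cdf(z)


def small_sample_ci(s, n, chi2_upper, chi2_lower):
    """CI for sigma: square roots of (n-1)s^2/chi2 at the two critical values."""
    df = n - 1
    return s * math.sqrt(df / chi2_upper), s * math.sqrt(df / chi2_lower)


def large_sample_ci(s, n, z):
    """CI for sigma: s*sqrt(2df)/(z + sqrt(2df)) to s*sqrt(2df)/(-z + sqrt(2df))."""
    root = math.sqrt(2 * (n - 1))
    return s * root / (z + root), s * root / (-z + root)


# Critical values copied from the text (table lookups, not computed here).
CHI2_025_15 = 27.4884
CHI2_975_15 = 6.2621
Z_025 = 1.960

if __name__ == "__main__":
    for store, s in (("store 2", 0.0980), ("store 8", 0.1553)):
        print(store)
        # a) and b): small-sample test ratios with n = 16
        print("  chi2, sigma0 = .10, n = 16:", round(chi2_ratio(s, 0.10, 16), 3))
        print("  chi2, sigma0 = .14, n = 16:", round(chi2_ratio(s, 0.14, 16), 3))
        # c): large-sample z ratios and their Normal p-values with n = 100
        z_two = large_sample_z(chi2_ratio(s, 0.10, 100), 99)
        z_left = large_sample_z(chi2_ratio(s, 0.14, 100), 99)
        print("  z (two-sided), p:", round(z_two, 3), round(two_sided_p(z_two), 4))
        print("  z (left-sided), p:", round(z_left, 3), round(left_sided_p(z_left), 4))
        # d) and e): confidence intervals for sigma
        print("  CI, n = 16:", small_sample_ci(s, 16, CHI2_025_15, CHI2_975_15))
        print("  CI, n = 100:", large_sample_ci(s, 100, Z_025))
```

Because the sketch squares the standard deviation directly instead of using the rounded variance, the store 8 ratios can differ from the hand-worked values in the third or fourth decimal place.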