* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Mathematical Template
Bootstrapping (statistics) wikipedia , lookup
Foundations of statistics wikipedia , lookup
Taylor's law wikipedia , lookup
Psychometrics wikipedia , lookup
Confidence interval wikipedia , lookup
Omnibus test wikipedia , lookup
Resampling (statistics) wikipedia , lookup
COB 191 Statistical Methods Exam 3 Fall 2007 Name______________________________ Section (circle): 1:25 PM 2:30 PM Dr. Scott Stevens DO NOT TURN TO THE NEXT PAGE UNTIL YOU ARE INSTRUCTED TO DO SO! The following exam consists of 40 questions, each worth 2.5 points. You will have 90 minutes to complete the test. This means that you have, on average, 2 minutes, 15 seconds per question. Record your answer to each question on the scantron sheet provided. You are welcome to write on this exam, but your scantron will record your graded answer. Read carefully, and check your answers. Don’t let yourself write nonsense. Keep your eyes on your own paper. If you believe that someone sitting near you is cheating, raise your hand and quietly inform me of this. I'll keep an eye peeled, and your anonymity will be respected. If any question seems unclear or ambiguous to you, raise your hand, and I will attempt to clarify it. Be sure your correctly record your student number on your scantron, and blacken in the corresponding digits. Failure to do so will cost you 10 points on this exam! Pledge: On my honor as a JMU student, I pledge that I have neither given nor received unauthorized assistance on this examination. Signature ______________________________________ Possibly Useful Excel Functions =NORMDIST(x value, mean, st.dev, TRUE) =NORMINV(probability, mean, st. dev) =TDIST(t value, deg. freedom, # of tails) NORMINV and NORMDIST deal w ith WHITE area =NORMSDIST(z value) =NORMSINV(probability) =TINV(probability, deg. freedom) TINV and TDIST deal w ith WHITE area Questions 1 - 5 deal with the Virginia Construction scenario, below. One hundred state-funded building projects in the state of Virginia were randomly selected, and the construction cost of each was recorded. The mean construction cost for these 100 projects was $850,000 with a standard deviation of $60,000. 1. Given this information, we would expect that the population standard deviation, σ a) b) c) d) e) should be about $600. should be about $6,000. should be about $60,000. should be about $600,000. must be exactly $60,000. 2. Given this information, we could determine the 95% confidence interval for I. the average cost of a state-funded building project in Virginia. II. the fraction of state-funded building projects in Virginia that cost more than $850,000. III. the fraction of state-funded building projects in Virginia that cost more than $910,000. a) I only b) II only c) III only d) I and II only e) I, II and III 3. Suppose that we use the given information to test (at the 10% level of significance) the null hypothesis that the average cost of a state-funded building project in Virginia was less than or equal to $900,000. This test would be a a) one-tailed z test b) one-tailed t test c) two-tailed z test d) two-tailed t test e) paired difference of means test 4. Suppose that we use the given information to test (at the 10% level of significance) the null hypothesis that the average cost of a state-funded building project in Virginia was less than or equal to $900,000. The conclusion of this test would be a) b) c) d) reject the null hypothesis accept the null hypothesis do not reject the null hypothesis. do not accept the null hypothesis. 5. Suppose that we use the given information to test (at the 10% level of significance) the null hypothesis that the average cost of a state-funded building project in Virginia was less than or equal to $900,000. Which expression below would give the value of α appropriate for this test? a) =tdist(50000, 99, 1) d) =normsdist(-0.8333) b) =tdist(8.333,99,1) c) =normsdist(-8.333) e) 0.1 End of Virginia Construction Scenario Questions 6 - 17 deal with the Spell Check study described below. A randomly selected group of 300 registered voters were asked to identify the party with which they were registered. 160 said that they were registered Republicans, 120 said that they were registered Democrats, and the remaining 20 said that they were registered either Independent, or with some other party. Each person was then asked to take a short dictation, writing down this sentence on a piece of paper: “It’s unbelievable that you’re having a lot of arguments about grammar.” The transcriptions were then checked for spelling errors. The results are shown below in a crosstab table. For example, 56 of the respondents were Republicans who made at least one spelling error in transcribing the sentence. No spelling errors 104 84 18 Republican Democrat Other At least one spelling error 56 36 2 In these samples, the mean number of errors made by Republicans was 0.40 with a standard deviation of 0.60. The mean number of errors made by Democrats was 0.85 with a standard deviation of 1.50. For “Others”, the figures were 0.10 for the mean and 0.30 for the standard deviation. Questions 6 to 8 deal only with the Democrats. 6. Suppose we focus for the moment only on Democrats. Then the calculation 36/120 would give the value of a) π b) x c) μ d) p e) x 7. A study of the populace at large has shown that 33% of all adults make at least one spelling error when transcribing the statement. We would like to determine if there is evidence that registered Democrats have a lower error rate than the populace at large. What would be the appropriate null hypothesis for this test? a) μ < 0.33 b) p > 0.33 c) μ = 0.33 d) π < 0.33 e) π > 0.33 8. Continuing with question #7: the appropriate hypothesis test would be a a) one population t test c) two population t test b) one population z test d) two population z test e) paired differences test 9. We focus now only on the population of all registered Republicans. We’ll conduct a hypothesis test at the 0.10 level of significance. The null hypothesis under consideration is that at most 1/3 of all registered Republicans would make spelling errors in their transcriptions. In our sample, 35% of the Republicans made spelling errors. This hypothesis test has a P-value of 0.3274. We conclude that we cannot reject the null hypothesis. Why? a) b) c) d) e) Because 0.10 is less than 0.3333 Because 0.3333 is less than 0.35 Because 0.3274 is greater than 0.10 Because 0.3274 is less than 0.3333 Because 0.3274 is less than 0.35 10. We focus now only on registered voters who are neither Democrat nor Republican. We want to test the null hypothesis that spelling errors would occur in the transcriptions of at most 5% of such “Other” voters. A few moments of consideration, and we’ll be done with this problem. Why? a) Because only 10% of the sample of “Other” voters made mistakes, so we would accept the null hypothesis as true. b) Because only 10% of the sample of “Other” voters made mistakes, so the sample is in the “nonrejectable tail”. c) Because the sample size (20) is too small to use unless we know that the population is nearly normal. d) Because the number of successes in the sample (2) is less than 5. e) Because the expected number of successes in a sample of size (1) is less than 5. 11. Suppose that I believe that the average number of mistakes made in transcription is greater for registered Democrats than it is for registered Republicans. I want to use the data provided to conduct the appropriate hypothesis test at the 5% level of significance. What kind of test is appropriate? a) b) c) d) e) One-tailed, two population test of means Two-tailed, two population test of means One-tailed, two population test of proportions Two tailed, two population test of proportions One-tailed paired difference test of means 12. The distribution of the number of errors made by Democrats in the sample is shown below. You wish to build the 90% confidence interval for the average number of spelling errors that would be made by registered Democratic voters on this transcription test. Which line of thought is most relevant for beginning this task? a) “The population is not normal, so the test cannot be conducted.” b) “There are at least five people who made errors and at least five people who did not, so I can proceed.” c) “The sample is quite skewed, so I cannot construct the interval.” d) “The sample is quite skewed but my sample size is large, so I can still construct the interval.” e) “The normality of the population is irrelevant to me because I am building a confidence interval for a proportion.” 13. Frequency Number of Errors Made By Democratic Respondents in Sample 90 80 70 60 50 40 30 20 10 0 0 1 2 3 4 errors per respondent 5 13. I conduct a two-tailed hypothesis test at the 0.50 level of significance. (Note that this is an uncommonly high level of significance!) The null hypothesis is that the same proportion of Republicans as Democrats make spelling errors on the transcription. The P-value for this test is 0.378. At the 0.50 level of significance, what do I conclude? (Be sure to read all of the answers!) a) There is no difference between the fraction of registered Republicans and registered Democrats who would make spelling errors on the test. b) There is no reason to conclude that there is a difference between the fraction of registered Republicans and registered Democrats who would make spelling errors on the test. c) Registered Democrats are more likely to make spelling errors on the test than registered Republicans. d) 37.8% of registered Republicans and Democrats would make spelling errors on the test. e) The fraction of registered Democrats who would make spelling errors on the test is different than the fraction of registered Republicans who would make a spelling error on the test. 14. By using such a very large level of significance in problem #13, I made myself very vulnerable to a particular kind of error. Which of the following correctly expands this point? a) I’m very likely to reject a null hypothesis that is in fact true. b) I’m very likely to fail to reject a null hypothesis that is actually false. c) I’m very likely to use a larger sample size than is actually needed to conduct the test. d) I’m very likely to reject a null hypothesis that is actually false. e) I’m very likely to accept a null hypothesis, even though doing so is always a mistake. 15. Based on the data given, I want to construct the 80% confidence interval for the fraction of registered Republicans who would make a spelling error (or several) on the dictation test. As a first step, compute p. Your answer must be accurate to at least 4 decimal places. a) 0.0373 b) 0.0377 c) 0.3274 d) 0.3333 e) 0.3356 16. Continuing with #15: The Excel command that would allow you to find the appropriate critical score for the computation of this interval is a) =TDIST(0.20,159,2) d) =NORMSDIST(0.80) b) =TINV(0.20,159) c) =TINV(0.40,159) e) =NORMSINV(0.90) 17. The answer to question #16, numerically, is 1.28. Pretend for this question only that the answer to question 15 was 0.10. Find the lower limit of the 80% confidence interval for the fraction of registered Republicans who would make a spelling error (or several) on the dictation test. a) 0.205 b) 0.222 c) 0.301 d) 0.461 e) 0.478 End of the Spell Check Study Scenario 18. Based on a sample of 400 randomly selected new US houses, we determine a 90% confidence interval for the mean price of a new US house as $260,000 to $280,000. This means that a) we are 90% confident that the mean price for our sample is between $260,000 and $280,000. b) we are 90% confident that the mean price of all new US houses is between $260,000 and $280,000. c) about 90% of our sample had prices between $260,000 to $280,000. d) about 90% of all new US houses have prices between $260,000 and $280,000. e) if we looked at all possible samples of 400 randomly selected new US houses, about 90% of those samples would have an average price between $260,000 and $280,000. Questions 19 - 21 are based on the Drive to Work scenario, described below. I time my drive from home to work on 50 different days, and find that the average time is 16.25 minutes. Assuming that these times constitute a random sample drawn from a population whose standard deviation of 3.50 minutes, I wish to construct the 96% confidence interval for the mean time required for me to get to work. 19. What symbol would be used to represent the value 3.50 in this problem? a) s b) c) x d) s2 e) 2 20. The critical value (z* or t*) needed for the construction of this confidence interval could be computed in Excel by the command a) b) c) d) e) =NORMSINV(0.92) =NORMSINV(0.96) =NORMSINV(0.98) =TINV(0.04,49) =TINV(0.04,50) 21. The margin of error (MOE) for the confidence interval would be computed by multiplying your answer to question number 20 by a) 0.0700 b) 0.2450 c) 0.4950 d) 0.500 e)1.732 End of the Drive to Work Scenario Questions 22-24 deal with the Radiation Therapy scenario below. The radiation machine in a cancer ward is periodically checked to be sure that the level of radiation indicated on its control panel is close to what is actually delivered to the patient. This is important since too low a radiation dose may not kill the cancer, while too high a radiation dose may actually cause severe damage to the patient. To conduct this test, a special sensor is attached to the radiation machine for a time. Each time the machine is used, the sensor can be activated to record both the actual radiation level of the treatment and the radiation level indicated on the control panel. The cost of gathering and recording this information is 1 cent per use recorded. After a month during which data may be collected, a hypothesis test will be conducted using the sensor data. The null hypothesis is that the average level of radiation indicated on the control panel is the same as the average level actually delivered. If this hypothesis is rejected, a repair team costing $100 will be called in to adjust the machine. Note that the actual number of times the machine is used during the month will be the same, regardless of how many times the sensor is activated. 22. Given the consequences discussed above, the hypothesis test should be conducted with a) b) c) d) e) a small sample and small value of alpha a small sample and a large value of alpha a large sample and a small value of alpha a large sample and a large value of alpha a small value of alpha and a large value of beta 23. Suppose the hospital administration reviews the test results and concludes that there is no reason to believe the machine is malfunctioning. Later, they find that the machine was giving twice the radiation, on average, that was indicated by the controls. This would be an example of a) careless extrapolation b) a skewed population d) type II error e) type III error c) type I error 24. The sensor records the actual dosage and the reported dosage for each patient receiving treatment. The best way to conduct the hypothesis test from this data is a) b) c) d) e) as the difference of two means with variances assumed equal. as the difference of two means with variances not assumed equal. as a paired difference, resulting in a test of a single mean. as a difference of two proportions test. as a blind stab in the dark. End of Radiation Therapy Scenario Questions 25 - 27 deal with the Christmas Tree scenario, below. George plans on harvesting the evergreens in a particular acre of his land to be sold as Christmas trees. He has been told that there is virtually no market for trees shorter than 5 feet or taller than 8 feet. Trees between 5 and 8 feet in height can be sold. George wants to make sure that his trees can be sold, so he has his daughter take a random sample of 100 of his evergreens and measure their heights. Using the heights of these 100 trees, she (correctly) builds the 95% confidence interval for the mean tree height. It runs from 6 feet, 4 inches to 7 feet, 4 inches. As a consequence, George orders the evergreens on his property harvested for sale as Christmas trees. 25. What was the average height of the 100 trees in the sample? a) b) c) d) e) 6 feet, 4 inches 6 feet, 10 inches 7 feet, 4 inches definitely between 6’4” and 7’4”, but we can’t be more precise than that impossible to say from the information given 26. What was the average height of all of the evergreens on the acre of property? a) b) c) d) e) 6 feet, 4 inches 6 feet, 10 inches 7 feet, 4 inches definitely between 6’4” and 7’4”, but we can’t be more precise than that impossible to say from the information given 27. When George has his trees harvested, he discovers that about 3/4 of his trees are either smaller than 5’ or taller than 8’. In light of the confidence interval work, he is shocked. He thought that almost all of his evergreens would be in the range of trees that could be sold. What important factor is the most sensible explanation of this “discrepancy”? a) There is a 5% chance that the confidence interval doesn’t contain the population mean. Evidently, his daughter randomly chose a sample that is not representative of the population. b) The heights of evergreen trees on the acre might not be normally distributed. c) George’s daughter may have mismeasured the heights of some of the trees in the sample. d) The confidence interval deals only with the average height of evergreens on the property, not the heights of individual trees. e) George doesn’t know how many evergreens were on his acre of land. End of Christmas Tree Scenario Questions 28 - 30 deal with the Cholesterol scenario below. For the general population of men, the mean cholesterol level is 215. It has been noted by some physicians that men who have creased earlobes tend to have higher cholesterol levels than the population of men at large. To explore this theory, we take as our null hypothesis H0: crease < 215, where crease is the average cholesterol level for men with creased earlobes. We take a random sample of 20 men who have creased earlobes and measure their cholesterol levels. We find that the average cholesterol level for these men is 257, with a standard deviation of 10. We’ll assume that the distribution of cholesterol levels is normally distributed. 28. Find the t value for this sample. It is a) 4.2 b) 18.78 c) 25.7 d) 84 e) 114.9 29. Let t be the value for t found in question 28. To find the P-value corresponding to this t value in Excel, we would enter a) b) c) d) e) =TDIST(t, 19, 1) =TDIST(t, 19, 2) =TDIST(t, 20, 1) =TDIST( t, 20, 2) =TINV(t, 19)/2 30. The critical t value for this test, t*, depends upon I. the value of alpha II. the sample size III. the sample mean a) I only b) II only c) III only d) I and II only e) II and III only 31. The t value from this sample is larger than t*. What should we conclude? a) There is no evidence that men with creased earlobes have a higher average cholesterol level than men at large. b) There is evidence that men with creased earlobes have a higher average cholesterol level than men at large. c) There is no evidence that men with creased earlobes have a lower average cholesterol level than men at large. d) There is evidence that mean with creased earlobes have a lower average cholesterol level than men at large. e) We cannot conclude any of these statements, because the sample’s mean is in the nonrejectable tail of the hypothesis test. End of the Cholesterol scenario 32. Increasing the size of alpha used for a hypothesis test will I. Decrease the size of the nonrejection region. II. Decrease the chance of rejecting a true null hypothesis. III. Decrease the chance of rejecting a false null hypothesis. a) I only b) II only c) I and II only d) I and III only e) II and III only 33. The degrees of freedom for the t statistic in a one mean hypothesis test is the same as a) b) c) n-1 d) / n e) s/ n-1 34. I want to construct the 99% confidence interval for the population proportion. z* for 99% confidence is 2.58. I have no idea of what the population proportion might actually be, but I imagine that it’s somewhere between 0.05 and 0.95. What’s the smallest sample that I could take to assure that the resulting 99% confidence interval has a margin of error of no more than 0.05? a) 26 b) 127 c) 666 d) 2663 e) 16,441 35. I want to construct the 99% confidence interval for the population mean. z* for 99% confidence is 2.58. I know that the population standard deviation is 16. What is the smallest sample that I can take to assure that the resulting 99% confidence interval will have a total width (from lower limit to upper limit) of no more than 10? (You may assume that the population is normally distributed.) a) 2 b) 9 c) 18 d) 36 e) 69 36. As the degrees of freedom of a t distribution grows larger and larger, the t distribution a) b) c) d) e) becomes nearly identical to the z distribution. becomes narrower and narrower, approaching a single “spike”. becomes wider and wider, approaching a “flat line”. becomes negative. becomes undefined. 37. I conduct a one-tailed hypothesis test of the mean for a population whose standard deviation I do not know. The rejection region for this test is in the lower tail. I take a sample of 100 observations and find that the t value computed for my sample is -1.1. I wish to know the P-value of this sample in this one-tailed test. Which calculation in Excel would give me this value? a) =TDIST(-1.1, 99,2) d) =TDIST(-1.1,99,1)/2 b) =TDIST(1.1, 99, 2) e) =TDIST(-1.1,99,1)*2 c) =TDIST(1.1, 99, 1) 38. I want to conduct a one-tailed hypothesis test of the mean for a population whose standard deviation I do not know. The rejection region for this test is in the upper tail. I take a sample of 100 observations. I want to use a 0.05 level of significance. What calculation in Excel would give me t*, the critical value of t for this test? a) =TINV(0.05,99) b) =-TINV(0.05,99) c) =TINV(0.10,99) d) =TINV(0.05,99)*2 e) =TINV(0.05,99)/2 39. A firm is trying to decide if sending its office staff to a word processing training program is worthwhile. It feels that the program is worthwhile if the average typing speed of its office staff members would increase by more than 5 words per minute. To explore the possibility, the firm has 35 of its office staff go through the training program. A typing sample is then given to 70 of the company’s employees (the 35 who received the training and another 35 who did not), and the typing speed of each employee is recorded. What null hypothesis would be the best to use with the available data? a) after - before < 0 where before is the mean typing rate before the training and after is the mean typing rate after the training. b) after - before < 5 where before is the mean typing rate before the training and after is the mean typing rate after the training. c) after - before > 0 where before is the mean typing rate before the training and after is the mean typing rate after the training. d) after - before > 5 where before is the mean typing rate before the training and after is the mean typing rate after the training. e) difference > 5, where difference is the average improvement in each member of the office staff, where “improvement” is computed by taking a staff member’s typing rate after the training and subtracting off his or her typing rate before training. 40. Estimation and hypothesis testing are examples of a) summary statistics b) linear methods c) descriptive statistics d) seat-of-the-pants methods e) inferential statistics