EDRS 811 Final Sp. 12

EDRS 811 Final Exam
Spring, 2011, Brigham

The exam is an open book, open notes activity. You must work individually to complete this task, however. Consultation with another person, living, dead, or imagined, is a violation of the honor code and could cause a failing grade for the final. You are free to ask questions of clarification, but I cannot tell you how to do things at this point. This is as much an evaluation of my teaching as your learning. Try to make me feel good about myself!

You may need to use Excel or a powerful calculator for some of the questions. You are permitted to use SPSS but you should not really need to do so.

The exam is due by midnight on May 15. Earlier submissions are gratefully accepted! I will place a slot on the Blackboard web site for you to upload your files so that you will be certain that I have received them. Also, a copy of this document will be placed on Blackboard for your use. If you type your responses on this document, please put them in bold or some other form that will make it easy for me to discriminate your work from mine.

1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative.
   A) Type of car the student owns. Categorical
   B) Number of credit hours taken during that semester. Quantitative
   C) The time the student waited in line at the bookstore to pay for his/her textbooks. Quantitative
   D) Home state of the student. Categorical

2. A sample of employees of a large pharmaceutical company has been obtained. The length of time (in months) they have worked for the company was recorded for each employee. A stemplot of these data is shown below. In the stemplot 6|2 represents 62 months. What would be a better way to represent this data set?
   A) Display the data in a time plot.
   B) Display the data in a boxplot.
   C) Split the stems.
   D) Use a histogram with class width equal to 10.

3. When examining a distribution of a quantitative variable, which of the following features do we look for?
   A) Overall shape, center, and spread.
   B) Symmetry or skewness.
   C) Deviations from overall patterns such as outliers.
   D) The number of peaks or modes.
   E) All of the above.

4. Select the three most important elements from item three, list them in decreasing order (most, second most, least), and explain why you believe that item is important (e.g., how does inspection of that element help us in our statistical interpretation of the data?).
   1) overall shape, center, and spread – gives you the best overall description of the distribution of the data, with the mean, median and spread each giving important information
   2) symmetry or skewness – skew will affect the mean, and this tells you which way the data is trending
   3) deviations from overall pattern – an outlier will affect the mean, this helps you interpret any differences between the median and mode

5. Statistics were gathered on the number of homicides committed with guns in Australia in the years from 1980 to 2004. From these data the following graph was constructed:
   This plot is a graph of a(n) _________ and it shows that there is(are) _______ in the data.
   A) categorical variable; skewness to the right
   B) histogram; multiple peaks
   C) line; an increasing trend
   D) quantitative variable; outlier values
   E) time series; a decreasing trend

6. The time to complete an exam is approximately Normal with a mean of 70 minutes and a standard deviation of 10 minutes. Using the 68-95-99.7 rule, what percentage of students will complete the exam in under an hour?
   A) 68%
   B) 32%
   C) 16%
   D) 5%

7. Using the standard Normal distribution tables, what is the area under the standard Normal curve corresponding to Z < 1.1?
   A) 0.1357
   B) 0.2704
   C) 0.8413
   D) 0.8643

8. Using the standard Normal distribution tables, what is the area under the standard Normal curve corresponding to Z > –1.22?
   A) 0.1151
   B) 0.1112
   C) 0.8849
   D) 0.8888

9. Using the standard Normal distribution tables, what is the area under the standard Normal curve corresponding to –0.5 < Z < 1.2?
   A) 0.3085
   B) 0.8849
   C) 0.5764
   D) 0.2815

10. The variable Z has a standard Normal distribution. Find the value z such that 85% of the observations fall below z.
   A) z = –1.04
   B) z = 0.80
   C) z = 0.85
   D) z = 1.04

11. Consider the following Normal quantile plot:
   What is the most striking feature of the plot?
   A) The granularity.
   B) The strong skewness indicated by the plot.
   C) The many outliers evident in the plot.
   D) The fact that Y is categorical.

Use the following to answer questions 12 and 13:
John’s parents recorded his height at various ages between 36 and 66 months. Below is a record of the results:

   Age (months)   Height (inches)
   36             34
   48             38
   54             41
   60             43
   66             45

12. Which of the following is the equation of the least-squares regression line of John’s height on age? (Note: You do not need to directly calculate the least-squares regression line to answer this question.)
   A) Height = 12 (Age)
   B) Height = Age/12
   C) Height = 60 – 0.22 (Age)
   D) Height = 22.3 + 0.34 (Age)

13. John’s parents decide to use the least-squares regression line of John’s height on age to predict his height at age 21 years (252 months). What conclusion can we draw?
   A) John’s height, in inches, should be about half his age, in months.
   B) The parents will get a fairly accurate estimate of his height at age 21 years, because the data are clearly correlated.
   C) Such a prediction could be misleading, because it involves extrapolation.
   D) All of the above.

14. Which of the following scatterplots would indicate that y is growing linearly over time?
   A) [Scatterplot of Y against Time; Y axis from 0.0 to 20.0, Time axis from 0 to 10]
   B) [Scatterplot of Y against Time; Y axis from 2.0 to 5.0, Time axis from 0 to 10]
   C) [Scatterplot of Y against Time; Y axis from 0.0 to 12.0, Time axis from 0 to 10]
   D) [Scatterplot of Y against Time; Y axis from 0 to 120, Time axis from 0 to 10]

Use the following information for items 15 and 16:
Are avid readers more likely to wear glasses than those who read less frequently? Three hundred men in Ohio were selected at random and characterized as to whether they wore glasses and whether the amount of reading they did was above average, average, or below average. The results are presented in the following table:

   Amount of reading   Glasses? Yes   Glasses? No
   Above average       47             26
   Average             48             78
   Below average       31             70
   Total               126            174

15. What is the proportion of men in the sample who wear glasses? 0.42

16. What is the proportion of all above average readers who wear glasses? 0.644

Use the following information for items 17-19:
The scores of individual students on the American College Testing (ACT) Program Composite College Entrance Examination have a Normal distribution with mean 18.6 and standard deviation 6.0. At Northside High, 36 seniors take the test. Assume the scores at this school have the same distribution as national scores.

17. What is the mean of the sampling distribution of the sample mean score for a random sample of 36 students? 18

18. What is the mean of the sampling distribution of the sample mean score for a random sample of 36 students? 18

19. What is the sampling distribution of the sample mean score for a random sample of 36 students?
   A) Approximately Normal, but the approximation is poor.
   B) Approximately Normal, and the approximation is good.
   C) Exactly Normal.
   D) Neither Normal nor non-Normal. It depends on the particular 36 students selected.

20. A small New England college has a total of 400 students. The Math SAT is required for admission, and the mean score of all 400 students is 620. The population standard deviation is found to be 60. The formula for a 95% confidence interval yields the interval 640 ± 5.88. Determine whether each of the following statements is true or false.
   A) If we repeated this procedure many, many times, only 5% of the 95% confidence intervals would fail to include the mean Math SAT score of the population of all students at this college. FALSE
   B) The probability that the population mean will fall between 634.12 and 645.88 is 0.95. TRUE
   C) The interval is incorrect. It is much too narrow. FALSE
   D) If we repeated this procedure many, many times, x̄ would fall between 634.12 and 645.88 about 95% of the time. TRUE

21. The scores on the Wechsler Intelligence Scale for Children (WISC) are thought to be Normally distributed with a standard deviation of σ = 10. A simple random sample of 25 children is taken, and each is given the WISC. The mean of the 25 scores is x̄ = 104.32. Based on these data, what is a 95% confidence interval for μ?
   A) 104.32 ± 0.78
   B) 104.32 ± 3.29
   C) 104.32 ± 3.92
   D) 104.32 ± 19.60

22. The larger the level of confidence, C, the ______ the confidence interval.
   A) smaller
   B) larger
   C) None of the above.

23. When we state the alternative hypothesis to look for a difference in a parameter in either direction, we are doing a _____.
   A) one-sided test
   B) two-sided test
   C) None of the above.

24. Given that a test of significance was done for a two-sided test and the P-value obtained was .02, what would be the P-value for a one-sided significance test?
   A) 0.02
   B) 0.04
   C) 0.01
   D) 0

25. A simple random sample of six male patients over the age of 65 is being used in a blood pressure study. The standard error of the mean blood pressure of these six men was 22.8. What is the standard deviation of these six blood pressure measurements?
   A) 9.31
   B) 50.98
   C) 55.85
   D) 136.8

26. A simple random sample of 20 third-grade children from a certain school district is selected, and each is given a test to measure his/her reading ability. You are interested in calculating a 95% confidence interval for the population mean score.
   In the sample, the mean score is 64 points, and the standard deviation is 12 points. What is the margin of error associated with the confidence interval?
   A) 2.68 points
   B) 4.64 points
   C) 5.62 points
   D) 6.84 points

27. Suppose a simple random sample of size n is drawn from an approximately Normal population. What degrees of freedom should be used to perform a one-sample t procedure?
   A) n–1
   B) n+1
   C) n–2
   D) n+2

28. Matched pairs t procedures are for use on subjects that are _______.
   A) independent
   B) the same or similar
   C) Normal
   D) None of the above.

29. One-sample t test procedures are for use on subjects that are _______.
   A) independent
   B) the same or similar
   C) binomial
   D) None of the above.

Use the following information to answer items 30 and 31:
Two statistics professors at two rival schools decide to use IQ scores as a measure of how smart the students at their respective schools are. IQ scores are known to be Normally distributed. The two professors will use this knowledge to their advantage. They will randomly select 10 students from their respective schools and determine the students’ IQ scores by means of the standard IQ test. The two professors will use the pooled version of the two-sample t test to determine whether the students at the two universities are equally smart. Let μ₁ and μ₂ represent the mean IQ scores of the students at the two universities. Let σ₁ and σ₂ be the corresponding population standard deviations. The hypotheses they will test are H0: μ₁ – μ₂ = 0 versus Ha: μ₁ – μ₂ ≠ 0. Based on the two samples of 10 students, the two professors find the following information: x̄₁ = 111, x̄₂ = 120, s₁ = 7, and s₂ = 11.

30. What is the value of the test statistic?
   A) 0.107
   B) –0.98
   C) –2.18
   D) –3.00

31. Suppose the professors had wished to test the hypotheses H0: μ₁ = μ₂ versus Ha: μ₁ < μ₂. What can we say about the value of the P-value?
   A) P-value < 0.01
   B) 0.01 < P-value < 0.025
   C) 0.025 < P-value < 0.05
   D) P-value > 0.05

Use this information for items 32 and 33:
The 94 students in a statistics class are categorized by gender and by the year in school. The numbers obtained are displayed below:

                        Year in school
   Gender   Freshman   Sophomore   Junior   Senior   Graduate   Total
   Male     1          2           9        17       2          31
   Female   23         17          13       7        3          63
   Total    24         19          22       24       5          94

32. Suppose we wish to test the null hypothesis that there is no association between the year in school and gender. Under the null hypothesis, what is the expected number of male sophomores?
   A) 2
   B) 6
   C) 6.27
   D) 9.5

33. It was calculated that the test statistic was X² = 8.083. The approximate P-value for this test is then:
   A) between 0.02 and 0.04.
   B) between 0.01 and 0.02.
   C) less than 0.01.
   D) greater than 0.04.
   E) between 0.15 and 0.25.

Use this information for items 34-36:
The data referred to in this question were collected on 41 employees of a large company. The company is trying to predict the current salary of its employees from their starting salary (both expressed in thousands of dollars). The SPSS regression output is given below as well as some summary measures:

34. What is an (approximate) 95% confidence interval for the slope β₁?
   A) (–7.57, 4.39)
   B) (–4.52, 1.34)
   C) (1.80, 2.41)
   D) (1.95, 2.26)

35. Suppose we wish to test the hypotheses H0: β₁ = 2 versus Ha: β₁ ≠ 2. Together with an insignificant constant in this model, this would imply that the employees currently earn about twice as much as their starting salary. At the 5% significance level, would we reject the null hypothesis?
   A) Yes
   B) No
   C) This cannot be determined from the information given.

36. What is the value of the estimate for σ, the standard deviation of the model deviations εᵢ?
   A) 0.15
   B) 2.93
   C) 7.21
   D) 52.0

The following information refers to items 37-40:
The following SPSS output represents data collected on 89 middle-aged people.
The relationship between body weight and percent body fat is to be studied:

37. What is the equation of the least-squares regression line? Y = -11.570 + 0.165x

38. What is the value of the correlation between body fat and body weight? r = 0.5844

39. Let ρ be the population correlation between body fat and body weight. What is the value of the t statistic for testing the hypotheses H0: ρ = 0 versus Ha: ρ ≠ 0? t = 3.93

40. Is the slope significantly different from zero? Include the value of the test statistic and the corresponding P-value in your answer.
   The test statistic t = 3.93 leads to a p-value < 0.001, which means we can reject the null hypothesis and say that the slope is significantly different from zero.

41. In a multiple regression with five explanatory variables, data are collected on 63 observations. What are the degrees of freedom for the ANOVA F test?
   A) 4 and 57
   B) 5 and 57
   C) 5 and 58
   D) 5 and 62

42. In a multiple regression with four explanatory variables, data are collected on 25 observations. What is the largest value the ANOVA F statistic can take on before we would reject the null hypothesis that all of the regression coefficients are 0, at the 5% significance level?
   A) 2.78
   B) 2.87
   C) 3.10
   D) 3.51

Use this information for items 43-46:
The data referred to in this question were collected from several sales districts across the country. The data represent sales for a maker of asphalt roofing shingles.
Information on the following variables is available:

   Sales                Sales from last year in thousands of squares
   Expenditures         Promotional expenditures in thousands of dollars
   Accounts             Number of active accounts
   Competing Brands     Number of competing brands producing equivalent or similar products
   District Potential   A coded indicator of the potential of the district (higher score = better potential)

Partial SPSS regression output of a multiple regression model with sales as the response variable and the other four variables as predictor variables is given below:

43. How many districts were sampled in all?
   A) 21
   B) 24
   C) 25
   D) 26

44. What is the estimate for the error variance σ²?
   A) 9.604
   B) 12.960
   C) 92.245
   D) 1937.137

45. What proportion of the variation in sales is explained by the set of all four explanatory variables?
   A) –0.647
   B) 0.558
   C) 0.989
   D) 0.995

46. Which of the four explanatory variables seems to be the least significant in the model?
   A) Expenditures
   B) Accounts
   C) Competing Brands
   D) District Potential

46. An F test for the two coefficients of promotional expenditures and district potential is performed. The hypotheses are: H0: β₁ = β₄ = 0 versus Ha: at least one of the βⱼ is not 0. The F statistic for this test is 1.482 with 2 and 21 degrees of freedom. What can we say about the P-value for this test?
   A) P-value < 0.025
   B) 0.025 < P-value < 0.05
   C) 0.05 < P-value < 0.10
   D) P-value > 0.10

47. A study compares five groups with 10 observations in each group. An F statistic of 3.75 is reported. What are the degrees of freedom for this F statistic?
   A) 4 and 45
   B) 4 and 46
   C) 5 and 10
   D) 5 and 50

48. A study compares three population means. Three independent samples with 15 observations each are taken. The SSE = 1246 and the SST = 1600. What is the value of the F statistic?
   A) 1.11
   B) 3.32
   C) 4.98
   D) 5.97

Use the following information for items 49-51:
A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects in each group. A different training program was assigned to each group. After two months, the improvement in endurance was recorded for each subject. A one-way ANOVA is used to compare the five training programs, and the resulting F statistic is 3.69.

49. What can we say about the P-value for this F test?
   A) P-value < 0.01
   B) 0.01 < P-value < 0.025
   C) 0.025 < P-value < 0.05
   D) 0.05 < P-value < 0.10

50. Which distribution was used to find the P-value?
   A) F(4, 3)
   B) F(5, 8)
   C) F(4, 35)
   D) t(39)

51. At a significance level of 0.05, what is the appropriate conclusion about mean improvement in endurance?
   A) The average amount of improvement appears to be the same for all five training programs.
   B) The average amount of improvement appears to be different for each of the five training programs.
   C) It appears that at least one of the five training programs has a different average amount of improvement.
   D) One training program is significantly better than the other four.

52. SPSS output for multiple comparisons is given below, using the Bonferroni method with α = 0.05:

   (I) Machine   (J) Machine   Mean difference   Std.    95% confidence interval
   number        number        (I–J)             error   Lower bound   Upper bound
   1             2             -15.75*           3.52    -25.37        -6.13
   1             3             .23               3.52    -9.38         9.85
   1             4             3.08              3.52    -6.54         12.70
   2             1             15.75*            3.52    6.13          25.37
   2             3             15.98*            3.52    6.36          25.60
   2             4             18.83*            3.52    9.21          28.44
   3             1             -.23              3.52    -9.85         9.38
   3             2             -15.98*           3.52    -25.60        -6.36
   3             4             2.85              3.52    -6.77         12.46
   4             1             -3.08             3.52    -12.70        6.54
   4             2             -18.83            3.52    -28.44        -9.21
   4             3             -2.85             3.52    -12.46        6.77

   What is the correct conclusion based on these comparisons?
   A) Machine 1 seems to give different results from Machine 2. Machines 3 and 4 appear indistinguishable.
   B) Machine 2 seems to give different results from all other machines. Machines 1, 3, and 4 appear indistinguishable.
   C) Machine 2 seems to be doing much better than the other three machines.
   D) None of the above.

Use the following information to complete items 53-55:
Consider the following groups in the order listed on the table:

   Girls   Boys   Women   Men   Cats

53. Complete the contrast coefficients to form the contrast for the group “Human young vs. Human old.”
   A) 1, 1, -1, -1, 0
   B) 0, 1, 1, 1, 1
   C) 1, 1, 1, 1, 0

54. Complete the contrast coefficients to form the contrast for the group “Human male vs. Human female.”
   A) 1, 1, 1, 1, 0
   B) -1, -1, -1, 0
   C) 1, 1, -1, -1, 0
   D) -1, 1, -1, 1, 0

55. Complete the contrast coefficients to form the contrast for the group “Cats vs. Human.”
   A) -1, -1, -1, -1, -4
   B) -1, -1, -1, -1, 4
   C) 1, 1, 1, 1, 4
   D) 4, 4, 4, 4, 4

56. By doing multiple comparisons when there are more than two experimental groups, we increase the risk of making what kind of mistake?
   A) Accepting H0
   B) Type I error
   C) Type II error
   D) All of the above. (hint, this is incorrect!)

Use the following to answer questions 57-60:
A realtor wishes to assess whether a difference exists between home prices in three subdivisions. Independent samples of homes from each of the three subdivisions are obtained and their prices are recorded. The analysis of variance results for comparing these prices are provided below:

   Source   Sum of squares   DF   Mean square   F
   Groups   157.44           2    78.72         4.66
   Error    253.50           15   16.90
   Total    410.94           17

57. How many homes were sampled in total?
   A) 15
   B) 17
   C) 18
   D) 19

58. Under the null hypothesis of equality of population means, what is the appropriate distribution for the test statistic?
   A) F(2, 15)
   B) F(2, 17)
   C) N(μ, σ)
   D) t(15)

59. What is the value of the estimate of the common population standard deviation of the populations of home prices in the three subdivisions?
   A) 4.11
   B) 8.87
   C) 16.90
   D) 78.72

60. What can we say about the P-value for this F test?
   A) P-value < 0.01
   B) 0.01 < P-value < 0.025
   C) 0.025 < P-value < 0.05
   D) 0.05 < P-value < 0.10

Congratulations! You have completed EDRS 811. Thank you for your hard work on this very abstract material. Go buy yourself something ridiculous to commemorate the event!
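
The exam suggests Excel or a calculator, but the arithmetic behind the analysis of variance table for items 57-60 can also be checked with a few lines of Python. The sketch below is only an illustration and not part of the exam: the function name anova_summary and its dictionary keys are made up for this example, and the inputs are the sums of squares and degrees of freedom copied from that table. It relies on the standard one-way ANOVA relationships: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, the pooled standard deviation is the square root of the error mean square, and the total sample size is the total degrees of freedom plus one.

```python
import math


def anova_summary(ss_groups, ss_error, df_groups, df_error):
    """Recompute one-way ANOVA summary values from sums of squares and DF.

    Hypothetical helper written for illustration; it is not from the exam.
    """
    ms_groups = ss_groups / df_groups   # mean square between groups
    ms_error = ss_error / df_error      # mean square within groups (error)
    return {
        "n_total": df_groups + df_error + 1,  # total DF + 1
        "ms_groups": ms_groups,
        "ms_error": ms_error,
        "f": ms_groups / ms_error,            # F statistic
        "pooled_sd": math.sqrt(ms_error),     # estimate of the common SD
    }


# Values copied from the ANOVA table for items 57-60.
summary = anova_summary(ss_groups=157.44, ss_error=253.50,
                        df_groups=2, df_error=15)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The recomputed mean squares and F ratio can be compared directly with the Mean square and F columns of the table. Looking up a P-value would still require an F table or a statistics library, which this sketch deliberately leaves out.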