Final - Form A
Fall 2002
Economics 173
Instructor: Petry

Name_____________ SSN______________

Before beginning the exam, please verify that you have 15 pages with 48 questions in your exam booklet. You should also have a decision-tree and formula sheet provided by your TA. Please include your full name, social security number and Net-ID on your bubble sheets. Good luck!

1. Imagine we have sample sales numbers for 50 random books published this year. It happens that the sample has a "symmetric" distribution. If we add the sales of the top 10 best sellers to this sample what would happen to the mean, median and mode of the distribution?
   a. mode > mean > median
   b. mode > median > mean
   c. median > mean > mode
   d. mean > median > mode
   e. mean > mode > median

2. If we want to increase the confidence level for estimating an interval for the variance of grades of students in this exam, while maintaining the width of the interval, what can we possibly do?
   a. Add some new observation to my sample to make the sample size larger
   b. Throw some observations away to make the sample size smaller
   c. Decrease the standard deviation of the population
   d. Increase the standard deviation of the population
   e. Both (b) and (d) have the same effect, so we would do both

3. If we want to make a confidence interval with a band of +/- .1, 95% confidence, and assuming we have no prior information about the proportion of students who leave campus for winter break, how large must my sample size be? (Z0.005=2.58, Z0.025=1.96, Z0.05=1.645)
   a. 95
   b. 96
   c. 97
   d. 67
   e. 68

4. All of the following are assumptions about errors (residuals) in the classical regression model EXCEPT:
   a. the probability distribution of errors is normal
   b. there is no serious multicollinearity present
   c. errors are correlated with the independent variable(s)
   d. the standard deviation of errors is constant for all values of the independent variable(s)
   e.
errors are independent of each other across time

Use the Excel output below to answer the next two questions (#5-6).

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.70374
R Square             0.49525
Adjusted R Square    0.487484
Standard Error       3.073497
Observations         67

ANOVA
              df    SS          MS          F           Significance F
Regression     1    602.4573    602.4573    63.77652    3.09E-11
Residual      65    614.0148    9.446382
Total         66    1216.472

                Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept       -0.4969         0.415287          -1.19653    0.235837    -1.32629     0.332483
X Variable 1    0.727765        0.09113           7.98602     3.09E-11    0.545766     0.909764

5. Which of the following are true?
   I. The independent variable explains more than 50% of the variability in the independent variable.
   II. Overall, the model does a good job explaining the data.
   III. The independent variable is not statistically significant at the 5% level.
   a. I and II
   b. I only
   c. II only
   d. II and III
   e. III only

6. In the test for overall significance of the model, the test statistic follows which distribution (with degrees of freedom in parentheses)?
   a. t(66)
   b. t(67)
   c. F(2, 66)
   d. F(1, 65)
   e. None of the above

7. Which of the following statements is definitely correct?
   a. The 90% confidence interval for the population mean from a sample of 20 observations is narrower than the 87% confidence interval for a sample of 25 holding everything else constant.
   b. The 90% confidence interval for the population mean from a sample of 20 observations is wider than the 87% confidence interval for a sample of 25 holding everything else constant.
   c. The 87% confidence interval for the population mean from a sample of 20 observations is narrower than the 90% confidence interval for a sample of 25 holding everything else constant.
   d. The 87% confidence interval for the population mean from a sample of 20 observations is wider than the 90% confidence interval for a sample of 25 holding everything else constant.
   e.
None of the above

Use the following information to answer the next four questions (#8-11).

Below is a regression of number of cigarettes smoked per weekend on the independent variables listed. DRINKS is the number of alcoholic drinks consumed per weekend. PARTIES is the number of house parties attended per weekend. ASIAN and AFRICAN-AM are dummy variables for ethnicity (with CAUCASIAN being the only other ethnicity represented in our sample). COLLEGE is whether the person attends college. RUNNER is whether the person runs at least twice a week for 30 minutes.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.493408
R Square
Adjusted R Square    0.194642
Standard Error       50.94051
Observations         100

ANOVA
              df    SS          MS    F           Significance F
Regression     6    77657.86          4.987784    0.000181
Residual      93    241329
Total         99

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept     9.641725        16.03158          0.601421    0.549023    -22.1938     41.47
DRINKS        2.254276        0.522904          4.311073    4.04E-05    1.215894     3.292
PARTIES       0.690245        0.271775          2.539772    0.012751    0.150556     1.229
ASIAN         2.569739        3.522896          0.729439    0.467566    -4.42603     9.565
COLLEGE       2.314365        14.2286           0.162656    0.871142    -25.9408     30.56
AFRICAN-AM    -3.19552        20.54125          -1.55566    0.123186    -72.746      8.835
RUNNER        -2.52578        10.75428          -0.23486    0.814832    -23.8816     18.83

8. Using an alpha of .05, which of the following variables would be significant in the model?
   a. DRINKS, PARTIES, ASIAN, COLLEGE, AFR-AM, and RUNNER
   b. ASIAN, COLLEGE, AFR-AM and RUNNER
   c. DRINKS and PARTIES
   d. PARTIES
   e. ASIAN and AFR-AM

9. Assuming a level of significance that allows for ALL the variables to be significant, then what is the difference in average cigarettes smoked per weekend between an Asian and an African-American, all else being equal?
   a. 2.52578
   b. 3.19552
   c. -0.6697
   d. 5.7653
   e. 0.6697

10. Given an SSR of the Reduced Model of 28,384, what is the Test Statistic for the partial F test?
   a. 8.080
   b. 4.747
   c. 9.494
   d. 3.938
   e. 10.385

11.
In order to test whether an Asian person smokes 3 more cigarettes compared to Caucasians, all else being equal, what is the appropriate test statistic?
   a. 0.72944
   b. -0.72944
   c. -0.12213
   d. 0.3835
   e. -0.3835

Use the information below to answer the following two questions (#12-13).

Beer demand at a small-sized cricket stadium in New Zealand is being studied. A sample of 25 daily demands was collected, the data being in cases of beer. The descriptive statistics are presented below:

Demand
Mean                  201.6
Standard Error        3.289883
Median                204
Mode                  213
Standard Deviation    16.44942
Sample Variance       270.5833
Kurtosis              0.390956
Skewness              -0.68938
Range                 70
Minimum               159
Maximum               229
Sum                   5040
Count                 25

12. Calculate the test statistic for testing the alternative hypothesis that the population variance of beer demand is greater than 250. The value of the test statistic is:
   a. 27.06
   b. 1.58
   c. 1.65
   d. 25.98
   e. 37.12

13. Given the right sided critical values: 36.42 (df=24) and 37.65 (df=25), the decision from this test should be:
   a. fail to reject the null, therefore conclude that the variance is less than 250
   b. fail to reject the null, therefore conclude that the variance is not greater than 250
   c. reject the null, therefore conclude that the variance is less than 250
   d. reject the null, therefore conclude that the variance is greater than 250
   e. the test proves inconclusive

14. Out of 87 people polled in Israel, 21 said "yes" to the question: Do you enjoy watching golf on TV? In Denmark, out of 113 respondents, 49 said yes to the same question. To test the divergence of golf-watching preferences in the two countries, the most suitable test from the list below is:
   a. z-test for difference in proportions
   b. chi-square test for difference in proportions
   c. F-test for difference in variances
   d. t-test for difference in means (equal variances)
   e. paired sample t-test for mean difference

15.
Data was collected on the same set of sugar-mills in Bangladesh before and after unionization to examine the effect on productivity. Data was in the form of man-hours per ton of sugar produced. The effect of unionization is best tested using:
   a. z-test for difference in proportions
   b. chi-square test for difference in proportions
   c. F-test for difference in variances
   d. t-test for difference in means (equal variances)
   e. paired sample t-test for mean difference

16. The effect(s) of multicollinearity is (are):
   a. The standard errors are enlarged
   b. t-statistics are decreased
   c. The F-stat is made artificially high
   d. All of the above
   e. (a) and (b)

17. You estimated the following regression

      SALES = 2.5  + .9 WRKYRS + 1.5 WRKYRS^2
             (.5)   (.18)       (0.375)

   where the standard errors are in parentheses. From this you can conclude that (with the two sided critical values being +/- 2.05):
   a. The t-stat for the quadratic term is 5
   b. The t-stat for the quadratic term is 4
   c. The t-stat for the quadratic term is 0.1406
   d. The t-stat for the quadratic term is 9
   e. None of the above

18. You are given the following data: F = 12.07449 and MSR = 638.0741. Calculate the standard error of the estimate of the model.
   a. 0.0189
   b. 52.8448
   c. 7.269
   d. 0.1376
   e. we do not have sufficient information

Use the following information to answer the next two questions (#19-20).

Cigarette consumption data was collected from 1975 to 1994 (in thousands of sticks of annual consumption) and subjected to regression trend analysis.
The regression outputs are provided below (NOTE: the year 1975 was coded as 1):

Linear trend
              Coefficients    Standard Error    t Stat      P-value
Intercept     4317.958        25.02099206       172.5734    1.99E-30
CODED YEAR    -83.6436        2.088712061       -40.0455    4.77E-19

Quadratic trend
              Coefficients    Standard Error    t Stat      P-value
Intercept     4265.625        37.89803543       112.5553    7.27E-26
CODED YEAR    -69.3709        8.311626354       -8.34625    2.04E-07
CODED^2       -0.67965        0.384451587       -1.76785    0.095028

Cubic trend
              Coefficients    Standard Error    t Stat      P-value
Intercept     4151.68         42.14282662       98.51451    1.06E-23
CODED YEAR    -11.2297        16.95671287       -0.66226    0.517232
CODED^2       -7.43527        1.85247981        -4.01368    0.001003
CODED^3       0.214464        0.058077868       3.692696    0.001973

19. After studying these outputs and at a 10% level of significance, which of the models given above would you choose?
   a. linear
   b. quadratic
   c. cubic
   d. any of the above are acceptable

20. Irrespective of which model you actually chose above, use the cubic model to "predict" cigarette consumption in the year 1997 (in thousands of sticks), rounded to the nearest integer:
   a. 2570
   b. 2311
   c. 2394
   d. not enough information for prediction to be performed.

Use the following information to answer the next 3 questions (#21-23).

A quarterly data set was processed, with the intention of constructing seasonal indices. As we know, the first step in this effort is to fit a linear trend.
The regression output is provided here:

             Coefficients    Standard Error    t Stat      P-value
Intercept    143.0072        8.681187          16.47324    7.35E-14
Period       7.416087        0.607557          12.20641    2.86E-11

The residual output and some additional information:

RESIDUAL OUTPUT

Year    Observation    yhat (predicted y)    Residuals    percentage trend
1994     1             150.4233333           33.57667     1.223214484
         2             157.8394203           15.16058     1.096050655
         3             165.2555072           -5.25551     0.968197688
         4             172.6715942           16.32841     1.094563358
1995     5             180.0876812           10.91232     1.060594477
         6             187.5037681           -2.50377     0.986646838
         7             194.9198551           -10.9199     0.943977718
         8             202.335942            -2.33594     0.988455131
1996     9             209.752029            -4.75203     0.977344539
        10             217.1681159           -25.1681     0.884107684
        11             224.5842029           -24.5842     0.890534585
        12             232.0002899           -3.00029     0.987067732
1997    13             239.4163768           -3.41638     0.985730396
        14             246.8324638           -27.8325     0.887241478
        15             254.2485507           -43.2486     0.829896569
        16             261.6646377           10.33536     1.039498506
1998    17             269.0807246           10.91928     1.040579924
        18             276.4968116           -15.4968     0.943953019
        19             283.9128986           -8.9129      0.968606926
        20             291.3289855           30.67101     1.105279653
1999    21             298.7450725           32.25493     1.107968065
        22             306.1611594           -5.16116     0.983142344
        23             313.5772464           -7.57725     0.975836109
        24             320.9933333           30.00667     1.093480654

We also provide the seasonal indices that were computed based on all this information:

Q1       1.063
Q2       0.961
Q3       0.927
Q4       1.049
TOTAL    4

21. The seasonally adjusted, or deseasonalized, value that you obtain from the actual data point for the 2nd quarter of 1997, is:
   a. 246.83
   b. 219
   c. 227.89
   d. 237.21
   e. -27.83

22. The trend plus seasonal component based "forecast" for the 2nd quarter of 1997 is:
   a. 246.83
   b. 219
   c. 227.89
   d. 237.21
   e. -27.83

23. Heteroscedasticity
   a. occurs when errors do not have a constant variance, and may be detected by a fan shape in the residual plot
   b. occurs when errors do not have zero mean, and is detected by a cyclical pattern in the residuals
   c. occurs when errors are not constant, and is detected by a fan shape in the residual plot
   d.
occurs when errors are correlated, and is detected by a cyclical pattern in the residuals
   e. none of the above

Use the following information to answer the next seven questions (#24-30).

A multiple regression was performed to explain college GPA on the basis of high-school GPA, SAT score and the number of hours spent per week on extracurricular activities in the final year of high school. Both the college and high-school GPA variables are continuous, with ranges from 0 to 12 (due to summation of several years of GPA). The regression output is provided below, with some parts deliberately and willfully hidden:

Regression Statistics
Multiple R           0.536870707
R Square             0.288230156
Adjusted R Square    0.265987348
Standard Error       2.030233313
Observations         100

ANOVA
              df          SS             MS          F           Significance F
Regression    3           160.2370587    53.41235    (hidden)    3.54141E-07
Residual      (hidden)    395.6973413    4.121847
Total         (hidden)    555.9344

              Coefficients    Standard Error    t Stat      P-value     Lower 95%      Upper 95%
Intercept     0.72110455      1.869815022       0.385656    0.700605    -2.99045177    4.4326609
HS GPA        0.610872024     0.100749211       6.063293    2.62E-08    (hidden)       (hidden)
SAT           0.002708497     0.002873196       0.942677    0.348212    -0.00299476    0.0084117
Activities    (hidden)        0.064049816       0.722149    0.471959    -0.0808845     0.1733915

24. The residual degrees of freedom is equal to:
   a. 100
   b. 99
   c. 98
   d. 97
   e. 96

25. For every additional hour of extracurricular activity, college GPA increases by (approximately):
   a. -11.27, therefore, it actually decreases
   b. 0.046
   c. 11.27
   d. 0.089
   e. 0.06

26. For testing the overall validity of the model, the value of the test statistic is:
   a. 53.41
   b. 12.96
   c. 0.077
   d. 220.16
   e. 3.54141E-07

27. What percent of the variability in college GPA has been explained by this model, without correcting for the number of independent variables?
   a. 53.69
   b. 28.82
   c. 26.60
   d. 2.03
   e. same as the correct answer to the previous question

28.
Among the following choices, which is the only probable candidate for a 95% confidence interval for the coefficient on high-school GPA?
   a. (-0.27, 0.65)
   b. (-0.27, -0.65)
   c. (0.41, 0.81)
   d. (-0.41, 0.81)
   e. (-0.41, -0.81)

29. Which of the following statements is correct?
   a. for every additional point scored on the SAT, the estimated average college GPA increases by 0.0027
   b. for every additional point scored on the SAT, the average college GPA increases by 0.0027
   c. for every additional point scored on the SAT, the population average college GPA increases by 0.0027
   d. for every additional point scored on the SAT, the estimated average college GPA decreases by 0.0027
   e. for every point increase in GPA, SAT score must go up by 0.0027

30. Based on these results, if you were to form a reduced model and then do a partial F-test, the degrees of freedom of your statistic would be:
   a. 1 and 100
   b. 2 and 100
   c. 96 and 100
   d. 2 and 96
   e. 1 and 96

31. To represent the categorical variable "MUSIC," which has levels "classic rock", "modern rock" and "other," how many dummy variables need to be constructed?
   a. 1
   b. 2
   c. 3
   d. 4
   e. 0

32. Given the following information, determine if first order autocorrelation exists:
      n = 25, k = 5, alpha = 0.10, d = 0.90
      from DW table: dL = 0.95, dU = 1.45
   a. there is no evidence of positive autocorrelation
   b. there is significant evidence of negative autocorrelation
   c. we have enough evidence to conclude that positive autocorrelation exists
   d. the test is inconclusive
   e. there is no evidence of first order correlation between the errors

Use the following information to answer the next six questions (#33-38).

We have collected two samples, one each from two independent populations, with the intention of comparing the population means. The following sample statistics were calculated:

      x̄1 = 604.02, x̄2 = 633.23, s1 = 64.05, s2 = 103.29

33. First we need to perform an F-test to see if the two population variances are equal.
Given that the right critical value is F(0.05; 42, 106) = 1.50, what is the left critical value F(0.95; 106, 42)?
   a. 3
   b. 0.667
   c. -1.5
   d. 42
   e. 106

34. The value of the F test statistic is:
   a. 0.385
   b. 2.601
   c. 0.912
   d. 1.132
   e. 1.50

35. Based on the result of the F-test, the appropriate means test in this case is:
   a. the chi-square test for mean difference
   b. the paired sample t-test for mean difference
   c. the pooled variance t-test
   d. the unequal variance t-test
   e. the z-test for mean difference

36. The test statistic for the test selected above is:
   a. -2.09
   b. -1.21
   c. 3.32
   d. 4.16
   e. 0.34

37. Given that the two-tailed p-value (i.e. when H1: μ1 ≠ μ2) corresponding to the test stat calculated above is 0.0386, your decision at the 5% level should be:
   a. reject the null hypothesis, the means are different
   b. do not reject the null hypothesis, the means are not different
   c. reject the null hypothesis, the means are not different
   d. do not reject the null hypothesis, the mean of population 2 is greater

38. If on the other hand, you do a one tailed test with H1: μ1 > μ2, and still obtain the same test statistic, the new p-value (for this alternative hypothesis) should be:
   a. 1.21
   b. 0.0386
   c. 0.0772
   d. 0.9228
   e. 0.9807

Use the following information to answer the next 2 questions (#39-40).

Assume you are interested in calculating the seasonal indices for a quarterly data set. You conduct an analysis of trend then averaged the values obtained by quarter. This process produced the following numbers:

Q1: 1.053
Q2: .941
Q3: .907
Q4: 1.029

39. Your next step would be to:
   a. proceed with these numbers as your seasonal indices
   b. throw out the numbers and conclude no "stable" seasonality exists in this data set
   c. purify the data
   d. normalize the data
   e. none of the above

40. The seasonal index used to represent the 1st quarter would be:
   a. 1.053
   b. .9825
   c. 1.072
   d. 1.035
   e. None of the above

41.
Which of the following is a measure of the linear relationship between two variables?
   a. The standard deviation
   b. The covariance
   c. The coefficient of correlation
   d. The variance
   e. Both b & c

42. Generally speaking, if two variables are unrelated, the covariance will be:
   a. a smaller positive number than if they were related
   b. a smaller positive or negative number than if they were related
   c. a larger negative number than if they were related
   d. a larger number than if they were related
   e. a positive number close to zero

43. The Central Limit Theorem is among the most remarkable theorems in all statistics. Essentially it says that:
   a. for sufficiently large samples, the sampling distribution of the sample mean is approximately normal when the sample is drawn from a normal population
   b. for sufficiently large samples, the sampling distribution of the sample mean is approximately normal, irrespective of the population distribution
   c. the sample mean is always equal to the population mean if both come from normal distributions
   d. both a & b
   e. both b & c

44. A wavelike pattern which persists for more than a year describes which component of a time series?
   a. long term trend
   b. cyclical effect
   c. seasonal effect
   d. random variation
   e. none of the above

45. The disadvantages of the moving average method of smoothing a time series include which of the following:
   I. Only a small portion of the data set is represented in each averaged value
   II. You can only use this method to forecast one period ahead
   III. You lose a portion of your data set
   IV. You cannot smooth using an even number of periods to calculate the average
   a. I & II
   b. I & III
   c. II, III & IV
   d. I, II, III & IV
   e. III & IV

Use the following information to answer the next two questions (#46-47).

Use w=.8 to smooth the following data set:

Day    Value
1      18
2      22
3      23
4      19
5      28

46. The smoothed value for time period 3 is:
   a. 22.6
   b. 21.9
   c. 22.1
   d. 22.9
   e. 22.3

47.
The forecasted value for time period 6 would be:
   a. 26.1
   b. 25.8
   c. 26.3
   d. 28.0
   e. We cannot use this method of smoothing to forecast

48. In the exponential smoothing method that we used in lecture, the larger the smoothing constant:
   a. the more heavily the current actual data point is weighted in the smoothed series
   b. the more heavily the smoothed data point lagged one period is weighted in the smoothed series
   c. the less volatile the smoothed series would become
   d. the larger the difference between the smoothed data point and the actual data point would be for each period
   e. both a & c

Answer Key:

 1. d    13. b    25. b    37. a
 2. a    14. a    26. b    38. e
 3. c    15. e    27. b    39. d
 4. c    16. e    28. c    40. c
 5. c    17. b    29. a    41. e
 6. d    18. c    30. d    42. b
 7. b    19. c    31. b    43. b
 8. c    20. a    32. c    44. b
 9. d    21. c    33. b    45. b
10. b    22. d    34. a    46. a
11. c    23. a    35. d    47. c
12. d    24. e    36. a    48. a
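
Several of the computational entries in the answer key can be checked with a few lines of arithmetic. The Python sketch below is one way to do that, using only numbers printed in the exam. The function names are illustrative and not part of the exam, and the formulas are the standard textbook ones, assumed here to match the formula sheet the exam mentions: p = 0.5 when nothing is known about the proportion (#3), the first smoothed value set equal to the first observation, and the last smoothed value used as the next-period forecast (#46-47).

```python
import math

# Helper names are illustrative; they do not come from the exam.

def sample_size_for_proportion(z, margin, p=0.5):
    """#3: n = z^2 * p(1-p) / B^2, rounded up (p = 0.5 with no prior information)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def chi_square_variance_stat(n, sample_var, hypothesized_var):
    """#12: chi-square = (n - 1) * s^2 / sigma_0^2."""
    return (n - 1) * sample_var / hypothesized_var

def std_error_of_estimate(f_stat, msr):
    """#18: F = MSR / MSE, so MSE = MSR / F and s = sqrt(MSE)."""
    return math.sqrt(msr / f_stat)

def cubic_trend(t, b0, b1, b2, b3):
    """#20: fitted cubic trend at coded year t (1975 = 1, so 1997 = 23)."""
    return b0 + b1 * t + b2 * t ** 2 + b3 * t ** 3

def normalize_indices(raw):
    """#39-40: rescale raw seasonal averages so they sum to the number of seasons."""
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]

def exponential_smooth(values, w):
    """#46-47: S_1 = y_1, then S_t = w * y_t + (1 - w) * S_(t-1)."""
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

if __name__ == "__main__":
    print(sample_size_for_proportion(1.96, 0.1))                          # 97, key: c
    print(round(chi_square_variance_stat(25, 270.5833, 250), 2))          # 25.98, key: d
    print(round(std_error_of_estimate(12.07449, 638.0741), 3))            # 7.269, key: c
    print(round(cubic_trend(23, 4151.68, -11.2297, -7.43527, 0.214464)))  # 2570, key: a

    # #21-22: actual value = trend value + residual for observation 14 (Q2 1997).
    yhat, residual, q2_index = 246.8324638, -27.8325, 0.961
    print(round((yhat + residual) / q2_index, 2))                         # 227.89, key: c
    print(round(yhat * q2_index, 2))                                      # 237.21, key: d

    print(round(normalize_indices([1.053, 0.941, 0.907, 1.029])[0], 3))   # 1.072, key: c

    smoothed = exponential_smooth([18, 22, 23, 19, 28], 0.8)
    print(round(smoothed[2], 1))                                          # 22.6, key: a
    print(round(smoothed[-1], 1))                                         # 26.3, key: c
```

Each printed value agrees with the corresponding letter in the answer key. Questions that depend on hidden table cells or on sample sizes not stated in the text (for example #36) are left out rather than guessed.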