Economics 102: Analysis of Economic Data
Cameron, Fall 2012
Department of Economics, U.C.-Davis

Final Exam (A), Wednesday December 12

Compulsory. Closed book. Total of 60 points and worth 45% of course grade.
Read each question carefully so that you answer the question asked.

Question scores

Question   1a  1b  1c  1d  1e    2a  2b  2c  2d  2e  2f    3a  3b  3c  3d  3e  3f
Points      1   2   3   3   1     1   2   3   1   3   2     1   1   2   2   4   2

Question   4a  4b  4c  4d    5a  5b  5c  5d  5e    Mult Choice
Points      2   2   2   2     1   2   3   2   2         8

Multiple Choice Questions (circle one part)

1:  a  b  c  d  e        5:  a  b  c  d  e
2:  a  b  c  d  e        6:  a  b  c  d  e
3:  a  b  c  d  e        7:  a  b  c  d  e
4:  a  b  c  d  e        8:  a  b  c  d  e

Questions 1-4

Consider data on salary, SAT test scores, and individual characteristics for 876 people from the 2010 round of the representative sample of the National Longitudinal Survey of Youth 1997.

Dependent Variable
  SALARY    = Annual salary in dollars.
  LNSALARY  = Natural logarithm of SALARY.

Regressors
  SATMATH   = Highest score on Math portion of SAT (possible range 200-800)
  SATVERB   = Highest score on Verbal portion of SAT (possible range 200-800)
  HIGHGRADE = Highest grade of completed schooling
  YEARBORN  = Year born
  SEX       = 1 if female and 0 if male

Use the two pages of output provided at the end of this exam on: critical t values, summary statistics, correlations and regressions.
Part of the following questions involves deciding which output to use.
You can use the output that gets the correct answer in the quickest possible way.

1. (a) Do there appear to be any unusual values taken by any of the variables? Explain.

   (b) From the output, is the SATMATH score approximately normally distributed? Explain.

   (c) Give a 95% confidence interval for the population mean salary.

   (d) Perform a test at significance level .05 of the claim that the population mean score on the Math component of the SAT exceeds 500. State clearly the null and alternative hypotheses of your test, and your conclusion.
   (e) If we regressed SALARY on just one of SATMATH, SATVERB, HIGHGRADE, YEARBORN and SEX, which of these variables would best explain SALARY? Explain.

2. In this question the regression studied is a linear regression of SALARY on SATMATH.

   (a) According to the regression results, by how much does salary change if the score on the Math component of the SAT increases by 100 points?

   (b) Give a 95 percent confidence interval for the population slope parameter.

   (c) Test the hypothesis at significance level 5% that the population slope coefficient equals 60. State clearly the null and alternative hypotheses in terms of population parameters and your conclusion.

   (d) Predict the conditional mean salary for a person with score 600 on SATMATH.

   (e) Give a 95 percent confidence interval for the conditional mean salary for a person with score 600 on SATMATH. Give your answer as an expression involving numbers only, though you need not complete all the calculations.

   (f) Give the Stata commands that enable estimation of a quadratic model of SALARY as a function of SATMATH.

3. In this question consider both the regressions where SALARY is the dependent variable.

   (a) Do any of the coefficients in the larger model have unexpected sign?

   (b) What is the impact on salary of being female?

   (c) Are SATMATH, SATVERB, HIGHGRADE, YEARBORN and SEX jointly statistically significant at 5 percent? State clearly the null and alternative hypotheses of your test, and your conclusion.

   (d) Using measures of goodness-of-fit, which model explains the data better: the multivariate regression or the bivariate regression? Explain your answer.

   (e) Are SATVERB, HIGHGRADE, YEARBORN and SEX jointly statistically significant at 5 percent? Perform an appropriate test. State clearly the null and alternative hypotheses of your test, and your conclusion. You can use as critical value 2.38.
   (f) If predicting the actual value of SALARY from this regression, what is the minimum possible width of a 95% confidence interval? Hint: A quite brief answer is possible.

4. In this question consider the regression where LNSALARY is the dependent variable.

   (a) What is the impact on level of salary (not log of salary) of being female?

   (b) Provide a meaningful interpretation of the estimated coefficient for HIGHGRADE.

   (c) Suppose we wish to replace HIGHGRADE with the following indicator variables
         DLOW    = 1 if HIGHGRADE < 12 and DLOW    = 0 otherwise
         DMEDIUM = 1 if HIGHGRADE = 12 and DMEDIUM = 0 otherwise
         DHIGH   = 1 if HIGHGRADE > 12 and DHIGH   = 0 otherwise
       Do you see any problems in giving the following Stata command?
         regress LNSALARY DLOW DMEDIUM DHIGH

   (d) Given the regression output provided, do you prefer the regression with LNSALARY as dependent variable or the regression with SALARY as the dependent variable? Explain.

5. This question has various unrelated parts.

   (a) What is created by the Stata command
         generate y = x[_n-1]

   (b) What is created by the Stata command
         scatter y x || lfit y x

   (c) A Census of a country finds that it has 100,000,000 people with mean age 25 and standard deviation of age equal to 20. We obtain 400 random samples of size 100 from this population and calculate the mean age in each sample. What do you expect will be the mean, standard deviation and distribution of these 400 means?

   (d) For the Cobb-Douglas production function $Q = aK^b L^c$, state how you would use a regression to test constant returns to scale.

   (e) Suppose X = 10 with probability 0.5 and X = 20 with probability 0.5. What is the variance of X? Show all workings.

Multiple choice questions (1 point each)

1. Consider a sample of size 3 that takes values 1, 2 and 3. The sample standard deviation equals
   a. 1
   b. 2
   c. 3
   d. 4
   e. none of the above

2. Let $\hat{y}_i = b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki}$. Then the ordinary least squares estimator minimizes
   a. $\sum_{i=1}^n (\hat{y}_i - \bar{y})^2$
   b. $\sum_{i=1}^n (y_i - \bar{y})^2$
   c. $\sum_{i=1}^n (y_i - \hat{y}_i)^2$
   d. none of the above

3. Multivariate regression analysis of the California Academic Performance Index reveals that school scores are determined
   a. primarily by the level of teacher credentials
   b. primarily by educational attainment of students' parents
   c. the two are roughly equally important
   d. neither of these provides much explanation.

4. The estimated standard deviation of the slope coefficient is called
   a. the standard error of the regression
   b. the root mean squared error of the error
   c. both a. and b.
   d. neither a. nor b.

(For questions 5-6): Statistical inference is based on assumptions including
   1. The population model is $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \varepsilon$.
   2. The error has mean zero and is not correlated with the regressor.
   3. The errors for different observations have the same variance, denoted $\sigma^2$.
   4. The errors for different observations are uncorrelated.
   5. The sample size $n \to \infty$.

5. Which of these assumptions are essential for the OLS estimator to be unbiased
   a. assumptions 1 to 2
   b. assumptions 1 to 3
   c. assumptions 1 to 4
   d. assumptions 1 to 5.

6. Which of these assumptions ensure that the usual t tests are valid
   a. assumptions 1 to 2
   b. assumptions 1 to 3
   c. assumptions 1 to 4
   d. assumptions 1 to 5.

7. The p-value in a t-test of statistical significance of a regressor is a measure of
   a. the probability of $|T|$ being no greater than the observed $|t|$, under the null hypothesis
   b. the probability of $|T|$ being at least as great as the observed $|t|$, under the null hypothesis
   c. the probability of $|T|$ being no greater than the observed $|t|$, under the alternative hypothesis
   d. the probability of $|T|$ being at least as great as the observed $|t|$, under the alternative hypothesis.

8. In linear OLS regression a major problem arises if
   a. important regressors are omitted
   b. unnecessary (or irrelevant) regressors are included
   c. neither a. nor b.
   d. both a. and b.

Cameron: Department of Economics, U.C.-Davis

SOME USEFUL FORMULAS

Univariate Data

$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$ and $s_x^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2$

$\bar{x} \pm t_{\alpha/2;n-1} \times (s_x / \sqrt{n})$ and $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

ttail(df, t) = $\Pr[T > t]$ where $T \sim t(df)$

$t_{\alpha/2}$ such that $\Pr[|T| > t_{\alpha/2}] = \alpha$ is calculated using invttail(df, $\alpha/2$).

Bivariate Data

$r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \times \sum_{i=1}^n (y_i - \bar{y})^2}} = \frac{s_{xy}}{s_x s_y}$   [Here $s_{xx} = s_x^2$ and $s_{yy} = s_y^2$.]

$\hat{y} = b_1 + b_2 x_i$, $b_2 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$, $b_1 = \bar{y} - b_2 \bar{x}$

TSS $= \sum_{i=1}^n (y_i - \bar{y})^2$, ResidualSS $= \sum_{i=1}^n (y_i - \hat{y}_i)^2$, Explained SS = TSS - Residual SS

$R^2 = 1 - $ ResidualSS/TSS

$b_2 \pm t_{\alpha/2;n-2} \times s_{b_2}$

$t = \frac{b_2 - \beta_{20}}{s_{b_2}}$, $s_{b_2}^2 = \frac{s_e^2}{\sum_{i=1}^n (x_i - \bar{x})^2}$, $s_e^2 = \frac{1}{n-2} \sum_{i=1}^n (y_i - \hat{y}_i)^2$

$y | x = x^* \in b_1 + b_2 x^* \pm t_{\alpha/2;n-2} \times s_e \times \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} + 1}$

$E[y | x = x^*] \in b_1 + b_2 x^* \pm t_{\alpha/2;n-2} \times s_e \times \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

Multivariate Data

$\hat{y} = b_1 + b_2 x_{2i} + \cdots + b_k x_{ki}$

$R^2 = 1 - $ ResidualSS/TSS and $\bar{R}^2 = R^2 - \frac{k-1}{n-k} (1 - R^2)$

$b_j \pm t_{\alpha/2;n-k} \times s_{b_j}$ and $t = \frac{b_j - \beta_{j0}}{s_{b_j}}$

$F = \frac{R^2 / (k-1)}{(1 - R^2) / (n-k)} \sim F(k-1, n-k)$

$F = \frac{(\text{ResSS}_r - \text{ResSS}_u) / (k-g)}{\text{ResSS}_u / (n-k)} \sim F(k-g, n-k)$

Ftail(df1, df2, f) = $\Pr[F > f]$ where F is F(df1, df2) distributed.

$F_\alpha$ such that $\Pr[F > f_\alpha] = \alpha$ is calculated using invFtail(df1, df2, $\alpha$).

Critical t values

    v      t_.05,v      t_.025,v     t_.005,v
  875    1.6465969    1.9626788    2.5814598
  874    1.6465989    1.962682     2.5814662
  873    1.6466009    1.9626851    2.5814727
  872    1.6466029    1.9626882    2.5814792
  871    1.6466049    1.9626913    2.5814857
  870    1.646607     1.9626945    2.5814922

. summarize SALARY SATMATH SATVERB HIGHGRADE YEARBORN SEX

    Variable |   Obs        Mean    Std. Dev.     Min       Max
   ----------+--------------------------------------------------
      SALARY |   876    36766.98    24595.69      300    130254
     SATMATH |   876    543.8356    110.9271      250       750
     SATVERB |   876    543.9498    107.2137      250       750
   HIGHGRADE |   876    15.80251    2.109254        9        20
    YEARBORN |   876    1982.975    .8123669     1982      1984
         SEX |   876    .5216895    .4998147        0         1

. summarize SATMATH, detail

                  CVC_SAT_MATH_SCORE_2007
   ---------------------------------------------------------
         Percentiles    Smallest
    1%       250           250
    5%       350           250
   10%       450           250       Obs                 876
   25%       450           250       Sum of Wgt.         876

   50%       550                     Mean           543.8356
                         Largest     Std. Dev.      110.9271
   75%       650           750
   90%       650           750       Variance       12304.81
   95%       750           750       Skewness      -.1044574
   99%       750           750       Kurtosis       2.742132

. correlate SALARY SATMATH SATVERB HIGHGRADE YEARBORN SEX
(obs=876)

             |   SALARY  SATMATH  SATVERB HIGHGR~E YEARBORN      SEX
   ----------+------------------------------------------------------
      SALARY |   1.0000
     SATMATH |   0.2184   1.0000
     SATVERB |   0.1043   0.6167   1.0000
   HIGHGRADE |   0.1415   0.3138   0.3101   1.0000
    YEARBORN |  -0.1077   0.0706   0.0822  -0.0182   1.0000
         SEX |  -0.0844  -0.1213  -0.0349   0.0556  -0.0240   1.0000

. regress SALARY SATMATH

      Source |        SS     df          MS        Number of obs =     876
   ----------+--------------------------------     F(1, 874)     =   43.77
       Model | 2.5246e+10     1   2.5246e+10       Prob > F      =  0.0000
    Residual | 5.0408e+11   874    576754596       R-squared     =  0.0477
   ----------+--------------------------------     Adj R-squared =  0.0466
       Total | 5.2933e+11   875    604947912       Root MSE      =   24016

      SALARY |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
   ----------+--------------------------------------------------------------
     SATMATH |  48.42325   7.319038    6.62   0.000    34.05831    62.78819
       _cons |  10432.69   4062.217    2.57   0.010    2459.853    18405.53

. regress SALARY SATMATH SATVERB HIGHGRADE YEARBORN SEX

      Source |        SS     df          MS        Number of obs =     876
   ----------+--------------------------------     F(5, 870)     =   13.99
       Model | 3.9390e+10     5   7.8781e+09       Prob > F      =  0.0000
    Residual | 4.8994e+11   870    563148341       R-squared     =  0.0744
   ----------+--------------------------------     Adj R-squared =  0.0691
       Total | 5.2933e+11   875    604947912       Root MSE      =   23731

      SALARY |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
   ----------+--------------------------------------------------------------
     SATMATH |  49.65464   9.410566    5.28   0.000    31.18458    68.12471
     SATVERB | -12.41648   9.645084   -1.29   0.198   -31.34684    6.513868
   HIGHGRADE |  1044.342   407.7865    2.56   0.011    243.9818    1844.702
    YEARBORN | -3604.361   992.5971   -3.63   0.000   -5552.526   -1656.196
         SEX | -3297.171   1626.269   -2.03   0.043    -6489.04   -105.3031
       _cons |   7149091    1968178    3.63   0.000     3286159    1.10e+07

. regress LNSALARY SATMATH SATVERB HIGHGRADE YEARBORN SEX

      Source |        SS     df          MS        Number of obs =     876
   ----------+--------------------------------     F(5, 870)     =    9.48
       Model | 29.9731056     5   5.99462113       Prob > F      =  0.0000
    Residual | 550.006387   870   .632191249       R-squared     =  0.0517
   ----------+--------------------------------     Adj R-squared =  0.0462
       Total | 579.979492   875   .662833706       Root MSE      =   .7951

    LNSALARY |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
   ----------+--------------------------------------------------------------
     SATMATH |  .0011716   .0003153    3.72   0.000    .0005528    .0017905
     SATVERB | -.0001234   .0003232   -0.38   0.703   -.0007577    .0005109
   HIGHGRADE |  .0284078    .013663    2.08   0.038    .0015915     .055224
    YEARBORN | -.1220743   .0332572   -3.67   0.000    -.187348   -.0568006
         SEX | -.0898066   .0544885   -1.65   0.100   -.1967509    .0171377
       _cons |   251.362   65.94429    3.81   0.000    121.9336    380.7905
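
Supplementary note (not part of the original exam). As a rough consistency check between the formula sheet and the regression output, the Python sketch below recomputes the overall F statistic, the adjusted R-squared and a t statistic from the rounded values printed in the two SALARY regressions. The function names are illustrative assumptions, k counts the intercept as on the formula sheet, and because the inputs are rounded, agreement with the printed output should only be expected up to rounding.

```python
# Illustrative sketch only: helper names are hypothetical, and all inputs are
# rounded values copied from the Stata output, so results match up to rounding.

def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F statistic: [R^2/(k-1)] / [(1-R^2)/(n-k)], with k including the intercept."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 from the formula sheet: R^2 - (k-1)/(n-k) * (1 - R^2)."""
    return r2 - (k - 1) / (n - k) * (1.0 - r2)


def t_stat(coef: float, std_err: float, null_value: float = 0.0) -> float:
    """t statistic for H0: beta = null_value."""
    return (coef - null_value) / std_err


if __name__ == "__main__":
    n = 876
    # Bivariate regression of SALARY on SATMATH (k = 2).
    print("F, bivariate:       ", round(overall_f(0.0477, n, 2), 2))
    print("Adj R2, bivariate:  ", round(adjusted_r2(0.0477, n, 2), 4))
    print("t, SATMATH slope:   ", round(t_stat(48.42325, 7.319038), 2))
    # Five-regressor SALARY regression (k = 6).
    print("F, multivariate:    ", round(overall_f(0.0744, n, 6), 2))
    print("Adj R2, multivariate:", round(adjusted_r2(0.0744, n, 6), 4))
```

The same helpers could be pointed at the LNSALARY regression by passing its printed R-squared with k = 6; `t_stat` also accepts a non-zero `null_value` for tests of a specific hypothesized coefficient.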