Economics 102: Analysis of Economic Data
Cameron Fall 2012
Department of Economics, U.C.-Davis
Final Exam (A) Wednesday December 12
Compulsory. Closed book. Total of 60 points and worth 45% of course grade.
Read each question carefully so that you answer the question asked.
Question scores
Question   1a 1b 1c 1d 1e      2a 2b 2c 2d 2e 2f      3a 3b 3c 3d 3e 3f
Points      1  2  3  3  1       1  2  3  1  3  2       1  1  2  2  4  2

Question   4a 4b 4c 4d         5a 5b 5c 5d 5e         Mult Choice
Points      2  2  2  2          1  2  3  2  2         8
Multiple Choice Questions (circle one part)
1:  a  b  c  d  e        5:  a  b  c  d  e
2:  a  b  c  d  e        6:  a  b  c  d  e
3:  a  b  c  d  e        7:  a  b  c  d  e
4:  a  b  c  d  e        8:  a  b  c  d  e
Questions 1-4
Consider data on salary, SAT test scores, and individual characteristics for 876 people from the
2010 round of the representative sample of the National Longitudinal Survey of Youth 1997.
Dependent Variable
SALARY = Annual salary in dollars.
LNSALARY = Natural logarithm of SALARY.
Regressors
SATMATH = Highest score on Math portion of SAT (possible range 200-800)
SATVERB = Highest score on Verbal portion of SAT (possible range 200-800)
HIGHGRADE = Highest grade of completed schooling
YEARBORN = Year born
SEX = 1 if female and 0 if male
Use the two pages of output provided at the end of this exam on:
Critical T values, summary statistics, correlations and regressions.
Part of the following questions involves deciding which output to use.
You can use the output that gets the correct answer in the quickest possible way.
1.(a) Do there appear to be any unusual values taken by any of the variables? Explain.
(b) From the output, is the SATMATH score approximately normally distributed? Explain.
(c) Give a 95% confidence interval for the population mean salary.
(d) Perform a test at significance level .05 of the claim that the population mean score on the
Math component of the SAT exceeds 500.
State clearly the null and alternative hypotheses of your test, and your conclusion.
(e) If we regressed SALARY on just one of SATMATH, SATVERB, HIGHGRADE, YEARBORN
and SEX, which of these variables would best explain SALARY? Explain.
2. In this question the regression studied is a linear regression of SALARY on SATMATH.
(a) According to the regression results, by how much does salary change if the score on the
Math component of the SAT increases by 100 points?
(b) Give a 95 percent confidence interval for the population slope parameter.
(c) Test the hypothesis at significance level 5% that the population slope coefficient equals 60.
State clearly the null and alternative hypotheses in terms of population parameters and your
conclusion.
(d) Predict the conditional mean salary for a person with score 600 on SATMATH.
(e) Give a 95 percent confidence interval for the conditional mean salary for a person with score
600 on SATMATH.
Give your answer as an expression involving numbers only, though you need not complete all the
calculations.
(f) Give the Stata commands that enable estimation of a quadratic model of SALARY as a
function of SATMATH.
3. In this question consider both the regressions where SALARY is the dependent variable.
(a) Do any of the coefficients in the larger model have an unexpected sign?
(b) What is the impact on salary of being female?
(c) Are SATMATH, SATVERB, HIGHGRADE, YEARBORN and SEX jointly statistically significant
at 5 percent? State clearly the null and alternative hypotheses of your test, and your
conclusion.
(d) Using measures of goodness-of-fit, which model explains the data better - the multivariate
regression or the bivariate regression? Explain your answer.
(e) Are SATVERB, HIGHGRADE, YEARBORN and SEX jointly statistically significant at 5
percent? Perform an appropriate test. State clearly the null and alternative hypotheses of your
test, and your conclusion. You can use as critical value 2.38.
(f) If predicting the actual value of SALARY from this regression, what is the minimum possible
width of a 95% confidence interval? Hint: A quite brief answer is possible.
4. In this question consider the regression where LNSALARY is the dependent variable.
(a) What is the impact on level of salary (not log of salary) of being female?
(b) Provide a meaningful interpretation of the estimated coefficient for HIGHGRADE.
(c) Suppose we wish to replace HIGHGRADE with the following indicator variables
DLOW = 1 if HIGHGRADE < 12 and DLOW = 0 otherwise
DMEDIUM = 1 if HIGHGRADE = 12 and DMEDIUM = 0 otherwise
DHIGH = 1 if HIGHGRADE > 12 and DHIGH = 0 otherwise
Do you see any problems in giving the following Stata command?
regress LNSALARY DLOW DMEDIUM DHIGH
(d) Given the regression output provided, do you prefer the regression with LNSALARY as dependent variable or the regression with SALARY as the dependent variable? Explain.
5. This question has various unrelated parts.
(a) What is created by the Stata command generate y = x[_n-1] ?
(b) What is created by the Stata command scatter y x || lfit y x ?
(c) A census of a country finds that it has 100,000,000 people with mean age 25 and standard
deviation of age equal to 20.
We obtain 400 random samples of size 100 from this population and calculate the mean age
in each sample.
What do you expect will be the mean, standard deviation and distribution of these 400 means?
(d) For the Cobb-Douglas production function $Q = aK^{b}L^{c}$, state how you would use a regression
to test constant returns to scale.
(e) Suppose X = 10 with probability 0.5 and X = 20 with probability 0.5. What is the variance
of X? Show all workings.
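The arithmetic behind parts (c) and (e) can be sketched in a few lines of Python. This is an illustration only, not an official answer key: it uses the standard definitions of the mean and variance of a discrete random variable and the usual result that a sample mean has standard deviation equal to the population standard deviation divided by the square root of the sample size. All variable names are made up for this sketch.

```python
import math

# Part (e): discrete distribution given in the question, value -> probability.
dist = {10: 0.5, 20: 0.5}

# E[X] = sum of x * Pr[X = x]
mean_x = sum(x * p for x, p in dist.items())

# Var[X] = sum of (x - E[X])^2 * Pr[X = x]
var_x = sum((x - mean_x) ** 2 * p for x, p in dist.items())

# Part (c): population mean 25, population standard deviation 20,
# samples of size 100. The sample means are centred on the population mean,
# with standard deviation sigma / sqrt(n).
pop_mean, pop_sd, sample_size = 25, 20, 100
sd_of_sample_means = pop_sd / math.sqrt(sample_size)

print("E[X] =", mean_x)
print("Var[X] =", var_x)
print("mean of sample means is about", pop_mean)
print("sd of sample means is about", sd_of_sample_means)
```

With samples of size 100, the central limit theorem suggests that the 400 sample means should be approximately normally distributed around the population mean.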
Multiple choice questions (1 point each)
1. Consider a sample of size 3 that takes values 1, 2 and 3. The sample standard deviation equals
a. 1
b. 2
c. 3
d. 4
e. none of the above
2. Let $\hat{y}_i = b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki}$. Then the ordinary least squares estimator minimizes
a. $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
b. $\sum_{i=1}^{n} (y_i - \bar{y})^2$
c. $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
d. none of the above
3. Multivariate regression analysis of the California Academic Performance Index reveals that
school scores are determined
a. primarily by the level of teacher credentials
b. primarily by educational attainment of students' parents
c. the two are roughly equally important
d. neither of these provides much explanation.
4. The estimated standard deviation of the slope coefficient is called
a. the standard error of the regression
b. the root mean squared error of the error
c. both a. and b.
d. neither a. nor b.
(For questions 5-6): Statistical inference is based on assumptions including
1. The population model is $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \varepsilon$.
2. The error has mean zero and is not correlated with the regressors.
3. The errors for different observations have the same variance, denoted $\sigma^2$.
4. The errors for different observations are uncorrelated.
5. The sample size $n \to \infty$.
5. Which of these assumptions are essential for the OLS estimator to be unbiased?
a. assumptions 1 to 2
b. assumptions 1 to 3
c. assumptions 1 to 4
d. assumptions 1 to 5.
6. Which of these assumptions ensure that the usual t tests are valid?
a. assumptions 1 to 2
b. assumptions 1 to 3
c. assumptions 1 to 4
d. assumptions 1 to 5.
7. The p-value in a t-test of statistical significance of a regressor is a measure of
a. the probability of |T| being no greater than the observed |t|, under the null hypothesis
b. the probability of |T| being at least as great as the observed |t|, under the null hypothesis
c. the probability of |T| being no greater than the observed |t|, under the alternative hypothesis
d. the probability of |T| being at least as great as the observed |t|, under the alternative hypothesis.
8. In linear OLS regression a major problem arises if
a. important regressors are omitted
b. unnecessary (or irrelevant) regressors are included
c. neither a. nor b.
d. both a. and b.
Cameron: Department of Economics, U.C.-Davis
SOME USEFUL FORMULAS
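Univariate Data
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
$\bar{x} \pm t_{\alpha/2;\,n-1} \times (s_x/\sqrt{n})$ and $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
ttail(df, t) = $\Pr[T > t]$ where $T \sim t(df)$
$t_{\alpha/2}$ such that $\Pr[|T| > t_{\alpha/2}] = \alpha$ is calculated using invttail(df, $\alpha/2$).

Bivariate Data
$r_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \times \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \dfrac{s_{xy}}{s_x s_y}$ [Here $s_{xx} = s_x^2$ and $s_{yy} = s_y^2$].
$\hat{y}_i = b_1 + b_2 x_i$, $b_2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $b_1 = \bar{y} - b_2 \bar{x}$
TSS $= \sum_{i=1}^{n} (y_i - \bar{y})^2$, ResidualSS $= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, Explained SS = TSS - Residual SS
$R^2 = 1 - \text{ResidualSS}/\text{TSS}$
$b_2 \pm t_{\alpha/2;\,n-2} \times s_{b_2}$
$t = \dfrac{b_2 - \beta_{20}}{s_{b_2}}$, $s_{b_2}^2 = \dfrac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $s_e^2 = \dfrac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$y \mid x = x^* \in b_1 + b_2 x^* \pm t_{\alpha/2;\,n-2} \times s_e \times \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} + 1}$
$E[y \mid x = x^*] \in b_1 + b_2 x^* \pm t_{\alpha/2;\,n-2} \times s_e \times \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

Multivariate Data
$\hat{y}_i = b_1 + b_2 x_{2i} + \cdots + b_k x_{ki}$
$R^2 = 1 - \text{ResidualSS}/\text{TSS}$ and $\bar{R}^2 = R^2 - \dfrac{k-1}{n-k}(1 - R^2)$
$b_j \pm t_{\alpha/2;\,n-k} \times s_{b_j}$ and $t = \dfrac{b_j - \beta_{j0}}{s_{b_j}}$
$F = \dfrac{R^2/(k-1)}{(1 - R^2)/(n-k)} \sim F(k-1,\, n-k)$
$F = \dfrac{(\text{ResSS}_r - \text{ResSS}_u)/(k-g)}{\text{ResSS}_u/(n-k)} \sim F(k-g,\, n-k)$
Ftail(df1, df2, f) = $\Pr[F > f]$ where F is F(df1, df2) distributed.
$F_\alpha$ such that $\Pr[F > f_\alpha] = \alpha$ is calculated using invFtail(df1, df2, $\alpha$).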
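As a rough illustration of how the univariate formulas on this sheet are applied, the following Python sketch computes a 95% confidence interval for a mean and a t statistic for a one-sided test. The numbers are copied from the `summarize` output and the critical t values supplied with the exam. The function and variable names are invented for this sketch, and it is not the instructor's solution.

```python
import math

# Values copied from the Stata output pages of the exam.
n = 876
mean_salary, sd_salary = 36766.98, 24595.69
mean_math, sd_math = 543.8356, 110.9271
t_025 = 1.9626788   # t_.025 with 875 df (two-sided 5% critical value)
t_05 = 1.6465969    # t_.05 with 875 df (one-sided 5% critical value)


def mean_ci(xbar, s, n, t_crit):
    """Confidence interval xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def t_stat(xbar, mu0, s, n):
    """t statistic (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))


# 95% confidence interval for the population mean salary.
ci = mean_ci(mean_salary, sd_salary, n, t_025)

# Upper one-tailed test of H0: mu <= 500 against Ha: mu > 500 for SATMATH.
t_math = t_stat(mean_math, 500, sd_math, n)
reject_h0 = t_math > t_05

print("95% CI for mean salary:", ci)
print("t statistic for SATMATH:", t_math, "reject H0:", reject_h0)
```

The one-sided test compares the t statistic with the one-sided critical value `t_05`, whereas the confidence interval uses the two-sided value `t_025`.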
Critical t values

      v     t_.05,v      t_.025,v     t_.005,v
    875    1.6465969    1.9626788    2.5814598
    874    1.6465989    1.962682     2.5814662
    873    1.6466009    1.9626851    2.5814727
    872    1.6466029    1.9626882    2.5814792
    871    1.6466049    1.9626913    2.5814857
    870    1.646607     1.9626945    2.5814922
. summarize SALARY SATMATH SATVERB HIGHGRADE YEARBORN SEX

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      SALARY |       876    36766.98    24595.69        300     130254
     SATMATH |       876    543.8356    110.9271        250        750
     SATVERB |       876    543.9498    107.2137        250        750
   HIGHGRADE |       876    15.80251    2.109254          9         20
    YEARBORN |       876    1982.975    .8123669       1982       1984
-------------+--------------------------------------------------------
         SEX |       876    .5216895    .4998147          0          1
. summarize SATMATH, detail

                   CVC_SAT_MATH_SCORE_2007
-------------------------------------------------------------
      Percentiles      Smallest
 1%          250            250
 5%          350            250
10%          450            250       Obs                 876
25%          450            250       Sum of Wgt.         876

50%          550                      Mean           543.8356
                        Largest       Std. Dev.      110.9271
75%          650            750
90%          650            750       Variance       12304.81
95%          750            750       Skewness      -.1044574
99%          750            750       Kurtosis       2.742132
. correlate SALARY SATMATH SATVERB HIGHGRADE YEARBORN SEX
(obs=876)

             |   SALARY  SATMATH  SATVERB HIGHGR~E YEARBORN      SEX
-------------+------------------------------------------------------
      SALARY |   1.0000
     SATMATH |   0.2184   1.0000
     SATVERB |   0.1043   0.6167   1.0000
   HIGHGRADE |   0.1415   0.3138   0.3101   1.0000
    YEARBORN |  -0.1077   0.0706   0.0822  -0.0182   1.0000
         SEX |  -0.0844  -0.1213  -0.0349   0.0556  -0.0240   1.0000
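. regress SALARY SATMATH

      Source |       SS       df       MS              Number of obs =     876
-------------+------------------------------           F(  1,   874) =   43.77
       Model |  2.5246e+10     1  2.5246e+10           Prob > F      =  0.0000
    Residual |  5.0408e+11   874   576754596           R-squared     =  0.0477
-------------+------------------------------           Adj R-squared =  0.0466
       Total |  5.2933e+11   875   604947912           Root MSE      =   24016

------------------------------------------------------------------------------
      SALARY |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     SATMATH |   48.42325   7.319038     6.62   0.000     34.05831    62.78819
       _cons |   10432.69   4062.217     2.57   0.010     2459.853    18405.53
------------------------------------------------------------------------------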
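The bivariate formulas from the formula sheet can be checked against this output with a short Python sketch. It is a hedged illustration rather than an answer key: the inputs are copied from the `regress SALARY SATMATH` and `summarize SATMATH, detail` output, the sum of squared deviations of SATMATH is recovered as (n - 1) times the reported variance, and all variable names are invented here.

```python
import math

# Values copied from the Stata output.
n = 876
b1, b2 = 10432.69, 48.42325        # intercept and slope estimates
se_b2 = 7.319038                   # standard error of the slope
s_e = 24016                        # Root MSE (standard error of the regression)
xbar, var_x = 543.8356, 12304.81   # mean and variance of SATMATH
t_crit = 1.962682                  # t_.025 with n - 2 = 874 df

# Change in predicted salary for a 100-point increase in SATMATH.
effect_100 = 100 * b2

# 95% confidence interval for the slope: b2 +/- t * s_b2.
slope_ci = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)

# Two-sided test of H0: beta_2 = 60 against Ha: beta_2 != 60.
t_60 = (b2 - 60) / se_b2
reject_60 = abs(t_60) > t_crit

# Predicted conditional mean at SATMATH = 600 and its 95% interval.
x_star = 600
y_hat = b1 + b2 * x_star
sum_sq_x = (n - 1) * var_x         # sum of (x_i - xbar)^2
half_width = t_crit * s_e * math.sqrt(1 / n + (x_star - xbar) ** 2 / sum_sq_x)
mean_ci = (y_hat - half_width, y_hat + half_width)

print("effect of +100 points:", effect_100)
print("slope CI:", slope_ci)
print("t for beta_2 = 60:", t_60, "reject:", reject_60)
print("prediction at 600:", y_hat, "CI for conditional mean:", mean_ci)
```

The computed slope interval should agree with the `[95% Conf. Interval]` column of the output up to rounding, which is a useful check that the right critical value was used.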
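. regress SALARY SATMATH SATVERB HIGHGRADE YEARBORN SEX

      Source |       SS       df       MS              Number of obs =     876
-------------+------------------------------           F(  5,   870) =   13.99
       Model |  3.9390e+10     5  7.8781e+09           Prob > F      =  0.0000
    Residual |  4.8994e+11   870   563148341           R-squared     =  0.0744
-------------+------------------------------           Adj R-squared =  0.0691
       Total |  5.2933e+11   875   604947912           Root MSE      =   23731

------------------------------------------------------------------------------
      SALARY |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     SATMATH |   49.65464   9.410566     5.28   0.000     31.18458    68.12471
     SATVERB |  -12.41648   9.645084    -1.29   0.198    -31.34684    6.513868
   HIGHGRADE |   1044.342   407.7865     2.56   0.011     243.9818    1844.702
    YEARBORN |  -3604.361   992.5971    -3.63   0.000    -5552.526   -1656.196
         SEX |  -3297.171   1626.269    -2.03   0.043     -6489.04   -105.3031
       _cons |    7149091    1968178     3.63   0.000      3286159    1.10e+07
------------------------------------------------------------------------------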
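The two F statistics and the adjusted R-squared from the formula sheet can be reproduced from the reported sums of squares. The sketch below is an illustration under stated assumptions: the restricted model is taken to be the regression of SALARY on SATMATH alone, the inputs are the rounded values printed in the two SALARY regressions, and the variable names are made up for this example.

```python
# Values copied from the two SALARY regressions in the Stata output.
n = 876
k = 6                     # regressors in the larger model, including the intercept
g = 2                     # regressors in the restricted model (intercept and SATMATH)
r2_u = 0.0744             # R-squared of the larger (unrestricted) model
res_ss_u = 4.8994e11      # residual SS, unrestricted model
res_ss_r = 5.0408e11      # residual SS, restricted model (SALARY on SATMATH)
f_crit_subset = 2.38      # critical value supplied in question 3(e)

# Overall F test that all slope coefficients are zero, computed from R-squared.
f_overall = (r2_u / (k - 1)) / ((1 - r2_u) / (n - k))

# Subset F test that SATVERB, HIGHGRADE, YEARBORN and SEX all have zero coefficients.
f_subset = ((res_ss_r - res_ss_u) / (k - g)) / (res_ss_u / (n - k))
reject_subset = f_subset > f_crit_subset

# Adjusted R-squared of the larger model.
adj_r2_u = r2_u - (k - 1) / (n - k) * (1 - r2_u)

print("overall F:", f_overall)
print("subset F:", f_subset, "reject:", reject_subset)
print("adjusted R-squared:", adj_r2_u)
```

Because the inputs are rounded, the overall F computed this way matches the reported `F(5, 870)` only approximately.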
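. regress LNSALARY SATMATH SATVERB HIGHGRADE YEARBORN SEX

      Source |       SS       df       MS              Number of obs =     876
-------------+------------------------------           F(  5,   870) =    9.48
       Model |  29.9731056     5  5.99462113           Prob > F      =  0.0000
    Residual |  550.006387   870  .632191249           R-squared     =  0.0517
-------------+------------------------------           Adj R-squared =  0.0462
       Total |  579.979492   875  .662833706           Root MSE      =   .7951

------------------------------------------------------------------------------
    LNSALARY |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     SATMATH |   .0011716   .0003153     3.72   0.000     .0005528    .0017905
     SATVERB |  -.0001234   .0003232    -0.38   0.703    -.0007577    .0005109
   HIGHGRADE |   .0284078    .013663     2.08   0.038     .0015915     .055224
    YEARBORN |  -.1220743   .0332572    -3.67   0.000     -.187348   -.0568006
         SEX |  -.0898066   .0544885    -1.65   0.100    -.1967509    .0171377
       _cons |    251.362   65.94429     3.81   0.000     121.9336    380.7905
------------------------------------------------------------------------------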
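Coefficients in a regression with the log of salary as dependent variable are often read as approximate proportionate changes in salary. The sketch below compares that approximation (100 times the coefficient) with the exact transformation 100 x (exp(b) - 1), holding the other regressors fixed. It is an illustration under that standard interpretation, not the instructor's answer, and the helper name `pct_effect` is invented here.

```python
import math

# Coefficients copied from the LNSALARY regression output.
b_sex = -0.0898066
b_highgrade = 0.0284078


def pct_effect(b):
    """Exact percentage change in the level of y for a one-unit change in x
    when the dependent variable is ln(y): 100 * (exp(b) - 1)."""
    return 100 * (math.exp(b) - 1)


# Approximate effects (100 * b) versus exact effects.
approx_sex, exact_sex = 100 * b_sex, pct_effect(b_sex)
approx_grade, exact_grade = 100 * b_highgrade, pct_effect(b_highgrade)

print("female: approx %.2f%%, exact %.2f%%" % (approx_sex, exact_sex))
print("one more grade: approx %.2f%%, exact %.2f%%" % (approx_grade, exact_grade))
```

For coefficients this small the two readings are close, so either gives much the same interpretation.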