Economics 102: Analysis of Economic Data
Cameron Fall 2004
Department of Economics, U.C.-Davis
Final Exam (A) Friday December 17
Compulsory. Closed book. Total of 60 points and worth 40% of course grade.
Read each question carefully so that you answer the question asked.
Question scores

Question   1a  1b  1c  1d  1e    2a  2b  2c  2d  2e    3a  3b  3c  3d
Points      2   2   4   4   2     2   2   2   4   2     2   2   2   2

Question   4a  4b  4c  4d    5a  5b  5c    Mult Choice
Points      2   2   2   4     2   2   2    10
Multiple Choice Questions (circle one part)
 1:  a  b  c  d  e         6:  a  b  c  d  e
 2:  a  b  c  d  e         7:  a  b  c  d  e
 3:  a  b  c  d  e         8:  a  b  c  d  e
 4:  a  b  c  d  e         9:  a  b  c  d  e
 5:  a  b  c  d  e        10:  a  b  c  d  e
Short Answer Questions 1-5
Data on Median Housing Rents and related data for 58 California Counties from the 2000 Census.
Dependent Variable
Rent = Monthly rent (median in the county) for Occupied Housing in 2000 in dollars.
Log (Rent) = Natural logarithm of Rent.
Regressors
Value = Value (median in the county) of an Occupied House in the county in 2000 in dollars.
Vacancy Rate = Percentage of Rental Housing Units in the county unoccupied in 2000.
Persons PH = number of persons per household in Occupied Housing in the county
%Renter = Percentage of Occupied Housing in the county that is rental (rather than nonrental)
Use the two pages of output provided at the end of this exam on:
1. Descriptive statistics output for all variables.
2. Correlations.
3. Histogram and scatterplot.
4. Various TINV values.
5. Three regressions.
Part of the following questions involves deciding which output to use. You can use
the output that gets the correct answer in the quickest possible way.
1. (a) Does rent appear to be symmetrically distributed?
Explain your answer giving two different reasons.
(b) On average how many years would a house need to be rented until the accumulated rent
equalled the value of the house?
(c) Give a 99% confidence interval for the population mean rent.
(d) Perform a one-sided test at significance level .05 of the claim that the population mean rent
exceeds $600.
State clearly the null and alternative hypotheses of your test, and your conclusion.
(e) Which variable appears to be most highly correlated with rent? Explain your answer.
2. In this question the regression studied is a linear regression of Rent on Value.
(a) According to the regression results, by how much does rent change in response to a one
thousand dollar increase in house value?
(b) Give a 95 percent confidence interval for the population slope parameter.
(c) Give a 90 percent confidence interval for the population slope parameter.
(d) Test the hypothesis at significance level 5% that a $100,000 increase in house value
is associated with a $100 increase in monthly rent. State clearly the null and alternative
hypotheses in terms of population parameters and your conclusion.
[Hint: You need to think carefully about appropriate units of measurement here].
(e) From Figure 2 output the R-squared is 0.90 (to two decimal places). State how you could have
obtained this from the Figure 1 output.
3. In this question the regression studied is a linear regression of Rent on Value.
(a) Predict the actual rent for house value of $200,000.
(b) Using relevant output show that the sum of squares
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 535.4 \times 10^9$ for house value.
(c) Give a 95 percent confidence interval for actual rent for house value of 200,000 dollars, using the
result given in part (b) (even if your answer in (b) was different).
Give answer as an expression involving numbers only, though you need not complete all the calculations.
(d) Give a 95 percent confidence interval for the conditional mean of actual rent for house value
of 200,000 dollars, using the result given in part (b) (even if your answer in (b) was different).
Give answer as an expression involving numbers only, though you need not complete all the calculations.
4. In this question consider both the regressions where Rent is the dependent variable.
(a) Is the vacancy rate an important determinant of rent? Explain your answer.
(b) Are value, vacancy rate, persons per household and % renter jointly statistically significant at
5 percent? Perform an appropriate test. State clearly the null and alternative hypotheses of your
test, and your conclusion.
(c) Which model explains the data better, the multivariate regression or the bivariate regression,
on the basis of fit of the model? Explain your answer.
(d) Which model explains the data better, the multivariate regression or the bivariate regression,
on the basis of a formal test of statistical significance at 5 percent? Perform an appropriate test.
State clearly the null and alternative hypotheses of your test, and your conclusion.
Note that FINV(.05,3,53)=2.78.
5. In this question consider the regression where Log(Rent) is the dependent variable.
(a) Provide a meaningful interpretation of the estimated coefficient for Value.
(b) Does this regression provide an easy way to estimate the elasticity of house rent with respect
to house value? If yes, give the estimate. If no, give a method that does provide an easy way to
estimate this elasticity.
(c) For the Cobb-Douglas production function $Q = aK^b L^c$, state how you could estimate the
coefficients b and c using the regression command in Excel.
Multiple choice questions (1 point each)
1. The data of questions 1-5 are an example of
a. cross-section data
b. longitudinal data
c. time-series data
d. none of the above.
2. For data that are normally distributed, approximately 95 percent of the observations lie
a. within two means of the standard deviation
b. within two standard deviations of the mean
c. within two means of the variance
d. within two variances of the mean
e. none of the above.
3. Suppose annual GDP data are available. The data are in column A beginning in cell A2. To obtain
the percentage growth rate in GDP one gives the Excel command
a. = 100*(A2-A3)/A2
b. = 100*(A2-A3)/A3
c. = 100*(A3-A2)/A2
d. = 100*(A3-A2)/A3
4. If we take the natural logarithm of GDP then the annual change in log(GDP) measures
a. the annual growth rate in GDP measured as a proportion
b. the annual growth rate in GDP measured as a percentage
c. both a. and b.
d. neither a. nor b.
5. According to assumptions 1 to 5, the population model used in bivariate regression is
a. y given x is normally distributed with mean $b_1 + b_2 x_2$ and variance $s^2$
b. y given x is normally distributed with mean $\beta_1 + \beta_2 x_2$ and variance 0
c. y given x is normally distributed with mean $b_1 + b_2 x_2$ and variance 0
d. y given x is normally distributed with mean $\beta_1 + \beta_2 x_2$ and variance $\sigma^2$
6. Let $\hat{y}_i = b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki}$. Then the ordinary least squares estimator minimizes
a. $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
b. $\sum_{i=1}^{n} (y_i - \bar{y})^2$
c. $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
d. None of the above.
7. In the formula for $R^2$, the quantity $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is called the
a. total sum of squares
b. residual sum of squares
c. regression sum of squares
d. none of the above.
8. Consider regression of hourly wage in dollars on an intercept and an indicator variable for
whether left-handed or right-handed, so w = a + bd where d = 1 if left-handed. If b = 2 then
a. left-handed people on average have hourly wage $2 higher than that for right-handed
b. left-handed people on average have hourly wage $2 lower than that for right-handed
c. neither of the above.
9. Suppose in regression of rent on an intercept and house value using California county-level data
we wish to also control for region. We compute four separate indicator variables for whether or
not the county is north coast, south coast, central valley or eastern mountains. We should
a. include just one of the four indicator variables in the regression
b. include two of the four indicator variables in the regression
c. include three of the four indicator variables in the regression
d. include all four indicator variables in the regression
10. According to the paper by Krueger, in regressions explaining log hourly wage
a. there is no independent effect of computer use
b. there is an independent effect of computer use, but this disappears once we control for other
explanatory variables such as education, region and occupation
c. there is an independent effect of computer use, and this remains once we control for other
explanatory variables such as education, region and occupation
d. none of the above.
SOME USEFUL FORMULAS
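Univariate Data
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$\bar{x} \pm t_{\alpha/2;\,n-1} \times (s/\sqrt{n})$ and $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
TDIST(t, df, 1) = Pr[T > t] and TDIST(t, df, 2) = Pr[|T| > t]
$t_{\alpha/2}$ such that $\Pr[|T| > t_{\alpha/2}] = \alpha$ is calculated using TINV($\alpha$, df).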
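As an illustration only, the univariate formulas above can be sketched in Python. The sample values, the function name `mean_ci_and_t`, and the critical value 2.365 (roughly what TINV(0.05, 7) returns) are assumptions made for this sketch and are not taken from the exam output.

```python
import math

def mean_ci_and_t(xs, mu0, t_crit):
    """Sample mean, sample variance, confidence interval and t statistic.

    t_crit is supplied by the caller (for example, read from a TINV table),
    so no statistical library is needed.
    """
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # s_x^2 with n-1 divisor
    se = math.sqrt(s2 / n)                           # s / sqrt(n)
    ci = (xbar - t_crit * se, xbar + t_crit * se)    # xbar +/- t * s/sqrt(n)
    t = (xbar - mu0) / se                            # test statistic for H0: mu = mu0
    return xbar, s2, ci, t

# Hypothetical sample of n = 8 observations (not exam data).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
xbar, s2, ci, t = mean_ci_and_t(sample, mu0=4.0, t_crit=2.365)
print(xbar, s2, ci, t)
```

Passing the critical value in as an argument mirrors the exam setup, where the required TINV values are provided in the output rather than computed.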
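Bivariate Data
$r_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \dfrac{s_{xy}}{\sqrt{s_{xx} s_{yy}}}$
$\hat{y} = b_1 + b_2 x_i$, $b_2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $b_1 = \bar{y} - b_2 \bar{x}$
TSS $= \sum_{i=1}^{n} (y_i - \bar{y})^2$, ErrorSS $= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, RegSS = TSS - ErrorSS
$R^2 = 1 - \text{ErrorSS}/\text{TSS}$
$b_2 \pm t_{\alpha/2;\,n-2} \times s_{b_2}$ and $t = \dfrac{b_2 - \beta_{20}}{s_{b_2}}$
$s_{b_2}^2 = \dfrac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ and $s_e^2 = \dfrac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$y \mid x = x^* \in b_1 + b_2 x^* \pm t_{\alpha/2;\,n-2} \times s_e \times \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} + 1}$
$E[y \mid x = x^*] \in b_1 + b_2 x^* \pm t_{\alpha/2;\,n-2} \times s_e \times \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$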
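The bivariate formulas above can be sketched as follows. This is a minimal illustration under stated assumptions: the five data points, the function names `bivariate_ols` and `intervals_at`, and the critical value 3.182 (roughly TINV(0.05, 3)) are hypothetical and do not come from the exam output.

```python
import math

def bivariate_ols(xs, ys):
    """Least-squares fit of y on an intercept and x, following the formula sheet."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b2 = sxy / sxx                       # slope
    b1 = ybar - b2 * xbar                # intercept
    tss = sum((y - ybar) ** 2 for y in ys)
    error_ss = sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(error_ss / (n - 2))   # standard error of the regression
    return {"n": n, "xbar": xbar, "sxx": sxx, "b1": b1, "b2": b2,
            "r2": 1 - error_ss / tss, "se": se, "sb2": se / math.sqrt(sxx)}

def intervals_at(fit, x_star, t_crit):
    """Intervals for the conditional mean and for the actual value at x = x_star."""
    centre = fit["b1"] + fit["b2"] * x_star
    core = 1 / fit["n"] + (x_star - fit["xbar"]) ** 2 / fit["sxx"]
    half_mean = t_crit * fit["se"] * math.sqrt(core)        # E[y | x = x*]
    half_actual = t_crit * fit["se"] * math.sqrt(core + 1)  # y | x = x*
    return ((centre - half_mean, centre + half_mean),
            (centre - half_actual, centre + half_actual))

# Hypothetical data (not the county rent data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
fit = bivariate_ols(xs, ys)
mean_ci, actual_ci = intervals_at(fit, x_star=4.0, t_crit=3.182)
print(fit["b1"], fit["b2"], fit["r2"], mean_ci, actual_ci)
```

The only difference between the two intervals is the extra "+ 1" under the square root, which is why the interval for the actual value is always wider than the interval for the conditional mean.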
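Multivariate Data
$\hat{y} = b_1 + b_2 x_{2i} + \cdots + b_k x_{ki}$
$R^2 = 1 - \text{ErrorSS}/\text{TSS}$ and $\bar{R}^2 = R^2 - \dfrac{k-1}{n-k} (1 - R^2)$
$b_j \pm t_{\alpha/2;\,n-k} \times s_{b_j}$ and $t = \dfrac{b_j - \beta_{j0}}{s_{b_j}}$
$F = \dfrac{R^2/(k-1)}{(1 - R^2)/(n-k)}$ and $F = \dfrac{(SSE_r - SSE_u)/(k-g)}{SSE_u/(n-k)}$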
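A short sketch of the multivariate fit and F formulas above follows. It assumes that k counts the regressors including the intercept and that g counts the regressors in the restricted model, which is one reading of the formula sheet. The function names and the two sums of squared errors are hypothetical; n = 58, the rounded R-squared of 0.90 and the critical value 2.78 are the figures quoted in the exam text.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: R^2 - (k-1)/(n-k) * (1 - R^2)."""
    return r2 - (k - 1) / (n - k) * (1 - r2)

def f_overall(r2, n, k):
    """F statistic for joint significance of all slope coefficients."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

def f_subset(sse_r, sse_u, n, k, g):
    """F statistic comparing a restricted model (g regressors) with an unrestricted one (k)."""
    return ((sse_r - sse_u) / (k - g)) / (sse_u / (n - k))

n = 58               # number of California counties in the exam data
r2_bivariate = 0.90  # rounded R-squared quoted in question 2(e)

overall = f_overall(r2_bivariate, n, k=2)
adj = adjusted_r2(r2_bivariate, n, k=2)

# Hypothetical sums of squared errors, chosen only to show the mechanics.
f_stat = f_subset(sse_r=120.0, sse_u=100.0, n=n, k=5, g=2)
reject = f_stat > 2.78  # compare with FINV(.05, 3, 53) = 2.78 from the exam
print(overall, adj, f_stat, reject)
```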
Figure 1 Descriptive statistics, correlations, graphs, TINV values
Figure 2 Regression Output