ECO671, Spring 2006, Second homework assignment

Names _______________________________________________________________
ECO671, Spring 2015, Third homework assignment.
Prof. Bill Even
The assignment is due Monday 5/5 by 5 p.m. Submit your assignment via email to [email protected] by the deadline. Name the file hw4_xx_yy, where xx and yy are the uniqueids of the two team members. Late assignments will be penalized at the rate of 20 percentage points for every day (or part thereof) that the assignment is overdue. All team members will receive the same grade unless someone convinces me that I should do otherwise.
Provide a typewritten response to all the questions. Paste the relevant portion of the Stata log (both the Stata commands and the output) beneath the relevant part of each question in this Word document, and then provide a typewritten explanation (or leave adequate space for handwritten explanations beneath the relevant Stata code and results).
Be sure to include enough Stata code that I can determine exactly how you generated your data, variables, and results.
If I am unable to determine what you did, I will assume that it is wrong.
I. Multinomial logit and probit (50 points)
In this problem, you are asked to examine data on travel mode choice between Sydney and Melbourne. This data set is
made available by Stata to illustrate the use of alternative specific multinomial logit and probit models. The variables in
the data are described below:
Dependent variables:
mode: travel mode alternatives (1=air, 2=train, 3=bus, 4=car)
choice: travel mode chosen (1 if mode chosen, 0 otherwise)
Alternative specific variables:
termtime: time spent in the terminal (0 for car)
invehiclecost: in-vehicle cost
traveltime: travel time
travelcost: generalized cost of travel (forgone wages plus explicit travel expense)
Case specific variables:
income: household income
partysize: size of the travelling party
id: person identifier
The data for the first two people in the data are listed below:
id  mode   choice  termtime  invehiclecost  traveltime  travelcost  income  partysize
1   air    0       69        59             100         70          35      1
1   train  0       34        31             372         71          35      1
1   bus    0       35        25             417         70          35      1
1   car    1       0         10             180         30          35      1
2   air    0       64        58             68          68          30      2
2   train  0       44        31             354         84          30      2
2   bus    0       53        25             399         85          30      2
2   car    1       0         11             255         50          30      2
Notice that for each person there are 4 lines of data corresponding to the 4 possible choices. The “case specific” (alternative invariant) variables for a given person do not vary across alternatives (e.g., a person’s income is constant across the 4 travel modes); the alternative specific variables (e.g., travel time) vary across travel modes for each person. The Stata reference manuals for asclogit and asmprobit provide examples of how to estimate multinomial logit/probit models with this type of data; they are available on my website.
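For reference, here is a sketch of the standard alternative specific conditional logit probability that asclogit estimates from data in this layout. The notation is introduced here for illustration only: x_ij collects the alternative specific variables for person i and mode j (termtime, invehiclecost, traveltime, travelcost), z_i collects the case specific variables (income, partysize, and a constant), beta is common to all modes, and alpha_j is mode-specific and set to zero for the base mode.

```latex
P_{ij} = \Pr(\text{person } i \text{ chooses mode } j)
       = \frac{\exp\left(x_{ij}\beta + z_i\alpha_j\right)}
              {\sum_{k \in \{\text{air},\,\text{train},\,\text{bus},\,\text{car}\}} \exp\left(x_{ik}\beta + z_i\alpha_k\right)},
\qquad \alpha_{\text{base}} = 0
```

Each person's four rows supply the x_ik terms in the denominator, which is why the data must be arranged with one line per person and alternative.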
1. Using the asclogit model, estimate the relationship between travel mode choice and all of the control variables listed (alternative specific and case specific). Provide estimates for a model with air as the base outcome and for a model with bus as the base outcome. Output these coefficients using the outreg2 command and paste the results below. Describe how the estimates change when you switch the base group and explain why this happens.
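A small numerical check of what switching the base outcome does in a conditional logit. The coefficient values below are hypothetical (not estimates from the travel data), only travelcost and income enter utility, and the helper names are illustrative. Re-basing subtracts the new base mode's case specific coefficients from every mode, which shifts all four utilities by the same amount and therefore leaves the probabilities unchanged.

```python
import math

MODES = ["air", "train", "bus", "car"]

def probs(cost, income, beta_cost, alpha):
    """Conditional logit probabilities for one person.

    cost: mode -> travelcost (alternative specific)
    income: household income (case specific)
    beta_cost: cost coefficient, common to all modes
    alpha: mode -> (constant, income coefficient); the base mode has (0, 0)
    """
    v = {m: beta_cost * cost[m] + alpha[m][0] + alpha[m][1] * income for m in MODES}
    vmax = max(v.values())  # subtract the largest utility for numerical stability
    e = {m: math.exp(v[m] - vmax) for m in MODES}
    total = sum(e.values())
    return {m: e[m] / total for m in MODES}

def rebase(alpha, base):
    """Express case specific coefficients relative to a new base mode."""
    c0, b0 = alpha[base]
    return {m: (c - c0, b - b0) for m, (c, b) in alpha.items()}

# Hypothetical coefficients with air as the base outcome (illustration only).
alpha_air = {"air": (0.0, 0.0), "train": (1.5, -0.03), "bus": (0.8, -0.02), "car": (-0.5, 0.01)}
alpha_bus = rebase(alpha_air, "bus")

cost = {"air": 70, "train": 71, "bus": 70, "car": 30}  # person 1's travelcost values
p_air_base = probs(cost, 35, -0.02, alpha_air)
p_bus_base = probs(cost, 35, -0.02, alpha_bus)

for m in MODES:
    rebased = tuple(round(x, 4) for x in alpha_bus[m])
    print(f"{m:6s} air base {alpha_air[m]}  bus base {rebased}  "
          f"P = {p_air_base[m]:.4f} vs {p_bus_base[m]:.4f}")
```

With bus as the base, the air coefficients are simply the negatives of the old bus coefficients: only differences between modes are identified.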
2. Using the estimated model, compute the probability that a person with the average characteristics uses each mode of transportation. [See the predict option in the post-estimation commands.] Note that this creates a single variable containing the predicted probabilities for each of the four modes. How does the mean of the predicted probabilities compare with the fraction of workers that choose each mode? Present both in a single table.[1]
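A sketch of the bookkeeping behind this comparison, using the two people listed in the data above with only termtime and travelcost and hypothetical coefficients (not asclogit estimates). It attaches a predicted probability to each of a person's four rows, in the spirit of predict after asclogit, and then averages p and choice by mode, mirroring the table command in footnote 1. With only two people, both of whom chose car, the output illustrates the mechanics rather than any result.

```python
import math
from collections import defaultdict

# (id, mode, choice, termtime, travelcost) for the first two people in the data listing
ROWS = [
    (1, "air", 0, 69, 70), (1, "train", 0, 34, 71), (1, "bus", 0, 35, 70), (1, "car", 1, 0, 30),
    (2, "air", 0, 64, 68), (2, "train", 0, 44, 84), (2, "bus", 0, 53, 85), (2, "car", 1, 0, 50),
]
# Hypothetical coefficients for illustration only; air is the base mode.
B_TERMTIME, B_COST = -0.05, -0.01
CONST = {"air": 0.0, "train": 1.0, "bus": 0.5, "car": -1.0}

def add_predictions(rows):
    """Append each row's predicted probability p, computed within person."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)
    out = []
    for alts in by_id.values():
        v = [CONST[m] + B_TERMTIME * tt + B_COST * cost for _, m, _, tt, cost in alts]
        vmax = max(v)
        e = [math.exp(x - vmax) for x in v]
        s = sum(e)
        out.extend(row + (ei / s,) for row, ei in zip(alts, e))
    return out

def table_by_mode(rows):
    """Mean of p and mean of choice by mode."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for _, m, choice, _, _, p in rows:
        acc[m][0] += p
        acc[m][1] += choice
        acc[m][2] += 1
    return {m: (sp / n, sc / n) for m, (sp, sc, n) in acc.items()}

summary = table_by_mode(add_predictions(ROWS))
for mode, (mean_p, share) in summary.items():
    print(f"{mode:6s} mean p = {mean_p:.3f}   share choosing = {share:.3f}")
```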
3. The margins command is not compatible with asclogit or asmprobit. To compute marginal effects after estimating either model, you must use estat mfx. By default, estat mfx calculates the marginal effect of each control variable on the probability of choosing each of the four modes, evaluated at the mean values of all the control variables. Use the estat mfx command to estimate marginal effects. Based on the results, how would a 10 percent increase in income affect the probability that a person with the average characteristics would choose each mode of travel? Describe how you used the results from estat mfx to derive these estimates.
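One way to turn a marginal effect into the effect of a 10 percent income change is a first-order (linear) approximation around the means. The notation is illustrative: P_j is the probability of mode j at the mean characteristics, and alpha_j is the income coefficient for mode j (zero for the base mode) in the conditional logit.

```latex
\Delta P_j \;\approx\; \frac{\partial P_j}{\partial\,\text{income}} \times \Delta\text{income}
          \;=\; \frac{\partial P_j}{\partial\,\text{income}} \times 0.10 \times \overline{\text{income}},
\qquad
\frac{\partial P_j}{\partial\,\text{income}} \;=\; P_j\Big(\alpha_j - \sum_{k} P_k\,\alpha_k\Big)
```

Because the four probabilities sum to one, the four approximate changes should sum to zero (up to rounding), which is a quick check on the arithmetic.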
4. Define travel cost as the “price” of a mode of travel. Use your asclogit estimates to derive
a. the own-price (travel-cost) elasticity of demand for airline travel (i.e., % change in air demand / % change in air price);
b. the cross-price elasticity of demand between airline and bus travel (i.e., % change in air demand / % change in bus price).
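For a conditional logit with price coefficient beta, the textbook elasticities of P_j with respect to the price of mode k are beta·price_k·(1 − P_k) when j = k and −beta·price_k·P_k when j ≠ k. The sketch below uses hypothetical coefficients, lets only travelcost and mode constants enter utility, and checks both formulas against a finite-difference approximation. The formulas can be evaluated at the mean characteristics or person by person and then averaged; the two need not agree.

```python
import math

MODES = ["air", "train", "bus", "car"]

def probs(price, beta, const):
    """Logit probabilities when utility is const[m] + beta * price[m]."""
    v = {m: const[m] + beta * price[m] for m in MODES}
    vmax = max(v.values())
    e = {m: math.exp(v[m] - vmax) for m in MODES}
    s = sum(e.values())
    return {m: e[m] / s for m in MODES}

def elasticity(price, beta, const, j, k):
    """Analytic elasticity of P_j with respect to the price of mode k."""
    p = probs(price, beta, const)
    if j == k:
        return beta * price[k] * (1 - p[k])
    return -beta * price[k] * p[k]

def numeric_elasticity(price, beta, const, j, k, h=1e-6):
    """Finite-difference check: percent change in P_j per percent change in price_k."""
    p0 = probs(price, beta, const)[j]
    bumped = dict(price)
    bumped[k] = price[k] * (1 + h)
    p1 = probs(bumped, beta, const)[j]
    return ((p1 - p0) / p0) / h

# Hypothetical inputs: person 1's travelcost values and made-up coefficients.
PRICE = {"air": 70, "train": 71, "bus": 70, "car": 30}
BETA = -0.02
CONST = {"air": 0.0, "train": -0.3, "bus": -0.8, "car": 0.2}

own = elasticity(PRICE, BETA, CONST, "air", "air")
cross = elasticity(PRICE, BETA, CONST, "air", "bus")
print(f"own-price (air, air):   {own:.4f}  numeric {numeric_elasticity(PRICE, BETA, CONST, 'air', 'air'):.4f}")
print(f"cross-price (air, bus): {cross:.4f}  numeric {numeric_elasticity(PRICE, BETA, CONST, 'air', 'bus'):.4f}")
```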
5. Suppose that there is a proposal to increase the price (travel cost) of airline travel by 10%. Estimate the effect of such a change on the fraction of people who would choose each of the 4 modes of transportation.[2] Summarize your results in a clearly labeled table.
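The predict-then-reprice approach can be sketched as follows, assuming a logit in which only travelcost and mode constants enter utility (the coefficients are hypothetical and the travelcost values are those of the two people in the data listing). Mean predicted shares are computed at current prices, the air price is raised by 10% for everyone, and the shares are recomputed.

```python
import math

MODES = ["air", "train", "bus", "car"]
# travelcost for the first two people in the data listing
COSTS = [
    {"air": 70, "train": 71, "bus": 70, "car": 30},
    {"air": 68, "train": 84, "bus": 85, "car": 50},
]
BETA_COST = -0.02  # hypothetical cost coefficient
CONST = {"air": 0.0, "train": -0.5, "bus": -1.0, "car": 0.5}  # hypothetical constants

def probs(cost):
    """Logit probabilities for one person given that person's costs."""
    v = {m: CONST[m] + BETA_COST * cost[m] for m in MODES}
    vmax = max(v.values())
    e = {m: math.exp(v[m] - vmax) for m in MODES}
    s = sum(e.values())
    return {m: e[m] / s for m in MODES}

def mean_shares(costs):
    """Average predicted probability of each mode across people."""
    n = len(costs)
    return {m: sum(probs(c)[m] for c in costs) / n for m in MODES}

before = mean_shares(COSTS)
after = mean_shares([{**c, "air": c["air"] * 1.10} for c in COSTS])  # air price +10%

print(f"{'mode':6s} {'before':>8s} {'after':>8s} {'change':>8s}")
for m in MODES:
    print(f"{m:6s} {before[m]:8.4f} {after[m]:8.4f} {after[m] - before[m]:+8.4f}")
```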
6. It is conceivable that the elasticity of demand for a travel mode varies with income. Estimate an asclogit model that allows the effect of price to vary with income. Based on your results, is the demand for a travel mode more or less elastic among high-income households? Explain, and provide a test of the null hypothesis that the own-price elasticity does not vary with income.
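One common way to let the price effect depend on income is to add the interaction travelcost × income to utility; the interaction varies across modes, so it enters as an alternative specific variable. A sketch with illustrative notation: beta_1 and beta_2 are the coefficients on travelcost and on travelcost × income, and P_ij is person i's predicted probability of mode j.

```latex
V_{ij} = \beta_1\,\text{cost}_{ij} + \beta_2\,(\text{cost}_{ij}\times\text{income}_i) + \dots
\quad\Longrightarrow\quad
\eta_{jj,i} = \frac{\partial P_{ij}}{\partial\,\text{cost}_{ij}}\cdot\frac{\text{cost}_{ij}}{P_{ij}}
            = (\beta_1 + \beta_2\,\text{income}_i)\,\text{cost}_{ij}\,(1 - P_{ij})
```

In this specification the income dependence of the price response runs through beta_2, so H0: beta_2 = 0 is one natural way to frame the null. The elasticity can still differ across income groups through cost_ij and P_ij even when beta_2 = 0.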
7. Stratify your sample into a high and low income group according to whether the person’s income is above or below
median income. Use Stata’s lrtest to generate a test of the null hypothesis that the coefficients from the asclogit model
are equal for high and low income workers. Discuss your conclusion.
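The test compares the pooled model (coefficients restricted to be equal across groups) with separate models for the two income groups (unrestricted). If the pooled model has K parameters, the unrestricted version estimates 2K, so under the null the statistic is approximately chi-squared with K degrees of freedom:

```latex
LR = 2\left[\left(\ln L_{\text{high}} + \ln L_{\text{low}}\right) - \ln L_{\text{pooled}}\right]
\;\overset{a}{\sim}\; \chi^2(K)
```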
8. Suppose that the train blows up and is no longer a travel option. Compute the predicted fraction of workers that
would use each of the remaining three travel modes and compare the distribution of travel choices before and after the
train disappears in a clearly labeled table. Provide a brief description of how you derived your new estimates. [Hint:
IIA.]
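Under IIA, removing an alternative leaves the ratio of any two remaining probabilities unchanged, so each remaining probability is simply divided by (1 − P_train). The sketch below uses hypothetical utilities for a single traveller and confirms that this rescaling gives the same answer as re-evaluating the logit on the reduced choice set.

```python
import math

def logit_probs(v):
    """Logit choice probabilities from a dict of representative utilities."""
    vmax = max(v.values())
    e = {m: math.exp(x - vmax) for m, x in v.items()}
    s = sum(e.values())
    return {m: e[m] / s for m in e}

# Hypothetical representative utilities for one traveller (illustration only).
v = {"air": 0.2, "train": -0.1, "bus": -0.6, "car": 0.4}
p_before = logit_probs(v)

# Under IIA, removing train rescales the remaining probabilities by 1 / (1 - P_train) ...
p_rescaled = {m: p / (1 - p_before["train"]) for m, p in p_before.items() if m != "train"}
# ... which matches re-evaluating the logit on the reduced choice set.
p_recomputed = logit_probs({m: x for m, x in v.items() if m != "train"})

for m in p_recomputed:
    print(f"{m:4s} before={p_before[m]:.4f}  rescaled={p_rescaled[m]:.4f}  recomputed={p_recomputed[m]:.4f}")
```

In the sample, the rescaling would be applied person by person and then averaged; rescaling the average shares gives a different number unless everyone has the same train probability.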
9. Unlike the multinomial logit, the multinomial probit (asmprobit in Stata) allows for error terms to be correlated
across choices and can thus relax the IIA assumption. If corr(independent) is the option chosen for estimating the
model, IIA is imposed. If corr(unstructured) is the option chosen, IIA is not imposed. Estimate the same model as in
(1) with independent and unstructured correlation. Use a likelihood ratio test to test whether the IIA assumption is valid
in the asmprobit. Interpret your results.
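lrtest computes this statistic for you; the sketch below only shows the arithmetic. The statistic is LR = 2(ln L_unrestricted − ln L_restricted), compared with a chi-squared distribution whose degrees of freedom equal the difference in the number of estimated parameters. The chi-squared tail probability uses the closed forms for integer degrees of freedom, and the log likelihoods and degrees of freedom in the example are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability Pr(chi2(df) > x) for a positive integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1/2) / (1*3*...*(2i-1))
    total, term = 0.0, math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def lr_test(ll_restricted, ll_unrestricted, df):
    """Likelihood ratio statistic and its asymptotic chi-squared p-value."""
    stat = 2 * (ll_unrestricted - ll_restricted)
    return stat, chi2_sf(stat, df)

# Hypothetical values: restricted = corr(independent), unrestricted = corr(unstructured).
LL_INDEPENDENT, LL_UNSTRUCTURED = -185.0, -178.5
DF = 5  # hypothetical; use the difference in the number of estimated parameters
stat, p = lr_test(LL_INDEPENDENT, LL_UNSTRUCTURED, DF)
print(f"LR = {stat:.2f}, df = {DF}, p-value = {p:.4f}")
```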
10. Use the asmprobit model to repeat the exercise in (2). Copy the table from (2) and add a column showing how the
estimates compare in the logit versus probit model.
[1] An easy way to compare these statistics is table mode, c(mean p mean choice). Notice that this creates the mean of p (your predicted probability) and of choice for each mode in the data set.
[2] The easiest way to do this is to predict the probabilities with the current prices, then raise the air price by 10% for everyone and regenerate the predictions.
II. Heckman sample selection model (50 points).
For this problem, use the following Stata data set constructed from the SCF (Survey of Consumer Finances). The data set is g:\eco\evenwe\eco671\data\scf671_2.dta and combines data from the 7 cross-sectional surveys conducted between 1989 and 2007 (the SCF is conducted every 3 years). The variable you will be examining is the mortgage rate paid by various households. The variables contained in the data set are given below:
mortgage1: has first mortgage dummy
mortrate1: interest rate on first mortgage x 100
year: year of survey
hhage: age of head of household
married: currently married
divorce: currently divorced or separated dummy
risk: attitude toward financial risk (1=willing to take high risk; 4=unwilling to take any risk)
nw: household net worth in $1000s
kids: # of children in family
mort30atloan: 30-year mortgage rate on conventional loans when the loan was originated
lvratio: loan-to-house-value ratio at the time the loan was originated
rincome: real household income
black: dummy for black race
Hispanic: dummy for Hispanic ethnicity
For people without a mortgage (mortgage1=0), no mortgage rate is reported.
11. Estimate a linear regression model of mortgage rates controlling for all of the control variables above with the exception of risk and the number of children. Include year dummies in your regression.
12. Use the Heckit model (see heckman in Stata) with the two-step option to re-estimate the mortgage rate equation. In the sample selection equation, use all the controls that were in the mortgage rate equation except the loan-to-value ratio and mort30atloan (these don’t exist for people without loans). Also, add the number of children and dummy variables for attitudes toward risk to the sample selection equation.
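For reference, the standard Heckman selection model behind heckman with the two-step option, written with illustrative notation: z_i holds the selection-equation controls and x_i the mortgage-rate controls.

```latex
\begin{aligned}
\text{Selection:}\quad & \text{mortgage1}_i = \mathbf{1}\left[z_i\gamma + u_i > 0\right] \\
\text{Outcome:}\quad & \text{mortrate1}_i = x_i\beta + \varepsilon_i \quad \text{(observed only if } \text{mortgage1}_i = 1\text{)} \\
\text{Errors:}\quad & (u_i,\varepsilon_i) \sim \text{bivariate normal},\ \operatorname{Var}(u_i)=1,\ \operatorname{Var}(\varepsilon_i)=\sigma^2,\ \operatorname{Corr}(u_i,\varepsilon_i)=\rho \\
\text{Selected mean:}\quad & E\left[\text{mortrate1}_i \mid x_i, z_i, \text{mortgage1}_i = 1\right] = x_i\beta + \rho\sigma\,\lambda(z_i\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```

The two steps are: (1) a probit of mortgage1 on z_i for the full sample, giving an estimate of gamma; (2) an OLS regression of mortrate1 on x_i and the estimated lambda for mortgage holders, in which the coefficient on lambda estimates rho·sigma. The standard errors printed by a plain second-step regression are not valid, which is one reason to let heckman carry out both steps.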
13. Do the Heckit results suggest positive or negative sample selection? Is sample selection statistically
significant? Provide the basis for your answer.
14. Compare the estimated effect of the household head’s age and income in the OLS versus Heckman
model.
a. What does this tell you about the direction of the sample-selection bias in the OLS estimates?
b. Given the estimated parameters in the Heckit model, is this what you would have predicted? Why?
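Because OLS on mortgage holders omits the lambda term from the selected mean, the usual omitted-variable formula gives the following (a sketch in the notation of the selection model; delta is introduced here for illustration):

```latex
\operatorname*{plim}\ \hat\beta^{\,\text{OLS}} = \beta + \rho\sigma\,\delta,
\qquad
\delta = \text{coefficients from a linear projection of } \lambda(z_i\gamma) \text{ on } x_i \text{ among mortgage holders}
```

Since lambda(c) is decreasing in c, a control that raises the probability of having a mortgage tends to lower lambda, so its element of delta tends to be negative and the OLS bias then has the sign opposite to rho·sigma. This is a heuristic: with correlated controls, the projection coefficients need not follow the signs of gamma.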
15. Compute the predicted mortgage rate for everyone in the sample using:[3]
a. the OLS model;
b. the sample selection model without conditioning on whether the person actually takes a mortgage (the unconditional prediction);
c. the sample selection model conditional on the person taking a mortgage (the conditional prediction).
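The footnote's suggestion for the missing lvratio and mort30atloan values amounts to a simple mean imputation. The sketch below, in plain Python rather than Stata and with hypothetical values, fills missing lvratio values with the mean among households with mortgage1 = 1; the same step would be repeated for mort30atloan.

```python
def impute_with_mortgage_mean(values, has_mortgage):
    """Replace missing values (None) with the mean among households that have a mortgage."""
    observed = [v for v, m in zip(values, has_mortgage) if m == 1 and v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

# Hypothetical example: lvratio is missing (None) for households without a mortgage.
mortgage1 = [1, 0, 1, 0, 1]
lvratio = [0.80, None, 0.60, None, 0.70]
lvratio_filled = impute_with_mortgage_mean(lvratio, mortgage1)
print(lvratio_filled)
```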
16. For the group that has a mortgage,
a. how do the OLS predictions and the unconditional predictions of the mortgage rate from the sample selection model compare?
b. how do the conditional and unconditional predictions of the mortgage rate compare?
c. In light of the above comparisons, how might you interpret OLS estimates when there is sample selection? Explain.
[3] To perform this prediction, you will have to replace missing values for lvratio and mort30atloan. I suggest you replace these with the mean values observed for those with a mortgage.
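The three predictions in question 15, written out in the notation of the Heckman selection model (the OLS coefficients come from question 11; beta, gamma and the coefficient on lambda come from the Heckit). In Stata, the xb and ycond options of predict after heckman appear to correspond to (b) and (c); check the heckman postestimation entry for the exact definitions.

```latex
\begin{aligned}
\text{(a) OLS:}\quad & \hat y_i^{\,\text{OLS}} = x_i\hat\beta_{\text{OLS}} \\
\text{(b) unconditional:}\quad & \hat y_i^{\,u} = x_i\hat\beta \\
\text{(c) conditional:}\quad & \hat y_i^{\,c} = x_i\hat\beta + \hat\beta_\lambda\,\lambda(z_i\hat\gamma),
\qquad \hat\beta_\lambda = \widehat{\rho\sigma}
\end{aligned}
```

The gap between (c) and (b) is the lambda coefficient times lambda(z_i gamma_hat).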
17. To investigate how the importance of the sample selection problem differs across households, use the Heckman model results to compute the probability that each person in the sample has a mortgage (see Heckman postestimation to find the appropriate prediction term). Divide the sample into two groups: those with a probability of having a mortgage below .25 and those with a probability above .75. The gap between the conditional and unconditional mortgage rate can be thought of as the “sample selection effect” on mortgage rates. Where is the “sample selection effect” largest? Why should you expect this? Explain.
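Because lambda(c) = phi(c)/Phi(c) falls as the selection probability Phi(c) rises, the gap rho·sigma·lambda(z·gamma) between the conditional and unconditional predictions shrinks as the predicted probability of having a mortgage approaches one. The sketch below recovers z·gamma from a predicted probability with the standard normal inverse CDF (Python's statistics.NormalDist) and evaluates the gap at a few probabilities; the value of rho·sigma is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def inverse_mills_from_prob(p_select):
    """lambda(z*gamma) = phi(z*gamma) / Phi(z*gamma), given Phi(z*gamma) = p_select."""
    index = STD_NORMAL.inv_cdf(p_select)  # recover z*gamma from the predicted probability
    return STD_NORMAL.pdf(index) / p_select

def selection_effect(p_select, rho_sigma):
    """Conditional minus unconditional prediction: rho*sigma * lambda(z*gamma)."""
    return rho_sigma * inverse_mills_from_prob(p_select)

RHO_SIGMA = 0.5  # hypothetical coefficient on lambda, not an estimate from the SCF data
for p in (0.10, 0.20, 0.50, 0.80, 0.90):
    print(f"Pr(mortgage) = {p:.2f}   lambda = {inverse_mills_from_prob(p):.3f}   "
          f"gap = {selection_effect(p, RHO_SIGMA):.3f}")
```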