The Power of
Regression
• Previous Research Literature Claim
• Foreign-owned manufacturing plants have greater
levels of strike activity than domestic plants
• In Canada, strike rates of 25.5% versus 20.3%
• Budd’s Claim
• Foreign-owned plants are larger and located in
strike-prone industries
• Need multivariate regression analysis!
1
The Power of Regression

Dependent Variable: Strike Incidence

                              (1)        (2)        (3)
U.S. Corporate Parent       0.230**    0.201*      0.065
(Canadian Parent omitted)   (0.117)    (0.119)    (0.132)
Number of Employees (1000s)   ---      0.177**    0.094**
                                       (0.019)    (0.020)
Industry Effects?             No         No         Yes
Sample Size                  2,170      2,170      2,170

* Statistically significant at the 0.10 level; ** at the 0.05 level (two-tailed tests).
2
Important Regression
Topics
• Prediction
• Various confidence and prediction intervals
• Diagnostics
• Are assumptions for estimation & testing fulfilled?
• Specifications
• Quadratic terms? Logarithmic dep. vars.?
• Additional hypothesis tests
• Partial F tests
• Dummy dependent variables
• Probit and logit models
3
Confidence Intervals
• The true population [whatever] is within the following interval (1 − α)% of the time:
  Estimate ± t_(α/2) × Standard Error of the Estimate
• Just need
  • Estimate
  • Standard Error
  • Shape / Distribution (including degrees of freedom)
4
Prediction Interval for
New Observation at xp
1. Point Estimate
2. Standard Error
3. Shape
• t distribution with n-k-1 d.f
4. So prediction interval for a new observation is
Siegel, p. 481
5
Prediction Interval for Mean Observations at xp
1. Point Estimate
2. Standard Error
3. Shape
  • t distribution with n-k-1 d.f.
4. So the prediction interval for the mean at xp follows the same pattern (the standard formulas are restated below)
Siegel, p. 483
6
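The interval formulas on these two slides were images and did not survive the transcript. For reference, these are the standard simple-regression (one-x) versions, written in LaTeX; check the notation against Siegel, pp. 481–483, since his presentation may differ slightly.

\[ \hat{y}_p = b_0 + b_1 x_p \]

New observation at $x_p$:
\[ \hat{y}_p \;\pm\; t_{\alpha/2,\,n-2}\, S \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} \]

Mean of observations at $x_p$:
\[ \hat{y}_p \;\pm\; t_{\alpha/2,\,n-2}\, S \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} \]

Here $S$ is the standard error of the regression; with k predictors, the t distribution has n − k − 1 degrees of freedom, as the slides state.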
Earlier Example
Hours of Study (x) and Exam Score (y) Example
1. Find 95% CI for Joe’s exam score (studies for 20 hours)
2. Find 95% CI for mean score for those who studied for 20 hours
(A worked check follows below.)

Regression Statistics
Multiple R        0.770
R Squared         0.594
Adj. R Squared    0.543
Standard Error    10.710
Obs.              10

ANOVA
             df   SS         MS         F        Significance
Regression   1    1340.452   1340.452   11.686   0.009
Residual     8    917.648    114.706
Total        9    2258.100

            Coeff.   Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept   39.401   12.153       3.242    0.012     11.375      67.426
hours       2.122    0.621        3.418    0.009     0.691       3.554

x̄ = 18.80
7
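A minimal computational sketch of both intervals, using only the summary numbers on this slide (Python with scipy). The variable names are mine, and Σ(x − x̄)² is backed out from SE(b1) = S/√SSx rather than read off the slide.

from scipy import stats

b0, b1 = 39.401, 2.122           # intercept and slope
S, n, xbar = 10.710, 10, 18.80   # regression standard error, observations, mean of x
se_b1 = 0.621
SSx = (S / se_b1) ** 2           # sum of squared deviations of x, roughly 297
x_p = 20
y_hat = b0 + b1 * x_p            # point estimate, about 81.8

t_crit = stats.t.ppf(0.975, df=n - 2)                     # n-k-1 = 8 d.f.
se_new = S * (1 + 1/n + (x_p - xbar)**2 / SSx) ** 0.5     # new observation (Joe)
se_mean = S * (1/n + (x_p - xbar)**2 / SSx) ** 0.5        # mean score at x_p = 20

print(f"Joe's score: {y_hat:.1f} +/- {t_crit * se_new:.1f}")   # roughly 82 +/- 26
print(f"Mean score:  {y_hat:.1f} +/- {t_crit * se_mean:.1f}")  # roughly 82 +/- 8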
Diagnostics / Misspecification
• For estimation & testing to be valid…
• y = b0 + b1x1 + b2x2 + … + bkxk + e makes sense
• Errors (ei) are independent
  • of each other
  • of the independent variables
• Homoskedasticity
  • Error variance independent of the independent variables
  • σe² is a constant
  • Var(ei) is not a function of xi (i.e., no heteroskedasticity)
Violations render our inferences invalid and misleading!
8
Common Problems
• Misspecification
• Omitted variable bias
• Nonlinear rather than linear relationship
• Levels, logs, or percent changes?
• Data Problems
• Skewed variables and outliers
• Multicollinearity
• Sample selection (non-random data)
• Missing data
• Problems with residuals (error terms)
• Non-independent errors
• Heteroskedasticity
9
Omitted Variable Bias
• Question 3 from Sample Exam B
wage = 9.05 + 1.39 union
      (1.65)  (0.66)
wage = 9.56 + 1.42 union + 3.87 ability
      (1.49)  (0.56)       (1.56)
wage = -3.03 + 0.60 union + 0.25 revenue
      (0.70)   (0.45)       (0.08)
• H. Farber thinks the average union wage is different from average
nonunion wage because unionized employers are more selective
and hire individuals with higher ability.
• M. Friedman thinks the average union wage is different from the
average nonunion wage because unionized employers have
different levels of revenue per employee.
10
Checking the
Assumptions
• How to check the validity of the assumptions?
• Cynicism, Realism, and Theory
• Robustness Checks
• Check different specifications
• But don’t just choose the best one!
• Automated Variable Selection Methods
• e.g., Stepwise regression (Siegel, p. 547)
• Misspecification and Other Tests
• Examine Diagnostic Plots
11
Diagnostic Plots
[Plot: residuals vs. predicted values] Increasing spread might indicate heteroskedasticity. Try transformations or weighted least squares.
12
Diagnostic Plots
[Plot: residuals vs. predicted values] “Tilt” from outliers might indicate skewness. Try a log transformation.
13
Problematic Outliers
Stock Performance and CEO Golf Handicaps (New York Times, 5-31-98)

Without 7 “Outliers”                 Number of obs = 44
                                     R-squared     = 0.1718
-----------------------------------------------------
stockrating |    Coef.   Std. Err.      t     P>|t|
------------+----------------------------------------
   handicap |   -1.711       .580    -2.95    0.005
      _cons |   73.234      8.992     8.14    0.000
-----------------------------------------------------

With the 7 “Outliers”                Number of obs = 51
                                     R-squared     = 0.0017
-----------------------------------------------------
stockrating |    Coef.   Std. Err.      t     P>|t|
------------+----------------------------------------
   handicap |    -.173       .593    -0.29    0.771
      _cons |   55.137      9.790     5.63    0.000
-----------------------------------------------------
14
Are They Really Outliers??
[Plot: residuals vs. predicted values] The diagnostic plot is OK. BE CAREFUL!
Stock Performance and CEO Golf Handicaps (New York Times, 5-31-98)
15
Diagnostic Plots
[Plot: residuals vs. predicted values] Curvature might indicate nonlinearity. Try a quadratic specification.
16
Diagnostic Plots
[Plot: residuals vs. predicted values] Good diagnostic plot: lacks obvious indications of other problems.
17
Adding Squared (Quadratic) Term
Job Performance regression on Salary (in $1,000s) (Egg Data)

   Source |     SS    df     MS          Number of obs =    576
----------+----------------------       F(2, 573)     = 122.42
    Model | 255.61     2  127.8          Prob > F      = 0.0000
 Residual | 598.22   573  1.044          R-squared     = 0.2994
----------+----------------------       Adj R-squared = 0.2969
    Total | 853.83   575  1.485          Root MSE      = 1.0218

---------------------------------------------------------------
job performance |     Coef.    Std. Err.      t     P>|t|
----------------+----------------------------------------------
         salary |  .0980844    .0260215     3.77    0.000
 salary squared |  -.000337    .0001905    -1.77    0.077
          _cons | -1.720966    .8720358    -1.97    0.049
---------------------------------------------------------------

Salary Squared = Salary² [=salary^2 in Excel]
18
Quadratic Regression
Job perf = -1.72 + 0.098 salary – 0.00034 salary squared
[Plot: Job Performance (0–8) against Annual Salary ($1000s, 30–150), with the fitted quadratic (nonlinear) regression curve]
19
Quadratic Regression
Job perf = -1.72 + 0.098 salary – 0.00034 salary squared
• The effect of salary will eventually turn negative. But where?
• The maximum occurs where salary = −(linear coeff.) / (2 × quadratic coeff.)
  (a worked computation follows below)
[Plot: Job Performance (0–8) against Annual Salary ($1000s, 30–190), with the fitted quadratic peaking and then turning down]
20
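Plugging the estimated coefficients into that expression (my arithmetic, using the slide’s rounded values):

salary* = −(0.098) / (2 × −0.00034) ≈ 144

so predicted job performance peaks at a salary of roughly $144,000–146,000, depending on how much the coefficients are rounded.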
Another Specification
Possibility
• If data are very skewed, can try a log specification
• Can use logs instead of levels for independent
and/or dependent variables
• Note that the interpretation of the coefficients will
change
• Re-familiarize yourself with Siegel, pp. 68-69
21
Quick Note on Logs
• a is the natural logarithm of x if:
  2.71828^a = x, or equivalently, e^a = x
• The natural logarithm is abbreviated “ln”
  • ln(x) = a
• In Excel, use the ln function
• We call this the “log” but don’t use the “log” function!
• Usefulness: spreads out small values and narrows large values, which can reduce skewness (a quick numeric illustration follows below)
22
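A quick numeric illustration of that compression (my numbers, not from the slides): ln(100) ≈ 4.61 while ln(10,000) ≈ 9.21, so a gap of 9,900 in levels shrinks to a gap of about 4.6 in logs, whereas ln(2) − ln(1) ≈ 0.69, so small values keep relatively more separation. This is why the right-skewed earnings distribution on the next slides looks much more symmetric in logs.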
Earnings Distribution
[Histogram: Weekly Earnings from the March 2002 CPS, n=15,000; skewed to the right]
23
Residuals from Levels Regression
[Histogram: residuals from a regression of Weekly Earnings on demographic characteristics; skewed to the right, so use of the t distribution is suspect]
24
Log Earnings Distribution
[Histogram: natural logarithm of Weekly Earnings from the March 2002 CPS, i.e., =ln(weekly earnings); not perfectly symmetrical, but better]
25
Residuals from Log Regression
[Histogram: residuals from a regression of Log Weekly Earnings on demographic characteristics; almost symmetrical, so use of the t distribution is probably OK]
26
Hypothesis Tests
• We’ve been doing hypothesis tests for single coefficients
  • H0: β = 0       reject H0 if |t| > t_(α/2, n-k-1)
  • HA: β ≠ 0
• What about testing more than one coefficient at the same time?
  • e.g., want to see if an entire group of 10 dummy variables for 10 industries should be in the model
• Joint tests can be conducted using partial F tests
27
Partial F Tests
H0: β1 = β2 = β3 = … = βC = 0
HA: at least one βi ≠ 0
• How to test this?
• Consider two regressions
  • One as if H0 is true
    • i.e., β1 = β2 = β3 = … = βC = 0
    • This is a “restricted” (or constrained) model
  • Plus a “full” (or unconstrained) model in which the computer can estimate what it wants for each coefficient
28
Partial F Tests
• Statistically, need to distinguish between
• Full regression “no better” than the restricted
regression
– versus –
• Full regression is “significantly better” than the
restricted regression
• To do this, look at variance of prediction errors
• If this declines significantly, then reject H0
• From ANOVA, we know ratio of two variances has an F
distribution
• So use F test
29
Partial F Tests

F = [(SS_residual, Restricted − SS_residual, Full) / C] / [SS_residual, Full / (n − k − 1)]

• SS_residual = Sum of Squares Residual
• C = # of constraints
• The partial F statistic has C, n-k-1 degrees of freedom
• Reject H0 if F > F_(α, C, n-k-1)
(A small computational sketch follows below.)
30
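A minimal sketch of the same computation in Python (scipy). The helper name and arguments are mine, not part of the slides; scipy is used only for the p-value.

from scipy import stats

def partial_f(ss_res_restricted, ss_res_full, n_constraints, n, k):
    """Partial F test from residual sums of squares (n = obs., k = predictors in the full model)."""
    df_denom = n - k - 1
    f = ((ss_res_restricted - ss_res_full) / n_constraints) / (ss_res_full / df_denom)
    p = stats.f.sf(f, n_constraints, df_denom)   # upper-tail p-value
    return f, p

# Overall-model test for the coal mining example on the next slides:
f, p = partial_f(10_442_702.809, 467_007.875, n_constraints=6, n=47, k=6)
print(round(f, 3), round(p, 4))   # about 142.406, p about 0.0000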
Coal Mining Example (Again)

Regression Statistics
R Squared         0.955
Adj. R Squared    0.949
Standard Error    108.052
Obs.              47

ANOVA
             df   SS             MS            F         Significance
Regression   6    9975694.933    1662615.822   142.406   0.000
Residual     40   467007.875     11675.197
Total        46   10442702.809

            Coeff.     Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept   -168.510   258.819      -0.651   0.519     -691.603    354.583
hours       1.244      0.186        6.565    0.000     0.001       0.002
tons        0.048      0.403        0.119    0.906     -0.001      0.001
unemp       19.618     5.660        3.466    0.001     8.178       31.058
WWII        159.851    78.218       2.044    0.048     1.766       317.935
Act1952     -9.839     100.045      -0.098   0.922     -212.038    192.360
Act1969     -203.010   111.535      -1.820   0.076     -428.431    22.411
31
Minitab Output

Predictor      Coef     StDev      T       P
Constant     -168.5     258.8    -0.65   0.519
hours         1.2235    0.186     6.56   0.000
tons          0.0478    0.403     0.12   0.906
unemp        19.618     5.660     3.47   0.001
WWII        159.85     78.22      2.04   0.048
Act1952      -9.8      100.0     -0.10   0.922
Act1969    -203.0      111.5     -1.82   0.076

S = 108.1    R-Sq = 95.5%    R-Sq(adj) = 94.9%

Analysis of Variance
Source        DF         SS         MS        F       P
Regression     6    9975695    1662616   142.41   0.000
Error         40     467008      11675
Total         46   10442703
32
Is the Overall Model Significant?
H0: β1 = β2 = β3 = … = β6 = 0
HA: at least one βi ≠ 0
• Note: for testing the overall model, C = k
  • i.e., testing all coefficients together
• From the previous slides, we have SS_residual for the “full” (or unconstrained) model
  • SS_residual = 467,007.875
• But what about for the restricted (H0 true) regression?
  • Estimate a constant-only regression
33
Constant-Only Model

Regression Statistics
R Squared         0
Adj. R Squared    0
Standard Error    476.461
Obs.              47

ANOVA
             df   SS             MS           F    Significance
Regression   0    0              0            .    .
Residual     46   10442702.809   227015.278
Total        46   10442702.809

            Coeff.    Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept   671.937   69.499       9.668    0.0000    532.042     811.830
34
Partial F Tests

F = [(10,442,702.809 − 467,007.875) / 6] / [467,007.875 / (47 − 6 − 1)] = 142.406

H0: β1 = β2 = β3 = … = β6 = 0
HA: at least one βi ≠ 0
• Reject H0 if F > F_(α, C, n-k-1) = F_(0.05, 6, 40) = 2.34 (see the note below)
• 142.406 > 2.34, so reject H0. Yes, the overall model is significant
35
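As an alternative to reading the critical value off the table on the next slide, any stats package will supply it; a one-line check in Python, for example:

from scipy import stats
print(stats.f.ppf(0.95, 6, 40))   # about 2.34, the 5% critical value for F with (6, 40) d.f.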
Select F Distribution 5% Critical Values

                       Numerator Degrees of Freedom
Denominator d.f.     1      2      3      4      5      6    …
     1             161    199    216    225    230    234
     2            18.5   19.0   19.2   19.2   19.3   19.3
     3            10.1   9.55   9.28   9.12   9.01   8.94
     8            5.32   4.46   4.07   3.84   3.69   3.58
    10            4.96   4.10   3.71   3.48   3.33   3.22
    11            4.84   3.98   3.59   3.36   3.20   3.09
    12            4.75   3.89   3.49   3.26   3.11   3.00
    18            4.41   3.55   3.16   2.93   2.77   2.66
    40            4.08   3.23   2.84   2.61   2.45   2.34
  1000            3.85   3.00   2.61   2.38   2.22   2.11
     …
36
A Small Shortcut

For the constant-only model, SS_residual = 10,442,702.809, which is simply SS_total in the full model’s ANOVA table (the Total row). So to test the overall model, you don’t need to run a constant-only model.

[Coal mining regression output repeated from the earlier slide: SS_residual = 467,007.875 with 40 df, SS_total = 10,442,702.809, F = 142.406]
37
An Even Better Shortcut

In fact, the F test in the ANOVA table is exactly the test for the overall model being significant (recall Unit 8).

[Coal mining regression output repeated: ANOVA F = 142.406, Significance = 0.000]
38
Testing Any Subset

The partial F test can be used to test any subset of variables. For example:
H0: βWWII = βAct1952 = βAct1969 = 0
HA: at least one βi ≠ 0

[Coal mining regression output repeated: full-model SS_residual = 467,007.875 with 40 df]
39
Restricted Model
Restricted regression with WWII = Act1952 = Act1969 = 0

Regression Statistics
R Squared         0.942
Adj. R Squared    0.938
Standard Error    118.651
Obs.              47

ANOVA
             df   SS             MS            F         Significance
Regression   3    9837344.76     3279114.920   232.923   0.000
Residual     43   605358.049     14078.094
Total        46   10442702.809

            Coeff.    Std. Error   t stat   p value
Intercept   147.821   166.406      0.888    0.379
hours       0.0015    0.0001       20.522   0.000
tons        -0.0008   0.0003       -2.536   0.015
unemp       7.298     4.386        1.664    0.103
40
Partial F Tests

F = [(605,358.049 − 467,007.875) / 3] / [467,007.875 / (47 − 6 − 1)] = 3.950

H0: βWWII = βAct1952 = βAct1969 = 0
HA: at least one βi ≠ 0
• Reject H0 if F > F_(α, C, n-k-1) = F_(0.05, 3, 40) = 2.84
• 3.95 > 2.84, so reject H0. Yes, the subset of three coefficients is jointly significant
41
Regression and Two-Way ANOVA

         Treatments
Blocks    A    B    C
  1      10    9    8
  2      12    6    5
  3      18   15   14
  4      20   18   18
  5       8    7    8

“Stack” the data using dummy variables:

A  B  C  B2  B3  B4  B5   Value
1  0  0   0   0   0   0     10
1  0  0   1   0   0   0     12
1  0  0   0   1   0   0     18
1  0  0   0   0   1   0     20
1  0  0   0   0   0   1      8
0  1  0   0   0   0   0      9
0  1  0   1   0   0   0      6
0  1  0   0   1   0   0     15
0  1  0   0   0   1   0     18
0  1  0   0   0   0   1      7
0  0  1   0   0   0   0      8
…  …
42
Recall Two-Way Results

ANOVA: Two-Factor Without Replication
Source of Variation    SS        df    MS       F        P-value   F crit
Blocks                 312.267   4     78.067   38.711   0.000     3.84
Treatment              26.533    2     13.267   6.579    0.020     4.46
Error                  16.133    8     2.017
Total                  354.933   14
43
Regression and Two-Way ANOVA

   Source |      SS    df      MS        Number of obs =     15
----------+------------------------      F(6, 8)       =  28.00
    Model | 338.800     6  56.467        Prob > F      = 0.0001
 Residual |  16.133     8   2.017        R-squared     = 0.9545
----------+------------------------      Adj R-squared = 0.9205
    Total | 354.933    14  25.352        Root MSE      = 1.4201

--------------------------------------------------------------
treatment |    Coef.  Std. Err.      t    P>|t|  [95% Conf. Int]
----------+---------------------------------------------------
        b |   -2.600      .898    -2.89   0.020   -4.671   -.529
        c |   -3.000      .898    -3.34   0.010   -5.071   -.929
       b2 |   -1.333     1.160    -1.15   0.283   -4.007   1.340
       b3 |    6.667     1.160     5.75   0.000    3.993   9.340
       b4 |    9.667     1.160     8.34   0.000    6.993  12.340
       b5 |   -1.333     1.160    -1.15   0.283   -4.007   1.340
    _cons |   10.867      .970    11.20   0.000    8.630  13.104
--------------------------------------------------------------
44
Regression and Two-Way ANOVA

Regression Excerpt for Full Model
   Source |      SS    df     MS
----------+---------------------
    Model | 338.800     6  56.467
 Residual |  16.133     8   2.017
----------+---------------------
    Total | 354.933    14  25.352

Regression Excerpt for b2 = b3 = … = 0
   Source |      SS    df     MS
----------+---------------------
    Model |  26.533     2  13.267
 Residual | 328.400    12  27.367
----------+---------------------
    Total | 354.933    14  25.352

Regression Excerpt for b = c = 0
   Source |      SS    df     MS
----------+---------------------
    Model | 312.267     4  78.067
 Residual |  42.667    10   4.267
----------+---------------------
    Total | 354.933    14  25.352

Use these SS_residual values to do partial F tests and you will get exactly the same answers as the Two-Way ANOVA tests. (A worked check follows below.)
45
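A quick check of that claim using the excerpts above (my arithmetic):
• Blocks: F = [(328.400 − 16.133) / 4] / (16.133 / 8) = 78.067 / 2.017 = 38.71
• Treatments: F = [(42.667 − 16.133) / 2] / (16.133 / 8) = 13.267 / 2.017 = 6.58
which match the two-way ANOVA F statistics (38.711 and 6.579) up to rounding.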
Select F Distribution 5% Critical Values

                       Numerator Degrees of Freedom
Denominator d.f.     1      2      3      4      5      6      9
     1             161    199    216    225    230    234    241
     2            18.5   19.0   19.2   19.2   19.3   19.3   19.4
     3            10.1   9.55   9.28   9.12   9.01   8.94   8.81
     8            5.32   4.46   4.07   3.84   3.69   3.58   3.39
    10            4.96   4.10   3.71   3.48   3.33   3.22   3.02
    11            4.84   3.98   3.59   3.36   3.20   3.09   2.90
    12            4.75   3.89   3.49   3.26   3.11   3.00   2.80
    18            4.41   3.55   3.16   2.93   2.77   2.66   2.46
    40            4.08   3.23   2.84   2.61   2.45   2.34   2.12
  1000            3.85   3.00   2.61   2.38   2.22   2.11   1.89
     ∞            3.84   3.00   2.60   2.37   2.21   2.10   1.83
     …
46
3 Seconds of Calculus

∂y/∂x = Δy/Δx

∂log(x) ≈ ∂x/x (a change in a log is approximately a percentage change)

∂b0/∂x = 0 if b0 is a constant

∂(b1x)/∂x = b1
47
Regression Coefficients
• y = b0 + b1x (linear form)
  ∂y/∂x = b1
  A 1 unit change in x changes y by b1
• log(y) = b0 + b1x (semi-log form)
  ∂log(y)/∂x = (∂y/y)/∂x = %Δy/∂x = b1
  A 1 unit change in x changes y by b1 (×100) percent
• log(y) = b0 + b1 log(x) (double-log form)
  ∂log(y)/∂log(x) = (∂y/y)/(∂x/x) = %Δy/%Δx = b1
  A 1 percent change in x changes y by b1 percent
48
Log Regression
Coefficients
• wage = 9.05 + 1.39 union
• Predicted wage is $1.39 higher for unionized
workers (on average)
• log(wage) = 2.20 + 0.15 union
• Semi-elasticity
• Predicted wage is approximately 15% higher for
unionized workers (on average)
• log(wage) = 1.61 + 0.30 log(profits)
• Elasticity
• A one percent increase in profits increases
predicted wages by approximately 0.3 percent
49
Multicollinearity

Auto repair records, weight, and engine size

                               Number of obs =      69
                               F(2, 66)      =    6.84
                               Prob > F      =  0.0020
                               R-squared     =  0.1718
                               Adj R-squared =  0.1467
                               Root MSE      =  .91445
----------------------------------------------
 repair |     Coef.   Std. Err.      t    P>|t|
--------+-------------------------------------
 weight |   -.00017     .00038    -0.41   0.685
 engine |   -.00313     .00328    -0.96   0.342
  _cons |   4.50161     .61987     7.26   0.000
----------------------------------------------
50
Multicollinearity
• Two (or more) independent variables are so highly
correlated that a multiple regression can’t disentangle
the unique contributions of each
• Large standard errors and lack of statistical
significance for individual coefficients
• But joint significance
• Identifying multicollinearity
• Some say “rule of thumb |r|>0.70” (or 0.80)
• But better to look at the results (a quick check is sketched below)
• OK for prediction
• Bad for assessing theory
51
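One common way to look is at pairwise correlations and variance inflation factors (VIFs). A minimal sketch with pandas and statsmodels; the DataFrame and its toy values are purely illustrative stand-ins, not the auto-repair data above.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Toy stand-in data with two closely related size measures
df = pd.DataFrame({"weight": [2930, 3350, 2640, 3250, 4080],
                   "engine": [121, 258, 86, 196, 350]})

print(df[["weight", "engine"]].corr())   # |r| near 1 hints at multicollinearity

X = sm.add_constant(df[["weight", "engine"]])
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)   # VIFs well above ~10 are a common rule-of-thumb warning sign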
Prediction With Multicollinearity

• Prediction at the Mean (weight=3019 and engine=197)

Model for prediction   Predicted Repair (Mean)   Lower 95% Limit   Upper 95% Limit
Multiple Regression    3.411                     3.191             3.631
Weight Only            3.412                     3.193             3.632
Engine Only            3.410                     3.192             3.629
52
Dummy Dependent
Variables
• Dummy dependent variables
• y = b0 + b1x1 + … + bkxk + e
• Where y is a {0,1} indicator variable
• Examples
• Do you intend to quit? yes / no
• Did the worker receive training? yes/no
• Do you think the President is doing a good job?
yes/no
• Was there a strike? yes / no
• Did the company go bankrupt? yes/no
53
Linear Probability
Model
• Mathematically / computationally, can estimate a
regression as usual (the monkeys won’t know the
difference)
• This is called a “linear probability model”
• Right-hand side is linear
• And is estimating probabilities
• P(y =1) = b0 + b1x1 + … + bkxk
• b1=0.15 (for example) means that a one unit
change in x1 increases probability that y=1 by
0.15 (fifteen percentage points)
54
Linear Probability Model
• Excel won’t know the difference, but perhaps it should
• Linear probability model problems
  • σe² = P(y=1)[1 − P(y=1)]
  • But P(y=1) = b0 + b1x1 + … + bkxk
  • So σe² is not constant: it depends on the x’s (heteroskedasticity)
  • Predicted probabilities are not bounded by 0,1
  • R² is not an accurate measure of predictive ability
    • Can use a pseudo-R² measure
    • Such as percent correctly predicted
55
Logit Model & Probit Model
• Solution to these problems is to use nonlinear functional forms that bound P(y=1) between 0 and 1
• Logit Model (logistic regression)
  P(y=1) = e^(b0 + b1x1 + b2x2 + … + bkxk + e) / [1 + e^(b0 + b1x1 + b2x2 + … + bkxk + e)]
  Recall, ln(x) = a when e^a = x
  (a numeric illustration follows below)
• Probit Model
  P(y=1) = Φ(b0 + b1x1 + b2x2 + … + bkxk + e)
  • Where Φ is the normal cumulative distribution function
56
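To see how the logit form keeps the probability bounded (my arithmetic, not from the slides): if the index b0 + b1x1 + … equals 0, then P(y=1) = e^0 / (1 + e^0) = 0.5; if the index is 2, P ≈ 7.39 / 8.39 ≈ 0.88; if it is −2, P ≈ 0.12. However large or small the index becomes, P(y=1) stays strictly between 0 and 1.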
Logit Model &
Probit Model
• Nonlinear, so you need a statistical package to do the calculations (a minimal sketch follows below)
• Can do individual (z-tests, not t-tests) and joint
statistical testing as with other regressions
• Also confidence intervals
• Need to convert coefficients to marginal effects for
interpretation
• Should be aware of these models
• Though in many cases, a linear probability model
works just fine
57
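These models are straightforward to run in most packages. A minimal sketch in Python with statsmodels, using simulated stand-in data rather than the FMLA data on the next slide; the variable names and the choice to evaluate marginal effects at the sample means are mine.

import numpy as np
import statsmodels.api as sm

# Simulated stand-in data: y is a 0/1 outcome, two covariates
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(0.5 + 1.0 * x[:, 0] - 0.5 * x[:, 1])))
y = rng.binomial(1, p_true)
X = sm.add_constant(x)

lpm = sm.OLS(y, X).fit()        # linear probability model: plain OLS on the 0/1 outcome
logit = sm.Logit(y, X).fit()    # logistic regression
probit = sm.Probit(y, X).fit()  # probit

print(lpm.params)                                 # LPM coefficients are already marginal effects
print(logit.get_margeff(at="mean").summary())     # convert logit coefficients to marginal effects
print(probit.get_margeff(at="mean").summary())    # same for probit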
Example
• Dep. Var: 1 if you know of the FMLA, 0 otherwise

Probit estimates                        Number of obs =   1189
                                        LR chi2(14)   = 232.39
                                        Prob > chi2   = 0.0000
Log likelihood = -707.94377             Pseudo R2     = 0.1410
------------------------------------------------------------
FMLAknow |    Coef.  Std. Err.      z    P>|z|  [95% Conf. Int]
---------+--------------------------------------------------
   union |     .238       .101    2.35   0.019    .039    .436
     age |    -.002       .018   -0.13   0.897   -.038    .033
   agesq |     .135       .219    0.62   0.536   -.293    .564
nonwhite |    -.571       .098   -5.80   0.000   -.764   -.378
  income |    1.465       .393    3.73   0.000    .696   2.235
incomesq |   -5.854      2.853   -2.05   0.040  -11.45   -.262
   [other controls omitted]
   _cons |   -1.188       .328   -3.62   0.000  -1.831   -.545
------------------------------------------------------------
58
Marginal Effects
• For numerical interpretation / prediction, need to convert coefficients to marginal effects
• Example: Logit Model
  log[ P(y=1) / (1 − P(y=1)) ] = b0 + b1x1 + b2x2 + … + bkxk + e
• So b1 gives the effect on log(•), not on P(y=1)
• Probit is similar
• Can re-arrange to find the effect on P(y=1)
• Usually do this at the sample means
59
Marginal Effects

Probit estimates                        Number of obs =   1189
                                        LR chi2(14)   = 232.39
                                        Prob > chi2   = 0.0000
Log likelihood = -707.94377             Pseudo R2     = 0.1410
------------------------------------------------------------
FMLAknow |    dF/dx  Std. Err.      z    P>|z|  [95% Conf. Int]
---------+--------------------------------------------------
   union |     .095       .040    2.35   0.019    .017    .173
     age |    -.001       .007   -0.13   0.897   -.015    .013
   agesq |     .054       .087    0.62   0.536   -.117    .225
nonwhite |    -.222       .036   -5.80   0.000   -.293   -.151
  income |     .585       .157    3.73   0.000    .278    .891
incomesq |   -2.335      1.138   -2.05   0.040  -4.566   -.105
   [other controls omitted]
------------------------------------------------------------

For numerical interpretation / prediction, need to convert coefficients to marginal effects
60
But Linear Probability Model is OK, Too

                   Probit      Probit      Regression
                   Coeff.      Marginal
Union              0.238       0.095       0.084
                   (0.101)     (0.040)     (0.035)
Nonwhite           -0.571      -0.222      -0.192
                   (0.098)     (0.037)     (0.033)
Income             1.465       0.585       0.442
                   (0.393)     (0.157)     (0.091)
Income Squared     -5.854      -2.335      -1.354
                   (2.853)     (1.138)     (0.316)

So regression is usually OK, but you should still be familiar with logit and probit methods.
61