Section 3
Probit and Logit Models
1
Dichotomous Data
• Suppose the data are discrete and there are
only 2 possible outcomes
• Examples
– Graduate high school or not
– Patient dies or not
– Working or not
– Smoker or not
• In data, yi=1 if yes, yi =0 if no
2
How to model the data generating
process?
• There are only two outcomes
• Research question: What factors impact
whether the event occurs?
• To answer this, we will model the probability
that the outcome occurs
• Pr(Yi=1) when yi=1 or
• Pr(Yi=0) = 1- Pr(Yi=1) when yi=0
3
• Think of the problem from a MLE
perspective
• Likelihood for i’th observation
• Li = Pr(Yi=1)^Yi [1 - Pr(Yi=1)]^(1-Yi)
• When yi=1, only relevant part is Pr(Yi=1)
• When yi=0, only relevant part is [1 - Pr(Yi=1)]
4
• ln L = Σi ln[Li]
= Σi {yi ln[Pr(yi=1)] + (1-yi)ln[Pr(yi=0)] }
• Notice that up to this point, the model is
generic. The log likelihood function will be
determined by the assumptions
concerning how we model Pr(yi=1)
5
Modeling the probability
• There is some process (biological, social,
decision theoretic, etc) that determines
the outcome y
• Some of the variables impacting the outcome
are observed, some are not
• Requires that we model how these
factors impact the probabilities
• Model from a ‘latent variable’
perspective
6
• Consider a woman’s decision to work
• yi* = the person’s net benefit to work
• Two components of yi*
– Characteristics that we can measure
• Education, age, income of spouse, prices of child
care
– Some we cannot measure
• How much you like spending time with your kids
• how much you like/hate your job
7
• We aggregate these two components into one equation
• yi* = β0 + x1i β1+ x2i β2+… xki βk+ εi
= xi β + εi
• xi β (measurable characteristics but with uncertain weights)
• εi random unmeasured characteristics
• Decision rule: person will work if yi* > 0
(if net benefits are positive)
yi=1 if yi*>0
yi=0 if yi*≤0
8
• yi=1 if yi*>0
• yi* = xi β + εi > 0 only if
• εi > - xi β
• yi=0 if yi*≤0
• yi* = xi β + εi ≤ 0 only if
• εi ≤ - xi β
9
• Suppose xi β is ‘big.’
– High wages
– Low husband’s income
– Low cost of child care
• We would expect this person to work,
UNLESS, there is some unmeasured
‘variable’ that counteracts this
10
• Suppose a mom really likes spending
time with her kids, or she hates her job.
• Then the unmeasured component εi is a
big negative number
• If we observe her working, εi must not
have been too negative, since
• yi=1 only if εi > - xi β
11
• Consider the opposite. Suppose we
observe someone NOT working.
• Then εi must not have been big, since
• yi=0 if εi ≤ - xi β
12
Logit
• Recall yi =1 if εi > - xi β
• Assume εi follows a logistic distribution
with CDF F
• Pr(εi > - xi β) = 1 – F(- xi β)
• The logistic is also a symmetric
distribution, so
• 1 – F(- xi β) = F(xi β) = exp(xi β)/(1+exp(xi β))
13
• So when εi has a logistic distribution
• Pr(yi =1) = exp(xi β)/(1+exp(xi β))
• Pr(yi=0) = 1/(1+exp(xi β))
14
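One algebra step not shown on the slides: plugging these probabilities into the generic log likelihood from slide 5 gives the logit log likelihood in closed form,
ln L = Σi { yi ln[Pr(yi=1)] + (1-yi) ln[Pr(yi=0)] } = Σi { yi (xi β) – ln[1 + exp(xi β)] },
which is the function the logit routine maximizes over β.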
Example: Workplace smoking
bans
• Smoking supplements to 1991 and 1993
National Health Interview Survey
• Asked all respondents whether they
currently smoke
• Asked workers about workplace tobacco
policies
• Sample: workers
• Key variables: current smoking and
whether the worker faced a workplace ban
15
• Data: workplace1.dta
• Sample program: workplace1.doc
• Results: workplace1.log
16
Description of variables in data
. desc;

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------
smoker          byte   %9.0g                  is current smoking
worka           byte   %9.0g                  has workplace smoking bans
age             byte   %9.0g                  age in years
male            byte   %9.0g                  male
black           byte   %9.0g                  black
hispanic        byte   %9.0g                  hispanic
incomel         float  %9.0g                  log income
hsgrad          byte   %9.0g                  is hs graduate
somecol         byte   %9.0g                  has some college
college         float  %9.0g
-------------------------------------------------------------------------
17
Summary statistics
. sum;

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
      smoker |     16258      .25163     .433963          0          1
       worka |     16258    .6851396    .4644745          0          1
         age |     16258    38.54742    11.96189         18         87
        male |     16258    .3947595     .488814          0          1
       black |     16258    .1119449    .3153083          0          1
-------------+----------------------------------------------------------
    hispanic |     16258    .0607086    .2388023          0          1
     incomel |     16258    10.42097    .7624525   6.214608   11.22524
      hsgrad |     16258    .3355271    .4721889          0          1
     somecol |     16258    .2685447    .4432161          0          1
     college |     16258    .3293763    .4700012          0          1
18
Running a probit
• probit smoker age incomel male black
hispanic hsgrad somecol college worka;
• The first variable after ‘probit’ is the
discrete outcome; the rest of the
variables are the independent variables
• A constant is included by default
19
Running a logit
• logit smoker age incomel male black
hispanic hsgrad somecol college worka;
• Same as probit, just change the first
word
20
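As a quick check on the slide 13–14 formulas, here is a minimal Stata sketch (not part of the course programs; written without the trailing semicolons used in the slides) that rebuilds the logit probabilities by hand from the linear index, using the variable names in workplace1.dta:

  * fit the logit, then recover Pr(smoker=1) two ways
  logit smoker age incomel male black hispanic hsgrad somecol college worka
  predict p_builtin, pr               /* Stata's predicted probability */
  predict xbhat, xb                   /* the linear index xi*b */
  gen p_byhand = invlogit(xbhat)      /* exp(xb)/(1+exp(xb)) */
  sum p_builtin p_byhand              /* the two variables should be identical */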
Running linear probability
• reg smoker age incomel male black hispanic
hsgrad somecol college worka, robust;
• A simple OLS regression
• The usual OLS standard errors are incorrect
because the error is heteroskedastic by construction
• The robust option produces standard errors
that are valid under an arbitrary form of heteroskedasticity
21
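A sketch (again, not from the course programs) for fitting all three estimators on the same sample and lining up their coefficients; note that raw probit/logit coefficients are not directly comparable to the LPM slopes until they are converted to marginal effects (see slide 44):

  probit smoker age incomel male black hispanic hsgrad somecol college worka
  estimates store prob1
  logit smoker age incomel male black hispanic hsgrad somecol college worka
  estimates store logit1
  reg smoker age incomel male black hispanic hsgrad somecol college worka, robust
  estimates store lpm1
  estimates table prob1 logit1 lpm1, b se    /* coefficients and std. errors side by side */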
Probit Results
Probit estimates                                  Number of obs   =      16258
                                                  LR chi2(9)      =     819.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -8761.7208                       Pseudo R2       =     0.0447

------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0012684   .0009316    -1.36   0.173    -.0030943    .0005574
     incomel |   -.092812   .0151496    -6.13   0.000    -.1225047   -.0631193
        male |   .0533213   .0229297     2.33   0.020     .0083799    .0982627
       black |  -.1060518    .034918    -3.04   0.002      -.17449   -.0376137
    hispanic |  -.2281468   .0475128    -4.80   0.000    -.3212701   -.1350235
      hsgrad |  -.1748765   .0436392    -4.01   0.000    -.2604078   -.0893453
     somecol |   -.363869   .0451757    -8.05   0.000    -.4524118   -.2753262
     college |  -.7689528   .0466418   -16.49   0.000     -.860369   -.6775366
       worka |  -.2093287   .0231425    -9.05   0.000    -.2546873   -.1639702
       _cons |    .870543    .154056     5.65   0.000     .5685989    1.172487
------------------------------------------------------------------------------
22
How to measure fit?
• Regression (OLS)
– minimize sum of squared errors
– Or, maximize R2
– The model is designed to maximize predictive
capacity
• Not the case with Probit/Logit
– MLE models pick distribution parameters so as
to best describe the data generating process
– May or may not ‘predict’ the outcome well
23
Pseudo R2
• LLk log likelihood with all variables
• LL1 log likelihood with only a constant
• 0 > LLk > LL1 so | LLk | < |LL1|
• Pseudo R2 = 1 - |LLk|/|LL1| = 1 - LLk/LL1
• Bounded between 0 and 1
• Not anything like the R2 from a linear
regression
24
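After probit or logit, Stata stores both log likelihoods, so the pseudo R2 on the printout can be reproduced by hand (a sketch; e(ll) is the full-model log likelihood and e(ll_0) the constant-only log likelihood):

  probit smoker age incomel male black hispanic hsgrad somecol college worka
  display "LLk = " e(ll) "   LL1 = " e(ll_0)
  display "pseudo R2 = " 1 - e(ll)/e(ll_0)    /* matches the 0.0447 on slide 22 */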
Predicting Y
• Let b be the estimated value of β
• For any candidate vector of xi , we can predict
probabilities, Pi
• Pi = Φ(xi b)
• Once you have Pi, pick a threshold value, T, so
that you predict
• Yp = 1 if Pi > T
• Yp = 0 if Pi ≤ T
• Then compare predicted and actual outcomes: the fraction correctly predicted
25
• Question: what value to pick for T?
• Can pick .5
– Intuitive. More likely to engage in the
activity than to not engage in it
– However, when ȳ (the sample fraction with yi=1)
is small, this criterion does a poor job of predicting Yi=1
– Likewise, when ȳ is close to 1, this
criterion does a poor job of predicting Yi=0
26
• *predict probability of smoking;
• predict pred_prob_smoke;
• * get detailed descriptive data about predicted
prob;
• sum pred_prob, detail;
• * predict binary outcome with 50% cutoff;
• gen pred_smoke1=pred_prob_smoke>=.5;
• label variable pred_smoke1 "predicted smoking, 50%
cutoff";
• * compare actual values;
• tab smoker pred_smoke1, row col cell;
27
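One extra line (a sketch, not in the course program) turns the 2x2 table that follows into a single "fraction correctly predicted" number:

  gen correct = (pred_smoke1 == smoker)
  sum correct      /* the mean is the fraction correctly predicted (74.9%, see slide 31) */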
. sum pred_prob, detail;

                         Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs                16258
25%     .1620851       .0733495       Sum of Wgt.        16258

50%     .2569962                      Mean            .2516653
                        Largest       Std. Dev.       .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance        .0092161
95%     .4039573       .5684112       Skewness        .1520254
99%     .4672697       .6203823       Kurtosis        2.149247
28
• Notice two things
– The sample mean of the predicted probabilities
is close to the sample mean of the outcome
– 99% of the predicted probabilities are less than .5,
so we should predict few smokers if we use a 50%
cutoff
29
           | predicted smoking,
is current |     50% cutoff
   smoking |         0          1 |     Total
-----------+----------------------+----------
         0 |    12,153         14 |    12,167
           |     99.88       0.12 |    100.00
           |     74.93      35.90 |     74.84
           |     74.75       0.09 |     74.84
-----------+----------------------+----------
         1 |     4,066         25 |     4,091
           |     99.39       0.61 |    100.00
           |     25.07      64.10 |     25.16
           |     25.01       0.15 |     25.16
-----------+----------------------+----------
     Total |    16,219         39 |    16,258
           |     99.76       0.24 |    100.00
           |    100.00     100.00 |    100.00
           |     99.76       0.24 |    100.00
30
• Check the on-diagonal elements.
• The last number in each cell of the 2x2 table is
the fraction of all observations in that cell
• The model correctly predicts 74.75 +
0.15 = 74.90% of the obs
• It only predicts a small fraction of
smokers
31
• Do not be amazed by the 75 percent
correct prediction rate
• If you said everyone has a ȳ chance of
smoking (a case of no covariates), you
would be correct Max[ȳ, (1-ȳ)] percent of
the time
32
• In this case, 25.16% smoke.
• If everyone had the same chance of
smoking, we would assign everyone
Pr(y=1) = .2516
• We would then be correct for the 1 - .2516 =
.7484 fraction of people who do not smoke
33
Key points about prediction
• MLE models are not designed to
maximize prediction
• Should not be surprised they do not
predict well
• As a result, the percent correctly predicted is
not a particularly good measure of model quality
34
Translating coefficients in probit:
Continuous Covariates
• Pr(yi=1) = Φ[β0 + x1i β1+ x2i β2+… xki βk]
• Suppose that x1i is a continuous variable
• d Pr(yi=1) /d x1i = ?
• What is the change in the probability of
the event given a change in x1i?
35
Marginal Effect
• d Pr(yi=1) /d x1i
• = β1 φ[β0 + x1i β1+ x2i β2+… xki βk]
• Notice two things: the marginal effect is a
function of the other parameters and of the
values of all the x's at which it is evaluated.
36
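A sketch of this formula computed directly for one continuous variable (log income), evaluated at every observation rather than at the sample means that mfx uses below; normalden() is Stata's standard normal density φ:

  probit smoker age incomel male black hispanic hsgrad somecol college worka
  predict xbhat, xb                               /* the index b0 + x1*b1 + ... */
  gen me_incomel = _b[incomel]*normalden(xbhat)   /* beta1 * phi(xb) */
  sum me_incomel                                  /* average marginal effect of log income */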
Translating Coefficients:
Discrete Covariates
• Pr(yi=1) = Φ[β0 + x1i β1+ x2i β2+… xki βk]
• Suppose that x2i is a dummy variable (1
if yes, 0 if no)
• A marginal effect makes no sense here: we cannot
change x2i by a small amount. It is either
1 or 0.
• Redefine the variable of interest.
Compare outcomes with and without x2i
37
• y1 = Pr(yi=1 | x2i=1)
= Φ[β0 + x1iβ1+ β2 + x3iβ3 +… ]
• y0 = Pr(yi=1 | x2i=0)
= Φ[β0 + x1iβ1+ x3iβ3 … ]
Marginal effect = y1 – y0:
the difference in probabilities with and
without x2i
38
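A sketch of the same idea in Stata for the dummy worka: set it to 1 and then to 0 for everyone, predict both probabilities, and take the difference (mfx instead evaluates this change at the sample means of the other X's):

  probit smoker age incomel male black hispanic hsgrad somecol college worka
  preserve
  replace worka = 1
  predict p1                  /* Pr(smoke) if everyone faced a ban */
  replace worka = 0
  predict p0                  /* Pr(smoke) if no one faced a ban */
  gen diff_worka = p1 - p0
  sum diff_worka              /* average change in Pr(smoke) from the ban */
  restore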
In STATA
• For marginal effects of continuous
variables, STATA evaluates them at the
sample means of the X's
• For changes in probabilities for dichotomous
covariates, STATA also evaluates them at the
sample means of the X's
39
STATA command for Marginal
Effects
• mfx compute;
• Must be run after the estimation command,
while the estimates are still active in the program.
40
Marginal effects after probit
      y  = Pr(smoker) (predict)
         =  .24093439
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+--------------------------------------------------------------------
     age |   -.0003951      .00029   -1.36   0.173  -.000964  .000174   38.5474
 incomel |   -.0289139      .00472   -6.13   0.000   -.03816 -.019668    10.421
    male*|    .0166757       .0072    2.32   0.021   .002568  .030783    .39476
   black*|   -.0320621      .01023   -3.13   0.002  -.052111 -.012013   .111945
hispanic*|   -.0658551      .01259   -5.23   0.000  -.090536 -.041174   .060709
  hsgrad*|    -.053335      .01302   -4.10   0.000   -.07885  -.02782   .335527
 somecol*|   -.1062358      .01228   -8.65   0.000  -.130308 -.082164   .268545
 college*|   -.2149199      .01146  -18.76   0.000  -.237378 -.192462   .329376
   worka*|   -.0668959      .00756   -8.84   0.000   -.08172 -.052072    .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
41
Interpret results
• A one-unit increase in log income reduces the
probability of smoking by 2.9 percentage points,
so a 10% increase in income reduces it by
roughly 0.3 percentage points
• A 10 year increase in age decreases smoking
rates by about .4 percentage points
• Those with a college degree are 21.5
percentage points less likely to smoke
• Those that face a workplace smoking
ban have 6.7 percentage point lower
probability of smoking
42
• Do not confuse percentage point and
percent differences
– A 6.7 percentage point drop is 29% of the
sample mean of 24 percent.
– Blacks have smoking rates that are 3.2
percentage points lower than others, which
is 13 percent of the sample mean
43
Comparing Marginal Effects
Variable       LP         Probit     Logit
age           -0.00040   -0.00048   -0.00048
incomel       -0.0289    -0.0287    -0.0276
male           0.0167     0.0168     0.0172
black         -0.0321    -0.0357    -0.0342
hispanic      -0.0658    -0.0706    -0.0602
hsgrad        -0.0533    -0.0661    -0.0514
college       -0.2149    -0.2406    -0.2121
worka         -0.0669    -0.0661    -0.0658
44
When will results differ?
• The normal and logistic CDFs look
– Similar near the midpoint of the distribution
– Different in the tails
• You obtain more observations in the
tails of the distribution when
– Sample sizes are large
– ȳ approaches 1 or 0
• These situations will produce larger
differences between the estimates
45
Some nice properties of the Logit
• Outcome, y=1 or 0
• Treatment, x=1 or 0
• Other covariates, z
• Context,
– y = whether a baby is born with a low
birth weight
– x = whether the mom smoked or not during
pregnancy
46
• Risk ratio
RR = Prob(y=1|x=1)/Prob(y=1|x=0)
The ratio of the probabilities of the event
when x does and does not occur
How much does smoking elevate the chance
your child will be a low weight birth?
47
• Let Yyx be the probability y=1 or 0 given
x=1 or 0
• Think of the risk ratio the following way
• Y11 is the probability Y=1 when X=1
• Y10 is the probability Y=1 when X=0
• Y11 = RR*Y10
48
• Odds Ratio
OR=A/B = [Y11/Y01]/[Y10/Y00]
A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)]
= odds of Y occurring if you are a smoker
B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)]
= odds of y happening if you are not a smoker
What are the relative odds of Y happening if you do
or do not experience X
49
• Suppose Pr(Yi =1) = F(βo+ β1Xi + β2Z) and
F is the logistic function
• Can show that
• OR = exp(β1) = e^β1
• This number is typically reported by
most statistical packages
50
• Details
• Y11 = exp(βo+ β1 + β2Z) /(1+ exp(βo+ β1+ β2Z) )
• Y10 = exp(βo+ β2Z)/(1+ exp(βo+β2Z))
• Y01 = 1 /(1+ exp(βo+ β1 + β2Z) )
• Y00 = 1/(1+ exp(βo+β2Z))
• [Y11/Y01] = exp(βo+ β1 + β2Z)
• [Y10/Y00] = exp(βo+ β2Z)
• OR=A/B = [Y11/Y01]/[Y10/Y00]
= exp(βo+ β1 + β2Z)/ exp(βo + β2Z)
= exp(β1)
51
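This result can be checked directly in Stata: after the low birth weight logit reported later on slide 61, exponentiating the smoking coefficient reproduces the odds ratio that the logistic command prints (a sketch, not part of the course programs):

  xi: logit lowbw smoked age married i.educ5 i.race4
  display exp(_b[smoked])      /* = 1.96, the odds ratio on slide 64 */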
• Suppose Y is rare, so Pr(Y=1) is close to 0
– Pr(Y=0|X=1) and Pr(Y=0|X=0) are both
close to 1, so they roughly cancel
• Therefore, when Pr(Y=1) is close to 0
– Odds Ratio = Risk Ratio
• Why is this nice?
52
Population attributable risk
• Let π be the fraction of the population with X=1
(e.g., the fraction of moms who smoke) and ȳ the
average outcome in the population
• ȳ = (1-π) Y10 + π Y11 = (1-π)Y10 + π(RR)Y10
• The average outcome is a weighted average of
outcomes for X=0 and X=1
• What would the average outcome be in the
absence of X (e.g., reduce smoking rates to 0)?
• Ya = Y10
53
Population Attributable Risk
• PAR
• Fraction of the outcome attributable to X
• The difference between the current rate
and the rate that would exist without X,
divided by the current rate
• PAR = (ȳ – Ya)/ȳ
= π(RR – 1)/[(1-π) + π RR]
(the Y10 terms cancel from the numerator and denominator)
54
Example: Maternal Smoking and
Low Weight Births
• 6% of births are low weight
– < 2,500 grams (about 5.5 lbs)
– The average birth is about 3,300 grams (7.3 lbs)
• Maternal smoking during pregnancy has
been identified as a key cofactor
– 13% of mothers smoke
– This number was falling by about 1 percentage
point per year during the 1980s/90s
– Smoking roughly doubles the chance of a low weight birth
55
Natality detail data
• Census of all births (4 million/year)
• Annual files starting in the 60s
• Information about
– Baby (birth weight, length, date, sex, plurality,
birth injuries)
– Demographics (age, race, marital, educ of mom)
– Birth (who delivered, method of delivery)
– Health of mom (smoke/drank during preg, weight
gain)
56
• Smoking not available from CA or NY
• ~3 million usable observations
• I pulled a 0.5% random sample from 1995
• About 12,500 obs
• Variables: birthweight (grams), smoked,
married, 4-level race, 5 level education,
mothers age at birth
57
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------
birthw          int    %9.0g                  birth weight in grams
smoked          byte   %9.0g                  =1 if mom smoked during
                                                pregnancy
age             byte   %9.0g                  moms age at birth
married         byte   %9.0g                  =1 if married
race4           byte   %9.0g                  1=white,2=black,3=asian,4=other
educ5           byte   %9.0g                  1=0-8, 2=9-11, 3=12, 4=13-15,
                                                5=16+
visits          byte   %9.0g                  prenatal visits
-------------------------------------------------------------------------
58
     dummy |
 variable, |  =1 if mom smoked
     =1 if |  during pregnancy
   BW<2500 |
     grams |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,626      1,745 |    13,371
           |     86.95      13.05 |    100.00
           |     94.64      89.72 |     93.96
           |     81.70      12.26 |     93.96
-----------+----------------------+----------
         1 |       659        200 |       859
           |     76.72      23.28 |    100.00
           |      5.36      10.28 |      6.04
           |      4.63       1.41 |      6.04
-----------+----------------------+----------
     Total |    12,285      1,945 |    14,230
           |     86.33      13.67 |    100.00
           |    100.00     100.00 |    100.00
           |     86.33      13.67 |    100.00
59
• Notice a few things
– 13.7% of women smoke
– 6% have low weight birth
• Pr(LBW | Smoke) =10.28%
• Pr(LBW |~ Smoke) = 5.36%
• RR
= Pr(LBW | Smoke)/ Pr(LBW |~ Smoke)
= 0.1028/0.0536 = 1.92
60
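The same ratio can be pulled straight from the counts in the table on slide 59 (a quick check):

  display (200/1945)/(659/12285)    /* Pr(LBW|smoke)/Pr(LBW|no smoke) = 1.92 */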
Logit results
Log likelihood = -3136.9912                       Pseudo R2       =     0.0330

------------------------------------------------------------------------------
       lowbw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   .6740651   .0897869     7.51   0.000     .4980861    .8500441
         age |   .0080537    .006791     1.19   0.236    -.0052564    .0213638
     married |  -.3954044   .0882471    -4.48   0.000    -.5683654   -.2224433
   _Ieduc5_2 |  -.1949335   .1626502    -1.20   0.231    -.5137221    .1238551
   _Ieduc5_3 |  -.1925099   .1543239    -1.25   0.212    -.4949791    .1099594
   _Ieduc5_4 |  -.4057382   .1676759    -2.42   0.016    -.7343769   -.0770994
   _Ieduc5_5 |  -.3569715   .1780322    -2.01   0.045    -.7059081   -.0080349
   _Irace4_2 |   .7072894   .0875125     8.08   0.000     .5357681    .8788107
   _Irace4_3 |    .386623    .307062     1.26   0.208    -.2152075    .9884535
   _Irace4_4 |   .3095536   .2047899     1.51   0.131    -.0918271    .7109344
       _cons |  -2.755971   .2104916   -13.09   0.000    -3.168527   -2.343415
------------------------------------------------------------------------------
61
Odds Ratios
• Smoked
– exp(0.674) = 1.96
– Smokers have roughly twice the odds of a low
weight birth
• _Irace4_2 (Blacks)
– exp(0.707) = 2.02
– Blacks have roughly twice the odds of a low
weight birth
62
Asking for odds ratios
• logistic y x1 x2;
• In this case
• xi: logistic lowbw smoked age married i.educ5 i.race4;
63
Log likelihood = -3136.9912                       Pseudo R2       =     0.0330

------------------------------------------------------------------------------
       lowbw | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   1.962198   .1761796     7.51   0.000     1.645569     2.33975
         age |   1.008086   .0068459     1.19   0.236     .9947574    1.021594
     married |   .6734077   .0594262    -4.48   0.000     .5664506    .8005604
   _Ieduc5_2 |   .8228894   .1338431    -1.20   0.231     .5982646    1.131852
   _Ieduc5_3 |   .8248862   .1272996    -1.25   0.212     .6095837    1.116233
   _Ieduc5_4 |   .6664847   .1117534    -2.42   0.016     .4798043    .9257979
   _Ieduc5_5 |   .6997924   .1245856    -2.01   0.045     .4936601    .9919973
   _Irace4_2 |   2.028485   .1775178     8.08   0.000      1.70876    2.408034
   _Irace4_3 |   1.472001   .4519957     1.26   0.208     .8063741    2.687076
   _Irace4_4 |   1.362817   .2790911     1.51   0.131     .9122628    2.035893
------------------------------------------------------------------------------
64
PAR
• PAR = π(RR – 1)/[(1-π) + π RR]
• π = 0.137
• RR = 1.96
• PAR = 0.116
• 11.6% of low weight births attributed to
maternal smoking
65
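The arithmetic as a one-line check in Stata, using the π and RR above:

  display .137*(1.96 - 1)/((1 - .137) + 1.96*.137)    /* = 0.116 */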
Hypothesis Testing in MLE
models
• MLE estimates are asymptotically normally
distributed; this is one of the properties of MLE
• Therefore, standard t-tests of hypotheses
will work as long as samples are ‘large’
• What ‘large’ means is open to question
• What to do when samples are ‘small’ –
we will set that question aside for the moment
66
Testing a linear combination of
parameters
• Suppose you have a probit model
• Φ[β0 + x1iβ1+ x2i β2 + x3iβ3 +… ]
• Test a linear combination of parameters
• Simplest example: test that a subset of the
parameters are zero
• β1 = β2 = β3 = β4 = 0
• To fix the discussion:
• N observations
• K parameters
• J restrictions (count the equals signs; here J=4)
67
Wald Test
• Based on the fact that the parameters
are distributed asymptotically normal
• Probability theory review
– Suppose you have m draws from a standard
normal distribution (zi)
– M = z1^2 + z2^2 + … + zm^2
– M is distributed as a Chi-square with m
degrees of freedom
68
• Wald test constructs a ‘quadratic form’
suggested by the test you want to perform
• This combination, because it contains squares
of the true parameters, should, if the
hypothesis is true, be distributed as a Chi
square with j degrees of freedom.
• If the test statistic is ‘large’, relative to the
degrees of freedom of the test, we reject,
because there is a low probability we would
have drawn that value at random from the
distribution
69
Reading values from a Table
• All stats books report the
‘percentiles’ of a chi-square
– Vertical axis: degrees of freedom
– Horizontal axis: percentiles
– The entry is the value below which that
percentile of the distribution falls
70
• Example: Suppose there are 4 restrictions
• 95% of a chi-square distribution with 4
degrees of freedom falls below 9.488.
• So there is only a 5% chance that a number drawn at
random from it will exceed 9.488
• If your test statistic is below 9.488, you cannot
reject the null
• If your test statistic is above 9.488, reject the null
71
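The 9.488 can also be pulled from Stata instead of a printed table (a sketch):

  display invchi2(4, .95)      /* 95th percentile of a chi-squared with 4 df: 9.49 */
  display chi2tail(4, 9.488)   /* upper-tail probability at 9.488: about .05 */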
Chi-square
Percentiles of the Chi-squared
DOF    0.500    0.750    0.800    0.900    0.950    0.990    0.995
  1    0.455    1.323    1.642    2.706    3.841    6.635    7.879
  2    1.386    2.773    3.219    4.605    5.991    9.210   10.597
  3    2.366    4.108    4.642    6.251    7.815   11.345   12.838
  4    3.357    5.385    5.989    7.779    9.488   13.277   14.860
  5    4.351    6.626    7.289    9.236   11.070   15.086   16.750
  6    5.348    7.841    8.558   10.645   12.592   16.812   18.548
  7    6.346    9.037    9.803   12.017   14.067   18.475   20.278
  8    7.344   10.219   11.030   13.362   15.507   20.090   21.955
  9    8.343   11.389   12.242   14.684   16.919   21.666   23.589
 10    9.342   12.549   13.442   15.987   18.307   23.209   25.188
72
Wald test in STATA
• Default test in MLE models
• Easy to do. Look at program
• test hsgrad somecol college
• Does not estimate the ‘restricted’ model
• ‘Lower power’ than other tests, i.e., a higher
chance of a false negative
73
-2 Log likelihood test
• * how to run the same tests with a -2 log likelihood test;
• * estimate the unrestricted model and save the estimates in urmodel;
• probit smoker age incomel male black hispanic
• hsgrad somecol college worka;
• estimates store urmodel;
• * estimate the restricted model. save results in rmodel;
• probit smoker age incomel male black hispanic worka;
• estimates store rmodel;
• lrtest urmodel rmodel;
74
• I prefer -2 log likelihood test
– Estimates the restricted and unrestricted
model
– Therefore, has more power than a Wald test
• In most cases, they give the same
‘decision’ (reject/not reject)
75