Transfer Program Recipiency in the
1997 Cohort of the NLSY
Karima Nagi
Dan Black
University of Chicago & NORC
Four types of transfer payments
• TANF (AFDC), Temporary Assistance for
Needy Families, what is generally referred
to as “welfare”
• Food Stamps
• WIC (Women, Infants, and Children)
• Other transfer payments – primarily SSI,
or Supplemental Security Income, which
may include some “insurance” programs
such as Disability Insurance
Four types of variables
• Status: an indicator for each month since
age 14 of whether or not the household is
participating in the program
• Amount received
• Household member(s) receiving the aid
• Deny: Indicates that the respondent
denies previously reported receipt
Two types of social insurance
• Workers’ compensation (only available in
first two rounds)
• Unemployment Insurance
• Social insurance programs are different
from transfer payments because your
“contributions” or taxes fund the program
• I will focus on Unemployment Insurance
for this talk
Strengths of data
• While these data clearly allow you to track the
incidence of, say, Food Stamp use, other data
sets such as the CPS will give you much larger
and much broader samples
• Our data cover adolescent “participation” in
programs as well as that of young adults, with
reasonably large samples
• Our data allow you to see what fraction of time
respondents have been on a program
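A minimal sketch of the "fraction of time" calculation, assuming the monthly status indicators have already been recoded to 1 (receiving), 0 (not receiving), and None (no data for that month). The function name and the example respondent are hypothetical, not part of the NLSY files.

```python
def fraction_on_program(status):
    """Share of observed months with receipt; None marks a month with no data."""
    observed = [s for s in status if s is not None]
    if not observed:
        return None  # nothing observed, so the share is undefined
    return sum(observed) / len(observed)


# Hypothetical respondent: 12 monthly indicators, last 3 months unobserved.
months = [0, 0, 1, 1, 1, 0, 0, 0, 0, None, None, None]
share = fraction_on_program(months)  # 3 months of receipt out of 9 observed
```

Dropping unobserved months from the denominator is one choice among several; treating them as zeros would understate participation.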
Strengths of data
• Our data allow you to estimate the
duration of the spell of recipiency
• Such duration models are also often
referred to as “hazard models”
• Policy relevance: for a program of a given
size you might want to know whether this is
a large group of individuals with relatively
short stays or a small group with long stays
A gentle but incomplete
introduction to hazard models
• These are discrete-time hazard models because
the data are monthly
• Most standard software is not appropriate
because it assumes underlying data are
continuous (no observations with the same
length of spell)
• With monthly data, you will get a very large
number of observations with the same spell
length
• Stata has an ado file, pgmhaz.ado, that will
do the estimation
A gentle but incomplete
introduction to hazard models
• Basic idea: Given that the spell of transfer
recipiency lasted until time t0, what is the
probability that it ends at time (t0+1)?
• Mathematically, this is just
$$\Pr(t_0 + 1 \mid t_0, X) = \frac{f(t_0 + 1 \mid X)}{1 - F(t_0 \mid X)}$$
• If you have a complete set of conditional
probabilities, you can recover the probability
function
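To make the last point concrete, here is a small sketch of how a full set of monthly conditional exit probabilities pins down the distribution of spell lengths. The hazard values are made up for illustration, and covariates X are ignored.

```python
def pmf_from_hazards(hazards):
    """hazards[k] = Pr(spell ends in month k+1 | it lasted through month k).

    Returns the probability of each spell length and the probability
    that the spell outlasts the last month considered.
    """
    survival = 1.0  # probability the spell is still in progress
    pmf = []
    for h in hazards:
        pmf.append(survival * h)   # survive so far, then exit this month
        survival *= 1.0 - h        # survive this month as well
    return pmf, survival


# Hypothetical hazards for months 1, 2, and 3.
pmf, still_on = pmf_from_hazards([0.5, 0.5, 1.0])
```

Each spell-length probability is the hazard in that month times the probability of having survived every earlier month.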
A gentle but incomplete
introduction to hazard models
• A couple of complications
• First, you may not observe the end of the spell
because of data limitations or other censoring
mechanisms (right-hand censoring)
• Second, you may not observe the beginning of
the spell because of data limitations (left-hand
censoring)
• The first problem is easy to handle (and
pgmhaz.ado does it for you), the second is a lot
harder
A gentle but incomplete
introduction to hazard models
• To estimate these models, you would arrange
the data so that each month is an observation.
Persons whose spells last 3 months will have
3 observations. Persons whose spells last
5 months will have 5 observations
• The dependent variable will be a 0 in each
month until the month they leave, when it
will be a 1
• Recipiency may not be monthly – may be weekly
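The person-month layout described on this slide can be sketched as follows. The function and field names are hypothetical, and the handling of right-censored spells (never assigning a 1) is the usual convention rather than something the slide spells out.

```python
def person_months(person_id, duration, censored):
    """One record per month at risk; exit is 1 only in the month the spell ends."""
    rows = []
    for month in range(1, duration + 1):
        ended_here = (month == duration) and not censored
        rows.append({"id": person_id, "month": month, "exit": int(ended_here)})
    return rows


# Person A leaves in month 3; person B is still on the program when
# observation stops after month 5 (right-censored).
records = person_months("A", 3, censored=False) + person_months("B", 5, censored=True)
exits = [r["exit"] for r in records]
```

Person A contributes three rows ending in a 1, and person B contributes five rows of 0.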
A gentle but incomplete
introduction to hazard models
• Consider the probability we are estimating
$$\Pr(t_0 + 1 \mid t_0, X) = \frac{f(t_0 + 1 \mid X)}{1 - F(t_0 \mid X)}$$
• In month 1, everyone has “survived” to
month 0, so conditioning has no effect and
you use the whole sample
• In month 2, you use only those who survived
month 1 so your sample is conditional
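A sketch of how the conditional sample shrinks month by month, using a simple covariate-free exit rate (exits divided by the number still at risk). The spell data are invented, and a censored spell of length d is assumed to have been observed, without exit, through month d.

```python
def empirical_hazard(spells):
    """spells: list of (duration, censored). Returns {month: exits / at risk}."""
    longest = max(d for d, _ in spells)
    hazard = {}
    for month in range(1, longest + 1):
        # Only spells that lasted at least this long are still in the sample.
        at_risk = sum(1 for d, _ in spells if d >= month)
        exits = sum(1 for d, c in spells if d == month and not c)
        hazard[month] = exits / at_risk
    return hazard


# Hypothetical spells: two end in month 1, one in month 2, one in month 3,
# and one is censored after month 3.
spells = [(1, False), (1, False), (2, False), (3, True), (3, False)]
hz = empirical_hazard(spells)
```

Month 1 uses all five spells, month 2 only the three that survived month 1, and month 3 only the two that survived month 2.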
A gentle but incomplete
introduction to hazard models
• Again, we wish to estimate
$$\Pr(t_0 + 1 \mid t_0, X) = \frac{f(t_0 + 1 \mid X)}{1 - F(t_0 \mid X)}$$
• This begins to look like a logit or probit
problem, and pgmhaz.ado uses a logit-type
model, but constrains the coefficients on
the X’s to be the same across months
• Because the sample is limited to survivors for
months 2 and beyond we may have a problem
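One common way to write a logit-type discrete-time hazard with coefficients held fixed across months is sketched below. The symbols h_t, \alpha_t, and \beta are notation introduced here for illustration; the slides do not give a functional form, and the exact link function used by pgmhaz.ado may differ from this one.

```latex
h_t(X) = \Pr(\text{exit in month } t \mid \text{survived through month } t-1,\ X)
       = \frac{\exp(\alpha_t + X\beta)}{1 + \exp(\alpha_t + X\beta)}
```

Here \alpha_t is a month-specific intercept that lets the baseline exit rate change with spell length, while the single vector \beta is shared by every month.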
A gentle but incomplete
introduction to hazard models
• The surviving sample is selected on survivorship
so any unobserved differences (or
heterogeneity) result in biased estimates
• One approach is to assume that the
heterogeneity results in a gamma mixture model
as proposed by Meyer (and you guessed it,
pgmhaz.ado does this for you)
• More complicated adjustments (Heckman-Singer’s
nonparametric adjustment) are not yet
implemented
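A small numerical illustration of why selection on survivorship matters. It assumes two unobserved types, each with a constant monthly exit probability; the shares and hazards are invented, and this is a two-point mixture for illustration, not the gamma mixture pgmhaz.ado fits.

```python
def aggregate_hazard(shares, hazards, months):
    """Observed exit rate by month when each type has its own constant hazard."""
    surviving = list(shares)  # mass of each type still on the program
    observed = []
    for _ in range(months):
        exits = sum(s * h for s, h in zip(surviving, hazards))
        observed.append(exits / sum(surviving))
        surviving = [s * (1.0 - h) for s, h in zip(surviving, hazards)]
    return observed


# Half the sample exits with probability 0.5 each month, half with 0.1.
rates = aggregate_hazard(shares=[0.5, 0.5], hazards=[0.5, 0.1], months=4)
```

The observed exit rate starts at 0.30 and falls every month, even though nobody's individual hazard changes: fast leavers drop out of the surviving sample first, which looks like duration dependence if the heterogeneity is ignored.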
A gentle but incomplete
introduction to hazard models
• Further reading
– Basic approach
• Meyer, B. D. “Unemployment Insurance and Unemployment
Spells,” Econometrica, August 1990, 58(4), 757-782.
– More complicated heterogeneity adjustment
• Heckman, J. J., and B. Singer. “Econometric Duration
Analysis,” Journal of Econometrics, January/February 1984,
24(1-2), 63-132.
– Left-hand censoring
• Berger, M. C., and D. A. Black. “The Duration of Medicaid
Spells: An Analysis Using Flow and Stock Samples,” Review
of Economics and Statistics, November 1998, 80(4), 667-674.
A very short bit on UI
• UI is an insurance program that covers
only workers with an adequate work
history who neither quit nor were
dismissed for cause
• A UI claim covers one year and allows you
to take full benefits for 6 months
(generally) within the year
• Benefits may be periodically extended
How do you go from the data to
estimation?
• Start with the User Manual!!!
• Begin to pull the data you need, including
covariates
• This is a slow process, but do not rush it.
If you do, you will just have to come back
• Can pull some variables from Harris
School’s (Bob Michael’s) “flat files” at
http://harrisschool.uchicago.edu/Research/faculty_projects/NLSY97_flat_files/
Data to estimation
   Name Tag  Question              Variable Title                                       Year
1  R9147700  UNEMP_STATUS_2002.01  2002 UNEMPLOYMENT INSURANCE: R RECEIVED IN MONTH 01  2004
2  R9147800  UNEMP_STATUS_2002.02  2002 UNEMPLOYMENT INSURANCE: R RECEIVED IN MONTH 02  2004
3  R9147900  UNEMP_STATUS_2002.03  2002 UNEMPLOYMENT INSURANCE: R RECEIVED IN MONTH 03  2004
4  R9148000  UNEMP_STATUS_2002.04  2002 UNEMPLOYMENT INSURANCE: R RECEIVED IN MONTH 04  2004
5  R9148100  UNEMP_STATUS_2002.05  2002 UNEMPLOYMENT INSURANCE: R RECEIVED IN MONTH 05  2004
6  R9148200  UNEMP_STATUS_2002.06  2002 UNEMPLOYMENT INSURANCE: R RECEIVED IN MONTH 06  2004
Data to estimation
• Web Investigator will dutifully give you
8984 observations, one for each public ID
• Most of these are not going to be used.
For instance, in 2002, only 208 report any
spell of unemployment
• You need to remove data with no UI spells,
which is easy in Stata using egen’s
anycount function:
– egen x = anycount(ui*), v(1)
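For readers working outside Stata, the same filtering step can be sketched in plain Python. This assumes the monthly status variables have been read into lists coded 1 for receipt, 0 for no receipt, and None for missing; the ids and values below are hypothetical.

```python
def anycount(values, target=1):
    """Count how many entries equal target, similar in spirit to egen's anycount."""
    return sum(1 for v in values if v == target)


# Hypothetical records: id -> twelve monthly UI status codes.
people = {
    101: [0] * 12,
    102: [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    103: [1] + [None] * 11,
}

# Keep only respondents with at least one month of UI receipt.
with_spell = {pid: row for pid, row in people.items() if anycount(row) > 0}
```

Respondent 101 is dropped, while 102 and 103 are kept for the spell analysis.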
Data to estimation
• Let’s look at some data
• I took the data from 2002 and dropped all
observations that had no unemployment
spells
• I am going to name the variables mxx, where
xx is the number of the month from 01 to
12. This is a terrible convention (I actually
used ui2002xx, but it was hard to display)
Data to estimation
     +-------------------------------------------------------------+
     | m01  m02  m03  m04  m05  m06  m07  m08  m09  m10  m11  m12  |
     |-------------------------------------------------------------|
  1. | No   No   No   No   No   No   Yes  Yes  Yes  No   No   Yes  |
  2. | No   Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes  |
  3. | No   No   No   No   No   No   No   No   No   No   Yes  Yes  |
  4. | No   No   No   No   No   No   No   No   Yes  Yes  Yes  Yes  |
  5. | No   No   No   No   No   No   No   Yes  Yes  No   No   No   |
     |-------------------------------------------------------------|
  6. | No   No   No   No   No   No   No   Yes  No   No   No   No   |
  7. | No   No   No   No   No   No   No   Yes  Yes  Yes  No   No   |
  8. | No   No   No   No   No   No   Yes  Yes  No   No   No   No   |
  9. | Yes  No   No   No   No   No   No   No   No   No   No   No   |
 10. | No   No   Yes  Yes  Yes  Yes  Yes  No   No   No   No   No   |
     +-------------------------------------------------------------+
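A sketch of how spells and their censoring status might be read off one respondent's row. It treats the 12 months of 2002 as the whole observation window, which is a simplifying assumption: in practice you would string several years together before flagging censoring. The function name is hypothetical, and None stands for a missing month.

```python
def find_spells(status):
    """Return (start, end, left_censored, right_censored) for each run of 'Yes'.

    Months are numbered from 1. None marks a month with no data.
    """
    spells, start = [], None
    for i, s in enumerate(status + ["end"], start=1):  # "end" closes the window
        if s == "Yes" and start is None:
            start = i
        elif s != "Yes" and start is not None:
            left = start == 1                  # may have begun before the window
            right = s == "end" or s is None    # window or data ran out mid-spell
            spells.append((start, i - 1, left, right))
            start = None
    return spells


# Rows 1 and 9 of the listing above.
row1 = ["No"] * 6 + ["Yes"] * 3 + ["No"] * 2 + ["Yes"]
row9 = ["Yes"] + ["No"] * 11
```

For row 1 this finds a completed spell in months 7 to 9 and a right-censored spell starting in month 12; for row 9 it finds a one-month spell flagged as possibly left-censored.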
Data to estimation
     +-------------------------------------------------------------+
     | m01  m02  m03  m04  m05  m06  m07  m08  m09  m10  m11  m12  |
     |-------------------------------------------------------------|
 52. | No   No   No   No   No   No   No   No   No   Yes  Yes       |
120. | No   No   Yes  Yes  .    .    .    .    .    .    .    .    |
142. | Yes  .    .    .    .    .    .    .    .    .    .    .    |
153. | Yes  No   No   .    .    .    .    .    .    .    .    .    |
     +-------------------------------------------------------------+
.
Data to estimation
• Need to then keep relevant months and
reshape the data to meet your estimation
needs (see Stata’s reshape command)
• Will need to collect and append any
time-invariant variables – race, sex, ethnicity,
ASVAB score – to each month of the data
• If you believe that calendar month affects
transitions, you need to keep track of time
as well
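The wide-to-long step can be sketched in Python as follows (in Stata, reshape does the equivalent). All names and values here are hypothetical: statuses are assumed coded 1/0 with None for missing, and the covariates are placeholders for race, sex, ASVAB score, and so on.

```python
def reshape_long(wide_rows, covariates, year):
    """Turn one row per person (twelve monthly values) into one row per person-month."""
    long_rows = []
    for pid, months in wide_rows.items():
        for month, status in enumerate(months, start=1):
            if status is None:  # drop months with no data
                continue
            # Keep calendar year and month so calendar effects can be modeled.
            row = {"id": pid, "year": year, "month": month, "ui": status}
            row.update(covariates[pid])  # repeat time-invariant variables
            long_rows.append(row)
    return long_rows


wide = {1: [0, 1, 1] + [0] * 9, 2: [1, 0, 0] + [None] * 9}
covs = {1: {"sex": "F", "asvab": 55.0}, 2: {"sex": "M", "asvab": 40.0}}
long_data = reshape_long(wide, covs, year=2002)
```

Person 1 contributes twelve person-month rows and person 2 contributes three, each carrying that person's time-invariant covariates.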
Further considerations
• You may want some covariates to vary
with time (age, education, number of
children). How will you do this?
• Age is relatively easy as you can increase
it by a month in each successive period
• Education: What if attending school?
• Children: need to be very precise about
timing
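Two of these time-varying covariates can be sketched for a person-month file as follows. Both functions are hypothetical helpers, and counting a child from the month of birth onward is an assumed convention that you would need to settle for your own analysis.

```python
def age_by_month(age_months_at_start, n_months):
    """Age in months for each month of a spell, adding one month per period."""
    return [age_months_at_start + k for k in range(n_months)]


def children_by_month(birth_months, n_months):
    """Number of children in each month, given the spell month each was born in."""
    return [sum(1 for b in birth_months if b <= m) for m in range(1, n_months + 1)]


# Hypothetical respondent: exactly 18 years old in month 1, one child born in month 3.
ages = age_by_month(age_months_at_start=18 * 12, n_months=4)
kids = children_by_month(birth_months=[3], n_months=4)
```

Education is harder to treat this way because enrollment status, not just completed schooling, may matter in any given month.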