Download Document

Discrete and Categorical Data William N. Evans Department of Economics University of Maryland 1 Part I Introduction 2 Introduction • Workhorse statistical model in social sciences is the multivariate regression model • Ordinary least squares (OLS) • yi = β0 + x1i β1+ x2i β2+… xki βk+ εi • yi = xi β + εi 3 Linear model yi =  + xi + i •  and  are “population” values – represent the true relationship between x and y • Unfortunately – these values are unknown • The job of the researcher is to estimate these values • Notice that if we differentiate y with respect to x, we obtain • dy/dx =  4 •  represents how much y will change for a fixed change in x – Increase in income for more education – Change in crime or bankruptcy when slots are legalized – Increase in test score if you study more 5 Put some concreteness on the problem • State of Maryland budget problems – Drop in revenues – Expensive k-12 school spending initiatives • Short-term solution – raise tax on cigarettes by 34 cents/pack • Problem – a tax hike will reduce consumption of taxable product • Question for state – as taxes are raised, how much will cigarette consumption fall? 6 • Simple model: yi =  + xi + i • Suppose y is a state’s per capita consumption of cigarettes • x represents taxes on cigarettes • Question – how much will y fall if x is increased by 34 cents/pack? • Problem – many reasons why people smoke – cost is but one of them – 7 • Data – (Y) State per capita cigarette consumption for the years 1980-1997 – (X) tax (State + Federal) in real cents per pack – “Scatter plot” of the data – Negative covariance between variables • When x>, more likely that y< • When x<, more likely that y> • Goal: pick values of  and  that “best fit” the data – Define best fit in a moment 8 Notation • True model • yi =  + xi + i • We observe data points (yi,xi) • The parameters  and  are unknown • The actual error (i) is unknown • Estimated model • (a,b) are estimates for the parameters (,) • ei is an estimate of i where • ei=yi-a-bxi • How do you estimate a and b? 9 Objective: Minimize sum of squared errors • Min iei2 = i(yi – a – bxi)2 • Minimize the sum of squared errors (SSE) • Treat positive and negative errors equally – Over or under predict by “5” is the same magnitude of error – “Quadratic form” – The optimal value for a and b are those that make the 1st derivative equal zero – Functions reach min or max values when 10 Cigarette Consumption and Taxes Per capita packs/year 300 250 200 150 100 50 0 0 20 40 60 80 100 120 Tax per pack (cents) 11 Cigarette Consumption and Taxes Per capita packs/year 300 250 200 150 100 50 0 0 20 40 60 80 100 120 Tax per pack (cents) 12 • The model has a lot of nice features – Statistical properties easy to establish – Optimal estimates easy to obtain – Parameter estimates are easy to interpret – Model maximizes prediction • If you minimize SSE you maximize R2 • The model does well as a first order approximation to lots of problems 13 Discrete and Qualitative Data • The OLS model work well when y is a continuous variable – Income, wages, test scores, weight, GDP • Does not has as many nice properties when y is not continuous • Example: doctor visits • Integer values • Low counts for most people • Mass of observations at zero 14 Downside of forcing non-standard outcomes into OLS world? • Can predict outside the allowable range – e.g., negative MD visits • Does not describe the data generating process well – e.g., mass of observations at zero • Violates many properties of OLS – e.g. heteroskedasticity 15 This talk • Look at situations when the data generating process does lend itself well to OLS models • Mathematically describe the data generating process • Show how we use different optimization procedure to obtain estimates • Describe the statistical properties 16 • Show how to interpret parameters • Illustrate how to estimate the models with popular program STATA 17 Types of data generating processes we will consider • Dichotomous events (yes or no) – 1=yes, 0=no – Graduate high school? work? Are obese? Smoke? • Ordinal data – Self reported health (fair, poor, good, excel) – Strongly disagree, disagree, agree, strongly agree 18 • Count data – Doctor visits, lost workdays, fatality counts • Duration data – Time to failure, time to death, time to reemployment 19 Econometric Resources • Recommended textbook – Jeffrey Wooldridge, undergraduate and grad – Lots of insight and mathematical/statistical detail – Very good examples • Helpful web sites – My graduate class – Jeff Smith’s class 20 Part II A quick introduction to STATA 21 STATA • Very fast, convenient, well-documented, cheap and flexible statistical package • Excellent for cross-section/panel data projects, not as great for time series but getting better • Not as easy to manipulate large data sets from flat files as SAS • I usually clean data in SAS, estimate models in STATA 22 • Key characteristic of STATA – All data must be loaded into RAM – Computations are very fast – But, size of the project is limited by available memory • Results can be generated two different ways – Command line – Write a program, (*.do) then submit from the command line 23 Sample program to get you started • cps87_or.do • Program gets you to the point where can • Load data into memory • Construct new variables • Get simple statistics • Run a basic regression • Store the results on a disk 24 Data (cps87_do.dta) • Random sample of data from 1987 Current Population Survey outgoing rotation group • Sample selection – Males – 21-64 – Working 30+hours/week • 19,906 observations 25 Major caveat • Hardest thing to learn/do: get data from some other source and get it into STATA data set • We skip over that part • All the data sets are loaded into a STATA data file that can be called by saying: use data file name 26 Housekeeping at the top of the program • • • * this line defines the semicolon as the ; * end of line delimiter; # delimit ; • • * set memork for 10 meg; set memory 10m; • • • • * write results to a log file; * the replace options writes over old; * log files; log using cps87_or.log,replace; • • * open stata data set; use c:\bill\stata\cps87_or; • • * list variables and labels in data set; desc; 27 • • • • • • • • • • • • • • • • • -----------------------------------------------------------------------------> storage display value variable name type format label variable label -----------------------------------------------------------------------------> age float %9.0g age in years race float %9.0g 1=white, non-hisp, 2=place, n.h, 3=hisp educ float %9.0g years of education unionm float %9.0g 1=union member, 2=otherwise smsa float %9.0g 1=live in 19 largest smsa, 2=other smsa, 3=non smsa region float %9.0g 1=east, 2=midwest, 3=south, 4=west earnwke float %9.0g usual weekly earnings ------------------------------------------------------------------------------ 28 Constructing new variables • Use ‘gen’ command for generate new variables • Syntax – gen new variable name=math statement • Easily construct new variables via – Algebraic operations – Math/trig functions (ln, exp, etc.) – Logical operators (when true, =1, when false, =0) 29 From program • • • • • • • • • • • • * generate new variables; * lines 1-2 illustrate basic math functoins; * lines 3-4 line illustrate logical operators; * line 5 illustrate the OR statement; * line 6 illustrates the AND statement; * after you construct new variables, compress the data again; gen age2=age*age; gen earnwkl=ln(earnwke); gen union=unionm==1; gen topcode=earnwke==999; gen nonwhite=((race==2)|(race==3)); gen big_ne=((region==1)&(smsa==1)); 30 Getting basic statistics • desc -- describes variables in the data set • sum – gets summary statistics • tab – produces frequencies (tables) of discrete variables 31 • * get descriptive statistics; • sum; • * get detailed descriptics for continuous variables; • sum earnwke, detail; • * get frequencies of discrete variables; • tabulate unionm; • tabulate race; • * get two-way table of frequencies; • tabulate region smsa, row column cell; 32 STATA Resources - Specific • “Regression Models for Categorical Dependent Variables Using STATA” – J. Scott Long and Jeremy Freese • Available for sale from STATA website for $52 (www.stata.com) • Post-estimation subroutines that translate results – Do not need to buy the book to use the subroutines 33 • In STATA command line type • net search spost • Will give you a list of available programs to download • One is Spostado from http://www.indiana.edu/~jslsoc/stata • Click on the link and install the files 34 Continuous Distributions • Random variables with infinite number of possible values • Examples -- units of measure (time, weight, distance) • Many discrete outcomes can be treated as continuous, e.g., SAT scores 35 How to describe a continuous random variable • The Probability Density Function (PDF) • The PDF for a random variable x is defined as f(x), where f(x) $ 0 If(x)dx = 1 • Calculus review: The integral of a function gives the “area under the curve” 36 5 Graph of y=f(x) 4 y=f(x) y 3 2 1 0 a x 37 Cumulative Distribution Function (CDF) • Suppose x is a “measure” like distance or time •0# x# 4 • We may be interested in the Pr(x#a) ? 38 CDF a F (a )   f ( x )dx  Pr( x  a ) 0 What if we consider all values?  Pr( x  )   f ( x )dx  1 0 39 Properties of CDF • Note that Pr(x # b) + Pr(x>b) =1 • Pr(x>b) = 1 – Pr(x # b) • Many times, it is easier to work with compliments 40 General notation for continuous distributions • The PDF is described by lower case such as f(x) • The CDF is defined as upper case such as F(a) 41 Standard Normal Distribution • Most frequently used continuous distribution • Symmetric “bell-shaped” distribution • As we will show, the normal has useful properties • Many variables we observe in the real world look normally distributed. • Can translate normal into ‘standard normal’ 42 Examples of variables that look normally distributed • IQ scores • SAT scores • Heights of females • Log income • Average gestation (weeks of pregnancy) • As we will show in a few weeks – sample means are normally distributed!!! 43 Standard Normal Distribution • PDF: 1 f (z)  e 2 1  z2 2  ( z ) • For - 4# z # 4 44 Notation • (z) is the standard normal PDF evaluated at z • [a] = Pr(z # a) a Pr( z  a )    ( z )dz  ( a )  45 Standard Normal PDF 0.45 0.40 0.35 f(Z) 0.30 0.25 0.20 0.15 0.10 0.05 0.00 -3 -2 -1 0 1 2 3 Z 46 Standard Normal • Notice that: – – – – – Normal is symmetric: (a) = (-a) Normal is “unimodal” Median=mean Area under curve=1 Almost all area is between (-3,3) • Evaluations of the CDF are done with – Statistical functions (excel, SAS, etc) – Tables 47 Standard Normal CDF • Pr(z # -0.98) = [-0.98] = 0.1635 Area Under Standard Normal PDF 0.45 0.40 0.35 f(Z) 0.30 0.25 0.20 0.15 0.10 0.05 0.00 -3 -2 -1 0 1 2 3 Z 48 0.09 0.0559 0.0681 0.0823 0.0985 0.1170 0.1379 0.1611 0.1867 0.2148 0.2451 0.2776 0.08 0.0571 0.0694 0.0838 0.1003 0.1190 0.1401 0.1635 0.1894 0.2177 0.2483 0.2810 0.07 0.0582 0.0708 0.0853 0.1020 0.1210 0.1423 0.1660 0.1922 0.2206 0.2514 0.2843 0.06 0.0594 0.0721 0.0869 0.1038 0.1230 0.1446 0.1685 0.1949 0.2236 0.2546 0.2877 0.05 0.0606 0.0735 0.0885 0.1056 0.1251 0.1469 0.1711 0.1977 0.2266 0.2578 0.2912 0.04 0.0618 0.0749 0.0901 0.1075 0.1271 0.1492 0.1736 0.2005 0.2296 0.2611 0.2946 0.03 0.0630 0.0764 0.0918 0.1093 0.1292 0.1515 0.1762 0.2033 0.2327 0.2643 0.2981 0.02 0.0643 0.0778 0.0934 0.1112 0.1314 0.1539 0.1788 0.2061 0.2358 0.2676 0.3015 0.01 0.0655 0.0793 0.0951 0.1131 0.1335 0.1562 0.1814 0.2090 0.2389 0.2709 0.3050 0 0.0668 0.0808 0.0968 0.1151 0.1357 0.1587 0.1841 0.2119 0.2420 0.2743 0.3085 Z -1.50 -1.40 -1.30 -1.20 -1.10 -1.00 -0.90 -0.80 -0.70 -0.60 -0.50 49 • Pr(z # 1.41) = [1.41] = 0.9207 Area Under Standard Normal PDF 0.45 0.40 0.35 f(Z) 0.30 0.25 0.20 0.15 0.10 0.05 0.00 -3 -2 -1 0 Z 1 2 3 50 Z 0.50 0.60 0.70 0.80 0.90 1.00 1.10 1.20 1.30 1.40 1.50 0.00 0.6915 0.7257 0.7580 0.7881 0.8159 0.8413 0.8643 0.8849 0.9032 0.9192 0.9332 0.01 0.6950 0.7291 0.7611 0.7910 0.8186 0.8438 0.8665 0.8869 0.9049 0.9207 0.9345 0.02 0.6985 0.7324 0.7642 0.7939 0.8212 0.8461 0.8686 0.8888 0.9066 0.9222 0.9357 0.03 0.7019 0.7357 0.7673 0.7967 0.8238 0.8485 0.8708 0.8907 0.9082 0.9236 0.9370 0.04 0.7054 0.7389 0.7704 0.7995 0.8264 0.8508 0.8729 0.8925 0.9099 0.9251 0.9382 0.05 0.7088 0.7422 0.7734 0.8023 0.8289 0.8531 0.8749 0.8944 0.9115 0.9265 0.9394 0.06 0.7123 0.7454 0.7764 0.8051 0.8315 0.8554 0.8770 0.8962 0.9131 0.9279 0.9406 0.07 0.7157 0.7486 0.7794 0.8078 0.8340 0.8577 0.8790 0.8980 0.9147 0.9292 0.9418 0.08 0.7190 0.7517 0.7823 0.8106 0.8365 0.8599 0.8810 0.8997 0.9162 0.9306 0.9429 0.09 0.7224 0.7549 0.7852 0.8133 0.8389 0.8621 0.8830 0.9015 0.9177 0.9319 0.9441 51 • Pr(x>1.17) = 1 – Pr(z # 1.17) = 1[1.17] • = 1 – 0.8790 = 0.1210 Area Under Standard Normal PDF 0.45 0.40 0.35 f(Z) 0.30 0.25 0.20 0.15 0.10 0.05 0.00 -3 -2 -1 0 Z 1 2 3 52 Z 0.50 0.60 0.70 0.80 0.90 1.00 1.10 1.20 1.30 1.40 1.50 0.00 0.6915 0.7257 0.7580 0.7881 0.8159 0.8413 0.8643 0.8849 0.9032 0.9192 0.9332 0.01 0.6950 0.7291 0.7611 0.7910 0.8186 0.8438 0.8665 0.8869 0.9049 0.9207 0.9345 0.02 0.6985 0.7324 0.7642 0.7939 0.8212 0.8461 0.8686 0.8888 0.9066 0.9222 0.9357 0.03 0.7019 0.7357 0.7673 0.7967 0.8238 0.8485 0.8708 0.8907 0.9082 0.9236 0.9370 0.04 0.7054 0.7389 0.7704 0.7995 0.8264 0.8508 0.8729 0.8925 0.9099 0.9251 0.9382 0.05 0.7088 0.7422 0.7734 0.8023 0.8289 0.8531 0.8749 0.8944 0.9115 0.9265 0.9394 0.06 0.7123 0.7454 0.7764 0.8051 0.8315 0.8554 0.8770 0.8962 0.9131 0.9279 0.9406 0.07 0.7157 0.7486 0.7794 0.8078 0.8340 0.8577 0.8790 0.8980 0.9147 0.9292 0.9418 0.08 0.7190 0.7517 0.7823 0.8106 0.8365 0.8599 0.8810 0.8997 0.9162 0.9306 0.9429 0.09 0.7224 0.7549 0.7852 0.8133 0.8389 0.8621 0.8830 0.9015 0.9177 0.9319 0.9441 53 • Pr(0.1 # z # 1.9) = Pr(z # 1.9) – Pr(z # 0.1) = (1.9) - (0.1) = 0.9713 - 0.5398 = 0.4315 54 Area Under Standard Normal PDF 0.45 0.40 0.35 f(Z) 0.30 0.25 0.20 0.15 0.10 0.05 0.00 -3 -2 -1 0 1 2 3 Z 55 Z 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 0.00 0.5000 0.5398 0.5793 0.6179 0.6554 0.6915 0.7257 0.7580 0.7881 0.8159 0.8413 0.01 0.5040 0.5438 0.5832 0.6217 0.6591 0.6950 0.7291 0.7611 0.7910 0.8186 0.8438 0.02 0.5080 0.5478 0.5871 0.6255 0.6628 0.6985 0.7324 0.7642 0.7939 0.8212 0.8461 0.03 0.5120 0.5517 0.5910 0.6293 0.6664 0.7019 0.7357 0.7673 0.7967 0.8238 0.8485 0.04 0.5160 0.5557 0.5948 0.6331 0.6700 0.7054 0.7389 0.7704 0.7995 0.8264 0.8508 0.05 0.5199 0.5596 0.5987 0.6368 0.6736 0.7088 0.7422 0.7734 0.8023 0.8289 0.8531 0.06 0.5239 0.5636 0.6026 0.6406 0.6772 0.7123 0.7454 0.7764 0.8051 0.8315 0.8554 0.07 0.5279 0.5675 0.6064 0.6443 0.6808 0.7157 0.7486 0.7794 0.8078 0.8340 0.8577 0.08 0.5319 0.5714 0.6103 0.6480 0.6844 0.7190 0.7517 0.7823 0.8106 0.8365 0.8599 0.09 0.5359 0.5753 0.6141 0.6517 0.6879 0.7224 0.7549 0.7852 0.8133 0.8389 0.8621 56 Z 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 0.00 0.8413 0.8643 0.8849 0.9032 0.9192 0.9332 0.9452 0.9554 0.9641 0.9713 0.9772 0.01 0.8438 0.8665 0.8869 0.9049 0.9207 0.9345 0.9463 0.9564 0.9649 0.9719 0.9778 0.02 0.8461 0.8686 0.8888 0.9066 0.9222 0.9357 0.9474 0.9573 0.9656 0.9726 0.9783 0.03 0.8485 0.8708 0.8907 0.9082 0.9236 0.9370 0.9484 0.9582 0.9664 0.9732 0.9788 0.04 0.8508 0.8729 0.8925 0.9099 0.9251 0.9382 0.9495 0.9591 0.9671 0.9738 0.9793 0.05 0.8531 0.8749 0.8944 0.9115 0.9265 0.9394 0.9505 0.9599 0.9678 0.9744 0.9798 0.06 0.8554 0.8770 0.8962 0.9131 0.9279 0.9406 0.9515 0.9608 0.9686 0.9750 0.9803 0.07 0.8577 0.8790 0.8980 0.9147 0.9292 0.9418 0.9525 0.9616 0.9693 0.9756 0.9808 0.08 0.8599 0.8810 0.8997 0.9162 0.9306 0.9429 0.9535 0.9625 0.9699 0.9761 0.9812 0.09 0.8621 0.8830 0.9015 0.9177 0.9319 0.9441 0.9545 0.9633 0.9706 0.9767 0.9817 57 Important Properties of Normal Distribution • Pr(z # A) = [A] • Pr(z > A) = 1 - [A] • Pr(z # - A) = [-A] • Pr(z > -A) = 1 - [-A] = [A] 58 Maximum likelihood estimation • Observe n independent outcomes, all drawn from the same distribution • (y1, y2, y3….yn) • yi is drawn from f(yi; θ) where θ is an unknown parameter for the distribution f • Recall definition of indepedence. If a and b and independent, Prob(a and b) = Pr(a)Pr(B) 59 • Because all the draws are independent, the probability these particular n values of Y would be drawn at random is called the ‘likelihood function’ and it equals • L = Pr(y1)Pr(y2)…Pr(yn) • L = f(y1; θ)f(y2; θ)…..f(y3; θ) 60 • MLE: pick a value for θ that best represents the chance these n values of y would have been generated randomly • To maximize L, maximize a monotonic function of L • Recall ln(abcd)=ln(a)+ln(b)+ln(c)+ln(d) 61 • Max L = ln(L) = ln[f(y1; θ)] +ln[f(y2; θ)] + ….. ln[f(yn; θ) = Σi ln[f(yi; θ)] • Pick θ so that L is maximized • dL/dθ = 0 62 L θ1 θ2 θ 63 Example: Poisson • Suppose y measures ‘counts’ such as doctor visits. • yi is drawn from a Poisson distribution • f(yi;λ) =e-λ λyi/yi! For λ>0 • E[yi]= Var[yi] = λ 64 • Given n observations, (y1, y2, y3….yn) • Pick value of λ that maximizes L • Max L = Σi ln[f(yi; θ)] = Σi ln[e-λ λyi/yi!] = Σi [– λ + yiln(λ) – ln(yi!)] = -n λ + ln(λ) Σi yi – Σi ln(yi!) 65 • L = -n λ + ln(λ) Σi yi – Σi ln(yi!) • dL/dθ = -n + (1/ λ )Σi yi = 0 • Solve for λ • λ = Σi yi /n =  = sample mean of y 66 • In most cases however, cannot find a ‘closed form’ solution for the parameter in ln[f(yi; θ)] • Must ‘search’ over all possible solutions • How does the search work? • Start with candidate value of θ. • Calculate dL/dθ 67 • If dL/dθ > 0, increasing θ will increase L so we increase θ some • If dL/dθ < 0, decreasing θ will increase L so we decrease θ some • Keep changing θ until dL/dθ = 0 • How far you ‘step’ when you change θ is determined by a number of different factors 68 L dL/dθ > 0 θ1 θ 69 L dL/dθ < 0 θ3 θ 70 Properties of MLE estimates • Sometimes call efficient estimation. Can never generate a smaller variance than one obtained by MLE • Parameters estimates are distributed as a normal distribution when samples sizes are large 71 Section 3 Probit and Logit Models 72 Dichotomous Data • Suppose data is discrete but there are only 2 outcomes • Examples – Graduate high school or not – Patient dies or not – Working or not – Smoker or not • In data, yi=1 if yes, yi =0 if no 73 How to model the data generating process? • There are only two outcomes • Research question: What factors impact whether the event occurs? • To answer, will model the probability the outcome occurs • Pr(Yi=1) when yi=1 or • Pr(Yi=0) = 1- Pr(Yi=1) when yi=0 74 • Think of the problem from a MLE perspective • Likelihood for i’th observation • Li= Pr(Yi=1)Yi [1 - Pr(Yi=1)](1-Yi) • When yi=1, only relevant part is Pr(Yi=1) • When yi=0, only relevant part is [1 - Pr(Yi=1)] 75 • L = Σi ln[Li] = = Σi {yi ln[Pr(yi=1)] + (1-yi)ln[Pr(yi=0)] } • Notice that up to this point, the model is generic. The log likelihood function will determined by the assumptions concerning how we determine Pr(yi=1) 76 Modeling the probability • There is some process (biological, social, decision theoretic, etc) that determines the outcome y • Some of the variables impacting are observed, some are not • Requires that we model how these factors impact the probabilities • Model from a ‘latent variable’ perspective 77 • Consider a women’s decision to work • yi* = the person’s net benefit to work • Two components of yi* – Characteristics that we can measure • Education, age, income of spouse, prices of child care – Some we cannot measure • How much you like spending time with your kids • how much you like/hate your job 78 • We aggregate these two components into one equation • yi* = β0 + x1i β1+ x2i β2+… xki βk+ εi = xi β + εi • xi β (measurable characteristics but with uncertain weights) • εi random unmeasured characteristics • Decision rule: person will work if yi* > 0 (if net benefits are positive) yi=1 if yi*>0 yi=0 if yi*≤0 79 • yi=1 if yi*>0 • yi* = xi β + εi > 0 only if • εi > - xi β • yi=0 if yi*≤0 • yi* = xi β + εi ≤ 0 only if • εi ≤ - xi β 80 • Suppose xi β is ‘big.’ – High wages – Low husband’s income – Low cost of child care • We would expect this person to work, UNLESS, there is some unmeasured ‘variable’ that counteracts this 81 • Suppose a mom really likes spending time with her kids, or she hates her job. • The unmeasured benefit of working has a big negative coefficient εi • If we observe them working, εi must not have been too big, since • yi=1 if εi > - xi β 82 • Consider the opposite. Suppose we observe someone NOT working. • Then εi must not have been big, since • yi=0 if εi ≤ - xi β 83 Logit • Recall yi =1 if εi > - xi β • Since εi is a logistic distribution • Pr(εi > - xi β) = 1 – F(- xi β) • The logistic is also a symmetric distribution, so • 1 – F(- xi β) • = F(xi β) • = exp(xi β)/(1+exp(xi β)) 84 • When εi is a logistic distribution • Pr(yi =1) = exp(xi β)/(1+exp(xi β)) • Pr(yi=0) = 1/(1+exp(xi β)) 85 Example: Workplace smoking bans • Smoking supplements to 1991 and 1993 National Health Interview Survey • Asked all respondents whether they currently smoke • Asked workers about workplace tobacco policies • Sample: workers • Key variables: current smoking and whether they faced by workplace ban 86 • Data: workplace1.dta • Sample program: workplace1.doc • Results: workplace1.log 87 Description of variables in data • . desc; • • • • • • • • • • • • • • • storage display value variable name type format label variable label -----------------------------------------------------------------------> smoker byte %9.0g is current smoking worka byte %9.0g has workplace smoking bans age byte %9.0g age in years male byte %9.0g male black byte %9.0g black hispanic byte %9.0g hispanic incomel float %9.0g log income hsgrad byte %9.0g is hs graduate somecol byte %9.0g has some college college float %9.0g ----------------------------------------------------------------------- 88 Summary statistics • sum; • • • • • • • • • • • • • Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------smoker | 16258 .25163 .433963 0 1 worka | 16258 .6851396 .4644745 0 1 age | 16258 38.54742 11.96189 18 87 male | 16258 .3947595 .488814 0 1 black | 16258 .1119449 .3153083 0 1 -------------+-------------------------------------------------------hispanic | 16258 .0607086 .2388023 0 1 incomel | 16258 10.42097 .7624525 6.214608 11.22524 hsgrad | 16258 .3355271 .4721889 0 1 somecol | 16258 .2685447 .4432161 0 1 college | 16258 .3293763 .4700012 0 1 89 Running a probit • probit smoker age incomel male black hispanic hsgrad somecol college worka; • The first variable after ‘probit’ is the discrete outcome, the rest of the variables are the independent variables • Includes a constant as a default 90 Running a logit • logit smoker age incomel male black hispanic hsgrad somecol college worka; • Same as probit, just change the first word 91 Running linear probability • reg smoker age incomel male black hispanic hsgrad somecol college worka, robust; • Simple regression. • Standard errors are incorrect (heteroskedasticity) • robust option produces standard errors with arbitrary form of heteroskedasticity 92 Probit Results • • • • Probit estimates • • • • • • • • • • • • • • -----------------------------------------------------------------------------smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------age | -.0012684 .0009316 -1.36 0.173 -.0030943 .0005574 incomel | -.092812 .0151496 -6.13 0.000 -.1225047 -.0631193 male | .0533213 .0229297 2.33 0.020 .0083799 .0982627 black | -.1060518 .034918 -3.04 0.002 -.17449 -.0376137 hispanic | -.2281468 .0475128 -4.80 0.000 -.3212701 -.1350235 hsgrad | -.1748765 .0436392 -4.01 0.000 -.2604078 -.0893453 somecol | -.363869 .0451757 -8.05 0.000 -.4524118 -.2753262 college | -.7689528 .0466418 -16.49 0.000 -.860369 -.6775366 worka | -.2093287 .0231425 -9.05 0.000 -.2546873 -.1639702 _cons | .870543 .154056 5.65 0.000 .5685989 1.172487 ------------------------------------------------------------------------------ Log likelihood = -8761.7208 Number of obs LR chi2(9) Prob > chi2 Pseudo R2 = = = = 16258 819.44 0.0000 0.0447 93 How to measure fit? • Regression (OLS) – minimize sum of squared errors – Or, maximize R2 – The model is designed to maximize predictive capacity • Not the case with Probit/Logit – MLE models pick distribution parameters so as best describe the data generating process – May or may not ‘predict’ the outcome well 94 Pseudo R2 • LLk log likelihood with all variables • LL1 log likelihood with only a constant • 0 > LLk > LL1 so | LLk | < |LL1| • Pseudo R2 = 1 - |LL1/LLk| • Bounded between 0-1 • Not anything like an R2 from a regression 95 Predicting Y • Let b be the estimated value of β • For any candidate vector of xi , we can predict probabilities, Pi • Pi = Ф(xib) • Once you have Pi, pick a threshold value, T, so that you predict • Yp = 1 if Pi > T • Yp = 0 if Pi ≤ T • Then compare, fraction correctly predicted 96 • Question: what value to pick for T? • Can pick .5 – Intuitive. More likely to engage in the activity than to not engage in it – However, when the  is small, this criteria does a poor job of predicting Yi=1 – However, when the  is close to 1, this criteria does a poor job of picking Yi=0 97 • *predict probability of smoking; • predict pred_prob_smoke; • * get detailed descriptive data about predicted prob; • sum pred_prob, detail; • * predict binary outcome with 50% cutoff; • gen pred_smoke1=pred_prob_smoke>=.5; • label variable pred_smoke1 "predicted smoking, 50% cutoff"; • * compare actual values; • tab smoker pred_smoke1, row col cell; 98 • . sum pred_prob, detail; • • • • • • • Pr(smoker) ------------------------------------------------------------Percentiles Smallest 1% .0959301 .0615221 5% .1155022 .0622963 10% .1237434 .0633929 Obs 16258 25% .1620851 .0733495 Sum of Wgt. 16258 • • • • • • 50% 75% 90% 95% 99% .2569962 .3187975 .3795704 .4039573 .4672697 Largest .5619798 .5655878 .5684112 .6203823 Mean Std. Dev. .2516653 .0960007 Variance Skewness Kurtosis .0092161 .1520254 2.149247 99 • Notice two things – Sample mean of the predicted probabilities is close to the sample mean outcome – 99% of the probabilities are less than .5 – Should predict few smokers if use a 50% cutoff 100 • • • • • • • • • • • • • • • • • • | predicted smoking, is current | 50% cutoff smoking | 0 1 | Total -----------+----------------------+---------0 | 12,153 14 | 12,167 | 99.88 0.12 | 100.00 | 74.93 35.90 | 74.84 | 74.75 0.09 | 74.84 -----------+----------------------+---------1 | 4,066 25 | 4,091 | 99.39 0.61 | 100.00 | 25.07 64.10 | 25.16 | 25.01 0.15 | 25.16 -----------+----------------------+---------Total | 16,219 39 | 16,258 | 99.76 0.24 | 100.00 | 100.00 100.00 | 100.00 | 99.76 0.24 | 100.00 101 • Check on-diagonal elements. • The last number in each 2x2 element is the fraction in the cell • The model correctly predicts 74.75 + 0.15 = 74.90% of the obs • It only predicts a small fraction of smokers 102 • Do not be amazed by the 75% percent correct prediction • If you said everyone has a  chance of smoking (a case of no covariates), you would be correct Max[(,(1-)] percent of the time 103 • In this case, 25.16% smoke. • If everyone had the same chance of smoking, we would assign everyone Pr(y=1) = .2516 • We would be correct for the 1 - .2516 = 0.7484 people who do not smoke 104 Key points about prediction • MLE models are not designed to maximize prediction • Should not be surprised they do not predict well • In this case, not particularly good measures of predictive capacity 105 Translating coefficients in probit: Continuous Covariates • Pr(yi=1) = Φ[β0 + x1i β1+ x2i β2+… xki βk] • Suppose that x1i is a continuous variable • d Pr(yi=1) /d x1i = ? • What is the change in the probability of an event give a change in x1i? 106 Marginal Effect • d Pr(yi=1) /d x1i • = β1 φ[β0 + x1i β1+ x2i β2+… xki βk] • Notice two things. Marginal effect is a function of the other parameters and the values of x. 107 Translating Coefficients: Discrete Covariates • Pr(yi=1) = Φ[β0 + x1i β1+ x2i β2+… xki βk] • Suppose that x2i is a dummy variable (1 if yes, 0 if no) • Marginal effect makes no sense, cannot change x2i by a little amount. It is either 1 or 0. • Redefine the variable of interest. Compare outcomes with and without x2i 108 • y1 = Pr(yi=1 | x2i=1) = Φ[β0 + x1iβ1+ β2 + x3iβ3 +… ] • y0 = Pr(yi=1 | x2i=0) = Φ[β0 + x1iβ1+ x3iβ3 … ] Marginal effect = y1 – y0. Difference in probabilities with and without x2i? 109 In STATA • Marginal effects for continuous variables, STATA picks sample means for X’s • Change in probabilities for dichotomous outcomes, STATA picks sample means for X’s 110 STATA command for Marginal Effects • mfx compute; • Must be after the outcome when estimates are still active in program. 111 • • • • • • • • • • • • • • • • • Marginal effects after probit y = Pr(smoker) (predict) = .24093439 -----------------------------------------------------------------------------variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X ---------+-------------------------------------------------------------------age | -.0003951 .00029 -1.36 0.173 -.000964 .000174 38.5474 incomel | -.0289139 .00472 -6.13 0.000 -.03816 -.019668 10.421 male*| .0166757 .0072 2.32 0.021 .002568 .030783 .39476 black*| -.0320621 .01023 -3.13 0.002 -.052111 -.012013 .111945 hispanic*| -.0658551 .01259 -5.23 0.000 -.090536 -.041174 .060709 hsgrad*| -.053335 .01302 -4.10 0.000 -.07885 -.02782 .335527 somecol*| -.1062358 .01228 -8.65 0.000 -.130308 -.082164 .268545 college*| -.2149199 .01146 -18.76 0.000 -.237378 -.192462 .329376 worka*| -.0668959 .00756 -8.84 0.000 -.08172 -.052072 .68514 -----------------------------------------------------------------------------(*) dy/dx is for discrete change of dummy variable from 0 to 1 112 Interpret results • 10% increase in income will reduce smoking by 2.9 percentage points • 10 year increase in age will decrease smoking rates .4 percentage points • Those with a college degree are 21.5 percentage points less likely to smoke • Those that face a workplace smoking ban have 6.7 percentage point lower probability of smoking 113 • Do not confuse percentage point and percent differences – A 6.7 percentage point drop is 29% of the sample mean of 24 percent. – Blacks have smoking rates that are 3.2 percentage points lower than others, which is 13 percent of the sample mean 114 Comparing Marginal Effects Variable age incomel male Black hispanic hsgrad college worka LP -0.00040 -0.0289 0.0167 -0.0321 -0.0658 -0.0533 -0.2149 -0.0669 Probit -0.00048 -0.0287 0.0168 -0.0357 -0.0706 -0.0661 -0.2406 -0.0661 Logit -0.00048 -0.0276 0.0172 -0.0342 -0.0602 -0.0514 -0.2121 -0.0658 115 When will results differ? • Normal and logit CDF look – Similar in the mid point of the distribution – Different in the tails • You obtain more observations in the tails of the distribution when – Samples sizes are large –  approaches 1 or 0 • These situations will produce more differences in estimates 116 Some nice properties of the Logit • Outcome, y=1 or 0 • Treatment, x=1 or 0 • Other covariates, x • Context, – x = whether a baby is born with a low weight birth – x = whether the mom smoked or not during pregnancy 117 • Risk ratio RR = Prob(y=1|x=1)/Prob(y=1|x=0) Differences in the probability of an event when x is and is not observed How much does smoking elevate the chance your child will be a low weight birth 118 • Let Yyx be the probability y=1 or 0 given x=1 or 0 • Think of the risk ratio the following way • Y11 is the probability Y=1 when X=1 • Y10 is the probability Y=1 when X=0 • Y11 = RR*Y10 119 • Odds Ratio OR=A/B = [Y11/Y01]/[Y10/Y00] A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)] = odds of Y occurring if you are a smoker B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)] = odds of y happening if you are not a smoker What are the relative odds of Y happening if you do or do not experience X 120 • Suppose Pr(Yi =1) = F(βo+ β1Xi + β2Z) and F is the logistic function • Can show that • OR = exp(β1) = e β1 • This number is typically reported by most statistical packages 121 • Details • Y11 = exp(βo+ β1 + β2Z) /(1+ exp(βo+ β1+ β2Z) ) • Y10 = exp(βo+ β2Z)/(1+ exp(βo+β2Z)) • Y01 = 1 /(1+ exp(βo+ β1 + β2Z) ) • Y00 = 1/(1+ exp(βo+β2Z) • [Y11/Y01] = exp(βo+ β1 + β2Z) • [Y10/Y00] = exp(βo+ β2Z) • OR=A/B = [Y11/Y01]/[Y10/Y00] = exp(βo+ β1 + β2Z)/ exp(βo + β2Z) = exp(β1) 122 • Suppose Y is rare,  close to 0 – Pr(Y=0|X=1) and Pr(Y=0|X=0) are both close to 1, so they cancel • Therefore, when  is close to 0 – Odds Ratio = Risk Ratio • Why is this nice? 123 Population attributable risk • Average outcome in the population •  = (1-) Y10 +  Y11 = (1- )Y10 + (RR)Y10 • Average outcomes are a weighted average of outcomes for X=0 and X=1 • What would the average outcome be in the absence of X (e.g., reduce smoking rates to 0) • Ya = Y10 124 Population Attributable Risk • PAR • Fraction of outcome attributed to X • The difference between the current rate and the rate that would exist without X, divided by the current rate • PAR = ( – Ya)/ = (RR – 1)/[(1-) + RR] 125 Example: Maternal Smoking and Low Weight Births • 6% births are low weight – < 2500 grams ( – Average birth is 3300 grams (5.5 lbs) • Maternal smoking during pregnancy has been identified as a key cofactor – 13% of mothers smoke – This number was falling about 1 percentage point per year during 1980s/90s – Doubles chance of low weight birth 126 Natality detail data • Census of all births (4 million/year) • Annual files starting in the 60s • Information about – Baby (birth weight, length, date, sex, plurality, birth injuries) – Demographics (age, race, marital, educ of mom) – Birth (who delivered, method of delivery) – Health of mom (smoke/drank during preg, weight gain) 127 • Smoking not available from CA or NY • ~3 million usable observations • I pulled .5% random sample from 1995 • About 12,500 obs • Variables: birthweight (grams), smoked, married, 4-level race, 5 level education, mothers age at birth 128 • • • • • • • • • • • • • • • • -----------------------------------------------------------------------------> storage display value variable name type format label variable label -----------------------------------------------------------------------------> birthw int %9.0g birth weight in grams smoked byte %9.0g =1 if mom smoked during pregnancy age byte %9.0g moms age at birth married byte %9.0g =1 if married race4 byte %9.0g 1=white,2=black,3=asian,4=other educ5 byte %9.0g 1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+ visits byte %9.0g prenatal visits ------------------------------------------------------------------------------ 129 • • • • • • • • • • • • • • • • • • • • dummy | variable, | =1 | =1 if mom smoked ifBW<2500 | during pregnancy grams | 0 1 | Total -----------+----------------------+---------0 | 11,626 1,745 | 13,371 | 86.95 13.05 | 100.00 | 94.64 89.72 | 93.96 | 81.70 12.26 | 93.96 -----------+----------------------+---------1 | 659 200 | 859 | 76.72 23.28 | 100.00 | 5.36 10.28 | 6.04 | 4.63 1.41 | 6.04 -----------+----------------------+---------Total | 12,285 1,945 | 14,230 | 86.33 13.67 | 100.00 | 100.00 100.00 | 100.00 | 86.33 13.67 | 100.00 130 • Notice a few things – 13.7% of women smoke – 6% have low weight birth • Pr(LBW | Smoke) =10.28% • Pr(LBW |~ Smoke) = 5.36% • RR = Pr(LBW | Smoke)/ Pr(LBW |~ Smoke) = 0.1028/0.0536 = 1.92 131 Logit results • Log likelihood = -3136.9912 Pseudo R2 = 0.0330 • • • • • • • • • • • • • • • -----------------------------------------------------------------------------lowbw | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------smoked | .6740651 .0897869 7.51 0.000 .4980861 .8500441 age | .0080537 .006791 1.19 0.236 -.0052564 .0213638 married | -.3954044 .0882471 -4.48 0.000 -.5683654 -.2224433 _Ieduc5_2 | -.1949335 .1626502 -1.20 0.231 -.5137221 .1238551 _Ieduc5_3 | -.1925099 .1543239 -1.25 0.212 -.4949791 .1099594 _Ieduc5_4 | -.4057382 .1676759 -2.42 0.016 -.7343769 -.0770994 _Ieduc5_5 | -.3569715 .1780322 -2.01 0.045 -.7059081 -.0080349 _Irace4_2 | .7072894 .0875125 8.08 0.000 .5357681 .8788107 _Irace4_3 | .386623 .307062 1.26 0.208 -.2152075 .9884535 _Irace4_4 | .3095536 .2047899 1.51 0.131 -.0918271 .7109344 _cons | -2.755971 .2104916 -13.09 0.000 -3.168527 -2.343415 ------------------------------------------------------------------------------ 132 Odds Ratios • Smoked – exp(0.674) = 1.96 – Smokers are twice as likely to have a low weight birth • _Irace4_2 (Blacks) – exp(0.707) = 2.02 – Blacks are twice as likely to have a low weight birth 133 Asking for odds ratios • Logistic y x1 x2; • In this case • xi: logistic lowbw smoked age married i.educ5 i.race4; 134 • Log likelihood = -3136.9912 Pseudo R2 = 0.0330 • • • • • • • • • • • • • • -----------------------------------------------------------------------------lowbw | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------smoked | 1.962198 .1761796 7.51 0.000 1.645569 2.33975 age | 1.008086 .0068459 1.19 0.236 .9947574 1.021594 married | .6734077 .0594262 -4.48 0.000 .5664506 .8005604 _Ieduc5_2 | .8228894 .1338431 -1.20 0.231 .5982646 1.131852 _Ieduc5_3 | .8248862 .1272996 -1.25 0.212 .6095837 1.116233 _Ieduc5_4 | .6664847 .1117534 -2.42 0.016 .4798043 .9257979 _Ieduc5_5 | .6997924 .1245856 -2.01 0.045 .4936601 .9919973 _Irace4_2 | 2.028485 .1775178 8.08 0.000 1.70876 2.408034 _Irace4_3 | 1.472001 .4519957 1.26 0.208 .8063741 2.687076 _Irace4_4 | 1.362817 .2790911 1.51 0.131 .9122628 2.035893 ------------------------------------------------------------------------------ 135 PAR • PAR = (RR – 1)/[(1-) + RR] • = 0.137 • RR = 1.96 • PAR = 0.116 • 11.6% of low weight births attributed to maternal smoking 136 Hypothesis Testing in MLE models • MLE are asymptotically normally distributed, one of the properties of MLE • Therefore, standard t-tests of hypothesis will work as long as samples are ‘large’ • What ‘large’ means is open to question • What to do when samples are ‘small’ – table for a moment 137 Testing a linear combination of parameters • Suppose you have a probit model • Φ[β0 + x1iβ1+ x2i β2 + x3iβ3 +… ] • Test a linear combination or parameters • Simplest example, test a subset are zero • β1= β2 = β3 = β4 =0 • To fix the discussion • N observations • K parameters • J restrictions (count the equals signs, j=4) 138 Wald Test • Based on the fact that the parameters are distributed asymptotically normal • Probability theory review – Suppose you have m draws from a standard normal distribution (zi) – M = z12 + z22 + …. Zm2 – M is distributed as a Chi-square with m degrees of freedom 139 • Wald test constructs a ‘quadratic form’ suggested by the test you want to perform • This combination, because it contains squares of the true parameters, should, if the hypothesis is true, be distributed as a Chi square with j degrees of freedom. • If the test statistic is ‘large’, relative to the degrees of freedom of the test, we reject, because there is a low probability we would have drawn that value at random from the distribution 140 Reading values from a Table • All stats books will report the ‘percentiles’ of a chi-square – Vertical axis (degrees of freedom) – Horizontal axis (percentiles) – Entry is the value where ‘percentile’ of the distribution falls below 141 • Example: Suppose 4 restrictions • 95% of a chi-square distribution falls below 9.488. • So there is only a 5% a number drawn at random will exceed 9.488 • If your test statistic is below, cannot reject null • If your test statistics is above, reject null 142 Chi-square DOF 1 2 3 4 5 6 7 8 9 10 Percentiles of the Chi-squared 0.500 0.750 0.800 0.900 0.950 0.455 1.323 1.642 2.706 3.841 1.386 2.773 3.219 4.605 5.991 2.366 4.108 4.642 6.251 7.815 3.357 5.385 5.989 7.779 9.488 4.351 6.626 7.289 9.236 11.070 5.348 7.841 8.558 10.645 12.592 6.346 9.037 9.803 12.017 14.067 7.344 10.219 11.030 13.362 15.507 8.343 11.389 12.242 14.684 16.919 9.342 12.549 13.442 15.987 18.307 0.990 6.635 9.210 11.345 13.277 15.086 16.812 18.475 20.090 21.666 23.209 0.995 7.879 10.597 12.838 14.860 16.750 18.548 20.278 21.955 23.589 25.188 143 Wald test in STATA • Default test in MLE models • Easy to do. Look at program • test hsgrad somecol college • Does not estimate the ‘restricted’ model • ‘Lower power’ than other tests, i.e., high chance of false negative 144 -2 Log likelihood test • * how to run the same tests with a -2 log like test; • * estimate the unresticted model and save the estimates ; • * in urmodel; • probit smoker age incomel male black hispanic • hsgrad somecol college worka; • estimates store urmodel; • * estimate the restricted model. save results in rmodel; • probit smoker age incomel male black hispanic • worka; • estimates store rmodel; • lrtest urmodel rmodel; 145 • I prefer -2 log likelihood test – Estimates the restricted and unrestricted model – Therefore, has more power than a Wald test • In most cases, they give the same ‘decision’ (reject/not reject) 146 Ordered probit models 147 Ordered Probit • Many discrete outcomes are to questions that have a natural ordering but no quantitative interpretation: • Examples: – Self reported health status • (excellent, very good, good, fair, poor) – Do you agree with the following statement • Strongly agree, agree, disagree, strongly disagree 148 • Can use the same type of model as in the previous section to analyze these outcomes • Another ‘latent variable’ model • Key to the model: there is a monotonic ordering of the qualitative responses 149 Self reported health status • Excellent, very good, good, fair, poor • Coded as 1, 2, 3, 4, 5 on National Health Interview Survey • We will code as 5,4,3,2,1 (easier to think of this way) • Asked on every major health survey • Important predictor of health outcomes, e.g. mortality • Key question: what predicts health status? 150 • Important to note – the numbers 1-5 mean nothing in terms of their value, just an ordering to show you the lowest to highest • The example below is easily adapted to include categorical variables with any number of outcomes 151 Model • yi* = latent index of reported health • The latent index measures your own scale of health. Once yi* crosses a certain value you report poor, then good, then very good, then excellent health 152 • yi = (1,2,3,4,5) for (fair, poor, VG, G, excel) • Interval decision rule • yi=1 • yi=2 • yi=3 • yi=4 • yi=5 if if if if if yi* ≤ u1 u1 < yi* ≤ u2 u2 < yi* ≤ u3 u3 < yi* ≤ u4 yi* > u4 153 • As with logit and probit models, we will assume yi* is a function of observed and unobserved variables • yi* = β0 + x1i β1 + x2i β2 …. xki βk + εi • yi* = xi β + εi 154 • The threshold values (u1, u2, u3, u4) are unknown. We do not know the value of the index necessary to push you from very good to excellent. • In theory, the threshold values are different for everyone • Computer will not only estimate the β’s, but also the thresholds – average across people 155 • As with probit and logit, the model will be determined by the assumed distribution of ε • In practice, most people pick nornal, generating an ‘ordered probit’ (I have no idea why) • We will generate the math for the probit version 156 Probabilities • Lets do the outliers, Pr(yi=1) and Pr(yi=5) first • Pr(yi=1) • = Pr(yi* ≤ u1) • = Pr(xi β +εi ≤ u1 ) • =Pr(εi ≤ u1 - xi β) • = Φ[u1 - xi β] = 1- Φ[xi β – u1] 157 • Pr(yi=5) • = Pr(yi* > u4) • = Pr(xi β +εi > u4 ) • =Pr(εi > u4 - xi β) • = 1 - Φ[u4 - xi β] = Φ[xi β – u4] 158 Sample one for y=3 • Pr(yi=3) = Pr(u2 < yi* ≤ u3) = Pr(yi* ≤ u3) – Pr(yi* ≤ u2) = Pr(xi β +εi ≤ u3) – Pr(xi β +εi ≤ u2) = Pr(εi ≤ u3- xi β) - Pr(εi ≤ u2 - xi β) = Φ[u3- xi β] - Φ[u2 - xi β] = 1 - Φ[xi β - u3] – 1 + Φ[xi β - u2] = Φ[xi β - u2] - Φ[xi β - u3] 159 Summary • Pr(yi=1) = 1- Φ[xi β – u1] • Pr(yi=2) = Φ[xi β – u1] - Φ[xi β – u2] • Pr(yi=3) = Φ[xi β – u2] - Φ[xi β – u3] • Pr(yi=4) = Φ[xi β – u3] - Φ[xi β – u4] • Pr(yi=5) = Φ[xi β – u4] 160 Likelihood function • There are 5 possible choices for each person • Only 1 is observed • L = Σi ln[Pr(yi=k)] for k 161 Programming example • Cancer control supplement to 1994 National Health Interview Survey • Question: what observed characteristics predict self reported health (1-5 scale) • 1=poor, 5=excellent • Key covariates: income, education, age, current and former smoking status • Programs • sr_health_status.do, .dta, .log 162 • desc; • • • • • • • • • • male age educ smoke smoke5 black othrace sr_health byte byte byte byte byte float float float %9.0g %9.0g %9.0g %9.0g %9.0g %9.0g %9.0g %9.0g famincl float %9.0g =1 if male age in years years of education current smoker smoked in past 5 years =1 if respondent is black =1 if other race (white is ref) 1-5 self reported health, 5=excel, 1=poor log family income 163 • tab sr_health; • 1-5 self | • reported | • health, | • 5=excel, | • 1=poor | Freq. Percent Cum. • ------------+----------------------------------• 1 | 342 2.65 2.65 • 2 | 991 7.68 10.33 • 3 | 3,068 23.78 34.12 • 4 | 3,855 29.88 64.00 • 5 | 4,644 36.00 100.00 • ------------+----------------------------------• Total | 12,900 100.00 164 In STATA • oprobit sr_health male age educ famincl black othrace smoke smoke5; 165 • • • • Ordered probit estimates • • • • • • • • • • • • • • • • • -----------------------------------------------------------------------------sr_health | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------male | .1281241 .0195747 6.55 0.000 .0897583 .1664899 age | -.0202308 .0008499 -23.80 0.000 -.0218966 -.018565 educ | .0827086 .0038547 21.46 0.000 .0751535 .0902637 famincl | .2398957 .0112206 21.38 0.000 .2179037 .2618878 black | -.221508 .029528 -7.50 0.000 -.2793818 -.1636341 othrace | -.2425083 .0480047 -5.05 0.000 -.3365958 -.1484208 smoke | -.2086096 .0219779 -9.49 0.000 -.2516855 -.1655337 smoke5 | -.1529619 .0357995 -4.27 0.000 -.2231277 -.0827961 -------------+---------------------------------------------------------------_cut1 | .4858634 .113179 (Ancillary parameters) _cut2 | 1.269036 .11282 _cut3 | 2.247251 .1138171 _cut4 | 3.094606 .1145781 ------------------------------------------------------------------------------ Log likelihood = -16401.987 Number of obs LR chi2(8) Prob > chi2 Pseudo R2 = = = = 12900 2379.61 0.0000 0.0676 166 Interpret coefficients • Marginal effects/changes in probabilities are now a function of 2 things – Point of expansion (x’s) – Frame of reference for outcome (y) • STATA – Picks mean values for x’s – You pick the value of y 167 Continuous x’s • Consider y=5 • d Pr(yi=5)/dxi = d Φ[xi β – u4]/dxi = βφ[xi β – u4] • Consider y=3 • d Pr(yi=3)/dxi = βφ[xi β – u3] - βφ[xi β – u4] 168 Discrete X’s • xi β = β0 + x1i β1 + x2i β2 …. xki βk – X2i is yes or no (1 or 0) • ΔPr(yi=5) = • Φ[β0 + x1i β1 + β2 + x3i β3 +.. xki βk] - Φ[β0 + x1i β1 + x3i β3 …. xki βk] • Change in the probabilities when x2i=1 and x2i=0 169 Ask for marginal effects • mfx compute, predict(outcome(5)); 170 • mfx compute, predict(outcome(5)); • • • • • • • • • • • • • • • • Marginal effects after oprobit y = Pr(sr_health==5) (predict, outcome(5)) = .34103717 -----------------------------------------------------------------------------variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X ---------+-------------------------------------------------------------------male*| .0471251 .00722 6.53 0.000 .03298 .06127 .438062 age | -.0074214 .00031 -23.77 0.000 -.008033 -.00681 39.8412 educ | .0303405 .00142 21.42 0.000 .027565 .033116 13.2402 famincl | .0880025 .00412 21.37 0.000 .07993 .096075 10.2131 black*| -.0781411 .00996 -7.84 0.000 -.097665 -.058617 .124264 othrace*| -.0843227 .01567 -5.38 0.000 -.115043 -.053602 .04124 smoke*| -.0749785 .00773 -9.71 0.000 -.09012 -.059837 .289147 smoke5*| -.0545062 .01235 -4.41 0.000 -.078719 -.030294 .081395 -----------------------------------------------------------------------------(*) dy/dx is for discrete change of dummy variable from 0 to 1 171 Interpret the results • Males are 4.7 percentage points more likely to report excellent • Each year of age decreases chance of reporting excellent by 0.7 percentage points • Current smokers are 7.5 percentage points less likely to report excellent health 172 Minor notes about estimation • Wald tests/-2 log likelihood tests are done the exact same was as in PROBIT and LOGIT 173 • Use PRCHANGE to calculate marginal effect for a specific person prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16); – When a variable is NOT specified (famincl), STATA takes the sample mean. 174 • PRCHANGE will produce results for all outcomes • • • • • male 0->1 Avg|Chg| .0203868 0->1 5 .05096698 1 -.0020257 2 -.00886671 3 -.02677558 4 -.01329902 175 • • • • • • age Min->Max -+1/2 -+sd/2 MargEfct Avg|Chg| .13358317 .00321942 .03728014 .00321947 1 .0184785 .00032518 .00382077 .00032515 2 .06797072 .00141642 .01648743 .00141639 3 .17686112 .00424452 .04910323 .00424462 4 .07064757 .00206241 .0237889 .00206252 176 Section Count Data Models 177 Introduction • Many outcomes of interest are integer counts – Doctor visits – Low work days – Cigarettes smoked per day – Missed school days • OLS models can easily handle some integer models 178 • Example – SAT scores are essentially integer values – Few at ‘tails’ – Distribution is fairly continuous – OLS models well • In contrast, suppose – High fraction of zeros – Small positive values 179 • OLS models will – Predict negative values – Do a poor job of predicting the mass of observations at zero • Example – – – – Dr visits in past year, Medicare patients(65+) 1987 National Medical Expenditure Survey Top code (for now) at 10 17% have no visits 180 • visits | Freq. Percent Cum. • ------------+----------------------------------• 0 | 915 17.18 17.18 • 1 | 601 11.28 28.46 • 2 | 533 10.01 38.46 • 3 | 503 9.44 47.91 • 4 | 450 8.45 56.35 • 5 | 391 7.34 63.69 • 6 | 319 5.99 69.68 • 7 | 258 4.84 74.53 • 8 | 216 4.05 78.58 • 9 | 192 3.60 82.19 • 10 | 949 17.81 100.00 • ------------+----------------------------------• Total | 5,327 100.00 181 Poisson Model • yi is drawn from a Poisson distribution • Poisson parameter varies across observations • f(yi;λi) =e-λi λi yi/yi! For λi>0 • E[yi]= Var[yi] = λi = f(xi, β) 182 • λi must be positive at all times • Therefore, we CANNOT let λi = xiβ • Let λi = exp(xiβ) • ln(λi) = (xiβ) 183 • d ln(λi)/dxi = β • Remember that d ln(λi) = dλi/λi • Interpret β as the percentage change in mean outcomes for a change in x 184 Problems with Poisson • Variance grows with the mean – E[yi]= Var[yi] = λi = f(xi, β) • Most data sets have over dispersion, where the variance grows faster than the mean • In dr. visits sample,  = 5.6, s=6.7 • Impose Mean=Var, severe restriction and you tend to reduce standard errors 185 Negative Binomial Model Pr( yi )  i ( i  yi )  1         ( i )( yi  1)    1     1  yi • Where γi = exp(xiβ) and δ ≥ 0 • E[yi] = δγi = δexp(xiβ) • Var[yi] = δ (1+δ) γi • Var[yi]/ E[yi] = (1+δ) 186 • δ must always be ≥ 0 • In this case, the variance grows faster than the mean • If δ=0, the model collapses into the Poisson • Always estimate negative binomial • If you cannot reject the null that δ=0, report the Poisson estimates 187 • Notice that ln(E[yi]) = ln(δ) + ln(γi), so • d ln(E[yi]) /dxi = β • Parameters have the same interpretation as in the Poisson model 188 In STATA • POISSON estimates a MLE model for poisson – Syntax POISSON y independent variables • NBREG estimates MLE negative binomial – Syntax NBREG y independent variables 189 Interpret results for Poisson • Those with CHRONIC condition have 50% more mean MD visits • Those in EXCELent health have 78% fewer MD visits • BLACKS have 33% fewer visits than whites • Income elasticity is 0.021, 10% increase in income generates a 2.1% increase in visits 190 Negative Binomial • Interpret results the same was as Poisson • Look at coefficient/standard error on delta • Ho: delta = 0 (Poisson model is correct) • In this case, delta = 5.21 standard error is 0.15, easily reject null. • Var/Mean = 1+delta = 6.21, Poisson is mis-specificed, should see very small 191 standard errors in the wrong model Selected Results, Count Models Parameter (Standard Error) Variable Poisson Negative Binomial Age65 0.214 (0.026) 0.103 (0.055) Age70 0.787 (0.026) 0.204 (0.054) Chronic 0.500 (0.014) 0.509 (0.029) Excel (0.031) -0.527 (0.059) (0.007) 0.038 (0.016) -0.784 Ln(Inc). 0.021 192 Section Duration Data 193 Introduction • Sometimes we have data on length of time of a particular event or ‘spells’ – Time until death – Time on unemployment – Time to complete a PhD • Techniques we will discuss were originally used to examine lifespan of objects like light bulbs or machines. These models are often referred to as “time to failure” 194 Notation • T is a random variable that indicates duration (time til death, find a new job, etc) • t is the realization of that variable • f(t) is a PDF that describes the process that determines the time to failure • CDF is F(t) represents the probability an event will happen by time t 195 t F (t )  Pr( s  t )   f ( s)ds 0 • F(t) represents the probability that the event happens by ‘t’. • What is the probability a person will die on or before the 65th birthday? 196 • Survivor function, what is the chance you live past (t) • S(t) = 1 – F(t) • If 10% of a cohort dies by their 65th birthday, 90% will die sometime after their 65th birthday 197 • Hazard function, h(t) • What is the probability the spell will end at time t, given that it has already lasted t • What is the chance you find a new job in month 12 given that you’ve been unemployed for 12 months already Pr( t  T  t  h | T  t )  ( t )  lim h 0 h 198 • PDF, CDF (Failure function), survivor function and hazard function are all related • λ(t) = f(t)/S(t) = f(t)/(1-F(t)) • We focus on the ‘hazard’ rate because its relationship to time indicates ‘duration dependence’ 199 • Example: suppose the longer someone is out of work, the lower the chance they will exit unemployment – ‘damaged goods’ • This is an example of duration dependence, the probability of exiting a state of the world is a function of the length 200 • Mathematically • d λ(t) /dt = 0 then there is no duration dep. • d λ(t) /dt > 0 there is + duration dependence the probability the spell will end increases with time • d λ(t) /dt < 0 there is – duration dependence the probability the spell will end decreases over time 201 • Your choice, is to pick values for f(t) that have +, - or no duration dependence 202 Different Functional Forms • Exponential – λ(t)= λ – Hazard is the same over time, a ‘memory less’ process • Weibull – – – – – F(t) = 1 – exp(-γtα) where α,γ > 0 λ(t) = αγtα-1 if α>1, increasing hazard if α<1, decreasing hazard if α=1, exponential 203 • Others: Lognormal, log-logistic, Gompertz 204 NHIS Multiple Cause of Death • NHIS – annual survey of 60K households – Data on individuals – Self-reported healthm DR visits, lost workdays, etc. • MCOD – Linked NHIS respondents from 1986-1994 to National Death Index through Dec 31, 1995 – Identified whether respondent died and of what cause 205 • Our sample – Males, 50-70, who were married at the time of the survey – 1987-1989 surveys – Give everyone 5 years (60 months) of followup 206 Key Variables • max_mths maximum months in the survey. • Diedin5 respondent died during the 5 years of followup • Note if diedn5=0, the max_mths=60. Diedin5 identifies whether the data is censored or not. 207 Identifying Duration Data in STATA • Need to identify which is the duration data stset length, failure(failvar) • Length=duration variable • Failvar=1 when durations end in failure, =0 for censored values • If all data is uncensored, omit failure(failvar) 208 • In our case • Stset max_mths, failure(diedin5) 209 Getting Kaplan-Meier Curves • Tabular presentation of results sts list • Graphical presentation sts graph • Results by subgroup sts graph, by(income) 210

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Document