Limited Dependent Variables: Binary Models
Erik Nesson
Ball State University
MBSW 2013

Outline
1. Overview of LDVs
2. Binary Outcome Models
   a. Linear Probability Model
   b. Logit and Probit
3. Interpretation of Coefficients
   a. Odds ratios vs. marginal effects
   b. Implementation in Stata

Binary Outcome Models
• Dependent variable takes values of 0 or 1
• Model of interest is the probability that $y = 1$ conditional on the independent variables
  ▪ $P(y = 1 \mid X) = G(X, \beta)$
  ▪ Here $X$ represents the matrix of covariates and $\beta$ represents the vector of coefficients
• Examples
  ▪ Is a person obese? Does a person smoke? Does a person contract a disease?

Example: Secondhand Smoke
• Half of adult non-smokers are exposed to environmental tobacco smoke (ETS)
• Main point of public policies is to reduce ETS exposure in specific areas
  ▪ Ex: smoke-free air (SFA) laws are meant to reduce ETS exposure at work
• But most research about tobacco control policies focuses on reducing smoking
• Main question:
  ▪ Do tobacco control policies reduce ETS exposure in the workplace?

How to measure ETS exposure?
• NHANES Dataset
  ▪ Repeated cross-section dataset covering 1988-1994 and 1999-2004
  ▪ Roughly 10k individuals interviewed every year
  ▪ All individuals complete an extensive survey AND receive a physical
  ▪ Contains extensive demographic and health history information
• Sample
  ▪ Non-smoking, employed individuals age 18 to 65
  ▪ 8,554 individuals

Dependent and Independent Variables
1. Indicators of ETS Exposure
   a. Q: "… how many hours per day can you smell the smoke from other people's cigarettes, cigars, and/or pipes?"
   b. Serum cotinine levels
      • Cotinine is the major metabolite of nicotine, with an 8-16 hr half-life
      • Very low levels can be detected
2. Tobacco Control Policies
   a. Cigarette taxes, measured in real 2009 dollars
   b. The percent of each individual's state living under a workplace SFA law (from 0 to 100)

Summary Statistics

| Variable | All Workers (N=8554): Mean | All Workers: Std. Dev. | White Collar Workers (N=4825): Mean | White Collar Workers: Std. Dev. | Blue Collar Workers (N=3729): Mean | Blue Collar Workers: Std. Dev. | T-Test |
|---|---|---|---|---|---|---|---|
| Smell Smoke at Work | 0.818 | 2.126 | 0.536 | 1.717 | 1.388 | 2.686 | 0.000 |
| Observable Cotinine Level | 0.629 | 0.483 | 0.572 | 0.495 | 0.746 | 0.435 | 0.000 |
| Cigarette Excise Tax | 1.265 | 0.644 | 1.267 | 0.633 | 1.262 | 0.666 | 0.834 |
| Female | 0.502 | 0.500 | 0.587 | 0.492 | 0.330 | 0.470 | 0.000 |
| Age | 41.260 | 12.955 | 41.625 | 12.624 | 40.522 | 13.571 | 0.007 |
| Black | 0.102 | 0.303 | 0.092 | 0.290 | 0.121 | 0.327 | 0.000 |
| Hispanic | 0.117 | 0.321 | 0.074 | 0.262 | 0.203 | 0.402 | 0.000 |
| Married | 0.658 | 0.474 | 0.664 | 0.472 | 0.646 | 0.478 | 0.248 |
| Income to Poverty Ratio | 3.236 | 1.639 | 3.572 | 1.556 | 2.559 | 1.591 | 0.000 |
| B.A. Degree | 0.328 | 0.469 | 0.446 | 0.497 | 0.089 | 0.285 | 0.000 |
| Some College | 0.307 | 0.461 | 0.316 | 0.465 | 0.290 | 0.454 | 0.105 |
| H.S. Degree | 0.242 | 0.429 | 0.192 | 0.394 | 0.343 | 0.475 | 0.000 |
| Less than H.S. | 0.122 | 0.328 | 0.045 | 0.208 | 0.278 | 0.448 | 0.000 |
| Family Size | 3.145 | 1.514 | 3.003 | 1.407 | 3.431 | 1.674 | 0.000 |
| Rooms in Home | 6.334 | 2.116 | 6.566 | 2.189 | 5.866 | 1.877 | 0.000 |

Self-Reported ETS Exposure at Work by Job Category
[Figure: bar chart of percent (vertical axis) by any secondhand smoke exposure, No/Yes (horizontal axis), for White Collar Jobs, Blue Collar Jobs, and Total.]
Notes: Data from NHANES III and NHANES 1999/2000 - NHANES 2003/2004.

Tabulation of Observable Cotinine Levels by Job Category
[Figure: bar chart of percent (vertical axis) by any secondhand smoke exposure, No/Yes (horizontal axis), for White Collar Jobs, Blue Collar Jobs, and Total.]
Notes: Data from NHANES III and NHANES 1999/2000 - NHANES 2003/2004.

Basic Model
• Basic model estimates exposure to ETS as a function of tobacco control policies, individual characteristics, and other geographic characteristics:
  ▪ $P(ETS_{ist} = 1 \mid TC, X, \ldots) = G(TC_{st}\beta + X_{it}\alpha + \ldots)$

Linear Probability Model
• Assume the conditional probability is linear in $X$
  ▪ $P(y = 1 \mid X) = X\beta$
• Upsides to LPM
  ▪ Easy to run: run a linear regression:
    • $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
  ▪ Interpretation is easy!
    • $\frac{\partial P(y = 1 \mid X)}{\partial x_j} = \beta_j$
    • In words, "Every unit increase in $x_j$ is associated with a $100 \cdot \beta_j$ percentage point change in the probability that $y = 1$."
  ▪ Prediction is easy!
    • $P(y = 1 \mid X = x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$

Binary Outcome Models
• Downsides to LPM
  ▪ $G(X\beta)$ may not be linear
  ▪ Predicted probabilities are not constrained to lie between 0 and 1
  ▪ Constant marginal effect may not be reasonable!

Linear Probability Results
• Table shows marginal effects with standard errors in parentheses

| | Self-Reported Exposure: All Workers | Self-Reported Exposure: White Collar Workers | Self-Reported Exposure: Blue Collar Workers | Observable Cotinine Levels: All Workers | Observable Cotinine Levels: White Collar Workers | Observable Cotinine Levels: Blue Collar Workers |
|---|---|---|---|---|---|---|
| Cigarette Excise Tax | 0.035 (0.03) | 0.055* (0.03) | 0.021 (0.07) | 0.019 (0.03) | 0.015 (0.04) | 0.026 (0.05) |
| Work SFA Law | -0.001* (0.00) | 0.000 (0.00) | -0.002*** (0.00) | -0.002*** (0.00) | -0.003*** (0.00) | -0.001 (0.00) |

Logit or Probit
• Assume that $G(z)$ is a function such that $G(z) \in (0, 1)$ for any value of $z$
  ▪ As $z \to \infty$, $G(z) \to 1$, and as $z \to -\infty$, $G(z) \to 0$
• For Logit, $G(X\beta) = \frac{\exp(X\beta)}{1 + \exp(X\beta)}$
• For Probit, $G(X\beta) = \Phi(X\beta)$
  ▪ Note: $\Phi(\cdot)$ is the standard normal CDF

Interpretation of Coefficients
• Continuous variable:
  ▪ If $P(y = 1 \mid X) = G(X\beta)$, then what is the marginal effect, i.e. $\frac{\partial P(y = 1 \mid X)}{\partial x_j}$?
  ▪ Using some calculus: $\frac{\partial P(y = 1 \mid X)}{\partial x_j} = g(X\beta)\,\beta_j$
    • Where $g(X\beta)$ is the density function associated with $G(X\beta)$
  ▪ Some notes:
    1. $\frac{\partial P(y = 1 \mid X)}{\partial x_j}$ doesn't only depend on $\beta_j$
    2. What values of $X$ should we use?
    3. The marginal effect isn't constant, unlike in the linear probability model

Interpretation of Coefficients
• Discrete variable:
  ▪ If $P(y = 1 \mid X) = G(X\beta)$, then the marginal effect of $x_1$ increasing from 0 to 1 is
    $G(\beta_0 + \beta_1 + \beta_2 x_2 + \cdots + \beta_k x_k) - G(\beta_0 + \beta_2 x_2 + \cdots + \beta_k x_k)$
  ▪ Some notes:
    1. Again, the marginal effect doesn't only depend on $\beta_1$
    2. What values of $X$ should we use?
    3. The marginal effect isn't constant, unlike in the linear probability model

Calculating Marginal Effects
• For both continuous and discrete variables, the other coefficients and the values of the other variables enter into the marginal effect calculation
• Three common approaches:
  1. Marginal effect at the mean: use mean values of the other variables
  2. Average marginal effect: calculate the marginal effect for each observation, then average
  3. Marginal effect at some other chosen values of the variables

Calculating Marginal Effects
• Calculating the marginal effect at the mean for $x_j$
  ▪ Continuous variable
    • Find $g(X\beta)$ by plugging in mean values for $X$
    • Multiply $g(X\beta)$ by $\beta_j$
  ▪ Discrete variable
    • Find $G(X\beta)$ when $x_j = 1$ by plugging in mean values for the other variables and 1 for $x_j$
    • Find $G(X\beta)$ when $x_j = 0$ by plugging in mean values for the other variables and 0 for $x_j$
    • Subtract the second value from the first

Calculating Marginal Effects
• Calculating the average marginal effect for $x_j$
  ▪ Continuous variable
    • Find $g(X\beta)$ for each observation and multiply by $\beta_j$
    • Find the mean value over all observations
  ▪ Discrete variable
    • Find $G(X\beta)$ when $x_j = 1$ for each observation
    • Find $G(X\beta)$ when $x_j = 0$ for each observation
    • Subtract the second value from the first for each observation
    • Find the mean value over all observations

Implementation in Stata
• Code for estimating the marginal effect at the mean in Stata:
  ▪ logit y x
  ▪ margins, dydx(varlist) atmeans
• Code for estimating the average marginal effect in Stata:
  ▪ logit y x
  ▪ margins, dydx(varlist)
• Stata computes the discrete 0-to-1 change for a binary variable only when it is entered with factor-variable notation (e.g. i.x); otherwise margins treats it as continuous

Odds Ratios
• Common in other fields to run a logit model and report an odds ratio
• What are odds?
  ▪ Odds of an event $= \frac{P(y = 1 \mid x_1 = 1)}{1 - P(y = 1 \mid x_1 = 1)}$
• The odds ratio $= \dfrac{P(y = 1 \mid x_1 = 1) \,/\, \left[1 - P(y = 1 \mid x_1 = 1)\right]}{P(y = 1 \mid x_1 = 0) \,/\, \left[1 - P(y = 1 \mid x_1 = 0)\right]}$
• Odds ratio > 1: the event is more likely to happen when $x_1 = 1$ than when $x_1 = 0$
• Odds ratio < 1: the event is less likely to happen when $x_1 = 1$ than when $x_1 = 0$

Odds Ratios and Logit
• How do Odds Ratios work in Logit?
  ▪ Odds that $y = 1$ given $x_1 = 1$:
    $\frac{P(y = 1 \mid x_1 = 1, X)}{1 - P(y = 1 \mid x_1 = 1, X)} = \frac{\exp(Z\beta) / \left[1 + \exp(Z\beta)\right]}{1 - \exp(Z\beta) / \left[1 + \exp(Z\beta)\right]} = \exp(Z\beta)$
    • Note: $Z\beta = \beta_0 + \beta_1 \cdot 1 + \beta_2 x_2 + \cdots + \beta_k x_k$
  ▪ Similarly, odds that $y = 1$ given $x_1 = 0$:
    $\frac{P(y = 1 \mid x_1 = 0, X)}{1 - P(y = 1 \mid x_1 = 0, X)} = \exp(W\beta)$
    • Note: $W\beta = \beta_0 + \beta_1 \cdot 0 + \beta_2 x_2 + \cdots + \beta_k x_k$

Odds Ratios and Logit
• Then Odds Ratio $= \frac{\exp(Z\beta)}{\exp(W\beta)}$
  ▪ Plugging in: $\frac{\exp(\beta_0 + \beta_1 + \beta_2 x_2 + \cdots + \beta_k x_k)}{\exp(\beta_0 + \beta_2 x_2 + \cdots + \beta_k x_k)} = \exp(\beta_1)$
• Quick notes about the odds ratio
  1. Odds ratio does not depend on other coefficients or independent variables
  2. May be difficult to translate into policy
  3. Odds ratios are very easy to compute! Simply exponentiate coefficients

Odds Ratios and Logit
• What does an odds ratio of 2 mean?
  ▪ Odds of $y = 1$ are twice as large when $x = 1$ as when $x = 0$
  ▪ Could be that
    • Odds that $y = 1 \mid x = 1$ are 4 and odds that $y = 1 \mid x = 0$ are 2
    • Odds that $y = 1 \mid x = 1$ are 3 and odds that $y = 1 \mid x = 0$ are 1.5
    • Odds that $y = 1 \mid x = 1$ are 2 and odds that $y = 1 \mid x = 0$ are 1
• So odds ratios are not equal to marginal effects
• They do not tell us about differences in probability

Odds Ratios and Marginal Effects
• Some notation:
  ▪ $P_0$: $P(y = 1 \mid x = 0)$; $O_0$: odds that $y = 1 \mid x = 0$
  ▪ $P_1$: $P(y = 1 \mid x = 1)$; $O_1$: odds that $y = 1 \mid x = 1$

| $P_0$ | $P_1$ | $O_0$ | $O_1$ | Odds Ratio | Marginal Effect |
|---|---|---|---|---|---|
| 0.05 | 0.10 | 0.05 | 0.11 | 2.11 | 0.05 |
| 0.10 | 0.15 | 0.11 | 0.18 | 1.59 | 0.05 |
| 0.50 | 0.55 | 1.00 | 1.22 | 1.22 | 0.05 |
| 0.80 | 0.85 | 4.00 | 5.67 | 1.42 | 0.05 |
| 0.90 | 0.95 | 9.00 | 19.00 | 2.11 | 0.05 |

Logit Results
• Table shows odds ratios, standard errors in parentheses, and marginal effects in brackets

| | Self-Reported Exposure: All Workers | Self-Reported Exposure: White Collar Workers | Self-Reported Exposure: Blue Collar Workers | Observable Cotinine Levels: All Workers | Observable Cotinine Levels: White Collar Workers | Observable Cotinine Levels: Blue Collar Workers |
|---|---|---|---|---|---|---|
| Cigarette Excise Tax | 1.226 (1.17) [0.030] | 1.447* (0.79) [0.039] | 1.127 (3.49) [0.027] | 1.008 (0.02) [0.001] | 1.000 (0.05) [0.000] | 1.043 (0.02) [0.003] |
| Work SFA Law | 0.993*** (-0.32) [-0.001] | 0.995 (-0.86) [-0.001] | 0.988*** (-0.38) [-0.003] | 0.990*** (0.00) [-0.002] | 0.987*** (0.00) [-0.002] | 0.997 (0.00) [-0.000] |

Other Issues
• Standard errors are complicated
  ▪ Be wary of canned programs (like Stata!) which allow calculation of robust variance/covariance matrices.
• Interaction terms are also complicated
  ▪ Odds ratios can be difficult to interpret
  ▪ Marginal effects are better!
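
The marginal effect recipes and the odds ratio result from the earlier slides can be checked by hand. What follows is a minimal sketch in plain Python (not Stata), assuming a logit model whose coefficients are already known. The function names, the coefficient vector `beta`, and the four-row dataset `rows` are all made up for illustration and have nothing to do with the NHANES estimates. It computes the marginal effect at the mean and the average marginal effect for a continuous and a binary variable, then recomputes the odds ratio row of the odds ratio versus marginal effect table.

```python
import math

def G(z):
    """Logistic CDF, G(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def g(z):
    """Logistic density, g(z) = G(z) * (1 - G(z))."""
    p = G(z)
    return p * (1.0 - p)

def xb(beta, x):
    """Linear index: beta[0] is the intercept, beta[1:] pair up with x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def column_means(rows):
    return [sum(col) / len(col) for col in zip(*rows)]

def with_value(x, j, value):
    """Copy of x with variable j (0-based) set to value."""
    out = list(x)
    out[j] = value
    return out

def mem_continuous(beta, rows, j):
    """Marginal effect at the mean: g(X-bar * beta) * beta_j."""
    return g(xb(beta, column_means(rows))) * beta[j + 1]

def ame_continuous(beta, rows, j):
    """Average marginal effect: mean over observations of g(x_i * beta) * beta_j."""
    return sum(g(xb(beta, x)) * beta[j + 1] for x in rows) / len(rows)

def mem_discrete(beta, rows, j):
    """G with x_j = 1 minus G with x_j = 0, other variables at their means."""
    m = column_means(rows)
    return G(xb(beta, with_value(m, j, 1.0))) - G(xb(beta, with_value(m, j, 0.0)))

def ame_discrete(beta, rows, j):
    """Same difference computed per observation, then averaged."""
    diffs = [G(xb(beta, with_value(x, j, 1.0))) - G(xb(beta, with_value(x, j, 0.0)))
             for x in rows]
    return sum(diffs) / len(diffs)

def odds(p):
    return p / (1.0 - p)

def odds_ratio(beta, j):
    """In a logit, the odds ratio for variable j is exp(beta_j)."""
    return math.exp(beta[j + 1])

# Hypothetical coefficients: intercept, continuous x1, binary x2.
beta = [-1.0, 0.5, 0.8]
# Hypothetical data: each row is (x1, x2).
rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]

print("MEM, continuous x1:", mem_continuous(beta, rows, 0))
print("AME, continuous x1:", ame_continuous(beta, rows, 0))
print("MEM, binary x2:    ", mem_discrete(beta, rows, 1))
print("AME, binary x2:    ", ame_discrete(beta, rows, 1))
print("Odds ratio, x2:    ", odds_ratio(beta, 1))

# Same 0.05 change in probability, very different odds ratios.
for p0 in (0.05, 0.10, 0.50, 0.80, 0.90):
    p1 = p0 + 0.05
    print(f"P0={p0:.2f}  P1={p1:.2f}  odds ratio={odds(p1) / odds(p0):.2f}")
```

In this sketch the marginal effect at the mean and the average marginal effect generally differ, because the logistic density is evaluated at different points, while the odds ratio for `x2` is the same whatever values the other variables take.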