The Power of Regression

• Previous research literature claim
  • Foreign-owned manufacturing plants have greater levels of strike activity than domestic plants
  • In Canada, strike rates of 25.5% versus 20.3%
• Budd's claim
  • Foreign-owned plants are larger and located in strike-prone industries
  • Need multivariate regression analysis!

The Power of Regression

Dependent Variable: Strike Incidence

                                       (1)        (2)        (3)
U.S. Corporate Parent                0.230**    0.201*     0.065
(Canadian parent omitted)           (0.117)    (0.119)    (0.132)
Number of Employees (1000s)           ---       0.177**    0.094**
                                               (0.019)    (0.020)
Industry Effects?                      No         No        Yes
Sample Size                          2,170      2,170      2,170

* Statistically significant at the 0.10 level; ** at the 0.05 level (two-tailed tests).

Important Regression Topics

• Prediction
  • Various confidence and prediction intervals
• Diagnostics
  • Are assumptions for estimation & testing fulfilled?
• Specifications
  • Quadratic terms? Logarithmic dep. vars.?
• Additional hypothesis tests
  • Partial F tests
• Dummy dependent variables
  • Probit and logit models

Confidence Intervals

• The true population [whatever] is within the following interval (1-α)% of the time:

  Estimate ± t_{α/2} × Standard Error of the Estimate

• Just need
  • Estimate
  • Standard error
  • Shape / distribution (including degrees of freedom)

Prediction Interval for a New Observation at x_p

1. Point estimate: ŷ_p
2. Standard error
3. Shape: t distribution with n-k-1 d.f.
4. So the prediction interval for a new observation is

  $\hat{y}_p \pm t_{\alpha/2,\,n-k-1}\; S_e \sqrt{1 + \tfrac{1}{n} + \tfrac{(x_p-\bar{x})^2}{\sum (x_i-\bar{x})^2}}$

(Siegel, p. 481)

Prediction Interval for the Mean of Observations at x_p

1. Point estimate: ŷ_p
2. Standard error
3. Shape: t distribution with n-k-1 d.f.
4. So the prediction interval for the mean is

  $\hat{y}_p \pm t_{\alpha/2,\,n-k-1}\; S_e \sqrt{\tfrac{1}{n} + \tfrac{(x_p-\bar{x})^2}{\sum (x_i-\bar{x})^2}}$

(Siegel, p. 483)

Earlier Example

1. Find the 95% CI for Joe's exam score (he studies for 20 hours)
2. Find the 95% CI for the mean score of those who studied for 20 hours

Hours of Study (x) and Exam Score (y) Example

Regression Statistics
Multiple R        0.770
R Squared         0.594
Adj. R Squared    0.543
Standard Error   10.710
Obs.                 10

ANOVA
             df        SS        MS        F   Significance
Regression    1  1340.452  1340.452   11.686          0.009
Residual      8   917.648   114.706
Total         9  2258.100

            Coeff.  Std. Error  t stat  p value  Lower 95%  Upper 95%
Intercept   39.401      12.153   3.242    0.012     11.375     67.426
hours        2.122       0.621   3.418    0.009      0.691      3.554

x̄ = 18.80

Diagnostics / Misspecification

• For estimation & testing to be valid…
  • y = b0 + b1x1 + b2x2 + … + bkxk + e makes sense
  • Errors (e_i) are independent
    • of each other
    • of the independent variables
  • Homoskedasticity
    • Error variance is independent of the independent variables
    • σ_e² is a constant
    • Var(e_i) ≠ σ²x_i² (i.e., no heteroskedasticity)
• Violations render our inferences invalid and misleading!

Common Problems

• Misspecification
  • Omitted variable bias
  • Nonlinear rather than linear relationship
  • Levels, logs, or percent changes?
• Data problems
  • Skewed variables and outliers
  • Multicollinearity
  • Sample selection (non-random data)
  • Missing data
• Problems with residuals (error terms)
  • Non-independent errors
  • Heteroskedasticity

Omitted Variable Bias

• Question 3 from Sample Exam B:

  wage = 9.05 + 1.39 union
        (1.65)  (0.66)

  wage = 9.56 + 1.42 union + 3.87 ability
        (1.49)  (0.56)      (1.56)

  wage = -3.03 + 0.60 union + 0.25 revenue
        (0.70)  (0.45)       (0.08)

• H. Farber thinks the average union wage differs from the average nonunion wage because unionized employers are more selective and hire individuals with higher ability.
• M. Friedman thinks the average union wage differs from the average nonunion wage because unionized employers have different levels of revenue per employee.
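Omitted variable bias is easy to reproduce in a small simulation. The sketch below (Python with numpy and statsmodels; the data-generating numbers are invented for illustration, not taken from the exam question) builds an ability variable that raises wages and is correlated with union status, then shows that the short regression's union coefficient is biased while the long regression recovers the true effect.

```python
# A minimal simulation sketch of omitted variable bias (assumed numbers,
# not the exam's actual data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000

# Ability raises wages AND makes union membership more likely.
ability = rng.normal(size=n)
union = (0.5 * ability + rng.normal(size=n) > 0).astype(float)
wage = 9.0 + 0.6 * union + 3.9 * ability + rng.normal(size=n)

# Short regression: ability omitted, so the union coefficient
# absorbs part of ability's effect (upward bias here).
short = sm.OLS(wage, sm.add_constant(union)).fit()

# Long regression: controlling for ability recovers roughly 0.6.
X = sm.add_constant(np.column_stack([union, ability]))
long = sm.OLS(wage, X).fit()

print(short.params)  # biased union coefficient
print(long.params)   # close to the true 0.6
```

Flipping the sign of the union-ability correlation flips the direction of the bias, which is exactly the logic behind Farber's claim.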
Checking the Assumptions

• How to check the validity of the assumptions?
• Cynicism, realism, and theory
• Robustness checks
  • Check different specifications
  • But don't just choose the best one!
• Automated variable selection methods
  • e.g., stepwise regression (Siegel, p. 547)
• Misspecification and other tests
• Examine diagnostic plots

Diagnostic Plots

[Plot of residuals against predicted values] Increasing spread might indicate heteroskedasticity. Try transformations or weighted least squares.

Diagnostic Plots

[Plot of residuals against predicted values] "Tilt" from outliers might indicate skewness. Try a log transformation.

Problematic Outliers

Stock Performance and CEO Golf Handicaps (New York Times, 5-31-98)

Without the 7 "outliers":

Number of obs = 44          R-squared = 0.1718
------------------------------------------------
 stockrating |    Coef.   Std. Err.      t  P>|t|
-------------+-----------------------------------
    handicap |   -1.711       .580   -2.95  0.005
       _cons |   73.234      8.992    8.14  0.000
------------------------------------------------

With the 7 "outliers":

Number of obs = 51          R-squared = 0.0017
------------------------------------------------
 stockrating |    Coef.   Std. Err.      t  P>|t|
-------------+-----------------------------------
    handicap |    -.173       .593   -0.29  0.771
       _cons |   55.137      9.790    5.63  0.000
------------------------------------------------

Are They Really Outliers??

[Plot of residuals against predicted values] The diagnostic plot is OK. BE CAREFUL!
Stock Performance and CEO Golf Handicaps (New York Times, 5-31-98)

Diagnostic Plots

[Plot of residuals against predicted values] Curvature might indicate nonlinearity. Try a quadratic specification.

Diagnostic Plots

[Plot of residuals against predicted values] A good diagnostic plot: it lacks obvious indications of other problems.

Adding a Squared (Quadratic) Term

Job performance regressed on salary (in $1,000s) (Egg Data):

      Source |     SS     df     MS        Number of obs =    576
-------------+--------------------        F(2, 573)     = 122.42
       Model | 255.61      2  127.8        Prob > F      = 0.0000
    Residual | 598.22    573  1.044        R-squared     = 0.2994
-------------+--------------------        Adj R-squared = 0.2969
       Total | 853.83    575  1.485        Root MSE      = 1.0218

------------------------------------------------------
 job performance |     Coef.   Std. Err.      t   P>|t|
-----------------+------------------------------------
          salary |  .0980844   .0260215   3.77   0.000
  salary squared |  -.000337   .0001905  -1.77   0.077
           _cons | -1.720966   .8720358  -1.97   0.049
------------------------------------------------------

Salary squared = salary² [=salary^2 in Excel]

Quadratic Regression

Job perf = -1.72 + 0.098 salary − 0.00034 salary squared

[Plot of job performance (0 to 8) against annual salary ($30K to $150K), with the fitted quadratic (nonlinear) regression curve]

Quadratic Regression

Job perf = -1.72 + 0.098 salary − 0.00034 salary squared

[Same plot, extended to $190K: the fitted curve peaks and then declines]

• The effect of salary will eventually turn negative
• But where? The curve peaks where the slope is zero:

  Max at salary = −(linear coeff.) / (2 × quadratic coeff.)

  Here, 0.098 / (2 × 0.00034) ≈ 144, i.e., roughly $145,000.
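The turning-point arithmetic is mechanical, so a short sketch may help. This is a minimal illustration on simulated data whose coefficients only roughly mimic the Egg Data output (statsmodels assumed):

```python
# A minimal sketch of a quadratic specification and its turning point
# (simulated data, not the actual Egg Data file).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
salary = rng.uniform(30, 150, size=576)  # annual salary in $1,000s
perf = (-1.72 + 0.098 * salary - 0.00034 * salary**2
        + rng.normal(scale=1.0, size=576))

X = sm.add_constant(np.column_stack([salary, salary**2]))  # salary^2 as in Excel
fit = sm.OLS(perf, X).fit()
b0, b1, b2 = fit.params

# dy/dsalary = b1 + 2*b2*salary = 0  =>  salary* = -b1 / (2*b2)
turning_point = -b1 / (2 * b2)
print(f"estimated peak at roughly ${turning_point:.0f}K")
# ~ $144K-146K with the slide's coefficients
```

Past the estimated peak, predicted performance declines, which is what the extended plot above shows.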
Another Specification Possibility

• If data are very skewed, can try a log specification
• Can use logs instead of levels for independent and/or dependent variables
• Note that the interpretation of the coefficients will change
• Re-familiarize yourself with Siegel, pp. 68-69

Quick Note on Logs

• a is the natural logarithm of x if:
  2.71828^a = x, or
  e^a = x
• The natural logarithm is abbreviated "ln": ln(x) = a
• In Excel, use the LN function
• We call this the "log," but don't use Excel's LOG function!
• Usefulness: spreads out small values and narrows large values, which can reduce skewness

Earnings Distribution

[Histogram of weekly earnings from the March 2002 CPS, n = 15,000] Skewed to the right.

Residuals from Levels Regression

[Histogram of residuals from a regression of weekly earnings on demographic characteristics] Skewed to the right, so use of the t distribution is suspect.

Log Earnings Distribution

[Histogram of the natural logarithm of weekly earnings from the March 2002 CPS, i.e., =LN(weekly earnings)] Not perfectly symmetrical, but better.

Residuals from Log Regression

[Histogram of residuals from a regression of log weekly earnings on demographic characteristics] Almost symmetrical, so use of the t distribution is probably OK.
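A quick way to see why the log specification helps: simulate right-skewed earnings and compare skewness before and after the transformation. A minimal sketch (lognormal draws standing in for the CPS file, which is not reproduced here; scipy assumed):

```python
# A minimal sketch of how a log transformation tames right skew
# (simulated "earnings", not the actual March 2002 CPS data).
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
earnings = rng.lognormal(mean=6.5, sigma=0.7, size=15000)  # right-skewed levels

print(skew(earnings))           # large positive skew in levels
print(skew(np.log(earnings)))   # near zero after taking logs
```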
Hypothesis Tests

• We've been doing hypothesis tests for single coefficients
  • H0: β_i = 0; HA: β_i ≠ 0
  • Reject H0 if |t| > t_{α/2, n-k-1}
• What about testing more than one coefficient at the same time?
  • e.g., want to see if an entire group of 10 dummy variables for 10 industries should be in the model
• Joint tests can be conducted using partial F tests

Partial F Tests

H0: β1 = β2 = β3 = … = βC = 0
HA: at least one β_i ≠ 0

• How to test this? Consider two regressions
  • One as if H0 is true, i.e., β1 = β2 = β3 = … = βC = 0
    • This is a "restricted" (or constrained) model
  • Plus a "full" (or unconstrained) model in which the computer can estimate what it wants for each coefficient

Partial F Tests

• Statistically, need to distinguish between
  • Full regression "no better" than the restricted regression, versus
  • Full regression "significantly better" than the restricted regression
• To do this, look at the variance of the prediction errors
  • If this declines significantly, then reject H0
• From ANOVA, we know the ratio of two variances has an F distribution
• So use an F test

Partial F Tests

$F = \frac{(SS_{residual}^{restricted} - SS_{residual}^{full})/C}{SS_{residual}^{full}/(n-k-1)}$

• SS_residual = sum of squares residual
• C = number of constraints
• The partial F statistic has C, n-k-1 degrees of freedom
• Reject H0 if F > F_{α, C, n-k-1}

Coal Mining Example (Again)

Regression Statistics
R Squared         0.955
Adj. R Squared    0.949
Standard Error  108.052
Obs.                 47

ANOVA
             df            SS            MS        F   Significance
Regression    6   9975694.933   1662615.822  142.406          0.000
Residual     40    467007.875     11675.197
Total        46  10442702.809

            Coeff.  Std. Error  t stat  p value  Lower 95%  Upper 95%
Intercept -168.510     258.819  -0.651    0.519   -691.603    354.583
hours        1.244       0.186   6.565    0.000      0.868      1.620
tons         0.048       0.403   0.119    0.906     -0.766      0.863
unemp       19.618       5.660   3.466    0.001      8.178     31.058
WWII       159.851      78.218   2.044    0.048      1.766    317.935
Act1952     -9.839     100.045  -0.098    0.922   -212.038    192.360
Act1969   -203.010     111.535  -1.820    0.076   -428.431     22.411

Minitab Output

Predictor      Coef     StDev        T      P
Constant     -168.5     258.8    -0.65  0.519
hours        1.2235     0.186     6.56  0.000
tons         0.0478     0.403     0.12  0.906
unemp        19.618     5.660     3.47  0.001
WWII         159.85     78.22     2.04  0.048
Act1952        -9.8     100.0    -0.10  0.922
Act1969      -203.0     111.5    -1.82  0.076

S = 108.1    R-Sq = 95.5%    R-Sq(adj) = 94.9%

Analysis of Variance
Source       DF        SS       MS       F      P
Regression    6   9975695  1662616  142.41  0.000
Error        40    467008    11675
Total        46  10442703

Is the Overall Model Significant?

H0: β1 = β2 = β3 = … = β6 = 0
HA: at least one β_i ≠ 0

• Note: for testing the overall model, C = k, i.e., testing all coefficients together
• From the previous slides, we have SS_residual for the "full" (or unconstrained) model
  • SS_residual = 467,007.875
• But what about for the restricted (H0 true) regression?
  • Estimate a constant-only regression

Constant-Only Model

Regression Statistics
R Squared             0
Adj. R Squared        0
Standard Error  476.461
Obs.                 47

ANOVA
             df            SS          MS    F   Significance
Regression    0             0           0    .              .
Residual     46  10442702.809  227015.278
Total        46  10442702.809

            Coeff.  Std. Error  t stat  p value  Lower 95%  Upper 95%
Intercept  671.937      69.499   9.668   0.0000    532.042    811.830

Partial F Tests

$F = \frac{(10{,}442{,}702.809 - 467{,}007.875)/6}{467{,}007.875/(47-6-1)} = 142.406$

H0: β1 = β2 = β3 = … = β6 = 0
HA: at least one β_i ≠ 0

• Reject H0 if F > F_{α,C,n-k-1} = F_{0.05,6,40} = 2.34
• 142.406 > 2.34, so reject H0. Yes, the overall model is significant
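The partial F recipe is simple enough to wrap in a few lines. A minimal sketch (scipy assumed) using the sums of squares reported above; the second call previews the subset test that appears a few slides below:

```python
# A minimal sketch of the partial F computation from the slides,
# using the coal-mining sums of squares reported above.
from scipy.stats import f


def partial_f(ss_restricted, ss_full, c, n, k):
    """Partial F statistic with (C, n-k-1) degrees of freedom."""
    df_denom = n - k - 1
    F = ((ss_restricted - ss_full) / c) / (ss_full / df_denom)
    crit = f.ppf(0.95, c, df_denom)  # 5% critical value
    return F, crit


# Overall model test: restricted = constant-only, so SS_restricted = SS_total.
print(partial_f(10_442_702.809, 467_007.875, c=6, n=47, k=6))  # ~ (142.4, 2.34)

# Subset test (previewed below): WWII = Act1952 = Act1969 = 0.
print(partial_f(605_358.049, 467_007.875, c=3, n=47, k=6))     # ~ (3.95, 2.84)
```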
Select F Distribution 5% Critical Values

Denominator         Numerator Degrees of Freedom
   d.f.           1      2      3      4      5      6
     1          161    199    216    225    230    234
     2         18.5   19.0   19.2   19.2   19.3   19.3
     3         10.1   9.55   9.28   9.12   9.01   8.94
     8         5.32   4.46   4.07   3.84   3.69   3.58
    10         4.96   4.10   3.71   3.48   3.33   3.22
    11         4.84   3.98   3.59   3.36   3.20   3.09
    12         4.75   3.89   3.49   3.26   3.11   3.00
    18         4.41   3.55   3.16   2.93   2.77   2.66
    40         4.08   3.23   2.84   2.61   2.45   2.34
  1000         3.85   3.00   2.61   2.38   2.22   2.11

(F_{0.05,6,40} = 2.34 is the critical value used on the previous slide.)

A Small Shortcut

• For the constant-only model, SS_residual = SS_total = 10,442,702.809
• So to test the overall model, you don't need to run a constant-only model: the full regression's ANOVA table already reports the Total sum of squares

An Even Better Shortcut

• In fact, the ANOVA table F test (F = 142.406, Significance = 0.000) is exactly the test for the overall model being significant (recall Unit 8)

Testing Any Subset

• The partial F test can be used to test any subset of variables
• For example:
  H0: β_WWII = β_Act1952 = β_Act1969 = 0
  HA: at least one β_i ≠ 0

Restricted Model

Restricted regression with WWII = Act1952 = Act1969 = 0 (Obs. 47):

ANOVA
             df            SS           MS        F   Significance
Regression    3    9837344.76  3279114.920  232.923          0.000
Residual     43    605358.049    14078.094
Total        46  10442702.809

            Coeff.  Std. Error  t stat  p value
Intercept  147.821     166.406   0.888    0.379
hours       0.0015      0.0001  20.522    0.000
tons       -0.0008      0.0003  -2.536    0.015
unemp        7.298       4.386   1.664    0.103

Partial F Tests

$F = \frac{(605{,}358.049 - 467{,}007.875)/3}{467{,}007.875/(47-6-1)} = 3.950$

H0: β_WWII = β_Act1952 = β_Act1969 = 0
HA: at least one β_i ≠ 0

• Reject H0 if F > F_{α,C,n-k-1} = F_{0.05,3,40} = 2.84
• 3.95 > 2.84, so reject H0. Yes, the subset of three coefficients is jointly significant

Regression and Two-Way ANOVA

Blocks × Treatments data:

Block    A    B    C
  1     10    9    8
  2     12    6    5
  3     18   15   14
  4     20   18   18
  5      8    7    8

"Stack" the data using dummy variables:

A  B  C  B2  B3  B4  B5  Value
1  0  0   0   0   0   0    10
1  0  0   1   0   0   0    12
1  0  0   0   1   0   0    18
1  0  0   0   0   1   0    20
1  0  0   0   0   0   1     8
0  1  0   0   0   0   0     9
0  1  0   1   0   0   0     6
0  1  0   0   1   0   0    15
0  1  0   0   0   1   0    18
0  1  0   0   0   0   1     7
0  0  1   0   0   0   0     8
…

Recall Two-Way Results

ANOVA: Two-Factor Without Replication

Source of Variation       SS   df      MS       F  P-value  F crit
Blocks               312.267    4  78.067  38.711    0.000    3.84
Treatment             26.533    2  13.267   6.579    0.020    4.46
Error                 16.133    8   2.017
Total                354.933   14

Regression and Two-Way ANOVA

    Source |       SS   df       MS        Number of obs =     15
-----------+----------------------        F(6, 8)       =  28.00
     Model |  338.800    6   56.467        Prob > F      = 0.0001
  Residual |   16.133    8    2.017        R-squared     = 0.9545
-----------+----------------------        Adj R-squared = 0.9205
     Total |  354.933   14   25.352        Root MSE      = 1.4201

-------------------------------------------------------------
 treatment |   Coef.  Std. Err.      t   P>|t|  [95% Conf. Int]
-----------+-------------------------------------------------
         b |  -2.600      .898   -2.89   0.020   -4.671   -.529
         c |  -3.000      .898   -3.34   0.010   -5.071   -.929
        b2 |  -1.333     1.160   -1.15   0.283   -4.007   1.340
        b3 |   6.667     1.160    5.75   0.000    3.993   9.340
        b4 |   9.667     1.160    8.34   0.000    6.993  12.340
        b5 |  -1.333     1.160   -1.15   0.283   -4.007   1.340
     _cons |  10.867      .970   11.20   0.000    8.630  13.104
-------------------------------------------------------------

Regression and Two-Way ANOVA

Regression excerpt for the full model:

    Source |       SS   df       MS
     Model |  338.800    6   56.467
  Residual |   16.133    8    2.017
     Total |  354.933   14   25.352

Regression excerpt for b2 = b3 = … = 0:

    Source |       SS   df       MS
     Model |   26.533    2   13.267
  Residual |  328.400   12   27.367
     Total |  354.933   14   25.352

Regression excerpt for b = c = 0:

    Source |       SS   df       MS
     Model |  312.267    4   78.067
  Residual |   42.667   10    4.267
     Total |  354.933   14   25.352

Use these SS_residual values to do partial F tests and you will get exactly the same answers as the two-way ANOVA tests (as shown in the sketch below).
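That equivalence can be verified directly. A minimal sketch (pandas and statsmodels assumed) that stacks the 5×3 table into dummy-variable form and runs the three regressions from the excerpts above:

```python
# A minimal sketch replicating the two-way ANOVA with dummy-variable
# regression, using the 5-block x 3-treatment table from the slides.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

values = {"A": [10, 12, 18, 20, 8],
          "B": [9, 6, 15, 18, 7],
          "C": [8, 5, 14, 18, 8]}
rows = [(t, b + 1, v) for t, col in values.items() for b, v in enumerate(col)]
df = pd.DataFrame(rows, columns=["treatment", "block", "value"])

# C() expands treatment and block into dummies (A and block 1 omitted).
full = smf.ols("value ~ C(treatment) + C(block)", data=df).fit()
no_blocks = smf.ols("value ~ C(treatment)", data=df).fit()
no_treat = smf.ols("value ~ C(block)", data=df).fit()

# Partial F tests reproduce the two-way ANOVA F statistics.
print(anova_lm(no_blocks, full))  # blocks:     F ~ 38.71
print(anova_lm(no_treat, full))   # treatments: F ~ 6.58
```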
Select F Distribution 5% Critical Values

Denominator         Numerator Degrees of Freedom
   d.f.           1      2      3      4      5      6      9
     1          161    199    216    225    230    234    241
     2         18.5   19.0   19.2   19.2   19.3   19.3   19.4
     3         10.1   9.55   9.28   9.12   9.01   8.94   8.81
     8         5.32   4.46   4.07   3.84   3.69   3.58   3.39
    10         4.96   4.10   3.71   3.48   3.33   3.22   3.02
    11         4.84   3.98   3.59   3.36   3.20   3.09   2.90
    12         4.75   3.89   3.49   3.26   3.11   3.00   2.80
    18         4.41   3.55   3.16   2.93   2.77   2.66   2.46
    40         4.08   3.23   2.84   2.61   2.45   2.34   2.12
  1000         3.85   3.00   2.61   2.38   2.22   2.11   1.89

(F_{0.05,4,8} = 3.84 and F_{0.05,2,8} = 4.46 match the F crit values in the two-way ANOVA table above.)

3 Seconds of Calculus

$\frac{\partial y}{\partial x}$: how y changes as x changes

$\frac{\partial \log(x)}{\partial x} = \frac{1}{x}$, so $\partial \log(x) = \frac{\partial x}{x}$, the (approximate) percent change in x

$\frac{\partial b_0}{\partial x} = 0$ if $b_0$ is a constant

$\frac{\partial (b_1 x)}{\partial x} = b_1$

Regression Coefficients

• y = b0 + b1x (linear form)
  $\frac{\partial y}{\partial x} = b_1$
  A 1-unit change in x changes y by b1
• log(y) = b0 + b1x (semi-log form)
  $\frac{\partial \log(y)}{\partial x} = \frac{\partial y / y}{\partial x} = \frac{\%\Delta y}{\Delta x} = b_1$
  A 1-unit change in x changes y by b1 (×100) percent
• log(y) = b0 + b1 log(x) (double-log form)
  $\frac{\partial \log(y)}{\partial \log(x)} = \frac{\partial y / y}{\partial x / x} = \frac{\%\Delta y}{\%\Delta x} = b_1$
  A 1-percent change in x changes y by b1 percent

Log Regression Coefficients

• wage = 9.05 + 1.39 union
  • Predicted wage is $1.39 higher for unionized workers (on average)
• log(wage) = 2.20 + 0.15 union
  • Semi-elasticity: predicted wage is approximately 15% higher for unionized workers (on average)
• log(wage) = 1.61 + 0.30 log(profits)
  • Elasticity: a one percent increase in profits increases predicted wages by approximately 0.3 percent

Multicollinearity

Auto repair records, weight, and engine size:

Number of obs =     69        R-squared     = 0.1718
F(2, 66)      =   6.84        Adj R-squared = 0.1467
Prob > F      = 0.0020        Root MSE      = .91445

----------------------------------------------
  repair |    Coef.   Std. Err.      t   P>|t|
---------+------------------------------------
  weight |  -.00017     .00038   -0.41   0.685
  engine |  -.00313     .00328   -0.96   0.342
   _cons |  4.50161     .61987    7.26   0.000
----------------------------------------------

Multicollinearity

• Two (or more) independent variables are so highly correlated that a multiple regression can't disentangle the unique contributions of each
  • Large standard errors and lack of statistical significance for individual coefficients
  • But joint significance
• Identifying multicollinearity
  • Some say "rule of thumb |r| > 0.70" (or 0.80)
  • But better to look at the results
• OK for prediction; bad for assessing theory

Prediction With Multicollinearity

Prediction at the mean (weight = 3019 and engine = 197):

Model for prediction   Predicted (Mean) Repair   Lower 95% Limit   Upper 95% Limit
Multiple Regression                      3.411             3.191             3.631
Weight Only                              3.412             3.193             3.632
Engine Only                              3.410             3.192             3.629

Dummy Dependent Variables

• y = b0 + b1x1 + … + bkxk + e, where y is a {0,1} indicator variable
• Examples
  • Do you intend to quit? yes/no
  • Did the worker receive training? yes/no
  • Do you think the President is doing a good job? yes/no
  • Was there a strike? yes/no
  • Did the company go bankrupt? yes/no

Linear Probability Model

• Mathematically / computationally, can estimate the regression as usual (the monkeys won't know the difference)
• This is called a "linear probability model"
  • The right-hand side is linear
  • And it is estimating probabilities: P(y=1) = b0 + b1x1 + … + bkxk
• b1 = 0.15 (for example) means that a one-unit change in x1 increases the probability that y = 1 by 0.15 (fifteen percentage points)

Linear Probability Model

• Excel won't know the difference, but perhaps it should
• Linear probability model problems
  • σ_e² = P(y=1)[1 − P(y=1)], but P(y=1) = b0 + b1x1 + … + bkxk, so σ_e² is not constant (heteroskedasticity is built in)
  • Predicted probabilities are not bounded by 0,1
  • R² is not an accurate measure of predictive ability
    • Can use a pseudo-R² measure, such as percent correctly predicted

Logit Model & Probit Model

• The solution to these problems is to use nonlinear functional forms that bound P(y=1) between 0 and 1
• Logit model (logistic regression):

  $P(y=1) = \frac{e^{b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_k x_k + e}}{1 + e^{b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_k x_k + e}}$

  (Recall, ln(x) = a when e^a = x)

• Probit model:

  $P(y=1) = \Phi(b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_k x_k + e)$

  where Φ is the normal cumulative distribution function

Logit Model & Probit Model

• Nonlinear, so need a statistical package to do the calculations
• Can do individual (z-tests, not t-tests) and joint statistical testing as with other regressions
  • Also confidence intervals
• Need to convert coefficients to marginal effects for interpretation
• Should be aware of these models
  • Though in many cases, a linear probability model works just fine
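Before the FMLA example, a minimal sketch contrasting the two approaches on simulated data (statsmodels assumed; this is not the FMLA survey data): the LPM slope is a probability change, the logit slope is a log-odds change, and LPM fitted values can escape [0, 1].

```python
# A minimal sketch comparing a linear probability model with a logit
# on simulated {0,1} data (illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + 1.0 * x)))   # true logistic probabilities
y = rng.binomial(1, p)

X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit(cov_type="HC1")    # robust SEs: LPM errors are heteroskedastic
logit = sm.Logit(y, X).fit(disp=0)

print(lpm.params)                    # slope = change in P(y=1) per unit x
print(logit.params)                  # slope = change in log odds per unit x
print((lpm.fittedvalues < 0).sum(),  # LPM predictions can fall outside [0, 1]
      (lpm.fittedvalues > 1).sum())
```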
Example

Dep. var.: 1 if you know of the FMLA, 0 otherwise

Probit estimates                   Number of obs =    1189
                                   LR chi2(14)   =  232.39
                                   Prob > chi2   =  0.0000
Log likelihood = -707.94377        Pseudo R2     =  0.1410
------------------------------------------------------------
 FMLAknow |   Coef.  Std. Err.      z   P>|z|  [95% Conf. Int]
----------+--------------------------------------------------
    union |    .238       .101    2.35   0.019    .039    .436
      age |   -.002       .018   -0.13   0.897   -.038    .033
    agesq |    .135       .219    0.62   0.536   -.293    .564
 nonwhite |   -.571       .098   -5.80   0.000   -.764   -.378
   income |   1.465       .393    3.73   0.000    .696   2.235
 incomesq |  -5.854      2.853   -2.05   0.040  -11.45   -.262
 [other controls omitted]
    _cons |  -1.188       .328   -3.62   0.000  -1.831   -.545
------------------------------------------------------------

Marginal Effects

• For numerical interpretation / prediction, need to convert coefficients to marginal effects
• Example: logit model

  $\log\!\left(\frac{P(y=1)}{1 - P(y=1)}\right) = b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_k x_k + e$

• So b1 gives the effect on log(•), not on P(y=1)
  • Probit is similar
• Can re-arrange to find the effect on P(y=1)
  • Usually do this at the sample means

Marginal Effects

Probit estimates                   Number of obs =    1189
                                   LR chi2(14)   =  232.39
                                   Prob > chi2   =  0.0000
Log likelihood = -707.94377        Pseudo R2     =  0.1410
------------------------------------------------------------
 FMLAknow |   dF/dx  Std. Err.      z   P>|z|  [95% Conf. Int]
----------+--------------------------------------------------
    union |    .095       .040    2.35   0.019    .017    .173
      age |   -.001       .007   -0.13   0.897   -.015    .013
    agesq |    .054       .087    0.62   0.536   -.117    .225
 nonwhite |   -.222       .036   -5.80   0.000   -.293   -.151
   income |    .585       .157    3.73   0.000    .278    .891
 incomesq |  -2.335      1.138   -2.05   0.040  -4.566   -.105
 [other controls omitted]
------------------------------------------------------------

But the Linear Probability Model is OK, Too

                  Probit Coeff.   Probit Marginal   Regression
Union                     0.238             0.095        0.084
                        (0.101)           (0.040)      (0.035)
Nonwhite                 -0.571            -0.222       -0.192
                        (0.098)           (0.037)      (0.033)
Income                    1.465             0.585        0.442
                        (0.393)           (0.157)      (0.091)
Income Squared           -5.854            -2.335       -1.354
                        (2.853)           (1.138)      (0.316)

So regression is usually OK, but you should still be familiar with logit and probit methods.
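Converting coefficients to marginal effects at the means can be done by hand or with statsmodels' built-in helper. A minimal sketch continuing the simulated logit example (not the FMLA probit); the by-hand line uses the fact that for a logit, dP/dx = b1·p·(1−p):

```python
# A minimal sketch of logit marginal effects at the sample means
# (simulated data, illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.0 * x))))

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=0)

# By hand: dP/dx = b1 * p * (1 - p), evaluated at the mean of x.
xb = fit.params @ np.array([1.0, x.mean()])
p_mean = 1 / (1 + np.exp(-xb))
print(fit.params[1] * p_mean * (1 - p_mean))

# Built-in equivalent: marginal effects at the means.
print(fit.get_margeff(at="mean").summary())
```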