Logistic Regression Analysis: Estimation and Interpretation
• Logit Estimation
  – Log likelihood estimation
• Hypothesis Tests
• Interpretation
• Reversing Logits
  – Averages
  – Types
April 18, 2006 Lecture 13 Slide #1

Logit Estimation
• Logit uses maximum likelihood estimation (MLE)
  – Counterpart to minimizing least squares
• MLE expresses the probability of obtaining the sample as a function of the model parameters (the b's):
  – What values for the b's make the sample most likely?

Logit Estimation, Continued
The probability that $Y_i = 1$ is:
\[ P_i = \frac{1}{1 + e^{-L_i}}, \quad \text{where } L_i = \beta_0 + \sum_{k=1}^{K-1} \beta_k X_{ik} \]
Each observation i ($Y_i$ plus the $X_i$'s) contributes to the maximum likelihood function (MLF) by $P_i$ if $Y_i = 1$, and by $1 - P_i$ if $Y_i = 0$. The contribution is:
\[ P_i^{Y_i} (1 - P_i)^{1 - Y_i} \]
and the ML function is
\[ \mathcal{L} = \prod_i \left\{ P_i^{Y_i} (1 - P_i)^{1 - Y_i} \right\} \]
In other words, the MLF is largest for the model that best predicts when Y=1 or Y=0: it is maximized when the predicted probabilities are close to 1 for cases where Y=1 and close to 0 for cases where Y=0.

Logit Estimation, Continued
Estimation finds the values of the b's that maximize the log likelihood:
\[ \log_e \mathcal{L} = \sum_i \left\{ Y_i \log_e P_i + (1 - Y_i) \log_e (1 - P_i) \right\} \]
The solution involves taking the first derivative of the log likelihood with respect to each of the b's, setting each derivative to zero, and solving the resulting simultaneous equations. These equations are nonlinear in the b's, so they cannot be solved directly. Instead, they are solved through an iterative estimation process that looks for successively better "fits" of the model.

Example of Logit Estimation: Radiation Protection Standards
[Figure: three hypothesized dose-response curves, cancer incidence (0 to High) against radiation dose (0 to High): 1. Linear relationship; 2. Sub-linear relationship; 3. Supra-linear relationship]

Measurement
• "Given your own knowledge of radiation effects on humans and other organisms, which of the above hypothesized relationships do you think is most likely correct?"
• "On a scale where zero means not at all certain, and ten means completely certain, how certain are you that the relationship you identified is correct?"
• "Which of these three hypothesized relationships do you think should be assumed for purposes of setting public safety standards for managing radioactive materials?"

Example of Logit Estimation, Continued
• We need a binary dependent variable
  – Focus only on those (the majority) who believe the threshold model (quad-linear) is correct
  – Predict the choice between the threshold (Quad=0) and the linear (=1) model as the basis for their preferred safety standard

    generate DR_correct = c4_28_ra
    recode DR_correct (2=0) (1=1) (3=.) (4=.)
    tabulate DR_correct

  – Now recode the D-R function that is preferred for standard setting

    generate DR_standard = c4_30_pb
    recode DR_standard (2=0) (1=1) (3=.) (4=.)
    tabulate DR_standard

Example of Logit Estimation, Continued

    . logit DR_standard DR_cert ideology sex if DR_correct==0

    Iteration 0:   log likelihood = -607.99677
    Iteration 1:   log likelihood = -584.66068
    Iteration 2:   log likelihood = -584.57097
    Iteration 3:   log likelihood = -584.57097

    Logit estimates                                   Number of obs   =        943
                                                      LR chi2(3)      =      46.85
                                                      Prob > chi2     =     0.0000
    Log likelihood = -584.57097                       Pseudo R2       =     0.0385

    ------------------------------------------------------------------------------
     DR_standard |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         DR_cert |  -.1700262   .0299573    -5.68   0.000    -.2287414    -.111311
        ideology |  -.1041382   .0512416    -2.03   0.042      -.20457   -.0037065
             sex |  -.5071621   .1980048    -2.56   0.010    -.8952443   -.1190799
           _cons |   1.229425   .3058896     4.02   0.000     .6298925    1.828958
    ------------------------------------------------------------------------------

Example of Logit Estimation, Continued
• Model "goodness of fit":

    . lstat

    Logistic model for DR_standard

                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |        59            53  |        112
         -     |       267           564  |        831
    -----------+--------------------------+-----------
       Total   |       326           617  |        943

    Classified + if predicted Pr(D) >= .5
    True D defined as DR_standard != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   18.10%
    Specificity                     Pr( -|~D)   91.41%
    Positive predictive value       Pr( D| +)   52.68%
    Negative predictive value       Pr(~D| -)   67.87%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)    8.59%
    False - rate for true D         Pr( -| D)   81.90%
    False + rate for classified +   Pr(~D| +)   47.32%
    False - rate for classified -   Pr( D| -)   32.13%
    --------------------------------------------------
    Correctly classified                        66.07%
    --------------------------------------------------

Logit Assumptions and Qualifiers
• The model is correctly specified
  – True conditional probabilities are a logistic function of the X's
  – No important X's omitted; no extraneous X's included
  – No significant measurement error
• The cases are independent
• No X is a linear function of other X's
  – Multicollinearity leads to imprecision
• Influential cases can bias estimates
• Sample size: n-K should exceed 100
  – Independent covariation is critical

Logit Hypothesis Tests
• Nested Model Tests (like F-Tests in OLS)
  – Is a more complex model a better fit?
• Test to see if the parameters for the H variables omitted from the simpler model are statistically indistinguishable from zero:
\[ \chi^2_H = -2\left( \log_e \mathcal{L}_{K-H} - \log_e \mathcal{L}_K \right) \]
• Where $\mathcal{L}_K$ is the likelihood of the complex model (K parameters), $\mathcal{L}_{K-H}$ is the likelihood of the simpler model, and the Chi-square table uses H degrees of freedom.
• If p < 0.05, the complex model fits significantly better

More Logit Hypothesis Tests
• To test the overall hypothesis that all b's other than the constant are equal to zero (like an overall F-test):
  – Compare the final log likelihood with the initial one, using the same formula:

    Initial log likelihood = -607.997
    Final log likelihood   = -584.571
    Difference             =  -23.426

\[ \chi^2_{K-1} = -2\left( \log_e \mathcal{L}_i - \log_e \mathcal{L}_f \right) = -2(-23.426) = 46.85, \quad df = K - 1 = 3 \]
  – p-value < 0.001 (see Hamilton p. 354)

Still More Logit Hypothesis Tests
• z-statistic:
  – Similar to the t-stat in OLS
  – Ratio of the estimated coefficient to its estimated standard error
  – P-value is derived from the standard normal distribution (equivalently, z squared follows a Chi-square distribution with 1 degree of freedom)
• Attached to each estimated coefficient
  – The p-value shows the probability of obtaining a z-statistic at least this extreme if the null hypothesis (b = 0) were true

Interpreting Logits
• Logits can be used to directly calculate odds:
\[ \text{odds} = \operatorname{antilog}_e \hat{L} = e^{\hat{L}} \]
• Logits can be reversed to obtain the predicted probabilities:
\[ \hat{P} = \frac{1}{1 + e^{-\hat{L}}} \]

Interpreting Logits, Continued
How would you calculate the effect of a particular independent variable, $X_i$, on the probability of Y = 1?
• Set all other $X_j$'s at their means, then calculate
\[ \hat{P} = \frac{1}{1 + e^{-\hat{L}}} \]
• Do this with $X_i$ at its minimum and at its maximum. Then calculate the difference between the two predicted probabilities.
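The min/max comparison above can be sketched in Python (used here rather than Stata only so that the example runs on its own). The coefficients are copied from the logit output for DR_standard, and the 0 to 10 range for DR_cert comes from the wording of the certainty question. The holding values for ideology and sex are hypothetical placeholders, because the sample means are not reported in these slides; replace them with the actual means before interpreting the result. The function names are illustrative.

```python
import math

# Coefficients copied from the Stata logit output for DR_standard.
COEF = {
    "_cons": 1.229425,
    "DR_cert": -0.1700262,
    "ideology": -0.1041382,
    "sex": -0.5071621,
}

# HYPOTHETICAL holding values: the slides do not report the sample means,
# so these numbers are placeholders, not results from the data.
HELD_AT = {"ideology": 4.0, "sex": 0.5}


def logit_value(x):
    """Linear predictor: L-hat = b0 + sum of b_k * X_k."""
    return COEF["_cons"] + sum(COEF[name] * value for name, value in x.items())


def probability(logit):
    """Reverse a logit: P-hat = 1 / (1 + exp(-L-hat))."""
    return 1.0 / (1.0 + math.exp(-logit))


def effect_of(name, low, high, held):
    """Predicted P at the low and high values of one X, others held fixed."""
    p_low = probability(logit_value({**held, name: low}))
    p_high = probability(logit_value({**held, name: high}))
    return p_low, p_high, p_high - p_low


# DR_cert runs from 0 (not at all certain) to 10 (completely certain).
p_min, p_max, change = effect_of("DR_cert", 0.0, 10.0, HELD_AT)
print(f"P at DR_cert=0:  {p_min:.3f}")
print(f"P at DR_cert=10: {p_max:.3f}")
print(f"Difference:      {change:.3f}")
```

Because the DR_cert coefficient is negative, the predicted probability of preferring the linear model falls as certainty rises, whatever holding values are chosen; only the size of the drop depends on them.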
Estimated Probability Effects
[Figure: median spline of the predicted probability (vertical axis, roughly .2 to .6) against DR_cert (horizontal axis, 0 to 10)]

Interpreting Logits, Continued
• Another method: "Typing"
• Calculate the logit for distinct types of observations:
  – Conservative, certain, male
  – Liberal, uncertain, female
  – (or any permutation you like)
• Homework:
  – Predict the choice between Quad and LD-HR (will require recodes)
  – Plot the effect of the risk index and ideology on probability of a shift

Coming Up...
• Chapter 7 – All
• Statistical Problems with Logit
  – Effects of assumption failures
  – Diagnostics
• Begin discussion of Factor Analysis
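As a numerical check on the overall test described under "More Logit Hypothesis Tests", the following Python sketch (again Python rather than Stata, so that it is self-contained) recomputes the likelihood-ratio statistic from the initial and final log likelihoods in the iteration log. The closed-form tail probability is valid only for a Chi-square with 3 degrees of freedom, which matches this model's three predictors. The function names are illustrative, and the pseudo R2 is computed as 1 minus the ratio of the two log likelihoods, which reproduces the value in the output.

```python
import math

# Log likelihoods copied from the Stata iteration log.
LL_INITIAL = -607.99677  # Iteration 0: constant-only model
LL_FINAL = -584.57097    # converged model with DR_cert, ideology, sex


def lr_chi2(ll_restricted, ll_full):
    """Likelihood-ratio statistic: -2 * (log L_restricted - log L_full)."""
    return -2.0 * (ll_restricted - ll_full)


def chi2_sf_df3(x):
    """Upper-tail probability of a Chi-square variable with exactly 3 df.

    Closed form for df = 3 only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def pseudo_r2(ll_restricted, ll_full):
    """One minus the ratio of the full to the restricted log likelihood."""
    return 1.0 - ll_full / ll_restricted


statistic = lr_chi2(LL_INITIAL, LL_FINAL)
p_value = chi2_sf_df3(statistic)
fit = pseudo_r2(LL_INITIAL, LL_FINAL)

print(f"LR chi2(3) = {statistic:.2f}")
print(f"p-value    = {p_value:.2e}")
print(f"Pseudo R2  = {fit:.4f}")
```

The statistic and pseudo R2 agree with the LR chi2(3) and Pseudo R2 lines of the logit output, and the tail probability is far below 0.001, consistent with the reported Prob > chi2 of 0.0000.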