Logistic regression
Sociology at University of Limerick

Binary dependent variables

OLS regression requires an interval dependent variable
Binary or “yes/no” dependent variables are not suitable
Nor are rates, e.g., n successes out of m trials
Errors are distinctly not normal
Particular difficulties with multiple explanatory variables
While the predicted value can be read as a probability, it can depart from the 0 : 1 range

Linear Probability Model

OLS gives the “linear probability model” in this case:
$\Pr(Y = 1) = \alpha + \beta X$
Data is 0/1, prediction is a probability
Assumptions are violated, but if predicted probabilities are in the range 0.2 to 0.8, not too bad
See credit card example: becomes unrealistic only at very low or high income

Logistic transformation

How can a probability be transformed to the range −∞ : ∞?
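One way to see the answer numerically is to move from probability to odds to log-odds. The following is a minimal sketch only; the function names and the probabilities in the loop are chosen for illustration and are not from the lecture.

```python
import math

def odds(p):
    """Odds p / (1 - p): maps a probability in (0, 1) to the range (0, inf)."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds: maps a probability in (0, 1) to (-inf, inf)."""
    return math.log(odds(p))

def inv_logit(value):
    """Inverse transformation: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))

# Illustrative probabilities only: the log-odds are symmetric around p = 0.5
# and grow without bound as p approaches 0 or 1.
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(f"p = {p:.2f}  odds = {odds(p):7.3f}  log-odds = {logit(p):7.3f}  "
          f"back to p = {inv_logit(logit(p)):.2f}")
```

Because the log-odds are unbounded in both directions, a linear predictor can be equated to them without ever implying a probability outside 0 : 1.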
Probability is bounded [0 : 1], while the OLS predicted value is unbounded
Odds: $\frac{p}{1-p}$, range is 0 : ∞
Log of odds: $\log\frac{p}{1-p}$ has range −∞ : ∞

Logistic regression

Logistic regression uses this as the dependent variable:
$\log\frac{\Pr(Y = 1)}{1 - \Pr(Y = 1)} = \alpha + \beta X$
Alternatively:
$\frac{\Pr(Y = 1)}{1 - \Pr(Y = 1)} = e^{\alpha + \beta X} = e^{\alpha} e^{\beta X}$
Or:
$\Pr(Y = 1) = \frac{e^{\alpha + \beta X}}{1 + e^{\alpha + \beta X}} = \frac{1}{1 + e^{-\alpha - \beta X}}$

Parameters

The β parameter is the effect of a unit change in X on $\log\frac{\Pr(Y = 1)}{1 - \Pr(Y = 1)}$
This implies a multiplicative change of $e^{\beta}$ in the odds, $\frac{\Pr(Y = 1)}{1 - \Pr(Y = 1)}$
Thus $e^{\beta}$ represents an odds ratio
But the effect of β on P depends on the level of X
Death penalty example allows us to see the link between odds ratios and estimates
See credit card example

Inference

In practice, inference is similar to OLS though based on a different logic
For each explanatory variable, $H_0: \beta = 0$ is the interesting null
$z = \frac{\hat{\beta}}{SE}$ is approximately normally distributed (large sample property)
More usually, the Wald test is used: $\left(\frac{\hat{\beta}}{SE}\right)^2$ has a $\chi^2$ distribution with one degree of freedom

Likelihood ratio tests

The “likelihood ratio” test is thought more robust than the Wald test for smaller samples
Where $l_0$ is the likelihood of the model without $X_j$, and $l_1$ that with it, the quantity
$-2\log\frac{l_0}{l_1} = -2(\log l_0 - \log l_1)$
is $\chi^2$ distributed with one degree of freedom

Nested models

More generally, $-2\log\frac{l_0}{l_1}$ tests nested models: where model 1 contains all the variables in model 0, plus m extra ones, it tests the null that all the extra βs are zero ($\chi^2$ with m df)
If we compare a model against the null model (no explanatory variables), it tests $H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$
Strong analogy with F-test in OLS

Maximum likelihood

What is this “likelihood”, and how are the values that maximise it chosen? Sometimes the values can be chosen analytically; otherwise by iterative search
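To make the likelihood concrete as a quantity, and to show how the Wald and likelihood ratio tests use it, here is a minimal pure-Python sketch. The ten data points are invented for illustration, and Newton-Raphson is assumed as the search method; actual software may search differently.

```python
import math

# Invented illustrative data (not from the lecture): covariate x, binary outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

def prob(a, b, xi):
    """Pr(Y = 1) under the model log-odds = a + b * x."""
    return 1.0 / (1.0 + math.exp(-(a + b * xi)))

def log_lik(a, b):
    """Log of the probability of observing the actual data, given a and b."""
    return sum(math.log(prob(a, b, xi)) if yi else math.log(1.0 - prob(a, b, xi))
               for xi, yi in zip(x, y))

def score_and_info(a, b):
    """First derivatives of the log-likelihood and the information matrix entries."""
    ga = gb = iaa = iab = ibb = 0.0
    for xi, yi in zip(x, y):
        p = prob(a, b, xi)
        w = p * (1.0 - p)
        ga += yi - p
        gb += (yi - p) * xi
        iaa += w
        iab += w * xi
        ibb += w * xi * xi
    return ga, gb, iaa, iab, ibb

# Iterative search (Newton-Raphson) for the estimates of model 1 (with x).
a = b = 0.0
for step in range(1, 11):
    ga, gb, iaa, iab, ibb = score_and_info(a, b)
    det = iaa * ibb - iab * iab
    a += (ibb * ga - iab * gb) / det
    b += (iaa * gb - iab * ga) / det
    print(f"step {step}: log-likelihood = {log_lik(a, b):.4f}")

# Standard error of b from the inverse information matrix at the estimates.
ga, gb, iaa, iab, ibb = score_and_info(a, b)
se_b = math.sqrt(iaa / (iaa * ibb - iab * iab))

# Model 0 (no x): the ML intercept is the log-odds of the overall proportion.
p_bar = sum(y) / len(y)
ll0 = log_lik(math.log(p_bar / (1.0 - p_bar)), 0.0)
ll1 = log_lik(a, b)

def chi2_1df_p(stat):
    """Upper-tail p-value of a chi-squared statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

wald = (b / se_b) ** 2
lr = -2.0 * (ll0 - ll1)
print(f"b = {b:.3f}, SE = {se_b:.3f}, odds ratio = {math.exp(b):.3f}")
print(f"Wald chi2 = {wald:.3f} (p = {chi2_1df_p(wald):.3f})")
print(f"LR chi2   = {lr:.3f} (p = {chi2_1df_p(lr):.3f})")
```

With only ten cases the two statistics need not agree closely, which is the small-sample situation in which the likelihood ratio test is usually preferred.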
Unlike OLS, logistic regression (and many, many other models) is estimated by maximum likelihood estimation
In general this works by choosing values for the parameter estimates which maximise the probability (likelihood) of observing the actual data
OLS can be ML estimated, and yields exactly the same results
A likelihood function is written, defining the probability of observing the actual data given parameter estimates
Differential calculus derives the values of the parameters that maximise the likelihood, for a given data set
Often, such “closed form solutions” are not possible, and the values for the parameters are chosen by a systematic computerised search (multiple iterations)
Extremely flexible, allows estimation of a vast range of complex models within a single framework

Likelihood as a quantity

Either way, a given model yields a specific maximum likelihood for a given data set
This is a probability, hence bounded [0 : 1]
Reported as log-likelihood, hence bounded [−∞ : 0]
Thus it is usually a large negative number
Where an iterative solution is used, the likelihood at each stage is usually reported, normally getting nearer 0 at each step

Tabular data

If all the explanatory variables are categorical (or have few fixed values) your data set can be represented as a table
If we think of it as a table where each cell contains n yeses and m − n noes (n successes out of m trials) we can fit grouped logistic regression:
$\log\frac{n}{m - n} = \alpha + \beta X$
n successes out of m trials implies a binomial distribution of degree m
The parameter estimates will be exactly the same as if the data were treated individually

Tabular data and goodness of fit

But unlike with individual data, we can calculate goodness of fit, by relating observed successes to predicted in each cell
If these are close we cannot reject the null hypothesis that the model is correct (i.e., you want a high p-value)
Where $l_i$ is the likelihood of the current model, and $l_s$ is the likelihood of the “saturated model”, the test statistic is
$-2\log\frac{l_i}{l_s}$
The saturated model predicts perfectly and has as many parameters as there are “settings” (cells in the table)
The test has df of number of settings less number of parameters estimated, and is $\chi^2$ distributed

Goodness of fit and accuracy of classification

Fit with individual data

Where the number of “settings” (combinations of values of explanatory variables) is large, this approach to fit is not feasible
Cannot be used with continuous covariates
Hosmer-Lemeshow statistic attempts to create an analogy
Divide sample into deciles of predicted probability
Calculate a fit measure based on observed and predicted numbers in the ten groups
Simulation shows this is approximately $\chi^2$ distributed with g − 2 df, where g is the number of groups (8 df for ten groups)
Not a perfect solution, sensitive to how the cuts are made
Pseudo-R² measures exist, but none approaches the clean interpretation as in OLS
See http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
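The tabular goodness-of-fit test against the saturated model can be sketched as below. This is an illustration under stated assumptions: the three-cell table is invented, the fit uses a hand-written Newton-Raphson search, and the p-value shortcut applies only because this example has exactly one degree of freedom.

```python
import math

# Invented table (not from the lecture): three settings of X, m trials, n successes.
X = [0, 1, 2]
m = [50, 50, 50]
n = [10, 22, 30]

def fit_grouped(X, m, n, steps=25):
    """Newton-Raphson ML fit of log-odds = a + b * X to grouped binomial data."""
    a = b = 0.0
    for _ in range(steps):
        ga = gb = iaa = iab = ibb = 0.0
        for xi, mi, ni in zip(X, m, n):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mi * p * (1.0 - p)
            ga += ni - mi * p
            gb += (ni - mi * p) * xi
            iaa += w
            iab += w * xi
            ibb += w * xi * xi
        det = iaa * ibb - iab * iab
        a += (ibb * ga - iab * gb) / det
        b += (iaa * gb - iab * ga) / det
    return a, b

def log_lik(p, m, n):
    """Binomial log-likelihood; the binomial coefficients are omitted because
    they cancel when two models are compared on the same table."""
    return sum(ni * math.log(pi) + (mi - ni) * math.log(1.0 - pi)
               for pi, mi, ni in zip(p, m, n))

a, b = fit_grouped(X, m, n)
p_model = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in X]
p_saturated = [ni / mi for ni, mi in zip(n, m)]  # reproduces every cell exactly

# Test statistic: -2 log(l_i / l_s), current model against saturated model.
deviance = -2.0 * (log_lik(p_model, m, n) - log_lik(p_saturated, m, n))
df = len(X) - 2  # number of settings less number of parameters estimated
p_value = math.erfc(math.sqrt(deviance / 2.0))  # chi-squared tail, df = 1 only

for xi, mi, ni, pi in zip(X, m, n, p_model):
    print(f"X = {xi}: observed {ni}, predicted {mi * pi:.2f}")
print(f"deviance = {deviance:.3f} on {df} df, p = {p_value:.3f}")
```

A high p-value here means the observed and predicted successes are close, so the fitted model is not rejected in favour of the saturated one.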
Predicting outcomes

Another way of assessing the adequacy of a logit model is its accuracy of classification:

                 True yes   True no
Predicted yes       a          c
Predicted no        b          d

Proportion correctly classified: $\frac{a + d}{a + b + c + d}$
Sensitivity: $\frac{a}{a + b}$; Specificity: $\frac{d}{c + d}$
False positive: $\frac{c}{a + c}$; False negative: $\frac{b}{b + d}$
Stata: estat class

Some problems

Zero cells in tables can cause problems: no yeses or no noes for particular settings
Not automatically a problem but can give rise to attempts to estimate a parameter as −∞ or +∞
If this happens, you will see a large parameter estimate and a huge standard error
In individual data, sometimes certain combinations of variables have only successes or only failures
In Stata, these cases are dropped from estimation: you need to be aware of this as it changes the interpretation (you may wish to drop one of the offending variables instead)
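The classification measures under “Predicting outcomes” can be computed directly from the four cell counts. The sketch below assumes that a and d are the correctly classified cases, b the true yeses predicted no, and c the true noes predicted yes; the function name and the counts are invented for illustration.

```python
def classification_measures(a, b, c, d):
    """Accuracy measures from a 2 x 2 classification table.

    a: predicted yes, true yes    c: predicted yes, true no
    b: predicted no,  true yes    d: predicted no,  true no
    """
    return {
        "correctly_classified": (a + d) / (a + b + c + d),
        "sensitivity": a / (a + b),      # share of true yeses predicted yes
        "specificity": d / (c + d),      # share of true noes predicted no
        "false_positive": c / (a + c),   # share of predicted yeses that are truly no
        "false_negative": b / (b + d),   # share of predicted noes that are truly yes
    }

# Invented counts, for illustration only.
measures = classification_measures(a=40, b=10, c=20, d=30)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that the false positive and false negative rates here are conditional on the prediction, not on the true outcome, so they are not simply one minus specificity and one minus sensitivity.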