Logistic regression
Sociology at University of Limerick

Binary dependent variables

Logistic regression
OLS regression requires an interval dependent variable
Binary or “yes/no” dependent variables are not suitable
Nor are rates, e.g., n successes out of m trials
Errors are distinctly not normal
While the predicted value can be read as a probability, it can depart from the 0:1 range
Particular difficulties with multiple explanatory variables

Linear Probability Model
OLS gives the “linear probability model” in this case:
$\Pr(Y = 1) = \alpha + \beta X$
Data is 0/1, prediction is probability
Assumptions violated, but if predicted probabilities are in the range 0.2–0.8, not too bad
See credit card example: becomes unrealistic only at very low or high income
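
To make the range problem concrete, here is a minimal Python sketch that fits the linear probability model by OLS to a small made-up data set. The data (a score `x` and a 0/1 outcome `y`) and the helper `ols_fit` are illustrative assumptions; this is not the credit card example the slides refer to.

```python
# Hypothetical data: x is an income score, y is 1 for "yes" and 0 for "no".
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def ols_fit(x, y):
    """Return (alpha, beta) for the simple OLS regression y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

alpha, beta = ols_fit(x, y)

# Predicted "probabilities" from the linear probability model.
predicted = [alpha + beta * xi for xi in x]

# Cases where the prediction leaves the 0:1 range.
out_of_range = [(xi, round(p, 3)) for xi, p in zip(x, predicted) if p < 0 or p > 1]
print(out_of_range)
```

In this toy data the predictions at the lowest and highest values of `x` fall outside the 0:1 range, while those in the middle stay inside it.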

Logistic transformation
Probability is bounded [0 : 1]
OLS predicted value is unbounded
How to transform probability to the −∞ : ∞ range?
Odds: $\frac{p}{1-p}$, range is 0 : ∞
Log of odds: $\log \frac{p}{1-p}$ has range −∞ : ∞

Logistic regression
Logistic regression uses this as the dependent variable:
$\log \frac{\Pr(Y = 1)}{1 - \Pr(Y = 1)} = \alpha + \beta X$
Alternatively:
$\frac{\Pr(Y = 1)}{1 - \Pr(Y = 1)} = e^{\alpha + \beta X} = e^{\alpha} e^{\beta X}$
Or:
$\Pr(Y = 1) = \frac{e^{\alpha + \beta X}}{1 + e^{\alpha + \beta X}} = \frac{1}{1 + e^{-\alpha - \beta X}}$
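
The chain from probability to odds to log odds, and back again, can be sketched in a few lines of Python. The function names `odds`, `logit` and `inv_logit` are chosen here for illustration only.

```python
import math

def odds(p):
    """Odds p / (1 - p): maps a probability in (0, 1) to (0, infinity)."""
    return p / (1 - p)

def logit(p):
    """Log of the odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(odds(p))

def inv_logit(eta):
    """Inverse of logit: maps any real number back to a probability."""
    return 1 / (1 + math.exp(-eta))

for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(f"p = {p:.2f}  odds = {odds(p):8.3f}  log odds = {logit(p):7.3f}")

# The two forms of the logistic function give the same value.
eta = 1.3
form_1 = math.exp(eta) / (1 + math.exp(eta))
form_2 = inv_logit(eta)
```

Probabilities below 0.5 give negative log odds, 0.5 gives exactly zero, and probabilities above 0.5 give positive log odds.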

Parameters
The β parameter is the effect of a unit change in X on $\log \frac{\Pr(Y=1)}{1-\Pr(Y=1)}$
This implies a multiplicative change of $e^{\beta}$ in the odds, $\frac{\Pr(Y=1)}{1-\Pr(Y=1)}$
Thus $e^{\beta}$ represents an odds ratio
But the effect of β on P depends on the level of P
See credit card example
Death penalty example allows us to see the link between odds ratios and estimates

Inference
In practice, inference is similar to OLS though based on a different logic
For each explanatory variable, $H_0 : \beta = 0$ is the interesting null
$z = \frac{\hat{\beta}}{SE}$ is approximately normally distributed (large sample property)
More usually, the Wald test is used: $\left(\frac{\hat{\beta}}{SE}\right)^2$ has a $\chi^2$ distribution with one degree of freedom
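
As a rough illustration of these quantities, the sketch below computes the z statistic, the Wald statistic, the odds ratio, and the slope of P at a few levels of P. The estimate `beta_hat` and its standard error `se` are invented numbers, not output from any example in the slides.

```python
import math

# Hypothetical estimate and standard error for one explanatory variable.
beta_hat = 0.8
se = 0.25

z = beta_hat / se                          # approximately N(0, 1) under H0: beta = 0
p_z = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value from the normal
wald = z ** 2                              # chi-squared with 1 df under H0
p_wald = math.erfc(math.sqrt(wald / 2))    # upper tail of chi-squared with 1 df
odds_ratio = math.exp(beta_hat)            # multiplicative change in the odds

def effect_on_p(beta, p):
    """Slope of Pr(Y=1) with respect to X at probability p: beta * p * (1 - p)."""
    return beta * p * (1 - p)

print(f"z = {z:.2f}, Wald = {wald:.2f}, p-value = {p_wald:.4f}")
print(f"odds ratio = {odds_ratio:.3f}")
for p in (0.1, 0.5, 0.9):
    print(f"at P = {p}: slope = {effect_on_p(beta_hat, p):.3f}")
```

The z test and the Wald test give the same p-value, since the Wald statistic is simply the square of z. The slope is largest at P = 0.5 and shrinks towards either end of the range, which is why a single β does not translate into a single effect on P.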

Likelihood ratio tests
The “likelihood ratio” test is thought more robust than the Wald test for smaller samples
Where $l_0$ is the likelihood of the model without $X_j$, and $l_1$ that with it, the quantity
$-2 \log \frac{l_0}{l_1} = -2 (\log l_0 - \log l_1)$
is $\chi^2$ distributed with one degree of freedom

Nested models
More generally, $-2 \log \frac{l_0}{l_1}$ tests nested models: where model 1 contains all the variables in model 0, plus m extra ones, it tests the null that all the extra βs are zero ($\chi^2$ with m df)
If we compare a model against the null model (no explanatory variables), it tests
$H_0 : \beta_1 = \beta_2 = \ldots = \beta_k = 0$
Strong analogy with F-test in OLS
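
A likelihood ratio test for nested models might be computed along the following lines. The two log-likelihood values are invented for illustration, and `chi2_sf` is a small helper written here (using only the standard library) for the upper tail of the χ² distribution with integer degrees of freedom; a statistics package would normally supply this.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(half))   # exact result for df = 1
    else:
        a, q = 1.0, math.exp(-half)              # exact result for df = 2
    # Step up two degrees of freedom at a time until df is reached.
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q

# Hypothetical maximised log-likelihoods of two nested models.
log_l0 = -120.5   # model 0: fewer variables
log_l1 = -115.2   # model 1: model 0 plus m extra variables
m = 2

lr = -2 * (log_l0 - log_l1)
p_value = chi2_sf(lr, m)
print(f"LR statistic = {lr:.2f} on {m} df, p-value = {p_value:.4f}")
```

A small p-value here would lead us to reject the null that both extra βs are zero.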

Maximum likelihood

Maximum likelihood estimation
What is this “likelihood”?
Unlike OLS, logistic regression (and many, many other models) is estimated by maximum likelihood estimation
In general this works by choosing values for the parameter estimates which maximise the probability (likelihood) of observing the actual data
OLS can be ML estimated, and yields exactly the same results

Iterative search
Sometimes the values can be chosen analytically
A likelihood function is written, defining the probability of observing the actual data given parameter estimates
Differential calculus derives the values of the parameters that maximise the likelihood, for a given data set
Often, such “closed form solutions” are not possible, and the values for the parameters are chosen by a systematic computerised search (multiple iterations)
Extremely flexible, allows estimation of a vast range of complex models within a single framework
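
The difference between an analytic solution and an iterative search can be seen in the simplest possible case, a logit model with an intercept only. The sketch below assumes made-up data (`k` yeses out of `n` cases) and uses Newton-Raphson as the search method; real software may use a different algorithm.

```python
import math

# Hypothetical data: k "yes" answers out of n cases, no explanatory variables.
k, n = 30, 100

def log_likelihood(alpha):
    """Log-likelihood of the intercept-only model Pr(Y=1) = 1 / (1 + e^-alpha)."""
    return k * alpha - n * math.log(1 + math.exp(alpha))

# Closed-form solution: the ML estimate of Pr(Y=1) is the sample proportion,
# so alpha is the log odds of that proportion.
alpha_analytic = math.log((k / n) / (1 - k / n))

# Iterative search (Newton-Raphson), starting from alpha = 0.
alpha = 0.0
trace = []                                   # log-likelihood at each step
for _ in range(25):
    trace.append(log_likelihood(alpha))
    p = 1 / (1 + math.exp(-alpha))
    step = (k - n * p) / (n * p * (1 - p))   # score divided by information
    alpha += step
    if abs(step) < 1e-10:
        break

for i, ll in enumerate(trace):
    print(f"iteration {i}: log-likelihood = {ll:.6f}")
print(f"iterative: {alpha:.6f}  analytic: {alpha_analytic:.6f}")
```

The search arrives at the same value as the closed-form solution, and the log-likelihood rises towards its maximum (while remaining negative) as the iterations proceed.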

Likelihood as a quantity
Either way, a given model yields a specific maximum likelihood for a given data set
This is a probability, hence bounded [0 : 1]
Reported as log-likelihood, hence bounded [−∞ : 0]
Thus it is usually a large negative number
Where an iterative solution is used, the likelihood at each stage is usually reported, normally getting nearer 0 at each step

Tabular data
If all the explanatory variables are categorical (or have few fixed values) your data set can be represented as a table
If we think of it as a table where each cell contains n yeses and m − n noes (n successes out of m trials) we can fit grouped logistic regression
$\log \frac{n}{m - n} = \alpha + \beta X$
n successes out of m trials implies a binomial distribution of degree m
The parameter estimates will be exactly the same as if the data were treated individually
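
The claim that grouped and individual data give the same estimates can be checked with a small sketch. The table of counts below is invented, and `fit_logit` is a bare-bones Newton-Raphson fit for one explanatory variable written for this illustration; individual data are treated as the special case of one trial per row.

```python
import math

def fit_logit(x, successes, trials, iterations=25):
    """ML fit of log odds = alpha + beta * x to binomial counts (Newton-Raphson)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ni, mi in zip(x, successes, trials):
            p = 1 / (1 + math.exp(-(alpha + beta * xi)))
            r = ni - mi * p              # observed minus expected successes
            w = mi * p * (1 - p)         # binomial variance
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta

# Hypothetical table: three settings of X, with n successes out of m trials each.
x_grouped = [0, 1, 2]
trials = [10, 10, 10]
successes = [2, 5, 8]

# The same data expanded to one row per individual.
x_individual, y_individual = [], []
for xi, ni, mi in zip(x_grouped, successes, trials):
    x_individual += [xi] * mi
    y_individual += [1] * ni + [0] * (mi - ni)
one_trial_each = [1] * len(x_individual)

grouped = fit_logit(x_grouped, successes, trials)
individual = fit_logit(x_individual, y_individual, one_trial_each)
print("grouped:   ", grouped)
print("individual:", individual)
```

Both calls sum the same contributions to the score and information, so the two sets of estimates agree apart from rounding error.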

Tabular data and goodness of fit
But unlike with individual data, we can calculate goodness of fit, by relating observed successes to predicted in each cell
If these are close we cannot reject the null hypothesis that the model is correct (i.e., you want a high p-value)
Where $l_i$ is the likelihood of the current model, and $l_s$ is the likelihood of the “saturated model”, the test statistic is
$-2 \log \frac{l_i}{l_s}$
The saturated model predicts perfectly and has as many parameters as there are “settings” (cells in the table)
The test has df of number of settings less number of parameters estimated, and is $\chi^2$ distributed

Goodness of fit and accuracy of classification

Fit with individual data
Where the number of “settings” (combinations of values of explanatory variables) is large, this approach to fit is not feasible
Cannot be used with continuous covariates
Hosmer-Lemeshow statistic attempts to create an analogy
  Divide sample into deciles of predicted probability
  Calculate a fit measure based on observed and predicted numbers in the ten groups
  Simulation shows this is approximately $\chi^2$ distributed with 8 df (number of groups less 2)
  Not a perfect solution, sensitive to how the cuts are made
Pseudo-R2 measures exist, but none approaches the clean interpretation as in OLS
See http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
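
For tabular data, the statistic −2 log(l_i / l_s) can be computed directly from the observed counts and the fitted probabilities in each cell. In the sketch below the counts and the fitted probabilities are invented (in practice the fitted values would come from the estimated model), and the model is assumed to have two parameters, so the test has 4 − 2 = 2 df.

```python
import math

def xlogy(a, b):
    """a * log(a / b), using the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def deviance(successes, trials, fitted):
    """-2 log(l_i / l_s): current model against the saturated model."""
    total = 0.0
    for n, m, p in zip(successes, trials, fitted):
        # Observed against expected successes, then observed against expected failures.
        total += xlogy(n, m * p) + xlogy(m - n, m * (1 - p))
    return 2 * total

# Hypothetical table with four settings.
successes = [3, 9, 12, 18]
trials = [20, 20, 20, 20]
fitted = [0.17, 0.40, 0.68, 0.87]    # assumed fitted probabilities, not a real fit

g2 = deviance(successes, trials, fitted)
df = len(trials) - 2                 # settings less parameters (alpha and beta)
p_value = math.exp(-g2 / 2)          # chi-squared upper tail; this form is exact only for 2 df
print(f"deviance = {g2:.3f} on {df} df, p-value = {p_value:.3f}")

# The saturated model reproduces the observed proportions, so its deviance is zero.
saturated = [n / m for n, m in zip(successes, trials)]
g2_saturated = deviance(successes, trials, saturated)
```

A high p-value means the observed and predicted counts are close, so the model is not rejected.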

Predicting outcomes
Another way of assessing the adequacy of a logit model is its accuracy of classification:

            Predicted yes   Predicted no
True yes    a               b
True no     c               d

Proportion correctly classified: $\frac{a+d}{a+b+c+d}$
Sensitivity: $\frac{a}{a+b}$; Specificity: $\frac{d}{c+d}$
False positive: $\frac{c}{a+c}$; False negative: $\frac{b}{b+d}$
Stata: estat class

Some problems
Zero cells in tables can cause problems: no yeses or no noes for particular settings
Not automatically a problem but can give rise to attempts to estimate a parameter as −∞ or +∞
If this happens, you will see a large parameter estimate and a huge standard error
In individual data, sometimes certain combinations of variables have only successes or only failures
In Stata, these cases are dropped from estimation; you need to be aware of this as it changes the interpretation (you may wish to drop one of the offending variables instead)
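
The classification measures can be sketched in Python as follows. The outcomes and predicted probabilities are invented, the function `classification_table` is a name chosen for this illustration, and the 0.5 cutoff is an assumption. The cells a, b, c and d follow the layout used in the slides (true outcome in rows, predicted outcome in columns).

```python
def classification_table(y_true, p_hat, cutoff=0.5):
    """Return cells (a, b, c, d): rows are true yes/no, columns predicted yes/no."""
    a = b = c = d = 0
    for y, p in zip(y_true, p_hat):
        predicted_yes = p >= cutoff
        if y == 1 and predicted_yes:
            a += 1          # true yes, predicted yes
        elif y == 1:
            b += 1          # true yes, predicted no
        elif predicted_yes:
            c += 1          # true no, predicted yes
        else:
            d += 1          # true no, predicted no
    return a, b, c, d

# Hypothetical outcomes and predicted probabilities from some fitted model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_hat = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.3, 0.2, 0.1, 0.1]

a, b, c, d = classification_table(y_true, p_hat)
correct = (a + d) / (a + b + c + d)
sensitivity = a / (a + b)       # share of true yeses predicted yes
specificity = d / (c + d)       # share of true noes predicted no
false_positive = c / (a + c)    # share of predicted yeses that are true noes
false_negative = b / (b + d)    # share of predicted noes that are true yeses

print(f"a={a} b={b} c={c} d={d}")
print(f"correctly classified = {correct:.2f}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"false positive = {false_positive:.2f}, false negative = {false_negative:.2f}")
```

Changing the cutoff trades sensitivity against specificity, so the proportion correctly classified depends on that choice as well as on the model.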