Quantitative Methods
Analyzing dichotomous dummy
variables
Logistic Regression Analysis
Like ordinary regression and ANOVA, logistic
regression is part of a category of models
called generalized linear models.
Generalized linear models were developed
to unify various statistical models (linear
regression, logistic regression, Poisson
regression). We can think of maximum
likelihood as a general algorithm to
estimate all these models.
Logistic Regression Analysis--GLM
GLM
Each outcome of the dependent variable
(that is, each Y) is assumed to be generated
from a particular distribution function in the
exponential family (normal, binomial, Poisson,
etc.)
Logistic Regression Analysis
(a diversion into probability distributions)
Normal distribution—a family of distributions, each
member of which can be defined by its mean
and variance—many physical phenomena can
be approximated well by the normal distribution.
Binomial distribution—probability distribution of the # of
successes in a sequence of Bernoulli trials (where
outcomes fall into one of two categories—i.e.,
"occurred" and "did not occur"). Note that in
large samples, if the dependent variable is not too
skewed, then the normal distribution approximates
the binomial distribution.
Logistic Regression Analysis
(a diversion into probability distributions)
Poisson distribution—expresses the probability of a #
of events occurring in a fixed period of time, if the
events occur with a known average rate, and
independently of the time since the last event.
(Note that the negative binomial distribution is
used to model event counts that are skewed. One
can also think about the "Polya" distribution, which
can be used to model occurrences of
"contagious" discrete events, such as tornado outbreaks.)
Logistic Regression—when?
Logistic regression models are appropriate for
dependent variables coded 0/1.
We only observe “0” and “1” for the
dependent variable—but we think of the
dependent variable conceptually as a
probability that “1” will occur.
Logistic Regression--examples
Some examples
Vote for Obama (yes, no)
Turned out to vote (yes, no)
Sought medical assistance in last year (yes, no)
Logistic Regression—why not OLS?
Why can’t we use OLS? After all, linear
regression is so straightforward, and (unlike
other models) actually has a “closed form
solution” for the estimates.
Logistic Regression—why not OLS?
Three problems with using OLS.
First, what is our dependent variable,
conceptually? It is the probability that y=1. But
we only observe y=0 and y=1. If we use OLS,
we'll get some predicted values that fall between 0
and 1—which is what we want—but we'll also
get predicted values that are greater than 1
or less than 0. Those make no sense as probabilities.
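To see this concretely, here is a minimal sketch (Python with statsmodels and simulated, hypothetical data; not from the slides) in which OLS fit to a 0/1 outcome produces fitted values outside [0, 1]:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # true S-shaped probability
y = rng.binomial(1, p)                  # observed 0/1 outcome

ols = sm.OLS(y, sm.add_constant(x)).fit()  # the "linear probability model"
print(ols.fittedvalues.min())  # typically below 0
print(ols.fittedvalues.max())  # typically above 1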
Logistic Regression—Why not OLS?
Three problems using OLS.
Second problem—there is heteroskedasticity in the
model. Think about the meaning of “residual”. The
residual is the difference between the observed and
the predicted Y.
By definition, what will that residual look like at the
center of the distribution? (When the predicted
probability is near .5, the observed 0 or 1 is far from
the prediction, so residuals can be large.)
By definition, what will that residual look like at the tails
of the distribution? (When the prediction is near 0 or 1,
residuals must be small.) The residual variance thus
varies with the predicted value: heteroskedasticity.
Logistic Regression—why not OLS?
Three problems using OLS.
The third problem is substantive. The reality is that
many choice functions can be modeled by an
S-shaped curve. Therefore (much as when we
discussed linear transformations of the X variable),
it makes sense to model a non-linear relationship.
Logistic Regression—but similar to
OLS....
So. We actually could correct for the
heteroskedasticity, and we could
transform the equation so that it
captured the “non-linear”
relationship, and then use linear
regression. But what we usually do....
Logistic Regression—but similar to
OLS...
...is use logistic regression to predict the
probability of the occurrence of an
event.
Logistic Regression—S-shaped curve
Logistic Regression—
S-shaped curve and Bernoulli variables
Note that the observed dependent
variable is a Bernoulli (or binary)
variable. But what we are really
interested in is predicting the
probability that an event occurs (i.e.,
the probability that y=1).
Logistic Regression--advantage
Logistic regression is particularly handy
because (unlike, say, discriminant
analysis) it makes no assumptions about
how the independent variables are
distributed. They can be continuous or
categorical, and they need not be normally
distributed—they can take any form.
Logistic Regression—
exponential values and natural logs
Note—"exp" is the exponential function; ln is
the natural log. They are inverse functions.
When we take the exponential function of a
number, we raise e (about 2.718) to the power of
that number. So, exp(3) = e * e * e, which is
about 20.09.
If we take ln(20.09), we get back the number 3.
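A quick check of this inverse relationship (a Python sketch, not from the slides):

import numpy as np

print(np.exp(3))              # e**3 = 20.0855...
print(np.log(np.exp(3)))      # ln undoes exp: 3.0
print(np.exp(np.log(20.09)))  # and exp undoes ln: 20.09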
Logistic Regression--transformation

Note that you can think of logistic regression in terms of
transforming the dependent variable so that it fits an S-shaped
curve. Note that the odds are the probability that a case
will be a 1 divided by the probability that it will not be a 1. The
natural log of the odds is the "logit", and it is a linear
function of the x's (that is, of the right-hand side of the model).
Logistic Regression--transformation
Note that you can equivalently talk about
modelling the probability that y=1 (theta,
below); the two expressions below are
the same mathematically:
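In standard notation (a reconstruction of the usual expressions, with theta denoting the probability that y=1):

\ln\!\left(\frac{\theta}{1-\theta}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k

\theta = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}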
Logistic Regression
Note that the independent variables are not
linearly related to the probability that y=1.
However, the independent variables are
linearly related to the logit of the
dependent variable.
Logistic Regression--recap
Logistic regression analysis, in other words, is
very similar to OLS regression, just with a
transformation of the regression formula. We
also rely on the binomial distribution to
conduct the tests.
Logistic Regression—Model fit
Recall that in OLS, we minimized the
squared residuals in order to find the
line that best fit the data.
In logistic regression analysis, we use a
calculus-based estimation method called
maximum likelihood.
Logistic Regression—MLE
Through an iterative process, it finds the
estimates that maximize our ability to
predict the probability of y based on what
we know about x. In other words, ML will
find the best values for the estimated
effects of party, ideology, sex, race, etc.
to predict the likelihood that someone
will vote for Obama.
Logistic Regression Analysis--iteration
In other words, MLE starts with an initial (arbitrary)
guesstimate of what the coefficients will be, and
then determines the direction and size of the change
that will increase the log likelihood (goodness of
fit—that is, how likely it is that the observed values
of the dependent variable can be predicted from
the observed values of the independent
variables).
Logistic Regression Analysis--iteration
After estimating an initial function,
the program continues estimating
with new values to reach an
improved function—until convergence
is reached (that is, until the log likelihood,
or the goodness of fit, no longer
changes significantly).
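As an illustration (a sketch in Python's statsmodels with simulated data; the slides themselves discuss SAS, Stata, and Excel, so Python here is purely illustrative), fitting a logit model runs exactly this kind of iterative search and reports the result at convergence:

import numpy as np
import statsmodels.api as sm

# simulated data: two predictors and a 0/1 outcome
rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(500, 2)))
true_beta = np.array([0.2, 1.0, -0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

# fit() iterates from starting values; by default it prints the
# convergence message and the number of iterations used
result = sm.Logit(y, X).fit()
print(result.params)  # coefficient estimates at convergence
print(result.llf)     # log likelihood at convergence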
Logistic Regression--tests
There are two main forms of the
likelihood ratio test for goodness of
fit.
Logistic Regression--tests
1. Test of the overall model (model chi-square test).
Compares the researcher's model to a reduced
model (the baseline model with the constant only).
A well-fitting model is significant at the .05 level or
better—that is, a well-fitting model is one that fits
the data better than a model with only the
constant. A finding of significance means that one
can reject the null hypothesis that all of the
predictor effects are zero (this is equivalent to the
F test in OLS).
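A sketch of this test in Python's statsmodels (simulated data; illustrative only). The fitted model object carries the model chi-square and its p-value directly:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.2, 1.0, -0.5]))))

result = sm.Logit(y, X).fit(disp=False)
print(result.llr)         # model chi-square: 2 * (LL of model - LL of constant-only model)
print(result.llr_pvalue)  # small p -> reject H0 that all predictor effects are zero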
Logistic Regression--tests
2. Test of individual model parameters.
(Note that the Wald statistic has a chi-squared distribution, but other than
that, it plays the same role as the "t" test we use in OLS.)
You can also calculate a likelihood ratio statistic. Essentially, one is
comparing the goodness of fit of the overall model with the
goodness of fit of a "nested" model that drops an independent
variable. (This is generally considered preferable to the Wald statistic
if the coefficient values are very large.)
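A sketch of the nested-model likelihood ratio test (again Python/statsmodels with simulated data, purely illustrative):

import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.2, 1.0, -0.5]))))

full = sm.Logit(y, X).fit(disp=False)
nested = sm.Logit(y, X[:, :2]).fit(disp=False)  # same model minus the last predictor

lr_stat = 2 * (full.llf - nested.llf)   # likelihood ratio statistic
p_value = stats.chi2.sf(lr_stat, df=1)  # chi-square with 1 df (one parameter dropped)
print(lr_stat, p_value)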
Logistic Regression--interpretation
Most commonly: with all other variables held
constant, there is a constant increase of b1
in the logit of p for every 1-unit increase in x1.
But remember that the right-hand
side of the model is linearly related to the logit
(that is, to the natural log of the odds).
What does that mean for the actual probability
that y=1?
Logistic Regression
It's fairly straightforward—it's
multiplicative.
If b1 takes the value of 2.3 (and exp(2.3) is
roughly 10), then when x1 increases by 1, the
odds that the dependent variable takes the
value of 1 increase roughly tenfold.
Logistic Regression—presentation
Likewise, it's difficult to explain to the reader what the
parameter estimates mean—because they reflect
changes in the logit (the natural log of the odds)
for each one-unit change in x.
But what you want to tell your readers is how much the
probability that y=1 changes (given a 1-unit change in
x).
Logistic Regression—transform back
So, you need to transform into predicted probabilities.
Create predicted y's (just as you would in OLS: predicted
y = a + b1x1 + b2x2 + ...).
And then transform:
exp(predicted y) / (1 + exp(predicted y)) = predicted probability
(Many software packages will do this for you. See Gary King. Or,
if you are fond of rotary dial phones, create your own Excel
file to do this (which has the advantage of flexibility).)
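A sketch of the transformation in Python/statsmodels (simulated data; illustrative). The manual exp/(1+exp) step reproduces what the package's own predict method returns:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.2, 1.0, -0.5]))))

result = sm.Logit(y, X).fit(disp=False)
yhat = X @ result.params                   # predicted logits: a + b1*x1 + b2*x2
probs = np.exp(yhat) / (1 + np.exp(yhat))  # transform back to probabilities
print(np.allclose(probs, result.predict(X)))  # True: matches the built-in transform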
Logistic Regression—logit v. probit
What’s the difference? Well, MLE
requires assumptions about the
probability distribution of the errors—
logistic regression uses the standard
logistic probability distribution,
whereas probit uses the standard
normal distribution.
Logistic Regression—logit v. probit
Logit is more common. And note that
logit and probit often give the same
results.
But note that there can be differences
between the two link functions—see
the paper by Hahn and Soyer.
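A sketch of the comparison (Python/statsmodels, simulated data; illustrative). The two link functions give nearly identical predicted probabilities; the coefficients sit on different scales, with logit estimates typically about 1.6 to 1.8 times the probit estimates:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.2, 1.0, -0.5]))))

logit = sm.Logit(y, X).fit(disp=False)
probit = sm.Probit(y, X).fit(disp=False)
print(logit.params / probit.params)  # ratios typically around 1.6-1.8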
Logistic Regression—ordered logit
Ordered models assume there's some
underlying, unobservable true outcome variable,
occurring on an interval
scale.
We don't observe that interval-level information about
the outcome, but only whether that unobserved value
crosses some threshold(s) that put the outcome into
a lower or a higher category, categories which are
ranked, revealing ordinal but not interval-level
information.
Logistic Regression—ordered logit
If you are using ordered logit, you will
get results that include “cut points”
(intercepts) and coefficients.
OLR essentially estimates multiple
equations—one fewer than the number
of categories on one's scale.
Logistic Regression—ordered logit
For example, assume that you have a 4 point
scale, 1=not at all optimistic, 2=not very
optimistic, 3=somewhat optimistic, and
4=very optimistic.
The first equation compares the likelihood that
y=1 to the likelihood that y does not equal 1 (that
is, y=2, 3, or 4).
Logistic Regression—ordered logit
The second equation compares the
likelihood that y=1 or 2 to the
likelihood that y=3 or 4.
The third equation compares the
likelihood that y=1, 2, or 3 to the
likelihood that y=4.
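A sketch of an ordered logit in Python's statsmodels (simulated data standing in for a hypothetical 4-point scale; illustrative only). The output shows one slope per predictor plus the cut points between adjacent categories:

import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(6)
x = rng.normal(size=500)
latent = 1.0 * x + rng.logistic(size=500)       # unobserved interval-level outcome
y = np.digitize(latent, bins=[-1.0, 0.0, 1.0])  # thresholds -> 4 ordered categories

y_ordered = pd.Series(pd.Categorical(y, ordered=True))
model = OrderedModel(y_ordered, x[:, None], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())  # one slope for x plus three cut points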
Logistic Regression—ordered logit
Note that OLR only reports one
parameter estimate for each
independent variable. That is, it
constrains the parameter estimates
to be constant across categories.
Logistic Regression—ordered logit
It assumes that the coefficients for the
variables would not vary if one
actually separately estimated the
different equations.
Logistic Regression—ordered logit
(Note that in Stata one can actually test whether this
assumption holds—it is often called the proportional
odds, or parallel regressions, assumption—without
running the separate models. There's some parallel here
to the non-linearity issue we discussed last
week, where OLS assumes that your
independent variable is linearly related to the
dependent variable—but you can actually
break apart the independent variable to test
whether that is true.)
Logistic Regression—ordered logit
The results also give you intercepts. (Check to see
how these are coded—they generally mean
the same thing, but the directions of the
parameters are different in SAS versus Stata,
just as an example. SAS also models y=0 in
a regular logistic regression, so you need to flip
the signs to get the more intuitive results.)
Multinomial Analyses
Multinomial logit can be used when
categories of the dependent variable
cannot be ordered in a meaningful way.
One category is chosen as the "comparison
category", and the beta coefficient (b)
represents the change in the log of the odds of
being in a given dependent-variable category
relative to the comparison category (for a one-unit
change in the right-hand side variables).
Multinomial Analyses
The model:
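In standard notation (a reconstruction of the usual multinomial logit formula, with category J as the comparison category and its coefficient vector normalized to zero):

P(y = j \mid x) = \frac{e^{x'\beta_j}}{1 + \sum_{k=1}^{J-1} e^{x'\beta_k}} \quad (j = 1, \ldots, J-1)

P(y = J \mid x) = \frac{1}{1 + \sum_{k=1}^{J-1} e^{x'\beta_k}}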
Multinomial Analyses
Multinomial logit is simple to estimate—and is often
used.
However, it is appropriate only if the introduction or
removal of a choice has no effect on the relative
probabilities of choosing each of the others (the
"independence of irrelevant alternatives" assumption).
For example—Perot versus Clinton versus Bush, 1992.
Does removing Perot from the equation mean that
the probability of choosing Clinton relative to the
probability of choosing Bush changes? If so,
multinomial logit is inappropriate.
Multinomial Analyses
Multinomial probit does not require
the assumption that choices are
independent across alternatives.
And, though it demands a great deal
of computing resources, recent
advances mean that it is increasingly
practical to use.
Multinomial Analyses
So, multinomial probit is often
recommended.
Dow and Endersby (2004) point out,
however, that the choice of a model really
depends on how you see the underlying
choice process that generated the observed
data. In reality, neither model (MNP or
MNL) will be clearly advantageous.
Multinomial Analyses
And Dow and Endersby argue that MNP
sometimes "fails to converge at a global
optimum". Put simply, they argue that MNP
often comes up with imprecise estimates—
that is, there are multiple sets of estimates
that fit the data equally well.
Two studies compare the MNP and MNL
models: Alvarez and Nagler (2001) and
Quinn et al. (1999). Alvarez and Nagler
argue for MNP; Quinn et al. are more
agnostic.
Multinomial logit
Also, conditional logit: conditional
logit includes variables that describe the
alternatives themselves (characteristics of
the options being chosen), rather than
characteristics of the chooser.