Warsaw Summer School 2015, OSU Study Abroad Program
Advanced Topics:
Interaction
Logistic Regression
Interaction term
An interaction means that the effect of one variable is
different for different types of individuals, e.g. Males and
Females.
If we think that Males react differently to pain and this
reaction has different effect on Y than in the case of
Females we would estimate the following model:
Y = b0 + b1*pain + b3*male*pain + b2*male
which can be rewritten as:
Y = b0 + (b1 + b3*male)*pain + b2*male
Now the effect of pain for females (male==0) is
b1 + b3*0 = b1.
The effect of pain for males (male==1) is
b1 + b3*1 = b1 + b3.
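These two slopes can be checked numerically. In the sketch below the model form comes from the slide, but the coefficient values are made up for illustration:

```python
# Hypothetical coefficients for Y = b0 + b1*pain + b2*male + b3*male*pain
b0, b1, b2, b3 = 2.0, 1.5, 0.5, -0.8

def predicted_y(pain, male):
    """Predicted Y under the interaction model."""
    return b0 + b1 * pain + b2 * male + b3 * male * pain

# Effect of a one-unit increase in pain, by sex:
effect_female = predicted_y(1, 0) - predicted_y(0, 0)
effect_male = predicted_y(1, 1) - predicted_y(0, 1)
print(effect_female)          # 1.5 = b1
print(round(effect_male, 6))  # 0.7 = b1 + b3
```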
Interaction
• Note that male*pain is nonzero only when the individual is
actually a male with pain.
• Thus, the coefficient on this product term represents an
effect of pain that is unique to males and absent for
females. This is an interaction.
Example with dummies
Dependent Variable Y – exam scores
C = Coffee (yes = 1, no = 0)
R = Chocolate (yes = 1, no = 0)
Some take Coffee, some Chocolate, and some both. In a
regression, the "both Coffee and Chocolate" variable, C*R,
would be referred to as "the interaction of Coffee and
Chocolate".
Regression:
Y = 50 + 10C + 20R – 3C*R
Interpretation
• If either C or R is zero, C*R equals zero; if both Coffee
and Chocolate are 1, then C*R equals one. That is exactly
what we want for comparing the effect of the interaction.
In a regression result, the simplest way to interpret the
coefficient of a dummy variable is: "what happens when
you change the value from 0 to 1 and leave all the other
variables the same." However, note that C*R = 1
implies that C = 1 and R = 1.
Combination of dummies
There are four possible combinations for C, R, and C*R:
1. C = 0, R = 0, C*R = 0
2. C = 1, R = 0, C*R = 0
3. C = 0, R = 1, C*R = 0
4. C = 1, R = 1, C*R = 1
Interpretation of these situations
Diminishing return
Y = 50 + 10C + 20R – 3C*R
Even though the coefficient of the interaction is negative,
Coffee and Chocolate together might be a positive thing.
Taking both Coffee and Chocolate, you score 27 points higher.
What the –3 is telling you is that there are diminishing returns
to taking both.
You might think that, since Coffee improves you by 10, and
Chocolate improves you by 20, that, if you take both, you'll
improve by 30. That is not right.
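The four combinations can be checked by plugging them directly into the slide's equation (a direct evaluation, nothing estimated):

```python
def score(C, R):
    """Exam score from the slide's model: Y = 50 + 10*C + 20*R - 3*C*R."""
    return 50 + 10 * C + 20 * R - 3 * C * R

print(score(0, 0))  # 50: neither
print(score(1, 0))  # 60: coffee only (+10)
print(score(0, 1))  # 70: chocolate only (+20)
print(score(1, 1))  # 77: both (+27, not +30 -- diminishing returns)
```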
Interpreting Parameters with
Interaction Terms
An interaction term is a term composed of the product of two
characteristics. For example:
Income explained by gender and education
Interaction term: Female*Education.
• Why are interaction terms used?
• Different slopes for men and women!
Eg
Income = a + b1F + b2educ + b3F*educ
The parameter on the interaction term, b3, tells us the
difference between the male slope and the female slope for
income.
Parameters
Suppose we estimate parameters using regression for the
following two models:
• Income = a1 + g*educ for men
• Income = a2 + d*educ for women
And then we estimate the parameters of a third model on
pooled data:
• Income= a + b1F + b2educ + b3(F*educ)
It turns out that:
a = a1
b1 = a2 – a1
b2 = g
b3 = d - g
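These identities can be verified numerically. The sketch below uses made-up, noise-free data (so the equalities hold exactly) and NumPy's least-squares solver to fit the two separate models and the pooled model:

```python
import numpy as np

# Made-up, noise-free data so the identities hold exactly:
# men:   Income = 10 + 2*educ   (a1 = 10, g = 2)
# women: Income = 16 + 3*educ   (a2 = 16, d = 3)
educ = np.array([8.0, 10.0, 12.0, 14.0, 16.0])
inc_men = 10 + 2 * educ
inc_women = 16 + 3 * educ

def fit(X, y):
    """Ordinary least-squares coefficients for y = X @ beta."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones_like(educ)
a1, g = fit(np.column_stack([ones, educ]), inc_men)
a2, d = fit(np.column_stack([ones, educ]), inc_women)

# Pooled model: Income = a + b1*F + b2*educ + b3*(F*educ)
F = np.concatenate([np.zeros(5), np.ones(5)])   # 0 = man, 1 = woman
E = np.concatenate([educ, educ])
y = np.concatenate([inc_men, inc_women])
a, b1, b2, b3 = fit(np.column_stack([np.ones(10), F, E, F * E]), y)

print(np.allclose([a, b1, b2, b3], [a1, a2 - a1, g, d - g]))  # True
```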
Logistic regression
Regression and dummy DV: I
• What we want to predict from a knowledge of relevant
independent variables is not a precise numerical value of
a dependent variable, but rather the probability (p) that
it is 1 (event occurring) rather than 0 (event not
occurring). This means that, while in linear regression
the relationship between the dependent and the
independent variables is linear, this assumption is not
made in logistic regression. Instead, the logistic
regression function is used.
• Why not use ordinary regression? The predicted
values could become greater than one or less than zero.
Such values are theoretically inadmissible.
Regression and dummy DV: II
• One of the assumptions of regression is that the variance of
Y is constant across values of X. This cannot be the case
with a binary variable, because the variance is pq. When
50 percent of the people are 1s, then the variance is .25, its
maximum value. As we move to more extreme values, the
variance decreases. When P=.10, the variance is .1*.9 = .09,
so as P approaches 1 or zero, the variance approaches zero.
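The variance p*q can be evaluated directly for the values mentioned above:

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable: p * q, where q = 1 - p."""
    return p * (1 - p)

print(bernoulli_variance(0.50))            # 0.25 -- the maximum, at p = .50
print(round(bernoulli_variance(0.10), 4))  # 0.09
print(round(bernoulli_variance(0.01), 4))  # 0.0099 -- shrinks toward 0
```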
Regression and dummy DV: III
• The significance testing of the b weights rest upon the
assumption that errors of prediction (Y-Y') are normally
distributed. Because Y only takes the values 0 and 1, this
assumption is pretty hard to justify, even approximately.
Therefore, the tests of the regression weights are suspect if
you use linear regression with a binary DV.
Odds and log odds
• Suppose we only know a person's education and we want
to predict whether that person voted (1) or not voted (0) in
the last election. We can talk about the probability of
voting, or we can talk about the odds of voting. Let's say
that the probability of voting at a given education is .90.
Then the odds would be
• Odds = p / (1 – p)
or
Odds = p / q, where q = 1 – p
• (Odds can also be found by counting the number of people
in each group and dividing one number by the other.
Clearly, the probability is not the same as the odds.)
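Both ways of getting the odds (from the probability, or from group counts) can be sketched as:

```python
def odds(p):
    """Odds corresponding to probability p: p / (1 - p)."""
    return p / (1 - p)

print(round(odds(0.90), 6))  # 9.0 -- a .90 probability means 9-to-1 odds

# Odds by counting: e.g. 90 voters and 10 non-voters in the group
print(90 / 10)  # 9.0
```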
Odds and log odds
• In our example, the odds would be .90/.10, or 9 to one. Now
the odds of not voting would be .10/.90, or 1/9, or .11. This
asymmetry is unappealing, because the odds of voting
should be the opposite of the odds of not voting.
• We can take care of this asymmetry through the natural
logarithm, ln.
• The natural log of 9 is 2.197 (ln(.9/.1) = 2.197). The natural
log of 1/9 is –2.197 (ln(.1/.9) = –2.197), so the log odds of
voting is exactly opposite to the log odds of not voting.
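The symmetry can be confirmed with Python's `math.log`:

```python
import math

def log_odds(p):
    """Log odds of p: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

lo_vote = log_odds(0.90)      # ln(9)
lo_not_vote = log_odds(0.10)  # ln(1/9)
print(round(lo_vote, 3))      # 2.197
print(round(lo_not_vote, 3))  # -2.197  -- exact opposite
```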
Natural logarithm
• The natural logarithm is the logarithm to the base e, where
e is a constant approximately equal to 2.7. The natural
logarithm is generally written as ln(x), or sometimes, if the
base of e is implicit, as log(x).
• The natural logarithm of a number x (written as ln(x)) is
the power to which e would have to be raised to equal x.
For example, ln(7.389...) is 2, because e^2 = 7.389.... The
natural log of e itself (ln(e)) is 1 because e^1 = e, while the
natural logarithm of 1 (ln(1)) is 0, since e^0 = 1.
Ln
• Note that the natural log is zero when X is 1. When X is
larger than one, the log curves up slowly. When X is less
than one, the natural log is less than zero, and decreases
rapidly as X approaches zero. When P = .50, the odds are
.50/.50, or 1, and ln(1) = 0. If P is greater than .50, ln(P/(1–P))
is positive; if P is less than .50, ln(odds) is negative. [A
number taken to a negative power is one divided by that
number, e.g. e^–10 = 1/e^10. A logarithm is an exponent from
a given base, for example ln(e^10) = 10.]
Logistic regression
ln(p / (1 – p)) = a + b1*X1 + b2*X2
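Solving this equation for p gives the logistic (inverse-logit) function, p = 1 / (1 + e^–(a + b1*X1 + b2*X2)). A minimal sketch, with made-up coefficient values:

```python
import math

# Hypothetical coefficients for ln(p / (1 - p)) = a + b1*X1 + b2*X2
a, b1, b2 = -4.0, 0.25, 0.5

def prob(x1, x2):
    """Invert the log odds: p = 1 / (1 + e^(-z)), z = a + b1*x1 + b2*x2."""
    z = a + b1 * x1 + b2 * x2  # the log odds (logit)
    return 1 / (1 + math.exp(-z))

print(round(prob(12, 1), 3))  # 0.378
# Unlike linear regression, predictions never leave the (0, 1) interval:
print(0 < prob(-100, 0) < 1 and 0 < prob(100, 0) < 1)  # True
```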