Logistic Regression
Part I - Introduction
Logistic Regression
• Regression where the response variable is
dichotomous (not continuous)
• Examples
– effect of concentration of drug on whether
symptoms go away
– effect of age on whether or not a patient
survived treatment
– effect of negative cognitions about SELF,
WORLD, or Self-BLAME on whether a
participant has PTSD
Simple Linear Regression
• Relationship between continuous
response variable and continuous
explanatory variable
• Example
– Effect of concentration of drug on reaction
time
– Effect of age of patient on number of years of
post-operation survival
Simple Linear Regression
• RT (ms) = β0 + β1 x concentration (mg)
• β0 is the value of RT when concentration is 0
• β1 is the change in RT caused by an increase in concentration of 1mg
• E.g. RT = 400 + 50 x concentration
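A minimal sketch of the example line above, assuming only the slide's values β0 = 400 ms and β1 = 50 ms per mg. It shows that β0 is the prediction at 0mg and that each extra mg adds β1 to the predicted RT.

```python
# Example line from the slide: RT = 400 + 50 x concentration
BETA0 = 400.0  # intercept: predicted RT (ms) at 0 mg
BETA1 = 50.0   # slope: change in RT (ms) per extra mg

def predicted_rt(concentration_mg: float) -> float:
    """Predicted reaction time (ms) for a given drug concentration (mg)."""
    return BETA0 + BETA1 * concentration_mg

for mg in (0, 1, 2, 4):
    print(f"{mg} mg -> {predicted_rt(mg):.0f} ms")
```

Each 1mg step raises the prediction by exactly 50 ms, which is what "β1 is the change in RT for a 1mg change" means.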
Logistic Regression
• What do we do when we have a response variable which is not continuous, but dichotomous?
[Figures: Probability of Disease, Odds of Disease, and Log(Odds) of Disease, each plotted against Concentration]
Odds
• Odds are simply the ratio of the
proportions for the two possible
outcomes.
• If p is the proportion for one outcome, then 1 – p is the proportion for the second outcome.
Odds (Example)
• At concentration level 16 we observe 75
participants out of 100 showing no disease
(healthy)
• If p is the probability of being healthy, then p = 0.75
• Then 1 – p is the probability of not being healthy, and equals 0.25
• Odds of being healthy rather than not healthy, given concentration level 16:
• p / (1 – p) = 0.75/0.25 = 3
• This means that, at concentration level 16, a person is 3 times more likely to be healthy than not healthy
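The odds calculation above can be sketched as two small helper functions (hypothetical names, not part of SPSS): one turns a probability into odds, the other turns odds back into a probability.

```python
def odds(p: float) -> float:
    """Odds of an outcome with probability p, i.e. p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)

def probability_from_odds(o: float) -> float:
    """Invert the odds: p = odds / (1 + odds)."""
    return o / (1.0 + o)

# 75 of 100 participants are healthy at concentration level 16
p_healthy = 75 / 100
print(odds(p_healthy))             # 3.0
print(probability_from_odds(3.0))  # 0.75
```

Odds of 1 mean both outcomes are equally likely; odds above 1 favour the outcome whose probability is p.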
Logarithms
• Logarithms are a way of expressing
numbers as powers of a base
• Example
– $10^2 = 100$
– 10 is called the “base”
– The power, 2 in this case, is called the “exponent”
• Therefore $10^2 = 100$ means that $\log_{10} 100 = 2$
Log Odds
• Odds of being healthy after 16mg of drug are 3
• Log odds is ln(3) ≈ 1.1 (in logistic regression "log" means the natural logarithm, with base e ≈ 2.718, not base 10)
• Let's say that the odds of being healthy after 2mg of drug are 0.25
• This means that it is four times (1/0.25 = 4) more likely not to be healthy after 2mg of drug
• Log odds is ln(0.25) ≈ -1.39
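The log-odds values above come from the natural logarithm (base e), not the base-10 logarithm from the previous slide. This short check, using only Python's standard `math` module, reproduces them and shows what base 10 would give instead.

```python
import math

def log_odds(odds: float) -> float:
    """Natural-log (base e) log odds, the convention behind the values on this slide."""
    return math.log(odds)

print(round(log_odds(3), 2))     # 1.1
print(round(log_odds(0.25), 2))  # -1.39
print(1 / 0.25)                  # 4.0: not healthy is four times as likely
print(round(math.log10(3), 2))   # 0.48: base 10 would give a different value
```

Odds above 1 give positive log odds, odds below 1 give negative log odds, and odds of 4 and 1/4 give log odds of equal size and opposite sign.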
Logistic Regression
• With log odds we can now model a linear relationship between a dichotomous response variable and a continuous explanatory variable
$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \beta_0 + \beta_1 X$
where, for example, $\hat{p}$ is the estimated probability of being healthy at drug concentration $X$
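Because the model is linear on the log-odds scale, a prediction has to be mapped back to a probability. A minimal sketch of that round trip follows; the coefficients β0 = 0 and β1 = 0.2 are hypothetical, chosen only for illustration.

```python
import math

def logit(p: float) -> float:
    """Log odds (natural log) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Map log odds back to a probability: p = 1 / (1 + e^(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_probability(beta0: float, beta1: float, x: float) -> float:
    """p-hat from the model log(p / (1 - p)) = beta0 + beta1 * x."""
    return inv_logit(beta0 + beta1 * x)

# Hypothetical coefficients: the prediction always stays strictly between 0 and 1
for x in (-50, 0, 50):
    print(x, predicted_probability(0.0, 0.2, x))
```

This is why the log-odds scale is used: a straight line in X can take any value, but after `inv_logit` it always gives a valid probability.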
Example: Simple Logistic
Regression
• Look at the effect of drug concentration on
probability of NOT having disease (i.e.
being healthy)
• Use SPSS to do the regression (we’ll all do
this soon)
• Get
$\log\left(\frac{\hat{p}_{\text{nodisease}}}{1-\hat{p}_{\text{nodisease}}}\right) = -2.92 + 0.106 \times \text{Concentration}$
What the Fitted Model Looks Like
$\log\left(\frac{\hat{p}_{\text{nodisease}}}{1-\hat{p}_{\text{nodisease}}}\right) = -2.92 + 0.106 \times \text{Concentration}$
• Interpreting the parameters (b0 and b1) in logistic regression is a little tricky
• An increase of 1mg in concentration increases the log(odds) of being healthy by 0.106
• An increase of 1mg in concentration therefore multiplies the odds of being healthy by
$e^{b_1} = e^{0.106} \approx 1.11$
• Increasing concentration by 1mg increases the odds of being healthy by a factor of 1.11 (i.e. by about 11%)
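A quick check of the interpretation above, in plain Python. The intercept is taken as -2.92 here (an assumption about the sign in the fitted equation); the odds ratio depends only on the slope, so the check holds for any intercept.

```python
import math

B0 = -2.92  # intercept (assumed sign): log odds of being healthy at 0 mg
B1 = 0.106  # slope: change in log odds per extra mg

def odds_healthy(conc_mg: float) -> float:
    """Odds of being healthy implied by log(odds) = B0 + B1 * concentration."""
    return math.exp(B0 + B1 * conc_mg)

odds_ratio = math.exp(B1)
print(round(odds_ratio, 3))  # 1.112

# The ratio of the odds one mg apart is the same at every concentration
for c in (0, 10, 20):
    print(c, round(odds_healthy(c + 1) / odds_healthy(c), 3))
```

The constant ratio is why a single number, $e^{b_1}$, summarises the effect of the drug: the intercept cancels when you divide the two odds.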
Slope Parameter
• Parameter β1 in general:
– if positive, then increasing X increases the odds of the outcome whose probability is p
– if negative, then increasing X decreases those odds
– the larger its magnitude, the larger the effect of X on p
• As in simple linear regression, we can test whether or not β1 is significantly different from 0.
Let’s break to do simple Logistic
Regression
• Open XYZ.sav in SPSS
• Fit logistic regression with
– PTSD (Y/N) as response variable
– Self-BLAME as explanatory variable
• Is the effect of Self-BLAME significant?
• Get the parameter estimates
• Write the equation of the model
• What are the odds of having PTSD given a Self-BLAME score of 3?
• Use the interpretation of the regression coefficient to work out the odds given a Self-BLAME score of 4.
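The last two exercise steps can be sketched as follows. B0 and B1 are hypothetical placeholders, not the real estimates; substitute the values from your own SPSS output. The sketch shows that the odds at a score of 4 equal the odds at 3 multiplied by $e^{b_1}$.

```python
import math

# HYPOTHETICAL placeholder estimates: replace with the values from your SPSS output
B0 = -1.0  # hypothetical intercept
B1 = 0.5   # hypothetical Self-BLAME coefficient

def odds_ptsd(self_blame: float) -> float:
    """Odds of PTSD implied by log(odds) = B0 + B1 * Self-BLAME."""
    return math.exp(B0 + B1 * self_blame)

odds_3 = odds_ptsd(3)
# Going from a score of 3 to 4 multiplies the odds by exp(B1)
odds_4 = odds_3 * math.exp(B1)
print(round(odds_3, 3), round(odds_4, 3))
```

Computing `odds_4` directly as `odds_ptsd(4)` gives the same number, which is a useful check on the hand calculation.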
Logistic Regression
Part II – Multiple Logistic
Regression
Multiple Linear Regression
• Simple Linear Regression extended out to
more than one explanatory variable
• Example
– Effect of both concentration and age on
reaction time
– Effect of age, number of previous operations,
time in anaesthesia, cholesterol level, etc. on
number of years of post-operation survival
Multiple Linear Regression
RT (ms) = β0 + β1 x concentration (mg) + β2 x
age + β3 x gender (0=male,1=female)
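β0 is the value of RT when all explanatory variables are 0 (concentration 0mg, age 0, male).
β1 is the change in RT caused by a change in concentration of 1mg, holding age and gender constant.
β2 is the change in RT caused by a change in age of 1 year, holding concentration and gender constant.
β3 is the change in RT caused by going from male to female, holding concentration and age constant.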
Multiple Logistic Regression
• Look at the effect of drug concentration, age and gender on probability of NOT having disease
$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$
where $\hat{p}$ is the probability of not having the disease, $X_1$ is the concentration of drug (mg), $X_2$ is age (years), and $X_3$ is gender (0 for males, 1 for females)
$\log\left(\frac{\hat{p}_{\text{nodisease}}}{1-\hat{p}_{\text{nodisease}}}\right) = -2.92 + 0.106 \times \text{Concentration} - 0.0532 \times \text{Age} + 0.001 \times \text{Gender}$
• Again, use SPSS to fit the logistic model
• Increasing concentration increases the odds of not having the disease (i.e. of being healthy)
• Increasing age decreases the odds of being healthy
• “Increasing” gender (from male to female) increases the odds of being healthy
• In particular, each extra year of age multiplies the odds of being healthy by $e^{-0.0532} \approx 0.95$
• Going from male to female multiplies the odds by $e^{0.001} \approx 1.001$
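The odds ratios above are just the exponentiated coefficients. A small sketch using the three slope estimates from the fitted model (`odds_ratio` is a hypothetical helper name); each ratio assumes the other explanatory variables are held constant.

```python
import math

# Slope estimates from the fitted multiple logistic model
coefficients = {"Concentration": 0.106, "Age": -0.0532, "Gender": 0.001}

def odds_ratio(b: float, change: float = 1.0) -> float:
    """Factor multiplying the odds when a variable rises by `change`,
    holding the other explanatory variables constant."""
    return math.exp(b * change)

for name, b in coefficients.items():
    print(f"{name}: odds ratio per unit = {odds_ratio(b):.3f}")

# Effects compound multiplicatively: ten extra years of age
print(f"Age, +10 years: {odds_ratio(coefficients['Age'], 10):.3f}")
```

Ten years of age multiply the odds by $0.948^{10}$, not by $10 \times 0.948$: effects add on the log-odds scale and multiply on the odds scale.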
Was it worth adding the factors?
• When we add parameters we make our
model more complicated.
• We really want this addition to be “worth it”
• In other words, adding age and gender
should improve our explanation of disease
• But what constitutes an improvement?
Was it worth adding the factors?
• Quality (badness) of model fit is given by -2logL (smaller is better)
• If we want to see whether it was worth adding parameters, we can compare the quality of the fit of the simpler and the more complex model
• The difference in -2logL between two nested models approximately follows a χ² distribution with df equal to the difference in the number of parameters between the two models
Was it worth adding these factors?
• The simple logistic regression model has -2logL = 45.7
• This multiple logistic regression model, with 2 extra parameters, has -2logL = 40.02
• Test whether the difference, χ² = 45.7 - 40.02 = 5.68, is a significant improvement
• The critical χ² for 2 df (α = .05) is 5.99
• Our χ² is smaller, so NO, adding both parameters was not worth it
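The comparison above is a likelihood-ratio test. For exactly 2 df the χ² distribution has a closed form, $P(\chi^2_2 > x) = e^{-x/2}$, so this sketch needs only the standard library; `scipy.stats.chi2.sf(x, df)` would handle any df.

```python
import math

def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square with 2 df: P(X > x) = exp(-x / 2)."""
    return math.exp(-x / 2.0)

def chi2_crit_df2(alpha: float) -> float:
    """Critical value of a chi-square with 2 df at level alpha: -2 ln(alpha)."""
    return -2.0 * math.log(alpha)

neg2ll_simple = 45.7     # -2logL, simple model
neg2ll_multiple = 40.02  # -2logL, model with Age and Gender added
lr = neg2ll_simple - neg2ll_multiple

print(round(lr, 2), round(chi2_crit_df2(0.05), 2), round(chi2_sf_df2(lr), 3))
```

The p-value comes out just above .05, which agrees with the conclusion that 5.68 falls short of the 5.99 critical value.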
BUT…
• It doesn’t look like gender is having much of an
effect
• Check SPSS output and see that Wald χ2 for
Gender is 0.527, which has p = .47
• Perhaps it wasn’t worth adding both parameters, but it may be worth adding just Age
• Age has Wald χ² = 4.33, p ≈ .04
• When we add only Age, the change in -2logL is χ² = 5.5, which we test against a χ² distribution with 1 df: p = .02
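All the statistics on this slide are referred to a χ² distribution with 1 df. That distribution is the square of a standard normal, so $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which `math.erfc` computes directly. A minimal sketch:

```python
import math

def chi2_sf_df1(x: float) -> float:
    """P(X > x) for a chi-square with 1 df, using P(X > x) = P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))

tests = [
    ("Wald, Gender", 0.527),
    ("Wald, Age", 4.33),
    ("LR, adding Age only", 5.5),
]
for label, stat in tests:
    print(f"{label}: chi2 = {stat}, p = {chi2_sf_df1(stat):.3f}")
```

The Wald test checks one coefficient inside the fitted model, while the likelihood-ratio test compares the fit of two models; with one parameter they usually agree, as here.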
Logistic Regression Model Building
• What if we have a whole host of possible explanatory variables?
• We want to build a model which predicts whether
a person will have a disease given a set of
explanatory variables
• SAME strategies as multiple linear regression:
– Forward selection
– Backward elimination
– Stepwise
– All subsets
– Hierarchical
How to know if a model is good
• It is all about having a model which does a good job of correctly classifying participants as having the disease or not
• In particular, the model predicts how many people have the disease and how many people don’t
• The model can be
– Correct in two ways
• Correctly categorise a person who has a disease as having a
disease
• Correctly say no disease when no disease
– Incorrect in two ways
• Incorrectly categorise a person who has a disease as not having a
disease
• Incorrectly say no disease when disease
Accuracy of model
• Proportion of correct classifications
– Number of correct disease participants plus
number of correct no disease participants
divided by number of participants in total
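$\text{Accuracy} = \frac{n_{C,D} + n_{C,ND}}{n_{C,D} + n_{C,ND} + n_{IC,D} + n_{IC,ND}}$
where n counts participants, C/IC = correctly/incorrectly classified, and D/ND = actually has the disease / no disease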
Sensitivity of model
• Proportion of ‘successes’ correctly
identified
– Number of correct no disease participants
divided by total number of no disease
participants
$\text{Sensitivity} = \frac{n_{C,ND}}{n_{C,ND} + n_{IC,ND}}$
Specificity of model
• Proportion of ‘failures’ correctly identified
– Number of correct disease participants
divided by total number of disease
participants
$\text{Specificity} = \frac{n_{C,D}}{n_{C,D} + n_{IC,D}}$
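The three measures above can be sketched from a tally of predicted versus actual outcomes. Everything here is hypothetical and only for illustration: the helper names, the example counts, and the 0.5 cutoff on the model's probability of no disease (a common default, assumed here). As on the slides, "success" means no disease.

```python
from dataclasses import dataclass

@dataclass
class Counts:
    c_d: int    # disease participants correctly classified as disease
    c_nd: int   # no-disease participants correctly classified as no disease
    ic_d: int   # disease participants incorrectly classified as no disease
    ic_nd: int  # no-disease participants incorrectly classified as disease

def classify(p_no_disease: float, cutoff: float = 0.5) -> str:
    """Predict 'no disease' when the model's probability of no disease reaches the cutoff."""
    return "no disease" if p_no_disease >= cutoff else "disease"

def tally(actual, predicted) -> Counts:
    """Count correct and incorrect classifications for each actual outcome."""
    n = Counts(0, 0, 0, 0)
    for a, p in zip(actual, predicted):
        if a == "disease":
            if p == "disease":
                n.c_d += 1
            else:
                n.ic_d += 1
        elif p == "no disease":
            n.c_nd += 1
        else:
            n.ic_nd += 1
    return n

def accuracy(n: Counts) -> float:
    return (n.c_d + n.c_nd) / (n.c_d + n.c_nd + n.ic_d + n.ic_nd)

def sensitivity(n: Counts) -> float:
    # proportion of 'successes' (no disease) correctly identified
    return n.c_nd / (n.c_nd + n.ic_nd)

def specificity(n: Counts) -> float:
    # proportion of 'failures' (disease) correctly identified
    return n.c_d / (n.c_d + n.ic_d)

# Hypothetical counts for illustration only
n = Counts(c_d=30, c_nd=50, ic_d=10, ic_nd=10)
print(accuracy(n), sensitivity(n), specificity(n))  # 0.8 0.8333333333333334 0.75
```

Changing the cutoff trades sensitivity against specificity, so accuracy alone can hide a model that is good at only one of the two outcomes.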
Now…a real example
• Startup, Makgekgenene and Webster
(2007) looked at whether or not the
subscales of the Posttraumatic Cognitions
Inventory (PTCI) are good predictors of
Posttraumatic Stress Disorder (PTSD)
• Subscales are
– Negative Cognitions About SELF
– Negative Cognitions about the WORLD
– Self-BLAME
Descriptive Results
• PTSD participants showed higher scores than non-PTSD participants on all three subscales
Multiple Logistic Regression
• Response variable:
– whether or not the participant has PTSD
• Explanatory variables:
– Negative Cognitions About SELF
– Negative Cognitions about the WORLD
– Self-BLAME
Let’s do the Logistic Regression
• Open XYZ.sav in SPSS
• Run the appropriate regression
• What are the parameter estimates for our
three explanatory variables?
• Which of these are significant (at α = .05)?
• What are the odds ratios for those that are
significant?
• Anything unusual?
Self-BLAME
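• Self-BLAME has a negative regression coefficient (an odds ratio below 1; odds ratios themselves cannot be negative).
• This means that, holding SELF and WORLD constant, increasing self-blame decreases the odds of having PTSD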
• This is surprising, especially since
participants with PTSD showed higher
Self-BLAME scores
• What’s going on?
Self-BLAME and SELF scales
• Startup et al. (2007) explain this by stating
that Self-BLAME is made up of both
behavioural and characterological
questions
• SELF, however, may also tap into
characterological aspects of self-blame
• Behavioural self-blame can be considered
adaptive. It may help avoid PTSD
• Characterological self-blame, however,
may be detrimental, and lead to PTSD
Suppressor Effect
• The relationship between SELF and PTSD is strong, and accounts for the detrimental part of the relationship between Self-BLAME and PTSD. This includes the effect of characterological self-blame.
• The variation in PTSD that is left for Self-BLAME to account for is the adaptive (behavioural) aspect of the relationship between Self-BLAME scores and PTSD.
• The detrimental aspect of Self-BLAME scores has been suppressed (already accounted for by SELF). The adaptive aspect of Self-BLAME can now come out, as a negative coefficient.
Homework (haha)
• Evaluate the model by looking at
– Accuracy of model’s predictions
– Sensitivity of model’s predictions
– Specificity of model’s predictions