Logistic Regression Part I - Introduction

Logistic Regression
• Regression where the response variable is dichotomous (not continuous)
• Examples
  – effect of concentration of drug on whether symptoms go away
  – effect of age on whether or not a patient survived treatment
  – effect of negative cognitions about SELF, WORLD, or Self-BLAME on whether a participant has PTSD

Simple Linear Regression
• Relationship between a continuous response variable and a continuous explanatory variable
• Examples
  – effect of concentration of drug on reaction time
  – effect of age of patient on number of years of post-operation survival

Simple Linear Regression
• RT (ms) = β0 + β1 × concentration (mg)
• β0 is the value of RT when concentration is 0
• β1 is the change in RT caused by a change in concentration of 1 mg
• E.g. RT = 400 + 50 × concentration

Logistic Regression
• What do we do when the response variable is not continuous, but dichotomous?

[Figures: Probability of Disease vs Concentration; Odds of Disease vs Concentration; Log(Odds) of Disease vs Concentration]

Odds
• Odds are simply the ratio of the proportions for the two possible outcomes.
• If p is the proportion for one outcome, then 1 − p is the proportion for the second outcome.

Odds (Example)
• At concentration level 16 we observe 75 participants out of 100 showing no disease (healthy)
• If p is the probability of being healthy, then p = 0.75.
• Then 1 − p is the probability of not being healthy, and equals 0.25
• Odds of being healthy over not healthy given concentration level 16:
• p / (1 − p) = 0.75 / 0.25 = 3
• This means a person is 3 times as likely to be healthy as not healthy at concentration level 16

Logarithms
• Logarithms are a way of expressing numbers as powers of a base
• Example
  – 10² = 100
  – 10 is called the “base”
  – the power, 2 in this case, is called the “exponent”
• Therefore 10² = 100 means that log₁₀(100) = 2

Log Odds
• Odds of being healthy after 16 mg of drug is 3
• Log odds is log(3) ≈ 1.1 (log odds use the natural logarithm, base e ≈ 2.718, not base 10)
• Let's say that the odds of being healthy after 2 mg of drug are 0.25
• This means that it is four times more likely to not be healthy after 2 mg of drug
• Log odds is log(0.25) ≈ −1.39

Logistic Regression
• With log-odds we can now look at the linear relationship between a dichotomous response and a continuous explanatory variable

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \beta_0 + \beta_1 X$$

where, for example, p̂ is the probability of being healthy and X is the level of drug concentration

Example: Simple Logistic Regression
• Look at the effect of drug concentration on the probability of NOT having disease (i.e. being healthy)
• Use SPSS to do the regression (we'll all do this soon)
• Get

$$\log\left(\frac{\hat{p}_{\text{nodisease}}}{1-\hat{p}_{\text{nodisease}}}\right) = 2.92 + 0.106 \times \text{Concentration}$$

Looks Like
[Figure: plot of the fitted model]

• Interpreting the parameters (b0 and b1) in logistic regression is a little tricky
• An increase of 1 mg of concentration increases the log(odds) of being healthy by 0.106
• An increase of 1 mg of concentration multiplies the odds of being healthy by $e^{b_1} = e^{0.106} \approx 1.111$
• That is, increasing concentration by 1 mg increases the odds of being healthy by a factor of 1.11

Slope Parameter
• Parameter β1 in general:
  – if positive, then increasing X increases the odds of p
  – if negative, then increasing X decreases the odds of p
  – the larger its magnitude, the larger the effect of X on p
• As in simple linear regression, we can test whether or not β1 is significantly different from 0

Let's break to do a simple logistic regression
• Open XYZ.sav in SPSS
• Fit a logistic regression with
  – PTSD (Y/N) as the response variable
  – Self-BLAME as the explanatory variable
• Is the effect of Self-BLAME significant?
• Get the parameter estimates
• Write the equation of the model
• What are the odds of having PTSD given a Self-BLAME score of 3?
• Use the interpretation of the regression coefficient to work out the odds given a Self-BLAME score of 4

Logistic Regression Part II – Multiple Logistic Regression

Multiple Linear Regression
• Simple linear regression extended to more than one explanatory variable
• Examples
  – effect of both concentration and age on reaction time
  – effect of age, number of previous operations, time in anaesthesia, cholesterol level, etc. on number of years of post-operation survival

Multiple Linear Regression
• RT (ms) = β0 + β1 × concentration (mg) + β2 × age + β3 × gender (0 = male, 1 = female)
• β0 is the value of RT when concentration, age and gender are all 0
• β1 is the change in RT caused by a change in concentration of 1 mg (holding the other variables constant)
• β2 is the change in RT caused by a change in age of 1 year (holding the other variables constant)
• β3 is the change in RT caused by going from male to female (holding the other variables constant)

Multiple Logistic Regression
• Look at the effect of drug concentration, age and gender on the probability of NOT having disease

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$

where p̂ is the probability of not having the disease, X1 is the concentration of drug (mg), X2 is age (years), and X3 is gender (0 for males, 1 for females)

$$\log\left(\frac{\hat{p}_{\text{nodisease}}}{1-\hat{p}_{\text{nodisease}}}\right) = 2.92 + 0.106 \times \text{Concentration} - 0.0532 \times \text{Age} + 0.001 \times \text{Gender}$$

• Again, use SPSS to fit the logistic model
• Increasing concentration increases the odds of not having the disease (again, being healthy)
• Increasing age decreases the odds of being healthy
• “Increasing” gender (from male to female) increases the odds of being healthy
• In particular, each extra year of age multiplies the odds of being healthy by $e^{-0.0532} \approx 0.95$
• Going from M to F multiplies the odds by $e^{0.001} \approx 1.001$

Was it worth adding the factors?
• When we add parameters we make our model more complicated
• We really want this addition to be “worth it”
• In other words, adding age and gender should improve our explanation of disease
• But what constitutes an improvement?

Was it worth adding the factors?
• Quality (badness) of model fit is given by −2logL (minus twice the log-likelihood; smaller means a better fit)
• If we want to see whether it was worth adding parameters, we can compare the quality of fit of the simple and the more complex model
• Quality of model fit follows a chi-square (χ²) distribution with degrees of freedom (df) equal to the number of parameters in the model
• For nested models, the difference in quality of fit also follows a χ² distribution, with df equal to the difference in the number of parameters between the two models

Was it worth adding these factors?
• The simple logistic regression model has an overall χ² (−2logL) of 45.7
• This multiple logistic regression model, with 2 extra parameters, has a χ² of 40.02
• Test whether χ² = 45.7 − 40.02 = 5.68 is a significant improvement
• The critical χ² for 2 df (α = .05) is 5.99
• Our χ² is smaller, so NO, it was not worth it

BUT…
• It doesn't look like gender is having much of an effect
• Check the SPSS output: the Wald χ² for Gender is 0.527, which has p = .47
• Perhaps it wasn't worth adding both parameters, but it may be worth adding just Age
• Age has Wald χ² = 4.33, p = .04
• When we add only Age, the change in χ² is 5.5; tested against a χ² distribution with 1 df, this has p = .02

Logistic Regression Model Building
• What if we have a whole host of possible explanatory variables?
• We want to build a model which predicts whether a person will have a disease given a set of explanatory variables
• SAME as multiple linear regression
  – Forward selection
  – Backward elimination
  – Stepwise
  – All subsets
  – Hierarchical

How to know if a model is good
• It's all about having a model which does a good job of correctly classifying participants as having the disease or not
• In particular, the model predicts how many people have the disease and how many people don't
• The model can be
  – Correct in two ways
    • Correctly categorise a person who has the disease as having the disease
    • Correctly say no disease when there is no disease
  – Incorrect in two ways
    • Incorrectly categorise a person who has the disease as not having the disease
    • Incorrectly say disease when there is no disease

Accuracy of model
• Proportion of correct classifications
  – Number of correctly classified disease participants plus number of correctly classified no-disease participants, divided by the total number of participants

$$\text{Accuracy} = \frac{n_{C,D} + n_{C,ND}}{n_{C,D} + n_{C,ND} + n_{IC,D} + n_{IC,ND}}$$

where C / IC mean correctly / incorrectly classified and D / ND mean disease / no-disease participants

Sensitivity of model
• Proportion of ‘successes’ (here, no disease) correctly identified
  – Number of correctly classified no-disease participants divided by the total number of no-disease participants

$$\text{Sensitivity} = \frac{n_{C,ND}}{n_{C,ND} + n_{IC,ND}}$$

Specificity of model
• Proportion of ‘failures’ (here, disease) correctly identified
  – Number of correctly classified disease participants divided by the total number of disease participants

$$\text{Specificity} = \frac{n_{C,D}}{n_{C,D} + n_{IC,D}}$$

Now…a real example
• Startup, Makgekgenene and Webster (2007) looked at whether or not the subscales of the Posttraumatic Cognitions Inventory (PTCI) are good predictors of Posttraumatic Stress Disorder (PTSD)
• Subscales are
  – Negative Cognitions about SELF
  – Negative Cognitions about the WORLD
  – Self-BLAME

Descriptive Results
• PTSD participants showed higher scores than non-PTSD participants on all three subscales

Multiple Logistic Regression
• Response variable:
  – whether or not the participant has PTSD
• Explanatory variables:
  – Negative Cognitions about SELF
  – Negative Cognitions about the WORLD
  – Self-BLAME

Let's do the Logistic Regression
• Open XYZ.sav in SPSS
• Run the appropriate regression
• What are the parameter estimates for our three explanatory variables?
• Which of these are significant (at α = .05)?
• What are the odds ratios for those that are significant?
• Anything unusual?

Self-BLAME
• Self-BLAME has a negative regression coefficient (an odds ratio below 1)
• This means that, with SELF and WORLD held constant, increasing self-blame decreases the chance of having PTSD
• This is surprising, especially since participants with PTSD showed higher Self-BLAME scores
• What's going on?

Self-BLAME and SELF scales
• Startup et al. (2007) explain this by stating that Self-BLAME is made up of both behavioural and characterological questions
• SELF, however, may also tap into characterological aspects of self-blame
• Behavioural self-blame can be considered adaptive; it may help avoid PTSD
• Characterological self-blame, however, may be detrimental and lead to PTSD

Suppressor Effect
• The relationship between SELF and PTSD is strong and accounts for the harmful part of the relationship, including the effect of characterological self-blame
• What is left of the variation in PTSD for Self-BLAME to account for is the adaptive (behavioural) aspect of the relationship between Self-BLAME scores and PTSD
• The harmful aspect of Self-BLAME scores has been suppressed (already accounted for by SELF), so the adaptive aspect of Self-BLAME can now come out

Homework (haha)
• Evaluate the model by looking at the
  – accuracy of the model's predictions
  – sensitivity of the model's predictions
  – specificity of the model's predictions
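The Homework slide asks for accuracy, sensitivity and specificity. The short Python sketch below tallies a classification table from observed outcomes and predicted probabilities, then applies the formulas from the Accuracy, Sensitivity and Specificity slides. It is only an illustration. The names (ClassificationTable, classify) are hypothetical, and the 0.5 cut-off is an assumption. "Success" means whichever outcome the model predicts the probability of (no disease in the drug example). The example data are made up and do not come from XYZ.sav.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClassificationTable:
    """Hypothetical 2x2 classification counts.

    'Success' is the outcome whose probability the model predicts
    (no disease in the drug example); 'failure' is the other outcome.
    """
    correct_success: int    # successes classified as successes (n_C,ND)
    incorrect_success: int  # successes classified as failures (n_IC,ND)
    correct_failure: int    # failures classified as failures (n_C,D)
    incorrect_failure: int  # failures classified as successes (n_IC,D)

    def accuracy(self) -> float:
        """Proportion of all participants classified correctly."""
        correct = self.correct_success + self.correct_failure
        total = correct + self.incorrect_success + self.incorrect_failure
        return correct / total

    def sensitivity(self) -> float:
        """Proportion of successes correctly identified."""
        return self.correct_success / (self.correct_success + self.incorrect_success)

    def specificity(self) -> float:
        """Proportion of failures correctly identified."""
        return self.correct_failure / (self.correct_failure + self.incorrect_failure)


def classify(outcomes: Sequence[int], probs: Sequence[float],
             cutoff: float = 0.5) -> ClassificationTable:
    """Tally a classification table.

    outcomes: observed outcome, 1 = success, 0 = failure
    probs: model-predicted probability of success for each participant
    cutoff: predict success when prob >= cutoff (0.5 is an assumption)
    """
    cs = ics = cf = icf = 0
    for y, p in zip(outcomes, probs):
        predicted_success = p >= cutoff
        if y == 1:
            if predicted_success:
                cs += 1
            else:
                ics += 1
        elif predicted_success:
            icf += 1
        else:
            cf += 1
    return ClassificationTable(cs, ics, cf, icf)


# Made-up data: 4 successes (no disease) and 2 failures (disease).
table = classify([1, 1, 1, 1, 0, 0], [0.9, 0.8, 0.6, 0.3, 0.2, 0.7])
print(table.accuracy(), table.sensitivity(), table.specificity())
# accuracy 4/6, sensitivity 3/4, specificity 1/2
```

With real data you would replace the two lists with the observed outcomes and the predicted probabilities saved from the fitted model. Changing `cutoff` trades sensitivity against specificity.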

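The arithmetic on the Odds, Log Odds, interpretation and "Was it worth adding these factors?" slides can be checked with Python's standard library. This is a sketch, not SPSS's procedure, and the helper names are hypothetical. The chi-square tail probabilities use closed forms that exist only for 1 and 2 df. Any other df would need a statistics library.

```python
import math


def odds(p: float) -> float:
    """Odds of an outcome with probability p: p / (1 - p)."""
    return p / (1 - p)


def log_odds(p: float) -> float:
    """Natural-log odds (the logit), the left-hand side of the logistic model."""
    return math.log(odds(p))


def odds_ratio(b: float, change: float = 1.0) -> float:
    """Factor multiplying the odds when X rises by `change` units: e^(b * change)."""
    return math.exp(b * change)


def model_odds(b0: float, b1: float, x: float) -> float:
    """Odds predicted by log(p / (1 - p)) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)


def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-square > x), using closed forms valid only for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or 2")


print(odds(0.75))                                   # odds of being healthy at 16 mg
print(round(log_odds(0.75), 2))                     # log odds, about 1.1
print(round(math.log(0.25), 2))                     # log odds at 2 mg, about -1.39
print(round(odds_ratio(0.106), 3))                  # Concentration: about 1.112
print(round(odds_ratio(-0.0532), 2))                # Age: about 0.95
print(round(chi2_upper_tail(45.7 - 40.02, 2), 3))   # above .05: not worth adding both
print(round(chi2_upper_tail(5.5, 1), 3))            # below .05: adding Age is worth it
```

For the Self-BLAME exercise, `model_odds(b0, b1, 4)` equals `model_odds(b0, b1, 3) * odds_ratio(b1)` for whatever estimates b0 and b1 SPSS reports. A 1-point increase in the explanatory variable always multiplies the odds by e^b1.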