LOGISTIC REGRESSION

A statistical procedure that relates the probability of an event to explanatory variables.

Used in epidemiology to describe and evaluate the effect of a risk factor on the occurrence of a disease event.

Example: Framingham Heart Study (coronary heart disease and blood pressure)

LOGISTIC REGRESSION: AN EXAMPLE

Event: coronary heart disease (CHD). Occurrence is the dependent variable, which takes two values: Yes or No.

Risk factor: blood pressure. Systolic blood pressure (SBP) is the independent variable X, a continuous measurement.

The probability of getting coronary heart disease depends on blood pressure.

DATA

Man        Systolic BP (mmHg)   Developed CHD   Coded
John       130                  No              0
Steven     140                  No              0
Sean       145                  No              0
Brian      150                  No              0
Michael    155                  Yes             1
Terry      160                  No              0
Joseph     165                  Yes             1
Patrick    170                  Yes             1
Teddy      175                  Yes             1
Ryan       180                  Yes             1
. . .      . . .                . . .           . . .

SCATTER PLOT

[Figure: CHD (0.0 to 1.0) plotted against systolic blood pressure (120 to 200 mmHg).]

LINEAR REGRESSION FOR Prob(CHD): NOT A GOOD IDEA!

[Figure: straight-line fit of Prob(CHD) against systolic blood pressure (120 to 200 mmHg); the vertical axis runs from -0.4 to 1.2.]

A straight line is not confined to the range 0 to 1, so it cannot represent a probability.

PROPORTION WITH CHD BY SBP GROUP

Systolic BP range   CHD / men   Proportion
130-149 mmHg        0/3         0.00
150-169 mmHg        2/4         0.50
170-189 mmHg        3/3         1.00

LOGISTIC REGRESSION PROBABILITY MODEL

$$p(X) = \frac{1}{1 + \exp(-b_0 - b_1 X)}$$

The probability of the event varies as an S-shaped function of the risk factor X: the logistic curve.

LOGISTIC CURVE MODEL: OCCURRENCE OF CHD AS A FUNCTION OF SBP

[Figure: probability of CHD (0 to 1) plotted against systolic blood pressure (0 to 300 mmHg).]

$$\text{prob.} = \frac{1}{1 + \exp(6.08 - 0.0243\,\text{SBP})}$$

LOGISTIC MODEL: LOG ODDS

$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X$$

The log of the odds of the event is a linear function of X.
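The relationship between the probability model and the log odds can be checked numerically. The sketch below is only an illustration: it assumes the coefficients quoted on the logistic curve slide (b0 = -6.08, b1 = 0.0243), and the function names `logistic_prob` and `log_odds` are made up for this example.

```python
import math

# Coefficients quoted in the slides for the CHD example.
B0 = -6.08
B1 = 0.0243


def logistic_prob(x, b0=B0, b1=B1):
    """Probability model: p(X) = 1 / (1 + exp(-b0 - b1 * X))."""
    return 1.0 / (1.0 + math.exp(-b0 - b1 * x))


def log_odds(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# The probability follows an S-shaped curve, while the log odds
# stay on the straight line b0 + b1 * X.
for sbp in (120, 160, 200):
    p = logistic_prob(sbp)
    print(f"SBP={sbp}: p={p:.3f}, "
          f"log odds={log_odds(p):.3f}, b0+b1*X={B0 + B1 * sbp:.3f}")
```

The last two printed columns agree for every SBP value, which is the statement that the log odds are linear in X.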
Log(odds of CHD) = -6.08 + 0.0243(SBP)

ODDS

The odds of an event is the chance that the event occurs divided by the chance that it does not occur:

$$\text{Odds} = \frac{p}{1 - p} = \frac{p}{q}$$

b1: KEY PARAMETER OF THE LOGISTIC MODEL

$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X$$

The parameter b1 is like the slope of a linear regression model. b1 = 0 indicates that X has no effect on the probability; e.g., a man's chance of CHD does not depend on his SBP.

The coefficient b1 measures the change in the log of the odds per unit change in X:

$$\begin{aligned}
\log \text{odds}(X+1) &= b_0 + b_1 (X+1) = b_0 + b_1 X + b_1 \\
\log \text{odds}(X) &= b_0 + b_1 X \\
\text{Difference in log odds} &= b_1
\end{aligned}$$

E.g., the log of the odds of getting CHD increases by 0.0243 for each increase of 1 mmHg in systolic blood pressure. (Hard to explain to a patient!)

THE COEFFICIENT b1 AND THE ODDS RATIO

The difference in log odds given by b1 translates into the odds ratio (OR):

exp(b1) = OR = ratio of the odds at risk level X + 1 to the odds at risk level X

b1 = 0 corresponds to OR = 1.

For example, the odds of CHD are multiplied by the factor exp(0.0243) = 1.025 for every increase of 1 mmHg in SBP. A difference of 10 mmHg multiplies the odds of CHD by [exp(0.0243)]^10 = exp(0.243), or 1.275.

ESTIMATION OF THE PARAMETERS

Technique: maximum likelihood estimation.

For large sample sizes, the normal distribution is used to put a confidence interval around the estimate of the coefficient b1.

HYPOTHESIS TESTING

Ho: b1 = 0

No difference in risk at different levels of the risk factor X; that is, no association between risk factor X and probability of occurrence.
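The odds-ratio arithmetic from the coefficient slides can be reproduced in a few lines. This is a minimal sketch that assumes the quoted coefficients (b0 = -6.08, b1 = 0.0243); the helper `odds` and the variable names are illustrative only.

```python
import math

B0 = -6.08    # intercept quoted in the slides
B1 = 0.0243   # slope quoted in the slides (log odds per 1 mmHg of SBP)


def odds(x, b0=B0, b1=B1):
    """Odds p / (1 - p) implied by log odds = b0 + b1 * X."""
    return math.exp(b0 + b1 * x)


# Odds ratio for a 1 mmHg increase: exp(b1).
or_per_1 = math.exp(B1)

# Odds ratio for a 10 mmHg increase: exp(10 * b1).
or_per_10 = math.exp(10 * B1)

# The ratio of odds at X + 1 to odds at X does not depend on X.
ratio_at_150 = odds(151) / odds(150)
ratio_at_180 = odds(181) / odds(180)

print(f"OR per 1 mmHg:  {or_per_1:.3f}")
print(f"OR per 10 mmHg: {or_per_10:.3f}")
print(f"odds(151)/odds(150) = {ratio_at_150:.3f}")
print(f"odds(181)/odds(180) = {ratio_at_180:.3f}")
```

Because the intercept cancels in the ratio, the odds ratio is the same at every starting level of SBP, which is why a single number summarizes the effect of the risk factor.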
HYPOTHESIS TESTING

Ha: b1 ≠ 0, or b1 > 0 (risk increases with X), or b1 < 0 (risk goes down as X increases)

HYPOTHESIS TESTING

Ho: OR = 1

Ha: OR ≠ 1, or OR > 1 (risk increases with X), or OR < 1 (X is protective)

RESULTS OF LOGISTIC REGRESSION

The OR, with its confidence interval and p value, indicates whether there is a significant association between the level of the risk factor and the chance of occurrence:

OR = 1.025 (1.015, 1.034), p < 0.001

RESULTS OF LOGISTIC REGRESSION

The model can be used to predict an individual's risk. Probability of CHD when SBP = 180:

p/q = exp{-6.082 + 0.0243(180)} = exp(-1.708) ≈ 0.181

Solve for p: prob. of CHD = 0.181/(1 + 0.181) ≈ 0.153

MULTIVARIATE LOGISTIC REGRESSION

Model with additional risk factors:

$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X_1 + b_2 X_2 + \cdots$$

Log(odds of CHD) = b0 + b1(SBP) + b2(CHOL) + b3(smoker)
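The prediction step (compute the log odds, exponentiate to get the odds, then solve for p) can be sketched as a small function that also covers the multivariate form. The single-factor call uses the coefficients quoted in the slides. The slides give no fitted values for the three-factor model, so the coefficients in the second call are hypothetical placeholders, and the name `predict_prob` is likewise an assumption of this example.

```python
import math


def predict_prob(intercept, coefs, values):
    """Return p from log odds = intercept + sum(coef_i * value_i)."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    odds = math.exp(log_odds)     # odds = p / q
    return odds / (1.0 + odds)    # solve p / (1 - p) = odds for p


# One risk factor, with the coefficients quoted in the slides.
p_sbp_180 = predict_prob(-6.082, [0.0243], [180])
print(f"Prob. of CHD at SBP = 180: {p_sbp_180:.3f}")

# Three risk factors (SBP, CHOL, smoker).
# HYPOTHETICAL coefficients: these numbers are placeholders chosen
# only so the example runs; they are not estimates from any study.
b0_hyp = -8.0
coefs_hyp = [0.02, 0.005, 0.6]   # SBP, CHOL, smoker (1 = yes, 0 = no)
p_multi = predict_prob(b0_hyp, coefs_hyp, [180, 240, 1])
print(f"Illustrative multivariate probability: {p_multi:.3f}")
```

Passing the coefficients and risk-factor values as parallel lists keeps the same function usable for one predictor or several, mirroring how the multivariate model only adds terms to the log odds.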