Now we will look at one continuous predictor. The basic model looks the same as for binary predictors, except that now we can plug in any reasonable value for X.

Similar to our earlier derivations, we want to look at the odds ratio in the case of continuous predictors. As in linear regression, by default we will look at a 1-unit increase in X. So here we have used our model to find the log odds of the outcome event for a predictor value of X+1. Substituting X+1 gives beta_0 + beta_1 times the quantity (X + 1). We distribute the beta_1 to get beta_0 + beta_1 + beta_1 times X. Then we have the log odds for a predictor value of X; substituting simply returns beta_0 + beta_1 times X. To find the log odds ratio, we take the log odds for X+1 minus the log odds for X. Simplifying, the beta_0 and beta_1 times X terms cancel, leaving just beta_1 as the log odds ratio for a 1-unit increase in X. Again, we exponentiate to find the odds ratio for a 1-unit increase in X.

Often, we are interested in a more meaningful change than a 1-unit increase. To find the odds ratio for a delta-unit increase in X, we perform a similar calculation. The result is that the log odds ratio for a delta-unit increase in X equals beta_1 times delta, and we can exponentiate this value to find the odds ratio for a delta-unit increase in X. We will illustrate applying these results with an example shortly.

Here is the SAS code for the logistic model relating CHD to the continuous predictor AGE in the WCGS data.
• We begin with PROC LOGISTIC on our data.
• We have NO class statement here, because AGE is continuous.
• Then we have our model statement.
• On the left of the equals sign we have the binary outcome variable CHD69 with event = Yes specified, and on the right of the equals sign the predictor AGE.
• We have added the confidence intervals for the parameter estimates to this output. We could have requested these in the previous two examples as well; notice that the parameter estimates tables from the earlier results did not have confidence intervals. One reason is that we will usually use the results from the odds ratios directly and so don’t really need to see these intervals for the betas. This option produces a separate table instead of adding the intervals to the parameter estimates table.
• That is the basic code for PROC LOGISTIC, but we will also use the ODDSRATIO and UNITS statements.
• Here we use the ODDSRATIO statement to request the odds ratio for the predictor AGE, but since we specified that the units for AGE are 10, this will give the odds ratio for a 10-year increase in age instead of the default 1-year increase.

There isn’t much new in the output here. We don’t have any class level information because our predictor is continuous. We are back to having 1 degree of freedom in the test of the global null.

We have our parameter estimates table and the odds ratio for age along with the graph, which I edited to fit here. We also get the confidence interval for our parameter estimates from the clparm = wald option.

From the ODDSRATIO and UNITS statements, we get another table with an odds ratio, but this one is for a 10-year increase in age, along with a graphical representation of this odds ratio, again edited to fit here.

Here I used SAS to obtain this table, which we will discuss later. You will be asked to do this by hand.

We can calculate the odds ratio for a 1-unit increase in age by exponentiating the parameter estimate of 0.0744 to get 1.077. We can see that it is highly statistically significant.

Again, we have two ways to interpret this odds ratio.
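Before turning to those interpretations, the arithmetic can be checked with a small sketch. Python is used here only for illustration (the lecture itself works in SAS), and the function names odds_ratio and percent_change are made up for this example; the value 0.0744 is the AGE parameter estimate quoted above.

```python
import math

BETA_AGE = 0.0744  # parameter estimate for AGE quoted in the notes


def odds_ratio(beta, units=1.0):
    """Odds ratio for a `units`-unit increase in X: exp(units * beta)."""
    return math.exp(units * beta)


def percent_change(or_value):
    """Percent change in the odds implied by an odds ratio."""
    return (or_value - 1.0) * 100.0


or_1yr = odds_ratio(BETA_AGE)
print(f"Odds ratio per 1-year increase: {or_1yr:.3f}")               # 1.077
print(f"Percent change in the odds: {percent_change(or_1yr):.1f}%")  # 7.7%
```

Calling odds_ratio(BETA_AGE, units=10) gives the 10-year odds ratio, which can be compared against the table produced by the ODDSRATIO and UNITS statements. The two printed numbers are the ones used in the two interpretations that follow.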
Here we wrote:
• For each 1-year increase in age, the odds of coronary heart disease are 1.077 times larger. The 95% confidence interval suggests this value could be as low as 1.054 or as high as 1.101.
• For each 1-year increase in age, the odds of coronary heart disease increase by 7.7%. The 95% confidence interval suggests this value could be as low as 5.4% or as high as 10.1%.
We will see eventually that for continuous predictors with odds ratios between 0 and 1, we will prefer the percent decrease terminology, as it will be clearer than the first choice. We could construct these confidence intervals for a 1-unit increase in X ourselves using exactly the same equation as before.

We saw earlier that to estimate the odds ratio for a delta-unit increase, we multiply the parameter estimate by delta and then exponentiate the result. Here, for a 5-year increase in age, we have exp of 5 times beta_1, which is 1.451.

Here we will use c instead of delta. Feel free to keep using delta; we just thought c was a little nicer here. To construct the confidence interval for a c-unit increase in X, we must first find the confidence interval for beta_1 (the log odds ratio for a 1-unit increase) using the formula on the bottom. Then we multiply both of those values by c before exponentiating to find the confidence interval for the odds ratio for a c-unit increase in X. You can practice this with the presented results for a 10-year increase in age, but here we present the results for a 5-year increase.

Using the confidence interval for the original parameter estimate (not the confidence interval for the odds ratio), we can calculate the confidence interval for a 5-year increase in age by multiplying the log odds confidence interval values by 5 and then exponentiating the results. Again, you can practice all of this with a 10-year increase and check your results against the output provided.

We can interpret this odds ratio for a 5-year increase similarly.
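The 5-year calculation just described can be reproduced with a short sketch. Python is an illustration choice only, and the function names are hypothetical. One assumption matters here: the confidence limits for beta_1 itself are not printed in these notes, so the sketch recovers approximate limits by taking logs of the rounded 1-year odds ratio limits (1.054 and 1.101). Because of that rounding, the interval should land close to, but not exactly on, the reported values.

```python
import math

BETA_AGE = 0.0744            # parameter estimate for AGE quoted in the notes
OR_CI_1YR = (1.054, 1.101)   # rounded 95% CI for the 1-year odds ratio


def odds_ratio_c(beta, c):
    """Odds ratio for a c-unit increase in X: exp(c * beta)."""
    return math.exp(c * beta)


def ci_c_units(beta_lower, beta_upper, c):
    """CI for the c-unit odds ratio from CI limits on the log odds scale.

    Both limits are multiplied by c BEFORE exponentiating.
    """
    return math.exp(c * beta_lower), math.exp(c * beta_upper)


# Assumption: approximate limits for beta_1, backed out of the rounded OR limits.
beta_lo, beta_hi = (math.log(limit) for limit in OR_CI_1YR)

or_5yr = odds_ratio_c(BETA_AGE, 5)
lo_5yr, hi_5yr = ci_c_units(beta_lo, beta_hi, 5)

print(f"Odds ratio per 5-year increase: {or_5yr:.3f}")     # 1.451
print(f"Approximate 95% CI: {lo_5yr:.2f} to {hi_5yr:.2f}")  # about 1.30 to 1.62
```

With the unrounded limits for beta_1 from the clparm = wald table, the same two functions should reproduce the reported interval more closely. Setting c to 10 gives the practice case.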
For each 5-year increase in age, the odds of coronary heart disease are 1.451 times larger. The 95% confidence interval ranges from 1.299 to 1.621.

We can interpret this odds ratio for a 5-year increase similarly. For each 5-year increase in age, the odds of coronary heart disease increase by 45.1% (95% confidence interval 29.9% to 62.1%). BE CAREFUL when the odds ratio is less than 1: the first interpretation still works, although we would say, for example, 0.6 times SMALLER, but the percent decrease is found by subtracting the odds ratio from 1 (so an odds ratio of 0.6 corresponds to a 40% decrease).

Again, see if you can use the model to predict two probabilities, P(50) and P(60). Using those two probabilities, find odds(50) and odds(60). Then find the odds ratio, relative risk, and excess risk from the calculated probabilities or odds as appropriate. There are 5 more sets you can practice with, but soon enough you will have your own assignment on logistic regression with one predictor.

Notice there are many similarities between our work here and that for linear regression. But because the left side of our model is now the LOG-ODDS of the outcome occurring for a specific value of X, there are added difficulties that must be handled correctly. We have seen that the interpretations for each of the three main types of predictors are similar to those for linear regression, except that we now have the odds of the event occurring instead of the mean value of the outcome directly. For binary and multi-level predictors we have a comparison of the odds relative to the reference group, and for a continuous predictor we have a 1-unit or c-unit increase in X. Next we will look at multiple logistic regression and the complexities it brings, such as confounding, as well as testing for and handling interactions.
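For the practice exercise with P(50) and P(60), the chain from log odds to probability, odds, odds ratio, relative risk, and excess risk can be sketched as follows. This is an illustration in Python, not the SAS workflow used in the lecture. The intercept BETA_0 below is a HYPOTHETICAL placeholder, since the intercept is not quoted in these notes; replace it with the value from your own parameter estimates table. The function names are also made up for this example.

```python
import math

BETA_AGE = 0.0744  # parameter estimate for AGE quoted in the notes
BETA_0 = -5.94     # HYPOTHETICAL intercept, for illustration only


def prob(age, b0=BETA_0, b1=BETA_AGE):
    """Predicted probability: inverse logit of b0 + b1 * age."""
    log_odds = b0 + b1 * age
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


p50, p60 = prob(50), prob(60)
odds50, odds60 = odds(p50), odds(p60)

odds_ratio = odds60 / odds50   # compares odds at age 60 to odds at age 50
relative_risk = p60 / p50      # ratio of the two probabilities
excess_risk = p60 - p50        # difference of the two probabilities

print(f"P(50) = {p50:.4f}, P(60) = {p60:.4f}")
print(f"odds(50) = {odds50:.4f}, odds(60) = {odds60:.4f}")
print(f"OR = {odds_ratio:.3f}, RR = {relative_risk:.3f}, ER = {excess_risk:.4f}")
```

Whatever intercept is used, the odds ratio from this calculation equals exp of 10 times beta_1, because the intercept cancels. The relative risk and excess risk do depend on the intercept, so those values are only meaningful once the real estimate is plugged in.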