Now we will look at one continuous predictor.
The basic model looks the same as for binary predictors except now we can plug in any
value for X that is reasonable.
1
Similar to our earlier derivations, we want to look at the odds ratio in the case of continuous
predictors. As in linear regression, by default we will look at a 1-unit increase in X.
So here we have used our model to find the log odds of the outcome event for a value of
the predictor of X+1. Substituting X+1 gives beta_0 + beta_1 times the quantity X + 1. We
will distribute the beta_1 to get beta_0 + beta_1 + beta_1 times X.
Then we have the log odds for a value of the predictor of X. Substituting simply returns
beta_0 + beta_1 times X.
To find the log odds ratio, we find the log odds for X+1 minus the log odds for X. Simplifying
we cancel the beta_0 and beta_1 times X terms to get just beta_1 as the log odds ratio for
a 1-unit increase in X.
Again we exponentiate to find the odds ratio for a 1-unit increase in X.
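Written out, the steps above look roughly as follows. This is a sketch in LaTeX notation; the shorthand "log odds(X)" for the log odds of the outcome event at predictor value X is introduced here for illustration only.

```latex
\begin{align*}
\log \mathrm{odds}(X+1) &= \beta_0 + \beta_1 (X + 1) = \beta_0 + \beta_1 + \beta_1 X \\
\log \mathrm{odds}(X)   &= \beta_0 + \beta_1 X \\
\log \mathrm{OR}        &= \log \mathrm{odds}(X+1) - \log \mathrm{odds}(X) = \beta_1 \\
\mathrm{OR}             &= e^{\beta_1}
\end{align*}
```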
2
Often, we are interested in a more meaningful change than a 1-unit increase. If we want to
find the odds ratio for a delta-unit increase in X we perform a similar calculation.
The result is that the log odds ratio for a delta-unit increase in X is equal to beta_1
times delta. We can exponentiate this value to find the odds ratio for a delta-unit
increase in X.
We will illustrate applying these results with an example shortly.
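The similar calculation for a delta-unit increase can be sketched in the same way. The notation below (including the subscript on OR) is an assumption made for illustration, not taken from the slides.

```latex
\begin{align*}
\log \mathrm{odds}(X+\delta) - \log \mathrm{odds}(X)
  &= \left[\beta_0 + \beta_1 (X + \delta)\right] - \left[\beta_0 + \beta_1 X\right]
   = \beta_1 \delta \\
\mathrm{OR}_{\delta} &= e^{\beta_1 \delta}
\end{align*}
```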
3
Here is the SAS code for the logistic model relating CHD to the continuous predictor AGE in
the WCGS data.
• We begin with PROC LOGISTIC on our data.
• We have NO class statement here – AGE is continuous.
• Then we have our model statement.
• On the left of the equals sign we have the binary outcome variable CHD69 with event =
Yes specified, and on the right of the equals sign the predictor AGE.
• We have added confidence intervals for the parameter estimates to this output. We
could have requested these in the previous two examples as well; notice that the
parameter estimates tables from the earlier results did not have confidence intervals.
One reason is that we will usually use the odds ratio results directly and so don’t really
need to see these intervals for the betas. The intervals appear in a separate table
rather than within the parameter estimates table.
• That is the basic code for PROC LOGISTIC, but we will also use the ODDSRATIO and UNITS
statements.
• Here we use the ODDSRATIO statement to request the odds ratio for the predictor
AGE. Because the UNITS statement sets the units for AGE to 10, this gives the odds ratio
for a 10-year increase in age instead of the default 1-year increase.
4
There isn’t much new in the output here.
We don’t have any class level information because our predictor is continuous.
We are back to having 1 degree of freedom in the test of the global null.
5
We have our parameter estimates table and odds ratio for age along with the graph – which
I edited to fit here.
We also get the confidence interval for our parameter estimates from the clparm = wald
option.
6
From the ODDSRATIO and UNITS statements, we get another table with an odds ratio, but
this is for a 10-year increase in age – along with a graphical representation of this odds
ratio, again edited to fit here.
7
Here I used SAS to obtain this table which we will discuss later. You will be asked to do this
by hand.
8
We can calculate the odds ratio for a 1-unit increase in age by exponentiating the
parameter estimate of 0.0744 to get 1.077.
We can see that it is highly statistically significant.
9
Again, we have two ways to interpret this odds ratio.
Here we wrote:
• For each 1-year increase in age, the odds of coronary heart disease are 1.077 times
larger. The 95% confidence interval suggests this value could be as low as 1.054 or as
high as 1.101.
• For each 1-year increase in age, the odds of coronary heart disease increase by 7.7%.
The 95% confidence interval suggests this value could be as low as 5.4% or as high as
10.1%.
We will see eventually that for continuous predictors with odds ratios between 0 and 1 we
will prefer the percent decrease terminology as it will be clearer than the first choice.
We could construct these confidence intervals for a 1-unit increase in X ourselves using
exactly the same equation as before.
10
We saw earlier that if we want to estimate the odds ratio for a delta-unit increase, we
multiply the parameter estimate by the delta value and then exponentiate the result.
Here, for a 5-year increase in age, we have exp of 5 times beta_1 which is 1.451.
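As a quick check of this arithmetic, here is a minimal Python sketch. The function name `odds_ratio` is hypothetical; the only input taken from the notes is the parameter estimate 0.0744 for AGE.

```python
import math

def odds_ratio(beta1: float, delta: float = 1.0) -> float:
    """Odds ratio for a delta-unit increase in X: exp(beta1 * delta)."""
    return math.exp(beta1 * delta)

beta_age = 0.0744  # parameter estimate for AGE quoted in the notes

or_1yr = odds_ratio(beta_age)      # default 1-year increase
or_5yr = odds_ratio(beta_age, 5)   # 5-year increase

print(f"{or_1yr:.3f}")  # 1.077
print(f"{or_5yr:.3f}")  # 1.451
```

The same function can be called with delta = 10 to compare against the 10-year odds ratio in the SAS output.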
11
Here we will use c instead of delta. Feel free to keep using delta; we just thought c
was a little nicer here.
To construct the confidence interval for a c-unit increase in X, we must first find the
confidence interval for beta_1, the log odds ratio for a 1-unit increase, using the formula
on the bottom.
Then we multiply both of those values by c before exponentiating to find the confidence
interval for the odds ratio for a c-unit increase in X.
You can practice this with the presented results on a 10-year increase in age but here we
presented the results for a 5-year increase.
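The slide formula itself is not reproduced in these notes. A standard Wald form, assuming a 95% level (z = 1.96) and writing SE for the standard error of the estimate and (L, U) for the resulting limits, would look like this:

```latex
\begin{align*}
\text{CI for } \beta_1 &: \quad \hat{\beta}_1 \pm 1.96 \, \mathrm{SE}(\hat{\beta}_1) = (L,\, U) \\
\text{CI for the OR, } c\text{-unit increase} &: \quad \left( e^{cL},\; e^{cU} \right)
\end{align*}
```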
12
Using the confidence interval for the original parameter estimate (not the confidence
interval for the odds ratio), we can calculate the confidence interval for a 5-year increase
in age by multiplying the log odds confidence interval values by 5 and then exponentiating
the results.
Again, you can practice all of this with a 10-year increase and check your results with the
output provided.
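The calculation just described can be sketched in Python as below. The function name is hypothetical. The confidence limits for beta_1 are not quoted in these notes, so as a loudly stated assumption they are approximated by taking logs of the rounded 1-year odds ratio limits (1.054 and 1.101); in practice you would read them from the SAS confidence interval table.

```python
import math

def or_ci_for_c_units(beta_lower: float, beta_upper: float, c: float):
    """Multiply a CI for beta_1 (log odds scale) by c, then exponentiate."""
    return math.exp(c * beta_lower), math.exp(c * beta_upper)

# ASSUMPTION: approximate limits for beta_1, back-calculated from the
# rounded 1-year odds ratio interval rather than taken from SAS output.
beta_lower = math.log(1.054)
beta_upper = math.log(1.101)

lower_5yr, upper_5yr = or_ci_for_c_units(beta_lower, beta_upper, 5)
print(f"{lower_5yr:.3f} to {upper_5yr:.3f}")
```

Because the inputs are rounded, the result is close to, but not exactly, the interval of 1.299 to 1.621 reported for the 5-year increase.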
13
We can interpret this odds ratio for a 5-year increase similarly.
For each 5-year increase in age, the odds of coronary heart disease are 1.451 times larger.
The 95% confidence interval ranges from 1.299 to 1.621.
14
Alternatively, we can interpret this odds ratio for a 5-year increase as a percent change.
For each 5-year increase in age, the odds of coronary heart disease increase by 45.1%
(95% confidence interval 29.9% – 62.1%).
BE CAREFUL when the odds ratio is less than 1. The first interpretation still works,
although we would say, for example, 0.6 times SMALLER, but the % decrease is found by
subtracting the odds ratio from 1. An odds ratio of 0.6 corresponds to a 40% decrease.
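The percent change wording can be sketched as a small Python helper. The function name is hypothetical, and the 0.6 is only the illustrative value from the note above, not a result from the WCGS data.

```python
def percent_change_in_odds(odds_ratio: float) -> str:
    """Describe an odds ratio as a percent increase or decrease in the odds."""
    if odds_ratio >= 1:
        # Above 1: subtract 1 from the odds ratio
        return f"{(odds_ratio - 1) * 100:.1f}% increase"
    # Below 1: subtract the odds ratio from 1
    return f"{(1 - odds_ratio) * 100:.1f}% decrease"

print(percent_change_in_odds(1.451))  # 45.1% increase
print(percent_change_in_odds(0.6))    # 40.0% decrease
```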
15
Again, see if you can use the model to predict two probabilities – P(50) and P(60).
Using those two probabilities find the odds(50) and odds(60).
Then find the odds ratio, relative risk, and excess risk from the calculated probabilities or
odds as appropriate.
There are 5 more sets you can practice with but soon enough you will have your own
assignment on logistic regression with one predictor.
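To check hand calculations for this exercise, a minimal Python sketch follows. The slope 0.0744 is from the notes, but the intercept is not given here, so the value of `b0` below is a placeholder assumption; substitute the intercept from your own parameter estimates table. The function names are hypothetical.

```python
import math

def probability(b0: float, b1: float, x: float) -> float:
    """P(event | X = x) from the logistic model, inverting the log odds."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)

b1 = 0.0744   # slope for AGE quoted in the notes
b0 = -5.94    # ASSUMPTION: placeholder intercept, not given in these notes

p50, p60 = probability(b0, b1, 50), probability(b0, b1, 60)
odds50, odds60 = odds(p50), odds(p60)

odds_ratio = odds60 / odds50    # matches exp(10 * b1) whatever b0 is
relative_risk = p60 / p50       # ratio of the two probabilities
excess_risk = p60 - p50         # difference of the two probabilities

print(p50, p60, odds_ratio, relative_risk, excess_risk)
```

The odds ratio computed from the two odds agrees with exp(10 * beta_1) regardless of the intercept, while the relative risk and excess risk do depend on it.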
16
Notice there are many similarities between our work here and that for linear regression.
But, because the left-side of our model is now the LOG-ODDS of the outcome occurring for
a specific value of X, there are added difficulties that must be correctly handled.
We have seen that the interpretations for each of the three main types of predictors are
similar to those for linear regression, except that we now have the odds of the event
occurring instead of the mean value of the outcome directly.
For binary and multi-level predictors we compare the odds to those of the
reference group, and for a continuous predictor we have a 1-unit or c-unit increase in X.
Next we will look at multiple logistic regression and the complexities it brings such as
confounding as well as testing for and handling interactions.
17