Thuesen data - regression analysis
roger
April 18, 2016
Notice one missing value.
Dalgaard's Chapter 6 introduces regression with this example, which uses data from this
article of Thuesen et al.
The chapter has no discussion of the scientific questions or even the meaning of the
variables. Thuesen et al. wanted to investigate the pathophysiological background for the
increased cardiac performance described in short-term insulin-dependent diabetes. The
intervention was low-dose intravenous glucagon. The outcome measure (among others)
was "mean circumferential shortening velocity".
Abstract
In order to investigate the pathophysiological background for the increased cardiac performance described in
short-term insulin-dependent diabetes, we infused glucagon intravenously in 8 healthy men at a dose of 5
ng/kg/min for 1 h and at a dose of 10 ng/kg/min for a further hour. Heart rate and blood pressure were measured
and myocardial contractility assessed by echocardiography as the fractional shortening of the left ventricle and as
the mean circumferential shortening velocity before the glucagon infusion (first base-line level), after the first
glucagon infusion period, after the second glucagon infusion period and at 1 h after stopping the glucagon infusion
(second base-line level). Plasma levels of glucagon were 79 +/- 15 ng/l, 123 +/- 76 ng/l, 381 +/- 179 ng/l and 77 +/- 22 ng/l, respectively. Heart rate decreased significantly during the first (8%, p less than 0.05) and second (6%, p
less than 0.01) glucagon infusion period compared to the mean of the first and the second base-line value. Mean
arterial blood pressure, fractional shortening of the left ventricle and mean circumferential shortening velocity
were unchanged. We conclude that increments in plasma concentrations of glucagon to levels seen in poorly
controlled diabetes does not change myocardial contractility in normal man.
Here is the greatly simplified regression analysis from Dalgaard. We can attach the thuesen
data set, in lieu of including "data=" arguments.
We fit the regression with lm(). We plot the regression line and the predicted values.
library(ISwR)  ## provides the thuesen data set
attach(thuesen)
search()
##  [1] ".GlobalEnv"        "thuesen"           "package:ISwR"
##  [4] "package:stats"     "package:graphics"  "package:grDevices"
##  [7] "package:utils"     "package:datasets"  "package:methods"
## [10] "Autoloads"         "package:base"
ls(pos=2)
## [1] "blood.glucose"  "short.velocity"
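As an alternative to attach(), the same model can be fitted by naming the data frame in the call. The sketch below uses a small made-up data frame (toy, with columns x and y; these names and values are illustrative assumptions, not the thuesen measurements) so that it runs without the ISwR package.

```r
# Made-up data for illustration only (not the thuesen measurements).
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                  y = c(1.1, 1.9, NA, 4.2, 4.8, 6.1))

# Option 1: name the data frame in the call; nothing is added to the search path.
fit.data <- lm(y ~ x, data = toy)

# Option 2: evaluate the call inside the data frame with with().
fit.with <- with(toy, lm(y ~ x))

# Both give the same coefficients; the row with the missing y is dropped.
print(coef(fit.data))
print(all.equal(coef(fit.data), coef(fit.with)))
```

Either form avoids leaving the data frame on the search path, which is the usual reason to prefer it over attach() in longer scripts.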
plot(blood.glucose, short.velocity, xlim=c(0,20), ylim=c(1,2))
theFormula = short.velocity ~ blood.glucose
summary(lm(formula = theFormula))
##
## Call:
## lm(formula = theFormula)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.40141 -0.14760 -0.02202  0.03001  0.43490
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    1.09781    0.11748   9.345 6.26e-09 ***
## blood.glucose  0.02196    0.01045   2.101   0.0479 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2167 on 21 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1737, Adjusted R-squared:  0.1343
## F-statistic: 4.414 on 1 and 21 DF,  p-value: 0.0479
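The numbers in this summary are tied together: with a single predictor, the F statistic is the square of the slope's t value (2.101 squared is about 4.414) and the two p-values coincide (0.0479). A minimal sketch of how to pull these pieces out of a summary object, using made-up data (the toy data frame and its columns x and y are assumptions for illustration, not the thuesen measurements):

```r
# Made-up data for illustration only (not the thuesen measurements).
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                  y = c(1.3, 1.1, 1.6, 1.2, 1.9, 1.5, 1.8, 2.2))

fit <- lm(y ~ x, data = toy)
s <- summary(fit)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|).
slope.t <- s$coefficients["x", "t value"]
slope.p <- s$coefficients["x", "Pr(>|t|)"]

# Overall F statistic and its p-value.
f.value <- unname(s$fstatistic["value"])
f.p <- unname(pf(s$fstatistic["value"], s$fstatistic["numdf"],
                 s$fstatistic["dendf"], lower.tail = FALSE))

# With a single predictor, F equals the squared t value of the slope,
# and R-squared equals the squared correlation between x and y.
print(all.equal(slope.t^2, f.value))
print(all.equal(slope.p, f.p))
print(all.equal(s$r.squared, cor(toy$x, toy$y)^2))
```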
summary(lm(formula = theFormula, na.action = ))
##
## Call:
## lm(formula = theFormula)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.40141 -0.14760 -0.02202  0.03001  0.43490
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    1.09781    0.11748   9.345 6.26e-09 ***
## blood.glucose  0.02196    0.01045   2.101   0.0479 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2167 on 21 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1737, Adjusted R-squared:  0.1343
## F-statistic: 4.414 on 1 and 21 DF,  p-value: 0.0479
abline(lm.out <- lm(theFormula))
abline(lm.out$coefficients, col='red', lty=3, lwd=3)
## This fails due to the missing value.
try(points(blood.glucose, predict(lm.out), col="blue", pch=2))
## This works.
missingShortVelocity = is.na(thuesen$short.velocity)
points(blood.glucose[-which(missingShortVelocity)], predict(lm.out),
       col="blue", pch=2)
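The first attempt to plot the predicted points fails because of the missing value:
blood.glucose has 24 values, but predict(lm.out) returns only the 23 fitted values, so
points() stops with a length-mismatch error. The "try()" call catches the error, so the
rest of the script keeps running.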
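One way to avoid the length mismatch without indexing by hand is to fit with na.action = na.exclude, which pads the predictions with NA at the dropped rows. The following is a sketch on made-up data (the toy data frame and the pdf(NULL) null device are assumptions for illustration; this is not the code used in this report):

```r
# Made-up data for illustration only (not the thuesen measurements).
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                  y = c(1.1, 1.9, NA, 4.2, 4.8, 6.1))

# Assumes the default setting options(na.action = "na.omit").
fit.omit    <- lm(y ~ x, data = toy)
fit.exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

# na.omit: one prediction per row actually used in the fit.
print(length(predict(fit.omit)))
# na.exclude: predictions are padded with NA at the dropped row.
print(length(predict(fit.exclude)))

pdf(NULL)  # null graphics device, so the sketch runs without a display
plot(toy$x, toy$y)

# The failing call: try() returns a "try-error" object instead of stopping.
result <- try(points(toy$x, predict(fit.omit), col = "blue", pch = 2),
              silent = TRUE)
print(inherits(result, "try-error"))

# With na.exclude the lengths match, and points() skips the NA.
points(toy$x, predict(fit.exclude), col = "blue", pch = 2)
invisible(dev.off())
```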
Note that abline() treats an "lm" object specially: it extracts the coefficients and draws the fitted line.
It seems that this is the right formula for the scientific question. See the abstract above.
What if you did the opposite regression? Now we repeat, but add a line for the reverse
regression:
plot(blood.glucose, short.velocity, xlim=c(0,20), ylim=c(1,2))
abline(lm.out$coefficients, col='red', lty=3, lwd=3)
points(blood.glucose[-which(missingShortVelocity)], predict(lm.out),
       col="blue", pch=2)
## Now add the reverse regression:
theFormulaReversed = blood.glucose ~ short.velocity
lm.Reversed.out <- lm(theFormulaReversed)
slope.original = lm.Reversed.out$coefficients['short.velocity']
slope.mirrored = 1 / slope.original
intercept.original = lm.Reversed.out$coefficients['(Intercept)']
intercept.mirrored = -intercept.original / slope.original
abline(a = intercept.mirrored, b = slope.mirrored, col='green', lty=3, lwd=3)
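The reverse regression predicts blood.glucose from short.velocity, so we have to reflect
its line (solve it for short.velocity) to show it in the same plot. Notice the tilt! The
stronger the correlation, the closer the two lines will be (small angle). Here the correlation
is weak (R-squared is only 0.1737).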
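The link between the tilt and the correlation can be made exact: the product of the two fitted slopes equals the squared correlation, so the original slope is r^2 times the mirrored slope. With R-squared = 0.1737, the original line is only about 17% as steep as the mirrored one. A sketch that checks this identity on made-up data (the toy data frame and variable names are assumptions for illustration):

```r
# Made-up data for illustration only (not the thuesen measurements).
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                  y = c(1.3, 1.1, 1.6, 1.2, 1.9, 1.5, 1.8, 2.2))

fit.yx <- lm(y ~ x, data = toy)   # original direction
fit.xy <- lm(x ~ y, data = toy)   # reverse regression

b.yx <- unname(coef(fit.yx)["x"])
b.xy <- unname(coef(fit.xy)["y"])
a.xy <- unname(coef(fit.xy)["(Intercept)"])

# The reverse fit is x = a.xy + b.xy * y; solving for y gives the mirrored line.
slope.mirrored <- 1 / b.xy
intercept.mirrored <- -a.xy / b.xy

r <- cor(toy$x, toy$y)

# Product of the two fitted slopes is r^2, so the ratio of the original slope
# to the mirrored slope is r^2 as well: the lines coincide only if |r| = 1.
print(all.equal(b.yx * b.xy, r^2))
print(all.equal(b.yx / slope.mirrored, r^2))

# Both lines pass through the point of means.
print(all.equal(intercept.mirrored + slope.mirrored * mean(toy$x), mean(toy$y)))
```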
What is the effect of the missing value for short.velocity? (Row 16)
lm(formula = theFormula, data=thuesen)
##
## Call:
## lm(formula = theFormula, data = thuesen)
##
## Coefficients:
##   (Intercept)  blood.glucose
##       1.09781        0.02196
lm(formula = theFormula, data=thuesen, na.action = na.omit)  ## default
##
## Call:
## lm(formula = theFormula, data = thuesen, na.action = na.omit)
##
## Coefficients:
##   (Intercept)  blood.glucose
##       1.09781        0.02196
lm(formula = theFormula, data=thuesen, na.action = na.exclude)  ## same
##
## Call:
## lm(formula = theFormula, data = thuesen, na.action = na.exclude)
##
## Coefficients:
##   (Intercept)  blood.glucose
##       1.09781        0.02196
lm(formula = theFormula, data=thuesen[-which(missingShortVelocity), ])
##
## Call:
## lm(formula = theFormula, data = thuesen[-which(missingShortVelocity),
##     ])
##
## Coefficients:
##   (Intercept)  blood.glucose
##       1.09781        0.02196
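Rather than comparing the printed coefficients by eye, the fits can be compared in code, and the fitted object can report which rows were dropped. A sketch on made-up data (the toy data frame is an assumption for illustration; complete.cases() is used here as one possible way to drop incomplete rows by hand):

```r
# Made-up data for illustration only (not the thuesen measurements).
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                  y = c(1.1, 1.9, NA, 4.2, 4.8, 6.1))

# Assumes the default setting options(na.action = "na.omit").
fit.default <- lm(y ~ x, data = toy)
fit.exclude <- lm(y ~ x, data = toy, na.action = na.exclude)
fit.manual  <- lm(y ~ x, data = toy[complete.cases(toy), ])

# All three fits use the same complete rows, so the coefficients agree.
print(all.equal(coef(fit.default), coef(fit.exclude)))
print(all.equal(coef(fit.default), coef(fit.manual)))

# na.action() reports which rows were dropped and how they are handled.
dropped <- na.action(fit.default)
print(as.integer(dropped))   # row index of the incomplete case
print(class(dropped))        # "omit" here; "exclude" for fit.exclude

# Number of observations actually used in the fit.
print(nobs(fit.default))
```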
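NOTE: if you leave out the call to which() (which I did at first), it leaves out the first row
instead of row 16! That's because the unary minus converts the logical values of
missingShortVelocity (FALSE and TRUE) to 0 and -1; zeros in an index are ignored, and the
single -1 drops row 1.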
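The pitfall is easy to reproduce on a short vector. The sketch below uses made-up values (is.missing and values are hypothetical names) and also shows logical negation with "!", which is a common alternative to -which():

```r
# Made-up logical vector: only the third element is "missing".
is.missing <- c(FALSE, FALSE, TRUE, FALSE, FALSE)
values <- c("a", "b", "c", "d", "e")

# Unary minus turns the logicals into numbers: FALSE -> 0, TRUE -> -1.
print(-is.missing)

# Zeros are ignored in an index and -1 drops element 1, not element 3.
print(values[-is.missing])

# which() converts to positions first, so the negative index is -3.
print(values[-which(is.missing)])

# Logical negation avoids the arithmetic altogether.
print(values[!is.missing])

# Caveat: if no element is TRUE, -which() is an empty index and selects nothing,
# whereas logical negation still keeps every element.
none.missing <- rep(FALSE, 5)
print(length(values[-which(none.missing)]))
print(length(values[!none.missing]))
```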