Computational formulas for confidence and prediction intervals

Confidence interval for the mean of Y at a given value of X:
To compute this interval, we need the standard error of the predicted mean of Y at a given value of X,
which is unfortunately messy. It depends upon: (1) the estimated RMSE; (2) the sample size (n); (3) the
distance of the given value of X (xi) from the sample mean of X; and (4) the total variation in X (the sum
of squares of X, which is the numerator of the sample variance of X).
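A minimal sketch of how these four ingredients might be computed from raw data. The x and y values below are invented purely for illustration (they are not Burt's data), the variable names are assumptions, and the RMSE is taken with n − 2 degrees of freedom, the usual choice for a regression with one predictor.

```python
import math
import statistics

# Hypothetical (x, y) pairs, invented for illustration only; NOT Burt's data.
x = [82, 90, 97, 104, 115, 121]
y = [85, 94, 93, 108, 112, 125]

n = len(x)                                   # (2) sample size
x_bar = sum(x) / n                           # sample mean of X
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)     # (4) sum of squares of X

# Sxx is the numerator of the sample variance of X.
assert math.isclose(sxx, (n - 1) * statistics.variance(x))

# Least-squares line, needed to get the residuals behind the RMSE.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# (1) RMSE with n - 2 degrees of freedom (assumed convention).
rmse = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

x0 = 115                                     # the given value of X
distance = x0 - x_bar                        # (3) distance from the mean of X
print(n, round(x_bar, 2), round(sxx, 2), round(rmse, 3), distance)
```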
Specifically, we write this CI:
$$\hat{y}_i \pm t_{n-2}\,\mathrm{RMSE}\,\sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}}$$
At x=115 for Cyril Burt’s data we have
$$114.13 \pm 2.01\,(7.38926)\sqrt{\frac{1}{53} + \frac{(115 - 97.4)^2}{11{,}222.2}} = 114.13 \pm 3.20 = (110.93,\ 117.33)$$
Prediction interval for individual values of Y at a given value of X:
We just predicted the average value of Y at a given value of X; now we want to predict Johnny’s IQ
(he’s in a foster home) on the basis of his brother Frankie’s IQ. We need to include two sources of
variation: (1) the deviation of ŷ at xi from the true mean (as above); and (2) the deviation of Johnny from
the mean for all foster twins whose sibs have an IQ of 115. We are not predicting a mean anymore; we seek
a range of plausible values for individual cases. In other words, we’re not using the regression line
for what it was designed to do best (predict on average) but for something it is not likely to do well!
The formula for the prediction interval is very similar to that for the confidence interval above, but we
add 1 under the square root sign, which adds the residual variance of an individual case (the RMSE
squared) to the squared standard error of the predicted mean:
1

( xi  x ) 2
yˆ i  t n  2 RMSE  

1

2
 n  ( xi  x )

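One way to see where the extra 1 comes from, assuming the usual simple linear regression model with independent errors of constant variance $\sigma^2$ (an assumption this handout does not state explicitly): the error in predicting a new case at $x_i$ combines that case's own scatter about the true line with the uncertainty in the fitted line. The two are independent, so their variances add. The symbol $y_{\text{new}}$ is notation introduced here for the new case.

$$\operatorname{Var}\left(y_{\text{new}} - \hat{y}_i\right) = \underbrace{\sigma^2}_{\text{individual case}} + \underbrace{\sigma^2\left[\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right]}_{\text{estimated mean}} = \sigma^2\left[\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} + 1\right]$$

Replacing $\sigma$ with its estimate, the RMSE, and taking the square root gives the standard error used in the prediction interval.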
At x=115 for Cyril Burt’s data we have
$$114.13 \pm 2.01\,(7.38926)\sqrt{\frac{1}{53} + \frac{(115 - 97.4)^2}{11{,}222.2} + 1} = 114.13 \pm 15.16 = (98.97,\ 129.29)$$
We call this a prediction interval, not a confidence interval, because we are not estimating a population
parameter, but rather a range of plausible values for individual cases. Note that the interval is very
wide; Johnny could be average (98.97) or very bright (an IQ of nearly 130). The interval covers about 2 SDs
of IQ, which is almost too wide to be useful. Although it’s unlikely that Johnny is very dull,
we don’t know much more. Bottom line: it is very difficult to predict an individual Y on the basis of
regression models. These models are suited to predicting what happens on average. (Recall that the
confidence interval for the mean was very narrow: about 3 IQ points in either direction.)
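The contrast between the two intervals can be sketched by computing both margins from the same summary values. The function name `interval_margins` and its parameter names are illustrative assumptions; the inputs are the rounded figures quoted in this handout. With those rounded inputs the prediction margin comes out near 15.2 rather than exactly 15.16, a small difference that may reflect rounding in the handout's intermediate values.

```python
import math

def interval_margins(t_crit, rmse, n, x0, x_bar, sxx):
    """Return (margin for the mean of Y, margin for an individual Y) at X = x0."""
    # Term shared by both formulas: 1/n plus the scaled squared distance from x_bar.
    h = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    ci_margin = t_crit * rmse * math.sqrt(h)          # confidence interval for the mean
    pi_margin = t_crit * rmse * math.sqrt(1.0 + h)    # prediction interval: extra 1
    return ci_margin, pi_margin

y_hat = 114.13  # predicted value at x = 115, as quoted in the handout
ci_margin, pi_margin = interval_margins(
    t_crit=2.01, rmse=7.38926, n=53, x0=115, x_bar=97.4, sxx=11222.2,
)
pi_low, pi_high = y_hat - pi_margin, y_hat + pi_margin

print(f"mean of Y:    +/- {ci_margin:.2f}")
print(f"individual Y: +/- {pi_margin:.2f}  ->  ({pi_low:.2f}, {pi_high:.2f})")
print(f"the prediction margin is {pi_margin / ci_margin:.1f} times wider")
```

Because the extra 1 dominates the other two terms under the square root, the prediction margin stays close to t times the RMSE no matter how large the sample is, whereas the margin for the mean shrinks as n grows.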