Confidence interval for the mean of Y at a given value of X: To compute this interval, we need the standard error for the predicted mean of Y at a given value of X, which is unfortunately messy. It depends upon: (1) the estimated RMSE; (2) the sample size (n); (3) the distance of the given value of X (x_i) from the sample mean of X; and (4) the total variation in X (the sum of squares of X, which is the numerator of the sample variance of X). Specifically, we write this CI:

$$\hat{y}_i \pm t_{n-2} \cdot \mathrm{RMSE} \cdot \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}}$$

where the sum in the denominator runs over all n sample values of X. At x = 115 for Cyril Burt’s data we have

$$114.13 \pm 2.01\,(7.38926)\sqrt{\frac{1}{53} + \frac{(115 - 97.4)^2}{11{,}222.2}} = 114.13 \pm 3.20 = (110.93,\ 117.33)$$

Prediction interval for the values of Y at a given value of X: We just predicted the average value of Y at a given value of X; now we want to predict Johnny’s IQ (he’s in a foster home) on the basis of his brother Frankie’s IQ. We need to include 2 sources of variation: (1) the deviation of ŷ at x_i from the true mean (as above); and (2) the deviation of Johnny from the mean for all foster twins whose sibs have an IQ of 115. We are not predicting a mean anymore; we seek a range of plausible values for individual cases. In other words, we’re not using the regression line for what it was designed to do best (predict on average) but for something it is not likely to do well!

The formula for the prediction interval is very similar to that for the confidence interval above, but we add an additional value of 1 under the square root sign, which essentially adds an additional factor of the RMSE to the interval:

$$\hat{y}_i \pm t_{n-2} \cdot \mathrm{RMSE} \cdot \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}}$$

At x = 115 for Cyril Burt’s data we have

$$114.13 \pm 2.01\,(7.38926)\sqrt{1 + \frac{1}{53} + \frac{(115 - 97.4)^2}{11{,}222.2}} = 114.13 \pm 15.16 = (98.97,\ 129.29)$$

We call this a prediction interval, not a confidence interval, because we are not estimating a population parameter, but rather a range of plausible values for individual cases.
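The two calculations at x = 115 can be checked with a few lines of Python. This is only a minimal sketch: it plugs in the rounded summary statistics quoted above (n = 53, mean of X = 97.4, sum of squares of X = 11,222.2, RMSE = 7.38926, t = 2.01, fitted value 114.13) rather than refitting the regression. The function names `half_width` and `interval` are made up for this illustration.

```python
import math

# Summary statistics quoted in the text for Cyril Burt's data (rounded values).
N = 53                 # sample size
X_BAR = 97.4           # sample mean of X
SS_X = 11222.2         # sum of squared deviations of X about its mean
RMSE = 7.38926         # root mean squared error of the regression
T_CRIT = 2.01          # critical t value for n - 2 = 51 df, as rounded in the text
Y_HAT_AT_115 = 114.13  # fitted value at x = 115, taken from the text


def half_width(x, individual=False):
    """Half-width of the interval at x.

    individual=False: confidence interval for the mean of Y at x.
    individual=True:  prediction interval for a single case at x
                      (adds 1 under the square root).
    """
    core = 1.0 / N + (x - X_BAR) ** 2 / SS_X
    if individual:
        core += 1.0
    return T_CRIT * RMSE * math.sqrt(core)


def interval(y_hat, x, individual=False):
    """Return (lower, upper) bounds centred on the fitted value y_hat."""
    h = half_width(x, individual)
    return (y_hat - h, y_hat + h)


ci = interval(Y_HAT_AT_115, 115)
pi = interval(Y_HAT_AT_115, 115, individual=True)
print("CI for the mean:     (%.2f, %.2f)" % ci)
print("Prediction interval: (%.2f, %.2f)" % pi)
```

Because the inputs are rounded, the printed bounds may differ from the hand-computed figures by a few hundredths of an IQ point.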
Note that the interval is very wide; Johnny could be average (98.97) or very, very bright (IQ of nearly 130). The interval covers 2 standard deviations of IQ, which is almost so wide as to not be very useful. Although it’s unlikely that Johnny is very dull, we don’t know much more. Bottom line: it is very difficult to predict an individual Y on the basis of regression models. These models are suited to predicting what happens on average. (Recall that the confidence interval for the mean was very narrow: about 3 IQ points in either direction.)
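One way to see why the prediction interval stays wide is to tabulate both half-widths at several values of X. The sketch below again uses the rounded summary statistics from the worked example. The x values 80, 97.4, 115 and 130 are chosen purely for illustration and are not taken from the data, and the helper name `half_widths` is hypothetical. Because of the extra 1 under the square root, the prediction half-width can never fall below t × RMSE, however close x is to the mean of X.

```python
import math

# Rounded summary statistics from the worked example in the text.
N, X_BAR, SS_X = 53, 97.4, 11222.2
RMSE, T_CRIT = 7.38926, 2.01


def half_widths(x):
    """Return (CI half-width, prediction-interval half-width) at x."""
    core = 1.0 / N + (x - X_BAR) ** 2 / SS_X
    ci = T_CRIT * RMSE * math.sqrt(core)
    pi = T_CRIT * RMSE * math.sqrt(1.0 + core)
    return ci, pi


# Lower bound on the prediction half-width: the term contributed by the
# extra 1 under the square root, which does not depend on x or n.
floor = T_CRIT * RMSE

# Illustrative x values only (assumed, not from the data set).
for x in (80, 97.4, 115, 130):
    ci, pi = half_widths(x)
    print("x = %6.1f   CI half-width = %5.2f   PI half-width = %5.2f" % (x, ci, pi))
print("Minimum possible PI half-width: %.2f" % floor)
```

The confidence half-width shrinks toward zero as n grows and x approaches the mean of X, while the prediction half-width is bounded below by roughly 15 IQ points for these data.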