Accuracy of Prediction
How accurate are predictions
based on a correlation?
Accuracy depends on rXY
• If we know nothing about an individual (e.g.,
we try to predict the IQ of a randomly
selected person), we should guess the
mean.
• If we always guess the mean, then the
variance of Y tells us the average “cost”
(squared error) of our guesses.
• However, if we use X to predict Y, we can
reduce this cost by a proportion equal to
r-squared.
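The claim above can be checked with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_costs`, the normal distributions, the sample size, and the seed are illustrative choices, not part of the original slides. It compares the average squared error of always guessing the mean with that of predicting Y from X.

```python
import random

def simulate_costs(r, n=100_000, mean=100.0, sd=15.0, seed=0):
    """Return (cost of guessing the mean, cost of predicting from X).

    "Cost" is the average squared error. X and Y are assumed normal with
    the same mean and SD, and are constructed to have correlation r.
    """
    rng = random.Random(seed)
    cost_mean = 0.0
    cost_pred = 0.0
    for _ in range(n):
        x = rng.gauss(mean, sd)
        # Build Y so that corr(X, Y) = r and sd(Y) = sd.
        noise = sd * (1 - r ** 2) ** 0.5 * rng.gauss(0, 1)
        y = mean + r * (x - mean) + noise
        # Regression prediction when X and Y have equal SDs.
        y_hat = mean + r * (x - mean)
        cost_mean += (y - mean) ** 2
        cost_pred += (y - y_hat) ** 2
    return cost_mean / n, cost_pred / n

cost_mean, cost_pred = simulate_costs(r=0.5)
reduction = 1 - cost_pred / cost_mean
print(f"guess the mean: {cost_mean:.1f}")
print(f"predict from X: {cost_pred:.1f}")
print(f"proportional reduction: {reduction:.3f}")
```

With r = .5 the proportional reduction should come out close to .25, up to sampling noise.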
On Sale: How Accurate?
• By squaring the correlation, we find the
proportion of the variance that is eliminated
by using X to predict Y.
• If r = 1 or r = -1, the squared value is 1.
These are both cases of perfect prediction,
like 100% off.
• If r = ½ or r = -½, the squared correlation is
¼ or .25. This means that a correlation of .5
only reduces the cost by 25%.
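To make the "discount" concrete, here is a short sketch that tabulates the saving for a few correlations. The helper names `variance_reduction` and `sd_reduction` are hypothetical, introduced only for this illustration. The second helper shows the saving measured on the standard deviation scale, which is smaller than the saving in variance.

```python
import math

def variance_reduction(r):
    """Proportion of the variance of Y removed by predicting from X."""
    return r ** 2

def sd_reduction(r):
    """Proportion by which the standard deviation of the errors shrinks."""
    return 1 - math.sqrt(1 - r ** 2)

for r in (1.0, -1.0, 0.5, -0.5, 0.0):
    print(
        f"r = {r:+.1f}: "
        f"variance off {variance_reduction(r):.0%}, "
        f"SD off {sd_reduction(r):.1%}"
    )
```

The sign of r does not matter, because only the squared value enters the calculation.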
Variance of Residuals and the “standard
error of regression”
• The average squared deviation between the
guess and the actual value of Y is called the
variance of residuals (errors).
• You compute it by multiplying the original
variance of Y by (1 – r²), where r is the
correlation between X and Y.
• The standard error of regression is the
square root of this variance.
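Written as formulas, the two definitions above look like this. The symbols are assumptions chosen for illustration: the slides do not name them beyond r, so here the variance of Y is written as σ²_Y and the variance of residuals as σ²_res.

```latex
\sigma^2_{\mathrm{res}} = \sigma^2_Y \,\bigl(1 - r_{XY}^2\bigr),
\qquad
\sigma_{\mathrm{res}} = \sqrt{\sigma^2_{\mathrm{res}}}
                      = \sigma_Y \sqrt{1 - r_{XY}^2}
```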
Sample Problem
• Suppose we use a sister’s IQ (X) to predict
her brother’s IQ (Y). The means of X and Y are
both 100, and the standard deviations are
both 15.
• The variance of the errors in predicting Joe’s
IQ if we don’t know Jane’s IQ is 15² = 225.
• The correlation is .5, so the variance of the
residuals is (1 – .25)(225) = 168.75.
Standard Deviation of Errors
• Take the square root of the variance of
residuals to compute the standard error of
regression, i.e., the standard deviation of
differences between predicted and obtained
values.
• For our problem, the square root is 12.99,
approximately 13.
• Knowing the sister’s IQ reduces the standard
deviation of residuals from 15 to 13.
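The arithmetic of the sample problem can be reproduced in a few lines. This is a minimal sketch. The variable names are illustrative, and the numbers are the ones given in the problem.

```python
import math

sd_y = 15.0   # standard deviation of brothers' IQ (from the problem)
r = 0.5       # correlation between sister's and brother's IQ

var_y = sd_y ** 2                        # variance when guessing the mean
var_residuals = (1 - r ** 2) * var_y     # variance of residuals
se_regression = math.sqrt(var_residuals) # standard error of regression

print(var_y, var_residuals, round(se_regression, 2))
# prints: 225.0 168.75 12.99
```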
Summary
• If Jane has an IQ of 130, we predict her
brother to have an IQ of
100 + .5(130 – 100) = 115.
• However, not all brothers of sisters with
an IQ of 130 will have an IQ of exactly 115.
• Their IQs will have a mean of
115, with a standard deviation of 13.
• Assuming the errors are normally distributed,
the probability that Joe has a higher IQ than
his sister is only about 12%.
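The 12% figure can be reproduced as follows. This sketch assumes that the prediction errors are normally distributed, which the slides do not state explicitly, and uses `statistics.NormalDist` from the Python standard library. Variable names are illustrative.

```python
from statistics import NormalDist

mean_x = mean_y = 100.0   # means of sister's (X) and brother's (Y) IQ
sd_x = sd_y = 15.0        # standard deviations of X and Y
r = 0.5                   # correlation between X and Y
jane_iq = 130.0

# Regression prediction of Joe's IQ from Jane's IQ.
slope = r * sd_y / sd_x
predicted = mean_y + slope * (jane_iq - mean_x)

# Standard error of regression (standard deviation of the residuals).
se = sd_y * (1 - r ** 2) ** 0.5

# Assumption: Joe's IQ given Jane's IQ is normal(predicted, se).
p_higher = 1 - NormalDist(mu=predicted, sigma=se).cdf(jane_iq)

print(predicted, round(se, 2), round(p_higher, 3))
```

Jane's IQ of 130 is about 1.15 standard errors above the predicted 115, which leaves roughly 12% of the distribution above her score.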