Analysis of Residuals
©2005 Dr. B. C. Paul
Examining Residuals of Regression (From Our Previous Example)
Set up your linear regression in the usual manner.
Selecting Plots
After setting your dependent and independent variables, and before clicking OK, click Plots instead.
Picking Residual Plots
Plot the residual on the Y axis against the predicted value on the X axis.
Ask for histograms and normal probability plots.
More Plots
Use the Next button to select another plot.
Then enter the residual on the Y axis against the dependent variable.
Finally, tell the computer to continue.
You Will Still Get the Normal Tables We Saw Before
Scroll down to see what is new.
Some Abnormality in the Histogram
A histogram is a bar chart showing the number of results in different numeric intervals.
In this case we can see there may be two families of unexplained events, and one of them is causing the model to over-predict (note the negative tail).
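The counting behind a histogram can be sketched in a few lines. This is only an illustration: the residual values and the bin width of 2.0 are invented here, not taken from the lecture's data, and `histogram_counts` is a hypothetical helper name.

```python
import math

def histogram_counts(values, bin_width):
    """Count how many values fall in each interval [left, left + bin_width)."""
    counts = {}
    for v in values:
        k = math.floor(v / bin_width)          # index of the interval holding v
        counts[k] = counts.get(k, 0) + 1
    # Report each occupied interval by its left edge, in ascending order.
    return {k * bin_width: counts[k] for k in sorted(counts)}

# Hypothetical residuals (actual minus predicted), invented for illustration.
residuals = [-7.2, -6.1, -5.5, -1.4, -0.8, -0.3, 0.2, 0.6, 0.9, 1.1, 1.7, 2.4]

counts = histogram_counts(residuals, bin_width=2.0)
for left_edge, n in counts.items():
    print(f"[{left_edge:5.1f}, {left_edge + 2.0:5.1f})  {'#' * n}")
```

With these made-up numbers, a small separate cluster shows up far out on the negative side, which is the kind of second "family" the slide describes.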
We Have a Cumulative Probability Plot
Cumulative probability counts all the samples that should have come up by a certain point (it is an integration of the probability distribution).
Normally distributed residuals would plot on a straight line. This is somewhat straight, but the slope at the center is wrong and the tails drift off. (More commentary on reading cumulative probability plots later.)
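One way to see what goes into such a plot is to compute its points by hand. The sketch below is an assumption about the construction, not the statistics package's exact method: it uses the plotting position (i - 0.5)/n for the observed cumulative probability, and the residuals are invented for illustration. `normal_pp_points` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, stdev

def normal_pp_points(residuals):
    """Return (observed, expected) cumulative probabilities, one pair per residual."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        observed = (i - 0.5) / n    # fraction of samples that have come up by r
        expected = fitted.cdf(r)    # fraction a normal distribution says should have
        points.append((observed, expected))
    return points

# Hypothetical residuals, invented for illustration.
residuals = [-7.2, -6.1, -5.5, -1.4, -0.8, -0.3, 0.2, 0.6, 0.9, 1.1, 1.7, 2.4]

pp = normal_pp_points(residuals)
for observed, expected in pp:
    print(f"observed={observed:.3f}  expected={expected:.3f}")

# Largest vertical distance from the straight line observed == expected.
max_gap = max(abs(o - e) for o, e in pp)
print(f"largest departure from the straight line: {max_gap:.3f}")
```

If the residuals were normal, the observed and expected columns would track each other closely and the points would sit near the diagonal.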
Look for Trends That Have Been Systematically Missed
This plot shows the residual (the amount we missed by) against the predicted value.
If there is a trend in the points, it may tell us what we missed.
In this case it is pretty scattered.
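The points on a residual plot can be computed directly from the fitted line. The following is a minimal sketch with invented x and y values (not the lecture's MPG data); `fit_line` is a hypothetical helper implementing an ordinary least-squares line.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, invented for illustration.
x = [1.8, 2.1, 2.4, 2.7, 3.0, 3.3, 3.6, 3.9]
y = [34.0, 31.5, 29.8, 26.1, 25.0, 21.7, 20.3, 17.9]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]   # amount we missed by

# Each (predicted, residual) pair is one point on the residual plot.
for p, r in zip(predicted, residuals):
    print(f"predicted={p:6.2f}  residual={r:+5.2f}")
```

A least-squares line always leaves residuals that sum to zero and have no straight-line relationship with the predicted values, so any pattern that remains in this plot is curvature or changing spread rather than a simple slope.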
Missing Trends
We are still missing something, because there is a definite trend in the residuals relative to the actual MPG.
We are missing a variable or factor. (The missing effect might be linear.)
Consider Another Data Set
We have an independent and a dependent variable. (The data set could represent any problem we wished to model.)
Tell it to do a regression of the dependent against the independent variable.
Be sure we also ask for our residual plots.
Go to Results
The R^2 value is 0.996. An R^2 of one is a perfect straight line; how much closer do you want to be?
This regression looks like it fits like a glove. The mean square (MS) for regression is 5 orders of magnitude greater than the MS for error.
The F statistic blows the null hypothesis off the map.
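The quantities in that results table relate to each other in a simple way: the F statistic is the mean square for regression divided by the mean square for error. The sketch below works this out for a one-variable fit. The data are invented for illustration, and `regression_anova` is a hypothetical helper, not the statistics package's routine.

```python
def regression_anova(x, y):
    """ANOVA quantities for a one-variable least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * xi for xi in x]

    ss_regression = sum((p - y_bar) ** 2 for p in predicted)   # explained by the line
    ss_error = sum((yi - p) ** 2 for yi, p in zip(y, predicted))  # left unexplained
    ms_regression = ss_regression / 1     # 1 degree of freedom: one independent variable
    ms_error = ss_error / (n - 2)         # n minus the two fitted coefficients
    return {
        "r_squared": ss_regression / (ss_regression + ss_error),
        "ms_regression": ms_regression,
        "ms_error": ms_error,
        "f_statistic": ms_regression / ms_error,
    }

# Hypothetical data lying close to a straight line, invented for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 18.9, 21.0]

table = regression_anova(x, y)
for name, value in table.items():
    print(f"{name:>14}: {value:.4f}")
```

A very large F only says the slope is not zero. It does not say the straight line is the right shape, which is why the residual plots still matter.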
No Chance the Slope or Constant Are Zero
There is some evidence the distribution of residuals is a little skewed.
The residual distribution is definitely skewed off to one side.
Oh Boy, Can You See the Trend We Missed Here?
Here the residuals follow a clear and unmistakable shape of an effect we missed.
This Thing Has a Second-Order or Curved Effect
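A small numerical experiment shows how a straight line can score a high R^2 on curved data and still leave patterned residuals. The data below are generated from an assumed quadratic, y = 2 + 3x + 0.5x^2, chosen purely for illustration; `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Hypothetical data with a deliberate second-order term (an assumption).
x = list(range(21))
y = [2 + 3 * xi + 0.5 * xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

y_bar = sum(y) / len(y)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_error = sum(r ** 2 for r in residuals)
r_squared = 1 - ss_error / ss_total

# Sign of each residual, in order of increasing x.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(f"R^2 = {r_squared:.3f}")
print(signs)
```

The residual signs run positive, then negative, then positive again: the line misses high at both ends and low in the middle, which is the curved shape a second-order effect leaves behind.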
OK, Now What Do I Do?
Linear regression rapidly and quantitatively fits a simple linear function of one variable to another.
We noted that there had to be other effects present on the gas mileage, but simple linear regression only handles one independent variable.
We also noted that sometimes there are second or higher order effects of a variable present, and a straight line just doesn't fit that.
We may want to have some more powerful tools to fall back on (we just try the easy stuff first).