Residual and Residual Plot

© Copyright 2015. All rights reserved. www.cpalms.org
Introduction
Maybe you noticed that you missed a lot of free throws
in basketball games. You decided to practice your free
throw shooting to improve.
Maybe you told a joke that hurt your friend’s feelings.
You remembered to be more sensitive around him or
her in the future.
We all learn from our mistakes. In mathematics, too, you
can learn a lot about data by looking at error.
That’s what this lesson is all about!
Vocabulary
residual
residual plot
Residuals
A residual is the difference between the observed value of the
dependent variable (y) and the predicted value (ŷ)
from the regression line. Each data point has one
residual.
Residual = Observed value – Predicted value
residual = y – ŷ
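As a minimal sketch of the formula above, the Python below computes one residual per data point. The function name `residuals` and the sample numbers are made up for illustration; they do not come from the lesson.

```python
def residuals(observed, predicted):
    """Return one residual (observed - predicted) for each data point."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


# Hypothetical observed and predicted y-values (not from the lesson).
observed_y = [10, 7, 5]
predicted_y = [9, 7, 6]

print(residuals(observed_y, predicted_y))  # prints [1, 0, -1]
```

A positive residual means the data point lies above the line, a negative residual means it lies below the line, and a residual of zero means the point is exactly on the line.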
Why Do We Do Residual
Analysis in Regression?
Because the estimated linear regression line you
calculated may not be the “best” linear
regression line.
Because a linear regression model is not always
appropriate for the data.
You should assess the appropriateness of the
model by calculating residuals and examining
residual plots.
Residual Plots
A residual plot is a graph that shows the residuals on the
vertical axis and the independent variable on the horizontal
axis.
If the points in a residual plot look randomly dispersed
around the horizontal axis (no obvious pattern), a linear
regression model is appropriate for the data.
Otherwise, a different linear equation or a non-linear model
is more appropriate.
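One rough way to look for a pattern before drawing the graph is to list the sign of each residual in order of x. The sketch below is only an informal aid, not a replacement for looking at the residual plot. The helper names, the parameters `slope` and `intercept`, and the sample data are all assumptions made for illustration.

```python
def residual_points(xs, ys, slope, intercept):
    """Return the (x, residual) pairs that a residual plot would show."""
    return [(x, y - (slope * x + intercept)) for x, y in zip(xs, ys)]


def sign_pattern(points):
    """Return '+', '-', or '0' for each residual, ordered by x."""
    signs = []
    for _, r in sorted(points):
        signs.append("+" if r > 0 else "-" if r < 0 else "0")
    return "".join(signs)


# Hypothetical curved data, modeled with the hypothetical line y = 6x - 7.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

points = residual_points(xs, ys, slope=6, intercept=-7)
print(points)                # prints [(1, 2), (2, -1), (3, -2), (4, -1), (5, 2)]
print(sign_pattern(points))  # prints +---+
```

Here the residuals are positive at both ends and negative in the middle, which is a U-shaped pattern rather than a random scatter. Signs that switch back and forth with no clear order are closer to what a random pattern looks like.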
Random pattern:
a linear regression model is appropriate.
Non-random pattern:
a non-linear model is more appropriate.
Example 1
The equation
y = −2x + 20 models the
data in the table at the
left. Is the model a
good fit?
Step 1: Calculate the residuals and organize
your calculations in a table.
The table has columns for x, the observed y, the predicted
value ŷ, and the residual.
Plug each x value into y = −2x + 20 to get the
predicted value ŷ from the model.
Then find the residual (the difference between the
observed and predicted values): residual = y – ŷ
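The table from the slide is not reproduced in this transcript, so the sketch below uses made-up (x, y) values purely to show the Step 1 calculation for y = −2x + 20. The function names and the data are assumptions, not the lesson's actual numbers.

```python
def predict(x):
    """Predicted value from the Example 1 model, y = -2x + 20."""
    return -2 * x + 20


def residual_table(data):
    """Return rows of (x, observed y, predicted y, residual)."""
    return [(x, y, predict(x), y - predict(x)) for x, y in data]


# Hypothetical data points; the lesson's own table is not available here.
data = [(1, 19), (2, 15), (3, 14), (4, 13), (5, 9)]

# Print the table with one row per data point.
print(f"{'x':>3} {'y':>4} {'y-hat':>6} {'residual':>9}")
for x, y, y_hat, r in residual_table(data):
    print(f"{x:>3} {y:>4} {y_hat:>6} {r:>9}")
```

With these made-up values the residuals come out as 1, −1, 0, 1, −1, a mix of small positive and negative numbers with no obvious order.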
Step 2: Use the points (x, residual)
to make a scatter plot, with x on the horizontal axis
and the residual on the vertical axis.
The points are
randomly
dispersed about
the horizontal axis.
So, the equation
y = −2x + 20 is a
good fit.
Example 2
The table at the left shows the
ages x and salaries y (in
thousands of dollars) of eight
employees at a company. The
equation y = 0.2x + 38 models
the data. Is the model a good fit?
Step 1: Calculate the residuals and organize your
calculations in a table.
Plug each x value into
y = 0.2x + 38 to get the predicted value ŷ.
Then find each residual: residual = y – ŷ
Step 2: Use the points (x, residual)
to make a scatter plot.
The points form a ∩-shaped pattern. So, the
equation y = 0.2x + 38
does not model the
data well.
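To see what such a pattern can look like, the sketch below draws a rough text version of a residual plot for y = 0.2x + 38. The ages and salaries are made up for illustration (the lesson's table is not included in this transcript), and the function names are assumptions. The plot is turned sideways: x runs down the page and the residual runs left to right, with `|` marking a residual of zero.

```python
def predict_salary(age):
    """Predicted salary (thousands of dollars) from y = 0.2x + 38."""
    return 0.2 * age + 38


def text_residual_plot(data, width=5):
    """Return one text row per point, ordered by x.

    '*' sits left of '|' for a negative residual and right of it for a
    positive one; a residual that rounds to zero replaces the '|'.
    """
    rows = []
    for x, y in sorted(data):
        r = round(y - predict_salary(x))   # residual to the nearest unit
        r = max(-width, min(width, r))     # keep the marker on the row
        cells = [" "] * (2 * width + 1)
        cells[width] = "|"
        cells[width + r] = "*"
        rows.append(f"{x:>3} " + "".join(cells).rstrip())
    return rows


# Hypothetical (age, salary) pairs for eight employees; not the lesson's data.
data = [(25, 40), (30, 43), (35, 46), (40, 48),
        (45, 49), (50, 49), (55, 48), (60, 47)]

print("\n".join(text_residual_plot(data)))
```

With these made-up values the markers start to the left of the zero line, move to the right for the middle ages, and return to the left for the oldest employees. On a standard residual plot that is the same ∩ shape.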
Recap:
One way to determine how well a line of fit models
a data set is to analyze residuals.
A residual is the difference between the observed
y-value of a data point and the corresponding
predicted y-value found using the line of fit. A
residual can be positive, negative, or zero.
A plot of the residuals shows how well a model fits
a data set.
If the model is a good fit, then the residual points
will be randomly dispersed about the horizontal
axis. If the model is not a good fit, then the residual
points will form some type of pattern.