Regression Review
Quiz Prep
Objectives
• Determine the relative error in a measurement or
prediction using a linear model
• Construct scatterplots from sets of data pairs
• Recognize when patterns of points in a scatterplot
have a linear form
• Recognize when the pattern in the scatterplot shows
that the two variables are positively or negatively
related
• Identify individual data points, called outliers, that
fall outside the general pattern of the other data
Objectives cont
• Determine residuals between the actual value and
the predicted value for each point in the data set
• Use a graphing calculator to determine a line of best
fit by the least-squares method
• Measure the strength of the correlation (association)
by a correlation coefficient
• Recognize that a strong correlation does not
necessarily imply a linear or cause and effect
relationship
Vocabulary
• Scatterplot – a graph of individual (x, y) points
• Outlier – a data point outside the general pattern of
points in the scatterplot
• Residuals – a statistical term for the error: actual
value – predicted value
• Least Squares Regression Line – a line that minimizes
the sum of the squares of all the residuals
• Linear Correlation Coefficient – r, measures how
strongly two variables follow a linear pattern
• Lurking Variable – better called an extraneous
variable; one that is not measured or accounted for in
the experiment
Vocabulary Cont
• Error – the difference between the actual value and
the predicted value; error = actual – predicted
• Observed Value – also known as the actual value
• Expected Value – also known as the predicted value
• Relative Error – the ratio of the error to the observed
value
TI-83 Instructions for Scatter Plots
• Enter explanatory variable (x-values) in L1
• Enter response variable (y-values) in L2
• Press 2nd y= for StatPlot, select 1: Plot1
• Turn plot1 on by highlighting ON and enter
• Highlight the scatter plot icon and enter
• Press ZOOM and select 9: ZoomStat
Interpreting Scatterplots
Scatter plots should be described by
– Direction
positive association (positive slope left to right)
negative association (negative slope left to right)
– Form
linear – straight line,
curved – quadratic, cubic, exponential, etc.
– Strength of the form
weak
moderate (moderately weak or moderately strong)
strong
– Outliers (any points not conforming to the form)
– Clusters (any sub-groups not conforming to the form)
Scatterplot Examples
[Figure: five example scatterplots, each with Explanatory on the horizontal axis and Response on the vertical axis: Strong Negative Linear Association; Strong Positive Linear Association; No Association; Strong Negative Nonlinear Association; Moderate Negative Linear Association]
Interpreting our Scatterplot
[Figure: scatterplot of y (vertical axis, 5 to 40) against W (horizontal axis, 100 to 250)]
– Direction: positive association
– Form: linear
– Strength of the form: relatively strong
– Outliers: (240, 15)
– Clusters: none
TI-83 Instructions for Linear Regression
• Enter explanatory variable (x-values) in L1
• Enter response variable (y-values) in L2
• Press STAT  CALC choose option 4:
• 4: LINREG(ax+b) L1,L2,Y1
– get L1 by doing 2nd Key 1; get L2 by 2nd 2
– get Y1 by hitting VARS, Y-VARS, enter, enter
– remember to put comma’s between them
TI-83 Output for Linear Regression
LinReg
y = ax + b
a = 3.567296997
b = -19.40934372
r² = .9147134779
r = .9564065443
Linear Regression Model:
y = 3.5673x – 19.4093
To predict a value using the model, plug in x=#
into the model.
y = 3.5673(12) – 19.4093 = 23.3983
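The same prediction can be checked away from the calculator. The following is a minimal Python sketch that uses the rounded slope and intercept from the output above; the function name `predict` is chosen here for illustration only.

```python
# Coefficients rounded to four decimal places, as in the model above.
SLOPE = 3.5673
INTERCEPT = -19.4093


def predict(x):
    """Return the predicted response (y-hat) for an input x."""
    return SLOPE * x + INTERCEPT


y_hat = predict(12)
print(round(y_hat, 4))  # 23.3983
```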
Residuals Part One
• A positive residual means that the observed (actual)
value, y, lies above the line (predicted value, y-hat);
the predicted value is smaller
• A negative residual means that the observed (actual)
value, y, lies below the line (predicted value, y-hat);
the predicted value is larger
• Order is not optional: residual = actual – predicted
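The sign convention can be sketched in a few lines of Python. The actual values 25 and 20 below are hypothetical and are paired with the predicted value 23.3983 only to show one point on each side of the line.

```python
def residual(actual, predicted):
    """Residual = actual - predicted (the order matters)."""
    return actual - predicted


def position(actual, predicted):
    """Describe where the observed point lies relative to the line."""
    r = residual(actual, predicted)
    if r > 0:
        return "above the line"
    if r < 0:
        return "below the line"
    return "on the line"


# Hypothetical actual values compared with a predicted value of 23.3983.
print(position(25, 23.3983))  # above the line
print(position(20, 23.3983))  # below the line
```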
Least Squares Regression Line
[Figure: scatterplot with a fitted line; the vertical segments from each point to the line are labeled "residual"]
• The blue line minimizes the sum of the
squares of the residuals (dark vertical lines)
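What the calculator does can be sketched by hand. The Python below uses the standard closed-form least-squares formulas for slope and intercept, then compares the sum of squared residuals for the fitted line against a slightly shifted line. The five data pairs are made up for illustration and are not the data behind the calculator output.

```python
def least_squares(xs, ys):
    """Return slope a and intercept b of the least-squares line y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of (actual - predicted)^2 over all data points."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
shifted = sum_squared_residuals(xs, ys, a + 0.1, b)
print(round(a, 4), round(b, 4))
print(best < shifted)  # True: changing the slope increases the sum
```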
Important Properties of r
• Correlation makes no distinction between explanatory
and response variables
• r does not change when we change the units of
measurement of x, y or both
• Positive r indicates positive association between the
variables and negative r indicates negative
association
• The correlation r is always a number between -1 and 1
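These properties can be checked numerically. The sketch below computes r with the usual sums-of-deviations formula on made-up data, then swaps the variables, rescales x (for example, inches to centimetres), and flips the sign of y.

```python
import math


def correlation(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                       # explanatory and response exchanged
r_rescaled = correlation([2.54 * x for x in xs], ys)  # change of units for x
r_negated = correlation(xs, [-y for y in ys])         # association reversed

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4), round(r_negated, 4))
```

In this sketch the first three values agree and the last has the opposite sign, matching the bullets above.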
Example 2
Match the r values to the scatterplots
[Figure: six scatterplots labeled A through F]
1) r = -0.99: F
2) r = -0.7: E
3) r = -0.3: D
4) r = 0: A
5) r = 0.5: B
6) r = 0.9: C
Residuals Part Two
• The sum of the least-squares residuals is always zero
• Residual plots help assess how well the line
describes the data
• A good fit has
– no discernible pattern to the residuals
– and the residuals should be relatively small in size
• A poor fit violates at least one of the above
– Discernible patterns:
Curved residual plot
Increasing / decreasing spread in residual plot
(horn-effect)
– Outliers may indicate a poor fit
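The zero-sum property is easy to confirm numerically. This sketch fits a least-squares line to made-up data, lists the residuals that a residual plot would display against x, and adds them up. Floating-point rounding can leave a sum that is extremely close to zero rather than exactly zero.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return a, mean_y - a * mean_x


# Hypothetical data pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]

# Each (x, residual) pair is one point of the residual plot.
for x, res in zip(xs, residuals):
    print(x, round(res, 2))

print("sum of residuals is about zero:", abs(sum(residuals)) < 1e-9)
```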
Residuals Part Two Cont
[Figure: three residual plots]
A) Unstructured scatter of residuals: GOOD FIT
B) Curved pattern of residuals: POOR FIT
C) Horn-effect: POOR FIT
Cause-and-Effect Relationships
• A strong correlation between two variables does not
mean that a cause-and-effect relationship exists
• For example, there is a strong correlation between
the number of drownings in a month and the number
of cases of Rocky Mountain spotted fever
• Both are tied to the seasonal warming of summer
and have no direct effect on each other
• Cause and effect can only be determined by a
well-designed experiment and never by observation
Interpolation and Extrapolation
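[Figure: number line of input values. Predictions between the minimum input value and the maximum input value are Interpolation; predictions below the minimum or above the maximum are Extrapolation]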
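The distinction comes down to a range check on the input. The sketch below is one way to express it in Python; the function name and the list of input values are hypothetical.

```python
def prediction_type(x, inputs):
    """Classify a prediction at x relative to the range of the input data."""
    lowest, highest = min(inputs), max(inputs)
    if lowest <= x <= highest:
        return "interpolation"
    return "extrapolation"


# Hypothetical input (x) values from a data set.
inputs = [100, 125, 150, 175, 200, 225, 250]

print(prediction_type(180, inputs))  # interpolation
print(prediction_type(300, inputs))  # extrapolation
print(prediction_type(50, inputs))   # extrapolation
```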
Error and Relative Error
• The error in a prediction is the difference between the
observed value (the actual value that was measured) and
the predicted value from the model. The relative
error is the ratio of the error to the observed value.
$$\text{relative error} = \frac{\text{observed value} - \text{predicted value}}{\text{observed value}} = \frac{\text{error}}{\text{observed value}}$$
• It is normally expressed as a percentage
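A short worked example in Python, as a sketch only: the observed value of 25 is hypothetical, and the predicted value 23.3983 is the model's prediction at x = 12.

```python
def error(observed, predicted):
    """Error = observed (actual) value - predicted value."""
    return observed - predicted


def relative_error(observed, predicted):
    """Ratio of the error to the observed value."""
    return error(observed, predicted) / observed


observed = 25.0      # hypothetical measurement, for illustration only
predicted = 23.3983  # model prediction at x = 12

rel = relative_error(observed, predicted)
print(round(error(observed, predicted), 4))  # 1.6017
print(f"{rel:.1%}")                          # 6.4%
```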