Linear Regression
• Essentials
• Line Basics
• $y = mx + b$ vs. $\hat{y} = b_0 + b_1 x$
• Definitions
• Scatter Plot & Regression Line
• Notation & Formulae
• Regression Considerations
• Line of Best Fit – Least Squares Line
• Example
Essentials: Regression
(Predictions based upon the known.)
• Understand what the regression process does: prediction.
• Be able to state the steps we use leading up to the decision to conduct regression.
• Be able to calculate the slope of a line and the y-intercept.
• Be able to calculate a regression equation and apply it to the prediction of other values. Know that these are estimates, not necessarily the actual values that might occur.
• Know what the Least Squares Property and the Line of Best Fit are, and what a residual is.
A Linear Equation in One Independent Variable
$y = mx + b$
• x is the independent variable (also known as the predictor variable).
• y is the dependent variable (also called the response variable). Its value depends on the value of x.
• b is the y-intercept (the point at which the line intersects the y-axis). It is the value of y when x = 0.
• m is the slope of the line. The slope indicates how much the y-value increases (or decreases, if the slope is negative) when the x-value increases by 1 unit. When m is positive, the line slopes upward; when m is negative, the line slopes downward.
[Figures: coordinate grids (x from -4 to 4, y from -5 to 5) showing the labeled points (-1, 4) and (-2, 2), and the line y = 2x + 1 as an example of y = mx + b.]
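As a minimal sketch of the slope and y-intercept calculations (Python is used only for illustration, and the function name `slope_intercept` is hypothetical), the snippet below finds m as rise over run and then solves y = mx + b for b. The two points are the ones labeled in the figure, (-2, 2) and (-1, 4); the figure residue does not show whether a line is drawn through them, so treat this purely as arithmetic practice.

```python
def slope_intercept(p1, p2):
    """Return (m, b) for the line y = m*x + b passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b

m, b = slope_intercept((-2, 2), (-1, 4))
print(m, b)  # 2.0 6.0
```

The same helper recovers the example line from the slide: the points (0, 1) and (1, 3) give m = 2 and b = 1, i.e. y = 2x + 1.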
The Regression Equation
$\hat{y} = b_0 + b_1 x$ (recall, $y = mx + b$)
where:
x is the independent variable (predictor variable)
$\hat{y}$ is the predicted value of the dependent variable (response variable)
$b_0$ = y-intercept
$b_1$ = slope
Regression Definitions
Regression Equation
Given a collection of paired data, the regression equation $\hat{y} = b_0 + b_1 x$ algebraically describes the relationship between the two variables.
Regression Line (line of best fit or least-squares line)
The regression line is the graph of the regression equation.
Always Look at a Scatterplot First
You should be able to “see” a straight line being
passed through the data points.
Regression Line Plotted on Scatterplot
The regression line is calculated to minimize the sum of the squared vertical distances between the observed y-values and the line.
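Notation for Regression Equation
• y-intercept of regression equation: population parameter $\beta_0$; sample statistic $b_0$
• Slope of regression equation: population parameter $\beta_1$; sample statistic $b_1$
• Equation of the regression line: population parameter $y = \beta_0 + \beta_1 x$; sample statistic $\hat{y} = b_0 + b_1 x$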
Formulas for $b_0$ and $b_1$
Slope:
$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
y-intercept:
$$b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$$
NOTE: If you find $b_1$ first, then $b_0$ may be determined more simply by:
$$b_0 = \bar{y} - b_1 \bar{x}$$
The Regression Line
The regression line $\hat{y} = b_0 + b_1 x$ fits the sample points best: the sum of the squared vertical distances between the line and the sample points is at a minimum.
When Is It Reasonable to Do Regression?
• Start by asking the following:
  – Does it make sense to look at the relationship between these two variables?
  – Does a scatter plot present a relationship (either positive or negative)?
  – If yes to both, calculate r (the correlation).
• Is the correlation statistically significant?
  – Yes: go on to regression.
  – No: the best estimate becomes the mean of the y variable.
• Conduct regression analysis (if yes above).
  – Use the regression equation to calculate (estimate) a y-value given a specific x-value.
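One way to carry out the "Is the correlation statistically significant?" step is sketched below, assuming the usual t test of $\rho = 0$ with $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and $n - 2$ degrees of freedom. The slides do not state a significance level; the critical value 2.262 (two-tailed, $\alpha = 0.05$, 9 degrees of freedom) is an assumption taken from a standard t table. The helper names are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r from the summation formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = n * sum(x * x for x in xs) - sx ** 2
    syy = n * sum(y * y for y in ys) - sy ** 2
    sxy = n * sum(x * y for x, y in zip(xs, ys)) - sx * sy
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Orion Cars data (age in years, price in $100's)
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]

r = correlation(ages, prices)
t = t_statistic(r, len(ages))
T_CRIT = 2.262  # assumed: two-tailed alpha = 0.05, df = 9
print(round(r, 3), round(t, 3), abs(t) > T_CRIT)  # -0.924 -7.237 True
```

Here |t| far exceeds the critical value, so under these assumptions the correlation is significant and regression is justified; the t of about -7.237 also matches the t reported for Orion Car Age in the SPSS output.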
Predictions
In predicting a value of y based on some given value
of x ...
1. If there is not a significant linear correlation, the best predicted y-value is $\bar{y}$, the sample mean of y.
2. If there is a significant linear correlation, the best
predicted y-value is found by substituting the
x-value into the regression equation.
Predicting the Value of a Variable
Start: Calculate the value of r and test the hypothesis that $\rho = 0$.
Is there a significant linear correlation?
• No: Given any value of one variable, the best predicted value of the other variable is its sample mean.
• Yes: Use the regression equation to make predictions. Substitute the given value in the regression equation.
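The flowchart's two branches can be sketched as a single function. This is an illustration only: `best_predicted_y` is a hypothetical name, and the `significant` flag is assumed to come from a correlation test carried out beforehand.

```python
def best_predicted_y(x_new, xs, ys, significant):
    """Best predicted y for x_new, following the prediction flowchart."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    if not significant:
        # No significant linear correlation: use the sample mean of y
        return y_bar
    # Significant correlation: substitute x_new into the least-squares equation
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0 + b1 * x_new

# Orion Cars data (age in years, price in $100's)
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]

print(round(best_predicted_y(4.5, ages, prices, significant=True), 2))   # 104.29
print(round(best_predicted_y(4.5, ages, prices, significant=False), 2))  # 88.64
```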
Guidelines for Using the Regression Equation
• If there is no significant linear correlation, don't use the regression equation to make predictions.
• When using the regression equation for predictions, stay within the scope of the available sample data.
• A regression equation based on old data is not necessarily valid now.
• Don't make predictions about a population that is different from the population from which the sample data was drawn.
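The "stay within the scope of the available sample data" guideline can be enforced mechanically. The sketch below uses a hypothetical helper, `predict_in_scope`, with the rounded Orion Cars coefficients from the SPSS output and the observed age range of 2 to 7 years; refusing to extrapolate is a design choice for illustration, not something the deck prescribes.

```python
def predict_in_scope(x_new, b0, b1, x_min, x_max):
    """Return y-hat = b0 + b1*x_new, refusing to extrapolate beyond [x_min, x_max]."""
    if not x_min <= x_new <= x_max:
        raise ValueError(
            f"x = {x_new} is outside the sample range [{x_min}, {x_max}]; "
            "the regression equation should not be used there"
        )
    return b0 + b1 * x_new

# Rounded Orion Cars coefficients; observed ages run from 2 to 7 years
B0, B1 = 195.468, -20.261
print(round(predict_in_scope(4.5, B0, B1, 2, 7), 2))  # 104.29
try:
    predict_in_scope(10, B0, B1, 2, 7)
except ValueError as err:
    print(err)
```

Evaluating the equation at 10 years anyway gives 195.468 - 202.61, about -7.14, a negative price, which shows why the guideline matters.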
Definitions
• Marginal Change: the amount a variable changes when the other variable changes by exactly one unit. For a regression equation, the marginal change in $\hat{y}$ when x increases by one unit is the slope $b_1$.
• Outlier: a point lying far away from the other data points.
• Influential Points: points that strongly affect the graph of the regression line.
Residuals and the Least-Squares Property
Definitions
• Residual: for a sample of paired (x, y) data, the difference $y - \hat{y}$ between an observed sample y-value and the value of $\hat{y}$, which is the value of y predicted by the regression equation.
• Least-Squares Property: a straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.
Residuals and the Least-Squares Property
$\hat{y} = 5 + 4x$
x: 1, 2, 4, 5
y: 4, 24, 8, 32
[Figure: scatterplot of these four points with the line $\hat{y} = 5 + 4x$; the residuals marked are -5 (x = 1), 11 (x = 2), -13 (x = 4), and 7 (x = 5).]
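The residuals in the figure can be checked directly. In this small sketch (helper names hypothetical), each residual is $y - \hat{y}$ for the line $\hat{y} = 5 + 4x$. For these four points the summation formulas give exactly $b_1 = 4$ and $b_0 = 5$, so this line is the least-squares line, and nudging either coefficient increases the sum of squared residuals.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def residuals(b0, b1, xs, ys):
    """Observed y minus predicted y-hat = b0 + b1*x, for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sum_of_squares(values):
    return sum(v * v for v in values)

res = residuals(5, 4, xs, ys)
print(res)                  # [-5, 11, -13, 7]
print(sum_of_squares(res))  # 364

# A few other candidate lines: each has a larger sum of squared residuals
for b0, b1 in [(6, 4), (5, 4.5), (4, 3.5)]:
    print(b0, b1, sum_of_squares(residuals(b0, b1, xs, ys)))
```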
Example: Orion Cars
Orion Cars: The age and price for a sample of 11 Orions are noted below. Calculate a correlation coefficient and, if appropriate, a regression equation for the relationship. Determine the value of cars that are 4.5 years and 10 years old.
Car   Age (yrs.)   Price ($100's)
1     5            85
2     4            103
3     6            70
4     5            82
5     5            89
6     5            98
7     6            66
8     6            95
9     2            169
10    7            70
11    7            48
Example: Orion Cars
Statistics
                   Orion Car Age   Orion Car Price
N (Valid)          11              11
N (Missing)        12              12
Mean               5.27            88.64
Median             5.00            85.00
Std. Deviation     1.421           31.159
Minimum            2               48
Maximum            7               169

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .924a   .853       .837                12.577
a. Predictors: (Constant), Orion Car Age

Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model 1          B          Std. Error         Beta                        t         Sig.
(Constant)       195.468    15.240                                         12.826    .000
Orion Car Age    -20.261    2.800              -.924                       -7.237    .000
a. Dependent Variable: Orion Car Price
(Price in thousands)
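The remaining Model Summary and Coefficients entries can be reproduced with the standard simple-linear-regression formulas. As an assumption, we take these to be what the SPSS output reports; with the data from the example, the results agree with the table to three decimal places. The variable names are illustrative.

```python
import math

# Orion Cars data (age in years, price in $100's)
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]
n = len(ages)

x_bar = sum(ages) / n
y_bar = sum(prices) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in prices)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, prices))  # sum of squared residuals

r_squared = 1 - sse / syy
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)  # one predictor
std_error_estimate = math.sqrt(sse / (n - 2))             # residual standard error
se_b1 = std_error_estimate / math.sqrt(sxx)
se_b0 = std_error_estimate * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

for name, value in [
    ("R Square", r_squared),
    ("Adjusted R Square", adj_r_squared),
    ("Std. Error of the Estimate", std_error_estimate),
    ("Std. Error (Constant)", se_b0),
    ("Std. Error (Age)", se_b1),
    ("t (Constant)", t_b0),
    ("t (Age)", t_b1),
]:
    print(f"{name}: {value:.3f}")
```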
Example: Orion Cars
[Figures: Orion car price (in thousands) vs. age, shown with the influential point and without the influential point.]
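The with/without comparison can be imitated numerically. As an assumption (the chart residue does not name the point), we take the influential point to be car 9, the 2-year-old car priced at 169, which lies farthest from the other observations. The helper `fit` is hypothetical and uses the least-squares formulas from earlier in the deck.

```python
def fit(xs, ys):
    """Least-squares (b0, b1) for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Orion Cars data (age in years, price in $100's)
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]

with_point = fit(ages, prices)

# Assumed influential point: the 2-year-old car priced at 169
kept = [(x, y) for x, y in zip(ages, prices) if (x, y) != (2, 169)]
without_point = fit([x for x, _ in kept], [y for _, y in kept])

print([round(v, 3) for v in with_point])     # [195.468, -20.261]
print([round(v, 3) for v in without_point])  # [160.333, -14.238]
```

Removing that single car changes the slope from about -20.26 to about -14.24, which is what makes it influential in the sense defined above.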