Lecture 2
Simple Linear Regression
STAT 512
Spring 2011
Background Reading
KNNL: Chapter 1
2-1
Topic Overview
In this topic we will cover:
Regression Terminology
Simple Linear Regression with a single
predictor variable
2-2
Relationships Among Variables
Functional Relationships – The value of
the dependent variable Y can be computed
exactly if we know the value of the
independent variable X. (e.g., Y=2X)
Statistical Relationships – Not a perfect or
exact relationship. The expected value of
the response variable Y is a function of the
explanatory or predictor variable X. The
observed value of Y is the expected value
plus a random deviation.
2-3
Simple Linear Regression
2-4
Uses of SLR
Why Use Simple Linear Regression?
Descriptive/exploratory purposes (explore
the strength of known cause/effect
relationships)
Administrative Control (often the response
variable is $$$)
Prediction of outcomes (predict future
needs; often overlaps with cost control)
2-5
Statistical Relationships vs. Causality
Statistical relationships do not imply
causality!!!
Example: A Lafayette ice cream shop does
more business on days when attendance at
an Indianapolis swimming pool is high.
(Both are driven by hot weather, not by
each other.)
2-6
Data for Simple Linear Regression
Observe $n$ pairs $(X_i, Y_i)$; each pair is
called a case or a data point.
$Y_i$ is the ith value of the response variable;
$X_i$ is the ith value of the explanatory (or
predictor) variable; in practice the value of
$X_i$ is a known constant.
2-7
Simple Linear Regression Model
Statement of Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
Model Parameters (unknown)
$\beta_0$ = intercept; may not have meaning
$\beta_1$ = slope; $\beta_1 = 0$ if no linear
relationship between $X$ and $Y$.
$\sigma^2$ is the error variance
2-8
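To make the model concrete, here is a minimal SAS sketch that simulates data from it. The data set name, sample size, and parameter values are hypothetical, chosen only for illustration:

/* Hypothetical sketch: simulate n = 30 observations from the SLR model */
data sim;
   call streaminit(512);                        /* fix the random seed */
   beta0 = 10; beta1 = 2; sigma = 3;            /* assumed true parameter values */
   do i = 1 to 30;
      x = rand("uniform") * 10;                 /* X treated as a known constant */
      y = beta0 + beta1*x + rand("normal", 0, sigma);   /* Y = beta0 + beta1*X + error */
      output;
   end;
   keep i x y;
run;

Fitting a regression to sim should then recover estimates close to the assumed beta0 and beta1.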
$$Y_i = \overbrace{\beta_0 + \beta_1 X_i}^{E(Y_i)} + \varepsilon_i$$
2-9
Interpretation of the Regression
Coefficients
$\beta_0$ is the expected value of the response
variable when $X = 0$.
$\beta_1$ represents the increase (or decrease if
negative) in the mean response for a 1-unit
increase in the value of $X$.
2-10
Features of SLR Model
Errors are independent, identically
distributed normal random variables:
$$\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$
This implies that, independently,
$$Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$$
(See A.36, p. 1303 for the proof.)
2-11
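For intuition, the implication follows because $\beta_0 + \beta_1 X_i$ is a nonrandom constant, so adding it to the normal error shifts only the mean:

$$E(Y_i) = \beta_0 + \beta_1 X_i + E(\varepsilon_i) = \beta_0 + \beta_1 X_i, \qquad \operatorname{Var}(Y_i) = \operatorname{Var}(\varepsilon_i) = \sigma^2$$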
Fitted Regression Equation
The parameters $\beta_0$, $\beta_1$, $\sigma^2$ must be estimated
from the data.
Estimates are denoted $b_0$, $b_1$, $s^2$.
The fitted (or estimated) regression line is
$$\hat{Y}_i = b_0 + b_1 X_i$$
The "hat" symbol is used to differentiate the
fitted value $\hat{Y}_i$ from the actual observed
value $Y_i$.
2-12
Residuals
The deviations (or errors) from the true
regression line, $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$,
cannot be known since the regression
parameters $\beta_0$ and $\beta_1$ are unknown.
We may estimate these by the residuals:
$$e_i = \text{Observed} - \text{Predicted} = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$$
2-13
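In SAS, the fitted values and residuals can be requested with the OUTPUT statement of PROC REG. A sketch using the diamonds data set introduced later in this lecture (the output data set name fits is hypothetical):

proc reg data=diamonds;
   model price=weight;
   output out=fits p=yhat r=resid;   /* p= fitted values, r= residuals */
run;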
Error Terms vs Residuals
2-14
Assumptions
Model assumes that the error terms are
independent, normal, and have constant
variance.
Residuals may be used to explore the
legitimacy of these assumptions.
More on this topic later.
2-15
Least Squares Estimation
Want to find "best" estimates $(b_0, b_1)$ for
$(\beta_0, \beta_1)$.
Best estimates will minimize the sum of the
squared residuals:
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ Y_i - (b_0 + b_1 X_i) \right]^2$$
To do this, use calculus (see pages 17, 18 of
KNNL).
2-16
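For reference, setting the two partial derivatives of SSE (with respect to $b_0$ and $b_1$) equal to zero yields the normal equations, whose solution appears on the next slide:

$$\sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right) = 0 \qquad \text{and} \qquad \sum_{i=1}^{n} X_i \left( Y_i - b_0 - b_1 X_i \right) = 0$$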
Least Squares Solution
The LS estimate for $\beta_1$ can be written in
terms of the "sums of squares":
$$b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{SS_{XY}}{SS_X}$$
The LS estimate for $\beta_0$ is
$$b_0 = \bar{Y} - b_1 \bar{X}$$
2-17
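As an illustration, these formulas can be computed "by hand" in SAS. This sketch assumes the diamonds data set used later in this lecture, with X = weight and Y = price; the intermediate data set names are hypothetical:

/* Step 1: sample means of X and Y */
proc means data=diamonds noprint;
   var weight price;
   output out=mns mean=xbar ybar;
run;

/* Step 2: accumulate SS_XY and SS_X, then form b1 and b0 */
data _null_;
   if _n_ = 1 then set mns;                 /* bring in xbar and ybar */
   set diamonds end=last;
   ssxy + (weight - xbar)*(price - ybar);   /* running sum of cross-products */
   ssx  + (weight - xbar)**2;               /* running sum of squares of X */
   if last then do;
      b1 = ssxy / ssx;                      /* slope estimate */
      b0 = ybar - b1*xbar;                  /* intercept estimate */
      put b0= b1=;                          /* write estimates to the log */
   end;
run;

The printed b0 and b1 should match the parameter estimates that PROC REG reports for the same data.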
About the LS Estimates
They are also the maximum likelihood
estimates (see KNNL pages 27-32).
These are the best estimates because they
are unbiased (their expectation is the
parameter that they are estimating) and
they have minimum variance among all
unbiased linear estimators (the
Gauss–Markov theorem).
Big picture: We wouldn’t want to use any
other estimates because we can do no
better.
2-18
Mean Square Error
We also need to estimate $\sigma^2$. This estimate
is developed based on the sum of the
squared residuals (SSE) and the available
degrees of freedom:
$$s^2 = MSE = \frac{SSE}{df_E} = \frac{\sum_i e_i^2}{n - 2}$$
The error degrees of freedom are based on
the fact that we have $n$ observations and 2
parameters $(b_0, b_1)$ that we have already
estimated.
2-19
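As a quick numeric check using the diamonds output shown later in this lecture (n = 48 observations, SSE = 46636):

$$s^2 = MSE = \frac{46636}{48 - 2} \approx 1013.8, \qquad \sqrt{MSE} \approx 31.8$$

which matches the Mean Square and Root MSE reported by SAS.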
Variance Notation
$s^2 = MSE$ will always be the estimate for
$\sigma^2$. This can be confusing, because there
will be estimated variances for other
quantities, and these will be denoted e.g.
$s^2\{b_1\}$, $s^2\{b_0\}$, etc. These are not
products, but single variance quantities.
To avoid confusion, I will generally write
$MSE$ whenever referring to the estimate
for $\sigma^2$.
2-20
EXAMPLE: Diamond Rings
Variables
Response Variable ~ price in Singapore
dollars (Y)
Explanatory Variable ~ weight of diamond
in carats (X)
Associated SAS File
diamonds.sas
2-21
SAS Regression Procedure
PROC REG data=diamonds;   /* fit the simple linear regression */
model price=weight;       /* response = predictor */
RUN;
2-22
Output (1)
Source   DF   Sum of Squares   Mean Square
Model     1        2098596        2098596
Error    46          46636     1013.81886
Total    47        2145232

Root MSE = 31.84052
2-23
Output (2)
Variable    DF   Parameter Estimate   Standard Error
Intercept    1           -259.62591         17.31886
weight       1           3721.02485         81.78588
2-24
Output Summary
From the output, we see that
$b_0 = -259.6$
$b_1 = 3721.0$
$MSE = 1014$
$\sqrt{MSE} = 31.8$
Note that the Root MSE has a direct interpretation
as the estimated standard deviation (in dollars).
2-25
Interpretations
It doesn’t really make sense to talk about a
1-carat increase. But we can change this to
a 0.01-carat increase by dividing by 100.
From b1 we see that a 0.01-carat increase in
the weight of a diamond will lead to a
$37.21 increase in the mean response.
The interpretation of b0 would be that one
would actually be paid $260 to simply take
a 0-carat diamond ring. Why doesn’t this
make sense?
2-26
Scope of Model
The scope of a regression model is the
range of X-values over which we actually
have data.
Using a model to look at X-values outside
the scope of the model (extrapolation) is
quite dangerous.
2-27
2-28
Prediction for 0.43 Carats
Does this make sense in light of the previous
discussion?
Suppose we assume that it does. Then the
mean price for a 0.43 carat ring can be
computed as follows:
$$\hat{Y} = -260 + 3721(0.43) = 1340$$
How confident would you be in this
estimate?
2-29
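One way to obtain this prediction directly in SAS is to append a row with the X-value of interest and a missing response; PROC REG predicts such rows but excludes them from the fit. The data set names here are hypothetical, and CLM requests confidence limits for the mean response:

/* Hypothetical sketch: predict the mean price at weight = 0.43 */
data toscore;
   weight = 0.43; price = .;     /* missing Y: predicted, not used in fitting */
run;

data both;
   set diamonds toscore;         /* stack the extra row onto the data */
run;

proc reg data=both;
   model price=weight / p clm;   /* P prints predictions, CLM adds CIs for the mean */
run;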
Upcoming in Lecture 3...
We will discuss inference concerning the
regression coefficients in more detail.
Background Reading
KNNL: 2.1-2.6
2-30