Lecture 2
Simple Linear Regression
STAT 512
Spring 2011
Background Reading
KNNL: Chapter 1
Topic Overview
In this topic we will cover:
 Regression terminology
 Simple linear regression with a single predictor variable
Relationships Among Variables
 Functional Relationships – The value of the dependent variable Y can be computed exactly if we know the value of the independent variable X (e.g., Y = 2X).
 Statistical Relationships – Not a perfect or exact relationship. The expected value of the response variable Y is a function of the explanatory or predictor variable X. The observed value of Y is the expected value plus a random deviation.
Simple Linear Regression
Uses of SLR
Why use simple linear regression?
 Descriptive/exploratory purposes (explore the strength of known cause/effect relationships)
 Administrative control (often the response variable is $$$)
 Prediction of outcomes (predict future needs; often overlaps with cost control)
Statistical Relationships vs. Causality
 Statistical relationships do not imply causality!
 Example: A Lafayette ice cream shop does more business on days when attendance at an Indianapolis swimming pool is high.
Data for Simple Linear Regression
 Observe pairs of variables; each pair is called a case or a data point.
 Y_i is the ith value of the response variable.
 X_i is the ith value of the explanatory (or predictor) variable; in practice the value of X_i is a known constant.
Simple Linear Regression Model
 Statement of Model
Y_i = β0 + β1·X_i + ε_i,  i = 1, 2, ..., n,  where ε_i ~ N(0, σ²)
 Model Parameters (unknown)
 β0 = intercept; may not have meaning
 β1 = slope; β1 = 0 if there is no relationship between X and Y
 σ² is the error variance
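As an illustration of the model statement, the following Python sketch simulates responses from the SLR model. The parameter values and X data are made up for illustration only; nothing here comes from the slides, which use SAS for their computing.

```python
import random

random.seed(42)

# Hypothetical parameter values, chosen only to illustrate the model.
beta0, beta1, sigma = 2.0, 0.5, 1.0
X = [1.0, 2.0, 3.0, 4.0, 5.0]

# Each response Y_i is its mean beta0 + beta1*X_i plus a Normal(0, sigma^2) error.
Y = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in X]
```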
Y_i = β0 + β1·X_i + ε_i, where the mean response is E(Y_i) = β0 + β1·X_i.
Interpretation of the Regression Coefficients
 β0 is the expected value of the response variable when X = 0.
 β1 represents the increase (or decrease, if negative) in the mean response for a 1-unit increase in the value of X.
Features of SLR Model
 Errors are independent, identically distributed normal random variables: ε_i ~ iid N(0, σ²).
 This implies Y_i ~ N(β0 + β1·X_i, σ²), independently. (See A.36, p. 1303 for the proof.)
Fitted Regression Equation
 The parameters β0, β1, σ² must be estimated from the data.
 Estimates are denoted b0, b1, s².
 The fitted (or estimated) regression line is Ŷ_i = b0 + b1·X_i.
 The "hat" symbol is used to differentiate the fitted value Ŷ_i from the actual observed value Y_i.
Residuals
 The deviations (or errors) from the true regression line, ε_i = Y_i − (β0 + β1·X_i), cannot be known since the regression parameters β0 and β1 are unknown.
 We may estimate these by the residuals:
e_i = Observed − Predicted = Y_i − Ŷ_i = Y_i − (b0 + b1·X_i)
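The residual formula above can be sketched in a few lines of Python. The fitted coefficients and data here are hypothetical, chosen only so the arithmetic is easy to follow.

```python
# Hypothetical fitted coefficients and data (not from the slides).
b0, b1 = 1.0, 2.0
X = [0.0, 1.0, 2.0]
Y = [1.5, 2.5, 5.5]

Y_hat = [b0 + b1 * x for x in X]                 # fitted values: Yhat_i = b0 + b1*X_i
residuals = [y - yh for y, yh in zip(Y, Y_hat)]  # e_i = Y_i - Yhat_i
```

Here the fitted values are [1.0, 3.0, 5.0], so the residuals are [0.5, -0.5, 0.5].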
Error Terms vs Residuals
Assumptions
 The model assumes that the error terms are independent, normal, and have constant variance.
 Residuals may be used to explore the legitimacy of these assumptions.
 More on this topic later.
Least Squares Estimation
 We want to find the "best" estimates (b0, b1) for (β0, β1).
 The best estimates will minimize the sum of the squared residuals:
SSE = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} [Y_i − (b0 + b1·X_i)]²
 To do this, use calculus (see pages 17–18 of KNNL).
Least Squares Solution
 The LS estimate for β1 can be written in terms of the "sums of squares":
b1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² = SS_XY / SS_X
 The LS estimate for β0 is
b0 = Ȳ − b1·X̄
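The two formulas above translate directly into code. This is a minimal Python sketch (the slides use SAS); the data in the usage line are made up so that the points fall exactly on Y = 1 + 2X, in which case the estimates recover the line.

```python
def least_squares(X, Y):
    """Least squares estimates: b1 = SS_XY / SS_X, b0 = Ybar - b1*Xbar."""
    n = len(X)
    xbar = sum(X) / n
    ybar = sum(Y) / n
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
    ss_x = sum((x - xbar) ** 2 for x in X)
    b1 = ss_xy / ss_x
    b0 = ybar - b1 * xbar
    return b0, b1

# Data lying exactly on Y = 1 + 2X, so the fit should return (1, 2).
b0, b1 = least_squares([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```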
About the LS Estimates
 They are also the maximum likelihood estimates (see KNNL pages 27–32).
 These are the best estimates because they are unbiased (their expectation is the parameter that they are estimating) and they have minimum variance among all such estimators.
 Big picture: we wouldn't want to use any other estimates because we can do no better.
Mean Square Error
 We also need to estimate σ². This estimate is developed from the sum of the squared residuals (SSE) and the available degrees of freedom:
s² = MSE = SSE / dfE = Σ e_i² / (n − 2)
 The error degrees of freedom reflect the fact that we have n observations and 2 parameters (β0, β1) that we have already estimated.
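The MSE formula can be sketched as follows; the coefficients and data are hypothetical, picked so the residuals are 0, 0.5, 0, giving SSE = 0.25 and dfE = n − 2 = 1.

```python
def mse(X, Y, b0, b1):
    """Estimate of sigma^2: MSE = SSE / (n - 2)."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
    return sse / (len(X) - 2)

# Made-up example: fitted values are 1.0, 2.5, 4.0, so SSE = 0.25 and MSE = 0.25.
m = mse([0.0, 1.0, 2.0], [1.0, 3.0, 4.0], b0=1.0, b1=1.5)
```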
Variance Notation
 s² = MSE will always be the estimate for σ². This can be confusing, because there will be estimated variances for other quantities, and these will be denoted, e.g., s²{b1}, s²{b0}, etc. These are not products, but single variance quantities.
 To avoid confusion, I will generally write MSE whenever referring to the estimate for σ².
EXAMPLE: Diamond Rings
Variables
 Response variable ~ price in Singapore dollars (Y)
 Explanatory variable ~ weight of diamond in carats (X)
Associated SAS File
 diamonds.sas
SAS Regression Procedure
PROC REG data=diamonds;
model price=weight;
RUN;
Output (1)
Source   DF   Sum of Squares     Mean Square
Model     1         2098596         2098596
Error    46           46636      1013.81886
Total    47         2145232

Root MSE = 31.84052
Output (2)
Variable    DF   Parameter Estimate   Standard Error
Intercept    1           -259.62591         17.31886
weight       1           3721.02485         81.78588
Output Summary
From the output, we see that
b0 = −259.6
b1 = 3721.0
MSE = 1014
Root MSE = √MSE = 31.8
Note that the Root MSE has a direct interpretation as the estimated standard deviation (in $).
Interpretations
 It doesn't really make sense to talk about a 1-carat increase, but we can convert this to a 0.01-carat increase by dividing by 100.
 From b1 we see that a 0.01-carat increase in the weight of a diamond leads to a $37.21 increase in the mean response.
 The interpretation of b0 would be that one would actually be paid $260 simply to take a 0-carat diamond ring. Why doesn't this make sense?
Scope of Model
 The scope of a regression model is the range of X-values over which we actually have data.
 Using a model to look at X-values outside the scope of the model (extrapolation) is quite dangerous.
Prediction for 0.43 Carats
 Does this make sense in light of the previous discussion?
 Suppose we assume that it does. Then the mean price for a 0.43-carat ring can be computed as follows:
Ŷ = −260 + 3721(0.43) ≈ 1340
 How confident would you be in this estimate?
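As a quick check of the arithmetic, a Python sketch of the prediction using the unrounded fitted coefficients from the SAS output:

```python
# Fitted coefficients from the SAS output for the diamonds data.
b0, b1 = -259.62591, 3721.02485

# Estimated mean price for a 0.43-carat ring; rounds to the slide's figure of 1340.
y_hat = b0 + b1 * 0.43
```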
Upcoming in Lecture 3...
 We will discuss more about inference concerning the regression coefficients.
 Background Reading
o KNNL 2.1–2.6