Slides Prepared by
JOHN S. LOUCKS
St. Edward’s University
© 2002 South-Western/Thomson Learning
Slide 1
Chapter 14
Simple Linear Regression









Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation and Prediction
Computer Solution
Residual Analysis: Validating Model Assumptions
Residual Analysis: Outliers and Influential Observations
Slide 2
The Simple Linear Regression Model

Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \epsilon$

Simple Linear Regression Equation
$E(y) = \beta_0 + \beta_1 x$

Estimated Simple Linear Regression Equation
$\hat{y} = b_0 + b_1 x$
Slide 3
Least Squares Method

Least Squares Criterion
$\min \sum (y_i - \hat{y}_i)^2$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
Slide 4
The Least Squares Method

Slope for the Estimated Regression Equation
$b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$

y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x}$
where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
$n$ = total number of observations
Slide 5
Example: Reed Auto Sales

Simple Linear Regression
Reed Auto periodically has a special week-long sale.
As part of the advertising campaign Reed runs one or
more television commercials during the weekend
preceding the sale. Data from a sample of 5 previous
sales are shown below.
Number of TV Ads    Number of Cars Sold
       1                    14
       3                    24
       2                    18
       1                    17
       3                    27
Slide 6
Example: Reed Auto Sales



Slope for the Estimated Regression Equation
$b_1 = \frac{220 - (10)(100)/5}{24 - (10)^2/5} = 5$
y-Intercept for the Estimated Regression Equation
$b_0 = 20 - 5(2) = 10$
Estimated Regression Equation
$\hat{y} = 10 + 5x$
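The slope and intercept computation above can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `least_squares_fit` is a hypothetical helper, and the data are the five Reed Auto observations.

```python
# Reed Auto Sales sample from the slides: TV ads (x) and cars sold (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]


def least_squares_fit(x, y):
    """Return (b0, b1) from the computational formulas for slope and intercept."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (sum(xy) - sum(x)sum(y)/n) / (sum(x^2) - (sum(x))^2/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_fit(x, y)
print(f"y_hat = {b0:g} + {b1:g}x")
```

The intermediate sums match the slide: sum(xy) = 220, sum(x) = 10, sum(y) = 100, and sum(x^2) = 24.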
Slide 7
Example: Reed Auto Sales
Scatter Diagram
[Figure: Cars Sold (vertical axis) versus TV Ads (horizontal axis), with the fitted line y = 5x + 10]
Slide 8
The Coefficient of Determination

Relationship Among SST, SSR, SSE
SST = SSR + SSE
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

Coefficient of Determination
$r^2$ = SSR/SST
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Slide 9
Example: Reed Auto Sales

Coefficient of Determination
$r^2$ = SSR/SST = 100/114 = .8772
The regression relationship is very strong since
88% of the variation in number of cars sold can be
explained by the linear relationship between the
number of TV ads and the number of cars sold.
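The three sums of squares behind this result can be checked directly. The following is a sketch under the assumption that the fitted line is the one from the earlier slide (b0 = 10, b1 = 5); the variable names are illustrative only.

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the slides

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # sum of squares due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squares due to error

r2 = ssr / sst
print(f"SST={sst:g} SSR={ssr:g} SSE={sse:g} r^2={r2:.4f}")
```

The identity SST = SSR + SSE holds here as 114 = 100 + 14.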
Slide 10
The Correlation Coefficient

Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
Slide 11
Example: Reed Auto Sales

Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".
$r_{xy} = +\sqrt{.8772}$
$r_{xy} = +.9366$
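As a cross-check, the signed square root of the coefficient of determination can be compared with the usual product-moment formula for the sample correlation. The product-moment formula is not shown on the slides and is used here as an assumption for comparison; the variable names are illustrative.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # equals SST
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
r2 = (b1 ** 2 * sxx) / syy  # SSR / SST

# Route 1 (slides): attach the sign of b1 to the square root of r^2.
r_from_r2 = math.copysign(math.sqrt(r2), b1)
# Route 2 (assumed standard formula): sxy / sqrt(sxx * syy).
r_pearson = sxy / math.sqrt(sxx * syy)

print(round(r_from_r2, 4), round(r_pearson, 4))
```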
Slide 12
Model Assumptions

Assumptions About the Error Term $\epsilon$
• The error $\epsilon$ is a random variable with a mean of zero.
• The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
• The values of $\epsilon$ are independent.
• The error $\epsilon$ is a normally distributed random variable.
Slide 13
Testing for Significance



To test for a significant regression relationship, we
must conduct a hypothesis test to determine whether
the value of $\beta_1$ is zero.
Two tests are commonly used:
• t Test
• F Test
Both tests require an estimate of $\sigma^2$, the variance of $\epsilon$
in the regression model.
Slide 14
Testing for Significance

An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate
of $\sigma^2$, and the notation $s^2$ is also used.
$s^2 = MSE = SSE/(n-2)$
where:
$SSE = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$
Slide 15
Testing for Significance

An Estimate of $\sigma$
• To estimate $\sigma$ we take the square root of $s^2$.
• The resulting $s$ is called the standard error of the estimate.
$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}}$
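For the Reed Auto data, MSE and the standard error of the estimate follow directly from the residuals. This is a small sketch assuming the fitted line from the earlier example (b0 = 10, b1 = 5).

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the slides
n = len(x)

# SSE = sum of squared differences between observed and fitted values.
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)  # two parameters (b0, b1) are estimated, hence n - 2
s = math.sqrt(mse)   # standard error of the estimate

print(f"SSE={sse} MSE={mse:.3f} s={s:.4f}")
```

The MSE value of about 4.667 is the denominator used later in the F test example.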
Slide 16
Testing for Significance: t Test

Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Test Statistic
$t = \frac{b_1}{s_{b_1}}$

Rejection Rule
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where $t_{\alpha/2}$ is based on a t distribution with
n - 2 degrees of freedom.
Slide 17
Example: Reed Auto Sales

t Test
• Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
• Rejection Rule
For $\alpha$ = .05 and d.f. = 3, $t_{.025}$ = 3.182
Reject $H_0$ if t < -3.182 or t > 3.182
• Test Statistic
t = 5/1.08 = 4.63
• Conclusion
Reject $H_0$
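The value 1.08 in the test statistic is the estimated standard deviation of b1. The slides give only its value; the sketch below assumes the standard expression s divided by the square root of the sum of squared deviations of x, and takes the critical value 3.182 from the slide rather than computing it.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the slides
n = len(x)
x_bar = sum(x) / n

sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Assumed standard formula: s_b1 = s / sqrt(sum((x_i - x_bar)^2)).
sxx = sum((xi - x_bar) ** 2 for xi in x)
s_b1 = s / math.sqrt(sxx)

t_stat = b1 / s_b1
t_crit = 3.182  # t_.025 with 3 d.f., as quoted on the slide
reject_h0 = abs(t_stat) > t_crit  # two-tailed rejection rule

print(f"s_b1={s_b1:.2f} t={t_stat:.2f} reject H0: {reject_h0}")
```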
Slide 18
Confidence Interval for $\beta_1$


We can use a 95% confidence interval for $\beta_1$ to test
the hypotheses just used in the t test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not
included in the confidence interval for $\beta_1$.
Slide 19
Confidence Interval for $\beta_1$

The form of a confidence interval for $\beta_1$ is:
$b_1 \pm t_{\alpha/2} s_{b_1}$
where
$b_1$ is the point estimate
$t_{\alpha/2} s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom
Slide 20
Example: Reed Auto Sales



Rejection Rule
Reject $H_0$ if 0 is not included in the confidence
interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
$b_1 \pm t_{\alpha/2} s_{b_1}$ = 5 ± 3.182(1.08) = 5 ± 3.44
or 1.56 to 8.44
Conclusion
Reject H0
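The interval arithmetic can be sketched as follows. The helper name `confidence_interval` is hypothetical, and the inputs (5, 1.08, 3.182) are the rounded values quoted on the slides.

```python
def confidence_interval(estimate, std_err, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_err."""
    margin = t_crit * std_err
    return estimate - margin, estimate + margin


b1 = 5         # point estimate of the slope
s_b1 = 1.08    # estimated standard deviation of b1 (rounded, from the slides)
t_crit = 3.182 # t_.025 with 3 d.f.

lower, upper = confidence_interval(b1, s_b1, t_crit)
# H0: beta_1 = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)

print(f"{lower:.2f} to {upper:.2f}, reject H0: {reject_h0}")
```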
Slide 21
Testing for Significance: F Test

Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Test Statistic
F = MSR/MSE

Rejection Rule
Reject $H_0$ if $F > F_\alpha$
where $F_\alpha$ is based on an F distribution with 1 d.f. in
the numerator and n - 2 d.f. in the denominator.
Slide 22
Example: Reed Auto Sales

F Test
• Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
• Rejection Rule
For $\alpha$ = .05 and d.f. = 1, 3: $F_{.05}$ = 10.13
Reject $H_0$ if F > 10.13.
• Test Statistic
F = MSR/MSE = 100/4.667 = 21.43
• Conclusion
We can reject $H_0$.
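A short sketch of the F computation for the same data, assuming the fitted line from the earlier slides and taking the critical value 10.13 from the slide rather than computing it:

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the slides
n = len(x)

y_bar = sum(y) / n
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1        # numerator d.f. = 1 (one independent variable)
mse = sse / (n - 2)  # denominator d.f. = n - 2

f_stat = msr / mse
f_crit = 10.13       # F_.05 with (1, 3) d.f., as quoted on the slide
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

With a single independent variable, the F statistic equals the square of the t statistic (about 4.629 squared), so the two tests reach the same conclusion.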
Slide 23
Some Cautions about the
Interpretation of Significance Tests


Rejecting $H_0: \beta_1 = 0$ and concluding that the
relationship between x and y is significant does not
enable us to conclude that a cause-and-effect
relationship is present between x and y.
Just because we are able to reject $H_0: \beta_1 = 0$ and
demonstrate statistical significance does not enable
us to conclude that there is a linear relationship
between x and y.
Slide 24
Using the Estimated Regression Equation
for Estimation and Prediction

Confidence Interval Estimate of $E(y_p)$
$\hat{y}_p \pm t_{\alpha/2} s_{\hat{y}_p}$

Prediction Interval Estimate of $y_p$
$\hat{y}_p \pm t_{\alpha/2} s_{ind}$
where the confidence coefficient is $1 - \alpha$ and
$t_{\alpha/2}$ is based on a t distribution with n - 2 d.f.
Slide 25
Example: Reed Auto Sales



Point Estimation
If 3 TV ads are run prior to a sale, we expect the
mean number of cars sold to be:
$\hat{y}$ = 10 + 5(3) = 25 cars
Confidence Interval for $E(y_p)$
95% confidence interval estimate of the mean number
of cars sold when 3 TV ads are run is:
25 ± 4.61 = 20.39 to 29.61 cars
Prediction Interval for $y_p$
95% prediction interval estimate of the number of
cars sold in one particular week when 3 TV ads are
run is:
25 ± 8.28 = 16.72 to 33.28 cars
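The margins 4.61 and 8.28 can be reproduced with the sketch below. The slides do not show the formulas for the two standard deviations; the code assumes the standard textbook expressions (noted in the comments), and the critical value 3.182 is taken from the earlier t test slide.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5   # estimated regression equation from the slides
x_p = 3          # number of TV ads for the estimate
t_crit = 3.182   # t_.025 with 3 d.f.

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

y_p = b0 + b1 * x_p  # point estimate

# Assumed formula: s_yp = s * sqrt(1/n + (x_p - x_bar)^2 / sxx)
s_yp = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
# Assumed formula: s_ind = s * sqrt(1 + 1/n + (x_p - x_bar)^2 / sxx)
s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)

ci = (y_p - t_crit * s_yp, y_p + t_crit * s_yp)    # mean of y at x_p
pi = (y_p - t_crit * s_ind, y_p + t_crit * s_ind)  # individual y at x_p

print(f"CI: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"PI: {pi[0]:.2f} to {pi[1]:.2f}")
```

The prediction interval is wider because it also accounts for the variability of an individual week around the mean.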
Slide 26
Residual Analysis

Residual for Observation i
$y_i - \hat{y}_i$

Standardized Residual for Observation i
$\frac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$
where:
$s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$
Slide 27
Example: Reed Auto Sales

Residuals
Observation    Predicted Cars Sold    Residuals
     1                 15                -1
     2                 25                -1
     3                 20                -2
     4                 15                 2
     5                 25                 2
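The residuals in this table, and the corresponding standardized residuals, can be computed as sketched below. The leverage formula for h_i is not given on the slides; the standard simple-regression expression is assumed here and labeled in the comments.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the slides
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Assumed leverage formula: h_i = 1/n + (x_i - x_bar)^2 / sxx
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
# Standardized residual: e_i / (s * sqrt(1 - h_i))
std_residuals = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for i, (yh, e, r) in enumerate(zip(predicted, residuals, std_residuals), start=1):
    print(i, yh, e, round(r, 3))
```

None of the five standardized residuals exceeds 2 in absolute value for this sample.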
Slide 28
Example: Reed Auto Sales
Residual Plot
[Figure: TV Ads Residual Plot, Residuals (vertical axis, -3 to 3) versus TV Ads (horizontal axis, 0 to 4)]
Slide 29
Residual Analysis

Detecting Outliers
• An outlier is an observation that is unusual in
comparison with the other data.
• Minitab classifies an observation as an outlier if its
standardized residual value is < -2 or > +2.
• This standardized residual rule sometimes fails to
identify an unusually large observation as being
an outlier.
• This rule’s shortcoming can be circumvented by
using studentized deleted residuals.
• When the ith observation is an outlier, the |ith studentized deleted residual| will be
larger than the |ith standardized residual|.
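The slides do not define studentized deleted residuals. One common definition, assumed here, replaces s in the standardized residual with the standard error obtained after deleting observation i and refitting. The sketch below applies that definition to the Reed Auto data; the helper `fit` is hypothetical.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]


def fit(x, y):
    """Least squares fit; returns (b0, b1, s, x_bar, sxx)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, math.sqrt(sse / (n - 2)), x_bar, sxx


n = len(x)
b0, b1, s, x_bar, sxx = fit(x, y)

standardized, deleted = [], []
for i in range(n):
    e_i = y[i] - (b0 + b1 * x[i])            # residual from the full fit
    h_i = 1 / n + (x[i] - x_bar) ** 2 / sxx  # assumed leverage formula
    # Refit without observation i to get its "deleted" standard error.
    s_del = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])[2]
    standardized.append(e_i / (s * math.sqrt(1 - h_i)))
    deleted.append(e_i / (s_del * math.sqrt(1 - h_i)))

for i, (r, t) in enumerate(zip(standardized, deleted), start=1):
    print(i, round(r, 3), round(t, 3))
```

In this small sample the deleted residual exceeds the standardized residual in absolute value only for observations 3, 4, and 5, whose standardized residuals are larger than 1 in magnitude; no observation comes close to the ±2 threshold under either measure.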
Slide 30
End of Chapter 14
Slide 31