CHAPTER 15
Simple Linear Regression and Correlation
to accompany Introduction to Business Statistics, seventh edition, by Ronald M. Weiers
Presentation by Priscilla Chaffe-Stengel and Donald N. Stengel
© 2011 Cengage Learning.
All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 15 - Key Concept
Regression analysis generates a “best-fit” mathematical equation that can be used in predicting the values of the dependent variable as a function of the independent variable.
Direct vs Inverse Relationships
• Direct relationship:
– As x increases, y increases.
– The graph of the model rises from left to right.
– The slope of the linear model is positive.
• Inverse relationship:
– As x increases, y decreases.
– The graph of the model falls from left to right.
– The slope of the linear model is negative.
Simple Linear Regression Model
• Probabilistic Model:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
where $y_i$ = a value of the dependent variable, y
$x_i$ = a value of the independent variable, x
$\beta_0$ = the y-intercept of the regression line
$\beta_1$ = the slope of the regression line
$\varepsilon_i$ = random error, the residual
• Deterministic Model:
$\hat{y}_i = b_0 + b_1 x_i$
where $b_0$ estimates $\beta_0$, $b_1$ estimates $\beta_1$,
and $\hat{y}_i$ is the predicted value of y in contrast to the actual value of y.
Determining the Least Squares Regression Line
• Least Squares Regression Line:
$\hat{y} = b_0 + b_1 x$
– Slope
$b_1 = \dfrac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{\sum x_i^2 - n\,\bar{x}^2}$
– y-intercept
$b_0 = \bar{y} - b_1 \bar{x}$
Simple Linear Regression: An Example
• Problem 15.9:
For a sample of 8 employees, a personnel director has collected the following data on ownership of company stock, y, versus years with the firm, x.

x (years)    y (shares)
 6           300
12           408
14           560
 6           252
 9           288
13           650
15           630
 9           522

(a) Determine the least squares regression line and interpret its slope. (b) For an employee who has been with the firm 10 years, what is the predicted number of shares of stock owned?
Excel Output, Problem 15.9, cont.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.848584
R Square             0.72009481
Adjusted R Square    0.67344395
Standard Error       91.4789339
Observations         8

ANOVA
              df    SS             MS            F           Significance F
Regression    1     129173.1279    129173.128    15.43583    0.00772299
Residual      6     50210.37209    8368.39535
Total         7     179383.5

              Coefficients    Standard Error    t Stat        P-value       Lower 95%      Upper 95%
Intercept     44.3139535      108.5086985       0.40839079    0.69716178    -221.197461    309.825368
Years         38.755814       9.864427133       3.92884589    0.00772299    14.6184126     62.8932153

The y-intercept is the Intercept coefficient, 44.3139535.
The slope is the Years coefficient, 38.755814.
Problem 15.9, cont.
• Interpretation of the slope: For every additional year an employee works for the firm, the employee is estimated to own an additional 38.8 shares of stock.
• If x = 10, the point estimate for the number of shares of stock that this employee owns is:
$\hat{y} = 44.314 + 38.7558x$
$= 44.314 + 38.7558(10)$
$= 431.872 \approx 432$ shares
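The slope and intercept formulas can be checked directly against the Problem 15.9 data. The following is a minimal sketch in plain Python rather than the Excel procedure used in the slides; the function name `least_squares` and the variable names are illustrative choices, not part of the original presentation.

```python
# Problem 15.9 data: x = years with the firm, y = shares of company stock owned.
x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]

def least_squares(x, y):
    """Return (b0, b1) using the computational formulas for slope and intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (sum of x*y - n * x_bar * y_bar) / (sum of x^2 - n * x_bar^2)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    # Intercept: y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(x, y)
y_hat_10 = b0 + b1 * 10  # point estimate for an employee with 10 years at the firm

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, predicted shares at x = 10: {y_hat_10:.3f}")
```

The computed coefficients should agree with the Intercept and Years rows of the Excel output to the precision shown there.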
Interval Estimates Using the
Regression Model
• Confidence Interval for the Mean of y
– places an upper and lower bound around
the point estimate for the average value
of y given x.
• Prediction Interval for an Individual y
– places an upper and lower bound around
the point estimate for an individual value
of y given x.
To Form Interval Estimates
• The Standard Error of the Estimate, $s_{y,x}$
– The standard deviation of the distribution of the
» data points above and below the regression line,
» distances between actual and predicted values of y,
» residuals, or e
– The square root of MSE given by ANOVA
$s_{y,x} = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}}$
Equations for the Interval Estimates
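• Confidence Interval for the Mean of y
$\hat{y} \pm t_{\alpha/2}\,(s_{y,x})\,\sqrt{\dfrac{1}{n} + \dfrac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}}$
• Prediction Interval for the Individual y
$\hat{y} \pm t_{\alpha/2}\,(s_{y,x})\,\sqrt{1 + \dfrac{1}{n} + \dfrac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}}$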
Using Intervals – Problem 15.9
• For employees who worked 10 years for the firm,
what is the 95% confidence interval for their mean
share holdings?
This calls for a confidence interval on the
average number of shares owned by
employees who worked for the firm 10 years.
So we will use:
$\hat{y} \pm t_{\alpha/2}\,(s_{y,x})\,\sqrt{\dfrac{1}{n} + \dfrac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}}$
Standard Error of the Estimate, Definitional Equation

x     y      Predicted y    Squared Residual
 6    300    276.8488         535.9763
12    408    509.3837       10278.6589
14    560    586.8953         723.3598
 6    252    276.8488         617.4647
 9    288    393.1163       11049.4321
13    650    548.1395       10375.5544
15    630    625.6512          18.9124
 9    522    393.1163       16611.0135
                    Sum =   50210.3721
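The table can be rebuilt row by row from the fitted line. This is a small sketch, assuming the coefficients reported in the Excel output (44.3139535 and 38.755814); the variable names are illustrative only.

```python
import math

# Problem 15.9 data and the coefficients reported in the Excel output.
x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]
b0, b1 = 44.3139535, 38.755814

# Predicted y and squared residual for each observation.
predicted = [b0 + b1 * xi for xi in x]
squared_residuals = [(yi - pi) ** 2 for yi, pi in zip(y, predicted)]

# Sum of squared residuals (SSE) and the standard error of the estimate.
sse = sum(squared_residuals)
s_yx = math.sqrt(sse / (len(x) - 2))

for xi, yi, pi, sq in zip(x, y, predicted, squared_residuals):
    print(f"{xi:>2} {yi:>4} {pi:>9.4f} {sq:>11.4f}")
print(f"SSE = {sse:.4f}, s_yx = {s_yx:.4f}")
```

Dividing by n – 2 rather than n reflects the two coefficients estimated from the sample, which is why the result matches the Standard Error line of the regression output.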
Evaluating the Confidence Interval
$s_{y,x} = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\dfrac{50{,}210.3721}{8-2}} = 91.4789$
Since n = 8, df = 8 – 2 = 6 and $t_{\alpha/2} = 2.447$. From our prior analyses, $\sum x_i = 84$, $\sum x_i^2 = 968$, and the predicted y = 431.872.
$\hat{y} \pm t_{\alpha/2}\,(s_{y,x})\,\sqrt{\dfrac{1}{n} + \dfrac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}}$
$= 431.872 \pm (2.447)(91.4789)\sqrt{\dfrac{1}{8} + \dfrac{(10 - 10.5)^2}{968 - \dfrac{84^2}{8}}}$
$= 431.872 \pm (2.447)(91.4789)(0.3576) = 431.872 \pm 80.057$
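The arithmetic above can be reproduced in a few lines. This sketch takes the critical value 2.447 and the summary quantities from the slide as given rather than looking up the t distribution in code; all variable names are illustrative.

```python
import math

# Summary quantities stated on the slide for Problem 15.9.
n = 8
sum_x, sum_x2 = 84, 968
y_hat = 431.872      # point estimate at x = 10
s_yx = 91.4789       # standard error of the estimate
t_crit = 2.447       # t value for 95% confidence with df = 6, as given on the slide
x_value = 10

x_bar = sum_x / n
denominator = sum_x2 - sum_x ** 2 / n   # sum of x^2 minus (sum of x)^2 / n

# Half-width of the confidence interval for the mean of y at x_value.
half_width = t_crit * s_yx * math.sqrt(1 / n + (x_value - x_bar) ** 2 / denominator)
confidence_interval = (y_hat - half_width, y_hat + half_width)

print(f"half-width = {half_width:.3f}")
print(f"95% CI for the mean = ({confidence_interval[0]:.3f}, {confidence_interval[1]:.3f})")
```

Because x = 10 is close to the sample mean of 10.5, the second term under the square root is small and the interval is near its narrowest.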
Interpreting the Confidence Interval
• Based on our calculations, we would
have 95% confidence that the mean
number of shares for persons working
for the firm 10 years will be between:
431.872 – 80.057 = 351.815
and
431.872 + 80.057 = 511.929
Written in interval notation:
(351.815, 511.929)
Using Intervals – Problem 15.9
• An employee worked 10 years for the firm. What
is the 95% prediction interval for her share
holdings?
This calls for a prediction interval on the
number of shares owned by an individual
employee who worked for the firm 10 years.
So we will use:
$\hat{y} \pm t_{\alpha/2}\,(s_{y,x})\,\sqrt{1 + \dfrac{1}{n} + \dfrac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}}$
Evaluating the Prediction Interval Problem 15.9
Since n = 8, df = 8 – 2 = 6 and $t_{\alpha/2} = 2.447$. From our prior analyses, $\sum x_i = 84$, $\sum x_i^2 = 968$, and the predicted y = 431.872.
$\hat{y} \pm t_{\alpha/2}\,(s_{y,x})\,\sqrt{1 + \dfrac{1}{n} + \dfrac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}}$
$= 431.872 \pm (2.447)(91.4789)\sqrt{1 + \dfrac{1}{8} + \dfrac{(10 - 10.5)^2}{968 - \dfrac{84^2}{8}}}$
$= 431.872 \pm (2.447)(91.4789)(1.0620) = 431.872 \pm 237.734$
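The only difference between the two interval formulas is the extra 1 under the square root. The sketch below puts both in one hypothetical helper, `interval_half_width`, whose defaults are the Problem 15.9 quantities from the slides; the helper and its parameter names are assumptions made for illustration.

```python
import math

def interval_half_width(x_value, individual, n=8, sum_x=84, sum_x2=968,
                        s_yx=91.4789, t_crit=2.447):
    """Half-width of the interval at x_value.

    individual=True gives the prediction interval for a single y;
    individual=False gives the confidence interval for the mean of y.
    """
    x_bar = sum_x / n
    denominator = sum_x2 - sum_x ** 2 / n
    under_root = 1 / n + (x_value - x_bar) ** 2 / denominator
    if individual:
        under_root += 1  # extra term for the scatter of one observation
    return t_crit * s_yx * math.sqrt(under_root)

y_hat = 431.872  # point estimate at x = 10
pi_half = interval_half_width(10, individual=True)
ci_half = interval_half_width(10, individual=False)
prediction_interval = (y_hat - pi_half, y_hat + pi_half)

print(f"prediction half-width = {pi_half:.3f}, confidence half-width = {ci_half:.3f}")
print(f"95% PI = ({prediction_interval[0]:.3f}, {prediction_interval[1]:.3f})")
```

The added 1 accounts for the variability of a single employee around the regression line, so the prediction half-width is always the larger of the two.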
Interpreting the Prediction Interval –
Problem 15.9
• Based on our calculations, we would have 95% confidence that the number of shares held by an employee who has worked for the firm 10 years will be between:
431.872 – 237.734 = 194.138
and
431.872 + 237.734 = 669.606
Written in interval notation,
(194.138 , 669.606)
Comparing the Two Intervals
Notice that the confidence interval for the mean is much narrower than the prediction interval for the individual value. There is greater fluctuation among individual values than among group means. Both are centered at the point estimate, $\hat{y} = 431.872$.
[Figure: number line from 0 to 700 marking the confidence interval, 351.8 to 511.9, and the prediction interval, 194.1 to 669.6.]
Coefficient of Correlation
• A measure of the
– Direction of the linear relationship between
x and y.
» If x and y are directly related, r > 0.
» If x and y are inversely related, r < 0.
– Strength of the linear relationship between
x and y.
» The larger the absolute value of r, the more the
value of y depends in a linear way on the value of x.
Testing for Linearity
Key Argument:
• If the value of y does not change linearly with the value of x, then the mean value of y is the best predictor for the actual value of y. This implies $y = \bar{y}$ is preferable.
• If the value of y does change linearly with the value of x, then using the regression model gives a better prediction for the value of y than using the mean of y. This implies $y = \hat{y}$ is preferable.
Coefficient of Determination
• A measure of the
– Strength of the linear relationship between
x and y.
» The larger the value of $r^2$, the more the value of y depends in a linear way on the value of x.
– Amount of variation in y that is related to
variation in x.
– Ratio of variation in y that is explained by
the regression model divided by the total
variation in y.
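As a quick check of the ratio definition, the sums of squares in the ANOVA table of the Excel output for Problem 15.9 can be divided directly. This is a minimal sketch; the variable names are illustrative.

```python
import math

# Sums of squares from the ANOVA table in the Excel output for Problem 15.9.
ssr = 129173.1279   # variation in y explained by the regression model
sse = 50210.37209   # unexplained (residual) variation
sst = 179383.5      # total variation in y

# Coefficient of determination: explained variation over total variation.
r_squared = ssr / sst

# The slope is positive, so the coefficient of correlation is the positive root.
r = math.sqrt(r_squared)

print(f"r^2 = {r_squared:.6f}, r = {r:.6f}")
```

About 72% of the variation in shares owned is related to variation in years with the firm, which matches the R Square line of the output.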
Three Tests for Linearity
• 1. Testing the Coefficient of Correlation
H0: $\rho = 0$ There is no linear relationship between x and y.
H1: $\rho \neq 0$ There is a linear relationship between x and y.
Test Statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
• 2. Testing the Slope of the Regression Line
H0: $\beta_1 = 0$ There is no linear relationship between x and y.
H1: $\beta_1 \neq 0$ There is a linear relationship between x and y.
Test Statistic: $t = \dfrac{b_1}{s_{b_1}}$, where $s_{b_1} = \dfrac{s_{y,x}}{\sqrt{\sum x_i^2 - n\,\bar{x}^2}}$
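Both t statistics can be computed for the Problem 15.9 data to see that they coincide. The sketch below uses the standard Pearson formula for r, which the slides do not spell out, and takes 2.447 as the two-tailed critical value for df = 6 at the 0.05 level; variable names are illustrative.

```python
import math

x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Corrected sums of squares and cross-products.
sxx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
syy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

# Test 1: t statistic built from the sample coefficient of correlation r.
r = sxy / math.sqrt(sxx * syy)
t_from_r = r / math.sqrt((1 - r ** 2) / (n - 2))

# Test 2: t statistic built from the slope and its standard error.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(sxx)
t_from_slope = b1 / s_b1

t_crit = 2.447  # assumed two-tailed critical value, df = 6, 0.05 level
reject_h0 = abs(t_from_slope) > t_crit

print(f"r = {r:.6f}, t from r = {t_from_r:.5f}, t from slope = {t_from_slope:.5f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The two statistics are algebraically identical in simple linear regression, so either test leads to the same decision.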
Three Tests for Linearity
• 3. The Global F-test
H0: There is no linear relationship between x and y.
H1: There is a linear relationship between x and y.
Test Statistic: $F = \dfrac{MSR}{MSE} = \dfrac{SSR / 1}{SSE / (n - 2)}$
Note: At the level of simple linear regression, the global F-test is equivalent to the t-test on $\beta_1$. When we conduct regression analysis of multiple variables, the global F-test will take on a unique function.
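The equivalence noted above can be checked numerically on the Problem 15.9 data. This is a minimal sketch that builds SSR and SSE from the fitted values and compares F with the square of the slope's t statistic; the variable names are illustrative.

```python
import math

x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Partition of the total variation in y.
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual

# Mean squares and the global F statistic (1 and n - 2 degrees of freedom).
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

# t statistic for the slope, for comparison with F.
t_stat = b1 / math.sqrt(mse / sxx)

print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, F = {f_stat:.5f}, t^2 = {t_stat ** 2:.5f}")
```

With one independent variable, F equals t squared, so the two tests share the same p-value.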
Excel Output, Problem 15.9
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.848584       (Coefficient of correlation)
R Square             0.72009481     (Coefficient of determination)
Adjusted R Square    0.67344395
Standard Error       91.4789339
Observations         8

ANOVA
              df    SS             MS            F           Significance F
Regression    1     129173.1279    129173.128    15.43583    0.00772299
Residual      6     50210.37209    8368.39535
Total         7     179383.5

              Coefficients    Standard Error    t Stat        P-value       Lower 95%      Upper 95%
Intercept     44.3139535      108.5086985       0.40839079    0.69716178    -221.197461    309.825368
Years         38.755814       9.864427133       3.92884589    0.00772299    14.6184126     62.8932153

F = 15.43583 is the global F test statistic for the test of H0: $\beta_1 = 0$.
t Stat = 3.92884589 in the Years row is the calculated t for the test of H0: $\beta_1 = 0$.
Note that:
(1) both t and F have the same p-value, and
(2) $t^2 = F$.