Download Document

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data assimilation wikipedia , lookup

Instrumental variables estimation wikipedia , lookup

Time series wikipedia , lookup

Choice modelling wikipedia , lookup

Interaction (statistics) wikipedia , lookup

Regression toward the mean wikipedia , lookup

Regression analysis wikipedia , lookup

Linear regression wikipedia , lookup

Coefficient of determination wikipedia , lookup

Transcript
Statistics for Managers
Using Microsoft® Excel
4th Edition
Chapter 13
Introduction to Multiple Regression
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-1
Chapter Goals
After completing this chapter, you should be
able to:
 apply multiple regression analysis to business
decision-making situations
 analyze and interpret the computer output for a
multiple regression model
 perform residual analysis for the multiple
regression model
 test the significance of the independent
variables in a multiple regression model
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-2
Chapter Goals
(continued)
After completing this chapter, you should be
able to:
 use a coefficient of partial determination to test
portions of the multiple regression model
 incorporate qualitative variables into the
regression model by using dummy variables
 use interaction terms in regression models
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-3
The Multiple Regression
Model
The coefficients of the multiple regression model are
estimated using sample data
Estimated multiple regression model:
Estimated
(or predicted)
value of Y
Estimated
intercept
Estimated slope coefficients
Ŷ  b0  b1X1  b2 X2    bk Xk
In this chapter we will always use Excel to obtain the
regression slope coefficients and other regression
summary measures.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-4
Multiple Regression Model
Two variable model
Y
Ŷ  b0  b1X1  b2 X2
X2
X1
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-5
Example:
2 Independent Variables
 A distributor of frozen desert pies wants to
evaluate factors thought to influence demand
 Dependent variable:
Pie sales (units per week)
 Independent variables: Price (in $)
Advertising ($100’s)
 Data is collected for 15 weeks
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-6
Pie Sales Model
Week
Pie
Sales
Price
($)
Advertising
($100s)
1
350
5.50
3.3
2
460
7.50
3.3
3
350
8.00
3.0
4
430
8.00
4.5
5
350
6.80
3.0
6
380
7.50
4.0
7
430
4.50
3.0
8
470
6.40
3.7
9
450
7.00
3.5
10
490
5.00
4.0
11
340
7.20
3.5
12
300
7.90
3.2
13
440
5.90
4.0
14
450
5.00
3.5
15
300
7.00
2.7
Multiple regression model:
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Sales = b0 + b1 (Price)
+ b2 (Advertising)
Chap 13-7
Estimating a Multiple Linear
Regression Equation
 Excel will be used to generate the coefficients
and measures of goodness of fit for multiple
regression
 Excel:
 Tools / Data Analysis... / Regression
 PHStat:
 PHStat / Regression / Multiple Regression…
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-8
Multiple Regression Output
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
47.46341
Observations
ANOVA
Regression
15
df
Sales  306.5262 - 24.9751(Price)  74.1310(Advertising)
SS
MS
F
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
555.46404
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
t Stat
6.53861
Significance F
P-value
0.01201
Lower 95%
Upper 95%
Chap 13-9
The Multiple Regression Equation
Sales  306.5262 - 24.9751(Price)  74.1310(Advertising)
where
Sales is in number of pies per week
Price is in $
Advertising is in $100’s.
b1 = -24.975: sales
will decrease, on
average, by 24.9751
pies per week for
each $1 increase in
selling price, net of
the effects of changes
due to advertising
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
b2 = 74.131: sales will
increase, on average,
by 74.1310 pies per
week for each $100
increase in
advertising, net of the
effects of changes
due to price
Chap 13-10
Using The Model to Make
Predictions
Predict sales for a week in which the selling
price is $5.50 and advertising is $350:
Sales  306.526 - 24.975(Pri ce)  74.131(Adv ertising)
 306.526 - 24.975 (5.50)  74.131 (3.5)
 428.62
Predicted sales
is 428.62 pies
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Note that Advertising is
in $100’s, so $350
means that X2 = 3.5
Chap 13-11
Predictions in PHStat
 PHStat | regression | multiple regression …
Check the
“confidence and
prediction interval
estimates” box
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-12
Predictions in PHStat
(continued)
Input values
<
Predicted Y value
<
Confidence interval for the
mean Y value, given
these X’s
<
Prediction interval for an
individual Y value, given
these X’s
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-13
Coefficient of
Multiple Determination
 Reports the proportion of total variation in Y
explained by all X variables taken together
2
Y.12..k
r
SSR regression sum of squares


SST
total sum of squares
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-14
Multiple Coefficient of
Determination
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
Regression
r
15
df
SSR 29460.0


 .5215
SST 56493.3
52.1% of the variation in pie sales
is explained by the variation in
price and advertising
47.46341
Observations
ANOVA
2
Y.12
SS
MS
F
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
555.46404
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
t Stat
6.53861
Significance F
P-value
0.01201
Lower 95%
Upper 95%
Chap 13-15
Adjusted r2
 r2 never decreases when a new X variable is
added to the model
 This can be a disadvantage when comparing
models
 What is the net effect of adding a new variable?
 We lose a degree of freedom when a new X
variable is added
 Did the new X variable add enough
explanatory power to offset the loss of one
degree of freedom?
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-16
Adjusted r2
(continued)
 Shows the proportion of variation in Y explained
by all X variables adjusted for the number of X
variables used
2
adj
r

 n  1 
2
 1  (1  rY.12..k )

 n  k  1

(where n = sample size, k = number of independent variables)
 Penalize excessive use of unimportant independent
variables
 Smaller than r2
 Useful in comparing among models
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-17
Multiple Coefficient of
Determination
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
47.46341
Observations
ANOVA
Regression
15
df
r  .4417
2
adj
44.2% of the variation in pie sales is
explained by the variation in price and
advertising, taking into account the sample
size and number of independent variables
SS
MS
F
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
555.46404
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
t Stat
6.53861
Significance F
P-value
0.01201
Lower 95%
Upper 95%
Chap 13-18
Residual Plots Used
in Multiple Regression
 These residual plots are used in multiple
regression:
<
 Residuals vs. Yi
 Residuals vs. X1i
 Residuals vs. X2i etc.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-19
Residual Plots Pie Example
Use Tools | Data Analysis | Regression
Residuals
Price Residual Plot
200
100
0
-100 4
5
6
7
8
9
Price
Residuals
Advertising Residual Plot
200
100
0
-100 2
2.5
3
3.5
4
4.5
5
Advertising
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-20
Residual Plots Pie Example
Use Tools | Data Analysis | Regression
Pie Sales
Normal Probability Plot
550
500
450
400
350
300
250
200
0
20
40
60
80
100
Sample Percentile
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-21
Is the Model Significant?
 F-Test for Overall Significance of the Model
 Shows if there is a linear relationship between all
of the X variables considered together and Y
 Use F test statistic
 Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent
variable affects Y)
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-22
F-Test for Overall Significance
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
47.46341
Observations
ANOVA
Regression
15
df
MSR 14730.0
F

 6.5386
MSE
2252.8
With 2 and 12 degrees
of freedom
SS
MS
P-value for
the F-Test
F
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
555.46404
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
t Stat
6.53861
Significance F
P-value
0.01201
Lower 95%
Upper 95%
Chap 13-23
F-Test for Overall Significance
(continued)
Test Statistic:
H0: β1 = β2 = 0
H1: β1 and β2 not both zero
 = .05
df1= 2
df2 = 12
MSR
F
 6.5386
MSE
Decision:
p-value=.01201
Reject H0 at  = 0.05
Conclusion:
 = .05
0
Do not
reject H0
Reject H0
F
There is evidence that at
least one independent
variable affects Y
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-24
Are Individual Variables
Significant?
 Use t-tests of individual variable slopes
 Shows if there is a linear relationship between the
variable Xi and Y
 Hypotheses:
 H0: βi = 0 (no linear relationship)
 H1: βi ≠ 0 (linear relationship does exist
between Xi and Y)
Test Statistic
bi  0
t
S bi
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
(df = n – k – 1)
Chap 13-25
Are Individual Variables
Significant?
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
47.46341
Observations
ANOVA
Regression
15
df
t-value for Price is t = -2.306, with
p-value .0398
t-value for Advertising is t = 2.855,
with p-value .0145
SS
MS
F
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
555.46404
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
t Stat
6.53861
Significance F
P-value
0.01201
Lower 95%
Upper 95%
Chap 13-26
Inferences about the Slope:
t Test Example
H0: β1 = 0
H1: β1  0
From Excel output:
Price
Advertising
H0: β2 = 0
H1: β2  0
Coefficients
Standard Error
t Stat
P-value
-24.97509
10.83213
-2.30565
0.03979
74.13096
25.96732
2.85478
0.01449
The test statistic for each variable falls
in the rejection region (p-values < .05)
Decision:
d.f. = 15-2-1 = 12
 = .05
Reject H0 for each variable
Conclusion:
There is evidence that both
Price and Advertising affect
pie sales at  = .05
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-27
Confidence Interval Estimate
for the Slope
(continued)
Confidence interval for the population slope βi
bi  t nk 1Sbi
where t has
(n – k – 1) d.f.
(15 – 2 – 1) = 12 d.f.
Coefficients
Standard Error
…
Intercept
306.52619
114.25389
…
57.58835
555.46404
Price
-24.97509
10.83213
…
-48.57626
-1.37392
74.13096
25.96732
…
17.55303
130.70888
Advertising
Lower 95%
Upper 95%
Example: Excel output also reports these interval endpoints:
Weekly sales are estimated to be reduced by between 1.37 to
48.58 pies for each increase of $1 in the selling price
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-28
Coefficient of Partial Determination
for k variable model
2
rYj.(all
v ariables except j)

SSR (X j | all variables except j)
SST SSR(all variables)  SSR(X j | all variables except j)
 Measures the proportion of variation in the dependent
variable that is explained by Xj while controlling for
(holding constant) the other explanatory variables
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-29
Coefficient of Partial
Determination in Excel
 Coefficients of Partial Determination can be
found using Excel:
 PHStat | regression | multiple regression …
 Check the “coefficient of partial determination” box
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-30
Using Dummy Variables
 A dummy variable is a categorical explanatory
variable with two levels:
 yes or no, on or off, male or female
 coded as 0 or 1
 Regression intercepts are different if the variable
is significant
 Assumes equal slopes for other variables
 If more than two levels, the number of dummy
variables needed is (number of levels - 1)
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-31
Dummy-Variable Model Example
(with 2 Levels)
Ŷ  b0  b1X1  b2 X2
Let:
Y = pie sales
X1 = price
X2 = holiday (X2 = 1 if a holiday occurred during the week)
(X2 = 0 if there was no holiday that week)
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-32
Dummy-Variable Model Example
(with 2 Levels)
(continued)
Ŷ  b0  b1X1  b2 (1)  (b0  b2 )  b1X1
Holiday
Ŷ  b0  b1X1  b2 (0) 
No Holiday
Y (sales)
b0 + b2
b0
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
 b1X1
b0
Different
intercept
Same
slope
If H0: β2 = 0 is
rejected, then
“Holiday” has a
significant effect
on pie sales
X1 (Price)
Chap 13-33
Interpreting the Dummy Variable
Coefficient (with 2 Levels)
Example:
Sales  300 - 30(Price)  15(Holiday)
Sales: number of pies sold per week
Price: pie price in $
1 If a holiday occurred during the week
Holiday:
0 If no holiday occurred
b2 = 15: on average, sales were 15 pies greater in
weeks with a holiday than in weeks without a
holiday, given the same price
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-34
Dummy-Variable Models
(more than 2 Levels)
 The number of dummy variables is one less
than the number of levels
 Example:
Y = house price ; X1 = square feet
If style of the house is also thought to matter:
Style = ranch, split level, condo
Three levels, so two dummy
variables are needed
Advanced concept not included in homework
or on the exam.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-35
Interaction Between
Explanatory Variables
 Hypothesizes interaction between pairs of X variables
 Response to one X variable may vary at different levels of
another X variable
 Contains two-way cross product terms
Ŷ  b0  b1X1  b2 X2  b3 X3
 b0  b1X1  b2 X2  b3 (X1X2 )
 Hypotheses:
 H0: 3 = 0 (no interaction between X1 and X2)
 H1: 3  0 (X1 interacts with X2)
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-36
Interaction Regression Model
Worksheet
Item, i
Yi
X1i
X2i
X1i*X2i
1
1
1
3
3
2
4
8
5
40
3
1
3
2
6
4
3
5
6
30
…
…
…
…
…
Multiply X1 by X2 to get X1*X2.
Run regression with Y, X1, X2 , X1X2
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-37
Chapter Summary
 Developed the multiple regression model
 Tested the significance of the multiple
regression model
 Discussed adjusted r2
 Used residual plots to check model
 Tested individual regression coefficients
 Tested portions of the regression model
 Used dummy variables
 Evaluated interaction effects
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chap 13-38