Download Business Statistics: A Decision

Document related concepts

Instrumental variables estimation wikipedia , lookup

Interaction (statistics) wikipedia , lookup

Least squares wikipedia , lookup

Regression toward the mean wikipedia , lookup

Linear regression wikipedia , lookup

Regression analysis wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Coefficient of determination wikipedia , lookup

Transcript
Chapter 9
Student Lecture Notes
9-1
Department of Quantitative Methods & Information
Systems
ECON 504
Chapter 3
Multiple Regression
Complete Example
Spring 2013
Dr. Mohammad Zainal
Review Goals
After completing this lecture, you should be
able to:

Formulate null and alternative hypotheses for
applications involving a single population mean or
proportion

Formulate a decision rule for testing a hypothesis

Know how to use the test statistic, critical value, and
p-value approaches to test the null hypothesis

Know what Type I and Type II errors are
2
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-2
Review Goals
(continued)




explain model building using multiple regression
analysis
apply multiple regression analysis to business
decision-making situations
analyze and interpret the computer output for a
multiple regression model
test the significance of the independent variables
in a multiple regression model
3
Review Goals
(continued)



recognize potential problems in multiple
regression analysis and take steps to correct the
problems
incorporate qualitative variables into the
regression model by using dummy variables
use variable transformations to model nonlinear
relationships
4
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-3
What is a Hypothesis?

A hypothesis is a claim
(assumption) about a
population parameter:

population mean
Example: The mean monthly cell phone bill
of this city is µ = $42

population proportion
Example: The proportion of adults in this
city with cell phones is P = .68
5
The Null Hypothesis, H0

States the assumption (numerical) to be
tested
Example: The average number of TV sets in
U.S. Homes is at least three ( H0 : μ  3 )

Is always about a population parameter,
not about a sample statistic
H0 : μ  3
H0 : x  3
6
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-4
The Null Hypothesis, H0
(continued)




Begin with the assumption that the null
hypothesis is true
 Similar to the notion of innocent until
proven guilty
Refers to the status quo
Always contains “=” , “≤” or “” sign
May or may not be rejected
7
The Alternative Hypothesis, HA

Is the opposite of the null hypothesis





e.g.: The average number of TV sets in U.S.
homes is less than 3 ( HA: µ < 3 )
Challenges the status quo
Never contains the “=” , “≤” or “” sign
May or may not be accepted
Is generally the hypothesis that is believed
(or needs to be supported) by the
researcher – a research hypothesis
8
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-5
Formulating Hypotheses


Example 1: Ford motor company has
worked to reduce road noise inside the cab
of the redesigned F150 pickup truck. It
would like to report in its advertising that
the truck is quieter. The average of the
prior design was 68 decibels at 60 mph.
What is the appropriate hypothesis test?
9
Formulating Hypotheses


Example 1: Ford motor company has worked to reduce road noise inside
the cab of the redesigned F150 pickup truck. It would like to report in its
advertising that the truck is quieter. The average of the prior design was
68 decibels at 60 mph.
What is the appropriate test?
H0: µ ≥ 68 (the truck is not quieter) status quo
HA: µ < 68 (the truck is quieter) wants to support

If the null hypothesis is rejected, Ford has sufficient
evidence to support that the truck is now quieter.
10
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-6
Formulating Hypotheses


Example 2: The average annual income of
buyers of Ford F150 pickup trucks is
claimed to be $65,000 per year. An
industry analyst would like to test this
claim.
What is the appropriate hypothesis test?
11
Formulating Hypotheses


Example 1: The average annual income of buyers of Ford F150
pickup trucks is claimed to be $65,000 per year. An industry
analyst would like to test this claim.
What is the appropriate test?
H0: µ = 65,000 (income is as claimed) status quo
HA: µ ≠ 65,000 (income is different than claimed)

The analyst will believe the claim unless
sufficient evidence is found to discredit it.
12
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-7
Hypothesis Testing Process
Claim: the population mean age is 50.
Null Hypothesis: H0: µ = 50
Population
Now select a random sample:
Sample
Is x = 20
likely if
µ = 50?
Suppose the sample
mean age is 20:
x = 20
If not likely,
REJECT
Null Hypothesis
13
Reason for Rejecting H0
Sampling Distribution of x
x
20
If it is unlikely that
we would get a
sample mean of
this value ...
μ = 50
If H0 is true
... if in fact this were
the population mean…
... then we
reject the null
hypothesis that
μ = 50.
14
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-8
Errors in Making Decisions

Type I Error
 Reject a true null hypothesis
 Considered a serious type of error
The probability of Type I Error is 

Called level of significance of the test

Set by researcher in advance
15
Errors in Making Decisions
(continued)

Type II Error
 Fail to reject a false null hypothesis
The probability of Type II Error is β

β is a calculated value, the formula is
discussed later in the chapter
16
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-9
Outcomes and Probabilities
Possible Hypothesis Test Outcomes
State of Nature
Key:
Outcome
(Probability)
Decision
H0 True
H0 False
Do Not
Reject
H0
No error
(1 -  )
Type II Error
(β)
Reject
H0
Type I Error
()
No Error
(1-β)
17
Type I & II Error Relationship
 Type I and Type II errors cannot happen at
the same time

Type I error can only occur if H0 is true

Type II error can only occur if H0 is false
If Type I error probability (  )
, then
Type II error probability ( β )
18
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-10
Factors Affecting Type II Error

All else equal,

β
when the difference between
hypothesized parameter and its true value

β
when


β
when
σ

β
when
n
The formula used to
compute the value of β
is discussed later in the
chapter
19
Level of Significance, 

Defines unlikely values of sample statistic if
null hypothesis is true


Defines rejection region of the sampling
distribution
Is designated by  , (level of significance)

Typical values are .01, .05, or .10

Is selected by the researcher at the beginning

Provides the critical value(s) of the test
20
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-11
Hypothesis Tests for the Mean
Hypothesis
Tests for 
σ Known

σ Unknown
Assume first that the population
standard deviation σ is known
21
Process of Hypothesis Testing

1. Specify population parameter of interest

2. Formulate the null and alternative hypotheses

3. Specify the desired significance level, α

4. Define the rejection region


5. Take a random sample and determine
whether or not the sample result is in the
rejection region
6. Reach a decision and draw a conclusion
22
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-12
Level of Significance
and the Rejection Region
Level of significance =
Lower tail test

Upper tail test
Example:
H0: μ ≥ 3
HA: μ < 3
Two tailed test
Example:
Example:
H0: μ ≤ 3
HA: μ > 3
H0: μ = 3
HA: μ ≠ 3


-zα 0
Reject H0
zα
0
Do not
reject H0
 /2
Do not
reject H0
/2
-zα/2 0
Reject H0
Reject H0
Do not
reject H0
zα/2
Reject H0
23
Critical Value for Lower Tail Test

The cutoff value, -zα or xα ,
H0: μ ≥ 3
HA: μ < 3
is called a critical value

Reject H0
x   μ  z
σ
n
-zα
xα
Do not reject H0
0
µ=3
24
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-13
Critical Value for Upper Tail Test

The cutoff value, zα or xα ,
H0: μ ≤ 3
is called a critical value
HA: μ > 3

Do not reject H0
0
zα
µ=3
xα
x   μ  z
Reject H0
σ
n
25
Critical Values for
Two Tailed Tests

H0: μ = 3
HA: μ  3
There are two cutoff
values (critical values):
± zα/2
or
xα/2
xα/2
/2
/2
Lower
Reject H0
Upper
Do not reject H0
-zα/2
xα/2
0
µ=3
Lower
x /2  μ  z /2
Business Statistics: A Decision-Making Approach, 7e
Reject H0
zα/2
xα/2
σ
n
Upper
26
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-14
The Rejection Region
Lower tail test
Example:
H0: μ ≥ 3
HA: μ < 3
Upper tail test
Two tailed test
Example:
Example:
H0: μ ≤ 3
HA: μ > 3
H0: μ = 3
HA: μ ≠ 3


-zα 0
xα
0
Do not
reject H0
Reject H0
Reject H0 if z < -zα
i.e., if x < xα
Do not
reject H0
zα
xα
Reject H0
Reject H0 if z > zα
i.e., if x > xα
 /2
/2
-zα/2 0 zα/2
x α/2(L)
x α/2(U)
Reject H0
Do not
reject H0
Reject H0
Reject H0 if z < -zα/2 or z > zα/2
i.e., if x < xα/2(L) or x > xα/2(U)
27
Two Equivalent Approaches
to Hypothesis Testing

z-units:




For given , find the critical z value(s):
 -zα , zα ,or ±zα/2
Convert the sample mean x to a z test statistic:
Reject H0 if z is in the rejection region,
otherwise do not reject H0
z 
x μ
σ
n
x units:

Given , calculate the critical value(s)


xα , or xα/2(L) and xα/2(U)
The sample mean is the test statistic. Reject H0 if x is in the
rejection region, otherwise do not reject H0
28
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-15
Hypothesis Testing Example
Test the claim that the true mean # of TV sets
in US homes is at least 3.
(Assume σ = 0.8)
1. Specify the population value of interest
 The mean number of TVs in US homes
2. Formulate the appropriate null and alternative
hypotheses
 H0: μ  3
HA: μ < 3 (This is a lower tail test)
3. Specify the desired level of significance
 Suppose that  = .05 is chosen for this test



29
Hypothesis Testing Example
(continued)

4. Determine the rejection region
 = .05
Reject H0
-zα= -1.645
Do not reject H0
0
This is a one-tailed test with  = .05.
Since σ is known, the cutoff value is a z value:
Reject H0 if z < z = -1.645 ; otherwise do not reject H0
30
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-16
Hypothesis Testing Example

5. Obtain sample evidence and compute the
test statistic
Suppose a sample is taken with the following
results: n = 100, x = 2.84 ( = 0.8 is assumed known)

z 
Then the test statistic is:
xμ
2.84  3  .16


 2.0
σ
0.8
.08
n
100
31
Hypothesis Testing Example
(continued)

6. Reach a decision and interpret the result
 = .05
z
Reject H0
-1.645
Do not reject H0
0
-2.0
Since z = -2.0 < -1.645, we reject the null
hypothesis that the mean number of TVs in US
homes is at least 3. There is sufficient evidence
that the mean is less than 3.
32
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-17
Hypothesis Testing Example
(continued)

An alternate way of constructing rejection region:
Now
expressed
in x, not z
units
 = .05
x
Reject H0
2.8684
2.84
Since x = 2.84 < 2.8684,
we reject the null
hypothesis
Do not reject H0
3
x α  μ  zα
σ
0.8
 3  1.645
 2.8684
n
100
33
p-Value Approach to Testing



Convert Sample Statistic ( x ) to Test Statistic
(a z value, if σ is known)
Determine the p-value from a table or
computer
Compare the p-value with 

If p-value <  , reject H0

If p-value   , do not reject H0
34
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-18
p-Value Approach to Testing
(continued)

p-value: Probability of obtaining a test
statistic more extreme ( ≤ or  ) than the
observed sample value given H0 is true


Also called observed level of significance
Smallest value of  for which H0 can be
rejected
35
p-value example

Example: How likely is it to see a sample mean
of 2.84 (or something further below the mean) if
the true mean is  = 3.0?
P( x  2.84 | μ  3.0)



2.84  3.0 
 P z 

0.8


100 

 P(z  2.0)  .0228
 = .05
p-value =.0228
2.8684
3
x
2.84
-1.645
-2.0
0
z
36
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-19
p-value example
(continued)

Compare the p-value with 

If p-value <  , reject H0

If p-value   , do not reject H0
 = .05
Here: p-value = .0228
 = .05
Since .0228 < .05, we reject
the null hypothesis
p-value =.0228
2.8684
3
2.84
37
Example: Upper Tail z Test
for Mean ( Known)
A phone industry manager thinks that
customer monthly cell phone bill have
increased, and now average over $52 per
month. The company wishes to test this
claim. (Assume  = 10 is known)
Form hypothesis test:
H0: μ ≤ 52 the average is not over $52 per month
HA: μ > 52
the average is greater than $52 per month
(i.e., sufficient evidence exists to support the
manager’s claim)
38
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-20
Example: Find Rejection Region
(continued)

Suppose that  = .10 is chosen for this test
Find the rejection region:
Reject H0
 = .10
Do not reject H0
0
zα=1.28
Reject H0
Reject H0 if z > 1.28
39
Review:
Finding Critical Value - One Tail
What is z given  = 0.10?
.90
.10
 = .10
.50 .40
z
Standard Normal
Distribution Table (Portion)
0 1.28
Z
.07
.08
.09
1.1 .3790 .3810 .3830
1.2 .3980 .3997 .4015
1.3 .4147 .4162 .4177
Critical Value
= 1.28
40
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-21
Example: Test Statistic
(continued)
Obtain sample evidence and compute the test
statistic
Suppose a sample is taken with the following
results: n = 64, x = 53.1 (=10 was assumed known)

Then the test statistic is:
z 
xμ
53.1  52

 0.88
σ
10
n
64
41
Example: Decision
(continued)
Reach a decision and interpret the result:
Reject H0
 = .10
Do not reject H0
0
1.28
Reject H0
z = .88
Do not reject H0 since z = 0.88 ≤ 1.28
i.e.: there is not sufficient evidence that the
mean bill is over $52
42
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-22
p -Value Solution
(continued)
Calculate the p-value and compare to 
p-value = .1894
Reject H0
 = .10
0
Do not reject H0
1.28
Reject H0
z = .88
P(x  53.1 | μ  52.0)



53.1  52.0 
 P z 

10


64


 P(z  0.88)  .5  .3106
 .1894
Do not reject H0 since p-value = .1894 >  = .10
43
Critical Value
Approach to Testing

When σ is known, convert sample statistic ( x ) to
a z test statistic
Hypothesis
Tests for 
 Known
 Unknown
The test statistic is:
z 
x μ
σ
n
44
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-23
Critical Value
Approach to Testing

When σ is unknown, convert sample statistic ( x )
to a t test statistic
Hypothesis
Tests for 
 Known
 Unknown
The test statistic is:
t n1 
(The population must be
approximately normal)
x μ
s
n
45
Hypothesis Tests for μ,
σ Unknown






1. Specify the population value of interest
2. Formulate the appropriate null and
alternative hypotheses
3. Specify the desired level of significance
4. Determine the rejection region (critical values
are from the t-distribution with n-1 d.f.)
5. Obtain sample evidence and compute the
t test statistic
6. Reach a decision and interpret the result
46
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-24
Example: Two-Tail Test
( Unknown)
The average cost of a
hotel room in New York
is said to be $168 per
night. A random sample
of 25 hotels resulted in
x = $172.50 and
s = $15.40. Test at the
 = 0.05 level.
H0: μ = 168
HA: μ  168
(Assume the population distribution is normal)
47
Example Solution: Two-Tail Test
H0: μ = 168
HA: μ  168
  = 0.05
 n = 25
 Critical Values:
t24 = ± 2.0639
  is unknown, so
use a t statistic
/2=.025
Reject H0
-tα/2
-2.0639
t n 1 
/2=.025
Do not reject H0
0
Reject H0
tα/2
1.46 2.0639
x μ
172.50  168

 1.46
s
15.40
n
25
Do not reject H0: not sufficient evidence that
true mean cost is different than $168
48
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-25
Type II Error

Type II error is the probability of
failing to reject a false H0
Suppose we fail to reject H0: μ  52
when in fact the true mean is μ = 50

50
52
Reject
H0: μ  52
Do not reject
H0 : μ  52
49
Type II Error
(continued)

Suppose we do not reject H0:   52 when in fact
the true mean is  = 50
This is the range of x where
H0 is not rejected
This is the true
distribution of x if  = 50
50
52
Reject
H0:   52
Do not reject
H0 :   52
50
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-26
Type II Error
(continued)

Suppose we do not reject H0: μ  52 when
in fact the true mean is μ = 50
Here, β = P( x  cutoff ) if μ = 50
β

50
52
Reject
H0: μ  52
Do not reject
H0 : μ  52
51
Calculating β

Suppose n = 64 , σ = 6 , and  = .05
σ
6
 52  1.645
 50.766
n
64
cutoff  x   μ  z 
(for H0 : μ  52)
So β = P( x  50.766 ) if μ = 50

50
50.766
Reject
H0: μ  52
52
Do not reject
H0 : μ  52
52
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-27
Calculating β
(continued)

Suppose n = 64 , σ = 6 , and  = .05



50.766  50 
P( x  50.766 | μ  50)  P z 
  P(z  1.02)  .5  .3461  .1539
6


64 

Probability of
type II error:

β = .1539
50
52
Reject
H0: μ  52
Do not reject
H0 : μ  52
53
Hypothesis Tests in Minitab
54
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-28
Hypothesis Tests in Minitab
55
Sample Minitab Output
56
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-29
Hypothesis Tests Summary

Addressed hypothesis testing methodology

Performed z Test for the mean (σ known)


Discussed p–value approach to
hypothesis testing
Performed one-tail and two-tail tests . . .
57
Hypothesis Tests Summary
(continued)



Performed t test for the mean (σ
unknown)
Performed z test for the proportion
Discussed Type II error and computed its
probability
58
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-30
Multiple Regression Assumptions
Errors (residuals) from the regression model:
<
e = (y – y)




The model errors are independent and
random
The errors are normally distributed
The mean of the errors is zero
Errors have a constant variance
59
Model Specification



Decide what you want to do and select the
dependent variable
Determine the potential independent variables for
your model
Gather sample data (observations) for all
variables
60
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-31
The Correlation Matrix

Correlation between the dependent variable and
selected independent variables can be found
using Excel:


Formula Tab: Data Analysis / Correlation
Can check for statistical significance of correlation
with a t test
61
Example

A distributor of frozen desert pies wants to
evaluate factors thought to influence demand


Dependent variable:
Pie sales (units per week)
Independent variables: Price (in $)
Advertising ($100’s)

Data are collected for 15 weeks
62
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-32
Pie Sales Model
Week
Pie
Sales
Price
($)
Advertising
($100s)
1
350
5.50
3.3
2
460
7.50
3.3
3
350
8.00
3.0
4
430
8.00
4.5
5
350
6.80
3.0
6
380
7.50
4.0
7
430
4.50
3.0
8
470
6.40
3.7
9
450
7.00
3.5
10
490
5.00
4.0
11
340
7.20
3.5
12
300
7.90
3.2
13
440
5.90
4.0
14
450
5.00
3.5
15
300
7.00
2.7
Multiple regression model:
Sales = b0 + b1 (Price)
+ b2 (Advertising)
Correlation matrix:
Pie Sales
Pie Sales
Price
Advertising
Price
Advertising
1
-0.44327
1
0.55632
0.03044
1
63
Interpretation of Estimated
Coefficients


Slope (bi)
 Estimates that the average value of y changes by bi
units for each 1 unit increase in Xi holding all other
variables constant
 Example: if b1 = -20, then sales (y) is expected to
decrease by an estimated 20 pies per week for each $1
increase in selling price (x1), net of the effects of
changes due to advertising (x2)
y-intercept (b0)
 The estimated average value of y when all xi = 0
(assuming all xi = 0 is within the range of observed
values)
64
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-33
Pie Sales Correlation Matrix
Pie Sales
Pie Sales
Price
Advertising

Advertising
-0.44327
1
0.55632
0.03044
1
Price vs. Sales : r = -0.44327


Price
1
There is a negative association between
price and sales
Advertising vs. Sales : r = 0.55632

There is a positive association between
advertising and sales
65
Scatter Diagrams
Sales vs. Price
Sales
600
500
400
300
200
Sales vs. Advertising
Sales
100
600
0
0
2
4
6
8
Price
10 500
400
300
200
100
0
0
1
2
3
Advertising
4
5
66
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-34
Estimating a Multiple Linear
Regression Equation


Computer software is generally used to generate
the coefficients and measures of goodness of fit
for multiple regression
Excel:


Data / Data Analysis / Regression
Minitab:

Stat / Regression / Regression…
67
Estimating a Multiple Linear
Regression Equation

Excel:
68
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-35
Estimating a Multiple Linear
Regression Equation

Minitab:
69
Multiple Regression Output
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
Standard Error
0.44172
47.46341
Observations
ANOVA
Regression
Sales  306.526 - 24.975(Price)  74.131(Advertising)
15
df
SS
MS
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
P-value
Significance F
0.01201
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
F
6.53861
Lower 95%
Upper 95%
555.46404
70
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-36
The Multiple Regression Equation
Sales  306.526 - 24.975(Price)  74.131(Advertising)
where
Sales is in number of pies per week
Price is in $
Advertising is in $100’s.
b1 = -24.975: sales
will decrease, on
average, by 24.975
pies per week for
each $1 increase in
selling price, net of
the effects of changes
due to advertising
b2 = 74.131: sales will
increase, on average,
by 74.131 pies per
week for each $100
increase in
advertising, net of the
effects of changes
due to price
71
Using The Model to Make
Predictions
Predict sales for a week in which the selling
price is $5.50 and advertising is $350:
Sales  306.526 - 24.975(Price)  74.131(Advertising)
 306.526 - 24.975 (5.50)  74.131(3.5)
 428.62
Predicted sales
is 428.62 pies
Note that Advertising is
in $100’s, so $350
means that x2 = 3.5
72
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-37
Predictions in Minitab
73
Predictions in Minitab
(continued)
<
Predicted y value
<
Confidence interval for the
mean y value, given
these x’s
<
Prediction interval for an
individual y value, given
these x’s
74
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-38
Multiple Coefficient of
Determination (R2)

Reports the proportion of total variation in y
explained by all x variables taken together
R2 
SSR Sum of squaresregression

SST
Total sum of squares
75
Multiple Coefficient of
Determination
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
Standard Error
Regression
SSR 29460.0

 .52148
SST 56493.3
0.44172
52.1% of the variation in pie sales
is explained by the variation in
price and advertising
47.46341
Observations
ANOVA
R2 
15
df
SS
MS
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
P-value
Significance F
0.01201
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
F
6.53861
Lower 95%
Upper 95%
555.46404
76
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-39
Adjusted R2


R2 never decreases when a new x variable is
added to the model
 This can be a disadvantage when comparing
models
What is the net effect of adding a new variable?
 We lose a degree of freedom when a new x
variable is added
 Did the new x variable add enough
explanatory power to offset the loss of one
degree of freedom?
77
Adjusted R2
(continued)

Shows the proportion of variation in y explained by
all x variables adjusted for the number of x
variables used
 n 1 
R2A  1  (1  R2 )

 n  k  1
(where n = sample size, k = number of independent variables)



Penalize excessive use of unimportant independent
variables
Smaller than R2
Useful in comparing among models
78
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-40
Multiple Coefficient of
Determination
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
47.46341
Observations
ANOVA
15
df
Regression
R2A  .44172
44.2% of the variation in pie sales is
explained by the variation in price and
advertising, taking into account the sample
size and number of independent variables
SS
MS
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
P-value
Significance F
0.01201
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
F
6.53861
Lower 95%
Upper 95%
555.46404
79
Is the Model Significant?


F-Test for Overall Significance of the Model
Shows if there is a linear relationship between all
of the x variables considered together and y

Use F test statistic

Hypotheses:


H0: β1 = β2 = … = βk = 0 (no linear relationship)
HA: at least one βi ≠ 0 (at least one independent
variable affects y)
80
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-41
F-Test for Overall Significance
(continued)

Test statistic:
SSR
MSR
k
F

SSE
MSE
n  k 1
where F has (numerator) D1 = k and
(denominator) D2 = (n – k – 1)
degrees of freedom
81
F-Test for Overall Significance
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
47.46341
Observations
ANOVA
Regression
15
df
F
MSR 14730.0

 6.5386
MSE
2252.8
With 2 and 12 degrees
of freedom
SS
MS
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
F
6.53861
P-value
Significance F
0.01201
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
P-value for
the F-Test
Lower 95%
Upper 95%
555.46404
82
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-42
F-Test for Overall Significance
(continued)
H0: β1 = β2 = 0
HA: β1 and β2 not both zero
 = .05
df1= 2
df2 = 12
Critical
Value:
F = 3.885
Do not
reject H0
Reject H0
F
MSR
 6.5386
MSE
Decision:
Reject H0 at  = 0.05
Conclusion:
The regression model does explain
a significant portion of the variation
in pie sales
 = .05
0
Test Statistic:
F
F.05 = 3.885
(There is evidence that at least one
independent variable affects y )
83
Are Individual Variables
Significant?



Use t-tests of individual variable slopes
Shows if there is a linear relationship between the
variable xi and y
Hypotheses:


H0: βi = 0 (no linear relationship)
HA: βi ≠ 0 (linear relationship does exist
between xi and y)
84
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-43
Are Individual Variables
Significant?
(continued)
H0: βi = 0 (no linear relationship)
HA: βi ≠ 0 (linear relationship does exist
between xi and y )
Test Statistic:
bi  0
t
sbi
(df = n – k – 1)
85
Are Individual Variables
Significant?
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
Standard Error
0.44172
47.46341
Observations
ANOVA
Regression
15
df
t-value for Price is t = -2.306, with
p-value .0398
t-value for Advertising is t = 2.855,
with p-value .0145
SS
MS
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
P-value
Significance F
0.01201
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
F
6.53861
Lower 95%
Upper 95%
555.46404
86
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-44
Inferences about the Slope:
t Test Example
From Excel output:
H0: βi = 0
HA: βi  0
Coefficients
Price
Advertising
d.f. = 15-2-1 = 12
t Stat
P-value
-24.97509
Standard Error
10.83213
-2.30565
0.03979
74.13096
25.96732
2.85478
0.01449
The test statistic for each variable falls
in the rejection region (p-values < .05)
 = .05
t/2 = 2.1788
/2=.025
/2=.025
Decision:
Reject H0 for each variable
Conclusion:
Reject H0
Do not reject H0
-tα/2
-2.1788
0
tα/2
Reject H0
2.1788
There is evidence that both
Price and Advertising affect
pie sales at  = .05
87
Confidence Interval Estimate
for the Slope
Confidence interval for the population slope β1
(the effect of changes in price on pie sales):
bi  t  / 2sbi
where t has
(n – k – 1) d.f.
Coefficients
Standard Error
…
Intercept
306.52619
114.25389
…
57.58835
Price
-24.97509
10.83213
…
-48.57626
-1.37392
74.13096
25.96732
…
17.55303
130.70888
Advertising
Lower 95%
Upper 95%
555.46404
Example: Weekly sales are estimated to be reduced
by between 1.37 to 48.58 pies for each increase of $1
in the selling price
88
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-45
Standard Deviation of the
Regression Model

The estimate of the standard deviation of the
regression model is:
SSE
 MSE
n  k 1
s 

Is this value large or small? Must compare to the
mean size of y for comparison
89
Standard Deviation of the
Regression Model
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
Standard Error
0.44172
47.46341
Observations
ANOVA
Regression
The standard deviation of the
regression model is 47.46
15
df
SS
MS
2
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
P-value
Significance F
0.01201
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
F
6.53861
Lower 95%
Upper 95%
555.46404
90
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-46
Standard Deviation of the
Regression Model
(continued)



The standard deviation of the regression model is
47.46
A rough prediction range for pie sales in a given
week is  2(47.46)  94.2
Pie sales in the sample were in the 300 to 500
per week range, so this range is probably too
large to be acceptable. The analyst may want to
look for additional variables that can explain more
of the variation in weekly sales
91
Multicollinearity


Multicollinearity: High correlation exists
between two independent variables
This means the two variables contribute
redundant information to the multiple regression
model
92
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-47
Multicollinearity
(continued)

Including two highly correlated independent
variables can adversely affect the regression
results



No new information provided
Can lead to unstable coefficients (large
standard error and low t-values)
Coefficient signs may not match prior
expectations
93
Some Indications of Severe
Multicollinearity




Incorrect signs on the coefficients
Large change in the value of a previous
coefficient when a new variable is added to the
model
A previously significant variable becomes
insignificant when a new independent variable
is added
The estimate of the standard deviation of the
model increases when a variable is added to
the model
94
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-48
Detect Collinearity
(Variance Inflationary Factor)
VIFj is used to measure collinearity:
VIF j 
1
1  R 2j
R2j is the coefficient of determination when the jth
independent variable is regressed against the
remaining k – 1 independent variables
If VIFj ≥ 5, xj is highly correlated with
the other explanatory variables
95
Detect Collinearity
Regression Analysis
Price and all other X
Regression Statistics
Multiple R
0.030437581
R Square
0.000926446
Adjusted R
Square
Standard Error
Observations
VIF
-0.075925366
1.21527235
15
1.000927305
Output for the pie sales example:
 Since there are only two
explanatory variables, only one VIF
is reported
 VIF is < 5
 There is no evidence of
collinearity between Price and
Advertising
96
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-49
Qualitative (Dummy) Variables

Categorical explanatory variable (dummy
variable) with two or more levels:





yes or no, on or off, male or female
coded as 0 or 1
Regression intercepts are different if the
variable is significant
Assumes equal slopes for other variables
The number of dummy variables needed is
(number of levels – 1)
97
Dummy-Variable Model Example
(with 2 Levels)
Let:
ŷ
b1x1  b2 x 2
y = b
pie
0 sales
x1 = price
x2 = holiday (X2 = 1 if a holiday occurred during the week)
(X2 = 0 if there was no holiday that week)
98
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-50
Dummy-Variable Model Example
(with 2 Levels)
(continued)
ŷ  b0  b1x1  b2 (1)  (b0  b2 )  b1x1
ŷ  b0  b1x1  b2 (0) 
b0
Different
intercept
y (sales)
 b1x1
Holiday
No Holiday
Same
slope
If H0: β2 = 0 is
rejected, then
“Holiday” has a
significant effect
on pie sales
b0 + b2
b0
x1 (Price)
99
Interpreting the Dummy Variable
Coefficient (with 2 Levels)
Example:
Sales  300 - 30(Price)  15(Holiday)
Sales: number of pies sold per week
Price: pie price in $
1 If a holiday occurred during the week
0 If no holiday occurred
b2 = 15: on average, sales were 15 pies greater in
weeks with a holiday than in weeks without a
holiday, given the same price
100
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-51
Dummy-Variable Models
(more than 2 Levels)



The number of dummy variables is one less than
the number of levels
Example:
y = house price ; x1 = square feet
The style of the house is also thought to matter:
Style = ranch, split level, condo
Three levels, so two dummy
variables are needed
101
Dummy-Variable Models
(more than 2 Levels)
Let the default category be “condo”

1 if ranch
x2  

0 if not
(continued)

1 if split level
x3  

0 if not
ŷ  b0  b1x1  b2 x 2  b3 x3
b2 shows the impact on price if the house is a
ranch style, compared to a condo
b3 shows the impact on price if the house is a
split level style, compared to a condo
102
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-52
Interpreting the Dummy Variable
Coefficients (with 3 Levels)
Suppose the estimated equation is
ŷ  20.43  0.045x1  23.53x2  18.84x3
For a condo: x2 = x3 = 0
ŷ  20.43  0.045x1
For a ranch: x3 = 0
ŷ  20.43  0.045x1  23.53
For a split level: x2 = 0
ŷ  20.43  0.045x1  18.84
With the same square feet, a
ranch will have an estimated
average price of 23.53
thousand dollars more than a
condo
With the same square feet, a
ranch will have an estimated
average price of 18.84
thousand dollars more than a
condo.
103
Interaction Effects


Hypothesizes interaction between pairs of x
variables
 Response to one x variable varies at different
levels of another x variable
Contains two-way cross product terms
y  β0  β1x1  β 2 x 2  β3 x 3  β 4 x1x 2  β5 x12 x 2
Basic Terms
Interactive Terms
104
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-53
Effect of Interaction
Given:

y  β0  β1x1  β2 x 2  β3 x1x 2  ε
Without interaction term, effect of x1 on y is
measured by β1
With interaction term, effect of x1 on y is
measured by β1 + β3 x2
Effect changes as x2 increases



105
Interaction Example
y = 1 + 2x1 + 3x2 + 4x1x2
y
where x2 = 0 or 1 (dummy variable)
x2 = 1
y = 1 + 2x1 + 3(1) + 4x1(1)
= 4 + 6x1
12
8
x2 = 0
y = 1 + 2x1 + 3(0) + 4x1(0)
= 1 + 2x1
x1
4
0
0
0.5
1
1.5
Effect (slope) of x1 on y does depend on x2 value
106
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-54
Interaction Regression Model
Worksheet
Case, i
yi
x1i
x2i
x1i x2i
1
2
3
4
:
1
4
1
3
:
1
8
3
5
:
3
5
2
6
:
3
40
6
30
:
multiply x1 by x2 to get x1x2, then
run regression with y, x1, x2 , x1x2
107
Evaluating Presence
of Interaction

Hypothesize interaction between pairs of
independent variables
y  β0  β1x1  β2 x 2  β3 x1x 2  ε

Hypotheses:
 H0: β3 = 0 (no interaction between x1 and x2)
 HA: β3 ≠ 0 (x1 interacts with x2)
108
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-55
Model Building

Goal is to develop a model with the best set of
independent variables



Stepwise regression procedure


Easier to interpret if unimportant variables are
removed
Lower probability of collinearity
Provide evaluation of alternative models as variables
are added
Best-subset approach

Try all combinations and select the best using the
highest adjusted R2 and lowest sε
109
Stepwise Regression


Idea: develop the least squares regression
equation in steps, either through forward
selection, backward elimination, or through
standard stepwise regression
The coefficient of partial determination is the
measure of the marginal contribution of each
independent variable, given that other
independent variables are in the model
110
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.
Chapter 9
Student Lecture Notes
9-56
Best Subsets Regression


Idea: estimate all possible regression equations
using all possible combinations of independent
variables
Choose the best fit by looking for the highest
adjusted R2 and lowest standard error sε
Stepwise regression and best subsets
regression can be performed using Minitab, or
other statistical software packages
111
Business Statistics: A Decision-Making Approach, 7e
© 2008 Prentice-Hall, Inc.