Student’s Solutions Manual and Study Guide: Chapter 14
Chapter 14
Building Multiple Regression Models
LEARNING OBJECTIVES
This chapter presents several advanced topics in multiple regression analysis
enabling you to:
1. Generalize linear regression models as polynomial regression models using
model transformation and Tukey’s ladder of transformations, accounting for
possible interaction among the independent variables.
2. Examine the role of indicator, or dummy, variables as predictors or
independent variables in multiple regression analysis.
3. Use all possible regressions, stepwise regression, forward selection, and
backward elimination search
procedures to develop regression models that account for the most variation in
the dependent variable and are parsimonious.
4. Recognize when multicollinearity is present, understanding general techniques
for preventing and controlling it.
5. Explain when to use logistic regression, and interpret its results.
CHAPTER OUTLINE
14.1
Nonlinear Models: Mathematical Transformation
Polynomial Regression
Tukey’s Ladder of Transformations
Regression Models with Interaction
Model Transformation
14.2
Indicator (Dummy) Variables
14.3
Model-Building: Search Procedures
Search Procedures
All Possible Regressions
Stepwise Regression
Forward Selection
Backward Elimination
14.4
Multicollinearity
14.5
Logistic Regression
KEY TERMS
All Possible Regressions
Backward Elimination
Dummy Variable
Forward Selection
Indicator Variable
Multicollinearity
Quadratic Regression Model
Qualitative Variable
Search Procedures
Stepwise Regression
Tukey’s Four-quadrant Approach
Tukey’s Ladder of Transformations
Variance Inflation Factor (VIF)
STUDY QUESTIONS
1. Another name for an indicator variable is a ________________ variable. These
variables are _____________________ as opposed to quantitative variables.
2. Indicator variables are coded using __________ and _________.
3. Suppose an indicator variable has four categories. In coding this into variables
for multiple regression analysis, there should be _______________ variables.
4. Regression models in which the highest power of any predictor variable is one
and in which there are no interaction terms are referred to as
________________________ models.
5. The interaction of two variables can be studied in multiple regression using the
_______________ terms.
6. Suppose a researcher wants to analyze a set of data using the model ŷ = b0b1^x.
The model would be transformed by taking the _______________________ of
both sides of the equation.
7. Perhaps the most widely known and used of the multiple regression search
procedures is _______________________ regression.
8. One multiple regression search procedure is Forward Selection. Forward
selection is essentially the same as stepwise regression except that
_________________________.
9. Backward elimination is a step-by-step process that begins with the
_________________________ model.
10. A search procedure that computes all the possible linear multiple regression
models from the data using all variables is called
___________________________________.
11. When two or more of the independent variables of a multiple regression model
are highly correlated it is referred to as
___________________________________________. This condition causes
several other problems to occur including
(1) difficulty in interpreting
__________________________________________.
(2) Inordinately small ______________________ for the regression coefficients
may result.
(3) The standard deviations of regression coefficients are
________________________.
(4) The ____________________________ of estimated regression coefficients
may be the opposite of what would be expected for a particular predictor
variable.
ANSWERS TO STUDY QUESTIONS
1. Dummy, Qualitative
2. 0, 1
3. 3
4. First-Order
5. x1 · x2 or Cross Product
6. Logarithm
7. Stepwise
8. Once a variable is entered into the process, it is never removed
9. Full
10. All Possible Regressions
11. Multicollinearity, the Estimates of the Regression Coefficients,
t Values, Overestimated, Algebraic Sign
SOLUTIONS TO PROBLEMS IN CHAPTER 14
14.1
Simple regression model:
ŷ = –147.27 + 27.128 x
F = 229.67 with p = .000, se = 27.27, R² = .97, adjusted R² = .966, and
t = 15.15 (for x) with p = .000. This is a very strong simple regression
model.
Quadratic model (using both x and x²):
ŷ = –22.01 + 3.385 x + 0.9373 x²
F = 578.76 with p = .000, se = 12.3, R² = .995, adjusted R² = .993; for x:
t = 0.75 with p = .483, and for x²: t = 5.33 with p = .002. The quadratic
model is also very strong, with an even higher R² value. However, in this
model only the x² term is a significant predictor.
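As a rough illustration of the comparison above, the following Python sketch fits a simple and a quadratic model and compares them. The data arrays are hypothetical placeholders, since the Problem 14.1 data are not reproduced in this guide.

import numpy as np
import statsmodels.api as sm

# Hypothetical data standing in for the Problem 14.1 values
x = np.array([2, 4, 6, 8, 10, 12, 14, 16], dtype=float)
y = np.array([5, 60, 160, 300, 480, 700, 970, 1280], dtype=float)

# Simple model: y = b0 + b1*x
simple = sm.OLS(y, sm.add_constant(x)).fit()

# Quadratic model: y = b0 + b1*x + b2*x^2
quad = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()

print(simple.rsquared_adj, quad.rsquared_adj)  # compare adjusted R-squared
print(quad.tvalues, quad.pvalues)              # which terms are significant?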
14.3
Simple regression model:
ŷ = –1,456.6 + 71.017 x
R² = .928 and adjusted R² = .910; t = 7.17 (for x) with p = .002.
Quadratic regression model:
ŷ = 1,012 – 14.06 x + 0.6115 x²
R² = .947 but adjusted R² = .911. The t statistic for the x term is t = –0.17
with p = .876; the t statistic for the x² term is t = 1.03 with p = .377.
Neither predictor is significant in the quadratic model. Also, the adjusted
R² for this model is virtually identical to that of the simple regression
model. The quadratic model adds virtually no predictability that the simple
regression model does not already have. The scatter plot of the data follows:
[Scatter plot: Ad Exp (vertical axis, 1,000 to 7,000) versus Eq & Sup Exp (horizontal axis, 30 to 110).]
14.5
The regression model is:
ŷ = –28.61 – 2.68 x1 + 18.25 x2 – 0.2135 x1² – 1.533 x2² + 1.226 x1x2
F = 63.43 with p = .000, significant at α = .001;
se = 4.669, R² = .958, and adjusted R² = .943.
None of the t statistics for this model are significant: t(x1) = 0.25 with
p = .805, t(x2) = 0.91 with p = .378, t(x1²) = –0.33 with p = .745,
t(x2²) = –0.68 with p = .506, and t(x1x2) = 0.52 with p = .613. This model has
a high R², yet none of the predictors are individually significant.
The same thing occurs when the interaction term is not in the model:
none of the t statistics are significant, and R² remains high at .957,
indicating that the loss of the interaction term is inconsequential.
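For readers who want to reproduce this kind of fit, here is a minimal Python sketch of a full second-order model with an interaction term. The simulated data are placeholders, not the Problem 14.5 data; the point is only the construction of the x1², x2², and x1x2 columns.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1 = rng.uniform(0, 10, 30)                     # hypothetical predictor 1
x2 = rng.uniform(0, 10, 30)                     # hypothetical predictor 2
y = 5 + 2*x1 + 3*x2 + 0.5*x1*x2 + rng.normal(scale=4, size=30)

# Columns: x1, x2, x1^2, x2^2, and the cross-product (interaction) term
X = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2, x1*x2]))
fit = sm.OLS(y, X).fit()
print(fit.rsquared, fit.tvalues)   # a high R-squared can coexist with weak individual t's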
14.7
The regression equation is:
ŷ = 13.619 - 0.01201 x1 + 2.998 x2
The overall F = 8.43 is significant at α = .01 (p = .009);
se = 1.245, R² = .652, adjusted R² = .575.
The t statistic for the x1 variable is only t = –0.14 with p = .893. However,
the t statistic for the dummy variable, x2, is t = 3.88 with p = .004. The
indicator variable is the significant predictor in this regression model,
which has some predictability (adjusted R² = .575).
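The role the dummy variable plays here can be seen in a small Python sketch: the 0/1 column simply shifts the intercept for one category. The numbers below are illustrative placeholders, not the Problem 14.7 data.

import numpy as np
import statsmodels.api as sm

# Hypothetical data: x1 quantitative, x2 a 0/1 indicator (dummy)
x1 = np.array([28, 43, 45, 62, 60, 58, 20, 35, 51, 39], dtype=float)
x2 = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0], dtype=float)
y  = np.array([14, 19, 20, 15, 21, 14, 13, 18, 19, 16], dtype=float)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.params)    # b0; b1 for x1; b2 = intercept shift when the dummy is 1
print(model.tvalues, model.pvalues)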
14.9
This regression model has relatively strong predictability, as indicated by
R² = .795. Of the three predictor variables, only x1 and x2 have significant
t statistics (using α = .05). x3 (a non-indicator variable) is not a significant
predictor. x1, the indicator variable, plays a significant role in this model
along with x2.
14.11 The regression equation is:
Price = 3.4394 - 0.0195 Hours + 9.113 ProbSeat + 10.528 Downtown
The overall F = 6.58 with p = .0099, which is significant at α = .01. se =
3.94, R² = .664, and adjusted R² = .563. The difference between R² and
adjusted R² indicates that there are some non-significant predictors in the
model. The t statistics, t = –0.13 with p = .901 and t = 1.34 with p = .209,
for Hours and Probability of Being Seated are non-significant at α = .05.
The only significant predictor is the dummy variable, Downtown location
or not, which has a t statistic of 3.95 with p = .003, significant at
α = .01. The positive coefficient on this variable indicates that a downtown
location adds to the price of a meal.
14.13 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable, we select the model with x2, for which
t = –7.35 and R² = .794. The model is ŷ = 36.15 – 0.146 x2.
Step 2:
x3 enters the model and x2 remains in the model:
t for x2 is –4.60, t for x3 is 2.93, and R² = .876.
The model is ŷ = 26.40 – 0.101 x2 + 0.116 x3.
Step 3:
The regression model containing x1 in
addition to x2 and x3 is explored. Adding x1 does not
produce a significant result, so no new variable is added
to the model produced in Step 2. Note that at every step
of the procedure, x1 appears to be non-significant.
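The search logic used in these stepwise solutions can be sketched in code. The Python fragment below implements plain forward selection (a simplified cousin of stepwise regression that never removes an entered variable) on simulated placeholder data; the .05 entry threshold is an assumption, not the textbook's setting.

import numpy as np
import statsmodels.api as sm

# Simulated placeholder data: y depends on columns 1 and 2, not column 0
rng = np.random.default_rng(1)
n = 25
X = rng.normal(size=(n, 3))
y = 36 - 0.15 * X[:, 1] + 0.10 * X[:, 2] + rng.normal(scale=0.05, size=n)

selected, remaining = [], [0, 1, 2]
while remaining:
    best, best_t, best_p = None, 0.0, 1.0
    for j in remaining:
        fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
        if abs(fit.tvalues[-1]) > best_t:       # candidate j is the last column
            best, best_t, best_p = j, abs(fit.tvalues[-1]), fit.pvalues[-1]
    if best is None or best_p > 0.05:           # stop: no significant addition
        break
    selected.append(best)
    remaining.remove(best)

print("entered, in order:", selected)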
14.15 The output shows that the final model has four predictor variables: x3, x1,
x2, and x6. The variables x4 and x5 did not enter the stepwise analysis.
The procedure took four steps. The final model is:
ŷ = 5.96 – 5.00 x3 + 3.22 x1 + 1.78 x2 + 1.56 x6
The R² for this model is .5929, and se is 3.36. The t ratios are:
x3: t = 3.07; x1: t = 2.05; x2: t = 2.02; and x6: t = 1.98.
14.17 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable, we select the model for Durability
with t = 3.32. For this model, R² = .379 and se = 15.48.
The regression equation is:
Amount Spent = 17.093 + 7.135 Durability
Step 2:
Regression models containing Value
or Service in addition to Durability are explored. The t value
of the regression coefficient is significant for neither
Value nor Service, so no new variable is added to the
model produced in Step 1.
14.19 The correlation matrix is:

             y        x1       x2       x3
    y       1       –.653    –.891     .821
    x1     –.653     1        .650    –.615
    x2     –.891     .650     1       –.688
    x3      .821    –.615    –.688     1
There appears to be some correlation between all pairs of the predictor
variables, x1, x2, and x3. All pairwise correlations between independent
variables are in the .600 to .700 range.
14.21 The predictor intercorrelations are:

                 Value   Durability   Service
    Value         1         .559        .533
    Durability   .559        1          .364
    Service      .533       .364         1
An examination of the predictor intercorrelations reveals that Service and
Durability have very little correlation, but Value and Durability have a
correlation of .559 and Value and Service a correlation of .533. These
correlations might suggest multicollinearity.
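A common follow-up to an intercorrelation table like this is to compute variance inflation factors. The Python sketch below does so on simulated data built to roughly mimic the correlations above; the data and the rule-of-thumb VIF cutoff of 10 are assumptions for illustration, not the textbook data.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated placeholders built to roughly mimic the table above
rng = np.random.default_rng(0)
value = rng.normal(size=30)
durability = 0.56 * value + rng.normal(scale=0.8, size=30)   # r with Value ~ .56
service = 0.53 * value + rng.normal(scale=0.85, size=30)     # r with Value ~ .53

X = sm.add_constant(np.column_stack([value, durability, service]))
for i, name in enumerate(["Value", "Durability", "Service"], start=1):
    print(name, variance_inflation_factor(X, i))   # VIF well above 10 is a warning sign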
14.23 The log of the odds ratio, or logit, equation is:
ln(S) = –0.932546 – 0.0000323 × Payroll Expenditures
The G statistic is 11.175, which with one degree of freedom has a p-value
of 0.001. Thus, there is overall significance in this model.
The predictor, Payroll Expenditures, is significant at α = .01 because the
associated p-value of 0.008 is less than α = .01.
If the payroll expenditures are $80,000, then
ln(S) = –0.932546 – 0.0000323 (80,000) = –3.516546
S = e^(–3.516546) = 0.0297
From this, the probability that the hospital with the $80,000 payroll
expenditure is a psychiatric hospital can be determined by
p = S / (S + 1) = 0.0297 / (0.0297 + 1) = 0.0288, or 2.88%.
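The arithmetic above is easy to verify in a few lines of Python; this reproduces the logit-to-probability steps exactly as shown.

import math

b0, b1 = -0.932546, -0.0000323   # fitted logit coefficients from above
payroll = 80000

log_odds = b0 + b1 * payroll     # ln(S) = -3.516546
odds = math.exp(log_odds)        # S = 0.0297
prob = odds / (odds + 1)         # about 0.0288, i.e. 2.88%
print(log_odds, odds, prob)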
14.25 The log of the odds ratio, or logit, equation is:
ln(S) = –3.07942 + 0.0544532 × Number of Production Workers
The G statistic is 97.492, which with one degree of freedom has a p-value
of 0.000. Thus, there is overall significance in this model.
The p-value associated with the predictor variable, Number of Production
Workers, is 0.000. This indicates that Number of Production Workers is a
significant predictor in the model at α = .001.
If the number of production workers is 30, then
ln(S) = –3.07942 + 0.0544532 (30) = –1.445824
S = e^(–1.445824) = 0.23555
From this, the probability that a company with 30 production
workers has a large value of industrial shipments can be determined by
p = S / (S + 1) = 0.23555 / (0.23555 + 1) = 0.1906, or 19.06%.
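If one wanted to refit a model of this form from raw data, statsmodels provides a logistic (logit) routine. The sketch below simulates placeholder data from the fitted equation above and recovers coefficients close to it; the simulated sample is an assumption, not the textbook data.

import numpy as np
import statsmodels.api as sm

# Simulate placeholder data from the fitted equation above
rng = np.random.default_rng(3)
workers = rng.integers(5, 120, size=200).astype(float)
p_true = 1 / (1 + np.exp(-(-3.07942 + 0.0544532 * workers)))
large = rng.binomial(1, p_true)           # 1 = large value of shipments

fit = sm.Logit(large, sm.add_constant(workers)).fit(disp=0)
print(fit.params)                         # should land near (-3.08, 0.054)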
14.27 The regression model is:
ŷ = 564.2 - 27.99 x1 - 6.155 x2 - 15.90 x3
F = 11.32 with p = .003, se = 42.88, R² = .809, adjusted R² = .738. Thus,
overall there is statistical significance at α = .01. For x1, t = –0.92 with p =
.384; for x2, t = –4.34 with p = .002; for x3, t = –0.71 with p = .497. Thus,
only one of the three predictors, x2, is a significant predictor in this model.
This model has very good predictability (R² = .809). The gap between R²
and adjusted R² underscores the fact that there are two non-significant
predictors in the model. x1 is a non-significant indicator variable.
14.29 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable (x1, Log x1), we select the
model for Log x1 because it has the larger absolute value
of t = 17.36 (p-value of 0.000). For this model, R² =
.9617. The model appears in the form:
ŷ = - 13.20 + 11.64 Log x1.
Step 2:
The two-predictor regression model containing x1
in addition to Log x1 is explored. At this step, the t ratio
for x1 is 0.90 with a p-value of 0.386, indicating that the
predictor x1 is non-significant. No new variable is added to
the model produced in Step 1.
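A minimal Python sketch of the winning Step 1 fit, regressing y on Log x1 (one rung of Tukey's ladder of transformations), follows. The data are hypothetical placeholders, and base-10 logs are assumed here, since the base used in the solution is not stated.

import numpy as np
import statsmodels.api as sm

# Hypothetical data in which y is roughly linear in log10(x1)
x1 = np.array([10, 30, 80, 200, 500, 1200, 3000, 8000], dtype=float)
y  = np.array([-1.5, 4.0, 9.0, 13.5, 18.0, 22.5, 27.0, 32.0])

fit = sm.OLS(y, sm.add_constant(np.log10(x1))).fit()
print(fit.params, fit.rsquared)    # intercept and slope on log10(x1)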
14.31 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable (Copper, Silver, Aluminum), we
select the model with Silver because it has the largest
absolute t statistic: tSilver = 3.32 (p-value of 0.007).
The predictor Silver is significant at α = .01. For this
model, R² = 0.5244. The regression equation is
Gold = 233.4 + 17.74 Silver.
Step 2:
The two-predictor regression models containing
Copper (or Aluminum) in addition to Silver are explored. At
this step, analysis of the t statistics shows the best model:
Gold = –50.07 + 18.86 Silver + 3.587 Aluminum.
The R² at this step is .8204, the t ratio for Silver is 5.43
with p = .0004, and the t ratio for Aluminum is 3.85 with
p = .004.
Step 3:
A search is made to determine whether the variable Copper,
in conjunction with Silver and Aluminum, produces a
significant absolute t value in the model. The
model does not produce a significant result, so no new
variable is added to the model produced in Step 2.
14.33 Let Beef = x1, Chicken = x2, Eggs = x3, Bread = x4, Coffee = x5, and Price
Index = y.
Stepwise Regression:
Step 1:
Using graphs and Tukey’s ladder of transformations, we
develop a simple regression model for each independent
variable (x1, Log x2, x3, x4, x5). We select the model for
x1 because it has the largest absolute value of
t = 13.67. For this model, R² = .8696. The model appears
in the form ŷ = 93.62 + 0.2080 x1.
Step 2:
The two-predictor regression models containing
Log x2 (or x3, x4, x5) in addition to x1 are explored. At this
step, analysis of the t statistics shows the best model:
ŷ = 86.96 + 0.1427 x1 + 0.08561 x4.
The R² at this step is .9033, the t ratio for x1 is 5.67
with p = .000, and the t ratio for x4 is 3.06 with
p = .005 (significant at α = .01).
Step 3:
A search is made to determine which of the remaining
independent variables in conjunction with x1 and x4
produces the largest significant absolute t value in the
model. None of the models produce significant results. No
new variables are added to the model produced in Step 2.
14.35 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable (Familiarity, Satisfaction, Proximity),
we select the model with Familiarity because it has the
largest absolute t statistic: tFamiliarity = 6.71 (p-value
of 0.000). The predictor Familiarity is significant at α = .001.
For this model, R² = 0.6167. The regression equation
is
Number of Visits = 0.05488 + 1.0915 Familiarity.
Step 2:
A search is made to determine whether the variable
Satisfaction or Proximity in conjunction with Familiarity
produces a significant absolute t value in the model. None
of the models produce significant results. No new
variables are added to the model produced in Step 1.
14.37 The output shows that the stepwise regression procedure stopped at
Step 3. At Step 1, the model with x3 is selected: R² = .8124, and the t
statistic for x3 is t = 6.90. The regression equation is ŷ = 74.81 + 0.099 x3.
At Step 2, x2 is entered into the model along with x3. The regression
equation is ŷ = 82.18 + 0.067 x3 – 2.26 x2. The t statistics are
t = 3.65 for x3 and t = –2.32 for x2. The R² for this model is .8782.
At Step 3, x1 is entered into the model along with x3 and x2. The procedure
stops here with a final model of ŷ = 87.89 + 0.071 x3 – 2.71 x2 – 0.256 x1.
The t statistics are t = 5.22 for x3, t = –3.71 for x2, and t = –3.08 for x1.
The R² for this model is .9407, indicating very strong predictability.
14.39 The log of the odds ratio, or logit, equation is:
ln(S) = –3.94828 + 1.36988 × Number of kilometres
The G statistic is 100.537 with a p-value of 0.000 (one degree of
freedom). Thus, the model is significant overall. The p-value
associated with the predictor variable, Number of kilometres, is 0.000.
This indicates that Number of kilometres is a significant predictor in the
model at α = .001.
If a shopper drives 5 kilometres to get to the store, then
ln(S) = –3.94828 + 1.36988 (5) = 2.90112
S = e^(2.90112) = 18.1945
From this, the probability that the person would purchase something
can be determined by
p = S / (S + 1) = 18.1945 / (18.1945 + 1) = 0.948, or about 95%.
This indicates that there is a very high probability that a person who
drives 5 kilometres would purchase something.
For 4 kilometres, the probability drops to .822; for 3 kilometres, to
.540 (almost a coin toss); for 2 kilometres, to .230; and for 1 kilometre,
to .071.
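The distance-by-distance probabilities quoted above can be reproduced directly from the fitted logit, as in this short Python sketch.

import math

def purchase_prob(km):
    log_odds = -3.94828 + 1.36988 * km
    odds = math.exp(log_odds)
    return odds / (odds + 1)

for km in range(1, 6):
    print(km, round(purchase_prob(km), 3))
# prints roughly .071, .230, .540, .822, .948, matching the values above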
Legal Notice
Copyright
Copyright © 2014 by John Wiley & Sons Canada, Ltd. or related companies. All
rights reserved.
The data contained in these files are protected by copyright. This manual is
furnished under licence and may be used only in accordance with the terms of
such licence.
The material provided herein may not be downloaded, reproduced, stored in a
retrieval system, modified, made available on a network, used to create
derivative works, or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording, scanning, or otherwise without the prior
written permission of John Wiley & Sons Canada, Ltd.
(MMXIII xii FI)