Chapter 4
Regression Models
Prepared by Lee Revere and John Large
To accompany Quantitative Analysis for Management, 9e, by Render/Stair/Hanna
© 2006 by Prentice Hall, Inc., Upper Saddle River, NJ 07458
Learning Objectives
Students will be able to:
1. Identify variables and use them in a regression model.
2. Develop simple linear regression equations from sample data and interpret the slope and intercept.
3. Compute the coefficient of determination and the coefficient of correlation and interpret their meanings.
4. Interpret the F-test in a linear regression model.
5. List the assumptions used in regression and use residual plots to identify problems.
Learning Objectives (continued)
Students will be able to:
6. Develop a multiple regression model and use it to predict.
7. Use dummy variables to model categorical data.
8. Determine which variables should be included in a multiple regression model.
9. Transform a nonlinear function into a linear one for use in regression.
10. Understand and avoid common mistakes made in the use of regression analysis.
Chapter Outline
4.1 Introduction
4.2 Scatter Diagrams
4.3 Simple Linear Regression
4.4 Measuring the Fit of a Regression Model
4.5 Using Computer Software for Regression
4.6 Assumptions of the Regression Model
Chapter Outline (continued)
4.7 Testing the Model for Significance
4.8 Multiple Regression Analysis
4.9 Binary or Dummy Variables
4.10 Model Building
4.11 Nonlinear Regression
4.12 Cautions and Pitfalls in Regression Analysis
Introduction
Regression analysis is a very valuable tool for today’s manager. Regression is used to:
• understand the relationship between variables.
• predict the value of one variable based on another variable.
Cost estimation models are a good example.
Introduction (continued)
A regression model consists of a dependent, or response, variable and one or more independent, or predictor, variables.
Prediction relationship: Dependent Variable = Independent Variable(s)
Scatter Diagram
A scatter diagram is used to graphically investigate the relationship between the dependent and independent variables.
• Plot the dependent variable on the Y axis.
• Plot the independent variable on the X axis.
Triple A Construction Example
Triple A Construction Company renovates old homes in Albany. The company has found that its dollar volume of renovation work depends on the Albany area payroll.
Triple A Sales ($100,000s) | Local Payroll ($100,000,000s)
6 | 3
8 | 4
9 | 6
5 | 4
4.5 | 2
9.5 | 5
Triple A Construction Example (continued)
[Figure: Scatter diagram (Payroll Line Fit Plot) of Sales ($100,000s), the dependent variable, on the Y axis versus Payroll ($100,000,000s), the independent variable, on the X axis.]
Simple Linear Regression
Regression models are used to test whether a relationship exists between variables and, if so, to use one variable to predict another. However, there is some random error that cannot be predicted.
$Y = \beta_0 + \beta_1 X + \varepsilon$
where
Y = dependent variable (response)
X = independent variable (predictor / explanatory)
$\beta_0$ = intercept (value of Y when X = 0)
$\beta_1$ = slope of the regression line
$\varepsilon$ = random error
Simple Linear Regression (continued)
Sample data are used to estimate the true values of the intercept and slope:
$\hat{Y} = b_0 + b_1 X$
where $\hat{Y}$ = predicted value of Y, and $b_0$ and $b_1$ are the sample estimates of $\beta_0$ and $\beta_1$.
The difference between the actual value of Y and the predicted value (using sample data) is known as the error:
Error = (actual value) - (predicted value)
$e = Y - \hat{Y}$
Least Squares Regression
Least squares regression minimizes the sum of the squared errors.
[Figure: Payroll Line Fit Plot of Sales ($100,000s) versus Payroll ($100,000,000s) for the Triple A data, with the least squares line.]
Least Squares Regression Equations
The least squares regression equations are:
$\hat{Y} = b_0 + b_1 X$
$\bar{X} = \frac{\sum X}{n}$
$\bar{Y} = \frac{\sum Y}{n}$
$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$
Calculating the Regression Line: Triple A Construction
Sales (Y) | Payroll (X) | $(X - \bar{X})^2$ | $(X - \bar{X})(Y - \bar{Y})$
6 | 3 | 1 | 1
8 | 4 | 0 | 0
9 | 6 | 4 | 4
5 | 4 | 0 | 0
4.5 | 2 | 4 | 5
9.5 | 5 | 1 | 2.5
Sums: 42 | 24 | 10 | 12.5
$\bar{Y} = 42/6 = 7$
$\bar{X} = 24/6 = 4$
Calculating the Regression Line (continued)
Calculating the required parameters:
$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{12.5}{10} = 1.25$
$b_0 = \bar{Y} - b_1 \bar{X} = 7 - (1.25)(4) = 2$
So,
$\hat{Y} = 2 + 1.25X$
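The hand calculation above can be reproduced in a few lines of Python. This is a minimal sketch that uses only the six Triple A observations from the table; `least_squares` is a hypothetical helper name of our own, not something from the text or a library.

```python
# Triple A Construction data (from the table above).
# Sales (Y) in $100,000s; local payroll (X) in $100,000,000s.
sales = [6, 8, 9, 5, 4.5, 9.5]
payroll = [3, 4, 6, 4, 2, 5]


def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((X - X_bar)(Y - Y_bar)) / sum((X - X_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # b0 = Y_bar - b1 * X_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(payroll, sales)
print(f"Y-hat = {b0:g} + {b1:g} X")  # Y-hat = 2 + 1.25 X

# Predicted sales for a payroll of $600 million (X = 6)
print(b0 + b1 * 6)  # 9.5
```

The last line reuses the fitted equation for X = 6, the same prediction that is worked by hand in the following slide.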
Using the Regression Line
If the payroll estimate for next year is $600 million (X = 6, since payroll is measured in $100,000,000s), what is the predicted value of Triple A’s sales?
$\hat{Y} = 2 + 1.25X$
Sales = 2 + 1.25(payroll)
So,
Next year's sales = 2 + 1.25(6) = 9.5, or $950,000 (sales are measured in $100,000s).
Measuring the Fit of the Regression Model
To understand how well the model predicts the response variable, we evaluate the following:
• The variability in the Y variable
  SST: total variability about the mean
  SSE: variability about the regression line
  SSR: variability that is explained
• Coefficient of determination
  r²: proportion of explained variation
• Correlation coefficient
  r: strength of the relationship between the Y and X variables
Measuring the Fit of the Regression Model
Errors (deviations) may be positive or negative. Summing the errors would be misleading because positive and negative errors cancel out, so we square the terms before summing.
• Sum of Squares Total (SST) measures the total variability in Y:
  $SST = \sum (Y - \bar{Y})^2$
• Sum of Squared Errors (SSE) is no larger than SST because the regression line reduces the variability:
  $SSE = \sum e^2 = \sum (Y - \hat{Y})^2$
• Sum of Squares due to Regression (SSR) indicates how much of the total variability is explained by the regression model:
  $SSR = \sum (\hat{Y} - \bar{Y})^2$
Measuring the Fit of the Regression Model (continued)
For Triple A Construction:
$SST = \sum (Y - \bar{Y})^2 = 22.5$
$SSE = \sum e^2 = \sum (Y - \hat{Y})^2 = 6.875$
$SSR = \sum (\hat{Y} - \bar{Y})^2 = 15.625$
Note: $SST = SSR + SSE$, where SSR is the explained variability and SSE is the unexplained variability.
Coefficient of Determination
The coefficient of determination (r²) is the proportion of the variability in Y that is explained by the regression equation.
$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
For Triple A Construction:
$r^2 = \frac{15.625}{22.5} = 0.6944$
About 69% of the variability in sales is explained by the regression based on payroll.
Note: $0 \le r^2 \le 1$
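As a cross-check on SST, SSE, SSR, and r², the sketch below recomputes them from the Triple A data. It assumes the fitted coefficients b0 = 2 and b1 = 1.25 from the worked example; it is an illustration, not output from any particular software package.

```python
# Decompose the variability in Y for Triple A using the fitted line Y-hat = 2 + 1.25X.
sales = [6, 8, 9, 5, 4.5, 9.5]
payroll = [3, 4, 6, 4, 2, 5]
b0, b1 = 2.0, 1.25  # least squares estimates from the worked example

y_bar = sum(sales) / len(sales)
y_hat = [b0 + b1 * x for x in payroll]

sst = sum((y - y_bar) ** 2 for y in sales)               # total variability
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))  # unexplained variability
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variability

r_squared = ssr / sst
print(sst, sse, ssr)                   # 22.5 6.875 15.625
print(round(r_squared, 4))             # 0.6944
print(abs(sst - (ssr + sse)) < 1e-12)  # True
```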
Correlation Coefficient
The correlation coefficient (r) measures the strength of the linear relationship.
$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$
For Triple A Construction, r = 0.8333.
Note: $-1 \le r \le 1$
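The computational formula for r can be evaluated directly from column sums. A minimal Python sketch for the Triple A data, using plain lists and the standard `math` module:

```python
import math

sales = [6, 8, 9, 5, 4.5, 9.5]  # Y
payroll = [3, 4, 6, 4, 2, 5]    # X
n = len(sales)

sum_x = sum(payroll)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(payroll, sales))
sum_x2 = sum(x * x for x in payroll)
sum_y2 = sum(y * y for y in sales)

# r = [n*sum(XY) - sum(X)*sum(Y)] / sqrt([n*sum(X^2) - sum(X)^2][n*sum(Y^2) - sum(Y)^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 4))  # 0.8333
```

Squaring r gives 0.6944, the same value as r² computed from SSR/SST.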
Computer Software for Regression
In Excel, use Tools/Data Analysis. This is an ‘add-in’ option.
Computer Software for Regression (continued)
After you select the Regression option, a dialog box appears in which you specify:
• the X and Y ranges
• whether labels are included in the range
• the output area
• scatter diagram output
• residual (error) output
Computer Software for Regression (continued)
A scatter diagram will be given, along with:
• Multiple R, which is the correlation coefficient (r) in absolute value
• r², which is high (close to 1) for a strong fit
• the regression coefficients
Assumptions of the Regression Model
We make certain assumptions about the errors in a regression model which allow for statistical testing.
Assumptions:
• Errors are independent.
• Errors are normally distributed.
• Errors have a mean of zero.
• Errors have a constant variance.
Residual Analysis
Residual analyses (plots) will highlight glaring violations of the assumptions.
[Figure: Healthy residual plot (residuals versus X about the 0 line): no violations.]
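To see what a residual plot is built from, the sketch below computes the residuals $e = Y - \hat{Y}$ for Triple A and prints a crude text version of residuals versus X. The text layout and its scale (4 characters per unit) are arbitrary choices for illustration; with only six observations it cannot reveal much, but these are the same residuals a plotting tool would draw.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]
payroll = [3, 4, 6, 4, 2, 5]
b0, b1 = 2.0, 1.25  # fitted line Y-hat = 2 + 1.25X from the worked example

residuals = [y - (b0 + b1 * x) for x, y in zip(payroll, sales)]

# Text residual plot: '|' marks the zero line, '*' marks each residual.
# Scale: 4 characters per unit of residual (an arbitrary display choice).
for x, e in sorted(zip(payroll, residuals)):
    cells = [" "] * 21
    cells[10] = "|"
    cells[10 + round(e * 4)] = "*"
    print(f"X={x}  e={e:+.2f}  {''.join(cells)}")

# Least squares residuals always sum to zero when an intercept is fitted,
# so the useful check is for patterns (curvature, fan shape) across X.
print(sum(residuals))  # 0.0
```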
Residual Analysis: Nonlinear Violation
[Figure: Nonlinear residual plot (residuals versus X about the 0 line): violation.]
Residual Analysis: Nonconstant Error
[Figure: Nonconstant error residual plot (residuals versus X about the 0 line): violation.]
Estimating the Variance
The mean squared error (MSE) is the estimate of the error variance of the regression equation.
$s^2 = MSE = \frac{SSE}{n - k - 1}$
where
n = number of observations in the sample
k = number of independent variables
For Triple A Construction, $s^2 = 1.7188$.
Estimating the Variance (continued)
The standard deviation of the regression is used in many statistical tests about the regression model.
$s = \sqrt{MSE}$
For Triple A Construction, s = 1.31.
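A short sketch of the two formulas for Triple A, assuming the SSE of 6.875 from the earlier fit, n = 6 observations, and k = 1 independent variable:

```python
import math

sse = 6.875  # sum of squared errors for Triple A
n = 6        # number of observations
k = 1        # number of independent variables (payroll)

mse = sse / (n - k - 1)  # s^2, the estimated error variance
s = math.sqrt(mse)       # standard deviation of the regression
print(round(mse, 4), round(s, 2))  # 1.7188 1.31
```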
Testing the Model for Significance: F-test
An F-test is used to statistically test the null hypothesis that there is no linear relationship between the X and Y variables (i.e., $\beta_1 = 0$). If the significance level (p-value) for the F-test is low, we reject $H_0$ and conclude there is a linear relationship.
$F = \frac{MSR}{MSE}$
where $MSR = \frac{SSR}{k}$
Testing the Model for Significance: F-test (continued)
For Triple A Construction:
$MSR = \frac{15.625}{1} = 15.625$
$F = \frac{15.625}{1.7188} = 9.0909$
The significance level for F = 9.0909 is 0.0394. Because this is below 0.05, we reject $H_0$ and conclude that a linear relationship exists between sales and payroll.
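The significance level 0.0394 comes from software output. As a cross-check, here is a standard-library Python sketch. It relies on the standard identity that an F statistic with 1 numerator degree of freedom is the square of a t statistic with the same denominator degrees of freedom, and it approximates the t integral with Simpson's rule. This numerical route is our own choice for illustration, not the textbook's method, and the helper `f_pvalue_one_numerator_df` (a hypothetical name) only covers k = 1.

```python
import math

ssr, sse = 15.625, 6.875
n, k = 6, 1

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def f_pvalue_one_numerator_df(f, df2, steps=10_000):
    """P(F > f) for an F(1, df2) statistic.

    Since F(1, df2) = T^2 with T ~ t(df2), the p-value is the two-tailed t
    probability beyond sqrt(f). The t density is integrated from 0 to sqrt(f)
    with Simpson's rule (steps must be even).
    """
    t = math.sqrt(f)
    h = t / steps
    total = t_pdf(0, df2) + t_pdf(t, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t


p_value = f_pvalue_one_numerator_df(f_stat, n - k - 1)
print(round(f_stat, 4), round(p_value, 4))  # 9.0909 0.0394
```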
Testing the Model for Significance: r²
r² is the best measure of the strength of the prediction relationship between the X and Y variables.
• Values closer to 1 indicate a strong prediction relationship.
• Good regression models have a significant F-test and high r² values.
Testing the Model for Significance: Coefficient Hypotheses
Statistical tests of significance can be performed on the coefficients. The null hypothesis is that the coefficient of X (i.e., the slope of the line) is 0.
• P-values are the observed significance levels and can be used to test the null hypothesis.
• For a simple linear regression, the test of the regression coefficient gives the same information as the F-test.
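For simple linear regression, the t statistic for $H_0: \beta_1 = 0$ is $t = b_1 / s_{b_1}$, where the standard error $s_{b_1} = s / \sqrt{\sum (X - \bar{X})^2}$ is the usual textbook formula (it is not given on these slides). The sketch below, using the Triple A values, checks that $t^2$ equals the F statistic, which is why the two tests carry the same information.

```python
import math

# Triple A quantities from the worked example
b1 = 1.25
sxx = 10.0       # sum of (X - X_bar)^2
mse = 6.875 / 4  # SSE / (n - k - 1)
f_stat = 15.625 / mse

s = math.sqrt(mse)
se_b1 = s / math.sqrt(sxx)  # standard error of the slope
t_stat = b1 / se_b1         # test statistic for H0: beta1 = 0

print(round(t_stat, 4))                    # 3.0151
print(math.isclose(t_stat ** 2, f_stat))   # True
```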
ANOVA Tables
When developing a regression model, most statistical software computes an ANOVA table. The general form of the ANOVA table is helpful for understanding how the error terms are interrelated.
Source | DF | SS | MS | F | Significance
Regression | k | SSR | MSR = SSR/k | MSR/MSE | P-value
Residual | n - k - 1 | SSE | MSE = SSE/(n - k - 1) | |
Total | n - 1 | SST | | |
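Multiple Regression
Multiple regression models are similar to simple linear regression models except that they include more than one X variable.
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
where $X_1, \dots, X_k$ are the independent variables and $b_1, \dots, b_k$ are their slope coefficients.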
Multiple Regression: Wilson Realty Example
Wilson Realty wants to develop a model to determine the suggested listing price for a house based on size and age.
Price | Sq. Feet | Age | Condition
35000 | 1926 | 30 | Good
47000 | 2069 | 40 | Excellent
49900 | 1720 | 30 | Excellent
55000 | 1396 | 15 | Good
58900 | 1706 | 32 | Mint
60000 | 1847 | 38 | Mint
67000 | 1950 | 27 | Mint
70000 | 2323 | 30 | Excellent
78500 | 2285 | 26 | Mint
79000 | 3752 | 35 | Good
87500 | 2300 | 18 | Good
93000 | 2525 | 17 | Good
95000 | 3800 | 40 | Excellent
97000 | 1740 | 12 | Mint
Wilson Realty Example (continued)
$\hat{Y} = 60815.45 + 21.91(\text{size}) - 1449.34(\text{age})$
• 67% of the variation in sales price is explained by size and age.
• $H_0$: no linear relationship is rejected.
• $H_0: \beta_1 = 0$ is rejected.
• $H_0: \beta_2 = 0$ is rejected.
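The Wilson Realty coefficients come from software output. The sketch below refits the model from the 14 listed houses by solving the normal equations $(X^T X)b = X^T y$ with a small Gaussian-elimination routine, so that it runs with the Python standard library alone. The helper `solve` is our own illustration; a production analysis would normally rely on a numerical library instead.

```python
# Wilson Realty data: (price, square feet, age)
data = [
    (35000, 1926, 30), (47000, 2069, 40), (49900, 1720, 30), (55000, 1396, 15),
    (58900, 1706, 32), (60000, 1847, 38), (67000, 1950, 27), (70000, 2323, 30),
    (78500, 2285, 26), (79000, 3752, 35), (87500, 2300, 18), (93000, 2525, 17),
    (95000, 3800, 40), (97000, 1740, 12),
]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Design matrix rows [1, size, age]; normal equations (X^T X) b = X^T y
rows = [[1.0, size, age] for _, size, age in data]
y = [price for price, _, _ in data]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

y_bar = sum(y) / len(y)
fitted = [b0 + b1 * size + b2 * age for _, size, age in rows]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

# Compare with the slide: 60815.45, 21.91, -1449.34
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")
print(round(r_squared, 2))  # 0.67
```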
Wilson Realty Example (continued)
Wilson Realty has found a linear relationship between price and both size and age. The coefficient for size indicates that each additional square foot increases the value by $21.91 (holding age constant), while each additional year of age decreases the value by $1,449.34 (holding size constant).
$\hat{Y} = 60815.45 + 21.91(\text{size}) - 1449.34(\text{age})$
For a 1,900 square foot house that is 10 years old, the following prediction can be made:
$\hat{Y} = 60815.45 + 21.91(1900) - 1449.34(10) = \$87{,}951$
Binary Variables
Binary (or dummy) variables are special variables that are created for qualitative data.
• A dummy variable is assigned a value of 1 if a particular condition is met and a value of 0 otherwise.
• The number of dummy variables must equal one less than the number of categories of the qualitative variable.
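A minimal sketch of the one-fewer-dummies rule in Python. The function `dummy_encode` and its `baseline` parameter are hypothetical names for illustration; the baseline category is the one represented by all dummies equal to 0.

```python
def dummy_encode(value, categories, baseline):
    """Return 0/1 dummy variables for `value`, one per non-baseline category.

    With c categories this yields c - 1 dummies; the baseline category is
    represented by every dummy being 0.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == cat else 0 for cat in categories if cat != baseline]


conditions = ["Good", "Excellent", "Mint"]
for c in conditions:
    print(c, dummy_encode(c, conditions, baseline="Good"))
# Good [0, 0]
# Excellent [1, 0]
# Mint [0, 1]
```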
Wilson Realty Example: Binary Variables
Return to Wilson Realty, and let’s evaluate how to use property condition in the regression model. There are three categories: Mint, Excellent, and Good.
$X_3$ = 1 if the house is in excellent condition, 0 otherwise
$X_4$ = 1 if the house is in mint condition, 0 otherwise
Note: If both $X_3$ and $X_4$ = 0, then the house is in good condition.
Wilson Realty: Binary Variables (continued)
What can you say about the new model?
$\hat{Y} = 48329.23 + 28.21(\text{size}) - 1981.41(\text{age}) + 23684.62(\text{if mint}) + 16581.32(\text{if excellent})$
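One way to read the new model is to compare predictions for the same house under each condition. The sketch below plugs the slide's coefficients into a hypothetical `predict_price` helper for a 1,900 square foot, 10-year-old house.

```python
# Coefficients of the Wilson Realty model with condition dummies (from the slide)
INTERCEPT = 48329.23
B_SIZE = 28.21
B_AGE = -1981.41
B_MINT = 23684.62       # X4
B_EXCELLENT = 16581.32  # X3


def predict_price(size, age, condition):
    """Predicted price; 'Good' is the baseline (both dummies 0)."""
    x3 = 1 if condition == "Excellent" else 0
    x4 = 1 if condition == "Mint" else 0
    return INTERCEPT + B_SIZE * size + B_AGE * age + B_EXCELLENT * x3 + B_MINT * x4


for cond in ("Good", "Excellent", "Mint"):
    print(cond, round(predict_price(1900, 10, cond), 2))
# Good 82114.13
# Excellent 98695.45
# Mint 105798.75
```

The gaps between the three predictions are exactly the dummy coefficients: holding size and age fixed, a mint house is predicted to sell for $23,684.62 more than a good one, and an excellent house for $16,581.32 more.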
Model Building
The best model is a statistically significant model with a high r² and few variables.
• As more variables are added to the model, r² usually increases.
• The adjusted r² takes into account the number of independent variables in the model.
Note: When variables are added to the model, the value of r² can never decrease; however, the adjusted r² may decrease.
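The slides describe adjusted r² without giving a formula. A commonly used definition is $1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$; the sketch below assumes that definition and applies it to the Triple A numbers. The third call is a hypothetical scenario, not data from the text: a second variable that leaves SSE unchanged keeps r² the same but lowers the adjusted r².

```python
def r_squared(sse, sst):
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted r^2 = 1 - [SSE / (n - k - 1)] / [SST / (n - 1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Triple A Construction: n = 6 observations, k = 1 independent variable
print(round(r_squared(6.875, 22.5), 4))                 # 0.6944
print(round(adjusted_r_squared(6.875, 22.5, 6, 1), 4))  # 0.6181

# Hypothetical: add a second variable (k = 2) that does not reduce SSE
print(round(adjusted_r_squared(6.875, 22.5, 6, 2), 4))  # 0.4907
```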
Model Building (continued)
Collinearity or multicollinearity exists when an independent variable is correlated with another independent variable.
• Collinearity and multicollinearity create problems in interpreting the coefficients.
• The overall model prediction is still good; however, interpretation of the individual coefficients is questionable.
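A simple first screen for collinearity is the correlation between pairs of independent variables. The sketch below applies it to size and age in the Wilson Realty data. The helper name `correlation` is ours, and the text implies no particular cutoff for "too high"; pairwise correlations can also miss multicollinearity that involves three or more variables.

```python
import math

size = [1926, 2069, 1720, 1396, 1706, 1847, 1950, 2323, 2285, 3752, 2300, 2525, 3800, 1740]
age = [30, 40, 30, 15, 32, 38, 27, 30, 26, 35, 18, 17, 40, 12]


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(round(correlation(size, age), 2))  # 0.41
```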
Nonlinear Regression
Nonlinear relationships may exist between variables, thereby requiring a transformation of one or more variables to achieve linearity.
• Transformations may be used to turn a nonlinear model into a linear model.
Automobile Example: Nonlinear Regression
Engineers at Colonel Motors want to use regression analysis to improve fuel efficiency. They are studying the impact of weight on miles per gallon (MPG).
MPG | Weight (1,000 lbs)
12 | 4.58
13 | 4.66
15 | 4.02
18 | 2.53
19 | 3.09
19 | 3.11
20 | 3.18
23 | 2.68
24 | 2.65
33 | 1.70
36 | 1.95
42 | 1.92
Automobile Example (continued)
Perhaps a nonlinear relationship exists?
[Figure: Scatter plot of MPG versus Weight (1,000 lbs) with the linear regression line and the nonlinear regression line overlaid.]
Automobile Example (continued)
• Linear regression model:
  $MPG = 47.6 - 8.2(\text{weight})$
  F significance = .0003
  r² = .7446
• Nonlinear (transformed variable) regression model, with weight² entered as a second independent variable:
  $MPG = 79.8 - 30.2(\text{weight}) + 3.4(\text{weight})^2$
  F significance = .0002
  r² = .8478
Which model is best? What are the difficulties with interpreting the individual coefficients?
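The transformed-variable model can be reproduced by adding weight² as a second independent variable and fitting both models by least squares. This is a standard-library sketch; `solve` and `fit` are our own illustrative helpers (normal equations plus Gaussian elimination), not a textbook or library API.

```python
# Colonel Motors data: MPG and weight (1,000 lbs)
mpg = [12, 13, 15, 18, 19, 19, 20, 23, 24, 33, 36, 42]
weight = [4.58, 4.66, 4.02, 2.53, 3.09, 3.11, 3.18, 2.68, 2.65, 1.70, 1.95, 1.92]


def solve(a, b):
    """Solve a @ x = b (square system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(columns, y):
    """Least squares fit with an intercept; returns (coefficients, r_squared)."""
    rows = [[1.0, *values] for values in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return coef, 1 - sse / sst


# The transformation: weight squared becomes a second independent variable.
weight_sq = [w ** 2 for w in weight]

lin_coef, lin_r2 = fit([weight], mpg)
quad_coef, quad_r2 = fit([weight, weight_sq], mpg)

print([round(c, 1) for c in lin_coef], round(lin_r2, 4))    # [47.6, -8.2] 0.7446
print([round(c, 1) for c in quad_coef], round(quad_r2, 4))  # [79.8, -30.2, 3.4] 0.8478
```

The quadratic model has the higher r², as it must, since it contains the linear model as a special case. Weight and weight² are themselves highly correlated over this range, which is the multicollinearity problem described under Model Building, so the individual coefficients (-30.2 and 3.4) are hard to interpret on their own.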
Cautions and Pitfalls
• If the assumptions are not met, the statistical tests may not be valid.
• Correlation does not mean causation.
• Multicollinearity causes problems with coefficient interpretation.
Cautions and Pitfalls (continued)
• Prediction beyond the range of X values in the sample can be misleading, including interpretation of the intercept (X = 0).
• A linear regression model may not be the best model, even in the presence of a significant F-test.
• A statistically significant relationship does not guarantee practical value.