Chapter 15
Model Building and Model
Diagnostics
McGraw-Hill/Irwin
Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.
Model Building and Model Diagnostics
15.1 The Quadratic Regression Model
15.2 Interaction
15.3 Logistic Regression (Optional)
15.4 Model Building, and the Effects of
Multicollinearity
15.5 Improving the Regression Model I:
Diagnosing and Using Information
about Outlying and Influential
Observations
15-2
Model Building and Model Diagnostics
15.6 Improving the Regression Model II:
Transforming the Dependent and
Independent Variables
15.7 Improving the Regression Model III:
The Durbin-Watson Test and
Dealing with Autocorrelation
15-3
The Quadratic Regression Model
• One useful form of linear regression is the
quadratic regression model
• Assume we have n observations of x and y
• The quadratic regression model relating y to x
is y = β0 + β1x + β2x² + ε
– β0 + β1x + β2x² is the mean value of the
dependent variable y when the value of the
independent variable is x
– β0, β1, and β2 are unknown regression parameters
relating the mean value of y to x
– ε is an error term that describes the effects on y of
all factors other than x and x²
15-4
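Note: a minimal sketch of fitting such a quadratic model by least squares in Python follows; the x and y arrays are hypothetical placeholders, not data from the text.

import numpy as np
import statsmodels.api as sm

# hypothetical data: n observations of x and the response y
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([29.1, 30.5, 31.4, 31.8, 31.5, 30.6, 29.2])

# design matrix with an intercept column, x, and x squared
X = sm.add_constant(np.column_stack([x, x**2]))

fit = sm.OLS(y, X).fit()        # ordinary least squares
print(fit.params)               # estimates of beta0, beta1, beta2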
The Quadratic Regression Model
Visually
15-5
A Note on the Quadratic Model
• Even though the quadratic model employs
the squared term x2 and, as a result,
assumes a curved relationship between the
mean value of y and x, this model is a linear
regression model
• This is because β0 + β1x + β2x² expresses the
mean value of y as a linear function of the
parameters β0, β1, and β2
• As long as the mean value of y is a linear
function of the regression parameters, we
have a linear regression model
15-6
Example 15.1: The Gasoline Additive
Case #1
15-7
Example 15.1: The Gasoline Additive
Case #2
15-8
Example 15.1: The Gasoline Additive
Case #3 (MINITAB Output)
15-9
Example 15.1: The Gasoline Additive
Case #4
• Oil company wishes to find the value of x that
maximizes predicted mileage
• Using calculus (the fitted quadratic is maximized at
x = –b1/(2b2)), it can be shown that x = 2.44
maximizes mileage
– Therefore, the oil company should blend 2.44
units of additive ST-3000 to each gallon
• The resulting (maximized) mileage is:
– ŷ = 25.7152 + 4.9762(2.44) – 1.01905(2.44)2
– ŷ = 31.7901 miles per gallon
15-10
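A quick numerical check of this optimum in Python, using the fitted coefficients quoted above:

# fitted coefficients from Example 15.1
b0, b1, b2 = 25.7152, 4.9762, -1.01905

x_star = -b1 / (2 * b2)                       # vertex of the fitted parabola
y_star = b0 + b1 * x_star + b2 * x_star**2    # predicted mileage at the optimum
print(round(x_star, 2), round(y_star, 4))     # approximately 2.44 and 31.79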
More Variables
• We have only looked at the simple case
where we have y and x
• That gave us the quadratic regression
model y = β0 + β1x + β2x² + ε
• However, we are not limited to just two
terms
• The following would also be a valid
quadratic regression model
y = β0 + β1x1 + β2x1² + β3x2 + β4x3 + ε
15-11
Interaction
• Multiple regression models often contain
interaction variables
– These are variables that are formed by multiplying
two independent variables together
– For example, x1·x2
• In this case, the x1·x2 variable would appear in the model
along with both x1 and x2
• We use interaction variables when the
relationship between the mean value of y and
one of the independent variables is
dependent on the value of another
independent variable
15-12
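A minimal sketch of including an interaction variable in Python (x1, x2, and y are hypothetical placeholder arrays, not data from the text):

import numpy as np
import statsmodels.api as sm

# hypothetical data on two independent variables and a response
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y  = np.array([5.1, 5.8, 9.6, 10.2, 16.0, 15.1, 22.4, 21.0])

# the interaction variable is the product x1*x2, kept alongside x1 and x2
X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
fit = sm.OLS(y, X).fit()
print(fit.pvalues[-1])   # a small p-value on x1*x2 suggests interaction matters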
Interaction Variable Example
• Consider a company that runs both radio and
television ads for its products
• It is reasonable to assume that raising either
ad amount would raise sales
• However, it is also reasonable to assume that
the effectiveness of television ads depends,
in part, on how often consumers hear the
radio ads
• Thus, an interaction variable would be
appropriate
15-13
Spotting Interactive Terms
• It is fairly easy to construct data plots to
check for interaction when a careful
experiment is carried out
• It is often not possible to construct the
necessary plots with less structured data
• If an interaction is suspected, we can include
the interactive term and see if it is significant
15-14
Example 15.3: The Fresh Detergent Case
• Enterprise Industries produces Fresh liquid
laundry detergent
• Would like to predict demand
• Gathers the following data
– Demand (y)
– Price of Fresh (x1)
– Average industry price (x2)
– Advertising (x3)
– Price difference, i.e., x2 – x1 (x4)
15-15
Example 15.3: The Fresh Detergent Case
#2
15-16
Example 15.3: The Fresh Detergent Case
#3
• In Example 15.2, we developed the model:
ŷ = 17.3244 + 1.3070x4 – 3.6956x3 +
0.3486x3²
• Since there might be interaction
between x4 and x3, wish to add x4x3
term to model
15-17
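A hedged sketch of adding the x4x3 term and checking its significance; the few rows shown are illustrative stand-ins, not the actual Fresh data set:

import numpy as np
import statsmodels.api as sm

# illustrative stand-ins for the Fresh data (not the actual observations)
x3 = np.array([5.50, 6.75, 7.25, 5.50, 7.00, 6.50, 6.75, 5.25])    # advertising
x4 = np.array([-0.05, 0.25, 0.60, 0.00, 0.25, 0.20, 0.15, -0.15])  # price difference
y  = np.array([7.38, 8.51, 9.52, 7.50, 9.33, 8.28, 8.75, 7.10])    # demand

# add the x4*x3 interaction to the model containing x4, x3, and x3 squared
X = sm.add_constant(np.column_stack([x4, x3, x3**2, x4 * x3]))
fit = sm.OLS(y, X).fit()
print(fit.pvalues[-1])   # p-value for the interaction term x4*x3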
Example 15.3: Excel and MegaStat
Output
15-18
Example 15.3: Illustrating the Interaction
15-19
A Note on Interactive Model
Construction
When an interaction term (say x1x2) is
important to a model, it is the usual
practice to leave the corresponding linear
terms (x1 and x2) in the model no matter
what their p-values
15-20
Logistic Regression
• Logistic regression and least squares
regression are very similar
– Both produce prediction equations
• The y variable is what makes logistic
regression different
– With least squares regression, the y variable is a
quantitative variable
– With logistic regression, it is usually a dummy 0/1
variable
• With large data sets, y variable may be the probability of
a set of observations having a dummy variable value of
one
15-21
Regression Drawbacks When Using
Dummy Dependent Variable
• It is possible to have a predicted y value less
than zero or greater than one
• One assumption is constant variance but this
is not possible with a dummy variable
– When 50 percent of the y’s are ones, the variance
is .25, its maximum value
– As the percentage of y’s that are one approaches
one or zero, the variance approaches zero
• Another assumption is the error terms are
normally distributed
– Since y can be only 0 or 1, this is hard to justify
• Logistic regression overcomes these
drawbacks
15-22
Example: Price Reduction Coupons
x     y     p
1      4    .08
2      7    .14
3     20    .40
4     35    .70
5     44    .88
6     46    .92
• The x values are six coupon amounts
– Each were sent to 50 people
• The y values are the number who responded
• The p value is the probability that someone
from that group responded
– This will be the dependent variable
15-23
Logistic Curve
15-24
Logistic Example
e b o  b1x 
px  
 b o  b1 x 
1 e
• The formula above is the logistic curve
• p(x) denotes the probability that a household
will redeem a coupon
• We know from prior slide that b0 = –3.7456
and b1 = 1.1109
– In logistic regression, these are computed using
maximum likelihood estimators
• Prior slide gives estimates for x=1 to x=6
15-25
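A small sketch evaluating the fitted logistic curve at the six coupon amounts, using the b0 and b1 quoted above:

import numpy as np

b0, b1 = -3.7456, 1.1109            # maximum likelihood estimates from the slide
x = np.arange(1, 7)                 # the six coupon amounts

p_hat = np.exp(b0 + b1 * x) / (1 + np.exp(b0 + b1 * x))
print(np.round(p_hat, 3))           # fitted redemption probabilities for x = 1..6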
General Logistic Regression Model
p(x1, x2, …, xk) = e^(b0 + b1x1 + b2x2 + … + bkxk) / (1 + e^(b0 + b1x1 + b2x2 + … + bkxk))
• p(x1, x2, …, xk) is the probability that the event
under consideration will occur when the
values of the independent variables are
x1, x2, …, xk
• The odds of the event occurring are
p(x1,x2,…xk)/(1-p(x1,x2,…xk))
– The probability that the event will occur divided by
the probability it will not occur
15-26
Model Building and the Effects of
Multicollinearity
• Multicollinearity is the condition where the
independent variables are dependent, related
or correlated with each other
• Effects
– Hinders ability to use t statistics and p-values to
assess the relative importance of predictors
– Does not hinder ability to predict the dependent
(or response) variable
• Detection
– Scatter plot matrix
– Correlation matrix
– Variance inflation factors (VIF)
15-27
Variance Inflation Factors (VIF)
• The variance inflation factor for the jth
independent (or predictor) variable xj is
VIFj = 1 / (1 – Rj²)
• where Rj² is the multiple coefficient of
determination for the regression model
relating xj to the other predictors
x1, …, xj–1, xj+1, …, xk:
xj = β0 + β1x1 + … + βj–1xj–1 + βj+1xj+1 + … + βkxk + ε
15-28
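A minimal sketch of computing VIFs with statsmodels; the predictor data here are hypothetical, with x2 deliberately constructed to be related to x1:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# hypothetical predictor data: three independent variables, two of them related
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=50)
x3 = rng.normal(size=50)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]
print(vifs)    # VIFs for x1, x2, x3; values near 1 indicate little multicollinearity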
The Sale Territory Performance Case #1
Sales      Time     MktPoten    Adver      MktShare  Change   Accts    WkLoad   Rating
3669.88     43.10   74065.11     4582.88     2.51      0.34     74.86    15.05    4.9
3473.95    108.13   58117.30     5539.78     5.51      0.15    107.32    19.97    5.1
2295.10     13.82   21118.49     2950.38    10.91     -0.72     96.75    17.34    2.9
4675.56    186.18   68521.27     2243.07     8.27      0.17    195.12    13.40    3.4
6125.96    161.79   57805.11     7747.08     9.15      0.50    180.44    17.64    4.6
2134.94      8.94   37806.94      402.44     5.51      0.15    104.88    16.22    4.5
5031.66    365.04   50935.26     3140.62     8.54      0.55    256.10    18.80    4.6
3367.45    220.32   35602.08     2086.16     7.07     -0.49    126.83    19.86    2.3
6519.45    127.64   46176.77     8846.25    12.54      1.24    203.25    17.42    4.9
4876.37    105.69   42053.24     5673.11     8.85      0.31    119.51    21.41    2.8
2468.27     57.72   36829.71     2761.76     5.38      0.37    116.26    16.32    3.1
2533.31     23.58   33612.67     1991.85     5.43     -0.65    142.28    14.51    4.2
2408.11     13.82   21412.79     1971.52     8.48      0.64     89.43    19.35    4.3
2337.38     13.82   20416.87     1737.38     7.80      1.01     84.55    20.02    4.2
4586.95     86.99   36272.00    10694.20    10.34      0.11    119.51    15.26    5.5
2729.24    165.85   23093.26     8618.61     5.15      0.04     80.49    15.87    3.6
3289.40    116.26   26879.59     7747.89     6.64      0.68    136.58     7.81    3.4
2800.78     42.28   39571.96     4565.81     5.45      0.66     78.86    16.00    4.2
3264.20     52.84   51866.15     6022.70     6.31     -0.10    136.58    17.44    3.6
3453.62    165.04   58749.82     3721.10     6.35     -0.03    138.21    17.98    3.1
1741.45     10.57   23990.82      860.97     7.37     -1.63     75.61    20.99    1.6
2035.75     13.82   25694.86     3571.51     8.39     -0.43    102.44    21.66    3.4
1578.00      8.13   23736.35     2845.50     5.15      0.04     76.42    21.46    2.7
4167.44     58.54   34314.29     5060.11    12.88      0.22    136.58    24.78    2.8
2799.97     21.14   22809.53     3552.00     9.14     -0.74     88.62    24.96    3.9
15-29
The Sale Territory Performance Case
MINITAB Output of Correlation Matrix
15-30
Sale Territory Case MegaStat Output of
the t Statistics, p-Values, and VIF
15-31
The Sale Territory Performance Case #4
• From prior slide:
– Maximum VIFj = 5.639 (for Accts)
– Mean VIFj = 2.667
• Probably not severe multicollinearity
15-32
Variance Inflation Factors Notes
• VIFj = 1 implies xj not related to other
predictors
• A largest VIFj greater than ten suggests
severe multicollinearity
• Average VIF substantially greater than
one suggests severe multicollinearity
15-33
Impact of Multicollinearity
• Multicollinearity can hinder our ability to use
the t statistics and related p-values to assess
the importance of the independent variables
– Even when the multicollinearity itself is not severe
• With multicollinearity, the t statistic and p-value measure the additional importance of
the independent variable xj over the
combined importance of the other
independent variables
15-34
Impact of Multicollinearity
Continued
• When two variables are multicollinear,
they contribute redundant information
• This causes the resulting t statistic to be
smaller than it would be if the variable
were used alone
15-35
Comparing Regression Models on R²,
s, Adjusted R², and Prediction Interval
• Multicollinearity causes problems evaluating
the p-values of the model
• Therefore, we need to evaluate more than the
additional importance of each independent
variable
• We also need to evaluate how the variables
work together
• One way to do this is to determine if the
overall model gives a high R² and adjusted
R², a small s, and short prediction intervals
15-36
Effect of Adding Independent Variable
• Adding any independent variable will
increase R²
• Even adding an unimportant
independent variable
• Thus, R² cannot tell us that adding an
independent variable is undesirable
15-37
A Better Criterion
• A better criterion is the size of the standard
error s
• If s increases when an independent variable
is added, we should not add that variable
• However, decreasing s alone is not enough
– Adding a variable reduces degrees of freedom,
which increases the t point used in the prediction
interval and tends to make the interval wider
– Therefore, an independent variable should only be
included if it reduces s enough to offset the larger
t point and thus shortens the prediction interval for y
15-38
C Statistic
• Another quantity for comparing regression
models is called the C statistic
– Also known as CP statistic
• First, calculate mean square error for the
model containing all p potential independent
variables
– Denoted s2p
• Next, calculate SSE for a reduced model with
k independent variables
• Calculate C as C = SSE / s²p – [n – 2(k + 1)]
15-39
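A brief sketch of the C statistic calculation; the function and demo data below are hypothetical, following the formula above:

import numpy as np
import statsmodels.api as sm

def c_statistic(y, X_full, X_reduced):
    # C = SSE_reduced / s_p^2 - [n - 2(k + 1)], per the formula above
    n = len(y)
    k = X_reduced.shape[1]                                       # predictors in the reduced model
    s2_p = sm.OLS(y, sm.add_constant(X_full)).fit().mse_resid    # full-model mean square error
    sse  = sm.OLS(y, sm.add_constant(X_reduced)).fit().ssr       # reduced-model SSE
    return sse / s2_p - (n - 2 * (k + 1))

# hypothetical demo: three candidate predictors, reduced model keeps the first two
rng = np.random.default_rng(1)
X_full = rng.normal(size=(30, 3))
y = 2 + X_full[:, 0] - 0.5 * X_full[:, 1] + rng.normal(size=30)
print(c_statistic(y, X_full, X_full[:, :2]))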
C Statistic Continued
• We want the value of C to be small
• Adding unimportant independent variables
will raise the value of C
• While we want C to be small, we also wish to
find a model for which C roughly equals k+1
– A model with C substantially greater than k+1 has
substantial bias and is undesirable
– If a model has a small value of C and C for this
model is less than k+1, then it is not biased and
the model should be considered desirable
15-40
Stepwise Regression and Backward
Elimination
• Testing various combinations of variables can
be tedious
• In many situations, it is useful to have an
iterative model selection procedure
– At each step, a single independent variable is
added to or deleted from the model
– The model is then reevaluated
– This continues until a final model is found
• There are two such approaches
– Stepwise regression
– Backward elimination
15-41
Stepwise Regression #1
• Assume there are p potential independent
variables
– Further, assume that p is large
• Stepwise regression uses t statistics to
determine the significance of the independent
variables in various models
• Stepwise regression needs two alpha values
– αentry, the probability of a type I error related to
entering an independent variable into the model
– αstay, the probability of a type I error related to
retaining an independent variable that was
previously entered into the model
15-42
Stepwise Regression #2
• Step 1: The stepwise procedure considers
the p possible one-independent variable
regression models
– Finds the variable with the largest absolute t
statistic
• Denoted as x[1]
– If x[1] is not significant at the αentry level, the
process terminates by concluding none of the
independent variables are significant
– Otherwise, x[1] is retained for use in Step 2
15-43
Stepwise Regression #3
• Step 2: The stepwise procedure considers
the p–1 possible two-independent-variable
models of the form y = β0 + β1x[1] + β2xj + ε
– For each new variable, it tests
H0: β2 = 0
Ha: β2 ≠ 0
• Pick the variable giving the largest t statistic
• If the resulting variable is significant, x[1] is checked
against αstay to see if it should stay in the model
• This is needed due to multicollinearity
15-44
Stepwise Regression #4
• Further steps: This adding and checking
for removal continues until all non-selected
independent variables are insignificant and will not enter the model
– Will also terminate when the variable to be
added to the model is the one just removed
from it
15-45
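A rough sketch of forward stepwise selection driven by p-values; this is a simplified stand-in for the textbook procedure, and the data, alpha_entry, and alpha_stay values are hypothetical:

import numpy as np
import statsmodels.api as sm

def stepwise(y, X, alpha_entry=0.10, alpha_stay=0.10):
    # simplified stepwise selection: add the most significant candidate,
    # then drop any previously entered variable that no longer meets alpha_stay
    selected, remaining, steps = [], list(range(X.shape[1])), 0
    while remaining and steps < 10 * X.shape[1]:   # step cap guards against cycling
        steps += 1
        # p-value each candidate would receive if added to the current model
        pvals = {j: sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit().pvalues[-1]
                 for j in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_entry:
            break                                  # no remaining candidate is significant
        selected.append(best)
        remaining.remove(best)
        # re-check the variables already in the model against alpha_stay
        fit = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
        for pos in range(len(selected) - 1, -1, -1):
            if fit.pvalues[pos + 1] >= alpha_stay:        # index +1 skips the intercept
                remaining.append(selected.pop(pos))
                fit = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
    return selected

# hypothetical demo data
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = 1 + 2 * X[:, 0] - X[:, 2] + rng.normal(size=40)
print(stepwise(y, X))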
Backward Elimination
• With backward elimination, we begin with a
full regression model containing all p potential
independent variables
• We then find the one having the smallest t
statistic
– If this variable is significant, we stop
– If this variable is insignificant, it is dropped and the
regression is rerun with p-1 potential independent
variables
• The process continues to remove variables
one-at-a-time until all the variables are
significant
15-46
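A corresponding sketch of backward elimination by p-values (again with hypothetical data and a hypothetical alpha_stay threshold):

import numpy as np
import statsmodels.api as sm

def backward_eliminate(y, X, alpha_stay=0.10):
    # start with all predictors; repeatedly drop the least significant one
    # until every remaining predictor has a p-value below alpha_stay
    keep = list(range(X.shape[1]))
    while keep:
        fit = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
        pvals = fit.pvalues[1:]              # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha_stay:
            break                            # all remaining variables are significant
        keep.pop(worst)                      # drop the least significant variable
    return keep

# hypothetical demo data
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 4))
y = 1 + 2 * X[:, 0] - X[:, 2] + rng.normal(size=40)
print(backward_eliminate(y, X))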
Diagnosing and Using Information About
Outlying and Influential Observations
• Observation 1: Outlying with respect to y value
• Observation 2: Outlying with respect to x value
• Observation 3: Outlying with respect to x value and y
value not consistent with regression relationship
(Influential)
15-47
Leverage Values
• Leverage values can help us identify outliers
• The leverage value for an observation is the
distance value (discussed earlier)
• This value is a measure of the distance between
the x value and the center of the experimental
region
• If the leverage value for an observation is large, it
is an outlier with respect to its x value
– Large means greater than twice the average of all the
leverage values
– This can be shown to be 2(k+1)/n
15-48
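A small sketch of computing leverage values directly from the design matrix and flagging those above 2(k+1)/n; the data are hypothetical:

import numpy as np

# hypothetical design data: n observations on k = 2 predictors plus an intercept
rng = np.random.default_rng(4)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

# leverage values are the diagonal of the hat matrix H = X (X'X)^(-1) X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

threshold = 2 * (k + 1) / n                 # "large" means more than twice the average leverage
print(np.where(leverage > threshold)[0])    # observations outlying with respect to x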
Hospital Labor Needs Data
y
Monthly labor hours required
x1 Monthly X-ray exposures
x2 Monthly occupied bed days
x3 Average length of patient stay (days)
15-49
Hospital Labor Needs Data #2 Leverage
Values
Observation   Hours        Predicted    Residual    Leverage   Studentized   Studentized        Cook's D
                           Hours                               Residual      Deleted Residual
 1               566.520      688.409     -121.889    0.121      -0.211         -0.203            0.002
 2               696.820      721.848      -25.028    0.226      -0.046         -0.044            0.000
 3             1,033.150      965.393       67.757    0.130       0.118          0.114            0.001
 4             1,603.620    1,172.464      431.156    0.159       0.765          0.752            0.028
 5             1,611.370    1,526.780       84.590    0.085       0.144          0.138            0.000
 6             1,613.270    1,993.869     -380.599    0.112      -0.657         -0.642            0.014
 7             1,854.170    1,676.558      177.612    0.084       0.302          0.291            0.002
 8             2,160.550    1,791.405      369.145    0.083       0.627          0.612            0.009
 9             2,305.580    2,798.761     -493.181    0.085      -0.838         -0.828            0.016
10             3,503.930    4,191.333     -687.403    0.120      -1.192         -1.214            0.049
11             3,571.890    3,190.957      380.933    0.077       0.645          0.630            0.009
12             3,741.400    4,364.502     -623.102    0.177      -1.117         -1.129            0.067
13             4,026.520    4,364.229     -337.709    0.064      -0.568         -0.553            0.006
14            10,343.810    8,713.307    1,630.503    0.146       2.871          4.558            0.353
15            11,732.170   12,080.864     -348.694    0.682      -1.005         -1.006            0.541
16            15,414.940   15,133.026      281.914    0.785       0.990          0.989            0.897
17            18,854.450   19,260.453     -406.003    0.863      -1.786         -1.975            5.033
15-50
Residuals and Studentized Residuals
• One way to identify an outlier is residuals
• Any residual that is substantially different
from the others is suspect
• For a more precise idea, we can calculate the
studentized residual
– This is the observation’s residual divided by the
residual’s standard error
• If the studentized residual is outside of the
range –2 to +2, we have some evidence that it is an
outlier
15-51
Residuals and Studentized Residuals
Continued
Studentized Residual = Residual / Residual Standard Error
= ei / [ s √(1 – hi) ]
An observation is outlying with respect to y if it has
a large studentized (or standardized) residual,
|StRes| greater than 2
15-52
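A sketch of this studentized residual calculation in numpy, reusing the hat-matrix leverage values from the earlier sketch (hypothetical data):

import numpy as np

# hypothetical regression data
rng = np.random.default_rng(5)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]       # least squares coefficients
e = y - X @ b                                  # residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverage values
s = np.sqrt(e @ e / (n - k - 1))               # standard error of the model

studentized = e / (s * np.sqrt(1 - h))
print(np.where(np.abs(studentized) > 2)[0])    # observations outlying with respect to y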
Hospital Labor Needs Data
• From earlier slide, studentized residual
for observation #14 is 2.871
• This exceeds 2
• This observation is an outlier with
respect to y
15-53
Calculating the Deleted Residual For
Observation i
• Compute the regression model using all
observations except observation i
• Use this reduced model to predict yi
• Subtract this prediction from the observed yi to
obtain the deleted residual
• Divide the deleted residual by its standard error to get
the studentized deleted residual
• The resulting value is compared to –t0.025 and +t0.025
• Values outside this range are outliers
15-54
Studentized Deleted Residuals
Studentized Deleted Residual = Deleted Residual / Deleted Residual Standard Error
= di / sdi = ei √[ (n – k – 2) / ( SSE(1 – hi) – ei² ) ]
An observation is outlying with respect to y if it
has a large studentized deleted residual, |tRes|
greater than tα/2 [with (n–k–2) d.f.]
15-55
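Continuing the numpy sketch, the studentized deleted residuals follow directly from the residuals, leverages, and SSE (hypothetical data again):

import numpy as np
from scipy import stats

# hypothetical regression data, as in the earlier sketches
rng = np.random.default_rng(6)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
sse = e @ e

# studentized deleted residuals, per the formula above
t_del = e * np.sqrt((n - k - 2) / (sse * (1 - h) - e**2))

cutoff = stats.t.ppf(0.975, df=n - k - 2)      # t_{alpha/2} with alpha = 0.05
print(np.where(np.abs(t_del) > cutoff)[0])     # observations outlying with respect to y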
Hospital Labor Needs Data
• From earlier slide, observation #14 has
a studentized deleted residual of 4.558
• The data has n-k-2 = 17-3-2 = 12
degrees of freedom
• t0.025 = 2.179
• 4.558 > 2.179
• Observation #14 is outlying with respect
to y
15-56
Cook’s Distance
• An observation is influential with respect
to the estimated regression parameters
b0, b1,…, bk if it has a large Cook’s
distance
– Di greater than F.50 [with k+1 and n-(k+1)
degrees of freedom]
Cook's Distance = Di = [ ei² / ((k + 1) s²) ] × [ hi / (1 – hi)² ]
15-57
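A matching sketch for Cook's distance, comparing each Di against the F.50 point (hypothetical data):

import numpy as np
from scipy import stats

# hypothetical regression data
rng = np.random.default_rng(7)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = e @ e / (n - k - 1)                       # s squared, the mean square error

cooks_d = (e**2 / ((k + 1) * s2)) * (h / (1 - h)**2)

f50 = stats.f.ppf(0.50, dfn=k + 1, dfd=n - (k + 1))   # F_.50 comparison point
print(np.where(cooks_d > f50)[0])              # influential observations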
Hospital Labor Needs Data
• From earlier slide, observation #17 has
a Cook’s D value of 5.033
• F.50 with k+1 = 3+1 = 4 numerator
degrees of freedom and n–(k+1) = 13 (from the
earlier slide) denominator degrees of freedom
is 0.8845
• 5.033 > 0.8845
– Observation # 17 is influential with respect
to the estimated regression parameters
15-58
What to do About Outliers?
• First, check to see if the data was recorded
correctly
– If not correct, discard the observation and rerun
• If correct, search for a reason for the
observation
– Might be caused by a situation we do not wish to
model
• If so, drop the observation
• If no reason found, consider that there might
be an important independent variable not
currently included in the model
15-59
Transforming the Dependent and
Independent Variables
• A possible remedy for violations of the
constant variance, correct functional form and
normality assumptions is to transform the
dependent variable
• Possible transformations include
– Square root
– Quartic root
– Logarithmic
• The appropriate transformation will depend
on the specific problem with the original data
set
15-60
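A brief sketch of trying each transformation of the dependent variable before refitting; y here is a hypothetical positive-valued response:

import numpy as np

# hypothetical positive response values
y = np.array([3.2, 7.9, 15.4, 31.0, 58.7, 120.2])

y_sqrt    = np.sqrt(y)      # square root transformation
y_quartic = y ** 0.25       # quartic root transformation
y_log     = np.log(y)       # logarithmic transformation

# each transformed y would then be regressed on the independent variables and the
# residual plots compared to see which version best satisfies the assumptions
print(y_sqrt, y_quartic, y_log)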
The Durbin-Watson Test and Dealing
with Autocorrelation
• One type of autocorrelation is called
first-order autocorrelation
• This is when the error term in time
period t (εt) is related to the error term in
time period t–1 (εt–1)
• The Durbin-Watson statistic checks for
first-order autocorrelation
15-61
Durbin-Watson Test Statistic
d = [ Σt=2..n (et – et–1)² ] / [ Σt=1..n et² ]
• Where e1, e2,…, en are time-ordered
residuals
– If d < dL,α, we reject H0 (conclude positive autocorrelation)
– If d > dU,α, we do not reject H0
– If dL,α ≤ d ≤ dU,α, the test is inconclusive
• Tables A.10, A.11, and A.12 give values for
dL,α and dU,α at different alpha values
15-62
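A small sketch of computing the Durbin-Watson statistic from time-ordered residuals (the residual values are hypothetical placeholders):

import numpy as np

# hypothetical time-ordered residuals from a regression on time series data
e = np.array([1.2, 0.8, 0.9, -0.3, -0.7, -0.4, 0.2, 0.6, 0.1, -0.5])

# d = sum over t=2..n of (e_t - e_{t-1})^2, divided by sum over t=1..n of e_t^2
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d)   # compare against the d_L and d_U critical values for the chosen alpha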