CHAPTER 11: INTRODUCTION TO MULTIPLE REGRESSION
CHAPTER OUTLINE:
11.1 MULTIPLE REGRESSION MODEL
11.2 MULTIPLE COEFFICIENT OF DETERMINATION
11.3 MODEL ASSUMPTIONS
11.4 TEST OF SIGNIFICANCE
11.5 ANALYSIS OF VARIANCE (ANOVA)
11.1 MULTIPLE REGRESSION MODEL
 In multiple regression, there are several independent variables (X) and one dependent variable (Y).
 The multiple regression model:
Y = β0 + β1X1 + β2X2 + ... + βpXp + ε
where:
β0, β1, β2, ..., βp are the parameters,
ε is a random variable called the error term, and
X1, X2, ..., Xp are the independent variables.
 This equation describes how the dependent variable Y is related to the independent variables X1, X2, ..., Xp.
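A quick way to see the model in action is to estimate its coefficients by ordinary least squares. The sketch below does this with NumPy on a small set of made-up data with two independent variables; the data values and the use of numpy.linalg.lstsq are illustrative assumptions, not part of the chapter.

```python
import numpy as np

# Made-up data: two independent variables (X1, X2) and one dependent variable (Y)
x1 = np.array([3.0, 5.0, 2.0, 8.0, 6.0, 4.0, 7.0, 5.0])
x2 = np.array([1.0, 2.0, 1.0, 3.0, 2.0, 2.0, 3.0, 1.0])
y  = np.array([7.1, 11.2, 5.9, 18.0, 13.1, 9.8, 16.2, 10.4])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(len(y)), x1, x2])

# Least-squares estimates b0, b1, b2 of the parameters beta0, beta1, beta2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2 =", b)
```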
11.1 MULTIPLE REGRESSION MODEL
 Multiple regression analysis is used when a statistician thinks there are several independent variables contributing to the variation of the dependent variable.
 This analysis can then be used to increase the accuracy of predictions for the dependent variable compared with using one independent variable alone.
Estimated Multiple Regression Equation
 The estimated multiple regression equation:
Ŷ = b0 + b1X1 + b2X2 + ... + bpXp
 In multiple regression analysis, we interpret each regression coefficient as follows:
bi represents an estimate of the change in Y corresponding to a 1-unit increase in Xi when all other independent variables are held constant.
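To make this interpretation concrete, the short sketch below uses made-up estimates b0, b1, b2 (not taken from the chapter) and shows that a 1-unit increase in X1, with X2 held constant, changes the predicted value by exactly b1.

```python
# Hypothetical estimated coefficients for Y^ = b0 + b1*X1 + b2*X2
b0, b1, b2 = 1.2, 0.5, 2.0

def y_hat(x1, x2):
    """Estimated multiple regression equation."""
    return b0 + b1 * x1 + b2 * x2

# Holding X2 fixed at 3, increasing X1 from 10 to 11 changes Y^ by b1 = 0.5
print(round(y_hat(11, 3) - y_hat(10, 3), 4))
```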
11.2 MULTIPLE COEFFICIENT OF DETERMINATION (R²)
 As with simple regression, R² is the coefficient of multiple determination: the proportion of the variation in Y that is explained by the regression model.
 Formula:
R² = SSR/SST = 1 - SSE/SST
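As a numeric check, the sums of squares from the Excel ANOVA output shown later in this chapter (SSR = 500.3285, SSE = 99.45697, SST = 599.7855) give the same R² under both forms of the formula; the snippet below is just that arithmetic.

```python
# Sums of squares taken from the Excel ANOVA output later in this chapter
SSR, SSE, SST = 500.3285, 99.45697, 599.7855

r2_explained = SSR / SST        # explained variation over total variation
r2_from_sse  = 1 - SSE / SST    # equivalent form

print(round(r2_explained, 4), round(r2_from_sse, 4))   # both ≈ 0.8342
```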
MULTIPLE CORRELATION COEFFICIENT (R)
 In multiple regression, as in simple regression, the strength of the relationship between the independent variables and the dependent variable is measured by the multiple correlation coefficient, R.
11.3 MODEL ASSUMPTIONS
 The errors ( ) are normally distributed with
2
(

)
mean E (e ) = 0 and variance Var
= .
 The errors are statistically independent. Thus
the error for any value of Y is unaffected by the
error for any other Y-value.
 The effects of the X-variables are linear and additive (i.e., they can be summed).
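These assumptions are usually examined through the residuals of the fitted model. The sketch below, using made-up observed and fitted values (assumptions, not chapter data), checks that the residuals average roughly zero and uses their sum of squares over n - p - 1 as the usual estimate of the error variance σ².

```python
import numpy as np

# Made-up observed responses and fitted values from a model with p = 2 predictors
y     = np.array([9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1])
y_hat = np.array([9.0, 5.1, 9.0, 6.3, 4.5, 6.0, 7.2, 6.2, 7.5, 6.3])
p = 2

e = y - y_hat                               # residuals stand in for the error term
print("mean residual:", e.mean())           # should be close to 0 (E(e) = 0)
print("MSE, estimate of sigma^2:", e @ e / (len(y) - p - 1))
```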
11.5 ANALYSIS OF VARIANCE (ANOVA)
General form of ANOVA table:

Source       Degrees of Freedom   Sum of Squares   Mean Squares              Value of the Test Statistic
Regression   p                    SSR              MSR = SSR / p             F = MSR / MSE
Error        n - p - 1            SSE              MSE = SSE / (n - p - 1)
Total        n - 1                SST
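The arithmetic behind the table is just each sum of squares divided by its degrees of freedom. The snippet below works through it with made-up values of SSR, SSE, n, and p (illustrative assumptions only).

```python
# Illustrative values, not from the chapter
SSR, SSE = 80.0, 20.0
n, p = 15, 3

SST = SSR + SSE              # total sum of squares, with n - 1 d.f.
MSR = SSR / p                # regression mean square
MSE = SSE / (n - p - 1)      # error mean square
F   = MSR / MSE              # value of the test statistic

print(MSR, MSE, round(F, 2))
```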
Excel’s ANOVA Output
ANOVA
             df    SS           MS          F           Significance F
Regression    2    500.3285     250.1643    42.76013    2.32774E-07
Residual     17     99.45697      5.85041
Total        19    599.7855

(The Regression SS is SSR and the Total SS is SST.)
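Reading the output above: each MS is its SS divided by its df, F is MSR/MSE, and the "Significance F" column is the p-value of that F statistic. A quick check with SciPy (an assumption about tooling; Excel computes this internally) should reproduce the figures:

```python
from scipy import stats

# Figures read off the Excel ANOVA output above
SSR, SSE = 500.3285, 99.45697
df_reg, df_res = 2, 17

MSR, MSE = SSR / df_reg, SSE / df_res      # 250.1643 and 5.85041
F = MSR / MSE                              # 42.76013
p_value = stats.f.sf(F, df_reg, df_res)    # the "Significance F" column

print(round(F, 5), p_value)                # ≈ 42.76013 and ≈ 2.33e-07
```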
11.4 TEST OF SIGNIFICANCE
 In simple linear regression, the F and t tests provide
the same conclusion.
 In multiple regression, the F and t tests have different
purposes.
The F test is used to determine whether a significant
relationship exists between the dependent variable
and the set of all the independent variables.
The F test is referred to as the test for overall
significance.
The t test is used to determine whether each of the individual
independent variables is significant.
A separate t test is conducted for each of the
independent variables in the model.
We refer to each of these t tests as a test for individual
significance.
Testing for Significance: F Test - Overall Significance
Hypotheses:
H0: β1 = β2 = . . . = βp = 0
H1: One or more of the parameters is not equal to zero.
Test Statistic:
F = MSR/MSE
Rejection Rule:
Reject H0 if p-value < α or if F > Fα,
where F is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.
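The rejection rule can be applied either by comparing F with the critical value Fα or by comparing the p-value with α; both are sketched below with SciPy, using α, p, n, and the F value as illustrative assumptions.

```python
from scipy import stats

alpha, p, n = 0.05, 2, 20      # assumed for illustration
F = 42.76                      # assumed observed test statistic

F_crit  = stats.f.ppf(1 - alpha, p, n - p - 1)   # F_alpha with (p, n-p-1) d.f.
p_value = stats.f.sf(F, p, n - p - 1)

reject_H0 = (p_value < alpha) or (F > F_crit)
print(round(F_crit, 2), p_value, reject_H0)
```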
Testing for Significance: t Test - Individual Parameters
Hypotheses:
H0: βi = 0
H1: βi ≠ 0
Test Statistic:
t = bi / s_bi
Rejection Rule:
Reject H0 if p-value < α, or if t < -tα/2 or t > tα/2,
where tα/2 is based on a t distribution with n - p - 1 degrees of freedom.
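The same two-sided rule for an individual coefficient, sketched with SciPy and made-up values of bi and its standard error (both are assumptions for illustration):

```python
from scipy import stats

alpha, p, n = 0.05, 2, 20      # assumed for illustration
b_i, s_bi = 0.80, 0.25         # made-up coefficient estimate and standard error

t = b_i / s_bi                                    # test statistic
t_crit = stats.t.ppf(1 - alpha / 2, n - p - 1)    # t_{alpha/2} with n-p-1 d.f.

reject_H0 = (t < -t_crit) or (t > t_crit)
print(round(t, 2), round(t_crit, 3), reject_H0)   # 3.2, 2.11, True
```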
Example:
The Butler Trucking Company, an independent trucking company, makes deliveries throughout southern California. The managers want to estimate the total daily travel time for their drivers. They believe the total daily travel time is closely related to the number of miles traveled in making the deliveries and to the number of deliveries.
a) Determine whether there is a relationship among the variables using α = 0.05.
b) Use the t test to determine the significance of each independent variable. What is your conclusion at the 0.05 level of significance?
Solution:
a) Hypothesis Statement:
H0: β1 = β2 = 0
H1: One or more of the parameters is not equal to zero.
Test Statistic:
F = 32.88
Rejection Region:
F0.05,2,7 = 4.74
Since 32.88 > 4.74, we reject H0 and conclude that there is a significant relationship between travel time (Y) and the two independent variables, miles traveled and number of deliveries.
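The critical value quoted above can be reproduced with SciPy (an assumption about tooling; the conclusion itself follows from the slide's numbers):

```python
from scipy import stats

F = 32.88                           # test statistic from part (a)
F_crit = stats.f.ppf(0.95, 2, 7)    # F(0.05; 2, 7)

print(round(F_crit, 2))             # ≈ 4.74, as quoted
print(F > F_crit)                   # True, so H0 is rejected
```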
Solution:
b) Hypothesis Statement:
H0: β1 = 0
H1: β1 ≠ 0
Test Statistic:
t = 0.061135 / 0.009888 = 6.18
Rejection Region:
t0.05/2,7 = 2.365
Since 6.18 > 2.365, we reject H0 and conclude that there is a significant relationship between travel time (Y) and miles traveled (X1).
Solution:
b) Hypothesis Statement:
H0: β2 = 0
H1: β2 ≠ 0
Test Statistic:
t = 0.9234 / 0.2211 = 4.18
Rejection Region:
t0.05/2,7 = 2.365
Since 4.18 > 2.365, we reject H0 and conclude that there is a significant relationship between travel time (Y) and number of deliveries (X2).
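Both t statistics and the critical value quoted in part (b) can be checked the same way (SciPy is assumed here only for the critical value):

```python
from scipy import stats

t1 = 0.061135 / 0.009888              # miles traveled (X1):       ≈ 6.18
t2 = 0.9234 / 0.2211                  # number of deliveries (X2): ≈ 4.18
t_crit = stats.t.ppf(1 - 0.05 / 2, 7)

print(round(t1, 2), round(t2, 2), round(t_crit, 3))   # 6.18 4.18 2.365
print(t1 > t_crit, t2 > t_crit)                       # True True -> reject both H0
```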
End of Chapter 11