CHAPTER 11: INTRODUCTION TO MULTIPLE REGRESSION

CHAPTER OUTLINE:
11.1 MULTIPLE REGRESSION MODEL
11.2 MULTIPLE COEFFICIENT OF DETERMINATION
11.3 MODEL ASSUMPTIONS
11.4 TEST OF SIGNIFICANCE
11.5 ANALYSIS OF VARIANCE (ANOVA)

11.1 MULTIPLE REGRESSION MODEL

In multiple regression, there are several independent variables (X) and one dependent variable (Y).

The multiple regression model:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$

where:
$\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters,
$\varepsilon$ is a random variable called the error term, and
$X_1, X_2, \ldots, X_p$ are the independent variables.

This equation describes how the dependent variable $Y$ is related to the independent variables $X_1, X_2, \ldots, X_p$.

Multiple regression analysis is used when a statistician thinks that several independent variables contribute to the variation of the dependent variable. The analysis can then give more accurate predictions of the dependent variable than one independent variable alone.

Estimated Multiple Regression Equation

$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$

In multiple regression analysis, we interpret each regression coefficient as follows: $b_i$ represents an estimate of the change in $Y$ corresponding to a 1-unit increase in $X_i$ when all other independent variables are held constant.

11.2 MULTIPLE COEFFICIENT OF DETERMINATION ($R^2$)

As with simple regression, $R^2$ is the coefficient of multiple determination. It is the proportion of the variation in the dependent variable that is explained by the regression model.

Formula:

$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

where SSR is the sum of squares due to regression, SSE is the sum of squares due to error, and SST is the total sum of squares.

MULTIPLE CORRELATION COEFFICIENT ($R$)

In multiple regression, as in simple regression, the strength of the relationship between the independent variables and the dependent variable is measured by the correlation coefficient, $R$.

11.3 MODEL ASSUMPTIONS

The errors ($\varepsilon$) are normally distributed with mean $E(\varepsilon) = 0$ and variance $\mathrm{Var}(\varepsilon) = \sigma^2$. The errors are statistically independent.
Thus, the error for any value of Y is unaffected by the error for any other Y-value. The X-variables are linearly additive (i.e., their effects can be summed).

11.5 ANALYSIS OF VARIANCE (ANOVA)

General form of ANOVA table:

| Source     | Degrees of Freedom | Sum of Squares | Mean Squares            |
|------------|--------------------|----------------|-------------------------|
| Regression | p                  | SSR            | MSR = SSR / p           |
| Error      | n - p - 1          | SSE            | MSE = SSE / (n - p - 1) |
| Total      | n - 1              | SST            |                         |

Value of the test statistic: $F = \dfrac{MSR}{MSE}$

Excel's ANOVA Output

| ANOVA      | df | SS       | MS       | F        | Significance F |
|------------|----|----------|----------|----------|----------------|
| Regression | 2  | 500.3285 | 250.1643 | 42.76013 | 2.32774E-07    |
| Residual   | 17 | 99.45697 | 5.85041  |          |                |
| Total      | 19 | 599.7855 |          |          |                |

In this output, SSR = 500.3285 (the Regression row) and SST = 599.7855 (the Total row).

11.4 TEST OF SIGNIFICANCE

In simple linear regression, the F and t tests provide the same conclusion. In multiple regression, the F and t tests have different purposes.

The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables. The F test is referred to as the test for overall significance.

The t test is used to determine whether each of the individual independent variables is significant. A separate t test is conducted for each of the independent variables in the model. We refer to each of these t tests as a test for individual significance.

Testing for Significance: F Test - Overall Significance

Hypotheses:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_1$: One or more of the parameters is not equal to zero.

Test statistic:
$F = \dfrac{MSR}{MSE}$

Rejection rule:
Reject $H_0$ if p-value $< \alpha$ or if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with $p$ degrees of freedom in the numerator and $n - p - 1$ degrees of freedom in the denominator.

Testing for Significance: t Test - Individual Parameters

Hypotheses:
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$

Test statistic:
$t = \dfrac{b_i}{s_{b_i}}$

Rejection rule:
Reject $H_0$ if p-value $< \alpha$, or if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with $n - p - 1$ degrees of freedom.

Example:

The Butler Trucking Company, an independent trucking company, makes deliveries throughout southern California.
The managers want to estimate the total daily travel time for their drivers. They believe the total daily travel time is closely related to the number of miles traveled in making the deliveries ($X_1$) and to the number of deliveries ($X_2$).

a) Determine whether there is a relationship among the variables using $\alpha = 0.05$.
b) Use the t test to determine the significance of each independent variable. What is your conclusion at the 0.05 level of significance?

Solution: a)

Hypothesis statement:
$H_0: \beta_1 = \beta_2 = 0$
$H_1$: One or more of the parameters is not equal to zero.

Test statistic: $F = 32.88$

Rejection region: $F > F_{0.05,\,2,\,7} = 4.74$

Since 32.88 > 4.74, we reject $H_0$ and conclude that there is a significant relationship between travel time (Y) and the two independent variables, miles traveled and number of deliveries.

Solution: b) Miles traveled ($X_1$)

Hypothesis statement:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic: $t = \dfrac{b_1}{s_{b_1}} = \dfrac{0.061135}{0.009888} = 6.18$

Rejection region: $|t| > t_{0.05/2,\,7} = 2.365$

Since 6.18 > 2.365, we reject $H_0$ and conclude that there is a significant relationship between travel time (Y) and miles traveled ($X_1$).

Solution: b) Number of deliveries ($X_2$)

Hypothesis statement:
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$

Test statistic: $t = \dfrac{b_2}{s_{b_2}} = \dfrac{0.9234}{0.2211} = 4.18$

Rejection region: $|t| > t_{0.05/2,\,7} = 2.365$

Since 4.18 > 2.365, we reject $H_0$ and conclude that there is a significant relationship between travel time (Y) and number of deliveries ($X_2$).
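The arithmetic in this chapter can be checked with a short script. The sketch below is only an illustration, not part of the original slides: the class and function names (RegressionAnova, t_statistic, reject_overall, reject_individual) are made up for this example. It rebuilds MSR, MSE, F and R^2 from the SSR and SSE values in the Excel ANOVA output, then repeats the F and t decisions of the Butler Trucking solution. The critical values 4.74 and 2.365 are copied from the solution rather than computed, so only the Python standard library is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionAnova:
    """ANOVA quantities for a regression with p independent variables."""
    ssr: float  # sum of squares due to regression
    sse: float  # sum of squares due to error
    n: int      # number of observations
    p: int      # number of independent variables

    @property
    def sst(self) -> float:
        return self.ssr + self.sse           # SST = SSR + SSE

    @property
    def msr(self) -> float:
        return self.ssr / self.p             # MSR = SSR / p

    @property
    def mse(self) -> float:
        return self.sse / (self.n - self.p - 1)  # MSE = SSE / (n - p - 1)

    @property
    def f_stat(self) -> float:
        return self.msr / self.mse           # F = MSR / MSE

    @property
    def r_squared(self) -> float:
        return self.ssr / self.sst           # R^2 = SSR / SST


def t_statistic(b: float, s_b: float) -> float:
    """t = b_i / s_{b_i} for one estimated coefficient."""
    return b / s_b


def reject_overall(f_stat: float, f_crit: float) -> bool:
    """Upper-tail F test: reject H0 when F exceeds the critical value."""
    return f_stat > f_crit


def reject_individual(t: float, t_crit: float) -> bool:
    """Two-tailed t test: reject H0 when |t| exceeds the critical value."""
    return abs(t) > t_crit


# Excel ANOVA output from the chapter: df = 2 and 17, so p = 2 and n = 20.
excel = RegressionAnova(ssr=500.3285, sse=99.45697, n=20, p=2)

# Butler Trucking example: all numbers below are taken from the solution.
F_CRIT = 4.74    # F(0.05; 2, 7), as tabulated in the solution
T_CRIT = 2.365   # t(0.025; 7), as tabulated in the solution
f_example = 32.88
t_miles = t_statistic(0.061135, 0.009888)
t_deliveries = t_statistic(0.9234, 0.2211)

print(f"Excel table: MSR={excel.msr:.4f} MSE={excel.mse:.5f} "
      f"F={excel.f_stat:.5f} R^2={excel.r_squared:.4f}")
print("Overall F test rejects H0:", reject_overall(f_example, F_CRIT))
print(f"t (miles) = {t_miles:.2f}, rejects H0:",
      reject_individual(t_miles, T_CRIT))
print(f"t (deliveries) = {t_deliveries:.2f}, rejects H0:",
      reject_individual(t_deliveries, T_CRIT))
```

With a statistics library available, the critical values could be computed from the F and t distributions instead of being read from a table. They are hard-coded here to keep the sketch dependency-free.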