Slides prepared by John S. Loucks, St. Edward’s University
© 2002 South-Western/Thomson Learning

Chapter 14: Simple Linear Regression

• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Computer Solution
• Residual Analysis: Validating Model Assumptions
• Residual Analysis: Outliers and Influential Observations

The Simple Linear Regression Model

Simple linear regression model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
Simple linear regression equation:
$$E(y) = \beta_0 + \beta_1 x$$
Estimated simple linear regression equation:
$$\hat{y} = b_0 + b_1 x$$

Least Squares Method

Least squares criterion:
$$\min \sum (y_i - \hat{y}_i)^2$$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation

The Least Squares Method

Slope for the estimated regression equation:
$$b_1 = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$$
y-intercept for the estimated regression equation:
$$b_0 = \bar{y} - b_1 \bar{x}$$
where:
$x_i$ = value of the independent variable for the ith observation
$y_i$ = value of the dependent variable for the ith observation
$\bar{x}$ = mean value of the independent variable
$\bar{y}$ = mean value of the dependent variable
$n$ = total number of observations

Example: Reed Auto Sales

Reed Auto periodically holds a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.
Number of TV Ads (x)    Number of Cars Sold (y)
         1                       14
         3                       24
         2                       18
         1                       17
         3                       27

Example: Reed Auto Sales

From the data, $\sum x_i = 10$, $\sum y_i = 100$, $\sum x_i y_i = 220$, $\sum x_i^2 = 24$, so $\bar{x} = 2$ and $\bar{y} = 20$.

Slope for the estimated regression equation:
$$b_1 = \frac{220 - (10)(100)/5}{24 - (10)^2/5} = \frac{20}{4} = 5$$
y-intercept for the estimated regression equation:
$$b_0 = 20 - 5(2) = 10$$
Estimated regression equation:
$$\hat{y} = 10 + 5x$$

Example: Reed Auto Sales

[Figure: Scatter diagram of Cars Sold (y) versus TV Ads (x), with the estimated regression line y = 5x + 10]

The Coefficient of Determination

Relationship among SST, SSR, and SSE:
$$\text{SST} = \text{SSR} + \text{SSE}$$
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
Coefficient of determination:
$$r^2 = \text{SSR}/\text{SST}$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Example: Reed Auto Sales

Coefficient of determination:
$$r^2 = \text{SSR}/\text{SST} = 100/114 = .8772$$
The regression relationship is very strong: about 88% of the variation in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.

The Correlation Coefficient

Sample correlation coefficient:
$$r_{xy} = (\text{sign of } b_1)\sqrt{\text{coefficient of determination}} = (\text{sign of } b_1)\sqrt{r^2}$$
where $b_1$ is the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$.

Example: Reed Auto Sales

Sample correlation coefficient:
$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$
The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+", so
$$r_{xy} = +\sqrt{.8772} = +.9366$$

Model Assumptions

Assumptions about the error term $\varepsilon$:
• The error $\varepsilon$ is a random variable with a mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.

Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero. Two tests are commonly used:
• t test
• F test
Both tests require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
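As a check on the hand calculations for Reed Auto Sales in the slides above (slope, intercept, SST, SSR, SSE, r² and r_xy), here is a minimal standard-library Python sketch. The helper names `least_squares` and `sums_of_squares` are hypothetical, chosen only for this illustration; the slope uses the same computational formula shown on the least squares slide.

```python
import math

# Reed Auto Sales data from the example: x = number of TV ads, y = number of cars sold
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) using the computational formula for the slope."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(x, y, b0, b1):
    """Return (SST, SSR, SSE) for the fitted line y_hat = b0 + b1 * x."""
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return sst, ssr, sse


b0, b1 = least_squares(x, y)
sst, ssr, sse = sums_of_squares(x, y, b0, b1)
r2 = ssr / sst                           # coefficient of determination
r_xy = math.copysign(math.sqrt(r2), b1)  # correlation takes the sign of b1

print(f"y_hat = {b0:g} + {b1:g}x")
print(f"SST = {sst:g}, SSR = {ssr:g}, SSE = {sse:g}")
print(f"r^2 = {r2:.4f}, r_xy = {r_xy:.4f}")
```

Run as-is, it reports y_hat = 10 + 5x, SST = 114, SSR = 100, SSE = 14, r² = 0.8772 and r_xy = 0.9366, matching the slides.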
Testing for Significance

An estimate of $\sigma^2$:
The mean square error (MSE) provides the estimate of $\sigma^2$; the notation $s^2$ is also used.
$$s^2 = \text{MSE} = \text{SSE}/(n - 2)$$
where:
$$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$$

Testing for Significance

An estimate of $\sigma$:
• To estimate $\sigma$, we take the square root of $s^2$.
• The resulting $s$ is called the standard error of the estimate.
$$s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$

Testing for Significance: t Test

Hypotheses:
$$H_0\colon \beta_1 = 0 \qquad H_a\colon \beta_1 \neq 0$$
Test statistic:
$$t = \frac{b_1}{s_{b_1}}, \qquad s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$
where $s_{b_1}$ is the estimated standard deviation of $b_1$.
Rejection rule:
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with $n - 2$ degrees of freedom.

Example: Reed Auto Sales

t test:
• Hypotheses: $H_0\colon \beta_1 = 0$, $H_a\colon \beta_1 \neq 0$
• Rejection rule: For $\alpha = .05$ and d.f. = 3, $t_{.025} = 3.182$. Reject $H_0$ if $t < -3.182$ or $t > 3.182$.
• Test statistic: $s = \sqrt{14/3} = 2.1602$ and $\sum (x_i - \bar{x})^2 = 4$, so $s_{b_1} = 2.1602/2 = 1.08$ and $t = 5/1.08 = 4.63$.
• Conclusion: Reject $H_0$.

Confidence Interval for $\beta_1$

• We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
• $H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.

Confidence Interval for $\beta_1$

The form of a confidence interval for $\beta_1$ is:
$$b_1 \pm t_{\alpha/2} s_{b_1}$$
where:
$b_1$ is the point estimate
$t_{\alpha/2} s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 2$ degrees of freedom

Example: Reed Auto Sales

Rejection rule: Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% confidence interval for $\beta_1$:
$$b_1 \pm t_{\alpha/2} s_{b_1} = 5 \pm 3.182(1.08) = 5 \pm 3.44$$
or 1.56 to 8.44.
Conclusion: 0 is not included in the interval, so reject $H_0$.

Testing for Significance: F Test

Hypotheses:
$$H_0\colon \beta_1 = 0 \qquad H_a\colon \beta_1 \neq 0$$
Test statistic:
$$F = \text{MSR}/\text{MSE}$$
Rejection rule:
Reject $H_0$ if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with 1 d.f. in the numerator and $n - 2$ d.f. in the denominator.

Example: Reed Auto Sales

F test:
• Hypotheses: $H_0\colon \beta_1 = 0$, $H_a\colon \beta_1 \neq 0$
• Rejection rule: For $\alpha = .05$ and d.f. = 1, 3: $F_{.05} = 10.13$. Reject $H_0$ if $F > 10.13$.
• Test statistic: $\text{MSR} = \text{SSR}/1 = 100$, so $F = \text{MSR}/\text{MSE} = 100/4.667 = 21.43$.
• Conclusion: We can reject $H_0$.

Some Cautions about the Interpretation of Significance Tests

• Rejecting $H_0\colon \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
• Being able to reject $H_0\colon \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that the relationship between x and y is linear.

Using the Estimated Regression Equation for Estimation and Prediction

Confidence interval estimate of $E(y_p)$:
$$\hat{y}_p \pm t_{\alpha/2} s_{\hat{y}_p}, \qquad s_{\hat{y}_p} = s\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Prediction interval estimate of $y_p$:
$$\hat{y}_p \pm t_{\alpha/2} s_{\text{ind}}, \qquad s_{\text{ind}} = s\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
where the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with $n - 2$ d.f.

Example: Reed Auto Sales

Point estimation:
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
$$\hat{y} = 10 + 5(3) = 25 \text{ cars}$$
Confidence interval for $E(y_p)$:
With $s_{\hat{y}_p} = 2.1602\sqrt{1/5 + (3 - 2)^2/4} = 1.449$, the 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
$$25 \pm 3.182(1.449) = 25 \pm 4.61, \text{ or } 20.39 \text{ to } 29.61 \text{ cars}$$
Prediction interval for $y_p$:
With $s_{\text{ind}} = 2.1602\sqrt{1 + 1/5 + (3 - 2)^2/4} = 2.601$, the 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
$$25 \pm 3.182(2.601) = 25 \pm 8.28, \text{ or } 16.72 \text{ to } 33.28 \text{ cars}$$

Residual Analysis

Residual for observation i:
$$y_i - \hat{y}_i$$
Standardized residual for observation i:
$$\frac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$$
where:
$$s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}, \qquad h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$$
and $h_i$ is the leverage of observation i.

Example: Reed Auto Sales

Residuals:

Observation    Predicted Cars Sold    Residual
     1                 15                -1
     2                 25                -1
     3                 20                -2
     4                 15                 2
     5                 25                 2

Example: Reed Auto Sales

[Figure: TV Ads residual plot, residuals (vertical axis) versus TV Ads (horizontal axis)]

Residual Analysis

Detecting outliers:
• An outlier is an observation that is unusual in comparison with the other data.
• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.
• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.
• This rule's shortcoming can be circumvented by using studentized deleted residuals.
• For any observation whose standardized residual exceeds 1 in absolute value (which includes every potential outlier), the absolute value of its studentized deleted residual is larger than the absolute value of its standardized residual.

End of Chapter 14
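The significance tests, interval estimates and residual analysis for Reed Auto Sales can be checked the same way. This is a minimal standard-library Python sketch, not the Minitab procedure the slides refer to: it hard-codes the table values $t_{.025} = 3.182$ and $F_{.05} = 10.13$ quoted in the slides rather than computing them, and all variable names are illustrative. The studentized deleted residuals use the standard identity $t_i = r_i\sqrt{(n - p - 1)/(n - p - r_i^2)}$, where $r_i$ is the standardized residual and $p = 2$ is the number of estimated parameters.

```python
import math

# Reed Auto Sales data and the estimated equation y_hat = 10 + 5x from the example
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n, p = len(x), 2  # p = number of estimated parameters (b0 and b1)

# Table values quoted in the slides for alpha = .05 (not computed here)
T_CRIT = 3.182   # t_.025 with n - 2 = 3 d.f.
F_CRIT = 10.13   # F_.05 with 1 and 3 d.f.

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of (x_i - x_bar)^2
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

sse = sum(e ** 2 for e in residuals)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
mse = sse / (n - 2)   # s^2, the estimate of sigma^2
s = math.sqrt(mse)    # standard error of the estimate

# t test and 95% confidence interval for beta_1
s_b1 = s / math.sqrt(sxx)
t_stat = b1 / s_b1
ci_b1 = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)

# F test; MSR = SSR / 1 in simple linear regression
f_stat = (ssr / 1) / mse

# Interval estimates at x_p = 3 TV ads
x_p = 3
y_hat_p = b0 + b1 * x_p
h_p = 1 / n + (x_p - x_bar) ** 2 / sxx
s_y_hat_p = s * math.sqrt(h_p)   # used for the mean E(y_p)
s_ind = s * math.sqrt(1 + h_p)   # used for an individual y_p
ci_mean = (y_hat_p - T_CRIT * s_y_hat_p, y_hat_p + T_CRIT * s_y_hat_p)
pi_ind = (y_hat_p - T_CRIT * s_ind, y_hat_p + T_CRIT * s_ind)

# Standardized residuals, using the leverage h_i of each observation
h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
std_resid = [e / (s * math.sqrt(1 - hi)) for e, hi in zip(residuals, h)]
outliers = [i + 1 for i, r in enumerate(std_resid) if abs(r) > 2]  # 1-based observation numbers

# Studentized deleted residuals via t_i = r_i * sqrt((n - p - 1) / (n - p - r_i^2))
stud_deleted = [r * math.sqrt((n - p - 1) / (n - p - r ** 2)) for r in std_resid]

print(f"t = {t_stat:.2f}, reject H0: {abs(t_stat) > T_CRIT}")
print(f"95% CI for beta_1: {ci_b1[0]:.2f} to {ci_b1[1]:.2f}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
print(f"95% CI for E(y_p): {ci_mean[0]:.2f} to {ci_mean[1]:.2f}")
print(f"95% PI for y_p: {pi_ind[0]:.2f} to {pi_ind[1]:.2f}")
print("standardized residuals:", [round(r, 2) for r in std_resid])
print("studentized deleted residuals:", [round(t, 2) for t in stud_deleted])
print("outliers (|standardized residual| > 2):", outliers)
```

With these data no standardized residual exceeds 2 in absolute value, so the outlier list is empty, while the t and F statistics (4.63 and 21.43) and the three intervals agree with the slides. Note that in simple linear regression F equals t², so the two tests always reach the same conclusion.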
