Chapter 13
Simple Linear Regression Analysis
McGraw-Hill/Irwin. Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.

Simple Linear Regression
13.1 The Simple Linear Regression Model and the Least Squares Point Estimates
13.3 Testing the Significance of Slope and y-Intercept

The Simple Linear Regression Model and the Least Squares Point Estimates
• The dependent (or response) variable is the variable we wish to understand or predict
• The independent (or predictor) variable is the variable we will use to understand or predict the dependent variable
• Regression analysis is a statistical technique that uses observed data to relate the dependent variable to one or more independent variables

Objective of Regression Analysis
The objective of regression analysis is to build a regression model (or predictive equation) that can be used to describe, predict and control the dependent variable on the basis of the independent variable

Example 13.1: Fuel Consumption Case #1

Week   Average Hourly Temperature x (deg F)   Weekly Fuel Consumption y (MMcf)
1      28.0                                   12.4
2      28.0                                   11.7
3      32.5                                   12.4
4      39.0                                   10.8
5      45.9                                    9.4
6      57.8                                    9.5
7      58.1                                    8.0
8      62.5                                    7.5

Example 13.1: Fuel Consumption Case #2

Example 13.1: Fuel Consumption Case #3

Example 13.1: Fuel Consumption Case #4
• The values of β0 and β1 determine the value of the mean weekly fuel consumption μy|x
• Because we do not know the true values of β0 and β1, we cannot actually calculate the mean weekly fuel consumptions
• We will learn how to estimate β0 and β1 in the next section
• For now, when we say that μy|x is related to x by a straight line, we mean that the mean weekly fuel consumptions, plotted against the corresponding average hourly temperatures, lie on a straight line

Form of The Simple Linear Regression Model
• y = β0 + β1x + ε
• μy|x = β0 + β1x is the mean value of the dependent variable y when the value of the independent variable is x
• β0 is the y-intercept; the mean of y when x is 0
• β1 is the slope; the change in the mean of y per unit change in x
• ε is an error term that describes the effect on y of all factors other than x

Regression Terms
• β0 and β1 are called regression parameters
• β0 is the y-intercept and β1 is the slope
• We do not know the true values of these parameters
• So, we must use sample data to estimate them
• b0 is the estimate of β0 and b1 is the estimate of β1

The Simple Linear Regression Model Illustrated

The Least Squares Estimates, and Point Estimation and Prediction
• The true values of β0 and β1 are unknown
• Therefore, we must use observed data to compute statistics that estimate these parameters
• We will compute b0 to estimate β0 and b1 to estimate β1

The Least Squares Point Estimates
• Estimation/prediction equation: ŷ = b0 + b1x
• Least squares point estimate of the slope β1:
  $b_1 = \dfrac{SS_{xy}}{SS_{xx}}$
  $SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$
  $SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}$

The Least Squares Point Estimates Continued
• Least squares point estimate of the y-intercept β0:
  $b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{y} = \dfrac{\sum y_i}{n}$ and $\bar{x} = \dfrac{\sum x_i}{n}$

Example 13.3: Fuel Consumption Case #1

          y        x          x²         xy
       12.4     28.0      784.00     347.20
       11.7     28.0      784.00     327.60
       12.4     32.5     1056.25     403.00
       10.8     39.0     1521.00     421.20
        9.4     45.9     2106.81     431.46
        9.5     57.8     3340.84     549.10
        8.0     58.1     3375.61     464.80
        7.5     62.5     3906.25     468.75
Total  81.7    351.8    16874.76    3413.11

Example 13.3: Fuel Consumption Case #2
• From last slide,
  – Σyi = 81.7
  – Σxi = 351.8
  – Σxi² = 16,874.76
  – Σxiyi = 3,413.11
• Once we have these values, we no longer need the raw data
• Calculation of b0 and b1 uses these totals

Example 13.3: Fuel Consumption Case #3 (Slope b1)
$SS_{xy} = \sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 3413.11 - \dfrac{(351.8)(81.7)}{8} = -179.6475$
$SS_{xx} = \sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n} = 16874.76 - \dfrac{(351.8)^2}{8} = 1404.355$
$b_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{-179.6475}{1404.355} = -0.1279$

Example 13.3: Fuel Consumption Case #4 (y-Intercept b0)
$\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{81.7}{8} = 10.2125$
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{351.8}{8} = 43.98$
$b_0 = \bar{y} - b_1 \bar{x} = 10.2125 - (-0.1279)(43.98) = 15.84$

Example 13.3: Fuel Consumption Case #5
• Prediction (x = 28)
• ŷ = b0 + b1x = 15.84 + (-0.1279)(28)
• ŷ = 12.2588 MMcf of Gas

Example 13.3: Fuel Consumption Case #6

Example 13.3: The Danger of Extrapolation Outside The Experimental Region

Testing the Significance of the Slope
• A regression model is not likely to be useful unless there is a significant relationship between x and y
• To test significance, we use the null hypothesis: H0: β1 = 0
• Versus the alternative hypothesis: Ha: β1 ≠ 0

Testing the Significance of the Slope #2
If the regression assumptions hold, we can reject H0: β1 = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α

Testing the Significance of the Slope #3

Alternative    Reject H0 If    p-Value
Ha: β1 > 0     t > tα          Area under t distribution right of t
Ha: β1 < 0     t < –tα         Area under t distribution left of t
Ha: β1 ≠ 0     |t| > tα/2*     Twice area under t distribution right of |t|

* That is, t > tα/2 or t < –tα/2

Testing the Significance of the Slope #4
• Test statistic:
  $t = \dfrac{b_1}{s_{b_1}}$, where $s_{b_1} = \dfrac{s}{\sqrt{SS_{xx}}}$
• 100(1-α)% confidence interval for β1:
  $[\, b_1 \pm t_{\alpha/2}\, s_{b_1} \,]$
• tα, tα/2 and p-values are based on n–2 degrees of freedom

Example 13.6: MINITAB Output of Regression on Fuel Consumption Data

Example 13.6: Excel Output of Regression on Fuel Consumption Data

Example 13.6: Fuel Consumption Case
• The p-value for testing H0 versus Ha is twice the area to the right of |t| = 7.33 with n-2 = 6 degrees of freedom
• In this case, the p-value is 0.0003
• We can reject H0 in favor of Ha at level of significance 0.05, 0.01, or 0.001
• We therefore have strong evidence that x is significantly related to y and that the regression model is significant

A Confidence Interval for the Slope
• If the regression assumptions hold, a 100(1-α) percent confidence interval for the true slope β1 is
  – b1 ± tα/2 sb1
• Here tα/2 is based on n-2 degrees of freedom

Example 13.7: Fuel Consumption Case
• An earlier printout tells us:
  – b1 = -0.12792
  – sb1 = 0.01746
• We have n-2 = 6 degrees of freedom
  – That gives us a t-value of 2.447 for a 95 percent confidence interval
• [b1 ± t0.025 · sb1] = [-0.12792 ± (2.447)(0.01746)] = [-0.1706, -0.0852]

Testing the Significance of the y-Intercept
If the regression assumptions hold, we can reject H0: β0 = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α

Testing the Significance of the y-Intercept #2

Alternative    Reject H0 If    p-Value
Ha: β0 > 0     t > tα          Area under t distribution right of t
Ha: β0 < 0     t < –tα         Area under t distribution left of t
Ha: β0 ≠ 0     |t| > tα/2*     Twice area under t distribution right of |t|

* That is, t > tα/2 or t < –tα/2

Testing the Significance of the y-Intercept #3
• Test statistic:
  $t = \dfrac{b_0}{s_{b_0}}$, where $s_{b_0} = s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}}$
• 100(1-α)% confidence interval for β0:
  $[\, b_0 \pm t_{\alpha/2}\, s_{b_0} \,]$
• tα, tα/2 and p-values are based on n–2 degrees of freedom
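
Worked check of the fuel consumption calculations

The arithmetic in Examples 13.3, 13.6 and 13.7 can be reproduced in a few lines. The following is a minimal sketch in Python using only the standard library, not the MINITAB or Excel procedure the slides refer to. The variable names are illustrative. Two things are assumptions rather than statements from the slides: s is taken to be the usual standard error, the square root of SSE/(n - 2), which the slides use but do not define; and the critical value 2.447 is copied from Example 13.7 rather than computed.

```python
import math

# Fuel consumption data from Example 13.1
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # average hourly temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # weekly fuel consumption (MMcf)
n = len(x)

# Totals used by the computational formulas
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Least squares point estimates
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
b1 = ss_xy / ss_xx                   # slope
b0 = sum_y / n - b1 * (sum_x / n)    # y-intercept

# Standard error s = sqrt(SSE / (n - 2)); assumed definition, not given in the slides
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard errors of the estimates and their t statistics
s_b1 = s / math.sqrt(ss_xx)
s_b0 = s * math.sqrt(1 / n + (sum_x / n) ** 2 / ss_xx)
t_b1 = b1 / s_b1
t_b0 = b0 / s_b0

# 95 percent confidence interval for the slope.
# T_CRIT is t_0.025 with n - 2 = 6 degrees of freedom, copied from Example 13.7.
T_CRIT = 2.447
ci_b1 = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)

print(f"SSxy = {ss_xy:.4f}, SSxx = {ss_xx:.3f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")
print(f"s_b1 = {s_b1:.5f}, t for slope = {t_b1:.2f}")
print(f"95% CI for slope = [{ci_b1[0]:.4f}, {ci_b1[1]:.4f}]")
```

Rejecting H0: β1 = 0 at the 0.05 level then amounts to checking that |t_b1| exceeds T_CRIT. Computing the p-value itself would need a t-distribution function, which the standard library does not provide.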