Download CH2

Introduction to Linear Regression Chap 2-1 Scatter Plots and Correlation  A scatter plot (or scatter diagram) is used to show the relationship between two variables  Correlation analysis is used to measure strength of the association (linear relationship) between two variables  Only concerned with strength of the relationship  No causal effect is implied Chap 2-2 Scatter Plot Examples Linear relationships y Curvilinear relationships y x y x y x x Chap 2-3 Scatter Plot Examples (continued) Strong relationships y Weak relationships y x y x y x x Chap 2-4 Scatter Plot Examples (continued) No relationship y x y x Chap 2-5 Correlation Coefficient (continued)   The population correlation coefficient ρ (rho) measures the strength of the association between the variables The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations Chap 2-6 Features of ρ and r      Unit free Range between -1 and 1 The closer to -1, the stronger the negative linear relationship The closer to 1, the stronger the positive linear relationship The closer to 0, the weaker the linear relationship Chap 2-7 Examples of Approximate r Values y y y x r = -1 r = -.6 y x x r=0 y r = +.3 x r = +1 x Chap 2-8 Calculating the Correlation Coefficient Sample correlation coefficient: r  ( x  x)( y  y) [ ( x  x ) ][  ( y  y ) ] 2 2 or the algebraic equivalent: r n xy   x  y [n( x 2 )  ( x )2 ][n(  y 2 )  ( y )2 ] where: r = Sample correlation coefficient n = Sample size x = Value of the independent variable y = Value of the dependent variable Chap 2-9 Calculation Example Tree Height Trunk Diameter y x xy y2 x2 35 8 280 1225 64 49 9 441 2401 81 27 7 189 729 49 33 6 198 1089 36 60 13 780 3600 169 21 7 147 441 49 45 11 495 2025 121 51 12 612 2601 144 =321 =73 =3142 =14111 =713 Chap 2-10 Calculation Example (continued) Tree Height, y 70 r n xy   x  y [n(  x 2 )  (  x) 2 ][n(  y 2 )  (  y) 2 ] 60  50 40 8(3142)  (73)(321) [8(713)  (73)2 ][8(14111)  (321)2 ]  0.886 30 20 10 0 0 2 4 6 8 10 Trunk Diameter, x 12 14 r = 0.886 → relatively strong positive linear association between x and y Chap 2-11 Significance Test for Correlation   Hypotheses H0: ρ = 0 HA: ρ ≠ 0 (no correlation) (correlation exists) Test statistic  t r (with n – 2 degrees of freedom) 1 r n2 2 Chap 2-12 Example: Produce Stores Is there evidence of a linear relationship between tree height and trunk diameter at the .05 level of significance? H0: ρ = 0 H1: ρ ≠ 0 (No correlation) (correlation exists)  =.05 , df = 8 - 2 = 6 t r 1 r 2 n2  .886 1  .886 2 82 Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc.  4.68 Chap 2-13 Example: Test Solution t r 1 r 2 n2  .886 1  .886 2 82 Decision: Reject H0  4.68 Conclusion: There is evidence of a linear relationship at the 5% level of significance d.f. = 8-2 = 6 /2=.025 Reject H0 -tα/2 -2.4469 /2=.025 Do not reject H0 0 Reject H0 tα/2 2.4469 4.68 Chap 2-14 Introduction to Regression Analysis  Regression analysis is used to:  Predict the value of a dependent variable based on the value of at least one independent variable  Explain the impact of changes in an independent variable on the dependent variable Dependent variable: the variable we wish to explain Independent variable: the variable used to explain the dependent variable Chap 2-15 Simple Linear Regression Model  Only one independent variable, x  Relationship between x and y is described by a linear function  Changes in y are assumed to be caused by changes in x Chap 2-16 Types of Regression Models Positive Linear Relationship Negative Linear Relationship Relationship NOT Linear No Relationship Chap 2-17 Population Linear Regression The population regression model: Population y intercept Dependent Variable Population Slope Coefficient Independent Variable y  β0  β1x  ε Linear component Random Error term, or residual Random Error component Chap 2-18 Linear Regression Assumptions  Error values (ε) are statistically independent  Error values are normally distributed for any given value of x  The probability distribution of the errors is normal  The probability distribution of the errors has constant variance  The underlying relationship between the x variable and the y variable is linear Chap 2-19 Population Linear Regression y y  β0  β1x  ε (continued) Observed Value of y for xi εi Predicted Value of y for xi Slope = β1 Random Error for this x value Intercept = β0 xi x Chap 2-20 Estimated Regression Model The sample regression line provides an estimate of the population regression line Estimated (or predicted) y value Estimate of the regression intercept Estimate of the regression slope ŷi  ˆ0  ˆ1x Independent variable The individual random error terms ei have a mean of zero Chap 2-21 Least Squares Criterion ̂ 0 and ̂1 are obtained by finding the values of ̂ 0and ̂1 that minimize the sum of the squared residuals  2 ˆ  e   (y y) 2 2 ˆ ˆ   (y  (0  1x)) Chap 2-22 The Least Squares Equation  The formulas for ̂1 and ̂ 0 are: ( x  x )( y  y ) S xy  ˆ 1   2 S xx  (x  x ) algebraic equivalent: ˆ1  x y   xy  S xy n  2 ( x ) S xx 2 x  n and ˆ0  y  ˆ1 x Chap 2-23 Interpretation of the Slope and the Intercept   ̂ 0 is the estimated average value of y when the value of x is zero ̂1 is the estimated change in the average value of y as a result of a oneunit change in x Chap 2-24 Finding the Least Squares Equation   The coefficients ̂0 and ̂1 will usually be found using computer software, such as Minitab Other regression measures will also be computed as part of computer-based regression analysis Chap 2-25 Simple Linear Regression Example  A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)  A random sample of 10 houses is selected  Dependent variable (y) = house price in $1000s  Independent variable (x) = square feet Chap 2-26 Sample Data for House Price Model House Price in $1000s (y) Square Feet (x) 245 1400 312 1600 279 1700 308 1875 199 1100 219 1550 405 2350 324 2450 319 1425 255 1700 Chap 2-27 Regression Using MINATAB  Stat / Regression/ Regression Regression Analysis: House price versus Square Feet The regression equation is  House price = 98.2 + 0.110 Square Feet Predictor Coef SE Coef T P Constant 98.25 58.03 1.69 0.129 Square Feet 0.110 0.03297 3.33 0.010 S = 41.3303 R-Sq = 58.1% R-Sq(adj) = 52.8% Chap 2-28 MINITAB OUTPUT Analysis of Variance Source DF SS MS F P Regression 1 18935 18935 11.08 0.010 Residual Error 8 13666 1708 Total 9 32600 Chap 2-29 Graphical Presentation Stat /Regression/ Fitted Line Plot Chap 2-30 Interpretation of the Intercept, ̂ 0 House price = 98.25 + 0.110 Square Feet  ̂ 0 is the estimated average value of Y when the value of X is zero (if x = 0 is in the range of observed x values)  Here, no houses had 0 square feet, so ̂ 0 = 98.25 just indicates that, for houses within the range of sizes observed, $98,250 is the portion of the house price not explained by square feet Chap 2-31 Interpretation of the Slope Coefficient, ̂ 1 House price = 98.2 + 0.110 Square Feet  ̂1 measures the estimated change in the average value of Y as a result of a oneunit change in X  Here, ̂1 = .110 tells us that the average value of a house increases by 0.110 ($1000) = $110, on average, for each additional one square foot of size Chap 2-32 Least Squares Regression Properties  The sum of the residuals from the least squares regression line is 0 (  ( y yˆ )  0 )  The sum of the squared residuals is a minimum (minimized ( y yˆ ) 2 )  The simple regression line always passes through the mean of the y variable and the mean of the x variable  The least squares coefficients are unbiased  estimates of β0 and β1 Chap 2-33 Explained and Unexplained Variation  Total variation is made up of two parts:  SST Total sum of Squares SST   ( y  y )2 SSRe s Sum of Squares Error (Residual) SSRe s   ( y  yˆ )2  SSR Sum of Squares Regression SSR   ( yˆ  y )2 where: y = Average value of the dependent variable y = Observed values of the dependent variable ŷ = Estimated value of y for the given x value Chap 2-34 Explained and Unexplained Variation (continued)  SST = total sum of squares   SSRe s = error (residual) sumSS of squares T   Measures the variation of the yi values around their mean y Variation attributable to factors other than the relationship between x and y SS R = regression sum of squares  Explained variation attributable to the relationship between x and y Chap 2-35 Explained and Unexplained Variation (continued) y yi SSRe s  2 = (yi - yi ) SS R  _ 2 = (yi - y)  y _  y SST = (yi - y)2 _ y Xi _ y x Chap 2-36 Coefficient of Determination, R2  The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable  The coefficient of determination is also called R-squared and is denoted as R2 SS R R  SST 2 where 0  R2  1 Chap 2-37 Coefficient of Determination, R2 (continued) Coefficient of determination SS R sum of squares explained by regression R   SST total sum of squares 2 Note: In the single independent variable case, the coefficient of determination is R r 2 2 where: R2 = Coefficient of determination r = Simple correlation coefficient Chap 2-38 Examples of Approximate R2 Values y R2 = 1 R2 = 1 x 100% of the variation in y is explained by variation in x y R2 = +1 Perfect linear relationship between x and y: x Chap 2-39 Examples of Approximate R2 Values y 0 < R2 < 1 x Weaker linear relationship between x and y: Some but not all of the variation in y is explained by variation in x y x Chap 2-40 Examples of Approximate R2 Values R2 = 0 y No linear relationship between x and y: R2 = 0 x The value of Y does not depend on x. (None of the variation in y is explained by variation in x) Chap 2-41 Standard Error Estimation House price = 98.2 + 0.110 Square Feet Predictor Coef SE Coef T P Constant 98.25 58.03 1.69 0.129 Square Feet 0.110 0.03297 3.33 0.010 S = 41.3303 R-Sq = 58.1% R-Sq(adj) = 52.8% Source DF SS MS F P Regression 1 18935 18935 11.08 0.010 Residual Error 8 13666 1708 Total 9 32600 Chap 2-42 Standard Error of Estimate  The standard deviation of the variation of observations around the regression line is estimated by MS Re s SS Re s  n  k 1 Where SSRe s = Sum of squares error (residual) n = Sample size k = number of independent variables in the model Chap 2-43 The Standard Deviation of the Regression Slope  The standard error of the regression slope coefficient ( ̂1) is estimated by Se(ˆ1 )  MS Res 2 (x  x)   MS Res 2 ( x)  2 x  n where: Se( ˆ1 ) = Estimate of the standard error of the least squares slope MSRe s SSRe s  n2 = Sample standard error of the estimate Chap 2-44 Comparing Standard Errors y Variation of observed y values from the regression line small MS Res y x y Variation in the slope of regression lines from different possible samples small Se(ˆ1 ) x large Se(ˆ1 ) x y large MSRe s x Chap 2-45 Inference about the Slope: t Test  t test for a population slope   Null and alternative hypotheses    Is there a linear relationship between x and y? H0: β1 = 0 H1: β1  0 (no linear relationship) (linear relationship does exist) Test statistic   ˆ1  β1 t Se( ˆ1 ) d.f.  n  2 where: ̂1= Sample regression slope coefficient β1 = Hypothesized slope Se( ˆ1 ) = Estimator of the standard error of the slope Chap 2-46 Inference about the Slope: t Test (continued) House Price in $1000s (y) Square Feet (x) 245 1400 312 1600 279 1700 308 1875 199 1100 219 1550 405 2350 324 2450 319 1425 255 1700 Estimated Regression Equation: House price = 98.25 + 0.110 Square Feet The slope of this model is 0.110 Does square footage of the house affect its sales price? Chap 2-47 Inferences about the Slope: t Test Example Test Statistic: t = 3.33 H0: β1 = 0 HA: β1  0 From MINITAB output: Predictor Constant Coef SE Coef 98.25 58.03 T P 1.69 0.129 Square Feet 0.110 0.03297 3.33 0.010 d.f. = 10-2 = 8 /2=.025 Reject H0 /2=.025 Do not reject H0 -tα/2 -2.3060 0 Reject H 0 tα/2 2.3060 3.329 Decision: Reject H0 Conclusion: There is sufficient evidence that square footage affects house price Chap 2-48 Regression Analysis for Description Confidence Interval Estimate of the Slope: ˆ1  t /2 Se(ˆ1 ) d.f. = n - 2 At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858) This 95% confidence interval does not include 0. Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance Chap 2-49 Confidence Interval for the Average y, Given x Confidence interval estimate for the mean of y given a particular x 0 Size of interval varies according to distance away from mean, x yˆ  t /2 1 (x 0  x) MSRe s [  ] 2 n  (x  x) 2 Chap 2-50 Confidence Interval for an Individual y, Given x Confidence interval estimate for an Individual value of y given a particular x 0 yˆ  t /2 1 (x 0  x) MSRe s [1   ] 2 n  (x  x) 2 This extra term adds to the interval width to reflect the added uncertainty for an individual case Chap 2-51 Interval Estimates for Different Values of x y Prediction Interval for an individual y, given x 0 ̂ 0 Confidence Interval for the mean of y, given x 0 ̂1 x xp x Chap 2-52 Example: House Prices House Price in $1000s (y) Square Feet (x) 245 1400 312 1600 279 1700 308 1875 199 1100 219 1550 405 2350 324 2450 319 1425 255 1700 Estimated Regression Equation: House price = 98.25 + 0.110 Square Feet Predict the price for a house with 2000 square feet Chap 2-53 Example: House Prices (continued) Predict the price for a house with 2000 square feet: house price  98.25  0.110 (sq.ft.)  98.25  0.110(2000)  318.25 The predicted price for a house with 2000 square feet is 318.25($1,000s) = $318,250 Chap 2-54 Estimation of Mean Values: Example Confidence Interval Estimate for E(y)| x 0 Find the 95% confidence interval for the average price of 2,000 square-foot houses  Predicted Price Yi = 318.25 ($1,000s) ŷ  t α/2 1 (x 0  x) 2 MS Res [  ]  318.25  37.12 2 n  (x  x) The confidence interval endpoints are 281.13 -- 355.37, or from $281,13 -- $355,37 Chap 2-55 Estimation of Individual Values: Example Prediction Interval Estimate for y| x 0 Find the 95% confidence interval for an individual house with 2,000 square feet  Predicted Price Yi = 317.85 ($1,000s) ŷ  t α/2 1 (x 0  x) 2 MSRe s [1   ]  318.25  102.28 2 n  (x  x) The prediction interval endpoints are 215.97 -- 420.53, or from $215,97 -- $420,53 Chap 2-56 Finding Confidence and Prediction Intervals MINITAB Stat\Regression\Regression-option Predicted Values for New Observations New Obs Fit SE Fit 95% CI 95% PI 1 317.8 16.1 (280.7, 354.9) (215.5, 420.1) Values of Predictors for New Observations Square Feet New Obs 1 2000 Chap 2-57 Finding Confidence and Prediction Intervals MINITAB Stat\Regression\fitted line plot Chap 2-58

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download CH2