A PowerPoint Presentation Package to Accompany
Applied Statistics in Business & Economics, 4th edition
David P. Doane and Lori E. Seward
Prepared by Lloyd R. Jaisingh
McGraw-Hill/Irwin
Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 12: Simple Regression

Chapter Contents
12.1 Visual Displays and Correlation Analysis
12.2 Simple Regression
12.3 Regression Terminology
12.4 Ordinary Least Squares Formulas
12.5 Tests for Significance
12.6 Analysis of Variance: Overall Fit
12.7 Confidence and Prediction Intervals for Y
12.8 Residual Tests
12.9 Unusual Observations
12.10 Other Regression Problems

Chapter Learning Objectives
LO12-1: Calculate and test a correlation coefficient for significance.
LO12-2: Interpret the slope and intercept of a regression equation.
LO12-3: Make a prediction for a given x value using a regression equation.
LO12-4: Fit a simple regression on an Excel scatter plot.
LO12-5: Calculate and interpret confidence intervals for regression coefficients.
LO12-6: Test hypotheses about the slope and intercept by using t tests.
LO12-7: Perform regression with Excel or other software.
LO12-8: Interpret the standard error, R², ANOVA table, and F test.
LO12-9: Distinguish between confidence and prediction intervals.
LO12-10: Test residuals for violations of regression assumptions.
LO12-11: Identify unusual residuals and high-leverage observations.

12.1 Visual Displays and Correlation Analysis

Visual Displays
• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
• A scatter plot
  - displays each observed data pair (xi, yi) as a dot on an X/Y grid.
  - indicates visually the strength of the relationship between the two variables.
(Figure: sample scatter plot.)

LO12-1: Calculate and test a correlation coefficient for significance.
Correlation Coefficient
• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y, with −1 ≤ r ≤ +1. A value of r = 0 indicates no linear relationship.
(Figure: scatter plots illustrating strong positive, strong negative, weak positive, weak negative, and no correlation, plus a nonlinear relation.)

Steps in Testing if ρ = 0 (Tests for Significance)
• Step 1: State the hypotheses. Determine whether you are using a one- or two-tailed test and the level of significance (α).
  H0: ρ = 0
  H1: ρ ≠ 0
• Step 2: Specify the decision rule. For degrees of freedom d.f. = n − 2, look up the critical value t_α in Appendix D.
• Step 3: Calculate the test statistic:
  t_calc = r √(n − 2) / √(1 − r²)
• Step 4: Make the decision. If the sample correlation coefficient r exceeds the critical value r_α, reject H0. If using the t statistic method, reject H0 if t_calc > t_α or if the p-value ≤ α.
• Note: r is an estimate of the population correlation coefficient ρ (rho).

Critical Value for the Correlation Coefficient (Tests for Significance)
• Equivalently, you can calculate the critical value for the correlation coefficient itself:
  r_α = t_α / √(t_α² + n − 2)
• This method gives a benchmark for the correlation coefficient. However, it yields no p-value and is inflexible if you change your mind about α.

Quick Rule for Significance
• A quick test for significance of a correlation at α = .05 is |r| > 2/√n.

12.2 Simple Regression

LO12-2: Interpret the slope and intercept of a regression equation.

What is Simple Regression?
• Simple regression analyzes the relationship between two variables.
• It specifies one dependent (response) variable and one independent (predictor) variable.
• The hypothesized relationship here will be linear.
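As an aside not in the original slides, the Section 12.1 test of H0: ρ = 0 can be sketched in Python. The data below are invented purely for illustration; the formulas are the ones given above.

```python
# Sketch (made-up data): sample correlation r, its t statistic with
# d.f. = n - 2, and the quick-rule benchmark |r| > 2/sqrt(n).
import math

x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
y = [3.1, 5.2, 6.0, 8.3, 8.8, 11.1]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample correlation coefficient r
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)

# Test statistic t_calc = r * sqrt(n - 2) / sqrt(1 - r^2)
t_calc = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Quick rule at alpha = .05: |r| > 2 / sqrt(n)
quick_cutoff = 2 / math.sqrt(n)

print(round(r, 4), round(t_calc, 2), abs(r) > quick_cutoff)
```

For this (strongly linear) sample, r is close to 1 and t_calc far exceeds any two-tailed t critical value at d.f. = 4, so H0: ρ = 0 would be rejected.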
Models and Parameters
• The assumed model for a linear relationship is y = β0 + β1x + ε. The relationship holds for all pairs (xi, yi).
• The error term ε is not observable; it is assumed normally distributed with mean 0 and standard deviation σ.
• The unknown parameters are:
  β0  the intercept
  β1  the slope
• The fitted model used to predict the expected value of Y for a given value of X is ŷ = b0 + b1x.
• The fitted coefficients are:
  b0  the estimated intercept
  b1  the estimated slope

12.3 Regression Terminology

LO12-4: Fit a simple regression on an Excel scatter plot.
• A more precise method is to let Excel calculate the estimates. Enter observations on the independent variable x1, x2, . . ., xn and the dependent variable y1, y2, . . ., yn into separate columns, and let Excel fit the regression equation, as illustrated in Figure 12.6. Excel will choose the regression coefficients so as to produce a good fit.

12.4 Ordinary Least Squares (OLS) Formulas

Slope and Intercept
• The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the residuals are small:
  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
  b0 = ȳ − b1 x̄

Coefficient of Determination (Assessing the Fit)
• R² is a measure of relative fit based on a comparison of SSR (regression sum of squares) and SST (total sum of squares). One can use technology to compute it:
  R² = SSR/SST = 1 − SSE/SST
• Often expressed as a percent, R² = 1 (i.e., 100%) indicates a perfect fit.
• In a bivariate regression, R² = r².

12.5 Tests for Significance

LO12-5: Calculate and interpret confidence intervals for regression coefficients.

Confidence Intervals for Slope and Intercept
• Confidence intervals for the true slope and intercept, using d.f. = n − 2:
  b1 ± t_{α/2} s_{b1}  and  b0 ± t_{α/2} s_{b0}
• Note: One can use Excel, Minitab, MegaStat or other technologies to compute these intervals and do hypothesis tests relating to linear regression.
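The OLS formulas and the slope confidence interval above can be computed directly. This is a minimal sketch with invented data (not from the textbook); the t critical value 2.776 is the two-tailed α = .05 value for d.f. = n − 2 = 4.

```python
# Sketch (made-up data): OLS slope/intercept, R^2, and a 95% CI for the slope.
import math

x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
y = [3.1, 5.2, 6.0, 8.3, 8.8, 11.1]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# OLS: b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2); b0 = ybar - b1*xbar
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

# Sums of squares and R^2 = 1 - SSE/SST
fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - sse / sst

# Standard error of the estimate, then standard error of the slope
s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(sxx)

# 95% CI for the true slope: b1 +/- t * s_b1 (t from Appendix D, d.f. = 4)
t_crit = 2.776
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
print(round(b1, 4), round(b0, 4), round(r_squared, 4), ci)
```

If the interval excludes zero, the slope is significant at α = .05, which anticipates the t test in the next section.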
LO12-6: Test hypotheses about the slope and intercept by using t tests.

Hypothesis Tests
• If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error.
• The hypotheses to be tested are:
  H0: β1 = 0
  H1: β1 ≠ 0  with d.f. = n − 2
• Reject H0 if |t_calc| > t_{α/2} or if p-value ≤ α.

12.6 Analysis of Variance: Overall Fit

LO12-8: Interpret the standard error, R², ANOVA table, and F test.

F Test for Overall Fit
• To test a regression for overall significance, we use an F test to compare the explained (SSR) and unexplained (SSE) sums of squares:
  F_calc = MSR/MSE = (SSR/1) / (SSE/(n − 2))
• Reject H0 (no significant relationship) if F_calc > F_{α, 1, n−2} or if p-value ≤ α.

12.7 Confidence and Prediction Intervals for Y

LO12-9: Distinguish between confidence and prediction intervals for Y.

How to Construct an Interval Estimate for Y
• Confidence interval for the conditional mean of Y:
  ŷ ± t_{α/2} s_e √(1/n + (x − x̄)²/Σ(xi − x̄)²)
• Prediction interval for an individual Y:
  ŷ ± t_{α/2} s_e √(1 + 1/n + (x − x̄)²/Σ(xi − x̄)²)
• Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.

12.8 Residual Tests

LO12-10: Test residuals for violations of regression assumptions.

Three Important Assumptions
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
Note: One can use the appropriate technology (MINITAB, Excel, etc.) to test for violations of these assumptions.

12.9 Unusual Observations

LO12-11: Identify unusual residuals and high-leverage observations.

Standardized Residuals
• One can use Excel, Minitab, MegaStat or other technologies to compute standardized residuals.
• If the absolute value of any standardized residual is at least 2, it is classified as unusual.

Leverage and Influence
• A high leverage statistic indicates the observation is far from the mean of X.
• These observations are influential because they are at the "end of the lever."
• The leverage for observation i is denoted hi.
• A leverage that exceeds 3/n is unusual.

12.10 Other Regression Problems

Outliers
• Outliers may be caused by:
  - an error in recording data,
  - impossible data, or
  - an observation influenced by an unspecified "lurking" variable that should have been controlled but wasn't.
• To fix the problem:
  - delete the observation(s) or the erroneous data, or
  - formulate a multiple regression model that includes the lurking variable.

Model Misspecification
• If a relevant predictor has been omitted, then the model is misspecified.
• Use multiple regression instead of bivariate regression.

Ill-Conditioned Data
• Well-conditioned data values are of the same general order of magnitude.
• Ill-conditioned data have unusually large or small data values and can cause loss of regression accuracy or awkward estimates.
• Avoid mixing magnitudes by adjusting the magnitude of your data before running the regression.

Spurious Correlation
• In a spurious correlation, two variables appear related because of the way they are defined. This problem is called the size effect or problem of totals.

Model Form and Variable Transforms
• Sometimes a nonlinear model is a better fit than a linear model. Excel offers many model forms.
• Variables may be transformed (e.g., with logarithmic or exponential functions) in order to provide a better fit.
• Log transformations reduce heteroscedasticity.
• Nonlinear models may be difficult to interpret.
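The effect of a log transform can be seen in a small sketch. The data below are hypothetical (not from the textbook) and grow roughly exponentially, so a straight line fits poorly in the original units but very well after taking ln(y).

```python
# Sketch (made-up data): compare R^2 for y = b0 + b1*x versus the
# transformed model ln(y) = b0 + b1*x.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.7, 7.4, 20.1, 54.6, 148.4, 403.4]  # roughly exponential growth

def ols(xs, ys):
    """Return (b0, b1, R^2) for a simple OLS fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) \
         / sum((a - mx) ** 2 for a in xs)
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys))
    sst = sum((b - my) ** 2 for b in ys)
    return b0, b1, 1 - sse / sst

_, _, r2_linear = ols(x, y)
_, _, r2_log = ols(x, [math.log(v) for v in y])
print(round(r2_linear, 3), round(r2_log, 3))
```

Here the log-transformed fit has a much higher R², illustrating why a variable transform can rescue an apparently poor bivariate model, though the slope must then be interpreted on the log scale.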