A PowerPoint Presentation Package to Accompany
Applied Statistics in Business &
Economics, 4th edition
David P. Doane and Lori E. Seward
Prepared by Lloyd R. Jaisingh
McGraw-Hill/Irwin
Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 12
Simple Regression
Chapter Contents
12.1 Visual Displays and Correlation Analysis
12.2 Simple Regression
12.3 Regression Terminology
12.4 Ordinary Least Squares Formulas
12.5 Tests for Significance
12.6 Analysis of Variance: Overall Fit
12.7 Confidence and Prediction Intervals for Y
12.8 Residual Tests
12.9 Unusual Observations
12.10 Other Regression Problems
12-2
Chapter 12
Simple Regression
Chapter Learning Objectives
LO12-1: Calculate and test a correlation coefficient for significance.
LO12-2: Interpret the slope and intercept of a regression equation.
LO12-3: Make a prediction for a given x value using a regression equation.
LO12-4: Fit a simple regression on an Excel scatter plot.
LO12-5: Calculate and interpret confidence intervals for regression coefficients.
LO12-6: Test hypotheses about the slope and intercept by using t tests.
LO12-7: Perform regression with Excel or other software.
LO12-8: Interpret the standard error, R², ANOVA table, and F test.
LO12-9: Distinguish between confidence and prediction intervals.
LO12-10: Test residuals for violations of regression assumptions.
LO12-11: Identify unusual residuals and high-leverage observations.
12-3
Chapter 12
LO12-1
12.1 Visual Displays and
Correlation Analysis
Visual Displays
• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
• A scatter plot
  - displays each observed data pair (xi, yi) as a dot on an X/Y grid.
  - indicates visually the strength of the relationship between the two variables.
Sample Scatter Plot
12-4
Chapter 12
LO12-1
12.1 Visual Displays and
Correlation Analysis
LO12-1: Calculate and test a correlation coefficient for significance.
Correlation Coefficient
• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y:
  r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² Σ(yi - ȳ)²]
• Its value always lies in the range -1 ≤ r ≤ +1.
• r = 0 indicates no linear relationship.
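As a quick illustration (not from the textbook, and using made-up example data), a minimal Python sketch computes r both from its defining formula and with NumPy:

```python
import numpy as np

# Hypothetical example data (not from the textbook)
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([3.1, 5.0, 5.9, 8.2, 8.8, 11.3])

# Sample correlation from its defining formula
r_formula = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)

# Same result from NumPy's correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_formula, r_numpy)   # both lie between -1 and +1
```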
12-5
Chapter 12
LO12-1
12.1 Visual Displays and
Correlation Analysis
Scatter Plots Showing Various Correlation Values
(Figure: six sample scatter plots labeled Strong Positive Correlation, Weak Positive Correlation, No Correlation, Strong Negative Correlation, Weak Negative Correlation, and Nonlinear Relation.)
12-6
Chapter 12
LO12-1
12.1 Visual Displays and
Correlation Analysis
Steps in Testing if r = 0 (Tests for Significance)
• Step 1: State the Hypotheses.
  Determine whether you are using a one- or two-tailed test and the level of significance (α).
  H0: ρ = 0
  H1: ρ ≠ 0
• Step 2: Specify the Decision Rule.
  For degrees of freedom d.f. = n - 2, look up the critical value tα in Appendix D.
• Step 3: Calculate the Test Statistic.
  tcalc = r √(n - 2) / √(1 - r²)
• Step 4: Make the Decision.
  If the sample correlation coefficient r exceeds the critical value rα, then reject H0.
  If using the t statistic method, reject H0 if tcalc > tα or if the p-value ≤ α.
• Note: r is an estimate of the population correlation coefficient ρ (rho).
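A minimal sketch of the four-step test in Python (scipy assumed available; the data and α are made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical sample
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([3.1, 5.0, 5.9, 8.2, 8.8, 11.3])
alpha = 0.05                                   # Step 1: two-tailed test of H0: rho = 0

n = len(x)
df = n - 2
t_crit = stats.t.ppf(1 - alpha / 2, df)        # Step 2: critical value from the t table

r = np.corrcoef(x, y)[0, 1]
t_calc = r * np.sqrt(df) / np.sqrt(1 - r**2)   # Step 3: test statistic
p_value = 2 * stats.t.sf(abs(t_calc), df)

# Step 4: decision
print("reject H0" if abs(t_calc) > t_crit or p_value <= alpha else "fail to reject H0")
```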
12-7
Chapter 12
LO12-1
12.1 Visual Displays and
Correlation Analysis
Critical Value for Correlation Coefficient (Tests for Significance)
• Equivalently, you can calculate the critical value for the correlation coefficient using
  rα = tα / √(tα² + n - 2)
• This method gives a benchmark for the correlation coefficient.
• However, it yields no p-value and is inflexible if you change your mind about α.
Quick Rule for Significance
• A quick test for significance of a correlation at α = .05 is |r| > 2/√n.
12-8
Chapter 12
LO12-2
12.2 Simple Regression
LO12-2: Interpret the slope and intercept of a regression equation.
What is Simple Regression?
• Simple regression analyzes the relationship between two variables.
• It specifies one dependent (response) variable and one independent (predictor) variable.
• The hypothesized relationship here is linear.
12-9
Chapter 12
LO12-2
12.2 Simple Regression
Models and Parameters
• The assumed model for a linear relationship is y = β0 + β1x + ε.
• The relationship holds for all pairs (xi, yi).
• The error term ε is not observable; it is assumed normally distributed with mean 0 and standard deviation σ.
• The unknown parameters are
  β0 - the intercept
  β1 - the slope
• The fitted model used to predict the expected value of Y for a given value of X is ŷ = b0 + b1x.
• The fitted coefficients are
  b0 - the estimated intercept
  b1 - the estimated slope
12-10
Chapter 12
LO12-4
12.3 Regression Terminology
LO12-4: Fit a simple regression on an Excel scatter plot.
A more precise method is to let Excel calculate the estimates. We enter observations on the independent variable x1, x2, . . ., xn and the dependent variable y1, y2, . . ., yn into separate columns, and let Excel fit the regression equation, as illustrated in Figure 12.6. Excel will choose the regression coefficients so as to produce a good fit.
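Excel's scatter-plot trendline cannot be reproduced here, but a rough Python equivalent (NumPy's least-squares polynomial fit, with hypothetical data) shows the same idea of letting software choose the coefficients:

```python
import numpy as np

# Hypothetical x and y columns, standing in for the Excel worksheet columns
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# Degree-1 fit: returns the slope and intercept that minimize squared residuals
slope, intercept = np.polyfit(x, y, 1)
print(f"y-hat = {intercept:.3f} + {slope:.3f} x")
```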
12-11
Chapter 12
12.4 Ordinary Least Squares (OLS)
Formulas
Slope and Intercept
• The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the sum of squared residuals is as small as possible:
  b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²  or, equivalently,  b1 = r (sy / sx)
  b0 = ȳ - b1 x̄
Coefficient of Determination (Assessing the Fit)
• R² is a measure of relative fit based on a comparison of SSR (regression sum of squares) and SST (total sum of squares): R² = SSR/SST. One can use technology to compute it.
• Often expressed as a percent, an R² = 1 (i.e., 100%) indicates a perfect fit.
• In a bivariate regression, R² = r².
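A minimal sketch of the OLS formulas and R² using made-up data; the textbook's Excel or MegaStat output would report the same quantities:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# OLS slope and intercept from the formulas
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Sums of squares and coefficient of determination
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)          # unexplained variation
sst = np.sum((y - y.mean()) ** 2)       # total variation
ssr = sst - sse                         # explained variation
r_squared = ssr / sst                   # equals r**2 in bivariate regression

print(b0, b1, r_squared)
```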
12-12
Chapter 12
LO12-5
12.5 Tests for Significance
Confidence Intervals for Slope and Intercept
LO12-5: Calculate and interpret confidence intervals for regression coefficients.
• Confidence interval for the true slope and intercept:
  b1 ± tα/2 s(b1)   and   b0 ± tα/2 s(b0),   with d.f. = n - 2
• Note: One can use Excel, Minitab, MegaStat, or other technologies to compute these intervals and do hypothesis tests relating to linear regression.
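A sketch of the interval computations using the standard formulas and hypothetical data; Excel or MegaStat would report the same intervals:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
alpha = 0.05

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Standard error of the estimate and of the coefficients
se = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
sxx = np.sum((x - x.mean()) ** 2)
s_b1 = se / np.sqrt(sxx)
s_b0 = se * np.sqrt(1 / n + x.mean() ** 2 / sxx)

t = stats.t.ppf(1 - alpha / 2, n - 2)
print("slope CI:    ", b1 - t * s_b1, b1 + t * s_b1)
print("intercept CI:", b0 - t * s_b0, b0 + t * s_b0)
```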
12-13
12.5 Tests for Significance
LO12-6: Test hypotheses about the slope and intercept by using t tests.
Chapter 12
LO12-6
Hypothesis Tests
• If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error.
• The hypotheses to be tested are H0: β1 = 0 versus H1: β1 ≠ 0, with test statistic tcalc = b1 / s(b1) and d.f. = n - 2.
• Reject H0 if |tcalc| > tα/2 or if p-value ≤ α.
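A sketch of the slope t test, reusing the same hypothetical data and standard-error formula as above:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
alpha = 0.05

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
se = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = se / np.sqrt(np.sum((x - x.mean()) ** 2))

t_calc = b1 / s_b1                              # test of H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_calc), n - 2)
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)

print("reject H0" if abs(t_calc) > t_crit or p_value <= alpha else "fail to reject H0")
```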
12-14
Chapter 12
LO12-8
12.6 Analysis of Variance: Overall Fit
LO12-8: Interpret the standard error, R2, ANOVA table, and F test.
F Test for Overall Fit
• To test a regression for overall significance, we use an F test to compare the explained (SSR) and unexplained (SSE) sums of squares:
  Fcalc = MSR / MSE = (SSR/1) / (SSE/(n - 2))
• Reject H0 (no significant relationship) if Fcalc > Fα,1,n-2 or if p-value ≤ α.
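A sketch of the ANOVA-style F test for overall fit with hypothetical data; in simple regression, Fcalc equals tcalc² for the slope:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
alpha = 0.05

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
ssr = sst - sse

f_calc = (ssr / 1) / (sse / (n - 2))            # MSR / MSE
f_crit = stats.f.ppf(1 - alpha, 1, n - 2)
p_value = stats.f.sf(f_calc, 1, n - 2)

print("reject H0" if f_calc > f_crit or p_value <= alpha else "fail to reject H0")
```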
12-15
Chapter 12
LO12-9
12.7 Confidence and Prediction
Intervals for Y
LO12-9: Distinguish between confidence and prediction intervals for Y.
How to Construct an Interval Estimate for Y
• Confidence interval for the conditional mean of Y:
  ŷ ± tα/2 se √(1/n + (x - x̄)² / Σ(xi - x̄)²)
• Prediction interval for an individual Y:
  ŷ ± tα/2 se √(1 + 1/n + (x - x̄)² / Σ(xi - x̄)²)
• Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
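A sketch contrasting the two intervals at a chosen x value (standard formulas, hypothetical data); the prediction interval's extra "1 +" term under the square root is what makes it wider:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
x_new = 4.5                                     # chosen x value
alpha = 0.05

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
se = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
sxx = np.sum((x - x.mean()) ** 2)
t = stats.t.ppf(1 - alpha / 2, n - 2)

y_hat = b0 + b1 * x_new
half_ci = t * se * np.sqrt(1 / n + (x_new - x.mean()) ** 2 / sxx)      # mean of Y
half_pi = t * se * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / sxx)  # individual Y

print("confidence interval:", y_hat - half_ci, y_hat + half_ci)
print("prediction interval:", y_hat - half_pi, y_hat + half_pi)        # always wider
```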
12-16
Chapter 12
LO12-10
12.8 Residual Tests
LO12-10: Test residuals for violations of regression assumptions.
Three Important Assumptions
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
Note: One can use the appropriate technology (MINITAB, Excel, etc.) to test for violations of the assumptions, as in the sketch below.
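A rough sketch of checking the three assumptions from the residuals (hypothetical data; the Shapiro-Wilk test, a crude spread comparison, and a hand-rolled Durbin-Watson statistic stand in for the packages' built-in diagnostics):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])        # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.3])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                       # residuals

# 1. Normality: Shapiro-Wilk test on the residuals
w_stat, p_norm = stats.shapiro(e)

# 2. Constant variance: crude check comparing residual spread in the two halves of x
half = len(e) // 2
spread_ratio = np.var(e[half:]) / np.var(e[:half])

# 3. Independence: Durbin-Watson statistic (values near 2 suggest no autocorrelation)
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(p_norm, spread_ratio, dw)
```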
12-17
Chapter 12
LO12-11
12.9 Unusual Observations
LO12-11: Identify unusual residuals and high-leverage observations.
Standardized Residuals
• One can use Excel, Minitab, MegaStat, or other technologies to compute standardized residuals.
• If the absolute value of any standardized residual is at least 2, then it is classified as unusual.
Leverage and Influence
• A high leverage statistic indicates that the observation is far from the mean of X.
• These observations are influential because they are at the “end of the lever.”
• The leverage for observation i is denoted hi.
• A leverage that exceeds 3/n is unusual.
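A sketch of both screening rules with hypothetical data; standardized residuals are computed here as residuals divided by the standard error of the estimate, one common convention (software packages may scale them slightly differently):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 12.0])   # hypothetical data; last x is far out
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 25.0])

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)
se = np.sqrt(np.sum(e ** 2) / (n - 2))

# Standardized residuals: residual divided by the standard error of the estimate
std_resid = e / se
unusual_resid = np.abs(std_resid) >= 2

# Leverage for simple regression: h_i = 1/n + (x_i - x-bar)^2 / sum((x_j - x-bar)^2)
h = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
high_leverage = h > 3 / n                    # the slide's 3/n rule of thumb

print(np.where(unusual_resid)[0], np.where(high_leverage)[0])
```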
12-18
Chapter 12
12.10 Other Regression Problems
Outliers
Outliers may be caused by
- an error in recording data
- impossible data
- an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t.
To fix the problem,
- delete the observation(s)
- delete the data
- formulate a multiple regression model that includes the lurking variable.
12-19
Chapter 12
12.10 Other Regression Problems
Model Misspecification
• If a relevant predictor has been omitted, then the model is misspecified.
• Use multiple regression instead of bivariate regression.
Ill-Conditioned Data
• Well-conditioned data values are of the same general order of magnitude.
• Ill-conditioned data have unusually large or small data values and can cause loss of regression accuracy or awkward estimates.
• Avoid mixing magnitudes by adjusting the magnitude of your data before running the regression.
12-20
Chapter 12
12.10 Other Regression Problems
Spurious Correlation
• In a spurious correlation, two variables appear related because of the way they are defined.
• This problem is called the size effect or problem of totals.
Model Form and Variable Transforms
• Sometimes a nonlinear model is a better fit than a linear model.
• Excel offers many model forms.
• Variables may be transformed (e.g., logarithmic or exponential functions) in order to provide a better fit; a log-transform sketch follows below.
• Log transformations reduce heteroscedasticity.
• Nonlinear models may be difficult to interpret.
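A small sketch of a log transformation before fitting, with hypothetical data that grow multiplicatively (np.log is the natural log):

```python
import numpy as np

# Hypothetical data where y grows multiplicatively with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 4.9, 11.1, 24.8, 54.0, 121.5])

# Fit a line to log(y): log(y) = b0 + b1 x, i.e., y = exp(b0) * exp(b1 x)
b1, b0 = np.polyfit(x, np.log(y), 1)
print(f"log(y-hat) = {b0:.3f} + {b1:.3f} x")
```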
12-21