Chapter 12
Simple Regression
Visual Displays and Correlation Analysis
Simple Regression
Regression Terminology
Ordinary Least Squares Formulas
Tests for Significance
Analysis of Variance: Overall Fit
Confidence and Prediction Intervals for Y
Violations of Assumptions
Unusual Observations
Other Regression Problems
McGraw-Hill/Irwin
Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved.
Visual Displays and Correlation Analysis
Visual Displays
• Scatter Plot – displays each observed data pair (xi, yi) as a dot on an X/Y grid.
Figure 12.1
12-2
Visual Displays and Correlation Analysis
Correlation Analysis
• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y: −1 ≤ r ≤ +1. Values near −1 indicate a strong negative relationship; values near +1 indicate a strong positive relationship.
• r = 0 indicates no linear relationship.
• To test the hypothesis H0: ρ = 0, the test statistic is
  tcalc = r √(n − 2) / √(1 − r²)
12-3
Visual Displays and Correlation Analysis
Tests for Significance
• The test statistic tcalc = r √(n − 2) / √(1 − r²) follows a t distribution.
• The critical value tα is obtained from Appendix D using ν = n − 2 degrees of freedom for any α.
• Equivalently, you can calculate the critical value for the correlation coefficient using
  rcritical = ± tα / √(tα² + n − 2)
• This method gives a direct benchmark for the correlation coefficient.
12-4
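The correlation test above can be sketched in a few lines of Python. The five-point data set is invented for illustration, and tα = 3.182 is the usual Appendix D value for a two-tailed test at α = .05 with ν = 3:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Sample correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# Test statistic for H0: rho = 0, with nu = n - 2 degrees of freedom
t_calc = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Benchmark for r itself, using t_alpha = 3.182 (two-tailed .05, nu = 3)
t_alpha = 3.182
r_critical = t_alpha / math.sqrt(t_alpha ** 2 + n - 2)

# Here |t_calc| < t_alpha (and r < r_critical), so H0 is not rejected
```

Note that comparing r to rcritical reaches the same decision as comparing tcalc to tα, since one formula is just the other solved for r.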
Simple Regression
What is Simple Regression?
• Simple regression analyzes the relationship between two variables.
• It specifies one dependent (response) variable and one independent (predictor) variable.
• The hypothesized relationship may be linear, quadratic, or some other form.
• The unknown parameters are β0 (the intercept) and β1 (the slope).
• The assumed model for a linear relationship is yi = β0 + β1xi + εi for all observations (i = 1, 2, …, n).
• The error term εi is not observable and is assumed to be normally distributed with mean 0 and standard deviation σ.
12-5
Regression Terminology
Models and Parameters
• The fitted model used to predict the expected value of Y for a given value of X is ŷi = b0 + b1xi.
• The fitted coefficients, which can be computed using formulas or technology, are b0 (the estimated intercept) and b1 (the estimated slope).
• The residual is ei = yi − ŷi.
• Residuals may be used to estimate σ, the standard deviation of the errors.
12-6
Ordinary Least Squares Formulas
Slope and Intercept
• The OLS estimator for the slope is:
  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²  or equivalently  b1 = (Σxiyi − n x̄ ȳ) / (Σxi² − n x̄²)
• The OLS estimator for the intercept is:
  b0 = ȳ − b1x̄
12-7
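A minimal Python sketch of the OLS formulas, using an invented five-point data set:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n

# OLS slope from the deviation-form formula
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)

# OLS intercept: the fitted line passes through (xbar, ybar)
b0 = ybar - b1 * xbar

# Fitted values and residuals
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# A property of OLS: the residuals sum to zero
```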
Ordinary Least Squares Formulas
Coefficient of Determination
• R² is a measure of relative fit based on a comparison of the regression sum of squares (SSR) and the total sum of squares (SST):
  R² = SSR / SST = 1 − SSE / SST,  with 0 ≤ R² ≤ 1
• Often expressed as a percent, an R² = 1 (i.e., 100%) indicates a perfect fit.
• In a simple regression, R² = r².
12-8
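The sum-of-squares decomposition behind R², and its equality with r² in simple regression, can be checked numerically on a small invented data set:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Sum-of-squares decomposition: SST = SSR + SSE
sst = syy
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sst - sse

r_squared = ssr / sst

# In simple regression, R^2 equals the squared correlation coefficient
r = sxy / math.sqrt(sxx * syy)
```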
Tests for Significance
• Confidence interval for the true slope:
  b1 ± tα/2 s(b1),  where s(b1) = s / √Σ(xi − x̄)²  and  s = √(SSE / (n − 2))
• Confidence interval for the true intercept:
  b0 ± tα/2 s(b0),  where s(b0) = s √(1/n + x̄² / Σ(xi − x̄)²)
• Hypothesis Tests
• If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error.
12-9
Tests for Significance
• The hypotheses to be tested using technology or formulas are:
  H0: β1 = 0 versus H1: β1 ≠ 0, with test statistic tcalc = (b1 − 0) / s(b1) and ν = n − 2 degrees of freedom.
12-10
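The slope test and its confidence interval can be sketched as follows; the five-point data set is invented, and tα/2 = 3.182 is the Appendix D value for 95% confidence with ν = 3:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Standard error of the estimate: s = sqrt(SSE / (n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope, and the t statistic for H0: beta1 = 0
s_b1 = s / math.sqrt(sxx)
t_calc = (b1 - 0) / s_b1

# 95% confidence interval for the true slope (t_.025 = 3.182 for nu = 3)
t_alpha = 3.182
ci_slope = (b1 - t_alpha * s_b1, b1 + t_alpha * s_b1)

# Here the interval contains 0, consistent with failing to reject H0
```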
Analysis of Variance: Overall Fit
F Statistic for Overall Fit
• For a simple regression, the F statistic is
  Fcalc = MSR / MSE = (SSR / 1) / (SSE / (n − 2)),  with 1 and n − 2 degrees of freedom.
• For a given sample size, a larger F statistic indicates a better fit.
• Reject H0 if Fcalc > F1,n−2 from Appendix F for a given significance level α, or if the p-value < α.
12-11
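A short sketch of the F statistic on an invented data set; it also checks the simple-regression identity Fcalc = tcalc²:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

sst = sum((yi - ybar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sst - sse

# F = MSR / MSE with 1 and n - 2 degrees of freedom
msr = ssr / 1
mse = sse / (n - 2)
f_calc = msr / mse

# In simple regression, F equals the square of the slope's t statistic
t_calc = b1 / (math.sqrt(mse) / math.sqrt(sxx))
```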
Confidence and Prediction Intervals for Y
How to Construct an Interval Estimate for Y
• Confidence interval for the conditional mean of Y:
  ŷi ± tα/2 s √(1/n + (xi − x̄)² / Σ(xi − x̄)²)
• Prediction interval for individual values of Y:
  ŷi ± tα/2 s √(1 + 1/n + (xi − x̄)² / Σ(xi − x̄)²)
• The prediction interval is wider because it must also allow for the variation of an individual Y around the conditional mean.
12-12
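Both intervals can be computed directly from the formulas; the data set, the evaluation point x = 4, and tα/2 = 3.182 (95%, ν = 3, Appendix D) are chosen for illustration:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

t_alpha = 3.182          # t_.025 for nu = 3 (Appendix D)
x_new = 4
y_hat = b0 + b1 * x_new

# Half-width of the confidence interval for the conditional mean of Y
half_ci = t_alpha * s * math.sqrt(1 / n + (x_new - xbar) ** 2 / sxx)

# Half-width of the prediction interval for an individual Y
# (the extra "1" under the root makes it wider)
half_pi = t_alpha * s * math.sqrt(1 + 1 / n + (x_new - xbar) ** 2 / sxx)
```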
Violations of Assumptions
Three Important Assumptions
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
• The error εi is unobservable.
• The residuals ei from the fitted regression give clues about possible violations of these assumptions.
Leverage and Influence
• A high leverage statistic indicates that the observation is far from the mean of X.
• These observations are influential because they are at the “end of the lever.”
• The leverage for observation i is hi = 1/n + (xi − x̄)² / Σ(xj − x̄)².
12-13
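The leverage formula can be sketched as follows. The x values are invented, with one deliberately far from the mean so that it exceeds the 3/n rule of thumb from the next slide:

```python
x = [1, 2, 3, 4, 10]
n = len(x)

xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Leverage for each observation: h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2
h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Flag observations whose leverage exceeds the 3/n rule of thumb
unusual = [i for i, hi in enumerate(h) if hi > 3 / n]

# Only the outlying x = 10 (index 4) is flagged; the leverages sum to 2,
# the number of fitted coefficients in a simple regression
```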
Unusual Observations
• A leverage that exceeds 3/n is unusual.
Studentized Deleted Residuals
• Studentized deleted residuals are another way to identify unusual observations.
• A studentized deleted residual whose absolute value is 2 or more may be considered unusual.
• A studentized deleted residual whose absolute value is 3 or more may be considered an outlier.
12-14
Other Regression Problems
• Outliers – can cause loss of fit and other problems.
• Model Misspecification – occurs when a relevant predictor has been omitted.
• Ill-Conditioned Data – can cause loss of regression accuracy.
• Spurious Correlation – occurs when two variables appear related because of the way they are defined.
• Model Form and Variable Transforms – sometimes a linear relationship will not fit the data, and transformations of X and/or Y are necessary before the analysis can proceed.
12-15