Chapter 13
Simple Linear Regression Analysis
McGraw-Hill/Irwin
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved.
Simple Linear Regression Analysis
13.1 The Simple Linear Regression Model and the Least Squares Point Estimates
13.2 Model Assumptions and the Standard Error
13.3 Testing the Significance of the Slope and y-Intercept
13.4 Confidence and Prediction Intervals
13.5 Simple Coefficients of Determination and Correlation
Simple Linear Regression Analysis (Continued)
13.6 Testing the Significance of the Population Correlation Coefficient (Optional)
13.7 An F Test for the Model
13.8 The QHIC Case
13.9 Residual Analysis (Optional)
13.10 Some Shortcut Formulas (Optional)
LO 1: Explain the simple linear regression model.

13.1 The Simple Linear Regression Model and the Least Squares Point Estimates
• The dependent (or response) variable is the variable we wish to understand or predict
• The independent (or predictor) variable is the variable we will use to understand or predict the dependent variable
• Regression analysis is a statistical technique that uses observed data to relate the dependent variable to one or more independent variables
• The objective is to build a regression model that can describe, predict, and control the dependent variable based on the independent variable
LO 2: Find the least squares point estimates of the slope and y-intercept.

The Least Squares Point Estimates
• Estimation/prediction equation:
  ŷ = b0 + b1x
• Least squares point estimate of the slope β1:
  b1 = SSxy / SSxx
  where
  SSxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − (Σxi)(Σyi)/n
  SSxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n
• Least squares point estimate of the y-intercept β0:
  b0 = ȳ − b1x̄
  where
  ȳ = (Σyi)/n and x̄ = (Σxi)/n
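The least squares computations above can be sketched in a few lines of Python. The data set here is made up purely for illustration; the formulas for SSxy, SSxx, b1, and b0 are the ones from this slide.

```python
# Least squares point estimates (Section 13.1), on illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# SSxy = sum(xi*yi) - (sum xi)(sum yi)/n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
# SSxx = sum(xi^2) - (sum xi)^2/n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

b1 = ss_xy / ss_xx          # slope estimate
b0 = y_bar - b1 * x_bar     # y-intercept estimate

print(b1, b0)
```

For this data, SSxy = 19.9 and SSxx = 10, giving the fitted line ŷ = 0.05 + 1.99x.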
LO 3: Describe the assumptions behind simple linear regression and calculate the standard error.

13.2 Model Assumptions and the Standard Error
1. Mean of Zero: At any given value of x, the population of potential error term values has a mean equal to zero
2. Constant Variance Assumption: At any given value of x, the population of potential error term values has a variance that does not depend on the value of x
3. Normality Assumption: At any given value of x, the population of potential error term values has a normal distribution
4. Independence Assumption: Any one value of the error term ε is statistically independent of any other value of ε
LO3

Sum of Squares
• Sum of squared errors:
  SSE = Σei² = Σ(yi − ŷi)²
• Mean square error, the point estimate of the residual variance σ²:
  s² = MSE = SSE / (n − 2)
• Standard error, the point estimate of the residual standard deviation σ:
  s = √MSE = √(SSE / (n − 2))
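A minimal sketch of SSE, MSE, and the standard error s, continuing the same illustrative data; the fitted coefficients b0 = 0.05 and b1 = 1.99 are the made-up values from the earlier fit, not from the textbook's examples.

```python
import math

# SSE, MSE = s^2, and standard error s (Section 13.2), on illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99          # hypothetical least squares estimates
n = len(x)

y_hat = [b0 + b1 * xi for xi in x]                     # predicted values
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squared errors
mse = sse / (n - 2)                                    # point estimate of sigma^2
s = math.sqrt(mse)                                     # point estimate of sigma

print(sse, s)
```

Note the divisor n − 2: two degrees of freedom are used up estimating b0 and b1.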
LO 4: Test the significance of the slope and y-intercept.

13.3 Testing the Significance of the Slope and y-Intercept
• A regression model is not likely to be useful unless there is a significant relationship between x and y
• To test significance, we use the null hypothesis:
  H0: β1 = 0
  versus the alternative hypothesis:
  Ha: β1 ≠ 0
LO4

Testing the Significance of the Slope #2

Alternative     Reject H0 If     p-Value
Ha: β1 > 0      t > tα           Area under t distribution right of t
Ha: β1 < 0      t < −tα          Area under t distribution left of t
Ha: β1 ≠ 0      |t| > tα/2*      Twice area under t distribution right of |t|

* That is, t > tα/2 or t < −tα/2
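The slide states the rejection rules but not the statistic itself; in the standard formulation, t = b1 / s_b1 with s_b1 = s / √SSxx and n − 2 degrees of freedom. A sketch on the same illustrative data:

```python
import math

# t statistic for H0: beta1 = 0, using the standard formula
# t = b1 / s_b1, s_b1 = s / sqrt(SSxx). Data are illustrative.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))         # standard error
s_b1 = s / math.sqrt(ss_xx)          # standard error of the slope estimate
t = b1 / s_b1                        # compare |t| to t_{alpha/2} with n-2 df

print(t)
```

Here t is around 33, far beyond any usual tα/2 cutoff, so H0: β1 = 0 would be rejected.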
LO 5: Calculate and interpret a confidence interval for a mean value and a prediction interval for an individual value.

13.4 Confidence and Prediction Intervals
• The point on the regression line corresponding to a particular value x0 of the independent variable x is ŷ = b0 + b1x0
• It is unlikely that this value will equal the mean value of y when x equals x0
• Therefore, we need to place bounds on how far the predicted value might be from the actual value
• We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y
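A sketch of both intervals using their standard forms: ŷ ± tα/2 · s · √(1/n + (x0 − x̄)²/SSxx) for the mean of y, and ŷ ± tα/2 · s · √(1 + 1/n + (x0 − x̄)²/SSxx) for an individual y. The data and the choice x0 = 3.5 are illustrative; 3.182 is the usual table value of t.025 for n − 2 = 3 degrees of freedom.

```python
import math

# 95% confidence interval (mean of y) and prediction interval (individual y)
# at x0, on illustrative data (Section 13.4).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar = sum(x) / n

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x0 = 3.5
y_hat = b0 + b1 * x0
t_crit = 3.182                                    # t_{.025}, 3 df (table value)
d = (x0 - x_bar) ** 2 / ss_xx

ci_half = t_crit * s * math.sqrt(1 / n + d)       # half-width, CI for mean y
pi_half = t_crit * s * math.sqrt(1 + 1 / n + d)   # half-width, PI for one y

print((y_hat - ci_half, y_hat + ci_half))
print((y_hat - pi_half, y_hat + pi_half))
```

The prediction interval is always wider than the confidence interval: it carries the extra "1" under the square root for the variability of a single observation.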
LO 6: Calculate and interpret the simple coefficients of determination and correlation.

13.5 Simple Coefficient of Determination and Correlation
• How useful is a particular regression model?
• One measure of usefulness is the simple coefficient of determination
• It is represented by the symbol r²
• This section may be read anytime after reading Section 13.1
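As a sketch, r² can be computed from the sums of squares by its standard definition, r² = (explained variation)/(total variation); the data are the same illustrative values used above.

```python
# Simple coefficient of determination r^2 (Section 13.5), on illustrative
# data, via the standard definition r^2 = explained / total variation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

r_sq = (ss_xy ** 2 / ss_xx) / ss_yy   # proportion of variation in y explained by x
print(r_sq)
```

An r² near 1, as here, says the fitted line explains nearly all of the variation in y.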
LO 7: Test hypotheses about the population correlation coefficient (optional).

13.6 Testing the Significance of the Population Correlation Coefficient (Optional)
• The simple correlation coefficient (r) measures the linear relationship between the observed values of x and y from the sample
• The population correlation coefficient (ρ) measures the linear relationship between all possible combinations of observed values of x and y
• r is an estimate of ρ
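The slide does not show the test statistic; in the standard formulation, H0: ρ = 0 is tested with t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. A sketch on the same illustrative data:

```python
import math

# t test for H0: rho = 0 (Section 13.6), using the standard statistic
# t = r*sqrt(n-2)/sqrt(1-r^2). Data are illustrative.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)              # sample correlation
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # compare to t table, n-2 df

print(r, t)
```

For simple regression this t is identical to the t for the slope test in Section 13.3, so the two tests always agree.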
LO 8: Test the significance of a simple linear regression model by using an F test.

13.7 An F Test for the Model
• For simple regression, this is another way to test the null hypothesis H0: β1 = 0
• This is the only test we will use for multiple regression
• The F test tests the significance of the overall regression relationship between x and y
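A sketch of the F statistic using the standard form F = (explained variation) / (SSE/(n − 2)), with 1 and n − 2 degrees of freedom; the variation quantities come from the shortcut formulas of Section 13.10 and the data are illustrative.

```python
# F test for the model (Section 13.7), on illustrative data.
# F = SSR / (SSE/(n-2)); in simple regression F = t^2 for the slope test.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

ssr = ss_xy ** 2 / ss_xx        # explained variation
sse = ss_yy - ssr               # unexplained variation
f_stat = ssr / (sse / (n - 2))  # reject H0: beta1 = 0 if F > F_alpha (1, n-2 df)

print(f_stat)
```

Here F ≈ 1110, which is t² ≈ 33.32² from the slope test, illustrating that the two tests are equivalent in simple regression.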
LO 9: Use residual analysis to check the assumptions of simple linear regression (optional).

13.9 Residual Analysis (Optional)
• Checks of regression assumptions are performed by analyzing the regression residuals
• Residuals (e) are defined as the difference between the observed value of y and the predicted value of y: e = y − ŷ
• Note that e is the point estimate of ε
• If the regression assumptions are valid, the population of potential error terms will be normally distributed with mean zero and variance σ²
• Different error terms will be statistically independent
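A minimal residual sketch on the same illustrative data: compute e = y − ŷ for each observation and check that the residuals sum to (essentially) zero, which least squares guarantees by construction.

```python
# Residuals e = y - y_hat (Section 13.9), on illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(residuals)
print(sum(residuals))   # ~0 for any least squares fit
```

In practice one would also plot the residuals against x (or against ŷ) and look for curvature, fanning, or other patterns that would violate the assumptions above.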
13.10 Some Shortcut Formulas (Optional)

Total variation = SSTO = SSyy
Explained variation = SSR = SSxy² / SSxx
Unexplained variation = SSE = SSyy − SSxy² / SSxx

where
SSxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − (Σxi)(Σyi)/n
SSxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n
SSyy = Σ(yi − ȳ)² = Σyi² − (Σyi)²/n
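The shortcut formulas can be checked directly in code: SSR and SSE computed this way partition SSTO exactly, and r² = SSR/SSTO. Data are the same illustrative values used throughout.

```python
# Shortcut formulas (Section 13.10) and the variation decomposition
# SSTO = SSR + SSE, on illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

ssto = ss_yy                      # total variation
ssr = ss_xy ** 2 / ss_xx          # explained variation
sse = ss_yy - ss_xy ** 2 / ss_xx  # unexplained variation

r_sq = ssr / ssto                 # simple coefficient of determination
print(ssto, ssr, sse, r_sq)
```

These shortcuts avoid computing ŷ for every observation: SSE falls out of the three sums of squares directly.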