Inference for Regression
Section 13.3, Page 284
Population Regression Line
Section 13.3, Page 285
Standard Error of the Regression Line
The σ above is the standard deviation of the y values about the regression line. It
is estimated by the standard error of the regression line, s_e, which is the
standard deviation of the residuals:
$$s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$
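As a minimal sketch of the calculation above, the following fits the least-squares line and computes s_e as the square root of the summed squared residuals divided by n - 2. The function name `fit_line` and the five data points are made up for illustration and do not come from the slides.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, s_e): least-squares intercept, slope, and the
    standard error of the regression line (SD of the residuals)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    # Sum of squared residuals (y - y_hat)^2, divided by n - 2.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    return a, b, s_e

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, s_e = fit_line(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}, s_e = {s_e:.4f}")
```

The divisor is n - 2 rather than n - 1 because two quantities, the slope and the intercept, are estimated from the data.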

Section 13.3, Page 286
Inference for Slope and Intercept
For a given population of points (x, y), the true regression line is
given by the equation y = α + βx. When we take a random
sample of the points and construct a regression
equation ŷ = a + bx, a is an estimator of α
and b is an estimator of β.
When certain conditions are met, the standardized sampling distributions
for a and for b are t-distributions, each with n - 2 degrees of
freedom, where n is the number of points in the
sample. Below, s_e is the standard error of the regression line and
s_x is the sample standard deviation of the x values.
t-distribution for a (intercept):
$$\mu_a = \alpha, \qquad \sigma_a = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x - \bar{x})^2}}$$
t-distribution for b (slope):
$$\mu_b = \beta, \qquad \sigma_b = \frac{s_e}{\sqrt{n - 1}\, s_x}$$
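The two standard errors can be sketched in a few lines of Python. This is an illustration under assumptions: the function name `coefficient_standard_errors`, the x values, and the value of s_e are hypothetical, and the formulas used are the standard ones for simple linear regression (intercept: s_e times the square root of 1/n plus the squared mean of x over the sum of squared deviations of x; slope: s_e over the square root of n - 1 times s_x).

```python
import math
import statistics

def coefficient_standard_errors(xs, s_e):
    """Return (se_a, se_b): standard errors of the intercept and slope."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    s_x = statistics.stdev(xs)                    # sample SD (divisor n - 1)
    sxx = sum((x - x_bar) ** 2 for x in xs)       # sum of (x - x_bar)^2
    se_a = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)
    se_b = s_e / (math.sqrt(n - 1) * s_x)
    return se_a, se_b

# Hypothetical inputs, for illustration only.
xs = [1, 2, 3, 4, 5]
s_e = math.sqrt(0.8)
se_a, se_b = coefficient_standard_errors(xs, s_e)
print(f"se_a = {se_a:.4f}, se_b = {se_b:.4f}")
```

Note that the square root of n - 1 times s_x equals the square root of the sum of squared deviations of x, so the slope's standard error shrinks as the x values become more spread out.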

Section 13.4, Page 287
Hypothesis Test for the Slope
Because we have sampling distributions for a and for b,
we can test hypotheses about α and β. The most common
test concerns the slope. Recall that if the slope of
the regression line is 0, then the regression line is not
useful for predicting y. We therefore most often test the
following hypotheses for β:
H0: β = 0
HA: β ≠ 0
When the null hypothesis is rejected, we
have a useful relationship. How useful is
determined by the coefficient of determination.
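The slides do not write out the test statistic. The standard form, given here as a reminder rather than as the slides' own notation, divides the estimated slope by its standard error (written σ_b here) and compares the result with a t-distribution with n - 2 degrees of freedom:

```latex
t = \frac{b - 0}{\sigma_b}, \qquad \text{df} = n - 2
```

A t value far from 0 in either direction gives a small two-sided p-value and leads to rejecting H0.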

Section 13.4, Page 288
Hypothesis Test for the Y-Intercept
The hypothesis test for the slope is the most important one for
deciding whether we have a usable regression. Sometimes a
hypothesis test for the y-intercept is also used. In this
case:
H0: α = 0
HA: α ≠ 0

Section 13.2, Page 707
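Confidence Interval for Slope and
Intercept
When conditions have been met, the confidence interval
for the slope is:
$$b \pm t^*_{n-2}\, \sigma_b$$
The confidence interval for the y-intercept is:
$$a \pm t^*_{n-2}\, \sigma_a$$
Here σ_b and σ_a are the standard errors of the slope and the intercept, and
t*_{n-2} is the critical value of the t-distribution with n - 2 degrees of freedom.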
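Both intervals have the same shape: estimate plus or minus a critical value times a standard error. A minimal sketch follows, in which the function name `confidence_interval` and all input numbers are hypothetical. The critical value is passed in from a t table rather than computed; 3.182 is the two-sided 95% value for 3 degrees of freedom, corresponding to an assumed sample of n = 5 points.

```python
def confidence_interval(estimate, std_error, t_star):
    """Return (low, high) for estimate +/- t_star * std_error."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin

# Hypothetical slope and standard error for a 5-point sample (df = 3).
b, se_b, t_star = 0.6, 0.2828, 3.182
low, high = confidence_interval(b, se_b, t_star)
print(f"95% CI for the slope: ({low:.4f}, {high:.4f})")
```

In this made-up case the interval contains 0, which agrees with a two-sided test at the 5% level failing to reject H0: β = 0.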


Section 13.5, Page 289
Confidence Interval for Regression
Equation Predictions
When predicting y values for a given x value, there
are two situations.
When we want to predict the mean y value for a given x
value x*, we calculate the confidence interval as
follows:
$$a + bx^* \pm t^*_{n-2}\, \sigma_{(a+bx^*)}$$
where
$$\sigma_{(a+bx^*)} = \sqrt{\sigma_b^2 (x^* - \bar{x})^2 + \frac{s_e^2}{n}}$$
Notice that the standard error gets larger as x*
gets further away from the mean of the x values.

When we want to predict a single point y value for
a given x*, there is more variability, and the
standard error is:
$$\sigma_{(a+bx^*)} = \sqrt{\sigma_b^2 (x^* - \bar{x})^2 + \frac{s_e^2}{n} + s_e^2}$$
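The two standard errors differ only by the extra s_e squared term under the square root. The sketch below makes that concrete; the function names and all input values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def se_mean_response(x_star, x_bar, se_b, s_e, n):
    """Standard error for the predicted MEAN y at x_star."""
    return math.sqrt(se_b ** 2 * (x_star - x_bar) ** 2 + s_e ** 2 / n)

def se_individual(x_star, x_bar, se_b, s_e, n):
    """Standard error for a SINGLE predicted y at x_star (adds s_e^2)."""
    return math.sqrt(se_b ** 2 * (x_star - x_bar) ** 2 + s_e ** 2 / n + s_e ** 2)

# Hypothetical summary values, for illustration only.
x_bar, se_b, s_e, n = 3.0, math.sqrt(0.08), math.sqrt(0.8), 5

for x_star in (3.0, 5.0):
    m = se_mean_response(x_star, x_bar, se_b, s_e, n)
    i = se_individual(x_star, x_bar, se_b, s_e, n)
    print(f"x* = {x_star}: mean SE = {m:.4f}, individual SE = {i:.4f}")
```

Both standard errors are smallest at x* equal to the mean of x and grow as x* moves away from it, and the individual-value standard error is always the larger of the two.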

Section 13.5, Page 291
Conditions for Regression Inference
1. Linearity Assumption: A check of a scatter plot should
show a linear pattern.
2. Independence Assumption: The residuals must be
mutually independent. A plot of the residuals
against the x values should show no patterns, trends, or
clumping.
3. Equal Variance Assumption: The variability of y
should be about the same for all values of x. A
plot of the residuals against the predicted y
values should show roughly equal spread.
4. Normal Population Assumption: We assume that the
errors around the idealized regression line form a
normal distribution around each x value. We check
for this condition by checking all the residuals to see if
they came from a normal population. We look at a
probability plot of the residuals, or a histogram of the
residuals if n is fairly large.
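All of these checks are graphical and start from the residuals and the predicted values. A minimal sketch of computing them follows; the function name, the data, and the coefficients a = 2.2 and b = 0.6 are hypothetical, and the plotting itself is left to whatever tool is at hand.

```python
def residuals_and_fits(xs, ys, a, b):
    """Return (residuals, fits) for the line y_hat = a + b * x."""
    fits = [a + b * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fits)]
    return residuals, fits

# Hypothetical data and least-squares coefficients, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
residuals, fits = residuals_and_fits(xs, ys, a=2.2, b=0.6)

# Pairs to plot: (x, residual) for conditions 1 and 2,
# (fit, residual) for condition 3, and the residuals alone for condition 4.
for x, f, r in zip(xs, fits, residuals):
    print(f"x = {x}, fit = {f:.2f}, residual = {r:+.2f}")
```

For a least-squares line the residuals sum to zero, so a residual plot is always centered on the horizontal axis; what matters is whether the scatter around it shows structure.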
Section 13.5
Regression Inference Problem (1)
Find the regression line with Median SAT as the x-variable and Graduation Rate as the y-variable.
Test the hypotheses for the slope, find the 95%
confidence interval for the slope, find the predicted mean
graduation rate for a Median SAT of 900 and the
95% confidence interval for that prediction. Check the
conditions necessary for inference.
Section 13.2, Page 708
Inference Problem (2)
Select the program REGINFER
Enter the X and Y lists.
Choose YES for Scatter Plot.
Note that the scatter plot has
a linear pattern.
Press ENTER to display the data
screen.
The y-intercept is -91.3132.
The slope is 0.1321.
For each one-point increase in the
SAT score, the predicted graduation rate
increases by 0.1321 points.
The coefficient of correlation is
0.7589.
The coefficient of determination is
0.5757. The regression model
explains 57.57% of the variance in
graduation rates.
The standard error of the
regression line is 6.1457.
The standard error of the intercept
is 31.8968.
The standard error of the slope is
0.0314.
Section 13.5
Figures: Scatter Plot; Data Screen
Inference Problem (3)
Press ENTER to go to the plot
menu. Select 1, the residuals
plotted against the x values.
There is no pattern, so the
linear model is appropriate.
Also, the variance of the
residuals is about equal along
the range of x values, satisfying
the equal variance assumption.
Press ENTER and then select
3=NORMAL PROBABILITY
PLOT.
The plot pattern is roughly a
straight line, indicating that
the residuals are from a
normal distribution. The
normal population
assumption is satisfied.
Section 13.5
Figures: Plot Menu; Residuals vs. x-values; Probability Plot
Inference Problem (4)
Press ENTER and the
inference screen
appears. The p-value for the test against
the alternative hypothesis
that the slope ≠ 0 is 0.001.
We reject the null
hypothesis and conclude
that we have a useful
regression. Press
ENTER and the confidence
interval menu appears.
Select 1=Yes, and on the
next screen choose a .95
confidence level. Press
ENTER and the CI
screen appears with the
intervals for the slope and
intercept.
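As a rough check on the inference screen, the t statistic for the slope can be recomputed from the slope (0.1321) and its standard error (0.0314) reported on the data screen earlier in this problem. Because both inputs are rounded, the result is approximate:

```latex
t = \frac{b}{\sigma_b} = \frac{0.1321}{0.0314} \approx 4.21
```

A t statistic of about 4.2 is far from 0, which is in line with the small p-value of 0.001 shown on the screen.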
Figures: Inference Screen; CI Menu; CI Screen
Section 13.5
Inference Problem (5)
Press ENTER and the prediction
menu appears. Select 2=Y-hat
mean. On the next screen,
enter x = 900. Press ENTER
and the prediction screen
appears.
Prediction Menu
The top line shows the x-value
and the z-score of the x-value.
900 is 2.2 standard deviation
units below the mean of the x-values.
Lines 2 and 3 show the
minimum and maximum x-values.
The predicted y-mean value for
x = 900 is 27.6028. The 95%
confidence interval is shown for
the y-mean value prediction.
The margin of error of this
confidence interval is 8.4079.
Press ENTER and then select
3= No Estimates to exit
program.
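The prediction can be reproduced by hand from the intercept (-91.3132) and slope (0.1321) reported on the data screen. The hand calculation differs slightly from the screen value of 27.6028, presumably because the displayed slope is rounded to four decimal places:

```latex
\hat{y} = -91.3132 + 0.1321(900) = -91.3132 + 118.89 = 27.5768 \approx 27.6
```

Applying the reported margin of error to the screen value gives the interval endpoints:

```latex
27.6028 \pm 8.4079 \;\Rightarrow\; (19.1949,\ 36.0107)
```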
Section 13.5
Figure: Prediction Screen
Problems
a. Find the equation to predict the clutch size from
snout size.
b. What percent of the variance in clutch is explained
by the model?
c. Find the 95% confidence interval for the slope.
d. Predict the mean clutch size for a 65 snout size and
give the 95% confidence interval.
e. Predict an individual clutch size for a snout size of 65
and give the 95% confidence interval.
Section 13.5
Problems
Section 13.5