Inference about the Slope and Intercept
• Recall, we have established that the least squares estimates b0 and b1 are linear combinations of the Yi's.
• Further, we have shown that they are unbiased and have the following variances:

$$\mathrm{Var}(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{S_{XX}} \right) \quad \text{and} \quad \mathrm{Var}(b_1) = \frac{\sigma^2}{S_{XX}}$$
• In order to make inference we assume that the εi's have a Normal distribution, that is, εi ~ N(0, σ²).
• This in turn means that the Yi's are normally distributed.
• Since both b0 and b1 are linear combinations of the Yi's, they also have a Normal distribution.
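• For concreteness, a minimal Python sketch of these quantities on made-up data (all names here are illustrative; σ² is estimated by the MSE, S², defined shortly):

```python
import numpy as np

# Made-up data purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 4.2, 5.9, 8.1, 9.8])

n = len(x)
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)

# Least squares estimates: linear combinations of the Yi's
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar

# The variance formulas above, with sigma^2 replaced by its estimate (MSE)
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
var_b0 = mse * (1.0 / n + xbar ** 2 / Sxx)
var_b1 = mse / Sxx
print(b0, b1, var_b0, var_b1)
```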
Inference for β1 in Normal Error Regression Model
• The least squares estimate of β1 is b1. Because it is a linear combination of normally distributed random variables (the Yi's), we have the following result:

$$b_1 \sim N\left( \beta_1, \; \frac{\sigma^2}{S_{XX}} \right)$$
• We estimate the variance of b1 by S²/S_XX, where S² is the MSE, which has n-2 df.
• Claim: The distribution of

$$\frac{b_1 - \beta_1}{\sqrt{S^2 / S_{XX}}}$$

is t with n-2 df.
• Proof:
Tests and CIs for β1
• The hypothesis of interest about the slope in a Normal linear
regression model is H0: β1 = 0.
• The test statistic for this hypothesis is
$$t_{stat} = \frac{b_1}{S.E.(b_1)} = \frac{b_1}{\sqrt{S^2 / S_{XX}}}$$
• We compare the above test statistic to a t distribution with n-2 df to obtain the P-value….
• Further, a 100(1-α)% CI for β1 is:

$$b_1 \pm t_{n-2;\,\alpha/2} \, \frac{S}{\sqrt{S_{XX}}}, \quad \text{that is,} \quad b_1 \pm t_{n-2;\,\alpha/2} \, S.E.(b_1)$$
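• A minimal Python sketch of this test and CI, again on made-up data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

# S^2 = MSE with n-2 df, and the standard error of b1
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
se_b1 = np.sqrt(s2 / Sxx)

# t statistic and two-sided P-value for H0: beta1 = 0
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

# 95% CI for beta1
t_crit = stats.t.ppf(0.975, df=n - 2)
print(t_stat, p_value, (b1 - t_crit * se_b1, b1 + t_crit * se_b1))
```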
Important Comment
• Similar results can be obtained about the intercept in a Normal
linear regression model.
• See the book for more details.
• However, in many cases the intercept does not have any
practical meaning and therefore it is not necessary to make
inference about it.
Example
• We have data on violent and property crimes in 23 US metropolitan areas. The data set contains the following three variables:
violcrim = number of violent crimes
propcrim = number of property crimes
popn = population in 1000's
• We are interested in the relationship between the size of the city and the number of violent crimes….
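• A sketch of how this fit could be computed; the values below are placeholders, not the actual 23-city data:

```python
import numpy as np
from scipy import stats

# Placeholder values standing in for the real popn/violcrim columns
popn = np.array([250.0, 400.0, 600.0, 900.0, 1300.0, 2000.0])
violcrim = np.array([310.0, 480.0, 700.0, 1050.0, 1490.0, 2300.0])

# Slope, intercept, and the t-test P-value for H0: beta1 = 0
res = stats.linregress(popn, violcrim)
print(res.slope, res.intercept, res.pvalue)
```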
Prediction of Mean Response
• Very often we want to use the estimated regression line to make predictions about the mean of the response for a particular X value (assumed to be fixed).
• We know that the least squares line Ŷ = b0 + b1X is an estimate of E(Y) = β0 + β1X.
• Now, we can pick a point Xh in the range of the data; then Ŷh = b0 + b1Xh is an estimate of E(Yh) = β0 + β1Xh.
• Claim:

$$\mathrm{Var}(\hat{Y}_h) = \sigma^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}} \right)$$

• Proof:
• This is the variance of the estimate of E(Y) when X = Xh.
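• A sketch of the proof: writing Ŷh = Ȳ + b1(Xh − X̄) and using the (standard) fact that Ȳ and b1 are uncorrelated,

$$\mathrm{Var}(\hat{Y}_h) = \mathrm{Var}(\bar{Y}) + (X_h - \bar{X})^2 \, \mathrm{Var}(b_1) = \frac{\sigma^2}{n} + (X_h - \bar{X})^2 \, \frac{\sigma^2}{S_{XX}}$$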
Confidence Interval for E(Yh)
• For a given Xh, a 100(1-α)% CI for the mean value of Y is

$$\hat{Y}_h \pm t_{n-2;\,\alpha/2} \, s \, \sqrt{ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}} }$$

where s² = MSE.
• Note, the CI above will be wider the further Xh is from X̄.
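• A self-contained Python sketch of this interval (the function name and data are illustrative):

```python
import numpy as np
from scipy import stats

def mean_response_ci(x, y, xh, alpha=0.05):
    """100(1-alpha)% CI for E(Y) at X = xh in simple linear regression."""
    n = len(x)
    xbar = x.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * xbar
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # s^2 = MSE
    yhat_h = b0 + b1 * xh
    se = s * np.sqrt(1.0 / n + (xh - xbar) ** 2 / Sxx)  # grows as xh moves away from xbar
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return yhat_h - t_crit * se, yhat_h + t_crit * se

# Example call on made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.8, 8.2, 9.9])
print(mean_response_ci(x, y, xh=3.5))
```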
Example
• Consider the snow gauge data.
• Suppose we wish to predict the mean loggain when the device was
calibrated at density 0.5, that is, when Xh = 0.5….
Prediction of New Observation
• We want to use the regression line to predict a particular value of Y for a given X = Xh,new, a new point taken after the original n observations.
• The predicted value of a new point measured when X = Xh,new is

$$\hat{Y}_{h,new} = b_0 + b_1 X_{h,new}$$
• Note, the above predicted value is the same as the estimate of E(Y) at Xh,new, but it should have a larger variance.
• The predicted value Ŷh,new has two sources of variability. One is due to the regression line being estimated by b0 + b1X. The second one is due to εh,new, i.e., points don't fall exactly on the line.
• To calculate the variance of Ŷh,new we look at the difference Yh,new − Ŷh,new….
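• Sketching that step: Yh,new = β0 + β1Xh,new + εh,new, where εh,new is independent of the original n observations (and hence of Ŷh,new), so

$$\mathrm{Var}(Y_{h,new} - \hat{Y}_{h,new}) = \sigma^2 + \mathrm{Var}(\hat{Y}_{h,new}) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(X_{h,new} - \bar{X})^2}{S_{XX}} \right)$$

which gives the prediction interval below.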
Prediction Interval for New Observation
• A 100(1-α)% prediction interval for a new observation when X = Xh,new is

$$\hat{Y}_{h,new} \pm t_{n-2;\,\alpha/2} \, s \, \sqrt{ 1 + \frac{1}{n} + \frac{(X_{h,new} - \bar{X})^2}{S_{XX}} }$$
• This is not a confidence interval; CI’s are for parameters and we are
estimating a value of a random variable.
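• A Python sketch of this interval (names and data are illustrative); compared with the CI for the mean response, the only change is the extra 1 under the square root:

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, xh_new, alpha=0.05):
    """100(1-alpha)% prediction interval for a new Y observed at X = xh_new."""
    n = len(x)
    xbar = x.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * xbar
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    yhat = b0 + b1 * xh_new
    # Extra "1 +" reflects the variance of eps_h,new itself
    se_pred = s * np.sqrt(1.0 + 1.0 / n + (xh_new - xbar) ** 2 / Sxx)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return yhat - t_crit * se_pred, yhat + t_crit * se_pred

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.8, 8.2, 9.9])
print(prediction_interval(x, y, xh_new=3.5))
```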
Confidence Bands for E(Y)
• Confidence bands capture the true mean of Y, E(Y) = β0 + β1X, everywhere over the range of the data.
• For this we use the Working-Hotelling procedure, which gives us the following boundary values at any given Xh:
$$\hat{Y}_h \pm \sqrt{2 F_{2,\,n-2;\,\alpha}} \; s \, \sqrt{ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}} }$$

where F_{2,n-2;α} is the upper α-quantile of an F distribution with 2 and n-2 df (Table B.4).
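• A Python sketch of the band (grid and data are illustrative):

```python
import numpy as np
from scipy import stats

def working_hotelling_band(x, y, grid, alpha=0.05):
    """Working-Hotelling confidence band for E(Y), evaluated at the points in grid."""
    n = len(x)
    xbar = x.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * xbar
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    W = np.sqrt(2 * stats.f.ppf(1 - alpha, 2, n - 2))  # replaces the t critical value
    yhat = b0 + b1 * grid
    half = W * s * np.sqrt(1.0 / n + (grid - xbar) ** 2 / Sxx)
    return yhat - half, yhat + half

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.8, 8.2, 9.9])
lower, upper = working_hotelling_band(x, y, np.linspace(1.0, 5.0, 5))
print(lower, upper)
```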
Decomposition of Sum of Squares
• The total sum of squares (SS) in the response variable is

$$SSTO = \sum_i (Y_i - \bar{Y})^2$$

• The total SS can be decomposed into two main sources: the error SS and the regression SS.
• The error SS is

$$SSE = \sum_i e_i^2$$

• The regression SS is

$$SSR = b_1^2 \sum_i (X_i - \bar{X})^2$$

It is the amount of variation in the Y's that is explained by the linear relationship of Y with X.
Claims
• First, SSTO = SSR + SSE, that is,

$$\sum_i (Y_i - \bar{Y})^2 = b_1^2 \sum_i (X_i - \bar{X})^2 + \sum_i e_i^2$$
• Proof:….
• An alternative decomposition is

$$SSTO = \sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2$$
• Proof: Exercises.
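• Both identities are easy to check numerically on made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.8, 8.2, 9.9])

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x
e = y - yhat

SSTO = np.sum((y - ybar) ** 2)
SSE = np.sum(e ** 2)
SSR = b1 ** 2 * Sxx

print(np.isclose(SSTO, SSR + SSE))                         # first decomposition
print(np.isclose(SSTO, np.sum((yhat - ybar) ** 2) + SSE))  # alternative decomposition
```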
Analysis of Variance Table
• The decomposition of SS discussed above is usually summarized in an analysis of variance (ANOVA) table as follows:

Source of Variation | df  | SS   | MS
Regression          | 1   | SSR  | MSR = SSR/1
Error               | n-2 | SSE  | MSE = SSE/(n-2)
Total               | n-1 | SSTO |

• Note that the MSE is s², our estimate of σ².
Coefficient of Determination
• The coefficient of determination is

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$
• It must satisfy 0 ≤ R² ≤ 1.
• R² gives the percentage of variation in the Y's that is explained by the regression line.
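• A quick numerical sketch of the two equivalent forms on made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.8, 8.2, 9.9])

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

SSTO = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - (b0 + b1 * x)) ** 2)
SSR = b1 ** 2 * Sxx

print(SSR / SSTO, 1 - SSE / SSTO)  # the two forms agree
```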
Claim
• R² = r², that is, the coefficient of determination is the square of the correlation coefficient.
• Proof:…
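• A quick numerical check of the claim on made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.8, 8.2, 9.9])

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
SSTO = np.sum((y - y.mean()) ** 2)
R2 = b1 ** 2 * Sxx / SSTO      # R^2 = SSR / SSTO

r = np.corrcoef(x, y)[0, 1]    # sample correlation coefficient
print(np.isclose(R2, r ** 2))  # True
```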
Important Comments about R²
• It is a useful measure, but…
• There is no absolute rule about how big it should be.
• It is not resistant to outliers.
• It is not meaningful for models with no intercept.
• It is not useful for comparing models unless one set of predictors is a subset of the other.
ANOVA F Test
• The ANOVA table gives us another test of H0: β1 = 0.
• The test statistic is

$$F_{stat} = \frac{MSR}{MSE}$$
• Derivations …
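• A Python sketch of the F test on made-up data; for simple linear regression, F_stat is the square of the earlier t statistic for H0: β1 = 0:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.8, 8.2, 9.9])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

SSE = np.sum((y - (b0 + b1 * x)) ** 2)
SSR = b1 ** 2 * Sxx
MSR = SSR / 1          # regression df = 1
MSE = SSE / (n - 2)    # error df = n - 2

F_stat = MSR / MSE
p_value = stats.f.sf(F_stat, 1, n - 2)
print(F_stat, p_value)
```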