C22.0015 Homework Set 9 Spring 2011

1. In the simple linear regression model, we use
   $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
   for i = 1, 2, …, n. The ε's are assumed to be a sample from a normal population with
   mean 0 and standard deviation σ. The values x1, x2, …, xn are assumed non-random.
   Since we treat x1, x2, …, xn as non-random, the quantity Sxx is also non-random.
   The fitted line will be denoted $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x$ or perhaps as $\hat{Y} = b_0 + b_1 x$. Not all
   sources use the same notation.
   Assume that you already know $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$, $E\,\hat{\beta}_1 = \beta_1$, and $\operatorname{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{S_{xx}}$.
   (a) Show that $\operatorname{Cov}(\bar{Y}, Y_i) = \dfrac{\sigma^2}{n}$.
   (b) Show that $\operatorname{Cov}(\bar{Y}, \hat{\beta}_1) = 0$.
       HINT: For any two sets of values u1, u2, …, un and v1, v2, …, vn, each of length n,
       it happens that $\sum_{i=1}^{n}(u_i - \bar{u})(v_i - \bar{v}) = \sum_{i=1}^{n}(u_i - \bar{u})\,v_i$. This is very easy
       to prove. Also, $\sum_{i=1}^{n}(u_i - \bar{u})(v_i - \bar{v}) = \sum_{i=1}^{n} u_i v_i - n\,\bar{u}\,\bar{v}$.
   (c) Show that $\operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\bar{x}\,\dfrac{\sigma^2}{S_{xx}}$.
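As an optional sanity check (not part of the assignment), the claims in (a)-(c) can be verified numerically before attempting the algebra. The sketch below, in Python with NumPy, simulates the model many times; the values of β0, β1, σ, and the x's are arbitrary illustrative choices, not taken from this problem.

```python
# Monte Carlo sketch: compare empirical covariances with the claimed formulas.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 3.0                  # assumed "true" values (arbitrary)
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])       # arbitrary fixed predictor values
n, xbar = len(x), x.mean()
Sxx = np.sum((x - xbar) ** 2)

reps = 200_000
Y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, n))   # one simulated sample per row
Ybar = Y.mean(axis=1)
b1 = ((x - xbar) * (Y - Ybar[:, None])).sum(axis=1) / Sxx        # slope estimates Sxy / Sxx
b0 = Ybar - b1 * xbar                                            # intercept estimates

print("Cov(Ybar, Y_1): simulated", np.cov(Ybar, Y[:, 0])[0, 1], " claimed", sigma**2 / n)
print("Cov(Ybar, b1):  simulated", np.cov(Ybar, b1)[0, 1],      " claimed", 0.0)
print("Cov(b0, b1):    simulated", np.cov(b0, b1)[0, 1],        " claimed", -xbar * sigma**2 / Sxx)
```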
2. As in the previous problem, we use the simple linear regression model
   $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
   for i = 1, 2, …, n. The ε's are assumed to be a sample from a normal population with
   mean 0 and standard deviation σ. The values x1, x2, …, xn are assumed non-random.
   (a) Show that $E(\text{MSRegr}) = \sigma^2 + \beta_1^2 S_{xx}$.
       HINT: Recall that, with one predictor, $\text{MSRegr} = \text{SSRegr} = \hat{\beta}_1^2 S_{xx} = \dfrac{S_{xy}^2}{S_{xx}}$. It
       will help to show first that $S_{xy} = \beta_1 S_{xx} + \sum_{i=1}^{n}(x_i - \bar{x})\,\varepsilon_i$.
   (b) Show that $E(\text{MSResid}) = \sigma^2$.
       HINT: Find $E(\text{SSTot})$. Then get $E(\text{SSResid})$ by subtraction.
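As with Problem 1, an optional simulation (not part of the assignment) can confirm the two expectations. The sketch below uses the same arbitrary parameter values as before; none of them come from this problem.

```python
# Monte Carlo sketch: average MSRegr and MSResid over many simulated samples.
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 2.0, 0.5, 3.0                  # arbitrary illustrative values
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
n, xbar = len(x), x.mean()
Sxx = np.sum((x - xbar) ** 2)

reps = 200_000
Y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, n))
Ybar = Y.mean(axis=1)
b1 = ((x - xbar) * (Y - Ybar[:, None])).sum(axis=1) / Sxx
b0 = Ybar - b1 * xbar
resid = Y - (b0[:, None] + b1[:, None] * x)

ms_regr = b1**2 * Sxx                                # with one predictor, MSRegr = SSRegr
ms_resid = (resid**2).sum(axis=1) / (n - 2)

print("E(MSRegr):  simulated", ms_regr.mean(),  " claimed", sigma**2 + beta1**2 * Sxx)
print("E(MSResid): simulated", ms_resid.mean(), " claimed", sigma**2)
```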
3. Use the same simple linear regression model as in the first two problems. Be sure to
   distinguish carefully between the noise term $\varepsilon_i$ and the residual $e_i = Y_i - \hat{Y}_i$. Find the
   distribution of $e_i$. Note also $E\,e_i$ and $\operatorname{Var} e_i$. You may assume that $\operatorname{Cov}(\hat{\beta}_0, \varepsilon_i) = 0$
   and also $\operatorname{Cov}(\hat{\beta}_1, \varepsilon_i) = 0$; these need a careful proof, but you can assume them here.
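If it helps to have something concrete to compare a derived answer against, the sketch below (Python with NumPy; not part of the assignment, and all numerical values are arbitrary choices) records the residuals from many simulated samples and prints their empirical means and variances, one entry per index i.

```python
# Monte Carlo sketch: empirical moments of the residuals e_i, for comparison.
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 2.0, 0.5, 3.0                  # arbitrary illustrative values
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
n, xbar = len(x), x.mean()
Sxx = np.sum((x - xbar) ** 2)

reps = 200_000
Y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, n))
Ybar = Y.mean(axis=1)
b1 = ((x - xbar) * (Y - Ybar[:, None])).sum(axis=1) / Sxx
b0 = Ybar - b1 * xbar
e = Y - (b0[:, None] + b1[:, None] * x)              # residuals, one column per index i

print("empirical E(e_i):  ", e.mean(axis=0))         # compare with the mean you derive
print("empirical Var(e_i):", e.var(axis=0))          # compare with the variance you derive
```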
4. In a general multiple linear regression, establish the algebraic relationship between the
   F statistic and the $R^2$ statistic. You can use the notation of this analysis of variance
   table.

   Source of variation   Degrees of freedom   Sum Squares                Mean Squares                        F
   Regression            K                    SSRegr                     MSRegr = SSRegr / K                 F = MSRegr / MSResid
   Residual              n – K – 1            SSResid                    MSResid = SSResid / (n – K – 1)
   Total                 n – 1                SSTot = SSRegr + SSResid

   Then recall that $R^2 = \dfrac{\text{SSRegr}}{\text{SSTot}}$. Work from the definition of F.
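As an optional check on whatever relationship you derive (not part of the assignment), the sketch below fits a least-squares regression to made-up data with K = 3 predictors and prints R², K, n, and the F statistic computed from the ANOVA mean squares. All sizes and coefficients are arbitrary choices.

```python
# Numerical sketch: compute R^2 and F for one arbitrary simulated data set.
import numpy as np

rng = np.random.default_rng(3)
n, K = 40, 3                                         # arbitrary sizes: K predictors, n cases
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
Y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(0.0, 2.0, size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
fitted = X @ beta_hat
ss_tot = np.sum((Y - Y.mean()) ** 2)
ss_resid = np.sum((Y - fitted) ** 2)
ss_regr = ss_tot - ss_resid

R2 = ss_regr / ss_tot
F = (ss_regr / K) / (ss_resid / (n - K - 1))
print("R-sq =", R2, " K =", K, " n =", n, " F =", F)
# A correct algebraic relationship should recover this F from R-sq, K, and n alone.
```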
5. Using the simple linear regression model (see Problem 1), suppose that you have
collected these data:
The regression equation is
Y = 49.2 + 0.457 X
Predictor      Coef      SE Coef      T      P
Constant     49.161        6.186   7.95  0.000
X           0.45704      0.07748   5.90  0.000

S = 16.3670   R-Sq = 34.5%   R-Sq(adj) = 33.5%

Analysis of Variance
Source           DF        SS      MS      F      P
Regression        1    9321.6  9321.6  34.80  0.000
Residual Error   66   17680.1   267.9
Total            67   27001.7
You have just acquired a new data point for which you see xnew, but you have not yet
observed Ynew.
(a) Identify the values $\beta_0$, $\beta_1$, $\sigma$, $b_0$, $b_1$, $s$, $n$.
(b) Give a point prediction for Ynew in symbolic terms. "Symbolic terms" refers to
    the items given in part (a). Call this Ŷnew. Provide a numeric prediction for
    xnew = 62.
(c) The Ŷnew of (b) is a random variable. Find E Ŷnew and Var Ŷnew.
(d) Give a 95% prediction interval for Ynew. This will certainly be something of the
    form Ŷnew ± (expression). Give this symbolically and then for the specific case
    xnew = 62.
(e) Since $\beta_0$ and $\beta_1$ are parameters, so is $\beta_0 + \beta_1 x_{\text{new}}$. Give a point estimate for
    $\beta_0 + \beta_1 x_{\text{new}}$ in symbolic form. Call this Ỹnew. Give this value for the situation
    xnew = 62.
(f) The Ỹnew of (e) is a random variable. Find E Ỹnew and Var Ỹnew.
(g) Give a 95% confidence interval for the parameter combination $\beta_0 + \beta_1 x_{\text{new}}$. Do
    this symbolically and then for xnew = 62.
(h) The intervals in (d) and (g) are different. Can you give a short explanation as to
    why this should be so?
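Before working parts (d) and (g) by hand, it may help to see the two kinds of intervals side by side. The sketch below (not part of the assignment, and not based on the Minitab output above) fits a least-squares line with statsmodels to made-up data of roughly the same scale and requests both intervals at xnew = 62; the "obs" interval comes out wider, which is what part (h) asks you to explain.

```python
# Sketch: confidence interval for the mean vs. prediction interval for a new Y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(40, 120, 68)                               # made-up predictor values, n = 68
y = 49.0 + 0.45 * x + rng.normal(0.0, 16.0, size=len(x))   # made-up responses, similar scale

fit = sm.OLS(y, sm.add_constant(x)).fit()

x_new = 62.0
pred = fit.get_prediction(np.array([[1.0, x_new]]))
frame = pred.summary_frame(alpha=0.05)

# "mean_ci_*" : 95% confidence interval for beta0 + beta1 * x_new  (compare with part (g))
# "obs_ci_*"  : 95% prediction interval for the new observation Y_new  (compare with part (d))
print(frame[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])
```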
(i) Find a 1 – α confidence set for the vector $\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$. This is not easy. You can use
    the following facts.
    *   If W is an m-by-1 random normal vector with mean 0 and variance matrix
        Ω, and if we assume that Ω is a full-rank m-by-m matrix, then the
        distribution of $W' \Omega^{-1} W$ is chi-squared with m degrees of freedom.
    *   If G is distributed as chi-squared on m degrees of freedom, and if H is
        distributed as chi-squared on q degrees of freedom, and if G and H are
        independent, then the distribution of $\dfrac{G/m}{H/q}$ is $F_{m,q}$.
    Certainly $\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix}$ is a random normal vector, but it needs some massage to take
    advantage of the first * fact.
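The first * fact can be illustrated numerically before you use it. The sketch below (Python with NumPy/SciPy; not part of the assignment, with an arbitrary Ω chosen here) draws many vectors W ~ N(0, Ω) and compares quantiles of the quadratic form W′Ω⁻¹W with the chi-squared(m) quantiles.

```python
# Numerical illustration of: W ~ N(0, Omega)  =>  W' Omega^{-1} W ~ chi-squared(m).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
m = 2
A = rng.normal(size=(m, m))
Omega = A @ A.T + m * np.eye(m)                      # an arbitrary full-rank covariance matrix
Omega_inv = np.linalg.inv(Omega)

W = rng.multivariate_normal(np.zeros(m), Omega, size=100_000)   # one draw of W per row
Q = np.einsum("ij,jk,ik->i", W, Omega_inv, W)                   # quadratic forms W' Omega^{-1} W

for p in (0.50, 0.90, 0.95):                         # compare quantiles with chi-squared(m)
    print(p, np.quantile(Q, p), stats.chi2.ppf(p, df=m))
```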
© gs 2011