C22.0015 Homework Set 9, Spring 2011

1. In the simple linear regression model, we use $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $i = 1, 2, \ldots, n$. The $\varepsilon$'s are assumed to be a sample from a normal population with mean 0 and standard deviation $\sigma$. The values $x_1, x_2, \ldots, x_n$ are assumed non-random. Since we treat $x_1, x_2, \ldots, x_n$ as non-random, the quantity $S_{xx}$ is also non-random. The fitted line will be denoted $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x$, or perhaps as $\hat{Y} = b_0 + b_1 x$; not all sources use the same notation. Assume that you already know $\hat{\beta}_1 = S_{xY}/S_{xx}$, $E\,\hat{\beta}_1 = \beta_1$, and $\mathrm{Var}(\hat{\beta}_1) = \sigma^2 / S_{xx}$.

(a) Show that $\mathrm{Cov}(\bar{Y}, Y_i) = \dfrac{\sigma^2}{n}$.

(b) Show that $\mathrm{Cov}(\bar{Y}, \hat{\beta}_1) = 0$. HINT: For any two sets of values $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_n$, each of length $n$, it happens that $\sum_{i=1}^n (u_i - \bar{u})(v_i - \bar{v}) = \sum_{i=1}^n u_i v_i - n \bar{u} \bar{v}$. This is very easy to prove. Also, $\sum_{i=1}^n (u_i - \bar{u})(v_i - \bar{v}) = \sum_{i=1}^n (u_i - \bar{u})\, v_i$.

(c) Show that $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\dfrac{\bar{x}\,\sigma^2}{S_{xx}}$.

2. As in the previous problem, we use the simple linear regression model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $i = 1, 2, \ldots, n$. The $\varepsilon$'s are assumed to be a sample from a normal population with mean 0 and standard deviation $\sigma$. The values $x_1, x_2, \ldots, x_n$ are assumed non-random.

(a) Show that $E(\mathrm{MSRegr}) = \sigma^2 + \beta_1^2 S_{xx}$. HINT: Recall that, with one predictor, $\mathrm{MSRegr} = \mathrm{SSRegr} = \hat{\beta}_1^2 S_{xx} = \dfrac{(S_{xY})^2}{S_{xx}}$. It will help to show first that $S_{xY} = \beta_1 S_{xx} + \sum_{i=1}^n (x_i - \bar{x})\,\varepsilon_i$.

(b) Show that $E(\mathrm{MSResid}) = \sigma^2$. HINT: Find $E(\mathrm{SSTot})$. Then get $E(\mathrm{SSResid})$ by subtraction.

3. Use the same simple linear regression model as in the first two problems. Be sure to distinguish carefully between the noise term $\varepsilon_i$ and the residual $e_i = Y_i - \hat{Y}_i$. Find the distribution of $e_i$. Note also $E\,e_i$ and $\mathrm{Var}\,e_i$. You may assume that $\mathrm{Cov}(\hat{\beta}_0, \varepsilon_i) = 0$ and also $\mathrm{Cov}(\hat{\beta}_1, \varepsilon_i) = 0$; these need a careful proof, but you can assume them here.

4. In a general multiple linear regression, establish the algebraic relationship between the $F$ statistic and the $R^2$ statistic. You can use the notation of this analysis of variance table.
Source of variation   Degrees of freedom   Sum of Squares              Mean Squares
Regression            $K$                  SSRegr                      MSRegr = SSRegr / $K$
Residual              $n - K - 1$          SSResid                     MSResid = SSResid / $(n - K - 1)$
Total                 $n - 1$              SSTot = SSRegr + SSResid

Then recall that $R^2 = \dfrac{\mathrm{SSRegr}}{\mathrm{SSTot}}$. Work from the definition of $F$: $F = \dfrac{\mathrm{MSRegr}}{\mathrm{MSResid}}$.

5. Using the simple linear regression model (see Problem 1), suppose that you have collected these data:

The regression equation is
Y = 49.2 + 0.457 X

Predictor      Coef   SE Coef      T      P
Constant     49.161     6.186   7.95  0.000
X           0.45704   0.07748   5.90  0.000

S = 16.3670   R-Sq = 34.5%   R-Sq(adj) = 33.5%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   9321.6  9321.6  34.80  0.000
Residual Error  66  17680.1   267.9
Total           67  27001.7

You have just acquired a new data point for which you see $x_{new}$, but you have not yet observed $Y_{new}$.

(a) Identify the values $\beta_0$, $\beta_1$, $\sigma$, $b_0$, $b_1$, $s$, $n$.

(b) Give a point prediction for $Y_{new}$ in symbolic terms. "Symbolic terms" refers to the items given in part (a). Call this $\hat{Y}_{new}$. Provide a numeric prediction for $x_{new} = 62$.

(c) The $\hat{Y}_{new}$ of (b) is a random variable. Find $E\,\hat{Y}_{new}$ and $\mathrm{Var}\,\hat{Y}_{new}$.

(d) Give a 95% prediction interval for $Y_{new}$. This will certainly be something of the form $\hat{Y}_{new} \pm$ (expression). Give this symbolically and then for the specific case $x_{new} = 62$.

(e) Since $\beta_0$ and $\beta_1$ are parameters, so is $\beta_0 + \beta_1 x_{new}$. Give a point estimate for $\beta_0 + \beta_1 x_{new}$ in symbolic form. Call this $\tilde{Y}_{new}$. Give this value for the situation $x_{new} = 62$.

(f) The $\tilde{Y}_{new}$ of (e) is a random variable. Find $E\,\tilde{Y}_{new}$ and $\mathrm{Var}\,\tilde{Y}_{new}$.

(g) Give a 95% confidence interval for the parameter combination $\beta_0 + \beta_1 x_{new}$. Do this symbolically and then for $x_{new} = 62$.

(h) The intervals in (d) and (g) are different. Can you give a short explanation as to why this should be so?

(i) Find a $1 - \alpha$ confidence set for the vector $\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$. This is not easy. You can use the following facts.
* If $W$ is an $m$-by-1 random normal vector with mean 0 and variance matrix $\Omega$, and if we assume that $\Omega$ is a full-rank $m$-by-$m$ matrix, then the distribution of $W' \Omega^{-1} W$ is chi-squared with $m$ degrees of freedom.

* If $G$ is distributed as chi-squared on $m$ degrees of freedom, and if $H$ is distributed as chi-squared on $q$ degrees of freedom, and if $G$ and $H$ are independent, then the distribution of $\dfrac{G/m}{H/q}$ is $F_{m,q}$.

Certainly $\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix}$ is a random normal vector, but it needs some massage to take advantage of the first * fact.

gs 2011
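The covariance identities of Problem 1 can be spot-checked by Monte Carlo simulation before attempting the algebra. The following sketch is illustrative only, not part of the assignment; the parameter values $\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$ and the $x$ grid are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000
x = np.linspace(1, 10, n)            # fixed, non-random predictors
beta0, beta1, sigma = 2.0, 0.5, 1.0  # arbitrary illustrative parameters
Sxx = np.sum((x - x.mean()) ** 2)

eps = rng.normal(0.0, sigma, size=(reps, n))
Y = beta0 + beta1 * x + eps          # reps simulated samples, one per row
Ybar = Y.mean(axis=1)
b1 = (Y @ (x - x.mean())) / Sxx      # slope estimate  SxY / Sxx  per sample
b0 = Ybar - b1 * x.mean()            # intercept estimate per sample

def cov(a, b):
    """Sample covariance across the simulated replications."""
    return np.mean((a - a.mean()) * (b - b.mean()))

print(cov(Ybar, Y[:, 0]), sigma**2 / n)         # part (a): should be close
print(cov(Ybar, b1))                            # part (b): should be near 0
print(cov(b0, b1), -x.mean() * sigma**2 / Sxx)  # part (c): should be close
```

With 200,000 replications the simulated covariances agree with the claimed formulas to about three decimal places, which is a useful sanity check on the signs in parts (a) and (c).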
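The $F$ versus $R^2$ relationship asked for in Problem 4 can be checked against the Minitab output of Problem 5. A small numeric sketch, using the printed sums of squares with $K = 1$ predictor and $n = 68$ observations:

```python
# Sums of squares taken from the Analysis of Variance table in Problem 5.
SSRegr, SSResid = 9321.6, 17680.1
SSTot = SSRegr + SSResid          # 27001.7, as printed
K, n = 1, 68

R2 = SSRegr / SSTot
F_from_anova = (SSRegr / K) / (SSResid / (n - K - 1))
# The algebraic identity of Problem 4: F written purely in terms of R^2.
F_from_R2 = (R2 / K) / ((1 - R2) / (n - K - 1))

print(round(R2, 4))                                 # ~0.3452, i.e. R-Sq = 34.5%
print(round(F_from_anova, 2), round(F_from_R2, 2))  # both ~34.8, matching F
```

The two routes give the same number, which is exactly the algebraic relationship the problem asks you to establish in symbols.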