Inference about the Slope and Intercept

• Recall, we have established that the least squares estimates $b_0$ and $b_1$ are linear combinations of the $Y_i$'s.
• Further, we have shown that they are unbiased and have the following variances:
$$\operatorname{Var}(b_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{XX}}\right) \qquad \operatorname{Var}(b_1) = \frac{\sigma^2}{S_{XX}}$$
• In order to make inference we assume that the $\varepsilon_i$'s have a Normal distribution, that is, $\varepsilon_i \sim N(0, \sigma^2)$.
• This in turn means that the $Y_i$'s are normally distributed.
• Since both $b_0$ and $b_1$ are linear combinations of the $Y_i$'s, they also have a Normal distribution.

Inference for β1 in the Normal Error Regression Model

• The least squares estimate of $\beta_1$ is $b_1$; because it is a linear combination of normally distributed random variables (the $Y_i$'s), we have the following result:
$$b_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2}{S_{XX}}\right)$$
• We estimate the variance of $b_1$ by $S^2 / S_{XX}$, where $S^2$ is the MSE, which has $n-2$ df.
• Claim: the distribution of $\dfrac{b_1 - \beta_1}{\sqrt{S^2 / S_{XX}}}$ is $t$ with $n-2$ df.
• Proof: …

Tests and CIs for β1

• The hypothesis of interest about the slope in a Normal linear regression model is $H_0: \beta_1 = 0$.
• The test statistic for this hypothesis is
$$t_{stat} = \frac{b_1}{S.E.(b_1)} = \frac{b_1}{S / \sqrt{S_{XX}}}$$
• We compare the above test statistic to a $t$ distribution with $n-2$ df to obtain the P-value…. (A numerical sketch of this test appears after the "Prediction of New Observation" slide below.)
• Further, a 100(1-α)% CI for $\beta_1$ is:
$$b_1 \pm t_{(n-2;\ \alpha/2)} \, S.E.(b_1) = b_1 \pm t_{(n-2;\ \alpha/2)} \, \frac{S}{\sqrt{S_{XX}}}$$

Important Comment

• Similar results can be obtained about the intercept in a Normal linear regression model.
• See the book for more details.
• However, in many cases the intercept does not have any practical meaning, and therefore it is not necessary to make inference about it.

Example

• We have data on violent and property crimes in 23 US metropolitan areas. The data contain the following three variables:
  violcrim = number of violent crimes
  propcrim = number of property crimes
  popn = population in 1000's
• We are interested in the relationship between the size of the city and the number of violent crimes….

Prediction of Mean Response

• Very often, we want to use the estimated regression line to make a prediction about the mean of the response for a particular X value (assumed to be fixed).
• We know that the least squares line $\hat{Y} = b_0 + b_1 X$ is an estimate of $E(Y) = \beta_0 + \beta_1 X$.
• Now, we can pick a point $(X_h, Y_h)$ in the range of the regression line; then $\hat{Y}_h = b_0 + b_1 X_h$ is an estimate of $E(Y_h) = \beta_0 + \beta_1 X_h$.
• Claim:
$$\operatorname{Var}(\hat{Y}_h) = \sigma^2\left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}}\right)$$
• Proof: … (a sketch of one standard argument is given after the "Prediction of New Observation" slide below)
• This is the variance of the estimate of $E(Y)$ when $X = X_h$.

Confidence Interval for E(Yh)

• For a given $X_h$, a 100(1-α)% CI for the mean value of Y is
$$\hat{Y}_h \pm t_{(n-2;\ \alpha/2)} \, s \sqrt{\frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}}}$$
where $s = \sqrt{MSE}$. (A numerical sketch follows below.)
• Note, the CI above will be wider the further $X_h$ is from $\bar{X}$.

Example

• Consider the snow gauge data.
• Suppose we wish to predict the mean loggain when the device was calibrated at density 0.5, that is, when $X_h = 0.5$….

Prediction of New Observation

• We want to use the regression line to predict a particular value of Y for a given $X = X_{h,new}$, a new point taken after the n observations.
• The predicted value of a new point measured when $X = X_{h,new}$ is $\hat{Y}_{h,new} = b_0 + b_1 X_{h,new}$.
• Note, the above predicted value is the same as the estimate of $E(Y)$ at $X_{h,new}$, but it should have a larger variance.
• The predicted value $\hat{Y}_{h,new}$ has two sources of variability. One is due to the regression line being estimated by $b_0 + b_1 X$. The second is due to $\varepsilon_{h,new}$, i.e., points don't fall exactly on the line.
• To calculate the variance of $\hat{Y}_{h,new}$ we look at the difference $Y_{h,new} - \hat{Y}_{h,new}$….
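To make the slope test and CI from the "Tests and CIs for β1" slide concrete, here is a minimal numerical sketch in Python. The data values are made up for illustration (they are not the course's crime or snow gauge data), and names such as x, y, and alpha are this sketch's own choices; it assumes only numpy and scipy.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data, not the course's crime or snow gauge data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)

# Least squares estimates and the building blocks from the slides.
x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / S_xx
b0 = y_bar - b1 * x_bar

# S^2 = MSE with n-2 df, and S.E.(b1) = S / sqrt(S_XX).
resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)
se_b1 = np.sqrt(s2 / S_xx)

# t statistic for H0: beta1 = 0, compared to t with n-2 df.
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

# 100(1-alpha)% CI for beta1: b1 +/- t_{n-2; alpha/2} * S.E.(b1).
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(f"b1 = {b1:.4f}, t = {t_stat:.3f}, p = {p_value:.4g}")
print(f"95% CI for beta1: ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
```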
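The "Proof: …" under the variance claim for $\hat{Y}_h$ is left blank in the transcript. One standard argument, sketched here in LaTeX, uses the representation $b_1 = \sum_i k_i Y_i$ with $k_i = (X_i - \bar{X})/S_{XX}$, which is how the course established that $b_1$ is a linear combination of the $Y_i$'s; treat this as a sketch, not necessarily the proof given in lecture.

```latex
% Sketch: Var(\hat{Y}_h), assuming b_1 = \sum_i k_i Y_i with
% k_i = (X_i - \bar{X})/S_{XX} and \sum_i k_i = 0.
\begin{align*}
\hat{Y}_h &= b_0 + b_1 X_h = \bar{Y} + b_1 (X_h - \bar{X})
  && \text{since } b_0 = \bar{Y} - b_1 \bar{X}, \\
\operatorname{Cov}(\bar{Y}, b_1)
  &= \frac{\sigma^2}{n} \sum_i k_i = 0
  && \text{as the $Y_i$'s are independent,} \\
\operatorname{Var}(\hat{Y}_h)
  &= \operatorname{Var}(\bar{Y}) + (X_h - \bar{X})^2 \operatorname{Var}(b_1)
   = \sigma^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}} \right).
\end{align*}
```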
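Continuing the numerical sketch above, the CI for $E(Y_h)$ can be computed at an arbitrary illustrative value (4.5 here, not a value from the course examples); b0, b1, s2, S_xx, x_bar, n, and t_crit are reused from the earlier block.

```python
# CI for E(Y_h) at an illustrative X_h, reusing b0, b1, s2, S_xx,
# x_bar, n, t_crit from the sketch above.
x_h = 4.5
y_h_hat = b0 + b1 * x_h

# S.E. of the estimated mean: s * sqrt(1/n + (X_h - Xbar)^2 / S_XX).
se_mean = np.sqrt(s2 * (1 / n + (x_h - x_bar) ** 2 / S_xx))
ci_mean = (y_h_hat - t_crit * se_mean, y_h_hat + t_crit * se_mean)
print(f"Y_h_hat = {y_h_hat:.3f}, 95% CI for E(Y_h): "
      f"({ci_mean[0]:.3f}, {ci_mean[1]:.3f})")
```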
Prediction Interval for New Observation

• A 100(1-α)% prediction interval for $Y_{h,new}$ when $X = X_{h,new}$ is
$$\hat{Y}_{h,new} \pm t_{(n-2;\ \alpha/2)} \, s \sqrt{1 + \frac{1}{n} + \frac{(X_{h,new} - \bar{X})^2}{S_{XX}}}$$
(A numerical sketch follows at the end of this section.)
• This is not a confidence interval; CIs are for parameters, and here we are estimating the value of a random variable.

Confidence Bands for E(Y)

• Confidence bands capture the true mean of Y, $E(Y) = \beta_0 + \beta_1 X$, everywhere over the range of the data.
• For this we use the Working-Hotelling procedure, which gives us the following boundary values at any given $X_h$ (sketched numerically at the end of this section):
$$\hat{Y}_h \pm \sqrt{2 F_{(2,\ n-2);\ \alpha}} \; s \sqrt{\frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}}}$$
where $F_{(2,\ n-2);\ \alpha}$ is the upper α-quantile from an F distribution with 2 and n-2 df (Table B.4).

Decomposition of Sum of Squares

• The total sum of squares (SS) in the response variable is $SSTO = \sum (Y_i - \bar{Y})^2$.
• The total SS can be decomposed into two main sources: error SS and regression SS.
• The error SS is $SSE = \sum e_i^2$.
• The regression SS is $SSR = b_1^2 \sum (X_i - \bar{X})^2$. It is the amount of variation in the Y's that is explained by the linear relationship of Y with X.

Claims

• First, SSTO = SSR + SSE, that is,
$$\sum (Y_i - \bar{Y})^2 = b_1^2 \sum (X_i - \bar{X})^2 + \sum e_i^2$$
• Proof: ….
• An alternative decomposition is
$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$
• Proof: Exercises.

Analysis of Variance Table

• The decomposition of SS discussed above is usually summarized in an analysis of variance (ANOVA) table as follows:

  Source of Variation   SS     df     MS                 F
  Regression            SSR    1      MSR = SSR/1        MSR/MSE
  Error                 SSE    n-2    MSE = SSE/(n-2)
  Total                 SSTO   n-1

• Note that the MSE is s², our estimate of σ².

Coefficient of Determination

• The coefficient of determination is
$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$
• It must satisfy 0 ≤ R² ≤ 1.
• R² gives the percentage of variation in the Y's that is explained by the regression line.

Claim

• R² = r², that is, the coefficient of determination is the square of the correlation coefficient.
• Proof: …

Important Comments about R²

• It is a useful measure, but…
• There is no absolute rule about how big it should be.
• It is not resistant to outliers.
• It is not meaningful for models with no intercept.
• It is not useful for comparing models unless one set of predictors is a subset of the other.

ANOVA F Test

• The ANOVA table gives us another test of $H_0: \beta_1 = 0$.
• The test statistic is
$$F_{stat} = \frac{MSR}{MSE}$$
• Derivations …
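A numerical sketch of the prediction interval, again reusing the illustrative quantities from the slope-inference block earlier; the only change from the mean-response CI is the extra "1 +" inside the square root, the variance contribution of $\varepsilon_{h,new}$.

```python
# Prediction interval for a new observation at an illustrative X_h,new,
# reusing b0, b1, s2, S_xx, x_bar, n, t_crit from the earlier sketch.
x_h_new = 4.5
y_new_hat = b0 + b1 * x_h_new

# The leading "1 +" is the eps_{h,new} variance: a new point does not
# fall exactly on the line, so this is wider than the CI for E(Y_h).
se_pred = np.sqrt(s2 * (1 + 1 / n + (x_h_new - x_bar) ** 2 / S_xx))
pi = (y_new_hat - t_crit * se_pred, y_new_hat + t_crit * se_pred)
print(f"95% prediction interval at X = {x_h_new}: ({pi[0]:.3f}, {pi[1]:.3f})")
```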
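The Working-Hotelling bands differ from the pointwise CI only in the multiplier: $\sqrt{2 F_{(2,\ n-2);\ \alpha}}$ replaces the t quantile. A sketch on a small grid of the same illustrative data:

```python
# Working-Hotelling confidence band, reusing quantities from the
# sketches above. stats.f.isf gives the upper-alpha F(2, n-2) quantile.
W = np.sqrt(2 * stats.f.isf(alpha, 2, n - 2))
grid = np.linspace(x.min(), x.max(), 5)
se_grid = np.sqrt(s2 * (1 / n + (grid - x_bar) ** 2 / S_xx))
lower = b0 + b1 * grid - W * se_grid
upper = b0 + b1 * grid + W * se_grid
print(np.column_stack([grid, lower, upper]))
```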
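Finally, the SS decomposition, R², and the ANOVA F test from the last few slides, verified numerically on the same illustrative data. For simple regression the F statistic equals the square of the slope t statistic computed earlier, so the two tests of $H_0: \beta_1 = 0$ agree.

```python
# ANOVA decomposition on the illustrative data, reusing x, y, y_bar,
# b0, b1, n, t_stat from the sketches above.
y_hat = b0 + b1 * x
ssto = np.sum((y - y_bar) ** 2)        # total SS
sse = np.sum((y - y_hat) ** 2)         # error SS
ssr = np.sum((y_hat - y_bar) ** 2)     # regression SS = b1^2 * S_XX
assert np.isclose(ssto, ssr + sse)     # SSTO = SSR + SSE

r_squared = ssr / ssto                 # equivalently 1 - SSE/SSTO
msr, mse = ssr / 1, sse / (n - 2)
f_stat = msr / mse                     # ANOVA F test of H0: beta1 = 0
p_value_f = stats.f.sf(f_stat, 1, n - 2)
print(f"R^2 = {r_squared:.4f}, F = {f_stat:.3f}, p = {p_value_f:.4g}")
print(f"F equals t^2: {np.isclose(f_stat, t_stat ** 2)}")
```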