Week 4

Inference about the Slope and Intercept
• Recall, we have established that the least squares estimates b0 and b1 are linear combinations of the Yi's.
• Further, we have shown that they are unbiased and have the following variances:
    Var(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{S_{XX}} \right), \qquad Var(b_1) = \frac{\sigma^2}{S_{XX}}
• In order to make inference we assume that the εi's have a Normal distribution, that is, εi ~ N(0, σ²).
• This in turn means that the Yi's are normally distributed.
• Since both b0 and b1 are linear combinations of the Yi's, they also have a Normal distribution.

Inference for β1 in the Normal Error Regression Model
• The least squares estimate of β1 is b1; because it is a linear combination of normally distributed random variables (the Yi's), we have the following result:
    b_1 \sim N\left( \beta_1, \frac{\sigma^2}{S_{XX}} \right)
• We estimate the variance of b1 by S²/S_XX, where S² is the MSE, which has n−2 df.
• Claim: the distribution of
    \frac{b_1 - \beta_1}{\sqrt{S^2 / S_{XX}}}
  is t with n−2 df.
• Proof: ...

Tests and CIs for β1
• The hypothesis of interest about the slope in a Normal linear regression model is H0: β1 = 0.
• The test statistic for this hypothesis is
    t_{stat} = \frac{b_1}{S.E.(b_1)} = \frac{b_1}{\sqrt{S^2 / S_{XX}}}
• We compare the above test statistic to a t distribution with n−2 df to obtain the P-value.
• Further, a 100(1−α)% CI for β1 is
    b_1 \pm t_{n-2;\,\alpha/2} \, S.E.(b_1) = b_1 \pm t_{n-2;\,\alpha/2} \, \frac{S}{\sqrt{S_{XX}}}

Important Comment
• Similar results can be obtained about the intercept in a Normal linear regression model.
• See the book for more details.
• However, in many cases the intercept does not have any practical meaning, and therefore it is not necessary to make inference about it.

Example
• We have data on violent and property crimes in 23 US metropolitan areas. The data contain the following three variables:
    violcrim = number of violent crimes
    propcrim = number of property crimes
    popn = population in 1000's
• We are interested in the relationship between the size of the city and the number of violent crimes.
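As a sketch of the computations above (not part of the notes): the slope estimate, its standard error, the t statistic for H0: β1 = 0, and a 95% CI can all be computed from first principles. The data here are made up for illustration, and the critical value t_{3; 0.025} = 3.182 is taken from a standard t table.

```python
import math

# Hypothetical data, made up for illustration; n = 5
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.2, 5.9, 8.1, 9.9]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least squares estimates
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

# MSE (S^2) with n - 2 df, and the estimated standard error of b1
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (n - 2)
se_b1 = math.sqrt(mse / Sxx)

# t statistic for H0: beta1 = 0, and a 95% CI for beta1
t_stat = b1 / se_b1
t_crit = 3.182  # t_{n-2; alpha/2} = t_{3; 0.025}, from a t table
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(b1, se_b1, t_stat, ci)
```

Here t_stat far exceeds the critical value, so H0: β1 = 0 would be rejected at the 5% level for this toy dataset.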
Prediction of Mean Response
• Very often we want to use the estimated regression line to make a prediction about the mean of the response for a particular X value (assumed to be fixed).
• We know that the least squares line Ŷ = b0 + b1X is an estimate of E(Y) = β0 + β1X.
• Now, we can pick a point Xh in the range of the regression line; then Ŷh = b0 + b1Xh is an estimate of E(Yh) = β0 + β1Xh.
• Claim:
    Var(\hat{Y}_h) = \sigma^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}} \right)
• Proof: ...
• This is the variance of the estimate of E(Y) when X = Xh.

Confidence Interval for E(Yh)
• For a given Xh, a 100(1−α)% CI for the mean value of Y is
    \hat{Y}_h \pm t_{n-2;\,\alpha/2} \, s \sqrt{ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}} }
  where s² = MSE.
• Note, the CI above will be wider the further Xh is from X̄.

Example
• Consider the snow gauge data.
• Suppose we wish to predict the mean loggain when the device was calibrated at density 0.5, that is, when Xh = 0.5.

Prediction of New Observation
• We want to use the regression line to predict a particular value of Y for a given X = Xh,new, a new point taken after the n observations.
• The predicted value of a new point measured when X = Xh,new is
    \hat{Y}_{h,new} = b_0 + b_1 X_{h,new}
• Note, the above predicted value is the same as the estimate of E(Y) at Xh,new, but it should have a larger variance.
• The predicted value Ŷh,new has two sources of variability. One is due to the regression line being estimated by b0 + b1X. The second is due to εh,new, i.e., points don't fall exactly on the line.
• To calculate the variance of Ŷh,new we look at the difference Yh,new − Ŷh,new ...

Prediction Interval for New Observation
• A 100(1−α)% prediction interval for Yh,new when X = Xh,new is
    \hat{Y}_{h,new} \pm t_{n-2;\,\alpha/2} \, s \sqrt{ 1 + \frac{1}{n} + \frac{(X_{h,new} - \bar{X})^2}{S_{XX}} }
• This is not a confidence interval; CIs are for parameters, and here we are estimating a value of a random variable.

Confidence Bands for E(Y)
• Confidence bands capture the true mean of Y, E(Y) = β0 + β1X, everywhere over the range of the data.
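To make the two interval formulas concrete (again outside the notes, using the same made-up data as before), the sketch below computes both intervals with one hypothetical helper; the only difference is the extra "1 +" under the square root for a new observation, which is why the prediction interval is always wider than the CI for the mean.

```python
import math

# Same made-up data as in the earlier sketch (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.2, 5.9, 8.1, 9.9]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
t_crit = 3.182  # t_{n-2; alpha/2} = t_{3; 0.025}, from a t table

def interval(xh, new_obs=False):
    """CI for E(Yh) at X = xh, or (new_obs=True) a prediction interval."""
    yhat = b0 + b1 * xh
    extra = 1.0 if new_obs else 0.0  # the extra sigma^2 for a new point
    se = s * math.sqrt(extra + 1.0 / n + (xh - xbar) ** 2 / Sxx)
    return (yhat - t_crit * se, yhat + t_crit * se)

width = lambda iv: iv[1] - iv[0]
ci_center = interval(xbar)            # CI for the mean at X = Xbar
ci_edge = interval(5.0)               # CI for the mean far from Xbar
pi_center = interval(xbar, new_obs=True)  # prediction interval at Xbar
print(ci_center, ci_edge, pi_center)
```

Comparing widths shows both facts stated above: the CI widens as Xh moves away from X̄, and the prediction interval is wider than the CI at the same Xh.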
• For this we use the Working-Hotelling procedure, which gives the following boundary values at any given Xh:
    \hat{Y}_h \pm \sqrt{2 F_{2,\,n-2;\,\alpha}} \; s \sqrt{ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}} }
  where F_{2, n−2; α} is the upper α-quantile from an F distribution with 2 and n−2 df (Table B.4).

Decomposition of Sum of Squares
• The total sum of squares (SS) in the response variable is
    SSTO = \sum (Y_i - \bar{Y})^2
• The total SS can be decomposed into two main sources: error SS and regression SS.
• The error SS is
    SSE = \sum e_i^2
• The regression SS is
    SSR = b_1^2 \sum (X_i - \bar{X})^2
  It is the amount of variation in the Y's that is explained by the linear relationship of Y with X.

Claims
• First, SSTO = SSR + SSE, that is,
    \sum (Y_i - \bar{Y})^2 = b_1^2 \sum (X_i - \bar{X})^2 + \sum e_i^2
• Proof: ...
• An alternative decomposition is
    \sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2
• Proof: Exercises.

Analysis of Variance Table
• The decomposition of SS discussed above is usually summarized in an analysis of variance (ANOVA) table as follows:

    Source       df     SS      MS
    Regression   1      SSR     MSR = SSR/1
    Error        n−2    SSE     MSE = SSE/(n−2)
    Total        n−1    SSTO

• Note that the MSE is s², our estimate of σ².

Coefficient of Determination
• The coefficient of determination is
    R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}
• It must satisfy 0 ≤ R² ≤ 1.
• R² gives the percentage of variation in the Y's that is explained by the regression line.

Claim
• R² = r², that is, the coefficient of determination is the square of the correlation coefficient.
• Proof: ...

Important Comments about R²
• It is a useful measure, but...
• There is no absolute rule about how big it should be.
• It is not resistant to outliers.
• It is not meaningful for models with no intercept.
• It is not useful for comparing models unless one set of predictors is a subset of the other.

ANOVA F Test
• The ANOVA table gives us another test of H0: β1 = 0.
• The test statistic is
    F_{stat} = \frac{MSR}{MSE}
• Derivations ...
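The decomposition, R², and the F statistic can be checked numerically. This sketch (not from the notes) reuses the same made-up data and verifies three of the claims above: SSTO = SSR + SSE, R² = r², and, for simple regression, that the ANOVA F statistic equals the square of the t statistic for the slope.

```python
import math

# Same made-up data as in the earlier sketches (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.2, 5.9, 8.1, 9.9]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Syy = sum((yi - ybar) ** 2 for yi in y)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

# Sums of squares
fitted = [b0 + b1 * xi for xi in x]
ssto = Syy                                   # total SS
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error SS
ssr = b1 ** 2 * Sxx                          # regression SS

# ANOVA quantities, R^2, and the correlation coefficient r
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse
r2 = ssr / ssto
r = Sxy / math.sqrt(Sxx * Syy)

t_stat = b1 / math.sqrt(mse / Sxx)  # slope t statistic, for the F = t^2 check
print(ssto, ssr, sse, r2, f_stat)
```

For this toy dataset nearly all of the variation in Y is explained by X, so R² is close to 1 and the F statistic is very large.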