1 F-Tests

The F-test is useful for testing joint hypotheses about variances. It is the ratio of two chi-squares (each a sum of squares of independent normally distributed random variables, divided by its degrees of freedom). Let there be k independent variables; the degrees of freedom of the regression are n − k − 1.

F = (ESS/k) / (SSR/(n − k − 1)) ∼ F(k, n − k − 1)

F = [Σᵢ(Ŷᵢ − Ȳ)²/k] / [Σᵢ(Yᵢ − Ŷᵢ)²/(n − k − 1)]

where ESS = Σᵢ(Ŷᵢ − Ȳ)² is the explained sum of squares (see Stock and Watson, p. 123).

Imagine a random variable Y with a mean of Ȳ. Now take a sample observation and suppose it is greater than Ȳ. What explains this? One explanation is random variation, due to the inherent variability of the random variable as measured by the variance of the population, σ². Another explanation is that this particular Y is associated with an X, some other variable, that is greater than its own mean, X̄.

The SSR is the sum of squared residuals, defined as

SSR = Σᵢ(Yᵢ − Ŷᵢ)²

We have

F = (ESS/k) / (SSR/(n − k − 1))

Put the larger variance in the numerator when forming the F. If F is significantly different from 1, then the two variances are not likely to be the same.

Is it possible to have a significant F-test in a multiple regression and still have insignificant individual variables (that is, all with t-stats less than t_crit)? Yes! It is possible for the sample size to be large enough that the F-statistic exceeds its critical value no matter what the regression is. Is it possible to have a significant F-test in a simple regression and still have an insignificant individual variable (that is, a t-stat less than t_crit)? No.

1.1 Deriving The F-Test for a Simple Regression (k = 1)

The F-statistic is often used to test hypotheses about the regression line as a whole. It asks: what is the ratio of the variance explained by the regression to what is left over? If the null hypothesis is true, then the variation of Y from observation to observation will not be affected by changes in X, but must be explained by the random disturbance term alone.
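The F ratio defined above can be computed directly from the two sums of squares. A minimal sketch, assuming NumPy; the data, sample size, and coefficients here are made up for illustration:

```python
import numpy as np

# Hypothetical data: n observations, k = 2 regressors (illustrative only)
rng = np.random.default_rng(0)
n, k = 50, 2
X = rng.normal(size=(n, k))
Y = 1.0 + 0.5 * X[:, 0] + rng.normal(size=n)

# OLS fit with an intercept column
Xd = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(Xd, Y, rcond=None)[0]
Y_hat = Xd @ beta_hat

ESS = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
SSR = np.sum((Y - Y_hat) ** 2)          # sum of squared residuals

F = (ESS / k) / (SSR / (n - k - 1))     # compare with F(k, n - k - 1) critical value
print(F)
```

Because the regression includes an intercept, ESS + SSR equals the total sum of squares, which is the decomposition the F ratio relies on.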
In this case the numerator and denominator of the F-statistic will be the same. To see this, write (in deviation form, with xᵢ = Xᵢ − X̄):

Ŷᵢ = Ȳ + β̂xᵢ

Σᵢ(Ŷᵢ − Ȳ)² = β̂² Σᵢ xᵢ² = ESS

The F is really a ratio of two variances. A variance is an expected value of a sum of squares of deviations around the mean of whatever one is taking the variance of. What is the expected value of the ESS?

E(ESS) = E(β̂² Σᵢ xᵢ²) = Σᵢ xᵢ² E(β̂²)

Adding and subtracting the population parameter β and expanding:

E(β̂²) = E[((β̂ − β) + β)²]

E(β̂²) = E[(β̂ − β)² + 2β(β̂ − β) + β²]

By the theorem on the linear combination of expected values:

E(β̂²) = E[(β̂ − β)²] + E[2β(β̂ − β)] + E(β²)

But since β̂ is an unbiased estimator of β, E[2β(β̂ − β)] = 0, and since β is a number and not a random variable, E(β²) = β². We have:

E(β̂²) = E[(β̂ − β)²] + β²

so that the expected value of the ESS can be written:

E(ESS) = E[(β̂ − β)²] Σᵢ xᵢ² + β² Σᵢ xᵢ²    (1)

Now look at the first term on the right, E[(β̂ − β)²]. This is just a fancy way to write the variance of β̂. Let's look at the variance of β̂ for a moment. The variance of the estimator β̂ is given by

var(β̂₁) = var(Σᵢ xᵢYᵢ / Σᵢ xᵢ²)

When you look at this expression, it is really just a linear combination of the Ys with weights

wᵢ = xᵢ / Σᵢ xᵢ²

so that

var(β̂₁) = Σᵢ wᵢ² var(Yᵢ)

and since var(Yᵢ) = σ², the variance of the population, we have

var(β̂₁) = σ² / Σᵢ xᵢ²

Now in equation (1) we can substitute this last equation, rearranged as

var(β̂₁) Σᵢ xᵢ² = σ²

to get:

E(ESS) = σ² + β² Σᵢ xᵢ²    (2)

Under the null hypothesis that β = 0, we get E(ESS) = σ². This means that if there really isn't any explanatory power in the Xs, the variance of Ŷ around the mean will be the same as the population variance. This is what we would have guessed; that is, the expected value of the sum of squared deviations around the mean is the population variance when β = 0. Where does the χ² distribution come in?
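Equation (2) can be checked by simulation. A minimal Monte Carlo sketch, assuming NumPy; the sample size, σ, and number of replications are made up, and β is set to 0 so that E(ESS) should come out near σ²:

```python
import numpy as np

# Monte Carlo check of equation (2): E(ESS) = sigma^2 + beta^2 * sum(x_i^2).
# With beta = 0 the average ESS should be close to sigma^2 (here 4.0).
rng = np.random.default_rng(1)
n, beta, sigma, reps = 30, 0.0, 2.0, 5000

x = rng.normal(size=n)
x = x - x.mean()                 # deviation form, so sum(x_i) = 0
Sxx = np.sum(x ** 2)

ess_values = []
for _ in range(reps):
    Y = beta * x + rng.normal(scale=sigma, size=n)
    beta_hat = np.sum(x * Y) / Sxx            # OLS slope in deviation form
    ess_values.append(beta_hat ** 2 * Sxx)    # ESS = beta_hat^2 * sum(x_i^2)

print(np.mean(ess_values))   # should be near sigma^2 = 4 when beta = 0
```

Setting β to a nonzero value in the same sketch shifts the average ESS up by β²Σᵢxᵢ², matching the second term of equation (2).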
Each of the Ys can be thought of separately as an identically and independently distributed (iid) random variable, conditional on X. So where does this χ² get its degrees of freedom? Think of each conditional Y as deviating from the mean Ȳ; since they are all identically distributed, they all have the same mean.

What is the distribution of the SSR? Note that the sampling distribution of β̂/SE(β̂) is N(0, 1) under the null, so β̂²/SE(β̂)² is distributed as the square of a standard normal, that is, as chi-square with 1 degree of freedom. Similarly, SSR/σ² is distributed as a sum of squares of independently distributed normal random variables with mean 0 and variance 1, that is, as chi-square with n − 2 degrees of freedom. So it finally makes sense that the F-statistic can be written

F = (ESS/1) / (SSR/(n − 2))

for a simple regression.

1.2 Alternative Views of the F-Statistic

The null model can be seen as a restricted or constrained case of OLS (k = 0); the unrestricted or unconstrained case would just be the regression with k = 1. Calculate the sample variance of the residuals in the unrestricted case:

Σᵢ ûᵢ² / (n − k − 1) = SSR/(n − 2)

and now in the restricted case:

Σᵢ (Yᵢ − Ȳ)² / (n − 1) = TSS/(n − 1)

Now let ∆ESS be the proportional increase in the error sum of squares caused by using the restricted model:

∆ESS = (TSS − SSR)/SSR

but since TSS = ESS + SSR, we have:

∆ESS = ESS/SSR, and (ESS/1) / (SSR/(n − 2)) ∼ F(1, n − 2)

If ∆ESS is close to zero, the contribution made by the unrestricted model (using the regression) is insignificant.

1.3 The F can be expressed in terms of the R²

R² = 1 − Σᵢ eᵢ² / Σᵢ(Yᵢ − Ȳ)² = (TSS − SSR)/TSS

1 − R² = 1 − (TSS − SSR)/TSS = (TSS − TSS + SSR)/TSS = SSR/TSS

R²/(1 − R²) = (TSS − SSR)/SSR

F = (ESS/k) / (SSR/(n − k − 1))

but since TSS = ESS + SSR:

F · [k/(n − k − 1)] = (TSS − SSR)/SSR

Hence:

F · [k/(n − k − 1)] = R²/(1 − R²)

F = (R²/k) / ((1 − R²)/(n − k − 1))

If the null hypothesis is true, then we would expect the R², and therefore the F, to be approximately 0. Thus a high value of the F-statistic is a rationale for rejecting the null.
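The algebra above says the sums-of-squares form and the R² form of the F-statistic are the same number. A minimal numerical check, assuming NumPy; the data are made up:

```python
import numpy as np

# Check that the two expressions for F agree on hypothetical data:
# F = (ESS/k)/(SSR/(n-k-1))  and  F = (R^2/k)/((1-R^2)/(n-k-1)).
rng = np.random.default_rng(2)
n, k = 40, 1
x = rng.normal(size=n)
Y = 2.0 + 0.8 * x + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(Xd, Y, rcond=None)[0]
Y_hat = Xd @ beta_hat

TSS = np.sum((Y - Y.mean()) ** 2)
SSR = np.sum((Y - Y_hat) ** 2)
ESS = TSS - SSR                  # uses TSS = ESS + SSR
R2 = 1 - SSR / TSS

F_sums = (ESS / k) / (SSR / (n - k - 1))
F_r2 = (R2 / k) / ((1 - R2) / (n - k - 1))
print(F_sums, F_r2)              # the two forms agree
```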
An F-statistic not significantly different from zero leads us to conclude that the explanatory variable(s) do little to explain the variation around the mean of Y.

Main point: the F-test is really a way of getting a distribution for the R², so that its significance can be tested.

1.4 The F-statistic is the square of the t-stat in simple regression

In deviation form, we can write:

Ŷᵢ = Ȳ + β̂xᵢ

Σᵢ(Ŷᵢ − Ȳ)² = β̂² Σᵢ xᵢ²

so that

F = Σᵢ(Ŷᵢ − Ȳ)² / [Σᵢ(Yᵢ − Ŷᵢ)²/(n − 2)] = β̂² Σᵢ xᵢ² / [Σᵢ(Yᵢ − Ŷᵢ)²/(n − 2)]

Since

[SE(β̂)]² = [Σᵢ(Yᵢ − Ŷᵢ)²/(n − 2)] / Σᵢ xᵢ²

we have

F = β̂² / [SE(β̂)]² = [β̂/SE(β̂)]² = t²
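The identity F = t² can be verified numerically. A minimal sketch, assuming NumPy; the data are made up:

```python
import numpy as np

# Numerical check that F = t^2 in a simple regression.
rng = np.random.default_rng(3)
n = 25
x = rng.normal(size=n)
Y = 1.0 + 0.5 * x + rng.normal(size=n)

xd = x - x.mean()                               # deviation form
Sxx = np.sum(xd ** 2)
beta_hat = np.sum(xd * (Y - Y.mean())) / Sxx    # OLS slope
Y_hat = Y.mean() + beta_hat * xd

SSR = np.sum((Y - Y_hat) ** 2)
ESS = np.sum((Y_hat - Y.mean()) ** 2)

s2 = SSR / (n - 2)                  # estimate of sigma^2
se_beta = np.sqrt(s2 / Sxx)         # SE(beta_hat)

t = beta_hat / se_beta
F = (ESS / 1) / (SSR / (n - 2))
print(F, t ** 2)                    # F equals t squared
```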