Chapter 3 Interval Estimation and Hypothesis Testing
The linear regression equation is:

    y_i = β1 + β2 x_i + e_i    for i = 1, 2, . . . , N

The error assumptions are:

    E(e_i) = 0           for all i
    var(e_i) = σ²        for all i          (homoskedasticity)
    cov(e_i, e_j) = 0    for all i ≠ j      (uncorrelated errors)

The least squares (OLS) estimators b1 and b2 are point estimators of the unknown parameters β1 and β2.

Now construct interval estimators and consider hypothesis tests for β1 and β2.

Introduce the assumption that the errors are normally distributed. The error assumptions can then be stated as:

    e_i ~ N(0, σ²)    for all i

Also, the equation errors are independently distributed. Independence can be assumed since, if two variables are normally distributed, then zero correlation implies statistical independence.

The normality assumption can be justified. View the error term as the sum of a ‘large’ number of independent and identically distributed random variables that reflect various omitted and unobservable factors. By the central limit theorem, this sum tends to a normal distribution.

Recall that b2 can be expressed as a linear function of the errors e_i. Any linear function of normally distributed variables is also normally distributed. Therefore, the distributional properties of the slope estimator can be stated as:

    b2 ~ N( β2 , var(b2) )

A result is:

    (b2 − β2) / √var(b2) ~ N( 0 , 1 )    the standard normal distribution

To work with this, replace var(b2) with the estimator vâr(b2) to get:

    (b2 − β2) / √vâr(b2) = (b2 − β2) / se(b2) ~ t(N−2)

This is the t-distribution with (N−2) degrees of freedom.
The probability density function (PDF) of the t-distribution is a
symmetric bell-shaped curve centered about zero.
It looks very similar to the standard normal distribution.
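As a concrete illustration of these quantities, the following Python sketch simulates one artificial sample, computes b2 and se(b2) with the usual least squares formulas, and forms the standardized ratio (b2 − β2)/se(b2). This is not part of the notes, and all numerical values are assumptions chosen only for the illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed values for the illustration only
    N, beta1, beta2, sigma = 40, 1.0, 0.5, 2.0
    x = rng.uniform(0, 10, size=N)
    e = rng.normal(0, sigma, size=N)          # normally distributed errors
    y = beta1 + beta2 * x + e

    # Least squares estimates b1 and b2
    xbar, ybar = x.mean(), y.mean()
    b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b1 = ybar - b2 * xbar

    # Estimated error variance, var-hat(b2), and se(b2)
    resid = y - b1 - b2 * x
    sigma2_hat = np.sum(resid ** 2) / (N - 2)
    var_b2_hat = sigma2_hat / np.sum((x - xbar) ** 2)
    se_b2 = np.sqrt(var_b2_hat)

    # Standardized ratio: distributed as t(N - 2) under the assumptions above
    print(b2, se_b2, (b2 - beta2) / se_b2)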
Interval estimation can now proceed.
For a probability value α, a critical value t_c can be found such that:

    P( −t_c ≤ (b2 − β2)/se(b2) ≤ t_c ) = 1 − α

Rearrange to obtain:

    P[ b2 − t_c se(b2) ≤ β2 ≤ b2 + t_c se(b2) ] = 1 − α

This gives a 100(1 − α)% confidence interval estimator for β2 as:

    b2 ± t_c se(b2)

Standard choices for the probability α are 0.10, 0.05, or 0.01. For example, α = 0.05 is associated with a 95% confidence interval.

Critical values t_c are listed in the t-distribution table printed on the inside front cover of the textbook.

The shape of the t-distribution depends on the degrees of freedom (df). As N increases (say N > 100) the t-distribution can be approximated by the standard normal distribution.

The figure below shows a comparison of the PDF of the t-distribution with 5 degrees of freedom and the standard normal distribution. For a valid PDF, the total area under the curve is equal to one.

Note that the t-distribution has thicker tails compared to the normal distribution. The normal distribution has a higher peak around the mean of 0.

[Figure: probability density functions of the standard normal N(0,1) and the t(5) distribution.]
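In place of the printed t-table, a statistics library can supply both the tail comparison and the critical values. A minimal Python sketch using scipy.stats (an illustration, not part of the notes):

    from scipy import stats

    # Thicker tails: compare P(T > 2) for t(5) with P(Z > 2) for the standard normal
    print(stats.t.sf(2, df=5))      # roughly 0.051
    print(stats.norm.sf(2))         # roughly 0.023

    # Critical value t_c with upper tail area alpha/2 (in place of the printed table)
    alpha, df = 0.05, 38
    t_c = stats.t.ppf(1 - alpha / 2, df)
    print(t_c)                      # roughly 2.024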
The confidence interval estimation rule can be applied to a data set to get an interval estimate.

Example

The household food expenditure textbook data has N = 40 observations. Least squares (OLS) estimation gives:

    b2 = 10.21    and    se(b2) = 2.09

For confidence interval estimation set α = 0.05.

The critical value from the t-distribution with N−2 = 38 degrees of freedom is illustrated in the picture.

[Figure: probability density function of t(38), with upper tail area = 0.05/2 = 0.025 to the right of t_c = 2.024 and an equal area to the left of −t_c.]

The 95% confidence interval estimate for β2 is:

    b2 ± t_c se(b2) = 10.21 ± 2.024 (2.09) = [ 5.97 , 14.45 ]

What affects the width of the interval estimate?

• An increase in se(b2) gives a wider confidence interval.

• As the sample size N increases:
  o t_c decreases
  o se(b2) decreases
  More information leads to more precise estimates. The extra information is reflected in narrower confidence intervals.

• The higher the probability content (1 − α) the wider the confidence interval. A 99% confidence interval is wider than a 95% confidence interval.

How is a 95% interval estimate interpreted?

View the data set as one sample from the population. A different sample will give different parameter estimates and different interval estimates. An interval estimate may or may not contain the true parameter value β2.

In 1000 samples, 950 of the interval estimates will contain β2 and 50 interval estimates will not contain β2.
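A short Python sketch reproduces the 95% interval estimate above (the estimates b2 = 10.21 and se(b2) = 2.09 are taken from the notes; the computed limits agree with the reported [5.97, 14.45] up to rounding):

    from scipy import stats

    b2, se_b2, N, alpha = 10.21, 2.09, 40, 0.05     # estimates reported in the notes
    t_c = stats.t.ppf(1 - alpha / 2, N - 2)          # about 2.024 with 38 df
    lower, upper = b2 - t_c * se_b2, b2 + t_c * se_b2
    print(round(lower, 2), round(upper, 2))          # about 5.98 and 14.44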
The interpretation of an interval estimate can be demonstrated with a computer simulation experiment.

For the household food expenditure textbook data suppose that the true economic model is:

    y = 80 + 12 x

That is, β2 = 12.

The steps in the experiment are:

Step 1: Use a normal random number generator to obtain errors e_i.

Step 2: For the 40 observations on household income x_i generate a sample of observations for food expenditure:

    y_i = 80 + 12 x_i + e_i    for i = 1, 2, . . . , 40

Step 3: With the observations (y_i, x_i) estimate the regression parameters by the least squares method. Calculate a 95% interval estimate for the slope parameter.

Repeat the three steps 1000 times.

A computer simulation experiment was performed. Results from the first 10 samples are in the table.

    Sample      b2        95% confidence interval
    1          12.72      [  8.14 , 17.31 ]
    2          12.19      [  8.06 , 16.31 ]
    3          11.78      [  7.40 , 16.17 ]
    4          14.41      [ 10.63 , 18.20 ]
    5          10.55      [  6.93 , 14.18 ]
    6           9.32      [  4.95 , 13.69 ]
    7          16.35      [ 12.53 , 20.16 ]    does not contain β2
    8          13.28      [  9.62 , 16.93 ]
    9          12.64      [  7.87 , 17.40 ]
    10         10.80      [  7.22 , 14.38 ]
    . . . more samples . . .
    average of 1000 samples:  12.03

Note that each interval estimate is centered at the slope estimate b2.

The results from the 1000 samples show that the average of the slope estimates is 12.03, very close to the true parameter β2 = 12. This demonstrates that the least squares estimator b2 is an unbiased estimator for β2.
Inspection of the 10 interval estimates reported above shows that one
does not contain the true parameter 12 and the others do contain the
true parameter.
The results for all 1000 samples showed that 951 of the interval
estimates contained the true slope parameter 12.
The other 49 interval estimates either had an upper limit below the
value 12 or a lower limit above the value 12.
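A Python sketch of this experiment is given below. The true model y = 80 + 12x, the sample size N = 40, and the 95% intervals follow the notes; the income values and the error standard deviation are not listed in the notes, so the uniform incomes and σ = 50 used here are assumptions for illustration only.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    N, beta1, beta2, reps, alpha = 40, 80.0, 12.0, 1000, 0.05
    sigma = 50.0                                   # assumed error standard deviation
    x = rng.uniform(10, 35, size=N)                # hypothetical income values ($100)
    t_c = stats.t.ppf(1 - alpha / 2, N - 2)

    slopes, covered = [], 0
    for _ in range(reps):
        # Step 1 and Step 2: generate errors and the dependent variable
        y = beta1 + beta2 * x + rng.normal(0, sigma, size=N)
        # Step 3: least squares estimates and the 95% interval estimate
        xbar = x.mean()
        b2 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
        b1 = y.mean() - b2 * xbar
        resid = y - b1 - b2 * x
        se_b2 = np.sqrt(resid @ resid / (N - 2) / np.sum((x - xbar) ** 2))
        slopes.append(b2)
        covered += (b2 - t_c * se_b2 <= beta2 <= b2 + t_c * se_b2)

    print(np.mean(slopes))   # close to 12: b2 is unbiased
    print(covered)           # close to 950 of the 1000 intervals contain beta2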
Hypothesis Tests
The linear regression equation is:
    y_i = β1 + β2 x_i + e_i    for i = 1, 2, . . . , N

Assume the equation errors are independently distributed and:

    e_i ~ N(0, σ²)    for all i

Consider testing the null hypothesis:

    H0: β2 = 0

against the alternative hypothesis:

    H1: β2 ≠ 0    (a two-sided alternative)
Why is this an interesting hypothesis to test?
For a given data set, if the null hypothesis is not rejected then there is
no evidence for a linear relationship between y and x.
Rejection of the null hypothesis implies that there is a ‘statistically
significant’ relationship between y and x.
That is, the estimate b2 is significantly different from zero.
This test is called a ‘test of significance’.
Decision Rule

Choose a significance level α. This sets the probability of a Type I error. A Type I error is rejecting a null hypothesis that is true.

Standard choices are α = 0.10, 0.05, or 0.01.

Three equivalent test methods are available.

• Confidence interval approach

Construct a 100(1 − α)% confidence interval estimate for β2. If the value 0 is contained in this interval then do not reject the null hypothesis of a zero slope coefficient.

• Test of significance approach

If the null hypothesis β2 = 0 is correct then the random variable:

    (b2 − 0) / se(b2) = b2 / se(b2) ~ t(N−2)    the t-distribution with N−2 df

From the least squares (OLS) estimation results a numeric t-test statistic is calculated as:

    t = b2 / se(b2)

Use the t-distribution table to look up the critical value t_c that satisfies:

    P( t(N−2) > t_c ) = α/2

The decision rule is reject the null hypothesis if:

    t > t_c    or    t < −t_c

That is, reject the null hypothesis if:

    |t| > t_c
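As a minimal Python sketch of this decision rule, using the food expenditure estimates reported earlier in these notes (b2 = 10.21, se(b2) = 2.09, N = 40); the scipy call replaces the table look-up:

    from scipy import stats

    b2, se_b2, N, alpha = 10.21, 2.09, 40, 0.05
    t_stat = b2 / se_b2                         # about 4.89
    t_c = stats.t.ppf(1 - alpha / 2, N - 2)     # upper tail area alpha/2
    print(abs(t_stat) > t_c)                    # True: reject H0: beta2 = 0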
• p-value approach

The p-value of a test statistic is the lowest significance level at which a null hypothesis can be rejected.

Example

From a data set with 72 observations, least squares estimation results give:

    b2 = 11    and    se(b2) = 5

To test the null hypothesis of a zero slope coefficient the test statistic is:

    t = 11 / 5 = 2.2

From the t-distribution table, with 72 − 2 = 70 degrees of freedom, critical values for a two-tail test are:

    5% significance level    α = 0.05    t_c = 1.994
    2% significance level    α = 0.02    t_c = 2.381

The calculated t-test statistic exceeds the 5% critical value and therefore the null hypothesis is rejected. But the null hypothesis is not rejected at a 2% level. This suggests that the p-value for the test is between 0.02 and 0.05.

For this example, for a two-tail test, an exact p-value is:

    p = 2 P( t(70) > 2.2 )

The p-value calculation is shown in the figure.

[Figure: probability density function of t(70), with lower tail area = p/2 to the left of −t and upper tail area = p/2 to the right of t = 2.2.]

p-values can be computed with Microsoft Excel. For this example, select Insert Function T.DIST.2T(2.2, 70) (a two-tail test). The answer for the p-value is 0.0311.
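The same numbers can be reproduced in Python with scipy.stats, as an alternative to the Excel function (an illustration, not part of the notes):

    from scipy import stats

    t_stat, df = 2.2, 70
    print(stats.t.ppf(0.975, df))      # about 1.994, the 5% two-tail critical value
    print(stats.t.ppf(0.99, df))       # about 2.381, the 2% two-tail critical value
    p = 2 * stats.t.sf(t_stat, df)     # same calculation as T.DIST.2T(2.2, 70)
    print(round(p, 4))                 # about 0.0311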
The p-value approach to hypothesis testing can now be presented.

The test of interest is:

    H0: β2 = 0    against    H1: β2 ≠ 0

From the least squares estimation results, calculate the test statistic:

    t = b2 / se(b2)

Next get the computer to calculate the p-value for a two-tail test as:

    p = 2 P( t(N−2) > |t| )

For a chosen significance level α, there is evidence to reject the null hypothesis if:

    p < α

p-values to accompany test statistics are routinely reported by econometrics computer packages. On the Stata estimation results, a p-value reported as 0.000 actually means p < 0.0005. This says reject the null hypothesis at any reasonable significance level.

A one-tail test may be of interest.

The linear regression equation gives a description of economic behaviour. Economic theory may suggest a sign for the slope coefficient β2. For example, suppose economic theory argues for a negative coefficient (a downward-sloping line). This can be tested with the left-tail test:

    H0: β2 ≥ 0    against    H1: β2 < 0

Rejection of the null hypothesis gives support for the economic theory of a negative slope coefficient.

The numerical test statistic is:

    t = b2 / se(b2)

The p-value for a left-tail test is calculated as:

    p = P( t(N−2) < t )
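A minimal Python sketch of this left-tail calculation (the numerical inputs are assumed for illustration only; they do not come from the notes):

    from scipy import stats

    # Assumed estimation results, for illustration only
    b2, se_b2, N = -0.42, 0.16, 40
    t_stat = b2 / se_b2                  # about -2.63
    p = stats.t.cdf(t_stat, N - 2)       # P( t(N-2) < t ), the left-tail p-value
    print(p, p < 0.05)                   # small p-value: evidence that beta2 < 0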
The p-value calculation for the left-tail test can be illustrated.

[Figure: probability density function of t(N−2); the shaded area to the left of t = b2/se(b2) is the p-value.]

For a chosen significance level α, there is evidence to reject the null hypothesis if:

    p < α

Alternatively, the test of significance approach can be used to make a decision for a left-tail test.

For a significance level α, use the t-distribution table to look up the critical value t_c that satisfies:

    P( t(N−2) > t_c ) = α

By symmetry, this is the same as:

    P( t(N−2) < −t_c ) = α

The decision rule is reject the null hypothesis in favour of the alternative β2 < 0 if:

    t < −t_c
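The same decision can be reached with a critical value in Python (a sketch using the same assumed inputs as the previous sketch, with t.ppf in place of the printed table):

    from scipy import stats

    # Same assumed estimation results as the previous sketch
    b2, se_b2, N, alpha = -0.42, 0.16, 40, 0.05
    t_stat = b2 / se_b2
    t_c = stats.t.ppf(1 - alpha, N - 2)   # P( t(N-2) > t_c ) = alpha
    print(t_stat < -t_c)                  # True: reject H0 in favour of beta2 < 0
    # By symmetry, stats.t.ppf(alpha, N - 2) gives -t_c directly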
For t-test statistics, the Stata estimation results report p-values for two-tail tests. A p-value for a one-tail test can be recovered from the p-value for a two-tail test.

Consider the two-tail test:

    H0: β2 = 0    against    H1: β2 ≠ 0

Denote p* as the p-value for the test.

Now suppose the hypothesis of interest is:

    H0: β2 ≥ 0    against    H1: β2 < 0

The statement of the hypothesis is motivated by the underlying economic theory.

If the slope estimate b2 is negative then the p-value for this left-tail test is:

    p* / 2

A more general hypothesis is:

    H0: β2 = c    against    H1: β2 ≠ c

where c is some number.

If the null hypothesis is correct then the random variable:

    (b2 − c) / se(b2) ~ t(N−2)

With a sample of data, the least squares estimation results give a slope estimate b2 and the estimated standard error se(b2). A test statistic and accompanying p-value for a two-tail test are calculated as:

    t = (b2 − c) / se(b2)
    p = 2 P( t(N−2) > |b2 − c| / se(b2) )

For a chosen significance level α, the information in the data gives evidence to reject the null hypothesis if:

    p < α
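A Python sketch of the general two-tail test of H0: β2 = c, together with the one-tail recovery p*/2 described above (all numerical inputs are assumptions for illustration):

    from scipy import stats

    # Assumed estimates and hypothesized value c, for illustration only
    b2, se_b2, N, c = 0.71, 0.25, 52, 1.0
    df = N - 2
    t_stat = (b2 - c) / se_b2                  # about -1.16
    p_two = 2 * stats.t.sf(abs(t_stat), df)    # two-tail p-value for H0: beta2 = c
    print(p_two)

    # One-tail p-value recovered as p*/2; valid here because the estimate lies
    # on the side of the one-sided alternative (b2 < c)
    p_left = p_two / 2
    print(p_left)                              # equals stats.t.cdf(t_stat, df)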
Example: A One-Tail Test

The household food expenditure model, introduced earlier, is:

    food_i = β1 + β2 income_i + e_i

For the sample of N = 40 households, food expenditure is measured in $ and income is reported in $100.

A new supermarket will be profitable if household expenditure on food is more than $8 out of each additional $100 of income. Should the property owner build the supermarket?

This problem suggests the right-tail test:

    H0: β2 ≤ 8    against    H1: β2 > 8

If the null hypothesis is rejected then there is evidence that the supermarket will be profitable.

The estimated household food expenditure equation is:

    fôod_i = 83.42 + 10.21 income_i
            (43.41)  (2.09)           (standard errors)

The test statistic is:

    t = (b2 − 8) / se(b2) = (10.21 − 8) / 2.09 = 1.06

The computer software reports a p-value for a two-tail test as:

    2 P( t(38) > 1.06 ) = 0.2978

A p-value for the right-tail test is:

    p = P( t(38) > 1.06 ) = 0.2978 / 2 = 0.15

The calculated p-value of 0.15 exceeds the usual significance levels (of 0.10, 0.05 or 0.01). Therefore, there is no evidence in the data to reject the null hypothesis. A new supermarket is unlikely to be profitable.
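The right-tail p-value in this example can be reproduced in Python (an illustration using the estimates reported in the notes; small differences reflect rounding of the test statistic):

    from scipy import stats

    b2, se_b2, df, c = 10.21, 2.09, 38, 8.0    # estimates reported in the notes
    t_stat = (b2 - c) / se_b2                  # about 1.06
    p_right = stats.t.sf(t_stat, df)           # P( t(38) > t ), about 0.15
    print(round(t_stat, 2), round(p_right, 2))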
The graph shows the calculation of the p-value for the right-tail test
as the area under the t-distribution probability density function to the
right of the calculated test statistic.
[Figure: probability density function of t(38); the shaded area to the right of t = 1.06 is p = 0.15.]