Task: t test

Goal: understand (i) the t test; (ii) p-value

Reading: appendix C.6 of the textbook

1. Recall

   \[ \operatorname{var}(\bar{y}) = \frac{\sigma_y^2}{n} \tag{1} \]

   Keep in mind that formula (1) holds only for a random (i.i.d.) sample.

2. The estimator for the population variance $\sigma_y^2$ is the sample variance

   \[ s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{2} \]

   The square root of $s_y^2$ is called the standard deviation (of $y$), denoted by $s_y$.

3. If we take the square root of (1) and replace $\sigma_y$ with $s_y$, we get the standard error (of $\bar{y}$)

   \[ se(\bar{y}) = \frac{s_y}{\sqrt{n}} \tag{3} \]

   So the standard deviation is for $y$, while the standard error is for $\bar{y}$.

4. The (one-sample) t test (t-value, t-ratio, t-statistic) is defined as

   \[ t \equiv \frac{\bar{y} - \mu_y}{s_y / \sqrt{n}} = \frac{\bar{y} - \mu_y}{se(\bar{y})} \tag{4} \]

   where $\mu_y$ is unknown.

5. Basically, the t test is the standardized sample mean.

6. This is how we make the t test computable. For a given sample, we can always compute the sample mean $\bar{y}$ and its standard error $se(\bar{y})$. Then we compare the sample mean to a hypothesized value $c$, which is specified in the null hypothesis:

   \[ H_0 : \mu_y = c \tag{5} \]

   Then the computable t test becomes

   \[ t \equiv \frac{\bar{y} - c}{s_y / \sqrt{n}} = \frac{\bar{y} - c}{se(\bar{y})} \tag{6} \]

7. Intuitively, the null hypothesis $H_0$ specified in (5) looks false (and should be rejected) if we find a big difference $\bar{y} - c$. This is because $\bar{y}$ is consistent: it converges to the true population mean $\mu_y$ as the sample size grows, so a big gap between $\bar{y}$ and $c$ suggests that $\mu_y$ differs from $c$.

8. However, a big difference $\bar{y} - c$ may occur by chance, since we use a sample instead of the population. To take the sampling error into account, we use the t test (6), which is effectively a scaled difference. The scaler is the inverse of the standard error, $1 / se(\bar{y})$. Everything else equal, the evidence against $H_0$ is more convincing if (i) $|\bar{y} - c|$ is big, or (ii) $se(\bar{y})$ is small. In short, the null hypothesis $H_0$ is rejected if the t test is big (in absolute value).

9. Then the old question reappears: how big is big?
   We need to refer to the standard normal distribution, which is the limit of the sampling distribution of the t test according to the central limit theorem (CLT). There are two equivalent ways to define "big" based on the standard normal distribution:

   \[ |t| > 1.96 \quad \text{(critical value)} \tag{7} \]

   \[ \text{p-value} \equiv P(|Z| > |t|) = P(Z > |t|) + P(Z < -|t|) < 0.05 \tag{8} \]

   So the p-value is the probability that a standard normal variable takes a value at least as extreme as the absolute value of the t test. If the p-value is small, say, less than 0.05, the t value lies in the tail area of the normal distribution. In other words, something very unlikely under the null hypothesis has happened. This can be seen as evidence against the null hypothesis.

10. The stata function to get $P(Z < z)$ is

        normal(z)

11. To summarize:

    - The null hypothesis is rejected if the p-value is less than 0.05.
    - The null hypothesis is rejected if the t test is greater than 1.96 in absolute value.

12. So far we have focused on the two-tailed test. That is, the alternative hypothesis is $H_1^{\text{two-tailed}} : \mu_y \neq c$. Under this alternative hypothesis, it is possible that $\mu_y > c$ or $\mu_y < c$. We get a one-tailed test if we rule out one of the possibilities. For instance, one alternative can be $H_1^{\text{one-tailed}} : \mu_y > c$. In that case, the p-value for this one-tailed test is $P(Z > t)$, which equals $P(Z > |t|)$ when $t > 0$.

13. Exercise: how to get the p-value for $H_1^{\text{one-tailed}} : \mu_y < c$?

14. The stata command for the t test is

        ttest y == c

    where c is the hypothesized value of the population mean, see (5). The p-value for the two-tailed test is reported as Pr(|T| > |t|).

15. Denote the observed t value by t. You can explicitly get the p-value for the two-tailed test using the stata command

        dis 2*normal(-abs(t))

    when the sample is large (and the CLT works well). For a small sample, the t test follows the Student t distribution. The two-tailed p-value is

        dis 2*ttail(df, abs(t))

    where df is the degrees of freedom ($n - 1$ for the one-sample t test).

16. Critical thinking

    (a) Why do some people use the critical value of 1.645 instead of 1.96?
    (b) Because we use a sample instead of the population, is it possible that we reject a correct null hypothesis or fail to reject a false null hypothesis?

    (c) Please apply the t test to the population mean of ratings of the eco201 instructor. Consider two null hypotheses, $H_0 : \mu_y = 2$ and $H_0 : \mu_y = 5$. Which null hypothesis is more likely to be rejected, and why?

Math: confidence interval

Goal: understand the confidence interval for the population mean

Reading: appendix C.5 of the textbook

1. The sample mean produces a point estimate for the population mean. For a given sample we get one estimate, a point.

2. We may prefer an interval estimate, which explicitly accounts for the sampling error.

3. We start from

   \[ P(-1.96 < Z < 1.96) = 0.95 \tag{9} \]

   \[ \Rightarrow \; P\left( -1.96 < \frac{\bar{y} - \mu_y}{\sigma_y / \sqrt{n}} < 1.96 \right) = 0.95 \tag{10} \]

   \[ \Rightarrow \; P\left( \bar{y} - 1.96 \frac{\sigma_y}{\sqrt{n}} < \mu_y < \bar{y} + 1.96 \frac{\sigma_y}{\sqrt{n}} \right) = 0.95 \tag{11} \]

   Finally, after replacing the unknown $\sigma_y$ with the standard deviation $s_y$, the 95% confidence interval for $\mu_y$ is

   \[ \left( \bar{y} - 1.96 \, se(\bar{y}), \; \bar{y} + 1.96 \, se(\bar{y}) \right) \tag{12} \]

   where $se(\bar{y}) = s_y / \sqrt{n}$, see (3). The interpretation of the confidence interval is given by (11).

4. The confidence interval, or interval estimate, is more informative than the sample mean, the point estimate.

5. The stata command to obtain the 95% confidence interval for $\mu_y$ is

        ci y

   (in Stata 14 and later: ci means y).

6. Discuss:

   (a) How do we find the 90% confidence interval for $\mu_y$? Is the 90% confidence interval wider or narrower than the 95% confidence interval?

   (b) Is the confidence interval reliable in a small sample?

   (c) How do we obtain a confidence interval if the data do not follow a normal distribution?
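
The t test (6), the two-tailed p-value (8), the one-tailed p-value for the alternative that the mean exceeds c, and the confidence interval (12) can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: it uses the large-sample normal approximation throughout (the counterpart of stata's normal(z)), the function names are hypothetical, and the ratings data are made up for the example rather than taken from the course. With a sample this small the Student t distribution would be the safer reference, so treat the numbers as a demonstration of the formulas only.

```python
import math
from statistics import NormalDist, mean, stdev

Z = NormalDist()  # standard normal distribution; Z.cdf(z) plays the role of stata's normal(z)


def standard_error(y):
    """Standard error of the sample mean, formula (3): s_y / sqrt(n)."""
    return stdev(y) / math.sqrt(len(y))  # stdev uses the n - 1 divisor, as in (2)


def t_test(y, c):
    """Return (t, two-tailed p, upper-tailed p) for H0: mu_y = c.

    Assumption: p-values come from the normal approximation, not the t distribution.
    """
    t = (mean(y) - c) / standard_error(y)   # formula (6)
    p_two_tailed = 2 * Z.cdf(-abs(t))       # formula (8), same as 2*normal(-abs(t))
    p_upper = 1 - Z.cdf(t)                  # P(Z > t), for the alternative mu_y > c
    return t, p_two_tailed, p_upper


def confidence_interval(y, level=0.95):
    """Normal-approximation confidence interval for mu_y, formula (12)."""
    z = Z.inv_cdf(0.5 + level / 2)          # about 1.96 when level = 0.95
    half_width = z * standard_error(y)
    return mean(y) - half_width, mean(y) + half_width


# Hypothetical instructor ratings on a 1-5 scale (illustrative data only).
ratings = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5]

for c in (2, 5):
    t, p_two, p_upper = t_test(ratings, c)
    print(f"H0: mu = {c}: t = {t:.3f}, two-tailed p = {p_two:.4f}, upper-tailed p = {p_upper:.4f}")

low, high = confidence_interval(ratings)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Passing a different level to confidence_interval (for example 0.90) changes only the critical value from inv_cdf, which is one way to explore the discussion question about the width of the interval.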
