How do we use confidence intervals & significance tests to make inferences from a random sample about a population mean? How do we use confidence intervals & significance tests to compare the means of two populations?

Standard error: when the standard deviation of a statistic is estimated from the data (i.e. from a sample), the result is called the standard error of the statistic. Put differently, the standard error of the mean is the estimated average deviation of the sample mean from its expected value if the sampling were repeated over & over.

Inference with a standard error is based on the t-distribution, not the standard normal (z) distribution. Because the standard deviation is estimated from sample data, the t-distribution is less certain, less precise, & thus more variable than the z-distribution. Hence the t-distribution is flatter & wider (it has heavier tails) than the z-distribution, most visibly in small samples. The t-distribution closely & increasingly approximates the z-distribution once sample size reaches about N=120. When N>1000, the t- & z-distributions are identical for practical purposes.

Put differently, the smaller the sample size (i.e. the fewer the degrees of freedom*), the wider (i.e. the less precise) the t-distribution is relative to the z-distribution.

* Recall that ‘degrees of freedom’ are the amount of information available to estimate a statistic. The more df’s, the better.

This, then, is another reason to have larger samples: so that the t-distribution becomes more precise, & thus so hypothesis tests can be more accurate.

The z-distribution is used when we know the population’s standard deviation, which, however, we virtually never know. Almost always, then, we use the t-distribution, because we are estimating the standard deviation from sample data.

Confirm that there’s a different t-distribution for each number of degrees of freedom (n – 1): check the t-distribution critical values in Moore/McCabe/Craig (Table D, page T-11) for each df.
N>=120: the t-distribution closely & increasingly approximates the z-distribution.
N>1000: the t-distribution & z-distribution are identical for practical purposes.
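The convergence of t toward z can be checked numerically. The following is a minimal sketch, assuming only the Python standard library: it integrates the t density with Simpson's rule and finds the two-sided 95% critical value by bisection. The function names (`t_pdf`, `t_cdf`, `t_critical`) are illustrative, not part of any statistics package, and the approach is one of several ways to approximate these values.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return (math.exp(log_norm) / math.sqrt(df * math.pi)
            * (1 + x * x / df) ** (-(df + 1) / 2))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the symmetry of t."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(df, confidence=0.95):
    """Two-sided critical value t* for the given confidence level."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)  # standard normal critical value
for df in (5, 10, 30, 120, 1000):
    print(f"df={df:5d}  t*={t_critical(df):.4f}  z*={z_star:.4f}")
```

The printed t* values shrink toward z* as df grows but stay slightly above it, which is why "identical" is best read as "identical to the precision of a printed table".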
See Table D (page T-11).

Standard error of the mean: the standard deviation of the sample mean when it is estimated from sample data (& thus the t-distribution is used).

Formula for the standard error of the mean:

$SE_{\bar{x}} = s / \sqrt{n}$

We’ve already been using this formula, but we’ve generally been using the z-distribution. From now on, when we refer to the standard error of the mean, we’ll use the t-distribution.

A sample mean will deviate from the population mean due to sampling error (not to mention non-sampling error). The standard error of the mean gives the estimated size of this deviation.

From now on, then, think standard error & t-distribution.

Here’s the t-confidence interval for the mean of a quantitative variable:

$\bar{x} \pm t^{*} \, s / \sqrt{n}$

where t* is the critical value of the t-distribution with n – 1 degrees of freedom.

How to use the t-distribution in hypothesis tests

We can use t-value confidence intervals to make inferences from a sample mean about a benchmark mean (i.e. some hypothesized parameter from the present or past).

The One-Sample t-Test

The one-sample t-test uses the t-confidence interval to compare the mean of a random sample to some benchmark parameter (from the present or past).

E.g., compare the mean SAT score of a random sample of FIU undergrads to some other, ‘ideal’ score (e.g., 500). Is the difference large enough relative to the standard error of the difference to be statistically significant?

E.g., compare the mean SAT score of a random sample of FIU undergrads today to that of FIU undergrads a decade ago. Is the difference large enough relative to the standard error of the difference to be statistically significant?

E.g., compare the mean SAT score of a random sample of FIU students to the national SAT mean. Is the difference large enough relative to the standard error of the difference to be statistically significant?

The one-sample t-test compares the mean of a quantitative variable from a random sample to some benchmark parameter.
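The standard error, the t-confidence interval, & the comparison against a benchmark can be sketched in a few lines. This is a minimal illustration, not a replacement for Stata's output. The sample figures (n = 25, mean 510, sd 75, benchmark 500) are hypothetical, & the critical value 2.064 is the tabled 95% value for 24 degrees of freedom, typed in by hand rather than computed.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


def t_interval(mean, sd, n, t_star):
    """t confidence interval: mean +/- t* times the standard error."""
    margin = t_star * standard_error(sd, n)
    return mean - margin, mean + margin


def t_statistic(mean, sd, n, benchmark):
    """Difference from the benchmark, measured in standard errors."""
    return (mean - benchmark) / standard_error(sd, n)


# Hypothetical summary figures, chosen only for easy arithmetic.
n, mean, sd, benchmark = 25, 510.0, 75.0, 500.0
t_star = 2.064  # assumed: tabled 95% critical value for df = n - 1 = 24

se = standard_error(sd, n)
low, high = t_interval(mean, sd, n, t_star)
t = t_statistic(mean, sd, n, benchmark)

print(f"se = {se:.2f}, 95% CI = ({low:.2f}, {high:.2f}), t = {t:.4f}")
print("benchmark inside CI:", low < benchmark < high)
```

When the benchmark falls inside the 95% interval, the two-sided test at the 5% level does not reject; when it falls outside, the test rejects. The interval & the test are two views of the same comparison.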
The benchmark parameter for a one-sample t-test may be:
- some measurement ideal
- some independent comparison group
- a parameter from the past or present

The one-sample t-test requires:
- a probability sample of independent observations
- a quantitative variable
- a graphic check for pronounced skewness & outliers
- a benchmark comparison mean

t-tests of all sorts can be used safely:
- When the probability sample has N<15, if the data distribution is close to normal (i.e. no more than minimal skewness & no pronounced outliers, because the mean & sd are not resistant).
- When 15<=N<40, if there is no pronounced skewness & there are no outliers.
- When N>=40 (more or less), if there are no outliers, even if there is pronounced skewness (although transforming the variable may be safer), due to the central limit theorem & the law of large numbers.

What if the sample is too small & non-normal, &/or contains pronounced outliers?

One possible option: transform the variable &/or eliminate the outliers (in Stata see ‘help ladder’).

Alternatively, use a non-parametric (i.e. distribution-free) statistic: the signed-rank test or the sign test, though these are less precise & less powerful than parametric procedures for testing hypotheses. Stata: see ‘help signrank’ or ‘help signtest.’ See Moore/McCabe chap. 7 & the CD-Rom chapter on nonparametric statistics.

If the distribution is acceptable or becomes so after you’ve intervened, then use the one-sample t-test:

Ho: there’s no difference. $\mu = \mu_0$
Ha: there is a difference. $\mu \neq \mu_0$
Or a one-sided alternative hypothesis.

Here $\mu$ is the population mean & $\mu_0$ is the benchmark.

Put differently:
Ho: difference = 0
Ha: difference ~= 0
One-sided hypothesis: difference > 0; or difference < 0

E.g., compare the mean SAT score of a random sample of FIU undergrads to some other, ‘ideal’ score (e.g., 500): is the difference statistically significant?

First, check that the sampling assumption is fulfilled.

Second, do a graphic check for pronounced skewness (if sample size <40) & for outliers, taking action to minimize the problems if necessary.
Third, state the hypotheses, e.g.:
Ho: FIU mean SAT = 500
Ha: FIU mean SAT ~= 500
Put differently: Ho: diff = 0. Ha: diff ~= 0.

Fourth, test the hypothesis. These data aren’t in memory, so the Stata command is ttesti rather than ttest.

. ttesti sample-n sample-mean sample-sd benchmark-mean
. ttesti 400 512 73 500

One-sample t test

------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     400         512        3.65          73    504.8244    519.1756
------------------------------------------------------------------------------
Degrees of freedom: 399

                         Ho: mean(x) = 500

     Ha: mean < 500            Ha: mean ~= 500            Ha: mean > 500
       t =   3.2877              t =   3.2877               t =   3.2877
   P < t =   0.9995          P > |t| =   0.0011           P > t =   0.0005

Conclusion: Reject the null hypothesis (p=0.001 for a two-tailed test, df=399).

Note: if the data are in memory, modify the Stata command as follows:

. ttest FIU_SAT = 500

Before we move on to another variety of t-tests:
- What’s the purpose of the one-sample t-test?
- What kind of data does it require?
- How do we conduct the test?
- When does it test significant or insignificant?

Example: There is evidence that 51% of a specific graduate program’s student admissions are women, but your program has admitted just 43% women. Should you use a one-sample t-test to assess whether or not this difference is statistically significant?

Caution
- The one-sample t-test requires a probability sample.
- All conclusions are uncertain.
- There are sampling & non-sampling sources of error.

The next variety of t-test, matched pairs, applies the one-sample t-test to an after vs. before ‘difference’ score for comparing means for a random sample of matched after vs. before observations.

E.g., the mean SAT score of a random sample of FIU students before they received SAT-training versus after they received such training. Is the difference in scores large enough relative to the standard error of the difference to be statistically significant?

E.g., the mean cholesterol level of a random sample of adults before they went on a low-fat diet versus after they went on the diet.
Is the difference large enough relative to the standard error of the difference to be statistically significant?

E.g., the mean earnings of a random sample of inner-city women workers before they received skill-training versus after they received such training. Is the difference large enough relative to the standard error of the difference to be statistically significant?

This is called the matched pair (or dependent sample) t-test:

Ho: $\mu_d = 0$ (i.e. there’s no after vs. before effect)
Ha: $\mu_d > 0$ (i.e. there is an after vs. before effect: the after-mean is greater than the before-mean)

Here $\mu_d$ is the population mean of the after-minus-before difference scores.

The after vs. before matched pairs, of course, are not independent of each other.

Is the after vs. before difference in sample means large enough relative to the standard error of the difference to test statistically significant?

What kind of data does the matched pairs (or dependent sample) t-test require?
- a random sample involving the same, matched observations (i.e. individuals or subjects) before & after the treatment
- a quantitative variable
- an adequate sample size (recall the previous discussion of sample size)
- a graphic check for pronounced skewness & outliers

And something else: the sd of the ‘before’ group can’t be more than twice, or less than half, that of the ‘after’ group. If it is, then use an adjusted version of the t-test (e.g., Stata’s ‘unequal’ option).

What if the sample distribution is too skewed &/or contains pronounced outliers? Consider transforming the data &/or eliminating the outliers. Alternatively, use a non-parametric (i.e. distribution-free) test: the sign test or the signed-rank test.

Here’s an after vs. before example concerning a hypothetical intervention to improve standardized reading scores. List the ‘after’ data first. For the ‘after’ & ‘before’ data, list: sample size, sample mean, sd.

. ttesti 22 520 46.1 23 501 44.7

Two-sample t test with equal variances

------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      22         520    9.828553        46.1    499.5604    540.4396
       y |      23         501    9.320594        44.7    481.6703    520.3297
---------+--------------------------------------------------------------------
combined |      45    510.2889    6.840412    45.88688    496.5029    524.0748
---------+--------------------------------------------------------------------
    diff |                  19    13.53576               -8.297466    46.29747
------------------------------------------------------------------------------
Degrees of freedom: 43

                  Ho: mean(x) - mean(y) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t =   1.4037                t =   1.4037              t =   1.4037
   P < t =   0.9162          P > |t| =   0.1676          P > t =   0.0838

Caution: ttesti with two sets of summary statistics, as here, treats them as two independent samples (hence the ‘Two-sample t test’ heading & the unequal group sizes of 22 & 23). A true matched-pairs test needs the paired observations themselves.

Note: if the data are in memory, modify the Stata command as follows:

. ttest after_score = before_score

Example: You gain access to an entire class of third graders for your curriculum experiment. You submit them to a new curriculum to promote scientific thinking. You give them an after vs. before t-test & assess the magnitude of the effect & the test of significance. Correct, or not?

Another kind: t-test for comparing the means of two groups

What if we want to compare the means of a quantitative variable for two groups within the same random sample?

E.g., we want to know if there’s a statistically significant difference between female & male mean SAT scores among college students. Restricting the population to FIU, we could randomly sample female & male SAT scores at FIU: is the difference between the two groups statistically significant?

This variety of t-test, then, compares the mean value on a quantitative variable between two groups (e.g., female vs. male) within the same random sample. Is the difference between the two groups relative to the standard error of the difference statistically significant?
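To see where a two-group t statistic comes from, here is a minimal sketch that recomputes it from summary statistics alone, using the figures from the `ttesti 22 520 46.1 23 501 44.7` run above. The function names are illustrative. The unequal-variance version uses the Satterthwaite approximation for its degrees of freedom, which is an assumption about what an adjusted test would use, not something the Stata output above shows.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, se_diff, df


def unequal_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unequal-variance version with Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se_diff = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / se_diff, se_diff, df


# Summary figures taken from the ttesti example: n, mean, sd for each group.
args = (22, 520, 46.1, 23, 501, 44.7)

t_eq, se_eq, df_eq = pooled_t(*args)
t_uneq, se_uneq, df_uneq = unequal_t(*args)

print(f"pooled:  t = {t_eq:.4f}, se = {se_eq:.5f}, df = {df_eq}")
print(f"unequal: t = {t_uneq:.4f}, se = {se_uneq:.5f}, df = {df_uneq:.2f}")
```

Because the two standard deviations here are nearly equal, the pooled & unequal-variance versions give almost the same t. They diverge when one group's sd is much larger than the other's, which is the situation the ‘twice as large’ rule of thumb is meant to flag.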
The t-test for comparing the means of two groups requires:
- a random sample of independent observations
- a quantitative response variable
- a binary categorical explanatory variable (e.g., females vs. males)
- a graphic check for pronounced skewness & outliers, & a comparison of the distributions (boxplot)
- the standard deviation of neither group can be more than twice that of the other group (or else an adjusted version of the t-test must be used: e.g., Stata’s ‘unequal’ option)

What if the sample distribution is too skewed &/or contains pronounced outliers? Consider transforming the data &/or eliminating the outliers. Alternatively, use a non-parametric test such as the median test or rank sum test.

Non-parametric tests: premised on ranking of sampled observations.
Parametric tests: premised on the Central Limit Theorem: an approximately normal sampling distribution of sample means.

The t-test for comparing the means of two ‘independent’ groups:

Ho: $\mu_1 = \mu_2$ (i.e. there’s no difference)
Ha: $\mu_1 \neq \mu_2$ (or one-sided: > or <)

Is the difference large enough relative to the combined standard error of the two groups to be statistically significant?

Here’s an example concerning female vs. male standardized reading test scores among a sample of California high school students (hsb2.dta). These data are in memory, so the Stata command is ttest:

. Ho: ... Ha: … see two-tailed test below
. ttest read, by(female)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    52.82418    1.101403    10.50671    50.63605     55.0123
  female |     109    51.73394    .9633659    10.05783    49.82439     53.6435
---------+--------------------------------------------------------------------
combined |     200       52.23    .7249921    10.25294    50.80035    53.65965
---------+--------------------------------------------------------------------
    diff |            1.090231    1.457507               -1.783997    3.964459
------------------------------------------------------------------------------
Degrees of freedom: 198

              Ho: mean(male) - mean(female) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =   0.7480                t =   0.7480              t =   0.7480
   P < t =   0.7723          P > |t| =   0.4553          P > t =   0.2277

The test conclusion?

A Brief Review

What’s the difference between the z-distribution & the t-distribution?
At what sample size do they become nearly identical? At what sample size do they become, for practical purposes, identical?

In conducting the various kinds of t-test:
- Ask if the data are drawn from a random sample of independent observations.
- Check the sample size.
- Graphically check the data for pronounced skewness (if sample size <40) & outliers, & if there are two distributions compare their standard deviations.
- Conclusions are always uncertain: sampling & non-sampling sources of error.

If N<40 & the data do not have pronounced skewness &/or outliers, use the t-procedures to make inferences about a one-sample mean or two-sample means.

If at this sample size there are problems with skewness &/or outliers, consider transforming the data &/or eliminating outliers. Alternatively, use non-parametric (i.e. distribution-free) procedures.

If N>=40, the central limit theorem & the law of large numbers kick in, but check for outliers.

Summary

What’s a standard error?

What’s the difference between standard error of the mean & standard deviation? Why is the difference important?

What kinds of confidence intervals & what tests do we use to make inferences about one-sample & two-sample means of random variables?

What are the premises of these tests, & how do we assess the premises in any given case?
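As a closing illustration of how the pieces fit together, the one-sample test from the `ttesti 400 512 73 500` example can be recomputed by hand. This is a minimal sketch using only the Python standard library. The two-sided p-value comes from a Simpson's-rule integration of the t density, which is an approximation chosen for self-containment & is not how Stata computes it. The function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return (math.exp(log_norm) / math.sqrt(df * math.pi)
            * (1 + x * x / df) ** (-(df + 1) / 2))


def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): 0.5 minus the area between 0 and |t| (Simpson's rule)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def one_sample_t(n, mean, sd, benchmark):
    """One-sample t test from summary statistics (two-sided p-value)."""
    se = sd / math.sqrt(n)          # standard error of the mean
    t = (mean - benchmark) / se     # difference in standard-error units
    df = n - 1
    p_two_sided = 2 * t_upper_tail(t, df)
    return t, df, se, p_two_sided


# Figures from the ttesti example: n = 400, mean 512, sd 73, benchmark 500.
t, df, se, p = one_sample_t(400, 512, 73, 500)
print(f"t = {t:.4f}, df = {df}, se = {se:.2f}, two-sided p = {p:.4f}")
```

The same function covers the matched-pairs case if it is fed the n, mean, & sd of the after-minus-before difference scores with a benchmark of 0.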