Comparing Two Means

For 95 out of 100 (large) samples, the interval
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
will contain the true population mean. But we don't know $\sigma$!

Inference for the Mean of a Population

To estimate $\mu$, we use a confidence interval around $\bar{x}$:
$$\bar{x} \pm 1.96\,\sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The confidence interval is built with $\sigma$, which we replace with $s$ (the sample standard deviation) if $\sigma$ is not known.

t-distributions

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Here $s/\sqrt{n}$ is the "standard error" of $\bar{x}$. For a simple random sample (SRS), the one-sample t statistic has the t-distribution with $n - 1$ degrees of freedom (see Table D).

t-distributions with $k$ ($= n - 1$) degrees of freedom:
- are labeled $t(k)$,
- are symmetric around 0,
- are bell-shaped,
- but have more variability than Normal distributions, because $s$ is substituted for $\sigma$.

The 95% interval therefore uses a critical value $t^*$ from the $t(n-1)$ distribution in place of 1.96:
$$\bar{x} \pm t^*\,\frac{s}{\sqrt{n}}$$

Example: Estimating the level of vitamin C
Data: 26 31 23 22 11 22 14 31
Find a 95% confidence interval for $\mu$.
A: ( ___ , ___ )
Write it as "estimate plus margin of error."

STATA Exercise 1
STATA Exercise 2
STATA Exercises 3 and 4
STATA Exercise 5

Paired, unpaired tests

"Paired" tests compare each individual's values on two variables and ask whether the mean difference ("gain" in this example) is zero:
H0: mean(pretest - posttest) = mean(diff) = 0

STATA Exercise 6

Robustness of t procedures

t procedures for a hypothesis about a single mean are appropriate only in these cases:
- If $n < 15$: only if the data are close to Normal (no outliers or strong skewness).
- If $n \ge 15$: as long as there are no outliers or strong skewness.
- If $n \ge 40$: even if the data are clearly skewed (because of the Central Limit Theorem).

Comparing Two Means

Suppose we make a change to the registration procedure. Does this reduce the number of mistakes? Basically, we are looking at two populations:
- the before-change population (population 1)
- the after-change population (population 2)

Is the mean number of mistakes (per student) different? Is $\mu_1 - \mu_2 = 0$ or $\neq 0$?
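Returning to the vitamin C example above, the following is a minimal sketch of the one-sample t interval $\bar{x} \pm t^* s/\sqrt{n}$ using only the Python standard library. Assumptions: the critical value $t^* = 2.365$ for 95% confidence with 7 degrees of freedom is hardcoded from a t table rather than computed, and the helper names (`one_sample_t_interval`, `one_sample_t_stat`, `paired_t_stat`) are hypothetical, chosen for illustration. The paired helper shows that a paired test is simply a one-sample t test on the differences, matching H0: mean(diff) = 0.

```python
import math
import statistics

# Hypothetical constant: t* for 95% confidence with 7 df, taken from a t table
# (Table D) so that the sketch needs only the standard library.
T_STAR_95_DF7 = 2.365


def one_sample_t_interval(data, t_star):
    """Return (estimate, margin of error) for xbar +/- t* * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)           # standard error of xbar
    return xbar, t_star * se


def one_sample_t_stat(data, mu0):
    """One-sample t statistic (xbar - mu0) / (s / sqrt(n)), with n - 1 df."""
    se = statistics.stdev(data) / math.sqrt(len(data))
    return (statistics.mean(data) - mu0) / se


def paired_t_stat(pre, post):
    """Paired test of H0: mean(pre - post) = 0, i.e. a one-sample t on the differences."""
    diffs = [a - b for a, b in zip(pre, post)]
    return one_sample_t_stat(diffs, 0.0)


vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]
est, moe = one_sample_t_interval(vitamin_c, T_STAR_95_DF7)
print(f"{est:.2f} +/- {moe:.2f} -> ({est - moe:.2f}, {est + moe:.2f})")
# prints: 22.50 +/- 6.01 -> (16.49, 28.51)
```

With a statistics library available, the hardcoded $t^*$ could instead come from an inverse t distribution function; the table value keeps the sketch dependency-free.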
Comparing Two Means  Notice that we are not matching pairs. We compare two groups. Comparing Two Means Population Variable Mean Standard Deviation 1 x1 m1 1 2 x2 m2 2 Comparing Two Means Population Sample Size Sample Mean Sample Standard Deviation 1 n1 x1 s1 2 n2 x2 s2 Comparing Two Means  The population, really, is every single student using each registration procedure, an infinite number of times. –   Suppose we get a “good” result today: how do we know it will be repeated tomorrow? We can’t repeat the procedure an infinite number of times, we only have a “sample”: numbers from one year. We estimate (m1 – m2) with (x1 – x2) . Comparing Two Means  Remember x is a Random Variable. To estimate m we need both x and the margin of error around x, which is t *  x    x    n ,  So we need to know   n or rather, the appropriate standard error for this estimation. Because we are estimating a difference, we need the standard error of a difference. Comparing Two Means r=0   If the standard error for x1 is 1  Then the standard error for (x1 – x2) is  12 n1 2  2  n2 n1 Two-sample significance test  x1  x2   m1  m 2  t  2 1 n1   2 2 n2 STATA uses the Satterthwaite approximation as a default. This t* does not have a t-distribution because we are replacing two standard deviations by their sample equivalents. STATA uses the Satterthwaite approximation as a default. This t* does not have a t-distribution because we are replacing two standard deviations by their sample equivalents. STATA Exercise 7 STATA Exercise 5 Paired, unpaired tests   “Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero. Ho: mean(pretest - posttest) = mean(diff) = 0 “Unpaired” tests take the mean of each variable and test whether the difference of the means is zero. 
H0: mean(pretest) - mean(posttest) = diff = 0

ttest ego, by(group) unequal

STATA Exercise 8

Robustness and Small Samples

Two-sample methods are more robust than one-sample methods, more so if the two samples have similar shapes and similar sample sizes. Small samples, as always, make the test less robust.

STATA assumes that the two variances are equal (what the book calls "pooled t procedures") unless you tell it otherwise with the unequal option.

Pooled two-sample t procedures

Suppose the two Normal population distributions have the same standard deviation. Then the t statistic that compares the means of samples from those two populations has exactly a t-distribution.

The common but unknown standard deviation of both populations is $\sigma$. The sample standard deviations $s_1$ and $s_2$ both estimate $\sigma$. The best way to combine these estimates is to take a weighted average of the two sample variances, using their degrees of freedom as the weights:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

THE POOLED TWO-SAMPLE t PROCEDURES (assuming $\sigma$ is the same for both populations)

A level C confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm t^*\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Here, $t^*$ is the value for the $t(n_1 + n_2 - 2)$ density curve with area C between $-t^*$ and $t^*$.

To test the hypothesis $H_0\colon \mu_1 = \mu_2$, compute the pooled two-sample t statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
and use P-values from the $t(n_1 + n_2 - 2)$ distribution.

ttest ego, by(group)
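To contrast the two STATA commands above, here is a minimal sketch of the textbook formulas behind them, written from summary statistics: `unpooled_t` corresponds to the unequal-variance statistic with Satterthwaite degrees of freedom (the `unequal` option), and `pooled_t` to the pooled procedure (the default). This is not STATA's implementation; the function names and the numbers in the usage example are hypothetical, chosen only for illustration. P-values are not computed here; they would come from a t table or a statistics library.

```python
import math


def unpooled_t(x1bar, s1, n1, x2bar, s2, n2):
    """Two-sample t for H0: mu1 = mu2 without assuming equal variances.

    Returns (t, df), where df is the Satterthwaite approximation.
    """
    v1 = s1 ** 2 / n1                  # squared standard error of xbar1
    v2 = s2 ** 2 / n2                  # squared standard error of xbar2
    se = math.sqrt(v1 + v2)            # SE of xbar1 - xbar2 (independent samples)
    t = (x1bar - x2bar) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_t(x1bar, s1, n1, x2bar, s2, n2):
    """Pooled two-sample t, assuming a common sigma. Returns (t, df)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df   # pooled variance s_p^2
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (x1bar - x2bar) / se, df


# Hypothetical summary statistics, for illustration only.
print(unpooled_t(5.0, 2.0, 10, 3.5, 2.0, 10))
print(pooled_t(5.0, 2.0, 10, 3.5, 2.0, 10))
```

When the two samples have equal sizes and equal standard deviations, as in the hypothetical numbers above, both functions give the same t and 18 degrees of freedom. They diverge when the spreads or sample sizes differ, which is when the choice of the unequal option matters; the Satterthwaite df always falls between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.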
