For 95 out of 100 (large) samples, the interval

$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$

will contain the true population mean. But we don’t know $\sigma$!

Inference for the Mean of a Population

To estimate $\mu$, we use a confidence interval around $\bar{x}$:

$$\bar{x} \pm 1.96\,\sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

The confidence interval is built with $\sigma$, which we replace with $s$ (the sample standard deviation) if $\sigma$ is not known.

t-distributions

The “standard error” of $\bar{x}$ is $s/\sqrt{n}$. The one-sample t-statistic is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

For a simple random sample (SRS) from a Normal population, the one-sample t-statistic has the t-distribution with $n-1$ degrees of freedom (see Table D).

t-distributions with $k$ ($= n-1$) degrees of freedom
– are labeled $t(k)$,
– are symmetric around 0,
– are bell-shaped,
– but have more variability than Normal distributions, due to the substitution of $s$ in the place of $\sigma$.

Example: Estimating the level of vitamin C

Data: 26 31 23 22 11 22 14 31
Find a 95% confidence interval for $\mu$.
A: ( , )
Write it as “estimate plus margin of error”.

STATA Exercise 1
STATA Exercise 2
STATA Exercises 3 and 4
STATA Exercise 5

Paired, unpaired tests

“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.
$H_0$: mean(pretest - posttest) = mean(diff) = 0

STATA Exercise 6

Robustness of t procedures

t-tests are only appropriate for testing a hypothesis on a single mean in these cases:
– If $n < 15$: only if the data are Normally distributed (with no outliers or strong skewness)
– If $n \geq 15$: only if there are no outliers or strong skewness
– If $n \geq 40$: even if the data are clearly skewed (because of the Central Limit Theorem)

Comparing Two Means

Suppose we make a change to the registration procedure. Does this reduce the number of mistakes? Basically, we’re looking at two populations:
– the before-change population (population 1)
– the after-change population (population 2)
Is the mean number of mistakes (per student) different? Is $\mu_1 - \mu_2 = 0$ or $\neq 0$?

Notice that we are not matching pairs.
We compare two groups.

Population   Variable   Mean      Standard Deviation
1            $x_1$      $\mu_1$   $\sigma_1$
2            $x_2$      $\mu_2$   $\sigma_2$

Population   Sample Size   Sample Mean   Sample Standard Deviation
1            $n_1$         $\bar{x}_1$   $s_1$
2            $n_2$         $\bar{x}_2$   $s_2$

The population, really, is every single student using each registration procedure, an infinite number of times.
– Suppose we get a “good” result today: how do we know it will be repeated tomorrow?
We can’t repeat the procedure an infinite number of times; we only have a “sample”: numbers from one year. We estimate $(\mu_1 - \mu_2)$ with $(\bar{x}_1 - \bar{x}_2)$.

Remember that $\bar{x}$ is a random variable. To estimate $\mu$ we need both $\bar{x}$ and the margin of error around $\bar{x}$, which is

$$t^* \sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

So we need to know $\sigma/\sqrt{n}$ or rather, the appropriate standard error for this estimation. Because we are estimating a difference, we need the standard error of a difference.

Assuming zero correlation between the two samples ($\rho = 0$): if the standard error for $\bar{x}_1$ is $\sigma_1/\sqrt{n_1}$, then the standard error for $(\bar{x}_1 - \bar{x}_2)$ is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Two-sample significance test

Replacing $\sigma_1$ and $\sigma_2$ with the sample standard deviations $s_1$ and $s_2$ gives the two-sample t statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

This t statistic does not have an exact t-distribution, because we are replacing two standard deviations by their sample equivalents. When the unequal option is given, STATA uses the Satterthwaite approximation for the degrees of freedom by default.

STATA Exercise 7

Paired, unpaired tests

“Unpaired” tests take the mean of each variable and test whether the difference of the means is zero.
$H_0$: mean(pretest) - mean(posttest) = diff = 0

    ttest ego, by(group) unequal

STATA Exercise 8

Robustness and Small Samples

Two-sample methods are more robust than one-sample methods.
– More so if the two samples have similar shapes and sample sizes.
STATA assumes that the variances are the same (what the book calls “pooled t procedures”), unless you tell it otherwise with the unequal option.
Small samples, as always, make the test less robust.

Pooled two-sample t procedures

Suppose the two Normal population distributions have the same standard deviation. Then the t-statistic that compares the means of samples from those two populations has exactly a t-distribution.

The common, but unknown, standard deviation of both populations is $\sigma$. The sample standard deviations $s_1$ and $s_2$ both estimate $\sigma$. The best way to combine these estimates is to take a “weighted average” of the two sample variances, using the degrees of freedom as the weights:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

THE POOLED TWO-SAMPLE T PROCEDURES (assuming $\sigma$ is the same for both populations)

A level C confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Here, $t^*$ is the value for the $t(n_1 + n_2 - 2)$ density curve with area C between $-t^*$ and $t^*$.

To test the hypothesis $H_0$: $\mu_1 = \mu_2$, compute the pooled two-sample t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

and use P-values from the $t(n_1 + n_2 - 2)$ distribution.

    ttest ego, by(group)
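
The pooled procedure can also be checked by hand outside STATA. The following is a minimal Python sketch, not part of the original exercises: the function name `pooled_t`, the two small samples, and the variable names are all made up for illustration, and the critical value 2.306 is the usual table value for $t(8)$ at 95% confidence. It computes $s_p$, the pooled t statistic, and the margin of error for the confidence interval.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Return (t, df, sp, se) for the pooled two-sample t procedure.

    Assumes both populations share the same standard deviation.
    """
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # s1^2 (divides by n1 - 1)
    var2 = statistics.variance(sample2)  # s2^2 (divides by n2 - 1)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the sample variances,
    # with the degrees of freedom as weights.
    sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    # Standard error of the difference of the two sample means.
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, df, sp, se


# Hypothetical data: mistakes per student before and after a change.
before = [4, 6, 5, 7, 3]
after = [2, 3, 4, 1, 5]

t, df, sp, se = pooled_t(before, after)
t_star = 2.306  # table value for t(8), 95% confidence (assumed lookup)
margin = t_star * se
diff = statistics.mean(before) - statistics.mean(after)

print(f"t = {t:.3f} on {df} degrees of freedom, sp = {sp:.3f}")
print(f"95% CI for mu1 - mu2: {diff:.3f} +/- {margin:.3f}")
```

The P-value itself still has to come from a t table or from statistical software, since the Python standard library has no t-distribution function.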