A3. Statistical Inference: Means & Proportions, One & Two Samples
Ismor Fischer, 1/7/2009

Hypothesis Testing for One Mean μ

POPULATION: Assume the random variable X ~ N(μ, σ).

Testing H0: μ = μ0 vs. HA: μ ≠ μ0 [1a]

ONE SAMPLE: Test statistic (with s replacing σ in the standard error σ/√n):

$$ \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \;\sim\; \begin{cases} Z, & \text{if } n \ge 30 \\ t_{n-1}, & \text{if } n < 30 \end{cases} $$

[1a] Normality can be verified empirically by checking quantiles (such as 68%, 95%, 99.7%), a stemplot, a normal scores plot, and/or the "Lilliefors Test." If the data turn out not to be normally distributed, things might still be OK due to the Central Limit Theorem, provided n ≥ 30. Otherwise, a transformation of the data can sometimes restore a normal distribution.

[1b] When X1 and X2 are not close to being normally distributed (or, more to the point, when their difference X1 − X2 is not), or their distributions are not known, a common alternative approach in hypothesis testing is to use a "nonparametric" test, such as a Wilcoxon Test. There are two types: the "Rank Sum Test" (or "Mann-Whitney Test") for independent samples, and the "Signed Rank Test" for paired sample data. Both use test statistics based on an ordered ranking of the data, and both are free of distributional assumptions on the random variables.

[2] If the sample sizes are large, the test statistic follows a standard normal Z-distribution (via the Central Limit Theorem), with standard error $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. If the sample sizes are small, the test statistic does not follow an exact t-distribution, as in the single-sample case, unless the two population variances σ1² and σ2² are equal. (Formally, this requires a separate test of how significantly the sample statistic s1²/s2², which follows an F-distribution, differs from 1. An informal rule of thumb is to accept equivariance if this ratio is between 0.25 and 4. Other formal tests, such as "Levene's Test," can also be used.) In this case, the two samples can be pooled together to increase the power of the t-test, and the common value of their equal variances estimated. However, if the two variances cannot be assumed to be equal, then an approximate t-test, such as Satterthwaite's Test, should be used. Alternatively, a Wilcoxon Test is frequently used instead; see footnote [1b] above.

Hypothesis Testing for Two Means μ1 vs. μ2

POPULATION: Random variable X defined on two groups ("arms"). Assume X1 ~ N(μ1, σ1) and X2 ~ N(μ2, σ2).

Testing H0: μ1 − μ2 = μ0 vs. HA: μ1 − μ2 ≠ μ0 [1a, 1b]
(Note: frequently μ0 = 0. Here μ1 − μ2 is the mean μ of the difference X1 − X2.)

TWO SAMPLES

Independent samples [2]

• n1 ≥ 30, n2 ≥ 30. Test statistic (σ1², σ2² replaced by s1², s2² in the standard error):

$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \;\sim\; N(0, 1) $$

• n1 < 30, n2 < 30, with σ1² = σ2². Test statistic (σ1², σ2² replaced by $s_{pooled}^2$ in the standard error):

$$ T = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_{pooled}^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \;\sim\; t_{df}, \qquad df = n_1 + n_2 - 2, $$

where $s_{pooled}^2 = \dfrac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{df}$.

• n1 < 30, n2 < 30, with σ1² ≠ σ2². Must use an approximate t-test, such as Satterthwaite's.

Note that the Wilcoxon (= Mann-Whitney) Rank Sum Test may be used as an alternative for independent samples.

Paired samples

Since the data are naturally "matched" by design, the pairwise differences constitute a single collapsed sample. Therefore, apply the appropriate one-sample test to the random variable D = X1 − X2 (hence $\bar{D} = \bar{X}_1 - \bar{X}_2$), having mean μ = μ1 − μ2, with s = the sample standard deviation of the D-values. Note that the Wilcoxon Signed Rank Test may be used as an alternative.
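To make the cases above concrete, here is a minimal Python sketch (an illustration added to these notes, not part of the original) showing how each test maps onto standard scipy.stats routines. The samples x1 and x2 are made-up placeholders, and scipy reports t-based p-values throughout rather than switching to the Z approximation at n ≥ 30; the equal_var=False option implements the Welch form of the Satterthwaite approximation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.normal(5.0, 1.0, size=25)   # placeholder sample, arm 1
x2 = rng.normal(5.5, 1.5, size=25)   # placeholder sample, arm 2

# One mean: T = (Xbar - mu0) / (s / sqrt(n)), referred to t_{n-1}
print(stats.ttest_1samp(x1, popmean=5.0))

# Informal check of the equal-variance assumption (Levene's Test, footnote [2])
print(stats.levene(x1, x2))

# Two independent means, sigma1^2 = sigma2^2: pooled t-test, df = n1 + n2 - 2
print(stats.ttest_ind(x1, x2, equal_var=True))

# Two independent means, sigma1^2 != sigma2^2: Welch/Satterthwaite approximation
print(stats.ttest_ind(x1, x2, equal_var=False))

# Paired data: one-sample test applied to D = X1 - X2
print(stats.ttest_rel(x1, x2))

# Nonparametric alternatives (footnote [1b])
print(stats.mannwhitneyu(x1, x2))   # Rank Sum (Mann-Whitney), independent samples
print(stats.wilcoxon(x1 - x2))      # Signed Rank, paired samples
```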
Hypothesis Testing for One Proportion π

POPULATION: Binary random variable Y, with P(Y = 1) = π.

Testing H0: π = π0 vs. HA: π ≠ π0

ONE SAMPLE: If n is large [3], then the standard error ≈ $\sqrt{\pi(1-\pi)/n}$, with an N(0, 1) distribution.

• For confidence intervals, replace π by its point estimate $\hat{\pi} = X/n$, where X = Σ(Y = 1) = # "successes" in the sample.

• For acceptance regions and p-values, replace π by π0, i.e., the test statistic is

$$ Z = \frac{\hat{\pi} - \pi_0}{\sqrt{\dfrac{\pi_0 (1 - \pi_0)}{n}}} \;\sim\; N(0, 1) $$

If n is small, then the above approximation does not apply, and computations are performed directly on X, using the fact that it is binomially distributed. That is, X ~ Bin(n; π). Messy by hand...

[3] In this context, "large" is somewhat subjective and open to interpretation. A typical criterion is to require that the mean number of "successes" nπ, and the mean number of "failures" n(1 − π), in the sample(s) be sufficiently large, say greater than or equal to 10 or 15. (Other, less common, criteria are also used.)

Hypothesis Testing for Two Proportions π1 vs. π2

POPULATION: Binary random variable Y defined on two groups ("arms"), with P(Y1 = 1) = π1 and P(Y2 = 1) = π2.

Testing H0: π1 − π2 = 0 vs. HA: π1 − π2 ≠ 0

TWO SAMPLES

Independent samples, n1 and n2 large [3]

Standard error = $\sqrt{\pi_1(1-\pi_1)/n_1 + \pi_2(1-\pi_2)/n_2}$, with an N(0, 1) distribution.

• For confidence intervals, replace π1, π2 by their point estimates $\hat{\pi}_1, \hat{\pi}_2$.

• For acceptance regions and p-values, replace π1, π2 by the pooled estimate of their common value under the null hypothesis, $\hat{\pi}_{pooled} = (X_1 + X_2)/(n_1 + n_2)$, i.e., the test statistic is

$$ Z = \frac{(\hat{\pi}_1 - \hat{\pi}_2) - 0}{\sqrt{\hat{\pi}_{pooled}\,(1 - \hat{\pi}_{pooled}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \;\sim\; N(0, 1) $$

Alternatively, a Chi-squared (χ²) Test can be used.

Independent samples, n1 or n2 small: Fisher's Exact Test. (Messy; based on the "hypergeometric distribution" of X.)

Paired samples, large n: McNemar's Test. (A "matched" form of the χ² Test.)

Paired samples, small n: Ad hoc techniques; not covered.
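For illustration only, here is a short Python sketch of the proportion tests above. The helpers one_prop_ztest and two_prop_ztest are hypothetical functions transcribed directly from the formulas in the text; binomtest, fisher_exact, and mcnemar are existing scipy/statsmodels routines, and the counts passed to them are made-up placeholder data.

```python
from math import sqrt
from scipy.stats import norm, binomtest, fisher_exact
from statsmodels.stats.contingency_tables import mcnemar

def one_prop_ztest(x, n, pi0):
    """Large-sample Z test of H0: pi = pi0 (SE uses pi0, as in the text)."""
    z = (x / n - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return z, 2 * norm.sf(abs(z))            # two-sided p-value

def two_prop_ztest(x1, n1, x2, n2):
    """Large-sample pooled Z test of H0: pi1 - pi2 = 0."""
    pooled = (x1 + x2) / (n1 + n2)           # pooled estimate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * norm.sf(abs(z))

print(one_prop_ztest(x=60, n=100, pi0=0.5))
print(two_prop_ztest(x1=60, n1=100, x2=45, n2=100))

# Small samples: exact tests computed directly on the counts
print(binomtest(k=7, n=20, p=0.5))           # X ~ Bin(n; pi0), one proportion
print(fisher_exact([[7, 13], [12, 8]]))      # Fisher's Exact Test, 2x2 table

# Paired proportions, large n: McNemar's Test (chi-square form)
print(mcnemar([[20, 5], [10, 15]], exact=False))
```

Note the design choice mirrored from the text: the one-proportion Z statistic uses π0 in the standard error, and the two-proportion Z statistic uses the pooled estimate, exactly as required for acceptance regions and p-values (confidence intervals would use the unpooled point estimates instead).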