Ismor Fischer, 1/7/2009
Appendix / A3. Statistical Inference / Means & Proportions, One & Two Samples-1
A3. Statistical Inference
Hypothesis Testing for One Mean μ
POPULATION
Assume random variable X ∼ N(μ, σ). [1a]
Testing H0: μ = μ0  vs.  HA: μ ≠ μ0

ONE SAMPLE
Test Statistic (with s replacing σ in the standard error σ/√n):

    (X̄ − μ0) / (s/√n)  ~  Z if n ≥ 30;  t_{n−1} if n < 30
[1a] Normality can be verified empirically by checking quantiles (such as 68%, 95%, 99.7%), a stemplot, a normal scores plot, and/or the "Lilliefors Test." If the data turn out not to be normally distributed, the analysis may still be valid by the Central Limit Theorem, provided n ≥ 30. Otherwise, a transformation of the data can sometimes restore a normal distribution.
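The one-sample test statistic above can be sketched in a few lines of Python. This is a minimal illustration with made-up data values; `scipy.stats.ttest_1samp` computes the same T statistic and its two-sided p-value:

```python
import numpy as np
from scipy import stats

# Hypothetical sample (n < 30, so the statistic is referred to t_{n-1})
x = np.array([98.2, 99.1, 97.8, 100.4, 98.9, 99.5, 98.1, 99.0, 98.6, 99.8])
mu0 = 98.6  # hypothesized mean under H0

# T = (X-bar - mu0) / (s / sqrt(n)), with s the sample standard deviation
n = len(x)
t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))

# Cross-check against SciPy's built-in one-sample t-test
t_scipy, p = stats.ttest_1samp(x, popmean=mu0)
print(t, t_scipy, p)
```

The manual statistic and SciPy's agree to floating-point precision; the p-value is then read off the t distribution with n − 1 degrees of freedom.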
[1b] When X1 and X2 are not close to being normally distributed (or, more to the point, when their difference X1 − X2 is not), or when their distributions are unknown, a common alternative in hypothesis testing is a "nonparametric" test, such as a Wilcoxon Test. There are two types: the "Rank Sum Test" (or "Mann-Whitney Test") for independent samples, and the "Signed Rank Test" for paired sample data. Both use test statistics based on an ordered ranking of the data, and are free of distributional assumptions on the random variables.
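Both Wilcoxon tests are available in SciPy. A minimal sketch, using hypothetical skewed (exponential) data where the normality assumption clearly fails:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Rank Sum (Mann-Whitney) Test: two independent, non-normal samples
g1 = rng.exponential(scale=1.0, size=25)
g2 = rng.exponential(scale=1.5, size=25)
u_stat, p_indep = stats.mannwhitneyu(g1, g2, alternative="two-sided")

# Signed Rank Test: paired data, ranking the within-pair differences
before = rng.exponential(scale=1.0, size=25)
after = before + rng.normal(0.2, 0.5, size=25)
w_stat, p_paired = stats.wilcoxon(before, after)

print(p_indep, p_paired)
```

Neither test assumes a distributional form; both work from the ranks of the observations (or of the pairwise differences).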
[2] If the sample sizes are large, the test statistic follows a standard normal Z-distribution (via the Central Limit Theorem), with standard error = √(σ1²/n1 + σ2²/n2). If the sample sizes are small, the test statistic does not follow an exact t-distribution, as in the single-sample case, unless the two population variances σ1² and σ2² are equal. (Formally, this requires a separate test of how significantly the sample statistic s1²/s2², which follows an F-distribution, differs from 1. An informal rule of thumb is to accept equality of variances if this ratio lies between 0.25 and 4. Other formal tests, such as "Levene's Test," can also be used.) In this case, the two samples can be pooled together to increase the power of the t-test, and the common value of their equal variances estimated. However, if the two variances cannot be assumed equal, then approximate t-tests, such as Satterthwaite's Test, should be used. Alternatively, a Wilcoxon Test is frequently used instead; see footnote 1b above.
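The pooled versus approximate (Satterthwaite/Welch) choice corresponds directly to the `equal_var` flag of `scipy.stats.ttest_ind`. A sketch with hypothetical small samples of unequal spread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(10.0, 2.0, size=15)  # hypothetical small sample, group 1
b = rng.normal(11.0, 4.0, size=15)  # group 2, visibly larger spread

# Informal rule of thumb: accept equal variances if s1^2/s2^2 lies in [0.25, 4]
ratio = a.var(ddof=1) / b.var(ddof=1)

# Pooled t-test (assumes sigma1^2 = sigma2^2, df = n1 + n2 - 2)
t_pooled, p_pooled = stats.ttest_ind(a, b, equal_var=True)

# Satterthwaite/Welch approximate t-test (no equal-variance assumption)
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

# Levene's Test as a formal check of equal variances
lev_stat, p_levene = stats.levene(a, b)

print(ratio, p_pooled, p_welch, p_levene)
```

When the variance ratio falls outside the rule-of-thumb range, or Levene's Test rejects, the `equal_var=False` (Welch) result is the one to report.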
Hypothesis Testing for Two Means μ1 vs. μ2

POPULATION
Random variable X defined on two groups ("arms"):
Assume X1 ~ N(μ1, σ1), X2 ~ N(μ2, σ2). [1a, 1b]
Testing H0: μ1 − μ2 = μ0 (frequently μ0 = 0), where μ1 − μ2 is the mean of X1 − X2.

TWO SAMPLES

Independent samples [2]

• If n1 ≥ 30, n2 ≥ 30 — Test Statistic (σ1², σ2² replaced by s1², s2² in the standard error):

    Z = [(X̄1 − X̄2) − μ0] / √(s1²/n1 + s2²/n2)  ~  N(0, 1)

• If n1 < 30, n2 < 30 and σ1² = σ2² — Test Statistic (σ1², σ2² replaced by s_pooled² in the standard error):

    T = [(X̄1 − X̄2) − μ0] / √( s_pooled² (1/n1 + 1/n2) )  ~  t_df,  df = n1 + n2 − 2,

  where s_pooled² = [ (n1 − 1) s1² + (n2 − 1) s2² ] / df.

• If n1 < 30, n2 < 30 and σ1² ≠ σ2² — must use an approximate t-test, such as Satterthwaite's.

Note that the Wilcoxon (= Mann-Whitney) Rank Sum Test may be used as an alternative.

Paired samples

Since the data are naturally "matched" by design, the pairwise differences constitute a single collapsed sample. Therefore, apply the appropriate one-sample test to the random variable

    D = X1 − X2

(hence D̄ = X̄1 − X̄2), having mean μ = μ1 − μ2, with s = the sample standard deviation of the D-values. Note that the Wilcoxon Signed Rank Test may be used as an alternative.
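The paired case collapses to the one-sample test, and this can be checked numerically: the one-sample t statistic on the differences D equals the paired t statistic from `scipy.stats.ttest_rel`. The data values here are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical paired data: the same subjects measured under two conditions
x1 = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.1, 5.3])
x2 = np.array([4.7, 4.9, 5.4, 5.0, 4.6, 5.5, 5.8, 5.1])

# Collapse to a single sample of differences D = X1 - X2 ...
d = x1 - x2
n = len(d)
t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # one-sample test with mu0 = 0

# ... which is exactly what the paired t-test computes
t_paired, p = stats.ttest_rel(x1, x2)
print(t, t_paired, p)
```

The two statistics agree to floating-point precision, confirming that "paired" really does mean "one-sample test on the D-values."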
Hypothesis Testing for One Proportion π

POPULATION
Binary random variable Y, with P(Y = 1) = π
Testing H0: π = π0  vs.  HA: π ≠ π0

ONE SAMPLE
If n is large [3], then standard error ≈ √( π(1 − π)/n ), with N(0, 1) distribution.

• For confidence intervals, replace π by its point estimate π̂ = X/n, where X = Σ(Y = 1) = # "successes" in sample.

• For acceptance regions and p-values, replace π by π0, i.e.,

  Test Statistic:

      Z = (π̂ − π0) / √( π0(1 − π0)/n )  ~  N(0, 1)

If n is small, then the above approximation does not apply, and computations are performed directly on X, using the fact that it is binomially distributed. That is, X ∼ Bin(n, π). Messy by hand...
[3] In this context, "large" is somewhat subjective and open to interpretation. A typical criterion is to require that the mean number of "successes," nπ, and the mean number of "failures," n(1 − π), in the sample(s) be sufficiently large, say greater than or equal to 10 or 15. (Other, less common, criteria are also used.)
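Both the large-sample Z approximation and the exact binomial computation for one proportion are short in Python. The counts below are hypothetical; `scipy.stats.binomtest` (SciPy ≥ 1.7) handles the "messy by hand" exact case:

```python
import math
from scipy import stats

# Hypothetical data: X successes out of n trials, testing H0: pi = pi0
n, X = 200, 88
pi0 = 0.5
pi_hat = X / n

# Large-sample test: Z = (pi_hat - pi0) / sqrt(pi0 (1 - pi0) / n)
z = (pi_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
p_large = 2 * stats.norm.sf(abs(z))  # two-sided p-value from N(0, 1)

# Small-sample (exact) version works directly with X ~ Bin(n, pi0)
p_exact = stats.binomtest(X, n, pi0).pvalue

print(z, p_large, p_exact)
```

With n this large (both nπ0 and n(1 − π0) well above 15), the normal-approximation and exact p-values are close; for small n only the exact computation is trustworthy.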
Hypothesis Testing for Two Proportions π1 vs. π2

POPULATION
Binary random variable Y defined on two groups ("arms"), with
P(Y1 = 1) = π1,  P(Y2 = 1) = π2
Testing H0: π1 − π2 = 0  vs.  HA: π1 − π2 ≠ 0

TWO SAMPLES

Independent samples

• Large samples [3]: Standard error = √( π1(1 − π1)/n1 + π2(1 − π2)/n2 ), with N(0, 1) distribution.

  For confidence intervals, replace π1, π2 by their point estimates π̂1, π̂2.

  For acceptance regions and p-values, replace π1, π2 by the pooled estimate of their common value under the null hypothesis, π̂_pooled = (X1 + X2) / (n1 + n2), i.e.,

  Test Statistic:

      Z = [(π̂1 − π̂2) − 0] / √( π̂_pooled (1 − π̂_pooled) (1/n1 + 1/n2) )  ~  N(0, 1)

  Alternatively, a Chi-squared (χ²) Test can be used.

• Small samples: Fisher's Exact Test. (Messy; based on the "hypergeometric distribution" of X.)

Paired samples

• Large samples: McNemar's Test. (A "matched" form of the χ² Test.)

• Small samples: Ad hoc techniques; not covered.
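The independent-samples case can be sketched directly from the formulas above. The counts are hypothetical; `scipy.stats.fisher_exact` provides the small-sample alternative on the same 2×2 table:

```python
import math
from scipy import stats

# Hypothetical counts: X1/n1 and X2/n2 successes in two independent groups
n1, X1 = 120, 54
n2, X2 = 150, 51

p1, p2 = X1 / n1, X2 / n2
p_pool = (X1 + X2) / (n1 + n2)  # pooled estimate under H0: pi1 = pi2

# Large-sample test statistic with the pooled standard error
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value from N(0, 1)

# Small-sample alternative: Fisher's Exact Test on the 2x2 table
table = [[X1, n1 - X1], [X2, n2 - X2]]
odds, p_fisher = stats.fisher_exact(table)

print(z, p_value, p_fisher)
```

Note that the pooled estimate is used only for the test (where H0 asserts a common π); a confidence interval for π1 − π2 would use the unpooled standard error with π̂1 and π̂2 instead.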