
St 314 – Day #13 Notes
A. Normal Approximation to the Binomial
Recall the binomial distribution:
P(Y = y) = p(y) = (n choose y) p^y (1 – p)^(n–y),   y = 0, 1, …, n
which gives the probability of y successes out of n independent trials, where the
probability of success on any one trial is p (0 < p < 1).
If n is large this becomes difficult to compute. However, we can use the Central Limit
Theorem (CLT) to approximate the r.v. Y with a normal r.v. Y* where Y and Y* have the
same mean and variance:
E(Y* ) = E(Y) = np
Var(Y* ) = Var(Y) = np(1 – p) = npq
Note: For this to work, we need n large enough so that both np and nq are at least five
(and 10 is suggested)
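A quick numerical sketch of this moment matching, using only Python's standard library (the values n = 100 and p = 0.4 are made up for illustration, not from the notes):

```python
from math import comb
from statistics import NormalDist

def binom_pmf(y, n, p):
    """Exact binomial probability P(Y = y) = (n choose y) p^y (1-p)^(n-y)."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

# Hypothetical example values (not from the notes)
n, p = 100, 0.4
mu = n * p                           # E(Y*) = E(Y) = np
sigma = (n * p * (1 - p)) ** 0.5     # sqrt(Var) = sqrt(npq)

# Rule of thumb: np and nq should both be at least 5 (10 suggested)
assert n * p >= 10 and n * (1 - p) >= 10

Ystar = NormalDist(mu=mu, sigma=sigma)  # the approximating normal r.v. Y*
```

Near the center of the distribution the normal density tracks the binomial pmf closely, which is why the approximation works for probabilities of moderate events.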
One problem: For the binomial r.v. Y:
P(Y = a) is a positive number for a = 0, 1, …, n
But for the normal r.v. Y* :
P(Y* = a) = 0
for all a (including 0, 1, …, n), since Y* is a continuous r.v.
The fix is discussed in the text (p. 133) and suggested by the diagram below:
[Figure: binomial distribution histogram, labeled by event probability and number of trials]
Result: we use a continuity correction and replace P(Y* = a) – which equals zero – with:
P(a – 0.5 ≤ Y* ≤ a + 0.5)
which more accurately reflects the probability function p(y) of the binomial r.v. Y.
Note: Other results from using this correction are given in the text (p. 133). Some
examples:
P(Y ≤ 10) → P(Y* ≤ 10.5)
P(Y > 12) = P(Y ≥ 13) → P(Y* ≥ 12.5)
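The continuity correction can be checked numerically with a short Python sketch (the values n = 50 and p = 0.3 are hypothetical example choices):

```python
from math import comb
from statistics import NormalDist

def binom_cdf(a, n, p):
    """Exact P(Y <= a) by summing the binomial pmf."""
    return sum(comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(a + 1))

# Hypothetical example: n = 50 trials, success probability p = 0.3
n, p = 50, 0.3
Ystar = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)

exact = binom_cdf(10, n, p)    # exact P(Y <= 10)
approx = Ystar.cdf(10.5)       # continuity-corrected P(Y* <= 10.5)
naive = Ystar.cdf(10)          # no correction -- typically less accurate
```

Comparing `approx` and `naive` against `exact` shows the corrected value landing noticeably closer to the true binomial probability.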
B. HTs and CIs for p
Suppose we have a random sample of size n from a binomial distribution.
Let Y be the number of successes.
The obvious estimator of p is the sample proportion:
p̂ = Y/n
Y can be approximated by a normal r.v. with
mean = np
variance = np(1 – p) = npq
Consequently, the sampling distribution of the estimator p̂ = Y/n can be approximated
by a normal distribution with:
mean = p
variance = p(1 – p)/n = pq/n
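This sampling-distribution result can be illustrated by simulation (a stdlib-only sketch; the values n = 200, p = 0.3, and 2000 repetitions are made up for illustration):

```python
import random

random.seed(0)  # reproducible draws

# Hypothetical example: p = 0.3, n = 200 trials per sample
n, p = 200, 0.3
draws = 2000

phats = []
for _ in range(draws):
    y = sum(random.random() < p for _ in range(n))  # one binomial(n, p) draw
    phats.append(y / n)                             # sample proportion p-hat

mean_phat = sum(phats) / draws
var_phat = sum((ph - mean_phat) ** 2 for ph in phats) / (draws - 1)

# Theory: E(phat) = p = 0.3 and Var(phat) = p*(1-p)/n = 0.00105
```

A histogram of `phats` would look approximately normal, centered at p with the variance pq/n given above.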
The hypothesis testing procedure then becomes:
Step 1: Let p0 be the nominal value of the proportion p. Then our null hypothesis is:
H0: p = p0
and our alternative hypothesis is one of:
Ha: p < p0,   Ha: p > p0,   Ha: p ≠ p0
depending on what is of concern in the problem.
Step 2: If the null hypothesis is true, then E(p̂) = p0 and Var(p̂) = p0(1 – p0)/n. Let
q0 = 1 – p0. Then the test statistic is:
Z = (p̂ – p0) / √(p0q0/n)
which is approximately distributed as N(0, 1).
Note: As before, n should satisfy:
np0 ≥ 5 and nq0 ≥ 5
and ≥ 10 is recommended.
Step 3: The critical region depends on Ha:
If Ha: p < p0 then we use a lower one-tailed test and thus reject H0 if: Z ≤ –zα
If Ha: p > p0 then we use an upper one-tailed test and thus reject H0 if: Z ≥ zα
If Ha: p ≠ p0 then we use a two-tailed test and thus reject H0 if: |Z| ≥ zα/2
Step 4: Perform the n trials, and say we observe y successes. Then let p̂ = y/n and
compute Z.
Step 5: Reach conclusion and state it in words.
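The five steps above can be sketched as a small Python function (the helper name `prop_z_test` is made up; the normal critical values come from the standard library):

```python
from statistics import NormalDist

def prop_z_test(y, n, p0, alpha=0.05, alternative="two-sided"):
    """Large-sample z-test of H0: p = p0 -- a sketch of Steps 2-5 above.

    y : observed number of successes, n : number of trials.
    Returns (Z, reject) where reject says whether H0 is rejected at level alpha.
    """
    q0 = 1 - p0
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("need n*p0 >= 5 and n*q0 >= 5 (10 recommended)")
    phat = y / n
    z = (phat - p0) / (p0 * q0 / n) ** 0.5
    std = NormalDist()
    if alternative == "less":        # Ha: p < p0, reject if Z <= -z_alpha
        reject = z <= -std.inv_cdf(1 - alpha)
    elif alternative == "greater":   # Ha: p > p0, reject if Z >= z_alpha
        reject = z >= std.inv_cdf(1 - alpha)
    else:                            # Ha: p != p0, reject if |Z| >= z_{alpha/2}
        reject = abs(z) >= std.inv_cdf(1 - alpha / 2)
    return z, reject
```

For instance, with hypothetical data y = 8 successes in n = 100 trials and p0 = 0.10, the statistic is Z = (0.08 – 0.10)/√(0.09/100) ≈ –0.67, which does not reach the two-tailed 5% critical value of 1.96.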
Example 4.10 (p. 180): This deals with the breaking strength of carbon fibers. Historically,
10% are non-conforming. The concern is with developing a monitoring procedure, so
changes from the 10% level in either direction are of interest.
Step 1:
H0: p = 0.10   vs.   Ha: p ≠ 0.10
Step 2: Test statistic is:
Z = (p̂ – 0.10) / √((0.10)(0.90)/n)
We need:
n∗p0 ≥ 5
n∗q0 ≥ 5
Thus, n ≥ 50 (since n(0.10) ≥ 5 requires n ≥ 50).
Step 3: Choice of α is discussed in the text, and the two-sided alternative means
we reject H0 if: |Z| ≥ zα/2
Step 4: From the data (see text): n = ___ and y = ___, so p̂ = y/n.
Step 5: Since the computed Z = ___ does not fall in the critical region, we do not reject
the null hypothesis H0.
In words: there is not sufficient evidence from the data to conclude that the true
proportion of non-conforming fibers differs from the nominal value of 10%.
As described in the text (p. 183), the appropriate CI for the proportion p is:
p̂ ± zα/2 √(p̂q̂/n),   where q̂ = 1 – p̂
Note: this uses an estimated standard error of:
√(p̂(1 – p̂)/n) = √(p̂q̂/n)
Example 4.11 (p. 184): The carbon fibers problem again. With α = 0.01 we have zα/2 =
z0.005 = 2.576, so a 99% CI for p is:
p̂ ± 2.576 √(p̂q̂/n)
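As a numerical sketch of this interval (the data y = 10 non-conforming out of n = 100 are hypothetical; the text's actual data may differ, and the helper name `prop_ci` is made up):

```python
from statistics import NormalDist

def prop_ci(y, n, alpha=0.01):
    """Large-sample (1 - alpha) CI for p: phat +/- z_{alpha/2} * sqrt(phat*qhat/n)."""
    phat = y / n
    se = (phat * (1 - phat) / n) ** 0.5        # estimated standard error
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}; about 2.576 for alpha = 0.01
    return phat - z * se, phat + z * se

lo, hi = prop_ci(10, 100)  # hypothetical: 10 non-conforming out of 100
```

With p̂ = 0.10 the estimated standard error is √(0.09/100) = 0.03, giving a half-width of 2.576 × 0.03 ≈ 0.077.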
Finally, the CI result can be used to determine the sample size n that would be necessary
if we wanted to produce an interval with half-width no greater than some pre-determined
constant B.
Results in:
n = (zα/2 / B)² p0(1 – p0)
Note: in the above, p0 is either:
(a) a reasonable guess of p if we have one, or
(b) 0.5 if we don’t have such a guess.
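The sample-size formula above can be sketched in Python (the helper name `sample_size` is made up; rounding up to the next whole trial):

```python
from math import ceil
from statistics import NormalDist

def sample_size(B, alpha=0.05, p0=0.5):
    """Smallest n so the CI half-width is at most B:
    n = (z_{alpha/2} / B)^2 * p0 * (1 - p0), rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return ceil((z / B) ** 2 * p0 * (1 - p0))
```

Using the conservative default p0 = 0.5 with B = 0.03 and α = 0.05 reproduces the familiar "about 1,100 respondents for a ±3% margin" rule of thumb from polling.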