St 314 – Day #13 Notes

A. Normal Approximation to the Binomial

Recall the binomial distribution:

$$P(Y = y) = p(y) = \binom{n}{y} p^{y} (1-p)^{n-y}, \qquad y = 0, 1, \ldots, n,$$

which gives the probability of y successes out of n independent trials, where the probability of success on any one trial is p (0 < p < 1). If n is large, this becomes difficult to compute. However, we can use the Central Limit Theorem (CLT) to approximate the r.v. Y with a normal r.v. Y*, where Y and Y* have the same mean and variance:

$$E(Y^*) = E(Y) = np, \qquad \mathrm{Var}(Y^*) = \mathrm{Var}(Y) = np(1-p) = npq, \quad \text{where } q = 1 - p.$$

Note: For this to work, we need n large enough that both np and nq are at least 5 (at least 10 is suggested).

One problem: For the binomial r.v. Y, P(Y = a) is a positive number for a = 0, 1, …, n. But for the normal r.v. Y*, P(Y* = a) = 0 for all a (including 0, 1, …, n). The fix is discussed in the text (p. 133) and suggested by the diagram below:

[Figure: Binomial distribution with event probability 0.01 and 1000 trials; probability (0 to 0.15) plotted against y = 0 to 20.]

Result: we use a continuity correction and replace P(Y* = a), which equals zero, with

$$P(a - 0.5 \le Y^* \le a + 0.5),$$

which more accurately reflects the probability function p(y) of the binomial r.v. Y.

Note: Other results from using this correction are given in the text (p. 133). Some examples:

P(Y ≤ 10) → P(Y* ≤ 10.5)
P(Y > 12) → P(Y* ≥ 12.5)

B. HTs and CIs for p

Suppose we have a random sample of size n from a binomial distribution. Let Y be the number of successes. The obvious estimator of p is the sample proportion:

$$\hat{p} = \frac{Y}{n}.$$

Y can be approximated by a normal r.v. with mean = np and variance = np(1 – p). Consequently, the sampling distribution of the estimator $\hat{p}$ can be approximated by a normal distribution with:

$$E(\hat{p}) = p, \qquad \mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}.$$

The hypothesis testing procedure then becomes:

Step 1: Let p0 be the nominal value of the proportion p. Then our null hypothesis is H0: p = p0, and our alternative hypothesis is one of Ha: p < p0, Ha: p > p0, or Ha: p ≠ p0, depending on what is of concern in the problem.

Step 2: If the null hypothesis is true, then the variance of $\hat{p}$ is p0(1 – p0)/n. Let q0 = 1 – p0. Then the test statistic is

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}},$$

which is approximately distributed as a standard normal, N(0, 1).

Note: As before, n should satisfy n·p0 ≥ 5 and n·q0 ≥ 5, and ≥ 10 is recommended.

Step 3: The critical region depends on Ha:
If Ha: p < p0, then we use a lower one-tailed test and thus reject H0 if $Z < -Z_{\alpha}$.
If Ha: p > p0, then we use an upper one-tailed test and thus reject H0 if $Z > Z_{\alpha}$.
If Ha: p ≠ p0, then we use a two-tailed test and thus reject H0 if $|Z| > Z_{\alpha/2}$.

Step 4: Perform the n trials, and say we observe y successes. Then let $\hat{p} = y/n$ and compute Z.

Step 5: Reach a conclusion and state it in words.

Example 4.10 (p. 180): This deals with the breaking strength of carbon fibers. Historically, 10% are non-conforming. The concern is with developing a monitoring procedure, so changes from the 10% level in either direction are of interest.

Step 1: H0: p = 0.10, Ha: p ≠ 0.10.

Step 2: The test statistic is:

$$Z = \frac{\hat{p} - 0.10}{\sqrt{(0.10)(0.90)/n}}.$$

We need n·p0 ≥ 5 and n·q0 ≥ 5:
n·p0 ≥ 5 ⇒ n ≥ 5/0.10 = 50
n·q0 ≥ 5 ⇒ n ≥ 5/0.90 ≈ 5.6
Thus, n ≥ 50.

Step 3: Choice of α = ____ (see text for discussion), and the two-sided alternative means we reject H0 if $|Z| > Z_{\alpha/2}$.

Step 4: From the data: n = ____ and y = ____, so $\hat{p}$ = ____.

Step 5: Since Z = ____, we do not reject the null hypothesis H0. In words: there is not sufficient evidence from the data to conclude that the true proportion of non-conforming fibers differs from the nominal value of 10%.

As described in the text (p. 183), the appropriate CI for the proportion p is:

$$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

Note: this uses an estimated standard error of $\sqrt{\hat{p}(1-\hat{p})/n}$.

Example 4.11 (p. 184): The carbon fibers problem again. With α = 0.01 we have $Z_{\alpha/2}$ = 2.576, so a 99% CI for p is:

$$\hat{p} \pm 2.576 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

evaluated with $\hat{p}$ and n from the carbon fiber data.

Finally, the CI result can be used to determine the sample size n that would be necessary if we wanted to produce an interval with half-width no greater than some pre-determined constant B. Setting:

$$Z_{\alpha/2} \sqrt{\frac{p_0 q_0}{n}} \le B$$

results in:

$$n \ge \frac{Z_{\alpha/2}^{2}\, p_0 q_0}{B^{2}}.$$

Note: in the above, p0 is either:
(a) a reasonable guess of p if we have one, or
(b) 0.5 if we don't have such a guess.
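
Computing sketch: the procedures in parts A and B can be tried out numerically. The Python below is only a minimal illustration, not code from the text. It assumes Python 3.8+ for `statistics.NormalDist`; the function names are hypothetical, and the sample numbers (n = 200, y = 25, α = 0.05, B = 0.05) are made up for illustration and are not the data from Examples 4.10 or 4.11.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def binom_cdf_normal_approx(k, n, p):
    """Approximate P(Y <= k) for Y ~ Binomial(n, p) by P(Y* <= k + 0.5)."""
    y_star = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return y_star.cdf(k + 0.5)  # the + 0.5 is the continuity correction


def proportion_z_test(y, n, p0, alpha, alternative="two-sided"):
    """Return (Z, reject) for H0: p = p0 using the normal approximation."""
    p_hat = y / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "less":        # Ha: p < p0
        reject = z < -STD_NORMAL.inv_cdf(1 - alpha)
    elif alternative == "greater":   # Ha: p > p0
        reject = z > STD_NORMAL.inv_cdf(1 - alpha)
    else:                            # Ha: p != p0
        reject = abs(z) > STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z, reject


def proportion_ci(y, n, alpha):
    """Return the 100(1 - alpha)% CI for p based on the estimated standard error."""
    p_hat = y / n
    half_width = STD_NORMAL.inv_cdf(1 - alpha / 2) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def sample_size(B, alpha, p0=0.5):
    """Smallest integer n giving a CI half-width of at most B."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return ceil(z * z * p0 * (1 - p0) / B ** 2)


# Made-up data, for illustration only.
z, reject = proportion_z_test(y=25, n=200, p0=0.10, alpha=0.05)
lower, upper = proportion_ci(y=25, n=200, alpha=0.05)
n_needed = sample_size(B=0.05, alpha=0.05)  # p0 = 0.5 since no guess is assumed
print(z, reject, (lower, upper), n_needed)
```

The check that n·p0 and n·q0 are at least 5 is left to the user here; a fuller version would verify it before trusting the approximation.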