Probability and Statistics
Grinshpan
Normal approximation to the binomial distribution
Consider coin tossing with probability of heads p, 0 < p < 1, and probability of tails q = 1 − p.
Let m̂ = m̂(n) be the number of heads in n independent tosses. Then m̂ is a sum of n independent
copies of a Bernoulli random variable (1/0 indicator of whether the coin lands heads up) and has a
binomial distribution. We have
    P(m̂ = k) = C(n, k) p^k q^(n−k),   E[m̂] = pn,   Var(m̂) = pqn,

where C(n, k) = n!/(k!(n−k)!) denotes the binomial coefficient.
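As a quick numerical sanity check (not part of the handout), the pmf, mean, and variance formulas above can be verified directly in Python for a small illustrative choice of n and p:

```python
# Sketch: the binomial pmf P(m_hat = k) = C(n,k) p^k q^(n-k),
# with its mean pn and variance pqn checked numerically.
# The values n = 10, p = 0.3 are illustrative, not from the handout.
from math import comb

n, p = 10, 0.3
q = 1 - p

# pmf over all possible head counts k = 0..n
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))

print(mean)  # should equal pn  = 3.0 (up to rounding)
print(var)   # should equal pqn = 2.1 (up to rounding)
```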
When n is large, computing the probability that m̂ falls within certain bounds, k1 and k2 , may
involve very large numbers. For instance,
    C(100, 50) = 100!/(50!)² ≈ (9.33 × 10^157)/(9.25 × 10^128) ≈ 1.01 × 10^29.
Thus the evaluation of an exact answer,
    P(k1 ≤ m̂ ≤ k2) = Σ_{k=k1}^{k2} C(n, k) p^k q^(n−k),
is a computational challenge. This puts some limitations on the use of Bernoulli’s formula.
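The handout's point is that the factorials involved overflow fixed-precision arithmetic. As an aside, Python's arbitrary-precision integers and exact rationals sidestep this; a minimal sketch (the function name and the range 45..55 are illustrative choices, not from the handout):

```python
# Sketch: evaluating Bernoulli's formula exactly despite the huge
# factorials, using arbitrary-precision integers (math.comb) and
# exact rational arithmetic (Fraction).
from fractions import Fraction
from math import comb

def binom_range_prob(n, k1, k2, p):
    """Exact P(k1 <= m_hat <= k2) for m_hat ~ Binomial(n, p); p a Fraction."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(k1, k2 + 1))

# The single coefficient C(100, 50) already has 30 digits:
print(comb(100, 50))  # 100891344545564193334812497256

# Exact probability of 45..55 heads in 100 tosses of a fair coin:
prob = binom_range_prob(100, 45, 55, Fraction(1, 2))
print(float(prob))
```

This is exact but still O(k2 − k1) big-number multiplications, which motivates the normal approximation that follows.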
An alternative is provided by normal approximation: for large n, the rescaled random variable
    (m̂ − pn)/√(pqn)
is very close to the standard normal Z ∼ N (0, 1) in distribution. This was proved by de Moivre (1730)
for p = 1/2, and by Laplace (1812) for a general 0 < p < 1. One distinguishes between
the local

    P(m̂ = k) ≈ (1/√(2πnpq)) e^(−(k−np)²/(2npq))
and the integral

    P(k1 ≤ m̂ ≤ k2) ≈ (1/√(2π)) ∫_{x1}^{x2} e^(−x²/2) dx,   x1 = (k1 − np)/√(npq),   x2 = (k2 − np)/√(npq),
approximation forms of the de Moivre–Laplace limit theorem. Each is a consequence of the other.
Thus, for large n, the distribution of m̂ is nearly N (pn, pqn). This allows a quick approximate answer
in terms of well-understood Φ(x), the cumulative distribution function of Z.
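To see how close the integral approximation is, one can compare it against the exact binomial sum; a sketch, where the helper names `phi`, `exact`, and `approx` and the test values n = 100, p = 1/2, 40 ≤ m̂ ≤ 60 are illustrative (Φ is expressed through the error function, Φ(x) = (1 + erf(x/√2))/2):

```python
# Sketch: exact binomial probability vs. the integral form of the
# de Moivre-Laplace approximation, Phi(x2) - Phi(x1).
from math import comb, erf, sqrt

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def exact(n, p, k1, k2):
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(k1, k2 + 1))

def approx(n, p, k1, k2):
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return phi((k2 - mu) / sigma) - phi((k1 - mu) / sigma)

n, p, k1, k2 = 100, 0.5, 40, 60
print(exact(n, p, k1, k2))   # ~0.965
print(approx(n, p, k1, k2))  # ~0.954, i.e. Phi(2) - Phi(-2)
```

The residual gap comes mostly from the discreteness of m̂; a continuity correction (using k1 − 1/2 and k2 + 1/2) narrows it further.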
Example [Rice, page 187]
Should we suspect a bias?
Suppose that a coin is tossed 100 times and lands heads up 60 times.
If the coin is fair, the probability that m̂ = 50 is C(100, 50)/2^100 ≈ 0.0796, and the probability
that m̂ = 60 is C(100, 60)/2^100 ≈ 0.0108. What does this say? Is m̂ = 60 a likely or an unlikely outcome?
Let us take an approximation approach. For a fair coin, we have E[m̂] = 0.5 · 100 = 50 and
Var(m̂) = 0.25 · 100 = 25. Since the number of trials is large, the normalized binomial variable
(m̂ − 50)/5 should be well approximated by Z:

    P(m̂ ≥ 60) = P((m̂ − 50)/5 ≥ 2) ≈ P(Z ≥ 2) = 1 − Φ(2) < 0.05/2 = 0.025.
The estimate suggests that the likelihood of the event m̂ ≥ 60 is small (after all, the observed value is two standard
deviations above the mean). So suspecting a bias would not be unreasonable.
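The numbers quoted in this example can be reproduced in a few lines; a sketch using Python's exact integer arithmetic for the binomial terms and erf for the normal tail:

```python
# Sketch: reproducing the fair-coin example's figures.
from math import comb, erf, sqrt

# Exact point probabilities under a fair coin, 100 tosses:
p50 = comb(100, 50) / 2**100
p60 = comb(100, 60) / 2**100
print(round(p50, 4))  # 0.0796
print(round(p60, 4))  # 0.0108

# Normal tail estimate: P(m_hat >= 60) ~ P(Z >= 2) = 1 - Phi(2)
tail = 0.5 * (1 - erf(2 / sqrt(2)))
print(round(tail, 4))  # 0.0228, indeed below 0.025
```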