10. Discrete probability distributions
In this section, we discuss several well-known discrete probability distributions and study
some of their properties. Some of these distributions, like the Binomial and Geometric
distributions, have appeared before in this course; others, like the Negative Binomial
distribution, have not.
10.1. Binomial distribution (n, p). For n ≥ 1 and 0 ≤ p ≤ 1, let X1 , . . . , Xn be independent,
identically distributed (i.i.d.) Bernoulli random variables with parameter p. Recall that X1 has the Bernoulli(p) distribution if its probability mass function is given by
$$p(k) := \begin{cases} p, & k = 1,\\ 1 - p, & k = 0,\\ 0, & \text{otherwise.} \end{cases}$$
Then X := X1 + · · · + Xn is said to have the Binomial distribution with parameter (n, p).
The mass function of the Binomial distribution is obtained by a standard combinatorial
argument:
$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, \ldots, n.$$
Notice that a Bernoulli random variable has the Binomial distribution with parameter (1, p).
We have shown previously that for X1 ∼ Bernoulli(p), EX1 = p and Var(X1) = p(1 − p).
Therefore, by our representation of X ∼ Binomial(n, p) as a sum X1 + · · · + Xn of i.i.d.
Bernoulli(p) random variables, we have
$$EX = E(X_1 + \cdots + X_n) = \sum_{j=1}^n EX_j = np \quad \text{and} \quad \operatorname{Var}(X) = \operatorname{Var}(X_1 + \cdots + X_n) = \sum_{j=1}^n \operatorname{Var}(X_j) = np(1-p).$$
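As a quick numerical sanity check (not part of the original notes), the following Python sketch draws X as a sum of n i.i.d. Bernoulli(p) indicators and compares the sample mean and variance with np and np(1 − p); the parameter values and trial count are arbitrary illustrative choices.

```python
import random

def binomial_sample(n, p):
    """Draw one Binomial(n, p) variate as a sum of n Bernoulli(p) indicators."""
    return sum(1 for _ in range(n) if random.random() < p)

n, p, trials = 20, 0.3, 100_000          # arbitrary illustrative values
samples = [binomial_sample(n, p) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials

print(f"sample mean {mean:.3f} vs np = {n * p}")
print(f"sample variance {var:.3f} vs np(1-p) = {n * p * (1 - p)}")
```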
10.2. Uniform distribution (n). Suppose U is uniformly distributed on {1, . . . , n}, that is
$$p_U(u) := \begin{cases} 1/n, & u = 1, \ldots, n,\\ 0, & \text{otherwise.} \end{cases}$$
By symmetry, we can deduce that EU = (n + 1)/2. Alternatively, let
$$\Delta_2 := E(U+1)^2 - EU^2 = \sum_{i=1}^n \frac{(i+1)^2}{n} - \sum_{i=1}^n \frac{i^2}{n} = \frac{(n+1)^2}{n} - \frac{1^2}{n}.$$
Also,
$$E(U+1)^2 - EU^2 = E[(U+1)^2 - U^2] = 2EU + 1.$$
Putting these two statements together, we obtain 2EU + 1 = ((n + 1)² − 1)/n = n + 2, and hence EU = (n + 1)/2. We also obtain the very important identity
$$\sum_{i=1}^n i = \frac{n(n+1)}{2}.$$
We can use the same method to compute EU². Put
$$\Delta_3 := E(U+1)^3 - EU^3 = \frac{(n+1)^3}{n} - \frac{1^3}{n} = n^2 + 3n + 3;$$
and notice also that
$$\Delta_3 = E[3U^2 + 3U + 1] = 3EU^2 + 3EU + 1 = 3EU^2 + 3\,\frac{n+1}{2} + 1.$$
Consequently, we have
$$EU^2 = \frac{1}{3}\left(n^2 + \frac{3n}{2} + \frac{1}{2}\right) = \frac{1}{6}\left(2n^2 + 3n + 1\right).$$
Here, we obtain the identity
$$\sum_{i=1}^n i^2 = \frac{n(2n^2 + 3n + 1)}{6} = \frac{n(n+1)(2n+1)}{6}.$$
Putting together EU and EU² gives
$$\operatorname{Var}(U) = EU^2 - [EU]^2 = \frac{1}{6}(2n^2 + 3n + 1) - \frac{(n+1)^2}{4} = \frac{n^2 - 1}{12}.$$
Now, suppose W is uniformly distributed over {a, a + 1, . . . , b}, for a < b. To find EW
and Var(W), we can use what we know about EU and Var(U) for U ∼ Uniform(n). In
particular, if W is uniform on {a, a + 1, . . . , b}, then W can be expressed as W = U + a − 1, for
U ∼ Uniform(b − a + 1). Therefore,
$$EW = E[U + a - 1] = EU + a - 1 = \frac{b - a + 2}{2} + a - 1 = \frac{b + a}{2}$$
and
$$\operatorname{Var}(W) = \operatorname{Var}(U + a - 1) = \operatorname{Var}(U) = \frac{(b - a + 1)^2 - 1}{12}.$$
10.3. Hypergeometric distribution (N, m, n). We have previously encountered the Hypergeometric distribution when we discussed probabilities for various events related to lottery
numbers. Suppose an urn contains N balls, m ≤ N of which are white, N − m of which are
black. We draw n ≤ N balls without replacement and let X be the number of white balls
drawn. The probability mass function of X is
$$p_X(k) := \begin{cases} \dbinom{m}{k}\dbinom{N-m}{n-k}\Big/\dbinom{N}{n}, & \max(0,\, n - N + m) \le k \le \min(n, m),\\[4pt] 0, & \text{otherwise.} \end{cases}$$
If, for i = 1, . . . , n, we let Xi indicate whether the ith draw is white, that is,
$$X_i := \begin{cases} 1, & i\text{th draw is a white ball},\\ 0, & \text{otherwise,} \end{cases}$$
then X can be expressed as the sum X = X1 + · · · + Xn and the Xi ’s are exchangeable, but
not independent. In this case, we have
$$P\{X_i = 1\} = \frac{m}{N}, \quad i = 1, \ldots, n, \qquad \text{and} \qquad P\{X_i = X_j = 1\} = \frac{m(m-1)}{N(N-1)}, \quad 1 \le i \ne j \le n.$$
Clearly, EX = nEX1 = nm/N. To compute Var(X), we note that
$$\operatorname{Var}(X_i) = \frac{m}{N}\,\frac{N-m}{N}, \qquad EX_iX_j = \frac{m(m-1)}{N(N-1)} \quad \text{for } i \ne j, \qquad \text{and} \qquad \operatorname{Cov}(X_i, X_j) = EX_iX_j - EX_iEX_j = -\frac{\frac{m}{N}\,\frac{N-m}{N}}{N-1} < 0.$$
Thus,
$$\operatorname{Var}(X) = \sum_i \operatorname{Var}(X_i) + 2\sum_{i<j} \operatorname{Cov}(X_i, X_j) = n\,\frac{m}{N}\,\frac{N-m}{N} - 2\binom{n}{2}\,\frac{\frac{m}{N}\,\frac{N-m}{N}}{N-1} = \frac{N-n}{N-1}\, n\,\frac{m}{N}\,\frac{N-m}{N}.$$
Writing C = (N − n)/(N − 1), we obtain
$$\operatorname{Var}(X) = C \operatorname{Var}(X_0),$$
where X0 has the Binomial distribution with parameter (n, m/N). We interpret C as the finite
population correction factor for drawing without replacement from a finite urn in which the
initial proportion of white balls is m/N. Note that C → 1 as N → ∞, and so the Binomial
distribution has the interpretation of drawing from an urn with infinitely many balls, a
fraction p of which are white.
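As an illustration of the finite population correction (a sketch with arbitrary urn parameters, not taken from the notes), the code below computes the hypergeometric mean and variance exactly from the pmf and compares them with nm/N and C · np(1 − p) for p = m/N.

```python
from math import comb

N, m, n = 50, 20, 10                     # arbitrary urn: N balls, m white, draw n

# Exact mean and variance from the hypergeometric pmf.
support = range(max(0, n - (N - m)), min(n, m) + 1)
pmf = {k: comb(m, k) * comb(N - m, n - k) / comb(N, n) for k in support}
mean = sum(k * pk for k, pk in pmf.items())
var = sum(k * k * pk for k, pk in pmf.items()) - mean ** 2

p = m / N
C = (N - n) / (N - 1)                    # finite population correction factor
print(f"E X   = {mean:.4f}  vs  n m/N           = {n * p:.4f}")
print(f"Var X = {var:.4f}  vs  C * n p (1 - p)  = {C * n * p * (1 - p):.4f}")
```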
10.4. Geometric distribution (p). Let W be the waiting time for the first head when tossing
a coin with success probability p ∈ [0, 1]. So W = n when the first head appears on the
nth toss. Here, we can compute both the probability mass function and the cumulative
distribution function in closed form:
$$p_W(n) := p(1-p)^{n-1}, \quad n = 1, 2, \ldots, \qquad \text{and} \qquad P\{W \le n\} = \sum_{j=1}^n p_W(j) = \sum_{j=1}^n p(1-p)^{j-1} = 1 - (1-p)^n.$$
We compute the expectation of W, writing q := 1 − p, by
$$EW = \sum_{n=1}^\infty np(1-p)^{n-1} = p\sum_{n=1}^\infty nq^{n-1} = p\,\frac{d}{dq}\sum_{n=0}^\infty q^n = p\,\frac{d}{dq}\,\frac{1}{1-q} = p\,\frac{1}{(1-q)^2} = p/p^2 = 1/p.$$
We can also compute conditional distributions for W, which reveals an interesting and
unique property of the Geometric distribution. Let n > k; then
$$P\{W = n \mid W > k\} = \frac{P(\{W = n\} \cap \{W > k\})}{P\{W > k\}} = \frac{pq^{n-1}}{1 - (1 - q^k)} = pq^{n-k-1}.$$
Consequently, the conditional distribution of W − k, given W > k, is Geometric(p). This
property is known as the memoryless property. The Geometric distribution is the unique
distribution on the positive integers with the memoryless property.
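The memoryless property can be checked numerically from the closed forms above. In the hypothetical sketch below, pmf and tail implement p_W and P{W > k}, and the conditional probabilities P{W = k + j | W > k} are compared with P{W = j}; the values of p, k and j are arbitrary.

```python
p = 0.3          # arbitrary success probability
q = 1 - p

def pmf(n):
    """P{W = n} for W ~ Geometric(p), n = 1, 2, ..."""
    return p * q ** (n - 1)

def tail(k):
    """P{W > k} = q**k."""
    return q ** k

# Conditional pmf of W - k given W > k equals the unconditional pmf of W.
for k in (0, 3, 7):                      # arbitrary conditioning levels
    for j in (1, 2, 5):
        conditional = pmf(k + j) / tail(k)
        assert abs(conditional - pmf(j)) < 1e-12
        print(f"P(W = {k+j} | W > {k}) = {conditional:.6f} = P(W = {j})")
```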
Alternatively, we can compute EW by conditioning on the first flip. In this case,
$$EW = 1 \times P\{W = 1\} + E(W \mid W > 1)P\{W > 1\} = P\{W = 1\} + E((W - 1) + 1 \mid W > 1)P\{W > 1\} = P\{W = 1\} + E(W^* + 1 \mid W > 1)P\{W > 1\},$$
where W∗ = W − 1. By the memoryless property, E(W∗ | W > 1) = EW, and we have
$$EW = p + (1 - p)(1 + EW),$$
which again gives EW = 1/p.
To compute the variance, it is easiest to compute the second factorial moment of W:
$$EW(W-1) = \sum_{n=1}^\infty n(n-1)pq^{n-1} = \sum_{n=2}^\infty n(n-1)pq^{n-1} = pq\sum_{n=0}^\infty (n+2)(n+1)q^n = pq\,\frac{d^2}{dq^2}\sum_{n \ge 0} q^n = pq\,\frac{d^2}{dq^2}\,\frac{1}{1-q} = \frac{2pq}{(1-q)^3} = 2pq/p^3 = 2q/p^2.$$
Now,
$$2q/p^2 = EW(W-1) = EW^2 - EW = EW^2 - 1/p;$$
and we have
$$EW^2 = 2q/p^2 + p/p^2 = \frac{2q + p}{p^2} \qquad \text{and} \qquad \operatorname{Var}(W) = \frac{2q + p}{p^2} - \frac{1}{p^2} = q/p^2.$$
Alternatively, we could define a Geometric random variable V to be the number of tails
before the first head. So, in our notation, we have V = W − 1 and
$$p_V(v) = p_W(v+1) = pq^v, \quad v = 0, 1, \ldots, \qquad EV = EW - 1 = 1/p - 1 = q/p, \qquad \operatorname{Var}(V) = \operatorname{Var}(W - 1) = q/p^2.$$
10.5. Negative Binomial distribution (p, r). Consider tossing a p-coin (0 < p < 1) repeatedly and consider Wr , the number of tosses until the rth head. The probability mass function
of Wr , r ≥ 1, is
$$p_{W_r}(k) := \begin{cases} \dbinom{k-1}{r-1} p^r (1-p)^{k-r}, & k = r, r+1, r+2, \ldots,\\[4pt] 0, & \text{otherwise.} \end{cases}$$
Alternatively, for Vr = Wr − r, the number of tails before the rth head, we have
$$p_{V_r}(k) = p_{W_r}(k+r) = \binom{k+r-1}{r-1} p^r (1-p)^k, \qquad k = 0, 1, 2, \ldots.$$
This latter specification motivates the name Negative Binomial:
$$\binom{k+r-1}{r-1} = \binom{k+r-1}{k} = \frac{(r+k-1)(r+k-2)\cdots(r+1)r}{k!} = (-1)^k\,\frac{(-r)(-r-1)\cdots(-r-k+1)}{k!} =: (-1)^k \binom{-r}{k}.$$
Therefore, we can write
$$p_{V_r}(k) = \binom{-r}{k} (-q)^k p^r,$$
where q = 1 − p as before.
Alternatively, we can write Wr = X1 + · · · + Xr , where X1 , . . . , Xr are i.i.d. Geometric(p).
Thus,
$$EW_r = E(X_1 + \cdots + X_r) = r/p \qquad \text{and} \qquad \operatorname{Var}(W_r) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_r) = rq/p^2.$$
We could, however, compute these quantities without noticing the representation of Wr as
a sum of r independent Geometric random variables. In this case, we have
$$EW_r = \sum_{k=r}^\infty k\binom{k-1}{r-1} p^r (1-p)^{k-r} = \sum_{k=r}^\infty \frac{k!}{(r-1)!(k-r)!}\, p^r (1-p)^{k-r} = \frac{r}{p}\underbrace{\sum_{k=r}^\infty \binom{k}{r} p^{r+1} (1-p)^{k-r}}_{\text{sums the pmf of NB}(p,\,r+1)} = r/p.$$
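As a sanity check on EWr = r/p and Var(Wr) = rq/p² (a simulation sketch, not part of the original derivation), the code below draws Wr as a sum of r independent Geometric(p) waiting times; r, p and the number of trials are arbitrary.

```python
import random

def geometric(p):
    """Number of tosses of a p-coin until the first head."""
    tosses = 1
    while random.random() >= p:
        tosses += 1
    return tosses

r, p, trials = 4, 0.35, 100_000          # arbitrary illustrative values
samples = [sum(geometric(p) for _ in range(r)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((w - mean) ** 2 for w in samples) / trials
q = 1 - p
print(f"sample mean {mean:.3f}  vs  r/p      = {r / p:.3f}")
print(f"sample var  {var:.3f}  vs  r q / p^2 = {r * q / p**2:.3f}")
```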
The Negative Binomial distribution arises in a common probabilistic theme of coupon
collecting.
Example 10.1 (Coupon collecting). Consider rolling a fair 6-sided die repeatedly. How many
rolls are needed before all 6 numbers occur?
Let N be the number of rolls required to see all 6 numbers of a fair 6-sided die. Clearly,
P{N > 0} = 1. Now, let
F_i^n := {i does not occur in the first n rolls}.
Then,
$$P\{N > n\} = P\left(\bigcup_{i=1}^6 F_i^n\right) = S_1 - S_2 + S_3 - S_4 + S_5 - S_6,$$
by inclusion-exclusion, where
$$S_k := \sum_{1 \le i_1 < \cdots < i_k \le 6} P\{F_{i_1}^n \cap \cdots \cap F_{i_k}^n\}, \qquad k = 1, \ldots, 6.$$
We have
$$P\{F_i^n\} = (5/6)^n, \qquad P\{F_i^n \cap F_j^n\} = (4/6)^n, \quad i \ne j, \qquad P\{F_i^n \cap F_j^n \cap F_k^n\} = (3/6)^n, \quad i, j, k \text{ distinct},$$
and so on. Therefore,
$$P\{N > n\} = \binom{6}{1}\left(\frac{5}{6}\right)^n - \binom{6}{2}\left(\frac{4}{6}\right)^n + \binom{6}{3}\left(\frac{3}{6}\right)^n - \cdots,$$
and P{N = n} = P{N > n − 1} − P{N > n}. Alternatively, we can write N = 1 + X1 + · · · + X5 ,
where Xi is the number of additional rolls needed to produce the (i + 1)st new number. In this
case, X1 ∼ Geometric(5/6), X2 ∼ Geometric(4/6), . . . , X5 ∼ Geometric(1/6), and the Xi ’s are all
independent. Therefore, we have
$$EN = \sum_{i=1}^6 \frac{6}{i} = 147/10 \qquad \text{and} \qquad \operatorname{Var}(N) = \sum_{i=1}^5 \frac{1 - i/6}{(i/6)^2} = \sum_{1 \le i \le 5} \frac{6(6-i)}{i^2} = 3899/100.$$
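A simulation sketch of the coupon-collector calculation (the number of repetitions is an arbitrary choice): it estimates EN and Var(N) by repeatedly rolling a fair die until all six faces appear and compares the estimates with 147/10 and 3899/100.

```python
import random

def rolls_to_collect_all(sides=6):
    """Roll a fair die until every face has appeared; return the number of rolls."""
    seen, rolls = set(), 0
    while len(seen) < sides:
        seen.add(random.randint(1, sides))
        rolls += 1
    return rolls

trials = 100_000                          # arbitrary number of repetitions
samples = [rolls_to_collect_all() for _ in range(trials)]
mean = sum(samples) / trials
var = sum((n - mean) ** 2 for n in samples) / trials

print(f"estimated EN     = {mean:.3f}  (exact 147/10   = 14.7)")
print(f"estimated Var(N) = {var:.3f}  (exact 3899/100 = 38.99)")
```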
Example 10.2 (Length of a game of craps). The approach to the previous example can be used to
study the length of a game of craps. Let R be the total number of rolls and let A ≥ 0 be the number
of additional rolls (if any) made after the first roll. Then
$$R = A + 1, \qquad ER = EA + 1 \qquad \text{and} \qquad \operatorname{SD}(R) = \operatorname{SD}(A).$$
For j = 2, . . . , 12, let T_j := {first roll is j} and let
$$p_j := P\{T_j\} = \frac{6 - |7 - j|}{36}, \qquad j = 2, \ldots, 12.$$
For n = 0,
P{A = n} = p2 + p3 + p7 + p11 + p12 = 1/3.
For n = 1, 2, . . ., condition on the result of the first roll:
$$P\{A = n\} = \sum_{j=4,5,6,8,9,10} P\{T_j\}\,P\{A = n \mid T_j\}.$$
If the point is j, then the number of additional rolls follows the Geometric distribution with success
probability p_j + p_7. Writing θ_j := p_j + p_7, we have
$$P\{A = n\} = \sum_{j=4,5,6,8,9,10} p_j (1 - \theta_j)^{n-1}\, \theta_j.$$
To find EA, we can use the conditioning rule for expectations:
$$EA = \sum_{j=4,5,6,8,9,10} P\{T_j\}\,E[A \mid T_j] = 2\sum_{j=4,5,6} \frac{p_j}{\theta_j} = 2\sum_{j=3,4,5} \frac{j}{6+j} = 392/165.$$
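The value 392/165 can be reproduced with exact rational arithmetic; the sketch below simply evaluates EA = Σ p_j/θ_j from the conditioning argument above (the helper name prob is just an illustrative choice).

```python
from fractions import Fraction

def prob(j):
    """p_j = P{first roll is j} for a sum of two fair dice."""
    return Fraction(6 - abs(7 - j), 36)

p7 = prob(7)
# E[A | point j] = 1/theta_j with theta_j = p_j + p_7, so EA = sum_j p_j / theta_j.
EA = sum(prob(j) / (prob(j) + p7) for j in (4, 5, 6, 8, 9, 10))
ER = EA + 1
print(EA)          # 392/165
print(float(ER))   # expected total number of rolls, about 3.376
```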
10.6. Factorial moments. For a random variable X, the quantity µ_k := EX^k is called the kth
moment of X and
$$\nu_k := EX^{\downarrow k} = E[X(X-1)\cdots(X-k+1)]$$
is called the kth factorial moment of X. Sometimes, computing factorial moments makes
calculation of ordinary moments easier. For example,
$$EX = \mu_1 = \nu_1 \qquad \text{and} \qquad \operatorname{Var}(X) = \mu_2 - \mu_1^2 = \nu_2 + \nu_1 - \nu_1^2.$$
Let A1 , . . . , An be events and let X := Σ_j I_{A_j} be the number of events that occur. We know
ν_1 = EX = Σ_i P{A_i} = S_1. The number of pairs of events to occur is
$$\binom{X}{2} = \frac{X(X-1)}{2} = \sum_{1 \le i < j \le n} I_{A_i A_j}.$$
Therefore, ν_2 = E[X(X − 1)] = 2!S_2, where S_2 := Σ_{i<j} P{A_i ∩ A_j}. In general, with
$$S_k := \sum_{1 \le i_1 < \cdots < i_k \le n} P\{A_{i_1} \cap \cdots \cap A_{i_k}\},$$
we have ν_k = k!S_k.
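To illustrate ν_k = k!S_k numerically, here is a small sketch for the special case of independent events (an assumption made only for this example, with arbitrary probabilities): it computes ν_2 = E[X(X − 1)] exactly by enumerating outcomes and compares it with 2!S_2.

```python
from itertools import combinations, product

probs = [0.2, 0.5, 0.7, 0.4]             # arbitrary independent events A_1..A_4

# Exact E[X(X-1)] by enumerating all outcomes of the independent indicators.
nu2 = 0.0
for outcome in product((0, 1), repeat=len(probs)):
    weight = 1.0
    for hit, p in zip(outcome, probs):
        weight *= p if hit else (1 - p)
    x = sum(outcome)
    nu2 += weight * x * (x - 1)

# S_2 = sum over pairs of P{A_i and A_j}; independence gives p_i * p_j.
S2 = sum(pi * pj for pi, pj in combinations(probs, 2))
print(nu2, 2 * S2)                        # both equal 2! * S_2
```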
10.7. Poisson distribution (µ).
Example 10.3 (Modeling volume). How do we model the volume (number of shares traded) of a
popular stock on an ordinary day? To begin, we assume the number of people n in the market
is very large and, for now, we assume that each person can purchase at most one share of stock and
does so independently of everyone else with very small probability p > 0. If X denotes the number of
shares bought, then X ∼ Binomial(n, p). We already know that EX = np and, if np is of moderate
size, we would like to know what P{X = k} is for moderate values of k.
If p is small and np is moderate, then log(1 − p)^n = n log(1 − p) ≈ −np = −µ; hence, (1 − p)^n ≈ e^{−µ}.
Thus, for moderate values of k, we have
$$P\{X = k\} = \binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{k!}\,\frac{n^{\downarrow k}}{n^k}\,(np)^k (1-p)^n (1-p)^{-k} \approx \frac{1}{k!} \times 1 \times \mu^k \times e^{-\mu} \times 1 = \mu^k e^{-\mu}/k!.$$
In fact, we see that this informal approximation gives rise to another distribution, known as the
Poisson distribution.
More formally, if we take n → ∞, p_n → 0, and np_n → µ ∈ [0, ∞) in
$$\binom{n}{k} p_n^k (1 - p_n)^{n-k},$$
we obtain the limiting distribution
$$\binom{n}{k} p_n^k (1 - p_n)^{n-k} \to \mu^k e^{-\mu}/k!.$$
A random variable X for which
$$p_X(k) = \mu^k e^{-\mu}/k!, \qquad k = 0, 1, 2, \ldots,$$
has the Poisson distribution with parameter µ. The Poisson distribution is often used to
model rare events.
For X ∼ Poisson(µ), we have
$$EX = \sum_{k=0}^\infty k\mu^k e^{-\mu}/k! = \sum_{k=1}^\infty \mu^k e^{-\mu}/(k-1)! = \mu \underbrace{\sum_{k=1}^\infty \mu^{k-1} e^{-\mu}/(k-1)!}_{\text{sums the pmf of Poisson}(\mu)} = \mu.$$
The mode (most likely value) of X ∼ Poisson(µ) is ⌊µ⌋, the greatest integer smaller than
µ, if µ is not an integer, and both µ and µ − 1 if µ ∈ Z. To see this, we compare the ratio of
successive probabilities
$$\frac{p_X(k)}{p_X(k-1)} = \frac{\mu^k e^{-\mu}/k!}{\mu^{k-1} e^{-\mu}/(k-1)!} = \frac{\mu}{k},$$
which is greater than 1 for k < µ and less than 1 for k > µ, so the probabilities increase up to the mode and decrease afterwards.
Also, for j = 1, 2, . . ., we have P{X ≥ j} ≤ µ^j/j!:
$$P\{X \ge j\} = \mu^j e^{-\mu}/j!\left(1 + \frac{\mu}{j+1} + \frac{\mu^2}{(j+1)(j+2)} + \cdots\right) \le \mu^j e^{-\mu}/j!\left(1 + \mu/1! + \mu^2/2! + \cdots\right) = \mu^j e^{-\mu}/j!\; e^\mu = \mu^j/j!.$$
Theorem 10.4 (Law of rare events). Suppose Y ∼ Binomial(n, p) and X ∼ Poisson(np), then
Y ≈D X if n is large, p is small, and np is moderate.
In practice, Theorem 10.4 applies when np ≈ 5, or so.
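A numerical illustration of Theorem 10.4 (the parameter values are arbitrary, chosen so that np = 5): the Binomial(n, p) and Poisson(np) probabilities are compared for moderate k.

```python
from math import comb, exp, factorial

n, p = 500, 0.01                         # large n, small p, np = 5 (moderate)
mu = n * p

for k in range(11):
    binom = comb(n, k) * p**k * (1 - p) ** (n - k)
    poisson = mu**k * exp(-mu) / factorial(k)
    print(f"k={k:2d}  Binomial {binom:.5f}  Poisson {poisson:.5f}")
```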
Theorem 10.5 (A General Poisson Approximation theorem). For each n, suppose A1,n , . . . , An,n
are (not necessarily independent, not necessarily equi-probable) events. Let Nn := Σ_{j=1}^n I_{A_j,n} be the
number of events to occur. If n → ∞ and the Ai,n ’s vary with n in such a way that
$$S_{k,n} := \sum_{1 \le i_1 < \cdots < i_k \le n} P\left(\bigcap_{j=1}^{k} A_{i_j,n}\right) \to \lambda^k/k! \quad \text{for each } k,$$
then
$$P\{N_n = j\} \to \lambda^j e^{-\lambda}/j!, \qquad j = 0, 1, \ldots. \tag{18}$$
Intuitively, if you have a large number n of things, each with a small probability, and
they are approximately independent, then Nn is approximately Poisson.
Example 10.6 (Hat matching). Suppose there is a group of n people, each with their own hat.
Everyone throws their hat into a pile and then the group, one at a time, chooses a hat randomly from
the pile. Let Ai,n := {ith person picks his own hat} and Nn := Σ_{j=1}^n I_{A_j,n}, the number of people who
pick their own hat in a group of n. Then (18) holds if
• for each n, the Ai,n ’s are exchangeable,
• as n → ∞, P{Ai,n } → 0 in such a way that ENn = nP{Ai,n } = λ, and
• the Ai,n ’s are asymptotically independent in the sense that
$$\frac{P\{A_{1,n} \cap \cdots \cap A_{k,n}\}}{\prod_{j=1}^k P\{A_{j,n}\}} \to 1 \quad \text{as } n \to \infty,$$
for all 1 ≤ k ≤ n.
In this case,
$$S_{k,n} = \binom{n}{k} P\{A_{1,n} \cap \cdots \cap A_{k,n}\} = \frac{1}{k!}\,\frac{(n)^{\downarrow k}}{n^k}\,\bigl(nP\{A_{1,n}\}\bigr)^k \times \frac{P\{A_{1,n} \cap \cdots \cap A_{k,n}\}}{\prod_{j=1}^k P\{A_{j,n}\}} \to \lambda^k/k!.$$
For the hat matching problem, we have
• A1,n , . . . , An,n are exchangeable,
• nP{A1,n } = n × 1/n = 1 = λ, and
• P{A1,n ∩ · · · ∩ Ak,n }/P{A1,n }^k = n^k/(n)^{↓k} → 1.
Hence, Nn ≈ Poisson(1).
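A simulation sketch of the hat-matching conclusion (group size and number of repetitions are arbitrary): the number of fixed points of a uniform random permutation is tabulated and compared with the Poisson(1) pmf.

```python
import random
from collections import Counter
from math import exp, factorial

n, trials = 100, 50_000                   # arbitrary group size and repetitions
counts = Counter()
for _ in range(trials):
    hats = list(range(n))
    random.shuffle(hats)                  # hats[i] is the hat picked by person i
    counts[sum(i == h for i, h in enumerate(hats))] += 1

for j in range(6):
    empirical = counts[j] / trials
    poisson = exp(-1) / factorial(j)      # Poisson(1) pmf at j
    print(f"P(N_n = {j}): simulated {empirical:.4f}  vs  Poisson(1) {poisson:.4f}")
```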