PROBABILITY MODELS
HARRY CRANE

10. Discrete probability distributions

In this section, we discuss several well-known discrete probability distributions and study some of their properties. Some of these distributions, like the Binomial and Geometric distributions, have appeared before in this course; others, like the Negative Binomial distribution, have not.

10.1. Binomial distribution (n, p). For $n \geq 1$ and $0 \leq p \leq 1$, let $X_1, \ldots, X_n$ be independent, identically distributed (i.i.d.) Bernoulli random variables with parameter $p$. Recall that $X_1$ has the Bernoulli($p$) distribution if its probability mass function is given by
\[
p(k) := \begin{cases} p, & k = 1, \\ 1 - p, & k = 0, \\ 0, & \text{otherwise.} \end{cases}
\]
Then $X := X_1 + \cdots + X_n$ is said to have the Binomial distribution with parameter $(n, p)$. The mass function of the Binomial distribution is obtained by a standard combinatorial argument:
\[
p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, \ldots, n.
\]
Notice that a Bernoulli random variable has the Binomial distribution with parameter $(1, p)$. We have shown previously that for $X_1 \sim \mathrm{Bernoulli}(p)$, $EX_1 = p$ and $\mathrm{Var}(X_1) = p(1 - p)$. Therefore, by our representation of $X \sim \mathrm{Binomial}(n, p)$ as a sum $X_1 + \cdots + X_n$ of i.i.d. Bernoulli($p$) random variables, we have
\[
EX = E(X_1 + \cdots + X_n) = \sum_{j=1}^n EX_j = np
\]
and
\[
\mathrm{Var}(X) = \mathrm{Var}(X_1 + \cdots + X_n) = \sum_{j=1}^n \mathrm{Var}(X_j) = np(1 - p).
\]

10.2. Uniform distribution (n). Suppose $U$ is uniformly distributed on $\{1, \ldots, n\}$, that is,
\[
p_U(u) := \begin{cases} 1/n, & u = 1, \ldots, n, \\ 0, & \text{otherwise.} \end{cases}
\]
By symmetry, we can deduce that $EU = (n + 1)/2$. Alternatively, let
\[
\Delta_2 := E(U + 1)^2 - EU^2 = \sum_{i=1}^n \frac{(i + 1)^2}{n} - \sum_{i=1}^n \frac{i^2}{n} = \frac{(n + 1)^2}{n} - \frac{1^2}{n}.
\]
Also, $E(U + 1)^2 - EU^2 = E[(U + 1)^2 - U^2] = 2EU + 1$. Putting these two statements together, we obtain $EU = (n + 1)/2$. We also obtain the very important identity
\[
\sum_{i=1}^n i = \frac{n(n + 1)}{2}.
\]
We can use the same method to compute $EU^2$. Put
\[
\Delta_3 := E(U + 1)^3 - EU^3 = \frac{(n + 1)^3}{n} - \frac{1^3}{n} = n^2 + 3n + 3;
\]
and notice also that
\[
\Delta_3 = E[3U^2 + 3U + 1] = 3EU^2 + 3EU + 1 = 3EU^2 + 3\,\frac{n + 1}{2} + 1.
\]
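The identities $EX = np$ and $\mathrm{Var}(X) = np(1-p)$ can be checked directly against the Binomial pmf. Below is a quick sanity check (not part of the notes) using exact rational arithmetic; the parameters $n = 10$, $p = 3/10$ are an arbitrary illustrative choice.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Exact Binomial(n, p) pmf as a dict {k: P(X = k)}."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

n, p = 10, Fraction(3, 10)                  # illustrative parameters
pmf = binomial_pmf(n, p)

total = sum(pmf.values())                   # pmf sums to 1
mean = sum(k * q for k, q in pmf.items())   # should equal n*p
var = sum(k**2 * q for k, q in pmf.items()) - mean**2   # should equal n*p*(1-p)
```

Because `Fraction` avoids rounding, the checks are exact equalities rather than approximate comparisons.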
Consequently, we have
\[
EU^2 = \frac{1}{3}\left(n^2 + \frac{3n}{2} + \frac{1}{2}\right) = \frac{2n^2 + 3n + 1}{6}.
\]
Here, we obtain the identity
\[
\sum_{i=1}^n i^2 = \frac{n(2n^2 + 3n + 1)}{6} = \frac{n(n + 1)(2n + 1)}{6}.
\]
Putting together $EU$ and $EU^2$ gives
\[
\mathrm{Var}(U) = EU^2 - [EU]^2 = \frac{2n^2 + 3n + 1}{6} - \frac{(n + 1)^2}{4} = \frac{n^2 - 1}{12}.
\]
Now, suppose $W$ is uniformly distributed over $\{a, a + 1, \ldots, b\}$, for $a < b$. To find $EW$ and $\mathrm{Var}(W)$, we can use what we know about $EU$ and $\mathrm{Var}(U)$ for $U \sim \mathrm{Uniform}(n)$. In particular, if $W$ is uniform on $\{a, a + 1, \ldots, b\}$, then $W$ can be expressed as $W = U + a - 1$, for $U \sim \mathrm{Uniform}(b - a + 1)$. Therefore,
\[
EW = E[U + a - 1] = EU + a - 1 = \frac{b - a + 2}{2} + a - 1 = \frac{b + a}{2}
\]
and
\[
\mathrm{Var}(W) = \mathrm{Var}(U + a - 1) = \mathrm{Var}(U) = \frac{(b - a + 1)^2 - 1}{12}.
\]

10.3. Hypergeometric distribution (N, m, n). We have previously encountered the Hypergeometric distribution when we discussed probabilities for various events related to lottery numbers. Suppose an urn contains $N$ balls, $m \leq N$ of which are white and $N - m$ of which are black. We draw $n \leq N$ balls without replacement and let $X$ be the number of white balls drawn. The probability mass function of $X$ is
\[
p_X(k) := \begin{cases} \binom{m}{k}\binom{N - m}{n - k} \Big/ \binom{N}{n}, & \max(0, n - N + m) \leq k \leq \min(n, m), \\ 0, & \text{otherwise.} \end{cases}
\]
For $i = 1, \ldots, n$, let $X_i$ be the indicator of the $i$th draw, where
\[
X_i := \begin{cases} 1, & i\text{th draw is a white ball,} \\ 0, & \text{otherwise.} \end{cases}
\]
Then $X$ can be expressed as the sum $X = X_1 + \cdots + X_n$, and the $X_i$'s are exchangeable, but not independent. In this case, we have
\[
P\{X_i = 1\} = \frac{m}{N}, \quad i = 1, \ldots, n, \quad \text{and} \quad P\{X_i = X_j = 1\} = \frac{m(m - 1)}{N(N - 1)}, \quad 1 \leq i \neq j \leq n.
\]
Clearly, $EX = nEX_1 = nm/N$. To compute $\mathrm{Var}(X)$, we note that
\[
\mathrm{Var}(X_i) = \frac{m}{N}\frac{N - m}{N}, \qquad EX_iX_j = \frac{m(m - 1)}{N(N - 1)} \quad \text{for } i \neq j,
\]
and
\[
\mathrm{Cov}(X_i, X_j) = EX_iX_j - EX_iEX_j = -\frac{1}{N - 1}\,\frac{m}{N}\frac{N - m}{N} < 0.
\]
Thus,
\[
\mathrm{Var}(X) = \sum_i \mathrm{Var}(X_i) + 2\sum_{i<j} \mathrm{Cov}(X_i, X_j) = n\frac{m}{N}\frac{N - m}{N} - 2\binom{n}{2}\frac{1}{N - 1}\frac{m}{N}\frac{N - m}{N} = \frac{N - n}{N - 1}\, n\frac{m}{N}\frac{N - m}{N}.
\]
Writing $C = (N - n)/(N - 1)$, we obtain $\mathrm{Var}(X) = C\,\mathrm{Var}(X_0)$, where $X_0$ has the Binomial distribution with parameter $(n, m/N)$.
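The identity $\mathrm{Var}(X) = C\,\mathrm{Var}(X_0)$ can be verified exactly for a concrete urn. The sketch below is not from the notes; the parameters $N = 20$, $m = 8$, $n = 5$ are an arbitrary choice.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, m, n):
    """Exact Hypergeometric(N, m, n) pmf over its support."""
    lo, hi = max(0, n - N + m), min(n, m)
    return {k: Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))
            for k in range(lo, hi + 1)}

N, m, n = 20, 8, 5                  # illustrative urn: 20 balls, 8 white, draw 5
pmf = hypergeom_pmf(N, m, n)
mean = sum(k * q for k, q in pmf.items())
var = sum(k**2 * q for k, q in pmf.items()) - mean**2

p = Fraction(m, N)
C = Fraction(N - n, N - 1)          # finite population correction factor
binom_var = n * p * (1 - p)         # variance of Binomial(n, m/N)
```

With exact rationals, the finite-population-correction identity holds as an equality, not merely up to floating-point error.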
We interpret $C$ as the finite population correction factor for drawing without replacement from a finite urn in which a proportion $m/N$ of the balls is white. Note that $C \to 1$ as $N \to \infty$, and so the Binomial distribution has the interpretation of drawing from an urn with infinitely many balls, a fraction $p$ of which are white.

10.4. Geometric distribution (p). Let $W$ be the waiting time for the first head when tossing a coin with success probability $p \in (0, 1]$. So $W = n$ when the first head appears on the $n$th toss. Here, we can compute both the probability mass function and the cumulative distribution function in closed form:
\[
p_W(n) := p(1 - p)^{n-1}, \quad n = 1, 2, \ldots, \qquad \text{and} \qquad P\{W \leq n\} = \sum_{j=1}^n p_W(j) = \sum_{j=1}^n p(1 - p)^{j-1} = 1 - (1 - p)^n.
\]
Writing $q := 1 - p$, we compute the expectation of $W$ by
\[
EW = \sum_{n=1}^\infty np(1 - p)^{n-1} = p\sum_{n=1}^\infty nq^{n-1} = p\frac{d}{dq}\sum_{n=0}^\infty q^n = p\frac{d}{dq}\frac{1}{1 - q} = \frac{p}{(1 - q)^2} = p/p^2 = 1/p.
\]
We can also compute conditional distributions for $W$, which reveals an interesting and unique property of the Geometric distribution. Let $n > k$; then
\[
P\{W = n \mid W > k\} = \frac{P\{\{W = n\} \cap \{W > k\}\}}{P\{W > k\}} = \frac{pq^{n-1}}{q^k} = pq^{n-k-1}.
\]
Consequently, the conditional distribution of $W - k$, given $W > k$, is Geometric($p$). This property is known as the memoryless property. The Geometric distribution is the unique distribution on the positive integers with the memoryless property.

Alternatively, we can compute $EW$ by conditioning on the first flip. In this case,
\[
EW = 1 \times P\{W = 1\} + E(W \mid W > 1)P\{W > 1\} = P\{W = 1\} + E((W - 1) + 1 \mid W > 1)P\{W > 1\} = P\{W = 1\} + E(W^* + 1)P\{W > 1\},
\]
where $W^*$ is $W - 1$ given $W > 1$. By the memoryless property, $EW^* = EW$ and we have
\[
EW = p + (1 - p)(1 + EW),
\]
which solves to give $EW = 1/p$ again.

To compute the variance, it is easiest to compute the second factorial moment of $W$:
\[
EW(W - 1) = \sum_{n=1}^\infty n(n - 1)pq^{n-1} = \sum_{n=2}^\infty n(n - 1)pq^{n-1} = pq\sum_{n=2}^\infty n(n - 1)q^{n-2} = pq\frac{d^2}{dq^2}\sum_{n \geq 0} q^n = pq\frac{d^2}{dq^2}\frac{1}{1 - q} = \frac{2pq}{(1 - q)^3} = 2q/p^2.
\]
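The memoryless property $P\{W = n \mid W > k\} = P\{W = n - k\}$ and the first-step conditioning identity for $EW$ can both be checked exactly; the sketch below is not from the notes, and $p = 1/4$, $k = 3$ are arbitrary illustrative choices.

```python
from fractions import Fraction

p = Fraction(1, 4)                  # illustrative success probability
q = 1 - p

def pmf(n):                         # P{W = n} = p * q^(n-1), n = 1, 2, ...
    return p * q**(n - 1)

def tail(k):                        # P{W > k} = q^k
    return q**k

# Memoryless: conditioning on {W > k} just shifts the distribution by k
k = 3
memoryless_ok = all(pmf(n) / tail(k) == pmf(n - k) for n in range(k + 1, 40))

# First-step conditioning: EW = p + (1 - p)(1 + EW) has solution EW = 1/p
EW = 1 / p
identity_ok = (EW == p + q * (1 + EW))
```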
Now, $2q/p^2 = EW(W - 1) = EW^2 - EW = EW^2 - 1/p$, and we have $EW^2 = 2q/p^2 + p/p^2$ and
\[
\mathrm{Var}(W) = \frac{2q + p}{p^2} - \frac{1}{p^2} = q/p^2.
\]
Alternatively, we could define a Geometric random variable $V$ to be the number of tails before the first head. So, in our notation, we have $V = W - 1$ and
\[
p_V(v) = p_W(v + 1) = pq^v, \quad v = 0, 1, \ldots, \qquad EV = EW - 1 = 1/p - 1 = q/p, \qquad \mathrm{Var}(V) = \mathrm{Var}(W - 1) = q/p^2.
\]

10.5. Negative Binomial distribution (p, r). Consider tossing a $p$-coin ($0 < p < 1$) repeatedly and consider $W_r$, the number of tosses until the $r$th tail. The probability mass function of $W_r$, $r \geq 1$, is
\[
p_{W_r}(k) := \begin{cases} \binom{k - 1}{r - 1} p^{k - r}(1 - p)^r, & k = r, r + 1, r + 2, \ldots, \\ 0, & \text{otherwise.} \end{cases}
\]
Alternatively, for $V_r = W_r - r$, the number of heads before the $r$th tail, we have
\[
p_{V_r}(k) = p_{W_r}(k + r) = \binom{k + r - 1}{r - 1} p^k (1 - p)^r.
\]
This latter specification motivates the name Negative Binomial:
\[
\binom{k + r - 1}{r - 1} = \binom{k + r - 1}{k} = \frac{(r + k - 1)(r + k - 2)\cdots(r + 1)r}{k!} = (-1)^k\,\frac{(-r)(-r - 1)\cdots(-r - k + 1)}{k!} =: (-1)^k \binom{-r}{k}.
\]
Therefore, we can write
\[
p_{V_r}(k) = \binom{-r}{k}(-p)^k(1 - p)^r.
\]
Alternatively, we can write $W_r = X_1 + \cdots + X_r$, where $X_1, \ldots, X_r$ are i.i.d. Geometric($1 - p$): each $X_i$ is the waiting time from one tail to the next, and tails occur with probability $1 - p$. Thus,
\[
EW_r = E(X_1 + \cdots + X_r) = \frac{r}{1 - p} \quad \text{and} \quad \mathrm{Var}(W_r) = \frac{rp}{(1 - p)^2}.
\]
We could, however, compute these quantities without noticing the representation of $W_r$ as a sum of $r$ independent Geometric random variables. In this case, we have
\[
EW_r = \sum_{k=r}^\infty k\binom{k - 1}{r - 1}p^{k - r}(1 - p)^r = \sum_{k=r}^\infty \frac{k!}{(r - 1)!(k - r)!}\,p^{k - r}(1 - p)^r = \frac{r}{1 - p}\underbrace{\sum_{k=r}^\infty \binom{k}{r}p^{k - r}(1 - p)^{r+1}}_{\text{pmf of } W_{r+1}\text{, summed over its support}} = \frac{r}{1 - p}.
\]
The Negative Binomial distribution arises in the common probabilistic theme of coupon collecting.

Example 10.1 (Coupon collecting). Consider rolling a fair 6-sided die repeatedly. How many rolls are needed before all 6 numbers occur? Let $N$ be the number of rolls required to see all 6 numbers. Clearly, $P\{N > 0\} = 1$. Now, let
\[
F_i^n := \{i \text{ does not occur in the first } n \text{ rolls}\}.
\]
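The coefficient identity $\binom{k+r-1}{k} = (-1)^k\binom{-r}{k}$ and the mean $EW_r = r/(1-p)$ can be checked numerically. The helper `gen_binom` and the parameters $p = 0.3$, $r = 4$ below are illustrative choices, not from the notes.

```python
from fractions import Fraction
from math import comb, factorial

def gen_binom(a, k):
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k!; a may be negative."""
    num = 1
    for j in range(k):
        num *= a - j
    return Fraction(num, factorial(k))

# Identity behind the name: C(k+r-1, k) = (-1)^k * C(-r, k)
r = 4
identity_ok = all(comb(k + r - 1, k) == (-1)**k * gen_binom(-r, k)
                  for k in range(20))

# pmf of V_r = W_r - r (heads before the r-th tail), heads probability p
p = 0.3
def pVr(k):
    return comb(k + r - 1, k) * p**k * (1 - p)**r

total = sum(pVr(k) for k in range(400))              # should be ~1
mean_Wr = sum((k + r) * pVr(k) for k in range(400))  # should be ~ r/(1-p)
```

The series is truncated at 400 terms; since the summands decay geometrically in $p^k$, the truncation error is negligible here.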
Then, by inclusion-exclusion,
\[
P\{N > n\} = P\left(\bigcup_{i=1}^6 F_i^n\right) = S_1 - S_2 + S_3 - S_4 + S_5 - S_6,
\]
where
\[
S_k := \sum_{1 \leq i_1 < \cdots < i_k \leq 6} P\{F_{i_1}^n \cap \cdots \cap F_{i_k}^n\}, \quad k = 1, \ldots, 6.
\]
We have $P\{F_i^n\} = (5/6)^n$, $P\{F_i^n \cap F_j^n\} = (4/6)^n$ for $i \neq j$, $P\{F_i^n \cap F_j^n \cap F_k^n\} = (3/6)^n$ for distinct $i, j, k$, and so on. Therefore,
\[
P\{N > n\} = \binom{6}{1}\left(\frac{5}{6}\right)^n - \binom{6}{2}\left(\frac{4}{6}\right)^n + \binom{6}{3}\left(\frac{3}{6}\right)^n - \cdots,
\]
and $P\{N = n\} = P\{N > n - 1\} - P\{N > n\}$.

Alternatively, we can write $N = 1 + X_1 + \cdots + X_5$, where $X_i$ is the number of additional rolls needed to produce the $(i + 1)$st new number. In this case, $X_1 \sim \mathrm{Geometric}(5/6)$, $X_2 \sim \mathrm{Geometric}(4/6)$, \ldots, $X_5 \sim \mathrm{Geometric}(1/6)$, and the $X_i$'s are all independent. Therefore, we have
\[
EN = \sum_{i=1}^6 6/i = 147/10 \quad \text{and} \quad \mathrm{Var}(N) = \sum_{i=1}^5 \frac{1 - i/6}{(i/6)^2} = \sum_{1 \leq i \leq 5} \frac{6(6 - i)}{i^2} = 3899/100.
\]

Example 10.2 (Length of a game of craps). The approach of the previous example can be used to study the length of a game of craps. Let $R$ be the total number of rolls and let $A \geq 0$ be the number of additional rolls (if any) made after the first roll. Then $R = A + 1$, $ER = EA + 1$ and $\mathrm{SD}(R) = \mathrm{SD}(A)$. For $j = 2, \ldots, 12$, let $T_j := \{\text{first roll is } j\}$ and let
\[
p_j := P\{T_j\} = \frac{6 - |7 - j|}{36}, \quad j = 2, \ldots, 12.
\]
For $n = 0$, $P\{A = 0\} = p_2 + p_3 + p_7 + p_{11} + p_{12} = 1/3$. For $n = 1, 2, \ldots$, we condition on the result of the first roll:
\[
P\{A = n\} = \sum_{j = 4,5,6,8,9,10} P\{T_j\}P\{A = n \mid T_j\}.
\]
If the point is $j$, then the number of additional rolls follows the Geometric distribution with success probability $p_j + p_7$. Writing $\theta_j := p_j + p_7$, we have
\[
P\{A = n\} = \sum_{j = 4,5,6,8,9,10} p_j(1 - \theta_j)^{n-1}\theta_j.
\]
To find $EA$, we can use the conditioning rule for expectations:
\[
EA = \sum_{j = 4,5,6,8,9,10} P\{T_j\}E[A \mid T_j] = 2\sum_{j = 4,5,6} p_j/\theta_j = 2\sum_{j = 3,4,5} \frac{j}{6 + j} = 392/165.
\]

10.6. Factorial moments. For a random variable $X$, the quantity $\mu_k := EX^k$ is called the $k$th moment of $X$ and
\[
\nu_k := EX^{\downarrow k} = E[X(X - 1)\cdots(X - k + 1)]
\]
is called the $k$th factorial moment of $X$.
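The inclusion-exclusion formula for $P\{N > n\}$ can be cross-checked against $EN = 147/10$ and $\mathrm{Var}(N) = 3899/100$ using the tail-sum identities $EN = \sum_{n \geq 0} P\{N > n\}$ and $EN^2 = \sum_{n \geq 0} (2n + 1)P\{N > n\}$, which hold for any nonnegative integer-valued random variable (standard facts, not derived in the notes). A sketch:

```python
from math import comb

def p_gt(n, s=6):
    """P{N > n} for collecting all faces of an s-sided fair die, by
    inclusion-exclusion over the events {face i missing in first n rolls}."""
    return sum((-1)**(k - 1) * comb(s, k) * ((s - k) / s)**n
               for k in range(1, s + 1))

# Tail-sum identities, truncated; the tails decay geometrically,
# so 2000 terms is far more than enough for 6 faces.
EN = sum(p_gt(n) for n in range(2000))
EN2 = sum((2 * n + 1) * p_gt(n) for n in range(2000))
VarN = EN2 - EN**2
```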
Sometimes, computing factorial moments makes calculation of ordinary moments easier. For example, $EX = \mu_1 = \nu_1$ and $\mathrm{Var}(X) = \mu_2 - \mu_1^2 = \nu_2 + \nu_1 - \nu_1^2$.

Let $A_1, \ldots, A_n$ be events and let $X := \sum_j I_{A_j}$ be the number of events that occur. We know $\nu_1 = EX = \sum_i P(A_i) = S_1$. The number of pairs of events to occur is
\[
\binom{X}{2} = \frac{X(X - 1)}{2} = \sum_{1 \leq i < j \leq n} I_{A_i \cap A_j}.
\]
Therefore, $\nu_2 = E[X(X - 1)] = 2!\,S_2$, where $S_2 := \sum_{i < j} P\{A_i \cap A_j\}$. In general, with $S_k := \sum_{1 \leq i_1 < \cdots < i_k \leq n} P\{A_{i_1} \cap \cdots \cap A_{i_k}\}$, we have $\nu_k = k!\,S_k$.

10.7. Poisson distribution (µ).

Example 10.3 (Modeling volume). How do we model the volume (number of shares traded) of a popular stock on an ordinary day? To begin, we assume the number of people $n$ in the market is very large and, for now, we assume that each person can purchase at most one share of stock and does so independently of everyone else with very small probability $p > 0$. If $X$ denotes the number of shares bought, then $X \sim \mathrm{Binomial}(n, p)$. We already know that $EX = np$ and, if $np$ is of moderate size, we would like to know what $P\{X = k\}$ is for moderate values of $k$.

If $p$ is small and $np$ is moderate, then $\log(1 - p)^n = n\log(1 - p) \approx -np = -\mu$; hence, $(1 - p)^n \approx e^{-\mu}$. Thus, for moderate values of $k$, we have
\[
P\{X = k\} = \binom{n}{k}p^k(1 - p)^{n - k} = \frac{1}{k!}\frac{n^{\downarrow k}}{n^k}(np)^k(1 - p)^n(1 - p)^{-k} \approx \frac{1}{k!} \times 1 \times \mu^k \times e^{-\mu} \times 1 = \mu^k e^{-\mu}/k!.
\]
In fact, we see that this informal approximation gives rise to another distribution, known as the Poisson distribution. More formally, if we take $n \to \infty$, $p_n \to 0$, and $np_n \to \mu \in [0, \infty)$ in
\[
\binom{n}{k}p_n^k(1 - p_n)^{n - k},
\]
we obtain the limiting distribution
\[
\binom{n}{k}p_n^k(1 - p_n)^{n - k} \to \mu^k e^{-\mu}/k!.
\]
A random variable $X$ for which $p_X(k) = \mu^k e^{-\mu}/k!$, $k = 0, 1, 2, \ldots$, has the Poisson distribution with parameter $\mu$. The Poisson distribution is often used to model rare events. For $X \sim \mathrm{Poisson}(\mu)$, we have
\[
EX = \sum_{k=0}^\infty k\mu^k e^{-\mu}/k! = \sum_{k=1}^\infty \mu^k e^{-\mu}/(k - 1)! = \mu\underbrace{\sum_{k=1}^\infty \mu^{k-1}e^{-\mu}/(k - 1)!}_{\text{Poisson}(\mu)\text{ pmf, summed over its support}} = \mu.
\]
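The identity $\nu_k = k!\,S_k$ can be verified by brute-force enumeration for a small collection of events. The example below (not from the notes) uses four independent biased coin flips with arbitrary illustrative head probabilities; the identity itself requires no independence.

```python
from fractions import Fraction
from itertools import product, combinations
from math import factorial

# Events A_i = {flip i is heads}, independent, with these head probabilities
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]
n = len(probs)

def weight(w):
    """Probability of one outcome w in {0,1}^n."""
    out = Fraction(1)
    for x, p in zip(w, probs):
        out *= p if x else 1 - p
    return out

def falling(x, k):
    """Falling factorial x(x-1)...(x-k+1)."""
    out = 1
    for j in range(k):
        out *= x - j
    return out

outcomes = list(product((0, 1), repeat=n))
# nu[k] = E[X(X-1)...(X-k+1)] with X = number of events occurring
nu = {k: sum(weight(w) * falling(sum(w), k) for w in outcomes)
      for k in range(1, n + 1)}
# S[k] = sum over k-subsets of P{all k events occur}
S = {k: sum(sum(weight(w) for w in outcomes if all(w[i] for i in idx))
            for idx in combinations(range(n), k))
     for k in range(1, n + 1)}
ok = all(nu[k] == factorial(k) * S[k] for k in range(1, n + 1))
```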
The mode (most likely value) of $X \sim \mathrm{Poisson}(\mu)$ is $\lfloor\mu\rfloor$, the greatest integer not exceeding $\mu$, if $\mu$ is not an integer, and both $\mu$ and $\mu - 1$ if $\mu \in \mathbb{Z}$. To see this, we compare the ratio of successive probabilities:
\[
\frac{p_X(k)}{p_X(k - 1)} = \frac{\mu^k e^{-\mu}/k!}{\mu^{k-1}e^{-\mu}/(k - 1)!} = \frac{\mu}{k},
\]
which is at least 1 if and only if $k \leq \mu$. Also, for $j = 1, 2, \ldots$, $P\{X \geq j\} \leq \mu^j/j!$:
\[
P\{X \geq j\} = \frac{\mu^j e^{-\mu}}{j!}\left(1 + \frac{\mu}{j + 1} + \frac{\mu^2}{(j + 1)(j + 2)} + \cdots\right) \leq \frac{\mu^j e^{-\mu}}{j!}\left(1 + \frac{\mu}{1!} + \frac{\mu^2}{2!} + \cdots\right) = \frac{\mu^j e^{-\mu}}{j!}\,e^\mu = \frac{\mu^j}{j!}.
\]

Theorem 10.4 (Law of rare events). Suppose $Y \sim \mathrm{Binomial}(n, p)$ and $X \sim \mathrm{Poisson}(np)$; then $Y \approx_D X$ if $n$ is large, $p$ is small, and $np$ is moderate.

In practice, Theorem 10.4 applies when $np \approx 5$, or so.

Theorem 10.5 (A general Poisson approximation theorem). For each $n$, suppose $A_{1,n}, \ldots, A_{n,n}$ are (not necessarily independent, not necessarily equi-probable) events. Let $N_n := \sum_{j=1}^n I_{A_{j,n}}$ be the number of events to occur. If $n \to \infty$ and the $A_{i,n}$'s vary with $n$ in such a way that
\[
S_{k,n} := \sum_{1 \leq i_1 < \cdots < i_k \leq n} P\left(\bigcap_{j=1}^k A_{i_j,n}\right) \to \lambda^k/k! \quad \text{for each } k,
\]
then
\[
(18) \qquad P\{N_n = j\} \to \lambda^j e^{-\lambda}/j!, \quad j = 0, 1, \ldots.
\]
Intuitively, if you have a large number $n$ of things, each with a small probability, and they are approximately independent, then $N_n$ is approximately Poisson.

Example 10.6 (Hat matching). Suppose there is a group of $n$ people, each with their own hat. Everyone throws their hat into a pile and then the group, one at a time, chooses a hat randomly from the pile. Let $A_{i,n} := \{i\text{th person picks his own hat}\}$ and $N_n := \sum_{j=1}^n I_{A_{j,n}}$, the number of people who pick their own hat in a group of $n$. Then (18) holds if
• for each $n$, the $A_{i,n}$'s are exchangeable,
• as $n \to \infty$, $P\{A_{i,n}\} \to 0$ in such a way that $EN_n = nP\{A_{i,n}\} = \lambda$, and
• the $A_{i,n}$'s are asymptotically independent in the sense that
\[
\frac{P\{A_{1,n} \cap \cdots \cap A_{k,n}\}}{\prod_{j=1}^k P\{A_{j,n}\}} \to 1
\]
as $n \to \infty$, for all $1 \leq k \leq n$.
In this case,
\[
S_{k,n} = \binom{n}{k}P\{A_{1,n} \cap \cdots \cap A_{k,n}\} = \frac{1}{k!}\frac{n^{\downarrow k}}{n^k}\,(nP\{A_{1,n}\})^k \times \frac{P\{A_{1,n} \cap \cdots \cap A_{k,n}\}}{\prod_{j=1}^k P\{A_{j,n}\}} \to \lambda^k/k!.
\]
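The mode formula, the tail bound $P\{X \geq j\} \leq \mu^j/j!$, and the law of rare events can all be checked numerically; the choices $\mu = 3.7$ and the Binomial sizes below are illustrative, not from the notes.

```python
from math import comb, exp, factorial

def pois(mu, k):
    """Poisson(mu) pmf."""
    return mu**k * exp(-mu) / factorial(k)

mu = 3.7
pmf = [pois(mu, k) for k in range(60)]
mode = max(range(60), key=lambda k: pmf[k])      # should be floor(mu) = 3

# Tail bound: P{X >= j} <= mu^j / j!   (tiny float tolerance added)
tail_ok = all(1 - sum(pmf[:j]) <= mu**j / factorial(j) + 1e-12
              for j in range(1, 20))

# Law of rare events: Binomial(n, mu/n) pmf approaches the Poisson(mu) pmf
def max_gap(n):
    p = mu / n
    return max(abs(comb(n, k) * p**k * (1 - p)**(n - k) - pois(mu, k))
               for k in range(30))
```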
For the hat matching problem, we have
• $A_{1,n}, \ldots, A_{n,n}$ are exchangeable,
• $nP\{A_{1,n}\} = n \times 1/n = 1 = \lambda$, and
• $P\{A_{1,n} \cap \cdots \cap A_{k,n}\}/P\{A_{1,n}\}^k = n^k/n^{\downarrow k} \to 1$.
Hence, $N_n \approx \mathrm{Poisson}(1)$.
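The conclusion $N_n \approx \mathrm{Poisson}(1)$ can be checked by exhaustively enumerating permutations for a modest $n$; $n = 8$ below is an arbitrary choice that keeps the enumeration fast while already showing good agreement.

```python
from math import exp, factorial
from itertools import permutations

def match_dist(n):
    """Exact distribution of the number of fixed points (own-hat matches)
    of a uniformly random permutation of n items."""
    counts = [0] * (n + 1)
    for pi in permutations(range(n)):
        counts[sum(pi[i] == i for i in range(n))] += 1
    return [c / factorial(n) for c in counts]

dist = match_dist(8)
poisson1 = [exp(-1) / factorial(j) for j in range(9)]
max_gap = max(abs(a - b) for a, b in zip(dist, poisson1))
mean_matches = sum(j * pj for j, pj in enumerate(dist))  # E N_n = 1 exactly
```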