Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
OpenStax-CNX module: m13124 1 GEOMETRIC DISTRIBUTION ∗ Ewa Paszek This work is produced by OpenStax-CNX and licensed under the † Creative Commons Attribution License 2.0 Abstract This course is a short series of lectures on Introductory Statistics. Topics covered are listed in the Table of Contents. The notes were prepared by Ewa Paszek and Marek Kimmel. The development of this course has been supported by NSF 0203396 grant. 1 GEOMETRIC DISTRIBUTION To obtain a binomial random variable, we observed a sequence of n Bernoulli trials and counted the number of successes. Suppose now that we do not x the number of Bernoulli trials in advance but instead continue to observe the sequence of Bernoulli trials until a certain number r, of successes occurs. The random variable of interest is the number of trials needed to observe the rth success. 1.1 Let rst discuss the problem when r =1. That is, consider a sequence of Bernoulli trials with probability p of success. This sequence is observed until the rst success occurs. Let X denot the trial number on which the rst success occurs. For example, if F and S represent failure and success, respectively, and the sequence starts with F,F,F,S,. . ., then X =4. Moreover, because the trials are independent, the probability of such sequence is 3 P (X = 4) = (q) (q) (q) (p) = q 3 p = (1 − p) p. In general, the p.d.f. f (x) = P (X = x) , of X is given by f (x) = (1 − p)x−1 p,x = 1, 2, ..., because there must be x -1 failures before the rst success that occurs on trail x. We say that X has a geometric distribution. note: for a geometric series, the sum is given by ∞ X k=0 when |r| < 1. ∗ Version 1.3: Oct 8, 2007 3:25 pm -0500 † http://creativecommons.org/licenses/by/2.0/ http://cnx.org/content/m13124/1.3/ ark = ∞ X k=1 ark−1 = a , 1−r OpenStax-CNX module: m13124 2 Thus, ∞ X f (x) = x=1 ∞ X k−1 (1 − p) p= x=1 p = 1, 1 − (1 − p) so that f (x) does satisfy the properties of a p.d.f.. From the sum of geometric series we also note that, when k is an integer, ∞ X P (X > k) = k x−1 (1 − p) p= x=k+1 (1 − p) p k = (1 − p) = q k , 1 − (1 − p) and thus the value of the distribution function at a positive integer P (X ≤ k) = ∞ X x−1 (1 − p) k is k p = 1 − P (X > k) = 1 − (1 − p) = 1 − q k . x=k+1 Example 1 Some biology students were checking the eye color for a large number of fruit ies. For the individual y, suppose that the probability of white eyes is 14 and the probability of red eyes is 34 , and that we may treat these ies as independent Bernoulli trials. The probability that at least four ies have to be checked for eye color to observe a white-eyed y is given by P (X ≥ 4) = P (X > 3) = q 3 = 3 3 = 0.422. 4 The probability that at most four ies have to be checked for eye color to observe a white-eyed y is given by P (X ≤ 4) = 1 − q 4 = 1 − 4 3 = 0.684. 4 The probability that the rst y with white eyes is the fourth y that is checked is P (X = 4) = q 4−1 3 3 1 p= = 0.105. 4 4 It is also true that " 3 # 3 4 # " 3 3 1 3 − 1− = . P (X = 4) = P (X ≤ 4) − P (X ≤ 3) = 1 − 4 4 4 4 In general, f (x) = P (X = x) = http://cnx.org/content/m13124/1.3/ x−1 3 1 , x = 1, 2, 3, ... 4 4 OpenStax-CNX module: m13124 3 1.2 To nd a mean and variance for the geometric distribution, let use the following results about the sum and the rst and second derivatives of a geometric series. For −1 < r < 1 , let g (r) = ∞ X a . 1−r ark = k=0 Then g ' (r) = ∞ X akrk−1 = k=1 and g '' (r) = ∞ X a (1 − r) ak (k − 1) rk−2 = k=2 If X 2, 2a 3. (1 − r) has a geometric distribution and 0 < p < 1 , then the mean of E (X) = ∞ X xq x−1 p = x=1 p 2 (1 − q) = X is given by 1 , p (1) using the formula for g ' (x) with a = p and r = q . note: for example, that if p =1/4 is the probability of success, then E (X) = 1/ (1/4) = 4 trials are needed on the average to observe a success. 1.3 To nd the variance of , let rst nd the second factorial moment E [X (X − 1)]. We have X E [X (X − 1)] = ∞ X x (x − 1) q x−1 p = ∞ X pqx (x − 1) q x−2 = x=1 x=1 Using formula for g '' (x) with a = pq and r = q . Thus the variance of X 2pq 3 (1 − q) = 2q . p2 is 2 2 V ar (X) = E X 2 − [E (X)] = {E [X (X − 1)] + E (X)} − [E (X)] = = 2q p2 + 1 p − The standard deviation of 1 p2 = X is 2q+p−1 p2 = 1−p p2 . σ= p (1 − p) /p2 . Example 2 Continuing with example 1 (Example 1), with p =1/4, we obtain 1 1/4 = 4, 3/4 σ 2 = (1/4) 2 = µ= and σ= http://cnx.org/content/m13124/1.3/ √ 12, 12 = 3.464. OpenStax-CNX module: m13124 note: Binomial Distribution1 note: Poisson Distribution2 1 "BERNOULLI TRIALS and the BINOMIAL DISTRIBUTION", see section at <http://cnx.org/content/m13123/latest/#sec_3> 2 "POISSON DISTRIBUTION": Section POISSON DISTRIBUTION <http://cnx.org/content/m13125/latest/#sec_1> http://cnx.org/content/m13124/1.3/ 4