Notes on the Poisson Distribution

1. Essentially, the Poisson distribution arises when the random variable in question is a count of points or “specks” in a CONTINUOUS MEDIUM.
2. Consider random points (we often refer to them as “events” and think of them as “specks”) which occur in space or time.
3. Suppose that these “events” occur in such a way that:
(i) The counts of points in non-overlapping regions are INDEPENDENT random variables.
(ii) If $R$ is a region having unit “measure”, then the expected count of points in $R$ is equal to a constant $\alpha$.
4. Remark: By “measure” we mean length, or area, or volume, or hypervolume, depending on the dimension of the space.
5. “Unit” measure means that this measure is equal to 1.
6. Now let $X$ be the count of points or events in a region $R$, where the measure (length, area, volume, etc.) of $R$ is equal to $t$.
7. Let $\lambda = \alpha t$.
8. The random variable $X$ has a Poisson distribution with parameter $\lambda$.
9. The probability function for $X$ is $P(X = x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$ for $x = 0, 1, 2, \ldots$
10. The mean of $X$ is $E(X) = \lambda$.
11. The variance of $X$ is $V(X) = \lambda = E(X)$ !!!
12. Hence the standard deviation of $X$ is equal to $\sqrt{\lambda}$.
13. The Poisson distribution can also arise as the “limit” of binomial distributions.
14. The rough idea is that if you are counting the number of (rare) successes in a LARGE sample, then this is like counting “specks” in a continuous medium.
15. More explicitly, if $X \sim \mathrm{Bin}(n, p)$ where $n$ is “large” and $p$ is “small”, then it is APPROXIMATELY TRUE that $X \sim \mathrm{Poisson}(\lambda)$ where $\lambda = np$.
16. That is, it is exactly true that if $X \sim \mathrm{Bin}(n, p)$ then $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$.
17. It is also approximately true that if $n$ is large and $p$ is small, then $P(X = x) \approx \dfrac{e^{-\lambda}\lambda^x}{x!}$ where $\lambda = np$.
18. The latter can be much easier to calculate than the former, and can be less subject to numerical error.
19. The approximation is often good enough for practical purposes.
20. How large is “large” and how small is “small”?
There is of course no definite answer, but a good rule of thumb is that one should have $n \ge 100$, $p \le 0.01$, and $np \le 20$.
21. Example of a Poisson distribution: Chartreuse Fig trees occur fairly sparsely in a forest. The average number of Chartreuse Fig trees per hectare is 80. A silviculture researcher picks a random point in the forest, draws a circle of radius 12 m, and counts the number of Chartreuse Fig trees within that circle.
22. Questions:
(a) What is the expected number of Chartreuse Fig trees that the researcher will observe?
(b) What is the variability of this number?
(c) What is the probability that she observes exactly 4 Chartreuse Fig trees?
(d) What is the probability that she observes fewer than 4 Chartreuse Fig trees?
23. To solve this problem, we let $X$ be the number of Chartreuse Fig trees within the circle.
24. Since these trees are “fairly sparse”, we can (maybe!) get away with assuming that they “don’t interact” and hence that the counts of these trees in non-overlapping regions are independent.
25. Hence we can assume that $X$ has a Poisson distribution.
26. The measure of the circle is $t = \pi r^2 = \pi \times 12^2 = 452.3893$ square metres.
27. One hectare is 10,000 square metres, so 80 trees per hectare means $\alpha = 80/10000 = 0.008$ trees per square metre. Hence the Poisson parameter for $X$ is $\lambda = \alpha t = 452.3893 \times 0.008 = 3.6191$.
28. Now the answers to the questions:
(a) The expected number of Chartreuse Fig trees is $E(X) = \lambda = 3.6191$.
(b) The variance of $X$ is $V(X) = \lambda = 3.6191$, and so the standard deviation of $X$ is $\sqrt{3.6191} = 1.9024$ (to 4 decimal places).
(c) The probability of observing exactly 4 Chartreuse Fig trees is $P(X = 4) = \dfrac{e^{-3.6191} \times 3.6191^4}{4!} = 0.1916$ (to 4 decimal places).
(d) The probability of observing fewer than 4 trees is $P(X < 4) = P(X \le 3) = P(X = 0) + \cdots + P(X = 3)$. After some tedious calculation, or some judicious use of a computer, we get that this is equal to $0.0268 + 0.0970 + 0.1756 + 0.2118 = 0.5112$.
29. “Judicious use of a computer” basically means that you should make use of Minitab (or the equivalent).
30. To do so, click on Calc, then on Probability Distributions, then on Poisson.
31. Then make the appropriate choices and fill in the appropriate boxes.
32. Or, more simply, in the session window, type the command “cdf 3;” (NOTE: this says cdf, NOT pdf!!!) followed by the subcommand “pois 3.6191.” (the final period is part of the subcommand).
33. The foregoing gives you the value of $F(3)$, where $F(x)$ is the cumulative distribution function of a Poisson random variable with parameter $\lambda = 3.6191$.
34. That is, it gives $P(X \le 3)$.
35. Suppose you want the individual values of $P(X = 0), \ldots, P(X = 3)$.
36. The easiest thing to do is to put the $x$ values 0, 1, 2, 3 into column 1 (say).
37. Then type into the session window the command “pdf c1 c2;” followed by the subcommand “pois 3.6191.”
38. You can follow this up with the commands “print c2” and “sum c2”.
39. The print command will display the probabilities of 0, 1, 2, 3 in the session window.
40. The sum command will give you the same result as using “cdf 3;”.
41. Example of using the Poisson approximation to the Binomial distribution: In an article in the Los Angeles Times, Dec. 1993, it was reported that 1 in 200 people carry the defective gene that causes inherited colon cancer. In a sample of 1000 individuals, what is the probability that
(a) Between 5 and 8 individuals carry the gene?
(b) At least 8 carry the gene?
42. Solution: Let $X$ be the number of individuals in the sample who carry the defective gene. The distribution of $X$ is $\mathrm{Bin}(1000, 0.005)$. The values of $n$ and $p$ satisfy the rule of thumb for being large and small respectively, so we can say that $X$ is approximately Poisson distributed with parameter $\lambda = np = 5$.
(a) The probability that between 5 and 8 individuals in the sample carry the gene is $P(X = 5) + \cdots + P(X = 8)$. Using the Poisson approximation this is (approximately)
$\dfrac{e^{-5}5^5}{5!} + \dfrac{e^{-5}5^6}{6!} + \cdots + \dfrac{e^{-5}5^8}{8!} = 0.1755 + 0.1462 + 0.1044 + 0.0653 = 0.4914$
(where the calculated values have all been rounded to 4 decimal places).
(b) The probability that at least 8 individuals in the sample carry the gene is $P(X \ge 8) = 1 - P(X < 8) = 1 - P(X \le 7)$. Using the Poisson approximation,
$P(X \le 7) \approx e^{-5} + \dfrac{e^{-5}5^1}{1!} + \dfrac{e^{-5}5^2}{2!} + \cdots + \dfrac{e^{-5}5^7}{7!} = 0.0067 + 0.0337 + 0.0842 + 0.1404 + 0.1755 + 0.1755 + 0.1462 + 0.1044 = 0.8666$
(where again the calculated values have all been rounded to 4 decimal places). Hence the probability that at least 8 individuals in the sample carry the gene is (approximately) $1 - 0.8666 = 0.1334$.
43. Problem: Suppose that grasshoppers are distributed at random and independently in a large field, and that there are on average 2 grasshoppers per square metre. How large should the radius of a circular sampling region be taken so that the probability of finding at least one grasshopper in the region is 0.99? What about so that the probability of finding at least 2 grasshoppers is 0.99? (Here you’ll get an equation that you “can’t solve”, at least not “analytically”. You could use trial-and-error methods to get an approximate solution, or you could use graphical methods with the help of Minitab. Did anybody learn Newton’s method for solving equations in Math 1003?)
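Points 9 to 12 give the probability function, the mean and the variance. As a minimal sketch (Python, standard library only; the names `poisson_pmf` and `poisson_mean_var` are our own, not from the notes), the probability function can be coded directly and the claim $E(X) = V(X) = \lambda$ checked numerically by summing over $x = 0, \ldots, 200$. Cutting the sum off at 200 is an assumption that is harmless for moderate $\lambda$; the probabilities are computed through logarithms so that large $x$ cannot overflow.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lam) * lam^x / x!, computed via logs to avoid overflow (assumes lam > 0)."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def poisson_mean_var(lam: float, x_max: int = 200):
    """Approximate E(X) and V(X) by summing over x = 0, ..., x_max (the tail beyond is ignored)."""
    probs = [poisson_pmf(x, lam) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

mean, var = poisson_mean_var(3.6191)
print(f"mean = {mean:.4f}, variance = {var:.4f}, sd = {math.sqrt(var):.4f}")
# prints: mean = 3.6191, variance = 3.6191, sd = 1.9024
```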
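Points 15 to 20 say that $\mathrm{Bin}(n, p)$ probabilities are close to $\mathrm{Poisson}(np)$ probabilities when $n$ is large and $p$ is small. One way to see how close is to tabulate both side by side. The sketch below borrows $n = 1000$ and $p = 0.005$ from the gene example in point 41; it assumes Python 3.8 or later (for `math.comb`), and the helper names are our own.

```python
import math

def binom_pmf(x, n, p):
    # Exact binomial probability: C(n, x) p^x (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    # Poisson probability: e^(-lam) lam^x / x!
    return math.exp(-lam) * lam**x / math.factorial(x)

n, p = 1000, 0.005
lam = n * p  # lambda = np = 5

for x in range(11):
    b, q = binom_pmf(x, n, p), poisson_pmf(x, lam)
    print(f"x = {x:2d}   binomial = {b:.4f}   Poisson = {q:.4f}   difference = {b - q:+.5f}")

# Largest pointwise gap over x = 0..50 (beyond that both probabilities are negligible)
max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(51))
print(f"largest |difference| = {max_gap:.5f}")
```

Changing `n` and `p` shows how the agreement depends on them.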
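The fig tree calculation in points 26 to 28, parts (a) to (c), can be reproduced in a few lines. This Python sketch works with the unrounded $\lambda$ rather than 3.6191; the variable names are illustrative only.

```python
import math

trees_per_hectare = 80
radius_m = 12.0
t = math.pi * radius_m**2              # measure (area) of the circle, in square metres
alpha = trees_per_hectare / 10_000     # trees per square metre (1 hectare = 10,000 m^2)
lam = alpha * t                        # Poisson parameter lambda = alpha * t

sd = math.sqrt(lam)                                  # standard deviation = sqrt(lambda)
p4 = math.exp(-lam) * lam**4 / math.factorial(4)     # P(X = 4)

print(f"t = {t:.4f} m^2, lambda = {lam:.4f}")
print(f"(a) E(X) = {lam:.4f}")
print(f"(b) V(X) = {lam:.4f}, sd = {sd:.4f}")
print(f"(c) P(X = 4) = {p4:.4f}")
```

Running it should print $t = 452.3893$, $\lambda = 3.6191$, a standard deviation of 1.9024 and $P(X = 4) = 0.1916$, matching point 28 to 4 decimal places.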
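Points 30 to 40 describe the Minitab route to part (d) of the fig tree example. For readers without Minitab, here is a rough Python analogue of the same session. It is not Minitab syntax: the lists `c1` and `c2` merely mimic Minitab's columns, and the helper names are our own. It puts $x = 0, 1, 2, 3$ in `c1`, fills `c2` with the individual probabilities (as “pdf c1 c2;” does), prints them (“print c2”), and checks that their sum (“sum c2”) equals $F(3)$ (what “cdf 3;” returns).

```python
import math

def poisson_pdf(x, lam):
    # P(X = x) for a Poisson(lam) random variable
    return math.exp(-lam) * lam**x / math.factorial(x)

def poisson_cdf(k, lam):
    # F(k) = P(X <= k), summed term by term
    return sum(poisson_pdf(x, lam) for x in range(k + 1))

lam = 3.6191
c1 = [0, 1, 2, 3]                        # the x values, as in column c1
c2 = [poisson_pdf(x, lam) for x in c1]   # individual probabilities, as in column c2

for x, p in zip(c1, c2):                 # analogue of "print c2"
    print(f"P(X = {x}) = {p:.4f}")

total = sum(c2)                          # analogue of "sum c2"
print(f"sum = {total:.4f}, F(3) = {poisson_cdf(3, lam):.4f}")
```

If SciPy is installed, `scipy.stats.poisson.pmf(k, mu)` and `scipy.stats.poisson.cdf(k, mu)` play the roles of pdf and cdf.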
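The gene example in points 41 and 42 can be checked the same way. Since the exact binomial probabilities are also easy for a computer, this sketch prints both, which shows how good the Poisson approximation is here. It assumes Python 3.8 or later (for `math.comb`), and the helper names are our own.

```python
import math

def poisson_pmf(x, lam):
    # Poisson approximation: e^(-lam) lam^x / x!
    return math.exp(-lam) * lam**x / math.factorial(x)

def binom_pmf(x, n, p):
    # Exact binomial probability: C(n, x) p^x (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 1000, 1 / 200
lam = n * p  # 5

# (a) P(5 <= X <= 8)
approx_a = sum(poisson_pmf(x, lam) for x in range(5, 9))
exact_a = sum(binom_pmf(x, n, p) for x in range(5, 9))

# (b) P(X >= 8) = 1 - P(X <= 7)
approx_b = 1 - sum(poisson_pmf(x, lam) for x in range(8))
exact_b = 1 - sum(binom_pmf(x, n, p) for x in range(8))

print(f"(a) Poisson approx = {approx_a:.4f}, exact binomial = {exact_a:.4f}")
print(f"(b) Poisson approx = {approx_b:.4f}, exact binomial = {exact_b:.4f}")
```

The Poisson values should reproduce 0.4914 and 0.1334 from point 42.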
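For point 43, here is one possible approach (a sketch, not the notes' official solution). If $r$ is the radius, the region has area $\pi r^2$, so $\lambda = 2\pi r^2$ and $r = \sqrt{\lambda / (2\pi)}$. For at least one grasshopper, $1 - e^{-\lambda} = 0.99$ gives $\lambda = \ln 100$ directly. For at least two, $1 - e^{-\lambda}(1 + \lambda) = 0.99$ has no closed-form solution, so the Python code below applies Newton's method to $g(\lambda) = e^{-\lambda}(1 + \lambda) - 0.01$, whose derivative is $g'(\lambda) = -\lambda e^{-\lambda}$. The helper names, tolerance and starting value are our own choices.

```python
import math

DENSITY = 2.0   # grasshoppers per square metre (from the problem)
TARGET = 0.99   # required probability

def radius_from_lambda(lam):
    # lambda = DENSITY * pi * r^2, so r = sqrt(lambda / (DENSITY * pi))
    return math.sqrt(lam / (DENSITY * math.pi))

def newton(g, dg, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - g(x)/g'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# At least one: 1 - e^(-lam) = TARGET  =>  lam = -ln(1 - TARGET)
lam1 = -math.log(1 - TARGET)

# At least two: 1 - e^(-lam)(1 + lam) = TARGET, i.e. g(lam) = 0 below
def g(lam):
    return math.exp(-lam) * (1 + lam) - (1 - TARGET)

def dg(lam):
    return -lam * math.exp(-lam)   # derivative of e^(-lam)(1 + lam)

lam2 = newton(g, dg, x0=lam1)      # start from the one-grasshopper answer

print(f"at least 1: lambda = {lam1:.4f}, r = {radius_from_lambda(lam1):.4f} m")
print(f"at least 2: lambda = {lam2:.4f}, r = {radius_from_lambda(lam2):.4f} m")
```

Starting from the one-grasshopper solution works well because $g$ is decreasing and convex for $\lambda > 1$, so the Newton iterates climb steadily towards the root without overshooting.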