Notes on The Poisson Distribution
1. Essentially the Poisson distribution arises when the random variable in question is a
count of points or “specks” in a CONTINUOUS MEDIUM.
2. Consider random points (we often refer to them as “events” and think of them as
“specks”) which occur in space or time.
3. Suppose that these “events” occur in such a way that:
(i) The counts of points in non-overlapping regions are INDEPENDENT random
variables.
(ii) If R is a region having unit “measure” then the expected count of points in R is
equal to a constant α.
4. Remark: By “measure” we mean length, or area, or volume, or hypervolume, depending on the dimension of the space.
5. “Unit” measure means that this measure is equal to 1.
6. Now let X be the count of points or events in a region R, where the measure (length,
area, volume, etc.) of R is equal to t.
7. Let λ = αt.
8. The random variable X has a Poisson distribution with parameter λ.
9. The probability function for X is
$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$$
10. The mean of X is E(X) = λ
11. The variance of X is V (X) = λ = E(X) !!!
12. Hence the standard deviation of X is equal to $\sqrt{\lambda}$.
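As a quick numerical check of items 9 to 12, the following Python sketch evaluates the probability function and confirms that the mean and the variance both come out equal to λ. The function name poisson_pmf, the value λ = 2.5 and the point at which the sums are truncated are illustrative assumptions, not part of these notes.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam): exp(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 2.5        # illustrative value of lambda (an assumption)
xs = range(60)   # the tail beyond x = 59 is negligible for this lambda

# Sum the probability function directly to get total mass, mean and variance.
total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.6f}, variance = {var:.6f}, sd = {math.sqrt(var):.6f}")
```

Changing lam to other values should leave the mean and the variance equal to each other, provided the truncation point stays well beyond λ.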
13. The Poisson distribution can also arise as the “limit” of binomial distributions.
14. The rough idea is that if you are counting the number of (rare) successes in a LARGE
sample then this is like counting “specks” in a continuous medium.
15. More explicitly, if X ∼ Bin(n, p) where n is “large” and p is “small”, then it is
APPROXIMATELY TRUE that X ∼ Poisson(λ) where λ = np.
16. I.e. it is exactly true that if X ∼ Bin(n, p) then
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
17. It is also approximately true that if n is large and p is small, then
$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$$
where λ = np.
18. The latter can be much easier to calculate than the former, and can be less subject
to numerical error.
19. The approximation is often good enough for practical purposes.
20. How large is “large” and how small is “small”? There is of course no definite answer,
but a good rule of thumb is that one should have n ≥ 100, p ≤ 0.01, and np ≤ 20.
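To see the “limit” idea of items 13 to 20 in action, one can hold λ = np fixed and let n grow. The Python sketch below does this and reports the largest pointwise gap between the two probability functions. The helper names, the choice λ = 5 and the range of x values compared are assumptions made for illustration only.

```python
import math

def binom_pmf(x, n, p):
    """Exact P(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def max_abs_diff(n, lam, x_max=30):
    """Largest gap between Bin(n, lam/n) and Poisson(lam) over x = 0..x_max."""
    p = lam / n
    return max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(min(n, x_max) + 1))

lam = 5.0  # illustrative value of lambda (an assumption)
diffs = {n: max_abs_diff(n, lam) for n in (10, 100, 1000, 10000)}
for n, d in diffs.items():
    print(f"n = {n:>5}, p = {lam / n:.4f}: max |binomial - Poisson| = {d:.6f}")
```

The gap should shrink steadily as n increases and p decreases, which is what the rule of thumb in item 20 is trying to capture.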
21. Example of a Poisson distribution: Chartreuse Fig trees occur fairly sparsely in a
forest. The average number of Chartreuse Fig trees per hectare is 80. A silviculture
researcher picks a random point in the forest, draws a circle of radius 12 m., and
counts the number of Chartreuse Fig trees within that circle.
22. Questions:
(a) What is the expected number of Chartreuse Fig trees that the researcher will
observe?
(b) What is the variability of this number?
(c) What is the probability that she observes exactly 4 Chartreuse Fig trees?
(d) What is the probability that she observes fewer than 4 Chartreuse Fig trees?
23. To solve this problem, we let X be the number of Chartreuse Fig trees within the
circle.
24. Since these trees are “fairly sparse” we can (maybe!) get away with assuming that
they “don’t interact” and hence that the counts of these trees in non-overlapping
regions are independent.
25. Hence we can assume that X has a Poisson distribution.
26. The measure of the circle is $t = \pi r^2 = \pi \times 12^2 = 452.3893$ square metres.
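27. Since one hectare is 10000 square metres, the expected count per square metre is α = 80/10000 = 0.008. Hence the Poisson parameter for X is λ = t × 80/10000 = 452.3893 × 0.008 = 3.6191.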
28. Now the answers to the questions:
(a) The expected number of Chartreuse Fig trees is E(X) = λ = 3.6191.
(b) The variance of X is V(X) = λ = 3.6191 and so the standard deviation of X is
$\sqrt{3.6191} = 1.9024$ (to 4 decimal places).
(c) The probability of observing exactly 4 Chartreuse Fig trees is
$$P(X = 4) = \frac{e^{-3.6191} \times 3.6191^4}{4!} = 0.1916$$
(to 4 decimal places).
(d) The probability of observing fewer than 4 trees is
P (X < 4) = P (X ≤ 3) = P (X = 0) + . . . + P (X = 3)
After some tedious calculation, or some judicious use of a computer we get that
this is equal to 0.0268 + 0.0970 + 0.1756 + 0.2118 = 0.5112.
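For readers without Minitab, the four answers above can be reproduced with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative assumptions and the numbers are those given in the example.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

density = 80 / 10_000      # trees per square metre (80 per hectare)
t = math.pi * 12**2        # area of the circle of radius 12 m, in square metres
lam = density * t          # Poisson parameter, lambda = alpha * t

mean = lam                                   # (a) expected count
sd = math.sqrt(lam)                          # (b) standard deviation
p_exactly_4 = poisson_pmf(4, lam)            # (c) P(X = 4)
p_fewer_than_4 = sum(poisson_pmf(x, lam) for x in range(4))  # (d) P(X <= 3)

print(f"lambda = {lam:.4f}")
print(f"(a) E(X) = {mean:.4f}")
print(f"(b) V(X) = {lam:.4f}, sd = {sd:.4f}")
print(f"(c) P(X = 4) = {p_exactly_4:.4f}")
print(f"(d) P(X < 4) = {p_fewer_than_4:.4f}")
```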
29. “Judicious use of a computer” basically means that you should make use of Minitab
(or the equivalent).
30. To do so, click on Calc, then on Probability Distributions, then on Poisson.
31. Then make the appropriate choices and fill in the appropriate boxes.
32. Or, more simply, in the session window, type the command cdf 3; (NOTE: This sez
Cdf NOT Pdf!!!) followed by the subcommand pois 3.6191. (the trailing period is part of the subcommand).
33. The foregoing gives you the value of F (3) where F (x) is the cumulative distribution
function of a Poisson random variable with parameter λ = 3.6191.
34. That is, it gives P (X ≤ 3).
35. Suppose you want the individual values of P (X = 0), . . . , P (X = 3).
36. The easiest thing to do is to put the x values 0, 1, 2, 3, into column 1 (say).
37. Then type into the session window the command pdf c1 c2; followed by the subcommand pois 3.6191. .
38. You can follow this up with the commands print c2 and sum c2 .
39. The print command will display the probabilities of 0, 1, 2, 3, in the session window.
40. The sum command will give you the same result as using cdf 3 .
41. Example of using the Poisson approximation to the Binomial distribution:
In an article in the Los Angeles Times, Dec. 1993, it was reported that 1 in 200
people carry the defective gene that causes inherited colon cancer. In a sample of
1000 individuals, what is the probability that
(a) Between 5 and 8 individuals carry the gene?
(b) At least 8 carry the gene?
42. Solution: Let X be the number of individuals in the sample who carry the defective
gene. The distribution of X is Bin(1000, 0.005). The n and p satisfy the rule of thumb
for being large and small respectively, so we can say that X is approximately Poisson
distributed with parameter λ = np = 5.
(a) The probability that between 5 and 8 individuals in the sample carry the gene is
P (X = 5) + . . . + P (X = 8)
Using the Poisson approximation this is (approximately)
$$\frac{e^{-5} 5^5}{5!} + \frac{e^{-5} 5^6}{6!} + \ldots + \frac{e^{-5} 5^8}{8!}$$
$$= 0.1755 + 0.1462 + 0.1044 + 0.0653 = 0.4914$$
(where the calculated values have all been rounded to 4 decimal places).
(b) The probability that at least 8 individuals in the sample carry the gene is
P (X ≥ 8) = 1 − P (X < 8) = 1 − P (X ≤ 7).
Using the Poisson approximation
$$P(X \le 7) = e^{-5} + \frac{e^{-5} 5^1}{1!} + \frac{e^{-5} 5^2}{2!} + \ldots + \frac{e^{-5} 5^7}{7!}$$
$$= 0.0067 + 0.0337 + 0.0842 + 0.1404 + 0.1755 + 0.1755 + 0.1462 + 0.1044 = 0.8666$$
(where again the calculated values have all been rounded to 4 decimal places).
Hence the probability that at least 8 individuals in the sample carry the gene is
(approximately) 1 − 0.8666 = 0.1334.
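It can be instructive to set the Poisson answers beside the exact Bin(1000, 0.005) values, which a computer can evaluate directly. The Python sketch below does this for both parts of the example; the helper and variable names are assumptions chosen for illustration.

```python
import math

def binom_pmf(x, n, p):
    """Exact P(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

n, p = 1000, 0.005
lam = n * p   # lambda = np = 5

# (a) P(5 <= X <= 8), summing x = 5, 6, 7, 8
pois_a = sum(poisson_pmf(x, lam) for x in range(5, 9))
binom_a = sum(binom_pmf(x, n, p) for x in range(5, 9))

# (b) P(X >= 8) = 1 - P(X <= 7)
pois_b = 1 - sum(poisson_pmf(x, lam) for x in range(8))
binom_b = 1 - sum(binom_pmf(x, n, p) for x in range(8))

print(f"(a) Poisson approx = {pois_a:.4f}, exact binomial = {binom_a:.4f}")
print(f"(b) Poisson approx = {pois_b:.4f}, exact binomial = {binom_b:.4f}")
```

The two columns should agree closely, which is the sense in which the approximation is “good enough for practical purposes” here.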
43. Problem: Suppose that grasshoppers are distributed at random and independently
in a large field, and that there are on average 2 grasshoppers per square metre. How
large should the radius of a circular sampling region be taken so that the probability
of finding at least one grasshopper in the region is 0.99?
What about so that the probability of finding at least 2 grasshoppers is 0.99? (Here
you’ll get an equation that you “can’t solve” — at least not “analytically”. You could
use trial and error methods to get an approximate solution, or you could use graphical
methods with the help of Minitab. Did anybody learn Newton’s method for solving
equations in Math 1003?)
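One possible way to attack the problem numerically is sketched below in Python. It assumes the count in a circle of radius r is Poisson with λ = 2πr², solves the first question in closed form, and applies Newton's method to the second. The function names, the starting guess of 5 and the stopping tolerance are all assumptions of this sketch, not part of the notes.

```python
import math

DENSITY = 2.0   # grasshoppers per square metre (from the problem)
TARGET = 0.99   # required probability

def radius_from_lambda(lam):
    """Radius of the circle whose expected count is lam = DENSITY * pi * r**2."""
    return math.sqrt(lam / (DENSITY * math.pi))

# At least one: 1 - exp(-lam) = TARGET can be solved directly for lam.
lam_one = -math.log(1 - TARGET)

# At least two: 1 - exp(-lam) * (1 + lam) = TARGET has no closed-form solution,
# so find the root of f(lam) = exp(-lam) * (1 + lam) - (1 - TARGET).
def f(lam):
    return math.exp(-lam) * (1 + lam) - (1 - TARGET)

def f_prime(lam):
    # derivative of exp(-lam) * (1 + lam) with respect to lam
    return -lam * math.exp(-lam)

lam_two = 5.0   # starting guess (an assumption)
for _ in range(50):
    step = f(lam_two) / f_prime(lam_two)
    lam_two -= step            # Newton update
    if abs(step) < 1e-12:
        break

print(f"at least 1: lambda = {lam_one:.4f}, r = {radius_from_lambda(lam_one):.4f} m")
print(f"at least 2: lambda = {lam_two:.4f}, r = {radius_from_lambda(lam_two):.4f} m")
```

Trial and error or a graph of f against λ would locate the same root; Newton's method simply gets there in fewer steps.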