Download The Poisson Distribution

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Probability wikipedia , lookup

Statistics wikipedia , lookup

History of statistics wikipedia , lookup

Transcript
The Poisson Distribution
Mary Lindstrom
(Adapted from notes provided by Professor Bret Larget)
February 5, 2004
Statistics 371
Last modified: February 4, 2004
The Poisson Distribution
The Poisson distribution arises in many biological contexts.
Examples of random variables for which a Poisson distribution
might be reasonable include:
•
•
•
•
the number of bacterial colonies in a Petri dish;
the number of trees in an area of land;
the number of offspring an individual has;
the number of nucleotide base substitutions in a gene over a
period of time;
Statistics 371
1
Probability Mass Function
The probability mass function of the Poisson distribution with
mean µ is
e−µµk
for k = 0, 1, 2, . . ..
Pr{Y = k} =
k!
Where e is a special number in math (like π). It is approximately
equal to 2.718282.
The Poisson distribution is discrete, like the binomial distribution,
but has only a single parameter µ and it has infinitely many
possible outcomes.
The mean and variance are the same for a Poisson random
variable.
µY = E(Y ) = µ
σ 2 = V ar(Y ) = µ
Statistics 371
2
Computing Poisson Probabilities in R
The function dpois will compute Poisson probabilities.
−10 1012
e
with the command
If µ = 10, we can find Pr{Y = 12} =
12!
> dpois(12, 10)
[1] 0.09478033
> dpois(12, 1000)
[1] 0
Or, if we wanted the probabilities that a Poisson random variable
with mean 4 would take on the values 8 through 12 we would
type:
> dpois(8:12, 4)
[1] 0.0297701813 0.0132311917 0.0052924767 0.0019245370 0.0006415123
Statistics 371
3
Plotting Poisson Probabilities in R
>
>
>
>
source("prob.R")
par(mfrow = c(2, 1))
gpois(mu = 2)
gpois(mu = 10)
0.25
0.00
Probability
Poisson Distribution
mu = 2
0
2
4
6
8
Count
0.00 0.10
Probability
Poisson Distribution
mu = 10
0
5
10
15
20
Count
Statistics 371
4
Poisson approximation to the binomial
One way that the Poisson distribution can arise is as an
approximation for the binomial distribution when p is small. The
approximation is quite good for large enough n.
If p is small and n is large then the probability that a binomial(n, p)
R.V. is equal to k is approximately the same as the probability
that a Poisson R.V. with µ = np is equal to k.
Here is an example with p = 0.01 and n = 50.
> dbinom(0:4, 50, 0.01)
[1] 0.605006067 0.305558620 0.075618042 0.012221098 0.001450484
> dpois(0:4, 50 * 0.01)
[1] 0.606530660 0.303265330 0.075816332 0.012636055 0.001579507
This approximation is most useful when n is large so that the
binomial coefficients are very large.
Statistics 371
5
The Poisson Process
The Poisson Process arises naturally under assumptions that are
often reasonable. For the following, think of occurrences as
being exact times of events or random locations of something.
The assumptions are:
1. The chance of two simultaneous occurrences (or occurrences at the same location) is negligible;
2. The expected value of the random number of occurrences
in a time interval (or in a region) is proportional to the
length of the interval (area of the region).
3. The random number of occurrences in non-overlapping
time intervals (regions) are independent.
Under these assumptions, the random variable that counts the
number of occurrences has a Poisson distribution.
Statistics 371
6
Generating one Poisson R.V. from
another
Sometimes we are given a Poisson R.V. for the number of
occurrences for one unit of length (area, time) but are interested
in another. We can create a second Poisson random variable
from the first.
If Y1 is a Poisson random variable with mean µ1 counting the
number of occurrences per unit length (area, time).
Then the random variable Y2 which counts the number of
occurrences in an interval of length (area, time) t is Poisson
with mean µ2 = tµ1.
Statistics 371
7
Example
Suppose that we assume that at a location, a particular species of
plant is distributed according to a Poisson process with expected
density 0.2 individuals per square meter. In a nine square meter
quadrant, what is the probability of no individuals?
Solution: The number of individuals in 9 square meters has a
Poisson distribution with mean µ = 9×0.2 = 1.8. The probability
of no individuals in 9 meters is
e−1.8(1.8)0
Pr{Y = 0} =
= e−1.8 = 0.165
0!
In R, we can compute this as
> dpois(0, 1.8)
[1] 0.1652989
Statistics 371
8
Example (cont.)
Find the probability of three or more individuals in 9 square
meters.
Solution: Instead of summing the probabilities from 3 to infinity,
we can use the complement rule.
Pr{Y ≥ 3} = 1−Pr{Y ≤ 2} = 1−Pr{Y = 0}−Pr{Y = 1}−Pr{Y = 2}
In R, this is found as
> 1 - sum(dpois(0:2, 1.8))
[1] 0.2693789
Statistics 371
9
Serial (or Limiting) Dilution Assays
Consider an experiment to find the concentration of colony
forming units (cells that will result in the growth of a colony)
in a solution. Assume that it is difficult or impossible to count
individual colonies.
First create a series of dilutions at the following strengths:
•
•
•
•
•
•
•
1/2 the strength of the the original solution
1/4 the strength of the the original solution
1/8 the strength of the the original solution
1/16 the strength of the the original solution
1/32 the strength of the the original solution
1/64 the strength of the the original solution
1/128 the strength of the the original solution
Statistics 371
10
Serial (or Limiting) Dilution Assays
Now take 80 plates and plate out 1 milliliter of each solution on
to each of 10 growing plates.
Wait an appropriate length of time and record the number of
plates that have at least one colony growing on it.
Our data might look like:
Dilution
Number of plates out
of 10 with no colonies
Full strength
0
1/2 strength
1
1/4 strength
3
1/8 strength
6
1/16 strength
8
1/32 strength
8
1/64 strength
9
1/128 strength 10
Statistics 371
11
Serial (or Limiting) Dilution Assays
How would we estimate the number of colony forming units
(CFU) per ml in the original solution?
First let’s compute the proportion of the plates with no colonies:
Plates with no colonies
Dilution
Full strength
1/2 strength
1/4 strength
1/8 strength
1/16 strength
1/32 strength
1/64 strength
1/128 strength
Statistics 371
Number
Proportion
0
1
3
6
8
8
9
10
0.00
0.10
0.30
0.60
0.80
0.80
0.90
1.00
12
Serial (or Limiting) Dilution Assays
Lets start with the Full strength solution. Let’s assume:
• the original solution has µ CFUs per ml
• the number of CFU’s in any individual ml of the original
solution is a Poisson distributed random variable with mean
µ.
Is this reasonable? What are the assumptions required for a R.V.
to be Poisson?
1. The chance of two simultaneous occurrences (or occurrences at the same location) is negligible;
2. The expected value of the random number of occurrences
in a time interval (or in a region) is proportional to the
length of the interval (area of the region).
Statistics 371
13
Serial (or Limiting) Dilution Assays
3. The random number of occurrences in non-overlapping
time intervals (regions) are independent.
If Y is the number of CFU’s in 1 ml of the original solution then
we know that
e−µµk
Pr{Y = k} =
k!
for k = 0, 1, 2, . . ..
A plate having no colonies is equivalent to Y = 0.
calculate the probability that Y = 0 as
We can
e−µµ0
Pr{Y = 0} =
= e−µ
0!
Statistics 371
13
Serial (or Limiting) Dilution Assays
What about our dilutions?
If the number of CFU’s in a ml of the original solution is
Poisson(µ) then the number of CFU’s in 1/d dilution of the
original solution will be Poisson(µ/d).
Let’s expand our table to show the expected proportion of plates
with no colonies
Statistics 371
14
Serial (or Limiting) Dilution Assays
Dilution
Full strength
1/2 strength
1/4 strength
1/8 strength
1/16 strength
1/32 strength
1/64 strength
Plates with no colonies
Expected
Number Proportion Proportion
0
0.00
exp(−µ)
1
0.10
exp(−µ/2)
3
0.30
exp(−µ/4)
6
0.60
exp(−µ/8)
8
0.80 exp(−µ/16)
8
0.80 exp(−µ/32)
9
0.90 exp(−µ/64)
Notes
• exp(x) = ex
• We drop the dilutions which give 0 or 10 plates with zero
colonies. This method does not use them.
Statistics 371
15
Serial (or Limiting) Dilution Assays
Now take the log (base e) of the proportions and the expected
proportions
Dilution
1/2 strength
1/4 strength
1/8 strength
1/16 strength
1/32 strength
1/64 strength
Statistics 371
Plates with no colonies
Log Log expected
Number Proportion
Proportion
1 loge(0.10)
−µ/2
3 loge(0.30)
−µ/4
6 loge(0.60)
−µ/8
8 loge(0.80)
−µ/16
8 loge(0.80)
−µ/32
9 loge(0.90)
−µ/64
16
Serial (or Limiting) Dilution Assays
Multiply through by the dilutions to get
Dilution
1/2 strength
1/4 strength
1/8 strength
1/16 strength
1/32 strength
1/64 strength
Statistics 371
Plates with no colonies
Sample Theoretical
Number
Statistic
Value
1
−2 × loge(0.10)
µ
3
−4 × loge(0.30)
µ
6
−8 × loge(0.60)
µ
8 −16 × loge(0.80)
µ
8 −32 × loge(0.80)
µ
9 −64 × loge(0.90)
µ
17
Serial (or Limiting) Dilution Assays
Doing the math we get
Dilution
1/2 strength
1/4 strength
1/8 strength
1/16 strength
1/32 strength
1/64 strength
mean
Plates with no colonies
Sample Theoretical
Number Statistic
Value
1
4.8
µ
3
4.8
µ
6
4.1
µ
8
3.6
µ
8
7.1
µ
9
6.7
µ
5.2
µ
So we might use 5.2 as our estimate of the true but unknown
value of µ.
Note that there are other ways of analyzing this type of
experiment.
Statistics 371
18
Serial (or Limiting) Dilution Assays
We can use R to calculate the expected number of plates with
zero colonies assuming µ = 5.2.
> dilutions = 2^(0:7)
> dilutions
[1]
1
2
4
8 16 32 64 128
> means <- 5.2/dilutions
> means
[1] 5.200000 2.600000 1.300000 0.650000 0.325000 0.162500 0.081250 0.040625
> round(dpois(0, means), 2)
[1] 0.01 0.07 0.27 0.52 0.72 0.85 0.92 0.96
Statistics 371
19
Serial (or Limiting) Dilution Assays
So assuming µ = 5.2 our observed and expected proportion of
plates with no colonies are
Dilution
full strength
1/2 strength
1/4 strength
1/8 strength
1/16 strength
1/32 strength
1/64 strength
1/128 strength
Proportion
Observed Expected
0.00
0.01
0.10
0.07
0.30
0.27
0.60
0.52
0.80
0.72
0.80
0.85
0.90
0.92
1.00
0.96
Why don’t the observed and expected match?
Statistics 371
20