Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
The Poisson Distribution Mary Lindstrom (Adapted from notes provided by Professor Bret Larget) February 5, 2004 Statistics 371 Last modified: February 4, 2004 The Poisson Distribution The Poisson distribution arises in many biological contexts. Examples of random variables for which a Poisson distribution might be reasonable include: • • • • the number of bacterial colonies in a Petri dish; the number of trees in an area of land; the number of offspring an individual has; the number of nucleotide base substitutions in a gene over a period of time; Statistics 371 1 Probability Mass Function The probability mass function of the Poisson distribution with mean µ is e−µµk for k = 0, 1, 2, . . .. Pr{Y = k} = k! Where e is a special number in math (like π). It is approximately equal to 2.718282. The Poisson distribution is discrete, like the binomial distribution, but has only a single parameter µ and it has infinitely many possible outcomes. The mean and variance are the same for a Poisson random variable. µY = E(Y ) = µ σ 2 = V ar(Y ) = µ Statistics 371 2 Computing Poisson Probabilities in R The function dpois will compute Poisson probabilities. −10 1012 e with the command If µ = 10, we can find Pr{Y = 12} = 12! > dpois(12, 10) [1] 0.09478033 > dpois(12, 1000) [1] 0 Or, if we wanted the probabilities that a Poisson random variable with mean 4 would take on the values 8 through 12 we would type: > dpois(8:12, 4) [1] 0.0297701813 0.0132311917 0.0052924767 0.0019245370 0.0006415123 Statistics 371 3 Plotting Poisson Probabilities in R > > > > source("prob.R") par(mfrow = c(2, 1)) gpois(mu = 2) gpois(mu = 10) 0.25 0.00 Probability Poisson Distribution mu = 2 0 2 4 6 8 Count 0.00 0.10 Probability Poisson Distribution mu = 10 0 5 10 15 20 Count Statistics 371 4 Poisson approximation to the binomial One way that the Poisson distribution can arise is as an approximation for the binomial distribution when p is small. The approximation is quite good for large enough n. If p is small and n is large then the probability that a binomial(n, p) R.V. is equal to k is approximately the same as the probability that a Poisson R.V. with µ = np is equal to k. Here is an example with p = 0.01 and n = 50. > dbinom(0:4, 50, 0.01) [1] 0.605006067 0.305558620 0.075618042 0.012221098 0.001450484 > dpois(0:4, 50 * 0.01) [1] 0.606530660 0.303265330 0.075816332 0.012636055 0.001579507 This approximation is most useful when n is large so that the binomial coefficients are very large. Statistics 371 5 The Poisson Process The Poisson Process arises naturally under assumptions that are often reasonable. For the following, think of occurrences as being exact times of events or random locations of something. The assumptions are: 1. The chance of two simultaneous occurrences (or occurrences at the same location) is negligible; 2. The expected value of the random number of occurrences in a time interval (or in a region) is proportional to the length of the interval (area of the region). 3. The random number of occurrences in non-overlapping time intervals (regions) are independent. Under these assumptions, the random variable that counts the number of occurrences has a Poisson distribution. Statistics 371 6 Generating one Poisson R.V. from another Sometimes we are given a Poisson R.V. for the number of occurrences for one unit of length (area, time) but are interested in another. We can create a second Poisson random variable from the first. If Y1 is a Poisson random variable with mean µ1 counting the number of occurrences per unit length (area, time). Then the random variable Y2 which counts the number of occurrences in an interval of length (area, time) t is Poisson with mean µ2 = tµ1. Statistics 371 7 Example Suppose that we assume that at a location, a particular species of plant is distributed according to a Poisson process with expected density 0.2 individuals per square meter. In a nine square meter quadrant, what is the probability of no individuals? Solution: The number of individuals in 9 square meters has a Poisson distribution with mean µ = 9×0.2 = 1.8. The probability of no individuals in 9 meters is e−1.8(1.8)0 Pr{Y = 0} = = e−1.8 = 0.165 0! In R, we can compute this as > dpois(0, 1.8) [1] 0.1652989 Statistics 371 8 Example (cont.) Find the probability of three or more individuals in 9 square meters. Solution: Instead of summing the probabilities from 3 to infinity, we can use the complement rule. Pr{Y ≥ 3} = 1−Pr{Y ≤ 2} = 1−Pr{Y = 0}−Pr{Y = 1}−Pr{Y = 2} In R, this is found as > 1 - sum(dpois(0:2, 1.8)) [1] 0.2693789 Statistics 371 9 Serial (or Limiting) Dilution Assays Consider an experiment to find the concentration of colony forming units (cells that will result in the growth of a colony) in a solution. Assume that it is difficult or impossible to count individual colonies. First create a series of dilutions at the following strengths: • • • • • • • 1/2 the strength of the the original solution 1/4 the strength of the the original solution 1/8 the strength of the the original solution 1/16 the strength of the the original solution 1/32 the strength of the the original solution 1/64 the strength of the the original solution 1/128 the strength of the the original solution Statistics 371 10 Serial (or Limiting) Dilution Assays Now take 80 plates and plate out 1 milliliter of each solution on to each of 10 growing plates. Wait an appropriate length of time and record the number of plates that have at least one colony growing on it. Our data might look like: Dilution Number of plates out of 10 with no colonies Full strength 0 1/2 strength 1 1/4 strength 3 1/8 strength 6 1/16 strength 8 1/32 strength 8 1/64 strength 9 1/128 strength 10 Statistics 371 11 Serial (or Limiting) Dilution Assays How would we estimate the number of colony forming units (CFU) per ml in the original solution? First let’s compute the proportion of the plates with no colonies: Plates with no colonies Dilution Full strength 1/2 strength 1/4 strength 1/8 strength 1/16 strength 1/32 strength 1/64 strength 1/128 strength Statistics 371 Number Proportion 0 1 3 6 8 8 9 10 0.00 0.10 0.30 0.60 0.80 0.80 0.90 1.00 12 Serial (or Limiting) Dilution Assays Lets start with the Full strength solution. Let’s assume: • the original solution has µ CFUs per ml • the number of CFU’s in any individual ml of the original solution is a Poisson distributed random variable with mean µ. Is this reasonable? What are the assumptions required for a R.V. to be Poisson? 1. The chance of two simultaneous occurrences (or occurrences at the same location) is negligible; 2. The expected value of the random number of occurrences in a time interval (or in a region) is proportional to the length of the interval (area of the region). Statistics 371 13 Serial (or Limiting) Dilution Assays 3. The random number of occurrences in non-overlapping time intervals (regions) are independent. If Y is the number of CFU’s in 1 ml of the original solution then we know that e−µµk Pr{Y = k} = k! for k = 0, 1, 2, . . .. A plate having no colonies is equivalent to Y = 0. calculate the probability that Y = 0 as We can e−µµ0 Pr{Y = 0} = = e−µ 0! Statistics 371 13 Serial (or Limiting) Dilution Assays What about our dilutions? If the number of CFU’s in a ml of the original solution is Poisson(µ) then the number of CFU’s in 1/d dilution of the original solution will be Poisson(µ/d). Let’s expand our table to show the expected proportion of plates with no colonies Statistics 371 14 Serial (or Limiting) Dilution Assays Dilution Full strength 1/2 strength 1/4 strength 1/8 strength 1/16 strength 1/32 strength 1/64 strength Plates with no colonies Expected Number Proportion Proportion 0 0.00 exp(−µ) 1 0.10 exp(−µ/2) 3 0.30 exp(−µ/4) 6 0.60 exp(−µ/8) 8 0.80 exp(−µ/16) 8 0.80 exp(−µ/32) 9 0.90 exp(−µ/64) Notes • exp(x) = ex • We drop the dilutions which give 0 or 10 plates with zero colonies. This method does not use them. Statistics 371 15 Serial (or Limiting) Dilution Assays Now take the log (base e) of the proportions and the expected proportions Dilution 1/2 strength 1/4 strength 1/8 strength 1/16 strength 1/32 strength 1/64 strength Statistics 371 Plates with no colonies Log Log expected Number Proportion Proportion 1 loge(0.10) −µ/2 3 loge(0.30) −µ/4 6 loge(0.60) −µ/8 8 loge(0.80) −µ/16 8 loge(0.80) −µ/32 9 loge(0.90) −µ/64 16 Serial (or Limiting) Dilution Assays Multiply through by the dilutions to get Dilution 1/2 strength 1/4 strength 1/8 strength 1/16 strength 1/32 strength 1/64 strength Statistics 371 Plates with no colonies Sample Theoretical Number Statistic Value 1 −2 × loge(0.10) µ 3 −4 × loge(0.30) µ 6 −8 × loge(0.60) µ 8 −16 × loge(0.80) µ 8 −32 × loge(0.80) µ 9 −64 × loge(0.90) µ 17 Serial (or Limiting) Dilution Assays Doing the math we get Dilution 1/2 strength 1/4 strength 1/8 strength 1/16 strength 1/32 strength 1/64 strength mean Plates with no colonies Sample Theoretical Number Statistic Value 1 4.8 µ 3 4.8 µ 6 4.1 µ 8 3.6 µ 8 7.1 µ 9 6.7 µ 5.2 µ So we might use 5.2 as our estimate of the true but unknown value of µ. Note that there are other ways of analyzing this type of experiment. Statistics 371 18 Serial (or Limiting) Dilution Assays We can use R to calculate the expected number of plates with zero colonies assuming µ = 5.2. > dilutions = 2^(0:7) > dilutions [1] 1 2 4 8 16 32 64 128 > means <- 5.2/dilutions > means [1] 5.200000 2.600000 1.300000 0.650000 0.325000 0.162500 0.081250 0.040625 > round(dpois(0, means), 2) [1] 0.01 0.07 0.27 0.52 0.72 0.85 0.92 0.96 Statistics 371 19 Serial (or Limiting) Dilution Assays So assuming µ = 5.2 our observed and expected proportion of plates with no colonies are Dilution full strength 1/2 strength 1/4 strength 1/8 strength 1/16 strength 1/32 strength 1/64 strength 1/128 strength Proportion Observed Expected 0.00 0.01 0.10 0.07 0.30 0.27 0.60 0.52 0.80 0.72 0.80 0.85 0.90 0.92 1.00 0.96 Why don’t the observed and expected match? Statistics 371 20