PHYS 233
Probability Distributions
Introduction to Probability Distributions
This is not intended to be an alternative to reading a textbook on probability and probability
distributions. More detail can be found in the following references:
References
1. ‘An Introduction to Error Analysis’ by J.R. Taylor, University Science Books, 1997.
2. ‘Data Reduction and Error Analysis for the Physical Sciences’ by P.R. Bevington and
D.K. Robinson, McGraw Hill, 1992.
Introduction
Probability distributions describe the probability of observing a particular event. There are
several probability distributions that are important to physicists. The binomial distribution,
while not of much practical significance, is easy to describe, and can be used to derive the
other distributions used most often by experimental physicists: the Gaussian and Poisson
distributions. The Gaussian, or normal distribution, is the most important as it is most often
used to describe the distribution of results for any measurement subject to small, random
error. The Poisson distribution is particularly useful in describing counting experiments.
A fourth distribution, the exponential distribution or interval distribution, describes the
distribution of intervals between counting events. In this lab, you will investigate the Poisson
and interval distributions.
Probability
In order to understand the statistical methods of dealing with random processes and how
some predictability can be garnered from such chance events, we will examine some simple cases involving coin tosses and dice. First we introduce three important properties of
probability:
1. If you consider two possible events A and B which are mutually exclusive (that is, if
A happens B cannot happen and vice versa) then the probability of either A or B
happening is the sum of the probabilities of A and B: P (A or B) = P (A) + P (B). An
example of two such events would be a coin toss where there are two possible events,
A = heads or B = tails.
2. The sum of the probabilities of all possible mutually exclusive events of a trial is unity,
because one of the events must happen in every trial: P (A) + P (B) + P (C) + ... = 1.
In our coin toss example, the coin must turn up either heads or tails.
3. The probability that two independent events will both happen is the product of the
probabilities of the two single events: P (A and B) = P (A) · P (B). An example of two
independent events would be two coin tosses.
From these rules we can draw the following conclusions:

- If a trial has n and only n possible different outcomes, and if you know that all of the
outcomes have equal a priori probabilities of happening, then the probability of a given
outcome must be equal to 1/n.

- If you classify the outcomes of a trial into different classes, and if the number of events
belonging to one class is m, the probability that an event belonging to that class will
happen is m/n.
We have to bear in mind that the concept of “equal probability” of events has to be
derived from experience. Once we have classified by experience all the possible different and
mutually exclusive events in such a manner that they have equal a priori probability, we can
apply the rules of probabilities for detailed calculations. The key problem, therefore, is to
identify which events have equal a priori probability. It requires considerable care to avoid
mistakes. For example, if you toss two coins, you might argue that there are three possible
outcomes: two heads, two tails, or one head and one tail. If you assume that each of these
outcomes is equally likely, then the predicted probability would be 1/3 for each. Experience
shows this to be wrong. The mistake lies in treating two different events, heads followed by
tails and tails followed by heads, as a single event. This nuance will be clarified by working
out in detail the case of tossing four coins.
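To make the point concrete, one can simply enumerate the four equally likely ordered outcomes of two coin tosses. The short Python sketch below is purely illustrative (it is not required for the lab); it shows that “one head and one tail” occurs with probability 1/2, not 1/3.

    from itertools import product

    # The four equally likely ordered outcomes of two coin tosses.
    outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT

    def prob(event):
        """Fraction of the equally likely outcomes satisfying the event."""
        return sum(1 for o in outcomes if event(o)) / len(outcomes)

    print(prob(lambda o: o.count("H") == 2))   # two heads          -> 0.25
    print(prob(lambda o: o.count("H") == 1))   # one head, one tail -> 0.5
    print(prob(lambda o: o.count("H") == 0))   # two tails          -> 0.25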
Example 1: Four coins
Toss four coins. Each coin has a 50% probability of turning up heads and a 50%
probability of turning up tails. (This seems logical, but it is an assumption that
should be justified by experience.) Let p represent the probability of heads and
q = 1 − p that of tails: p = 0.5, q = 0.5.
The probability of no heads in a toss is the probability that all four coins turn
up tails simultaneously:
(probability coin A is tails and coin B is tails and coin C is tails and
coin D is tails) = (probability coin A is tails) x (probability coin B is
tails) x (probability coin C is tails) x (probability coin D is tails).
There are 16 different ways the toss can turn out if we can distinguish which coin
is which. Each of the 16 ways is equally likely and only 1 of those sixteen ways
is all tails. Let P_0 represent the probability of none of the four coins turning up
heads:

P_0 = q^4 = 0.5 × 0.5 × 0.5 × 0.5 = 1/16 .
There are four ways that one coin can turn up heads. Coin A can be heads, coin
B could be heads, coin C could be heads or coin D could be heads. Each one of
these has a probability of p q^3 = 1/16. Thus there are 4 chances out of 16 for 1
head if we don’t care which coin is heads:

P_1 = p q^3 + q p q^2 + q^2 p q + q^3 p = 4 × (1/16) = 1/4 .
The probability that both coins of a specific pair are heads and the other two are
tails is p2 q 2 . To calculate the probability that any two coins be heads we have
to figure out how many different pairs there are. How many different ways can
the four coins turn up two heads and two tails? Consider choosing the two coins
that are to be heads. There are four ways of choosing the first coin and three
ways of choosing the second so that there are 4 × 3 or 12 ways of choosing two
from four (“four choose two”, or 4C2). But half of these 12 are really the same
two coins that have been chosen in a different order. For example if we label the
coins ABCD we can choose two in the following possible ways:
AB
AC
AD
BA (same as AB)
BC
BD
CA (same as AC)
CB (same as BC)
CD
DA (same as AD)
DB (same as BD)
DC (same as CD)
Those cases where the same two coins have been chosen but in a different order
must be eliminated from the count. The ways of choosing two different coins
from among four are shown in Fig. 1. Thus you can see that the total number is
(4 × 3)/2 = 6 .

Figure 1: Choosing two coins from four.
You should be able to convince yourself that the number of different ways r things can
be chosen from m, when the order is unimportant, is

\frac{m!}{(m − r)!\, r!} .

The logic in this formula is as follows: the number of ways one can choose r from m without
regard to duplication is m(m − 1)(m − 2)...(m − r + 1), which is m!/(m − r)!. This quantity
must be divided by r! to account for duplicates consisting of the same coins chosen in a
different order. This is the number of different possible combinations of m items taken r at
a time.
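As a quick numerical check, Python’s built-in math.comb evaluates this combination formula directly; the minimal sketch below (illustrative only) reproduces the count of 6 pairs found above.

    from math import comb, factorial

    m, r = 4, 2
    # Number of ways to choose r items from m when order is unimportant.
    print(comb(m, r))                                         # 6
    print(factorial(m) // (factorial(m - r) * factorial(r)))  # same result: 6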
The Binomial Distribution
Now we are ready to write down an expression for the probability distribution that describes
the likelihood of r events (e.g. heads) occurring in a total of m events (e.g. coin flips)
where the probability of an r-event occurring is p while the probability of it not occurring
is (1 − p). Since the individual events occur independently, the probability of a particular
subset of r events amongst the m is the product of the individual probabilities. If r occur,
then m − r don’t, and the probability is p^r (1 − p)^{m−r}. For the total probability of a
particular event occurring (e.g. 2 heads), we multiply the probability that the event occurs
by the number of ways that event can occur. The complete formula for the probability
distribution is then given by

P_r = \frac{m!}{(m − r)!\, r!} (1 − p)^{m−r} p^r .     (1)
This distribution is called the binomial distribution. It describes the probability that r events
occur among a total of m independent events. Note that it is a discrete distribution; it is
defined only at integral values of the variable r.
We can now use Eq. 1 to calculate the probability of getting two heads among four coins.
Remember, for the coin toss, the number of events is r = 2 out of a total of m = 4 coins and
the probability of each event is p = 1/2. Then
P_2 = \frac{4!}{2!\, 2!} \left(1 − \frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^2 = \frac{3}{8} .

The other values of P_r can be obtained similarly.
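For readers who want to check this result by machine, the following minimal Python sketch (illustrative only) evaluates Eq. 1 directly for the case worked out above.

    from math import comb

    m, p, r = 4, 0.5, 2
    # Eq. 1 evaluated directly: probability of two heads among four coins.
    P2 = comb(m, r) * (1 - p)**(m - r) * p**r
    print(P2)   # 0.375 = 3/8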
Problem 1: Use Eq. 1 to complete column 2 of the following table. Plot the histogram of values.
  r   |  P_r  |  r P_r  |  (r − r̄)^2 P_r
  0   |       |         |
  1   |       |         |
  2   |       |         |
  3   |       |         |
  4   |       |         |
 sum  |       |         |
Recall that the total probability of all possible events must sum to unity:
\sum_{r=0}^{m} P_r = 1 .     (2)
Problem 2: Verify that this sum does work out to unity. Sum the entries of the
second column and write your result in the last row of the table.
The third column of the table allows you to work out the average number of heads in a
given toss. Given the probabilities Pr for each different outcome, the average of r can be
calculated using the following simple formula:
r̄ = \sum_{r=0}^{m} r P_r .     (3)
Using this definition and Eq. 1 we expect that, for a binomial distribution, r̄ = m p.
Problem 3: Fill in the third column and add up the terms. Is the average reasonable?
The fourth column allows you to work out the variance. Given the probabilities Pr for
each different outcome, the variance can be calculated using the following simple formula:
σ^2 = \sum_{r=0}^{m} (r − r̄)^2 P_r .     (4)

For a binomial distribution, σ^2 = m p (1 − p).
Problem 4: Fill in column 4. Is the variance reasonable?
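If you would like to verify your hand-filled table, the following Python sketch (illustrative only; the function name binomial_pmf is our own) tabulates the three columns for the four-coin case and checks the sums against r̄ = m p and σ^2 = m p (1 − p). Changing m and p lets you reuse it for the dice examples that follow.

    from math import comb

    def binomial_pmf(r, m, p):
        """Eq. 1: probability of r successes among m independent trials."""
        return comb(m, r) * (1 - p)**(m - r) * p**r

    m, p = 4, 0.5                      # four coins; use m=12, p=1/6 for twelve dice
    P = [binomial_pmf(r, m, p) for r in range(m + 1)]

    total = sum(P)                                            # Eq. 2: should be 1
    rbar  = sum(r * P[r] for r in range(m + 1))               # Eq. 3: should equal m*p
    var   = sum((r - rbar)**2 * P[r] for r in range(m + 1))   # Eq. 4: should equal m*p*(1-p)

    for r in range(m + 1):
        print(r, P[r], r * P[r], (r - rbar)**2 * P[r])
    print("sum =", total, " mean =", rbar, " variance =", var)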
Note: If one expands (p + q)^4 one gets p^4 + 4p^3 q + 6p^2 q^2 + 4p q^3 + q^4. Each term of this
expansion corresponds to one of the probabilities in the table of Problem 1. This “binomial
expansion” was described by Newton. The factors of each term can be figured out using
“Pascal’s Triangle”, promulgated by Pascal. The sides of Pascal’s triangle are 1’s. Interior
numbers are obtained by summing the two numbers above it, to the left and to the right.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
Example 2: Twelve six-sided dice
Here we will let twelve six-sided dice represent twelve total events. After a roll
of the dice, a die that turns up a ’snake eye’ can be our choice of event that we want to
keep track of. The probability of this
event occurring is p = 1/6. (Why?)
Problem 5: Work out the probability of rolling r = 0 ... 12 snake eyes and complete
a table similar to the one you used in Problem 2. Plot a histogram of values. Also
verify that the sum of the probabilities is unity, and that the average number of
snake eyes and the variance are reasonable.
Example 3: Sixteen eight-sided dice
The event of interest is again rolling a ’snake eye’.
Problem 6: What is m and p for this example? Work out Pr for r = 0 ... 16 and
complete a table similar to that used in Problem 2. Plot a histogram of values.
Also verify that the sum of the probabilities is unity, and that the average number
of snake eyes and the variance are reasonable.
The Poisson Distribution
The decay of radioactive atoms provides another convenient source of random events to
help us explore how we can use statistics to deal with randomness. A sample of radioactive
material contains a large number of atoms. Many of these atoms are unstable and will
transform to another element or isotope by emitting a photon, electron or alpha particle.
We will assume that, once an unstable ”parent” decays, the resulting ”daughter” is stable
and can emit no more particles. In more complicated cases, the daughter might be unstable
as well but we will not deal with that situation now.
Even though the time at which any particular atom will decay is unknown, there is some
regularity in the process that we can discover by looking at the average behavior of a large
number of atoms over a long time. For example, the fraction of unstable atoms that decays
in a certain time period, for example one second, fluctuates around a well-defined average
value.
Two characteristics are important in understanding radioactive decay. First, the probability per unit time that an undecayed atom will decay within an infinitesimal time interval
∆t is a constant:
\frac{\text{Probability of decay in } ∆t}{∆t} → a   as ∆t → 0 ,
where a is the probability per unit time of observing a decay. Second, the atoms are independent; the state of any atom does not affect another.
We can use the concepts developed in the previous sections to describe the probability of
radioactive decay occurring in a number of unstable atoms by realizing that each radioactive
atom is equivalent to a coin or die, that the passing of a one-second time interval is equivalent
to each toss of four coins or twelve dice, and decay of an atom is equivalent to a coin turning
up heads or a die turning up a ’snake eye’.
The case of radioactive decay is of course different from that of the coin and dice experiments we have been discussing. In a real radioactive sample there are a huge number of
atoms, but each one has a small probability of decay, i.e. m → ∞, p → 0, but their product
remains finite. In this case it is possible to make some approximations that simplify Eq. 1.
1. For r ≪ m,

\frac{m!}{(m − r)!} = m(m − 1)...(m − r + 1) ≃ m^r .
Problem 7: Work out how much difference this approximation makes for
100!/95!.
2. For small p,

(1 − p) ≃ e^{−p} .

This comes from the Taylor expansion of the exponential function. When p is much less
than unity, the squared, cubic and higher order terms of the expansion are negligible.
Thus

(1 − p)^{m−r} ≃ e^{−p(m−r)} = e^{−pm} e^{pr} ≃ e^{−pm} · 1 = e^{−pm} .
Problem 8: Work out how much percentage difference this approximation makes
for p = 0.1, m = 100, and r = 5.
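The sketch below (Python, with purely illustrative values of m, r and p; Problems 7 and 8 should still be worked with the values given there) shows one way such approximations can be compared numerically.

    from math import factorial, exp

    m, r, p = 20, 3, 0.02    # illustrative values only

    exact_ratio  = factorial(m) // factorial(m - r)   # m!/(m-r)!
    approx_ratio = m**r                               # approximation 1: m^r
    print(exact_ratio, approx_ratio)

    exact_power  = (1 - p)**(m - r)                   # (1-p)^(m-r)
    approx_power = exp(-p * m)                        # approximation 2: e^(-pm)
    print(exact_power, approx_power)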
Substituting these results into Eq. 1, we find
P_r = \frac{m^r e^{−pm} p^r}{r!} .     (5)
Now define µ = pm , the average number of radioactive decays in each time interval. In
this limit, the binomial distribution reduces to the following form:
P_r ≃ \frac{µ^r}{r!} e^{−µ} .     (6)
This distribution is called the Poisson distribution. Recall that P_r is the probability of r
counts per time interval and µ is the average number of counts per time interval. We have
just shown that the Poisson distribution is the limit of the binomial distribution in cases
where m is large and p is small. This is the case in most radioactive samples. Therefore, the
Poisson distribution is a good approximation for analyzing counts from a radioactive sample.
For a series of events described by the Poisson distribution, the average expected value
calculated from Eq. 3 is r̄ = µ and the variance, calculated from Eq. 4, is σ^2 = µ.
Problem 9: As an exercise it is interesting to see how closely the Poisson distribution approximates the binomial distribution for the case of 16 eight-sided dice
being rolled with a decay probability of 1/8 each time. Copy the following table
and fill it in.
  r   |  P_r (binomial)  |  P_r (Poisson)
  0   |                  |
  1   |                  |
  2   |                  |
  3   |                  |
  4   |                  |
  5   |                  |
  6   |                  |
  7   |                  |
  8   |                  |
  9   |                  |
 10   |                  |
 11   |                  |
 12   |                  |
 13   |                  |
 14   |                  |
 15   |                  |
 16   |                  |
 sum  |                  |
The similarity between the Poisson and binomial distributions, even in this case which
is far from the limit where the Poisson distribution strictly applies, underlines why it was
so difficult to distinguish the three boxes of the group experiment. The randomness of the
finite set of results in each case masks the small distinctions among the distributions.
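As a numerical illustration of Problem 9, the following Python sketch (illustrative only) tabulates the binomial and Poisson probabilities side by side for m = 16 and p = 1/8; it is intended as a check on your own table, not a substitute for it.

    from math import comb, exp, factorial

    m, p = 16, 1/8
    mu = m * p                       # average number of 'decays' per roll

    def binomial_pmf(r):
        return comb(m, r) * (1 - p)**(m - r) * p**r    # Eq. 1

    def poisson_pmf(r):
        return mu**r * exp(-mu) / factorial(r)         # Eq. 6

    for r in range(m + 1):
        print(f"{r:2d}  {binomial_pmf(r):.6f}  {poisson_pmf(r):.6f}")
    print("sums:", sum(binomial_pmf(r) for r in range(m + 1)),
          sum(poisson_pmf(r) for r in range(m + 1)))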
Note: Both µ^r and r! are large even though their ratio might be relatively small.
In general, if you wish to evaluate such expressions numerically, it is better to
find a form that does not involve the ratio of two large numbers that evaluates to
a small number. Thus one further approximation is useful. For µ ≫ 1, one can
show that the Poisson distribution approaches a Gaussian distribution of mean µ
and standard deviation √µ. Thus, in this limit,

P_r ≃ \frac{1}{\sqrt{2πµ}} e^{−(r−µ)^2 / 2µ} .     (7)
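One standard way to avoid the ratio of two large numbers is to work with logarithms. The Python sketch below (illustrative only; the value µ = 500 is chosen purely to show the overflow problem) evaluates ln P_r = r ln µ − µ − ln r! with math.lgamma and exponentiates only at the end.

    from math import lgamma, log, exp

    def poisson_pmf(r, mu):
        """Poisson probability computed via logarithms to avoid overflow."""
        log_p = r * log(mu) - mu - lgamma(r + 1)   # lgamma(r+1) = ln(r!)
        return exp(log_p)

    # Evaluating mu**r and r! separately would overflow ordinary floats here.
    print(poisson_pmf(500, 500.0))   # probability of 500 counts when mu = 500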
The Poisson distributions for µ = 1, 3 and 10 are shown in Fig. 2.
Figure 2: Poisson distributions (probability P_r versus observed counts r) for µ = 1, 3 and 10,
with the Gaussian approximation overlaid. For µ ≫ 1, the Poisson distribution approaches a
Gaussian distribution of mean µ and standard deviation √µ.
The Interval Distribution
Do you know the story of Schrödinger’s Cat? Erwin Schrödinger proposed putting a cat
in a box in which there was a device that would kill the cat upon the detection of a single
radioactive decay event. There was a great deal of ceremony about the method of potential
execution. The cat would first be put in the box and the lid fastened securely. Then the
electronics would be turned on for a predetermined time interval over which there would be
exactly a 50% chance of detecting a radioactive decay event. The question then posed was
whether the cat would really be either alive or dead before an observer actually unfastened
the lid and peered in to confirm the result. (If you think the answer is obvious, then you
have yet to be introduced to the subtle philosophical quibbles of quantum mechanics. And
if you think this example proves the cruel inhumanity of physicists, keep in mind that the
experiment was only ever intended as a hypothetical one, i.e., a “Gedanken” experiment.)
With what probability does the cat live t seconds after the insertion of the radioactive
sample? This question can be answered by determining the probability distribution of the
time intervals between the detection of radioactive decay events.
Imagine that we have a radioactive sample, a detector and some stopwatches. Over a
period of time, we detect a series of pulses, each representing one detected event. We can
characterize the pulse series by the time intervals between events. When a decay is observed
we will start a stopwatch and stop it when the next decay is observed. We will repeat this
many times and plot a histogram of the measured time intervals.
[Diagram: a train of detected pulses along a time axis, with the interval between successive
pulses marked.]
We can derive the expected distribution for the case where the decays occur randomly.
Let the probability per unit time of measuring a decay be a and let it be constant with
time. Let the probability of measuring a decay in a finite time interval be P. Note that
there is a difference between a and P: a is the probability per unit time of measuring a decay
in an infinitesimal time interval, given that the atom is undecayed immediately prior to that
interval, whereas P applies to a finite time interval and reflects the cumulative likelihood of
a decay occurring as time elapses over that period of observation. The difference is especially
significant if a is much greater than zero.
So, if we start observing at t = 0, what is the probability that a decay has not occurred
before a later time t_d? If we slice the time into intervals of ∆t each, then there will be t_d/∆t
intervals before time t_d. The probability of measuring a decay in each time slice is a ∆t.
Thus the probability that a decay hasn’t been measured up to a time t_d is

P_{no decay}(t < t_d) = (1 − a ∆t)^{t_d/∆t} .
The composite probability that the decay does not occur before t_d and that the first decay
occurs in the interval ∆t at time t_d is

P(t_d ± ∆t/2) = (1 − a ∆t)^{t_d/∆t} a ∆t = e^{−a t_d} a ∆t ,

where we have multiplied the probability of no decay occurring up to time t_d by the
probability of measuring a decay in the time slice ∆t.
If we divide P(t ± ∆t/2) by ∆t and take the limit as ∆t → 0, we get the probability per unit
time that the first event occurs after an interval t:

\lim_{∆t→0} \frac{P(t ± ∆t/2)}{∆t} = a e^{−a t} = p(t; a) .     (8)
Thus p(t; a) is the interval distribution function: it describes the probability per unit time
that the interval between successive events has length t. It is also known as the exponential
distribution. It is a probability density function and has units of probability per unit time.
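The following Monte Carlo sketch in Python (illustrative only; the rate a = 2 and the slice width ∆t are arbitrary choices) generates decays with constant probability a ∆t per time slice, records the intervals between successive events, and compares a coarse histogram of those intervals with a e^{−at}.

    import random
    from math import exp

    a, dt = 2.0, 1e-3            # decay probability per unit time; time-slice width
    intervals, t, last = [], 0.0, 0.0

    while len(intervals) < 2000:
        t += dt
        if random.random() < a * dt:      # a decay is observed in this time slice
            intervals.append(t - last)
            last = t

    # Compare a coarse histogram of the measured intervals with a*exp(-a*t).
    bin_width = 0.25
    for i in range(5):
        lo, hi = i * bin_width, (i + 1) * bin_width
        measured = sum(1 for x in intervals if lo <= x < hi) / (len(intervals) * bin_width)
        predicted = a * exp(-a * (lo + hi) / 2)
        print(f"{lo:.2f}-{hi:.2f}: measured {measured:.2f}, predicted {predicted:.2f}")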
Properties of Probability Density Functions
The binomial and Poisson distributions discussed in the first sections are discrete distributions. In contrast, the interval distribution is by nature continuous because t can take on
any value.
For discrete distributions, the normalization condition, and the mean and variance of the
distribution are given by Eqs. 2, 3 and 4, respectively.
For a continuous distribution, analogous definitions must be made.
Any probability density must be normalized to unity (this may require multiplying p(x)
by a suitable constant):
\int_{−∞}^{∞} p(x)\, dx = 1 .     (9)
Assuming a normalised probability density function, the average of a continuous variable
x can be calculated using the formula:
x̄ = \int_{−∞}^{∞} x\, p(x)\, dx .     (10)
The variance of the distribution can be calculated as follows:
σ^2 = \overline{x^2} − x̄^2 = \int_{−∞}^{∞} x^2 p(x)\, dx − \left( \int_{−∞}^{∞} x\, p(x)\, dx \right)^2 .     (11)

Problem 10: Calculate the mean and variance of the interval distribution.
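If you want to check your analytic answers to Problem 10 numerically, the following Python sketch (illustrative only; the value a = 2 is arbitrary) evaluates Eqs. 9–11 for the interval distribution by simple midpoint integration.

    from math import exp

    def moments(p, lo, hi, n=100000):
        """Numerically evaluate Eqs. 9-11 for a density p(x) on [lo, hi]."""
        dx = (hi - lo) / n
        xs = [lo + (i + 0.5) * dx for i in range(n)]
        norm = sum(p(x) for x in xs) * dx                     # Eq. 9
        mean = sum(x * p(x) for x in xs) * dx                 # Eq. 10
        var  = sum(x * x * p(x) for x in xs) * dx - mean**2   # Eq. 11
        return norm, mean, var

    a = 2.0                                   # illustrative decay rate

    def p_interval(t):
        """Interval (exponential) distribution p(t; a) = a e^(-a t) for t >= 0."""
        return a * exp(-a * t) if t >= 0 else 0.0

    print(moments(p_interval, 0.0, 20.0))     # compare with your analytic answers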