PHYS 233
Probability Distributions
Introduction to Probability Distributions
This is not intended to be an alternative to reading a textbook on probability and probability
distributions. More detail can be found in the following references:
References
1. ‘An Introduction to Error Analysis’ by J.R. Taylor, University Science Books, 1997.
2. ‘Data Reduction and Error Analysis for the Physical Sciences’ by P.R. Bevington and
D.K. Robinson, McGraw Hill, 1992.
Introduction
Probability distributions describe the probability of observing a particular event. There are
several probability distributions that are important to physicists. The binomial distribution,
while not of much practical significance, is easy to describe, and can be used to derive the
other distributions used most often by experimental physicists: the Gaussian and Poisson
distributions. The Gaussian, or normal distribution, is the most important as it is most often
used to describe the distribution of results for any measurement subject to small, random
error. The Poisson distribution is particularly useful in describing counting experiments.
A fourth distribution, the exponential distribution or interval distribution, describes the
distribution of intervals between counting events. In this lab, you will investigate the Poisson
and interval distributions.
Probability
In order to understand the statistical methods of dealing with random processes and how
some predictability can be garnered from such chance events, we will examine some simple cases involving coin tosses and dice. First we introduce three important properties of
probability:
1. If you consider two possible events A and B which are mutually exclusive (that is, if
A happens B cannot happen and vice versa) then the probability of either A or B
happening is the sum of the probabilities of A and B: P (A or B) = P (A) + P (B). An
example of two such events would be a coin toss where there are two possible events,
A = heads or B = tails.
2. The sum of the probabilities of all possible mutually exclusive events of a trial is unity,
because one of the events must happen in every trial: P (A) + P (B) + P (C) + ... = 1.
In our coin toss example, the coin must turn up either heads or tails.
3. The probability that two independent events will both happen is the product of the
probabilities of the two single events: P (A and B) = P (A) · P (B). An example of two
independent events would be two coin tosses.
From these rules we can draw the following conclusions:

- If a trial has n and only n possible different outcomes, and if you know that all of the
outcomes have equal a priori probabilities of happening, then the probability of a given
outcome must be equal to 1/n.

- If you classify the outcomes of a trial into different classes, and if the number of events
belonging to one class is m, the probability that an event belonging to that class will
happen is m/n.
We have to bear in mind that the concept of “equal probability” of events has to be
derived from experience. Once we have classified by experience all the possible different and
mutually exclusive events in such a manner that they have equal a priori probability, we can
apply the rules of probabilities for detailed calculations. The key problem, therefore, is to
identify which events have equal a priori probability. It requires considerable care to avoid
mistakes. For example, if you toss two coins, you might argue that there are three possible
outcomes: two heads, two tails, or one head and one tail. If you assume that each of these
outcomes is equally likely, then the predicted probability would be 1/3 for each. Experience
shows this to be wrong. The mistake lies in treating two different events, heads followed by
tails and tails followed by heads, as a single event. This nuance will be clarified by working
out in detail the case of tossing four coins.
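To make the point concrete, one can simply enumerate the four equally likely ordered outcomes of two coin tosses. The short Python sketch below is purely illustrative (it is not required for the lab); it shows that “one head and one tail” occurs with probability 1/2, not 1/3.

    from itertools import product

    # The four equally likely ordered outcomes of two coin tosses.
    outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT

    def prob(event):
        """Fraction of the equally likely outcomes satisfying the event."""
        return sum(1 for o in outcomes if event(o)) / len(outcomes)

    print(prob(lambda o: o.count("H") == 2))   # two heads          -> 0.25
    print(prob(lambda o: o.count("H") == 1))   # one head, one tail -> 0.5
    print(prob(lambda o: o.count("H") == 0))   # two tails          -> 0.25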
Example 1: Four coins
Toss four coins. Each coin has a 50% probability of turning up heads and a 50%
probability of turning up tails. (This seems logical, but it is an assumption that
should be justified by experience.) Let p represent the probability of heads and
q = 1 − p that of tails: p = 0.5, q = 0.5.
The probability of no heads in a toss is the probability that all four coins turn
up tails simultaneously:
(probability coin A is tails and coin B is tails and coin C is tails and
coin D is tails) = (probability coin A is tails) x (probability coin B is
tails) x (probability coin C is tails) x (probability coin D is tails).
There are 16 different ways the toss can turn out if we can distinguish which coin
is which. Each of the 16 ways is equally likely and only 1 of those sixteen ways
is all tails. Let P_0 represent the probability of none of the four coins turning up
heads:

P_0 = q^4 = 0.5 × 0.5 × 0.5 × 0.5 = 1/16 .
There are four ways that one coin can turn up heads. Coin A can be heads, coin
B could be heads, coin C could be heads or coin D could be heads. Each one of
these has a probability of p q^3 = 1/16. Thus there are 4 chances out of 16 for 1
head if we don’t care which coin is heads:

P_1 = p q^3 + q p q^2 + q^2 p q + q^3 p = 4 × (1/16) = 1/4 .
The probability that both coins of a specific pair are heads and the other two are
tails is p2 q 2 . To calculate the probability that any two coins be heads we have
to figure out how many different pairs there are. How many different ways can
the four coins turn up two heads and two tails? Consider choosing the two coins
that are to be heads. There are four ways of choosing the first coin and three
ways of choosing the second so that there are 4 × 3 or 12 ways of choosing two
from four (“four choose two”, or 4C2). But half of these 12 are really the same
two coins that have been chosen in a different order. For example if we label the
coins ABCD we can choose two in the following possible ways:
AB
AC
AD
BA (same as AB)
BC
BD
CA (same as AC)
CB (same as BC)
CD
DA (same as AD)
DB (same as BD)
DC (same as CD)
Those cases where the same two coins have been chosen but in a different order
must be eliminated from the count. The ways of choosing two different coins
from among four are shown in Fig. 1. Thus you can see that the total number is
(4 × 3)/2 = 6 .

Figure 1: Choosing two coins from four.
You should be able to convince yourself that the number of different ways r things can
be chosen from m, when the order is unimportant, is

\frac{m!}{(m − r)!\, r!} .

The logic in this formula is as follows: the number of ways one can choose r from m without
regard to duplication is m(m − 1)(m − 2)...(m − r + 1), which is m!/(m − r)!. This quantity
must be divided by r! to account for duplicates consisting of the same coins chosen in a
different order. This is the number of different possible combinations of m items taken r at
a time.
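As a quick numerical check, Python’s built-in math.comb evaluates this combination formula directly; the minimal sketch below (illustrative only) reproduces the count of 6 pairs found above.

    from math import comb, factorial

    m, r = 4, 2
    # Number of ways to choose r items from m when order is unimportant.
    print(comb(m, r))                                         # 6
    print(factorial(m) // (factorial(m - r) * factorial(r)))  # same result: 6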
The Binomial Distribution
Now we are ready to write down an expression for the probability distribution that describes
the likelihood of r events (e.g. heads) occurring in a total of m events (e.g. coin flips)
where the probability of an r-event occurring is p while the probability of it not occurring
is (1 − p). Since the individual events occur independently, the probability of a particular
subset of r events amongst the m is the product of the individual probabilities. If r occur,
then m − r don’t, and the probability is p^r (1 − p)^{m−r}. For the total probability of a
particular event occurring (e.g. 2 heads), we multiply the probability that the event occurs
by the number of ways that event can occur. The complete formula for the probability
distribution is then given by

P_r = \frac{m!}{(m − r)!\, r!} (1 − p)^{m−r} p^r .     (1)
This distribution is called the binomial distribution. It describes the probability that r events
occur among a total of m independent events. Note that it is a discrete distribution; it is
defined only at integral values of the variable r.
We can now use Eq. 1 to calculate the probability of getting two heads among four coins.
Remember, for the coin toss, the number of events is r = 2 out of a total of m = 4 coins and
the probability of each event is p = 1/2. Then
P_2 = \frac{4!}{2!\, 2!} \left(1 − \frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^2 = \frac{3}{8} .

The other values of P_r can be obtained similarly.
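For readers who want to check this result by machine, the following minimal Python sketch (illustrative only) evaluates Eq. 1 directly for the case worked out above.

    from math import comb

    m, p, r = 4, 0.5, 2
    # Eq. 1 evaluated directly: probability of two heads among four coins.
    P2 = comb(m, r) * (1 - p)**(m - r) * p**r
    print(P2)   # 0.375 = 3/8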
Problem 1: Use Eq. 1 to complete column 2 of the following table. Plot the histogram of values.
  r   |  P_r  |  r P_r  |  (r − r̄)^2 P_r
  0   |       |         |
  1   |       |         |
  2   |       |         |
  3   |       |         |
  4   |       |         |
 sum  |       |         |
Recall that the total probability of all possible events must sum to unity:
\sum_{r=0}^{m} P_r = 1 .     (2)
Problem 2: Verify that this sum does work out to unity. Sum the entries of the
second column and write your result in the last row of the table.
The third column of the table allows you to work out the average number of heads in a
given toss. Given the probabilities Pr for each different outcome, the average of r can be
calculated using the following simple formula:
r̄ = \sum_{r=0}^{m} r P_r .     (3)
Using this definition and Eq. 1 we expect that, for a binomial distribution, r̄ = m p.
Problem 3: Fill in the third column and add up the terms. Is the average reasonable?
The fourth column allows you to work out the variance. Given the probabilities Pr for
each different outcome, the variance can be calculated using the following simple formula:
σ^2 = \sum_{r=0}^{m} (r − r̄)^2 P_r .     (4)

For a binomial distribution, σ^2 = m p (1 − p).
Problem 4: Fill in column 4. Is the variance reasonable?
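If you would like to verify your hand-filled table, the following Python sketch (illustrative only; the function name binomial_pmf is our own) tabulates the three columns for the four-coin case and checks the sums against r̄ = m p and σ^2 = m p (1 − p). Changing m and p lets you reuse it for the dice examples that follow.

    from math import comb

    def binomial_pmf(r, m, p):
        """Eq. 1: probability of r successes among m independent trials."""
        return comb(m, r) * (1 - p)**(m - r) * p**r

    m, p = 4, 0.5                      # four coins; use m=12, p=1/6 for twelve dice
    P = [binomial_pmf(r, m, p) for r in range(m + 1)]

    total = sum(P)                                            # Eq. 2: should be 1
    rbar  = sum(r * P[r] for r in range(m + 1))               # Eq. 3: should equal m*p
    var   = sum((r - rbar)**2 * P[r] for r in range(m + 1))   # Eq. 4: should equal m*p*(1-p)

    for r in range(m + 1):
        print(r, P[r], r * P[r], (r - rbar)**2 * P[r])
    print("sum =", total, " mean =", rbar, " variance =", var)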
Note: If one expands (p + q)^4 one gets p^4 + 4p^3 q + 6p^2 q^2 + 4p q^3 + q^4. Each term of this
expansion corresponds to one of the probabilities in the table of Problem 1. This “binomial
expansion” was described by Newton. The factors of each term can be figured out using
“Pascal’s Triangle”, promulgated by Pascal. The sides of Pascal’s triangle are 1’s. Interior
numbers are obtained by summing the two numbers above it, to the left and to the right.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
Example 2: Twelve six-sided dice
Here we will let twelve six-sided dice represent twelve total events. After a roll
of the dice, a die that turns up a ’snake eye’ can be our choice of event that we want to
keep track of. The probability of this
event occurring is p = 1/6. (Why?)
Problem 5: Work out the probability of rolling r = 0 ... 12 snake eyes and complete
a table similar to the one you used in Problem 2. Plot a histogram of values. Also
verify that the sum of the probabilities is unity, and that the average number of
snake eyes and the variance are reasonable.
Example 3: Sixteen eight-sided dice
The event of interest is again rolling a ’snake eye’.
Problem 6: What is m and p for this example? Work out Pr for r = 0 ... 16 and
complete a table similar to that used in Problem 2. Plot a histogram of values.
Also verify that the sum of the probabilities is unity, and that the average number
of snake eyes and the variance are reasonable.
The Poisson Distribution
The decay of radioactive atoms provides another convenient source of random events to
help us explore how we can use statistics to deal with randomness. A sample of radioactive
material contains a large number of atoms. Many of these atoms are unstable and will
transform to another element or isotope by emitting a photon, electron or alpha particle.
We will assume that, once an unstable ”parent” decays, the resulting ”daughter” is stable
and can emit no more particles. In more complicated cases, the daughter might be unstable
as well but we will not deal with that situation now.
Even though the time at which any particular atom will decay is unknown, there is some
regularity in the process that we can discover by looking at the average behavior of a large
number of atoms over a long time. For example, the fraction of unstable atoms that decays
in a certain time period, for example one second, fluctuates around a well-defined average
value.
Two characteristics are important in understanding radioactive decay. First, the probability per unit time that an undecayed atom will decay within an infinitesimal time interval
∆t is a constant:
\frac{\text{Probability of decay in } ∆t}{∆t} → a   as ∆t → 0 ,
where a is the probability per unit time of observing a decay. Second, the atoms are independent; the state of any atom does not affect another.
We can use the concepts developed in the previous sections to describe the probability of
radioactive decay occurring in a number of unstable atoms by realizing that each radioactive
atom is equivalent to a coin or die, that the passing of a one-second time interval is equivalent
to each toss of four coins or twelve dice, and decay of an atom is equivalent to a coin turning
up heads or a die turning up a ’snake eye’.
The case of radioactive decay is of course different from that of the coin and dice experiments we have been discussing. In a real radioactive sample there are a huge number of
atoms, but each one has a small probability of decay, i.e. m → ∞, p → 0, but their product
remains finite. In this case it is possible to make some approximations that simplify Eq. 1.
1. For r ≪ m,

\frac{m!}{(m − r)!} = m(m − 1)...(m − r + 1) ≃ m^r .
Problem 7: Work out how much difference this approximation makes for
100!/95!.
2. For small p,

(1 − p) ≃ e^{−p} .

This comes from the Taylor expansion of the exponential function. When p is much less
than unity, the squared, cubic and higher order terms of the expansion are negligible.
Thus

(1 − p)^{m−r} ≃ e^{−p(m−r)} = e^{−pm} e^{pr} ≃ e^{−pm} · 1 = e^{−pm} .
Problem 8: Work out how much percentage difference this approximation makes
for p = 0.1, m = 100, and r = 5.
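The sketch below (Python, with purely illustrative values of m, r and p; Problems 7 and 8 should still be worked with the values given there) shows one way such approximations can be compared numerically.

    from math import factorial, exp

    m, r, p = 20, 3, 0.02    # illustrative values only

    exact_ratio  = factorial(m) // factorial(m - r)   # m!/(m-r)!
    approx_ratio = m**r                               # approximation 1: m^r
    print(exact_ratio, approx_ratio)

    exact_power  = (1 - p)**(m - r)                   # (1-p)^(m-r)
    approx_power = exp(-p * m)                        # approximation 2: e^(-pm)
    print(exact_power, approx_power)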
Substituting these results into Eq. 1, we find
P_r = \frac{m^r e^{−pm} p^r}{r!} .     (5)
Now define µ = pm , the average number of radioactive decays in each time interval. In
this limit, the binomial distribution reduces to the following form:
P_r ≃ \frac{µ^r}{r!} e^{−µ} .     (6)
This distribution is called the Poisson distribution. Recall that P_r is the probability of r
counts per time interval and µ is the average number of counts per time interval. We have
just shown that the Poisson distribution is the limit of the binomial distribution in cases
where m is large and p is small. This is the case in most radioactive samples. Therefore, the
Poisson distribution is a good approximation for analyzing counts from a radioactive sample.
For a series of events described by the Poisson distribution, the average expected value
calculated from Eq. 3 is r̄ = µ and the variance, calculated from Eq. 4, is σ^2 = µ.
Problem 9: As an exercise it is interesting to see how closely the Poisson distribution approximates the binomial distribution for the case of 16 eight-sided dice
being rolled with a decay probability of 1/8 each time. Copy the following table
and fill it in.
  r   |  P_r (binomial)  |  P_r (Poisson)
  0   |                  |
  1   |                  |
  2   |                  |
  3   |                  |
  4   |                  |
  5   |                  |
  6   |                  |
  7   |                  |
  8   |                  |
  9   |                  |
 10   |                  |
 11   |                  |
 12   |                  |
 13   |                  |
 14   |                  |
 15   |                  |
 16   |                  |
 sum  |                  |
The similarity between the Poisson and binomial distributions, even in this case which
is far from the limit where the Poisson distribution strictly applies, underlines why it was
so difficult to distinguish the three boxes of the group experiment. The randomness of the
finite set of results in each case masks the small distinctions among the distributions.
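As a numerical illustration of Problem 9, the following Python sketch (illustrative only) tabulates the binomial and Poisson probabilities side by side for m = 16 and p = 1/8; it is intended as a check on your own table, not a substitute for it.

    from math import comb, exp, factorial

    m, p = 16, 1/8
    mu = m * p                       # average number of 'decays' per roll

    def binomial_pmf(r):
        return comb(m, r) * (1 - p)**(m - r) * p**r    # Eq. 1

    def poisson_pmf(r):
        return mu**r * exp(-mu) / factorial(r)         # Eq. 6

    for r in range(m + 1):
        print(f"{r:2d}  {binomial_pmf(r):.6f}  {poisson_pmf(r):.6f}")
    print("sums:", sum(binomial_pmf(r) for r in range(m + 1)),
          sum(poisson_pmf(r) for r in range(m + 1)))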
Note: Both µ^r and r! are large even though their ratio might be relatively small.
In general, if you wish to evaluate such expressions numerically, it is better to
find a form that does not involve the ratio of two large numbers that evaluates to
a small number. Thus one further approximation is useful. For µ ≫ 1, one can
show that the Poisson distribution approaches a Gaussian distribution of mean µ
and standard deviation √µ. Thus, in this limit,

P_r ≃ \frac{1}{\sqrt{2πµ}} e^{−(r−µ)^2 / 2µ} .     (7)
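One standard way to avoid the ratio of two large numbers is to work with logarithms. The Python sketch below (illustrative only; the value µ = 500 is chosen purely to show the overflow problem) evaluates ln P_r = r ln µ − µ − ln r! with math.lgamma and exponentiates only at the end.

    from math import lgamma, log, exp

    def poisson_pmf(r, mu):
        """Poisson probability computed via logarithms to avoid overflow."""
        log_p = r * log(mu) - mu - lgamma(r + 1)   # lgamma(r+1) = ln(r!)
        return exp(log_p)

    # Evaluating mu**r and r! separately would overflow ordinary floats here.
    print(poisson_pmf(500, 500.0))   # probability of 500 counts when mu = 500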
The Poisson distributions for µ = 1, 3 and 10 are shown in Fig. 2.
Figure 2: Poisson distributions (probability P_r versus observed counts r) for µ = 1, 3 and 10,
with the Gaussian approximation overlaid. For µ ≫ 1, the Poisson distribution approaches a
Gaussian distribution of mean µ and standard deviation √µ.
The Interval Distribution
Do you know the story of Schrödinger’s Cat? Erwin Schrödinger proposed putting a cat
in a box in which there was a device that would kill the cat upon the detection of a single
radioactive decay event. There was a great deal of ceremony about the method of potential
execution. The cat would first be put in the box and the lid fastened securely. Then the
electronics would be turned on for a predetermined time interval over which there would be
exactly a 50% chance of detecting a radioactive decay event. The question then posed was
whether the cat would really be either alive or dead before an observer actually unfastened
the lid and peered in to confirm the result. (If you think the answer is obvious, then you
have yet to be introduced to the subtle philosophical quibbles of quantum mechanics. And
if you think this example proves the cruel inhumanity of physicists, keep in mind that the
experiment was only ever intended as a hypothetical one, i.e., a “Gedanken” experiment.)
With what probability does the cat live t seconds after the insertion of the radioactive
sample? This question can be answered by determining the probability distribution of the
time intervals between the detection of radioactive decay events.
Imagine that we have a radioactive sample, a detector and some stopwatches. Over a
period of time, we detect a series of pulses, each representing one detected event. We can
characterize the pulse series by the time intervals between events. When a decay is observed
we will start a stopwatch and stop it when the next decay is observed. We will repeat this
many times and plot a histogram of the measured time intervals.
[Diagram: a train of detected pulses along a time axis, with the interval between successive
pulses marked.]
We can derive the expected distribution for the case where the decays occur randomly.
Let the probability per unit time of measuring a decay be a and let it be constant with
time. Let the probability of measuring a decay in a finite time interval be P. Note that
there is a difference between a and P: a is the probability per unit time of measuring a decay
in an infinitesimal time interval, given that the atom is undecayed immediately prior to that
interval, whereas P applies to a finite time interval and reflects the cumulative likelihood of
a decay occurring as time elapses over that period of observation. The difference is especially
significant if a is much greater than zero.
So, if we start observing at t = 0, what is the probability that a decay has not occurred
before a later time t_d? If we slice the time into intervals of ∆t each, then there will be t_d/∆t
intervals before time t_d. The probability of measuring a decay in each time slice is a ∆t.
Thus the probability that a decay hasn’t been measured up to a time t_d is

P_{no decay}(t < t_d) = (1 − a ∆t)^{t_d/∆t} .
The composite probability that the decay does not occur before t_d and that the first decay
occurs in the interval ∆t at time t_d is

P(t_d ± ∆t/2) = (1 − a ∆t)^{t_d/∆t} a ∆t = e^{−a t_d} a ∆t ,

where we have multiplied the probability of no decay occurring up to time t_d by the
probability of measuring a decay in the time slice ∆t.
If we divide P(t ± ∆t/2) by ∆t and take the limit as ∆t → 0, we get the probability per unit
time that the first event occurs after an interval t:

\lim_{∆t→0} \frac{P(t ± ∆t/2)}{∆t} = a e^{−a t} = p(t; a) .     (8)
Thus p(t; a) is the interval distribution function: it describes the probability per unit time
that the interval between successive events has length t. It is also known as the exponential
distribution. It is a probability density function and has units of probability per unit time.
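The following Monte Carlo sketch in Python (illustrative only; the rate a = 2 and the slice width ∆t are arbitrary choices) generates decays with constant probability a ∆t per time slice, records the intervals between successive events, and compares a coarse histogram of those intervals with a e^{−at}.

    import random
    from math import exp

    a, dt = 2.0, 1e-3            # decay probability per unit time; time-slice width
    intervals, t, last = [], 0.0, 0.0

    while len(intervals) < 2000:
        t += dt
        if random.random() < a * dt:      # a decay is observed in this time slice
            intervals.append(t - last)
            last = t

    # Compare a coarse histogram of the measured intervals with a*exp(-a*t).
    bin_width = 0.25
    for i in range(5):
        lo, hi = i * bin_width, (i + 1) * bin_width
        measured = sum(1 for x in intervals if lo <= x < hi) / (len(intervals) * bin_width)
        predicted = a * exp(-a * (lo + hi) / 2)
        print(f"{lo:.2f}-{hi:.2f}: measured {measured:.2f}, predicted {predicted:.2f}")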
Properties of Probability Density Functions
The binomial and Poisson distributions discussed in the first sections are discrete distributions. In contrast, the interval distribution is by nature continuous because t can take on
any value.
For discrete distributions, the normalization condition, and the mean and variance of the
distribution are given by Eqs. 2, 3 and 4, respectively.
For a continuous distribution, analogous definitions must be made.
Any probability density must be normalized to unity (this may require multiplying p(x)
by a suitable constant):
\int_{−∞}^{∞} p(x)\, dx = 1 .     (9)
Assuming a normalised probability density function, the average of a continuous variable
x can be calculated using the formula:
x̄ = \int_{−∞}^{∞} x\, p(x)\, dx .     (10)
The variance of the distribution can be calculated as follows:
σ^2 = \overline{x^2} − x̄^2 = \int_{−∞}^{∞} x^2 p(x)\, dx − \left( \int_{−∞}^{∞} x\, p(x)\, dx \right)^2 .     (11)

Problem 10: Calculate the mean and variance of the interval distribution.
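If you want to check your analytic answers to Problem 10 numerically, the following Python sketch (illustrative only; the value a = 2 is arbitrary) evaluates Eqs. 9–11 for the interval distribution by simple midpoint integration.

    from math import exp

    def moments(p, lo, hi, n=100000):
        """Numerically evaluate Eqs. 9-11 for a density p(x) on [lo, hi]."""
        dx = (hi - lo) / n
        xs = [lo + (i + 0.5) * dx for i in range(n)]
        norm = sum(p(x) for x in xs) * dx                     # Eq. 9
        mean = sum(x * p(x) for x in xs) * dx                 # Eq. 10
        var  = sum(x * x * p(x) for x in xs) * dx - mean**2   # Eq. 11
        return norm, mean, var

    a = 2.0                                   # illustrative decay rate

    def p_interval(t):
        """Interval (exponential) distribution p(t; a) = a e^(-a t) for t >= 0."""
        return a * exp(-a * t) if t >= 0 else 0.0

    print(moments(p_interval, 0.0, 20.0))     # compare with your analytic answers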