Chapter 8
Discrete probability and the laws of chance

8.1 Introduction
In this chapter we lay the groundwork for calculations and rules governing simple discrete probabilities. These steps will be essential to developing the skills to analyze and understand problems of genetic diseases, genetic codes, and a vast array of other phenomena where the laws of chance intersect the processes of biological systems. To gain experience with probability, it is important to see simple examples. We start this introduction with experiments that can be easily reproduced and tested by the reader.
8.2 Simple experiments
Consider the following experiment: We flip a coin and observe one of two possible results: "heads" (H) or "tails" (T). A fair coin is one for which these results are equally likely. Similarly, consider the experiment of rolling a die: a six-sided die can land on any of its six faces, so that a "single experiment" has six possible outcomes. We anticipate getting each of the results with equal probability, i.e. if we were to repeat the same experiment many, many times, we would expect that, on average, the six possible events would occur with similar frequencies. We say that the events are random and unbiased for a "fair" die. How likely are we to roll a 5 and a 6 in successive experiments? A five or a six? If we toss a coin ten times, how probable is it that we get 8 heads out of ten tosses? For a given experiment such as the one described here, we are interested in quantifying how likely it is that a certain event is obtained. Our goal in this chapter is to make our notion of probability more precise, and to examine ways of quantifying and computing probabilities for experiments such as these. To motivate this investigation, we first look at results of a real experiment performed in class by students.
8.3 Empirical probability
Each student in a class of N = 121 individuals was asked to toss a penny 10 times. The students were then asked to record their results and to indicate how many heads they had obtained in this sequence of tosses. (Note that the order of the heads was not taken into account, only how many were obtained out of the 10 tosses.) The table shown below specifies the number, k, of heads (column 1) and the number, $x_k$, of students who responded that they had obtained that many heads (column 2). In column 3 we display the cumulative number of students who got any number up to and including k heads. In column 4 we compute the fraction of the class, $p(x_k) = x_k/N$, who got exactly k heads; we will henceforth associate this fraction with the empirical probability of k heads. In the last column (column 5) we include the cumulative probability, i.e. the sum of the empirical probabilities of getting any number up to and including k heads.
Number of heads   Frequency   Cumulative    Probability        Cumulative
       k             x_k      sum of x_i    p(x_k) = x_k/N     probability
       0              0            0           0.00               0.00
       1              1            1           0.0083             0.0083
       2              2            3           0.0165             0.0248
       3             10           13           0.0826             0.1074
       4             27           40           0.2231             0.3306
       5             26           66           0.2149             0.5455
       6             34          100           0.2810             0.8264
       7             14          114           0.1157             0.9421
       8              7          121           0.0579             1.00
       9              0          121           0.00               1.00
      10              0          121           0.00               1.00
Table 8.1: Results of a real experiment carried out by 121 students in this mathematics course.
Each student tossed a coin 10 times. We recorded the number of students who got 0, 1, 2, etc
heads. The fraction of the class that got each outcome is identified with the (empirical) probability
of that outcome. See Figure 8.1 for the same data presented graphically.
In Figure 8.1 we show what this distribution looks like on a bar graph. We observe that this
“empirical” distribution is not very symmetric, because it is based on a total of only 121 trials
(i.e. 121 repetitions of the experiment of 10 tosses). However, it is clear from this distribution
that certain results occurred more often (and hence are associated with a greater probability) than
others. To the right, we also show the cumulative distribution function, superimposed as an xy-plot
on the same graph. Observe that this function starts with the value 0 and climbs up to value 1,
since the probabilities of any of the events (0, 1, 2, etc heads) must add up to 1.
[Figure 8.1 appears here: left panel, bar graph of the empirical probability of k heads in 10 tosses (vertical scale 0 to 0.4); right panel, the same bar graph with the cumulative distribution superimposed (vertical scale 0 to 1.0); horizontal axes show the number of heads (k), from 0 to 10.]
Figure 8.1: The data from Table 8.1 is shown plotted on this graph. A total of N = 121 people were asked to toss a coin n = 10 times. In the bar graph (left), the horizontal axis reflects k, the number of heads (H) that came up during those 10 coin tosses. The vertical axis reflects the fraction $p(x_k)$ of the class that achieved that particular number of heads. The same bar graph is shown on the right, together with the cumulative function that sums up the values from left to right.
8.4 Mean and variance of a probability distribution
In a previous chapter, we considered distributions of grades and computed a mean (also called average) of that distribution. The identical concept applies to the distributions discussed in the context of probability, but here we use the terms mean, average value, and expected value interchangeably.
Suppose we toss a coin n times and let xi stand for the number of heads that are obtained in
those n tosses. Then xi can take on the values xi = 0, 1, 2, 3, . . . n. Let p(xi ) be the probability of
obtaining exactly xi heads. By analogy to ideas in a previous chapter, we would define the mean
(or average or expected value), x̄, of the probability distribution by the ratio
$$\bar{x} = \frac{\sum_{i=0}^{n} x_i\, p(x_i)}{\sum_{i=0}^{n} p(x_i)}.$$
However, this expression can be simplified by observing that, according to property (2) of discrete
probability, the denominator is just
$$\sum_{i=0}^{n} p(x_i) = 1.$$
This explains the following definition of the expected value.
Definition
The expected value x̄ of a probability distribution (also called the mean or average value) is
$$\bar{x} = \sum_{i=0}^{n} x_i\, p(x_i).$$
It is important to keep in mind that the expected value or mean is a kind of “average x coordinate”, where values of x are weighted by their frequency of occurrence. This is similar to the
idea of a center of mass (x positions weighted by masses associated with those positions). The
mean is a point on the x axis, representing the “average” outcome of an experiment. (Recall that
in the distributions we are describing, the possible outcomes of some observation or measurement
process are depicted on the x axis of the graph.) The mean is not the same as the average value of
a function, discussed in an earlier chapter. (In that case, the average is an average y coordinate.)
We also define a numerical quantity that represents the width of the distribution. We define the variance, V, and the standard deviation, σ, as follows:
The variance, V, of a distribution is
$$V = \sum_{i=0}^{n} (x_i - \bar{x})^2\, p(x_i),$$
where x̄ is the mean. The standard deviation, σ, is
$$\sigma = \sqrt{V}.$$
In the problem sets, we show that the variance can also be expressed in the form
$$V = M_2 - \bar{x}^2,$$
where $M_2$ is the second moment of the distribution. Moments of a distribution are defined as the numbers obtained by summing up products of the probability weighted by powers of x. The j'th moment, $M_j$, of a distribution is
$$M_j = \sum_{i=0}^{n} (x_i)^j\, p(x_i).$$
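These formulas translate directly into code. The following Python sketch (our illustration, not part of the original notes; the function names are ours) applies them to the empirical distribution of Table 8.1:

# Mean, variance, and moments of a discrete probability distribution.

def mean(xs, ps):
    """Expected value: sum of x_i * p(x_i)."""
    return sum(x * p for x, p in zip(xs, ps))

def moment(xs, ps, j):
    """j-th moment: sum of x_i**j * p(x_i)."""
    return sum(x**j * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Variance via V = M2 - mean**2."""
    return moment(xs, ps, 2) - mean(xs, ps) ** 2

# Empirical distribution from Table 8.1 (N = 121 students, 10 tosses each).
k = list(range(11))
x_k = [0, 1, 2, 10, 27, 26, 34, 14, 7, 0, 0]
p_k = [x / 121 for x in x_k]

xbar = mean(k, p_k)
V = variance(k, p_k)
print(xbar, V, V**0.5)   # approximately 5.21, 2.05, 1.43

Running this reproduces the numbers computed by hand in the example that follows.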
8.4.1 Example
For the empirical probability distribution shown in Figure 8.1, the mean (expected value) is calculated by performing the following sum, based on the table of events shown above:
$$\bar{x} = \sum_{k=0}^{10} x_k\, p(x_k) = 0(0) + 1(0.0083) + 2(0.0165) + \cdots + 8(0.0579) + 9(0) + 10(0) = 5.2149.$$
The mean number of heads in this set of experiments is about 5.2. Intuitively, we would expect that for a fair coin, half the tosses should produce heads, i.e. on average 5 heads would be obtained out of 10 tosses. Because the empirical distribution is slightly biased, the mean is close to, but not exactly equal to, this intuitive theoretical result.
To compute the variance we form the sum
$$V = \sum_{k=0}^{10} (x_k - \bar{x})^2\, p(x_k) = \sum_{k=0}^{10} (k - 5.2149)^2\, p(k).$$
Here we have used the mean calculated above and the fact that $x_k = k$. We obtain
$$V = (0 - 5.2149)^2(0) + (1 - 5.2149)^2(0.0083) + \cdots + (9 - 5.2149)^2(0) + (10 - 5.2149)^2(0) = 2.0530.$$
The standard deviation is then $\sigma = \sqrt{V} = 1.4328$.
8.5 Theoretical probability
Our motivation in what follows is to put results of an experiment into some rational context. We
would like to be able to predict the distribution of outcomes based on underlying “laws of chance”.
Here we will formalize the basic rules of probability, and learn how to assign probabilities to events
that consist of repetitions of some basic, simple experiment like the coin toss. Intuitively, we expect
that in tossing a fair coin, half the time we should get H and half the time T. But as seen in our
experimental results, there can be long repetitions that result in very few H or very many H, far
from the mean or expected value. How do we assign a theoretical probability to the event that only
1 head is obtained in 10 tosses of a coin? This motivates our more detailed study of the laws of
chance and theoretical probability.
As we have seen in our previous example, the probability p assigns a number to the likelihood
of an outcome of an experiment. In the experiment discussed above, that number was the fraction
of the students who got a certain number of heads in a coin toss repetition.
8.5.1 Basic definitions of probability
Suppose we label the possible results of the experiment by symbols $e_1, e_2, e_3, \ldots, e_k, \ldots, e_m$, where m is the number of possible events (e.g. m = 2 for a coin flip, m = 6 for a roll of a die). We will refer to these as events, and our purpose here will be to assign numbers, called probabilities, p, to these events that indicate how likely it is that they occur. Then the following two conditions are required for p to be a probability:
1. The following inequality must be satisfied:
$$0 \le p(e_k) \le 1 \quad \text{for all events } e_k.$$
Here $p(e_k) = 0$ is interpreted to mean that this event never happens, and $p(e_k) = 1$ means that this event always happens. The probability of each (discrete) event is a number between 0 and 1.
2. If $\{e_1, e_2, e_3, \ldots, e_k, \ldots, e_m\}$ is a list of all the possible events then
$$p(e_1) + p(e_2) + p(e_3) + \cdots + p(e_k) + \cdots + p(e_m) = 1,$$
or simply
$$\sum_{k=1}^{m} p(e_k) = 1.$$
That is, the probabilities of all the events sum up to one, since one or another of the events must always occur.
Definition
The list of all possible "events" $\{e_1, e_2, e_3, \ldots, e_k, \ldots, e_m\}$ is called the sample space.
8.6 Multiple events and combined probabilities
Here we consider an "experiment" that consists of more than one repetition. For example, each student tossed a coin 10 times to generate the data used earlier. We aim to have a way of describing the number of possible events as well as the likelihood of getting any one or another of these events.
First multiplication principle
If there are N1 possible events in experiment 1 and N2 possible events in experiment
2, then there are N1 · N2 possible events of the combined set of experiments.
In the above multiplication principle we assume that the order of the events is important.
Example 1
Each flip of a coin has two events. Flipping two coins can give rise to 2 · 2 = 4 possible events
(where we distinguish between the result TH and HT, i.e. order of occurrence is important.)
Example 2
Rolling a die twice gives rise to 6 · 6 = 36 possible events.
Example 3
A sequence of three letters is to be chosen from a 4-letter alphabet consisting of the letters T, A,
G, C. (For example, TTT, TAG, GCT, and GGA are all examples of this type.) The number of
ways of choosing such 3-letter “words” is 4 × 4 × 4 = 64 since we can pick any of the four letters in
any of the three positions of the sequence.
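The multiplication principle can be checked by brute-force enumeration. Here is a minimal Python sketch (ours, not from the notes) that lists all 3-letter words over the alphabet T, A, G, C:

from itertools import product

# Enumerate all 3-letter "words" over the 4-letter alphabet T, A, G, C.
alphabet = "TAGC"
words = ["".join(w) for w in product(alphabet, repeat=3)]

print(len(words))      # 64, i.e. 4 * 4 * 4
print(words[:4])       # ['TTT', 'TTA', 'TTG', 'TTC']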
8.7 Calculating the theoretical probability
How do we assign a probability to a given set of events? In the first example in this chapter, we
used data to do this, i.e. we repeated an experiment many times, and observed the fraction of
times of occurrence of each event. The resulting distribution of outcomes was used to determine
empirical probability. Here we take the alternate approach: we make some simplifying assumptions
about each elementary event and use the rules of probability to compute a theoretical probability.
Equally likely assumption
One of the most common assumptions is that each event occurs with equal likelihood. Suppose that
there are m possible events and that each is equally likely. Then the probability of each event is
1/m, i.e.
$$P(e_i) = \frac{1}{m} \quad \text{for } i = 1, \ldots, m.$$
Example 1
For a fair coin tossed one time, we expect that the probability of getting H or T is equal. In that
case,
$$P(H) + P(T) = 1, \qquad P(H) = P(T).$$
Together these imply that
$$P(H) = P(T) = \frac{1}{2}.$$
Example 2
For a fair 6-sided die, the same assumption leads to the conclusion that the probability of getting
any one of the six faces as a result of a roll is $P(e_k) = 1/6$ for $k = 1, \ldots, 6$.
Independent events
In order to combine results of several experiments, we need to discuss the notion of independence
of events. Essentially, independent events are those that are not correlated or linked with one
another. For example, we assume in general that the result of one toss of a coin does not influence
the result of a second toss. All theoretical probabilities calculated in this chapter will be based on
this important assumption.
Second multiplication principle
Suppose events e1 and e2 are independent. Then, if the probability of event e1 is
P(e1 ) = p1 and the probability of event e2 is P(e2 ) = p2 , the probability of event e1
and event e2 both occurring is
P(e1 and e2 ) = p1 · p2 .
We also say that this is the probability of event $e_1$ AND event $e_2$. This is sometimes written as $P(e_1 \cap e_2)$.
If the two events $e_1$ and $e_2$ are not independent, the probability of both occurring is
$$P(e_1 \cap e_2) = P(e_1) \cdot P(e_2 \text{ assuming that } e_1 \text{ happened}) = P(e_2) \cdot P(e_1 \text{ assuming that } e_2 \text{ happened}).$$
Example 3
The probability of tossing a coin to get H and rolling a die to get a 6 is the product of the individual probabilities of each of these events, i.e.
$$P(H \text{ and } 6) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}.$$
Example 4
The probability of rolling a die twice, to get a 3 followed by a 4, is
$$P(3 \text{ and } 4) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}.$$
Addition principle
Suppose events e1 and e2 are mutually exclusive. Then the probability of getting
event e1 OR event e2 is given by
P(e1 ∪ e2 ) = p1 + p2 .
In general, i.e. when the events $e_1$ and $e_2$ are not necessarily mutually exclusive, the following equation holds:
$$P(e_1 \cup e_2) = P(e_1) + P(e_2) - P(e_1 \cap e_2).$$
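This identity is easy to verify by brute force over a small sample space. A minimal Python sketch (our own illustration), using a single roll of a fair die:

from fractions import Fraction

# Sample space of one fair die roll; each outcome has probability 1/6.
space = range(1, 7)

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(sum(1 for s in space if s in event), 6)

A = {1, 2, 3}          # e.g. "roll at most 3"
B = {2, 4, 6}          # e.g. "roll an even number"

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A | B))     # 5/6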
Example 5
When we roll a die once, assuming that each face has equal probability of occurring, the chance of getting either a 1 or a 2 (i.e. either of these two possibilities out of a total of six) is
$$P(\{1\} \cup \{2\}) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}.$$
Example 6
When we flip a coin, the probability of getting either heads or tails is
$$P(\{H\} \cup \{T\}) = \frac{1}{2} + \frac{1}{2} = 1.$$
This makes sense since there are only 2 possible events (m = 2), and we said earlier that $\sum_{k=1}^{m} p(e_k) = 1$, i.e. one of the 2 events must always occur.
Subtraction principle
If the probability of event ek is P(ek ) then the probability of NOT getting event ek is
P(not ek ) = 1 − P(ek ).
Example 7
When we roll a die once, the probability of NOT getting the value 2 is
$$P(\text{not } 2) = 1 - \frac{1}{6} = \frac{5}{6}.$$
Alternatively, we can add up the probabilities of getting all the results other than a 2, i.e. 1, 3, 4, 5, or 6, and arrive at the same answer:
$$P(\{1\} \cup \{3\} \cup \{4\} \cup \{5\} \cup \{6\}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{5}{6}.$$
Example 8
A box of jelly beans contains a mixture of 10 red, 18 blue, and 12 green jelly beans. Suppose that these are well-mixed and that the probability of pulling out any one jelly bean is the same.
(a) What is the probability of randomly selecting two blue jelly beans from the box?
(b) What is the probability of randomly selecting two beans that have the same color?
(c) What is the probability of randomly selecting two beans that have different colors?
Solution
There are a total of 40 jelly beans in the box. In a random selection, we assume that each jelly bean has an equal likelihood of being selected.
(a) Suppose we take out one jelly bean and then a second. Once we take out the first, there will be 39 left. If the first one was blue (with probability 18/40), then there will be 17 blue ones left in the box. Thus, the probability of selecting a blue bean AND another blue bean is
$$P(2 \text{ blue}) = \frac{18}{40} \cdot \frac{17}{39} = 0.196.$$
The same answer is obtained by considering pairs of jelly beans. There are a total of (40 × 39)/2 pairs, and out of these, only (18 × 17)/2 pairs are pure blue. Thus the probability of getting a blue pair is
$$\frac{(18 \times 17)/2}{(40 \times 39)/2} = 0.196.$$
(The result is the same whether we select both simultaneously or one at a time.)
(b) Two beans will have the same color if they are both blue OR both red OR both green. We have to add the corresponding probabilities, that is
$$P(\text{same color}) = \left(\frac{18}{40} \cdot \frac{17}{39}\right) + \left(\frac{10}{40} \cdot \frac{9}{39}\right) + \left(\frac{12}{40} \cdot \frac{11}{39}\right) = 0.338.$$
(c) Two beans will have different colors if we do NOT get two beans of the same color. Thus
$$P(\text{not same color}) = 1 - P(\text{same color}) = 1 - 0.338 = 0.662.$$
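This arithmetic can be double-checked with exact fractions; the following short Python sketch is our own illustration, not part of the notes:

from fractions import Fraction

counts = {"red": 10, "blue": 18, "green": 12}
total = sum(counts.values())           # 40 beans in the box

def p_two_of(color):
    """Probability that two beans drawn without replacement are both this color."""
    n = counts[color]
    return Fraction(n, total) * Fraction(n - 1, total - 1)

p_same = sum(p_two_of(c) for c in counts)
print(float(p_two_of("blue")))         # about 0.196
print(float(p_same))                   # about 0.338
print(float(1 - p_same))               # about 0.662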
Example 9
(a) How many different ways are there of rolling a pair of dice to get the total score of 7? (By total
score we mean the sum of both faces.)
(b) What is the probability of rolling the total score 7 with a pair of fair dice?
(c) What is the probability of rolling the total score 8 with a pair of dice?
(d) What is the probability of getting a total of 13 by rolling three fair dice?
Solution
(a) We can think of the result as
$$\square + \square = 7.$$
Then for the first die we could have any value, j = 1, 2, ..., 6 (i.e. a total of 6 possibilities), but then the second die must be 7 − j, which means that there is no choice for the second die. Thus there are 6 ways of obtaining a total of 7. (We do not need to list those ways here, since the argument establishes an unambiguous answer, but here is that list anyway, showing the face values of each pair of events that totals 7: (1, 6); (2, 5); (3, 4); (4, 3); (5, 2); (6, 1).)
(b) There are a total of 6 × 6 = 36 possibilities for the outcomes of rolling two dice, and we saw
above that 6 of these will add up to 7. Assuming all possibilities are equally likely (for fair dice),
this means that the probability of a total of 7 for the pair is 6/36 = 1/6.
(c) Here we must be more careful, since the previous argument will not quite work: We need
$$\square + \square = 8.$$
For example, if the first die comes up 1, then there is no way to get a total of 8 for the pair. The smallest value that would work on the "first die" is 2, so that we have only 5 possible choices for the first die, and then the second has to make up the difference. There are only 5 such possibilities. These are: (2, 6); (3, 5); (4, 4); (5, 3); (6, 2). Therefore the probability of such an event is 5/36.
(d) To get 13 by rolling three fair dice, we need
$$\square + \square + \square = 13.$$
We consider the possibilities: If the first die comes up a 6, then we need, for the other pair,
$$\square + \square = 13 - 6 = 7.$$
We already know this can be done in six ways. If the first die comes up a 5, then we need, for the other pair,
$$\square + \square = 13 - 5 = 8.$$
There are 5 ways to get this. Let us organize our "counting" of the possibilities in the following table, to be systematic and to see a pattern:

Face value      Total of             Number of ways
of first die    remaining pair       to get this
     6          □ + □ = 7                 6
     5          □ + □ = 8                 5
     4          □ + □ = 9                 4
     3          □ + □ = 10                3
     2          □ + □ = 11                2
     1          □ + □ = 12                1
We can easily persuade ourselves that there is a pattern being followed in building up this table.
We see that the total number of ways of getting the desired result is just a sum of the numbers in
the third column, i.e. 6 + 5 + 4 + 3 + 2 + 1 = 21. But the total number of possibilities for the three
dice is 63 = 216. Thus the probability of a total score of 13 is 21/216 = 0.097. In this example, we
had to list some different possibilities in order to achieve the desired result.
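Counting arguments like these can be confirmed by exhaustive enumeration; here is a short Python check (our own sketch):

from itertools import product

# Enumerate all 6**3 = 216 outcomes of rolling three fair dice.
rolls = list(product(range(1, 7), repeat=3))
ways = sum(1 for r in rolls if sum(r) == 13)

print(ways, len(rolls))        # 21 216
print(ways / len(rolls))       # about 0.097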
The examples in this section illustrate the simplest probability assumptions, rules, and calculations. Many of the questions asked in these examples were answered by careful “counting” of
possibilities and computing the fraction of cases in which some desired result is obtained. In the
following sections, we will discuss ways of representing outcomes of measurement (by distributions,
by numerical descriptors such as “mean” and “variance”). We will also study techniques for helping
us to “count” the number of possibilities and to compute probabilities associated with repeated
trials of one type of experiment.
8.8 Theoretical probability of coin tossing
Earlier in this chapter, we studied the results of a coin-tossing experiment. Now we turn to a theoretical investigation of the same type of experiment to understand the predictions of the basic rules of probability. We would like to quantify the probability of getting some number, k, of heads when the coin is tossed n times. We start with an elementary example in which the coin is tossed only three times (n = 3) to build up intuition. Let us use the notation p = P(H) and q = P(T) to represent the probabilities of each outcome.
For our theoretical probability investigation, we will make the simplest assumption about each elementary event, i.e. that a head (H) and a tail (T) are equally likely to be obtained in one repetition. Then the probabilities of event H and event T in a single toss are the same, i.e. 1/2:
$$P(H) = P(T) = 1/2, \quad \text{i.e.} \quad p = q = 1/2.$$
A new feature of this section is that we will summarize the probability using a frequency distribution. In this important type of plot, the horizontal axis represents some observed or measured
value in an experiment (for example, the number of heads in a coin toss experiment). The vertical
axis represents how often that outcome is obtained (i.e. the frequency, or probability, of the event).
(b) Three coin tosses
Suppose we are not interested in the precise order, but rather in the total number of heads (or tails). For example, we may win $1 for every H and lose $1 for every T that occurs. When a fair coin is tossed 3 times, the possible events are listed in Table 8.2. Grouping events together by the number of heads obtained, we then summarize the same information in a frequency table (Table 8.3).
event    number of heads    probability
 TTT            0           (1/2)(1/2)(1/2) = 1/8
 TTH            1           (1/2)(1/2)(1/2) = 1/8
 THT            1           (1/2)(1/2)(1/2) = 1/8
 HTT            1           (1/2)(1/2)(1/2) = 1/8
 THH            2           (1/2)(1/2)(1/2) = 1/8
 HTH            2           (1/2)(1/2)(1/2) = 1/8
 HHT            2           (1/2)(1/2)(1/2) = 1/8
 HHH            3           (1/2)(1/2)(1/2) = 1/8
Table 8.2: A list of all possible results of a 3-coin toss experiment, showing the number of heads in each case and the theoretical probabilities of each of the results.
Each result shown above has the same probability, $p = 1/2^3 = 1/8$. Grouping together results
from Table 8.2, and forming Table 8.3 we find similarly that the probability of getting no heads is
1/8, of getting one head is 3/8 (the sum of the probabilities of three equally likely events), of getting
two heads is 3/8, and of getting three heads is 1/8. This distribution is shown in Figure 8.2(b).
We can use these results to calculate the theoretical mean number of heads (expected value) in this
experiment.
number of heads (x_k = k)    probability                    result
          0                  P(TTT)                          1/8
          1                  P(TTH) + P(THT) + P(HTT)        3/8
          2                  P(THH) + P(HTH) + P(HHT)        3/8
          3                  P(HHH)                          1/8

Table 8.3: Theoretical probability of getting 0, 1, 2, or 3 H's in a 3-coin toss experiment.
8.8.1 Example
In the case of three tosses of a coin described above and shown in Figure 8.2(b), the expected value is:
$$\bar{x} = \sum_{k=0}^{3} k \cdot P(k \text{ heads}) = 0\,p(0) + 1\,p(1) + 2\,p(2) + 3\,p(3) = 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = \frac{3}{8} + \frac{6}{8} + \frac{3}{8} = \frac{12}{8} = 1.5.$$
Thus, in three coin tosses, we expect that on average we would obtain 1.5 heads.
[Figure 8.2 appears here: bar graph of p(x), the distribution of heads in 3 coin tosses, plotted for x between −0.5 and 3.5, vertical scale 0 to 1.]
Figure 8.2: The theoretical probability distribution for the number of heads obtained in three tosses
of a fair coin.
The variance of this distribution can be calculated as follows:
$$V = \sum_{i=0}^{3} (x_i - \bar{x})^2\, p(x_i) = (0-1.5)^2 \cdot \frac{1}{8} + (1-1.5)^2 \cdot \frac{3}{8} + (2-1.5)^2 \cdot \frac{3}{8} + (3-1.5)^2 \cdot \frac{1}{8}.$$
$$V = 2(1.5)^2 \cdot \frac{1}{8} + 2(0.5)^2 \cdot \frac{3}{8} = 0.5625 + 0.1875 = 0.75.$$
The standard deviation is
$$\sigma = \sqrt{V} = \sqrt{0.75} = 0.866.$$
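The entire 3-toss calculation can be reproduced by listing the 2^3 = 8 equally likely sequences and grouping them by their number of heads; a small Python sketch of ours:

from itertools import product
from collections import Counter

# All 2**3 = 8 equally likely sequences of a fair coin tossed 3 times.
seqs = list(product("HT", repeat=3))
heads = Counter(s.count("H") for s in seqs)   # {0: 1, 1: 3, 2: 3, 3: 1}

p = {k: n / len(seqs) for k, n in heads.items()}
mean = sum(k * pk for k, pk in p.items())
var = sum((k - mean) ** 2 * pk for k, pk in p.items())
print(mean, var, var ** 0.5)                  # 1.5 0.75 0.866...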
The bar graph shown in Figure 8.2 (a) and (b) will be referred to as the probability distribution
for the number of heads in three tosses of a coin.
We make the following observations about this distribution:
• The values of the probabilities are all positive and satisfy 0 ≤ p ≤ 1.
• In both graphs, the sum of the areas of all the bars in the bar graph is 1, i.e.
$$\sum_{i=0}^{n} p(x_i) = 1.$$
• Some events (for example obtaining 1 or 2 heads) appear more often than others, and are
thus associated with larger values of the probability. (Even though each underlying event is
equally likely, there are many combinations that contribute to the result of one head.)
• There is a pattern in the probability we computed theoretically for k, the number of heads obtained in n tosses of a fair coin. The pattern so far is:
$$\text{Probability of } k \text{ heads in } n \text{ tosses of a fair coin} = [\text{Number of possible ways to obtain } k \text{ heads}] \times \left(\frac{1}{2}\right)^{n}.$$
• So far, to determine the number of possible ways to obtain k heads, i.e., the factor in the square brackets, we have listed all the possibilities and counted those that have exactly so many heads. This would become very tedious, especially for a large number of tosses, n. For this reason, part of this chapter will be a diversion into the investigation of permutations and combinations. These results will help us understand what factor goes into the square brackets in the above term.
• In the case of the fair coin examined here, p = q = 1/2. This accounts for the factor $(1/2)^n$. We will see later that this result is modified somewhat when the coin is biased, i.e. not fair, so that the probability of H is not the same as the probability of T, i.e., $p \neq q$. In that case the factor $(1/2)^n$ will be modified (to $p^k q^{n-k}$, as discussed later).
The coin toss experiment is an important example of a type of experiment with only two outcomes (H vs T). Such experiments are called Bernoulli trials. Here we have looked at examples
where the probability of each event was (assumed to be) the same. We will generalize this to unequal probabilities further on in this chapter. The above motivation leads us to consider the subject
of permutations and combinations.
8.9 How many possible ways are there of getting k heads in n tosses? Permutations and combinations
In computing theoretical probability, we often have to “count” the number of possible ways there
are of obtaining a given type of outcome. So far, it has been relatively easy to simply display all
the possibilities and group them into classes (0, 1, 2, etc heads out of n tosses, etc.). This is not
always the case. When the number of repetitions of an experiment grows, it may be very difficult
and boring to list all possibilities. We develop some shortcuts to figure out, in general, how many
ways there are of getting each type of outcome. This will make the job of computing theoretical
probability easier. In this section we introduce some notation and then summarize general features
of combinations and permutations to help in “counting” the possibilities.
8.9.1 Factorial notation
Let n be an integer, n ≥ 0. Then n!, called “n factorial”, is defined as the following product of
integers:
n! = n(n − 1)(n − 2) . . . (2)(1)
Example
1! = 1
2! = 2 · 1 = 2
3! = 3 · 2 · 1 = 6
4! = 4 · 3 · 2 · 1 = 24
5! = 5 · 4 · 3 · 2 · 1 = 120
We also define
0! = 1
8.9.2 Permutations
A permutation is a way of arranging objects, where the order of appearance of the objects is
important.
[Figure 8.3 appears here: three schematic diagrams. (a) n distinct objects placed into n slots, giving n! arrangements. (b) n distinct objects placed into k slots, giving P(n, k) = n!/(n − k)! arrangements (there are n, n − 1, ..., n − k + 1 choices for the successive slots). (c) choosing k of the n objects, C(n, k) ways, and then arranging the chosen k objects in k slots, k! ways.]
Figure 8.3: This diagram illustrates the meanings of permutations and combinations. (a) The
number of permutations (ways of arranging) n objects into n slots. There are n choices for the first
slot, and for each of these, there are n − 1 choices for the second slot, etc. In total there are n! ways
of arranging these objects. (Note that the order of the objects is here important.) (b) The number
of permutations of n objects into k slots, P (n, k), is the product n · (n − 1) · (n − 2) . . . (n − k + 1)
which can also be written as a ratio of factorials. (c) The number of combinations of n objects in
groups of k is called C(n, k) (shown as the first arrow in part c). Here order is not important. The
step shown in (b) is equivalent to the two steps shown in (c). This means that there is a relationship
between P (n, k) and C(n, k), namely, P (n, k) = k!C(n, k).
Example 6
Given the three cards Jack, Queen, King, we could permute them to form the sequences
JQK, JKQ, QKJ, QJK, KQJ, KJQ.
We observe that there are six possible arrangements (permutations). Other than explicitly
listing all the arrangements, as done here, (possible only for small sets of objects) we could arrive
at this fact by reasoning as follows: Let us consider the possible “slots” that can be filled by the
three cards:
□ □ □
We have three choices of what to put in the first slot (J or K or Q). This uses up one card, so
for each of the above choices, we then have only two choices for what to put in the second slot.
The third slot leaves no choice: we must put our remaining card in it. Thus the total number of
possibilities, i.e. the total number of permutations of the three cards is
3 × 2 × 1 = 6.
A feature of this argument is that it can be easily generalized for any number of objects. For
example, given N = 10 different cards, we would reason similarly that as we fill in ten slots, we can
choose any one of 10 cards for the first slot, any of the remaining 9 for the next slot, etc., so that
the number of permutations is
10 × 9 × 8 · · · × 2 × 1 = 10! = 3628800
We can summarize our observation in the following statement:
The number of permutations (arrangements) of n objects is n!. (See Figure 8.3(a).)
Recall that the factorial notation n! was defined in section 8.9.1.
Example 7
How many different ways are there to display five distinct playing cards?
Solution
The answer is 5! = 120. Here the order in which the cards are shown is important.
Suppose we have n objects and we randomly choose k of these to put into k boxes (one per
box). Assume k < n.
For example, the objects are
♣♦♥♠ ⋆ • ◦ ⊕
and we must choose some of them (in order) so as to fill up the 4 slots:
□ □ □ □
We ask how many ways there are of arranging n objects taken k at a time. As in our previous
argument, the first slot to fill comes with a choice of n objects (for n possibilities). This “uses up”
one object leaving (n − 1) to choose from in the next stage. (For each of the n first choices there
are (n − 1) choices for slot 2, forming the product n · (n − 1)). In the third slot, we have to choose
among (n − 2) remaining objects, etc. By the time we arrive at the k’th slot, we have (n − k + 1)
choices. Thus, in total, the number of ways that we can form such arrangements of n objects into
k slots, represented by the notation P (n, k) is
P (n, k) = n · (n − 1) · (n − 2) . . . (n − k + 1).
We can also express this in factorial notation:
$$P(n,k) = \frac{n \cdot (n-1) \cdot (n-2) \cdots (n-k+1) \cdot (n-k) \cdots 3 \cdot 2 \cdot 1}{(n-k) \cdot (n-k-1) \cdots 3 \cdot 2 \cdot 1} = \frac{n!}{(n-k)!}.$$
These remarks motivate the following observation:
The number of permutations of n objects taken k at a time is
$$P(n,k) = \frac{n!}{(n-k)!}.$$
(See Figure 8.3(b).)
8.9.3 Combinations and binomial coefficients
How many ways are there to choose k objects out of a set of n objects if the order of the selection does not matter? For example, if we have a class of 10 students, how many possible pairs of students can be formed for a team project? In the case that the order of the objects is not important, we refer to the number of possible combinations of n objects taken k at a time by the notation C(n, k) or, more commonly, by
$$C(n,k) = \binom{n}{k} = \frac{n!}{(n-k)!\,k!}.$$
Note that two notations are commonly used to refer to the same concept. We will henceforth use mainly the notation C(n, k). We can read this notation as "n choose k". The values C(n, k) are also called the binomial coefficients, for reasons that will shortly become apparent.
As shown in Figure 8.3(b,c), combinations are related to permutations in the following way: To find the number of permutations of n objects taken k at a time, P(n, k), we would
• Choose k out of n objects. The number of ways of doing this is $C(n,k) = \binom{n}{k}$.
• Find all the permutations of the k chosen objects. We have discussed that there are k! ways of arranging (i.e. permuting) k objects.
Thus
$$P(n,k) = \binom{n}{k} \cdot k!$$
The above remarks lead to the following conclusion:
The number of combinations of n objects taken k at a time, sometimes called "n choose k" (also called the binomial coefficient, C(n, k)), is:
$$C(n,k) = \binom{n}{k} = \frac{P(n,k)}{k!} = \frac{n!}{k!\,(n-k)!}.$$
We can observe an interesting symmetry property, namely that
$$\binom{n}{k} = \binom{n}{n-k}, \quad \text{or} \quad C(n,k) = C(n,n-k).$$
It is worth noting that the binomial coefficients are the entries that occur in Pascal's triangle:

            1
           1 1
          1 2 1
         1 3 3 1
        1 4 6 4 1
      1 5 10 10 5 1
Each term in Pascal's triangle is obtained by adding the two diagonally above it. The top of the triangle represents C(0, 0) and is associated with n = 0. The next row represents C(1, 0) and C(1, 1). For row number n, the terms along the row are the binomial coefficients C(n, k), starting with k = 0 at the beginning of the row and going to k = n at the end of the row. For example, we see above that C(5, 2) = C(5, 3) = 10. The triangle can be continued by including subsequent rows; this is left as an exercise for the reader.
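For computation, these quantities are available directly in Python (math.comb and math.perm, present in Python 3.8+); the short sketch below, our own illustration, also regenerates the row of Pascal's triangle used above:

from math import comb, perm, factorial

# P(n, k) = n!/(n-k)!  and  C(n, k) = n!/(k!(n-k)!)
assert perm(10, 3) == factorial(10) // factorial(7)      # 720
assert comb(5, 3) == perm(5, 3) // factorial(3)          # 10, since P = C * k!

# Row n = 5 of Pascal's triangle: the binomial coefficients C(5, k).
print([comb(5, k) for k in range(6)])   # [1, 5, 10, 10, 5, 1]

# Symmetry property C(n, k) = C(n, n - k).
assert all(comb(8, k) == comb(8, 8 - k) for k in range(9))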
8.9.4 Example
How many ways are there of getting k heads in n tosses of a fair coin?
Solution
This problem motivated the discussion of permutations and combinations. We can now answer this
question.
The number of possible ways of obtaining k heads in n tosses is
$$C(n,k) = \frac{n!}{k!\,(n-k)!}.$$
Thus the probability of getting any outcome consisting of k heads when a fair coin is tossed n times is:
$$\text{For a fair coin,} \quad P(k \text{ heads in } n \text{ tosses}) = \frac{n!}{k!\,(n-k)!}\left(\frac{1}{2}\right)^{n}.$$
The term containing the power $(1/2)^n$ is the probability of any one specific sequence of possible H's and T's. The multiplier in front, which as we have seen is the binomial coefficient C(n, k), counts how many such sequences have exactly k H's and all the rest (i.e., n − k) T's. (The greater the number of possible combinations, the more likely it is that any one of them would occur.)
8.9.5 Example
How many combinations can be made out of 5 objects if we take 1, 2, 3, etc. objects at a time?
Solution
Here the order of the objects is not important. The number of ways of taking 5 objects k at a time (where 0 ≤ k ≤ 5) is C(5, k). For example, the number of combinations of 5 objects taken 3 at a time is
$$C(5,3) = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(2 \cdot 1)} = \frac{5 \cdot 4}{2} = 10.$$
The list of all the coefficients C(5, k) appears as the last row displayed above in Pascal's triangle.
8.9.6 Example
How many different 5-card hands can be formed from a deck of 52 ordinary playing cards?
Solution
We are not concerned here with the order of appearance of the cards that are dealt, only with the "hand" (i.e. the composition of the final collection of 5 cards). Thus we are asking how many combinations of 5 cards there are from a set of 52 cards. The solution is
$$C(52,5) = \frac{52!}{5!\,(52-5)!} = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2{,}598{,}960.$$

8.9.7 The binomial theorem
An interesting application of combinations is the formula for the product of terms of the form $(a+b)^n$, known as the Binomial expansion. Consider the simple example
$$(a+b)^2 = (a+b) \cdot (a+b).$$
We expand this by multiplying each of the terms in the first factor by each of the terms in the second factor:
$$(a+b)^2 = a^2 + ab + ba + b^2.$$
However, the order of factors ab or ba does not matter, so we count these as two identical terms, and express our result as
$$(a+b)^2 = a^2 + 2ab + b^2.$$
Similarly, consider the product
$$(a+b)^3 = (a+b)(a+b)(a+b).$$
Now, to form the expansion, each term in the first factor is multiplied by two other terms (one chosen from each of the other factors). This leads to an expansion of the form
$$(a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3.$$
More generally, consider a product of the form
$$(a+b)^n = (a+b) \cdot (a+b) \cdots (a+b).$$
By analogy, we expect to see terms of the form shown below in the expansion for this binomial, i.e.
$$(a+b)^n = a^n + \square\,a^{n-1}b + \square\,a^{n-2}b^2 + \cdots + \square\,a^{n-k}b^k + \cdots + \square\,ab^{n-1} + b^n.$$
The first and last terms are accompanied by the "coefficients" 1, since such terms can occur in only one way each. However, we must still "fill in the boxes" with coefficients that reflect the number of times that terms of the given form $a^{n-k}b^k$ occur. But such a term is made by choosing b's from k of the n factors (and picking a's from all the rest of the factors). We already know how many ways there are of selecting k items out of a collection of n, namely, the binomial coefficients. Thus
$$(a+b)^n = a^n + C(n,1)a^{n-1}b + C(n,2)a^{n-2}b^2 + \cdots + C(n,k)a^{n-k}b^k + \cdots + C(n,2)a^2b^{n-2} + C(n,1)ab^{n-1} + b^n,$$
where the binomial coefficients are as defined in section 8.9.3. We have used the symmetry property C(n, k) = C(n, n − k) in the coefficients of this expansion.
8.9.8 Example
Find the expansion of the expression $(a+b)^5$.

Solution
The coefficients we need in this expansion are formed from C(5, k). We have already calculated the binomial coefficients for the required expansion in example 8.9.5, namely 1, 5, 10, 10, 5, 1. Thus the desired expansion is
$$(a+b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5.$$
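As a sanity check, both sides of the binomial theorem can be compared numerically for sample values of a and b; a quick sketch of ours:

from math import comb

def binomial_expansion(a, b, n):
    """Right-hand side of the binomial theorem: sum of C(n,k) a^(n-k) b^k."""
    return sum(comb(n, k) * a**(n - k) * b**k for k in range(n + 1))

# The expansion agrees with direct computation of (a + b)^n.
for a, b in [(1, 1), (2, 3), (-1, 4)]:
    assert binomial_expansion(a, b, 5) == (a + b) ** 5
print(binomial_expansion(2, 3, 5), (2 + 3) ** 5)   # 3125 3125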
8.10 A coin toss experiment and binomial distribution
A Bernoulli Trial is an experiment that has only two possible results. A typical example of this type
is the coin toss. We have already studied examples of results of a Bernoulli trial in this chapter.
Here we expand our investigation to consider more general cases and their distributions.
We have already examined in detail an example of a Bernoulli trial in which each outcome is
equally likely, i.e. a coin toss with P(H) = P(T). In this section we will drop the assumption that
each event is equally likely, and examine a more general case.
If we do not know that the coin is fair, we might assume that the probability that it lands on H is p and on T is q. That is,
$$p(H) = p, \quad p(T) = q.$$
In general, p and q may not be exactly equal. By the property of probabilities,
$$p + q = 1.$$
Consider the following specific outcome of an experiment in which an (unfair) coin is tossed 10 times:
TTHTHHTTTH
Assuming that each toss is independent of the other tosses, we find that the probability of this event is $q \cdot q \cdot p \cdot q \cdot p \cdot p \cdot q \cdot q \cdot q \cdot p = p^4 q^6$. The probability of this specific event is the same as the probability of the specific event HHHHTTTTTT (since each event has the same number of H's and T's). The probability of each event is a product of factors of p and q (one for each H or T that appears). Further, the number of ways of obtaining an outcome with a specific number of H's (for example, four H's as in this illustration) is the same whether the coin is fair or not. (That number is a combination, i.e. the binomial coefficient C(n, k), as before.) The probability of getting any outcome with k heads when the (possibly unfair) coin is tossed n times is thus a simple generalization of the probability for a fair coin:
The binomial distribution
Given a (possibly unfair) coin with P(H) = p and P(T) = q, where p + q = 1, if the coin is tossed n times, the probability of getting exactly k heads is given by
$$P(k \text{ heads out of } n \text{ tosses}) = C(n,k)\,p^k q^{n-k} = \frac{n!}{k!\,(n-k)!}\,p^k q^{n-k}.$$
We refer to this distribution as the binomial distribution. In the case of a fair coin, p = q = 1/2 and the factor $p^k q^{n-k}$ reduces to $(1/2)^n$.
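A direct translation of this formula into Python (an illustrative sketch of ours; it anticipates the numbers of Example 8.10.1 below):

from math import comb

def binomial_pmf(k, n, p):
    """P(k heads out of n tosses) = C(n, k) p^k q^(n-k), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

# Unfair coin with P(H) = 0.1, tossed 5 times.
print(binomial_pmf(3, 5, 0.1))                          # approximately 0.0081
# The probabilities over all k sum to 1 (up to floating-point error).
print(sum(binomial_pmf(k, 5, 0.1) for k in range(6)))   # approximately 1.0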
Having obtained the form of the binomial distribution, we wish to show that the probability of obtaining any of the possible outcomes, i.e. P(0 or 1 or . . . or n heads out of n tosses), is 1. We can show this with the following calculation.
For each toss,
$$(p+q) = 1.$$
Then raising each side to the power n,
$$(p+q)^n = 1^n = 1,$$
but by the binomial theorem, the expression on the left can be expanded to form
$$(p+q)^n = p^n + C(n,1)p^{n-1}q + C(n,2)p^{n-2}q^2 + \cdots + C(n,k)p^k q^{n-k} + \cdots + C(n,2)p^2 q^{n-2} + C(n,1)pq^{n-1} + q^n,$$
that is,
$$(p+q)^n = \sum_{k=0}^{n} C(n,k)\,p^k q^{n-k}.$$
Therefore, since $(p+q)^n = 1$, it follows that
$$\sum_{k=0}^{n} C(n,k)\,p^k q^{n-k} = 1.$$
Thus $\sum_{k=0}^{n} P(k \text{ heads out of } n \text{ tosses}) = 1$, verifying the desired relationship.
Remark: Each term in the above expansion can be interpreted as the probability of a certain type of event. The first term is the probability of tossing exactly n heads: there is only one way this can happen, accounting for the coefficient 1. The last term is the probability of tossing exactly n tails. The product $p^k q^{n-k}$ reflects the probability of a particular sequence containing k heads and the rest (n − k) tails; but there are many ways of generating that type of sequence: C(n, k) is the number of distinct sequences that all count as k heads. Thus the combined probability of getting any of the events in which there are k heads is given by $C(n,k)\,p^k q^{n-k}$.
8.10.1 Example
Suppose P(H) = p = 0.1. What is the probability of getting 3 heads if this unfair coin is tossed 5 times?
Solution
From the above results, $P(3 \text{ heads out of } 5 \text{ tosses}) = C(5,3)\,p^3 q^2$. But p = 0.1 and q = 1 − p = 0.9, so
$$P(3 \text{ heads out of } 5 \text{ tosses}) = (0.1)^3 (0.9)^2\, C(5,3) = (0.001)(0.81)(10) = 0.0081.$$
8.11 Mean of a binomial distribution
A binomial distribution has a particularly simple mean. An important result, established in the
calculations in this section is as follows:
Consider a Bernoulli trial in which the probability of event e1 is p. Then if this trial is
repeated n times, the mean of the resulting binomial distribution, i.e. expected number of
times that event e1 occurs is
x̄ = np.
Thus the mean of a binomial distribution is the number of repetitions multiplied by the
probability of the event in a single trial.
Here we verify the simple formula for the mean of a binomial distribution. The calculations use many properties of series that were established in Chapter 1. The calculation is presented for completeness rather than importance, but the result (in the box above) is very useful and important.
By definition of the mean,
$$\bar{x} = \sum_{k=0}^{n} x_k\, p(x_k).$$
But here $x_k = k$ is the number of heads obtained, and $p(x_k) = P(k \text{ heads in } n \text{ tosses}) = C(n,k)\,p^k q^{n-k}$ is the distribution of k heads in n tosses computed in this chapter. Then
$$\bar{x} = \sum_{k=0}^{n} k \cdot C(n,k)\,p^k q^{n-k} = \sum_{k=1}^{n} k\,\frac{n!}{k!\,(n-k)!}\,p^k q^{n-k},$$
where in the last sum we have dropped the k = 0 term, since it makes no contribution to the total.
The numerators in the sum are of the form $k \cdot n \cdot (n-1) \cdots (n-k+1)$ and the denominators are $k \cdot (k-1) \cdots 2 \cdot 1$. We can cancel one factor of k from top and bottom. We can also take one common factor of n out of the sum:
$$\bar{x} = \sum_{k=1}^{n} \frac{n(n-1)\cdots(n-k+1)}{(k-1)\cdots 2 \cdot 1}\,p^k q^{n-k} = n \sum_{k=1}^{n} \frac{(n-1)\cdots(n-k+1)}{(k-1)\cdots 2 \cdot 1}\,p^k q^{n-k}.$$
We now shift the sum by defining the following replacement index: let ℓ = k − 1; then k = ℓ + 1, so when k = 1, ℓ = 0 and when k = n, ℓ = n − 1. We replace the indices and take one common factor of p out of the sum:
$$\bar{x} = n \sum_{\ell=0}^{n-1} \frac{(n-1)\cdots(n-\ell)}{\ell!}\,p^{\ell+1} q^{n-\ell-1} = np \sum_{\ell=0}^{n-1} \frac{(n-1)!}{\ell!\,(n-\ell-1)!}\,p^{\ell} q^{n-\ell-1}.$$
Let m = n − 1; then
$$\bar{x} = np \sum_{\ell=0}^{m} \frac{m!}{\ell!\,(m-\ell)!}\,p^{\ell} q^{m-\ell} = np,$$
where in the last step we have used the fact that the binomial probabilities for m trials sum to 1, i.e. $\sum_{\ell=0}^{m} C(m,\ell)\,p^{\ell} q^{m-\ell} = (p+q)^m = 1$. This verifies the result $\bar{x} = np$.
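The result x̄ = np is also easy to confirm numerically from the definition of the mean; a brief sketch of ours:

from math import comb

def binomial_mean(n, p):
    """Compute the mean directly from the definition: sum of k * P(k)."""
    q = 1.0 - p
    return sum(k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

# Agrees with n * p for any choice of n and p.
print(binomial_mean(10, 0.5))    # 5.0  (= 10 * 0.5)
print(binomial_mean(5, 0.1))     # approximately 0.5  (= 5 * 0.1)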
8.12 A continuous distribution
[Figure 8.4 appears here: the Normal distribution, plotted for x between −4.0 and 4.0, with peak height approximately 0.4.]
Figure 8.4: The Normal (or Gaussian) distribution is given by equation (8.1) and has the distribution
shown in this figure.
If we were to repeat the coin toss experiment with a large number of tosses, N, we would see a certain trend: There would be a peak in the distribution at the outcome corresponding to getting heads 50% of the time, i.e. at N/2 heads.
A fact which we state but do not prove here is that the probability of N/2 heads, p(N/2), behaves like
$$p(N/2) \approx \sqrt{\frac{2}{\pi N}}.$$
This can also be written in the form
$$p(N/2) \approx \sqrt{\frac{1}{2\pi}}\,\sqrt{\frac{4}{N}} = \text{Const}\cdot\sqrt{\frac{4}{N}}.$$
One finds that the shapes of the various distributions are similar, but that a scale factor of $\sqrt{N}/2$ is applied to stretch the graph horizontally, while compressing it vertically to preserve its total area. The graph is also shifted so that its peak occurs at N/2.
As the number of Bernoulli trials grows, i.e. as we toss our imaginary coin in longer and longer sets (N → ∞), a remarkable thing happens to the binomial distribution: it becomes smoother and smoother, until it grows to resemble a continuous distribution that looks like a "bell curve". That curve is known as the Gaussian or Normal distribution. If we scale this curve vertically and horizontally (stretch vertically and compress horizontally by the factor $\sqrt{N}/2$) and shift its peak to x = 0, then we find a distribution that describes the deviation from the expected value of 50% heads. The resulting function is of the form
$$p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}. \tag{8.1}$$
We will study properties of this (and other) such continuous distributions in a later section. We
show a typical example of the Normal distribution in Figure 8.4. Its cumulative distribution is then
shown (without and with the original distribution superimposed) in Figure 8.5.
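The approach of the rescaled binomial distribution to equation (8.1) can be observed numerically. In the following sketch (ours; the rescaling uses the √N/2 factor described above), the scaled fair-coin probabilities track the normal density:

from math import comb, sqrt, pi, exp

def phi(x):
    """Standard normal density, equation (8.1)."""
    return exp(-x**2 / 2) / sqrt(2 * pi)

N = 100                      # number of tosses of a fair coin
scale = sqrt(N) / 2          # horizontal/vertical scale factor from the text

for k in (45, 50, 55):
    pmf = comb(N, k) * 0.5**N            # binomial probability of k heads
    x = (k - N / 2) / scale              # deviation from N/2, rescaled
    # After rescaling, the binomial pmf closely matches the normal density.
    print(k, scale * pmf, phi(x))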
[Figure 8.5 appears here: two panels plotted for x between −4.0 and 4.0, vertical scale 0 to 1. Left: the cumulative distribution alone. Right: the cumulative distribution with the normal distribution superimposed.]
Figure 8.5: The Normal probability density with its corresponding cumulative distribution.
8.13 Summary
In this chapter, we introduced the notion of probability of elementary events. We learned that a
probability is always a number between 0 and 1, and that the sum of (discrete) probabilities of all
possible (discrete) outcomes is 1. We then described how to combine probabilities of elementary
events to calculate probabilities of compound independent events in a variety of simple experiments.
We defined the notion of a Bernoulli trial, such as tossing of a coin, and studied this in detail.
We investigated a number of ways of describing results of experiments, whether in tabular or
graphical form, and we used the distribution of results to define simple numerical descriptors. The
mean is a number that, more or less, describes the location of the "center" of the distribution (analogous to a center of mass), defined as follows:
The mean (expected value) x̄ of a probability distribution is
$$\bar{x} = \sum_{i=0}^{n} x_i\,p(x_i).$$
The standard deviation is, roughly speaking, the "width" of the distribution.
The standard deviation, σ, is
$$\sigma = \sqrt{V},$$
where V is the variance,
$$V = \sum_{i=0}^{n} (x_i - \bar{x})^2\,p(x_i).$$
While the chapter was motivated by results of a real experiment, we then investigated theoretical
distributions, including the binomial. We found that the distribution of events in a repetition of a
Bernoulli trial (e.g. coin tossed n times) was a Binomial distribution, and we computed the mean
of that distribution.
Suppose that the probability of one of the events, say event $e_1$, in a Bernoulli trial is p (and hence the probability of the other event $e_2$ is q = 1 − p); then
$$P(k \text{ occurrences of the given event out of } n \text{ trials}) = \frac{n!}{k!\,(n-k)!}\,p^k q^{n-k}.$$
This is called the binomial distribution. The mean of the binomial distribution, i.e. the mean number of events $e_1$ in n repeated Bernoulli trials, is
$$\bar{x} = np.$$