Lectures 3-4: Combinatorial Analysis
1. Combinatorial analysis: Many problems in probability involve finite sample spaces in
which every outcome is equally likely (e.g., a sequence of tosses of a fair coin). In this case,
the probability of an event can be calculated by dividing the number of outcomes that belong
to that particular event by the total number of outcomes in the sample space. Combinatorial
analysis provides us with a set of tools that can be used to efficiently determine the sizes of
these sets.
2. Ordered Samples
Basic Principle of Counting: Suppose that r experiments are performed such that the first
experiment can result in any one of $n_1$ outcomes, the second experiment can result in any one
of $n_2$ outcomes, the third experiment can result in any one of $n_3$ outcomes, and so forth. Here
the particular outcomes (but not the number thereof) that can occur on the k'th experiment
may depend on the outcomes of experiments 1 through k − 1. Then the total number of possible
outcomes for the ordered sequence of all r experiments is $n_1 n_2 \cdots n_r$.
Here are some important special cases:
Ordered samples: If set $E_1$ contains $n_1$ elements, $E_2$ contains $n_2$ elements, ..., and $E_r$
contains $n_r$ elements, then the number of ordered r-tuples $(x_1, x_2, \ldots, x_r)$ that can be formed with
$x_i \in E_i$ is equal to $n_1 n_2 \cdots n_r$.
Sampling with replacement: If $E_1 = \cdots = E_r = E$, which contains n elements, then the
number of ordered r-tuples is just $n^r$. We can think of the r-tuple $(x_1, x_2, \ldots, x_r)$ as an ordered
sample of size r from E where we are sampling with replacement, i.e., the same element can
be sampled repeatedly.
Sampling without replacement: If we sample a set containing n elements r times without
replacement, then the number of ordered samples is $n(n-1)(n-2) \cdots (n-r+1)$. In this
case, each element can be sampled at most once, so that the elements that can be sampled on
the k'th draw depend on the elements sampled on draws $1, \ldots, k-1$.
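The two counting formulas above can be checked by brute-force enumeration. The following is a minimal sketch, assuming a hypothetical five-element set `E` and sample size `r = 3`; the helper names are illustrative and do not come from the lecture.

```python
from itertools import permutations, product


def count_ordered_samples(elements, r, replacement):
    """Count ordered samples of size r by listing every one of them."""
    if replacement:
        # Every r-tuple with entries from `elements`, repeats allowed.
        return sum(1 for _ in product(elements, repeat=r))
    # Every r-tuple with distinct entries from `elements`.
    return sum(1 for _ in permutations(elements, r))


def falling_factorial(n, r):
    """Closed form n(n-1)...(n-r+1) for sampling without replacement."""
    result = 1
    for k in range(r):
        result *= n - k
    return result


E = ["a", "b", "c", "d", "e"]  # hypothetical set with n = 5 elements
n, r = len(E), 3

with_replacement = count_ordered_samples(E, r, replacement=True)
without_replacement = count_ordered_samples(E, r, replacement=False)

print(with_replacement, n ** r)                        # 125 125
print(without_replacement, falling_factorial(n, r))    # 60 60
```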
Ex: The number of different subsets of a set E containing n elements is $2^n$. To see this, write
$E = \{e_1, \ldots, e_n\}$ and notice that there is a one-to-one correspondence between the subsets of
E and binary sequences $(x_1, \ldots, x_n)$ obtained by setting $x_i = 1$ if $e_i$ is in the subset and $x_i = 0$
otherwise.
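The correspondence between subsets and binary sequences can be made concrete in a few lines. This is only a sketch, assuming a hypothetical three-element set; `subsets_via_binary_sequences` is an illustrative name.

```python
from itertools import product


def subsets_via_binary_sequences(elements):
    """Build one subset per binary sequence (x_1, ..., x_n): keep e_i iff x_i == 1."""
    subsets = []
    for bits in product((0, 1), repeat=len(elements)):
        subsets.append({e for e, x in zip(elements, bits) if x == 1})
    return subsets


E = ["e1", "e2", "e3"]  # hypothetical set with n = 3 elements
all_subsets = subsets_via_binary_sequences(E)

print(len(all_subsets), 2 ** len(E))  # 8 8
```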
Ex: Proteins and DNA sequences. Proteins are sequences of 20 basic molecules called
amino acids. The order in which the amino acids are strung together determines both the
structure and the function of the protein. Notice that there are $20^L$ distinct proteins that can be
made using a sequence of L amino acids.
DNA is also a polymer, but it is made up of 4 basic molecules called nucleotides, denoted A, T,
G, and C. Thus there are $4^l$ distinct DNA sequences that are l nucleotides long. Proteins are
encoded by DNA sequences, i.e., each protein corresponds to a segment of DNA called a gene,
and this encoding is local in the sense that changing one nucleotide in a gene will change at
most one amino acid in the corresponding protein. However, more than one nucleotide is needed
to specify an amino acid. Indeed, since there are 20 amino acids but only 4 nucleotides, groups
of either one nucleotide or two neighboring nucleotides could specify at most $4$ and $4^2 = 16$
amino acids, respectively. In fact, groups of three neighboring nucleotides, called triplet codons (or codons,
for short), are used to specify amino acids. This is possible since there are $4^3 = 64$ such triplet
codons, which is more than enough to encode 20 amino acids. Moreover, because there are more
than 20 such codons, we can deduce that the genetic code must be degenerate: some amino
acids must be encoded by more than one triplet codon. In general, this degeneracy is present
at the third nucleotide in the codon, which can often be changed without changing the identity
of the corresponding amino acid.
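The counting argument for codon length can be replayed by enumeration. The sketch below only counts nucleotide words of each length and compares the count with 20; it does not model the actual genetic code, and the names `NUCLEOTIDES` and `words` are illustrative.

```python
from itertools import product

NUCLEOTIDES = "ATGC"
NUM_AMINO_ACIDS = 20


def words(length):
    """All nucleotide words of the given length, as strings."""
    return ["".join(w) for w in product(NUCLEOTIDES, repeat=length)]


for length in (1, 2, 3):
    count = len(words(length))
    # A word length suffices only if it yields at least 20 distinct words.
    print(length, count, count >= NUM_AMINO_ACIDS)
```

Only length 3 gives at least 20 words, and the 64 − 20 = 44 surplus words are what makes degeneracy unavoidable if every codon is used.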
3. Permutations
Suppose that the set E contains n distinct elements. Then the number of ordered arrangements
of these elements is equal to
n! = n(n − 1)(n − 2) · · · 1,
where the expression n! is read n factorial. By convention, 0! = 1 and
1! = 1, 2! = 2, 3! = 6, 4! = 24, 5! = 120, 6! = 720, 7! = 5040, 8! = 40320, · · ·
Ex: The number of distinct shuffles of a pack of cards is $52! \approx 8.1 \times 10^{67}$. The number of
shuffles that move each card from its current place in the deck to a place currently occupied by a
card of the same suit (possibly its own place) is $(13!)^4 \approx 1.5 \times 10^{39}$.
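Because Python integers have arbitrary precision, both card-shuffling counts can be computed exactly and then rounded for display. A small sketch (variable names are illustrative):

```python
import math

shuffles = math.factorial(52)               # all orderings of the deck
suit_preserving = math.factorial(13) ** 4   # permute each suit's 13 positions independently

print(f"{shuffles:.2e}")          # 8.07e+67
print(f"{suit_preserving:.2e}")   # 1.50e+39
```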
Now suppose that E contains $n = n_1 + n_2 + \cdots + n_r$ elements, $n_1$ of which are alike, $n_2$ of which
are alike, etc. How many ordered arrangements are there of E in which like elements are not
distinguished?
To answer this question, let $P_{n,r}$ denote the unknown number and notice that for each such
arrangement, there are $n_1!$ ways to permute the type one elements amongst themselves, there
are $n_2!$ ways to permute the type two elements among themselves, and so forth. It follows
that for each arrangement of the elements of E there are $n_1!\, n_2! \cdots n_r!$ permutations that shuffle
elements of the same type amongst themselves. Since there are n! ordered arrangements of E
in which we distinguish between all elements (say by adding extra labels to these elements), we
see that
$$n! = P_{n,r}\, n_1!\, n_2! \cdots n_r!,$$
and therefore the number of ordered arrangements that do not distinguish between elements of
the same type is
$$P_{n,r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}.$$
Ex: The number of rearrangements of the letters in the word CHACHALACA is:
$$\frac{10!}{4!\,3!\,2!\,1!} = 12600,$$
since the letter A is repeated 4 times, C is repeated 3 times, H is repeated 2 times, and L occurs
just once.
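The formula for arrangements with repeated elements is easy to evaluate directly from letter counts. The sketch below uses a hypothetical helper `count_arrangements` and cross-checks it by brute force on the shorter word CHACHA, since listing all 10! orderings of the full word would be slow.

```python
import math
from collections import Counter
from itertools import permutations


def count_arrangements(word):
    """n! divided by the factorial of each letter's multiplicity."""
    total = math.factorial(len(word))
    for multiplicity in Counter(word).values():
        total //= math.factorial(multiplicity)
    return total


print(count_arrangements("CHACHALACA"))  # 12600

# Brute-force check on a short word: distinct orderings of C, C, H, H, A, A.
brute = len(set(permutations("CHACHA")))
print(brute, count_arrangements("CHACHA"))  # 90 90
```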
4. Combinations
If a set E contains n distinct elements, then how many different groups of r objects are contained
in E? Notice that this is equivalent to asking how many different ways we can sample r objects
from E without replacement and without regard to order.
To solve this, observe that by part 2 above, we know that there are
$$n(n-1)(n-2) \cdots (n-r+1) = \frac{n!}{(n-r)!}$$
different ways to sample r objects from E without replacement when the order does matter.
Furthermore, by part 3, there are r! different permutations (or orders) of each such set of r
objects. Consequently, if $C_{n,r}$ denotes the number of different groups of r objects in a set of size
n, then
$$\frac{n!}{(n-r)!} = C_{n,r}\, r!,$$
since the number of ordered samples (the left-hand side) is equal to the number of unordered
samples ($C_{n,r}$) times the number of ways of ordering any particular sample containing r objects
(r!). This shows that
$$C_{n,r} = \frac{n!}{r!\,(n-r)!} = \binom{n}{r},$$
where the expression $\binom{n}{r}$ is read 'n choose r'. Also, observe that
$$\binom{n}{r} = \binom{n}{n-r},$$
which follows either by direct calculation or by noticing that each choice of r elements from n
corresponds to a choice of the n − r remaining objects.
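The derivation of the combination count can be checked numerically: dividing the number of ordered samples by r! should match a direct enumeration of unordered groups. A minimal sketch, assuming a hypothetical set of size 6; `choose` is an illustrative helper, while `math.comb` is the standard-library equivalent (Python 3.8+).

```python
import math
from itertools import combinations, permutations


def choose(n, r):
    """n! / (r! (n - r)!), the formula derived above."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


E = range(6)  # hypothetical set with n = 6 elements
n, r = 6, 2

ordered = len(list(permutations(E, r)))      # order matters: n!/(n-r)!
unordered = len(list(combinations(E, r)))    # order ignored

print(ordered, unordered * math.factorial(r))      # 30 30
print(unordered, choose(n, r), math.comb(n, r))    # 15 15 15

# Symmetry: choosing r elements is the same as choosing the n - r left out.
symmetric = all(choose(n, k) == choose(n, n - k) for k in range(n + 1))
print(symmetric)  # True
```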
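Ex: Suppose you flip a coin ten times. Then the number of sequences which contain exactly k
heads is $\binom{10}{k}$:
$$\binom{10}{0} = \binom{10}{10} = 1, \quad \binom{10}{1} = \binom{10}{9} = 10, \quad \binom{10}{2} = \binom{10}{8} = 45,$$
$$\binom{10}{3} = \binom{10}{7} = 120, \quad \binom{10}{4} = \binom{10}{6} = 210, \quad \binom{10}{5} = 252.$$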
Notice that there are many more sequences which have roughly equal numbers of heads and tails
than there are sequences that have a preponderance of one over the other.
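The head counts for ten coin flips can be reproduced by listing all $2^{10} = 1024$ sequences and tallying them. This sketch encodes heads as 1 and tails as 0, which is an assumption made only for the illustration.

```python
import math
from collections import Counter
from itertools import product

# Tally the number of heads in every length-10 sequence of 0s and 1s.
tally = Counter(sum(seq) for seq in product((0, 1), repeat=10))

for k in range(11):
    print(k, tally[k], math.comb(10, k))

# Share of sequences with 4, 5, or 6 heads.
near_balanced = (tally[4] + tally[5] + tally[6]) / 2 ** 10
print(near_balanced)  # 0.65625
```

Under these assumptions about two thirds of all sequences have between 4 and 6 heads, which quantifies the observation about roughly equal numbers of heads and tails.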
An important context in which the numbers $\binom{n}{k}$ occur is in the binomial expansion:
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.$$
This can be proved by induction or by observing that when we expand the product (x + y)(x +
y) · · · (x + y), there will be exactly $\binom{n}{k}$ ways of choosing k of the n factors to contribute an x and
the remaining n − k factors to contribute a y, with each such choice contributing 1 to the
coefficient of $x^k y^{n-k}$ on the right-hand side.
By taking either x = y = 1 or x = −1, y = 1, we obtain the following important identities:
$$\sum_{k=0}^{n} \binom{n}{k} = (1 + 1)^n = 2^n,$$
$$\sum_{k=0}^{n} \binom{n}{k} (-1)^k = (-1 + 1)^n = 0 \quad (n \geq 1).$$
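Both the binomial expansion and the two identities can be spot-checked with exact integer arithmetic. The values of `x`, `y`, and `n` below are arbitrary choices for the illustration, and `binomial_expansion` is a hypothetical helper name.

```python
import math


def binomial_expansion(x, y, n):
    """Right-hand side of the binomial expansion, summed term by term."""
    return sum(math.comb(n, k) * x ** k * y ** (n - k) for k in range(n + 1))


n = 7
expansion_ok = binomial_expansion(3, 5, n) == (3 + 5) ** n
row_sum = sum(math.comb(n, k) for k in range(n + 1))
alternating_sum = sum(math.comb(n, k) * (-1) ** k for k in range(n + 1))

print(expansion_ok)       # True
print(row_sum, 2 ** n)    # 128 128
print(alternating_sum)    # 0
```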
5. Multinomial coefficients
There is an important generalization of the binomial coefficients. Notice that the act of choosing
k elements from a set E of size n is equivalent to dividing E into two disjoint subsets of sizes k
and n − k. More generally, if $n = n_1 + \cdots + n_r$, where the $n_i$'s are nonnegative integers, then the
number of ways of dividing E into r disjoint subsets of respective sizes $n_1, n_2, \ldots, n_r$ is equal to
$$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}.$$
The quantity $\binom{n}{n_1, \ldots, n_r}$ is called a multinomial coefficient.
Ex: Suppose that you roll a six-sided die 10 times. Then the number of sequences containing
five 1’s, three 3’s and two 6’s is
$$\binom{10}{5, 0, 3, 0, 0, 2} = \binom{10}{5, 3, 2} = 2520.$$
Remark: Those $n_i$ equal to zero can be omitted from the multinomial coefficient.
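The die-rolling count and the remark about zero entries can be checked as follows. This is a sketch with hypothetical helper names; the brute-force check uses only 5 rolls, because enumerating all $6^{10}$ sequences of 10 rolls would be slow.

```python
import math
from itertools import product


def multinomial(*counts):
    """n! / (n_1! n_2! ... n_r!) with n = n_1 + ... + n_r."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)
    return total


def brute_force(num_rolls, target_counts):
    """Count roll sequences in which face i appears target_counts[i - 1] times."""
    hits = 0
    for rolls in product(range(1, 7), repeat=num_rolls):
        if all(rolls.count(face) == c
               for face, c in zip(range(1, 7), target_counts)):
            hits += 1
    return hits


# Zero entries contribute 0! = 1, so they can be dropped.
print(multinomial(5, 0, 3, 0, 0, 2), multinomial(5, 3, 2))   # 2520 2520

# Smaller case: 5 rolls with two 1's, two 3's and one 6.
small = brute_force(5, (2, 0, 2, 0, 0, 1))
print(small, multinomial(2, 2, 1))                            # 30 30
```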
The binomial expansion also generalizes to expressions involving three or more variables. The
following identity is called the multinomial expansion:
$$(x_1 + x_2 + \cdots + x_r)^n = \sum_{n_1 + \cdots + n_r = n} \binom{n}{n_1, n_2, \ldots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}.$$
Here, the sum on the right-hand side is over all nonnegative integer-valued vectors $(n_1, n_2, \ldots, n_r)$
such that $n_1 + n_2 + \cdots + n_r = n$.
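The multinomial expansion can be spot-checked by summing over every nonnegative integer vector with the required total. The sketch below assumes arbitrary integer values for the variables; `compositions` and `multinomial_expansion` are illustrative names.

```python
import math


def compositions(n, r):
    """Yield every nonnegative integer vector (n_1, ..., n_r) with sum n."""
    if r == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, r - 1):
            yield (first,) + rest


def multinomial_expansion(xs, n):
    """Right-hand side of the multinomial expansion, summed term by term."""
    total = 0
    for ns in compositions(n, len(xs)):
        coeff = math.factorial(n)
        term = 1
        for x, k in zip(xs, ns):
            coeff //= math.factorial(k)   # build n! / (n_1! ... n_r!)
            term *= x ** k                # build x_1^{n_1} ... x_r^{n_r}
        total += coeff * term
    return total


xs = (2, 3, 5)  # arbitrary values for x_1, x_2, x_3
n = 4
num_terms = sum(1 for _ in compositions(n, len(xs)))

print(num_terms)                                  # 15
print(multinomial_expansion(xs, n), sum(xs) ** n)  # 10000 10000
```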