Lectures 3-4: Combinatorial Analysis

1. Combinatorial analysis

Many problems in probability involve finite sample spaces in which every outcome is equally likely (e.g., a sequence of tosses of a fair coin). In this case, the probability of an event can be calculated by dividing the number of outcomes that belong to that event by the total number of outcomes in the sample space. Combinatorial analysis provides a set of tools for efficiently determining the sizes of these sets.

2. Ordered Samples

Basic Principle of Counting: Suppose that $r$ experiments are performed such that the first experiment can result in any one of $n_1$ outcomes, the second experiment can result in any one of $n_2$ outcomes, the third experiment can result in any one of $n_3$ outcomes, and so forth. Here the particular outcomes (but not the number thereof) that can occur on the $k$th experiment may depend on the outcomes of experiments 1 through $k - 1$. Then the total number of possible outcomes for the ordered sequence of all $r$ experiments is $n_1 n_2 \cdots n_r$.

Here are some important special cases:

Ordered samples: If set $E_1$ contains $n_1$ elements, $E_2$ contains $n_2$ elements, ..., and $E_r$ contains $n_r$ elements, then the number of ordered $r$-tuples $(x_1, x_2, \cdots, x_r)$ that can be formed with $x_i \in E_i$ is equal to $n_1 n_2 \cdots n_r$.

Sampling with replacement: If $E_1 = \cdots = E_r = E$, which contains $n$ elements, then the number of ordered $r$-tuples is just $n^r$. We can think of the $r$-tuple $(x_1, x_2, \cdots, x_r)$ as an ordered sample of size $r$ from $E$ where we are sampling with replacement, i.e., the same element can be sampled repeatedly.

Sampling without replacement: If we sample a set containing $n$ elements $r$ times without replacement, then the number of ordered samples is $n(n-1)(n-2) \cdots (n-r+1)$. In this case, each element can be sampled at most once, so that the elements that can be sampled on the $k$th draw depend on the elements sampled on draws $1, \cdots, k-1$.
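The three counting rules above can be checked by brute-force enumeration on small sets. The following is a minimal Python sketch, not part of the original notes; the function names and the example sets are hypothetical and chosen only for illustration.

```python
from itertools import permutations, product
from math import perm, prod


def count_ordered_tuples(sets):
    """Count r-tuples (x1, ..., xr) with xi taken from sets[i], by enumeration."""
    return sum(1 for _ in product(*sets))


def count_with_replacement(elements, r):
    """Count ordered samples of size r when elements may repeat."""
    return sum(1 for _ in product(elements, repeat=r))


def count_without_replacement(elements, r):
    """Count ordered samples of size r when each element is used at most once."""
    return sum(1 for _ in permutations(elements, r))


# Hypothetical example sets with n1 = 2, n2 = 3, n3 = 4 elements.
sets = ["ab", "xyz", "0123"]
assert count_ordered_tuples(sets) == prod(len(s) for s in sets)  # 2 * 3 * 4 = 24

# Hypothetical set E with n = 5 elements, sampled r = 3 times.
E, r = "abcde", 3
assert count_with_replacement(E, r) == len(E) ** r          # 5^3 = 125
assert count_without_replacement(E, r) == perm(len(E), r)   # 5 * 4 * 3 = 60
```

Enumeration is only practical for small $n$ and $r$, which is why the closed-form counts are the useful ones in practice.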
Ex: The number of different subsets of a set $E$ containing $n$ elements is $2^n$. To see this, write $E = \{e_1, \cdots, e_n\}$ and notice that there is a one-to-one correspondence between the subsets of $E$ and binary sequences $(x_1, \cdots, x_n)$ obtained by setting $x_i = 1$ if $e_i$ is in the subset and $x_i = 0$ otherwise.

Ex: Proteins and DNA sequences. Proteins are sequences of 20 basic molecules called amino acids. The order in which the amino acids are strung together determines both the structure and the function of the protein. Notice that there are $20^L$ distinct proteins that can be made using a sequence of $L$ amino acids.

DNA is also a polymer, but it is made up of 4 basic molecules called nucleotides, denoted A, T, G, and C. Thus there are $4^l$ distinct DNA sequences that are $l$ nucleotides long. Proteins are encoded by DNA sequences, i.e., each protein corresponds to a segment of DNA called a gene, and this encoding is local in the sense that changing one nucleotide in a gene will change at most one amino acid in the corresponding protein. However, more than one nucleotide is needed to specify an amino acid. Indeed, since there are 20 amino acids but only 4 nucleotides, groups of one nucleotide or of two neighboring nucleotides could specify at most $4$ and $4^2 = 16$ amino acids, respectively. In fact, groups of three neighboring nucleotides, called triplet codons (or codons, for short), are used to specify amino acids. This is possible since there are $4^3 = 64$ such triplet codons, which is more than enough to encode 20 amino acids. Moreover, because there are more than 20 such codons, we can deduce that the genetic code must be degenerate: some amino acids must be encoded by more than one triplet codon. In general, this degeneracy is present at the third nucleotide in the codon, which can often be changed without changing the identity of the corresponding amino acid.

3. Permutations

Suppose that the set $E$ contains $n$ distinct elements.
Then the number of ordered arrangements of these elements is equal to $n! = n(n-1)(n-2) \cdots 1$, where the expression $n!$ is read "n factorial". By convention, $0! = 1$, and $1! = 1$, $2! = 2$, $3! = 6$, $4! = 24$, $5! = 120$, $6! = 720$, $7! = 5040$, $8! = 40320$, ...

Ex: The number of distinct shuffles of a pack of cards is $52! \approx 8.1 \times 10^{67}$. The number of shuffles that move each card from its current place in the deck to a place occupied by another card belonging to the same suit is $(13!)^4 \approx 1.5 \times 10^{39}$.

Now suppose that $E$ contains $n = n_1 + n_2 + \cdots + n_r$ elements, $n_1$ of which are alike, $n_2$ of which are alike, etc. How many ordered arrangements are there of $E$ in which like elements are not distinguished? To answer this question, let $P_{n,r}$ denote the unknown number and notice that for each such arrangement, there are $n_1!$ ways to permute the type one elements amongst themselves, there are $n_2!$ ways to permute the type two elements amongst themselves, and so forth. It follows that for each arrangement of the elements of $E$ there are $n_1! \, n_2! \cdots n_r!$ permutations that shuffle elements of the same type amongst themselves. Since there are $n!$ ordered arrangements of $E$ in which we distinguish between all elements (say by adding extra labels to these elements), we see that

$$n! = P_{n,r} \, n_1! \, n_2! \cdots n_r!,$$

and therefore the number of ordered arrangements that do not distinguish between elements of the same type is

$$P_{n,r} = \frac{n!}{n_1! \, n_2! \cdots n_r!}.$$

Ex: The number of rearrangements of the letters in the word CHACHALACA is

$$\frac{10!}{4! \, 3! \, 2! \, 1!} = 12600,$$

since the letter A is repeated 4 times, C is repeated 3 times, H is repeated 2 times, and L occurs just once.

4. Combinations

If a set $E$ contains $n$ distinct elements, then how many different groups of $r$ objects are contained in $E$? Notice that this is equivalent to asking how many different ways we can sample $r$ objects from $E$ without replacement and without regard to order.
To solve this, observe that by part 2 above, there are

$$n(n-1)(n-2) \cdots (n-r+1) = \frac{n!}{(n-r)!}$$

different ways to sample $r$ objects from $E$ without replacement when the order does matter. Furthermore, by part 3, there are $r!$ different permutations (or orders) of each such set of $r$ objects. Consequently, if $C_{n,r}$ denotes the number of different groups of $r$ objects in a set of size $n$, then

$$\frac{n!}{(n-r)!} = C_{n,r} \, r!,$$

since the number of ordered samples (the left-hand side) is equal to the number of unordered samples ($C_{n,r}$) times the number of ways of ordering any particular sample containing $r$ objects ($r!$). This shows that

$$C_{n,r} = \frac{n!}{r! \, (n-r)!} = \binom{n}{r},$$

where the expression $\binom{n}{r}$ is read "n choose r". Also, observe that

$$\binom{n}{r} = \binom{n}{n-r},$$

which follows either by direct calculation or by noticing that each choice of $r$ elements from $n$ corresponds to a choice of the $n - r$ remaining objects.

Ex: Suppose you flip a coin ten times. Then the number of sequences which contain exactly $k$ heads is $\binom{10}{k}$:

$$\binom{10}{0} = \binom{10}{10} = 1, \quad \binom{10}{1} = \binom{10}{9} = 10, \quad \binom{10}{2} = \binom{10}{8} = 45,$$
$$\binom{10}{3} = \binom{10}{7} = 120, \quad \binom{10}{4} = \binom{10}{6} = 210, \quad \binom{10}{5} = 252.$$

Notice that there are many more sequences which have roughly equal numbers of heads and tails than there are sequences that have a preponderance of one over the other.

An important context in which the numbers $\binom{n}{k}$ occur is the binomial expansion:

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.$$

This can be proved by induction or by observing that when we expand the product $(x + y)(x + y) \cdots (x + y)$, there are exactly $\binom{n}{k}$ ways of choosing $k$ factors to contribute an $x$ and $n - k$ factors to contribute a $y$, with each such choice contributing 1 to the coefficient of $x^k y^{n-k}$ on the right-hand side. By taking either $x = y = 1$ or $x = -1$, $y = 1$, we obtain the following important identities (the second for $n \geq 1$):

$$\sum_{k=0}^{n} \binom{n}{k} = (1 + 1)^n = 2^n,$$
$$\sum_{k=0}^{n} \binom{n}{k} (-1)^k = (-1 + 1)^n = 0.$$

5. Multinomial coefficients

There is an important generalization of the binomial coefficients.
Notice that the act of choosing $k$ elements from a set $E$ of size $n$ is equivalent to dividing $E$ into two disjoint subsets of sizes $k$ and $n - k$. More generally, if $n = n_1 + \cdots + n_r$, where the $n_i$'s are positive integers, then the number of ways of dividing $E$ into $r$ disjoint subsets of respective sizes $n_1, n_2, \cdots, n_r$ is equal to

$$\binom{n}{n_1, n_2, \cdots, n_r} = \frac{n!}{n_1! \, n_2! \cdots n_r!}.$$

The quantity $\binom{n}{n_1, \cdots, n_r}$ is called a multinomial coefficient.

Ex: Suppose that you roll a six-sided die 10 times. Then the number of sequences containing five 1's, three 3's and two 6's is

$$\binom{10}{5, 0, 3, 0, 0, 2} = \binom{10}{5, 3, 2} = 2520.$$

Remark: Those $n_i$ equal to zero can be omitted from the multinomial coefficient.

The binomial expansion also generalizes to expressions involving three or more variables. The following identity is called the multinomial expansion:

$$(x_1 + x_2 + \cdots + x_r)^n = \sum_{n_1 + \cdots + n_r = n} \binom{n}{n_1, n_2, \cdots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}.$$

Here, the sum on the right-hand side is over all nonnegative integer-valued vectors $(n_1, n_2, \cdots, n_r)$ such that $n_1 + n_2 + \cdots + n_r = n$.
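As a numerical check on the multinomial coefficient, the die-rolling example can be verified by direct enumeration. The Python sketch below is an illustration rather than part of the original notes, and the helper names `multinomial` and `brute_force_count` are hypothetical. Because the sequences being counted contain only the faces 1, 3 and 6, the enumeration runs over those three faces ($3^{10} = 59049$ sequences) instead of all six.

```python
from collections import Counter
from itertools import product
from math import comb, factorial


def multinomial(*counts):
    """Return n! / (n1! n2! ... nr!) where n is the sum of the counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # each intermediate quotient is an integer
    return result


def brute_force_count(symbols, counts):
    """Count sequences over `symbols` in which symbols[i] occurs counts[i] times."""
    target = tuple(counts)
    return sum(
        1
        for seq in product(symbols, repeat=sum(counts))
        if tuple(seq.count(s) for s in symbols) == target
    )


# Ten die rolls with five 1's, three 3's and two 6's.
print(multinomial(5, 3, 2))                    # 2520
print(multinomial(5, 0, 3, 0, 0, 2))           # 2520 (zero counts change nothing)
print(brute_force_count((1, 3, 6), (5, 3, 2)))  # 2520

# Rearrangements of CHACHALACA, using the letter counts A=4, C=3, H=2, L=1.
print(multinomial(*Counter("CHACHALACA").values()))  # 12600

# With r = 2 the multinomial coefficient reduces to the binomial coefficient.
assert multinomial(3, 7) == comb(10, 3) == 120
```

The same `multinomial` function covers the permutations-with-repeats count from part 3, since both are given by the same formula.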