Download notes (p.7-12)

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene expression programming wikipedia , lookup

Transcript
2
Counting
The easiest entry to mathematical probability is via discrete mathematics, which is the topic we study during the first few weeks. We
begin by introducing some of the basic concepts of discrete mathematics: sets, lists, and functions. They are used in a large variety of
circumstances and are fundamental to mathematics and beyond.
Sets. A set is an unordered collection of distinct objects (called elements). A set can be given by listing its elements, e.g.
A = {1, 3, 5, 7, 9},
B = {Austria, Germany, Switzerland}
or by describing the elements by their properties;
A = {positive odd numbers less than ten},
B = { Countries where German is an official language }
In the second case the description must be objectively unambiguous, e.g. ”delicious cakes” do not form a set in a mathematical sense.
The fact that an object x belongs to the set A is expressed by the notation x 2 A; if x does not belong to A then we write x 62 A. Due
to the unambiguity requirement, for every object x and set A, of the two statements x 2 A, x 62 A, precisely one holds.
The union, intersection, difference, and symmetric difference of two sets are defined as:
A [ B = {x | x 2 A or x 2 B},
A \ B = {x | x 2 A and x 2 B},
A
A
B = {x | x 2 A and x 62 B},
B = (A
B) [ (B
A).
(1)
(2)
(3)
(4)
Look at Figure 2 for a visual representation of the operations. We say that A and B are disjoint if A \ B = ;. The number of elements
Figure 2: From left to right: the union, the intersection, the difference, and the symmetric difference of two sets represented as disks in
the plane.
in a set A is denoted as |A|, or sometimes as #A. It is referred to as the size or the cardinality of A. For the time being, we limit
ourselves to finite sets, for which the cardinality is a non-negative integer. For such sets, the number of elements in the union of two sets
cannot be larger than the sum of the cardinalities:
|A [ B|  |A| + |B|,
(5)
with equality if and only if A and B are disjoint. A subset X ✓ A is a set such that every element in X is also an element of A. Note that
A may or may not have additional elements. It follows that |X|  |A|. The power set of A is the set of all subsets: 2A = {X | X ✓ A}.
Note that the elements of the power set are sets which themselves contain elements. For example, if A = {1, 2}, then
2A = {{}, {1}, {2}, {1, 2}}.
Notice the subtle difference between 1 and {1}; the latter being a set whose single element is 1. Also notice that the power set includes
the empty subset, which is typically denoted as ; = {}. As suggested by the notation, the cardinality of the power set is
|2A | = 2|A| .
(6)
Indeed, when we select a subset, we have two choices for each element: to use it or not to use it. There are |A| elements and therefore
2|A| different ways to resolve the choices.
7
Lists and functions. A list is an ordered collection of not necessarily distinct elements. Instead of studying them directly, we
express lists in terms of another mathematical concept in which we map elements of one set to elements of another set. A function f
from a domain D to a range R, denoted as f : D ! R, associates exactly one element in R to each element x 2 D. A list of k elements
is a function {1, 2, . . . , k} ! R; see Figure 3. Notice that the same element in the range may be associated to an arbitrary number of
elements of the domain (possible to none), but every element of the domain associated a unique element in the range. Thus the domain
and the range play an asymmetric role in this association.
D
1
2
3
4
5
6
7
8
f
a
b
c
d
1
2
3
z
R
Figure 3: The displayed function represents the list abcbz133.
It may be interesting to count how many functions there are from a given finite domain to a given finite range. Note that we have |R|
choices for each element in D. Since the choices are independent, we have a total of |R||D| functions. An example is R = {yes, no},
for which the function can be interpreted as taking a subset of D. The number of such functions is indeed |R||D| = 2|D| , which is the
number of subsets of D.
Besides modeling lists, functions are generally useful to express relationships. For example, every gene has a length, measured as
the number of nucleotides, so we have a function from the set of genes to the positive integers. It is possible that two genes have the
same length, and there are surely positive integers that are not the lengths of any gene. We will refine our language about functions to
make these distinctions. The function f : D ! R is injective or one-to-one if f (x) 6= f (y) for all x 6= y. It is surjective or onto if, for
every r 2 R, there exists some x 2 D with f (x) = r. The function is bijective or a one-to-one correspondence if it is both injective
and surjective. Any bijective function has an inverse, denoted by f 1 which maps from R to D, i.e. the role of the domain and range is
interchanged. The definition is simply f 1 (y) = x for any y 2 R if x is the unique element of D such that f (x) = y. In plain English,
the arrows in the visualization like in Fig. 3 are reversed (notice, however, that f in this picture is neither surjective nor injective, so it
does not have an inverse; indeed by reversing the arrows, we would not get a function).
The concept of functions is well known from elementary mathematics and calculus. For example, if R = D = R is the set of real
numbers, then the formula f (x) = 2x + 3 defines a function from R to R: its graph is a straight line in the coordinate plane. Its inverse
function is f 1 (y) = 12 (y 3) [CHECK!]. Another example is the function f (x) = x2 . Notice that even if we choose the domain
to be the whole real line, D = R, the range is only the set of non-negative real numbers, R = R+ . With this choice of D and R the
function is surjective but not injective [WHY?], hence it has no inverse. However, we may decide to restrict the domain and declare
that D = R+ , i.e. we define a new function, given by the same formula f (x) = x2 but with this new domain. It is still true that
R = R+ and this new function is now bijective [WHY?]. Many elementary functions in calculus are defined as inverses of certain other
p
functions, for example the square root function g(y) = y is the inverse of the square function f (x) = x2 we just defined. Similarly,
the logarithm is the inverse of the exponential function, etc.
Returning to finite sets, observe that two finite sets D and R have the same size if and only if there exists a bijection f : D ! R. For
finite sets, this is an observation, while for infinite sets, it is a definition of what we mean by ”same size” of infinite sets. (Interestingly,
there is rich hierarchy of infinite sets of ”different” size (also called cardinality), i.e. in mathematical set theory there are many infinities.
For example, the cardinality of the integer numbers is the same as the even numbers [WHY?], but not the same as the cardinality of the
real numbers).
To count the bijections, we assign elements of R to elements of D, in sequence. Assuming |D| = |R| = n, we have n choices for
the first element in the domain, n 1 choices for the second, n 2 for the third, and so on. Hence the number of different bijections
from D to R is n · (n 1) · . . . · 1, pronounced n factorial and denoted in short by n!.
8
Stirling formula. The following asymptotic formula provides a good intuition how fast n! grows as n grows:
⇣ n ⌘n p
n! ⇡
where the precise meaning of ⇡ is that
n!
⇣ ⌘n p
n
e
2⇡n
e
2⇡n
as n ! 1.
! 1,
This is called the Stirling formula. Notice that the most important term is nn , i.e. it shows that n! grows faster than any exponential
function (of n). One may also write it in exponentiated form:
n! ⇡ en log n
n+ 1
log n+ 1
log 2⇡
2
2
The leading term in the exponent is n log n. The subleading term is
.
n, this is less important, etc.
Permutations. A permutation is an ordered collection of distinct elements. In its basic form, it uses all elements, so it can be
defined as a bijection from a finite set to itself, f : D ! D. For example, the permutations of {1, 2, 3} are: 123, 132, 213, 231, 312,
and 321. Here we list the permutations in lexicographic order, as they would appear in a dictionary. Assuming |D| = k, there are k!
permutations or, equivalently, orderings of the set. To see this, we note that there are k choices for the first element, k 1 choices for
the second, k 2 for the third, and so on. The total number of choices is therefore k(k 1) · . . . · 1 = k!.
To shorten notation, we write [k] = {1, 2, . . . , k}. For k  n, a k-element permutation is an injection [k] ! [n]. In other words, a
k-element permutation is a list of k distinct elements from [n]. For example, the 3-element permutations of {1, 2, 3, 4} are
123,
213,
312,
412,
124,
214,
314,
413,
132,
231,
321,
421,
134,
234,
324,
423,
142,
241,
341,
431,
143,
243,
342,
432.
There are 24 of them. Six of them are orderings of {1, 2, 3}, or of any three of the four integers. In general, we have
n · (n
1) · . . . · (n
k + 1) =
n!
(n
k)!
(7)
k-element permutations of n elements, and we have k! orderings of every k-element subset of the n elements. It follows that we get the
number of such subsets by dividing (7) through k!.
Subsets. The binomial coefficient,
noted above, we have
n
k
, pronounced n choose k, is by definition the number of k-element subsets of a size n set. As
n
k
!
=
(n
n!
.
k)!k!
(8)
We fill out the following tables with values of nk , where the row index is the value of n and the column index is the value of k. Values
of nk for k < 0 and k > n are set to zero and omitted from the table:
0
1
2
3
4
5
0
1
1
1
1
1
1
1
2
3
4
5
1
2
3
4
5
1
3
6
10
1
4
10
1
5
1
We notice several patterns, namely n0 = n
= 1 (meaning there is only one way to choose no element, and also only one way to
n
choose all element), and nk = n n k . This table is also known as Pascal’s Triangle. If we draw it symmetrically, then we see that each
entry in the triangle is the sum of the two closest entries in the previous row:
9
1
1
1
1
3
1
4
1
5
n
k
=
n 1
k 1
+
n 1
k
1
3
6
10
We express this recipe more formally. For convenience, we define
PASCAL’ S R ELATION.
1
2
1
4
10
n
k
1
5
1
= 0 whenever n < 0, k < 0, or k > n.
.
P ROOF. We give two arguments for this identity. The first works by algebraic manipulations. Writing the first factor n as n = (n k)+k
in the definition of n!, we get
!
(n k)(n 1)! + k(n 1)!
n
=
k
(n k)!k!
=
=
(n 1)!
(n 1)!
+
k 1)!k!
(n k)!(k 1)!
!
!
n 1
n 1
+
.
k
k 1
(n
For the second argument, we partition the subsets of size k into two collections. Let |S| = n and let a be an arbitrary but fixed element
from S. Then nk counts the number of k-element subsets of S. To get the number of subsets that contain a, we count the (k 1)element subsets of S {a}, and to each such subset, we add a to get a k-element subset. To get the number of subsets that do not
contain a, we count the k-element subsets of S {a}. The former is nk 11 and the latter is n k 1 . Since the subsets that contain a are
different from the subsets that do not contain a, we get the number of k-element subsets of S equal to nk 11 + n k 1 , as required.
One of the advantages of the binomial coefficients over the powers of the integers is that they permit simple formulas for interesting
sums. As an example, we take a sum of binomial coefficients in which the k is fixed.
L EMMA.
Pn
j=k
j
k
=
n+1
k+1
.
P ROOF. We use Pascal’s Relation to prove this identity. It is instructive to trace our steps graphically, as shown in Figure 4. In the first
0
Figure 4: We replace each crossed out element by the two elements above it in Pascal’s triangle.
step, we replace n+1
by the two binomial coefficients above it in the triangle. Keeping the first term, we replace the second term by
k+1
the two elements above it in the triangle, and so forth. When we leave the triangle, the term is zero and we can stop. The kept terms are
the ones in the claimed sum.
Binomial coefficients. We use binomial coefficients to find a formula for (x + y)n . First, let us look at an example:
(x + y)2 = (x + y)(x + y)
= xx + yx + xy + yy
= x2 + 2xy + y 2 .
10
Notice that the coefficients in the last line are the same as in row 2 of Pascal’s Triangle. This is more generally the case:
B INOMIAL T HEOREM. (x + y)n =
Pn
i=0
n
i
xn i y i .
(The ”big sigma” is a shorthand notation for large summations, it stands for
Pn
i=0
ai = a0 + a1 + . . . + an .)
P ROOF. If we write each term of the result before combining like terms, we list every possible way to select either one x or one y from
each factor. Thus, the coefficient of xn i y i is equal to nn i = ni . In words, it is the number of ways we can select n i factors to
be x and have the remaining i factors be y. This is equivalent to selecting i factors to be y and have the remaining factors be x.
The Binomial Theorem has a number of interesting consequences. The first states something we already know, namely that the
cardinality of the power set is 2 to the cardinality of the set. Specifically, if we add the numbers of subsets of size k, for all possible
choices of k, we get the total number of subsets:
C OROLLARY 1.
Pn
i=0
n
i
= 2n .
P ROOF. Let x = y = 1 in the binomial theorem.
To find out how many subsets of a finite set have odd cardinality and how many have even cardinality, we consider the alternating
sum of binomial coefficients:
C OROLLARY 2.
Pn
i=0 (
P ROOF. Set x = 1 and y =
1)i
n
i
= 0.
1.
This implies that there are equally many odd subsets as there are even subsets. Hence, the number of odd subsets of a set of cardinality
n is 2n 1 .
Multisets. The difference between a set and a multiset is that the latter may contain the same element multiple times. In other words,
a multiset is an unordered collection of not necessarily distinct elements. We can list the repetitions:
A, G, G, T, T, T,
or we can specify the multiplicities,
#A = 1, #C = 0, #G = 2, #T = 3.
The size of a multiset is the sum of the multiplicities. For example, if we grind up a gene of length k into its individual nucleotides,
we get a multiset of size k. We show how to count the possible multisets by considering the problem of distributing k (identical) books
among n (different) shelves. The number of ways is equal to
• the number of size-k multisets of the n shelves;
• the number of ways to write k as a sum of n non-negative integers.
The latter means the number of solutions of the equation
m1 + m2 + . . . + mn = k
where all mj ’s are nonnegative integers, expressing the multiplicity of the element j. Notice that the labels 1, 2, 3, . . . n represent the
elements of the base set (having n elements), but they can be replaced with other notation depending on how we listed the elements.
E.g. in the above example, where the base set is naturally identified with {A, C, G, T }, we have mA = 1, mC = 0, mG = 2, mT = 3.
What matters is only the cardinality 4 of the base set and not the specific letters we use to denote them.
To get the formula, we line up our k books, then place n 1 dividers between them. The number of books between the i-th and the
(i 1)-st dividers is equal to the number of books on the i-th shelf. We thus have n + k 1 objects: k books plus n 1 dividers. The
1
number of ways to choose n 1 dividers from n + k 1 objects is n+k
= n+kk 1 .
n 1
11
Returning to the example of the gene, we note that n = 4 is the number of different types of nucleotides, and k is the length of the
gene. Grinding up the gene, produces a multiset of size k. The size is the number of times we get A plus the same for C and for G and
for T . There are only
!
!
k+3
k+3
=
k
3
different multisets of this kind. This should be contrasted to the number of nucleotide sequences of length k, which is 4k .
12