Probability Theory

Robert R. Snapp

January 13, 2009

1 Informal Set Theory

A set is defined as a collection of objects, or elements. Familiar examples are the set of Greek letters $\{\alpha, \beta, \gamma, \ldots, \omega\}$, the set of binary values $\mathbb{B} = \{0, 1\}$, the set of integers $\mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$, the set of positive integers (or natural numbers) $\mathbb{N} = \mathbb{Z}^{+} = \{1, 2, 3, \ldots\}$, the set of negative integers $\mathbb{Z}^{-} = \{-1, -2, -3, \ldots\}$, and the set of nonnegative integers $\mathbb{Z}^{\ge 0} = \{0, 1, 2, \ldots\}$. Ellipses (the symbol "$\ldots$") can represent either a finite or an infinite number of implied elements. We will often use capital Roman letters, such as $A, B, C$, to represent sets, and lowercase letters ($x, y, z$) to represent elements. When defining sets using the brace notation, e.g., $\{x, y, z\}$, it is important to adhere to the rule that each element appears only once.

If an element $x$ belongs to a set $A$, we state that $x$ is a member of $A$. More concisely, we write $x \in A$ (or $A \ni x$). This notation allows us to define the set of rational numbers as
\[
\mathbb{Q} = \{m/n : m, n \in \mathbb{Z} \text{ and } n \neq 0\}.
\]
(The above notation, $\{x : P(x)\}$, denotes the set of all elements $x$ such that the proposition $P(x)$ is true.) For completeness, recall that the symbol $\mathbb{R}$ denotes the set of real numbers, and $\mathbb{C}$ the set of complex numbers. Thus $\pi \in \mathbb{R}$ and $1 + i \in \mathbb{C}$. If an element $x$ is not a member of $A$, we write $x \notin A$. For example, $0 \notin \mathbb{N}$.

If every member of $A$ is also a member of $B$, we say that $A$ is a subset of $B$, and write $A \subseteq B$ (or $B \supseteq A$). If $A$ is not a subset of $B$, we write $A \not\subseteq B$. If every member of $A$ belongs to $B$, and $B$ contains at least one member that does not belong to $A$, we say that $A$ is a proper subset of $B$, and write $A \subset B$ (or $B \supset A$). Thus $\mathbb{N} \subset \mathbb{Z}$, because all natural numbers are also integers, but not all integers are natural numbers. Two sets $A$ and $B$ are said to be equal if they contain exactly the same elements, or alternatively if $A \subseteq B$ and $B \subseteq A$; in this case we write $A = B$. Note that the set relations $\subseteq$, $\subset$, and $=$ are transitive: $A \subseteq B$ and $B \subseteq C$ imply that $A \subseteq C$.

A set without members is called the empty set, and is designated by the special symbol $\emptyset$, or more pictorially by $\{\}$.

On occasion we may find it useful to consider sets that have other sets as members. In such cases a set of sets is often called a class or family, and a set of classes or families, a collection. An example of a class is the power set of $A$, denoted by $\mathcal{P}(A)$, which is defined as the set of all subsets of $A$:
\[
\mathcal{P}(A) = \{B : B \subseteq A\}.
\]
Thus, for example, $\mathcal{P}(\mathbb{B}) = \{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$.

The cardinality of a set $A$, denoted by $|A|$, is defined to be the number of members it contains. Thus $|\emptyset| = 0$, $|\mathbb{B}| = 2$, and $|\mathbb{N}| = \infty$. A set $A$ is said to be finite if $|A| < \infty$. A set is said to be denumerable if its elements can be placed in one-to-one correspondence with the natural numbers.

Exercise 1. Show that $\mathbb{Z}$ and $\mathbb{Q}$ are denumerable sets.
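For the integer half of Exercise 1, the required correspondence can be written down explicitly. Below is a minimal sketch (the function name is ours, purely illustrative) that interleaves the integers so that every integer is paired with exactly one natural number:

```python
# A sketch for Exercise 1: an explicit one-to-one pairing of N with Z,
# interleaving 0, 1, -1, 2, -2, ...

def int_from_natural(n: int) -> int:
    """Map the natural number n = 1, 2, 3, ... to a unique integer."""
    return n // 2 if n % 2 == 0 else -(n // 2)

print([int_from_natural(n) for n in range(1, 10)])
# [0, 1, -1, 2, -2, 3, -3, 4, -4]
```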
The German mathematician Georg Cantor (1845–1918) created quite a stir in mathematics by showing that it is not possible to construct a one-to-one correspondence between the natural numbers and a bounded interval of real numbers, e.g., the unit interval $[0, 1]$. In fact, there are many more numbers in the unit real interval than there are in the set $\mathbb{N}$.¹ He thus demonstrated that it is incorrect to assume that there is only one infinity, $\infty$. Rather, there is an infinite hierarchy of infinities, $\aleph_0 < \aleph_1 < \aleph_2 < \cdots$, called "aleph null," "aleph one," etc. This result is more than esoteric: it forces us to treat real-valued random variables with special care in order to avoid absurd conclusions.

Cantor showed that the reals are not denumerable using proof by contradiction. Assume that the numbers in the real unit interval can be placed into a one-to-one correspondence with the naturals. In the table below, the naturals appear on the left, the reals in $[0, 1]$ on the right. We represent each number on the right by its decimal expansion, which is non-repeating for the transcendental reals, such as $\pi - 3$. Also note that we do not care about placing the reals in any particular order.

| $\mathbb{N}$ | $[0, 1] = \{x \in \mathbb{R} : 0 \le x \le 1\}$ |
|---|---|
| 1 | 0.**0**1250987362839402938420… |
| 2 | 0.9**8**983749283497383223384… |
| 3 | 0.73**3**97403982739483333049… |
| 4 | 0.141**5**9265358979323846264… |
| 5 | 0.2500**0**000000000000000000… |
| 6 | 0.12345**6**78901234567890123… |
| ⋮ | ⋮ |

Now if our assumption is correct, every real number in the unit interval appears somewhere in the right column. However, Cantor discovered that by manipulating the diagonal digits (which appear above in bold) we can easily create a number that does not appear in the right column. Let us create a new number $z$ that has a different first digit than the first number, a different second digit than the second, and so on; for example, $z = 0.194617\ldots$. Since this number differs in at least one digit from every number in the table, it cannot appear in the right-hand column. (For example, if I were to claim that $z$ actually appears on the right side in the 145,763,804th row, you could rightly say, "that's impossible, because the 145,763,804th digit of $z$ differs from the number that appears in that row.") Since this diagonalization trick can be applied to any attempted one-to-one correspondence, the real numbers are not denumerable.

An interesting corollary is that the set of infinite sequences of coin tosses is also not denumerable. Note that every real number $x$ in the unit interval can be represented as an infinite dyadic series:
\[
x = \frac{b_1}{2} + \frac{b_2}{2^2} + \frac{b_3}{2^3} + \cdots,
\]
where $b_k \in \{0, 1\}$ for $k = 1, 2, \ldots$, which is equivalent to the binary representation $x = 0.b_1 b_2 b_3 \ldots$. Now simply associate 0 with $T$ and 1 with $H$ to construct a one-to-one correspondence between the reals in $[0, 1]$ and the set of infinite sequences of coin tosses. This has important implications for many stochastic processes.

¹ His method used a powerful diagonalization technique that eventually led to the even greater stir in logic caused by Kurt Gödel.
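The diagonal construction is mechanical enough to state in a few lines of code. This sketch applies it to the six expansions listed above:

```python
# Cantor's diagonal trick applied to the six listed expansions.
# Given any purported enumeration, change each diagonal digit to build
# a number z that disagrees with the n-th entry in its n-th digit.

listing = [
    "01250987362839402938420",
    "98983749283497383223384",
    "73397403982739483333049",
    "14159265358979323846264",
    "25000000000000000000000",
    "12345678901234567890123",
]

def diagonal_escape(rows):
    # Add 1 (mod 10) to each diagonal digit.  (A careful proof avoids
    # producing digits 0 and 9, to dodge the 0.0999... = 0.1 ambiguity.)
    return "0." + "".join(str((int(row[i]) + 1) % 10)
                          for i, row in enumerate(rows))

print(diagonal_escape(listing))  # 0.194617, the z of the text
```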
1.1 Set Operations

Any two sets can be combined to define a new set in a variety of ways. The union of $A$ and $B$, denoted symbolically by $A \cup B$, consists of the members that belong to at least one of the two sets. Thus,
\[
A \cup B = \{x : x \in A \text{ or } x \in B\}.
\]
For example, $\{1, 2, 3, 4\} \cup \{3, 4, 5, 6\} = \{1, 2, 3, 4, 5, 6\}$.

The intersection of $A$ and $B$, denoted by $A \cap B$, consists of the members that are common to both sets. Thus,
\[
A \cap B = \{x : x \in A \text{ and } x \in B\}.
\]
For example, $\{1, 2, 3, 4\} \cap \{3, 4, 5, 6\} = \{3, 4\}$. Two arbitrary sets $A$ and $B$ are said to be disjoint if $A \cap B = \emptyset$.

The difference between $A$ and $B$, denoted by $A \setminus B$, consists of the members of $A$ that do not belong to $B$. Thus,
\[
A \setminus B = \{x : x \in A \text{ and } x \notin B\}.
\]
For example, $\{1, 2, 3, 4\} \setminus \{3, 4, 5, 6\} = \{1, 2\}$, and $\{3, 4, 5, 6\} \setminus \{1, 2, 3, 4\} = \{5, 6\}$.

The symmetric difference between $A$ and $B$, denoted by $A \triangle B$, consists of the elements that are unique to each set. Thus,
\[
A \triangle B = (A \setminus B) \cup (B \setminus A),
\]
whence $\{1, 2, 3, 4\} \triangle \{3, 4, 5, 6\} = \{1, 2, 5, 6\}$.

1.2 Venn Diagrams

Figure 1: A Venn diagram of two sets, $B \subset A$.
Figure 2: The complements of the previous two sets. The crosshatching evident in the region corresponding to $A^c$ suggests that $B \subseteq A \Rightarrow A^c \subseteq B^c$.

The English logician John Venn (1834–1923) introduced the use of two-dimensional diagrams to help visualize abstract set relations and operations (Figs. 1 through 3). Each of these diagrams is constructed in the context of an abstract universal set $\Omega$, depicted as the enclosing rectangle. Each subset of $\Omega$ is represented by a set of points within the rectangle, usually an elliptical region. In order to distinguish one subset from the other, their interiors are sometimes shaded distinctly. In Fig. 1, two intersecting subsets of $\Omega$ are shown: subset $A$, shaded with right-handed diagonals,² and subset $B$, shaded with left-handed diagonals. Because $B$ is a subset of $A$, the region interior to $B$ exhibits both kinds of shading. This suggests the hypothesis $B \subseteq A \Rightarrow A \cap B = B$. However, it should be emphasized that although Venn diagrams provide a useful heuristic, they do not provide a general proof of a theorem, as they can only illustrate one particular instance. Thus proofs of set-theoretic theorems are algebraic in nature. For instance, to prove the above hypothesis, we show that $B \subseteq A$ implies both of the following statements: (i) $A \cap B \subseteq B$, and (ii) $B \subseteq A \cap B$. To demonstrate (i) we use the definition of set intersection: if $x \in A \cap B$, then $x \in A$ and $x \in B$. This, in turn, by the definition of subset, implies that $A \cap B$ is a subset of $B$ (as well as of $A$). To demonstrate (ii) we begin with the definition of $B \subseteq A$, which implies $x \in B \Rightarrow x \in A$. Since now $x \in A$ and $x \in B$, it follows that $x \in A \cap B$. Consequently, by the definition of subset, $B \subseteq A \cap B$.

² Here, right-handed diagonals run from the upper right to the lower left; they are the diagonals that are easiest to draw with one's right hand. Conversely, left-handed diagonals run from the upper left to the lower right.

Given a particular context, one defines the universal set $\Omega$ appropriately. Thus, when discussing subsets of real numbers, $\Omega = \mathbb{R}$; for subsets of playing cards,
\[
\Omega = \{A\clubsuit, 2\clubsuit, \ldots, K\clubsuit,\; A\diamondsuit, 2\diamondsuit, \ldots, K\diamondsuit,\; A\heartsuit, 2\heartsuit, \ldots, K\heartsuit,\; A\spadesuit, 2\spadesuit, \ldots, K\spadesuit\}.
\]
Once $\Omega$ has been defined, it is possible to introduce a new set operation, the complement, which is defined by
\[
A^c = \{x \in \Omega : x \notin A\}.
\]
The complements of sets $A$ and $B$ in Fig. 1 are depicted in Fig. 2. Here, $A^c$ is the region exterior to the ellipse that defines $A$, shaded with right-handed diagonals. Similarly, $B^c$ is the region exterior to the circle that defines $B$, shaded with left-handed diagonals. That the region defining $A^c$ is crosshatched suggests that $A^c \subseteq B^c$, and thus (perhaps) that $B \subset A \Rightarrow A^c \subset B^c$. (Can you prove this?)

Among the more useful theorems from set theory are De Morgan's laws:
\[
(A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c.
\]
Given a (possibly infinite) sequence of sets $A_1, A_2, \ldots, A_n, \ldots$, the mutual union and intersection are defined respectively as
\[
\bigcup_n A_n = A_1 \cup A_2 \cup \cdots, \qquad \text{and} \qquad \bigcap_n A_n = A_1 \cap A_2 \cap \cdots.
\]
Using induction, one can generalize De Morgan's laws to sequences of sets $A_1, A_2, \ldots$:
\[
\Bigl(\bigcup_n A_n\Bigr)^c = \bigcap_n A_n^c, \qquad \Bigl(\bigcap_n A_n\Bigr)^c = \bigcup_n A_n^c.
\]
A denumerable family of sets $\{A_1, A_2, \ldots\}$ is said to be mutually disjoint if $A_i \cap A_j = \emptyset$ whenever $i \neq j$. A denumerable family is said to be complete if $\bigcup_n A_n = \Omega$.
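These operations map directly onto Python's built-in set type, which makes it easy to spot-check identities such as De Morgan's laws on particular instances (a check, of course, not a proof):

```python
# Spot-checking the set operations and De Morgan's laws.

omega = set(range(1, 10))          # a small universal set for complements
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}

print(A | B)    # union: {1, 2, 3, 4, 5, 6}
print(A & B)    # intersection: {3, 4}
print(A - B)    # difference: {1, 2}
print(A ^ B)    # symmetric difference: {1, 2, 5, 6}

# De Morgan's laws on this instance:
assert omega - (A & B) == (omega - A) | (omega - B)
assert omega - (A | B) == (omega - A) & (omega - B)
```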
1.3 Limit sets

Given an infinite sequence of sets $A_1, A_2, A_3, \ldots$, we define the limit superior as
\[
\limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k,
\]
and the limit inferior as
\[
\liminf_n A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k.
\]
Informally, the limit superior contains those elements that occur in an infinite number of the sets, while the limit inferior contains those elements that occur in all but a finite number of the sets. If
\[
A = \limsup_n A_n = \liminf_n A_n,
\]
then we say that the sequence $\{A_n\}$ has a limit equal to $A$: $\lim_{n \to \infty} A_n = A$.

Figure 3: A Venn diagram of three intersecting sets $A$, $B$, and $C$. Set $A$ is represented by right diagonals, set $B$ by left diagonals, and set $C$ by horizontal rules.

1.4 Problems

1.1 Evaluate $(\{1,2,3\} \cup \{2,3,4\}) \cap (\{1,2,3\} \setminus \{3,4,5\})$.
1.2 Create a copy of the Venn diagram that appears in Fig. 3, and label each of the 8 homogeneously shaded regions in terms of $A$, $B$, and $C$. For example, the central region that contains all three shadings (left diagonal, right diagonal, and horizontal rules) is $A \cap B \cap C$.
1.3 Show that $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$.
1.4 Show that $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
1.5 Show that $A \subset B$ and $B \subset C \Rightarrow A \subset C$ (transitivity).
1.6 Show that $A \triangle B = (A \cup B) \setminus (A \cap B)$.
1.7 Show that $\emptyset \subseteq A$ for every set $A$.
1.8 Show that $(A \cap B) \subseteq (A \cup B)$ for any sets $A$ and $B$.
1.9 Show that $A \subset B \Rightarrow |A| \le |B|$.
1.10 Show that $(A^c)^c = A$.
1.11 Show that $\Omega^c = \emptyset$ and $\emptyset^c = \Omega$.
1.12 Show that $A \cap \Omega = A$ and $A \cup \emptyset = A$.
1.13 Show that $A \cap (B \setminus (A \cap B)) = \emptyset$.
1.14 The union of two arbitrary sets can be expressed as the union of two disjoint sets, e.g., $A \cup B = A \cup (B \setminus (A \cap B))$ (see the preceding problem). Find an analogous decomposition (or partition) for the union of three arbitrary sets $A \cup B \cup C$.
1.15 Show that $\liminf_n A_n \subseteq \limsup_n A_n$.

2 Counting

(William Feller's Introduction to Probability Theory and Its Applications, Vol. I [1] provides a more detailed explanation of the following material.)

2.1 Permutations and Combinations

The number of permutations of $n$ distinct objects equals $n$-factorial: $n! = n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1$. (Recall that $0! = 1$.) For example, there are six different permutations of the Greek letters $\alpha, \beta, \gamma$:
\[
\alpha\beta\gamma \quad \alpha\gamma\beta \quad \beta\gamma\alpha \quad \beta\alpha\gamma \quad \gamma\alpha\beta \quad \gamma\beta\alpha
\]
Note that there are 3 different ways to choose the first letter, but once it has been chosen only 2 choices remain for the second letter, and once those two are chosen, only 1 option remains for the third and final letter. Thus the number of permutations is $3 \cdot 2 \cdot 1 = 6$. This multiplication principle can be extended to solve a multitude of combinatorial problems.

Exercise 2. In how many different ways can a standard deck of 52 playing cards be ordered? For problems of this sort, Stirling's formula, $n! \approx n^n e^{-n} \sqrt{2\pi n}$, is quite useful.
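For a numerical sense of Exercise 2, one can compare the exact count $52!$ against Stirling's approximation directly:

```python
# Exercise 2 numerically: orderings of a 52-card deck, exact vs. Stirling.
import math

n = 52
exact = math.factorial(n)
stirling = n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)

print(f"{exact:.6e}")                 # ~8.065818e+67
print(f"{stirling:.6e}")              # ~8.052886e+67
print(f"{1 - stirling / exact:.4%}")  # relative error ~0.16%
```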
Now consider the enumeration of permutations with repeated elements; e.g., the sequence $\alpha\alpha\alpha\beta\beta$ can be reordered in only ten different ways:
\[
\alpha\alpha\alpha\beta\beta \quad \alpha\beta\alpha\beta\alpha \quad \alpha\alpha\beta\alpha\beta \quad \beta\alpha\alpha\beta\alpha \quad \alpha\beta\alpha\alpha\beta \quad \alpha\beta\beta\alpha\alpha \quad \beta\alpha\alpha\alpha\beta \quad \beta\alpha\beta\alpha\alpha \quad \alpha\alpha\beta\beta\alpha \quad \beta\beta\alpha\alpha\alpha
\]
The previous formula, $5! = 120$, grossly overcounts the number of distinct sequences, by a factor of $3! \cdot 2!$: the number of (identical) permutations of the 3 $\alpha$s, multiplied by the number of (identical) permutations of the 2 $\beta$s. Thus the number of distinct permutations of $\alpha\alpha\alpha\beta\beta$ is $5!/(3!\,2!) = 10$, in agreement with the constructive enumeration above. More generally, the number of permutations of $N_\alpha$ $\alpha$s, $N_\beta$ $\beta$s, …, $N_\omega$ $\omega$s is
\[
\frac{(N_\alpha + N_\beta + \cdots + N_\omega)!}{N_\alpha!\, N_\beta! \cdots N_\omega!}.
\]
This ratio arises from the observation that $(N_\alpha + N_\beta + \cdots + N_\omega)!$ is a gross overcount, as it includes all duplicate character sequences. Note that the denominator expresses the number of duplicates of each sequence, as there are $N_\alpha!$ indistinguishable orderings of the $N_\alpha$ $\alpha$s, $N_\beta!$ indistinguishable orderings of the $N_\beta$ $\beta$s, etc. Thus the above ratio, also known as a multinomial coefficient,
\[
\binom{N_\alpha + N_\beta + \cdots + N_\omega}{N_\alpha \; N_\beta \; \cdots \; N_\omega} = \frac{(N_\alpha + N_\beta + \cdots + N_\omega)!}{N_\alpha!\, N_\beta! \cdots N_\omega!},
\]
yields the correct answer. For sequences having only two distinct symbols (e.g., $\alpha\alpha\alpha\beta\beta$), the multinomial coefficient reduces to the more familiar binomial coefficient:
\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k!}.
\]
Sequences with two symbols can also be used to represent subset inclusion. For example, suppose we would like to count the number of possible five-card poker hands. Imagine all 52 cards laid out in a row, and place an $\alpha$ immediately below each card that is included in the poker hand (say $A\clubsuit, 2\clubsuit, 3\clubsuit, 4\clubsuit, 5\clubsuit$), and a $\beta$ immediately below each of the remaining 47 cards not included in the hand. Clearly there is a one-to-one correspondence between the set of five-card poker hands and the set of 52-character permutations consisting of 5 $\alpha$s and 47 $\beta$s. Thus the number of poker hands is
\[
\binom{52}{5} = 2{,}598{,}960.
\]
More generally, the number of ways of selecting subsets of size $k$ from a universal set of size $n$ (the number of combinations) is
\[
{}^{n}C_k = \binom{n}{k}.
\]

2.2 Occupancy Problems

Many common problems in discrete probability can be reduced to occupancy problems. Consider the problem of placing $n$ labeled tokens into $k$ labeled bins. In how many ways can this be done? Since each token can be assigned to a bin in $k$ distinguishable ways, independent of the rest, the multiplication principle yields $k^n$ arrangements.

Example 1. What is the cardinality of the set of $n$-bit binary numbers? Here each of the $n$ bits plays the role of a token, and each of the possible values, 0 or 1, assumes the role of a bin. Thus $k = 2$, and so the answer is $2^n$. Likewise, there are $2^n$ different possible outcomes from a sequence of $n$ tosses of a single coin.
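The counts above are small enough to verify by brute force; the following sketch checks the $5!/(3!\,2!)$ enumeration, the number of poker hands, and a small occupancy count:

```python
# Brute-force checks of the preceding counts.
import math
from itertools import permutations, product

# 5!/(3! 2!) = 10 distinct orderings of "aaabb":
print(len(set(permutations("aaabb"))))         # 10

# C(52, 5) five-card poker hands:
print(math.comb(52, 5))                        # 2598960

# Occupancy: n = 3 labeled tokens into k = 4 labeled bins gives 4^3.
print(len(list(product(range(4), repeat=3))))  # 64
```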
A slightly more difficult situation occurs when the $n$ tokens are indistinguishable from one another. For example, let $k = 6$ and $n = 8$, let vertical bars denote the dividers between the bins, and let asterisks denote the tokens. One possible arrangement is
\[
|\,{*}\,|\,{*}{*}\,|\,{*}{*}{*}{*}\,|\,{*}\,|\,\,|\,\,|
\]
Note that $k + 1 = 7$ dividers are used to delineate the $k = 6$ bins. Moreover, the two outer dividers are not really necessary, since no token can be placed to the left of the first divider, nor to the right of the last one. Thus this configuration can be abbreviated using the $n = 8$ asterisks and $k - 1 = 5$ interior dividers:
\[
{*}\,|\,{*}{*}\,|\,{*}{*}{*}{*}\,|\,{*}\,|\,\,|
\]
Since there is a one-to-one correspondence between the set of these configurations and the distinguishable strings of $n$ $\alpha$s and $k - 1$ $\beta$s, the number of possible arrangements is
\[
\binom{n + k - 1}{n} = \binom{n + k - 1}{k - 1} = \frac{(n + k - 1)!}{n!\,(k - 1)!}.
\]
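This "stars and bars" count can likewise be confirmed by direct enumeration; in the sketch below, each multiset of bin labels corresponds to one arrangement of indistinguishable tokens:

```python
# Stars and bars by brute force: multisets of n = 8 tokens over k = 6
# bins, versus the closed form C(n + k - 1, n).
import math
from itertools import combinations_with_replacement

n, k = 8, 6
brute = sum(1 for _ in combinations_with_replacement(range(k), n))
print(brute, math.comb(n + k - 1, n))   # 1287 1287
```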
2.3 Problems

2.1 How many different five-card hands of poker include four cards of the same rank?
2.2 How many five-card hands of poker include both three of a kind (three cards of the same rank) and a pair (two cards of the same rank)? This hand is called a full house.
2.3 How many five-card hands of poker include five cards in the same suit that do not have consecutive ranks? This hand is called a flush.
2.4 How many five-card hands of poker include five cards of consecutive rank, representing two or more suits? This hand is called a straight.
2.5 How many five-card hands of poker include only a pair of cards of the same rank?
2.6 In the game of blackjack, each player is initially dealt two cards from a standard deck of 52 cards. (We will ignore for the time being that casinos typically use several decks of cards to confound card counters.) If one of the cards dealt is an ace, and the other a face card ($J$, $Q$, $K$) or a ten, then the player declares "blackjack" and wins instantly. How many different combinations of two cards yield blackjack?
2.7 In how many different ways can two rooks be placed on a chessboard so that they can take one another? [1]
2.8 If $n$ tokens are placed randomly into $n$ bins, find the probability that exactly one bin remains empty. [1]
2.9 Three dice are thrown twice. What is the probability that the first throw differs from the second if the dice are distinguishable (e.g., one die might be red, another black, and the third one white)? Repeat this problem for the case that the dice are indistinguishable (so that a 1–3–5 now matches the permutations 1–5–3, 3–5–1, etc.). [1]
2.10 In how many distinguishable ways can $n$ red tokens and $m$ black tokens be placed into $k$ bins?
2.11 Prove the identity
\[
\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}.
\]
2.12 (a) Each year Bob orders exactly 10 boxes of cookies from Alice, the girl scout who lives next door. If 8 cookie varieties are currently available, in how many different ways can Bob place his order? (b) Each year Maria orders a number of boxes that ranges anywhere from 0 to 10. In how many different ways can she place her cookie order from Alice?

3 Elementary Probability Theory

The development of the theory of probability is a fascinating historical subject. [4]

3.1 Kolmogorov's Axioms of Probability

A. N. Kolmogorov (1933) introduced a probability space [3] as a triple $(\Omega, \mathcal{A}, P)$:

1. $\Omega$ is the set of elementary events, or sample space.

2. Let $A \subseteq \Omega$; $A$ is called an event. $\mathcal{A}$ then denotes the set of observable events. (Technically, $\mathcal{A}$ is a class; nevertheless it is usually referred to as a set.) In order to achieve self-consistency, $\mathcal{A}$ must also be a so-called $\sigma$-algebra of events:
   * If $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$.
   * If $A_i \in \mathcal{A}$ for $i = 1, 2, \ldots$ (finite or countably infinite), then $\bigcup_i A_i \in \mathcal{A}$.
   * $\Omega \in \mathcal{A}$.

3. $P$ is a probability measure that assigns a non-negative real number to each element of $\mathcal{A}$, such that
   * $P(\Omega) = 1$, and
   * if $A_1, A_2, \ldots$ (with $A_i \in \mathcal{A}$ for $i = 1, 2, \ldots$) denotes a sequence of mutually exclusive events, then
\[
P\Bigl(\bigcup_i A_i\Bigr) = \sum_i P(A_i). \qquad \text{(complete additivity)}
\]

Example 2 (Three consecutive coin tosses). The set of elementary events is finite:
\[
\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.
\]
The set of measurable events $\mathcal{A}$ has cardinality $|\mathcal{A}| = 2^{|\Omega|} = 256$. Some elements of $\mathcal{A}$ include:

* Each elementary event: $\{HHH\}$, $\{HHT\}$, $\{HTH\}$, $\{HTT\}$, …
* Sequences with two or more heads: $\{HHH, HHT, HTH, THH\}$
* Sequences with one or fewer heads: $\{HTT, THT, TTH, TTT\}$
* Sequences with exactly one head: $\{HTT, THT, TTH\}$
* Palindromic sequences: $\{HHH, HTH, TTT, THT\}$
* Non-palindromic sequences: $\{HHT, HTT, THH, TTH\}$
* First toss lands heads: $\{HHH, HTH, HHT, HTT\}$
* First toss lands tails: $\{THH, THT, TTH, TTT\}$
* Homogeneous sequences: $\{HHH, TTT\}$
* Heterogeneous sequences: $\{HHT, HTH, HTT, THH, THT, TTH\}$

3.2 Some Useful Theorems

For a $\sigma$-algebra $\mathcal{A}$:

1. If $A_i \in \mathcal{A}$ for $i = 1, 2, \ldots, n$, then $\bigcap_{i=1}^{n} A_i \in \mathcal{A}$.
Proof: By De Morgan's law,
\[
\bigcap_{i=1}^{n} A_i = \Bigl(\bigcup_{i=1}^{n} A_i^c\Bigr)^c.
\]
By the definition of a $\sigma$-algebra, the right side of the above is in $\mathcal{A}$. ∎

2. If $A, B \in \mathcal{A}$, then $A \setminus B \in \mathcal{A}$.
Proof: $A \setminus B = A \cap B^c$. From the theorem above, $A \cap B^c \in \mathcal{A}$. ∎

Given a probability space $(\Omega, \mathcal{A}, P)$, with $A, B, C \in \mathcal{A}$:

3. $P(A^c) = 1 - P(A)$.
4. $P(\emptyset) = 0$.
5. $P(A) = 0 \nRightarrow A = \emptyset$.
6. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
7. $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.

4 Conditional Probability

Given a probability space $(\Omega, \mathcal{A}, P)$, with $A, B \in \mathcal{A}$ such that $P(B) > 0$, the conditional probability of $A$ with respect to $B$ is defined as
\[
P(A \mid B) \stackrel{\text{def}}{=} \frac{P(A \cap B)}{P(B)}.
\]

Theorems:

8. For $A_i \in \mathcal{A}$, $i = 1, 2, \ldots, n$,
\[
P\Bigl(\bigcap_{i=1}^{n} A_i\Bigr) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) \cdots P\Bigl(A_n \Bigm| \bigcap_{i=1}^{n-1} A_i\Bigr).
\]
9. $P(\Omega \mid A_i) = 1$.
10. If $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots$ (finite or countably infinite), are mutually exclusive ($A_i \cap A_j = \emptyset$ whenever $i \neq j$), and if $B \in \mathcal{A}$ with $P(B) > 0$, then
\[
P\Bigl(\bigcup_i A_i \Bigm| B\Bigr) = \sum_i P(A_i \mid B).
\]

4.1 Bayes's Theorem

\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.
\]
Proof: From the definition of conditional probability (or Theorem 8 with $n = 2$),
\[
P(A \cap B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A) \quad \text{(by symmetry)}.
\]
Equating the two right-hand sides above yields the hypothesis. ∎

Theorem (Total Probability): Let $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots, n$, be mutually exclusive, with $\bigcup_{i=1}^{n} A_i = \Omega$, and let $B \in \mathcal{A}$. Then
\[
P(B) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i).
\]
Proof: $B = \bigcup_{i=1}^{n} B \cap A_i$. Note that $(B \cap A_i) \cap (B \cap A_j) = \emptyset$ whenever $i \neq j$. ∎

Combining the above yields the more general theorem of Bayes:
\[
P(A_j \mid B) = \frac{P(B \mid A_j)\, P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)}.
\]
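A small worked instance of Bayes's theorem may be helpful here. The numbers below are ours, chosen purely for illustration (a sensitive diagnostic test applied to a rare condition):

```python
# Illustrative Bayes update: P(B|A) = 0.99, false-positive rate
# P(B|A^c) = 0.05, prior P(A) = 0.01, where B = "test positive".

p_a = 0.01
p_b_given_a = 0.99
p_b_given_not_a = 0.05

# Total probability: P(B) = P(B|A) P(A) + P(B|A^c) P(A^c)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes: P(A|B) = P(B|A) P(A) / P(B)
print(p_b_given_a * p_a / p_b)   # ~0.167: a positive test still leaves A unlikely
```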
5 Statistical Independence

Two events $A, B \in \mathcal{A}$ are said to be independent if $P(A \cap B) = P(A)\, P(B)$.

Exercise: If $A$ and $B$ are independent, show that
\[
P(A^c \cap B) = P(A^c)\, P(B), \qquad P(A \cap B^c) = P(A)\, P(B^c), \qquad \text{and} \qquad P(A^c \cap B^c) = P(A^c)\, P(B^c).
\]
More generally, $n$ events $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots, n$, are said to be mutually independent if all of the following equations are satisfied, for $m = 1, 2, 3, \ldots, n$:
\[
P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_m}) = P(A_{i_1})\, P(A_{i_2}) \cdots P(A_{i_m}), \qquad \text{with } 1 \le i_1 < i_2 < \cdots < i_m \le n.
\]

Example 3. With $n = 3$, the events $A$, $B$, and $C$ are independent if and only if
\[
P(A \cap B) = P(A)\,P(B), \qquad P(A \cap C) = P(A)\,P(C), \qquad P(B \cap C) = P(B)\,P(C),
\]
\[
P(A \cap B \cap C) = P(A)\,P(B)\,P(C).
\]

Example 4. Let $A_1, A_2, \ldots, A_n$ denote $n$ mutually independent events, and let $p_i = P(A_i)$ for $i = 1, \ldots, n$. Then the probability that none of the events occurs is
\begin{align*}
P(A_1^c \cap A_2^c \cap \cdots \cap A_n^c)
&= P\bigl((A_1 \cup A_2 \cup \cdots \cup A_n)^c\bigr) = 1 - P(A_1 \cup A_2 \cup \cdots \cup A_n) \\
&= 1 - \sum_{i_1=1}^{n} P(A_{i_1}) + \sum_{i_1=1}^{n}\sum_{i_2=i_1+1}^{n} P(A_{i_1} \cap A_{i_2}) - \sum_{i_1=1}^{n}\sum_{i_2=i_1+1}^{n}\sum_{i_3=i_2+1}^{n} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) + \cdots + (-1)^n P(A_1 \cap A_2 \cap \cdots \cap A_n) \\
&= 1 - \sum_{i_1=1}^{n} p_{i_1} + \sum_{i_1=1}^{n}\sum_{i_2=i_1+1}^{n} p_{i_1} p_{i_2} - \sum_{i_1=1}^{n}\sum_{i_2=i_1+1}^{n}\sum_{i_3=i_2+1}^{n} p_{i_1} p_{i_2} p_{i_3} + \cdots + (-1)^n p_1 p_2 p_3 \cdots p_n \\
&= (1 - p_1)(1 - p_2)(1 - p_3) \cdots (1 - p_n) \\
&= P(A_1^c)\, P(A_2^c)\, P(A_3^c) \cdots P(A_n^c).
\end{align*}

5.1 Birthday Paradox

Let $A_n$ denote the event that two or more people in a group of $n$ share the same birthday (neglecting leap years). Then
\[
P(A_n^c) = 1 \cdot \Bigl(1 - \frac{1}{365}\Bigr)\Bigl(1 - \frac{2}{365}\Bigr) \cdots \Bigl(1 - \frac{n-1}{365}\Bigr).
\]
Thus,
\[
P(A_n) = 1 - P(A_n^c) = 1 - \Bigl(1 - \frac{1}{365}\Bigr)\Bigl(1 - \frac{2}{365}\Bigr) \cdots \Bigl(1 - \frac{n-1}{365}\Bigr).
\]

| $n$ | $P(A_n)$ |
|---|---|
| 10 | 0.116948 |
| 20 | 0.411438 |
| 30 | 0.706316 |
| 40 | 0.891232 |
| 50 | 0.970374 |
| 60 | 0.994123 |
| 70 | 0.999160 |
| 80 | 0.999914 |

(The accompanying figure plots $P(A_n)$ against $n$ from 0 to 80, rising from 0 toward 1.)
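The table is easy to reproduce numerically; a minimal sketch:

```python
# Reproducing the birthday-paradox table above.

def p_shared_birthday(n: int) -> float:
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (365 - i) / 365
    return 1.0 - p_distinct

for n in (10, 20, 30, 40, 50, 60, 70, 80):
    print(n, round(p_shared_birthday(n), 6))   # matches the table
print(p_shared_birthday(23))  # ~0.5073: n = 23 is the smallest n exceeding 1/2
```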
6 Bernoulli Trials

Consider the example of a coin being tossed $n$ times. Assume successive tosses are mutually independent, and let $p = P(H)$ and $q = 1 - p = P(T)$. (Note, $|\Omega| = 2^n$, and $|\mathcal{A}| = 2^{2^n}$.) Then
\[
b(i; n) = P(i \text{ heads in a sequence of } n \text{ tosses}) = \binom{n}{i} p^i q^{n-i}.
\]
Note,
\[
P(\Omega) = \sum_{i=0}^{n} b(i; n) = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = (p + q)^n = 1.
\]

Example 5.
\[
P(1 \text{ or more heads in } n \text{ tosses}) = 1 - b(0; n) = 1 - (1 - p)^n.
\]

7 Discrete Random Variables

An integer-valued random variable is a function $X : \Omega \to S$, where $S \subseteq \mathbb{Z}$. (This definition is readily generalized to random variables that assume binary values, categorical values, and values from other discrete spaces.) The discrete random variable $X$ is said to be measurable if
\[
\{\omega \in \Omega : X(\omega) = i\} \in \mathcal{A}, \qquad \forall i \in S.
\]
The probability distribution of a discrete random variable $X \in S$ is defined by
\[
P_i \stackrel{\text{def}}{=} P\{X = i\} = P\{\omega \in \Omega : X(\omega) = i\}.
\]

7.1 Common Discrete Distributions

| Distribution | Parameters | $P_i$ | $S$ |
|---|---|---|---|
| Uniform | $n \in \mathbb{Z}^+$ | $\dfrac{1}{n}$ | $\{1, 2, \ldots, n\}$ |
| Binomial | $p, q \ge 0$, $p + q = 1$; $n \in \mathbb{Z}^+$ | $\dbinom{n}{i} p^i q^{n-i}$ | $\{0, 1, \ldots, n\}$ |
| Geometric | $p, q \ge 0$, $p + q = 1$ | $q\,p^i$ | $\{0, 1, 2, \ldots\}$ |
| Poisson | $\lambda \in (0, +\infty)$ | $e^{-\lambda} \dfrac{\lambda^i}{i!}$ | $\{0, 1, 2, \ldots\}$ |
| Hypergeometric | $k, m, n \in \mathbb{Z}^+$ | $\dbinom{m}{i}\dbinom{n}{k-i} \Bigm/ \dbinom{m+n}{k}$ | $\{0, 1, \ldots, m\}$ |

7.2 Distributions of Several Discrete Random Variables

Let $X_i : \Omega \to S_i \subseteq \mathbb{Z}$ for $i = 1, 2, \ldots, n$. These random variables are measurable if $\forall x_1 \in S_1, \ldots, \forall x_n \in S_n$,
\[
\{\omega \in \Omega : X_1(\omega) = x_1, \ldots, X_n(\omega) = x_n\} \in \mathcal{A}.
\]
We define the joint discrete probability distribution as
\[
P(x_1, x_2, \ldots, x_n) \stackrel{\text{def}}{=} P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = P\{\omega \in \Omega : X_1(\omega) = x_1, \ldots, X_n(\omega) = x_n\}.
\]
The random variables $X_1, \ldots, X_n$ are said to be independent if the probability distribution factors, as
\[
P(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1\}\, P\{X_2 = x_2\} \cdots P\{X_n = x_n\} = P_{X_1}(x_1)\, P_{X_2}(x_2) \cdots P_{X_n}(x_n).
\]
(Often the subscripts are dropped in the last expression, if no ambiguity arises.)

8 Continuous Random Variables

A real-valued random variable is a function $X : \Omega \to S \subseteq \mathbb{R}$. A real-valued random variable is measurable if
\[
\{\omega \in \Omega : X(\omega) < x\} \in \mathcal{A}, \qquad \forall x \in S.
\]
We define the probability distribution of the real-valued random variable $X$ as
\[
F_X(x) \stackrel{\text{def}}{=} P\{X < x\} = P\{\omega \in \Omega : X(\omega) < x\},
\]
where the subscript of $F_X$ is often omitted if no confusion arises. Likewise, we define the probability density of $X \in S$ as
\[
f_X(x) = \frac{d}{dx} F_X(x)
\]
at all points $x \in S$ where the derivative is defined.

8.1 Common Continuous Probability Densities

| Distribution | Parameters | $f_X(x)$ | $S$ |
|---|---|---|---|
| Rectangular | $a, b \in \mathbb{R}$, $a < b$ | $\dfrac{1}{b - a}$ | $[a, b]$ |
| Triangular | $a > 0$ | $\dfrac{1}{a}\Bigl(1 - \dfrac{|x|}{a}\Bigr)$ | $[-a, a]$ |
| Normal | $\mu \in \mathbb{R}$, $\sigma \in (0, +\infty)$ | $\dfrac{1}{\sigma\sqrt{2\pi}} \exp\Bigl(-\dfrac{(x - \mu)^2}{2\sigma^2}\Bigr)$ | $\mathbb{R}$ |
| Gamma (or exponential if $\lambda = 1$) | $\lambda, s \in (0, +\infty)$ | $\dfrac{1}{s\,\Gamma(\lambda)} \Bigl(\dfrac{x}{s}\Bigr)^{\lambda - 1} e^{-x/s}$ | $[0, \infty)$ |
| Cauchy | $s \in (0, +\infty)$ | $\dfrac{1}{\pi}\, \dfrac{s}{x^2 + s^2}$ | $\mathbb{R}$ |
| Beta | $a, b \in (0, +\infty)$ | $\dfrac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1} (1 - x)^{b-1}$ | $[0, 1]$ |

where
\[
\Gamma(u) \stackrel{\text{def}}{=} \int_0^\infty t^{u-1} e^{-t}\, dt, \qquad \Gamma(u + 1) = u\,\Gamma(u), \qquad \Gamma(n + 1) = n!.
\]

9 Statistical Moments of Random Variables

| Moment | Discrete | Continuous* |
|---|---|---|
| Mean | $E(X) = \sum_{i \in S} i\, P\{X = i\}$ | $E(X) = \int_S x\, f_X(x)\, dx$ |
| $k$-th moment | $E(X^k) = \sum_{i \in S} i^k\, P\{X = i\}$ | $E(X^k) = \int_S x^k f_X(x)\, dx$ |

In both cases the variance is
\[
\mathrm{Var}(X) \stackrel{\text{def}}{=} E\bigl((X - E(X))^2\bigr) = E(X^2) - E(X)^2,
\]
and
\[
E(aX + bY) = a\,E(X) + b\,E(Y), \qquad \mathrm{Var}(aX + b) = a^2\, \mathrm{Var}(X).
\]

*If the density $f_X(x)$ is undefined, these integrals can be evaluated as Stieltjes integrals, i.e.,
\[
E(g(X)) \stackrel{\text{def}}{=} \int_S g(x)\, dF_X(x) = \lim_{\delta \to 0} \sum_{i=-\infty}^{\infty} \Bigl(\sup_{i\delta < x \le (i+1)\delta} g(x)\Bigr) \bigl(F_X((i+1)\delta) - F_X(i\delta)\bigr),
\]
where we assume the same value is obtained if $\sup$ is replaced by $\inf$.

Example 6 (Indicator Functions). Let $A \subseteq \mathbb{R}$ denote a measurable event for a real-valued random variable $X$. The indicator function of event $A$ is defined by
\[
I_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{otherwise.} \end{cases}
\]
Then,
\[
E\{I_A(X)\} = \int_{\mathbb{R}} I_A(x)\, dF_X(x) = \int_A dF_X(x) = P\{A\}.
\]

Example 7 (Moments of a Binomial Random Variable). Let $X \sim \text{Binomial}(n, p)$, i.e.,
\[
P\{X = i\} = b(i; n) = \binom{n}{i} p^i q^{n-i}
\]
for $i = 0, 1, \ldots, n$, where $q = 1 - p$. To compute the mean:
\[
E(X) = \sum_{i=0}^{n} i\, P\{X = i\} = \sum_{i=0}^{n} i \binom{n}{i} p^i q^{n-i} = p \frac{\partial}{\partial p} \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = p \frac{\partial}{\partial p} (p + q)^n = np(p + q)^{n-1} = np.
\]
To compute the variance:
\[
\mathrm{Var}(X) = E(X^2) - E(X)^2 = E(X(X-1)) + E(X) - E(X)^2,
\]
\[
E(X(X-1)) = \sum_{i=0}^{n} i(i-1) \binom{n}{i} p^i q^{n-i} = p^2 \frac{\partial^2}{\partial p^2} \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = p^2 \frac{\partial^2}{\partial p^2} (p + q)^n = n(n-1)p^2,
\]
\[
\mathrm{Var}(X) = n(n-1)p^2 + np - (np)^2 = np(1 - p) = npq.
\]

9.1 Generating Functions for Discrete Random Variables

If $X$ is a discrete random variable, defined for example on the integers, we define its generating function $g : \mathbb{R} \to \mathbb{R}$ according to the expression
\[
g_X(s) = E\bigl[s^X\bigr] = \sum_{k=-\infty}^{+\infty} s^k\, P\{X = k\}.
\]
Note that $g_X(1) = 1$, as $\sum_k P\{X = k\} = 1$. Likewise, we see that
\[
g_X'(1) = \sum_k k\, P\{X = k\} = E(X),
\]
\[
g_X''(1) = \sum_k k(k-1)\, P\{X = k\} = E[X(X-1)] = E\bigl[X^2\bigr] - E[X].
\]
Thus, $\mathrm{Var}[X] = g_X''(1) + g_X'(1) - \bigl(g_X'(1)\bigr)^2$.
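As a numerical check of these identities, one can differentiate the binomial generating function $g_B(s) = (1 + (s-1)p)^n$ (derived in Problem 9.2 below) at $s = 1$ by finite differences:

```python
# Generating-function identities for X ~ Binomial(n, p), checked with
# central finite differences at s = 1.

n, p = 10, 0.3

def g(s: float) -> float:
    return (1.0 + (s - 1.0) * p) ** n

h = 1e-4
g1 = (g(1 + h) - g(1 - h)) / (2 * h)          # ~ E[X] = np
g2 = (g(1 + h) - 2 * g(1) + g(1 - h)) / h**2  # ~ E[X(X-1)] = n(n-1)p^2

print(g1)               # ~3.0 = np
print(g2 + g1 - g1**2)  # ~2.1 = np(1-p), the variance
```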
Generating functions are especially useful for computing moments of the sum of a random number of random variables. (This is an example of what is known as a compound distribution.) For example, let $S = X_1 + X_2 + \cdots + X_N$, where the $X_i \in \mathbb{Z}$ are mutually independent and identically distributed (i.i.d.) with $p_k = P\{X_i = k\}$, and $N \in \mathbb{Z}^{\ge 0}$ is chosen independently according to the distribution $q_n = P\{N = n\}$. Letting $g_X(s) = \sum_k p_k s^k$ and $g_N(s) = \sum_{n=0}^{\infty} q_n s^n$, by definition,
\begin{align*}
g_S(s) = E\bigl[s^S\bigr] &= E\bigl[s^{X_1 + \cdots + X_N}\bigr] \\
&= E\Bigl[E\bigl[s^{X_1 + \cdots + X_N} \bigm| N = n\bigr]\Bigr] \\
&= \sum_{n=0}^{\infty} E\bigl[s^{X_1 + \cdots + X_n} \bigm| N = n\bigr]\, q_n \\
&= \sum_{n=0}^{\infty} E\bigl[s^{X_1 + \cdots + X_n}\bigr]\, q_n \\
&= \sum_{n=0}^{\infty} E\bigl[s^{X_1}\bigr] \cdots E\bigl[s^{X_n}\bigr]\, q_n = \sum_{n=0}^{\infty} \bigl(g_X(s)\bigr)^n q_n \\
&= g_N\bigl(g_X(s)\bigr).
\end{align*}
By the chain rule of calculus, $E[S] = g_S'(1) = g_N'\bigl(g_X(1)\bigr)\, g_X'(1) = g_N'(1)\, g_X'(1) = E[N]\, E[X]$.

9.2 Characteristic Functions for Continuous Random Variables

If $X$ is a random variable, defined for example on the reals, we define its characteristic function as
\[
\phi_X(t) \stackrel{\text{def}}{=} E\bigl[e^{itX}\bigr] = \int_{-\infty}^{+\infty} e^{itx}\, dF(x),
\]
where $i = \sqrt{-1}$ and $t \in \mathbb{R}$. In the event that $X$ is a continuous random variable with probability density $f_X(x)$, the above reduces to
\[
\phi_X(t) = \int_{-\infty}^{+\infty} e^{itx} f_X(x)\, dx.
\]

9.3 Problems

9.1 A coin that lands heads with probability $p$ is tossed repeatedly until it lands tails. Compute the mean and variance of the number of times the coin is tossed.
9.2 Show that the generating function for a binomially distributed random variable $B \sim \text{Binomial}(n, p)$ is given by
\[
g_B(s) = \bigl(1 + (s - 1)p\bigr)^n.
\]
Use $g_B$ to compute the mean and variance of $B$.
9.3 Show that the generating function for a Poisson random variable $X \sim \text{Poisson}(\lambda)$ is given by $g_X(s) = e^{(s-1)\lambda}$. Use $g_X$ to compute the mean and variance of $X$.
9.4 A random variable $U$ assumes values from the discrete set $\{0, 1, 2, \ldots, n-1\}$ with equal probability $1/n$. Show that its generating function is given by
\[
g_U(s) = \frac{1 - s^n}{n(1 - s)}.
\]
Use $g_U$ to compute the mean and variance of $U$.
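Before moving on, the compound-sum identity $E[S] = E[N]\,E[X]$ derived above can be sanity-checked by simulation; the setup below (Poisson $N$, die-roll $X_i$) is ours, purely illustrative:

```python
# Monte Carlo check of E[S] = E[N] E[X]: N ~ Poisson(2), independent of
# i.i.d. die rolls X_i with E[X] = 3.5, so E[S] should be near 7.
import math
import random

random.seed(1)

def sample_poisson(lam: float) -> int:
    # Knuth's method: count uniform draws until their product < e^(-lam).
    threshold, k, prod = math.exp(-lam), 0, random.random()
    while prod > threshold:
        k += 1
        prod *= random.random()
    return k

trials = 100_000
total = sum(
    sum(random.randint(1, 6) for _ in range(sample_poisson(2.0)))
    for _ in range(trials)
)
print(total / trials)   # ~7.0 = E[N] * E[X]
```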
10 Multiple Random Variables

Sometimes problems will involve more than one random quantity, e.g., $X_i : \Omega \to S_i \subseteq \mathbb{R}$, for $i = 1, 2, \ldots, n$. These variables are measurable if $\forall x_1 \in S_1, \ldots, \forall x_n \in S_n$,
\[
\{\omega \in \Omega : X_1(\omega) < x_1, \ldots, X_n(\omega) < x_n\} \in \mathcal{A}.
\]
We define the joint probability distribution as
\[
F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) \stackrel{\text{def}}{=} P\{X_1 < x_1, \ldots, X_n < x_n\} = P\{\omega \in \Omega : X_1(\omega) < x_1, \ldots, X_n(\omega) < x_n\}.
\]
The joint probability density is defined as
\[
f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1,\ldots,X_n}(x_1, \ldots, x_n).
\]
The random variables $X_1, \ldots, X_n$ are said to be independent if
\[
F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n).
\]

11 Distributions of functions of random variables

Let $X$ denote a continuous random variable with probability distribution $F_X$, and let $\phi : \mathbb{R} \to \mathbb{R}$. Then the probability distribution of the induced random variable $Y = \phi(X)$ is obtained as
\[
F_Y(y) = P\{Y \le y\} = P\bigl\{X \in \phi^{-1}\bigl((-\infty, y]\bigr)\bigr\}.
\]
(With slight notational abuse, we let $\phi^{-1}(S)$ denote the inverse image of the set $S$; that is, $x \in \phi^{-1}(S)$ if and only if $y = \phi(x) \in S$.) Consequently, if both $F_X$ and $\phi^{-1}$ are differentiable, then the probability density of $Y$ is obtained via
\[
f_Y(y) = \sum_{\{x : \phi(x) = y\}} f_X(x)\, \bigl|\phi'(x)\bigr|^{-1}.
\]
As an example, consider $Y = X^2$, i.e., $\phi(x) = x^2$. Then
\[
f_Y(y) = \frac{f_X\bigl(\sqrt{y}\bigr)}{2\sqrt{y}} + \frac{f_X\bigl(-\sqrt{y}\bigr)}{2\sqrt{y}}.
\]
To generalize the above to the multivariate case, let $X_1, X_2, \ldots, X_n$ denote $n$ random variables governed by the joint probability distribution $F_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$, and let $\boldsymbol{\phi} : \mathbb{R}^n \to \mathbb{R}^n$. Let $Y_i = \phi_i(X_1, \ldots, X_n)$, for $i = 1, \ldots, n$, denote the $n$ components of $\boldsymbol{\phi}$ applied to $X_1, \ldots, X_n$. With the assumption that $f_{X_1,\ldots,X_n}$ exists, and that $\boldsymbol{\phi}$ is differentiable, then
\[
f_{Y_1,\ldots,Y_n}(\mathbf{y}) = \sum_{\{\mathbf{x} : \mathbf{y} = \boldsymbol{\phi}(\mathbf{x})\}} \frac{f_{X_1,\ldots,X_n}(\mathbf{x})}{|J(\mathbf{x})|},
\]
where
\[
J(\mathbf{x}) = \det \begin{pmatrix}
\dfrac{\partial \phi_1}{\partial x_1} & \cdots & \dfrac{\partial \phi_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial \phi_n}{\partial x_1} & \cdots & \dfrac{\partial \phi_n}{\partial x_n}
\end{pmatrix}
\]
is the Jacobian determinant of the variable transformation.

11.1 Distributions of sums and quotients

Assume that $X$ and $Y$ are continuous random variables defined by the joint probability density $f(x, y)$, i.e.,
\[
F_{X,Y}(x, y) = P\{X \le x,\, Y \le y\} = \int_{-\infty}^{x} \int_{-\infty}^{y} f(x', y')\, dy'\, dx'.
\]
The distribution and density of $S = X + Y$ are obtained by computing the probability of the event $\{S \le s\}$:
\[
F_S(s) = P\{S \le s\} = P\{X + Y \le s\} = \int_{-\infty}^{+\infty} dx \int_{-\infty}^{s - x} f(x, y)\, dy.
\]
(Figure 4: the shaded region depicts the event $X + Y \le s$, the half-plane below the line $y = s - x$.) Whence,
\[
f_S(s) = F_S'(s) = \int_{-\infty}^{+\infty} f(x, s - x)\, dx.
\]
In the event that $X$ and $Y$ are also independent, $f(x, y) = f_X(x)\, f_Y(y)$, and
\[
f_S(s) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(s - x)\, dx = (f_X * f_Y)(s), \tag{1}
\]
the convolution of the two densities.

Similarly, the distribution and density of the quotient $Q = Y/X$ are found by evaluating the probability of the event
\[
\{Q \le q\} = \{(x, y) \in \mathbb{R}^2 : y/x \le q\} = \{(x, y) : y \ge qx,\ x < 0\} \cup \{(x, y) : y \le qx,\ x > 0\}.
\]
(Figure 5: the shaded region depicts the event $Y/X \le q$, for $q > 0$, bounded by the line $y = qx$.) Thus,
\[
F_Q(q) = \int_{-\infty}^{0} dx \int_{qx}^{+\infty} f(x, y)\, dy + \int_{0}^{+\infty} dx \int_{-\infty}^{qx} f(x, y)\, dy.
\]
Whence,
\[
f_Q(q) = F_Q'(q) = -\int_{-\infty}^{0} x f(x, qx)\, dx + \int_{0}^{+\infty} x f(x, qx)\, dx = \int_{-\infty}^{+\infty} |x|\, f(x, qx)\, dx.
\]

11.2 Problems

11.1 Derive the probability density of $Y = e^X$, where $X$ is a given continuous random variable with probability density $f_X$ defined over the reals.
11.2 Let $X$ and $Y$ be continuous random variables with joint probability density $f(x, y)$. Derive expressions for the probability distributions and densities of (a) $aX + bY$, (b) $X - Y$, (c) $XY$, and (d) $X/Y$.
11.3 Let $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$ be independent, normal random variables. Use formula (1) to derive the probability density of the sum $X + Y$.
11.4 Let $X, Y \in [0, +\infty)$ be independent, exponentially distributed random variables, with $f_X(x) = \alpha e^{-\alpha x}$ and $f_Y(y) = \beta e^{-\beta y}$, where $\alpha, \beta \in (0, +\infty)$. Compute the probability density of $X + Y$.
11.5 Let $Z = Y^X$, where $X$ and $Y$ are independent continuous random variables that are uniformly distributed on the unit interval $(0, 1)$. Show that $E\{Z^k\} = \log(k + 1)/k$, for $k = 1, 2, \ldots$.

12 Convergence of a random sequence

Since the precise values of random variables are usually unpredictable, it is somewhat surprising that one can say anything substantial about the limit of a sequence of random variables of the form $X_1, X_2, X_3, \ldots$. As one might expect, the degree of certainty obtained for deterministic sequences, such as $x_k = (2 + k)/(3 + 2k)$, cannot in general be realized for a random sequence. However, four common categories of convergence are encountered in a probabilistic framework, and are here defined from strongest to weakest.

1. The sequence $X_n$ is said to converge with probability one to $X$ if
\[
P\Bigl\{\lim_{n \to \infty} X_n = X\Bigr\} = 1.
\]
This type is also known as almost sure (a.s.) convergence, which we often write as $\lim_{n \to \infty} X_n = X$ (a.s.).

2. The sequence $X_n$ is said to converge in probability to $X$ (or $X_n \to X$, i.p.) if for every $\epsilon > 0$,
\[
\lim_{n \to \infty} P\{|X_n - X| > \epsilon\} = 0.
\]

3. The sequence $X_n$ is said to converge in quadratic mean to $X$ (or $X_n \to X$, in q.m.) if
\[
\lim_{n \to \infty} E\bigl[(X_n - X)^2\bigr] = 0.
\]

4. For $n = 1, 2, \ldots$, let $F_n(x) = P\{X_n \le x\}$ denote the probability distribution of $X_n$, and let $F(x) = P\{X \le x\}$ denote that of $X$. The sequence $X_n$ is said to converge in distribution to $X$ if
\[
\lim_{n \to \infty} F_n(x) = F(x)
\]
for all $x$ where $F$ is continuous.
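Convergence in probability can be watched empirically. The sketch below (an illustration of the weak law proved in the next two sections) estimates $P\{|M_n - \mu| > \epsilon\}$ for sample means of fair-die rolls:

```python
# Sample means of fair-die rolls concentrate around mu = 3.5 as n grows.
import random

random.seed(0)
eps, mu, trials = 0.25, 3.5, 2000

for n in (10, 100, 1000):
    bad = sum(
        1
        for _ in range(trials)
        if abs(sum(random.randint(1, 6) for _ in range(n)) / n - mu) > eps
    )
    print(n, bad / trials)   # estimated P{|M_n - mu| > eps}, shrinking with n
```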
13 Chebyshev's Inequality

Let $X : \Omega \to \mathbb{R}$ denote a random variable with mean $\mu$ and variance $\sigma^2$. Then for any $\epsilon > 0$,
\[
P\{|X - \mu| > \epsilon\} \le \frac{\sigma^2}{\epsilon^2}.
\]
Proof: Let $f$ denote the probability density of $X$.
\begin{align*}
\sigma^2 = \mathrm{Var}\{X\} = E\{(X - \mu)^2\}
&= \int_{\mathbb{R}} (x - \mu)^2 f(x)\, dx \\
&= \int_{|x - \mu| > \epsilon} (x - \mu)^2 f(x)\, dx + \int_{|x - \mu| \le \epsilon} (x - \mu)^2 f(x)\, dx \\
&\ge \int_{|x - \mu| > \epsilon} (x - \mu)^2 f(x)\, dx \\
&\ge \epsilon^2 \int_{|x - \mu| > \epsilon} f(x)\, dx = \epsilon^2\, P\{|X - \mu| > \epsilon\}. \qquad \blacksquare
\end{align*}

Chebyshev's inequality is a rather "weak" inequality. Note that $\epsilon \le \sigma$ yields a trivial upper bound. Thus, Chebyshev's inequality is most useful when $\epsilon \gg \sigma$. For example, it can be used to prove the weak law of large numbers, which states that the average value of a sequence of independent observations tends to the statistical mean.

14 Weak law of large numbers

Let $X_1, X_2, \ldots, X_n$ denote a sequence of i.i.d. random variables, with $\mu = E\{X_i\}$ and $\sigma^2 = E\{(X_i - \mu)^2\}$, for $i = 1, 2, \ldots$. Let
\[
M_n = \frac{1}{n}(X_1 + \cdots + X_n)
\]
denote the sample mean. Observe,
\[
E\{M_n\} = E\Bigl\{\frac{1}{n}(X_1 + \cdots + X_n)\Bigr\} = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu,
\]
\[
\mathrm{Var}\{M_n\} = \mathrm{Var}\Bigl\{\frac{1}{n}(X_1 + \cdots + X_n)\Bigr\} = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}.
\]
By Chebyshev's inequality, for all $\epsilon > 0$,
\[
P\{|M_n - \mu| > \epsilon\} \le \frac{\sigma^2}{n\epsilon^2}.
\]
Thus, $M_n = \frac{1}{n} \sum_{i=1}^{n} X_i \to \mu$ in probability as $n \to \infty$.

15 Strong Law of Large Numbers

With probability one, $M_n$ converges to the mean:
\[
P\Bigl\{\lim_{n \to \infty} M_n = \mu\Bigr\} = 1.
\]

16 Borel-Cantelli Lemmas

Lemma 1 (Borel-Cantelli). Let $\{A_n\}$ denote a sequence of measurable events.

1. If $\sum_{n=1}^{\infty} P\{A_n\} < \infty$, then $P\bigl(\limsup_n A_n\bigr) = 0$.
2. If the events $A_n$ are mutually independent and $\sum_{n=1}^{\infty} P\{A_n\} = \infty$, then $P\bigl(\limsup_n A_n\bigr) = 1$.

(Note that the second statement requires independence; the first does not.)

References

[1] William Feller. An Introduction to Probability Theory and Its Applications, volume I. John Wiley & Sons, New York, third edition, 1968.
[2] Paul R. Halmos. Naive Set Theory. Springer-Verlag, New York, 1974.
[3] A. N. Kolmogorov. Foundations of Probability. Chelsea, New York, second edition, 1956.
[4] I. Todhunter. A History of the Mathematical Theory of Probability. Chelsea, 1965.