MSc Maths and Statistics 2008, Department of Economics, UCL
Jidong Zhou

Chapter 5: Probability

Probability theory is the main tool for dealing with problems that involve uncertainty. We first introduce the standard probability model.[1]

[1] What exactly does probability mean? This is a matter of debate among statisticians. See, e.g., Section 1.2 in DeGroot and Schervish (2002).

1 Definition of Probability

• Experiments: an experiment is a process whose outcome is not known in advance with certainty. For example, tossing a coin twice is an experiment.
• The sample space: the set of all possible outcomes of an experiment, denoted by Ω. For example, the possible outcomes of tossing a coin twice are {HH, HT, TH, TT}.
• Events: an event is a subset of the sample space Ω. For example, “at least one Head is realized”, or {HH, HT, TH}, is an event.

We now define probability:

• Let Σ be the set of all “well-behaved” subsets of Ω (or all possible “interesting” events of an experiment).[2]
• To each event A in Σ we assign a number Pr(A), which is called the probability of the event A, if the following properties (or axioms) are satisfied:
  - for any event A, Pr(A) ≥ 0.
  - Pr(Ω) = 1.
  - for every sequence of disjoint events {A_i}_{i∈N},
    $$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i).$$
• Based on this definition, we can derive the following results:
  - Pr(∅) = 0
  - for any finite sequence of disjoint sets {A_i}_{i=1,2,...,n},
    $$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i)$$
  - Pr(A^c) = 1 − Pr(A)
  - if A ⊂ B, then Pr(A) ≤ Pr(B)
  - for any A ∈ Σ, 0 ≤ Pr(A) ≤ 1
  - for any A, B ∈ Σ, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(AB), where Pr(AB) = Pr(A ∩ B).

[2] More precisely, we only consider those subsets or events which are measurable. But the concept of measurable sets is beyond the scope of this course. No such problem arises if the sample space Ω is a finite set.
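
As an illustration, the derived results can be checked on the coin-tossing example. The sketch below assumes that the four outcomes are equally likely, which the definition itself does not require. The helper `pr` and the second event `B` are hypothetical names introduced only for this check.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a coin twice: HH, HT, TH, TT.
omega = {"".join(t) for t in product("HT", repeat=2)}

def pr(event):
    """Probability of an event (a subset of omega), assuming equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

A = {w for w in omega if "H" in w}     # at least one Head
B = {w for w in omega if w[0] == "T"}  # first toss is a Tail (illustrative second event)

# Axioms and derived results on this finite space.
assert pr(omega) == 1 and pr(set()) == 0
assert pr(omega - A) == 1 - pr(A)                 # complement rule
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)     # union of two events

print(pr(A))  # 3/4
```

Exact fractions are used instead of floating-point numbers so that the equalities hold without rounding error.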
Exercise 1 (i) Prove the last result above; then prove that Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − [Pr(AB) + Pr(BC) + Pr(AC)] + Pr(ABC). Can you work out the general case? (ii) Prove that, for any two events A and B, the probability that exactly one of the two events will occur is Pr(A) + Pr(B) − 2 Pr(A ∩ B).

2 Finite Sample Spaces

In this part, we consider relatively simple experiments that have a finite number of outcomes. The sample spaces of such experiments are called finite sample spaces.

2.1 Counting methods

We first illustrate some frequently used counting methods through examples:

• Multiplication rule: Suppose that two dice are rolled, and there are six possible outcomes for each die. What is the number of possible outcomes for this experiment? The answer is 36. In general, if an experiment consists of two parts, where the first part has m possible outcomes and the second part, regardless of the outcome of the first part, has n possible outcomes, then the total number of possible outcomes for the experiment is m × n.
• Permutations: Suppose that we want to arrange six different books in a row. How many possible arrangements do we have? For the first position, we have six choices. Once the first position is filled, we have five choices for the second position, and so on. Hence, we have 6 × 5 × 4 × 3 × 2 × 1 = 6! possible arrangements. In general, given n distinct items, if we select k ≤ n of them one by one without replacement, the total number of possible sequences is
  $$n \times (n-1) \times \cdots \times (n-k+1) \equiv P_n^k.$$
  This is also called the number of permutations of n items taken k at a time.[3] In particular, P_n^n = n!.
• Combinations: Suppose that we have four candidates a, b, c, and d, and two of them will be randomly chosen to be the committee members. How many different selections do we have?
  All possibilities are listed below: {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}. So the answer is 6.[4] In general, given a set consisting of n distinct elements, if we select k ≤ n of them to form a subset, then the number of possible subsets is
  $$\frac{n!}{k!\,(n-k)!} \equiv C_n^k.$$
  This is called the number of combinations of n items taken k at a time. It is sometimes also denoted by $\binom{n}{k}$.
  - notice that an ordered list of k items selected from n items can be constructed as follows: we first pick k items at a time, and then arrange them in a particular order. Therefore, we must have P_n^k = C_n^k P_k^k. This is in fact how we obtain the formula for C_n^k.

[3] Selecting k items sequentially is equivalent to selecting k items at a time and then assigning them a particular order.
[4] Notice that here the order does not matter because, for example, {a, b} = {b, a}. However, if we instead choose two candidates, one to be president and the other to be secretary, then the order matters. In that case, the number of different arrangements is 12, since for each pair of selected candidates there are two ways to assign their positions.

2.2 Computing probability in simple sample spaces

• A simple sample space is a finite sample space in which each outcome is equally likely. (This is sometimes called the classical probability model.) Then, if Ω = {a_1, ..., a_n}, we have Pr({a_i}) = 1/n for any i, and the probability of any event is just the number of outcomes in that event divided by n.
  - to find the probability of an event, we first count the number of outcomes in the whole sample space, and then count the number of outcomes in the event.
• Here are some examples:
  - if two balanced dice are rolled, what is the probability that the sum of the two numbers that appear will be 5?
    Let (x, y) be an outcome, where x is the number shown on the first die and y is the number shown on the second. Then the total number of outcomes, according to the multiplication rule, is 36. The sum will be 5 if one die shows 1 and the other 4, or if one shows 2 and the other 3, so the number of outcomes in that event is 4. Thus, the probability is
    $$\frac{4}{36} = \frac{1}{9}.$$
  - (the birthday problem) consider a group of k people (2 ≤ k ≤ 365). Assume that the birthdays of these people are unrelated (e.g., no twins), and each of the 365 days of the year is equally likely to be the birthday of any person in this group. Then what is the probability that at least two people among them will have the same birthday, that is, will have been born on the same day of the same month but not necessarily in the same year? The answer is
    $$1 - \frac{P_{365}^{k}}{365^{k}}.$$
    The second term is just the probability that all k persons have different birthdays. For example, if k = 50, the probability of a shared birthday is about 0.970.
  - suppose that a class contains 15 boys and 30 girls, and that 10 students are to be selected at random for a special assignment. What is the probability that exactly 3 boys will be selected? The number of ways to select 10 students at random is C_{45}^{10}, and the number of outcomes in the event that 3 boys and 7 girls are selected is C_{15}^{3} C_{30}^{7} according to the multiplication rule. Hence the probability is
    $$\frac{C_{15}^{3}\, C_{30}^{7}}{C_{45}^{10}} \approx 0.2904.$$

Exercise 2 (i) If 12 balls are thrown at random into 20 boxes, what is the probability that no box will receive more than one ball? (ii) Suppose a fair coin is to be tossed 10 times. What is the probability of obtaining exactly 3 heads? (iii) Suppose that a deck of 52 cards containing four aces is shuffled thoroughly and the cards are then distributed among four players so that each of them receives 13 cards.
What is the probability that each player will get one ace? (iv) Show that (a) C_n^k = C_n^{n−k}; (b) C_n^k + C_n^{k−1} = C_{n+1}^k; (c) $\sum_{i=0}^{n} C_n^i = 2^n$.

3 Conditional Probability

Suppose we run an experiment with a sample space Ω, and let Σ be the set of all appropriate subsets of Ω. Suppose all events in Σ have been assigned probabilities.

• We want to know the revised probability of event A after learning that event B has occurred. This is called the conditional probability of A given B and is denoted by Pr(A|B).
• How do we calculate this conditional probability? Consider a simple example in which Ω = {1, 2, 3} and each outcome is equally likely. Suppose we now know that event B = {1, 2} has occurred. Then what is the probability that event A = {2, 3} has occurred? Given B, the event A occurs only if the outcome is 2, which is one of the two equally likely outcomes in B, so the conditional probability is 1/2, or
  $$\frac{\Pr(AB)}{\Pr(B)} = \frac{1}{2}.$$
• This rule works in general, i.e.,
  $$\Pr(A|B) = \frac{\Pr(AB)}{\Pr(B)},$$
  provided Pr(B) > 0. That is, the conditional probability Pr(A|B) is just the proportion of the total probability Pr(B) that is represented by the probability Pr(AB). Clearly, if A and B are disjoint, then Pr(A|B) = 0.
• Rearranging this expression yields
  $$\Pr(AB) = \Pr(A|B)\Pr(B) = \Pr(B|A)\Pr(A),$$
  and this can be used to compute the probability of an intersection of more sets:
  $$\Pr(ABC) = \Pr(A|BC)\Pr(BC) = \Pr(A|BC)\Pr(B|C)\Pr(C),$$
  and in general, for sets A_1, A_2, A_3, ..., A_n:
  $$\Pr(A_1 A_2 \cdots A_n) = \Pr(A_1)\Pr(A_2|A_1)\Pr(A_3|A_1 A_2) \cdots \Pr(A_n|A_1 A_2 \cdots A_{n-1}).$$

Exercise 3 Suppose that a box contains one blue card and four red cards, which are labeled a, b, c, and d. Suppose two cards are selected at random from these five cards without replacement. (i) If it is known that card a has been selected, what is the probability that both cards are red? (ii) If it is known that at least one red card has been selected, what is the probability that both cards are red?
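
The worked examples of Sections 2.2 and 3 can be checked numerically. The following is a minimal Python sketch, assuming Python 3.8 or later for `math.comb` and `math.perm`. The function and variable names are illustrative and do not come from the notes.

```python
from fractions import Fraction
from itertools import product
from math import comb, perm

# Two balanced dice: 36 equally likely outcomes (multiplication rule: 6 x 6).
dice = list(product(range(1, 7), repeat=2))
p_sum5 = Fraction(sum(1 for x, y in dice if x + y == 5), len(dice))

def birthday_match(k):
    """Probability that at least two of k people share a birthday,
    assuming 365 equally likely and unrelated birthdays."""
    return 1 - Fraction(perm(365, k), 365**k)

# Exactly 3 boys among 10 students drawn from 15 boys and 30 girls.
p_three_boys = Fraction(comb(15, 3) * comb(30, 7), comb(45, 10))

def cond_pr(a, b, omega):
    """Pr(A|B) = Pr(AB) / Pr(B) on a finite space with equally likely outcomes."""
    return Fraction(len(a & b & omega), len(b & omega))

print(p_sum5)                               # 1/9
print(round(float(birthday_match(50)), 3))  # 0.97
print(round(float(p_three_boys), 4))        # 0.2904
print(cond_pr({2, 3}, {1, 2}, {1, 2, 3}))   # 1/2
```

With equally likely outcomes, both the counting examples and the conditional probability reduce to ratios of counts, which is why a single `Fraction` of two integers suffices in each case.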
4 Independence

• If learning that event B has occurred does not change our probability judgment of event A, we say that A and B are independent.[5] Formally, A and B are independent if Pr(A|B) = Pr(A), or equivalently Pr(AB) = Pr(A) Pr(B).
• The concept of independence in probability theory is related to, but different from, the ordinary use of the word. In the ordinary sense, saying that two events are independent usually means that they are physically unrelated. For example, for two machines which operate independently in two factories, the event that one machine will become inoperative should be independent of the event that the other machine will become inoperative. Although two physically unrelated events should be independent in probability theory, the converse is not true, as the following example shows. Suppose that a balanced die is rolled. Let A be the event that an even number is obtained, and let B be the event that one of the numbers 1, 2, 3, or 4 is obtained. It is easy to see that Pr(A) = 1/2, Pr(B) = 2/3, and Pr(AB) = 1/3. Thus Pr(AB) = Pr(A) Pr(B), so A and B are independent, but clearly they are physically related.
• The n events A_1, A_2, ..., A_n are independent if, for every subset A_{i_1}, ..., A_{i_k} of k of these events (k = 2, ..., n),
  $$\Pr(A_{i_1} A_{i_2} \cdots A_{i_k}) = \Pr(A_{i_1}) \Pr(A_{i_2}) \cdots \Pr(A_{i_k}).$$
• The n events A_1, A_2, ..., A_n are conditionally independent given B if, for every subset A_{i_1}, ..., A_{i_k} of k of these events (k = 2, ..., n),
  $$\Pr(A_{i_1} A_{i_2} \cdots A_{i_k} | B) = \Pr(A_{i_1}|B) \Pr(A_{i_2}|B) \cdots \Pr(A_{i_k}|B).$$

[5] It does not mean that we cannot infer anything about A from knowing B has occurred.

Exercise 4 (i) Show that, if A and B are independent, then A and B^c are also independent. (ii) Two students A and B are both registered for a certain course.
Assume that student A attends class 80 percent of the time, and student B attends class 60 percent of the time, and the absences of the two students are independent. (a) What is the probability that at least one of the two students will be in class on a given day? (b) If at least one of the two students is in class on a given day, what is the probability that A is in class that day?

5 Bayes’ Theorem

• Consider an example first. A medical test can indicate whether a person has a certain disease in the following way: if a person has this disease, then with probability 0.9 the test result will be positive; if a person does not have this disease, then with probability 0.1 the test result will be positive. Suppose the data suggest that the chance of having this disease is only 0.0001 in the population. Now if the test result for a randomly selected person is positive, what should we say about the probability that this person has the disease?
  We can first calculate the probability of the event that a randomly selected person gets a positive test result (call it A). Given that the person has the disease (which has probability 0.0001), A happens with probability 0.9. Given that the person does not have the disease (which has probability 0.9999), A happens with probability 0.1. Thus,
  $$\Pr(A) = 0.0001 \times 0.9 + 0.9999 \times 0.1 = 0.10008.$$
  Of this probability, only 0.0001 × 0.9 = 0.00009 is contributed by the event that the person has the disease (call it B_1), and most of it, 0.9999 × 0.1 = 0.09999, is contributed by the event that the person does not have the disease (call it B_0). Therefore, the probability of B_1 conditional on A is
  $$\Pr(B_1|A) = \frac{0.00009}{0.10008} \approx 0.000899,$$
  or
  $$\Pr(B_1|A) = \frac{\Pr(B_1 A)}{\Pr(A)} = \frac{\Pr(A|B_1)\Pr(B_1)}{\Pr(A|B_1)\Pr(B_1) + \Pr(A|B_0)\Pr(B_0)}.$$
  This is the basic idea of Bayes’ theorem. We usually call Pr(B_1) the prior belief and Pr(B_1|A) the posterior belief. Although the posterior is higher than the prior due to the positive test result, it is still quite low.[6]
• In general, Bayes’ theorem can be stated as follows. Let B_1, B_2, ..., B_n be a partition of the sample space Ω and A be an event.[7] Since
  $$\Pr(A) = \Pr(AB_1) + \Pr(AB_2) + \cdots + \Pr(AB_n) = \Pr(A|B_1)\Pr(B_1) + \Pr(A|B_2)\Pr(B_2) + \cdots + \Pr(A|B_n)\Pr(B_n),$$
  we have
  $$\Pr(B_1|A) = \frac{\Pr(B_1 A)}{\Pr(A)} = \frac{\Pr(A|B_1)\Pr(B_1)}{\sum_{i=1}^{n} \Pr(A|B_i)\Pr(B_i)}$$
  if Pr(A) > 0. This formula tells us how to calculate the posterior of B_1 after learning that A has occurred, when the priors Pr(B_i) and the conditional probabilities Pr(A|B_i) are known.

Exercise 5 (i) Suppose a ball is drawn from an urn. With probability 1/3, this urn is of type I, which contains k red balls and n − k blue balls; with probability 2/3, this urn is of type II, which contains k blue balls and n − k red balls. Now if the ball observed is a red one, what is the probability that this urn is of type I? (ii) In a certain city, 30 percent of the people are Conservatives, 50 percent are Liberals, and 20 percent are Independents. Records show that in a particular election, 65 percent of the Conservatives voted, 82 percent of the Liberals voted, and 50 percent of the Independents voted. If a person in the city is selected at random and it is learned that she did not vote in the last election, what is the probability that she is a Liberal?

[6] If a person ignores the fact that the base rate of having this disease is quite low, he may conclude, after obtaining a positive test result, that he has this disease with probability 0.9.
[7] A partition means that all the B_i are disjoint from each other and their union is Ω.
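
The general formula of Section 5 can be written as a short function and applied to the medical test example. This is a sketch only: the name `posterior` and the list-based representation of the partition are assumptions made for illustration, and the code indexes the partition from 0.

```python
from fractions import Fraction

def posterior(priors, likelihoods, j):
    """Return Pr(B_j | A) for a partition B_0, ..., B_{n-1}.

    priors[i] is Pr(B_i); likelihoods[i] is Pr(A | B_i).
    """
    # Law of total probability: Pr(A) = sum_i Pr(A | B_i) Pr(B_i).
    pr_a = sum(l * p for l, p in zip(likelihoods, priors))
    if pr_a == 0:
        raise ValueError("Pr(A) must be positive")
    return likelihoods[j] * priors[j] / pr_a

# Medical test example: index 0 = no disease (B_0), index 1 = disease (B_1).
priors = [Fraction(9999, 10000), Fraction(1, 10000)]
likelihoods = [Fraction(1, 10), Fraction(9, 10)]

post = posterior(priors, likelihoods, 1)
print(post, float(post))  # 1/1112, about 0.000899
```

The exact posterior is 9/10008 = 1/1112, which agrees with the value of about 0.000899 computed above. The same function applies to any finite partition, provided the priors sum to one.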