Chapter 2. SAMPLE SPACES WITH NO STRUCTURE

In many practical examples, the sample space has some structure: there are relationships between the outcomes. In this chapter, probability theory is developed without making such assumptions.

2.1 Deductions from Axioms

REMINDER
A1 For any event $A$, $0 \le \Pr(A) \le 1$.
A2 For the event $S$, $\Pr(S) = 1$.
A3 For any two events $A$ and $B$ satisfying $A \cap B = \emptyset$, $\Pr(A \cup B) = \Pr(A) + \Pr(B)$.

2.1.1 THEOREM: $\Pr(\bar{A}) = 1 - \Pr(A)$.

[Figure: Venn diagram of the event $A$ inside the sample space $S$.]

PROOF: Since $\bar{A}$ is the complement of $A$, $A \cup \bar{A} = S$ and $A \cap \bar{A} = \emptyset$. Hence, using A3,
$$\Pr(A \cup \bar{A}) = \Pr(A) + \Pr(\bar{A}), \quad \text{i.e.} \quad \Pr(\bar{A}) = \Pr(A \cup \bar{A}) - \Pr(A).$$
But $A \cup \bar{A} = S$ and, by A2, $\Pr(S) = 1$, and hence $\Pr(\bar{A}) = 1 - \Pr(A)$.

2.1.2 SPECIAL CASE: $A = S$

This gives $\Pr(\bar{S}) = 1 - \Pr(S) = 0$, from A2. That is, $\Pr(\emptyset) = 0$.

Note: This result can be thought of as obvious. The important thing is that we don't have to assume that it is true: we can deduce it from the three axioms.

2.1.3 THEOREM: If $A \supset B$ then $\Pr(A) \ge \Pr(B)$.

[Figure: Venn diagram of the event $B$ inside the event $A$, inside the sample space $S$.]

PROOF: If $A \supset B$, then $A = B \cup (A \cap \bar{B})$ and $B \cap (A \cap \bar{B}) = \emptyset$. Hence, by A3, $\Pr(A) = \Pr(B) + \Pr(A \cap \bar{B})$. By A1, $\Pr(A \cap \bar{B}) \ge 0$, so $\Pr(A) \ge \Pr(B)$.

Note: Intuitively, this result is also obviously true. It is included here to show how, like the result in §2.1.2, it can be derived from the axioms.

2.1.4 THEOREM: For any two events $A$ and $B$, $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.

[Figure: Venn diagram of two overlapping events $A$ and $B$ inside the sample space $S$.]

PROOF: $A \cup B = A \cup (\bar{A} \cap B)$ and $B = (A \cap B) \cup (\bar{A} \cap B)$. In both cases the RHS contains mutually exclusive events. [Use A3 and substitute for $\Pr(\bar{A} \cap B)$.]

It is often useful to write an expression as a union of mutually exclusive events.

Fuller details: Meyer, p.14; Arthurs, p.14; Feller, p.23; Clarke and Cooke, p.134.

2.1.5 Extension of result 2.1.4 to 3 events.

[Figure: Venn diagram of three overlapping events $A$, $B$ and $C$ inside the sample space $S$.]

$$\Pr(A \cup B \cup C) = \Pr(A) + \Pr(B) + \Pr(C) - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(B \cap C) + \Pr(A \cap B \cap C)$$

PROOF: Write $A \cup B \cup C$ as $A \cup (B \cup C)$ and apply result 2.1.4 twice.

2.1.6 Further extension to n events.
$$\Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = \text{(sum of individual probabilities)} - \text{(probabilities of all pairs)} + \text{(probabilities of all triples)} - \cdots - (-1)^n \Pr\Big(\bigcap_{i=1}^{n} A_i\Big).$$

Here "pairs" and "triples" refer to the intersections $A_i \cap A_j$ and $A_i \cap A_j \cap A_k$ of distinct events.

[PROOF: by induction]

COROLLARY (Addition Law of Probability): For mutually exclusive events, Pr(union) = sum of individual probabilities.

2.2 Sampling Problems

Many applications of 'symmetry' probabilities arise from problems in which a randomising device is used to select a sample from some population.

Terminology: Terms like 'sample at random' or 'select a random sample' are often used. These may sound vague, but in fact they are very precise. They both mean 'select a sample in such a way that all possible samples have exactly the same chance of being the one selected'.

The order in which the sample members are selected may or may not be important. We assume for the moment that it is important.

With replacement or without replacement

Suppose that a sample of size $r$ is to be chosen at random from a population of size $n$. There are two main possibilities.

A Random sampling with replacement

Sample member:       1    2    3    ···    r
Number of choices:   n    n    n    ···    n

Each 'box' can be filled in $n$ different ways. The sample of size $r$ can be selected in $n^r$ ways; each possible sample has a probability $1/n^r$ of being the one selected.

B Random sampling without replacement

Sample member:       1    2      3      ···    r
Number of choices:   n    n−1    n−2    ···    n−r+1

First sample member: $n$ possibilities. Second sample member: $n - 1$ possibilities, and so on. The number of possible samples $D$ is given by
$$D = n(n-1)(n-2)\cdots(n-r+1).$$

REMINDER: Factorial $n$: $n! = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$. Hence
$$D = \frac{n!}{(n-r)!}.$$

Notes:
1. $D$ is the denominator in probability calculations.
2. Reminder: The order in which the sample members are selected is taken as important. The denominator $D$ is the total number of ways in which the sample members can be selected in that order. Since sampling is done at random, each of these $D$ samples has exactly the same chance of being the one selected.
For any given event, the numerator $N$ will be the number of these samples which result in the event occurring. We sometimes call this the number of samples favourable to the event. Therefore, for any event,
$$\Pr(\text{event}) = \frac{\text{number of favourable permutations}}{\text{total number of permutations}}.$$

Suppose now that the order in which the sample members are selected is not important. Whether a particular event occurs depends only on which population members are selected for the sample, and not on the order in which they are selected.

So far, the numerator $N$ and denominator $D$ in calculations have been the numbers of permutations involved. We can now simplify calculations by using combinations instead of permutations in the numerator and denominator. If this is done, $D$ becomes the number of different ways of choosing $r$ items for the sample out of $n$:
$$D = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
[We can also write $\binom{n}{r}$ as ${}^{n}C_{r}$.]

The numerator $N$ is now the number of combinations favourable to the required event.

EXAMPLE: Three cards are selected from a pack, at random, without replacement.

Events:
A: first ace appears at 3rd card
B: exactly one card is an ace.

We wish to find $\Pr(A)$ and $\Pr(B)$.

Event A: first ace appears at 3rd card

For A, the order in which the cards are selected is important.
$$D = \text{No. of ways of choosing 3 from 52 in order} = \frac{52!}{(52-3)!} = 52 \times 51 \times 50.$$
$$N = \text{No. of ways favourable to event A} = 48 \times 47 \times 4.$$
Hence
$$\Pr(A) = \frac{48 \cdot 47 \cdot 4}{52 \cdot 51 \cdot 50} = 0.0681.$$

Event B: exactly one card is an ace.

For event B, the order in which the three cards appear is not important. Using combinations:
$$D = \binom{52}{3} = 22100, \qquad N = \binom{48}{2}\binom{4}{1} = 4512.$$
Hence
$$\Pr(B) = \frac{4512}{22100} = 0.2042.$$

In problems where order is not important, it is still possible to use permutations, but this usually makes the calculations more complicated. For event B:
$$D = 52 \times 51 \times 50 = 6 \times 22100,$$
$$N = (4 \times 48 \times 47) + (48 \times 4 \times 47) + (48 \times 47 \times 4) = 6 \times 4512.$$
Hence $N/D = 4512/22100$, as before.
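The counting argument for the three-card example can be checked numerically. The following is a minimal sketch, not part of the original notes: it evaluates the permutation and combination formulas with exact fractions and then confirms them by enumerating every ordered draw. The encoding of the pack (cards 0 to 3 stand for the four aces) and all variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, perm

# Event A: first ace appears at the 3rd card (order matters, so use permutations).
pr_a = Fraction(48 * 47 * 4, perm(52, 3))

# Event B: exactly one of the three cards is an ace (order irrelevant, so use combinations).
pr_b = Fraction(comb(48, 2) * comb(4, 1), comb(52, 3))


def is_ace(card):
    """Illustrative encoding: cards 0..3 are the aces, 4..51 are non-aces."""
    return card < 4


# Brute-force check over all 52 * 51 * 50 equally likely ordered draws.
total = count_a = count_b = 0
for draw in permutations(range(52), 3):
    total += 1
    aces = [is_ace(card) for card in draw]
    if aces == [False, False, True]:
        count_a += 1
    if sum(aces) == 1:
        count_b += 1

print("Pr(A) =", pr_a, "=", round(float(pr_a), 4))
print("Pr(B) =", pr_b, "=", round(float(pr_b), 4))
print("enumeration agrees:",
      Fraction(count_a, total) == pr_a and Fraction(count_b, total) == pr_b)
```

Exact fractions are used so that the comparison between the formula and the enumeration does not depend on floating-point rounding.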
2.3 Conditional Probability

This topic relates to two (or more) events associated with the same experiment.

[Figure: Venn diagram of two overlapping events $A$ and $B$ inside the sample space $S$.]

Two events $A$ and $B$ divide $S$ into 4 regions. We now consider the form of relationship between these events.

Example: The experiment E is: two cards are taken in sequence, without replacement, from a pack, at random. We consider two events:
A: the first card is an ace, $\Pr(A) = 4/52$.
B: the second card is an ace, $\Pr(B) = 4/52$.

Suppose now that the first card is examined and seen to be an ace. What is $\Pr(B)$? The answer is not $3/51$.

Reminder (from §1.4): To each event arising out of an experiment, a number (the probability of that event) is permanently assigned.

What does the number $3/51$ represent? The ratio $3/51$ does not arise from an experiment alone. It appears as a result of an experiment being performed and a particular condition being met. The experiment is that we choose two cards, at random, in sequence. The condition is that the first card chosen is an ace. We can say that the conditional probability of B given A is $3/51$.

This conditional probability is calculated as follows:
$$\frac{3}{51} = \frac{\left(\dfrac{4 \cdot 3}{52 \cdot 51}\right)}{\left(\dfrac{4}{52}\right)} = \frac{\Pr(A \cap B)}{\Pr(A)}.$$

DEFINITION: If $A$ and $B$ are two events, then the conditional probability of $B$ given $A$ is defined as
$$\frac{\Pr(A \cap B)}{\Pr(A)}$$
for an event $A$ such that $\Pr(A) > 0$.

NOTATION: We write the conditional probability of $B$ given $A$ as $\Pr(B \mid A)$. That is,
$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}.$$

Notes:
1. Conditional probabilities can be interpreted just as ordinary (often called marginal) probabilities: symmetry, limiting relative frequency, subjective.
2. It is often easier to evaluate a conditional probability than a marginal probability. This happens in particular for events resulting from a sequence of actions. To obtain a marginal probability from a conditional one the formula is used in this way:
$$\Pr(A \cap B) = \Pr(B \mid A)\,\Pr(A).$$
Note that we also have:
$$\Pr(A \cap B) = \Pr(A \mid B)\,\Pr(B).$$
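The definition can be illustrated on the two-card experiment by listing all equally likely ordered draws and counting. This is a sketch under stated assumptions rather than anything from the original notes: the helper `pr`, the event functions, and the convention that cards 0 to 3 are the aces are all hypothetical names chosen for the illustration.

```python
from fractions import Fraction
from itertools import permutations

# All ordered draws of two distinct cards; each is equally likely.
outcomes = list(permutations(range(52), 2))


def pr(event):
    """Probability of an event (a predicate on outcomes) under equal likelihood."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


def first_is_ace(outcome):
    return outcome[0] < 4  # illustrative encoding: cards 0..3 are the aces


def second_is_ace(outcome):
    return outcome[1] < 4


def both_are_aces(outcome):
    return first_is_ace(outcome) and second_is_ace(outcome)


pr_a = pr(first_is_ace)                  # Pr(A)
pr_b = pr(second_is_ace)                 # Pr(B), the marginal probability
pr_b_given_a = pr(both_are_aces) / pr_a  # Pr(B | A) = Pr(A and B) / Pr(A)

print("Pr(A) =", pr_a)
print("Pr(B) =", pr_b)
print("Pr(B | A) =", pr_b_given_a)
```

The marginal probability of B stays at 4/52, while conditioning on A gives 3/51, which is the distinction drawn in the text.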
EXAMPLE (Two cards): Cards are selected at random without replacement from a pack. Define D as the event: D = first ace appears at 2nd card. Find $\Pr(D)$.

SOLUTION: Define two events:
A = first card chosen is not an ace
B = second card chosen is an ace

Then $D \equiv A \cap B$. Now $\Pr(A) = 48/52$ and $\Pr(B \mid A) = 4/51$. Hence
$$\Pr(D) = \Pr(A \cap B) = \Pr(A)\,\Pr(B \mid A) = \frac{48}{52} \cdot \frac{4}{51} = \frac{16}{221} = 0.0724.$$

Extension

The basic result $\Pr(A \cap B) = \Pr(A)\,\Pr(B \mid A)$ extends easily to three or more events. We thus obtain:
$$\Pr(A \cap B \cap C) = \Pr(A)\,\Pr(B \mid A)\,\Pr(C \mid A, B)$$
and so on.

Applying this to the experiment of drawing cards at random without replacement, the argument extends easily to the event
$A_n$: the first ace appears at the $n$th card.
For example, for the case $n = 4$,
$$\Pr(A_4) = \frac{48}{52} \times \frac{47}{51} \times \frac{46}{50} \times \frac{4}{49}.$$
Note: Writing this as $\Pr(A_4) = \dfrac{48 \cdot 47 \cdot 46 \cdot 4}{52 \cdot 51 \cdot 50 \cdot 49}$ is also instructive.

EXAMPLE (Two dice): Two unbiased dice are thrown.
X: score shown on die 1,
Y: score shown on die 2.
Consider two events:
A: $\{Y = 2\}$
B: $\{X < Y\}$

The probabilities for the four combinations of results for $A$, $\bar{A}$, $B$, $\bar{B}$ are:

             $A$      $\bar{A}$    Total
$B$          1/36     14/36        15/36
$\bar{B}$    5/36     16/36        21/36
Total        6/36     30/36        36/36

Hence
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{1/36}{15/36} = \frac{1}{15}.$$

2.4 Independence

In general, for two events $A$ and $B$, $\Pr(B \mid A) \ne \Pr(B)$.
Example: an unbiased die is thrown: $A = \{\text{even}\}$, $B = \{1, 2, 3\}$.

But sometimes the two probabilities may be equal:
Example: $A = \{\text{even}\}$, $B = \{1, 2\}$.

DEFINITION: If, for two events $A$ and $B$, $\Pr(B \mid A) = \Pr(B)$, then we say that $B$ is independent of $A$. Alternatively, we say that the events $A$ and $B$ are independent of each other.

Notes:

(1) Independence is symmetric.

If $B$ is independent of $A$, then
$$\Pr(B) = \Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}.$$
Therefore
$$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B).$$
Dividing both sides by $\Pr(B)$ (assumed to be positive), we obtain:
$$\Pr(A) = \frac{\Pr(A \cap B)}{\Pr(B)} = \Pr(A \mid B).$$
So, if $A$ is independent of $B$, then $B$ is independent of $A$, and vice versa.

(2) Interpretation of independence

If $A$ and $B$ are not independent, then $\Pr(B \mid A) \ne \Pr(B)$.
Information that $A$ has occurred changes our assessment of $B$. [It does not alter $\Pr(B)$. It does alter our assessment of the chance that $B$ will occur, which is affected by our knowledge that $A$ has occurred.]

But, if $A$ and $B$ are independent, knowledge about the occurrence of $B$ does not affect our assessment of $A$.

(3) Theorem: If $A$ and $B$ are independent, then so are $A$ and $\bar{B}$.

Proof: Independence $\Rightarrow \Pr(A \cap B) = \Pr(A) \cdot \Pr(B)$. Now, the events $(A \cap B)$ and $(A \cap \bar{B})$ are mutually exclusive, and $A = (A \cap B) \cup (A \cap \bar{B})$. Hence, using axiom A3,
$$\Pr(A) = \Pr(A \cap B) + \Pr(A \cap \bar{B}).$$
We can therefore write:
$$\Pr(A \cap \bar{B}) = \Pr(A) - \Pr(A \cap B) = \Pr(A) - \Pr(A) \cdot \Pr(B) = \Pr(A) \cdot \{1 - \Pr(B)\} = \Pr(A) \cdot \Pr(\bar{B}).$$

Corollary: If $A$ and $B$ are independent, then $\bar{A}$ and $B$ are independent. Also, $\bar{A}$ and $\bar{B}$ are independent.

(4) The Multiplication Law of probability

When events $A$ and $B$ are independent, then
$$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B).$$
In words: The multiplication law states that, if $A$ and $B$ are independent events, then their joint probability is the product of the individual probabilities.

Note: Compare this with the general result
$$\Pr(A \cap B) = \Pr(A)\,\Pr(B \mid A) = \Pr(B)\,\Pr(A \mid B),$$
which holds for all events $A$ and $B$.

(5) When does independence occur?

In practice, it is often known that two events are independent, and the multiplication law can then be used to calculate the joint probability. Experiments often consist of a set of quite independent components, or trials, with different events relating to different trials.

Example: E is 'toss a coin, throw a die'.
Event A = {Coin shows heads},
Event B = {Die shows a 6}.
If the tossing of the coin and the throw of the die are unrelated, the events $A$ and $B$ will be independent:
$$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}.$$

Distinction in the context of sampling: with replacement, events independent; without replacement, not independent.
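The multiplication-law test for independence, and the complement results of note (3), can be tried out on the single-die events used in this section. The following is a minimal sketch with hypothetical helper names (`pr`, `independent`) and set-based events; it assumes an unbiased die, so every outcome has probability 1/6.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space of one unbiased die


def pr(event):
    """Symmetry probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))


def independent(a, b):
    """Multiplication-law test: Pr(A and B) == Pr(A) * Pr(B)."""
    return pr(a & b) == pr(a) * pr(b)


even = frozenset({2, 4, 6})
low3 = frozenset({1, 2, 3})
low2 = frozenset({1, 2})

print("even vs {1,2,3}:", independent(even, low3))
print("even vs {1,2}:  ", independent(even, low2))

# Note (3) and its corollary: independence carries over to complements.
print("even vs not {1,2}:    ", independent(even, S - low2))
print("not even vs not {1,2}:", independent(S - even, S - low2))
```

Using the product form of the test avoids dividing by a probability that might be zero, which is why it is preferred here over comparing $\Pr(B \mid A)$ with $\Pr(B)$ directly.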
(6) Pairwise and Mutual Independence

If there are three or more events, it is possible for all pairs to be independent, but for there to be a more complex type of dependence.

Example: Toss two fair coins independently, and define events as follows:
Event A: Coin 1 shows Heads
Event B: Coin 2 shows Heads
Event C: Exactly one coin shows Heads

Clearly $\Pr(A) = \Pr(B) = \frac{1}{2}$, and $A$ and $B$ are independent, so that $\Pr(A \cap B) = \frac{1}{4}$.

Now $C = (A \cap \bar{B}) \cup (\bar{A} \cap B)$. So $\Pr(C) = \Pr(A \cap \bar{B}) + \Pr(\bar{A} \cap B) = \frac{1}{2}$.

Also $A \cap C = A \cap \bar{B}$, so $\Pr(A \cap C) = \frac{1}{4} = \Pr(A) \cdot \Pr(C)$. Therefore events $A$ and $C$ are independent. Also $B$ and $C$ are independent.

But what about the event $A \cap B \cap C$? If both coins show Heads then $C$ cannot occur, so $A \cap B \cap C = \emptyset$ and $\Pr(A \cap B \cap C) = 0$, whereas $\Pr(A) \cdot \Pr(B) \cdot \Pr(C) = \frac{1}{8}$.

Distinction: In this example, we can say that events $A$, $B$ and $C$ are pairwise independent, but they are not mutually independent.

In practice, pairwise independent events are almost always mutually independent (e.g. events from different components of an experiment).

Definition: Events $A_1, A_2, \ldots, A_n$ are mutually independent if and only if
$$\Pr(A_i \cap A_j) = \Pr(A_i)\,\Pr(A_j), \quad i \ne j,$$
$$\Pr(A_i \cap A_j \cap A_k) = \Pr(A_i)\,\Pr(A_j)\,\Pr(A_k), \quad i \ne j,\ i \ne k,\ j \ne k,$$
$$\cdots$$
$$\Pr\Big(\bigcap_{i=1}^{n} A_i\Big) = \prod_{i=1}^{n} \Pr(A_i),$$
i.e. if all subsets obey the multiplication law.

2.5 Two Important Theorems

Consider a set $A_1, A_2, \ldots, A_k$ of mutually exclusive and exhaustive events. Let $B$ be some other event from the same experiment.

Law of Total Probability:
$$\Pr(B) = \Pr(A_1) \cdot \Pr(B \mid A_1) + \Pr(A_2) \cdot \Pr(B \mid A_2) + \cdots + \Pr(A_k) \cdot \Pr(B \mid A_k) = \sum_{i=1}^{k} \Pr(A_i) \cdot \Pr(B \mid A_i).$$

PROOF: (illustrated for the case $k = 5$)

[Figure: the sample space $S$ partitioned into $A_1, \ldots, A_5$, with the event $B$ drawn inside $S$.]

Each element of $S$ is a member of one and only one of the $A$'s. Hence, the same is true of each element of $B$. We therefore obtain the result:
$$B \equiv (A_1 \cap B) \cup (A_2 \cap B) \cup \cdots \cup (A_k \cap B),$$
where the events on the RHS are mutually exclusive. Hence
$$\Pr(B) = \sum_{i=1}^{k} \Pr(A_i \cap B).$$
But $\Pr(A_i \cap B) = \Pr(A_i) \cdot \Pr(B \mid A_i)$ and so
$$\Pr(B) = \sum_{i=1}^{k} \Pr(A_i) \cdot \Pr(B \mid A_i).$$

EXAMPLE: Three boxes contain certain items: box $i$ contains $n_i$ items, of which $d_i$ are defective.
In an experiment, one box is chosen at random. Then, one item is chosen at random from the chosen box. Find the probability that the chosen item is defective, when $n_1 = 50$, $n_2 = 100$, $n_3 = 100$, $d_1 = 5$, $d_2 = 3$, $d_3 = 5$.

SOLUTION:

Events:
Let $A_i$ = 'box $i$ is chosen', $i = 1, 2, 3$.
Let $B$ = 'the chosen item is defective'.

Then $\Pr(A_1) = \Pr(A_2) = \Pr(A_3) = \frac{1}{3}$ and
$$\Pr(B \mid A_1) = \frac{5}{50}, \qquad \Pr(B \mid A_2) = \frac{3}{100}, \qquad \Pr(B \mid A_3) = \frac{5}{100}.$$
Hence
$$\Pr(B) = \frac{1}{3} \cdot \frac{5}{50} + \frac{1}{3} \cdot \frac{3}{100} + \frac{1}{3} \cdot \frac{5}{100} = \frac{18}{300} = 0.06.$$

EXAMPLE: Two cards (revisited)

Two cards are selected from a pack of 52, without replacement.
Event A: first card is an ace
Event B: second card is an ace.

We know that
$$\Pr(A) = \frac{4}{52}, \qquad \Pr(\bar{A}) = \frac{48}{52}, \qquad \Pr(B \mid A) = \frac{3}{51}, \qquad \Pr(B \mid \bar{A}) = \frac{4}{51}.$$
Because $A$ and $\bar{A}$ are mutually exclusive and exhaustive events, it follows that
$$\Pr(B) = \Pr(B \mid A)\,\Pr(A) + \Pr(B \mid \bar{A})\,\Pr(\bar{A}) = \frac{3}{51} \cdot \frac{4}{52} + \frac{4}{51} \cdot \frac{48}{52} = \frac{4}{52}.$$

Application of the Law of Total Probability:

The example concerns a 2-stage experiment.
Stage 1: A random choice is made, and either $A$ or $\bar{A}$ occurs.
Stage 2: A further random choice is made, and $B$ may occur.

We wish to find $\Pr(B)$, but it is easier to find $\Pr(B \mid A)$ and $\Pr(B \mid \bar{A})$, since the result of Stage 1 influences what happens in Stage 2.

Simple extension: in Stage 1, we have a set of mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$, rather than just two such events ($A$ and $\bar{A}$).

Further extension to multi-stage experiments: at stage 1, one of $A_1, A_2, \ldots, A_k$ occurs; at stage 2, one of $B_1, B_2, \ldots, B_j$ occurs; at stage 3, 4, ...; at stage $n$, some event $N$ may occur, the conditional probability depending on which of the $A$s, $B$s etc. occurred.

BAYES' THEOREM

If $A_1, A_2, \ldots, A_k$ are mutually exclusive and exhaustive events, and if $B$ is an event based on the same experiment, then
$$\Pr(A_i \mid B) = \frac{\Pr(A_i)\,\Pr(B \mid A_i)}{\sum_{j=1}^{k} \Pr(A_j)\,\Pr(B \mid A_j)}.$$

PROOF: By definition,
$$\Pr(A_i \cap B) = \Pr(A_i)\,\Pr(B \mid A_i) \quad \text{and} \quad \Pr(A_i \cap B) = \Pr(B)\,\Pr(A_i \mid B).$$
Equating the two right-hand sides, we obtain:
$$\Pr(A_i \mid B) = \frac{\Pr(A_i)\,\Pr(B \mid A_i)}{\Pr(B)} = \frac{\Pr(A_i)\,\Pr(B \mid A_i)}{\sum_{j=1}^{k} \Pr(A_j)\,\Pr(B \mid A_j)},$$
using the law of total probability.

Applications of Bayes' theorem: time-reversal, assessment of evidence, more advanced statistical methods.

EXAMPLE (continuation)

$A_i$: 'box $i$ chosen', $B$: 'item is defective'.

The calculations can be laid out most easily in tabular form, as follows:

Box (i)   n_i    d_i   Pr(B | A_i)   Pr(A_i)   Pr(A_i ∩ B)      Pr(A_i | B)
1         50     5     5/50          1/3       (5/50)·(1/3)     10/18
2         100    3     3/100         1/3       (3/100)·(1/3)    3/18
3         100    5     5/100         1/3       (5/100)·(1/3)    5/18

$\Pr(A_i)$: initial, or prior, probability.
$\Pr(A_i \mid B)$: final, or posterior, probability.

The prior probability is adjusted by using the evidence provided by the data $B$.

Exercise: Find $\Pr(A_i \mid B)$ for the cards example.

CHAPTER 2 SUMMARY

• Probability theory is developed in general by deduction from the three axioms.
• Several important general results, e.g. $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.
• Concepts of random sampling with replacement and without replacement.
• Conditional probability: $\Pr(B \mid A) = \dfrac{\Pr(A \cap B)}{\Pr(A)}$.
• Independence: $\Pr(A \cap B) = \Pr(A) \times \Pr(B)$; pairwise and mutual independence.
• Law of total probability; Bayes' theorem.
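As a closing illustration of the last summary point, the three-box example of §2.5 can be recomputed step by step: the law of total probability gives $\Pr(B)$, and Bayes' theorem then turns the priors into posteriors. This is a sketch only; the list names (`prior`, `likelihood`, `joint`, `posterior`) are hypothetical and not taken from the notes.

```python
from fractions import Fraction

n = [50, 100, 100]  # items in boxes 1, 2, 3
d = [5, 3, 5]       # defective items in boxes 1, 2, 3

prior = [Fraction(1, 3)] * 3                              # Pr(A_i): box chosen at random
likelihood = [Fraction(di, ni) for di, ni in zip(d, n)]   # Pr(B | A_i)
joint = [p * l for p, l in zip(prior, likelihood)]        # Pr(A_i and B)

pr_b = sum(joint)                      # law of total probability
posterior = [j / pr_b for j in joint]  # Bayes' theorem: Pr(A_i | B)

print("Pr(B) =", pr_b, "=", float(pr_b))
for i, post in enumerate(posterior, start=1):
    print(f"Pr(A_{i} | B) = {post}")
```

The posterior probabilities necessarily sum to 1, which gives a quick sanity check on any table of this kind.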