Probability
COMP 245 STATISTICS
Dr N A Heard

Contents
1 Sample Spaces and Events
  1.1 Sample Spaces
  1.2 Events
  1.3 Combinations of Events
2 Probability
  2.1 Axioms of Probability
  2.2 Simple Probability Results
  2.3 Independence
  2.4 Interpretations of Probability
3 Examples
  3.1 De Méré's problem
  3.2 Joint events
4 Conditional Probability
  4.1 Definition
  4.2 Examples
  4.3 Conditional Independence
  4.4 Bayes Theorem and the Partition Rule
  4.5 More Examples

1 Sample Spaces and Events

1.1 Sample Spaces

We consider a random experiment whose range of possible outcomes can be described by a set S, called the sample space. We use S as our universal set (Ω).

Ex.
• Coin tossing: S = {H, T}.
• Die rolling: S = {1, 2, 3, 4, 5, 6}.
• 2 coins: S = {(H, H), (H, T), (T, H), (T, T)}.

1.2 Events

An event E is any subset of the sample space, E ⊆ S; it is a collection of some of the possible outcomes.

Ex.
• Coin tossing: E = {H}, E = {T}.
• Die rolling: E = {6}, E = {Even numbered face} = {2, 4, 6}.
• 2 coins: E = {Head on the first toss} = {(H, H), (H, T)}.

Extreme possible events are ∅ (the null event) or S. The singleton subsets of S (those subsets which contain exactly one element of S) are known as the elementary events of S.

Suppose we now perform this random experiment; the outcome will be a single element s∗ ∈ S. Then for any event E ⊆ S, we will say E has occurred if and only if s∗ ∈ E. If E has not occurred, it must be that s∗ ∉ E ⟺ s∗ ∈ Ē, so Ē has occurred; so Ē can be read as the event "not E".

First notice that the smallest event which will have occurred will be the singleton {s∗}. Any other event E will have occurred if and only if {s∗} ⊂ E. Thus we can immediately draw two conclusions before the experiment has even been performed.

Remark 1.1. For any sample space S, the following statements will always be true:
1. the null event ∅ will never occur;
2. the universal event S will always occur.

Hence it is only for events E in between these extreme events, ∅ ⊂ E ⊂ S, that we have uncertainty about whether E will occur. It is precisely for quantifying this uncertainty that we require the notion of probability.

1.3 Combinations of Events

Set operators on events

Consider a set of events {E1, E2, . . .}.
• The event ∪_i Ei = {s ∈ S | ∃i s.t. s ∈ Ei} will occur if and only if at least one of the events {Ei} occurs. So E1 ∪ E2 can be read as the event "E1 or E2".
• The event ∩_i Ei = {s ∈ S | ∀i, s ∈ Ei} will occur if and only if all of the events {Ei} occur. So E1 ∩ E2 can be read as the event "E1 and E2".
• The events are said to be mutually exclusive if Ei ∩ Ej = ∅ for all i ≠ j (i.e. they are pairwise disjoint). At most one of the events can occur.
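As a small illustration (an addition to the notes, not part of the original), the following Python sketch represents a sample space and some events as Python sets, so that the set operations above become the operators |, & and set difference; the variable names are purely illustrative.

    # Sketch: the two-coin sample space and some events, represented as Python sets.
    from itertools import product

    S = set(product("HT", repeat=2))              # {(H,H), (H,T), (T,H), (T,T)}
    head_first = {s for s in S if s[0] == "H"}    # event: head on the first toss
    head_second = {s for s in S if s[1] == "H"}   # event: head on the second toss

    union = head_first | head_second              # "head_first or head_second"
    intersection = head_first & head_second       # "head_first and head_second"
    complement = S - head_first                   # "not head_first"

    # head_first and its complement are mutually exclusive: empty intersection.
    print(head_first & complement == set())       # True
    print(union, intersection, complement)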
2 Probability

2.1 Axioms of Probability

σ-algebras of events

So for our random experiment with sample space S of possible outcomes, for which events/subsets E ⊆ S would we like to define the probability of E occurring? Every subset? If S is finite or even countable, then this is fine. But for uncountably infinite sample spaces it can be shown that we can very easily start off defining sensible, proper probabilities for an initial collection of subsets of S in a way that leaves it impossible to then carry on and consistently define probability for all the remaining subsets of S.

For this reason, when defining a probability measure on S we (usually implicitly) simultaneously agree on a collection of subsets of S that we wish to measure with probability. Generically, we will refer to this set of subsets as 𝒮. There are three properties we will require of 𝒮, the reasons for which will become immediately apparent when we meet the axioms of probability. We need 𝒮 to be
1. nonempty, with S ∈ 𝒮;
2. closed under complements: E ∈ 𝒮 ⟹ Ē ∈ 𝒮;
3. closed under countable unions: E1, E2, . . . ∈ 𝒮 ⟹ ∪_i Ei ∈ 𝒮.
Such a collection of sets is known as a σ-algebra.

Axioms of Probability

A probability measure on the pair (S, 𝒮) is a mapping P : 𝒮 → [0, 1] satisfying the following three axioms for all subsets of S on which it is defined (𝒮, the measurable subsets of S):
1. ∀E ∈ 𝒮, 0 ≤ P(E) ≤ 1;
2. P(S) = 1;
3. Countable additivity: for disjoint subsets E1, E2, . . . ∈ 𝒮,
       P(∪_i Ei) = Σ_i P(Ei).

2.2 Simple Probability Results

Exercises: from axioms 1-3 it is easy to derive the following:
1. P(Ē) = 1 − P(E);
2. P(∅) = 0;
3. for any events E and F, P(E ∪ F) = P(E) + P(F) − P(E ∩ F).

2.3 Independence

Two events E and F are said to be independent if and only if
    P(E ∩ F) = P(E)P(F).
This is sometimes written E ⊥ F. More generally, a set of events {E1, E2, . . .} is said to be independent if for any finite subset {E′1, E′2, . . . , E′n},
    P(∩_{i=1}^n E′i) = ∏_{i=1}^n P(E′i).

If events E and F are independent, then Ē and F are also independent.
Proof: Since F = (E ∩ F) ∪ (Ē ∩ F) is a disjoint union, P(F) = P(E ∩ F) + P(Ē ∩ F) by Axiom 3. So
    P(Ē ∩ F) = P(F) − P(E ∩ F) = P(F) − P(E)P(F) = (1 − P(E))P(F) = P(Ē)P(F).

2.4 Interpretations of Probability

Classical

If S is finite and the elementary events are considered "equally likely", then the probability of an event E is the proportion of all outcomes in S which lie inside E,
    P(E) = |E| / |S|.

Ex.
• Rolling a die: the elementary events are {1}, {2}, . . . , {6}.
  – P({1}) = P({2}) = . . . = P({6}) = 1/6.
  – P(Odd number) = P({1, 3, 5}) = 3/6 = 1/2.
• Randomly drawn playing card: 52 elementary events {♠2}, {♠3}, . . . , {♠A}, {♥2}, {♥3}, . . . , {♣K}, {♣A}.
  – P(♠) = P(♥) = P(♦) = P(♣) = 1/4.
  – The joint event {Suit is red and value is 3} contains two of the 52 elementary events, so P({red 3}) = 2/52 = 1/26. Since suit and face value should be independent, check that P({red 3}) = P({♥, ♦}) × P({any 3}).

The "equally likely" (uniform) idea can be extended to infinite spaces by apportioning probability to sets not by their cardinality but by other standard measures, like area, volume or mass.

Ex.
• If a meteorite were to strike Earth, the probability that it will strike land rather than sea would be given by
    Total area of land / Total area of Earth.
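The playing card check above can be done by brute-force counting. The following Python sketch is an addition to the notes (the names deck, prob and the predicates are ours, not from the original); it enumerates the 52 equally likely elementary events and verifies that P({red 3}) = 1/26 = P(red) × P(any 3).

    # Sketch: classical probability as counting over equally likely outcomes.
    from fractions import Fraction
    from itertools import product

    suits = ["spades", "hearts", "diamonds", "clubs"]
    values = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
    deck = list(product(suits, values))         # 52 equally likely elementary events

    def prob(event):
        """Classical probability |E| / |S|, with the event given as a predicate."""
        favourable = [card for card in deck if event(card)]
        return Fraction(len(favourable), len(deck))

    red = lambda card: card[0] in ("hearts", "diamonds")
    three = lambda card: card[1] == "3"
    red_three = lambda card: red(card) and three(card)

    print(prob(red_three))                              # 1/26
    print(prob(red_three) == prob(red) * prob(three))   # True: colour and value independent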
Frequentist

Observation shows that if one takes repeated observations in "identical" random situations, in which an event E may or may not occur, then the proportion of times in which E occurs tends to some limiting value, called the probability of E.

Ex.
• Proportion of heads in tosses of a coin: H, H, T, H, T, T, H, T, T, . . . → 1/2.

Subjective

Probability is a degree of belief held by an individual. For example, De Finetti (1937/1964) suggested the following. Suppose a random experiment is to be performed, where an event E ⊆ S may or may not happen. Now suppose an individual is entered into a game regarding this experiment where he has two choices, each leading to monetary (or utility) consequences:
1. Gamble: if E occurs, he wins £1; if Ē occurs, he wins £0;
2. Stick: regardless of the outcome of the experiment, he receives £P(E) for some real number P(E).

The critical value of P(E) for which the individual is indifferent between options 1 and 2 is defined to be the individual's probability for the event E occurring. This procedure can be repeated for all possible events E in S.

Suppose that, after this process of elicitation of the individual's preferences under the different events, we can simultaneously arrange an arbitrary number of monetary bets with the individual based on the outcome of the experiment. If it is possible to choose these bets in such a way that the individual is certain to lose money (this is called a "Dutch Book"), then the individual's degrees of belief are said to be incoherent. To be coherent, it is easily seen, for example, that we must have 0 ≤ P(E) ≤ 1 for all events E, and E ⊆ F ⟹ P(E) ≤ P(F), etc.

3 Examples

3.1 De Méré's problem

Antoine Gombaud, chevalier de Méré (1607-1684) posed to Pascal the following gambling problem: which of these two events is more likely?
1. E = {4 rolls of a die yield at least one six}.
2. F = {24 rolls of two dice yield at least one double six}.

De Méré observed that E seemed to lead to a profitable even money bet whereas F did not. We calculate P(E) and P(F).

1. Each roll of the die is independent of the previous rolls, and so there are 6^4 equally likely outcomes. Of these, 5^4 show no sixes. So the probability of no six showing is 5^4/6^4 ≈ 0.4823, and hence P(E), the probability of at least one six showing, is ≈ 1 − 0.4823 = 0.5177.
2. There are 36^24 equally likely outcomes here. Of these, 35^24 don't show a double six. So the probability of no double six is 35^24/36^24 ≈ 0.5086, and hence P(F), the probability of at least one double six, is ≈ 1 − 0.5086 = 0.4914.

Hence P(E) ≈ 0.5177 > 1/2 > 0.4914 ≈ P(F).
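These two probabilities are easy to reproduce, and simulating the bets also illustrates the frequentist interpretation above: the observed win proportions settle near the calculated values. The Python sketch below is an addition to the notes; the function names are illustrative.

    # Sketch: De Méré's two bets, computed exactly and checked by simulation.
    import random

    p_E = 1 - (5 / 6) ** 4       # at least one six in 4 rolls of a die        ~ 0.5177
    p_F = 1 - (35 / 36) ** 24    # at least one double six in 24 rolls of two dice ~ 0.4914
    print(round(p_E, 4), round(p_F, 4))

    def bet_E():
        return any(random.randint(1, 6) == 6 for _ in range(4))

    def bet_F():
        return any(random.randint(1, 6) == 6 and random.randint(1, 6) == 6
                   for _ in range(24))

    # Frequentist check: proportion of wins over many repetitions.
    n = 100_000
    print(sum(bet_E() for _ in range(n)) / n)   # close to 0.5177
    print(sum(bet_F() for _ in range(n)) / n)   # close to 0.4914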
3.2 Joint events

Coin and Die

Consider tossing a coin and rolling a die. We would consider each of the 12 possible combinations of Head/Tail and die value as equally likely. So we can construct a probability table:

          1     2     3     4     5     6
    H   1/12  1/12  1/12  1/12  1/12  1/12   1/2
    T   1/12  1/12  1/12  1/12  1/12  1/12   1/2
        1/6   1/6   1/6   1/6   1/6   1/6

From this table we can calculate the probability of any event we might be interested in, simply by adding up the probabilities of all the elementary events it contains. For example, the event of getting a head on the coin,
    {H} = {(H, 1), (H, 2), . . . , (H, 6)},
has probability
    P({H}) = P({(H, 1)}) + P({(H, 2)}) + . . . + P({(H, 6)}) = 1/12 + 1/12 + . . . + 1/12 = 1/2.

Notice the two experiments satisfy our probability definition of independence, since for any die face j,
    P({(H, j)}) = 1/12 = 1/2 × 1/6 = P({H}) × P({j}),
and similarly for tails.

Coin and Two Dice

A crooked die called a top has the same faces on opposite sides. Suppose we have two dice, one normal and one which is a top with opposite faces numbered 1, 3 and 5. Now suppose we first flip the coin. If it comes up heads, we roll the normal die; tails, and we roll the top.

To calculate the probability table easily, we notice that this is equivalent to the previous game using one normal die, except that after tails a roll of a 2 is relabelled as a 1, a 4 as a 3, and a 6 as a 5. So we can just merge those probabilities in the tails row:

          1     2     3     4     5     6
    H   1/12  1/12  1/12  1/12  1/12  1/12   1/2
    T   1/6    0    1/6    0    1/6    0     1/2
        1/4   1/12  1/4   1/12  1/4   1/12

The probabilities of the different outcomes of the die now change according to the outcome of the coin toss. And note, for example,
    P({(H, 2)}) = 1/12 ≠ 1/24 = 1/2 × 1/12 = P({H}) × P({2}).
So the two experiments are now dependent.
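For comparison, here is a Python sketch (our addition, with illustrative names, not part of the original notes) that encodes both joint tables as dictionaries and tests the factorisation P({(c, j)}) = P({c}) × P({j}) cell by cell.

    # Sketch: joint tables for the two coin-and-die games, with an independence check.
    from fractions import Fraction as F

    fair = {("H", j): F(1, 12) for j in range(1, 7)}
    fair.update({("T", j): F(1, 12) for j in range(1, 7)})

    # Tails row uses the "top": 2 -> 1, 4 -> 3, 6 -> 5, so even faces get probability 0.
    crooked = {("H", j): F(1, 12) for j in range(1, 7)}
    crooked.update({("T", j): (F(1, 6) if j % 2 == 1 else F(0)) for j in range(1, 7)})

    def independent(table):
        coin_marg = {c: sum(p for (ci, _), p in table.items() if ci == c) for c in "HT"}
        die_marg = {j: sum(p for (_, ji), p in table.items() if ji == j) for j in range(1, 7)}
        return all(table[(c, j)] == coin_marg[c] * die_marg[j]
                   for c in "HT" for j in range(1, 7))

    print(independent(fair))     # True
    print(independent(crooked))  # False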
4 Conditional Probability

4.1 Definition

For two events E and F in 𝒮 where P(F) ≠ 0, we define the conditional probability of E occurring, given that we know F has occurred, as
    P(E | F) = P(E ∩ F) / P(F).

Note that if E and F are independent, then
    P(E | F) = P(E ∩ F)/P(F) = P(E)P(F)/P(F) = P(E).

4.2 Examples

Example 1 - Rolling a 3

We roll a normal die once.
1. What is the probability of E = {the die shows a 3}?
2. What is the probability of E = {the die shows a 3} given we know F = {the die shows an odd number}?

Solution:
1. P(E) = (number of ways a 3 can come up) / (total number of possible outcomes) = 1/6.
2. Now the set of possible outcomes is just F = {1, 3, 5}. So
       P(E | F) = (number of ways a 3 can come up) / (total number of possible outcomes) = 1/3.
   Note P(F) = 1/2 and E ∩ F = E, and hence we have P(E | F) = P(E ∩ F)/P(F) = (1/6)/(1/2) = 1/3.

Example 2 - Rolling two dice

Suppose we roll two normal dice, one from each hand. Then the sample space comprises all of the ordered pairs of dice values,
    S = {(1, 1), (1, 2), . . . , (6, 6)}.
Let E be the event that the die thrown from the left hand shows a larger value than the die thrown from the right hand. Then
    P(E) = (# outcomes with left value > right) / (total # outcomes) = 15/36.

Suppose we are now informed that an event F has occurred, where
    F = {the value of the left hand die is 5}.
How does this change the probability of E occurring? Well, since F has occurred, the only sample space elements which could possibly have occurred are exactly those elements in
    F = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)}.
Similarly, the only sample space elements in E that could have occurred now must be in
    E ∩ F = {(5, 1), (5, 2), (5, 3), (5, 4)}.
So our revised probability is
    (# outcomes (5, ·) with left value > right) / (total # outcomes (5, ·)) = 4/6 = P(E ∩ F)/P(F) ≡ P(E | F).

Discussion of Examples

In both examples, we considered the probability of an event E, and then reconsidered what this probability would be if we were given the knowledge that F had occurred. What happened? Answer: the sample space S was replaced by F, and the event E was replaced by E ∩ F. So originally we had
    P(E) = P(E | S) = P(E ∩ S)/P(S)
(since E ∩ S = E, and P(S) = 1 by Axiom 2). So we can think of probability conditioning as a shrinking of the sample space, with events replaced by their intersections with the reduced space and a consequent rescaling of probabilities.

4.3 Conditional Independence

Earlier we met the concept of independence of events according to a probability measure P. We can now extend that idea to conditional probabilities, since P(· | F) is itself a perfectly good probability measure obeying the axioms of probability.

For three events E1, E2 and F, the pair of events E1 and E2 are said to be conditionally independent given F if and only if
    P(E1 ∩ E2 | F) = P(E1 | F)P(E2 | F).
This is sometimes written E1 ⊥ E2 | F.

4.4 Bayes Theorem and the Partition Rule

Bayes Theorem

For two events E and F in 𝒮, we have
    P(E ∩ F) = P(F)P(E | F);          (1)
but also, since E ∩ F ≡ F ∩ E,
    P(E ∩ F) = P(E)P(F | E).          (2)
Equating the RHS of (1) and (2), provided P(F) ≠ 0 we can rearrange to obtain
    P(E | F) = P(E)P(F | E) / P(F).

Partition Rule

Consider a set of events {F1, F2, . . .} which form a partition of S. Then for any event E ⊆ S,
    P(E) = Σ_i P(E | Fi)P(Fi).

Proof:
    E = E ∩ S = E ∩ (∪_i Fi) = ∪_i (E ∩ Fi).
So
    P(E) = P(∪_i (E ∩ Fi)),
which, by countable additivity (Axiom 3) and noting that since the {F1, F2, . . .} are disjoint so are {E ∩ F1, E ∩ F2, . . .}, implies
    P(E) = Σ_i P(E ∩ Fi).          (3)
Equation (3) is known as the law of total probability, and it can be rewritten as
    P(E) = Σ_i P(E | Fi)P(Fi).

For any events E and F in 𝒮, note that {F, F̄} form a partition of S. So by the law of total probability we have
    P(E) = P(E ∩ F) + P(E ∩ F̄) = P(E | F)P(F) + P(E | F̄)P(F̄).

Terminology

When considering multiple events, say E and F, we often refer to
• probabilities of the form P(E | F) as conditional probabilities;
• probabilities of the form P(E ∩ F) as joint probabilities;
• probabilities of the form P(E) as marginal probabilities.
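To see the definition, the "shrunken sample space" view, Bayes theorem and the partition rule all acting on one concrete case, here is a Python sketch using the two-dice example above. It is an illustrative addition, not from the notes, and the helper prob and the event predicates are our own names.

    # Sketch: conditional probability on the two-dice sample space.
    from fractions import Fraction as Fr
    from itertools import product

    S = list(product(range(1, 7), repeat=2))     # ordered pairs (left, right)

    def prob(event, space=S):
        return Fr(sum(1 for s in space if event(s)), len(space))

    E = lambda s: s[0] > s[1]                    # left die larger than right
    F = lambda s: s[0] == 5                      # left die shows a 5

    # Definition: P(E|F) = P(E and F) / P(F)
    cond = prob(lambda s: E(s) and F(s)) / prob(F)
    # Shrunken sample space: restrict the outcomes to those in F
    cond_restricted = prob(E, [s for s in S if F(s)])
    print(cond, cond_restricted)                 # both 2/3, i.e. 4/6 as in the notes

    # Bayes theorem: P(E|F) = P(E) P(F|E) / P(F)
    p_F_given_E = prob(F, [s for s in S if E(s)])
    print(prob(E) * p_F_given_E / prob(F) == cond)   # True

    # Partition rule with Fi = {left die shows i}, each with P(Fi) = 1/6
    total = sum(prob(E, [s for s in S if s[0] == i]) * Fr(1, 6) for i in range(1, 7))
    print(total == prob(E))                      # True: law of total probability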
4.5 More Examples

Example 1 - Defective Chips

Ex. A box contains 5000 VLSI chips, 1000 from company X and 4000 from company Y. 10% of the chips made by X are defective and 5% of those made by Y are defective. If a randomly chosen chip is found to be defective, find the probability that it came from X.

Let E = "chip was made by X"; let F = "chip is defective". First of all, which probabilities have we been given?

"A box contains 5000 VLSI chips, 1000 from company X and 4000 from Y."
    ⟹ P(E) = 1000/5000 = 0.2, P(Ē) = 4000/5000 = 0.8.

"10% of the chips made by X are defective and 5% of those made by Y are defective."
    ⟹ P(F | E) = 10% = 0.1, P(F | Ē) = 5% = 0.05.

We have enough information to construct the probability table:

           F      F̄
    E     0.02   0.18   0.2
    Ē     0.04   0.76   0.8
          0.06   0.94

The law of total probability has enabled us to extract the marginal probabilities P(F) and P(F̄) as 0.06 and 0.94 respectively. So by Bayes Theorem we can calculate the conditional probabilities. In particular, we want
    P(E | F) = P(E ∩ F)/P(F) = 0.02/0.06 = 1/3.

Example 2 - Kidney stones

Kidney stones are either small (< 2cm diameter) or large (> 2cm diameter), and treatment can succeed or fail. The following data were collected from a sample of 700 patients with kidney stones.

                  Large (L)   Small (L̄)   Total
    Success (S)      247         315       562
    Failure (S̄)       96          42       138
    Total            343         357       700

For a patient randomly drawn from this sample, what is the probability that the outcome of treatment was successful, given that the kidney stones were large?

Clearly we can get the answer directly from the table by ignoring the small stone patients,
    P(S | L) = 247/343,
or we can go the long way round:
    P(L) = 343/700, P(S ∩ L) = 247/700,
    P(S | L) = P(S ∩ L)/P(L) = (247/700)/(343/700) = 247/343.

Example 3 - Multiple Choice Question

A multiple choice question has c available choices. Let p be the probability that the student knows the right answer, and 1 − p that he does not. When he doesn't know, he chooses an answer at random. Given that the answer the student chooses is correct, what is the probability that the student knew the correct answer?

Let A be the event that the question is answered correctly; let K be the event that the student knew the correct answer. Then we require P(K | A). By Bayes Theorem,
    P(K | A) = P(A | K)P(K) / P(A),
and we know P(A | K) = 1 and P(K) = p, so it remains to find P(A). By the partition rule,
    P(A) = P(A | K)P(K) + P(A | K̄)P(K̄),
and since P(A | K̄) = 1/c, this gives
    P(A) = 1 × p + (1/c) × (1 − p).
Hence
    P(K | A) = p / (p + (1 − p)/c) = cp / (cp + 1 − p).

Note: the larger c is, the greater the probability that the student knew the answer, given that they answered correctly.

Example 4 - Super Computer Jobs

Measurements at the North Carolina Super Computing Center (NCSC) on a certain day showed that 15% of the jobs came from Duke, 35% from UNC, and 50% from NC State University. Suppose that the probabilities that each of these jobs is a multitasking job are 0.01, 0.05, and 0.02 respectively.
1. Find the probability that a job chosen at random is a multitasking job.
2. Find the probability that a randomly chosen job comes from UNC, given that it is a multitasking job.

Solution: Let Ui = "job is from university i", i = 1, 2, 3 for Duke, UNC, NC State respectively; let M = "job uses multitasking".
1. P(M) = P(M | U1)P(U1) + P(M | U2)P(U2) + P(M | U3)P(U3)
        = 0.01 × 0.15 + 0.05 × 0.35 + 0.02 × 0.5 = 0.029.
2. P(U2 | M) = P(M | U2)P(U2) / P(M) = (0.05 × 0.35)/0.029 ≈ 0.603.

Example 5 - HIV Test

A new HIV test is claimed to correctly identify 95% of people who are really HIV positive and 98% of people who are really HIV negative. Is this acceptable? If only 1 in 1000 of the population are HIV positive, what is the probability that someone who tests positive actually has HIV?

Solution: Let H = "has the HIV virus"; let T = "test is positive". We have been given P(T | H) = 0.95, P(T̄ | H̄) = 0.98 and P(H) = 0.001. We wish to find P(H | T).
    P(H | T) = P(T | H)P(H) / (P(T | H)P(H) + P(T | H̄)P(H̄))
             = (0.95 × 0.001) / (0.95 × 0.001 + 0.02 × 0.999)
             ≈ 0.045.
That is, less than 5% of those who test positive really have HIV.

Example 5 - continued

If the HIV test shows a positive result, the individual might wish to retake the test. Suppose that the results of a person retaking the HIV test are conditionally independent given HIV status (clearly two results of the test would certainly not be unconditionally independent). If the test again gives a positive result, what is the probability that the person actually has HIV?

Solution: Let Ti = "ith test is positive".
    P(H | T1 ∩ T2) = P(T1 ∩ T2 | H)P(H) / P(T1 ∩ T2)
                   = P(T1 ∩ T2 | H)P(H) / (P(T1 ∩ T2 | H)P(H) + P(T1 ∩ T2 | H̄)P(H̄))
                   = P(T1 | H)P(T2 | H)P(H) / (P(T1 | H)P(T2 | H)P(H) + P(T1 | H̄)P(T2 | H̄)P(H̄))
by conditional independence. Since P(Ti | H) = 0.95 and P(Ti | H̄) = 0.02,
    P(H | T1 ∩ T2) = (0.95 × 0.95 × 0.001) / (0.95 × 0.95 × 0.001 + 0.02 × 0.02 × 0.999)
                   ≈ 0.693.
So there is almost a 70% chance after taking the test twice and both times showing positive. For three positive tests, this goes up to about 99%.
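The repeated-test calculation generalises cleanly: with n conditionally independent positive results, Bayes theorem gives the posterior probability directly. The Python sketch below is an addition to the notes; the parameter names (prior, sens, spec) are ours, chosen to match the figures used in the example.

    # Sketch: posterior probability of HIV after n conditionally independent positive tests.
    def posterior_after_positives(n, prior=0.001, sens=0.95, spec=0.98):
        """P(H | T1 ∩ ... ∩ Tn), with P(Ti | H) = sens and P(Ti | not H) = 1 - spec."""
        false_pos = 1 - spec
        numerator = (sens ** n) * prior
        denominator = numerator + (false_pos ** n) * (1 - prior)
        return numerator / denominator

    for n in range(1, 4):
        print(n, round(posterior_after_positives(n), 3))
    # 1 0.045
    # 2 0.693
    # 3 0.991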