Week 3: Conditional probabilities, Bayes' formula, Expected value of a random variable
We recall our discussion of 5 card poker hands.
Example 13: a) What is the probability of the event A that a 5 card poker hand contains one or more aces?
This is the probability of the complement of the event that the hand contains no aces:

P(no aces) = C(48, 5) / C(52, 5),

where C(n, k) denotes the binomial coefficient "n choose k". Thus

P(A) = P(1 or more aces) = 1 − P(no aces) = [C(52, 5) − C(48, 5)] / C(52, 5),

where the numerator C(52, 5) − C(48, 5) is the number of hands having 1 or more aces (i.e. not having zero aces). We could also have computed this by noting that the event "1 or more aces" is the union of the 4 disjoint events: exactly 1 ace (hence 4 non-aces chosen from the 48 non-aces), exactly 2 aces (hence 3 non-aces), exactly 3 aces, or exactly 4 aces. By our basic counting principles we then must have

C(52, 5) − C(48, 5) = C(4, 1) C(48, 4) + C(4, 2) C(48, 3) + C(4, 3) C(48, 2) + C(4, 4) C(48, 1).
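Both counts are easy to check numerically; a minimal Python sketch (not from the text; math.comb computes C(n, k)):

from math import comb

total = comb(52, 5)        # all 5-card hands
no_aces = comb(48, 5)      # hands drawn entirely from the 48 non-aces

# complement count: hands with at least one ace
at_least_one = total - no_aces

# direct count: the union of the disjoint events "exactly k aces", k = 1..4
direct = sum(comb(4, k) * comb(48, 5 - k) for k in range(1, 5))

assert at_least_one == direct == 886656
print(at_least_one / total)   # P(A) is about 0.341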
b) What is the probability of the event B that we are dealt a full house consisting of 3 aces and 2 kings? The probability is the number of ways to choose 3 aces from the 4 aces in the deck, times the number of ways to choose 2 kings from the 4 kings, over the total number of 5 card poker hands, or

P(B) = C(4, 3) C(4, 2) / C(52, 5).

c) What is the probability of any full house (3 of one kind, 2 of another)? This is the same as in b) above except there are 13 ways to choose the kind that we have 3 of and 12 ways to choose the kind we have 2 of (since it can't be the kind we already picked). So the probability is

13 · 12 · C(4, 3) C(4, 2) / C(52, 5).
d) What is the conditional probability P(B | A) of the full house in b) if we are told by the dealer, before he gives us our 5 cards, that our hand has at least 1 ace in it already? This is a conditional probability problem conditioned on the event A that our hand has at least one ace in it. In this case we saw in part a) that the number of such hands with one or more aces is the total number of hands minus the number having no aces, or C(52, 5) − C(48, 5) hands. These hands constitute the reduced sample space A for the conditional event problem (i.e. the event A that our hand has at least one ace in it in this example). We can assume that our full house is then randomly selected from these hands, with each such hand being equally likely. This yields the number of full houses of aces over kings found in the numerator of b) above, over the size of the reduced sample space, or

P(B | A) = P(B ∩ A) / P(A) = C(4, 3) C(4, 2) / [C(52, 5) − C(48, 5)].

To see the last equality, note that by dividing both the numerator and denominator above by the total number of poker hands C(52, 5), this is the same as the answer in b) divided by the answer in a), i.e. it is equal to P(B)/P(A); and since the full house containing 3 aces (that is, the event B) is contained in the event A that at least one ace occurs, we also have B = B ∩ A. This is how we will define conditional probability in general, i.e. we have motivated the following definition.
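The same style of check covers parts b) through d); a Python sketch (the variable names are mine, not the text's):

from math import comb

total = comb(52, 5)
aces_over_kings = comb(4, 3) * comb(4, 2)     # 3 aces and 2 kings

p_b = aces_over_kings / total                 # part b)
p_c = 13 * 12 * aces_over_kings / total       # part c), any full house

# part d): reduced sample space = hands with at least one ace
hands_with_ace = total - comb(48, 5)
p_d = aces_over_kings / hands_with_ace        # P(B | A), since B is contained in A

print(p_b, p_c, p_d)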
Definition of conditional probability: for any two events A and B with P(A) non-zero, we define

P(B | A) = P(B ∩ A) / P(A).

If we let p(B) = P(B | A), one checks that for fixed A the set function measure p(B) satisfies the 3 axioms needed for it to be a probability measure. Namely it satisfies
1) 0 ≤ p(B) ≤ 1 ;
2) p(Ω) = P(Ω ∩ A)/P(A) = P(A)/P(A) = 1 (i.e. since Ω ∩ A = A, the sample space Ω has measure 1) ;
3) countable additivity for p( ) follows directly from the same property of P( ) .
Thus a conditional probability measure really is a probability measure.
The above definition can be re-written as the multiplication rule for conditional probabilities:

P(A ∩ B) = P(A) · P(B | A).

For three events (and similarly for more than three) this takes the form

P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B),
which can be verified by going back to the definition of conditional probabilities. If we think of events
which are ordered in time, this says that the probability of a sequence of events occurring may be
written as a product where at each step of the product we condition the next event on the previous
events that are assumed to have already occurred. For the three events above first A occurs and then B
occurs (given A already has) and then C occurs (given A and B have already occurred).
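To make that verification explicit, substitute the definition of conditional probability into the right-hand side (assuming P(A) and P(A ∩ B) are non-zero) and cancel:

P(A) · P(B | A) · P(C | A ∩ B) = P(A) · [P(A ∩ B)/P(A)] · [P(A ∩ B ∩ C)/P(A ∩ B)] = P(A ∩ B ∩ C).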
Example 1 : To calculate the probability of selecting 3 aces in a row from a randomly shuffled 52 card
deck, we have done such problems directly by a counting argument. Namely the number of ways to do
this is the number of ways to choose 3 aces from the 4 in the deck and to get the probability we divide
by the number of ways to select 3 cards from 52. I.e.

P(select three aces in a row from a randomly shuffled 52 card deck)
= C(4, 3) / C(52, 3) = (4 · 3 · 2) / (52 · 51 · 50).
We could also obtain the same result using the multiplication rule for conditional probabilities: letting A_i be the event that the i-th card selected is an ace, we have

P(A_1 ∩ A_2 ∩ A_3) = P(A_1) · P(A_2 | A_1) · P(A_3 | A_1 ∩ A_2) = (4/52) · (3/51) · (2/50).
That is, for the first ace there are 4 ways (aces) to choose an ace out of 52 equally likely cards to
select, but having chosen an ace there are now 3 ways to choose an ace from the remaining 3 aces left
in the 51 equally likely cards remaining, and finally 2 ways to choose the third ace out of the remaining
2 aces in the 50 equally likely cards remaining given that an ace was obtained in each of the previous
two selections.
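Both routes agree, and a simulation comes out close; a minimal Python sketch (modeling the deck as 4 aces plus 48 other cards):

from math import comb
import random

p_count = comb(4, 3) / comb(52, 3)       # counting argument
p_chain = (4/52) * (3/51) * (2/50)       # multiplication rule
assert abs(p_count - p_chain) < 1e-15

# Monte Carlo estimate
deck = ["A"] * 4 + ["x"] * 48
trials = 1_000_000
hits = sum(random.sample(deck, 3).count("A") == 3 for _ in range(trials))
print(p_count, hits / trials)            # both near 0.00018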
Example 2 (problem 3.70 b) from the text): During the month of May in a certain town, the probability of the event R_{k+1} that day k+1 is rainy, given R_k that day k was, is .80. That is, assuming days are either rainy or sunny (not rainy), P(R_{k+1} | R_k) = .80. This implies P(S_{k+1} | R_k) = .20 is the probability of a sunny day given the previous day was rainy. We are also told that P(R_{k+1} | S_k) = .60 is the probability that the next day will be rainy given the previous was sunny. What then is the probability that, for some 5 consecutive days in May in this town, a rainy day is followed by two more rainy days, then a sunny day, and then a rainy day? We are not told the probability that a particular day is rainy, so we must interpret this as the conditional probability that, given the first day is rainy, the second and third are too, the fourth is sunny, and the fifth rainy.
By the multiplication rule for conditional probabilities we have

P(R_5 S_4 R_3 R_2 | R_1) = P(R_5 | S_4 R_3 R_2 R_1) · P(S_4 | R_3 R_2 R_1) · P(R_3 | R_2 R_1) · P(R_2 | R_1);

now we use the property (the so-called Markov property) that the next event only depends on the previous one and nothing earlier (one says that the Markov chain has a "memory of one") to get

= P(R_5 | S_4) · P(S_4 | R_3) · P(R_3 | R_2) · P(R_2 | R_1) = (.6)(.2)(.8)(.8) = .0768

for our desired conditional probability.
Remark : A Markov chain generalizes this example to the case where there may be more than the two
states rainy or sunny. For example in the finite state space case for a 5 state chain, the chain is
described by a 5 by 5 transition matrix of conditional probabilities. For example the row 1 column 3
entry of the matrix gives the conditional probability of going to state 3 given that the previous state was
state 1. If we want to know the probability of going from state 2 to state 5 in 4 time steps we look at the
(2,5) entry of the fourth power of the transition matrix. If one wants to remember the last two states
(memory of two) there is a trick one uses. This situation can still be described by a Markov chain where
now we enlarge the state space from 5 states to all 25 pairs of states (which we could label as states 1
through 25). Now the transition matrix is a 25 by 25 matrix of 625 transition probabilities. Similarly if
we want to remember the previous three states our enlarged state space would then consist of all 125
triples of states one through 5 (so 5 cubed or 125 states) etc. To learn more consider taking Math 632
Introduction to Stochastic (random) Processes.
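For the two-state rainy/sunny chain of Example 2 the matrix mechanics look like the following numpy sketch (the state labeling is my choice):

import numpy as np

# row i, column j = P(next day in state j | today in state i)
# state 0 = rainy, state 1 = sunny
T = np.array([[0.8, 0.2],
              [0.6, 0.4]])

# path probability from Example 2: rainy, rainy, rainy, sunny, rainy
print(T[0, 0] * T[0, 0] * T[0, 1] * T[1, 0])    # 0.0768

# probability of rain 4 days after a rainy day: (0, 0) entry of T^4
print(np.linalg.matrix_power(T, 4)[0, 0])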
Independent events : Intuitively what we mean when we say two events A and B are independent such
as two consecutive flips of a fair coin, is that being told that event B has occurred (the first flip yielded
a head say) should not influence the probability of A occurring (the second flip is a tail say) or in
symbols P(A | B) = P(A).
Equivalently, by the definition of conditional probability, this is true if and only if

P(A ∩ B) = P(A) · P(B)    (pairwise independence),

i.e. the probability of the intersection of the two events factors as the product of the probabilities of the individual events. More generally, we say a collection of events is independent if the probability of the intersection of the events in any sub-collection of two or more of the events factors as the product of the probabilities of the individual events:

P(A_{i_1} ∩ A_{i_2} ∩ ... ∩ A_{i_k}) = P(A_{i_1}) · P(A_{i_2}) · ... · P(A_{i_k}).

It is possible for three events to fail to be independent (not independent = dependent) even when any two of the events are pairwise independent.
Example 3: Three events which are dependent (not independent) but which are pairwise independent :
For a simple example of this consider flipping a fair coin twice. Let
A= the first flip yields a head={HH, HT},
B= the second flip yields a head={TH, HH},
C= exactly one head occurs in the two flips={TH, HT}
Note that the intersection of the three events is the empty set, which has probability 0: we have

P(A ∩ B ∩ C) = P(∅) = 0 ≠ P(A) · P(B) · P(C) = 1/8,

since the individual events each have probability P(A) = P(B) = P(C) = 1/2, so their product is 1/8. Since 0 is not equal to 1/8, A, B, and C are not independent events. But we claim the probability of the intersection of any two of these events is

P(A ∩ B) = P(B ∩ C) = P(A ∩ C) = 1/4,

so these are pairwise independent events, since clearly P(A ∩ B) = P(A) · P(B) etc.
Example 4 : Consider 5 rolls of a fair (six-sided) die. The probability of rolling a 3 for any particular
roll is 1/6 while the probability of not rolling a 3 is 5/6 by the probability of the complement. Find the
probability of rolling exactly two 3's in 5 rolls. Using independence, the probability of any particular
sequence of 5 rolls, which we view as 5 independent events, two of which involve rolling a 3, and the
other three involve rolling anything else, is the product of the individual probabilities: 1/6 times 1/6
times 5/6 times 5/6 times 5/6. But there are 5 choose 2 ways that we could have selected the particular
two rolls in which the 3 occurred. Thus by the sum rule for probabilities of disjoint events:

P(two 3's in 5 rolls) = C(5, 2) (1/6)^2 (5/6)^3.

This is an example of a binomial random variable where the probability of success (a 3 is rolled) for each trial (roll) is p = 1/6 and the probability of failure is (1 − p) = 5/6. Similar reasoning gives that the probability of exactly k successes in n independent trials each having success probability p is the binomial probability

b(k; n, p) = P(exactly k successes in n independent trials each having success probability p) = C(n, k) p^k (1 − p)^(n−k).
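The formula is one line of Python; a sketch checked against Example 4:

from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """b(k; n, p): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(2, 5, 1/6))   # about 0.161, exactly two 3's in 5 rolls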
Example 5 : Consider a system where parts 1 and 2 operate independently of one another each with
failure probability .1 but are running in parallel so that the system diagram looks like
            _____
       ____|  1  |____
      /    |_____|    \
  ----                 ----
      \     _____     /
       \___|  2  |___/
           |_____|
Letting
A = {component 1 operates successfully},
B = {component 2 operates successfully},
then P(system succeeds) = P(either 1 or 2 operates) =

P(A ∪ B) = P(A) + P(B) − P(A) · P(B) = .9 + .9 − (.9)(.9) = .99,
so the combined system is operational 99% of the time.
A slightly more complicated example involves a similar system like
            _____      _____
       ____|  1  |____|  3  |____
      /    |_____|    |_____|    \
  ----                            ----
      \        _____             /
       \______|  2  |___________/
              |_____|
where now with event C = {component 3 operates successfully}, with C independent of A and B
P( system succeeds ) =
P  A∩C∪B=P  A∩CP  B−P  A∩B∩C=.9.9.9−.9.9.9=.981
using independence of the three events and assuming component 3 also fails with probability .1.
The success probability is slightly lower than before since now both components 1 and 3 must work
properly for the top series to work.
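Both reliabilities are easy to confirm numerically; a Python sketch with an optional Monte Carlo sanity check (the 0.9 figures follow the examples above):

import random

p = 0.9                                    # each component works with probability .9

p_parallel = p + p - p * p                 # first system: 1 or 2 works -> 0.99
p_series_parallel = p*p + p - (p*p) * p    # second system: (1 and 3) or 2 -> 0.981

trials = 100_000
hits = 0
for _ in range(trials):
    a, b, c = (random.random() < p for _ in range(3))
    hits += (a and c) or b                 # second system succeeds
print(p_parallel, p_series_parallel, hits / trials)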
Example 6 (Problem 3.75): A tree diagram for conditional probabilities is a useful device. Figure 3.16 of the text, used for exercise 3.75, gives the branch probabilities P(B) = .4, P(A | B) = .30 and P(Ā | B̄) = .20; the remaining branches can be filled in using the law of the probability for the complement:

        .30
   B -------- A
  / \
.4   \------- Ā   (.70)
 /
<
 \
.6   /------- A   (.80)
  \ /
   B̄ -------- Ā
        .20

The interpretation is that P(B) = .4, P(A | B) = .30, P(Ā | B̄) = .20, from which we infer P(B̄) = .6, P(Ā | B) = .70, P(A | B̄) = .80.

a) P(A) = P(B) P(A | B) + P(B̄) P(A | B̄) = (.4)(.3) + (.6)(.8) = .12 + .48 = .60.
P  A∩ B P  B P  A| B .4.3 1
=
=
= = .20 by part a) using the multiplication
b) P  B | A=
P  A
P  A
.60
5
rule for conditional probabilities.
Note that the original diagram gave us P(A | B) and what was wanted in part b) was to reverse the order of conditioning, that is, to find P(B | A). This is the situation where Bayes' theorem applies. One has a collection of mutually exclusive (disjoint) events which exhaust the sample space. In this case the sample space is a disjoint union of B and B̄.
c) Similarly

P(B | Ā) = P(B) P(Ā | B)/P(Ā) = (.4)(.7)/.40 = .70.
Part a) is referred to as the rule of total probability (or rule of elimination). It is used to get the denominator in part b). Parts b) and c) are known as Bayes' theorem. To get the rule of total probability we note that a disjoint partition of the sample space, such as B ∪ B̄ = Ω, also partitions any set A = (A ∩ B) ∪ (A ∩ B̄) as a disjoint union. Since the union is disjoint, the third axiom of probability gives P(A) = P(A ∩ B) + P(A ∩ B̄). We then re-write each probability on the right via the multiplication rule for conditional probabilities (essentially the definition of conditional probability). This gives the rule of total probability, which gives the denominator in Bayes' theorem.
More generally, if E_1 ∪ E_2 ∪ ... ∪ E_k = Ω is a disjoint union which exhausts the sample space Ω, then Bayes' theorem says

P(E_l | D) = P(E_l) P(D | E_l) / [ Σ_{j=1}^{k} P(E_j) P(D | E_j) ].

The probabilities P(E_l), l = 1, ..., k, are
called the priors (which reflect our best knowledge before the experiment D). Then we collect some
new data and update these to get the posterior probabilities P(E_l | D). Certain types of probability
distributions have the property that the prior and posterior both have a similar form except that certain
real number parameters characteristic of the distribution change in ways which are easy to compute.
Then if we get some new data in we can regard the old posterior as the new prior and use the new data
to update to get the new posterior and so on.
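The formula is a few lines of code; a minimal Python sketch (the function name is mine), checked against Example 6:

def bayes_posterior(priors, likelihoods):
    """Posteriors P(E_l | D) from priors P(E_l) and likelihoods P(D | E_l)."""
    joint = [p * q for p, q in zip(priors, likelihoods)]
    total = sum(joint)                 # rule of total probability: P(D)
    return [j / total for j in joint]

# Example 6: priors P(B) = .4, P(B̄) = .6; likelihoods P(A | B) = .3, P(A | B̄) = .8
print(bayes_posterior([0.4, 0.6], [0.3, 0.8]))   # [0.2, 0.8], so P(B | A) = .20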
Example 7 : An Ace Electronics dealer sells 3 brands of televisions : 50% are the first brand which are
Italian made , 30% are the second brand which are French made, and 20% are the third brand which are
Swiss made. Each TV comes with a 1 year warranty. We know that 25% of the Italian TVs will require
repair work under warranty, 20% of the French will require repairs under warranty, and 10% of the
Swiss TVs will need repair under warranty.
a) What is the probability that a randomly selected customer has bought an Italian TV that will need repairs under warranty?
b) What is the probability that a randomly selected customer has a TV that will need repairs under warranty?
c) If the customer returns to the store with a TV that needs repairs under warranty, what is the probability that it is an Italian TV? French? Swiss?
Let A be the event that a randomly selected TV is Italian, B that it is French, C that it is Swiss, and R the event that it will need repairs under warranty. We are given that

P(A) = .50, P(B) = .30, P(C) = .20, P(R | A) = .25, P(R | B) = .20, P(R | C) = .10.

Part a) asks for P(A ∩ R) = P(A) · P(R | A) = .50 · .25 = .125 = 1/8, by the product rule for conditional probabilities.

Part b) wants P(R), which uses the rule of total probability:

P(R) = P(R ∩ A) + P(R ∩ B) + P(R ∩ C) = P(A) P(R | A) + P(B) P(R | B) + P(C) P(R | C) = .125 + .060 + .020 = .205.

Part c) asks for P(A | R) = P(A) P(R | A)/P(R) = .125/.205 ≈ .61, using Bayes' theorem, and again P(B | R) = .060/.205 ≈ .29 and P(C | R) = .020/.205 ≈ .10, which could also be obtained as 1 − P(A | R) − P(B | R) = 1 − .61 − .29 = .10.
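The same three-way computation in Python (a quick check of part c)):

priors = [0.50, 0.30, 0.20]                       # Italian, French, Swiss
repair = [0.25, 0.20, 0.10]                       # P(R | brand)
joint = [p * r for p, r in zip(priors, repair)]   # [.125, .060, .020]
p_R = sum(joint)                                  # 0.205, part b)
print([round(j / p_R, 2) for j in joint])         # [0.61, 0.29, 0.1]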
Example 8: Suppose a certain diagnostic test is 98% accurate both on those who do and those who don't have a disease. If .3% = .003 of the population has cancer, find the probability that a randomly selected tested person has cancer (C) given that the test is positive (+). The given information says

P(C) = .003, P(+ | C) = .98 = P(− | C̄),

from which we deduce that

P(C̄) = .997, P(+ | C̄) = .02 = P(− | C).

Then by Bayes' theorem,

P(C | +) = P(+ | C) P(C) / [ P(+ | C) P(C) + P(+ | C̄) P(C̄) ] = (.98 × .003) / (.98 × .003 + .02 × .997) ≈ 3/23 ≈ .13.

Thus the posterior probability of having the disease (given a positive test result) is about 130 chances out of 1000, up from the original prior probability of .003 (3 chances in 1000) prior to testing. A positive test result only gives about a 13% chance of having the disease, due to the fact that the chance of having it in the population as a whole is so small.
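Numerically, the base-rate effect looks like this (a short sketch; the variable names are mine):

p_C = 0.003                    # prior: P(C)
sens = 0.98                    # P(+ | C)
false_pos = 0.02               # P(+ | C̄)

p_pos = sens * p_C + false_pos * (1 - p_C)     # rule of total probability
print(sens * p_C / p_pos)                      # about 0.128, roughly 3/23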
Expectation of a random variable (also called its expected value or mean value)
Consider the following game: we flip a fair coin 5 times. If a head occurs for the first time on the j-th flip, the game pays us winnings amount W = W(ω) = a_j = 2^j dollars for 1 ≤ j ≤ 5, where ω is a particular outcome of the experiment, that is, a sequence of 5 heads or tails such that the first head occurs on the j-th flip, and using independence of flips this happens with probability

p_j = P({ω : W(ω) = a_j}) = 1/2^j   (= P(W = a_j) for short).

If no heads occur in 5 flips we'll take j = 6, so that the game pays us the grand prize of W = 2^6 = $64, and this occurs with probability 1/32. Now the 64 dollar question is: How much are you willing to pay to play this game so that on average you will break even?
If we play the game a large number of times n, then by the relative frequency interpretation of probabilities, the number of times s_j that we won amount a_j dollars (successes s_j in winning the j-th amount) is approximately n p_j (equivalently, the probability p_j ≈ s_j / n is approximately the relative fraction of times we won amount a_j). Thus if we play the game independently many times n, winning amount W_k on the k-th time that we played, our long term average or sample mean winnings is

W̄ = (1/n) Σ_{k=1}^{n} W_k ≈ (1/n) Σ_j a_j n p_j = Σ_j a_j p_j,

or, canceling the n's and denoting our average winnings by E[W] (the expected value of the random variable W), we find for the definition of the expectation of any discrete random variable W taking possible values a_j (from a countable set of values) with probabilities p_j:

E[W] = Σ_j a_j p_j = Σ_j a_j P(W = a_j).
In our particular example, when the sample size n gets big and the sum over j runs over 1 ≤ j ≤ 6, this gives

E[W] = 2(1/2) + 4(1/4) + 8(1/8) + 16(1/16) + 32(1/32) + 64(1/32) = 1 + 1 + 1 + 1 + 1 + 2 = $7.
Thus we should be willing to pay $7 each time we play the game, assuming we play more than 32
times, long enough to win the grand prize which we expect to happen around 1/32 of the time.
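A simulation of the game (a Python sketch) comes out near the $7 figure:

import random

def play() -> int:
    """Winnings from one play: $2^j if the first head is on flip j <= 5, else $64."""
    for j in range(1, 6):
        if random.random() < 0.5:      # this flip came up heads
            return 2 ** j
    return 2 ** 6                      # no head in 5 flips: grand prize

n = 1_000_000
print(sum(play() for _ in range(n)) / n)   # near 7.0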
The situation is a little different if instead of stopping at the 5th flip we flip the coin 15 times, with grand prize 2^16 dollars. It is easy enough to calculate the expectation in this case using the above formula. But realistically not all of us would want to wait around on average 2^15 plays, which could take several years, before we see the rare event of winning the grand prize, which occurs with probability (1/2)^15. Economists speak of a utility function which describes how much playing such a game is worth to us personally and which may vary from person to person depending on our tastes in gambling and how much we are willing to risk.
In one of the homework problems, problem 3.90, involving expected values, a company pays some per-unit cost of C dollars and sells the item at a per-unit sales price of S dollars. If a fixed number k of items are stocked for the day, then the cost of the k units is a fixed amount k·C. The demand (how many units customers desire to purchase that day) is assumed to be a random variable, where p_j gives the probability that the demand equals j units that day, and the sales revenue resulting from a given demand j is then a random variable X given by

X = S·j if j ≤ k−1, else X = S·k if j ≥ k

(i.e. the actual sales cannot exceed the number in stock). Then the expected profit is the expected sales revenue minus the fixed cost, or

E[P] = E[X] − C·k = Σ_{j=0}^{k−1} p_j · S · j + (Σ_{j≥k} p_j) · S · k − C·k.

That is, for demand j less than the number k in stock our revenue for the j units sold is S·j and this occurs with probability p_j, while if the demand is greater than or equal to the number k in stock we sell k items for revenue S·k and this occurs with probability Σ_{j≥k} p_j. Note we could have re-written the fixed cost as k·C = k·C·Σ_j p_j (since the sum of the probabilities equals 1), and then the above is equivalent to the expected profit where the profit is the random variable X − C·k, which is the sales revenue random variable minus the fixed cost for k items in stock.