Appendix: Conditional Probability:
Conditional probability is updated probability. Conditioning is used to update a
probability when some relevant information becomes available. It is
accomplished by updating the sample space.
Let us start by reviewing set operations. There are two binary set operations,
union (denoted by ∪) and intersection (denoted by ∩).
The union of two sets, A ∪ B, is the set consisting of all the elements in A or B
(or both). Here is an example. If X is the number of dots on the upper side of a
rolled die, then the sample space for this random variable is S = {1, 2, 3, 4, 5, 6}.
Let A = {2, 4, 6} and B = {5, 6}. Then A ∪ B = {2, 4, 5, 6}, not {2, 4, 5, 6, 6},
because no element can be listed more than once in a set.
The intersection of two sets, A ∩ B, is the set consisting of all the elements
common to A and B. In the rolled-die example above, A ∩ B = {6}, not {6, 6},
which would list an element twice and so is not a set.
There is also a set operation that acts on a single set: the complement. If the
set is D, then the complement of D is denoted Dᶜ or D′.
The complement of a set is the set consisting of all the elements of the sample
space that are not in the set. For example, Aᶜ = {1, 3, 5} and Bᶜ = {1, 2, 3, 4}
in the rolled-die example.
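If you like to check these set operations on a computer, here is a minimal
sketch in Python (the language and the variable names are our choice for
illustration, not part of the appendix):

    # Sample space for one roll of a die, and the two events from the text
    S = {1, 2, 3, 4, 5, 6}
    A = {2, 4, 6}
    B = {5, 6}

    print(A | B)   # union: {2, 4, 5, 6}; duplicates collapse automatically
    print(A & B)   # intersection: {6}
    print(S - A)   # complement of A in S: {1, 3, 5}
    print(S - B)   # complement of B in S: {1, 2, 3, 4}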
Notice that, for any subset D of S (that is, every element of D is an element of S),
D ∪ Dᶜ = S and D ∩ Dᶜ = φ (= { }, the empty set). So P(D ∪ Dᶜ) = P(S) = 1, and
P(D ∪ Dᶜ) = P(D) + P(Dᶜ) because D ∩ Dᶜ = φ and P(D ∩ Dᶜ) = 0. Together these
give P(D) + P(Dᶜ) = 1, or P(D) = 1 − P(Dᶜ), which is useful for finding P(D) when
P(D) is difficult to find directly but P(Dᶜ) is easy to find. Often, when one of the
two is difficult to find, the other is easy. By the way, these probability equations
are equivalent to P(Dᶜ) = 1 − P(D) as well.
Please verify all this using A and Aᶜ in the rolled-die example. If the die is assumed
to be balanced, then each sample point has equal probability 1/6 (so that
P(A) = 1/2 and P(Aᶜ) = 1/2). Can you do it? Also, repeat the verification with B in
the rolled-die example; this time, P(B) = 1/3 and P(Bᶜ) = 2/3. Can you verify them?
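Here is one way to carry out the verification mechanically, a small Python sketch
under the equally-likely assumption (probabilities are computed by counting
elements):

    from fractions import Fraction

    S = {1, 2, 3, 4, 5, 6}
    def P(event):                      # equally likely: P = |event| / |S|
        return Fraction(len(event), len(S))

    for D in ({2, 4, 6}, {5, 6}):      # A and B from the text
        Dc = S - D
        assert P(D | Dc) == 1          # P(D ∪ Dᶜ) = P(S) = 1
        assert P(D & Dc) == 0          # P(D ∩ Dᶜ) = 0
        assert P(D) == 1 - P(Dc)       # the complement rule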
Generally, P(D ∪ E) = P(D) + P(E) − P(D ∩ E). The subtraction of P(D ∩ E) is
required because that probability is counted twice, once as part of P(D) and once
as part of P(E), so one copy must be subtracted to make it right. See the left
diagram below. Two sets, D and E, are drawn as circles inside their sample space,
drawn as a rectangle. The overlapping area of the two circles is D ∩ E. The
elements of the sets are not explicitly shown (unless needed) but are assumed to
exist. So, when the circles for two sets overlap, it is assumed that the
intersection has at least one element (it is not empty). On the other hand, if
there is no common element between two sets (that is, the sets are mutually
exclusive), their circles are drawn without overlapping; see the right diagram
below. By the way, these diagrams are called Venn diagrams, and they are very
commonly used with sets.
[Venn diagrams: left, overlapping circles D and E inside a rectangle S; right,
disjoint circles D and E inside a rectangle S.]
If the sets D and E are mutually exclusive, then P(D ∪ E) = P(D) + P(E) − P(D ∩ E)
reduces to P(D ∪ E) = P(D) + P(E), since P(D ∩ E) = P(φ) = 0. By the way,
remember that P(A ∩ Aᶜ) = 0 always holds, because A and Aᶜ are mutually exclusive.
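The general addition rule is easy to confirm numerically with the rolled-die sets
A and B, for example with this small Python check (illustrative only):

    from fractions import Fraction

    S = {1, 2, 3, 4, 5, 6}
    def P(event):
        return Fraction(len(event), len(S))

    A, B = {2, 4, 6}, {5, 6}
    # The overlap {6} would otherwise be counted twice.
    assert P(A | B) == P(A) + P(B) - P(A & B)   # 2/3 = 1/2 + 1/3 - 1/6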
Now, let us go back to conditional probabilities. The conditional probability of D
given E is denoted as P(D|E) and defined as
P(D|E) = P(D ∩ E)/P(E),
where E ≠ φ, since you cannot condition on an event that cannot occur and, also,
cannot divide by 0 in the denominator. Note that P(φ) = 0, and a conditional
probability must be a number between 0 and 1, both inclusive, like any other
probability.
A common mistake is to use P(D ∪ E) in the numerator. However, this would give
P(D|E) ≥ 1, because D ∪ E contains E and so P(D ∪ E) ≥ P(E) (see the left diagram
above), and you know that is incorrect. Conditional or not, a probability is a
number between 0 and 1, both inclusive. One sometimes says "the conditional
probability of D on E," and indeed P(D ∩ E) sits "on" P(E) in the fraction.
However, it cannot be "on E" in the fraction, because you cannot divide by a set
(division is for numbers).
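The definition itself is straightforward to express in code. Below is a minimal
Python sketch for a finite sample space with equally likely outcomes; the
function name cond_prob is ours, chosen for illustration:

    from fractions import Fraction

    def cond_prob(D, E):
        """P(D|E) = P(D ∩ E)/P(E) for equally likely outcomes;
        the sample-space size cancels out of the fraction."""
        if not E:
            raise ValueError("cannot condition on an empty event")
        return Fraction(len(D & E), len(E))

    A, B = {2, 4, 6}, {5, 6}      # events from the rolled-die example
    print(cond_prob(A, B))        # P(A|B) = 1/2: given a 5 or 6, only 6 is in A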
"The conditional probability of D given E" means "the probability of D given that
E has occurred." Suppose we are interested in P(D), say, to make some important
decisions. We then receive the information that E has occurred. Now P(D) is
obsolete and incorrect in light of this information. We should not make
important decisions based on an old, incorrect probability, because that leads to
incorrect decisions. So, we need to update P(D) using the information that E has
occurred.
P(D) is a probability with respect to S. In fact, P(D) is the conditional probability
of D given S: P(D|S) = P(D ∩ S)/P(S) = P(D)/1 = P(D), since P(S) = 1 and D ∩ S = D
(check this with the left diagram above). That is, any probability, including P(D),
is a conditional probability on S. Now, S = E ∪ Eᶜ, and suppose we learn that Eᶜ is
no longer possible (since E has occurred). Then P(D) = P(D|S) is no longer correct.
How do we correct it? Using E as the new sample space, you condition D on E.
Actually, you must work with D ∩ E instead of D, since D = (D ∩ E) ∪ (D ∩ Eᶜ) and
D ∩ Eᶜ is no longer possible; in effect, D ∩ Eᶜ = φ because E has occurred.
That is, with the information that E has occurred, we have D = D ∩ E and
P(D) = P(D ∩ E).
All this is clearer with the left diagram above. Since E has occurred, shrink S (the
rectangle) to E (the circle), getting rid of Eᶜ, and shrink D to D ∩ E. Consequently,
the probability of D is now the probability of D ∩ E with respect to E, no longer
with respect to S; dividing by P(E) renormalizes the new sample space E so that it
has total probability 1. So the updated probability of D must be P(D ∩ E)/P(E)
(not P(D ∩ S)/P(S) any more), since E has occurred. This is what the two
sentences, "Conditioning is used to update a probability when some relevant
information becomes available" and "It is accomplished by updating the sample
space," mean.
So, you are using some probability to make decisions. Then a piece of
information becomes available telling you that some part of the sample space is
no longer true. Now the sample space and the probability behind your
decision-making are obsolete and no longer correct. As a result, you update
(correct) the probability and the sample space by shrinking the sample space to
the part that is true. The resulting updated, correct probability is the conditional
probability given the true part of the original sample space. The true part is the
updated, correct sample space.
Let us have some examples. You live in a big city where the chance of getting
convicted of murder is only 10%, P(C) = 0.10, the chance of getting arrested for
murder is 8%, P(A) = 0.08, and the chance of getting arrested and convicted of
murder is 7%, P(A ∩ C) = 0.07. You just despise your mother-in-law. After
examining these probabilities, you decide that it is worth risking the chance and
whack her (oh, it feels so good; she had it coming!).
However, now you have been arrested for the murder and are sitting in the back
of a police car. What is your chance of getting convicted of the murder? It is not
10%. You might think it is only 7%, because that probability is for the event of
being arrested and convicted of the murder. However, P(A ∩ C) = 0.07 is based
on the entire sample space, including "not getting arrested," Aᶜ. It must be
updated by conditioning on A, which gives P(A ∩ C)/P(A) = 0.07/0.08 = 0.875.
Yes, you need a good lawyer.
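The arithmetic of the update is a one-liner; here it is as a small Python sketch
(the variable names are ours):

    # Updating the conviction probability once the arrest A has occurred
    P_A = 0.08           # P(arrested)
    P_A_and_C = 0.07     # P(arrested and convicted)
    P_C_given_A = P_A_and_C / P_A
    print(round(P_C_given_A, 3))   # 0.875, up from the unconditional P(C) = 0.10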
Conditioning can also be used to obtain a predictive (speculative) probability.
This is a conditional probability of A given that B will occur. That is, a conditional
probability can be an updated probability, as in the last example, or a predictive
probability. For instance, you have just come into big money by winning a big
poker tournament and are wondering how to invest it.
Of course, you decide not to invest in crude oil, because the chance of the crude
oil price going up 40% within one month is only 0.0001, P(O₄₀) = 0.0001. Also, the
chance of the U.S. going to war in the Middle East is 0.00005, P(W) = 0.00005, and
the chance of the U.S. going to war and the crude oil price going up 40% within
one month is 0.00004, P(W ∩ O₄₀) = 0.00004.
Now, you have just talked to your college buddy who works for the Department
of Defense, and he has told you that we are about to go to war in the Middle East.
Then the chance of the crude oil price going up 40% is predicted by
P(O₄₀|W) = P(W ∩ O₄₀)/P(W) = 0.00004/0.00005 = 0.8. With this predictive
probability, you might want to invest your poker winnings in crude oil for a
month or so.
Now, by multiplying both sides of P(D|E) = P(D ∩ E)/P(E) by P(E) and swapping
the left and right sides, you get P(D ∩ E) = P(E)*P(D|E). Some mathematicians
and statisticians define the conditional probability of D given E to be any
probability P(D|E) that satisfies this last equation, because it is equivalent to the
original definition but does not require any condition such as E ≠ φ (the simpler,
the better). Indeed, the last equation appears often in practice.
An example of P(D ∩ E) = P(E)P(D|E) is the following. You draw two cards from a
well-shuffled deck of 52 playing cards without replacement (that is, the first card
drawn stays on the table and does not go back into the deck for the second draw).
You are interested in the probability that the first card drawn is a black card, B₁,
and the second card drawn is also a black card, B₂. That is, P(B₁ ∩ B₂) is of
interest. This probability is computed as (26/52)*(25/51) = 25/102. The first
factor, 26/52, is P(B₁), and the second factor, 25/51, is P(B₂|B₁): once a black
card has been removed, 25 of the remaining 51 cards are black. That is,
P(B₁ ∩ B₂) = P(B₁)*P(B₂|B₁).
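If you want to double-check 25/102, here is a small Python sketch that computes
the exact value with the multiplication rule and then approximates the same
probability by simulation (the seed and the number of trials are arbitrary
choices of ours):

    from fractions import Fraction
    import random

    # Exact value from the multiplication rule: P(B1)*P(B2|B1)
    exact = Fraction(26, 52) * Fraction(25, 51)
    print(exact)                          # 25/102

    # Rough Monte Carlo check of the same probability
    deck = ["black"] * 26 + ["red"] * 26
    trials = 100_000
    hits = 0
    random.seed(0)
    for _ in range(trials):
        c1, c2 = random.sample(deck, 2)   # two draws without replacement
        hits += (c1 == "black" and c2 == "black")
    print(hits / trials)                  # close to 25/102 ≈ 0.245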
By the way, can you find the probability that the first card drawn is a heart and
the second card drawn is a red card?
Conditional probabilities are used in Bayes' Theorem and Bayesian statistics.
If you are interested in them, you can find information about them at
http://en.wikipedia.org/wiki/Bayes'_theorem
http://en.wikipedia.org/wiki/Category:Bayesian_statistics
http://en.wikipedia.org/wiki/Bayesian_inference
Independent Events:
If the probability of one event remains the same before and after another event
happens, then the second event has no effect on the first event (in terms of its
probability). Such events are called independent events. That is, A and B are
independent events if and only if P(A|B) = P(A) or P(B|A) = P(B). By the way,
"if and only if" means a two-way implication.
If A is independent of B, that is, P(A|B) = P(A), then P(A|B) = P(A ∩ B)/P(B) =
P(A). So P(A ∩ B)/P(B) = P(A), which is equivalent to P(A ∩ B) = P(A)*P(B). Thus,
if P(A|B) = P(A), then P(A ∩ B) = P(A)*P(B). Now, this means P(B|A) =
P(A ∩ B)/P(A) = P(B)*P(A)/P(A) = P(B), so we have P(B|A) = P(B) as well. This
shows that B is independent of A if and only if A is independent of B. So, we have
the following.
A and B are independent of each other if and only if P(A|B) = P(A) or
P(B|A) = P(B).
And, equivalently,
A and B are independent of each other if and only if P(A ∩ B) = P(A)*P(B).
Let us have some examples of independent (and dependent) events. Recall the
example of drawing two cards from a well-shuffled deck of 52 cards. If the
drawing is performed with replacement (that is, the first card drawn is shuffled
back into the deck before the second draw), then the probability that both cards
drawn are black is (26/52)*(26/52) = 1/4. That is, P(B₁ ∩ B₂) = P(B₁)*P(B₂). So B₁
and B₂ are independent events when the drawing is conducted with replacement,
while they are not independent events when the drawing is conducted without
replacement.
If a balanced die is rolled and a fair coin is tossed, then the events concerning the
number of dots on the rolled die and the events concerning the side showing up
on the tossed coin are independent of each other. For instance, let A be the
event that the number of dots on the die is greater than 4 and H be the event
that the side up is heads. Then P(A) = 2/6 = 1/3, P(H) = 1/2, and P(A ∩ H) =
2/12 = 1/6. P(A)*P(H) = (1/3)*(1/2) = 1/6 = P(A ∩ H). That is, A and H are
independent events. This independence comes from the physical and
operational independence of the die and the coin.
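Here is the same check carried out in Python on the 12-point product sample
space (an illustrative sketch; the names are ours):

    from fractions import Fraction
    from itertools import product

    S = set(product([1, 2, 3, 4, 5, 6], ["H", "T"]))   # 12 equally likely pairs
    def P(event):
        return Fraction(len(event), len(S))

    A = {s for s in S if s[0] > 4}      # more than 4 dots on the die
    H = {s for s in S if s[1] == "H"}   # heads on the coin
    assert P(A & H) == P(A) * P(H)      # 1/6 = 1/3 * 1/2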
In the example with the murder of your mother-in-law, A and C are not
independent events, since P(A ∩ C) = 0.07 is different from P(A)*P(C) =
0.08*0.10 = 0.008. In the example with the crude oil price, the two events O₄₀
and W are not independent events, because P(O₄₀|W) = 0.8 and P(O₄₀) = 0.0001
are different from each other.
Finally, there are some interesting questions. If A and B are independent, are A
and Bᶜ independent? If A and B are independent, are Aᶜ and B independent? If A
and B are independent, are Aᶜ and Bᶜ independent? You should be able to find
the answers by checking whether the probabilities of the intersections can be
factored into the individual probabilities or not. For instance, to answer whether
A and Bᶜ are independent, you need to check whether P(A ∩ Bᶜ) is the same as
P(A)P(Bᶜ), using P(A ∩ B) = P(A)P(B). Venn diagrams are very helpful for this.
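As one concrete (numerical, not general) check, note that the rolled-die sets
A = {2, 4, 6} and B = {5, 6} happen to be independent, since P(A ∩ B) = 1/6 =
P(A)*P(B); the Python sketch below verifies that A and Bᶜ come out independent
as well. Of course, checking one example is not a proof; the general argument
uses P(A ∩ Bᶜ) = P(A) − P(A ∩ B).

    from fractions import Fraction

    S = {1, 2, 3, 4, 5, 6}
    def P(event):
        return Fraction(len(event), len(S))

    A, B = {2, 4, 6}, {5, 6}
    Bc = S - B
    assert P(A & B) == P(A) * P(B)      # A and B independent: 1/6 = 1/2 * 1/3
    assert P(A & Bc) == P(A) * P(Bc)    # A and Bc independent: 1/3 = 1/2 * 2/3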
© Copyrighted by Michael Greenwich, 08/2012.