Conditional Probability
STA 281 Fall 2011
1 Definition
Often we are only interested in particular rows or columns of a probability table. Consider the
newspaper example, and the question “Of those that receive the morning paper, what proportion receive
the evening paper?” This question does not concern the entire population of households; it only
concerns those who receive a morning paper. Probabilities that refer only to subsets of the population
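are called conditional probabilities. Recall the probability table we constructed, where M is the event “receives the morning paper” and E is the event “receives the evening paper”:

         M       Mc      Total
E        0.10    0.20    0.30
Ec       0.50    0.20    0.70
Total    0.60    0.40    1.00
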
The question asked concerns only those who receive a morning paper, which is 60% of the entire
population. We want to know, out of that 60%, what proportion receive the evening paper. To provide
an intuitive feel for how this question is answered, suppose we sampled 100 people from the population.
On average 60 of those people would receive the morning paper. Looking at the M column of the table,
we see that of those 60, on average 50 receive only the morning paper while 10 receive both. So 10 of
the 60 who receive the morning paper also receive the evening paper, so the conditional probability is
10/60=1/6.
Usually we don’t go through the argument about sampling a set of people; we just divide the
probabilities directly. Of the 60% of people who receive a morning paper, 50% (of the whole population) receive
only the morning paper and 10% also receive the evening paper. So 0.10/0.60=1/6 is the conditional
probability of receiving an evening paper given one receives a morning paper.
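The division above can be sketched in a few lines of Python. This is a minimal illustration, assuming the table is stored as a dictionary keyed by (receives morning paper, receives evening paper); the names `table` and `prob` are hypothetical helpers, not part of these notes. `Fraction` keeps the arithmetic exact, so the answer comes out as 1/6 rather than a rounded decimal.

```python
from fractions import Fraction as F

# Newspaper table, keyed by (receives morning paper, receives evening paper).
table = {
    (True, True): F(10, 100),    # both papers
    (True, False): F(50, 100),   # morning paper only
    (False, True): F(20, 100),   # evening paper only
    (False, False): F(20, 100),  # neither paper
}


def prob(event):
    """Add up the cells of the table where event(morning, evening) is True."""
    return sum(p for (m, e), p in table.items() if event(m, e))


p_m = prob(lambda m, e: m)              # P(M): size of the subset of interest
p_m_and_e = prob(lambda m, e: m and e)  # P(M and E): in the subset and has the property
p_e_given_m = p_m_and_e / p_m           # P(E|M)
print(p_e_given_m)  # 1/6
```

Restricting to the cells where `m` is true is exactly "looking at the M column" of the table.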
Mathematically, a conditional probability has two parts. First, a conditional probability asks only
about a subset of the population, not the entire population. Second, a conditional probability asks about some
property of that subset. In our example question, the subset of interest was those who receive the
morning paper, while the property we were interested in was receiving an evening paper. In general, we
have a question: “Of those who are in subset A, what is the probability they are in B?” This question is
translated into mathematical symbols as $P(B|A)$, which is read “the probability of B given A.”
Notice how we solved the problem. First, we found which of the individuals were in the subset of
interest. This involved finding $P(A)$, the unconditional probability of the subset. Then, within that
subset, we found how many individuals had the property we were interested in. The result was

$$P(B|A) = \frac{\text{people in subset A with property B}}{\text{people in subset A}}.$$

Instead of writing this fraction in words, we can use symbols. For the denominator, the “people in
subset A” refers to $P(A)$. For the numerator, the people must be in subset A, but they must also have
property B. Since both criteria must be satisfied, the numerator is $P(A \cap B)$, resulting in

$$P(B|A) = \frac{P(A \cap B)}{P(A)}.$$

Mathematically, this formula is the definition of conditional probability.
Rearranging the terms of this definition immediately gives what is called the intersection rule

$$P(A \cap B) = P(A)\,P(B|A).$$

Similarly, since just switching the roles of A and B in the definition of conditional probability yields
$P(A|B) = P(B \cap A)/P(B)$, we find

$$P(B \cap A) = P(B)\,P(A|B).$$

Since $P(A \cap B)$ is the same as $P(B \cap A)$, the two previous equations provide two ways of finding the
probability of an intersection. Simply use whichever conditional probability is more convenient.
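As a quick numerical check of the two factorizations, the Python sketch below uses the newspaper table (M = morning paper, E = evening paper). It computes $P(E|M)$ and $P(M|E)$ from the definition and confirms that both versions of the intersection rule return the same $P(M \cap E)$. The variable names are illustrative only.

```python
from fractions import Fraction as F

# Probabilities read off the newspaper table.
p_m = F(60, 100)        # P(M)
p_e = F(30, 100)        # P(E)
p_m_and_e = F(10, 100)  # P(M and E)

# Conditional probabilities from the definition P(B|A) = P(A and B) / P(A).
p_e_given_m = p_m_and_e / p_m  # P(E|M) = 1/6
p_m_given_e = p_m_and_e / p_e  # P(M|E) = 1/3

# Intersection rule, conditioning first on M and then on E.
via_m = p_m * p_e_given_m  # P(M) P(E|M)
via_e = p_e * p_m_given_e  # P(E) P(M|E)
print(via_m, via_e)  # 1/10 1/10
```

The two conditional probabilities differ (1/6 versus 1/3), yet each, multiplied by the probability of the event it conditions on, recovers the same intersection.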
2 Recognizing Conditional Probabilities
Remember the key point about conditional probability is that we are only interested in a subset of the
population. The first step in evaluating a conditional probability is determining which outcomes we are
interested in. In our example we were interested in people who received a morning paper. The second
step is to identify the property of interest (in our example it was receiving an evening paper). After that
you use the definition of conditional probability $P(B|A) = P(A \cap B)/P(A)$.
Whenever you see a probability stated or are asked for a probability, ask yourself two questions.
First, “Who is this statement about?” In a conditional probability we are only interested in a subset of
the population. It is important to determine which subset as soon as possible so we can proceed.
Second, “What are they asking about?” That is, what property of the subset is the question or statement
about.
For example, suppose our population is all registered voters. Contrast the two questions: “What
proportion of women are Democrats?” and “What proportion of voters are women Democrats?” The
first question does not ask about all registered voters; it only asks about women. Therefore it is a
conditional probability. After deciding it only asks about women, we must then determine what exactly
it wants to know about women. In this example, the property of interest is being a Democrat. If W is
the event “voter is female” and D is the event “voter is a Democrat”, the first question asks for
$P(D|W)$.
The second question does not place any restriction on the population, since it asks for a proportion of
all voters. The property it is interested in is whether a voter is a Democratic woman. Thus the second
question is asking for $P(W \cap D)$. These are separate questions, so you must recognize which one you
are being asked.
Of course, in any language there are multiple ways to ask the same question. The following
questions are equivalent; all ask for the probability of receiving an evening paper in the set of outcomes
where a morning paper was received.

- What proportion of those that receive a morning paper receive an evening paper?
- Given a person receives a morning paper, what is the probability they receive an evening paper?
- If someone receives a morning paper, what is the probability they receive an evening paper?

There are a number of ways to report the result $P(E|M)=1/6$ as well.

- 1/6 of those that receive a morning paper receive an evening paper.
- Given someone receives a morning paper, there is a 1/6 probability of receiving an evening paper.
- If someone receives a morning paper, there is a 1/6 probability of receiving an evening paper.

The probability table allows us to compute conditional probabilities fairly easily, since we only need to
work within one row or column of the table. More complicated conditional probabilities may be
computed as well. For example, given that someone receives at least one of the papers, what is the
probability they receive at most one of the papers? This is a conditional probability since we are
interested only in those who receive at least one paper, not everyone. This subset of individuals
includes 3 cells of our table ($M \cap E$, $M \cap E^C$, and $M^C \cap E$), for a total of 0.10+0.50+0.20=0.80, or 80% of the population.
Within that subset, we are interested in the property “receive at most one of the papers”. As with
all conditional probabilities, we look at the individuals within the subset and try to determine which
obey the property. Looking only at the cells which compose the subset, we find that only the cells
$M \cap E^C$ and $M^C \cap E$ satisfy the property. Those two cells total 0.50+0.20=0.70 of the overall probability. The
conditional probability is thus 0.70/0.80=7/8.
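The same "restrict to the subset, then divide" recipe works for any pair of events defined on the table. A minimal Python sketch, assuming the table is a dictionary keyed by (receives morning paper, receives evening paper); `prob`, `cond_prob`, `at_least_one`, and `at_most_one` are hypothetical helper names chosen for this illustration.

```python
from fractions import Fraction as F

# Newspaper table, keyed by (receives morning paper, receives evening paper).
table = {
    (True, True): F(10, 100),
    (True, False): F(50, 100),
    (False, True): F(20, 100),
    (False, False): F(20, 100),
}


def prob(event):
    """Total probability of the cells where event(morning, evening) is True."""
    return sum(p for (m, e), p in table.items() if event(m, e))


def cond_prob(event, given):
    """P(event | given): keep only the `given` cells, then divide by their total."""
    return prob(lambda m, e: event(m, e) and given(m, e)) / prob(given)


def at_least_one(m, e):
    return m or e


def at_most_one(m, e):
    return not (m and e)


print(cond_prob(at_most_one, at_least_one))  # 7/8
```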
3 Using Conditional Probabilities
The definition of conditional probability is $P(B|A) = P(A \cap B)/P(A)$. We previously derived the
intersection rule $P(A \cap B) = P(A)\,P(B|A)$. This formula allows us to take conditional probabilities as
given information and complete a probability table.
Suppose in a small university, students may work in the dorms, the library, both, or neither. 70% of the students do
some work in the library while 10% work only in the dorms. Of those who work in the library, 30% also
work in the dorms. Use this information to construct a probability table.
Since each student may work or not work in the library, and may work or not work in the dorms, we
may construct a probability table.

             Library   Not library   Total
Dorms
Not dorms
Total                                1.00

We are given the information that 70% of the students work in the library. Since this information was
given without any reference to whether or not those students work in the dorms, it is therefore placed in
the margins of the table. We are also given that 10% work only in the dorms. This is one of the 4 core
cells of the table. Before working with the conditional probability, we may use some arithmetic to fill in
some of the table.

             Library   Not library   Total
Dorms                  0.10
Not dorms              0.20
Total        0.70      0.30          1.00

To complete the table, we must use the conditional probability given in the problem. We are
given the information that of those who work in the library, 30% work in the dorms. This is a
conditional probability, not the proportion that work in both. We are conditioning on people who work
in the library (70% of the students). We are given that 30% of those 70% also work in the dorm. 30% of
70% may be found by multiplying the probabilities, so (0.30)(0.70)=0.21 work in the dorm and the
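library. Mathematically, letting L be the event “works in the library” and D the event “works in the
dorms”, we have been given $P(D|L)=0.30$, and we have found
$P(D \cap L) = P(L)\,P(D|L) = (0.70)(0.30) = 0.21$. We may complete the table.

             Library   Not library   Total
Dorms        0.21      0.10          0.31
Not dorms    0.49      0.20          0.69
Total        0.70      0.30          1.00
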
Since we have computed the probabilities of all outcomes, we may compute any probabilities from
the table.
- What is the probability a student works only in the library? (0.49)
- What is the probability a student works in the dorms? (0.31)
- What is the probability a student works in either the dorms or the library? (0.21+0.49+0.10=0.80)
- What is the probability a student works in neither the library nor the dorms? (0.20)
- What is the probability a student who does not work in the library works in the dorms? (0.10/0.30=1/3)
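The table-filling steps for the dorm and library example can be written out as a short Python sketch. Here L stands for "works in the library" and D for "works in the dorms"; the variable names are illustrative assumptions. Each cell is derived with either the complement rule, subtraction within a column, or the intersection rule, mirroring the reasoning above.

```python
from fractions import Fraction as F

# Given information (L = works in library, D = works in dorms).
p_l = F(70, 100)            # P(L), a margin of the table
p_d_and_not_l = F(10, 100)  # P(D and Lc): works only in the dorms
p_d_given_l = F(30, 100)    # P(D|L), a conditional probability

# Fill in the table one cell at a time.
p_not_l = 1 - p_l                            # complement rule: 0.30
p_not_d_and_not_l = p_not_l - p_d_and_not_l  # rest of the Lc column: 0.20
p_d_and_l = p_l * p_d_given_l                # intersection rule: 0.21
p_not_d_and_l = p_l - p_d_and_l              # rest of the L column: 0.49

# Answers to the questions above.
only_library = p_not_d_and_l
dorms = p_d_and_l + p_d_and_not_l
either = p_d_and_l + p_not_d_and_l + p_d_and_not_l
neither = p_not_d_and_not_l
dorms_given_not_library = p_d_and_not_l / p_not_l

for answer in (only_library, dorms, either, neither, dorms_given_not_library):
    print(answer)
```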
4 More difficult problems
In the dorm and library example, the conditional probability provided allowed the direct computation of
a cell probability. Problems involving conditional probabilities may be more difficult.
4.1 Applicants Example
Suppose a company is looking at applicants for a position. The position requires (A) experience and (B)
a master’s degree. Suppose that 90% of the applicants have at least one of (A) or (B). Suppose further
that, of those applicants with at least one of (A) or (B), 50% have both. Of those applicants with exactly
one of (A) or (B), 2/3 have (A).
We can construct a probability table and fill in one of the cells directly. Since 90% of the applicants
have at least one of (A) or (B), we may determine by the complement rule that 10% have neither. This
results in the table

         B       Bc      Total
A
Ac               0.10
Total                    1.00

To fill in the remainder of the table, we have to use conditional probabilities. We are given that 50% of
the people with at least one have both. This probability involves the subset “with at least one”. This
subset contains 90% of the applicants. Of this 90%, we are given 50% have both. Thus 50% of the 90%,
or 45%, have both.
The remaining cells must be found by utilizing that 2/3 of the applicants with exactly one of (A) or
(B) have (A). Although we are not given the proportion of individuals with exactly one directly, we can
derive it to be 1-0.45-0.10=0.45. Since 2/3 of those 45% have (A), we conclude
$P(A \cap B^C) = (2/3)(0.45) = 0.30$ and $P(A^C \cap B) = (1/3)(0.45) = 0.15$, and
complete the table through arithmetic.
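The applicants example reduces to a chain of multiplications and subtractions. The Python sketch below is one way to lay it out, assuming the same event names (A = experience, B = master's degree); the variable names are hypothetical. It also computes the margins of the completed table, which the text leaves to arithmetic.

```python
from fractions import Fraction as F

# Given information for the applicants example.
p_at_least_one = F(90, 100)          # P(A or B)
p_both_given_at_least_one = F(1, 2)  # P(A and B | A or B)
p_a_given_exactly_one = F(2, 3)      # P(A | exactly one of A, B)

p_neither = 1 - p_at_least_one                       # complement rule: 0.10
p_both = p_at_least_one * p_both_given_at_least_one  # 50% of 90%: 0.45
p_exactly_one = 1 - p_both - p_neither               # 0.45
p_a_only = p_exactly_one * p_a_given_exactly_one     # P(A and Bc) = 0.30
p_b_only = p_exactly_one - p_a_only                  # P(Ac and B) = 0.15

# Margins of the completed table.
p_a = p_both + p_a_only  # P(A) = 0.75
p_b = p_both + p_b_only  # P(B) = 0.60
print(p_both, p_a_only, p_b_only, p_neither, p_a, p_b)
```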
4.2 Professors and their Stories Example
Here is another example. Suppose that 10% of professors can think of decent stories to go with their
exam problems. Of those that can, 60% can think of funny stories. Suppose also that, of those
professors whose stories are at least one of funny/decent, 75% of the professors write funny stories.
The first two probabilities allow us to complete part of the table. Let D be the event “can think of
decent stories” and F the event “can think of funny stories”.

         F       Fc      Total
D        0.06    0.04    0.10
Dc                       0.90
Total                    1.00

To fill in the remaining entries, we must use the information that “of those professors whose stories are
at least one of funny/decent, 75% of the professors write funny stories”. Unfortunately, with the table
so far we cannot compute the probability of being at least one of funny or decent, because we are
missing the cell $F \cap D^C$. Let’s just fill in this unknown quantity with x and see if we can make
progress in solving for x. In terms of the unknown x, the proportion of professors whose stories are at
least one of funny or decent is 0.06+0.04+x=0.10+x. The funny ones within that subset make up 0.06+x, so we are given

$$\frac{0.06 + x}{0.10 + x} = 0.75.$$

Solving, 0.06 + x = 0.075 + 0.75x, so 0.25x = 0.015 and x=0.06, which lets us complete the table.

         F       Fc      Total
D        0.06    0.04    0.10
Dc       0.06    0.84    0.90
Total    0.12    0.88    1.00

As with all probability tables, once the table is completed we may compute any probability.

- What proportion of professors write both decent and funny stories? (0.06)
- Of the professors with funny stories, what proportion write decent stories? (0.06/0.12=0.5)
- Of those professors whose stories are exactly one of funny or decent, what proportion write funny stories? (0.06/0.10=0.6)
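Because the unknown cell enters the given conditional probability linearly, it can be solved in closed form: from $(a + x)/(d + x) = c$ we get $x = (c\,d - a)/(1 - c)$. The Python sketch below applies that rearrangement to the professors example; the variable names are illustrative, and the closed form is just the algebra above written out, not a general-purpose solver.

```python
from fractions import Fraction

# Known cells (D = decent stories, F = funny stories).
p_decent_and_funny = Fraction(6, 100)  # P(D and F)
p_decent_not_funny = Fraction(4, 100)  # P(D and Fc)
c = Fraction(75, 100)                  # P(F | D or F)

# Unknown cell x = P(Dc and F). The condition
#   (p_decent_and_funny + x) / (p_decent + x) = c
# is linear in x, so rearrange it to x = (c * p_decent - p_decent_and_funny) / (1 - c).
p_decent = p_decent_and_funny + p_decent_not_funny  # 0.10
x = (c * p_decent - p_decent_and_funny) / (1 - c)   # 0.06

# Complete the table and answer the questions.
p_neither = 1 - p_decent - x                       # 0.84
p_funny = p_decent_and_funny + x                   # 0.12
decent_given_funny = p_decent_and_funny / p_funny  # 0.5
exactly_one = p_decent_not_funny + x               # 0.10
funny_given_exactly_one = x / exactly_one          # 0.6
print(x, p_neither, decent_given_funny, funny_given_exactly_one)
```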
5 Independence
If $P(A|B)$ is different than $P(A|B^C)$, then A and B are related. Return to the newspaper example. We
find $P(E|M)=1/6$ while $P(E|M^C)=1/2$. People who receive the morning paper are less likely to receive
the evening paper than people who do not receive the morning paper. Since the probability the event E
occurs depends on whether or not M occurred, we call these events dependent. There are many events
of this form. Any events with a causal relationship, for example, will be of this form. Individuals with a
Ph.D. are much more likely to get a faculty position at a research university than individuals who do not
have a Ph.D., the simple reason being a Ph.D. is a requirement for such a faculty position. Indirect links
between variables also create dependence. An elementary school student with large feet is more likely
to read well. Foot size and reading ability have no causal link, but children with big feet tend to be
older, and older children tend to read better than younger children. All of these types of relationships
between variables result in dependent events. Much of science is concerned with dependence. In
medicine, for example, we would like to know whether administering a treatment to a patient increases
the patient’s chance of recovery.
Occasionally two variables are unrelated. A coin has no memory. If you flip a fair coin twice, the
result of the second flip has nothing to do with the result of the first flip. The probability of heads is
always 0.5. If $H_1$ is the event “heads on the first flip” and $H_2$ is the event “heads on the second flip”, then
$P(H_2|H_1)=0.5$ and $P(H_2|H_1^C)=0.5$. They are the same probability since the result of the first flip does
not affect the probability of heads on the second flip.
Events A and B where $P(A|B)=P(A|B^C)$ are called independent events. Essentially, whether or
not B occurred does not affect the probability A occurred. The purpose of this section is to derive some
special properties of independent events. With the information $P(A|B)=P(A|B^C)$, we can derive some
equivalent definitions of independence. Recall the Law of Total Probability

$$P(A) = P(A \cap B) + P(A \cap B^C).$$

Using the rule for intersections,

$$P(A) = P(B)\,P(A|B) + P(B^C)\,P(A|B^C).$$

Now using the assumed information $P(A|B^C) = P(A|B)$,

$$P(A) = P(B)\,P(A|B) + P(B^C)\,P(A|B) = P(A|B)\left[P(B) + P(B^C)\right] = P(A|B).$$

This equivalent definition of independence states that the overall probability of A, $P(A)$, is the same as
the conditional probability of A given B, $P(A|B)$. If you know the overall probability of A, knowing
whether or not B occurred does not change that probability. Another equivalent definition may be
found by using the rule for intersections

$$P(A \cap B) = P(B)\,P(A|B) = P(B)\,P(A).$$

This third definition is the one used for mathematical purposes. The reason is that the conditional
probability is a fraction, and thus there is the possibility of dividing by zero. The equation above
presents no such problem.
If P(A) and P(B) are both greater than 0, then all three definitions are equivalent. However, if
P(A)=0 or P(B)=0, then the first two definitions might result in division by zero problems. To verify
independence in this course, you must show the third definition, that $P(A \cap B) = P(A)\,P(B)$.
For example, reconsider the newspaper example. If you were asked “Are events M and E
independent?” you should check

$$P(M \cap E) = 0.10 \quad\text{versus}\quad P(M)\,P(E) = (0.60)(0.30) = 0.18.$$

Since the equality required for independence does not hold, the events are dependent, not independent.
Alternatively, suppose a probability table has

         B       Bc      Total
A        0.08    0.32    0.40
Ac       0.12    0.48    0.60
Total    0.20    0.80    1.00

In this example, $P(A \cap B) = 0.08 = (0.40)(0.20) = P(A)\,P(B)$, and thus the events are independent.
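The product definition translates directly into a check with no division, which is why it still works when a probability is zero. A minimal Python sketch, where `is_independent` is a hypothetical helper name; it uses exact `Fraction` values because floating-point products such as `0.4 * 0.2` may not compare exactly equal to `0.08` (with floats, a tolerance check such as `math.isclose` would be the safer choice).

```python
from fractions import Fraction as F


def is_independent(p_a_and_b, p_a, p_b):
    """Product definition of independence: P(A and B) == P(A) P(B).

    No division is involved, so this still works when P(A) or P(B) is 0.
    """
    return p_a_and_b == p_a * p_b


# Newspaper example: P(M and E) = 0.10, P(M) = 0.60, P(E) = 0.30.
newspaper = is_independent(F(10, 100), F(60, 100), F(30, 100))
# Table above: P(A and B) = 0.08, P(A) = 0.40, P(B) = 0.20.
second_table = is_independent(F(8, 100), F(40, 100), F(20, 100))
print(newspaper, second_table)  # False True
```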
6 The Relationship between Disjoint and Independent
The short answer is that there is no relationship between two disjoint events and two independent
events. They are separate definitions, and they are useful in different scenarios.
Disjoint events are defined as events A and B such that $A \cap B = \emptyset$. Disjoint events are useful in
that they simplify the rule for unions. In general, the rule for unions states

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

If A and B are disjoint, then $P(A \cap B) = P(\emptyset) = 0$, and thus the union rule simplifies to the third axiom

$$P(A \cup B) = P(A) + P(B).$$

Remember, you need the disjoint assumption to make the simplification (technically you only need
$P(A \cap B) = 0$, which you can derive from the disjoint assumption).
Independence is defined as $P(A \cap B) = P(A)\,P(B)$, and thus results in a simplification of the rule for
intersections. In general,

$$P(A \cap B) = P(A)\,P(B|A).$$

If A and B are independent, this simplifies to

$$P(A \cap B) = P(A)\,P(B).$$

Remember you have to have the independence assumption to make this simplification.
Independent events are sometimes disjoint and sometimes not, while disjoint events are sometimes
independent and sometimes not. You have to know your purpose and check the appropriate definition.
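To see that the two definitions really are separate checks, the Python sketch below tests both on a few events for one roll of a fair six-sided die. The die and the events `low`, `middle`, `even`, and `empty` are hypothetical examples chosen for this illustration, not taken from the notes. Disjointness is a statement about sets; independence is a statement about probabilities.

```python
from fractions import Fraction


def prob(event):
    """Probability of an event (a set of faces) for one roll of a fair die."""
    return Fraction(len(event), 6)


def disjoint(a, b):
    return not (a & b)  # no face in common


def independent(a, b):
    return prob(a & b) == prob(a) * prob(b)


low = {1, 2}
middle = {3, 4}
even = {2, 4, 6}
empty = set()

# Disjoint but dependent: P(low and middle) = 0, yet P(low) P(middle) = 1/9.
print(disjoint(low, middle), independent(low, middle))  # True False
# Independent but not disjoint: P(low and even) = 1/6 = (1/3)(1/2).
print(disjoint(low, even), independent(low, even))      # False True
# Both: the empty event is disjoint from, and independent of, every event.
print(disjoint(empty, even), independent(empty, even))  # True True
```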
7 Bayes Rule
A conditional probability concerns a subset of the population. Within that subset, all the axioms of
probability apply. For example, all conditional probabilities have to be nonnegative, just like axiom 2.
Theorems such as the complement rule also still apply. Thus, we can prove theorems like

$$P(A^C|B) = 1 - P(A|B).$$

Suppose a university is interested in graduation rates between students who have off-campus jobs and
students who do not. Let G be the event “graduates in 6 years or less” and W be the event “works off-campus”. If you are given the information that 60% of the students with off-campus jobs graduate in 6
years, then $P(G|W)=0.60$. You can also conclude that 40% of the students with off-campus jobs do
NOT graduate. In symbols, you can conclude $P(G^C|W)=0.40$. In both conditional probabilities, you are
conditioning on the same group of people, students with off-campus jobs. If 60% of those students
graduate, the other 40% of those students do not.
Unfortunately, many people make the mistake of doing calculations based on different populations,
such as concluding, incorrectly, that $P(A|B)=1-P(A|B^C)$. In our example, this would be the same as
concluding that since 60% of the students with off-campus jobs graduate, then only 40% of the students
without off-campus jobs graduate. But this conclusion is unwarranted. Just because 60% of the students
with off-campus jobs graduate doesn’t say anything about the students without off-campus jobs. They
are separate groups of people, and can have separate, unrelated probabilities. It’s possible all students
without off-campus jobs graduate. It is also possible none of them do, or anywhere in between.
Another common mistake is to assume P(A|B)=1-P(B|A). In our example, this would be the same
as using the information that 60% of students with off-campus jobs graduate to conclude that 40% of
students who graduate have off-campus jobs. This conclusion is unwarranted as well. To make this
more obvious, let F be the event a person is female and P be the event the person is pregnant. Suppose at
any given time that 2% of women are pregnant. This says P(P|F)=0.02. You cannot then conclude
that 98% of pregnant people are female.
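There actually is a relationship between P(A|B) and P(B|A), but it is more complicated. Note by
definition

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

By the law of total probability $P(B) = P(A \cap B) + P(A^C \cap B)$, so

$$P(A|B) = \frac{P(A \cap B)}{P(A \cap B) + P(A^C \cap B)}.$$

We can then use the rule for intersections repeatedly to show

$$P(A|B) = \frac{P(A)\,P(B|A)}{P(A)\,P(B|A) + P(A^C)\,P(B|A^C)}.$$
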
This last equation is a simplified version of Bayes rule, which is vital in many scientific applications.
Typically the event A is that some hypothesis is true, and thus $A^C$ is the event the hypothesis is false.
The event B corresponds to some piece of data being observed. By comparing the relative likelihoods A
and $A^C$ assign to B (i.e., the probability of observing the data when the hypothesis is true, $P(B|A)$,
versus the probability of observing the data when the hypothesis is false, $P(B|A^C)$), you can compute how observing the data
B affects your belief in the hypothesis A.
Note that Bayes rule includes the probabilities $P(A)$ and $P(A^C)$. This is useful in situations such as
medical diagnostic testing, where it is worthwhile to incorporate relative rates of disease into the
calculations. If an exotic, rare disease produces a particular symptom but a common disease also
produces the symptom, when faced with the symptom the more likely outcome is that the common
disease is present.
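The simplified Bayes rule can be sketched as a small Python function. This is a minimal illustration under the assumption that the three inputs $P(A)$, $P(B|A)$, and $P(B|A^C)$ are known; `bayes` is a hypothetical helper name. As a check, it uses the newspaper example with A = M and B = E, where the table gives $P(M|E) = 0.10/0.30 = 1/3$ directly.

```python
from fractions import Fraction as F


def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) from P(A), P(B|A) and P(B|A^C) using the simplified Bayes rule."""
    numerator = p_a * p_b_given_a
    # Denominator is P(B), expanded with the law of total probability.
    denominator = numerator + (1 - p_a) * p_b_given_not_a
    return numerator / denominator


# Newspaper example with A = M (morning paper) and B = E (evening paper):
# P(M) = 0.60, P(E|M) = 1/6, P(E|M^C) = 1/2.
p_m_given_e = bayes(F(60, 100), F(1, 6), F(1, 2))
print(p_m_given_e)  # 1/3, matching 0.10 / 0.30 from the table
```

Note how the prior $P(A)$ enters both the numerator and the denominator; this is the term that lets base rates, such as how common a disease is, influence the answer.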
In this course, Bayes rule is typically not computed symbolically, but arises naturally as you fill in a
probability table. Conditional probabilities allow you to compute cells of the table, and after the table is
completed you can compute whatever probability you wish.
8 Sample Questions
The following questions are divided into three levels of difficulty (all relative). The “easiest” problems
contain no conditional probabilities in the given information, but do ask you to compute conditional
probabilities from the table. The “medium” problems have conditional probabilities in the given
information as well, so you must use the conditional probabilities to construct the table. The “hardest”
problems often require some type of algebra to construct the probability table.
1) (Easiest) In a study of 3756 court cases, Kalven and Zeisel (1966) recorded the jury panel’s
decision. In addition, they separately asked the judge how he or she would have decided the same
case if there were no jury panel. They found that: the judge would not have convicted in 17% of the
cases; both the judge and the jury would have convicted in 64% of the cases; and the judge and jury
disagreed on whether to convict in 22% of the cases.
a) Construct a probability table.
b) What is the probability that neither the judge nor the jury would convict?
c) Given the panel convicts, what is the probability the judge would also convict?
d) Are the events “judge convicts” and “jury panel convicts” disjoint? Why or why not?
2) (Easiest) Suppose in a particular company, employees use workstations or PCs (could be both or
neither). The probability an employee uses a PC is 0.95 and the probability an employee uses a
workstation is 0.20. The probability an employee uses both is 0.19.
a) Construct a probability table.
b) What is the probability an employee who does not use a PC uses a workstation?
c) What is the probability an employee uses exactly one of the machines?
d) Given an employee uses at least one of the machines, what is the probability they use a
workstation?
3) (Easiest) A muscle cell has 2 sites where electricity can conduct into the cell. Every time the
body intends to stimulate the muscle cell, an attempt is made to channel electricity through each of
the 2 sites. The probability both sites conduct electricity is 0.4. The probability that exactly one
site conducts electricity is 0.4. Finally, the probability site 1 conducts electricity is 0.7.
a) Construct a probability table.
b) What is the probability site 2 conducts electricity?
c) Given at least one of the sites conducts electricity, what is the probability both sites conduct
electricity?
4) (Medium) Suppose a professor only writes two types of exams, “easy” or “hard”. Suppose that 90%
of the exams are hard. There is an 80% chance that the first question on a hard exam will be
difficult, and a 15% chance that the first question on an easy exam will be difficult.
a) Construct a probability table.
b) On a given exam, what is the probability it is a hard exam that contains a non-difficult first
question?
c) On a given exam, what is the probability that the first question will be non-difficult?
d) Suppose the first question on a given exam is non-difficult. Given this information, what is the
probability the exam is hard?
e) Are the events “easy exam” and “difficult first question” independent? Why?
5) (Medium) Suppose at a particular park visitors can hike or raft. The probability that someone will
raft is 0.40. The probability that someone won’t hike is 0.10. Given someone hikes, the probability
they raft is 0.40.
a) Construct a probability table.
b) What is the probability of rafting?
c) What is the probability of not hiking?
d) What is the probability of someone participating in exactly one?
e) What is the probability of someone participating in at least one?
f) Of those who do not participate in both, what proportion participate in neither?
6) (Medium) Suppose that 10% of all cases go to trial. Of those that go to trial, the defendant is
found guilty in 95% of the cases. Of those that do not go to trial, the defendant is found guilty
(through a plea bargain) in 40% of the cases.
a) Construct a probability table.
b) What proportion of defendants are found guilty?
c) Given a defendant is found guilty, what is the probability their case went to trial?
d) What proportion of cases that go to trial do not result in a guilty verdict?
7) (Hardest) Investment advisors might subscribe to the Wall Street Journal or Investor’s Business
Daily. Suppose 80% subscribe to the WSJ. Of those that subscribe to the WSJ, 75% also subscribe
to IBD. Also, 20% of those that subscribe to exactly one of the papers subscribe to IBD.
a) Construct a probability table.
b) What is the probability an investment advisor subscribes to neither paper?
c) What proportion of investment advisors subscribe to IBD?
d) Given an investment advisor receives IBD, what is the probability they also receive the WSJ?
8) (Hardest) Suppose there are two restaurants in a small town, Abelard’s Attic and Baltazar’s Buffet.
Suppose further that 20% of the people in the town dine at neither restaurant. Of those that go to at
least one of the restaurants, 75% dine at both. Of those that dine at exactly one of the restaurants,
75% dine at Abelard’s.
a) Construct a probability table.
b) What proportion dine at Baltazar’s?
c) Of those that dine at Baltazar’s, what proportion dine at Abelard’s?
9) (Hardest) Let A and B be events. Suppose that $P(A)=0.3$ and $P(B^C|A)=0.2$. Suppose further that
given exactly one of the two events occurs, 40% of the time it is B that occurred.
a) Construct a probability table.
b) Given that at least one of the events occurred, what is the probability B occurred?
c) What is P(B|A)?
10) (Hardest) Let A and B be events. Suppose
. Suppose also that, conditional on
exactly one of the events occurring, the probability A occurs is 0.8. Finally, suppose that the
probability neither event occurs given B did not occur is 1/9.
a) You should know what to do for part (a) by now.
b) What is the probability that at least one of the events occurs?
c) Given that at most one of the events occurs, what is the probability A occurs?
9 Solutions for Sample Problems
1) Let J be the event the judge would convict and P the event the jury panel convicts.
a)
         P       Pc      Total
J        0.64    0.19    0.83
Jc       0.03    0.14    0.17
Total    0.67    0.33    1.00
b) 0.14
c) 0.64/0.67=0.9552
d) They are not disjoint; they occur together with probability 0.64.
2)
a)
         WS      WSc     Total
PC       0.19    0.76    0.95
PCc      0.01    0.04    0.05
Total    0.20    0.80    1.00
b) 0.01/0.05=0.20
c) 0.76+0.01=0.77
d) 0.20/(0.19+0.01+0.76)=0.2083
3)
a)
                         Site 2 conducts   Site 2 doesn’t conduct   Total
Site 1 conducts          0.40              0.30                     0.70
Site 1 doesn’t conduct   0.10              0.20                     0.30
Total                    0.50              0.50                     1.00
b) 0.50
c) 0.40/0.80=0.5
4) Let E be the event an exam is easy. Let D be the event the first question is difficult.
a)
         D       Dc      Total
E        0.015   0.085   0.100
Ec       0.720   0.180   0.900
Total    0.735   0.265   1.00
b) 0.180
c) 0.265
d) 0.180/0.265=0.6792
e) The events are dependent, since $P(E \cap D) = 0.015$ while $P(E)\,P(D) = (0.100)(0.735) = 0.0735$.
5)
a)
         R       Rc      Total
H        0.36    0.54    0.90
Hc       0.04    0.06    0.10
Total    0.40    0.60    1.00
b) 0.40
c) 0.10
d) 0.54+0.04=0.58
e) 0.36+0.54+0.04=0.94
f) 0.06/0.64=0.0938
6) Let T be the event “go to trial” and G be the event “found guilty”.
a)
         G       Gc      Total
T        0.095   0.005   0.100
Tc       0.360   0.540   0.900
Total    0.455   0.545   1.00
b) 0.455
c) 0.095/0.455=0.2088
d) 0.005/0.100=0.05
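Problems 4 and 6 share a pattern: one margin $P(A)$ plus the conditional probabilities $P(B|A)$ and $P(B|A^C)$. A hedged Python sketch of that pattern follows; `table_from_conditionals` is a hypothetical helper name, and the inputs are the numbers stated in the two problems. Each cell comes from the intersection rule, and the answers are read off the filled table.

```python
from fractions import Fraction as F


def table_from_conditionals(p_a, p_b_given_a, p_b_given_not_a):
    """Four cells of the A/B table, keyed by (A occurs, B occurs)."""
    p_not_a = 1 - p_a
    return {
        (True, True): p_a * p_b_given_a,
        (True, False): p_a * (1 - p_b_given_a),
        (False, True): p_not_a * p_b_given_not_a,
        (False, False): p_not_a * (1 - p_b_given_not_a),
    }


# Problem 6: A = goes to trial (T), B = found guilty (G).
cells = table_from_conditionals(F(10, 100), F(95, 100), F(40, 100))
p_guilty = cells[(True, True)] + cells[(False, True)]  # part (b)
p_trial_given_guilty = cells[(True, True)] / p_guilty  # part (c)
print(float(p_guilty), round(float(p_trial_given_guilty), 4))

# Problem 4: A = hard exam (Ec), B = difficult first question (D).
exam = table_from_conditionals(F(90, 100), F(80, 100), F(15, 100))
p_not_difficult = exam[(True, False)] + exam[(False, False)]        # part (c)
p_hard_given_not_difficult = exam[(True, False)] / p_not_difficult  # part (d)
print(float(p_not_difficult), round(float(p_hard_given_not_difficult), 4))
```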
7)
a)
         IBD     IBDc    Total
WSJ      0.60    0.20    0.80
WSJc     0.05    0.15    0.20
Total    0.65    0.35    1.00
b) 0.15
c) 0.65
d) 0.60/0.65=0.9231
8)
a)
         B       Bc      Total
A        0.60    0.15    0.75
Ac       0.05    0.20    0.25
Total    0.65    0.35    1.00
b) 0.65
c) 0.60/0.65=0.9231
9)
a)
         B       Bc      Total
A        0.24    0.06    0.30
Ac       0.04    0.66    0.70
Total    0.28    0.72    1.00
b) 0.28/0.34=0.8235
c) 0.24/0.30=0.8
10)
a)
         B       Bc      Total
A        0.45    0.40    0.85
Ac       0.10    0.05    0.15
Total    0.55    0.45    1.00
b) 0.95
c) 0.40/0.55=0.7273