Chapter 2. Conditional Probability
The probabilities assigned to various events depend on what
is known about the experimental situation when the assignment
is made. For a particular event A, we have used P(A) to
represent the probability assigned to A; we now think of P(A) as
the original or unconditional probability of the event A.
2.1 The definition of conditional probability
In this section, we examine how the information “an event B
has occurred” affects the probability assigned to A. We will use
the notation P(A|B) to represent the conditional probability of A
given that the event B has occurred.
Conditioning is one of the fundamental tools of probability:
probably the most fundamental tool. It is especially helpful for
calculating the probabilities of intersections, such as P(A ∩ B),
which themselves are critical for the useful Partition Theorem.
Additionally, the whole field of stochastic processes is based on
the idea of conditional probability. What happens next in a
process depends, or is conditional, on what has happened
beforehand.
Dependent events. Suppose A and B are two events on the same
sample space. There will often be dependence between A and B.
This means that if we know that B has occurred, it changes our
knowledge of the chance that A will occur.
Example 1. Toss a die once.
However, if we know that B has occurred, then there is an
increased chance that A has occurred:
Conditioning as reducing the sample space
Example 2. The car survey in Examples of basic probability
calculations also asked respondents which they valued more
highly in a car: ease of parking, or style/prestige. Here are the
responses:
Suppose we pick a respondent at random from all those in the table.
Let event A =“respondent thinks that prestige is more important”.
Suppose we reduce our sample space from
This is our definition of conditional probability:
Definition: Let A and B be two events with P(B)>0. The
conditional probability that event A occurs, given that event B
has occurred, is written P(A|B), and is given by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Note: Follow the reasoning above carefully. It is important to
understand why the conditional probability is the probability of
the intersection within the new sample space B.
Conditioning on event B means changing the sample space to
B.
Think of P(A|B) as the chance of getting an A, from the set of
B's only.
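The idea of "reducing the sample space" can be checked on a small case. The following is a minimal sketch, assuming a fair six-sided die and two illustrative events chosen here (A = "roll a 6", B = "roll 4 or more"); the helper names `prob` and `cond_prob` are hypothetical and not part of the notes.

```python
from fractions import Fraction

def prob(event, space):
    """Probability of `event` when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

def cond_prob(a, b, space):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b, space)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b, space) / p_b

omega = set(range(1, 7))  # fair six-sided die (assumption)
A = {6}                   # illustrative event: "roll a 6"
B = {4, 5, 6}             # illustrative event: "roll 4 or more"

print(prob(A, omega))          # 1/6, the unconditional probability
print(cond_prob(A, B, omega))  # 1/3, using the definition
print(prob(A & B, B))          # 1/3, treating B as the new sample space
```

The last two lines agree because dividing by P(B) is the same as counting outcomes inside B only.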
The Multiplication Rule
For any events A and B,
$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A).$$
New statement of the Partition Theorem
(The Law of Total Probability)
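The Multiplication Rule gives us a new statement of the
Partition Theorem (Total Probability Theorem). If B1, ..., Bm form a
partition of the sample space, then for any event A,
$$P(A) = \sum_{i=1}^{m} P(A \cap B_i) = \sum_{i=1}^{m} P(A \mid B_i)\,P(B_i).$$
Both formulations of the Partition Theorem are very widely used,
but especially the conditional formulation, the sum of the P(A|Bi)P(Bi).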
Examples of conditional probability and partitions
Example 3.
A news magazine publishes three columns entitled “Art” (A),
“Books” (B), and “Cinema” (C). The reading habits of a randomly selected reader with
respect to these columns are:

Read regularly   A      B      C      A∩B    A∩C    B∩C    A∩B∩C
Probability      0.14   0.23   0.37   0.08   0.09   0.13   0.05

We thus have
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.08}{0.23} \approx 0.348$$
$$P(A \mid B \cup C) = \frac{P(A \cap (B \cup C))}{P(B \cup C)} = \frac{0.04 + 0.05 + 0.03}{0.47} \approx 0.255$$
$$P(A \mid \text{reads at least one}) = P(A \mid A \cup B \cup C) = \frac{P(A \cap (A \cup B \cup C))}{P(A \cup B \cup C)} = \frac{P(A)}{P(A \cup B \cup C)} = \frac{0.14}{0.49} \approx 0.286$$
$$P(A \cup B \mid C) = \frac{P((A \cup B) \cap C)}{P(C)} = \frac{0.04 + 0.05 + 0.08}{0.37} \approx 0.459$$
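The four conditional probabilities in Example 3 can be recomputed directly from the table. This is a rough sketch; the dictionary keys and variable names are made up for illustration, and the Venn-diagram regions are obtained by subtracting the triple intersection from each pairwise intersection.

```python
# Probabilities from the table in Example 3.
p = {"A": 0.14, "B": 0.23, "C": 0.37,
     "AB": 0.08, "AC": 0.09, "BC": 0.13, "ABC": 0.05}

# Regions belonging to exactly two of the three events.
only_ab = p["AB"] - p["ABC"]  # A and B but not C: 0.03
only_ac = p["AC"] - p["ABC"]  # A and C but not B: 0.04
only_bc = p["BC"] - p["ABC"]  # B and C but not A: 0.08

# Unions via inclusion-exclusion.
p_b_or_c = p["B"] + p["C"] - p["BC"]
p_any = (p["A"] + p["B"] + p["C"]
         - p["AB"] - p["AC"] - p["BC"] + p["ABC"])

results = {
    "P(A | B)": p["AB"] / p["B"],
    "P(A | B or C)": (only_ab + only_ac + p["ABC"]) / p_b_or_c,
    "P(A | at least one)": p["A"] / p_any,
    "P(A or B | C)": (only_ac + only_bc + p["ABC"]) / p["C"],
}

for name, value in results.items():
    print(f"{name} = {value:.3f}")
```

Each entry is just P(intersection) divided by P(conditioning event), as in the definition.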
Example 4.
Four individuals have responded to a request by a blood bank for
blood donations. None of them has donated before, so their blood types are unknown.
Suppose only type A+ is desired and only one of the four actually has this type. If the
potential donors are selected in random order for typing, what is the probability that at
least three individuals must be typed to obtain the desired type?
Solution. Making the identification B={first type not A+} and
A={second type not A+},
P(B)=3/4.
Given that the first type is not A+, two of the three individuals
left are not A+, so P(A|B)=2/3.
The multiplication rule now gives
P(at least three individuals are typed) = P(A ∩ B) = P(A|B)P(B)
= 2/3 × 3/4 = 0.5
The multiplication rule is most useful when the experiment consists of
several stages in succession. The conditioning event B then describes the
outcome of the first stage and A the outcome of the second, so that
P(A|B) (conditioning on what occurs first) will often be known.
The rule is easily extended to experiments involving more than two
stages.
More than two events
To find P(A1 ∩ A2 ∩ A3), we can apply the multiplication rule
successively:
$$P(A_1 \cap A_2 \cap A_3) = P(A_3 \mid A_1 \cap A_2)\,P(A_2 \mid A_1)\,P(A_1),$$
where A1 occurs first, followed by A2, and finally A3.
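Example 5. For the blood typing experiment of the above example,
$$\begin{aligned}
P(\text{third type is A+}) &= P(\text{third is} \mid \text{first isn't} \cap \text{second isn't}) \cdot P(\text{second isn't} \mid \text{first isn't}) \cdot P(\text{first isn't}) \\
&= \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} = \frac{1}{4} = 0.25
\end{aligned}$$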
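The blood-typing answers (0.5 for "at least three individuals typed" and 0.25 for "the third individual typed is A+") can be cross-checked by listing all equally likely typing orders. This is only a sketch; the encoding of donors as booleans and the helper name `position_of_a_plus` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import permutations

# Four donors, exactly one of whom is A+ (marked True).
donors = [True, False, False, False]
orders = list(permutations(range(4)))  # 24 equally likely typing orders

def position_of_a_plus(order):
    """1-based position at which the A+ donor is typed in this order."""
    return next(i for i, d in enumerate(order, start=1) if donors[d])

positions = [position_of_a_plus(o) for o in orders]

p_at_least_three = Fraction(sum(pos >= 3 for pos in positions), len(orders))
p_third = Fraction(sum(pos == 3 for pos in positions), len(orders))

print(p_at_least_three)  # 1/2, matches 2/3 * 3/4
print(p_third)           # 1/4, matches 1/2 * 2/3 * 3/4
```

Enumeration and the multiplication rule give the same values; the rule simply avoids listing all 24 orders.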
Example 6:
A box contains w white balls and r red balls. Draw 3 balls without
replacement. What is the probability of getting the sequence white, red, white?
Solution:
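Applying the multiplication rule for three events, with W1 = "first ball white", R2 = "second ball red" and W3 = "third ball white", gives P(W1 ∩ R2 ∩ W3) = w/(w+r) · r/(w+r−1) · (w−1)/(w+r−2). The sketch below is an illustration rather than the original worked solution: it evaluates this product and compares it with a brute-force count, for hypothetical values w = 3 and r = 2 (it assumes w + r ≥ 3).

```python
from fractions import Fraction
from itertools import permutations

def p_white_red_white(w, r):
    """Multiplication rule: P(W1) * P(R2 | W1) * P(W3 | W1 and R2)."""
    n = w + r
    return Fraction(w, n) * Fraction(r, n - 1) * Fraction(w - 1, n - 2)

def brute_force(w, r):
    """Check by listing every ordered draw of 3 distinct balls."""
    balls = ["W"] * w + ["R"] * r
    draws = list(permutations(range(len(balls)), 3))
    hits = sum(1 for d in draws if [balls[i] for i in d] == ["W", "R", "W"])
    return Fraction(hits, len(draws))

# Hypothetical box contents chosen for the check: 3 white, 2 red.
print(p_white_red_white(3, 2))  # 1/5
print(brute_force(3, 2))        # 1/5
```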
Example 7.
Tom gets the bus to campus every day. The bus is on time with
probability 0.6, and late with probability 0.4.
The sample space can be written as
We can formulate events as follows:
T = “on time”; L = “late”.
From the information given, the events have probabilities:
P(T) = 0.6 ; P(L) = 0.4
Question(a) Do the events T and L form a partition of the sample space?
Explain why or why not.
Solution.
Yes.
They cover all possible journeys (probabilities sum to 1), and there is no
overlap in the events by definition.
The buses are sometimes crowded and sometimes noisy, both of
which are problems for Tom as he likes to use the bus journeys to do his
Stats assignments. When the bus is on time, it is crowded with probability
0.5. When it is late, it is crowded with probability 0.7. The bus is noisy
with probability 0.8 when it is crowded, and with probability 0.4 when it
is not crowded.
Question(b) Formulate events C and N corresponding to the bus being
crowded and noisy. Do the events C and N form a partition of the sample
space? Explain why or why not.
Solution.
Let C = “crowded”, N =“noisy”.
C and N do NOT form a partition of the sample space.
It is possible for the bus to be noisy when it is crowded, so there must be
some overlap between C and N.
Question(c) Write down probability statements corresponding to the
information given above. Your answer should involve two statements
linking C with T and L, and two statements linking N with C.
Solution.
P(C | T) = 0.5; P(C | L) = 0.7;
P(N | C) = 0.8; P(N | C') = 0.4, where C' denotes “not crowded”.
Question(d) Find the probability that the bus is crowded.
Question(e) Find the probability that the bus is noisy.
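Parts (d) and (e) follow from the Partition Theorem in its conditional form. Here is a minimal sketch, assuming the four conditional probabilities read off the description above (P(C|T) = 0.5, P(C|L) = 0.7, P(N|C) = 0.8, P(N|not C) = 0.4) and assuming that the noise probabilities depend only on crowding, not on whether the bus is on time. The variable names are illustrative.

```python
# Given probabilities from Example 7.
p_T, p_L = 0.6, 0.4                       # on time / late
p_C_given_T, p_C_given_L = 0.5, 0.7       # crowded given on time / late
p_N_given_C, p_N_given_notC = 0.8, 0.4    # noisy given crowded / not crowded

# (d) Partition the sample space by {T, L}.
p_C = p_C_given_T * p_T + p_C_given_L * p_L

# (e) Partition the sample space by {C, not C}.
p_N = p_N_given_C * p_C + p_N_given_notC * (1 - p_C)

print(round(p_C, 3))  # 0.58
print(round(p_N, 3))  # 0.632
```

Note that (e) reuses the answer to (d): the second partition is only usable once P(C) is known.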
Example 8.
A chain of video stores sells three different brands of VCRs. Of its
VCR sales, 50% are brand 1(the least expensive), 30% are brand 2, and 20% are
brand 3. Each manufacturer offers a 1-year warranty on parts and labor. It is known
that 25% of brand 1’s VCRs require warranty repair work, whereas the corresponding
percentages for brands 2 and 3 are 20% and 10%, respectively.
Question(a) What is the probability that a randomly selected purchaser has bought a
brand 1 VCR that will need repair while under warranty?
Question(b) What is the probability that a randomly selected purchaser has a VCR
that will need repair while under warranty?
Question(c) If a customer returns to the store with a VCR that needs warranty repair
work, what is the probability that it is a brand 1 VCR? A brand 2 VCR? A brand 3
VCR?
Solution. Let Ai = {brand i is purchased}, for i = 1, 2, 3,
B = {needs repair}, and B' = {doesn't need repair}.
Then P(A1) = 0.5, P(A2) = 0.3, P(A3) = 0.2,
P(B|A1) = 0.25,
P(B|A2) = 0.2,
P(B|A3) = 0.1.
(a) P(A1 ∩ B) = P(B|A1)P(A1) = 0.25 × 0.5 = 0.125
(b) P(B) = P((brand 1 and repair) or (brand 2 and repair) or (brand 3 and repair))
= P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B) = 0.125 + 0.06 + 0.02 = 0.205
(c)
$$P(A_1 \mid B) = \frac{P(A_1 \cap B)}{P(B)} = \frac{0.125}{0.205} \approx 0.61$$
$$P(A_2 \mid B) = \frac{P(A_2 \cap B)}{P(B)} = \frac{0.06}{0.205} \approx 0.29$$
$$P(A_3 \mid B) = 1 - P(A_1 \mid B) - P(A_2 \mid B) \approx 0.10$$
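The three parts of the VCR example map onto three short computations: a joint probability, a total probability, and a set of posterior probabilities. The sketch below uses illustrative dictionary names (`priors`, `repair_rate`, `joint`, `posterior`) that are not from the source.

```python
# Brand shares P(A_i) and warranty-repair rates P(B | A_i) from Example 8.
priors = {1: 0.5, 2: 0.3, 3: 0.2}
repair_rate = {1: 0.25, 2: 0.20, 3: 0.10}

# (a) Multiplication rule: P(A_i and B) = P(B | A_i) * P(A_i).
joint = {i: repair_rate[i] * priors[i] for i in priors}

# (b) Partition Theorem: P(B) is the sum of the joint probabilities.
p_repair = sum(joint.values())

# (c) Conditional probability: P(A_i | B) = P(A_i and B) / P(B).
posterior = {i: joint[i] / p_repair for i in priors}

print(round(joint[1], 3))   # 0.125
print(round(p_repair, 3))   # 0.205
for brand, value in posterior.items():
    print(brand, round(value, 3))
```

Computed without intermediate rounding, the brand 3 posterior is 0.02/0.205, about 0.098, which rounds to the 0.10 obtained above by subtraction.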
2.2 Statistical Independence
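Two events A and B are statistically independent if the
occurrence of one does not affect the occurrence of the other,
that is, if P(A|B) = P(A). When P(B) > 0, the multiplication rule
shows that this is equivalent to
$$P(A \cap B) = P(A)\,P(B).$$
We use this as our definition of statistical independence.
For more than two events, we say: events A1, ..., An are mutually
independent if
$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n),$$
and the same product rule holds for every subcollection of the events.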
Statistical independence for calculating the probability of an
intersection
We usually have two choices.
1. If A and B are statistically independent, then
$$P(A \cap B) = P(A)\,P(B).$$
2. If A and B are not known to be statistically independent, we usually
have to use conditional probability and the multiplication rule:
$$P(A \cap B) = P(A \mid B)\,P(B).$$
This still requires us to be able to calculate P(A|B).
Note: If events are physically independent, then they will also be statistically independent.
Pairwise independence does not imply mutual independence
Example 9. A jar contains 4 balls: one red, one white, one blue,
and one coloured red, white and blue. Draw one ball at random.
So A, B and C are NOT mutually independent, despite being
pairwise independent.
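The event definitions for this example are not shown above, so the sketch below assumes the usual choice: A = "the ball has red on it", B = "the ball has white on it", C = "the ball has blue on it". Under that assumption it checks each pair against the product rule and then checks the triple intersection. The helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# The four balls, each described by the set of colours it shows.
balls = [{"red"}, {"white"}, {"blue"}, {"red", "white", "blue"}]

def prob(*colours):
    """P(the drawn ball shows every colour listed), all balls equally likely."""
    hits = sum(1 for ball in balls if set(colours) <= ball)
    return Fraction(hits, len(balls))

# Pairwise: P(X and Y) equals P(X) * P(Y) for every pair of colours.
for x, y in combinations(["red", "white", "blue"], 2):
    print(x, y, prob(x, y) == prob(x) * prob(y))  # True each time

# All three together: 1/4 on the left, 1/8 on the right.
p_all = prob("red", "white", "blue")
p_product = prob("red") * prob("white") * prob("blue")
print(p_all, p_product)  # 1/4 1/8
```

Every pair satisfies the product rule, but the triple does not, which is exactly the failure of mutual independence.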
Example 10.
It is known that 30% of a certain company’s washing machines
require service while under warranty, whereas only 10% of its dryers need such
service. If someone purchases both a washer and a dryer made by this company, what
is the probability that both machines need warranty service?
Let A denote the event that the washer needs service while under warranty,
and let B be defined analogously for the dryer.
Then P(A) = 0.3, P(B) = 0.1.
Assuming that the two machines function independently of one another,
the desired probability is
P(A ∩ B) = P(A) · P(B) = 0.3 × 0.1 = 0.03.
The probability that neither machine needs service is
P(A' ∩ B') = P(A') · P(B') = (0.7)(0.9) = 0.63.
Example 11. A system consists of four components, as illustrated in the figure. The entire
system will work if either the 1-2 subsystem works or if the 3-4 subsystem works
(since the two subsystems are connected in parallel). Since the two components in
each subsystem are connected in series, a subsystem will work only if both its
components work. If components work or fail independently of one another and if
each works with probability 0.9, what is the probability that the entire system will
work (the system reliability coefficient)?
Letting Ai (i = 1, 2, 3, 4) be the event that the ith component works, the Ai's are
mutually independent.
The event that the 1-2 subsystem works is A1 ∩ A2, and similarly, A3 ∩ A4 denotes
the event that the 3-4 subsystem works.
The event that the entire system works is (A1 ∩ A2) ∪ (A3 ∩ A4), so
$$\begin{aligned}
P[(A_1 \cap A_2) \cup (A_3 \cap A_4)] &= P(A_1 \cap A_2) + P(A_3 \cap A_4) - P[(A_1 \cap A_2) \cap (A_3 \cap A_4)] \\
&= P(A_1)P(A_2) + P(A_3)P(A_4) - P(A_1)P(A_2)P(A_3)P(A_4) \\
&= (0.9)(0.9) + (0.9)(0.9) - (0.9)(0.9)(0.9)(0.9) \\
&= 0.9639
\end{aligned}$$
Example 12. Suppose that a machine produces a defective item with probability
p (0<p<1) and produces a nondefective item with probability 1-p. Suppose further that
six items produced by the machine are selected at random and inspected, and that the
results (defective or nondefective) for these six items are independent. We shall
determine the probability that exactly two of the six items are defective.
Solution. It can be assumed that the sample space S contains all possible
arrangements of six items, each one of which might be either defective or
nondefective.
Let $D_j$ denote the event that the jth item in the sample is defective;
then $D_j^c$ is the event that this item is nondefective.
Since the outcomes for the six different items are independent, the probability of
obtaining any particular sequence of defective and nondefective items will simply be
the product of the individual probabilities for the items. For example,
$$\begin{aligned}
P(D_1^c \cap D_2 \cap D_3^c \cap D_4^c \cap D_5 \cap D_6^c) &= P(D_1^c)\,P(D_2)\,P(D_3^c)\,P(D_4^c)\,P(D_5)\,P(D_6^c) \\
&= (1-p)\,p\,(1-p)(1-p)\,p\,(1-p) \\
&= p^2 (1-p)^4.
\end{aligned}$$
It can be seen that the probability of any other particular sequence in S containing two
defective items and four nondefective items will also be $p^2 (1-p)^4$.
Since there are $\binom{6}{2}$ distinct arrangements of two defective items and four
nondefective items, the probability of obtaining exactly two defectives is
$$\binom{6}{2} p^2 (1-p)^4 = 15\,p^2 (1-p)^4.$$
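The counting argument can be verified by listing all 2^6 defective/nondefective sequences and adding up the probabilities of those with exactly two defectives. In this sketch the value p = 1/10 is an arbitrary choice for illustration (the example leaves p general), and `p_exactly_k` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_exactly_k(n, k, p):
    """Sum the probabilities of all length-n sequences with exactly k defectives."""
    total = Fraction(0)
    for seq in product([True, False], repeat=n):  # True = defective
        if sum(seq) == k:
            # Independence: multiply p for each defective, (1 - p) otherwise.
            total += p**k * (1 - p)**(n - k)
    return total

p = Fraction(1, 10)  # illustrative value only

by_enumeration = p_exactly_k(6, 2, p)
by_formula = comb(6, 2) * p**2 * (1 - p)**4

print(by_enumeration == by_formula)  # True
print(by_formula)                    # 19683/200000
```

The loop adds the same term, p^2 (1 − p)^4, once for each of the 15 arrangements, which is why the binomial coefficient appears in the closed form.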
 
2.3 Bayes' Theorem: inverting conditional probabilities
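Consider two events A and B with P(A) > 0 and P(B) > 0. Applying the
multiplication rule to both sides of P(B ∩ A) = P(A ∩ B) gives
P(B|A)P(A) = P(A|B)P(B). Then
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$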
This is the simplest form of Bayes' Theorem, named after Thomas Bayes (1702-61), English
clergyman and founder of Bayesian Statistics.
Bayes' Theorem allows us to “invert” the conditioning, i.e. to
express P(B| A) in terms of P(A|B).
This is very useful. For example, it might be easy to calculate
P(later event | earlier event),
but we might only observe the later event and wish to deduce
the probability that the earlier event occurred,
P(earlier event | later event).
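Full statement of Bayes' Theorem: if B1, ..., Bm form a partition of the
sample space, then for any event A with P(A) > 0 and any j = 1, ..., m,
$$P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{m} P(A \mid B_i)\,P(B_i)}.$$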
Example 13. The case of the Perfidious Gardener.
Mr Smith owns a hysterical rosebush. It will die with probability 1/2 if watered, and
with probability 3/4 if not watered. Worse still, Smith employs a perfidious gardener
who will fail to water the rosebush with probability 2/3. Smith returns from holiday to
find the rosebush . . . DEAD!!
What is the probability that the gardener did not water it?
So the gardener failed to water the rosebush with probability
3/4.
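The 3/4 answer can be reproduced with Bayes' Theorem, using the partition {watered, not watered}. This is a minimal sketch with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Information given in Example 13.
p_not_watered = Fraction(2, 3)
p_watered = 1 - p_not_watered
p_dead_given_watered = Fraction(1, 2)
p_dead_given_not_watered = Fraction(3, 4)

# Partition Theorem: total probability that the rosebush dies.
p_dead = (p_dead_given_not_watered * p_not_watered
          + p_dead_given_watered * p_watered)

# Bayes' Theorem: invert the conditioning.
p_not_watered_given_dead = (p_dead_given_not_watered * p_not_watered) / p_dead

print(p_dead)                    # 2/3
print(p_not_watered_given_dead)  # 3/4
```

Observing the dead rosebush raises the probability that the gardener failed to water it from the prior 2/3 to 3/4.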
Example 14. The case of the Defective Ketchup Bottle.
Ketchup bottles are produced in 3 different factories, accounting for 50%, 30%, and
20% of the total output respectively. The percentage of defective bottles from the 3
factories is respectively 0.4%, 0.6%, and 1.2%. A statistics lecturer who eats only
ketchup finds a defective bottle in her wig. What is the probability that it came from
Factory 1?
Information given: