DUBLIN INSTITUTE OF TECHNOLOGY
KEVIN STREET, DUBLIN 8

Probability Based Learning:
Introduction to Probability

REVISION QUESTIONS
*** SOLUTIONS ***

MACHINE LEARNING AT DIT
Dr. John Kelleher
Dr. Brian Mac Namee

1. Given the joint distribution for X and Y listed in Table 1, calculate the following:

Table 1: Joint Distribution for X and Y

          X = x1    X = x2
Y = y1    0.02      0.30
Y = y2    0.14      0.32
Y = y3    0.10      0.12

(a) $P(X = x_1)$
$P(X = x_1) = 0.02 + 0.14 + 0.10 = 0.26$
(b) $P(Y = y_2)$
$P(Y = y_2) = 0.14 + 0.32 = 0.46$
(c) $P(Y = y_2 \mid X = x_1)$
From the product rule: $P(a \mid b) = \frac{P(a \wedge b)}{P(b)}$, so
$$P(Y = y_2 \mid X = x_1) = \frac{P(Y = y_2 \wedge X = x_1)}{P(X = x_1)} = \frac{0.14}{0.26} \approx 0.538$$
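
The marginal and conditional calculations for Table 1 can be checked with a short Python sketch. The dictionary layout and the function names are illustrative choices, not part of the original solutions.

```python
# Joint distribution from Table 1, keyed by (x, y).
joint = {
    ("x1", "y1"): 0.02, ("x1", "y2"): 0.14, ("x1", "y3"): 0.10,
    ("x2", "y1"): 0.30, ("x2", "y2"): 0.32, ("x2", "y3"): 0.12,
}

def marginal_x(x):
    """Sum out Y to get P(X = x)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """Sum out X to get P(Y = y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_y_given_x(y, x):
    """Product rule rearranged: P(Y = y | X = x) = P(x, y) / P(x)."""
    return joint[(x, y)] / marginal_x(x)

print(round(marginal_x("x1"), 4))                   # 0.26
print(round(marginal_y("y2"), 4))                   # 0.46
print(round(conditional_y_given_x("y2", "x1"), 4))  # 0.5385
```

Results are rounded before printing because floating-point sums of decimals are only approximately exact.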
2. Given that $P(a \mid b) = 0.5$, $P(a) = 0.3$, $P(b) = 0.4$, calculate $P(b \mid a)$.
$$P(b \mid a) = \frac{P(a \mid b) \times P(b)}{P(a)} = \frac{0.5 \times 0.4}{0.3} \approx 0.67$$
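
As a quick check of the arithmetic, Bayes' rule can be evaluated with exact fractions. This is a minimal sketch; the helper name `bayes` is hypothetical.

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(b | a) = P(a | b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a

# P(a|b) = 0.5, P(b) = 0.4, P(a) = 0.3, written as exact fractions.
p_b_given_a = bayes(Fraction(1, 2), Fraction(2, 5), Fraction(3, 10))
print(p_b_given_a)                   # 2/3
print(round(float(p_b_given_a), 2))  # 0.67
```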
3. Consider the domain of dealing 5-card poker hands from a standard deck of 52 cards, under the assumption that the dealer is fair.
(a) Given that the number of combinations of r objects that can be selected, without regard to order and without repetition, from n distinct objects is given by the equation $\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$, how many atomic events are there in the joint probability distribution (i.e., how many 5-card hands are there)?
There are $\binom{52}{5} = \frac{52!}{47!\,5!} = \frac{52 \times 51 \times 50 \times 49 \times 48}{1 \times 2 \times 3 \times 4 \times 5} = 2{,}598{,}960$ possible five-card hands.
(b) What is the probability of each atomic event?
By the fair-dealing assumption, each of these is equally likely. Each
hand therefore occurs with probability 1/2,598,960.
(c) What is the probability of being dealt a royal straight flush (i.e., being dealt a hand containing Ace, King, Queen, Jack and 10 all from the one suit)?
There are four hands that are royal straight flushes (one in each suit). These events are mutually exclusive, therefore the probability of a royal flush is just the sum of the probabilities of the atomic events, i.e., $\frac{4}{2{,}598{,}960} = \frac{1}{649{,}740}$.
(d) What is the probability of being dealt a four of a kind (i.e., four kings, or four nines, etc.)?
There are 13 possible kinds and for each, the fifth card can be one of 48 possible other cards. The total probability is therefore $\frac{13 \times 48}{2{,}598{,}960} = \frac{1}{4{,}165}$.
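
The poker counts in this question can be reproduced exactly with Python's standard library. This is a sketch assuming Python 3.8 or later (for `math.comb`); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)                  # number of distinct 5-card hands
p_hand = Fraction(1, hands)          # fair dealing: every hand equally likely
p_royal_flush = 4 * p_hand           # one royal straight flush per suit
p_four_of_a_kind = 13 * 48 * p_hand  # 13 kinds, 48 choices for the fifth card

print(hands)             # 2598960
print(p_royal_flush)     # 1/649740
print(p_four_of_a_kind)  # 1/4165
```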
4. Given the full joint distribution shown in Table 2, calculate the following:

Table 2: Full joint distribution for a dentist visit

                 toothache            ¬toothache
             catch     ¬catch      catch     ¬catch
cavity       0.108     0.012       0.072     0.008
¬cavity      0.016     0.064       0.144     0.576

(a) $P(\mathit{toothache})$
This asks for the probability that Toothache is true. $P(\mathit{toothache}) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2$
(b) $\mathbf{P}(\mathit{Cavity})$
This asks for the vector of probability values for the random variable Cavity. It has two values, which we list in the order $\langle \mathit{true}, \mathit{false} \rangle$. First add up $P(\mathit{cavity}) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2$. Then we have $\mathbf{P}(\mathit{Cavity}) = \langle 0.2, 0.8 \rangle$.
(c) $\mathbf{P}(\mathit{Toothache} \mid \mathit{cavity})$
This asks for the vector of probability values for Toothache, given that Cavity is true.
$$\mathbf{P}(\mathit{Toothache} \mid \mathit{cavity}) = \left\langle \frac{0.108 + 0.012}{0.2}, \frac{0.072 + 0.008}{0.2} \right\rangle = \langle 0.6, 0.4 \rangle$$
(d) P(Cavity|toothache ∨ catch)
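This asks for the vector of probability values for Cavity, given that either Toothache or Catch is true.
Recall $P(a \mid b) = \frac{P(a \wedge b)}{P(b)}$, so
$$\mathbf{P}(\mathit{Cavity} \mid \mathit{toothache} \vee \mathit{catch}) = \left\langle \frac{P(\mathit{cavity} \wedge (\mathit{toothache} \vee \mathit{catch}))}{P(\mathit{toothache} \vee \mathit{catch})}, \frac{P(\neg \mathit{cavity} \wedge (\mathit{toothache} \vee \mathit{catch}))}{P(\mathit{toothache} \vee \mathit{catch})} \right\rangle$$
First compute $P(\mathit{toothache} \vee \mathit{catch}) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.144 = 0.416$.
Then
$$\mathbf{P}(\mathit{Cavity} \mid \mathit{toothache} \vee \mathit{catch}) = \left\langle \frac{0.108 + 0.012 + 0.072}{0.416}, \frac{0.016 + 0.064 + 0.144}{0.416} \right\rangle = \langle 0.4615, 0.5385 \rangle$$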
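
All four answers to this question come from summing entries of the full joint distribution. The sketch below shows one way to do that in Python; representing events as predicates over (cavity, toothache, catch), and the helper names `prob` and `conditional`, are assumptions made for illustration.

```python
# Table 2, keyed by (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    """P(event): sum the atomic events in which the predicate holds."""
    return sum(p for world, p in joint.items() if event(*world))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda *w: event(*w) and given(*w)) / prob(given)

is_cavity = lambda cavity, toothache, catch: cavity
is_toothache = lambda cavity, toothache, catch: toothache
ache_or_catch = lambda cavity, toothache, catch: toothache or catch

print(round(prob(is_toothache), 4))                    # 0.2
print(round(prob(is_cavity), 4))                       # 0.2
print(round(conditional(is_toothache, is_cavity), 4))  # 0.6
print(round(conditional(is_cavity, ache_or_catch), 4)) # 0.4615
```

The second component of each probability vector is one minus the first, since the two values of a Boolean variable are exhaustive and mutually exclusive.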
5. After your yearly checkup, the doctor has bad news and good news. The bad news is
that you tested positive for a serious disease and that the test is 99% accurate (i.e., the
probability of testing positive when you do have the disease is 0.99, as is the probability of testing negative when you don’t have the disease). The good news is that
this is a rare disease, striking only 1 in 10,000 people of your age. Why is it good
news that the disease is rare? What are the chances that you actually have the disease?
We are given the following information:
$P(\mathit{test} \mid \mathit{disease}) = 0.99$
$P(\neg \mathit{test} \mid \neg \mathit{disease}) = 0.99$
$P(\mathit{disease}) = 0.0001$
and the observation test.
What the patient is concerned about is $P(\mathit{disease} \mid \mathit{test})$. Roughly speaking, the reason it is a good thing that the disease is rare is that $P(\mathit{disease} \mid \mathit{test})$ is proportional to $P(\mathit{disease})$, so a lower prior for disease will mean a lower value for $P(\mathit{disease} \mid \mathit{test})$. Intuitively, if 10,000 people take the test, we expect 1 to actually have the disease, and most likely test positive, while the rest do not have the disease, but 1 percent of them (about 100 people) will test positive anyway, so $P(\mathit{disease} \mid \mathit{test})$ will be about 1 in 100. More precisely, using Bayes' rule $P(a \mid b) = \frac{P(b \mid a) P(a)}{P(b)}$ with $P(\mathit{test} \mid \neg \mathit{disease}) = 1 - 0.99 = 0.01$:
$$P(\mathit{disease} \mid \mathit{test}) = \frac{P(\mathit{test} \mid \mathit{disease}) P(\mathit{disease})}{P(\mathit{test})} = \frac{P(\mathit{test} \mid \mathit{disease}) P(\mathit{disease})}{P(\mathit{test} \mid \mathit{disease}) P(\mathit{disease}) + P(\mathit{test} \mid \neg \mathit{disease}) P(\neg \mathit{disease})}$$
$$= \frac{0.99 \times 0.0001}{(0.99 \times 0.0001) + (0.01 \times 0.9999)} \approx 0.009804$$
The moral is that when the disease is much rarer than the test's error rate, a positive test result does not mean the disease is likely. A false positive reading remains much more likely.
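
The effect of the prior can be explored numerically. The sketch below is a minimal illustration; the function name and the terms `sensitivity` (probability of a positive test given disease) and `specificity` (probability of a negative test given no disease) are labels introduced here, not used in the original solution.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule and total probability."""
    false_positive_rate = 1.0 - specificity
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

# The question's numbers: 1 in 10,000 prior, 99% accurate test.
print(round(posterior(0.0001, 0.99, 0.99), 6))  # 0.009804
# A hypothetical, much more common disease (1 in 100) for comparison.
print(round(posterior(0.01, 0.99, 0.99), 2))    # 0.5
```

With the same test, raising the prior from 1 in 10,000 to 1 in 100 lifts the posterior from about 1% to 50%, which is why the rarity of the disease is the good news.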
6. Suppose you are given a bag containing n unbiased coins. You are told that n − 1 of
these coins are normal, with heads on one side and tails on the other, whereas one is
fake, with heads on both sides.
(a) Suppose you reach into the bag, picking out a coin uniformly at random, flip it,
and get a head. What is the (conditional) probability that the coin you chose is
the fake coin?
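There are n ways to pick a coin, and 2 outcomes for each flip (although with the fake coin, the results of the flip are indistinguishable), so there are 2n total atomic events, each equally likely. Of those, only 2 pick the fake coin, and 2 + (n − 1) result in heads (the two heads results for the fake coin, plus half of the 2(n − 1) outcomes for the normal coins).
So the probability of a fake coin given heads, $P(\mathit{fake} \mid \mathit{heads})$, is $\frac{P(\mathit{heads} \mid \mathit{fake}) P(\mathit{fake})}{P(\mathit{heads})}$, where
$P(\mathit{heads} \mid \mathit{fake}) = 1$
$P(\mathit{fake}) = \frac{1}{n}$
$P(\mathit{heads}) = \frac{\text{total number of head results}}{\text{total atomic events}} = \frac{2 + (n-1)}{2n}$
Therefore
$$P(\mathit{fake} \mid \mathit{heads}) = \frac{1 \times \frac{1}{n}}{\frac{2 + (n-1)}{2n}} = \frac{1}{n} \times \frac{2n}{2 + (n-1)} = \frac{2}{2 + (n-1)} = \frac{2}{n+1}$$
More formally: Let $\alpha$ be a normalising constant.
$$\mathbf{P}(\mathit{Fake} \mid \mathit{heads}) = \alpha \mathbf{P}(\mathit{heads} \mid \mathit{Fake}) \mathbf{P}(\mathit{Fake})$$
$$= \alpha \langle P(\mathit{heads} \mid \mathit{fake}), P(\mathit{heads} \mid \neg \mathit{fake}) \rangle \langle P(\mathit{fake}), P(\neg \mathit{fake}) \rangle$$
$$= \alpha \langle 1.0, 0.5 \rangle \left\langle \frac{1}{n}, \frac{n-1}{n} \right\rangle = \alpha \left\langle \frac{1}{n}, \frac{n-1}{2n} \right\rangle$$
Let us compute $\alpha$:
$$\frac{1}{n} + \frac{n-1}{2n} = \frac{2 + (n-1)}{2n} = \frac{n+1}{2n}$$
By definition $\alpha \frac{n+1}{2n} = 1$, so $\alpha = \frac{2n}{n+1}$.
Plugging $\alpha = \frac{2n}{n+1}$ back into our equation:
$$\mathbf{P}(\mathit{Fake} \mid \mathit{heads}) = \left\langle \frac{1}{n} \times \frac{2n}{n+1}, \frac{n-1}{2n} \times \frac{2n}{n+1} \right\rangle = \left\langle \frac{2}{n+1}, \frac{n-1}{n+1} \right\rangle$$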
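
The counting argument above can be verified by listing the 2n atomic events explicitly. This is a small sketch; labelling the fake coin as coin 0 and the function name are arbitrary choices made for illustration.

```python
from fractions import Fraction

def p_fake_given_heads(n):
    """Count the 2n equally likely (coin, face) atomic events."""
    # Coin 0 is the fake (two heads); coins 1..n-1 are normal.
    events = [(coin, face) for coin in range(n)
              for face in (("H", "H") if coin == 0 else ("H", "T"))]
    heads = [coin for coin, face in events if face == "H"]
    fake_heads = [coin for coin in heads if coin == 0]
    return Fraction(len(fake_heads), len(heads))

# The enumeration agrees with the closed form 2 / (n + 1).
for n in (2, 10, 100):
    assert p_fake_given_heads(n) == Fraction(2, n + 1)
print(p_fake_given_heads(10))  # 2/11
```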
(b) Suppose you continue flipping the coin for a total of k times after picking it and
see k heads. Now what is the conditional probability that you picked the fake
coin?
Now there are $2^k n$ atomic events:
(i) there are n ways to pick a coin,
(ii) there are k flips of the picked coin. Each flip can result in one of 2 outcomes (although with the fake coin, the results of the flip are indistinguishable). As a result there are $2^k$ possible outcomes for k flips.
(iii) so there are $2^k n$ atomic events in the domain (outcomes of flips × ways of picking a coin).
Of these $2^k n$ atomic events, $2^k$ pick the fake coin, and $2^k + (n-1)$ result in k heads. So the probability of a fake coin given a run of k heads, $P(\mathit{fake} \mid \mathit{heads}^k)$, is $\frac{2^k}{2^k + (n-1)}$. Note this approaches 1 as k increases, as expected.
Doing it the formal way:
$$\mathbf{P}(\mathit{Fake} \mid \mathit{heads}^k) = \alpha \mathbf{P}(\mathit{heads}^k \mid \mathit{Fake}) \mathbf{P}(\mathit{Fake})$$
$$= \alpha \langle 1.0, 0.5^k \rangle \left\langle \frac{1}{n}, \frac{n-1}{n} \right\rangle = \alpha \left\langle \frac{1}{n}, \frac{n-1}{2^k n} \right\rangle$$
Let us compute $\alpha$:
$$\frac{1}{n} + \frac{n-1}{2^k n} = \frac{2^k + (n-1)}{2^k n}$$
By definition $\alpha = \frac{2^k n}{2^k + (n-1)}$.
Plugging $\alpha = \frac{2^k n}{2^k + (n-1)}$ back into our equation:
$$\mathbf{P}(\mathit{Fake} \mid \mathit{heads}^k) = \left\langle \frac{2^k}{2^k + (n-1)}, \frac{n-1}{2^k + (n-1)} \right\rangle$$
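
The normalisation step for k heads can be sketched in Python with exact fractions, which also shows the posterior approaching 1 as k grows. The function name and the choice of n = 10 are illustrative assumptions.

```python
from fractions import Fraction

def posterior_after_k_heads(n, k):
    """Return (P(fake | k heads), P(not fake | k heads)) by normalisation."""
    unnorm_fake = Fraction(1, n)                          # 1.0 * 1/n
    unnorm_normal = Fraction(1, 2) ** k * Fraction(n - 1, n)
    alpha = 1 / (unnorm_fake + unnorm_normal)             # normalising constant
    return alpha * unnorm_fake, alpha * unnorm_normal

for k in (1, 5, 10):
    p_fake, p_normal = posterior_after_k_heads(10, k)
    print(k, p_fake)
# 1 2/11
# 5 32/41
# 10 1024/1033
```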
(c) Suppose you wanted to decide whether the chosen coin was fake by flipping it
k times. The decision procedure returns FAKE if all k flips come up heads,
otherwise it returns NORMAL. What is the (unconditional) probability that this
procedure makes an error?
The procedure makes an error if and only if a fair coin is chosen and
turns up heads k times in a row. The probability of this is:
$$P(\mathit{heads}^k \wedge \neg \mathit{fake}) = P(\mathit{heads}^k \mid \neg \mathit{fake}) P(\neg \mathit{fake}) = 0.5^k \times \frac{n-1}{n} = \frac{n-1}{2^k n}$$
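
The error probability can be checked against a simulation of the decision procedure. The sketch below is illustrative only: the function names, the trial count and the fixed seed are assumptions, and the Monte Carlo estimate will only approximate the exact value.

```python
import random
from fractions import Fraction

def error_probability(n, k):
    """Exact P(normal coin chosen and k heads in a row) = (n-1) / (2^k n)."""
    return Fraction(n - 1, 2**k * n)

def simulate_error_rate(n, k, trials=100_000, seed=0):
    """Monte Carlo estimate of how often the procedure answers wrongly."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        is_fake = rng.randrange(n) == 0  # coin 0 is the fake one
        # The fake coin always shows heads; a normal coin needs k fair heads.
        all_heads = is_fake or all(rng.random() < 0.5 for _ in range(k))
        says_fake = all_heads            # procedure returns FAKE on k heads
        errors += says_fake != is_fake
    return errors / trials

print(error_probability(10, 3))  # 9/80
print(simulate_error_rate(10, 3))
```

The procedure never errs when the fake coin is chosen, so every error comes from a normal coin that happens to show k heads.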
7. Suppose you are a witness to a nighttime hit-and-run accident involving a taxi in
Athens. All taxis in Athens are blue or green. You swear, under oath, that the taxi
was blue. Extensive testing shows that, under the dim lighting conditions, discrimination between blue and green is 75% reliable.
(a) Is it possible to calculate the most likely color for the taxi? (Hint: distinguish
carefully between the proposition that the taxi is blue and the proposition that it
appears blue.)
The relevant aspect of the world can be described by two random variables: B means the taxi was blue, and LB means the taxi looked blue. The information on the reliability of color identification can be written as
$P(LB \mid B) = 0.75$ and $P(\neg LB \mid \neg B) = 0.75$
We need to know the probability that the taxi was blue, given that it looked blue:
$P(B \mid LB) \propto P(LB \mid B) P(B) = 0.75\,P(B)$
$P(\neg B \mid LB) \propto P(LB \mid \neg B) P(\neg B) = 0.25\,(1 - P(B))$
Thus we cannot decide the probability without some information about the prior probability of blue taxis, $P(B)$. For example, if we knew that all taxis were blue, i.e., $P(B) = 1$, then obviously $P(B \mid LB) = 1$. On the other hand, if we adopt Laplace's Principle of Indifference, which states that propositions can be deemed equally likely in the absence of any differentiating information, then we have $P(B) = 0.5$ and $P(B \mid LB) = 0.75$. Usually we will have some differentiating information, so this principle does not apply.
(b) What about now, given that 9 out of 10 Athenian taxis are green?
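Given that 9 out of 10 taxis are green, and assuming the taxi in question is drawn randomly from the taxi population, we have $P(B) = 0.1$. Hence
$P(B \mid LB) \propto 0.75 \times 0.1 = 0.075$
$P(\neg B \mid LB) \propto 0.25 \times 0.9 = 0.225$
$P(B \mid LB) = \frac{0.075}{0.075 + 0.225} = 0.25$
$P(\neg B \mid LB) = \frac{0.225}{0.075 + 0.225} = 0.75$
So, despite the testimony, the taxi is more likely to have been green.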
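
How the answer depends on the prior can be seen by normalising the two unnormalised posteriors for different values of P(B). A minimal sketch; the function and parameter names are illustrative, not from the original solution.

```python
def p_blue_given_looks_blue(prior_blue, reliability=0.75):
    """Normalise the unnormalised posteriors for B and not-B."""
    unnorm_blue = reliability * prior_blue
    unnorm_green = (1.0 - reliability) * (1.0 - prior_blue)
    return unnorm_blue / (unnorm_blue + unnorm_green)

print(round(p_blue_given_looks_blue(0.5), 2))  # 0.75  (Principle of Indifference)
print(round(p_blue_given_looks_blue(0.1), 2))  # 0.25  (9 out of 10 taxis green)
```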