Probability, Bayes’ Theorem
and the Monty Hall Problem
Probability Distributions
• A random variable is a variable whose value is uncertain.
• For example, the height of a randomly selected person in this class
is a random variable – I won’t know its value until the person is
selected.
• Note that we are not completely uncertain about most random
variables.
– For example, we know that height will probably be in the 5’-6’ range.
– In addition, 5’6” is more likely than 5’0” or 6’0” (for women).
• The function that describes the probability of each possible value of
the random variable is called a probability distribution.
Probability Distributions
• Probability distributions are closely related to frequency
distributions.
Probability Distributions
• Dividing each frequency by the total number of scores
and multiplying by 100 yields a percentage distribution.
Probability Distributions
• Dividing each frequency by the total number of scores
yields a probability distribution.
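A minimal sketch of this step in Python (not part of the original slides; the frequency counts are made-up, illustrative values):

```python
# Convert a frequency distribution into a probability distribution by
# dividing each count by the total number of scores.
# The counts below are made-up, illustrative values.
freq = {"Ontario": 360, "Quebec": 140, "Alberta": 87, "BC": 106, "Other": 307}

total = sum(freq.values())
prob = {province: count / total for province, count in freq.items()}

print(prob)                   # each value is now a probability
print(sum(prob.values()))     # the probabilities sum to 1.0
```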
Probability Distributions
• For a discrete distribution, the probabilities over all
possible values of the random variable must sum to 1.
Probability Distributions
• For a discrete distribution, we can talk about the probability of a particular
score occurring, e.g., p(Province = Ontario) = 0.36.
• We can also talk about the probability of any one of a subset of scores
occurring, e.g., p(Province = Ontario or Quebec) = 0.50.
• In general, we refer to these occurrences as events.
Probability Distributions
• For a continuous distribution, the probabilities over all possible
values of the random variable must integrate to 1 (i.e., the area
under the curve must be 1).
• Note that the height of a continuous distribution can exceed 1!
[Figure: three normal curves with shaded areas of 0.683, 0.954 and 0.997 (the areas within ±1, ±2 and ±3 standard deviations).]
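A minimal sketch (not from the slides) showing how a narrow continuous density can have a height greater than 1 while its area remains 1; the mean and standard deviation are arbitrary illustrative values:

```python
import numpy as np

# Normal density with a small standard deviation: its peak exceeds 1,
# but the area under the curve is still 1.
mu, sigma = 0.0, 0.1
x = np.linspace(-1, 1, 20001)
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print(pdf.max())           # ~3.99 -- the height of the density exceeds 1
print(np.trapz(pdf, x))    # ~1.0  -- but the total area is 1
```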
Continuous Distributions
• For continuous distributions, it does not make sense to talk about the
probability of an exact score.
– e.g., what is the probability that your height is exactly 65.485948467… inches?
[Figure: Normal approximation to the probability distribution for the height of Canadian females (parameters from the General Social Survey, 1991); μ = 5'3.8", σ = 2.6". x-axis: Height (in), 55 to 75; y-axis: Probability p.]
Continuous Distributions
• It does make sense to talk about the probability of observing a score that falls
within a certain range.
– e.g., what is the probability that you are between 5’3” and 5’7”?
– e.g., what is the probability that you are less than 5’10”?
[Figure: the same normal approximation for the height of Canadian females (General Social Survey, 1991; μ = 5'3.8", σ = 2.6"), with the ranges above shaded and labelled as valid events. x-axis: Height (in); y-axis: Probability p.]
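A short sketch (not from the slides) answering the two questions above with the normal approximation from the figure, i.e. mu = 63.8 in (5'3.8") and sigma = 2.6 in, assuming SciPy is available:

```python
from scipy.stats import norm

mu, sigma = 63.8, 2.6   # parameters from the figure above (inches)

# p(5'3" <= height <= 5'7") = F(67) - F(63), where F is the normal CDF
p_between = norm.cdf(67, mu, sigma) - norm.cdf(63, mu, sigma)

# p(height < 5'10") = F(70)
p_below = norm.cdf(70, mu, sigma)

print(f"p(63 in <= height <= 67 in) is about {p_between:.3f}")
print(f"p(height < 70 in) is about {p_below:.3f}")
```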
Probability of Combined Events
Let p(A) represent the probability of event A.
0 ≤ p(A) ≤ 1
If A and B are disjoint (mutually exclusive) events, then
p(A or B) = p(A) + p(B)
Example: in the context of the Community Health Survey:
Let A represent the event that the respondent lives in Alberta.
Let B represent the event that the respondent lives in BC.
Then p(A) = 0.087
p(B) = 0.106
p(A or B) = 0.193
Probability of Combined Events
More generally, if A and B are not mutually exclusive,
p(A or B) = p(A) + p(B) − p(A and B)
Example: Canadian Community Health Survey, Sleeping Habits
Let A = event that respondent sleeps less than 6 hours per night.
Let B = event that respondent reports trouble sleeping most or all of the time.
p(A) = 0.139
p(B) = 0.152
p(A and B) = 0.061
Thus
p(A or B) = 0.139 + 0.152 − 0.061 = 0.230
Exhaustive Events
• Two or more events are said to be exhaustive if at least
one of them must occur.
• For example, if A is the event that the respondent sleeps
less than 6 hours per night and B is the event that the
respondent sleeps at least 6 hours per night, then A and
B are exhaustive.
• (Although A is probably the more exhausted!!)
Independence
Two events are independent if the occurrence of one
in no way affects the probability of the other.
If events A and B are independent, then
p(A and B) = p(A)p(B)
If events A and B are not independent, then
p(A and B) = p(A)p(B | A)
Example: pick a card, any card.
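A minimal sketch of the card example (the specific events are my own illustrative choices, using a standard 52-card deck):

```python
from fractions import Fraction

# Independent events within a single draw:
# A = the card is a heart, B = the card is a face card (J, Q, K).
p_A = Fraction(13, 52)
p_B = Fraction(12, 52)
p_A_and_B = Fraction(3, 52)        # 3 cards are both hearts and face cards
assert p_A_and_B == p_A * p_B      # independence: p(A and B) = p(A)p(B)

# Dependent events across two draws without replacement:
# A = the first card is a heart, B = the second card is a heart.
p_A = Fraction(13, 52)
p_B_given_A = Fraction(12, 51)     # one heart has already been removed
p_A_and_B = p_A * p_B_given_A      # p(A and B) = p(A)p(B | A)
print(p_A_and_B)                   # 1/17, not (13/52)**2 = 1/16
```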
An Example: The Monty Hall Problem
Problem History
• When the problem first appeared in Parade, approximately
10,000 readers, including 1,000 PhDs, wrote in claiming
the solution was wrong.
• In a study of 228 subjects, only 13% chose to switch.
Intuition
• Before Monty opens any doors, there is a 1/3 probability
that the car lies behind the door you selected (Door 1),
and a 2/3 probability it lies behind one of the other two
doors.
• Thus with 2/3 probability, Monty will be forced to open a
specific door (e.g., the car lies behind Door 2, so Monty
must open Door 3).
• This concentrates all of the 2/3 probability in the
remaining door (e.g., Door 2).
Analysis
The player initially picks Door 1.
• Car hidden behind Door 1 (probability 1/3): the host opens either Door 2 or
Door 3 (probability 1/6 each). Switching loses with probability 1/6 + 1/6 = 1/3.
• Car hidden behind Door 2 (probability 1/3): the host must open Door 3.
Switching wins with probability 1/3.
• Car hidden behind Door 3 (probability 1/3): the host must open Door 2.
Switching wins with probability 1/3.
Overall, switching wins with probability 2/3 and loses with probability 1/3.
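A small simulation (a sketch, not part of the original slides) confirms these numbers: always switching wins about 2/3 of the time, always staying about 1/3.

```python
import random

def play(switch, n_trials=100_000, seed=0):
    """Simulate the Monty Hall game; return the fraction of trials won."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        car = rng.randrange(3)      # door hiding the car (0, 1 or 2)
        pick = 0                    # player always picks Door 1 (index 0)
        # Monty opens a door that is neither the player's pick nor the car.
        # (When both remaining doors hide goats he could open either; taking
        # the first does not change the overall win rate of stay vs. switch.)
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / n_trials

print(f"stay:   {play(switch=False):.3f}")   # about 0.333
print(f"switch: {play(switch=True):.3f}")    # about 0.667
```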
Notes
• It is important that
– Monty must open a door that reveals a goat
– Monty cannot open the door you selected
• These rules mean that your choice may constrain what
Monty does.
– If you initially selected a door concealing a goat, then there is
only one door Monty can open.
• One can rigorously account for the Monty Hall problem
using a Bayesian analysis.
End of Lecture 2
Sept 17, 2008
Conditional Probability
• To understand Bayesian inference, we first need to understand the
concept of conditional probability.
• What is the probability I will roll a 12 with a pair of (fair) dice?
• What if I first roll one die and get a 6? What now is the probability
that when I roll the second die they will sum to 12?
Let A be the state of die 1.
Let B be the state of die 2.
Let C be the sum of dice 1 and 2.
p(A = 6) = __?
p(B = 6) = __?
p(C = 12) = __?
p(C = 12 | A = 6) = __?   (“the probability of C given A”)
Conditional Probability
• The conditional probability of A given B is the joint
probability of A and B, divided by the marginal
probability of B.
p(A | B) = p(A, B) / p(B)
• Thus if A and B are statistically independent,
p(A | B) = p(A, B) / p(B) = p(A)p(B) / p(B) = p(A).
• However, if A and B are statistically dependent, then
p(A | B) ≠ p(A).
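A brief sketch (not from the slides) verifying this definition on the earlier dice example by enumerating all 36 equally likely outcomes:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over (die1, die2)."""
    return Fraction(sum(event(a, b) for a, b in outcomes), len(outcomes))

p_A6         = prob(lambda a, b: a == 6)                    # p(A = 6)  = 1/6
p_C12        = prob(lambda a, b: a + b == 12)               # p(C = 12) = 1/36
p_C12_and_A6 = prob(lambda a, b: a == 6 and a + b == 12)    # joint probability

# p(C = 12 | A = 6) = p(C = 12, A = 6) / p(A = 6)
p_C12_given_A6 = p_C12_and_A6 / p_A6
print(p_A6, p_C12, p_C12_given_A6)    # 1/6 1/36 1/6
```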
Bayes’ Theorem
• Bayes’ Theorem is simply a consequence of the
definition of conditional probabilities:
p(A | B) = p(A, B) / p(B)   ⟹   p(A, B) = p(A | B) p(B)
p(B | A) = p(A, B) / p(A)   ⟹   p(A, B) = p(B | A) p(A)
Thus p(A | B) p(B) = p(B | A) p(A)
⟹   p(A | B) = p(B | A) p(A) / p(B)   (Bayes’ Equation)
Bayes’ Theorem
• Bayes’ theorem is most commonly used to estimate the
state of a hidden, causal variable H based on the
measured state of an observable variable D:
p(H | D) = p(D | H) p(H) / p(D)
where p(H | D) is the posterior, p(D | H) is the likelihood, p(H) is the prior,
and p(D) is the evidence.
Bayesian Inference
• Whereas the posterior p(H|D) is often difficult to estimate
directly, reasonable models of the likelihood p(D|H) can
often be formed. This is typically because H is causal on
D.
• Thus Bayes’ theorem provides a means for estimating
the posterior probability of the causal variable H based
on observations D.
Marginalizing
• To calculate the evidence p(D) in Bayes’ equation, we
typically have to marginalize over all possible states of
the causal variable H.
p(H | D) = p(D | H) p(H) / p(D)
p(D) = p(D, H1) + p(D, H2) + … + p(D, Hn)
     = p(D | H1) p(H1) + p(D | H2) p(H2) + … + p(D | Hn) p(Hn)
The Full Monty
• Let’s get back to The Monty Hall Problem.
• Let’s assume you initially select Door 1.
• Suppose that Monty then opens Door 2 to reveal a goat.
• We want to calculate the posterior probability that a car
lies behind Door 1 after Monty has provided these new
data.
The Full Monty
Let Ci represent the state that the car lies behind Door i, i ∈ {1, 2, 3}.
Let Mi represent the event that Monty opens Door i, i ∈ {1, 2, 3}, revealing a goat.
We seek p(C1 | M2) = p(M2 | C1) p(C1) / p(M2)
The Full Monty
Since p(C2 | M2) = 0, we can obtain p(C3 | M2) by subtracting p(C1 | M2) from 1.
(Remember that the probabilities of mutually exclusive, exhaustive events add to 1!)
However, we can also calculate p(C3 | M2) directly:
p(C3 | M2) = p(M2 | C3) p(C3) / p(M2)
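A short sketch of this calculation (assuming, as in the analysis above, that Monty opens Door 2 or Door 3 at random when the car is behind the player's Door 1):

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind any of the three doors.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihoods p(M2 | Ci): probability Monty opens Door 2 given the player
# picked Door 1 and the car is behind Door i.
likelihood_M2 = {1: Fraction(1, 2),   # Monty may open Door 2 or Door 3
                 2: Fraction(0),      # Monty never reveals the car
                 3: Fraction(1)}      # Door 3 hides the car, so Door 2 is forced

# Evidence: p(M2) = sum over i of p(M2 | Ci) p(Ci)  (marginalizing over Ci)
p_M2 = sum(likelihood_M2[i] * prior[i] for i in (1, 2, 3))

# Posteriors via Bayes' theorem: p(Ci | M2) = p(M2 | Ci) p(Ci) / p(M2)
for i in (1, 2, 3):
    print(f"p(C{i} | M2) = {likelihood_M2[i] * prior[i] / p_M2}")
# Prints 1/3, 0 and 2/3: switching to Door 3 doubles your chance of winning.
```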
But we’re not on Let’s Make a Deal!
• Why is the Monty Hall Problem interesting?
– It reveals limitations in human cognitive processing of uncertainty.
– It provides a good illustration of many concepts of probability.
– It gets us to think more carefully about how we deal with and
express uncertainty as scientists.
• What else is Bayes’ theorem good for?
Clinical Example
• Christiansen et al. (2000) studied the mammogram results of 2,227
women at health centers of Harvard Pilgrim Health Care, a large
HMO in the Boston metropolitan area.
• The women received a total of 9,747 mammograms over 10 years.
Their ages ranged from 40 to 80. Ninety-three different radiologists
read the mammograms, and overall they diagnosed 634
mammograms as suspicious that turned out to be false positives.
• This is a false positive rate of 6.5%.
• The false negative rate has been estimated at 10%.
Clinical Example
• There are about 58,500,000 women between the ages of
40 and 80 in the US.
• The incidence of breast cancer in the US is about
184,200 per year, i.e., roughly 1 in 318.
Clinical Example
Let C0 represent the absence of cancer.
Let C1 represent the presence of cancer.
Let M0 represent a negative mammogram result.
Let M1 represent a positive mammogram result.
Suppose your friend receives a positive mammogram result.
What quantity do you want to compute?
Remember: p(C1 | M1) ≠ p(M1 | C1)!
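One way to compute it, sketched below with the figures from the preceding slides (treating the incidence of roughly 1 in 318 as the prior p(C1), the 6.5% false positive rate as p(M1 | C0), and the 10% false negative rate as p(M0 | C1); these identifications are assumptions for illustration):

```python
# Bayes' theorem applied to the mammogram example.
p_C1 = 1 / 318                # prior probability of cancer (assumed from incidence)
p_C0 = 1 - p_C1
p_M1_given_C1 = 1 - 0.10      # sensitivity: 1 - false negative rate
p_M1_given_C0 = 0.065         # false positive rate

# Evidence: marginalize over cancer present / absent.
p_M1 = p_M1_given_C1 * p_C1 + p_M1_given_C0 * p_C0

# Posterior probability of cancer given a positive mammogram.
p_C1_given_M1 = p_M1_given_C1 * p_C1 / p_M1
print(f"p(C1 | M1) is about {p_C1_given_M1:.3f}")   # roughly 0.04, far below p(M1 | C1) = 0.9
```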