Download Bayes` theorem

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Randomness wikipedia , lookup

Probability box wikipedia , lookup

Birthday problem wikipedia , lookup

Dempster–Shafer theory wikipedia , lookup

Infinite monkey theorem wikipedia , lookup

Ars Conjectandi wikipedia , lookup

Inductive probability wikipedia , lookup

Probability interpretations wikipedia , lookup

Doomsday argument wikipedia , lookup

Transcript
The end of the world
and life in a
computer simulation
1
The doomsday argument
In the
Leslie
introduces
familiar
reasoning:
Leslie
begreading
ins his for
exptoday,
osition
of the
argumaent
withsort
an of
exa
mple:
This The
reasbasic
oningidea
is, here
fairlyis clea
a we
kind
of reas
one rly,
which
employ
allonin
the time
in nour
ordinary
the world.
It ht
g upo
whi
ch we reasoning
rely all ofabout
the tim
e. It mig
be sum
med
like this
might
beup
summed
up :asiffollows:
we have two theories, and the first makes a certain event much mor
e
likely than the other, and we observe that event, that should lead
us to favor the first theory.
Note that this doesn’t mean that The
we principle
should alw
of ays
confirmation
think that the first theory is true; rather,
what it means is that, whatever our initial estimate of the probab
ility of the first theory, we
should increase that estimat
E eis upo
evidence
T1
if the
probability
of
n obsfor
ervi
ngover
theT2eve
nt in
question.
E given T1 > the probability of E given T2.
Now consider the application of that line of reasoning to the exa
mples of balls shot at random
from Here
a lott
mac
hine. Sup
e tha
youtwokno
w thabetween
t the ball
E ery
is our
evidence,
and pos
T1 and
T2tare
theories
which
aremac
trying
When
s inwe
the
hintoe decide.
are num
bered
sequenti
ally
(wi
th
no
rep
eats) ofbeg
inning
we talk about the probability
X given
Y, wit
we h
are1,talking
about
the don
probability
that
X will
take
place,
if re
but tha
t you
’t know
how
man
y ball
s the
are inYthe
hine. Now we start the machine, and a ball comes out with ‘3’ on
alsomac
happens.
it. You’re now
asked: do you think that it is more likely that the machine has 10
balls in it, or 10,000 balls in
it? The line of reasoning sketched above seems to favor the hyp
othesis that there are just 10
balls in the machine. (Note that, whichever hypothesis you endorse
, you could be wrong; this
is not a form of reasoning which delivers results guaranteed to be
correct. The question is just
which of these hypotheses is most likely, given the evidence.)
The basic idea here is one which we employ all the time in our ordinary reasoning about the world. It
might be summed up as follows:
The principle of confirmation
E is evidence for T1 over T2 if the probability of
E given T1 > the probability of E given T2.
Here E is our evidence, and T1 and T2 are two theories between which we are trying to decide. When
we talk about the probability of X given Y, we are talking about the probability that X will take place, if
Y also happens.
Now consider the application of that line of reasoning to the examples of balls shot at random from a lottery
machine. Suppose that you know that the balls in the machine are numbered sequentially (with no repeats)
beginning with 1, but that you don't know how many balls there are in the machine. Now we start the
machine, and a ball comes out with “3” on it. You're now asked: do you think that it is more likely that the
machine has 10 balls in it, or 10,000 balls in it?
The principle of confirmation suggests - correctly, it seems - that our piece of evidence favors the hypothesis
that there are just 10 balls in the machine. This is because the probability that “3” comes out given that balls
1-10 are in the machine is 10%, whereas the probability that this ball comes out given that balls numbered
1-10,000 are in the machine is only 0.01%.
(Note that, whichever hypothesis you endorse, you could be wrong; this is not a form of reasoning which
delivers results guaranteed to be correct. The question is just which of these hypotheses is most likely, given
the evidence.)
This gives us some information about how to reason about probabilities, but not very much. It tells us when
evidence favors one theory over another, but does not tell us how much. It leaves unanswered questions like:
if before I thought that the 10,000 ball hypothesis was 90% likely to be true, should I now think that the 10 ball
hypothesis is more than 50% likely to be true? If I assigned each of the two hypotheses prior to the
emergence of the “3” ball a probability of 0.5 (50% likely to be true), what probabilities should I assign to the
theories after the ball comes out?
This gives us some information about how to reason about probabilities, but not very much. It tells us when
evidence favors one theory over another, but does not tell us how much. It leaves unanswered questions like:
if before I thought that the 10,000 ball hypothesis was 90% likely to be true, should I now think that the 10 ball
hypothesis is more than 50% likely to be true? If I assigned each of the two hypotheses prior to the
emergence of the “3” ball a probability of 0.5 (50% likely to be true), what probabilities should I assign to the
theories after the ball comes out?
1.1
Bay
To understand Leslie’s argument, we’ll have to understand how to answer these sorts of questions.
In fact, we can do better than just saying that i
One way to answer these questions employs a widely
assign to one theory. We can, using a widely ac
accepted rule of reasoning called “Bayes’ theorem,”
much
you
should
raise
your probabilit
namedsay
after how
Thomas
Bayes,
an 18th
century
English
mathematician
Presbyterian
minister.
widelyand
accepted
is that
following it enables one
To arrive
the theorem,
we begin with
the following
Toat arrive
at Bayes’
theorem,
we can begin w
definition of conditional probability, where, as is
probability’:
probability
of one
standard,
we abbreviate the
“the conditional
probability
of xclaim, given
claims
and b, we can say that
given y”
as “Pr(xa| y)”:
P (a|b) =
P (a&b)
P (b)
This says,
in effect,words,
that the probability
of a given of
b isa given b is t
In other
the probability
the chance that a and b are both true, divided by the
the chances that b is true. For example, let a =
chances that b is true.
that each of Obama, Hilary, and McCain have
probability is that Obama wins, given that a m
that a man would win, you should then (given
there is a 0.5 probability that Obama will win.
In fact, we can do better than just saying that in
One way to answer these questions employs a widely
assign
one theory.
We can,
using a widely acc
accepted
rule ofto
reasoning
called “Bayes’
theorem,”
much
you
should
raise
your probability
namedsay
after how
Thomas
Bayes,
an 18th
century
English
mathematician
Presbyterian
minister.
widelyand
accepted
is that
following it enables one t
To arrive at the theorem, we begin with the following
To arrive at Bayes’ theorem, we can begin wit
definition of conditional probability, where, as is
probability’:
probability
of one
standard,
we abbreviate the
“the conditional
probability
of xclaim, given t
given y”
as “Pr(xa| y)”:
claims
and b, we can say that
P (a|b) =
P (a&b)
P (b)
This says, in effect, that the probability of a given b is
In other
words,
the probability
the chance
that a and
b are both
true, divided byof
thea given b is the
the
chances
thatchances
b is true. that b is true. For example, let a = ‘O
that each of Obama, Hilary, and McCain have a
Let’s work through an example. Suppose that this is some time before the 2008 election, and let a =
probability
isofthat
Obama
wins,
given that a man
‘Obama wins’, and let b = ‘a man wins.’ Suppose that you
think that each
Obama,
Hilary, and
McCain
that
a manthat
would
you that
should
have a 1/3 chance of winning. Then what is the conditional
probability
Obamawin,
wins, given
a manthen (given
wins, using the above formula?
there is a 0.5 probability that Obama will win.
The conditional probability is that Obama wins, given that a man wins, is ½, since in this case Pr(a&b)=⅓
Using this definition of conditional probability,
and Pr(b)=⅔. Intuitively, if you found out only that a man would win, you should then (given the initial
6= 0:
probability assignments) think that there is a 0.5 probability
that Obama will win.
P (a&b)
Using this definition of conditional probability, we can then argue
Bayes’ =
theorem
as follows, assuming
1. for
P (a|b)
P (b)
that Pr(b)≠0.
P (a&b)
2. P (b|a) = P (a)
3. P (a|b) ⇤ P (b) = P (a&b)
w
To arrive at Bayes’ theorem, we can begin wit
t
To arrive at Bayes’ theorem, we can begin with the definition of what is called ‘conditional
probability’:
probability
of one
claim, given
probability’: the probability of one claim, given
that another the
is true.
In particular,
for arbitrary
claims a and b, we can say that
claims a and b, we can say that
Definition of conditional probability
P (a|b) =
P (a&b)
P (b)
P (a|b) =
P (a&b)
P (b)
In other words, the probability of a given b is the chance that a and b are both true, divided by
other wins’,
words,
of aSuppose
given b is the
the chances that b is true. For example, let a =In‘Obama
andthe
let probability
b = ‘a man wins.’
that each of Obama, Hilary, and McCain
have
a 1/3
chance
of winning.
the
conditional
the
chances
that
b probability,
is true.Then
For
example,
Using this
definition
of conditional
we can
then
arguelet
for a = ‘O
probability is that Obama wins, givenBayes’
that a
man
is of
1/2.
Intuitively,
if you and
foundMcCain
out only have a
theorem
as
follows,
assuming
that
Pr(b)≠0.
thatwins,
each
Obama,
Hilary,
that a man would win, you should then (given the initial probability assignments) think that
probability is that Obama wins, given that a man
there is a 0.5 probability that Obama will win.
that a man would win, you should then (given
Using this definition of conditional probability,there
we can
argue
as follows, that
assuming
that P
(b) win.
isthen
a 0.5
probability
Obama
will
6= 0:
Derivation of Bayes’ theorem
1. P (a|b) =
2.
3.
4.
5.
C.
P (a&b)
P (b)
P (a&b)
P (a)
P (b|a) =
P (a|b) ⇤ P (b) = P (a&b)
P (a&b) = P (b|a) ⇤ P (a)
P (a|b) ⇤ P (b) = P (b|a) ⇤ P (a)
P (a)
P (a|b) = P (b|a)
P (b)
Using this definition of conditional probability, w
def. of conditional probability
6= 0:
def. of conditional probability
(1), multiplication
by =’;s
1. P (a|b)
= PP(a&b)
(b)
(2), multiplication
by =’s
P (a&b)
2. P (b|a) = P (a)
(3),(4)
division
by =’s
3. P (a|b) ⇤ (5),
P (b)
= P (a&b)
4. P (a&b) = P (b|a) ⇤ P (a)
This conclusion is Bayes’ theorem. Often, what we want to know is, intuitively, the probability
(a|b)
⇤is P
(b)
=theorem
P (b|a)
⇤ Pthe
(a)probability
Theofconclusion
of this argument
is Bayes’
what
it says write
that
if we
want
to know
some hypothesis
‘h’ given
some theorem.
evidenceIntuitively,
‘e’; then5.
wePwould
the
as:
(b|a)
P (a)
of some theory given a bit of evidence, what we need to knowC.
arePthree
things:
the probability
of the evidence
(a|b)
= P(1)
P (b)
given the theory (i.e.,Phow
likely
(h) P
(e|h)the evidence is to happen if the theory is true), (2) the prior probability of the theory,
P prior
(h|e)probability
=
and (3) the
of the evidence.
P (e)
This conclusion is Bayes’ theorem. Often, what
of some
‘h’ given
some
evidence ‘e’; t
Consider what this would say about the example
of thehypothesis
lottery machine.
Suppose
for simplicity
a man
would win,
then (given
the
Using this definition ofthat
conditional
probability,
we canyou
then should
argue as follows,
assuming
thatinitial
P (b)
there isDerivation
a 0.5 probability
that Obama will win.
6= 0:
of Bayes’ theorem
1. P (a|b) =
2.
3.
4.
5.
C.
P (a&b)
P (b)
P (a&b)
P (a)
prob
def. of conditional
probability
Using this definition of conditional
probability,
we can then argu
def. of conditional probability
6= 0:
P (b|a) =
P (a|b) ⇤ P (b) = P (a&b)
P (a&b) = P (b|a) ⇤ P (a)
1. P (a|b)
P (a|b) ⇤ P (b) = P (b|a) ⇤ P (a)
P (a) 2. P (b|a)
P (a|b) = P (b|a)
P (b)
=
P (a&b)
P (b)
P (a&b)
P (a)
(1), multiplication by =’;s
(2), multiplication by =’s
(3),(4)
(5), division by =’s
def. of con
=
def. of con
3. P (a|b) ⇤ P (b) = P (a&b)
(1), m
This conclusion is Bayes’ theorem. Often, what we want to know is, intuitively, the probability
4.
P (a&b)
P then
(b|a)what
⇤ would
Pit (a)
(2), m
Theofconclusion
of this argument
is Bayes’
theorem.
Intuitively,
says write
is that the
if wetheorem
want to know
some hypothesis
‘h’ given
some
evidence=
‘e’;
we
as: the probability
of some theory given a bit of evidence,
what
we need
know
three things:
(1) the probability of the evidence
5. P
(a|b)
⇤ P to(b)
= are
P (b|a)
⇤ P (a)
given the theory (i.e.,Phow
likely
if the
(h) P
(e|h)the evidence is to happen
P (b|a)
P theory
(a) is true), (2) the prior probability of the theory,
P prior
(h|e)probability
=
C. P (a|b) =
and (3) the
of the evidence.
P (e)
P (b)
If we let “h” stand for the hypothesis to be tested and “e” for the relevant evidence, then the theorem can be
Thissayconclusion
is Bayes’
Often,
whatforwesimplicity
want to
Consider
what this would
about the example
of thetheorem.
lottery machine.
Suppose
rewritten
as follows:
kno
that you know going in
there
are only two‘h’
options,
equally likely
be correct:
ofthat
some
hypothesis
givenwhich
someareevidence
‘e’; to
then
we would w
that there are 10 balls in the machine, and that there are 10,000. Let e be the evidence that the
Bayes’ theorem
first ball to come out is #3, and let h be the hypothesis that there are 10 balls in the machine.
Then we might say:
P (h|e) = P (h) P (e|h)
P (e)
P (h) = 0.5
This theorem is very useful, since often it is easy to figure out the conditional probability of the evidence given the
Consider
whatprobability
this would
say about
example of the lottery
theory, butPvery
to figure
out the conditional
of the theory
given thethe
evidence.
(e|h)hard
= 0.1
you
know
P (e) = 0.5(0.1 +that
0.0001)
= 0.05005
going in that there are only two options, which
A good example of this sort of situation is given by our example of the lottery balls.
that there are 10 balls in the machine, and that there are 10,000
0.5 0.1
Then we find, via Bayes’
theorem,
thatcome
P (h|e)out
= 0.05005
= 0.999.
So, on
the the
basis hypothesis
of the evidencethat th
first
ball to
is #3,
and let
h be
that the first ball to come
out was
you say:
should revise your confidence in the 10-ball hypothesis
Then
we #3,
might
from 50% to 99.9% certainty.
of
some
given
some
‘e’; then we would w
there
is a hypothesis
0.5 probability‘h’
that
Obama
willevidence
win.
Bayes’ theorem
Using this definition
of conditional probability, we can then argue as follows,
P (h) P (e|h)
6= 0: P (h|e) =
P (e)
P (a&b)
1.
P
(a|b)
=
def. of conditional prob
(b)
This theorem is very useful, since often it is easy toPfigure
out the conditional probability of the evidence given the
Consider
what
this
would
say about
example
the lottery
theory, but very hard to figure
out the
conditional
of the theory
given thethe
evidence.
2. P
(b|a)
= probability
def. ofof
conditional
prob
P (a)
P (a&b)
that3.you
know
going
in that there are only two(1),
options,
which
P (a|b)
⇤ P (b)
= P (a&b)
multiplication
A good example of this sort of situation is given by our example of the lottery balls.
P (a&b)
P (b|a)
⇤ Pin
(a)the machine, and that there
(2), multiplication
that4.there
are=10
balls
are 10,000
Remember that we have two hypotheses:
that⇤there
balls
labeled
1-10 in the machine, and that there are balls
5.ball
P (a|b)
P (b)are
=
P (b|a)
⇤ P (a)
first
to
come
out
is
#3,
and let h be the hypothesis that t
labeled 1-10,000 in the machine. Let’s assume that,
at the
start, we think that these hypotheses have equal
P (b|a)
P (a)
C.probability
P (a|b) =
(5),by
division
P (b) which sort of machine we are given is determined
probability: they are each assigned
0.5. (Maybe
Then
we
might
say:
coin flip.)
This conclusion is Bayes’ theorem. Often, what we want to know is, intuitiv
Now suppose that the first ball
comes
out, again‘h’
at given
random,
is a “3”.
Bayes’ theorem
us, given
the the theo
of that
some
hypothesis
some
evidence
‘e’; thencan
wetellwould
write
(h)
0.5 before us contains 10 rather than 10,000 balls. According to
foregoing information, how likely it is P
that
the=
machine
Bayesians, we figure this out by conditionalizing on our evidence, i.e. finding the conditional probability of the
P (h)
P (e|h)
(e|h)
0.1
theory given the evidence.
PP(h|e)
==
P (e)
Pcontains
(e) = 10
0.5(0.1
+ 0.0001)
Let “h” be the theory that the machine
balls. Then
to compute=
the0.05005
relevant conditional probability, we
need to know three values. Consider what this would say about the example of the lottery machine. Sup
that you know going in that there are only two options, which
are equally l
0.5 0.1
(h|e)are=10,000.
=e0.999
that there are 10 balls in the machine, andthat
thatPthere
be th
0.05005Let
theobviously,
ball
come
out
you should
revise
Next, we need to know Pr(e that
|first
h). Fairly
this
isis0.1.
ball
tofirst
come
outto
#3, and
letwas
h be#3,
the hypothesis
that
there your
are 10con
ba
Then
we
might
say: ofcertainty.
from
50%
to
99.9%
But we also need to know the
unconditional
probability
e, Pr(e). How should we figure this out?
First, we need to know Pr(h).Then
From the
know
that thistheorem,
is 0.5.
weabove,
find,wevia
Bayes’
A natural thought is that since
we knowtheorem
that each of the two hypotheses - 10 ball and 10,000 ball - have a
Bayes’
P (h) = 0.5 can be restated in the following way:
probability of 0.5, Pr(e) should be the mean of the conditional probabilities of e given those two hypotheses. In this
case:
P (e|h) = 0.1
P (e) = 0.5(0.1 + 0.0001) = 0.05005
0.5 0.1
2
4. P (a&b) = P (b|a) ⇤ (2),
4. P (a&b) = P (b|a) ⇤ P (a) of some
multiplication
by =’s‘e’; (2),
hypothesis ‘h’P (a)
given
some evidence
thenmultiplication
we would w
5. P (a|b) ⇤ P (b) = P (b|a) ⇤ P (a)
5. P (a|b) ⇤ P (b) = P (b|a) ⇤ P (a)
(3),(4)
Bayes’
theorem
P (b|a)
P (a)
C. P (a|b) =
(5), division
P (b|a) P (a)
P (b)
C. P (a|b) =
P (h) P (e|h) (5), division by =’s
P (b)
P (h|e) =
P (e)
This conclusion is Bayes’
theorem. Often, what we want to know is, intuitive
conclusion is Bayes’ theorem.of Often,
what we ‘h’
want
to know
is, intuitively,
the
some hypothesis
given
some evidence
‘e’; then
weprobability
would write the theor
Now suppose that the first ball that comes out, again at random, is a “3”. Bayes’ theorem can tell us, given the
me hypothesis
‘h’ given some evidence ‘e’; then we would write the theorem as:
foregoing information, how likely
it is that the
machine
before
us contains
10 ratherthe
than 10,000
balls. of
According
to
Consider
what
this
would
say about
example
the lottery
P (h)
(e|h)
Bayesians, we figure this out by conditionalizing
onPour
evidence, i.e. finding the conditional probability of the
P
(h|e)
=
that
you
know
going
in that there are only two options, which
P (e)
theoryP given
evidence.
(h) Pthe
(e|h)
P (h|e) =
P (e)
that there are 10 balls in the machine, and that there are 10,000
Let “h” be the theory that the machine contains 10 balls. Then to compute the relevant conditional probability, we
would
aboutand
the example
the lottery
machine.
Sup
first ballwhat
to this
come
outsay
is #3,
let h beofthe
hypothesis
that
t
need to know three values. Consider
thatthe
youexample
know going
in lottery
that there
are onlySuppose
two options,
which are equally li
sider what
this
would
say
about
of
the
machine.
for
simplicity
Then
we
might
say:
First, we need to know Pr(h).that
Fromthere
the above,
weballs
knowin
that
thismachine,
is 0.5.
are
10
the
and that
there
you know going in that there are only two options, which are equally
likely
toare
be10,000.
correct:Let e be the
ball
tothat
come
out isare
is0.1.
#3,
and let
h be
thethe
hypothesis
Next,
needin
to know
Pr(e |first
h). Fairly
this
there are
10weballs
the machine,
andobviously,
there
10,000.
Let
e be
evidencethat
thatthere
the are 10 ba
Then wePmight
say:
(h)
=
0.5 of e,that
ball to But
come
out
is
#3,
and
let
h
be
the
hypothesis
there
are 10weballs
we also need to know the unconditional probability
Pr(e).
How should
figurein
thisthe
out?machine.
n we might
say:
A natural thought is that since we know
each
the two hypotheses - 10 ball and 10,000 ball - have a
P that
(e|h)
=of0.1
P (h) = 0.5
probability of 0.5, Pr(e) should be the mean of the conditional probabilities of e given those two hypotheses. In this
P (e)==0.10.5(0.1 + 0.0001) = 0.05005
case:
P (e|h)
P (h) = 0.5
P (e|h) = 0.1
P (e) = 0.5(0.1 + 0.0001) = 0.05005
0.5 0.1
Then we find, via Bayes’ theorem, that P (h|e) = 0.05005
= 0.999
P (e) =
0.5(0.1
+ 0.0001)
0.5 0.1
Plugging
these
numbers =
into0.05005
Bayes’we
theorem,
we Bayes’
get:
Then
find,
via
theorem,
that
P
(h|e)
=
= 0.999.
So, your
on thecon
ba
0.05005
that the first ball to come out was #3, you
should
revise
that the first ball to come out was #3, you should revise your confidence in the
from 50% to 0.5
99.9%
0.1 certainty.
from 50% to 99.9%
certainty.
n we find, via Bayes’ theorem, that P (h|e) = 0.05005 = 0.999. So, on the basis of the evidence
the first ball to come out was #3,
you should
revise your
confidence
in the
the 10-ball
hypothesis
Bayes’
theorem
be
restated
in
following
way:
Bayes’
theorem
can can
be restated
in the following
way:
Hence,
after
you
see
the
“3”
ball
come
out
of
the
lottery
machine,
you
should
revise
the
probability
you
assign to the
50% to 99.9% certainty.
es’
10-ball hypothesis from 0.5 to .999 - that is, you should switch from thinking that the machine has a 50% chance of
being can
a 10-ball
machine toin
thinking
that it has away:
99.9% chance of being a 10-ball machine.
theorem
be restated
the following
2
2
of some hypothesis ‘h’ given some evidence ‘e’; then we would w
Bayes’ theorem
P (h|e) =
P (h) P (e|h)
P (e)
Hence, after you see the “3” ball come out of the lottery machine, you should revise the probability you assign to the
what
this
would
say about
example
ofchance
the lottery
10-ball hypothesis from 0.5 to Consider
.999 - that is, you
should
switch
from thinking
that thethe
machine
has a 50%
of
being a 10-ball machine to thinking
it has
a 99.9%
chancein
of being
10-ball machine.
thatthat
you
know
going
thatathere
are only two options, which
that there are 10 balls in the machine, and that there are 10,000
An intuitive way to think about what this all means is in terms of what bets you would be willing to accept. If you
first
ball
to comethen
outyouisshould
#3, be
and
letto haccept
be the
hypothesis
think that something has a 50%
chance
of happening,
willing
all bets
which give youthat t
better than even odds, and willing
to reject
bets which
Then
we allmight
say:give you worse odds. Analogous remarks apply to different
probability assignments.
This link between probability assignments
and =
bets0.5
is one way to bring out a strength of the Bayesian approach to
P (h)
belief formation. Following the Bayesian rule of conditionalization is the only way to avoid being subject to a Dutch
book.
P (e|h) = 0.1
P (e) no
=matter
0.5(0.1
0.05005
A Dutch book is a combination of bets which,
what+
the0.0001)
outcome, =
is sure
to lose. For example, suppose
that we have a 10 horse race, and you have the following views about the probabilities that certain horses will win.
0.5 0.1
Then
we
find,
via
Bayes’
theorem,
that
P
(h|e)
=
#3 has at least a 40% chance of winning
0.05005 = 0.999
#6
has athe
slightly
better
than
chance
winning
that
first
ball
toeven
come
outof was
#3, you should revise your con
#8 has a better than 1 in 3 chance of winning
from 50% to 99.9% certainty.
If these are your probability assignments, then you should be willing to make the following three bets:
Bayes’ theorem can be restated in the following way:
$2 on #3 to win, at 3-2 odds
$3 on #6 to win, at even odds
$1.50 on #8 to win, at 3-1 odds
2
of some hypothesis ‘h’ given some evidence ‘e’; then we would w
Bayes’ theorem
P (h|e) =
P (h) P (e|h)
P (e)
This link between probability assignments and bets is one way to bring out a strength of the Bayesian approach to
Consider what this would say about the example of the lottery
belief formation. Following the Bayesian rule of conditionalization is the only way to avoid being subject to a Dutch
that you know going in that there are only two options, which
book.
that there are 10 balls in the machine, and that there are 10,000
A Dutch book is a combination of bets which, no matter what the outcome, is sure to lose. For example, suppose
first ball to come out is #3, and let h be the hypothesis that t
that we have a 10 horse race, and you have the following views about the probabilities that certain horses will win.
Then we might say:
#3 has at least a 40% chance of winning
#6 has a slightly better than even chance of winning
#8 has a better
than=1 in
3 chance of winning
P (h)
0.5
P (e|h)
0.1 be willing to make the following three bets:
If these are your probability assignments,
then you=should
P (e) = 0.5(0.1 + 0.0001) = 0.05005
$2 on #3 to win, at 3-2 odds
$3 on #6 to win, at even odds
$1.50 on #8 to win, at 3-1 odds
0.5 0.1
Then we find, via Bayes’ theorem, that P (h|e) = 0.05005
= 0.999
that
But this is a Dutch book. Can you
seethe
why?first ball to come out was #3, you should revise your con
from 50% to 99.9% certainty.
One of the main arguments given in favor of updating your beliefs using Bayesian conditionalization, using Bayes’
theorem, is that other ways of updating your beliefs will leave you, in the above sense, subject to a Dutch book. And
Bayes’
can
be restated
way:
being subject to a Dutch book seems
like theorem
a paradigm of
irrationality.
It seems in
sortthe
of likefollowing
the probabilistic
version of
having inconsistent beliefs. Hence it is a strength of Bayesianism that it avoids this situation.
2
of some hypothesis ‘h’ given some evidence ‘e’; then we would w
Bayes’ theorem
P (h|e) =
P (h) P (e|h)
P (e)
would
say
With this Bayesian apparatusConsider
in hand, let’swhat
return tothis
Leslie’s
argument.
about the example of the lottery
that you know going in that there are only two options, which
Leslie asks us to consider two hypotheses about the future course of human civilization:
that there are 10 balls in the machine, and that there are 10,000
to go
come
is #3,
and
Doom Soon. The first
humanball
race will
extinctout
by 2150,
with the
totallet h be the hypothesis that t
humans born by the
time ofwe
such
extinction
being 500 billion.
Then
might
say:
Doom Delayed. The human race will go on for several thousand
centuries, with the total humans born before the race goes extinct being
P (h) = 0.5
50 thousand billion.
P (e|h)
= 0.1 we have, along with Bayesian conditionalization, to
Leslie thinks that we can use a certain
kind of evidence
show how likely these two hypotheses are. To do this, though, we’ll first have to think about how likely we
P (e) = 0.5(0.1 + 0.0001) = 0.05005
think these two hypotheses are to begin with.
Doom Soom is pretty grim; it means that the human race will go extinct during the lives of your grandchildren.
0.5 0.1
Then
we
find,
via
Bayes’
theorem,
that
P
(h|e)
=
= 0.999
Let’s suppose that we think that this is pretty unlikely - maybe that it has a 1% chance of happening.0.05005
Let’s
suppose (we’ll relax this assumption
later)
that Doom
Delayed
is the
other
possibility,
so that it has
a 99%
that the
first
ball to
come
outonly
was
#3,
you should
revise
your con
chance of happening.
from 50% to 99.9% certainty.
What evidence could we possibly have now to help us decide between these hypotheses now? Some obvious
candidates spring to mind: the
proliferation
of nuclear
weapons;
astronomical
of the probabilities
Bayes’
theorem
can
be restated
in calculations
the following
way: of
large asteroids colliding with the earth; prophecies involving the end of the Mayan calendar; etc. But the
evidence that Leslie has in mind is of a different sort.
2
of some hypothesis ‘h’ given
Doom Soon. The human race will go extinct by 2150, with the total
humans born by the time of such extinction being 500 billion.
Doom Delayed. The human race will go on for several thousand
centuries, with the total humans born before the race goes extinct being
50 thousand billion.
Bayes’ theorem
P (h|e) =
P (h) P (e|h)
P (e)
Consider what this would sa
you
going in that
Doom Soom is pretty grim; it means that the human race will go extinct during thethat
lives of
your know
grandchildren.
Let’s suppose that we think that this is pretty unlikely - maybe that it has a 1% chance
happening.
thatofthere
areLet’s
10 balls in the
suppose (we’ll relax this assumption later) that Doom Delayed is the only other possibility, so that it has a 99%
first ball to come out is #3,
chance of happening.
Then we might say:
What evidence could we possibly have now to help us decide between these hypotheses now? Some obvious
candidates spring to mind: the proliferation of nuclear weapons; astronomical calculations of the probabilities of
large asteroids colliding with the earth; prophecies involving the end of the Mayan calendar; etc. But the
P (h) = 0.5
evidence that Leslie has in mind is of a different sort.
P (e|h) = 0.1
That evidence is: each of us is one of the first 50 billion human beings born. Let’s now ask, using the Bayesian
method, what probability, in light of this evidence, we should assign to Doom Soon (DS) and
Doom
P (e)
= Delayed
0.5(0.1 +
(DD).
0.000
First, what is the probability of our evidence, conditional on the truth of DS? It seems that it should be 0.1; if there
Then
we50find,
via Bayes’
will be 500 billion total human beings, then I have a 1 in 10 chance of being among
the first
billion born.
the
that the first ball to come ou
Second, what is Pr(e | DD)? By analogous reasoning, it seems to be 1 in 1000, or .001.
from 50% to 99.9% certainty
But what is the probability of e, by itself? On the current assumption that either DS or DD is true, and that DD is
99 times more likely to be true, it seems that Pr(e) should be equal to:
Bayes’ theorem can be
[99 * Pr(e | DD) + 1 * Pr(e | DS)] / 100 = .00199
resta
of some hypothesis ‘h’ given
Doom Soon. The human race will go extinct by 2150, with the total
humans born by the time of such extinction being 500 billion.
Doom Delayed. The human race will go on for several thousand
centuries, with the total humans born before the race goes extinct being
50 thousand billion.
Bayes’ theorem
P (h|e) =
P (h) P (e|h)
P (e)
Consider what this would sa
That evidence is: each of us is one of the first 50 billion human beings born. Let’sthat
now ask,
using
the Bayesian
you
know
going in that
method, what probability, in light of this evidence, we should assign to Doom Soon (DS) and Doom Delayed
that there are 10 balls in the
(DD).
first ball to come out is #3,
So we have the following:
Then we might say:
P r(DS) = 0.01
P r(DD) = 0.99
P r(e) = 0.00199
P r(e|DS) = 0.1
P r(e|DD) = 0.001
P (h) = 0.5
P (e|h) = 0.1
And this is all we need to figure out the probabilities of DS and DD conditional on the evidence that we are
P (e) = 0.5(0.1
among the first 50 billion human beings born.
P (DS|e) =
P (DD|e) =
P (DS) ⇤ P (e|DS)
.01 ⇤ .1
=
= .503
P (e)
.00199
P (DD) ⇤ P (e|DD)
.99 ⇤ .001
=
= .497
P (e)
.00199
+ 0.000
Then we find, via Bayes’ the
that the first ball to come ou
from 50% to 99.9% certainty
Bayes’ theorem can be resta
So, even if we begin by thinking that the probability of Doom Soon is only 1%, reflection on the simple fact that
we are born among the first 50 billion humans shows that we should think that there is a greater than 50%
chance that the human race will be extinct in the next 150 years.
of some hypothesis ‘h’ given
Doom Soon. The human race will go extinct by 2150, with the total
humans born by the time of such extinction being 500 billion.
Doom Delayed. The human race will go on for several thousand
centuries, with the total humans born before the race goes extinct being
50 thousand billion.
P r(DS) = 0.01
P r(DD) = 0.99
P r(e) = 0.00199
P r(e|DS) = 0.1
P r(e|DD) = 0.001
Bayes’ theorem
P (h|e) =
P (h) P (e|h)
P (e)
Consider what this would sa
that you
know going in that
P (DS) ⇤ P (e|DS)
.01 ⇤ .1
P (DS|e) =
=
= .503
P (e) that there
.00199 are 10 balls in the
P (DD) ⇤ P (e|DD)
.99 to
⇤ .001
first ball
come
out is #3,
P (DD|e) =
=
= .497
P (e)
Then we.00199
might say:
So, even if we begin by thinking that the probability of Doom Soon is only 1%, reflection on the simple fact that
P (h) than
= 0.5
we are born among the first 50 billion humans shows that we should think that there is a greater
50%
chance that the human race will be extinct in the next 150 years.
P (e|h) = 0.1
Varying our initial assumptions changes the outcome dramatically. Suppose that reflection upon the risk of
P (e) = 0.5(0.1 + 0.000
climate change, nuclear war, etc. makes us think that we should assign DS and DD roughly equal initial
probabilities - suppose we think, before considering the fact that we were born in the first 50 billion people, that
the probability of each is approximately 0.5. On this assumption, the probability of Doom Soon after
Then we find, via Bayes’ the
conditionalizing on our evidence is just over 99%.
that the first ball to come ou
Alternatively, we might think (as Leslie says, p. 202) that if the human race survives
past 50%
2050, then
it will likely
from
to 99.9%
certainty
colonize other planets, making it quite likely that the population of humans will ultimately grow to be at least 50
million billion (50 quadrillion). On this assumption, even if we begin by assigning Doom Soon a chance of only
theorem
can be
1%, conditionalizing on the evidence that we were among the first 50 billion born Bayes’
gives Doom
Soon a probability
of 99.9%.
resta
of some hypothesis ‘h’ given
Doom Soon. The human race will go extinct by 2150, with the total
humans born by the time of such extinction being 500 billion.
Doom Delayed. The human race will go on for several thousand
centuries, with the total humans born before the race goes extinct being
50 thousand billion.
Bayes’ theorem
P (h|e) =
P (h) P (e|h)
P (e)
Consider what this would sa
that onyou
knowfactgoing
So, even if we begin by thinking that the probability of Doom Soon is only 1%, reflection
the simple
that in that
we are born among the first 50 billion humans shows that we should think that there
is a greater
that
therethan
are50%
10 balls in the
chance that the human race will be extinct in the next 150 years.
first ball to come out is #3,
Then upon
we might
Varying our initial assumptions changes the outcome dramatically. Suppose that reflection
the risk ofsay:
climate change, nuclear war, etc. makes us think that we should assign DS and DD roughly equal initial
probabilities - suppose we think, before considering the fact that we were born in the first 50 billion people, that
the probability of each is approximately 0.5. On this assumption, the probability of Doom Soon
after
P (h)
= 0.5
conditionalizing on our evidence is just over 99%.
P (e|h) = 0.1
Alternatively, we might think (as Leslie says, p. 202) that if the human race survives past 2050, then it will likely
P (e) = 0.5(0.1 + 0.000
colonize other planets, making it quite likely that the population of humans will ultimately grow to be at least 50
million billion (50 quadrillion). On this assumption, even if we begin by assigning Doom Soon a chance of only
1%, conditionalizing on the evidence that we were among the first 50 billion born gives Doom Soon a probability
Then we find, via Bayes’ the
of 99.9%.
that the first ball to come ou
The numbers are surprising. But more surprising than the numbers is the way we arrived at them. One thinks of
from 50% to 99.9% certainty
revising one’s view about the likelihood of Doom Soon based on empirical claims about nuclear weapons,
climate change, asteroids, etc. It seems crazy that such dramatic changes in our view about the extinction of the
Bayes’
human race should result from mere reflection on how many human beings have been
born.theorem can be
This is why this argument deserves to be considered a paradox. We have a plausible argument from Leslie that
we should radically change our view of the future based on the number of human beings who have lived, but it
seems clear that it is unreasonable to change our views on this topic for this reason.
resta
This is why this argument deserves to be considered a paradox. We have a plausible argument from Leslie that
we should radically change our view of the future based on the number of human beings who have lived, but it
seems clear that it is unreasonable to change our views on this topic for this reason.
We’ll move on shortly to consider some objections to Leslie’s argument. But first lets consider another, even
more surprising, application of the same sort of reasoning, due to Nick Bostrom (in “Are you living in a computer
simulation?”).
Suppose we met some Martians, or other beings from another planet, who were made of entirely different
materials from us - suppose, for example, that they were composed almost entirely of silicon. Could such beings
be conscious?
It seems clear that they could; beings could be conscious even if made of radically different materials than us.
But what makes such beings conscious? Plausibly, what makes them conscious is not what they are made of,
but how what they are made of is organized; even something made of silicon could be conscious if its silicon
parts were arranged in a structure as complex as a human brain.
If this is true, then it seems clear that computers could be conscious. Indeed, given reasonable assumptions about
future increases in the computing power available to us, it seems quite plausible that in the relatively near future
computers will be able to run simulations of human life which contain many, many conscious beings.
With continuing increases in computing power, there will presumably come a time in which a vast, vast majority of
all conscious beings to have existed will have existed only in computer simulations.
Are you living in a computer simulation right now? It seems that reasoning analogous to that used in the doomsday
argument suggests that it is reasonable for you to believe that you are. After all, if the hypothesis that the vast
number of conscious beings to have existed exist only in a computer simulation is true, then the probability that you
do not exist in a computer simulation is extremely low.
Of course, there is another option: human beings could become extinct before the relevant advances in computing
power are realized. So, it seems, it is overwhelmingly likely that either you are living in a computer simulation, or
human beings are soon to become extinct.
How might one respond to Leslie’s argument? (We’ll leave it an open question whether the responses we
consider will also apply to the “computer simulation” argument.)
One could of course respond that the Bayesian apparatus on which it depends is faulty. But that would seem
hasty; let’s consider some other possibilities.
One possibility is that the problem begins with the obviously false assumption that Doom Soon and Doom
Delayed are the only relevant possibilities. Would Leslie’s argument still work if we relaxed this assumption, and
took into account the fact that there are many possible futures for the human species? What would the
analogous situation be with the example of the numbered balls and the lottery machine?
Let’s consider a different line of objection:
Look, we know that something must be wrong with this way of arguing. After all,
couldn't someone in ancient Rome have used this reasoning to show that the end
the world would come before 500 AD? And wouldn't they have been wrong? So
mustn't there also be something wrong with our using this reasoning?
Is this a good objection?
A more promising line of objection focuses on the apparent assumption that one’s location in the birth order of
human beings is random. Leslie asks you to, in effect, assimilate this case to the lottery ball example. But why
think that the fact that I am born in the first 50 billion people is relevantly analogous to the number “50 billion”
coming out of the lottery machine?
To develop this objection, let’s think more closely about exactly which assumptions are involved in the lottery
machine example. (This follows the discussion in Mark Greenberg’s “Apocalypse not just yet.”)
Doom Soon. The human race will go extinct by 2150, with
the total humans born by the time of such extinction being
500 billion.
Doom Delayed. The human race will go on for several
thousand centuries, with the total humans born before
the race goes extinct being 50 thousand billion.
A more promising line of objection focuses on the apparent assumption that one’s location in the birth order of
human beings is random. Leslie asks you to, in effect, assimilate this case to the lottery ball example. But why
think that the fact that I am born in the first 50 billion people is relevantly analogous to the number “50 billion”
coming out of the lottery machine?
To develop this objection, let’s think more closely about exactly which assumptions are involved in the lottery
machine example. (This follows the discussion in Mark Greenberg’s “Apocalypse not just yet.”)
An analogous case would be a lottery machine which we know to contain either 500 billion or 50 thousand billion
balls. To make the numbers smaller, let’s think of a pair of lottery machines with, respectively, 500 and 50,000
balls. Suppose you are confronted with a lottery machine which you know to be of one of the types just
described.
Suppose now that the lottery machine spits out a ball which reads “50.” (This is supposed to be analogous to
finding that you are among the first 50 billion people born.) Isn’t this evidence, as Leslie says, that the machine
has 500 rather than 50,000 balls in it?
It is - but only if we make two assumptions about the machines.
First, we must assume that the ball which comes out is randomly selected by the lottery machine. If, for example,
balls with numbers higher than “100” are slightly larger and never come out of the machine, then obviously the
fact that a ball labeled “50” came out of the machine would be no evidence at all about how many balls are in
the machine.
Second, and just as important, we must assume that the ball with the label “50” on it would still be in the urn if
it contained 500 balls.
Doom Soon. The human race will go extinct by 2150, with
the total humans born by the time of such extinction being
500 billion.
Doom Delayed. The human race will go on for several
thousand centuries, with the total humans born before
the race goes extinct being 50 thousand billion.
Suppose now that the lottery machine spits out a ball which reads “50.” (This is supposed to be analogous to
finding that you are among the first 50 billion people born.) Isn’t this evidence, as Leslie says, that the machine
has 500 rather than 50,000 balls in it?
It is - but only if we make two assumptions about the machines.
First, we must assume that the ball which comes out is randomly selected by the lottery machine. If, for example,
balls with numbers higher than “100” are slightly larger and never come out of the machine, then obviously the
fact that a ball labeled “50” came out of the machine would be no evidence at all about how many balls are in
the machine.
Second, and just as important, we must assume that the ball with the label “50” on it would still be in the urn if
it contained 500 balls.
This second condition is easy to miss, since we of course assume that a machine with N balls in it contains those
balls labeled 1-N. But of course things don’t have to work this way. Suppose that the 500 balls in the 500-ball
machine were randomly selected from the balls in the 50,000 ball machine. Then the “50” ball might not be in the
500-ball machine. In this case, would the emergence of the “50” ball from the machine count in favor of the
hypothesis that the machine before us contains 500 rather than 50,000 balls?
The problem is that any way of understanding the analogy between the Doomsday argument and the lottery
machine example seems to violate one of the two assumptions needed to legitimate the reasoning used in the
lottery machine case.
To see this, think about the question: what is the analogue in the Doomsday argument of the numbers written on
the balls in the lottery machine case?
Doom Soon. The human race will go extinct by 2150, with
the total humans born by the time of such extinction being
500 billion.
Doom Delayed. The human race will go on for several
thousand centuries, with the total humans born before
the race goes extinct being 50 thousand billion.
Suppose now that the lottery machine spits out a ball which reads “50.” (This is supposed to be analogous to
finding that you are among the first 50 billion people born.) Isn’t this evidence, as Leslie says, that the machine
has 500 rather than 50,000 balls in it?
It is - but only if we make two assumptions about the machines.
First, we must assume that the ball which comes out is randomly selected by the lottery machine. If, for example,
balls with numbers higher than “100” are slightly larger and never come out of the machine, then obviously the
fact that a ball labeled “50” came out of the machine would be no evidence at all about how many balls are in
the machine.
Second, and just as important, we must assume that the ball with the label “50” on it would still be in the urn if
it contained 500 balls.
To see this, think about the question: what is the analogue in the Doomsday argument of the numbers written on
the balls in the lottery machine case?
Here’s one idea: the number “written on you” is the number you happen to be in the birth order of the human
species. So (let’s suppose) my number is “50 billion” because I just happened to be the 50 billionth person born.
On this idea, people don’t have “built in” numbers: rather, they are assigned the numbers they get based on
when they are born.
But now imagine that the lottery machine worked this way. On this view, there are an undisclosed number of
balls in the machine, none of which have numbers written on them. We write the numbers on the balls as they
come out of the machine, beginning with “1.” Now suppose we get to “50.” Is the fact that a ball with such a low
number came out evidence that we have a 500-ball machine before us? Clearly not, because the first
assumption - the assumption of random selection - is violated.
Doom Soon. The human race will go extinct by 2150, with
the total humans born by the time of such extinction being
500 billion.
Doom Delayed. The human race will go on for several
thousand centuries, with the total humans born before
the race goes extinct being 50 thousand billion.
Suppose now that the lottery machine spits out a ball which reads “50.” (This is supposed to be analogous to
finding that you are among the first 50 billion people born.) Isn’t this evidence, as Leslie says, that the machine
has 500 rather than 50,000 balls in it?
It is - but only if we make two assumptions about the machines.
First, we must assume that the ball which comes out is randomly selected by the lottery machine. If, for example,
balls with numbers higher than “100” are slightly larger and never come out of the machine, then obviously the
fact that a ball labeled “50” came out of the machine would be no evidence at all about how many balls are in
the machine.
Second, and just as important, we must assume that the ball with the label “50” on it would still be in the urn if
it contained 500 balls.
So let’s suppose instead that every human being comes with a “built in” number, just like the lottery balls have
numbers written on them prior to their emergence from the lottery machine.
There are two problems with this suggestion. First, how do we know what anyone’s number is, on this way of
viewing the analogy? What does it even mean to say that people have a certain number?
Second, if we can come up with a way of assigning numbers to all the people that could have existed - we violate
the second assumption. For there is no guarantee that, if N people exist, they will have the numbers 1-N.
This is just like the case in which 500 balls are randomly selected from the 50,000 ball machine - in this case,
even if we somehow knew that my number was 50 billion - this would provide us no evidence at all about how
many human beings will eventually exist.