Probabilities in Science.
1. What is a probability?
This is a famously difficult question—fortunately, we don’t need a firm answer here. For our
purposes, a probability can be thought of however you like—as a fundamental fact about the world
(the chance that an isolated uranium-238 atom has of decaying over a period of 100 years), as an
expression of our ignorance and the symmetries of a class of physical trials (the probability that a
well-flipped coin will land heads, or tails), as the long-run ("limit") frequency of a certain type of event
in an indefinitely long sequence of trials (the probability of a rat belonging to a particular laboratory
strain developing cancer after a certain procedure), or as the frequency of an event type in an actual
population (the probability of a 52-year-old Caucasian Canadian citizen dying of a heart attack in the
next year). The key thing for us is to understand how probabilities of various sentences are related to
each other, and what this tells us about evidence in science.
Bayesians regard probability theory as the foundation of rationality—wherever you get your
probabilities, and whatever probabilities you start with, the key question is how you should adjust the
probabilities you assign to various sentences in response to the evidence.
For our purposes a probability is a number representing the strength of our cognitive commitment to
a sentence—and therefore closely tied to decision-making. This is particularly clear if we think of
decisions as bets: When I build a bridge or an aircraft or anything else according to certain
engineering principles and practices, I show that I am confident enough in the reliability of those
principles and practices that I regard the probability of a failure (in general, of all the various kinds of
failures that might occur) as low enough to be acceptable. In general, when I show that certain odds
are acceptable to me for a given bet (i.e. at least fair), I implicitly reveal how confident I am about the
outcome of the bet.
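To make this concrete, here is a minimal Python sketch; the function name and all the numbers are ours, purely for illustration. Regarding odds of k-to-1 against an outcome as fair commits you to assigning that outcome a probability of 1/(k + 1).

    def implied_probability(odds_against: float) -> float:
        """Probability implicitly assigned to an outcome by someone who
        regards a bet at odds_against-to-1 against it as exactly fair.
        Fairness means zero expected gain: p * odds_against - (1 - p) = 0,
        which solves to p = 1 / (odds_against + 1)."""
        return 1.0 / (odds_against + 1.0)

    # Happily taking 3-to-1 against an outcome treats it as having probability 1/4:
    print(implied_probability(3))         # 0.25
    # An engineer who would demand enormous odds against a bridge failure is
    # revealing that they assign that failure a very small probability:
    print(implied_probability(999_999))   # 1e-06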
So how confident should we be about various bets? There are mathematical rules that come in
here—the mathematics of probability theory. Learning to use these rules has been very profitable for
many people, starting with the very first probability theorists (Pascal et al.), who applied them to
gambling, and won big.
Probabilities are represented by real numbers between 0 and 1 (inclusive). A sentence you assign
probability 0 to is one you are certain is not true; a sentence you assign probability 1 to is one you are
certain is true. Any sentence that holds everywhere in the outcome space will get probability 1; a
sentence that holds nowhere in the outcome space gets probability 0. The probabilities of some
sentences are constrained by the probabilities we assign to others. So:
Let the probability of a given sentence p be written Pr(p). Then
1. Pr(not p) = 1 – Pr(p)
2. Pr(p or q) = Pr(p) + Pr(q) – Pr(p and q)
3. Pr(p and q) = Pr(p) * Pr(q/p) = Pr(q) * Pr(p/q)
Comments:
1. ‘not p’ is a sentence that is true if p is false, and false if p is true. So whatever probability we assign to p, we know that the “rest of the space” of possible outcomes is taken up by not p, and the whole space (i.e. the full range of possible outcomes) has to have probability (measure) 1. So Pr(p) + Pr(not p) = 1, and Pr(not p) = 1 – Pr(p), as required.
2. ‘p or q’ is a sentence that is true if p is true, true if q is true, and false only if both p and q are false. Think of the region of the outcome space where ‘p or q’ is true. This includes all the region where p holds and all the region where q holds. But if we just add the two probabilities, we are counting any part of the space where p and q both hold twice. So we need to subtract the probability of ‘p and q’ from the sum.
3. ‘p and q’ is a sentence that is true if p and q are both true, and false if either or both are false. So we need to consider the region of the outcome space where the p-region and the q-region overlap. The probability of the overlap is the proportion of the whole space covered by p, multiplied by the proportion of the p-region that is also inside the q-region, and that second proportion is precisely what we mean by Pr(q/p). Of course this is the same if we reverse the roles of p and q, so the second version must give the same result. We call Pr(p/q) the probability of p given q.
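These three rules can be checked directly on any small outcome space. Here is a minimal Python sketch, using a fair die as a toy example of our own choosing, that computes probabilities by counting equally likely outcomes and confirms rules 1–3:

    from fractions import Fraction

    space = {1, 2, 3, 4, 5, 6}   # outcome space: the faces of a fair die

    def pr(region):
        """Probability of a sentence = the proportion of the space where it holds."""
        return Fraction(len(region), len(space))

    p = {2, 4, 6}   # region where "the roll is even" is true
    q = {4, 5, 6}   # region where "the roll is greater than 3" is true

    # Rule 1: Pr(not p) = 1 - Pr(p)
    assert pr(space - p) == 1 - pr(p)
    # Rule 2: Pr(p or q) = Pr(p) + Pr(q) - Pr(p and q)
    assert pr(p | q) == pr(p) + pr(q) - pr(p & q)
    # Rule 3: Pr(p and q) = Pr(p) * Pr(q/p), where Pr(q/p) is the
    # proportion of the p-region that also lies inside the q-region.
    pr_q_given_p = Fraction(len(p & q), len(p))
    assert pr(p & q) == pr(p) * pr_q_given_p
    print("all three rules hold")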
This isn’t a lot of math, but it produces a very interesting and important result that we call Bayes’
theorem (named after Thomas Bayes, an 18th century British scholar interested in probability):
Since Pr(p) * Pr(q/p) = Pr(q) * Pr(p/q), dividing through by Pr(q) (and reversing the order) gives:
Bayes: Pr(p/q) = Pr(p) * Pr(q/p) / Pr(q)
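Here is the theorem in action as a minimal Python sketch; the numbers are made up for illustration, chosen so that q is five times more likely when p is true than it is overall:

    def bayes(pr_p, pr_q_given_p, pr_q):
        """Bayes' theorem: Pr(p/q) = Pr(p) * Pr(q/p) / Pr(q)."""
        return pr_p * pr_q_given_p / pr_q

    # p starts out unlikely (0.1), but q is much more likely given p (0.9)
    # than it is overall (0.18); learning q raises Pr(p) from 0.1 to 0.5.
    print(bayes(0.1, 0.9, 0.18))   # 0.5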
Bayes’ theorem is an important result because it says something about how we should respond to evidence.
This is particularly clear if we think in terms of predictions that a theory makes. Let T be a theory
(formulated as a long and complex sentence), and let P be a prediction that we are sure must hold if T is
true, but that is otherwise pretty unlikely. Then consider what happens, according to Bayes’ rule, if we
learn that P is true. We apply Bayes’ rule to get:
Pr(T/P) = Pr(T) * Pr(P/T) / Pr(P).
But Pr(P/T) is about 1, since we’re sure that if T is true, P will result. And Pr(P) is about equal to Pr(T),
since we’re pretty sure P will be false unless T is true: writing Pr(P) = Pr(P/T) * Pr(T) + Pr(P/not T) * Pr(not T),
if Pr(P/T) is about 1 and Pr(P/not T) is about 0, then Pr(P) is about Pr(T). If we put Pr(P) = Pr(T) and
Pr(P/T) = 1 into our equation, we get:
Pr(T/P) = Pr(T) * 1 / Pr(T) = 1.
So the probability of T, given P is true, is going to be close to 1 if the probability of P given T is close to 1,
and the probability of P is very close to the probability of T (i.e. we think P is very unlikely unless T is
true).
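The same point can be made vivid with a short Python sketch; all the numbers here are hypothetical. Rather than assuming Pr(P) = Pr(T) outright, we compute Pr(P) from Pr(P/T) * Pr(T) + Pr(P/not T) * Pr(not T) and watch the posterior Pr(T/P) climb toward 1 as Pr(P/not T) shrinks:

    def posterior(pr_t, pr_p_given_t, pr_p_given_not_t):
        """Pr(T/P) by Bayes' rule, with Pr(P) computed from total probability."""
        pr_p = pr_p_given_t * pr_t + pr_p_given_not_t * (1 - pr_t)
        return pr_t * pr_p_given_t / pr_p

    # A theory with a modest prior (0.2) whose prediction P is certain if T holds.
    # The less likely P is without T, the closer Pr(T/P) gets to 1.
    for pr_p_given_not_t in (0.5, 0.1, 0.01, 0.001):
        print(pr_p_given_not_t, posterior(0.2, 1.0, pr_p_given_not_t))
    # 0.5   -> 0.333...
    # 0.1   -> 0.714...
    # 0.01  -> 0.961...
    # 0.001 -> 0.996...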
This result is important because, when we learn that P is true, Bayesians think the right thing to do is
replace the probability we assign to each sentence with a new probability equal to the old probability of that
sentence given P. So learning that a theory makes a true prediction that is very unlikely unless the theory
itself is true should lead us to assign a probability very close to 1 to the theory! Successful predictions, on
this account, are very powerful evidence for a theory, so long as those predictions are not likely to be true
unless the theory is also true. This is the key we will apply in evaluating how much support a new piece of
evidence gives to a theory. There are two main ways a piece of evidence can fail to support a theory well (a
numerical sketch of both follows the list):
Failure 1: The theory doesn’t really make a strong prediction, i.e. the “evidence” is not terribly likely even
if the theory is true.
Failure 2: The “prediction” is likely (even already known) regardless of whether the theory is true or not.
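Here is the promised sketch of both failures, with hypothetical numbers throughout. In each case the evidence barely moves the probability of the theory:

    def posterior(pr_t, pr_p_given_t, pr_p_given_not_t):
        """Pr(T/P) by Bayes' rule, with Pr(P) computed from total probability."""
        pr_p = pr_p_given_t * pr_t + pr_p_given_not_t * (1 - pr_t)
        return pr_t * pr_p_given_t / pr_p

    prior = 0.2
    # Failure 1: the theory doesn't strongly predict P (Pr(P/T) is low),
    # so observing P barely raises the probability of T.
    print(posterior(prior, 0.3, 0.25))   # ~0.23: almost no support
    # Failure 2: P was likely anyway (Pr(P/not T) nearly as high as Pr(P/T)),
    # so again the posterior barely moves.
    print(posterior(prior, 1.0, 0.95))   # ~0.21: almost no support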
There are lots of ways to dress up a bad piece of evidence and make it look good.
A popular one among ‘psychics’ is to adopt a ‘theory’ (I’m psychic!) that makes vague, mysterious-sounding
predictions. Then, when something specific happens that more or less “fits” the vague prediction,
you can claim that the theory “predicted” that specific event. Since the specific event was unlikely, it looks
as though the successful prediction really should count in favour of the theory. But in fact, many different
“specific” events would fit with the actual prediction made—the vagueness of the prediction makes it pretty
likely that some event fitting its requirements will happen regardless of whether the theory is true or not.
And that’s the probability that belongs in our equation, not the probability of the surprising, specific event
that happens to fit the “prediction.”
Another trick the psychics use (have a look at the tabloids) is to make a lot of predictions. Even if every
prediction is pretty unlikely, the chances are that some of them will succeed—and then, of course, you
make a big fuss about your successes while ignoring the failures.
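The arithmetic behind this trick is easy to check (the numbers below are hypothetical): even if each prediction has only a small chance of success, the chance that at least one of many independent predictions comes true grows quickly with their number.

    # Chance that at least one of n independent predictions succeeds,
    # when each succeeds with probability p on its own: 1 - (1 - p)**n.
    p = 0.05
    for n in (1, 10, 50, 100):
        print(n, 1 - (1 - p) ** n)
    # 1   -> 0.05
    # 10  -> ~0.40
    # 50  -> ~0.92
    # 100 -> ~0.99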
A third is the “conspiracy theory” maneuver: here the theory is tailored, bit by bit, to fit the evidence—
whenever a new and interesting fact is observed, you add new elements to the theory so that the new theory
“predicts” the fact: you add new members to the supposed conspiracy, you add new facts that they aim to
hide, and so on, until the whole house of cards is delicately balanced on a carefully selected group of facts.
But whenever we design a theory to predict a known fact, we already know the probability of the prediction
being true is 1, i.e. Pr(P) = 1 (and, since the theory was built to fit the fact, Pr(P/T) = 1 too). And when we
put these into our equation, we get Pr(T/P) = Pr(T) * 1 / 1 = Pr(T): no change at all in the probability of
the theory.