Probabilities in Science.
1. What is a probability?
This is a famously difficult question—fortunately, we don’t need a firm answer here. For our
purposes, a probability can be thought of however you like—as a fundamental fact about the world
(the chance that an isolated uranium-238 atom has of decaying over a period of 100 years), as an
expression of our ignorance and the symmetries of a class of physical trials (the probability that a
well-flipped coin will land heads, or tails), as the long-run ("limit") frequency of a certain type of event
in an indefinitely long sequence of trials (the probability of a rat belonging to a particular laboratory
strain developing cancer after a certain procedure), or as the frequency of an event type in an actual
population (the probability of a 52-year-old Caucasian Canadian citizen dying of a heart attack in the
next year). The key thing for us is to understand how probabilities of various sentences are related to
each other, and what this tells us about evidence in science.
Bayesians regard probability theory as the foundation of rationality—wherever you get your
probabilities, and whatever probabilities you start with, the key question is how you should adjust the
probabilities you assign to various sentences in response to the evidence.
For our purposes a probability is a number representing the strength of our cognitive commitment to
a sentence—and therefore closely tied to decision-making. This is particularly clear if we think of
decisions as bets: When I build a bridge or an aircraft or anything else according to certain
engineering principles and practices, I show that I am confident enough in the reliability of those
principles and practices that I regard the probability of a failure (in general, of all the various kinds of
failures that might occur) as low enough to be acceptable. In general, when I show that certain odds
are acceptable to me for a given bet (i.e. at least fair), I implicitly reveal how confident I am about the
outcome of the bet.
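To make this concrete, here is a minimal Python sketch; the function name and all the numbers are ours, purely for illustration. Regarding odds of k-to-1 against an outcome as fair commits you to assigning that outcome a probability of 1/(k + 1).

    def implied_probability(odds_against: float) -> float:
        """Probability implicitly assigned to an outcome by someone who
        regards a bet at odds_against-to-1 against it as exactly fair.
        Fairness means zero expected gain: p * odds_against - (1 - p) = 0,
        which solves to p = 1 / (odds_against + 1)."""
        return 1.0 / (odds_against + 1.0)

    # Happily taking 3-to-1 against an outcome treats it as having probability 1/4:
    print(implied_probability(3))         # 0.25
    # An engineer who would demand enormous odds against a bridge failure is
    # revealing that they assign that failure a very small probability:
    print(implied_probability(999_999))   # 1e-06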
So how confident should we be about various bets? There are mathematical rules that come in
here—the mathematics of probability theory. Learning to use these rules has been very profitable for
many people, starting with the very first probability theorists (Pascal et al.), who applied them to
gambling, and won big.
Probabilities are represented by real numbers between 0 and 1 (inclusive). A sentence you assign
probability 0 to is one you are certain is not true; a sentence you assign probability 1 to is one you are
certain is true. Any sentence that holds everywhere in the outcome space will get probability 1; a
sentence that holds nowhere in the outcome space gets probability 0. The probabilities of some
sentences are constrained by the probabilities we assign to others. So:
Let the probability of a given sentence p be written Pr(p). Then
1. Pr(not p) = 1 – Pr(p)
2. Pr(p or q) = Pr(p) + Pr(q) – Pr(p and q)
3. Pr(p and q) = Pr(p) * Pr(q/p) = Pr(q) * Pr(p/q)
Comments:
1. ‘not p’ is a sentence that is true if p is false, and false if p is true. So whatever probability we assign to p, we know that the “rest of the space” of possible outcomes is taken up by not p, and the whole space (i.e. the full range of possible outcomes) has to have probability (measure) 1. So Pr(p) + Pr(not p) = 1, and Pr(not p) = 1 – Pr(p), as required.
2. ‘p or q’ is a sentence that is true if p is true, true if q is true, and false only if both p and q are false. Think of the region of the outcome space where ‘p or q’ is true. This includes all the region where p holds and all the region where q holds. But if we just add the two probabilities, we are counting any part of the space where p and q both hold twice. So we need to subtract the probability of ‘p and q’ from the sum.
3. ‘p and q’ is a sentence that is true if p and q are both true, and false if either or both are false. So we need to consider the region of the outcome space where the p-region and the q-region overlap. The probability of the overlap is the proportion of the whole space covered by p, multiplied by the proportion of the p-region that is also inside the q-region, and that second proportion is precisely what we mean by Pr(q/p). Of course this is the same if we reverse the roles of p and q, so the second version must give the same result. We call Pr(p/q) the probability of p given q.
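These three rules can be checked directly on any small outcome space. Here is a minimal Python sketch, using a fair die as a toy example of our own choosing, that computes probabilities by counting equally likely outcomes and confirms rules 1–3:

    from fractions import Fraction

    space = {1, 2, 3, 4, 5, 6}   # outcome space: the faces of a fair die

    def pr(region):
        """Probability of a sentence = the proportion of the space where it holds."""
        return Fraction(len(region), len(space))

    p = {2, 4, 6}   # region where "the roll is even" is true
    q = {4, 5, 6}   # region where "the roll is greater than 3" is true

    # Rule 1: Pr(not p) = 1 - Pr(p)
    assert pr(space - p) == 1 - pr(p)
    # Rule 2: Pr(p or q) = Pr(p) + Pr(q) - Pr(p and q)
    assert pr(p | q) == pr(p) + pr(q) - pr(p & q)
    # Rule 3: Pr(p and q) = Pr(p) * Pr(q/p), where Pr(q/p) is the
    # proportion of the p-region that also lies inside the q-region.
    pr_q_given_p = Fraction(len(p & q), len(p))
    assert pr(p & q) == pr(p) * pr_q_given_p
    print("all three rules hold")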
This isn’t a lot of math, but it produces a very interesting and important result that we call Bayes’
theorem (named after Thomas Bayes, an 18th century British scholar interested in probability):
Since Pr(p) * Pr(q/p) = Pr(q) * Pr(p/q), dividing through by Pr(q) (and reversing the order) gives:
Bayes: Pr(p/q) = Pr(p) * Pr(q/p) / Pr(q)
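Here is the theorem in action as a minimal Python sketch; the numbers are made up for illustration, chosen so that q is five times more likely when p is true than it is overall:

    def bayes(pr_p, pr_q_given_p, pr_q):
        """Bayes' theorem: Pr(p/q) = Pr(p) * Pr(q/p) / Pr(q)."""
        return pr_p * pr_q_given_p / pr_q

    # p starts out unlikely (0.1), but q is much more likely given p (0.9)
    # than it is overall (0.18); learning q raises Pr(p) from 0.1 to 0.5.
    print(bayes(0.1, 0.9, 0.18))   # 0.5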
Bayes’ theorem is an important result because it says something about how we should respond to evidence.
This is particularly clear if we think in terms of predictions that a theory makes. Let T be a theory
(formulated as a long and complex sentence), and let P be a prediction that we are sure must hold if T is
true, but that is otherwise pretty unlikely. Then consider what happens, according to Bayes’ rule, if we
learn that P is true. We apply Bayes’ rule to get:
Pr(T/P) = Pr(T) * Pr(P/T) / Pr(P).
But Pr(P/T) is about 1, since we’re sure that if T is true, P will result. And Pr(P) is about equal to Pr(T),
since we’re pretty sure P will be false unless T is true: writing Pr(P) = Pr(P/T) * Pr(T) + Pr(P/not T) * Pr(not T),
if Pr(P/T) is about 1 and Pr(P/not T) is about 0, then Pr(P) is about Pr(T). If we put Pr(P) = Pr(T) and
Pr(P/T) = 1 into our equation, we get:
Pr(T/P) = Pr(T) * 1 / Pr(T) = 1.
So the probability of T, given P is true, is going to be close to 1 if the probability of P given T is close to 1,
and the probability of P is very close to the probability of T (i.e. we think P is very unlikely unless T is
true).
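The same point can be made vivid with a short Python sketch; all the numbers here are hypothetical. Rather than assuming Pr(P) = Pr(T) outright, we compute Pr(P) from Pr(P/T) * Pr(T) + Pr(P/not T) * Pr(not T) and watch the posterior Pr(T/P) climb toward 1 as Pr(P/not T) shrinks:

    def posterior(pr_t, pr_p_given_t, pr_p_given_not_t):
        """Pr(T/P) by Bayes' rule, with Pr(P) computed from total probability."""
        pr_p = pr_p_given_t * pr_t + pr_p_given_not_t * (1 - pr_t)
        return pr_t * pr_p_given_t / pr_p

    # A theory with a modest prior (0.2) whose prediction P is certain if T holds.
    # The less likely P is without T, the closer Pr(T/P) gets to 1.
    for pr_p_given_not_t in (0.5, 0.1, 0.01, 0.001):
        print(pr_p_given_not_t, posterior(0.2, 1.0, pr_p_given_not_t))
    # 0.5   -> 0.333...
    # 0.1   -> 0.714...
    # 0.01  -> 0.961...
    # 0.001 -> 0.996...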
This result is important because, when we learn that P is true, Bayesians think the right thing to do is
replace the probability we assign to each sentence with a new probability equal to the old probability of that
sentence given P. So learning that a theory makes a true prediction that is very unlikely unless the theory
itself is true should lead us to assign a probability very close to 1 to the theory! Successful predictions, on
this account, are very powerful evidence for a theory, so long as those predictions are not likely to be true
unless the theory is also true. This is the key we will apply in evaluating how much support a new piece of
evidence gives to a theory. There are two main ways a piece of evidence can fail to support a theory well (a
numerical sketch of both follows the list):
Failure 1: The theory doesn’t really make a strong prediction, i.e. the “evidence” is not terribly likely even
if the theory is true.
Failure 2: The “prediction” is likely (even already known) regardless of whether the theory is true or not.
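Here is the promised sketch of both failures, with hypothetical numbers throughout. In each case the evidence barely moves the probability of the theory:

    def posterior(pr_t, pr_p_given_t, pr_p_given_not_t):
        """Pr(T/P) by Bayes' rule, with Pr(P) computed from total probability."""
        pr_p = pr_p_given_t * pr_t + pr_p_given_not_t * (1 - pr_t)
        return pr_t * pr_p_given_t / pr_p

    prior = 0.2
    # Failure 1: the theory doesn't strongly predict P (Pr(P/T) is low),
    # so observing P barely raises the probability of T.
    print(posterior(prior, 0.3, 0.25))   # ~0.23: almost no support
    # Failure 2: P was likely anyway (Pr(P/not T) nearly as high as Pr(P/T)),
    # so again the posterior barely moves.
    print(posterior(prior, 1.0, 0.95))   # ~0.21: almost no support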
There are lots of ways to dress up a bad piece of evidence and make it look good.
A popular one among ‘psychics’ is to adopt a ‘theory’ (I’m psychic!) that makes vague, mysterious-sounding
predictions. Then, when something specific happens that more or less “fits” the vague prediction,
you can claim that the theory “predicted” that specific event. Since the specific event was unlikely, it looks
as though the successful prediction really should count in favour of the theory. But in fact, many different
“specific” events would fit with the actual prediction made—the vagueness of the prediction makes it pretty
likely that some event fitting its requirements will happen regardless of whether the theory is true or not.
And that’s the probability that belongs in our equation, not the probability of the surprising, specific event
that happens to fit the “prediction.”
Another trick the psychics use (have a look at the tabloids) is to make a lot of predictions. Even if every
prediction is pretty unlikely, the chances are that some of them will succeed—and then, of course, you
make a big fuss about your successes while ignoring the failures.
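The arithmetic behind this trick is easy to check (the numbers below are hypothetical): even if each prediction has only a small chance of success, the chance that at least one of many independent predictions comes true grows quickly with their number.

    # Chance that at least one of n independent predictions succeeds,
    # when each succeeds with probability p on its own: 1 - (1 - p)**n.
    p = 0.05
    for n in (1, 10, 50, 100):
        print(n, 1 - (1 - p) ** n)
    # 1   -> 0.05
    # 10  -> ~0.40
    # 50  -> ~0.92
    # 100 -> ~0.99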
A third is the “conspiracy theory” maneuver: here the theory is tailored, bit by bit, to fit the evidence—
whenever a new and interesting fact is observed, you add new elements to the theory so that the new theory
“predicts” the fact: you add new members to the supposed conspiracy, you add new facts that they aim to
hide, and so on, until the whole house of cards is delicately balanced on a carefully selected group of facts.
But whenever we design a theory to predict a known fact, we already know the probability of the prediction
being true is 1, i.e. Pr(P) = 1 (and, since the theory was built to fit the fact, Pr(P/T) = 1 too). And when we
put these into our equation, we get Pr(T/P) = Pr(T) * 1 / 1 = Pr(T): no change at all in the probability of
the theory.