Probability

I. Why do we need to look at probability?

Probability is fundamental to statistics (okay, so that doesn't explain things). Briefly, we use statistics to figure out how "probable" something is. In other words, if we observe something (an event of some kind), we then figure out what the probability of that event is. If the probability of that event occurring "by chance" is very low, we tend to think something else is going on.

A biological example: We try out a new medicine to reduce blood pressure. We find the following (measuring systolic blood pressure):

                                                   ȳ
   Placebo    156    186    145    168    149    164.4
   Medicine   147    163    162    143    147    152.4

Does the medicine work? To answer this question, we ask: What is the probability of getting this result if the medicine does not work? Another way of looking at this: What is the probability of getting the above result completely due to chance? If this probability is low (small), then we say that the result can't be due to chance. In other words, the medicine works. (Incidentally, the probability of getting the above result due to chance is 27.5%; do you think the medicine works?)

But before we can really understand what happened in this example, we need to understand probability. Let's try another, non-biological, example: We flip a coin 10 times. There are 11 possible results:

                    5 heads, 5 tails
   4 heads, 6 tails                  6 heads, 4 tails
   3 heads, 7 tails                  7 heads, 3 tails
   2 heads, 8 tails                  8 heads, 2 tails
   1 head, 9 tails                   9 heads, 1 tail
   0 heads, 10 tails                 10 heads, 0 tails

For which of these results would you think that the coin is unfair? Most likely, if you got one of the results near the bottom of either of these columns, you'd think the coin is unfair. Not convinced? Suppose I give you a dollar for every head, and you give me a dollar for every tail, and we flip the coin 10 times (my coin!) and we get 0 heads and 10 tails. You really think I'm being honest??
When we can figure out the probability of getting 8 tails and 2 heads, we will be done with basic probability. So, to summarize, we need probability to answer questions about our data: what is the probability our results are due to chance?

II. Some very basic probability.

We won't learn a lot of probability here - just enough so that you understand a couple of basics. Probability is a sub-discipline in mathematics; it's that complicated! So let's define probability:

Probability is a number that describes the chance of an event happening. We denote probability by:

   Pr{E}   (sometimes also just P{E})

This means "the probability of the event E". So what is E? Some examples:

   E1: heads, when tossing a coin.
   E2: the number 6, when rolling a die.
   E3: the number 7, when rolling two dice.
   E4: the Orioles (or Nationals) winning the World Series.

So we have:

   Pr{E1} = 1/2
   Pr{E2} = 1/6
   Pr{E3} = 1/6   (more on this soon)
   Pr{E4} = ????? probably close to zero, but more soon.

Important: Pr{E} is always between 0 and 1. More formally:

   0 ≤ Pr{E} ≤ 1

If you ever calculate a probability outside this range, you have made a mistake!!

Your text is a little strange on the top of p. 90 [80] {86}. Let's think about things this way instead (this definition is a bit more old fashioned, but it's a bit easier to explain):

   Pr{E} = (the number of ways the event E can occur) / (the number of possible outcomes)

So let's apply this to the probabilities we calculated above:

Pr{E1} = 1/2, because there is only 1 way of getting a head, but two possible outcomes (heads or tails).

Pr{E2} = 1/6, again because there is only 1 way of getting a 6, but there are six possible outcomes (the numbers 1 through 6).

Pr{E3} = 1/6. This one is trickier. First, we need to figure out how many different ways you can get 7:

   die #1:   1   2   3   4   5   6
   die #2:   6   5   4   3   2   1

Note that the sum in each case is 7. In other words, there are six different ways you can get a 7. How many possible outcomes are there?
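One way to answer that question is to let the computer list every outcome. Here is a minimal Python sketch (mine, not the text's) that enumerates all ordered rolls of two dice and counts the ways to get a 7:

```python
# Brute-force check of Pr{E3}: enumerate every ordered (die 1, die 2) pair,
# then count how many pairs sum to 7.
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all ordered pairs of faces
sevens = [r for r in rolls if sum(r) == 7]    # (1,6), (2,5), ..., (6,1)

print(len(rolls))                          # 36 possible outcomes
print(Fraction(len(sevens), len(rolls)))   # 1/6
```

The enumeration confirms the shortcut below: 6 outcomes for the first die times 6 for the second gives 36 in all.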
If you make up a list like the one above, you'll get 36 different possible outcomes. So we have 6/36, which simplifies to 1/6. But there's a simpler way to figure out all possible outcomes: The first die can give us 6 different outcomes. The second die can give us 6 different outcomes. Put this together and we have:

   6 x 6 = 36

So there are 36 possible outcomes, and again we have 6/36 = 1/6 for our answer.

Pr{E4} = ?? This one is actually quite difficult, and we won't illustrate it. But a few comments: For many years the Orioles were one of the worst teams in baseball, in which case we could assume the numerator is essentially 0, and we don't care about the denominator. These days it would be harder to figure out, since the numerator is not 0. A totally naive way would be to assume that every team is doing equally well: There are 30 teams in baseball (in 2015). So if every team is doing equally well, we can just do 1/30 (one possible way to win the World Series, 30 possible outcomes (30 teams)). This is a rather stupid way to do things! Professional sports statisticians use much more complicated ways to figure this out!

Let's do another example and introduce some new/different notation: For a single die, what is the probability of getting a 2, 3, or 4?

Let Y = our result. Remember, Y is our random variable. So now we can write our event as follows:

   E = 2 ≤ Y ≤ 4

Notice that there are three possible ways to get what we want (2, 3, or 4). We have six possible outcomes (the numbers 1-6). So we can write our probability as follows:

   Pr{E} = Pr{2 ≤ Y ≤ 4} = 3/6 = 1/2.

See also your text on pages 92 - 93 [96 - 97] {102-103} for this and other examples.

Combining probabilities: Again, we're just introducing the basics. Suppose you wanted to figure out the probability of getting a 6 with a die and heads with a coin. You can count up all the possible outcomes and do it that way, or you can multiply some probabilities (but see the note below on independence).
For instance: Let Y1 = outcome with the die, and let Y2 = outcome with the coin.

   Pr{Y1 = 6} = 1/6
   Pr{Y2 = heads} = 1/2

(See above if you don't remember how we got these probabilities.) So, assuming Y1 and Y2 are independent (see below), we have:

   Pr{Y1 = 6 and Y2 = heads} = 1/6 x 1/2 = 1/12

Your book uses "probability trees", which are very convenient when you don't have a lot of outcomes, but can get out of hand very fast otherwise (as in our example).

Independence: our events (e.g., the die and coin above) can not influence each other. In statistical language, they must be INDEPENDENT. For example, suppose you somehow connect your coin and die so that whenever you roll a 6 you ALWAYS get heads. Obviously, the multiplication rule above is no longer valid. We won't worry about instances like this, although they're obviously very important - see your text if you're interested.

Let's do another example, this time using the text (example 3.19 [3.16] {3.2.9}): Newborn infants sometimes suffer from a serious respiratory ailment known as "hypoxic respiratory failure". A possible alternative treatment to surgery is nitric oxide. Does nitric oxide work? Newborns with the disease were assigned to two equal sized groups. One group was given nitric oxide, the other was not. In the group given nitric oxide, 54.4% did not need surgery. In the control group, only 36.4% did not need surgery.

So what is the overall probability of needing surgery? In the control group, 1 - .364 = .636, or 63.6%, needed surgery. In the treatment group, 1 - .544 = .456, or 45.6%, needed surgery. So the overall probability of needing surgery becomes:

   Pr{surgery} = (.5 x .456) + (.5 x .636) = .228 + .318 = .546

So 54.6% of newborn infants with this condition needed surgery. Note that we multiply the various probabilities and then add up the outcomes we're interested in. (Caution: if you have the 2nd edition, there is a slight math mistake in this example.)
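The "multiply down each branch, then add the branches you care about" logic of this example is easy to mirror in a few lines of Python (a sketch using the percentages from the example; the variable names are mine):

```python
# Overall Pr{surgery} for the nitric oxide example: weight each group's
# surgery probability by the probability of being in that group (0.5 each,
# since the two groups were equal sized).
p_surgery_treated = 1 - 0.544   # 45.6% of the nitric oxide group needed surgery
p_surgery_control = 1 - 0.364   # 63.6% of the control group needed surgery

p_surgery = 0.5 * p_surgery_treated + 0.5 * p_surgery_control
print(round(p_surgery, 3))      # 0.546
```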
Comment: if you want to express probabilities in %, that's fine (the book often asks you to do this), but make sure you do not forget the % symbol.

Conditional probability: When we talk about conditional probability, what we're interested in is "what is the probability that event A happens GIVEN that event B has happened". For example:

   Pr{rolling an 8 with 2 dice GIVEN that the first die shows a 3}

We usually write this like this:

   Pr{rolling an 8 with 2 dice | the first die shows a 3}

where the vertical bar ("|") means "given". So let's solve the above problem: One die shows a 3. This implies the other die MUST come up 5. So what is Pr{rolling a 5}? Easy, just 1/6.

But notice the following (one way of looking at independence):

   Pr{rolling an 8 with 2 dice} = 5/36

(You can do the math yourself; just add up all the possible ways of getting an 8 with 2 dice and divide by 36.) This is NOT the same answer we got above. This, incidentally, tells us that the events "rolling an 8 with 2 dice" and "the first die shows a 3" are not independent.

A good way of thinking about conditional probability is to "redefine" your universe: For example, in the above dice rolling experiment, we are no longer interested in all possible outcomes with two dice, JUST THOSE in which the first die you've rolled shows a 3. So you only look at the results where the first die showed a 3, and ignore all others. For example, you're not interested in ANYTHING if the first die is a 2.

***************
There is a more formal approach in your text. See p. 604 - 605 [90 - 92] {97} if you're really interested. If you do look at the more formal approach, be aware that calculating the probability for the numerator may not be straightforward, since the events may not be independent (so you can't always just multiply).
***************

Let's do an example from the book on pages 98 and 99 [86 & 87] {92 & 93}. This is interesting because it also talks about medical testing.
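Before we get to that example: the dice calculation above can be verified by brute force. This Python sketch (mine, not the text's) "redefines the universe" exactly as described:

```python
# Conditional probability by redefining the universe of outcomes.
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all 36 ordered rolls

# Unconditional: the universe is all 36 outcomes.
pr_8 = Fraction(sum(1 for r in rolls if sum(r) == 8), len(rolls))

# Conditional: keep ONLY the rolls where the first die shows a 3.
given_3 = [r for r in rolls if r[0] == 3]
pr_8_given_3 = Fraction(sum(1 for r in given_3 if sum(r) == 8), len(given_3))

print(pr_8)           # 5/36
print(pr_8_given_3)   # 1/6 -- not the same, so the events are not independent
```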
First, let's list some probabilities (see Fig. 3.6 {3.2.5}):

   Pr{having disease and (+) test} = .076
   Pr{having disease and (-) test} = .004
   Pr{not having disease and (+) test} = .092
   Pr{not having disease and (-) test} = .828

Also note that the probability of having this disease is 0.08 (and therefore the probability of not having the disease is 0.92).

It is obviously possible that someone who doesn't have the disease will test positive, or someone who has the disease will test negative. These results would be called a "false positive" or "false negative", respectively. Just because the test comes back positive doesn't mean one has the disease.

For example 3.21 [3.18] {3.2.11}: Suppose the test is positive; what is the probability someone actually has the disease?

   Pr{having disease | (+) test}

First we need to figure out the overall probability of a positive outcome:

   Pr{(+) test} = .076 + .092 = .168

(We just added up everything above with a (+) test. This is our new universe. We're no longer interested in anything with a (-) test.) Then we get:

   Pr{having disease and (+) test} = .076

(this is straight from above). Finally, we figure out what we want:

   Pr{having disease | (+) test} = Pr{having disease and (+) test} / Pr{(+) test} = 0.076 / 0.168 = 0.452

(For those who are really curious - notice the similarity between this and the more formal approach - it's not an accident!) And finally we can say that:

   Pr{having disease | (+) test} = 0.452

This is pretty awful. It means that even if you have a positive test, chances are you DON'T have the disease. This is a horrible test!

Let's take the example one step further and figure out Pr{no disease | (-) test}. Very quickly now:

   Pr{(-) test} = .004 + .828 = .832

so

   Pr{no disease | (-) test} = .828/.832 = .995

This is much better. If you have a negative test, you almost certainly don't have the disease. (Note that your text doesn't use conditional probability for this problem, which makes it a bit confusing to follow.)
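The whole medical-testing calculation fits in a few lines of Python. This is a sketch using the joint probabilities listed above (the variable names are mine):

```python
# Conditional probabilities for the medical test, from the four joint
# probabilities in the text's Fig. 3.6.
p_disease_pos = 0.076   # Pr{having disease and (+) test}
p_disease_neg = 0.004   # Pr{having disease and (-) test}
p_healthy_pos = 0.092   # Pr{not having disease and (+) test}
p_healthy_neg = 0.828   # Pr{not having disease and (-) test}

p_pos = p_disease_pos + p_healthy_pos   # the "(+) test" universe: 0.168
p_neg = p_disease_neg + p_healthy_neg   # the "(-) test" universe: 0.832

print(round(p_disease_pos / p_pos, 3))  # 0.452  = Pr{disease | (+) test}
print(round(p_healthy_neg / p_neg, 3))  # 0.995  = Pr{no disease | (-) test}
```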
III. Back to coins (and introducing the binomial distribution)

Let's return to our problem from right at the beginning, flipping a coin 10 times. So what IS the probability of getting 8 tails if the coin is really fair? Let's figure out the number of possible outcomes:

   All possible ways of getting 0 tails:
      HHHHHHHHHH
   All possible ways of getting 1 tail:
      THHHHHHHHH
      HTHHHHHHHH
      HHTHHHHHHH
      etc.
   All possible ways of getting 2 tails:
      TTHHHHHHHH
      THTHHHHHHH
      THHTHHHHHH
      etc.

Then you need to list all possible ways of getting 3 tails, then 4 tails, and so forth. You could do it this way, but you'd be at it for a very long time. Obviously we need to do something else.

The binomial coefficient: In a situation like this, where we have a bunch of trials (tosses), we can figure out how many different possible ways we can get what we want by using something called the binomial coefficient:

   nCj = (n choose j) = n! / (j! (n-j)!)

The "!" symbol means "factorial". This is defined as follows (for any positive integer x):

   x! = x(x-1)(x-2)...(2)(1)

so, for example, 3! = 3 · 2 · 1 = 6, and 5! = 5 · 4 · 3 · 2 · 1 = 120. This gets big FAST (try 10! or 20!). Also, 0! = 1. Why?? It doesn't seem to make any sense, but it works. You can actually show this, but it's complicated and involves some messy calculus.

So what about all the other stuff?

   n = the number of trials
   j = the number of "successes" (we'll be using "y" here eventually)

So, for our example, we have:

   (10 choose 8) = 10! / (8! (10-8)!)

Don't just plug this in and use brute force to get an answer - often you can cancel a lot of terms and even do this by hand (without a calculator) after you've simplified it (demonstrate).

Incidentally, your book lists binomial coefficients in table 2 of the appendix. To use it, figure out n and j (= y), then look down the row for n and the column for j. For example, if n = 5 and j = y = 3, the table gives 10.

In any case, the formula tells us that there are 45 different ways of getting 8 tails if we toss a coin 10 times.
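If you'd rather let the computer do the counting, Python's standard library has the binomial coefficient built in. A quick sketch (not from the text):

```python
import math

# math.comb(n, j) computes exactly n! / (j! (n-j)!).
print(math.comb(10, 8))    # 45 ways of getting 8 tails in 10 tosses
print(math.comb(5, 3))     # 10, matching table 2 in the appendix
print(math.factorial(20))  # factorials really do get big fast
```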
We could now go through and calculate the number of different ways of getting 0 tails, 1 tail, 2 tails, etc. to get our denominator, and then do 45/denominator. This would work, but even this is still tedious. (Incidentally, if you want to know why the binomial coefficient works, see Appendix 3.2 in the 3rd edition, or 3.1 in the 4th edition.)

So let's multiply some probabilities instead: The probability of getting a tail = .5, so if we want 8 tails, we can multiply:

   .5 · .5 · .5 · .5 · .5 · .5 · .5 · .5 = .5^8 = 0.00390625

But if we have 10 trials, that means we also have 2 heads, so we multiply that in as well:

   .5 · .5 = .5^2 = .25

Now if we multiply everything together, we get:

   0.00390625 x 0.25 = 0.0009765625

What we have now is the probability of ONE way of getting 8 tails and 2 heads. BUT there are 45 ways of getting 8 tails and 2 heads, so we have:

   45 · 0.0009765625 = 0.0439

And finally, the probability of getting 8 tails in 10 tosses is 0.0439.

This is an application of the binomial distribution formula, which is given as follows:

   (n choose j) p^j q^(n-j) = (n choose j) p^j (1-p)^(n-j)

where

   p = probability of success
   q = 1 - p = probability of failure
   n = the number of trials
   j = y = the number of successes

Most statisticians prefer the second version of the formula, since it has one less variable. Also remember that we will be using y instead of j shortly (we're just using j right now to be consistent with the text).

In the coin example we're working on, we have:

   p = probability of tail = .5
   q = 1 - p = probability of head = .5
   n = the number of tosses = 10
   j = y = the number of successes = 8

so we have (much easier now):

   P(8 tails in 10 tosses) = (10 choose 8) (.5)^8 (.5)^2 = 0.0439

Let's finish for today by going through example 3.29 [3.45] {3.6.4 - but note that for some bizarre reason the 4th edition changed p from 0.39 to 0.37}.
We are sampling 5 individuals from a large population (basically, sampling with replacement; we may talk more about this later). 39% of the individuals in the population are mutants (37% if you have the fourth edition). So the probability of getting three mutants is:

   P(3 mutants in a sample of 5) = (5 choose 3) (.39)^3 (.61)^2 = 10 x 0.0221 = .221

(If you have the fourth edition, you'll get 0.201 here.)

Your book lists all possible combinations for a sample of 5 in table 3.5 {3.6.3}. You should look at this and make sure you know how they got the results. Your text presents the binomial distribution in a slightly different way (there's nothing wrong with it, it's just different).
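The binomial calculations above can be collected into one small Python function. This is a sketch (the function name is mine, not the text's):

```python
import math

def binom_pmf(j, n, p):
    """Pr{j successes in n trials}, each trial succeeding with probability p."""
    return math.comb(n, j) * p**j * (1 - p)**(n - j)

# 8 tails in 10 tosses of a fair coin:
print(round(binom_pmf(8, 10, 0.5), 4))   # 0.0439

# Example 3.29: 3 mutants in a sample of 5, with p = 0.39 (3rd edition)...
print(round(binom_pmf(3, 5, 0.39), 3))   # 0.221
# ...and with the 4th edition's p = 0.37:
print(round(binom_pmf(3, 5, 0.37), 3))   # 0.201
```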