The opening example in the lecture is designed to illustrate the basics of probability and overcome some of the standard misconceptions. I first learned of the example from Natalie Angier’s The Canon. Her example comes from Deborah Nolan (a statistician at UC Berkeley). A variation on the example can also be found in Nolan’s book with Andrew Gelman, Teaching Statistics: A Bag of Tricks.

Divide the students into 2 groups (I split them by whether the last digit of their ID number is odd or even). Instruct one group to flip a coin 100 times, recording the heads and tails in order. Make sure they record the order and not just a tally of heads and tails. Instruct the other group to fake it. They should write down, on a sheet of paper with their name on it, a series of heads and tails as if they were flipping a coin. Tell the students that you will leave the room and then try to guess which flips are real and which are fake. Leave the room for 5 minutes while the students do this.

When you return, collect the lists. Designate one student as the reporter. Looking at the lists, guess student by student while the reporter counts the number of correct and incorrect guesses you make. How do you guess? Look at the longest run of consecutive heads or tails. If it is 6 or more, guess that it is a real coin flip. If the longest run is shorter, guess fake.

Another way to do it is to have the students write two sequences on the board and then guess when you come in. This gives a nice visual cue because they can see the patterns.

The next batch of slides walks through the “odds” of different outcomes. This isn’t introducing probability yet, but the students walk through some of the basics. (Also note that the slides only cover consecutive heads. Given that consecutive tails are equally likely, the odds of “a run of that length” are roughly twice what is listed.)

Slide 7 Why didn’t the fake flips look like this?
Ask the students who did the fake flips why they didn’t have longer runs of consecutive H or T in their results. They were probably focused on the short-term patterns. They induced dependence between outcomes. The problem with the artificial flips introduces two concepts that are important to probability: the dependence or independence of events and, loosely, asymptotics.

Slide 8 The definition of probability is from Levin and Fox 2004, Elementary Statistics in Social Research. You might want a different definition. Some emphasize repeated sampling, some multiple experiments. Hopefully your book has one, and you should use that. This one emphasizes events, which matches the previous example.

Slide 9 Properties of probability. There are some basic properties of probability that are important to understand.
1) The probability of an outcome must be between zero and one. This means that there can’t be negative probabilities, obviously, but it also limits the language we use for probabilities. They are bounded by 0 and 1. The way we write them up and discuss them needs to be on the proper scale. We will come back to this idea when we talk about sums and products of probabilities.
2) Sums of exclusive events. If there are multiple possible outcomes (and they are mutually exclusive), then the probability of either of 2 of them occurring is the sum of the probabilities for the individual events. Let’s look at an example.

Slides 10-11 This example starts with mutually exclusive outcomes and then moves on to the probabilities of outcomes that are not mutually exclusive. Example: what is the probability I draw a diamond from a full deck of cards? 0.25. What is the probability I draw a heart? 0.25. What is the probability I draw a red card? 0.50.

Slide 11 What is the probability of a face card? There are 12 face cards out of 52 (3/13 is 0.230769). What is the probability of a red card or a face card? Not the sum of the two, because the red face cards would be counted twice. How many possibilities? There are 26 red cards out of 52. Those all would be yeses in this setup.
There are also 6 black face cards. So out of the 52 cards, 32 of them are either red or a face card (or both), and 32/52 is about 0.62.

Slide 12: Complement rule. The probability of an event not occurring is one minus the probability of the event occurring. Seems obvious, but this is a powerful idea, because we can use it to invert our thinking on some hard problems to make them easier.

Slide 13 Continues the card example to illustrate the complement rule.

Slide 14 The probability of one of the full set of possible outcomes occurring is 1. Seems logical and is an extension of the complement rule.

Slide 15 uses the coin flip as an example. Ask the students: what are the two possible outcomes? (This may be too obvious. It is heads or tails.)

Slide 16 fills in the outcomes. What are the probabilities of each outcome? (Only accept 0.5 as an answer, not 50/50 or 50%.)

Slide 17 fills in the probabilities.

Slide 18 changes the event. It is now two coins. Ask “What are the possible numbers of heads that you can flip if you flip two coins?” You do not want to ask for possible events because in this example order doesn’t matter (you will point this out in a minute).

Slide 19 What is the probability that you will get 2 heads if you flip 2 coins?

Slide 20 What is the probability that you will get 0 heads if you flip 2 coins?

Slide 21 Given the answers to the first two questions, the probability of getting 1 head is obvious.

Slide 22 This fills in the last probabilities. This illustrates the sum rule for mutually exclusive events. Think back to the 4th slide (now slide 23 as well; IF YOU CHANGE THE ORDER OF THE SLIDES FIX THIS). We listed 4 possible outcomes, each with equal odds of occurring (one in four). But both the second (HT) and the third (TH) fall into our new second category (1 head). The probability of getting either HT or TH is the sum of the two probabilities, or 0.25 + 0.25 = 0.50.

Slide 24 Returns to slide 21. Also note that the sum of the probabilities is 1.
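The two counting arguments above (32 cards that are red or a face card, and the 0.25 / 0.50 / 0.25 split for two coins) can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original slides; the function names `red_or_face_count` and `heads_distribution` are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]


def red_or_face_count():
    """Count cards in a 52-card deck that are red, a face card, or both."""
    deck = list(product(RANKS, SUITS))
    return sum(
        1
        for rank, suit in deck
        if suit in ("hearts", "diamonds") or rank in ("J", "Q", "K")
    )


def heads_distribution(n_coins):
    """Return {number of heads: probability} for n fair, independent coins."""
    outcomes = list(product("HT", repeat=n_coins))  # for 2 coins: HH, HT, TH, TT
    counts = Counter(seq.count("H") for seq in outcomes)
    return {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}


if __name__ == "__main__":
    print(red_or_face_count(), "of 52 cards are red or a face card")
    for heads, prob in heads_distribution(2).items():
        print(heads, "heads:", prob)
```

Listing every outcome and counting is slow for large problems, but it mirrors the slide-by-slide reasoning: each equally likely outcome is written down once, and mutually exclusive outcomes are added.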
Slide 25 This slide introduces the idea of independence. Independence is important when we are looking at multiple events (2 coin flips). It is what makes each of the four two-flip outcomes equally likely (0.5 times 0.5 is 0.25), which is what we relied on when we added probabilities in the previous example. The P(heads) on the second flip is independent of the outcome of the first flip. When would this not be true?

Slide 26 What if we draw 2 cards from a deck without replacing the first one? What is the probability that the first card is red? 0.5. What is the probability that the second one is red? It depends on the color of the first card.

Slide 27 After the first card is drawn, there will be 51 cards left. If the first card is red, there will be 25 red cards. If the first card is black, there will be 26 red cards. So, if the first card is red, the probability that the second will be red is 25/51 or 0.49. If the first card is black, the probability that the second will be red is 26/51 or 0.51.

Slide 28 begins the “What’s the color on the other side of the card?” example from Gelman and Nolan, Teaching Statistics: A Bag of Tricks (section 7.5.1, page 111). Before class prepare several sets of 3 cards. One should be blue on both sides, one should be pink on both sides, and one should be pink on one side and blue on the other. Have a hat or a box, or something that the student can draw a card from so that they see only one side of the card and not the other. Give one set of cards to each student and keep one for yourself.

Using your set, draw a card and tape it to one side of the board. Ask the students what the probability is that the back of the card is the same color as the front. And ask why. They will probably guess 1/2 because they know that half of the cards are the same and half are different. The probabilities look independent. They aren’t. The correct answer is 2/3. Don’t explain this yet. Do a simulation. Make each student draw a card and tape it to the board. If the color is the same as yours, put it on the side of the board you did.
If it is the other color, put it on the other side. When you have all of the cards taped to the board, flip them over (tape them so that they have the other side showing). There should be a noticeable difference. They shouldn’t be evenly pink and blue.

The quicker way to run it is to have everyone draw one and set it on their desk. Ask how many have blue showing (should be half the class). Tell them to flip it and keep their hand up if it is blue on the other side. Repeat with pink.

Slide 33 is the key. The first arrows are the probability that you draw one of the cards (1/3 for each card). The second arrows are the second half: what is the back of that card? Given that you have drawn a card, there is a 0.5 chance that each side is the back. If the front is blue, you have either drawn the blue/blue card or the blue/pink card with the blue side up. So, there are only three ways to see a blue card (blue/blue side 1, blue/blue side 2, or blue/pink side blue). Of those three choices, how many have blue on the back? Two of the three, which is where the 2/3 comes from. What does this tell you? Knowing what is on the front of the card tells you something about what is on the back.

Another way to think of it is to ask: can you name one of the two cards left in the hat? If the card is blue, you know that pink/pink is in the hat. So this is either blue/blue or blue/pink. You also know that it isn’t blue/pink with pink up. So you are back to the three choices and the math follows.

Slide 34 This is a final summary of the technical rule of conditional probabilities.

LECTURE 2

The starting point of this lecture is to move from the basics of probability last time to illustrating how we can use probability. A key piece of this is the beginning of the idea of testing the null hypothesis.

Slide 2 Introduction. Draw the connection between last lecture and this one. The basic question for class today is why I talked about cards and coins last time. That is, how do we use probability in Political Science?
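As a recap of the three-card example from the end of the previous lecture, the 2/3 answer can be checked both by listing the six equally likely (card, side-up) combinations and by simulating the classroom draw. This is a minimal sketch under the assumption that each card and each side is equally likely to be drawn; the names `p_back_matches` and `simulate_same_color` are hypothetical.

```python
import random
from fractions import Fraction

# The three cards, written as (side 1, side 2).
CARDS = [("blue", "blue"), ("pink", "pink"), ("blue", "pink")]


def p_back_matches(front_color):
    """Exact P(back == front | front shows front_color), by enumeration."""
    # Every card can land with either side up: 6 equally likely (front, back) pairs.
    faces = [(card[i], card[1 - i]) for card in CARDS for i in range(2)]
    showing = [(front, back) for front, back in faces if front == front_color]
    matches = sum(1 for front, back in showing if front == back)
    return Fraction(matches, len(showing))


def simulate_same_color(n_draws, seed=0):
    """Fraction of simulated draws where the hidden side matches the visible side."""
    rng = random.Random(seed)
    same = 0
    for _ in range(n_draws):
        card = rng.choice(CARDS)       # pick one of the three cards
        up = rng.randrange(2)          # pick which side faces up
        front, back = card[up], card[1 - up]
        same += front == back
    return same / n_draws


if __name__ == "__main__":
    print("exact, blue showing:", p_back_matches("blue"))
    print("simulated:", round(simulate_same_color(100_000), 3))
```

The enumeration is the same argument as Slide 33 of the previous lecture: three ways to see blue, two of which have blue on the back.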
Slide 3 Think about a real-world example of how probability is used. You are a bureaucrat at the FDA charged with approving some new drug. The drug company comes to you with the results of a clinical trial. Assume that the trial is perfectly done. They have 40 people with the disease, and they give 20 the new drug and 20 a placebo.

Slide 4 Results: 10 of the 20 people with the placebo survive, and 11 of the 20 with the drug survive. Ask: Would you approve the drug? Why not? They will probably tell you something about their naïve understanding of probability: if half the people survive with a placebo, then you wouldn’t be surprised if 11 survive with a drug that doesn’t work.

Slide 5 OK, that makes sense, but what if the results had been different? Now 2 with the placebo survive and 18 with the drug. Would they approve it? Most of the students will answer yes. That looks to be more than chance.

Slide 6 Third set of results: 8 in the placebo condition survive and 12 in the drug condition. Now do you approve it? Most students will be ambivalent. That is the point. Wouldn’t it be nice to have some set of rules that 1) are clear in their implications and 2) we could all agree on?

Slide 7 This is why we use probability. It is a set of agreed-on rules that everyone recognizes. They are based on mathematical formulas that have well-understood properties. These rules will be in place before anyone looks at the data. That last part is important. In this example, drug companies have millions invested in the development of drugs. If they are the ones writing the initial reports on whether the drug works, they have a strong financial incentive to say that it does and could cook the books in their favor. Rules agreed on a priori prevent that.
BUT we have to understand what these rules are to use them properly. We saw some of how probability works last time; let’s go through another example.

Slide 8 This example is found in both Natalie Angier’s The Canon and Gelman and Nolan’s Teaching Statistics: A Bag of Tricks. Angier references Nolan’s class as her example. Ask: how many of you would accept the following wager? If no 2 people in this room have the same birthday (month and day, not year), you get an A for the class. If 2 or more people have the same birthday, then you get an F. (Obviously, not an ethical wager for an instructor.)

Slide 9 Would you have won the wager? This is the probabilistic question (until we look at the data). There are X people in the class and 365 or 366 days in the year. Obviously, if there were more than 366 people in the class, two of you would have to share the same birthday. But X is a lot smaller than 366. Let’s see who would have won. Proceed around the room, starting with yourself. Announce your birthday and ask if anyone matches. Then start at one end of the room and ask that person’s birthday. Go person by person until you get a match.

Slide 10 Let’s calculate the probability. The number isn’t 60/365. [NOTE THE 60 IS USED AS A PLACEHOLDER FOR THE NUMBER OF STUDENTS IN CLASS. CHANGE THIS TO MATCH YOUR CLASS] We are going to use the complement rule. We are asking if at least one pair shares a birthday. That is the complement of the event that no pair of people in this class shares a birthday. So p(at least one pair shares a birthday) is equal to 1 - p(no pair shares a birthday). How many unique pairs are there in the class? Work through the math: I could pair with every student in the class. That is N pairs. The first student could pair with every other student, which is N-1 (so we are at N + N-1). The second student could pair with every student but the first to create a new unique pair; this is N-2.
That means we have N + (N-1) + (N-2), and so on until the last two students, who could pair with each other to create the last pair. The formula for this is the binomial coefficient $\binom{N}{k} = \frac{N!}{k!\,(N-k)!}$, where we have N people and we want to know how many unique ways there are to choose k = 2 of them. For this example that is 1770 pairs. [NOTE CHANGE THIS TO THE RIGHT NUMBER OF N] If we want to know the probability of a pair having the same birthday, we need the complement of the probability that NONE of these 1770 pairs have the same birthday. Still, the probability that a given pair has the same birthday is 1/365 or 0.00274, so maybe this isn’t too bad.

Slide 11 Let’s assume that birthdays are independent (no twins). We know that the probability of a pair matching is 0.00274. The complement of that is 0.99726. We have 1770 of these pairs. Remember the rule from last time. What was the probability of 2 heads? It was the probability that the first is a head and the second is a head.

Slides 12 through 14 are from the first lecture, repeating the multiple coin flips example. Slide 14 adds the probabilities. Let’s see the math for the birthday example.

Slide 15 The probability is 0.99726 raised to the 1770th power. That is 0.008. This is the probability that none of the 1770 pairs in the class share a birthday. By the way, this is why the wager would have been unethical for me to take. I knew you would lose.

Slide 16 The rules of probability determine the probability that any combination of events would occur, given that we know how likely the individual events are. In any enterprise where we are looking at data and trying to find a pattern, we want to know if these patterns exist due to some phenomenon we are interested in or simply by chance. In the first FDA example you were confident that the difference between 10 living in the placebo group and 11 in the drug group wasn’t enough to rule out chance. In the second (2 in the placebo group and 18 in the trial group) you were confident that it wasn’t by chance. This is the heart of statistical significance.
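The birthday arithmetic from Slide 15 can be reproduced in a few lines. This is a minimal sketch, assuming 60 people (the placeholder class size used in the notes) and 365 equally likely birthdays. The function `p_no_match_exact` is an addition that is not in the slides: it multiplies the chances person by person instead of treating the pairs as independent, and is included only to show that the pairs shortcut lands close to it.

```python
import math


def n_pairs(n_people):
    """Number of unique pairs among n_people, i.e. n choose 2."""
    return math.comb(n_people, 2)


def p_no_match_pairs(n_people, days=365):
    """Slide 15 shortcut: treat every pair as an independent 1/days chance of matching."""
    return (1 - 1 / days) ** n_pairs(n_people)


def p_no_match_exact(n_people, days=365):
    """Person-by-person calculation: each new person must avoid all earlier birthdays."""
    p = 1.0
    for already_seen in range(n_people):
        p *= (days - already_seen) / days
    return p


if __name__ == "__main__":
    n = 60  # placeholder class size from the notes; change to match your class
    print("pairs:", n_pairs(n))
    print("P(no shared birthday), pairs shortcut:", round(p_no_match_pairs(n), 4))
    print("P(no shared birthday), person by person:", round(p_no_match_exact(n), 4))
    print("P(at least one shared birthday):", round(1 - p_no_match_pairs(n), 4))
```

Either way, the complement rule does the work: the chance that nobody matches is under 1 percent, so the chance of at least one match is above 99 percent.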
If the pattern is unlikely to exist via chance, then we conclude that the pattern is statistically significant. If the probability calculation can’t rule out that the pattern exists by chance, we conclude that the pattern isn’t statistically significant. This is the goal of research, and probability is the tool we use.

Slide 17 What is statistical significance not? It is not a question of the “strength of the relationship.” It might be a powerful pattern, but it might not. What if the drug trial example were run on 20,000 people and 2 in the placebo group survived, but 50 in the trial group did? The drug would have a statistically significant effect, but 99.5% still died.

Slide 18 This is the difference between statistical and substantive significance.

Slide 19 Then what is statistical significance? It is 1) a probabilistic statement 2) that a pattern in the data also exists in the population and 3) isn’t in the data by chance. It is probabilistic because we will be saying that it probably didn’t occur by chance. It could have. Anything is possible by chance. For instance, could it be possible that in the 2008 election Gallup randomly sampled 1000 voters and only got a hold of Obama voters? Sure, it is possible to get 1000 Obama voters out of a population of 300 million. Highly unlikely, but possible. Statistical significance just means that the pattern is sufficiently unlikely to occur by chance that we conclude it is real.

Slide 20 There is always a chance that we are wrong.

Slide 21 The way to test the statistical significance of a pattern is through hypothesis testing. We have talked about the development of hypotheses before, and now we will walk through the logic of the testing. The tricky part is that we never test our hypothesis directly. Instead we test the null hypothesis. The null is the complement to our hypothesis. If we hypothesize that there is a pattern in the data, then the null is the hypothesis that there is no pattern.
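To make "how likely is this pattern if there is no pattern" concrete, the three FDA trial results from Slides 4 to 6 can be run through a small calculation. The sketch below uses a one-sided Fisher exact test, which is a technique swapped in here for illustration and not the method these notes go on to use. It asks: if the drug does nothing and the survivors are just shuffled at random between two groups of 20, how often would the drug group end up with at least as many survivors as observed? The function name `p_at_least_this_lopsided` is hypothetical.

```python
import math


def p_at_least_this_lopsided(drug_survivors, placebo_survivors, group_size=20):
    """One-sided Fisher exact p-value for two equal-sized groups.

    Assumes the null hypothesis (the drug does nothing), so the observed
    survivors are equally likely to fall in either group.
    """
    total = 2 * group_size
    survivors = drug_survivors + placebo_survivors
    ways_total = math.comb(total, group_size)  # ways to pick who gets the drug
    most_possible = min(survivors, group_size)
    ways_as_extreme = sum(
        math.comb(survivors, k) * math.comb(total - survivors, group_size - k)
        for k in range(drug_survivors, most_possible + 1)
    )
    return ways_as_extreme / ways_total


if __name__ == "__main__":
    # (placebo survivors, drug survivors) from Slides 4, 5, and 6
    for placebo, drug in [(10, 11), (2, 18), (8, 12)]:
        p = p_at_least_this_lopsided(drug, placebo)
        print(f"placebo {placebo}/20, drug {drug}/20: p = {p:.3g}")
```

Under these assumptions the 10 versus 11 split is what chance produces about half the time, the 2 versus 18 split essentially never happens by chance, and the 8 versus 12 split sits in between, which matches the students' ambivalence about that third case.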
Slide 22 Why do we use the null hypothesis? Aren’t we interested in finding patterns, not the absence of patterns? Yes. But think about the logic of the birthday example. We wanted to know the probability of finding at least 1 match among us. We looked at the probability that there were no matches, the complement of what we were interested in, and used the rule about complements in probability. We are, intuitively, going to do the same thing here. If there is a pattern in our data, there are 2 possibilities to explain it: 1) it occurred by chance, or 2) it didn’t occur by chance. The first is, essentially, the null hypothesis; the second is the hypothesis we think is true. We will find out the probability that we could find the pattern in the data by chance. If this probability is really low, then we conclude that it probably wasn’t by chance and the pattern is real.

Part of the problem is that we don’t have a strong enough theory to say “by how much.” Example: There is a gender gap in voting for president, therefore women were more likely to vote for Obama than men. That is the extent of our theory. Our theories are never strong enough to give point predictions about how much more. The question we ask is how big a difference is enough to think that our theory is right.

Slide 23 The answer is always: big enough that it didn’t occur by chance. But how big is that? This is where we lean on probability to get the answer. Remember, anything is possible. In a survey it would be possible to get only women who voted for Obama and only men who voted for McCain, even if there were no gender gap.

Slide 24 Not everything is probable. And we want to know exactly how improbable outcomes would be if the null hypothesis were true.

Slide 25: Gender gap in voting example. This is based on the 2008 ANES. The vote calculation is based on question V085044a, recoded so that Obama voters are a 1 and McCain voters are a 0. The gender variable is V083311.
Men are coded as a 1. Women were more likely to vote for Obama than men, 69% vs. 63%. Is that big enough?

Slide 26 Probability lets us answer that. We need to make a few assumptions. 1) Assume that the data are right. 2) Assume that there is no real difference in the population: men and women were equally likely to vote for Obama. 3) Assume a probability distribution (don’t worry right now, this is next week). Basically, assume that we can write down (or the computer knows) a formula for calculating a probability. Then we can calculate the probability that these results were created by chance.

Slide 27 Hold on, I thought we wanted to know the opposite. I thought we wanted to know the probability that it didn’t occur by chance. Don’t we want to test our hypothesis? We can’t really do that. Testing the null hypothesis is the best we can do.

Slide 28 Why? Why is this the best we can do? At the end of our tests we have to conclude one of two things: our theory is right or our theory is wrong. Either the data support our hypothesis (there is a connection between gender and voting) or the data support the null hypothesis (there is no connection). These are two mutually exclusive possibilities. We cannot be both right and wrong. There is no third option. This stark distinction is great. Because there are only two worlds (hypothesis is right/hypothesis is wrong), the probability that we are in one of those worlds is one minus the probability that we are in the other. We can, therefore, test the null hypothesis and use this to determine which world we are in.

Slide 29 Here is what we do. We observe our data. We calculate the probability of seeing these data if there is no relationship. If we sampled 906 women and 621 men, and every person had a .666 probability of voting for Obama {this is the sample estimate from the data}, what is the probability that we would get this big a difference? The answer is very small: 0.01.

Slide 30 That is very unlikely. But what does that mean?
It means one of three things. The data are wrong somehow (and they aren’t). We were really unlucky in our sampling (maybe). Or the assumption that there is no relationship is wrong. If the assumption of no relationship that underlies the probability calculation is wrong, then what is right? Well, we live in one of two worlds. Either there isn’t a relationship or there is. We conclude that there is a relationship.

Slide 31 How improbable is enough? We are forced to make a stark decision based on this probability of the null hypothesis being right. How small a probability is enough to conclude that the null is wrong? 0.05 (1 out of 20). Why? Convention. Something of a holdover from pre-computer days. It is really just a norm. We will talk more about this when we talk about probability distributions next week.
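The gender gap calculation from Slide 29 can be sketched with a two-proportion z-test. This is one common way to do it under stated assumptions, not necessarily how the 0.01 on the slide was produced: the normal approximation is assumed here (the notes defer the choice of distribution to next week), survey weights are ignored, and the inputs are the rounded percentages from the slides. The function name `two_proportion_test` is hypothetical.

```python
import math


def two_proportion_test(p1, n1, p2, n2):
    """Return (pooled proportion, z statistic, two-sided p-value).

    Null hypothesis: both groups share one underlying proportion, estimated
    by pooling the two samples. Uses the normal approximation.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    # Two-sided tail area of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, z, p_value


if __name__ == "__main__":
    # 69% of 906 women and 63% of 621 men voted for Obama (Slides 25 and 29).
    pooled, z, p_value = two_proportion_test(0.69, 906, 0.63, 621)
    print(f"pooled share voting Obama: {pooled:.3f}")
    print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these inputs the pooled share comes out at about 0.666 and the p-value is on the order of 0.01, below the conventional 0.05 cutoff, so the null hypothesis of no gender gap would be rejected.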