TOPIC 32
Binomial Distribution

Rossman/Chance, Workshop Statistics, 4/e Topic 32

If you are completely unprepared for a multiple choice exam and have to guess blindly on each question, how likely are you to pass? Would you prefer few options per question or many? Would you prefer many questions or few? How can you assess whether a group of people does better than blind guessing in a taste test to distinguish between brands of cola? Can election results be predicted simply by comparing candidates’ faces, with no other knowledge about them? You will learn how to answer all of these questions, and many more, as you study the binomial distribution.

Overview

You continued your study of random variables in the previous topic. Now you will investigate the distribution of a particular type of random variable, known as the binomial distribution, that arises in many situations. This probability distribution often applies to studies that collect binary categorical data. In addition to studying properties of the binomial distribution, you will learn how to apply it to assess the statistical significance of a sample result.

Preliminaries

1. If you were presented with three cups of cola, two of which contained one brand and one of which contained a different brand, do you think you would be able to successfully pick out the odd cola?

2. In the previous scenario, if someone else successfully picked out the odd cola four times in four tries, would you be fairly convinced the person is actually able to distinguish better than blind guessing?

3. In the question 1 scenario, if 12 people in a class of 21 correctly identify the odd cola, would you be fairly convinced the class as a whole is able to distinguish better than blind guessing?

4. If you are completely unprepared for a multiple choice exam and so must blindly guess on all the questions, would you prefer that the questions have three options (possible answers) each or five options each? Explain briefly.

5. In the question 4 scenario, if you must answer at least half the questions correctly in order to pass the exam, would you prefer the exam consist of 10 questions or 30 questions? Explain briefly.

In-Class Activities

Activity 32-1: Distinguishing Between Colas
13-13, 15-13, 17-24, 18-9, 32-1, 32-2, 32-7, 32-8, 32-9

In an experiment to determine whether people can distinguish between two brands of cola, subjects are presented with three cups. Two cups contain one brand of cola, and the third contains the other brand. Subjects are to taste from all three cups and then identify the one that differs from the other two.

a. If you have absolutely no ability to distinguish between the colas and, therefore, must guess blindly, what is the probability you would correctly identify the cup containing the odd cola?

Now suppose you repeat this process (of trying to identify which of the three cups contains the odd cola) a total of four times. Let’s continue to assume you have no ability to distinguish and so make blind guesses each time.

b. Go ahead: Make a blind guess for which of the three cups contains the odd cola by circling one letter for each of these four trials:

1. A B C
2. A B C
3. A B C
4. A B C

c. Your instructor will tell you the “correct” answers here. Count how many you got correct, and then combine your results with the rest of your class. Record these results in the following table:

Number of Correct Identifications    0    1    2    3    4    Total
Number of Students
Proportion of Students

d. Based on these results, would you consider it surprising if a guessing subject got all four identifications correct? Explain.

Now let’s begin to calculate the exact probabilities of the various possible outcomes for the number of correct identifications, still assuming a subject guesses blindly each time. The following lists all possible outcomes in the sample space.
For example, S1S2F3S4 (the third outcome in the top row) means a subject correctly identified the odd cup on the first two trials (Successes), but then got the third one wrong (Failure) before getting the fourth one correct (Success).

S1S2S3S4    S1S2S3F4    S1S2F3S4    S1S2F3F4
S1F2S3S4    S1F2S3F4    S1F2F3S4    S1F2F3F4
F1S2S3S4    F1S2S3F4    F1S2F3S4    F1S2F3F4
F1F2S3S4    F1F2S3F4    F1F2F3S4    F1F2F3F4

e. How many possible outcomes are in this sample space?

f. Explain why these outcomes are not equally likely.

g. Determine the value of the random variable X = number of correct answers for each outcome in the sample space. Record these values directly below the outcomes in the sample space listed above part e.

h. Use the multiplication rule for independent events to calculate the probability of the outcome S1S2F3S4. Also explain why it is reasonable here to assume the outcomes are independent from trial to trial.

i. What is true about the probabilities of the outcomes (S1S2S3F4, S1S2F3S4, S1F2S3S4, F1S2S3S4) that correspond to three successes (i.e., X = 3)? Explain. (Hint: You have already calculated one of these probabilities. You do not need to calculate the others if you think this through.)

j. To determine Pr(X = 3), add the probabilities Pr(S1S2S3F4) + Pr(S1S2F3S4) + Pr(S1F2S3S4) + Pr(F1S2S3S4). Also explain why the addition is valid here.

You should have found in part i that all outcomes resulting in the same number of successes (such as X = 3) have the same probability. Therefore, you can find the probability of a certain number of successes by multiplying the probability of any one of these outcomes by the number of outcomes that produce that number of successes.

k. How many outcomes correspond to two successes (i.e., X = 2)? Use this answer and the probability of one of these outcomes (say, S1S2F3F4) to determine Pr(X = 2).

l. The other probabilities are reported in this table (to four decimal places). Enter the probabilities you calculated in parts j and k. Also verify the probabilities sum to 1.

Number Correct    Probability
0                 .1975
1                 .3951
2
3
4                 .0123

m. Are these theoretical probabilities fairly close to the empirical estimates from your class results in part c?

This probability distribution is known as the binomial distribution. It applies to random processes for which
• Each trial has two possible outcomes (typically referred to as “success” and “failure”).
• The probability of “success” remains constant on each trial (call it π).
• The outcomes of the trials are independent.
• The random variable of interest (X) is the number of successes in a fixed number (n) of trials.

The probability distribution of a binomial random variable with parameters n and π is given by

$$\Pr(X = k) = \binom{n}{k} \pi^{k} (1 - \pi)^{n-k}$$

for possible values k = 0, 1, …, n. The expected value of a binomial random variable is nπ, and the variance is nπ(1 – π).

The term $\binom{n}{k}$ is read “n choose k” and is calculated as

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

where $n! = n(n-1)(n-2)\cdots(2)(1)$ and $0!$ is defined to be 1.

You can calculate binomial probabilities by hand or with technology. Note: When using technology, you can often calculate either the probability of an exact number of successes, Pr(X = k), or the probability of that many or fewer successes, Pr(X ≤ k).

n. Use the probabilities in part l to determine the probability that a guessing subject gets at least half of the four identifications correct.

o. Use the probability distribution in part l to calculate the expected number of successes in four trials, still assuming the subject is blindly guessing. Also interpret this value, and explain why it makes sense intuitively.

p. Use technology to verify the calculations in part n for a subject blindly guessing on four trials.
Also use technology to produce a graph of this probability distribution.

q. Suppose a subject really does get all four identifications correct. Is it possible that a guessing subject could do this? Is it very unlikely that a guessing subject would do this? Based on this probability, would you be fairly convinced this subject is really able to do better than blind guessing at distinguishing colas? Explain.

Watch Out
Be careful not to round too much too early with these probability calculations. For example, use .333, or better yet .3333, rather than .33 to represent 1/3; otherwise considerable rounding errors could emerge in subsequent calculations.
Always check that the four conditions of a binomial distribution (two possible outcomes, independent outcomes, constant probability of success, fixed number of trials) are satisfied before you apply the binomial distribution.

Activity 32-2: Distinguishing Between Colas
13-13, 15-13, 17-24, 18-9, 32-1, 32-2, 32-7, 32-8, 32-9

Reconsider the previous activity. A statistics professor actually conducted this study with 21 students, each of whom tried to identify which of three cups of cola was different from the other two.

a. Suppose again that all of the students were guessing blindly among the three cups. Determine the expected value of the number of students who correctly identify the odd cola. Also explain why this value makes sense intuitively. (Hint: The binomial model is reasonable here because you are assuming all students are guessing blindly, so their selections are independent and have the same probability of success.)

It turned out that 12 of the 21 students successfully identified the odd cola.

b. Is this more than the expected value?

c. Determine the probability that 12 or more students would correctly identify the odd cola if, in fact, all 21 students were blindly guessing among the three cups.
In other words, find Pr(X ≥ 12), where the random variable X has a binomial distribution with n = 21 and π = 1/3. (Hints: You can calculate this in two ways. You could calculate the probabilities of exactly 12 successes, exactly 13 successes, and so on, through 21 successes, and then sum those probabilities. An easier way is to use technology. One caution is that some technology tools may require you to calculate Pr(X ≤ 11) first and then use the complement rule to calculate Pr(X ≥ 12) = 1 – Pr(X ≤ 11).)

d. Based on this probability, is it possible for 12 or more students to have been successful even if they were all blindly guessing? Is it very unlikely for 12 or more students to have been successful even if they were all blindly guessing?

e. Do the sample data provide fairly strong evidence to suggest these students do better than blind guessing when trying to distinguish between cola tastes? Explain the reasoning behind your conclusion, based on the probability calculation.

The reasoning process you have employed here concerns the concept of statistical significance and is called a binomial test. The probability you have calculated is a p-value. When this probability is very small, it suggests that data at least as extreme as the observed result would very rarely occur by chance if your starting assumption about the random process (e.g., subjects are blindly guessing) is correct. Therefore, a small p-value provides evidence against your starting assumption about the process. (The concepts of statistical significance and p-value are examined in detail in Topics 13 and 14 and in Units 4 and 5.)

In this case, a p-value of .021 says that if subjects were blindly guessing, there is only about a .021 probability of obtaining 12 or more successes in 21 trials. This probability is small enough to cast reasonably strong doubt on the assumption that these subjects are blindly guessing.
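If you would like to see what “use technology” can look like, here is a minimal Python sketch (standard library only; `math.comb` requires Python 3.8 or later). The function names `binom_pmf`, `binom_cdf`, and `binom_upper_tail` are illustrative choices, not part of any statistics package. The sketch evaluates the binomial formula for the four-trial cola experiment, checks that the expected value and variance computed directly from the distribution agree with nπ and nπ(1 – π), and then finds the p-value Pr(X ≥ 12) for n = 21 and π = 1/3 using the complement rule 1 – Pr(X ≤ 11).

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Pr(X = k) for X ~ Binomial(n, p), i.e. C(n, k) * p**k * (1 - p)**(n - k)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k): add up the pmf for 0, 1, ..., k (an empty sum is 0)."""
    return sum(binom_pmf(j, n, p) for j in range(0, min(k, n) + 1))


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Pr(X >= k) via the complement rule: 1 - Pr(X <= k - 1)."""
    return 1.0 - binom_cdf(k - 1, n, p)


# Activity 32-1: four trials, blind guessing among three cups, so pi = 1/3.
n, p = 4, 1 / 3
dist = {k: binom_pmf(k, n, p) for k in range(n + 1)}
for k, prob in dist.items():
    print(f"Pr(X = {k}) = {prob:.4f}")

# Expected value and variance computed directly from the distribution;
# these should agree with n*pi and n*pi*(1 - pi).
mean = sum(k * prob for k, prob in dist.items())
var = sum((k - mean) ** 2 * prob for k, prob in dist.items())
print(f"E(X) = {mean:.4f} (n*pi = {n * p:.4f})")
print(f"Var(X) = {var:.4f} (n*pi*(1 - pi) = {n * p * (1 - p):.4f})")

# Activity 32-2: p-value Pr(X >= 12) when n = 21 and pi = 1/3.
p_value = binom_upper_tail(12, 21, 1 / 3)
print(f"Pr(X >= 12) = {p_value:.3f}")
```

Running it should reproduce the .1975, .3951, and .0123 entries in the table of part l and the .021 p-value quoted above. Computing Pr(X ≥ k) as 1 – Pr(X ≤ k – 1) mirrors what many calculators and software packages require, which is why the upper-tail function is written in terms of the cumulative one.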
Now suppose this study had involved twice as many subjects, and the same proportion successfully identified the odd cola. (So, for instance, suppose 24 of 42 subjects were successful.)

f. Without doing any calculations, would you expect this result to provide stronger evidence against the assumption of blind guessing, weaker evidence against that assumption, or the same amount of evidence? Explain the intuition behind your answer.

g. Reconsider the previous question. Use technology to perform the relevant binomial probability calculation. Report this probability, and indicate whether it is greater than, less than, or the same as the probability in part c.

h. In light of your answer to part g, do you want to reconsider your answer to part f? Explain.

Watch Out
Be careful when applying the complement rule with the binomial distribution. Depending on the technology you are using, you may need to find Pr(X ≥ k) by calculating 1 – Pr(X ≤ k – 1). In other words, the complement of “k or more successes” is “k – 1 or fewer successes.”
It is easy to assume the p-value tells you more than it really does. The p-value reveals the probability of obtaining a sample result as extreme as the one observed (or even more extreme), given the starting assumption about the random process (e.g., blind guessing). But it does not provide the probability that the starting assumption is correct. In other words, the p-value does not give you the probability that subjects are blindly guessing. This p-value also does not tell you why these subjects are doing better than blindly guessing.

Activity 32-3: Binomial or Not?
32-3, 32-10

For each of the following situations, indicate whether the random variable has a binomial distribution. If it is binomial, indicate the values of n and π (if known). If you need to make any assumptions, clearly state them.
If the random variable is not binomial, indicate which of the binomial conditions are not satisfied and explain why.

a. You flip a fair coin 100 times and count the number of heads.

b. You flip a fair coin and count the number of flips until you obtain a head for the first time.

c. You give four babies back to their mothers at random and count the number of mothers who get the correct baby.

d. You bet on a color for ten spins of a roulette wheel and count how many times you win.

e. You play 20 games of a new video game for the first time, improving as you gain more experience, and count the number of times you win.

f. You randomly select 10 playing cards from a 52-card deck and count how many cards are red.

g. You randomly select one playing card from a 52-card deck, replace it, shuffle the cards, and draw another card at random. You repeat this process for a total of ten draws, counting the number of red cards that you draw.

These last two questions reveal the difference between sampling without replacement (in part f) and sampling with replacement (in part g). Sampling without replacement does not produce a binomial distribution because the draws are not independent. (For example, if the first 9 draws all produced red cards, then the probability of drawing a red card would be lower on the 10th draw, because more black cards than red ones would be left in the deck.) But sampling with replacement does lead to a binomial distribution, because putting each card back and reshuffling ensures the draws are independent and the probability of success does not change.

A related situation arises when a sample is taken without replacement from a very large population, such as a sample of 1000 individuals from all American adults. Because you are sampling without replacement, the binomial distribution technically does not apply. However, when the sample size is a small fraction of the population size, the conditional probability of success does not change much, and the binomial distribution can safely be used as an approximation. The conventional guideline is that this approximation is reasonable when the population size is at least 20 times the sample size.

h. You take a random sample of 25 Reese’s Pieces candies (which come in brown, yellow, and orange) and count the number that are brown or yellow. Explain why the number of brown or yellow candies can safely be considered to approximately follow a binomial distribution.

Watch Out
The word success is commonly used for one of the outcomes in a binomial process. But in a given situation, the term success might not refer to a happy event. Particularly with medical studies, a success could be defined as an unfortunate outcome such as illness or death. Also, when a process has more than two possible outcomes, you can often consider one a “success” and all others a “failure.”

Activity 32-4: Marriage Ages
8-17, 9-6, 16-19, 17-22, 23-1, 23-12, 26-4, 29-17, 29-18, 32-4, 32-18

Recall from Activity 8-17 that a student examined the ages of 24 married couples. Look back at the table of raw data (ages) in Activity 8-17 on pages 163-164 of the textbook. For two of the 24 couples, you cannot tell whether the husband or wife is older.

a. For the remaining 22 couples, in how many is the husband older than the wife? In how many is the wife older than the husband?
Husband older:
Wife older:

b. Suppose, for the moment, that there were no tendency for husbands to be older than their wives. Then, for each of the 22 couples, the process of learning who is older would be like flipping a fair coin. In this situation, let the random variable X be the number of couples for which the husband is older. Would X have a binomial distribution? If so, explain and indicate what the values of n and π are. If not, explain why not.
c. Use the binomial distribution to calculate the probability that the number of couples for which the husband is older would be at least as large as your answer to part a.

d. Is this probability small enough to cast considerable doubt on the assumption that there’s no tendency for husbands to be older than their wives? Explain the reasoning process behind your answer.

The procedure you have performed here, which ignores the quantitative aspect of the data and considers only which member of each pair has the larger value (age, in this case), is called a sign test.

Self-Check

Activity 32-5: Pop Quiz

Suppose you are completely unprepared for a multiple choice quiz and so must guess blindly among the options for each question. You will pass the quiz only if you answer more than half of the questions correctly. Suppose there are five questions with four options for each. Let the random variable X be the number of questions that you answer correctly.

a. Explain why this random variable has a binomial distribution. (Hint: List and verify each of the four conditions for a binomial random variable.)

b. Identify the values of n and π for this binomial distribution.

c. Determine the probabilities (to four decimal places) for all possible values of the random variable X. (Feel free to use technology.)

d. Calculate the probability that you pass the quiz (answer more than half of the questions correctly). Also interpret this probability.

e. If this were a true/false quiz (i.e., only two options per question) with five questions, would you expect to have a larger, smaller, or the same probability of passing the quiz? Explain intuitively, without performing any calculations.

f. Reconsider the previous question. Now calculate this probability, and compare it to your answer from part d.

g. Now suppose the quiz again has four options per question, but with 15 questions rather than 5. Would you expect to have a larger, smaller, or the same probability of passing the quiz? Explain intuitively, without performing any calculations.

h. Reconsider the previous question. Now calculate this probability, and compare it to your answer from part d.

Solution

a. Each question is a trial with two possible outcomes (correct answer, wrong answer). The trials are independent because your answer to any one question does not affect the probability of answering any other question correctly. The probability of success does not change because you are blindly guessing on each question and so have a 1/4 = .25 probability of answering correctly. The random variable is the number of successes (correct answers) in five trials (questions).

b. n = 5, π = .25

c. The possible values are 0, 1, 2, 3, 4, and 5. To calculate Pr(X = 3) by hand:

$$\Pr(X = 3) = \binom{5}{3} (.25)^{3} (1 - .25)^{5-3} = 10\,(.25)^{3} (.75)^{2} \approx .0879$$

Calculating the other probabilities with technology gives

Number Correct    Probability (n = 5, π = .25)
0                 .2373
1                 .3955
2                 .2637
3                 .0879
4                 .0146
5                 .0010

d. Passing the quiz requires getting three or more answers correct. The probability of this result is .0879 + .0146 + .0010 = .1035. If you took many such quizzes with a strategy of blind guessing, then in the long run you would pass about 10.35% of them.

e. With fewer options per question, the probability of getting any one question correct increases, so the probability of passing the quiz (answering more than half of the questions correctly) should also increase.

f. The random variable now has a binomial distribution with n = 5 (as before) and π = 1/2 = .5 (higher than before). The probability distribution is now

Number Correct    Probability (n = 5, π = .5)
0                 .03125
1                 .15625
2                 .3125
3                 .3125
4                 .15625
5                 .03125

The probability of passing the quiz is now .3125 + .15625 + .03125 = .5.
You now have a 50-50 chance of passing the quiz, much higher than before. If you must guess blindly among several options, you should hope for a small number of options to guess among.

g. With more questions, and with a fairly small probability of getting any one question correct (.25), you are less likely to get more than half of the questions correct. Your expected number of correct answers (nπ = 15 × .25 = 3.75) is well below half of the questions, and with more questions it becomes harder to get lucky and answer many more questions correctly than expected.

h. The random variable now has a binomial distribution with n = 15 (greater than before) and π = 1/4 = .25 (as originally). Technology reveals the probability of answering 8 or more correctly is .0173. This probability is much less than before, indicating that you would pass fewer than 2% of all such quizzes by blindly guessing. If you must guess blindly among several options, you should hope for a small number of questions.

Wrap-Up

This topic has extended your study of probability and random variables by introducing you to the binomial distribution. This probability distribution arises when you are counting the number of “successes” in a fixed number of trials, where the trials are independent and the probability of success does not change. You have derived an expression for calculating binomial probabilities, depending on the number of trials (n) and the probability of success (π). You have also practiced using technology to perform such calculations, and you have been given formulas for the expected value and variance of a binomial distribution.

Much of your work in this topic has involved applying the binomial distribution to assess whether an observed result is statistically significant, meaning it is unlikely to occur by chance under a starting assumption about the random process.
For random processes to which the binomial distribution applies, such a reasoning process is called a binomial test. When the calculated probability from such a test is very small, the observed result is unlikely to have occurred by chance (when the starting assumption is true) and so provides evidence against the starting assumption. You have also applied this binomial test to quantitative data (turning the outcomes into yes/no responses), producing what is known as a sign test.

In Brief

Some useful definitions to remember and habits to develop from this topic are
• Binomial distributions arise when four conditions are met: trials result in only two possible outcomes, trials are independent, the probability of success does not change, and you’re counting the number of successes in a fixed number of trials.
• Binomial probabilities can be calculated from this expression:
$$\Pr(X = k) = \binom{n}{k} \pi^{k} (1 - \pi)^{n-k}$$
• The expected value of a binomial distribution is nπ; the variance is nπ(1 – π).
• Binomial distributions apply to random processes that involve sampling with replacement. When the population size is much larger than the sample size (at least 20 times larger), the binomial distribution can also be used as a reasonable approximation when sampling without replacement.
• The probability of obtaining, by chance, a result at least as extreme as an observed result, under a starting assumption about the random process, is called a p-value.
• A binomial test calculates a p-value based on the binomial distribution. When a p-value is very small, the observed result casts doubt on the starting assumption that underlies the test.
• A sign test provides an alternative procedure to a paired t-test, applying the binomial distribution to paired quantitative data by taking into account only which member of each pair has the greater value.

You should be able to
• Identify whether or not a random variable has a binomial distribution and, if so, specify its parameter values.
(Activities 32-1, 32-3, 32-5)
• Calculate probabilities from the binomial distribution. (Activities 32-1, 32-3, 32-4, 32-5)
• Perform and interpret the results of a binomial test. (Activities 32-2, 32-4)
• Conduct a sign test using the binomial distribution. (Activity 32-4)

Exercises

Exercise 32-6: Choosing Teams

Ten basketball players gathered at lunchtime to play a quick game. In order to divide into two teams of five players each, each player stood along the midcourt line facing the same sideline and then, on cue, took a step at random to one side or the other. If five people stepped one way and the other five stepped the other way, that determined the teams. Otherwise, they would repeat the process and continue until five stepped in each direction.

a. Determine the probability that five players would step in each direction on the very first try. Also be sure to indicate how your calculation relates to a binomial distribution.

One day the players came up with a new strategy: The first nine to show up would all stand in a line and then step at random in one direction. If five stepped one way and four the other way, that would determine the teams, with the tenth player joining the team with four.

b. Determine the probability that the nine steps would result in five stepping one way and four the other way. Again be sure to indicate how your calculation relates to a binomial distribution.

c. Does the new strategy increase the probability the players will be able to form teams by taking only one set of steps?

Exercise 32-7: Distinguishing Between Colas
13-13, 15-13, 17-24, 18-9, 32-1, 32-2, 32-7, 32-8, 32-9

Reconsider Activity 32-1 concerning a study of whether people can distinguish between cola brands. Subjects were presented with three cups of cola, two of which contained the same brand, while the third contained a different brand. Suppose all subjects were blindly guessing.
Determine the probability that more than half of the sample would correctly identify the odd cola if the sample size were

a. n = 9 (Be sure to indicate how you perform the calculation.)

b. n = 21

c. n = 99

d. Comment on how the probability (of more than half correctly identifying the odd cola) changes as the sample size increases. Also explain why this makes sense intuitively.

Exercise 32-8: Distinguishing Between Colas
13-13, 15-13, 17-24, 18-9, 32-1, 32-2, 32-7, 32-8, 32-9

Reconsider the previous exercise. Now suppose each subject is presented with four cups instead of three, again with one cup containing a different brand of cola than the others. Suppose again that all subjects guess blindly in trying to identify the odd cola. Consider again the probability that more than half of the sample guesses correctly.

a. Would you expect this probability to be larger than before (when there were only three cups), smaller, or the same? Explain your reasoning.

b. Perform this probability calculation with a sample size of n = 9. Is your intuition from part a supported? Explain.

Exercise 32-9: Distinguishing Between Colas
13-13, 15-13, 17-24, 18-9, 32-1, 32-2, 32-7, 32-8, 32-9

Reconsider Activity 32-1 again. Suppose again that all 21 subjects are guessing blindly among the three cups.

a. How many correct identifications are necessary in order for such a result (or one more extreme) to have a probability of .05 or less? (Hint: You could use trial and error, or your technology might have an “inverse cumulative probability” feature that you can use. In this case, think carefully about whether you are finding k or k – 1.)

b. Repeat part a for a probability of .01 or less.

Exercise 32-10: Binomial or Not
32-3, 32-10

For each of the following, indicate whether it satisfies the conditions for a binomial distribution.
If it does, explain why and also specify the values of n and π, if possible. If you must make any assumptions, clearly state them. If not, identify which conditions are not satisfied and explain why.

a. Four students are chosen at random from your class, and you count the number of males chosen.

b. Two baseball teams play a series of games until one of them wins a total of four games. You count the total number of games played.

c. You play solitaire repeatedly until you win for the third time, counting the total number of games played.

d. You play ten games of solitaire and count how many times you win.

e. You collect a sample of 50 M&M candies and count the number of green ones.

Exercise 32-11: Rolling Dice
30-8, 31-7, 31-8, 31-9, 32-11

A classic gambling question that helped to initiate the mathematical study of probability is the following: Which is more likely: that 4 rolls of one fair die would result in at least one 6, or that 24 rolls of a pair of fair dice would result in at least one (6, 6)?

a. To calculate the probability that four rolls of one fair die would result in at least one 6, define an appropriate random variable and indicate the probability of interest in terms of that random variable.

b. Does the random variable you defined follow a binomial distribution? Explain. If it does, identify the values of n and π.

c. Explain why the probability of rolling at least one 6 in 4 rolls of a fair die is not 4/6.

d. Calculate the probability of rolling at least one 6 in 4 rolls of a fair die. (Hint: Clearly explain how you are using the complement rule and/or any other rules.)

e. Calculate the probability of rolling at least one (6, 6) in 24 rolls of a pair of fair dice. Again, set up your answer in terms of random variable notation.

f. Are these probabilities the same? Are they similar? Which is the better bet?
Exercise 32-12: Free Candy Bars
32-12, 32-13

An Australian teacher of statistics noticed that the maker of Mars candy bars was running a special promotion. The wrapper advertised that 1 in 6 candy bars would come with a prize for a free candy bar. The teacher promptly bought 18 candy bars. Let the random variable X represent the number of free candy bars that he would win.

a. What assumptions must you make in order for X to satisfy the conditions of a binomial distribution?

b. Determine the expected value of the number of free candy bars that he would win.

c. Calculate the probability that he would win exactly this many (your answer to part b) free candy bars. Would you say that he is likely to win this many free candy bars?

d. It turned out that he won 0 free candy bars from his purchase of 18 bars. What is the probability of this result, assuming the claim that 1 in 6 candy bars wins a free candy bar is true?

e. Is the probability in part d small enough to cast doubt on the claim that 1 in 6 candy bars is a winner? Explain your reasoning.

Exercise 32-13: Free Candy Bars
32-12, 32-13

a. Suppose the manufacturer made 6,000 candy bars and 1,000 of these bars were “winners.” If the teacher buys 18 candy bars, none of which are winners, calculate the probability that the next randomly selected candy bar will be a winner. (Hint: How many candy bars are left in the population, and how many are winners?)

b. Is the probability in part a close to 1/6?

c. Repeat part a assuming only 600 candy bars were manufactured, including 100 winners.

d. Repeat part a assuming only 60 candy bars were manufactured, including 10 winners.

This exercise illustrates how the binomial distribution, which would use a probability of 1/6 for each trial, can be a reasonable approximation when sampling without replacement from a population whose size is much larger than the sample size, as in part a.
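The same idea can be checked numerically for the playing-card setting of Activity 32-3. The Python sketch below (standard library only) compares exact probabilities for drawing without replacement, which come from the hypergeometric formula (a distribution this topic does not otherwise cover), with the binomial approximation. As a hypothetical setup, 10 cards are drawn from a pile made of 1, 10, or 100 standard decks shuffled together, so the population is about 5, 52, or 520 times the sample size; the function names are illustrative only.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Exact Pr(X = k) when n items are drawn WITHOUT replacement
    from a population of N items, K of which count as successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial Pr(X = k): n independent draws, each a success with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10        # sample size: 10 cards drawn
gaps = {}     # population size N -> largest |exact - binomial| over k = 0..n
for decks in (1, 10, 100):           # hypothetical piles of 1, 10, or 100 full decks
    N, K = 52 * decks, 26 * decks    # half of the cards are red
    exact = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]
    approx = [binom_pmf(k, n, K / N) for k in range(n + 1)]
    gaps[N] = max(abs(a - b) for a, b in zip(exact, approx))
    print(f"N = {N:5d}, N/n = {N / n:5.1f}: largest difference = {gaps[N]:.4f}")
```

The largest difference between the two sets of probabilities should be noticeable for a single deck (population only about 5 times the sample size) and should shrink steadily as the population grows, consistent with the guideline that the binomial approximation is reasonable once the population is at least 20 times the sample size.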
Exercise 32-14: Baseball “Big Bang” 17-5, 17-17, 32-14
In some baseball games, the winning team scores more runs in one inning than the losing team does in the entire game. This phenomenon is known as a “big bang.” (If you don’t know anything about baseball, don’t worry.) A young reader once wrote to the “Ask Marilyn” column in Parade magazine to say that his grandfather told him this “big bang” phenomenon occurs in 3/4 of all baseball games. Marilyn responded by asserting that the proportion of games with a “big bang” is actually less than 3/4. To investigate these claims, a random sample of 15 of the 2430 major-league baseball games played in 2007 was taken, and the number of these games with a “big bang” was determined.
a. Even though the sampling was done without replacement, applying a binomial distribution is reasonable here. Explain why.
b. Determine the expected number of “big bang” games in the sample, assuming the grandfather’s claim is true.
The sample actually revealed that a “big bang” occurred in 7 of these 15 games.
c. Use the binomial distribution to determine the probability of a result at least this extreme if the grandfather’s claim is true. (Hint: Because Marilyn conjectured that the actual probability of a “big bang” is less than 3/4, you want the probability of 7 or fewer big bangs in a sample of 15, assuming the grandfather’s claim is true.)
d. Is the probability small enough to cast strong doubt on the grandfather’s claim? Explain the reasoning behind your answer.

Exercise 32-15: Heart Transplant Mortality
In September 2000, heart transplants at St. George’s Hospital in London were suspended because of concern that more patients were dying than previously. Newspapers reported that the 20% survival rate in the last ten cases at the hospital was of particular concern because it was less than one-fourth the national average.
Let the random variable X represent the number of successful operations in a random sample of ten cases. Suppose the probability of survival in a heart transplant for any one patient at this hospital is equal to the national rate of .85.
a. Identify the probability distribution of X (both its name and the values of n and π).
b. Report the probabilities for the possible values of X, and provide a graph of this probability distribution. (Feel free to use technology.)
c. Identify the most likely value of X and its probability.
Of the past ten heart transplant cases in this hospital, eight died.
d. Determine the probability of two or fewer survivors in a random sample of ten heart transplant patients, still assuming the probability of survival at this hospital is equal to the national rate of .85.
e. Based on the probability in part d, do the data (eight deaths in the last ten cases in this hospital) provide strong evidence that the probability of survival in this hospital is actually less than the national rate of .85? Explain the reasoning process underlying your answer.
f. When analyzing data on all 371 patients who received a heart transplant at this hospital between 1986 and 2000, researchers found that 79 had died (Poloniecki et al., 2004). Determine the p-value based on these data. (In other words, determine the probability of finding 292 or fewer survivors among 371 random patients, assuming the probability of survival is .85.)
g. Summarize your conclusions from these 15-year data, and explain how the conclusions follow from your probability analysis.

Exercise 32-16: Predicting Elections 32-16, 32-17
Do voters make judgments about political candidates based on their facial appearance? Can you correctly predict the outcome of an election, more often than not, simply by choosing the candidate whose face is judged to be more competent-looking?
Researchers investigated this question in a study published in Science (Todorov, Mandisodza, Goren, and Hall, 2005). Participants were shown pictures of two candidates and asked who had the more competent-looking face. Researchers then predicted the winner to be the candidate whose face was judged to look more competent by most of the participants. For the 32 U.S. Senate races in 2004, this method predicted the winner correctly in 23 of them.
a. Determine the probability that 32 flips of a fair coin would result in 23 or more heads. (This probability is the p-value of the test.)
b. Explain how your calculation in part a relates to the research question in this study.
c. Summarize the conclusion you would draw from this study, based on your probability calculation. Also explain the reasoning process behind your conclusion.

Exercise 32-17: Predicting Elections 32-16, 32-17
Reconsider the previous activity. These researchers also predicted the outcomes of 279 races for the U.S. House of Representatives in 2004. The “competent face” method correctly predicted the winner in 189 of those races. Calculate the p-value for these data, summarize your conclusion, and explain its reasoning process.

Exercise 32-18: Marriage Ages 8-17, 9-6, 16-19, 17-22, 23-1, 23-12, 26-4, 29-17, 29-18, 32-4, 32-18
Reconsider Activity 32-4. Those data were part of a larger study in which the student researcher analyzed 100 marriage licenses. The husband was older than the wife in 67 couples, the wife was older in 27 couples, and the older person could not be determined for 6 couples.
a. Conduct a sign test on these data to assess whether the husband tends to be older than the wife. Report the p-value, and describe how you calculated it. Also summarize your conclusion and explain its reasoning process.
b. For the six couples who were reported to be the same number of years old, what would the p-value be if the husband was actually older in each case? Would your conclusion change?
c. For the six couples who were reported to be the same number of years old, what would the p-value be if the husband was actually younger in each case? Would your conclusion change?

Exercise 32-19: Catnip Aggression 23-14, 23-15, 32-19
Refer to the study and data in Activity 23-13. Conduct a sign test to investigate whether the data provide evidence that cats tend to have more negative interactions after being exposed to catnip than before. Describe how you calculated the p-value, and explain how you reached your conclusion.

Activity 32-20: Alarming Wake-Up 23-4, 30-5, 32-20, 32-21
Reconsider Activity 30-5, which described a study that investigated whether children are more likely to wake up when a smoke alarm uses a mother’s voice rather than a conventional tone. The data from the study are reproduced in the following table:

                                    Awoke to          Do Not Wake to
                                    Mother’s Voice    Mother’s Voice    Total
Awoke to Conventional Alarm         14                0                 14
Do Not Wake to Conventional Alarm   9                 1                 10
Total                               23                1                 24

One way to analyze these data is to ignore the children who had the same response to both alarms, considering only those who woke up to one alarm but not the other.
a. How many of these 24 children woke up to one type of alarm but not the other?
b. How many of them woke up to the mother’s voice but not the conventional alarm?
c. Assume for now that each of the nine children who woke up to only one kind of alarm was equally likely to wake up to the mother’s voice or the conventional tone. Given that assumption, calculate the probability that the results would have turned out at least as extreme as they did: all nine waking up to the mother’s voice.
d. Is this probability small enough to provide convincing evidence that the mother’s voice is a more effective alarm than the conventional tone? Explain your reasoning. (This procedure is called McNemar’s test and applies to matched-pairs data collected with a binary response variable.)

Activity 32-21: Alarming Wake-Up 23-4, 30-5, 32-20, 32-21
Reconsider the previous activity. Another response variable in that study was whether the child successfully escaped the house within five minutes of the alarm sounding. The data are reproduced in the following table:

                                        Escaped to          Did Not Escape to
                                        Mother’s Voice      Mother’s Voice      Total
Escaped to Conventional Alarm           7                   2                   9
Did Not Escape to Conventional Alarm    13                  2                   15
Total                                   20                  4                   24

Conduct McNemar’s test on these data, investigating whether children are more likely to escape successfully within five minutes when the alarm uses the mother’s voice rather than the conventional tone. Show the details of calculating the probability, and explain the reasoning process behind your conclusion.
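The p-values in Exercises 32-14 through 32-17 and the sign and McNemar tests in Exercises 32-18 through 32-21 all reduce to binomial tail probabilities. The Python sketch below is one way to compute them exactly. The function names are hypothetical, and `mcnemar_exact_p` implements the one-sided version described in Activity 32-20: discard the children with the same response to both alarms, then treat each remaining (discordant) child as a fair coin flip under the null hypothesis.

```python
from fractions import Fraction
from math import comb

def binom_upper_tail(k: int, n: int, p: Fraction) -> Fraction:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly over j = k, ..., n."""
    return sum((comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1)), Fraction(0))

def binom_lower_tail(k: int, n: int, p: Fraction) -> Fraction:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly over j = 0, ..., k."""
    return sum((comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(0, k + 1)), Fraction(0))

def mcnemar_exact_p(favor_new: int, favor_old: int) -> Fraction:
    """One-sided exact McNemar test using discordant pairs only.

    Under the null hypothesis each discordant pair is equally likely to favor
    either alarm, so the count favoring the new alarm is Binomial(n, 1/2).
    """
    n = favor_new + favor_old
    return binom_upper_tail(favor_new, n, Fraction(1, 2))

# Activity 32-20: all 9 discordant children woke only to the mother's voice.
p_wake = mcnemar_exact_p(9, 0)
# Activity 32-21 (from the table): 13 escaped only with the mother's voice,
# 2 escaped only with the conventional alarm.
p_escape = mcnemar_exact_p(13, 2)
# Exercise 32-14: P(X <= 7) for X ~ Binomial(15, 3/4), the grandfather's claim.
p_big_bang = binom_lower_tail(7, 15, Fraction(3, 4))

for label, p in [("wake", p_wake), ("escape", p_escape), ("big bang", p_big_bang)]:
    print(f"{label}: p-value = {float(p):.4f}")
```

For larger samples, a library routine such as SciPy's `scipy.stats.binom.cdf` (lower tail) or `scipy.stats.binom.sf` (upper tail, evaluated at k - 1) gives the same tail probabilities in floating point.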