Section 6.3 Binomial and Geometric Random Variables

Introduction
We frequently encounter experimental situations where there are two outcomes of interest:
• Boy/girl
• Makes a shot/misses
• Meets requirements/doesn’t meet
• Coin flip: kicking off/receiving

Binomial Settings
When the same chance process is repeated several times, we are often interested in whether a particular outcome does or doesn’t happen on each repetition. Some random variables count the number of times the outcome of interest occurs in a fixed number of repetitions. They are called binomial random variables.
A binomial setting arises when we perform several independent trials of the same chance process and record the number of times that a particular outcome occurs. The four conditions for a binomial setting are:
• B: Binary? The possible outcomes of each trial can be classified as “success” or “failure.”
• I: Independent? Trials must be independent; that is, knowing the result of one trial must not tell us anything about the result of any other trial.
• N: Number? The number of trials n of the chance process must be fixed in advance.
• S: Success? There is the same probability p of success on each trial.

Binomial Distribution
• Binomial distribution: the distribution of the count X of successes in the binomial setting with parameters n and p
• n = number of observations
• p = probability of success on each observation
We say X is B(n, p).

Determine whether the random variables below have a binomial distribution. Justify your answer.
1. Roll a fair die 10 times and let X = the number of 6s.
Yes:
B: success = 6, failure = 1 to 5
I: knowing the outcome of one roll tells us nothing about the outcome of any other roll
N: n = 10 trials
S: the probability of success is always 1/6
2. Shoot a basketball 20 times from various distances on the court. Let Y = the number of shots made.
No:
B: success = make the shot, failure = miss the shot
I: knowing the outcome of one shot tells us nothing about the outcome of any other shot
N: n = 20 trials
S: the probability of success changes as you move around the court
3. Observe the next 100 cars that go by and let C = color.
No:
B: there is no success or failure; there are too many colors
I: knowing the color of one car tells us nothing about the color of any other car
N: n = 100 trials
S: without a defined success, there is no probability of success to check

• Binomial distributions are an important class of discrete probability distributions
• Remember: not all counts have binomial distributions
• The central problem of a binomial experiment is to find the probability of a given number of successes out of n trials

Binomial Probabilities
In a binomial setting, we can define a random variable (say, X) as the number of successes in n independent trials. We are interested in finding the probability distribution of X.
Each child of a particular pair of parents has probability 0.25 of having type O blood. Genetics says that children receive genes from each of their parents independently. If these parents have 5 children, the count X of children with type O blood is a binomial random variable with n = 5 trials and probability p = 0.25 of a success on each trial. In this setting, a child with type O blood is a “success” (S) and a child with another blood type is a “failure” (F).
What’s P(X = 2)?
$P(\text{SSFFF}) = (0.25)(0.25)(0.75)(0.75)(0.75) = (0.25)^2(0.75)^3 = 0.02637$
However, there are a number of different arrangements in which 2 out of the 5 children have type O blood:
SSFFF SFSFF SFFSF SFFFS FSSFF
FSFSF FSFFS FFSSF FFSFS FFFSS
Verify that each arrangement has probability $(0.25)^2(0.75)^3 = 0.02637$.
Therefore, $P(X = 2) = 10(0.25)^2(0.75)^3 = 0.2637$.

Binomial coefficient (the number of ways of arranging k successes among n observations, also written nCr):
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
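The count of 10 arrangements and the value of P(X = 2) in the type O blood example can be checked with a short script. This is only an illustrative sketch in Python (the slides themselves use a TI calculator); the variable names are made up for the example.

```python
from itertools import combinations
from math import comb

n, k, p = 5, 2, 0.25  # 5 children, 2 with type O blood, P(type O) = 0.25

# Every way to place k successes (S) among n trials; the rest are failures (F).
arrangements = [
    "".join("S" if i in chosen else "F" for i in range(n))
    for chosen in combinations(range(n), k)
]

# Each arrangement has the same probability: p^k * (1 - p)^(n - k).
prob_one_arrangement = p**k * (1 - p) ** (n - k)

# Binomial probability = (number of arrangements) * (probability of each one).
prob_x_equals_2 = len(arrangements) * prob_one_arrangement

print(arrangements)
print(len(arrangements), comb(n, k))  # both count the same arrangements
print(round(prob_one_arrangement, 5), round(prob_x_equals_2, 4))
```

Listing the arrangements explicitly is practical only for small n; for larger n, math.comb gives the count directly.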
Binomial probability
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
• $\binom{n}{k}$: number of arrangements of k successes
• $p^k$: probability of k successes
• $(1-p)^{n-k}$: probability of n − k failures

Review: Algebra II Topic
• Binomial expansion
• Pascal’s Triangle
• $(x + y)^n$
• nCr

How to Find Binomial Probabilities
Step 1: State the distribution and the values of interest. Specify a binomial distribution with the number of trials n, success probability p, and the values of the variable clearly identified.
Step 2: Perform calculations and show your work! Do one of the following: (i) Use the binomial probability formula to find the desired probability; or (ii) Use the binompdf or binomcdf command and label each of the inputs.
Step 3: Answer the question.

Example: How to Find Binomial Probabilities
Each child of a particular pair of parents has probability 0.25 of having blood type O. Suppose the parents have 5 children.
(a) Find the probability that exactly 3 of the children have type O blood.
Let X = the number of children with type O blood. We know X has a binomial distribution with n = 5 and p = 0.25.
$P(X = 3) = \binom{5}{3}(0.25)^3(0.75)^2 = 10(0.25)^3(0.75)^2 = 0.08789$
(b) Should the parents be surprised if more than 3 of their children have type O blood?
To answer this, we need to find P(X > 3).
$P(X > 3) = P(X = 4) + P(X = 5) = \binom{5}{4}(0.25)^4(0.75)^1 + \binom{5}{5}(0.25)^5(0.75)^0 = 5(0.25)^4(0.75)^1 + 1(0.25)^5(0.75)^0 = 0.01465 + 0.00098 = 0.01563$
Since there is only about a 1.6% chance that more than 3 children out of 5 would have type O blood, the parents should be surprised!

Cumulative distribution function (cdf)
• We frequently want to find the probability that a random variable takes a range of values
• The cumulative distribution function of X calculates the sum of the probabilities for 0, 1, 2, …, up to a given value x
• That is, it calculates the probability of obtaining at most x successes in n trials
From "Using a TI-83 or TI-84 Series Graphing Calculator in an Introductory Statistics Class" by W.
Scott Street, IV, Dept of Statistical Sciences & Operations Research, VA Commonwealth University

Cumulative probability (from the same source)

Suppose a cereal manufacturer puts cards of famous athletes in boxes of cereal in the hope of boosting sales. The manufacturer announces that 20% of the boxes contain an Alex Rodriguez card, 30% contain a card of Michael Phelps, and the rest contain a card of Serena Williams. You buy 6 boxes of cereal. What is the probability you get exactly 2 A-Rod cards?
One specific arrangement of 2 successes among 6 trials has probability $(0.2)(0.2)(0.8)(0.8)(0.8)(0.8) = 0.0164$.
The number of arrangements is $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, so $\binom{6}{2} = \frac{6!}{2!\,(6-2)!} = 15$.
Using $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$:
$P(\text{number of successes} = 2) = 15(0.2)^2(1 - 0.2)^4 = 0.246$

The Last Kiss
Do people have a preference for the last thing they taste? Researchers at the University of Michigan designed a study to find out. The researchers gave 22 students five different Hershey’s Kisses (milk chocolate, dark chocolate, crème, caramel, and almond) in random order and asked the student to rate each one. Participants were not told how many Kisses they would be tasting. However, when the 5th and final Kiss was presented, participants were told that it would be their last one. Of the 22 students, 14 of them gave the final Kiss the highest rating. <http://www.sitemaker.umich.edu/eob/files/obrienellsworth2012.pdf>
Problem: Assume that the participants in the study don’t have a special preference for the last thing they taste. That is, assume that the probability a person would prefer the last Kiss tasted is p = 0.20.
(a) What is the probability that exactly 5 of the 22 participants would prefer the last Kiss they tried?
(b) What is the probability that 14 or more of the 22 participants would prefer the last Kiss they tried?
(a) Step 1: State the distribution and the values of interest.
Let X = the number of participants who prefer the last Kiss they taste. X has a binomial distribution with n = 22 and p = 0.20. We want to find P(X = 5).
Step 2: Perform calculations and show your work!
$P(X = 5) = \binom{22}{5}(0.20)^5(0.80)^{17} = 0.1898$
Using technology: The command binompdf(trials:22, p:0.20, x value:5) gives 0.1898.
Step 3: Answer the question. There is about a 19% chance that exactly 5 participants would choose the last Kiss, assuming that they have no special preference for the last thing they taste.
(b) Step 1: State the distribution and the values of interest. Let X = the number of participants who prefer the last Kiss they taste. X has a binomial distribution with n = 22 and p = 0.20. We want to find P(X ≥ 14).
Step 2: Perform calculations and show your work! P(X ≥ 14) = 1 – P(X ≤ 13). Using technology: The command 1 – binomcdf(trials:22, p:0.20, x value:13) gives 0.00001.
Step 3: Answer the question. There is about a 0.001% chance that 14 or more participants would choose the last Kiss, assuming that they have no special preference for the last thing they taste. Because this probability is so small, there is convincing evidence that the participants have a preference for the last thing they taste. It is almost impossible to get 14 or more just by chance.

A quality engineer selects an SRS of 10 switches from a large shipment for detailed inspection. Unknown to the engineer, 10% of the switches in the shipment fail to meet the specifications. What is the probability that no more than 1 of the 10 switches in the sample fail inspection?
Let X = the number of switches that fail. We are looking for P(X ≤ 1). This is a binomial distribution with n = 10 and p = 0.1.
binomcdf(10, 0.1, 1) = 0.7361
The probability that no more than 1 switch will fail is about 73.6%.

Mean and Standard Deviation of a Binomial Distribution
We describe the probability distribution of a binomial random variable just like any other distribution: by looking at the shape, center, and spread.
Consider the probability distribution of X = number of children with type O blood in a family with 5 children.

Value x_i:        0       1       2       3       4       5
Probability p_i:  0.2373  0.3955  0.2637  0.0879  0.0147  0.00098

Shape: The probability distribution of X is skewed to the right. It is more likely to have 0, 1, or 2 children with type O blood than a larger value.
Center: The median number of children with type O blood is 1. Based on our formula for the mean:
$\mu_X = \sum x_i p_i = (0)(0.2373) + (1)(0.3955) + \dots + (5)(0.00098) = 1.25$
Spread: The variance of X is
$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (0 - 1.25)^2(0.2373) + (1 - 1.25)^2(0.3955) + \dots + (5 - 1.25)^2(0.00098) = 0.9375$
The standard deviation of X is $\sigma_X = \sqrt{0.9375} = 0.968$.

Mean and Standard Deviation of a Binomial Random Variable
If a count X has the binomial distribution with number of trials n and probability of success p, the mean and standard deviation of X are
$\mu_X = np$
$\sigma_X = \sqrt{np(1-p)}$
Note: These formulas work ONLY for binomial distributions. They can’t be used for other distributions!

Tastes as good as the real thing?
The makers of a diet cola claim that its taste is indistinguishable from the full-calorie version of the same cola. To investigate, an AP® Statistics student named Emily prepared small samples of each type of soda in identical cups. Then she had volunteers taste each cola in a random order and try to identify which was the diet cola and which was the regular cola. Overall, 23 of the 30 subjects made the correct identification. If we assume that the volunteers really couldn’t tell the difference, then each one was guessing with a 1/2 chance of being correct. Let X = the number of volunteers who correctly identify the colas.
Problem:
(a) Explain why X is a binomial random variable.
(b) Find the mean and the standard deviation of X. Interpret each value in context.
(c) Of the 30 volunteers, 23 made correct identifications.
Does this give convincing evidence that the volunteers can taste the difference between the diet and regular colas?
(a) The chance process is each volunteer guessing which sample is the diet cola.
Binary? Yes; guesses are either correct or incorrect.
Independent? Yes; the result of one volunteer’s guess tells us nothing about the results of other volunteers’ guesses.
Number? Yes; there are 30 trials.
Success? Yes; the probability of guessing correctly is always 50%.
Because X is counting the number of successful guesses, X is a binomial random variable with n = 30 and p = 0.50.
(b) The mean of X is $\mu_X = np = 30(0.5) = 15$, and the standard deviation of X is $\sigma_X = \sqrt{np(1-p)} = \sqrt{30(0.5)(1 - 0.5)} = 2.74$. If this experiment were repeated many times and the volunteers were randomly guessing, the average number of correct guesses would be about 15. Also, the number of correct guesses would typically vary by about 2.74 from the mean (15).
(c) P(X ≥ 23) = 1 – P(X ≤ 22) = 1 – binomcdf(trials:30, p:0.5, x value:22) = 1 – 0.9974 = 0.0026. There is a very small chance that there would be 23 or more correct guesses if the volunteers couldn’t tell the difference in the colas. Therefore, we have convincing evidence that the volunteers can taste the difference.

Dead batteries
Almost everyone has one: a drawer that holds miscellaneous batteries of all sizes. Suppose that your drawer contains 8 AAA batteries but only 6 of them are good. You need to choose 4 for your graphing calculator. If you randomly select 4 batteries, what is the probability that all 4 of them will work?
Using the binomial distribution you get:
$P(X = 4) = \binom{4}{4}(0.75)^4(0.25)^0 = 0.3164$
The correct answer is:
$\frac{6}{8} \cdot \frac{5}{7} \cdot \frac{4}{6} \cdot \frac{3}{5} = 0.2143$
Because we are sampling without replacement, the selections of batteries aren’t independent. We can ignore this problem if the sample we are selecting is less than 10% of the population.
However, in this case we are sampling 50% of the population (4/8), so it is not reasonable to ignore the lack of independence and use the binomial distribution. This explains why the binomial probability is so different from the actual probability.

NASCAR cards and cereal boxes
In the “NASCAR Cards and Cereal Boxes” example from Section 5.1, we read about a cereal company that put 1 of 5 different cards into each box of cereal. Each card featured a different driver: Jeff Gordon, Dale Earnhardt, Jr., Tony Stewart, Danica Patrick, or Jimmie Johnson. Suppose that the company printed 20,000 of each card, so there were 100,000 total boxes of cereal with a card inside. If a person bought 6 boxes at random, what is the probability of getting no Danica Patrick cards?
Let X be the number of Danica Patrick cards obtained from 6 different boxes of cereal. Because we are sampling without replacement, the trials are not independent. The distribution of X is not quite binomial, but it is close. If we assume X is binomial with n = 6 and p = 0.2,
$P(X = 0) = \binom{6}{0}(0.2)^0(0.8)^6 = 0.262144$
There is a 26.2% chance of getting no Danica Patrick cards if a person bought 6 boxes at random.
The actual probability, using the general multiplication rule, is
$P(\text{no Danica Patrick cards}) = \frac{80{,}000}{100{,}000} \cdot \frac{79{,}999}{99{,}999} \cdot \frac{79{,}998}{99{,}998} \cdot \frac{79{,}997}{99{,}997} \cdot \frac{79{,}996}{99{,}996} \cdot \frac{79{,}995}{99{,}995} = 0.262134$

Normal Approximations to binomial distributions
• As the number of trials n gets larger, the binomial distribution gets close to a normal distribution
• When n is large, we can use normal probability calculations to approximate hard to calculate binomial probabilities.

Binomial Distributions in Statistical Sampling
The binomial distributions are important in statistics when we wish to make inferences about the proportion p of successes in a population. Almost all real-world sampling, such as taking an SRS from a population of interest, is done without replacement.
However, sampling without replacement leads to a violation of the independence condition. When the population is much larger than the sample, a count of successes in an SRS of size n has approximately the binomial distribution with n equal to the sample size and p equal to the proportion of successes in the population.

10% Condition
When taking an SRS of size n from a population of size N, we can use a binomial distribution to model the count of successes in the sample as long as $n < \frac{1}{10}N$, or equivalently $10n < N$.

Normal Approximations for Binomial Distributions
As n gets larger, something interesting happens to the shape of a binomial distribution. The figures below show histograms of binomial distributions for different values of n and p. What do you notice as n gets larger?

Normal Approximation for Binomial Distributions: The Large Counts Condition
Suppose that X has the binomial distribution with n trials and success probability p. When n is large, the distribution of X is approximately Normal with mean and standard deviation
$\mu_X = np$
$\sigma_X = \sqrt{np(1-p)}$
As a rule of thumb, we will use the Normal approximation when n is so large that np ≥ 10 and n(1 – p) ≥ 10. That is, the expected numbers of successes and failures are both at least 10. In symbols, X is approximately $N\left(np, \sqrt{np(1-p)}\right)$.

Teens and debit cards
In a survey of 506 teenagers aged 14 to 18, subjects were asked a variety of questions about personal finance (http://www.nclnet.org/personal-finance/66- teensand-money/120-ncl-survey-teens-and-financialeducation). One question asked teens if they had a debit card.
Problem: Suppose that exactly 10% of teens aged 14 to 18 have debit cards. Let X = the number of teens in a random sample of size 506 who have a debit card.
(a) Show that the distribution of X is approximately binomial.
(b) Check the conditions for using a Normal approximation in this setting.
(c) Use a Normal distribution to estimate the probability that 40 or fewer teens in the sample have debit cards.
(a) Binary?
Yes; teens either have a debit card or they don’t.
Independent? No; because we are sampling without replacement. However, because the sample size (n = 506) is much less than 10% of the population size (there are millions of teens aged 14 to 18), the responses will be very close to independent.
Number? Yes; there is a fixed sample size, n = 506.
Success? Yes; the unconditional probability of selecting a teen with a debit card is 10%.
(b) We need to check the Large Counts condition. Because np = 506(0.10) = 50.6 and n(1 – p) = 506(0.90) = 455.4 are both at least 10, we should be safe using the Normal approximation.
(c) Step 1: State the distribution and the values of interest. $\mu_X = np = 506(0.10) = 50.6$ and $\sigma_X = \sqrt{np(1-p)} = \sqrt{506(0.1)(0.9)} = 6.75$. Thus, X is approximately Normally distributed with mean 50.6 and standard deviation 6.75. We want to find P(X ≤ 40).
Step 2: Perform calculations and show your work! Standardizing the boundary value gives $z = \frac{40 - 50.6}{6.75} = -1.57$. Using Table A, P(Z ≤ –1.57) = 0.0582. Using technology: The command normalcdf(lower:–1000, upper:40, μ:50.6, σ:6.75) gives an area of 0.0582. (Note: The probability using the binomial distribution is 0.064.)
Step 3: Answer the question. There is about a 6% chance that 40 or fewer teens in a sample of size 506 will have a debit card.

Warmup
1. (#6.76 and 78) Suppose you purchase a bundle of 10 bare-root rhubarb plants. The sales clerk tells you that 5% of these plants will die before producing any rhubarb. Assume that the bundle is a random sample of plants and that the sales clerk’s statement is accurate. Let Y = the number of plants that die before producing any rhubarb.
a. Find P(Y = 1).
b. Would you be surprised if 3 or more of the plants in the bundle die before producing any rhubarb?
2. (#6.86) Engineers define reliability as the probability that an item will perform its function under specific conditions for a specific period of time.
A certain model of aircraft engine is designed so that each engine has probability 0.999 of performing properly for an hour of flight. Company engineers test an SRS of 350 engines of this model. Let X = the number of engines that operate for an hour without failure.
a. Explain why X is a binomial random variable.
b. Find the mean and standard deviation of X. Interpret each value in context.
c. Two engines failed the test. Are you convinced that this model of engine is less reliable than it’s supposed to be? Compute P(X ≤ 348) and use the result to justify your answer.

Activity
Children’s cereals have posters of Drake, Taylor Swift, or Rihanna. Mrs. Richardson is a big Taylor Swift fan, but it takes her 8 boxes of cereal to get one. Was she unlucky?
a) Use a die to simulate this experiment. Let 1 or 2 represent the event of buying a box of Frosted Flakes and getting a Taylor Swift poster. If one of the other sides lands on top, roll again. Count the number of rolls until you get a 1 or 2.
b) Make a histogram of the number of rolls the students in your class require to get their first Taylor Swift poster.
c) Describe the distribution.
d) What was the average number of “boxes” purchased to get a Taylor Swift poster?
e) Estimate the chance that Mrs. R would have to buy 8 or more boxes to get her poster.

Consider the following situations:
• Flip a coin until you get a head
• Roll a die until you get a 3
• In basketball, attempt a three-point shot until you make a basket

Geometric Settings
In a binomial setting, the number of trials n is fixed and the binomial random variable X counts the number of successes. In other situations, the goal is to repeat a chance behavior until a success occurs. These situations are called geometric settings.
A geometric setting arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. On each trial, the probability p of success must be the same.
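The cereal-poster activity above can also be run on a computer instead of with a physical die. The sketch below is one possible simulation in Python, assuming (as in the activity) that a roll of 1 or 2 counts as a Taylor Swift poster, so p = 1/3; the function name, seed, and number of repetitions are arbitrary choices for illustration.

```python
import random

def boxes_until_poster(rng):
    """Roll a six-sided die until a 1 or 2 appears; return the number of rolls."""
    boxes = 1
    while rng.randint(1, 6) > 2:  # 3, 4, 5, 6 mean "no Taylor Swift poster"
        boxes += 1
    return boxes

rng = random.Random(2024)  # fixed seed so the run is repeatable
trials = 100_000
results = [boxes_until_poster(rng) for _ in range(trials)]

# Part (d): average number of boxes needed.
simulated_mean = sum(results) / trials

# Part (e): estimated chance of needing 8 or more boxes.
simulated_tail = sum(1 for y in results if y >= 8) / trials

# Exact comparison: needing 8 or more boxes means the first 7 boxes all fail.
exact_tail = (1 - 1 / 3) ** 7

print(round(simulated_mean, 2), round(simulated_tail, 4), round(exact_tail, 4))
```

With many repetitions the simulated values should settle near the exact ones, which suggests Mrs. Richardson's 8 boxes were unlucky but not wildly so.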
Geometric Settings
In a geometric setting, if we define the random variable Y to be the number of trials needed to get the first success, then Y is called a geometric random variable. The probability distribution of Y is a geometric distribution with parameter p, the probability of a success on any trial. The possible values of Y are 1, 2, 3, … .
As with binomial random variables, it is important to be able to distinguish situations in which the geometric distribution does and doesn’t apply!

Geometric Probability Formula
The Lucky Day Game. The random variable of interest in this game is Y = the number of guesses it takes to correctly match the lucky day. What is the probability the first student guesses correctly? The second? Third? What is the probability the kth student guesses correctly?
P(Y = 1) = 1/7 = 0.1429
P(Y = 2) = (6/7)(1/7) = 0.1224
P(Y = 3) = (6/7)(6/7)(1/7) = 0.1050

Geometric Probability Formula
If Y has the geometric distribution with probability p of success on each trial, the possible values of Y are 1, 2, 3, … . If k is any one of these values,
$P(Y = k) = (1-p)^{k-1}p$
A probability distribution table for the geometric random variable is a bit different because it never ends. The probabilities are the terms of a geometric sequence.

Y:     1    2        3           4           5           6           …
P(Y):  p    (1-p)p   (1-p)^2 p   (1-p)^3 p   (1-p)^4 p   (1-p)^5 p   …

Example
Roll a die until you get a 6. Let X = the number of rolls needed.
• The probability of rolling a 6 is 1/6
• The probability of rolling the first 6 on the first roll: P(X = 1) = 1/6. Calculator: geometpdf(1/6, 1)

Calculating Probabilities
• The probability of rolling the first 6 after the first roll: P(X > 1) = 1 – 1/6 = 5/6. Calculator: 1 – geometpdf(1/6, 1)
• The probability of rolling the first 6 on the second roll: P(X = 2) = (5/6)(1/6).
Calculator: geometpdf(1/6, 2)
• The probability of rolling the first 6 on the second roll or before: P(X ≤ 2) = (1/6) + (5/6)(1/6). Calculator: geometcdf(1/6, 2)
• The probability of rolling the first 6 after the second roll: P(X > 2) = 1 – ((1/6) + (5/6)(1/6)). Calculator: 1 – geometcdf(1/6, 2)

Monopoly
In the board game Monopoly, one way to get out of jail is to roll doubles. Suppose that this was the only way a player could get out of jail. The random variable of interest in this example is Y = number of attempts it takes to roll doubles one time.
(a) Find the probability that it takes 3 turns to roll doubles.
$P(Y = 3) = (5/6)^2(1/6) = 0.116$
(b) Find the probability that it takes more than 3 turns to roll doubles, and interpret this value in context.
Because there are an infinite number of possible values of Y greater than 3, we will use the complement rule.
$P(Y > 3) = 1 - P(Y \le 3) = 1 - P(Y = 3) - P(Y = 2) - P(Y = 1) = 1 - (5/6)^2(1/6) - (5/6)^1(1/6) - (5/6)^0(1/6) = 0.5787$
Or on the calculator: 1 – geometcdf(p:1/6, x:3) = 0.579
If a player tried to get out of jail many, many times by trying to roll doubles, about 58% of the time it would take more than 3 attempts.

Mean of a Geometric Random Variable
The table below shows part of the probability distribution of Y for the Lucky Day game. We can’t show the entire distribution because the number of trials it takes to get the first success could be an incredibly large number.

y_i:  1      2      3      4      5      6      …
p_i:  0.143  0.122  0.105  0.090  0.077  0.066  …

Shape: The heavily right-skewed shape is characteristic of any geometric distribution. That’s because the most likely value is 1.
Center: The mean of Y is µY = 7. We’d expect it to take 7 guesses to get our first success.
Spread: The standard deviation of Y is σY = 6.48. If the class played the Lucky Day game many times, the number of homework problems the students receive would typically differ from 7 by about 6.48.
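The Lucky Day table and the Monopoly probabilities above can be reproduced with a few lines of code. This is a sketch in Python rather than the calculator commands used in the slides; the helper names geometric_pmf and geometric_cdf are made up here to mirror geometpdf and geometcdf. The standard deviation formula sqrt(1 − p)/p is a standard result for the geometric distribution that the slides do not state, so treat it as an outside fact used only to check the value 6.48.

```python
def geometric_pmf(p, k):
    """P(Y = k): k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(p, k):
    """P(Y <= k): sum of the pmf over 1, 2, ..., k."""
    return sum(geometric_pmf(p, j) for j in range(1, k + 1))

# Lucky Day game: p = 1/7, first six rows of the table.
lucky_table = [round(geometric_pmf(1 / 7, k), 3) for k in range(1, 7)]

# Monopoly: p = 1/6 of rolling doubles on any attempt.
monopoly_exactly_3 = geometric_pmf(1 / 6, 3)
monopoly_more_than_3 = 1 - geometric_cdf(1 / 6, 3)  # complement rule

# Standard deviation of a geometric variable (assumed formula, see lead-in).
lucky_sd = (1 - 1 / 7) ** 0.5 / (1 / 7)

print(lucky_table)
print(round(monopoly_exactly_3, 3), round(monopoly_more_than_3, 4))
print(round(lucky_sd, 2))
```

The complement in monopoly_more_than_3 equals (5/6)^3, the chance that the first three attempts all fail, which is a quick way to sanity-check the result.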
Mean (Expected Value) of a Geometric Random Variable
If Y is a geometric random variable with probability p of success on each trial, then its mean (expected value) is
$E(Y) = \mu_Y = \frac{1}{p}$
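For readers who want to see where 1/p comes from, here is one common derivation, written as a sketch. It assumes 0 < p ≤ 1 and introduces the shorthand q = 1 − p, which is not used elsewhere in these slides.

```latex
\begin{align*}
E(Y) &= \sum_{k=1}^{\infty} k \, P(Y = k)
      = \sum_{k=1}^{\infty} k \, (1-p)^{k-1} p \\
     &= p \sum_{k=1}^{\infty} k \, q^{k-1}, \qquad q = 1 - p \\
     &= p \cdot \frac{1}{(1-q)^2}
        \qquad \text{(term-by-term derivative of } \sum_{k=0}^{\infty} q^k = \frac{1}{1-q},\ |q| < 1\text{)} \\
     &= \frac{p}{p^2} = \frac{1}{p}
\end{align*}
```

For the Lucky Day game, p = 1/7 gives E(Y) = 7, and for rolling doubles in Monopoly, p = 1/6 gives E(Y) = 6 attempts on average.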