ACMS 20340 Statistics for Life Sciences
Chapter 12: Discrete Probability Distributions

What about categorical variables?

We've studied various distributions of quantitative variables, most notably the Normal distributions. But what is the appropriate probability model for the count of successful outcomes of a categorical variable? We will focus on one distribution in particular, the binomial distribution.

Some Motivating Examples

- You toss a fair coin ten times.
  - How many times does it come up heads?
  - What is the probability of it coming up heads exactly three times?
- An obstetrician oversees 12 single-birth deliveries on a certain day.
  - How many of the deliveries are of girls?
  - What is the probability of there being exactly 7 girls in this "batch" of 12?

The Binomial Setting

1. There is a fixed number n of observations.
2. The n observations are independent: knowing the result of one observation doesn't change the probabilities we assign to other observations.
3. Each observation falls into one of two categories, one of which we call "success" and the other "failure".
4. The probability p of a success is the same for each observation.

The Binomial Distribution

The count X of successes in the binomial setting has the binomial distribution with parameters n and p. The parameter n is the number of observations, and p is the probability of a success on any one observation. The possible values of X are the whole numbers from 0 to n.

An important caveat: not all counts have a binomial distribution, so we must make sure we're in the binomial setting before concluding that a count has a binomial distribution.

Binomial Distribution Examples

- You toss a fair coin ten times and count the number of heads.
  n = 10, p = 1/2
- An obstetrician oversees 12 single-birth deliveries on a certain day and counts the number of girls born.
  n = 12, p = 1/2
- You roll a fair die 100 times and count the number of times a 1 comes up.
  n = 100, p = 1/6

A Non-Example

You select five balls, without replacement, from a barrel containing 50 red balls and 50 blue balls. What is the probability of selecting only red balls?

$$\left(\frac{50}{100}\right)\left(\frac{49}{99}\right)\left(\frac{48}{98}\right)\left(\frac{47}{97}\right)\left(\frac{46}{96}\right) = \frac{1081}{38412} \approx 0.028$$

Why aren't these counts binomially distributed?

Binomial Probabilities 1

What we'd like is a formula for the probability that a binomial random variable takes any given value. Idea: we add the probabilities of the different ways of getting exactly that many successes in n observations. That is, if X is a binomial random variable, we want a formula for P(X = k) for any k = 0, 1, 2, ..., n.

Binomial Probabilities 2

Let's first consider an example. Each child born to a particular set of parents has probability 0.25 of having blood type O. If these parents have 5 children, what is the probability that exactly two of them have blood type O?

The count of children with blood type O is binomially distributed:
- n = 5
- p = 0.25

Let's use "S" to stand for success (blood type O) and "F" to stand for failure.

Binomial Probabilities 3

Step 1: What is the probability that only the first and third children are successes? That is, what is P(SFSFF)?

The probability of a sequence of independent events is the product of the probabilities of the individual events:

$$P(SFSFF) = P(S)\,P(F)\,P(S)\,P(F)\,P(F) = (0.25)(0.75)(0.25)(0.75)(0.75) = (0.25)^2(0.75)^3$$

Binomial Probabilities 4

Step 2: Observe that any arrangement of 2 S's and 3 F's has this same probability: whenever we have 2 S's and 3 F's, we multiply 0.25 twice and 0.75 three times. So the probability that X = 2 is the probability of getting 2 S's and 3 F's in any arrangement whatsoever:

SSFFF  SFSFF  SFFSF  SFFFS  FSSFF
FSFSF  FSFFS  FFSSF  FFSFS  FFFSS

There are ten such arrangements, each with the same probability, and hence

$$P(X = 2) = 10(0.25)^2(0.75)^3 \approx 0.2637.$$
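Steps 1 and 2 can be checked by brute force. The short Python sketch below (our own illustration; `sequence_probability` is a hypothetical helper name, not something from the slides) lists every sequence of five S/F outcomes, keeps those with exactly two S's, multiplies out each sequence's probability as in Step 1, and adds the results as in Step 2.

```python
from itertools import product

# Blood-type example: n = 5 children, p = P(type O) = 0.25, k = 2 successes.
n, p, k = 5, 0.25, 2

def sequence_probability(seq, p):
    """Step 1: multiply P(S) = p or P(F) = 1 - p for each independent observation."""
    prob = 1.0
    for outcome in seq:
        prob *= p if outcome == "S" else 1 - p
    return prob

# Step 2: every arrangement of k S's among n observations.
arrangements = ["".join(seq) for seq in product("SF", repeat=n) if seq.count("S") == k]

# Add the (equal) probabilities of all the arrangements to get P(X = k).
p_x_equals_k = sum(sequence_probability(seq, p) for seq in arrangements)

print(arrangements)               # the same ten arrangements listed above
print(len(arrangements))          # 10
print(round(p_x_equals_k, 4))     # 0.2637
```

Enumerating all 2^n sequences is only practical for small n; counting the arrangements directly is what the binomial coefficient does.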
The Binomial Coefficient

The number of ways of arranging k successes among n observations is given by the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

for any k = 0, 1, 2, ..., n. Recall that the factorial of n is $n! = n(n-1)(n-2)\cdots(3)(2)(1)$, and $0! = 1$.

The Binomial Coefficient in Action

How many different ways are there to have exactly two successes in five trials?

$$\binom{5}{2} = \frac{5!}{2!\,3!} = \frac{(5)(4)(3)(2)(1)}{(2)(1)(3)(2)(1)} = \frac{(5)(4)}{(2)(1)} = \frac{20}{2} = 10$$

The Official Formula for Binomial Probabilities

If X has the binomial distribution with n observations and probability p of success on each observation, then the possible values of X are 0, 1, 2, ..., n. If k is any one of these values, then

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}.$$

Example

One in ten boxes of Cracker Jacks contains a decoder ring. What is the probability that no more than one of ten randomly chosen boxes of Cracker Jacks contains a decoder ring?
- n = 10
- p = 0.1

$$\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
&= \binom{10}{0}(0.1)^0(0.9)^{10} + \binom{10}{1}(0.1)(0.9)^9 \\
&= \frac{10!}{0!\,10!}(1)(0.3487) + \frac{10!}{1!\,9!}(0.1)(0.3874) \\
&= (1)(1)(0.3487) + (10)(0.1)(0.3874) \\
&= 0.3487 + 0.3874 = 0.7361
\end{aligned}$$

Binomial Mean and Standard Deviation

Q: In many repetitions of the binomial setting, with n observations and probability of success p, what will be the average count of successes? (In other words, what is the mean of the count variable X?)

A: If a count X has the binomial distribution with n observations and probability p of success, the mean and standard deviation of X are

$$\mu = np, \qquad \sigma = \sqrt{np(1 - p)}.$$

Coin Tossing

You toss a fair coin ten times and count the number of heads.
- n = 10
- p = 1/2

If we repeat the ten tosses many times, how many heads should occur on average?

$$\mu = np = (10)(1/2) = 5$$

And the standard deviation?

$$\sigma = \sqrt{np(1 - p)} = \sqrt{10(1/2)(1/2)} = \sqrt{5/2} \approx 1.58$$

The Normal Approximation to Binomial Distributions

Suppose that a count X has the binomial distribution with n observations and probability of success p.
When n is large, the distribution of X is approximately Normal, $N\left(np, \sqrt{np(1 - p)}\right)$.

As a rule of thumb, we use the Normal approximation when n is so large that np ≥ 10 and n(1 − p) ≥ 10.

One Last Example

About 60% of American adults are either overweight or obese. What is the probability that at least 1520 individuals in a random sample of 2500 adults are overweight or obese?

Because our sample is random, we can treat the 2500 members of our sample as independent. So we're in the binomial setting:
- n = 2500
- p = 0.6

Using software, we find that P(X ≥ 1520) = 0.2131.

Let's Use the Normal Approximation 1

$$\mu = np = (2500)(0.6) = 1500$$

$$\sigma = \sqrt{np(1 - p)} = \sqrt{(2500)(0.6)(0.4)} \approx 24.49$$

The distribution of this binomial random variable is well approximated by the Normal distribution N(1500, 24.49), since np = 1500 ≥ 10 and n(1 − p) = 1000 ≥ 10.

Let's Use the Normal Approximation 2

$$P(X \ge 1520) = P\left(\frac{X - 1500}{24.49} \ge \frac{1520 - 1500}{24.49}\right) = P(Z \ge 0.82) = 1 - 0.7939 = 0.2061$$

The Normal approximation 0.2061 differs from the software result 0.2131 by only 0.007.
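The numbers in this chapter can be reproduced with a short script. The sketch below uses Python (our choice of tool; the slides do not say which software produced 0.2131) and assumes Python 3.8 or later for `math.comb` and `statistics.NormalDist`. The helper names `binom_pmf` and `binom_mean_sd` are hypothetical. It evaluates the official formula with exact fractions, reuses it for the Cracker Jacks example, computes μ and σ for the coin, and compares the exact tail P(X ≥ 1520) with the Normal approximation.

```python
import math
from fractions import Fraction
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k).

    Pass p as a Fraction to keep the arithmetic exact. With a float p and
    n = 2500, converting C(n, k) to a float raises OverflowError.
    """
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial count."""
    return n * p, math.sqrt(n * p * (1 - p))

# Cracker Jacks: n = 10, p = 0.1, P(X <= 1) = P(X = 0) + P(X = 1)
p_ring = Fraction(1, 10)
cracker = float(binom_pmf(0, 10, p_ring) + binom_pmf(1, 10, p_ring))

# Coin tossing: n = 10, p = 1/2
coin_mean, coin_sd = binom_mean_sd(10, 0.5)

# Overweight example: n = 2500, p = 0.6, exact P(X >= 1520)
n, p = 2500, Fraction(3, 5)
exact_tail = float(sum(binom_pmf(k, n, p) for k in range(1520, n + 1)))

# Normal approximation N(np, sqrt(np(1 - p)))
mu, sigma = binom_mean_sd(n, float(p))
z = (1520 - mu) / sigma
# As on the slides: round z to two decimals (0.82) before the table lookup.
approx_table = 1 - NormalDist().cdf(round(z, 2))
# The same approximation without rounding z.
approx_unrounded = 1 - NormalDist(mu, sigma).cdf(1520)

print("Cracker Jacks P(X <= 1):", round(cracker, 4))
print("Coin mean and sd:", coin_mean, round(coin_sd, 4))
print("Exact P(X >= 1520):", round(exact_tail, 4))
print("Normal, z rounded to", round(z, 2), ":", round(approx_table, 4))
print("Normal, unrounded z:", round(approx_unrounded, 4))
```

The exact tail should land near the software value 0.2131, and rounding z to 0.82 reproduces the slides' 0.2061. Keeping z ≈ 0.8165 unrounded gives about 0.207, so part of the 0.007 gap comes from rounding z for the table lookup.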
