Download Chapter 4 - Statistics

Handout 4: Binomial Distribution

Reading Assignment: Chapter 5

In the previous handout, we looked at continuous random variables and at calculating probabilities and percentiles for those types of variables. Throughout this handout we will discuss discrete random variables and, more specifically, binomial random variables. Recall that a discrete random variable can take only one of a countable list of distinct values.

The Binomial Random Variable

Certain conditions must be met for a variable to be considered a binomial random variable, but the basic idea is that a binomial random variable is a count of how many times an event occurs (or does not occur) in a particular number of independent observations or trials that make up a random circumstance. One of the more basic examples of a binomial random variable is the number of heads observed in four tosses of a fair coin.

We define a binomial random variable as X = number of successes in the n trials of a binomial process (e.g. X = the number of heads in four tosses of a fair coin). A binomial process is defined by the following conditions:

1. There are n specified ‘trials’.
2. Each trial results in one of two possible outcomes, called ‘success’ and ‘failure’.
3. The probability of a ‘success’ remains the same from one trial to the next, and this probability is denoted by p. The probability of a ‘failure’ is q = 1 − p for every trial.
4. The outcomes are independent from one trial to the next.

Sometimes there may be more than two possible simple events for each trial (think of rolling a die: there are six possible outcomes), but the random variable counts how many times a particular subset of the possibilities occurs (for example, the up-face on the die is an even value). Survey responses can also produce a binomial random variable when we count how many individuals in the sample have a particular trait or opinion.

Suppose a class consists of ten boys and ten girls. Five children are randomly selected to give a presentation.
Let X = the number of girls selected. Is this a binomial random variable? Go through the conditions of a binomial process and check whether each condition is met.

Computing Probabilities for Binomial Variables

For a binomial random variable, the probabilities for the possible values of X are given by the formula

$$P(X = j) = \frac{n!}{j!\,(n - j)!} \cdot p^{j} \cdot (1 - p)^{n - j}, \quad \text{for } j = 0, 1, 2, \ldots, n$$

Note that j! = (j) · (j − 1) · (j − 2) · · · (1). In this course the calculator will handle this calculation for us, but let’s go through a very simple example.

You flip a fair coin four times. What is the probability that you obtain three heads? Let X = the number of heads obtained from flipping a fair coin four times. For this example there are four trials (n = 4), a ‘success’ is obtaining a head face-up, the probability of a ‘success’ is 1/2 for each trial, and, lastly, the outcome of one trial does not influence another (independent trials). Thus, the conditions are satisfied for a binomial process. We are looking for P(X = 3) (the probability that we obtain three heads from flipping a fair coin four times). Plugging the values into the formula, we have

$$P(X = 3) = \frac{4!}{3!\,(4 - 3)!} \cdot \left(\frac{1}{2}\right)^{3} \cdot \left(\frac{1}{2}\right)^{4 - 3} = \frac{4!}{3!\,(1)!} \cdot \left(\frac{1}{2}\right)^{3} \cdot \left(\frac{1}{2}\right)^{1} = 4 \cdot \frac{1}{16} = 0.25$$

So, when flipping a fair coin four times, we should expect to obtain three heads 25% of the time.

iClicker Questions

Now, suppose that we wanted to know the probability of obtaining more heads than tails when flipping a fair coin four times. In other words, we want to know the probability of obtaining 3 or 4 heads (that is, 3 or more heads) when flipping a fair coin four times. If we again let X = the number of heads obtained while flipping a fair coin four times, we can write this statement using probability notation as P(X = 3) + P(X = 4) = P(X ≥ 3). In the previous examples we used the probability distribution function (pdf) to determine the probability at exactly one point (they both start with ‘p’).
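The single-point calculation for three heads in four tosses can be checked numerically. The following is a minimal Python sketch and not part of the original handout: the helper name `binom_pmf` is hypothetical, chosen only for illustration, and it relies on the standard library's `math.comb` for n!/(j!(n − j)!).

```python
from math import comb


def binom_pmf(j, n, p):
    """P(X = j) for n independent trials with success probability p.

    Hypothetical helper for illustration; it mirrors the formula
    n! / (j! (n - j)!) * p^j * (1 - p)^(n - j).
    """
    return comb(n, j) * p**j * (1 - p) ** (n - j)


# Four tosses of a fair coin: n = 4, p = 1/2
p_three = binom_pmf(3, 4, 0.5)                     # P(X = 3)
p_three_or_more = p_three + binom_pmf(4, 4, 0.5)   # P(X = 3) + P(X = 4)

print(p_three)           # 0.25
print(p_three_or_more)   # 0.3125
```

Summing the single-point probabilities works here because only two values (3 and 4) are involved; with many values, a cumulative function is more convenient.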
In this example we want to sum the pdfs, or cumulate, over more than one point, so we will use the cumulative distribution function (cdf) in the calculator. Now, the calculator calculates the cdf as P(X ≤ j), and we will need to think through how to use this function to obtain the probability we want. In other words, we need to ‘remove’ the probability we do not want from the total probability. For the coin example, P(X ≥ 3) = 1 − P(X ≤ 2).

Note: for discrete random variables, whether an inequality includes the ‘equality’ (≤ versus <) matters, while it does not for continuous random variables.

In your calculator you will be using binomcdf(n, p, j):

1. Press [2nd], then [VARS].
2. Select binomcdf( in the DISTR tab and then press [ENTER].
3. Type in ‘n’, ‘p’, and ‘j’ (the number of trials, the probability of a ‘success’, and the number of ‘successes’ we are interested in) on the main screen.
4. Press [ ) ] to close the parenthesis and then press [ENTER].

iClicker Question

Expected Value and Standard Deviation of a Binomial Random Variable

The mean, or expected value, of a binomial random variable is µ = E(X) = n · p, where n = number of trials and p = probability of success. If you were to flip a fair coin 100 times, for instance, how many heads would you expect to result, on average?

There is also a formula for the standard deviation of a binomial random variable. The formulas for the mean and standard deviation of a binomial random variable were derived by using algebra, but you won’t need to know these derivations. The results are useful for later applications.

The standard deviation of a binomial random variable is

$$\sigma = \sqrt{n \cdot p \cdot (1 - p)}$$

where, again, n = number of trials and p = probability of success. What would we expect the average deviation from the mean number of heads to be when flipping a fair coin 100 times?

A study by the Center for Financial Services Innovation showed that only 64% of U.S. income earners aged 15 and older had a bank account (A. Carrns, Banks Court a New Client, The Wall Street Journal, March 16, 2007, p. D1).
If a random sample of 20 U.S. income earners aged 15 and older is selected, how many of them would we expect to have a bank account, and what would the average deviation from the mean be?

Source: Levine, Krehbiel, and Berenson. Business Statistics: A First Course. 5th ed. New Jersey: Pearson Education, Inc., 2010. 174. Print.

Normal Approximation

In the figure below, we see the probabilities for all possible values of the number of U.S. income earners aged 15 and older with a bank account in a sample of 20. What do we notice about its shape?

[Figure: Histogram of Probabilities for the Number of U.S. Income Earners Aged 15 and Older]

Beyond the empirical rule, we may apply the normal curve to approximate binomial probabilities. We can see (as in the figure above) that the binomial distribution is centered on n · p with a standard deviation of $\sqrt{n \cdot p \cdot (1 - p)}$.

What would be the probability of observing at most 10 U.S. income earners aged 15 and older with a bank account? Remember to convert to a z-score and plot it on the standard normal curve before calculating the normal approximation using normcdf(LOWER, UPPER). Keep in mind that the values under the normal curve range from negative infinity to positive infinity.

Let’s recalculate that probability, but using the exact method (the binomial cumulative distribution function). Remember that we use binomcdf(n, p, j) to calculate P(X ≤ j).

Note that the normal approximation is indeed an approximation, and some approximations are better than others. The normal curve gives reasonably good approximations of binomial probabilities whenever both n · p > 5 and n · (1 − p) > 5. If the histogram of probabilities for a binomial variable is noticeably skewed (positively or negatively), the approximation will be poor. In other words, if p is either too close to 0 or too close to 1, it will not be ‘safe’ to use the normal curve approximation.
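For readers without the calculator at hand, the comparison between the exact method and the normal approximation in the bank account example can be sketched in Python. This is an illustration under stated assumptions, not the handout's own procedure: `binom_cdf` and `normal_cdf` are hypothetical helper names standing in for the calculator's binomcdf and normcdf, only the standard library is used, and the normal curve is evaluated directly at 10 with no further adjustment.

```python
from math import comb, erf, sqrt


def binom_cdf(j, n, p):
    """P(X <= j): sum of the binomial formula over 0, 1, ..., j.

    Hypothetical stand-in for the calculator's binomcdf(n, p, j).
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(j + 1))


def normal_cdf(x, mu, sigma):
    """Area under the normal curve (mean mu, sd sigma) to the left of x.

    Hypothetical stand-in for normcdf with LOWER set to negative infinity.
    """
    z = (x - mu) / sigma          # convert to a z-score first
    return 0.5 * (1 + erf(z / sqrt(2)))


# Bank account example: n = 20 sampled earners, p = 0.64, at most j = 10
n, p, j = 20, 0.64, 10
mu = n * p                        # expected value, n * p
sigma = sqrt(n * p * (1 - p))     # standard deviation, sqrt(n * p * (1 - p))

exact = binom_cdf(j, n, p)
approx = normal_cdf(j, mu, sigma)

print(f"mean = {mu:.2f}, sd = {sigma:.3f}")
print(f"exact P(X <= {j})      = {exact:.4f}")
print(f"normal approximation  = {approx:.4f}")
print("n*p > 5 and n*(1-p) > 5:", n * p > 5 and n * (1 - p) > 5)
```

Running the sketch prints the two probabilities side by side, which makes it easy to judge how close the approximation is for this particular n and p.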
