Handout 4: Binomial Distribution
Reading Assignment: Chapter 5
In the previous handout, we looked at continuous random variables and at calculating probabilities and percentiles for those types of variables. Throughout this handout we will discuss discrete random variables, more specifically binomial random variables. Recall that a discrete random variable can take only one of a
countable list of distinct values.
The Binomial Random Variable
Certain conditions must be met for a variable to be considered a binomial random variable, but the basic
idea is that a binomial random variable is a count of how many times an event occurs (or does not occur) in
a particular number of independent observations or trials that make up a random circumstance. One of the
more basic examples of a binomial random variable is the number of heads observed in four tosses of a fair
coin. We define a binomial random variable as X = number of successes in the n trials of a binomial process
(e.g. X = the number of heads in four tosses of a fair coin).
A binomial process is defined by the following conditions:
1. There are n specified ‘trials’.
2. Each observation results in one of two possible outcomes, called ‘success’ and ‘failure’.
3. The probability of a ‘success’ remains the same from one trial to the next, and this probability is
denoted by p. The probability of a ‘failure’ is q = 1 − p for every trial.
4. The outcomes are independent from one trial to the next.
Sometimes there may be more than two possible simple events for each trial (think of rolling a die: there
are six possible outcomes), but the random variable counts how many times a particular subset of the possibilities occurs (for example, the up-face on the die is an even value). Survey responses can also produce a binomial
random variable when we count how many individuals in the sample have a particular trait or opinion.
Suppose a class consists of ten boys and ten girls. Five children are randomly selected to give a presentation.
Let X = the number of girls selected. Is this a binomial random variable? Go through the conditions of a
binomial process and check that each condition is met.
Computing Probabilities for Binomial Variables
For a binomial random variable, the probabilities for the possible values of X are given by the formula
$$P(X = j) = \frac{n!}{j! \cdot (n - j)!} \cdot p^{j} \cdot (1 - p)^{n - j}, \quad \text{for } j = 0, 1, 2, \ldots, n$$
Note that j! = (j) · (j − 1) · (j − 2) · · · (1). The calculator will handle this calculation for us throughout this
course, but let’s go through a very simple example by hand.
You flip a fair coin four times. What is the probability that you obtain three heads?
Let X = the number of heads obtained from flipping a fair coin four times. For this example,
there are four trials (n = 4), a ‘success’ is obtaining a head face-up, the probability of a success is 1/2 for each trial, and, lastly, the outcome of one trial does not influence another (independent
trials). Thus, the conditions are satisfied for a binomial process.
We are looking for P (X = 3) (the probability that we obtain three heads from flipping a fair coin four
times). Plugging our values into the formula, we have
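$$P(X = 3) = \frac{4!}{3! \cdot (4 - 3)!} \cdot \left(\frac{1}{2}\right)^{3} \cdot \left(\frac{1}{2}\right)^{4 - 3} = \frac{4!}{3! \cdot (1)!} \cdot \left(\frac{1}{2}\right)^{3} \cdot \left(\frac{1}{2}\right)^{1} = \underline{\hspace{1.5cm}} = \underline{\hspace{1.5cm}}$$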
So, when flipping a coin four times, we should expect to obtain three heads ______% of the time.
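The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch and not part of the original handout: the function name binom_pmf is made up for illustration, and it simply evaluates the formula for P (X = j), with math.comb from the standard library playing the role of n! / (j! · (n − j)!).

```python
from math import comb


def binom_pmf(n, p, j):
    """P(X = j) for a binomial random variable with n trials and success probability p."""
    # comb(n, j) equals n! / (j! * (n - j)!)
    return comb(n, j) * p**j * (1 - p) ** (n - j)


# Three heads in four tosses of a fair coin (n = 4, p = 1/2, j = 3)
print(binom_pmf(4, 0.5, 3))
```

On a TI-83/84 style calculator the same quantity should be available as binompdf(n, p, j) in the DISTR menu.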
iClicker Questions
Now, suppose that we wanted to know the probability of obtaining more heads than tails when flipping
a fair coin four times. In other words, we want to know the probability of obtaining 3 or 4 heads (3 or more
heads) when flipping a fair coin four times. If we again let X = the number of heads obtained while flipping a fair
coin four times, we can write this statement using probability notation as
P (X = 3) + P (X = 4) = P (X ≥ 3)
In the previous examples we used the probability distribution function (pdf) to determine the probability
at exactly one point (they both start with ‘p’). In this example we want to sum the pdf values, or cumulate, over
more than one point, so we will use the cumulative distribution function (cdf) in the calculator. Now, the
calculator calculates the cdf as P (X ≤ j), and we will need to think through how to use this function to
obtain the probability we want.
In other words, we need to ‘remove’ the probability we do not want from the total probability. Note: for
discrete random variables, the ‘equality’ in probability notation matters while it does not for continuous
random variables.
In your calculator you will be using binomcdf(n, p, j):
press [2nd], then [VARS]
Select binomcdf( in the DISTR tab and then press [ENTER]
Type in ‘n’, ‘p’, and ‘j’ (the number of trials, the probability of a ‘success’,
and the number of ‘successes’ we are interested in) on the main screen
Press [ ) ] to close the parenthesis and then press [ENTER]
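The idea of ‘removing’ the probability we do not want can also be sketched in Python. The helper binom_cdf below is a hypothetical stand-in that mirrors what binomcdf(n, p, j) returns by summing P (X = i) for i = 0, 1, ..., j; it is an illustration, not the calculator's actual implementation. Since X can only take the values 0 through 4, P (X ≥ 3) = 1 − P (X ≤ 2).

```python
from math import comb


def binom_cdf(n, p, j):
    """P(X <= j): sum of the binomial probabilities for 0, 1, ..., j successes."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


# More heads than tails in four tosses of a fair coin:
# P(X >= 3) = 1 - P(X <= 2)
p_at_least_3 = 1 - binom_cdf(4, 0.5, 2)
print(p_at_least_3)
```

Note that the cutoff passed to the cdf is 2, not 3. Because X is discrete, P (X ≤ 3) includes the probability of exactly three heads, which is part of what we want to keep.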
iClicker Question
Expected Value and Standard Deviation of a Binomial Random Variable
The mean, or expected value, of a binomial random variable is µ = E(X) = n · p where n = number of
trials and p = probability of success. If you were to flip a fair coin 100 times, for instance, how many heads
would you expect to result, on average?
There is also a formula for the standard deviation of a binomial random variable. The formulas for the
mean and standard deviation of a binomial random variable were derived using algebra, but you won’t
need to know these derivations. The results are useful for later applications.
The standard deviation of a binomial random variable is $\sigma = \sqrt{n \cdot p \cdot (1 - p)}$ where, again, n = number
of trials and p = probability of success. What would we expect the average deviation from the mean number
of heads to be when flipping a fair coin 100 times?
A study by the Center for Financial Services Innovation showed that only 64% of U.S. income earners
aged 15 and older had a bank account (A. Carrns, Banks Court a New Client, The Wall Street Journal,
March 16, 2007, p. D1). If a random sample of 20 U.S. income earners aged 15 and older is selected, how
many U.S. income earners aged 15 and older would we expect to have a bank account and what would the
average deviation from the mean be?
Source: Levine, Krehbiel, and Berenson. Business Statistics: A First Course. 5th ed. New Jersey: Pearson Education, Inc., 2010. 174. Print.
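As a quick check of the two formulas, here is a small Python sketch (the function name binom_mean_sd is hypothetical, not from the handout) that evaluates n · p and the square root of n · p · (1 − p) for the coin-flip question and for the bank account sample.

```python
from math import sqrt


def binom_mean_sd(n, p):
    """Return (mean, standard deviation) of a binomial random variable."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean, sd


# 100 flips of a fair coin
print(binom_mean_sd(100, 0.5))

# Sample of 20 income earners, 64% of whom have a bank account
print(binom_mean_sd(20, 0.64))
```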
Normal Approximation
In the figure below, we see the probabilities (pdf values) for all possible values of the number of U.S. income earners aged 15
and older with a bank account in the sample of 20. What do we notice about the shape of the distribution?
[Figure: Histogram of Probabilities for the Number of U.S. Income Earners Aged 15 and Older]
Beyond the empirical rule, we may apply the normal curve to approximate binomial probabilities. We will
visualize (as seen above) that the binomial distribution is centered on n · p with a standard deviation of
$\sqrt{n \cdot p \cdot (1 - p)}$.
What would be the probability of observing at most 10 U.S. income earners aged 15 and older with a bank
account? Remember to convert to a z-score and plot it on the standard normal curve before calculating the
normal approximation using normalcdf(LOWER, UPPER). Keep in mind that the normal curve extends from
negative infinity to positive infinity.
Let’s recalculate that probability, but using the exact method (the binomial cumulative distribution function). Remember that we use binomcdf(n, p, j) to calculate P (X ≤ j).
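To compare the two methods side by side, the sketch below computes P (X ≤ 10) both ways for n = 20 and p = 0.64. It is an illustration under stated assumptions: binom_cdf and normal_approx_cdf are hypothetical helper names, NormalDist from Python's statistics module stands in for the calculator's normal cdf function, and no continuity correction is applied because the handout does not use one.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(n, p, j):
    """Exact P(X <= j) for a binomial random variable."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def normal_approx_cdf(n, p, j):
    """Normal approximation to P(X <= j), without continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (j - mu) / sigma        # z-score of the cutoff j
    return NormalDist().cdf(z)  # area to the left of z under the standard normal curve


n, p, j = 20, 0.64, 10
approx = normal_approx_cdf(n, p, j)
exact = binom_cdf(n, p, j)
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

Printing the two values next to each other makes it easy to judge how close the approximation is for this particular n and p.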
Note that the normal approximation is indeed an approximation, and that some
approximations are better than others.
The normal curve gives reasonably good approximations of binomial probabilities whenever both n · p > 5
and n · (1 − p) > 5.
If the histogram of probabilities for a binomial variable is noticeably skewed (positively or negatively), the
normal approximation will be poor. In other words, if p is either too close to 0 or too close to 1, it
will not be ‘safe’ to use the normal curve approximation.
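The rule of thumb is easy to check mechanically. The following is a minimal sketch with a hypothetical function name, assuming the cutoff of 5 stated above; the second call uses a made-up value of p close to 0 purely for contrast.

```python
def normal_approx_ok(n, p, cutoff=5):
    """True when both n*p and n*(1 - p) exceed the cutoff."""
    return n * p > cutoff and n * (1 - p) > cutoff


print(normal_approx_ok(20, 0.64))  # bank account sample: n*p = 12.8, n*(1-p) = 7.2
print(normal_approx_ok(20, 0.05))  # hypothetical p close to 0: n*p is only 1
```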