Elementary Statistics
Spring 2012
Chapter 5
Discrete Probability Distributions
Goal: To become familiar with how to use Excel 2007/2010 for binomial distributions.
Instructions: Open Excel and click on the Stat button in the Quick Access Bar. Scroll down until you
see BINOM.DIST. (In Excel 2007 the function is named BINOMDIST.) Select that tool. Here is what you
should see:
Try finding the probability of 5 or fewer successes when there were 24 trials and the probability of
success on any one trial is 0.5. Fill out the tool as follows:
Number_s: 5
Trials: 24
Probability_s: 0.5
Cumulative: true
Midway down the tool screen on the right, you’ll see the answer. It should read 0.003305376. Try it.
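The same figure can be cross-checked outside Excel. The following is a minimal Python sketch using only the standard library; the helper names binom_pmf and binom_cdf are illustrative and are not part of Excel or the course materials. With Cumulative set to true, the tool adds up the probabilities of 0 through Number_s successes, which is what binom_cdf does here.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer successes (Cumulative = true)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# 5 or fewer successes in 24 trials with success probability 0.5
print(round(binom_cdf(5, 24, 0.5), 9))
```

The printed value should agree with the Excel result to the digits shown in the tool.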
Goal: To become familiar with how to use Excel 2007/2010 for Poisson distributions.
Instructions: Open Excel and click on the Stat button in the Quick Access Bar. Scroll down until you
see POISSON.DIST. (In Excel 2007 the function is named POISSON.) Select that tool. Here is what you
should see:
Try finding the probability of 7 arrivals during some minute when the average number of arrivals is 3.5
per minute. Fill out the tool as follows:
X: 7
Mean: 3.5
Cumulative: false
You should see a probability of 0.038549. Try it.
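As a cross-check, here is a minimal Python sketch of the same calculation. The function name poisson_pmf is illustrative, not an Excel or course name. With Cumulative set to false, the tool returns the probability of exactly X occurrences, which for a Poisson distribution is e^(-mean) * mean^X / X!.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """Probability of exactly x occurrences when the average is `mean`."""
    return exp(-mean) * mean**x / factorial(x)

# Exactly 7 arrivals in a minute when the average is 3.5 per minute
print(round(poisson_pmf(7, 3.5), 6))
```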
Chapter 5
Discrete Probability Distributions
Goal: To become familiar with discrete probability distributions and, specifically, the
Binomial Distribution and the Poisson Distribution.
Reading: Triola, Chapter 5, Sections 5.2 – 5.3
A stochastic process is any process that generates values in a random fashion, each of which has a
probability associated with it. For example, rolling a pair of dice is a stochastic process that generates
the numbers 2 through 12 in a random way, and the probability of throwing a 7, for example, is 1/6. Taking
a poll is a stochastic process because you cannot predict how a person is going to answer, other than that
the answer will be “yes” or “no” if those are your only two choices.
A random variable is a number that results from a stochastic process and hence has a probability
associated with it. For example, if we’re rolling dice, then the random variable is the sum showing on the
dice, a number between 2 and 12. If we use X to represent the random variable, then X = 7 is a specific
event. We typically label events with capital letters like A or B. Therefore, if A is the event that we roll
a seven, all of the following are equivalent:
P(A) = P(X = 7) = 1/6
In the case of polling, if we code a “yes” answer as the number 1 and a “no” answer as a 0, then X can
take on one of two values, 0 or 1, and we can then ask such questions as what P(X = 1) is equal to.
A probability distribution is a set of all possible outcomes from some stochastic process, showing or
describing each value of the random variable generated by the process along with its associated
probability. For example, the following table shows all possible outcomes of rolling a pair of dice and
the probability of each outcome, and hence is a probability distribution:
X      2     3     4     5     6     7     8     9     10    11    12
P(X)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Notice the similarity between a probability distribution and a frequency distribution. How can you turn any
frequency distribution into a probability distribution?
Each value of X is a possible outcome of rolling the dice. Note that the sum of all the probabilities is 1.0,
as it should be, since one of those outcomes has to occur. This is an important property of a probability
distribution. The probabilities must always sum to 1.0; otherwise it’s not a probability distribution.
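The dice distribution and its sum-to-one property can be checked by brute force. The sketch below is a minimal Python illustration, with variable names of our own choosing: it counts how many of the 36 equally likely rolls of two dice give each sum (a frequency distribution), then divides each count by the total to get a probability distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Frequency distribution: how many of the 36 equally likely rolls give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
total = sum(counts.values())

# Probability distribution: each frequency divided by the total count.
dist = {s: Fraction(c, total) for s, c in sorted(counts.items())}

for s, prob in dist.items():
    print(s, prob)  # fractions are printed in lowest terms
print("sum of probabilities:", sum(dist.values()))
```

Exact fractions are used instead of decimals so that the check that the probabilities sum to 1 is not blurred by rounding.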
A discrete probability distribution is one that results from a stochastic process, where the random
variable is a discrete number. We will study continuous probability distributions in a later chapter. Rolling
dice is a good example of a process that generates a discrete probability distribution.
Binomial Distribution
A commonly encountered discrete probability distribution is the binomial probability distribution.
The following process will generate one:
1. The process has a fixed number of trials. For example, let’s say we roll the dice 50 times.
2. The trials must be independent. Each roll is independent of the other rolls; no roll depends on a
previous roll regardless of what the spectators are telling you at the casino.
3. Each trial must have only one of two outcomes. Here, our dice example deviates, because
any one of eleven numbers can come up. However, if we change things just slightly, we can
make it fit. If we define a win or a success as when the number 7 comes up, and everything else
as a loss or failure, then we have only two possible outcomes, success or failure.
4. The probability of a win or a success remains the same throughout all the trials. In our case,
given that we defined a “success” as rolling a seven, the probability of a success is 1/6 for each roll
(the probability of a failure or loss would then be 5/6).
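To make the four conditions concrete, here is a small simulation sketch in Python. It is only an illustration: the function name count_sevens and the seed value are arbitrary choices, not part of the course materials. Each call performs a fixed number of independent rolls of two dice, treats a sum of 7 as a success and anything else as a failure, and counts the successes.

```python
import random

def count_sevens(trials, rng):
    """Roll two dice `trials` times and count how often the sum is 7."""
    sevens = 0
    for _ in range(trials):
        # Each roll is independent and has the same success probability, 1/6.
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            sevens += 1
    return sevens

rng = random.Random(2012)  # fixed seed so the run is repeatable
print(count_sevens(50, rng))
```

Repeating the experiment many times and tallying the counts would approximate a binomial distribution with 50 trials and success probability 1/6.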
There are tables for finding the probabilities of different events given that we are working with a binomial
distribution. However, we are going to use Excel. For example, let’s say that we roll the dice 10 times.
What is the probability that we will roll exactly 4 sevens? You run the BINOM.DIST tool, found on the
Statistics function menu, and fill it out as shown below.
Number_s is the number of successes you’re testing for, in this case 4. Trials is the total number of times
you roll the dice. Probability_s is the probability of rolling a seven on any one roll. Cumulative is set to
false if we want exactly 4 times. As you can see in the middle of the window, the probability of rolling a 7
exactly four times out of ten is 0.0543 (rounded to four decimal places).
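The value can be reproduced from the binomial formula, P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). The following minimal Python sketch, with an illustrative function name, evaluates it for 4 sevens in 10 rolls:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 4 sevens in 10 rolls, with P(seven) = 1/6 on each roll
exactly_four = binom_pmf(4, 10, 1 / 6)
print(round(exactly_four, 4))
```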
Now suppose that we wanted to know the probability of rolling a 7 no more than four times. This means
that in addition to rolling a 7 four times, we also include the case of rolling a 7 three times, or two times, or
one time, or no times at all. This is what we mean by “no more than”, that is, less than or equal to.
The only change in the use of the BINOM.DIST tool is that we now enter “true” for Cumulative:
We now see that the probability of rolling a 7 no more than four times has jumped to 0.9845.
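The cumulative figure is the sum of the “exactly k” probabilities for k = 0, 1, 2, 3, 4. A minimal Python sketch (variable names are illustrative) lists each term and adds them up:

```python
from math import comb

n, p = 10, 1 / 6

# Probability of exactly k sevens, for k = 0 through 4
terms = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5)]

for k, term in enumerate(terms):
    print(k, round(term, 4))

no_more_than_four = sum(terms)
print("no more than four:", round(no_more_than_four, 4))
```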
Finally, how would we find the probability of rolling a 7 at least four times? This means that we would
count rolling a 7 four times, five times, six, seven, eight, nine, or ten times. Note, it’s “or” and not “and”.
Take a moment here and think about the differences and similarities between, no more than four,
at least four, four or less, four or more.
The problem we encounter is that the tool is designed to give us only the case where we are asking for
“no more than” a certain number. Hence, we have to use the complement of the event, and then
subtract that result from 1.0. The complement of “at least four times” is “no more than three times”.
Think about that for a while. We use the tool to find the probability of rolling a 7 no more than three times
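(Number_s will be 3). The probability is 0.930. Therefore, if A is the event of rolling a 7 at least four
times and Ā is its complement, the probability of rolling a 7 at least four times is
P(A) = 1 − P(Ā) = 1 − 0.930 = 0.070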
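The complement calculation can be verified by also adding up the cases four through ten directly. This is a minimal Python sketch with illustrative names; both routes should give the same answer.

```python
from math import comb

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 10, 1 / 6

at_most_three = binom_cdf(3, n, p)   # the complement event
at_least_four = 1 - at_most_three    # complement rule

# Direct route: add the probabilities of 4, 5, ..., 10 sevens
direct = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(4, n + 1))

print(round(at_most_three, 4), round(at_least_four, 4), round(direct, 4))
```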
Here’s another example of how the binomial distribution is used. Let’s say that Kim felt she was highly
qualified for a job she applied for but didn’t get, and that she suspected the company of gender
discrimination. After a little research, she found that out of the last 20 new employees hired, only three
were women. Furthermore, the applicant pool was very large and had an equal number of qualified men
and women in it. If there was no hiring bias, you would expect each new hire to have a 50-50 chance of
being a woman, or a probability of 0.5. However, Kim found that only 15% of the new employees were
women. Now, 15% is a lot smaller than 50%, so how likely is it that only 15% hired were women if we
assume that there is no gender bias? To answer this question, we have to find the probability that
three or fewer women were hired purely by chance. Read this last sentence again until you have a
firm grasp of what we’re doing here.
We use the binomial distribution to find the above mentioned probability. The number of “successes” in
this case is 3, the number of “trials” is 20 (the total number of new employees), and the probability that
any one new hire is a woman, if chance alone were at work, would be 0.5. Finally, we want to know, given
this scenario, the probability that three or fewer women would be hired purely by chance:
As you can see, the probability is 0.001. By convention, statisticians treat any event that has a
probability of less than 0.05 of occurring as a highly unlikely event. This leads to the Rare Event Rule:
If, under a given set of assumptions, the probability of an event occurring purely by chance is less than
0.05, and the event actually does occur, we can conclude that the given set of assumptions was most likely
incorrect.
In this example, the given assumption was that there was no bias, and hence every new hire had a 50-50
chance of being a woman. However, under that assumption, the probability of no more than 3 women
being hired is 0.001, quite a bit less than 0.05. Therefore, we can conclude that it is most likely that the
original assumption was incorrect, i.e., it is far more likely that a hiring bias based on gender did exist.
This is an important and powerful application of the binomial distribution. Please reread this last
example until you understand it.
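The hiring example can be worked through in a few lines of Python. This is a minimal sketch under the stated assumption of no bias (each hire is a woman with probability 0.5); the names RARE_EVENT_CUTOFF and p_three_or_fewer are illustrative, and 0.05 is the cutoff used in the text.

```python
from math import comb

RARE_EVENT_CUTOFF = 0.05  # threshold used by the Rare Event Rule in the text

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Three or fewer women among 20 hires, assuming no bias (p = 0.5)
p_three_or_fewer = binom_cdf(3, 20, 0.5)
print(round(p_three_or_fewer, 3))

if p_three_or_fewer < RARE_EVENT_CUTOFF:
    print("Rare under the no-bias assumption; the assumption is likely wrong.")
else:
    print("Not rare; the data are consistent with the no-bias assumption.")
```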
Poisson Distribution
This distribution is less common than the Binomial Distribution, but it has important applications. One
such application is predicting how many visits to a website will be occurring at the same time. Whenever
you have events occurring in a random fashion and “filling some bucket” you will have a Poisson
Distribution. Here are some examples:
1. People queuing at a cash register. Let’s say that in a certain supermarket, people arrive at the
cash register at an average rate of one every two minutes during their busy time. If the average
time to service a customer is two minutes, then on average, there should never be anyone
waiting in line. However, arrivals are a random event. If we consider the two minute window for
servicing a customer as our “bucket”, we can ask, what is the probability that we’ll end up with
people waiting one minute, two minutes, three minutes, etc.
2. Let’s say that a monkey is throwing darts at a board. The board has been evenly divided into 100
squares. The monkey is given 50 darts to throw, and let’s further assume that where the dart
lands on the board is completely random. In other words, the dart hits are uniformly distributed
over the board. With this information we can calculate the probability that any one square will be
hit with more than one dart. In this case, the “buckets” are the squares. The darts hitting the
squares are the random events “filling” the buckets.
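The dart example can be worked out numerically. With 50 darts spread over 100 squares, the average is 50/100 = 0.5 darts per square. The minimal Python sketch below, with illustrative names, computes the Poisson probability that a given square is hit more than once. For comparison it also computes the exact binomial value, assuming each of the 50 darts independently lands in that square with probability 1/100.

```python
from math import comb, exp, factorial

def poisson_pmf(x, mean):
    """Probability of exactly x hits when the average is `mean`."""
    return exp(-mean) * mean**x / factorial(x)

darts, squares = 50, 100
mean_hits = darts / squares  # 0.5 darts per square on average

# Poisson: P(more than one hit) = 1 - P(0 hits) - P(1 hit)
poisson_more_than_one = 1 - poisson_pmf(0, mean_hits) - poisson_pmf(1, mean_hits)

# Exact binomial comparison: 50 trials, success probability 1/100 per dart
p = 1 / squares
binomial_more_than_one = 1 - sum(
    comb(darts, k) * p**k * (1 - p)**(darts - k) for k in range(2)
)

print(round(poisson_more_than_one, 4), round(binomial_more_than_one, 4))
```

The two values should be close, which is why the Poisson distribution is a convenient stand-in when there are many trials and a small chance of success on each one.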
Here are the requirements for using the Poisson Distribution.
1. The occurrences must be random.
2. The occurrences must be independent of each other.
3. The occurrences must be uniformly distributed over the “buckets”.
Here’s an example of how the Poisson Distribution would be used. When using a pellet fertilizer spread
by a broadcast method, we would like an even distribution of fertilizer. Too little and growth would be
stunted, too much, and it will cause a “burn”. Let’s say that on average, 100 pellets fall on a square
meter. What is the probability that 80 pellets or fewer fall on some square meter?
The average number of “hits” per square meter is 100. To find the probability that at most 80 pellets fall
on some square meter we use the Excel tool, POISSON.DIST:
The chances would be 0.023 or 2.3%.
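The fertilizer result can be cross-checked with a short Python sketch. The function name poisson_cdf is illustrative; it adds up the Poisson probabilities of 0 through 80 pellets, which corresponds to running the tool with Cumulative set to true (an assumption here, since the filled-in dialog is not reproduced in the text). Each term is built from the previous one to avoid computing very large powers and factorials.

```python
from math import exp

def poisson_cdf(x, mean):
    """Probability of x or fewer occurrences when the average is `mean`."""
    term = exp(-mean)          # P(exactly 0 occurrences)
    total = term
    for k in range(1, x + 1):
        term *= mean / k       # P(k) = P(k - 1) * mean / k
        total += term
    return total

# 80 pellets or fewer on a square meter when the average is 100
at_most_80 = poisson_cdf(80, 100)
print(round(at_most_80, 3))
```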