Download The Geometric Distributions

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Statistics wikipedia , lookup

History of statistics wikipedia , lookup

Ars Conjectandi wikipedia , lookup

Probability wikipedia , lookup

Probability interpretations wikipedia , lookup

Transcript
I can prove anything by statistics - except the truth.
George Canning
Chapter 8
Sec 8.2
Having spent much time studying binomial distributions, what qualifies, formulas to use,
etc., learning about geometric distributions should be easy.
Let's start by contrasting the two distributions and settings.
Binomial: has a FIXED number of trials before the experiment begins and X counts the
number of successes obtained in that fixed number.
Geometric: has a fixed number of successes (ONE...the FIRST) and X counts the
number of trials needed to obtain that first success. It is theoretically possible to proceed
indefinitely without ever obtaining a success.
Examples would be
1) flip a coin UNTIL you get a head
2) roll a die UNTIL you get a 3
3) attempt a three-point shot in basketball UNTIL you make a basket
A random variable X is geometric provided that the following conditions are met: (a-c are
same as binomial)
a) each observation falls into one of just two categories, called success or failure
b) probability of a success, p, is the same for each observation
c) observations are all independent
NEW
d) the variable of interest is the number of trials required to obtain the FIRST success.
Recognizing the existence of either a binomial or geometric distribution is essential to
know how to proceed with your data analysis. Here is an example that should help
explain how to VERIFY a geometric setting.
An experiment consists of rolling a single die. The event of interest is rolling a 3: this is
called a success. The random variable is defined as X = number of trials UNTIL a 3
occurs. To VERIFY that this is a geometric setting, note that rolling a 3 will represent a
success, and rolling any other number will represent a failure. The probability of rolling
a 3 on each roll is the same: 1/6. The observations are independent. A trial consists of
rolling the die once. We roll the die until a 3 appears. Since all of the requirements are
satisfied, this experiment describes a geometric setting.
Rule for calculating geometric probabilities:
If X has a geometric distribution with probability p of success and (1-p) of failure on each
observation, the possible values of X are 1, 2, 3, .... If n is any one of these values, the
probability that the first success occurs on the nth trial is P(X = n) = (1 - p)n-1 p
See p. 465 and p. 467
This rule can be used to construct a probability distribution table for X = number of rolls
of a die until a 3 occurs from our earlier example. We'll use the TI 83 to do this now.
When graphing the distribution of X as a probability distribution histogram it will appear
to be strongly skewed to the right. This will ALWAYS be the case. Try to determine
why from the formula!
The mean (expected value) and standard deviation of a geometric random variable can be
calculated using these formulas:
If X is a geometric random variable with probability of success p on each trial, then the
mean of the random variable , that is the expected number of trials required to get the first
1
(1  p)
success, is E(X) =  X  and the Var(X) =  2 
whose square root yields the
p
p2
standard deviation  
(1  p )
p2
See p. 469
One more rule to go....
P(X >n) or the probability that it takes MORE than a certain number of trials to achieve
the first success is:
P(X > n) = (1 - p)n
Since we are seeking the first success at whatever trial it occurs, geometric simulations
are called "waiting time" simulations. Makes sense! Let's discuss the cereal problem on
page 473.
In the TI-83/84, 2nd  VARS  geometpdf or geometcdf (probability, # of trials)
Remember: pdf is used for one value (=) and cdf is cumulative (≤)