ACMS 20340
Statistics for Life Sciences
Chapter 12:
Discrete Probability Distributions
What about categorical variables?
We’ve studied various distributions of quantitative variables, most
notably, the Normal distributions.
But what is the appropriate probability model for the count of
successful outcomes of a categorical variable?
We will focus on one distribution in particular, the binomial
distribution.
Some Motivating Examples
- You toss a fair coin ten times.
  - How many times does it come up heads?
  - What is the probability of it coming up heads exactly three times?
- An obstetrician oversees 12 single-birth deliveries on a certain day.
  - How many of the deliveries are of girls?
  - What is the probability of there being exactly 7 girls in this “batch” of 12?
The Binomial Setting
1. There is a fixed number n of observations.
2. The n observations are independent, which means that
knowing the result of one observation doesn’t change the
probabilities we assign to other observations.
3. Each observation falls into one of two categories, one of which
we will call “success”, and the other “failure”.
4. The probability p of a success is the same for each
observation.
The Binomial Distribution
The count X of successes in the binomial setting has the binomial
distribution with parameters n and p.
The parameter n is the number of observations, and p is the
probability of a success on any one observation.
The possible values of X are whole numbers from 0 to n.
An important caveat: Not all counts have a binomial distribution,
so we must ensure that we’re in the binomial setting before we
conclude that a count has a binomial distribution.
Binomial Distribution Examples
- You toss a fair coin ten times and count the number of Hs.
  - n = 10
  - p = 1/2
- An obstetrician oversees 12 single-birth deliveries on a certain day and counts the number of girls born.
  - n = 12
  - p = 1/2
- You roll a fair die 100 times and count the number of occurrences of ‘1’.
  - n = 100
  - p = 1/6
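To make the binomial setting concrete, here is a minimal Python sketch (an illustration, not part of the original slides) that simulates the count X for each example above: it runs n independent trials, each a success with probability p, and counts the successes. The helper name `simulate_count` and the fixed seed are arbitrary choices made for this sketch.

```python
import random

def simulate_count(n, p, rng):
    """Count successes in n independent trials, each succeeding with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(12)  # fixed seed so the run is reproducible

# (description, n, p) for the three examples above
examples = [
    ("heads in 10 fair coin tosses", 10, 1 / 2),
    ("girls in 12 single births", 12, 1 / 2),
    ("1s in 100 rolls of a fair die", 100, 1 / 6),
]

for label, n, p in examples:
    x = simulate_count(n, p, rng)
    print(f"{label}: X = {x} (possible values 0..{n})")
```

Each run prints one simulated value of X per example; rerunning with a different seed gives different counts, but every count is a whole number between 0 and n.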
A Non-Example
You select five balls from a barrel containing 50 red balls and 50
blue balls, without replacement.
What is the probability of selecting only red balls?
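$$\left(\frac{50}{100}\right)\left(\frac{49}{99}\right)\left(\frac{48}{98}\right)\left(\frac{47}{97}\right)\left(\frac{46}{96}\right) = \frac{1081}{38412} \approx 0.028$$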
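Why isn’t the count of red balls binomially distributed? Because the balls are drawn without replacement, the draws are not independent: the probability of drawing red changes from one draw to the next (50/100, then 49/99, and so on), so conditions 2 and 4 of the binomial setting fail.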
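The sketch below reproduces the product above with exact fractions and, for contrast, the binomial answer (1/2)^5 that would apply if the five draws were independent with p = 1/2 (for example, drawing with replacement, a case the slide does not discuss). The function name is hypothetical.

```python
from fractions import Fraction

def prob_all_red_without_replacement(red, blue, draws):
    """Exact probability that every ball drawn (without replacement) is red."""
    total = red + blue
    prob = Fraction(1)
    for i in range(draws):
        # After i red balls are removed, red - i of the remaining total - i balls are red.
        prob *= Fraction(red - i, total - i)
    return prob

exact = prob_all_red_without_replacement(50, 50, 5)
print(exact, float(exact))  # 1081/38412, about 0.0281

# If the draws WERE independent with p = 1/2, the binomial answer would be (1/2)^5.
with_replacement = Fraction(1, 2) ** 5
print(with_replacement, float(with_replacement))  # 1/32 0.03125
```

The two answers are close but not equal, which is exactly the effect of the changing probability of red from draw to draw.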
Binomial Probabilities 1
What we’d like is a formula for the probability that a binomial
random variable takes any value.
Idea: We add probabilities for the different ways of getting exactly
that many successes in n observations.
That is, if X is a binomial random variable, we want a formula for
calculating
P(X = k)
for any k = 0, 1, 2, . . . , n.
Binomial Probabilities 2
Let’s first consider an example.
Each child born to a particular set of parents has probability 0.25
of having blood type O.
If these parents have 5 children, what is the probability of exactly
two of them having blood type O?
The count of children with blood type O is binomially distributed:
- n = 5
- p = 0.25
Let’s use “S” to stand for success (blood type O) and “F” to stand for failure.
Binomial Probabilities 3
Step 1: What is the probability that only the first and third children are successes? That is,
P(SFSFF) = ?
The probability of a sequence of independent events is the product of the probabilities of the individual events:
$$\begin{aligned}
P(SFSFF) &= P(S)\cdot P(F)\cdot P(S)\cdot P(F)\cdot P(F)\\
&= (0.25)(0.75)(0.25)(0.75)(0.75)\\
&= (0.25)^2(0.75)^3
\end{aligned}$$
Binomial Probabilities 4
Step 2: Observe that any arrangement of 2 S’s and 3 F’s has this same probability: we always just multiply 0.25 twice and 0.75 three times whenever we have 2 S’s and 3 F’s.
So the probability that X = 2 is the probability of getting 2 S’s and 3 F’s in any arrangement whatsoever:
SSFFF SFSFF SFFSF SFFFS FSSFF
FSFSF FSFFS FFSSF FFSFS FFFSS
There are ten such arrangements, each with the same probability, and hence
$$P(X = 2) = 10(0.25)^2(0.75)^3 = 0.2637.$$
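The two steps above can be checked by brute force. This minimal sketch lists every S/F sequence of length 5, keeps those with exactly two S’s, multiplies the per-child probabilities (using independence), and adds up the results.

```python
from itertools import product

p = 0.25   # P(S): a child has blood type O
n, k = 5, 2

total = 0.0
count = 0
# Walk through all 2^5 sequences of S/F and keep those with exactly k S's.
for seq in product("SF", repeat=n):
    if seq.count("S") == k:
        count += 1
        # Independent trials: multiply the probability of each outcome.
        prob = 1.0
        for outcome in seq:
            prob *= p if outcome == "S" else 1 - p
        total += prob

print(count)            # 10
print(round(total, 4))  # 0.2637
```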
The Binomial Coefficient
The number of ways of arranging k successes among n observations is given by the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
for any k = 0, 1, 2, . . . , n.
Recall that the factorial of n, written n!, is
$$n! = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1,$$
and 0! = 1.
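A short Python sketch of the factorial formula for the binomial coefficient. The function name `binomial_coefficient` is illustrative; Python 3.8 and later also ship `math.comb`, which should agree with it.

```python
from math import factorial, comb

def binomial_coefficient(n, k):
    """Number of ways to arrange k successes among n observations: n! / (k! (n-k)!)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Integer division is exact here because k!(n-k)! always divides n!.
    return factorial(n) // (factorial(k) * factorial(n - k))

print(binomial_coefficient(5, 2))  # 10
print(binomial_coefficient(5, 0))  # 1, since 0! = 1
print(comb(5, 2))                  # 10
```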
The Binomial Coefficient in Action
How many different ways are there to have exactly two successes in
five trials?
$$\binom{5}{2} = \frac{5!}{2!\,3!} = \frac{(5)(4)(3)(2)(1)}{(2)(1)(3)(2)(1)} = \frac{(5)(4)}{(2)(1)} = \frac{20}{2} = 10.$$
The Official Formula for Binomial Probabilities
If X has the binomial distribution with n observations and
probability p of success for each observation, then the possible
values of X are 0, 1, 2, . . . , n.
If k is any one of these values, then
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$
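The formula translates directly into code. In this minimal sketch (the name `binomial_pmf` is hypothetical), values outside 0..n get probability 0 because X cannot take them. It reproduces the blood-type answer and checks that the probabilities over all possible values add up to 1.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # X only takes whole-number values 0..n
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Blood-type example: n = 5, p = 0.25, P(X = 2)
print(round(binomial_pmf(2, 5, 0.25), 4))  # 0.2637

# The probabilities over all possible values 0..n sum to 1.
print(round(sum(binomial_pmf(k, 5, 0.25) for k in range(6)), 10))  # 1.0
```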
Example
One in ten boxes of Cracker Jacks contains a decoder ring.
What is the probability that no more than one of ten randomly
chosen boxes of Cracker Jacks contains a decoder ring?
- n = 10
- p = 0.1
$$\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1)\\
&= \binom{10}{0}(0.1)^0(0.9)^{10} + \binom{10}{1}(0.1)(0.9)^9\\
&= \frac{10!}{0!\,10!}(1)(0.3487) + \frac{10!}{1!\,9!}(0.1)(0.3874)\\
&= (1)(1)(0.3487) + (10)(0.1)(0.3874)\\
&= 0.3487 + 0.3874 = 0.7361
\end{aligned}$$
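"No more than one" is a cumulative probability, so the calculation above is a sum of binomial probabilities. A minimal sketch (the helper name `binomial_cdf` is illustrative) that adds P(X = k) for k = 0 up to x:

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ binomial(n, p), summing the binomial formula for k = 0..x."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, min(x, n) + 1))

# Cracker Jack example: n = 10 boxes, p = 0.1 chance of a decoder ring
print(round(binomial_cdf(1, 10, 0.1), 4))  # 0.7361
```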
Binomial mean and standard deviation
Q In many repetitions of the binomial setting, with n
observations and the probability of success p, what will be the
average count of successes?
(In other words, what is the mean of the count variable X ?)
A If a count X has the binomial distribution with n observations
and probability p of success, the mean and standard deviation
of X are
$$\mu = np$$
$$\sigma = \sqrt{np(1-p)}.$$
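As a numerical sanity check (not a proof), the sketch below computes the mean and standard deviation directly from the binomial probabilities, using the definitions of mean and variance for a discrete random variable, and compares them with np and √(np(1 − p)). The function name `mean_sd_from_pmf` is hypothetical, and the blood-type numbers are reused only as an example.

```python
from math import comb, sqrt

def mean_sd_from_pmf(n, p):
    """Mean and SD of X ~ binomial(n, p), computed directly from P(X = k) for k = 0..n."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mu = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mu) ** 2 * pk for k, pk in enumerate(pmf))
    return mu, sqrt(var)

n, p = 5, 0.25  # blood-type example
mu, sigma = mean_sd_from_pmf(n, p)
print(mu, n * p)                     # both about 1.25
print(sigma, sqrt(n * p * (1 - p)))  # both about 0.968
```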
Coin Tossing
You toss a fair coin ten times and count the number of Hs.
- n = 10
- p = 1/2
If we repeat the ten tosses many times, how many heads should occur on average?
$$\mu = np = (10)(1/2) = 5$$
And the standard deviation?
$$\sigma = \sqrt{np(1-p)} = \sqrt{10(1/2)(1/2)} = \sqrt{5/2} \approx 1.58$$
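The question "what happens on average if we repeat the ten tosses many times?" can be answered literally by simulation. This sketch repeats the ten-toss experiment 100,000 times (an arbitrary choice, as is the seed) and reports the average and standard deviation of the head counts, which should land near 5 and 1.58.

```python
import random
from statistics import mean, pstdev

rng = random.Random(2012)   # fixed seed for reproducibility (arbitrary choice)
n, p = 10, 0.5
repetitions = 100_000

# Each repetition: toss a fair coin n times and record the number of heads.
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(repetitions)]

print(mean(counts))    # close to np = 5
print(pstdev(counts))  # close to sqrt(np(1 - p)) = sqrt(5/2), about 1.58
```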
The Normal Approximation to Binomial Distributions
Suppose that a count X has the binomial distribution with n observations and probability of success p.
When n is large, the distribution of X is approximately Normal, $N\big(np, \sqrt{np(1-p)}\big)$.
As a rule of thumb, we use the Normal approximation when n is so
large that np ≥ 10 and n(1 − p) ≥ 10.
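A small helper that applies the rule of thumb above and, when it passes, returns the parameters of the approximating Normal distribution. The name `normal_approximation` and the `threshold` parameter are assumptions made for this sketch.

```python
from math import sqrt

def normal_approximation(n, p, threshold=10):
    """Return (mu, sigma) for the approximating Normal N(np, sqrt(np(1 - p))),
    or None when the rule of thumb np >= 10 and n(1 - p) >= 10 fails."""
    if n * p < threshold or n * (1 - p) < threshold:
        return None
    return n * p, sqrt(n * p * (1 - p))

print(normal_approximation(10, 0.5))    # None, since np = 5 < 10
print(normal_approximation(2500, 0.6))  # mu = 1500, sigma about 24.49
```

Ten coin tosses are too few for the approximation by this rule, while the 2500-adult example easily qualifies.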
One Last Example
About 60% of American adults are either overweight or obese.
What is the probability that at least 1520 individuals from a
random sample of 2500 adults are overweight or obese?
Given that our sample is random, we can take the 2500 members
of our sample to be independent.
So we’re in the binomial setting:
- n = 2500
- p = 0.6
Using software, we find that
P(X ≥ 1520) = 0.2131.
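The slides do not say which software produced 0.2131. One way to compute the exact binomial tail in plain Python is sketched below (an assumption about method, not a description of what was used). Each term is computed on the log scale with `math.lgamma`, because for n = 2500 the coefficient C(n, k) is too large, and p^k(1 − p)^(n−k) too small, to fit in a floating-point number. The sketch assumes 0 < p < 1.

```python
from math import lgamma, log, exp

def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ binomial(n, p) with 0 < p < 1, summing P(X = k) for k = x..n.

    log C(n, k) = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1), so each term
    is built on the log scale and exponentiated only at the end.
    """
    log_p, log_q = log(p), log(1 - p)
    total = 0.0
    for k in range(x, n + 1):
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += exp(log_term)
    return total

# Should agree with the software value 0.2131 reported above.
print(round(binomial_upper_tail(1520, 2500, 0.6), 4))
```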
Let’s Use the Normal Approximation 1
$$\mu = np = (2500)(0.6) = 1500$$
$$\sigma = \sqrt{np(1-p)} = \sqrt{(2500)(0.6)(0.4)} = 24.49$$
The distribution of this binomial random variable is approximated
well by the Normal distribution N(1500, 24.49)
(since np = 1500 ≥ 10 and n(1 − p) = 1000 ≥ 10).
Let’s Use the Normal Approximation 2
$$\begin{aligned}
P(X \ge 1520) &= P\left(\frac{X - 1500}{24.49} \ge \frac{1520 - 1500}{24.49}\right)\\
&= P(Z \ge 0.82)\\
&= 1 - 0.7939 = 0.2061
\end{aligned}$$
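The standardization above can also be done in code, replacing the table lookup with the standard Normal cumulative distribution function Φ(z) = (1 + erf(z/√2))/2, computed here with `math.erf`. The helper name `normal_upper_tail` is illustrative.

```python
from math import erf, sqrt

def normal_upper_tail(x, mu, sigma):
    """P(Y >= x) for Y ~ N(mu, sigma), using Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    z = (x - mu) / sigma
    return 1 - (1 + erf(z / sqrt(2))) / 2

n, p = 2500, 0.6
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 1500 and about 24.49

print(round((1520 - mu) / sigma, 2))               # 0.82
print(round(normal_upper_tail(1520, mu, sigma), 4))
```

Without rounding z to 0.82 first, this gives roughly 0.207 rather than 0.2061; the small difference comes from rounding z before the table lookup, not from the approximation itself.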
The Normal approximation 0.2061 differs from the software result
0.2131 by only 0.007.