WEEK #23: Statistics for Spread; Binomial Distribution
Goals:
• Study measures of spread, such as interquartile range, variance, and standard
deviation.
• Introduce standard distributions, including the binomial distribution.
Textbook reading for Week #23: Study Adler Sections 6.9 and 7.4
Standard Distributions
We have seen distributions, both discrete and continuous, that originate with data from
experiments, or with formulas we made up to fit an example.
Many common and important classes of experiments have well-understood distributions.
We will start with the binomial distribution.
Coin Flipping
The canonical example for the binomial distribution is a sequence of coin flips. Compute
the probability distribution for the number of heads flipped, if we flip a coin three times.
The coin is loaded, so the probability of a heads each time is only 0.4 (tails probability is
0.6).
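One way to work this example before any formula is introduced is to list all eight flip sequences and add up their probabilities by number of heads. The sketch below does that by brute force; the function name heads_distribution is a hypothetical helper chosen for illustration, and independence of the flips is assumed.

```python
from itertools import product

P_HEADS = 0.4   # loaded coin from the example
N_FLIPS = 3

def heads_distribution(n_flips, p_heads):
    """Return {number_of_heads: probability} by enumerating every flip sequence."""
    dist = {k: 0.0 for k in range(n_flips + 1)}
    for seq in product("HT", repeat=n_flips):
        prob = 1.0
        for flip in seq:
            # Flips are assumed independent, so probabilities multiply.
            prob *= p_heads if flip == "H" else 1.0 - p_heads
        dist[seq.count("H")] += prob
    return dist

dist = heads_distribution(N_FLIPS, P_HEADS)
for k, prob in dist.items():
    print(f"P({k} heads) = {prob:.3f}")
```

The four probabilities should sum to 1, which is a quick check on the enumeration.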
The Binomial Distribution
Consider the scenario of repeatedly running a two-outcome experiment, e.g.
• coin flipping, where each outcome is either heads or tails, or
• selecting alleles from parents, where each outcome is either B or b.
These two-outcome experiments are sometimes called “Bernoulli trials”.
We arbitrarily label one of the outcomes as a “success” (e.g. heads, B allele).
If the probability of a successful outcome in each experiment is p, then what is the probability of getting exactly k successes in n trials?
We note that there is an essential counting step in computing these probabilities:
“How many ways can there be k successful outcomes out of n trials?”
Fortunately, this is a well-known question, with a well-understood solution, and the
calculation is built into most scientific calculators:
Number of ways to choose k successes in n tries = “n choose k” =
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
On a Casio 991 model calculator, you can compute this with the (<SHIFT> ÷) or nCr
button.
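For readers without a calculator at hand, the same count can be sketched in Python. The helper n_choose_k below is a hypothetical name that follows the factorial formula directly; math.comb is the standard-library equivalent (Python 3.8 or later).

```python
from math import comb, factorial

def n_choose_k(n, k):
    """Number of ways to choose k successes out of n tries, via n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare the hand-written formula with the built-in function.
for n, k in ((3, 2), (10, 3), (24, 16)):
    print(f"{n} choose {k}: formula = {n_choose_k(n, k)}, math.comb = {comb(n, k)}")
```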
Use the binomial distribution to find the probability that you will roll exactly 3 ones while
rolling a fair six-sided die 10 times.
Consider a multi-genic phenotype, for which the visible effect of the genotype depends on
the number of B copies a plant has in total over 12 different loci (24 possible B copies
in total). Find the probability that a new offspring has a total of 16 different B alleles
out of the possible 24, given that B alleles are distributed with p = 0.5 in the population
at all loci.
We can now formalize these calculations by defining the binomial distribution.
Binomial Distribution
If a trial has two outcomes, and each trial is independent, the probability of k successful
outcomes in n trials is given by
$$b(k; n, p) = \binom{n}{k}\, p^k (1 - p)^{n-k}$$
where p is the probability of success in each trial.
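As a rough illustration, the formula can be evaluated for the die-rolling and allele questions posed earlier. This is a minimal sketch: binomial_pmf is a hypothetical helper name, and the trials are assumed independent with the same p each time.

```python
from math import comb

def binomial_pmf(k, n, p):
    """b(k; n, p): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 3 ones in 10 rolls of a fair six-sided die (success = rolling a one).
p_die = binomial_pmf(3, 10, 1 / 6)

# Exactly 16 B alleles out of 24 possible copies, with p = 0.5 at every locus.
p_alleles = binomial_pmf(16, 24, 0.5)

print(f"P(3 ones in 10 rolls)   = {p_die:.4f}")
print(f"P(16 B alleles of 24)   = {p_alleles:.4f}")
```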
Properties of the Binomial Distribution
If you flip 100 coins, intuitively how many do you expect to come up heads?
If the coins were loaded, so that the probability of a heads was only 0.1 instead of 0.5,
intuitively how many heads would you expect out of 100 tosses?
Mean of the Binomial Distribution
The mean of a binomial distribution, b(k; n, p), is given by
E(b) = b̄(k; n, p) = n · p
Variance of the Binomial Distribution
$$\text{variance} = \sigma^2 = \mathrm{Var}(b(k; n, p)) = np(1 - p)$$
Standard Deviation of the Binomial Distribution
$$\text{standard deviation} = \sigma = \mathrm{Std.\,Dev}(b(k; n, p)) = \sqrt{np(1 - p)}$$
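These formulas can be checked numerically against the definitions of mean and variance, using the two 100-toss coin questions above. The sketch below sums over all possible numbers of heads; the helper names binomial_pmf and summary_from_pmf are hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def summary_from_pmf(n, p):
    """Mean, variance and standard deviation computed directly from the probabilities."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, var, sqrt(var)

n = 100
for p in (0.5, 0.1):
    mean, var, sd = summary_from_pmf(n, p)
    print(f"p = {p}: from sums   mean = {mean:.3f}, variance = {var:.3f}, sd = {sd:.3f}")
    print(f"         from formulas mean = {n * p:.3f}, variance = {n * p * (1 - p):.3f}, "
          f"sd = {sqrt(n * p * (1 - p)):.3f}")
```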
Histograms of the Binomial Distribution
Knowing the binomial distribution function, it is straightforward to compute the probability of each number of successes, and so to draw a graph of the entire distribution.
What is the range of the number of successes in n trials? How many possible outcomes
does that entail?
Consider the distribution of the number of heads turning up in 10 flips of a fair coin.
Sketch the distribution you would expect for the total number of heads out of 10.
[Blank axis for the sketch: number of heads, from 0 to 10.]
Consider the distribution of the number of heads turning up in 10 flips of a loaded coin,
where heads have only a 0.1 probability each flip. Sketch the distribution you would expect
for the total number of heads out of 10.
[Blank axis for the sketch: number of heads, from 0 to 10.]
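After sketching by hand, the two distributions can be compared against a computed version. The following is a minimal text-based sketch rather than a proper plot; text_histogram and its width parameter are hypothetical names, and bar lengths are scaled so the most likely outcome gets the longest bar.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def text_histogram(n, p, width=40):
    """Return one text line per outcome k = 0..n, with a bar proportional to its probability."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    peak = max(probs)
    lines = []
    for k, prob in enumerate(probs):
        bar = "#" * round(width * prob / peak)
        lines.append(f"{k:>3} {prob:6.4f} {bar}")
    return lines

for p in (0.5, 0.1):
    print(f"n = 10, p = {p}")
    print("\n".join(text_histogram(10, p)))
    print()
```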
Here are some other histograms of the binomial distribution, as we change the probability of each individual success (p) and the number of trials (n).
[Figure: grid of binomial histograms, one panel for each combination of p = 0.1, 0.3, 0.5 and n = 10, 20, 50, 100; each horizontal axis runs from 0 to n.]
Comment on the patterns you see in the distributions.
Normal Approximation to the Binomial Distribution
From the histograms of the binomial distribution, it seems that for large n values the
binomial distribution starts to look a lot like a normal, Gaussian, or bell-curve distribution.
One commonly referenced rule of thumb is:
A binomial distribution will be approximately normal in shape if
both np and n(1 − p) are above 10.
Relate this observation back to the previous histograms.
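One way to relate the rule of thumb to the histogram panels is to tabulate np and n(1 − p) for each combination of p and n shown. The sketch below does this; looks_normal and its threshold parameter are hypothetical names, and "above 10" is read strictly, so values of exactly 10 do not pass.

```python
def looks_normal(n, p, threshold=10):
    """Rule of thumb: both np and n(1 - p) must be above the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# The same p and n values as the histogram panels.
for p in (0.1, 0.3, 0.5):
    for n in (10, 20, 50, 100):
        verdict = "approximately normal" if looks_normal(n, p) else "rule not satisfied"
        print(f"p = {p}, n = {n:>3}: np = {n * p:5.1f}, n(1-p) = {n * (1 - p):5.1f} -> {verdict}")
```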
Beyond its mathematical interest (why does the binomial look like the normal distribution?), we can take advantage of well-understood properties of the normal distribution
in analyzing binomial data.
Theorem: For a normal distribution, the probability of an outcome within ±2 standard
deviations of the mean is 95% (rounded).
[Figure: bell-shaped normal density curve; horizontal axis from −4 to 4, vertical axis from 0 to 0.4.]
Under the assumption that some binomial distributions approximate the normal distribution, express this theorem as it applies to binomial distributions.
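As a numerical check on this idea, the sketch below computes the normal probability within 2 standard deviations and compares it with the exact binomial probability of landing within np ± 2√(np(1 − p)). The function names are hypothetical, and the example values of n and p are chosen only for illustration.

```python
from math import comb, erf, sqrt

def normal_within(z):
    """P(|Z| <= z) for a standard normal variable Z."""
    return erf(z / sqrt(2))

def binomial_within_2_sd(n, p):
    """Exact probability that a binomial count lies within 2 standard deviations of its mean."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    low, high = mean - 2 * sd, mean + 2 * sd
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if low <= k <= high
    )

print(f"Normal, within 2 sd: {normal_within(2):.4f}")
for n, p in ((100, 0.5), (100, 0.3), (10, 0.1)):
    print(f"Binomial n = {n}, p = {p}, within 2 sd: {binomial_within_2_sd(n, p):.4f}")
```

Because the binomial count takes only whole-number values, its probability will not match the normal figure exactly, even when the rule of thumb is satisfied.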
Example: A drug is undergoing re-evaluation by Health Canada for effectiveness as
an anti-fungal treatment. The manufacturer claims the drug is effective 60% of the time
in killing off the fungus. Health Canada tracks 100 patients who are treated. Sketch
the probability distribution for the number of patients who are cured by the treatment,
assuming the manufacturer’s claims are true.
[Blank axis for the sketch: number of patients cured, from 0 to 100.]
In this trial, only 53% of the patients are cured by the drug. Comment on how much you
can trust the claimed 60% cure rate.
Doubts have been raised in other countries about the same drug, so a larger trial is
commissioned with 1,000 patients. Sketch the probability distribution for the number of
patients cured in this trial, assuming again a 60% curative probability for each patient.
[Blank axis for the sketch: number of patients cured, from 0 to 1000.]
Again, only 53% of the patients are cured by the drug. Is your conclusion the same as in
the last example or different, and why?
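One possible way to compare the two trials is to apply the ±2 standard deviation rule to each, assuming the claimed 60% cure rate and the normal approximation. The sketch below is an illustration rather than a formal hypothesis test; two_sd_check is a hypothetical helper, and 53% is converted to 53 and 530 cured patients.

```python
from math import sqrt

CLAIMED_CURE_RATE = 0.6  # manufacturer's claim from the example

def two_sd_check(n_patients, observed_cured, p=CLAIMED_CURE_RATE):
    """Compare an observed count with the mean +/- 2 sd range expected under the claim."""
    mean = n_patients * p
    sd = sqrt(n_patients * p * (1 - p))
    low, high = mean - 2 * sd, mean + 2 * sd
    return {
        "mean": mean,
        "sd": sd,
        "low": low,
        "high": high,
        "z": (observed_cured - mean) / sd,  # distance from the mean in sd units
        "within_2_sd": low <= observed_cured <= high,
    }

for n, cured in ((100, 53), (1000, 530)):
    r = two_sd_check(n, cured)
    print(f"n = {n}: expected {r['mean']:.0f} +/- 2 x {r['sd']:.2f} "
          f"= [{r['low']:.1f}, {r['high']:.1f}], observed {cured}, "
          f"z = {r['z']:.2f}, within 2 sd: {r['within_2_sd']}")
```

The standard deviation grows only like the square root of n, so the same 53% observed rate sits much further from the claimed mean, in standard-deviation units, in the larger trial.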