WEEK #23: Statistics for Spread; Binomial Distribution
Goals:
• Study measures of spread, such as the interquartile range, variance, and standard deviation.
• Introduce standard distributions, including the binomial distribution.
Textbook reading for Week #23: Study Adler Sections 6.9 and 7.4
Standard Distributions
We have seen distributions, both discrete and continuous, that originate with data
from experiments, or with formulas we made up to fit an example.
Many common and important classes of experiments have well-understood distributions. We will start with the binomial distribution.
Coin Flipping
The canonical example for the binomial distribution is a sequence of coin flips.
Compute the probability distribution for the number of heads flipped, if we flip
a coin three times. The coin is loaded, so the probability of a heads each time
is only 0.4 (tails probability is 0.6).
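One way to check a hand calculation here is to enumerate all eight head/tail sequences and add up the probabilities of the sequences that share the same number of heads. The sketch below does this in Python; the names `p_heads` and `dist` are illustrative choices, not part of the course notes.

```python
from itertools import product

p_heads = 0.4  # loaded coin: P(heads) = 0.4, P(tails) = 0.6
n_flips = 3

# dist[k] accumulates the probability of seeing exactly k heads
dist = [0.0] * (n_flips + 1)

for sequence in product("HT", repeat=n_flips):
    prob = 1.0
    for flip in sequence:
        prob *= p_heads if flip == "H" else 1 - p_heads
    dist[sequence.count("H")] += prob

for k, prob in enumerate(dist):
    print(f"P({k} heads) = {prob:.3f}")
```

The four probabilities should add up to 1, which is a quick check on any table built by hand.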
The Binomial Distribution
Consider the scenario of repeatedly running a two-outcome experiment (sometimes called a “Bernoulli trial”), e.g.
• coin flipping, where each outcome is either heads or tails, or
• selecting alleles from parents, where each outcome is either B or b.
We arbitrarily label one of the outcomes as a “success” (e.g. heads, B allele).
If the probability of a successful outcome in each experiment is p, then what
is the probability of getting exactly k successes in n trials?
We note that there is an essential counting step in computing these probabilities:
“How many ways can there be k successful outcomes out of n trials?”
Fortunately, this is a well-known question, with a well-understood solution, and
the calculation is built into most scientific calculators:
Number of ways to choose k successes in n tries = “n choose k” =
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
On a Casio 991 model calculator, you can compute this with the (<SHIFT> ÷) or
nCr button.
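For readers working on a computer rather than a calculator, the same count is available in Python as `math.comb` (Python 3.8 or later). The short sketch below compares it with the factorial formula; the values n = 10 and k = 3 are picked only for illustration.

```python
from math import comb, factorial

n, k = 10, 3

# Built-in binomial coefficient
ways_builtin = comb(n, k)

# The same count from the factorial formula n! / (k! (n - k)!)
ways_formula = factorial(n) // (factorial(k) * factorial(n - k))

print(ways_builtin, ways_formula)
```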
Use the binomial distribution to find the probability that you will roll exactly
3 ones while rolling a fair six-sided die 10 times.
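A possible numerical check for this exercise, treating a rolled one as a "success" with p = 1/6 and counting the ways to place 3 successes among 10 rolls. The variable names are illustrative.

```python
from math import comb

n = 10      # number of rolls
k = 3       # number of ones wanted
p = 1 / 6   # probability of a one on a fair six-sided die

# (ways to choose which rolls are ones) * P(those are ones) * P(the rest are not)
prob_three_ones = comb(n, k) * p**k * (1 - p)**(n - k)

print(f"P(exactly 3 ones in 10 rolls) = {prob_three_ones:.4f}")
```

The result is roughly 0.155, so exactly three ones turn up in a little under one sixth of such 10-roll experiments.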
Consider a multi-genic phenotype, for which the visible effect of the genotype
depends on the number of B copies a plant has in total over 12 different loci
(24 possible B copies in total). Find the probability that a new offspring has
a total of 16 B alleles out of the possible 24, given that B alleles are
distributed with p = 0.5 in the population at all loci.
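A sketch of the same calculation in Python, under the assumption (implicit in the question, but worth stating) that each of the 24 allele copies is an independent trial with probability 0.5 of being B:

```python
from math import comb

n = 24    # possible B copies: 2 at each of 12 loci
k = 16    # number of B alleles asked about
p = 0.5   # probability that any one copy is B (assumed independent)

prob_16_B = comb(n, k) * p**k * (1 - p)**(n - k)

print(f"ways to place 16 B alleles among 24 copies: {comb(n, k)}")
print(f"P(exactly 16 B alleles) = {prob_16_B:.4f}")
```

With p = 0.5 every arrangement of B and b is equally likely, so the answer is just the number of arrangements, 735471, divided by 2^24; this comes to about 0.044.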
We can now formalize these calculations by defining the binomial distribution.
Binomial Distribution
If a trial has two outcomes, and each trial is independent, the probability of k
successful outcomes in n trials is given by
$$b(k; n, p) = \binom{n}{k}\, p^k (1 - p)^{n-k}$$
where p is the probability of success in each trial.
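The definition translates almost directly into code. The function name `binomial_pmf` below is a hypothetical helper for illustration; it is not something defined in the course materials.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return b(k; n, p): the probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity check: the probabilities over k = 0, 1, ..., n should add up to 1
n, p = 10, 0.3
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(f"sum over all k = {total:.6f}")
```

For example, `binomial_pmf(1, 3, 0.4)` gives the probability of exactly one head in three flips of a coin with a 0.4 chance of heads.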
Properties of the Binomial Distribution
If you flip 100 coins, intuitively how many do you expect to come up heads?
If the coins were loaded, so that the probability of a heads was only 0.1 instead
of 0.5, intuitively how many heads would you expect out of 100 tosses?
Mean of the Binomial Distribution
The mean of a binomial distribution, b(k; n, p), is given by
$$E(b) = \bar{b}(k; n, p) = n \cdot p$$
Variance of the Binomial Distribution
$$\text{variance} = \sigma^2 = \mathrm{Var}(b(k; n, p)) = np(1 - p)$$
Standard Deviation of the Binomial Distribution
$$\text{Standard dev.} = \sigma = \mathrm{Std.\ Dev}(b(k; n, p)) = \sqrt{np(1 - p)}$$
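These formulas can be checked numerically by computing the mean and variance straight from their definitions, as probability-weighted sums over k. The sketch below uses the loaded-coin case n = 100, p = 0.1 as an illustration; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.1
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance from their definitions as weighted sums
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

print(f"mean from sum     = {mean:.4f},  n*p         = {n * p:.4f}")
print(f"variance from sum = {variance:.4f},  n*p*(1 - p) = {n * p * (1 - p):.4f}")
print(f"standard deviation = {sqrt(variance):.4f}")
```

Both pairs of numbers should agree up to floating-point rounding.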
Histograms of the Binomial Distribution
Knowing the binomial distribution function, it is straightforward to compute the
probability of each number of successes, and so to draw a graph of the entire
distribution.
What is the range of the number of successes in n trials? How many possible
outcomes does that entail?
Consider the distribution of the number of heads turning up in 10 flips of a
fair coin. Sketch the distribution you would expect for the total number of
heads out of 10.
[Blank axes for sketch: number of heads, 0 to 10]
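To compare a sketch against the exact values, one option is a rough text histogram. The helper `text_histogram` and its `width` parameter are hypothetical names introduced only for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def text_histogram(n, p, width=50):
    """Return one line per k, with a bar of '#' proportional to b(k; n, p)."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    tallest = max(probs)
    lines = []
    for k, pk in enumerate(probs):
        bar = "#" * round(width * pk / tallest)
        lines.append(f"{k:3d} | {bar} {pk:.3f}")
    return lines

fair_lines = text_histogram(10, 0.5)
print("\n".join(fair_lines))
```

For the fair coin there are 11 possible outcomes (0 through 10 heads), and the bars should be symmetric about 5.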
Consider the distribution of the number of heads turning up in 10 flips of a
loaded coin, where heads have only a 0.1 probability each flip. Sketch the
distribution you would expect for the total number of heads out of 10.
[Blank axes for sketch: number of heads, 0 to 10]
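A quick way to test a sketch of the loaded-coin case is to compute the exact probabilities and look at where the peak sits and how fast the right-hand tail dies off. The names below are illustrative.

```python
from math import comb

n, p = 10, 0.1  # 10 flips, heads with probability 0.1

probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# The most likely number of heads, and the total probability of 4 or more
most_likely = max(range(n + 1), key=lambda k: probs[k])
prob_4_or_more = sum(probs[4:])

for k, pk in enumerate(probs):
    print(f"P({k} heads) = {pk:.4f}")
print(f"most likely count: {most_likely}")
print(f"P(4 or more heads) = {prob_4_or_more:.4f}")
```

Unlike the fair coin, this distribution is strongly skewed: almost all of the probability sits on 0, 1, 2, and 3 heads.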
Here are some other histograms of the binomial distribution, as we change the
probability of each individual success (p), and the number of trials (n).
[Figure: histograms of the binomial distribution for p = 0.1, 0.3, and 0.5 with n = 10 (horizontal axis 0 to 10), each paired with the corresponding n = 20 histogram below it (horizontal axis 0 to 20).]
[Figure: histograms of the binomial distribution for p = 0.1, 0.3, and 0.5 with n = 50 (horizontal axis 0 to 50), each paired with the corresponding n = 100 histogram below it (horizontal axis 0 to 100).]
Comment on the patterns you see in the distributions.
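One way to put numbers on the patterns is to tabulate the mean np and standard deviation √(np(1 − p)) for the same combinations of n and p shown in the histograms. The last column, the standard deviation as a fraction of n, is an extra quantity added here for illustration.

```python
from math import sqrt

rows = []
for n in (10, 20, 50, 100):
    for p in (0.1, 0.3, 0.5):
        mean = n * p
        sd = sqrt(n * p * (1 - p))
        rows.append((n, p, mean, sd, sd / n))

print("  n    p    mean     sd   sd/n")
for n, p, mean, sd, rel in rows:
    print(f"{n:3d}  {p:.1f}  {mean:6.1f}  {sd:5.2f}  {rel:.3f}")
```

The peak moves to the right as p grows, and although the standard deviation grows with n, it shrinks relative to the full range 0 to n, so the histograms look narrower on their own axes.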
Normal Approximation to the Binomial Distribution
From the histograms of the binomial distribution, it seems that for large n values
the binomial distribution starts to look a lot like a normal, gaussian, or bell-curve
distribution.
One commonly referenced rule of thumb is:
A binomial distribution will be approximately normal in shape if
both np and n(1 − p) are above 10.
Relate this observation back to the previous histograms.
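The rule of thumb can be applied mechanically to the histogram parameters. The sketch below reads "above 10" strictly; the function name `roughly_normal` and its `threshold` parameter are hypothetical.

```python
from fractions import Fraction

def roughly_normal(n, p, threshold=10):
    """Rule of thumb: both n*p and n*(1 - p) are above the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# Exact fractions avoid floating-point surprises when n*p lands exactly on 10
p_values = (Fraction(1, 10), Fraction(3, 10), Fraction(1, 2))

passing = []
for n in (10, 20, 50, 100):
    for p in p_values:
        ok = roughly_normal(n, p)
        if ok:
            passing.append((n, p))
        verdict = "yes" if ok else "no"
        print(f"n = {n:3d}, p = {float(p):.1f}: "
              f"np = {float(n * p):5.1f}, n(1-p) = {float(n * (1 - p)):5.1f} -> {verdict}")
```

Under this strict reading, only the p = 0.3 and p = 0.5 cases with n = 50 or n = 100 pass. The cases n = 20, p = 0.5 and n = 100, p = 0.1 sit exactly on the boundary (np = 10), which is a reminder that the rule is a guideline rather than a sharp cutoff.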
Beyond its mathematical interest (why does the binomial look like the normal
distribution?), we can take advantage of well-understood properties of the normal
distribution in analyzing binomial data.
Theorem: For a normal distribution, the probability of an outcome within ±2
standard deviations of the mean is 95% (rounded).
[Figure: standard normal density curve; horizontal axis from −4 to 4 standard deviations, vertical axis from 0 to 0.4.]
Under the assumption that some binomial distributions approximate the normal distribution, express this theorem as it applies to binomial distributions.
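For a binomial distribution that is close to normal, the theorem suggests that roughly 95% of the probability lies between np − 2√(np(1 − p)) and np + 2√(np(1 − p)). The sketch below tests this by summing exact binomial probabilities over that interval; the function name `prob_within_two_sd` and the test case n = 100, p = 0.5 are illustrative choices.

```python
from math import ceil, comb, floor, sqrt

def prob_within_two_sd(n, p):
    """Exact binomial probability of landing within 2 standard deviations of the mean."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    low = max(0, ceil(mean - 2 * sd))    # smallest whole count inside the interval
    high = min(n, floor(mean + 2 * sd))  # largest whole count inside the interval
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(low, high + 1))

inside = prob_within_two_sd(100, 0.5)
print(f"P(within 2 sd of the mean) = {inside:.4f}")
```

Because the binomial only takes whole-number values, the exact figure will not be precisely 95%, but for this case it should land close to it.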
Example: A drug is undergoing re-evaluation by Health Canada for effectiveness as an anti-fungal treatment. The manufacturer claims the drug is
effective 60% of the time in killing off the fungus. Health Canada tracks 100
patients who are treated. Sketch the probability distribution for the number of
patients who are cured by the treatment, assuming the manufacturer’s claims
are true.
[Blank axes for sketch: number of patients cured, 0 to 100]
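Before sketching, it may help to compute where the curve is centred and how wide it is. The snippet below assumes the manufacturer's claim (p = 0.6) and treats each of the 100 patients as an independent trial.

```python
from math import sqrt

n, p = 100, 0.6  # 100 patients, claimed cure probability 0.6

mean = n * p
sd = sqrt(n * p * (1 - p))
low, high = mean - 2 * sd, mean + 2 * sd

print(f"np = {n * p:.0f}, n(1 - p) = {n * (1 - p):.0f}")
print(f"mean = {mean:.1f}, standard deviation = {sd:.2f}")
print(f"about 95% of such trials should give between {low:.1f} and {high:.1f} cures")
```

Both np and n(1 − p) are above 10, so a bell-shaped sketch centred at 60 with a standard deviation of about 4.9 is a reasonable picture.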
In this trial, only 53% of the patients are cured by the drug. Comment on
how much you can trust the claimed 60% cure rate.
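One way to quantify "how much you can trust" the claim is to ask how unusual 53 cures would be if p really were 0.6. The sketch below computes how many standard deviations 53 lies from the expected 60, together with the exact binomial probability of 53 or fewer cures; the variable names are illustrative.

```python
from math import comb, sqrt

n, p, observed = 100, 0.6, 53

mean = n * p
sd = sqrt(n * p * (1 - p))
z = (observed - mean) / sd  # distance from the mean, in standard deviations

# Exact probability of seeing this few cures or fewer if the claim is true
tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(observed + 1))

print(f"z = {z:.2f}")
print(f"P(53 or fewer cures | p = 0.6) = {tail:.3f}")
```

The observed count is about 1.4 standard deviations below the mean, inside the ±2 standard deviation band, so on its own this trial is not strong evidence against the 60% claim.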
Doubts have been raised in other countries about the same drug, so a larger
trial is commissioned with 1,000 patients. Sketch the probability distribution
for the number of patients cured in this trial, assuming again a 60% curative
probability for each patient.
[Blank axes for sketch: number of patients cured, 0 to 1000]
Again, only 53% of the patients are cured by the drug. Is your conclusion the
same or different than in the last example, and why?
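A side-by-side calculation for the two trials makes the contrast concrete. The sketch below assumes the claimed p = 0.6 in both cases and reports, for each trial, the distance of the observed count from the mean in standard deviations and the exact probability of a result at least that low. The helper names `z_score` and `lower_tail` are hypothetical.

```python
from math import comb, sqrt

def z_score(n, p, observed):
    """Distance of the observed count from the mean n*p, in standard deviations."""
    return (observed - n * p) / sqrt(n * p * (1 - p))

def lower_tail(n, p, observed):
    """Exact binomial probability of `observed` or fewer successes."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(observed + 1))

p = 0.6
results = {}
for n, observed in ((100, 53), (1000, 530)):
    results[n] = (z_score(n, p, observed), lower_tail(n, p, observed))
    z, tail = results[n]
    print(f"n = {n:4d}: observed {observed}, z = {z:.2f}, P(this few or fewer) = {tail:.2e}")
```

The cure rate is 53% both times, but the standard deviation grows only like √n while the shortfall grows like n. In the larger trial the result is about 4.5 standard deviations below the mean, far outside the ±2 band, which makes the 60% claim much harder to believe.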