Class 18
Homework 8, Due March 25 2010
Chapter 6 (pages 299–301): 6.36, 6.38, 6.42, 6.44
$$P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}.$$
This formula is known as the binomial distribution. The essential
conditions for a binomial distribution are:
1. Each experiment has exactly two possible
outcomes, which we label “success” and “failure” though
they could have many different interpretations (alive or dead,
rain or no rain, flight arrives on time or not, etc.). Data
having this form are called binary data.
2. The experiment is repeated a number of times (n) and the
different experiments are independent.
3. The probability of “success” is some number p (between 0
and 1) and is the same for all the experiments.
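As a minimal sketch, the formula above can be evaluated directly in Python. The function name `binomial_pmf` and the coin-toss values are chosen here for illustration and are not from the text; `math.comb(n, x)` computes n!/(x!(n − x)!).

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Illustrative values (an assumption, not from the text): 3 tosses of a fair coin.
for x in range(4):
    print(x, binomial_pmf(x, 3, 0.5))
```

The probabilities over x = 0, ..., n should always add up to 1, which is a quick sanity check on any hand calculation.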
Here is another example.
A tennis player gets her first serve in 70% of the time. Assume
each serve is independent of all the others. She serves the ball
six times. What is the probability that she gets
(a) All 6 serves in?
(b) Exactly 4 serves in?
(c) At least 4 serves in?
(d) No more than 4 serves in?
Solution:
(a) $(0.7)^6 = .118$.
(b) $\frac{6!}{4!\,2!} \times (0.7)^4 \times (0.3)^2 = \frac{6\times5\times4\times3\times2\times1}{4\times3\times2\times1\times2\times1} \times (0.7)^4 \times (0.3)^2 = 15 \times .2401 \times .09 = .324$.
(c) The probability of exactly 5 is $\frac{6!}{5!\,1!} \times (0.7)^5 \times 0.3 = 6 \times 0.16807 \times 0.3 = .303$. So the probability of at least 4 is $.324 + .303 + .118 = .745$.
(d) The probability of at least 5 is $.303 + .118 = .421$, so the probability of no more than 4 is $1 - .421 = .579$.
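The four answers can be checked with a short Python sketch. The helper `binomial_pmf` and the variable names are illustrative choices, not part of the text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 6, 0.7  # six serves, each in with probability 0.7

all_six = binomial_pmf(6, n, p)                                        # part (a)
exactly_four = binomial_pmf(4, n, p)                                   # part (b)
at_least_four = sum(binomial_pmf(x, n, p) for x in range(4, n + 1))    # part (c)
no_more_than_four = sum(binomial_pmf(x, n, p) for x in range(0, 5))    # part (d)

print(f"(a) {all_six:.3f}")
print(f"(b) {exactly_four:.3f}")
print(f"(c) {at_least_four:.3f}")
print(f"(d) {no_more_than_four:.3f}")
```

Because the code carries full precision rather than adding figures already rounded to three decimals, parts (c) and (d) may differ from the hand calculation by one unit in the third decimal place.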
The Normal approximation
Although the binomial formula is faster to calculate than trying
to count all possibilities, it would still be hard to use for large
samples, say n = 100. In this case, we use an alternative approach based on approximating the binomial distribution by a
normal distribution.
Suppose we have a binomial distribution with n trials and probability of success p, and n is some large number (say, 100). In
this situation, we are not usually interested in the exact number
of successes, but in the probability that the number will be more
or less than some given number.
Example 1 (from text): In a certain week in 1997, the police at
a certain location in Philadelphia made 262 car stops. Of these,
207 drivers were African American. Among the whole population
of Philadelphia, 42.2% are African American. Does this prove the
police were guilty of “racial profiling”, i.e. deliberately stopping
drivers because they were African Americans?
Assuming the traffic stops are independent and the proportion
of African Americans driving at this particular location is the
same as the proportion in the whole city, this corresponds to the
random variable X (number of African Americans among those
stopped) having a binomial distribution with n = 262, p = 0.422.
The question is, what is the probability that X ≥ 207 if the
binomial distribution is correct?
Note: In this case it wouldn’t make sense to try to calculate the
probability that X is exactly 207. What we’re really concerned
about is that the number is so large, so a natural question is
“what is the probability that the number would have been as
large as this by chance?” That leads us to consider X ≥ 207
rather than X = 207.
A Key Formula (page 297)
The binomial distribution for n trials with probability p of success
on each trial has mean µ and standard deviation σ given by
$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}.$$
The solution proceeds by several steps:
Step 1: Calculate the mean of X. This is given by the formula
µ = np = 262 × 0.422 = 110.6.
Step 2: Calculate the standard deviation of X. This is given by
the formula
$$\sigma = \sqrt{np(1-p)} = \sqrt{262 \times 0.422 \times 0.578} = 7.99.$$
Step 3: Convert the given x value (207) to z. So
$$z = \frac{x - \mu}{\sigma} = \frac{207 - 110.6}{7.99} = 12.07.$$
Step 4: Calculate the probability associated with this z value.
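Steps 1 to 3 can be sketched in a few lines of Python. The variable names are illustrative; the numbers are those of the car-stop example.

```python
from math import sqrt

n, p, x = 262, 0.422, 207   # trials, success probability, observed count

mu = n * p                      # Step 1: mean
sigma = sqrt(n * p * (1 - p))   # Step 2: standard deviation
z = (x - mu) / sigma            # Step 3: z-score of the observed count

print(f"mu = {mu:.1f}, sigma = {sigma:.2f}, z = {z:.2f}")
```

The code does not round µ and σ before dividing, so its z may differ from the hand-computed value in the second decimal place. At a z-score this large the difference has no practical effect.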
The only problem with step 4 is: the number’s off the chart!
The regular table only goes up to 3.49. In fact, if you look at
the little table in the bottom corner of page A2, you can see
some further numbers:
z      Probability
3.5    .999767
4.0    .9999683
4.5    .9999966
5.0    .999999713
Even at z = 5, the probability to the left of z (i.e. less than 5)
is more than .999999, which means that the probability to the
right of z is less than .000001. Replace z = 5 by z = 12, and
the probability to the right is far smaller still.
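When a printed table runs out, the normal tail can be computed directly. The sketch below uses the complementary error function from Python's standard library; the function name `upper_tail` is an illustrative choice. Computing the right-hand tail directly avoids the loss of precision that comes from subtracting a number very close to 1 from 1.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

for z in (3.5, 4.0, 4.5, 5.0, 12.07):
    right = upper_tail(z)
    print(f"z = {z:5.2f}   left = {1 - right:.9f}   right = {right:.3e}")
```

The left-hand probabilities for z = 3.5 to 5.0 should agree with the small table above, and the right-hand probability for z = 12.07 shows just how far off the chart the car-stop result is.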
Conclusion. The probability that we could have got this result
(207 African Americans out of 262) by chance is so small that it
is effectively 0. This seems to be completely convincing evidence
that the police were engaging in the practice of racial profiling.
However, there are other possible explanations — for example,
perhaps the proportion of African Americans driving past this
particular checkpoint was much greater than 42.2%.
Further Discussion. It is possible to compute the exact probability that X ≥ 207 in this example: the answer is $4.9 \times 10^{-34}$. To
give an idea of how small a probability that is, it is roughly equivalent to the probability that your favorite baseball team wins the
World Series 23 times in succession! [On the assumption that
there are 30 Major League Baseball teams, that any one of them
is equally likely to win in a given year, and that results from year
to year are independent, this probability is $(1/30)^{23} = 1.1 \times 10^{-34}$.]
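One way to carry out the exact calculation is to add up the binomial probabilities for x = 207, 208, ..., 262. The sketch below does this in plain Python; it is not necessarily how the quoted figure was obtained, and the variable names are illustrative.

```python
from math import comb

n, p = 262, 0.422

# Exact P(X >= 207): sum the binomial formula over every count from 207 to n.
exact_tail = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(207, n + 1))

# Probability of one team winning 23 titles in a row, assuming 30 equally
# likely teams and independent seasons (the assumptions stated in the text).
world_series = (1 / 30) ** 23

print(f"P(X >= 207) = {exact_tail:.2e}")
print(f"(1/30)^23   = {world_series:.2e}")
```

Both numbers are of order 10^-34, which is the sense in which the two events are "roughly equivalent".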
Example 2:
Consider our earlier example about the tennis player who gets in
70% of her serves. In a whole match she serves 80 times. What
is the probability she makes at least 65 of these?
Solution:
First calculate µ and σ:
$$\mu = 80 \times 0.7 = 56, \qquad \sigma = \sqrt{80 \times 0.7 \times 0.3} = 4.1.$$
Then for x = 65, we have $z = \frac{65 - 56}{4.1} = 2.20$. Look up 2.20 in the
normal table: the corresponding left-hand probability is .9861.
So the answer is 1 − 0.9861 = 0.0139. In other words, it would
be very unusual if she actually achieved this in a match, though
it would be nothing like as “surprising” as our racial profiling
example!
[Again it is possible to use a computer to calculate the exact
probability. In this case it comes to .0161, compared with the
above approximate answer of .0139. This gives an idea how
accurate the normal approximation is. It’s not perfect, but it’s
good enough for most practical calculations.]
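The comparison between the approximate and exact answers can be reproduced with a short Python sketch. The variable names are illustrative, and the normal tail is computed with `math.erfc` instead of a printed table.

```python
from math import comb, erfc, sqrt

n, p, x = 80, 0.7, 65   # serves in the match, P(serve in), target count

# Normal approximation: convert x to a z-score and take the right-hand tail.
mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (x - mu) / sigma
approx = 0.5 * erfc(z / sqrt(2))

# Exact answer: sum the binomial formula over 65, 66, ..., 80.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

print(f"z = {z:.2f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The approximate value may differ slightly from the table-based answer because z is not rounded to two decimals before the tail is computed.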
Guidelines for the normal approximation to the binomial distribution (see sidebar p. 299): the binomial distribution can be
approximated well by a normal distribution when the expected
number of successes, np, and the expected number of failures, n(1 − p),
are both at least 15.
In the racial profiling example, n = 262, p = 0.422 so np =
110.6, n(1 − p) = 151.4. In the tennis example, n = 80, p = 0.7
so np = 56, n(1 − p) = 24. In both cases, both numbers are at least
15 so the condition is satisfied.
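The guideline is easy to turn into a small check. The function name `normal_approx_ok` and its `threshold` parameter are illustrative assumptions; the value 15 is the one given in the guideline.

```python
def normal_approx_ok(n, p, threshold=15):
    """Return True when both the expected number of successes (np) and the
    expected number of failures (n(1 - p)) are at least `threshold`."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

print(normal_approx_ok(262, 0.422))  # car-stop example
print(normal_approx_ok(80, 0.7))     # tennis match with 80 serves
print(normal_approx_ok(6, 0.7))      # six serves only: np = 4.2
```

The last call illustrates a case where the condition fails, which is why the six-serve problem was solved with the binomial formula itself.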
Sampling Distributions
Example: An ABC News/Washington Post opinion poll published
February 23 2009 stated that President Obama had an approval
rating of 68% (among all voters — the ratings were sharply
different among Democrats, Independents and Republicans).
This is based on a sample of 1001 voters. The margin of error
is described as plus or minus 3%.
What exactly does this mean?
Let’s focus on the proportion of people who supported the President. In this case 68% is a statistic — the number calculated
from the sample. The true proportion in the population is an
unknown value p.
Collecting a sample is essentially a binomial experiment, with n =
1001. However most opinion polls are reported as the percentage
or proportion of people who vote a certain way, rather than the
total number. Therefore, our interest is in the sample proportion.
If X is the number of people who support Obama in the poll,
then the sample proportion is X/n (so in this example, X was
about 681, which would lead to X/n = 681/1001 = 0.68 to two
decimal places).
For a sample proportion we have (see sidebar, page 313):
$$\text{Mean} = p, \qquad \text{Standard Deviation} = \sqrt{\frac{p(1-p)}{n}}.$$
So assuming p = 0.68, in this case we get a mean of 0.68 and a
standard deviation of
$$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.68 \times 0.32}{1001}} = 0.0147.$$
Also the normal distribution applies (because again np > 15, n(1−
p) > 15), so we can assume the sampling distribution is approximately normal.
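The calculation for the poll can be sketched in Python as follows. The function name `sample_proportion_sd` is an illustrative choice, and p = 0.68 is the assumed true proportion used in the text.

```python
from math import sqrt

def sample_proportion_sd(p, n):
    """Standard deviation of the sample proportion X/n for a sample of size n."""
    return sqrt(p * (1 - p) / n)

p, n = 0.68, 1001
sd = sample_proportion_sd(p, n)

print(f"mean of sample proportion = {p}")
print(f"standard deviation        = {sd:.4f}")
print(f"np = {n * p:.1f}, n(1 - p) = {n * (1 - p):.1f}")  # both well above 15
```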
Conclusion: If the true value of p = 0.68, then in repeated samples of size 1001, the sampling distribution will be approximately
normal with a mean of 0.68 and a standard deviation of 0.0147.
In particular, approximately 95% of all polls will result in a sample
proportion within 0.0294 (2 standard deviations, or 3 percentage
points) of the mean, and approximately 99.7% of all polls will result in a sample proportion within 0.0441 (3 standard deviations,
or 4.4 percentage points) of the mean.
Because the quantity $\sqrt{p(1-p)/n}$ plays a special role in this calculation, it is given a special name: the standard error.
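The "repeated samples" idea can be illustrated with a small simulation, offered here as a sketch and not a proof. It assumes the true proportion is exactly 0.68 and that each of the 1001 respondents answers independently. The number of simulated polls (1000) and the random seed are arbitrary choices made for this illustration.

```python
import random
from math import sqrt

rng = random.Random(0)          # fixed seed so the run is repeatable
p, n, polls = 0.68, 1001, 1000  # assumed true proportion, poll size, number of polls
se = sqrt(p * (1 - p) / n)      # standard deviation of the sample proportion

def simulate_poll():
    """Simulate one poll and return its sample proportion X/n."""
    supporters = sum(rng.random() < p for _ in range(n))
    return supporters / n

proportions = [simulate_poll() for _ in range(polls)]

within_2 = sum(abs(ph - p) <= 2 * se for ph in proportions) / polls
within_3 = sum(abs(ph - p) <= 3 * se for ph in proportions) / polls

print(f"within 2 standard deviations ({2 * se:.4f}): {within_2:.3f}")
print(f"within 3 standard deviations ({3 * se:.4f}): {within_3:.3f}")
```

The two fractions should come out close to 95% and 99.7%, though any single run will vary a little from those figures.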