Sampling Distribution
of a Sample Proportion
Lecture 27
Sections 8.1 – 8.2
Wed, Oct 25, 2006
Preview of the Central Limit
Theorem



We looked at the distribution of the sum of 1, 2,
and 3 uniform random variables U(0, 1).
We saw that the shapes of their distributions
were moving towards the shape of the normal
distribution.
If we replace “sum” with “average,” we will
obtain the same phenomenon, but on the scale
from 0 to 1 each time.
[Figures: the distributions of the average of 1, 2, and 3 U(0, 1) random variables, each drawn on the scale from 0 to 1.]
Preview of the Central Limit
Theorem

Some observations:
Each distribution is centered at the same place, ½.
 The distributions are being “drawn in” towards the
center.
 That means that their standard deviation is
decreasing.


Can we quantify this?
Preview of the Central Limit
Theorem
[Figures: the three distributions on the scale from 0 to 1, each labeled with its mean and variance.]
Average of 1 observation: μ = ½, σ² = 1/12
Average of 2 observations: μ = ½, σ² = 1/24
Average of 3 observations: μ = ½, σ² = 1/36
Preview of the Central Limit
Theorem

This tells us that a mean based on three
observations is much more likely to be close to
the population mean than is a mean based on
only one or two observations.
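The shrinking variance can be checked with a short simulation. The following is a minimal Python sketch, not part of the original lecture; the function name, the number of trials, and the seed are illustrative assumptions. It averages n draws from U(0, 1) many times and compares the observed variance with 1/(12n).

```python
import random

def simulate_average_variance(n, trials=50_000, seed=0):
    """Estimate the variance of the average of n U(0, 1) draws by simulation."""
    rng = random.Random(seed)
    averages = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    mean = sum(averages) / trials
    # Population-style variance of the simulated averages.
    return sum((a - mean) ** 2 for a in averages) / trials

for n in (1, 2, 3):
    observed = simulate_average_variance(n)
    expected = 1 / (12 * n)  # 1/12, 1/24, 1/36
    print(f"n = {n}: simulated variance = {observed:.4f}, 1/(12n) = {expected:.4f}")
```

The simulated values should land close to 1/12, 1/24, and 1/36, though they will not match exactly because they come from random draws.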
Parameters and Statistics

THE PURPOSE OF A STATISTIC IS TO
ESTIMATE A POPULATION PARAMETER.
A sample mean is used to estimate the population
mean.
 A sample proportion is used to estimate the
population proportion.



Sample statistics, by their very nature, are
variable.
Population parameters are fixed.
Some Questions



We hope that the sample proportion is close to
the population proportion.
How close can we expect it to be?
Would it be worth it to collect a larger sample?
If the sample were larger, would we expect the
sample proportion to be closer to the population
proportion?
 How much closer?

The Sampling Distribution of a
Statistic

Sampling Distribution of a Statistic – The
distribution of values of the statistic over all
possible samples of size n from a given population.
The Sample Proportion





Let p be the population proportion.
Then p is a fixed value (for a given population).
Let p^ (“p-hat”) be the sample proportion.
Then p^ is a random variable; it takes on a new
value every time a sample is collected.
The sampling distribution of p^ is the
probability distribution of all the possible values
of p^.
Example



Suppose that this class is 3/4 freshmen.
Suppose that we take a sample of 1 student.
Find the sampling distribution of p^.
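Example
Tree diagram for one student (F = freshman, N = not a freshman):
F, with probability 3/4: P(F) = 3/4
N, with probability 1/4: P(N) = 1/4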
Example


Let X be the number of freshmen in the sample.
The probability distribution of X is
x       P(x)
0       1/4
1       3/4
Example


Let p^ be the proportion of freshmen in the
sample. (p^ = X/n.)
The sampling distribution of p^ is
x       P(p^ = x)
0       1/4
1       3/4
Example


Now we take a sample of 2 students, sampling
with replacement.
Find the sampling distribution of p^.
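Example
Tree diagram for two students (on each draw, F has probability 3/4 and N has probability 1/4):
FF: P(FF) = 3/4 × 3/4 = 9/16
FN: P(FN) = 3/4 × 1/4 = 3/16
NF: P(NF) = 1/4 × 3/4 = 3/16
NN: P(NN) = 1/4 × 1/4 = 1/16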
Example


Let X be the number of freshmen in the sample.
The probability distribution of X is
x       P(x)
0       1/16
1       6/16
2       9/16
Example


Let p^ be the proportion of freshmen in the
sample. (p^ = X/n.)
The sampling distribution of p^ is
x       P(p^ = x)
0       1/16
1/2     6/16
1       9/16
Samples of Size n = 3

If we sample 3 people (with replacement) from
a population that is 3/4 freshmen, then the
proportion of freshmen in the sample has the
following distribution.
x       P(p^ = x)
0       1/64 = .02
1/3     9/64 = .14
2/3     27/64 = .42
1       27/64 = .42
Samples of Size n = 4

If we sample 4 people (with replacement) from
a population that is 3/4 freshmen, then the
proportion of freshmen in the sample has the
following distribution.
x       P(p^ = x)
0       1/256 = .004
1/4     12/256 = .05
2/4     54/256 = .21
3/4     108/256 = .42
1       81/256 = .32
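The tables for n = 1 through 4 can be reproduced exactly, because the number of freshmen in a sample drawn with replacement follows a binomial distribution. The sketch below is an illustration rather than part of the lecture; the function name `sampling_distribution` is hypothetical. It uses exact fractions so the results can be compared with the slides.

```python
from fractions import Fraction
from math import comb

def sampling_distribution(n, p):
    """Return {value of p^: probability} for samples of size n drawn with replacement."""
    return {
        Fraction(x, n): comb(n, x) * p**x * (1 - p)**(n - x)
        for x in range(n + 1)
    }

p = Fraction(3, 4)  # proportion of freshmen in the class
for n in (1, 2, 3, 4):
    print(f"n = {n}")
    for value, prob in sampling_distribution(n, p).items():
        print(f"  P(p^ = {value}) = {prob} = {float(prob):.3f}")
```

Note that `Fraction` reduces to lowest terms, so 6/16 prints as 3/8 and 12/256 prints as 3/64; the values are the same as in the tables.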
The Parameters of the Sampling
Distributions

When n = 1, the sampling distribution is
p^      P(p^)
0       1/4
1       3/4
The mean and variance are
μ = 3/4 = 0.75
σ² = 3/16 = 0.1875

The Parameters of the Sampling
Distributions


When n = 2, the sampling distribution is
p^      P(p^)
0       1/16
1/2     6/16
1       9/16
The mean and variance are
μ = 3/4 = 0.75
σ² = 3/32 = 0.09375

The Parameters of the Sampling
Distributions


When n = 3, the sampling distribution is
p^      P(p^)
0       1/64 = .02
1/3     9/64 = .14
2/3     27/64 = .42
1       27/64 = .42
The mean and variance are
μ = 3/4 = 0.75
σ² = 3/48 = 0.0625

The Parameters of the Sampling
Distributions


When n = 4, the sampling distribution is
p^      P(p^)
0       1/256 = .004
1/4     12/256 = .05
2/4     54/256 = .21
3/4     108/256 = .42
1       81/256 = .32
The mean and variance are
μ = 3/4 = 0.75
σ² = 3/64 = 0.046875

Sampling Distributions


Run the program
Central Limit Theorem for Proportions.exe.
Use n = 30 and p = 0.75; generate 100
samples.
100 Samples of Size n = 30
μ = 0.75
σ = 0.079
Observations and Conclusions


Observation #1: The values of p^ are clustered
around p.
Conclusion #1: p^ is probably close to p.
Larger Sample Size



Now we will select 10000 samples of size 30
instead of only 100 samples.
Run the program
Central Limit Theorem for Proportions.exe.
Pay attention to the shape of the distribution.
10,000 Samples of Size n = 30
μ = 0.75
σ = 0.079
More Observations and Conclusions


Observation #2: The distribution of p^ appears
to be approximately normal.
Conclusion #2: We can use the normal
distribution to calculate just how close to p we
can expect p^ to be.
Larger Sample Size



Now we will increase the sample size from 30 to
200 (and still generate 10000 such samples).
Run the program
Central Limit Theorem for Proportions.exe.
Pay attention to the spread (standard deviation)
of the distribution.
10,000 Samples of Size n = 200
μ = 0.75
σ = 0.0306
Observations and Conclusions



Observation #3: As the sample size increases,
the clustering is tighter.
Conclusion #3-1: Larger samples give more
reliable estimates.
Conclusion #3-2: For sample sizes that are large
enough, we can make very good estimates of
the value of p.
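For readers without access to the program used in class, the experiment can be approximated with a few lines of Python. This is a rough stand-in under stated assumptions, not the program itself: the function name, the seed, and the use of Python's `random` module are all illustrative choices.

```python
import random
import statistics

def simulate_sample_proportions(n, p, num_samples, seed=0):
    """Draw num_samples samples of size n and return the sample proportions."""
    rng = random.Random(seed)
    # Each draw is a "success" (freshman) with probability p.
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

for n, num_samples in ((30, 100), (30, 10_000), (200, 10_000)):
    props = simulate_sample_proportions(n, 0.75, num_samples)
    print(f"n = {n}, samples = {num_samples}: "
          f"mean = {statistics.fmean(props):.4f}, sd = {statistics.pstdev(props):.4f}")
```

Generating more samples of the same size mainly smooths out the picture of the distribution; it is the larger sample size n that reduces the spread.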
One More Conclusion


However, we must know the values of μ and σ
for the distribution of p^.
That is, we have to quantify the sampling
distribution of p^.
The Sampling Distribution of p^

It turns out that the sampling distribution of p^
is approximately normal with the following
parameters.
Mean of p^ = p
Variance of p^ = p(1 − p)/n
Standard deviation of p^ = √(p(1 − p)/n)
This is the Central Limit Theorem for
Proportions, summarized on page 519.
The Sampling Distribution of p^
The approximation to the normal distribution is
excellent if
np ≥ 5 and n(1 − p) ≥ 5.
Why Surveys Work


Check out the latest poll results for the Virginia
Senate race between George Allen and James
Webb:
http://www.realclearpolitics.com/latestpolls/
If Webb really has 47% of the (decided) vote,
what is the probability that a survey of 625 likely
voters would show that he had only 43%?
Why Surveys Work

First, describe the sampling distribution of p^ if
the sample size is n = 625 and p = 0.47.
Check: np = 293.75 ≥ 5 and n(1 − p) = 331.25 ≥ 5.
p^ is approximately normal, with
Mean of p^ = 0.47
Standard deviation of p^ = √((0.47)(0.53)/625) = 0.01996
Why Surveys Work

The z-score of 0.43 is
z = (0.43 − 0.47)/0.01996 = −2.004


P(p^ < 0.43) = P(Z < -2.004)
= 0.0225 (not likely!)
Or use normalcdf(-E99, 0.43, 0.47, 0.01996).
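The same calculation can be done without a calculator. The sketch below uses Python's standard-library `statistics.NormalDist` in place of `normalcdf`; the variable names are illustrative, and the result relies on the normal approximation described above.

```python
from math import sqrt
from statistics import NormalDist

n, p = 625, 0.47
observed = 0.43

# Condition for the normal approximation: np >= 5 and n(1 - p) >= 5.
print("approximation condition met:", n * p >= 5 and n * (1 - p) >= 5)

sd = sqrt(p * (1 - p) / n)                        # standard deviation of p^
z = (observed - p) / sd                           # z-score of the observed 43%
prob = NormalDist(mu=p, sigma=sd).cdf(observed)   # P(p^ < 0.43)

print(f"sd = {sd:.5f}, z = {z:.3f}, P(p^ < 0.43) = {prob:.5f}")
```

The result should agree with the values on the slide up to rounding; small differences in the last digit come from rounding z before looking up the probability.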
Why Surveys Work




Perform the same calculation, but with a smaller
sample size, say n = 50.
The probability turns out to be 0.2855, nearly a
30% chance!
By symmetry, there is also nearly a 30% chance
that the sample proportion is greater than 51%.
Thus, there is nearly a 60% chance that the sample
proportion is off by at least 4 percentage points.
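To see how strongly the sample size matters, the two-sided chance of missing by at least 4 percentage points can be computed for both sample sizes. This is a sketch under the normal approximation; the function name `chance_off_by` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def chance_off_by(points, n, p=0.47):
    """Approximate P(|p^ - p| >= points) using the normal approximation."""
    sd = sqrt(p * (1 - p) / n)
    lower_tail = NormalDist(mu=p, sigma=sd).cdf(p - points)
    return 2 * lower_tail  # by symmetry, the upper tail has the same area

for n in (50, 625):
    print(f"n = {n}: chance of missing by 4 points or more = {chance_off_by(0.04, n):.4f}")
```

For n = 50 the result is roughly twice the one-sided probability quoted above, while for n = 625 it is only a few percent.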
The Margin of Error



For now, we can consider the margin of error to
be 2 standard deviations.
In our example, with sample size n = 625, the
margin of error is
2(0.01996) = 0.03992 = 3.992%.
With a sample size of n = 50, the margin of
error is
2(0.07058) = 0.14116 = 14.116%.