A non-election-related poll! A new method for margin of error

11/2/12
Nov. 2 Statistic for the day:
Prior to the 2012 summer Olympics, the
percentage of Americans saying they
intended to watch at least a fair amount of the
Olympics was higher for women (63%) than
for men (53%).
A non-election-related poll!
In a Gallup poll conducted Jul. 19 -22, 2012, people
were asked:
How much of the Olympics do you intend to
watch?
59% answered "a great deal" or "a fair amount".
The fine print (from gallup.com):
maximum ± 4% margin of error; sample size=1030
Assignment: Read Chapter 19
Exercises pp. 367-369: 1, 2, 5, 6, 8, 10
Review
Remember the empirical (68 – 95 – 99.7) rule?
How did we measure and assess the uncertainty
in the sample percentage back in chapter 4?
$$\text{margin of error} = \frac{1}{\sqrt{\text{sample size}}}$$
[Figure: normal curves showing the central 68% and 95% regions.]
For this Gallup poll, the sample size is 1030, so we get:
$$\text{margin of error} = \frac{1}{\sqrt{1030}} = \frac{1}{32.1} = 0.031$$
(and Gallup says “maximum is ±4 percentage points”.)
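The chapter 4 calculation can be checked with a few lines of Python. This is only a sketch; the function name `old_margin_of_error` is made up for illustration.

```python
import math

def old_margin_of_error(sample_size):
    """Chapter 4 rule of thumb: 1 divided by the square root of the sample size."""
    return 1 / math.sqrt(sample_size)

moe = old_margin_of_error(1030)
print(f"square root of 1030 = {math.sqrt(1030):.1f}")
print(f"margin of error     = {moe:.3f}")
```

The result is about 0.031, or 3.1 percentage points, which is inside Gallup's stated maximum of ±4.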
A new method for margin of error:
Based on the 68-95-99.7 rule, since there is something
appealing about 95%, we can redefine the margin of error
as
Margin of error = 2 standard deviations
But standard deviations of what??
[Figure: normal curves on an axis from -3 to 3 standard deviations, with the central 99.7% region marked.]
It takes ±2 standard
deviations to get 95%.
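For anyone who wants to verify the 68-95-99.7 rule numerically, here is a small sketch. It assumes the standard fact that the area within k standard deviations of the mean of a normal curve equals erf(k/√2); the function name `central_area` is illustrative.

```python
import math

def central_area(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {central_area(k):.1%}")
```

The three areas come out close to 68%, 95%, and 99.7%, which is why 2 standard deviations is a natural choice for a 95% margin of error.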
The Gallup Poll was based on 1030 telephone interviews.
Based on the sample of 1030 American adults, Gallup
estimated that 59% of a population of hundreds of millions
planned to watch at least a fair amount of the Olympics.
If they took a different sample of 1030, they would have gotten
a new sample percentage. It will not always be exactly 59%.
If they took lots of samples of 1030, they would get lots of
sample percentages.
Let's look at a hypothetical histogram for the percentages.
Histogram of 10,000 Percentages
10,000 percentages based on
10,000 samples of 1030 each.
Mean = ???
(okay, I cheated. I used 50%
for the true population percent.
But I had to use something for
the "unknown" population
percent!)
Approx. standard deviation =
(53% − 47%) / 4 =
1.5% (or .015)
[Figure axis: sample percentages from 44 to 56.]
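A histogram like this one can be approximated by simulation. The sketch below is not the code used for the slide: it assumes a true population proportion of 50% (as the slide does) and uses 2,000 samples rather than 10,000 so that it runs quickly. The function name and the seed are illustrative.

```python
import random
import statistics

def simulate_sample_percentages(true_p, sample_size, num_samples, seed=0):
    """Draw num_samples random samples; return the 'yes' proportion of each."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each respondent says 'yes' with probability true_p.
        yes_count = sum(rng.random() < true_p for _ in range(sample_size))
        proportions.append(yes_count / sample_size)
    return proportions

# 2,000 samples of 1030 each (the slide used 10,000 samples).
proportions = simulate_sample_percentages(true_p=0.50, sample_size=1030, num_samples=2000)
print(f"mean of sample proportions: {statistics.mean(proportions):.3f}")
print(f"standard deviation:         {statistics.stdev(proportions):.4f}")
```

The mean should land near 0.50 and the standard deviation near 0.015, in line with the value read off the histogram.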
Formula for estimating the standard
deviation of a sample proportion (without
a histogram):
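$$\text{standard deviation} \approx \sqrt{\frac{\text{sample proportion} \times (1 - \text{sample proportion})}{\text{sample size}}}$$
Or in our case:
$$\text{standard deviation} \approx \sqrt{\frac{(.59) \times (.41)}{1030}} = 0.015$$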
If we happen to know the true population proportion we use it
instead of the sample proportion. (This is unrealistic; do you see why?)
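The formula is easy to evaluate directly. A minimal sketch, with an illustrative function name:

```python
import math

def sd_of_sample_proportion(p, sample_size):
    """Estimated standard deviation of a sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / sample_size)

sd = sd_of_sample_proportion(0.59, 1030)
print(f"standard deviation = {sd:.3f}")
```

For the Gallup poll this gives about 0.015, matching the value estimated from the histogram.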
What to expect from sample proportions:
An example
Facts: fingerprints may be influenced by prenatal hormones.
Most people have more ridges on the right hand than on the left.
People who have more on the left hand are said to have
leftward asymmetry.
So in our example, if the sample size is 1030,
then the old method for margin of error gives:
$$\text{margin of error} = \frac{1}{\sqrt{1030}} = \frac{1}{32.1} = 0.031$$
or 3.1%.
And we report 59% ± 3.1%.
But suppose we define the margin of error to be 2 standard
deviations. We estimated the standard deviation from the
histogram to be .015. This nearly agrees since 2×.015 = .03.
Pretty close!
But creating a hypothetical histogram is a
royal pain! Is there an alternative?
Summary:
1.  We take a sample of 1030 phone interviews
2.  We estimate the percent of American adults
who plan to watch the Olympics: 59%
3.  To assess the uncertainty in the 59% sample
figure, we think of a normal curve of percentages
with a standard deviation of .015 = 1.5%
4.  Since this normal curve has 95% of its distribution
within 2×1.5% of the true value we want to know, we
conclude that 59% plus or minus 2×1.5% is a
reasonable interval of values for that true value to lie in.
Notice: the old M.O.E. formula gives about the same as 2×1.5%
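The summary steps can be put side by side in a short sketch that compares the old margin of error with the new one (2 standard deviations). The variable names are illustrative.

```python
import math

sample_proportion = 0.59
sample_size = 1030

# New method: 2 standard deviations of the sample proportion.
sd = math.sqrt(sample_proportion * (1 - sample_proportion) / sample_size)
new_moe = 2 * sd

# Old method: 1 / sqrt(sample size).
old_moe = 1 / math.sqrt(sample_size)

low, high = sample_proportion - new_moe, sample_proportion + new_moe
print(f"new margin of error: {new_moe:.3f}")
print(f"old margin of error: {old_moe:.3f}")
print(f"interval: {low:.1%} to {high:.1%}")
```

Both methods give a margin of error of roughly 3 percentage points, so the reported interval runs from about 56% to about 62%.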
In a study of 186 heterosexual and 66 homosexual men
26 (14%) heterosexual men showed the trait and
20 (30%) homosexual men showed the trait
(Reference: Hall, J. A. Y. and Kimura, D. "Dermatoglyphic
Asymmetry and Sexual Orientation in Men", Behavioral
Neuroscience, Vol. 108, No. 6, 1203-1206, Dec 94. )
Is it unusual to take a sample of 66 men and observe
a sample proportion of 30%?
Women are more likely to have this trait than men.
The proportion of all men who have this trait is about 15%
Histogram of proportions, with Normal Curve
n = 66, true proportion = .15, standard deviation
= .044
We now know what the distribution of sample
proportions based on a sample of 66 should look like.
We will suppose that the true proportion in the
population of men is 15%.
$$\text{standard deviation} = \sqrt{\frac{(.15) \times (1 - .15)}{66}} = 0.044$$
Now what? Let's borrow some old ideas and find a z-score for the 30% observed in the experiment:
[Figure labels: horizontal axis from 0.0 to 0.3 with the mean at 0.15; "2 std devs" at 0.238; "4 standard deviations"; "homosexual men".]
Thus, a sample proportion of 30% would be
(.30-.15)/.044 = 3.41 standard deviations above the true
mean, assuming that the sample is a representative
sample from the population.
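The z-score calculation can be sketched as follows, assuming (as the slide does) a true proportion of 15% and an observed proportion of 30% in a sample of 66. The variable names are illustrative.

```python
import math

true_p = 0.15
sample_size = 66
observed_p = 0.30

# Standard deviation of sample proportions when the true proportion is 15%.
sd = math.sqrt(true_p * (1 - true_p) / sample_size)

# How many standard deviations the observed proportion sits above the true one.
z = (observed_p - true_p) / sd
print(f"standard deviation = {sd:.3f}")
print(f"z-score            = {z:.2f}")
```

A z-score above 3 lies outside the range that holds 99.7% of sample proportions, so a result like this would be rare if the true proportion were 15%.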
The sample proportion for homosexual men (30%) is too
large to come from the expected distribution of sample
proportions.
Sample means: measurement variables
Suppose we want to estimate the mean weight at PSU
Histogram of Weight, with Normal Curve
[Figure axes: Weight (100 to 300) against Frequency (0 to 40).]
Data from Stat 100 survey, spring 2004. Sample size 237.
Mean value is 152.5 pounds.
Standard deviation is about (240 – 100)/4 = 35
What is the uncertainty in the mean?
We need a margin of error for the mean.
Suppose we take another sample of 237.
What will the mean be?
Will it be 152.5 again?
Probably not.
Consider what would happen if we took 1000 samples,
each of size 237, and computed 1000 means.
Hypothetical result, using a "population" that resembles our sample:
Histogram of 1000 means with normal
curve, based on samples of size 237
[Figure axes: Weight (145 to 160) against Frequency (0 to 100).]
Standard deviation is about
(157 – 148)/4 = 9/4 = 2.25
Extremely interesting: the histogram of means is bell-shaped,
even though the original population was skewed!
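The experiment of taking 1000 samples of size 237 can be imitated in code. The survey data are not available here, so this sketch substitutes a hypothetical right-skewed population (100 pounds plus a gamma-distributed amount) chosen only so that its mean is 152.5 and its standard deviation is 35. The function names and the seed are illustrative.

```python
import random
import statistics

rng = random.Random(0)

def draw_weight():
    """One weight from a hypothetical right-skewed population (mean 152.5, sd 35)."""
    # Gamma with shape 2.25 and scale 35/1.5 has mean 52.5 and sd 35.
    return 100 + rng.gammavariate(2.25, 35 / 1.5)

def simulate_sample_means(sample_size, num_samples):
    """Return the mean of each of num_samples random samples."""
    means = []
    for _ in range(num_samples):
        sample = [draw_weight() for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(sample_size=237, num_samples=1000)
print(f"mean of the 1000 means:  {statistics.fmean(means):.1f}")
print(f"standard deviation:      {statistics.stdev(means):.2f}")
```

Even though individual weights in this made-up population are skewed, the 1000 means cluster tightly around 152.5 with a standard deviation a little above 2, similar to the value read off the histogram of means.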
Formula for estimating the standard deviation of
the sample mean (don't need histogram)
Just like in the case of proportions, we would
like to have a simple formula to find the
standard deviation of the mean without having
to resample a lot of times.
Suppose we have the standard deviation of the
original sample. Then the standard deviation
of the sample mean is:
$$\frac{\text{standard deviation of the data}}{\sqrt{\text{sample size}}}$$
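As a quick check of this formula, here is a sketch applied to the weight survey figures given earlier (standard deviation about 35, sample size 237). The function name is illustrative.

```python
import math

def sd_of_sample_mean(sd_of_data, sample_size):
    """Standard deviation of the sample mean: sd of the data / sqrt(sample size)."""
    return sd_of_data / math.sqrt(sample_size)

sd_mean = sd_of_sample_mean(35, 237)
margin = 2 * sd_mean  # margin of error = 2 standard deviations
print(f"standard deviation of the mean = {sd_mean:.2f}")
print(f"margin of error                = {margin:.2f}")
```

Without intermediate rounding, the standard deviation of the mean is about 2.27 and the margin of error about 4.5 pounds.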
So in our example of weights:
The standard deviation of the sample is about 35.
Hence by our formula:
Standard deviation of the mean is 35 divided by
the square root of 237:
35/15.4 = 2.3
(Recall we estimated it to be 2.25)
So the margin of error of the sample mean is
2×2.3 = 4.6
Report 152.5 ± 4.6 (or 147.9 to 157.1)
Example: SAT math scores
Suppose nationally we know that the SAT math test has a
mean of 500 points and a standard deviation of 100 points.
Draw by hand a picture of what you expect the distribution
of sample means based on samples of size 100 to look like.
Sample means have a normal distribution
mean 500
standard deviation 100/10 = 10
So draw a bell-shaped curve, centered at 500, with 95%
of the bell between 500 – 20 = 480 and 500 + 20 = 520.
[Figure: normal curve of SAT means, sample size 100; horizontal axis: Score, 460 to 540.]
A random sample of 100 SAT math scores with a mean of 540 would be
very unusual.
A sample of 100 with a mean of 510 would not be unusual.
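The two statements about sample means of 510 and 540 can be checked with z-scores. A minimal sketch, assuming a population mean of 500 and a population standard deviation of 100; the names are illustrative.

```python
import math

population_mean = 500
population_sd = 100
sample_size = 100

# Standard deviation of the sample mean.
sd_mean = population_sd / math.sqrt(sample_size)

# About 95% of sample means fall within 2 standard deviations of 500.
low, high = population_mean - 2 * sd_mean, population_mean + 2 * sd_mean

def z_score(sample_mean):
    """Number of standard deviations a sample mean lies from the population mean."""
    return (sample_mean - population_mean) / sd_mean

print(f"95% of sample means fall between {low:.0f} and {high:.0f}")
for sample_mean in (510, 540):
    print(f"sample mean {sample_mean}: z = {z_score(sample_mean):.1f}")
```

A mean of 510 is 1 standard deviation above 500, well inside the 480 to 520 range, while a mean of 540 is 4 standard deviations above it.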