Measuring change in sample survey data
Underlying Concept
A sample statistic is our best estimate of a
population parameter
If we took 100 different samples from the
same population to measure, for example,
the mean height of men, we would get
100 different estimates of mean height.
The mean of these means would be very
close to the real population mean.
Population of men
(Population Size = N)
Sample of men from population
(Sample size = n)
We take a sample
from the
population and
measure the
heights of all our
sample members.
The mean height
from this sample
is 174.4 cm.
Sample of men from population
(Sample size = n)
We take
another sample
from the same
population and
measure the
heights of all
sample members.
The mean height
from this sample
is 165.9 cm.
Sample of sample means
We take another 100 samples from the same population and measure the
heights of all sample members.
Sample 1 mean was 174.4 cm
Sample 2 mean was 165.9 cm
Sample 3 mean was 171.0 cm
Sample 4 mean was 175.2 cm
Sample 5 mean was 162.8 cm
Etc.
Sample of sample means
We don’t ever take hundreds of samples. We just take 1.
The concept of the mean of sample means is central to all survey
statistics.
The central limit theorem says that if we took many samples of a
sufficiently large size, the sample means would be approximately
normally distributed around the population mean.
This is true even if the thing we are measuring is not normally
distributed.
The central limit theorem can be proved mathematically.
It is the basis of how we calculate our required sample size and
how we calculate confidence intervals around our estimates.
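The idea can be checked with a small simulation. This is only a sketch under stated assumptions: the population is hypothetical (a deliberately skewed exponential distribution with a mean of about 10), and the sample size of 50 and the 1,000 repeated samples are arbitrary choices, not figures from the slides.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical, deliberately skewed population (exponential, mean about 10).
population = [random.expovariate(1 / 10) for _ in range(100_000)]
population_mean = statistics.mean(population)

# Draw many simple random samples and record the mean of each one.
n_samples = 1000   # assumption: number of repeated samples
sample_size = 50   # assumption: size of each sample
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(n_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Share of sample means within 1.96 standard deviations of the mean of means.
# For a normal distribution this share would be about 95%.
share_within = sum(
    abs(m - mean_of_means) <= 1.96 * sd_of_means for m in sample_means
) / n_samples

print(f"Population mean:      {population_mean:.2f}")
print(f"Mean of sample means: {mean_of_means:.2f}")
print(f"Share within 1.96 SD: {share_within:.1%}")
```

Even though the population itself is far from normal, the mean of the sample means should land close to the population mean, and roughly 95% of the sample means should fall within 1.96 standard deviations of it.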
Variance, standard deviation and
standard error
Variance = the sum of squared differences from the mean divided by n - 1

Sample values (n=5)    Difference from mean    Squared difference from the mean
172                    171 - 172 = -1          -1 x -1 = 1
169                    171 - 169 = 2           2 x 2 = 4
168                    171 - 168 = 3           3 x 3 = 9
175                    171 - 175 = -4          -4 x -4 = 16
171                    171 - 171 = 0           0 x 0 = 0
Mean = 171             Sum = 0                 Sum of squares = 30

Variance = 30 / 4 = 7.5
Standard deviation = the square root of the variance
SD = √variance
= √7.5 = 2.74
Standard error = the square root of (the variance divided by the sample size)
SE = √(variance / n)
= √(7.5 / 5)
= 1.22
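The worked example above can be reproduced in a few lines of Python. This is a minimal sketch using the standard library; the variable names are illustrative only.

```python
import math
import statistics

# The five sample values from the worked example.
heights = [172, 169, 168, 175, 171]
n = len(heights)

mean = statistics.mean(heights)
# statistics.variance divides the sum of squares by n - 1 (sample variance).
variance = statistics.variance(heights)
sd = math.sqrt(variance)
se = math.sqrt(variance / n)

print(f"Mean = {mean}")
print(f"Variance = {variance}")
print(f"Standard deviation = {sd:.2f}")
print(f"Standard error = {se:.2f}")
```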
Standard Error
The standard error is our best estimate of
the standard deviation of the sample
means.
In other words, if we took 100 samples from
the same population and got 100
estimates of men’s mean height, the
standard deviation of those 100 means is
what the standard error estimates.
Confidence Intervals
Because sample means are approximately normally distributed, we can use the
characteristics of the normal distribution to look at our mean and standard
error.
We know that in a normal distribution 68.3% of values fall within one standard
deviation of the mean and 95% fall within 1.96 standard deviations of the
mean.
So 1.96 times the standard error gives us the 95% confidence limits.
Our standard error is 1.22.
1.96 x 1.22 = 2.4
Our sample mean is 171.0
171.0 – 2.4 = 168.6
171.0 + 2.4 = 173.4
So, if we took 100 samples and calculated an interval like this from each one,
we would expect about 95 of those intervals to contain the true mean.
Or, put another way, we can be 95% confident that the true mean (the population mean) lies
between 168.6 and 173.4.
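As a sketch, the same confidence limits can be computed as follows. The function name is hypothetical, and the multiplier 1.96 is the normal-distribution value quoted above.

```python
import math

Z_95 = 1.96  # multiplier for 95% limits under the normal distribution


def mean_confidence_interval(mean, se, z=Z_95):
    """Return (lower, upper) limits: the mean plus or minus z standard errors."""
    margin = z * se
    return mean - margin, mean + margin


# Values from the worked example: variance 7.5, n = 5, sample mean 171.0.
se = math.sqrt(7.5 / 5)
lower, upper = mean_confidence_interval(171.0, se)

print(f"95% confidence interval: {lower:.1f} to {upper:.1f}")
```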
It works the same for proportions
The 95% confidence interval around a proportion is the estimate plus or minus 1.96 times the
standard error of the estimate.
The standard error of a proportion is √( (p x (100 - p)) / n )
where p is the percentage and n is the sample size.
So if we estimate that 75% of people prefer dogs from a sample of 45, p = 75 and n = 45.
SE = √( (75 x (100 - 75)) / 45 )
= √( (75 x 25) / 45 )
= √( 1875 / 45 )
= √41.7
= 6.5
1.96 x 6.5 = 12.7
75 – 12.7 = 62.3 and 75 + 12.7 = 87.7
So, if we took 100 samples and calculated an interval like this from each one, we would
expect about 95 of those intervals to contain the true percentage.
Or, put another way, we can be 95% confident that the true percentage of people who prefer dogs (the
population percentage) lies between 62.3 and 87.7.
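A short sketch of the proportion calculation, with a hypothetical function name. Note that the 95% limits are the estimate plus or minus 1.96 standard errors, so they are wider than one standard error either side of the estimate.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Return (se, lower, upper) for a percentage p (0 to 100) from a sample of n."""
    se = math.sqrt(p * (100 - p) / n)
    margin = z * se
    return se, p - margin, p + margin


# Example from the text: 75% prefer dogs, sample of 45.
se, lower, upper = proportion_ci(75, 45)

print(f"Standard error = {se:.1f}")
print(f"95% confidence interval: {lower:.1f} to {upper:.1f}")
```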
Design effects
• If the sample is not a simple random sample
then an adjustment will need to be made to the
standard error
• Proportionate stratification will decrease the
standard error
• Disproportionate stratification will increase the
standard error
• Clustering will increase the standard error
• See PEAS website for information about design
effects
http://www2.napier.ac.uk/depts/fhls/peas/index.htm
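The slides do not give a formula for the adjustment. A common approach, sketched here as an assumption, uses the design effect (deff): the ratio of the variance under the actual design to the variance under a simple random sample of the same size. The standard error then scales by the square root of deff. The deff values below are hypothetical and chosen only for illustration.

```python
import math


def design_adjusted_se(srs_se, deff):
    """Scale a simple-random-sample standard error by sqrt(design effect)."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return srs_se * math.sqrt(deff)


# Simple-random-sample standard error for 75% from a sample of 45.
srs_se = math.sqrt(75 * 25 / 45)

# Hypothetical design effects: above 1 for clustering,
# below 1 for proportionate stratification.
clustered_se = design_adjusted_se(srs_se, deff=1.5)
stratified_se = design_adjusted_se(srs_se, deff=0.9)

print(f"SRS: {srs_se:.2f}")
print(f"Clustered (deff 1.5): {clustered_se:.2f}")
print(f"Stratified (deff 0.9): {stratified_se:.2f}")
```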
Finite Population Correction
If the sample size is a large proportion of
the population size (>5%) then applying
the finite population correction will reduce
the standard error
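The slides do not state the correction itself. A commonly used form, assumed here, multiplies the standard error by √( (N - n) / (N - 1) ). The population size of 500 below is hypothetical; it makes the sample of 45 equal to 9% of the population.

```python
import math


def finite_population_correction(n, N):
    """Return the multiplier sqrt((N - n) / (N - 1)) for sample size n, population size N."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    if N == 1:
        return 0.0  # the single unit was sampled, so there is no sampling error
    return math.sqrt((N - n) / (N - 1))


n = 45
N = 500  # hypothetical population size
se = math.sqrt(75 * 25 / n)

fpc = finite_population_correction(n, N)
corrected_se = se * fpc

print(f"Uncorrected SE: {se:.2f}")
print(f"Correction factor: {fpc:.3f}")
print(f"Corrected SE: {corrected_se:.2f}")
```

When n is a tiny fraction of N the factor is very close to 1, which is why the correction is usually ignored for small sampling fractions.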
Showing confidence intervals graphically
[Chart: percentage (vertical axis, 0 to 45) by year (horizontal axis, 2005 to 2009)]
Measuring change over time
• As a rule of thumb, if two confidence intervals do
not overlap we can be confident that there has
been a change in the population
• This requires that broadly similar sample
methodology was used, and exactly the same
survey questions
• If different methodologies are used or the
question changes, it becomes very difficult to
say whether change in the population has
occurred
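The rule of thumb can be sketched as a simple overlap check. The survey figures below (30% and 37%, each from a sample of 1,000) are hypothetical, and the function names are illustrative only.

```python
import math


def ci_percent(p, n, z=1.96):
    """Return (lower, upper) 95% limits for a percentage p from a sample of n."""
    se = math.sqrt(p * (100 - p) / n)
    return p - z * se, p + z * se


def intervals_overlap(a, b):
    """True if intervals a and b, each given as (lower, upper), share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical estimates from two waves of the same survey.
ci_first = ci_percent(30, 1000)
ci_second = ci_percent(37, 1000)

changed = not intervals_overlap(ci_first, ci_second)
print(f"First wave:  {ci_first[0]:.1f} to {ci_first[1]:.1f}")
print(f"Second wave: {ci_second[0]:.1f} to {ci_second[1]:.1f}")
print("Intervals do not overlap" if changed else "Intervals overlap")
```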
Setting a target
1. Calculate the confidence interval around
the baseline estimate
2. Estimate what the confidence interval will
be around the target figure
3. Make sure they don’t overlap
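The three steps above can be sketched as a small search. This assumes a simple random sample, a target higher than the baseline, and a known sample size for the follow-up survey. The baseline of 30% and the sample sizes of 1,000 are hypothetical.

```python
import math

Z_95 = 1.96


def ci_percent(p, n, z=Z_95):
    """Return (lower, upper) 95% limits for a percentage p from a sample of n."""
    se = math.sqrt(p * (100 - p) / n)
    return p - z * se, p + z * se


def smallest_clear_target(baseline_p, baseline_n, target_n, step=0.1):
    """Smallest target above the baseline whose interval clears the baseline interval.

    Returns None if no target below 100% satisfies the rule.
    """
    # Step 1: confidence interval around the baseline estimate.
    _, baseline_upper = ci_percent(baseline_p, baseline_n)
    k = 1
    while baseline_p + k * step < 100:
        target = baseline_p + k * step
        # Step 2: expected interval around the candidate target figure.
        target_lower, _ = ci_percent(target, target_n)
        # Step 3: accept the first target whose interval does not overlap.
        if target_lower > baseline_upper:
            return round(target, 1)
        k += 1
    return None


baseline_upper = ci_percent(30, 1000)[1]
target = smallest_clear_target(30, 1000, 1000)
print(f"Baseline upper limit: {baseline_upper:.1f}")
print(f"Smallest target with a non-overlapping interval: {target}")
```

A larger follow-up sample narrows the interval around the target, so a smaller change can then be set as the target.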
Word of caution
• Don’t mistake “statistically significant” for
“meaningful”
• A change of 0.01% can be statistically significant
if the survey is large and precise enough, but
most people wouldn’t call that meaningful
• A meaningful change in the population could be
missed if the survey isn’t designed to be precise
enough: Make sure the survey is designed with
the purpose of monitoring change in mind
• And don’t forget all the non-statistical issues!!!