Download Open

Measuring change in sample survey data Underlying Concept A sample statistic is our best estimate of a population parameter If we took 100 different samples from the same population to measure, for example, the mean height of men, we would get a 100 different estimates of mean height. The mean of these means would be very close to the real population mean. Population of men (Population Size = N) Sample of men from population (Sample size = n) We take a sample from the population and measure the heights of all our sample members. The mean height from this sample is 174.4 cm. Sample of men from population (Sample size = n) We take a another sample from the same population and measure the heights of all sample members. The mean height from this sample is 165.9 cm. Sample of sample means We take a another 100 samples from the same population and measure the heights of all sample members. Sample 1 mean was 174.4 cm Sample 2 mean was 165.9 cm Sample 3 mean was 171.0 cm Sample 4 mean was 175.2 cm Sample 5 mean was 162.8 cm Etc Etc Etc Sample of sample means We don’t ever take hundreds of samples. We just take 1. The concept of the mean of sample means is central to all survey statistics. The central limit theorem says that if we took a sufficiently large number of samples, the mean of the sample means would be normally distributed. This is true even if the thing we are measuring is not normally distributed. The central limit theorem can be proved mathematically. It is the basis of how we calculate our required sample size and how we calculate confidence intervals around our estimates………………. Variance, standard deviation and standard error Variance = the sum of squared differences from the mean divided by n-1 Sample values (n=5) Difference from mean Squared difference from the mean 172 171 - 172 = -1 -1 x -1 = 1 169 2 4 168 3 9 175 -4 16 171 0 0 Mean = 171 Sum = 0 Sum of squares = 30 Variance = 30 / 4 = 7.5 Standard deviation = the square root of the variance SD = √ variance = √7.5 = 2.74 Standard error = the square root of the variance divided by the sample size SE = √ (variance / n) = √ (7.5 / 5) = 1.22 Standard Error The standard error is our best estimate of the standard deviation of the sample means. In other words if we took 100 samples from the same population and got 100 estimates of men’s mean height, the standard deviation of that mean is the standard error. Confidence Intervals Because the means of sample means are normally distributed, we can use the characteristics of the normal distribution to look at our mean and standard error. We know that in a normal distribution 68.3% of values fall within one standard deviation of the mean and 95% fall within 1.96 standard deviations of the mean. So 1.96 times the standard error gives us the 95% confidence limits. Our standard error is 1.22. 1.96 x 1.22 = 2.4 Our sample mean is 171.0 171.0 – 2.4 = 168.6 171.0 + 2.4 = 173.4 So.. If we took 100 samples, 95 of them would have a mean somewhere between 168.6 and 173.4. Or… we can be 95% confident that the true mean (the population mean) lies between 168.6 and 173.4. It works the same for proportions The 95% confidence interval around a proportion is 1.96 times the standard error of the estimate. The standard error of a proportion is √ ( (p (100-p)) / n ) Where p is the percentage and n is the sample size. So if we estimate that 75% of people prefer dogs from a sample of 45, p=75 and n= 45. = √ (( 75 x (100-75)) / 45 ) = √ ( (75 x 25) / 45 ) = √ ( 1875 / 45 ) =√ 42.7 = 6.5 75 – 6.5 = 68.5 and 75 + 6.5 = 81.5 So.. if we took 100 samples, 95 of them would have a percentage somewhere between 68.5 and 81.5. Or… we can be 95% confident that the true percentage of people who prefer dogs (the population percentage) lies between 68.5 and 81.5. Design effects • If the sample is not a simple random sample then an adjustment will need to be made to the standard error • Proportionate stratification will decrease the standard error • Disproportionate stratification will increase the standard error • Clustering will increase the standard error • See PEAS website for information about design effects http://www2.napier.ac.uk/depts/fhls/peas/index.htm Finite Population Correction If the sample size is a large proportion of the population size (>5%) then applying the finite population correction will reduce the standard error Showing confidence intervals graphically 45 40 35 Percentage 30 25 20 15 10 5 0 2005 2006 2007 Year 2008 2009 Measuring change over time • As a rule of thumb, if two confidence intervals do not overlap we can be confident that there has been a change in the population • This requires that broadly similar sample methodology was used, and exactly the same survey questions • If different methodologies are used or the question changes, it becomes very difficult to say whether change in the population has occurred Setting a target 1. Calculate the confidence interval around the baseline estimate 2. Estimate what the confidence interval will be around the target figure 3. Make sure they don’t overlap Word of caution • Don’t mistake “statistically significant” for “meaningful” • A change of 0.01% can be statistically significant if the survey is large and precise enough, but most people wouldn’t call that meaningful • A meaningful change in the population could be missed if the survey isn’t designed to be precise enough: Make sure the survey is designed with the purpose of monitoring change in mind • And don’t forget all the non-statistical issues!!!

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Open