BASIC STATISTICS
Martin van Staveren
Technical Director, Consumer Intelligence
© Kantar Media

CONTENTS
DESCRIPTIVE STATISTICS
STATISTICS TO MEASURE THE ACCURACY OF SURVEY FINDINGS
USEFUL FORMULAE AND TABLES

AVERAGES

Example 1: 10 11 15 15 16 16 16 17 17 18 18 19 20
Mean      208 / 13 = 16
Mode      16
Median    16

Example 2: 1 2 3 4 5 5 50 100 200 200 500
Mean      1070 / 11 = 97.3
Mode      5 or 200
Median    5

DISPERSION (i)
So, as well as measuring the average, we need:
MEASURES OF DISPERSION
These tell us how spread out, or how consistent, our results are.
The most commonly used measure of dispersion is the:
STANDARD DEVIATION
In the two previous examples, the standard deviations are 2.7 and 147.2.

DISPERSION (ii)
Set   Answers                            Sum   Mean   Standard Deviation
A     100    0  100    0  100    0   50  350   50     46.3
B      35   40   45   50   55   60   65  350   50     10.0

THE STANDARD DEVIATION
      ANSWER   DEVIATION FROM AVERAGE   DEVIATION²
A     35       35 - 50 = -15            225
B     40       40 - 50 = -10            100
C     45       45 - 50 =  -5             25
D     50       50 - 50 =   0              0
E     55       55 - 50 =  +5             25
F     60       60 - 50 = +10            100
G     65       65 - 50 = +15            225
Sum                                     700
VARIANCE = 700 / 7 = 100
STANDARD DEVIATION = $\sqrt{100}$ = 10

Mean & Standard Deviation for Scale
Answer                   %      % x score       % x (deviation from mean)²
Strongly agree (5)       50%    50 x 5 = 250    50 x 1² = 50
Slightly agree (4)       20%    20 x 4 =  80    20 x 0² =  0
Neither (3)              15%    15 x 3 =  45    15 x 1² = 15
Slightly Disagree (2)    10%    10 x 2 =  20    10 x 2² = 40
Strongly Disagree (1)     5%     5 x 1 =  10     5 x 3² = 45
Sum                                      400              150
Mean                     400 / 100 = 4.00
Variance                 150 / 100 = 1.50
Standard Deviation       $\sqrt{1.50}$ = 1.22

STATISTICS TO MEASURE THE ACCURACY OF SURVEY FINDINGS

SIGNIFICANCE TESTING
• Uses a special type of Standard Deviation - the STANDARD ERROR
• This measures the dispersion of a survey statistic across all possible samples from a universe
• So it can tell us how often we will select a sample which, by chance, gives a very different result from that for the whole universe

THE STANDARD ERROR (i)
For a sample of 3 answers from: 35, 40, 45, 50, 55, 60, 65
Take all possible samples of three (drawn with replacement) from the total set of seven; there are 343 (7 x 7 x 7) of these. The first five are listed here, and so on:
Sample   Answers      Average
1        35,35,35     35.00
2        35,35,40     36.67
3        35,40,35     36.67
4        40,35,35     36.67
5        35,40,40     38.33

THE STANDARD ERROR (ii)
Calculate the mean and variance of all 343 possible sample means:
Mean                  50
Variance              33.33
Standard Deviation    $\sqrt{33.33}$ = 5.77
This Standard Deviation is the Standard Error of the Mean.

SAMPLING FREQUENCY DISTRIBUTION
[Figure: histogram of the Number of Samples at each sample mean from 35.00 to 65.00, with a Normal Curve overlaid and the Lower and Upper 95% Levels marked 1.96 Standard Errors either side of the mean.]
Number of samples at each sample mean (35.00, 36.67, 38.33, ..., 65.00):
1, 3, 6, 10, 15, 21, 28, 33, 36, 37, 36, 33, 28, 21, 15, 10, 6, 3, 1

DISPERSION OF THE SAMPLING DISTRIBUTION
i) Of the 343 possible samples:
   231 (67%) lie within 1 Standard Error of the true population mean
   323 (94%, close to 95%) lie within 2 Standard Errors of the true population mean
ii) In general, if we know the Standard Error of a statistic, we can be 95% sure that the true population value lies within 2 Standard Errors of our sample estimate

BACK TO THE STANDARD ERROR
In practice, we can't measure Standard Error from multiple samples, so use this formula:
Standard Error = $\sqrt{\text{Pop. SD}^2 / \text{Sample Size}}$ = $\sqrt{10^2 / 3}$ = $\sqrt{33.33}$ = 5.77
But the population standard deviation is usually unknown, so substitute the sample SD:
Standard Error = $\sqrt{\text{Sample SD}^2 / \text{Sample Size}}$

WHY IS THE STANDARD ERROR IMPORTANT?
Because the more variable a measure is, the more observations we should take.
Standard Error = $\sqrt{\text{SD}^2 / \text{Sample Size}}$
- How tall?
- How many magazines do you read?
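
The standard error slides above can be checked by brute force. The following is a minimal Python sketch, not part of the original deck: it lists all 343 samples of three, measures the spread of their means, and compares that with the shortcut formula. The helper names (average, population_variance) are illustrative assumptions.

```python
from itertools import product
from math import sqrt

def average(values):
    """Sum of items divided by the number of items."""
    return sum(values) / len(values)

def population_variance(values):
    """Sum of squared deviations from the mean, divided by the number of items."""
    centre = average(values)
    return sum((v - centre) ** 2 for v in values) / len(values)

answers = [35, 40, 45, 50, 55, 60, 65]
sample_size = 3

# Every ordered sample of three drawn with replacement: 7 x 7 x 7 = 343.
sample_means = [average(s) for s in product(answers, repeat=sample_size)]

# Standard deviation of the sample means, i.e. the standard error of the mean.
se_enumerated = sqrt(population_variance(sample_means))

# Shortcut formula from the slide: sqrt(population SD squared / sample size).
se_formula = sqrt(population_variance(answers) / sample_size)

# How many sample means fall within 1 and 2 standard errors of the true mean.
true_mean = average(answers)
within_1_se = sum(abs(m - true_mean) <= se_formula for m in sample_means)
within_2_se = sum(abs(m - true_mean) <= 2 * se_formula for m in sample_means)

print(len(sample_means), round(se_enumerated, 2), round(se_formula, 2))
print(within_1_se, within_2_se)
```

Both routes should give the same standard error, which is why the formula can stand in for the multiple samples we cannot draw in practice.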
THE STATISTICS SO FAR (i)
Mean: measure of "central tendency"; a typical value
Mean = $\frac{\text{Sum of Items}}{\text{Sample Size}}$
Standard Deviation: measures degree of variation in the data
Standard Deviation = $\sqrt{\frac{\text{Sum of Squared Deviations}}{\text{Sample Size}}}$

THE STATISTICS SO FAR (ii)
Standard Error:
• Measures the accuracy of a statistic
• Gives a range (confidence interval) either side of the estimated statistic, within which the population value of that statistic is almost certain to fall
Standard Error = $\sqrt{\frac{\text{Standard Deviation}^2}{\text{Sample Size}}}$
The more variable a measure is, the more observations we should take

USEFUL FORMULAE AND TABLES

STANDARD ERROR FOR A PERCENTAGE
Standard Error = $\sqrt{(p \times q) / n}$
where: p is the percentage; q is 100 - p; n is the sample size
So the Standard Error of a statistic estimated at 40% on a sample of 1000
= $\sqrt{(40 \times 60) / 1000}$ = 1.55
So, we are 95% confident that the true population figure lies between:
40% + (2 x 1.55) = 43.1%
and
40% - (2 x 1.55) = 36.9%

TEST THE DIFFERENCE BETWEEN 2 PERCENTAGES
Standard Error = $\sqrt{P_1(100 - P_1)/N_1 + P_2(100 - P_2)/N_2}$
For example, if 30% of ABC1s, and 20% of C2DEs, use credit cards regularly, on samples of 333 and 667:
Standard Error = $\sqrt{(30 \times 70 / 333) + (20 \times 80 / 667)}$ = 2.95
The actual difference is 30 - 20 = 10%
10 / 2.95 = 3.39 Standard Errors
The probability of getting this difference from our sample, when the real difference in the population is zero, is less than 5% (the difference is greater than 2 standard errors)
So the difference is statistically significant

USEFUL TABLES
Confidence limits for a percentage:
              Sample size:
Percentage    200     500     1000    2000
10%           4.5%    3.0%    2.0%    1.5%
20%           6.0%    4.0%    2.5%    2.0%
50%           7.0%    4.5%    3.5%    2.5%
35% result on sample of 600, Confidence Limit is approx. ± 4.0%

USEFUL TABLES
Significant difference for 2 percentages:
              Sample size:
Percentage    200     500     1000    2000
10%           6.0%    4.0%    3.0%    2.0%
20%           8.0%    5.0%    3.5%    2.5%
50%           10.0%   6.5%    4.5%    3.5%
40% v 48% on samples of 300, need difference of approx. ± 8.0%
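
The two percentage formulae above can be sketched in a few lines of Python. This is an illustrative sketch rather than anything from the original deck: the function names (se_percentage, se_difference) are assumptions, and the 2 standard error cut-off is the rule of thumb used in the slides.

```python
from math import sqrt

def se_percentage(p, n):
    """Standard error of a percentage p (0 to 100) on a sample of size n."""
    return sqrt(p * (100 - p) / n)

def se_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent percentages."""
    return sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)

# 40% on a sample of 1000: approximate 95% confidence interval (2 standard errors).
se = se_percentage(40, 1000)
lower, upper = 40 - 2 * se, 40 + 2 * se

# 30% of 333 ABC1s v 20% of 667 C2DEs.
se_diff = se_difference(30, 333, 20, 667)
standard_errors_apart = (30 - 20) / se_diff
significant = abs(standard_errors_apart) > 2  # slide rule of thumb

print(round(se, 2), round(lower, 1), round(upper, 1))
print(round(se_diff, 2), round(standard_errors_apart, 2), significant)
```

Percentages are kept on the 0 to 100 scale, as in the slides, so the standard errors come out in percentage points.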