Download Statistics - Kantar

BASIC STATISTICS
Martin van Staveren
Technical Director, Consumer Intelligence
© Kantar Media
CONTENTS
DESCRIPTIVE STATISTICS
STATISTICS TO MEASURE THE ACCURACY OF SURVEY FINDINGS
USEFUL FORMULAE AND TABLES
AVERAGES
Example 1: 10, 11, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20
Mean: 208 / 13 = 16
Mode: 16
Median: 16

Example 2: 1, 2, 3, 4, 5, 5, 50, 100, 200, 200, 500
Mean: 1070 / 11 = 97.3
Mode: 5 or 200
Median: 5
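The three averages can be checked with a few lines of Python. This is a minimal sketch using the standard library `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median, multimode

# The two example data sets from the AVERAGES slide.
example_1 = [10, 11, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20]
example_2 = [1, 2, 3, 4, 5, 5, 50, 100, 200, 200, 500]

for name, data in (("Example 1", example_1), ("Example 2", example_2)):
    # multimode returns every value tied for most frequent,
    # so a data set with two modes reports both of them.
    print(name,
          "mean:", round(mean(data), 1),
          "mode:", multimode(data),
          "median:", median(data))
```

In the second data set a handful of large values pulls the mean far above the median, which is why a single average can be a poor summary.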
DISPERSION (i)
So, as well as measuring the average, we need:
MEASURES OF DISPERSION
These tell us how spread out, or how consistent, our results are.
The most commonly used measure of dispersion is the STANDARD DEVIATION.
In the two previous examples, the standard deviations are 2.7 and 147.2.
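A short sketch of how these two standard deviations can be reproduced, assuming the "divide by the number of items" (population) form, which `statistics.pstdev` implements. The variable names are illustrative.

```python
from statistics import pstdev

example_1 = [10, 11, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20]
example_2 = [1, 2, 3, 4, 5, 5, 50, 100, 200, 200, 500]

# pstdev divides the sum of squared deviations by the number of items,
# then takes the square root.
sd_1 = pstdev(example_1)
sd_2 = pstdev(example_2)
print(round(sd_1, 1), round(sd_2, 1))
```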
DISPERSION (ii)
                        A        B
                      100       35
                        0       40
                      100       45
                        0       50
                      100       55
                        0       60
                       50       65
Total                 350      350
Mean                   50       50
Standard Deviation   46.3     10.0
THE STANDARD DEVIATION
     ANSWER   DEVIATION FROM AVERAGE   DEVIATION²
A    35       35 - 50 = -15            225
B    40       40 - 50 = -10            100
C    45       45 - 50 = -5             25
D    50       50 - 50 = 0              0
E    55       55 - 50 = +5             25
F    60       60 - 50 = +10            100
G    65       65 - 50 = +15            225
Total                                  700

VARIANCE = 700 / 7 = 100
STANDARD DEVIATION = √100 = 10
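The same calculation can be followed step by step in code. This is a minimal sketch that mirrors the columns of the slide; the variable names are illustrative.

```python
import math

answers = [35, 40, 45, 50, 55, 60, 65]

average = sum(answers) / len(answers)           # 350 / 7
deviations = [x - average for x in answers]     # deviation from average
squared = [d ** 2 for d in deviations]          # squared deviations
variance = sum(squared) / len(answers)          # mean squared deviation
standard_deviation = math.sqrt(variance)

print(average, sum(squared), variance, standard_deviation)
```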
Mean & Standard Deviation for Scale
Answer                   %      Score x %       % x (Score - Mean)²
Strongly agree (5)       50%    50 x 5 = 250    50 x 1² = 50
Slightly agree (4)       20%    20 x 4 = 80     20 x 0² = 0
Neither (3)              15%    15 x 3 = 45     15 x 1² = 15
Slightly Disagree (2)    10%    10 x 2 = 20     10 x 2² = 40
Strongly Disagree (1)    5%     5 x 1 = 5       5 x 3² = 45
Sum                             400             150

Mean = 400 / 100 = 4.00
Variance = 150 / 100 = 1.50
Standard Deviation = √1.50 = 1.22
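For a scale question the percentages act as weights. The sketch below reproduces the mean, variance and standard deviation from the percentage distribution; the dictionary layout and names are assumptions made for illustration.

```python
import math

# Scale score -> percentage of respondents giving that answer.
distribution = {5: 50, 4: 20, 3: 15, 2: 10, 1: 5}

total = sum(distribution.values())  # percentages add up to 100

# Weighted mean: each score counts in proportion to its percentage.
mean = sum(score * pct for score, pct in distribution.items()) / total

# Weighted variance: squared distance from the mean, weighted the same way.
variance = sum(pct * (score - mean) ** 2
               for score, pct in distribution.items()) / total

standard_deviation = math.sqrt(variance)
print(mean, variance, round(standard_deviation, 2))
```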
STATISTICS TO MEASURE THE ACCURACY OF SURVEY FINDINGS
SIGNIFICANCE TESTING
•  Uses a special type of Standard Deviation: the STANDARD ERROR
•  This measures the dispersion of a survey statistic across all possible samples from a universe
•  So it can tell us how often we will select a sample which, by chance, gives a very different result from that for the whole universe
THE STANDARD ERROR (i)
For a sample of 3 answers from:
35, 40, 45, 50, 55, 60, 65
Take all possible samples of three from the total set of seven, drawing with replacement and counting each order separately; there are 343 (7 x 7 x 7) of these.

Sample   Answers    Average
1        35,35,35   35.00
2        35,35,40   36.67
3        35,40,35   36.67
4        40,35,35   36.67
5        35,40,40   38.33
and so on...
THE STANDARD ERROR (ii)
Calculate the mean and variance of all 343 possible sample means:
Mean = 50
Variance = 33.33
Standard Deviation = √33.33 = 5.77
This Standard Deviation is the Standard Error of the Mean.
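The enumeration of all 343 samples is easy to reproduce. This is a sketch under the assumption that samples are ordered and drawn with replacement (which is what 7 x 7 x 7 implies); `itertools.product` generates exactly that set.

```python
import math
from itertools import product
from statistics import mean, pvariance

answers = [35, 40, 45, 50, 55, 60, 65]

# Every ordered sample of three, drawn with replacement: 7 x 7 x 7 = 343.
samples = list(product(answers, repeat=3))
sample_means = [sum(s) / 3 for s in samples]

mean_of_means = mean(sample_means)
variance_of_means = pvariance(sample_means)   # divides by the number of samples
standard_error = math.sqrt(variance_of_means)

print(len(samples), round(mean_of_means, 2),
      round(variance_of_means, 2), round(standard_error, 2))
```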
SAMPLING FREQUENCY DISTRIBUTION
[Chart: number of samples (vertical axis, 0 to 45) plotted against sample average (horizontal axis, 35.00 to 65.00), with a Normal curve overlaid. The lower and upper 95% levels are marked 1.96 Standard Errors either side of the mean.]

Sample average   Number of samples
35.00            1
36.67            3
38.33            6
40.00            10
41.67            15
43.33            21
45.00            28
46.67            33
48.33            36
50.00            37
51.67            36
53.33            33
55.00            28
56.67            21
58.33            15
60.00            10
61.67            6
63.33            3
65.00            1
DISPERSION OF THE SAMPLING DISTRIBUTION
i) Of the 343 possible samples:
231 (67%) lie within 1 Standard Error of the true population mean
323 (94%, close to the expected 95%) lie within 2 Standard Errors of the true population mean
ii) In general, if we know the Standard Error of a statistic, we can be 95% sure that the true population mean lies within 2 Standard Errors of our sample estimate
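The counts in (i) can be verified by brute force. The sketch below takes the standard error to be the standard deviation of the 343 sample means and counts how many means fall within one and two standard errors of the population mean; the helper name `count_within` is hypothetical.

```python
from itertools import product
from statistics import pstdev

answers = [35, 40, 45, 50, 55, 60, 65]
population_mean = sum(answers) / len(answers)

# Means of all 343 ordered samples of three, drawn with replacement.
sample_means = [sum(s) / 3 for s in product(answers, repeat=3)]
standard_error = pstdev(sample_means)

def count_within(k):
    """Count sample means within k standard errors of the population mean."""
    return sum(abs(m - population_mean) <= k * standard_error
               for m in sample_means)

within_1 = count_within(1)
within_2 = count_within(2)
total = len(sample_means)
print(within_1, round(100 * within_1 / total),
      within_2, round(100 * within_2 / total))
```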
BACK TO THE STANDARD ERROR
In practice, we can’t measure Standard Error from multiple samples, so use this formula:
Standard Error = √(Pop. SD² / Sample Size)
= √(10² / 3)
= √33.33
= 5.77
But the population standard deviation is usually unknown, so substitute the sample SD:
Standard Error = √(Sample SD² / Sample Size)
WHY IS THE STANDARD ERROR
IMPORTANT?
Because the more variable a measure is, the more observations we should take.
Standard Error = √(SD² / Sample Size)
- How tall?
- How many magazines do you read?
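Rearranging the formula gives the sample size needed for a chosen standard error: Sample Size = SD² / Standard Error². The sketch below illustrates this with two purely hypothetical standard deviations (they are not taken from the slides), one for a low-variability measure and one for a high-variability measure.

```python
import math

def sample_size_needed(sd, target_se):
    """Smallest sample size n for which sqrt(sd**2 / n) <= target_se."""
    return math.ceil(sd ** 2 / target_se ** 2)

# Hypothetical values, chosen only for illustration.
low_variability_sd = 2.0
high_variability_sd = 10.0
target_se = 0.5

n_low = sample_size_needed(low_variability_sd, target_se)
n_high = sample_size_needed(high_variability_sd, target_se)
print(n_low, n_high)
```

With these assumed figures, a measure five times as variable needs twenty-five times as many observations for the same accuracy, because the required sample size grows with the square of the standard deviation.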
THE STATISTICS SO FAR (i)
Mean: measure of “central tendency”; a typical value
Mean = Sum of Items / Sample Size
Standard Deviation: measures degree of variation in the data
Standard Deviation = √(Sum of Squared Deviations / Sample Size)
THE STATISTICS SO FAR (ii)
Standard Error:
•  Measures the accuracy of a statistic
•  Gives a range (confidence interval) either side of the estimated statistic, within which the population value of that statistic is almost certain to fall
Standard Error = √(Standard Deviation² / Sample Size)
The more variable a measure is, the more observations we should take
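The three statistics can be combined into one small helper. This is a sketch only: the function name `summarise` is hypothetical, and the seven answers used earlier are treated here as if they were a single survey sample, purely for illustration.

```python
import math

def summarise(sample):
    """Return mean, standard deviation, standard error and a 2-SE interval."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    se = math.sqrt(sd ** 2 / n)
    # Roughly 95% confidence interval: two standard errors either side.
    return mean, sd, se, (mean - 2 * se, mean + 2 * se)

sample = [35, 40, 45, 50, 55, 60, 65]
mean, sd, se, interval = summarise(sample)
print(mean, sd, round(se, 2), [round(v, 2) for v in interval])
```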
USEFUL FORMULAE AND TABLES
STANDARD ERROR FOR A PERCENTAGE
Standard Error = √((p * q) / n)
where: p is the percentage, q is 100 - p, and n is the sample size
So the Standard Error of a statistic estimated at 40% on a sample of 1000
= √((40 * 60) / 1000) = 1.55
So we are 95% confident that the true population figure lies between:
40% + (2 x 1.55) = 43.1% and
40% - (2 x 1.55) = 36.9%
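A minimal sketch of the percentage calculation, with p expressed on a 0 to 100 scale as on the slide. The function name is hypothetical.

```python
import math

def standard_error_pct(p, n):
    """Standard error of a percentage p (0-100) on a sample of size n."""
    q = 100 - p
    return math.sqrt(p * q / n)

p, n = 40, 1000
se = standard_error_pct(p, n)

# Roughly 95% confidence interval: two standard errors either side of p.
lower, upper = p - 2 * se, p + 2 * se
print(round(se, 2), round(lower, 1), round(upper, 1))
```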
TEST THE DIFFERENCE BETWEEN 2 PERCENTAGES
Standard Error = √((P1(100-P1)/N1) + (P2(100-P2)/N2))
For example, if 30% of ABC1s, and 20% of C2DEs, use credit cards regularly, on samples of 333 and 667:
Standard Error = √((30 x 70 / 333) + (20 x 80 / 667)) = 2.95
The actual difference is 30 - 20 = 10%
10 / 2.95 = 3.39 Standard Errors
The probability of getting this difference from our sample, when the real difference in the population is zero, is less than 5% (the difference is greater than 2 standard errors)
So the difference is statistically significant
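The test can be written as a short function. This is a sketch of the "greater than 2 standard errors" rule used on the slide; the function and variable names are assumptions made for illustration.

```python
import math

def standard_error_of_difference(p1, n1, p2, n2):
    """Standard error of the difference between two percentages (0-100)."""
    return math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)

# ABC1s: 30% on a sample of 333; C2DEs: 20% on a sample of 667.
se_diff = standard_error_of_difference(30, 333, 20, 667)
standard_errors_apart = (30 - 20) / se_diff

# Rule of thumb from the slide: more than 2 standard errors is significant.
significant = abs(standard_errors_apart) > 2
print(round(se_diff, 2), round(standard_errors_apart, 2), significant)
```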
USEFUL TABLES
Confidence limits for a percentage:

              Sample size:
Percentage    200     500     1000    2000
10%           4.5%    3.0%    2.0%    1.5%
20%           6.0%    4.0%    2.5%    2.0%
50%           7.0%    4.5%    3.5%    2.5%

Example: for a 35% result on a sample of 600, the Confidence Limit is approx. ± 4.0%
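A table like this can be generated from the standard error formula for a percentage. The sketch below assumes the limits are two standard errors; the slide's figures appear to be rounded to the half per cent, so the unrounded values printed here will differ slightly.

```python
import math

def confidence_limit(p, n):
    """Approximate 95% limit: two standard errors of a percentage p (0-100)."""
    return 2 * math.sqrt(p * (100 - p) / n)

percentages = [10, 20, 50]
sample_sizes = [200, 500, 1000, 2000]

# Header row of sample sizes, then one row per percentage.
print("      " + "".join(f"{n:>8}" for n in sample_sizes))
for p in percentages:
    row = "".join(f"{confidence_limit(p, n):>7.1f}%" for n in sample_sizes)
    print(f"{p:>5}%" + row)

# The worked example: a 35% result on a sample of 600.
example_limit = confidence_limit(35, 600)
print(round(example_limit, 1))
```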
USEFUL TABLES
Significant difference for 2 percentages:

              Sample size:
Percentage    200     500     1000    2000
10%           6.0%    4.0%    3.0%    2.0%
20%           8.0%    5.0%    3.5%    2.5%
50%           10.0%   6.5%    4.5%    3.5%

Example: for 40% v 48% on samples of 300, need a difference of approx. ± 8.0%
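The required difference is two standard errors of the difference between the two percentages. The sketch below applies that rule to the worked example and to one cell of the table, assuming both samples in the table are the same size; the function name is hypothetical.

```python
import math

def significant_difference(p1, n1, p2, n2):
    """Smallest difference (percentage points) exceeding two standard errors."""
    return 2 * math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)

# Worked example: 40% v 48%, each on a sample of 300.
needed = significant_difference(40, 300, 48, 300)
observed = 48 - 40
print(round(needed, 1), observed)

# One table cell: 10% on two samples of 200 each.
cell = significant_difference(10, 200, 10, 200)
print(round(cell, 1))
```

In the worked example the observed difference of 8 points sits right at the threshold, so it should be treated as borderline rather than clearly significant.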