Essentials of
Marketing Research
Chapter 13:
Determining Sample Size
WHAT DO STATISTICS MEAN?
• DESCRIPTIVE STATISTICS
– NUMBER OF PEOPLE
– TRENDS IN EMPLOYMENT
– DATA
• INFERENTIAL STATISTICS
– MAKE AN INFERENCE ABOUT A
POPULATION FROM A SAMPLE
POPULATION PARAMETER
VERSUS
SAMPLE STATISTICS
POPULATION PARAMETER
• VARIABLES IN A POPULATION
• MEASURED CHARACTERISTICS OF A
POPULATION
• GREEK LOWER-CASE LETTERS AS
NOTATION, e.g. μ, σ, etc.
SAMPLE STATISTICS
• VARIABLES IN A SAMPLE
• MEASURES COMPUTED FROM
SAMPLE DATA
• ENGLISH LETTERS FOR NOTATION
– e.g., X or S
MAKING DATA USABLE
• Data must be organized into:
– FREQUENCY DISTRIBUTIONS
– PROPORTIONS
– CENTRAL TENDENCY
• MEAN, MEDIAN, MODE
– MEASURES OF DISPERSION
• range, deviation, standard deviation, variance
Frequency Distribution of Deposits
Amount             Frequency   Percent   Probability
Under $3,000             499        16           .16
$3,000-$4,999            530        17           .17
$5,000-$9,999            562        18           .18
$10,000-$14,999          718        23           .23
$15,000 or more          811        26           .26
Total                  3,120       100          1.00
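The percent and probability columns follow directly from the frequency counts. A minimal sketch, assuming only the counts shown in the table (the variable names are illustrative):

```python
# Deposit counts taken from the frequency distribution of deposits.
frequencies = {
    "Under $3,000": 499,
    "$3,000-$4,999": 530,
    "$5,000-$9,999": 562,
    "$10,000-$14,999": 718,
    "$15,000 or more": 811,
}

total = sum(frequencies.values())

# Probability is the share of all accounts that fall in each class;
# percent is the same share on a 0-100 scale, rounded to a whole number.
probabilities = {label: count / total for label, count in frequencies.items()}
percents = {label: round(100 * p) for label, p in probabilities.items()}

for label, count in frequencies.items():
    print(f"{label:<18}{count:>6}{percents[label]:>6}{probabilities[label]:>8.2f}")
print(f"{'Total':<18}{total:>6}")
```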
MEASURES OF CENTRAL
TENDENCY
• MEAN - ARITHMETIC AVERAGE
• MEDIAN - MIDPOINT OF THE
DISTRIBUTION
• MODE - THE VALUE THAT OCCURS
MOST OFTEN
Number of Sales Calls Per Day by Salespersons
Salesperson   Number of sales calls
Mike          4
Patty         3
Billie        2
Bob           5
John          3
Frank         3
Chuck         1
Samantha      5
Total         26
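The three measures of central tendency can be computed for the sales-call data with Python's standard `statistics` module. This is a small illustration using the eight values from the slide; the variable names are assumptions.

```python
import statistics

# Number of sales calls per day for each salesperson, from the slide.
sales_calls = {
    "Mike": 4, "Patty": 3, "Billie": 2, "Bob": 5,
    "John": 3, "Frank": 3, "Chuck": 1, "Samantha": 5,
}
values = list(sales_calls.values())

mean_calls = statistics.mean(values)      # arithmetic average: 26 / 8
median_calls = statistics.median(values)  # midpoint of the sorted values
mode_calls = statistics.mode(values)      # value that occurs most often

print(f"total={sum(values)} mean={mean_calls} median={median_calls} mode={mode_calls}")
```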
Sales for Products A and B, Both Average 200
Product A   Product B
196         150
198         160
199         176
199         181
200         192
200         200
200         201
201         202
201         213
201         224
202         240
202         261
MEASURES OF DISPERSION
• THE RANGE
• STANDARD DEVIATION
Low Dispersion Versus High Dispersion
[Figure: two frequency charts, one labeled "Low Dispersion" and one labeled "High dispersion"; horizontal axis "Value on Variable" marked from 150 to 210, vertical axis marked from 1 to 5]
Standard Deviation
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
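To see how the range and the standard deviation separate two data sets with nearly the same average, the Product A and Product B sales figures listed earlier can be run through both measures. This is a sketch only; the helper names `value_range` and `sample_std` are illustrative.

```python
import math

# Sales figures for Products A and B from the earlier slide.
product_a = [196, 198, 199, 199, 200, 200, 200, 201, 201, 201, 202, 202]
product_b = [150, 160, 176, 181, 192, 200, 201, 202, 213, 224, 240, 261]


def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


def sample_std(data):
    """Sample standard deviation: sqrt(sum((X - mean)^2) / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))


for name, data in (("Product A", product_a), ("Product B", product_b)):
    print(f"{name}: mean={sum(data) / len(data):.1f} "
          f"range={value_range(data)} std={sample_std(data):.2f}")
```

Both products average about 200, but Product B has the much larger range and standard deviation.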
THE NORMAL DISTRIBUTION
• NORMAL CURVE
• BELL-SHAPED
• ALMOST ALL OF ITS VALUES ARE
WITHIN PLUS OR MINUS 3
STANDARD DEVIATIONS
• I.Q. IS AN EXAMPLE
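NORMAL DISTRIBUTION
[Figure: normal curve centered on the MEAN; areas under the curve in one-standard-deviation bands, left to right: 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%]
An example of the distribution of Intelligence Quotient (IQ) scores
[Figure: normal curve of IQ scores with the same band areas; horizontal axis "IQ" marked at 70, 85, 100, 115, 130]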
STANDARDIZED NORMAL
DISTRIBUTION
• SYMMETRICAL ABOUT ITS MEAN
• MEAN IDENTIFIES HIGHEST POINT
• INFINITE NUMBER OF CASES - A
CONTINUOUS DISTRIBUTION
• TOTAL AREA UNDER THE CURVE (TOTAL
PROBABILITY) = 1.0
• MEAN OF ZERO, STANDARD DEVIATION
OF 1
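Any normally distributed value can be converted to the standardized scale by subtracting the mean and dividing by the standard deviation. The sketch below uses the IQ example; the standard deviation of 15 is an assumption inferred from the 15-point spacing of the IQ axis marks (70, 85, 100, 115, 130), and `z_score` is an illustrative name.

```python
def z_score(x, mean, std_dev):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


IQ_MEAN = 100     # center of the IQ axis on the slide
IQ_STD_DEV = 15   # assumption: inferred from the 15-point axis spacing

for iq in (70, 85, 100, 115, 130):
    print(f"IQ {iq:>3} -> z = {z_score(iq, IQ_MEAN, IQ_STD_DEV):+.1f}")
```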
A STANDARDIZED NORMAL CURVE
[Figure: standardized normal curve; horizontal axis marked at -2, -1, 0, 1, 2]
STANDARDIZED
SCORES
• POPULATION DISTRIBUTION
• SAMPLE DISTRIBUTION
• SAMPLING DISTRIBUTION
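POPULATION DISTRIBUTION
[Figure: population distribution curve; horizontal axis x, marked at -σ, μ, σ]
SAMPLE DISTRIBUTION
[Figure: sample distribution curve; horizontal axis X, marked with the sample mean X̄ and sample standard deviation S]
SAMPLING DISTRIBUTION
[Figure: sampling distribution curve; horizontal axis X̄, marked with its mean μ_X̄ and standard error S_X̄]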
STANDARD ERROR
OF THE MEAN
STANDARD DEVIATION OF THE
SAMPLING DISTRIBUTION
CENTRAL LIMIT THEOREM
PARAMETER ESTIMATES
• POINT ESTIMATES
• CONFIDENCE INTERVAL ESTIMATES
RANDOM SAMPLING ERROR
AND SAMPLE SIZE ARE
RELATED
SAMPLE SIZE
• VARIANCE (STANDARD
DEVIATION)
• MAGNITUDE OF ERROR
• CONFIDENCE LEVEL
Determining Sample Size
Recap
Sample Accuracy
• How close the sample’s profile is to the true
population’s profile
• Sample size is not related to
representativeness
• Sample size is related to accuracy
Methods of Determining Sample Size
• Compromise between what is theoretically
perfect and what is practically feasible.
• Remember, the larger the sample size, the
more costly the research.
• Why sample one more person than
necessary?
Methods of Determining Sample Size
• Arbitrary
– Rule of Thumb (e.g., a sample should be at least
5% of the population to be accurate)
– Not efficient or economical
• Conventional
– Follows that there is some “convention” or
number believed to be the right size
– Easy to apply, but can end up with a sample that
is too small or too large
Methods of Determining Sample Size
• Cost Basis
– based on budgetary constraints
• Statistical Analysis
– certain statistical techniques require certain
number of respondents
• Confidence Interval
– theoretically the most correct method
Notion of Variability
[Figure: two curves centered on the mean, one labeled "Little variability" and one labeled "Great variability"]
Notion of Variability
• Standard Deviation
– approximates the average distance away from
the mean for all respondents to a specific
question
– indicates amount of variability in sample
– ex. compare a standard deviation of 500 and
1000, which exhibits more variability?
Measures of Variability
• Standard Deviation: indicates the degree of variation or
diversity in the values in such a way as to be translatable
into a normal curve distribution
• Variance = $\sum (x - \bar{x})^2 / (n - 1)$
• With a normal curve, the midpoint (apex) of the
curve is also the mean and exactly 50% of the
distribution lies on either side of the mean.
Normal Curve and Standard Deviation
Number of standard          Percent of area    Percent of area to
deviations from the mean    under the curve    the right or left
+/- 1.00 st dev             68%                16%
+/- 1.64 st dev             90%                5%
+/- 1.96 st dev             95%                2.5%
+/- 2.58 st dev             99%                0.5%
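The areas in this table can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names are illustrative, and the results agree with the table after rounding.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(z):
    """Share of the curve between -z and +z standard deviations."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


def area_in_one_tail(z):
    """Share of the curve beyond +z (equal to the share below -z)."""
    return 1 - standard_normal.cdf(z)


for z in (1.00, 1.64, 1.96, 2.58):
    print(f"+/- {z:.2f} st dev: {100 * area_within(z):.1f}% inside, "
          f"{100 * area_in_one_tail(z):.1f}% in each tail")
```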
Notion of Sampling Distribution
• The sampling distribution refers to what would be
found if the researcher could take many, many
independent samples
• The means for all of the samples should align
themselves in a normal bell-shaped curve
• Therefore, there is a high probability that any given
sample result will be close to, but not exactly equal to, the
population mean.
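The idea of taking "many, many independent samples" can be simulated. The sketch below is an illustration under stated assumptions: the population is hypothetical (10,000 deliberately skewed values), and the sample size of 50 and the 2,000 repetitions are arbitrary choices. It compares the spread of the sample means with the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 skewed values with a mean near 10.
population = [rng.expovariate(1 / 10) for _ in range(10_000)]
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)

SAMPLE_SIZE = 50
NUM_SAMPLES = 2_000

# Draw many independent samples and record the mean of each one.
sample_means = [
    statistics.mean(rng.sample(population, SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_sample_means = statistics.mean(sample_means)
sd_of_sample_means = statistics.stdev(sample_means)
predicted_standard_error = population_sd / SAMPLE_SIZE ** 0.5

print(f"population mean:        {population_mean:.2f}")
print(f"mean of sample means:   {mean_of_sample_means:.2f}")
print(f"spread of sample means: {sd_of_sample_means:.2f}")
print(f"sd / sqrt(n):           {predicted_standard_error:.2f}")
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of the individual values.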
[Figure: normal, bell-shaped curve with the midpoint (mean) marked]
Notion of Confidence Interval
• A confidence interval defines endpoints based on
knowledge of the area under a bell-shaped curve.
• Normal curve
– plus or minus 1.96 standard deviations from the mean
theoretically covers 95% of the population
– plus or minus 2.58 standard deviations from the mean
theoretically covers 99% of the population
Notion of Confidence Interval
• Example
– Mean = 12,000 miles
– Standard Deviation = 3000 miles
• We are confident that 95% of the
respondents’ answers fall between 6,120
and 17,880 miles
12,000 + (1.96 * 3000) = 17,880
12,000 - (1.96 * 3000) = 6,120
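The endpoints in this example come from moving 1.96 standard deviations to either side of the mean. A minimal sketch of that arithmetic, with `interval_endpoints` as an illustrative helper name:

```python
def interval_endpoints(mean, std_dev, z):
    """Return (lower, upper) endpoints located z standard deviations from the mean."""
    margin = z * std_dev
    return mean - margin, mean + margin


# Values from the slide: mean 12,000 miles, standard deviation 3,000 miles,
# and z = 1.96 for the 95% level.
lower, upper = interval_endpoints(mean=12_000, std_dev=3_000, z=1.96)
print(f"95% of answers fall between {lower:,.0f} and {upper:,.0f} miles")
```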
Notion of Standard Error of a Mean
• Standard error is an indication of how far away from
the true population value a typical sample result is
expected to fall.
• Formula
– $S_{\bar{X}} = s / \sqrt{n}$
– $S_p = \sqrt{(p \times q) / n}$
• where
– $S_{\bar{X}}$ is the standard error of the mean
– s = standard deviation of the sample
– n = sample size
– $S_p$ is the standard error of the percentage
– p = % found in the sample and q = (100 - p)
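Both standard error formulas are one-line calculations. The input values below (s = 100, n = 400, p = 50%, n = 100) are hypothetical and chosen only to keep the arithmetic easy to follow; the function names are illustrative.

```python
import math


def standard_error_of_mean(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def standard_error_of_percentage(p, n):
    """Standard error of a percentage: sqrt(p * q / n), with q = 100 - p."""
    q = 100 - p
    return math.sqrt(p * q / n)


# Hypothetical inputs, not taken from the slides.
print(standard_error_of_mean(s=100, n=400))        # 100 / 20
print(standard_error_of_percentage(p=50, n=100))   # sqrt(2500 / 100)
```

Quadrupling the sample size only halves the standard error, because n enters under a square root.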
Computing Sample Size Using The
Confidence Interval Approach
• To compute sample size, three factors need
to be considered:
– amount of variability believed to be in the
population
– desired accuracy
– level of confidence required in your estimates
of the population values
Determining Sample Size Using a
Mean
• Formula for a percentage: $n = (p q z^2) / e^2$
• Formula for a mean: $n = (s^2 z^2) / e^2$
• Where
– n = sample size
– z = level of confidence (indicated by the number of standard
errors associated with it)
– s = variability indicated by an estimated standard deviation
– p = estimated percentage in the population (p × q indicates
its variability)
– q = (100 - p)
– e = acceptable error in the sample estimate of the
population value
Determining Sample Size Using a
Mean: An Example
• 95% level of confidence (1.96)
• Standard deviation of 100 (from previous
studies)
• Desired precision is 10 (+ or -)
• Therefore n ≈ 384
– $(100^2 \times 1.96^2) / 10^2 = 384.16$
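The worked example can be reproduced in a few lines. The sketch below implements both confidence interval formulas; the function names are illustrative, and the percentage inputs (p = 50, e = 5) are hypothetical values added only to show the second formula in use.

```python
def sample_size_for_mean(s, z, e):
    """n = (s^2 * z^2) / e^2, using an estimated standard deviation s."""
    return (s ** 2 * z ** 2) / e ** 2


def sample_size_for_percentage(p, z, e):
    """n = (p * q * z^2) / e^2, with p and e in percentage points and q = 100 - p."""
    q = 100 - p
    return (p * q * z ** 2) / e ** 2


# Slide example: 95% confidence (z = 1.96), s = 100, precision of +/- 10.
n_mean = sample_size_for_mean(s=100, z=1.96, e=10)
print(f"mean:       n = {n_mean:.2f}")

# Hypothetical percentage example: p = 50%, precision of +/- 5 points.
n_pct = sample_size_for_percentage(p=50, z=1.96, e=5)
print(f"percentage: n = {n_pct:.2f}")
```

The slide's 384 is the computed value rounded to the nearest whole number; rounding up instead, so the target precision is always met, would give 385.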
Practical Considerations in
Sample Size Determination
• How to estimate variability in the
population
– prior research
– experience
– intuition
• How to determine amount of precision
desired
– small samples are less accurate
– how much error can you live with?
Practical Considerations in
Sample Size Determination
• How to calculate the level of confidence
desired
– risk
– normally use either 95% or 99%
Determining Sample Size
• Higher n (sample size) needed when:
– the standard error of the estimate is high
(population has more variability in the
sampling distribution of the test statistic)
– higher precision (low degree of error) is
needed (i.e., it is important to have a very
precise estimate)
– higher level of confidence is required
• Constraints: cost and access
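Each of the three drivers can be varied in turn to see its effect on the required sample size. The sketch below starts from the earlier example (s = 100, z = 1.96, e = 10) and changes one input at a time; the alternative values are hypothetical.

```python
def sample_size_for_mean(s, z, e):
    """n = (s^2 * z^2) / e^2."""
    return (s ** 2 * z ** 2) / e ** 2


baseline = sample_size_for_mean(s=100, z=1.96, e=10)

# Change one factor at a time, holding the other two fixed.
more_variability = sample_size_for_mean(s=200, z=1.96, e=10)  # doubled s
more_precision = sample_size_for_mean(s=100, z=1.96, e=5)     # halved e
more_confidence = sample_size_for_mean(s=100, z=2.58, e=10)   # 99% instead of 95%

print(f"baseline:              {baseline:.0f}")
print(f"twice the variability: {more_variability:.0f}")
print(f"twice the precision:   {more_precision:.0f}")
print(f"99% confidence:        {more_confidence:.0f}")
```

Because s and e are squared, doubling the variability or halving the acceptable error each multiplies the required sample size by four.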
Notes About Sample Size
• Population size does not determine sample
size.
• What most directly affects sample size is
the variability of the characteristic in the
population.
– Example: if all population elements have the
same value of a characteristic, then we only
need a sample of one!
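Both notes can be read off the formula for a mean. In the sketch below, the illustrative function `required_sample_size` takes no population-size argument at all, and with zero variability the formula yields 0, which is raised to the practical minimum of one respondent. Rounding up to a whole respondent is an assumption of this sketch.

```python
import math


def required_sample_size(s, z, e):
    """Sample size for a mean, rounded up, with a floor of one respondent.

    Population size is not an input: only variability (s),
    confidence (z), and acceptable error (e) matter.
    """
    n = (s ** 2 * z ** 2) / e ** 2
    return max(1, math.ceil(n))


# Every population element has the same value, so s = 0.
print(required_sample_size(s=0, z=1.96, e=10))

# More variability in the characteristic calls for a larger sample.
print(required_sample_size(s=50, z=1.96, e=10))
print(required_sample_size(s=100, z=1.96, e=10))
```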