MARKETING RESEARCH
LECTURE 15-16.
SAMPLING.
Note: Sampling is fundamental to most human behavior. When tasting new food, a person
will typically eat one or two bites and then form an opinion. In the same way, a researcher
“tastes” (samples) a universe of subjects and generalizes about how the universe
behaves.
Is a sample needed?
A marketing researcher may identify a problem and find that secondary data are not sufficient to
clarify the issue (e.g., “a decision must be made about which of the two package designs to
use”).
Even assuming that sampling would be useful, the researcher must still evaluate how useful the
primary information would be.
It may turn out that although the secondary information is not perfect, it is “good enough” to
solve at least a part of the problem.
Only after passing this stage should the researcher decide what type of sample to
take.
The basic issues of Sample Design.
1. The sample must be representative of the population of interest in terms of the key responses. (This
does not imply that the sample must be made up strictly of the typical general population; a
sample can be very different demographically and still be representative of some types
of behavior).
2. Dealing with non-response may be an even more complex issue than the sampling plan
itself.
In order to cover sampling in a systematic way, the notes will address the following
sequential issues:
1. Who is the target population (frame)?
2. What method (process) will be used to elicit responses?
3. How many will be sampled?
4. How will the sampling points be selected?
5. What will be done about non-response?
1. WHO IS THE TARGET POPULATION?
The target population (frame) is that part of the total population (universe) to which the
study is directed. E.g. for a company, selling automobiles in the United States, the
universe could be the entire US population plus foreign visitors, and the frame might be
people aged 18 or over. (Alternatively, the focus could be on relatively well-off individuals,
and hence the target might be those with annual incomes above $100000).
A target population that is too large will produce responses that are meaningless; on
the other hand, a frame that is too narrow will tend to exclude potentially useful responses.
The choice of target population is a balance between including irrelevant sampling points and
excluding relevant ones.
2. WHAT METHOD WILL BE USED?
Part of the job of designing a sampling plan is to specify how the data will be collected.
This means that, for a survey, a choice must be made between phone, personal, mail, and so
forth. Since the choice of the method of data collection often influences the type of sample
to be drawn, the method is an integral part of the sample determination.
3. HOW MANY WILL BE SAMPLED?
It is a very complex issue and a whole branch of Bayesian statistics is devoted to this
question. However, for a practitioner, four major considerations are paramount: statistical
precision, credibility, company policy, and financial constraints.
Statistical precision.
The larger the sample – the more confident a researcher can be that the results are
representative of the things being measured. In general, the precision of a sample is
related to the square root of the sample size – i.e. to double the precision of an estimate,
the sample must be four times as large.
(By precision we mean the level of uncertainty about the value of the construct being
measured).
Once the sample is drawn, the level of precision is already determined. If the level of
precision needed can be specified ahead of time, however, it is possible to determine the
minimum required sample size (which refers to the number of responses, not the number
of individuals in the target sample).
The figures below refer to measuring a single key item, while many surveys have multiple
items of interest; consequently, the formulas below are not yet an absolute answer, but a
guideline to approach more complex statistical issues.
Averages.
E.g., the average income of US households.
$\bar{X}$ represents the sample mean. However, the sample $\bar{X}$ may not exactly equal the
population average due to idiosyncrasies of the sample. The most accurate statement that can
be made is that the true mean is within some range about $\bar{X}$.
In order to quantify this range, two important facts are used:
1. The sample mean is approximately normally distributed.
2. The standard deviation of the sample mean is the standard deviation calculated in the
sample divided by the square root of the sample size:
$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
Using these two facts, it is possible to construct a range, known as a confidence interval,
into which the true mean will fall with a given level of certainty (confidence level).
This range is given by
$$\text{Range in which true mean falls} = \text{Sample mean} \pm z \frac{s}{\sqrt{n}}$$
where z is a constant drawn from a standard normal table which depends on the level of
confidence desired (e.g., the commonly used 95% confidence interval is given by z = 1.96), and
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
(the standard deviation is the square root of the variance above: $s = \sqrt{s^2}$).
Hence if we took a sample of 400 households in the US and measured their average
income as $15172 and the standard deviation of their income as 6216, the 95%
confidence interval for the true household average income in the US would be
$$15172 \pm 1.96 \left( \frac{6216}{\sqrt{400}} \right) = 15172 \pm 609$$
The precision of this estimate is determined by sample size (n), standard deviation (s), and
the confidence level desired.
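The interval arithmetic for the household income example can be checked with a short script. This is a minimal sketch and not part of the original notes: the function name `mean_confidence_interval` is hypothetical, and the default z = 1.96 assumes a 95% confidence level and an approximately normal sample mean.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Return (lower, upper, half_width) for the interval mean +/- z * s / sqrt(n)."""
    half_width = z * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width, half_width

# Household income example from the text: n = 400, mean = 15172, s = 6216.
lower, upper, half_width = mean_confidence_interval(15172, 6216, 400)
print(f"15172 +/- {half_width:.0f}, i.e. from {lower:.0f} to {upper:.0f}")
```

Note that quadrupling n to 1600 halves the half-width, which is the square-root relation described under "Statistical precision".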
Formula to determine sample size in advance.
1. Decide on an acceptable confidence level.
2. Estimate the standard deviation of the data (e.g., incomes).
(To do this, either run a pilot test, estimate it subjectively from prior research, or else take the
range of the distribution and divide it by 6).
3. Solve the following formula for the necessary sample size:
$$\text{Tolerance level acceptable} = z \frac{s}{\sqrt{n}}$$
E.g., assuming s = 3000, the acceptable tolerance is 100, and we want to be 95% sure
that we are within the tolerance, we then get
$$100 = 1.96 \frac{3000}{\sqrt{n}}$$
so that n = 3457.44, which rounds up to 3458.
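Solving the tolerance formula for n gives n = (z s / tolerance)^2. A minimal sketch of that step, assuming the normal approximation used in these notes (the function name `sample_size_for_mean` is hypothetical):

```python
import math

def sample_size_for_mean(sd, tolerance, z=1.96):
    """Return (exact n, n rounded up) from tolerance = z * sd / sqrt(n)."""
    n_exact = (z * sd / tolerance) ** 2
    return n_exact, math.ceil(n_exact)

# Example from the text: s = 3000, tolerance = 100, 95% confidence.
n_exact, n_required = sample_size_for_mean(sd=3000, tolerance=100)
print(round(n_exact, 2), n_required)
```

The result is always rounded up, since rounding down would leave the tolerance slightly wider than requested.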
Proportions.
Assume we wish to estimate the proportion of our accounts who also buy from a major
competitor. The sample proportion is approximately normally distributed.
The proportion of our accounts who buy from the competitor can be estimated by a
confidence interval:
$$p \pm z_{\alpha} \sqrt{\frac{p(1 - p)}{n}}$$
E.g., we sample 400 of our 30000 accounts and find that 32% also buy from our
competitor. The 95% confidence interval for the percent of all our accounts who buy from
the major competitor is given by:
$$\text{Range of true proportion} = \text{Sample proportion} \pm z_{\alpha} \sqrt{\frac{p(1 - p)}{n}} = 0.32 \pm 1.96 \sqrt{\frac{(0.32)(0.68)}{400}}$$
which equals a range from 27.4% to 36.6% (where the sample proportion is 32%).
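The same check can be done for the proportion interval. This is a sketch under the normal approximation stated above; the function name `proportion_confidence_interval` is hypothetical, and the finite size of the account list (30000) is ignored here, as it is in the text.

```python
import math

def proportion_confidence_interval(p, n, z=1.96):
    """Return (lower, upper) for p +/- z * sqrt(p * (1 - p) / n), with p as a fraction."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Example from the text: 400 accounts sampled, 32% also buy from the competitor.
lower, upper = proportion_confidence_interval(0.32, 400)
print(f"{lower:.1%} to {upper:.1%}")
```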
__________
As in the case of the mean, the formula can be used to estimate the necessary sample size in
advance. In this case the required sample size can be derived from the following:
$$\text{Tolerance} = z_{\alpha} \sqrt{\frac{p(1 - p)}{n}}$$
Assuming we wanted to be accurate within 3% at the 95% confidence level, this
reduces to (working in percentage points, and assuming p is about 10% in this example):
$$3 = 1.96 \sqrt{\frac{p(100 - p)}{n}} = 1.96 \sqrt{\frac{(10)(90)}{n}}$$
or
$$\sqrt{n} = 1.96 \cdot \frac{30}{3} = 19.6$$
therefore,
n = 384.16, which rounds up to 385.
__________
Alternatively, we could adopt the “conservative” procedure and assume p is 50%. This
produces the maximum sample size needed for a given tolerance. Actually, any p between 0.3
and 0.7 produces fairly similar results. In this case, we would obtain
$$3 = 1.96 \sqrt{\frac{(50)(50)}{n}}$$
or
$$\sqrt{n} = 1.96 \cdot \frac{50}{3} \approx 32.67$$
so that n = 1067.1, which rounds up to 1068.
(The result is the so-called “magic sample size” of between 1000 and 1500, which is characteristic
of most national samples.)
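The two proportion examples (p = 10% and the conservative p = 50%) follow from n = z^2 p(1 - p) / tolerance^2. The sketch below, with the hypothetical function name `sample_size_for_proportion`, also tabulates a few values of p to show how the required n changes between 0.3 and 0.7.

```python
import math

def sample_size_for_proportion(p, tolerance, z=1.96):
    """Smallest whole n with z * sqrt(p * (1 - p) / n) <= tolerance (fractions, not percent)."""
    return math.ceil(z ** 2 * p * (1 - p) / tolerance ** 2)

n_at_10_percent = sample_size_for_proportion(0.10, 0.03)
n_conservative = sample_size_for_proportion(0.50, 0.03)
print(n_at_10_percent, n_conservative)

# Required n for a 3% tolerance across a range of assumed proportions.
for p in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(p, sample_size_for_proportion(p, 0.03))
```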
Finite Population Correction.
Both previous procedures apply to a situation where the target population is essentially
infinite (final consumer product studies).
When the sample gets large in relation to the target population (over 10 to 20 percent of its
size), however, these formulas will overestimate the required sample. Assuming sample
points are expensive, the finite population correction factor should be employed. (The correction
factor accounts for the fact that when the sample size n approaches the population size
N, the uncertainty about the population average drops to zero.) This correction
converts the two formulas for determining sample size to the more accurate
$$\text{Numerical tolerance} = z_{\alpha} \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
where N is the size of the target population, and
$$\text{Percent tolerance} = z_{\alpha} \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$$
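To see how much the correction matters, the percent-tolerance formula can be solved for n. The closed form used below, n = n0 N / (n0 + N - 1) with n0 the uncorrected sample size, is an algebraic rearrangement that does not appear in the notes; the function names and the population of 2000 are hypothetical, while 30000 is the account list from the proportion example.

```python
import math

def tolerance_with_fpc(p, n, population_size, z=1.96):
    """Percent tolerance (as a fraction) including the finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error * fpc

def sample_size_with_fpc(p, tolerance, population_size, z=1.96):
    """Smallest whole n whose corrected tolerance does not exceed the target."""
    n0 = z ** 2 * p * (1 - p) / tolerance ** 2  # uncorrected ("infinite population") size
    return math.ceil(n0 * population_size / (n0 + population_size - 1))

n_large_population = sample_size_with_fpc(0.5, 0.03, 30000)  # account list from the text
n_small_population = sample_size_with_fpc(0.5, 0.03, 2000)   # hypothetical small frame
print(n_large_population, n_small_population)
```

With 30000 accounts the correction trims the uncorrected 1068 only slightly; with a frame of 2000 the saving is substantial, which is the case the correction is meant for.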
Financial constraints.
$$n = \frac{SP - EN - D}{IT}$$
where SP is the budget, EN is the fixed cost of the study, D is dinners, trips, or whatever
else we can charge to it, and IT is the variable cost of a sample point or interview;
or,
$$n = \frac{(\text{budget}) - (\text{fixed costs}) - (\text{trips and dinners})}{(\text{variable cost per sample point})}$$
This formula is, in reality, every bit as important in determining sample size as those
relating to the statistical precision of the results.
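A small sketch of how the budget formula interacts with precision. All monetary figures below are hypothetical and chosen only for illustration; the function name `affordable_sample_size` is likewise an assumption.

```python
import math

def affordable_sample_size(budget, fixed_costs, other_charges, cost_per_interview):
    """n = (SP - EN - D) / IT, rounded down to whole interviews."""
    return math.floor((budget - fixed_costs - other_charges) / cost_per_interview)

# Hypothetical figures: 50000 budget, 10000 fixed costs, 2000 other charges, 25 per interview.
n_affordable = affordable_sample_size(50_000, 10_000, 2_000, 25)

# Tolerance this n buys for a proportion, using the conservative p = 50%.
tolerance = 1.96 * math.sqrt(0.5 * 0.5 / n_affordable)
print(n_affordable, round(tolerance, 3))
```

Reading the result in this direction (what precision does the budget buy?) is often more practical than starting from a desired tolerance.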
Planning for subsamples.
The formulas above consider aggregate-level parameters. Once we break our sample
down into groups, the precision within each group is lower still. E.g., suppose there are nine groups in
the population and a total sample of about 1350. Even if the groups are equal in size (which is rarely true), this leaves only 150
respondents per group, and consequently the 95% confidence interval for a fraction in
a subgroup is about plus or minus 8% (e.g., 42% to 58%).
To represent each of the 9 categories as precisely as the aggregate would require a sample 9 times larger (which is costly).
Special techniques are used to minimize costs in multiple-group samples.
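The loss of precision in subgroups can be illustrated numerically. The total sample of 1350 below is an assumption inferred from nine groups of 150 respondents; p = 50% is the conservative choice, and the function name `proportion_margin` is hypothetical.

```python
import math

def proportion_margin(n, p=0.5, z=1.96):
    """Half-width of the z-based interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

total_sample = 1350  # assumption: 9 groups x 150 respondents
groups = 9
per_group = total_sample // groups

overall_margin = proportion_margin(total_sample)
subgroup_margin = proportion_margin(per_group)
print(per_group, round(overall_margin, 3), round(subgroup_margin, 3))
```

Splitting into nine equal groups makes each subgroup margin three times (the square root of nine) as wide as the overall margin.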
SELECTION OF SAMPLE POINTS.
Two general approaches exist: pure probabilistic and purposive sampling. In the first case
each subject has an equal probability of being selected; in the second case, researchers put
greater importance on some segments of the target population (and, consequently,
somewhat overrepresent them).
Simple random sampling. (The most democratic).
It has many nice properties, although the method is not the most efficient.
E.g., a sample of 20 can consist entirely of smokers, severely distorting the outcome of research
about the danger of cigarettes. This is a rare case, but the smaller a sample becomes,
the more likely this kind of error becomes.
In order to select subjects randomly from the universe (which is a long list of all existing
subjects), random numbers can be used.
nth Name (Systematic)
The idea is to select a starting point and then to select every nth subject. E.g., if I wished to
draw a target sample of 30 from a target population of 1200, I might arbitrarily select the
11th individual (a number between 1 and 40) as a starting point and then individuals 51, 91, 131,
…, 1171 to complete my 30 target sample points.
The major problem is subtle cycles in the list, which can repeat regularly: the
sampling interval may coincide with the period of such a cycle.
The method is considered more advanced than pure random sampling. Most consumer
phone and mail surveys are based on nth name designs.
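The nth name procedure is easy to express in code. A minimal sketch, assuming the list positions are numbered from 1 and that the population size is a multiple of the sample size (as in the 1200 / 30 example); the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return list positions start, start + interval, start + 2 * interval, ...

    If no start is given, one is drawn at random from 1..interval.
    """
    interval = population_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the sampling interval")
    return [start + step * interval for step in range(sample_size)]

# Example from the text: 30 names out of 1200, starting at the 11th.
picks = systematic_sample(1200, 30, start=11)
print(picks[:4], picks[-1])
```

Drawing the starting point at random (rather than arbitrarily) is what gives every list position the same chance of selection.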
Stratified.
For many studies, the target population can be divided into segments with different
characteristics. In this case the information about the segments (strata) can be used to
design the sampling plan. Specifically, separate sampling plans can be drawn for each
stratum. This guarantees that each stratum will be adequately represented, something
which simple random sampling does not.
Assuming different samples are drawn from each stratum, the mean and standard
deviation of a variable in the entire target population can then be estimated as follows:
Let
$N_i$ = size of the ith stratum
$n_i$ = sample size of the ith stratum
N = size of the total target population
n = total sample size
$w_i$ = weight of the estimate of the ith stratum = $N_i / N$
k = number of strata
$s_i$ = standard deviation in the ith stratum
$\bar{x}_i$ = mean in the ith stratum
Then
$$\bar{x} = \sum_{i=1}^{k} w_i \bar{x}_i$$
and
$$s_{\bar{x}} = \sqrt{\sum_{i=1}^{k} w_i^2 s_{\bar{x}_i}^2} = \sqrt{\sum_{i=1}^{k} w_i^2 \frac{s_i^2}{n_i}}$$
For proportions, the formulas become:
$$p = \sum_{i=1}^{k} w_i p_i$$
and
$$s_p = \sqrt{\sum_{i=1}^{k} w_i^2 \frac{p_i (1 - p_i)}{n_i}}$$
a. Proportionate stratified sampling. The number sampled in each stratum is proportionate
to the size of the stratum.
The sample size of each stratum ($n_i$) is given by the proportion of the population that
falls into that stratum ($N_i / N$). The variance of the estimate of the mean
becomes
$$s_{\bar{x}}^2 = \sum_{i=1}^{k} \left( \frac{N_i}{N} \right)^2 \frac{s_i^2}{n_i}$$
and since
$$n_i = \frac{N_i}{N} n$$
then
$$s_{\bar{x}} = \sqrt{\frac{\sum_{i=1}^{k} w_i s_i^2}{n}}$$
E.g., beer consumers were divided into four segments (strata) on the basis of
demographics. A proportionate sample would then be drawn with the sample size in each
stratum proportionate to the size of the stratum.
Stratum   Size of stratum   Stratum sample size   Average beer consumption   St. dev. of beer consumption
1         8000              80                    20                         4
2         6000              60                    10                         4
3         4000              40                    15                         5
4         2000              20                    6                          2
Total     20000             200
$$\bar{x} = 0.4(20) + 0.3(10) + 0.2(15) + 0.1(6) = 8 + 3 + 3 + 0.6 = 14.6$$
$$s_{\bar{x}}^2 = (0.4)^2 \frac{4^2}{80} + (0.3)^2 \frac{4^2}{60} + (0.2)^2 \frac{5^2}{40} + (0.1)^2 \frac{2^2}{20} = 0.032 + 0.024 + 0.025 + 0.002 = 0.083$$
so that $s_{\bar{x}} = \sqrt{0.083} \approx 0.288$.
Alternatively, the shortcut formula gives
$$s_{\bar{x}} = \sqrt{\frac{0.4(4)^2 + 0.3(4)^2 + 0.2(5)^2 + 0.1(2)^2}{200}} \approx 0.288$$
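The beer example can be reproduced with a few lines of code, which also confirms that the full formula and the proportionate-sampling shortcut agree. This is a sketch using the stratum figures from the example; the function and variable names are hypothetical.

```python
import math

stratum_sizes = [8000, 6000, 4000, 2000]
sample_sizes = [80, 60, 40, 20]   # proportionate to stratum size
means = [20, 10, 15, 6]
sds = [4, 4, 5, 2]

def stratified_mean_and_se(stratum_sizes, sample_sizes, means, sds):
    """Weighted mean and its standard error, sqrt(sum of w_i^2 * s_i^2 / n_i)."""
    total = sum(stratum_sizes)
    weights = [size / total for size in stratum_sizes]
    mean = sum(w * m for w, m in zip(weights, means))
    variance = sum(w ** 2 * s ** 2 / n for w, s, n in zip(weights, sds, sample_sizes))
    return mean, math.sqrt(variance)

mean, se = stratified_mean_and_se(stratum_sizes, sample_sizes, means, sds)

# Shortcut, valid only for proportionate samples: sqrt(sum of w_i * s_i^2 / n).
weights = [size / sum(stratum_sizes) for size in stratum_sizes]
se_shortcut = math.sqrt(sum(w * s ** 2 for w, s in zip(weights, sds)) / sum(sample_sizes))
print(round(mean, 1), round(se, 3), round(se_shortcut, 3))
```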
b. Disproportionate sampling. “Undemocratic”, since some strata are deemed more important
than others. A disproportionately large part of the sample is then obtained from these
important strata.
Some segments can be more important than others; consequently, a researcher may wish to
sample a disproportionately large number of subjects from certain strata.
On the other hand, given financial constraints, a researcher may collect a greater
portion of the data from “easy-to-question” strata in order to get a bigger sample.
Another, more logical, reason for disproportionate sampling is statistical in nature.
E.g., there is a stratum with average consumption of 3 and standard deviation of 4; another stratum
has average consumption of 320 and standard deviation of 0. In the second stratum, a
single observation would be sufficient and any additional one redundant. In order to produce
the minimum variance (most reliable) estimate of the overall mean, only one observation
is to be taken from the second stratum, and all others that the budget allows from the first one.
The general formula for optimal sampling to minimize the total variance of the estimate is:
$$n_i = \frac{w_i s_i}{\sum_{j=1}^{k} w_j s_j} \, n$$
and the resulting standard error of the mean is:
$$s_{\bar{x}} = \sqrt{\frac{\left( \sum_{i=1}^{k} w_i s_i \right)^2}{n}}$$
Returning to the beer consumption example, the optimal sample allocations will be as
follows:
$$n_1 = \frac{0.4(4)}{0.4(4) + 0.3(4) + 0.2(5) + 0.1(2)} \times 200 = 80$$
$n_2$, $n_3$, and $n_4$ will correspondingly be 60, 50, and 10.
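The optimal allocation rule, n_i proportional to w_i s_i, can be sketched as follows, using the weights and standard deviations of the beer example. The function name `optimal_allocation` is hypothetical, and the values are left unrounded inside the function because rounded allocations will not in general add up exactly to n.

```python
def optimal_allocation(weights, sds, total_n):
    """Allocate total_n across strata in proportion to w_i * s_i."""
    products = [w * s for w, s in zip(weights, sds)]
    total = sum(products)
    return [total_n * product / total for product in products]

weights = [0.4, 0.3, 0.2, 0.1]
sds = [4, 4, 5, 2]
allocation = [round(n_i) for n_i in optimal_allocation(weights, sds, 200)]
print(allocation)
```

A stratum with zero standard deviation receives zero under this rule, so in practice at least one observation would still have to be assigned to it by hand.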
NOTE! The procedure requires knowing the standard deviations in each stratum in
advance. Since the true standard deviations are never known exactly, we substitute either
subjective estimates or the results of a prior study.
Assuming we now proceeded to take another survey of size 200 according to the
disproportionate approach, the results might be as follows:
Stratum   Size of stratum   Stratum sample size   Average beer consumption   St. dev. of beer consumption
1         8000              80                    20                         4
2         6000              60                    10                         4
3         4000              50                    15                         5
4         2000              10                    6                          2
Total     20000             200
Average beer consumption would then be as follows:
$$\bar{x} = 0.4(20) + 0.3(10) + 0.2(15) + 0.1(6) = 14.6$$
The standard deviation would be:
$$s_{\bar{x}} = \sqrt{\frac{[0.4(4) + 0.3(4) + 0.2(5) + 0.1(2)]^2}{200}} = \sqrt{\frac{4^2}{200}} \approx 0.283$$
In this case, the standard deviation is only slightly (about 2 percent) smaller under the
disproportionate sampling plan, a surprisingly typical result. In fact, unless the standard
deviations of the strata are very different, disproportionate sampling does very little to the
variance estimate. For example, if 50 were sampled from each of the four strata, the
standard deviation of the mean would be (assuming the estimates of the mean and
standard deviations were unchanged):
$$s_{\bar{x}} = \sqrt{(0.4)^2 \frac{4^2}{50} + (0.3)^2 \frac{4^2}{50} + (0.2)^2 \frac{5^2}{50} + (0.1)^2 \frac{2^2}{50}} = \sqrt{\frac{5.04}{50}} = \sqrt{0.1008} \approx 0.317$$
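The three allocation plans discussed for the beer example (proportionate, optimal, and equal) can be compared side by side. A sketch, assuming the same weights and stratum standard deviations throughout; the names are hypothetical.

```python
import math

weights = [0.4, 0.3, 0.2, 0.1]
sds = [4, 4, 5, 2]

def standard_error(sample_sizes):
    """sqrt(sum of w_i^2 * s_i^2 / n_i) for a given allocation of the sample."""
    variance = sum(w ** 2 * s ** 2 / n for w, s, n in zip(weights, sds, sample_sizes))
    return math.sqrt(variance)

plans = {
    "proportionate": [80, 60, 40, 20],
    "optimal": [80, 60, 50, 10],
    "equal": [50, 50, 50, 50],
}
results = {name: standard_error(sizes) for name, sizes in plans.items()}
for name, se in results.items():
    print(f"{name:>13}: {se:.3f}")
```

All three plans use 200 interviews, and the spread between the best and worst standard error is small, which is the point the text makes about disproportionate sampling.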
The point, therefore, is that for most marketing surveys, sampling disproportionately to get
the most reliable estimates is not very useful.
The major reason for using a stratified sample, therefore, is to ensure adequate
representation of key subgroups of the target population.
Stratified samples often are applied to situations where more than one variable serves as
a basis for stratification. For example, I might be interested in a consumer product, which
appealed primarily to middle-aged, high-income consumers. Given a budget which allowed
for a sample of 2000, a stratified sampling plan might look like:
                            Income level
Age group   Under $10000   $10000-$19999   $20000-$29999   $30000+
Under 30    50             50              100             200
30-50       50             200             400             500
51+         50             100             100             200
Universal Sampling (Census).
Impossible or overly expensive for final consumer markets; but, if dealing with 30-40
major clients in an industrial market, then it is both logical and statistically desirable to survey
all possible subjects.
Convenience.
Useful for hypothesis generation and initial pilot research (i.e., learn to run surveys on
convenience samples).
Quota.
A compromise between stratified and convenience sampling. E.g., a firm may want the opinions
of at least 30 housewives between ages 30 and 45. Hence a quota sample may be generated
by having the interviewer collect data from the first 30 such women who agree to participate.
Cluster samples.
Cluster samples are exactly what the name implies – samples, gathered in clusters.
Motivation – cost reduction. E.g., area sampling (collecting a certain number of interviews
from a certain geographical region), or list sampling (sampling all people, whose names
begin with a set of randomly selected letters).
Sequential sampling.
A small sample is drawn and the results are analyzed. If the results are sufficiently clear, a
decision is made and the rest of the sample is not drawn. If not, another sample is drawn
sequentially.
Multistage samples (!)
A mix of several approaches. E.g., for big samples, first – select a set of areas; second –
select (e.g., randomly) a set of locations within areas; third – select individuals within
locations (e.g., randomly, or systematically). This approach turns out to be both
representative and reasonably efficient.
SOURCES OF LISTS
It is not a trivial task to find a good list (lists can be obsolete, incomplete, contain errors, etc.).
In the United States, suppliers charge approximately $50 to $120 per 1000 list items (e.g., Fritz
Hofheimer’s 1977 catalogue).
SCREENING QUESTIONS.
These are asked in order to avoid interviewing subjects who do not qualify.
E.g.,
1. Do you smoke?
Yes____ No____
(If the answer is yes, continue to question 2; if no, terminate the interview.)
When the target population is very small (e.g., hang gliders), the interviewer uses a referral
practice, that is, asking the first respondent to name others who would qualify for the
research. (Screening in this case would turn out to be very inefficient.)
NON-RESPONSE PROBLEM
Too low a response rate raises suspicion about whether the survey is representative or not.
The small number of subjects who responded may possess special traits and be different from
the universe.
Non-coverage.
Unlisted phone numbers, ghetto areas that are difficult to cover, low responses on mail
surveys, etc. are examples of non-coverage.
Non-response.
The subject cannot be found (went to Cyprus).
The subject is not at home (e.g., asking “What do you do during the evening?” in an evening
survey will generate a disproportionately high percentage of “stay home” answers, while those
who are not at home (with different answers) cannot be reached).
Refusal (afraid, have no interest, time pressure, etc.).
Typology of potential respondents:
1. Happy to respond (15 percent).
2. Willing to be convinced to respond with modest effort (50%).
3. Can be bought at a high price (15%).
4. No way to make them respond (10%).
5. Not even covered by the process (10%).
Consequently, only categories 1, 2, and 3 can be covered, resulting in average response rates
from 10 to 70%.
Determinants of response rate.
Interest.
Can be the major determinant of response rate. The higher – the better. Is subjective and
dependent on other parameters below.
Length.
The longer the questionnaire is, the less likely someone is to begin it/complete it/answer all
questions/etc.
Opening gambit.
The introduction, which invites the respondent to participate, is extremely important.
Besides the interviewer’s appearance, voice, etc., different appeals may also be useful, e.g.: mercy
(“I am a poor student…”), self-interest (“your opinion will count”), or duty (“you should
express your views”). Guarantees of anonymity are also useful in persuading reluctant
respondents to participate, as are the credentials of the surveying company. Use of the
person’s name also seems to increase response rate (e.g., in mail surveys); a return
envelope, “classy” mail, etc. also count.
Incentives/bribery.
Works, but respondents may start to give the answers they think the researcher wishes to hear instead of true
ones, leading to additional biases.
Cheap stimuli are ineffective (e.g., a reward of up to a dollar per long interview, or a small
souvenir), while a chance to win a color TV set worth $500 turns out to be more effective.
Format.
Use adequate white space, good plain paper, standard fonts, etc.
Advance Notice.
In order to secure cooperation, it is common to give advance notice (by phone, postcard, or
letter) of the impending study. This is often useful in increasing both response rate and
quality.
Callback/Follow-up.
Subsequent attempts to reach a respondent if s/he was not at home the first time. Two- or three-attempt callback plans are typical, and can raise the efficiency up to 80 percent.
Overall.
There must be a trade-off between cost per subject and effect. Some basic studies of a
psychological nature may make do with moderately biased samples, with no call-backs,
incentives, etc.; while average income studies require representative samples,
which means more expenditure.
The problem of dropouts in multi-wave surveys is typical and ranges from 5% in well-designed/controlled
settings up to 25% in mostly uncontrolled panels.
The researcher may waste resources by throwing away partially involved subjects (and the
data collected from them) or may still incorporate incomplete data into the research and run
the risk of measuring differences between people rather than changes over time.
Weighting to Account for Non-response.
E.g., a mail survey attempted to find the number of people who would be interested in
buying a new product. There was a 30 percent response rate, and of the respondents, 40
percent said they would buy the product. One estimate for the percent of the population who
would be interested in buying (which, incidentally, would greatly overstate actual buying)
would thus be 40%. On the other hand, we could assume that the other 70% did not return
the survey because they were not interested in the product. In that case, the appropriate
estimate would be 40% × 30% = 12%. Actually, the number would probably lie somewhere
between 12% and 40%, but 12% is more likely to be close to the true value than 40%.
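The two estimates are the endpoints of one weighted average: the population figure equals the respondents' rate times the response rate, plus whatever rate is assumed for non-respondents times the non-response rate. A minimal sketch; the function name `interest_estimate` and the intermediate 10% assumption are hypothetical.

```python
def interest_estimate(response_rate, interest_among_respondents, interest_among_nonrespondents):
    """Population interest as a weighted average of respondents and non-respondents."""
    return (response_rate * interest_among_respondents
            + (1 - response_rate) * interest_among_nonrespondents)

# Mail survey from the text: 30% responded, 40% of respondents would buy.
upper = interest_estimate(0.30, 0.40, 0.40)   # non-respondents resemble respondents
lower = interest_estimate(0.30, 0.40, 0.00)   # no non-respondent is interested
middle = interest_estimate(0.30, 0.40, 0.10)  # hypothetical: 10% of non-respondents interested
print(round(lower, 2), round(middle, 2), round(upper, 2))
```

The estimate is only as good as the assumption about non-respondents, which the survey itself cannot supply.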
This is only a single instance of the difficulties related to non-response. I refer you to other
sources for more professional knowledge of how to deal with it.
(Read out loud pp.310-311 about a historical fatal sample design [Roosevelt vs. Landon
victory prediction]).