MARKETING RESEARCH LECTURE 15-16. SAMPLING.

Note: Sampling is fundamental to most human behavior. When tasting a new food, a person will typically take one or two bites and then form an opinion. In the same way, a researcher "tastes" (samples) a universe of subjects and generalizes about how the universe behaves.

Is a sample needed? An MR researcher may identify a problem and find that secondary data are not sufficient to clarify the issue (e.g., "a decision must be made about which of two package designs to use"). Even assuming that sampling would be useful, the researcher must still evaluate how useful the primary information would be. It may turn out that, although the secondary information is not perfect, it is "good enough" to solve at least part of the problem. Only after passing this stage must the researcher decide what type of sample to take.

The basic issues of Sample Design.
1. The sample must be representative of the population of interest in terms of the key responses. (This does not imply that the sample must consist strictly of typical members of the general population; a sample can be very different demographically and still be representative of some types of behavior.)
2. Dealing with non-response may be an even more complex issue than the sampling plan itself.

In order to cover sampling in a systematic way, these notes address the following sequential issues:
1. Who is the target population (frame)?
2. What method (process) will be used to elicit responses?
3. How many will be sampled?
4. How will the sampling points be selected?
5. What will be done about non-response?

1. WHO IS THE TARGET POPULATION?
The target population (frame) is that part of the total population (universe) to which the study is directed. E.g., for a company selling automobiles in the United States, the universe could be the entire US population plus foreign visitors, and the frame might be people aged 18 or over. (Alternatively, the focus could be on relatively well-off individuals, and hence the target might be those with annual incomes above $100,000.) Too broad a target population will produce responses that are meaningless; on the other hand, too narrow a frame will tend to exclude potentially useful responses. The choice of the target population is a matter of balancing between including irrelevant sampling points and excluding relevant ones.

2. WHAT METHOD WILL BE USED?
Part of the job of designing a sampling plan is to specify how the data will be collected. For a survey, this means a choice must be made b/w phone, personal, mail, and so forth. Since the choice of the method of data collection often influences the type of sample to be drawn, the method is an integral part of sample determination.

3. HOW MANY WILL BE SAMPLED?
This is a very complex issue, and a whole branch of Bayesian statistics is devoted to it. However, for a practitioner, four major considerations are paramount: statistical precision, credibility, company policy, and financial constraints.

Statistical precision. The larger the sample, the more confident a researcher can be that the results are representative of the things being measured. In general, the precision of a sample is related to the square root of the sample size; i.e., to double the precision of an estimate, the sample must be four times as large. (By precision we mean the level of uncertainty about the value of the construct being measured.) Once the sample is drawn, the level of precision is already determined.
If the level of precision needed can be specified ahead of time, however, it is possible to determine the minimum required sample size (which refers to the number of responses, not the number of individuals in the target sample). The discussion below refers to measuring a single key item, while many surveys have multiple items of interest; consequently, the formulas below are not an absolute answer but a guideline for approaching more complex statistical issues.

Averages. E.g., the average income of US households. The sample mean $\bar{X}$ is our estimate, but it may not exactly equal the population average because of idiosyncrasies of the sample. The most accurate statement that can be made is that the true mean lies within some range about $\bar{X}$. To quantify this range, two important facts are used:
1. The sample mean is approximately normally distributed.
2. The standard deviation of the sample mean is the standard deviation calculated in the sample divided by the square root of the sample size:
$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
where the sample standard deviation is the square root of the sample variance:
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$$
Using these two facts, it is possible to construct a range, known as a confidence interval, into which the true mean falls with a given level of certainty (confidence level). This range is given by
$$\text{Range in which true mean falls} = \bar{X} \pm z\,\frac{s}{\sqrt{n}}$$
where z is a constant drawn from a standard normal table which depends on the level of confidence desired (e.g., the commonly used 95% confidence interval is given by z = 1.96).

Hence, if we took a sample of 400 households in the US and measured their average income as $15,172 and the standard deviation of their income as 6,216, the 95% confidence interval for true average household income in the US would be
$$15{,}172 \pm 1.96\left(\frac{6{,}216}{\sqrt{400}}\right) = 15{,}172 \pm 609.$$
The precision of this estimate is determined by the sample size (n), the standard deviation (s), and the confidence level desired.

Formula to determine sample size in advance.
1. Decide on an acceptable confidence level.
2. Estimate the standard deviation of the data (e.g., incomes). (To do this, either run a pilot test, estimate it subjectively from prior research, or take the range of the distribution and divide it by 6.)
3. Solve the following formula for the necessary sample size:
$$\text{Tolerance acceptable} = z\,\frac{s}{\sqrt{n}}$$
E.g., assuming s = 3,000, an acceptable tolerance of 100, and that we want to be 95% sure of being within the tolerance, we get
$$100 = 1.96\left(\frac{3{,}000}{\sqrt{n}}\right) \quad\Rightarrow\quad n = \left(\frac{1.96 \cdot 3{,}000}{100}\right)^2 = 3{,}457.44 \approx 3{,}458.$$

Proportions. Assume we wish to estimate the proportion of our accounts who also buy from a major competitor. The proportion is approximately normally distributed, so the proportion of our accounts who buy from the competitor can be estimated by a confidence interval:
$$p \pm z_{\alpha}\sqrt{\frac{p(1-p)}{n}}$$
E.g., we sample 400 of our 30,000 accounts and find that 32% also buy from our competitor. The 95% confidence interval for the percentage of all our accounts who buy from the major competitor is
$$\text{Range of true proportion} = 0.32 \pm 1.96\sqrt{\frac{0.32(0.68)}{400}},$$
which equals a range from 27.4% to 36.6% (where the sample proportion is 32%).

As in the case of the mean, the formula can be used to estimate the necessary sample size in advance. The required sample size can be derived from
$$\text{Tolerance} = z_{\alpha}\sqrt{\frac{p(1-p)}{n}}$$
Assuming we wanted to be accurate within 3 percentage points at the 95% confidence level, and thinking that p is about 10% in this example, this reduces to
$$3 = 1.96\sqrt{\frac{10(90)}{n}} \quad\Rightarrow\quad \sqrt{n} = \frac{1.96(30)}{3} = 19.6 \quad\Rightarrow\quad n = 384.16 \approx 385.$$
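To make the two worked examples above concrete, here is a minimal Python sketch (my own illustration, not part of the lecture) that reproduces the confidence intervals and the sample-size calculations for a mean and for a proportion. The figures (15,172; 6,216; n = 400; p = 0.32; etc.) are the ones used above; the function names are arbitrary.

```python
import math

Z95 = 1.96  # z value for a 95% confidence level

def mean_ci(mean, sd, n, z=Z95):
    """Confidence interval for a mean: mean +/- z * s / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def sample_size_for_mean(sd, tolerance, z=Z95):
    """Smallest n such that z * s / sqrt(n) <= tolerance."""
    return math.ceil((z * sd / tolerance) ** 2)

def proportion_ci(p, n, z=Z95):
    """Confidence interval for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def sample_size_for_proportion(p, tolerance, z=Z95):
    """Smallest n such that z * sqrt(p(1-p)/n) <= tolerance."""
    return math.ceil(z ** 2 * p * (1 - p) / tolerance ** 2)

# Household income example: 400 households, mean $15,172, s = 6,216
print(mean_ci(15172, 6216, 400))               # ~ (14563, 15781), i.e. 15,172 +/- 609

# Required n for s = 3,000 and a tolerance of +/- $100
print(sample_size_for_mean(3000, 100))         # 3458

# Competitor example: 32% of 400 sampled accounts
print(proportion_ci(0.32, 400))                # ~ (0.274, 0.366)

# Required n for p ~ 10% and a tolerance of 3 percentage points
print(sample_size_for_proportion(0.10, 0.03))  # 385
```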
Alternatively, we could adopt the "conservative" procedure and assume p were 50%. This produces the maximum sample size needed for a given tolerance. (Actually, any p b/w 0.3 and 0.7 produces fairly similar results.) In this case, we would obtain
$$3 = 1.96\sqrt{\frac{50(50)}{n}} \quad\Rightarrow\quad n = \left(\frac{1.96(50)}{3}\right)^2 = 1{,}067.1 \approx 1{,}068.$$
(The result is the so-called "magic sample size" of b/w 1,000 and 1,500, which is characteristic of most national samples.)

Finite Population Correction. Both previous procedures apply to a situation where the target population is essentially infinite (final consumer product studies). When the sample gets large in relation to the target population (over 10 to 20 percent of its size), however, these formulas will overestimate the required sample. Assuming sample points are expensive, the finite population correction factor should be employed. (The correction factor accounts for the fact that as the sample size n approaches the population size N, the uncertainty about the population average drops to zero.) This correction converts the two formulas for determining sample size into the more accurate
$$\text{Numerical tolerance} = z_{\alpha}\,\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
and
$$\text{Percent tolerance} = z_{\alpha}\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}}$$
where N is the size of the target population.

Financial constraints.
$$n = \frac{SP - EN - D}{IT}$$
where SP is the budget, EN is the fixed cost of the study, D is dinners, trips, or whatever else we can charge to it, and IT is the variable cost of a sample point or interview; or,
$$n = \frac{(\text{budget}) - (\text{fixed costs}) - (\text{trips and dinners})}{(\text{variable cost per sample point})}$$
This formula is, in reality, every bit as important in determining sample size as those relating to the statistical precision of the results.

Planning for subsamples. The formulas above concern aggregate-level parameters. Once we break the sample down into groups, the precision decreases even further. E.g., suppose there are nine groups in the population. Even if the groups are equal in size (which is rarely true), this leaves only 150 respondents per group, and consequently the 95% confidence interval for the fraction in a subgroup is about plus or minus 8% (e.g., 42% to 58%). To represent 9 categories as precisely as the aggregate would require a sample roughly 9 times as big (which is costly). Special techniques are used to minimize costs in multiple-group samples.

SELECTION OF SAMPLE POINTS.
Two general approaches exist: pure probabilistic and purposive sampling. In the first case each subject has an equal probability of being selected; in the second, researchers place greater importance on some segments of the target population (and, consequently, somewhat overrepresent them).

Simple random sampling. (The most democratic.) It has many nice properties, although the method is not the most efficient. E.g., a sample of 20 could by chance contain all smokers, severely distorting the outcome of research on the dangers of cigarettes. This is a rare case, but the smaller the sample becomes, the more typical this error becomes. In order to select subjects randomly from the universe (which is a long list of all existing subjects), random numbers can be used.

nth Name (Systematic). The idea is to select a starting point and then select every nth subject. E.g., if I wished to draw a target sample of 30 from a target population of 1,200, I might arbitrarily select the 11th individual (a number b/w 1 and 40) as a starting point and then individuals 51, 91, 131, ..., 1171 as my 30 target sample points. The major problem is subtle cycles in the parameters, which can repeat regularly, so that my selection interval may coincide with the internal cycling interval.
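The pieces above can be sketched in a few lines of Python (my own illustration; the population size of 10,000 and the budget figures are hypothetical, chosen only to show the mechanics). The finite-population function simply solves the lecture's corrected tolerance formula for n.

```python
import math

Z95 = 1.96

def sample_size_fpc(sd, tolerance, population_size, z=Z95):
    """Sample size for a mean in a finite population.

    Solves  tolerance = z * (s / sqrt(n)) * sqrt((N - n) / (N - 1))  for n.
    """
    n0 = (z * sd / tolerance) ** 2                       # infinite-population answer
    n = n0 * population_size / (population_size - 1 + n0)
    return math.ceil(n)

def affordable_sample_size(budget, fixed_costs, trips_and_dinners, cost_per_interview):
    """n = (budget - fixed costs - trips and dinners) / (variable cost per point)."""
    return int((budget - fixed_costs - trips_and_dinners) // cost_per_interview)

def nth_name_sample(population_size, sample_size, start):
    """Systematic (nth-name) sample: every k-th subject from an arbitrary start."""
    k = population_size // sample_size                   # selection interval
    return [start + i * k for i in range(sample_size)]

# s = 3,000 and a $100 tolerance need ~3,458 respondents in an infinite population,
# but if the target population has only 10,000 members (hypothetical N):
print(sample_size_fpc(3000, 100, 10_000))                # ~2570

# Hypothetical budget: $50,000 total, $10,000 fixed, $2,000 misc, $25 per interview
print(affordable_sample_size(50_000, 10_000, 2_000, 25)) # 1520

# 30 from 1,200, starting at the 11th individual: 11, 51, 91, ..., 1171
print(nth_name_sample(1200, 30, 11))
```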
The method is considered more advanced than pure random sampling. Most consumer phone and mail surveys are based on nth-name designs.

Stratified. For many studies, the target population can be divided into segments with different characteristics. In this case the information about the segments (strata) can be used to design the sampling plan. Specifically, a separate sampling plan can be drawn for each stratum. This guarantees that each stratum will be adequately represented, something which random sampling does not. Assuming separate samples are drawn from each stratum, the mean and standard deviation of a variable in the entire target population can be estimated as follows. Let
$N_i$ = size of the ith stratum
$n_i$ = sample size of the ith stratum
$N$ = size of the total target population
$n$ = total sample size
$w_i$ = weight of the estimate of the ith stratum $= N_i / N$
$k$ = number of strata
$s_i$ = standard deviation in the ith stratum
$\bar{x}_i$ = mean in the ith stratum
Then
$$\bar{x} = \sum_{i=1}^{k} w_i \bar{x}_i$$
and
$$s_{\bar{x}} = \sqrt{\sum_{i=1}^{k} w_i^2 s_{\bar{x}_i}^2} = \sqrt{\sum_{i=1}^{k} w_i^2 \frac{s_i^2}{n_i}}$$
For proportions, the formulas become
$$p = \sum_{i=1}^{k} w_i p_i$$
and
$$s_p = \sqrt{\sum_{i=1}^{k} w_i^2 \frac{p_i(1-p_i)}{n_i}}$$

a. Proportionate stratified sampling. The number sampled in each stratum is proportionate to the size of the stratum: the sample size of each stratum ($n_i$) is given by the proportion of the population that falls into that stratum ($N_i/N$) times n. The variance of the estimate of the mean becomes
$$s_{\bar{x}}^2 = \sum_{i=1}^{k}\left(\frac{N_i}{N}\right)^2 \frac{s_i^2}{n_i}$$
and since $n_i = \frac{N_i}{N}\,n$, this simplifies to the shortcut
$$s_{\bar{x}} = \sqrt{\frac{\sum_{i=1}^{k} w_i s_i^2}{n}}$$
E.g., beer consumers were divided into four segments (strata) on the basis of demographics. A proportionate sample would then be drawn with the sample size in each stratum proportionate to the size of the stratum:

Stratum   Size of stratum   Stratum sample size   Average beer consumption   St. dev. of beer consumption
1         8,000             80                    20                         4
2         6,000             60                    10                         4
3         4,000             40                    15                         5
4         2,000             20                    6                          2
Total     20,000            200

Then
$$\bar{x} = 0.4(20) + 0.3(10) + 0.2(15) + 0.1(6) = 8 + 3 + 3 + 0.6 = 14.6$$
$$s_{\bar{x}}^2 = (0.4)^2\frac{4^2}{80} + (0.3)^2\frac{4^2}{60} + (0.2)^2\frac{5^2}{40} + (0.1)^2\frac{2^2}{20} = 0.032 + 0.024 + 0.025 + 0.002 = 0.083,$$
so $s_{\bar{x}} = 0.288$. Alternatively, the shortcut formula gives
$$s_{\bar{x}} = \sqrt{\frac{0.4(4)^2 + 0.3(4)^2 + 0.2(5)^2 + 0.1(2)^2}{200}} = 0.288$$

b. Disproportionate sampling. "Undemocratic," since some strata are deemed more important than others. A disproportionately large part of the sample is then obtained from these important strata. Some segments can be more important than others; consequently, a researcher may wish to sample a disproportionately larger number of subjects from certain strata. Alternatively, given financial constraints, a researcher may collect a greater portion of the data from "easy-to-question" strata in order to get a bigger sample. Another, more logical, justification for disproportionate sampling is statistical in nature. E.g., suppose one stratum has an average consumption of 3 and a standard deviation of 4, while another stratum has an average consumption of 320 and a standard deviation of 0. For the second stratum, a single observation would be sufficient and any additional one redundant. In order to produce the minimum-variance (most reliable) estimate of the overall mean, only one observation should be taken from the second stratum, and all others that are financially possible from the first. The general formula for the optimal allocation to minimize the total variance of the estimate is
$$n_i = \frac{w_i s_i}{\sum_{j=1}^{k} w_j s_j}\, n$$
and the resulting standard error of the mean is
$$s_{\bar{x}} = \frac{\sum_{i=1}^{k} w_i s_i}{\sqrt{n}}$$
Returning to the beer consumption example, the optimal sample allocation is
$$n_1 = \frac{0.4(4)}{0.4(4) + 0.3(4) + 0.2(5) + 0.1(2)}\,(200) = 80$$
and $n_2$, $n_3$, $n_4$ are correspondingly 60, 50, and 10. NOTE!
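As a check on the arithmetic, here is a minimal Python sketch (my own, using the beer-consumption figures above) that reproduces the stratified mean, its standard error under proportionate allocation, and the optimal (minimum-variance) allocation.

```python
import math

# Beer example: (stratum size, sample size, mean consumption, st. dev.)
strata = [
    (8000, 80, 20, 4),
    (6000, 60, 10, 4),
    (4000, 40, 15, 5),
    (2000, 20,  6, 2),
]
N = sum(size for size, _, _, _ in strata)   # 20,000
n = sum(ni for _, ni, _, _ in strata)       # 200

def stratified_mean(strata, N):
    """x_bar = sum of w_i * x_bar_i, with w_i = N_i / N."""
    return sum((size / N) * mean for size, _, mean, _ in strata)

def stratified_se(strata, N):
    """s_xbar = sqrt( sum of w_i^2 * s_i^2 / n_i )."""
    var = sum((size / N) ** 2 * sd ** 2 / ni for size, ni, _, sd in strata)
    return math.sqrt(var)

def optimal_allocation(strata, N, n):
    """n_i = n * w_i * s_i / sum(w_j * s_j)  (minimum-variance allocation)."""
    weights = [(size / N) * sd for size, _, _, sd in strata]
    total = sum(weights)
    return [round(n * w / total) for w in weights]

print(stratified_mean(strata, N))        # 14.6
print(stratified_se(strata, N))          # ~0.288 under proportionate allocation
print(optimal_allocation(strata, N, n))  # [80, 60, 50, 10]
```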
The procedure requires knowing the standard deviations in each stratum in advance. Since the true standard deviations are never known, we substitute either subjective estimates or the results of a prior study. Assuming we now proceeded to take another survey of size 200 according to the disproportionate approach, the results might be as follows:

Stratum   Size of stratum   Stratum sample size   Average beer consumption   St. dev. of beer consumption
1         8,000             80                    20                         4
2         6,000             60                    10                         4
3         4,000             50                    15                         5
4         2,000             10                    6                          2
Total     20,000            200

Average beer consumption would then be
$$\bar{x} = 0.4(20) + 0.3(10) + 0.2(15) + 0.1(6) = 14.6$$
and the standard deviation of the estimate would be
$$s_{\bar{x}} = \frac{0.4(4) + 0.3(4) + 0.2(5) + 0.1(2)}{\sqrt{200}} = \sqrt{\frac{4^2}{200}} = 0.283$$
In this case, the standard deviation is only slightly smaller under the disproportionate sampling plan (0.283 versus 0.288, a reduction of less than 2 percent), a surprisingly typical result. In fact, unless the standard deviations of the strata are very different, disproportionate sampling does very little to the variance of the estimate. For example, if 50 were sampled from each of the four strata, the standard deviation of the mean would be (assuming the estimates of the means and standard deviations were unchanged)
$$s_{\bar{x}} = \sqrt{(0.4)^2\frac{4^2}{50} + (0.3)^2\frac{4^2}{50} + (0.2)^2\frac{5^2}{50} + (0.1)^2\frac{2^2}{50}} = \sqrt{\frac{5.04}{50}} = \sqrt{0.1008} = 0.317$$
The point, therefore, is that for most marketing surveys, sampling disproportionately to get the most reliable estimates is not very useful. The major reason for using a stratified sample, therefore, is to ensure adequate representation of key subgroups of the target population.

Stratified samples are often applied to situations where more than one variable serves as a basis for stratification. For example, I might be interested in a consumer product which appealed primarily to middle-aged, high-income consumers. Given a budget which allowed for a sample of 2,000, a stratified sampling plan might look like:

                 Income level
Age group        Under $10,000   $10,000-$19,999   $20,000-$29,999   $30,000+
Under 30         50              50                100               200
30-50            50              200               400               500
51+              50              100               100               200

Universal Sampling (Census). Impossible and overly expensive for final consumer markets; but if dealing with 30-40 major clients in an industrial market, it is both logical and statistically desirable to survey every subject possible.

Convenience. Useful for hypothesis generation and initial pilot research (i.e., learn to run surveys on convenience samples).

Quota. A compromise b/w stratified and convenience sampling. E.g., a firm may want the opinions of at least 30 housewives b/w the ages of 30 and 45. Hence a quota sample may be generated by having the interviewer collect data from the first 30 such women who agree to participate.

Cluster samples. Cluster samples are exactly what the name implies: samples gathered in clusters. The motivation is cost reduction. E.g., area sampling (collecting a certain number of interviews from a certain geographical region), or list sampling (sampling all people whose names begin with a set of randomly selected letters).

Sequential sampling. A small sample is drawn and the results are analyzed. If the results are sufficiently clear, a decision is made and the rest of the sample is not drawn. If not, another sample is drawn, and so on sequentially.

Multistage samples (!) A mix of several approaches. E.g., for big samples: first, select a set of areas; second, select (e.g., randomly) a set of locations within the areas; third, select individuals within the locations (e.g., randomly or systematically). This approach turns out to be both representative and reasonably efficient.
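To illustrate the multistage idea, here is a small simulation sketch (my own; the counts of areas, locations, and households are hypothetical and the frame is fabricated purely for demonstration). Areas are drawn at random, then locations within areas, then households within locations by an nth-name rule.

```python
import random

random.seed(7)  # reproducible illustration

# Hypothetical frame: 50 areas, each with 20 locations, each with 100 households
areas = {a: {loc: list(range(100)) for loc in range(20)} for a in range(50)}

def multistage_sample(areas, n_areas, n_locations, n_households):
    """Stage 1: random areas; stage 2: random locations within each area;
    stage 3: systematic (nth-name) selection of households within each location."""
    sample = []
    for area in random.sample(list(areas), n_areas):
        for loc in random.sample(list(areas[area]), n_locations):
            households = areas[area][loc]
            step = len(households) // n_households
            start = random.randrange(step)
            chosen = households[start::step][:n_households]
            sample.extend((area, loc, h) for h in chosen)
    return sample

points = multistage_sample(areas, n_areas=5, n_locations=4, n_households=10)
print(len(points))   # 5 * 4 * 10 = 200 sampling points
```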
SOURCES OF LISTS.
It is not a trivial task to find a good list (lists can be obsolete, incomplete, contain errors, etc.). In the States, suppliers charge approximately $50 to $120 per 1,000 list items (e.g., Fritz Hofheimer's 1977 catalogue).

SCREENING QUESTIONS.
These are asked in order to avoid interviewing subjects who do not qualify. E.g.:
1. Do you smoke? Yes____ No____ (If the answer is yes, continue to question 2; if no, terminate the interview.)
When the target group is very small (e.g., hang gliders), the interviewer uses the referral practice, that is, asking the first respondent to name others who would qualify for the research. (Screening in this case would be very inefficient.)

NON-RESPONSE PROBLEM.
Too low a response rate raises suspicion about whether the survey is representative or not. The small number of subjects who responded may possess special traits and differ from the universe.

Non-coverage. Unlisted phone numbers, ghetto areas that are difficult to cover, low response on mail surveys, etc. are attributes of non-coverage.

Non-response. The subject cannot be found (went to Cyprus). The subject is not at home (e.g., asking "What do you do during the evening?" in an evening survey will generate a disproportionately high percentage of "stay home" answers, while those who are not at home, and who may have different opinions, cannot be reached). Refusal (fear, lack of interest, time pressure, etc.).

Typology of potential respondents:
1. Happy to respond (15 percent).
2. Willing to be convinced to respond with modest effort (50%).
3. Can be bought at a high price (15%).
4. No way to make them respond (10%).
5. Not even covered by the process (10%).
Consequently, only categories 1, 2, and 3 can be covered, resulting in average response rates from 10 to 70%.

Determinants of response rate.
Interest. Can be the major determinant of response rate. The higher, the better. It is subjective and depends on the other parameters below.
Length. The longer the questionnaire is, the less likely someone is to begin it, complete it, answer all questions, etc.
Opening gambit. The introduction that invites the respondent to participate is extremely important. Besides the interviewer's appearance, voice, etc., different appeals may also be useful, e.g.: mercy ("I am a poor student..."), self-interest ("your opinion will count"), or duty ("you should express your views"). Guarantees of anonymity are also useful in persuading reluctant respondents to participate, as are the credentials of the surveying company. Use of the person's name also seems to increase the response rate (e.g., in mail surveys); a return envelope, "classy" mail, etc. also count.
Incentives/bribery. Works, but respondents may start to give the answers "that I wish to hear" instead of true ones, leading to additional biases. Cheap stimuli are ineffective (e.g., a reward of up to a dollar per long interview, or a small souvenir), while a chance to win a color TV set worth $500 turns out to be more effective.
Format. Use adequate white space, good plain paper, standard fonts, etc.
Advance Notice. In order to secure cooperation, it is common to give advance notice (by phone, postcard, or letter) of the impending study. This is often useful in increasing both the response rate and its quality.
Callback/Follow-up. Subsequent attempts to reach a respondent if s/he was not at home the first time. Two- to three-attempt callback plans are typical and can raise the efficiency to as much as 80 percent.
Overall. There must be a trade-off b/w cost per subject and effect.
Some basic studies of a psychological nature may suffice with moderately biased samples, and no callbacks, incentives, etc. need be used; while average-income studies require representative samples, which means more expenditure. The problem of dropouts in multi-wave surveys is typical and amounts to anywhere from 5% in well-designed/controlled settings up to 25% in mostly uncontrolled panels. A researcher may waste resources by throwing away partial-involvement subjects (and the data collected from them), or may still incorporate the incomplete data into the research and run the risk of measuring differences b/w people rather than changes over time.

Weighting to Account for Non-response. E.g., consider a mail survey which attempted to find the number of people who would be interested in buying a new product. There was a 30 percent response rate, and of the respondents, 40 percent said they would buy the product. One estimate of the percentage of the population who would be interested in buying (which, incidentally, would greatly overstate actual buying) would thus be 40%. On the other hand, we could assume that the other 70% did not return the survey because they were not interested in the product. In that case, the appropriate estimate would be 40% × 0.30 = 12%. Actually, the number probably lies somewhere between 12 and 40, but 12 is more likely to be close to the true value than 40. This is only a single instance of the difficulties related to non-response. I refer you to other sources for a more professional treatment of how to deal with this respondent practice. (Read aloud pp. 310-311 about a historically fatal sample design [the Roosevelt vs. Landon victory prediction].)
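The bounding argument above can be written as a tiny Python sketch (my own illustration; the 30% response rate and 40% interest figure are the ones from the example, and the assumptions behind each bound are noted in the comments).

```python
def non_response_bounds(response_rate, share_interested_among_respondents):
    """Bounds on the population share interested in the product.

    Optimistic bound: non-respondents look just like respondents.
    Pessimistic bound: every non-respondent is assumed to be uninterested.
    """
    optimistic = share_interested_among_respondents
    pessimistic = share_interested_among_respondents * response_rate
    return pessimistic, optimistic

# 30% response rate, 40% of respondents interested -> the true value is
# probably somewhere between 12% and 40%, and likely closer to 12%.
print(non_response_bounds(0.30, 0.40))   # (0.12, 0.4)
```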