Sample Design: Part 1

Slide 1
The goal of this lecture on sample design is to introduce you to basic sampling terms, to discuss the various stages of selecting a sample, to introduce you to the two basic types of samples (probability and non-probability), to explain the different types of samples in each of those categories and when you might use them, to introduce you to the notion of sampling error, and to close with a brief discussion of Internet and panel samples.

Slide 2
As the caption to this cartoon states, “In sampling you examine less than the whole body of data to reach a conclusion about that body. Thus, when you taste your first bite of something and say ‘It needs salt’, you’re sampling.”

Slide 3
Perhaps a non-food context might help. In the four photos shown on this slide, the first one is at normal resolution and the subsequent three photos are of lower and lower resolution. The face remains recognizable when going from photo #1 to #2. By photo #4, it’s almost impossible to discern a face. This photo analogy hints at issues about sample size and representativeness.

Slide 4
I’ll begin by introducing you to some basic sampling terms: population or universe, population element, census, and sample.

Slide 5
A population or universe is any complete group. It could be a group of people, sales territories, or stores. It’s the total group from which information is needed. Note that a population need not be huge. For example, the students enrolled in this course constitute a population.

Slide 6
To take a census means studying all the elements comprising a population. The U.S. Government attempts a population census every 10 years. Of course, the population could be small, like the students enrolled in this course. If I surveyed all those students, then I’d be taking a census.

Slide 7
In contrast to a census, a sample is a subset of a larger population. Cost and time are the main reasons for drawing a sample.
A census is expensive unless the population is small and accessible. Typically, a population is large and not readily accessible, in which case drawing a representative sample is the only cost- and time-effective way to assess the population.

Slide 8
Researchers progress through the following stages as they draw a sample: (1) defining the target population, (2) selecting a sample frame, (3) deciding whether to use a probability or non-probability sampling method, (4) planning the procedure for selecting sampling units, (5) determining the sample size, (6) drawing the sample, and (7) conducting the field work.

Slide 9
The first step in defining a target population is to consider the research objectives. For example, if I’m interested in conducting a study on different advertising appeals or attempting to fine tune a new questionnaire scale, then an appropriate population is students. Alternatively, if I were a marketing manager interested in adding a product to my existing product line, the population is unlikely to be students, unless the product is sweatshirts or PCs.

To operationally define a population like NMSU alumni, I might contact the alumni office and request a complete list of alumni. If the population is NMSU marketing majors, then I might acquire a list of those students from the COB. Such a list won’t be perfect (it will include some non-majors and nonstudents and exclude some current majors), but I can reasonably treat it as covering nearly all current marketing majors.

When selecting a target population, the convenience and cost of alternative samples should be considered. An operationally defined population that’s less expensive to access should be used, ceteris paribus, because the savings per completed questionnaire could be put toward collecting a larger sample.

Slide 10
The second step is to select a sampling frame. This will only be relevant for probability or scientific samples.
A sampling frame is the list of elements from which a sample may be drawn. Alternative sampling frames might include mailing and commercial lists, but they can be problematic if they’re not representative of the target population. For example, for my dissertation research I needed to identify people who had relocated within the last six months. At the time, R.L. Pope maintained the ‘new movers list’, so I paid $4,000 for contact information for 40,000 households. Unfortunately, the list did not live up to its advertised claim that 80% of households had relocated in the last six months. In fact, one person I contacted hadn’t moved in 57 years! Sadly, I would have been better off randomly phoning people across the U.S. Although other commercially available lists may perform better, all such lists can be problematic.

Slide 11
The third step is to identify the sampling units. Sampling units can be persons, households, businesses, et cetera. Several years ago, I conducted a study for the NMSU Marketing Department to assess the skills that undergraduate marketing majors have and need, so I surveyed several groups. One group was Fortune 500 companies, and few opted to respond. You can’t survey a company, but you can survey its personnel director, so the sampling unit was the personnel directors of those companies. Sometimes the primary sampling unit is of interest; other times it’s the secondary sampling unit (as per cluster sampling).

Slide 12
The fourth step is to decide if it’s best to draw a probability (scientific) or non-probability (nonscientific) sample. For a probability sample, every population element has a known non-zero probability of selection; for a non-probability sample, the probability of selecting a particular member of the population is unknown. This known or unknown probability of selecting population elements differentiates the two types of samples.
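To make “known probability of selection” concrete, here is a minimal sketch. It assumes a simple random sample drawn without replacement, in which case each element’s chance of selection is the sample size divided by the frame size. The 200-name frame, the sample size of 20, and the function name are hypothetical and chosen only for illustration.

```python
import random


def inclusion_probability(frame_size, sample_size):
    """Chance that any one element is selected in a simple random
    sample drawn without replacement (assumed design)."""
    if not 0 < sample_size <= frame_size:
        raise ValueError("sample_size must be between 1 and frame_size")
    return sample_size / frame_size


# Hypothetical sampling frame: a list of 200 enrolled students.
frame = [f"student_{i}" for i in range(1, 201)]

# Draw 20 students at random; the seed only makes the draw repeatable.
rng = random.Random(42)
sample = rng.sample(frame, 20)

# Every student had the same known, non-zero chance of selection.
p = inclusion_probability(len(frame), len(sample))
print(p)  # 0.1
```

No comparable calculation is possible for a non-probability sample, because nothing in the selection procedure fixes each element’s chance of being chosen.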
A sample is scientific if the probability of selecting any member is known, and non-scientific if the probability of selecting any member is unknown; the latter case makes it difficult to extrapolate from the sample to the population.

Slide 13
(No Audio)

Slide 14
When are non-probability samples preferred to probability samples, and vice versa? Here are five considerations. Cost is always an issue, and non-probability samples are far cheaper than probability samples.

(1) Nature of the research. For exploratory research, non-probability samples often are serviceable. For conclusive research, from which managers draw conclusions and act accordingly, probability samples are preferred because they are more likely to be representative and hence reliable.

(2) Relative size of the errors. If non-sampling errors are a major error component, then non-probability sampling is acceptable because sampling error is a lesser error source. Alternatively, if sampling errors are the largest error component, then a probability sample is preferred.

(3) Population variability. If the population is relatively homogeneous (members are relatively similar to one another), then intra-population variability is low and a non-probability sample may be sufficiently representative of the larger population. Hence, it may be possible to draw a representative sample without probability sampling. Alternatively, if the population is highly variable or heterogeneous, then a probability sample is preferred because there’s more control over its representativeness.

(4) Statistical considerations. If statistics, which are not meaningful for non-probability samples, are needed to extrapolate from a sample to the population, then probability sampling is required.

(5) Operational considerations. If operational considerations are critical, then a non-probability sample is preferred. Such samples are easier and faster to collect, more accessible, and less expensive. Probability samples tend to require great care in selection and are more expensive because creating a sample frame is an expensive proposition.
In summary, non-probability samples are preferred if the research is exploratory, non-sampling errors are larger than sampling errors, the population is similar across members, statistical analyses are not required, and operational considerations are critical. In contrast, probability samples are preferred if decisions will be made based on the research, sampling error is the largest component of total error, population elements tend to be heterogeneous, statistical analyses are important to extrapolate to a larger population, and unfavorable operational considerations are less critical.

Slide 15
At this point, a discussion on the different types of non-probability and probability samples is warranted. I’ll start with non-probability samples because it’s easier to contrast the probability samples after discussing non-probability samples. The four types of non-probability samples are convenience, judgmental, quota, and snowball. Then I’ll talk about probability samples: random, systematic, stratified, and cluster.

Slide 16
(No Audio)

Slide 17
I’ve included this cartoon to illustrate the problem with the televised polls that news networks run. Even when those polls are labeled non-scientific, viewers tend to tune out that caveat because they don’t know what it means. The cartoon shows a person who cares a lot about the outcome of a television-program-instigated survey and will continue to dial the 900 number to register his vote. Baseball fans are encouraged to submit multiple All-Star ballots; in fact, Internet-based ballots allow voting for the starting team and the last player elected to the All-Star team. As a result, votes for the starting team are weighted by fan interest; it isn’t a one person, one vote election.

Slide 18
Convenience samples also are called haphazard or accidental samples. Convenience sampling is a procedure for obtaining people or units that are convenient to the researcher.
Convenience is not necessarily a problem, and convenience sampling is desirable for some types of research. In particular, convenience samples are very useful for pre-testing questionnaires because they are inexpensive and can help to identify poorly worded questions and poorly formatted questionnaires.

Slide 19
The sampling error introduced by convenience sampling is twofold. First, certain elements of the target population will be systematically excluded. Second, elements that are not members of the target population will be included. As a result, the sample will only include a biased subset of the target population.

Slide 20
This classic Pogo cartoon makes the point once more that a convenience sample may be great for some purposes but not for trying to generalize to a larger population.

Slide 21
A second type of non-probability sample is a judgment sample, also referred to as a purposive sample. The judgment in this case is the judgment of the researcher; an experienced person selects the sample based on his or her judgment about appropriate characteristics required of sample members. One example of such samples is very familiar: television networks use judgment sampling to forecast the winner before all votes are tallied. These forecasts are based on exit interviews of voters as they leave the polling place. It is cost prohibitive to place interviewers at every precinct in the U.S. However, there are historical data on the districts whose voters are most predictive of the election. With those data, the networks can identify the key precincts and station interviewers at their polling places. The judgment here is informed judgment based on historical voting patterns.
If conducted properly, exit polling provides accurate forecasts; the misleading exit polls for the Bush-Kerry presidential election were a byproduct of poor questioning rather than poor sampling.

Slide 22
Unlike convenience sampling, in which people who are and aren’t members of a population are surveyed, the judgment sample includes only relevant members of a population. However, if certain members of the population are systematically excluded from the sample, then the survey results may be biased.

Slide 23
A quota sample is a sample in which various population subgroups are represented on pertinent sample characteristics to the extent desired by researchers.

Slide 24
Sex and age are two characteristics that a researcher might want to control. For example, the researchers for this chewing gum study controlled the mix of respondents such that 45% were male and 55% were female. The same can be done for age of respondents and mouthwash use, or for gender and razor blade consumption. For mouthwash, the field service could be required to solicit responses so that 45% of the sample was male and 55% was female, 20% were between the ages of 16 and 24, 25% were between 25 and 34, et cetera.

Slide 25
The last of the four non-probability samples is called a snowball sample. For such samples, a small initial set of respondents is selected by a probability method and additional respondents are recommended by these initial respondents. Assume I want to survey a group that is rare in the population and there’s no good mailing or phone list available. I might randomly dial the phone, looking for people who qualify as members of the target population. If I dial enough numbers, I’ll eventually find one person. It’s likely that one person knows other similar people. At the end of the interview, I’d ask that person for three other people who are qualified to participate in my study due to similar interests, predispositions, and behaviors.
For example, suppose there was no good list of NMSU alumni. I could start dialing phone numbers in New Mexico until I contacted an NMSU alum. At the end of the interview, I would ask for three more alums, then contact those three people and ask each of them for three more alums, and so on. Basically, the sample would snowball from one or a handful of initial respondents who would identify other qualified respondents, who would in turn identify more respondents, et cetera.
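The referral procedure just described can be sketched in a few lines of code. This is only an illustration under stated assumptions: the names, the referral network, the `snowball_sample` function, and the cap of three referrals per respondent are hypothetical stand-ins for real interviews, and the sketch skips the random dialing needed to find the first respondent.

```python
from collections import deque


def snowball_sample(seeds, get_referrals, referrals_per_person=3, max_size=20):
    """Grow a sample from seed respondents by following referrals."""
    sample = []
    seen = set(seeds)      # everyone already interviewed or queued
    queue = deque(seeds)   # respondents waiting to be interviewed
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)  # "interview" this respondent
        # Ask for referrals, skipping anyone already in the study.
        new_names = [p for p in get_referrals(person) if p not in seen]
        for referred in new_names[:referrals_per_person]:
            seen.add(referred)
            queue.append(referred)
    return sample


# Hypothetical referral network: who each alum says they know.
alumni_contacts = {
    "Ana": ["Ben", "Cruz", "Dee", "Eli"],
    "Ben": ["Ana", "Fay"],
    "Cruz": ["Gus", "Ben"],
    "Dee": [],
    "Fay": ["Hal"],
}

# Start from a single alum reached by phone.
sample = snowball_sample(["Ana"], lambda p: alumni_contacts.get(p, []))
print(sample)
# ['Ana', 'Ben', 'Cruz', 'Dee', 'Fay', 'Gus', 'Hal']
```

Eli never enters the sample because Ana is asked for only three names. That illustrates the non-probability character of the method: who gets included depends on who the earlier respondents happen to mention.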