Literature Review of Sample Design and Methods
Generally, a researcher employs sampling strategies in order to generate statistics and
generalize findings to a larger population. Sampling refers to the process of selecting individuals
from a larger group of people and drawing conclusions that are “an accurate representation of
how the larger group of people acts or what they believe” (Fraenkel & Wallen, 2006, p. 92).
According to Fowler (2002), the process by which a sample is selected involves the following
three key aspects, each of which is examined at greater length below.
1. The sample frame is the set of people that has a chance to be selected, given the sampling
approach that is chosen. Statistically speaking, a sample can be representative only of the
population included in the sample frame. One design issue is how well the sample frame
corresponds to the population a researcher wants to describe.
2. Probability sampling procedures must be used to designate individual units for inclusion
in a sample. Each person must have a known chance of selection set by the sampling
procedure. If researcher discretion or respondent characteristics such as respondent
availability or initiative affect the chances of selection, there is no statistical basis for
evaluating how well or how poorly the sample represents the population; commonly used
approaches to calculating confidence intervals around sample estimates are not
applicable.
3. The details of the sample design, its size and specific procedures used for selecting units,
will influence the precision of sample estimates, that is, how closely a sample is likely to
approximate the characteristics of the whole population.
The Population and the Sample Frame
The population of interest is typically a group of persons who possess a certain
characteristic (or set of characteristics) (Fraenkel & Wallen, 2006). The actual population can be
any size and is usually referred to as the target population, to which a researcher would like to
generalize. However, the entire target population is sometimes difficult to sample, so a more
narrowly defined population, the accessible population, is considered. According to Fraenkel
and Wallen (2006), a more narrowly defined population will often save time, effort, and even
money, but may limit the generalizability of the findings. In addition, they assert that it is important for
the researcher to describe the population and the sample in sufficient detail so interested
parties can apply the findings to their own situations.
Those individuals of a population that have a chance of being selected comprise the
sample frame. According to Fowler (2002), the first step in evaluating the quality of a sample is
to define the sample frame and place it in one of the following three general classes:
1. Sampling is done from a more or less complete list of individuals in the population to be
studied.
2. Sampling is done from a set of people who go somewhere or do something that enables
them to be sampled. In these cases, there is not an advance list from which the sampling
occurs; the creation of the list and the process of sampling may occur simultaneously.
3. Sampling is done in two or more stages, with the first stage involving sampling
something other than the individuals finally to be selected. In one or more steps, these
primary units are sampled, and eventually a list of individuals (or other sampling units) is
created, from which a final sample selection is made.
Furthermore, Fowler (2002) proffers three characteristics of a sample frame that a
researcher should evaluate:
1. Comprehensiveness. A sample can be representative only of the sample frame, that is,
the population that actually had a chance to be selected. The key question is the extent to which those
excluded are distinctive. If a list is considered for sampling, then it is imperative to
evaluate how the list was compiled, how and when additions and deletions are made, and
the number and characteristics of people likely to be left off the list.
2. Probability of selection. A sampling scheme may not give every member of the sampling
frame the same chance of selection, as would be the case if each individual appeared once
and only once on a list. If the probability of selection for each individual is unknown,
then it is not possible to accurately estimate the relationship between the sample statistics
and the population from which the sample was drawn.
3. Efficiency. In some instances, sampling frames include units that are not among those
that the researcher wants to sample. Provided that eligible persons can be identified at
the point of data collection, such a frame can still be used, but this may or may not be a
cost-effective approach.
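To illustrate the probability-of-selection point above, the following minimal Python sketch assumes a hypothetical list frame on which one person is listed twice. The names and the function `selection_probabilities` are illustrative assumptions, not part of Fowler's text. The sketch only shows that duplicate listings give unequal chances of selection on a single random draw from the list.

```python
from collections import Counter

def selection_probabilities(frame):
    """Chance that ONE random draw from the list frame picks each person."""
    counts = Counter(frame)  # how many times each person is listed
    return {person: c / len(frame) for person, c in counts.items()}

# Hypothetical frame: "Ana" appears twice, everyone else once.
frame = ["Ana", "Ben", "Ana", "Cho", "Dev"]
probs = selection_probabilities(frame)

for person, p in sorted(probs.items()):
    print(f"{person}: {p:.2f}")
```

When such probabilities are known, the researcher can account for them in analysis. When they are unknown, the relationship between sample and population cannot be estimated.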
Probability Sampling
Stuart (1984) defines probability sampling as the kind of sampling in which “every
element in the population has a nonzero chance of being selected.” Each individual in a sample
frame drawn from the population is selected by chance and at random. When probability
sampling is used, according to Ary et al. (2002), “inferential statistics enable researchers to
estimate the extent to which the findings based on the sample are likely to differ from what they
would have found by studying the whole population” (p. 165). Random sampling “eliminates
the possibility of the sampler’s biases messing up the sample’s representativeness” (Popham,
1993, p. 246). The four most common types of probability sampling used in educational
research are simple random sampling, stratified random sampling, cluster random sampling, and
systematic sampling. (Note: Ary et al., Fowler, Fraenkel and Wallen, and Popham address the
various types of sampling methods in their texts. A synthesis of the procedures drawn from
these authors is presented below.)
Simple Random Sampling
A simple random sample is one in which each member of the population has an equal and
independent chance of being included in the random sample. If the sample is large, this method
is the best way to obtain a sample representative of the population (Fraenkel & Wallen, 2006).
Simple random sampling approximates drawing a sample out of a hat: Members of a population
are selected one at a time, independent of one another and without replacement; once a unit is
selected, it has no further chance to be selected (Fowler, 2002). The steps in simple random
sampling, proffered by Ary et al. (2002), comprise the following:
1. Define the population.
2. List all members of the population.
3. Select the sample by employing a procedure where sheer chance determines which
members on the list are drawn for the sample.
The most practical way to draw a random sample is to employ a table of random
numbers. The table of random numbers typically consists of an extensive series of five-digit
numbers randomly generated by a computer. The first step is to assign each member of the
population a distinct identification number; the researcher then uses the table of random numbers to select
the identification numbers of the subjects to include in the sample (Ary et al., 2002).
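The three steps above can be sketched in Python. This is a minimal illustration, not the procedure of any cited author. It assumes a hypothetical population of 500 students, and a seeded pseudo-random generator stands in for the printed table of random numbers. The function name `simple_random_sample` is likewise an assumption.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement, each with an equal chance."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # stands in for the table of random numbers
    # Give every listed member a distinct identification number.
    ids = list(range(len(population)))
    # Let chance alone decide which identification numbers are drawn.
    chosen_ids = rng.sample(ids, n)
    return [population[i] for i in chosen_ids]

# Hypothetical population: a complete list of 500 students.
students = [f"student_{i:03d}" for i in range(500)]
sample = simple_random_sample(students, 50, seed=42)
print(len(sample), "students selected")
```

Because `random.sample` draws without replacement, a member who has been selected has no further chance of selection, which matches the description of simple random sampling given earlier.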
According to Fraenkel and Wallen (2006), the advantage of random sampling is that it is
very likely to produce a representative sample if the sample is large enough. The biggest
disadvantage is that it is not easy to do: each member of the population must be identified and
must be reachable.
Stratified Random Sampling
Stratified random sampling is “a process in which certain subgroups, or strata, are
selected for the sample in the same proportion as they exist in the population” (Fraenkel &
Wallen, 2006, p. 96). In stratified sampling, the researcher first identifies the strata of interest
and then randomly draws a specified number of subjects from each stratum, either by taking
equal numbers from each stratum or in proportion to the size of the stratum in the population
(Ary et al., 2002). Popham (1993) warns, however, not to simply subdivide a population into
age, sex, and socioeconomic subgroups unless the researcher believes these dimensions are
relevant to the things being measured.
According to Ary et al. (2002), an advantage of stratified sampling is that it allows the
researcher to study differences among various subgroups of a population and guarantees
representation of defined groups in the population. In addition, Fraenkel and Wallen (2006)
contend that stratified random sampling increases the likelihood of representativeness, especially
if the sample is not very large. They suggest that the disadvantage is that it requires more effort
on the part of the researcher. Popham (1993) regards stratified random sampling as
a refinement of simple random sampling.
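As a minimal sketch of the proportional variant described above, the following Python example groups a hypothetical population into strata and draws from each stratum in proportion to its size. The population (300 members of stratum "F" and 200 of stratum "M"), the function name `stratified_sample`, and the rounding rule are assumptions made for illustration. Equal allocation across strata would be the alternative mentioned by Ary et al.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Proportional stratified random sample of roughly n members."""
    rng = random.Random(seed)
    # Identify the strata by grouping members on the chosen characteristic.
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    total = len(population)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Proportional allocation; rounding can shift the total slightly from n.
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 300 members in stratum "F", 200 in stratum "M".
population = [("F", i) for i in range(300)] + [("M", i) for i in range(200)]
sample = stratified_sample(population, lambda m: m[0], 50, seed=1)
n_f = sum(1 for stratum, _ in sample if stratum == "F")
n_m = sum(1 for stratum, _ in sample if stratum == "M")
print(n_f, n_m)
```

With these numbers the sample keeps the 60/40 split of the population, which is the guarantee of representation that simple random sampling cannot give for a sample of this size.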
Cluster Random Sampling
Cluster random sampling permits the selection of groups, or clusters, of subjects rather
than individuals (Fraenkel & Wallen, 2006). There may be occasions when it is not
possible to select a sample of individuals from a population because a list of all members of the
population is not available. While simple random sampling is more effective with larger
numbers of individuals, cluster random sampling is more effective with larger numbers of clusters
(Fraenkel & Wallen, 2006). Ary et al. (2002) assert that it is essential for the clusters included in
the study to be chosen at random from a population of clusters. In addition, once a cluster is
selected, all the members of the cluster must be included in the sample. Furthermore, if the
number of clusters is small, the likelihood of sampling error is great, even if the total number of
subjects is large.
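The two requirements stated above are that clusters be chosen at random and that every member of a chosen cluster be included. They can be sketched as follows. The example assumes a hypothetical set of 20 classrooms with 25 students each, and the function name `cluster_sample` is an illustrative assumption.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """clusters maps a cluster name to the list of its members."""
    rng = random.Random(seed)
    # Randomly choose whole clusters from the population of clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every member of each chosen cluster enters the sample.
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical population: 20 classrooms of 25 students each.
classrooms = {
    f"class_{c:02d}": [f"class_{c:02d}_student_{s:02d}" for s in range(25)]
    for c in range(20)
}
chosen, sample = cluster_sample(classrooms, 4, seed=7)
print(chosen, len(sample))
```

Note that only the list of classrooms is needed in advance, not a list of all students, which is the practical motivation for cluster sampling given in the text.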
Systematic Sampling
Systematic sampling involves drawing a sample by taking every Kth case from a list of
the population (Ary et al., 2002). Systematic sampling differs from simple random sampling in
that the various choices are not independent. In other words, once the first case is chosen, all
subsequent cases to be included in the sample are automatically determined (Ary et al., 2002). Ary et
al. (2002) offer the following steps for drawing a systematic sample:
1. Decide how many subjects you want in the sample (n).
2. Divide N (the total number of members in the population) by n and determine the
sampling interval (K) to apply to the list.
3. Select the first member randomly from the first K members of the list, and then select
every Kth member of the population for the sample. For example, assume a population
of 500 subjects and a desired sample size of 50: K = N/n = 500/50 = 10.
4. Start near the top of the list so that the first case can be randomly selected from the first
ten cases, and then select every tenth case thereafter.
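The steps above can be sketched in Python using the same figures (N = 500, n = 50, K = 10). The function name `systematic_sample` and the choice to round K down when N is not an exact multiple of n are assumptions made for this illustration.

```python
import random

def systematic_sample(population, n, seed=None):
    """Take every Kth case after a random start among the first K cases."""
    N = len(population)
    if n <= 0 or n > N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                    # sampling interval K = N/n (rounded down)
    rng = random.Random(seed)
    start = rng.randrange(k)      # random position among the first K cases
    return population[start::k][:n]

# Hypothetical list of 500 subjects, identified by their list position.
subjects = list(range(1, 501))
sample = systematic_sample(subjects, 50, seed=3)
print(sample[:5])
```

Only the starting position is random; every later selection follows from it, which is why the choices are not independent.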
Fraenkel and Wallen (2006) note the following caveat with regard to systematic
sampling:
When planning to select a sample from a list of some sort, researchers should
carefully examine the list to make sure there is no cyclical pattern present. If the
list has been arranged in a particular order, researchers should make sure the
arrangements will not bias the sample in some way that could distort the results.
If such seems to be the case, steps should be taken to ensure representativeness –
for example, by randomly selecting individuals from each of the cyclical portions.
In fact, if a population list is randomly ordered, a systematic sample drawn from
the list is a random sample (p. 100).
Selecting the Sample Size and Minimizing Error
According to Ary et al. (2002), all things being equal, the larger the sample,
the better it represents the population. The most important characteristic of a
sample, however, is its representativeness, not its size. Fraenkel and Wallen (2006)
concede that some differences will exist between the sample and population, but if the
sample is randomly selected and of sufficient size, the differences are likely to be
“relatively insignificant and incidental” (p. 103). At what point, they ask, does a sample
stop being too small and become sufficiently large? They contend that a sample should
“be as large as the researcher can obtain with a reasonable expenditure of time and
energy” (p. 104). Popham (1993) further suggests that the researcher “must determine
the uncertainty in the estimate that one would tolerate before changing a decision” (p.
249), and “the task of identifying a sufficiently large sample is more difficult than is
usually thought” (p. 250).
So how big should a sample be? Fowler (2002) identified three common but
inappropriate ways to answer this question.
1. The adequacy of a sample does not depend heavily on the fraction of the
population included in that sample, as if 1%, 5%, or some other
percentage of the population would make a sample credible.
2. Do not rely on what other competent researchers have considered to be adequate
sample sizes when making your sample size decision. The sample size decision
should be made on a case-by-case basis, with the researcher “considering the
variety of goals to be achieved by a particular study and taking into account
numerous other aspects of the research design.”
3. The researcher should not decide how much margin of error he or she can tolerate
or how much precision is required of estimates before determining the sample
size.
Fowler concedes that the third approach makes theoretical sense, but “provides little
help to most researchers trying to design real studies” (p. 35). Nevertheless,
there is a credible resource on the Internet that determines sample size by taking
into consideration the necessary degree of precision, expressed as a confidence interval
and a confidence level. The online “Sample Size Calculator” presented by Creative
Research Systems (http://www.surveysystem.com/sscalc.htm) determines how many
people you need to interview in order to get results that reflect the target population as
precisely as needed. Before using the sample size calculator, there are two terms that you
need to know: confidence interval and confidence level.
• The confidence interval (also called the margin of error) is the plus-or-minus figure usually reported in newspaper
or television opinion poll results. For example, if you use a confidence interval of
4 and 47% of your sample picks an answer, you can be "sure" that if you
had asked the question of the entire relevant population, between 43% (47-4) and
51% (47+4) would have picked that answer.
• The confidence level tells you how sure you can be. It is expressed as a percentage
and represents how often the true percentage of the population who would pick an
answer lies within the confidence interval. The 95% confidence level means you
can be 95% certain; the 99% confidence level means you can be 99% certain.
Most researchers use the 95% confidence level.
When you put the confidence level and the confidence interval together, you can say that
you are 95% sure that the true percentage of the population is between 43% and 51%. The wider
the confidence interval you are willing to accept, the more certain you can be that the whole
population's answers would be within that range.
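To make the relationship among sample size, confidence interval, and confidence level concrete, the following Python sketch uses the standard normal-approximation formula for a proportion under simple random sampling, n = z² p(1 - p) / e², with an optional finite population correction. This is a textbook formula offered as an assumption; it is not presented here as the exact computation performed by the online calculator. The function names are illustrative. The margin `e` is given as a proportion (0.04 for a confidence interval of 4), and p = 0.5 is the most conservative assumption about how answers are split.

```python
import math
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)

def sample_size(margin, confidence_level=0.95, p=0.5, population=None):
    """Respondents needed for a given margin of error (as a proportion)."""
    z = z_score(confidence_level)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction: small populations need fewer cases.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

def margin_of_error(n, confidence_level=0.95, p=0.5):
    """Plus-or-minus figure (as a proportion) for a sample of size n."""
    return z_score(confidence_level) * math.sqrt(p * (1 - p) / n)

n_95 = sample_size(0.04, confidence_level=0.95)
n_99 = sample_size(0.04, confidence_level=0.99)
n_small_pop = sample_size(0.04, confidence_level=0.95, population=1000)
print(n_95, n_99, n_small_pop)
```

Under these assumptions, a higher confidence level or a narrower interval requires a larger sample, while a small population reduces the number of respondents needed.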
Finally, Fowler (2002) proffered the following three different ways in which the
sampling process can affect the quality of survey estimates and potentially cause
sampling error:
• If the sample frame excludes some people whom we want to describe, sample
estimates will be biased to the extent that those omitted differ from those
included.
• If the sampling process is not probabilistic, the relationship between the sample
and those sampled is problematic. One can argue for the credibility of a sample
on grounds other than the sampling process; however, there is no statistical basis
for saying a sample is representative of the sampled population unless the
sampling process gives each person selected a known probability of selection.
• The size and design of a probability sample, together with the distribution of what
is being estimated, determine the size of the sampling errors, that is, the chance
variations that occur because of collecting data about only a sample of the
population.
References
Ary, D., Jacobs, L.C., & Razavieh, A. (2002). Introduction to research in education. Belmont,
CA: Wadsworth/Thomson Learning.
Creative Research Systems. (2007). The survey system.
<http://www.surveysystem.com/sdesign.htm>
Fowler, F.J. (2002). Survey research methods. Thousand Oaks, CA: Sage Publications.
Fraenkel, J.R., & Wallen, N.E. (2006). How to design and evaluate research in education. New
York: McGraw-Hill.
Popham, W.J. (1993). Educational evaluation. Needham Heights, MA: Allyn and Bacon.
Stuart, A. (1984). The ideas of sampling (3rd ed.). New York: Macmillan.