Sampling. Fundamental principles.
Daniel Gile
www.cirinandgile.com
Gile Sampling
Why sample? (1)
In research
There is often (but not always) an attempt
To generalize
on the basis of a limited number of observations
Because access is available only to part of reality
If reality were homogeneous
One observation would be enough
(or two or three to make sure no error of observation or
measurement was made)
Why sample? (2)
But reality is generally complex
with variability
It is therefore necessary to find a way of making sure
(to the best possible extent)
that the part of reality which will be observed/measured
adequately represents the whole phenomenon in
which one is interested
Sampling is a set of methods that seek to ensure
that whatever will be observed or measured will
be as similar as possible in its relevant features to
the whole phenomenon under study
Representative samples and sampling error
A sample should be representative: representativeness is its most important
feature.
In statistical terms, this does not mean that it should have exactly
the same characteristics as the whole phenomenon which is the
object of research (the population)
Some difference between the features of the sample and the
features of the population is always possible, and even very
likely. Such a difference is called ‘sampling error’, though
it is not an ‘error’ in the usual sense of the word.
A representative sample is not one without sampling error.
It is one without bias, that is, without a systematic deviation from the
features of the population (the sample values being generally either
systematically higher or systematically lower)
Samples and populations
In statistics, the set of individuals, objects, processes, events,
situations or other entities that are the object of research is
called the population
The sample is a subset of this population
It has a size, namely the number of entities of which it is made up
[Diagram: the sample shown as a subset of the population]
What is measured in samples?
Generally, what is measured in the sample is one of the features
of the entities it is made of, in order to estimate its value in the
population
(percentage of unemployed people, students’ marks in a test, mean time
to perform a task…)
Two very important measurements in a sample are its
mean value, the sample mean,
and its standard deviation, which indicates roughly how far
individual values in the sample typically lie from the mean
In a representative sample,
The mean is an estimate of the population mean
The standard deviation, together with the sample size, gives us an idea
of the degree of uncertainty in our inferences (the standard error of the
mean is the standard deviation divided by the square root of the sample size)
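A minimal Python sketch of these two measurements, using only the standard library. The task-time values are invented for illustration and do not come from any study. The last step computes the standard error of the mean, which is how the standard deviation feeds into the uncertainty of inferences about the population mean.

```python
import math
import statistics

# Hypothetical data: time (in seconds) taken by 8 participants to perform a task
times = [12.0, 15.5, 11.2, 14.8, 13.1, 16.0, 12.7, 14.3]

sample_mean = statistics.mean(times)   # estimate of the population mean
sample_sd = statistics.stdev(times)    # sample standard deviation (n - 1 in the denominator)

# Standard error of the mean: how much the sample mean itself is expected to vary
standard_error = sample_sd / math.sqrt(len(times))

print(f"n = {len(times)}, mean = {sample_mean:.2f}, "
      f"sd = {sample_sd:.2f}, standard error = {standard_error:.2f}")
```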
Representative samples, biased samples (1)
If a sample is representative of the population,
measurements on any of the units of which it is made can
vary greatly around the population mean: each may be
higher or lower, sometimes considerably, but the mean of these
measurements is expected to lie closer to the population mean
than the extreme individual values do.
Sample means can nevertheless be sometimes higher,
sometimes lower than the population mean.
If a representative sample is drawn and its mean is
calculated, and then another representative sample is
drawn and its mean is calculated, and so on, the successive
means should tend to be evenly spread around the
population mean.
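The claim that successive sample means spread around the population mean can be checked by simulation. The sketch below is a hypothetical illustration, not part of the original material: it builds a synthetic population (normally distributed, mean 60, standard deviation 15, both arbitrary choices), draws 2,000 simple random samples of 30 units each, and counts how many sample means fall above and below the population mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 synthetic task-completion times (seconds)
population = [random.gauss(60, 15) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Draw many simple random samples of size 30 and record each sample mean
sample_means = [statistics.mean(random.sample(population, 30)) for _ in range(2_000)]

above = sum(m > pop_mean for m in sample_means)
below = len(sample_means) - above
mean_of_means = statistics.mean(sample_means)

print(f"population mean: {pop_mean:.2f}")
print(f"mean of sample means: {mean_of_means:.2f}")
print(f"samples above / below: {above} / {below}")
```

With a fair random draw, the two counts should come out roughly equal, and the mean of the sample means should sit very close to the population mean.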
Representative samples, biased samples (2)
When a large number of samples has been drawn and their
sample means have been calculated, their mean (the mean
of the means) will be very close to the population mean…
provided these are representative samples.
If the samples are biased in some way, their means will tend
to be systematically either above or below the population
mean, and this tendency will not disappear no matter how
many samples are drawn.
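To see why bias, unlike sampling error, does not shrink as more samples are drawn, the following sketch uses a deliberately biased selection scheme. Both the synthetic population (uniform between 30 and 90) and the scheme (each unit's chance of selection proportional to its own value, somewhat like self-selection by more motivated respondents) are assumptions chosen for illustration.

```python
import itertools
import random
import statistics

random.seed(2)  # fixed seed for a reproducible run

# Hypothetical population of 10,000 values spread uniformly between 30 and 90
population = [random.uniform(30, 90) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Hypothetical biased scheme: a unit's chance of being drawn is proportional
# to its own value, so high values are systematically over-represented.
cum_weights = list(itertools.accumulate(population))

def biased_sample(k):
    # Draws k units with replacement, weighted by their value
    return random.choices(population, cum_weights=cum_weights, k=k)

gaps = {}
for n_samples in (100, 1_000, 5_000):
    means = [statistics.mean(biased_sample(30)) for _ in range(n_samples)]
    gaps[n_samples] = statistics.mean(means) - pop_mean
    print(f"{n_samples:>5} samples: mean of means minus population mean = "
          f"{gaps[n_samples]:+.2f}")
```

Under these assumptions the gap should stay near +5 (the population variance, about 300, divided by the population mean, about 60) whether 100 or 5,000 samples are drawn: more samples make the biased estimate more stable, but do not move it towards the population mean.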
Sampling error and sample size
The variability of sampling error
(the random difference between sample mean and population
mean) can be reduced by increasing sample size.
However, the reduction is not proportional
to the increase in sample size:
the typical size of the error is divided by the square root of the factor
by which sample size is multiplied
In other words, in order to reduce such variability by half,
you need to increase sample size by a factor of 4.
If you want to reduce it by 75%,
you have to multiply the sample size by 16.
In concrete terms, this means that sampling costs rise steeply
(with the square of the desired gain in precision) for relatively little gain.
This is one reason why samples are generally not made up of thousands of
units.
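The square-root rule can be illustrated with the standard error of the mean, σ/√n. A minimal sketch, assuming a hypothetical population standard deviation of 15 and simple random sampling from a population large enough that the finite-population correction can be ignored:

```python
import math

sigma = 15.0  # hypothetical population standard deviation

def standard_error(sigma, n):
    """Standard error of the mean for a simple random sample of size n."""
    return sigma / math.sqrt(n)

base_n = 25
for factor in (1, 4, 16):
    n = base_n * factor
    se = standard_error(sigma, n)
    ratio = se / standard_error(sigma, base_n)
    print(f"n = {n:>4}: standard error = {se:.2f} ({ratio:.0%} of the n = {base_n} value)")
```

Multiplying n by 4 halves the standard error, and multiplying it by 16 divides it by 4 (a 75% reduction), matching the figures above.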
Other ways of reducing sampling error
Another way of reducing sampling error consists of using more
precise sampling methods, provided the available information on
the population allows it.
For instance, if it is known that in a given population,
70% belong to ethnic group A
20% belong to ethnic group B
10% belong to ethnic group C
(and ethnic group is believed to be relevant to what is being measured)
In a simple random sample, some ethnic groups could be over-represented or under-represented.
In order to reduce the error, random sampling can be done separately within each group,
so that the sample contains 70% of people from group A, 20% from group B and 10% from
group C.
This method is called stratified sampling
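A minimal sketch of proportional stratified sampling in Python, using the 70/20/10 split from the example. The sampling frame of 1,000 labelled people and the `stratified_sample` helper are hypothetical, introduced only for illustration.

```python
import random

random.seed(3)  # fixed seed for a reproducible run

# Hypothetical sampling frame: 1,000 people labelled by ethnic group (70% A, 20% B, 10% C)
frame = ([("A", i) for i in range(700)]
         + [("B", i) for i in range(200)]
         + [("C", i) for i in range(100)])

def stratified_sample(frame, shares, n):
    """Proportional stratified sampling: a simple random sample is drawn
    separately within each group, with sizes fixed by the known shares."""
    sample = []
    for group, share in shares.items():
        stratum = [unit for unit in frame if unit[0] == group]
        sample.extend(random.sample(stratum, round(n * share)))
    return sample

shares = {"A": 0.70, "B": 0.20, "C": 0.10}
sample = stratified_sample(frame, shares, 50)
counts = {g: sum(1 for unit in sample if unit[0] == g) for g in shares}
print(counts)  # {'A': 35, 'B': 10, 'C': 5}
```

Here the group proportions in the sample are fixed by design rather than left to chance. For other shares or sample sizes, the rounded stratum sizes may not add up exactly to `n` and would need adjusting.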
But if there is so much uncertainty, is it legitimate at
all to make inferences?
Mathematical calculations based on probability theory make
it possible to assess the probability that the mean calculated
on a representative sample lies within a certain distance
of the population mean
(this is the basis of the confidence interval).
This is only an estimate, but it holds with a stated
probability, so it is helpful in making inferences...
But there is no absolute certainty…
Especially if there is a hidden bias somewhere.
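As an illustration, a 95% confidence interval for a mean can be computed with the normal approximation. The marks below are invented; the 95% level and the use of the normal approximation (rather than Student's t distribution, which gives slightly wider intervals for small samples) are assumptions of this sketch. Such an interval accounts only for random sampling error, not for bias.

```python
import math
import statistics

# Hypothetical test marks of 40 students (illustrative data only)
marks = [12, 14, 9, 15, 11, 13, 10, 16, 12, 14,
         13, 11, 15, 12, 10, 14, 13, 12, 11, 15,
         9, 13, 14, 12, 16, 11, 13, 12, 10, 14,
         15, 12, 13, 11, 14, 12, 13, 10, 15, 12]

n = len(marks)
mean = statistics.mean(marks)
se = statistics.stdev(marks) / math.sqrt(n)   # standard error of the mean

# 95% interval using the normal approximation
z = statistics.NormalDist().inv_cdf(0.975)    # about 1.96
low, high = mean - z * se, mean + z * se
print(f"mean = {mean:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```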
So how do you know that a sample is representative
(i.e. not biased)?
The only way to do away with any risk of bias
is to conduct random sampling,
that is,
drawing samples in such a way that each unit in the population has
the same probability of being drawn.
This can be done with a random number table, or with a
computer that generates a pseudo-random sequence of
numbers
But any selection method based on human judgment
rather than on a random process
carries a risk of introducing some hidden bias.
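In code, a simple random sample requires a complete list of the population (a sampling frame) and a pseudo-random generator. A minimal sketch using Python's `random.sample`, with a hypothetical frame of 500 member IDs; the second part repeats the draw many times to check empirically that every member is selected at about the same rate.

```python
import random
from collections import Counter

# Hypothetical sampling frame: a complete list of a population of 500 members
frame = [f"member-{i:03d}" for i in range(500)]

rng = random.Random(42)         # pseudo-random generator, seeded for reproducibility
sample = rng.sample(frame, 50)  # simple random sample without replacement

# Check: repeat the draw many times and see how often each member is selected.
# With n = 50 out of N = 500, each member should be drawn in about 10% of draws.
draws = 2_000
counts = Counter()
for _ in range(draws):
    counts.update(rng.sample(frame, 50))
rates = [counts[m] / draws for m in frame]
print(f"selection rate per member: min {min(rates):.3f}, max {max(rates):.3f}")
```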
What happens in the field?
In the human and social sciences, only rarely is it possible to actually
do random sampling
If only because only rarely does one have a full list of all persons
in a population, so that those to be in the sample can be chosen
at random from the whole population.
Also, even when such a list is available, only rarely will all those
drawn into the sample actually agree to participate.
As a result, most of the sampling done is non-random:
convenience sampling or sampling with ‘self-selection’ by
participants/respondents
So one cannot be certain that the sample is not biased.
Strictly speaking, by rigorous scientific norms,
inferences drawn from such samples are not up to standard.
Implications (1)
This does not totally invalidate the approach,
especially if care is taken to draw samples that one believes
to be representative,
depending on the investigator’s knowledge of the
phenomenon under study and on his/her beliefs as to
where biases could lie.
But there is no certainty, and there is a subjective, arbitrary
part in one’s assessment of the findings and their
reliability
Implications (2)
Scientific caution therefore calls for tentative conclusions,
not strong claims
Statistical analyses can be carried out, but ideally, readers of
the report should be reminded that the samples are not
necessarily representative
In some cases, when the investigator knows that the sample
belongs to a well-defined subset of the population with its
own features
(young people, people from a certain cultural background,
from a certain social class etc.),
it is desirable to state explicitly that one is aware of the possibility
of bias arising from the specific features of this subset of the
population.
Implications (3)
Summing up this particular issue,
Unless a sample is truly random
Any generalization can only be tentative,
Which means that findings in a single study do not prove
anything
It is only through the accumulation of findings pointing in
the same direction
That the idea that they can be generalized gradually gains
ground
Samples and case studies
But in that case, is there any fundamental difference between
studies on samples and studies on single cases (case studies)?
After all, the accumulation of convergent results in case studies
has the same effect as the accumulation of convergent findings
in studies on samples
Indeed, but studies on samples are more powerful, because they
reduce variability, since a sample mean is very likely to be
closer to the population mean than a random individual value
Nevertheless, case studies are legitimate: in practical terms of
feasibility, they can sometimes be replicated more often than
studies on samples… and each case study often makes it
possible to conduct a more thorough, in-depth investigation than
a study on a sample
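The claim that a sample mean is very likely to be closer to the population mean than a single individual value can be checked by simulation. A hypothetical sketch, assuming a synthetic normally distributed population (mean 50, standard deviation 10) and samples of 25 units; none of these figures come from the original material.

```python
import random
import statistics

random.seed(4)  # fixed seed for a reproducible run

# Hypothetical population (synthetic data): 10,000 values with mean near 50
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)

trials = 5_000
closer = 0
for _ in range(trials):
    individual = random.choice(population)                         # a single "case"
    sample_mean = statistics.mean(random.sample(population, 25))   # a sample of 25
    if abs(sample_mean - mu) < abs(individual - mu):
        closer += 1

print(f"sample mean closer than a single individual in {closer / trials:.0%} of trials")
```

Under these assumptions the sample mean is closer in the large majority of trials, and the advantage grows with sample size, since the spread of sample means shrinks as σ/√n while the spread of individual values does not.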