Sampling
Sampling Distributions
• A sample is a subset of a population used to
infer something about the population.
• Probability sampling – the likelihood of
selecting each member is known
• Nonprobability sampling – likelihood unknown
Random sampling:
• each member of population has an equal
and independent chance of being
selected.
• Equal – no bias of one person chosen
rather than another
• Independent – choice of one person does
not influence choice of next
• RANDOM and HAPHAZARD are NOT the
same thing.
• True random sampling is a very systematic
structured selection
Simple random
• Define population
• List members of population
• Assign numbers to each member
• Random selection (e.g. random number
table)
• If you have the whole population this
works
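The four steps above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the population list and the function name `simple_random_sample` are hypothetical, and `random.sample` stands in for the random number table.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    # Steps 1-3: the list defines the population and enumerate() assigns
    # a number to each member.
    numbered = list(enumerate(population))
    # Step 4: random selection (stands in for a random number table).
    chosen = rng.sample(numbered, n)
    return [member for _, member in chosen]

# Hypothetical population of 100 listed members.
population = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

Note that the code needs the full population list, which is exactly the condition stated above.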
Systematic sampling
• Select every kth value but have random
start point
• Population list in random order
• Not every possible sample has an equal chance of selection
(the start point fixes the rest of the sample)
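A minimal sketch of systematic selection, assuming the population list is already in random order. The function name `systematic_sample` and the numbered members are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Take every kth member, beginning at a random start point."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start among the first k members
    return population[start::k]   # then every kth member after that

members = list(range(1, 101))     # hypothetical members numbered 1..100
picked = systematic_sample(members, k=10, seed=3)
print(picked)
```

Only the start point is random. Once it is chosen, every other selection is determined.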
Stratified random selection
• If some characteristic of the population
needs to be considered – e.g. gender, religion
• Need a profile of the population
• Know proportions in each category and
select sample to match BUT must use
random selection
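One way to sketch proportional stratified selection, assuming a hypothetical population that is 60% group "A" and 40% group "B". The function name and the data are illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, n, seed=None):
    """Match the population's proportions, selecting randomly within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for m in members:
        strata[stratum_of(m)].append(m)
    sample = []
    for group in strata.values():
        # Proportional allocation; rounding can make the total differ
        # slightly from n when proportions are not whole numbers.
        quota = round(n * len(group) / len(members))
        sample.extend(rng.sample(group, quota))
    return sample

# Hypothetical profile: 60 members in stratum "A", 40 in stratum "B".
people = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]
chosen = stratified_sample(people, lambda p: p[0], n=10, seed=0)
counts = {g: sum(1 for p in chosen if p[0] == g) for g in ("A", "B")}
print(counts)  # {'A': 6, 'B': 4}
```

The sample profile matches the population profile, but who fills each quota is still decided at random.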
Cluster sampling
• Units (e.g. dorm, clinic, school) are selected
at random, rather than individuals
• Not independent
• Bias possible
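A rough sketch of cluster selection, assuming four hypothetical dorms of five students each. Whole units are drawn at random and everyone in a chosen unit is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole units at random, then include every member of each unit."""
    rng = random.Random(seed)
    chosen_units = rng.sample(sorted(clusters), n_clusters)
    members = [m for unit in chosen_units for m in clusters[unit]]
    return chosen_units, members

# Hypothetical units: dorms A-D with five students each.
dorms = {
    f"dorm_{d}": [f"dorm_{d}_student_{s}" for s in range(5)]
    for d in "ABCD"
}
units, students = cluster_sample(dorms, 2, seed=2)
print(units, len(students))
```

Because students enter the sample together with their dorm, one student's selection is tied to that of their dorm-mates, which is why the selections are not independent.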
Nonprobability sampling
• Convenience – very common
• Quota – selects profile but not random
selection – first 10 sign up…
Other samples
• Matched – precision match (e.g. twins)
• Range – categorize then assign
• Cohort samples – common in
development studies
| Type of sampling | When it should be used | Advantages | Disadvantages |
|---|---|---|---|
| Probability sampling | | | |
| Simple random sampling | Population's members are similar to each other | Ensures a good representation | Time consuming and tedious |
| Systematic sampling | Population's members are similar to each other | Ensures a good representation, no random number table | Less random |
| Stratified random sampling | Heterogeneous population – several groups | Ensures a good representation of all strata in population | Time consuming and tedious |
| Cluster sampling | Population consists of units rather than individuals | Easy and convenient | Possibility that members of units are different from one another – decreasing sampling effectiveness |
| Nonprobability sampling | | | |
| Convenience sampling | Sample is captive | Easy and inexpensive | Questionable representation |
| Quota sampling | Strata present and stratified sampling not possible | Some representation of all strata in population | Questionable representation |
Two factors count
• Random selection
• Size of sample
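Landon vs FDR (1936)

| | FDR's percentage | Sample |
|---|---|---|
| Digest prediction of the election | 43% | 10 million surveys mailed (2.4 million returned) |
| Gallup prediction of the Digest prediction | 44% | 3,000 |
| Gallup prediction of the election | 56% | 50,000 |
| Election result | 62% | |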
• When a selection procedure is biased
taking a large sample does not help.
• It just repeats the same mistake over and
over.
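A small simulation can illustrate the point. All numbers here are assumptions chosen for illustration, not the historical data: a population where about 62% favour FDR, and a biased procedure that reaches Landon voters twice as readily as FDR voters.

```python
import random

rng = random.Random(1936)
N = 200_000
# 1 = votes FDR, 0 = votes Landon; roughly 62% FDR (assumed).
voters = [1 if rng.random() < 0.62 else 0 for _ in range(N)]
true_share = sum(voters) / N

# Biased procedure (assumed rates): an FDR voter ends up in the sample
# with probability 0.3, a Landon voter with probability 0.6.
biased = [v for v in voters if rng.random() < (0.3 if v == 1 else 0.6)]

# Small simple random sample of 1,000 voters.
small = rng.sample(voters, 1000)

biased_est = sum(biased) / len(biased)
random_est = sum(small) / len(small)

print(f"true share of FDR voters: {true_share:.3f}")
print(f"biased sample (n={len(biased)}): {biased_est:.3f}")
print(f"random sample (n={len(small)}): {random_est:.3f}")
```

The biased sample is many times larger, yet its estimate stays far from the true share, while the small random sample lands close to it.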
Sampling Distribution of Means
The distribution of sample means is the collection
of sample means for all the possible random samples
of a particular size (n) that can be obtained from a
population.
• in probability terms we have all possible
outcomes and can determine the probability of
any one outcome
• the sample means clump around the population
mean (as you would expect if the samples are
representing the population)
Central limit theorem states:
• For any population with mean μ (mu) and
standard deviation σ (sigma), the
distribution of the sample means for a
sample size n will approach a normal
distribution with a mean μ and standard
deviation of σ/√n (standard error) as n
approaches infinity.
What does it mean?
• for any population the distribution of
sample means will approach normal (the
original population does not need to be
normal)
• the distribution of sample means rapidly
approaches normal; n > 30 gives a good
approximation
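The theorem can be checked with a quick simulation. This is a sketch under assumptions: the population is a hypothetical right-skewed (exponential) one, and samples are drawn with replacement to mimic a very large population.

```python
import random
import statistics

rng = random.Random(0)
# A clearly non-normal, right-skewed population (assumed for illustration).
population = [rng.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def sample_means(n, reps=2000):
    """Means of `reps` random samples of size n, drawn with replacement."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

for n in (2, 30):
    means = sample_means(n)
    print(
        f"n={n}: mean of sample means={statistics.fmean(means):.3f} "
        f"(mu={mu:.3f}), sd of sample means={statistics.stdev(means):.3f} "
        f"(sigma/sqrt(n)={sigma / n ** 0.5:.3f})"
    )
```

For both sample sizes the sample means centre on μ, and their spread is close to σ/√n, shrinking as n grows.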
Standard Error
• The standard deviation of the distribution of sample
means: the typical distance between a sample mean and
the population mean.
• σ/√n
• What influences standard error?
• Population standard deviation – the more tightly
scores are clustered around the mean, the closer a
sample mean will be to the population mean.
• Sample size – generally the larger the sample
the more representative.
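A short worked example of σ/√n shows both influences. The value σ = 4.0 is an assumption for illustration, and `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 4.0  # assumed population standard deviation
for n in (4, 16, 64):
    # Quadrupling the sample size halves the standard error.
    print(f"sigma={sigma}, n={n}: SE={standard_error(sigma, n)}")

# Halving the population standard deviation also halves the standard error.
print(f"sigma={sigma / 2}, n=16: SE={standard_error(sigma / 2, 16)}")
```

Because n sits under a square root, cutting the standard error in half takes four times as many observations.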
[Figure: Histogram consistent Tues Stroop; x-axis: Time (seconds), y-axis: frequency; Mean = 19.6]
[Figure: N=2; panels: one sample (Mean = 16.7), 10 samples (Mean = 19.06), 100 samples (Mean = 19.97)]
[Figure: N=4; panels: one sample (Mean = 22.8), 10 samples (Mean = 19.4), 100 samples (Mean = 19.56)]