Random sampling

SAMPLING
Basic concepts
• Why not measure everything?
  – Practical reason: Measuring every member of a population is too expensive or impractical
  – Mathematical reason: Random sampling allows us to test hypotheses using inferential (probability) statistics
• Population
  – Largest group to which we intend to project the findings of a study (e.g., every inmate in Jay’s prison)
  – Parameter: A characteristic of the population (the population counterpart of a sample statistic); e.g., mean sentence length
• Sample
  – Any subgroup of the population, however selected
  – Samples intended to represent a population must be selected in a way that makes them “representative” (will come up later)
• Unit of analysis
  – “Persons, places, things or events” under study
  – The “container” for the variables
• “Member” or “element” of the population
  – What we call a case once it’s been drawn into a sample
• Sampling frame
  – A listing of all “elements” or members of the population
• Probability sampling
  – “Gold standard”: every element (“case”) in the population has the same chance of being included in the sample
  – Random sampling is the most common probability technique
[Figure: Jay’s correctional institution, showing the population and a sample drawn from it]
Sampling accuracy and error
• Representativeness: Samples should accurately reflect, or represent, the population from which they are drawn
  – If a sample is representative, then we can accurately “make inferences” (apply our findings) to the population
    • We can simply describe the population
    • Or we can test hypotheses and extend our findings to the population
  – Warning: we cannot generalize to other populations, only to the population from which the sample was drawn
• Sampling error: Unintended differences between a population parameter and the equivalent statistic from an unbiased sample
  – Inevitable result of sampling
  – Try it out in class! Calculate the parameter, mean age. Then take a random sample (more about that later) and compare its mean age, the sample statistic, to the parameter.
  – Any difference between the two is “sampling error.” It tends to decrease as sample size increases
  – Rule of thumb: To minimize sampling error, sample size should be at least 30 for populations up to about 500; for larger populations, sample size should be greater
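The in-class comparison of a parameter with a sample statistic can be tried on a computer as well. The sketch below is only an illustration: the population of 200 ages is made up, and the function name `mean_abs_sampling_error` and the choice of 1,000 repeated draws are assumptions, not part of the course materials.

```python
import random
import statistics

def mean_abs_sampling_error(population, sample_size, draws=1000, seed=0):
    """Average |sample mean - population mean| over repeated random samples."""
    rng = random.Random(seed)
    parameter = statistics.mean(population)      # the population parameter
    errors = []
    for _ in range(draws):
        sample = rng.sample(population, sample_size)   # drawn without replacement
        errors.append(abs(statistics.mean(sample) - parameter))
    return statistics.mean(errors)

# Hypothetical population: ages of 200 people (made-up values, not course data).
_rng = random.Random(42)
ages = [_rng.randint(18, 70) for _ in range(200)]

for n in (10, 30, 100):
    print(n, round(mean_abs_sampling_error(ages, n), 2))
```

A single small sample can land close to the parameter by luck, which is why the sketch averages the error over many draws before comparing sample sizes.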
RANDOM (PROBABILITY) SAMPLING
Sampling process
• Sample with or without replacement?
  – With replacement: Return each case to the population before drawing the next
    • Keeps the probability of being drawn the same
    • Makes it possible to redraw the same case
  – Without replacement: Drawn cases are not returned to the population
    • Probability of undrawn cases being selected increases as cases are drawn
  – In social science research sampling without replacement is by far the most common
    • Most sampling frames are sufficiently large so that as elements are drawn changes in the probability of being drawn are small
• Sample: simple or stratified? (examples on next two slides)
  – In simple random sampling we randomly draw from the entire population
  – In stratified random sampling we divide the population into subgroups according to a characteristic of interest
    • For example, male and female; officers and supervisors; violent offenders and property offenders
  – Can designate strata before or after sampling
    • Proportionate: Draw a sample from the population without regard to strata, then stratify
    • Disproportionate (most common): Stratify first, then draw samples of equal size from each stratum
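The difference between sampling with and without replacement can be made concrete with a short sketch. It assumes a hypothetical sampling frame of 200 case IDs; `draw_probabilities` is an illustrative helper, while `random.choices` and `random.sample` are the standard-library calls for drawing with and without replacement.

```python
import random
from fractions import Fraction

def draw_probabilities(population_size, draws, replacement):
    """Chance that a given not-yet-drawn case is picked on each successive draw."""
    probs = []
    remaining = population_size
    for _ in range(draws):
        probs.append(Fraction(1, remaining))
        if not replacement:
            remaining -= 1        # a drawn case is not returned to the population
    return probs

rng = random.Random(1)
frame = list(range(1, 201))       # hypothetical sampling frame of 200 case IDs

with_repl = rng.choices(frame, k=30)      # the same case may be drawn twice
without_repl = rng.sample(frame, k=30)    # each case appears at most once

print(draw_probabilities(200, 3, replacement=True))
print(draw_probabilities(200, 3, replacement=False))
```

With a frame of 200, the chance moves only from 1/200 to 1/199 to 1/198 over the first three draws, which illustrates why the change is usually treated as small.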
Exercise - using simple random sampling to describe a population
Data from Jay’s correctional center (Koko Wachtel, warden)
Population: 200 inmates
Mean sentence: 2.94 years
[Figure: frequency (# prisoners) by sentence length in years]
Assignment
Draw a random sample of 10 and compare its mean to the population parameter. Then do the same with a random sample of 30. How much error is there? Does it change with sample size?
Exercise - using stratified random sampling to describe a population
Population: 200 inmates; mean sentence: 2.94 years
Property crimes: 150; mean sentence: 2.88 years
Violent crimes: 50; mean sentence: 3.12 years
Assignment
Draw a random sample of 30 from each stratum and compare each sample mean to the corresponding stratum parameter. How much error is there?
Exercise - using random sampling to test a hypothesis
Hypothesis: A pre-existing personal relationship between criminal and
victim is more likely in violent crimes than in crimes against property
You have full access to crime data for Sin City. These statistics show that in 2014
there were 200 crimes, of which 75 percent were property crimes and 25 percent
were violent crimes. For each crime, you know whether the victim and the suspect
were acquainted (yes/no).
Applying what we learned from the preceding two slides…
1. Identify the population.
2. How would you sample?
A. Would you stratify before or after?
B. Which is better? Why?
Stratified proportionate random sampling
Hypothesis: A pre-existing personal relationship between criminal and victim is
more likely in violent crimes than in crimes against property
Sin City
200 crimes in 2014
50 violent (25 %)
150 property (75 %)
randomly select 30 cases
(15% of the population)
(expect 7.5 violent – 25%)
(expect 22.5 property – 75%)
Compare proportions of these cases
where suspects knew the victim
Stratified disproportionate random sampling
Hypothesis: A pre-existing personal relationship between criminal and victim is
more likely in violent crimes than in crimes against property
Sin City
200 crimes in 2014
50 violent (25 %)
150 property (75 %)
randomly select 30 cases
from each category
30 violent
30 property
Compare proportions within each where suspect and
victim were acquainted
(Note: cannot combine results)
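The two Sin City designs can be compared side by side in a short sketch. It follows the definitions used in these slides (proportionate: draw first, then stratify; disproportionate: stratify first, then draw equal-sized samples). The crime records are hypothetical placeholders, and the function names are illustrative only.

```python
import random

# Hypothetical frame: 200 crime records, 50 violent and 150 property.
# Real records would also carry the acquainted (yes/no) variable.
crimes = [{"id": i, "type": "violent" if i < 50 else "property"} for i in range(200)]

def proportionate_sample(frame, n, rng):
    """Draw n cases without regard to strata, then split them by crime type."""
    drawn = rng.sample(frame, n)
    strata = {"violent": [], "property": []}
    for case in drawn:
        strata[case["type"]].append(case)
    return strata

def disproportionate_sample(frame, n_per_stratum, rng):
    """Stratify first, then draw the same number of cases from each stratum."""
    strata = {}
    for crime_type in ("violent", "property"):
        members = [c for c in frame if c["type"] == crime_type]
        strata[crime_type] = rng.sample(members, n_per_stratum)
    return strata

rng = random.Random(2014)
prop = proportionate_sample(crimes, 30, rng)
disp = disproportionate_sample(crimes, 30, rng)
print({k: len(v) for k, v in prop.items()})   # counts vary around 7.5 and 22.5
print({k: len(v) for k, v in disp.items()})   # always 30 and 30
```

In the proportionate design the number of violent cases is left to chance, while the disproportionate design fixes it at 30, at the cost of a combined sample that no longer mirrors the 25/75 split in the population.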
Exercise: Using random sampling to test hypotheses
Hypothesis 1: Gender affects cynicism (two-tailed)
Hypothesis 2: Male cops are more cynical than female cops (one-tailed)
Sin City Police Department has 200 officers; 150 are male and 50 are
female. We wish to test the above hypotheses.
1. Identify the population.
2. How would you sample?
A. Would you stratify? In advance or later?
B. Which is better? Why?
Stratified proportionate random sampling
Hypothesis: Gender affects cynicism (two-tailed)
Male cops are more cynical than female cops (one-tailed)
Sin City
200 officers
150 male (75 %)
50 female (25 %)
randomly select 30 officers
(expect 7.5 females)
(expect 22.5 males)
Compare average cynicism scores
Is there a problem? Hint: how many females in the sample?
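One way to explore the hint is to work out how many females a random sample of 30 is likely to contain. The sketch below assumes a simple random sample drawn without replacement from the 200 officers, in which case the count of females follows a hypergeometric distribution; the function name `prob_females` is illustrative.

```python
from math import comb

def prob_females(k, population=200, females=50, sample_size=30):
    """P(exactly k females) in a simple random sample drawn without replacement."""
    males = population - females
    return comb(females, k) * comb(males, sample_size - k) / comb(population, sample_size)

# Expected number of females: matches the 7.5 shown on the slide.
expected = sum(k * prob_females(k) for k in range(0, 31))

# Chance of ending up with five or fewer females in the sample.
p_five_or_fewer = sum(prob_females(k) for k in range(0, 6))

print(round(expected, 2))
print(round(p_five_or_fewer, 3))
```

Whatever the exact draw, the female subgroup stays well below the rule-of-thumb minimum of 30 cases.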
Stratified disproportionate random sampling
Hypothesis: Gender affects cynicism (two-tailed)
Male cops are more cynical than female cops (one-tailed)
Sin City
200 officers
150 male (75 %)
50 female (25 %)
randomly select 30 officers from each stratum
30 males
30 females
Compare average cynicism scores
Note: don’t recombine these into a single sample!
Sampling in experiments
Making cops “kinder” and “gentler”
The Anywhere Police Department has 200 patrol officers, of whom 150 are
male and 50 are female. Chief Jay wants to test a program that’s
supposed to reduce officer cynicism.
Hypothesis: Officers who complete the training program will be less
cynical
Dependent variable: Score on cynicism scale (1-5, low to high)
Independent variable: Cynicism reduction program (yes/no)
Stratified disproportionate random sampling
Hypothesis: officers who complete the training program will be less cynical
population:
200 patrol officers
150 males (75%)
50 females (25%)
Within each stratum, randomly assign 25 officers to an experimental group and 25 officers to a control group (four groups of 25 in all)
For each group, pre-measure the dependent variable, officer cynicism
Apply the intervention (that is, apply the value of the independent variable, the program)
Experimental groups: YES
Control groups: NO
For each group, post-measure the dependent variable, officer cynicism
Also compare within-group changes – what do they tell us?
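Random assignment within each stratum can be sketched as follows. The officer IDs are made up, and the reading that 50 of the 150 males are randomly selected while all 50 females take part is an assumption drawn from the four groups of 25; `assign_groups` is an illustrative helper.

```python
import random

def assign_groups(stratum, n_per_group, rng):
    """Randomly pick 2 * n_per_group officers and split them into two groups."""
    chosen = rng.sample(stratum, 2 * n_per_group)   # random order, no repeats
    return {
        "experimental": chosen[:n_per_group],       # will receive the program
        "control": chosen[n_per_group:],            # will not receive the program
    }

rng = random.Random(7)
males = [f"M{i:03d}" for i in range(150)]     # hypothetical officer IDs
females = [f"F{i:03d}" for i in range(50)]

design = {
    "male": assign_groups(males, 25, rng),
    "female": assign_groups(females, 25, rng),
}

for stratum, groups in design.items():
    print(stratum, {name: len(members) for name, members in groups.items()})
```

Because `random.sample` returns the chosen officers in random order, slicing the result in half gives each selected officer the same chance of landing in either group.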
OTHER SAMPLING TECHNIQUES
Quasi-probability sampling
• Systematic sampling
  – Randomly select first element, then choose every 5th, 10th, etc. depending on the size of the sampling frame (number of cases or elements in the population)
  – If done with care can give results equivalent to fully random sampling
  – Caution: if elements in the sampling frame are ordered in a particular way a non-representative sample might be drawn
• Cluster sampling
  – Method
    • Divide population into equal-sized groups (clusters) chosen on the basis of a neutral characteristic
    • Draw a random sample of clusters. The study sample contains every element of the chosen clusters.
  – Often done to study public opinion (city divided into blocks)
  – Rule of equally-sized clusters usually violated
  – The “neutral” characteristic may not actually be neutral and may affect outcomes!
  – Since not everyone in the population has an equal chance of being selected, there may be considerable sampling error
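Both quasi-probability techniques can be sketched in a few lines. The frame of 200 numbered elements is hypothetical, the function names are illustrative, and the cluster version assumes equal-sized clusters formed from consecutive elements, which the slides note is rarely true in practice.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start within the first interval, then take every k-th element."""
    k = len(frame) // n              # sampling interval (e.g., every 10th)
    start = rng.randrange(k)         # randomly selected first element
    return [frame[start + i * k] for i in range(n)]

def cluster_sample(frame, cluster_size, n_clusters, rng):
    """Split the frame into equal-sized clusters and keep every element of the chosen ones."""
    clusters = [frame[i:i + cluster_size] for i in range(0, len(frame), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [element for cluster in chosen for element in cluster]

rng = random.Random(3)
frame = list(range(200))             # hypothetical sampling frame

print(systematic_sample(frame, 20, rng))
print(len(cluster_sample(frame, cluster_size=20, n_clusters=3, rng=rng)))
```

If the frame were ordered in a repeating pattern whose length matched the interval `k`, the systematic sample would pick the same position in every cycle, which is the ordering problem the caution describes.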
Non-probability sampling
• Accidental sample
  – Subjects who happen to be encountered by researchers
  – Example: observer ride-alongs in police cars
• Quota sample
  – Elements are included in proportion to their known representation in the population
• Purposive/“convenience” sample
  – Researcher uses best judgment to select elements that typify the population
  – Example: Interview all burglars arrested during the past month
• Issues
  – Can findings be “generalized” or projected to a larger population?
  – Are findings valid only for the cases actually included in the samples?
PRACTICAL EXERCISE
Class assignment - non-experimental designs
Hypothesis: Higher income persons drive more expensive cars - Income → Car Value
• Independent variable: income
  – Categorical, nominal: student or faculty/staff
• Dependent variable: car value
  – Categorical, ordinal: 1 (cheapest), 2, 3, 4 or 5 (most expensive)
• Assignment
  – Visit one faculty and one student lot.
  – Select ten vehicles in each lot using systematic sampling
  – Use the operationalized car values to code each car’s value
  – Give each team member a filled-in copy and turn one in per team next week
  – We will complete the tables in class
  – This assignment is worth five points
PLEASE BRING THESE FORMS TO EVERY CLASS SESSION!