SAMPLING
Purposes
Representativeness
“Sampling error”
Review: essential definitions
• Population (N=size)
– Largest group to which we intend to project (apply) the findings of a study
– All prisoners in Jay’s prison / all students in his class
– “Parameter” - any statistic (e.g., mean) of a population
• Sample (n=size)
– Any subgroup of the population
– Samples intended to represent a population must be selected in special ways (will come up later)
• Unit of analysis
– The “container” for the variables
– Here, the variables under study are sentence length and type of crime (property or violent)
  • What “contains” them? Prisoners!
• Case
– A single occurrence of a unit of analysis
– Here, it’s any one prisoner
– Cases are “members” or “elements” of the population from which one or more samples are drawn
• Sampling frame
– A list of all “elements” or members of the population
[Diagram: Jay’s prison / Jay’s class, showing the population and a sample]
Purposes & representativeness
• Purposes of sampling
– Descriptive: Describe characteristics of a population without having to measure every member (e.g., age, height, gender)
– Explanatory: Help test hypotheses of cause and effect (e.g., gender determines height)
• Representativeness: Samples should accurately reflect, or represent, the population from which they are drawn
– We will be exploring ways to make samples “representative”
– If a sample is representative, we can apply findings from that sample - “make inferences” - to the population from which the sample was drawn
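Population (columns appear to be age, height, gender)
Age   Height   Gender
21    70       M
26    63       F
21    68       M
23    70       M
22    67       M
25    65       M
28    62       F
24    70       M
29    68       M
21    73       M
[Table footer: “Population parameters” and “Sample statistics” boxes; values shown: 10 (number of cases), 24.00 (mean age)]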
Sampling error
• Sampling error: Unintended differences between a population parameter and the equivalent statistic from an unbiased sample
– Inevitable result of sampling
– Try it out in class! Calculate the population mean for age.
– Then take samples of different sizes and calculate their means.
– Any difference between a population parameter and a sample statistic is “sampling error.” It tends to decrease as sample size increases
• Rule of thumb
– To minimize sampling error, sample size should be at least 30 for populations up to about 500
– For larger populations, sample sizes should be larger
[Table: population parameters beside Sample 1 and Sample 2 statistics; the values 3 and 10 appear to be the two sample sizes]
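The in-class exercise can also be tried in code. The sketch below is only an illustration: it uses the ten ages from the population table, and the helper name `average_sampling_error`, the number of trials, and the seed are assumptions made for the example. It averages the absolute gap between sample mean and population mean over many random samples of each size.

```python
import random
from statistics import mean

# Ages of the ten cases in the population table (N = 10).
ages = [21, 26, 21, 23, 22, 25, 28, 24, 29, 21]
population_mean = mean(ages)  # the population parameter

def average_sampling_error(n, trials=2000, seed=1):
    """Average absolute gap between sample mean and population mean
    over many simple random samples of size n (drawn without replacement)."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    gaps = [abs(mean(rng.sample(ages, n)) - population_mean) for _ in range(trials)]
    return mean(gaps)

errors = {n: average_sampling_error(n) for n in (3, 5, 8, 10)}
for n, err in errors.items():
    print(f"n={n:2d}  average sampling error = {err:.2f}")
```

When the sample size equals the population size (n = 10 here), the sample is the whole population, so the error is zero; smaller samples show larger average error.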
PROBABILITY SAMPLING
With/without replacement
Simple random sampling
Stratified random sampling / proportionate
Stratified random sampling / disproportionate
Sampling with / without replacement
• With replacement: Return each case to the population before drawing the next
– Makes it possible to redraw the same case (not good)
– Keeps the probability of a case being drawn the same from beginning to end (good)
• Without replacement: Drawn cases are not returned to the population
– Probability of undrawn cases being selected increases as cases are drawn
– Sampling without replacement is by far the most common
– Most sampling frames are sufficiently large so that as cases are drawn changes in the probability that any particular case will be drawn are small
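A minimal sketch of the two approaches, assuming a hypothetical sampling frame of ten case IDs. `random.choices` draws with replacement and `random.sample` draws without; the helper `draw_probabilities` is an illustrative name, not part of any library.

```python
import random
from fractions import Fraction

frame = list(range(1, 11))  # hypothetical sampling frame of 10 case IDs
rng = random.Random(42)     # arbitrary seed, only for repeatability

# With replacement: every draw is made from the full frame, so repeats can occur.
with_replacement = rng.choices(frame, k=5)

# Without replacement: a drawn case cannot be drawn again.
without_replacement = rng.sample(frame, k=5)

def draw_probabilities(frame_size, draws, replace):
    """Chance that one particular not-yet-drawn case is picked on each successive draw."""
    return [Fraction(1, frame_size if replace else frame_size - i) for i in range(draws)]

print(draw_probabilities(10, 3, replace=True))    # stays at 1/10
print(draw_probabilities(10, 3, replace=False))   # 1/10, then 1/9, then 1/8
print(draw_probabilities(200, 3, replace=False))  # changes very little in a large frame
```

The last line illustrates why the shift in probability is usually ignored: with a frame of 200, the first three draws move the chance only from 1/200 to 1/198.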
Simple random sampling
• In simple random sampling we draw at random from the entire population
[Table: the same ten-case population shown under “Population parameters” above]
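As a rough illustration, a simple random sample can be drawn from the ten-case population with Python's standard `random.sample`. The column order (age, height, gender) is inferred from the variables named earlier in the slides, and the function name, sample size, and seed are assumptions chosen for the example.

```python
import random

# The ten cases from the population table, read as (age, height, gender).
population = [
    (21, 70, "M"), (26, 63, "F"), (21, 68, "M"), (23, 70, "M"), (22, 67, "M"),
    (25, 65, "M"), (28, 62, "F"), (24, 70, "M"), (29, 68, "M"), (21, 73, "M"),
]

def simple_random_sample(cases, n, seed=None):
    """Draw n cases without replacement; every case has the same chance of selection."""
    return random.Random(seed).sample(cases, n)

sample = simple_random_sample(population, 4, seed=7)
sample_mean_age = sum(age for age, _, _ in sample) / len(sample)
print(sample)
print("sample mean age:", sample_mean_age)
```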
Using simple random sampling
to describe a population
Data from Jay’s correctional center (Koko Wachtel, warden)
Population: 200 inmates
Mean sentence: 2.94 years
[Histogram: frequency (# prisoners) by sentence length in years]
Assignment
Draw a random sample of 10 and compare its mean to the population parameter. Then do the same with a random sample of 30. How much error is there? Does it change with sample size?
Stratified random sampling - proportionate
Population: N=31 (M=21, F=10)
1. Draw a random sample (say, n=20) from the population: n=20 (M=14, F=6)
2. “Stratify”: group these cases according to their value or score on your variable of interest. These groups are called “strata” (sing., “stratum”): M (n=14), F (n=6)
3. If you did a good job randomly sampling, the size of each group (its “n”, or number of cases) should be roughly proportionate to that score or value’s proportion in the population
4. Proceed with your analysis
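The steps above can be sketched as follows, assuming the slide's population of 31 (21 M, 10 F). The seed is arbitrary, so the split in any one run will differ from the slide's M=14, F=6; the point is only to compare each stratum's share of the sample with its share of the population.

```python
import random
from collections import Counter

# Population from the slide: N = 31, of which 21 are M and 10 are F.
population = ["M"] * 21 + ["F"] * 10

rng = random.Random(3)               # arbitrary seed, only for repeatability
sample = rng.sample(population, 20)  # simple random sample, n = 20

strata = Counter(sample)             # group the sampled cases by gender
for value in ("M", "F"):
    pop_share = population.count(value) / len(population)
    sample_share = strata[value] / len(sample)
    print(f"{value}: population {pop_share:.0%}, sample {sample_share:.0%} (n={strata[value]})")
```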
Stratified random sampling - disproportionate
Population: N=31
1. First, designate a variable of interest (gender)
2. Separate the population into subgroups (“strata”) that correspond with the variable’s values (M, F): M (n=21), F (n=10)
3. Draw random samples of equal size from each subgroup: M (n=10), F (n=10)
4. Proceed with your analysis
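A minimal sketch of the same procedure, again assuming 21 M and 10 F cases. The case IDs and the function name `equal_size_stratified_sample` are hypothetical, introduced only for this example.

```python
import random

# Population from the slide: N = 31 (21 M, 10 F). Case IDs are hypothetical.
population = [(i, "M") for i in range(21)] + [(i, "F") for i in range(21, 31)]

def equal_size_stratified_sample(cases, key, n_per_stratum, seed=None):
    """Split cases into strata by key, then draw the same number from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for case in cases:
        strata.setdefault(key(case), []).append(case)
    return {value: rng.sample(members, n_per_stratum) for value, members in strata.items()}

samples = equal_size_stratified_sample(population, key=lambda c: c[1], n_per_stratum=10, seed=5)
for value, drawn in samples.items():
    print(value, len(drawn))
```

Because the F stratum has exactly 10 members, a sample of 10 takes all of them, while the M sample is 10 of 21. The strata are therefore sampled at different rates, which is what makes the design disproportionate.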
Using stratified random sampling
to describe a population
Population: 200 inmates; mean sentence: 2.94 years
Property crimes: 150; mean sentence: 2.88
Violent crimes: 50; mean sentence: 3.12
Assignment
Draw a random sample of 30 from each stratum and compare its mean to the corresponding population parameters. How much error is there?
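The figures on this slide are consistent with one another: the overall mean is the average of the stratum means weighted by stratum size. The short check below uses only the numbers given above; the variable names are illustrative. It also shows what happens if the two strata are treated as equal halves, as they would be in a pooled sample of 30 and 30.

```python
# Stratum sizes and mean sentences taken from the slide.
strata = {
    "property": {"N": 150, "mean_sentence": 2.88},
    "violent": {"N": 50, "mean_sentence": 3.12},
}

total = sum(s["N"] for s in strata.values())

# Weighting each stratum mean by its size recovers the population mean.
weighted_mean = sum(s["N"] * s["mean_sentence"] for s in strata.values()) / total

# Treating the strata as equal halves over-represents violent crimes.
unweighted_mean = sum(s["mean_sentence"] for s in strata.values()) / len(strata)

print(f"weighted by stratum size: {weighted_mean:.2f}")
print(f"strata treated as equal:  {unweighted_mean:.2f}")
```

The weighted figure matches the population mean of 2.94 years, while the equal-halves figure is 3.00, which is why equal-sized stratum samples should be compared with their own stratum parameters rather than pooled.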
Using stratified random sampling
to test a hypothesis - exercises
Exercise 1
Hypothesis: A pre-existing personal relationship between criminal and victim is more likely in violent crimes than in crimes against property
You have full access to crime data for Sin City. These statistics show that in 2014 there were 200 crimes, of which 75 percent were property crimes and 25 percent were violent crimes. For each crime, you know whether the victim and the suspect were acquainted (yes/no).
Exercise 2
Hypothesis: Male cops are more cynical than female cops
Sin City Police Department has 200 officers; 150 are male and 50 are female.
For each exercise:
1. Identify the population.
2. How would you sample proportionally?
3. How would you sample disproportionately?
4. In this example which of the above is preferable? Why?
Stratified proportionate random sampling
Hypothesis: A pre-existing personal relationship between criminal and victim is more likely in violent crimes than in crimes against property
Sin City: 200 crimes in 2014
50 violent (25%)
150 property (75%)
Randomly select 30 cases (15% of the population)
Expect 7.5 violent (25%)
Expect 22.5 property (75%)
Compare proportions within each where suspect and victim were acquainted
BUT: The frequency (number of cases) for violent crime is very small!
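The expected counts above follow from multiplying the sample size by each stratum's share of the population. A small sketch of that arithmetic, with `proportionate_allocation` as a hypothetical helper name:

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Expected number of sampled cases per stratum when sampling proportionately."""
    total = sum(stratum_sizes.values())
    return {name: sample_size * size / total for name, size in stratum_sizes.items()}

crimes = {"violent": 50, "property": 150}   # Sin City, 2014
expected = proportionate_allocation(crimes, 30)
sampling_fraction = 30 / sum(crimes.values())

print(expected)
print(f"sampling fraction: {sampling_fraction:.0%}")
```

The fractional values are expectations; any actual sample of 30 contains whole cases, so the violent-crime count will vary around 7.5 from one draw to the next.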
Stratified disproportionate random sampling
Hypothesis: A pre-existing personal relationship between criminal and victim is more likely in violent crimes than in crimes against property
Sin City: 200 crimes in 2014
50 violent (25%)
150 property (75%)
Randomly select 30 cases from each category
30 violent
30 property
Compare proportions within each where suspect and victim were acquainted
Note: don’t recombine these into a single sample to compute a mean!
Stratified proportionate random sampling
Hypothesis: Male cops are more cynical than female cops
Sin City: 200 officers
150 male (75%)
50 female (25%)
Randomly select 30 officers
Expect 22.5 males
Expect 7.5 females
Compare average cynicism scores
BUT: The frequency (number of cases) for females is very small!
Stratified disproportionate random sampling
Hypothesis: Male cops are more cynical than female cops
Sin City: 200 officers
150 male (75%)
50 female (25%)
Randomly select 30 officers from each stratum
30 males
30 females
Compare average cynicism scores
Note: don’t recombine these into a single sample to compute a mean!
OTHER SAMPLING TECHNIQUES
Quasi-probability sampling:
systematic sampling, cluster sampling
Non-probability sampling
Quasi-probability sampling
• Systematic sampling
– Randomly select the first element, then choose every 5th, 10th, etc., depending on the size of the sampling frame (number of cases or elements in the population)
– If done with care, can give results equivalent to fully random sampling
– Caution: if elements in the sampling frame are ordered in a particular way, a non-representative sample might be drawn
• Cluster sampling
– Method
  • Divide population into equal-sized groups (clusters) chosen on the basis of a neutral characteristic
  • Draw a random sample of clusters. The study sample contains every element of the chosen clusters.
– Often done to study public opinion (city divided into blocks)
– Rule of equally-sized clusters usually violated
– The “neutral” characteristic may not actually be neutral and may affect outcomes!
– Since not everyone in the population has an equal chance of being selected, there may be considerable sampling error
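The cluster method described above can be sketched as follows. The city layout (6 blocks of 4 households each), the naming scheme, and the function name `cluster_sample` are all hypothetical, chosen only to keep the example small.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; the sample is every element of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [element for name in chosen for element in clusters[name]]
    return chosen, sample

# Hypothetical city divided into 6 equal-sized blocks of 4 households each.
blocks = {
    f"block-{b}": [f"block-{b}/house-{h}" for h in range(1, 5)]
    for b in range(1, 7)
}

chosen_blocks, households = cluster_sample(blocks, 2, seed=11)
print(chosen_blocks)
print(households)
```

Only the blocks are selected at random; once a block is chosen, all of its households enter the sample.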
Non-probability sampling
• Accidental sample
– Subjects who happen to be encountered by researchers
– Example: observer ride-alongs in police cars
• Quota sample
– Elements are included in proportion to their known representation in the population
• Purposive/“convenience” sample
– Researcher uses best judgment to select elements that typify the population
– Example: Interview all burglars arrested during the past month
• Issues
– Can findings be “generalized” or projected to a larger population?
– Are findings valid only for the cases actually included in the samples?
PRACTICAL EXERCISE: SYSTEMATIC SAMPLING
Class assignment - systematic sampling
Hypothesis: Higher income persons drive more expensive cars - Income → Car Value
• Independent variable: income
– Categorical, nominal: student or faculty/staff
• Dependent variable: car value
– Categorical, ordinal: 1 (cheapest), 2, 3, 4 or 5 (most expensive)
• Panel assignment (worth 5 points)
– Select a panel coordinator
– Visit a student lot
– Select ten vehicles in each lot using systematic sampling
– Use the operationalized car values to code each car’s value
– Give each team member a filled-in copy and turn one in per team next week
– The copy you turn in must have the printed name and signature of each panelist who participated in collecting data
PLEASE BRING THIS FORM TO EVERY CLASS SESSION!
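One way to pick the ten vehicles is sketched below, assuming a hypothetical lot with 120 occupied spaces numbered in walking order. The function name and the choice of interval (lot size divided by the number of vehicles wanted) are assumptions for illustration, not part of the assignment.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then take every k-th element."""
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("frame is smaller than the requested sample")
    start = random.Random(seed).randrange(k)  # random first element
    return [frame[start + i * k] for i in range(n)]

parking_spaces = list(range(1, 121))  # hypothetical lot: 120 occupied spaces
picked = systematic_sample(parking_spaces, 10, seed=2)
print(picked)
```

If the lot is laid out in a repeating pattern, for example reserved spaces at the start of every row, an interval that lines up with the pattern can produce a non-representative sample.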