SAMPLING

Basic concepts
• Why not measure everything?
  – Practical reason: Measuring every member of a population is too expensive or impractical
  – Mathematical reason: Random sampling allows us to test hypotheses using inferential (probability) statistics
• Population
  – Largest group to which we intend to project the findings of a study (e.g., every inmate in Jay’s prison)
  – Parameter: A statistic of the population; e.g., mean sentence length
• Sample
  – Any subgroup of the population, however selected
  – Samples intended to represent a population must be selected in a way that makes them “representative” (will come up later)
• Unit of analysis
  – “Persons, places, things or events” under study
  – The “container” for the variables
• “Member” or “element” of the population
  – Called a “case” once it has been drawn into a sample
• Sampling frame
  – A listing of all “elements” or members of the population
• Probability sampling
  – “Gold standard”: every element (“case”) in the population has the same chance of being included in the sample
  – Random sampling is the most common probability technique

[Figure: Jay’s correctional institution; labels: Population, Sample]

Sampling accuracy and error
• Representativeness: Samples should accurately reflect, or represent, the population from which they are drawn
  – If a sample is representative, then we can accurately “make inferences” (apply our findings) to the population
    • We can simply describe the population
    • Or we can test hypotheses and extend our findings to the population
  – Warning: we cannot generalize to other populations, only to the population from which the sample was drawn
• Sampling error: Unintended differences between a population parameter and the equivalent statistic from an unbiased sample
  – Inevitable result of sampling
  – Try it out in class! Calculate the parameter, mean age. Then take a random sample (more about that later) and compare the parameter to the sample statistic.
  – Any difference between the two is “sampling error.” It should decrease as sample size increases
  – Rule of thumb: To minimize sampling error, sample size should be at least 30 for populations up to about 500; for larger populations, sample size should be greater

RANDOM (PROBABILITY) SAMPLING

Sampling process
• Sample with or without replacement?
  – With replacement: Return each case to the population before drawing the next
    • Keeps the probability of being drawn the same
    • Makes it possible to redraw the same case
  – Without replacement: Drawn cases are not returned to the population
    • Probability of undrawn cases being selected increases as cases are drawn
  – In social science research, sampling without replacement is by far the most common
    • Most sampling frames are large enough that the probability of being drawn changes only slightly as elements are drawn
• Sample: simple or stratified? (examples on next two slides)
  – In simple random sampling we randomly draw from the entire population
  – In stratified random sampling we divide the population into subgroups according to a characteristic of interest
    • For example, male and female; officers and supervisors; violent offenders and property offenders
  – Can designate strata before or after sampling
    • Proportionate: Draw a sample from the population without regard to strata, then stratify
    • Disproportionate (most common): Stratify first, then draw samples of equal size from each stratum

Exercise - using simple random sampling to describe a population
Population: 200 inmates
Mean sentence: 2.94 years
Assignment
Draw a random sample of 10 and compare its mean to the population parameter. Then do the same with a random sample of 30.
How much error is there? Does it change with sample size?
[Figure: Data from Jay’s correctional center; axis label: Frequency (# prisoners)]
[Figure labels: Sentence length in years (axis); Koko Wachtel, warden (caption)]

Exercise - using stratified random sampling to describe a population
Population: 200 inmates; mean sentence 2.94 years
Property crimes: 150 inmates; mean sentence 2.88 years
Violent crimes: 50 inmates; mean sentence 3.12 years
Assignment
Draw a random sample of 30 from each stratum and compare its mean to the corresponding population parameter. How much error is there?

Exercise - using random sampling to test a hypothesis
Hypothesis: A pre-existing personal relationship between criminal and victim is more likely in violent crimes than in crimes against property
You have full access to crime data for Sin City. These statistics show that in 2014 there were 200 crimes, of which 75 percent were property crimes and 25 percent were violent crimes. For each crime, you know whether the victim and the suspect were acquainted (yes/no).
Applying what we learned from the preceding two slides…
1. Identify the population.
2. How would you sample?
   A. Would you stratify before or after?
   B. Which is better? Why?
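The simple random sampling exercise (samples of 10 and 30 from a population of 200) can also be tried on a computer. The sketch below is only an illustration: the sentence lengths are made-up values rather than the data from Jay's correctional center, and the function names `make_population`, `sampling_error` and `mean_abs_error` are hypothetical. It draws without replacement and reports the typical size of the sampling error for each sample size.

```python
import random
import statistics


def make_population(size=200, seed=1):
    """Build a synthetic list of sentence lengths in years (made-up data)."""
    rng = random.Random(seed)
    return [rng.choice([1, 2, 3, 4, 5, 6]) for _ in range(size)]


def sampling_error(population, n, rng):
    """Draw n cases without replacement; return sample mean minus parameter."""
    parameter = statistics.mean(population)   # population parameter
    sample = rng.sample(population, n)        # simple random sample
    return statistics.mean(sample) - parameter


def mean_abs_error(population, n, draws=2000, seed=2):
    """Average absolute sampling error over many repeated samples of size n."""
    rng = random.Random(seed)
    return statistics.mean(
        abs(sampling_error(population, n, rng)) for _ in range(draws)
    )


population = make_population()
for n in (10, 30):
    print(n, round(mean_abs_error(population, n), 3))
```

A single sample of 30 can still land further from the parameter than a single sample of 10; the decrease in error shows up on average, which is why the sketch repeats the draw many times.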
Stratified proportionate random sampling
Hypothesis: A pre-existing personal relationship between criminal and victim is more likely in violent crimes than in crimes against property
Sin City: 200 crimes in 2014
  50 violent (25%)
  150 property (75%)
Randomly select 30 cases (15% of the population)
  (expect 7.5 violent, 25%)
  (expect 22.5 property, 75%)
Compare proportions of these cases where suspects knew the victim

Stratified disproportionate random sampling
Hypothesis: A pre-existing personal relationship between criminal and victim is more likely in violent crimes than in crimes against property
Sin City: 200 crimes in 2014
  50 violent (25%)
  150 property (75%)
Randomly select 30 cases from each category
  30 violent
  30 property
Compare proportions within each where suspect and victim were acquainted
(Note: cannot combine results)

Exercise: Using random sampling to test hypotheses
Hypothesis 1: Gender affects cynicism (two-tailed)
Hypothesis 2: Male cops are more cynical than female cops (one-tailed)
Sin City Police Department has 200 officers; 150 are male and 50 are female. We wish to test the above hypotheses.
1. Identify the population.
2. How would you sample?
   A. Would you stratify? In advance or later?
   B. Which is better? Why?

Stratified proportionate random sampling
Hypotheses:
  Gender affects cynicism (two-tailed)
  Male cops are more cynical than female cops (one-tailed)
Sin City: 200 officers
  150 male (75%)
  50 female (25%)
Randomly select 30 officers
  expect 22.5 males
  expect 7.5 females
Compare average cynicism scores
Is there a problem? Hint: how many females in the sample?

Stratified disproportionate random sampling
Hypotheses:
  Gender affects cynicism (two-tailed)
  Male cops are more cynical than female cops (one-tailed)
Sin City: 200 officers
  150 male (75%)
  50 female (25%)
Randomly select 30 officers from each stratum
  30 males
  30 females
Compare average cynicism scores
Note: don’t recombine these into a single sample!
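The two stratified designs for the Sin City officers can be contrasted in a few lines of Python. This is a minimal sketch under stated assumptions: officers are represented only by an ID and a gender label, and the helper names `build_frame`, `proportionate` and `disproportionate` are invented for illustration. It follows the definitions used in these slides: proportionate means drawing from the whole frame and stratifying afterwards, while disproportionate means stratifying first and drawing an equal number from each stratum.

```python
import random


def build_frame():
    """Sampling frame from the slides: 150 male and 50 female officers."""
    males = [(i, "male") for i in range(150)]
    females = [(i, "female") for i in range(150, 200)]
    return males + females


def proportionate(frame, n, rng):
    """Draw n cases from the whole frame, then split them by stratum."""
    groups = {}
    for case in rng.sample(frame, n):
        groups.setdefault(case[1], []).append(case)
    return groups


def disproportionate(frame, n_per_stratum, rng):
    """Split the frame by stratum first, then draw the same number from each."""
    strata = {}
    for case in frame:
        strata.setdefault(case[1], []).append(case)
    return {name: rng.sample(cases, n_per_stratum)
            for name, cases in strata.items()}


rng = random.Random(0)
frame = build_frame()
prop = proportionate(frame, 30, rng)
disp = disproportionate(frame, 30, rng)
print({name: len(cases) for name, cases in prop.items()})
print({name: len(cases) for name, cases in disp.items()})
```

In the proportionate draw the number of females varies from run to run around the expected 30 × 0.25 = 7.5, which is the small-subgroup problem the hint points to. The disproportionate draw always yields 30 per stratum, at the cost that the two groups no longer mirror the population mix and should not be pooled.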
Sampling in experiments
Making cops “kinder” and “gentler”
The Anywhere Police Department has 200 patrol officers, of whom 150 are male and 50 are female. Chief Jay wants to test a program that’s supposed to reduce officer cynicism.
Hypothesis: Officers who complete the training program will be less cynical
Dependent variable: Score on cynicism scale (1-5, low to high)
Independent variable: Cynicism reduction program (yes/no)

Stratified disproportionate random sampling
Hypothesis: Officers who complete the training program will be less cynical
Population: 200 patrol officers
  150 males (75%)
    – CONTROL GROUP: randomly assign 25 officers; program: NO
    – EXPERIMENTAL GROUP: randomly assign 25 officers; program: YES
  50 females (25%)
    – EXPERIMENTAL GROUP: randomly assign 25 officers; program: YES
    – CONTROL GROUP: randomly assign 25 officers; program: NO
For each group, pre-measure the dependent variable (officer cynicism)
Apply the intervention (apply the value of the independent variable: the program)
For each group, post-measure the dependent variable (officer cynicism)
Also compare within-group changes: what do they tell us?

OTHER SAMPLING TECHNIQUES

Quasi-probability sampling
• Systematic sampling
  – Randomly select the first element, then choose every 5th, 10th, etc., depending on the size of the sampling frame (number of cases or elements in the population)
  – If done with care, can give results equivalent to fully random sampling
  – Caution: if elements in the sampling frame are ordered in a particular way, a non-representative sample might be drawn
• Cluster sampling
  – Method
    • Divide population into equal-sized groups (clusters) chosen on the basis of a neutral characteristic
    • Draw a random sample of clusters. The study sample contains every element of the chosen clusters.
  – Often done to study public opinion (city divided into blocks)
  – Rule of equally-sized clusters usually violated
  – The “neutral” characteristic may not actually be neutral and can affect outcomes!
  – Since not everyone in the population has an equal chance of being selected, there may be considerable sampling error

Non-probability sampling
• Accidental sample
  – Subjects who happen to be encountered by researchers
  – Example: observer ride-alongs in police cars
• Quota sample
  – Elements are included in proportion to their known representation in the population
• Purposive/“convenience” sample
  – Researcher uses best judgment to select elements that typify the population
  – Example: Interview all burglars arrested during the past month
• Issues
  – Can findings be “generalized” or projected to a larger population?
  – Are findings valid only for the cases actually included in the samples?

PRACTICAL EXERCISE

Class assignment - non-experimental designs
Hypothesis: Higher income persons drive more expensive cars (Income → Car Value)
• Independent variable: income
  – Categorical, nominal: student or faculty/staff
• Dependent variable: car value
  – Categorical, ordinal: 1 (cheapest), 2, 3, 4 or 5 (most expensive)
• Assignment
  – Visit one faculty and one student lot.
  – Select ten vehicles in each lot using systematic sampling
  – Use the operationalized car values to code each car’s value
  – Give each team member a filled-in copy and turn one in per team next week
  – We will complete the tables in class
  – This assignment is worth five points

PLEASE BRING THESE FORMS TO EVERY CLASS SESSION!
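Systematic sampling, as called for in the parking-lot assignment, can be sketched as follows. This is an illustrative assumption rather than a prescribed procedure: the function name `systematic_sample` is hypothetical, the lot is represented simply as a numbered list of parking spaces, and the interval is taken as the frame size divided by the desired sample size (rounded down).

```python
import random


def systematic_sample(frame, n, rng=None):
    """Pick a random start within the first interval, then every k-th element."""
    rng = rng or random.Random()
    k = len(frame) // n          # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds the size of the sampling frame")
    start = rng.randrange(k)     # random starting point in positions 0 .. k-1
    return frame[start::k][:n]   # every k-th element, trimmed to n cases


# Hypothetical lot with 60 numbered spaces; select ten vehicles.
lot = list(range(1, 61))
print(systematic_sample(lot, 10, random.Random(3)))
```

If the cars in a lot are ordered in some meaningful way (for example, reserved spaces at regular intervals), the fixed interval can line up with that pattern, which is the caution raised under systematic sampling.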