INTRO TO RESEARCH METHODS
SPH-X590 SUMMER 2015
SAMPLING
RANDOM SELECTION
RANDOM ASSIGNMENT
Presentation Outline
• Data Collection Methods: Participants & Sampling
o Homogeneity & Heterogeneity
o Validity
o Terminology
o Inference
o Sample Sizes
o Sampling Distribution
• Study Design Considerations for Sampling
o Experimental, Quasi & Non
o Random Selection vs. Randomization
o Error
• Probability Samples and Non-Probability Samples
Data Collection Methods:
Homogeneity & Sampling
• Sampling is a bit philosophical in a way: the basic premise is
that we are not that different from one another.
• Sampling basically asks: how many individuals do I need in order
to say something about the group to which they belong?
• For example, if every member of a group or a population is
identical, then that population is homogeneous.
o The characteristics of any one individual in the population are the same
as the characteristics of any other member of the population.
o There is little or no variation among individuals.
Data Collection Methods:
Homogeneity & Sampling
• If the humanoid population of Mars is homogeneous,
how many aliens would we need to abduct from Mars
in order to understand what Martians are like?
Data Collection Methods:
Heterogeneity & Sampling
• When members of a group or a population are different from
one another, the population is heterogeneous
o A wide range of characteristics among individuals
o Significant variation among individuals
• How does this change our alien abduction scheme to understand
Martians?
• To describe a heterogeneous population, we need to observe
multiple individuals so that we capture the full range/ variety of
important characteristics that may exist.
Data Collection Methods:
Validity & Sampling
• In the ideal study design for our research on Martians, we
would randomly select the aliens (i.e. subjects) from the
larger population of Martians (i.e. the population of
interest).
o Randomly picking Martians ensures External Validity.
• Then we would randomly assign the individual Martians
(i.e. subjects) to a Control or an Experimental condition.
o Randomly assigning the Martians ensures Internal Validity.
o Rarely are researchers able to study people (much less
Martians) in an Experimental design using random selection and
assignment.
Data Collection Methods:
Validity & Sampling
• In most scientific research, more emphasis is placed on Internal
Validity than on External Validity.
o More concerned with whether the Independent Variable is truly
“causing” a change in the Dependent Variable, than with the
generalizability of the effect.
• Replicating the study with other populations is a way to
establish External Validity (i.e. Reproducibility).
• Random Assignment ensures that the Independent Variable is the
only variable which has a differential influence on the conditions
being compared in the study.
o Controls for extraneous Confounding Variables
Data Collection Methods:
Sampling Terminology
• Sample Element: a single case/unit selected from a population and measured:
the Unit of Analysis
o For example, a person, thing, specific time, etc.
o For example, a Martian
• Sample Universe: theoretical aggregation of all possible Sample Elements:
unspecified by time and space
o For example, Mars/ All Martians
• Theoretical Population: theoretical aggregation of specified elements: defined by time and space
o For example, 2020 Urban Population of Martians
Data Collection Methods:
Conceptual Model of Sampling
[Figure: nested diagram, from broadest to narrowest: Sample Universe, Theoretical Population, Sample Population, Sampling Frame, Sample, Sample Elements]
Data Collection Methods:
Sampling Terminology
• Sample, Target, or Study Population: aggregation of the population from
which the sample is actually drawn
o For example, Martians living in the Capital City of Mars in 2020
• Sampling Frame: a specific list that closely approximates all elements in the
population: the researcher selects units from this list to create the study sample
o For example, the 2020 Phonebook of Martians living in the Capital City of Mars.
• Sample: a set of cases that is drawn from a larger pool and used to make
generalizations about the population
Data Collection Methods:
Samples & Populations
The Theoretical Population: To what population do you want to generalize?
o 2020 Martian Population
The Study Population: What population can you access?
o Martians living in the Capital City of Mars in 2020.
The Sampling Frame: How are you going to access your sample?
o Martians listed in the 2020 Mars Capital City Yellow Pages.
The Sample: Who is in your study?
o A Group of Martians
Data Collection Methods & Analysis:
Variation & Sampling
• A sample of individuals of the population must have
essentially the same variation as the population of individuals
for any information from the sample to be useful.
• The more different individuals in a population are from one
another, the greater the chance that the sample does not
sufficiently describe the population.
o The more heterogeneous the population, the more likely that the
inferences we make about the population are wrong.
o The more heterogeneous the population, the larger the sample needs
to be to adequately describe the population.
• The more observations, the more accurate the inferences
Data Collection Methods & Analysis:
Sampling Terminology
• Parameter: any characteristic of a population that is true/ known
o Known parameters are from a Census.
o A Census is when all members of the Study Population are included in the study.
o For example, % of males or % of females of the 2010 US population
o For example, % of males or % of females of the 2020 Martian population
• Estimate: any characteristic assessed from a sample
o Estimates refer to Samples
o For example, of the 100 Martians sampled for our study, 45% are male and 55% are female.
• Sampling Error: how close the sample estimates for a characteristic are to the population
parameter
o How well the observed value approximates the true/known value of the population.
o Sampling Error is a result of not being able to study all the members of a population, but only a sample
of individuals from that population.
o Sampling Error is an estimate of precision.
o For example, news polls often report the results of a poll followed by “+ or –” 5% or 3 points.
• Standard Error (SE): a measure of Sampling Error
o SE is an inverse function of Sample Size (it shrinks with the square root of the sample size): think back to Pre-Calculus.
o As Sample Size increases, SE decreases: the sample is more precise.
o So, the smallest Standard Error (SE) has the greatest precision! If uncertain, choose to increase sample
size.
Data Collection Method & Analysis:
Sampling & Causal Inference
[Figure: diagram linking Hypotheses (your expectations of the data), the actual observations in the data, the Population, the Sampling Process, the Sampling Frame, the Sample, and Causal Inference]
• Sampling allows researchers to use the data to say something (make an
inference) with confidence about a whole (population) based on the study
of only a few members (sample).
• I can infer something about all Martians based on information I collect from
a small number of Martians.
Data Collection Methods & Analysis:
Sample Sizes
Question: How large should the size of the Sample be?
Answer: It all depends.
• Sample Size is a matter of:
o How much sampling error can be tolerated?: Level of Precision
o How big or small is the Population?: Sample Size is very important with small
populations
o How different are individuals from each other within the population with regard
to the characteristic of interest?: Within Group Variation
o How small is the smallest subgroup within the sample for which estimates are
needed?: Sample Size must be big enough to properly estimate or make
inferences about the smallest subgroup in the sample.
– http://www.surveysystem.com/sscalc.htm
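As a rough sketch of how the first two considerations (level of precision and population size) interact, the snippet below uses a common textbook formula for estimating a proportion, n0 = z² · p · q / e², followed by a finite population correction. The choice of formula, the function name `sample_size_for_proportion`, and the defaults (95% confidence, p = 0.5) are assumptions made for illustration; they are not taken from the slides or from the linked calculator.

```python
import math

def sample_size_for_proportion(margin, population=None, p=0.5, z=1.96):
    """Approximate sample size needed to estimate a proportion.

    margin     -- tolerated sampling error, e.g. 0.05 for +/- 5 points
    population -- population size N; None is treated as "very large"
    p          -- assumed share having the characteristic (0.5 is the worst case)
    z          -- z-score for the confidence level (1.96 is roughly 95%)
    """
    q = 1 - p
    n0 = (z ** 2) * p * q / (margin ** 2)
    if population is not None:
        # Finite population correction: small populations need fewer cases.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(sample_size_for_proportion(0.05))                   # very large population
print(sample_size_for_proportion(0.05, population=1000))  # 1,000 Martians
```

Tightening the tolerated error from 5 points to 3 points raises the required sample size sharply, which is one way to read "It all depends."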
Data Collection Methods & Analysis:
Populations, Samples, & Statistics
Variable: AGE
o Date of Interview (DOI): MM/DD/YYYY
o Question 1. What is your Birthdate (DOB)? Response: MM/DD/YYYY
o Operational Definition: AGE = DOI - DOB
Statistic (The Sample): Average Age = 143.69 Years
Population Parameter (Study Population): Average Age = 143.72 Years
Data Collection Methods & Analysis:
Sample Statistic
• The Standard Error (SE) is highest for a population that has a
50:50 distribution of the characteristic/ variable of interest.
• There is NO SE for a characteristic with 100% distribution
across all the Sample Elements.
o SE refers to variables: variables by definition have more than 1 value.
Notation
• Small letters (minuscule), whether from the English or Greek
alphabet, refer to samples.
o s (or se) = standard error
o n = sample size
o p = % having particular characteristic (1-q)
o q = % not having particular characteristic (1-p)
• Large letters, whether from the English or Greek alphabet,
refer to statistics about your population.
o For example, N = Population Size
Formula: $s = \sqrt{\frac{q \cdot p}{n}}$
o For a 90:10 split with n = 100: $s = \sqrt{\frac{.9 \times .1}{100}} = .03$ or 3%
o For a 50:50 split with n = 100: $s = \sqrt{\frac{.5 \times .5}{100}} = .05$ or 5%
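The two worked examples on this slide can be checked with a few lines of Python. This is only a sketch of the standard error of a proportion, the square root of p · q / n; the function name `standard_error` is an assumption chosen for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)

print(round(standard_error(0.9, 100), 3))  # 90:10 split, n = 100
print(round(standard_error(0.5, 100), 3))  # 50:50 split, n = 100
print(round(standard_error(0.5, 400), 3))  # same split, four times the sample
```

Quadrupling the sample size only halves the standard error, and a characteristic shared by 100% of the elements (p = 1) gives a standard error of zero.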
Data Collection Methods & Analysis:
Sample Sizes & Sampling Errors
How comfortable are you with how wrong or right the result of your analysis is?
Would you invest your money based on an answer that could be 10% higher or lower?
Data Collection Methods & Analysis:
The Sampling Distribution
Martian Sample 1: Average = 125.20 years
Martian Sample 2: Average = 126.20 years
Martian Sample 3: Average = 125.12 years
The Sampling Distribution
is the distribution of a statistic across an
infinite number of samples.
The average ages of samples of 100 Martians, taken over a limitless
number of samples pulled from the Study Population of Martians,
have a characteristic distribution/ curve, centered on
the average of the averages.
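A small simulation can make the idea of a sampling distribution concrete. The sketch below invents a study population of Martian ages (the mean of 125 and the spread of 15 are made-up values, not data from the slides), pulls many samples of 100, and looks at how the sample averages behave.

```python
import random
import statistics

rng = random.Random(2015)  # fixed seed so the sketch is repeatable

# Hypothetical study population: 10,000 Martian ages (illustrative values only).
population = [rng.gauss(125, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw many samples of 100 Martians and record each sample's average age.
sample_means = [
    statistics.mean(rng.sample(population, 100))
    for _ in range(2_000)
]

mean_of_means = statistics.mean(sample_means)     # "the average of the averages"
spread_of_means = statistics.stdev(sample_means)  # empirical standard error

print(round(population_mean, 2), round(mean_of_means, 2), round(spread_of_means, 2))
```

The average of the sample averages lands very close to the population average, and the spread of the sample averages is much smaller than the spread of individual ages.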
Data Collection Methods & Analysis:
Designs
• Experimental Design:
o The researcher randomly assigns subjects to treatments/ conditions
(=variables).
o Causal Estimation is possible
o Causal Inferences are stronger
o Random Sampling from the population less important
o Usually conducted in a Laboratory
• Quasi-Experimental or Observational Design:
o e.g., survey research, polls, etc.
o Subjects are not randomly assigned to variables
o Random Sampling is important.
o Selection Bias is a concern
o Causal Inference is compromised.
Data Collection Methods & Analysis:
Error
Ideally, sample statistics should be as close as possible to
population parameters, but variability has many causes:
• Probability Sampling error: the difference between a sample statistic and
its population parameter.
o Random Sampling allows us to estimate the typical size of the Sampling Error.
• Non-Sampling Error: from other sources, can be systematic bias, and is
difficult to estimate.
o Examples of non-sampling error include under-coverage, nonresponse,
question wording / response bias, question order.
Data Collection Methods & Analysis:
Natural Experiment Designs
• A type of Quasi-Experimental or Observational
Study (esp. surveys) in which respondents’ values
on a causal variable are plausibly random.
o Some consider it an Experimental Design
o Researcher could not or did not manipulate the
Experimental variable.
Examples:
• Powerball lottery
• Births in last half 2014
• City councils headed by women
• Parity 3 birth after same sex or opposite sex
Data Collection Methods & Analysis:
Random Selection or Random Assignment
• Random Selection and Random Assignment are commonly confused or
used interchangeably: the terms refer to entirely different processes.
• Random Selection refers to how sample members are selected from the
population to participate in the study.
• Random Selection relies on some form of Random Sampling.
o Random Sampling is a probability sampling method: relies on the laws of
probability to select a sample.
o Sample Statistics from a random sample allow for inferences about/ estimation
of the population parameters: the basis of statistical tests of significance.
• Random Assignment is a component of Experimental Design.
o Study participants have an equal chance of ending up in the Experimental group/
condition or the Control group/ condition through a random procedure.
o Random Assignment is also known as Randomization.
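To see the two processes side by side, here is a minimal sketch using Python's standard `random` module. The frame of 500 Martian IDs, the sample of 40, and the even split into two conditions are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame of 500 Martian IDs.
frame = [f"martian_{i:03d}" for i in range(500)]

# Random Selection: who gets into the study (40 drawn without replacement).
sample = rng.sample(frame, 40)

# Random Assignment: which condition each selected subject ends up in.
shuffled = sample[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
experimental = shuffled[:half]
control = shuffled[half:]

print(len(sample), len(experimental), len(control))
```

Selection draws from the population; assignment only rearranges the subjects who are already in the study.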
Data Collection Methods & Analysis:
Probability, Probability Samples, & Representativeness
• Your sample must be representative of the population in terms
of the variables of interest.
o A sample will be representative of the population from which it
comes, if each individual member of the population has an equal chance
(Probability) of being picked.
• Probability Samples are more accurate than Non-Probability Samples
o Conscious and unconscious Sampling Bias removed
o Probability Samples allow researchers to estimate the accuracy of the sample.
o Probability Samples permit the estimation of population parameters.
Data Collection Methods & Analysis:
Sampling, Samples & Probability
• Sampling is the process of selecting observations (a
sample) to provide an adequate description and robust
inferences of the population from which the sample comes.
o Your sample must be representative of the
population.
• There are 2 types of Sampling:
1. Non-Probability Sampling
2. Probability Sampling
Probability Sampling:
Simple Random Sampling (SRS)
• The basic sampling method which most others are based on.
• Method:
o A sample size ‘n’ is drawn from a population ‘N’ in such a way that every possible element
in the population has the same chance of being selected.
o Take a number of samples to create a sampling distribution
• Typically conducted “without replacement”
• What are some ways for conducting an SRS?
o Random numbers table, drawing out of a hat, random timer, etc.
• Not usually the most efficient, but can be most accurate!
o Time & money can become an issue
o What if you only have enough time and money to conduct one sample?
Probability Sampling:
A Simple Random Sample
How To
1. List all the subjects in a population
2. Assign a number to each subject
3. Pick numbers from a list of random numbers
4. Select the subjects who correspond to the random numbers to be in
the sample.
Pros
• Works well for people in households, or students in classes, for example.
Cons
• The larger the population the more Cost and Feasibility become
problematic.
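The four How To steps can be sketched in a few lines of Python. The function name `simple_random_sample` and the class roster are assumptions for illustration, and a pseudo-random number generator stands in for the printed list of random numbers.

```python
import random

def simple_random_sample(subjects, n, seed=None):
    """Draw a simple random sample of n subjects, without replacement."""
    rng = random.Random(seed)
    # Steps 1-2: list the subjects and number them 0..N-1.
    numbered = dict(enumerate(subjects))
    # Step 3: pick n distinct random numbers.
    picked = rng.sample(range(len(numbered)), n)
    # Step 4: select the subjects who correspond to those numbers.
    return [numbered[i] for i in picked]

# Hypothetical class roster of 30 students.
roster = [f"student_{i:02d}" for i in range(30)]
print(simple_random_sample(roster, 5, seed=1))
```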
Probability Sampling:
Systematic & Stratified Sampling
• Systematic Random Sample:
o Pick a random case from the first k cases of a sampling frame: select every kth case after
that one
o For example, with k = 5 you randomly picked case 3 from the first 5 cases; then pick every 5th case afterwards
(cases 8, 13, 18, ...) until you have the sample size you need.
• Stratified Random Sample:
o Divide a population into groups (or strata), then select a simple random sample
from each stratum.
o For example, dividing the population of Martians into different groups based on
the characteristics of their eyes: No Eyes, One Eye, Two Eyes or Three Eyes.
o Then, randomly picking Martians from each of those groups until I have the
sample size to represent the group (stratum) and the population.
Probability Sampling:
Systematic Random Sampling
• Method:
o Starting from a random point on a sampling frame, every kth element
in the frame is selected at equal intervals (Sampling Interval).
o Sampling Interval tells the researcher how to select elements from
the frame (1 in ‘k’ elements is selected): it depends on the sample size
needed
• Example:
o You have a sampling frame (list) of 10,000 people and you need a
sample of 1000 for your study: What is the sampling interval that you
should use?
o Every 10th person listed (1 in 10 persons)
• Empirically provides results very similar to SRS, but is more
efficient.
• Caution: Need to keep in mind the nature of your frame for systematic
sampling to work: beware of periodicity!
o Periodicity is a cyclical pattern in the order of the frame: if the cycle
lines up with the sampling interval, the same kind of element is picked every time and the sample is biased.
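The 10,000-person example can be worked through in code. This is a sketch that assumes the frame size divides evenly by the sample size; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k cases, then every k-th case."""
    k = len(frame) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start in the first k cases
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 10_001))  # a list of 10,000 people, numbered 1..10,000
sample = systematic_sample(frame, 1_000, seed=1)

print(len(frame) // 1_000)  # the sampling interval
print(sample[:3])           # first three selected people
```

Because only the starting point is random, any cyclical ordering in the list carries straight through to the sample.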
Probability Sampling:
Stratified Random Sampling
• Method:
o Divide the population by certain characteristics into homogeneous subgroups
(strata)
o Elements within each stratum are homogeneous, but are heterogeneous across
strata.
o A simple random or a systematic sample is taken from each stratum in
proportion to that stratum's share of the population.
• When is it appropriate?
o When a stratum of interest is a small percentage of a population and random
processes could miss the stratum by chance.
o When enough is known about the population that it can be easily broken into
subgroups or strata.
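A proportional stratified sample of Martians grouped by number of eyes might be sketched as follows. The counts per stratum are invented for illustration, and the rule of taking at least one unit per stratum is an added assumption meant to reflect the concern about missing a small stratum by chance.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Proportional stratified sample: a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        # Proportional allocation, with at least one unit per stratum.
        size = max(1, round(n * len(members) / len(population)))
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population of 1,000 Martians as (id, number of eyes).
eyes = [0] * 50 + [1] * 150 + [2] * 700 + [3] * 100
population = list(enumerate(eyes))
picked = stratified_sample(population, lambda m: m[1], n=100, seed=7)

print({name: len(units) for name, units in picked.items()})
```

With other stratum sizes the rounding can make the total differ slightly from n, so a real design would settle the allocation rule in advance.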
Probability Sampling:
Cluster & Multistage Sampling
• Cluster Sampling:
o Divide the population into groups called clusters or primary sampling units (PSUs);
take a random sample of the clusters.
o For example, if I wanted to study Martian school children, rather than randomly
selecting from a list of all Martian school students, I could create a list of all Martian
schools and randomly choose the schools to be in my sample.
• Multistage Sampling:
o Several levels of nested clusters, often including both stratified and cluster sampling
techniques.
o For example, I may use a cluster sampling method to choose schools for my study of
Martian school children, and then I select a simple random sample of students
within the school.
o Alternatively, I could randomly select Martian schools to create a sampling frame of
Martian schools, then create a sampling frame of the Martian students attending
those schools.
o I could then draw a systematic or a simple random sample of Martian students from that
list until the needed sample size is reached: a school's chance of being represented
then depends on the probability that its students are selected.
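The school example can be sketched as a two-stage design: a random sample of schools, then a simple random sample of students within each chosen school. The number of schools, the students per school, and the function name `two_stage_sample` are hypothetical.

```python
import random

def two_stage_sample(schools, n_schools, n_students, seed=None):
    """Two-stage sample: random schools first, then random students within them."""
    rng = random.Random(seed)
    # Stage 1 (cluster sampling): randomly choose schools, the PSUs.
    chosen = rng.sample(sorted(schools), n_schools)
    # Stage 2: simple random sample of students within each chosen school.
    return {
        school: rng.sample(schools[school], min(n_students, len(schools[school])))
        for school in chosen
    }

# Hypothetical frame: 20 Martian schools with 30 students each.
schools = {
    f"school_{s:02d}": [f"s{s:02d}_student_{i:02d}" for i in range(30)]
    for s in range(20)
}
sample = two_stage_sample(schools, n_schools=4, n_students=10, seed=3)

print({school: len(students) for school, students in sample.items()})
```

Only the four chosen schools need a student list, which is where the savings in time and cost come from.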
Probability Sampling:
Cluster Sampling
• Some populations are spread out
o For example, over a state or country
• Elements occur in clumps: Primary Sampling Units (PSU)
o For example, towns, districts, schools
• Sample Elements are hard to reach and identify.
• Trade Accuracy for Efficiency.
o Convenience, Effort, Time and Resources are important considerations.
• You cannot assume that any one cluster is better or worse
than another cluster.
Probability Sampling:
Multistage Sampling
• Used when:
o Researchers lack a good sampling frame for a dispersed population.
o The cost to reach a sample element is very high.
• Each cluster is internally heterogeneous and similar to
all the other clusters: Between vs. Within Variability
• Usually less expensive than Simple Random Sampling but not
as accurate
o Each stage in Cluster Sampling introduces Sampling Error: the more
stages, the greater the likelihood of error .
• Can combine Simple Random Sampling, Systematic
Random Sampling, Stratified Sampling with Cluster
Sampling!!
Data Collection Methods and Analysis:
Random Selection or Random Assignment
Why Random Selection?
• Each Sample Element (i.e. individual) has an equal probability of being
picked for a study: selection process is unpredictable/ no pattern
• Reduces research bias
• Researcher can calculate the probability of certain outcomes because of
the Sampling Distribution.
• Several types of probability samples
Why Random Assignment?
• Groups created by Random Assignment are most likely to be
equivalent to one another before the treatment begins
• Random Assignment, as a random process, allows researchers to
calculate the probability that a difference between the groups
arose by chance alone.
Data Collection Methods and Analysis:
Randomization/ Random Assignment
• A general term for the techniques for ensuring that any member
of a population has an equal chance of appearing in a sample.
• Each participant in the study has an equal and unbiased chance
of being assigned to any of the conditions being compared in
the experiment.
• Parallel groups are equated which controls for both known and
unknown extraneous/ confounding variables.
• Sample statistics from randomized samples (i.e. samples formed
from randomization/ random assignment) will on average have
the same values as the population parameters.
Non-Probability Sampling:
• Non-probability Sampling:
o cannot specify the probability that a given sample will be selected
o Examples: Snowball Sampling or Respondent Driven Sampling
Why use a Non-Probability Sample?
• Often inexpensive
• Great for “Hard to Reach” Populations
o Difficult to sample,
o Require great trust, or
o Using lengthy unstructured interviews
• Some variables and their relationships are universal: makes
sampling method irrelevant!
o Much Life & Medical Science research: human physiology and anatomy
are so similar that we don’t need many subjects and don’t need to worry
about generalizing to the population of human beings.