Notes 9b - York University

GS/PPAL 6200 3.00
Research Methods and Information
Systems
A QUANTITATIVE RESEARCH PROJECT
(1) DATA COLLECTION
(2) DATA DESCRIPTION
(3) DATA ANALYSIS
A Quantitative Research Project:
Generic Overview
• Research Topic: What is the main issue?
• Research Questions: Descriptive? Relational?
Causal? (How will we measure key variables?)
• Quantitative Research Design: Experimental?
Cross-sectional? Longitudinal?
• Data Collection Method: Survey? Secondary
Data? Census or Sample?
• Data Analysis: Descriptive Statistics? Regression
Analysis?
A Quantitative Research Project:
An Example
• Research Topic: Academic Performance
• Research Questions: How well do graduating students
perform academically? What explains that performance?
Measure “academic performance” by graduating CGPA
• Research Design: Cross-sectional analysis of graduating
students in a given year
• Data Collection: Survey (a random sample of) students
graduating in 2014
• Data Description: Describe the data with basic statistics
• Data Analysis: Reasons for attending university and
performance; Total hours studied and CGPA
DATA COLLECTION INSTRUMENT:
Survey Questionnaire
• To obtain the data from graduating students in
2014 we need a survey instrument and code
book
• To develop the survey questions, we might
first:
– Conduct a focus group with a small convenience
sample to get a better sense of the key factors
influencing academic performance
– Conduct a small pilot study to test our survey
instrument and to practice the analysis
Whom to Survey? Census or Sample?
• What is the total theoretical population in which
one is interested? What is the accessible
population?
• Is it feasible or practicable to conduct a census on
the accessible population?
• A census would tell us the actual information for
all graduating students in a given year, which is a
sample of all students graduating for all time
• A sample of all the students graduating in a given
year is then a sample of a sample
DATA COLLECTION PROCESS - OVERVIEW
• WHAT: We will measure “academic performance” by the
student’s CGPA on graduation
• WHO: It is not practicable to survey all graduating students, so
we will choose a sample of students
• HOW: We want to conduct a statistical analysis so we will
collect data from a sample of students
• WHY: To test a hypothesis about the factor(s) that influence
CGPA but…
– We know that the CGPA mean we observe in our sample will be imprecise as a
measure of the true mean
– More accurate information is costly
– So…we choose our sample size guided by this tradeoff
Strategies for selecting cases to study
(i.e., a sample)
Probability sampling:
• Simple Random Sampling – easy to do and explain,
but not the most statistically efficient, and may not
give a good representation of sub-groups
• Stratified Random Sampling – take a simple
random sample from subgroups of the
population
• Systematic random sampling – take every kth
unit, where k = N/n (N = population size, n = sample size)
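The three probability sampling strategies above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own procedure: the list of 1,000 student IDs, the four faculty labels, and the function names are all hypothetical.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(units, n)

def stratified_random_sample(units, stratum_of, n_per_stratum, rng):
    """Take a simple random sample of fixed size from each subgroup."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

def systematic_sample(units, n, rng):
    """Take every kth unit (k = N // n) from a random start among the first k."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]

def faculty_of(student_id):
    # Hypothetical rule that assigns a faculty from the ID, for illustration only.
    return ("Arts", "Business", "Health", "Science")[student_id % 4]

rng = random.Random(2014)     # fixed seed so the draws are reproducible
students = list(range(1000))  # hypothetical list of 1,000 student IDs

srs = simple_random_sample(students, 100, rng)
strat = stratified_random_sample(students, faculty_of, 25, rng)
syst = systematic_sample(students, 100, rng)
```

Note that the stratified draw guarantees 25 students from each faculty, while the simple random draw only gives that split on average.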
Strategies for selecting cases to study
(i.e., a sample)
Non-probability Sampling
• Convenience sampling – easily accessible but
not necessarily representative of the
population
• Purposive sampling – reaches a target
population
• Expert sampling – convenes a panel of experts
• Snowball sampling – first respondents
recommend others to be included
How do we Sample?
Whom do we Sample?
• If the relevant population is all York University students
graduating in 2014 with undergraduate degrees, then the
census population is approximately 10,000 students
• To collect information from all 10,000 students is not
practicable; therefore consider sampling
• Sampling Technique: Probability Sampling - Simple Random
Sampling
• Sampling Frame: How to select participants?
– Once Ethics Approval obtained…
– University database contains student contact information and
CGPAs of all graduating students; a random number generator
can perform the randomization for selection; …
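As a rough sketch of the selection step, a seeded random number generator can draw participants from the sampling frame. The ID format, the seed, and the sample size of 400 below are assumptions made for illustration; the real frame would come from the university database once ethics approval is in place.

```python
import random

# Hypothetical sampling frame: one ID per graduating student (about 10,000).
frame = [f"S{i:05d}" for i in range(1, 10001)]

rng = random.Random(42)            # arbitrary seed; fixing it makes the draw repeatable
selected = rng.sample(frame, 400)  # 400 is an illustrative sample size
```

Recording the seed alongside the frame lets someone else reproduce exactly which students were selected.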
How many do we sample? Sample Size
• For inferential statistics, “small” is n < 30
• Decision is guided by two competing goals:
– maximize the probability that we obtain correct
information on the relevant variables and
– minimize the cost of our study
How will we know when we know?
• Understanding the information we have is
complicated by the uncertainty inherent in the
data we collect
• Construct Validity / Measurement Issues: Is CGPA
a good indicator of academic performance?
• Is the mean (average) CGPA we observe for the
census population equal to the “true” mean?
• Is the mean CGPA we observe for a sample equal
to the census population mean?
The Challenge
• We can never know if we are observing the
“true” population mean (average) of CGPA since
any observed population mean will deviate plus
or minus σ (= a “standard deviation”)
• Any census of a graduating class will only be a
sample of the “true” population of all graduating
classes
• A sample of the census population – as a sample
of the sample – introduces more uncertainty
Uncertainty complicates “knowing”
• Uncertainty Source #1: If we are seeking to
explain the key factors determining the CGPA of
graduates, we have to account for the fact that
the observed CGPA might deviate from the true
population mean by an amount sigma ( = σ )
• Uncertainty Source #2: If it is infeasible to
conduct a census of all graduating students in
even one year, and all we can do is sample the
sample, then we have additional uncertainty
related to the size of the sample
Uncertainty from Sampling
• We know there is one inescapable source of
uncertainty (Uncertainty Source #1)
• The sampling error (Uncertainty Source #2)
complicates this uncertainty … but in a
predictable way.
• We know the larger (smaller) the sample size,
the smaller (greater) the uncertainty from any
sampling error IF we use a simple random
sampling method
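The link between sample size and sampling error can be checked with a small simulation. The sketch below assumes a synthetic census population of 10,000 CGPAs; the values are invented and the scale is arbitrary. It draws repeated simple random samples and measures how much the sample mean varies at each sample size.

```python
import random
import statistics

rng = random.Random(0)
# Synthetic "census population" of 10,000 CGPAs (invented values, arbitrary scale).
population = [rng.gauss(6.0, 1.5) for _ in range(10_000)]
census_mean = statistics.fmean(population)

def spread_of_sample_means(n, draws=500):
    """Standard deviation of the sample mean over repeated simple random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.pstdev(means)

spreads = {n: spread_of_sample_means(n) for n in (25, 100, 400, 1600)}
for n, spread in spreads.items():
    print(f"n = {n:4d}: sample means vary around {census_mean:.2f} by about {spread:.3f}")
```

In runs like this the spread roughly halves each time the sample size is quadrupled, which is the predictable pattern the slide refers to.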
Some Vocabulary for this Uncertainty
• We can never know if we are observing the
true value of CGPA or some value plus (minus)
deviations due to some unexplainable
shock (Uncertainty Source #1) – so we talk
about a “Confidence Interval”
• We know that sampling error (Uncertainty
Source #2) is possible – so we talk about our
analysis of the sample in terms of its “Margin
of Error”
What we know in the face of this Uncertainty
• For a 95% Confidence Interval we know our
census population mean would be close to the
true population mean 95% of the time and
• …we can have the confidence that the census
population mean plus or minus a random
error will contain the true population mean
95% of the time
What we know (cont’d)
• When sampling error is possible, and we have only
sample statistics to estimate census population values,
we must adjust our understanding of the 95% CI for
this additional uncertainty
• If we have a Margin of Error of 10% at a 95%
confidence level …
• Then, in repeated random sampling of the census
population, about 95% of the sample estimates will fall
within plus or minus 10% of the census population value,
so about 95% of the intervals built as “estimate plus or
minus the Margin of Error” will contain that value
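The repeated-sampling idea behind a 95% confidence interval can be illustrated with a simulation. This is a sketch under stated assumptions: the census population is synthetic, the sample size of 100 is arbitrary, and the interval uses the usual normal approximation (sample mean plus or minus 1.96 standard errors), which the notes do not spell out.

```python
import math
import random
import statistics

def mean_confidence_interval(values, z=1.96):
    """Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width

rng = random.Random(1)
# Synthetic census population of 10,000 CGPAs (invented values).
population = [rng.gauss(3.0, 0.5) for _ in range(10_000)]
census_mean = statistics.fmean(population)

draws = 1000
hits = 0
for _ in range(draws):
    low, high = mean_confidence_interval(rng.sample(population, 100))
    if low <= census_mean <= high:
        hits += 1

coverage = hits / draws
print(f"{coverage:.1%} of the {draws} intervals contain the census mean")
```

The share of intervals that capture the census mean should land close to 95%, even though any single interval either contains it or does not.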
Sample Size Guide
(meaningful for studies adopting random sampling)
• For a 95% confidence interval (Margin of Error
of 5%), a sample of about 400 is needed
• …95% CI, Margin of Error of 10%, n ≈ 100
• …95% CI, Margin of Error of 3%, n ≈ 1,000
• …95% CI, Margin of Error of 1%, n ≈ 10,000
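The figures in this guide are consistent with a common rule for estimating a proportion, n = z² p(1 − p) / e², with z = 1.96 for 95% confidence and the most conservative choice p = 0.5. The notes do not state which formula was used, so treat the sketch below as one plausible reading; it also assumes a population large enough that no finite-population correction is applied.

```python
import math

def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin_of_error."""
    raw = z**2 * p * (1 - p) / margin_of_error**2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(raw, 6))

sizes = {e: sample_size_for_proportion(e) for e in (0.10, 0.05, 0.03, 0.01)}
for e, n in sizes.items():
    print(f"Margin of Error {e:.0%}: n = {n}")
```

Before rounding up, the formula gives roughly 96, 384, 1,067 and 9,604, which the guide rounds to 100, 400, 1,000 and 10,000.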