Quantifying Data
Data Entry
 Define variables, enter case data, conduct runs
 Coding and Recoding
– If numeric values not pre-assigned, decide
on coding system
– If there is open-ended data, would need to
decide how to deal with responses
 Defining your variables
Data Cleaning
 Reread each set of responses back (immediately)
to confirm accuracy
 “Possible-code cleaning”
– easiest way to check is to run a frequency
distribution
 Contingency cleaning
– On the “if” questions
 “Sort” by response
– do you recycle… then check the “what do you
recycle” variable
 Can also run cross tabs and make sure cells that
should be empty are empty
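The two cleaning checks above can be sketched in a few lines of Python. This is only an illustration on made-up records: the variable names (`recycles`, `what_recycled`) and the set of legal codes are assumptions chosen to match the recycling example, not something the slides specify.

```python
from collections import Counter

# Hypothetical coded cases: recycles is 1 = yes, 2 = no;
# what_recycled should be filled in only when recycles == 1.
cases = [
    {"id": 1, "recycles": 1, "what_recycled": "paper"},
    {"id": 2, "recycles": 2, "what_recycled": None},
    {"id": 3, "recycles": 7, "what_recycled": None},     # 7 is not a legal code
    {"id": 4, "recycles": 2, "what_recycled": "glass"},  # answered a skipped question
]

VALID_RECYCLES = {1, 2}  # assumed codebook entry

# Possible-code cleaning: a frequency distribution exposes illegal codes.
freq = Counter(case["recycles"] for case in cases)
illegal_codes = sorted(code for code in freq if code not in VALID_RECYCLES)

# Contingency cleaning: the follow-up must be empty for anyone who said "no".
contingency_errors = [
    case["id"]
    for case in cases
    if case["recycles"] == 2 and case["what_recycled"] is not None
]

print("frequencies:", dict(freq))
print("illegal codes:", illegal_codes)
print("contingency errors (case ids):", contingency_errors)
```

Flagged cases would then be checked against the original questionnaires rather than corrected by guesswork.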
Basic Analysis – Measures of Central Tendency
 Mean: sum of values divided by the number of
cases
– simple average
 Median: middle attribute in a list of observed
attributes
– influence of extreme cases eliminated
 Mode: most frequently occurring attribute
– used with nominal variables, e.g., sex
• most respondents were women
• usually report with percentage, 60% were women
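A small sketch of the three measures, using Python's standard `statistics` module. The ages and the sex codes are invented for illustration; the value 90 is included to show how an extreme case pulls the mean but not the median.

```python
import statistics

ages = [19, 21, 21, 22, 24, 25, 90]  # hypothetical ages; 90 is an extreme case
sex = ["F", "F", "M", "F", "M"]       # hypothetical nominal variable

mean_age = statistics.mean(ages)      # sum of values / number of cases
median_age = statistics.median(ages)  # middle value of the sorted list
modal_sex = statistics.mode(sex)      # most frequently occurring attribute

# The mode is usually reported together with its percentage.
pct_modal = 100 * sex.count(modal_sex) / len(sex)

print(f"mean = {mean_age:.1f}, median = {median_age}")
print(f"mode = {modal_sex} ({pct_modal:.0f}% of respondents)")
```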
Cross Tabs
 Used often with Bivariate data
 Convention usually places
– “independent variables” in columns across top
– “dependent variables” in rows below
Coding and data entry options
 Transfer sheets are special forms ruled off in
80 columns
 Edge coding involves recording code #'s in
margins of questionnaires
 Direct data entry involves entering data directly
into computer; eliminating transfer sheets
 Data entry by interviewer (CATI)
 Optical scan sheets
Coding
 What is it?
– It is the assignment of numerical values to
information or responses gathered by
a research instrument
 Codebook: describes the locations of
variables and lists the codes assigned to
the attributes of the variables
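One way to picture a codebook is as a lookup table. The sketch below is an assumption-laden illustration: the variables, column locations, numeric codes, and the missing-data code 9 are all hypothetical.

```python
# Hypothetical codebook: variable name -> column location and attribute codes.
codebook = {
    "sex": {"column": 1, "codes": {"male": 1, "female": 2}},
    "recycles": {"column": 2, "codes": {"yes": 1, "no": 2}},
}
MISSING = 9  # assumed code for blank or unusable answers


def code_response(variable, answer):
    """Return the numeric code the codebook assigns to a raw answer."""
    codes = codebook[variable]["codes"]
    return codes.get(answer.strip().lower(), MISSING)


raw_case = {"sex": "Female", "recycles": "yes"}
coded_case = {var: code_response(var, ans) for var, ans in raw_case.items()}
print(coded_case)
```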
Data Management Process
 concerned with the process by which raw
data gathered by some instrument are
converted into numbers for analysis
purposes
 Collect information with data gathering instrument
 Use codebook to transfer this information to a
transfer sheet or code sheet (optional)
 Create data file from information on code sheet
by entering data from a computer keyboard
 Check/clean up data file for accuracy
– Data cleaning done by
– Computer edit programs
– Examine distributions
– Contingency cleaning
 What about open-ended items?
– Read through responses and create a preliminary
code based on responses
– If more than 10% of responses fall into "other"
category, code needs to be revised to include
many of these responses
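The 10% rule for open-ended items can be checked mechanically once a preliminary code exists. The sketch below uses simple keyword matching, which is an assumption made for brevity; in practice a person reads and codes each response. The responses and categories are invented.

```python
from collections import Counter

# Hypothetical preliminary code: keyword -> category.
preliminary_code = {"cost": "expense", "expensive": "expense", "time": "no time"}

responses = [
    "too expensive", "no time", "costs too much", "no bins nearby",
    "takes time", "no bins at work", "too expensive", "not enough time",
    "cost", "no time",
]


def categorize(text):
    """Assign the first matching category, or "other" if nothing matches."""
    for keyword, category in preliminary_code.items():
        if keyword in text.lower():
            return category
    return "other"


counts = Counter(categorize(r) for r in responses)
other_share = counts["other"] / len(responses)
needs_revision = other_share > 0.10  # the 10% threshold from the guideline above

print(dict(counts))
print(f"'other' share = {other_share:.0%}, revise code: {needs_revision}")
```

Here the two "no bins" answers land in "other", which suggests adding a category such as "no access" to the code.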
Elementary Quantitative Analyses
To understand the meaning of
univariate, bivariate, and multivariate
analysis
To become familiar with the meaning
of several univariate and bivariate
statistics
Analysis Strategies
Why do we have to have them?
– People who read our ‘research’
are interested in the highlights
– Should try to communicate
findings in an understandable and
‘painless fashion’
Three types of analysis
 Univariate analysis
– the examination of the distribution of cases on
only one variable at a time (e.g., college
graduation)
 Bivariate analysis
– the examination of two variables
simultaneously (e.g., the relation between
gender and college graduation)
 Multivariate analysis
– the examination of more than two variables
simultaneously (e.g., the relationship between
gender, race, and college graduation)
“Purpose”
 Univariate analysis
– Purpose: description
 Bivariate analysis
– Purpose: determining the empirical
relationship between the two variables
 Multivariate analysis
– Purpose: determining the empirical
relationship among the variables
Types of Statistics
 Techniques that summarize and describe
characteristics of a group or make
comparisons of characteristics between
groups are known as descriptive statistics.
 Inferential statistics are used to make
generalizations or inferences about a
population based on findings from a sample.
 The choice of a type of analysis is based on
the evaluation questions, the type of data
collected, and the audience who will receive
the results.
Univariate Analysis
 Involves examination of the distribution
of cases on only ONE variable at a time
 Frequency distributions are listings of the
number of cases in each attribute of a
variable
– Ungrouped frequency distribution
– Grouped frequency distribution
 Proportions express number of cases of
the criterion variable as part of the total
population; frequency of criterion
variable divided by N
 Percentages are simply 100 × proportion
– Or [100 × (frequency of criterion
variable divided by N)]
 Rates make comparisons more
meaningful by controlling for
population differences
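The univariate summaries above reduce to a few lines of arithmetic. The data in this sketch are made up, and the "per 1,000" base for the rate is an assumed convention.

```python
from collections import Counter

graduated = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]  # hypothetical

freq = Counter(graduated)         # ungrouped frequency distribution
n = len(graduated)
proportion_yes = freq["yes"] / n  # frequency of criterion attribute / N
percent_yes = 100 * proportion_yes

# Rates: the same number of events in very different populations (made-up figures).
events = {"Town A": 50, "Town B": 50}
population = {"Town A": 10_000, "Town B": 100_000}
rate_per_1000 = {town: 1000 * events[town] / population[town] for town in events}

print(dict(freq), proportion_yes, percent_yes)
print(rate_per_1000)
```

Both towns report 50 events, but the rate shows the event is ten times as common in Town A.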
Measures of Central Tendency
 Measures of central tendency reflect the
central tendencies of a distribution
– Mode reflects the attribute with the
greatest frequency
– Median reflects the attribute that
cuts the distribution in half
– Mean reflects the average; sum of
attributes divided by # of cases
Measures of Dispersion
 Measures of dispersion reflect the spread or
distribution of the distribution
– Range is the difference between largest &
smallest scores; high – low
– Variance is the average of the squared
differences between each observation and
the mean
– Standard deviation is the square root of
variance
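A worked example of the three measures on invented scores. The variance here divides by the number of cases, matching the "average of the squared differences" definition above; sample variance, which divides by N - 1, would give a slightly larger value.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

value_range = max(scores) - min(scores)  # high - low
mean = sum(scores) / len(scores)

# Average of the squared differences between each observation and the mean.
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
std_dev = math.sqrt(variance)

print(f"range = {value_range}, variance = {variance}, std dev = {std_dev}")
```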
Types of Variables
 Continuous: increase steadily in tiny
fractions
 Discrete: jumps from category to
category
Subgroup Comparisons
 Somewhere between univariate &
bivariate are Subgroup Comparisons
 Present descriptive univariate data for
each of several subgroups
– Ratios: compare the number of cases in one
category with the number in another
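A subgroup comparison simply repeats a univariate summary within each group. In this sketch the groups, the "hours studied" variable, and all values are hypothetical.

```python
from statistics import mean

# Hypothetical cases: (subgroup, hours studied per week)
cases = [("women", 12), ("men", 9), ("women", 15), ("men", 11), ("women", 9)]

# Collect the values of the variable separately for each subgroup.
by_group = {}
for group, hours in cases:
    by_group.setdefault(group, []).append(hours)

# The same univariate statistic, reported once per subgroup.
group_means = {group: mean(values) for group, values in by_group.items()}

# Ratio: number of cases in one category compared with another.
ratio_women_to_men = len(by_group["women"]) / len(by_group["men"])

print(group_means)
print(f"women : men = {ratio_women_to_men} : 1")
```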
Bivariate Analysis
Bivariate analysis focuses on the
relationship between two variables
Contingency Tables
 Format: attributes of independent variable
are used as column headings and attributes
of the dependent variable are used as row
headings
 Guidelines for presenting & interpreting
contingency tables
– Contents of table described in title
– Attributes of each variable clearly described
– Base on which percentages are computed should be
shown
– Norm is to percentage down & compare across
– Table should indicate # of cases omitted from analysis
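The guidelines above can be followed in a short script: percentage down each column of the independent variable, then compare across the row. The data, variable names, and table title are invented for illustration.

```python
from collections import Counter

# Hypothetical cases: (gender = independent, graduated = dependent); None = no answer.
cases = [
    ("men", "yes"), ("men", "no"), ("men", "no"), ("men", "yes"),
    ("women", "yes"), ("women", "yes"), ("women", "no"), ("women", "yes"),
    ("women", None),
]

valid = [(g, d) for g, d in cases if d is not None]
omitted = len(cases) - len(valid)  # cases left out of the table

cell_counts = Counter(valid)
column_totals = Counter(g for g, _ in valid)  # base for the percentages

columns = ["men", "women"]  # attributes of the independent variable
rows = ["yes", "no"]        # attributes of the dependent variable

# Percentage down: each cell as a share of its column total.
column_pct = {
    (row, col): 100 * cell_counts[(col, row)] / column_totals[col]
    for row in rows
    for col in columns
}

print("College graduation by gender (column percentages)")
print(f"{'':<10}" + "".join(f"{c:>10}" for c in columns))
for row in rows:
    print(f"{row:<10}" + "".join(f"{column_pct[(row, c)]:>9.0f}%" for c in columns))
print(f"{'Base (N)':<10}" + "".join(f"{column_totals[c]:>10}" for c in columns))
print(f"Cases omitted: {omitted}")
```

Reading across the "yes" row compares 50% of men with 75% of women in this made-up sample.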
Multivariate Analysis
Multivariate analysis allows the
separate and combined effects of
the independent variables to be
examined