Download Data Analysis - Fresno State Email

Data Analysis
Statistics
Levels of Measurement
• Nominal – Categorical; no implied rankings among the
categories. Also includes written observations and
written responses from qualitative interviews or open-ended
survey questions.
• Ordinal – Categorical data with implied rankings or data
obtained through respondent ranking of categories. In
some cases, a ranking process may be set up for a
particular variable.
• Interval – No fixed zero point. Data is numerical, not
categorical. Rank order among values is explicit with
an equal distance between points in the data set: -2, -1,
0, +1, +2.
• Ratio – Fixed zero point; otherwise the same as interval.
In general, the type of data can be
inferred using the following
criteria:
• Any categorical data is either nominal or ordinal.
• All qualitative data is nominal.
• All scores on standardized scales are either interval or ratio. (Note: almost
all the scales we use in social work, except IQ scores, are ratio.)
• The level of measurement determines what statistical method we can use.
In some cases, we can convert a
variable into another level of
measurement
We can change a variable from
ratio to either ordinal or nominal
Converting Data (Use Recode in
SPSS)
Data Set: 5, 8, 4, 2, 9, 6, 10, 7

Categories    Occurrences
1 to 2        2
3 to 5        3
6 to 8        3
9 to 10       2
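The recoding step can be sketched in Python as well as in SPSS. This is a minimal illustration, not the SPSS Recode procedure itself: the scores are hypothetical values on a 1 to 10 scale, and the function name `recode` is made up for the example. Only the category boundaries come from the slide.

```python
# Hypothetical scores on a 1-10 scale (illustrative; not the slide's data set).
scores = [1, 2, 4, 5, 5, 7, 8, 9, 10, 10]

# Category boundaries taken from the slide: 1 to 2, 3 to 5, 6 to 8, 9 to 10.
bins = [(1, 2), (3, 5), (6, 8), (9, 10)]

def recode(score):
    """Return the label of the category that contains the score."""
    for low, high in bins:
        if low <= score <= high:
            return f"{low} to {high}"
    raise ValueError(f"score {score} falls outside every category")

# Count how many scores land in each category.
occurrences = {f"{low} to {high}": 0 for low, high in bins}
for s in scores:
    occurrences[recode(s)] += 1

for label, count in occurrences.items():
    print(label, count)
```

Because the categories do not overlap and together cover the whole scale, every score lands in exactly one category and the occurrences add up to the number of scores.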
Advantages of using ratio data
• We can convert it to another level of data;
we can’t do this with nominal data.
• People can simply write down information
about how they fit a particular attribute
(age, income).
• We have more statistical options with ratio
data. Many inferential statistics (t-tests,
ANOVA, regression) require that the
dependent variable be interval or ratio.
Primary types of data analysis
are:
• Qualitative
• Descriptive (Used to describe the distribution of
a single variable or the relationship between two
nominal variables: mean, frequencies, cross-tabulation)
• Inferential (Used to establish relationships
among variables; assumes random sampling
and a normal distribution)
• Nonparametric (Used to establish relationships for
small samples or data sets that are not normally
distributed)
Much of what you will use in your
research will be descriptive
statistics.
For example, the most basic type of descriptive
statistic is the frequency. Frequencies are the
number of times a specific value or data within a
specific category occurs.
Most often we convert frequencies to percentages
– Formula is (f/n) × 100, where f = frequency and n = the
total number of values in a data set. For
example, if the age 25 occurs 5 times in a
data set of 50, the percentage is 5/50 = 10%.
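The percentage formula can be checked with a few lines of Python. The ages below are a hypothetical data set, constructed only so that the age 25 occurs 5 times among 50 values as in the example.

```python
from collections import Counter

# Hypothetical data set of 50 ages in which the age 25 occurs 5 times.
ages = [25] * 5 + [30] * 20 + [40] * 25

frequencies = Counter(ages)   # f for each distinct value
n = len(ages)                 # total number of values in the data set

# Percentage = (f / n) * 100 for every value in the data set.
percentages = {value: 100 * f / n for value, f in frequencies.items()}

print(percentages[25])
```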
Examples of use of frequency data
• 40% of respondents are male.
• The mean level of income was $35,000
• 40% of all female voters cast their vote for
Arnold compared to 52% of the male voters.
*Note: the other descriptive statistic we use is the
standard deviation. It describes the degree to
which data points vary from the mean of a
distribution. In a research article, you will see the
standard deviation included with the mean.
Application of Standard Deviation
(SD)
• Mean income was $35,000 with SD = $5,000
• M = $23,000, SD = $500
• This is interpreted as there being less
variability in income among members of
the second data set. That is, scores are
grouped more tightly around the mean.
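A short sketch of the same comparison, using Python's standard `statistics` module. The two income lists are hypothetical; they were chosen to have means of $35,000 and $23,000 and do not reproduce the exact standard deviations quoted above.

```python
from statistics import mean, stdev

# Hypothetical incomes (illustrative values only).
first_data_set = [30000, 35000, 40000, 28000, 42000]    # widely spread
second_data_set = [22500, 23000, 23500, 22800, 23200]   # tightly grouped

for name, incomes in [("first", first_data_set), ("second", second_data_set)]:
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{name}: M = {mean(incomes):,.0f}, SD = {stdev(incomes):,.0f}")
```

The smaller SD for the second list reflects incomes that sit closer to their mean.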
Normal Distribution
• Mean = median = mode.
• Bell-shaped curve.
• 50% of scores fall below and 50% fall above the mean.
• Data set can be assessed in terms of how much data
falls within one, two or three standard deviations from
the mean.
• Generally is unimodal although some distributions may
be bimodal or trimodal.
• Theoretically, at least, inferential statistics may only be
used when a set of scores conforms to a normal
distribution. However, this assumption is often violated.
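To make the "one, two or three standard deviations" point concrete, the sketch below uses `statistics.NormalDist` from the Python standard library to compute how much of a normal distribution falls within each band. The helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

For a normal distribution these shares are roughly 68%, 95%, and 99.7%. A data set whose shares differ sharply from these is a sign that the normality assumption may not hold.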
Frequencies are used in almost all types of data analysis.
Frequency tables can be formatted in a variety of ways.
(Some analyses add valid and cumulative percent.)
Age          Number    Percent
0-18         10        20.0%
19-34        15        30.0%
35-64        15        30.0%
65 & over    10        20.0%
Total        50        100%
We can also use tables to determine if there
is a relationship between two nominal
variables, although we cannot assess the
strength of the relationship. This is called a
cross-tabulation.
Starting Salary         Female       Male
$20,000 to $29,999      19 (70%)     5 (23%)
$30,000 to $39,999      7 (26%)      14 (64%)
$40,000 to $54,999      1 (4%)       3 (13%)
Total                   27 (100%)    22 (100%)
Categories in both Qualitative and
Quantitative Analysis must be:
• Mutually exclusive (no overlap)
• Exhaustive (all possible categories should
be included)
Cross-tabulation is the basis for
chi-square. Chi-square:
• Tests whether the relationship between the two
variables in the table is statistically significant.
• Is not technically an inferential statistic –
does not require a normal distribution –
but is often grouped with inferential
statistics.
• Usually requires a random sample,
although data collected from everyone in a
population group is usually considered
sufficient for a chi-square analysis.
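As a worked illustration, the chi-square statistic for the starting-salary cross-tabulation can be computed by hand in Python. This is a sketch rather than a replacement for SPSS output. The closed-form p value used at the end is valid only because this table has exactly 2 degrees of freedom; other table sizes need a chi-square table or statistical software. Several expected counts here are small, so the result should be read with caution.

```python
import math

# Observed counts from the starting-salary table.
# Rows are salary bands; columns are (female, male).
observed = [[19, 5], [7, 14], [1, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square = sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Special case: with exactly 2 degrees of freedom, p = exp(-chi_square / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, p = {p_value:.4f}")
```

The p value falls below the usual .05 level of significance, so the null hypothesis of no relationship between gender and starting salary would be rejected for this table.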
Means can also be used to make
comparisons among groups.
Income
Male      M = $35,000    SD = $5,000
Female    M = $22,000    SD = $750
You may use means on your
project
• If your variables include ratio data
• If you want to compare groups on a ratio
variable
• If you want to summarize scores on a
standardized instrument or a Likert scale
Some inferential statistics look at the strength of
the relationship between mean scores on ratio
level variables and membership in a particular
demographic group:
• T-tests (two group comparisons)
• Analysis of variance (compares three or
more groups)
Answers question: Is the difference in
means between the two (or more) groups
large enough to be statistically significant?
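The two-group comparison can be sketched as an independent t-test computed by hand. The scores are hypothetical, and the pooled-variance form used here assumes the two groups have similar variability. The code stops at the t statistic and degrees of freedom; the p value would then be read from a t table or from software such as SPSS.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical test scores for two groups (illustrative values only).
group_1 = [12, 15, 14, 16, 13]
group_2 = [10, 9, 11, 12, 8]

n1, n2 = len(group_1), len(group_2)

# Pooled variance: weighted average of the two sample variances.
# Assumes the two groups have roughly equal variances.
pooled_variance = (
    (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_statistic = (mean(group_1) - mean(group_2)) / standard_error
degrees_of_freedom = n1 + n2 - 2

print(f"t = {t_statistic:.2f}, df = {degrees_of_freedom}")
```

A larger t value means the difference between the group means is large relative to the variability within the groups.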
We also use correlations to measure the
strength of a relationship between two
variables. Correlations can only be used
• To assess the strength of the relationship
between two ratio level variables.
• To measure associations rather than
cause and effect relationships.
• With data sets in which there are 30 or
more observations.
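A minimal sketch of the Pearson correlation coefficient, computed from its definition. The paired scores are hypothetical and are kept far shorter than the 30 observations recommended above, only so that the arithmetic can be followed by hand. The function name `pearson_r` is made up for the example.

```python
from math import sqrt

# Hypothetical paired scores on two ratio level variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y divided by their separate variation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_xx = sum((a - mean_x) ** 2 for a in xs)
    sum_yy = sum((b - mean_y) ** 2 for b in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

The result always lies between -1 and +1. It describes how closely the two variables move together and says nothing about which one, if either, causes the other.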
Inferential statistics commonly used
include:
• Independent t-test (compares two groups on one
variable). (Test statistic = t)
• Paired samples t-test (compares ratio level scores on
pre and post test data). (Test statistic = t)
• ANOVA – compares three or more groups on ratio data
(Test statistic = F)
• Correlation – measures the association between two
ratio level variables (Test statistic = r)
• Regression analysis – the dependent variable is ratio; the
model can include more than one independent variable, and
the independent variables can be a combination of ratio,
ordinal, and nominal data. (Test statistic is R², F, or partial
correlation coefficients)
Inferential statistics require that we assess
the probability that a relationship actually
exists between two variables.
• We state the research & null hypotheses.
• State the degree to which we will risk being wrong about
whether or not a relationship actually exists between two
variables (level of significance – usually under .10)
• Choose an appropriate statistical test and compute it.
• Compare the probability level (p value) on your computer printout
to the level of significance. If the p value is lower than
your level of significance, reject the null hypothesis. If
the p value is higher than the level of significance, accept
(that is, fail to reject) the null hypothesis.
For example:
• There is a positive relationship between
scores on the self-esteem scale and
depression. Level of significance is .05. r
= .75, p = .01. Reject the Null Hypothesis and
accept the Research Hypothesis.
• Women will have higher test scores than
men. Level of significance = .10. t = .30,
p = .60. Accept the Null Hypothesis and
reject the Research Hypothesis.
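The decision rule applied in these two examples can be written as a small function. The name `decide` is made up for the illustration. The outcome that the slides call accepting the null hypothesis is reported here as "fail to reject", the more cautious wording for the same result.

```python
def decide(p_value, level_of_significance):
    """Compare the p value from a statistical test with the chosen level of significance."""
    if p_value < level_of_significance:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Self-esteem and depression: level of significance .05, p = .01.
print(decide(0.01, 0.05))

# Test scores of women and men: level of significance .10, p = .60.
print(decide(0.60, 0.10))
```

Note that the size of the test statistic (r = .75 or t = .30) does not enter the decision directly. Only the p value is compared with the level of significance.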
Other info
• Chi-square is interpreted in the same way as inferential
statistics.
• Most statistics books contain tables that let you
determine p values if you calculate test statistics by
hand.
• SPSS printouts always contain p values for inferential
statistics.
• Theoretical assumptions are often violated in research
articles.
• Sample size affects whether a relationship between two or
more variables is large enough to be statistically
significant.
• Relationships between two variables can be either
positive or negative. High positive relationships are close