Action Research
Review
INFO 515
Glenn Booker
Lecture #10

Why do we do this?
- Measurements are needed to understand a system and predict its future behavior
- Statistical techniques provide a commonly accepted means of analyzing measurements
- Statistics is based on recognizing that measurements tend to fall over a range of values, not just one precise number

Types of Research
- Historical (what happened?)
- Descriptive (what is happening?)
- Developmental (over time)
- Case and Field (study an organization)
- Correlational (are A and B related?)
- Causal Comparative (what caused it?)
- True Experimental (single / double blind)
- Quasi-Experimental
- Action Research

Data Analysis
- Raw data, such as one survey result
- Refined data, such as the distribution of ages of Philadelphia residents
- Derived data, such as comparing the age distribution of Philadelphia residents to that of the country

Population vs. Sample
- Often the subject of interest (the population) is so big that it isn’t feasible to measure all of it
- Then a sample of measurements can be made, and we want to relate the sample measurements to the population

Sampling
- Sampling can be done using probabilistic techniques (e.g., various random samples)
  - Simple or stratified random,
  - Cluster (geographic), or
  - Systematic (every Nth) samples
- Or using non-probabilistic methods (whoever’s convenient, specific groups, or experts)

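The three probabilistic schemes can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical population of 100 numbered members split into two hypothetical strata ("A" and "B"); the helper names are made up for this example, and a real study would draw from its actual sampling frame.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of selection (no repeats)."""
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, rng):
    """Draw a separate simple random sample from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, step, rng):
    """Pick a random start within the first `step` members, then every Nth one."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(42)               # fixed seed so the example is repeatable
population = list(range(1, 101))      # hypothetical members numbered 1..100
strata = {"A": population[:60], "B": population[60:]}  # hypothetical strata

print(simple_random_sample(population, 5, rng))
print(stratified_random_sample(strata, 3, rng))
print(systematic_sample(population, 10, rng))
```

Cluster sampling would instead pick whole groups (for example, geographic areas) at random and measure everyone in them.
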
Customer Satisfaction Surveys
- A special case of sampling, customer satisfaction surveys are often done using:
  - In-person interview
  - Telephone interview
  - Questionnaire by mail
- Sample sizes are based on the allowable error, population size, and the result obtained

Measurement Scales
- Measurements can use four major types of scales; the types of analysis possible depend strongly on the type of measurement used
  - Nominal (named buckets, without sequence)
  - Ordinal (ordered buckets)
  - Interval (intervals mean something; can add and subtract)
  - Ratio (you can form ratios; can add, subtract, multiply, and divide)

Discrete versus Continuous
- Discrete (nonparametric) measurements use nominal or ordinal scales; only specific values are allowed
  - Car make = Chevy, or cost = High
- Continuous (parametric) measurements use interval or ratio scales, and generally have integer or real number values
  - Temperature = 98.6 deg F, Height = 172.1 cm

Descriptive Statistics
- Many common statistics can describe the central tendency and spread of a set of measurements
  - Average (arithmetic mean)
  - Minimum, Maximum, Range
  - Median (middle value)
  - Mode (most common value)

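Each of these statistics is available in Python's standard library. A minimal sketch, using a hypothetical list of survey ages:

```python
import statistics

ages = [23, 31, 27, 31, 45, 38, 31, 29]  # hypothetical survey ages

summary = {
    "mean": statistics.mean(ages),      # arithmetic average
    "min": min(ages),
    "max": max(ages),
    "range": max(ages) - min(ages),     # max minus min
    "median": statistics.median(ages),  # middle value (average of the two middle values here)
    "mode": statistics.mode(ages),      # most common value
}
print(summary)
```
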
Normal Distribution
- Many measurements can be described by a “normal” distribution, which is summarized by an average value and a standard deviation, s or σ (s for a sample, σ for a population)
- We can predict how likely any range of values is to occur for a normal distribution (how often is X between 5 and 8?)

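The “how often is X between 5 and 8?” question is the difference of two cumulative probabilities. A sketch using `statistics.NormalDist`, assuming a hypothetical normal distribution with mean 6 and standard deviation 2:

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 6, standard deviation 2
dist = NormalDist(mu=6, sigma=2)

# P(5 < X < 8) = CDF(8) - CDF(5)
p_between = dist.cdf(8) - dist.cdf(5)
print(f"P(5 < X < 8) = {p_between:.4f}")
```
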
Z Score
- Z scores measure how far from the mean a single measurement is, in units of standard deviation:
$z = (X_i - \mu) / \sigma$
- The same form of formula is used for finding “t” too
- A z score can be computed for any distribution, but if the distribution is normal, then we can predict the probability of that value (or a higher or lower one) occurring

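A sketch of the formula above, with a hypothetical measurement of 130 from a population with mean 100 and standard deviation 15. The tail probability is only meaningful if the measurements are approximately normal:

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

z = z_score(x=130, mean=100, sd=15)  # hypothetical values
# Assumes normality: probability of this value or higher occurring
p_higher = 1 - NormalDist().cdf(z)
print(z, round(p_higher, 4))
```
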
Standard Error
- A sample of N measurements will have a standard error $SE_{\bar{x}} = s / \sqrt{N}$
- The standard error allows us to define the confidence interval, CI:
$CI = \bar{x} \pm \text{crit} \cdot SE_{\bar{x}}$
where $\bar{x}$ is the sample mean, and “crit” is the critical z score for a large sample, or the critical t score for a small sample

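A sketch of the standard error and confidence interval for a small hypothetical sample of five measurements. Because the sample is small, it uses the critical t score for df = 4 at 95% confidence (2.776, taken from a t table) rather than 1.96:

```python
import math
import statistics

def confidence_interval(sample, crit):
    """mean +/- crit * SE, where SE = s / sqrt(N)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # sample s (divides by n - 1)
    return mean - crit * se, mean + crit * se

sample = [4.8, 5.1, 5.0, 4.9, 5.2]  # hypothetical measurements
print(confidence_interval(sample, crit=2.776))
```
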
Critical z and t
- The critical z score is only a function of the desired confidence level of the results ($z_c = 1.96$ for a two-tailed test at the 95% confidence level)
- The critical t score is a function of the sample size (degrees of freedom, df = n - 1) and the desired confidence level
  - As df gets very large, critical t approaches critical z

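The critical z can be computed directly from the confidence level. Python's standard library has no t distribution, so this sketch uses a small table of standard two-tailed 95% critical t values instead (scipy's `scipy.stats.t.ppf` could compute them, but is not used here). It shows critical t shrinking toward 1.960 as df grows:

```python
from statistics import NormalDist

def critical_z(confidence=0.95):
    """Two-tailed critical z: depends only on the confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Two-tailed critical t at 95% confidence, copied from a standard t table
T_CRIT_95 = {1: 12.706, 2: 4.303, 5: 2.571, 10: 2.228, 30: 2.042, 100: 1.984}

print(f"z crit (95%) = {critical_z(0.95):.3f}")  # 1.960
print(f"z crit (99%) = {critical_z(0.99):.3f}")  # 2.576
for df, t in T_CRIT_95.items():
    print(f"df = {df:>3}: t crit = {t}")
```
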
Confidence Level
- We have to accept some level of uncertainty in a statistical analysis; our conclusion might be wrong!
- Generally, a 95% level of confidence is used, unless life is on the line; then a 99% level of confidence is required
  - Use 95% typically, hence the critical significance is 0.050

Confidence Level
- The level of confidence of your results plus the critical significance always equals exactly one (e.g., 0.95 + 0.050 = 1)
- For practically every statistical test, having the Significance of the result less than the critical value means to reject the null hypothesis
  - If $\text{Sig}_{\text{actual}} < \text{Sig}_{\text{crit}}$, reject the null hypothesis

Frequency and Percentage
- Frequency graphs and crosstabs can provide a lot of information just from counts of how often each value of a nominal or ordinal measurement occurs, possibly given with the percentage of occurrences for each value
- Histograms can provide similar charts for ratio or interval scaled data

Scatterplots
- Scatter plots or diagrams show the relationship between two or more measures
  - The horizontal axis is generally the independent variable (X), sometimes also called a factor or grouping variable
  - The vertical axis is generally the dependent variable (Y), which is the measure you’re trying to understand

Hypothesis Testing
- Some statistics are used in the context of testing a hypothesis, a statement whose truth you wish to determine
  - Are Philadelphians more likely to be Nobel Prize winners?
- The null hypothesis is the opposite of the hypothesis, and generally says there is no difference or no effect observed
  - Philadelphians are no more likely to be Nobel Prize winners than any other group

Hypothesis Testing
- We can’t truly PROVE anything, only determine whether the differences observed are “not likely to be due to chance”
- Select one or more “Tests of Significance” to determine whether there is a statistically significant difference (Yes/No); if Yes, then you can
  - Select one or more “Measures of Association” to describe the strength of the difference, and possibly its direction

One versus Two Tailed Tests
- A null hypothesis which tests for “no difference” uses a two tailed test
- A null hypothesis which specifically tests for “greater than” uses a one tailed test
- A null hypothesis which specifically tests for “less than” uses a one tailed test
  - Choosing one versus two tails changes the critical z or t score; a one tailed test generally makes it easier to show significance, which is why the more conservative two tailed test is usually used

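The effect of the number of tails on the critical z can be sketched with `statistics.NormalDist`. At a 0.050 critical significance, a two-tailed test splits that 0.050 between both tails, while a one-tailed (“greater than”) test puts all of it in the upper tail. The actual z of 1.80 is a hypothetical value chosen to fall between the two critical values:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

z_two = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed: about 1.960
z_one = std_normal.inv_cdf(1 - alpha)      # one-tailed: about 1.645

actual_z = 1.80  # hypothetical test statistic
print("two-tailed reject:", abs(actual_z) > z_two)  # False
print("one-tailed reject:", actual_z > z_one)       # True
```

The same result is significant one-tailed but not two-tailed, which is why the choice of test must be made before looking at the data.
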
Z or T Test
- The z or t tests can be used to compare two distribution means, or to compare one distribution mean to a fixed value (interval or ratio data)
- Compare the actual z or t score to the critical z or t score
- If the actual z or t score is closer to zero than the critical value, accept the null hypothesis

Z or T Test (Two Tailed)
[Figure: a z or t scale centered on the mean, marked at -crit and +crit. An actual z or t value between -crit and +crit falls in the “Accept Null Hypothesis” region; a value beyond either critical value falls in a “Reject Null Hypothesis” region.]
Notice this is for the z or t value, NOT the significance of that value

Z or T Test (One Tailed)
[Figure: a z or t scale centered on the mean, marked only at +crit. An actual z or t value below +crit falls in the “Accept Null Hypothesis” region; a value above +crit falls in the “Reject Null Hypothesis” region.]
(The case here is testing whether the actual value is greater than the mean; for a “less than” case, use only the negative critical value.)

Is My Sample Normal?
- Boxplots and stem-and-leaf diagrams can help show graphically whether a sample has a fairly normal distribution
- The skewness and kurtosis of a data set can help identify non-normality, if their values are more than two times their own standard errors

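A sketch of the “more than two times their own standard errors” rule. It uses simple moment estimates of skewness and excess kurtosis with the large-sample standard errors sqrt(6/n) and sqrt(24/n). These are approximations: packages such as SPSS apply small-sample corrections, so their reported values will differ somewhat. The data sets are hypothetical:

```python
import math

def skew_kurt_check(data):
    """Skewness and excess kurtosis, each compared with 2 x its standard error.

    Uses moment estimates and the large-sample standard errors
    sqrt(6/n) and sqrt(24/n), not the small-sample corrected versions.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3  # excess kurtosis: 0 for a normal curve
    se_skew, se_kurt = math.sqrt(6 / n), math.sqrt(24 / n)
    return {
        "skew": skew,
        "kurtosis": kurt,
        "skew_suspect": abs(skew) > 2 * se_skew,
        "kurtosis_suspect": abs(kurt) > 2 * se_kurt,
    }

print(skew_kurt_check([1, 2, 3, 4, 5]))  # symmetric: no flags
print(skew_kurt_check([1] * 9 + [20]))   # one large outlier: skew flagged
```
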
T Tests
- T tests compare means for ratio or interval data
  - Independent t test is for two different strata within one data set
  - Paired t test is to compare measures of the same group before and after some event (drug test), or when the samples are otherwise believed to be dependent on each other
  - One-sample t test compares one sample to a fixed value

T Tests
- Null hypothesis is that there is no difference between the means
- Results (e.g., significance) may differ if variances are not equal, since df changes
- The Levene test checks for equal variances
  - Null hypothesis for the Levene test is that the variances are equal
  - If the Levene significance < 0.050, variances are not equal (reject the null hypothesis)

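The Levene statistic is a one-way ANOVA F computed on each measurement's absolute deviation from its own group mean. A minimal sketch of that classic mean-based form (some packages also offer a median-based variant); the helper names and the two groups are hypothetical. The standard library cannot compute F significance, so the statistic is compared with a critical F value from a table:

```python
import statistics

def one_way_f(groups):
    """One-way ANOVA F = MS(between) / MS(within)."""
    all_values = [x for g in groups for x in g]
    grand = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def levene_w(groups):
    """Levene statistic: one-way ANOVA on |x - group mean|."""
    deviations = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    return one_way_f(deviations)

a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]  # hypothetical group with twice the spread
print(round(levene_w([a, b]), 4))
```

Here W is about 2.06, below the tabled critical F(1, 8) of about 5.32 at 0.050, so with only five values per group the unequal spreads are not statistically detectable and equal variances would be assumed.
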
Independent T Test Evaluation
- Three ways to check the results of a t test:
  - If the t test’s significance < 0.050, reject the null hypothesis
  - Check the stated t value against the critical t value for this ‘df’ level; if |t(actual)| > t(critical), reject the null hypothesis
  - If the confidence interval for the difference between the means does not include zero, reject the null hypothesis

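Two of the three checks (t against critical t, and the confidence interval for the difference) can be sketched for the equal-variances (pooled) version of the independent t test. The data are hypothetical, and the critical t of 2.306 (two-tailed, 95%, df = 8) comes from a t table because the standard library cannot compute t significance:

```python
import math
import statistics

def independent_t(a, b, t_crit):
    """Pooled-variance (equal variances assumed) independent t test."""
    n1, n2 = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = diff / se_diff
    ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)
    return {
        "t": t,
        "df": n1 + n2 - 2,
        "reject_by_t": abs(t) > t_crit,            # |t(actual)| > t(critical)
        "reject_by_ci": not (ci[0] <= 0 <= ci[1]),  # CI excludes zero
        "ci": ci,
    }

print(independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7], t_crit=2.306))
```

With t = -2.0 and a confidence interval of roughly (-4.31, 0.31), both checks agree that the null hypothesis is not rejected; they always agree, because the interval is built from the same critical t.
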
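Evaluating Significance
[Figure: a significance scale starting at 0, with the critical significance marked at 0.050. An actual significance between 0 and 0.050 falls in the “Reject Null Hypothesis” region; an actual significance above 0.050 falls in the “Accept Null Hypothesis” region.]
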
Paired T Test Evaluation
- Checks before and after test cases
- Includes a correlation factor (like ‘r’)
  - Can use the paired test if the correlation’s significance < 0.050
  - A larger correlation factor means a stronger relationship between the variables
- Evaluate the test the same way as the independent t test
  - Significance, ‘t’ value, and confidence interval

One-Sample T Test
- Compares a sample mean to a fixed value
- Test shows the actual values of the means, with their standard deviation and standard error
- Same interpretation of results
  - Significance, ‘t’ value, and confidence interval

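The one-sample t statistic is the distance of the sample mean from the fixed value, measured in standard errors. A sketch with a hypothetical sample and test value of 5.0:

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """t = (sample mean - test value) / (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - test_value) / se, n - 1

t, df = one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], test_value=5.0)  # hypothetical
print(f"t = {t:.3f}, df = {df}")  # |t| < 2.776 (df = 4), so do not reject
```
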
F Test and ANOVA
- Compare several means against each other using Analysis of Variance (ANOVA) and the F test
- Like extending the t tests to more than two groups
- Want data from random samples of normal populations with equal variances

F Test and ANOVA
- Output includes the Levene test
  - Want significance for Levene > 0.050, so that equal variances can be assumed
  - Otherwise, should not use ANOVA
- Evaluate F by its significance
  - If Sig. < 0.050, reject the null hypothesis (there is a significant difference among the means)

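The F statistic compares the spread between subset means with the spread within subsets. A minimal sketch for three hypothetical subsets; statistical software would also report the significance of F, which the standard library cannot compute:

```python
import statistics

def anova_f(groups):
    """One-way ANOVA: returns F, df(between), df(within)."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical subsets
print(anova_f(groups))  # (27.0, 2, 6)
```

An F of 27 is far above the tabled critical F(2, 6) of about 5.14 at 0.050, so the null hypothesis of equal means would be rejected.
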
Additional ANOVA Tests
- Once the F test shows there is some difference in the means across the subsets, additional ANOVA tests can help identify more specific trends and differences
- Types of tests (see end of lecture 6) include
  - Pairwise Multiple Comparisons
  - Post Hoc Range Tests

Pairwise Multiple Comparisons
- Pairwise Multiple Comparisons check two subsets of data at a time
  - Bonferroni test is better for a small number of subsets
  - Tukey test is better for many subsets
  - Both assume subset variances are equal
- For each pair of subset values, Sig < 0.050 means the difference in means is significant

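One common form of the Bonferroni adjustment multiplies each pairwise significance by the number of comparisons (capping at 1) before comparing it with 0.050. A sketch with four hypothetical subsets (six pairs) and made-up unadjusted significance values:

```python
from itertools import combinations

def bonferroni(p_values, alpha=0.05):
    """Multiply each pairwise p value by the number of comparisons (cap at 1)."""
    m = len(p_values)
    return {pair: (min(p * m, 1.0), min(p * m, 1.0) < alpha)
            for pair, p in p_values.items()}

subsets = ["A", "B", "C", "D"]
pairs = list(combinations(subsets, 2))  # 4 subsets -> 6 comparisons
# Hypothetical unadjusted p values from pairwise tests:
raw_p = dict(zip(pairs, [0.001, 0.010, 0.030, 0.200, 0.004, 0.500]))
for pair, (adj_p, significant) in bonferroni(raw_p).items():
    print(pair, round(adj_p, 3), significant)
```

The adjustment grows with the number of pairs, which is why Bonferroni suits a small number of subsets.
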
Post Hoc Range Tests
- Post Hoc Range Tests look for groups of subsets whose means are statistically similar
  - Tukey and Tukey’s-b tests include Post Hoc Range Tests
  - Each column of the output is a subset with statistically similar means
  - Subsets may overlap substantially

Contrasts Across Means
- Look across subset means to see if there is a trend, such as a linear increase or decrease across subsets
- Can check for Linear, Quadratic, or Cubic relationships (i.e., first, second, or third order polynomials)
  - Check the Significance of F for the Unweighted version of each relationship (Linear, etc.); if Sig. < 0.050, reject the null hypothesis

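A polynomial contrast is a weighted sum of the subset means. This sketch uses the standard orthogonal polynomial coefficients for four equally spaced, equal-sized subsets and only computes the contrast values; the F test of each contrast's significance is left to statistical software. The subset means are hypothetical:

```python
# Orthogonal polynomial contrast coefficients for 4 equally spaced,
# equal-sized subsets (standard textbook values)
CONTRASTS = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}

def contrast_values(means):
    """Weighted sum of subset means for each polynomial contrast."""
    return {name: sum(c * m for c, m in zip(coeffs, means))
            for name, coeffs in CONTRASTS.items()}

# Hypothetical means rising steadily: only the linear contrast is nonzero
print(contrast_values([2.0, 4.0, 6.0, 8.0]))
```
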
Determine Linearity
- An option under Compare Means / Means allows checking just for linearity
- This confirms the ANOVA test result for Linearity
- And gives R and Eta parameters, which are Measures of Association

R and Eta
- Pearson’s R * measures how well the data fit the regression (-1 is a perfect negative correlation, 0 is no linear relationship, 1 is a perfect positive correlation); R squared describes the amount of variance shared between the two variables
- Eta squared gives how much of the variance in one variable is explained by the changes in the other variable

* Named for English statistician Karl Pearson, 1857-1936 (per http://human-nature.com/nibbs/03/kpearson.html)

Regression Analysis
- Regression Analysis looks at two interval or ratio-scaled variables (generically X and Y) and tries to fit an equation between them
- A dozen different equations are available
  - Linear, Power, Logarithmic, Exponential, etc.
- Significance is checked by the ANOVA F and the Sig. of the regression coefficients; association is measured with R Squared

Regression Analysis
- For a regression to have any significance, we must have ANOVA’s Sig. F < 0.050
- Then each variable’s coefficient (b0, b1, etc.) must have significance < 0.050
  - Otherwise the coefficient might be zero
- Then the better regression equations are ranked in order of strength by R Square, which is confirmed visually by plotting

Regression Analysis
- The standard error of the coefficients is given, so confidence intervals can be formed
- It also helps you report them meaningfully, so you don’t report a value as 4.861435 if it has a standard error of 0.92
  - Depending on the accuracy of the source data, you could report that result as 5 +/- 1, or 4.9 +/- 0.9, or 4.86 +/- 0.92

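One way to automate this rounding is to keep the value only to the decimal place of the standard error's leading significant digit(s). The `report` helper below is a hypothetical sketch of that rule, not a standard function:

```python
import math

def report(value, std_error, sig_figs=1):
    """Round a value to the decimal place of its standard error's leading digit(s)."""
    decimals = sig_figs - 1 - math.floor(math.log10(abs(std_error)))
    if decimals > 0:
        return f"{value:.{decimals}f} +/- {std_error:.{decimals}f}"
    # Standard error of 1 or more: round to units, tens, etc.
    return f"{round(value, decimals):.0f} +/- {round(std_error, decimals):.0f}"

print(report(4.861435, 0.92))              # 4.9 +/- 0.9
print(report(4.861435, 0.92, sig_figs=2))  # 4.86 +/- 0.92
```
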
Crosstabs
- Crosstabs display data sorted by two or more variables in table form
- Often just counts of each category, and/or the percentage of counts
- Recoding data allows interval or ratio scale data to be put into groups (e.g., age 18-25)

Pearson’s Chi Square
- Measures how much the actual (observed) data differ from an even (expected) distribution of data
- The “expected” data can be a uniform distribution (the same number of counts per cell), or adjusted for the actual total counts for each row and column

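A sketch of Pearson's chi square using expected counts adjusted for the row and column totals (expected = row total x column total / N). The 2x2 counts are hypothetical, and the result is compared with the tabled critical value for df = 1 at 0.050 (3.841):

```python
def chi_square(observed):
    """Pearson chi square with expected counts from row and column totals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

observed = [[10, 20],
            [20, 10]]  # hypothetical 2x2 crosstab counts
chi2, df = chi_square(observed)
print(f"chi square = {chi2:.3f}, df = {df}")  # 6.667 > 3.841, so reject
```
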
Pearson’s Chi Square Evaluation
- When chi square is larger than the critical value, reject the null hypothesis
- Or, if the significance of chi square is < 0.050, reject the null hypothesis
- Can also generate chi square for a single variable
- Beware that chi square is less meaningful for large matrices
  - Or, it’s too easy for large matrices to show significance falsely using chi square

Residuals
- A residual is the difference between the Observed and Estimated (expected) values for a cell
- Residuals can be plotted to look for outliers
- Residuals can be standardized by dividing by their standard deviation
  - Cells with a standardized residual magnitude > 2 contribute a lot to chi square

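For count data, a common estimate of a cell's residual standard deviation is the square root of its expected count, giving the standardized (Pearson) residual (O - E) / sqrt(E). A sketch with a hypothetical 2x2 table, flagging cells whose magnitude exceeds 2:

```python
import math

def standardized_residuals(observed):
    """(observed - expected) / sqrt(expected) for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(observed):
        result_row = []
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            result_row.append((obs - expected) / math.sqrt(expected))
        result.append(result_row)
    return result

res = standardized_residuals([[30, 10], [10, 30]])  # hypothetical counts
big = [(i, j) for i, row in enumerate(res) for j, v in enumerate(row) if abs(v) > 2]
print(res, big)  # every cell here has |residual| of about 2.24
```
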
Measures of Association
- Measures of Association between two variables can be symmetric or directional
- Dozens of measures have been developed to work with the chi square test
- Interpret them like ‘r’: zero means no correlation, larger values mean a stronger correlation
  - Some can be > 1

Measures of Association
- Symmetric measures don’t care which variable is dependent (Y)
- Directional measures DO care which variable is dependent (A = f(B) is not B = f(A))
  - Some directional measures have a “symmetric” value, the weighted average of the other two

Symmetric Measures
- The “Contingency Coefficient” is the main symmetric measure with a chi square test
  - Works even with nominal data
  - Evaluated like Pearson’s r
- Phi and Cramer’s V are other symmetric measures

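All three symmetric measures are simple functions of chi square, the total count N, and the table size. A sketch using a hypothetical 2x2 table with N = 60 and chi square = 20/3 (about 6.667); phi is reported unsigned here:

```python
import math

def symmetric_measures(chi2, n, rows, cols):
    """Chi-square-based symmetric measures of association."""
    phi = math.sqrt(chi2 / n)                                    # unsigned phi
    cramers_v = math.sqrt(chi2 / (n * (min(rows, cols) - 1)))
    contingency_c = math.sqrt(chi2 / (chi2 + n))
    return phi, cramers_v, contingency_c

print(symmetric_measures(chi2=20 / 3, n=60, rows=2, cols=2))
```

For a 2x2 table, phi and Cramer's V are equal; the contingency coefficient can never reach 1, which is one reason Cramer's V is often preferred for larger tables.
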
Directional Measures
- Directional measures range from 0 to 1
  - Lambda is the recommended directional measure; it tells what proportion of the dependent variable is predicted by the independent variable (like Eta)
  - Eta can be applied here if one variable is interval or ratio scaled

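Goodman and Kruskal's lambda is a proportional reduction in prediction errors: how much better the dependent category can be guessed when the independent category is known. A sketch assuming rows hold the independent variable and columns the dependent variable (hypothetical counts):

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the column variable (dependent) from the row variable.

    Compares errors when always guessing the most common column overall
    with errors when guessing the most common column within each row.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    errors_without = n - max(col_totals)
    errors_with = n - sum(max(row) for row in table)
    return (errors_without - errors_with) / errors_without

table = [[10, 20],
         [20, 10]]  # hypothetical: rows = independent, columns = dependent
print(round(goodman_kruskal_lambda(table), 3))  # 0.333
```

Swapping which variable is dependent (transposing the table) can change lambda, which is what makes it directional.
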
Relative Risk and Odds Ratio
- Use only with 2x2 tables
- Are quite directional
- Tell how much more likely one cell is to occur than the others
- Need to be very careful when interpreting them

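A sketch of both measures for a 2x2 table laid out as [[a, b], [c, d]], assuming rows are exposed / not exposed and columns are outcome yes / outcome no. The counts are hypothetical:

```python
def risk_and_odds(table):
    """Relative risk and odds ratio for a 2x2 table [[a, b], [c, d]]
    (rows = exposed / not exposed, columns = outcome yes / outcome no)."""
    (a, b), (c, d) = table
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio

# Hypothetical: 20 of 100 exposed and 10 of 100 unexposed had the outcome
print(risk_and_odds([[20, 80], [10, 90]]))
```

The two measures give different numbers for the same table (about 2.0 versus 2.25), and swapping rows or columns changes both, which is why they must be interpreted carefully.
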
Square Tables
- Tables with the same number of rows and columns (RxR), and the same variables in those rows and columns, can use kappa
  - Measures strength of association, like ‘r’
  - Check results for significance (< 0.050)
  - Then judge the value of kappa using a fixed scale

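A sketch of Cohen's kappa, which compares the observed agreement on the diagonal with the agreement expected by chance from the row and column totals. The two-rater table is hypothetical; significance would come from statistical software:

```python
def cohens_kappa(table):
    """Agreement beyond chance for a square (R x R) table of two ratings."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

ratings = [[20, 5],
           [10, 15]]  # hypothetical: rater 1 (rows) vs rater 2 (columns)
print(round(cohens_kappa(ratings), 3))  # 0.4
```
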
General RxC Measures
- Many measures can be used with a general table of R rows and C columns
- Gamma is the recommended measure (symmetric)
- Spearman’s Correlation Coefficient is also widely used
  - Ranges from -1 to +1, based on ordered categories

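Gamma counts concordant pairs (both variables ordered the same way) and discordant pairs (ordered oppositely), ignoring ties. A sketch for an R x C table whose rows and columns are both ordered categories; the counts are hypothetical:

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for ordered rows and columns; ties ignored."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # pair with cells in later rows
                for m in range(cols):
                    if m > j:                  # later column too: concordant
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                # earlier column: discordant
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)

table = [[20, 5],
         [10, 15]]  # hypothetical ordered categories (low/high by low/high)
print(round(goodman_kruskal_gamma(table), 3))
```
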
Yule’s Q
- Yule’s Q is a special case of gamma for a 2x2 table
- Is judged on a fixed scale, like ‘r’

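For a 2x2 table [[a, b], [c, d]], the only concordant pairs are a with d and the only discordant pairs are b with c, so gamma reduces to Yule's Q = (ad - bc) / (ad + bc). A sketch with hypothetical counts:

```python
def yules_q(table):
    """Yule's Q for a 2x2 table [[a, b], [c, d]]: (ad - bc) / (ad + bc)."""
    (a, b), (c, d) = table
    return (a * d - b * c) / (a * d + b * c)

print(round(yules_q([[20, 5], [10, 15]]), 3))  # 0.714
```
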