Contingency Table Analysis
Mary Whiteside, Ph.D.
Overview
- Hypotheses of equal proportions
- Hypotheses of independence
- Exact distributions and Fisher’s test
- The chi-squared approximation
- Median test
- Measures of dependence
- The chi-squared goodness-of-fit test
- Cochran’s test

Contingency Table Examples

- Countries: religion by government
- States: dominant political party by geographic region
- Mutual funds: style by family
- Companies: industry by location of headquarters

More examples
- Countries: government by GDP categories
- States: divorce laws by divorce-rate categories
- Mutual funds: family by Morningstar rankings
- Companies: industry by price-earnings ratio category
Statistical inference: hypothesis of equal proportions
H0: all probabilities in the same column (estimated by proportions, i.e. relative frequencies) are equal
H1: at least two of the probabilities in the same column are not equal
Here, for an r x c contingency table, r populations are sampled with fixed row totals n1, n2, …, nr.
Hypothesis of independence
H0: no association, i.e. the row and column variables are independent
H1: an association, i.e. the row and column variables are not independent
Here, one population is sampled with sample size N. The row totals are random variables.
Exact distribution for 2 x 2 tables: hypothesis of equal proportions; n1 = n2 = 2
Example tables (each row total is 2):

2 0    2 0    2 0
2 0    0 2    1 1

0 2    0 2    0 2
2 0    0 2    1 1
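Under the hypothesis of equal proportions each row is an independent binomial sample, so the exact probability of a table is the product of two binomial probabilities. The sketch below enumerates all nine tables for n1 = n2 = 2. It assumes a common success probability p (0.5 here, an arbitrary choice, since the slide gives no value) and treats the first column as "success"; the function name is illustrative.

```python
from itertools import product
from math import comb


def equal_proportion_tables(n1, n2, p):
    """Enumerate every 2 x 2 table with row totals n1 and n2 and return
    (table, probability) pairs under H0: both rows share success probability p.
    Each row is an independent binomial sample, so probabilities multiply."""
    results = []
    for x1, x2 in product(range(n1 + 1), range(n2 + 1)):
        table = ((x1, n1 - x1), (x2, n2 - x2))
        prob = (comb(n1, x1) * p**x1 * (1 - p) ** (n1 - x1)
                * comb(n2, x2) * p**x2 * (1 - p) ** (n2 - x2))
        results.append((table, prob))
    return results


if __name__ == "__main__":
    # n1 = n2 = 2 as on the slide; p = 0.5 is an assumed value.
    tables = equal_proportion_tables(2, 2, 0.5)
    for table, prob in tables:
        print(table, prob)
    print("number of tables:", len(tables))
    print("total probability:", sum(prob for _, prob in tables))
```

Because these probabilities depend on the unknown p, this distribution cannot be used directly as a test; Fisher's test removes p by also conditioning on the column totals.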
Fisher’s Exact Test
For 2 x 2 tables, assuming fixed row totals r, N-r and column totals c, N-c:
- Test statistic = x, the frequency of cell 11
- Probability = hypergeometric probability of x successes in a sample of size r from a population of size N with c successes: \(P(X = x) = \binom{c}{x}\binom{N-c}{r-x} / \binom{N}{r}\)
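A minimal Python sketch of the test using only `math.comb`. It assumes a one-sided alternative in which large values of x (the cell 11 count) count against H0, since the slide does not say which tail is used; two-sided versions also exist. The function names are illustrative.

```python
from math import comb


def hypergeom_pmf(x, N, r, c):
    """Probability of x successes in a sample of size r drawn without
    replacement from a population of size N containing c successes."""
    return comb(c, x) * comb(N - c, r - x) / comb(N, r)


def fisher_exact_upper(table):
    """One-sided Fisher exact p-value P(X >= x_obs) for a 2 x 2 table
    [[n11, n12], [n21, n22]], where X is the cell 11 count and the row and
    column totals are held fixed."""
    (n11, n12), (n21, n22) = table
    N = n11 + n12 + n21 + n22
    r = n11 + n12          # first row total
    c = n11 + n21          # first column total
    largest = min(r, c)    # cell 11 cannot exceed either margin
    return sum(hypergeom_pmf(x, N, r, c) for x in range(n11, largest + 1))


if __name__ == "__main__":
    print(fisher_exact_upper([[3, 1], [1, 3]]))  # 17/70, about 0.243
```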

Large-sample approximation for either test
$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
- Observed frequency \(O_{ij}\) for cell ij comes from the cross-tabulation of the data
- Expected frequency \(E_{ij} = P(\text{cell } ij) \cdot N\)

Degrees of freedom: (r-1)*(c-1)
Computing Cell Probabilities
Assumes independence or equal probabilities (the null hypothesis):
$$P(\text{cell } ij) = P(\text{row } i)\,P(\text{column } j) = \frac{R_i}{N}\cdot\frac{C_j}{N}$$
$$E_{ij} = \frac{R_i}{N}\cdot\frac{C_j}{N}\cdot N = \frac{R_i C_j}{N}$$
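The expected-frequency rule (row total times column total over N) and the chi-squared sum take only a few lines of code. A hedged Python sketch; the table below is made-up data and the helper names are hypothetical.

```python
def expected_counts(table):
    """Expected frequencies under the null hypothesis: R_i * C_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Pearson chi-squared statistic and its (r - 1)(c - 1) degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


if __name__ == "__main__":
    table = [[20, 30], [30, 20]]   # made-up counts for illustration only
    print(expected_counts(table))  # [[25.0, 25.0], [25.0, 25.0]]
    print(chi_squared(table))      # (4.0, 1)
```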
Distribution of the Sum
The sum is approximately chi-squared with (r-1)*(c-1) degrees of freedom. This assumes that
$$\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
is a squared standard normal, which implies that
$$\frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$
is standard normal. This in turn implies \(\mu = \sigma^2\) and that the observed count \(O_{ij}\) is a Poisson random variable.
- A Poisson random variable is approximately normal if \(\mu > 5\) (traditional guideline)
- Conover’s relaxed guideline, page 201
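Turning the statistic into a p-value needs the upper tail of the chi-squared distribution. A standard-library sketch, assuming an integer number of degrees of freedom (where closed forms exist); for df = 1 it reduces to erfc(sqrt(x/2)), the "standard normal squared" case described above. The guideline check uses the slide's traditional threshold of 5. `chi2_sf` and `expected_above` are hypothetical names; in practice a library routine such as SciPy's `scipy.stats.chi2.sf` would usually be used instead.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """P(chi-squared with df degrees of freedom > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return exp(-half) * total
    # Odd df: df = 1 is a squared standard normal, so P = P(|Z| > sqrt(x)).
    # Each further pair of degrees of freedom adds one term of the series.
    total = erfc(sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += exp(-half) * half ** (i - 0.5) / gamma(i + 0.5)
    return total


def expected_above(expected, threshold=5.0):
    """Traditional guideline: every expected frequency exceeds the threshold."""
    return all(e > threshold for row in expected for e in row)


if __name__ == "__main__":
    print(chi2_sf(4.0, 1))  # about 0.0455, i.e. P(|Z| > 2)
    print(expected_above([[25.0, 25.0], [25.0, 25.0]]))  # True
```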

Measures of Strength: Categorical Variables
- Phi (2 x 2)
- Cramér's V (r x c)
- Pearson's contingency coefficient
- Tschuprow's T
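All four categorical measures are rescalings of the chi-squared statistic. A sketch using the standard textbook formulas (assumed here, since the slide lists only the names); phi computed this way is unsigned, and the function name is illustrative.

```python
from math import sqrt


def categorical_strength(chi2, n, r, c):
    """Chi-squared-based measures of association for an r x c table with n observations."""
    return {
        "phi": sqrt(chi2 / n),                              # meant for 2 x 2 tables
        "cramers_v": sqrt(chi2 / (n * min(r - 1, c - 1))),  # ranges from 0 to 1
        "contingency_c": sqrt(chi2 / (chi2 + n)),           # always below 1
        "tschuprow_t": sqrt(chi2 / (n * sqrt((r - 1) * (c - 1)))),
    }


if __name__ == "__main__":
    print(categorical_strength(4.0, 100, 2, 2))
```

For a 2 x 2 table phi, Cramér's V and Tschuprow's T coincide; they differ only when r or c exceeds 2.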
Measures of Strength: Ordinal Variables
- Lambda A: rows dependent
- Lambda B: columns dependent
- Symmetric lambda
- Kendall's tau-b
- Kendall's tau-c
- Gamma
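Gamma and Kendall's tau-b are both built from concordant (P) and discordant (Q) pairs of observations. A sketch assuming the rows and columns are listed in increasing ordinal order; the lambdas and tau-c are not covered, and the function names are illustrative.

```python
from math import sqrt


def concordant_discordant(table):
    """Count concordant (P) and discordant (Q) pairs; rows and columns must be
    listed in increasing order of their ordinal categories."""
    n_rows, n_cols = len(table), len(table[0])
    P = Q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # only rows strictly below row i
                for m in range(n_cols):
                    if m > j:                   # higher on both variables
                        P += table[i][j] * table[k][m]
                    elif m < j:                 # higher row, lower column
                        Q += table[i][j] * table[k][m]
    return P, Q


def gamma_and_tau_b(table):
    """Goodman-Kruskal gamma and Kendall's tau-b for an ordered table."""
    P, Q = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    row_sq = sum(sum(row) ** 2 for row in table)
    col_sq = sum(sum(col) ** 2 for col in zip(*table))
    gamma = (P - Q) / (P + Q)  # undefined if every pair is tied
    tau_b = (P - Q) / sqrt((n * n - row_sq) / 2 * (n * n - col_sq) / 2)
    return gamma, tau_b


if __name__ == "__main__":
    print(gamma_and_tau_b([[10, 5, 1], [4, 8, 4], [1, 5, 10]]))  # invented counts
```

Gamma ignores tied pairs entirely, while tau-b adjusts for ties in the denominator, so gamma is typically the larger of the two in absolute value.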
Steps of Statistical Analysis
Significance, then strength
1 - Test the observed association for significance
2 - If it is significant, measure the strength of the association
Consider the correlation coefficient, a measure of association (the linear relationship between two quantitative variables):

Case                             | r   | p-value
Significant but not strong       | .20 | < .05
Significant and strong           | .90 | < .05
Not significant but “strong”     | .90 | > .05
Not significant and not strong   | .20 | > .05
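The four cases can be reproduced numerically: the same r is significant or not depending on the sample size. This sketch uses Fisher's z-transformation as an approximation (the t-test on r is the more common exact choice under bivariate normality); the sample sizes below are hypothetical, chosen to land in each row of the table.

```python
from math import atanh, erfc, sqrt


def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 using Fisher's
    z-transformation; atanh(r) * sqrt(n - 3) is roughly standard normal."""
    z = atanh(r) * sqrt(n - 3)
    return erfc(abs(z) / sqrt(2))  # P(|Z| > |z|)


if __name__ == "__main__":
    # Hypothetical sample sizes for the four cases in the table.
    for r, n in [(0.20, 400), (0.90, 50), (0.90, 4), (0.20, 20)]:
        print(f"r = {r:.2f}, n = {n:3d}, p = {correlation_p_value(r, n):.4f}")
```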
Concepts
- Predictive associations must be both significant and strong.
- In a particular application, an association may be important even if it is not predictive (i.e. not strong).

More concepts
- Highly significant but weak associations result from large samples.
- Insignificant “strong” associations result from small samples; with larger samples they may prove to be either predictive or weak.

Examples
- Heart attack outcomes by anticoagulant treatment
- Admission decisions by gender

Summary
- Is there an association?
  – Investigate with the chi-squared p-value
- If so, how strong is it?
  – Select the appropriate measure of strength of association
- Where does it occur?
  – Examine the cell contributions
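Cell contributions show where an association occurs. A sketch that returns each cell's term (O - E)^2 / E of the chi-squared sum, with E computed as row total times column total over N, along with the signed Pearson residual (O - E) / sqrt(E), which also shows the direction of the departure. The example table is invented and the function name is illustrative.

```python
from math import sqrt


def cell_contributions(table):
    """Per-cell chi-squared contributions (O - E)^2 / E and Pearson
    residuals (O - E) / sqrt(E), with E_ij = R_i * C_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions, residuals = [], []
    for obs_row, r_tot in zip(table, row_totals):
        contrib_row, resid_row = [], []
        for o, c_tot in zip(obs_row, col_totals):
            e = r_tot * c_tot / n
            contrib_row.append((o - e) ** 2 / e)
            resid_row.append((o - e) / sqrt(e))
        contributions.append(contrib_row)
        residuals.append(resid_row)
    return contributions, residuals


if __name__ == "__main__":
    table = [[30, 10, 10], [10, 10, 30]]  # invented counts
    contributions, residuals = cell_contributions(table)
    print(contributions)  # [[5.0, 0.0, 5.0], [5.0, 0.0, 5.0]]
    print(residuals)
```

Here the middle column contributes nothing, so the association is carried entirely by the first and third columns, with opposite signs in the two rows.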