Chap. 14: The Chi-Square Test & The Analysis of
Contingency Tables
1. Explain χ² Test of Independence
2. Measure of Association
Contingency Tables
• Tables representing all combinations of
levels of explanatory and response
variables
• Numbers in table represent Counts of the
number of cases in each cell
• Row and column totals are called
Marginal counts
2x2 Tables
• Each variable has 2 levels
– Explanatory Variable – Groups (Typically
based on demographics, exposure)
– Response Variable – Outcome (Typically
presence or absence of a characteristic)
2x2 Tables - Notation
                 Outcome Present   Outcome Absent   Group Total
Group 1          n11               n12              n1.
Group 2          n21               n22              n2.
Outcome Total    n.1               n.2              n..
χ² Test of Independence
• 1. Shows If a Relationship Exists Between
2 Qualitative Variables
– One Sample Is Drawn
– Does Not Show Causality
• 2. Assumptions
– Multinomial Experiment
– All Expected Counts ≥ 5
• 3. Uses Two-Way Contingency Table
χ² Test of Independence
Contingency Table
• 1. Shows # Observations From 1 Sample
Jointly in 2 Qualitative Variables
                     House Location
House Style      Urban    Rural    Total
Split-Level         63       49      112
Ranch               15       33       48
Total               78       82      160
Rows and columns are the levels of the two variables.
χ² Test of Independence
Hypotheses & Statistic
• 1. Hypotheses
– H0: Variables Are Independent
– Ha: Variables Are Related (Dependent)
• 2. Test Statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})}$$
where n_ij is the observed count and E(n_ij) is the expected count in cell (i, j)
• Degrees of Freedom: (r - 1)(c - 1), where r = number of rows and c = number of columns
χ² Test of Independence
Expected Counts
• 1. Statistical Independence Means Joint
Probability Equals Product of Marginal
Probabilities
• 2. Compute Marginal Probabilities &
Multiply for Joint Probability
• 3. Expected Count Is Sample Size Times
Joint Probability
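The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides; the function name `expected_counts` is an assumption, and the counts are the house style by location table used in this chapter.

```python
# Observed counts: rows = house style (Split-Level, Ranch),
# columns = house location (Urban, Rural).
observed = [[63, 49],
            [15, 33]]


def expected_counts(table):
    """Expected cell counts under independence.

    Each expected count is n * P(row) * P(column), which simplifies to
    (row total) * (column total) / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = []
    for r in row_totals:
        p_row = r / n                      # marginal probability of the row
        expected.append([n * p_row * (c / n) for c in col_totals])
    return expected


expected = expected_counts(observed)
for style, row in zip(["Split-Level", "Ranch"], expected):
    print(style, [round(e, 1) for e in row])
```

For the Split-Level/Urban cell this gives 160 · (112/160)(78/160) = 54.6, and the expected counts in each row and column add up to the observed marginal totals.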
Expected Count Example
                     Location
House Style      Urban    Rural    Total
Split-Level         63       49      112
Ranch               15       33       48
Total               78       82      160
(All counts are observed counts.)
Marginal probability (Split-Level) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6
Expected Count Calculation
Expected count = (Row total)(Column total) / Sample size
                        House Location
                    Urban            Rural
House Style      Obs.   Exp.     Obs.   Exp.    Total
Split-Level        63   54.6       49   57.4      112
Ranch              15   23.4       33   24.6       48
Total              78     78       82     82      160
Expected counts: 112·78/160 = 54.6, 112·82/160 = 57.4, 48·78/160 = 23.4, 48·82/160 = 24.6
χ² Test of Independence
Example
• You’re a marketing research analyst. You ask a
random sample of 286 consumers if they
purchase Diet Pepsi or Diet Coke. At the .05
level, is there evidence of a relationship?
                 Diet Pepsi
Diet Coke       No     Yes    Total
No              84      32      116
Yes             48     122      170
Total          132     154      286
χ² Test of Independence
Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical Value(s): χ² = 3.841 (reject H0 if the test statistic exceeds 3.841; the rejection region is the upper tail with area α = .05)
χ² Test of Independence
Solution
E(nij) ≥ 5 in all cells
                      Diet Pepsi
                   No               Yes
Diet Coke      Obs.   Exp.     Obs.   Exp.    Total
No               84   53.5       32   62.5      116
Yes              48   78.5      122   91.5      170
Total           132    132      154    154      286
Expected counts: 116·132/286 = 53.5, 116·154/286 = 62.5, 170·132/286 = 78.5, 170·154/286 = 91.5
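χ² Test of Independence
Solution
$$\chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})} = \frac{\left[ n_{11} - E(n_{11}) \right]^2}{E(n_{11})} + \frac{\left[ n_{12} - E(n_{12}) \right]^2}{E(n_{12})} + \cdots + \frac{\left[ n_{22} - E(n_{22}) \right]^2}{E(n_{22})}$$
$$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$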
χ² Test of Independence
Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical Value(s): χ² = 3.841 (the rejection region is the upper tail with area α = .05)
Test Statistic: χ² = 54.29
Decision: Reject at α = .05
Conclusion: There is evidence of a relationship
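The decision step can also be checked numerically. The sketch below is an illustration only; `chi2_upper_tail_df1` is a hypothetical helper name. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability equals erfc(sqrt(x/2)). It therefore applies only when df = 1.

```python
import math


def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(x / 2))


statistic = 54.29   # test statistic reported in the solution
critical = 3.841    # tabled critical value for alpha = .05, df = 1
alpha = 0.05

reject = statistic > critical            # critical-value decision rule
p_value = chi2_upper_tail_df1(statistic)  # equivalent p-value decision rule

print("reject H0:", reject)
print("p-value below alpha:", p_value < alpha)
# Sanity check: the tail area beyond the critical value is about alpha.
print("tail area beyond 3.841:", round(chi2_upper_tail_df1(critical), 3))
```

Both rules lead to the same decision: the statistic is far beyond 3.841 and the p-value is far below .05.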
Siskel and Ebert
(Each cell shows the observed count with the expected count beneath it.)
           |              Ebert              |
    Siskel |       Con        Mix        Pro |     Total
-----------+---------------------------------+----------
       Con |        24          8         13 |        45
           |      11.8        8.4       24.8 |      45.0
-----------+---------------------------------+----------
       Mix |         8         13         11 |        32
           |       8.4        6.0       17.6 |      32.0
-----------+---------------------------------+----------
       Pro |        10          9         64 |        83
           |      21.8       15.6       45.6 |      83.0
-----------+---------------------------------+----------
     Total |        42         30         88 |       160
           |      42.0       30.0       88.0 |     160.0

Pearson chi2(4) = 45.3569   p < 0.001
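The same calculation extends to tables larger than 2x2. The following Python sketch recomputes the statistic for the 3x3 table above and lists each cell's contribution, which shows where the reviewers' ratings depart most from independence. It is an illustration under stated assumptions: the variable names are invented here, and the p-value line uses the closed form exp(-x/2)(1 + x/2), which holds only for 4 degrees of freedom.

```python
import math

labels = ["Con", "Mix", "Pro"]
# Observed counts: rows = Siskel's rating, columns = Ebert's rating.
observed = [[24, 8, 13],
            [8, 13, 11],
            [10, 9, 64]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = {}
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        contributions[(labels[i], labels[j])] = (obs - exp) ** 2 / exp

stat = sum(contributions.values())
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid for df = 4 only.
p_value = math.exp(-stat / 2) * (1 + stat / 2)

largest = max(contributions, key=contributions.get)
print(f"chi2({df}) = {stat:.4f}, p < 0.001: {p_value < 0.001}")
print("largest contribution from cell (Siskel, Ebert) =", largest)
```

The largest contributions come from the cells on the diagonal where both reviewers agree, which is consistent with the two sets of ratings being related.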
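Yates' Statistic
• Method of testing for association for 2x2
tables when sample size is moderate (total
number of observations between 6 and 25)
$$\chi^2 = \sum_i \sum_j \frac{\left( \left| O_{ij} - e_{ij} \right| - 0.5 \right)^2}{e_{ij}}$$
where O_ij is the observed count and e_ij is the expected count in cell (i, j)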
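A small Python sketch shows how the 0.5 continuity correction changes the statistic. The 2x2 counts below are hypothetical, chosen only so that the total of 20 falls in the moderate range mentioned above; they do not come from the slides, and the function names are assumptions.

```python
# Hypothetical 2x2 counts (illustration only), total n = 20.
observed = [[7, 3],
            [2, 8]]


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table, yates=False):
    """Pearson chi-square, optionally with the Yates 0.5 correction."""
    expected = expected_counts(table)
    correction = 0.5 if yates else 0.0
    return sum(
        (abs(o - e) - correction) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


pearson = chi_square(observed)
corrected = chi_square(observed, yates=True)
print(f"uncorrected: {pearson:.3f}, Yates-corrected: {corrected:.3f}")
```

For these made-up counts the corrected statistic is noticeably smaller than the uncorrected one and falls on the other side of the 3.841 critical value for df = 1, which illustrates why the correction matters most in small tables.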
Measures of association
– Relative Risk
– Odds Ratio
– Absolute Risk
Relative Risk
• Ratio of the probability that the outcome
characteristic is present for one group,
relative to the other
• Sample proportions with characteristic from
groups 1 and 2:
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
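Relative Risk
• Estimated Relative Risk:
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$$
95% Confidence Interval for Population Relative Risk:
$$\left( RR \cdot e^{-1.96\sqrt{v}} \;,\; RR \cdot e^{1.96\sqrt{v}} \right)$$
$$e = 2.71828 \qquad v = \frac{1 - \hat{\pi}_1}{n_{11}} + \frac{1 - \hat{\pi}_2}{n_{21}}$$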
Relative Risk
• Interpretation
– Conclude that the probability that the
outcome is present is higher (in the
population) for group 1 if the entire interval is
above 1
– Conclude that the probability that the
outcome is present is lower (in the
population) for group 1 if the entire interval is
below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 1
Example - Coccidioidomycosis and
TNF-antagonists
• Research Question: Risk of developing Coccidioidomycosis
associated with arthritis therapy?
• Groups: Patients receiving tumor necrosis factor α (TNF)
versus Patients not receiving TNF (all patients arthritic)
           COC    No COC    Total
TNF          7       240      247
Other        4       734      738
Total       11       974      985
Source: Bergstrom, et al (2004)
Example - Coccidioidomycosis and
TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24 \qquad v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874$$
$$95\%\ CI: \left( 5.24\, e^{-1.96\sqrt{.3874}} \;,\; 5.24\, e^{1.96\sqrt{.3874}} \right) = (1.55 \;,\; 17.76)$$
Entire CI above 1 ⇒ Conclude higher risk if on TNF
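The relative risk interval can be reproduced with a short Python sketch. The function name `relative_risk_ci` and its argument names are assumptions made for this illustration. Because the code keeps full precision instead of rounding the proportions to four decimals first, its results differ slightly from the hand calculation above.

```python
import math


def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Estimated relative risk and its approximate 95% confidence interval.

    n11, n21: counts with the outcome in groups 1 and 2.
    n1_total, n2_total: group sizes.
    """
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21   # variance of log(RR)
    margin = z * math.sqrt(v)
    return rr, rr * math.exp(-margin), rr * math.exp(margin)


# Group 1 = patients on TNF (7 of 247), group 2 = other patients (4 of 738).
rr, lower, upper = relative_risk_ci(7, 247, 4, 738)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("entire interval above 1:", lower > 1)
```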
Odds Ratio
• Odds of an event is the probability it occurs
divided by the probability it does not occur
• Odds ratio is the odds of the event for group 1
divided by the odds of the event for group 2
• Sample odds of the outcome for each group:
$$odds_1 = \frac{n_{11}/n_{1.}}{n_{12}/n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}}$$
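Odds Ratio
• Estimated Odds Ratio:
$$OR = \frac{odds_1}{odds_2} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11} n_{22}}{n_{12} n_{21}}$$
95% Confidence Interval for Population Odds Ratio
$$\left( OR \cdot e^{-1.96\sqrt{v}} \;,\; OR \cdot e^{1.96\sqrt{v}} \right)$$
$$e = 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$$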
Odds Ratio
• Interpretation
– Conclude that the probability that the
outcome is present is higher (in the
population) for group 1 if the entire interval is
above 1
– Conclude that the probability that the
outcome is present is lower (in the
population) for group 1 if the entire interval is
below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 1
Example - NSAIDs and GBM
• Case-Control Study (Retrospective)
– Cases: 137 Self-Reporting Patients with Glioblastoma
Multiforme (GBM)
– Controls: 401 Population-Based Individuals matched to
cases with respect to demographic factors
                  GBM Present   GBM Absent   Total
NSAID User                 32          138     170
NSAID Non-User            105          263     368
Total                     137          401     538
Source: Sivak-Sears, et al (2004)
Example - NSAIDs and GBM
$$OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58 \qquad v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$$
$$95\%\ CI: \left( 0.58\, e^{-1.96\sqrt{0.0518}} \;,\; 0.58\, e^{1.96\sqrt{0.0518}} \right) = (0.37 \;,\; 0.91)$$
Interval is entirely below 1, NSAID use appears
to be lower among cases than controls
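A minimal Python sketch of the odds ratio interval, applied to the NSAID counts, is given below. The function name `odds_ratio_ci` is an assumption for illustration; the formula is the one stated above, with v the sum of the reciprocals of the four cell counts.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Estimated odds ratio and its approximate 95% confidence interval."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22   # variance of log(OR)
    margin = z * math.sqrt(v)
    return odds_ratio, odds_ratio * math.exp(-margin), odds_ratio * math.exp(margin)


# Rows: NSAID user, NSAID non-user; columns: GBM present, GBM absent.
odds_ratio, lower, upper = odds_ratio_ci(32, 138, 105, 263)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("entire interval below 1:", upper < 1)
```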
Absolute Risk
• Difference between the proportions of cases
with the outcome characteristic for the 2 groups
• Sample proportions with characteristic
from groups 1 and 2:
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
Absolute Risk
Estimated Absolute Risk:
$$AR = \hat{\pi}_1 - \hat{\pi}_2$$
95% Confidence Interval for Population Absolute Risk
$$AR \pm 1.96 \sqrt{ \frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{n_{2.}} }$$
Absolute Risk
• Interpretation
– Conclude that the probability that the
outcome is present is higher (in the
population) for group 1 if the entire interval is
positive
– Conclude that the probability that the
outcome is present is lower (in the
population) for group 1 if the entire interval is
negative
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 0
Example - Coccidioidomycosis and
TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$AR = \hat{\pi}_1 - \hat{\pi}_2 = .0283 - .0054 = .0229$$
$$95\%\ CI: .0229 \pm 1.96 \sqrt{ \frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738} } = .0229 \pm .0213 = (0.0016 \;,\; 0.0442)$$
Interval is entirely positive, TNF is
associated with higher risk
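The absolute risk interval can be checked the same way. In this Python sketch the function name `absolute_risk_ci` is an assumption for illustration. With a margin of about .0213 around .0229, the interval runs from about 0.0016 to an upper limit near 0.044.

```python
import math


def absolute_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Difference in sample proportions and its approximate 95% CI."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    return ar, ar - z * se, ar + z * se


# Group 1 = patients on TNF (7 of 247), group 2 = other patients (4 of 738).
ar, lower, upper = absolute_risk_ci(7, 247, 4, 738)
print(f"AR = {ar:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
print("entire interval positive:", lower > 0)
```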
Ordinal Explanatory and Response
Variables
• Pearson’s Chi-square test can be used to test
associations among ordinal variables, but more
powerful methods exist
• When theories exist that the association is
directional (positive or negative), measures exist
to describe and test for these specific
alternatives from independence:
– Gamma
– Kendall’s τb
Concordant and Discordant Pairs
• Concordant Pairs - Pairs of individuals where one
individual scores “higher” on both ordered
variables than the other individual
• Discordant Pairs - Pairs of individuals where one
individual scores “higher” on one ordered
variable and the other individual scores “lower”
on the other
• C = # Concordant Pairs, D = # Discordant Pairs
– Under Positive association, expect C > D
– Under Negative association, expect C < D
– Under No association, expect C ≈ D
Example - Alcohol Use and Sick Days
• Alcohol Risk (Without Risk, Hardly any Risk,
Some to Considerable Risk)
• Sick Days (0, 1-6, ≥7)
• Concordant Pairs - Pairs of respondents where
one scores higher on both alcohol risk and sick
days than the other
• Discordant Pairs - Pairs of respondents where
one scores higher on alcohol risk and the other
scores higher on sick days
Source: Hermansson, et al (2003)
Example - Alcohol Use and Sick Days
                                  Sick Days
Alcohol Risk                   0     1-6     ≥7    Total
Without Risk                 347     113    145      605
Hardly any Risk              154      63     56      273
Some to Considerable Risk     52      25     34      111
Total                        553     201    235      989
• Concordant Pairs: Each individual in a given cell is
concordant with each individual in cells “Southeast”
of theirs
•Discordant Pairs: Each individual in a given cell is
discordant with each individual in cells “Southwest”
of theirs
Example - Alcohol Use and Sick Days
C = 347(63 + 56 + 25 + 34) + 113(56 + 34) + 154(25 + 34) + 63(34) = 83164
D = 145(154 + 63 + 52 + 25) + 113(154 + 52) + 56(52 + 25) + 63(52) = 73496
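The "Southeast" and "Southwest" counting rule can be written as a short loop. In the Python sketch below, the nine cell counts are read off the terms of the C and D sums above (rows are the alcohol risk levels in increasing order, columns are the sick day categories in increasing order); the function name `concordant_discordant` is an assumption for illustration.

```python
# Cell counts taken from the terms of the C and D sums:
# rows = alcohol risk (lowest to highest), columns = sick days (0, 1-6, 7 or more).
table = [[347, 113, 145],
         [154, 63, 56],
         [52, 25, 34]]


def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered two-way table."""
    n_rows, n_cols = len(table), len(table[0])
    c = d = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a lower row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:        # "Southeast": higher on both variables
                        c += table[i][j] * table[k][m]
                    elif m < j:      # "Southwest": higher on one, lower on the other
                        d += table[i][j] * table[k][m]
    return c, d


C, D = concordant_discordant(table)
print("C =", C, "D =", D)
```

Pairs in the same row or the same column are tied on one variable and count toward neither C nor D.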
Measures of Association
• Goodman and Kruskal’s Gamma:
$$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$$
• Kendall’s τb:
$$\hat{\tau}_b = \frac{C - D}{0.5 \sqrt{ \left( n^2 - \sum n_{i.}^2 \right) \left( n^2 - \sum n_{.j}^2 \right) }}$$
When there’s no association between the ordinal variables,
the population based values of these measures are 0.
Statistical software packages provide these tests.
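Both measures are simple to compute once C and D are known. The Python sketch below uses the alcohol use and sick days example: C and D are the values from the slides, and the cell counts (needed for the row and column totals in Kendall's τb) are read off the terms of the C and D sums. The variable names are assumptions for illustration, and the τb denominator is the standard form 0.5·sqrt((n² − Σ n_i.²)(n² − Σ n_.j²)).

```python
import math

# Cell counts: rows = alcohol risk, columns = sick days (both in increasing order).
table = [[347, 113, 145],
         [154, 63, 56],
         [52, 25, 34]]
C, D = 83164, 73496   # concordant and discordant pair counts from the slides

n = sum(sum(row) for row in table)
row_sq = sum(sum(row) ** 2 for row in table)        # sum of squared row totals
col_sq = sum(sum(col) ** 2 for col in zip(*table))  # sum of squared column totals

gamma = (C - D) / (C + D)
tau_b = (C - D) / (0.5 * math.sqrt((n ** 2 - row_sq) * (n ** 2 - col_sq)))

print(f"n = {n}, gamma = {gamma:.4f}, tau_b = {tau_b:.4f}")
```

Kendall's τb is smaller in magnitude than gamma here because its denominator also accounts for pairs tied on one of the variables, which gamma ignores.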
Example - Alcohol Use and Sick Days
$$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$$