Inference for two-way tables
General R x C tables
• Tests of homogeneity of a factor across groups or
independence of two factors rely on Pearson’s X² statistic.
• X² is compared to a χ²((r − 1) × (c − 1)) distribution.
• Expected cell counts should be larger than 5.
2 x 2 tables
• Cohort (prospective) data (H0: relative risk for incidence = 1)
• Case-control (retrospective) data (H0: odds ratio = 1)
• Cross-sectional data (H0: relative risk for prevalence = 1)
• Paired binary data – McNemar’s test (H0: odds ratio = 1)
• For a rare disease, OR ≈ RR
• Fisher’s exact test
Fall 2002
Biostat 511
Categorical Data
Types of Categorical Data
•Nominal
•Ordinal
Often we wish to assess whether two factors are related. To do so we construct an R x C table that cross-classifies the observations according to the two factors. Such a table is called a contingency table. We can test whether the factors are “related” using a χ² test. We will consider the special case of 2 x 2 tables in detail.
Categorical Data
Contingency tables arise from two different, but
related, situations:
1) We sample members of 2 (or more) groups
(e.g. lung cancer vs control) and classify
each member according to some qualitative
characteristic (e.g. cigarette smoking).
                          Number cigarettes/day
             None    <5     5-14    15-24    25-49    50+
Cancer       p11     p12    …
Control      p21     p22    …

where pij is the probability that a member of group i falls in smoking category j.
The hypothesis is
H0: groups are homogeneous (p1j = p2j for all j)
HA: groups are not homogeneous
Categorical Data
Contingency tables arise from two different, but
related, situations:
2) We sample members of a population and
cross-classify each member according to two
qualitative characteristics (e.g. willingness to
participate in vaccine study vs education
level).
               definitely   probably   probably   definitely
               not          not                               Total
< HS           p11          p12        p13        p14         p1.
high school    p21          …
> HS           ⋮
Total          p.1

The hypothesis is
H0: factors are independent (pij = pi. × p.j for all i, j, where pi. and p.j are the row and column marginal probabilities)
HA: factors are not independent
Categorical Data
Example 1. Education versus willingness to
participate in a study of a vaccine to prevent HIV
infection if the study was to start tomorrow. Counts,
row percents and row totals are given.
                    definitely   probably   probably   definitely   Total
                    not          not
< high school       52           79         342        226          699
                    7.4%         11.3%      48.9%      32.3%
high school         62           153        417        262          894
                    6.9%         17.1%      46.6%      29.3%
some college        53           213        629        375          1270
                    4.2%         16.8%      49.5%      29.5%
college             54           231        571        244          1100
                    4.9%         21.0%      51.9%      22.2%
some post college   18           46         139        74           277
                    6.5%         16.6%      50.2%      26.7%
graduate/prof       25           139        330        116          610
                    4.1%         22.8%      54.1%      19.0%
Total               264          861        2428       1297         4850
                    5.4%         17.8%      50.1%      26.7%
Categorical Data
Example 2. From the 1984 General Social Survey

                              Job Satisfaction
Income          Very           Somewhat       Moderately   Very
                dissatisfied   dissatisfied   satisfied    satisfied
< 6000          20             24             80           82
6000-15000      22             38             104          125
15000-25000     13             28             81           113
>25000          7              18             54           92
Categorical Data
Example 3: From Doll and Hill (1952) retrospective assessment of smoking frequency.
The table displays the daily average number of
cigarettes for lung cancer patients and control
patients.
                         Daily # cigarettes
           None    <5      5-14    15-24   25-49   50+     Total
Cancer     7       55      489     475     293     38      1357
           0.5%    4.1%    36.0%   35.0%   21.6%   2.8%
Control    61      129     570     431     154     12      1357
           4.5%    9.5%    42.0%   31.8%   11.3%   0.9%
Total      68      184     1059    906     447     50      2714
Test of Homogeneity
In Example 3 we want to test whether the distribution of smoking frequency is the same in each of the populations sampled (lung cancer patients and controls), that is, whether the groups are homogeneous with respect to a characteristic. The concept is similar to a two-sample t-test, but the response is categorical.
H0: smoking frequency same in both groups
HA: smoking frequency not the same
Q: What does H0 predict we would observe if
all we knew were the marginal totals?
                         Daily # cigarettes
           None    <5      5-14    15-24   25-49   50+     Total
Cancer                                                     1357
Control                                                    1357
Total      68      184     1059    906     447     50      2714
Test of Homogeneity
A: H0 predicts the following expectations:
                         Daily # cigarettes
           None    <5      5-14    15-24   25-49   50+     Total
Cancer     34      92      529.5   453     223.5   25      1357
Control    34      92      529.5   453     223.5   25      1357
Total      68      184     1059    906     447     50      2714
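Under H0, each group has the same proportion in each column as the overall marginal proportion, so the expected count in row i, column j is (row i total) × (column j total) / N; for example, 1357 × 68 / 2714 = 34. The expected numbers are equal for the two groups only because the two groups have equal sample sizes (what would change if there were half as many cases as controls?)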
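A minimal Python sketch of this calculation, applied to the Doll and Hill counts of Example 3. The function name `expected_counts` is our own choice for illustration, not part of any package or of the course software.

```python
def expected_counts(table):
    """Expected counts under H0: E_ij = (row i total) * (column j total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Example 3 (Doll and Hill): rows are cancer and control patients;
# columns are None, <5, 5-14, 15-24, 25-49, 50+ cigarettes per day.
doll_hill = [
    [7, 55, 489, 475, 293, 38],
    [61, 129, 570, 431, 154, 12],
]

for label, row in zip(["Cancer", "Control"], expected_counts(doll_hill)):
    print(label, [round(e, 1) for e in row])
```

Because both groups contain 1357 subjects, the two rows of expected counts come out identical.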
Test of Homogeneity
Recall, we often use the Poisson distribution to model counts. Suppose the observed count in each cell, Oij, is a Poisson random variable with mean mij. Then

$$Z = \frac{O_{ij} - m_{ij}}{\sqrt{m_{ij}}}$$

would be approximately standard normal when mij is large.
It turns out that Z² then has a known (approximate) distribution: it follows a “chi-squared (χ²) distribution with 1 degree of freedom” (MM table F).
Further, the sum of n squared independent standard normal random variables follows a chi-square distribution with n degrees of freedom. Let Z1, …, Zn be independent standard normals, N(0,1), and let

$$X = Z_1^2 + Z_2^2 + \cdots + Z_n^2 = \sum_{i=1}^{n} Z_i^2 .$$

Then X has a χ²(n) distribution.
Test of Homogeneity
Therefore, approximately,

$$Z^2 = \frac{(O_{ij} - m_{ij})^2}{m_{ij}} \sim \chi^2(1).$$
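We don’t know the mij, but under H0 we can estimate them from the margins. We call these estimates the expected counts, Eij.
Summing the squared standardized differences between the observed and expected counts provides an overall assessment of H0:

$$X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2\big((r - 1) \times (c - 1)\big) \quad \text{(approximately, under } H_0\text{)}$$

The degrees of freedom are (r − 1) × (c − 1) rather than r × c because the expected counts are estimated from the margins, which forces them to reproduce the observed row and column totals.
X² is known as Pearson’s chi-square statistic.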
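As a sketch of how the statistic and its degrees of freedom could be computed directly from an R x C table of counts (plain Python; `pearson_chi_square` is a name we chose for illustration), the function below follows the formula term by term. It is applied here to Example 3 and to the education table of Example 1.

```python
def pearson_chi_square(table):
    """Return (X^2, degrees of freedom) for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij from the margins
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


# Example 3 (Doll and Hill): smoking frequency by group
doll_hill = [
    [7, 55, 489, 475, 293, 38],
    [61, 129, 570, 431, 154, 12],
]
# Example 1: education (rows) by willingness to participate (columns)
education = [
    [52, 79, 342, 226],
    [62, 153, 417, 262],
    [53, 213, 629, 375],
    [54, 231, 571, 244],
    [18, 46, 139, 74],
    [25, 139, 330, 116],
]

for name, tab in [("Example 3", doll_hill), ("Example 1", education)]:
    x2, df = pearson_chi_square(tab)
    print(f"{name}: X^2 = {x2:.1f} on {df} df")
```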
Test of Homogeneity
In Example 3 the contributions to the X² statistic are:

                              Daily # cigarettes
           None             <5               5-14   15-24   25-49   50+
Cancer     (7 − 34)²/34     (55 − 92)²/92    etc.
Control    (61 − 34)²/34    etc.

which evaluate to:

           None    <5      5-14   15-24   25-49   50+
Cancer     21.44   14.88   3.10   1.07    21.61   6.76
Control    21.44   14.88   3.10   1.07    21.61   6.76

$$X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 137.7$$

The table has (2 − 1) × (6 − 1) = 5 degrees of freedom. Looking in MM table F, we find that the 0.95 quantile of χ²(5) is 11.07.
Conclusion?
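MM table F lists only selected quantiles. As a sketch of how an upper-tail probability could be computed without a statistics package, the function below (our own, not from MM or any library) uses the closed-form series for the χ² survival function that holds when the degrees of freedom are a whole number. The last call uses the X² = 89.7 on 15 df reported for Example 1.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for a positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1, where P(Z**2 > x) = erfc(sqrt(x / 2)), then add
    # exp(-y) * y**(i - 1/2) / Gamma(i + 1/2) for i = 1, ..., (df - 1) / 2.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total


print(f"P(chi2(5) > 11.07) = {chi2_upper_tail(11.07, 5):.3f}")
print(f"P(chi2(5) > 137.7) = {chi2_upper_tail(137.7, 5):.1e}")
print(f"P(chi2(15) > 89.7) = {chi2_upper_tail(89.7, 15):.1e}")
```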
Test of Independence
The Chi-squared Test of Independence is
mechanically the same as the test for
homogeneity. The only difference is that the R x
C table is formed based on the levels of 2 factors
that are cross-classified. Therefore, the null and
alternative hypotheses are different:
H0: The two factors are independent
HA: The two factors are not independent
Independence implies that each row has the same relative frequencies (equivalently, each column has the same relative frequencies).
Example 1 is a situation where individuals are
classified according to two factors. In this
example, the assumption of independence implies
that willingness to participate doesn’t depend on
the level of education.
Q: Based on the observed row proportions,
how does the independence hypothesis look?
Q: How would the expected cell frequencies be
calculated?
Q: How many degrees of freedom would the
chi-square have?
The expected counts under independence are:

                    definitely   probably   probably   definitely   Total
                    not          not
< high school       38.1         124.1      349.9      186.9        699
high school         48.7         158.7      447.6      239.1        894
some college        69.1         225.5      635.8      339.6        1270
college             59.9         195.3      550.7      294.2        1100
some post college   15.1         49.2       138.7      74.1         277
graduate/prof       33.2         108.3      305.4      163.1        610
Total               264          861        2428       1297         4850
                    5.4%         17.8%      50.1%      26.7%

X² = 89.7, 15 df, p < .0001
Summary
χ² Tests for R x C Tables
1. Tests of homogeneity of a factor across groups or independence of two factors rely on Pearson’s X² statistic.
2. X² is compared to a χ²((r − 1) × (c − 1)) distribution (MM table F, or display chiprob(df,X2)).
3. Expected cell counts should be larger than 5.
4. We have considered a global test without
using possible factor ordering. Ordered
factors permit a test for trend (see Agresti,
1990).
2 x 2 Tables
Example 1: Pauling (1971)
Patients are randomized to receive either Vitamin C or placebo. Patients are followed up to ascertain the development of a cold.

             Cold - Y   Cold - N   Total
Vitamin C    17         122        139
Placebo      31         109        140
Total        48         231        279
Q: Is treatment with Vitamin C associated
with a reduced probability of getting a cold?
Q: If Vitamin C is associated with reducing
colds, then what is the magnitude of the
effect?
2 x 2 Tables
Example 2: Keller (AJPH, 1965)
Patients with (cases) and without (controls)
oral cancer were surveyed regarding their
smoking frequency (this table collapses over
the smoking frequency categories).
              Case    Control   Total
Smoker        484     385       869
NonSmoker     27      90        117
Total         511     475       986
Q: Is oral cancer associated with smoking?
Q: If smoking is associated with oral cancer,
then what is the magnitude of the risk?
2 x 2 Tables
Example 3: Norusis (1988)
In 1984, a random sample of US adults was cross-classified based on their income and reported job satisfaction:

              Dissatisfied   Satisfied   Total
< $15,000     104            391         495
≥ $15,000     66             340         406
Total         170            731         901
Q: Is salary associated with job satisfaction?
Q: If salary is associated with satisfaction,
then what is the magnitude of the effect?
2 x 2 Tables
Example 4: HIVNET (1995)
Subjects were surveyed regarding their
knowledge of vaccine trial concepts both at
baseline and at month 3 after an informed
consent process. The following table shows
the subjects cross-classified according to the
two responses.
                                Month 3
                      Incorrect   Correct   Total
Baseline  Incorrect   251         178       429
          Correct     68          98        166
Total                 319         276       595
Q: Did the informed consent process improve
knowledge?
Q: If informed consent improved knowledge
then what is the magnitude of the effect?
2 x 2 Tables
Each of these tables can be represented as follows:

          D              not D          Total
E         a              b              (a + b) = n1
not E     c              d              (c + d) = n2
Total     (a + c) = m1   (b + d) = m2   N

The question of association can be addressed with Pearson’s X² (except for Example 4). We compute the expected cell counts as follows:

Expected:
          D              not D          Total
E         n1m1/N         n1m2/N         (a + b) = n1
not E     n2m1/N         n2m2/N         (c + d) = n2
Total     (a + c) = m1   (b + d) = m2   N
2 x 2 Tables
Pearson’s chi-square is given by:

$$
\begin{aligned}
X^2 &= \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} \\
&= \frac{\left(a - \frac{n_1 m_1}{N}\right)^2}{n_1 m_1 / N}
 + \frac{\left(b - \frac{n_1 m_2}{N}\right)^2}{n_1 m_2 / N}
 + \frac{\left(c - \frac{n_2 m_1}{N}\right)^2}{n_2 m_1 / N}
 + \frac{\left(d - \frac{n_2 m_2}{N}\right)^2}{n_2 m_2 / N} \\
&= \frac{N (ad - bc)^2}{n_1 n_2 m_1 m_2}
\end{aligned}
$$

Q: How does this X² test compare in Example 1 to simply using the 2 sample binomial test of H0: P(D | E) = P(D | not E)?
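As a quick numerical check of the algebra, here is a minimal Python sketch (both function names are our own) that computes X² once as the four-cell sum and once by the closed form N(ad − bc)²/(n1 n2 m1 m2), using the Vitamin C counts of Example 1 (a = 17, b = 122, c = 31, d = 109).

```python
def pearson_2x2_full(a, b, c, d):
    """Pearson X^2 summed over the four cells, with E = (row total)(column total)/N."""
    n1, n2 = a + b, c + d  # row totals (E, not E)
    m1, m2 = a + c, b + d  # column totals (D, not D)
    n = n1 + n2
    observed_expected = [
        (a, n1 * m1 / n), (b, n1 * m2 / n),
        (c, n2 * m1 / n), (d, n2 * m2 / n),
    ]
    return sum((o - e) ** 2 / e for o, e in observed_expected)


def pearson_2x2_shortcut(a, b, c, d):
    """Closed form N(ad - bc)^2 / (n1 n2 m1 m2)."""
    n1, n2, m1, m2 = a + b, c + d, a + c, b + d
    n = n1 + n2
    return n * (a * d - b * c) ** 2 / (n1 * n2 * m1 * m2)


# Example 1 (Pauling): a = Vitamin C & cold, b = Vitamin C & no cold,
# c = placebo & cold, d = placebo & no cold
full = pearson_2x2_full(17, 122, 31, 109)
short = pearson_2x2_shortcut(17, 122, 31, 109)
print(round(full, 2), round(short, 2))
```

The two routes agree up to floating-point rounding, since every observed-minus-expected deviation in a 2 x 2 table equals ±(ad − bc)/N.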
2 x 2 Tables
Example 1: Pauling (1971)
H0: probability of disease does not depend on treatment
HA: probability of disease does depend on treatment

With a = 17, b = 122, c = 31, d = 109 from the Vitamin C table,

$$X^2 = \frac{N (ad - bc)^2}{n_1 n_2 m_1 m_2} = \frac{279\,(17 \times 109 - 31 \times 122)^2}{139 \times 140 \times 48 \times 231} = 4.81$$

For the p-value we compute P(χ²(1) > 4.81) = 0.028. Therefore, at the 0.05 level, we reject the independence of treatment and disease.
Two sample test of binomial proportions:
p1 = P(cold | Vitamin C)
p2 = P(cold | placebo)
H0: p1 = p2
HA: p1 ≠ p2

$$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p_0 (1 - \hat p_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
= \frac{17/139 - 31/140}{\sqrt{\frac{48}{279} \cdot \frac{231}{279}\left(\frac{1}{139} + \frac{1}{140}\right)}} = -2.193$$

where p̂0 = 48/279 is the pooled proportion of colds. For the 2-sided p-value we compute P(|Z| > 2.193) = 2 × P(Z > 2.193) = 0.028. Therefore, we reject H0 with exactly the same result as the χ² test (Z² = X²).
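A minimal Python sketch of the pooled two-sample Z test (function names are ours), which also confirms numerically that Z² reproduces the 2 x 2 Pearson X² for the Vitamin C data. The two-sided normal p-value uses the identity P(|Z| > z) = erfc(z/√2).

```python
import math


def two_sample_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p0_hat = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(p0_hat * (1 - p0_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def two_sided_normal_p(z):
    """P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


z = two_sample_z(17, 139, 31, 140)  # colds: Vitamin C vs placebo
print(f"Z = {z:.3f}, Z^2 = {z * z:.2f}, two-sided p = {two_sided_normal_p(z):.3f}")
```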
2 x 2 Tables
Applications In Epidemiology
Example 1 fixed the number of E and not E, then
evaluated the disease status after a fixed period of
time. This is a prospective study. Given this design
we can estimate the relative risk:
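$$RR = \frac{P(D \mid E)}{P(D \mid \bar E)}$$

The range of RR is [0, ∞). By taking the logarithm, we have (−∞, +∞) as the range for ln(RR), and a better approximation to normality for the estimated log relative risk:

$$\ln(\widehat{RR}) = \ln\left(\frac{\hat P(D \mid E)}{\hat P(D \mid \bar E)}\right) = \ln\left(\frac{a/n_1}{c/n_2}\right)$$

$$\ln(\widehat{RR}) \sim N\left(\ln(p_1/p_2),\; \frac{1 - p_1}{p_1 n_1} + \frac{1 - p_2}{p_2 n_2}\right) \quad \text{(approximately)}$$

where p1 = P(D | E) and p2 = P(D | Ē).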
The estimated relative risk is:

$$\widehat{RR} = \frac{\hat P(D \mid E)}{\hat P(D \mid \bar E)} = \frac{17/139}{31/140} = 0.55$$

We can obtain a confidence interval for the relative risk by first obtaining a confidence interval for ln(RR):

$$\ln(\widehat{RR}) \pm Q_Z(1 - \alpha/2) \sqrt{\frac{1 - \hat p_1}{\hat p_1 n_1} + \frac{1 - \hat p_2}{\hat p_2 n_2}}$$

For Example 1, a 95% confidence interval for the log relative risk is given by:

$$\ln(\widehat{RR}) \pm 1.96 \sqrt{\frac{1 - \hat p_1}{\hat p_1 n_1} + \frac{1 - \hat p_2}{\hat p_2 n_2}}
= \ln(0.55) \pm 1.96 \sqrt{\frac{122}{17 \times 139} + \frac{109}{31 \times 140}}$$
−0.593 ± 1.96 × 0.277
−0.593 ± 0.543
(−1.136, −0.050)
To obtain a 95% confidence interval for the relative risk we exponentiate the end-points of the interval for the log relative risk. Therefore,
(exp(−1.136), exp(−0.050)) = (0.32, 0.95)
is a 95% confidence interval for the relative risk.
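A minimal Python sketch of the log-scale interval (the function name is ours), using the Example 1 counts: 17 colds among 139 on Vitamin C and 31 among 140 on placebo. Computed without intermediate rounding it gives RR ≈ 0.55 with a 95% interval of roughly (0.32, 0.95); note that exp(−0.593 − 0.543) = exp(−1.136) ≈ 0.32.

```python
import math


def relative_risk_ci(a, n1, c, n2, z=1.96):
    """RR = (a/n1)/(c/n2), with a CI built on the log scale and exponentiated."""
    p1, p2 = a / n1, c / n2
    log_rr = math.log(p1 / p2)
    se = math.sqrt((1 - p1) / (p1 * n1) + (1 - p2) / (p2 * n2))
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)


rr, lower, upper = relative_risk_ci(17, 139, 31, 140)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```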
2 x 2 Tables
Applications In Epidemiology
In Example 2 we fixed the number of cases and controls, then ascertained exposure status. Such a design is known as a case-control study. Based on this we are able to directly estimate:

P(E | D) and P(E | D̄)

However, we are generally interested in the relative risk, which is not estimable from these data alone: we’ve fixed the numbers of diseased and disease-free subjects. Instead of the relative risk we can estimate the exposure odds ratio, which Cornfield (1951) showed is equivalent to the disease odds ratio:

$$\frac{P(E \mid D)/[1 - P(E \mid D)]}{P(E \mid \bar D)/[1 - P(E \mid \bar D)]} = \frac{P(D \mid E)/[1 - P(D \mid E)]}{P(D \mid \bar E)/[1 - P(D \mid \bar E)]}$$
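To see why the two odds ratios agree, it may help to write their sample versions in terms of the cells of the 2 x 2 table (a, b, c, d, with row totals n1, n2 for E and not E, and column totals m1, m2 for D and not D). Under case-control sampling the row proportions such as a/n1 are not estimates of population risks, but the cross-product ratio they produce is still ad/bc:

```latex
\begin{aligned}
\widehat{OR}_{\text{exposure}}
  &= \frac{\hat P(E \mid D) / \hat P(\bar E \mid D)}{\hat P(E \mid \bar D) / \hat P(\bar E \mid \bar D)}
   = \frac{(a/m_1)/(c/m_1)}{(b/m_2)/(d/m_2)}
   = \frac{a/c}{b/d} = \frac{ad}{bc} \\[4pt]
\widehat{OR}_{\text{disease}}
  &= \frac{\hat P(D \mid E) / \hat P(\bar D \mid E)}{\hat P(D \mid \bar E) / \hat P(\bar D \mid \bar E)}
   = \frac{(a/n_1)/(b/n_1)}{(c/n_2)/(d/n_2)}
   = \frac{a/b}{c/d} = \frac{ad}{bc}
\end{aligned}
```

This is only the sample-level identity; Cornfield’s result concerns the corresponding population quantities.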
Odds Ratio
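Furthermore, for rare diseases, P(D | E) ≈ 0 and P(D | Ē) ≈ 0, so both terms 1 − P(D | ·) are close to 1 and the disease odds ratio approximates the relative risk:

$$\frac{P(D \mid E)/[1 - P(D \mid E)]}{P(D \mid \bar E)/[1 - P(D \mid \bar E)]} \approx \frac{P(D \mid E)}{P(D \mid \bar E)}$$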
Since with case-control data we are able to effectively
estimate the exposure odds ratio we are then able to
equivalently estimate the disease odds ratio which for
rare diseases approximates the relative risk.
2 x 2 Tables
Applications in Epidemiology
Like the relative risk, the odds ratio has [0, ∞) as its range. The log odds ratio has (−∞, +∞) as its range, and the normal approximation works better for the estimated log odds ratio than for the estimated odds ratio itself. With p1 = P(D | E), p2 = P(D | Ē) and qi = 1 − pi,

$$OR = \frac{p_1/q_1}{p_2/q_2}, \qquad \widehat{OR} = \frac{\hat p_1/\hat q_1}{\hat p_2/\hat q_2} = \frac{ad}{bc}$$

Confidence intervals are based upon:

$$\ln(\widehat{OR}) \sim N\left(\ln(OR),\; \frac{1}{n_1 p_1} + \frac{1}{n_1 q_1} + \frac{1}{n_2 p_2} + \frac{1}{n_2 q_2}\right) \quad \text{(approximately)}$$

Therefore, a (1 − α) confidence interval for the log odds ratio is given by:

$$\ln\left(\frac{ad}{bc}\right) \pm Q_Z(1 - \alpha/2) \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
Example 2:
The estimated odds ratio (odds of cancer for smokers relative to the odds of cancer for nonsmokers) is given by:

$$\widehat{OR} = \frac{484 \times 90}{27 \times 385} = 4.19$$

A 95% confidence interval for the log odds ratio is given by:

$$\ln(4.19) \pm 1.96 \sqrt{\frac{1}{484} + \frac{1}{385} + \frac{1}{27} + \frac{1}{90}}$$

1.433 ± 1.96 × 0.230
1.433 ± 0.450
(0.983, 1.883)
To obtain a 95% confidence interval for the odds ratio we simply exponentiate the end-points of the interval for the log odds ratio. Therefore,
(exp(0.983), exp(1.883)) = (2.672, 6.573)
is a 95% confidence interval for the odds ratio.
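A minimal Python sketch of the same computation (the function name is ours), with the Example 2 cells taken as a = smoker cases (484), b = smoker controls (385), c = nonsmoker cases (27) and d = nonsmoker controls (90).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR = ad/bc with a CI from ln(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d)."""
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


or_hat, lower, upper = odds_ratio_ci(484, 385, 27, 90)
print(f"OR = {or_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Small differences in the last digit of the end-points come only from whether ln(OR) and its standard error are rounded before exponentiating.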
2 x 2 Tables
Applications in Epidemiology
Example 3 is an example of a cross-sectional study, since only the total for the table is fixed in advance; neither the row totals nor the column totals are fixed in advance.
In epidemiological studies, the relative risk or odds ratio may be used to summarize the association when using a cross-sectional design. The major distinction from a prospective study is that a cross-sectional study reveals the number of cases currently in the sample. These are known as prevalent cases. In a prospective study we count the number of new cases, or incident cases.

Study             Probability   Description
Cohort            incidence     probability of obtaining the disease
Cross-sectional   prevalence    probability of having the disease
Paired Binary Data
Example 4 measured a binary response pre and
post treatment. This is an example of paired
binary data. One way to display these data is the
following:
              Correct   Incorrect   Total
Baseline      166       429         595
Month 3       276       319         595
Total         442       748         1190

Q: Can’t we simply use the X² test of homogeneity to assess whether there is evidence for an increase in knowledge?
A: NO!!! The X² tests assume that the rows are independent samples. In this design the same 595 people are measured at Baseline and at 3 months.
Paired Binary Data
For paired binary data we display the results as follows:

                 Time 2
                 0       1
Time 1    0      n00     n01
          1      n10     n11

This analysis explicitly recognizes the heterogeneity of subjects. Thus, those that score (0,0) and (1,1) provide no information about the effectiveness of the treatment since they may be “weak” or “strong” individuals. These are known as the concordant pairs. The information regarding treatment is in the discordant pairs, (0,1) and (1,0).
p1 = success probability at Time 1
p2 = success probability at Time 2
H0: p1 = p2
HA: p1 ≠ p2
Paired Binary Data
McNemar’s Test
Under the null hypothesis, H0: p1 = p2, we expect equal numbers to change from 0 to 1 and from 1 to 0 (E[n01] = E[n10]). Specifically, under the null:

$$M = n_{01} + n_{10}, \qquad n_{10} \mid M \sim \text{Bin}\left(M, \tfrac{1}{2}\right)$$

$$Z = \frac{n_{10} - M \cdot \frac{1}{2}}{\sqrt{M \cdot \frac{1}{2}\left(1 - \frac{1}{2}\right)}}$$

Under H0, Z² ~ χ²(1) (approximately), and forms the basis for McNemar’s Test for Paired Binary Responses.
The odds ratio comparing the odds of success at Time 2 to Time 1 is estimated by:

$$\widehat{OR} = \frac{n_{01}}{n_{10}}$$
Confidence intervals can be obtained as described in
Breslow and Day (1981), section 5.2, or in Armitage
and Berry (1987), chapter 16.
Paired Binary Data
A common epidemiological design is to
match cases and controls regarding certain
factors (e.g. age, gender…) then ascertain the
exposure history (e.g. smoking) for each
member of the pair. The results for all pairs
can be summarized by:
                  Control
                  E−      E+
Case     E−       n00     n10
         E+       n01     n11

Given this design we can use McNemar’s Test to test the hypotheses
H0: P(D | E) = P(D | Ē)   (OR = 1)
HA: P(D | E) ≠ P(D | Ē)   (OR ≠ 1)
Example 4:
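We can test H0: p1 = p2 using McNemar’s Test. Here n01 = 178 subjects moved from incorrect at baseline to correct at month 3, and n10 = 68 moved the other way:

$$Z = \frac{n_{01} - M \cdot \frac{1}{2}}{\sqrt{M \cdot \frac{1}{2} \cdot \frac{1}{2}}} = \frac{178 - (178 + 68)/2}{\sqrt{(178 + 68)/4}} = 7.01$$

Comparing 7.01² to a χ²(1) we find that p < 0.001. Therefore we reject the null hypothesis of equal success probabilities for Time 1 and Time 2.
We estimate the odds ratio as ÔR = 178/68 = 2.62.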
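A minimal Python sketch of McNemar’s test as written on these slides (our own function; it uses only the discordant counts and the normal approximation without a continuity correction, so other software that applies a correction may report a slightly different statistic).

```python
import math


def mcnemar(n01, n10):
    """McNemar Z from the discordant pairs, two-sided p-value, and OR = n01 / n10."""
    m = n01 + n10                                # number of discordant pairs
    z = (n01 - m / 2) / math.sqrt(m / 4)         # n01 | M ~ Bin(M, 1/2) under H0
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals P(chi2(1) > z**2)
    return z, p_value, n01 / n10


# Example 4 (HIVNET): 178 moved incorrect -> correct, 68 moved correct -> incorrect
z, p, or_hat = mcnemar(178, 68)
print(f"Z = {z:.2f}, Z^2 = {z * z:.1f}, p = {p:.1e}, OR = {or_hat:.2f}")
```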
Summary for 2 x 2 Tables
•Cohort Analysis (Prospective)
1. H0: P(D | E) = P(D | Ē)
2. RR for incident disease
3. χ² test
•Case Control Analysis (Retrospective)
1. H0: P(E | D) = P(E | D̄)
2. OR (≈ RR for rare disease)
3. χ² test
•Cross-sectional Analysis
1. H0: P(D | E) = P(D | Ē)
2. RR for prevalent disease
3. χ² test
•Paired Binary Data
1. H0: P(D | E) = P(D | Ē)
2. OR
3. McNemar’s test
Fisher’s Exact Test
Motivation: When a 2 × 2 table contains cells that have fewer than 5 expected observations, the normal approximation to the distribution of the log odds ratio (or other summary statistics) is known to be poor. This can lead to incorrect inference, since p-values based on this approximation may be inaccurate.
Solution: Use Fisher’s Exact Test

          D+     D−     Total
E+        a             n1
E−                      n2
Total     m1     m2     N
Fisher’s Exact Test
Example: (Rosner, p. 370) Cardiovascular
disease. A retrospective study is done among
men aged 50-54 who died over a 1-month
period. The investigators tried to include
equal numbers of men who died from CVD
and those that did not. Then the dietary habits were ascertained by asking a close relative.

            High Salt   Low Salt   Total
non-CVD     2           23         25
CVD         5           30         35
Total       7           53         60

A calculation of the odds ratio yields:

$$OR = \frac{2 \times 30}{5 \times 23} = 0.522$$
Interpret.
Fisher’s Exact Test
If we fix all of the margins, then any one cell of the table determines the remaining cells. Note that a must be an integer, at least max(0, n1 + m1 − N) and at most the smaller of n1 and m1. Thus there are relatively few possible table configurations if either n1 or m1 is small (with n1, n2, m1, m2 fixed).
Under the null hypothesis,
H0: OR = 1
we can use the hypergeometric distribution (a probability distribution for discrete rv’s) to compute the probability of any given configuration. Since we have the distribution of a statistic (a) under the null, we can use this to compute p-values.
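A minimal Python sketch of the hypergeometric calculation (the function name is ours; `math.comb` requires Python 3.8 or later). With all margins fixed, P(a = k) = C(n1, k) C(n2, m1 − k) / C(N, m1), evaluated here for the salt/CVD example with row totals 25 and 35 and a first column total of 7.

```python
from math import comb


def hypergeometric_table_probs(n1, n2, m1):
    """P(a = k | margins) under H0 for every feasible value k of the a cell.

    n1, n2 are the row totals and m1 is the first column total; N = n1 + n2.
    """
    n = n1 + n2
    low, high = max(0, m1 - n2), min(n1, m1)  # feasible range of a
    return {k: comb(n1, k) * comb(n2, m1 - k) / comb(n, m1) for k in range(low, high + 1)}


# Salt / CVD example: rows non-CVD (25) and CVD (35); 7 men ate a high-salt diet
probs = hypergeometric_table_probs(25, 35, 7)
for k, p in probs.items():
    print(k, round(p, 3))
```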
Fisher’s Exact Test
Example: (Rosner, p. 370) Cardiovascular disease.
$$E(a \mid H_0) = \frac{n_1 m_1}{N} = \frac{7 \times 25}{60} = 2.92$$

Possible tables: every table with these margins (row totals 25 and 35, column totals 7 and 53, N = 60) is determined by its upper-left cell a, which can take the values 0 through 7. Their probabilities are:

a             0      1      2      3      4      5      6      7
probability   .017   .105   .252   .312   .214   .082   .016   .001
Fisher’s Exact Test
Using the hypergeometric distribution we can
compute the exact probability of each of these
tables (under H0: p1 = p2) (Rosner pg. 370)
To compute a p-value we then use the usual
approach of summing the probability of all events
(tables) as extreme or more extreme than the
observed data.
•For a one-tailed test of p1 < p2 (p1 > p2) we sum the probabilities of all tables with a less than or equal to (greater than or equal to) the observed a.
•For a two-tailed test of p1 = p2 we compute
the two one-tailed p-values and double the
smaller of the two.
You will never do this by hand ….
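For illustration, a minimal Python sketch of the two one-tailed sums and the doubling rule described above, applied to the salt/CVD table (observed a = 2). The function name and the cap of the two-tailed value at 1 are our own additions; other software may use a different two-sided convention (for example, summing all tables no more probable than the observed one), so its two-tailed p-value can differ.

```python
from math import comb


def fisher_exact_p_values(a, n1, n2, m1):
    """One-tailed and doubled two-tailed Fisher exact p-values for the a cell."""
    n = n1 + n2
    low, high = max(0, m1 - n2), min(n1, m1)

    def prob(k):  # hypergeometric probability of the table with a = k
        return comb(n1, k) * comb(n2, m1 - k) / comb(n, m1)

    lower_tail = sum(prob(k) for k in range(low, a + 1))   # tables with a <= observed
    upper_tail = sum(prob(k) for k in range(a, high + 1))  # tables with a >= observed
    two_tailed = min(1.0, 2 * min(lower_tail, upper_tail))
    return lower_tail, upper_tail, two_tailed


# Salt / CVD example: observed a = 2 high-salt men among the 25 non-CVD deaths
lower, upper, two = fisher_exact_p_values(2, 25, 35, 7)
print(f"P(a <= 2) = {lower:.3f}, P(a >= 2) = {upper:.3f}, two-tailed p = {two:.3f}")
```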
Categorical data - summary

2x2?
  Yes → Samples independent?
          Yes → Expected > 5?
                  Yes → 2 sample Z test for proportions or χ² test
                  No  → Fisher’s exact test
          No  → McNemar’s test
  No  → 2xk?
          Yes → Test for trend in proportions?
                  Yes → χ² test for trend
                  No  → χ² test for R x C table
          No  → Expected > 5?
                  Yes → χ² test
                  No  → Exact test