Download Statistics for Clinicians 5: Comparing proportions

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Foundations of statistics wikipedia , lookup

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
J. Paediatr. Child Health (2001) 37, 392–394
Statistics for Clinicians
5: Comparing proportions using the chi-squared test
JB CARLIN1,4 and LW DOYLE2,3,4
1Clinical
Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Parkville, 2Division of Newborn
Services, Royal Women’s Hospital, Melbourne, Departments of 3Obstetrics and Gynaecology, and 4Paediatrics,
the University of Melbourne, Parkville, Victoria, Australia
As described in the first article in this series1 a review of
articles in the Journal of Paediatrics and Child Health found
that 97% of papers reported categorical data, and 45% reported
univariate inferential statistics for dichotomous outcomes. An
example of a dichotomous outcome is shown in Table 1, where,
for our previously described data set2 from a study of very low
birthweight (VLBW) infants born at the Royal Women’s
Hospital in 1992, we compare the survival rates for males and
females.
The most commonly used statistical method for comparing
proportions is the chi-squared (χ2) test (often termed ‘Pearson’s
chi-squared’ after the turn-of-the-century statistical pioneer
Karl Pearson). The test begins with the null hypothesis that
there is no association between exposure and outcome, or
equivalently in our example that the survival rates in males and
females are not different. Like all statistical tests, the chisquared test is based on calculating a value (the ‘test statistic’)
that measures the extent of deviation from the null hypothesis.
To explain this, we display in Table 2 a symbolic version of
Table 1, where a, b, c and d represent the cell counts in Table 1.
If the null hypothesis were true, the ‘best guess’ to estimate the
number that should end up in cell a, given the total numbers
exposed and unexposed, would be obtained by applying the
overall proportion of ‘Yes’ outcomes to the total number that
were exposed, which we can express symbolically as:
or equivalently:
(a + c)
(a + b + c + d)
× (a + b).
that the observed (sometimes written Oi) counts in the individual cells (i) deviate from the expected (Ei) numbers under
the null hypothesis. We take the expected number away from
the observed number, and square it to remove any negative
sign. The noise is related to the variance of the numbers in the
cells. It turns out that the variance for each cell is the expected
number (Ei) for that cell. The contribution of an individual cell
to the overall test statistic (X2) is then:
(Oi – Ei ) 2
Ei
We sum the contribution of all of the individual cells to calculate
the overall X2, and then refer to a table to compare this value with
the χ2 distribution, which is the probability distribution that can
be shown to apply to this statistic if the null hypothesis is true. In
particular, the P value is obtained as the probability that a value as
large or larger than X2 would arise from this distribution. (There is
actually a whole family of χ2 distributions, indexed by a
parameter called the degrees of freedom, but the appropriate one
for the 2 × 2 table has just one degree of freedom.)
Returning to the example of survival rates in VLBW infants,
we proceed as shown in Table 1, and find that the resultant X2
is 9.16, giving a P value of 0.002. We would then conclude that
we have highly statistically significant evidence that the
survival rate of males was different from (and in fact lower
than) that for females.
An equivalent method to calculate X2 for a 2 × 2 table is the
formula:
(a + b) (a + c)
(a + b + c + d)
Notice that this is just the product of the row and column
totals divided by the overall total. A similar calculation can be
applied to obtain ‘expected’ counts for each of the remaining
cells.
As we explained in the previous article3 one way of thinking
about most statistical tests is as a signal-to-noise ratio. For a
comparison of proportions, the signal is related to the amount
X2 =
(ad – bc)2 (a + b + c + d )
(a + b) (a + c) (b + c) (c + d)
which would generally be easier for the reader to compute by
hand. Of course, as with most statistical tests, we usually do
not have to calculate X2 or its corresponding P value by hand,
since statistical packages are far quicker and more accurate.
An advantage of the more detailed method of calculation
displayed in Tables 1 and 2 is that it can be more generally
applied, in particular to tables that have more than two columns
Correspondence: Associate Professor LW Doyle, Department of Obstetrics & Gynaecology, The University of Melbourne, Parkville, Victoria
3010, Australia. Fax: +61 (03) 9347 1761; email: [email protected]
Accepted for publication 4 June 2001.
Comparing proportions using the chi-squared test
Table 1
393
Survival to 2 years of age by gender
Survived
Yes
No
Total
75
27
102
(73.5%)
(26.5%)
90
10
(90.0%)
(10.0%)
165
37
Male
Gender
Female
Total
Calculations for chi-squared test
Observed values (Oi):
=
Σ
202
75, 27, 90, 10
Expected values (Ei):
X2 =
100
83.3, 18.7, 81.7 18.3
(Oi – Ei ) 2
(75 – 83.3)2
=
Ei
83.3
(27 – 18.7)2
+
18.7
(90 – 81.7)2
+
81.7
(10 –18.3)2
+
18.3
0.830 + 3.699 + 0.847 + 3.780 = 9.16
P value = (χ12 > 9.16) = 0.002
Table 2
2 × 2 table for a dichotomous outcome
Outcome
Yes
No
Total
Yes
a
b
a+b
No
c
d
c+d
a+c
b+d
Exposure
Total
Calculations for chi-squared test
Observed values (Oi):
Expected values (Ei):
a, b, c, d
(a + b) (a + c)
Σ
(Oi – Ei ) 2
(a + b) (b + d)
,
(a + b + c + d)
X2 =
a+b+c+d
(c + d) (a + c)
,
(a + b + c + d)
(c + d) (b + d)
,
(a + b + c + d)
(a + b + c + d)
where Σ means ‘the sum of’
Ei
P value = Pr (χ12 > X2)
= probability that a value from the χ12 distribution would exceed the observed X2.
or two rows, or both. For example we might wish to compare
the survival rate according to several clinical categories, or
between say four birthweight categories in 250 g intervals
(between 500 and 1500 g). The same general chi-squared calculation will give a test of the null hypothesis of no association
between the row classification and survival probability, except
that the degrees of freedom of the chi-squared distribution
for obtaining the P value increases. (It is always equal to the
product of one less than the number of rows by one less than
the number of columns).
394
When comparing more than two proportions, however, it is
often preferable to perform more focussed analyses. For
example, to compare survival across birthweight categories, a
chi-squared test for trend could be used to assess whether the
survival rate increases (or decreases) with the inherent order of
the categories. More focussed tests like these will tend to be
more sensitive or powerful than the general-purpose chisquared. The details can be found in standard textbooks4,5 and
the chi-squared trend test is available in standard statistical
packages (with occasional minor variations).
The theory underlying the chi-squared test breaks down
when the sample size, and in particular the counts in some of
the cells of a table, become small. A widely accepted rule of
thumb is that the chi-squared should not be relied on if any
of the four expected cell counts of a 2 × 2 table is less than five.
How to proceed in the case of such small sample sizes is
somewhat controversial,6 but most statisticians recommend the
use of Fisher’s ‘exact test’. The name ‘exact’ does not signify
any particular claim to validity in this context, merely that the
calculation of the P value is based on the exact enumeration
of cases rather than a mathematical approximation as in the
chi-squared. The ‘exact’ P value calculation actually tends to
produce a rather conservative result (in general the Fisher test
gives a higher P value than the chi-squared); for us this is no
major criticism given the general tendency to overinterpret the
‘significance’ of P values near the conventional critical value of
0.05.3 In summary, we recommend chi-squared analysis for
most comparisons of two proportions, and Fisher’s exact test
when the sample size becomes small.
It should be remembered that our discussion in this article
has considered only the comparison of proportions that are
based on two (or more) independent samples. Where the
comparison is based on observing the same individuals on two
occasions, or on comparing within matched pairs (for example,
JB Carlin and LW Doyle
twins), a different procedure must be used. A very simple test
called McNemar’s test is available for such data; we again refer
the reader to standard texts for the details.4,5
Finally the reader will recall that in discussing the comparison of means using the t-test, in our previous article, we highlighted a number of difficulties that arise from over-reliance on
the method of hypothesis testing in statistical analysis. Similar
issues arise with the chi-squared test, and in the next article in
the series we return to the concept of the confidence interval
and show how it can be used to examine comparisons between
groups. This method has the advantage of helping us to
separate the concepts of statistical and clinical significance, by
ensuring that the substantive size of the difference between
groups is emphasised, along with relevant uncertainty, rather
than focussing on statistical significance at the 5% (or any
other) level.
REFERENCES
1
2
3
4
5
6
Doyle LW, Carlin JB. Statistics for clinicians. 1: Introduction.
J. Paediatr. Child Health 2000; 36: 74–5.
Carlin JB, Doyle LW. Statistics for clinicians. 2: Describing and
displaying data. J. Paediatr. Child Health 2000; 36: 270–4.
Carlin JB, Doyle LW. Statistics for clinicians. 4: Basic concepts of
statistical reasoning: Hypothesis tests and the t-test. J. Paediatr.
Child Health 2001; 37: 72–7.
Bland M. An Introduction to Medical Statistics, 2nd edn. Oxford
Medical Publications, Oxford, 1995.
Armitage P, Berry G. Statistical Methods in Medical Research,
3rd edn. Blackwell, Oxford, 1994.
Haviland MG. Yates’s correction for continuity and the analysis of
2 x 2 contingency tables. (With discussion.) Stat. Med. 1990;
9: 363–383.