Download Statistics for Clinicians 5: Comparing proportions

J. Paediatr. Child Health (2001) 37, 392–394 Statistics for Clinicians 5: Comparing proportions using the chi-squared test JB CARLIN1,4 and LW DOYLE2,3,4 1Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Parkville, 2Division of Newborn Services, Royal Women’s Hospital, Melbourne, Departments of 3Obstetrics and Gynaecology, and 4Paediatrics, the University of Melbourne, Parkville, Victoria, Australia As described in the first article in this series1 a review of articles in the Journal of Paediatrics and Child Health found that 97% of papers reported categorical data, and 45% reported univariate inferential statistics for dichotomous outcomes. An example of a dichotomous outcome is shown in Table 1, where, for our previously described data set2 from a study of very low birthweight (VLBW) infants born at the Royal Women’s Hospital in 1992, we compare the survival rates for males and females. The most commonly used statistical method for comparing proportions is the chi-squared (χ2) test (often termed ‘Pearson’s chi-squared’ after the turn-of-the-century statistical pioneer Karl Pearson). The test begins with the null hypothesis that there is no association between exposure and outcome, or equivalently in our example that the survival rates in males and females are not different. Like all statistical tests, the chisquared test is based on calculating a value (the ‘test statistic’) that measures the extent of deviation from the null hypothesis. To explain this, we display in Table 2 a symbolic version of Table 1, where a, b, c and d represent the cell counts in Table 1. If the null hypothesis were true, the ‘best guess’ to estimate the number that should end up in cell a, given the total numbers exposed and unexposed, would be obtained by applying the overall proportion of ‘Yes’ outcomes to the total number that were exposed, which we can express symbolically as: or equivalently: (a + c) (a + b + c + d) × (a + b). that the observed (sometimes written Oi) counts in the individual cells (i) deviate from the expected (Ei) numbers under the null hypothesis. We take the expected number away from the observed number, and square it to remove any negative sign. The noise is related to the variance of the numbers in the cells. It turns out that the variance for each cell is the expected number (Ei) for that cell. The contribution of an individual cell to the overall test statistic (X2) is then: (Oi – Ei ) 2 Ei We sum the contribution of all of the individual cells to calculate the overall X2, and then refer to a table to compare this value with the χ2 distribution, which is the probability distribution that can be shown to apply to this statistic if the null hypothesis is true. In particular, the P value is obtained as the probability that a value as large or larger than X2 would arise from this distribution. (There is actually a whole family of χ2 distributions, indexed by a parameter called the degrees of freedom, but the appropriate one for the 2 × 2 table has just one degree of freedom.) Returning to the example of survival rates in VLBW infants, we proceed as shown in Table 1, and find that the resultant X2 is 9.16, giving a P value of 0.002. We would then conclude that we have highly statistically significant evidence that the survival rate of males was different from (and in fact lower than) that for females. An equivalent method to calculate X2 for a 2 × 2 table is the formula: (a + b) (a + c) (a + b + c + d) Notice that this is just the product of the row and column totals divided by the overall total. A similar calculation can be applied to obtain ‘expected’ counts for each of the remaining cells. As we explained in the previous article3 one way of thinking about most statistical tests is as a signal-to-noise ratio. For a comparison of proportions, the signal is related to the amount X2 = (ad – bc)2 (a + b + c + d ) (a + b) (a + c) (b + c) (c + d) which would generally be easier for the reader to compute by hand. Of course, as with most statistical tests, we usually do not have to calculate X2 or its corresponding P value by hand, since statistical packages are far quicker and more accurate. An advantage of the more detailed method of calculation displayed in Tables 1 and 2 is that it can be more generally applied, in particular to tables that have more than two columns Correspondence: Associate Professor LW Doyle, Department of Obstetrics & Gynaecology, The University of Melbourne, Parkville, Victoria 3010, Australia. Fax: +61 (03) 9347 1761; email: [email protected] Accepted for publication 4 June 2001. Comparing proportions using the chi-squared test Table 1 393 Survival to 2 years of age by gender Survived Yes No Total 75 27 102 (73.5%) (26.5%) 90 10 (90.0%) (10.0%) 165 37 Male Gender Female Total Calculations for chi-squared test Observed values (Oi): = Σ 202 75, 27, 90, 10 Expected values (Ei): X2 = 100 83.3, 18.7, 81.7 18.3 (Oi – Ei ) 2 (75 – 83.3)2 = Ei 83.3 (27 – 18.7)2 + 18.7 (90 – 81.7)2 + 81.7 (10 –18.3)2 + 18.3 0.830 + 3.699 + 0.847 + 3.780 = 9.16 P value = (χ12 > 9.16) = 0.002 Table 2 2 × 2 table for a dichotomous outcome Outcome Yes No Total Yes a b a+b No c d c+d a+c b+d Exposure Total Calculations for chi-squared test Observed values (Oi): Expected values (Ei): a, b, c, d (a + b) (a + c) Σ (Oi – Ei ) 2 (a + b) (b + d) , (a + b + c + d) X2 = a+b+c+d (c + d) (a + c) , (a + b + c + d) (c + d) (b + d) , (a + b + c + d) (a + b + c + d) where Σ means ‘the sum of’ Ei P value = Pr (χ12 > X2) = probability that a value from the χ12 distribution would exceed the observed X2. or two rows, or both. For example we might wish to compare the survival rate according to several clinical categories, or between say four birthweight categories in 250 g intervals (between 500 and 1500 g). The same general chi-squared calculation will give a test of the null hypothesis of no association between the row classification and survival probability, except that the degrees of freedom of the chi-squared distribution for obtaining the P value increases. (It is always equal to the product of one less than the number of rows by one less than the number of columns). 394 When comparing more than two proportions, however, it is often preferable to perform more focussed analyses. For example, to compare survival across birthweight categories, a chi-squared test for trend could be used to assess whether the survival rate increases (or decreases) with the inherent order of the categories. More focussed tests like these will tend to be more sensitive or powerful than the general-purpose chisquared. The details can be found in standard textbooks4,5 and the chi-squared trend test is available in standard statistical packages (with occasional minor variations). The theory underlying the chi-squared test breaks down when the sample size, and in particular the counts in some of the cells of a table, become small. A widely accepted rule of thumb is that the chi-squared should not be relied on if any of the four expected cell counts of a 2 × 2 table is less than five. How to proceed in the case of such small sample sizes is somewhat controversial,6 but most statisticians recommend the use of Fisher’s ‘exact test’. The name ‘exact’ does not signify any particular claim to validity in this context, merely that the calculation of the P value is based on the exact enumeration of cases rather than a mathematical approximation as in the chi-squared. The ‘exact’ P value calculation actually tends to produce a rather conservative result (in general the Fisher test gives a higher P value than the chi-squared); for us this is no major criticism given the general tendency to overinterpret the ‘significance’ of P values near the conventional critical value of 0.05.3 In summary, we recommend chi-squared analysis for most comparisons of two proportions, and Fisher’s exact test when the sample size becomes small. It should be remembered that our discussion in this article has considered only the comparison of proportions that are based on two (or more) independent samples. Where the comparison is based on observing the same individuals on two occasions, or on comparing within matched pairs (for example, JB Carlin and LW Doyle twins), a different procedure must be used. A very simple test called McNemar’s test is available for such data; we again refer the reader to standard texts for the details.4,5 Finally the reader will recall that in discussing the comparison of means using the t-test, in our previous article, we highlighted a number of difficulties that arise from over-reliance on the method of hypothesis testing in statistical analysis. Similar issues arise with the chi-squared test, and in the next article in the series we return to the concept of the confidence interval and show how it can be used to examine comparisons between groups. This method has the advantage of helping us to separate the concepts of statistical and clinical significance, by ensuring that the substantive size of the difference between groups is emphasised, along with relevant uncertainty, rather than focussing on statistical significance at the 5% (or any other) level. REFERENCES 1 2 3 4 5 6 Doyle LW, Carlin JB. Statistics for clinicians. 1: Introduction. J. Paediatr. Child Health 2000; 36: 74–5. Carlin JB, Doyle LW. Statistics for clinicians. 2: Describing and displaying data. J. Paediatr. Child Health 2000; 36: 270–4. Carlin JB, Doyle LW. Statistics for clinicians. 4: Basic concepts of statistical reasoning: Hypothesis tests and the t-test. J. Paediatr. Child Health 2001; 37: 72–7. Bland M. An Introduction to Medical Statistics, 2nd edn. Oxford Medical Publications, Oxford, 1995. Armitage P, Berry G. Statistical Methods in Medical Research, 3rd edn. Blackwell, Oxford, 1994. Haviland MG. Yates’s correction for continuity and the analysis of 2 x 2 contingency tables. (With discussion.) Stat. Med. 1990; 9: 363–383.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Statistics for Clinicians 5: Comparing proportions