Biometrical Journal 42 (2000) 1, 59–69

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection

Kung-Jong Lui
Department of Mathematical Sciences, College of Sciences, San Diego State University, USA

Summary

This paper discusses interval estimation of the simple difference (SD) between the proportions of the primary infection and the secondary infection, given the primary infection. It develops three asymptotic interval estimators, based on Wald's test statistic, the likelihood ratio test, and the basic principle of Fieller's theorem. The paper then evaluates and compares the performance of these interval estimators with respect to the coverage probability and the expected length of the resulting confidence intervals. The asymptotic confidence interval using the likelihood ratio test consistently performs well in all situations considered here. When the underlying SD is within 0.10 and the total number of subjects is not large (say, 50), the interval estimator using Fieller's theorem is preferable to the estimator using Wald's test statistic if the primary infection probability is moderate (say, 0.30). The latter is preferable to the former if this probability is large (say, 0.80). When the total number of subjects is large (say, 200), all three interval estimators perform well in almost all situations considered in this paper. In these cases, for simplicity, we may apply either of the two interval estimators based on Wald's test statistic or Fieller's theorem. Doing so loses little accuracy or efficiency compared with the interval estimator using the asymptotic likelihood ratio test.

Key words: Interval estimation; Coverage probability; Likelihood ratio test; Fieller's theorem.

1. Introduction

To establish the characteristics of a given disease, one problem of interest is to assess the effect of the primary infection on the likelihood of developing a secondary infection. For example, consider the data (Agresti, 1990, pages 45–46) on a sample of calves. Calves are first classified by whether they get a primary pneumonia infection. After recovering from the primary infection, calves are then reclassified by whether they develop a secondary infection within a defined time period. In this situation, both classifications are made on the same group of calves, so the observations are likely to be dependent. Therefore, when estimating the simple difference (SD) between the probability of the primary infection and the conditional probability of the secondary infection, given the primary infection, we cannot apply the interval estimators of the SD developed for two independent samples (Thomas and Gart, 1977; Anbar, 1983, 1984; Beal, 1987; Mee, 1984; Hauck and Anderson, 1986; Miettinen and Nurminen, 1985; Santner and Snell, 1980; Wallenstein, 1997). Note that a completely randomized trial, in which calves are randomly allocated to control and experimental groups, would be neither ethical nor adequate here.

In this paper, we concentrate on interval estimation of the SD between the probability of the primary infection and the conditional probability of the secondary infection, given the primary infection. We develop three asymptotic interval estimators using Wald's test statistic, the likelihood ratio test, and the basic principle of Fieller's theorem. To evaluate and compare the performance of these interval estimators, we calculate the coverage probability and the expected length of the resulting confidence intervals on the basis of the exact distribution in a variety of situations.
We find that the interval estimator using the asymptotic likelihood ratio test, which involves a sophisticated numerical procedure, consistently performs well in all situations considered here. When the underlying SD is within 0.10 and the total number of subjects is not large (say, 50), the interval estimator using Fieller's theorem is preferable to the estimator using Wald's test statistic if the underlying primary infection probability is moderate (say, 0.30). On the other hand, the latter is preferable to the former if the underlying primary infection probability is high (say, 0.80). When the total number of subjects is large (say, 200), all three estimators perform reasonably well in almost all situations considered in this paper. Therefore, for simplicity, we may apply either of the two asymptotic interval estimators using Wald's test statistic or Fieller's theorem in these situations. This loses little accuracy or efficiency compared with the asymptotic confidence interval using the likelihood ratio test.

Note that Agresti (1990) discusses a procedure for testing whether the primary infection affects the probability of developing the secondary infection. Lui (1998) discusses interval estimation of the risk ratio between the two successive infections. However, neither of these papers considers interval estimation of the SD as focused on here.

2. Interval Estimators

Consider a study in which the data can be summarized by the following 2 × 2 table:

$$
\begin{array}{llccc}
 & & \text{Secondary infection: Yes} & \text{Secondary infection: No} & \text{Total} \\
\text{Primary infection} & \text{Yes} & p_{11} & p_{12} & p_{1\cdot} \\
 & \text{No} & - & p_{22} & p_{22}
\end{array}
$$

In this table, $0 < p_{ij} < 1$ (for $(i, j) = (1, 1), (1, 2), (2, 2)$) denotes the probability of the corresponding cell, $p_{1\cdot} = p_{11} + p_{12}$, and $p_{1\cdot} + p_{22} = 1$. As also noted elsewhere (Agresti, 1990), by definition no subject can have the secondary infection without first having the primary infection (i.e., $p_{21} = 0$).
In this paper, we focus on interval estimation of the SD between the probability of the primary infection and the conditional probability of the secondary infection, given the primary infection. In terms of the $p_{ij}$, the SD, denoted by $d$, is defined as $d = p_{1\cdot} - p_{11}/p_{1\cdot}$. Hence, for given $p_{1\cdot}$ and $d$, we have $p_{11} = p_{1\cdot}(p_{1\cdot} - d)$, $p_{12} = p_{1\cdot}(1 - p_{1\cdot} + d)$, and $p_{22} = 1 - p_{1\cdot}$. Note that, by definition, the range of $d$ is $-1 < d < 1$.

Suppose that we take a random sample of $n$ subjects, and let $n_{ij}$ denote the number of subjects who fall in the cell with probability $p_{ij}$. Then the log-likelihood for a given $(n_{11}, n_{12}, n_{22})$ is

$$
\log L = C + n_{11}\{\log p_{1\cdot} + \log(p_{1\cdot} - d)\} + n_{12}\{\log p_{1\cdot} + \log(1 - p_{1\cdot} + d)\} + n_{22}\log(1 - p_{1\cdot}), \tag{1}
$$

where $C$ is a constant that does not depend on the parameters $d$ and $p_{1\cdot}$. On the basis of (1), we can easily show that the maximum likelihood estimates (MLEs) of $p_{1\cdot}$ and $d$ are $\hat p_{1\cdot} = (n_{11} + n_{12})/n$ and $\hat d = \hat p_{1\cdot} - \hat p_{11}/\hat p_{1\cdot}$, respectively, where $\hat p_{11} = n_{11}/n$. Furthermore, using the inverse of the observed information matrix, we obtain the estimate of the asymptotic variance of $\hat d$ as $\widehat{\mathrm{Var}}(\hat d) = \{\hat p_{11}\hat p_{12}/\hat p_{1\cdot}^3 + \hat p_{1\cdot}(1 - \hat p_{1\cdot})\}/n$ (Appendix). Therefore, the asymptotic $100(1 - \alpha)\%$ confidence interval for $d$ is

$$
[m_l,\ m_u], \tag{2}
$$

where $m_l = \max\{-1,\ \hat d - Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat d)}\}$ and $m_u = \min\{1,\ \hat d + Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat d)}\}$. Here $Z_\alpha$ is the upper $100\alpha$th percentile of the standard normal distribution.

For testing $H_0\colon d = d_0$ versus $H_a\colon d \ne d_0$, it is easy to see that the acceptance region of the asymptotic likelihood ratio test consists of all sample vectors $(n_{11}, n_{12}, n_{22})$ such that

$$
\begin{aligned}
2\big[\,& n_{11}\log\hat p_{11} + n_{12}\log\hat p_{12} + n_{22}\log\hat p_{22} - n_{11}\{\log\hat p_{1\cdot}(d_0) + \log(\hat p_{1\cdot}(d_0) - d_0)\} \\
& - n_{12}\{\log\hat p_{1\cdot}(d_0) + \log(1 - \hat p_{1\cdot}(d_0) + d_0)\} - n_{22}\log(1 - \hat p_{1\cdot}(d_0))\,\big] \le \chi^2_\alpha,
\end{aligned}
\tag{3}
$$

In (3), $\hat p_{ij} = n_{ij}/n$ is the MLE of $p_{ij}$, and $\hat p_{1\cdot}(d_0)$ denotes the conditional MLE of $p_{1\cdot}$ for a given fixed $d_0$ (Appendix). The critical value $\chi^2_\alpha$ is the upper $100\alpha$th percentile of the central $\chi^2$ distribution with one degree of freedom.
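Estimator (2) is simple enough to compute directly. The following Python sketch is our own illustration, not the author's SAS code. The function name `wald_ci` is hypothetical, and the standard-library `statistics.NormalDist` supplies $Z_{\alpha/2}$. The sketch codes $\hat p_{1\cdot}$, $\hat d$, and $\widehat{\mathrm{Var}}(\hat d)$ exactly as given above and truncates the limits to $[-1, 1]$. Applied to the calf data of Section 5 ($n_{11} = 30$, $n_{12} = 63$, $n_{22} = 63$), it gives the Wald interval reported there.

```python
import math
from statistics import NormalDist


def wald_ci(n11, n12, n22, alpha=0.05):
    """Wald-type interval (2) for d = p1. - p11/p1.; None when p1.-hat = 0."""
    n = n11 + n12 + n22
    p1 = (n11 + n12) / n            # MLE of p1.
    if p1 == 0:
        return None                 # d-hat undefined, so interval (2) does not exist
    p11, p12 = n11 / n, n12 / n
    d_hat = p1 - p11 / p1           # MLE of d
    # estimated asymptotic variance {p11 p12 / p1.^3 + p1.(1 - p1.)} / n
    var_hat = (p11 * p12 / p1**3 + p1 * (1 - p1)) / n
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var_hat)
    return max(-1.0, d_hat - half_width), min(1.0, d_hat + half_width)


lo, hi = wald_ci(30, 63, 63)        # calf data (Agresti, 1990)
print(f"[{lo:.3f}, {hi:.3f}]")      # → [0.151, 0.396]
```

Returning `None` when $\hat p_{1\cdot} = 0$ matches how such samples are handled in Section 3, where they are treated as having no interval.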
Therefore, we can obtain the confidence interval based on the asymptotic likelihood ratio test by inverting the acceptance region (Casella and Berger, 1990):

$$
[r_l,\ r_u], \tag{4}
$$

Here $-1 < r_l < r_u < 1$ are the smaller and the larger roots in $d_0$ of

$$
\begin{aligned}
2\big[\,& n_{11}\log\hat p_{11} + n_{12}\log\hat p_{12} + n_{22}\log\hat p_{22} - n_{11}\{\log\hat p_{1\cdot}(d_0) + \log(\hat p_{1\cdot}(d_0) - d_0)\} \\
& - n_{12}\{\log\hat p_{1\cdot}(d_0) + \log(1 - \hat p_{1\cdot}(d_0) + d_0)\} - n_{22}\log(1 - \hat p_{1\cdot}(d_0))\,\big] = \chi^2_\alpha.
\end{aligned}
$$

Recall that, by definition, the $d$ defined here can be rewritten as the ratio $d = (p_{1\cdot}^2 - p_{11})/p_{1\cdot}$. Following Fieller's theorem (Casella and Berger, 1990), we define $Z = (n\hat p_{1\cdot}^2 - \hat p_{1\cdot})/(n - 1) - \hat p_{11} - d\,\hat p_{1\cdot}$. Note that $E\{(n\hat p_{1\cdot}^2 - \hat p_{1\cdot})/(n - 1)\} = p_{1\cdot}^2$ and $E(\hat p_{11}) = p_{11}$; thus $E(Z) = 0$. By the delta method and the multivariate Central Limit Theorem (Anderson, 1958), we can easily show that $\sqrt{n}\,Z$ asymptotically follows the normal distribution with mean 0 and asymptotic variance

$$
\mathrm{Var}_3 = p_{11}(1 - p_{11}) + \Big\{\frac{2np_{1\cdot} - 1}{n - 1} - d\Big\}^2 p_{1\cdot}(1 - p_{1\cdot}) - 2\Big\{\frac{2np_{1\cdot} - 1}{n - 1} - d\Big\} p_{11}p_{22}.
$$

Thus, $P\{Z^2/(\mathrm{Var}_3/n) \le Z^2_{\alpha/2}\} \doteq 1 - \alpha$ if $n$ is large. This leads us to consider the following working quadratic equation in $d$, obtained by substituting $\hat p_{ij}$ for $p_{ij}$ in $\mathrm{Var}_3$:

$$
\hat A d^2 + \hat B d + \hat C = 0, \tag{5}
$$

where

$$
\begin{aligned}
\hat A &= \hat p_{1\cdot}^2 - Z^2_{\alpha/2}\,\hat p_{1\cdot}(1 - \hat p_{1\cdot})/n, \\
\hat B &= -2\Big\{\frac{n\hat p_{1\cdot}^2 - \hat p_{1\cdot}}{n - 1} - \hat p_{11}\Big\}\hat p_{1\cdot} + 2Z^2_{\alpha/2}\Big\{\frac{(2n\hat p_{1\cdot} - 1)\,\hat p_{1\cdot}(1 - \hat p_{1\cdot})}{(n - 1)n} - \frac{\hat p_{11}\hat p_{22}}{n}\Big\}, \\
\hat C &= \Big\{\frac{n\hat p_{1\cdot}^2 - \hat p_{1\cdot}}{n - 1} - \hat p_{11}\Big\}^2 - Z^2_{\alpha/2}\Big\{\frac{\hat p_{11}(1 - \hat p_{11})}{n} + \frac{(2n\hat p_{1\cdot} - 1)^2\,\hat p_{1\cdot}(1 - \hat p_{1\cdot})}{(n - 1)^2 n} - \frac{2(2n\hat p_{1\cdot} - 1)\,\hat p_{11}\hat p_{22}}{(n - 1)n}\Big\}.
\end{aligned}
$$

If both $\hat A > 0$ and $\hat B^2 - 4\hat A\hat C > 0$, then the asymptotic $100(1 - \alpha)\%$ confidence interval for the SD, for large $n$, is given by

$$
[q_l,\ q_u], \tag{6}
$$

where $q_l = \max\{-1,\ (-\hat B - \sqrt{\hat B^2 - 4\hat A\hat C})/(2\hat A)\}$ and $q_u = \min\{1,\ (-\hat B + \sqrt{\hat B^2 - 4\hat A\hat C})/(2\hat A)\}$.

3. Coverage Probability and Expected Length

To evaluate the finite-sample performance of interval estimators (2), (4), and (6) for the SD, we calculate the coverage probability and the expected length of the resulting 95% confidence interval on the basis of the exact trinomial distribution. By definition, the coverage probability equals

$$
\sum_{n_{11} + n_{12} + n_{22} = n} 1\{d \in [c_l, c_u]\}\, f(n_{11}, n_{12}, n_{22}).
$$

Here $[c_l, c_u]$ is the confidence interval obtained from (2), (4), or (6) and is a function of $(n_{11}, n_{12}, n_{22})$. The indicator $1\{d \in [c_l, c_u]\}$ equals 1 if $d \in [c_l, c_u]$ and 0 otherwise. Finally, $f(n_{11}, n_{12}, n_{22})$ is the trinomial distribution with cell probabilities $p_{11}$, $p_{12}$, and $p_{22}$. Similarly, the expected length of the resulting confidence interval is

$$
\sum_{n_{11} + n_{12} + n_{22} = n} (c_u - c_l)\, f(n_{11}, n_{12}, n_{22}).
$$

Note that when $\hat p_{1\cdot} = 0$, $\hat d$ is not well defined and interval estimator (2) is inapplicable. In this case the coefficient $\hat A$ of the quadratic term $d^2$ in equation (5) is also 0, so we cannot apply (6) to obtain a confidence interval for $d$ either. Furthermore, (6) cannot be applied if either $\hat A < 0$ or $\hat B^2 - 4\hat A\hat C < 0$. Note also that the logarithmic function $\log X$ is defined only for $X > 0$. Therefore, if any cell frequency $n_{ij}$ in a random vector $(n_{11}, n_{12}, n_{22})$ is 0, interval estimator (4) cannot be applied. When evaluating the performance of (2), (4), and (6), we calculate the coverage probability and the expected length conditional on those samples for which the confidence limits of the respective interval estimator exist. For completeness, we also calculate the probability of failing to produce confidence limits for each of interval estimators (2), (4), and (6). As noted before, for given values of $p_{1\cdot}$ and $d$, all parameter values $p_{11} = p_{1\cdot}(p_{1\cdot} - d)$, $p_{12} = p_{1\cdot}(1 - p_{1\cdot} + d)$, and $p_{22} = 1 - p_{1\cdot}$ are uniquely determined.
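The exact evaluation just described can be sketched in Python, here using interval (6) as the estimator under study. This illustration rests on our own assumptions and is not the author's SAS programs. The names `cell_probs`, `fieller_ci`, `exact_performance`, and `lrt_failure_prob` are hypothetical. Trinomial probabilities are computed on the log scale with `math.lgamma` for numerical stability. The failure probability of (4), i.e. the chance that some $n_{ij} = 0$, is obtained by inclusion-exclusion.

```python
import math
from statistics import NormalDist


def cell_probs(p1, d):
    """Cell probabilities (p11, p12, p22) determined by p1. and d."""
    return p1 * (p1 - d), p1 * (1 - p1 + d), 1 - p1


def fieller_ci(n11, n12, n22, alpha=0.05):
    """Fieller-type interval (6); None unless A-hat > 0 and B-hat^2 - 4 A-hat C-hat > 0."""
    n = n11 + n12 + n22
    p1, p11, p22 = (n11 + n12) / n, n11 / n, n22 / n
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    k = (n * p1**2 - p1) / (n - 1) - p11     # Z evaluated at d = 0
    g = (2 * n * p1 - 1) / (n - 1)           # the factor (2n p1. - 1)/(n - 1) in Var_3
    a_hat = p1**2 - z2 * p1 * (1 - p1) / n
    b_hat = -2 * k * p1 + 2 * z2 * (g * p1 * (1 - p1) - p11 * p22) / n
    c_hat = k**2 - z2 * (p11 * (1 - p11) + g**2 * p1 * (1 - p1)
                         - 2 * g * p11 * p22) / n
    disc = b_hat**2 - 4 * a_hat * c_hat
    if a_hat <= 0 or disc <= 0:
        return None                          # interval (6) does not exist
    root = math.sqrt(disc)
    return (max(-1.0, (-b_hat - root) / (2 * a_hat)),
            min(1.0, (-b_hat + root) / (2 * a_hat)))


def exact_performance(ci, n, p1, d):
    """Exact coverage and expected length of `ci`, conditional on the interval
    existing, plus the probability that it does not exist."""
    logs = [math.log(p) for p in cell_probs(p1, d)]   # requires all cells > 0
    cover = length = exists = 0.0
    for n11 in range(n + 1):
        for n12 in range(n - n11 + 1):
            counts = (n11, n12, n - n11 - n12)
            # trinomial probability f(n11, n12, n22) on the log scale
            log_f = math.lgamma(n + 1) + sum(
                k * lp - math.lgamma(k + 1) for k, lp in zip(counts, logs))
            f = math.exp(log_f)
            interval = ci(*counts)
            if interval is None:
                continue
            lo, hi = interval
            exists += f
            cover += f * (lo <= d <= hi)
            length += f * (hi - lo)
    return cover / exists, length / exists, max(0.0, 1.0 - exists)


def lrt_failure_prob(n, p1, d):
    """P(some n_ij = 0), i.e. the probability that interval (4) cannot be computed."""
    cells = cell_probs(p1, d)
    one_zero = sum((1 - p) ** n for p in cells)   # P(n_ij = 0), summed over cells
    two_zero = sum(p ** n for p in cells)         # P(two counts are 0) = P(third = n)
    return one_zero - two_zero                    # three zero counts are impossible


lo, hi = fieller_ci(30, 63, 63)                   # calf data of Section 5
print(f"[{lo:.3f}, {hi:.3f}]")                    # → [0.137, 0.385]
cov, length, fail = exact_performance(fieller_ci, 200, 0.50, 0.0)
print(f"{cov:.3f} ({length:.3f}), failure {fail:.3f}")
print(f"{lrt_failure_prob(50, 0.30, 0.20):.3f}")  # → 0.218
```

The first line reproduces the Fieller interval reported for the calf data in Section 5. The other two values can be compared with the corresponding entries in Tables 1 and 2.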
We consider the situations in which $p_{1\cdot} = 0.30$, 0.50, and 0.80 and $d = -0.30, -0.20, -0.10, \ldots, 0.30$, with the restriction that the corresponding cell probabilities $p_{11}$, $p_{12}$, and $p_{22}$ are all > 0. We also take $n = 50$, 100, and 200. We wrote programs in SAS (1990) to enumerate the exact probability $f(n_{11}, n_{12}, n_{22})$ of the desired trinomial distribution.

4. Results

Table 1 summarizes the coverage probability and the expected length of the resulting 95% confidence intervals in a variety of situations. Both quantities are conditional on those samples for which the confidence limits of the respective interval estimator exist. As seen from Table 1, when n = 200, all estimators perform reasonably well in almost all situations considered here. When both n and $p_{1\cdot}$ are not large (i.e., n = 50 and $p_{1\cdot} = 0.30$) and d is within 0.10, estimators (4) and (6) outperform estimator (2), whose coverage probability is likely to be less than the desired confidence level. On the other hand, in these cases but with large $p_{1\cdot}$ (= 0.80), estimators (2) and (4) are preferable to estimator (6). The probability of failing to produce a 95% confidence interval with estimator (2) or (6) is negligible (< 0.001) in all situations considered in Table 1. For (4), however, this probability can be of practical significance when n is not large (say, n = 50).

Table 1. The coverage probability and the expected length (in parentheses) of the resulting 95% confidence interval for the underlying risk difference d between the primary infection and the secondary infection, given the primary infection, for estimators (2), (4), and (6). The grid is d = -0.30, -0.20, ..., 0.30, restricted so that $p_{11}$, $p_{12}$, and $p_{22}$ are all > 0. The probability of primary infection is $p_{1\cdot}$ = 0.30, 0.50, and 0.80, and the total number of subjects is n = 50, 100, and 200.

| $p_{1\cdot}$ | $d$ | n=50 (2) | n=50 (4) | n=50 (6) | n=100 (2) | n=100 (4) | n=100 (6) | n=200 (2) | n=200 (4) | n=200 (6) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.30 | -0.3 | 0.926 (0.548) | 0.941 (0.528) | 0.919 (0.628) | 0.942 (0.391) | 0.946 (0.384) | 0.934 (0.416) | 0.943 (0.278) | 0.950 (0.275) | 0.943 (0.286) |
| 0.30 | -0.2 | 0.930 (0.557) | 0.944 (0.537) | 0.937 (0.640) | 0.941 (0.398) | 0.948 (0.390) | 0.942 (0.423) | 0.945 (0.282) | 0.949 (0.279) | 0.946 (0.291) |
| 0.30 | -0.1 | 0.922 (0.548) | 0.942 (0.530) | 0.943 (0.631) | 0.940 (0.391) | 0.948 (0.384) | 0.948 (0.416) | 0.945 (0.278) | 0.949 (0.275) | 0.949 (0.286) |
| 0.30 | 0.0 | 0.924 (0.518) | 0.949 (0.510) | 0.955 (0.599) | 0.939 (0.371) | 0.949 (0.366) | 0.950 (0.395) | 0.947 (0.263) | 0.947 (0.262) | 0.952 (0.271) |
| 0.30 | 0.1 | 0.910 (0.465) | 0.955 (0.475) | 0.962 (0.542) | 0.936 (0.334) | 0.948 (0.335) | 0.957 (0.357) | 0.942 (0.238) | 0.948 (0.238) | 0.954 (0.245) |
| 0.30 | 0.2 | 0.935 (0.380) | 0.958 (0.431) | 0.971 (0.453) | 0.938 (0.275) | 0.959 (0.288) | 0.965 (0.296) | 0.943 (0.196) | 0.948 (0.200) | 0.954 (0.203) |
| 0.50 | -0.3 | 0.934 (0.412) | 0.954 (0.412) | 0.914 (0.443) | 0.944 (0.294) | 0.948 (0.293) | 0.931 (0.304) | 0.946 (0.209) | 0.949 (0.208) | 0.939 (0.212) |
| 0.50 | -0.2 | 0.938 (0.448) | 0.945 (0.443) | 0.918 (0.479) | 0.943 (0.319) | 0.948 (0.317) | 0.932 (0.329) | 0.947 (0.226) | 0.949 (0.225) | 0.942 (0.230) |
| 0.50 | -0.1 | 0.930 (0.468) | 0.952 (0.460) | 0.940 (0.499) | 0.947 (0.333) | 0.951 (0.330) | 0.942 (0.343) | 0.948 (0.236) | 0.949 (0.235) | 0.946 (0.240) |
| 0.50 | 0.0 | 0.937 (0.474) | 0.951 (0.466) | 0.944 (0.506) | 0.947 (0.338) | 0.951 (0.334) | 0.949 (0.348) | 0.948 (0.239) | 0.949 (0.238) | 0.945 (0.243) |
| 0.50 | 0.1 | 0.941 (0.468) | 0.946 (0.460) | 0.942 (0.499) | 0.945 (0.333) | 0.949 (0.330) | 0.944 (0.343) | 0.949 (0.236) | 0.950 (0.235) | 0.948 (0.240) |
| 0.50 | 0.2 | 0.941 (0.448) | 0.945 (0.443) | 0.943 (0.479) | 0.945 (0.319) | 0.949 (0.317) | 0.946 (0.329) | 0.948 (0.226) | 0.950 (0.225) | 0.948 (0.230) |
| 0.50 | 0.3 | 0.935 (0.412) | 0.946 (0.412) | 0.944 (0.443) | 0.946 (0.294) | 0.947 (0.293) | 0.945 (0.304) | 0.948 (0.209) | 0.950 (0.208) | 0.948 (0.212) |
| 0.80 | -0.1 | 0.941 (0.284) | 0.955 (0.293) | 0.919 (0.293) | 0.948 (0.203) | 0.950 (0.206) | 0.939 (0.206) | 0.950 (0.144) | 0.950 (0.145) | 0.946 (0.145) |
| 0.80 | 0.0 | 0.944 (0.328) | 0.952 (0.332) | 0.928 (0.336) | 0.947 (0.233) | 0.948 (0.235) | 0.941 (0.236) | 0.949 (0.166) | 0.949 (0.166) | 0.944 (0.167) |
| 0.80 | 0.1 | 0.943 (0.356) | 0.949 (0.356) | 0.932 (0.364) | 0.947 (0.253) | 0.950 (0.253) | 0.942 (0.256) | 0.948 (0.180) | 0.949 (0.180) | 0.946 (0.181) |
| 0.80 | 0.2 | 0.943 (0.371) | 0.946 (0.369) | 0.934 (0.379) | 0.948 (0.264) | 0.949 (0.264) | 0.944 (0.267) | 0.949 (0.187) | 0.950 (0.187) | 0.945 (0.188) |
| 0.80 | 0.3 | 0.945 (0.377) | 0.948 (0.373) | 0.941 (0.384) | 0.945 (0.268) | 0.948 (0.267) | 0.944 (0.271) | 0.948 (0.190) | 0.950 (0.190) | 0.947 (0.191) |

5. An example

To illustrate the practical usefulness of (2), (4), and (6), we consider the example (Agresti, 1990, pages 45–46) of 156 calves born in Florida. Calves are first classified according to whether they are infected with pneumonia within 60 days after birth. They are then classified again by whether they develop a secondary infection within two weeks after the first infection clears up. As shown in Table 3.2 on page 46 of Agresti (1990), we have $n_{11} = 30$, $n_{12} = 63$, and $n_{22} = 63$. Given these data, the estimate $\hat d$ is 0.274. Applying interval estimators (2), (4), and (6), we obtain the 95% confidence intervals for $d$ as [0.151, 0.396], [0.148, 0.392], and [0.137, 0.385], respectively. The lower limits of these confidence intervals are all larger than 0. Each of these interval estimators therefore suggests that the primary pneumonia infection may stimulate a natural immunity that reduces the likelihood of a secondary infection. This inference agrees with that drawn using a hypothesis-testing procedure (Agresti, 1990, page 47). To draw it, however, we implicitly assume two things. First, the immunity level of calves to pneumonia does not vary much within the first 3 months after birth. Second, the follow-up period of 14 days is long enough to determine the proportion with a secondary infection.
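Interval (4) has no closed form. The following Python sketch computes it with two nested bisections. Bisection is our own choice, since the paper does not say which numerical procedure was used, and all function names are hypothetical. The inner step finds the conditional MLE $\hat p_{1\cdot}(d_0)$ as the root of (A.1) in the Appendix, which decreases on $(\max\{0, d_0\}, \min\{1, 1 + d_0\})$. The outer step locates the two values of $d_0$ at which the statistic in (3) equals $\chi^2_\alpha = Z^2_{\alpha/2}$. For the calf data it should return approximately the interval [0.148, 0.392] reported above.

```python
import math
from statistics import NormalDist


def conditional_mle_p1(n11, n12, n22, d0, iters=100):
    """Root in p1. of the score (A.1) with d fixed at d0, found by bisection."""
    lo, hi = max(0.0, d0), min(1.0, 1.0 + d0)
    for _ in range(iters):
        p = 0.5 * (lo + hi)
        score = (n11 * (1 / p + 1 / (p - d0))
                 + n12 * (1 / p - 1 / (1 - p + d0))
                 - n22 / (1 - p))
        if score > 0:      # the score decreases in p, so the root lies to the right
            lo = p
        else:
            hi = p
    return 0.5 * (lo + hi)


def lrt_statistic(n11, n12, n22, d0):
    """Left-hand side of (3): twice the log-likelihood ratio for H0: d = d0."""
    n = n11 + n12 + n22
    p = conditional_mle_p1(n11, n12, n22, d0)
    full = sum(k * math.log(k / n) for k in (n11, n12, n22))
    restricted = (n11 * (math.log(p) + math.log(p - d0))
                  + n12 * (math.log(p) + math.log(1 - p + d0))
                  + n22 * math.log(1 - p))
    return 2 * (full - restricted)


def lrt_ci(n11, n12, n22, alpha=0.05, eps=1e-6, iters=100):
    """Interval (4): the two roots in d0 of lrt_statistic(d0) = chi-square critical value."""
    if min(n11, n12, n22) == 0:
        raise ValueError("interval (4) requires every cell count to be positive")
    n = n11 + n12 + n22
    p1 = (n11 + n12) / n
    d_hat = p1 - (n11 / n) / p1
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # upper alpha point of chi-square(1)

    def crossing(inside, outside):
        # bisection between d_hat (statistic near 0) and a point near -1 or 1
        for _ in range(iters):
            mid = 0.5 * (inside + outside)
            if lrt_statistic(n11, n12, n22, mid) < crit:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return crossing(d_hat, -1 + eps), crossing(d_hat, 1 - eps)


lo, hi = lrt_ci(30, 63, 63)         # calf data of Section 5
print(f"[{lo:.3f}, {hi:.3f}]")
```

The `ValueError` branch corresponds to the zero-cell failure of (4) discussed in Section 3. The outer search stays a small `eps` inside $(-1, 1)$ so that no logarithm is evaluated at 0.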
When applying the study design discussed here to study natural immunity, it is certainly important to choose an appropriate length for the follow-up period. However, this decision depends essentially on subjective knowledge of the characteristics of the underlying disease and is beyond the scope of this paper.

6. Discussion

The coverage probability of interval estimator (4), which uses the asymptotic likelihood ratio test, consistently agrees reasonably well with the desired confidence level of 95% in all situations considered in Table 1. The coverage probabilities of estimators (2) and (6) can fall below 95% when n is not large. Furthermore, the expected length of (4) is often the shortest among the three estimators when the coverage probability is near 95% (Table 1). Therefore, estimator (4) may be generally recommended when n is not large (say, n = 50), in situations where its probability of failing to produce an interval estimate is negligible. On the other hand, (4) requires a sophisticated numerical procedure to calculate the confidence limits, while the other two estimators, (2) and (6), are simple to implement. Thus, when n is large (say, n = 200) and all three estimators are essentially equivalent, we may wish to apply estimator (2) or (6) for simplicity.

Table 2. The probability of failing to produce a 95% confidence interval with interval estimators (2), (4), and (6) for the underlying risk difference d. The grid is d = -0.30, -0.20, -0.10, ..., 0.30, restricted so that $p_{11}$, $p_{12}$, and $p_{22}$ are all > 0. The probability of primary infection is $p_{1\cdot}$ = 0.30, 0.50, and 0.80, and the total number of subjects is n = 50, 100, and 200.

| $p_{1\cdot}$ | $d$ | n=50 (2) | n=50 (4) | n=50 (6) | n=100 (2) | n=100 (4) | n=100 (6) | n=200 (2) | n=200 (4) | n=200 (6) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.30 | -0.3 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.30 | -0.2 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.30 | -0.1 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.30 | 0.0 | 0.000 | 0.009 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.30 | 0.1 | 0.000 | 0.045 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.30 | 0.2 | 0.000 | 0.218 | 0.000 | 0.000 | 0.048 | 0.000 | 0.000 | 0.002 | 0.000 |
| 0.50 | -0.3 | 0.000 | 0.005 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.50 | -0.2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.50 | -0.1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.50 | 0.0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.50 | 0.1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.50 | 0.2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.50 | 0.3 | 0.000 | 0.005 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.80 | -0.1 | 0.000 | 0.015 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.80 | 0.0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.80 | 0.1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.80 | 0.2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.80 | 0.3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

In the above example, the MLEs of $p_{1\cdot}$ and $d$ are $\hat p_{1\cdot} \doteq 0.60$ and $\hat d \doteq 0.274$, respectively, and the total number of subjects n is 156. According to the results presented in Table 1, all three interval estimators (2), (4), and (6) are appropriate in this case. This is consistent with the finding that the three resulting 95% confidence intervals are similar to one another. Note that the probability of failing to produce confidence limits with (2) or (6), as shown in Table 2, is negligible in all situations considered here. Therefore, for these two estimators, the coverage probability and the expected length computed conditional on the samples for which the confidence limits exist are essentially equivalent to those computed over all samples. However, the probability of failing to apply (4), which occurs whenever a cell frequency $n_{11}$, $n_{12}$, or $n_{22}$ equals 0, can be non-negligible.
For example, when $n = 50$, $p_{1\cdot} = 0.30$, and $d = 0.20$, this probability is approximately 0.218 (Table 2). To avoid this limitation of (4), we can apply the commonly used adjustment for sparse data of adding 0.50 to each cell frequency whenever a zero cell occurs. With this ad hoc adjustment in the above case, the coverage probability and the expected length change from 0.958 and 0.431 to 0.950 and 0.412, respectively. These changes are certainly of no practical importance. In fact, we have recalculated the coverage probability and the expected length for (4) in all situations considered in Table 1, using this ad hoc adjustment to eliminate the probability of failing to produce confidence limits. The differences between the results for (4) presented in Table 1 and those with this adjustment are generally quite small, so we do not present them, for brevity.

Finally, the logarithmic transformation has been successfully applied to derive confidence intervals for other epidemiologic indices, such as the risk ratio or the odds ratio (Katz et al., 1978; Lui, 1995, 1996, and 1998). We nevertheless do not recommend this transformation for deriving a confidence interval for the SD considered here, for two reasons. First, the sampling distribution of $\log\hat d$ can be even more skewed than that of $\hat d$ when the underlying $d$ is small. Second, $\log\hat d$ is undefined when $\hat d \le 0$.

In summary, this paper proposes three asymptotic confidence intervals for the SD between successive infections. The interval estimator using the asymptotic likelihood ratio test consistently performs well in a variety of situations. However, its application involves iterative numerical calculation.
When the probability of the underlying primary infection is moderate (say, 0.30) and the SD is within 0.10, we may use the interval estimator based on Fieller's theorem. On the other hand, when the probability of the underlying primary infection is high (say, 0.80), we may apply the interval estimator using Wald's test statistic.

Acknowledgements

The author wishes to thank the referee for many helpful and valuable comments that improved the clarity of this paper. This work was supported in part by grant #R01-HS07161 from the Agency for Health Care Policy and Research.

Appendix

For a given sample vector $(n_{11}, n_{12}, n_{22})$, the log-likelihood is

$$
\log L = C + n_{11}\{\log p_{1\cdot} + \log(p_{1\cdot} - d)\} + n_{12}\{\log p_{1\cdot} + \log(1 - p_{1\cdot} + d)\} + n_{22}\log(1 - p_{1\cdot}).
$$

The MLEs of $p_{1\cdot}$ and $d$ are then simply the roots for $p_{1\cdot}$ and $d$ of the following two equations:

$$
\frac{\partial \log L}{\partial p_{1\cdot}} = n_{11}\Big\{\frac{1}{p_{1\cdot}} + \frac{1}{p_{1\cdot} - d}\Big\} + n_{12}\Big\{\frac{1}{p_{1\cdot}} - \frac{1}{1 - p_{1\cdot} + d}\Big\} - \frac{n_{22}}{1 - p_{1\cdot}} = 0 \tag{A.1}
$$

and

$$
\frac{\partial \log L}{\partial d} = -\frac{n_{11}}{p_{1\cdot} - d} + \frac{n_{12}}{1 - p_{1\cdot} + d} = 0. \tag{A.2}
$$

We can easily show that the MLEs are $\hat p_{1\cdot} = (n_{11} + n_{12})/n$ and $\hat d = \hat p_{1\cdot} - \hat p_{11}/\hat p_{1\cdot}$. Furthermore,

$$
\frac{\partial^2 \log L}{\partial p_{1\cdot}^2} = -n_{11}\Big\{\frac{1}{p_{1\cdot}^2} + \frac{1}{(p_{1\cdot} - d)^2}\Big\} - n_{12}\Big\{\frac{1}{p_{1\cdot}^2} + \frac{1}{(1 - p_{1\cdot} + d)^2}\Big\} - \frac{n_{22}}{(1 - p_{1\cdot})^2}, \tag{A.3}
$$

$$
\frac{\partial^2 \log L}{\partial d^2} = -\frac{n_{11}}{(p_{1\cdot} - d)^2} - \frac{n_{12}}{(1 - p_{1\cdot} + d)^2}, \tag{A.4}
$$

$$
\frac{\partial^2 \log L}{\partial p_{1\cdot}\,\partial d} = \frac{n_{11}}{(p_{1\cdot} - d)^2} + \frac{n_{12}}{(1 - p_{1\cdot} + d)^2}. \tag{A.5}
$$

We substitute the MLEs $\hat p_{1\cdot}$ and $\hat d$ for the corresponding parameters in (A.3)–(A.5) and invert the observed information matrix. This gives the estimate of the asymptotic variance of the MLE $\hat d$ as $\{\hat p_{11}\hat p_{12}/\hat p_{1\cdot}^3 + \hat p_{1\cdot}(1 - \hat p_{1\cdot})\}/n$. Note that for a given fixed $d_0$ with $-1 < d_0 < 1$, as $p_{1\cdot}$ increases from $\max\{0, d_0\}$ to $\min\{1, 1 + d_0\}$, the value of $\partial \log L/\partial p_{1\cdot}$ on the left-hand side of equation (A.1) decreases from $\infty$ to $-\infty$. Furthermore, (A.1) is a continuous function over $\max\{0, d_0\} < p_{1\cdot} < \min\{1, 1 + d_0\}$.
These facts imply the following. For a given fixed $d_0$ with $-1 < d_0 < 1$, the conditional MLE $\hat p_{1\cdot}(d_0)$ of $p_{1\cdot}$ is simply the unique root for $p_{1\cdot}$ (in the range $\max\{0, d_0\} < p_{1\cdot} < \min\{1, 1 + d_0\}$) of equation (A.1), with $d$ replaced by $d_0$.

References

Agresti, A., 1990: Categorical Data Analysis. Wiley, New York.
Anbar, D., 1983: On estimating the difference between two probabilities, with special reference to clinical trials. Biometrics 39, 257–262.
Anbar, D., 1984: Confidence bounds for the difference between two probabilities (reply to letter). Biometrics 40, 1176.
Anderson, T. W., 1958: An Introduction to Multivariate Statistical Analysis. Wiley, New York.
Beal, S. L., 1987: Asymptotic confidence intervals for the difference between two binomial parameters for use with small samples. Biometrics 43, 941–950.
Casella, G. and Berger, R. L., 1990: Statistical Inference. Duxbury, Belmont, California.
Hauck, W. W. and Anderson, S., 1986: A comparison of large sample confidence interval methods for the difference of two binomial probabilities. The American Statistician 40, 318–322.
Katz, D., Baptista, J., Azen, S. P., and Pike, M. C., 1978: Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics 34, 469–474.
Lui, K.-J., 1995: Confidence intervals for the risk ratio in cohort studies under inverse sampling. Biometrical Journal 37, 965–971.
Lui, K.-J., 1996: Notes on confidence limits for the odds ratio in case-control studies under inverse sampling. Biometrical Journal 38, 221–229.
Lui, K.-J., 1998: Interval estimation of risk ratio between the secondary infection given the primary infection and the primary infection. Biometrics 54, 706–711.
Mee, R. W., 1984: Confidence bounds for the difference between two probabilities. Biometrics 40, 1175–1176.
Miettinen, O. and Nurminen, M., 1985: Comparative analysis of two rates. Statistics in Medicine 4, 213–226.
Santner, T. J. and Snell, M. K., 1980: Small-sample confidence intervals for p1 - p2 and p1/p2 in 2 × 2 contingency tables. Journal of the American Statistical Association 75, 386–394.
Thomas, D. G. and Gart, J. J., 1977: A table of exact confidence limits for differences and ratios of two proportions and their odds ratios. Journal of the American Statistical Association 72, 73–76.
SAS Institute, Inc., 1990: SAS Language, Version 6, 1st edition. Cary, North Carolina.
Wallenstein, S., 1997: A non-iterative accurate asymptotic confidence interval for the difference between two proportions. Statistics in Medicine 16, 1329–1336.

Kung-Jong Lui
Department of Mathematical Sciences
College of Sciences
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-7720
USA

Received, November 1997
Revised, August 1999
Accepted, August 1999