Scandinavian Journal of Statistics, Vol. 36: 671–685, 2009
doi: 10.1111/j.1467-9469.2009.00651.x
© 2009 Board of the Foundation of the Scandinavian Journal of Statistics. Published by Blackwell Publishing Ltd.
Empirical Likelihood Confidence Intervals
for Response Mean with Data Missing at
Random
LIUGEN XUE
College of Applied Sciences, Beijing University of Technology
ABSTRACT. A kernel regression imputation method for missing response data is developed.
A class of bias-corrected empirical log-likelihood ratios for the response mean is defined. It is
shown that any member of our class of ratios is asymptotically chi-squared, and the corresponding
empirical likelihood confidence interval for the response mean is constructed. Our ratios share
some of the desired features of the existing methods: they are self-scale invariant and no plug-in
estimators for the adjustment factor and asymptotic variance are needed; when estimating the
non-parametric function in the model, undersmoothing to ensure root-n consistency of the estimator
for the parameter is avoided. Since the range of bandwidths contains the optimal bandwidth for
estimating the regression function, the existing data-driven algorithm is valid for selecting an
optimal bandwidth. We also study the normal approximation-based method. A simulation study
is undertaken to compare the empirical likelihood with the normal approximation method in terms
of coverage accuracies and average lengths of confidence intervals.
Key words: bandwidth, confidence interval, empirical likelihood, kernel regression imputation
method, missing at random, response mean
1. Introduction
Missing response data often arise in various experimental settings, including market research
surveys, medical studies, opinion polls and socioeconomic investigations. Statistical analysis
with missing data is a very difficult task since in most cases the missing data themselves
contain little or no information about the missing data mechanism (MDM). The fundamental
and most widely used assumption about the MDM is that it is a missing at random (MAR)
model (Rubin, 1976). The basic idea of MAR is that the probability that a response variable
is observed can depend only on the values of those other variables that have been observed.
This concept has been extensively studied, and effective computational methods for handling
missing data under the MAR assumption have been well developed.
Let X be a d-dimensional vector of factors and let Y be a response variable influenced by
X. In practice, one often obtains a random sample of incomplete data {(Xi, Yi, δi); 1 ≤ i ≤ n},
where all the Xi's are observed and δi = 0 if Yi is missing, otherwise δi = 1. This class of sample
missing data can arise because of a double or two-phase sampling scheme first proposed by
Neyman (1938). The data may also arise from other distinctive sources. Typically, they may
occur in any experimental situation where the treatment is susceptible to contamination or
subject mortality.
To estimate the mean of Y for the missing data {(Xi, Yi, δi); 1 ≤ i ≤ n}, a common method
is to impute (i.e. fill in) a plausible value for each missing datum and then construct an
estimator from the imputed data as if they were complete data. Cheng (1994) applied kernel
regression imputation to estimate the mean of Y, say θ. He gave an estimator of θ, say θ̂C,
and established the asymptotic normality of a modified version of θ̂C under the assumption
that the Y values are MAR. Hahn (1998) established the semi-parametric efficiency bound
for the estimation of θ, and constructed an estimator based on the propensity score p(x) that
achieves the bound. These results can be used to perform interval estimation and hypothesis
testing on θ. Other related works include Wang & Rao (2002a,b) and Chen et al. (2006).
A competitive method for constructing a confidence interval for θ is the empirical likelihood method, introduced by Owen (1988, 1990). It has many advantages over other methods
such as those based on normal approximations or the bootstrap (Hall & La Scala, 1990).
Many authors have developed methods for non- and semi-parametric regression models. Some
related works include: Chen & Hall (1993), Kitamura (1997), Chen & Sitter (1999), Peng
(2004), Wang et al. (2004), Zhu & Xue (2006), Xue & Zhu (2006, 2007a,b), Stute et al.
(2007), among others. Qin & Zhang (2007) employed an empirical likelihood method to seek
a constrained empirical likelihood estimation of the response mean with the assumption that
responses are MAR. With the non-parametric kernel regression imputation scheme, Wang &
Rao (2002a) developed imputed empirical likelihood approaches for constructing confidence
intervals for θ. Their main idea is to first impute the missing Y-values by kernel regression
imputation and then construct a complete-data empirical likelihood for θ from the imputed
data set as if they were independent and identically distributed (i.i.d.) observations. However,
the imputed data are not i.i.d. because the plug-in estimator is used. As a consequence,
the empirical log-likelihood ratio under imputation is asymptotically distributed as a scaled
chi-square variable. Therefore, the empirical log-likelihood ratio cannot be applied directly to
make a statistical inference on θ. This motivates them to adjust the empirical log-likelihood
ratio so that the adjusted empirical log-likelihood ratio is asymptotically chi-squared. The
adjustment is to multiply by an adjustable factor to get the adjusted empirical likelihood ratio.
However, there are two issues: the first issue is that the adjustment factor is very complicated
and contains several unknowns to be estimated; the second issue is that the undersmoothing
for estimating the unknown function brings a difficulty in selecting bandwidths. In addition,
we need to point out that theorem 2.1 in Hjort et al. (2009) cannot be applied directly in
practice although they have provided a general framework for the ratio based on plug-in
estimation, because their theorem does not answer how to construct an auxiliary random
vector for a special model.
In this paper, we construct a weight-corrected empirical log-likelihood ratio for θ such
that the ratio is asymptotically chi-squared. With auxiliary information, we also construct
a weight-corrected empirical log-likelihood ratio for θ, and it is shown that this ratio has
an asymptotic chi-squared distribution. To compare the empirical likelihood method with
the normal approximation method, we also construct a weighted estimator and a maximum
empirical likelihood estimator for θ, and derive their asymptotic behaviour. Our results can
be used directly to construct confidence intervals for θ. Zhu & Xue (2006) proposed the
bias-corrected method for constructing the empirical likelihood ratio. One main feature of
our approach is to directly calibrate the empirical likelihood ratio so that the resulting ratio
is asymptotically chi-squared. As the ratio does not need to be multiplied by an adjustment
factor, this avoids a difficulty in estimating an unknown adjustment factor. This is especially
attractive in some cases when the adjustment factor is difficult to estimate efficiently. In
addition, we do not need undersmoothing in selecting bandwidth. The range of bandwidths
contains the optimal bandwidth for estimating the regression function, and the existing
data-driven algorithm is valid for selecting an optimal bandwidth.
The rest of this paper is organized as follows. In section 2, our methods are elaborated, and
some of our main results are given. In section 3, a simulation study is conducted to compare
the empirical likelihood with the normal approximation method in terms of coverage
accuracies and average lengths of confidence intervals. In section 4, the concluding remarks
are given. Proofs of the theorems are relegated to the Appendix.
2. Methods and results
Throughout this paper, we make the MAR assumption for the Y values. The MAR assumption
implies that δ and Y are conditionally independent given X; that is,

P(δ = 1 | Y, X) = P(δ = 1 | X),

denoted by p(x). p(·) is called the selection probability function.
2.1. Weight-corrected empirical likelihood
To construct the empirical likelihood ratio function for θ, Wang & Rao (2002a) applied kernel
regression imputation to introduce the auxiliary random variables

Ỹi = δi Yi + (1 − δi) m̂b(Xi), i = 1, . . ., n,   (1)

where m̂b(x) is a truncated version of the estimator of m(x) = E(Y | X = x); that is,

m̂b(x) = (nh^d)^{−1} Σ_{i=1}^n δi Yi Kh(Xi − x) / max{b, (nh^d)^{−1} Σ_{i=1}^n δi Kh(Xi − x)}.   (2)

Here each of h = hn and b = bn is a sequence of positive constants tending to zero, while
Kh(·) = K(·/h), and K(·) is a kernel function. Using Ỹi, Wang & Rao (2002a) constructed an
estimated empirical log-likelihood ratio function, say l̃(θ). However, the asymptotic
distribution of l̃(θ) is not standard chi-squared; in fact, l̃(θ) is asymptotically distributed as
a scaled chi-square variable with one degree of freedom. Thus, l̃(θ) must be adjusted because
it cannot be used directly to make a statistical inference for θ. The adjustment is to multiply
by an adjustable factor that is estimated. Now, we directly construct a weight-corrected empirical log-likelihood ratio statistic for θ such that the statistic is asymptotically chi-square distributed without the need for an adjustment factor. Since Ỹi contains the estimator m̂b(Xi),
there exists the bias m̂b(Xi) − m(Xi) in Ỹi. To reduce the bias, we use the approach of weighted
imputation. Therefore, a new auxiliary variable Ŷi, i = 1, . . ., n, depending on estimated
response probabilities p̂(Xi), is defined by

Ŷi = δi Yi / p̂(Xi) + (1 − δi / p̂(Xi)) m̂b(Xi),   (3)

where m̂b(x) is defined in (2), and p̂(x) is the estimator of p(x); that is,

p̂(x) = Σ_{i=1}^n δi La(Xi − x) / max{1, Σ_{i=1}^n La(Xi − x)}.   (4)
Here a = an is a sequence of positive constants tending to zero, while La(·) = L(·/a), and L(·) is
a kernel function. Therefore, a weight-corrected empirical log-likelihood ratio function for θ is
defined as:

l̂(θ) = −2 max Σ_{i=1}^n log(n pi),

where the maximum is taken over all sets of non-negative numbers p1, . . ., pn that sum to
1 and such that Σ_{i=1}^n pi Ŷi = θ. By the Lagrange multiplier method, when min_{1≤i≤n} Ŷi < θ <
max_{1≤i≤n} Ŷi, the ratio l̂(θ) can be represented as:

l̂(θ) = 2 Σ_{i=1}^n log(1 + λ(Ŷi − θ)),   (5)

where λ = λ(θ) is the solution of the equation

Σ_{i=1}^n (Ŷi − θ) / {1 + λ(Ŷi − θ)} = 0.   (6)

Since the bias in (3) is corrected, it can be derived that l̂(θ) has an asymptotically chi-squared
distribution. This result is given in theorem 1.
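As a concrete illustration of the imputation steps (1)–(4), here is a minimal Python sketch on simulated MAR data. The function names (`epanechnikov`, `m_hat_b`, `p_hat`), the toy model and all tuning values are our own illustrative choices, not part of the paper, and p̂ is floored at a small constant purely for numerical safety in this toy example:

```python
import numpy as np

rng = np.random.default_rng(0)

def epanechnikov(u):
    # K(u) = 0.75 (1 - u^2) I{|u| <= 1}
    return 0.75 * np.maximum(0.0, 1.0 - u**2)

def m_hat_b(x, X, Y, delta, h, b):
    # Truncated Nadaraya-Watson estimator (2), complete cases only;
    # the denominator is bounded below by b > 0.
    K = epanechnikov((X - x) / h)
    num = np.sum(delta * Y * K) / (len(X) * h)
    den = np.sum(delta * K) / (len(X) * h)
    return num / max(b, den)

def p_hat(x, X, delta, a):
    # Kernel estimator (4) of the selection probability, with
    # L(u) = 0.5 I{|u| <= 1} as in the simulation study.
    L = 0.5 * (np.abs((X - x) / a) <= 1.0)
    return np.sum(delta * L) / max(1.0, np.sum(L))

# Toy MAR data: Y = (X - 1)^2 + noise, Y observed with probability 0.8
n = 300
X = rng.normal(1.0, 1.0, n)
Y = (X - 1.0)**2 + rng.normal(0.0, 0.4, n)
delta = (rng.random(n) < 0.8).astype(float)

h, a, b = n**(-0.2), n**(-0.2), n**(-0.125)
m_imp = np.array([m_hat_b(x, X, Y, delta, h, b) for x in X])
p_imp = np.maximum([p_hat(x, X, delta, a) for x in X], 0.1)  # numerical floor

# Weighted imputation (3)
Y_hat = delta * Y / p_imp + (1.0 - delta / p_imp) * m_imp
print(Y_hat.mean())  # point estimate of the response mean
```

The sample mean of the Ŷi is the weighted imputation estimator discussed in subsection 2.3.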
Denote by f(x) and F(x) the probability density and distribution function of X, respectively. Let g(x) = p(x)f(x). Assume that ‖Z‖ = Σ_{i=1}^d |zi| for any vector Z = (z1, . . ., zd)^T. The
following conditions are needed for our results.
(C1) The selection probability function p(x), the X-density f(x) and m(x) all have bounded
partial derivatives up to order r with r ≥ 2 and r > d/2, and inf_x p(x) > 0.
(C2) sup_x E(Y² | X = x) < ∞.
(C3) √n E[|m(X)| I{g(X) < 2b}] → 0, where b is defined as in m̂b(x).
(C4) √n P{‖X‖ > Mn} → 0, where 0 < Mn → ∞.
(C5) K(·) is a non-negative and bounded kernel function of order r with compact
support, where r ≥ 2 and r > d/2.
(C6) L(·) is a bounded kernel function of order r with r ≥ 2 and r > d/2, and
c1 I{‖u‖ ≤ ρ} ≤ L(u) ≤ c2 I{‖u‖ ≤ ρ}
for some finite constants ρ > 0 and c2 ≥ c1 > 0.
(C7) nh^{2d} b⁴ → ∞ and nh^{4r} b^{−4} → 0, where r is the order of the kernel K.
(C8) na^{2d} Mn^{−2d} → ∞ and na^{4r} → 0, where r is the order of the kernel L.
Remark 1. Conditions (C1), (C2), (C5) and (C6) are standard assumptions for non-parametric regression problems. In particular, p(x) being bounded away from zero in (C1) implies
that data cannot be missing with probability 1 anywhere in the domain of the X variable.
Conditions (C3) and (C4) are commonly used for avoiding the boundary problem. Condition (C3)
has been used by Zhu & Fang (1996) and Wang & Rao (2002a). Condition (C4) can be satisfied
in the following three cases: (a) the distribution of X has compact support; (b) X has a density
function f(x), and there exist positive constants c and γ such that f(x) ≤ c exp(−x^γ) when x
is large enough; (c) X has a density function f(x), and there exist positive constants c and γ such
that f(x) ≤ c x^{−γ} when x is large enough. For example, the uniform distribution satisfies (a),
the normal and exponential distributions satisfy (b) and the Cauchy distribution satisfies (c). Also,
conditions (C3) and (C4) are simultaneously satisfied in the following two cases used in the
simulation study: (i) X follows a truncated normal distribution; (ii) X follows a standard
exponential distribution, and m(X) is proportional to exp(−cX) for c > 0 such that √n b^{1+c} → 0
and Mn = 2 ln n. In condition (C7), nh^{4r} b^{−4} → 0 is required to control the bias induced by kernel
smoothing, whereas nh^{2d} b⁴ → ∞ leads to consistent estimation of m(x). When we assume that
b = O(n^{−κ}) for some small 0 < κ < 1/4, condition (C7) means that the convergence rate of h has a
range between c n^{−(1−4κ)/(2d)} and c̄ n^{−(1+4κ)/(4r)} for some positive constants c < c̄. Thus, when
estimating the regression function, the optimal convergence rate n^{−1/(2r+d)} is within the range. For
instance, in the univariate case with d = 1, r = 2 and κ = 1/12, the range is between c n^{−1/3} and c̄ n^{−1/6},
and the optimal bandwidth c0 n^{−1/5} lies within the range, where c0 is a positive constant. Therefore, the optimal bandwidth can be chosen by using the cross-validation method.
Condition (C8) can be explained similarly. It is worth pointing out that condition (C7) relaxes
condition (C.hn) in Wang & Rao (2002a), and overcomes the difficulty in selecting bandwidths.
Let →_D denote convergence in distribution, and let χ²_r be a chi-square variable with r
degrees of freedom. Theorem 1 shows that l̂(θ) is asymptotically chi-square distributed with
one degree of freedom.

Theorem 1
Suppose that conditions (C1)–(C8) hold. If θ is the true parameter, then l̂(θ) →_D χ²₁.

Let χ²₁(1 − α) be the 1 − α quantile of χ²₁ for 0 < α < 1. Using theorem 1, we obtain an
approximate 1 − α confidence interval for θ, defined by

I(α) = {θ̃ | l̂(θ̃) ≤ χ²₁(1 − α)}.

Theorem 1 can also be used to test the hypothesis H0: θ = θ0. One could reject H0 at level α
if l̂(θ0) > χ²₁(1 − α).
For the confidence interval I(α), the imputed file should provide the response probabilities
p̂(Xi) in order to compute the Ŷi given by (3). In this way, the confidence interval I(α) can be
computed.
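Computationally, the interval I(α) is obtained by solving (6) for λ at each trial value of θ and collecting the values of θ at which (5) stays below the χ²₁ quantile. A hedged Python sketch follows; the bisection solver and the grid search are our illustrative implementation, and the simulated `y_hat` merely stands in for the Ŷi of (3):

```python
import numpy as np

def el_log_ratio(y_hat, theta):
    """Empirical log-likelihood ratio (5): solve (6) for lambda by bisection."""
    z = y_hat - theta
    if z.min() >= 0 or z.max() <= 0:        # theta outside (min Y_hat, max Y_hat)
        return np.inf
    # g(lam) = sum z_i / (1 + lam z_i) is strictly decreasing on the feasible
    # range (-1/z_max, 1/|z_min|), where 1 + lam z_i > 0 for every i.
    lo = (-1.0 + 1e-10) / z.max()
    hi = (-1.0 + 1e-10) / z.min()
    for _ in range(200):                     # bisection on g(lam) = 0
        lam = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + lam * z)) > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(1)
y_hat = 1.0 + rng.normal(0.0, 0.5, 100)      # stand-in for the Y_hat_i of (3)

# Invert the ratio: I(alpha) = {theta : l(theta) <= chi2_1(0.95) = 3.8415}
grid = np.linspace(y_hat.mean() - 0.5, y_hat.mean() + 0.5, 401)
inside = [t for t in grid if el_log_ratio(y_hat, t) <= 3.8415]
ci = (min(inside), max(inside))
print(ci)
```

The bracketing keeps 1 + λ(Ŷi − θ) strictly positive for every i, so the logarithms in (5) are always defined.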
The curse of dimensionality is an issue with the kernel estimators m̂b(x) and p̂(x) when the
dimension d of X is high. Since the target of the inference is the finite-dimensional θ rather than
m(x) and p(x), the curse of dimensionality will not affect the small- to moderate-sample
performance of the proposed estimator as long as the biases of the kernel estimators are
controlled. When d ≥ 4, controlling the bias requires a kernel of order r > 2, the so-called
high-order kernel, so that nh^{4r} b^{−4} → 0 and na^{4r} → 0 instead of nh⁸ b^{−4} → 0 and na⁸ → 0 when a conventional second-order kernel is used. Using a high-order kernel may occasionally cause p̂(x)
not to be a proper selection probability function, as the kernel function L(·) may be negative.
In this case, we can re-adjust the weights in p̂(x) by a method similar to that used by Hall &
Murison (1993) for high-order kernel density estimators.
Suppose that we have an auxiliary parametric model for p(x), re-denoted by p(x, β), where
β is a q × 1 unknown parameter vector. Then we can use a parametric estimation method to
get an estimator of β, say β̂. Hence we can obtain an auxiliary variable Y̌i by substituting
p(Xi, β̂) for p̂(Xi) in (3), and obtain a weight-corrected empirical likelihood ratio ľ(θ) by
substituting Y̌i for Ŷi in l̂(θ). It can be shown that ľ(θ) and l̂(θ) have the same asymptotic
chi-square distribution. The choice of p(x, β) can be the logistic regression function. Similarly, we can
also assume that p(x) is a semi-parametric regression function, and the corresponding
results can be derived. In either of these cases, the method does not require high-dimensional
smoothing operations.
2.2. Weight-corrected empirical likelihood with auxiliary information
We assume that auxiliary information on X of the form E{A(X )} = 0 is available, where
A(X ) = (A1 (X ), . . ., Aq (X ))T , q > 0, is a known vector (or scalar) function, for example, when
the mean or median of X is known in the scalar X case.
By using the auxiliary information, an empirical log-likelihood ratio for θ is defined as:

l̂AI(θ) = −2 max Σ_{i=1}^n log(n p̃i),

where the maximum is taken over all sets of non-negative numbers p̃1, . . ., p̃n that sum to 1
and such that Σ_{i=1}^n p̃i A(Xi) = 0 and Σ_{i=1}^n p̃i Ŷi = θ.
Denote ηi(θ) = (A^T(Xi), Ŷi − θ)^T. Provided that the origin is inside the convex hull of the
points η1(θ), . . ., ηn(θ), the method of Lagrange multipliers leads to the representation

l̂AI(θ) = 2 Σ_{i=1}^n log(1 + λ^T ηi(θ)),   (7)

where λ satisfies

(1/n) Σ_{i=1}^n ηi(θ) / {1 + λ^T ηi(θ)} = 0.   (8)

We have the following result.

Theorem 2
Suppose that conditions (C1)–(C8) hold, and let E{A(X)A^T(X)} be a positive definite matrix.
If θ is the true parameter, then l̂AI(θ) →_D χ²_{q+1}.

Similar to theorem 1, the result of theorem 2 can be used to construct a confidence interval
for θ. It may be noted that l̂AI(θ) reduces to the ratio l̂(θ) of subsection 2.1 in the case of no
auxiliary information.
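The representation (7)–(8) can be evaluated by a damped Newton iteration on the multiplier λ. Below is a hedged Python sketch with simulated data; `el_ratio_aux`, the toy `y_hat` and the step-damping rule are our own illustrative choices, with the scalar auxiliary constraint A(X) = X − 1 coming from E(X) = 1:

```python
import numpy as np

def el_ratio_aux(eta, iters=50):
    """Log-EL ratio (7): find lambda solving (8) by damped Newton steps.

    eta is the (n, q+1) array whose rows are eta_i(theta)."""
    n, k = eta.shape
    lam = np.zeros(k)
    for _ in range(iters):
        w = 1.0 + eta @ lam                      # must stay positive
        F = (eta / w[:, None]).mean(axis=0)      # left-hand side of (8)
        J = -(eta[:, :, None] * eta[:, None, :]
              / (w**2)[:, None, None]).mean(axis=0)
        step = np.linalg.solve(J, -F)            # Newton direction
        t = 1.0                                  # damp so 1 + lam'eta_i > 0
        while (1.0 + eta @ (lam + t * step)).min() <= 1e-8:
            t *= 0.5
        lam = lam + t * step
        if np.abs(F).max() < 1e-10:
            break
    return 2.0 * np.sum(np.log1p(eta @ lam))

rng = np.random.default_rng(2)
n = 150
X = rng.normal(1.0, 1.0, n)
y_hat = (X - 1.0)**2 + rng.normal(0.0, 0.3, n)   # stand-in for the Y_hat_i
theta = 1.0
eta = np.column_stack([X - 1.0, y_hat - theta])  # (A(X_i), Y_hat_i - theta), q = 1
l_ai = el_ratio_aux(eta)
print(l_ai)  # compare with the chi^2_2 quantile 5.991 at level 0.95
```

At the true θ the printed value should behave like a χ²_{q+1} = χ²₂ draw, in line with theorem 2.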
2.3. Normal approximation-based method
We now turn to the estimation of θ. The practical motivation for imputation is to provide
users with a completed (or imputed) data file with the missing Yi replaced by m̂b(Xi). The user
then computes the estimate of θ from the imputed file {(Ỹi, δi); 1 ≤ i ≤ n}, where Ỹi is
defined in (1). Note that Xi may be available only to the imputer and not reported on the
data file. Using the imputed data file, Wang & Rao (2002a) proposed the following estimator
for θ:

θ̂WR = (1/n) Σ_{i=1}^n Ỹi.   (9)

The estimator is similar to the estimator θ̂C proposed in Cheng (1994). It is shown that θ̂WR
and θ̂C have the same asymptotic variance (Wang & Rao, 2002a).
We propose a weighted imputation estimator for θ, which is defined by

θ̂WI = (1/n) Σ_{i=1}^n Ŷi,   (10)

where the Ŷi are defined in (3). Alternatively, a variant of θ̂WI previously considered by Cheng
(1990) is the sample average of all the regression estimates; that is,

θ̃ = (1/n) Σ_{i=1}^n m̂(Xi),

where m̂(·) is the Nadaraya–Watson kernel estimator of m(·) based on (Xi, Yi) for
i ∈ {i : δi = 1}. Compared with the estimator θ̃, our estimator θ̂WI fully employs the information
in the sample {(Xi, Yi, δi); 1 ≤ i ≤ n}.
Remark 2. If the analyst is using the incomplete data file {(Xi, Yi, δi); 1 ≤ i ≤ n}, then
imputation is not needed, and in this case the objective is to give an efficient estimator of θ using the auxiliary variable Xi observed on all the units in the sample. Under this scenario,
our estimator θ̂WI is simply a difference estimator under two-phase sampling used in the
survey context, where simple random sampling is used in the first phase and Poisson sampling
with probabilities p̂(Xi) in the second phase.
We may also maximize {−l̂(θ)} to obtain an estimator for the parameter θ, say θ̂ME, called
the maximum empirical likelihood estimator. It can be shown that

θ̂ME = θ̂WI + oP(n^{−1/2}),   (11)

that is, θ̂WI and θ̂ME are asymptotically equal. The asymptotic normality of θ̂WI and θ̂ME is
given in theorem 3.

Theorem 3
Suppose that conditions (C1)–(C8) hold. Then

√n (θ̂ − θ) →_D N(0, V),

where θ̂ can be taken to be θ̂WI or θ̂ME, and V = E{σ²(X)/p(X)} + var(m(X)) with
σ²(x) = var(Y | X = x).

It has been shown by Robins et al. (1995) and Hahn (1998) that V is the lower bound for
the asymptotic variance of any regular estimator in a semi-parametric missing data problem.
From theorem 3 and lemma A.1 in Wang & Rao (2002a), it follows that θ̂WR and θ̂WI
defined by (9) and (10) have the same asymptotic variance, and hence theorem 3 is also valid
for the estimator θ̂WR. Therefore, we recommend θ̂WR as a point estimator because it can be
computed from the imputed data file.
By the ‘plug-in’ method, we can define a consistent estimator of the asymptotic variance
V of θ̂, say V̂; that is,

V̂ = (1/n) Σ_{i=1}^n (Ŷi − θ̂)²,

where θ̂ is taken to be θ̂WR or θ̂WI. The estimator V̂ is simpler than the estimator V̂n(θ̂)
defined by Wang & Rao (2002a). From theorem 3, we obtain

√n (θ̂ − θ)/V̂^{1/2} →_D N(0, 1).

Using this result, we obtain a normal approximation-based confidence interval for θ, namely

(θ̂ − z_{1−α/2} √(V̂/n), θ̂ + z_{1−α/2} √(V̂/n)),

where z_{1−α/2} is the 1 − α/2 quantile of the standard normal distribution, and θ̂ is taken to be
θ̂WR or θ̂WI.
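A hedged sketch of this normal approximation interval; the simulated `y_hat` stands in for the imputed Ŷi, and the hard-coded constant is z_{0.975}:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
y_hat = 1.0 + rng.normal(0.0, 0.6, n)        # stand-in for the imputed Y_hat_i

theta_hat = y_hat.mean()                      # point estimate, as in (10)
V_hat = np.mean((y_hat - theta_hat)**2)       # plug-in variance estimator
z = 1.959964                                  # z_{0.975}
half = z * np.sqrt(V_hat / n)
ci = (theta_hat - half, theta_hat + half)
print(ci)
```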
3. Simulations
In this section, we present a simulation study to compare five methods in terms of coverage
accuracies and average lengths of the confidence intervals based on them. The five methods are:
the weight-corrected empirical likelihood (WCEL) proposed in subsection 2.1; the adjusted
empirical likelihood (AEL) suggested in Wang & Rao (2002a); the weight-corrected empirical likelihood with auxiliary information (WCELA) introduced in subsection 2.2; the adjusted
empirical likelihood with auxiliary information (AELA) proposed in Wang & Rao (2002a);
and the normal approximation (NA) methods based on θ̂WI and θ̂WR. For convenience, in
what follows, NA(θ̂WI) and NA(θ̂WR) denote the corresponding normal approximation confidence intervals based on θ̂WI and θ̂WR.
The first regression model is:

Y = (X − 1)² + |X| ε,   (12)
where X follows the truncated normal distribution with truncation constant 4, in which the
normal distribution has mean 1 and variance 1, and ε follows the normal distribution with
mean 0 and variance 0.16. The kernel functions K (x) and L(x) were, respectively, taken to
be 0.75(1 − x2 )I {|x| ≤ 1} and 0.5I {|x| ≤ 1}, where I {·} is the indicator function. We used the
cross-validation method to select the optimal bandwidth of h. However, since such a selection
involves the value of b, we have to consider the selection of b when we choose h. Looking
back at b defined in m̂b (x), its function is to avoid the technical problems at the boundary of
the support domain of X. Clearly, if the density function of X , f (x), is bounded away from
zero, b can be selected as a small positive value. In practice, when we have a dataset, the
values of the density at these data points are non-zero in many cases. Therefore, the selection
of b is less important than that of h. This observation leads us to propose the following
approach: specify a value of b, say b̃ = n^{−1/8}. The cross-validation criterion is given by

cv(h) = (1/n) Σ_{i=1}^n δi {Yi − m̂_b̃^{(−i)}(Xi; h)}²,

where m̂_b̃^{(−i)}(Xi; h) is the Nadaraya–Watson estimator of m(Xi) that is computed when the ith
observation is deleted. A cross-validation bandwidth hcv is then obtained by minimizing
cv(h) with respect to h; that is, hcv = arg min_{h>0} cv(h). The optimal bandwidth of a, say acv, can
also be selected by using the cross-validation criterion. Mn was taken to be 2 ln n. It is easily
shown that hcv , acv , b̃ and Mn selected by this approach satisfy conditions (C3), (C4), (C7)
and (C8). Therefore, we use the bandwidths hcv and acv to compute the WCEL and WCELA
ratios, and use the bandwidth hcv n−2/15 to compute the AEL and AELA ratios, because the
AEL and AELA methods require undersmoothing the regression estimate.
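The cross-validation step can be sketched as follows. This is a hedged toy implementation: the grid, sample size and model are our illustrative choices; the leave-one-out fit zeroes the diagonal of the kernel matrix, and the criterion averages squared errors over the complete cases:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 150
X = rng.normal(1.0, 1.0, n)
Y = (X - 1.0)**2 + rng.normal(0.0, 0.4, n)
delta = (rng.random(n) < 0.8).astype(float)       # response indicator

def nw_loo(h, b):
    """Leave-one-out truncated NW fits m_hat_b^{(-i)}(X_i; h), complete cases."""
    U = (X[:, None] - X[None, :]) / h
    K = 0.75 * np.maximum(0.0, 1.0 - U**2)        # Epanechnikov kernel
    np.fill_diagonal(K, 0.0)                      # delete the i-th observation
    num = K @ (delta * Y) / (n * h)
    den = np.maximum(b, K @ delta / (n * h))      # truncation as in (2)
    return num / den

def cv(h, b):
    # cv(h): average squared prediction error over the observed responses
    m = nw_loo(h, b)
    return np.sum(delta * (Y - m)**2) / n

b_tilde = n ** (-1/8)
grid = np.linspace(0.05, 1.0, 40)
h_cv = min(grid, key=lambda h: cv(h, b_tilde))
print(h_cv)
```

In the study itself, acv is chosen the same way, and the AEL/AELA runs then shrink hcv by the factor n^{−2/15} to undersmooth.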
We generated 5000 Monte Carlo random samples of size n = 30, 60 and 100 based on the
following three selection probability functions proposed by Wang & Rao (2002a).
Case 1. p1 (x) = 0.8 + 0.2|x − 1| if |x − 1| ≤ 1, and 0.95 elsewhere.
Case 2. p2 (x) = 0.9 − 0.2|x − 1| if |x − 1| ≤ 4.5, and 0.1 elsewhere.
Case 3. p3 (x) = 0.6 for all x.
The auxiliary information E(X) = 1 was used when we calculated the empirical coverage
and average lengths of the confidence intervals for WCELA and AELA. To assess whether or
not the coverage errors are symmetric between the two tails, we also report the percentage PL of
intervals whose lower limit is greater than the true value of θ and the percentage PR of
intervals whose upper limit is smaller than the true value of θ. The empirical coverage
in percentage, (PL, PR), and the average lengths of the confidence intervals, with a nominal level
1 − α = 0.95, were computed with 5000 simulation runs. The simulation results are reported
in Table 1.
From Table 1, we have the following observations.
(1) In the case where no auxiliary information is available, WCEL performs better than
AEL because the associated confidence intervals have uniformly shorter average lengths
and higher coverage accuracies. In addition, WCEL and AEL have slightly longer
interval lengths, but higher coverage probabilities, than NA(θ̂WI) and NA(θ̂WR). The
size of the Monte Carlo error is √(0.95 × 0.05/5000) ≈ 0.00308 for α = 0.05. The coverage probabilities for the empirical likelihood are close to the confidence levels claimed
when the sample size is 100.
(2) For the case when auxiliary information is available, we observe that the empirical
coverage levels for confidence intervals based on WCELA are uniformly higher than
those based on AELA, and the average lengths of the confidence intervals based on
Table 1. For model (12), the empirical coverage (EC) in percentage, indicators of symmetry (PL, PR) and
average lengths (AL) of confidence intervals for θ under different selection probability functions p(x) and
different sample sizes n when the nominal level is 0.95

p(x)   n    Feature    WCEL           AEL            WCELA          AELA           NA(θ̂WI)       NA(θ̂WR)
p1(x)  30   EC         91.52          91.42          90.56          90.48          90.20          90.16
            (PL, PR)   (2.58, 5.90)   (2.64, 5.94)   (1.86, 7.58)   (1.80, 7.72)   (1.06, 8.74)   (1.08, 8.76)
            AL         0.9865         0.9871         1.0575         1.0585         0.9774         0.9775
       60   EC         93.10          93.04          93.18          93.14          91.44          91.38
            (PL, PR)   (2.04, 4.86)   (2.08, 4.88)   (1.80, 5.02)   (1.80, 5.06)   (1.02, 7.54)   (1.04, 7.58)
            AL         0.7107         0.7110         0.8003         0.8010         0.7103         0.7103
       100  EC         94.12          94.08          95.48          95.46          93.12          93.10
            (PL, PR)   (2.06, 3.82)   (2.10, 3.82)   (1.46, 3.06)   (1.54, 3.00)   (1.06, 5.82)   (1.06, 5.84)
            AL         0.5542         0.5543         0.6495         0.6502         0.5540         0.5440
p2(x)  30   EC         90.92          90.72          89.16          89.06          88.98          88.52
            (PL, PR)   (2.06, 7.02)   (1.76, 7.52)   (1.36, 9.48)   (1.38, 9.56)   (0.82, 10.2)   (0.92, 10.56)
            AL         0.9881         0.9883         1.0641         1.0650         0.9796         0.9797
       60   EC         92.62          92.56          92.52          92.50          91.12          90.72
            (PL, PR)   (1.72, 5.66)   (1.58, 5.86)   (1.58, 5.9)    (1.46, 6.04)   (0.94, 7.94)   (0.92, 8.36)
            AL         0.7226         0.7227         0.8108         0.8112         0.7148         0.7148
       100  EC         93.66          93.58          94.68          94.60          92.62          91.68
            (PL, PR)   (1.78, 4.56)   (1.48, 4.94)   (1.38, 3.94)   (1.18, 4.22)   (0.90, 6.48)   (0.80, 7.52)
            AL         0.5628         0.5629         0.6535         0.6539         0.5583         0.5584
p3(x)  30   EC         90.50          90.20          89.12          88.98          88.88          88.64
            (PL, PR)   (2.38, 7.12)   (2.54, 7.26)   (1.92, 8.96)   (1.98, 9.04)   (0.94, 10.18)  (1.30, 10.06)
            AL         1.0164         1.0169         1.0944         1.0953         0.9806         0.9807
       60   EC         92.52          92.50          92.48          92.46          90.92          90.14
            (PL, PR)   (2.12, 5.36)   (2.04, 5.46)   (1.74, 5.78)   (1.86, 5.68)   (1.04, 8.04)   (1.24, 8.62)
            AL         0.7390         0.7393         0.8301         0.8308         0.7171         0.7172
       100  EC         93.58          93.52          94.58          94.56          92.32          91.82
            (PL, PR)   (1.94, 4.48)   (1.80, 4.68)   (1.46, 3.96)   (1.50, 3.94)   (0.82, 6.86)   (1.12, 7.06)
            AL         0.5751         0.5755         0.6673         0.6675         0.5603         0.5603

WCEL, weight-corrected empirical likelihood; AEL, adjusted empirical likelihood; WCELA, WCEL
with auxiliary information; AELA, AEL with auxiliary information.
WCELA are uniformly shorter than those based on AELA. Also, when n = 100, WCELA
obviously outperforms WCEL, which does not use the auxiliary information, and hence
also all the NA methods in terms of coverage accuracies of the confidence intervals. When n = 30 and 60,
WCELA performed poorly, because its ratio has a higher dimension than the WCEL ratio.
(3) The empirical likelihood confidence intervals have more balanced tail error rates than
the normal approximation confidence intervals as shown by the values of (PL , PR ) for
all the cases considered. The normal approximation-based confidence intervals
produced larger differences between PL and PR .
(4) All the empirical coverage accuracies increase and the average lengths decrease as n
increases. Also, the coverage accuracies and average lengths depend on the selection
probability function p(x). In case 1, all the methods generally perform better than in
the other two cases, because the missing rate of case 1 is lower than those of cases 2
and 3, where the average missing rates corresponding to the preceding three cases are
approximately 0.09, 0.26 and 0.40, respectively. Generally, the empirical coverage accuracies decrease and average lengths increase as the missing rate increases for every fixed
sample size. These findings basically agree with those that were discovered by Wang &
Rao (2002a).
Our simulation results for the case of normal X agree with those for the truncated normal
X case. Our regularity conditions are satisfied for the latter case, and it is interesting that the
results remain valid for the normal X case although it is not easy for us to prove that the normal
distribution satisfies conditions (C3) and (C4) for the polynomial model considered before.
Coverage accuracy and interval length seem not very sensitive to the choice of the trimming
constant b, although it is important. When b was taken to be n^{−1/7} and
n^{−1/9}, results similar to those in Table 1 were obtained.
We now consider the regression model with a two-dimensional covariate (X1, X2); that is,

Y = 5 exp(−0.3X1 − 1.2X2) + ε,   (13)

where ε ∼ N(0, 0.3²), and X1 and X2 are independent standard exponential variables with mean 1
and variance 1. The selection probability is taken as:

P(δ = 1 | X) = exp(1 + 0.7X1 + 3X2) / {1 + exp(1 + 0.7X1 + 3X2)}.   (14)
The kernel functions were taken to be the product kernel K (x1 , x2 ) = K0 (x1 )K0 (x2 ) and
L(x1 , x2 ) = L0 (x1 )L0 (x2 ), where K0 (x) = (15/16)(1 − x2 )2 I {|x| ≤ 1} and L0 (x) = 0.5I {|x| ≤ 1}.
The optimal bandwidths hopt and aopt were selected by using the cross-validation method.
We also used auxiliary information E(Xi ) = 1 when we calculated the empirical coverage and
average lengths of the confidence intervals for WCELA and AELA. We generated 5000 Monte
Carlo random samples of size n = 30, 50, 100 and 200. The empirical coverage in percentage,
(PL , PR ) and average lengths of the confidence intervals, with a nominal level 1 − = 0.95,
were computed with 5000 simulation runs. The simulation results are reported in Table 2.
From Table 2, we obtain the following conclusions. When no auxiliary information is
available, WCEL performs slightly better than AEL. Also, WCEL and AEL obviously outperform all the NA methods in terms of coverage accuracies of the confidence intervals, and WCEL and AEL
provide more balanced tail error rates than NA, but they have slightly longer interval lengths.
When auxiliary information is available, the average lengths of the confidence intervals based
on WCELA are shorter than those based on AELA, and the empirical coverage levels based
on WCELA are slightly higher than those based on AELA. Also, when n = 100 and 200,
WCELA and AELA perform better than the other four methods because the associated
confidence intervals have uniformly shorter average lengths and higher coverage accuracies.
Table 2. For model (13) and selection probability function (14), the empirical coverage (EC) in percentage, indicators of symmetry (PL, PR) and average lengths (AL) of the confidence intervals for θ under different sample sizes n when the nominal level is 0.95

n    Feature    WCEL          AEL           WCELA         AELA          NA(σ̂WI)      NA(σ̂WR)
30   EC         93.70         93.58         90.62         90.18         91.94         91.94
     (PL, PR)   (2.24, 4.06)  (2.18, 4.24)  (1.4, 7.98)   (1.32, 8.5)   (2.2, 5.86)   (2.16, 5.9)
     AL         0.8643        0.8649        0.6190        0.6197        0.8787        0.8788
50   EC         94.40         94.38         93.40         93.38         93.54         93.52
     (PL, PR)   (2.02, 3.58)  (2.02, 3.6)   (1.28, 5.32)  (1.22, 5.4)   (1.94, 4.52)  (1.9, 4.58)
     AL         0.6907        0.6910        0.5076        0.5090        0.6904        0.6904
100  EC         94.54         94.52         95.34         95.32         94.18         94.14
     (PL, PR)   (2.12, 3.34)  (2.1, 3.38)   (1.04, 3.62)  (1.0, 3.68)   (2.02, 3.8)   (2.02, 3.84)
     AL         0.4933        0.4935        0.3765        0.3774        0.4925        0.4926
200  EC         94.78         94.68         96.04         96.02         94.52         94.50
     (PL, PR)   (1.92, 3.3)   (1.92, 3.4)   (0.96, 3.0)   (0.9, 3.08)   (1.72, 3.76)  (1.72, 3.78)
     AL         0.3501        0.3502        0.2694        0.2700        0.3499        0.3499

WCEL, weight-corrected empirical likelihood; AEL, adjusted empirical likelihood; WCELA, WCEL with auxiliary information; AELA, AEL with auxiliary information.
In addition, all the empirical coverage accuracies increase and the average lengths decrease
as n increases.
4. Concluding remarks
In this paper, we have proposed a bias-correction technique for constructing an empirical
likelihood ratio when the response might be missing at random. A bias-corrected empirical
likelihood approach to inference for the mean of the response variable was developed. A
non-parametric version of Wilks’ theorem was proved for the weight-corrected empirical
log-likelihood ratio by showing that it has an asymptotic chi-square distribution. Also, with
auxiliary information, a weight-corrected empirical log-likelihood ratio was derived, and it
was shown that the ratio is asymptotically chi-squared. In addition, a normal approximation-based method was considered. The advantage of the empirical likelihood method was demonstrated in a simulation study. Our method for the response mean is distinguished from those of Wang & Rao (2002a) and Qin & Zhang (2007), which focus on constructing weight-corrected empirical likelihood ratios. Also, the bias-correction technique proposed in this
paper might be used to study a class of semi-parametric regression models. The methodology
that is presented here can also be generalized to estimate other marginal parameters or
functions, such as var(Y ), the cumulative distribution function F (y) of Y and the quantiles
of F (y). These relevant problems obviously merit further study.
Acknowledgements
The author thanks the editor, the associate editor and two referees for their thoughtful and
constructive comments and suggestions. The research was supported by the National Natural
Science Foundation of China (10871013), the Beijing Natural Science Foundation (1072004)
and the PhD Program Foundation of Ministry of Education of China (20070005003).
References
Chen, J., Fan, J., Li, K. H. & Zhou, H. (2006). Local quasi-likelihood estimation with data missing at
random. Statist. Sinica 16, 1044–1070.
Chen, S. X. & Hall, P. (1993). Smoothed empirical likelihood confidence intervals for quantiles. Ann.
Statist. 21, 1166–1181.
Chen, J. & Sitter, R. R. (1999). A pseudo-empirical likelihood approach to the effective use of auxiliary
information in complex surveys. Statist. Sinica 9, 385–406.
Cheng, P. E. (1990). Applications of kernel regression estimation: a survey. Commun. Statist. A 19, 4103–
4134.
Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random.
J. Amer. Statist. Assoc. 89, 81–87.
Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average
treatment effects. Econometrica 66, 315–331.
Hall, P. & La Scala, B. (1990). Methodology and algorithms of empirical likelihood. Int. Statist. Rev.
58, 109–127.
Hall, P. & Murison, R. D. (1993). Correcting the negativity of high-order kernel density estimators.
J. Multivariate Anal. 47, 103–122.
Hjort, N. L., McKeague, I. W. & Van Keilegom, I. (2009). Extending the scope of empirical likelihood.
Ann. Statist. 37, 1039–1111.
Kitamura, Y. (1997). Empirical likelihood methods with weakly dependent processes. Ann. Statist. 25,
2084–2102.
Neyman, J. (1938). Contribution to the theory of sampling human populations. J. Amer. Statist. Assoc.
33, 101–106.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75,
237–249.
Owen, A. B. (1990). Empirical likelihood ratio confidence regions. Ann. Statist. 18, 90–120.
Peng, L. (2004). Empirical-likelihood-based confidence interval for the mean with a heavy-tailed distribution. Ann. Statist. 32, 1192–1214.
Qin, J. & Zhang, B. (2007). Empirical-likelihood-based inference in missing response problems and its
application in observational studies. J. R. Statist. Soc. Ser. B Stat. Methodol. 69, 101–122.
Robins, J. M., Rotnitzky, A. & Zhao, L. P. (1995). Analysis of semiparametric regression models for
repeated outcomes in the presence of missing data. J. Amer. Statist. Assoc. 90, 106–121.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581–592.
Spiegelman, C. & Sacks, J. (1980). Consistent window estimation in nonparametric regression. Ann. Statist. 8, 240–246.
Stute, W., Xue, L. G. & Zhu, L. X. (2007). Empirical likelihood inference in nonlinear errors-in-covariables models with validation data. J. Amer. Statist. Assoc. 102, 332–346.
Wang, Q. H., Linton, O. & Härdle, W. (2004). Semiparametric regression analysis with missing response
at random. J. Amer. Statist. Assoc. 99, 334–345.
Wang, Q. H. & Rao, J. N. K. (2002a). Empirical likelihood-based inference under imputation for missing
response data. Ann. Statist. 30, 896–924.
Wang, Q. H. & Rao, J. N. K. (2002b). Empirical likelihood-based inference in linear models with missing
data. Scand. J. Statist. 29, 563–576.
Xue, L. G. & Zhu, L. X. (2006). Empirical likelihood for single-index models. J. Multivariate Anal. 97,
1295–1312.
Xue, L. G. & Zhu, L. X. (2007a). Empirical likelihood for a varying coefficient model with longitudinal
data. J. Amer. Statist. Assoc. 102, 642–654.
Xue, L. G. & Zhu, L. X. (2007b). Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika 94, 921–937.
Zhu, L. X. & Fang, K. T. (1996). Asymptotics for kernel estimate of sliced inverse regression. Ann. Statist. 24, 1053–1068.
Zhu, L. X. & Xue, L. G. (2006). Empirical likelihood confidence regions in a partially linear single-index
model. J. R. Statist. Soc. B 68, 549–570.
Received April 2007, in final form March 2009
Liugen Xue, College of Applied Sciences, Beijing University of Technology, Beijing 100124, P.R. China.
E-mail: [email protected]
Appendix
In this Appendix, we provide proofs of theorems 1–3. The following lemmas are useful for
proving the theorems.
Lemma 1
Suppose that conditions (C1)–(C3) and (C5) hold. We then have, uniformly over 1 ≤ i ≤ n,

E{m̂b(Xi) − m(Xi)}^2 = O((nh^d b^2)^{−1}) + O(h^{2r} b^{−2}) + o(n^{−1/2}),

where m̂b(·) is defined in (2).

Proof. Denote gb(x) = max{b, g(x)} and mb(x) = m(x)g(x)/gb(x). We have, for all 1 ≤ i ≤ n,

E{mb(Xi) − m(Xi)}^2 ≤ c E[|m(X)| I{g(X) < b}] = o(n^{−1/2}),

where c is a positive constant. Therefore, to prove lemma 1, we only need to show that, for all 1 ≤ i ≤ n,

E{m̂b(Xi) − mb(Xi)}^2 = O((nh^d b^2)^{−1}) + O(h^{2r} b^{−2}) + o(n^{−1/2}).  (A1)
Let

ĝ(x) = (nh^d)^{−1} Σ_{j=1}^n δj Kh(Xj − x),  ĝb(x) = max{b, ĝ(x)},

Φn(x) = (nh^d)^{−1} Σ_{j=1}^n δj {Yj − m(Xj)} Kh(Xj − x),

Ψn(x) = (nh^d)^{−1} Σ_{j=1}^n δj {m(Xj) − m(x)} Kh(Xj − x),

Qn(x) = m(x){ĝ(x)gb(x) − g(x)ĝb(x)}/{gb(x)ĝb(x)},

and denote Tn(x) = m̂b(x) − mb(x). By direct calculation, it can be verified that

Tn(x) = {Φn(x) + Ψn(x)}/ĝb(x) + Qn(x).
Consequently, for all 1 ≤ i ≤ n, we have

E{Tn^2(Xi)} ≤ 3b^{−2} E{Φn^2(Xi)} + 3b^{−2} E{Ψn^2(Xi)} + 3E{Qn^2(Xi)}.  (A2)

It can be shown that, for all 1 ≤ i ≤ n,

E{Φn^2(Xi)} = O((nh^d)^{−1}),  (A3)

E{Ψn^2(Xi)} = O((nh^{d−2})^{−1}) + O(h^{2r}),  (A4)

E{Qn^2(Xi)} = O((nh^d b^2)^{−1}) + O(h^{2r} b^{−2}) + o(n^{−1/2}).  (A5)
This together with (A2)–(A5) proves (A1), and hence lemma 1 is proved.
Lemma 2
Suppose that conditions (C1), (C4) and (C6), except for m(x) in (C1), hold. Then we have,
uniformly over 1 ≤ i ≤ n,
E{p̂(Xi) − p(Xi)}^2 = O((na^d)^{−1} Mn^d) + O(a^{2r}) + o(n^{−1/2}),
where p̂(·) is defined in (4).
Proof. Following the lines of Spiegelman & Sacks (1980), we can prove that, uniformly over 1 ≤ i ≤ n,

E{1/Cn(Xi)} = O((na^d)^{−1} Mn^d) + o(n^{−1/2}),  (A6)

where

Cn(Xi) = max{1, c1 Σ_{k≠i} I{‖Xk − Xi‖ ≤ a}}.

Denote

Wnj(x) = La(Xj − x) / max{1, Σ_{k=1}^n La(Xk − x)}.
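Although the precise definition of p̂(·) in (4) appears earlier in the paper, the weights Wnj above suggest an estimator of the form p̂(x) = Σ_j Wnj(x) δj. A minimal one-dimensional sketch under that assumption, taking La(u) = 0.5 I{|u| ≤ a} for illustration:

```python
import numpy as np

def p_hat(x_eval, x_obs, delta, a):
    """Hypothetical window estimator of the selection probability p(x):
    p_hat(x) = sum_j W_nj(x) * delta_j, with
    W_nj(x) = L_a(X_j - x) / max(1, sum_k L_a(X_k - x))
    and, for illustration, L_a(u) = 0.5 * I{|u| <= a}."""
    w = 0.5 * (np.abs(x_obs - x_eval) <= a)
    denom = max(1.0, float(w.sum()))  # truncation keeps the denominator away from 0
    return float((w * delta).sum() / denom)
```

The max{1, ·} truncation mirrors Cn(Xi) above: it guards against empty windows, at the cost of pulling p̂ toward 0 where the design is sparse.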
By direct calculation, we obtain

E{p̂(Xi) − p(Xi)}^2 ≤ 3E{Σ_{j=1}^n Wnj(Xi)(δj − p(Xj))}^2 + 3E{Σ_{j=1}^n Wnj(Xi)(p(Xj) − p(Xi))}^2
    + 3E{(Σ_{j=1}^n Wnj(Xi) − 1) p(Xi)}^2 ≡ J1 + J2 + J3.  (A7)
By orthogonality and (A6), we can derive that

J1 = O((na^d)^{−1} Mn^d) + o(n^{−1/2}),  (A8)

J2 = O((na^d)^{−1} Mn^d) + O(a^{2r}) + o(n^{−1/2}),  (A9)

J3 = O((na^d)^{−1} Mn^d) + o(n^{−1/2}).  (A10)

Substituting (A8)–(A10) into (A7) completes the proof of lemma 2.
Lemma 3
Suppose that conditions (C1)–(C8) hold. If θ is the true parameter, then

n^{−1/2} Σ_{i=1}^n (Ŷi − θ) →D N(0, V),

where V is defined in theorem 3.
Proof. We use the notation of lemmas 1 and 2. It is straightforward to obtain

n^{−1/2} Σ_{i=1}^n (Ŷi − θ) = T1 + T2 + T3,

where

T1 = n^{−1/2} Σ_{i=1}^n [(δi/p(Xi)){Yi − m(Xi)} + {m(Xi) − θ}],

T2 = n^{−1/2} Σ_{i=1}^n {1/p̂(Xi) − 1/p(Xi)} δi {Yi − m(Xi)},

T3 = n^{−1/2} Σ_{i=1}^n {1 − δi/p̂(Xi)} {m̂b(Xi) − m(Xi)}.

Since √n T1 is a sum of i.i.d. random variables, the central limit theorem gives T1 →D N(0, V). We can prove that Tℓ = oP(1) for ℓ = 2, 3, from which lemma 3 follows.
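The decomposition above implicitly uses the augmented form Ŷi = δiYi/p̂(Xi) + {1 − δi/p̂(Xi)} m̂b(Xi); with that form, n^{−1/2} Σ(Ŷi − θ) = T1 + T2 + T3 holds as an algebraic identity, which can be checked numerically. All ingredients below are arbitrary placeholders, not estimates produced by the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
theta = 2.0
y = rng.normal(theta, 1.0, n)
delta = rng.binomial(1, 0.8, n)
p = np.full(n, 0.8)                      # "true" selection probabilities (placeholder)
p_est = p + rng.uniform(-0.05, 0.05, n)  # a perturbed stand-in for p-hat
m = np.full(n, theta)                    # "true" regression values (placeholder)
m_est = m + rng.normal(0.0, 0.05, n)     # a perturbed stand-in for m-hat_b

# Imputed responses in the augmented inverse-probability-weighted form
y_imp = delta * y / p_est + (1.0 - delta / p_est) * m_est

T1 = np.sum(delta / p * (y - m) + (m - theta)) / np.sqrt(n)
T2 = np.sum((1.0 / p_est - 1.0 / p) * delta * (y - m)) / np.sqrt(n)
T3 = np.sum((1.0 - delta / p_est) * (m_est - m)) / np.sqrt(n)

lhs = np.sum(y_imp - theta) / np.sqrt(n)
```

Up to floating-point rounding, lhs equals T1 + T2 + T3; the proof then shows that T2 and T3 are asymptotically negligible.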
Lemma 4
Suppose that conditions (C1)–(C8) hold. If θ is the true parameter, then

n^{−1} Σ_{i=1}^n (Ŷi − θ)^2 →P V,

where V is defined in theorem 3.
Lemma 5
Suppose that conditions (C1)–(C8) hold. Then

max_{1≤i≤n} |Ŷi| = oP(n^{1/2})  and  λ = OP(n^{−1/2}).
Lemma 4 can be proved by lemmas 1 and 2, using arguments similar to those in the proof of lemma 3. Lemma 5 can be proved along the lines of lemmas A.3 and A.4 in Wang & Rao (2002a). Both proofs are omitted here.
Proof of theorem 1. Using (5), (6) and lemmas 3–5, and proceeding as in the proof of theorem 1 in Wang & Rao (2002a), we can obtain that

l̂(θ) = {n^{−1/2} Σ_{i=1}^n (Ŷi − θ)}^2 {n^{−1} Σ_{i=1}^n (Ŷi − θ)^2}^{−1} + oP(1).

This together with lemmas 3 and 4 proves theorem 1.
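The quadratic expansion above translates directly into a confidence-interval recipe: with imputed values Ŷi in hand, the statistic reduces to {Σ(Ŷi − θ)}^2 / Σ(Ŷi − θ)^2, and the interval collects the θ at which it stays below the χ²₁ quantile. A hedged sketch (grid search for simplicity; y_imp is a synthetic stand-in for the paper's imputed values):

```python
import numpy as np

CHI2_1_095 = 3.841458820694124  # 0.95 quantile of chi-squared with 1 d.f.

def el_ratio(theta, y_imp):
    """Asymptotic form of the bias-corrected EL ratio:
    l(theta) = (sum d_i)^2 / sum d_i^2, with d_i = Yhat_i - theta."""
    d = y_imp - theta
    return float(d.sum() ** 2 / (d**2).sum())

def el_interval_095(y_imp, grid):
    """Grid approximation to {theta : l(theta) <= chi^2_{1, 0.95}}."""
    kept = [t for t in grid if el_ratio(t, y_imp) <= CHI2_1_095]
    return min(kept), max(kept)

rng = np.random.default_rng(2)
y_imp = rng.normal(3.0, 1.0, 200)
lo, hi = el_interval_095(y_imp, np.linspace(2.0, 4.0, 2001))
```

The ratio vanishes at θ equal to the mean of the Ŷi, so the interval always contains the point estimate; no plug-in variance estimate or adjustment factor is needed, matching the self-scaling property claimed in the abstract.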
Proof of theorem 2. By (7) and (8), and similar to the proof of theorem 1, it can be shown that

l̂n,AI(θ) = {n^{−1/2} Σ_{i=1}^n ηi(θ)}^T Vn,AI^{−1} {n^{−1/2} Σ_{i=1}^n ηi(θ)} + oP(1),  (A11)

where

Vn,AI = [ Vn1    Vn2 ]
        [ Vn2^T  Vn  ]

with

Vn = n^{−1} Σ_{i=1}^n (Ŷi − θ)^2,  Vn1 = n^{−1} Σ_{i=1}^n A(Xi)A^T(Xi),  Vn2 = n^{−1} Σ_{i=1}^n A(Xi)(Ŷi − θ).

Similar to the proof of theorem 3.2 in Wang & Rao (2002a), it can be shown that

n^{−1/2} Σ_{i=1}^n ηi(θ) →D N(0, VAI),  (A12)

where

VAI = [ V1    V2 ]
      [ V2^T  V  ]

with V1 = E{A(X)A^T(X)}, V2 = E{A(X)(m(X) − θ)} and V defined in theorem 3. Similar to the proof of (A.55) in Wang & Rao (2002a), we can prove that

Vn,AI →P VAI.  (A13)

Theorem 2 therefore follows from (A11)–(A13).
Proof of theorem 3. From (10) and (11), we get that

√n(θ̂ − θ) = n^{−1/2} Σ_{i=1}^n (Ŷi − θ) + oP(1).

This together with lemma 3 proves theorem 3.