Binary Time Series Modeling With Application
to Adhesion Frequency Experiments
Ying Hung, Veronika Zarnitsyna, Yan Zhang, Cheng Zhu, and C. F. Jeff Wu
The repeated adhesion frequency assay is the only published method for measuring the kinetic rates of cell adhesion. Cell adhesion plays an important role in many physiological and pathological processes. Traditional analysis of adhesion frequency experiments assumes that the adhesion test cycles are independent Bernoulli trials. This assumption can often be violated in practice. Motivated by the analysis of repeated
adhesion tests, a binary time series model incorporating random effects is developed. A goodness-of-fit statistic is introduced to assess the
adequacy of distribution assumptions on the dependent binary data with random effects. The asymptotic distribution of the goodness-of-fit
statistic is derived, and its finite-sample performance is examined through a simulation study. Application of the proposed methodology to
real data from a T-cell experiment reveals some interesting information, including the dependency between repeated adhesion tests.
KEY WORDS: Cell adhesion; Goodness-of-fit test; Micropipette experiments; Random effects.
1. INTRODUCTION
This research is motivated by the statistical analysis of time
series data from biomechanical experiments that study protein,
DNA, and RNA at the level of single molecules (Mehta, Rief,
Spudich, Smith, and Simmons 1999). Single-molecule mechanics experiments use ultrasensitive force techniques to mechanically characterize a single pair of molecules that physically
links the force sensor to a sample surface. Figure 1 illustrates
a simple experiment, the micropipette adhesion frequency assay (Chesla, Selvaraj, and Zhu 1998). Here a human red blood
cell (left) pressurized by micropipette suction is used as a force
transducer to test interactions between molecules presented on
the red cell membrane and the counter molecules on the surface of another cell (right; only partly shown). The two cells
are put together for a predetermined duration [Fig. 1(b)], then
retracted away. The simplest measurement evaluates whether a controlled contact results in adhesion. If adhesion results, then retraction will stretch the red cell [Fig. 1(c)]. If no adhesion results, then the red cell will not be stretched [Fig. 1(a)].
To ensure adhesion mediated by a single molecular bond, the
experimental condition is designed such that adhesion is infrequent (Zhu, Long, Chesla, and Bongrand 2002). As such, in any
particular test, both positive (i.e., adhesion; scored as 1) and
negative (i.e., no adhesion; scored as 0) outcomes are possible and random. Because single-molecule interactions are inherently stochastic, numerous measurements are required to estimate their statistical properties; for example, the
probability of adhesion can be estimated from the frequency of
occurrence of adhesion in a large number of contacts (Chesla
et al. 1998). The probability distribution of single bond lifetimes can be estimated from the histogram of a large number of
lifetime measurements (Marshall et al. 2003). Experimentally,
these are obtained by sequentially repeating the measurements
many times.
Ying Hung is Assistant Professor, Department of Statistics, Rutgers University, Piscataway, NJ 08854. Yan Zhang is Professor, Institute of Biomedical Engineering, Department of Anatomy, Second Military University, China. Cheng Zhu is Regent’s Professor, Wallace H. Coulter Department of Biomedical Engineering, and C. F. Jeff Wu is Professor, H. Milton Stewart School of Industrial and Systems Engineering, both at the Georgia Institute of Technology, Atlanta, GA 30332. Wu’s research was supported by National Science Foundation grant DMS 0305996. Zhu’s research was supported by National Institutes of Health grants AI38282 and AI44902. The authors thank the joint editors, an associate editor, and two referees for their helpful comments and suggestions.
A crucial assumption that allows the use of measurements
from repeated tests for probability calculation is that all measurements are identical yet independent from one another; that
is, the test sequence consists of independent and identically distributed random variables. But this may or may not be valid,
depending on the particular biological system in question. Recently, Zarnitsyna et al. (2007) demonstrated that this assumption is not valid in some biological systems. Specifically, they
showed that adhesion occurring in the immediate past test can
either increase or decrease the likelihood for the next test to
result in adhesion. A simple analysis has been developed to determine whether the independence assumption is valid and, if it is not, to measure the change in the probability of adhesion in the next test due to adhesion occurring in the immediate past test.
In this article we extend the simple analysis to a more sophisticated binary time series model. Numerous methods for
binary time series analysis are available in the literature (Zeger
and Qaqish 1988; Li 1994; Slud and Kedem 1994; Benjamin,
Rigby, and Stasinopoulos 2003). Most of these methods have
been developed for single series of observations; extensions
to multiple binary time series modeling and related inferences
have not been studied systematically. Both Li (1994) and Kedem and Fokianos (2002, p. 84) have pointed out the importance of extensions to cases in which a series is collected for
each individual. This is different from classical time series
analysis, in that the binary time series are observed on different replicates of the experimental units. Correlation among
the repeated observations may arise not only from memory effects, but also from shared unobserved variables. Therefore,
more general models are needed to incorporate the correlations among repeated observations. Another important issue is
model diagnostics. As an alternative to Pearson’s chi-squared test, which works under the independence assumption, new goodness-of-fit test statistics and their theoretical properties need to be developed.
The remainder of this article is organized as follows. Some preliminary analysis results for an adhesion frequency experiment are presented in Section 2. In Section 3 a class of multiple binary time series models is proposed. In Section 4 a goodness-of-fit test for the model assumptions is proposed, its asymptotic properties are derived, and its finite-sample performance is examined through a simulation study. In Section 5 the proposed model and inferences are applied to the same experiment, and the results are compared with those in Section 2. A summary and some concluding remarks are given in Section 6.

© 2008 American Statistical Association
Journal of the American Statistical Association
September 2008, Vol. 103, No. 483, Theory and Methods
DOI 10.1198/016214508000000508
Hung et al.: Binary Time Series Modeling

Figure 1. Photomicrographs of the micropipette adhesion frequency assay.
2. PRELIMINARY ANALYSIS OF AN ADHESION
FREQUENCY EXPERIMENT
In the micropipette adhesion frequency assay, adhesion between the two cells was staged by bringing the cells into controlled contact for a given time and area through computer-driven micromanipulation, to ensure that each contact was as close to identical as possible to every other contact (Fig. 1). The
average number of bonds (ANB) is a transformation of the contact time (Chesla et al. 1998). For each ANB, several replicates
of cell pairs were tested. For each pair of cells, the adhesion
test cycle (i.e., contact and retraction) was repeated 50 times.
Test scores (denoted by y) were recorded in binary form (i.e.,
y = 0 or 1), resulting in multiple binary time series of the type
exemplified in Table 1.
Under the independent Bernoulli trial assumption (Chesla et al. 1998), the average adhesion probability (PANB) can be simply estimated by the adhesion frequency, calculated as

$$P_{\mathrm{ANB}} = \frac{\text{number of adhesions}}{\text{number of test cycles}}. \qquad (1)$$
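A minimal Python sketch of estimator (1), assuming each cell pair's scores are stored as a string of 0/1 characters, as in Table 1. With the same number of test cycles for every pair (50 in this experiment), the pooled frequency at one ANB equals the average of the per-pair frequencies. The function names are hypothetical.

```python
def adhesion_frequency(scores: str) -> float:
    """Estimator (1) for one cell pair: adhesions / test cycles."""
    if not scores or set(scores) - {"0", "1"}:
        raise ValueError("scores must be a non-empty string of '0' and '1'")
    return scores.count("1") / len(scores)


def pooled_frequency(replicates: list[str]) -> float:
    """Pool all test cycles of all cell pairs tested at one ANB."""
    total_adhesions = sum(s.count("1") for s in replicates)
    total_cycles = sum(len(s) for s in replicates)
    return total_adhesions / total_cycles


# First 20 of the 50 scores for the two .085 pairs shown in Table 1
# (truncated, so the numbers are only illustrative).
pairs = ["01010011011101010000", "00010000100010100110"]
print([adhesion_frequency(s) for s in pairs], pooled_frequency(pairs))
```

These truncated strings give 9/20 = .45 and 6/20 = .30 for the two pairs and .375 pooled.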
Figure 2 shows an example of the relationship between PANB
and ANB (unpublished data, courtesy of Y. Zhang and C. Zhu).
In this micropipette experiment, the adhesion test was conducted with seven different ANBs (.085, .17, .255, .34, .51, .68, and 1.36). Each of the first two ANBs has six replicate cell pairs, whereas each of the others has five. Each point in Figure 2 represents the PANB value for one pair of cells, as calculated from (1). The solid line represents the average over all of the replicates under the same ANB.
The existing method of characterizing the relationship between PANB and ANB (Chesla et al. 1998) is based on the assumption that the binary time series data (e.g., Table 1) form
Bernoulli sequences. But for each pair of cells, the adhesion test
cycles are observed repeatedly. The independence assumption
may not hold, as demonstrated recently (Zarnitsyna et al. 2007);
therefore, the adequacy of the distributional assumption must be
checked before the method is applied. One graphical technique
for assessing this assumption is the probability plot. If the data
are collected from independent Bernoulli trials, then the number of trials needed to achieve success will follow a geometric
distribution with probability p, where p = Pr(y = 1). For each
ANB, the number of tests needed to achieve success is calculated over all replicates. Then its empirical cumulative distribution can be plotted against the geometric distribution, where the
parameter p is estimated by (1) at each ANB. In the probability
plots (not shown here to save space), significant deviations from
the straight line indicate violation of the independent Bernoulli
assumption. Zarnitsyna et al. (2007) reached similar conclusions using a different analysis, which motivated our present work.
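Both graphical checks used in this section, the geometric probability plot above and the conditional frequencies P(1|1) and P(1|0) compared in the next paragraph, can be computed directly from the 0/1 series. The Python sketch below (assuming NumPy) extracts the waiting times to adhesion, returns the empirical and geometric CDFs for the probability plot, and pools the lag-one transitions. Waiting runs that end without an adhesion are dropped as censored, and lag pairs are formed only within a series; both are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np


def waiting_times(scores: str) -> list[int]:
    """Number of tests needed to reach each adhesion; a trailing run that
    never reaches an adhesion is censored and dropped."""
    times, count = [], 0
    for ch in scores:
        count += 1
        if ch == "1":
            times.append(count)
            count = 0
    return times


def geometric_pp(replicates, p):
    """Empirical CDF of pooled waiting times vs the geometric CDF 1-(1-p)^k."""
    t = np.sort(np.concatenate([waiting_times(s) for s in replicates]))
    ks = np.arange(1, t.max() + 1)
    empirical = np.array([(t <= k).mean() for k in ks])
    theoretical = 1.0 - (1.0 - p) ** ks
    return ks, empirical, theoretical


def conditional_frequencies(replicates):
    """Pooled P(1|1) and P(1|0) over consecutive tests within each series."""
    n11 = n1 = n01 = n0 = 0
    for s in replicates:
        y = [int(c) for c in s]
        for prev, cur in zip(y[:-1], y[1:]):
            if prev == 1:
                n1 += 1
                n11 += cur
            else:
                n0 += 1
                n01 += cur
    return n11 / n1, n01 / n0
```

For the truncated first .085 series in Table 1, the waiting times are 2, 2, 3, 1, 2, 1, 1, 2, 2, and the conditional frequencies are P(1|1) = 3/9 and P(1|0) = 6/10. A single short series says little about the memory effect, which Figure 3 assesses across all replicates at each ANB.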
To provide further insight into the violation of the independent Bernoulli assumption, we use additional graphical plots to
better characterize the dependence among repeated binary observations. The idea is to compare the conditional PANB given
the previous test results. Define P (1|1) to be the conditional
PANB given adhesion in the previous test and P (1|0) to be the
conditional PANB given no adhesion in the previous test. If the
test results are independent, then P (1|1) should be equal to
P (1|0), and both can be estimated by the PANB in (1). In Figure 3, for each ANB, the “+” points represent the conditional
probability, P (1|1), calculated for each replicate, and the dotted line represents P (1|1) calculated over all replicates under
the same ANB. Similarly, the triangle points and dashed line are those for the conditional probability P(1|0). For comparison, the solid line represents the PANB calculated by (1) at each ANB. Because the dotted line and “+” points are much higher than the dashed line and triangle points, the PANB is higher if adhesion occurred in the previous test. This provides strong evidence of a memory effect on repeated tests. A more in-depth biological discussion has been given by Zarnitsyna et al. (2007), who first observed the memory effect through a different analysis. Figure 3 also suggests the existence of serial correlations and interactions, as well as heterogeneity among subjects. To describe and quantify significant effects on PANB, we consider a new binary time series model that incorporates the various effects suggested by the plots.

Table 1. Example of adhesion frequency experiment data

ANB      Fifty repeated adhesion tests
.085     01010011011101010000 . . .
.085     00010000100010100110 . . .
  ⋮        ⋮
1.360    00111010000001000011 . . .
1.360    11110000111000000011 . . .
  ⋮        ⋮

Figure 2. PANB varying with the ANB.

Figure 3. Memory effects in micropipette experiments (“+” points and dotted line, P(1|1); triangle points and dashed line, P(1|0)).

3. MODELING AND ESTIMATION
3.1 Modeling
In this section we propose a new binary time series model. But first, we review some existing models.
3.1.1 Random-Effects Models. Random-effects models are most useful in longitudinal data analysis when correlation arises from some unobservable variables shared among repeated observations. Consider a binary realization $\{y_{ij}\}$ taking value 0 or 1 for subject i at the jth observation. For given subject-specific coefficients $\beta_i$, assuming that the repeated observations for each individual are independent, the random-effects model takes the form

$$\log\frac{\Pr(y_{ij} = 1 \mid \beta_i)}{1 - \Pr(y_{ij} = 1 \mid \beta_i)} = \beta_0 + \beta_i + x_{ij}^\top\alpha, \qquad (2)$$

where the vector $x_{ij}$ denotes the covariates associated with the fixed effects α and the random effects $\beta_i$ are mutually independent with a common underlying multivariate distribution. This model is used to represent the natural heterogeneity across individuals in the regression coefficient. More discussion about this model has been given by Diggle, Heagerty, Liang, and Zeger (2002).
3.1.2 Binary Time Series Models. Non-Gaussian time series modeling techniques have been discussed extensively in the literature. Benjamin et al. (2003) proposed a generalized autoregressive moving average (GARMA) model. Applying this GARMA model with a logistic link, a binary time series $\{y_t\}$ can be fitted as

$$\operatorname{logit}(\mu_t) = x_t^\top\alpha + \sum_{r=1}^{R}\varphi_r A(y_{t-r}) + \sum_{q=1}^{Q}\zeta_q M(y_{t-q}, \mu_{t-q}), \qquad (3)$$
where $x_t$ are covariates at time t and $\mu_t = E(y_t \mid H_t)$ is the conditional mean given the previous information $H_t = \{x_t, \ldots, x_1, y_{t-1}, \ldots, y_1, \mu_{t-1}, \ldots, \mu_1\}$. A and M are functions representing the autoregressive (AR) and moving average (MA) terms with corresponding orders R and Q; the resulting structure is denoted ARMA(R, Q). The $\varphi_r$'s and $\zeta_q$'s are the AR and MA parameters. For binary time series, reasonable choices for A and M are the past responses themselves and residuals such as $y_t - \mu_t$, respectively.
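A minimal sketch of how the conditional means in (3) can be computed recursively, assuming the choices $A(y_{t-r}) = y_{t-r}$ and $M(y_{t-q}, \mu_{t-q}) = y_{t-q} - \mu_{t-q}$ mentioned above. Lags that would reach before the first observation are simply omitted, which is our simplification; the function name is hypothetical and NumPy is assumed.

```python
import numpy as np


def garma_logistic_means(y, X, alpha, phi, zeta):
    """Conditional means mu_t of the logistic GARMA model (3) with
    A(y_{t-r}) = y_{t-r} and M(y_{t-q}, mu_{t-q}) = y_{t-q} - mu_{t-q}.

    Lags before the first observation are omitted (sketch assumption).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    mu = np.empty(len(y))
    for t in range(len(y)):
        eta = X[t] @ alpha                          # covariate part x_t' alpha
        for r, ph in enumerate(phi, start=1):       # AR part
            if t - r >= 0:
                eta += ph * y[t - r]
        for q, ze in enumerate(zeta, start=1):      # MA part uses earlier mu's
            if t - q >= 0:
                eta += ze * (y[t - q] - mu[t - q])
        mu[t] = 1.0 / (1.0 + np.exp(-eta))
    return mu
```

Because the MA term feeds earlier fitted means back into the predictor, the means must be computed in time order. Passing an empty `zeta` gives the Zeger–Qaqish form (4) below, and an empty `phi` gives the MA form (5).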
Model (3) includes many well-known models as special cases. One important submodel is the Zeger–Qaqish model with logistic link,

$$\operatorname{logit}(\mu_t) = x_t^\top\alpha + \sum_{r=1}^{R}\varphi_r y_{t-r}, \qquad (4)$$

and the MA form of this model (Li 1994),

$$\operatorname{logit}(\mu_t) = x_t^\top\alpha + \sum_{q=1}^{Q}\zeta_q (y_{t-q} - \mu_{t-q}). \qquad (5)$$
Asymptotic properties for the AR logistic regression models
have been explored through conditional likelihood (Kaufmann
1987) and partial likelihood (Kedem and Fokianos 2002).
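Because $\mu_t$ in (4) depends only on the observed covariates and lagged responses, maximizing the partial likelihood of (4) amounts to a logistic regression of $y_t$ on $(x_t, y_{t-1}, \ldots, y_{t-R})$. The Python sketch below (assuming NumPy) fits it by Newton–Raphson after conditioning on the first R observations. The function name and the conditioning choice are illustrative assumptions, not the procedures of Kaufmann (1987) or Kedem and Fokianos (2002).

```python
import numpy as np


def fit_ar_logistic_pl(y, X, R=1, max_iter=50, tol=1e-10):
    """Maximum partial likelihood fit of the Zeger-Qaqish model (4).

    y: length-n 0/1 series; X: n x p covariates (include a column of ones
    for an intercept). The first R observations are conditioned on.
    Returns estimates (alpha, phi_1..phi_R) and standard errors.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(y)
    lags = np.column_stack([y[R - r:n - r] for r in range(1, R + 1)])
    D = np.hstack([X[R:], lags])      # rows: (x_t', y_{t-1}, ..., y_{t-R})
    target = y[R:]
    theta = np.zeros(D.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-D @ theta))
        score = D.T @ (target - mu)                       # partial score
        info = D.T @ (D * (mu * (1.0 - mu))[:, None])     # information
        step = np.linalg.solve(info, score)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-D @ theta))
    info = D.T @ (D * (mu * (1.0 - mu))[:, None])
    return theta, np.sqrt(np.diag(np.linalg.inv(info)))
```

The standard errors come from the inverse of the partial-likelihood information matrix at the estimate, which is what the asymptotic normality results cited above justify for a long single series.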
We propose a binary time series mixed (BTSM) model,
a multiple logistic time series model with random effects that
takes into account the heterogeneity among experimental units.
Consider a binary time series realization {yit } taking values 0
or 1 for subject i at time t, where i = 1, . . . , m, t = 1, . . . , n,
and mn = N. Suppose that the experimental units are sampled from a population. It is reasonable to assume that the random effects $\beta_i$ are independently drawn from a normal distribution with mean b and variance $\sigma_b^2$. For the vector $\beta = (\beta_1, \ldots, \beta_m)^\top$, its distribution can be written as $N(\mathbf{b}, \Sigma)$, where $\mathbf{b}$ is a column vector of length m with every entry equal to b, $\Sigma = \sigma_b^2 I_m$, and $I_m$ is the m × m identity matrix. The vector $x_{it} = (x_{it,1}, \ldots, x_{it,p})^\top$ denotes the covariates associated with the p-dimensional fixed effects $\alpha = (\alpha_1, \ldots, \alpha_p)^\top$, and $z_{it} = (z_{it,1}, \ldots, z_{it,m})^\top$ denotes the design vector for the random effects β such that $z_{it}^\top\beta = \beta_i$; that is, $z_{it,i} = 1$ and $z_{it,j} = 0$ for all $j \neq i$.
Denote the conditional mean as $\mu_{it} = E(y_{it} \mid H_{it})$. Given the previous information $H_{it} = \{x_{it}, x_{i,t-1}, x_{i,t-2}, \ldots, y_{i,t-1}, y_{i,t-2}, \ldots, \mu_{i,t-1}, \mu_{i,t-2}, \ldots\}$ and the random effects, the $y_{it}$'s are conditionally independent with mean $E(y_{it} \mid \beta, H_{it}) = \mu^{\beta}_{it}$. By the logistic link function, the conditional mean $\mu^{\beta}_{it}$ is related to the linear predictor $\eta^{\beta}_{it}$ by

$$\operatorname{logit}(\mu^{\beta}_{it}) = \eta^{\beta}_{it} = z_{it}^\top\beta + x_{it}^\top\alpha + \sum_{l=1}^{L}\gamma_l x_{i,t-l}\, y_{i,t-l} + \sum_{r=1}^{R}\varphi_r y_{i,t-r} + \sum_{q=1}^{Q}\zeta_q (y_{i,t-q} - \mu_{i,t-q}). \qquad (6)$$
This model is called a BTSM model. The random effects β are
used to represent a variety of situations, including subject heterogeneity, unobserved covariates, and other forms of overdispersion. Here the heterogeneity is modeled directly through a subject-specific parameter. If a random intercept alone cannot sufficiently capture the variation exhibited in the data, then this model can be easily extended to a more general form by incorporating more complicated random effects. Given βi, the
yit ’s are correlated because yit−l explicitly influences yit . This
correlation can be explained by the AR and MA components
in (6). The MA process involving µi,t−q makes the model
more complicated. In this formulation, the interaction terms
(xit−1 yit−1 , . . . , xit−L yit−L ) between covariates and past outcomes provide flexibility in adjusting the time series structure
with respect to different covariate settings.
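To make the structure of (6) concrete, the following Python sketch (assuming NumPy) simulates binary series from a BTSM with random intercepts $\beta_i \sim N(b, \sigma_b^2)$. Several simplifications are assumptions of the sketch, not part of the model: one scalar covariate path `x[i, t]` with coefficient `alpha`, the MA term uses the conditional mean given $\beta_i$, and lags before the first time point are omitted. The function name is hypothetical.

```python
import numpy as np


def simulate_btsm(m, n, b, sigma_b, alpha, gamma, phi, zeta, x, rng):
    """Simulate an m x n array of binary series from a BTSM of form (6).

    x: (m, n) scalar covariate paths (hypothetical); gamma, phi, zeta: lists
    of interaction, AR, and MA coefficients (orders L, R, Q).
    Returns responses y, conditional means mu, and random intercepts beta.
    """
    beta = rng.normal(b, sigma_b, size=m)           # beta_i ~ N(b, sigma_b^2)
    y = np.zeros((m, n))
    mu = np.zeros((m, n))
    for i in range(m):
        for t in range(n):
            eta = beta[i] + alpha * x[i, t]
            for l, g in enumerate(gamma, start=1):  # interaction x*y terms
                if t - l >= 0:
                    eta += g * x[i, t - l] * y[i, t - l]
            for r, ph in enumerate(phi, start=1):   # AR terms
                if t - r >= 0:
                    eta += ph * y[i, t - r]
            for q, ze in enumerate(zeta, start=1):  # MA terms
                if t - q >= 0:
                    eta += ze * (y[i, t - q] - mu[i, t - q])
            mu[i, t] = 1.0 / (1.0 + np.exp(-eta))
            y[i, t] = float(rng.random() < mu[i, t])
    return y, mu, beta
```

With `gamma`, `zeta` set to zeros and `phi = [1.0]`, each subject's series is a Zeger–Qaqish AR(1) process around its own intercept, so series from different subjects differ both in level and in realized dependence.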
Our proposed BTSM model is general and includes the aforementioned models. The development here is based on the logistic link, which is popular and easy to interpret; it can, however, be easily extended to other link functions. The random-effects model in (2) is a submodel of the BTSM model under
the assumption that the repeated measurements for each unit
are independent, and the correlations among repeated observations arise only from some unobserved variables. With the
logistic link function, the GARMA model in (3) is a special
case of the BTSM model with no random effect included; that
is, based on the population average, it models the time series
structure without considering the heterogeneities among the
units.
More than a simple extension of existing models, the BTSM
model poses some challenging tasks. By considering the hidden variables shared among units, it incorporates random effects in logistic time series regression. This makes estimation
and inference more complicated and different from that in standard binary time series analysis. Another important issue is the
goodness-of-fit test for model diagnostics. Related work on linear mixed models has been reported in the literature (Jiang 2001a,b); however, there is no existing method for testing the distributional assumption in binary time series models with random effects. Furthermore, the classical asymptotic chi-squared distribution cannot be applied to the new test statistics, because it relies on the assumption of independent observations. Instead, we derive the asymptotic properties below by using a martingale central limit theorem.
3.2 Estimation by Partial Likelihood
The model-fitting procedure herein is based on partial likelihood (PL). PL was introduced by Cox (1975). A more formal
definition and theoretical justification have been given by Wong
(1986). Fokianos and Kedem (2004) have discussed using PL in
time series that follow generalized linear models (GLMs). For
the BTSM model, the presence of random effects causes some
difficulty with integration, making the estimation different from
that of standard methods in time series analysis. In this section
we propose an approximation procedure to tackle this problem.
Denote the observation vector by $y = (y_1^\top, \ldots, y_m^\top)^\top$, where the observations for subject i are $y_i = (y_{i1}, \ldots, y_{in})^\top$, and let $\gamma = (\gamma_1, \ldots, \gamma_L)^\top$, $\varphi = (\varphi_1, \ldots, \varphi_R)^\top$, and $\zeta = (\zeta_1, \ldots, \zeta_Q)^\top$. Assume that

$$\omega = (\alpha^\top, \gamma^\top, \varphi^\top, \zeta^\top)^\top$$

are s-dimensional fixed effects and that X is the corresponding matrix with rows

$$X_{it}^\top = \bigl(x_{it}^\top,\; x_{i,t-1}y_{i,t-1}, \ldots, x_{i,t-L}y_{i,t-L},\; y_{i,t-1}, \ldots, y_{i,t-R},\; y_{i,t-1} - \mu_{i,t-1}, \ldots, y_{i,t-Q} - \mu_{i,t-Q}\bigr).$$

Similarly, with rows $z_{it}^\top$, the design matrix for the random effects is denoted by Z. Given the previous information $H_{it}$, the corresponding PL for the fixed effects is

$$\mathrm{PL}(\omega \mid \beta) = \prod_{i=1}^{m}\prod_{t=1}^{n} pl_\omega(y_{it} \mid \beta, H_{it}) = \prod_{i=1}^{m}\prod_{t=1}^{n}[\pi_{it}(\omega \mid \beta)]^{y_{it}}[1 - \pi_{it}(\omega \mid \beta)]^{1 - y_{it}},$$

where $\pi_{it}(\omega \mid \beta) = P_{\omega\mid\beta}(y_{it} = 1 \mid H_{it}) = \mu^{\beta}_{it}$.
The integrated quasi-PL function used to estimate $(\omega, \sigma_b^2)$ is defined by

$$|\Sigma|^{-1/2}\int \exp\Bigl\{\log \mathrm{PL}(\omega \mid \beta) - \frac{1}{2}\beta^\top\Sigma^{-1}\beta\Bigr\}\, d\beta.$$
Because of the difficulty in implementing the full PL, we use the penalized quasi-PL (PQPL) as an approximation. The integrated quasi-partial log-likelihood can be approximated by Laplace’s method (Barndorff-Nielsen and Cox 1989; Breslow and Clayton 1993),

$$-\frac{1}{2}\log|I_m + Z^\top W Z\Sigma| + \sum_{i=1}^{m}\sum_{t=1}^{n}\Bigl[y_{it}\log\frac{\pi_{it}(\omega \mid \tilde\beta)}{1 - \pi_{it}(\omega \mid \tilde\beta)} + \log\bigl(1 - \pi_{it}(\omega \mid \tilde\beta)\bigr)\Bigr] - \frac{1}{2}\tilde\beta^\top\Sigma^{-1}\tilde\beta, \qquad (7)$$

where W is the N × N diagonal matrix with diagonal terms $w_{it} = \pi_{it}(\omega \mid \tilde\beta)(1 - \pi_{it}(\omega \mid \tilde\beta))$ and $\tilde\beta = \tilde\beta(\omega, \sigma_b)$ is the solution of $\sum_{i=1}^{m}\sum_{t=1}^{n}(y_{it} - \pi_{it}(\omega \mid \beta))z_{it} - \beta/\sigma_b^2 = 0$, which maximizes the sum of the last two terms in (7). Using derivations similar to those of Breslow and Clayton (1993), the PQPL score equations for the fixed effects ω and random effects β are

$$\sum_{i=1}^{m}\sum_{t=1}^{n} X_{it}\bigl(y_{it} - \pi_{it}(\omega, \sigma_b)\bigr) = 0 \qquad (8)$$

and

$$\sum_{i=1}^{m}\sum_{t=1}^{n} z_{it}\bigl(y_{it} - \pi_{it}(\omega, \sigma_b)\bigr) = \Sigma^{-1}\beta, \qquad (9)$$
where $\pi_{it}(\omega, \sigma_b) = P_{\omega,\sigma_b}(y_{it} = 1 \mid H_{it})$. Given $\sigma_b$, the maximum quasi-PL estimator (MQPLE) $(\hat\omega, \hat\beta)$ can be obtained by solving these two score equations. The score process (8)–(9), which is a vector of martingales with respect to $H_{it}$, plays an important role in PL inference. Thus the study of the asymptotic behavior of the MQPLE $\hat\omega$, described in Section 3.3, is based on central limit theorems for martingales.
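To see what the Laplace step behind (7) does, take a single subject (m = 1) with a random intercept. The right side of (7) is then the Laplace approximation to $\log\int \mathrm{PL}(\omega \mid \beta)\,\phi(\beta; 0, \sigma_b^2)\,d\beta$, which differs from the log of the integrated quasi-PL above only by the constant $\frac{1}{2}\log 2\pi$. The Python sketch below (assuming NumPy) compares it with Gauss–Hermite quadrature on a made-up series; the `offset` array stands in for the fixed-effects part of the linear predictor, and all names and numbers are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def log_pl(beta, y, offset):
    """log PL for one subject with linear predictor beta + offset_t."""
    eta = beta + offset
    return np.sum(y * eta - np.logaddexp(0.0, eta))


def laplace_log_marginal(y, offset, sigma2, n_iter=50):
    """Right side of (7) for m = 1 (one random intercept)."""
    beta = 0.0
    for _ in range(n_iter):                      # Newton steps for beta-tilde
        pi = 1.0 / (1.0 + np.exp(-(beta + offset)))
        grad = np.sum(y - pi) - beta / sigma2
        hess = -np.sum(pi * (1.0 - pi)) - 1.0 / sigma2
        beta -= grad / hess
    pi = 1.0 / (1.0 + np.exp(-(beta + offset)))
    w = np.sum(pi * (1.0 - pi))                  # Z'WZ for a single subject
    return (-0.5 * np.log1p(sigma2 * w)
            + log_pl(beta, y, offset) - 0.5 * beta ** 2 / sigma2)


def quadrature_log_marginal(y, offset, sigma2, deg=60):
    """Same integral by Gauss-Hermite quadrature, weight exp(-x^2/2)."""
    nodes, weights = hermegauss(deg)
    logs = np.array([log_pl(np.sqrt(sigma2) * z, y, offset) for z in nodes])
    top = logs.max()                             # log-sum-exp for stability
    return (top + np.log(np.sum(weights * np.exp(logs - top)))
            - 0.5 * np.log(2.0 * np.pi))
```

For moderately long series the two values agree closely, because the integrand in β is log-concave and close to Gaussian; the approximation is what makes (7) cheap enough to iterate.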
Questions regarding the existence and uniqueness of the MQPLE are important. Fu (1998) discussed these issues in a penalized likelihood estimation problem. Similar results can be
extended to the MQPLE and provide the essential conditions
needed for existence and uniqueness of the MQPLE.
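For a simple special case, a random intercept plus an AR(1) term (a hypothetical submodel of (6) with no interaction or MA terms), the score equations (8) and (9) for fixed $\sigma_b^2$ can be solved jointly by Newton–Raphson on the penalized objective in (7), in the spirit of Breslow and Clayton's penalized quasi-likelihood. The sketch below assumes NumPy and dense matrices; the function name, the use of each subject's first time point only as a lag, and the inclusion of an overall intercept in `x` (because (9) shrinks the $\beta_i$ toward 0) are illustrative choices, not the authors' implementation.

```python
import numpy as np


def pqpl_fit(y, x, sigma2, max_iter=100, tol=1e-10):
    """Solve the PQPL score equations (8)-(9) for fixed sigma_b^2 in
    eta_it = beta_i + x_it' alpha + phi * y_{i,t-1}  (hypothetical submodel).

    y: (m, n) 0/1 array; x: (m, n, p) covariates, including a column of ones.
    Returns (omega = (alpha, phi), beta, max absolute score at the solution).
    """
    m, n = y.shape
    p = x.shape[2]
    X = np.concatenate([x[:, 1:, :], y[:, :-1, None]], axis=2).reshape(-1, p + 1)
    Z = np.kron(np.eye(m), np.ones((n - 1, 1)))   # row (i, t) selects beta_i
    yy = y[:, 1:].reshape(-1)
    s = p + 1
    omega, beta = np.zeros(s), np.zeros(m)

    def score_and_hessian(omega, beta):
        pi = 1.0 / (1.0 + np.exp(-(X @ omega + Z @ beta)))
        r, w = yy - pi, pi * (1.0 - pi)
        score = np.concatenate([X.T @ r, Z.T @ r - beta / sigma2])  # (8), (9)
        XW, ZW = X * w[:, None], Z * w[:, None]
        H = np.block([[X.T @ XW, X.T @ ZW],
                      [Z.T @ XW, Z.T @ ZW + np.eye(m) / sigma2]])
        return score, H                             # H = minus the Hessian

    for _ in range(max_iter):
        score, H = score_and_hessian(omega, beta)
        step = np.linalg.solve(H, score)            # Newton-Raphson update
        omega, beta = omega + step[:s], beta + step[s:]
        if np.max(np.abs(step)) < tol:
            break
    score, _ = score_and_hessian(omega, beta)
    return omega, beta, np.max(np.abs(score))
```

In the full procedure this inner solve would alternate with an update of $\sigma_b^2$ from the REML equation; here $\sigma_b^2$ is simply held fixed.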
The restricted maximum likelihood (REML) (Patterson and Thompson 1971) version of the approximated profile quasi-likelihood function for the variance components can be written as

$$ql(\hat\omega(\sigma_b), \sigma_b) \approx -\frac{1}{2}\log|V| - \frac{1}{2}\log|X^\top V^{-1}X| - \frac{1}{2}(Y - X\hat\omega)^\top V^{-1}(Y - X\hat\omega), \qquad (10)$$

where X and Z are the design matrices, $V = W^{-1} + Z\Sigma Z^\top$, and Y is a vector with components $Y_{it} = \eta^{\beta}_{it} + (y_{it} - \mu^{\beta}_{it})/\{\mu^{\beta}_{it}(1 - \mu^{\beta}_{it})\}$. Differentiating (10) with respect to $\sigma_b^2$ gives the estimating equation for the variance components (Harville 1977; Searle, Casella, and McCulloch 1992),

$$\frac{1}{2}\Bigl[(Y - X\omega)^\top V^{-1}\frac{\partial V}{\partial\sigma_b^2}V^{-1}(Y - X\omega) - \operatorname{tr}\Bigl(P\frac{\partial V}{\partial\sigma_b^2}\Bigr)\Bigr] = 0, \qquad (11)$$

where $P = V^{-1} - V^{-1}X(X^\top V^{-1}X)^{-1}X^\top V^{-1}$.

Estimation of the fixed effects and variance components can be obtained by iteratively solving (8), (9), and (11). This estimation procedure differs from that for standard generalized linear mixed models (GLMMs) because the new model involves time series structure; that is, the $\mu_{i,t-q}$ term depends on all of the previous observations throughout the iterations. We may compute $\mu_{i,t-q}$ by setting the initial $\mu_{i,t-q}$'s to 0 or to the sample mean of $y_{it}$. This should have a negligible effect for a sufficiently long iteration. Estimation can be carried out by simple modification of standard statistical software for GLMMs, such as the SAS GLIMMIX procedure. Details about GLMMs have been given by Breslow and Clayton (1993). Questions regarding robust estimation and efficient algorithms have been addressed by several authors (e.g., Lin and Breslow 1996; McCulloch 1997).

3.3 Asymptotic Properties
Here we study large-sample properties for the fixed effects and variance components in our BTSM model. Considering a model that includes time-dependent covariates, Fokianos and Kedem (2004) studied the asymptotic behavior of the fixed effects ω in generalized linear time series models using PL inference. Theorem 1 is an extension of their work to multiple binary time series models with random effects. Based on the quasi-PL, the theorem gives the consistency and asymptotic normality of the fixed-effects estimators $\hat\omega$. With the aid of the working dependent variables Y defined in Section 3.2, Theorem 2 gives the asymptotic properties of the REML estimator of $\sigma_b^2$, based on some asymptotic properties for linear mixed models (LMMs) with GLM iterative weights (Jiang 1996). Assumptions and proofs are given in the appendixes.

Theorem 1. Under assumptions A1 and A2, the MQPLE $\hat\omega$ of the fixed effects is consistent and asymptotically normal as N → ∞,

$$\sqrt{N}(\hat\omega - \omega) = \Gamma_N^{-1}\frac{1}{\sqrt{N}}S_n(\omega, \sigma_b) + o_p(1) \qquad (12)$$

and

$$\sqrt{N}\,\Gamma_N^{1/2}(\hat\omega - \omega) \xrightarrow{d} N(0, I_s), \qquad (13)$$

where the sample information matrix is $\Gamma_N = \frac{1}{N}\sum_{t=1}^{n}\sum_{i=1}^{m} X_{it}X_{it}^\top\pi_{it}(\omega, \sigma_b)(1 - \pi_{it}(\omega, \sigma_b))$ and $S_n(\omega, \sigma_b) = \sum_{t=1}^{n}\sum_{i=1}^{m} X_{it}(y_{it} - \pi_{it}(\omega, \sigma_b)) = X^\top(y - \pi(\omega, \sigma_b))$.

Based on the work of Breslow and Clayton (1993), inference on the variance component in model (6) can be formulated as an iterative procedure to estimate LMMs with the GLM iterative weight $W^{-1}$ as

$$Y = X\omega + Z\beta + \epsilon,$$

where β comes from $N(0, \Sigma)$, $\epsilon = (\epsilon_1, \ldots, \epsilon_N)^\top$ follows $N(0, W^{-1})$, and the corresponding $w_{it}$'s are relabeled as $(w_1, \ldots, w_N)$. Recall that $\Sigma = \sigma_b^2 I_m$. Jiang (1996) developed rigorous asymptotic properties for REML estimates of variance components in LMMs without the normality assumption on random effects and errors; thus Theorem 2 is a special case for LMMs with known unequal weights. Here we borrow some notation from Jiang (1996). Define

$$V^* = A\bigl(A^\top W^{1/2}VW^{1/2}A\bigr)^{-1}A^\top,$$

where A is any N × (N − s) matrix such that rank(A) = N − s and $A^\top W^{1/2}X = 0$;

$$g(\sigma_b) = \bigl(I_N,\; \sigma_b^2 W^{1/2}Z\bigr)^\top;$$

$$\xi_l = \begin{cases}\sqrt{w_l}\,\epsilon_l, & 1 \le l \le N,\\ \beta_{l-N}/\sigma_b^2, & N + 1 \le l \le N + m;\end{cases}$$

$$V_1 = \bigl(A^\top W^{1/2}VW^{1/2}A\bigr)^{-1/2}A^\top W^{1/2}ZZ^\top W^{1/2}A\bigl(A^\top W^{1/2}VW^{1/2}A\bigr)^{-1/2};$$

$$V_1(\sigma_b) = g(\sigma_b)V^*W^{1/2}ZZ^\top W^{1/2}V^*g(\sigma_b)^\top;$$

$$I_N = \frac{\operatorname{tr}(V_1V_1)}{m}; \qquad (14)$$

and

$$K = \frac{1}{m}\sum_{l=1}^{N+m}\bigl(E\xi_l^4 - 3\bigr)V_1(\sigma_b)_{ll}^2; \qquad J = 2I_N + K.$$

Theorem 2. Under assumptions A3 and A4, as N → ∞ and m → ∞, the REML estimate of the variance components is consistent and asymptotically normal with

$$J^{-1/2}I_N\sqrt{m}\,(\hat\sigma_b^2 - \sigma_b^2) \xrightarrow{d} N(0, 1). \qquad (15)$$
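Returning to the estimation step in Section 3.2: given the current working response Y and weights $w_{it} = \mu_{it}(1 - \mu_{it})$, the REML criterion (10) for a random-intercept design is an ordinary LMM computation. The Python sketch below (assuming NumPy, with dense matrices that are only practical for small N) evaluates (10) with $\hat\omega(\sigma_b)$ set to the generalized least squares estimate, and maximizes it over a grid of $\sigma_b^2$ values instead of solving (11). The grid search and all names are illustrative assumptions.

```python
import numpy as np


def reml_criterion(sigma2, Y, w, X, Z):
    """REML criterion (10) at sigma_b^2 = sigma2 for the working LMM
    Y = X omega + Z beta + eps, eps ~ N(0, W^{-1}), beta ~ N(0, sigma2 I).
    Returns the criterion value and the GLS estimate omega_hat(sigma_b)."""
    V = np.diag(1.0 / w) + sigma2 * (Z @ Z.T)      # V = W^{-1} + Z Sigma Z'
    VinvX = np.linalg.solve(V, X)
    VinvY = np.linalg.solve(V, Y)
    XtVinvX = X.T @ VinvX
    omega_hat = np.linalg.solve(XtVinvX, X.T @ VinvY)
    resid = Y - X @ omega_hat
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    quad = resid @ np.linalg.solve(V, resid)
    return -0.5 * logdet_V - 0.5 * logdet_XVX - 0.5 * quad, omega_hat


def reml_grid_search(Y, w, X, Z, grid):
    """Grid value of sigma_b^2 that maximizes (10)."""
    values = [reml_criterion(s2, Y, w, X, Z)[0] for s2 in grid]
    return grid[int(np.argmax(values))]
```

A root-finder applied to (11), or Fisher scoring, would replace the grid in practice; the grid keeps the sketch transparent about which quantity is being maximized.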
4. GOODNESS OF FIT FOR MODEL DIAGNOSTICS
4.1 Goodness-of-Fit Test
Pearson’s chi-squared test is generally used to test whether
data follow some specific distribution. An important assumption for this test is the independence of the observations. How
can we perform testing for model assumptions when the data
come from a binary time series model? One approach is to
classify the responses according to mutually exclusive events
in terms of the previous output and then check the differences
between the observed and theoretical frequencies in each category. This can be written as follows.
Assume that the binary data $y_{it}$ come from a Bernoulli distribution with success probability depending on $H_{it}$. $H_{it}$, defined in Section 3.1, can be decomposed into several mutually exclusive events. Recall that $H_{it} = \{x_{it}, x_{i,t-1}, x_{i,t-2}, \ldots, y_{i,t-1}, y_{i,t-2}, \ldots, \mu_{i,t-1}, \mu_{i,t-2}, \ldots\}$. Suppose that the decomposition is determined by $n_1$ covariates $(x_{it}, \ldots, x_{i,t-n_1})$ decomposed into $c_1$ exclusive subsets, $n_2$ AR effects $(y_{i,t-1}, \ldots, y_{i,t-n_2})$ decomposed into $c_2$ exclusive subsets, and $n_3$ MA effects $(\mu_{i,t-1}, \ldots, \mu_{i,t-n_3})$ decomposed into $c_3$ exclusive subsets, where $c_i \ge 1$ for i = 1, 2, 3. Therefore, there are K exclusive events, denoted by $E_1, \ldots, E_K$, with $K = c_1c_2c_3$.
Define

$$M_k \equiv \sum_{i=1}^{m}\sum_{t=1}^{n} y_{it}\,1(H_{it} \in E_k)$$

and

$$e_k(\omega, \sigma_b) \equiv \sum_{i=1}^{m}\sum_{t=1}^{n} 1(H_{it} \in E_k)\,P_{\omega,\sigma_b}(y_{it} = 1 \mid H_{it}). \qquad (16)$$

Similar to Pearson’s chi-squared test, a test statistic can be defined by

$$\chi^* \equiv \sum_{k=1}^{K}\frac{(M_k - e_k(\omega, \sigma_b))^2}{E(M_k)}. \qquad (17)$$

If the parameters ω and $\sigma_b$ are known, then the asymptotic distribution of this test statistic is chi-squared with K degrees of freedom.

For our BTSM model, a new construction and the asymptotic properties of the goodness-of-fit test need to be rigorously established, for two reasons. First, the probability $P_{\omega,\sigma_b}(y_{it} = 1 \mid H_{it})$ is not completely specified under the null hypothesis, because it involves the unknown parameters $(\omega, \sigma_b)$. After the unknown parameters in $e_k(\omega, \sigma_b)$ are replaced by the estimated values $(\hat\omega, \hat\sigma_b)$, the chi-squared approximation may not be valid (Chernoff and Lehmann 1954; Jiang 2001b). Second, because of the random effects and time series structures in the BTSM model, the observations are correlated. Accordingly, the asymptotic chi-squared result may not follow from the classic central limit theorem.

Jiang (2001b) derived the asymptotic distribution of a goodness-of-fit test in LMMs with continuous response to assess the adequacy of distributional assumptions. Here we construct a new test statistic based on binary observations and the corresponding time series model. Furthermore, our BTSM model includes time-dependent covariates. For this general formulation, inference is based on the PL function. The asymptotic distribution of the new test statistic is derived by exploiting the martingale properties of the quasi-partial score process, an approach that differs from that of Jiang (2001b).

Define a new goodness-of-fit test statistic for distributional assumptions in the BTSM model as

$$\hat\chi^2 = \frac{1}{N}\sum_{k=1}^{K}\bigl(M_k - e_k(\hat\omega, \hat\sigma_b)\bigr)^2. \qquad (18)$$

Unlike Pearson’s chi-squared test, the asymptotic distribution of this new statistic may not be chi-squared. Thus there is no need for a cell-specific normalizing constant in the test statistic to achieve a chi-squared distribution. Instead, for simplicity, we choose a unified normalizing constant N, as was suggested by Jiang (2001b).

The asymptotic properties of the test statistic (18) are given in Theorem 3, a proof of which is given in Appendix D. The following notation is used in the theorem. Define $\theta = (\omega^\top, \sigma_b)^\top$,

$$D = \Bigl[\frac{1}{N}\sum_i\sum_t 1(H_{it} \in E_k)\frac{\partial\pi_{it}(\theta)}{\partial\omega^\top}\Bigr]_{1 \le k \le K}\Gamma_N^{-1},$$

$$G_{it} = \bigl[1(H_{it} \in E_k)\bigr]_{1 \le k \le K} - DX_{it},$$

$$C = \frac{1}{\sqrt{m}}W^{1/2}V^*W^{1/2}ZZ^\top W^{1/2}V^*W^{1/2},$$

$$\Phi = \frac{(I_N)^{-1}}{\sqrt{m}}\Bigl[\sum_i\sum_t 1(H_{it} \in E_k)\frac{\partial\pi_{it}(\theta)}{\partial\sigma_b^2}\Bigr]_{1 \le k \le K},$$

$$h_{it} = G_{it}\bigl(y_{it} - \pi_{it}(\theta)\bigr) - \Phi C_{it}\bigl(Y_{it} - X_{it}^\top\omega\bigr)^2,$$

$$V_h = \sum_{i=1}^{m}\sum_{t=1}^{n}\operatorname{var}(h_{it}), \qquad R = \operatorname{tr}\bigl((CV)^2\bigr) - \sum_i\sum_t (CV)_{it}^2,$$

and

$$\Delta_N = N^{-1}\bigl[V_h + 2\Phi R\Phi^\top\bigr]. \qquad (19)$$

Note that for the N × N matrices C and CV, $C_{it}$ and $(CV)_{it}$ represent the ((i − 1)n + t)th diagonal elements.

Theorem 3. Suppose that $\Delta_N$ in (19) converges to a limiting value, Δ. Under assumptions A1–A8, as N → ∞, the asymptotic distribution of the goodness-of-fit statistic (18) is

$$\hat\chi^2 \xrightarrow{d} \sum_{j=1}^{K}\lambda_j Z_j^2, \qquad (20)$$

where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_K)$, the $\lambda_j$ are the eigenvalues of Δ, and $Z_1, \ldots, Z_K$ are iid N(0, 1).
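As an illustration of (16), (18), and (20), take K = 2 events defined by the previous outcome alone ($H_{it} \in E_1$ if $y_{i,t-1} = 0$ and $E_2$ if $y_{i,t-1} = 1$, i.e., $c_2 = 2$). The Python sketch below (assuming NumPy) computes $M_k$, $e_k$, and $\hat\chi^2$ from fitted probabilities $\hat\pi_{it}$, and approximates the p-value of the weighted sum $\sum_j\lambda_jZ_j^2$ by simulation once an estimate $\hat\Delta$ is available. Excluding observations without a previous outcome and taking N as the number of observations used are assumptions of this sketch; the function names are hypothetical, and events are indexed from 0 in the code.

```python
import numpy as np


def gof_statistic(y, pi_hat, events):
    """chi-hat^2 in (18). events: integer array shaped like y giving the
    0-based index k of the event containing H_it, or -1 to exclude (i, t)."""
    K = events.max() + 1
    used = events >= 0
    N = used.sum()
    M = np.bincount(events[used], weights=y[used], minlength=K)       # M_k
    e = np.bincount(events[used], weights=pi_hat[used], minlength=K)  # e_k
    return np.sum((M - e) ** 2) / N, M, e


def weighted_chi2_pvalue(stat, Delta_hat, n_sim=200_000, rng=None):
    """Monte Carlo P(sum_j lambda_j Z_j^2 >= stat), lambda_j = eig(Delta_hat)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = np.linalg.eigvalsh(Delta_hat)        # Delta_hat is symmetric
    Z = rng.standard_normal((n_sim, lam.size))
    return np.mean((Z ** 2) @ lam >= stat)


# Events from the previous outcome; t = 0 has no previous outcome.
y = np.array([[0, 1, 1, 0, 1], [1, 0, 0, 1, 1]], dtype=float)
events = np.full(y.shape, -1)
events[:, 1:] = y[:, :-1].astype(int)
stat, M, e = gof_statistic(y, np.full_like(y, 0.5), events)
```

When $\hat\Delta$ is the identity, the limit reduces to a chi-squared distribution with K degrees of freedom; in general the eigenvalues reweight the components, which is why the critical value has to be computed from $\hat\Delta$ rather than read from a chi-squared table.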
Table 3. Empirical level of the goodness-of-fit test at 5%

Model              (m, n)     K = 2   K = 4   K = 6   K = 8   Chi-squared
BTSM–AR(1)         (24, 20)   .042    .046    .046    .044    .094
                   (16, 10)   .047    .049    .054    .055    .119
BTSM–MA(1)         (24, 20)   .046    .052    .054    .062    .084
                   (16, 10)   .063    .056    .065    .074    .172
BTSM–AR(2)         (24, 20)   .056    .042    .048    .060    .151
                   (16, 10)   .060    .048    .054    .062    .277
BTSM–ARMA(1, 1)    (24, 20)   .044    .046    .042    .046    .294
                   (16, 10)   .055    .058    .051    .056    .346

Let $\hat\Delta = N^{-1}[\hat V_h + 2\hat\Phi\hat R\hat\Phi^\top]$ denote the estimate of (19). Computing $\hat\Delta$ is essential to obtain the critical values in the goodness-of-fit test. In practice, it often is straightforward to evaluate $\hat\Delta$ by a Monte Carlo method as follows:

$$\hat\Delta = N^{-1}\Bigl[\sum_i\sum_t \operatorname{var}(h_{it}) + 2\hat\Phi\hat R\hat\Phi^\top\Bigr]$$

$$\approx N^{-1}\Bigl[\sum_i\sum_t\frac{1}{U}\sum_{u=1}^{U}\bigl(\hat h_{it,(u)} - \bar h_{it}\bigr)\bigl(\hat h_{it,(u)} - \bar h_{it}\bigr)^\top + \frac{2}{U}\sum_{u=1}^{U}\hat\Phi_{(u)}\hat R_{(u)}\hat\Phi_{(u)}^\top\Bigr]$$

$$= \frac{1}{U}\sum_{u=1}^{U}\Bigl[\frac{1}{N}\sum_i\sum_t\bigl(\hat h_{it,(u)} - \bar h_{it}\bigr)\bigl(\hat h_{it,(u)} - \bar h_{it}\bigr)^\top + \frac{2}{N}\hat\Phi_{(u)}\hat R_{(u)}\hat\Phi_{(u)}^\top\Bigr],$$
where U is the number of Monte Carlo simulations; $\hat h_{it,(u)}$, $\hat\Phi_{(u)}$, $\hat I_{N,(u)}$, and $\hat R_{(u)}$ are estimates with θ replaced by $\hat\theta$; $\bar h_{it} = U^{-1}\sum_{u=1}^{U}\hat h_{it,(u)}$; $y_{it}$ is a sample from Bernoulli trials with probability following the fitted BTSM model; and the $\beta_i$'s are iid variables generated from $N(\hat b, \hat\sigma_b^2)$. As mentioned in Section 3.2, Laplace’s method can be applied to approximate the integrals in $\partial\pi_{it}(\theta)/\partial\omega^\top$ and $\partial\pi_{it}(\theta)/\partial\sigma_b^2$.
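Once the U simulated data sets have been processed, the last line of the Monte Carlo formula for $\hat\Delta$ is a plain average. The Python sketch below (assuming NumPy) takes as given arrays of $\hat h_{it,(u)}$ (shape U × N × K), $\hat\Phi_{(u)}$ (U × K), and $\hat R_{(u)}$ (length U); producing those quantities requires refitting the model to each simulated data set, which is not shown. The function name is hypothetical, and the 1/U averaging follows the formula above.

```python
import numpy as np


def delta_hat_monte_carlo(h, Phi, R):
    """Monte Carlo estimate of Delta in (19).

    h: (U, N, K) simulated h_it,(u); Phi: (U, K) Phi_(u); R: (U,) R_(u).
    """
    U, N, K = h.shape
    dev = h - h.mean(axis=0, keepdims=True)            # h_it,(u) - h-bar_it
    var_part = np.einsum("uik,uil->kl", dev, dev) / (U * N)
    phi_part = 2.0 * np.einsum("u,uk,ul->kl", R, Phi, Phi) / (U * N)
    return var_part + phi_part
```

The eigenvalues of the returned K × K matrix are the weights $\lambda_j$ used in (20).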
4.2 Finite-Sample Performance and
Empirical Application
To examine the finite-sample performance of the proposed tests, we carried out simulations under both null and alternative hypotheses. Each result is based on 5,000 simulations at the 5% significance level. We studied two sample sizes, N = 480 (m = 24, n = 20) and N = 160 (m = 16, n = 10), and four partitions (K = 2, 4, 6, 8). For simplicity, we consider only equal cell partitions in this simulation study. As mentioned in Section 4.1, when unknown parameters are involved, no existing test has a valid asymptotic distribution. Thus we compare our method with a naive test: Pearson's chi-squared test in (17), but with the parameters estimated. Because the parameters are not assumed known, a naive way to apply Pearson's chi-squared test is to use the asymptotic chi-squared distribution with K − 1 − a degrees of freedom, where a is the number of estimated parameters. We conducted this comparison only for K = 8. For example, for the third model [BTSM–AR(2)] in Table 2, five parameters were estimated (three fixed effects, one random effect, and one corresponding variance), so the naive reference distribution is chi-squared with 2 (= 8 − 1 − 5) degrees of freedom. As noted earlier, this distribution may be incorrect, even asymptotically. Because this naive critical value is too small, using the correct critical value might bring the empirical levels in our simulations closer to the nominal level. However, this would come only at the expense of further reducing the power.
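For reference, the naive comparison described above uses a chi-squared critical value with K − 1 − a degrees of freedom. The sketch below computes this value in plain Python, using only the standard library. It relies on the closed-form chi-squared tail for integer degrees of freedom and on bisection, and the helper names are hypothetical. With K = 8, the AR(1) and MA(1) models (a = 4) give 3 degrees of freedom, while the AR(2) and ARMA(1, 1) models (a = 5) give 2. The naive 5% critical value therefore drops from about 7.81 to about 5.99 when one more parameter is estimated.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper tail P(X > x) for X ~ chi-squared(df), df a positive integer (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2*(1 - Phi(sqrt(x))) + 2*phi(sqrt(x)) * sum_k x**(k - 1/2) / (1*3*...*(2k-1))
    r = math.sqrt(x)
    std = NormalDist()
    tail = 2.0 * (1.0 - std.cdf(r))
    term = 2.0 * std.pdf(r) * r
    for k in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * k + 1)
    return tail


def chi2_isf(alpha, df, lo=0.0, hi=200.0, iterations=200):
    """Critical value c with P(X > c) = alpha, found by bisection (chi2_sf is decreasing)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def naive_df(K, n_estimated):
    """Degrees of freedom K - 1 - a used by the naive Pearson test."""
    return K - 1 - n_estimated


for a in (4, 5):  # AR(1)/MA(1) versus AR(2)/ARMA(1, 1), with K = 8
    df = naive_df(8, a)
    print(a, df, round(chi2_isf(0.05, df), 4))
```

As a check, `chi2_sf(6.8838, 1)` is about .0087, which is the p value reported for the naive test in Section 5.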
Binary data were generated using the BTSM models listed in Table 2, which have four different time series structures. Table 3 reports the empirical rejection probabilities under these four models, that is, the empirical levels of the tests. In general, as the sample size increases, the empirical level of the proposed test becomes more stable with respect to the number of partitions K. Compared with the naive Pearson chi-squared test, the proposed method performs better in two respects. First, it provides a more stable empirical level as the number of estimated parameters in the model increases. For example, when the number of estimated parameters increases from four [AR(1) or MA(1)] to five [AR(2) or ARMA(1, 1)], the empirical level of the naive test almost doubles and far exceeds the nominal 5% level. This is because the critical value of the naive test decreases rapidly as the number of estimated parameters increases. Second, the proposed method is robust to sample size. For the naive test, the empirical level increases dramatically as the sample size decreases, whereas for the proposed method the increase is only slight to modest. Table 4 reports the computing times on a 3.4-GHz PC for calculating the empirical level (based on 5,000 simulations) using R. The computing time increases roughly linearly with sample size but only marginally with K.
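The null-model data generation can be sketched for the BTSM–AR(1) row of Table 2. This is a minimal illustration and not the authors' R code. It makes three assumptions: the second parameter of N(−.3, .5) is a variance, each series starts at y_i0 = 0, and x_it cycles through .2, .4, .6, .8 as t increases. The function name is hypothetical.

```python
import numpy as np


def simulate_btsm_ar1(m, n, b_mean=-0.3, b_var=0.5, phi=1.3, alpha=0.3, seed=None):
    """Simulate m binary series of length n from the BTSM-AR(1) row of Table 2.

    Assumptions made for this sketch only: b_var is a variance, y_{i,0} = 0,
    and x_it cycles through .2, .4, .6, .8 as t increases.
    """
    rng = np.random.default_rng(seed)
    x_levels = np.array([0.2, 0.4, 0.6, 0.8])
    beta = rng.normal(b_mean, np.sqrt(b_var), size=m)  # subject random effects beta_i
    y = np.zeros((m, n), dtype=int)
    mu = np.zeros((m, n))
    y_prev = np.zeros(m, dtype=int)  # assumed starting value y_{i,0} = 0
    for t in range(n):
        eta = beta + phi * y_prev + alpha * x_levels[t % x_levels.size]  # logit(mu_it)
        mu[:, t] = 1.0 / (1.0 + np.exp(-eta))
        y[:, t] = rng.binomial(1, mu[:, t])
        y_prev = y[:, t]
    return y, mu, beta


y, mu, beta = simulate_btsm_ar1(m=24, n=20, seed=2008)
```

Each column of `y` depends on the previous column only through `y_prev`. This recursion is the AR(1) memory term; the MA, AR(2), and ARMA rows would add µ_{i,t−1} or y_{i,t−2} to `eta` in the same way.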
In terms of power, we chose two alternatives to assess the
distributional assumptions involved in the fitted model (at the
5% level): the Bernoulli assumption for the binary data and the
normal assumption for the random effects. The first alternative
assumes that the random effects are normally distributed and
that the binary data follow a beta-binomial distribution; that is,
the yit ’s are generated from a Bernoulli(Pit ) distribution, and
Pit is a random variable with a beta(µit , 1 − µit ) distribution.
The second alternative assumes a departure from the normal
assumption for random effects. Let yit follow a Bernoulli(µit )
distribution and let the random effect β follow a mixture of two
normal distributions, N (b1 , 1) and N (b2 , 1), with probabilities
prob and 1 − prob, denoted by MIXN(b1 , b2 , prob). In the simulation, the random effect is assumed to be MIXN(−.5, .5, .3).
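The two alternatives can be sketched as follows, in Python with NumPy; the function names are hypothetical. Because beta(µit, 1 − µit) has mean µit, the first alternative keeps E(Pit) = µit while adding extra variation. The mixture in the second alternative has mean prob·b1 + (1 − prob)·b2, which equals .2 for MIXN(−.5, .5, .3).

```python
import numpy as np


def beta_binomial_responses(mu, rng):
    """Alternative 1: P_it ~ beta(mu_it, 1 - mu_it), then y_it ~ Bernoulli(P_it)."""
    mu = np.asarray(mu, dtype=float)  # entries must lie strictly in (0, 1)
    p = rng.beta(mu, 1.0 - mu)
    return rng.binomial(1, p)


def mixn_random_effects(m, b1=-0.5, b2=0.5, prob=0.3, rng=None):
    """Alternative 2: beta_i ~ MIXN(b1, b2, prob), i.e. N(b1, 1) w.p. prob, else N(b2, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    centers = np.where(rng.random(m) < prob, b1, b2)
    return centers + rng.standard_normal(m)


rng = np.random.default_rng(7)
y_alt1 = beta_binomial_responses(np.full(200_000, 0.3), rng)
beta_alt2 = mixn_random_effects(200_000, rng=rng)
```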
Table 2. BTSM models with four different time series structures
Model             Model equation                                                       βi
BTSM–AR(1)        logit(µit) = βi + 1.3yit−1 + .3xit, xit = .2, .4, .6, .8              N(−.3, .5)
BTSM–MA(1)        logit(µit) = βi + 1.3(yit−1 − µit−1) + .3xit, xit ∈ (0, 1)            N(−.3, .5)
BTSM–AR(2)        logit(µit) = βi + yit−1 + .5yit−2 + .3xit, xit ∈ (0, 1)               N(−1, .5)
BTSM–ARMA(1, 1)   logit(µit) = βi + 1.5yit−1 + .5(yit−1 − µit−1) − .5xit, xit ∈ (0, 1)  N(−.3, .5)
Hung et al.: Binary Time Series Modeling
Table 4. Computing times (in minutes) for calculating the empirical level
Model             (m, n)     K = 2   K = 4   K = 6   K = 8   Chi-squared
BTSM–AR(1)        (24, 20)   13      15      15      17      17
                  (16, 10)   5       6       6       6       6
BTSM–MA(1)        (24, 20)   16      18      18      20      20
                  (16, 10)   7       7       7       9       9
BTSM–AR(2)        (24, 20)   18      18      20      20      20
                  (16, 10)   6       7       7       8       8
BTSM–ARMA(1, 1)   (24, 20)   18      18      21      22      22
                  (16, 10)   7       7       8       8       8

Table 5. Power of testing the Bernoulli assumption under the beta-binomial distribution
Model             (m, n)     K = 2   K = 4   K = 6   K = 8   Chi-squared
BTSM–AR(1)        (24, 20)   .651    .659    .718    .772    .574
                  (16, 10)   .361    .403    .608    .548    .296
BTSM–MA(1)        (24, 20)   .792    .806    .811    .912    .778
                  (16, 10)   .528    .729    .579    .634    .428
BTSM–AR(2)        (24, 20)   .688    .858    .762    .756    .728
                  (16, 10)   .603    .596    .586    .594    .578
BTSM–ARMA(1, 1)   (24, 20)   .402    .818    .754    .746    .648
                  (16, 10)   .216    .286    .428    .502    .486

Table 6. Power of testing normal random effects under the mixed normal distribution
Model             (m, n)     K = 2   K = 4   K = 6   K = 8   Chi-squared
BTSM–AR(1)        (24, 20)   .319    .427    .486    .674    .354
                  (16, 10)   .205    .314    .458    .327    .135
BTSM–MA(1)        (24, 20)   .994    .983    .988    .97     .826
                  (16, 10)   .924    1       .998    .993
BTSM–AR(2)        (24, 20)   .596    .607    .686    .702    .618
                  (16, 10)   .336    .375    .395    .448    .432
BTSM–ARMA(1, 1)   (24, 20)   .546    .658    .586    .616    .518
                  (16, 10)   .323    .348    .356    .434    .346

For each alternative, the µit's are obtained from the values specified in the four models in Table 2. Models were then fitted to the generated data by the procedure described in Section 3.

Tables 5 and 6 report the empirical rejection probabilities under both alternatives for the four BTSM models, which measure the empirical power. For the first two models, the proposed test is more powerful than the naive test under both alternatives [with the exception of BTSM–AR(1), m = 24, n = 20, K = 2 in Table 6]. For the latter two models, the naive method is sometimes more powerful than the proposed method when K is small (mostly K = 2, and in some cases K = 4). This is due to the higher empirical levels of the naive test shown in Table 3. Another issue is the dependence of performance on K, the number of cells. It is well known that the power of this type of goodness-of-fit test can vary greatly with K, and the simulation results confirm this, especially for the smaller sample size. A proper choice of partition is therefore important, which leads to the following guidelines for choosing the optimal number of partitions.

Although the construction of the goodness-of-fit test allows arbitrary partitioning of the cells, its performance depends on choosing a proper number K of exclusive subsets in (16). How should the optimal number of partitions be chosen? First, to ensure sufficient power, K should not be too small, because the fewer the cells, the more difficult it is to distinguish between two distributions. On the other hand, if there are too many cells, then the size of the test may become a problem, because the asymptotic distribution of the test is based on a K-dimensional central limit theorem. A necessary condition to maintain this asymptotic property is that K/N^{1/5} → 0 (Senatov 1980; Jiang 2001b). Therefore, the number of partitions should be chosen between 1 and ⌊N^{1/5}⌋. Within this range, a simulation with comparable sample sizes will help determine the optimal number of partitions. R code for the simulations is available at http://www2.isye.gatech.edu/~jeffwu/publications/ and can be easily adapted.
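The guideline bound, read here as ⌊N^{1/5}⌋, can be evaluated exactly with integer arithmetic. This avoids floating-point error in `N ** (1/5)`. A minimal Python sketch follows; the function name is hypothetical. For the simulation sizes N = 480 and N = 160 it gives 3 and 2, and for the adhesion data of Section 5 (N = 37 × 45 = 1665) it gives 4.

```python
def max_partitions(N):
    """Largest integer K with K**5 <= N, i.e. floor(N ** (1/5)), using exact integer arithmetic."""
    if N < 1:
        raise ValueError("N must be a positive integer")
    k = 1
    while (k + 1) ** 5 <= N:
        k += 1
    return k


for label, N in [("simulation, m=24, n=20", 24 * 20),
                 ("simulation, m=16, n=10", 16 * 10),
                 ("adhesion data, m=37, n=45", 37 * 45)]:
    print(label, N, max_partitions(N))
```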
5. APPLICATION IN THE ADHESION FREQUENCY EXPERIMENT
In this section we revisit the adhesion frequency experiment data and apply the proposed model to predict the adhesion probability, PANB. As described in Section 2, 37 pairs of cells were used in this experiment, and the adhesion test cycle for each pair was repeated 50 times. To study the time series behavior, the first five observations for each subject are treated as additional predictor variables; therefore, in this example, m = 37 and n = 45. The covariate here is the average number of bonds, denoted by ANBi for the ith pair of cells. Because the ANB is fixed for each pair of cells, there is no time-dependent covariate in this example. The one-dimensional (p = 1) covariate in model (6) therefore simplifies to xit = xit,1 = ANBi for all t.
With fixed effects ω = (α1, γ1, ϕ1, ζ1) and the corresponding Xit′ = (ANBi, ANBi × yit−1, yit−1, yit−1 − µit−1), the fitted BTSM model for PANB is

$$\operatorname{logit}(\mu_{it}) = \beta_i + \alpha_1\,\mathrm{ANB}_i + \gamma_1\,\mathrm{ANB}_i \times y_{it-1} + \phi_1\, y_{it-1} + \zeta_1\,(y_{it-1} - \mu_{it-1}), \tag{21}$$

where βi ∼ N(−1.33, .44). The MQPLE is ω̂ = (.97, −.62, 1.76, −.86), with corresponding p values .004, .031, <.001, and .006. The estimated variance component σ̂b = .4 (standard error .14) provides clear evidence of substantial heterogeneity among subjects. In model (21), ANBi and yit−1 have significant effects on PANB at time t. The positive α1 value of .97 indicates that PANB increases with the ANB. The adhesion memory can be described by an ARMA(1, 1) process. The positive ϕ1 value of 1.76 indicates that PANB is higher if adhesion occurred in the previous test. The significant interaction (ANBi × yit−1) plays an important role in
model interpretation. In the fitted model (21), the coefficient of ANBi is .97 − .62yit−1. This shows that the effect of ANB is smaller if adhesion occurred in the previous test (yit−1 = 1). On the other hand, the coefficient of yit−1 is 1.76 − .86 − .62ANBi = .90 − .62ANBi, so the effect of yit−1 decreases as the average number of bonds increases. Furthermore, the memory effect is close to 0 when the ANB is around 1.45 (= .90/.62). This implies that for two cells with ANB > 4.3 seconds, the repeated adhesion tests become nearly independent. The model thus provides new information for adhesion frequency analysis. It gives not only a flexible model for the memory effect, but also the conditions under which the independence assumption may hold.
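The arithmetic behind these interpretations can be checked directly from the reported estimates. The Python sketch below plugs the MQPLE ω̂ = (.97, −.62, 1.76, −.86) into (21), with function names that are hypothetical. By default it sets βi to the estimated mean −1.33. It computes the coefficient of ANBi, α1 + γ1yit−1, and the coefficient of yit−1, ϕ1 + ζ1 + γ1ANBi, together with the ANB at which the latter vanishes.

```python
import math

# MQPLE for model (21): (alpha1, gamma1, phi1, zeta1)
ALPHA1, GAMMA1, PHI1, ZETA1 = 0.97, -0.62, 1.76, -0.86
BETA_MEAN = -1.33  # estimated mean of the random effect beta_i


def logit_mu(anb, y_prev, mu_prev, beta_i=BETA_MEAN):
    """Right-hand side of (21) for one pair of cells at test t."""
    return (beta_i + ALPHA1 * anb + GAMMA1 * anb * y_prev
            + PHI1 * y_prev + ZETA1 * (y_prev - mu_prev))


def adhesion_probability(anb, y_prev, mu_prev, beta_i=BETA_MEAN):
    """Inverse logit of (21)."""
    return 1.0 / (1.0 + math.exp(-logit_mu(anb, y_prev, mu_prev, beta_i)))


def anb_effect(y_prev):
    """Coefficient of ANB_i: alpha1 + gamma1 * y_{i,t-1} = .97 - .62 y_{i,t-1}."""
    return ALPHA1 + GAMMA1 * y_prev


def memory_coefficient(anb):
    """Coefficient of y_{i,t-1}: phi1 + zeta1 + gamma1 * ANB_i = .90 - .62 ANB_i."""
    return PHI1 + ZETA1 + GAMMA1 * anb


anb_no_memory = -(PHI1 + ZETA1) / GAMMA1  # ANB at which the memory effect vanishes
print(round(anb_no_memory, 2))
```

The ζ1 term enters the coefficient of yit−1 because (yit−1 − µit−1) contains yit−1. With µit−1 held fixed, switching yit−1 from 0 to 1 therefore changes the logit by exactly `memory_coefficient(anb)`.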
The distributional assumptions here are normally distributed random effects and dependent Bernoulli responses. To assess their adequacy, we applied the proposed goodness-of-fit test (18) in this example. Based on simulation studies of the kind suggested in Section 4.2, the optimal number of partitions in this example was K = 4. We therefore first partitioned the previous information space Hit into four disjoint events,

E1 = (yit−1 = 0, µit−1 > .5),
E2 = (yit−1 = 0, µit−1 ≤ .5),
E3 = (yit−1 = 1, µit−1 > .5), and
E4 = (yit−1 = 1, µit−1 ≤ .5);

that is, c1 = 1, c2 = 2, and c3 = 2 in (16). We conducted 5,000 Monte Carlo simulations to evaluate Σ̂. The corresponding eigenvalues of Σ̂ are {.3100, .1428, .0350, .0252}. By Theorem 3, the critical values of the proposed goodness-of-fit test at α = .01, .05, and .1 are 2.3819, 1.6271, and 1.2556. The test statistic under model (21) is χ̂² = .9392, which is much smaller than the critical values. Thus we have no evidence on which to reject the hypothesis that the binary responses in adhesion tests follow a dependent Bernoulli distribution with probability given by model (21). As in Section 4.2, we compared the proposed test with the naive chi-squared test in (17) with 1 degree of freedom. The naive test statistic is 6.8838, with a corresponding p value of .0087. This would lead to rejecting both the dependent Bernoulli hypothesis and model (21). However, the simulation results in Section 4.2 show that the naive test can have a greatly inflated empirical level, so such a conclusion cannot be taken seriously.
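The cell assignment used above, and the statistic it feeds, can be sketched as follows. The sketch assumes, as in the Lemma D.1 construction of Appendix D, that χ̂² = N⁻¹ Σk (Mk − ek(θ̂))², with Mk − ek(θ̂) = Σi Σt 1(Hit ∈ Ek)(yit − π̂it). The function names and the flat array layout are hypothetical, and in a real application the fitted probabilities π̂it would come from model (21).

```python
import numpy as np


def partition_cells(y_prev, mu_prev):
    """Index 0..3 of the events E1..E4: E1 (y=0, mu>.5), E2 (y=0, mu<=.5), E3 (y=1, mu>.5), E4 (y=1, mu<=.5)."""
    y_prev = np.asarray(y_prev)
    mu_prev = np.asarray(mu_prev, dtype=float)
    return 2 * (y_prev == 1).astype(int) + (mu_prev <= 0.5).astype(int)


def gof_statistic(y, pi_hat, cells, K=4):
    """N^{-1} sum_k (M_k - e_k)^2, assuming M_k - e_k = sum over cell k of (y_it - pi_hat_it)."""
    y = np.asarray(y, dtype=float).ravel()
    pi_hat = np.asarray(pi_hat, dtype=float).ravel()
    cells = np.asarray(cells).ravel()
    diffs = np.bincount(cells, weights=y - pi_hat, minlength=K)  # M_k - e_k for each cell
    return float(np.sum(diffs ** 2) / y.size)


cells = partition_cells([0, 0, 1, 1], [0.6, 0.4, 0.6, 0.4])
stat = gof_statistic([1, 0, 1, 1], [0.5, 0.5, 0.5, 0.5], cells)
```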
Recall that the preliminary analysis in Section 2 demonstrated some memory effects in the repeated observations. By applying the BTSM model, the cell adhesion memory can be described by an ARMA(1, 1) process. Moreover, model (21) quantifies the effect of ANB and identifies a significant interaction between ANB and the previous test result. This is a significant advantage, because in practice it is difficult to assess MA and interaction effects by graphical analysis. As this example shows, by including the interaction term, the BTSM model provides flexibility in capturing different time series structures with respect to different covariates. Given the fitted models, goodness-of-fit tests can be conducted to check the distributional assumptions. The test results provide statistical evidence on the adequacy of the distributional assumptions and support model-based predictions. Another advantage of the BTSM model is that it incorporates random effects, allowing inference and predictions beyond the particular subjects used in the experiment.
6. SUMMARY AND CONCLUDING REMARKS
Despite the prevalence of multiple binary time series data in
many applications, their modeling and inference have not been
systematically studied in the literature. Here we have proposed
a BTSM model to analyze data when a repeated binary time series is observed for each subject. The model handles multiple
time series by incorporating random effects to borrow strength
across different subjects, thereby allowing inference and predictions beyond the specific units in the study. The BTSM model
includes numerous known models as special cases, and it also
may have applications in longitudinal analysis.
Estimators for the fixed effects and variance components
have been shown to be consistent and asymptotically normally
distributed. We have proposed a new goodness-of-fit test to
assess the adequacy of the distributional assumptions in the
BTSM model. Because there are some unknown parameters
and the data are dependent, the asymptotic distribution for the
test statistic is derived using a martingale central limit theorem.
Not surprisingly, the results differ from those of the classical Pearson chi-squared test. Our proposed test outperformed the naive Pearson chi-squared test in a simulation study. Some guidelines are given on the choice of the optimal number K of partitions.
As an application, we applied our BTSM model to fit some
multiple binary time series observed in a T-cell adhesion frequency experiment. This study demonstrates how the BTSM
model can help quantitatively describe the effects of significant
factors. Furthermore, the fitted model provides valuable information on MA and interaction effects that cannot be obtained
from graphical analysis. Our example shows that a first-order autocorrelation effect can be observed by graphical analysis, but not when higher-order autocorrelations are present. It also illustrates the use of the proposed goodness-of-fit test. Although the covariates in this example are independent of time, the proposed model and inference are generally applicable to problems with time-dependent covariates.
APPENDIX A: ASSUMPTIONS
A1. The parameter ω belongs to an open set B ⊆ R^s.
A2. The covariate matrix Xit lies almost surely in a nonrandom compact subset of R^s such that $P[\sum_i\sum_t X_{it}X_{it}' > 0] = 1$.
A3. $\sigma_b^2 \ge 0$ and var(β1) > 0.
A4. As N → ∞, $\liminf \lambda_{\min}\operatorname{cor}(I_{N-s}, V_1) > 0$ and $\lim \operatorname{tr}(V_1'V_1)^{1/2} = \infty$, where for matrices A1 and A2, $\operatorname{cor}(A_1, A_2) = \operatorname{tr}(A_1'A_2)/[\operatorname{tr}^{1/2}(A_1'A_1)\operatorname{tr}^{1/2}(A_2'A_2)]$.
A5. $\|N^{-1/2}a_T'D\|$ and $\|N^{-1/2}a_T'\Gamma J^{1/2}\|$ are bounded, where $\|\kappa\| = (\kappa'\kappa)^{1/2}$ for any vector κ.
A6. Define $\varepsilon' = (Y - X\omega)'V^{-1/2}$ as a vector with elements $\varepsilon_{it}$, and

$$\sigma_N^2 = \sum_i\sum_t \operatorname{var}\big((C^*V)_{it}\varepsilon_{it}^2 + G^*_{it}(y_{it}-\pi_{it}(\theta))\big) + 2\sum_{(it)'\ne it}(C^*V)^2_{it,(it)'}.$$

There exist $L_{it}$ such that, as N → ∞, the following quantities converge to 0:

$$\sigma_N^{-2}\Big\{\sum_{it}E\big[(C^*V)_{it}(\varepsilon_{it}^2-1)\big]^2\mathbf 1(|\varepsilon_{it}|>L_{it}) + 2\sum_{it\ne(it)'}\big((C^*V)_{it,(it)'}\big)^2\big[\delta_{it}+\delta_{(it)'}\big]\Big\}$$

and

$$\sigma_N^{-2}\sum_{it}(G^*_{it})^2E(y_{it}-\pi_{it}(\theta))^2\mathbf 1(|y_{it}-\pi_{it}(\theta)|>L_{it}),$$

where $\delta_{it} = E\varepsilon_{it}^2\mathbf 1(|\varepsilon_{it}|>L_{it})$.
A7. There exist $L_{it}$ such that, as N → ∞, the following quantities converge to 0:

$$\sigma_N^{-4}\Big\{\sum_{it}E\big[(C^*V)_{it}(\varepsilon_{it}^2-1)\big]^4\mathbf 1(|\varepsilon_{it}|\le L_{it}) + \sum_{it\ne(it)'}\big((C^*V)_{it,(it)'}\big)^4\delta^*_{it}\delta^*_{(it)'} + \sum_{it}\Big[\sum_{(it)'\ne it}\big((C^*V)_{it,(it)'}\big)^2\Big]^2\delta^*_{it}\Big\}$$

and

$$\sigma_N^{-4}\sum_{it}(G^*_{it})^4E(y_{it}-\pi_{it}(\theta))^4\mathbf 1(|y_{it}-\pi_{it}(\theta)|\le L_{it}),$$

where $\delta^*_{it} = E\varepsilon_{it}^4\mathbf 1(|\varepsilon_{it}|\le L_{it})$.
A8. As N → ∞, $\lambda_{\max}(\xi'\xi)/\sigma_N^2 \to 0$, where $\xi = (C^*V) - \operatorname{diag}(C^*V)$.

Assumptions A1 and A2 are required for the asymptotic properties of the fixed effects estimated from the partial likelihood. Lindeberg's condition holds under assumption A2 (Fokianos and Kedem 1998), which leads to the proof of Theorem 1. Assumptions A3 and A4 are the same as those of Jiang (1996).
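The matrix correlation in A4 is the cosine of the angle between A1 and A2 under the Frobenius inner product tr(A1′A2), and ‖κ‖ in A5 is the Euclidean norm. A small NumPy sketch of both quantities follows; the function names are hypothetical.

```python
import numpy as np


def matrix_cor(A1, A2):
    """cor(A1, A2) = tr(A1'A2) / [tr^{1/2}(A1'A1) tr^{1/2}(A2'A2)], as in A4."""
    A1 = np.asarray(A1, dtype=float)
    A2 = np.asarray(A2, dtype=float)
    num = np.trace(A1.T @ A2)  # Frobenius inner product of A1 and A2
    den = np.sqrt(np.trace(A1.T @ A1) * np.trace(A2.T @ A2))
    return float(num / den)


def vec_norm(kappa):
    """||kappa|| = (kappa' kappa)^{1/2}, as in A5."""
    kappa = np.asarray(kappa, dtype=float)
    return float(np.sqrt(kappa @ kappa))
```

Because it is a cosine, cor(A1, A2) lies in [−1, 1] and equals 1 exactly when one matrix is a positive multiple of the other.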
APPENDIX B: PROOF OF THEOREM 1
Only sketches of the proofs are given in Appendixes B, C, and D. Details can be found at http://www.amstat.org/publications/jasa/supplemental_materials.
Based on the partial likelihood, the partial score process for ω can be written as

$$S_n(\omega,\sigma_b) = \sum_{t=1}^{n}\sum_{i=1}^{m}X_{it}\big(y_{it}-\pi_{it}(\omega,\sigma_b)\big).$$

Assume that a σ-field is generated from the past data and covariates, $\mathcal F_{n-1} = \sigma(H_{1n}, H_{2n}, \ldots, H_{mn})$. It is clear that $E[S_n(\omega,\sigma_b)\mid\mathcal F_{n-1}] = S_{n-1}(\omega,\sigma_b)$ and $E[S_n(\omega,\sigma_b)] = 0$. From this and assumptions A1 and A2, it is easy to see that the partial score process $S_n(\omega,\sigma_b)$ is a sum of mean-0 martingale differences with respect to $\mathcal F_{n-1}$. The asymptotic normality follows from the martingale central limit theorem. Details of the proof are analogous to those outlined by Slud and Kedem (1994).

APPENDIX C: PROOF OF THEOREM 2
The inference for the variance component can be formulated as an LMM with variances of the error terms following the GLM iterative weights in (14). Define $Y^* = W^{1/2}Y$, $X^* = W^{1/2}X$, $Z^* = W^{1/2}Z$, and $e^* = W^{1/2}e$, where e is the error vector of the LMM. Substituting these into (14), the results follow directly as a special case of theorem 4.1 of Jiang (1996).

APPENDIX D: PROOF OF THEOREM 3
The proof follows along the lines described by Jiang (2001b). It includes several lemmas that culminate in the final proof.

Lemma D.1. Under the same assumptions as in Theorem 3, define

$$\Sigma_n = \Sigma_n(\theta) = n^{-1}\sum_{i=1}^{n}\operatorname{var}\big(h^{(1)}_{n,i}\big),$$

where

$$h^{(1)}_{n,i} = \Big[\mathbf 1(H_i\in E_k) - \Big\{\frac{1}{n}\sum_{j=1}^{n}\mathbf 1(H_j\in E_k)X_j'\big(1-\pi_j(\omega)\big)\pi_j(\omega)\Big\}\Phi_n^{-1}(\omega)X_i\Big]_{1\le k\le K}\big(y_i-\pi_i(\omega)\big).$$

Suppose that $\Sigma_n$ converges to a limiting value $\Sigma^{(1)}$. If there is no random effect in model (6) (i.e., m = 1 in Thm. 3), then the asymptotic distribution of the test statistic (18) is

$$\hat\chi^2 = \frac{1}{n}\sum_{j=1}^{K}\big(M_j - e_j(\hat\omega)\big)^2 \xrightarrow{d} \sum_{k=1}^{K}\lambda_k Z_k^2, \tag{D.1}$$

where λ1, . . . , λK are the eigenvalues of $\Sigma^{(1)}$.
The proof is omitted because it is similar to that of Jiang (2001a). The only difference is that here the asymptotics are proven using a martingale central limit theorem. Because we use the partial likelihood, this result is more general than that of Jiang (2001a).

Lemma D.2. Using the notation in Section 3.3, for any µ ∈ R \ {0},

$$\mu J^{-1/2}I_N\sqrt m\,(\hat\sigma_b^2-\sigma_b^2) = (Y-X\omega)'B_N^*(Y-X\omega) - E\big[(Y-X\omega)'B_N^*(Y-X\omega)\big], \tag{D.2}$$

where $B_N^* = J^{-1/2}\mu W^{1/2}V^*W^{1/2}ZZ'W^{1/2}V^*W^{1/2}/\sqrt m$.

Proof. Following the same argument as in Theorem 2 and considering the LMM with GLM weights, we can obtain this result by modifying a formula of Jiang (1996, p. 276, first formula) to

$$\mu J^{-1/2}I_N\sqrt m\,(\hat\sigma_b^2-\sigma_b^2) = \eta'B_N\eta - E(\eta'B_N\eta),$$

where $B_N = J^{-1/2}\mu V_1(\sigma_b)/\sqrt m$ and $V_1(\sigma_b)$ is as defined in Theorem 2. Lemma D.2 follows because $\eta' g(\sigma_b)W^{-1/2} = (Y-X\omega)'$.

Lemma D.3. Denote θ = (ω′, σb)′, $\xi_k = M_k - e_k(\hat\theta)$, and $\xi = (\xi_k)_{1\le k\le K}$. Let T be an orthogonal matrix such that $T'\Sigma_N T = \operatorname{diag}(\lambda_{N,1},\ldots,\lambda_{N,K})$, where $\lambda_{N,1},\ldots,\lambda_{N,K}$ are the eigenvalues of $\Sigma_N$. For any $a \in R^K$,

$$a'(N)^{-1/2}T'\xi = \sum_{i=1}^{m}\sum_{t=1}^{n}\Upsilon_{it} + o_p(1), \tag{D.3}$$

where $Ta = a_T = (a_{T,1},\ldots,a_{T,K})'$, $G^*_{it} = (N)^{-1/2}a_T'G_{it}$, $\varepsilon' = (\varepsilon_{it}) = (Y-X\omega)'V^{-1/2}$, $\operatorname{var}(\varepsilon_{it}) = 1$,

$$C^* = (N)^{-1/2}a_T'\Big[\sum_i\sum_t\mathbf 1(H_{it}\in E_k)\frac{\partial}{\partial\sigma_b^2}\pi_{it}(\theta)\Big]_{1\le k\le K}(I_N)^{-1}\frac{C}{\sqrt m},$$

and

$$\Upsilon_{it} = G^*_{it}\big(y_{it}-\pi_{it}(\theta)\big) - (C^*V)_{it}\varepsilon_{it}^2 - \Big(\sum_{(it)'\ne it}(C^*V)_{it,(it)'}\varepsilon_{(it)'}\Big)\varepsilon_{it} + (C^*V)_{it}.$$

Note that C* is an N × N matrix, with $C^*_{i't',it}$ denoting the element of C* in the [(i′ − 1)n + t′]th column and the [(i − 1)n + t]th row. By the definition given before Theorem 3, $C^*_{it,it} = C^*_{it}$.

Proof. For 1 ≤ k ≤ K, $\xi_k = M_k - e_k(\theta) - (e_k(\hat\theta) - e_k(\theta))$. By definition,

$$a'(N)^{-1/2}T'\xi = (N)^{-1/2}\sum_{k=1}^{K}a_{T,k}\big(M_k-e_k(\theta)\big) - (N)^{-1/2}\sum_{k=1}^{K}a_{T,k}\big(e_k(\hat\theta)-e_k(\theta)\big).$$

By a Taylor expansion, the second term on the right side can be approximated by

$$(N)^{-1/2}\sum_{k=1}^{K}a_{T,k}\Big[\Big(\sum_{i=1}^{m}\sum_{t=1}^{n}\mathbf 1(H_{it}\in E_k)\frac{\partial}{\partial\omega'}\pi_{it}(\theta)\Big)(\hat\omega-\omega) + \Big(\sum_{i=1}^{m}\sum_{t=1}^{n}\mathbf 1(H_{it}\in E_k)\frac{\partial}{\partial\sigma_b^2}\pi_{it}(\theta)\Big)(\hat\sigma_b^2-\sigma_b^2)\Big].$$

The result follows by Theorem 1, assumption A5, and Lemma D.2.
Lemma D.4. Under A6–A8, as N → ∞,

$$\sum_{i=1}^{m}\sum_{t=1}^{n}\Upsilon_{it} \xrightarrow{d} N(0, a'\Lambda a). \tag{D.4}$$

Proof. First, derive the asymptotic distribution of $\sum_{i=1}^{m}\sum_{t=1}^{n}\Upsilon_{it}/\sigma_N$, where $\sigma_N$ is as defined in assumption A6. Decompose $\Upsilon_{it}/\sigma_N = \Upsilon^{(1)}_{it} + \Upsilon^{(2)}_{it}$, where

$$\Upsilon^{(1)}_{it} = \frac{1}{\sigma_N}\Big[G^*_{it}u^*_{it} + (C^*V)_{it}U_{it} + \Big(\sum_{(it)'\ne it}(C^*V)_{it,(it)'}u_{(it)'}\Big)u_{it}\Big]$$

and

$$\Upsilon^{(2)}_{it} = \frac{1}{\sigma_N}\Big[G^*_{it}v^*_{it} + (C^*V)_{it}V_{it} + \Big(\sum_{(it)'\ne it}(C^*V)_{it,(it)'}v_{(it)'}\Big)u_{it} + \Big(\sum_{(it)'\ne it}\varepsilon_{(it)'}(C^*V)_{it,(it)'}\Big)v_{it}\Big].$$

Define

$$
\begin{aligned}
U_{it} &= (\varepsilon_{it}^2-1)\mathbf 1(|\varepsilon_{it}|<L_{it}) - E(\varepsilon_{it}^2-1)\mathbf 1(|\varepsilon_{it}|<L_{it}), & V_{it} &= (\varepsilon_{it}^2-1) - U_{it},\\
u_{it} &= \varepsilon_{it}\mathbf 1(|\varepsilon_{it}|<L_{it}) - E\varepsilon_{it}\mathbf 1(|\varepsilon_{it}|<L_{it}), & v_{it} &= \varepsilon_{it} - u_{it},\\
u^*_{it} &= (y_{it}-\pi_{it}(\theta))\mathbf 1(|y_{it}-\pi_{it}(\theta)|<L_{it}) - E(y_{it}-\pi_{it}(\theta))\mathbf 1(|y_{it}-\pi_{it}(\theta)|<L_{it}), & v^*_{it} &= (y_{it}-\pi_{it}(\theta)) - u^*_{it}.
\end{aligned}
$$

Then $\sum_{i=1}^{m}\sum_{t=1}^{n}\Upsilon^{(2)}_{it}$ converges to 0 in $L^2$. Next consider $\Upsilon^{(1)}_{it}$, an array of martingale differences, following the same argument as in theorem 5.2 of Jiang (1996). Based on assumption A7 and Rosenthal's inequality (Hall and Heyde 1980), $\max_{it}|\Upsilon^{(1)}_{it}|$ is bounded in $L^2$ and converges to 0 in probability. By theorem 3.2 of Hall and Heyde (1980), to prove Lemma D.4 we need to show that $\sum_i\sum_t(\Upsilon^{(1)}_{it})^2$ converges to a′Λa in probability. First, it can be decomposed as

$$\sum_{i=1}^{m}\sum_{t=1}^{n}\big(\Upsilon^{(1)}_{it}\big)^2 = \sum_{j=1}^{3}t_j + \sum_{j=1}^{3}s_j,$$

where

$$
\begin{aligned}
t_1 &= \sigma_N^{-2}\sum_{i=1}^{m}\sum_{t=1}^{n}\Big\{\big[(C^*V)_{it}U_{it} + G^*_{it}u^*_{it}\big]^2 - E\big[(C^*V)_{it}U_{it} + G^*_{it}u^*_{it}\big]^2\Big\},\\
t_2 &= 2\sigma_N^{-2}\sum_{i=1}^{m}\sum_{t=1}^{n}\Big(\sum_{(it)'\ne it}(C^*V)_{it,(it)'}u_{(it)'}\Big)\Big\{(C^*V)_{it}\big(U_{it}u_{it}-E(U_{it}u_{it})\big) + (G^*_{it}u^*_{it})u_{it} - E\big((G^*_{it}u^*_{it})u_{it}\big)\Big\},\\
t_3 &= 2\sigma_N^{-2}\sum_{i=1}^{m}\sum_{t=1}^{n}\Big(\sum_{(it)'\ne it}(C^*V)_{it,(it)'}u_{(it)'}\Big)^2\big(u_{it}^2 - Eu_{it}^2\big),\\
s_1 &= \sigma_N^{-2}\sum_{i=1}^{m}\sum_{t=1}^{n}E\big[(C^*V)_{it}U_{it} + G^*_{it}u^*_{it}\big]^2,\\
s_2 &= 2\sigma_N^{-2}\sum_{i=1}^{m}\sum_{t=1}^{n}\Big(\sum_{(it)'\ne it}(C^*V)_{it,(it)'}u_{(it)'}\Big)\Big\{E\big((C^*V)_{it}U_{it}u_{it}\big) + E\big((G^*_{it}u^*_{it})u_{it}\big)\Big\},\\
s_3 &= 2\sigma_N^{-2}\sum_{i=1}^{m}\sum_{t=1}^{n}\Big(\sum_{(it)'\ne it}(C^*V)_{it,(it)'}u_{(it)'}\Big)^2Eu_{it}^2.
\end{aligned}
$$

By assumption A7 and Rosenthal's inequality, we can show that $t_j \to 0$ in $L^2$ for j = 1, 2, 3, which is similar to the result in theorem 5.2 of Jiang (1996). By assumption A6, we can easily show that

$$s_1 = \sigma_N^{-2}\sum_{i=1}^{m}\sum_{t=1}^{n}\operatorname{var}\big((C^*V)_{it}\varepsilon_{it}^2 + G^*_{it}(y_{it}-\pi_{it}(\theta))\big) + o_p(1). \tag{D.5}$$

Analogous to theorem 5.2 of Jiang (1996), by assumptions A6–A8 we have

$$Es_2^2 \le c\Big(\frac{\lambda_{\max}(\xi'\xi)}{\sigma_N^2}\Big)^{1/2} \to 0, \tag{D.6}$$

where c represents a constant, and

$$s_3 = 2\sigma_N^{-2}\sum_{i=1}^{m}\sum_{t=1}^{n}\sum_{(it)'\ne it}(C^*V)^2_{it,(it)'} + o_p(1). \tag{D.7}$$

By (D.5) and (D.7), $\sum_i\sum_t(\Upsilon^{(1)}_{it})^2 = 1 + o_p(1)$. Because $\sigma_N^2 = a'T'\Sigma_N Ta$, it converges to a′Λa in probability. Consequently, (D.4) follows.

Proof of Theorem 3
From Lemmas D.2–D.4, we have, for any a,

$$a'N^{-1/2}T'\xi \xrightarrow{d} (a'\Lambda a)^{1/2}Z,$$

where Z ∼ N(0, 1), from which $N^{-1/2}T'\xi \xrightarrow{d} N(0,\Lambda)$ follows.
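The last step, from the joint limit of N^{-1/2}T′ξ to (20), is not written out above. Assume, as in the form of (D.1), that the statistic (18) can be written as χ̂² = N⁻¹ Σk ξk². The step then follows from the orthogonality of T (TT′ = I) and the continuous mapping theorem, with Λ = diag(λ1, . . . , λK) diagonal so that the limiting components are independent:

$$
\begin{aligned}
\hat\chi^2 &= N^{-1}\sum_{k=1}^{K}\xi_k^2 = N^{-1}\xi'TT'\xi = \big\|N^{-1/2}T'\xi\big\|^2 \xrightarrow{d} \|W\|^2, \qquad W \sim N(0,\Lambda),\\
\|W\|^2 &= \sum_{k=1}^{K}W_k^2 \overset{d}{=} \sum_{k=1}^{K}\lambda_k Z_k^2, \qquad W_k \overset{d}{=} \lambda_k^{1/2}Z_k,\quad Z_1,\ldots,Z_K \text{ iid } N(0,1).
\end{aligned}
$$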
[Received May 2007. Revised April 2008.]
REFERENCES
Barndorff-Nielsen, O. E., and Cox, D. R. (1989), Asymptotic Techniques for
Use in Statistics, London: Chapman & Hall.
Benjamin, M. A., Rigby, R. A., and Stasinopoulos, D. M. (2003), “Generalized
Autoregressive Moving Average Models,” Journal of the American Statistical
Association, 98, 214–223.
Breslow, N. E., and Clayton, D. G. (1993), “Approximate Inference to Generalized Linear Mixed Models,” Journal of the American Statistical Association,
88, 9–25.
Chernoff, H., and Lehmann, E. L. (1954), “The Use of Maximum Likelihood
Estimates in χ² Tests for Goodness of Fit,” The Annals of Mathematical
Statistics, 25, 579–586.
Chesla, S. E., Selvaraj, P., and Zhu, C. (1998), “Measuring Two-Dimensional
Receptor-Ligand Binding Kinetics by Micropipette,” Biophysical Journal, 75,
1553–1572.
Cox, D. R. (1975), “Partial Likelihood,” Biometrika, 62, 69–76.
Diggle, P., Heagerty, P., Liang, K.-Y., and Zeger, S. (2002), Analysis of Longitudinal Data (2nd ed.), Oxford, U.K.: Oxford University Press.
Fokianos, K., and Kedem, B. (1998), “Prediction and Classification of Non-Stationary Categorical Time Series,” Journal of Multivariate Analysis, 67, 277–296.
Fokianos, K., and Kedem, B. (2004), “Partial Likelihood Inference for Time Series Following Generalized Linear Models,” Journal of Time Series Analysis, 25, 173–197.
Fu, W. J. (1998), “Penalized Regressions: The Bridge versus the Lasso,” Journal of Computational and Graphical Statistics, 7, 397–416.
Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application,
New York: Academic Press.
Harville, D. A. (1977), “Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems,” Journal of the American Statistical
Association, 72, 320–340.
Jiang, J. (1996), “REML Estimation: Asymptotic Behavior and Related Topics,” The Annals of Statistics, 24, 255–286.
Jiang, J. (2001a), “A Nonstandard Chi-Square Test With Application to Generalized Linear Model Diagnostics,” Statistics and Probability Letters, 53, 101–109.
Jiang, J. (2001b), “Goodness-of-Fit Tests for Mixed Model Diagnostics,” The Annals of Statistics, 29, 1137–1164.
Kaufmann, H. (1987), “Regression Models for Nonstationary Categorical Time
Series: Asymptotic Estimation Theory,” The Annals of Statistics, 15, 79–98.
Kedem, B., and Fokianos, K. (2002), Regression Models for Time Series Analysis, New York: Wiley.
Li, W. K. (1994), “Time Series Models Based on Generalized Linear Models:
Some Further Results,” Biometrics, 50, 506–511.
Lin, X., and Breslow, N. E. (1996), “Bias Correction in Generalized Linear
Mixed Models With Multiple Components of Dispersion,” Journal of the
American Statistical Association, 91, 1007–1016.
Marshall, B. T., Long, M., Piper, J. W., Yago, T., McEver, R. P., and Zhu, C.
(2003), “Direct Observation of Catch Bonds Involving Cell-Adhesion Molecules,” Nature, 423, 190–193.
McCulloch, C. E. (1997), “Maximum Likelihood Algorithms for Generalized
Linear Mixed Models,” Journal of the American Statistical Association, 92,
162–170.
Mehta, A. D., Rief, M., Spudich, J. A., Smith, D. A., and Simmons, R. M.
(1999), “Single-Molecule Biomechanics With Optical Methods,” Science,
283, 1689–1695.
Patterson, H. D., and Thompson, R. (1971), “Recovery of Interblock Information When Block Sizes Are Unequal,” Biometrika, 58, 545–554.
Searle, S. R., Casella, G., and McCulloch, C. E. (1992), Variance Components,
New York: Wiley.
Senatov, V. V. (1980), “Uniform Estimates of the Rate of Convergence in the
Multi-Dimensional Central Limit Theorem,” Theory of Probability and Its
Applications, 25, 745–759.
Slud, E., and Kedem, B. (1994), “Partial Likelihood Analysis of Logistic Regression and Autoregression,” Statistica Sinica, 4, 89–106.
Wong, W. H. (1986), “Theory of Partial Likelihood,” The Annals of Statistics,
14, 88–123.
Zarnitsyna, V. I., Huang, J., Zhang, F., Chien, Y.-H., Leckband, D., and Zhu, C.
(2007), “Memory in Receptor–Ligand Mediated Cell Adhesion,” Proceedings
of the National Academy of Sciences USA, 104, 18037–18042.
Zeger, S. L., and Qaqish, B. (1988), “Markov Models for Time Series: A Quasi-Likelihood Approach,” Biometrics, 44, 1019–1032.
Zhu, C., Long, M., Chesla, S. E., and Bongrand, P. (2002), “Measuring Receptor/Ligand Interaction at the Single-Bond Level: Experimental and Interpretative Issues,” Annals of Biomedical Engineering, 30, 305–314.