Binary Time Series Modeling With Application to Adhesion Frequency Experiments

Ying Hung, Veronika Zarnitsyna, Yan Zhang, Cheng Zhu, and C. F. Jeff Wu

Repeated adhesion frequency assay is the only published method for measuring the kinetic rates of cell adhesion. Cell adhesion plays an important role in many physiological and pathological processes. Traditional analysis of adhesion frequency experiments assumes that the adhesion test cycles are independent Bernoulli trials. This assumption often can be violated in practice. Motivated by the analysis of repeated adhesion tests, a binary time series model incorporating random effects is developed. A goodness-of-fit statistic is introduced to assess the adequacy of distribution assumptions on the dependent binary data with random effects. The asymptotic distribution of the goodness-of-fit statistic is derived, and its finite-sample performance is examined through a simulation study. Application of the proposed methodology to real data from a T-cell experiment reveals some interesting information, including the dependency between repeated adhesion tests.

KEY WORDS: Cell adhesion; Goodness-of-fit test; Micropipette experiments; Random effects.

1. INTRODUCTION

This research is motivated by the statistical analysis of time series data from biomechanical experiments that study protein, DNA, and RNA at the level of single molecules (Mehta, Rief, Spudich, Smith, and Simmons 1999). Single-molecule mechanics experiments use ultrasensitive force techniques to mechanically characterize a single pair of molecules that physically links the force sensor to a sample surface. Figure 1 illustrates a simple experiment, the micropipette adhesion frequency assay (Chesla, Selvaraj, and Zhu 1998).
Here a human red blood cell (left), pressurized by micropipette suction, is used as a force transducer to test interactions between molecules presented on the red cell membrane and the counter molecules on the surface of another cell (right; only partly shown). The two cells are put together for a predetermined duration [Fig. 1(b)], then retracted. The simplest measurement evaluates whether a controlled contact results in adhesion. If adhesion results, then retraction will stretch the red cell [Fig. 1(c)]. If no adhesion results, then the red cell will not be stretched [Fig. 1(a)]. To ensure that adhesion is mediated by a single molecular bond, the experimental condition is designed such that adhesion is infrequent (Zhu, Long, Chesla, and Bongrand 2002). As such, in any particular test, both positive (i.e., adhesion; scored as 1) and negative (i.e., no adhesion; scored as 0) outcomes are possible and random.

Due to the inherent stochastic nature of single molecular interactions, such analysis requires numerous measurements to obtain their statistical properties; for example, the probability of adhesion can be estimated from the frequency of occurrence of adhesion in a large number of contacts (Chesla et al. 1998). The probability distribution of single bond lifetimes can be estimated from the histogram of a large number of lifetime measurements (Marshall et al. 2003). Experimentally, these are obtained by sequentially repeating the measurements many times.

Ying Hung is Assistant Professor, Department of Statistics, Rutgers University, Piscataway, NJ 08854; Yan Zhang is Professor, Institute of Biomedical Engineering, Department of Anatomy, Second Military University, China; Cheng Zhu is Regent’s Professor, Wallace H. Coulter Department of Biomedical Engineering; and C. F. Jeff Wu is Professor, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332.
Wu’s research was supported by National Science Foundation grant DMS 0305996. Zhu’s research was supported by National Institutes of Health grants AI38282 and AI44902. The authors thank the joint editors, an associate editor, and two referees for their helpful comments and suggestions.

A crucial assumption that allows the use of measurements from repeated tests for probability calculation is that all measurements are identical yet independent of one another; that is, the test sequence consists of independent and identically distributed random variables. But this may or may not be valid, depending on the particular biological system in question. Recently, Zarnitsyna et al. (2007) demonstrated that this assumption is not valid in some biological systems. Specifically, they showed that adhesion occurring in the immediate past test can either increase or decrease the likelihood for the next test to result in adhesion. A simple analysis has been developed to determine whether the independence assumption is valid and, if it is not, to measure the amount of change in the probability of adhesion in the next test due to adhesion occurring in the immediate past test. In this article we extend the simple analysis to a more sophisticated binary time series model.

Numerous methods for binary time series analysis are available in the literature (Zeger and Qaqish 1988; Li 1994; Slud and Kedem 1994; Benjamin, Rigby, and Stasinopoulos 2003). Most of these methods have been developed for a single series of observations; extensions to multiple binary time series modeling and related inferences have not been studied systematically. Both Li (1994) and Kedem and Fokianos (2002, p. 84) have pointed out the importance of extensions to cases in which a series is collected for each individual. This is different from classical time series analysis, in that the binary time series are observed on different replicates of the experimental units.
Correlation among the repeated observations may arise not only from memory effects, but also from shared unobserved variables. Therefore, more general models are needed to incorporate the correlations among repeated observations. Another important issue is model diagnostics. As an alternative to Pearson’s chi-squared test, which works under the independence assumption, test statistics and their theoretical properties need to be developed.

The remainder of this article is organized as follows. Some preliminary analysis results for an adhesion frequency experiment are presented in Section 2. In Section 3 a class of multiple binary time series models is proposed. In Section 4 a goodness-of-fit test for model assumptions is introduced, its asymptotic properties are derived, and its finite-sample performance is examined through a simulation study. In Section 5 the proposed model and inferences are applied to the same experiment, and the results are compared with those in Section 2. A summary and some concluding remarks are given in Section 6.

© 2008 American Statistical Association, Journal of the American Statistical Association, September 2008, Vol. 103, No. 483, Theory and Methods, DOI 10.1198/016214508000000508

Figure 1. Photomicrographs of the micropipette adhesion frequency assay.

2. PRELIMINARY ANALYSIS OF AN ADHESION FREQUENCY EXPERIMENT

In the micropipette adhesion frequency assay, adhesion between the two cells was staged by placing the cells into controlled contact for a given time and area through computer-driven micromanipulation, to ensure that each contact was as close to identical as possible to every other contact (Fig. 1). The average number of bonds (ANB) is a transformation of the contact time (Chesla et al. 1998). For each ANB, several replicates of cell pairs were tested. For each pair of cells, the adhesion test cycle (i.e., contact and retraction) was repeated 50 times.
Test scores (denoted by $y$) were recorded in binary form (i.e., $y = 0$ or 1), resulting in multiple binary time series of the type exemplified in Table 1. Under the independent Bernoulli trial assumption (Chesla et al. 1998), the average adhesion probability ($P_{ANB}$) can be simply estimated by the adhesion frequency, calculated as

\[ P_{ANB} = \frac{\text{number of adhesions}}{\text{number of test cycles}}. \tag{1} \]

Figure 2 shows an example of the relationship between $P_{ANB}$ and ANB (unpublished data, courtesy of Y. Zhang and C. Zhu). In this micropipette experiment, the adhesion test was conducted with seven different ANBs (.085, .17, .255, .34, .51, .68, and 1.36). Each of the first two ANBs has six replicate cell pairs, whereas each of the others has five. Each point in Figure 2 represents the $P_{ANB}$ value for one pair of cells, as calculated from (1). The solid line represents the average over all of the replicates under the same ANB.

The existing method of characterizing the relationship between $P_{ANB}$ and ANB (Chesla et al. 1998) is based on the assumption that the binary time series data (e.g., Table 1) form Bernoulli sequences. But for each pair of cells, the adhesion test cycles are observed repeatedly. The independence assumption may not hold, as demonstrated recently (Zarnitsyna et al. 2007); therefore, the adequacy of the distributional assumption must be checked before the method is applied. One graphical technique for assessing this assumption is the probability plot. If the data are collected from independent Bernoulli trials, then the number of trials needed to achieve success will follow a geometric distribution with probability $p$, where $p = \Pr(y = 1)$. For each ANB, the number of tests needed to achieve success is calculated over all replicates. Then its empirical cumulative distribution can be plotted against the geometric distribution, where the parameter $p$ is estimated by (1) at each ANB.
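The frequency estimate (1) and the geometric probability-plot check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, waiting times are counted from the start of each series and restarted after every adhesion, and the two input series are just the first 20 scores printed in Table 1 for ANB = .085, not the full 50-test records.

```python
from collections import Counter

def adhesion_frequency(series):
    """Equation (1): number of adhesions divided by number of test cycles."""
    return sum(series) / len(series)

def waiting_times(series):
    """Number of tests needed to reach each adhesion, restarting after every 1."""
    times, count = [], 0
    for y in series:
        count += 1
        if y == 1:
            times.append(count)
            count = 0
    return times

def geometric_cdf(k, p):
    """P(T <= k) for a geometric waiting time with success probability p."""
    return 1.0 - (1.0 - p) ** k

def probability_plot_points(replicates):
    """Return p_hat and (geometric CDF, empirical CDF) pairs, pooled over replicates."""
    pooled = [y for s in replicates for y in s]
    p_hat = adhesion_frequency(pooled)
    times = [t for s in replicates for t in waiting_times(s)]
    counts = Counter(times)
    total, cum, points = len(times), 0, []
    for k in sorted(counts):
        cum += counts[k]
        points.append((geometric_cdf(k, p_hat), cum / total))
    return p_hat, points

# Illustrative input: the truncated Table 1 rows for ANB = .085.
seq1 = [int(c) for c in "01010011011101010000"]
seq2 = [int(c) for c in "00010000100010100110"]
p_hat, points = probability_plot_points([seq1, seq2])
for theoretical, empirical in points:
    print(round(theoretical, 3), round(empirical, 3))
```

Under independent Bernoulli trials the printed pairs should lie close to the line of equality; systematic departures are the kind of deviation the probability plot is meant to reveal.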
In the probability plots (not shown here to save space), significant deviations from the straight line indicate violation of the independent Bernoulli assumption. Zarnitsyna et al. (2007) reached similar conclusions using a different analysis, which motivated our present work.

Table 1. Example of adhesion frequency experiment data

ANB      Fifty repeated adhesion tests
.085     01010011011101010000 ...
.085     00010000100010100110 ...
...      ...
1.360    00111010000001000011 ...
1.360    11110000111000000011 ...

Figure 2. $P_{ANB}$ varying with the ANB.

To provide further insight into the violation of the independent Bernoulli assumption, we use additional graphical plots to better characterize the dependence among repeated binary observations. The idea is to compare the conditional $P_{ANB}$ given the previous test results. Define $P(1|1)$ to be the conditional $P_{ANB}$ given adhesion in the previous test and $P(1|0)$ to be the conditional $P_{ANB}$ given no adhesion in the previous test. If the test results are independent, then $P(1|1)$ should be equal to $P(1|0)$, and both can be estimated by the $P_{ANB}$ in (1). In Figure 3, for each ANB, the “+” points represent the conditional probability, $P(1|1)$, calculated for each replicate, and the dotted line represents $P(1|1)$ calculated over all replicates under the same ANB. Similarly, the triangle points and dashed line are those for the conditional probability $P(1|0)$. For comparison, the solid line represents the $P_{ANB}$ calculated by (1) at each ANB. Because the dotted line and “+” points are much higher than the dashed line and triangle points, the $P_{ANB}$ is higher if adhesion occurs in the previous test. This provides strong evidence of a memory effect on repeated tests. A more in-depth biological discussion has been given by Zarnitsyna et al. (2007), who first observed the memory effect through a different analysis.

Figure 3. Memory effects in micropipette experiments (“+” points and dotted line, $P(1|1)$; triangle points and dashed line, $P(1|0)$).

From Figure 3, we can infer the existence of serial correlations and interactions. The figure also shows the heterogeneity among subjects. To describe and quantify significant effects on $P_{ANB}$, we consider the use of a new binary time series model that incorporates the various effects suggested by the plots.

3. MODELING AND ESTIMATION

3.1 Modeling

In this section we propose a new binary time series model. But first, we review some existing models.

3.1.1 Random-Effects Models. Random-effects models are most useful in longitudinal data analysis when correlation arises from some unobservable variables shared among repeated observations. Consider a binary realization $\{y_{ij}\}$ taking value 0 or 1 for subject $i$ at the $j$th observation. For given subject-specific coefficients $\beta_i$, assuming that the repeated observations for each individual are independent, the random-effects model takes the form

\[ \log \frac{\Pr(y_{ij} = 1 \mid \beta_i)}{1 - \Pr(y_{ij} = 1 \mid \beta_i)} = \beta_0 + \beta_i + x_{ij}'\alpha, \tag{2} \]

where the vector $x_{ij}$ denotes the covariates associated with the fixed effects $\alpha$ and the random effects $\beta_i$ are mutually independent with a common underlying multivariate distribution. This model is used to represent the natural heterogeneity across individuals in the regression coefficient. More discussion about this model has been given by Diggle, Heagerty, Liang, and Zeger (2002).

3.1.2 Binary Time Series Models. Non-Gaussian time series modeling techniques have been discussed extensively in the literature. Benjamin et al. (2003) proposed a generalized autoregressive moving average (GARMA) model. Applying this GARMA model with logistic link, a binary time series $\{y_t\}$ can be fitted as

\[ \operatorname{logit}(\mu_t) = x_t'\alpha + \sum_{r=1}^{R} \varphi_r A(y_{t-r}) + \sum_{q=1}^{Q} \zeta_q M(y_{t-q}, \mu_{t-q}), \tag{3} \]

where $x_t$ are covariates at time $t$ and $\mu_t = E(y_t \mid H_t)$ is the conditional mean given the previous information $H_t = \{x_t, \ldots, x_1, y_{t-1}, \ldots, y_1, \mu_{t-1}, \ldots, \mu_1\}$.
$A$ and $M$ are functions representing the autoregressive (AR) and moving average (MA) terms with corresponding orders $R$ and $Q$. These two functions together are denoted by ARMA($R$, $Q$). The $\varphi_r$’s and $\zeta_q$’s are the AR and MA parameters. For binary time series, a reasonable choice for $A$ and $M$ can be $y_t$ and residuals such as $y_t - \mu_t$. Model (3) includes many well-known models as special cases. One important submodel is the Zeger–Qaqish model with logistic link,

\[ \operatorname{logit}(\mu_t) = x_t'\alpha + \sum_{r=1}^{R} \varphi_r y_{t-r}, \tag{4} \]

and the MA form of this model (Li 1994),

\[ \operatorname{logit}(\mu_t) = x_t'\alpha + \sum_{q=1}^{Q} \zeta_q (y_{t-q} - \mu_{t-q}). \tag{5} \]

Asymptotic properties for the AR logistic regression models have been explored through conditional likelihood (Kaufmann 1987) and partial likelihood (Kedem and Fokianos 2002).

We propose a binary time series mixed (BTSM) model, a multiple logistic time series model with random effects that takes into account the heterogeneity among experimental units. Consider a binary time series realization $\{y_{it}\}$ taking values 0 or 1 for subject $i$ at time $t$, where $i = 1, \ldots, m$, $t = 1, \ldots, n$, and $mn = N$. Suppose that the experimental units are sampled from a population. It is reasonable to assume that the random effects, $\beta_i$, are independent draws from a normal distribution with mean $b$ and variance $\sigma_b^2$. For the vector $\beta = (\beta_1, \ldots, \beta_m)'$, the distribution can be written as $N(\mathbf{b}, \Sigma)$, where $\mathbf{b}$ is a column of $b$’s of length $m$, $\Sigma = \sigma_b^2 I_m$, and $I_m$ is the $m \times m$ identity matrix. The vector $x_{it} = \{x_{it,1}, \ldots, x_{it,p}\}'$ denotes the covariates associated with the $p$-dimensional fixed effects $\alpha = (\alpha_1, \ldots, \alpha_p)'$, and $z_{it} = \{z_{it,1}, \ldots, z_{it,m}\}'$ denotes the design vector for the random effects $\beta$ such that $z_{it}'\beta = \beta_i$, that is, $z_{it,i} = 1$ and $z_{it,j} = 0$ for all $j \neq i$. Denote the conditional mean as $\mu_{it} = E(y_{it} \mid H_{it})$. Given the previous information, $H_{it} = \{x_{it}, x_{it-1}, x_{it-2}, \ldots, y_{it-1}, y_{it-2}, \ldots, \mu_{it-1}, \mu_{it-2}, \ldots\}$, and the random effects, the $y_{it}$’s are conditionally independent with mean $E(y_{it} \mid \beta, H_{it}) = \mu_{it}^{\beta}$. By the logistic link function, the conditional mean $\mu_{it}^{\beta}$ is related to the linear predictor $\eta_{it}^{\beta}$ by

\[ \operatorname{logit}(\mu_{it}^{\beta}) = \eta_{it}^{\beta} = z_{it}'\beta + x_{it}'\alpha + \sum_{l=1}^{L} \gamma_l x_{it-l} y_{it-l} + \sum_{r=1}^{R} \varphi_r y_{it-r} + \sum_{q=1}^{Q} \zeta_q (y_{it-q} - \mu_{it-q}). \tag{6} \]

This model is called a BTSM model. The random effects $\beta$ are used to represent a variety of situations, including subject heterogeneity, unobserved covariates, and other forms of overdispersion. Here the heterogeneity is modeled directly through a subject-specific parameter. If a random intercept alone cannot sufficiently capture the variation exhibited in the data, then this model can be easily extended to a general form by incorporating more complicated random effects. Given $\beta_i$, the $y_{it}$’s are correlated because $y_{it-l}$ explicitly influences $y_{it}$. This correlation can be explained by the AR and MA components in (6). The MA process involving $\mu_{i,t-q}$ makes the model more complicated. In this formulation, the interaction terms $(x_{it-1} y_{it-1}, \ldots, x_{it-L} y_{it-L})$ between covariates and past outcomes provide flexibility in adjusting the time series structure with respect to different covariate settings.

Our proposed BTSM model is general and includes the aforementioned models. The development here is based on the logistic link, which is popular and easy to interpret. It can be easily extended to other link functions, however. The random-effects model in (2) is a submodel of the BTSM model under the assumption that the repeated measurements for each unit are independent, and the correlations among repeated observations arise only from some unobserved variables. With the logistic link function, the GARMA model in (3) is a special case of the BTSM model with no random effect included; that is, based on the population average, it models the time series structure without considering the heterogeneities among the units.
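To make the structure of model (6) concrete, the following sketch simulates a special case with a random intercept, one scalar covariate, one AR term, and one MA term (no interaction terms). It is an illustration rather than the authors' code: the function names are hypothetical, the start-up values $y_{i0} = 0$ and $\mu_{i0} = 0$ are an assumption, the way covariate levels are assigned to subjects is an assumption, and the parameter values are borrowed from the BTSM–AR(1) setting of Table 2 with the second parameter of the normal distribution read as a variance.

```python
import math
import random

def simulate_btsm(m, n, b, sigma_b2, alpha, phi, zeta, x, seed=0):
    """Simulate m binary series of length n from
    logit(mu_it) = beta_i + alpha*x_it + phi*y_{i,t-1} + zeta*(y_{i,t-1} - mu_{i,t-1}),
    with beta_i drawn iid from N(b, sigma_b2)."""
    rng = random.Random(seed)
    series = []
    for i in range(m):
        beta_i = rng.gauss(b, math.sqrt(sigma_b2))
        y_prev, mu_prev = 0, 0.0  # assumed start-up values
        ys = []
        for t in range(n):
            eta = beta_i + alpha * x[i][t] + phi * y_prev + zeta * (y_prev - mu_prev)
            mu = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < mu else 0
            ys.append(y)
            y_prev, mu_prev = y, mu
        series.append(ys)
    return series

def conditional_frequencies(series):
    """Empirical P(1|1) and P(1|0), pooled over all series (None if undefined)."""
    n11 = n1 = n01 = n0 = 0
    for ys in series:
        for prev, cur in zip(ys, ys[1:]):
            if prev == 1:
                n1 += 1
                n11 += cur
            else:
                n0 += 1
                n01 += cur
    p11 = n11 / n1 if n1 else None
    p10 = n01 / n0 if n0 else None
    return p11, p10

m, n = 200, 50
levels = [0.2, 0.4, 0.6, 0.8]
x = [[levels[i % 4]] * n for i in range(m)]  # assumed: one covariate level per subject
data = simulate_btsm(m, n, b=-0.3, sigma_b2=0.5, alpha=0.3, phi=1.3, zeta=0.0, x=x, seed=1)
p11, p10 = conditional_frequencies(data)
print(round(p11, 3), round(p10, 3))
```

With a positive AR coefficient, the pooled frequency of adhesion after an adhesion exceeds the frequency after a non-adhesion, which mirrors the memory effect seen in Figure 3. Setting `phi` and `zeta` to zero reduces the simulation to the random-effects model (2), and setting `sigma_b2` to zero reduces it to a logistic GARMA-type series.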
More than a simple extension of existing models, the BTSM model poses some challenging tasks. By considering the hidden variables shared among units, it incorporates random effects in logistic time series regression. This makes estimation and inference more complicated than, and different from, those in standard binary time series analysis. Another important issue is the goodness-of-fit test for model diagnostics. Related work on linear mixed models has been reported in the literature (Jiang 2001a,b); however, there is no existing method for testing the distributional assumption in binary time series models with random effects. Furthermore, the asymptotic chi-squared distribution cannot be applied to the new test statistics because of its independence assumption. Instead, we use a martingale central limit theorem in the next section to derive the asymptotic properties.

3.2 Estimation by Partial Likelihood

The model-fitting procedure herein is based on partial likelihood (PL). PL was introduced by Cox (1975). A more formal definition and theoretical justification have been given by Wong (1986). Fokianos and Kedem (2004) have discussed using PL in time series that follow generalized linear models (GLMs). For the BTSM model, the presence of random effects causes some difficulty with integration, making the estimation different from that of standard methods in time series analysis. In this section we propose an approximation procedure to tackle this problem.

Denote the observation vector by $y = (y_1, \ldots, y_m)'$, where the observations for subject $i$ are $y_i = (y_{i1}, \ldots, y_{in})'$, and let $\gamma = (\gamma_1, \ldots, \gamma_L)'$, $\varphi = (\varphi_1, \ldots, \varphi_R)'$, and $\zeta = (\zeta_1, \ldots, \zeta_Q)'$. Assume that $\omega = (\alpha', \gamma', \varphi', \zeta')'$ are $s$-dimensional fixed effects and that $X$ is the corresponding matrix with rows

\[ X_{it}' = \bigl( x_{it}',\; x_{it-1} y_{it-1}, \ldots, x_{it-L} y_{it-L},\; y_{it-1}, \ldots, y_{it-R},\; (y_{i,t-1} - \mu_{it-1}), \ldots, (y_{i,t-Q} - \mu_{it-Q}) \bigr). \]
Similarly, the design matrix for the random effects, with rows $z_{it}'$, is denoted by $Z$. Given the previous information $H_{it}$, the corresponding PL for fixed effects is

\[ PL(\omega \mid \beta) = \prod_{i=1}^{m} \prod_{t=1}^{n} pl_{\omega}(y_{it} \mid \beta, H_{it}) = \prod_{i=1}^{m} \prod_{t=1}^{n} [\pi_{it}(\omega \mid \beta)]^{y_{it}} [1 - \pi_{it}(\omega \mid \beta)]^{1 - y_{it}}, \]

where $\pi_{it}(\omega \mid \beta) = P_{\omega \mid \beta}(y_{it} = 1 \mid H_{it}) = \mu_{it}^{\beta}$. The integrated quasi-PL function used to estimate $(\omega, \sigma_b^2)$ is defined by

\[ |\Sigma|^{-1/2} \int \exp\Bigl\{ \log PL(\omega \mid \beta) - \tfrac{1}{2} \beta' \Sigma^{-1} \beta \Bigr\}\, d\beta. \]

Because of the difficulty in implementing the full PL, we use the penalized quasi-PL (PQPL) as an approximation. The integrated quasi-partial log-likelihood can be approximated by Laplace’s method (Barndorff-Nielsen and Cox 1989; Breslow and Clayton 1993),

\[ -\tfrac{1}{2} \log |I_m + Z' W Z \Sigma| + \sum_{i=1}^{m} \sum_{t=1}^{n} \Bigl( y_{it} \log \frac{\pi_{it}(\omega \mid \tilde\beta)}{1 - \pi_{it}(\omega \mid \tilde\beta)} + \log\bigl(1 - \pi_{it}(\omega \mid \tilde\beta)\bigr) \Bigr) - \tfrac{1}{2} \tilde\beta' \Sigma^{-1} \tilde\beta, \tag{7} \]

where $W$ is the $N \times N$ diagonal matrix with diagonal terms $w_{it} = \pi_{it}(\omega \mid \tilde\beta)(1 - \pi_{it}(\omega \mid \tilde\beta))$ and $\tilde\beta = \tilde\beta(\omega, \sigma_b)$ is the solution of $\sum_{i=1}^{m} \sum_{t=1}^{n} (y_{it} - \pi_{it}(\omega \mid \beta)) z_{it} - \beta / \sigma_b^2 = 0$, which maximizes the sum of the last two terms in (7). Using derivations similar to those of Breslow and Clayton (1993), the PQPL score equations for fixed effects $\omega$ and random effects $\beta$ are

\[ \sum_{i=1}^{m} \sum_{t=1}^{n} X_{it} \bigl( y_{it} - \pi_{it}(\omega, \sigma_b) \bigr) = 0 \tag{8} \]

and

\[ \sum_{i=1}^{m} \sum_{t=1}^{n} z_{it} \bigl( y_{it} - \pi_{it}(\omega, \sigma_b) \bigr) = \Sigma^{-1} \beta, \tag{9} \]

where $\pi_{it}(\omega, \sigma_b) = P_{\omega, \sigma_b}(y_{it} = 1 \mid H_{it})$. Given $\sigma_b$, the maximum quasi-PL estimator (MQPLE) $(\hat\omega, \hat\beta)$ can be obtained by solving these two score equations. An important role in PL inference is played by the score process (8)–(9), which is a vector of martingales with respect to $H_{it}$. Thus the study of asymptotic behavior of the MQPLE $\hat\omega$, described in Section 3.3, is based on central limit theorems for martingales.

Questions regarding the existence and uniqueness of the MQPLE are important. Fu (1998) discussed these issues in a penalized likelihood estimation problem.
Similar results can be extended to the MQPLE and provide the essential conditions needed for existence and uniqueness of the MQPLE.

The restricted maximum likelihood (REML) (Patterson and Thompson 1971) version of the approximated profile quasi-likelihood function for the variance components can be written as

\[ ql(\hat\omega(\sigma_b), \sigma_b) \approx -\tfrac{1}{2} \log |V| - \tfrac{1}{2} \log |X' V^{-1} X| - \tfrac{1}{2} (Y - X\hat\omega)' V^{-1} (Y - X\hat\omega), \tag{10} \]

where $X$ and $Z$ are the design matrices, $V = W^{-1} + Z \Sigma Z'$, and $Y$ is a vector with components $Y_{it} = \eta_{it}^{\beta} + (y_{it} - \mu_{it}^{\beta}) / (\mu_{it}^{\beta}(1 - \mu_{it}^{\beta}))$. Differentiating (10) with respect to $\sigma_b^2$ gives the estimating equation for the variance components (Harville 1977; Searle, Casella, and McCulloch 1992),

\[ -\tfrac{1}{2} \Bigl[ (Y - X\omega)' V^{-1} \frac{\partial V}{\partial \sigma_b^2} V^{-1} (Y - X\omega) - \operatorname{tr}\Bigl( P \frac{\partial V}{\partial \sigma_b^2} \Bigr) \Bigr] = 0, \tag{11} \]

where $P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-1} X' V^{-1}$. Estimation of the fixed effects and variance components can be obtained by iteratively solving (8), (9), and (11).

This estimation procedure differs from that for standard generalized linear mixed models (GLMMs) because the new model involves time series structure; that is, the $\mu_{it-q}$ term depends on all of the previous observations throughout the iterations. We may compute $\mu_{it-q}$ by setting the initial $\mu_{it-q}$’s to 0 or to the sample mean of $y_{it}$. This should have a negligible effect for a sufficiently long iteration. Estimation can be carried out by simple modification in standard statistical software for GLMMs, such as the SAS GLIMMIX package. Details about GLMMs have been given by Breslow and Clayton (1993). Questions regarding robust estimation and an efficient algorithm have been addressed by several authors (e.g., McCulloch 1997; Lin and Breslow 1996).

3.3 Asymptotic Properties

Here we study large-sample properties for fixed effects and variance components in our BTSM model. Considering a model that includes time-dependent covariates, Fokianos and Kedem (2004) studied the asymptotic behavior of fixed effects $\omega$ in generalized linear time series models using PL inference. Theorem 1 is an extension of their work to multiple binary time series models with random effects. Based on the quasi-PL, the theorem gives the consistency and asymptotic normality for the fixed-effects estimators $\hat\omega$. With the aid of the working dependent variables $Y$ defined in Section 3.2, Theorem 2 gives the asymptotic properties for the REML estimator of $\sigma_b^2$ based on some asymptotic properties for linear mixed models (LMMs) with GLM iterative weights (Jiang 1996). Assumptions and proofs are given in the appendixes.

Theorem 1. Under assumptions A1 and A2, the MQPLE $\hat\omega$ for the fixed effects is consistent and asymptotically normal as $N \to \infty$,

\[ \sqrt{N} (\hat\omega - \omega) = \Gamma_N^{-1} \frac{1}{\sqrt{N}} S_n(\omega, \sigma_b) + o_p(1) \tag{12} \]

and

\[ \sqrt{N}\, \Gamma_N^{1/2} (\hat\omega - \omega) \xrightarrow{d} N(0, I_s), \tag{13} \]

where the sample information matrix is $\Gamma_N = \frac{1}{N} \sum_{t=1}^{n} \sum_{i=1}^{m} X_{it} X_{it}' \pi_{it}(\omega, \sigma_b)(1 - \pi_{it}(\omega, \sigma_b))$ and $S_n(\omega, \sigma_b) = \sum_{t=1}^{n} \sum_{i=1}^{m} X_{it} (y_{it} - \pi_{it}(\omega, \sigma_b)) = X'(y - \pi(\omega, \sigma_b))$.

Based on work of Breslow and Clayton (1993), inference on the variance component in model (6) can be formulated as an iterative procedure to estimate LMMs with the GLM iterative weight $W^{-1}$ as

\[ Y = X\omega + Z\beta + \varepsilon, \tag{14} \]

where $\beta$ comes from $N(0, \Sigma)$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_N)$ follows $N(0, W^{-1})$, and the corresponding $w_{it}$’s are rewritten as $(w_1, \ldots, w_N)$. Recall that $\Sigma = \sigma_b^2 I_m$. Jiang (1996) developed rigorous asymptotic properties for REML estimates of variance components in LMMs without the normality assumption on random effects and errors; thus Theorem 2 is a special case for LMMs with known unequal weights. Here we borrow some notation from Jiang (1996). Define

\[ V^{*} = A \bigl( A' W^{1/2} V W^{1/2} A \bigr)^{-1} A', \]

where $A$ is any $N \times (N - s)$ matrix such that $\operatorname{rank}(A) = N - s$ and $A' W^{1/2} X = 0$;

\[ g(\sigma_b) = \bigl( I_N,\; \sqrt{\sigma_b^2}\, W^{1/2} Z \bigr)'; \]

\[ \tilde\varepsilon_l = \begin{cases} \sqrt{w_l}\, \varepsilon_l, & 1 \le l \le N, \\ \beta_{l-N} / \sqrt{\sigma_b^2}, & N + 1 \le l \le N + m; \end{cases} \]

\[ V_1 = \bigl( A' W^{1/2} V W^{1/2} A \bigr)^{-1/2} A' W^{1/2} Z Z' W^{1/2} A \bigl( A' W^{1/2} V W^{1/2} A \bigr)^{-1/2}; \]

\[ V_1(\sigma_b) = g(\sigma_b) V^{*} W^{1/2} Z Z' W^{1/2} V^{*} g(\sigma_b)'; \]

\[ \mathcal{I}_N = \frac{\operatorname{tr}(V_1 V_1)}{m}; \qquad K = \frac{1}{m} \sum_{l=1}^{N+m} (E\tilde\varepsilon_l^{4} - 3)\, V_1(\sigma_b)_{ll}^{2}; \qquad \text{and} \qquad J = 2\mathcal{I}_N + K. \]

Theorem 2. Under assumptions A3 and A4, as $N \to \infty$ and $m \to \infty$, the REML estimate for variance components is consistent and asymptotically normal with

\[ J^{-1/2}\, \mathcal{I}_N \sqrt{m}\, (\hat\sigma_b^2 - \sigma_b^2) \xrightarrow{d} N(0, 1). \tag{15} \]

4. GOODNESS OF FIT FOR MODEL DIAGNOSTICS

4.1 Goodness-of-Fit Test

Pearson’s chi-squared test is generally used to test whether data follow some specific distribution. An important assumption for this test is the independence of the observations. How can we perform testing for model assumptions when the data come from a binary time series model? One approach is to classify the responses according to mutually exclusive events in terms of the previous output and then check the differences between the observed and theoretical frequencies in each category. This can be written as follows.

Assume that the binary data $y_{it}$ come from a binomial distribution with probability $p$ depending on $H_{it}$. $H_{it}$, defined in Section 3.1, can be decomposed into several mutually exclusive events. Recall that $H_{it} = \{x_{it}, x_{it-1}, x_{it-2}, \ldots, y_{it-1}, y_{it-2}, \ldots, \mu_{it-1}, \mu_{it-2}, \ldots\}$. Suppose that the decomposition is determined by $n_1$ covariates $(x_{it}, \ldots, x_{it-n_1})$ decomposed into $c_1$ exclusive subsets, $n_2$ AR effects $(y_{it-1}, \ldots, y_{it-n_2})$ decomposed into $c_2$ exclusive subsets, and $n_3$ MA effects $(\mu_{it-1}, \ldots, \mu_{it-n_3})$ decomposed into $c_3$ exclusive subsets, where $c_i \ge 1$ for $i = 1, 2, 3$. Therefore, there are $K$ exclusive events denoted by $E_1, \ldots, E_K$ with $K = c_1 c_2 c_3$. Define

\[ M_k \equiv \sum_{i=1}^{m} \sum_{t=1}^{n} y_{it}\, 1_{(H_{it} \in E_k)} \qquad \text{and} \qquad e_k(\omega, \sigma_b) \equiv \sum_{i=1}^{m} \sum_{t=1}^{n} 1_{(H_{it} \in E_k)} P_{\omega, \sigma_b}(y_{it} = 1 \mid H_{it}). \tag{16} \]

Similar to Pearson’s chi-squared test, a test statistic can be defined by

\[ \chi^{*} \equiv \sum_{k=1}^{K} \frac{(M_k - e_k(\omega, \sigma_b))^2}{E(M_k)}. \tag{17} \]

If the parameters $\omega$ and $\sigma_b$ are known, then the asymptotic distribution for this test statistic is chi-squared with $K$ degrees of freedom.

For our BTSM model, new construction and asymptotic properties of the goodness-of-fit test need to be rigorously established, for two reasons. First, the probability $P_{\omega, \sigma_b}(y_{it} = 1 \mid H_{it})$ is not completely specified under the null hypothesis, because it involves the unknown parameters $(\omega, \sigma_b)$. After the unknown parameters in $e_k(\omega, \sigma_b)$ are replaced by the estimated values $(\hat\omega, \hat\sigma_b)$, the chi-squared approximation may not be valid (Chernoff and Lehmann 1954; Jiang 2001b). Second, because of the random effects and time series structures in the BTSM model, the observations are correlated. Accordingly, the asymptotic chi-squared result may not follow from the classic central limit theorem.

Jiang (2001b) derived the asymptotic distribution for a goodness-of-fit test in LMMs with continuous response to assess the adequacy of distributional assumptions. Here we construct a new test statistic based on binary observations and the corresponding time series model. Furthermore, our BTSM model includes time-dependent covariates. For this general formulation, inference is made based on the PL function. The asymptotic distribution of the new test statistic is derived by exploiting the martingale properties of the quasi-partial score process, which differs from the approach of Jiang (2001b). Define a new goodness-of-fit test statistic for distributional assumptions in the BTSM model as

\[ \hat\chi^{2} = \frac{1}{N} \sum_{k=1}^{K} (M_k - e_k(\hat\omega, \hat\sigma_b))^2. \tag{18} \]

Unlike Pearson’s chi-squared test, the asymptotic distribution for this new statistic may not be chi-squared. Thus there is no need to have a normalizing constant in the test statistic to achieve a chi-squared distribution. Instead, for simplicity, we choose a unified $N$, as was suggested by Jiang (2001b). The asymptotic properties for the test statistic (18) are given in Theorem 3, a proof of which is given in Appendix D. The following notation is used in the theorem. Define $\theta = (\omega', \sigma_b)$,

\[ D = \Bigl( \frac{1}{N} \sum_{t} \sum_{i} 1_{(H_{it} \in E_k)} \frac{\partial}{\partial \omega'} \pi_{it}(\theta) \Bigr)_{1 \le k \le K} \Gamma_N^{-1}, \]

\[ G_{it} = \bigl( 1_{(H_{it} \in E_k)} \bigr)_{1 \le k \le K} - D X_{it}, \]

\[ C = \frac{1}{\sqrt{m}} W^{1/2} V^{*} W^{1/2} Z Z' W^{1/2} V^{*} W^{1/2}, \]

\[ \Psi = \frac{(\mathcal{I}_N)^{-1}}{\sqrt{m}} \Bigl( \sum_{t} \sum_{i} 1_{(H_{it} \in E_k)} \frac{\partial}{\partial \sigma_b^2} \pi_{it}(\theta) \Bigr)_{1 \le k \le K}, \]

\[ h_{it} = G_{it} (y_{it} - \pi_{it}(\theta)) - \Psi C_{it} (Y_{it} - X_{it}' \omega)^2, \]

\[ V_h = \sum_{i=1}^{m} \sum_{t=1}^{n} \operatorname{var}(h_{it}), \qquad R = \operatorname{tr}((CV)^2) - \sum_{i} \sum_{t} (CV)_{it}^{2}, \]

and

\[ \Delta_N = N^{-1} [V_h + 2 \Psi R \Psi']. \tag{19} \]

Note that for the $N \times N$ matrices $C$ and $CV$, $C_{it}$ and $(CV)_{it}$ represent the $((i - 1)n + t)$th diagonal elements.

Theorem 3. Suppose that $\Delta_N$ in (19) converges to a limiting value, $\Delta$. Under assumptions A1–A8, as $N \to \infty$, the asymptotic distribution of the goodness-of-fit statistic (18) is

\[ \hat\chi^{2} \xrightarrow{d} \sum_{j=1}^{K} \lambda_j Z_j^2, \tag{20} \]

where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_K)$, the $\lambda_j$ are the eigenvalues of $\Delta$, and $Z_1, \ldots, Z_K$ are iid $N(0, 1)$.

Let $\hat\Delta = N^{-1} [\hat V_h + 2 \hat\Psi \hat R \hat\Psi']$ denote the estimate of (19). Computing $\hat\Delta$ is essential to obtain the critical values in the goodness-of-fit test. In practice, it often is straightforward to evaluate $\hat\Delta$ by a Monte Carlo method as follows:

\[
\begin{aligned}
\hat\Delta &= N^{-1} \Bigl[ \sum_{i} \sum_{t} \widehat{\operatorname{var}}(h_{it}) + 2 \hat\Psi \hat R \hat\Psi' \Bigr] \\
&\approx N^{-1} \Bigl[ \sum_{t} \sum_{i} \frac{1}{U} \sum_{u=1}^{U} (\hat h_{it,(u)} - \bar h_{it})(\hat h_{it,(u)} - \bar h_{it})' + 2 \frac{1}{U} \sum_{u=1}^{U} \hat\Psi_{(u)} \hat R_{(u)} \hat\Psi_{(u)}' \Bigr] \\
&= \frac{1}{U} \sum_{u=1}^{U} \Bigl[ N^{-1} \sum_{t} \sum_{i} (\hat h_{it,(u)} - \bar h_{it})(\hat h_{it,(u)} - \bar h_{it})' + \frac{2}{N} \hat\Psi_{(u)} \hat R_{(u)} \hat\Psi_{(u)}' \Bigr],
\end{aligned}
\]

where $U$ is the number of Monte Carlo simulations; $\hat h_{it,(u)}$, $\hat\Psi_{(u)}$, $\hat{\mathcal{I}}_{N,(u)}$, and $\hat R_{(u)}$ are estimates with $\theta$ replaced by $\hat\theta$; $\bar h_{it} = \frac{1}{U} \sum_{u=1}^{U} \hat h_{it,(u)}$; $y_{it}$ is a sample from Bernoulli trials with probability following the fitted BTSM model; and the $\beta_i$’s are iid variables generated from $N(\hat b, \hat\sigma_b^2)$. As mentioned in Section 3.2, Laplace’s method can be applied to approximate the integration in $\partial \pi_{it}(\theta) / \partial \omega'$ and $\partial \pi_{it}(\theta) / \partial \sigma_b^2$.

4.2 Finite-Sample Performance and Empirical Application

To examine the finite-sample performance of the proposed tests, we carried out some simulations under nulls and alternatives. Each result was calculated based on 5,000 simulations with a 5% significance level. Two sample sizes, $N = 480$ ($m = 24$, $n = 20$) and $N = 160$ ($m = 16$, $n = 10$), and four different partitions ($K = 2, 4, 6, 8$) were studied. For simplicity, here we focus only on equal cell partitions in this simulation study.

Table 3. Empirical level of the goodness-of-fit test at 5%

Model              (m, n)      K = 2   K = 4   K = 6   K = 8   Chi-squared
BTSM–AR(1)         (24, 20)    .042    .046    .046    .044    .094
                   (16, 10)    .047    .049    .054    .055    .119
BTSM–MA(1)         (24, 20)    .046    .052    .054    .062    .084
                   (16, 10)    .063    .056    .065    .074    .172
BTSM–AR(2)         (24, 20)    .056    .042    .048    .060    .151
                   (16, 10)    .060    .048    .054    .062    .277
BTSM–ARMA(1, 1)    (24, 20)    .044    .046    .042    .046    .294
                   (16, 10)    .055    .058    .051    .056    .346

As mentioned in Section 4.1, when unknown parameters are involved, no existing test has a valid asymptotic distribution. Thus, we compare our method with a naive test, Pearson’s chi-squared test in (17), but with parameters estimated. Because parameters are not assumed known, a naive way to apply Pearson’s chi-squared test is to modify the asymptotic chi-squared distribution to have $K - 1 - a$ degrees of freedom, where $a$ is the number of parameters being estimated. Here we conducted the comparison only for $K = 8$.
For example, for the third model [BTSM–AR(2)] in Table 2, five parameters were estimated (three fixed effects, one random effect, and one corresponding variance). As noted earlier, the chi-squared distribution with 2 (= 8 − 1 − 5) degrees of freedom may be incorrect (even asymptotically). Because this naive critical value is too small, using the correct critical value possibly could correct the empirical levels in our simulations. Clearly, however, this would only come at the expense of further reducing the power.

Binary data were generated using the BTSM models listed in Table 2 with four different time series structures. Table 3 reports the empirical rejection probabilities associated with these four models to examine the empirical level of the test. In general, as the sample size increases, the empirical level of the proposed test becomes more stable with respect to the number of partitions K. Compared with the naive Pearson chi-squared test, the proposed method performs better in two respects. First, when the number of estimated parameters involved in the model increases, the proposed method provides a more stable empirical level; for example, the empirical level of the naive test almost doubles and far exceeds the nominal 5% level when the number of estimated parameters increases from four [AR(1) or MA(1)] to five [AR(2) or ARMA(1, 1)]. This is because the critical value of the naive test decreases rapidly as the number of estimated parameters increases. The other advantage of the proposed method is its performance robustness to sample size. For the naive test, the empirical level increases dramatically as the sample size decreases, whereas for the proposed method, this increase is only slight to modest.

Table 4 reports the computing times on a 3.4-GHz PC for calculating the empirical level (based on 5,000 simulations) using R. The computing time increases linearly with sample size, whereas it increases marginally with K.
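As an illustration of how such binary series might be generated, the sketch below simulates m subjects with n test cycles each from the BTSM–AR(1) specification in Table 2. Several choices are assumptions made only for this example and are not stated in the text: the second argument of N(−.3, .5) is read as a variance, the initial response for each subject is set to 0, the covariate cycles through .2, .4, .6, .8 over time, and the function name is hypothetical.

```python
import math
import random


def simulate_btsm_ar1(m: int, n: int, seed: int = 0) -> list:
    """Simulate m binary series of length n from
    logit(mu_it) = beta_i + 1.3 * y_{i,t-1} + 0.3 * x_it."""
    rng = random.Random(seed)
    x_values = (0.2, 0.4, 0.6, 0.8)  # assumption: covariate cycles over time
    data = []
    for _ in range(m):
        # Assumption: N(-.3, .5) is parameterized by mean and variance.
        beta_i = rng.gauss(-0.3, math.sqrt(0.5))
        y_prev = 0  # assumption: no adhesion before the first cycle
        series = []
        for t in range(n):
            x_it = x_values[t % len(x_values)]
            eta = beta_i + 1.3 * y_prev + 0.3 * x_it
            mu_it = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
            y_it = 1 if rng.random() < mu_it else 0
            series.append(y_it)
            y_prev = y_it
        data.append(series)
    return data


sample = simulate_btsm_ar1(m=24, n=20, seed=1)
print(len(sample), len(sample[0]))
```

Passing an explicit seed keeps each simulated data set reproducible, which helps when rejection rates are compared across test procedures on the same replications.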
In terms of power, we chose two alternatives to assess the distributional assumptions involved in the fitted model (at the 5% level): the Bernoulli assumption for the binary data and the normal assumption for the random effects. The first alternative assumes that the random effects are normally distributed and that the binary data follow a beta-binomial distribution; that is, the yit’s are generated from a Bernoulli(Pit) distribution, and Pit is a random variable with a beta(µit, 1 − µit) distribution. The second alternative assumes a departure from the normal assumption for random effects. Let yit follow a Bernoulli(µit) distribution and let the random effect β follow a mixture of two normal distributions, N(b1, 1) and N(b2, 1), with probabilities prob and 1 − prob, denoted by MIXN(b1, b2, prob). In the simulation, the random effect is assumed to be MIXN(−.5, .5, .3).

Table 2. BTSM models with four different time series structures

Model              logit(µit)                                                   βi
BTSM–AR(1)         βi + 1.3yit−1 + .3xit, xit = .2, .4, .6, .8                  N(−.3, .5)
BTSM–MA(1)         βi + 1.3(yit−1 − µit−1) + .3xit, xit ∈ (0, 1)                N(−.3, .5)
BTSM–AR(2)         βi + yit−1 + .5yit−2 + .3xit, xit ∈ (0, 1)                   N(−1, .5)
BTSM–ARMA(1, 1)    βi + 1.5yit−1 + .5(yit−1 − µit−1) − .5xit, xit ∈ (0, 1)      N(−.3, .5)

Table 4. Computing times (in minutes) for calculating the empirical level

Model              (m, n)      K = 2    K = 4    K = 6    K = 8    Chi-squared
BTSM–AR(1)         (24, 20)    13       15       15       17       17
                   (16, 10)    5        6        6        6        6
BTSM–MA(1)         (24, 20)    16       18       18       20       20
                   (16, 10)    7        7        7        9        9
BTSM–AR(2)         (24, 20)    18       18       20       20       20
                   (16, 10)    6        7        7        8        8
BTSM–ARMA(1, 1)    (24, 20)    18       18       21       22       22
                   (16, 10)    7        7        8        8        8

For each alternative, the µit’s are obtained from the values specified in the four models given in Table 2. Based on the generated data, models were fitted by the procedure described in Section 3. Tables 5 and 6 report the empirical rejection probabilities for both alternatives associated with four BTSM models to evaluate the empirical power. Clearly, for the first two models, the proposed test is more powerful than the naive test for both alternatives [with the exception of BTSM–AR(1), m = 24, n = 20, K = 2 in Table 6]. In some cases, for the latter two models, when K is small (mostly for K = 2, and some for K = 4), the naive method has more power than the proposed method, but this is due to the higher empirical levels of the former, as shown in Table 3.

Table 5. Power of testing the Bernoulli assumption under the beta-binomial distribution

Model              (m, n)      K = 2    K = 4    K = 6    K = 8    Chi-squared
BTSM–AR(1)         (24, 20)    .651     .659     .718     .772     .574
                   (16, 10)    .361     .403     .608     .548     .296
BTSM–MA(1)         (24, 20)    .792     .806     .811     .912     .778
                   (16, 10)    .528     .729     .579     .634     .428
BTSM–AR(2)         (24, 20)    .688     .858     .762     .756     .728
                   (16, 10)    .603     .596     .586     .594     .578
BTSM–ARMA(1, 1)    (24, 20)    .402     .818     .754     .746     .648
                   (16, 10)    .216     .286     .428     .502     .486

Table 6. Power of testing normal random effects under the mixed normal distribution

Model              (m, n)      K = 2    K = 4    K = 6    K = 8    Chi-squared
BTSM–AR(1)         (24, 20)    .319     .427     .486     .674     .354
                   (16, 10)    .205     .314     .458     .327     .135
BTSM–MA(1)         (24, 20)    .994     1        1        .998     .993
                   (16, 10)    .924     .983     .988     .97      .826
BTSM–AR(2)         (24, 20)    .596     .607     .686     .702     .618
                   (16, 10)    .336     .375     .395     .448     .432
BTSM–ARMA(1, 1)    (24, 20)    .546     .658     .586     .616     .518
                   (16, 10)    .323     .348     .356     .434     .346

Another issue is the dependence of performance on K, the number of cells. It is well known that the power of this type of goodness-of-fit test can vary greatly with K. This is shown by the simulation results, especially when the sample size is smaller. Therefore, proper choice of partitions is important. This leads to the following guidelines for choosing the optimal number of partitions.

Although the construction of the goodness-of-fit test allows arbitrary partitioning of the cells, its performance depends on choosing the proper number of exclusive subsets K in (16). How does one choose the optimal number of partitions? First, to ensure sufficient power, K should not be too small, because the fewer the cells, the more difficult it is to distinguish between two distributions. On the other hand, if there are too many cells, then the size of the test may become a problem, because the asymptotic distribution of the test is based on a K-dimensional central limit theorem. A necessary condition to maintain this asymptotic property is that $K/N^{1/5} \to 0$ (Senatov 1980; Jiang 2001b); therefore, the proper number of partitions should be chosen from 1 to $\lfloor N^{1/5} \rfloor$. Within this range, conducting a simulation with comparable sample sizes will help determine the optimal number of partitions. R code for the simulations is available at http://www2.isye.gatech.edu/~jeffwu/publications/ and can be easily implemented.

5. APPLICATION IN THE ADHESION FREQUENCY EXPERIMENT

In this section we revisit the adhesion frequency experiment data and apply the proposed model to predict adhesion probability, PANB. As described in Section 2, 37 pairs of cells were used in this experiment. Adhesion test cycles for each pair were repeated 50 times. To study the time series behavior, the first five observations for each subject are treated as additional predictor variables; therefore, in this example, m = 37 and n = 45.
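With m = 37 and n = 45, the guideline from Section 4.2 that the number of partitions be chosen between 1 and the integer part of N^{1/5} can be evaluated directly. The following is a minimal sketch; the helper name is hypothetical, and integer arithmetic is used to avoid floating-point rounding of the fifth root.

```python
def max_partitions(total_observations: int) -> int:
    """Largest integer k with k**5 <= N, that is, floor(N ** (1/5))."""
    if total_observations < 1:
        raise ValueError("the number of observations must be positive")
    k = 1
    while (k + 1) ** 5 <= total_observations:
        k += 1
    return k


m, n = 37, 45
print(max_partitions(m * n))  # prints 4, since 4**5 = 1024 <= 1665 < 5**5 = 3125
```

For this data set the upper end of the admissible range is therefore 4 partitions.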
The covariate here is the average number of bonds, denoted by ANBi for the ith pair of cells. For each pair of cells, the ANB is fixed; therefore, there is no time-dependent covariate in this example, and the one-dimensional (p = 1) covariates in model (6) can be simplified by assuming that xit = xit,1 = ANBi, for all t. With fixed effects, ω = (α1, γ1, ϕ1, ζ1), and the corresponding X′it = (ANBi, ANBi × yit−1, yit−1, (yit−1 − µit−1)), the fitted BTSM model for PANB is

logit(µit) = βi + α1 ANBi + γ1 ANBi × yit−1 + ϕ1 yit−1 + ζ1 (yit−1 − µit−1), (21)

where βi ∼ N(−1.33, .44). The value of the MQPLE is ω̂ = (.97, −.62, 1.76, −.86), with corresponding p values .004, .031, <.001, and .006. The estimated variance component σ̂b = .4 (with standard deviation .14) provides clear evidence of the substantial heterogeneity among subjects.

In model (21), ANBi and yit−1 have significant effects on cell PANB at time t. The positive α1 value of .97 indicates increasing cell PANB with respect to the ANB. The adhesion memory can be described by a first-order ARMA process. The positive ϕ1 value of 1.76 indicates that the PANB is higher if adhesion occurred in the previous test. The significant interaction (ANBi × yit−1) plays an important role in model interpretation. Based on the fitted model (21), the coefficient of the ANBi, .97 − .62yit−1, shows that the effect of ANB is smaller if adhesion occurred in the previous test (yit−1 = 1). On the other hand, based on the coefficient of yit−1 (i.e., 1.76 − .86 − .62ANBi = .9 − .62ANBi), the effect of yit−1 is reduced as the average number of bonds increases. Furthermore, the memory effect is close to 0 if the ANB is around 1.45 (= .9/.62). This implies that for two cells with ANB > 4.3 seconds, the repeated adhesion tests become nearly independent.
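The arithmetic behind this interpretation can be checked with a short sketch that evaluates the fitted model (21) at the reported estimates. The function names are hypothetical, and fixing the random intercept at its reported mean of −1.33 is an assumption made only to obtain a single illustrative probability.

```python
import math

# Reported estimates from model (21): (alpha1, gamma1, phi1, zeta1).
ALPHA1, GAMMA1, PHI1, ZETA1 = 0.97, -0.62, 1.76, -0.86
BETA_MEAN = -1.33  # assumption: random intercept fixed at its reported mean


def adhesion_probability(anb, y_prev, mu_prev, beta=BETA_MEAN):
    """Predicted adhesion probability mu_it under the fitted model (21)."""
    eta = (beta + ALPHA1 * anb + GAMMA1 * anb * y_prev
           + PHI1 * y_prev + ZETA1 * (y_prev - mu_prev))
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit


def memory_coefficient(anb):
    """Coefficient of y_{it-1}: phi1 + zeta1 + gamma1 * ANB = .9 - .62 * ANB."""
    return PHI1 + ZETA1 + GAMMA1 * anb


# ANB at which the coefficient of the previous outcome vanishes.
anb_zero_memory = (PHI1 + ZETA1) / (-GAMMA1)
print(round(anb_zero_memory, 2))  # prints 1.45
```

Evaluating `memory_coefficient` at values of ANB below and above 1.45 shows the sign change of the previous-outcome effect described in the text.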
This model provides much new information on the adhesion frequency analysis, giving not only a flexible model for considering the memory effect, but also the conditions under which the independence assumption may hold.

The distributional assumptions here are the normally distributed random effects and dependent Bernoulli distributed responses. To assess their adequacy, we applied the proposed goodness-of-fit test (18) in this example. Based on some simulation studies of the kind suggested in Section 4.2, the optimal number of partitions in this example was K = 4. Therefore, we first partitioned the previous information space Hit into four disjoint events, E1 = (yit−1 = 0, µit−1 > .5), E2 = (yit−1 = 0, µit−1 ≤ .5), E3 = (yit−1 = 1, µit−1 > .5), and E4 = (yit−1 = 1, µit−1 ≤ .5); that is, c1 = 1, c2 = 2, and c3 = 2 in (16). We conducted 5,000 Monte Carlo simulations to evaluate the estimate of the matrix in (19). The corresponding eigenvalues of this estimate are {.3100, .1428, .0350, .0252}. By Theorem 3, the critical values of the proposed goodness-of-fit test at α = .01, .05, and .1 are 2.3819, 1.6271, and 1.2556. The test statistic under model (21) is χ̂² = .9392, which is much smaller than the critical values. Thus, we have no evidence on which to reject the hypothesis that the binary responses in adhesion tests follow a dependent Bernoulli distribution with probability given by model (21).

Similar to the study in Section 4.2, we compared the proposed test with the naive chi-squared test in (17) with 1 degree of freedom. The naive test statistic has value 6.8838 with the corresponding p value of .0087. This would lead to the rejection of the hypothesis of dependency and the model in (21). In view of the simulation results in Section 4.2 that the naive test can have a very large test statistic value, such a conclusion cannot be taken seriously.

Recall that the preliminary analysis in Section 2 demonstrates some memory effects in the repeated observations.
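The limiting law in Theorem 3 is a weighted sum of independent chi-squared variables with 1 degree of freedom, so its upper quantiles can be approximated by simulation once the eigenvalues are available. The sketch below does this for the four reported eigenvalues and also evaluates the naive 1-degree-of-freedom p value. It is an illustration only: the function names, the seed, and the number of draws are assumptions, and because it works from the rounded eigenvalues with its own random draws, the simulated quantiles need not coincide with the critical values reported above.

```python
import math
import random


def weighted_chi2_draws(eigenvalues, n_draws, seed=0):
    """Draws of sum_k lambda_k * Z_k**2 with Z_k iid N(0, 1)."""
    rng = random.Random(seed)
    return [sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
            for _ in range(n_draws)]


def upper_quantile(draws, alpha):
    """Empirical upper-alpha quantile, used as a simulated critical value."""
    ordered = sorted(draws)
    index = min(len(ordered) - 1, math.ceil((1.0 - alpha) * len(ordered)) - 1)
    return ordered[index]


def chi2_df1_p_value(statistic):
    """P(X > statistic) for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


eigenvalues = [0.3100, 0.1428, 0.0350, 0.0252]
draws = weighted_chi2_draws(eigenvalues, n_draws=100_000, seed=2008)
critical_values = {alpha: upper_quantile(draws, alpha) for alpha in (0.01, 0.05, 0.10)}

observed = 0.9392  # reported test statistic under model (21)
mc_p_value = sum(d >= observed for d in draws) / len(draws)
naive_p = chi2_df1_p_value(6.8838)  # reported naive statistic

print(critical_values, mc_p_value, round(naive_p, 4))
```

The naive p value uses the identity P(X > x) = erfc(√(x/2)) for 1 degree of freedom, which gives approximately .0087 for x = 6.8838, in agreement with the value reported in the text.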
By applying the BTSM model, the cell adhesion memory can be described by an ARMA(1, 1) process. Moreover, model (21) can quantify the effect of ANB and identify a significant interaction between ANB and the previous test result. This is a significant advantage, because in practice it is difficult to assess the MA and interaction effects by graphical analysis. As shown in this example, by including the interaction term, the BTSM model provides flexibility in capturing different time series structures with respect to different covariates. Given the fitted models, goodness-of-fit tests can be conducted to check the distributional assumptions. The test results provide statistical evidence as to the adequacy of the distributional assumptions and support model-based predictions. Another advantage of the BTSM model is that it incorporates random effects, allowing inference and predictions beyond the particular subjects used in the experiment.

6. SUMMARY AND CONCLUDING REMARKS

Despite the prevalence of multiple binary time series data in many applications, their modeling and inference have not been systematically studied in the literature. Here we have proposed a BTSM model to analyze data when a repeated binary time series is observed for each subject. The model handles multiple time series by incorporating random effects to borrow strength across different subjects, thereby allowing inference and predictions beyond the specific units in the study. The BTSM model includes numerous known models as special cases, and it also may have applications in longitudinal analysis. Estimators for the fixed effects and variance components have been shown to be consistent and asymptotically normally distributed.

We have proposed a new goodness-of-fit test to assess the adequacy of the distributional assumptions in the BTSM model.
Because there are some unknown parameters and the data are dependent, the asymptotic distribution for the test statistic is derived using a martingale central limit theorem. Not surprisingly, the results differ from those of the classical Pearson’s chi-squared test. Our proposed test outperformed the naive Pearson’s chi-squared test in a simulation study. Some guidelines are given on the choice of the optimal number K of partitions.

As an application, we applied our BTSM model to fit some multiple binary time series observed in a T-cell adhesion frequency experiment. This study demonstrates how the BTSM model can help quantitatively describe the effects of significant factors. Furthermore, the fitted model provides valuable information on MA and interaction effects that cannot be obtained from graphical analysis. Our example demonstrates that the first-order autocorrelation effect can be observed from graphical analysis, but not when higher-order autocorrelations are present. It also demonstrates the goodness-of-fit test. Although the covariates in this example are independent of time, the proposed model and inference are generally applicable to problems with time-dependent covariates.

APPENDIX A: ASSUMPTIONS

A1. The parameter ω belongs to an open set $B \subseteq R^{s}$.
A2. The covariate matrix $X_{it}$ lies almost surely in a nonrandom compact subset of $R^{s}$ such that $P[\sum_{i}\sum_{t} X_{it} X_{it}' > 0] = 1$.
A3. $\sigma_b^2 \ge 0$ and $\operatorname{var}(\beta_1) > 0$.
A4. As N → ∞, $\liminf \lambda_{\min} \operatorname{cor}(I_{N-s}, V_1) > 0$ and $\lim \operatorname{tr}(V_1' V_1)^{1/2} = \infty$, where for matrices $A_1$ and $A_2$, $\operatorname{cor}(A_1, A_2) = \operatorname{tr}(A_1' A_2)/[\operatorname{tr}^{1/2}(A_1' A_1)\operatorname{tr}^{1/2}(A_2' A_2)]$.
A5. $\|N^{-1/2} a_T' D\|$ and $\|N^{-1/2} a_T' \Gamma J^{1/2}\|$ are bounded, where $\|\kappa\| = (\kappa'\kappa)^{1/2}$ for any vector κ.
(Y* − Xω)" V−1/2 as a vector with elements 5it A6.
Define 5" =* 2 and σN = i t var((C ∗ V)it 52it + G∗it (yit − πit (θ ))) + * 2 (it)" $=it (C ∗ V)2it,(it)" .There exists Lit such that as N → ∞, −2 * { it E[(C ∗ V)it × the following quantities converge to 0: σN * 1 2 2 (5it − 1)] 1(|5it |>Lit ) + 2 it$=(it)" ((C ∗ V)it,(it)" )2 [δit + −2 * ∗ 2 2 δ(it)" ]} and σN it (Git ) E(yit −πit (θ)) 1(|yit −πit (θ)|>Lit ) , 2 where δit = E5it 1(|5it |>Lit ) . Hung et al.: Binary Time Series Modeling 1257 A7. There exist Lit such that as N → ∞, the following quantities −4 * converge to 0: σN { it E[(C ∗ V)it (52it − 1)]4 × * ∗ 4 ∗ ∗ + + 1(|5it |≤Lit ) it$=(it)" ((C V)it,(it)" ) δit δ(it)" * * −4 * ∗ ∗ ∗ 2 2 4 it [ (it)" $=it ((C V)it,(it)" ) ] δit } and σN it (Git ) × ∗ = E(yit − πit (θ))4 1(|yit −πit (θ)|≤Lit ) , where δit E54it 1(|5it |≤Lit ) . 2 → 0, where ξ = (C ∗ V) − A8. As N → ∞, λmax ξ " ξ/σN diag(C ∗ V). Assumptions A1 and A2 are required for the asymptotic properties for fixed effects estimated from partial likelihood. Lindeberg’s condition holds under assumption A2 (Fokianos and Kedem, 1998), which leads to the proof of Theorem 1. Assumptions A3 and A4 are the same as those of Jiang (1996). APPENDIX B: PROOF OF THEOREM 1 Only sketches of the proofs are given in Appendixes B, C, and D. Details can be found at http://www.amstat.org/publications/jasa/ supplemental_materials. Based on the partial likelihood, the partial score process for ω can be written as Sn (ω, σb ) = n ! m ! t=1 i=1 Xit (yit − πit (ω, σb )). Assume that a σ -field is generated from the past data and covariates Fn−1 = σ (H1n , H2n , . . . , Hmn ). It is clear that E[Sn (ω, σb )|Fn−1 ] = Sn−1 (ω, σb ) and E[Sn (ω, σb )] = 0. Based on this and on assumptions A1 and A2, it is easy to see that the partial score process Sn (ω, σb ) is the sum of mean-0 martingale differences with respect to Fn−1 . The asymptotic normality follows from the martingale central limit theorem. Details of the proof are analogous to that outlined by Slud and Kedem (1994). 
where λ1 , . . . , λK are the eigenvalues of 1 (1) . The proof is omitted because it is similar to that of Jiang (2001a). The only difference is that here the asymptotics are proven using a martingale central limit theorem. Because we use the partial likelihood, this result is more general than that of Jiang (2001a). Lemma D.2. Using the notation in Section 3.3, for any µ ∈ R \ {0}, 0 √ ∗ (Y − Xω) µJ −1/2 I N m(σ̂b2 − σb2 ) = (Y − Xω)" BN " ∗ (Y − Xω)#1, (D.2) − E (Y − Xω)" BN ∗ = J −1/2 µW1/2 V ∗ W1/2 ZZ " W1/2 V ∗ W1/2 /√m. where BN Proof. Following the same argument as in Theorem 2, considering the LMM with GLM weights, we can obtain this result by modifying a formula of Jiang (1996, p. 276, first formula) to √ µJ −1/2 I N m(σ̂b2 − σb2 ) = - " BN - − E(- " BN - ), √ where BN = J −1/2 µV1 (σb )/ m and V1 (σb ) is as defined in Theorem 2. Lemma D.2 follows because - " g(σb )W−1/2 = (Y − Xω)" . Lemma D.3. Denote θ = (ω" , σb ), ξk = Mk − ek (θ̂), and ξ = (ξk )1≤k≤K . Let T be an orthogonal matrix such that T " 1N T = diag(λN,1 , . . . , λN,K ), where λN,1 , . . . , λN,K are the eigenvalues of 1N . For any a ∈ R K , n m ! " # ! ϒit + op (1), a " (N)−1/2 T " ξ = where T a = aT = (aT ,1 , . . . , aT ,K )" , G∗it = (N)−1/2 aT" Git , 5" = (5it ) = (Y − Xω)" V−1/2 , var(5it ) = 1, &! !( )' ∂ 1(Hit ∈Ek ) 2 πit (θ) C ∗ = (N)−1/2 aT" ∂σb 1≤k≤K t i × APPENDIX C: PROOF OF THEOREM 2 The inference for the variance component can be formulated as a LMM with variances of error terms following the GLM iterative weights in (14). Define Y∗ = W1/2 Y, X∗ = W1/2 X, Z∗ = W1/2 Z, and , ∗ = W1/2 ,. Replacing these in (14), the results follow directly as a special case of theorem 4.1 of Jiang (1996). APPENDIX D: PROOF OF THEOREM 3 The proof follows along the lines described by Jiang (2001b). It includes several lemmas that culminate in the final proof. Lemma D.1. Under the same assumptions as in Theorem 3, define 1n = 1n (θ) = n−1 (1) where 3 (1) hi = 1(Hi ∈Ek ) − (1) 5 n ! 
i=1 × +−1 n (ω)Xi 1≤k≤K 1 χ̂ 2 = n j =1 d (Mj − ej (ω̂))2 −→ k=1 (it)" $=it Note that C ∗ is a N × N matrix with Ci∗" t " ,it indicating the element in C ∗ with [(i " − 1)n + t " ]th column and [(i − 1)n + t]th row, and, by the ∗ definition given before Theorem 3, Cit,it = Cit∗ . − (N)−1/2 (yi − πi (ω)). (1) K ! ϒit = G∗it (yit − πit (θ)) − (C ∗ V)it 52it ) ( ! − (C ∗ V)it,(it)" 5(it)" 5it + (C ∗ V)it . k=1 Suppose that 1n converges to a limiting value 1 (1) . If there is no random effect in model (6) (i.e., m = 1 in Thm. 3), then the asymptotic distribution of the test statistic (18) is K ! = (N)−1/2 aT" 0C, K ! " # a " (N)−1/2 T " ξ = (N)−1/2 aT ,k (Mk − ek (θ)) 6 n # 1 !" " 1(Hj ∈Ek ) Xj (1 − πj (ω))πj (ω) n 4 and (I N )−1 C √ m Proof. For 1 ≤ k ≤ K, ξk = Mk − ek (θ) − (ek (θ̂) − ek (θ )). By definition, " (1) # var hn,i , j =1 (D.3) i=1 t=1 λk Z2k , (D.1) K ! k=1 aT ,k (ek (θ̂) − ek (θ )). By the Taylor expansion, the second term on the right side can be approximated by 6 35 m n K ! !! ∂ −1/2 (N) aT ,k 1(Hit ∈Ek ) " πit (θ) (ω̂ − ω) ∂ω k=1 i=1 t=1 + 5m n !! ∂ 6 4 1(Hit ∈Ek ) 2 πit (θ) (σ̂b2 − σb2 ) . ∂σb i=1 t=1 1258 Journal of the American Statistical Association, September 2008 The result follows by Theorem 1, assumption A5, and Lemma D.2. Lemma D.4. Under A6–A8, as N → ∞, m ! n ! i=1 t=1 d ϒit −→ N (0, a " 3a). (D.4) * *n Proof. First, derive the asymptotic distribution of m i=1 t=1 ϒit / σN , where σN is as defined in assumption A6. Decompose ϒit /σN = (1) (2) ϒit + ϒit , where ( ) ! " # 1 (1) G∗it u∗it + (C ∗ V)it Uit + ϒit = (C ∗ V)it,(it)" u(it)" uit σN " (it) $=it and (2) ϒit = ( ) ( ! 1 ∗ + (C ∗ V) V + G∗it vit (C ∗ V)it,(it)" v(it)" uit it it σN " (it) $=it + ( ! (it)" $=it ) ) 5(it)" (C ∗ V)it,(it)" vit . Define Uit = (52it − 1)1(|5it |<Lit ) − E(52it − 1)1(|5it |<Lit ) , Vit = (52it − 1) − Uit , vit = 5it − uit , and *m *n i=1 (2) t=1 ϒit con- verges to 0 in L2 . 
Next consider ϒit , an array of martingale differences, following the same argument as in theorem 5.2 of Jiang (1996). Based on assumption A7 and Rosenthal’s inequality (Hall and Heyde (1) 1980), maxit |ϒit | is bounded in L2 and converges to 0 in probability. By theorem 3.2 of Hall and Heyde (1980), to prove Lemma D.4, * * (1) we need to show that i t (ϒit )2 converges to a " 3a in probability. First, it can be decomposed as m ! 3 3 n ! ! " (1) #2 ! ϒit = ti + si , j =1 where −2 t1 = σN m ! n ! 0" ∗ #2 (C V)it Uit + G∗it u∗it i=1 t=1 " #2 1 − E (C ∗ V)it Uit + G∗it u∗it , ) m ! n ( ! ! −2 (C ∗ V)it,(it)" u(it)" t2 = 2σN i=1 t=1 (it)" $=it 0 × (C ∗ V)it (Uit uit − E(Uit uit )) 1 + (G∗it u∗it )uit − E((G∗it u∗it )uit ) , )2 ) m ! n (( ! ! −2 (C ∗ V)it,(it)" u(it)" (u2it − Eu2it ) , t3 = 2σN i=1 t=1 (it)" $=it and 0 m ! n ( ! ! i=1 t=1 (it)" $=it (C ∗ V)it,(it)" u(it)" ) 1 × E((C ∗ V)it Uit uit ) + E((G∗it u∗it )uit ) , −2 s3 = 2σN m ! n (( ! ! i=1 t=1 (it)" $=it (C ∗ V)it,(it)" u(it)" )2 ) Eu2it . By assumption A7 and Rosenthal’s inequality, we can show that ti → 0 in L2 for i = 1, 2, 3, which is similar to the result in theorem 5.2 of Jiang (1996). By assumption A7, −2 s1 = σN m ! n ! i=1 t=1 " # var (C ∗ V)it 52it + G∗it (yit − πit (θ)) + op (1). (D.5) Analogous to theorem 5.2 of Jiang (1996), by assumptions A6–A8, we have ' & λmax (ξ " ξ ) 1/2 → 0, (D.6) Es22 ≤ c 2 σN (D.7) Proof of Theorem 3 ∗ = (y − π (θ)) − u∗ . vit it it it j =1 −2 s2 = 2σN * * (1) 2 2 By (D.5) and (D.7), i t (ϒit ) = 1 + op (1). Because σN = " " " a T 1N T a, it converges to a 3a in probability. Consequently, (D.4) follows. − E(yit − πit (θ))1(|yit −πit (θ)|<Lit ) , i=1 t=1 i=1 t=1 (it)" $=it u∗it = (yit − πit (θ))1(|yit −πit (θ)|<Lit ) (1) m ! n ! " #2 E (C ∗ V)it Uit + G∗it u∗it , where c represents for a constant and ! −2 (C ∗ V)2it,(it)" + op (1). 
s3 = 2σN uit = 5it 1(|5it |<Lit ) − E5it 1(|5it |<Lit ) , By assumption A6, we can easily show that −2 s1 = σN

From Lemmas D.2–D.4, we have, for any $a$,

$$a'\,(N^{-1/2}\, T' \xi) \xrightarrow{d} (a' \Lambda a)^{1/2} Z,$$

where Z ∼ N(0, 1), from which $N^{-1/2}\, T' \xi \xrightarrow{d} N(0, \Lambda)$ follows.

[Received May 2007. Revised April 2008.]

REFERENCES

Barndorff-Nielsen, O. E., and Cox, D. R. (1989), Asymptotic Techniques for Use in Statistics, London: Chapman & Hall.
Benjamin, M. A., Robert, A. R., and Stasinopoulos, D. M. (2003), “Generalized Autoregressive Moving Average Models,” Journal of the American Statistical Association, 98, 214–223.
Breslow, N. E., and Clayton, D. G. (1993), “Approximate Inference to Generalized Linear Mixed Models,” Journal of the American Statistical Association, 88, 9–25.
Chernoff, H., and Lehmann, E. L. (1954), “The Use of Maximum Likelihood Estimations in χ² Tests for Goodness of Fit,” The Annals of Mathematical Statistics, 25, 579–586.
Chesla, S. E., Selvaraj, P., and Zhu, C. (1998), “Measuring Two-Dimensional Receptor-Ligand Binding Kinetics by Micropipette,” Biophysical Journal, 75, 1553–1572.
Cox, D. R. (1975), “Partial Likelihood,” Biometrika, 62, 69–76.
Diggle, P., Heagerty, P., Liang, K.-Y., and Zeger, S. (2002), Analysis of Longitudinal Data (2nd ed.), Oxford, U.K.: Oxford University Press.
Fokianos, K., and Kedem, B. (1998), “Prediction and Classification of Non-Stationary Categorical Time Series,” Journal of Multivariate Analysis, 67, 277–296.
Fokianos, K., and Kedem, B. (2004), “Partial Likelihood Inference for Time Series Following Generalized Linear Models,” Journal of Time Series Analysis, 25, 173–197.
Fu, W. J. (1998), “Penalized Regressions: The Bridge versus the Lasso,” Journal of Computational and Graphical Statistics, 7, 397–416.
Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, New York: Academic Press.
Harville, D. A. (1977), “Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems,” Journal of the American Statistical Association, 72, 320–340.
Jiang, J. (1996), “REML Estimation: Asymptotic Behavior and Related Topics,” The Annals of Statistics, 24, 255–286.
Jiang, J. (2001a), “A Nonstandard Chi-Square Test With Application to Generalized Linear Model Diagnostics,” Statistics and Probability Letters, 53, 101–109.
Jiang, J. (2001b), “Goodness-of-Fit Tests for Mixed Model Diagnostics,” The Annals of Statistics, 29, 1137–1164.
Kaufmann, H. (1987), “Regression Models for Nonstationary Categorical Time Series: Asymptotic Estimation Theory,” The Annals of Statistics, 15, 79–98.
Kedem, B., and Fokianos, K. (2002), Regression Models for Time Series Analysis, New York: Wiley.
Li, W. K. (1994), “Time Series Models Based on Generalized Linear Models: Some Further Results,” Biometrics, 50, 506–511.
Lin, X., and Breslow, N. E. (1996), “Bias Correction in Generalized Linear Mixed Models With Multiple Components of Dispersion,” Journal of the American Statistical Association, 91, 1007–1016.
Marshall, B. T., Long, M., Piper, J. W., Yago, T., McEver, R. P., and Zhu, C. (2003), “Direct Observation of Catch Bonds Involving Cell-Adhesion Molecules,” Nature, 423, 190–193.
McCulloch, C. E. (1997), “Maximum Likelihood Algorithms for Generalized Linear Mixed Models,” Journal of the American Statistical Association, 92, 162–170.
Mehta, A. D., Rief, M., Spudich, J. A., Smith, D. A., and Simmons, R. M. (1999), “Single-Molecule Biomechanics With Optical Methods,” Science, 283, 1689–1695.
Patterson, H. D., and Thompson, R. (1971), “Recovery of Interblock Information When Block Sizes Are Unequal,” Biometrika, 58, 545–554.
Searle, S. R., Casella, G., and McCulloch, C. E. (1992), Variance Components, New York: Wiley.
Senatov, V. V. (1980), “Uniform Estimates of the Rate of Convergence in the Multi-Dimensional Central Limit Theorem,” Theory of Probability and Its Applications, 25, 745–759.
Slud, E., and Kedem, B. (1994), “Partial Likelihood Analysis of Logistic Regression and Autoregression,” Statistica Sinica, 4, 89–106.
Wong, W. H. (1986), “Theory of Partial Likelihood,” The Annals of Statistics, 14, 88–123.
Zarnitsyna, V. I., Huang, J., Zhang, F., Chien, Y.-H., Leckband, D., and Zhu, C. (2007), “Memory in Receptor–Ligand Mediated Cell Adhesion,” Proceedings of the National Academy of Sciences USA, 104, 18037–18042.
Zeger, S. L., and Qaqish, B. (1988), “Markov Models for Time Series: A Quasi-Likelihood Approach,” Biometrics, 44, 1019–1032.
Zhu, C., Long, M., Chesla, S. E., and Bongrand, P. (2002), “Measuring Receptor/Ligand Interaction at the Single-Bond Level: Experimental and Interpretative Issues,” Annals of Biomedical Engineering, 30, 305–314.