Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Psychometrics wikipedia , lookup
Sufficient statistic wikipedia , lookup
Bootstrapping (statistics) wikipedia , lookup
History of statistics wikipedia , lookup
Taylor's law wikipedia , lookup
Confidence interval wikipedia , lookup
Foundations of statistics wikipedia , lookup
Gibbs sampling wikipedia , lookup
Misuse of statistics wikipedia , lookup
Biomrtrika (1986). 73. 1. pp. 91-104 91 Printed ill Great Britain Confidence regions and tests for a change-point in a sequence of exponential family random variables BY K. J. WORSLEY Department of Mathematics and Statistics, McGUl University, Montreal, Quebec H3A 2K6, Canada SUMMABY Maximum likelihood methods are used to test for a change in a sequence of independent exponential family random variables, with particular emphasis on the exponential distribution. The exact null and alternative distributions of the test statistics are found, and the power is compared with a test based on a Linear trend statistic. Exact and approximate confidence regions for the change-point are based on the values accepted by a level a likelihood ratio test and a modification of the method proposed by Cox & SpJ0tvoll (1982). The methods are applied to a classical data set on the time intervals between coal mine explosions, and the change in variation of stock market returns. In both cases the confidence regions for the change-point cover historical events that may have caused the changes. Some key words: Change-point; Change in variation; Exponential family. 1. INTRODUCTION The time intervals between explosions in British coal mines between 1875 and 1950, in which more than ten people were killed, have been analysed in an early paper by Maguire, Pearson & Wynn (1952). They concluded that the data had an exponential distribution with constant mean over time. Cox & Lewis (1966, Ch. 3) reanalysed the data with more powerful techniques and found strong evidence that the mean did not remain constant, but that it followed a quadratic trend in time. A plot of this data is given in Fig. l(a), and quite clearly there is a sharp increase in the time intervals after 1895, indicating that mine explosions became less frequent. This suggests a model for the mean time interval which remains constant up to an unknown point in the sequence and then changes to a different mean which remains constant for the rest of the sequence. The model can be formulated as follows. Let Xlt ...,XHbe the sequence of n independent time intervals between accidents, ordered in time, and let /z, = E(Xt) for i = I, ...,n. Consider the model for a change in mean after the fcth observation: where \x and fi* are the unknown means before and after the unknown change-point k. In general, we shall assume that Xt has a one-parameter exponential family distribution with density f{xt) = exp[{zia(^)-6(/i,)}/tf>,-l-c(3;,,<£/)] (t = 92 K. J. WORSLEY where a, b and c are known functions which specify the distribution, and (f>{ is a known dispersion parameter. If Xt is discrete, then j{xt) is the probability function rather than the density. This general model covers many important cases. For this example, where we assume that the intervals between accidents have an exponential distribution, a(^,) = —l/fiit b(t*t) = log/ij and (/>, = 1 (i = 1,...,»). Another example is given by Hsu (1979) who considers the problem of a change in mean of a sequence of gamma random variables, with applications to the variation of stock market returns. If the returns are assumed to be normally distributed with zero mean, then the squared return X( is a multiple of a chisquared random variable with one degree of freedom and its expectation nt is the variance of the return. Thus a change in variation can be modelled as H with a and b as before and 0 ( = 2. A plot of the squared return of the Dow-Jones Industrial Average between July 1971 and July 1974 is shown in Fig. l(b). The data suggest a change in variation near March 1973. The problem of testing for a change in mean of a binomial proportion has been considered by Hinkley & Hinkley (1970). Here Xt is the proportion of successes out of Nt trials, a(Hi) = log {/ij/(l-/*,)}, &(/*,) = -log(l-/i,) and 0, = l/Nt for i =l,...,n. Finally, if a(Hj) = (it and b([i,) = %nf, then Xt is normally distributed with variance 4>t. The problem of testing for a change in the mean of normal random variables has been considered by many authors; Zacks (1982) gives an extensive bibliography. Hinkley (1970) considers the general problem of inference about the change-point k. The aim of this paper is to use maximum likelihood methods to test the null hypothesis Ho of no change, /i = /z*, against the alternative Hx: fi #= /**, find the power of the test, give a point estimate k of when the change took place and find a confidence region for the change-point. Hinkley (1970) gives some asymptotic results for the distribution of k but points out that k is not sufficient for k and so inference about k based on the likelihood ratio should be more efficient. Instead we shall propose a confidence region for k that contains those values which cannot be rejected as the true values by a level a likelihood ratio test. A similar nonparametric confidence region for the size of the change in location has been proposed by Schechtman (1983). A confidence region for the partitions of means into groups proposed by Cox & Spjotvoll (1982) could also be applied to this problem provided that we restrict ourselves to two groups of adjacent time points in the sequence. The confidence region for k would then consist of all change-points that partition the sequence into two subsequences that are judged internally homogeneous in mean at a predetermined level a. A suitable test of homogeneity for gamma random variables is Bartlett's test. Unfortunately, as we shall see in §6, Maguire et al. (1952) conclude using Bartlett's test that the entire sequence of time intervals between coal mine explosions is homogeneous in mean, and so following Cox & SpJ0tvoll (1982) we would conclude that no changes took place. However using the more powerful likelihood ratio test of § 3 there is overwhelming evidence for at least one change in the sequence. Obviously the Cox-Spjotvoll procedure is too conservative because Bartlett's test of homogeneity is not powerful enough. Instead of testing against the saturated model, the test of homogeneity could be replaced by one which tests for a time trend or for yet another unknown change-point. We would then include in the confidence set all changepoints that give rise to subsequences with no time trend or no further change-points. However if there is a gradual time trend or more than one change-point then this 93 Confidence regions and tests for a change-point (a) 5 -. B H 1 - o-l 1870 !••,"•• • T ' 1880 1890 1 ' 1900 ' i 1910 Year J—^^r >—r, »• 1920 1930 1940 |_o 1950 (b) r 25 25 -| £ •£ 20 • 20 s a 15 - • 15 h 10 =I 5 1972 1973 Year 1974 Fig. 1. Plots of Xt (circles), and Zk and Mt (lines), for (a) the time between coal mine explosions, and (b) the two week average squared per cent return of the Dow-Jones Industrial Average. Conservative 95% confidence regions (7, and Da for the change-points are indicated by horizontal lines at heights T—ca and da respectively. procedure may give an empty confidence set, since no subsequence would be judged internally homogeneous. In this case the full method of Cox & Spjotvoll (1982) in which the procedure is applied to all possible partitions is more appropriate. A simple method of estimating all the change-points in a sequence is the binary segmentation procedure used extensively in cluster analysis and applied by Scott & Knott (1974) to the partitioning of means in the analysis of variance. The sequence of observations is tested for a change, and if the change is judged significant at a 94 K. J. WORSLEY predetermined level a then the two subsequences before and after the change-point are tested in the same way. Vostrikova (1981) shows that under certain conditions this procedure consistently estimates all the change-points in a sequence. For the case of the normal distribution the exact null distribution of the test statistic is found by Hawkins (1977), and for the binomial distribution the exact null and alternative distributions are obtained by Worsley (1983). In §3 we shall show that these methods can be extended to the general case. Particular emphasis will be placed on the exponential distribution, where we shall make use of a fast and efficient algorithm due to Noe (1972) for the distribution of Kolmogorov-Smirnov type statistics. A table of percentage points of the null distribution of the test statistic is given. In § 4 the power of the test is compared with that of a test proposed by Hsu (1979) based on the slope of a linear time trend. Finally, in §5, an exact confidence region for the change-point is proposed, and compared with an exact Cox-Spjotvoll type confidence region, and simple conservative approximations are given. In § 6 the methods are applied to the mine accidents data and the stock market returns data, where we find strong evidence that a single change has occurred. In both examples the confidence region for the change-point covers historical events that may have played an important part in causing these changes. 2. TEST STATISTICS 21. Notation For simplicity we shall assume throughout that the dispersion parameter is constant for all observations, that is <t>t = <f> (i = I, ...,n)\ the generalization to unequal dispersion parameters is straightforward. It will also be more convenient to work in terms of the natural parameters 9( = a(n{), so that the density of Xt becomes f(Xi) = exP[{xiei-iP(9l)}/4, + c(xi,(l>)l (2-1) where ^(9^ = b{a~1(9{)}. Let 9 = a(fi) and 9* = a(/i*), and let Sk and S* be the sum of the first k and last k* = n — k observations: t= t The log likelihood of the sample is L = {Sk6-kifr(6) + Ste*-k*\(>(0*)}/(t>, (2-3) plus a constant which does not involve 9 or 9*. Clearly 8k and S* are sufficient for 9 and 9* for fixed k. Let us reparameterize so that A = 9* — 9 is the change in natural parameter, and 9* is a nuisance parameter. In many cases A has a natural interpretation. For the case of the exponential distribution, A = 1/fi—l/fi* is the change in accident rate, and for the binomial distribution A is the log odds ratio. Then the likelihood (23) becomes L = {-SkA + S9*-k\l/(9*-A)-k*i(,(9*)}/<t>, (2-4) where S = Sk + S* = Xt + ... + Xn is the sample sum. It is clear that conditional on S, Sk is sufficient for A for fixed k and that the distribution of the sample depends on S, A and k. For the case of known k, exact inference about A is now possible if we condition on S. However if k is unknown and Confidence regions and tests for a change-point 95 becomes an extra parameter of interest, conditional inference is not as straightforward. Instead we shall base the choice of our estimators and test statistics unconditional on 8, whereas we shall determine their distributions, and hence our inference, conditional on S. This will have the advantage that our inference will not depend on the nuisance parameter 6*. Since S itself does not depend on A;, then this advantage will be gained for all k, and in particular when k is unknown. 2-2. The likelihood ratio test for a change We wish to test the null hypothesis Ho: A = 0 of no change against the alternative Hx: A 4= 0, where \i, /i* and k are unknown. Consider the likelihood unconditional on S. Under Ho the maximum likelihood estimator of /j = /i* is X = S/n. Under H1 for fixed k, the maximum likelihood estimators of /i and /i* are Xk = SJk and X% = S*/k* respectively. The maximum likelihood estimator of A is Ak = a(X^) — a(Xk). Let Zk be minus twice the log likelihood ratio for fixed k. Then from (2-3) Zk = 2[k{Xka(Xk)-b(Xk)} + k*{Xta(Xt)-b(Xt)}-n{Xa(X)-b(X)}]/<j>. (2-5) Thus minus twice the log likelihood ratio, for unknown k, is T = m&xkZk = Zt, (2-6) say, where k is the maximum likelihood estimator of k. For the normal case, Zk is the square of the usual two-sample Z-statistic, Zk = (Xt-Xk)2/(4>/k + <fi/k*), (2-7) which has a noncentral chi-squared distribution with one degree of freedom, independent of S. For the gamma case, Zk is"minus twice the logarithm of Bartlett's statistic for two samples, Zk = 2{klog(X/Xk) + k*log(X/X!)}/<t>, (2-8) and as we shall see in the next section, the distribution of Zk does in general depend on S. 3. EXACT DISTRIBUTIONS 31. Distribution of Sk Let Pk,n(Sk;S) be the density or probability function of Sk conditional on S when A = 0; the dependence on <> / has been omitted for clarity. For the gamma case, /JkiII is a beta density in SJS with parameters k/<p and k*l4>; for the Poisson case, pkn is a binomial probability function with <S' trials and probability of success k/n; for the binomial case, fikn is a hypergeometric probability function; but for the inverse Gaussian case there is no explicit expression for /}t>(1; see Tweedie (1957). It can be shown that the density or probability function of Sk conditional on S when A 4=0 is gkjSk;S,A)ccex<p(-SkA/4>)pkjSk;S) ' (3-1) with constant of proportionality chosen so that the integral or summation over all Sk is unity. In general, gk „ depends on S as well as A, but for the gamma case there is a slight simplification. Since flk „ depends on S only through the ratio SJS, then gkn depends on S 96 K. J. WORSLEY and A only through the parameter AS. This parameter has a convenient interpretation in the exponential case: AS/n = A/(l/X) is the ratio of the change in rate to the estimated average rate. For simplicity we shall omit the dependence on n and S and write gk(8k;A)=gkj8k;8,A). 32. The null distribution of T The null distribution of T can be found by a straightforward generalization of the iterative method employed by Hawkins (1977) for the normal case, and by Worsley (1983) for the binomial case. Let FktH{t;8k,8) = Vv(Zl^t;i=\,...,k\8k,S) (k = l,...,n), (3-2) where Zn is defined to be zero. I t can be shown that the second derivative of Zk with respect to Sk is always positive so that the support of Fk „, namely {Sk: Zk ^ t}, is always an interval. Then pr (T ^ 11S) = FnJt ;S,S) = 0n(t; 8), (3-3) say. For simplicity we shall omit the dependence on n and S and write Fk(t;8k) = L E M M A 1. For k = 1, . . . , n — 1 and nt = n (i= k+1 FkJt;Sk,S). l,...,k + l), )=\Fk(t;Sk)pk,k+1(Sk;Sk+1)dSk if Zk + 1 ^ t, and zero otherwise, urith integration replaced by summation for discrete distributions. Proof. By the law of total probability Since 8k is a partial sum of independent random variables then S^-.-.S,, = S is a Markovian sequence. Conditional on 8k and S, then Slt ...,Sk-t are independent of Sk + l. Since Zt depends only on St conditional on fixed 8 (i = 1, ...,n— 1), then if Zk+l ^ t, and zero otherwise. The density of 8k, conditional on Sk + l and 8 depends on Sk + 1, and is given by Pk,k+i(Sk; Sk + l). • For the normal case this Lemma has been successfully used by Hawkins (1977) and Worsley (1983) to find the null distribution of T for samples up to size n = 50. For the Poisson and binomial cases integration is replaced by summation, and Worsley (1983) uses this Lemma to find the exact null distribution of T for samples up to size n = 200. However for other distributions such as the gamma and inverse Gaussian the Lemma leads to numerical problems when quadrature is used to evaluate the integrals. The reason is that Sk ^ Sk + l for positive random variables and the density fiktk + i(Sk; Sk + l) puts large weight at Sk near Sk + 1, particularly if <j> is large. In the gamma case with <f> = 2, so that the observations have a chi-squared distribution with one degree of Confidence regions and tests for a change-point 97 freedom, Pkik + i(Sk; *St + 1) is infinite at Sk = Sk + 1, and for the exponential case with <f> = 1, Pkik + i(Sk; *St + 1) is a maximum at Sk = Sk+l. Thus to control numerical errors, Fk + l(t; 8k + 1) must be evaluated at the same values of Sk + 1 as the values of Sk for which Fk(t;Sk) is evaluated. However {Sk: Zk < i) is not the same interval as {<St + 1: Zk + 1 ^ t} and so it is impossible to select values for both which are the nodes of a high accuracy quadrature rule such as Gaussian quadrature. Fortunately, for the special case of the exponential distribution there is another method available for calculating the null distribution of T which does not involve numerical integration. It is well known that the null distribution of SJS, ...,5 n _!/S conditional on S is the same as that of the order statistics U"^,..., U"n_X) of a sample of Ti—1 independent uniform random variables. Since Z, depends only on SJS, the event Zi < t is the event at(t) ^ SJS ^ b^t) for some at(t) and bt(t) for i = I, ...,n — 1, and so Oa(t;S) = vr(T^t\S) = pr{o,(<) ^ U*w ^ 6,(0; t = 1,...,»-1}. (3-4) Noe (1972) gives a fast efficient algorithm for calculating the probability that uniform order statistics are bounded by arbitrary numbers. If we use this algorithm it is possible to calculate the null distribution of T for sample sizes as large as n = 100 without excessive round off errors. It is clear from (3-4) that the null distribution of T conditional on S does not depend on S, and so it is also the unconditional distribution of T; a table of percentage points of T can thus be provided. Table 1 gives the level a points of T for a = 010,005,001 and Table 1. Level a. points ta of T for a sequence of n exponential random variables; maximum level a points caof T — Zk conditional on Sk and S; maximum level a points daofMk n <„ a = 0-10 c« d. t. a = 0-05 c. d. 10 20 30 40 50 60 70 80 90 100 6193 6-936 7-306 7-544 7-718 7-852 7-962 8-053 8132 8-200 5-049 6-056 6-536 6-840 7-058 7-225 7-360 7-473 7-568 7-652 6-623 7-606 8-075 8-373 8-588 8-753 8-887 8-999 9-094 9-177 7-661 8-428 8-809 9O54 9-233 9-371 9-483 9-577 9-658 9-728 6-356 7-444 7-956 8-279 8-509 8-686 8-828 8-946 9-047 9-134 8-071 9-079 9-559 9-865 10-084 10-254 10-391 10-505 10-602 10-687 10-992 11-797 12196 12-454 12-641 12-786 12-903 13-001 13085 13158 a = 0-01 c. d. 9-443 10-627 11195 11-551 11-804 11-997 12153 12-281 12-390 12-485 11-360 12-403 12-903 13-221 13-449 13-626 13-769 13-887 13-989 14-077 n = 10(10)100 found using the algorithm of Noe (1972). Note that these points do not appear to converge to a limiting distribution, but appear to increase without limit. For the normal case Hawkins (1977) proves that the null distribution of T is unbounded, and more recently Miller & Siegmund (1982) give an asymptotic normal distribution for T with mean and variance increasing with the sample size n. 33. The alternative distribution of T Under the alternative hypothesis of a change of size A after change-point k the distribution of T conditional on S can be found in a similar way to that of the null distribution. 98 LEMMA K. J. WORSLEY 2. Under the alternative hypothesis H^. A 4= 0 at change-point k, pr(T^t\S)= \Fk(t; Sk) FAt; 8?) gk(Sk; A) dSk. Proof. By the law of total probability, pr(T ^t\S) = p r ( Z , < t;i= 1, ...,n-l \Sk = s,8)dpr(Sk ^s\S). Conditional on Sk and S, S, is independent of Sj for t < k < j , and since Z, depends only on St then ,...,n-l\Sk,S). (3-5) Conditional on Sk, the first probability depends only on the distribution of Xl,...,Xk which by definition is Fk(t; 8k). The second probability depends only on the distribution of Xk+l,...,Xn, and since the reversed sequence has the same distribution as the original sequence under the null hypothesis then this probability is ^(JjS*). From (31), d pr (Sk ^ s | S) is gk(s; A) da and the result follows. • This Lemma has been used by Worsley (1983) to find the power of the test for the normal and binomial cases. For other distributions such as the gamma and inverse Gaussian the numerical evaluation of Fk and F& using Lemma 1 causes some difficulties. However for the exponential case Fk and F^ can be evaluated directly using the algorithm of Noe (1972). Conditional on Sk, SJS^ ...,8k-JSk have the distribution of k— 1 uniform order statistics U\l)t..., U\k-iy Hence Fk(t;Sk)=pr{al(t)S/Sk^ Uk(l) ^ b^S/S,, ;i = 1, ...,k-1}. (3-6) Since these calculations do not involve A then the alternative distribution of T conditional on S can easily be found for several values of AS using Lemma 2. As an added benefit a simple check on the numerical calculations can be obtained by putting A = 0. This gives the null distribution of T which should be the same for all values of k. 4. POWER COMPARISONS Hsu (1979) has proposed a test for a change in a sequence of gamma random variables under the assumption that the change can occur with equal probability at any point in the sequence. The likelihood ratio statistic is based on a linear combination of the observations weighted by the sequence number: W = ^(i-l)Z,/S (0<W<n-l). (4-1) It is straightforward to show that under the null hypothesis Ho of no change, the expectation of W is |(n— 1), and so we reject Ho for large values of | W—$(n— 1) |. The exact null distribution of W is complex, but for the exponential case <f> = 1 it has a simple form. We can write W as W^YSr/S, (4-2) where 8*/S,..., S*- JS have the distribution of n — 1 ordered uniform random variables. Confidence regions and tests for a cJtange-point 99 But since the sum of ordered statistics is the same as the sum of unordered'statistics then W has the distribution of the sum of n— 1 independent uniform random variables. It is then straightforward to show that the first four cumulants of W are 3n = 0, (4-3) Hsu (1979) gives the cumulants of W for general (p and suggests that an Edgeworth expansion can be used to find the approximate distribution of W. The power of W has been found by Hsu (1979) using simulation. We shall now find an approximate expression for the alternative distribution of W conditional on S using the approach of Lemma 2 combined with the Edgeworth expansion above. For a change of size A at change-point k, we have -f (4-4) W= (4-5) Jo by the law of total probability. Conditional on Sk and S, it can be shown that where r = SJS, r* = 1 — r, and Wk and Wp are independent statistics with the null distribution of W for samples of size k and k* respectively. Hence the first four cumulants of W conditional on Sk and 8 are Ki = = r2K2k- r*2 2k>, (4-6) Thus pr(W ^ w\Sk,S) can be found using the Edgeworth expansion, and integrated over the density of Sk to give the desired result. Note again that the accuracy of the calculations can be checked by putting A = 0; this gives the null distribution which can be compared with the direct Edgeworth approximation for W. For the exponential case, Fig. 2 gives the power of T and W at the level a = O05 for a sample of size n = 60 and k = 6, 10, 20 and 30. The power is plotted as a function of Sk, the size of the change A divided by the estimated asymptotic standard deviation of At Fig. 2. The powers of T (solid lines) and W (dashed lines) against 6k, for a = O05, n = 60, and k = 30,20,10,6 (from top to bottom on the right, from bottom to top on the left). The powers of Zk for known k are also shown for comparison (dotted lines). 100 K. J. WORSLEY under Ho. For the general case this is given by var(At) = var{a(Xt)-a(Xk)}^{<lf/k + <l>/k*)/V(X), (4-7) where V(n,) = l/a'(fi,) is the variance of X, as a function of /z,. For the exponential case, <f> = I, V{Hi) = nf, and we have X ~ i . (4-8) For known k the likelihood ratio test is based on Zk, and its power depends only on 5k. Thus the exact power of Zk is also included in Fig. 2 for comparison with T and W. Several points can be noted. The power of T is slightly lower than that of W for k near the middle of the sequence, but for k near the beginning or end T is much more powerful than W; T is almost always as powerful as Zk. Since 5k decreases as k approaches the ends of the sequence then all tests are less powerful at the ends of the sequence. The power curves are not symmetrical in Sk; the tests are more powerful at detecting a negative change than a positive change near the beginning of the sequence. Plots of the same power functions for a sample of size n = 30 showed a similar pattern. 5. INFERENCE ABOUT THE CHANGE-POINT 51. A confidence region for the change-point k An exact 1 — a confidence region for the change-point k is the set Ca of values K that cannot be rejected as the true change-point by a suitable level a test. Minus twice the log likelihood ratio for testing H*: k = K, against H*: k 4= K, where K is fixed, is T — ZK, so that Ca contains the set of values of K for which ZK is sufficiently close to its maximum T. If the data indicate that there are two distinct change-points, then both may be accepted as possible change-points, and so both points may be included in the confidence region. Thus such a region may be disconnected, and a disconnected confidence region may indicate the presence of more than one change-point. Since the null hypothesis now contains the nuisance parameter A, then our inference can be made free of A if we condition on the sufficient statistic SK as well as S. In this case ZK is fixed, and so the test statistic is effectively T conditional on SK and S. Its null distribution can be found in the same way as the alternative distribution of T. From (3-6) we have = FK(t;8K)FK.(t;Si), (5-1) Vr(T ^t\SK,S) where K* = n — K, and so the confidence region is Ca = {K: FK(T; SK)FK.(T; 8%)^l-a}. (5-2) Note that Ca may be empty for sufficiently large a. The construction of Ca requires either the direct evaluation of FK and FK. for each K or else a table of level a points of T — ZK conditional on SK and S, for each K, SK, n and S. For the normal and gamma cases FK does not depend on S but still such a table is obviously not practical. However, for the exponential case the maximum level a point ca of T — ZK, conditional on SK and S over all values of K and SK for fixed n, has been found by numerical methods. For n = 10(10)100 and a = 010, 005, 0-01, the maximum occurred at K = 1 and Sx — S/n, and the values of ca are given in Table 1. Thus a conservative 1 — a confidence region for k is Ga = {K.T-ZK^ cx} = {K: ZK > T-ca}; (5-3) in other words, the set of change-points for which the plot of Zk exceeds T — ca. This Confidence regions and tests for a change-point 101 means that if the plot has two separate peaks of almost equal height, suggesting two change-points in the sequence, then 0a will be two disconnected regions that contain them both. Note that since T — Z$ = 0 then &a always contains k. 52. A Cox-Spjotvoll type confidence region The method of Cox & Spjetvoll (1982) can be modified by choosing a test of homogeneity as follows. Let the confidence region Da contain all change-points K that partition the sequence into two subsequences in which we accept the hypothesis of no further change-points at level a. Consider the model with one known change-point at K and a second unknown change-point at k. Then we wish to test the hypothesis HQ against H\ as before, and put K in Da if H* is accepted at level a. Let T£ be the equivalent of the test statistic T evaluated only for the subsequence of observations Xl3..., XK and let T£ be the equivalent of T evaluated only for the subsequence XK + l, ...,Xn. Then minus twice the log likelihood ratio of H* against //f is MK = max {T£, T£). Conditional on SK and S, T£ and T£ are independent and so the null distribution of MK is ^t\ SK, S) = GK(t; SK) Gg.it; 8%) (5-4) and so an exact 1 — a confidence region for k is Da = {K:GK(T;SK)GK.(T;St)^l-a}. (5-5) Again the construction of Da is laborious, but for the exponential case the maximum level a point da of MK over all values of K for fixed n has been found by numerical methods. The maximum always occurred at K = %n, and the values of da for n=10(10)100 and a = 010,0-05,0O1 are given in Table 1. The conservative 1 - a confidence region for k is Ba = {K:MK^dx}. (5-6) The exact confidence of Ba is largest for A = 1, but for a = 0-05, it never exceeds 0-968 for n= 10^0)100. There is a close link between MK and T — ZK. Since H\ now contains two changes then it is straightforward to show that MK ^ T — ZK with approximate equality near the ends of the sequence. Hence da ^ ca, as can be seen from Table 1. 6. EXAMPLES 6-1. Time intervals between coal mine explosions Maguire et al. (1952) give the time intervals in days between explosions in British coal mines from 1875 to 1950. The n = 109 time intervals Xk together with the test statistics Zk are plotted against the cumulative time Sk in Fig. l(a). The maximum occurs at k = 46 which corresponds to the year 1890 and the estimated size of the change is A46 = 0-00553 explosions per day or 2-02 explosions per year. The test statistic for a change is T = 26-280, and from Table 1 the change is highly significant. The exact level a = 0-05 point of T for n = 109 is ta = 9784, well below the observed value of T. The simplest confidence region to construct is the conservative region Ca for the change-point k. A line at a height of ca = 9-204 below T is drawn on the plot of Zk in Fig. l(a). All values of k for which Zk lies above this line are in (Ja. Now 6a = {28,31,32,34, ...,53} contains the years 1884 to 1895 with a break in 1885 and 1886. The exact confidence region Ca = {36, ...,53} covers the narrower interval from 102 K. J. WORSLEY 1887 to 1895. The statistic for a second change Mk is also plotted in Fig. l(a), together with a line at a height of da = 10755. The conservative Cox-Spjotvoll type region £)a = {28,..., 53} covers the broader interval from 1884 to 1895, and the exact region Da is the same as 3X. In this example the region Cx is narrower than Da. Following the binary segmentation procedure of Vostrikova (1981), we may now look for further changes in the two subsequences before and after 1890. The two subsequences give T+6 = 2-230 and T£6 = 3529, well below the level a = 0-10 critical points, and so we conclude that no further changes took place. Note that MA6 = max {2-230,3529} is the test statistic for homogeneity in the Cox-Spjetvoll type confidence region Da. The fact that k = 46 is included in this region is further evidence that no more changes took place. It is interesting to try to find historical events that may have caused the observed change. In 1886 and 1894 two Royal Commissions reported on accidents in mines, and extensive testing demonstrated beyond all doubt that coal dust in the workings was to blame for originating and extending explosions beyond the coal face. They recommended wetting of the sides and floors to keep down dust, and a change from gunpowder to other forms of blasting powder that produce less flame. These recommendations were implemented in the Explosive in Coal Mines Order of 1896. 6'2. Variation in stock market returns Hsu (1979) gives 162 weekly closing values Pt of the Dow-Jones Industrial Average from July 1, 1971 to-August 2, 1974. The rates of return are R^iPt^-PJ/P, (t= 1 161). It is assumed that Rl, ...,R161 are independent normal random variables with constant mean and a variance which may change after an unknown time. The variance is estimated by Xt = (Rt — H)2, where M is the average rate of return, and since the sequence is long we may assume that Xt, ...,X16l are proportional to independent chi-squared random variables with one degree of freedom. Here E(Xt) = \i{ is the unknown variance, and we wish to test for a change in mean variance fit after an unknown point in the sequence. Using the linear trend statistic W, Hsu (1979) concluded that a highly significant change had taken place. To apply the methods of this paper we can overcome the numerical difficulties mentioned in §32 by transforming the data to exponential random variables by averaging pairs of adjacent values of X( to produce X't = \(X2i-i + X2i) (i = 1,....80). The change-point model now states that a change may take place after any pair of observations, and so only a small amount of information has been lost. Values of the average squared percent return X'k are plotted in Fig. l(b) together with the corresponding test statistic Zk. The maximum occurs at k = 44 which corresponds to late February 1973, and the estimated size of the change is KM = 0-274 per cent" 2 . The test statistic for a change is T = 24-894, much greater than the level a = 0O5 point ta = 9-577 from Table 1. The confidence region 6a = {32,..., 48} is the set of values of k for which Zk is above the line ca = 8946 units below T, also drawn in Fig. l(b). The exact interval is Ca = {34,37,..., 47}. From the plot of Mk in Fig. l(b), the set of A; for which Mk is below da = 10505 is 3a = {31, ...,48}. The exact interval Da is the same as Da. All intervals cover a period from October 1972 to March 1973. No further change-points were found in the subsequences before and after the first change-point: T^ = 5562 and TXA = 7-191. The start of the confidence region corresponds to the Watergate break-in, and the end corresponds to a period when U.S. prime interest rates began increasing. Confidence regions and tests for a change-point 103 7. CONCLUSION This paper has studied the problem of inference about the change in mean of a sequence of observations. Maximum likelihood methods have been used to test for a change, to estimate the change-point, and to give a confidence region for the changepoint. The main theoretical results rely on the Markovian property of a sequence of partial sums of independent random variables, and as such they can be applied to any statistic that is a maximum of a function of the partial sums. In particular, the conditional likelihood at k, maxA{7t(iSk ; &)/gk(Sk ; 0), is a function only of Sk for fixed S, and so the same theoretical results can be applied to the maximum conditional likelihood ratio test statistic; for simplicity we have confined our attention to the unconditional likelihood ratio test statistic T. Finally, the exponential family is assumed only in so far as it is characterized as the set of distributions for which the sums of the observations are sufficient for the mean. This implies that if we condition on the sample total then our inference is free of nuisance parameters, and as an added benefit the test statistic for a change at k depends only on the partial sum of the observations, thus guaranteeing the Markovian property. A natural question to ask is which of the two confidence regions Ca and Da is narrower. The answer depends very much on the assumed model. If we know that there is exactly one change-point k in the sequence then the probability that K is in the region can be found by conditioning on Sk, SK and S, applying the Markovian property, and integrating over 8k and SK. The calculations are laborious and have not been attempted. Examples in §§61 and 62 present some evidence that Ca is narrower than Da. On the other hand, if there is more than one change-point or a trend in the mean, then Da may be empty since no subsequence would be judged homogeneous in mean. If there are two change-points then Ca may contain them both, particularly if there is an increase in mean followed by a decrease. For these reasons Ca seems preferable. ACKNOWLEDGEMENTS T would like to thank the referees for suggestions which clarified many points. This work was supported by a Natural Sciences and Engineering Research Council of Canada grant and by a Fondation de Chercheurs et d'Action Concertee de Quebec subvention. REFERENCES Cox, D. R. & LEWIS, P. A. W. (1966). The Statistical Analysis of Series of Events. London: Methuen. Cox, D. R. & SPJ0TVOLL, E. (1982). On partitioning means into groups. Scand. J. Statist. 9, 147-52. HAWKINS, D. M. (1977). Testing a sequence of observations for a shift in location. J. Am. Statist. Assoc. 72, 180-6. HINKLEY, D. V. (1970). Inference about the change-point in a sequence of random variables. Biometrika 57, 1-1T. HINKLEY, D. V. & HINKLEY, E. A. (1970). Inference about the change-point in a sequence of binomial random variables. Biometrika 57, 477-88. Hsu, D. A. (1979). Detecting shifts of parameter in gamma sequences with applications to stock price and air traffic flow analysis. J. Am. Statist. Assoc. 74, 31-40. MAOUIBB, B. A., PEABSON, E. S. & WYNN, A. H. A. (1952). The time intervals between industrial accidents. Biometrika 88, 168-80. MILLEE, R. & SIEGMUND, D. (1982). Maximally selected chi square statistics. Biometrics 88, 1011-16. NOE, M. (1972). The calculation of distributions of two-sided Kolmogorov-Smimov type statistics. Ann. Math. Statist. 48, 58-64. SCHECHTMAN, E. (1983). A conservative nonparametric distribution-free confidence bound for the shift in the changepoint problem. Comm. Statist. A 12, 2455-64. 104 K. J. WORSLE\ A. J. & KNOTT, M. (1974). A cluster analysis method for grouping means in the analysis of variance. Biometrics 30, 507-12. TWEEDIE, M. C. K. (1957). Statistical properties of the inverse gaussian distributions II. Ann. Math. Statist. 28, 696-705. VOSTRTKOVA, L. Ju. (1981). Detecting "disorder" in multidimensional random processes. Sov. Math. Dokl. 24, 55-9. WOBSLBY, K. J. (1983). The power of likelihood ratio and cumulative sum tests for a change in a binomial probability. Biometrilca 70, 455-64. ZAOKS, S. (1982). Classical and Bayesian approaches to the change-point problem: fixed sample and sequential procedures. Statistique et Analyse des Donnies 1, 48-81. SOOTT, [Received April 1984. Revised April 1985] NOTK ADDED IN PROOF I am indebted to D. R. Cox'for referring me to the papers by Cobb (1978) and Jarrett (1979). The latter points out several errors in the data on coal mine explosions as given by Maguire et al. (1952) and gives a corrected data set of 190 observations extending back to 1851. The conclusions given in §61 remain substantially the same; the changepoint is estimated at exactly the same point in the data but because of the increased number of observations the likelihood ratio statistic increases to T = 71-219. However the widths of the confidence regions remain much the same, because extending the sequence of observations, as Hinkley (1970) points out, has little effect on inference about the change-point. The work of Cobb (1978) on a conditional solution to the change-point problem deserves more attention. Cobb shows that the conditional distribution of the maximum likelihood estimator k given the ancillary values of observations adjacent to k is approximately the same as a Bayesian posterior distribution under a uniform prior. Specifically, pr(k =fc|Xi._d+ 1 ,X approximately for some small d. We can thus construct an approximate (1—a) confidence region for k as Ea = {k: Zk ^ z} where z is the largest value such that In-l keE. / For the coal mine explosions data as used in §61 with a = 005, Ea = {36, ...,39,41, ...,53} which is the same as Ca with one less point. For the stock returns data Ea is exactly the same as Ca. The essential difference between Ca and Ea is that Cx contains acceptable values of k conditional only on sufficient statistics for the nuisance parameters, whereas Ea contains acceptable values of k conditional on the shape of the likelihood Zi~Zh for \i — k\ < d, near the estimated change-point. It is surprising that in these two examples the confidence regions are almost exactly the same, though in practice Ea is far easier to construct than Ca. ADDITIONAL REFERENCES COBB, G. \V. (1978). The problem of the Xile: Conditional solution to a change-point problem. Biometrika 65, 243-51. JARRETT, R. G. (1979). A not* on the intervals between coal-mining disasters. Biometrika 66, 191-3.