Download Confidence regions and tests for a change

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Psychometrics wikipedia , lookup

Sufficient statistic wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Confidence interval wikipedia , lookup

Foundations of statistics wikipedia , lookup

Gibbs sampling wikipedia , lookup

Misuse of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Transcript
Biomrtrika (1986). 73. 1. pp. 91-104
91
Printed ill Great Britain
Confidence regions and tests for a change-point
in a sequence of exponential family random variables
BY K. J. WORSLEY
Department of Mathematics and Statistics, McGUl University, Montreal,
Quebec H3A 2K6, Canada
SUMMABY
Maximum likelihood methods are used to test for a change in a sequence of
independent exponential family random variables, with particular emphasis on the
exponential distribution. The exact null and alternative distributions of the test
statistics are found, and the power is compared with a test based on a Linear trend
statistic. Exact and approximate confidence regions for the change-point are based on
the values accepted by a level a likelihood ratio test and a modification of the method
proposed by Cox & SpJ0tvoll (1982). The methods are applied to a classical data set on
the time intervals between coal mine explosions, and the change in variation of stock
market returns. In both cases the confidence regions for the change-point cover historical
events that may have caused the changes.
Some key words: Change-point; Change in variation; Exponential family.
1.
INTRODUCTION
The time intervals between explosions in British coal mines between 1875 and 1950, in
which more than ten people were killed, have been analysed in an early paper by
Maguire, Pearson & Wynn (1952). They concluded that the data had an exponential
distribution with constant mean over time. Cox & Lewis (1966, Ch. 3) reanalysed the data
with more powerful techniques and found strong evidence that the mean did not remain
constant, but that it followed a quadratic trend in time. A plot of this data is given in
Fig. l(a), and quite clearly there is a sharp increase in the time intervals after 1895,
indicating that mine explosions became less frequent.
This suggests a model for the mean time interval which remains constant up to an
unknown point in the sequence and then changes to a different mean which remains
constant for the rest of the sequence. The model can be formulated as follows. Let
Xlt ...,XHbe the sequence of n independent time intervals between accidents, ordered in
time, and let /z, = E(Xt) for i = I, ...,n. Consider the model for a change in mean after the
fcth observation:
where \x and fi* are the unknown means before and after the unknown change-point k. In
general, we shall assume that Xt has a one-parameter exponential family distribution
with density
f{xt) = exp[{zia(^)-6(/i,)}/tf>,-l-c(3;,,<£/)]
(t =
92
K. J. WORSLEY
where a, b and c are known functions which specify the distribution, and (f>{ is a known
dispersion parameter. If Xt is discrete, then j{xt) is the probability function rather than
the density.
This general model covers many important cases. For this example, where we assume
that the intervals between accidents have an exponential distribution, a(^,) = —l/fiit
b(t*t) = log/ij and (/>, = 1 (i = 1,...,»). Another example is given by Hsu (1979) who
considers the problem of a change in mean of a sequence of gamma random variables,
with applications to the variation of stock market returns. If the returns are assumed to
be normally distributed with zero mean, then the squared return X( is a multiple of a chisquared random variable with one degree of freedom and its expectation nt is the
variance of the return. Thus a change in variation can be modelled as H with a and b as
before and 0 ( = 2. A plot of the squared return of the Dow-Jones Industrial Average
between July 1971 and July 1974 is shown in Fig. l(b). The data suggest a change in
variation near March 1973.
The problem of testing for a change in mean of a binomial proportion has been
considered by Hinkley & Hinkley (1970). Here Xt is the proportion of successes out of Nt
trials, a(Hi) = log {/ij/(l-/*,)}, &(/*,) = -log(l-/i,) and 0, = l/Nt for i =l,...,n. Finally,
if a(Hj) = (it and b([i,) = %nf, then Xt is normally distributed with variance 4>t. The
problem of testing for a change in the mean of normal random variables has been
considered by many authors; Zacks (1982) gives an extensive bibliography. Hinkley
(1970) considers the general problem of inference about the change-point k.
The aim of this paper is to use maximum likelihood methods to test the null hypothesis
Ho of no change, /i = /z*, against the alternative Hx: fi #= /**, find the power of the test,
give a point estimate k of when the change took place and find a confidence region for the
change-point.
Hinkley (1970) gives some asymptotic results for the distribution of k but points out
that k is not sufficient for k and so inference about k based on the likelihood ratio should
be more efficient. Instead we shall propose a confidence region for k that contains those
values which cannot be rejected as the true values by a level a likelihood ratio test. A
similar nonparametric confidence region for the size of the change in location has been
proposed by Schechtman (1983).
A confidence region for the partitions of means into groups proposed by Cox &
Spjotvoll (1982) could also be applied to this problem provided that we restrict ourselves
to two groups of adjacent time points in the sequence. The confidence region for k would
then consist of all change-points that partition the sequence into two subsequences that
are judged internally homogeneous in mean at a predetermined level a. A suitable test of
homogeneity for gamma random variables is Bartlett's test. Unfortunately, as we shall
see in §6, Maguire et al. (1952) conclude using Bartlett's test that the entire sequence of
time intervals between coal mine explosions is homogeneous in mean, and so following
Cox & SpJ0tvoll (1982) we would conclude that no changes took place. However using
the more powerful likelihood ratio test of § 3 there is overwhelming evidence for at least
one change in the sequence.
Obviously the Cox-Spjotvoll procedure is too conservative because Bartlett's test of
homogeneity is not powerful enough. Instead of testing against the saturated model, the
test of homogeneity could be replaced by one which tests for a time trend or for yet
another unknown change-point. We would then include in the confidence set all changepoints that give rise to subsequences with no time trend or no further change-points.
However if there is a gradual time trend or more than one change-point then this
93
Confidence regions and tests for a change-point
(a)
5 -.
B
H
1 -
o-l
1870
!••,"•• • T
'
1880
1890
1 '
1900
'
i
1910
Year
J—^^r
>—r,
»•
1920
1930
1940
|_o
1950
(b)
r 25
25 -|
£
•£ 20
• 20
s
a 15 -
• 15
h 10 =I
5
1972
1973
Year
1974
Fig. 1. Plots of Xt (circles), and Zk and Mt (lines), for (a) the time between coal mine explosions,
and (b) the two week average squared per cent return of the Dow-Jones Industrial Average.
Conservative 95% confidence regions (7, and Da for the change-points are indicated by horizontal
lines at heights T—ca and da respectively.
procedure may give an empty confidence set, since no subsequence would be judged
internally homogeneous. In this case the full method of Cox & Spjotvoll (1982) in which
the procedure is applied to all possible partitions is more appropriate.
A simple method of estimating all the change-points in a sequence is the binary
segmentation procedure used extensively in cluster analysis and applied by Scott &
Knott (1974) to the partitioning of means in the analysis of variance. The sequence of
observations is tested for a change, and if the change is judged significant at a
94
K. J. WORSLEY
predetermined level a then the two subsequences before and after the change-point are
tested in the same way. Vostrikova (1981) shows that under certain conditions this
procedure consistently estimates all the change-points in a sequence.
For the case of the normal distribution the exact null distribution of the test statistic is
found by Hawkins (1977), and for the binomial distribution the exact null and
alternative distributions are obtained by Worsley (1983). In §3 we shall show that these
methods can be extended to the general case. Particular emphasis will be placed on the
exponential distribution, where we shall make use of a fast and efficient algorithm due to
Noe (1972) for the distribution of Kolmogorov-Smirnov type statistics. A table of
percentage points of the null distribution of the test statistic is given. In § 4 the power of
the test is compared with that of a test proposed by Hsu (1979) based on the slope of a
linear time trend. Finally, in §5, an exact confidence region for the change-point is
proposed, and compared with an exact Cox-Spjotvoll type confidence region, and simple
conservative approximations are given.
In § 6 the methods are applied to the mine accidents data and the stock market returns
data, where we find strong evidence that a single change has occurred. In both examples
the confidence region for the change-point covers historical events that may have played
an important part in causing these changes.
2. TEST STATISTICS
21. Notation
For simplicity we shall assume throughout that the dispersion parameter is constant
for all observations, that is <t>t = <f> (i = I, ...,n)\ the generalization to unequal dispersion
parameters is straightforward. It will also be more convenient to work in terms of the
natural parameters 9( = a(n{), so that the density of Xt becomes
f(Xi) = exP[{xiei-iP(9l)}/4, + c(xi,(l>)l
(2-1)
where ^(9^ = b{a~1(9{)}. Let 9 = a(fi) and 9* = a(/i*), and let Sk and S* be the sum of
the first k and last k* = n — k observations:
t= t
The log likelihood of the sample is
L = {Sk6-kifr(6) + Ste*-k*\(>(0*)}/(t>,
(2-3)
plus a constant which does not involve 9 or 9*. Clearly 8k and S* are sufficient for 9 and
9* for fixed k.
Let us reparameterize so that A = 9* — 9 is the change in natural parameter, and 9* is
a nuisance parameter. In many cases A has a natural interpretation. For the case of the
exponential distribution, A = 1/fi—l/fi* is the change in accident rate, and for the
binomial distribution A is the log odds ratio. Then the likelihood (23) becomes
L = {-SkA
+ S9*-k\l/(9*-A)-k*i(,(9*)}/<t>,
(2-4)
where S = Sk + S* = Xt + ... + Xn is the sample sum.
It is clear that conditional on S, Sk is sufficient for A for fixed k and that the
distribution of the sample depends on S, A and k. For the case of known k, exact
inference about A is now possible if we condition on S. However if k is unknown and
Confidence regions and tests for a change-point
95
becomes an extra parameter of interest, conditional inference is not as straightforward.
Instead we shall base the choice of our estimators and test statistics unconditional on 8,
whereas we shall determine their distributions, and hence our inference, conditional on
S. This will have the advantage that our inference will not depend on the nuisance
parameter 6*. Since S itself does not depend on A;, then this advantage will be gained for
all k, and in particular when k is unknown.
2-2. The likelihood ratio test for a change
We wish to test the null hypothesis Ho: A = 0 of no change against the alternative
Hx: A 4= 0, where \i, /i* and k are unknown. Consider the likelihood unconditional on S.
Under Ho the maximum likelihood estimator of /j = /i* is X = S/n. Under H1 for fixed k,
the maximum likelihood estimators of /i and /i* are Xk = SJk and X% = S*/k*
respectively. The maximum likelihood estimator of A is Ak = a(X^) — a(Xk). Let Zk be
minus twice the log likelihood ratio for fixed k. Then from (2-3)
Zk = 2[k{Xka(Xk)-b(Xk)} + k*{Xta(Xt)-b(Xt)}-n{Xa(X)-b(X)}]/<j>.
(2-5)
Thus minus twice the log likelihood ratio, for unknown k, is
T = m&xkZk = Zt,
(2-6)
say, where k is the maximum likelihood estimator of k.
For the normal case, Zk is the square of the usual two-sample Z-statistic,
Zk = (Xt-Xk)2/(4>/k + <fi/k*),
(2-7)
which has a noncentral chi-squared distribution with one degree of freedom, independent
of S. For the gamma case, Zk is"minus twice the logarithm of Bartlett's statistic for two
samples,
Zk = 2{klog(X/Xk) + k*log(X/X!)}/<t>,
(2-8)
and as we shall see in the next section, the distribution of Zk does in general depend
on S.
3. EXACT DISTRIBUTIONS
31. Distribution of Sk
Let Pk,n(Sk;S) be the density or probability function of Sk conditional on S when
A = 0; the dependence on <>
/ has been omitted for clarity. For the gamma case, /JkiII is a
beta density in SJS with parameters k/<p and k*l4>; for the Poisson case, pkn is a
binomial probability function with <S' trials and probability of success k/n; for the
binomial case, fikn is a hypergeometric probability function; but for the inverse
Gaussian case there is no explicit expression for /}t>(1; see Tweedie (1957).
It can be shown that the density or probability function of Sk conditional on S when
A 4=0 is
gkjSk;S,A)ccex<p(-SkA/4>)pkjSk;S)
'
(3-1)
with constant of proportionality chosen so that the integral or summation over all Sk is
unity. In general, gk „ depends on S as well as A, but for the gamma case there is a slight
simplification. Since flk „ depends on S only through the ratio SJS, then gkn depends on S
96
K.
J.
WORSLEY
and A only through the parameter AS. This parameter has a convenient interpretation in
the exponential case: AS/n = A/(l/X) is the ratio of the change in rate to the estimated
average rate. For simplicity we shall omit the dependence on n and S and write
gk(8k;A)=gkj8k;8,A).
32. The null distribution of T
The null distribution of T can be found by a straightforward generalization of the
iterative method employed by Hawkins (1977) for the normal case, and by Worsley
(1983) for the binomial case. Let
FktH{t;8k,8)
= Vv(Zl^t;i=\,...,k\8k,S)
(k = l,...,n),
(3-2)
where Zn is defined to be zero. I t can be shown that the second derivative of Zk with
respect to Sk is always positive so that the support of Fk „, namely {Sk: Zk ^ t}, is always
an interval. Then
pr (T ^ 11S) = FnJt ;S,S) = 0n(t; 8),
(3-3)
say. For simplicity we shall omit the dependence on n and S and write
Fk(t;8k)
=
L E M M A 1. For k = 1, . . . , n — 1 and nt = n (i=
k+1
FkJt;Sk,S).
l,...,k
+ l),
)=\Fk(t;Sk)pk,k+1(Sk;Sk+1)dSk
if Zk + 1 ^ t, and zero otherwise, urith integration replaced by summation for discrete
distributions.
Proof. By the law of total probability
Since 8k is a partial sum of independent random variables then S^-.-.S,, = S is a
Markovian sequence. Conditional on 8k and S, then Slt ...,Sk-t are independent of Sk + l.
Since Zt depends only on St conditional on fixed 8 (i = 1, ...,n— 1), then
if Zk+l ^ t, and zero otherwise. The density of 8k, conditional on Sk + l and 8 depends on
Sk + 1, and is given by Pk,k+i(Sk; Sk + l).
•
For the normal case this Lemma has been successfully used by Hawkins (1977) and
Worsley (1983) to find the null distribution of T for samples up to size n = 50. For the
Poisson and binomial cases integration is replaced by summation, and Worsley (1983)
uses this Lemma to find the exact null distribution of T for samples up to size n = 200.
However for other distributions such as the gamma and inverse Gaussian the Lemma
leads to numerical problems when quadrature is used to evaluate the integrals. The
reason is that Sk ^ Sk + l for positive random variables and the density fiktk + i(Sk; Sk + l)
puts large weight at Sk near Sk + 1, particularly if <j> is large. In the gamma case with
<f> = 2, so that the observations have a chi-squared distribution with one degree of
Confidence regions and tests for a change-point
97
freedom, Pkik + i(Sk; *St + 1) is infinite at Sk = Sk + 1, and for the exponential case with
<f> = 1, Pkik + i(Sk; *St + 1) is a maximum at Sk = Sk+l. Thus to control numerical errors,
Fk + l(t; 8k + 1) must be evaluated at the same values of Sk + 1 as the values of Sk for which
Fk(t;Sk) is evaluated. However {Sk: Zk < i) is not the same interval as {<St + 1: Zk + 1 ^ t}
and so it is impossible to select values for both which are the nodes of a high accuracy
quadrature rule such as Gaussian quadrature.
Fortunately, for the special case of the exponential distribution there is another
method available for calculating the null distribution of T which does not involve
numerical integration. It is well known that the null distribution of SJS, ...,5 n _!/S
conditional on S is the same as that of the order statistics U"^,..., U"n_X) of a sample of
Ti—1 independent uniform random variables. Since Z, depends only on SJS, the event
Zi < t is the event at(t) ^ SJS ^ b^t) for some at(t) and bt(t) for i = I, ...,n — 1, and so
Oa(t;S) = vr(T^t\S)
= pr{o,(<) ^ U*w ^ 6,(0; t = 1,...,»-1}.
(3-4)
Noe (1972) gives a fast efficient algorithm for calculating the probability that uniform
order statistics are bounded by arbitrary numbers. If we use this algorithm it is possible
to calculate the null distribution of T for sample sizes as large as n = 100 without
excessive round off errors.
It is clear from (3-4) that the null distribution of T conditional on S does not depend on
S, and so it is also the unconditional distribution of T; a table of percentage points of T
can thus be provided. Table 1 gives the level a points of T for a = 010,005,001 and
Table 1. Level a. points ta of T for a sequence of n exponential random variables;
maximum level a points caof T — Zk conditional on Sk and S; maximum level a points
daofMk
n
<„
a = 0-10
c«
d.
t.
a = 0-05
c.
d.
10
20
30
40
50
60
70
80
90
100
6193
6-936
7-306
7-544
7-718
7-852
7-962
8-053
8132
8-200
5-049
6-056
6-536
6-840
7-058
7-225
7-360
7-473
7-568
7-652
6-623
7-606
8-075
8-373
8-588
8-753
8-887
8-999
9-094
9-177
7-661
8-428
8-809
9O54
9-233
9-371
9-483
9-577
9-658
9-728
6-356
7-444
7-956
8-279
8-509
8-686
8-828
8-946
9-047
9-134
8-071
9-079
9-559
9-865
10-084
10-254
10-391
10-505
10-602
10-687
10-992
11-797
12196
12-454
12-641
12-786
12-903
13-001
13085
13158
a = 0-01
c.
d.
9-443
10-627
11195
11-551
11-804
11-997
12153
12-281
12-390
12-485
11-360
12-403
12-903
13-221
13-449
13-626
13-769
13-887
13-989
14-077
n = 10(10)100 found using the algorithm of Noe (1972). Note that these points do not
appear to converge to a limiting distribution, but appear to increase without limit. For
the normal case Hawkins (1977) proves that the null distribution of T is unbounded, and
more recently Miller & Siegmund (1982) give an asymptotic normal distribution for T
with mean and variance increasing with the sample size n.
33. The alternative distribution of T
Under the alternative hypothesis of a change of size A after change-point k the
distribution of T conditional on S can be found in a similar way to that of the null
distribution.
98
LEMMA
K. J. WORSLEY
2. Under the alternative hypothesis H^. A 4= 0 at change-point k,
pr(T^t\S)=
\Fk(t; Sk) FAt; 8?) gk(Sk; A) dSk.
Proof. By the law of total probability,
pr(T
^t\S)
=
p r ( Z , < t;i=
1, ...,n-l
\Sk = s,8)dpr(Sk
^s\S).
Conditional on Sk and S, S, is independent of Sj for t < k < j , and since Z, depends only
on St then
,...,n-l\Sk,S).
(3-5)
Conditional on Sk, the first probability depends only on the distribution of Xl,...,Xk
which by definition is Fk(t; 8k). The second probability depends only on the distribution
of Xk+l,...,Xn, and since the reversed sequence has the same distribution as the original
sequence under the null hypothesis then this probability is ^(JjS*). From (31),
d pr (Sk ^ s | S) is gk(s; A) da and the result follows.
•
This Lemma has been used by Worsley (1983) to find the power of the test for the
normal and binomial cases. For other distributions such as the gamma and inverse
Gaussian the numerical evaluation of Fk and F& using Lemma 1 causes some difficulties.
However for the exponential case Fk and F^ can be evaluated directly using the
algorithm of Noe (1972). Conditional on Sk, SJS^ ...,8k-JSk have the distribution of
k— 1 uniform order statistics U\l)t..., U\k-iy Hence
Fk(t;Sk)=pr{al(t)S/Sk^
Uk(l) ^ b^S/S,, ;i = 1, ...,k-1}.
(3-6)
Since these calculations do not involve A then the alternative distribution of T
conditional on S can easily be found for several values of AS using Lemma 2. As an added
benefit a simple check on the numerical calculations can be obtained by putting A = 0.
This gives the null distribution of T which should be the same for all values of k.
4. POWER COMPARISONS
Hsu (1979) has proposed a test for a change in a sequence of gamma random variables
under the assumption that the change can occur with equal probability at any point in
the sequence. The likelihood ratio statistic is based on a linear combination of the
observations weighted by the sequence number:
W = ^(i-l)Z,/S
(0<W<n-l).
(4-1)
It is straightforward to show that under the null hypothesis Ho of no change, the
expectation of W is |(n— 1), and so we reject Ho for large values of | W—$(n— 1) |.
The exact null distribution of W is complex, but for the exponential case <f> = 1 it has a
simple form. We can write W as
W^YSr/S,
(4-2)
where 8*/S,..., S*- JS have the distribution of n — 1 ordered uniform random variables.
Confidence regions and tests for a cJtange-point
99
But since the sum of ordered statistics is the same as the sum of unordered'statistics then
W has the distribution of the sum of n— 1 independent uniform random variables. It is
then straightforward to show that the first four cumulants of W are
3n
= 0,
(4-3)
Hsu (1979) gives the cumulants of W for general (p and suggests that an Edgeworth
expansion can be used to find the approximate distribution of W.
The power of W has been found by Hsu (1979) using simulation. We shall now find an
approximate expression for the alternative distribution of W conditional on S using the
approach of Lemma 2 combined with the Edgeworth expansion above. For a change of
size A at change-point k, we have
-f
(4-4)
W=
(4-5)
Jo
by the law of total probability. Conditional
on Sk and S, it can be shown that
where r = SJS, r* = 1 — r, and Wk and Wp are independent statistics with the null
distribution of W for samples of size k and k* respectively. Hence the first four
cumulants of W conditional on Sk and 8 are
Ki
=
= r2K2k- r*2
2k>,
(4-6)
Thus pr(W ^ w\Sk,S) can be found using the Edgeworth expansion, and integrated
over the density of Sk to give the desired result. Note again that the accuracy of the
calculations can be checked by putting A = 0; this gives the null distribution which can
be compared with the direct Edgeworth approximation for W.
For the exponential case, Fig. 2 gives the power of T and W at the level a = O05 for a
sample of size n = 60 and k = 6, 10, 20 and 30. The power is plotted as a function of Sk,
the size of the change A divided by the estimated asymptotic standard deviation of At
Fig. 2. The powers of T (solid lines) and W (dashed lines) against 6k, for a = O05, n = 60, and
k = 30,20,10,6 (from top to bottom on the right, from bottom to top on the left). The powers of Zk
for known k are also shown for comparison (dotted lines).
100
K. J. WORSLEY
under Ho. For the general case this is given by
var(At) = var{a(Xt)-a(Xk)}^{<lf/k
+ <l>/k*)/V(X),
(4-7)
where V(n,) = l/a'(fi,) is the variance of X, as a function of /z,. For the exponential case,
<f> = I, V{Hi) = nf, and we have
X
~
i
.
(4-8)
For known k the likelihood ratio test is based on Zk, and its power depends only on 5k.
Thus the exact power of Zk is also included in Fig. 2 for comparison with T and W.
Several points can be noted. The power of T is slightly lower than that of W for k near
the middle of the sequence, but for k near the beginning or end T is much more powerful
than W; T is almost always as powerful as Zk. Since 5k decreases as k approaches the ends
of the sequence then all tests are less powerful at the ends of the sequence. The power
curves are not symmetrical in Sk; the tests are more powerful at detecting a negative
change than a positive change near the beginning of the sequence. Plots of the same
power functions for a sample of size n = 30 showed a similar pattern.
5. INFERENCE ABOUT THE CHANGE-POINT
51. A confidence region for the change-point k
An exact 1 — a confidence region for the change-point k is the set Ca of values K that
cannot be rejected as the true change-point by a suitable level a test. Minus twice the log
likelihood ratio for testing H*: k = K, against H*: k 4= K, where K is fixed, is T — ZK, so
that Ca contains the set of values of K for which ZK is sufficiently close to its maximum T.
If the data indicate that there are two distinct change-points, then both may be accepted
as possible change-points, and so both points may be included in the confidence region.
Thus such a region may be disconnected, and a disconnected confidence region may
indicate the presence of more than one change-point.
Since the null hypothesis now contains the nuisance parameter A, then our inference
can be made free of A if we condition on the sufficient statistic SK as well as S. In this case
ZK is fixed, and so the test statistic is effectively T conditional on SK and S. Its null
distribution can be found in the same way as the alternative distribution of T. From (3-6)
we have
= FK(t;8K)FK.(t;Si),
(5-1)
Vr(T ^t\SK,S)
where K* = n — K, and so the confidence region is
Ca = {K: FK(T; SK)FK.(T; 8%)^l-a}.
(5-2)
Note that Ca may be empty for sufficiently large a. The construction of Ca requires either
the direct evaluation of FK and FK. for each K or else a table of level a points of T — ZK
conditional on SK and S, for each K, SK, n and S. For the normal and gamma cases FK
does not depend on S but still such a table is obviously not practical. However, for the
exponential case the maximum level a point ca of T — ZK, conditional on SK and S over all
values of K and SK for fixed n, has been found by numerical methods. For n = 10(10)100
and a = 010, 005, 0-01, the maximum occurred at K = 1 and Sx — S/n, and the values of
ca are given in Table 1. Thus a conservative 1 — a confidence region for k is
Ga = {K.T-ZK^
cx} = {K: ZK > T-ca};
(5-3)
in other words, the set of change-points for which the plot of Zk exceeds T — ca. This
Confidence regions and tests for a change-point
101
means that if the plot has two separate peaks of almost equal height, suggesting two
change-points in the sequence, then 0a will be two disconnected regions that contain
them both. Note that since T — Z$ = 0 then &a always contains k.
52. A Cox-Spjotvoll type confidence region
The method of Cox & Spjetvoll (1982) can be modified by choosing a test of
homogeneity as follows. Let the confidence region Da contain all change-points K that
partition the sequence into two subsequences in which we accept the hypothesis of no
further change-points at level a. Consider the model with one known change-point at K
and a second unknown change-point at k. Then we wish to test the hypothesis HQ against
H\ as before, and put K in Da if H* is accepted at level a. Let T£ be the equivalent of the
test statistic T evaluated only for the subsequence of observations Xl3..., XK and let T£
be the equivalent of T evaluated only for the subsequence XK + l, ...,Xn. Then minus
twice the log likelihood ratio of H* against //f is MK = max {T£, T£). Conditional on SK
and S, T£ and T£ are independent and so the null distribution of MK is
^t\ SK, S) = GK(t; SK) Gg.it; 8%)
(5-4)
and so an exact 1 — a confidence region for k is
Da = {K:GK(T;SK)GK.(T;St)^l-a}.
(5-5)
Again the construction of Da is laborious, but for the exponential case the maximum
level a point da of MK over all values of K for fixed n has been found by numerical
methods. The maximum always occurred at K = %n, and the values of da for
n=10(10)100 and a = 010,0-05,0O1 are given in Table 1. The conservative 1 - a
confidence region for k is
Ba = {K:MK^dx}.
(5-6)
The exact confidence of Ba is largest for A = 1, but for a = 0-05, it never exceeds 0-968 for
n= 10^0)100.
There is a close link between MK and T — ZK. Since H\ now contains two changes then
it is straightforward to show that MK ^ T — ZK with approximate equality near the ends
of the sequence. Hence da ^ ca, as can be seen from Table 1.
6. EXAMPLES
6-1. Time intervals between coal mine explosions
Maguire et al. (1952) give the time intervals in days between explosions in British coal
mines from 1875 to 1950. The n = 109 time intervals Xk together with the test statistics
Zk are plotted against the cumulative time Sk in Fig. l(a). The maximum occurs at
k = 46 which corresponds to the year 1890 and the estimated size of the change is
A46 = 0-00553 explosions per day or 2-02 explosions per year. The test statistic for a
change is T = 26-280, and from Table 1 the change is highly significant. The exact level
a = 0-05 point of T for n = 109 is ta = 9784, well below the observed value of T.
The simplest confidence region to construct is the conservative region Ca for the
change-point k. A line at a height of ca = 9-204 below T is drawn on the plot of Zk in Fig.
l(a). All values of k for which Zk lies above this line are in (Ja. Now
6a = {28,31,32,34, ...,53} contains the years 1884 to 1895 with a break in 1885 and
1886. The exact confidence region Ca = {36, ...,53} covers the narrower interval from
102
K. J. WORSLEY
1887 to 1895. The statistic for a second change Mk is also plotted in Fig. l(a), together
with a line at a height of da = 10755. The conservative Cox-Spjotvoll type region
£)a = {28,..., 53} covers the broader interval from 1884 to 1895, and the exact region Da
is the same as 3X. In this example the region Cx is narrower than Da.
Following the binary segmentation procedure of Vostrikova (1981), we may now look
for further changes in the two subsequences before and after 1890. The two subsequences
give T+6 = 2-230 and T£6 = 3529, well below the level a = 0-10 critical points, and so we
conclude that no further changes took place. Note that MA6 = max {2-230,3529} is the
test statistic for homogeneity in the Cox-Spjetvoll type confidence region Da. The fact
that k = 46 is included in this region is further evidence that no more changes took place.
It is interesting to try to find historical events that may have caused the observed
change. In 1886 and 1894 two Royal Commissions reported on accidents in mines, and
extensive testing demonstrated beyond all doubt that coal dust in the workings was to
blame for originating and extending explosions beyond the coal face. They recommended
wetting of the sides and floors to keep down dust, and a change from gunpowder to other
forms of blasting powder that produce less flame. These recommendations were
implemented in the Explosive in Coal Mines Order of 1896.
6'2. Variation in stock market returns
Hsu (1979) gives 162 weekly closing values Pt of the Dow-Jones Industrial Average
from July 1, 1971 to-August 2, 1974. The rates of return are
R^iPt^-PJ/P,
(t= 1
161).
It is assumed that Rl, ...,R161 are independent normal random variables with constant
mean and a variance which may change after an unknown time. The variance is
estimated by Xt = (Rt — H)2, where M is the average rate of return, and since the sequence
is long we may assume that Xt, ...,X16l are proportional to independent chi-squared
random variables with one degree of freedom. Here E(Xt) = \i{ is the unknown variance,
and we wish to test for a change in mean variance fit after an unknown point in the
sequence. Using the linear trend statistic W, Hsu (1979) concluded that a highly
significant change had taken place.
To apply the methods of this paper we can overcome the numerical difficulties
mentioned in §32 by transforming the data to exponential random variables by
averaging pairs of adjacent values of X( to produce X't = \(X2i-i + X2i) (i = 1,....80).
The change-point model now states that a change may take place after any pair
of observations, and so only a small amount of information has been lost.
Values of the average squared percent return X'k are plotted in Fig. l(b) together with
the corresponding test statistic Zk. The maximum occurs at k = 44 which corresponds to
late February 1973, and the estimated size of the change is KM = 0-274 per cent" 2 . The
test statistic for a change is T = 24-894, much greater than the level a = 0O5 point
ta = 9-577 from Table 1. The confidence region 6a = {32,..., 48} is the set of values of k for
which Zk is above the line ca = 8946 units below T, also drawn in Fig. l(b). The exact
interval is Ca = {34,37,..., 47}. From the plot of Mk in Fig. l(b), the set of A; for which Mk
is below da = 10505 is 3a = {31, ...,48}. The exact interval Da is the same as Da. All
intervals cover a period from October 1972 to March 1973. No further change-points
were found in the subsequences before and after the first change-point: T^ = 5562 and
TXA = 7-191. The start of the confidence region corresponds to the Watergate break-in,
and the end corresponds to a period when U.S. prime interest rates began increasing.
Confidence regions and tests for a change-point
103
7. CONCLUSION
This paper has studied the problem of inference about the change in mean of a
sequence of observations. Maximum likelihood methods have been used to test for a
change, to estimate the change-point, and to give a confidence region for the changepoint. The main theoretical results rely on the Markovian property of a sequence of
partial sums of independent random variables, and as such they can be applied to any
statistic that is a maximum of a function of the partial sums. In particular, the
conditional likelihood at k, maxA{7t(iSk ; &)/gk(Sk ; 0), is a function only of Sk for fixed S,
and so the same theoretical results can be applied to the maximum conditional likelihood
ratio test statistic; for simplicity we have confined our attention to the unconditional
likelihood ratio test statistic T. Finally, the exponential family is assumed only in so far
as it is characterized as the set of distributions for which the sums of the observations are
sufficient for the mean. This implies that if we condition on the sample total then our
inference is free of nuisance parameters, and as an added benefit the test statistic for a
change at k depends only on the partial sum of the observations, thus guaranteeing the
Markovian property.
A natural question to ask is which of the two confidence regions Ca and Da is narrower.
The answer depends very much on the assumed model. If we know that there is exactly
one change-point k in the sequence then the probability that K is in the region can be
found by conditioning on Sk, SK and S, applying the Markovian property, and
integrating over 8k and SK. The calculations are laborious and have not been attempted.
Examples in §§61 and 62 present some evidence that Ca is narrower than Da. On the
other hand, if there is more than one change-point or a trend in the mean, then Da may be
empty since no subsequence would be judged homogeneous in mean. If there are two
change-points then Ca may contain them both, particularly if there is an increase in
mean followed by a decrease. For these reasons Ca seems preferable.
ACKNOWLEDGEMENTS
T would like to thank the referees for suggestions which clarified many points. This
work was supported by a Natural Sciences and Engineering Research Council of Canada
grant and by a Fondation de Chercheurs et d'Action Concertee de Quebec subvention.
REFERENCES
Cox, D. R. & LEWIS, P. A. W. (1966). The Statistical Analysis of Series of Events. London: Methuen.
Cox, D. R. & SPJ0TVOLL, E. (1982). On partitioning means into groups. Scand. J. Statist. 9, 147-52.
HAWKINS, D. M. (1977). Testing a sequence of observations for a shift in location. J. Am. Statist. Assoc. 72,
180-6.
HINKLEY, D. V. (1970). Inference about the change-point in a sequence of random variables. Biometrika
57, 1-1T.
HINKLEY, D. V. & HINKLEY, E. A. (1970). Inference about the change-point in a sequence of binomial
random variables. Biometrika 57, 477-88.
Hsu, D. A. (1979). Detecting shifts of parameter in gamma sequences with applications to stock price and air
traffic flow analysis. J. Am. Statist. Assoc. 74, 31-40.
MAOUIBB, B. A., PEABSON, E. S. & WYNN, A. H. A. (1952). The time intervals between industrial accidents.
Biometrika 88, 168-80.
MILLEE, R. & SIEGMUND, D. (1982). Maximally selected chi square statistics. Biometrics 88, 1011-16.
NOE, M. (1972). The calculation of distributions of two-sided Kolmogorov-Smimov type statistics. Ann.
Math. Statist. 48, 58-64.
SCHECHTMAN, E. (1983). A conservative nonparametric distribution-free confidence bound for the shift in
the changepoint problem. Comm. Statist. A 12, 2455-64.
104
K. J. WORSLE\
A. J. & KNOTT, M. (1974). A cluster analysis method for grouping means in the analysis of variance.
Biometrics 30, 507-12.
TWEEDIE, M. C. K. (1957). Statistical properties of the inverse gaussian distributions II. Ann. Math. Statist.
28, 696-705.
VOSTRTKOVA, L. Ju. (1981). Detecting "disorder" in multidimensional random processes. Sov. Math. Dokl.
24, 55-9.
WOBSLBY, K. J. (1983). The power of likelihood ratio and cumulative sum tests for a change in a binomial
probability. Biometrilca 70, 455-64.
ZAOKS, S. (1982). Classical and Bayesian approaches to the change-point problem: fixed sample and
sequential procedures. Statistique et Analyse des Donnies 1, 48-81.
SOOTT,
[Received April 1984. Revised April 1985]
NOTK ADDED IN PROOF
I am indebted to D. R. Cox'for referring me to the papers by Cobb (1978) and Jarrett
(1979). The latter points out several errors in the data on coal mine explosions as given
by Maguire et al. (1952) and gives a corrected data set of 190 observations extending
back to 1851. The conclusions given in §61 remain substantially the same; the changepoint is estimated at exactly the same point in the data but because of the increased
number of observations the likelihood ratio statistic increases to T = 71-219. However
the widths of the confidence regions remain much the same, because extending the
sequence of observations, as Hinkley (1970) points out, has little effect on inference
about the change-point.
The work of Cobb (1978) on a conditional solution to the change-point problem
deserves more attention. Cobb shows that the conditional distribution of the maximum
likelihood estimator k given the ancillary values of observations adjacent to k is
approximately the same as a Bayesian posterior distribution under a uniform prior.
Specifically,
pr(k =fc|Xi._d+ 1 ,X
approximately for some small d. We can thus construct an approximate (1—a)
confidence region for k as Ea = {k: Zk ^ z} where z is the largest value such that
In-l
keE.
/
For the coal mine explosions data as used in §61 with a = 005,
Ea = {36, ...,39,41, ...,53} which is the same as Ca with one less point. For the stock
returns data Ea is exactly the same as Ca. The essential difference between Ca and Ea is
that Cx contains acceptable values of k conditional only on sufficient statistics for the
nuisance parameters, whereas Ea contains acceptable values of k conditional on the
shape of the likelihood Zi~Zh for \i — k\ < d, near the estimated change-point. It is
surprising that in these two examples the confidence regions are almost exactly the same,
though in practice Ea is far easier to construct than Ca.
ADDITIONAL
REFERENCES
COBB, G. \V. (1978). The problem of the Xile: Conditional solution to a change-point problem. Biometrika 65,
243-51.
JARRETT, R. G. (1979). A not* on the intervals between coal-mining disasters. Biometrika 66, 191-3.