Journal of Hydrology, 58 (1982) 11--27
Elsevier Scientific Publishing Company, Amsterdam -- Printed in The Netherlands
SOME METHODS FOR TESTING THE HOMOGENEITY OF RAINFALL
RECORDS
T.A. BUISHAND
Royal Netherlands Meteorological Institute (K.N.M.I.), De Bilt (The Netherlands)
(Received June 25, 1981; accepted for publication August 19, 1981)
ABSTRACT
Buishand, T.A., 1982. Some methods for testing the homogeneity of rainfall records.
J. Hydrol., 58: 11--27.
Cumulative deviations from the mean are often used in the analysis of homogeneity.
Features of five tests on the cumulative deviations are discussed. Some of these tests
have optimal properties in testing the null hypothesis of homogeneity against a shift in
the mean at an unknown point.
Together with the classical von Neumann ratio the tests were applied to the annual
amounts of 30-yr. rainfall records in The Netherlands. For a large number of records
strong indications for a change in the mean were found. There were only small
differences between the various test-statistics with respect to the number of records
for which the null hypothesis was rejected.
INTRODUCTION
Homogeneous rainfall records are often required in hydrologic design.
However, it frequently occurs that rainfall data over different periods are
not comparable since the measured amount of rainfall depends on such
factors as the type, height and exposure of the raingauge, which have not
always been the same. Therefore many meteorological institutes maintain an
archive with information on the raingauge sites and the instruments used.
Unfortunately, it is often not possible to specify the nature of changes in
the mean amount of rainfall from the station documentation. This is partly
because it is not always known how a change in the instrument or in the
raingauge site may influence the measured amount of rainfall and partly
because it is highly questionable whether the station information gives a
complete picture of the raingauge site during the period that the station has
been in operation.
Because of the uncertainty about possible changes, graphical methods are
often used in climatology and hydrology to obtain some insight into the
homogeneity of a record. A popular tool is the double-mass curve (Searcy
and Hardison, 1960), which is obtained by plotting the cumulative amounts
of the station under consideration against the cumulative amounts of a set of
neighbouring stations. The plotted points tend to fall along a straight line
under conditions of homogeneity. Instead of the double-mass curve one can
also plot the cumulative deviations from some average value. The cumulative
deviations have the advantage that changes in the mean amount of
rainfall are more easily recognized (Craddock, 1979). The graph of the cumulative
deviations is sometimes called a residual mass curve.
0022-1694/82/0000--0000/$02.75
© 1982 Elsevier Scientific Publishing Company
Though graphs are useful for the detection of shifts in the mean it is
usually not obvious how real changes can be distinguished from purely
random fluctuations. Therefore it is always necessary to test the significance of
departures from homogeneity by statistical methods. Common statistical
techniques in climatology and hydrology are reviewed in a publication on
climatic change by the World Meteorological Organization (W.M.O., 1966).
It is a surprising fact, however, that these statistical tests are not based on
some characteristic of the cumulative sums in the graphical analysis.
The intention of this paper is to discuss some tests on the cumulative
deviations. These tests are compared with the classical von Neumann ratio.
A study was made on properties of the test-statistics for a simple model
with a shift in the mean. Further the usefulness of the tests was investigated
for annual rainfall totals in The Netherlands for the period 1951--1980. First
some features of the test-statistics are derived and then the application to
the rainfall data is discussed.
STATISTICAL ANALYSIS OF HOMOGENEITY
In the introduction the need for statistical techniques was emphasized to
test the homogeneity of rainfall records. Suppose that one wants to test the
homogeneity of a sequence $Y_1, Y_2, \ldots, Y_n$. Under the null hypothesis $H_0$
it is usually assumed that the $Y_i$'s have the same mean. The form of the
alternative hypothesis $H_1$ is generally rather vague since often no reliable prior
information is available about possible changes in the mean.
Usually, some assumptions are made on the joint distribution of the $Y_i$'s.
Most tests require that the $Y_i$'s be independent. This is not a serious
restriction, since the tests are usually performed on consecutive seasonal or annual
values which are approximately independent in many countries. The
distributions of the test-statistics in this paper are derived for the case that the
$Y_i$'s are stochastically independent and have a normal distribution. The tests
can still be applied, however, when there are slight departures from normality.
In the literature about testing the homogeneity of rainfall records, hardly
any attention is paid to the distribution of test-statistics under the
alternative hypothesis. Generally, no information is given on the probability of
rejecting the null hypothesis in relation to the magnitude of changes in the
mean. In this paper, properties of test-statistics are illustrated for the case
that the $Y_i$'s are normally distributed with mean:
$$ E(Y_i) = \begin{cases} \mu, & i = 1, \ldots, m \\ \mu + \Delta, & i = m+1, \ldots, n \end{cases} \tag{1} $$
and variance:
$$ \operatorname{var} Y_i = \sigma_Y^2 $$
The model assumes a jump in the mean of magnitude $\Delta$ after $m$ observations.
In the sequel, examples are given of the probability of rejecting $H_0$
as a function of $\Delta$. Also, some remarks are made on the estimation of the
change-point $m$.
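The model in eq. 1 can be sketched in a few lines of Python. This is only an illustration and not part of the original study: the function name `simulate_shift_series`, its parameters and the example values (a mean of 800 mm, a shift of -40 mm, a standard deviation of 45 mm) are assumptions chosen for the sketch.

```python
import random


def simulate_shift_series(n, m, mu=0.0, delta=0.0, sigma=1.0, seed=None):
    """Draw Y_1..Y_n as in eq. 1: mean mu for i <= m, mean mu + delta for i > m.

    The values are independent and normally distributed with standard
    deviation sigma. The name and signature are illustrative assumptions.
    """
    rng = random.Random(seed)
    series = []
    for i in range(1, n + 1):
        mean_i = mu if i <= m else mu + delta
        series.append(rng.gauss(mean_i, sigma))
    return series


# Hypothetical example: 30 annual values with a drop of 40 mm after year 15.
example = simulate_shift_series(n=30, m=15, mu=800.0, delta=-40.0, sigma=45.0, seed=1)
```

Setting `sigma=0.0` returns the two mean levels themselves, which is a quick way to check that the jump sits after the `m`-th value.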
The von Neumann ratio
The well-known von Neumann ratio is defined by:
$$ N = \sum_{i=1}^{n-1} (Y_i - Y_{i+1})^2 \Big/ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \tag{2} $$
in which $\bar{Y}$ stands for the average of the $Y_i$'s.
Under the null hypothesis of a constant mean it can be shown that
E(N) = 2. For a non-homogeneous record the mean of N tends to be smaller
than 2. A table of percentage points of N for normally distributed samples is
given by Owen (1962).
The von Neumann ratio is closely related to the first-order serial
correlation coefficient (W.M.O., 1966). A comprehensive study of the effect of
changes in the mean on the correlogram was made by Yevjevich and Jeng
(1969).
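As a minimal sketch of eq. 2, the ratio can be computed directly from its definition. The function name `von_neumann_ratio` and the two toy sequences are assumptions made for illustration; critical values still have to be taken from a table such as that of Owen (1962).

```python
def von_neumann_ratio(y):
    """Von Neumann ratio N (eq. 2): squared successive differences over
    squared deviations from the sample mean."""
    n = len(y)
    mean = sum(y) / n
    numerator = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
    denominator = sum((v - mean) ** 2 for v in y)
    return numerator / denominator


# A steadily rising sequence gives a value well below 2 ...
trend_ratio = von_neumann_ratio([1.0, 2.0, 3.0, 4.0])
# ... whereas a rapidly alternating sequence gives a value above 2.
alternating_ratio = von_neumann_ratio([1.0, -1.0, 1.0, -1.0])
```

For the rising sequence the numerator is 3 and the denominator is 5, so N = 0.6; for the alternating sequence N = 12/4 = 3.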
Cumulative deviations
Tests for homogeneity can be based on the adjusted partial sums or
cumulative deviations from the mean:
$$ S_0^* = 0; \qquad S_k^* = \sum_{i=1}^{k} (Y_i - \bar{Y}), \quad k = 1, \ldots, n \tag{3} $$
Note that $S_n^* = 0$. For a homogeneous record one may expect that the $S_k^*$'s
fluctuate around zero since there is no systematic pattern in the deviations
of the $Y_i$'s from their average value $\bar{Y}$. On the other hand, when $\Delta$ is negative
in eq. 1 most values of $S_k^*$ are positive because the $Y_i$'s tend to be larger than
$\bar{Y}$ if $i \le m$, and smaller than $\bar{Y}$ if $i > m$. A typical example is given in Fig. 1.
For $\Delta$ positive the $S_k^*$'s tend to be negative.
Rescaled adjusted partial sums are obtained by dividing the $S_k^*$'s by the
sample standard deviation:
$$ S_k^{**} = S_k^* / D_Y, \quad k = 0, \ldots, n \tag{4} $$
with
$$ D_Y^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 / n $$
Fig. 1. Non-homogeneous time series with adjusted partial sums.
The values of the $S_k^{**}$'s are not influenced by a linear transformation of
the data. For instance, if the amount of rainfall is expressed in metres
instead of in millimetres, the $S_k^*$'s are diminished by a factor 1000 but the
$S_k^{**}$'s remain unchanged. Therefore tests of homogeneity are based on the
rescaled adjusted partial sums $S_k^{**}$.
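The invariance described above can be checked numerically. The sketch below implements eqs. 3 and 4 as written; the function names and the six annual totals are hypothetical values chosen for illustration, not data from the paper.

```python
import math


def adjusted_partial_sums(y):
    """Adjusted partial sums S*_0, ..., S*_n of eq. 3."""
    mean = sum(y) / len(y)
    sums = [0.0]
    for value in y:
        sums.append(sums[-1] + (value - mean))
    return sums


def rescaled_adjusted_partial_sums(y):
    """Rescaled adjusted partial sums S**_k = S*_k / D_Y of eq. 4."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((value - mean) ** 2 for value in y) / n)
    return [s / d_y for s in adjusted_partial_sums(y)]


# Hypothetical annual totals, first in millimetres and then in metres.
rain_mm = [812.0, 790.0, 845.0, 760.0, 701.0, 730.0]
rain_m = [value / 1000.0 for value in rain_mm]

s_mm = adjusted_partial_sums(rain_mm)
s_m = adjusted_partial_sums(rain_m)
ss_mm = rescaled_adjusted_partial_sums(rain_mm)
ss_m = rescaled_adjusted_partial_sums(rain_m)
```

Here `s_m` is `s_mm` divided by 1000, while `ss_mm` and `ss_m` agree up to rounding error.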
A statistic which is sensitive to departures from homogeneity is:
$$ Q = \max_{0 \le k \le n} |S_k^{**}| \tag{5} $$
High values of $Q$ are an indication for a change in level. Critical values for
the test-statistic can be found in Table I. The percentage points in this table
are based on 19,999 synthetic sequences of Gaussian random numbers.
For $n \to \infty$ the critical values of $Q$ can be obtained from a table of the
Kolmogorov--Smirnov goodness-of-fit statistic, see the Appendix.
TABLE I
Percentage points of Q/√n and R/√n

          Q/√n                    R/√n
  n       90%     95%     99%     90%     95%     99%
  10      1.05    1.14    1.29    1.21    1.28    1.38
  20      1.10    1.22    1.42    1.34    1.43    1.60
  30      1.12    1.24    1.46    1.40    1.50    1.70
  40      1.13    1.26    1.50    1.42    1.53    1.74
  50      1.14    1.27    1.52    1.44    1.55    1.78
  100     1.17    1.29    1.55    1.50    1.62    1.86
  ∞       1.22    1.36    1.63    1.62    1.75    2.00
Another statistic which can be used for testing homogeneity is the range:
$$ R = \max_{0 \le k \le n} S_k^{**} - \min_{0 \le k \le n} S_k^{**} \tag{6} $$
The range is an important quantity in studies on the storage capacity of
reservoirs. Much work has been done on its statistical properties in relation
to the famous Hurst phenomenon (Gomide, 1978).
Shifts in the mean usually give rise to high values of the range. A figure
with percentage points of the distribution of $R$ under the null hypothesis
is given by Wallis and O'Connell (1973). Some percentage points are also
given in Table I since it is not convenient to determine critical values from a
graph.
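A test based on eqs. 5 and 6 might be carried out as in the following sketch. The two dictionaries simply copy the 95% columns of Table I for finite n; the function names, the returned dictionary layout and the two toy sequences are assumptions made for illustration. Sample sizes that are not tabulated are rejected rather than interpolated.

```python
import math

# 95% points of Q/sqrt(n) and R/sqrt(n), copied from Table I (finite n only).
Q_CRIT_95 = {10: 1.14, 20: 1.22, 30: 1.24, 40: 1.26, 50: 1.27, 100: 1.29}
R_CRIT_95 = {10: 1.28, 20: 1.43, 30: 1.50, 40: 1.53, 50: 1.55, 100: 1.62}


def q_and_r(y):
    """Return Q (eq. 5) and R (eq. 6) for the sequence y."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    rescaled = [0.0]  # S**_0 = 0
    for v in y:
        rescaled.append(rescaled[-1] + (v - mean) / d_y)
    q = max(abs(s) for s in rescaled)
    r = max(rescaled) - min(rescaled)
    return q, r


def homogeneity_check_95(y):
    """Compare Q/sqrt(n) and R/sqrt(n) with the 95% points of Table I."""
    n = len(y)
    if n not in Q_CRIT_95:
        raise ValueError("n is not tabulated in Table I")
    q, r = q_and_r(y)
    root_n = math.sqrt(n)
    return {
        "Q/sqrt(n)": q / root_n,
        "R/sqrt(n)": r / root_n,
        "reject_Q": q / root_n > Q_CRIT_95[n],
        "reject_R": r / root_n > R_CRIT_95[n],
    }


# A clean step halfway through 30 values, and a sequence without any shift.
step_result = homogeneity_check_95([10.0] * 15 + [0.0] * 15)
flat_result = homogeneity_check_95([1.0, -1.0] * 15)
```

For the step sequence the rescaled sums climb to 15 at k = 15, so Q = R = 15 and both scaled statistics are about 2.74, far above the tabulated points. For the alternating sequence Q = R = 1 and neither test rejects.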
Worsley's likelihood ratio test
Consider again eq. 1 and assume that one wishes to test $\Delta = 0$ against
$\Delta \ne 0$. If the position of the change-point $m$ is known Student's t-test can
be used. In situations that no information about $m$ is available the test can
be based on:
$$ W = \max_{1 \le k \le n-1} |t_k| \tag{7} $$
where $t_k$ denotes Student's $t$ for testing a difference in mean between the
first $k$ and the last $(n - k)$ observations.
Critical values for the test-statistic can be obtained from a paper by
Worsley (1979). The test is equivalent to the likelihood ratio test. It is
also possible to give a relation between $W$ and the weighted adjusted partial
sums:
$$ Z_k^* = [k(n-k)]^{-1/2} S_k^*, \quad k = 1, \ldots, n-1 \tag{8} $$
The largest weights are given to $S_1^*$ and $S_{n-1}^*$. The weights are relatively small
for $k$ in the neighbourhood of $\tfrac{1}{2}n$. From eq. A-3 in the Appendix it is seen
that the variance of $Z_k^*$ does not depend on $k$.
Dividing $Z_k^*$ by the sample standard deviation gives the weighted rescaled
adjusted partial sums $Z_k^{**}$. Let
$$ V = \max_{1 \le k \le n-1} |Z_k^{**}| \tag{9} $$
then some algebra shows (Worsley, 1979):
$$ W = (n-2)^{1/2} V / (1 - V^2)^{1/2} \tag{10} $$
So there is a unique relation between $V$ and $W$, which means that tests on
$V$ and $W$ are equivalent.
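The relation in eq. 10 can be verified numerically. The sketch below computes W from eq. 7, taking $t_k$ to be the usual two-sample Student's t with pooled variance (an assumption about the intended form of $t_k$), and V from eqs. 8 and 9. The function names and the ten sample values are hypothetical.

```python
import math


def worsley_w(y):
    """W = max |t_k| (eq. 7), with t_k the pooled-variance two-sample t
    comparing the first k and the last n - k observations."""
    n = len(y)
    largest = 0.0
    for k in range(1, n):
        first, last = y[:k], y[k:]
        mean_first = sum(first) / k
        mean_last = sum(last) / (n - k)
        within = (sum((v - mean_first) ** 2 for v in first)
                  + sum((v - mean_last) ** 2 for v in last))
        pooled_var = within / (n - 2)
        t_k = (mean_first - mean_last) / math.sqrt(
            pooled_var * (1.0 / k + 1.0 / (n - k)))
        largest = max(largest, abs(t_k))
    return largest


def weighted_v(y):
    """V = max |Z**_k| (eq. 9), built from the weighted sums of eq. 8."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    partial, largest = 0.0, 0.0
    for k in range(1, n):
        partial += y[k - 1] - mean              # S*_k
        z = partial / math.sqrt(k * (n - k))    # Z*_k
        largest = max(largest, abs(z / d_y))    # |Z**_k|
    return largest


sample = [5.1, 4.8, 5.6, 5.0, 4.7, 6.2, 6.0, 6.4, 5.9, 6.3]
w = worsley_w(sample)
v = weighted_v(sample)
w_from_v = math.sqrt(len(sample) - 2) * v / math.sqrt(1.0 - v ** 2)
```

Up to rounding error `w` and `w_from_v` coincide, which is the content of eq. 10.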
Bayesian procedures
Bayesian procedures for the detection of changes in the mean have been
developed by Chernoff and Zacks (1964) and Gardner (1969). In the
derivation of Bayesian tests it is assumed that the variance $\sigma_Y^2$ is known. Gardner's
statistic for a two-sided test on a shift in the mean at an unknown point can
be written as:
$$ \sum_{k=1}^{n-1} p_k \{ S_k^* / \sigma_Y \}^2 \tag{11} $$
where $p_k$ denotes the prior probability that the shift occurs just after the $k$th
observation ($k = 1, \ldots, n-1$).
When the standard deviation is not known $\sigma_Y$ can be replaced by the
sample standard deviation. For $p_k$ independent of $k$ (uniform prior
distribution) one obtains:
$$ U = \frac{1}{n(n+1)} \sum_{k=1}^{n-1} \{ S_k^{**} \}^2 \tag{12} $$
and for $p_k$ proportional to $1/[k(n-k)]$ one obtains:
$$ A = \sum_{k=1}^{n-1} \{ Z_k^{**} \}^2 \tag{13} $$
Large values of these test-statistics are an indication for departures from
homogeneity. Critical values for $U$ and $A$ are given in Table II. The
percentage points in this table are based on 19,999 synthetic sequences of Gaussian
random numbers. The limiting distributions of $U$ and $A$ are those of certain
test-statistics of the Cramér--von Mises type. The statistic U/n corresponds
asymptotically with Smirnov's $\omega^2$ and the statistic $A$ with the
Anderson--Darling statistic, see the Appendix.
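Both quadratic forms can be accumulated in one pass over the partial sums. The sketch below follows eqs. 12 and 13, using the fact that $\{Z_k^{**}\}^2 = \{S_k^{**}\}^2 / [k(n-k)]$; the function name and the test sequences are assumptions for illustration, and the two constants are the 95% points for n = 30 read from Table II.

```python
import math

# 95% points for n = 30, read from Table II.
U_CRIT_95_N30 = 0.444
A_CRIT_95_N30 = 2.42


def bayesian_u_and_a(y):
    """Return U (eq. 12) and A (eq. 13) for the sequence y."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    partial, u_sum, a = 0.0, 0.0, 0.0
    for k in range(1, n):
        partial += y[k - 1] - mean           # S*_k
        s_rescaled_sq = (partial / d_y) ** 2  # {S**_k}^2
        u_sum += s_rescaled_sq
        a += s_rescaled_sq / (k * (n - k))    # {Z**_k}^2
    return u_sum / (n * (n + 1)), a


# A clean step halfway through 30 values.
u_step, a_step = bayesian_u_and_a([10.0] * 15 + [0.0] * 15)
# With only two distinct observations S**_1 is always +1 or -1.
u_two, a_two = bayesian_u_and_a([1.0, -1.0])
```

For the step sequence the rescaled sums are 1, 2, ..., 15, 14, ..., 1, so U = 2255/930, which is about 2.42 and far above the tabulated 95% point. For n = 2 the statistics take the fixed values U = 1/6 and A = 1.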
TABLE II
Percentage points of U and A

          U                          A
  n       90%      95%      99%      90%     95%     99%
  10      0.336    0.414    0.575    1.90    2.31    3.14
  20      0.343    0.447    0.662    1.93    2.44    3.50
  30      0.344    0.444    0.691    1.92    2.42    3.70
  40      0.341    0.448    0.693    1.91    2.44    3.66
  50      0.342    0.452    0.718    1.92    2.48    3.78
  100     0.341    0.457    0.712    1.92    2.48    3.82
  ∞       0.347    0.461    0.743    1.93    2.49    3.86
The power of tests on homogeneity
The probability of detecting changes in the mean of a sequence
$Y_1, Y_2, \ldots, Y_n$ by statistical methods depends on how serious these changes
are. When only a small change occurs during a short period of the sample
record there is little chance that the tests will indicate non-homogeneity. On
the other hand, for feasible test-statistics it is necessary that they should
be able to indicate all relevant departures from homogeneity.
A study on the power of tests for a change in level at an unknown point
was made by Sen and Srivastava (1975). These authors compared the
likelihood ratio statistic with Bayesian procedures. In this paper the power of
N, Q and W is discussed for testing $\Delta = 0$ against $\Delta \ne 0$.
For a particular test-statistic the probability of rejecting $H_0$ depends on
the significance level $\alpha$, the value of $\Delta$, the standard deviation $\sigma_Y$, the
number of observations $n$ and the position of the change-point $m$. The
dependence on $\Delta$ and $\sigma_Y$ can be combined into one single parameter: $\Delta' = \Delta/\sigma_Y$.
The power of N, Q and W was investigated for $\alpha = 0.05$ and $n = 30$.
Comparisons between these test-statistics were based on their power function:
$$ P(\Delta', m) = \Pr(H_0 \text{ is rejected} \mid \Delta', m) \tag{14} $$
If $\Delta = 0$, then $P(\Delta', m) = \alpha = 0.05$; for $\Delta \ne 0$ and $m$ fixed the power
function increases monotonically with the absolute value of $\Delta'$. With $|\Delta'|$
growing, $P(\Delta', m)$ tends to 1, that is $H_0$ is rejected with probability 1.
To obtain the power of N, Q and W 1,999 sequences of 30 pseudo-random
numbers were generated from a standard normal distribution. For each
sequence the statistics N, Q and W were calculated and then the critical values
were read from the ordered samples of the computed statistics. The powers
were based on the same set of random numbers by calculating the
test-statistics again after adding a constant $\Delta'$ to the last $(30 - m)$ numbers of
each sequence.
Fig. 2. Simulated powers of the statistics N, Q and W for testing a change of level in the
middle of a sequence ($\alpha = 0.05$). [Axes: $P(\Delta', m)$ against $\Delta'$; m = 15, n = 30.]
Fig. 3. Simulated powers of the statistics N, Q and W for testing a change of level near the
beginning of a sequence ($\alpha = 0.05$). [Axes: $P(\Delta', m)$ against $\Delta'$; m = 5, n = 30.]
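The data generation method described above can be sketched for the statistic Q as follows. This is a rough reconstruction under stated assumptions, not the author's program: the random number generator, the seed, the way the critical value is picked from the ordered null sample and all function names are choices made for illustration, so the resulting powers will only approximate those in Figs. 2 and 3.

```python
import math
import random


def q_statistic(y):
    """Q = max |S**_k| (eq. 5) for one sequence."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    partial, largest = 0.0, 0.0
    for v in y:
        partial += v - mean
        largest = max(largest, abs(partial))
    return largest / d_y


def simulated_power_q(delta_prime, m, n=30, n_seq=1999, alpha=0.05, seed=0):
    """Estimate P(delta', m) for Q by simulation."""
    rng = random.Random(seed)
    base = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_seq)]

    # Critical value read from the ordered null sample.
    null_q = sorted(q_statistic(y) for y in base)
    rank = round((1 - alpha) * (n_seq + 1))
    critical = null_q[rank - 1]

    # Same random numbers, with delta' added to the last n - m values.
    rejected = 0
    for y in base:
        shifted = [v + delta_prime if i >= m else v for i, v in enumerate(y)]
        if q_statistic(shifted) > critical:
            rejected += 1
    return rejected / n_seq


power_null = simulated_power_q(0.0, m=15)
power_small = simulated_power_q(1.0, m=15)
power_large = simulated_power_q(3.0, m=15)
```

Because the critical value is taken from the same null sample, `power_null` is close to 0.05 by construction, and the estimates rise towards 1 as the standardized shift grows.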
Simulated power functions of N, Q and W for m = 15 and m = 5 (which
is equivalent to m = 25) are given in Figs. 2 and 3, respectively. Since the
power functions are symmetric in $\Delta'$, non-negative values of $\Delta'$ are
considered only. From the figures it is seen that the von Neumann ratio N is
less powerful than Q and W both for m = 5 and m = 15. This is not
surprising since N is not based on a specific form of the alternative hypothesis
whereas Q and W are particularly designed for testing a change in level at
an unknown point. For other departures from homogeneity N could be
more powerful than Q and W.
Q is superior to W for m = 15, while the opposite holds for m = 5. In
general, for m in the neighbourhood of $\tfrac{1}{2}n$ the statistic Q is more powerful
than W. On the other hand, W is more sensitive to changes at the beginning
and at the end of the sequence. This is a consequence of the large values of
the weights $[k(n-k)]^{-1/2}$ near the end-points.
The power of the Bayesian statistic U is comparable with that of Q and
the power function of A is somewhat similar to that of W. For a single
change in the mean the range R is less powerful than Q. But for two
change-points the range usually gives a better test. A case with two change-points is
discussed by Buishand (1981).
Estimation of the position of a change-point
Graphs of cumulative deviations are often used to determine the position
of change-points. It is then assumed that something has happened at points
where the cumulative sum plot shows a clear change of slope. For the model
in eq. 1 the position of the maximum of $|S_k^*|$ or $|Z_k^*|$ can be taken as an
estimate for the change-point $m$.
Let $K$ be the value of $k$ for which $|S_k^*|$ reaches its maximum, i.e.
$Q = |S_K^{**}|$. In the same way $M$ is chosen such that $V = |Z_M^{**}|$. Asymptotic
properties of the statistic $M$ were derived by Hinkley (1970). Because of the
slow convergence of $M$ to its asymptotic distribution Hinkley's results are
not applicable to most hydrological sequences.
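The two estimates K and M can be computed together, as in the following sketch. The function name and the noise-free example sequences are assumptions for illustration; when several indices tie for the maximum, this version returns the first one, which is a choice the text does not prescribe.

```python
import math


def change_point_estimates(y):
    """Return (K, M): the indices k in 1..n-1 at which |S*_k| and |Z*_k|
    reach their maxima. The first maximizing index is kept on ties."""
    n = len(y)
    mean = sum(y) / n
    partial = 0.0
    k_hat, m_hat = 0, 0
    max_abs_s, max_abs_z = -1.0, -1.0
    for k in range(1, n):
        partial += y[k - 1] - mean              # S*_k
        z = partial / math.sqrt(k * (n - k))    # Z*_k (eq. 8)
        if abs(partial) > max_abs_s:
            max_abs_s, k_hat = abs(partial), k
        if abs(z) > max_abs_z:
            max_abs_z, m_hat = abs(z), k
    return k_hat, m_hat


# Noise-free jumps after the 5th and after the 15th of 30 values.
early = change_point_estimates([0.0] * 5 + [1.0] * 25)
middle = change_point_estimates([3.0] * 15 + [1.0] * 15)
```

Without noise both estimates recover the true change-point, so `early` is `(5, 5)` and `middle` is `(15, 15)`; with noisy data the two indices can differ.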
Fig. 4. Distribution of the index for which $|S_k^*|$ reaches its maximum under the condition
that the null hypothesis is rejected at the 5% level ($n = 30$, $|\Delta'| = 1.5$).
[Axes: Pr(K = k) against k; curves for m = 5 and m = 15.]
Fig. 5. Distribution of the index for which $|Z_k^*|$ reaches its maximum under the condition
that the null hypothesis is rejected at the 5% level ($n = 30$, $|\Delta'| = 1.5$).
[Axes: Pr(M = k) against k; curves for m = 5 and m = 15.]
For n = 30 the probability distribution of K and M was obtained from the
generated samples on which Figs. 2 and 3 were based. Only those samples
were taken into account for which the null hypothesis of a constant mean
was rejected at the 5% level. Figs. 4 and 5 give the distributions of K and M
in the situation that $|\Delta'| = 1.5$. The distributions are given for two positions
of the change-point: m = 5 and m = 15.
The peak in the empirical distributions of K and M always coincides with
the position of the change-point m. For m = 15 the statistic K is less
dispersed than M, while on the other hand M is superior if m = 5. The
distribution of K is highly skewed when the change in the mean occurs at the
beginning of the sequence. This can be roughly explained as follows.
Fig. 6 gives the means of $S_k^*$ and $Z_k^*$ (obtained from eq. A-1 in the
Appendix) for m = 5 and $\Delta = -1.5$. The mean of $S_k^*$ rises quickly to its maximum
at k = 5, but for k > 5 the mean drops down slowly. From the figure one
reads for instance E(S~o) >E(S~). Also from eq. A-2 it follows var(S~0)>
var(S~) and consequently for s sufficiently large Pr(S~o>S)> Pr(S~>s).
Since the probability of high values for $S_k^*$ at the beginning of the sequence
is relatively small, it is very unlikely that $S_k^*$ reaches its maximum for k < 5.
Therefore the distribution of K is positively skewed in this situation. This is
not the case for the statistic M, since for the weighted adjusted partial sums
$Z_k^*$ the curve of the mean is rather symmetric near the peak at k = 5, and the
variance does not depend on k.
So in the situation of a single change-point the index for which $Z_k^*$ reaches
its maximum (or minimum) has a rather symmetric distribution. When there
are two change-points it may occur that the positions of the maximum and
minimum of the $Z_k^*$'s have very skewed distributions (Buishand, 1981).
In Figs. 4 and 5 the magnitude of the jump in the mean was $1.5\sigma_Y$. For
larger jumps in the mean the distributions of K and M are more concentrated
around the position of the change-point m. When there is only a small change
the estimates of m are widely scattered.
Fig. 6. Mean of $S_k^*$ and $Z_k^*$ for n = 30, m = 5 and $\Delta = -1.5$.
[Axes: $E(S_k^*)$ and $E(Z_k^*)$ against k.]
APPLICATION TO RAINFALL DATA
In the climatological network of the Royal Netherlands Meteorological
Institute (K.N.M.I.) there are about 320 stations with daily rainfall
registrations (about 1 gauge per 100 km²). The data from 1951 onwards are
available on magnetic tape.
The homogeneity of the records from 264 stations was investigated for
the period 1951--1980. Stations with long interruptions in the observations
were not taken into account. To obtain a sequence of 30 yr. for each station,
missing data (e.g. due to a change of observer or the damaging floods in
February 1953) were supplemented from nearby stations.
The use of year-by-year differences
For the analysis of homogeneity the country was divided into a number of
regions (Fig. 7). In the flat regions I, II, III and IV there is little variation in
the local rainfall climate. Differences in the mean amount of rainfall are
more pronounced in the small hilly region V. Fig. 8 gives for each region the
annual means over consecutive 5-yr. periods. The figure shows that the early
1950's were rather dry. The very wet 1960's were followed by the dry
1970's.
The statistical tests were applied to the sequence of year-by-year differences:
$$ Y_i = X_i - R_i \tag{15} $$
with $X_i$: amount of rainfall in year $i$ for the station under consideration; and
$R_i$: average amount of rainfall in year $i$ for the other stations in the region.
In general, regional means are hardly sensitive to changes in the site of
individual rainfall stations. Local changes in the observations of the station
under consideration affect the means of the $X_i$'s and the $Y_i$'s in the same
way. But since $\sigma_Y^2 < \sigma_X^2$, the $Y_i$'s are preferred for testing homogeneity.
In The Netherlands the standard height of the rain-gauge is 0.40 m and the
climatological conditions are such that a station relocation or a change in
the exposure of the gauge may lead to a decrease or increase in the annual
mean of 5--10%. From Fig. 8 it is seen that in all regions the annual mean is
≈ 800 mm. Since $\sigma_Y$ is on average 45 mm, the standardized shift $\Delta'$ is nearly
1 for a change in the mean of 5%. For this value of $\Delta'$ it is seen from Fig. 2
that for m = 15 the probability of rejecting the null hypothesis varies from
0.27 to 0.67, depending on the test-statistic used. These probabilities are
much smaller for a change in the mean near the end-points of the sequence.
For m = 5 it follows from Fig. 3 that the probability of rejecting $H_0$ still
differs substantially from 1 for jumps in the mean of 10% ($\Delta' \approx 2$).
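The arithmetic behind these standardized shifts, together with the differences of eq. 15, is simple enough to write out. In the sketch below the function names and the three-year sample values are hypothetical; the figures 800 mm and 45 mm are the rounded values quoted in the text.

```python
def year_by_year_differences(station, regional_mean):
    """Y_i = X_i - R_i (eq. 15) for paired annual amounts."""
    return [x - r for x, r in zip(station, regional_mean)]


def standardized_shift(annual_mean_mm, relative_change, sigma_y_mm):
    """Delta' = Delta / sigma_Y for a relative change in the annual mean."""
    return annual_mean_mm * relative_change / sigma_y_mm


# Values quoted in the text: annual mean about 800 mm, sigma_Y about 45 mm.
shift_5_percent = standardized_shift(800.0, 0.05, 45.0)    # 40 / 45
shift_10_percent = standardized_shift(800.0, 0.10, 45.0)   # 80 / 45

# Hypothetical three-year example of eq. 15.
differences = year_by_year_differences([810.0, 765.0, 842.0],
                                       [800.0, 790.0, 820.0])
```

A 5% change corresponds to about 0.89 standard deviations ("nearly 1") and a 10% change to about 1.78 (roughly 2).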
For each test-statistic the number of records was counted for which the
null hypothesis of a constant mean was rejected at a certain significance
level. The results are given in Table III for the 5% level and in Table IV for
the 1% level. Despite the low power of the test-statistics for relevant changes
in the mean, there are many records with strong statistical evidence of
non-homogeneity.
Fig. 7. Geographical regions used in the analysis of homogeneity.
Fig. 8. Average annual amounts over 5-yr. periods for the regions in Fig. 7.
[Axes: annual amount (mm) against year, 1950--1980; one panel per region.]
TABLE III
Results of tests on homogeneity ($\alpha = 0.05$)

          Total number   Number of stations for which H0 is rejected
Region    of stations      N     Q     R     W     U     A
I              66         16    21    22    23    17    16
II             71         22    25    31    20    23    22
III            53         17    10    10    11    11    10
IV             64         19    17    16    16    13    12
V              10          5    --     1    --    --    --
Total         264         79    73    80    70    64    60
TABLE IV
Results of tests on homogeneity ($\alpha = 0.01$)

          Total number   Number of stations for which H0 is rejected
Region    of stations      N     Q     R     W     U     A
I              66          9    11    10     7     9     9
II             71         10    16    14    10     6     7
III            53          3     6     4     4     4     4
IV             64          6     8    11     6     6     6
V              10         --    --    --    --    --    --
Total         264         28    41    39    27    25    26
Discussion of the results
Table III shows that for each test-statistic there are ≈ 70 significant values
at the 5% level. Under the null hypothesis of a constant mean for all 264
records the expected number of significant values is 13. Provided that
correlation between the $Y_i$'s from different stations can be neglected, twenty
significant values would be highly unusual. This last assumption is
questionable since there is always some correlation between the year-by-year
differences of nearby stations. For instance, let A and B be two neighbouring
stations in the same region and suppose that in a particular year the amount
of rainfall at station A lies above the regional average. Then it is very likely
that the annual amount of its neighbour B is also higher than the regional
mean. Due to this correlation it is possible that in a particular region the
number of significant values is much larger than the expected number under
the null hypothesis. However, when all records are homogeneous it is very
unlikely that this occurs in nearly all regions as is the case here.
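Under the independence assumption mentioned above, the number of significant values among 264 homogeneous records follows a binomial distribution with success probability 0.05. The sketch below evaluates its mean and two upper tail probabilities; the function name is an assumption, and the calculation ignores the correlation between stations that the text warns about.

```python
import math


def binomial_upper_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))


n_records, level = 264, 0.05
expected_significant = n_records * level              # 13.2, i.e. about 13
tail_20 = binomial_upper_tail(n_records, level, 20)   # twenty or more
tail_60 = binomial_upper_tail(n_records, level, 60)   # sixty or more
```

The probability of twenty or more significant values is already small, and the probability of sixty or more, the smallest total in Table III, is negligible under independence.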
So it can be concluded that many records are not homogeneous. There are
only small differences between the various test-statistics with respect to the
number of significant values. The fact that this number is relatively high for
the statistics N and R indicates that departures from homogeneity do not
always consist of a single shift in the mean.
Though for a number of records there is statistical evidence of changes
in the mean, it is often not possible to correct these records. To make
sensible corrections it is necessary to know the causes of differences in the mean.
For ten records with serious departures from homogeneity a careful
examination of the station history was made to try to find a reason for these
departures. Only in five cases was some indication found for a decrease or
increase in the mean amount of rainfall. In one of these the situation of the
raingauge site had been gradually improved and in four others there was a
marked change in the slope of the cumulative sum plot coinciding with the
date of a station relocation. But even for three of these four stations it was
not quite clear why the change of location resulted in a considerable
decrease or increase in the mean amount of rainfall.
Other methods for testing homogeneity
Sometimes the sequence of ratios $X_i/R_i$ is preferred for testing
homogeneity (W.M.O., 1966). Instead of testing for a constant ratio between two
quantities one can also test for a constant difference between their
logarithms. The tests for homogeneity were repeated with the logarithms of the
annual amounts, which gave the same results as the tests on the original
annual amounts.
The tests were also done with a partition of The Netherlands into fifteen
regions instead of five. For most stations the results were nearly identical.
There were, however, a few stations for which one subdivision indicated
serious departures from homogeneity whereas the other subdivision did not.
SUMMARY AND CONCLUSIONS
Characteristics of cumulative deviations from the mean can be used to
test the homogeneity of rainfall records. As a first example two tests on the
rescaled adjusted partial sums were introduced.
Weighted cumulative deviations were discussed to emphasize changes near
the end-points of the sequence. It was pointed out that Worsley's likelihood
ratio test for a shift in the mean in normal populations is equivalent to a test
on the weighted adjusted partial sums.
Some attention was paid to Bayesian procedures for testing a change in
level. The resulting test-statistics are simple quadratic forms of the rescaled
adjusted partial sums.
It was shown by the data generation method that tests on the cumulative
deviations are superior to the classical von Neumann ratio for a model with
only one change in the mean. The tests were applied to annual data for 264
rainfall stations in The Netherlands. There was strong evidence of departures
from homogeneity. The von Neumann ratio gave nearly the same results as
the tests on the cumulative deviations.
ACKNOWLEDGEMENTS
The author wishes to thank his colleagues of the Climatological Branch
for proposing this subject. He also would like to express his sincere gratitude
to Messrs. A. Denkema and A.C. Patist for their work with the rainfall data.
APPENDIX
Properties of adjusted partial sums
When the $Y_i$'s have a normal distribution, then the adjusted partial sums
are also normally distributed. For the model in eq. 1 it can readily be shown
that:
$$ E(S_k^*) = \begin{cases} -k(n-m)\Delta/n, & k = 0, \ldots, m \\ -m(n-k)\Delta/n, & k = m+1, \ldots, n \end{cases} \tag{A-1} $$
and
$$ \operatorname{var}(S_k^*) = k(n-k)\sigma_Y^2/n, \quad k = 0, \ldots, n \tag{A-2} $$
So for the weighted adjusted partial sums $Z_k^*$ one obtains:
$$ \operatorname{var}(Z_k^*) = \sigma_Y^2/n, \quad k = 1, \ldots, n-1 \tag{A-3} $$
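Eqs. A-1 to A-3 can be checked with a small sketch. Since the expectation is linear, the adjusted partial sums of a noise-free sequence (values 0 up to the change-point and $\Delta$ afterwards) must equal the means given by eq. A-1. The function names are assumptions; the example uses n = 30, m = 5 and $\Delta = -1.5$, the case shown in Fig. 6.

```python
def expected_partial_sum(k, n, m, delta):
    """E(S*_k) from eq. A-1."""
    if k <= m:
        return -k * (n - m) * delta / n
    return -m * (n - k) * delta / n


def variance_partial_sum(k, n, sigma=1.0):
    """var(S*_k) from eq. A-2."""
    return k * (n - k) * sigma ** 2 / n


def adjusted_partial_sums(y):
    """Adjusted partial sums S*_0, ..., S*_n of eq. 3."""
    mean = sum(y) / len(y)
    sums = [0.0]
    for value in y:
        sums.append(sums[-1] + (value - mean))
    return sums


n, m, delta = 30, 5, -1.5
noise_free = [0.0] * m + [delta] * (n - m)
sums = adjusted_partial_sums(noise_free)
means = [expected_partial_sum(k, n, m, delta) for k in range(n + 1)]

# Dividing var(S*_k) by k(n - k) gives var(Z*_k), constant in k (eq. A-3).
z_variances = [variance_partial_sum(k, n) / (k * (n - k)) for k in range(1, n)]
```

The mean peaks at k = 5 with the value 6.25, and every entry of `z_variances` equals 1/30.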
Asymptotic properties of the sequence $\{S_k^*\}$ in the situation that $\Delta = 0$
are used in heuristic derivations of the limiting distributions of the
Kolmogorov--Smirnov and the Cramér--von Mises statistics (Doob, 1949; Anderson
and Darling, 1952). Let:
$$ \tilde{Q} = \max_{0 \le k \le n} |S_k^*| / \sigma_Y \tag{A-4} $$
and
$$ \tilde{R} = \{ \max_{0 \le k \le n} S_k^* - \min_{0 \le k \le n} S_k^* \} / \sigma_Y \tag{A-5} $$
The limiting distribution of $\tilde{Q}/\sqrt{n}$ is the same as that of the
Kolmogorov--Smirnov statistic (Doob, 1949). A derivation of the asymptotic distribution
of $\tilde{R}$ is given by Feller (1951). The distribution of the quadratic form in
eq. 11 was investigated by Anderson and Darling (1952) to derive the
limiting distribution of the Cramér--von Mises statistic.
Properties of rescaled adjusted partial sums
Because of the sample standard deviation in the denominator of eq. 4
the rescaled adjusted partial sums do not have a normal distribution. When
$\Delta = 0$, the squares of the weighted rescaled adjusted partial sums $Z_k^{**}$ have
a beta distribution with parameters $\tfrac{1}{2}$ and $\tfrac{1}{2}n - 1$ (Anis and Lloyd, 1976).
Therefore:
$$ \operatorname{var}(Z_k^{**}) = E(Z_k^{**})^2 = \frac{1}{n-1}, \quad k = 1, \ldots, n-1 \tag{A-6} $$
and
$$ \operatorname{var}(S_k^{**}) = E(S_k^{**})^2 = \frac{k(n-k)}{n-1}, \quad k = 0, \ldots, n \tag{A-7} $$
Substitution of this expression for $E(S_k^{**})^2$ in the right-hand side of eq. 12
gives $E(U) = \tfrac{1}{6}$ for all n. So U/n and Smirnov's $\omega^2$ have the same mean. In
the same way it is shown that $E(A) = 1$ in correspondence with the mean of
the Anderson--Darling statistic.
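The substitution can be written out as follows. This is a worked version of the step described above, using only eqs. 12, 13, A-6 and A-7 and the elementary identity $\sum_{k=1}^{n-1} k(n-k) = (n-1)n(n+1)/6$:
$$ E(U) = \frac{1}{n(n+1)} \sum_{k=1}^{n-1} \frac{k(n-k)}{n-1} = \frac{1}{(n-1)n(n+1)} \cdot \frac{(n-1)n(n+1)}{6} = \frac{1}{6} $$
$$ E(A) = \sum_{k=1}^{n-1} E(Z_k^{**})^2 = (n-1) \cdot \frac{1}{n-1} = 1 $$
The identity itself follows from $\sum_{k=1}^{n-1} k = n(n-1)/2$ and $\sum_{k=1}^{n-1} k^2 = (n-1)n(2n-1)/6$.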
Since for independent normal variates the sample standard deviation $D_Y$
converges with probability 1 to $\sigma_Y$, the statistics $Q$ and $R$ have the same
limiting distribution as $\tilde{Q}$ and $\tilde{R}$, respectively, and the asymptotic
distributions of U/n and A are identical to those of Smirnov's $\omega^2$ and the
Anderson--Darling statistic.
REFERENCES
Anderson, T.W. and Darling, D.A., 1952. Asymptotic theory of certain "goodness of fit"
criteria based on stochastic processes. Ann. Math. Stat., 23: 193--212.
Anis, A.A. and Lloyd, E.H., 1976. The expected value of the adjusted rescaled Hurst
range of independent normal summands. Biometrika, 63: 111--116.
Buishand, T.A., 1981. The analysis of homogeneity of long-term rainfall records in The
Netherlands. R. Neth. Meteorol. Inst. (K.N.M.I.), De Bilt, Sci. Rep. No. 81-7.
Chernoff, H. and Zacks, S., 1964. Estimating the current mean of a normal distribution
which is subjected to changes in time. Ann. Math. Stat., 35: 999--1018.
Craddock, J.M., 1979. Methods of comparing annual rainfall records for climatic
purposes. Weather, 34: 332--346.
Doob, J.L., 1949. Heuristic approach to the Kolmogorov--Smirnov theorems. Ann. Math.
Stat., 20: 393--403.
Feller, W., 1951. The asymptotic distribution of the range of sums of independent random variables. Ann. Math. Stat., 22: 427--432.
Gardner Jr., L.A., 1969. On detecting changes in the mean of normal variates. Ann. Math.
Stat., 40: 116--126.
Gomide, F.L.S., 1978. Markovian inputs and the Hurst phenomenon. J. Hydrol., 37:
23--45.
Hinkley, D.V., 1970. Inference about the change-point in a sequence of random variables.
Biometrika, 57: 1--17.
Owen, D.B., 1962. Handbook of Statistical Tables. Addison-Wesley, Reading, Mass.
Searcy, J.K. and Hardison, C.H., 1960. Double-mass curves. In: Manual of Hydrology:
Part 1, General Surface Water Techniques. U.S. Geol. Surv., Water-Supply Pap.,
1541-B: Washington, D.C., 31--59.
Sen, A. and Srivastava, M.S., 1975. On tests for detecting change in mean. Ann. Stat.,
3: 98--108.
Wallis, J.R. and O'Connell, P.E., 1973. Firm reservoir yield -- How reliable are historic
hydrological records? Hydrol. Sci. Bull., 18: 347--365.
W.M.O. (World Meteorological Organization), 1966. Climatic change. World Meteorol.
Org., Geneva, Tech. Note 79.
Worsley, K.J., 1979. On the likelihood ratio test for a shift in location of normal populations. J. Am. Stat. Assoc., 74: 365--367.
Yevjevich, V. and Jeng, R.J., 1969. Properties of non-homogeneous hydrologic series.
Colo. State Univ., Fort Collins, Colo., Hydrol. Pap. 32.