Thesis for the degree of Licentiate of Engineering in Mathematical
Statistics
Modelling Precipitation in
Sweden Using Multiple Step
Markov Chains and a
Composite Model
Jan Lennartsson
Department of Mathematical Sciences
Chalmers University of Technology and University of Gothenburg
Göteborg, Sweden, 2008
Modelling Precipitation in Sweden Using Multiple Step Markov Chains and a Composite Model
Jan Lennartsson
© Jan Lennartsson, 2008
Department of Mathematical Sciences
Chalmers University of Technology and University of Gothenburg
SE–41296 Göteborg, Sweden
Phone: +4631–772 1000
ISSN 1652-9715
Technical Report 2008:38
Printed in Sweden, 2008.
Abstract
In this thesis, we propose a new method to model precipitation in Sweden. We consider
a chain dependent stochastic model that consists of a first component that models the
probability of occurrence of precipitation at a weather station and a second component
that models the amount of precipitation when precipitation occurs. For the first component we fit a multiple order Markov chain. It turns out that for most of the weather
stations in Sweden a Markov chain of an order higher than one is required. The second
component is a temporal Gaussian process with marginals transformed to have a
distribution composed of the empirical distribution of the amount of historically observed
precipitation at each weather station below a suitable threshold and a fitted generalized
Pareto distribution above that threshold. In other words, we model the temporal dependence between amounts of precipitation at different times by means of a Gaussian copula.
The derived model is then used to compute different weather indices. The distributions of
these indices according to our model show good agreement with the corresponding empirical distributions for the indices as computed from real world data, which supports the
choice of the model.
Keywords: Multiple order Markov chain; Generalized Pareto distribution; Gaussian
copula; Precipitation process; Empirical distribution; Sweden.
Acknowledgements
First, I would like to express my gratitude to my supervisor, Patrik Albin, for the
support, guidance and belief in me. Thank you for always having time for my
questions and ideas and for sharing your experience and for pushing me forward
with great enthusiasm. You also deserve a rose for just being the lovely unique man
that you are.
Secondly, Anastassia Baxevani for endlessly accepting my stripped declarations and
lack of stringency. Also, I am in great debt to you for aiding, driving, pushing and
forcing this project into land.
Thirdly, Dan Strömberg for committing and helping out and my previous co–advisor,
Igor Rychlik, for his support.
Fourthly and in ascending order Daniel Drugge, Elin Lennartsson and Martin Norling for helping me complete this thesis.
Fifthly, Johan Tykesson, Dan Kuylenstierna, Marcus Warfheimer and Marcus Gavell
deserve gratitude for putting up with me during courses and/or most of the time
accepting or intriguing challenging discussions.
Ottmar Cronie for temporarily looking after my office.
Former and present colleagues at mathematical sciences.
*
This work was financially supported by Göteborgs MiljöVetenskapliga centrum (GMV).
Contents
Abstract
Acknowledgements
Introduction
    Motivation
    Description
    My contribution
Appended paper
Introduction
This licentiate thesis is based on the manuscript “Modelling precipitation in Sweden
using multiple step Markov chains and a composite model” which can be found as
an appended paper after this introduction. This manuscript has been accepted for
publication in Journal of Hydrology. The manuscript is a joint work with Anastassia Baxevani and Deliang Chen. Below is an introductory motivation and a nontechnical description of the manuscript, together with a statement of my personal
contributions to the manuscript as compared with that of my co-authors.
Motivation
Distribution of precipitation is a major environmental issue on both a national and
an international level. For example, it is clear that the effects of drought or anomalously wet weather conditions may have devastating consequences on agriculture.
However, the entire distribution of the downfall is also of great interest and realistic
sequences of meteorological variables such as precipitation are key inputs in many
hydrologic, ecologic and agricultural models.
Climate, and in particular precipitation, is governed by a set of physical principles which may in mathematical language be represented as differential equations. In
meteorology, where only a short time span is considered, these differential equations
are used to predict precipitation. However, the equations feature high complexity
which has the consequence that no exact solution(s) are known. As a result, the variability and uncertainty in predicting precipitation over longer time
periods, e.g., over a month or a year, grow to an unmanageable extent. In order
to forecast precipitation for longer time spans the precipitation process is therefore
usually modelled as a stochastic process.
In the absence of complete knowledge of all the underlying processes which govern climate, simulation models are needed to model the stochastic behaviour of
the system when historical records are of insufficient duration or inadequate spatial
and/or temporal coverage. In these cases synthetic sequences may be used to fill
in gaps in historical records, to extend the historical record, or to generate realizations of weather that are stochastically similar to the historical record. A weather
generator is a stochastic numerical model that generates daily weather series with
the same statistical properties as observed real world weather series.
However, even if the measured information of interest – i.e., the amount of
precipitation – is a fairly accessible entity, the actual building of a weather generator
is made difficult by the long time periods and complex processes which govern it.
In contrast to meteorology, which focuses on short term weather systems, in the
case of a weather generator the interest lies rather in unfolding the structure of
the underlying process. The actual pattern of measures such as frequency or the
trends of those systems are of more interest than the daily precipitation. When
watching the news one is usually more interested in knowledge about the opposite,
the precipitation in a near future.
The climate consists of more than averages and seasonal variations. Rare events
with extreme weather are also a part of the climate. There are different kinds of
climate extremes, some are violent, such as a severe rainfall, which may have a
devastating effect on the community. In order to optimize the response to
these events, a model that accurately describes their distribution is a high
priority.
Description
The aim is to create a weather generator that can produce realistic sequences of
precipitation for some specific geographical sites in Sweden.
In developing the weather generator, the stochastic structure of the time series of
daily amounts of precipitation is described by a statistical model. The parameters
of this model are estimated using observed real world precipitation
data. The thus completed weather generator allows us to generate arbitrarily long
series with the stochastic structure of the model. We can also evaluate the degree
of similarity between weather data produced by the model and the real world data
series.
The statistical model is a chain-dependent model which consists of two steps:
First a model for the sequence of wet/dry days, for which we employ a multiple order
Markov chain. Secondly a model for the amount of precipitation for the wet days,
for which we employ a composite model that incorporates the empirical distribution
together with an extreme value distribution for the tail.
In this study the actual physical processes that are influencing the climate are
of less importance than what they accomplish, that is, the daily amount of precipitation.
In the mathematical part of our work a great deal revolves around Markov
processes. Briefly put, a Markov process is a process that features the property
of full-blown amnesia: the future is independent of the past, given that
the present is known. In a multiple step Markov chain the full amnesia
property is relaxed, so that the future is independent of the past given knowledge
of the current state and the states some steps back. However, even though the
Markov property is very useful in terms of explicitly finding the probability of a vast
number of interesting events, one must remember that this is just a mathematical
model for a real world that sometimes might be much more complicated. Still it is
a good and versatile model.
By countless examples it has been demonstrated that Gaussian processes very
accurately describe the dependence structure in our world. So when confronting
the fact that the data indicated temporal dependence it is natural to employ
a Gaussian copula. Instead of estimating the parameter for a Gaussian copula
directly from the data, a computationally very hard assignment, we sidestep that
problem by using the fact that all computations for copulas are based
on the empirical ranks. We transform these ranks to their corresponding normal
quantiles, meaning that if the variables really are governed by the Gaussian copula,
then the transformed variables are multivariate normally distributed. Estimating
the Gaussian copula parameter now reduces to estimating the correlation coefficient
of a multivariate normally distributed variable, which is an elementary task.
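As an illustration of the rank transformation just described, the following is a minimal sketch rather than the code used in the thesis. It assumes that the ranks are scaled by n + 1 before being mapped to normal quantiles, that ties are broken by order of appearance, and that the dependence of interest is between consecutive observations (lag one); the function names are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map each observation to the standard normal quantile of its scaled rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank  # ties are broken by order of appearance (assumption)
    std_normal = NormalDist()
    # Dividing by n + 1 keeps the probabilities strictly inside (0, 1).
    return [std_normal.inv_cdf(rank / (n + 1)) for rank in ranks]


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def lag_one_copula_correlation(series):
    """Estimate a Gaussian copula correlation between consecutive observations."""
    z = normal_scores(series)
    return pearson(z[:-1], z[1:])
```

Under these assumptions, calling `lag_one_copula_correlation` on a series of wet-day amounts gives a single correlation estimate on the normal-score scale.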
Since extreme rainfall so crucially affects the community, we spend substantial
effort on analysing these rare events. By a point-process approach, singling out the
dependence of close-by extremes, the extremal distribution is estimated.
My contribution
The problem of creating a weather generator, together with suggestions for suitable
data sets, was proposed to me by environmentalist Deliang Chen. My head advisor
– Patrik Albin – got me started by suggesting dividing the precipitation process
in different parts and specifically modelling the precipitation-/no precipitation part
as a multiple step Markov chain and the rain distribution by a combination of the
empirical distribution and an extreme value distribution. By the fruitful tutoring of
my co-advisor – Anastassia Baxevani – we moved the problem forward and ended
up with this thesis at hand. Except for the above mentioned crucial advisory contributions, everything in this thesis has been done by myself, the undersigned. That
means building up the stochastic model from scratch into something that very
accurately replicates historical data, as judged by statistical tests with respect to the
established weather indices.
Appended paper
Lennartsson, J., Baxevani, A., Chen, D., Modelling precipitation in Sweden using
multiple step Markov chains and a composite model, Journal of Hydrology (2008),
doi: 10.1016/j.jhydrol.2008.10.003
Two versions of the manuscript are appended because of the differing typographical advantages; the final manuscript as accepted for publication and the present
version of the manuscript as typeset by Journal of Hydrology.
Modelling Precipitation in Sweden Using Multiple Step Markov
Chains and a Composite Model
Jan Lennartsson¹, Anastassia Baxevani¹ ∗†, Deliang Chen²
¹ Department of Mathematical Sciences, Chalmers University of Technology, University of Gothenburg, Gothenburg, Sweden
² Department of Earth Sciences, University of Gothenburg, Gothenburg, Sweden
Abstract
In this paper, we propose a new method for modelling precipitation in Sweden. We consider a
chain dependent stochastic model that consists of a component that models the probability of
occurrence of precipitation at a weather station and a component that models the amount of
precipitation at the station when precipitation does occur. For the first component, we show
that for most of the weather stations in Sweden a Markov chain of an order higher than one is
required. For the second component, which is a Gaussian process with transformed marginals,
we use a composite of the empirical distribution of the amount of precipitation below a given
threshold and the generalized Pareto distribution for the excesses in the amount of precipitation
above the given threshold. The derived models are then used to compute different weather
indices. The distribution of the modelled indices and the empirical ones show good agreement,
which supports the choice of the model.
Key words: High order Markov chain, generalized Pareto distribution, copula, precipitation process, Sweden
∗ Corresponding author
† Research supported partially by the Gothenburg Stochastic Center and the Swedish foundation for
Strategic Research through Gothenburg Mathematical Modelling Center.
1 Introduction
Realistic sequences of meteorological variables such as precipitation are key inputs in many
hydrologic, ecologic and agricultural models. Simulation models are needed to model the
stochastic behavior of the climate system when historical records are of insufficient duration
or inadequate spatial and/or temporal coverage. In these cases synthetic sequences may
be used to fill in gaps in the historical record, to extend the historical record, or to generate
realizations of weather that are stochastically similar to the historical record. A weather
generator is a stochastic numerical model that generates daily weather series with the
same statistical properties as the observed ones, see Liao et al. (2004).
In developing the weather generator, the stochastic structure of the series is described
by a statistical model. Then, the parameters of the model are estimated using the observed
series. This allows us to generate arbitrarily long series with stochastic structure similar
to the real data series.
Parameter estimation of stochastic precipitation models has been a topic of intense
research over the last 20 years. The estimation procedures are intrinsically linked to the nature
of the precipitation model itself and the timescale used to represent the process. There are
models which describe the precipitation process in continuous time and models describing
the probabilistic characteristics of precipitation accumulated on a given time period, say
daily or monthly totals. Different reviews of the available models have been presented: see
for example Woolhiser (1992), Cox and Isham (1988) and Smith and Robinson (1997).
Continuous time models for a single site with parameters related to the underlying physical precipitation process are particularly important for the analysis of data at
short timescales, e.g. hourly. Some of these models are described in Rodrı́guez-Iturbe et
al. (1987, 1988) and Waymire and Gupta (1981).
When only accumulated precipitation amounts for a particular time period (daily) are
recorded then empirical statistical models, based on stochastic models that are calibrated
from actual data are appealing. Empirical statistical models for generating daily precipitation data at a given site can be classified into four different types: chain dependent
or two-part models, transition probability matrix models, resampling models and ARMA
time series models; see Srikanthan and McMahon (2001) for a complete review of the
different models.
A generalization of the precipitation models for a single site is the spatial extension
of these models for multiple sites, to try to incorporate the intersite dependence while preserving the marginal properties at each site. A more ambitious task is the modelling of
precipitation continuously in time and space; original work on this type of model,
based on point process theory, was presented by LeCam (1961) and further developed by
Waymire et al. (1984) and Cox and Isham (1994). Mellor (1996) has developed the modified turning bands model, which reproduces some of the physical features of precipitation
fields in space, such as rainbands and cluster potential regions of rain cells.
In this study we concentrate on the chain-dependent model for the daily precipitation
in Sweden which consists of two steps, first a model for the sequence of wet/dry days
and second, a model for the amount of precipitation for the wet days. For the first, we
use high-order Markov chains and for the second we introduce a composite model that
incorporates the empirical distribution and the generalized Pareto distribution.
2 Data
Fig. 1: Location of the stations.
Precipitation data from 20 stations in Sweden have been used in the studies presented
in this paper. The locations are shown in Fig. 1 and the names of the stations are given
in Table 1. The data consist of accumulated daily precipitation collected during 44 years
starting on the 1st of January 1961 and ending the 31st of December 2004 and are provided
by the Swedish Meteorological and Hydrological Institute (SMHI). The number of missing
observations in all stations is generally low (< 5%). The time plots of the annual number
of wet days (above the threshold 0.1 mm) at the 20 stations are presented in Fig. 2.
[Figure: one panel per station (the 20 stations of Table 1); horizontal axis: years (ticks 1970 to 2000); vertical axis: annual no. of wet days (0 to 300).]
Fig. 2: Time plot of annual number of wet days.
Time plots of annual number of wet days showed that the precipitation regime in
some stations (namely, Söderköping, Rösta and Stensele) contains possible trends. The
results presented in the next sections refer to the whole period of data from all stations,
but attention should be paid when we refer to the above mentioned stations. In Fig. 3,
time plots of the annual amount of precipitation of the wet days are presented. The total
amounts of precipitation seem to be stationary over the different years.
3 Model
To model precipitation in Sweden, we have decided to use a chain dependent model. The
first part of the model can be dealt with using Markov chains. Gabriel and Newman (1962)
used a first-order stationary Markov chain. The models have since been extended to allow
for non-stationarity, both by fitting separate chains to different periods of the year and
by fitting continuous curves to the transition probabilities, see Stern and Coe (1984) and
references within. The order of Markov chain required has also been discussed extensively,
see for example Chin (1977) and references therein, with the obvious conclusion that different
sites require different orders. Still, the first order Markov chains are a popular choice since
they have been shown to perform well for a wide range of different climates, see for example
Bruhn et al. (1980), Lana and Burgueno (1998) and Castellvi and Stockle (2001). The
main deficiency associated with the use of first order models is that long dry spells are
not well reproduced, see Racsko et al. (1991), Guttorp (1995).
To model the amount of precipitation that has occurred during a wet day, different
models have been proposed in the literature all of which assume that the daily amounts
of precipitation are independent and identically distributed. Stidd (1973) and Hutchinson (1995) have proposed a truncated normal model for the amount of precipitation with
a time dependent parameter, while the Gamma and Weibull distributions have been selected by Geng et al. (1986) as well as Selker and Haith (1990), because of their site-specific
shape.
In this study, we model the occurrence of wet/dry days using Markov chains of higher
order and for the amount of precipitation we use a composite model, consisting of the
empirical distribution function for values below a threshold and the distribution of excesses
for values above the given threshold. Such a model is more flexible, describes better the
tail of the distribution and additionally allows for dependence in the precipitation process.
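The composite marginal described above can be illustrated through its quantile function. The following is a minimal sketch under stated assumptions, not the authors' implementation: the generalized Pareto scale `sigma` and shape `xi` are taken as already fitted to the excesses over `threshold`, the empirical part uses the simplest order-statistic quantile, and the function name and argument names are hypothetical.

```python
import math


def composite_quantile(p, sample, threshold, sigma, xi):
    """Quantile of a composite law: empirical below `threshold`, GPD tail above it."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    data = sorted(sample)
    n = len(data)
    # Empirical probability of not exceeding the threshold.
    p_u = sum(1 for x in data if x <= threshold) / n
    if p <= p_u:
        # Empirical quantile: smallest order statistic x_(i) with i / n >= p.
        index = max(math.ceil(p * n) - 1, 0)
        return data[index]
    # Conditional exceedance probability given that the threshold is exceeded.
    tail = (1.0 - p) / (1.0 - p_u)
    if abs(xi) < 1e-12:
        # Limit of the GPD quantile as the shape parameter tends to zero.
        return threshold - sigma * math.log(tail)
    return threshold + sigma / xi * (tail ** (-xi) - 1.0)
```

Feeding uniform random numbers through `composite_quantile` then yields amounts that follow the observed data below the threshold and the fitted tail above it.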
[Figure: one panel per station (the 20 stations of Table 1); horizontal axis: years; vertical axis: amount of precip. (mm).]
Fig. 3: Time plot of annual amount of precipitation.
Let $Z_t$ be the precipitation at a certain site at time $t$ measured in days. Then, a
chain-dependent model for the precipitation is given by
$$Z_t = X_t W_t,$$
where $X_t$ and $W_t$ are stochastic processes such that $X_t$ takes values in $\{0, 1\}$ and $W_t$
takes values in $\mathbb{R}_+ \setminus \{0\}$. The processes $X_t$ and $W_t$ will be referred to as the occurrence of
precipitation and the amount of precipitation process, respectively.

Table 1: Names of weather stations.
Number  Name
1       Lund
2       Bolmen
3       Hanö
4       Borås
5       Varberg
6       Ungsberg
7       Säffle
8       Söderköping
9       Stockholm
10      Malung
11      Vattholma
12      Myskelåsen
13      Härnösand
14      Rösta
15      Piteå
16      Stensele
17      Haparanda
18      Kvikkjokk
19      Pajala
20      Karesuando
The approach presented in this study provides a mechanism to make predictions of
precipitation in time. This is particularly important for many applications in hydrology,
ecology and agriculture. For example, at a monthly level, the amount of precipitation and
the probability and length of a dry period are required quantities for many applications.
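To make the chain-dependent decomposition Zt = Xt Wt concrete, the following is a minimal simulation sketch, not the fitted model of this paper. The second-order transition probabilities in `P_WET_GIVEN` and the shifted exponential used for the amounts are hypothetical placeholders, and the amounts are drawn independently here, whereas the model in this study allows for dependence between them.

```python
import random

# Hypothetical P(wet today | last two days), keyed by (day t-2, day t-1).
P_WET_GIVEN = {(0, 0): 0.25, (0, 1): 0.55, (1, 0): 0.35, (1, 1): 0.70}


def placeholder_amount(rng):
    """Hypothetical wet-day amount in mm: 0.1 mm plus an exponential excess."""
    return 0.1 + rng.expovariate(0.25)


def simulate_precipitation(n_days, p_wet_given, amount_sampler, start, seed=None):
    """Simulate Z_t = X_t * W_t with X_t a k-order two-state Markov chain."""
    rng = random.Random(seed)
    k = len(start)            # order of the chain, taken from the initial word
    history = list(start)
    series = []
    for _ in range(n_days):
        word = tuple(history[-k:])                # the last k occurrence states
        x = 1 if rng.random() < p_wet_given[word] else 0
        w = amount_sampler(rng)                   # amount, used only when x == 1
        series.append(x * w)
        history.append(x)
    return series
```

For instance, `simulate_precipitation(365, P_WET_GIVEN, placeholder_amount, (0, 0), seed=1)` returns one synthetic year of daily values, with zeros on dry days.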
4 Models for the Occurrence of Precipitation
Let {Xt , t = t1 , . . . , tN } denote the sequence of daily precipitation occurrence, i.e. Xt = 1,
indicates a wet day and Xt = 0 a dry day. In the context of this study, a wet day occurs
when at least 0.1 mm of precipitation was recorded by the rain gauge. The level has been
chosen above zero in order to avoid identifying dew and other noise as precipitation and
to avoid difficulties arising from the inconsistent recording of very small precipitation
amounts. Moreover, daily precipitation amounts of less than 0.1 mm can have relatively
large observational errors, and including them would cause a significant change in the
estimated transition probabilities of the occurrences, which in turn would introduce
additional errors into the fitted models. The model is fitted over different periods of the
year, that is subsets of the N days of the year, that may be assumed stationary.
Before we continue any further we need to introduce some notation. Let $S = \{0, 1\}$
denote the state space of the k-Markov chain $X_t$. The elements of $S$ are called letters and
an ordering of letters $w \in S^l = S \times \cdots \times S$ is called a word of length $l$, while the words
composed of the letters from position $i$ to $j$ in $w$, for some $1 \le i \le j \le l$, are denoted as
$w_i^j = (w_i, w_{i+1}, \dots, w_j)$. Finally, for $k \le l$ let $\tau_k(w) = w_{l-k+1}^l$ denote the k-tail of the
word $w$, i.e. $\tau_k(w)$ denotes the last $k$ letters of $w$. If no confusion will arise when $k \le j - i$,
we also write $\tau_k(w^j)$ instead of $\tau_k(w_i^j)$.
It is assumed that the process $X_t$ is a k-Markov chain: a model completely characterized by the transition probability
$$p_{w,j}(t) := P(X_t = j \mid \tau_k(X^{t-1}) = w), \qquad j \in S, \quad t = t_1, \dots, t_N,$$
where $w$ is a word of length $k$ and $X^{t-1} = \{\dots, X_{t-2}, X_{t-1}\}$ is the whole process up
to $t - 1$, so $\tau_k(X^{t-1})$ is the last $k$ days up to and including $X_{t-1}$; that is, $\tau_k(X^{t-1}) = (X_{t-k}, \dots, X_{t-1})$. Note that, for a 2-state Markov chain of any order, $p_{w,1}(t) + p_{w,0}(t) = 1$.
In the special case of a time homogeneous Markov chain, $p_{w,j}(t) = p_{w,j}$ for $t = t_1, \dots, t_N$,
i.e. the transition probabilities are independent of time.
Let $n_{w,j}(t)$ denote the number of years during which day $t$ is in state $j$ and is preceded
by the word $w$ (i.e. $\tau_k(X^{t-1}) = w$, $w \in S^k$ and $X_t = j$). Then the probabilities $p_{w,j}(t)$ are
estimated by the observed proportions
$$\hat{p}_{w,j}(t) = \frac{n_{w,j}(t)}{n_{w,+}(t)}, \qquad w \in S^k, \quad j \in S, \quad t = t_1, \dots, t_N,$$
where + indicates summation over the subscript. Note also that day 60 (February 29th)
has data only in leap years so day 59 precedes day 61 in non-leap years. Fig. 4 (left)
shows the unconditional probability of precipitation, pooled over 5 days for clarity, plotted
against t for the data from the station in Lund.
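The observed-proportion estimator can be sketched in a few lines. The version below is a simplification and an assumption on our part: it treats the chain as time homogeneous and pools the counts over all days of a single 0/1 sequence, instead of counting per calendar day across years; the function name is hypothetical.

```python
from collections import defaultdict


def estimate_transition_probabilities(x, k):
    """Return {word: estimated P(wet | last k days == word)} from a 0/1 list."""
    counts = defaultdict(lambda: [0, 0])      # word -> [n_{w,0}, n_{w,1}]
    for t in range(k, len(x)):
        word = tuple(x[t - k:t])              # the k-tail preceding day t
        counts[word][x[t]] += 1
    # n_{w,1} / n_{w,+}; words never observed are simply absent from the result.
    return {word: n[1] / (n[0] + n[1]) for word, n in counts.items()}
```

Since the chain has two states, the probability of a dry day given the same word is one minus the returned value.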
In the context of environmental processes, non-stationarity is often apparent, as in
this case, because of seasonal effects or different patterns in different months. A usual
practice is to specify different subsets of the year as seasons, which results in different
models for each season, although the determination of an appropriate segregation into
seasons is itself an issue.
[Figure: two panels for Lund; horizontal axes: months, Jan to Dec; vertical axes: prob. of precip. (left) and mean no. of wet days (right).]
Fig. 4: Lund, Sweden (data from 1961 to 2004). (Left): Observed p̂(t) pooled over 5
days. (Right): Mean number of wet days per month ("+"), and per season (solid lines).
4.1 Fitting Models to the Occurrence of Precipitation
There is an inter-annual variation in the annual number of wet days, as can be seen in
Fig. 2. Moreover, there is also seasonal variation in the mean monthly number of wet days,
see Fig. 4 (Right) for data from Lund, although this is not as prominent as in other regions
of the world. It is possible that the optimum order of the chain describing the wet/dry
sequence varies within the year and from one year to another. It is therefore important
to properly identify the period of record that can be assumed as time homogeneous.
Moreover, the problem of finding an appropriate model for the occurrence of precipitation process, Xt , is equivalent to the problem of finding the order of a multiple step
Markov chain. The Akaike Information Criterion (AIC), Bayesian Information Criterion
(BIC) and the Generalized Maximum Fluctuation Criterion (GMFC) order estimators, a
short description of which can be found in the subsection 8.1, have been applied to the
data for each of the stations. Various block lengths were considered for determining the
order of the Markov chain, k, as suggested in Jimoh and Webster (1996).
• 1 month blocks (i.e. January, February, ..., December),
• 2 month blocks (January - February, February - March, ..., December - January),
• 3 month blocks (January - March, February - April, ..., December - February).
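For the data pooled within one block, order selection can be sketched as follows. This is an illustration under assumptions: it uses the standard textbook forms of AIC and BIC for a two-state chain (2^k free parameters for order k), which may differ in detail from the definitions in subsection 8.1, and it does not implement the GMFC estimator. Function names are hypothetical.

```python
import math
from collections import defaultdict


def log_likelihood(x, k, k_max):
    """Maximized log-likelihood of an order-k chain, conditioned on the first k_max days."""
    counts = defaultdict(lambda: [0, 0])
    for t in range(k_max, len(x)):            # same days for every order compared
        word = tuple(x[t - k:t])              # empty tuple when k == 0
        counts[word][x[t]] += 1
    ll = 0.0
    for n0, n1 in counts.values():
        total = n0 + n1
        for n in (n0, n1):
            if n > 0:
                ll += n * math.log(n / total)
    return ll


def select_order(x, k_max, criterion="bic"):
    """Return (selected order, {order: criterion value}) for orders 0..k_max."""
    n = len(x) - k_max
    penalty = 2.0 if criterion == "aic" else math.log(n)
    scores = {}
    for k in range(k_max + 1):
        n_params = 2 ** k                     # one free probability per word of length k
        scores[k] = -2.0 * log_likelihood(x, k, k_max) + penalty * n_params
    return min(scores, key=scores.get), scores
```

Because the BIC penalty grows with the sample size while the AIC penalty does not, BIC tends to select orders no higher than AIC on the same data.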
The effect of block length on the order of the Markov chain can be seen in Figs. 5-7.
We can notice that grouping the data in blocks longer than one month results
in Markov chains of "smoother" order, in the sense that the order of the chain does not
change as quickly. It is also interesting to notice that while the order of the Markov chain for
the stations 16-20 varies a lot according to the AIC and GMFC estimators, it seems to
be almost constant for the BIC order estimator. As expected, the BIC order
estimator underestimates the order k of the Markov chain relative to both the AIC and
GMFC order estimators for large k and moderate data sets, see Dalevi et al. (2006), while
the values of the GMFC order estimator lie between the BIC and AIC ones. The results
presented in Figs. 5-7, confirm that the model order is sensitive to the season (month) and
the length of the season (number of months) considered, as well as the method used in
identifying the optimum order. Possible dependence on the threshold used for identifying
wet and dry days has not been studied here. For the rest of this study, we define as
seasons the 3 month periods December-February, March-May, June-August and September-November. As can be seen in Fig. 4 for the station in Lund (the rest of the stations
provide similar plots), the probability of precipitation is close to constant during
these periods, which makes the assumption of stationarity seem plausible. The orders of
the Markov chain for these periods can be found in Fig. 7. For the rest of this study the
order k of the Markov chain is decided according to the GMFC order estimator.
[Figure: three groups of panels (estimated orders by the Akaike, Bayesian and GMFC order estimators), one panel per station (st 01 to st 20); horizontal axis: months, Jan to Dec; vertical axis: estimated order (ticks 1, 3, 5).]
Fig. 5: k-Markov chain orders for block lengths of one month (Jan, Feb, ...).
[Figure: three groups of panels (estimated orders by the Akaike, Bayesian and GMFC order estimators), one panel per station (st 01 to st 20); horizontal axis: months, Jan to Dec; vertical axis: estimated order (ticks 1, 3, 5).]
Fig. 6: k-Markov chain orders for block lengths of two months (Jan-Feb, Feb-Mar, ...).
[Figure 7: three groups of panels, "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator", each with one panel per station (st 01 to st 20); horizontal axis Jan to Dec, vertical axis order 1 to 5.]
Fig. 7: k-Markov chain orders for block lengths of three months (Jan-Mar, Feb-Apr, ...).
4.2 Distribution of Dry Spell Length
An interesting aspect of the wet/dry behavior, i.e. the process Xt , is the distribution of
the dry (wet) spells, i.e., the number of consecutive dry (wet) days, which is an accessible
property of multiple step Markov chains.
For a time homogeneous (stationary) k-Markov chain Xt , (k ≥ 2), with state-space S
let T be the first time the process Xt is such that τ2(X^t) = (1, 0), i.e.,
\[ T = \inf\{t \ge 0 : \tau_2(X^t) = (1, 0)\}. \]
So T is the time of the start of the first dry period. Let also, for the words u, v ∈ S^k,
\[ a_{u,v} = P(\tau_k(X^T) = v \mid \tau_k(X^0) = u) \]
denote the probability the process Xt has at time T a k-tail equal to v given that the
k-tail at time 0 is equal to u. The probabilities au,v are easily obtained for stationary
processes, see Norris (1997). Note that at t = 0, there may be the start of a dry period,
the start of a wet period, the continuation of a dry period or the continuation of a wet
period. If D(Xt ) denotes the length of the first dry period that starts at time t = 0 for the
k-Markov chain Xt , then assuming additionally that the process Xt is time homogeneous,
the distribution of the first dry spell can be computed as
\[ P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{\{w \in S^k : \tau_2(w) = (1,0)\}} a_{u,w}\, P(\tau_m(X^{m-1}) = \mathbf{0},\, X_m = 1 \mid \tau_k(X^0) = w), \tag{1} \]
where 0 is used to denote a sequence of zeros of appropriate length.
Now, if v = w01 is a word of length m + k (0 here is of length m − 1), then, using the
fact that the process Xt is a k-Markov chain, Eq. 1 can be rewritten as
\[ P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{\{w \in S^k : \tau_2(w) = (1,0)\}} a_{u,w} \prod_{i=1}^{m} P(X_i = v_{k+i} \mid \tau_k(X^{i-1}) = \tau_k(v^{k+i-1})). \tag{2} \]
Remark 1 The distribution of the first dry spell differs from the distribution of the
subsequent dry spells for Markov chains of order greater than two. For Markov chains of
order one or two there is no need for this distinction. Moreover,
the equivalent of Eq. 1 for k = 1 is
\[ P(D(X_t) = m) = p_{0,0}^{m-1}\, p_{0,1}, \]
while for k = 2, Eq. 2 simplifies to
\[ P(D(X_t) = m) = \begin{cases} p_{10,1} & \text{for } m = 1, \\ p_{10,0}\, p_{00,0}^{m-2}\, p_{00,1} & \text{for } m \ge 2, \end{cases} \]
where au,v = 1 for all u, v in Eq. 1.
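The two closed forms in Remark 1 can be evaluated directly. The following is a minimal sketch, not the thesis code; the function and argument names are illustrative assumptions, with p01 standing for p_{0,1}, p10_1 for p_{10,1} and p00_1 for p_{00,1}.

```python
def dry_spell_pmf_order1(p01, m):
    """P(D = m) for a first-order chain: (m - 1) dry-to-dry steps, then dry-to-wet."""
    p00 = 1.0 - p01
    return p00 ** (m - 1) * p01


def dry_spell_pmf_order2(p10_1, p00_1, m):
    """P(D = m) for a second-order chain whose last two states are (1, 0).

    p10_1 = P(wet today | previous two days were wet, dry)
    p00_1 = P(wet today | previous two days were dry, dry)
    """
    if m == 1:
        return p10_1
    p10_0 = 1.0 - p10_1
    p00_0 = 1.0 - p00_1
    return p10_0 * p00_0 ** (m - 2) * p00_1


if __name__ == "__main__":
    # Hypothetical transition probabilities, chosen only for illustration.
    for m in range(1, 6):
        print(m, dry_spell_pmf_order1(0.4, m), dry_spell_pmf_order2(0.5, 0.35, m))
```

When p10_1 and p00_1 are equal, the second-order expression collapses to the first-order one, which is a quick sanity check on any implementation.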
The distribution of the first dry spell can be also used for model selection or model
validation purposes. For this, we use the Kolmogorov-Smirnov (KS) test, see Benjamin
and Cornell (1970). The one sample KS test compares the empirical distribution function
with the cumulative distribution function specified by the null hypothesis.
Assuming that Pk is the true distribution (of a Markov chain of order k),
the KS statistic is
\[ D = \sup_{m \in \mathbb{N}^+} |P_k(D(X) \le m) - F_{emp}(m)|, \]
where Femp(m) is the empirical cumulative distribution function of the length of the first dry spell.
If the data truly come from a Markov chain of order k and the transition probabilities are
the correct ones, then by the Glivenko-Cantelli theorem the KS statistic converges to zero almost
surely (a.s.).
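As an illustration of the statistic, the sketch below compares the empirical distribution of a list of observed dry spell lengths with a model probability mass function. It is a hedged example: `ks_statistic` and `model_pmf` are hypothetical names, and the geometric model in the usage lines is only a placeholder for the distribution implied by a fitted chain.

```python
from collections import Counter


def ks_statistic(spell_lengths, model_pmf):
    """Largest absolute gap between the model CDF and the empirical CDF.

    Beyond the longest observed spell the empirical CDF equals 1 and the
    model CDF only moves closer to 1, so scanning m = 1..max is sufficient.
    """
    n = len(spell_lengths)
    counts = Counter(spell_lengths)
    f_emp = f_model = d = 0.0
    for m in range(1, max(spell_lengths) + 1):
        f_emp += counts.get(m, 0) / n
        f_model += model_pmf(m)
        d = max(d, abs(f_model - f_emp))
    return d


if __name__ == "__main__":
    # Placeholder model: first-order chain with an assumed p01 = 0.5.
    geometric = lambda m: 0.5 ** (m - 1) * 0.5
    print(ks_statistic([1, 1, 2, 3], geometric))
```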
To apply the test, the transition probabilities have been estimated from the data using
maximum likelihood for different values of the order k of the Markov chain. To obtain
the empirical distribution of the length of the first dry spell, we have computed the length
of the dry spells (sequence of zeros) following the first (1, 0). (Here note that this is
equivalent to computing the length of the first dry spell for Markov chains of order k = 1
or k = 2. In the case of k = 3, although the distribution of the first dry spell is not
exactly the same as the distribution of any dry spell, we have still used all the dry spells
available due to shortage of data.) The procedure has been applied separately to data
from each station and season. If the first observations were zeros, they were ignored as
the continuation of a dry spell. Also if a dry spell was not over by the end of the season
then it was followed inside the next season.
To determine whether the theoretical model was correct or not, Monte Carlo simulations were performed. We obtained the empirical distribution of the length of the
first dry spell using 500 synthetic wet/dry records of 44 years of data (each station and
season was treated separately), and the KS statistic was computed for each one of them, which
resulted in the distribution of the KS statistic.
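The Monte Carlo step can be sketched as follows for the simplest case of a first-order chain. This is an assumption-laden illustration rather than the procedure used in the thesis: the transition probabilities, the record length and all function names are hypothetical, and the model probabilities are taken as known instead of being re-estimated from each synthetic record.

```python
import random


def simulate_chain(p01, p11, n_days, rng):
    """Simulate a two-state first-order chain (1 = wet, 0 = dry), starting wet."""
    x = [1]
    for _ in range(n_days - 1):
        p_wet = p11 if x[-1] == 1 else p01
        x.append(1 if rng.random() < p_wet else 0)
    return x


def dry_spells(x):
    """Lengths of completed dry spells that begin right after a wet day."""
    spells, run, in_spell = [], 0, False
    for prev, cur in zip(x, x[1:]):
        if prev == 1 and cur == 0:      # pattern (1, 0): a dry spell starts
            in_spell, run = True, 1
        elif in_spell and cur == 0:     # the spell continues
            run += 1
        elif in_spell and cur == 1:     # the spell ends
            spells.append(run)
            in_spell = False
    return spells


def ks_geometric(spells, p01):
    """KS distance between observed spell lengths and the order-1 model CDF."""
    n = len(spells)
    d = 0.0
    for m in range(1, max(spells) + 1):
        f_emp = sum(1 for s in spells if s <= m) / n
        f_model = 1.0 - (1.0 - p01) ** m
        d = max(d, abs(f_model - f_emp))
    return d


def ks_null_distribution(p01, p11, n_days, n_records, seed=0):
    """Sorted KS statistics from n_records synthetic wet/dry records."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_records):
        spells = dry_spells(simulate_chain(p01, p11, n_days, rng))
        if spells:
            stats.append(ks_geometric(spells, p01))
    return sorted(stats)


if __name__ == "__main__":
    # 500 records of 44 seasons of about 90 days each (illustrative numbers).
    stats = ks_null_distribution(0.4, 0.65, 44 * 90, 500)
    print("upper 10% tail value:", stats[int(0.9 * len(stats))])
```

An observed KS statistic larger than the printed tail value would count against the candidate order at the 10% level.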
[Figure 8: "Estimated Orders by KS-criterion of dry spell order estimator", one panel per station (st 01 to st 20); horizontal axis Jan to Nov, vertical axis order 1 to 5.]
Fig. 8: Order of Markov chain as suggested by the Kolmogorov-Smirnov statistic at 10%
tail value for each station and season.
The suggested orders of the Markov chain using the Kolmogorov-Smirnov statistic at
the 10% tail value are collected in Fig. 8. The resulting orders of the Markov chain appear
to be close to those obtained by the BIC order estimator. In Table 2, we have collected
information on how many data sets have passed the Kolmogorov-Smirnov test at the 10%
tail value for the different seasons. Observe that the KS test suggests that the 1-Markov
chain, although widely used, is an inadequate model for the majority of the stations in
Sweden over the different seasons.
Model    S1    S2    S3    S4
k=1       1     0     1     6
k=2      20    20    20    20
k=3      20    20    20    20
Table 2: Number of data sets (per season) that have passed the Kolmogorov-Smirnov test at the
10% tail value for different orders of the Markov chain. S1 stands for Dec.-Feb., S2 for
Mar.-May, S3 for Jun.-Aug. and S4 for Sep.-Nov.
4.3 Distribution of Long Dry Spells
Let us now define a long dry spell as a dry spell with length greater than or equal to the order k
of the Markov chain. Then it is easy to show that the distribution of the long dry spell is
actually geometric. Indeed, let a long dry spell that starts at time i have length m ≥ k
and let us also assume that we know that the length of the dry spell is at least l. Then,
for m ≥ l ≥ k,
\[ P(D(X_t) = m \mid \tau_l(X^{i+l-1}) = \mathbf{0}) = p_{0,1}\, p_{0,0}^{m-l} = p_{0,1}(1 - p_{0,1})^{m-l}, \]
where, as before,
\[ p_{0,1} = P(X_{n+1} = 1 \mid \tau_k(X^n) = \mathbf{0}), \quad \forall n. \]
Therefore, the expected length of long dry spells is given by
\[ E(D(X_t) \mid \tau_l(X^{i+l-1}) = \mathbf{0}) = l + \frac{1 - p_{0,1}}{p_{0,1}}. \tag{3} \]
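Eq. 3 is simple enough to check numerically. The sketch below evaluates the closed form and compares it with a direct summation over the geometric distribution. The value of p01 used in the example is an assumption: it is back-calculated as 1/2.49 from the k = 1, l = 1 entry of Table 3 and is not reported as such in the text.

```python
def long_dry_spell_pmf(m, l, p01):
    """P(D = m | spell already at least l days long), for m >= l."""
    return p01 * (1.0 - p01) ** (m - l)


def expected_long_dry_spell(l, p01):
    """Closed form of Eq. 3: l + (1 - p01) / p01."""
    return l + (1.0 - p01) / p01


if __name__ == "__main__":
    p01 = 1.0 / 2.49  # assumed value, implied by Table 3 for k = 1, l = 1
    for l in (1, 2, 3):
        by_sum = sum(m * long_dry_spell_pmf(m, l, p01) for m in range(l, 500))
        print(l, round(expected_long_dry_spell(l, p01), 2), round(by_sum, 2))
```

Under that assumption the three values are 2.49, 3.49 and 4.49, matching the k = 1 row of Table 3, since each extra known dry day adds exactly one day to the expectation.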
[Figure 9: vertical axis probability (0.3 to 1), horizontal axis time (days) (2 to 14); curves: Empirical, 3-Markov, 2-Markov, 1-Markov.]
Fig. 9: Conditional distribution of Dry Spell given the Dry Spell is longer or equal to 3
days for k-Markov chain models of order k = 1, k = 2 and k = 3 and the data from Lund.
Data are from the winter months December-February.
Fig. 9 shows the conditional distribution of the dry spell length, given that the spell has lasted for more
than two days, for the first season and the data from Lund. The estimated order of the
Markov chain for this data set is 2 using both the GMFC and the KS criterion. A first-order
Markov chain, the popular model of choice in this case, would clearly underestimate
the risk of a long dry spell. A second-order Markov chain seems to be the best choice for
this particular data set.
It is clear from Table 3 that underestimation of the order k of the Markov chain leads
to underestimation of the expected length of the long dry spells, where again a dry spell
is defined as long if its length is greater than or equal to the order of the Markov chain.
Model                  l=1     l=2     l=3
k=1                    2.49    3.49    4.49
k=2                    -       3.91    4.91
k=3                    -       -       5.11
Observed mean value    2.56    3.97    5.23
Table 3: Expected length of long dry spells for season Dec-Feb in Lund.
5 Modeling the Amount Precipitation Process
In this section we model the amounts of daily precipitation. This is done in two steps.
Firstly we model the dependence structure of the amount precipitation process and secondly we estimate the marginal distribution.
One of the important features of climatological data sets is that they exhibit
dependence between nearby stations or successive days. In this work we are interested in
the latter case, and the dependence structure is modelled using a two-dimensional Gaussian
copula.
After the copula has been estimated, we remove the days with precipitation below the
cut-off level of 0.1mm. That is, we let Yt be the thinning process resulting from the amount
of precipitation process Wt when we consider only the wet days, i.e., Yt := Wt |Xt = 1.
Then, the marginal distribution of the amounts of daily precipitation is modelled following
an approach that combines the fit of the distribution of excesses over a high threshold
with the empirical distribution of the thinned data below the threshold.
5.1 Copula
Almost every climatological data set exhibits dependence between successive days. To
model the temporal dependence structure of the data we use the two-dimensional Gaussian
copula C given by
\[ C(u, v; \rho) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right) dx\, dy = \Phi_\rho(\Phi^{-1}(u), \Phi^{-1}(v)), \tag{4} \]
where Φ is the cumulative distribution function of the standard normal distribution and
Φρ is the joint cumulative distribution function of two standard normal random variables
with correlation coefficient ρ.
To estimate the copula, let
\[ A = \{t : Y_t > 0 \text{ and } Y_{t+1} > 0\} \]
be the set of all days with non-zero precipitation that were followed by days also with
non-zero precipitation (greater than 0.1 mm), and let
\[ u = [Y_{a_1}, Y_{a_2}, \dots], \quad v = [Y_{a_1+1}, Y_{a_2+1}, \dots], \quad a_1, a_2, \dots \in A, \]
be the vectors consisting of the amounts of precipitation during the days indicated in the
set A and during the following days respectively, both with marginal distribution F(x). Then,
transforming the vectors u and v by taking the empirical cumulative distribution corrected
by the factor n/(n + 1) (n is the number of days with positive precipitation in the data set)
results in vectors U and V respectively that follow the discrete uniform distribution on
(0, 1). If the Gaussian copula in Eq. 4 correctly describes the dependence structure of the
data, then
\[ (\Phi^{-1}(U), \Phi^{-1}(V)) \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right). \]
Finally, the copula parameter ρ is estimated using Pearson's correlation coefficient. A
detailed description of the method and its application can be found in Lennartsson and
Shu (2005). The dependence between successive days is demonstrated in Fig. 13, where
the transformed data from Lund are plotted.
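The estimation steps described above can be sketched with the Python standard library only. This is an illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, the input is a list of daily amounts with 0 on dry days, and ties are handled by counting all observations less than or equal to a value, which is one reading of "empirical cumulative distribution corrected by n/(n + 1)".

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def pseudo_observations(values, reference):
    """Empirical CDF of `reference`, scaled by n / (n + 1), evaluated at `values`."""
    ref = sorted(reference)
    n = len(ref)
    return [bisect_right(ref, v) / (n + 1) for v in values]


def pearson(a, b):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def estimate_copula_rho(amounts):
    """Estimate rho from pairs of successive wet days."""
    pairs = [(a, b) for a, b in zip(amounts, amounts[1:]) if a > 0 and b > 0]
    wet = [a for a in amounts if a > 0]
    u = pseudo_observations([p[0] for p in pairs], wet)
    v = pseudo_observations([p[1] for p in pairs], wet)
    inv = NormalDist().inv_cdf  # standard normal quantile function
    return pearson([inv(p) for p in u], [inv(p) for p in v])


if __name__ == "__main__":
    toy_series = [1, 2, 3, 4, 5, 0, 5, 4, 3, 2, 1]  # made-up amounts in mm
    print(estimate_copula_rho(toy_series))
```

The scaling by n/(n + 1) keeps every pseudo-observation strictly below 1, so the normal quantile is always finite.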
[Figure 13: scatter plot; horizontal axis T(Yt), vertical axis T(Yt+1), both from −4 to 4.]
Fig. 13: Plot of the dependence structure with the marginal distributions transformed to
standard normal.
For a thorough coverage of bivariate copulas and their properties see Hutchinson and
Lai (1990), Joe (1997), Nelsen (2006), and Trivedi and Zimmer (2005), who provide
a copula tutorial for practitioners. The values of the correlation coefficient ρ estimated
for each station are collected in Table 4. Notice that all the estimates of the correlation
coefficient ρ are statistically significant, which makes the assumption of independence
between the data points seem unreasonable.
5.2 Marginal Distribution
Finally, to model the amount precipitation process we propose an approach that combines
the fit of the distribution of excesses over a high threshold with the empirical distribution
of the original data below the threshold. We commence our analysis by introducing
some notation followed by some introductory remarks. Let X1 , X2 , . . . be a sequence of
independent and identically distributed random variables having marginal distribution
F (x). Let us also denote by
\[ F_u(x) = P(X \le x \mid X > u), \]
for x > u, the conditional distribution of X given that it exceeds level u and assume that
Fu(x) can be modelled by means of a generalized Pareto distribution, that is
\[ F_u(x) = 1 - \left(1 + \xi\,\frac{x-u}{\sigma}\right)^{-1/\xi}, \tag{5} \]
for some σ > 0 and ξ over the set {x : x > u and 1 + ξ(x − u)/σ > 0}, and zero otherwise.
Let also Femp(x) denote the empirical distribution, i.e.,
\[ F_{emp}(x) = \frac{1}{n}\sum_{i=1}^{n} 1\{X_i \le x\}, \]
where 1{·} denotes the indicator function of an event, i.e. the 0-1 random variable which
takes value 1 if the condition between brackets is satisfied and 0 otherwise.
Finally, define the function
\[ F_C(x; u) = F_{emp}(x \wedge u) + (1 - F_{emp}(u))\,F_u(x), \]
which, as can be easily checked, is a probability distribution function that will be used to
model the amount precipitation process. Thus what needs to be addressed is the choice
of the level u above which the excesses can be accurately modelled using a generalized
Pareto distribution, as well as methods for the estimation of the distribution parameters.
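The composite distribution function is straightforward to evaluate once a sample, a threshold and the generalized Pareto parameters are given. The sketch below is illustrative only: the function names are assumptions, the exponential limit for ξ = 0 is the standard limiting form of Eq. 5 rather than something stated in the text, and for ξ < 0 the tail is set to 1 beyond its upper endpoint so that the result remains a distribution function.

```python
import math
from bisect import bisect_right


def gpd_cdf(x, u, sigma, xi):
    """Generalized Pareto CDF of Eq. 5 for excesses over the threshold u."""
    if x <= u:
        return 0.0
    z = (x - u) / sigma
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-z)      # limiting case xi -> 0
    t = 1.0 + xi * z
    if t <= 0.0:
        return 1.0                     # beyond the upper endpoint (xi < 0)
    return 1.0 - t ** (-1.0 / xi)


def make_composite_cdf(sample, u, sigma, xi):
    """Return F_C(.; u): empirical below u, generalized Pareto tail above u."""
    data = sorted(sample)
    n = len(data)

    def f_emp(x):
        return bisect_right(data, x) / n

    tail_mass = 1.0 - f_emp(u)

    def f_c(x):
        return f_emp(min(x, u)) + tail_mass * gpd_cdf(x, u, sigma, xi)

    return f_c


if __name__ == "__main__":
    toy_sample = [0.3, 1.2, 2.5, 4.0, 7.1, 9.8, 16.0, 22.5]  # made-up amounts (mm)
    # sigma and xi are the Lund estimates from Table 4; u = 15 mm as in the text.
    f_c = make_composite_cdf(toy_sample, u=15.0, sigma=5.91, xi=0.076)
    for x in (1.0, 10.0, 15.0, 20.0, 40.0):
        print(x, round(f_c(x), 4))
```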
5.2.1 Choice of Threshold Level
Selection of a threshold level u above which the generalized Pareto distribution assumption is appropriate is a difficult task in practice; see, for example, McNeil (1996), Davison
and Smith (1990) and Rootzen and Tajvidi (1997). Frigessi et al. (2002) suggest a dynamic mixture model for the estimation of the tail distribution without having to specify
a threshold in advance. Once the threshold u is fixed, the model parameters ξ and σ are
estimated using maximum likelihood, although a number of alternative
methods exist; see for instance Resnick (1997), Crovella and Taqqu (1999) and references
therein.
5.2.2 Extreme Value Analysis for Dependent Sequences
The generalized Pareto distribution is asymptotically a good model for the marginal
distribution of high excesses of independent and identically distributed random variables,
see Coles (2001) and Leadbetter et al. (1983). Unfortunately, independence is an
unreasonable assumption for most climatological data sets, since dependence between successive days
is to be expected. One way of dealing with the dependence between the excesses is to
choose the level u high enough so that enough time has passed between successive excesses to
make them independent. Another is to use declustering, which is probably the most widely adopted
method for dealing with dependent exceedances; it corresponds to filtering the dependent
observations to obtain a set of threshold excesses that are approximately independent,
see Coles (2001). A simple way of determining m-clusters of extremes, after specifying a
threshold u, is to define consecutive excesses of u to belong to the same m-cluster as long
as they are separated by less than m + 1 days. It should be noted that the separation
of extreme events into clusters is likely to be sensitive to the choice of u, although we do
not study this effect in this work. The effect of declustering on the generalized Pareto
distribution in Eq. 5 is the replacement of the parameters σ and ξ by σθ^{−1} and ξ, where
θ is the so-called extremal index, loosely defined as
\[ \theta = (\text{limiting mean cluster size})^{-1}. \]
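The declustering rule and a simple empirical counterpart of the extremal index can be sketched as below. Two assumptions are made and should be checked against the intended definition: "separated by less than m + 1 days" is read as "at most m non-exceeding days between two exceedances" (so that m = 0 groups exceedances on adjacent days), and θ is estimated as the number of clusters divided by the number of exceedances, i.e. the reciprocal of the observed mean cluster size. The function names are hypothetical.

```python
def decluster(series, u, m):
    """Group the days on which `series` exceeds u into m-clusters."""
    clusters = []
    last = None
    for t, y in enumerate(series):
        if y > u:
            if last is not None and t - last - 1 <= m:
                clusters[-1].append(t)   # close enough: same cluster
            else:
                clusters.append([t])     # gap too long: new cluster
            last = t
    return clusters


def extremal_index(series, u, m):
    """Number of clusters divided by number of exceedances."""
    clusters = decluster(series, u, m)
    n_exceedances = sum(len(c) for c in clusters)
    if n_exceedances == 0:
        return float("nan")
    return len(clusters) / n_exceedances


if __name__ == "__main__":
    toy_series = [0, 20, 30, 0, 0, 25, 0, 18, 0]  # made-up daily amounts (mm)
    for m in range(3):
        print(m, decluster(toy_series, 15, m), extremal_index(toy_series, 15, m))
```

Larger m merges more exceedances into the same cluster, so the estimate can only decrease as m grows, which is the pattern seen in Table 5.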
5.3 Method Application
In this subsection we apply the method described in subsection 5.2 to model the thinning
of the amount of precipitation process, i.e. Yt . To demonstrate the method we use data
from the station in Lund. The rest of the stations give similar results.
As we have already seen, the data exhibit temporal dependence. The correlation
coefficient ρ, using the Gaussian copula for the data from Lund, was estimated to be 0.1362.
The dependence in the data can also be seen in Fig. 10, where the expected number of
m-clusters (with more than one observation) for different values of m and u = 15 mm
are plotted. The expected number of these m-clusters assuming the observations are
independent is denoted by 'o' and is consistently less than the observed number of
m-clusters, which is denoted by '+'. The expected number of m-clusters computed assuming
the observations are correlated (ρ = 0.1362) is denoted by '*' and is
a clear improvement over the assumption of independence. We also provide 95%
exact confidence intervals for both cases. The observed values fall inside the confidence
interval constructed assuming correlated data.
[Figure 10: vertical axis number of clusters with more than one observation (0 to 50), horizontal axis m (0 to 6).]
Fig. 10: Number of m-clusters with more than one observation. '+' denotes the observed
and 'o' the theoretical number of m-clusters assuming that the observations are independent, while '*' denotes the number of m-clusters using ρ = 0.1362. Line '–' denotes the
95% confidence interval for the theoretical number of m-clusters assuming independence,
while '-.' denotes the 95% confidence interval for the theoretical number of m-clusters
assuming ρ̂ = 0.1362.
After the cluster size has been decided (m = 0 in the case of the station in Lund), we
turn to the problem of estimating the parameters ξ, σ and θ for the generalized Pareto
model. The choice of the specific threshold (u = 15 mm) was based on the mean residual life
plot. It is expected, see Coles (2001), that for a threshold u at which the generalized
Pareto model provides a good approximation for the excesses above that level, the mean
residual life plot, i.e. the locus of the points
\[ \left\{ \left( u,\; \frac{1}{n_u} \sum_{i=1}^{n_u} (Y_{t(i)} - u) \right) : u < Y_{t\max} \right\}, \]
where Yt(1), ..., Yt(nu) are the nu observations that exceed u and Ytmax is the largest observation of the process Yt, should be approximately linear in u. Fig. 11 shows the mean
residual life plot with approximate 95% confidence interval for the daily precipitation in
Lund. The graph appears to curve from u = 0 until u = 15 and is approximately linear
after that threshold. It is tempting to conclude that there is no stability until u = 28,
after which there is approximate linearity, which suggests u = 28. However, such a threshold
leaves too few excesses for any meaningful inference (33 observations out of 16000). So
we decided to work initially with the threshold set at u = 15.
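The points of a mean residual life plot are just sample means of the excesses over a grid of thresholds. A minimal sketch follows; the function names and the toy sample are assumptions, and the confidence bands shown in Fig. 11 are not reproduced here.

```python
def mean_excess(sample, u):
    """Average of (y - u) over the observations that exceed u, or None if none do."""
    excesses = [y - u for y in sample if y > u]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


def mean_residual_life(sample, thresholds):
    """Points (u, mean excess over u) for every threshold below the sample maximum."""
    largest = max(sample)
    return [(u, mean_excess(sample, u)) for u in thresholds if u < largest]


if __name__ == "__main__":
    toy_sample = [0.4, 1.1, 2.0, 3.5, 6.2, 9.0, 14.8, 21.3, 33.0]  # made-up (mm)
    for u, e in mean_residual_life(toy_sample, range(0, 40, 5)):
        print(u, round(e, 2))
```

For a generalized Pareto tail with ξ < 1 the theoretical mean excess is linear in u, which is why one looks for the threshold beyond which the plotted points line up.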
[Figure 11: vertical axis E[X−u|X>u] (0 to 12), horizontal axis u (0 to 60).]
Fig. 11: Mean residual life plot of amount precipitation process from Lund, dotted lines
give the 95% confidence interval.
Finally, the different diagnostic plots for the fit of the generalized Pareto distribution
are collected in Fig. 12. The data from the rest of the stations have produced similar plots,
none of which gave any reason for concern about the quality of the fitted models. The
parameters of the generalized Pareto model for the data from all the stations, together
with 95% confidence intervals, are collected in Table 4. For three stations
(Bolmen, Borås and Haparanda), the estimates of the shape parameter ξ are negative.
[Figure 12: four panels, Probability Plot, Quantile Plot, Return Level Plot and Density Plot; axis labels: GP model, Extreme Empirical model, GP model (mm), Extreme Empirical model (mm), Return Level, Return Period (Years), no of observations, Amount of precipitation (mm).]
Fig. 12: Diagnostic plots for threshold excess model fitted to daily precipitation data
from the station in Lund.
Table 5 shows θ for different values of m-clusters and threshold u = 15 for the data
from Lund.
6 Evaluation
To verify the validity of the model, we have obtained distribution functions of the different precipitation indices as stipulated by the Expert Team and its predecessor, the
CCl/CLIVAR Working Group (WG) on Climate Change Detection, see Peterson et al. (2001)
and Karl et al. (1999). Sixteen of those indices are of relevance to this work: two regarding only the occurrence of precipitation process (CDD and CWD), another two regarding
only the amount precipitation process (SDII and Prec90p) and the remaining twelve concerning both processes, see Table 6. Using the chain dependent model, we have obtained
the distribution of each index based on 100,000 simulations. This has been compared
to the empirical distribution ('.-' line in Figs. 14-18). The two distributions agree well,
and the empirical distribution always falls
inside the 90% exact confidence intervals. The results are presented for the weather
station in Lund. The rest of the stations give similar results.
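To make the simulation step concrete, the sketch below generates one synthetic record from a much simplified chain dependent model. Everything specific in it is an assumption made for illustration: the occurrence process is a first-order chain rather than the higher-order chains selected earlier, the latent Gaussian value is carried over only between consecutive wet days, and the marginal quantile function is a placeholder exponential instead of the composite distribution FC.

```python
import math
import random
from statistics import NormalDist


def exp_quantile(p, mean=3.5):
    """Placeholder marginal: exponential quantile function (not the composite model)."""
    return -mean * math.log(1.0 - min(p, 1.0 - 1e-12))


def simulate_record(p01, p11, rho, quantile_fn, n_days, rng):
    """Daily amounts (0.0 on dry days) from a simplified chain dependent model."""
    std_normal = NormalDist()
    wet_prev, z_prev = False, 0.0
    amounts = []
    for _ in range(n_days):
        wet = rng.random() < (p11 if wet_prev else p01)
        if wet:
            eps = rng.gauss(0.0, 1.0)
            if wet_prev:
                # successive wet days: latent values correlated with coefficient rho
                z = rho * z_prev + math.sqrt(1.0 - rho * rho) * eps
            else:
                z = eps  # first day of a wet spell: no carry-over (assumption)
            amounts.append(quantile_fn(std_normal.cdf(z)))
            z_prev = z
        else:
            amounts.append(0.0)
        wet_prev = wet
    return amounts


if __name__ == "__main__":
    rng = random.Random(2008)
    record = simulate_record(0.4, 0.6, 0.1362, exp_quantile, 365, rng)
    print(sum(1 for a in record if a > 0), "wet days, total", round(sum(record), 1), "mm")
```

Repeating such a simulation many times and computing an index on each record gives the model distribution of that index, which is the quantity compared with the empirical distribution.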
Station        σ̂       CI for σ̂          ξ̂         CI for ξ̂           θ̂       u (mm)   ρ̂
Lund           5.91    (4.93, 7.03)      0.076     (-0.041, 0.236)    0.935   15       0.1362
Bolmen         6.44    (5.56, 7.41)      -0.0002   (-0.095, 0.116)    0.921   15       0.2008
Hanö           5.29    (3.044, 8.737)    0.458     (0.115, 1.05)      0.977   25       0.1649
Borås          7.63    (7.01, 8.28)      -0.011    (-0.067, 0.053)    0.794   10       0.1982
Varberg        5.48    (4.687, 6.378)    0.106     (0.001, 0.236)     0.926   15       0.1206
Ungsberg       5.768   (4.622, 7.115)    0.245     (0.089, 0.445)     0.925   15       0.1843
Säffle         6.62    (5.96, 7.329)     0.099     (0.027, 0.183)     0.857   10       0.1809
Söderköping    6.259   (4.32, 8.884)     0.297     (0.1, 0.649)       0.984   25       0.1678
Stockholm      5.597   (4.827, 6.453)    0.135     (0.033, 0.259)     0.903   10       0.1523
Malung         6.355   (5.676, 7.095)    0.08      (0.004, 0.17)      0.86    10       0.2280
Vattholma      4.964   (3.521, 6.784)    0.334     (0.098, 0.667)     0.984   20       0.1709
Myskelåsen     6.854   (5.962, 7.844)    0.019     (-0.072, 0.13)     0.849   10       0.2311
Härnösand      7.863   (7.053, 8.742)    0.087     (0.011, 0.175)     0.832   10       0.2068
Rösta          6.276   (5.453, 7.19)     0.032     (-0.062, 0.145)    0.876   10       0.2116
Piteå          5.937   (4.429, 7.822)    0.19      (0.004, 0.456)     0.96    20       0.2010
Stensele       7.66    (6.098, 9.5)      0.041     (-0.11, 0.236)     0.915   15       0.2249
Haparanda      5.628   (4.405, 7.07)     -0.073    (-0.196, 0.125)    0.984   18       0.1871
Kvikkjokk      5.66    (5.01, 6.36)      0.04      (-0.04, 0.137)     0.864   10       0.2526
Pajala         5.033   (3.705, 6.728)    0.356     (0.153, 0.646)     0.966   18       0.2385
Karesuando     5.303   (4.117, 6.754)    0.12      (-0.037, 0.34)     0.922   15       0.2206
Table 4: Extremal parameters and their 95% confidence intervals for each weather station.
m    θ̂
0    0.9144
1    0.8836
2    0.8425
3    0.8322
Table 5: Values of the parameter θ for different choices of m clusters.
Index      Description                                       Formula
R10mm      Heavy precipitation days                          Σ 1{Z_i > 10}
R20mm      Very heavy precipitation days                     Σ 1{Z_i > 20}
RX1day     Highest 1 day precipitation amount                max_i Z_i
RX5day     Highest 5 day precipitation amount                max_i Σ_{j=0}^{4} Z_{i+j}
CDD        Max number of consecutive dry days                max{j : τ_j(X^i) = 0}
CWD        Max number of consecutive wet days                max{j : w = τ_j(X^i), w_k > 0, ∀k}
R75p       Moderate wet days                                 Σ 1{Z_i > q_0.75}
R90p       Above moderate wet days                           Σ 1{Z_i > q_0.90}
R95p       Very wet days                                     Σ 1{Z_i > q_0.95}
R99p       Extremely wet days                                Σ 1{Z_i > q_0.99}
R75pTOT    Precipitation fraction due to R75p                Σ Z_i 1{Z_i > q_0.75} / Σ Z_i
R90pTOT    Precipitation fraction due to R90p                Σ Z_i 1{Z_i > q_0.90} / Σ Z_i
R95pTOT    Precipitation fraction due to R95p                Σ Z_i 1{Z_i > q_0.95} / Σ Z_i
R99pTOT    Precipitation fraction due to R99p                Σ Z_i 1{Z_i > q_0.99} / Σ Z_i
SDII       Simple daily intensity index                      Σ Y_i / Σ 1{Y_i > 0}
Prec90p    90%-quantile of thinned amount of precipitation   F_Y^{-1}(0.9)
Table 6: Weather indices and their mathematical expressions. The quantiles q(·) have
been estimated using the observed data.
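Several of the indices in Table 6 can be computed from a daily series with a few lines of code. The sketch below is an illustration, not the thesis implementation: `precipitation_indices` is a hypothetical name, the input is one list of daily amounts in mm, and a day is treated as wet when its amount is at least 0.1 mm, which is one reading of the cut-off level used in the text.

```python
def longest_run(flags):
    """Length of the longest run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def precipitation_indices(z, wet_threshold=0.1):
    """A subset of the Table 6 indices for one series of daily amounts (mm)."""
    wet_amounts = [v for v in z if v >= wet_threshold]
    return {
        "R10mm": sum(1 for v in z if v > 10),
        "R20mm": sum(1 for v in z if v > 20),
        "RX1day": max(z),
        "RX5day": max(sum(z[i:i + 5]) for i in range(len(z) - 4)),
        "CDD": longest_run(v < wet_threshold for v in z),
        "CWD": longest_run(v >= wet_threshold for v in z),
        "SDII": sum(wet_amounts) / len(wet_amounts) if wet_amounts else 0.0,
    }


if __name__ == "__main__":
    toy_series = [0, 12, 25, 0, 0, 0, 3, 1, 0.5, 0]  # made-up daily amounts (mm)
    print(precipitation_indices(toy_series))
```

The quantile-based indices (R75p to R99pTOT) follow the same pattern once the quantiles q have been estimated from the observed data.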
As can be seen in Fig. 14 (top left), over a period of approximately two years we expect
about 17 days with precipitation of more than 10 mm and (top right) about 3 days
with precipitation of more than 20 mm. However, Fig. 14 (bottom left) shows that the precipitation
during each of these three days will be considerably more than 20 mm. Fig. 14 (bottom
right) tells us that the probability of having 5 consecutive days of really heavy precipitation
in Lund is quite high.
[Figure 14: four panels, Index R10mm and Index R20mm (horizontal axis no. of days), Index RX1day and Index RX5day (horizontal axis amount of prec. (mm)); vertical axis 0 to 1; curves: Model cumul. distr., Empirical distr.]
Fig. 14: Plots of R10mm (top left), R20mm (top right), RX1day (bottom left) and
RX5day (bottom right). Theoretical distribution '-' and empirical distribution '.-'.
[Figure 15: two panels, Index CDD and Index CWD; horizontal axis no. of days, vertical axis 0 to 1; curves: Model cumul. distr., Empirical distr.]
Fig. 15: Plot of maximum number of consecutive dry days (left), and maximum number
of consecutive wet days (right).
As can be seen in Fig. 15, once every two years we should expect a dry
spell of more than two weeks (left) and a wet spell of approximately 12 days (right).
[Figure 16: four panels, Index R75p, Index R90p, Index R95p and Index R99p; horizontal axis no. of days, vertical axis 0 to 1; curves: Model cumul. distr., Empirical distr.]
Fig. 16: Plot of the probability of number of moderate wet days (top left), above moderate wet days (top right), very wet days (bottom left) and extremely wet days (bottom
right).
In Fig. 16, we see that every two years in Lund we expect to have almost
fifty moderately wet days (top left), almost 18 above moderately wet days (top right),
almost 8 very wet days (bottom left) and almost 2 extremely wet days (bottom right).
In Fig. 17 (top left), we see that the fifty moderately wet days that we expect over a period of two years in Lund account for about 70% of the total amount of
precipitation. Similarly, the 18 above moderate wet days account on average for a
little more than 40% of the total precipitation amount (top right), the 8 very wet days
for about 25% of the total amount (bottom left) and the 2 extremely wet days for about 10%
of the total amount (bottom right).
In Fig. 18 (left), we see that the average amount of precipitation per day of precipitation is 3.5 mm, and (right) that on average the downfall exceeds 9.5 mm on only 1 out of
10 precipitation days.
[Figure 17: four panels, Index R75pTOT, Index R90pTOT, Index R95pTOT and Index R99pTOT; horizontal axis fraction of total precipitation, vertical axis 0 to 1; curves: Model cumul. distr., Empirical distr.]
Fig. 17: Percentage of precipitation during the moderately wet days (top left), the above
moderate wet days (top right), the very wet days (bottom left) and the extremely wet
days (bottom right).
[Figure 18: two panels, Index SDII and Index Prec90p; horizontal axis amount of prec. (mm), vertical axis 0 to 1; curves: Model cumul. distr., Empirical distr.]
Fig. 18: Plot of the average amount of precipitation per day of precipitation (left) and the
90% quantile of the amount of precipitation of the thinned precipitation process (right).
7 Conclusions
In this paper, we have modelled the temporal variability of the precipitation in Sweden.
The different weather stations have been assumed not to have any spatial dependence.
It is among our future research plans to model also the spatial variability of the
precipitation at the different weather stations in Sweden. Some interesting conclusions
can be drawn.
We have used a chain dependent model for the precipitation, which consists of a component for the occurrence of precipitation and a component for the amount of precipitation.
For the first component, we have used high order Markov chains with two states. We have
shown that the 1-Markov chain model, which has been used extensively, is an inadequate
model for most of the Swedish stations. For example, when the distribution of the long
dry spell is of interest, the 1-Markov chain underestimates the expected length of the long dry
spell, in some cases by up to half a day.
For the amount of precipitation process, we have used a copula to describe the temporal
dependence structure between successive days, which in reality is a Gaussian process with
transformed marginals. Then, the cumulative distribution has been modelled in two
steps: first using the empirical distribution for the amounts of precipitation that are less
than a given threshold, and then using a generalised Pareto distribution to model the
excesses above the threshold. Such models have the advantage that they provide a
mathematical framework that allows computation of such quantities as return periods.
Finally, the distributions of different weather indices have been computed using Monte
Carlo Markov Chain techniques and compared to the empirical distributions obtained from the data. The agreement between the two distributions has been very good,
which supports the choice of the models.
References
[1] Akaike, H. (1974). A new look at statistical model identification, IEEE Trans. Auto.
Control, AC, 19, pp. 716-722.
[2] Benjamin, J.R. and Cornell, C.A., (1970). Probability, Statistics and Decision for Civil
Engineers, McGraw-Hill, Inc., New York, 685 pp.
[3] Bruhn, J.A., Fry, W.E. and Fick, G.W., (1980). Simulation of daily weather data
using theoretical probability distributions. J. Appl. Meteorol. 19, pp. 1029-1036.
[4] Castellvi, F. and Stockle, C.O., (2001). Comparing a locally-calibrated versus a generalised temperature weather generation. Trans. ASAE 44(5), pp. 1143-1148.
[5] Chin, E. H., (1977). Modelling daily precipitation process with Markov chain, Wat.
Resources Res., 13, 949-956.
[6] Coles, S., (2001). An Introduction to Statistical Modeling of Extreme Values. Springer,
London.
[7] Cox, D.R. and Isham, V., (1988). A simple spatial-temporal model for rainfall (with
discussion). Proc. R. Soc. Lond. A, 415, 317-328.
[8] Cox, D.R. and Isham, V., (1994). Stochastic models of precipitation. In Statistics for
the Environment 2: Water Related Issues (eds V. Barnett and K.F. Turkman), ch. 1, pp.
3-18. Chichester: Wiley.
[9] Crovella, M. and Taqqu, M., (1999). Estimating the heavy tail index from scaling
properties, Methodology and Computing in Applied Probability 1, 55-79.
[10] Dalevi, D., Dubhashi, D. and Hermansson, M., (2006). A New Order Estimator for
Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity, Stat. Appl. Genet. Mol. Biol., 5, Article 8.
[11] Davison, A.C. and Smith, R.L., (1990). Models for exceedances over high thresholds,
J. Roy. Statist. Soc. B, 52, pp. 393-442.
[12] Frigessi, A., Haug, O., Rue, H., (2002). A Dynamic Mixture Model for Unsupervised
Tail Estimation without Threshold Selection, Extremes, 5, pp.219-235.
[13] Gabriel, K.R. and Neumann, J., (1962). A Markov chain model for daily rainfall
occurrences at Tel Aviv. Quart.J.royal Met.Soc. 88, 90-95.
[14] Geng, S., Frits, W.T., de Vries, P. and Supit, I., (1986). A simple method for generating daily rainfall data. Agric. For. Meteorol. 36, pp. 363-376.
[15] Guttorp, P. (1995). Stochastic Modelling of Scientific Data, Chapman & Hall, London,
Chapter 2.
[16] Hutchinson, M.F., (1995). Stochastic space-time weather models from ground-based
data. Agric. For. Meteorol., 73, 237-264.
[17] Hutchinson, T.P. and Lai, C.D., (1990). Continuous Bivariate Distributions, Emphasising Applications. Sydney, Australia: Rumsby.
[18] Jimoh, O.D. and Webster, P., (1996). The optimum order of a Markov chain model
for daily rainfall in Nigeria. Journal of Hydrology, 185, 45-69.
[19] Joe, H., (1997). Multivariate Models and Dependence Concepts. London: Chapman
& Hall.
[20] Karl, T.R., Nicholls, N. and Ghazi, A., (1999). CLIVAR/GCOS/WMO workshop on
indices and indicators for climate extremes: Workshop summary, Climatic Change. Vol.
32, pp. 3-7.
[21] Lana, X. and Burgueno, A., (1998). Daily dry-wet behaviour in Catalonia (NE Spain)
from the viewpoint of Markov chains, Int. J. Climatol. 18, 793-815.
[22] Leadbetter, M.R., Lindgren, G., and Rootzen, H., (1983). Extremes and Related
Properties of Random Sequences and Series. Springer Verlag, New York.
[23] LeCam, L., (1961). A stochastic description of precipitation. Proc. 4th Berkeley Symp.,
pp. 165-186.
[24] Lennartsson, J., and Shu, M., (2005). Copula Dependence Structure on Real Stock
Markets, Masters thesis, Chalmers University of Technology, 2005-01.
[25] Liao, Y., Zhang, Q. and Chen, D., (2004). Stochastic modeling of daily precipitation
in China. Journal of Geographical Sciences, 14(4), 417-426.
[26] Mellor, D., (1996). The modified turning bands (mtb) model for space-time rainfall: I,
model definition and properties. J. Hydrol., 175, 113-127.
[27] McNeil, A.J., (1996). Estimating the tails of loss severity distributions using extreme
value theory, Technical report, Department Mathematik, ETH Zentrum, Zurich.
[28] Nelsen, R. B., (2006). An Introduction to Copulas 2nd edition. New York: Springer.
[29] Norris, J.R., (2005). Markov chains, Cambridge University Press.
[30] Peterson, T.C., Folland, C., Gruza, G., Hogg, W., Mokssit, A. and Plummer, N.,
(2001). Report on the Activities of the Working Group on Climate Change Detection
and Related Rapporteurs 1998-2001. World Meteorological Organisation, WCDMP-47,
WMO-TD 1071.
[31] Racsko, P., Szeidl, L. and Semenov, M., (1991). A serial approach to local stochastic
weather models. Ecol. Model. 57, pp. 27-41.
[32] Resnick, S.I., (1997). Heavy tail modeling and teletraffic data, The Annals of Statistics 25(5), 1805-1869.
[33] Rootzen, H. and Tajvidi, N., (1997). Extreme value statistics and wind storm losses:
A case study, Scandinavian Actuarial Journal 1, 70-94.
[34] Rodríguez-Iturbe, I., Cox, D. and Isham, V., (1987). Some models for rainfall based
on stochastic point processes. Proc. R. Soc. Lond., A 410, 269-288.
[35] Rodríguez-Iturbe, I., Cox, D. and Isham, V., (1988). A point process model for rainfall: further developments. Proc. R. Soc. Lond., A 417, 283-298.
[36] Schwarz, G., (1978). Estimating the dimension of a model. Ann. Stat. 6, pp. 461-464.
[37] Selker, J.S. and Haith, D.A., (1990). Development and testing of simple parameter
precipitation distributions. Water Resour. Res. 26 11, pp. 2733-2740.
[38] Smith, R. L. and Robinson, P.J., (1997). A Bayesian approach to the modelling of
spatial-temporal precipitation data. Lect. Notes Statist., 237-269.
[39] Srikanthan, R. and McMahon, T.A., (2001). Stochastic generation of annual, monthly
and daily climate data: A review. Hydrol. Earth Syst. Sci. 5 4, pp. 653-670.
[40] Stern, R.D. and Coe, R., (1984). A Model fitting Analysis of Daily Rainfall Data,
J. R. Statist. Soc. A, 147, Part 1, pp. 1-34.
[41] Stidd, C.K., (1973). Estimating the precipitation climate. Wat. Resour. Res., 9
1235-1241.
[42] Trivedi, P. K. and Zimmer, D.M., (2007). Copula Modelling: An Introduction for
Practitioners, Foundations and Trends in Econometrics, Vol. 1, No 1, 1-111.
[43] Waymire, E., Gupta, V. K., (1981). The mathematical structure of rainfall representations: 3. Some applications of the point process theory to rainfall processes. Wat.
Resour. Res. 17, 1287-1294.
[44] Waymire, E., Gupta, V. K. and Rodríguez-Iturbe, I., (1984). Spectral theory of rainfall intensity at the meso-β scale. Wat. Resour. Res. 20, 1453-1465.
[45] Woolhiser, D.A., (1992). Modelling daily precipitation-progress and problems. In:
A. Walden and P. Guttorp (Editors), Statistics in the Environmental and Earth Sciences.
Edward Arnold, London, pp.71-89.
8 Appendix
8.1 Review of Mathematical Order Estimators
Let $X_t$ denote a $k$-Markov chain that is defined on a state space $S$ and let $x_1^n$ denote its realisation. Let also $P_{ML(k)}(x_1^n)$ be the $k$th order maximum likelihood, i.e.
$$P_{ML(k)}(x_1^n) = \max P(X_1^k) \prod_{i=k+1}^{n} P(X_i = x_i \mid \tau_k(X^{i-1}) = \tau_k(x^{i-1})).$$
Tong (1975) reported that the Akaike Information Criterion (AIC) order estimator could be used as an objective technique for determining the optimum order $k$ of the chain, see also Akaike (1974). The optimum order $k$ is the order that minimises the loss function:
$$\hat{k}_{AIC}(x_1^n) = \arg\min_k \left( -\log P_{ML(k)}(x_1^n) + |S|^k \right).$$
Schwarz (1978) presented an alternative technique, the Bayesian Information Criterion (BIC) order estimator, whose consistency under general conditions was only recently established. The optimum order $k$ is the order that minimises the loss function, which is now given by:
$$\hat{k}_{BIC}(x_1^n) = \arg\min_k \left( -\log P_{ML(k)}(x_1^n) + \frac{|S|^k (|S| - 1)}{2} \log(n) \right).$$
Dalevi et al. (2006) showed using experimental results that the BIC order estimator tends to under-estimate the order as $k$ gets larger for moderate data sizes.
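The two loss functions above can be illustrated with a minimal Python sketch. It assumes the maximised likelihood is evaluated conditionally on the first $k$ symbols (so the initial term contributes nothing), and the function names `log_likelihood` and `select_order` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(x, k):
    """Maximised log-likelihood of an order-k chain fitted to the sequence x.

    Assumption of this sketch: we condition on the first k symbols, so only
    the transition terms of P_ML(k) are counted.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(k, len(x)):
        w = tuple(x[i - k:i])          # tau_k of the history before position i
        transition_counts[(w, x[i])] += 1
        context_counts[w] += 1
    return sum(c * math.log(c / context_counts[w])
               for (w, _), c in transition_counts.items())


def select_order(x, max_order, n_states=2):
    """Return (k_AIC, k_BIC) using the penalties stated in the text."""
    n = len(x)
    aic, bic = {}, {}
    for k in range(max_order + 1):
        ll = log_likelihood(x, k)
        aic[k] = -ll + n_states ** k
        bic[k] = -ll + n_states ** k * (n_states - 1) / 2 * math.log(n)
    return min(aic, key=aic.get), min(bic, key=bic.get)
```

For a strictly periodic wet/dry pattern such as `[0, 0, 1, 1] * 50`, the next state is fully determined by the previous two days, so both criteria should pick order 2 in this sketch.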
Finally, the Maximal Fluctuation Criterion (MFC), contrary to the AIC and BIC order estimators, was specifically designed for multiple step Markov chains. For any realisation $x \in S^n$ of the $k$-Markov chain and any word $w \in S^l$, let $N_x(w) = |\{i \in [1, n] : \tau_l(x^i) = w\}|$ denote the number of times $w$ occurs in $x$. The Peres-Shields fluctuation function is defined as
$$\Delta_k(v) = \max_{s \in S} \left| N_x(vs) - \frac{N_x(\tau_k(v)s)}{N_x(\tau_k(v))} N_x(v) \right|.$$
When the order of the Markov chain is $k$ or less, this fluctuation is small. Therefore, the Maximal Fluctuation Criterion (MFC) order estimator is defined as
$$\hat{k}_{MFC}(x_1^n) = \min\left\{ k \ge 0 : \max_{k < |v| < \log\log(n)} \Delta_k(v) < n^{3/4} \right\}.$$
In practice the function $\log\log(\cdot)$ is substituted by any function that grows slower than $\log(\cdot)$. Dalevi et al. (2006) suggested the Generalized Maximum Fluctuation Criterion (GMFC) order estimator, which is closely related to the Maximal Fluctuation Criterion (MFC) order estimator,
$$\hat{k}_{GMFC}(x_1^n) = \arg\max_k \frac{\max_{k-1 < |v| < f(n)} \Delta_{k-1}(v)}{\max_{k < |v| < f(n)} \Delta_k(v)},$$
where $f(n)$ is any function that satisfies the same growth condition as for the MFC order estimator.
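As an illustration of the fluctuation function used by the MFC and GMFC estimators, the following Python sketch counts word occurrences and evaluates $\Delta_k(v)$ for a binary sequence. The helper names, the convention that the empty word occurs $n$ times, and the inclusive word-length bound `max_len` (standing in for the slowly growing bound) are assumptions of this sketch.

```python
from itertools import product


def count_word(x, w):
    """N_x(w): number of positions i at which the last len(w) symbols of x[:i] equal w."""
    l = len(w)
    if l == 0:
        return len(x)  # assumed convention: the empty word occurs at every position
    return sum(1 for i in range(l, len(x) + 1) if tuple(x[i - l:i]) == w)


def fluctuation(x, v, k, states=(0, 1)):
    """Peres-Shields fluctuation Delta_k(v) for a word v (a tuple) with len(v) >= k."""
    tail = v[len(v) - k:]              # tau_k(v), the last k letters of v
    n_tail = count_word(x, tail)
    if n_tail == 0:
        return 0.0                     # word never seen: no fluctuation to measure
    n_v = count_word(x, v)
    return max(abs(count_word(x, v + (s,))
                   - count_word(x, tail + (s,)) / n_tail * n_v)
               for s in states)


def max_fluctuation(x, k, max_len, states=(0, 1)):
    """Largest Delta_k(v) over all words v with k < len(v) <= max_len."""
    return max(fluctuation(x, v, k, states)
               for l in range(k + 1, max_len + 1)
               for v in product(states, repeat=l))
```

For a strictly alternating sequence, which behaves like a first-order chain, the fluctuation at $k = 1$ stays near zero (up to edge effects) while the fluctuation at $k = 0$ is large, which is the contrast the estimators exploit.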
Accepted Manuscript
Modelling precipitation in Sweden using multiple step Markov chains and a composite model
Jan Lennartsson, Anastassia Baxevani, Deliang Chen
PII: S0022-1694(08)00484-8
DOI: 10.1016/j.jhydrol.2008.10.003
Reference: HYDROL 16319
To appear in: Journal of Hydrology
Received Date: 28 May 2008
Revised Date: 3 October 2008
Accepted Date: 4 October 2008
Please cite this article as: Lennartsson, J., Baxevani, A., Chen, D., Modelling precipitation in Sweden using multiple
step Markov chains and a composite model, Journal of Hydrology (2008), doi: 10.1016/j.jhydrol.2008.10.003
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Modelling precipitation in Sweden using multiple step Markov chains and a composite model
Jan Lennartsson1, Anastassia Baxevani1∗†, Deliang Chen2
1 Department of Mathematical Sciences, Chalmers University of Technology, University of Gothenburg, Gothenburg, Sweden
2 Department of Earth Sciences, University of Gothenburg, Gothenburg, Sweden
Abstract
In this paper, we propose a new method for modelling precipitation in Sweden. We consider a chain dependent stochastic model that consists of a component that models the probability of occurrence of precipitation at a weather station and a component that models the amount of precipitation at the station when precipitation does occur. For the first component, we show that for most of the weather stations in Sweden a Markov chain of an order higher than one is required. For the second component, which is a Gaussian process with transformed marginals, we use a composite of the empirical distribution of the amount of precipitation below a given threshold and the generalized Pareto distribution for the excesses in the amount of precipitation above the given threshold. The derived models are then used to compute different weather indices. The distribution of the modelled indices and the empirical ones show good agreement, which supports the choice of the model.
Key words: High order Markov chain, generalized Pareto distribution, copula, precipitation process, Sweden
1 Introduction
Realistic sequences of meteorological variables such as precipitation are key inputs in many hydrologic, ecologic and agricultural models. Simulation models are needed to model the stochastic behavior of the climate system when historical records are of insufficient duration or inadequate spatial and/or temporal coverage. In these cases synthetic sequences may be used to fill in gaps in the historical record, to extend the historical record, or to generate realizations of weather that are stochastically similar to the historical record. A weather generator is a stochastic numerical model that generates daily weather series with the same statistical properties as the observed ones, see Liao et al. (2004).
In developing the weather generator, the stochastic structure of the series is described by a statistical model. Then, the parameters of the model are estimated using the observed series. This allows us to generate arbitrarily long series with stochastic structure similar to the real data series.
∗ Corresponding author
† Research supported partially by the Gothenburg Stochastic Center and the Swedish foundation for Strategic Research through Gothenburg Mathematical Modelling Center.
Parameter estimation of stochastic precipitation models has been a topic of intense research in the last 20 years. The estimation procedures are intrinsically linked to the nature of the precipitation model itself and the timescale used to represent the process. There are models which describe the precipitation process in continuous time and models describing the probabilistic characteristics of precipitation accumulated over a given time period, say daily or monthly totals. Different reviews of the available models have been presented: see for example Woolhiser (1992), Cox and Isham (1988) and Smith and Robinson (1997).
Continuous time models for a single site with parameters related to the underlying physical precipitation process are particularly important for the analysis of data at short timescales, e.g. hourly. Some of these models are described in Rodríguez-Iturbe et al. (1987, 1988) and Waymire and Gupta (1981).
When only accumulated precipitation amounts for a particular time period (daily) are recorded, empirical statistical models, based on stochastic models that are calibrated from actual data, are appealing. Empirical statistical models for generating daily precipitation data at a given site can be classified into four different types: chain dependent or two-part models, transition probability matrix models, resampling models and ARMA time series models, see Srikanthan and McMahon (2001) for a complete review of the different models.
A generalization of the precipitation models for a single site is the spatial extension of these models to multiple sites, which tries to incorporate the intersite dependence while preserving the marginal properties at each site. A more ambitious task is the modelling of precipitation continuously in time and space. Original work on this type of model, based on point process theory, was presented by LeCam (1961) and further developed by Waymire et al. (1984) and Cox and Isham (1994). Mellor (1996) has developed the modified turning bands model, which reproduces some of the physical features of precipitation fields in space, such as rainbands, cluster potential regions and rain cells.
In this study we concentrate on the chain-dependent model for the daily precipitation in Sweden, which consists of two steps: first, a model for the sequence of wet/dry days and second, a model for the amount of precipitation on the wet days. For the first, we use high-order Markov chains and for the second we introduce a composite model that incorporates the empirical distribution and the generalized Pareto distribution.
2 Data
Precipitation data from 20 stations in Sweden have been used in the studies presented in this paper. The locations are shown in Fig. 1 and the names of the stations are given in Table 1. The data consist of accumulated daily precipitation collected during 44 years, starting on the 1st of January 1961 and ending on the 31st of December 2004, and are provided by the Swedish Meteorological and Hydrological Institute (SMHI). The number of missing observations in all stations is generally low (< 5%). The time plots of the annual number of wet days (above the threshold 0.1 mm) at the 20 stations are presented in Fig. 2.
Time plots of the annual number of wet days showed that the precipitation regime in some stations (namely, Söderköpping, Rösta and Stensele) contains possible trends. The results presented in the next sections refer to the whole period of data from all stations, but attention should be paid when we refer to the above mentioned stations. In Fig. 3, time plots of the annual amount of precipitation of the wet days are presented. The total amounts of precipitation seem to be stationary over the different years.
3 Model
To model precipitation in Sweden, we have decided to use a chain dependent model. The first part of the model can be dealt with using Markov chains. Gabriel and Neumann (1962) used a first-order stationary Markov chain. The models have since been extended to allow for non-stationarity, both by fitting separate chains to different periods of the year and by fitting continuous curves to the transition probabilities, see Stern and Coe (1984) and references within. The order of Markov chain required has also been discussed extensively, for example Chin (1977) and references therein, with the obvious conclusion that different sites require different orders. Still, the first order Markov chains are a popular choice since they have been shown to perform well for a wide range of different climates, see for example Bruhn et al. (1980), Lana and Burgueno (1998), Aksoy and Bayazit (2000) and Castellvi and Stockle (2001). The main deficiency associated with the use of first order models is that long dry spells are not well reproduced, see Racsko et al. (1991), Guttorp (1995).
To model the amount of precipitation that has occurred during a wet day, different models have been proposed in the literature, all of which assume that the daily amounts of precipitation are independent and identically distributed. Stidd (1973) and Hutchinson (1995) have proposed a truncated normal model for the amount of precipitation with a time dependent parameter, while the Gamma and Weibull distributions have been selected by Geng et al. (1986) as well as Selker and Haith (1990), because of their site-specific shape.
In this study, we model the occurrence of wet/dry days using Markov chains of higher order and for the amount of precipitation we use a composite model, consisting of the empirical distribution function for values below a threshold and the distribution of excesses for values above the given threshold. Such a model is more flexible, describes the tail of the distribution better and additionally allows for dependence in the precipitation process.
Let $Z_t$ be the precipitation at a certain site at time $t$ measured in days. Then, a chain-dependent model for the precipitation is given by
$$Z_t = X_t W_t,$$
where $X_t$ and $W_t$ are stochastic processes such that $X_t$ takes values in $\{0, 1\}$ and $W_t$ takes values in $\mathbb{R}_+ \setminus \{0\}$. The processes $X_t$ and $W_t$ will be referred to as the occurrence of precipitation process and the amount of precipitation process, respectively.
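The product structure $Z_t = X_t W_t$ can be illustrated with a short simulation sketch in Python. Everything specific here is a placeholder assumption: the function name `simulate_precipitation`, the transition table, and the use of independent amounts drawn from a user-supplied sampler (the model in this paper instead lets the amounts depend on each other through a copula).

```python
import random


def simulate_precipitation(n_days, p_wet_given, amount_sampler, seed=0):
    """Simulate Z_t = X_t * W_t for n_days.

    p_wet_given maps each word of the last k occurrence states (a tuple of
    0/1 values, k >= 1) to the probability that the next day is wet.
    amount_sampler(rng) returns a strictly positive amount W_t.
    """
    rng = random.Random(seed)
    k = len(next(iter(p_wet_given)))   # order of the occurrence chain
    history = [0] * k                  # assumed start: k dry days
    z = []
    for _ in range(n_days):
        word = tuple(history[-k:])
        x_t = 1 if rng.random() < p_wet_given[word] else 0
        w_t = amount_sampler(rng)      # drawn every day, used only when wet
        z.append(x_t * w_t)
        history.append(x_t)
    return z
```

A possible call is `simulate_precipitation(365, {(0,): 0.3, (1,): 0.6}, lambda rng: 0.1 + rng.expovariate(0.2))`, where the shifted exponential is only a stand-in for the amount distribution developed later.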
The approach presented in this study provides a mechanism to make predictions of precipitation in time. This is particularly important for many applications in hydrology, ecology and agriculture. For example, at a monthly level, the amount of precipitation and the probability and length of a dry period are required quantities for many applications.
4 Models for the Occurrence of Precipitation
Let $\{X_t, t = t_1, \ldots, t_N\}$ denote the sequence of daily precipitation occurrence, i.e. $X_t = 1$ indicates a wet day and $X_t = 0$ a dry day. A wet day, in the context of this study, occurs when at least 0.1 mm of precipitation was recorded by the rain gauge. The level has been chosen above zero in order to avoid identifying dew and other noise as precipitation and also to avoid difficulties arising from the inconsistent recording of very small precipitation amounts. Moreover, daily precipitation amounts of less than 0.1 mm can have relatively large observational errors, and including them would cause a significant change in the estimated transition probabilities of the occurrences. As a consequence, this would introduce additional errors into the fitted models. The model is fitted over different periods of the year, that is subsets of the $N$ days of the year, that may be assumed stationary.
Before we continue any further we need to introduce some notation. Let $S = \{0, 1\}$ denote the state space of the $k$-Markov chain $X_t$. The elements of $S$ are called letters and an ordering of letters $w \in S^l = S \times \cdots \times S$ is called a word of length $l$, while the words composed of the letters from position $i$ to $j$ in $w$, for some $1 \le i \le j \le l$, are denoted as $w_i^j = (w_i, w_{i+1}, \ldots, w_j)$. Finally, for $k \le l$ let $\tau_k(w) = w_{l-k+1}^l$ denote the $k$-tail of the word $w$, i.e. $\tau_k(w)$ denotes the last $k$ letters of $w$. If no confusion will arise when $k \le j - i$, we also write $\tau_k(w^j)$ instead of $\tau_k(w_i^j)$.
It is assumed that the process $X_t$ is a $k$-Markov chain: a model completely characterized by the transition probability
$$p_{w,j}(t) := P(X_t = j \mid \tau_k(X^{t-1}) = w), \quad j \in S, \quad t = t_1, \ldots, t_N,$$
where $w$ is a word of length $k$ and $X^{t-1} = \{\ldots, X_{t-2}, X_{t-1}\}$ is the whole process up to $t - 1$, so $\tau_k(X^{t-1})$ is the last $k$ days up to and including $X_{t-1}$; that is, $\tau_k(X^{t-1}) = (X_{t-k}, \ldots, X_{t-1})$. Note that, for a 2-state Markov chain of any order, $p_{w,1}(t) + p_{w,0}(t) = 1$. In the special case of a time homogeneous Markov chain, $p_{w,j}(t) = p_{w,j}$ for $t = t_1, \ldots, t_N$, i.e. the transition probabilities are independent of time.
Let $n_{w,j}(t)$ denote the number of years during which day $t$ is in state $j$ and is preceded by the word $w$ (i.e. $\tau_k(X^{t-1}) = w$, $w \in S^k$ and $X_t = j$). Then the probabilities $p_{w,j}(t)$ are estimated by the observed proportions
$$\hat{p}_{w,j}(t) = \frac{n_{w,j}(t)}{n_{w,+}(t)}, \quad w \in S^k, \quad j \in S, \quad t = t_1, \ldots, t_N,$$
where $+$ indicates summation over the subscript. Note also that day 60 (February 29th) has data only in leap years, so day 59 precedes day 61 in non-leap years. Fig. 4 (left) shows the unconditional probability of precipitation, pooled over 5 days for clarity, plotted against $t$ for the data from the station in Lund.
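The observed-proportion estimator can be sketched in Python for the time homogeneous case, where counts are pooled over all days of one wet/dry record instead of being kept per calendar day $t$. The pooling and the function name `estimate_transitions` are assumptions of this sketch.

```python
from collections import Counter


def estimate_transitions(x, k):
    """Estimate P(wet | last k days = w) for every observed word w.

    x is a 0/1 occurrence sequence; counts are pooled over the whole
    sequence, i.e. the chain is treated as time homogeneous.
    """
    n_word = Counter()       # n_{w,+}: how often the word w precedes a day
    n_word_wet = Counter()   # n_{w,1}: how often that day is wet
    for t in range(k, len(x)):
        w = tuple(x[t - k:t])
        n_word[w] += 1
        n_word_wet[w] += x[t]
    return {w: n_word_wet[w] / n_word[w] for w in n_word}
```

Since the chain has two states, the probability of a dry day after the word $w$ is one minus the returned value.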
In the context of environmental processes, non-stationarity is often apparent, as in this case, because of seasonal effects or different patterns in different months. A usual practice is to specify different subsets of the year as seasons, which results in different models for each season, although the determination of an appropriate segregation into seasons is itself an issue.
4.1 Fitting Models to the Occurrence of Precipitation
There is an inter-annual variation in the annual number of wet days, as can be seen in Fig. 2. Moreover, there is also seasonal variation in the mean monthly number of wet days, see Fig. 4 (right) for data from Lund, although this is not as prominent as in other regions of the world. It is possible that the optimum order of the chain describing the wet/dry sequence varies within the year and from one year to another. It is therefore important to properly identify the period of record that can be assumed as time homogeneous.
Moreover, the problem of finding an appropriate model for the occurrence of precipitation process, $X_t$, is equivalent to the problem of finding the order of a multiple step Markov chain. The Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and the Generalized Maximum Fluctuation Criterion (GMFC) order estimators, a short description of which can be found in subsection 8.1, have been applied to the data for each of the stations. Various block lengths were considered for determining the order of the Markov chain, $k$, as suggested in Jimoh and Webster (1996):
• 1 month blocks (i.e. January, February, ..., December),
• 2 month blocks (January - February, February - March, ..., December - January),
• 3 month blocks (January - March, February - April, ..., December - February).
The effect of block length on the order of the Markov chain can be seen in Figs. 5-7. We can notice that grouping the data in blocks of length more than one month results in Markov chains of "smoother" order, in the sense that the order of the chain does not change so fast. It is also interesting to notice that while the order of the Markov chain for the stations 16-20 varies a lot according to the AIC and GMFC estimators, it seems to be almost constant for the BIC order estimator. As expected, the BIC order estimator underestimates the order $k$ of the Markov chain relative to both the AIC and GMFC order estimators for large $k$ and moderate data sets, see Dalevi et al. (2006), while the values of the GMFC order estimator lie between the BIC and AIC ones. The results presented in Figs. 5-7 confirm that the model order is sensitive to the season (month) and the length of the season (number of months) considered, as well as the method used in identifying the optimum order. Possible dependence on the threshold used for identifying wet and dry days has not been studied here. For the rest of this study, we define as seasons the 3 month periods December-February, March-May, June-August and September-November. As can be seen in Fig. 4 for the station in Lund (the rest of the stations provide similar plots), the probability of precipitation is close to constant during these periods, which makes the assumption of stationarity seem plausible. The orders of the Markov chain for these periods can be found in Fig. 7. For the rest of this study the order $k$ of the Markov chain is decided according to the GMFC order estimator.
4.2 Distribution of Dry Spell Length
An interesting aspect of the wet/dry behavior, i.e. the process $X_t$, is the distribution of the dry (wet) spells, i.e., the number of consecutive dry (wet) days, which is an accessible property of multiple step Markov chains.
For a time homogeneous (stationary) $k$-Markov chain $X_t$ ($k \ge 2$) with state space $S$, let $T$ be the first time the process $X_t$ is such that $\tau_2(X^t) = (1, 0)$, i.e.,
$$T = \inf\{t \ge 0 : \tau_2(X^t) = (1, 0)\}.$$
So $T$ is the time of the start of the first dry period. Let also, for the words $u, v \in S^k$,
$$a_{u,v} = P(\tau_k(X^T) = v \mid \tau_k(X^0) = u)$$
denote the probability that the process $X_t$ has at time $T$ a $k$-tail equal to $v$ given that the $k$-tail at time 0 is equal to $u$. The probabilities $a_{u,v}$ are easily obtained for stationary processes, see Norris (2005). Note that at $t = 0$, there may be the start of a dry period, the start of a wet period, the continuation of a dry period or the continuation of a wet period. If $D(X_t)$ denotes the length of the first dry period that starts at time $t = 0$ for the $k$-Markov chain $X_t$, then assuming additionally that the process $X_t$ is time homogeneous, the distribution of the first dry spell can be computed as
$$P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{\{w \in S^k : \tau_2(w) = (1,0)\}} a_{u,w} \, P(\tau_m(X^{m-1}) = \mathbf{0}, X_m = 1 \mid \tau_k(X^0) = w), \tag{1}$$
where $\mathbf{0}$ is used to denote sequences of 0s of appropriate length.
Now, if $v = w\mathbf{0}1$ is a word of length $m + k$ ($\mathbf{0}$ here is of length $m - 1$) and using the fact that the process $X_t$ is a $k$-Markov chain, Eq. (1) can be rewritten as
$$P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{\{w \in S^k : \tau_2(w) = (1,0)\}} a_{u,w} \prod_{i=1}^{m} P(X_i = v_{k+i} \mid \tau_k(X^{i-1}) = \tau_k(v^{k+i-1})). \tag{2}$$
Remark 1 Here we should notice that the distribution of the first dry spell is different from the distribution of the subsequent dry spells for Markov chains of order greater than two. For Markov chains of order one or two there is no need for this distinction. Moreover, the equivalent of Eq. (1) for $k = 1$ is
$$P(D(X_t) = m) = p_{0,0}^{m-1} \, p_{0,1},$$
while for $k = 2$, Eq. (2) simplifies to
$$P(D(X_t) = m) = \begin{cases} p_{10,1} & \text{for } m = 1, \\ p_{10,0} \, p_{00,0}^{m-2} \, p_{00,1} & \text{for } m \ge 2, \end{cases}$$
where $a_{u,v} = 1$ for all $u, v$ in Eq. (1).
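The two closed forms in Remark 1 are easy to evaluate numerically. The Python sketch below is illustrative only; the function and parameter names are hypothetical, with `p01` standing for $p_{0,1}$, `p10_1` for $p_{10,1}$ and `p00_1` for $p_{00,1}$.

```python
def dry_spell_pmf_order1(m, p01):
    """P(D = m) for a first-order chain: m - 1 further dry days, then a wet day."""
    return (1.0 - p01) ** (m - 1) * p01


def dry_spell_pmf_order2(m, p10_1, p00_1):
    """P(D = m) for a second-order chain, following the case split in Remark 1.

    p10_1 = P(wet | previous two days were wet, dry)
    p00_1 = P(wet | previous two days were dry, dry)
    """
    if m == 1:
        return p10_1
    # the spell survives its first day, then m - 2 further dry days, then ends
    return (1.0 - p10_1) * (1.0 - p00_1) ** (m - 2) * p00_1
```

With `p10_1 = 0.5` and `p00_1 = 0.25`, for example, a dry spell of length one has probability 0.5, while longer spells decay geometrically at rate 0.75, which is how a second-order chain can put more mass on long dry spells than a first-order chain with the same one-day behaviour.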
The distribution of the first dry spell can also be used for model selection or model validation purposes. For this, we use the Kolmogorov-Smirnov (KS) test, see Benjamin and Cornell (1970). The one sample KS test compares the empirical distribution function with the cumulative distribution function specified by the null hypothesis.
Assuming that $P_k(x)$ is the true distribution function (of a Markov chain of order $k$), the KS test statistic is
$$D = \sup_{m \in \mathbb{N}_+} |P_k(D(X) \le m) - F_{emp}(m)|,$$
where $F_{emp}(x)$ is the empirical cumulative distribution of the length of the first dry spell. If the data truly come from a Markov chain of order $k$ and the transition probabilities are the correct ones, then by the Glivenko-Cantelli theorem, see Dudewicz and Mishra (1988), the KS statistic converges to zero almost surely (a.s.).
To apply the test, the transition probabilities have been estimated from the data using maximum likelihood for different values of the order $k$ of the Markov chain. To obtain the empirical distribution of the length of the first dry spell, we have computed the length of the dry spells (sequences of zeros) following the first (1, 0). (Here note that this is equivalent to computing the length of the first dry spell for Markov chains of order $k = 1$ or $k = 2$. In the case of $k = 3$, although the distribution of the first dry spell is not exactly the same as the distribution of any dry spell, we have still used all the dry spells available due to shortage of data.) The procedure has been applied separately to data from each station and season. If the first observations were zeros, they were ignored as the continuation of a dry spell. Also, if a dry spell was not over by the end of the season, then it was followed into the next season.
To determine whether the theoretical model was correct or not, Monte Carlo simulations were performed. We have obtained the empirical distribution of the length of the first dry spell using 500 synthetic wet/dry records of 44 years of data (each station and season was treated separately), and the KS statistic was computed for each one of them, which resulted in the distribution of the KS statistic.
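The Monte Carlo procedure can be sketched in Python for the simplest case of a first-order chain, where the dry-spell law is geometric. This is a simplified illustration under stated assumptions: the function names are hypothetical, one long record replaces the 44 seasonal records, and a dry spell still running at the end of the record is dropped instead of being followed into the next season.

```python
import random


def simulate_chain(p01, p11, n, rng):
    """First-order wet/dry chain: p01 = P(wet | dry), p11 = P(wet | wet)."""
    x = [0]
    for _ in range(n - 1):
        p_wet = p11 if x[-1] == 1 else p01
        x.append(1 if rng.random() < p_wet else 0)
    return x


def dry_spell_lengths(x):
    """Lengths of completed dry spells that begin after a wet day."""
    spells, run, seen_wet = [], 0, False
    for state in x:
        if state == 1:
            if seen_wet and run > 0:
                spells.append(run)
            seen_wet, run = True, 0
        elif seen_wet:
            run += 1                    # leading zeros are ignored
    return spells


def ks_statistic(spells, p01):
    """sup over m of |P(D <= m) - F_emp(m)| with the geometric model CDF."""
    if not spells:
        raise ValueError("no completed dry spells in the record")
    n = len(spells)
    d, cum = 0.0, 0
    for m in range(1, max(spells) + 1):
        cum += spells.count(m)
        model_cdf = 1.0 - (1.0 - p01) ** m
        d = max(d, abs(model_cdf - cum / n))
    return d


def ks_null_distribution(p01, p11, n_days, n_rep, seed=0):
    """Sorted KS statistics from n_rep synthetic records of n_days each."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_rep):
        spells = dry_spell_lengths(simulate_chain(p01, p11, n_days, rng))
        if spells:
            stats.append(ks_statistic(spells, p01))
    return sorted(stats)
```

An observed KS statistic can then be compared with an upper quantile of the returned list, for example the entry at index `int(0.9 * len(stats))` for a 10% tail value.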
The suggested orders of the Markov chain using the Kolmogorov-Smirnov statistic at the 10% tail value are collected in Fig. 8. The resulting orders of the Markov chain appear to be close to those obtained by the BIC order estimator. In Table 2, we have collected information on how many data sets have passed the Kolmogorov-Smirnov test at the 10% tail value for the different seasons. Observe that the KS test suggests that the 1-Markov chain, although widely used, is an inadequate model for the majority of the stations in Sweden over the different seasons.
4.3 Distribution of Long Dry Spells
Let us now define as a long dry spell a dry spell with length longer than or equal to the order $k$ of the Markov chain. Then it is easy to show that the distribution of the long dry spell is actually geometric. Indeed, let a long dry spell that starts at time $i$ have length $m \ge k$ and let us also assume that we know that the length of the dry spell is at least $l$. Then, for $m \ge l \ge k$,
$$P(D(X_t) = m \mid \tau_l(X^{i+l-1}) = \mathbf{0}) = p_{\mathbf{0},1} \, p_{\mathbf{0},0}^{m-l} = p_{\mathbf{0},1} (1 - p_{\mathbf{0},1})^{m-l},$$
where as before
$$p_{\mathbf{0},1} = P(X_{n+1} = 1 \mid \tau_k(X^n) = \mathbf{0}), \quad \forall n.$$
Therefore, the expected length of long dry spells is given by
$$E(D(X_t) \mid \tau_l(X^{i+l-1}) = \mathbf{0}) = l + \frac{1 - p_{\mathbf{0},1}}{p_{\mathbf{0},1}}. \tag{3}$$
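For completeness, the step from the geometric probabilities to Eq. (3) can be written out as follows, using only the conditional distribution stated above and the standard mean of a geometric distribution; the shorthand $p = p_{\mathbf{0},1}$ and the index $j = m - l$ are introduced here for brevity.

```latex
\begin{aligned}
E\bigl(D(X_t) \mid \tau_l(X^{i+l-1}) = \mathbf{0}\bigr)
  &= \sum_{m \ge l} m \, p (1 - p)^{m - l}
   = \sum_{j \ge 0} (l + j) \, p (1 - p)^{j} \\
  &= l \sum_{j \ge 0} p (1 - p)^{j} + \sum_{j \ge 0} j \, p (1 - p)^{j}
   = l + \frac{1 - p}{p}.
\end{aligned}
```

As a quick numerical illustration, $p = 0.25$ and $l = 2$ give an expected length of $2 + 0.75 / 0.25 = 5$ days.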
Fig. 9 shows the conditional distribution of the dry spell length, given that the spell has lasted for more than two days, for the first season and the data from Lund. The estimated order of the Markov chain for this data set is 2 using both the GMFC and the KS criterion. A first order Markov chain, the popular model of choice, would in this case obviously underestimate the risk of a long dry spell. A second order Markov chain seems to be the best choice for this particular data set.
It is clear from Table 3 that underestimation of the order $k$ of the Markov chain leads to underestimation of the expected length of the long dry spells, where again a dry spell is defined as long if it has length larger than or equal to the order of the Markov chain.
5 Modeling the Amount Precipitation Process
In this section we model the amounts of daily precipitation. This is done in two steps. Firstly we model the dependence structure of the amount precipitation process and secondly we estimate the marginal distribution.
One of the important features of any climatological data set is that it exhibits dependence between nearby stations or successive days. In this work we are interested in the latter case and the dependence structure is modelled using a two-dimensional Gaussian copula.
After the copula has been estimated, we remove the days with precipitation below the cut-off level of 0.1 mm. That is, we let $Y_t$ be the thinned process resulting from the amount of precipitation process $W_t$ when we consider only the wet days, i.e., $Y_t := W_t \mid X_t = 1$. Then, the marginal distribution of the amounts of daily precipitation is modelled following an approach that combines the fit of the distribution of excesses over a high threshold with the empirical distribution of the thinned data below the threshold.
5.1 Copula
Almost every climatological data set exhibits dependence between successive days. To model the temporal dependence structure of the data we use the two-dimensional Gaussian copula $C$ given by
$$C(u, v; \rho) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left( -\frac{x^2 - 2\rho x y + y^2}{2(1 - \rho^2)} \right) dx \, dy = \Phi_\rho(\Phi^{-1}(u), \Phi^{-1}(v)), \tag{4}$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution and $\Phi_\rho$ is the joint cumulative distribution function of two standard normal random variables with correlation coefficient $\rho$.
To estimate the copula, let
$$A = \{t : Y_t > 0 \text{ and } Y_{t+1} > 0\}$$
be the set of all days with non zero precipitation that were followed by days also with non zero precipitation (greater than 0.1 mm) and
$$u = [Y_{a_1}, Y_{a_2}, \ldots], \quad v = [Y_{a_1 + 1}, Y_{a_2 + 1}, \ldots], \quad a_1, a_2, \cdots \in A$$
be the vectors consisting of the amounts of precipitation during the days indicated in the set $A$ and the following days respectively, both with marginal distribution $F(x)$. Then, transforming the vectors $u$ and $v$ by taking the empirical cumulative distribution corrected by the factor $\frac{n}{n+1}$ ($n$ is the number of days with positive precipitation in the data set) results in vectors $U$ and $V$ respectively that follow the discrete uniform distribution in $(0, 1)$. If the Gaussian copula in Eq. (4) describes correctly the dependence structure of the data, then
$$\left( \Phi^{-1}(U), \Phi^{-1}(V) \right) \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \right).$$
Finally the copula parameter $\rho$ is estimated using Pearson's correlation coefficient. An analytic description of the method and its application can be found in Lennartsson and Shu (2005). The dependence between successive days is demonstrated in Fig. 10 where the transformed data from Lund are plotted.
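The estimation steps above (pair consecutive wet days, transform to corrected empirical probabilities, map to normal scores, take Pearson's correlation) can be sketched in Python using only the standard library. The function names are hypothetical, each vector is ranked within itself as a simplifying assumption, and ties, which are common in gauge data recorded to 0.1 mm, are broken arbitrarily by sort order.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map values to Phi^{-1}(rank / (n + 1)), i.e. the empirical CDF scaled by n / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(r / (n + 1)) for r in ranks]


def pearson(a, b):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def estimate_copula_rho(amounts):
    """Estimate rho from daily amounts, where dry days are coded as 0."""
    pairs = [(amounts[t], amounts[t + 1])
             for t in range(len(amounts) - 1)
             if amounts[t] > 0 and amounts[t + 1] > 0]   # the index set A
    u = [first for first, _ in pairs]
    v = [second for _, second in pairs]
    return pearson(normal_scores(u), normal_scores(v))
```

Because only ranks enter the normal scores, the estimate is unchanged by any increasing transformation of the amounts, which is why the copula can be estimated before the marginal distribution is specified.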
For a thorough coverage of bivariate copulas and their properties see Hutchinson and Lai (1990), Joe (1997), Nelsen (2006), and Trivedi and Zimmer (2005), who provide a copula tutorial for practitioners. The values of the correlation coefficient ρ estimated for each station are collected in Table 4. Notice that all the estimates of the correlation coefficient ρ are statistically significant, which makes the assumption of independence between the data points seem unreasonable.
5.2 Marginal Distribution

Finally, to model the amount of precipitation process we propose an approach that combines the fit of the distribution of excesses over a high threshold with the empirical distribution of the original data below the threshold. We commence our analysis by introducing some notation followed by some introductory remarks. Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables having marginal distribution F(x). Let us also denote by

\[
F_u(x) = P(X \le x \mid X > u),
\]

for x > u, the conditional distribution of X given that it exceeds level u, and assume that $F_u(x)$ can be modelled by means of a generalized Pareto distribution, that is
\[
F_u(x) = 1 - \left(1 + \xi\,\frac{x-u}{\sigma}\right)^{-1/\xi}, \tag{5}
\]

for some σ > 0 and ξ, over the set $\{x : x > u \text{ and } 1 + \xi (x-u)/\sigma > 0\}$, and zero otherwise. Let also $F_{emp}(x)$ denote the empirical distribution, i.e.,

\[
F_{emp}(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\},
\]

where $\mathbf{1}\{\cdot\}$ denotes the indicator function of an event, i.e. the 0-1 random variable which takes value 1 if the condition between brackets is satisfied and 0 otherwise.
Finally, define the function

\[
F_C(x; u) = F_{emp}(x \wedge u) + (1 - F_{emp}(u))\,F_u(x),
\]

which, as can be easily checked, is a probability distribution function that will be used to model the amount of precipitation process. Thus, what needs to be addressed is the choice of the level u above which the excesses can be accurately modelled using a generalized Pareto distribution, as well as methods for the estimation of the distribution parameters.
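To make the construction of the composite distribution concrete, a minimal Python sketch follows. It assumes that the threshold u and the generalized Pareto parameters ξ and σ have already been chosen and fitted; the function name `make_composite_cdf` is hypothetical. Two details are choices made for the sketch rather than part of the definition above: the exponential limit is used when ξ = 0, and the tail distribution is set to 1 beyond the finite upper end point that arises when ξ < 0.

```python
import math
from bisect import bisect_right


def make_composite_cdf(sample, u, xi, sigma):
    """Return F_C(.; u): empirical CDF up to u, generalized Pareto tail above u."""
    data = sorted(sample)
    n = len(data)

    def f_emp(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(data, x) / n

    def f_gpd(x):
        # Conditional excess distribution F_u of Eq. (5); zero at or below u.
        if x <= u:
            return 0.0
        if xi == 0.0:
            # Exponential limit of the generalized Pareto family (assumption).
            return 1.0 - math.exp(-(x - u) / sigma)
        z = 1.0 + xi * (x - u) / sigma
        if z <= 0.0:
            # Beyond the finite upper end point when xi < 0 (assumption).
            return 1.0
        return 1.0 - z ** (-1.0 / xi)

    p_u = f_emp(u)

    def f_c(x):
        # F_emp(min(x, u)) + (1 - F_emp(u)) * F_u(x)
        return f_emp(min(x, u)) + (1.0 - p_u) * f_gpd(x)

    return f_c
```

For example, with the sample 1, 2, ..., 10 and u = 8 the returned function equals the empirical distribution up to 8, where it takes the value 0.8, and distributes the remaining mass 0.2 according to the fitted tail.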
5.2.1 Choice of Threshold Level

Selection of a threshold level u, above which the generalized Pareto distribution assumption is appropriate, is a difficult task in practice; see, for example, McNeil (1996), Davison and Smith (1990) and Rootzen and Tajvidi (1997). Frigessi et al. (2002) suggest a dynamic mixture model for the estimation of the tail distribution without having to specify a threshold in advance. Once the threshold u is fixed, the model parameters ξ and σ are estimated using maximum likelihood, although a number of alternative methods exist; see for instance Resnick (1997) and Crovella and Taqqu (1999) and references therein.

5.2.2 Extreme Value Analysis for Dependent Sequences
The generalized Pareto distribution is asymptotically a good model for the marginal distribution of high excesses of independent and identically distributed random variables, see Coles (2001) and Leadbetter et al. (1983). Unfortunately, independence is an unreasonable assumption for most climatological data sets, since dependence between successive days is to be expected. One way of dealing with the dependence between the excesses is to choose the level u high enough that enough time has passed between successive excesses to make them independent. Another is to use declustering, which is probably the most widely adopted method for dealing with dependent exceedances; it corresponds to filtering the dependent observations to obtain a set of threshold excesses that are approximately independent, see Coles (2001). A simple way of determining m-clusters of extremes, after specifying a threshold u, is to define consecutive excesses of u to belong to the same m-cluster as long as they are separated by less than m + 1 days. It should be noted that the separation of extreme events into clusters is likely to be sensitive to the choice of u, although we do not study this effect in this work. The effect of declustering on the generalized Pareto distribution in Eq. (5) is the replacement of the parameters σ and ξ by $\sigma\theta^{-1}$ and ξ, where θ is the so-called extremal index, loosely defined as

\[
\theta = (\text{limiting mean cluster size})^{-1}.
\]
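The m-cluster rule and the resulting estimate of θ can be illustrated with a short Python sketch. The function names `decluster` and `extremal_index` are hypothetical. The sketch assumes that "separated by less than m + 1 days" means that at most m non-exceeding days lie between two exceedances of the same cluster, so that m = 0 groups only exceedances on adjacent days, and it estimates θ by the ratio of the number of clusters to the number of exceedances, i.e. the reciprocal of the observed mean cluster size.

```python
def decluster(series, u, m):
    """Group the exceedance times of threshold u into m-clusters.

    Two successive exceedances share a cluster when at most m
    non-exceeding days lie between them (assumed interpretation).
    """
    times = [t for t, y in enumerate(series) if y > u]
    clusters = []
    for t in times:
        if clusters and t - clusters[-1][-1] <= m + 1:
            clusters[-1].append(t)   # continue the current cluster
        else:
            clusters.append([t])     # start a new cluster
    return clusters


def extremal_index(series, u, m):
    """Estimate theta as (number of clusters) / (number of exceedances)."""
    clusters = decluster(series, u, m)
    n_exceedances = sum(len(c) for c in clusters)
    if n_exceedances == 0:
        return float("nan")
    return len(clusters) / n_exceedances
```

With the short series `[0, 20, 16, 0, 0, 18, 0, 17, 0, 0, 0, 0, 30]` and u = 15, the exceedances fall on days 1, 2, 5, 7 and 12. Under the assumed rule, m = 0 gives four clusters and an estimate of 0.8, while m = 1 merges days 5 and 7 and gives 0.6.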
5.3 Method Application

In this subsection we apply the method described in subsection 5.2 to model the thinning of the amount of precipitation process, i.e. $Y_t$. To demonstrate the method we use data from the station in Lund. The rest of the stations give similar results.
As we have already seen, the data exhibit temporal dependence. The correlation coefficient ρ of the Gaussian copula for the data from Lund was estimated to be 0.1362. The dependence in the data can also be seen in Fig. 11, where the expected number of m-clusters (with more than one observation) is plotted for different values of m and u = 15mm. The expected number of these m-clusters, assuming the observations are independent, is denoted by 'o' and is consistently less than the observed number of m-clusters, which is denoted by '+'. The expected number of m-clusters computed assuming the observations are actually correlated (ρ = 0.1362) is denoted by '*' and provides an obvious improvement over the assumption of independence. We also provide exact 95% confidence intervals for both cases. The observed values fall inside the confidence interval constructed assuming correlated data.
After the cluster size has been decided (in the case of the station in Lund, m = 0), we turn to the problem of estimating the parameters ξ, σ and θ for the generalized Pareto model. The choice of the specific threshold (u = 15mm) was based on the mean residual life plot. It is expected, see Coles (2001), that for a threshold u above which the generalized Pareto model provides a good approximation for the excesses, the mean residual life plot, i.e. the locus of the points

\[
\left\{ \left(u,\ \frac{1}{n_u}\sum_{i=1}^{n_u} \left(Y_{t(i)} - u\right)\right) : u < Y_t^{\max} \right\},
\]

where $Y_{t(1)}, \dots, Y_{t(n_u)}$ are the $n_u$ observations that exceed u and $Y_t^{\max}$ is the largest observation of the process $Y_t$, should be approximately linear in u. Fig. 12 shows the mean residual life plot with approximate 95% confidence interval for the daily precipitation in Lund. The graph appears to curve from u = 0mm until u = 15mm and is approximately linear after that threshold. It is tempting to conclude that there is no stability until u = 28mm, after which there is approximate linearity, which would suggest u = 28mm. However, such a threshold leaves too few excesses for any meaningful inference (33 observations out of 16000). So we decided to work initially with the threshold set at u = 15mm.
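The points of the mean residual life plot are straightforward to compute. The following Python sketch shows one way of doing so; the function name `mean_residual_life` and the choice to also return the number of excesses at each threshold are assumptions made for illustration. Reporting the number of excesses alongside the mean makes it easy to see when a candidate threshold leaves too few observations, which is the consideration that led to u = 15mm above.

```python
def mean_residual_life(values, thresholds):
    """Return (u, mean excess over u, number of excesses) for each threshold.

    Thresholds that are not exceeded by any observation are skipped,
    matching the restriction u < max(values) in the definition.
    """
    points = []
    for u in thresholds:
        excesses = [y - u for y in values if y > u]
        if excesses:
            points.append((u, sum(excesses) / len(excesses), len(excesses)))
    return points
```

A typical use is to evaluate the function on a fine grid of thresholds and to look for the lowest u beyond which the mean excess is roughly linear in u while the count of excesses is still large enough for inference.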
Finally, the different diagnostic plots for the fit of the generalized Pareto distribution are collected in Fig. 13. The data from the rest of the stations produced similar plots, none of which gave any reason for concern about the quality of the fitted models. The parameters of the generalized Pareto model for the data from all the stations, together with 95% confidence intervals, are collected in Table 4. For three stations (Bolmen, Borås and Haparanda), the estimates of the shape parameter ξ are negative.

Table 5 shows θ for different values of m and threshold u = 15mm for the data from Lund.
6 Evaluation
To verify the validity of the model, we have obtained distribution functions of the different precipitation indices as stipulated by the Expert Team and its predecessor, the CCl/CLIVAR Working Group (WG) on Climate Change Detection, see Peterson et al. (2001) and Karl et al. (1999). Sixteen of those indices are of relevance to this work: two regarding only the occurrence of precipitation process (CDD and CWD), another two regarding only the amount of precipitation process (SDII and Prec90p), and the remaining twelve concerning both processes, see Table 6.
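A few of these indices can be computed from one year of daily data as in the Python sketch below. Since Table 6 is not reproduced here, the definitions used are assumptions based on the descriptions in this section: a day is wet if its amount exceeds 0.1mm, CDD and CWD are the longest runs of dry and wet days, R10mm and R20mm count days with more than 10mm and 20mm, RX1day is the largest daily amount and SDII is the mean amount over wet days. The function names are hypothetical.

```python
def max_run(flags):
    """Length of the longest run of consecutive true values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def precipitation_indices(daily, wet_threshold=0.1):
    """Compute a subset of the indices for one sequence of daily amounts (mm)."""
    wet = [p > wet_threshold for p in daily]
    wet_amounts = [p for p in daily if p > wet_threshold]
    return {
        "CDD": max_run(not w for w in wet),    # longest dry spell (days)
        "CWD": max_run(wet),                   # longest wet spell (days)
        "R10mm": sum(p > 10 for p in daily),   # days with more than 10mm
        "R20mm": sum(p > 20 for p in daily),   # days with more than 20mm
        "RX1day": max(daily),                  # largest daily amount (mm)
        # Mean amount per wet day; 0.0 if there are no wet days.
        "SDII": sum(wet_amounts) / len(wet_amounts) if wet_amounts else 0.0,
    }
```

Applying such a function to every simulated year yields the model distribution of each index, and applying it to every observed year yields the empirical distribution used for comparison.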
Using the chain dependent model, we have obtained the distribution of each index based on 100,000 simulations. This has been compared to the empirical distribution ('.-' line in Figs. 14-18). The agreement between the two distributions is more than satisfactory. Moreover, the empirical distribution always falls inside the exact 90% confidence intervals. The results are presented for the weather station in Lund. The rest of the stations give similar results.
As we can see in Fig. 14 (top left), approximately once every two years we expect to have about 17 days with precipitation of more than 10mm and, in Fig. 14 (top right), about 3 days with precipitation of more than 20mm. But then, see Fig. 14 (bottom left), the precipitation during each one of these three days will be considerably more than 20mm. Fig. 14 (bottom right) tells us that the probability of having 5 consecutive days of really heavy precipitation in Lund is quite high.

As we notice in Fig. 15 (left), once every two years we should expect to have a dry spell with length more than two weeks, and (right) a wet spell of approximately 12 days.

In Fig. 16, we see that every two years in Lund we expect to have almost fifty moderately wet days (top left), almost 18 above moderately wet days (top right), almost 9 very wet days (bottom left) and almost 2 extremely wet days (bottom right).

In Fig. 17 (top left), we see that during the fifty moderately wet days that we expect over a period of two years in Lund we will have about 70% of the total amount of precipitation. Similarly, during the 18 above moderately wet days we expect on average a little more than 40% of the total precipitation amount (top right), for the 8 very wet days about 25% of the total amount (bottom left) and for the 2 extremely wet days about 10% of the total amount (bottom right).

In Fig. 18 (left), we see that the average amount of precipitation per day of precipitation is 3.5mm and also (right) that every year, on average, the downfall exceeds 9.5mm on only 1 out of 10 precipitation days.
7 Conclusions
In this paper, we have modelled the temporal variability of the precipitation in Sweden. The different weather stations have been assumed to have no spatial dependence. It is among our future research plans to also model the spatial variability of the precipitation across the different weather stations in Sweden. Some interesting conclusions can be drawn.

We have used a chain dependent model for the precipitation, which consists of a component for the occurrence of precipitation and a component for the amount of precipitation. For the first component, we have used high order Markov chains with two states. We have shown that the 1-Markov chain model, which has been used extensively, is an inadequate model for most of the Swedish stations. For example, when the distribution of the long dry spell is of interest, the 1-Markov chain underestimates the length of the long dry spell, in some cases by up to half a day.

For the amount of precipitation process, we have used a copula to describe the temporal dependence structure between successive days, which in reality is a Gaussian process with transformed marginals. Then, the cumulative distribution has been modelled in two steps: first using the empirical distribution for the amounts of precipitation that are less than a given threshold, and then using a generalised Pareto distribution to model the excesses above the threshold. Such models have the advantage that they provide a mathematical platform that allows computation of such quantities as return periods.

Finally, the distributions of different weather indices have been computed using Monte Carlo Markov chain techniques and compared to the empirical distributions obtained from the data. The agreement between the two distributions has been really good, which supports the choice of the models.
References

[1] Akaike, H. (1974). A new look at statistical model identification. IEEE Trans. Auto. Control, AC-19, 716-722.
[2] Aksoy, H. and Bayazit, M. (2000). A model for daily flows of intermittent streams. Hydrological Processes, 14, 1725-1744.
[3] Benjamin, J.R. and Cornell, C.A. (1970). Probability, Statistics and Decision for Civil Engineers. McGraw-Hill, Inc., New York, 685 pp.
[4] Bruhn, J.A., Fry, W.E. and Fick, G.W. (1980). Simulation of daily weather data using theoretical probability distributions. J. Appl. Meteorol., 19, 1029-1036.
[5] Castellvi, F. and Stockle, C.O. (2001). Comparing a locally-calibrated versus a generalised temperature weather generation. Trans. ASAE, 44(5), 1143-1148.
[6] Chin, E.H. (1977). Modelling daily precipitation process with Markov chain. Wat. Resour. Res., 13, 949-956.
[7] Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer, London.
[8] Cox, D.R. and Isham, V. (1988). A simple spatial-temporal model for rainfall (with discussion). Proc. R. Soc. Lond. A, 415, 317-328.
[9] Cox, D.R. and Isham, V. (1994). Stochastic models of precipitation. In Statistics for the Environment 2: Water Related Issues (eds V. Barnett and K.F. Turkman), ch. 1, pp. 3-18. Chichester: Wiley.
[10] Crovella, M. and Taqqu, M. (1999). Estimating the heavy tail index from scaling properties. Methodology and Computing in Applied Probability, 1, 55-79.
[11] Dalevi, D., Dubhashi, D. and Hermansson, M. (2006). A New Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity. Stat. Appl. Genet. Mol. Biol., 5, Article 8.
[12] Davison, A.C. and Smith, R.L. (1990). Models for exceedances over high thresholds. J. Roy. Statist. Soc. B, 52, 393-442.
[13] Frigessi, A., Haug, O. and Rue, H. (2002). A Dynamic Mixture Model for Unsupervised Tail Estimation without Threshold Selection. Extremes, 5, 219-235.
[14] Dudewicz, E.J. and Mishra, S.N. (1988). Modern Mathematical Statistics. Wiley Series in Probability and Mathematical Statistics.
[15] Gabriel, K.R. and Neumann, J. (1962). A Markov chain model for daily rainfall occurrences at Tel Aviv. Quart. J. Royal Met. Soc., 88, 90-95.
[16] Geng, S., Frits, W.T., de Vries, P. and Supit, I. (1986). A simple method for generating daily rainfall data. Agric. For. Meteorol., 36, 363-376.
[17] Guttorp, P. (1995). Stochastic Modelling of Scientific Data. Chapman & Hall, London, Chapter 2.
[18] Hutchinson, M.F. (1995). Stochastic space-time weather models from ground-based data. Agric. For. Meteorol., 73, 237-264.
[19] Hutchinson, T.P. and Lai, C.D. (1990). Continuous Bivariate Distributions, Emphasising Applications. Sydney, Australia: Rumsby.
[20] Jimoh, O.D. and Webster, P. (1996). The optimum order of a Markov chain model for daily rainfall in Nigeria. Journal of Hydrology, 185, 45-69.
[21] Joe, H. (1997). Multivariate Models and Dependence Concepts. London: Chapman & Hall.
[22] Karl, T.R., Nicholls, N. and Ghazi, A. (1999). CLIVAR/GCOS/WMO workshop on indices and indicators for climate extremes: Workshop summary. Climatic Change, Vol. 32, 3-7.
[23] Lana, X. and Burgueno, A. (1998). Daily dry-wet behaviour in Catalonia (NE Spain) from the viewpoint of Markov chains. Int. J. Climatol., 18, 793-815.
[24] Leadbetter, M.R., Lindgren, G. and Rootzen, H. (1983). Extremes and Related Properties of Random Sequences and Series. Springer Verlag, New York.
[25] LeCam, L. (1961). A stochastic description of precipitation. Proc. 4th Berkeley Symp., 165-186.
[26] Lennartsson, J. and Shu, M. (2005). Copula Dependence Structure on Real Stock Markets. Masters thesis, Chalmers University of Technology, 2005-01.
[27] Liao, Y., Zhang, Q. and Chen, D. (2004). Stochastic modeling of daily precipitation in China. Journal of Geographical Sciences, 14(4), 417-426.
[28] Mellor, D. (1996). The modified turning bands (mtb) model for space-time rainfall: I, model definition and properties. J. Hydrol., 175, 113-127.
[29] McNeil, A.J. (1996). Estimating the tails of loss severity distributions using extreme value theory. Technical report, Department Mathematik, ETH Zentrum, Zurich.
[30] Nelsen, R.B. (2006). An Introduction to Copulas, 2nd edition. New York: Springer.
[31] Norris, J.R. (2005). Markov chains. Cambridge University Press.
[32] Peterson, T.C., Folland, C., Gruza, G., Hogg, W., Mokssit, A. and Plummer, N. (2001). Report on the Activities of the Working Group on Climate Change Detection and Related Rapporteurs 1998-2001. World Meteorological Organisation, WCDMP-47, WMO-TD 1071.
[33] Racsko, P., Szeidl, L. and Semenov, M. (1991). A serial approach to local stochastic weather models. Ecol. Model., 57, 27-41.
[34] Resnick, S.I. (1997). Heavy tail modeling and teletraffic data. The Annals of Statistics, 25(5), 1805-1869.
[35] Rootzen, H. and Tajvidi, N. (1997). Extreme value statistics and wind storm losses: A case study. Scandinavian Actuarial Journal, 1, 70-94.
[36] Rodríguez-Iturbe, I., Cox, D. and Isham, V. (1987). Some models for rainfall based on stochastic point processes. Proc. R. Soc. Lond. A, 410, 269-288.
[37] Rodríguez-Iturbe, I., Cox, D. and Isham, V. (1988). A point process model for rainfall: further developments. Proc. R. Soc. Lond. A, 417, 283-298.
[38] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Stat., 6, 461-464.
[39] Selker, J.S. and Haith, D.A. (1990). Development and testing of simple parameter precipitation distributions. Water Resour. Res., 26(11), 2733-2740.
[40] Smith, R.L. and Robinson, P.J. (1997). A Bayesian approach to the modelling of spatial-temporal precipitation data. Lect. Notes Statist., 237-269.
[41] Srikanthan, R. and McMahon, T.A. (2001). Stochastic generation of annual, monthly and daily climate data: A review. Hydrol. Earth Syst. Sci., 5(4), 653-670.
[42] Stern, R.D. and Coe, R. (1984). A Model fitting Analysis of Daily Rainfall Data. J. R. Statist. Soc. A, 147, Part 1, 1-34.
[43] Stidd, C.K. (1973). Estimating the precipitation climate. Wat. Resour. Res., 9, 1235-1241.
[44] Tong, H. (1975). Determination of the order of a Markov chain by Akaike's information criterion. J. Appl. Prob., 12, 486-497.
[45] Trivedi, P.K. and Zimmer, D.M. (2007). Copula Modelling: An Introduction for Practitioners. Foundations and Trends in Econometrics, Vol. 1, No 1, 1-111.
[46] Waymire, E. and Gupta, V.K. (1981). The mathematical structure of rainfall representations: 3. Some applications of the point process theory to rainfall processes. Wat. Resour. Res., 17, 1287-1294.
[47] Waymire, E., Gupta, V.K. and Rodríguez-Iturbe, I. (1984). Spectral theory of rainfall intensity at the meso-β scale. Wat. Resour. Res., 20, 1453-1465.
[48] Woolhiser, D.A. (1992). Modelling daily precipitation-progress and problems. In: A. Walden and P. Guttorp (Editors), Statistics in the Environmental and Earth Sciences. Edward Arnold, London, pp. 71-89.
8 Appendix

8.1 Review of Mathematical Order Estimators
Let $X_t$ denote a k-Markov chain that is defined on a state space S and $x_1^n$ its realisation. Let also $P_{ML(k)}(x_1^n)$ be the kth order maximum likelihood, i.e.

\[
P_{ML(k)}(x_1^n) = \max P(X_1^k) \prod_{i=k+1}^{n} P\left(X_i = x_i \mid \tau_k(X^{i-1}) = \tau_k(x^{i-1})\right).
\]

Tong (1975) reported that the Akaike Information Criterion (AIC) order estimator could be used as an objective technique for determining the optimum order k of the chain, see also Akaike (1974). The optimum order k is the order that minimises the loss function:

\[
\hat{k}_{AIC}(x_1^n) = \operatorname{argmin}_k \left( -\log P_{ML(k)}(x_1^n) + |S|^k \right).
\]

Schwarz (1978) presented an alternative technique, the Bayesian Information Criterion (BIC) order estimator, whose consistency under general conditions was only recently established. The optimum order k is the order that minimises the loss function, which is now given by:

\[
\hat{k}_{BIC}(x_1^n) = \operatorname{argmin}_k \left( -\log P_{ML(k)}(x_1^n) + \frac{|S|^k (|S| - 1)}{2} \log(n) \right).
\]

Dalevi et al. (2006) showed using experimental results that the BIC order estimator tends to underestimate the order as k gets larger for moderate data sizes.
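The AIC and BIC order estimators can be sketched in Python as follows. The sketch uses the two loss functions exactly as they are written above. The function names `neg_log_ml` and `estimate_order` are hypothetical, and one simplification is an assumption of the sketch: the likelihood is conditioned on the first `k_max` observations, so that the term for the initial block is dropped and every candidate order is scored on the same transitions.

```python
import math
from collections import Counter


def neg_log_ml(x, k, k_max):
    """Negative maximised conditional log-likelihood of an order-k chain.

    The first k_max symbols are treated as given, so all orders up to
    k_max are compared on the same set of transitions (assumption).
    """
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(k_max, len(x)):
        context = tuple(x[i - k:i])   # the k symbols preceding position i
        context_counts[context] += 1
        transition_counts[(context, x[i])] += 1
    # The ML transition probabilities are the relative frequencies.
    return -sum(c * math.log(c / context_counts[context])
                for (context, _), c in transition_counts.items())


def estimate_order(x, k_max=5, n_states=2):
    """Return the pair (AIC order, BIC order) among orders 0, ..., k_max."""
    n = len(x)
    aic, bic = {}, {}
    for k in range(k_max + 1):
        nll = neg_log_ml(x, k, k_max)
        aic[k] = nll + n_states ** k
        bic[k] = nll + n_states ** k * (n_states - 1) / 2 * math.log(n)
    return min(aic, key=aic.get), min(bic, key=bic.get)
```

As a simple check, a strictly alternating dry/wet sequence is predicted perfectly from the previous day alone, so both criteria select order 1 for it: higher orders fit no better and only add to the penalty.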
Finally, the Maximal Fluctuation Criterion (MFC), contrary to the AIC and BIC order estimators, was specifically designed for multiple step Markov chains. For any realisation $x \in S^n$ of the k-Markov chain, let $N_x(w) = |\{i \in [1, n] : \tau_l(x^i) = w\}|$, $w \in S^l$, denote the number of times w occurs in x. The Peres-Shields fluctuation function is defined as

\[
\Delta_k(v) = \max_{s \in S} \left| N_x(vs) - \frac{N_x(\tau_k(v)s)}{N_x(\tau_k(v))}\, N_x(v) \right|.
\]

When the order of the Markov chain is k or less, this fluctuation is small. Therefore, the Maximal Fluctuation Criterion (MFC) order estimator is defined as

\[
\hat{k}_{MFC}(x_1^n) = \min\left\{ k \ge 0 : \max_{k < |v| < \log\log(n)} \Delta_k(v) < n^{3/4} \right\}.
\]

In practice the function log log(·) is substituted by any function that grows slower than log(·). Dalevi et al. (2006) suggested the Generalized Maximum Fluctuation Criterion (GMFC) order estimator, which is closely related to the MFC order estimator,

\[
\hat{k}_{GMFC}(x_1^n) = \operatorname{argmax}_k \frac{\max_{k-1 < |v| < f(n)} \Delta_{k-1}(v)}{\max_{k < |v| < f(n)} \Delta_k(v)},
\]

where f(n) is any function that satisfies the same conditions as for the MFC order estimator.
Fig. 1: Location of the stations.
Fig. 2: Time plot of annual number of wet days.
Fig. 3: Time plot of annual amount of precipitation.
Fig. 4: Lund, Sweden (data from 1961 to 2004). (Left): Observed p̂(t) pooled over 5 days. (Right): Mean number of wet days per month ("+"), and per season (solid lines).
Fig. 5: k-Markov chain orders for block lengths of one month (Jan, Feb, ...).
Fig. 6: k-Markov chain orders for block lengths of two months (Jan-Feb, Feb-Mar, ...).
Fig. 7: k-Markov chain orders for block lengths of three months (Jan-Mar, Feb-Apr, ...).
Fig. 8: Order of Markov chain as suggested by the Kolmogorov-Smirnov statistic at 10% tail value for each station and season.
Fig. 9: Conditional distribution of Dry Spell given the Dry Spell is longer than or equal to 3 days for k-Markov chain models of order k = 1, k = 2 and k = 3 and the data from Lund. Data are from the winter months December-February.
Fig. 10: Plot of the dependence structure with the marginal distributions transformed to standard normal.
Fig. 11: Number of m-clusters with more than one observation. '+' denotes the observed and 'o' the theoretical number of m-clusters assuming that the observations are independent, while '*' denotes the number of m-clusters using ρ = 0.1362. Line '--' denotes the 95% confidence interval for the theoretical number of m-clusters assuming independence, while '-.' denotes the 95% confidence interval for the theoretical number of m-clusters assuming ρ̂ = 0.1362.
Fig. 12: Mean residual life plot of amount precipitation process from Lund; dotted lines give the 95% confidence interval.
Fig. 13: Diagnostic plots for threshold excess model fitted to daily precipitation data from the station in Lund.
Fig. 14: Plots of R10mm (top left), R20mm (top right), RX1day (bottom left) and RX5day (bottom right). Theoretical distribution '-' and empirical distribution '.-'.
Fig. 15: Plot of maximum number of consecutive dry days (left), and maximum number of consecutive wet days (right).
Fig. 16: Plot of the probability of number of moderate wet days (top left), above moderate wet days (top right), very wet days (bottom left) and extremely wet days (bottom right).
Fig. 17: Percentage of precipitation during the moderately wet days (top left), the above moderate wet days (top right), the very wet days (bottom left) and the extremely wet days (bottom right).
Fig. 18: Plot of the average amount of precipitation per day of precipitation (left) and the 90% quantile of the amount of precipitation of the thinned precipitation process (right).
Table 1: Names of weather stations.
Table 2: Number of data sets that have passed the Kolmogorov-Smirnov test at the 10% tail value for different orders of the Markov chain. S1 stands for Dec.-Feb., S2 for Mar.-May, S3 for Jun.-Aug. and S4 for Sep.-Nov.
Table 3: Expected length of long dry spells in days for season Dec-Feb in Lund.
Table 4: Extremal parameters and their 95% confidence intervals for each weather station.
Table 5: Values of the parameter θ for different choices of m clusters.
Table 6: Weather Indices and their mathematical expressions. The quantiles q(·) have been estimated using the observed data.
[Figure 1]
[Figure 2: one panel per station (Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malugn, Vattholma, Myskåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando); x-axis: Years (1970 to 2000); y-axis: annual no. of wet days (0 to 300).]
[Figure 3: one panel per station (Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malugn, Vattholma, Myskåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando); x-axis: Years (1980 to 2000); y-axis: amount of precip. (mm) (500 to 1500).]
[Figure 4: two panels; prob. of precip. (0.3 to 0.8) against month (Jan to Dec), and mean no. of wet days (10 to 20) against month (Jan to Dec).]
[Figure 5: three plates, "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator"; each with panels st 01 to st 20 showing the estimated order (1 to 5) against month (Jan to Dec).]
[Figure 6: three plates, "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator"; each with panels st 01 to st 20 showing the estimated order (1 to 5) against month (Jan to Dec).]
[Figure 7: three plates, "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator"; each with panels st 01 to st 20 showing the estimated order (1 to 5) against month (Jan to Dec).]
[Figure 8: "Estimated Orders by KS-criterion of dry spell order estimator"; panels st 01 to st 20 showing the estimated order (1 to 5) against month (axis labels Jan, May, Nov).]
[Figure 9: probability (0.3 to 1) against time (days) (2 to 14); curves: Empirical, 3-Markov, 2-Markov, 1-Markov.]
[Figure 10: scatter plot of T(Yt+1) against T(Yt); both axes from -4 to 4.]
[Figure 11: "No clusters with more than one obs" (0 to 50) against m (0 to 6).]
[Figure 12: E[X−u|X>u] (0 to 12) against u (0 to 60).]
[Figure 13: four panels; Probability Plot (GP model against Extreme Empirical model), Quantile Plot (GP model (mm) against Extreme Empirical model (mm)), Return Level Plot (Return Level against Return Period (Years)) and Density Plot (no of observations against Amount of precipitation (mm)).]
[Figure 14: four panels, each showing "Model cumul. distr." and "Empirical distr." (0 to 1); Index R10mm and Index R20mm against no. of days, Index RX1day and Index RX5day against amount of prec. (mm).]
Figure 15. Model cumulative distribution and empirical distribution for the indices CDD and CWD (horizontal axis: no. of days); one panel per index, vertical axis from 0 to 1.
Figure 16. Model cumulative distribution and empirical distribution for the indices R75p, R90p, R95p and R99p (horizontal axis: no. of days); one panel per index, vertical axis from 0 to 1.
Figure 17. Model cumulative distribution and empirical distribution for the indices R75pTOT, R90pTOT, R95pTOT and R99pTOT (horizontal axis: precipitation fraction); one panel per index, vertical axis from 0 to 1.
Figure 18. Model cumulative distribution and empirical distribution for the indices SDII and Prec90p (horizontal axis: amount of prec., mm); one panel per index, vertical axis from 0 to 1.
Table 1
Number  Name
1       Lund
2       Bolmen
3       Hanö
4       Borås
5       Varberg
6       Ungsberg
7       Säffle
8       Söderköping
9       Stockholm
10      Malung
11      Vattholma
12      Myskelåsen
13      Härnösand
14      Rösta
15      Piteå
16      Stensele
17      Haparanda
18      Kvikkjokk
19      Pajala
20      Karesuando
Table 2
        Season
Model   S1   S2   S3   S4
k=1      1    0    1    6
k=2     20   20   20   20
k=3     20   20   20   20
Table 3
Model                 l=1    l=2    l=3
k=1                   2.49   3.49   4.49
k=2                   -      3.91   4.91
k=3                   -      -      5.11
Observed mean value   2.56   3.97   5.23
Table 4
Station      σ̂      CI for σ̂         ξ̂        CI for ξ̂          θ̂      u (mm)  ρ̂
Lund         5.91   (4.93, 7.03)     0.076    (-0.041, 0.236)   0.935  15      0.1362
Bolmen       6.44   (5.56, 7.41)     -0.0002  (-0.095, 0.116)   0.921  15      0.2008
Hanö         5.29   (3.044, 8.737)   0.458    (0.115, 1.05)     0.977  25      0.1649
Borås        7.63   (7.01, 8.28)     -0.011   (-0.067, 0.053)   0.794  10      0.1982
Varberg      5.48   (4.687, 6.378)   0.106    (0.001, 0.236)    0.926  15      0.1206
Ungsberg     5.768  (4.622, 7.115)   0.245    (0.089, 0.445)    0.925  15      0.1843
Säffle       6.62   (5.96, 7.329)    0.099    (0.027, 0.183)    0.857  10      0.1809
Söderköping  6.259  (4.32, 8.884)    0.297    (0.1, 0.649)      0.984  25      0.1678
Stockholm    5.597  (4.827, 6.453)   0.135    (0.033, 0.259)    0.903  10      0.1523
Malung       6.355  (5.676, 7.095)   0.08     (0.004, 0.17)     0.86   10      0.2280
Vattholma    4.964  (3.521, 6.784)   0.334    (0.098, 0.667)    0.984  20      0.1709
Myskelåsen   6.854  (5.962, 7.844)   0.019    (-0.072, 0.13)    0.849  10      0.2311
Härnösand    7.863  (7.053, 8.742)   0.087    (0.011, 0.175)    0.832  10      0.2068
Rösta        6.276  (5.453, 7.19)    0.032    (-0.062, 0.145)   0.876  10      0.2116
Piteå        5.937  (4.429, 7.822)   0.19     (0.004, 0.456)    0.96   20      0.2010
Stensele     7.66   (6.098, 9.5)     0.041    (-0.11, 0.236)    0.915  15      0.2249
Haparanda    5.628  (4.405, 7.07)    -0.073   (-0.196, 0.125)   0.984  18      0.1871
Kvikkjokk    5.66   (5.01, 6.36)     0.04     (-0.04, 0.137)    0.864  10      0.2526
Pajala       5.033  (3.705, 6.728)   0.356    (0.153, 0.646)    0.966  18      0.2385
Karesuando   5.303  (4.117, 6.754)   0.12     (-0.037, 0.34)    0.922  15      0.2206
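As a hedged illustration of how the fitted values in Table 4 can be read, the sketch below evaluates the conditional tail probability P(X > x | X > u) and the mean excess E[X - u | X > u] of a generalized Pareto tail. It assumes that σ̂ and ξ̂ are the scale and shape of the standard generalized Pareto parameterization for exceedances over the threshold u; the function names gp_survival and gp_mean_excess are hypothetical and are not taken from the thesis.

```python
import math


def gp_survival(x, u, sigma, xi):
    """P(X > x | X > u) for a generalized Pareto tail above threshold u.

    Assumes the standard parameterization with scale sigma > 0 and shape xi.
    """
    if x <= u:
        return 1.0
    z = (x - u) / sigma
    if abs(xi) < 1e-12:
        # Limiting case xi -> 0: exponential tail.
        return math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0.0:
        # For xi < 0 the distribution has a finite upper endpoint.
        return 0.0
    return t ** (-1.0 / xi)


def gp_mean_excess(sigma, xi):
    """E[X - u | X > u] = sigma / (1 - xi), which is finite only for xi < 1."""
    if xi >= 1.0:
        raise ValueError("mean excess is infinite for xi >= 1")
    return sigma / (1.0 - xi)


# Values from the Lund row of Table 4: u = 15 mm, sigma = 5.91, xi = 0.076.
u, sigma, xi = 15.0, 5.91, 0.076
p40 = gp_survival(40.0, u, sigma, xi)   # chance of exceeding 40 mm, given > 15 mm
mean_excess = gp_mean_excess(sigma, xi)  # expected exceedance over 15 mm, in mm
print(p40, mean_excess)
```

Under these assumptions the mean excess depends only on the scale and shape, so for Lund it is 5.91 / (1 - 0.076), roughly 6.4 mm above the 15 mm threshold.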
Table 5
m   θ̂
0   0.9144
1   0.8836
2   0.8425
3   0.8322
Table 6
Index     Description                                    Formula
R10mm     Heavy precipitation days                       $\sum_i 1\{Z_i > 10\}$
R20mm     Very heavy precipitation days                  $\sum_i 1\{Z_i > 20\}$
RX1day    Highest 1 day precipitation amount             $\max_i Z_i$
RX5day    Highest 5 day precipitation amount             $\max_i \sum_{j=0}^{4} Z_{i+j}$
CDD       Max number of consecutive dry days             $\max\{j : \tau_j(X^{i}) = 0\}$
CWD       Max number of consecutive wet days             $\max\{j : w = \tau_j(X^{i}),\ w_k > 0\ \forall k\}$
R75p      Moderate wet days                              $\sum_i 1\{Z_i > q_{0.75}\}$
R90p      Above moderate wet days                        $\sum_i 1\{Z_i > q_{0.90}\}$
R95p      Very wet days                                  $\sum_i 1\{Z_i > q_{0.95}\}$
R99p      Extremely wet days                             $\sum_i 1\{Z_i > q_{0.99}\}$
R75pTOT   Precipitation fraction due to R75p             $\sum_i Z_i 1\{Z_i > q_{0.75}\} / \sum_i Z_i$
R90pTOT   Precipitation fraction due to R90p             $\sum_i Z_i 1\{Z_i > q_{0.90}\} / \sum_i Z_i$
R95pTOT   Precipitation fraction due to R95p             $\sum_i Z_i 1\{Z_i > q_{0.95}\} / \sum_i Z_i$
R99pTOT   Precipitation fraction due to R99p             $\sum_i Z_i 1\{Z_i > q_{0.99}\} / \sum_i Z_i$
SDII      Simple daily intensity index                   $\sum_i Y_i / \sum_i 1\{Y_i > 0\}$
Prec90p   90%-quant. of thinned amount of precipitation  $F_Y^{-1}(0.9)$
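The threshold and spell indices of Table 6 can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: it assumes a single non-empty list z of daily precipitation amounts in mm for one period, treats a day as wet when its amount is strictly positive (as in the SDII and CWD formulas), and uses the same series for SDII. The helper names longest_run and precipitation_indices are hypothetical. The percentile indices (R75p to R99pTOT, Prec90p) are left out because they need the quantiles q from a reference distribution.

```python
from itertools import groupby


def longest_run(flags):
    """Length of the longest run of consecutive True values in flags."""
    return max((sum(1 for _ in g) for k, g in groupby(flags) if k), default=0)


def precipitation_indices(z):
    """Compute a subset of the Table 6 indices for one period of daily data.

    z: non-empty list of daily precipitation amounts in mm (assumed input).
    """
    n = len(z)
    wet = [v > 0 for v in z]  # wet day: strictly positive amount (assumption)
    n_wet = sum(wet)
    return {
        "R10mm": sum(v > 10 for v in z),   # days with more than 10 mm
        "R20mm": sum(v > 20 for v in z),   # days with more than 20 mm
        "RX1day": max(z),                  # highest 1 day amount
        # Highest total over 5 consecutive days; one window if n < 5.
        "RX5day": max(sum(z[i:i + 5]) for i in range(max(n - 4, 1))),
        "CDD": longest_run([not w for w in wet]),  # longest dry spell
        "CWD": longest_run(wet),                   # longest wet spell
        # Mean amount per wet day; undefined when there are no wet days.
        "SDII": sum(z) / n_wet if n_wet else float("nan"),
    }


# Hypothetical ten day series, for illustration only.
z = [0.0, 12.5, 3.0, 0.0, 0.0, 0.0, 25.0, 1.0, 0.5, 0.0]
indices = precipitation_indices(z)
print(indices)
```

For the illustrative series above, two days exceed 10 mm, the longest dry spell and the longest wet spell both last 3 days, and the wettest 5 day window totals 28 mm.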