Download Proj_1 - University of Washington

Project 1
Advanced Hydrology 599
Survey of Point Process Precipitation Models and Implementation of the Bartlett-Lewis Cluster Model Formulation
Alan Hamlet
1/22/02
Overview
For this project we reviewed selected papers from the literature on stochastic point
process precipitation models and wrote computer code to implement two of the more
successful techniques in order to become more familiar with the technical details. The
work in coding the models was divided into four parts: 1) construction of
three random number generators for the Poisson, geometric, and exponential distributions; 2) coding
of two cluster-based models and aggregation of their output to a daily timestep; 3) testing using one case
study from the literature, which included a statistical analysis of observed and simulated
data; and 4) a sensitivity analysis of the parameter space around the optimal
solution used in testing the code.
It was hoped that we would also be able to fully implement the process of fitting these
models to some new data, but this proved to be a challenging problem in and of itself
that exceeded the time available for the project.
Chunmei and I wrote the code for the number generators and the accompanying statistical
analysis as a shared resource, and we also used a single precipitation data set (15 May-16
June for the Denver airport) that had previously been fit to both the Neyman-Scott and
Bartlett-Lewis formulations as a test case in two papers (Rodriguez-Iturbe et al.
1984, 1987). The code for each model formulation was implemented
individually, with Chunmei examining the Neyman-Scott formulation and me the
Bartlett-Lewis formulation.
At the outset of the project we reviewed a number of articles on the historic development
of point process models, but settled on Rodriguez-Iturbe et al. (1984, 1987) as the best surveys
of available practice and techniques. There are a few more recent articles discussing
these techniques as well (e.g. Velghe et al. 1994), but we primarily looked at what these
authors were attempting to do, and did not spend much time on the details of these
investigations. Foufoula-Georgiou (1985) also provided some useful references,
discussion, and information on technical details.
A Brief Overview of Some Point Process Model Types
Some models are based on simple Poisson rate processes that determine the time
between rain events, or on variations of this basic concept that give additional temporal
structure to the simulated storm events by initiating rectangular pulses with stochastic
length and depth. These models have been shown to have difficulty reproducing the range of statistical
properties that characterize precipitation events at a point. They can, for
example, be trained to reproduce the mean and variance of the data, but may fail to
capture the autocorrelations present in the observed data (Rodriguez-Iturbe et al. 1987).
They also have the limitation that aggregation to different time steps
does not necessarily reproduce the observed statistics well in each case, i.e. the
parameters must be chosen for a particular timestep.
So-called "cluster-based" models add more temporal structure by linking a Poisson-based
storm initiation process (a stochastic time between “storm origins”) with the initiation of
a number of storm “cells”, each of which is characterized either as a pulse of stochastic
“depth” (as in some Neyman-Scott formulations), or by a stochastic precipitation rate and
duration (as in the Bartlett-Lewis formulation). The contributions from these individual
storm cells are then summed to construct the total precipitation occurring in
some regular interval of time (e.g. a time series of hourly or daily data). Models of this
type have been demonstrated to successfully reproduce a number of meaningful statistics
such as the mean, variance, skewness, autocorrelation, and probability of days without
rain. These more sophisticated schemes also demonstrate better stability within the fitted
parameter space when moving between different time scales (i.e. the same parameters
will work reasonably well when aggregating the pulse data to hourly or daily timesteps)
(Rodriguez-Iturbe et al. 1987).
The process of fitting appropriate parameters to reproduce the statistics of a given data set
involves the solution of a number of simultaneous non-linear equations equating the
secondary characteristics (e.g. mean, variance, etc.) of the model formulation and the
statistics of the observed data.
Bartlett-Lewis Formulation
The Bartlett-Lewis formulation produces a sequence of storm cells having intensity
(precipitation rate) and duration and uses five parameters to define the stochastic nature
of the construction process. There is no theoretical constraint on the type of distributions
used in each step of the process, but the formulation used here (Rodriguez-Iturbe et al.
1987) has been demonstrated to work reasonably well in reproducing the autocorrelation
of observed data. The process of generating a series of "storm cells" with stochastic
duration and intensity may be summarized as follows:
1) The generation of “storm origins” is assumed to follow a Poisson rate
process with rate lambda, so the expected value of the time
between storm origins is 1/lambda. This parameter has a strong effect on the mean,
variance, and time between precipitation events.
2) Each storm origin generates a variable number of “storm cells” until a certain time
from the storm origin is exceeded, at which point cell generation ceases. This time
from the storm origin, which I will call the “generation duration”, is taken to be
exponentially distributed with an expected value of 1/gamma. This parameter
contributes to the persistence of storm cells once a storm is initiated, and almost
completely determines the autocorrelation at lag 1.
3) The first cell is placed at the time of the storm origin, and additional
cells are generated (until the generation duration is exceeded) by a second Poisson
process with rate beta, so the mean time between storm cells is 1/beta.
This parameter partly determines the intensity of each storm event.
4) The intensity of each cell’s contribution is taken to be exponentially distributed with
expected value 1/eta. This parameter partly determines the intensity of each storm
event.
5) The duration of each cell’s contribution is taken to be exponentially distributed with
expected value 1/nu. This parameter partly determines the persistence of each storm
event and the autocorrelation within storms.
The figure below shows a schematic of the cell generation process. Each storm cell
generated is defined in the code simply by a start time, an end time, and an intensity.
[Figure: Schematic of Bartlett-Lewis Cell Generation Scheme. Labeled elements: Storm Origin, Cells 1-4, Cell Duration, Cell Intensity, Time to next cell, End of Cell Generation.]
The method of aggregation to a daily time step is discussed below under Some Technical Details.
A Brief Overview of Parameter Optimization Strategies
The process of fitting a set of model parameters to a set of statistics derived from
observed precipitation data is difficult in practice, and a number of researchers have
conducted studies attempting to simplify these procedures. The most basic technique, and
probably the most objective, is to explicitly solve a set of independent non-linear
equations using numerical techniques. This is the approach of Rodriguez-Iturbe et al.
(1987) in our test case. Other researchers have made the case that some of the parameters
tend to dominate the effects on certain statistics of the simulated data, and that these can
be selected as initial estimates, thus reducing the degrees of freedom of the problem.
This would not appear to be a viable option in the case of the Bartlett-Lewis formulation,
however (see the section below on parameter sensitivity analysis).
Some Technical Details
Number Generators
The number generators were all taken from code in Numerical Recipes in C (Press et al.
1997) and were all based on the uniform deviate generator ran1 (Press et al. 1997, p.
280). Exponential and geometric distribution values were computed as continuous values by inverting
continuous forms of the cumulative distribution functions; the Poisson processes were assumed to generate
discrete integer values of the rate (i.e. 0, 1, 2, 3, ...), which were generated using the
poidev routine (Press et al. 1997, p. 294). As coded in Appendix 1, this routine uses the
“rejection method” only for means of 12 or more and a direct multiplication method for
smaller means, which covers both rate parameters used here.
Use of Poisson Distributed Values
Some confusion was initially encountered over how to treat a value of 0 returned from the
discrete Poisson number generator (a frequent occurrence in the case of the storm origin,
for example). It was decided that the appropriate action on receiving a value of zero
for the Poisson rate is simply to advance the time variable one hour and generate no
cells. For the second Poisson process, which determines the rate of cell generation, a value of
zero from the number generator likewise advances the time of generation one hour,
and no cell is generated at that time; the cell generation process nevertheless
continues until the generation duration is exceeded.
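The rule described above, together with the treatment of non-zero counts in the Appendix 1 code (where the clock is advanced by 1/P for a count P), can be sketched as a pair of small helpers. The names time_advance and event_generated are hypothetical.

```c
/* Time step (hours) implied by a Poisson count, following the rule above:
   a count of zero advances the clock one hour; a positive count n
   advances it by 1/n hours, as in the Appendix 1 code. */
double time_advance(int count)
{
    if (count <= 0) return 1.0;
    return 1.0 / (double)count;
}

/* Nonzero when the count produces an event (a storm origin or a cell). */
int event_generated(int count)
{
    return count > 0;
}
```

For example, a count of 0 moves the clock forward one hour with no event, while a count of 2 moves it forward half an hour and generates one event.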
Aggregating the Contribution of Each Storm Cell to Daily Time Steps
Each storm cell is saved in the code as a start time, an end time, and an intensity. In
generating the daily time series for precipitation, the contribution of each storm cell to
each individual 24-hour segment is determined as the portion of the cell’s duration (hr)
falling within the 24-hour “window” multiplied by the cell’s intensity (mm/hr). If more
than one cell affects the 24-hour window, the contributions are summed.
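The window calculation described above can be sketched as follows. This is an illustrative restatement, not the Appendix 1 routine: it assumes the cells are held in three parallel arrays, and the names cell_contribution and aggregate_daily are hypothetical.

```c
/* Depth (mm) contributed by one cell to the window [win_start, win_end]:
   the overlap between cell and window (hr) times the intensity (mm/hr). */
double cell_contribution(double cell_start, double cell_end, double intensity,
                         double win_start, double win_end)
{
    double low = cell_start > win_start ? cell_start : win_start;
    double up = cell_end < win_end ? cell_end : win_end;

    if (up <= low) return 0.0;   /* cell lies entirely outside the window */
    return (up - low) * intensity;
}

/* Fill daily[0..ndays-1] with the summed contributions of all cells,
   where day d covers hours 24*d to 24*(d+1). */
void aggregate_daily(const double *start, const double *end,
                     const double *intensity, int ncells,
                     double *daily, int ndays)
{
    for (int d = 0; d < ndays; d++) {
        double win_start = 24.0 * d;
        double win_end = win_start + 24.0;

        daily[d] = 0.0;
        for (int c = 0; c < ncells; c++)
            daily[d] += cell_contribution(start[c], end[c], intensity[c],
                                          win_start, win_end);
    }
}
```

A cell running from hour 20 to hour 30 at 2 mm/hr, for example, contributes 8 mm to the first day and 12 mm to the second.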
Model Test Case
The test case for the coded model used the fitted parameters reported in Rodriguez-Iturbe
et al. (1987) for data from the Denver airport for the period 15 May-16 June of
the years 1949-1976. We obtained the daily rainfall data from the Earth Info CDs,
and calculated the statistics used in the fitting process for both the observed data and our
model simulations aggregated to a daily timestep. The table below shows the model
parameters used. The model was allowed to produce storm cells until the elapsed time
was greater than 100,000 hours, and the cell pulses were then aggregated to a daily time
series. The long simulations reduced the sensitivity to the initial seed value for the
random number generator. Our model results are shown below and compared to the
results presented in the paper. These results were considered good enough to verify that
the model formulation was reasonably consistent with what was reported in Rodriguez-Iturbe
et al. (1987) for the Bartlett-Lewis formulation and that the code in our
implementation was working as intended.
Distribution   Parameter   Value
Poisson        lambda      0.00796
Poisson        beta        0.600
Exponential    gamma       0.0947
Exponential    eta         0.334
Exponential    nu          1.70
Model Parameters (from Rodriguez-Iturbe et al. (1987))

Statistic                  Observed Data     B-L Model Data          B-L Model Data
                           (Our Analysis)    (Our Implementation)    (Rodriguez-Iturbe 1987)
mean                       2.10              2.122                   2.125
variance                   41.6              35.3                    40.52
skewness                   5.67              4.03                    3.05
autocorrelation lag1       0.158             0.204                   0.16
autocorrelation lag2       -0.0224           0.013                   0.01
autocorrelation lag3       -0.0223           -0.006                  0
probability of zero rain   0.64              0.76                    0.78
Model Verification Results Comparing Observed and Simulated Statistics
I should note that our statistics for the observed data do not exactly match those reported
in Rodriguez-Iturbe et al. (1987), perhaps because these authors had access to hourly
data that were subsequently aggregated to give more precise estimates of precipitation at a
daily timestep, or because of other discrepancies that are not diagnosed here. My model
simulations had statistical characteristics similar to those reported for the B-L model in
the paper using the same parameters, but these, too, did not exactly match. Differences
between the number generators used may explain minor discrepancies in
the results.
Parameter Sensitivity Analysis
A simple sensitivity analysis was performed on each parameter individually (keeping the
other parameters fixed), in which each parameter value was halved and doubled, and the
output statistics compared. While this is not a very sophisticated analysis, it gives some
sense of how the output statistics are affected by the choice of parameter, which is what I
was after. The results are shown in the series of figures below (one for each parameter).
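The bookkeeping for this one-at-a-time analysis can be sketched as below. It assumes that the percent changes plotted are measured relative to the statistic from the unperturbed parameter set, which the report does not state explicitly; the names NPARAMS, perturb_parameter, and percent_change are hypothetical.

```c
#define NPARAMS 5   /* lambda, beta, gamma, eta, nu */

/* Copy params into out, scaling only the entry at `index` by `factor`
   (0.5 for the halved run, 2.0 for the doubled run). */
void perturb_parameter(const double params[NPARAMS], int index, double factor,
                       double out[NPARAMS])
{
    for (int i = 0; i < NPARAMS; i++)
        out[i] = (i == index) ? params[i] * factor : params[i];
}

/* Percent change of a statistic relative to its baseline value. */
double percent_change(double base, double perturbed)
{
    return 100.0 * (perturbed - base) / base;
}
```

Each of the five parameters would be perturbed in turn, the model rerun, and percent_change applied to every output statistic.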
[Figures: five bar charts, one per parameter, showing the percent change in the mean, variance, skewness, lag-1 autocorrelation, and probability of no rain when the parameter is halved (*1/2) and doubled (*2).]
Sensitivity to Lambda (storm origin rate parameter)
Sensitivity to Beta (storm cell generation rate parameter)
Sensitivity to Gamma (cell generation process duration)
Sensitivity to Eta (cell intensity)
Sensitivity to nu (cell duration)
The sensitivity analysis shows that while some parameters in the Bartlett-Lewis
formulation have a dominant effect on certain statistics (e.g. lambda clearly determines
the probability of no rain near the optimal solution, and gamma the lag-1 autocorrelation),
it is also clear that the degrees of freedom in the fitting problem cannot be reduced very
much in an objective manner. This point was also made by Rodriguez-Iturbe et al.
(1987).
Discussion
Cluster-based point process models of the type examined here reached a sufficient state
of development in the late 1980s to successfully reproduce the statistics of observed
rainfall patterns at a single point. While some applications are clearly still available for
these techniques (e.g. filling in holes in data records), the issues surrounding nonstationary
precipitation statistics and the importance of spatial variations at various time
scales would appear to limit the importance of these methods for other purposes. It
would appear, for example, that physically-based climate models may eventually provide
a better approach to the problem of estimating future precipitation variability, with
perhaps some extension of these techniques in statistical downscaling procedures. For
example, variations in simulated pressure fields from reanalysis data could be associated
with precipitation statistics in the observed record, to which a stochastic model could be
fitted. Then short-timestep precipitation data could potentially be generated from future
climate scenarios from first principles, with the assumption that the relationship between
current weather patterns (e.g. pressure fields) and precipitation statistics is essentially
unaltered in the future. This is very different from the assumption that the current
precipitation statistics will be the same in the future.
References:
Foufoula-Georgiou, E, 1985, Discrete-time point process models for daily rainfall,
Water Resources Series Technical Report No. 93, Department of Civil and
Environmental Engineering, University of Washington, March
Rodriguez-Iturbe, I, Cox, DR, Isham, V, 1987, Some models for rainfall based on
stochastic point processes, Proc. R. Soc. London, A410, pp 269-288
Rodriguez-Iturbe, I, Gupta, VK, Waymire, E, 1984, Scale considerations in the modeling
of temporal rainfall, Water Resources Research, 20 (11), pp 1611-1619
Press, WH, Teukolsky, SA, Vetterling, WT, Flannery, BP, 1997, Numerical Recipes in C, Cambridge
University Press
Velghe, T, Troch, PA, De Troch, FP, Van de Velde, J, 1994, Evaluation of cluster-based
rectangular pulses point process models for rainfall, Water Resources Research, 30 (10), pp 2847-2857
Appendix 1 Model Code
/******************************************************
This program outputs a time series of stochastic Bartlett-Lewis pulses
and includes math functions for one uniform and three non-uniform
distributions
******************************************************/
#include <math.h>
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    int i,j,k;
    int stime, etime, MAXTIME=200000;   /* all times are in hours */
    long int *seed=0;
    float expparam[5], poiparam[5], geoparam[5];
    float ran1(long int *idum);
    float expdev(long int *idum, float *lambda);
    float geodev(long int *idum, float *pp);
    float poidev(long *idum, float xm);
    float gammln(float xx);
    float P, basetime, time, celltime, gentime, dura, intens;
    float pulse[30000][3], sum, up, low;   /* pulse: start, end, intensity */
    int numcells, countpulse, daycount;

    /* Assign memory to seed pointer (sized for a long on any platform) */
    seed=malloc(sizeof *seed);
    if(seed==NULL) return 1;

    /* initialize the random number generator seed and params */
    *seed=-5000;
    printf("%ld\n", *seed);
    poiparam[0] = 0.00796;   /* lambda: storm origin rate */
    printf("%f\n", poiparam[0]);
    poiparam[1] = 0.6;       /* beta: cell generation rate */
    printf("%f\n", poiparam[1]);
    geoparam[0] = 0.1;       /* geometric parameters, used only by the */
    printf("%f\n", geoparam[0]);
    geoparam[1] = 0.4;       /* commented-out code below */
    printf("%f\n", geoparam[1]);
    expparam[0] = 0.09479;   /* gamma: generation duration parameter */
    printf("%f\n", expparam[0]);
    expparam[1] = 0.33445;   /* eta: cell intensity parameter */
    printf("%f\n", expparam[1]);
    expparam[2] = 3.4;       /* nu: cell duration parameter (the test case in the text uses 1.70) */
    printf("%f\n", expparam[2]);
/*Neyman-Scott Model
basetime=0;
while(time<MAXTIME){
basetime += 1/poidev(seed, poiparam[0]);
numcells= (int)geodev(seed, &geoparam[0]);
printf("numcells = %d\n", numcells);
for(j=0; j<numcells; j++){
time= basetime + expdev(seed, &expparam[1]);
intens= expdev(seed, &expparam[2]);
printf("%f %f\n", time, intens);
}
}
*/
    /*********************Bartlett-Lewis Model **/
    basetime=0;
    countpulse=0;
    /* 30000 is the capacity of the pulse array */
    while(basetime<MAXTIME && countpulse<30000) {
        /* printf("basetime=%f ,gentime=%f\n", basetime, gentime); */

        /* generate new storm origin and stochastic cell generation time */
        P=poidev(seed, poiparam[0]);
        gentime= expdev(seed, &expparam[0]);
        if(P==0.0) basetime += 1.0;   /* no storm: advance one hour */
        else {
            basetime += 1/P;

            /* if a storm is initiated make one cell at storm origin */
            celltime=basetime;
            intens=expdev(seed, &expparam[1]);
            dura=expdev(seed, &expparam[2]);
            pulse[countpulse][0]=celltime;
            pulse[countpulse][1]=celltime+dura;
            pulse[countpulse][2]=intens;
            countpulse++;
            /* printf("%f %f %f\n", celltime, celltime+dura, intens); */

            /* if a storm is initiated generate cells until the gentime is
               exceeded */
            while(celltime < (basetime+gentime) && countpulse<30000){
                P=poidev(seed, poiparam[1]);
                if(P==0.0) celltime += 1.0;   /* no cell: advance one hour */
                else {
                    celltime += 1/P;
                    intens=expdev(seed, &expparam[1]);
                    dura=expdev(seed, &expparam[2]);
                    pulse[countpulse][0]=celltime;
                    pulse[countpulse][1]=celltime+dura;
                    pulse[countpulse][2]=intens;
                    countpulse++;
                    /* printf("%f %f %f\n", celltime, celltime+dura, intens); */
                }
            }
        }
    }
    /****************************************************************************/
    /*************aggregate to daily data*********************/
    daycount=1;
    stime=0;
    etime=24;
    sum=0;   /* must be zeroed before the first day is accumulated */
    while(daycount < MAXTIME/24){
        for(i=0;i<countpulse; i++) {
            /* clip the pulse to the current 24-hour window */
            if(pulse[i][1] <= etime) up= pulse[i][1];
            else up = etime;
            if(pulse[i][0] >= stime) low= pulse[i][0];
            else low = stime;
            /* pulses entirely outside the window contribute nothing */
            if(pulse[i][1] < stime || pulse[i][0] > etime) sum +=0.0;
            else sum += (up - low) * pulse[i][2];
        }
        printf("%d %f\n", daycount, sum);
        sum=0;
        daycount++;
        stime += 24;
        etime += 24;
    }
/************************************************************/
/*************************************** testing routine
printf("test values:\n");
for(i=0;i<25;i++)
printf("%f %f %f %f\n", ran1(seed), expdev(seed, &expparam[0]),
geodev(seed, &geoparam[0]), poidev(seed, poiparam[0]));
*******************************************************/
free(seed);
return (0);
}
/***********************************
This function implements a uniform number generator on [0,1] (pg 280
Numerical Recipes, 1992)
***********************************/
#define IA 16807
#define IM 2147483647
#define AM (1.0/IM)
#define IQ 127773
#define IR 2836
#define NTAB 32
#define NDIV (1+(IM-1)/NTAB)
#define EPS 1.2E-7
#define RNMX (1.0-EPS)

float ran1(long *idum) {
    int j;
    long k;
    static long iy=0;
    static long iv[NTAB];
    float temp;

    /* initialization sequence from any negative integer */
    if(*idum <= 0 || !iy) {
        if(-(*idum) < 1) *idum=1;
        else *idum=-(*idum);
        for(j=NTAB+7;j>=0;j--) {
            k=(*idum)/IQ;
            *idum = IA*(*idum-k*IQ) - IR*k;
            if(*idum < 0) *idum += IM;
            if(j < NTAB) iv[j] = *idum;
        }
        iy=iv[0];
    }

    /* generate random deviate in the interval [0,1] without endpoints */
    k = (*idum)/IQ;
    *idum = IA*(*idum - k*IQ) -IR*k;
    if(*idum < 0) *idum += IM;
    j=iy/NDIV;
    iy=iv[j];
    iv[j] = *idum;
    if((temp=AM*iy) > RNMX) return RNMX;
    else return temp;
}
/*************************************************************
This function calls the uniform generator ran1 and then makes the
translation to an exponential distribution value having a mean of
1/lambda. The first argument is the seed for ran1, the second argument
is the exponential parameter lambda.
*************************************************************/
#include <math.h>
#include <stdlib.h>

float expdev(long *idum, float *lambda) {
    float ran1(long *idum);
    float dum, X;

    /* draw a nonzero uniform deviate */
    do
        dum= ran1(idum);
    while(dum==0.0);

    /* invert the exponential cumulative distribution function */
    X = -1 * ( log(1.0-dum)/(*lambda) );
    /* printf("%f %f %f\n", dum, *lambda, X); */
    return X;
}
/*************************************************************
This function calls the uniform generator ran1 and then makes the
translation to a single parameter geometric distribution value with
parameter p (special case where r=1). The first argument is the seed
for ran1, the second argument is the parameter p.
*************************************************************/
#include <math.h>
#include <stdlib.h>

float geodev(long *idum, float *pp) {
    float ran1(long *idum);
    float A, C, dum, X;

    C= 1.0-(*pp);
    /* redraw until the argument of the logarithm below is non-negative */
    do
        dum= ran1(idum);
    while(((log(C) * dum)/(*pp) + C) < 0);
    A = (log(C) * dum)/(*pp) + C;
    X = log(A)/log(C);
    /* printf("%f %f %f %f %f\n", dum, *pp, C, A, X); */
    return X;
}
/**************************************************************
This function calls the uniform generator ran1 and the gammln function
and creates a Poisson distribution value. A direct multiplication method
is used for xm < 12 and the rejection method otherwise. The first
argument is the seed for ran1, the second argument is xm (lambda).
***************************************************************/
#include <math.h>
#define PI 3.141592654

float poidev(long *idum, float xm)
{
    float gammln(float xx);
    float ran1(long *idum);
    static float sq,alxm,g,oldm=(-1.0);
    float em,t,y;

    if (xm<12.0) {
        /* direct method: multiply uniform deviates until the product
           falls to exp(-xm) or below */
        if (xm != oldm) {
            oldm=xm;
            g=exp(-xm);
        }
        em=-1;
        t=1.0;
        do{
            ++em;
            t *= ran1(idum);
        } while (t>g);
    } else {
        /* rejection method */
        if (xm!=oldm) {
            oldm=xm;
            sq=sqrt(2.0*xm);
            alxm=log(xm);
            g=xm*alxm-gammln(xm+1.0);
        }
        do {
            do{
                y=tan(PI*ran1(idum));
                em=sq*y+xm;
            } while (em<0.0);
            em=floor(em);
            t=0.9*(1.0+y*y)*exp(em*alxm-gammln(em+1.0)-g);
        } while (ran1(idum)>t);
    }
    return em;
}
/*************************************************************
This function is used to get natural log of the gamma function.
*************************************************************/
#include <math.h>

float gammln(float xx)
{
    double x,y,tmp,ser;
    /* series coefficients alternate in sign (Numerical Recipes gammln) */
    static double cof[6]={76.18009172947146,-86.50532032941677,
                          24.01409824083091,-1.231739572450155,
                          0.1208650973866179e-2,-0.5395239384953e-5};
    int j;

    y=x=xx;
    tmp=x+5.5;
    tmp -= (x+0.5)*log(tmp);
    ser=1.000000000190015;
    for (j=0;j<=5;j++) ser += cof[j]/++y;
    return -tmp+log(2.5066282746310005*ser/x);
}