Survey
Project 1
Advanced Hydrology 599

Survey of Point Process Precipitation Models and Implementation of the Bartlett-Lewis Cluster Model Formulation

Alan Hamlet
1/22/02

Overview

For this project we reviewed selected papers from the literature on stochastic point process precipitation models and wrote computer code to implement two of the more successful techniques, in order to become more familiar with the technical details. The work in coding the models was divided roughly into four parts:

1) construction of three number generators for Poisson, geometric, and exponential distributions
2) coding of two cluster-based models and aggregation to a daily timestep
3) testing using one case study from the literature, which included a statistical analysis of observed and simulated data
4) a sensitivity analysis of the parameter space around the optimal solution used in testing the code.

It was hoped that we would also be able to fully implement the process of fitting these models to some new data, but this proved to be a challenging problem in its own right that exceeded the time available for the project.

Chunmei and I wrote the code for the number generators and the accompanying statistical analysis as a shared resource. We also used a single precipitation data set (15 May-16 June for Denver Airport) that had previously been fit to both the Neyman-Scott and Bartlett-Lewis formulations as a test case in two papers (Rodriguez-Iturbe et al. 1984, 1987). The code for each model formulation was implemented individually, with Chunmei examining the Neyman-Scott formulation and myself the Bartlett-Lewis formulation.

At the outset of the project we reviewed a number of articles on the historic development of point process models, but settled on Rodriguez-Iturbe et al. (1984, 1987) as the best surveys of available practice and techniques. There are a few more recent articles discussing these techniques as well (e.g. Velghe et al.
1994), but we primarily looked at what these authors were attempting to do, and did not spend much time on the details of these investigations. Foufoula-Georgiou (1985) also provided some useful references, discussion, and information on technical details.

A Brief Overview of Some Point Process Model Types

The simplest models use a Poisson rate process to determine the time between rain events. Variations on this basic concept give additional temporal structure to the simulated storm events by initiating rectangular pulses with stochastic length and depth. Both kinds of model have been shown to have difficulty reproducing the range of statistical properties that characterize precipitation events at a point. These simple models can, for example, be trained to reproduce the mean and variance of the data, but may fail to capture the autocorrelations present in the observed data (Rodriguez-Iturbe et al. 1987). They also have the limitation that aggregation to different time steps does not necessarily reproduce the observed statistics well in each case, i.e. the parameters must be chosen for a particular timestep.

So-called "cluster-based" models add more temporal structure by linking a Poisson-based storm initiation process (a stochastic time between “storm origins”) with the initiation of a number of storm “cells”, each of which is characterized either as a pulse of stochastic “depth” (as in some Neyman-Scott formulations), or as a stochastic precipitation rate and duration (as in the Bartlett-Lewis formulation). The contributions from these individual storm cells are then summed to construct the total precipitation occurring in some regular interval of time (e.g. a time series of hourly or daily data). Models of this type have been demonstrated to successfully reproduce a number of meaningful statistics such as the mean, variance, skewness, autocorrelation, and probability of days without rain.
These more sophisticated schemes also demonstrate better stability within the fitted parameter space when moving between different time scales, i.e. the same parameters will work reasonably well when aggregating the pulse data to hourly or daily timesteps (Rodriguez-Iturbe et al. 1987). The process of fitting appropriate parameters to reproduce the statistics of a given data set involves the solution of a number of simultaneous non-linear equations equating the secondary characteristics (e.g. mean, variance, etc.) of the model formulation to the statistics of the observed data.

Bartlett-Lewis Formulation

The Bartlett-Lewis formulation produces a sequence of storm cells, each having an intensity (precipitation rate) and a duration, and uses five parameters to define the stochastic nature of the construction process. There is no theoretical constraint on the type of distribution used in each step of the process, but the formulation used here (Rodriguez-Iturbe et al. 1987) has been demonstrated to work reasonably well in reproducing the autocorrelation of observed data. The process of generating a series of "storm cells" with stochastic duration and intensity may be summarized as follows:

1) The generation of a “storm origin” is assumed to follow a Poisson rate process with rate lambda. Thus the expected value of the time between storm origins is 1/lambda. This parameter has a strong effect on the mean, variance, and time between precipitation events.

2) Each storm origin generates a variable number of “storm cells” until a certain time from the storm origin is exceeded, at which point cell generation ceases. This time from the storm origin, which I will call the “generation duration”, is taken to be exponentially distributed with an expected value of 1/gamma. This parameter contributes to the persistence of storm cells once a storm is initiated, and almost completely determines the autocorrelation at lag 1.

3) The first cell is formulated to occur at the time of the storm origin, and additional cells are generated (until the generation duration is exceeded) as a second Poisson process with rate beta. So the mean time between storm cells is 1/beta. This parameter partly determines the intensity of each storm event.

4) The intensity of each cell’s contribution is taken to be exponentially distributed with expected value 1/eta. This parameter partly determines the intensity of each storm event.

5) The duration of each cell’s contribution is taken to be exponentially distributed with expected value 1/nu. This parameter partly determines the persistence of each storm event and the autocorrelation within storms.

The figure below shows a schematic of the cell generation process. Each storm cell generated is defined in the code simply by a start time, an end time, and an intensity.

[Figure: Schematic of Bartlett-Lewis Cell Generation Scheme, showing the storm origin, Cells 1-4, cell duration, cell intensity, time to next cell, and end of cell generation]

Methods of aggregation to a daily time step are discussed in a later section.

A Brief Overview of Parameter Optimization Strategies

The process of fitting a set of model parameters to a set of statistics derived from observed precipitation data is difficult in practice, and a number of researchers have conducted studies attempting to simplify these procedures. The most basic technique, and probably the most objective, is to explicitly solve a series of independent non-linear equations using numerical techniques. This is the approach of Rodriguez-Iturbe et al. (1987) in our test case. Other researchers have made the case that some of the parameters tend to dominate the effects on certain statistics of the simulated data, and that these can be selected as initial estimates, thus reducing the degrees of freedom of the problem.
This would not appear to be a viable option in the case of the Bartlett-Lewis formulation, however (see the section below on parameter sensitivity analysis).

Some Technical Details

Number Generators

The number generators were all taken from code in Numerical Recipes in C (Press et al. 1997) and were all based on the uniform deviate generator ran1 (Press et al. 1997, pp 280). Exponential and geometric distribution values were based on continuous values of the cumulative distribution functions; the Poisson processes were assumed to generate discrete integer values of the rate (i.e. 0, 1, 2, 3, ...), and were generated using the “rejection method” (Press et al. 1997, pp 294).

Use of Poisson Distributed Values

Some confusion was initially encountered over how to treat a value of 0 returned from the discrete Poisson number generator (a frequent occurrence in the case of the storm origin, for example). It was decided that the appropriate action on receiving a value of zero for the Poisson rate is simply to advance the time variable one hour and generate no cells. For the second Poisson process, which determines the rate of cell generation, a value of zero from the number generator likewise advances the time of generation one hour and no cell is generated at that time; the cell generation process nevertheless continues until the generation duration is exceeded.

Aggregating the Contribution of Each Storm Cell to Daily Time Steps

Each storm cell is saved in the code as a start time, an end time, and an intensity. In generating the daily time series for precipitation, the contribution of each storm cell to each individual 24-hour segment is determined as the portion of the cell’s duration (hrs) falling within the 24-hour “window” multiplied by the cell’s intensity (mm/hr). If more than one cell affects the 24-hour window, the contributions are summed.

Model Test Case

The test case for the coded model was the set of fitted parameters reported in Rodriguez-Iturbe et al.
(1987) using data from Denver Airport for the period 15 May-16 June for the years 1949-1976. We obtained the daily rainfall data from the Earth Info CDs, and calculated the statistics used in the fitting process for both the observed data and our model simulations aggregated to a daily timestep. The table below shows the model parameters used. The model was allowed to produce storm cells until the elapsed time was greater than 100,000 hours, and the cell pulses were then aggregated to a daily time series. The long simulations reduced the sensitivity to the initial seed value for the random number generator. Our model results are shown below and compared to the results presented in the paper. These results were considered good enough to verify that the model formulation was reasonably consistent with what was reported in Rodriguez-Iturbe et al. (1987) for the Bartlett-Lewis formulation and that the code in our implementation was working as intended.

Model Parameters (from Rodriguez-Iturbe et al. (1987))

Poisson Distribution Parameters
  lambda    0.00796
  beta      0.600
Exponential Distribution Parameters
  gamma     0.0947
  eta       0.334
  nu        1.70

Model Verification Results Comparing Observed and Simulated Statistics

Statistic                   Observed Data     B-L Model Data          B-L Model Data
                            (Our Analysis)    (Our Implementation)    (Rodriguez-Iturbe 1987)
mean                        2.10              2.122                   2.125
variance                    41.6              35.3                    40.52
skewness                    5.67              4.03                    3.05
autocorrelation lag1        0.158             0.204                   0.16
autocorrelation lag2        -0.0224           0.013                   0.01
autocorrelation lag3        -0.0223           -0.006                  0
probability of zero rain    0.64              0.76                    0.78

I should note that our statistics for the observed data do not exactly match those reported in Rodriguez-Iturbe et al.
(1987), perhaps because these authors had access to hourly data which was subsequently aggregated to more precise estimates of precipitation at a daily timestep, or because of other discrepancies that are not diagnosed here. My model simulations had similar statistical characteristics to those reported for the B-L model in the paper using the same parameters, but these, too, did not exactly match. Differences between the number generators used may explain minor discrepancies in the results.

Parameter Sensitivity Analysis

A simple sensitivity analysis was performed on each parameter individually (keeping the other parameters fixed), in which each parameter value was halved and doubled, and the output statistics compared. While this is not a very sophisticated analysis, it gives some sense of how the output statistics are affected by the choice of parameter, which is what I was after. The results are shown in the series of figures below (one for each parameter). Each figure is a bar chart of the percent change in the mean, variance, skewness, lag-1 autocorrelation, and probability of no rain when the parameter is multiplied by 1/2 and by 2.

[Figure: Sensitivity to Lambda (storm origin rate parameter)]

[Figure: Sensitivity to Beta (storm cell generation rate parameter)]

[Figure: Sensitivity to Gamma (cell generation process duration)]

[Figure: Sensitivity to Eta (cell intensity)]

[Figure: Sensitivity to nu (cell duration)]

The sensitivity analysis shows that while some parameters in the Bartlett-Lewis formulation have a dominant effect on certain statistics (e.g.
lambda clearly determines the probability of no rain near the optimal solution, and gamma the lag-1 autocorrelation), it is also clear that the degrees of freedom in the fitting problem cannot be reduced very much in an objective manner. This point was also made by Rodriguez-Iturbe et al. (1987).

Discussion

Cluster-based point process models of the type examined here reached a sufficient state of development in the late 1980s to successfully reproduce the statistics of observed rainfall patterns at a single point. While some applications are clearly still available for these techniques (e.g. filling in holes in data records), the issues surrounding nonstationary precipitation statistics and the importance of spatial variations at various time scales would appear to limit the importance of these methods for other purposes. It would appear, for example, that physically based climate models may eventually provide a better approach to the problem of estimating future precipitation variability, with perhaps some extension of these techniques in statistical downscaling procedures. For example, variations in simulated pressure fields from reanalysis data could be associated with precipitation statistics in the observed record, to which a stochastic model could be fitted. Short-timestep precipitation data could then potentially be generated from future climate scenarios from first principles, with the assumption that the relationship between current weather patterns (e.g. pressure fields) and precipitation statistics is essentially unaltered in the future. This is very different from the assumption that the current precipitation statistics will be the same in the future.

References:

Foufoula-Georgiou, E, 1985, Discrete-Time Point Process Models for Daily Rainfall, Water Resources Series Technical Report No.
93, Department of Civil and Environmental Engineering, University of Washington, March

Rodriguez-Iturbe, I, Cox, DR, Isham, V, 1987, Some models for rainfall based on stochastic point processes, Proc. R. Soc. London, A410, pp 269-288

Rodriguez-Iturbe, I, Gupta, VK, Waymire, E, 1984, Scale considerations in the modeling of temporal rainfall, WRR, 20 (11), pp 1611-1619

Press, WH, Teukolsky, SA, Vetterling, WT, Flannery, BP, 1997, Numerical Recipes in C, Cambridge University Press

Velghe, T, Troch, PA, De Troch, FP, Van de Velde, J, 1994, Evaluations of cluster-based rectangular pulses point process models for rainfall, WRR, 30 (10), pp 2847-2857

Appendix 1 Model Code

/******************************************************
 This program outputs a time series of stochastic
 Bartlett-Lewis pulses and includes math functions for one
 uniform and three non-uniform distributions
******************************************************/
#include <math.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    int i, j, k;
    int stime, etime, MAXTIME = 200000;
    long int *seed = 0;
    float expparam[5], poiparam[5], geoparam[5];
    float ran1(long int *idum);
    float expdev(long int *idum, float *lambda);
    float geodev(long int *idum, float *pp);
    float poidev(long *idum, float xm);
    float gammln(float xx);
    float P, basetime, time, celltime, gentime, dura, intens;
    float pulse[30000][3], sum = 0.0, up, low; /* sum must start at zero */
    int numcells, countpulse, daycount;

    /* Assign memory to seed pointer (sized for a long, not a fixed 4 bytes) */
    seed = malloc(sizeof *seed);
    if (seed == NULL) return 1;

    /* initialize the random number generator seed and params */
    *seed = -5000;
    printf("%ld\n", *seed);
    poiparam[0] = 0.00796;   /* lambda: storm origin rate */
    printf("%f\n", poiparam[0]);
    poiparam[1] = 0.6;       /* beta: cell generation rate */
    printf("%f\n", poiparam[1]);
    geoparam[0] = 0.1;
    printf("%f\n", geoparam[0]);
    geoparam[1] = 0.4;
    printf("%f\n", geoparam[1]);
    expparam[0] = 0.09479;   /* gamma: generation duration */
    printf("%f\n", expparam[0]);
    expparam[1] = 0.33445;   /* eta: cell intensity */
    printf("%f\n", expparam[1]);
    expparam[2] = 3.4;       /* nu: cell duration (the parameter table lists 1.70) */
    printf("%f\n", expparam[2]);

    /*Neyman-Scott Model
    basetime=0;
    while(time<MAXTIME){
        basetime += 1/poidev(seed, poiparam[0]);
        numcells= (int)geodev(seed, &geoparam[0]);
        printf("numcells = %d\n", numcells);
        for(j=0; j<numcells; j++){
            time= basetime + expdev(seed, &expparam[1]);
            intens= expdev(seed, &expparam[2]);
            printf("%f %f\n", time, intens);
        }
    }
    */

    /*********************Bartlett-Lewis Model **/
    basetime = 0;
    countpulse = 0;
    while (basetime < MAXTIME) {
        /* printf("basetime=%f ,gentime=%f\n", basetime, gentime); */

        /* generate new storm origin and stochastic cell generation time */
        P = poidev(seed, poiparam[0]);
        gentime = expdev(seed, &expparam[0]);

        if (P == 0.0) basetime += 1.0;   /* no storm: advance one hour */
        else {
            basetime += 1/P;

            /* if a storm is initiated make one cell at storm origin */
            celltime = basetime;
            intens = expdev(seed, &expparam[1]);
            dura = expdev(seed, &expparam[2]);
            pulse[countpulse][0] = celltime;
            pulse[countpulse][1] = celltime + dura;
            pulse[countpulse][2] = intens;
            countpulse++;
            /* printf("%f %f %f\n", celltime, celltime+dura, intens); */

            /* if a storm is initiated generate cells until the gentime is exceeded */
            while (celltime < (basetime + gentime)) {
                P = poidev(seed, poiparam[1]);
                if (P == 0.0) celltime += 1.0;   /* no cell: advance one hour */
                else {
                    celltime += 1/P;
                    intens = expdev(seed, &expparam[1]);
                    dura = expdev(seed, &expparam[2]);
                    pulse[countpulse][0] = celltime;
                    pulse[countpulse][1] = celltime + dura;
                    pulse[countpulse][2] = intens;
                    countpulse++;
                    /* printf("%f %f %f\n", celltime, celltime+dura, intens); */
                }
            }
        }
    }
    /****************************************************************************/

    /*************aggregate to daily data*********************/
    daycount = 1;
    stime = 0;
    etime = 24;
    while (daycount < MAXTIME/24) {
        for (i = 0; i < countpulse; i++) {
            /* clip the pulse to the current 24-hour window */
            if (pulse[i][1] <= etime) up = pulse[i][1];
            else up = etime;
            if (pulse[i][0] >= stime) low = pulse[i][0];
            else low = stime;
            /* pulses entirely outside the window contribute nothing */
            if (pulse[i][1] < stime || pulse[i][0] > etime) sum += 0.0;
            else sum += (up - low) * pulse[i][2];
        }
        printf("%d %f\n", daycount, sum);
        sum = 0;
        daycount++;
        stime += 24;
        etime += 24;
    }
    /************************************************************/

    /*************************************** testing routine
    printf("test values:\n");
    for(i=0;i<25;i++)
        printf("%f %f %f %f\n", ran1(seed), expdev(seed, &expparam[0]),
               geodev(seed, &geoparam[0]), poidev(seed, poiparam[0]));
    *******************************************************/

    free(seed);
    return (0);
}

/***********************************
 This function implements a uniform [0,1] number generator
 (pg 280 Numerical Recipes, 1992)
***********************************/
#define IA 16807
#define IM 2147483647
#define AM (1.0/IM)
#define IQ 127773
#define IR 2836
#define NTAB 32
#define NDIV (1+(IM-1)/NTAB)
#define EPS 1.2E-7
#define RNMX (1.0-EPS)

float ran1(long *idum)
{
    int j;
    long k;
    static long iy = 0;
    static long iv[NTAB];
    float temp;

    /* initialization sequence from any negative integer */
    if (*idum <= 0 || !iy) {
        if (-(*idum) < 1) *idum = 1;
        else *idum = -(*idum);
        for (j = NTAB+7; j >= 0; j--) {
            k = (*idum)/IQ;
            *idum = IA*(*idum - k*IQ) - IR*k;
            if (*idum < 0) *idum += IM;
            if (j < NTAB) iv[j] = *idum;
        }
        iy = iv[0];
    }

    /* generate random deviate in the interval [0,1] without endpoints */
    k = (*idum)/IQ;
    *idum = IA*(*idum - k*IQ) - IR*k;
    if (*idum < 0) *idum += IM;
    j = iy/NDIV;
    iy = iv[j];
    iv[j] = *idum;
    if ((temp = AM*iy) > RNMX) return RNMX;
    else return temp;
}

/*************************************************************
 This function calls the uniform generator ran1 and then makes
 the translation to an exponential distribution value having
 a mean of 1/lambda.
 The first argument is the seed for ran1, the second argument is
 the exponential parameter lambda.
*************************************************************/
#include <math.h>
#include <stdlib.h>

float expdev(long *idum, float *lambda)
{
    float ran1(long *idum);
    float dum, X;

    do
        dum = ran1(idum);
    while (dum == 0.0);

    /* invert the exponential CDF at the uniform deviate */
    X = -1 * ( log(1.0-dum)/(*lambda) );
    /* printf("%f %f %f\n", dum, *lambda, X); */
    return X;
}

/*************************************************************
 This function calls the uniform generator ran1 and then makes
 the translation to a single parameter geometric distribution
 value with parameter p (special case where r=1).
 The first argument is the seed for ran1, the second argument is
 the parameter p.
*************************************************************/
#include <math.h>
#include <stdlib.h>

float geodev(long *idum, float *pp)
{
    float ran1(long *idum);
    float A, B, C, dum, X;

    C = 1.0-(*pp);

    /* redraw until the argument of the logarithm below is non-negative */
    do
        dum = ran1(idum);
    while (((log(C) * dum)/(*pp) + C) < 0);

    A = (log(C) * dum)/(*pp) + C;
    X = log(A)/log(C);
    /* printf("%f %f %f %f %f\n", dum, *pp, C, A, X); */
    return X;
}

/**************************************************************
 This function calls the uniform generator ran1 and the gammln
 function, and creates a Poisson distribution value by using the
 rejection method.
 The first argument is the seed for ran1, the second argument is
 xm (lambda).
***************************************************************/
#include <math.h>
#define PI 3.141592654

float poidev(long *idum, float xm)
{
    float gammln(float xx);
    float ran1(long *idum);
    static float sq, alxm, g, oldm = (-1.0);
    float em, t, y;

    if (xm < 12.0) {
        /* small mean: multiply uniform deviates until the product falls below exp(-xm) */
        if (xm != oldm) {
            oldm = xm;
            g = exp(-xm);
        }
        em = -1;
        t = 1.0;
        do {
            ++em;
            t *= ran1(idum);
        } while (t > g);
    } else {
        /* large mean: rejection method */
        if (xm != oldm) {
            oldm = xm;
            sq = sqrt(2.0*xm);
            alxm = log(xm);
            g = xm*alxm - gammln(xm+1.0);
        }
        do {
            do {
                y = tan(PI*ran1(idum));
                em = sq*y + xm;
            } while (em < 0.0);
            em = floor(em);
            t = 0.9*(1.0+y*y)*exp(em*alxm - gammln(em+1.0) - g);
        } while (ran1(idum) > t);
    }
    return em;
}

/*************************************************************
 This function is used to get natural log of the gamma function.
*************************************************************/
#include <math.h>

float gammln(float xx)
{
    double x, y, tmp, ser;
    /* second and fourth coefficients are negative in Numerical Recipes;
       the minus signs were missing from this listing */
    static double cof[6] = {76.18009172947146, -86.50532032941677,
                            24.01409824083091, -1.231739572450155,
                            0.1208650973866179e-2, -0.5395239384953e-5};
    int j;

    y = x = xx;
    tmp = x+5.5;
    tmp -= (x+0.5)*log(tmp);
    ser = 1.000000000190015;
    for (j = 0; j <= 5; j++) ser += cof[j]/++y;
    return -tmp + log(2.5066282746310005*ser/x);
}