WORLD METEOROLOGICAL ORGANIZATION
OPERATIONAL HYDROLOGY REPORT No. 33
STATISTICAL DISTRIBUTIONS FOR FLOOD FREQUENCY ANALYSIS
SECRETARIAT OF THE WORLD METEOROLOGICAL ORGANIZATION - GENEVA - SWITZERLAND
1989

CONTENTS

CHAPTER 6 METHODS OF CHOOSING BETWEEN DISTRIBUTIONS
6.1 Introduction
6.2 Influence of Outliers
6.3 Traditional Methods
6.4 Recent Approaches
6.5 Summary

CHAPTER 7 DISTRIBUTIONS PREVIOUSLY CHOSEN OR RECOMMENDED FOR NATIONAL USE
7.1 WMO Survey
7.2 Selected Cases

CHAPTER 8 CONCLUDING REMARKS
8.1 Types of model
8.2 The modelling problem
8.3 Descriptive ability of distributions
8.4 Predictive ability and robustness
8.5 Parameter estimation
8.6 At-site and at-site/regional estimation
8.7 Arid and semi-arid zones
8.8 Regional homogeneity
8.9 Necessity for flow gauging
8.10 Interpretation and use of flood frequency estimates

REFERENCES

APPENDIX 1 VOLUME FLOODS
APPENDIX 2 ESTIMATES OF POPULATION MOMENTS AND THEIR BIASES
APPENDIX 3 MOMENT RATIO DIAGRAMS
APPENDIX 4 PARAMETER ESTIMATION BY PROBABILITY WEIGHTED MOMENTS (PWM)
APPENDIX 5 NUMERICAL EXAMPLES
APPENDIX 6 WMO SURVEY ON DISTRIBUTION TYPES CURRENTLY IN USE FOR FREQUENCY ANALYSIS OF EXTREMES OF FLOODS BY HYDROLOGICAL AND OTHER SERVICES

FOREWORD

Choosing a statistical distribution for flood frequency analysis has remained a difficult problem for hydrologists, designers of hydraulic structures, irrigation engineers and planners of water resources. Several distribution types are used to estimate extremes of flows and precipitation, but the merits of their applicability to different types of data and for different purposes have not been clearly established.
The reasons for operational use of a particular distribution type in many countries are frequently subjective or historical, as confirmed by the 1983 WMO survey, results of which are given in this report. In addition to the survey on current practices of countries with regard to selection and application of statistical distributions, the WMO Commission for Hydrology, at its sixth session (1980), decided that a report containing detailed guidance on the merits and selection of distribution types for flood frequency analysis should be prepared. Similar guidance on selection of distribution types for extremes of precipitation was published by WMO in 1981 (Operational Hydrology Report No. 15 - WMO-No. 560). Bearing this in mind, the Secretariat arranged for the preparation of the present report, which has been written by Dr. C. Cunnane of the Department of Engineering Hydrology, University College Galway, Ireland.

I should like to express the gratitude of the World Meteorological Organization to Members who have willingly participated in the survey and to Dr. Cunnane for preparing this excellent report on a very important subject.

(G.O.P. Obasi)
Secretary-General

SUMMARY

Since the beginning of this century significant developments have taken place in the methods for statistical flood frequency analysis using models of annual maximum (AM) series and partial duration (PD) series. The AM and PD models are defined and compared and the relation between return period T and the distribution function F(Q) of flood magnitudes in each series is developed, leading to the Q - T relation. General statistical properties of observed AM and PD series are then outlined, following which a discussion of the problems encountered in modelling these series satisfactorily is given. Methods of estimating flood distribution quantiles using at-site and regional data, both separately and together, are outlined in Chapter 4, with some examples in Appendix 5.
These include the efficient and robust methods based on regionally averaged probability weighted moments, Bayesian methods and the regionally estimated TCEV model. Regional homogeneity and flood quantile estimation for arid zones are also discussed in Chapter 4. Properties of flood quantile estimators are discussed in Chapter 5, which includes discussion of robustness and efficiency of competing models and the effects on quantile estimates of regional heterogeneity, of temporal dependence in flood series and of spatial dependence between flood series. Methods of choosing between statistical distributions are discussed in Chapter 6, where it is pointed out that conventional goodness of fit tests are of little value in this context. Studies conducted in a number of countries aimed at selecting a "best" distribution for nationwide use in the AM model are outlined in Chapter 7, followed by a summary and concluding remarks in Chapter 8.

In addition to the traditional methods of testing the goodness of fit of distributions to observed flood series, the report discusses the progress which has been made in recent years through "behaviour" related studies and robustness studies. Behaviour studies examine the statistical characteristics of samples of flood data within several regions on the one hand and of random samples drawn from candidate distributions on the other. All of the traditionally used distributions fail to behave like real hydrological data and only the recently developed Wakeby and TCEV distributions seem to be satisfactory. Robustness studies have shown that the Wakeby (WAK) and General Extreme Value (GEV) distributions, when used regionally with parameters estimated by probability weighted moments (PWM), are robust and least sensitive to changes in the unknown parent distribution which is being modelled. TCEV tends to be unbiased but less efficient than these two.
On the other hand, the log Pearson Type 3 (LP3) distribution with parameter estimates obtained by moments from logarithms of the data is the least robust of the popular methods and it is extremely sensitive to changes in the unknown underlying parent distribution. Therefore LP3 cannot henceforth be recommended generally for flood frequency estimation. In the use of Wakeby and GEV distributions, parameter estimation should, where possible, be carried out on a regional basis. Parameter estimation by PWM, although not quite as efficient as maximum likelihood, is almost free of bias, easy to use and generally unaffected by outliers. Illustrative examples of parameter estimation are given in the appendices of the report.

Appendix 6 gives a summary and an analysis of replies to a questionnaire on distribution types currently in use for frequency analysis of extremes of precipitation and floods by national Hydrological and other Services. This survey reveals that the Extreme Value Type I (EVI) and the log-normal distributions are the most commonly used. The Weibull formula is favoured most for plotting position, although its continued use for this purpose is not recommended in this report.

This report recommends that flood estimates be based on joint use of at-site and regional data using an index flood method of quantile estimation. It also recommends that flow measurements be commenced at any site, or in any region, which is being contemplated as the site of any water resources project. To date, no amount of formulae or statistical dexterity can make up for the lack of data on which to base an accurate estimate of mean annual flood and hence of floods of higher return periods at a site.
CHAPTER 1
BASIC CONCEPTS OF FLOOD FREQUENCY ANALYSIS

1.1 Introduction

In flood frequency analysis the objective is to estimate a flood magnitude corresponding to any required return period of occurrence. The resulting magnitude-return period relationship will be referred to as the Q - T relationship.

1.1.1 RETURN PERIOD

Flood peaks do not occur with any fixed pattern in time or in magnitude. Successive exceedances of some magnitude are separated by varying intervals of time (Figure 1.1). For any arbitrary discharge Q' we define return period as the average of the inter-event times,

$$T(Q') = \mathrm{Average}(\tau_1, \tau_2, \tau_3, \ldots) \tag{1.1}$$

a straightforward definition of return period which does not entail any reference to probability. The concept of probability is introduced in flood frequency models and the resulting magnitude-return period relationships in Section 1.5 below. Return period is also referred to as recurrence interval.

Figure 1.1 Quantities used in definition of return period

It may be that the hydrological regime at any river site changes due either to man's direct influence on the catchment or to climatic change.
The latter may of course be very difficult to detect or quantify. This report deals only with situations where neither of these changes occurs at a detectable rate.

1.2 Flood frequency models

At any river site it is usually assumed that nature provides a unique Q - T relationship and that Q is a monotonically increasing function of T. In order to estimate this natural Q - T relationship from a good quality continuous hydrometric record of N years duration it is necessary to resort to a statistical or stochastic model of the continuous hydrograph which retains information in the hydrograph relevant to the Q - T relationship and discards the rest. Three such models are:

(a) annual maximum series model, AM;
(b) partial duration series, PD, or peaks over a threshold, POT, model;
(c) time series model, TS.

This report is primarily concerned with the AM model; the PD model receives some attention while the TS model, from which a Q - T relation may be obtained by simulation (Quimpo (1967), Hall and O'Connell (1972)), is acknowledged but is considered outside the scope of this report. If the assumptions of independence of flood peaks are violated (see Section 2.3), then use of a TS model would be one way to proceed.

Even if the statistical parameters of these models were known perfectly for a given river site, it must be assumed that a particular value of Q would be attributed differing values of T by each of these models and further that these values of T would differ from the value in nature. Thus, if for some Q, $T$ is the true value of return period in nature and $T_{AM}$ and $T_{PD}$ are the values attributed to Q by the two models, it must be assumed that

$$T \neq T_{AM} \neq T_{PD} \tag{1.2}$$

Under certain not unreasonable assumptions, Langbein (1949) showed that $T_{AM} \neq T_{PD}$, the greatest difference being for small values of T. Under his conditions the difference converges to 0.5 year as T increases.
The difference for small values of T is a result of sampling the time base of the entire hydrograph discretely in units of a year, a time interval which is of the same order of magnitude as the return period of small flood peaks. This distortion of the time scale is not present to an obvious degree in the partial duration model and it may be reasonable to assume that $T_{PD} = T$.

Langbein deduced that

$$\frac{1}{T_{AM}} = 1 - \exp\left(-\frac{1}{T_{PD}}\right) \tag{1.3}$$

which gives for instance $T_{AM}$ = 1.58, 2.54 and 100.50 for $T_{PD}$ = 1.00, 2.00 and 100.00 respectively. Thus the relative difference is greatest for small values of T (WMO, 1983). The derivation of eqn (1.3) has been discussed by Chow (1950) and by Takeuchi (1984).

The validity of equation (1.3) has been empirically examined by a number of investigators. Beard (1974) concluded that PD frequencies are related to AM frequencies differently in different regions and thus that empirical relationships should be used rather than a single theoretical relationship. Beran and Nozdryn-Plotnicki (1977) report on a study of low return period floods on British rivers. They found that Langbein's result does not hold exactly even though the form of the relationship between $T_{AM}$ and $T_{PD}$ is approximately correct. Yevjevich and Taesombut (1978) report, inter alia, on a comparison of return periods of fixed flood magnitudes estimated by (a) the annual maximum model and (b) the partial duration series model. The latter $T_{PD}$ values are compared with $T_{PD}$ values calculated by Langbein's formula from the $T_{AM}$ values obtained in (a). In one river the agreement is excellent for $T_{PD}$ values of two years and less, while in the second river the agreement is remarkably good for return periods up to ten years.
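As a numerical check on eqn (1.3), the following minimal Python sketch evaluates Langbein's relation in both directions. The function names `t_am_from_t_pd` and `t_pd_from_t_am` are illustrative choices, not taken from the report, and the sketch assumes that eqn (1.3) holds exactly, which the empirical studies cited above suggest is only approximately true.

```python
import math


def t_am_from_t_pd(t_pd: float) -> float:
    """Annual-maximum return period implied by eqn (1.3): 1/T_AM = 1 - exp(-1/T_PD)."""
    if t_pd <= 0:
        raise ValueError("T_PD must be positive")
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x = 1/T_PD is small.
    return 1.0 / -math.expm1(-1.0 / t_pd)


def t_pd_from_t_am(t_am: float) -> float:
    """Inverse of eqn (1.3): T_PD = -1 / ln(1 - 1/T_AM); defined only for T_AM > 1."""
    if t_am <= 1:
        raise ValueError("T_AM must exceed 1 year")
    # log1p(-y) equals ln(1 - y) and is accurate when y = 1/T_AM is small.
    return -1.0 / math.log1p(-1.0 / t_am)


for t_pd in (1.0, 2.0, 100.0):
    t_am = t_am_from_t_pd(t_pd)
    print(f"T_PD = {t_pd:7.2f}   T_AM = {t_am:7.2f}   difference = {t_am - t_pd:.3f}")
```

Evaluated at $T_{PD}$ = 1, 2 and 100 years, the sketch reproduces the values 1.58, 2.54 and 100.50 quoted above, and the printed difference $T_{AM} - T_{PD}$ shrinks towards the 0.5 year limit as T increases.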
1.3 Relative merits of AM, PD and TS models

Apart from their relative abilities in representing the parent Q - T relationship as discussed above, their relative merits can also be considered under two other headings: identification of data series and statistical efficiency of estimates of $Q_T$ by each model.

(a) Identification of data series

The series of annual maximum floods can be extracted without difficulty from a hydrometric record. The frequency of problems of identification caused by a flood continuing from the end of one year into the beginning of the next can be considerably reduced by selecting a date within the dry season as the commencement of the hydrometric year. The extraction of the partial duration series of floods is less straightforward because of occasional bunching of flood peaks. The possibility that such peak magnitudes are not statistically independent has led to a certain amount of unease about the validity of statistical methods used with this model. In addition, prior to 1963 (see Borgman 1963) no engineering literature dealt with the peaks over a threshold type of problem in which the number of peaks included differed from the number of years of record. When the number of peaks equals the number of years of record, the series is known as the annual exceedance series.

(b) Statistical efficiency of estimates of $Q_T$ by each model

Denoting the estimate of $Q_T$ obtained by the AM method as $\hat{Q}_T$ and that obtained from the same hydrometric record by the PD method as $\hat{Q}_T^*$, it is usually observed that these two estimates are unequal. Furthermore the sampling variance of $\hat{Q}_T$ is not equal to that of $\hat{Q}_T^*$, i.e. $\mathrm{var}(\hat{Q}_T) \neq \mathrm{var}(\hat{Q}_T^*)$. From a statistical point of view the method which has the smallest sampling variance enjoys an advantage. Under certain common assumptions Cunnane (1973) examined
the relative values of $\mathrm{var}(\hat{Q}_T)$ and $\mathrm{var}(\hat{Q}_T^*)$ and found that $\mathrm{var}(\hat{Q}_T) < \mathrm{var}(\hat{Q}_T^*)$ provided $\lambda < 1.65$, where $\lambda$ is the mean number of peaks per year included in the PD series. When $\lambda > 1.65$ the opposite was true. This shows that the AM method is statistically more efficient than the PD method when $\lambda$ is small but less efficient when $\lambda$ is large. In many practical situations the assumptions of the PD model may not be valid if $\lambda$ is increased to too high a level, certainly if $\lambda > 3$.

These results have been re-examined by Yevjevich and Taesombut (1978) and by Taesombut and Yevjevich (1978) with the help of simulated flows obtained from a time series model. Their results suggest that a value of $\lambda > 1.8$ or 1.9 may be required to ensure greater efficiency of PD estimates of $Q_T$, the change from $\lambda > 1.65$ being a result of small but significant differences between parameters estimated from AM and PD simulated series (Taesombut and Yevjevich, 1978, Chapter 7.2 and Tables 7.1 and 7.2). On the basis of very restricted simulation results Tavares and da Silva (1983) reported that Cunnane's (1973) expression for $\mathrm{var}(\hat{Q}_T^*)$ overestimates the true sampling variance if $\lambda < 1$ and underestimates it if $\lambda > 2$. Rosbjerg (1984) gives an improved exact expression for $\mathrm{var}(\hat{Q}_T^*)$ for both the case of independent peaks and that of serially correlated peaks.

Apart from these restricted comparisons no further objective comparisons based on bias and efficiency of $Q_T$ estimates have been published and the TS model has never been considered in this way.

1.4 Measures of flood magnitude

Some aspect of the flood wave such as its peak level, H, peak discharge, Q, or volume, V, must be designated to represent its magnitude. Which of these is chosen depends on the use to which the frequency curve is to be put. Another aspect, which is of interest particularly in the economic evaluation of road closures and crop inundations, is the duration D during which a given level is exceeded.
If interest is confined to a single cross-section of a river it may sometimes be sufficient to analyse the series of peak levels rather than the series of peak discharges. However, if there have been changes in channel geometry at the site during the period of record the series of levels is nonhomogeneous and unsuitable for ordinary statistical analysis. This disadvantage is not shared, however, by the corresponding series of discharges. In addition there are other reasons why the series of discharges is preferable to the series of levels:

(i) the results of the analysis of discharges can be related to catchment characteristics and thereby transferred to other sites on the river or other catchments and

(ii) the effects on levels at the site in question of some hydraulic structure downstream of it or of flood protection works can only be obtained via discharges.

The volume of the flood wave is of paramount importance whenever storage plays more than a minor part in the situation being examined, for instance in reservoir problems, while it is also important in some flood protection schemes.

1.4.1 SINGLE VARIABLE DESCRIPTION

Despite the importance of the flood volume, the majority of flood frequency research and published flood data has been based on flood peak discharge. While flood volumes and durations have been studied at individual sites for investigating particular projects, few data or results have been published in the periodical hydrological literature. Exceptions are an extensive study of the distribution of flood volumes on 64 catchments published by NERC (1975, 1.5) and a guide to a particular method of flood volume frequency analysis prepared by Beard (1962) and U.S. Corps of Engineers (1975). This lack of easily available, published data has forced researchers to consider peak values only.
In keeping with the availability of the material and the popularity of flood peaks, the main body of this Report deals with the distribution of flood peak magnitude Q. In applications, the recommendations made can be taken to apply to instantaneous flood peaks or to peaks of daily mean flows. It should be borne in mind that these quantities may differ considerably on small catchments. The question of the distribution of flood volumes and durations is considered separately in Appendix 1.

1.5 Magnitude-return period (Q-T) relationships

1.5.1 ANNUAL MAXIMUM SERIES MODEL

A series of annual maximum floods $(Q_1, Q_2, \ldots, Q_N)$ is assumed to form a random sample from a stationary population in which Q is a random variable with distribution $\Pr(Q \le q) = F(q)$. The variate value with exceedance probability $1/T$ is said to have return period T. Denoting this value $Q_T$, it is such that:

$$1 - F(Q_T) = \frac{1}{T} \tag{1.4}$$

The justification for this may be seen by considering a sequence of Bernoulli trials, in each of which the outcome is either an exceedance or a non-exceedance of $Q_T$. If the probability of exceedance of $Q_T$ in each trial is $p = 1/T$, then in a sequence of M years (trials) the expected number of exceedances is $Mp = M/T$ by the binomial distribution. The expected number of exceedances of $Q_T$ in T years is thus $T/T = 1$. Further, the length of interval elapsing between successive exceedances in such a sequence has the geometric distribution with expected value $1/p = T$. This result about the average interval between successive exceedances of $Q_T$ is consistent with the concept on which the definition of equation 1.1 is based.

If $F(\cdot)$ is known, equation 1.4, after some algebraic manipulation, provides the Q-T relation. For instance, if $F(\cdot)$ is the extreme value type 1 (EV1) or Gumbel distribution, equation 1.4 gives:

$$\Pr(Q \le Q_T) = \exp\left[-\exp\left(-\frac{Q_T - u}{\alpha}\right)\right] = 1 - \frac{1}{T} \tag{1.5}$$

where u and $\alpha$ are the EV1 location and scale parameters, which leads to

$$Q_T = u - \alpha \ln\left[-\ln\left(1 - \frac{1}{T}\right)\right] = u + \alpha\, y_T \tag{1.6}$$

where

$$y_T = -\ln\left[-\ln\left(1 - \frac{1}{T}\right)\right]. \tag{1.7}$$

An alternative form is:

$$Q_T = \mu + \sigma K_T \tag{1.8}$$

where $\mu$ and $\sigma$ are the population mean and standard deviation and

$$K_T = -\frac{\sqrt{6}}{\pi}\left[0.5772 + \ln\left(-\ln\left(1 - \frac{1}{T}\right)\right)\right] \tag{1.9}$$

is a frequency factor depending only on T (Chow, 1951). In equations (1.4) to (1.9) T is synonymous with $T_{AM}$. In distributions such as the 2-parameter gamma or log-normal, as well as most three parameter distributions, $K_T$ is a function of the shape (skewness) parameter as well as of T.

1.5.2 PARTIAL DURATION SERIES (OR PEAKS OVER A THRESHOLD) MODEL

In this model most of the flow hydrograph is disregarded and the hydrograph is viewed as a series of randomly spaced flood peaks of random magnitude. For ease of statistical modelling and also for ease of identification of the values which form the series, only the series of peaks exceeding an arbitrary threshold $q_0$ are considered. The most general concept is that of a joint probability density function $f(\tau_1, q_1, \tau_2, q_2, \tau_3, q_3, \ldots)$ which expresses jointly the random distribution of times of occurrence of peaks exceeding a threshold $q_0$ and the magnitudes of the peaks. A very general treatment of the probability statements involved has been given by Todorovic (1970) while particular models from his general approach have been dealt with by Todorovic and Zelenhasic (1970) and by Todorovic and Rousselle (1971). Earlier and somewhat less general treatments were given by Borgman (1963), Shane and Lynn (1964) and by Bernier (1967). In particular, each of these showed that if the number of flood peaks exceeding some value $q_0$ (a threshold value) in some interval of time such as a year has a Poisson distribution with parameter $\lambda$, then the number of events exceeding a greater value $q'$ is also Poisson distributed with parameter $\lambda' = \lambda p$ where $p = \Pr(q \ge q' \mid q \ge q_0)$. Here p is a conditional probability, being the proportion of all peaks exceeding $q_0$ which also exceed $q'$.

Most practical applications make use of this least general model, which assumes a Poisson distribution for the number of events exceeding $q_0$ per year or at most a seasonally varying Poisson distribution. In either case the parameter may be estimated by the mean number of events exceeding $q_0$ per year when estimates of flood magnitudes are required for return periods well in excess of one year, as in such circumstances the seasonality aspect has little effect. The seasonality aspect can be very important when small return periods are being considered.

If the simplest Poisson assumption is made then a magnitude $Q_T$ of T year return period will appear once on average among every $\lambda T$ flood peaks in the series, $\lambda$ being the Poisson parameter, the mean number of events per year exceeding $q_0$. In other words the exceedance probability of $Q_T$ in the population of flood peaks which exceed $q_0$ is $1/(\lambda T)$. If this conditional distribution has distribution function $F(q \mid q \ge q_0)$ then the Q-T relation is defined by

$$1 - F(Q_T \mid Q_T \ge q_0) = \frac{1}{\lambda T} \quad\text{or}\quad F(Q_T \mid Q_T \ge q_0) = 1 - \frac{1}{\lambda T} \tag{1.10}$$

In particular if $F(\cdot)$ is assumed, as frequently is the case, to be exponential with standard deviation $\beta$,

$$F(q \mid q \ge q_0) = 1 - \exp\left[-(q - q_0)/\beta\right] \tag{1.11}$$

then (1.10) gives:

$$Q_T = q_0 + \beta \ln(\lambda T) \tag{1.12}$$

1.5.3 RETURN PERIOD AND PROBABILITY

In the basic definition of return period T in equation (1.1) there is no mention of probability. The types of model described here have been developed to estimate the Q-T relation from an observed hydrograph and in doing so the concept of the distribution of a population of flood peaks has been introduced and hence a link between return period and probability has been developed. This is a consequence of the model and not of the definition of return period.
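As an illustration only, the Q-T relations of equations (1.6) to (1.9) for the AM model with an EV1 distribution and equation (1.12) for the PD model with exponential exceedances can be evaluated as in the sketch below. The parameter values (u, alpha, q0, beta, lam) are arbitrary assumptions chosen for the example and are not taken from any data set; the relations mean = u + 0.5772 alpha and standard deviation = pi alpha / sqrt(6) are the standard EV1 moment relations implied by equations (1.8) and (1.9).

```python
import math

EULER_GAMMA = 0.5772  # constant as used in equation (1.9)


def gumbel_reduced_variate(T):
    """EV1 reduced variate y_T of equation (1.7); requires T > 1."""
    return -math.log(-math.log(1.0 - 1.0 / T))


def ev1_quantile(u, alpha, T):
    """AM model, EV1 distribution: Q_T = u + alpha * y_T, equation (1.6)."""
    return u + alpha * gumbel_reduced_variate(T)


def ev1_frequency_factor(T):
    """Frequency factor K_T of equation (1.9)."""
    inner = math.log(-math.log(1.0 - 1.0 / T))
    return -(math.sqrt(6.0) / math.pi) * (EULER_GAMMA + inner)


def pd_exponential_quantile(q0, beta, lam, T):
    """PD model, exponential exceedances: Q_T = q0 + beta * ln(lam * T), equation (1.12)."""
    return q0 + beta * math.log(lam * T)


# Hypothetical parameter values, for illustration only.
u, alpha = 100.0, 30.0               # EV1 location and scale
mu = u + EULER_GAMMA * alpha         # EV1 mean
sigma = math.pi * alpha / math.sqrt(6.0)  # EV1 standard deviation
q0, beta, lam = 50.0, 20.0, 2.0      # threshold, exponential scale, peaks per year

for T in (2, 10, 100):
    q_16 = ev1_quantile(u, alpha, T)              # equation (1.6)
    q_18 = mu + sigma * ev1_frequency_factor(T)   # equation (1.8)
    q_pd = pd_exponential_quantile(q0, beta, lam, T)
    print(f"T={T:4d}  EV1 (1.6)={q_16:8.2f}  EV1 (1.8)={q_18:8.2f}  PD (1.12)={q_pd:8.2f}")
```

The two EV1 columns agree because equations (1.6) and (1.8) are algebraically equivalent forms of the same relation; the PD column is not expected to match them since its parameters are chosen independently.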
1.6 General considerations

A primary assumption of hydrological frequency analysis is that the series of events being considered have random magnitudes which are mutually independent and identically distributed (iid). Thus the existence of serial correlation or trend in the series would invalidate the use of such analysis.

The result of any frequency analysis is an estimated Q-T relationship. While modern practice attempts to quantify the error in such relationships it must still be borne in mind that some estimates involve considerable statistical extrapolation. Some authors, particularly Klemes (1986, 1987), draw attention to the accompanying inherent dangers.

Flood frequency analysis is not an end in itself but provides vital information for the design of many hydraulic structures (culverts, bridges, reservoir spillways, road embankments, flood control levees) and for risk assessment in flood plain use and insurance. And, as mentioned in Section 1.4, the primary variable for a particular application may be one or more of peak, duration or volume.

In applying frequency analysis in any problem area the analyst must, using his knowledge and familiarity with the region, take into account the uses to which the frequency curve will be put, the type, amount and quality of hydrological data available, the nature of the hydrological regime and in particular the causative factors of floods and any special factors, natural or otherwise, controlling them.

In preparing guidelines for general use in a particular region the responsible hydrologists will, in addition to technical hydrological problems, have to take into account the institutional framework of such guidelines and in particular any legal requirements, including those suggested by case law, to be met by such guidelines.
In addition the degree of expertise available among potential users will have a bearing on whether such guidelines are presented as strict codes of practice to be rigidly applied or otherwise. Such guidelines usually try to provide procedures for basic checking of data, treatment of low and high outliers, estimation of missing peaks, inclusion of historical flood data, detailed methods for ungauged catchments and some comment on the relationship between design floods obtained from statistical analysis of floods on the one hand and from rainfall runoff models on the other.

In this report a degree of familiarity with the concepts and practice of frequency analysis on the part of the reader is assumed. The report is not envisaged primarily as an instruction manual nor as an introduction to the practice of frequency analysis with all the attendant considerations mentioned in the previous two paragraphs. What it does aim to do is to outline rationally ideas which are fundamental to the technical questions of frequency analysis and to collate and synthesize recent relevant research on these questions. Some conclusions emerge which are not presented as binding truths but rather as points worthy of consideration by those responsible for flood frequency analysis work.

CHAPTER 2 STATISTICAL PROPERTIES OF OBSERVED FLOOD SERIES

2.1 Distribution

The distribution of flood magnitudes in the PD series tends to be abruptly truncated at some threshold while that of the AM series always has some values to the left of the mode. These latter values reflect the presence of annual maxima from years having small flood peaks (see Figure 2.1).

[Figure 2.1 Contrasting PD and AM histogram shapes: (a) typical PD series histogram and idealized probability density function; (b) typical AM series histogram and idealized probability density function. Both panels show f(q) against q.]
While many theoretical forms of probability distributions provide a satisfactory fit to such histograms within the observed range of peak values, such distributions may differ in the shape of their tails and this shows up acutely at large T on Q-T diagrams (see Figure 3.1 later). The distinction between different shapes of Q-T relations is of great practical significance if cost effective recommendations are to be made by flood hydrologists. The histogram alone, based usually on 50 or fewer observations, is unable to aid the task of discriminating between distribution tails. Therefore, collective evidence about the distributional behaviour of floods on many rivers needs to be studied. Such behaviour is reflected in the higher dimensionless moments Cv, Cs and Ck and in the extreme values of samples. Such quantities are now discussed while factors affecting choice of distribution are discussed in Chapter 3 and actual choice of distribution is discussed in Chapter 6.

2.2 Sample statistics

Conventional moments and their dimensionless ratios Cv, Cs and Ck are well known; estimation of them from samples is discussed in Appendix 2 and their typical values in the flood frequency context are given below in Section 2.2.1. Probability weighted moments (PWM) are newer (Greenwood et al., 1979) and are superior to ordinary moments in helping to identify distributions, estimate their parameters and test hypotheses about values of such parameters. PWMs are defined in Appendix 4 and the use of dimensionless PWM ratios, known as L-moments (Hosking, 1986), to help identify distributions is discussed briefly in Appendix 3.

2.2.1 MOMENTS

The mean of these series reflects the size of the catchment from which the data come. The mean of the AM series, $\bar{Q}$ (= mean annual flood), is roughly proportional to $A^{0.75}$ where A is catchment area (Hazen, 1932; Nash and Shaw, 1965).
The standard deviation of these series is generally proportional to catchment size except that along a single river it sometimes decreases with distance downstream. The coefficient of variation, Cv, describes the standard deviation as a proportion of the mean. In the majority of AM series its estimated value lies between 0.3 and 0.8 but values outside this range can and do occur (see NERC, 1975, p. 122 and Matalas et al., 1975). Unusually low values of Cv, as small as 0.1, occur in equatorial regions of high rainfall whose flood producing mechanism is fairly uniform from year to year, such as Papua New Guinea (Atkins, 1982). Low Cv values also occur in relatively impermeable, high rainfall catchments in temperate zones (NERC, 1975, pp. 182-183, Cv = 0.15). Large values, Cv > 1, occur in some AM series which display a well-behaved heterogeneity but most cases of large Cv are due to the presence of one or more outliers. In some AM series the largest value may be more than twice as large as the second largest while in others the smallest value may appear to be inordinately small.

In general, the value of skewness, Cs, estimated from AM series is positive, ranging from zero to 5.0 or more, with the majority of values lying in the range 0.5 to 3.0 (see NERC, 1975, p. 122, Matalas et al., 1975 and Rossi et al., 1984). Values of Cv and Cs tend to be positively correlated since the same features cause high or low values of each. Since hydrological practice sometimes makes use of logarithms of data, Z = log Q, it is worth noting that Cs in log space (LS) of AM series is frequently negative (Landwehr et al., 1978).

Values of kurtosis, Ck, in AM series vary from 2 to 8, which are regional average values reported by Landwehr et al. (1978). Individual values may vary widely outside these limits.

In extensive random sampling from distributions commonly used in hydrological frequency analysis Wallis et al. (1974) noticed that no matter how many samples were generated, even from extremely skewed parent populations, the value of skewness obtained from such samples is bounded above. Kirby (1974) confirmed theoretically that both Cv and Cs in random samples are bounded and that the bounds are a function of sample size N alone, thus:

$$C_v \le (N - 1)^{0.5} \tag{2.1}$$

$$|C_s| \le \frac{N - 2}{(N - 1)^{0.5}} \tag{2.2}$$

where Cv and Cs are based on uncorrected moments as in equations (A2.1) to (A2.3) of Appendix 2. Kurtosis is also bounded by a function of sample size, not evaluated by Kirby (1974).

Landwehr et al. (1978, Table 2) found that average regional values of Cv, Cs and Ck, measured from AM data, increase with record length while Hazen (1932) also found that skewness of AM data tended to increase with record length and suggested a correction factor $(1 + 8.5/N)$ by which to multiply small sample values of Cs. This factor was later found to be useful for samples from mildly skewed parent populations (Wallis et al., 1974 and Appendix 2 of this report). Because of the bounds which exist on Cv, Cs and Ck, values of these quantities measured from random samples are biased downwards and tables of bias factors are given by Wallis et al. (1974). The magnitudes of these biases are outlined in Appendix 2.

If AM series are in fact random samples from an unknown distribution the values of Cv, Cs and Ck obtained from them are biased estimates of the population values. This bias is important in making inferences from the sample moments, for instance by use of moment ratio diagrams, about the form of the population distribution. It can be concluded that the populations from which AM series are assumed to be random samples have values of Cv, Cs and Ck in excess of regional average values of those statistics calculated by the usual formulae. Corrections for such bias should be made if method of moments estimation is used.
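The sample-size bounds of equations (2.1) and (2.2) and Hazen's factor (1 + 8.5/N) can be illustrated with a short sketch. It assumes that the uncorrected moments of Appendix 2 are the plain averages with divisor N (an assumption here, since Appendix 2 is not reproduced in this section), and the function names are illustrative only. The most extreme sample possible, N - 1 zeros and a single positive value, is used to show that the bounds depend on N alone.

```python
import math


def uncorrected_cv_cs(sample):
    """Cv and Cs from uncorrected moments (divisor N assumed for all moments)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in sample) / n  # third central moment
    cv = math.sqrt(m2) / mean
    cs = m3 / m2 ** 1.5
    return cv, cs


def kirby_bounds(n):
    """Upper bounds on Cv and |Cs| for sample size n, equations (2.1) and (2.2)."""
    return math.sqrt(n - 1), (n - 2) / math.sqrt(n - 1)


def hazen_factor(n):
    """Hazen's (1932) multiplier (1 + 8.5/N) for small sample values of Cs."""
    return 1.0 + 8.5 / n


for n in (10, 15, 30, 50):
    extreme = [0.0] * (n - 1) + [1.0]  # one flood, otherwise zero flow
    cv, cs = uncorrected_cv_cs(extreme)
    cv_max, cs_max = kirby_bounds(n)
    print(f"N={n:3d}  Cv={cv:6.3f} (bound {cv_max:6.3f})  "
          f"Cs={cs:6.3f} (bound {cs_max:6.3f})  Hazen factor={hazen_factor(n):5.3f}")
```

For N = 15, for example, no sample can return a skewness above about 3.5 however skewed the parent population, which is one way of seeing why sample values of Cs are biased downwards.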
Appendix 2 gives some details but completely general rules cannot be given.

2.2.2 REGIONAL DISTRIBUTION OF Cv, Cs AND Ck

While regional mean and standard deviation values of Cv, Cs and Ck have been reported, the distribution of these quantities within a region has rarely been examined. Such knowledge may be useful in the inference problem of determining the form of distribution for AM series. Rossi et al. (1984) used the observed sampling distribution of skewness obtained from 39 Italian AM series of average length 40 years as a criterion to be reproduced by similarly sized data sets randomly drawn from a two component extreme value (TCEV) distribution. Based on the comparison of observed and Monte Carlo derived sampling distributions they eliminated both EV1 and log-normal distributions as candidates for modelling Italian AM series.

Njenga (1985) found that the sampling distributions of Cv, Cs and Ck calculated from 64 AM series of length 15 years each in Ireland are remarkably well behaved and are very similar to, but not exactly equal to, the mean sampling distribution of corresponding statistics of data sets randomly drawn from GEV parent distributions with shape parameter k in the range 0.0 to -0.1 (Figure 2.2). Thus, while the large sampling standard error of a skewness value calculated from an individual AM series makes it impossible to use such a value on its own with confidence (pointed out by Slade, 1936), it is quite possible that regional distributions of Cv, Cs and Ck may contain information of value in inferring characteristics of parent distributions from which the AM data may have been drawn.

[Figure 2.2 Sampling distribution of observed AM skewness in Ireland (64 stations, 15 years each) compared with trace of expected value of corresponding quantities obtained on assumption of GEV parent populations with k = 0.0 and -0.1. (After Njenga, 1985). Plotted against the EV1 standardised variate y: the 64 observed ranked skewness values g(i), i = 1, ..., 64, and smoothed traces of the simulated average of g(i) for GEV parents with k = -0.1 and k = 0.0.]

2.2.3 DISTRIBUTION OF STANDARDIZED LARGEST FLOOD PEAK

If $Q_N$ is the largest peak in an AM series of length N let

$$y_N = (Q_N - a)/b \tag{2.3}$$

be a standardized largest value, where a and b are measures of location and scale respectively. Values of $y_N$ calculated from several records of equal length N from a region define an observed sampling distribution of $y_N$ which could usefully be compared with Monte Carlo derived distributions of $y_N$ (for similarly sized data sets) obtained from a variety of distributional assumptions. Such comparisons would be of assistance in identifying parent distributions whose sample extremes behave similarly to observed AM extremes. The values (a, b) could be distribution free, e.g. mean and standard deviation, or the estimated parameters of a particular distributional form; for example EV1 parameters, estimated by ML, were used by Rossi et al. (1984) in outlier tests but could also be used in the more general manner outlined above. As an example the sampling distribution of $y_N = Q_{max}/\bar{Q}$ from 64 AM records of size 15 in SE England is shown in Figure 2.3.

[Figure 2.3 Observed regional distribution of $y_N = Q_{max}/\bar{Q}$ compared with (smoothed) expectations of corresponding ranked values from 64 samples of size 15 from GEV parent distribution with $\mu = 100.0$, $\alpha = 0.3$ and k = -0.20. Plotted against the EV1 reduced variate y: observed values (64 stations each with 15 years record) and GEV simulated values.]

2.2.4 CONDITION OF SEPARATION OF SKEWNESS

Matalas et al. (1975) have examined the regional mean $\bar{C}_s$ and standard deviation $\sigma_{C_s}$ of Cs for subsets of length 10, 20 and 30 years of recorded AM series. They found for each subset length that a plot of $\sigma_{C_s}$ versus $\bar{C}_s$ prepared from historical data from 14 regions in the U.S. was not in agreement with corresponding plots obtained from similarly sized random samples drawn from distributions commonly used in hydrological frequency analysis. This disagreement was termed "the condition of separation", which is illustrated in Figure 2.4. This condition manifests itself on Figure 2.2 by the empirical distribution of skewness having a steeper slope than the corresponding distribution of expected values. This result suggests that no conventional distribution could have generated the observed flow values. However the Wakeby (Houghton, 1978) and TCEV (Rossi et al., 1984) distributions, with certain parameter combinations, yield points which overlap the regional flood data points of Figure 2.4.

[Figure 2.4 Illustration of condition of separation. Plotted: regional flood data and samples from known distributions.]

2.3 Departures from independence

It is usually assumed that all the peak magnitudes in the AM series are mutually independent in the statistical sense. This assumption is usually justified. Beard (1974) concluded that AM series of 300 gauging stations were not substantially autocorrelated.
In the study of over 30 long AM series in Britain the use of 3 different tests for persistence did not reveal any appreciable dependence in AM series (NERC, 1975). There may however be some element of serial persistence displayed by extremely large rivers.

Many authors have expressed the fear that successive elements in a PD series may be correlated and this has given rise to doubts about the applicability of the method. For this reason, arbitrary rules are adopted for deciding whether to include certain peaks in the series or not. For example it may be required that two peaks must be separated in time by more than $3 t_p$, where $t_p$ is an average time to peak for the catchment (NERC, 1975, Vol. 4, p. 14), or that the two peaks must be separated by more than (5 days + ln A), where A is catchment area in square miles (Beard, 1974), if both are to be included. A further condition might be that flow between successive peaks must fall to less than half, two thirds or three quarters of the first or smaller peak flow. More recently USWRC (1981, p. 8) has declined to recommend "specific guidelines" for defining flood events to be included in a PD series.

Studies carried out on PD series in Britain (NERC, 1975, also Cunnane, 1979) indicate that the magnitudes of successive PD series peaks, selected in accordance with the above rules, are not correlated and may be regarded as statistically independent but that the inter-event times are not independent random variables. That is, persistence in the PD series, if it exists, should be looked for in the process giving rise to the peaks rather than among the peak magnitudes (see also Ashkar and Rousselle, 1983a, b).

Because it is difficult to prove conclusively that serial persistence is absent in either PD or AM series a number of simulation studies have been conducted on the effect this may have on parameter estimation and standard error of estimate. See for instance Landwehr et al. (1979a), Srikanthan and McMahon (1981), Tasker (1983) for AM series and Tavares and da Silva (1983), Rosbjerg (1984) for PD series. In general, when serial persistence is present, the assumption of serial independence leads to more biased quantile estimation and larger standard error of quantile estimate than when serial persistence is absent and the correct model form is assumed.

In addition to serial persistence studies NERC (1975) also reports on split sample tests and trend analysis of AM series. Significant trend was found in a small number of British AM series. It is not possible to make a general statement about non-stationarity of AM or PD series because there can be circumstances in which definite catchment changes can bring about a trend (see Reich, 1985), even though this may be difficult to detect or measure. Naturally, if trend or cyclic change does exist in a flood series it would have to be removed before a conventional flood frequency analysis could be attempted. It is recommended that tests for trend be performed, as for instance in NERC (1975, Chapter 2), to check that the usual assumptions of frequency analysis are valid.

CHAPTER 3 THE MODELLING PROBLEM

3.1 Introduction

The flood frequency modelling problem relates to:

(a) choice of model type (AM, PD, or other),
(b) choice of distribution to be used in the chosen model (e.g. choice of distribution for AM or PD series),
(c) choice of method of parameter and quantile estimation,
(d) choice of scheme for joint use of at-site data, when available, and regional data.

Aspects (a) and (b) are discussed in this chapter while (c) and (d) are discussed more fully in Chapters 4 and 5. Finally, the choice of flood frequency estimation methods is discussed in Chapter 6. It should be noted here that two separate aspects of such choice are important. These are the descriptive and predictive properties of the chosen method.
The descriptive property relates to the requirement that the chosen distribution shape resembles the observed sample distribution of floods and that random samples drawn from the chosen model distribution have statistical properties similar to those of real flood series described in Chapter 2. The predictive property relates to the requirement that quantile estimates are robust with small bias and standard error.

3.2 Choice of model type

The relative merits of different model types were discussed briefly in Chapter 1, Section 3. Where very low return period (T < 2) floods are being considered, such as in flooding of agricultural land, it should be remembered that the AM model causes a distortion of the Q-T relation because $T_{AM} \neq T$. Use of the PD model obviates this problem in such cases. If AM data are the only ones conveniently available then the Q-T relation so obtained has to be modified, by use of Langbein's relation between T and $T_{AM}$, equation 1.3, or some empirically derived local counterpart of it (Beard, 1974, Beran and Nozdryn-Plotnicki, 1977, Takeuchi, 1984).

In most other flood estimation problems quantiles of high return period are sought and the difference between T and $T_{AM}$ is of little concern. What is then of interest is the accuracy (lack of bias) and efficiency (inverse of sampling standard error) inherent in the method used. Thus it is valid to ask whether any of the AM, PD or TS models displays any marked superiority over the others. This question has not yet been fully investigated; what has been done was reviewed earlier in Section 1.3(b).

3.3 Population and Distributions

The choice of distribution problem, especially in the AM series case, has attracted considerable interest (Gumbel (1941), Moran (1957), Benson (1968), Jenkinson (1969), Beard (1974), NERC (1975), Houghton (1977a), Boughton (1980), Rossi et al. (1984), Ahmad et al. (1988)).
Until the mid 1970's there was a tendency to treat this aspect in isolation from choice of method of parameter and quantile estimation and from the choice of scheme for joint use of at-site and regional data. This separation cannot be justified as all three aspects interact when the hydrology of a region or country is being considered (Fiering, 1967). The question of choice of distribution for PD series has, on the other hand, received considerably less attention.

3.3.1 DISTRIBUTIONS FOR AM SERIES

Choice of distribution for AM series has received widespread attention. For instance, Benson (1968) described a study which concentrated on finding a distributional form which would describe well the observed sample distribution of ten long records of AM floods. NERC (1975) also devoted considerable attention to the same problem. Surprisingly, however, WMO (1984) reports (see Appendix 6) that:

"In many countries the selection of an AM distribution is actually not made in any objective manner and that the choice of distribution is argued in a general manner, as follows: The (chosen) distribution is
- widely accepted, simple, convenient to apply,
- consistent, flexible or robust (low sensibility to outliers),
- theoretically well based,
- documented in the Guide (WMO, 1983) and elsewhere.
No special method of parameter estimation is preferred and the graphical method is as frequently or even more used as any other method."

Apart from goodness of fit type tests, information about distribution type should be inferable from the dimensionless moments Cv, Cs and Ck measured from AM data. Such inference, attempted by Wu and Goodridge (1976) and by McMahon and Srikanthan (1981), is unsatisfactory unless allowance is made for the bias which is known to exist in the estimates of Cv, Cs and Ck (see Chapter 2, Section 2, and Appendices 2 and 3).
Improvements on such inferences may be possible through more powerful goodness of fit tests such as Anderson-Darling and modified Anderson-Darling EDF tests (Ahmad et al., 1988b) and L-moment ratio diagrams based on PWMs (Hosking, 1986, 1988).

3.3.1.1 Robustness

In the past (e.g. Benson (1968), NERC (1975)), it was generally assumed that once a distributional choice was made according to a goodness of fit criterion a satisfactory basis for flood estimation was established. Such an approach assumes that a single, as yet unidentified, underlying distributional form is adequate for modelling AM flood peaks. Even if this assumption were true it has to be accepted that the true underlying distributional form cannot be identified with certainty at the present time, either on a single-site or regional basis. This fact should be taken into account when selecting a distribution. In doing so a distribution and an associated method of parameter estimation (denoted D/E for "distribution and estimation procedure") must be sought which is robust with respect to extreme upper quantile estimation, over a reasonable range of distributions, random samples from which have statistical characteristics similar to observed AM flood data. Such a reasonable range of distributions will be termed "flood-like", following Landwehr et al. (1980).

A robust D/E procedure, in this context, is relatively insensitive to small changes in the distributional assumptions which it assumes are true. For instance, let $\hat{Q}_T(A)$ and $\hat{Q}_T(B)$ be estimates of $Q_T$ obtained by assuming D/E procedures "A" and "B" respectively. If D/E procedure "A" is robust, then $\hat{Q}_T(A)$ should be a good estimate of $Q_T$ regardless of whether the population from which the sample has been drawn is "A" distributed, "B" distributed or "otherwise" distributed.
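A minimal sketch of how such a robustness check might be set up by simulation is given below. It is an illustration under stated assumptions and not a procedure taken from the studies cited: the D/E procedure examined is EV1 fitted by the method of moments (using the sample standard deviation with divisor N - 1), the two parent populations are an EV1 and a GEV with k = -0.1 written in the form x = u + alpha (1 - (-ln F)^k) / k, and the parameter values, sample size and number of replicates are arbitrary. The error in the estimated 100-year flood is summarised by its mean (bias) and its root mean square (rmse).

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772


def open_uniform(rng):
    """Uniform variate strictly inside (0, 1), so that logarithms are defined."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def ev1_quantile(u, alpha, T):
    return u - alpha * math.log(-math.log(1.0 - 1.0 / T))


def gev_quantile(u, alpha, k, T):
    # Assumed GEV form: x(F) = u + alpha * (1 - (-ln F)**k) / k, with k != 0.
    return u + alpha * (1.0 - (-math.log(1.0 - 1.0 / T)) ** k) / k


def draw_ev1(rng, u, alpha, n):
    return [u - alpha * math.log(-math.log(open_uniform(rng))) for _ in range(n)]


def draw_gev(rng, u, alpha, k, n):
    return [u + alpha * (1.0 - (-math.log(open_uniform(rng))) ** k) / k
            for _ in range(n)]


def ev1_moments_estimate(sample, T):
    """D/E procedure under test: EV1 fitted by the method of moments."""
    alpha_hat = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    u_hat = statistics.mean(sample) - EULER_GAMMA * alpha_hat
    return ev1_quantile(u_hat, alpha_hat, T)


def bias_and_rmse(draw_sample, true_q, T, reps):
    """Mean error and root mean square error of the estimated Q_T over reps samples."""
    errors = [ev1_moments_estimate(draw_sample(), T) - true_q for _ in range(reps)]
    bias = statistics.mean(errors)
    rmse = math.sqrt(statistics.mean([e * e for e in errors]))
    return bias, rmse


# Hypothetical experiment settings, for illustration only.
rng = random.Random(1)
u, alpha, k, n, T, reps = 100.0, 30.0, -0.1, 30, 100, 2000

bias_ev1, rmse_ev1 = bias_and_rmse(
    lambda: draw_ev1(rng, u, alpha, n), ev1_quantile(u, alpha, T), T, reps)
bias_gev, rmse_gev = bias_and_rmse(
    lambda: draw_gev(rng, u, alpha, k, n), gev_quantile(u, alpha, k, T), T, reps)

print(f"EV1 parent: bias={bias_ev1:7.2f}  rmse={rmse_ev1:7.2f}")
print(f"GEV parent: bias={bias_gev:7.2f}  rmse={rmse_gev:7.2f}")
```

With these settings the EV1 procedure underestimates the 100-year flood when the parent is the heavier-tailed GEV, which is the kind of sensitivity to the distributional assumption that a robustness study is meant to expose. A fuller study would repeat the experiment for several candidate D/E procedures and a wider range of parents.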
On the other hand, if D/E procedure "B" is non-robust then QT(B) might be a reasonable estimate of QT when the population from which the sample is drawn is "B" distributed but it may be greatly in error when the population is "A" distributed. The criteria for assessing these estimates are bias, b, and root mean square error, rmse, where

    b = E(\hat{Q}_T - Q_T)    (3.1)

    rmse = [E(\hat{Q}_T - Q_T)^2]^{1/2}    (3.2)

and \hat{Q}_T denotes the estimate of Q_T. Such assessments cannot be conducted on observed flood data because the true value of QT is unknown. They can only be conducted in controlled simulation experiments using random samples drawn from known statistical "flood-like" distributions (see for instance Hosking et al. (1985a)). Before promoting a particular D/E procedure for use in flood frequency analysis, it is essential to confirm that it yields quantile estimates with relatively small bias and rmse over a reasonable range of "flood-like" distributions, since the true underlying distribution form is unknown. While this requirement is certainly necessary it may not alone be sufficient.

3.3.1.2 Single or multi-component AM distributions

In most applications the model consists of a single distribution. However, there are cases where a model which recognizes different physical flood producing mechanisms might have to be considered. If physical circumstances warrant it the observed AM flood series may be considered as the maxima of samples from two distinct sub-populations (USWRC, 1981; Waylen and Woo, 1982). If two sub-populations are assumed then two sets of parameters must be estimated. In addition, some such models (Singh and Sinclair, 1972) contain a mixture parameter which must also be estimated.
Obviously these more general models with extra parameters should only be adopted if there is a clear physical distinction between the two types of events, as for instance between floods caused by summer cloud bursts and spring snowmelt (Waylen and Woo, 1982) or between floods caused by hurricane and non-hurricane events (Fiering and Jackson, 1971, p.80). Rossi et al. (1984) have suggested a two component extreme value distribution on the basis that it can satisfactorily represent the regional spread of observed skewness and standardized largest values of observed AM series.

3.3.1.3 Candidate AM distributions

In either case, many candidate distributions have been suggested including:

    Log-normal              (LN)    (Hazen, 1914)
    Pearson type 3          (P3)    (Foster, 1924)
    Extreme value type 1    (EV1)   (Gumbel, 1941)
    Extreme value type 2    (EV2)   (Gumbel, 1941)
    Extreme value type 3    (EV3)   (Jenkinson, 1969)
    Gamma                           (Moran, 1957)
    Log-Pearson type 3      (LP3)   (US Water Resources Council, 1967, 1976, 1977, 1981)
    General extreme value   (GEV)   (Jenkinson, 1955, 1959)
    Weibull                         (Wu and Goodridge, 1976)
    Wakeby                  (WAK)   (Houghton, 1978a)
    Boughton                        (Boughton, 1980)
    Two component EV        (TCEV)  (Rossi et al., 1984)
    Log-logistic            (LLG)   (Ahmad et al., 1988)
    Generalized logistic    (GLG)   (Ahmad, 1988)

The mathematical forms of the distributions mentioned are given in Table 3.1 while the magnitude-return period relation corresponding to some of these distributions for a fixed value of mean and variance is shown in Figure 3.1 on an EV1 reduced variate base y. It should be noted that for T > 5, yT ≈ ln(T - 0.5). It can be seen that the magnitude of the EV1 variate itself increases linearly with y (≈ ln T) while some others increase nonlinearly. Variates with large positive skewness increase more rapidly than EV1 while those with negative skewness increase less rapidly. The latter tend towards a definite upper limit.
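The reduced-variate base of Figure 3.1 can be illustrated with a minimal sketch. The function names and the parameter values (u = 100, alpha = 30) are illustrative assumptions and are not the values used to draw the figure; the GEV quantile is written for the form F(x) = exp{-[1 - k(x - u)/alpha]^(1/k)}, which is assumed here to match the parameterization of Table 3.1.

```python
import math

def ev1_reduced_variate(T):
    """EV1 reduced variate y_T = -ln(-ln(1 - 1/T)) for return period T > 1."""
    return -math.log(-math.log(1.0 - 1.0 / T))

def gev_quantile(T, u, alpha, k):
    """Quantile Q_T of F(x) = exp{-[1 - k(x - u)/alpha]^(1/k)}.

    k = 0 gives EV1 (linear in y); k < 0 behaves like EV2 (steeper than EV1);
    k > 0 behaves like EV3 (bounded above by u + alpha/k).
    """
    y = ev1_reduced_variate(T)
    if k == 0.0:
        return u + alpha * y
    return u + alpha * (1.0 - math.exp(-k * y)) / k

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the shape of the curves.
    u, alpha = 100.0, 30.0
    for T in (5, 10, 100, 1000):
        y = ev1_reduced_variate(T)
        approx = math.log(T - 0.5)  # approximation quoted in the text for T > 5
        q = [round(gev_quantile(T, u, alpha, k), 1) for k in (-0.15, 0.0, 0.15)]
        print(T, round(y, 3), round(approx, 3), q)
```

With these assumed values the k = 0 column grows linearly in y, the k < 0 column grows faster, and the k > 0 column flattens towards u + alpha/k, which is the qualitative behaviour described for Figure 3.1.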
Various distributions may also be distinguished in another way, namely by the relationships which exist between Cv and Cs and between Cs and Ck for each distributional form. A selection of these are shown on moment ratio diagrams in Figure 3.2. The relations shown apply to population parameter values. The relations appropriate to small sample expectations of these statistics plot below those of Figure 3.2 (see Appendix 3). Because of the known bias in small sample estimates of moments, it is anticipated that a curve joining plotted points representing the expected value of skewness Cs and expected value of kurtosis Ck in small samples should plot below the curve representing the population (Cs, Ck) relations in Figure 3.2.

Some of these distributions were proposed initially because of their abilities to model different shapes of histogram or perhaps simply because they had not been used already. However, some of them have been recommended on the basis of deductive reasoning; for instance, the extreme value family of distributions was proposed by Gumbel (1941) while Chow (1954), many years after its initial use for AM series, suggested theoretical reasons for use of the lognormal distribution. Both these proposals show that there is some deductive basis for these distributions.

3.3.1.4 Theoretical arguments

Gumbel postulated that because the maximum flow in a year is the maximum of daily values it should be distributed as an extreme value variate. However, the 365 flow values are neither identically nor independently distributed. If, therefore, the annual maxima have an extreme value distribution it is for reasons other than those stated by Gumbel. Nevertheless, there are local maxima within the year and some M of these may be mutually independent, even if not identically distributed.
If this M is sufficiently "large" in the context of extreme value theory, then perhaps the maximum may have an EV distribution. From this it is difficult to see a really secure argument in favour of an EV distribution for flood series (see also Lamberti and Pilati, 1985).

A more important problem in the application of EV distributions to floods is that of choosing between Types 1, 2 and 3. Theory offers no help in this regard where the need is greatest. The difficulty and importance of distinguishing between EV1 and EV2 or between EV1 and EV3 is just as great as that of distinguishing between EV1 (or EV2 or EV3) and LP3 for instance. In itself, belief in the grounds on which the EV family of distributions is recommended does not lead to a solution of the choice of distribution problem because great differences exist between the members of the family itself, in the manner in which Q varies with T. Tests proposed by van Montfort (1970) and by Hosking et al. (1985b) to test for EV1 against alternatives of EV2 and GEV respectively are mentioned later in Chapter 5.

The EV3 distribution possesses a finite upper bound and because of this it appeals to those who believe that the flood magnitudes on a catchment must have an upper bound. However, when such an estimated upper bound results by chance from estimating the parameters from a single small or medium sized sample, it can sometimes be unrealistically small, being only a few percent greater than the largest flood in the record.

Of the other distributions, only the log-normal has had any theoretical support elicited for it but then only after 40 years of prior use in hydrology. Chow (1954) stated that if the annual maximum flood could be considered to be the product of a large number of random effects then it would be log-normally distributed, because the logarithm of the variate could be considered to be the sum of a large number of random effects and would therefore be normally distributed by the central limit theorem.
However, to be valid as a deductive theory these effects would have to be identifiable. Failing this the distribution can only be supported by empirical data.

Thus, theoretical arguments cannot, per se, identify a best choice of distribution for floods. Given the empirical evidence that most flood series are positively skewed, theoretical knowledge of distributions' properties can serve to eliminate symmetrical distributions such as normal or rectangular. However, this use of theoretical knowledge is not to be confused with knowledge claimed to be due to reasoning about the genesis of floods. In effect, at the time of writing, empirical suitability plays a much larger role in distribution choice than a priori reasoning.

3.3.1.5 Implication of an upper bound on floods

Because of the striking difference between the major groups of distributions in the growth of flood magnitude with return period, the choice of distribution to be used in engineering design has serious economic implications. If there is a statistical upper bound, Qmax, to flood magnitude at any river site then it would be uneconomical to size large structures for flood magnitudes derived on the assumption of an unbounded distribution unless Qmax >> QT for reasonably large T. Conversely, if the distribution of flood magnitudes is statistically unbounded above then the consequences of designing a large structure for flood magnitudes obtained on the assumption of a bounded distribution could be calamitous.

It is worth noting that the motivation for Boughton's (1980) distribution was that in Eastern Australia the largest recorded flows in some rivers show log-probability plot behaviour which suggests a tailing off towards some upper bound. This is particularly true when the standardized variable K = (Z - \bar{Z})/s_Z is plotted as ordinate, where Z = log Q. That is, the 3 or 4 largest recorded values on many such rivers do not differ greatly from one another as happens in other rivers.
This is manifested by a preponderance of negative values of skewness in log space. The remarkable feature of such largest floods is that in magnitude they can be significant proportions of the PMP flood. This led Boughton to believe that a distribution which can accommodate an upper bound should be formulated for use with such series.

It might be expected that some guidance on the question of whether or not an upper bound to flood magnitude exists might be obtained by recourse to physical examination of the precipitation process which causes floods. While the concept of a maximum possible rainfall amount, from the viewpoint of the limit of the capabilities of the physical processes causing it, is very well publicised and documented (WMO, 1969, 1986), it is not universally accepted by hydrologists (see for instance Yevjevich (1968) and Wallis (1980)). In this context also, Alexander et al. (1969(a) and (b)) attempted to derive the distributional form which should describe the series of annual maximum floods from consideration of the distribution of the rainfall magnitudes involved. However, having examined many aspects of the problem the authors were unable to offer solid guidance on the question of choice of distribution for floods. Reich (1970), using data from 26 Pennsylvanian catchments, found that "no usable relationship could be found between the extreme value statistics of rainfall and floods." On the other hand, French practice assumes that the scale parameter of the distribution of extreme flood volume (and hence indirectly that of the flood peak) can be extrapolated using the distribution of rainfall volume (Guillot, 1973), by the method known as GRADEX.
[Figure 3.1 Magnitude - return period relations for selected distributions, plotted against the EV1 reduced variate y. Upper panel: all curves have mean = 100 and Cv = 0.3; Cs = 2.5 for EV2, TCEV and WAK, with Ck approximately 19, 21 and 25 respectively (GEV k = -0.15 for EV2, k = +0.15 for EV3). Lower panel: all curves have mean = 100 and Cv = 0.3; Cs = 2.5 for P3, LN3 and LP3; Cs = 0.93 for LN2; Cs = 1.14 for EV1.]

Table 3.1 Mathematical definitions of distributions used for AM series
(For each distribution: the probability density function f(x) or distribution function F(x), followed by the variate and parameter ranges.)

Extreme Value Type 1 (Gumbel or EV1)
    F(x) = \exp\{-e^{-(x-u)/\alpha}\}
    -\infty \le x \le \infty, \alpha > 0

General Extreme Value (GEV)
    F(x) = \exp\{-[1 - k(x-u)/\alpha]^{1/k}\}
    \alpha > 0; u + \alpha/k \le x \le \infty if k < 0; -\infty < x \le u + \alpha/k if k > 0

Extreme Value Type 2 (EV2)
    F(x) = \exp\{-[(x-e)/(u-e)]^{-k}\}
    e \le x, 0 \le e < u, k > 0

Log-Gumbel
    F(x) = \exp\{-e^{-(\log x - b)/a}\}
    a > 0, -\infty \le \log x \le \infty

Pearson Type 3
    f(x) = \frac{1}{|a|\Gamma(b)} \left(\frac{x-m}{a}\right)^{b-1} \exp\{-(x-m)/a\}
    m \le x if a > 0; x \le m if a < 0

Gamma
    f(x) = \frac{(x/a)^{b-1}}{|a|\Gamma(b)} \exp(-x/a)
    0 \le x if a > 0; x \le 0 if a < 0
    i.e. Pearson Type 3 with m = 0

Exponential
    f(x) = \frac{1}{a} \exp\{-(x-m)/a\}
    m \le x, a > 0
    i.e. Pearson Type 3 with b = 1 and Weibull with b = 1

Weibull
    f(x) = \frac{b}{|a|} \left(\frac{x-m}{a}\right)^{b-1} \exp\{-[(x-m)/a]^{b}\}
    m \le x if a > 0; x \le m if a < 0

2 Parameter Lognormal (LN2)
    f(x) = \frac{1}{\sqrt{2\pi}\, a x} \exp\{-\tfrac{1}{2}[(\log x - b)/a]^2\}
    0 < x

3 Parameter Lognormal (LN3)
    f(x) = \frac{1}{\sqrt{2\pi}\, a (x-m)} \exp\{-\tfrac{1}{2}[(\log(x-m) - b)/a]^2\}
    m < x

2 Component Extreme Value (TCEV)
    F(x) = \exp(-\Lambda_1 e^{-x/\theta_1} - \Lambda_2 e^{-x/\theta_2})
    x > 0, \theta_1 > 0, \theta_2 > 0

Wakeby (WAK)
    x = m + a[1 - (1 - F(x))^{b}] - c[1 - (1 - F(x))^{-d}]
    See Appendix 4, Table A4.4.

Log Pearson Type 3 (LP3)
    If x is LP3 and z = log x:
    f(x) = \frac{1}{x|a|\Gamma(b)} \left(\frac{z-c}{a}\right)^{b-1} \exp\{-(z-c)/a\}
    c \le z \le \infty (e^{c} \le x \le \infty) if a > 0; -\infty \le z \le c (0 \le x \le e^{c}) if a < 0

Boughton (1980)
    If z = log x then
    (z - \mu_z)/\sigma_z = K = A + \frac{C}{\ln[-\ln(F)] - A}

Log-logistic (LL) (Ahmad et al., 1988)
    F(x) = \{1 + [(x-a)/b]^{-1/c}\}^{-1}
    x > a, c > 0, b > 0

Generalised logistic (GL) (Ahmad, 1988)
    F(x) = \{1 + [1 - \gamma(x-a)/\beta]^{1/\gamma}\}^{-1}, \gamma \ne 0
    F(x) = \{1 + \exp[-(x-a)/\beta]\}^{-1}, \gamma = 0
    a + \beta/\gamma \le x < \infty if \gamma < 0; -\infty < x < \infty if \gamma = 0; -\infty < x \le a + \beta/\gamma if \gamma > 0

[Figure 3.2 Moment-ratio diagrams for selected distributions, showing the population relations between Cv and Cs (skewness) and between Cs and Ck.]

3.3.1.6 Role of skewness in distribution choice

It must be acknowledged that a distribution for annual maximum floods cannot be chosen solely on the basis of a priori theoretical arguments. The characteristics of observed flood data must be determined in a suitable fashion and taken into account when a distribution is being chosen. Some distributions can be excluded if it is known that random samples from them do not have characteristics in common with observed flood data. Slack et al.
(1975) have shown, in the context of quantile estimation, that quantile estimates with small expected opportunity design loss (which is a function of quantile estimate bias and rmse) are obtained even when the assumed form of model distribution is not identical to the population distribution, provided that the model distribution is selected from among distributions which have approximately the same skewness as the population. In the case of a three parameter distribution this would involve placing a constraint on the shape parameter. This is one reason for the need to be able to make correct inferences about the AM population skewness. Rossi et al. (1984) actually rejected EV1 and LN2 distributions for Italian conditions because of their inconsistency with observed skewness of the data.

3.3.1.7 Parsimony

A further question is whether the flood series of all the rivers in a particular region, country or continent should or could be considered to be distributed according to a common distributional form. While this hypothesis may not be provable, the results of random sampling experiments (Wallis et al., 1974; Matalas et al., 1975) indicate such diversity of samples from a common parent that this hypothesis might well be difficult to reject. Its acceptance would concur with the general scientific preference for parsimony of model types.

3.3.2 DISTRIBUTIONS FOR PD SERIES

The modelling problem in the partial duration series is also one of choice of distribution coupled with a choice of the number of peaks, M = λN, to include in the series. While increasing M increases the amount of information in the series it sometimes makes the problem of choice of distribution more difficult and it may also affect the assumptions made about model structure as explained below. An exponential distribution of magnitudes is moderately satisfactory in series of length M < 2N.
Increasing the size of the series to M > 3N or more often introduces a striking departure from exponential behaviour at the lower end and results in a series that is not always easy to model. If q0 is too low the distribution displays a mode at some value greater than q0, as illustrated in Figure 3.3(a). This may be more difficult to model than a distribution whose mode is at q0. In such cases, the statistical algebra of truncated distributions may be complex. The exponential distribution is exceptional in this regard and the invariance of its algebraic form with change in threshold level helps to account for its popularity in this application.

[Figure 3.3 Effect of threshold level on type of distribution required to model the PD series; density f(q) plotted against q. (a) Low value of threshold: truncated distribution with mode > q0, which can be difficult to model. (b) Medium to high value of threshold: truncated distribution with mode at q0, which may be modelled more easily.]

In most published papers on analysis of partial duration flood series the exponential distribution plays a prominent part but other distributions have also been used. These distributions (log-normal, Pearson 3, log-Pearson 3, Pearson 4, Polya, gamma, sine-power, geometric, Goodrich and extreme value type 1) are listed in Table X of WMO Operational Hydrology Report No. 15 (Sevruk and Geiger, 1981) together with references to publications in which they appear. Many of these refer to precipitation series. The generalized Pareto distribution (GPD) (van Montfort and Witter, 1985) can also be adapted for use in this context and could be a more realistic model for PD magnitudes than the simple exponential. This would be consistent with the use of GEV distribution for AM series.
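The consistency between a GPD model for PD magnitudes and a GEV model for AM series can be checked numerically with a minimal sketch. It assumes (this is not spelled out in the passage above) that peaks over the threshold q0 occur as a Poisson process with mean rate lam per year, that magnitudes are independent, and that the GPD shape parameter k has the same sign convention as the GEV form F(x) = exp{-[1 - k(x - u)/alpha]^(1/k)}. All function names and numerical values are hypothetical.

```python
import math

def gpd_exceed_prob(q, q0, beta, k):
    """P(peak > q | peak > q0) for a generalized Pareto tail; k = 0 is exponential."""
    if k == 0.0:
        return math.exp(-(q - q0) / beta)
    return (1.0 - k * (q - q0) / beta) ** (1.0 / k)

def am_cdf_from_pd(q, lam, q0, beta, k):
    """P(annual maximum <= q) for q >= q0, with Poisson(lam) peaks per year."""
    return math.exp(-lam * gpd_exceed_prob(q, q0, beta, k))

def equivalent_gev(lam, q0, beta, k):
    """GEV parameters (u, alpha, k) implied by the Poisson-GPD model."""
    if k == 0.0:
        return q0 + beta * math.log(lam), beta, 0.0   # exponential peaks give EV1
    alpha = beta * lam ** (-k)
    return q0 + (beta - alpha) / k, alpha, k

def gev_cdf(q, u, alpha, k):
    """GEV distribution function in the form assumed in the lead-in."""
    if k == 0.0:
        return math.exp(-math.exp(-(q - u) / alpha))
    return math.exp(-(1.0 - k * (q - u) / alpha) ** (1.0 / k))

if __name__ == "__main__":
    lam, q0, beta = 3.0, 50.0, 20.0   # hypothetical values
    for k in (-0.1, 0.0, 0.1):
        u, alpha, kk = equivalent_gev(lam, q0, beta, k)
        q = 120.0
        print(k, am_cdf_from_pd(q, lam, q0, beta, k), gev_cdf(q, u, alpha, kk))
```

Under these assumptions the two printed probabilities agree for each k, and the k = 0 case reduces to the exponential PD model paired with an EV1 annual maximum distribution.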
3.3.2.1 Clusters of peaks in PD series

If the number of peaks, M, included in the series is high then the corresponding threshold, q0, is necessarily low. In some years in these circumstances flood peaks exceeding q0 appear to occur in clusters, which gives rise to the fear that the successive flood peaks are not statistically independent and/or that the rate of occurrence of the peaks, as distinct from the magnitudes, is controlled at times by some form of persistent process. Cunnane (1979), having applied arbitrary but consistent rules to exclude some flood peaks, found no correlation among peak magnitudes but concluded that a persistence mechanism may exist in the process which gives rise to the flood peaks. Ashkar and Rousselle (1983a) discuss the effect of choice of threshold level q0 on the validity of the PD model assumptions and recommend that q0 be chosen sufficiently large to ensure that flood peaks exceeding q0 occur in a Poisson fashion. Ashkar and Rousselle (1983b) examine the effect of placing restrictions on the selection, for the PD series, of flood events from the observed historical series to exclude peaks which might be feared to be inter-correlated. They reiterate the advantages of choosing a threshold q0 which allows the valid use of the Poisson assumption but, where circumstances do not allow such a choice, they refer to the relatively new stochastic trigger type model (Cervantes, 1981; Kavvas, 1982 (a) and (b); Cervantes et al., 1983). Apart from these restricted comparisons no further objective comparisons based on bias and efficiency of QT estimates have been published and the TS model has never been considered in this way.

CHAPTER 4 METHODS OF QUANTILE ESTIMATION

4.1 Introduction

Many methods of quantile estimation have been suggested in the past, some of which are tailored for particular circumstances.
As indicated in Section 4.2, estimation methods depend on data availability and on the amount of regional pooling of data which is to be allowed. Section 4.3 outlines a number of basic distinctions between different types of methods while Section 4.4 outlines the main types of schemes used in practice for at-site, at-site/regional and regional only (ungauged catchment) cases. Section 4.5 discusses design flood specification and draws attention to the question of expected probability correction. In view of the increased use of regional analysis, the value of which is being continually demonstrated, the question of regional homogeneity of flood statistical behaviour is of great importance and this is discussed briefly in Section 4.6. Some aspects of flood estimation in arid zones, by no means a solved problem, are discussed in Section 4.7. While the PD model has some advantages over the AM model, much of what follows relates to the AM model because it is by far the more widely used, mainly because the AM series is more easily available.

4.2 Dependence on data availability

The method adopted for estimation of QT in any situation depends on the amount and type of hydrological data available at the site in question. Two broad categories of data availability are:
(a) at-site hydrological data are available, along with data at some other sites in the region;
(b) at-site hydrological data are not available but data at some other sites in the region are available.

From the data input point of view, three categories of quantile estimation schemes may be identified:
(a) use of at-site data alone;
(b) joint use of at-site and regional data;
(c) use of regional data alone in the absence of at-site data.

These three methods are amplified in Section 4.4. In the absence of at-site and regional flow data, recourse would have to be had to estimating flood quantiles from rainfall statistics and a rainfall-runoff model by methods which are not the subject of this report.
4.3 Some basic distinctions and concepts

4.3.1 DISTRIBUTIONAL AND PARAMETRIC METHODS (OR DISTRIBUTION-FREE AND NON-PARAMETRIC METHODS)

The probability that the ith largest among N past AM flood events will be exceeded during N' future AM floods may be estimated, using "ball and urn" type probability models, without making any assumption about the distribution of flood magnitudes (Thomas, 1949; Gumbel, 1958, Chapter 2.2). Such a technique is called distribution-free. Its drawback is that it can be applied only to the N observed variate values and neither intermediate values nor values outside the observed range can have their probabilities estimated. Hence quantiles of arbitrary return periods cannot be estimated directly by "ball and urn" distribution-free methods.

The use of modern non-parametric methods, suitable for use with large samples, has been demonstrated hydrologically by Adamowski (1985). Such a method can obviate some of the disadvantages of the "ball and urn" methods at the expense of computational effort and some complexity.

If AM data are represented on a cumulative histogram or on a probability plot, an eye-guided curve may be drawn through the plot which thus defines, in a non-algebraic manner, a relation between magnitude Q and probability (and hence T). Such a non-algebraic relation defines the distribution of Q in a non-parametric manner. If such a relation could be described algebraically by an expression which depends on only a few constants or parameter values (e.g. equation 1.6), it would be called a parametric description of the distribution of Q.

In general, distribution-free methods are less efficient (see Section 4.3.2) than distributional methods. They are by nature more robust than distributional methods but this in itself is insufficient to recommend them.
The statistical nature of flood series (Chapter 2, Section 2) is sufficiently well-known to make a reasonable, if not perfect, choice of a distribution, whose use would yield more efficient quantile estimates than distribution-free methods. Similarly, parametric description of distributions leads to quantile estimators which are objective and whose sampling properties are capable of being more readily investigated. They are generally more efficient and hence they are preferred to non-parametric descriptions of flood distributions.

4.3.2 BIAS, STANDARD ERROR AND EFFICIENCY

Objective quantile estimation methods are based on methods devised for use with truly random samples from stationary populations. Such random samples have the characteristic that different samples, when treated in the same way, generally yield numerically different values of quantile estimates. Subsamples of long AM series also yield different values of quantile estimates and their variation is similar to what would be expected to occur among estimates obtained from truly random samples, a finding which lends some validity to the assumption that AM series can be treated by random sample methods (see for instance NERC 1975, 1.2.5).

If \hat{Q}_T is the value of the quantile estimate obtained by a particular distribution and estimation (D/E) procedure and Q_T is the population value, define bias, b, standard error, se, and root mean square error, rmse, by

    b = E(\hat{Q}_T) - Q_T    (4.1)

    var(\hat{Q}_T) = E[\hat{Q}_T - E(\hat{Q}_T)]^2    (4.2)

    se(\hat{Q}_T) = [var(\hat{Q}_T)]^{1/2}    (4.3)

    rmse = [E(\hat{Q}_T - Q_T)^2]^{1/2}    (4.4)

from which

    (rmse)^2 = (se)^2 + b^2    (4.5)

One D/E procedure is more efficient than another if it has smaller sampling variance (= se^2) than the other. Efficiency of a D/E procedure is a property which is inversely proportional to sampling variance.
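The quantities in equations (4.1) to (4.5) can only be evaluated when the population value of QT is known, which in practice means a simulation experiment. The following is a minimal sketch of such an experiment, not a reproduction of any published study: the D/E procedure (EV1 fitted by ordinary moments), the two parent distributions, their parameter values and all function names are assumptions chosen purely for illustration.

```python
import math
import random

EULER = 0.5772156649015329

def uniform_open(rng):
    """Uniform variate strictly inside (0, 1), so logarithms below are safe."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def gev_quantile_from_y(y, u, alpha, k):
    """Quantile for reduced variate y; k = 0 is EV1 (assumed GEV sign convention)."""
    return u + alpha * y if k == 0.0 else u + alpha * (1.0 - math.exp(-k * y)) / k

def draw_gev(rng, u=100.0, alpha=30.0, k=0.0):
    """One annual maximum from a hypothetical GEV parent, by inversion."""
    return gev_quantile_from_y(-math.log(-math.log(uniform_open(rng))), u, alpha, k)

def ev1_mom_quantile(sample, T):
    """D/E procedure: EV1 fitted by ordinary moments, then the T-year quantile."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    alpha = sd * math.sqrt(6.0) / math.pi
    u = mean - EULER * alpha
    return u - alpha * math.log(-math.log(1.0 - 1.0 / T))

def assess(estimator, draw, true_qt, T, n, reps, seed=1):
    """Monte Carlo bias, se and rmse of an estimator, as in (4.1), (4.3), (4.4)."""
    rng = random.Random(seed)
    est = [estimator([draw(rng) for _ in range(n)], T) for _ in range(reps)]
    mean_est = sum(est) / reps
    bias = mean_est - true_qt
    se = math.sqrt(sum((q - mean_est) ** 2 for q in est) / reps)
    rmse = math.sqrt(sum((q - true_qt) ** 2 for q in est) / reps)
    return bias, se, rmse

if __name__ == "__main__":
    T, n, reps = 100, 30, 2000
    y_T = -math.log(-math.log(1.0 - 1.0 / T))
    for k in (0.0, -0.15):  # EV1 parent, then a heavier-tailed parent
        true_qt = gev_quantile_from_y(y_T, 100.0, 30.0, k)
        b, se, rmse = assess(ev1_mom_quantile,
                             lambda rng, k=k: draw_gev(rng, k=k),
                             true_qt, T, n, reps)
        print(k, round(b, 1), round(se, 1), round(rmse, 1))
```

Because se is computed here with divisor reps, the identity (4.5) holds exactly for the simulated values; comparing the two printed rows gives a rough indication of how this particular procedure behaves when the parent is not EV1.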
4.3.2.1 Confidence intervals

A 90% confidence interval for a quantile, based on a single estimate \hat{Q}_T, is usually evaluated as

    \hat{Q}_T \pm 1.645\, se(\hat{Q}_T)    (4.6)

on the assumption that the \hat{Q}_T statistic is normally distributed (e.g. Hosking et al., 1985a, p.95 and Fig.5). Hebson and Cunnane (1987) found that \hat{Q}_T estimates have a skewed distribution when obtained from small samples by at-site estimation methods but that they are very close to being normally distributed when obtained by combined at-site/regional methods. Guidelines for flood frequency analysis in the United States (USWRC, 1981) contain methods for computing confidence intervals for skewed distributions (specifically LP3) using a non-central t-distribution while Stedinger (1983) gives a good account of confidence intervals in the hydrological context.

A confidence interval obtained in this way needs to be interpreted with care. The 90% confidence interval thus defined would include or straddle the true value QT in 90% of samples in repeated sampling. It does not mean that there is 90% probability that QT lies in the interval, defined in equation (4.6), calculated from a particular sample. The latter type of statement would be valid only in a Bayesian context.

4.3.3 BAYESIAN AND NON-BAYESIAN METHODS OF ESTIMATION

In non-Bayesian estimation, parameters and quantiles of a population are considered to be fixed but unknown. Their values are estimated from random sample data. In Bayesian estimation, parameters and quantiles are considered not as unknown constants but rather as unknown random variables, the distributions of which are modified by random sample data in the estimation procedure.

In Bayesian methods information about parameters and quantiles from separate sources, such as regional and at-site data, are combined in a logical framework. The distribution of each parameter, and that of QT also, is modified by each new piece of information.
Initial knowledge, if any, is expressed as a prior distribution and sample information is incorporated in the form of a likelihood function which depends on the distributional form chosen for the population. The prior distribution and sample likelihood are combined via Bayes' Theorem to give a posterior distribution of the parameters and of the quantity required, QT. If there is no initial knowledge on the parameters, they are assigned a uniform prior distribution. This is known as the non-informative prior distribution. An informative prior can be obtained from regionally-based equations relating the parameters to characteristics of the catchment such as area, slope, soil cover and an index of climate such as mean annual rainfall (Cunnane and Nash, 1971; Wood and Rodriguez-Iturbe, 1975; Kuczera, 1982; Kuczera, 1983). (If upper and lower confidence bounds, QU and QL, are obtained for QT from its Bayesian posterior distribution then we may speak of the probability of the event [QL < QT < QU] if we understand the probability to be a subjective probability, whereas we cannot make such a direct statement in the non-Bayesian case.)

The Bayesian estimator is also of direct use for decision-making (Wood and Rodriguez-Iturbe, 1975). Inference about a hydrologic quantity such as QT is not an end in itself but one of the steps taken towards an engineering and economic decision such as the height of a levee, depth of a channel or width of a spillway. Bayesian inference can be combined with decision theory in a far more satisfactory manner than can traditional frequency-based inferences. In this approach model uncertainty can also be taken into account (Wood, 1974).

4.3.4 DATA DISPLAY

Ranked sample data are usually displayed on a probability plot having a normal, Gumbel (EV1) or exponential base.
Unbiased plotting positions should be used (Lieblein, 1953; Cunnane, 1978) because the traditional Weibull formula, i/(N+1), leads on average to data from a "straight line" population having an elongated S appearance on the probability plot. The resulting bias in graphical quantile estimates was noted by Benson (1960) in his classic sampling experiment, but such bias was not attributed by him to the plotting position bias. The unbiased plotting formulae are:

For normal paper (Blom, 1958):

    F_i = \frac{i - 3/8}{N + 1/4}    (4.7a)

For Gumbel and exponential paper (Gringorten, 1963):

    F_i = \frac{i - 0.44}{N + 0.12}    (4.7b)

where F_i is the plotting probability, N is sample size and i is rank, with i = 1 indicating the smallest sample member.

Unbiased plotting formulae are distribution-specific. If a single distribution-free formula is required then

    F_i = \frac{i - 0.4}{N + 0.2}    (4.8a)

might be a reasonable compromise for mildly skewed data (Cunnane, 1978) while Hazen's formula (Hazen, 1914)

    F_i = \frac{i - 0.5}{N}    (4.8b)

would be more suitable for more skewed data.

Unbiased plotting positions have been considered by Ji et al. (1984) for the P3 distribution and by Arnell et al. (1986) for the GEV distribution. Each of these specifies the plotting formula in terms of i, N and constants which vary with the respective shape parameters of the distributions and which are available in tabular form. In-na (1988) developed a single formula, in which skewness is explicitly included, for unbiased plotting positions for P3 samples which, after a little rearrangement, can be expressed as

    F_i = \frac{i - 0.53 + 0.3 C_s}{N + 0.05 + 0.3 C_s}    (4.9)

a result also published in Nguyen et al. (1988). In these cases specially graduated graph paper can be constructed for specific values of the shape parameter on which data from a distribution having that value of shape parameter would tend to plot as a straight line.
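The plotting formulae (4.7) and (4.8), together with the Weibull formula, share the form F_i = (i - a)/(N + 1 - 2a) and differ only in the constant a. The following minimal sketch exploits this; the function names and the dictionary of constants are illustrative assumptions, and equation (4.9) is coded separately because it does not fit the one-constant form.

```python
import math

# Constant a in F_i = (i - a) / (N + 1 - 2a); i = 1 is the smallest sample member.
PLOTTING_CONSTANT = {
    "weibull": 0.0,      # i / (N + 1)
    "blom": 0.375,       # (4.7a), normal paper
    "cunnane": 0.4,      # (4.8a), compromise formula
    "gringorten": 0.44,  # (4.7b), Gumbel and exponential paper
    "hazen": 0.5,        # (4.8b)
}

def plotting_positions(n, formula="cunnane"):
    """Plotting probabilities F_1..F_n for ranks i = 1..n (ascending order)."""
    a = PLOTTING_CONSTANT[formula]
    return [(i - a) / (n + 1.0 - 2.0 * a) for i in range(1, n + 1)]

def p3_plotting_positions(n, cs):
    """Equation (4.9): skew-dependent plotting positions for P3 samples."""
    return [(i - 0.53 + 0.3 * cs) / (n + 0.05 + 0.3 * cs) for i in range(1, n + 1)]

def gumbel_plot_coordinates(data):
    """Pair each ranked value with its EV1 reduced variate at Gringorten positions."""
    ranked = sorted(data)
    probs = plotting_positions(len(ranked), "gringorten")
    return [(-math.log(-math.log(f)), q) for f, q in zip(probs, ranked)]

if __name__ == "__main__":
    floods = [212.0, 145.0, 310.0, 188.0, 260.0, 171.0, 405.0, 229.0]  # hypothetical AM data
    for y, q in gumbel_plot_coordinates(floods):
        print(round(y, 3), q)
```

Plotting q against y on ordinary graph paper then plays the role of Gumbel probability paper; a roughly straight scatter would be compatible with an EV1 parent.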
Zhang (1982) and Hirsch (1985) have considered plotting formulae in the context of inclusion of historical flood data in estimation schemes.

Probability plots can also be prepared on ordinary graph paper if the plotting positions are expressed as the expected values of the reduced (standardized) variate order statistics E(y_{(i)}) (Cunnane, 1978). An approximation to these is

    y_i = y(F_i) \approx E(y_{(i)})    (4.10)

where y(F) is the inverse form of the cumulative distribution function of the appropriate reduced variate, and F_i is obtained via equations (4.7) or (4.8); for example,

    y_i = -\ln\{-\ln[(i - 0.44)/(N + 0.12)]\}    (4.11)

is an approximation to E(y_{(i)}) for EV1 which is almost exact at i = N. For the exponential case the exact expression for E(y_{(i)}) has been given by Sukhatme (1937),

    E(y_{(i)}) = \sum_{j=1}^{i} \frac{1}{N + 1 - j}    (4.12)

4.4 Quantile estimation schemes

4.4.1 USE OF AT-SITE DATA ALONE

AM series data are considered as a random sample from a population of flood values whose distribution can be described by a probability density function which depends on just a few parameters. Traditionally one of the two or three parameter distributions of Table 3.1 is selected. The three parameters are usually related to location, scale and shape of the distribution and are closely analogous to the distribution's mean, standard deviation and coefficient of skewness.

Having decided on a distribution, parameter estimation may be made by a non-Bayesian or by a Bayesian method. The latter is generally used in the regional context only (see 4.4.2.2 below).

4.4.1.1 Non-Bayesian methods

These include moments (MOM), maximum likelihood (ML), least squares (LS), probability weighted moments (PWM) (Greenwood et al., 1979) and sextiles (Jenkinson, 1969).

MOM, although easy to apply, does not use all the sample information in an exhaustive manner and is not as efficient as ML estimation, especially in three parameter distributions (Matalas and Wallis, 1973).
The ML method is regarded as being best because it is most efficient. That is, the sampling variance of the estimated parameters, and hence of $Q_T$, is asymptotically smaller than that given by any other estimation method. ML estimates are frequently biased but corrections for bias can be found (Fiorentino and Gabriele, 1984; Hosking, 1985). More seriously, ML estimates may not always be feasible in small samples from three parameter distributions (Matalas and Wallis, 1973; Hosking et al., 1985b). The application of ML estimation is no longer unattractive from a numerical point of view because of the widespread use of programmable calculators and the increased number of micro and personal computers now available.

The location and scale parameters of a distribution can be obtained by least squares regression of the ranked sample values on ordered reduced variate values known as plotting positions. The use of generalised least squares for this purpose was pointed out by Lloyd (1952), while Chow (1954) used ordinary least squares and Gumbel (1958) introduced a modified least squares procedure. The resulting scale parameter, and hence $Q_T$, will be biased upwards if the Weibull plotting position, $i/(N+1)$, is used (Lowery and Nash, 1970) but need not be biased if suitable plotting positions are adopted (Lieblein, 1953; Cunnane, 1978).

The PWM method, developed by Greenwood et al. (1979), calculates linear functions M(0), M(1), M(2) of the sample data (see Appendix 4) and these quantities are equated to theoretically derived expressions in the distribution's parameters in a manner analogous to the use of ordinary moments. In the EV1 and exponential cases the resulting quantile estimates are linear functions of the sample data, a property shared by least squares estimates in these cases. The PWM method has good statistical properties (Landwehr et al., 1979; Hosking et al., 1985b).
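A minimal sketch of PWM estimation for the EV1 case follows. It assumes that M(k) denotes $E[Q\,(1 - F)^k]$, as in Greenwood et al. (1979), and uses the usual unbiased sample estimator with weights $\binom{N-i}{k} / \binom{N-1}{k}$ on the ascending order statistics; the report's own definitions are in Appendix 4, which is not reproduced here, so this convention is an assumption. Under it the EV1 parameters follow from $\alpha = [M(0) - 2M(1)]/\ln 2$ and $u = M(0) - 0.5772\,\alpha$. Function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant


def sample_pwm(sample, k):
    """Unbiased estimate of M(k) = E[Q (1 - F)^k] (assumed convention, see lead-in)."""
    x = sorted(sample)  # ascending: rank i = 1 is the smallest value
    n = len(x)
    total = 0.0
    for i, xi in enumerate(x, start=1):
        # Weight C(n - i, k) / C(n - 1, k); math.comb returns 0 when k > n - i.
        total += xi * math.comb(n - i, k) / math.comb(n - 1, k)
    return total / n


def fit_ev1_pwm(sample):
    """PWM estimates of the EV1 location u and scale alpha."""
    m0 = sample_pwm(sample, 0)  # equals the sample mean
    m1 = sample_pwm(sample, 1)
    alpha = (m0 - 2.0 * m1) / math.log(2.0)
    u = m0 - EULER_GAMMA * alpha
    return u, alpha


if __name__ == "__main__":
    u, alpha = fit_ev1_pwm([1.0, 2.0, 3.0, 4.0, 5.0])
    print(u, alpha)
```

Both M(0) and M(1) are weighted sums of the ordered data, so $u$, $\alpha$ and any quantile $u + \alpha\,y_T$ are linear in the sample values, which is the linearity property mentioned above.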
The PWM method was originally used, however, only with distributions whose distribution function $F(q)$ is explicitly expressible in inverse form $q = q(F)$, thus ruling out the normal, Pearson type 3 and their log-distributions. However, Song and Ding (1988) have derived an algorithm for PWM estimation of P3 parameters, and the use of PWM estimation for TCEV has been investigated (Beran et al., 1986). A rigorous definitive account of PWMs is given by Hosking (1986).

If parameters of a log-distribution are being estimated by moments there is more than one possible variation, viz.
(a) the moments of the data may be equated to the corresponding expressions in terms of parameters in the flow or real domain;
(b) the moments of the logarithms of the data may be equated to expressions in the parameters in the log domain; or
(c) one parameter may be estimated in the real domain and one or more parameters may be estimated in the log domain.
The last of these is referred to as the method of mixed moments (see for instance Phien and Hira (1983)).

4.4.1.2 Historical Data

Methods have also been developed, with the help of existing techniques for censored samples (see for instance Kendall and Stuart, 1961), to incorporate some types of historical information or to allow data to be used which have been truncated owing to an inadequate streamflow recorder. Such techniques have been reported by Benson (1950), Leese (1973), USWRC (1981) and Condie and Lee (1983), and have recently been examined critically by Hosking and Wallis (1986a, 1986b) and by Stedinger and Cohn (1985, 1986). Chinese practice (Chen et al., 1975; Hua, 1985) is based on graphical estimation using plotting positions whose derivation has been thoroughly investigated by Zhang (1982).

4.4.1.3 Robustness

Another topic which has rightly attracted the attention of hydrologists (Houghton, 1977; Kuczera, 1982b) during the last decade is the robustness of statistical estimators.
Broadly speaking, an estimator is robust if it estimates $Q_T$ "sensibly well" even if the assumptions used in the estimation are slightly wrong or if the data are ill-behaved because of outliers or measurement errors. Robustness has been more thoroughly discussed in Section 3.3.1.1.

4.4.2 JOINT USE OF AT-SITE AND REGIONAL DATA

This method, of which there are many variations, assumes that AM populations at several sites in a region are similar with respect to statistical characteristics which are not dependent on catchment size. This homogeneity assumption is a gross oversimplification but it is a very convenient and effective one.

In most circumstances it is advisable to combine site specific and regional information. If only a small sample of AM data is available at a site, one could not hope to estimate the entire Q-T relationship from it. At the very most, no more than two parameters of the AM distribution would be estimated from the sample, while the form of the distribution to be fitted would have to be chosen in the light of regional experience. Indeed, in view of the very large variations that occur in $C_v$ estimated from small samples drawn from a parent population, the estimation of a second parameter from small samples is of doubtful validity.

In addition, if a three parameter distribution is adopted, the third parameter, skewness or a function of it, cannot be estimated from the small sample because of the high sampling variance involved. Hence an average regional value of skewness must be used. Such a practice is currently prevalent in the United States of America (USWRC, 1981), based on a map of "generalized" log-skew. This practice is not condoned, however, by all hydrologists in that country and is strongly opposed by some (see Wallis, 1980) on the grounds of serious statistical shortcomings (Landwehr et al., 1978). In the U.K.
NERC (1975) recommended that if only a short record were available, $\bar{Q}$ be estimated from it and a regional multiplier $Q_T/\bar{Q}$ be used in conjunction with it. This of course assumes not only a constant regional skewness but also a constant regional coefficient of variation. Thus in general, one or two parameters of the flood population at the required site are obtained from the at-site record and the remainder of the required information is obtained by some regional averaging procedure.

4.4.2.1 Index flood procedure

The variate $Q$ is divided by $\bar{Q}$ and the resulting variate $X = Q/\bar{Q}$ is assumed to have the same form of distribution at every site. In this context $\bar{Q}$ is the index flood. The parameters of the distribution of $X$ are obtained from the combined regional data sets and $\bar{Q}$ for any application is obtained from at-site data. The quantile $Q_T$ is then estimated as

$$\hat{Q}_T = \bar{Q} \cdot x_T \tag{4.13}$$

The parameters of the $X$ distribution may be obtained by
(a) Regional averaging of dimensionless at-site quantile estimates $Q_T/\bar{Q}$ (Dalrymple, 1960).
(b) Regional averaging of dimensionless moments such as $C_v$ (Nash and Shaw, 1965) and/or $C_s$ (USWRC, 1976, 1977, 1981).
(c) Regional averaging of dimensionless at-site order statistics $x_{(i)} = Q_{(i)}/\bar{Q}$ and fitting a distribution to these either graphically or numerically by a regression-probability plot approach or by probability weighted moments. A variation of this, adapted for use with records of unequal length, was described by NERC (1975, 1.2.6). The $x_{(i)}$ quantities were also used by Houghton (1978) to estimate a regional "righteous" Wakeby distribution from 46 US flood records of length 60 years.
(d) Regional averaging of dimensionless PWMs, M(1)/M(0), M(2)/M(0), M(3)/M(0), as outlined by Wallis (1980). (Note that M(0) = $\bar{Q}$ = sample mean.) This method has been found to be easy to apply and is robust and efficient. An example of its use is given in Appendix 5.
(e) Pooling all $x = Q/\bar{Q}$ values ($x_{ij} = Q_{ij}/\bar{Q}_j$; $i = 1, 2, \ldots, N_j$; $j = 1, 2, \ldots,$
$M$) and treating them as a single sample from a population for parameter estimation purposes. This could be called a "station year" approach.

In (a) to (d) statistics are averaged over sites without invoking the space-time equivalence principle encompassed in the station year approach (e). Hence (a) to (d) avoid most undesirable consequences of interstation dependence or correlation. Procedure (d) is demonstrated by way of numerical example in Appendix 5.

4.4.2.2 Bayesian Procedures

The use of Bayes' Theorem for combining prior and sample flood information was introduced by Bernier (1967). Cunnane and Nash (1971) showed how it could be used to combine regional estimates of $\bar{Q}$ and $C_v$ obtained from catchment characteristics, using a bivariate log-normal distribution for $\bar{Q}$ and $C_v$, with site data assumed to be EV1 distributed, to give a posterior distribution for $Q_T$. This method involves considerable amounts of numerical integration. Wood (1974) discusses the manner in which uncertainty about model type can be taken into account by use of Bayes' Theorem, while Wood and Rodriguez-Iturbe (1975) give a specific example of Bayesian analysis which uses regional and site hydrological data, together with flood cost and damage functions, to determine the optimum size of flood protection works. They assume that floods are log-normally distributed and use conjugate distribution theory, rather than extensive numerical methods, to obtain the posterior distribution of the log-normal distribution parameters.

Kuczera (1982) has given a very thorough description of a general framework in which regional and site specific information can be combined in Bayesian analysis to give a posterior distribution of a flood quantity, together with a study of risk. This he describes as the empirical Bayes (EB) approach. Kuczera (1983) extends this to the study of the effect of spatial correlation and sampling uncertainty on the above EB procedure.
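A simple special case helps to fix ideas. When the prior (regional) information and the sample (site) estimate of a quantity are both treated as normal with known variances, the standard conjugate result is that the posterior mean is a precision-weighted average of the two. The sketch below is an illustration of that textbook result only, not of any of the procedures cited above; the function name and the numbers are hypothetical.

```python
def combine_normal(prior_mean, prior_var, site_mean, site_var):
    """Posterior mean and variance for a normal prior and a normal site estimate.

    prior_var is the variance of the regional (prior) estimate and site_var the
    sampling variance of the at-site estimate (for a mean, s^2 / N).
    """
    prior_precision = 1.0 / prior_var
    site_precision = 1.0 / site_var
    w = prior_precision / (prior_precision + site_precision)  # weight on the regional estimate
    post_mean = w * prior_mean + (1.0 - w) * site_mean
    post_var = 1.0 / (prior_precision + site_precision)
    return post_mean, post_var, w


if __name__ == "__main__":
    # Hypothetical example: log of mean annual flood.
    # Regional regression gives 5.0 with variance 0.09; a 10-year record
    # gives 5.4 with sampling variance 0.25 / 10.
    print(combine_normal(5.0, 0.09, 5.4, 0.25 / 10))
```

As the site record lengthens its sampling variance falls, the weight `w` on the regional estimate shrinks, and the combined estimate moves towards the at-site value.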
It should be noted that while application of a Bayesian method involves more algebraic development than a conventional treatment, the final estimate is capable of being interpreted as a linear combination of a prior (regional) and a sample (site record) estimate. This can be exactly so in the case of prior and sample distributions both being normal.

4.4.2.3 Two component extreme value (TCEV) procedure

This procedure (Rossi et al., 1984) is a regional flood frequency procedure which uses a 4-parameter distribution. It is intended for use in conditions where some floods are considered to be caused by a different physical mechanism which tends to produce outliers, i.e. abnormally large floods which are apparently not consistent with the remainder of the series. Ideally, such events should be recognisable by their physical causes, in which case two of the parameters would be estimated from "ordinary" flood events and the other two would be estimated from the "extraordinary" flood events. If there are insufficient of the latter available at any site then these two latter parameter values would have to be estimated from regionally pooled and standardised "extraordinary" values.

In the case of Italian data, however, Rossi et al. (1984) were unable to identify, by cause, floods of the two types even though they were able to establish that outliers exist in their data set. They thus resorted to estimation of parameters without decomposition of the data into two types. Because of the small number of extraordinary events at any site it was also necessary to adopt a regional approach, which entails some sort of standardisation method to remove the effect of differences in catchment sizes and other characteristics.

The TCEV distribution can be written as

$$F_Q(q) = \exp\left[-\Lambda_1 e^{-q/\theta_1} - \Lambda_2 e^{-q/\theta_2}\right] \tag{4.14a}$$

$$F_Q(q) = \exp\left[-e^{-(q - c_1)/\theta_1}\right] \cdot \exp\left[-e^{-(q - c_2)/\theta_2}\right] \tag{4.14b}$$

where $c_1 = \theta_1 \ln \Lambda_1$ and $c_2 = \theta_2 \ln \Lambda_2$.
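The two forms of the TCEV distribution function, and the fact that its quantiles have no closed form, can be illustrated with a short sketch. The code below evaluates both forms of equation (4.14) and inverts the distribution function by bisection; the function names and the parameter values are hypothetical and are not taken from Rossi et al. (1984).

```python
import math


def tcev_cdf(q, lam1, theta1, lam2, theta2):
    """Eq. (4.14a): F(q) = exp[-lam1 exp(-q/theta1) - lam2 exp(-q/theta2)]."""
    return math.exp(-lam1 * math.exp(-q / theta1) - lam2 * math.exp(-q / theta2))


def tcev_cdf_product(q, lam1, theta1, lam2, theta2):
    """Eq. (4.14b): product of two EV1 distribution functions."""
    c1 = theta1 * math.log(lam1)
    c2 = theta2 * math.log(lam2)
    ev1_a = math.exp(-math.exp(-(q - c1) / theta1))
    ev1_b = math.exp(-math.exp(-(q - c2) / theta2))
    return ev1_a * ev1_b


def tcev_quantile(T, lam1, theta1, lam2, theta2, n_iter=100):
    """T-year quantile (T >= 2) found by bracketing and bisection."""
    target = 1.0 - 1.0 / T
    lo = hi = theta1 * math.log(lam1)  # F(lo) <= exp(-1) < 0.5, so lo is below the quantile
    step = max(theta1, theta2)
    while tcev_cdf(hi, lam1, theta1, lam2, theta2) < target:
        hi += step
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if tcev_cdf(mid, lam1, theta1, lam2, theta2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    params = (10.0, 50.0, 0.3, 150.0)  # hypothetical (lam1, theta1, lam2, theta2)
    for T in (10, 100, 1000):
        print(T, round(tcev_quantile(T, *params), 1))
```

With these illustrative values the second component has little effect at low return periods but increasingly controls the upper tail, which is the behaviour the distribution is intended to capture.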
Equation (4.14b) represents the product of two EV1 distribution functions, which indicates that $Q$ is the maximum of two independent EV1 variates. The observed data at each site are standardised by

$$y' = \frac{q - \hat{c}_1}{\hat{\theta}_1} \tag{4.15}$$

where $(\hat{c}_1, \hat{\theta}_1)$ are ML estimates of EV1 parameters obtained from the trimmed AM sample following omission of outliers previously detected in a specified way. $Y'$ is distributed in a similar way to $Q$,

$$F_{Y'}(y') = \exp\left[-\Lambda'_1 e^{-y'/\theta'_1} - \Lambda'_2 e^{-y'/\theta'_2}\right] \tag{4.16}$$

with modified parameters $(\Lambda'_1, \theta'_1, \Lambda'_2, \theta'_2)$. (4.17)

The $j$th of $M$ stations in a region provides $N_j$ values of $y'$, viz. $[y'_{ij},\ i = 1, 2, \ldots, N_j]$. The station year assumption is then invoked and $L = N_1 + N_2 + \ldots + N_j + \ldots + N_M$ values of $Y'$ are assumed to form a single random sample from a TCEV distribution. The parameter values are estimated by the method of maximum likelihood, using techniques originating with Hasselblad (1969). Having estimated regional parameters for $y'$, regional quantiles $y'_T$ can be obtained, from which at-site estimates can be obtained by inverting equation (4.15):

$$\hat{Q}_T = \hat{c}_{1j} + \hat{\theta}_{1j}\, y'_T \tag{4.18}$$

where $(\hat{c}_{1j}, \hat{\theta}_{1j})$ are values of $(\hat{c}_1, \hat{\theta}_1)$ obtained from the trimmed data set at site $j$.

The above method has been applied to 39 Italian AM series totalling 1525 station years by Rossi et al. (1984), and by Beran et al. (1986) to 57 British AM series totalling 2334 station years. In both cases random samples from the fitted regional distribution gave skewness values which were as variable as the skewness of the observed data, thus accounting for the condition of separation of skewness of Matalas et al. (1975).

4.4.3 USE OF REGIONAL DATA ALONE

When no site data are available there are currently three possible procedures. The first of these uses the index flood approach in the following three steps and was formalized by Dalrymple (1960).
(a) Establish a dimensionless $x_T$ - $T$ relation, where $x_T = Q_T/\bar{Q}$, by one of the methods of Section 4.4.2.1;
(b) Estimate $\bar{Q}$ from a regionally calibrated relation between $\bar{Q}$ and physically measurable catchment characteristics (Benson, 1962; Nash and Shaw, 1965; Thomas and Benson, 1970; NERC, 1975; Stedinger and Tasker, 1985, 1986); and
(c) Use equation (4.13) to obtain $\hat{Q}_T$.

The form of relation adopted for estimating $\bar{Q}$ depends on the amount of physical and climatic data available. A considerable range of numerically expressed catchment characteristics has been described by Benson (1962), and a wide range of logarithmic regression equations for both high and low flows has been given by Thomas and Benson (1970).

A second approach, used by Nash and Shaw (1966) and also by the US Corps of Engineers, is to summarise each observed streamflow record in a region by the mean $\bar{Q}$ and coefficient of variation $C_v$ of its annual maximum flood series and to derive separate relations, by logarithmic regression, between these two quantities and catchment characteristics. Using the catchment characteristics of an ungauged catchment, estimates of $\bar{Q}$ and $C_v$ are obtained. An assumption is then made that the AM population is distributed according to some two parameter distribution such as EV1, gamma or LN2, and the distribution is fitted by the first two moments obtained via $\bar{Q}$ and $C_v$. However, while a statistically significant relation between $C_v$ and catchment characteristics may be obtained, the resulting equation usually has a prohibitively large standard error.

A third approach is to estimate $Q_T$ separately at each site in the region, by fitting a distribution to the AM data for that site, for a selection of $T$ values such as 2, 5, 10, 25, 50 and 100 years. For each $T$ a logarithmic regression relation is established between $\hat{Q}_T$ and the catchment characteristics.
Then, when $\hat{Q}_T$ is required for an ungauged catchment, the values of the relevant catchment characteristics are inserted in the derived equation and $\hat{Q}_T$ is evaluated. This approach has the disadvantage that many sets of parameters have to be estimated. There is considerable sampling error in each relation derived and the ratio of $\hat{Q}_{T_1}$ to $\hat{Q}_{T_2}$ may not be realistic. It could occur that $\hat{Q}_{T_1}/\hat{Q}_{T_2} > 1$ even though $T_1 < T_2$. However, this latter condition rarely occurs in practice (W.O. Thomas Jr., 1988, Pers. Comm.) and can usually be avoided by using the same catchment characteristics for each value of $T$.

This third method is widely used in the U.S.A., especially by the U.S. Geological Survey (Thomas, 1987; Tasker, 1987). The paper by Thomas (1987) contains a list of more than 60 current reports for individual states in the U.S.A. which give equations relating $\hat{Q}_T$ to catchment characteristics, for instance Eychaner (1984), Simmons and Carpenter (1978) and Bridges (1982), to mention but a few; a list of 13 reports detailing methods of estimating peak floods from channel geometry, for instance Hedman and Riggs (1978) and Osterkamp (1982); and a list of 26 reports which include methods for urbanized catchments, for instance Olin and Bingham (1982) and Conger (1986).

4.5 Specification of design flood quantile

Stedinger (1983b) draws attention to the fact that two formulations exist for the design flood quantile. The first or traditional formulation is to seek the best sample or regional estimate of

$$Q_T = \mu + \sigma K_T \tag{4.19}$$

where $\mu$ and $\sigma$ are the population mean and standard deviation and $K_T$ is a frequency factor (Chow, 1951) dependent on the form of the distribution and on $T$. If $\hat{\mu}$ and $\hat{\sigma}$ are statistical estimates of $\mu$ and $\sigma$ then the traditional estimate of $Q_T$ is

$$\hat{Q}_T = \hat{\mu} + \hat{\sigma} K_T \tag{4.20}$$

The second formulation takes into account Beard's (1960) observation that the expected exceedance probability of the traditional estimate, over many equally sized samples, is not $1/T$ as is required.
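Beard's observation can be checked numerically for the normal case. The sketch below draws many samples of size $N$ from a standard normal population, forms the traditional estimate (sample mean plus $K_T$ times sample standard deviation) from each, and averages the true exceedance probabilities of those estimates. It is a minimal Monte Carlo illustration under these stated assumptions; the function name, the number of samples and the seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def expected_exceedance(n, T, n_samples=20000, seed=1):
    """Average true exceedance probability of the estimate mean + K_T * sd, normal parent."""
    rng = random.Random(seed)
    std_normal = NormalDist()            # parent population: mean 0, sd 1
    k_t = std_normal.inv_cdf(1.0 - 1.0 / T)  # frequency factor K_T for the normal distribution
    total = 0.0
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        q_hat = mean + k_t * sd
        total += 1.0 - std_normal.cdf(q_hat)  # true exceedance probability of this estimate
    return total / n_samples


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, round(expected_exceedance(n, T=100), 4), "target", 1 / 100)
```

For short records the average exceedance probability is noticeably larger than $1/T$, and it approaches $1/T$ as $N$ increases.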
In the case of $Q$ being a normal variate, Beard (1960) showed that the expression

$$\hat{Q}_T = \hat{\mu} + \hat{\sigma}\left[1 + (1/N)\right]^{1/2} t_{N-1} \tag{4.21}$$

does have the required exceedance probability $1/T$ when averaged over many samples. Here $N$ is sample size and $t_{N-1}$ is a Student t variate with $N-1$ degrees of freedom. The use of equation (4.21) instead of equation (4.20) is known as Beard's expected probability correction. Stedinger (1983b) also shows that equation (4.21) is equivalent to the result obtained by Bayesian analysis using a non-informative prior distribution for the parameters. Moran (1957) also derived equation (4.21) for the design value by considering the joint distribution of the observed sample mean and standard deviation and a future, as yet unobserved, value from the population. This concept, well developed for the normal distribution case, should not be lost sight of when other distributions are being used. It is automatically taken care of in a Bayesian estimation framework.

4.6 Regional Homogeneity

Regional flood estimation methods are based on the premise that a standardised flood variate, such as $X = Q/E(Q)$, has the same distribution at every site in the chosen region. In particular $C_v(x)$ and $C_s(x)$ are considered to be constant across the region. Serious departures from such assumptions could lead to biased flood estimates at some sites. Those catchments whose $C_v$ and $C_s$ values happen to coincide with the regional mean values would fortuitously not suffer such bias. Nevertheless, if the degree of heterogeneity present is not too great its negative effect may be more than compensated for by the larger sample of sites contributing to parameter estimates. Thus $x_T$ estimated from $M$ sites which are slightly heterogeneous may be more reliable than $x_T$ estimated from a smaller number, say $M/3$, of more homogeneous sites, especially if flow records are short.
At least five categories of questions arise in this context (Cunnane, 1987):
(a) is the flood frequency behaviour of any one of $M$ sites in a region, with AM records available, inconsistent with that of the remainder of the group?
(b) are geographically defined regions better or worse than regions obtained by partitioning the catchment characteristic data space?
(c) how can a large group of catchments be divided into homogeneous sub-groups or regions?
(d) how can an ungauged catchment be allocated to one of a number of pre-selected homogeneous regions?
(e) what degree of departure from regional homogeneity can be tolerated in a flood quantile estimation procedure?

The present state of knowledge about these questions is based on the work of Langbein (1947), Benson (1962b), Cole (1966), Biswas and Fleming (1966), De Courcey (1972), White (1975), Mosley (1981), Beable and McKerchar (1982), Tasker (1982), Farhan (1984), Acreman and Sinclair (1984, 1986), Wiltshire (1985, 1986a,b,c) and Wiltshire and Beran (1986). The more recent of these have been surveyed in Cunnane (1987).

Geographical regions are convenient and may divide a country into disparate regions by chance, because of variation of soils, climate and topography with latitude and longitude, but in general the geographical proximity of two catchments is no guarantee that they are similar from the flood frequency point of view. Wiltshire (1986a,b,c) shows that geographical regions in Britain display neither as much internal homogeneity nor as much heterogeneity between regions as regions derived by other methods proposed by him. Acreman and Sinclair (1986) draw similar conclusions using Scottish data.

In seeking to divide a country into regions which are internally homogeneous but mutually heterogeneous, Wiltshire (1985) divides all his catchments into two or more groups by partitioning on one or more catchment characteristics.
Thus each step consists of a trial set of regions, and the internal homogeneity and mutual heterogeneity of these regions is numerically expressed in terms of a flow statistic such as $C_v$. The process is repeated by altering the partitioning points in catchment characteristic space until an acceptable set of regions has been identified. This is clearly a computationally intensive procedure. Acreman and Sinclair (1986), on the other hand, seek homogeneous regions by using a clustering algorithm which allocates catchments to regions by identifying clusters of catchments in catchment characteristic data space. The regions thus identified are tested for dissimilarity by the use of a likelihood ratio test on the assumption that data are GEV distributed. At this time it is not possible to define regions without some possibility of misallocation of catchments to incorrect regions.

If an ungauged or poorly gauged catchment is to be assigned to one of several previously defined regions then a new problem arises unless the regions are defined geographically. Multivariate discriminant analysis, suggested by Mosley (1981), has been used by Wiltshire (1986a,b,c) for British catchments. In this method a probability of membership of region $k$, $p_k$, can be calculated for any catchment whose catchment characteristic vector is given, using coefficients determined during the allocation of gauged catchments to regions. The standardized quantile estimate for that catchment is then given by

$$\hat{x}_T = \sum_k p_k\, x_T^{(k)} \tag{4.22}$$

where summation is over all regions and $x_T^{(k)}$ is the $k$th region estimate of $x_T = Q_T/\bar{Q}$.

Finally it may be stated briefly here that in regions of relatively low $C_v$ (< 0.6, say) a small degree of heterogeneity does not negate the benefits of using an at-site/regional estimating method.
However, in extremely heterogeneous regions, with $C_v > 0.6$ and a coefficient of variation of $C_v$ greater than 0.2, use of a single regional estimate of $x_T = Q_T/\bar{Q}$ will lead to severe positive or negative bias for some catchments (Lettenmaier and Potter, 1985). On the other hand, at-site estimates from short records in high $C_v$ regions have such large standard errors as to be almost worthless.

4.7 Flood estimation in arid and semi-arid zones

Arid and semi-arid zones contain some streams which may have no flow for periods of time which sometimes extend to a number of years. Thus the usual AM model assumptions do not hold and statistical problems arise. (Measurement is also a major problem in such zones.) According to Yevjevich (1979), arid zone precipitation series have similar statistical properties to truncated humid zone series, in which case the sequence of all peaks in an arid zone river might be expected to look like a PD series from a humid zone river. Thus PD models are very appropriate for arid zone hydrology when used with a regional standardization and parameter estimating scheme.

In the absence of readily available PD data, AM models have to be adapted to take into account years having zero flood values. A distribution may be fitted to the non-zero values and, for selected flood values $Q$, the conditional exceedance probability $P'(Q)$ is calculated from the fitted distribution. The conditional probability is converted to unconditional probability by multiplying by the probability of a non-zero flood year:

$$P(Q) = P'(Q) \cdot [m/N] \tag{4.23}$$

where $P(Q)$ is the unconditional probability in the annual series and $m/N$ is the probability of a non-zero flood year, $m$ being the number of non-zero flood years in $N$ years of record. A more complex version of this type of adjustment, based on the work of Jennings and Benson (1969), is recommended by USWRC (1981).
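Equation (4.23) is simple to apply in either direction. The sketch below converts a conditional exceedance probability to an unconditional one and, in reverse, finds the conditional probability at which the distribution fitted to the non-zero values should be evaluated to obtain the T-year flood. Function names and the record lengths used are hypothetical.

```python
def unconditional_exceedance(p_cond, m, n):
    """Eq. (4.23): P(Q) = P'(Q) * m / N, with m non-zero flood years in N years."""
    return p_cond * m / n


def conditional_exceedance_for_return_period(T, m, n):
    """Conditional exceedance probability P'(Q) that corresponds to return period T."""
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    p_cond = (1.0 / T) * n / m
    if p_cond >= 1.0:
        # A flood this frequent lies below the range of the non-zero values.
        raise ValueError("return period too short for this proportion of zero years")
    return p_cond


if __name__ == "__main__":
    m, n = 30, 40  # hypothetical record: 30 non-zero flood years out of 40
    p_cond = conditional_exceedance_for_return_period(100, m, n)
    print(p_cond, 1.0 / p_cond)  # conditional probability and conditional return period
    print(unconditional_exceedance(p_cond, m, n))
```

In this illustration the 100-year flood of the annual series is the flood exceeded with probability of about 0.0133 among non-zero years, that is, the 75-year flood of the distribution fitted to the non-zero values.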
The conditional probability adjustment of equation (4.23) is more correct and reliable (Beard, 1974) than replacing zero values by some arbitrary non-zero value such as 1% of $\bar{Q}$. Dalin (1986) has pointed out that AM series in arid zone streams in the Middle East may consist of three categories of data:
(a) zero or drought values, 10 - 20% of values
(b) normal flood values, 60 - 80% of values
(c) extreme flood values, 10 - 20% of values

These percentages depend on the region and degree of aridity, and the percentage of zero values may be as high as 50% in the Southwestern United States (Thomas, 1988, Pers. Comm.). When plotted on probability paper these indicate three regimes, the upper two of which are reminiscent of the TCEV model of Rossi et al. (1984) and of Potter's (1958) "upper and lower frequency" curves. Dalin demonstrates how the upper "extreme" values may be standardized by a statistic calculated from the "normal" flood values, of which there are a greater number. The standardized "extreme" values are then pooled regionally and have a simple exponential distribution fitted to them. This may also be viewed as an index flood method applied to truncated data.

Floods in arid zones arise mainly from intense convective thunderstorms of very limited areal extent, which thus affect catchments randomly with little spatial pattern or coherence. Some minor floods occur as a result of other lower intensity rainfalls, leading to very high values of $C_v$ for arid zone flood series (see McMahon (1979), Wallis (1982)). Long records are essential in such circumstances, but in their absence some form of regionalization technique, such as those of Section 4.4.2 above, an adaptation of Dalin's (1986) approach, or a regionally calibrated PD model, is essential.

CHAPTER 5 PROPERTIES OF QUANTILE ESTIMATORS

5.1 Sources of error of estimation

The estimated value, $\hat{Q}_T$, may differ from the true value $Q_T$ because of
(a) Inability of the model chosen (AM or PD) to reproduce the population Q-T relation;
(b) Incorrect choice of distribution to describe the population within the chosen model type;
(c) Bias in the estimating procedure (if this is known to exist, a correction can be made for it);
(d) Sampling error due to the fact that parameters are estimated from a finite sample;
(e) The available record (sample) may not be a truly random sample from the required population. No control can be exercised over this, even though tests can be made of the reasonableness of the assumption.

In this chapter sources (a) and (e) will not be considered further. It is inevitable in all flood estimation schemes that sources (b), (c) and (d) contribute.

5.2 Properties of at-site quantile estimators

Investigations of these properties consider quantile estimates obtained from random samples under one of the following assumptions:
(a) the estimating method is based on knowledge of the form of the parent distribution;
(b) the estimating method is based deliberately on the assumption that the data have come from a distribution different in form from the true parent distribution. This enables the robustness of the estimating procedure to be examined.

In either case the results are expressible in terms of bias, standard error and root mean square error. In case (a), the required expressions are obtainable analytically for some methods of estimation, while for others recourse must be had to Monte Carlo simulation methods. In case (b) analytical methods are too intractable and simulation methods are always necessary. Several investigations of the above types have been published and Table 5.1 lists some of the relevant references.
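A simulation study of case (a) can be sketched in a few lines. The example below assumes an EV1 parent with a chosen mean and $C_v$, estimates $Q_T$ by moments from repeated samples of size $N$, and summarises the estimates by bias, standard error and root mean square error. It is an illustrative sketch only; the function names, the number of replicates and the seed are arbitrary.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler's constant


def ev1_from_moments(mean, sd):
    """EV1 location u and scale alpha from a mean and standard deviation."""
    alpha = sd * math.sqrt(6.0) / math.pi
    return mean - EULER_GAMMA * alpha, alpha


def ev1_quantile(u, alpha, T):
    return u - alpha * math.log(-math.log(1.0 - 1.0 / T))


def ev1_variate(rng, u, alpha):
    """Draw one EV1 value by inverting the distribution function."""
    p = rng.random()
    while p == 0.0:  # avoid log(0); random() returns values in [0, 1)
        p = rng.random()
    return u - alpha * math.log(-math.log(p))


def sampling_properties(n, T, mean=100.0, cv=0.3, n_rep=5000, seed=1):
    """Return (true Q_T, bias, se, rmse) of the EV1/MOM estimator of Q_T."""
    rng = random.Random(seed)
    u, alpha = ev1_from_moments(mean, cv * mean)
    q_true = ev1_quantile(u, alpha, T)
    estimates = []
    for _ in range(n_rep):
        sample = [ev1_variate(rng, u, alpha) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        u_hat, alpha_hat = ev1_from_moments(m, s)
        estimates.append(ev1_quantile(u_hat, alpha_hat, T))
    mean_est = sum(estimates) / n_rep
    bias = mean_est - q_true
    se = math.sqrt(sum((q - mean_est) ** 2 for q in estimates) / n_rep)
    rmse = math.sqrt(sum((q - q_true) ** 2 for q in estimates) / n_rep)
    return q_true, bias, se, rmse


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, [round(v, 1) for v in sampling_properties(n, T=10)])
```

With the divisor used here the three measures satisfy rmse² = bias² + se² exactly, so the share of the mean square error that is due to bias can be read off directly. Case (b) follows the same pattern with the samples drawn from a parent different from the fitted distribution.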
In general, standard error of estimate increases with T, population Cv and Cs values and is inversely proportional to sample size: se(&r) = (J ~(T, Cs) I (5.1) NII2 where (J is population standard deviation and ~( ) is a function which depends on the form of the parent distribution and on the method of parameter estimation. ~() would also depend on the form of the estimating distribution if different from the parent. Tables 5.2 (a), (b) and (c) show a limited selection of standard error values adapted or taken from published A sources. Table 5.2 (a) shows in the simple case of a two parameter distribution (EVI) how se(Qr )varies with N, T and parent Cv . For simplicity Table 5.2 (a) is based on ordinary moments estimation. Slightly different values would be obtained using other methods of estimation (Lowery and Nash, 1970; Landwehr~, I 980; Fiorentino and Gabrielle,1984; Hosking,1985). Table 5.2 (b) gives selected results from Hosking tl.1!L (1985 b) for GEV A populations and shows how seCOr) varies with N, T and method of parameter estimation for a fixed value of parent skewness. CHAPrER5 Model Distribution Extreme Value Type I Lognormal Pearson Type 3 Log Pearson 3 General Extreme Value (GEV) and LogEVI Wakeby TCEV • •• 35 PROPERTIES OF QUANTILE ESTIMATORS At-site Regional • • • •• • •• •• • • • •• •• •• • • • • • •• • • •• •• • § §§ §§ §§ Reference Kaczmarek (1957) Nash and Amorocho (1966) Lowery and Nash (1970) Landwehr et al. (1979a) Greis and Wood (1981) Fiorentino and Gabrielle (1984) Lettenmaier ~ (1985, 1987) Lettemnaier and Potter (1985) Kaczmarek (1957) Sangal and Biswas (1970) Burges l:Ul. (1975) Kuczera (1982b) Stedinger (1980) Lettenmaier and Potter (1985) Matalas and Wallis (1973) BobCe (1973) §§ Condie (1977) Nozdryn-Plotnicki and Watt (1979) Hoshi and Burges (1981) Phien and Hsu (1984) Wallis and Wood (1985) §§ § §§ Jenkinson (1969) Hosking et al. (1985a) Hosking et al. (1985b) Wallis and Wood (1985) Lettenmaier et al. 
(1985, 1987) Arnell and Gabrielle (1985, 1988) § §§ §§ §§ Landwehr et al. (l979b,c) Wallis (1980) Hosking l:Ul. (1985a) Wallis and Wood (1985) Arnell and Gabrielle (1985, 1988) §§ §§ Arnell and Gabrielle (1985, 1988 Arnell and Beran (1987) Parent and assumed model distributions are the same. Also tests model distribution under different parent distribution assumption(s). (Robustness test) . § and §§ For regional cases correspond to * and ** in at-site estimation. Table 5.1. Selected references to investigations into sampling properties of quantile estimates. 36 STATISTICALDISTRIBUITONS FOR FLOOD FREQUENCY ANALYSIS Table 5.2 (c) gives selected results from Kuczera (1982 a) in which a robustness study was carried out using a variety of methods to estimate quantiles of Wakeby parcnt distribution. These results show that 2 parameter estimators, LN2 and EVl, have smaller rmse values than 3 and 4 parameter estimators especially in small samples. A considerable proportion of rmse is due to bias in the 2 parameter estimators while that proportion is almost negligible in the 3 and 4 parameter estimators. In other words the more flexible 3 and 4 parameter estimators have negligible bias but very large sampling error while the less flexible 2 parameter estimators have considerable bias but much smaller standard error. Similar conclusions were drawn by Lellenmaier et al. (1987) in relation to EVI and GEV quantile estimators, the latter having negligible bias and very large standard errors while the former exhibit tight confidence intervals but with considerable bias when the population is GEV (k < 0). This type of result is displayed schematically in Figure 5.1 (a) and (c). IFIXED SKEWNESS MODELS (usually with 2-parameters) fa; Model C s < Parent C s Negv, bias (b; Model Cs ==> => 1 I > Parent Cs posve bias 1 "--- .... 0 !- _- .. ---- -- --........... - ..... - -. T o ~~- ~- ----- ~.......... 1--_ ....... --- -. 
[Figure 5.1, lower panels. VARIABLE SKEWNESS MODELS (usually with > 2 parameters): (c) At-site use, small bias, large se; (d) Regional (for x_T) + at-site use, low bias, low se. In each panel the horizontal axis is T, with ticks at 10, 10^2 and 10^3, and the plotted quantities are bias/Q_T and [bias ± 1.96 se(Q̂_T)]/Q_T on a vertical scale from -1 to 1.]

Figure 5.1. Qualitative outline of simulation results about quantile estimates. (a) and (b) show the effect of choosing a wrong distribution having fixed but incorrect skewness. (c) and (d) show the effect of using a flexible distribution, i.e. low bias and large standard error (se) when used in at-site mode, but very much reduced se as well as low bias when used in at-site/regional mode.

Entries are se(Q̂_T), with [se(Q̂_T)/Q_T]% in brackets.

                        Sample size
Cv    T     Q_T      10              50              100
0.3   10    139.3    19.8  [14]      8.7  [6.0]      6.3  [4.5]
0.3   100   194.2    37.3  [19]      16.7 [8.5]      11.8 [6.1]
0.6   10    178.6    39.6  [22]      17.7 [10.0]     12.5 [7.0]
0.6   100   288.4    74.6  [26]      33.3 [11.5]     23.6 [8.2]
0.9   10    217.9    59.5  [27]      26.6 [12.2]     18.8 [8.6]
0.9   100   382.6    111.8 [29]      50.0 [13.1]     35.4 [9.3]

(a) Standard error of Q̂_T estimated by EV1/MOM in samples of size 10, 50 and 100 from EV1 populations, with μ = 100 and Cv = 0.3, 0.6 and 0.9.

Entries are se(Q̂_T), with [se(Q̂_T)/Q_T]% in brackets.

                              Sample size
T     Q_T     Method    15              50             100
10    2.84    PWM       0.97 [34]       0.54 [19]      0.40 [14]
10    2.84    ML        1.42 [50]       0.62 [22]      0.43 [15]
10    2.84    JS        0.97 [34]       0.54 [19]      0.40 [14]
100   7.55    PWM       4.15 [55]       2.49 [33]      1.81 [24]
100   7.55    ML        Not quoted      3.02 [40]      1.89 [25]
100   7.55    JS        4.23 [56]       2.49 [33]      1.81 [24]

(b) Standard error of Q̂_T estimates by GEV/PWM, GEV/ML and GEV/JS from GEV samples. Parent population parameters u = 0, a = 1, k = -0.2 (μ = 1.16, Cv = 0.31, Cs = 3.54). (From Hosking et al., 1985b, Table 7.) (JS = Jenkinson's (1969) method of sextiles.)
                 Sample size 15                          Sample size 30
Estimator        rmse    % of Q_T   % due to bias        rmse    % of Q_T   % due to bias
LN2/ML           0.78    26         26                   0.06    20         35
LP3              1.20    40         2                    0.82    28         3
EV1/ML           0.75    26         64                   0.66    22         76
log EV1/MOM      1.44    49         25                   1.18    40         44
WAK-4/PWM        1.03    35         1                    0.76    26         1

(c) Root mean square error of Q̂_T estimates for T = 100 by five estimators in samples of sizes 15 and 30 from a Wakeby parent population with μ = 100, Cv = 0.52, Cs = 2.42 and Q_100 = 2.936. (Adapted from Tables 1, 3, 4 of Kuczera, 1982b.)

Table 5.2. Standard error of selected quantile estimators showing dependence on sample size and on some population parameters.

5.3 Properties of at-site/regional quantile estimators

While at-site/regional estimators such as index flood methods (Dalrymple, 1960; NERC, 1975; Beable and McKercher, 1982) have been widely used, systematic investigation of their sampling properties was not undertaken until this decade (Wallis, 1980; Greis and Wood, 1981; Kuczera, 1982b; Hosking et al., 1985a; Wallis and Wood, 1985; Lettenmaier et al., 1987; Lettenmaier and Potter, 1985; Arnell and Gabrielle, 1985, 1988). The circumstances investigated in these tests can be seen in Table 5.1. Lettenmaier (1985) gives a lucid account of results available up to mid-1985. All such investigations have been conducted by simulation methods. Such tests generally proceed along the following lines.

(a) Select a flood-like parent distribution;
(b) Select the estimators to be tested, i.e. model distributions and methods of parameter estimation;
(c) Select a hypothetical region, i.e. number of stations, M, and length of record at each station;
(d) Select parent distribution parameters so as to produce regions with the required Cv and Cs, with or without regional homogeneity;
(e) For each selected parent:
    (i) Generate a region-full of data,
    (ii) For each estimator calculate Q̂_T for each site, for different selected values of T,
    (iii) Repeat (i) and (ii) a large number (usually at least 1000) of times,
    (iv) For each estimator and return period calculate bias, se and rmse;
(f) Compare the estimators on the basis of the results of (e) (iv).

The selection made in steps (a) to (d) determines the range of applicability of the results. A wide variety of parent distributions have been used in such tests. Wakeby, GEV and TCEV parents are considered to be hydrologically most challenging when testing the robustness of both 2-parameter and more complex models. The robustness of a particular model is tested by assessing its performance with data drawn from a different parent distribution.

Such experiments yield a large volume of results which are not easy to assimilate at a glance. The general nature of the results of selected experimental types is shown in Figure 5.1. Figure 5.1 (a) shows the type of result obtained when a 2-parameter fixed skew model is used to estimate quantiles when data are drawn from a population with higher skewness than that implied by the model. This usually results in negative bias, whether the model is used in at-site or at-site/regional mode. When used in at-site mode it has a much smaller se than a typical 3-parameter model, Figure 5.1 (c), while the latter has much less bias. Corresponding but opposite remarks apply when the model skewness exceeds that of the parent, Figure 5.1 (b). Thus 2-parameter models have high efficiency but are biased, while in contrast 3-parameter models can be unbiased but quite inefficient. The 2-parameter model bias can be damaging, while the 3-parameter inefficiency makes 3-parameter models unattractive for at-site use alone.
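The test procedure and the trade-off between bias and standard error described above can be illustrated by a minimal simulation sketch. It is not taken from any of the studies cited. It assumes a single site (no regional pooling), a GEV parent with u = 0, a = 1, k = -0.2 (the parent of Table 5.2 (b)), N = 30, T = 100, and two estimators, EV1/PWM (fixed skewness) and GEV/PWM (variable skewness), both fitted with unbiased sample PWMs. The function names, the number of replications and the random seed are illustrative choices only.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def gev_quantile(F, u, a, k):
    """GEV quantile x(F) = u + a * (1 - (-ln F)**k) / k, valid for k != 0."""
    return u + a * (1.0 - (-math.log(F)) ** k) / k


def sample_pwms(x):
    """Unbiased sample probability weighted moments b0, b1, b2."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(i * v for i, v in enumerate(xs)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * v for i, v in enumerate(xs)) / (n * (n - 1) * (n - 2))
    return b0, b1, b2


def ev1_pwm_quantile(x, T):
    """T-year quantile from an EV1 distribution fitted by PWM."""
    b0, b1, _ = sample_pwms(x)
    a = (2.0 * b1 - b0) / math.log(2.0)
    u = b0 - EULER_GAMMA * a
    return u - a * math.log(-math.log(1.0 - 1.0 / T))


def gev_pwm_quantile(x, T):
    """T-year quantile from a GEV fitted by PWM (Hosking et al. approximation for k)."""
    b0, b1, b2 = sample_pwms(x)
    c = (2.0 * b1 - b0) / (3.0 * b2 - b0) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    g = math.gamma(1.0 + k)
    a = (2.0 * b1 - b0) * k / (g * (1.0 - 2.0 ** (-k)))
    u = b0 + a * (g - 1.0) / k
    return gev_quantile(1.0 - 1.0 / T, u, a, k)


def simulate(n=30, T=100, reps=2000, u=0.0, a=1.0, k=-0.2, seed=1):
    """Return {estimator: (bias, se, rmse)}, each as a fraction of the true Q_T."""
    rng = random.Random(seed)
    true_q = gev_quantile(1.0 - 1.0 / T, u, a, k)
    estimates = {"EV1/PWM": [], "GEV/PWM": []}
    for _ in range(reps):
        # Step (e)(i): generate one at-site record from the assumed parent.
        sample = []
        for _ in range(n):
            p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep logs finite
            sample.append(gev_quantile(p, u, a, k))
        # Step (e)(ii): apply each estimator to the same record.
        estimates["EV1/PWM"].append(ev1_pwm_quantile(sample, T))
        estimates["GEV/PWM"].append(gev_pwm_quantile(sample, T))
    results = {}
    for name, q in estimates.items():
        # Step (e)(iv): bias, se and rmse over all replications.
        bias = statistics.fmean(q) - true_q
        se = statistics.pstdev(q)
        rmse = math.sqrt(statistics.fmean([(v - true_q) ** 2 for v in q]))
        results[name] = (bias / true_q, se / true_q, rmse / true_q)
    return results


if __name__ == "__main__":
    for name, (bias, se, rmse) in simulate().items():
        print(f"{name}: bias/QT = {bias:+.2f}, se/QT = {se:.2f}, rmse/QT = {rmse:.2f}")
```

Under these assumptions the fixed-skewness EV1/PWM estimator should show a negative bias with a comparatively small se, and GEV/PWM a larger se, in the manner of Figure 5.1 (a) and (c). The actual figures depend on the seed and on the number of replications, and a regional version would need steps (c) and (d) as well.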
In general the 3-, 4- and 5-parameter models used in regional index flood mode with PWM estimation lead to results typified by Figure 5.1 (d) and outperform all other models in respect of bias and especially efficiency (small se). It should be noted that ML estimation is usable only in regional estimation schemes which make the station year assumption that all standardized flood values in a region can be considered to be a single random sample. This assumption has not always been regarded as valid and hence station year methods are not often used.

In his review of estimation methods Lettenmaier (1985) concludes: "regionalisation is the most viable way of improving flood quantile estimation. The performance of regional PWM (index flood type) estimators for GEV and WAK distributions, in particular, are so superior to the currently used institutional methods that no viable argument for the continuation of the current practice is evident. Particularly where the flexibility of using a 3-parameter distribution is required, the reduction in variability of flood quantile estimators achieved by proper regionalization is so large that at-site estimators should not be seriously considered. The lone exception will be where no physically defensible region can be identified. In such cases, it will be necessary to use a 2-parameter distribution, such as EV1 [or LN2], and accept large estimation bias, or a 3-parameter distribution such as GEV [or LP3] and accept large estimation variability."

An example of the reduction of se(Q̂_T) with increase in region size for homogeneous regions is given in Table 5.3.

Since Lettenmaier's (1985) review, the use of the TCEV distribution as a flood parent and the use of the Rossi et al. (1984) TCEV regionalization procedure have been investigated by Arnell and Gabrielle (1985, 1988) in studies in which GEV and WAK parents and regional estimating methods were also used.
It was found that the WAK/PWM at-site/regional quantile estimating procedure performed very well over all parent distributions. GEV/PWM performed well with GEV parents but noticeably less well with TCEV and WAK parents, a lack of flexibility in GEV seen also in the results of Hosking et al. (1985a). The TCEV regional model had downward bias and relatively small se when the parent was GEV, but had small bias and large se, particularly for large T, with TCEV and WAK parents. However, the TCEV model may be realistic in arid zones as well as in two-flood season humid regions and should not be lightly dismissed. It deserves further testing in such conditions.

Entries are [se(Q̂_T)/Q_T]% for quantile return period T.

L      M     N     T = 20    T = 100    T = 1000
30     1     30    8.8       22.9       55.1
150    5     30    4.1       9.9        20.3
196    14    14    3.3       7.7        15.0
300    10    30    3.0       7.2        14.4
400    20    20    2.4       5.7        11.2
600    20    30    2.1       4.9        9.8
900    30    30    1.7       4.1        8.1

Table 5.3. Reduction in se(Q̂_T) with increase in region size using the at-site/regional GEV/PWM estimating procedure. Parent population is GEV (10, 4, -0.177) at all sites, i.e. a homogeneous region with Cv = 0.53, Cs = 3.0. First row is for at-site GEV/PWM, others are for at-site/regional GEV/PWM estimation. (M = No. of stations, N = No. of years at each station, L = M.N). (Adapted from Hosking et al., 1985a, Table 3.)

5.3.1 EFFECT OF REGIONAL HETEROGENEITY

Even where flood statistical behaviour is not regionally homogeneous, Lettenmaier and Potter (1985) and Lettenmaier et al. (1987) have shown that at-site/regional procedures are still preferable to at-site procedures unless the degree of heterogeneity is very large.

5.3.2 PERFORMANCE OF LP3 QUANTILE ESTIMATION METHODS

Following recommendations of a US Interagency Committee, LP3 was recommended for use by US Federal Agencies (USWRC, 1967). The basis of the choice was goodness of fit to 10 long natural streamflow records.
Beard (1974) examined the ability of LP3 and of other models to predict rates of occurrence, in the second half of a flood record, of quantiles estimated by each model from the first half of the records. A correction for expected probability (Beard, 1960) was employed. On the basis of the results from 300 records, he concluded that only LP3 with regional skew values, and LN2, were able to predict future frequencies without bias and that the former method was preferable.

The quantile estimating ability of LP3 has been examined for at-site use by Kuczera (1982b), who found that not only are EV1 and LN2 more efficient quantile estimators than LP3 but 4-parameter WAK is also more efficient (see Table 5.2 (c)). Wallis and Wood (1985) tested the regional LP3 quantile estimator as specified by USWRC (1981) and found that when LP3 is the parent, the LP3 estimator is less precise than regional GEV/PWM or regional WAK/PWM. The same conclusion is true when other parent distributions are considered, but to a greater degree, indicating that LP3 is also lacking in robustness.

Table 5.4 shows examples of how GEV/PWM and WAK/PWM usually give smaller se(Q̂_T) than the USWRC (1981) recommended regional LP3 procedure, especially where site population skewness is negative.

Selected    Parent     LP3 with regional skew    GEV/PWM at-site/regional    WAK/PWM at-site/regional
site No.    Cs(z)      bias      se(Q̂_T)         bias      se(Q̂_T)           bias      se(Q̂_T)
9           -0.660     1.033     1.301           0.010     0.186             -0.096    0.200
18          -0.349     0.435     0.611           0.032     0.161             -0.134    0.201
1           -0.064     0.098     0.363           0.215     0.289             0.987     0.206
14          0.216      -0.058    0.258           0.152     0.223             0.031     0.162
16          0.441      -0.31     0.247           0.361     0.397             0.218     0.272

Table 5.4. Comparison of three regional estimating techniques for estimating the 500-year return period flood. Data generated as LP3 for a 20-site heterogeneous region with -0.66 < Cs(z) < 0.44, where Cs(z) = skewness in log space.
Tabulated quantities are fractions of true quantile values at each site. (Selected from Wallis and Wood, 1985, Table 3.)

The conclusions drawn by Wallis and Wood (1985) have been contested by both Beard (1987) and Landwehr et al. (1987). Beard (1987) disagrees that the 6-step strategy outlined above for testing quantile estimating methods is appropriate and would promote the split sample test (Beard, 1974; USWRC, 1981) in its place. He also advocates dealing with flows in the log-domain, to reduce skewness, rather than in the Q-domain. In their reply Wallis and Wood (1987) disagree that the split sample technique is appropriate and state, "it is insufficient for identifying the underlying distribution of floods", and that hence the modern emphasis is on selecting robust procedures. They also rebut the suggestion that the log transform, by reducing skewness, improves quantile estimation, since in some cases in the form of LP3 it can lead to some "spurious quantile estimates" which can be avoided by the use of PWM estimation.

Landwehr et al. (1987) express their disagreement with the Wallis and Wood (1985) findings more extensively. They assert that it has not been "shown, that in expectation, that there will be a gain using WAK/PWM, or at least no loss, for any region of interest that may be constructed by any criteria". They also claim that Wallis and Wood (1985) base their claims for WAK/PWM over LP3/WRC on the use of average regional bias, which averages out individual occurrences of under- and over-estimation. They state that if average absolute bias were used as a criterion instead, LP3 in one form or other (depending on the way regional information is used with it) would in many cases appear less biased than WAK/PWM. In making this statement (p. 1208), however, the standard error of estimate is temporarily ignored. However Landwehr et al.
(1987) introduce two variants of LP3 regional techniques, not considered by Wallis and Wood (1985) nor included in USWRC (1981), and show that these compare favourably with (in Region 1A) and better than (in Region 1B) WAK/PWM in two new hypothetical regions not considered by Wallis and Wood (1985). Thus Landwehr et al. (1987) went considerably outside the scope of the original paper and introduced previously unpublished variants of LP3 regional techniques in order to challenge the Wallis and Wood (1985) findings and perhaps to defend the use of LP3 as an institutionally recommended tool in the U.S.A. In fact W.O. Thomas, Jr. (1988, Personal Communication) claims that the LP3/regional skew procedure tested by Wallis and Wood (1985) is "not a true regional method", presumably because only skewness and not other statistics is averaged regionally in it. It is however the variant which was advocated for institutional use by USWRC (1981) and it would seem that it is this recommended variant that was intended to be subjected to test by Wallis and Wood (1985).

The defence of LP3 based techniques put forward by Landwehr et al. (1987) does not diminish the fact that LP3 can lead to low upper bounds on flood magnitude. Use of LP3 must always be accompanied by some contingency technique for recognizing and dealing with low outliers.

5.4 Properties of regional-only quantile estimators

Statistical flood estimation for ungauged catchments depends on regression relationships between some or all of Q̄, Cv, Q_T, ... and catchment characteristics, as explained in Chapter 4.3.3. When the index flood approach is used, Q̂_T = Q̄ · x_T (equation 4.13), the x_T component contributes much less to the error in Q̂_T than Q̄ does when Q̄ is obtained from a regression relationship. Nash and Shaw (1965) showed that se(Q̄) ≈ sd(Q), the standard deviation of the population.
In other words Q̄ estimated from catchment characteristics is as good as having a sample of annual maxima of size one. Their result was based on data from 57 catchments. NERC (1975), using data from 530 catchments, found no improvement on se(Q̄). Increasing the number of catchments increases the precision with which the parameters of the regression relation are established, but it does not change the nature of the problem because the scatter of residuals about the regression relation is still inherently the same. NERC (1975) did find that restricting the calibration to include only data from longer records with high quality measurement resulted in a small improvement in se(Q̄). Stedinger and Tasker (1985, 1986) show that a more comprehensive treatment of the residuals, using either generalised least squares or weighted least squares calibration, leads to improved prediction from catchment characteristics, especially where record length varies from site to site. These methods also enable very short records to be used more profitably.

Hebson and Cunnane (1987) have attempted to investigate, by simulation, ungauged catchment index flood quantile estimates. They tried to simulate data which were compatible with real catchments and found that the values of Q̄ obtained from a simple regression relation were considerably more variable than estimates obtained from a single year of record, which does not augur well for the use of such relations.

As stated in Chapter 4.4.3, the relationship between Cv and catchment characteristics is in some regions statistically significant but it usually has a very large standard error. If a regional estimate of Cv is to be used for an ungauged catchment, then the regional mean value should be used. This is a compromise which may lead to under- or over-estimation in some catchments.
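The comparison between a regression estimate of Q̄ and a short record can be made explicit. The mean of N independent annual maxima has standard error sd(Q)/N^{1/2}, so a regression estimate with standard error se(Q̄) is worth about (sd(Q)/se(Q̄))^2 years of record. The following is a small sketch of that arithmetic only, assuming independent annual maxima; the function name and the numerical values are hypothetical and are not taken from the studies cited.

```python
def equivalent_years(sd_q, se_qbar):
    """Record length N for which sd_q / sqrt(N) equals se_qbar."""
    if sd_q <= 0 or se_qbar <= 0:
        raise ValueError("sd_q and se_qbar must be positive")
    return (sd_q / se_qbar) ** 2


if __name__ == "__main__":
    # se(Qbar) equal to sd(Q), as reported by Nash and Shaw (1965)
    print(equivalent_years(40.0, 40.0))
    # a regression half as variable would be worth four years of record
    print(equivalent_years(40.0, 20.0))
```

A value below one corresponds to the situation described by Hebson and Cunnane (1987), in which the regression estimate is more variable than an estimate from a single year of record.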
The use of regression relationships between Q̂_T and catchment characteristics, which, as indicated in Section 4.4.3, is widely practised in the U.S.A., leads to levels of accuracy for different values of T which depend not only on the number of sites, M, used in the calibration but also on the length of record, N, at each site. Tasker and Moss (1979, Fig. 3), using a simple model, Q_T = aA^b, show that se(Q̂_T) can be greatly reduced by increasing M if T is small, but if T is large N has to be increased in order to reduce se(Q̂_T). Their Figure 3 also shows the limit of accuracy available with such a simple model. The observed se(Q̂_T) values for T = 2, 10, 50 and 100 years are 90, 79, 85 and 88% of Q_T respectively. Increasing M to 50 and N to 35 will reduce these percentages to about 55% for T = 50 and 60% for T = 100, but they cannot be further reduced without a more complex model, which in turn may need more data for accurate estimation because of the increased number of parameters to be estimated. W.O. Thomas, Jr. (1988, Pers. Comm.) has indicated that for estimation of the 100 year flood the regression equations currently used in the U.S.A. have an accuracy equivalent to that obtainable from between 8 and 12 years of record.

5.5 Effect of spatial and temporal interdependence of flood magnitudes on quantile estimates

If a great degree of spatial dependence exists between neighbouring flood records then new information is not necessarily included when new stations are added to a regional estimating scheme. However, inclusion of interdependent data does not invalidate the calculation of mean values of selected statistics over a region, even though such means are not as precisely determined as from an equal amount of independent data. Thus methods such as Dalrymple's (1960) or average standardized PWMs (Wallis, 1980) are not damagingly affected by interdependence (see Hosking and Wallis, 1985).
Methods which depend on the station year assumption that all standardized flood values in a region form a single random sample may be affected to some degree by such dependence. One method of dealing with such dependence has been suggested by Acreman and Hosking (1986).

Temporal interdependence can be measured, perhaps imperfectly, by serial correlation coefficients. The amount likely to exist in a flood series is unlikely to affect greatly the reliability of quantile estimates (see Landwehr et al., 1979a; Srikanthan and McMahon, 1981b).

5.6 Confidence intervals

If quantile estimates Q̂_T are normally distributed, an interval of [Q̂_T ± 1.96 se(Q̂_T)] defines a 95% confidence interval. Estimates obtained by at-site/regional index flood methods tend to be normally distributed (Hosking et al., 1985a; Hebson and Cunnane, 1987) but estimates obtained by at-site methods tend to be highly skewed and hence the symmetrical confidence interval [Q̂_T ± 1.96 se(Q̂_T)] is misleading, being much too short in the upper portion. For a detailed account of confidence intervals see Stedinger (1983).

5.7 Comparison of different techniques of parameter estimation

Traditionally, parameter estimation methods have been compared under the headings of bias, efficiency, sufficiency and consistency (Kendall and Stuart, 1961, Vol. II, Chapter 17), although only the first two of these receive much consideration by hydrologists. In general ML methods are known to be most efficient in asymptotically large samples. While there is no guarantee that ML is most efficient in small samples, it frequently is so. Apart from efficiency, feasibility is another consideration. For instance Matalas and Wallis (1973) reported that ML estimates were impossible to obtain for some LP3 samples using P3/ML estimation methods. On a different level ML estimation has not been applied, or may be impossible to apply, to WAK parameter estimation.
Likewise PWM estimation has not been applied to normal, LN or LP3 parameter estimation, but Song and Ding (1988) have shown how it can be applied for P3 estimation even though the P3 distribution is not expressible in inverse form.

Many of the studies listed in Table 5.1 have concentrated on comparing several methods of estimating parameters and quantiles from particular distributions on the assumption that the form of the parent population is known. While these studies provide valuable insight on estimation methods, the most important fact to remember is that if an index flood type at-site/regional method is adopted for use a sufficiently flexible multiparameter distribution can be chosen, whereas at most a 2-parameter distribution should be used on an at-site basis. The use of regional data with the flexible distribution will ensure that the parameters are well estimated and hence the benefits of flexibility are not marred by excessive standard errors.

CHAPTER 6

METHODS OF CHOOSING BETWEEN DISTRIBUTIONS

6.1 Introduction

Several techniques have been used in the past for evaluating the suitability of different distributions for AM series. An attempt is made to classify these in Table 6.1. Two main categories can be identified in the use of these techniques:

(a) Tests of descriptive ability
    (i) Seek from among known distributions that one which fits observed AM data best, judged according to one or more of the criteria I to VI given in Table 6.1 (a). Generally speaking this philosophy prevailed prior to the mid 1970's and cannot be regarded as having been successful on its own.
    (ii) Examine the statistical behaviour, especially the sampling distribution of Cv, Cs and standardized largest sample values, of candidate distributions to determine whether they are capable of producing random samples having the same statistical characteristics as observed AM series, Category VII in Table 6.1 (a).
(b) Tests of predictive ability
    Test how well a candidate distribution can estimate the Q - T relationship or the frequency of future events when the population distribution is not identical to that of the candidate distribution, Categories VIII and IX of Table 6.1 (b). Different methods of parameter estimation must also be included in this part of the test.

In this Chapter the traditional methods and recent approaches to the problem are discussed.

6.2 Influence of Outliers

Some hydrologists are inclined to believe that many existing statistical flood frequency estimation methods underestimate the frequency of occurrence of very large floods. There is some difficulty in formulating a test of such a belief. Since outliers, defined in some suitable objective way, are rare and do not appear in each observed hydrological record, a test of any hypothesis about their frequency of occurrence must be done on a regional basis. Random samples drawn from statistical populations can produce outliers in hydrologically sized samples. It is relatively simple to count the frequency of outliers occurring in sets of randomly generated samples from some population in order to use them as a standard of comparison, but it is less easy to define a set of outliers in a real hydrological region. Typically N_j (j = 1, 2, ..., M) years of record are available at M gauging sites in a region (some on the same stream perhaps) and occasional historical records of unusually large floods are available for some of these M sites, as well as for some additional sites which are not gauged on a regular basis. It is difficult to establish a procedure for obtaining, from such a data base, a count of how many outliers of different degrees of severity have actually occurred in the region, which may be compared in an unbiased manner with the results of a simulation study.
With such a fair procedure for comparison one could then test whether existing or proposed flood frequency models are capable of generating random data sets, similar in size and record lengths to observed hydrometric data sets, which have the same outlier producing capability as nature. It may be that some models (candidate distributions) might display outlier related behaviour relative to the observed data analogous to the condition of separation for skewness. Such a condition might point to the need for a thick-tailed distribution.

The amount of influence which outliers should be allowed to have in distribution selection and parameter estimation needs to be considered. Outliers can be excluded from the estimation procedure only if it is certain that AM floods can be adequately modelled by a single known distributional form. (The distribution and estimation procedure in question need not match the parent (i.e. nature) precisely so long as it is robust.) In such a case, the Q̂_T estimate obtained from the truncated sample may be closer to the true value than that obtained from the entire sample. Even if retained, outliers have only a small effect if an efficient method of parameter estimation (ML or PWM) is used.

On the other hand, if it is regarded as true that AM floods come from two very different sub-populations, then the outliers must be retained if the sample is to be regarded as random and unbiased. Even then, because there are so few of them, such outliers can be used only in a regional estimation procedure, perhaps to estimate the parameters of the upper end of the compound distribution. The Rossi et al. (1984) regional estimation procedure explicitly extracts all outliers prior to estimating the at-site standardizing statistics and then includes the standardized outliers in the final stage of the estimation procedure.

CATEGORY   PROCEDURE

I          Graphical
           Histogram - visual inspection
           Probability plot - visual inspection
           Probability plot including confidence interval type "control" bands (Gumbel, 1941)

II         Goodness of fit tests
           Chi-Square
           Kolmogorov-Smirnov
           Anderson-Darling

III, IV    Tests based on skewness (e.g. is Cs = 1.14?).
           Visual tests based on Moment-Ratio diagrams.
           Numerical indices of agreement calculated from probability plot as in USWRC (Benson, 1968) and NERC (1975).

V          Test of distributional hypothesis against specified alternative, e.g.
           EV1 versus EV2 (van Montfort, 1970)
           EV1 versus GEV (Hosking et al., 1985b)

VI         Regional pooling of data and applying I - IV to select a single regional distribution

VII        Behaviour Analysis
           Test by simulation study or theoretical analysis whether candidate distribution can give rise to random samples having the same general statistical properties as observed flood data series, e.g. incidence of outliers, variability of calculated skewness: Matalas et al. (1975), Rossi et al. (1985), Ahmad et al. (1988a)

Table 6.1 Categories of procedures used to select a flood distribution. (a) Tests of descriptive ability.

CATEGORY   PROCEDURE

VIII       Split Sample Test
           Distributions are fitted to the first half of each record and expected numbers of exceedances of specified magnitudes in the second half of the record are compared with the observed number (Beard, 1974).

IX         Robustness
           Test whether a distribution and method of parameter estimation, considered jointly, are insensitive to departures from assumptions made in their use, e.g. Kuczera (1982b), Hosking et al. (1985), Wallis and Wood (1985), Arnell and Fiorentino (1988).

Table 6.1 Categories of procedures used to select a flood distribution. (b) Tests of predictive ability.
6.3 Traditional Methods

Until the 1970's the suitability of any particular distribution for flood frequency analysis was often judged on the basis of physical inspection of the data on a probability plot. On such a plot the sample values of the hydrological record appear as a series of plotted points while the estimated distribution of a particular form, whose suitability is being examined, would appear as a line or curve. The form of distribution whose line or curve showed best agreement with the plotted points would then be chosen. Gumbel (1941), in demonstrating the use of EV distributions, made use of confidence intervals about the line or curve of the fitted distribution on the probability plot in order to help judge or demonstrate the suitability of these distributions for annual maximum series data.

It is now understood however, both from theoretical statistics and from split sample tests on hydrological data, that any single record of data from a given distribution can display a plotted behaviour which is quite different from the parent Q - T relationship. Thus a sample from a straight line population could display marked curvature on a probability plot, while on the other hand a sample from a population whose parent Q - T relationship is curved might by chance plot remarkably close to a straight line on a probability plot. Therefore there is a distinct possibility of error when choosing a distribution on the basis of inspection of a probability plot. This is not the fault of the plot, per se, but stems from the highly uncertain nature of the problem. Other methods of choosing between distributions suffer analogous weaknesses even though the procedures may be less subjectively defined.

Certain goodness of fit statistics may be used as the basis of a test of the hypothesis that a given sample of data may be regarded as having been drawn randomly from a distribution of specified form. Examples of these are the Chi-square and Kolmogorov-Smirnov statistics.
Such tests can reject, or by default accept, the null hypothesis that a sample of data have come from a stated distribution. (If the parameters of the distribution named in the hypothesis are estimated from the sample itself, as is usually the case, this is taken into account when counting degrees of freedom, i.e. in determining the distribution of the test statistic.) These tests can be used to test distributions separately to determine whether the data are in accord with the distributions or not.

Such tests are not, however, discriminatory tests for choosing between one distribution and another. Further, since each theoretical distributional form being tested against any sample usually has its parameter values estimated from the sample, it follows that the several candidate distributions are constrained to be similar, at least in mean and variance. Because of this inbuilt forced similarity, a test of the hypothesis that data have come from EV1 when the alternative hypothesis is that they have come from a lognormal distribution with equal mean and variance has necessarily very low statistical power, or equivalently the test has a very high probability of a Type II error of inference, i.e. that the real differences will not be detected.

In Britain, NERC (1975) published results of using Chi-Square goodness of fit and Kolmogorov-Smirnov goodness of fit tests in determining the suitability of different distributions for annual maximum flood series. These results are outlined in Chapter 7 and in general must be regarded as inconclusive because of their lack of statistical power when choosing between alternative distributions.
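The low power of such a test can be illustrated by a small simulation sketch. It is not the NERC (1975) procedure: it assumes samples of N = 30 with mean 100 and Cv = 0.5, an EV1 null hypothesis fitted by ordinary moments, a lognormal (LN2) alternative with the same mean and variance, and a 5% critical value of the Kolmogorov-Smirnov statistic obtained by simulation under the null hypothesis (one way of allowing for parameters estimated from the sample). The function names, sample size, Cv, number of replications and seed are all illustrative choices.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def ks_distance_to_fitted_ev1(sample):
    """KS distance between a sample and the EV1 fitted to it by ordinary moments."""
    a = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    u = statistics.fmean(sample) - EULER_GAMMA * a
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = math.exp(-math.exp(-(x - u) / a))
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def open_uniform(rng):
    """Uniform variate kept strictly inside (0, 1) so that logarithms are finite."""
    return min(max(rng.random(), 1e-12), 1.0 - 1e-12)


def ks_power_ev1_vs_ln2(n=30, mean=100.0, cv=0.5, reps=1000, seed=1):
    """Return (5% critical value, estimated power against the LN2 alternative)."""
    rng = random.Random(seed)
    sd = cv * mean

    # Null hypothesis: EV1 population with the given mean and standard deviation.
    a = sd * math.sqrt(6.0) / math.pi
    u = mean - EULER_GAMMA * a
    null_stats = sorted(
        ks_distance_to_fitted_ev1(
            [u - a * math.log(-math.log(open_uniform(rng))) for _ in range(n)]
        )
        for _ in range(reps)
    )
    critical = null_stats[int(0.95 * reps)]

    # Alternative: lognormal population with the same mean and variance.
    sigma = math.sqrt(math.log(1.0 + cv * cv))
    mu = math.log(mean) - 0.5 * sigma * sigma
    rejections = sum(
        ks_distance_to_fitted_ev1([rng.lognormvariate(mu, sigma) for _ in range(n)])
        > critical
        for _ in range(reps)
    )
    return critical, rejections / reps


if __name__ == "__main__":
    critical, power = ks_power_ev1_vs_ln2()
    print(f"5% critical value = {critical:.3f}, estimated power = {power:.2f}")
```

A power far below one under these assumptions means that most lognormal samples would be accepted as EV1, which is the Type II error described above.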
Van Montfort (1970) proposed a specific likelihood ratio test for discriminating between EV1 and EV2 distributions, although this test is also of limited power in the circumstances since both distributions under test are constrained to be very similar because their parameters are estimated from the same sample. Otten and van Montfort (1978) discuss the statistical power of such tests. Hosking et al. (1985b) give an easily used test of an EV1 hypothesis against a GEV alternative. This test is detailed in Appendix 4.

Other tests based on the agreement between ranked sample magnitudes and estimated or fitted magnitudes, as viewed on probability paper, have been used by the U.S. Water Resources Council (Benson, 1968) and NERC (1975), and the results of using these are discussed in Chapter 7. It should be noted that such goodness of fit tests based on probability plots suffer from a major weakness in that they take no account of the fact that the natural sampling variation of the largest elements in a sample is far greater than that of the middle ranking values. Another point to note is that such tests almost inevitably pick out the three parameter distributions before the two parameter ones. This is not a sufficient reason for accepting three parameter distributions and rejecting two parameter ones. If the true population were EV1 such tests would show more favour to the GEV, with its third parameter, than to the parent itself. The same would hold if the true population were lognormal; such tests would show more favour to LP3 (of which it is a special case) than to the parent distribution.

Another test is based on the relations between Cv and Cs and between Cs and Ck in a distribution of particular mathematical form. On moment ratio diagrams, first used by Karl Pearson for demonstrating his system of frequency curves, showing the Cv - Cs and the Cs - Ck relationships respectively, each distributional form can be represented by a point, a curve or a region.
The Cv, Cs and Ck values calculated from an observed record of flows define a single point on each diagram. This plotted point may identify a distribution, and it is to be hoped that a given sample identifies the same distribution on both diagrams. The drawback of this approach is that the lengths of record available in hydrology do not allow the positions of the plotted points to be obtained accurately and hence there is the possibility of error in the diagnosis. This possibility of error is usually not reduced by plotting the results from several catchments on the same diagram. This often results in a wide scatter of points and it is not possible to determine precisely how much of this scatter is due to sampling error and how much is due to differences of parent distributions between catchments. Some properties of moment ratio diagrams are discussed in Appendix 3, from which the weaknesses of this technique can be deduced.

Split sample tests (Beard, 1974) are predictive ability tests which compare the expected number of exceedances of specified magnitudes, as estimated from the first half of a record, with what has been observed in the second half. The distribution which produces the best agreement is accepted as best. Beard (1974) concluded that only LP3 (with regional skew estimates) and LN2 distributions were satisfactory when tested on data of 300 rivers in the U.S.A.

6.3.1 GOODNESS OF FIT AS THE SOLE CRITERION

The disadvantage of using the wrong form of distribution for flood series is that of over- and under-design of hydraulic structures. Overdesign involves unnecessary construction or other flood plain costs while underdesign may result in excessive future damages. Such a disadvantage would seem obvious in any individual case.
Nevertheless Matalas and Wallis (1972) arrived at an unexpected conclusion when they examined, by simulation methods, strategies for minimising the overdesign, underdesign and total costs associated with the T = 50 year reservoir design flood. They found that use of the Gaussian (Normal) distribution for flood series can, in certain circumstances, minimise expected overdesign costs. This is surprising since it is generally acknowledged that the Gaussian distribution is not a good descriptor of observed flood series. The important word here is "expected", and undoubtedly individual cases of extreme over- and underdesign could occur with such a choice of distribution. However, the study alerts us to the fact that goodness of fit need not be the sole criterion in choosing a distribution for flood series when many different projects are being jointly considered on a nationwide basis. The Matalas and Wallis (1972) finding, however, is based on the parsimonious attitude that a single form of distribution should be used for all applications in a given region. Different results might well be obtained if a nonparsimonious attitude were adopted whereby the best fitting distribution for each simulated sample is used instead of a single distribution for all samples.

The procedure of checking the goodness of fit of candidate distributions to AM series by traditional methods has not led to a unique satisfactory choice of distribution for any region. The power of such tests is necessarily very low since fitted candidate distributions are constrained, by the fitting procedure, to be similar to one another for each site before the test is undertaken.

6.4 Recent Approaches

Recent approaches correspond to categories VII and IX in Table 6.1. This work has appeared in print as Matalas and Wallis (1972, 1973), Wallis et al. (1974), Matalas et al. (1975), Landwehr et al.
(1978, 1979), Wallis (1980) in behaviour analysis category VII of Table 6.1(a) and as Houghton (1977, 1978 a,b), Wallis (1980), Greis and Wood (1981), Kuczera (1982 b), Rossi et al. (1984), Wallis and Wood (1985), Hosking et al. (1985a) and Arnell and Gabrielle (1986) in the robustness category IX of Table 6.1(b). These publications are very detailed but an outline of the principal findings is attempted here.

6.4.1 BEHAVIOUR ANALYSIS

Wallis et al. (1974) examined the sampling properties of random samples from distributions which have been used for flood frequency analysis. They found that sample estimates of Cv and Cs, which are sometimes used to make inferences about the parent population, are biased downwards, and Kirby (1974) confirmed that these sample quantities are subject to upper bounds which are functions of sample size (see equations (2.1) and (2.2)). From this it must be concluded that parent flood distributions have higher Cv and Cs (and probably also Ck) values than are indicated by conventional methods of calculation used heretofore.

Matalas et al. (1975) discovered the condition of separation for skewness, whereby the variability of skewness is greater among samples of hydrological AM series than among equal sized samples drawn randomly from parent candidate distributions which are commonly used as flood frequency distributions, including LN, P3, Weibull and EV1. Landwehr et al. (1978) showed that LP3 also does not explain the condition of separation, nor does the GEV (Cunnane, 1984). Houghton (1978a), however, investigated the Wakeby distribution, which had been suggested by H.A. Thomas, and found that for certain combinations of parameters random samples from it do not display the condition of separation. In conclusion, all previously tried distributions failed to produce sample skewness which behaved similarly to observed AM skewness. Only the Wakeby distribution did so. Landwehr et al. (1978, p.90?)
found a combination of Wakeby parameters which agreed well with Cv and Cs values of their regional AM data. Landwehr et al. (1978) considered the effect of kurtosis on the condition of separation and concluded that large kurtosis is a necessary but not sufficient property of any distribution required to explain the separation condition. Previously Matalas et al. (1975) and Wallis et al. (1977) had found that the condition of separation could not be due to (i) the relatively small number of historical sequences available for analysis, (ii) autocorrelation or (iii) cross-correlation, but could be accounted for by spatial mixing (heterogeneity) of Cs values within a region and by non-stationarity in Cs. Therefore, the condition of separation has to be given serious consideration.

Next, Landwehr et al. (1978) considered how accurately or reliably inferences could be made about Q_T from studies of Z = log Q, the intention, among other things, being to examine the utility of resorting to log-distributions (LN and LP3) with parameter estimation by moments in log space (LS) as distinct from in the Q-domain or real space (RS). They showed that the condition of separation also exists in LS and that neither the LN nor the LP3 distribution explains it there. They note that Cs(Q) derived from AM series is mostly positive and becomes more so as sample size, N, increases, while in LS, Cs(Z) is mostly negative and becomes more so as N increases. If an LP3 distribution's parameters are estimated by moments in LS and Cs(Z) < 0 then the fitted distribution will have an upper bound in RS. Such a fitted distribution is unlikely to have a positive Cs(Q) in RS; this contradicts the generally observed condition of Cs(Q) > 0, and is even more contradictory when we consider that Cs(Q) is probably under-estimated anyway. This is one argument against LP3 as usually fitted, i.e. by moments in LS. However W.O. Thomas Jr. (1988, pers. comm.)
quoting Gilroy (1972) points out that "the upper bound for the LP3 will be at least four standard deviations in log units above the mean if the skew is > - 0.9", a condition that is met by most applications. On the other hand, Monte Carlo experiments with the Wakeby distribution indicate that both Cs(Q) > 0 and Cs(Z) < 0 are realisable in that distribution subject to certain restrictions on the parameters.

Further Monte Carlo experiments were conducted by Landwehr et al. (1978) under a series of six different assumed parent regional hydrologies, as defined by AM flood distribution, with a view to studying how effective regional maps of skewness in LS are in conveying information about skewness in RS. They concluded that a skewness value in RS could not be uniquely inferred from the value in LS, as distinctly different Cs(Q) contours could give rise to identical Cs(Z) contours and vice versa. The transfer from LS to RS depends on the form of the parent distribution. Since, in truth, the latter is unknown for AM series, they conclude that "the construction and use of regional skew maps are most likely to be counter productive." The magnitude of this effect on quantile estimates was not quantified.

Rossi et al. (1984) tested the suitability of EV1, LN and PEV hypotheses on the basis of their ability to reproduce observed regional skewness of Italian AM series (39 series of average length 40 years each). They found that each of these hypotheses conflicted with the observed data but that TCEV could generate samples with the same regional distribution of skewness as the observed data, and hence accounted for the condition of separation of skewness. Beran et al. (1986) showed that the TCEV distribution could account for the variability of skewness of British data. Recently Ahmad et al.
(1988) have shown that the log-logistic distribution, LLG, is able to model flood series for Scottish rivers from the point of view of reproducing the observed regional distribution of skewness, as well as scoring better than GEV, LN3 and P3 in at-site and regional goodness of fit tests. In reparameterised form LLG is a special case of the generalised logistic distribution, GLG (Hosking 1986a, Ahmad, 1988).

In conclusion, behaviour analysis shows that none of the commonly used distributions produces samples which behave in the same manner as AM flood series of equal size, on the assumption that a given hydrological region is homogeneous in skewness and stationary in time. Only the Wakeby, TCEV and GLG (or LLG) distributions emerge as possible candidates, under these assumptions, from this examination.

The Wakeby distribution has the disadvantage of having five parameters, which may seem too many to estimate from a hydrological AM sample. Parameter estimates obtained via PWM (Greenwood et al., 1979; Landwehr et al., 1980) may not always be feasible in small samples, but estimates of Wakeby parameters from regionally averaged PWMs (Wallis, 1980) nearly always exist. It is not suggested here that these multiparameter distributions be used in an at-site mode but rather in an at-site/regional mode. Wakeby fitting failures seem to be most prevalent with data straddling zero, with extremely high Cv data or with samples that are so thin-tailed that no legal Wakeby distribution exists (Wallis, 1984, personal communication). These conditions rarely occur among sets of hydrological or meteorological maxima, and hence failure to fit a Wakeby distribution has so far proved to be an academic point rather than a practical hydrological problem. Robustness tests (next section) can be used to determine if other distributions can be used to approximate well the Q - T relationship of a Wakeby distribution. Indications are that the GEV distribution is a moderately good substitute.
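The downward bias and sample-size bound on skewness that underlie behaviour analysis can be seen in a small simulation. This is only a sketch: the EV1 parent, the sample size, the seed and the function names are illustrative assumptions, and the bound (N - 2)/sqrt(N - 1) for the simple moment estimator is the form usually attributed to Kirby (1974), assumed here to correspond to the report's equation for Cs.

```python
import math
import random

def sample_skew(x):
    """Simple moment estimator of Cs (1/N moments, no bias correction)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def gumbel_sample(n, rng, loc=0.0, scale=1.0):
    """Draw n values from EV1 by inverting F(q) = exp(-exp(-(q - loc)/scale))."""
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:            # avoid log(0)
            u = rng.random()
        out.append(loc - scale * math.log(-math.log(u)))
    return out

def skew_experiment(n=20, reps=2000, seed=42):
    """Return (mean sample Cs, largest |Cs| seen, algebraic bound on |Cs|)."""
    rng = random.Random(seed)
    skews = [sample_skew(gumbel_sample(n, rng)) for _ in range(reps)]
    mean_skew = sum(skews) / reps
    bound = (n - 2) / math.sqrt(n - 1)
    return mean_skew, max(abs(s) for s in skews), bound

POPULATION_CS_EV1 = 1.1396         # population skewness of EV1

mean_skew, max_abs, bound = skew_experiment()
print(f"mean sample Cs = {mean_skew:.3f} (population {POPULATION_CS_EV1})")
print(f"largest |Cs| = {max_abs:.3f}, bound = {bound:.3f}")
```

Comparing the spread of simulated skewness with the spread among observed records of the same length is the essence of the separation argument; the sketch covers only the simulated side.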
6.4.2 ROBUSTNESS

A procedure for estimating Q_T is robust if it yields estimates of Q_T which are good (low bias, high efficiency) even if the procedure is based on an assumption which is not true. A procedure is not robust if it yields poor estimates of Q_T when the procedure's assumptions depart even slightly from what is true. Since we do not know the distribution of AM floods in nature, it behoves us to seek out a distribution and an estimating procedure which together are robust when dealing with distributions which give random samples which have a flood-like behaviour. It should be pointed out that split sample tests based on historical AM flood records are inadequate for testing the robustness of any distribution and estimation (D/E) procedure.

A suitable method of testing a D/E procedure involves simulating random samples from a parent distribution in which the Q - T relationship is known exactly (Hosking et al., 1985a). To be authentic, in this context, the parent distribution must produce random samples which are flood-like in their behaviour. Such a parent would be a Wakeby, TCEV, GLG or possibly GEV distribution, with suitable parameter values. Then the D/E under test is applied to each sample and an estimate $\hat{Q}_T$ is obtained from each sample for a selection of T values. This is repeated for M samples (M large) and analogs of equations (4.1) to (4.5) are used to calculate bias and rmse from the M values of $\hat{Q}_T$:

Mean: $\bar{Q}_T = \frac{1}{M} \sum_{i=1}^{M} \hat{Q}_{T,i}$   (6.1)

St. dev.: $s_{\hat{Q}_T} = \left[ \frac{1}{M} \sum_{i=1}^{M} \left( \hat{Q}_{T,i} - \bar{Q}_T \right)^2 \right]^{1/2}$   (6.2)

Bias: $\bar{Q}_T - Q_T$   (6.3)

Rmse: $\left[ \frac{1}{M} \sum_{i=1}^{M} \left( \hat{Q}_{T,i} - Q_T \right)^2 \right]^{1/2}$   (6.4)

In these expressions $Q_T$ is the known population value.
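The simulation procedure just described can be sketched as follows. Everything specific here is an assumption made for illustration: the GEV parent and its parameter values, the choice of EV1 fitted by ordinary moments as the D/E procedure under test, the function names, and the use of divisor M in the standard deviation (the report's exact forms are those of its equations (4.1) to (4.5)).

```python
import math
import random

def gev_quantile(F, xi, alpha, k):
    """GEV quantile with the convention Q = xi + alpha*(1 - (-ln F)**k)/k, k != 0."""
    return xi + alpha * (1.0 - (-math.log(F)) ** k) / k

def ev1_moments_quantile(sample, T):
    """D/E procedure under test: EV1 fitted by moments, evaluated at return period T."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((q - mean) ** 2 for q in sample) / (n - 1))
    scale = sd * math.sqrt(6.0) / math.pi
    loc = mean - 0.5772 * scale
    return loc - scale * math.log(-math.log(1.0 - 1.0 / T))

def summarise(estimates, q_true):
    """Mean, st. dev., bias and rmse of M quantile estimates (divisor M assumed)."""
    m = len(estimates)
    mean = sum(estimates) / m
    sd = math.sqrt(sum((q - mean) ** 2 for q in estimates) / m)
    bias = mean - q_true
    rmse = math.sqrt(sum((q - q_true) ** 2 for q in estimates) / m)
    return mean, sd, bias, rmse

def uniform_open(rng):
    """Uniform draw on (0, 1), excluding 0 so that log() is always defined."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def robustness_test(T=100, n=30, M=2000, seed=1):
    """Return (bias, rmse) of the D/E procedure, both divided by the true Q_T."""
    rng = random.Random(seed)
    xi, alpha, k = 100.0, 30.0, -0.1   # hypothetical parent parameters
    q_true = gev_quantile(1.0 - 1.0 / T, xi, alpha, k)
    estimates = []
    for _ in range(M):
        sample = [gev_quantile(uniform_open(rng), xi, alpha, k) for _ in range(n)]
        estimates.append(ev1_moments_quantile(sample, T))
    _, _, bias, rmse = summarise(estimates, q_true)
    return bias / q_true, rmse / q_true

rel_bias, rel_rmse = robustness_test()
print(f"relative bias = {rel_bias:.3f}, relative rmse = {rel_rmse:.3f}")
```

Repeating the run with other parents and other D/E procedures, and comparing the resulting relative bias and rmse, gives the kind of intercomparison the text describes.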
The sampling distribution of $\hat{Q}_T$ is also examined and frequently this can be approximated by a Normal distribution, so that the 5% and 95% quantiles of the sampling distribution, denoted lower and upper confidence levels, LCL and UCL, can be obtained as:

$\mathrm{LCL} = \hat{Q}_T - 1.645\, s_{\hat{Q}_T}$   (6.5)

$\mathrm{UCL} = \hat{Q}_T + 1.645\, s_{\hat{Q}_T}$   (6.6)

All these quantities can be made non-dimensional by dividing by the population value $Q_T$. This practice is usually followed to enable intercomparison of D/E procedures. Results are tabulated or presented on diagrams such as Figure 5.1. The performance of the D/E procedure under test is then seen through the magnitude of the bias and the spread of the 90% confidence interval, LCL to UCL. Different D/E procedures may be tested in this way and their biases and confidence interval widths compared. The D/E procedure giving the smallest bias or narrowest confidence band may then be chosen under the assumption of the given parent distribution. The test must then be repeated with data assumed to have come from different forms of parent distribution.

Obviously a large amount of computer time is required to undertake such tests; the programme of work has to be very carefully planned and the computer experiments properly designed to avoid obtaining useless results. The sources of studies of quantile estimate robustness performed to date are summarised in Table 6.2. Some studies refer to single sample (at-site) estimating procedures while others refer to at-site/regional procedures only. Some allow inferences to be made about both types of procedure. The following general points can be made from the results of such tests.

(1) At-site/regional methods are better than at-site methods even in the presence of a modest amount of heterogeneity.

(2) Two parameter models have substantially smaller standard error and rmse than three parameter models.
Two parameter models are usually biased if the model skewness is invariant and much smaller than the population skewness.

(3) Three parameter models, while yielding relatively unbiased estimates, have such high standard error when used as at-site estimators as to make them very unattractive for that purpose.

(4) Regional index flood procedures outperform existing regional Bayesian procedures.

(5) PWM based regional index flood procedures are most efficient and least biased and are easy to apply.

(6) The at-site/regional WAK/PWM procedure is uniformly the best quantile estimator when considered over all relevant studies of Table 6.2. This is so notwithstanding the extra parameters involved in the WAK distribution. The regional aspect of this procedure must be stressed. It would not in general be prudent to apply WAK on an at-site basis.

(7) The at-site/regional GEV/PWM procedure is good with GEV parents but is not as robust as WAK/PWM, because it is not as flexible. With non-GEV data it suffers more from bias than from standard error.

(8) The TCEV regionalisation procedure yields relatively unbiased but very variable results. There are also some data sets to which TCEV cannot be fitted. It also has to be based on large data sets.

(9) The LP3/regional skew procedure is less efficient than either the at-site/regional WAK/PWM or GEV/PWM procedures, even when the population is LP3 distributed. It performs particularly poorly for sites with negative skewness and when the parent population is not LP3. Wallis and Wood (1985) go so far as to say that the LP3/WRC procedure "should not be used as a basis for engineering design because significantly more accurate estimates can be obtained by other currently available statistical procedures". Landwehr et al.
(1987) have contested these conclusions and claim that a previously unpublished variant of a regional index flood method using LP3 is as effective as regional WAK/PWM. This was discussed in Chapter 5.3.

(10) Fully numerical variants, indicated as FSR, of the NERC (1975) regional estimation method perform poorly, with very large standard error, especially when record lengths are short (Hosking et al., 1985a). Such implementations as have been computerised have differed sufficiently one from another to materially affect their relative performances. For instance, Arnell and Gabrielle (1985) found that FSR performance was not as bad as indicated by Hosking et al. (1985a), the differences being due to differences in the exact steps followed in the numerical algorithm.

6.5 Summary

Goodness of fit tests applied to records of AM floods individually and conventional tests of hypotheses are not conclusive when seeking a flood distribution. They can serve to reject some distributions but are not necessarily good discriminators between accepted ones. Behaviour analysis indicates that real AM flood data samples behave differently from random samples drawn from the parent distributions conventionally used in flood frequency analysis. The Wakeby, TCEV and GLG distributions can bridge the gap between theoretical and observed AM flood data.

Robustness studies indicate that quantile estimates using 2 parameter distributions suffer more from bias than those based on multiparameter ones. The latter suffer from large standard error if used in at-site mode but not in regional mode. The at-site/regional WAK/PWM procedure appears to be efficient and robust. Early studies (Wallis and Wood, 1985) indicated that the opposite was true of the LP3 based method recommended by USWRC (1981), but this finding has been contested by Landwehr et al. (1987), albeit with a different variant of LP3 regional estimation. The GEV/PWM regional procedure is not quite as robust as the corresponding WAK/PWM.
The TCEV regional method can model real flood data behaviour well although its quantile estimating ability is not as good as WAK/PWM.

AT-SITE COMPARISONS
Landwehr et al. (1980): (i) 6 Wakeby parents (ii) EV1, LN3, WAK estimators
Kuczera (1982): (i) GEV parent (ii) EV1 and GEV estimators
Lettenmaier and Potter (1985): (i) 4 Wakeby parents (ii) N, LN2, LP3, EV1, log EV1, WAK estimators
Lettenmaier et al. (1985): (i) EV1, LN2 and P3 parents (ii) EV1, LN2 estimators
Wallis and Wood (1985): (i) LP3, GEV, WAK parents (ii) LP3, GEV estimators

REGIONAL COMPARISONS
Study | Parent | Estimator
Kuczera (1982) | LNx5, WAK | LN/LEB
Hosking et al. (1985a) | GEV, WAK, WEIB | FSR/LS, GEV/PWM, WAK/PWM, WEIB/PWM
Lettenmaier et al. (1985) | EV1, GEV | EV1/PWM, GEV/PWM, WAK/PWM, EV2/PWM, EV3/PWM
Wallis and Wood (1985) | GEV, WAK, LP3 | GEV/PWM, WAK/PWM, LP3/PWM-RS
Lettenmaier & Potter (1985) | EV1, LN2, P3 | EV1, LN2/LEB, EV1/LEB
Arnell & Gabrielle (1985, 1988) | TCEV, WAK, GEV | TCEV/ML, WAK/PWM, GEV/PWM
Landwehr et al. (1987) | LP3 | LP3/MOM-RS

Table 6.2 Sources of quantile estimating robustness studies results.

CHAPTER 7
DISTRIBUTIONS PREVIOUSLY CHOSEN OR RECOMMENDED FOR NATIONAL USE

7.1 WMO Survey

In 1983 WMO conducted a survey of current practices of selected countries with regard to use of distribution types for frequency analysis on extremes of precipitation and floods. Replies to the questionnaire were received from 55 agencies in 27 countries. These replies were examined and summarised in a report, WMO (1984), which is attached as Appendix 6.

Table 7.1 tabulates results from Appendix 6, Table III, about the most commonly used flood distributions among the agencies and countries surveyed. The EV1 distribution was the most commonly used distribution of all, followed closely by lognormal and LP3, with P3 a little less frequently used. If reported 2 and 3 parameter gamma uses are combined with P3 uses the resulting total resembles that for LP3.
If reported uses of EV2 and EV3 are pooled with GEV uses it can be seen that EV distributions, exclusive of EV1, had a large degree of usage even though not recommended as standard by a large number of agencies or countries. It should be noted that there is considerable overlap between some columns in Table 7.1 because many agencies reported use of more than one distribution and some countries reported use of more than one distribution as standard.

Table 4 of Appendix 6 shows that almost half of all agencies use the Weibull plotting position for data display and/or calculation while one third use one of Blom, Hazen or Gringorten. The remainder use variants which are very similar to these, all being special cases of T = (N+1-2a)/(i-a) with 0.25 ≤ a ≤ 0.5. As pointed out in Chapter 4.3.4 the Weibull formula is biased and should not be used for flood data. Any one of the others is considerably better, with the Gringorten formula preferred for EV1 (Gumbel) paper and the Blom formula preferred for Normal probability paper.

The WMO survey points out that choice of distribution is made in many countries with the help of goodness of fit tests and tests of hypotheses against specific alternatives. In Chapter 6 of this report it was argued that such procedures are unable to lead to a unique choice of distribution and may not necessarily lead to distributions which are robust as flood quantile estimators.

Column headings: EV1, EV2, EV3, GEV, LN, Gamma 2/3, LP3, Exponential (PD models), P3
Row headings: No. Agencies in which it is used; No. Countries in which it is used; No. Agencies in which it is used as standard; No. Countries in which it is used as standard in one or more Agencies
Entries as transcribed: 28 11 8 7 27 16 9 8 2 18 3 0 10 3 0 17 6 22 5 16 12 6 13 4 5 16 9 3 17 3 1 8 7 3 7 2

Table 7.1 Summary of frequency and extent of use of flood distributions. (Distributions with one or fewer reported uses not included.) (Source: Table 3, Appendix 6. Total No. Agencies in survey = 54. Total No. Countries = 28.)
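The plotting position family quoted in the survey discussion, T = (N+1-2a)/(i-a), can be evaluated directly. In this sketch the function name is hypothetical, the rank i is assumed to count from the largest flood (i = 1), and the values of a for the named formulae are the usual ones: Weibull a = 0, Blom a = 3/8, Gringorten a = 0.44 and Hazen a = 0.5.

```python
def return_period(i, n, a):
    """T = (N + 1 - 2a) / (i - a), with i = 1 assumed to be the largest flood."""
    return (n + 1 - 2 * a) / (i - a)

# Values of a for the named formulae (standard values, assumed here).
FORMULAE = {"Weibull": 0.0, "Blom": 0.375, "Gringorten": 0.44, "Hazen": 0.5}

# Return period assigned to the largest flood in a 50 year record.
n_years = 50
for name, a in FORMULAE.items():
    print(f"{name:10s} a = {a:5.3f}  T(largest) = {return_period(1, n_years, a):6.1f}")
```

The largest flood in the record is assigned a noticeably different return period depending on a, which is why the choice of formula affects comparisons made on probability plots.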
The survey, which is based on at-site estimation, also revealed that "no special method of parameter estimation is preferred and the graphical method is as frequently, or even more, used as any other method". This should no longer be the case as there is sufficient evidence to show that joint at-site/regional procedures, based on PWMs, are better than most other flood quantile estimation methods.

7.2 SELECTED CASES

Summaries of selected investigations into the form of distribution for AM floods are given below in order of their relative popularity.

7.2.1 UNITED STATES OF AMERICA

The U.S. Water Resources Council established a Work Group on Flood Frequency Methods in 1966. The Group's first two years' work was comprehensively reported upon by Benson (1968). The group decided that several methods of flood frequency analysis in common use among Federal agencies would be applied to a group of 10 long term records of annual maximum flows at selected stations in the U.S. The catchments chosen represent a wide range of climate, hydrological conditions and size of catchment. Records with obvious outliers were avoided initially. One of the records was 97 years long, the other nine varying between 40 and 62 years. The following distributions were fitted to the 10 records:

Distribution | Number of different programs used
2-parameter gamma | 2
Gumbel | 2
Log Gumbel | 2
Log-Normal | 4
Log-Pearson Type 3 | 3
Hazen method | 1
Total | 14

The first five distributions were fitted by the programs of more than one agency and in all 14 sets of computations were used. From each computation the flood estimates of return period T = 2, 5, 10, 25, 50 and 100 years were obtained. For each return period there were 14 values for each station. The calculated estimates of Q_T for each T value listed above were compared with "data values" obtained for each station from a probability plot of that station's data.
The ranked flood data at a station were assigned probability plotting positions i/(N+1), where i is rank and N is record length, and plotted accordingly on extreme value logarithmic graph paper. Values of Q_T, the "data values", were obtained by linear interpolation between the two adjacent peaks which bracketed the specified probability or return period. These "data values" were then used as the basis against which the 14 computed values of Q_T, for each T, could be compared. The deviations for each return period were expressed as $(\hat{Q}_T - Q_D)/Q_D$, where $\hat{Q}_T$ is the estimate of Q_T obtained by one of the 14 distribution-fitting computer programs and $Q_D$ is the corresponding interpolated "data value". These standardized deviations were listed separately by method and averages over the 10 stations were obtained for each return period and method. These average values of deviations "were an important consideration in deciding between methods".

In coming to a conclusion and in making a recommendation it was noted that "no single method of testing the computed results against the original data was acceptable to all those on the Work Group". Further, "the statistical consultants employed by the group had indicated that no unique procedure could be specified as correct for any one method of flood-frequency analysis" and they "could not offer a mathematically rigorous method" for selecting a best method. Consequently, the group decided that if a unique choice could not be made on statistical grounds alone then a choice would nevertheless have to be made for compelling administrative reasons. Guided by the average deviation results described above, the group decided to recommend that the LP3 distribution (with LN as a special case) be adopted as a base method for analysing flood-flow frequencies (for Federal Agencies).
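The "data value" comparison used by the Work Group can be sketched as below. The details are assumptions made for illustration: ranks are taken from the smallest flood so that i/(N+1) is a non-exceedance probability, "linear interpolation" on extreme value logarithmic paper is read as linear in the Gumbel reduced variate and in log Q, and the function names and sample record are hypothetical.

```python
import math

def reduced_variate(F):
    """Gumbel reduced variate y = -ln(-ln F) for non-exceedance probability F."""
    return -math.log(-math.log(F))

def data_value(flows, T):
    """Interpolate the 'data value' Q_D at return period T from plotted peaks."""
    q = sorted(flows)
    n = len(q)
    # Plotting positions i/(N+1), i = 1 for the smallest peak (assumed).
    y = [reduced_variate(i / (n + 1)) for i in range(1, n + 1)]
    y_t = reduced_variate(1.0 - 1.0 / T)
    if not y[0] <= y_t <= y[-1]:
        raise ValueError("T lies outside the range of the plotted points")
    for j in range(n - 1):
        if y[j] <= y_t <= y[j + 1]:
            w = (y_t - y[j]) / (y[j + 1] - y[j])
            log_q = (1.0 - w) * math.log10(q[j]) + w * math.log10(q[j + 1])
            return 10.0 ** log_q

def standardized_deviation(q_hat, q_d):
    """Deviation (Q_hat - Q_D) / Q_D of a fitted estimate from the data value."""
    return (q_hat - q_d) / q_d

# Hypothetical record of 9 annual maxima and a hypothetical fitted estimate.
record = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]
q_d = data_value(record, T=5)
print(f"Q_D(T=5) = {q_d:.1f}, deviation = {standardized_deviation(75.0, q_d):+.3f}")
```

With positions i/(N+1) the largest plotted return period is N+1 years, so in this sketch a data value at T = 100 cannot be interpolated from a record shorter than 99 years; the function raises an error rather than extrapolating.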
Allowance was made for the use of other distributions also, provided there were sufficiently justifiable reasons. The group also recommended that their choice of a base method should not freeze hydrological practice into a set pattern but that research into flood frequency methods should continue.

The U.S. Water Resources Council accepted the group's recommendation on the LP3 distribution and has retained it in each of its subsequent recommendations on methods of determining flow frequencies (1967, 1976, 1977, 1981). This choice has been supported by Beard (1974). Implementation of the method involves computing the mean, standard deviation and skewness of logarithms of the data series. This can be referred to as estimation by moments in log space (LS) as distinct from other possible moment estimators (Phien and Hsu, 1984). A general map of skewness has been prepared by the Council for the entire United States of America. This allows a value of skewness to be obtained for a site at which insufficiently long records are available for a reliable estimate of skewness to be obtained from the data. The "Guidelines for Determining Flood Flow Frequency" (USWRC, 1976, 1977, 1981) also discuss issues other than the choice of distribution, notably (i) the detection and treatment of outliers, (ii) treatment of series containing floods caused by vastly different types of precipitation events and (iii) treatment of years with zero flood values for rivers in arid climates.

7.2.2 UNITED KINGDOM

A Floods Study Team was set up in England in 1970 to study all aspects of flood hydrology with a view to recommending methods of flood estimation to the engineering profession. The team consisted of full-time professional staff working at the Institute of Hydrology and the Meteorological Office. All the extant recorded flow data were collected from the several gauging authorities. All usable flood series extracted from these records, over 500 in number, were published.
Major studies in rainfall frequency estimation, flood frequency estimation, rainfall-runoff modelling and flood routing were carried out. The results of the studies were published in five volumes by the Natural Environment Research Council (NERC, 1975). The Flood Studies Report was reviewed by Burges (1979).

Methods of flood estimation considered were principally the use of the annual maximum series model and the partial duration series model, with an exploratory study of a time series method using a shot noise model. Estimation problems encountered in series with missing and/or censored peaks were also dealt with. Among these methods, the relationships developed for flood estimation for ungauged catchments were based on the annual maximum series model.

A detailed study of the form of distribution of annual maximum floods was carried out. Only six records in excess of fifty years length were available, so it was decided to include records of thirty years or longer, of which twenty nine were available, in this aspect of the study. Six additional records, although shorter than thirty years, were included from the Republic of Ireland. A variety of statistical tests were employed in trying to discriminate between distributions but not all tests were applied to all distributions.

7.2.2.1 χ² goodness of fit index

This index is not renowned for high power in the statistical sense and is not very useful. It was applied individually to thirty eight records with each of three distributions (EV1, LN2, GEV). The number of stations at which it rejected each distribution varied greatly with the significance level of the test.

Significance levels: 0.10, 0.05, 0.01
Distributions: Gumbel, Lognormal, GEV
Entries as transcribed: 17 7 14 11 15 7 3 2 2

Table 7.2 Number of times out of 38 records that stated distributions were rejected by χ² goodness of fit test.

7.2.2.2
The Kolmogorov-Smirnov (K-S) goodness of fit test

This test was applied to the above three distributions as well as to the two parameter Gamma distribution. In contrast to the χ² test, this test rejected a distribution in only a few cases. It should be noted that the applicability of this test is not restricted in any way by sample size.

Significance levels: 0.10, 0.05, 0.01
Distributions and entries as transcribed: Gumbel, Gamma: 1 1 0 0 0 2; Lognormal: 1 0 0; GEV: 0 0 0

Table 7.3 Number of times out of 38 records that stated distributions were rejected by Kolmogorov-Smirnov goodness of fit test.

7.2.2.3 Goodness of fit test based on probability plot

The distribution and the record to be compared are represented by a line or curve and a set of plotted points respectively on the same diagram or probability plot. For each distribution and sample being compared the quantities

$d_i = [Q_{(i)} - d(i)] / \bar{Q}, \quad i = 1, 2, \ldots, N$   (7.1)

were computed, where N is record length, $Q_{(i)}$ is the observed ith smallest value, d(i) is the computed variate value on the line or curve at the ith plotting position and $\bar{Q}$ is the mean of the observed series. A set of d values $d_1, d_2, \ldots, d_N$ can be summarised either by the mean absolute value $\overline{|d|}$ or by the root mean square deviation, rms(d). As a result, the conclusion drawn from any one comparison (of fitted distribution and observed record) may depend on which method of summarising the d values is used. Another source of diversity lies in the manner in which d(i) is defined. It depends on the plotting probability formula used. The traditional Weibull formula $F_i = i/(N + 1)$ is biased (Cunnane, 1978) but was included because of its widespread use. The Hazen plotting position, $F_i = (i - 0.5)/N$, was also used, as was the plotting position based on the expected value of the reduced variate order statistic.
The latter varies with distribution type but corresponds to the Gringorten formula $(i - 0.44)/(N + 0.12)$ for the Gumbel distribution and to the Blom formula $(i - 3/8)/(N + 1/4)$ for the Normal distribution. For each record (35 in number) and each distribution (7 in number) values of $|\bar{d}|$ were obtained using each of the three plotting position formulae mentioned. In addition rms(d) was obtained using the plotting position based on the expected value of the order statistics. This gave three tables, each containing 7 x 35 values of $|\bar{d}|$, and one table of 7 x 35 rms(d) values. The distributions used were EV1, G2, LN2, GEV, P3, LP3 and log-gamma. For each record, in each table, each distribution was assigned a rank between 1 and 7, rank 1 for the best fitting distribution (low $|\bar{d}|$) and rank 7 for the worst fitting one. For each distribution the ranks were summed over the 35 stations and these totals of ranks were used as the basis of comparison. In the first three tables the results were the same, showing no dependence on plotting position. The three parameter distributions showed the best fit in the following order: LP3, GEV, P3, while LN2 was the best fitting of the two parameter distributions. However, when the goodness of fit was expressed by rms(d) (the fourth table), P3 jumped to first place followed by LP3 and GEV. Thus the measure of average d value influences the relative merits of the several distributions in a serious way, since LP3 and P3 distributions are so different in the manner in which magnitude varies with return period T at large values of T.

In summary the results showed that:
Three parameter distributions fitted more closely than two parameter ones.
LN2 fitted more closely than any other two parameter distribution.
On the basis of mean absolute deviation $|\bar{d}|$ the order was LP3 better than GEV better than P3, and this result was independent of plotting position formula and method of parameter estimation (moments or maximum likelihood).
On the basis of root mean square deviation, rms(d), the order was P3 better than LP3 better than GEV.

7.2.2.4 Method used by U.S. Water Resources Council

The method used by the group on flow frequency methods, described earlier in 7.2.1 above, was applied also to the UK but only to the data of six gauging stations. The same seven distributions as were used in the previous section of the study were used. It was found that, when using the Weibull plotting formula as in the USWRC study, the GEV distribution gave the best fit at low values of T while LP3 gave the closest fit at high values of T. When this type of study was repeated for the six stations but using the Gringorten plotting formula (related to expected value of reduced variate order statistics), P3 again showed the closest fit, followed by GEV, followed by LP3. Thus again it was found that the order in which the distributions would be chosen as best depends on a small change in procedure.

7.2.2.5 Recommended distribution

Following the studies described above the U.K. Flood Studies Report (NERC, 1975) recommended the GEV distribution because:
It performed consistently well, if not best, in the statistical goodness of fit tests. The superiority of either P3 or LP3 distributions depends on which of two arbitrary steps in the testing procedure was adopted.
It described very well the empirically derived distributions of the standardised variable $Q/\bar{Q}$ which were derived for each of 10 regions by pooling the data of each region (NERC 1975, Vol. I, 2.6).
It is in accord with the rainfall probability distributions derived independently from a vast amount of meteorological data (NERC 1975, Vol. II).
Estimates of its parameters by moments, sextiles or maximum likelihood are as easy to obtain as the corresponding estimates in other three parameter distributions. [Note that PWM estimation for GEV (Hosking et al., 1985 b) is both simple and efficient.]
It has some theoretical attraction.

It was recommended that if only a small sample, N < 25, were available, the EV1 distribution be fitted to it because it is a special case of GEV. This was recommended for the sake of consistency even though the lognormal performed better than it in the goodness of fit tests. If the true distribution were GEV then the EV1 would certainly be able to approximate it well over a limited range of return period without suffering the disadvantage of having to estimate a third parameter from the small sample.

It should be noted that the goodness of fit tests related to probability plots and described above suffer from a major weakness in that they take no account of the fact that the natural sampling variation of the largest elements in a sample is far greater than that of the middle ranking values. Another point to note is that such tests almost inevitably pick out the three parameter distributions before the two parameter ones. This is not a sufficient reason for accepting three parameter distributions and rejecting two parameter ones. If the true population were EV1 such tests would show more favour to the GEV, with its third parameter, than to the parent itself. The same would hold if the true population were LN; such tests would show more favour to LP3 than to the parent distribution.

7.2.3 AUSTRALIA

In 1977 the Institution of Engineers, Australia adopted the LP3 distribution for flood frequency analysis in Australia (IEA, 1977). The suitability of this distribution for describing Australian annual maximum series data has been studied by McMahon and Srikanthan (1981).
They made use of the moment ratio diagrams Cs versus Cv and $\beta_1$ versus $\beta_2$, where Cv and Cs are coefficients of variation and skewness respectively, both corrected to some extent for small sample bias, $\beta_1 = C_s^2$ and $\beta_2$ is the kurtosis. On these diagrams the LP3 distribution appears as a curve whose coordinates vary with the value of the shape parameter, b, in the log domain. There is a separate curve for each value of scale parameter, a, considered. (See Table 3.1 for notation.) On these diagrams also the Normal, exponential and EV1 distributions can be represented by single points while Gamma, Weibull and LN distributions are represented by single lines or curves. Sample values of Cv, $C_s = \sqrt{\beta_1}$ and $\beta_2$ were computed for 172 series of annual maximum floods and each series was represented by a point on both the Cs - Cv and the $\beta_1$ - $\beta_2$ diagram. The authors state that "in the $\beta_1$ - $\beta_2$ diagram, it is observed that most of the points plot to the right of the Gamma, lognormal, Weibull, Normal, Gumbel and exponential distributions". In the Cs - Cv diagram these distributions do not cover more than half the points. However the LP3 distribution is found to cover satisfactorily the data points both in the $\beta_1$ - $\beta_2$ and Cs - Cv diagrams. This simple analysis suggests that at least for these 172 streams LP3 is the only suitable general distribution for flood frequency analysis. In particular it is noted that the two parameter lognormal distribution is generally unsatisfactory".

The consistency of the plotted points between the two types of diagrams was checked as follows. For each stream the value of LP3 scale parameter a was determined from Cv and Cs. Using this scale parameter value and the observed $\beta_1 = C_s^2$ value, a value of $\beta_2$ was obtained from the $\beta_1$ - $\beta_2$ diagram. A plot of this value of $\beta_2$ against the observed value of $\beta_2$ for each stream was then prepared.
If the plotted points fell on the 45° line it would imply full consistency between the two types of diagrams for the LP3 distribution in the light of the observed data. However, the majority of the points plotted on one side of the line. The sample sizes on which the above analysis is based are not given, although full details are available in McMahon (1979). It is inevitable that the observed values of Cv, Cs and $\beta_2$ suffer from sampling effects and possibly also from bias. If a correction is made for bias the sampling errors remain and, even if these are randomly distributed about the respective true values, their scatter may make a judgement between distributions less certain. The authors conclude "Nevertheless the analysis as it stands suggests, at least for these Australian streams, that the LP3 distribution is a satisfactory base method for analysing flood flow frequencies". In conclusion also the authors questioned the wisdom of setting the skewness to zero when using the LP3 distribution for flood estimation problems in those cases where the observed value of sample skewness, in the log domain, is not statistically different from zero. The shortcomings of moment ratio diagrams for the purposes of selecting a distribution are discussed in Appendix 3.

7.2.4 STATE OF CALIFORNIA (USA)

Wu and Goodridge (1974) report on studies made on the distributional form of rainfall and runoff series in California. These were annual series of:
Short duration rainfall of from 5 minutes to 12 hours duration from 73 stations with 20 or more years of record;
Long duration rainfall of from one to 60 days duration from 53 stations with 70 or more years of record;
Annual peak runoff and runoff volumes for durations ranging from 1 to 364 days from 90 records of 20 or more years.
For each type of series for each station values of skewness $C_s = \sqrt{\beta_1}$ and kurtosis excess $C_k = \beta_2 - 3$ were calculated, and weighted average values of Cs and Ck for each type of series were obtained. The pair of values (Cs, Ck) for each type of series was then plotted on Cs - Ck moment ratio diagrams on which specific points, lines or regions represent the theoretical relation between Cs and Ck of the several candidate distributions of interest. Separate moment ratio diagrams were used for (i) short duration rainfall, (ii) long duration rainfall and (iii) runoff data. Their conclusions are (a) that the P3 distribution is the best overall model for precipitation series of durations up to 30 days, with the Weibull distribution best for longer durations, and (b) that the Weibull distribution is the best overall model for all of the runoff series. It should be noted that the curves representing the P3 and Weibull distributions on the Cs - Ck diagrams lie quite close to one another.

The Wu and Goodridge (1974) study is not comparable with that of McMahon and Srikanthan (1981 a) even though both are based on the use of moment ratio diagrams. The former does not consider the LP3 distribution while the latter considers only that distribution.

In another study Cruff and Rantz (1965) compared the use of six probability distributions for flood series in coastal regions of California and recommended use of the P3 distribution as a result.

7.2.5 ITALY

Cicioni et al. (1973) examined the suitability of five distributions on 108 sets of flood data and concluded that LN2 was the best as a whole. The other distributions tested were LN3, G2, P3 and EV1. However Rossi et al. (1984) found EV1, LN2 and PEV unsuitable for Italian flood data and recommended TCEV instead on the grounds of its ability to model regional distribution of skewness.
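The moment ratios plotted in the studies of Sections 7.2.3 and 7.2.4 can be computed along the following lines. This is a minimal Python sketch and not the procedure of either study: the station names and flow values are invented for illustration, no small-sample bias correction is applied, and weighting the station values by record length is an assumption, since the weights used by Wu and Goodridge (1974) are not stated here.

```python
import math


def moment_ratios(series):
    """Return (Cv, Cs, Ck) for one annual series from simple sample moments.

    Cs is sqrt(beta1) (skewness) and Ck is beta2 - 3 (kurtosis excess).
    No small-sample bias correction is applied (assumption of this sketch).
    """
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m3 = sum((x - mean) ** 3 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    cv = math.sqrt(m2) / mean
    cs = m3 / m2 ** 1.5
    ck = m4 / m2 ** 2 - 3.0
    return cv, cs, ck


def weighted_average(values, weights):
    """Weighted mean of station values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical annual maximum series (illustrative numbers only).
records = {
    "station_a": [120.0, 95.0, 310.0, 150.0, 88.0, 205.0, 132.0, 99.0],
    "station_b": [40.0, 55.0, 38.0, 72.0, 140.0, 47.0],
}

ratios = {name: moment_ratios(q) for name, q in records.items()}
lengths = [len(q) for q in records.values()]  # assumed weights: record length

cs_bar = weighted_average([r[1] for r in ratios.values()], lengths)
ck_bar = weighted_average([r[2] for r in ratios.values()], lengths)
print(f"weighted Cs = {cs_bar:.3f}, weighted Ck = {ck_bar:.3f}")
```

The pair (cs_bar, ck_bar) is what would be placed as a single point on a Cs - Ck diagram; with records as short as those shown, the sampling scatter of both ratios is large.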
7.2.6 CANADA

Spence (1973) compared the use of Normal, LN2, EV1 and log EV1 distributions on AM series for Canadian Prairie catchments and concluded that the LN2 distribution was the most suitable overall one.

CHAPTER 8 CONCLUDING REMARKS

8.1 Types of model

Statistical methods of flood frequency estimation in current use employ AM or PD series type models in which a series of flood magnitudes is assumed to behave like a random sample of independent identically distributed variates. Most research work and published work deal with peak flows as the variable of interest even though the variables flood volume and flood duration are of practical interest in many applications. In the majority of research projects attention has been confined to the AM model.

8.2 The modelling problem

The main modelling problem is the selection of the probability distribution for the flood magnitudes coupled with the choice of estimation procedure.

8.3 Descriptive ability of distributions

Random samples from many of the distributions traditionally used for frequency analysis do not display the same behaviour of skewness as do observed regional AM data sets. Exceptions are the relatively recently introduced "thick tailed" distributions such as WAK, GLG and TCEV. These are also sufficiently flexible to provide a good fit to observed data. Many of the traditionally used three-parameter distributions (P3, LP3, GEV or EV2, Weibull) are sufficiently flexible to provide a moderately good fit to observed data but they do give rise to the condition of separation of skewness. Of the two widely used two-parameter distributions, EV1 and LN2, the latter can show a reasonable fit to a wider variety of observed data than can the former. The EV1 sometimes fits observed data well in humid climates in which floods do not vary greatly from year to year (low Cv). These two distributions also give rise to the condition of separation of skewness.
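The skewness behaviour of random samples mentioned in Section 8.3 can be explored by simulation. The sketch below is an illustration under stated assumptions, not a method taken from this report: it draws repeated samples from a standard EV1 parent (whose population skewness is about 1.14) and summarises the mean and standard deviation of the sample skewness. The record length, number of samples and seed are arbitrary choices. In a real comparison these two summary values would be set against the mean and standard deviation of skewness observed across the stations of a region.

```python
import math
import random
import statistics


def sample_skewness(x):
    """Sample skewness m3 / m2**1.5 (no bias correction)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def gumbel_variate(rng):
    """Standard EV1 variate by inverse transform, y = -ln(-ln(u))."""
    u = rng.random()
    while u <= 0.0:  # guard against log(0); random() never returns 1.0
        u = rng.random()
    return -math.log(-math.log(u))


def skewness_behaviour(n_years, n_samples, seed=1):
    """Mean and standard deviation of sample skewness over simulated records."""
    rng = random.Random(seed)
    skews = [
        sample_skewness([gumbel_variate(rng) for _ in range(n_years)])
        for _ in range(n_samples)
    ]
    return statistics.mean(skews), statistics.stdev(skews)


# Arbitrary illustrative settings: 2000 synthetic records of 30 years each.
mean_cs, sd_cs = skewness_behaviour(n_years=30, n_samples=2000)
print(f"mean Cs = {mean_cs:.2f}, sd of Cs = {sd_cs:.2f}")
```

Because the sample skewness is biased downward in short records, the simulated mean falls below the population value, which is one reason why comparisons of this kind are made between sample statistics rather than against population values.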
8.4 Predictive ability and robustness

As well as considering descriptive ability, the choice of a D/E procedure must take into account the predictive abilities of such procedures. In view of the lack of absolute knowledge of the correct form of distribution of floods, the property of robustness is very important in this context. This depends both on the distribution chosen and on the method of parameter estimation.

8.5 Parameter estimation

Parameter estimation by ML has optimal properties when the sample(s) on which it is used actually are drawn from the distribution assumed in the procedure. If the sample is from a different distribution the optimal properties are by no means guaranteed. Since it is distribution-specific it may not be robust.

Parameter estimation by ordinary moments, while very popular among hydrologists, is known to be biased and inefficient, especially with three-parameter distributions. The exact corrections for bias are not easy to summarize in simple formulae.

Parameter estimation by PWM, which is relatively new, is as easy to apply as ordinary moments, is usually unbiased and is almost as efficient as ML. Indeed in small samples PWM may be as efficient as ML. With a suitable choice of distribution PWM estimation also contributes to robustness and is attractive from that point of view. Another attraction of the PWM method is that it can be easily used in regional estimation schemes.

Graphical estimation, even in regional index flood types of scheme, leads to very variable estimates which are not objective and should not be used since efficient, objective methods are available.

Unbiased plotting positions (Chapter 4.3.4) should be used for data display. Weibull plotting positions lead to bias in quantile estimates if used in either graphical or least squares estimation schemes.
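The PWM estimation and plotting positions discussed in Section 8.5 can be combined with the probability plot deviations of equation (7.1) in a few lines. The following Python sketch makes several assumptions that are not taken from this report: it fits only the EV1 distribution, it uses the unbiased sample PWM estimator, it writes the four plotting formulae in the single form (i - alpha)/(N + 1 - 2 alpha), and the annual maxima are invented numbers.

```python
import math

EULER_GAMMA = 0.5772156649015329


def plotting_positions(n, alpha):
    """F_i = (i - alpha) / (n + 1 - 2*alpha) for i = 1..n.

    alpha = 0 gives Weibull, 0.5 Hazen, 0.44 Gringorten, 0.375 Blom.
    """
    return [(i - alpha) / (n + 1 - 2 * alpha) for i in range(1, n + 1)]


def fit_ev1_pwm(sample):
    """EV1 location and scale from the first two sample PWMs (b0, b1)."""
    q = sorted(sample)
    n = len(q)
    b0 = sum(q) / n
    # Unbiased b1: weights (j - 1)/(n - 1) on the j-th smallest value.
    b1 = sum((i / (n - 1)) * q[i] for i in range(n)) / n
    scale = (2 * b1 - b0) / math.log(2)
    loc = b0 - EULER_GAMMA * scale
    return loc, scale


def ev1_quantile(f, loc, scale):
    """EV1 variate with non-exceedance probability f."""
    return loc - scale * math.log(-math.log(f))


def fit_deviations(sample, alpha=0.44):
    """Mean absolute and rms values of d_i = (observed - fitted) / mean."""
    q = sorted(sample)
    n = len(q)
    loc, scale = fit_ev1_pwm(q)
    q_bar = sum(q) / n
    fitted = [ev1_quantile(f, loc, scale) for f in plotting_positions(n, alpha)]
    d = [(obs - fit) / q_bar for obs, fit in zip(q, fitted)]
    mean_abs = sum(abs(v) for v in d) / n
    rms = math.sqrt(sum(v * v for v in d) / n)
    return mean_abs, rms


# Hypothetical annual maxima (illustrative values only).
annual_maxima = [212.0, 145.0, 98.0, 330.0, 176.0, 120.0, 255.0, 160.0, 134.0, 410.0]

for name, alpha in [("Weibull", 0.0), ("Hazen", 0.5), ("Gringorten", 0.44)]:
    mean_abs, rms = fit_deviations(annual_maxima, alpha)
    print(f"{name:10s} mean |d| = {mean_abs:.4f}  rms(d) = {rms:.4f}")
```

Running the loop for several candidate distributions, rather than several plotting formulae, and ranking them by each summary would reproduce the kind of comparison described in Section 7.2.2.3, where the ranking was found to depend on whether the mean absolute value or rms(d) was used.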
8.6 At-site and at-site/regional estimation

A choice must be made between flood estimates based on (a) at-site data alone or (b) at-site plus regional data.

(a) Flood estimates may be based on at-site data alone if:
(i) the at-site record is exceptionally long;
(ii) there are no regional data available;
(iii) the region is very heterogeneous, i.e. Cv of Cv > 0.4.
It must be accepted that a single at-site record can provide limited quality estimates over a limited range of return periods. If a two-parameter distribution is used the standard error of estimate will be smaller than if a three-parameter distribution is used but the bias will (probably) be larger. On the other hand, use of a three-parameter distribution can be accompanied by such large standard error as to make the estimate of very little value. This is the limitation of use of at-site data alone. It follows of course that it is not suggested here that multiparameter distributions such as WAK or TCEV be used with single records;

(b) Flood estimates may profitably be based on joint use of at-site and regional data, providing a reasonably homogeneous flood region can be identified. In this context a homogeneous region is a collection of catchments whose flood statistics are homogeneous. It does not imply that all catchments in it are in a confined geographical area. The advantage of joint use of at-site and homogeneous regional data is that there is sufficient information in the combined data set to enable a multiparameter distribution to be used reliably. Thus a distribution which does not cause the condition of separation of skewness can be used. In these circumstances, the WAK distribution needs to be considered. It does not cause the condition of separation of skewness, i.e. it satisfies the descriptive requirements of a flood frequency model, its parameters can be reliably estimated from regionally averaged standardized PWMs, and it has been impressive in all the robustness tests published thus far.
It is stressed again that it is not being suggested here that WAK be used in at-site mode. Of course, all three-parameter distributions, when used in this context, will give more reliable results than when used in at-site mode alone even though they may cause the condition of separation of skewness, i.e. they may not satisfy all the descriptive requirements of a flood frequency model.

8.7 Arid and semi-arid zones

Floods in semi-arid and arid zones generally have much higher Cv than those of humid zones. Hence longer records, unfortunately not often available, are required for such zones. Also the lower AM flood values may be more of a distraction than of value to the estimation scheme. Serious consideration should be given to censoring the lower AM values. Since this would leave very few flood values at each station it is imperative to use regional estimation methods in such circumstances, subject to the proviso that Cv of Cv does not exceed 0.4. Many of these "high Cv hydrologies" are in the developing countries.

8.8 Regional homogeneity

Research is continuing to establish how to distinguish between catchments within a country which have different $Q_T/\bar{Q}$ versus T relations (see Chapters 4.6 and 5.3.1). Some catchment types may have steeper curves (larger Cv) than others and this difference may be a function of catchment characteristics other than area alone. Since greater quantile accuracy can be achieved by grouping catchments into homogeneous groups, efforts should be made in any flood estimation scheme to check on regional homogeneity. A small amount of regional heterogeneity is tolerable and in such cases regional flood estimation schemes still perform better than at-site ones.

8.9 Necessity for flow gauging

In the absence of at-site data a $\bar{Q}$ versus catchment characteristics relation may be used to obtain $\bar{Q}$.
It is worth stressing that a gauging station should be installed at any site as soon as it becomes clear that flood estimates will be required there, as a small amount of site data greatly improves the precision of the $\bar{Q}$ estimate, which can then be used with a regionally based estimate of $Q_T/\bar{Q}$.

8.10 Interpretation and use of flood frequency estimates

The concept of flood frequency analysis as an aid in decision making is a useful one. However it is a technique which may easily be misused. Most design flood estimates involve statistical extrapolation of some kind, the dangers of which, referred to in Chapter 1, Section 6, should be borne in mind.

REFERENCES

Acreman, M.C., and Sinclair, C.D., 1986: Classification of drainage basins according to their physical characteristics; An application for flood frequency analysis in Scotland. J. Hydrol., 84, 365-380. (Also pres. at Brit. Hydrol. Soc., Newcastle upon Tyne, July, 1984).
Acreman, M.C., and Hosking, J.R.M., 1986: Estimating regional flood frequency curves for Scotland in the presence of correlation. Unpubl. Rept., Inst. of Hydrology, Wallingford, U.K.
Adamowski, K., 1985: Nonparametric kernel estimation of flood frequencies. Water Resour. Res., 21(11), 1585-1590.
Aitchison, J., and Brown, J.A.C., 1957: The log-normal distribution, with special reference to its uses in economics. Cambridge University Press, New York.
Alexander, G.N., Karoly, A. and Susts, A.B., 1969: Equivalent distributions with application to rainfall as an upper bound to flood distributions. J. Hydrol., 9, 322-344.
Alexander, G.N., Karoly, A. and Susts, A.B., 1969: Equivalent distributions with application to rainfall as an upper bound to flood distributions (continued, parts 3 and 4). J. Hydrol., 9, 345-373.
Ahmad, M.I., 1988: Application of statistical methods to flood frequency analysis. Thesis pres. in fulfillment of Ph.D. degree, Univ. of St. Andrews, Scotland, 169 pages.
Ahmad, M.I., Sinclair, C.D.
and Spurr, B.D., 1988 (b): Assessment of flood frequency models using EDF statistics. Water Resour. Res., 24(8), 1323-1328.
Ahmad, M.I., Sinclair, C.D. and Werritty, A., 1988: Log-logistic flood frequency analysis. J. Hydrol., 98, 205-224.
Anderson, T.W. and Darling, D.A., 1954: A test of goodness of fit. JASA, 49, 765-769.
Arnell, N.W. and Gabriele, S., 1985: Regional flood frequency analysis with the two-component extreme value distribution: An assessment using computer simulation experiments. Workshop on combined efficiency of direct and indirect estimations for point and regional flood prediction, Perugia, Italy, December.
Arnell, N.W., Beran, M.A., and Hosking, J.R.M., 1986: Unbiased plotting positions for the general extreme value distribution. J. Hydrol., 86, 59-69.
Arnell, N.W. and Gabriele, S., 1988: The performance of the two component extreme value distribution in regional flood frequency analysis. Water Resour. Res., 24(6), 879-887.
Ashkar, F. and Rousselle, J.: A multivariate statistical analysis of flood magnitude, duration and volume. In V.P. Singh (Ed.) "Statistical analysis of Rainfall and Runoff", pp 651-668, Water Resources Publ., Colorado, 1982.
Ashkar, F. and Rousselle, J., 1983a: Some remarks on the truncation used in partial duration flood series models. Water Resour. Res., 19(2), 477-480.
Ashkar, F. and Rousselle, J., 1983b: The effect of certain restrictions imposed on the inter-arrival times of floods. Water Resour. Res., 19(2), 481-485.
Atkins, GP., 1980: Regional flood frequency analysis in Papua New Guinea. Ph.D thesis, Univ. Technology, Lae, Papua New Guinea. 473 pages.
Bardsley, KE., 1977: A test for distinguishing between extreme value distributions. J. Hydrol., 34(3/4), 371-381.
Beable, M.E. and McKerchar, A.I., 1982: Regional flood estimation in New Zealand. New Zealand National Water and Soil Cons. Org., Water and Soil Tech. Publ. No. 20, 132pp.
Beard, L.R., 1960: Probability estimates based on small normal distribution samples. Jour.
Geophys. Res., 65(7), 2143-2148.
Beard, L.R., 1962: Statistical methods in hydrology. Hydrologic Engineering Centre, Corps of Engineers, Davis, Ca.
Beard, L.R., 1974: Flood flow frequency techniques. Center for Research in Water Resources, The University of Texas at Austin, 28 pages + Tables + Appendices.
Beard, L.R., 1987: Discussion of "Relative Accuracy of Log Pearson III Procedures" by Wallis and Wood (1985). ASCE, J. Hydraul. Engng., 113(a), 1205-1206.
Benson, M.A., 1950: Use of historical data in flood frequency analysis. Trans. Am. Geophys. Union, 31(3), 419-424.
Benson, M.A., 1960: Characteristics of frequency curves based on a theoretical 1000 year record. U.S. Geol. Surv., Water Supply Paper 1543-A.
Benson, M.A., 1962a: Evolution of methods for evaluating the occurrence of floods. U.S. Geol. Surv., Water Supply Paper 1550-A.
Benson, M.A., 1962b: Factors influencing the occurrence of floods in a humid region of diverse terrain. U.S. Geol. Surv., Water Supply Paper No. 1580-B, 64p.
Benson, M.A., 1968: Uniform flood frequency estimating methods for federal agencies. Water Resour. Res., 4(5), 891-908.
Beran, M.A. and Nozdryn-Plotnicki, M.J., 1977: Estimation of low return period floods. Hydrol. Sci. Bull., 22, 275-282.
Beran, M.A., Hosking, J.R.M. and Arnell, N., 1986: Comment on "Two-component extreme value distribution for flood frequency analysis" by Rossi et al. (1984). Water Resour. Res., 22(2), 263-266.
Bernier, J., 1967a: Sur la théorie du renouvellement et son application en hydrologie. Electricité de France, HYD, 67(10).
Bernier, J., 1967b: Méthodes Bayesiennes en hydrologie statistique. Proc. Intl. Hydrol. Symp., Fort Collins, Colorado, U.S.A., 459-470.
Biswas, A.K. and Fleming, G., 1966: Floods in Scotland: Magnitude and frequency. Water and Water Engg., 246-252.
Blom, G., 1958: Statistical estimates and transformed Beta variables. John Wiley, New York, pp. 68-75 and 143-146.
Bobee, B., 1973: Sample error of T-year events computed by fitting a Pearson Type 3 distribution. Water Resour. Res., 9(5), 1264-1270.
Bobee, B. and Robitaille, R., 1975: Correction and bias in the estimation of the coefficient of skewness. Water Resour. Res., 11(6), 851-854.
Borgman, L.E., 1963: Risk criteria. Jour. Waterways and Harb. Div., ASCE, 89, 1-35.
Boughton, W.C., 1980: A frequency distribution for annual floods. Water Resour. Res., 16(2), 347-354.
Bridges, W.C., 1982: Technique for estimating magnitude and frequency of floods on natural flow streams in Florida. US Geol. Surv. Water Resour. Invest. 82-4012.
Burges, S.J., 1979: Review of NERC's flood studies report 1960 (q.v.). EOS, Amer. Geophys. Union, 60(46), 788-790.
Burges, S.J., Lettenmaier, D.P. and Bates, C.L., 1975: Properties of the three-parameter log-normal distribution. Water Resour. Res., 11(2), 229-235.
Cervantes, J.E., 1981: A trigger type cluster model for flood analysis. Ph.D. thesis, Purdue Univ., West Lafayette, Indiana, U.S.A.
Cervantes, J.E., Kavvas, M.L. and Delleur, J.W., 1983: Cluster model for flood analysis. Water Resour. Res., 19(1), 209-224.
Charbeneau, R.J., 1978: Comparison of the two- and three-parameter log normal distributions used in streamflow synthesis. Water Resour. Res., 14(1), 149-150.
Chen Jia-Qi, Ye Yong-yi and Tan Wei, 1975: The important role of historical flood data in the estimation of spillway design floods. Scientia Sinica, 18(5), 669-680.
Chow, V.T., 1950: Discussion of "Annual Floods and the Partial Duration Series" by W.B. Langbein. Trans. Am. Geophys. Union, 31(6), 939-941.
Chow, V.T., 1951: A general formula for hydrologic frequency analysis. Trans. Amer. Geophys. Union, 32, 231-237.
Chow, V.T., 1953: Frequency analysis of hydrologic data with special application to rainfall intensities. Univ. Illinois Eng. Expt. St., Bull. 414.
Chow,
V.T., 1954: The log probability law and its engineering applications. Proc. ASCE, 80(536), 1-25.
Cicioni, G., Giuliano, G. and Spaziani, F.M., 1973: Best fitting of probability functions to a set of data for flood studies. Proc. Second Intl. Symp. on Hydrology - floods and droughts. Water Resour. Publ., Fort Collins, Colorado, pp. 304-314.
Cole, G., 1966: An application of the regional analysis of flood flows. Institution of Civil Engineers, London, Symposium on river flood hydrology (1965), 39-57.
Condie, R., 1977: The log-Pearson type 3 distribution: The T-year event and its asymptotic standard error by maximum likelihood theory. Water Resour. Res., 13(6), 987-991.
Commons, W., 1986: Sampling behaviour of coefficients of skewness and kurtosis in the context of moment ratio diagrams. Unpubl. Report, Dept. Engng. Hydrol., Univ. Coll., Galway, Ireland. 6 pages.
Condie, R. and Lee, K.A., 1983: Flood frequency analysis with historic information. J. Hydrol., 58(1/2), 47-62.
Conger, D.H., 1986: Estimating magnitude and frequency of floods for ungauged urban streams in Wisconsin. US Geol. Surv. Water Resour. Invest. Rept 86-4005.
Correia, F.N.: Multivariate partial duration series in flood risk analysis. In V.P. Singh (Ed.) "Hydrologic Frequency Modelling", 541-554, D. Reidel Publ. Co., 1987.
Cruff, R.W. and Rantz, S.E., 1965: A comparison of methods used in flood frequency studies for coastal basins in California. U.S. Geol. Surv. Water Supply Paper 1580-E.
Cunnane, C. and Nash, J.E., 1971: Bayesian estimation of frequency of hydrologic events. Proc. Warsaw Symp. on Math. Models in Hydrology. (IAHS Sci. Publ. 100, pp 47-55, 1974).
Cunnane, C., 1973: A particular comparison of annual maximum and partial duration series methods of flood frequency prediction. J. Hydrol., 18, 257-271, with ERRATA 19, p377.
Cunnane, C., 1978: Unbiased plotting positions - a review. J. Hydrol., 37(3/4), 205-222.
Cunnane, C., 1979: A note on the Poisson assumption in partial duration series models. Water Resour. Res., 15(2), 489-494.
Cunnane, C., 1984: Condition of separation of skewness of random samples from the general extreme value distribution. Unpubl. Rep., Dept. Eng. Hydrol., Univ. Coll., Galway, 7pp.
Cunnane, C., 1987: Review of statistical models for flood frequency estimation. Paper pres. at Int. Symp. on flood frequency and risk analyses, Baton Rouge, La. Publ. in Singh, V.P. (Ed.), Hydrologic frequency modelling, 49-95, Reidel Publ. Co., Dordrecht.
Cureton, E.E., 1968: Unbiased estimation of the standard deviation. Amer. Stat., 22(1), p.22.
Dalin, J.S., 1986: Statistical analyses of mixed populations. Paper pres. at Int. Sympos. on Flood frequency analysis and risk analyses, Baton Rouge, La., 20 MS pages.
Dalrymple, T., 1960: Flood frequency methods. U.S. Geol. Survey, Water Supply Paper 1543 A, pp 11-51, Washington.
De Coursey, D.G., 1972: Objective regionalisation by peak flow rates. Proceedings Second International Symposium in Hydrology, Fort Collins, Colorado, pp 385-405.
Eagleson, P.S., 1972: Dynamics of flood frequency. Water Resources Research, 8(4), 878-898.
Eychaner, J.H., 1984: Estimation of magnitude and frequency of floods in Pima County, Arizona with comparisons of alternative methods. Water Resour. Invest. Rep. 84-4142, U.S. Geol. Surv., Tucson, Arizona.
Farhan, Y.I., 1984: Regionalisation of surface water catchments on east bank of Jordan. 25th Intnl. Geog. Congress Pre-Sympos. No 30 on "Problems in regional hydrology", Univ. of Freiburg, Fed. Rep. Germany.
Fiering, M.B., 1969: Streamflow Synthesis. Macmillan, London (Chapt. 3).
Fiering, M.B. and Jackson, B.B., 1971: Synthetic streamflows. Amer. Geophys. Union, Water Resour. Monograph 1, 98 pp.
Fiorentino, M. and Gabriele, S., 1984: A correction for the bias of maximum likelihood estimators of Gumbel parameters. J. Hydrol., 73(1/2), 39-50.
Foster, H.A., 1924: Theoretical frequency curves and their application to engineering problems. Trans. ASCE, 87, 142-173.
Fuller, W.E., 1914: Flood Flows. Trans. Am. Soc. Civ. Engrs., 77, 564-617.
Greenwood, J.A., Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1979: Probability weighted moments: Definition and relation to parameters of distributions expressible in inverse form. Water Resour. Res., 15(5), 1049-1054.
Greis, N.P. and Wood, E.F., 1981: Regional flood frequency estimation and network design. Water Resour. Res., 17(4), 1167-1177.
Gringorten, I.I., 1963: A plotting rule for extreme probability paper. J. Geophys. Res., 68(3), 813-814.
Guillot, P., 1973: Application of the method of GRADEX. Proc. 2nd Int. Sympos. in Hydrol., "Floods and droughts", pp 44-49, Water Resour. Public., Fort Collins, Colorado.
Gumbel, E.J., 1941: The return period of flood flows. Ann. Math. Statist., 12(2), 163-190.
Gumbel, E.J., 1958: Statistics of extremes. Columbia Univ. Press, 375 pp.
Hall, M.J. and O'Connell, P.E., 1972: Time series analysis of mean daily river flows. Water and Water Engng., 76, 125-133.
Hasselblad, V., 1969: Estimation of finite mixtures of distributions from the exponential family. J. Am. Stat. Assoc., 64, 1459-1471.
Hazen, A., 1914: Discussion on "Flood flows" by W.E. Fuller. Trans. ASCE, 77, 626-632.
Hazen, A., 1932: Flood flows. John Wiley, New York, 199 pp.
Hebson, C.S. and Wood, E.F., 1982: A derived flood frequency distribution using Horton order ratios. Water Resour. Res., 18(5), 1509-1518.
Hebson, C.S. and Cunnane, C., 1986: Assessment of use of at-site and regional flood data for flood frequency estimation. Paper presented at Int. Sympos. on Flood Frequency and Risk Analyses, Baton Rouge. Publ. in Singh, V.P. (Ed.), 1987, Hydrologic Frequency Modelling, 433-448, Reidel Publ. Co., Dordrecht.
Hedman, E.R. and Osterkamp, W.R., 1982: Streamflow characteristics related to channel geometry of streams in Western United States. US Geol. Surv.
Water Supply Paper 2193, 17p. STAllSTICALDlSTRIBUfiONS FOR HOOn FREQUENCY ANALYSIS 67 Hirsch, R.M.: Probability plotting position formulas for flood records with historical information. Paper pres. at USChina bilateral symposium on the analysis of extraordinary flood events, Nanjing, Oct. 1985. Pub!. in J. Hydro!., 96, (1-4), 185 - 199, 1987 Hoshi, K., and Burges, S.1., 1981: Sampling properties of parameter estimates for the log-Pearson Type 3 distribution using moments in real space. J. Hydro!., 53: 305-316. Hosking, J.R.M., 1984: Testing the general extreme value distribution hypothesis. Biometrika. Hosking, J.R.M., 1985: Comment on "A correction for the bias of maximum likelihood estimates of Gumbel parameters", by Fiorentino and Gabrielle. J. Hydro!., 78(3/4), 393-396. Hosking, J.RM., 1986a: The theory of probability weighted moments. IBM Math. Res. Rep., RC12210, Yorktown Heights, New York, 16Op. Hosking, J.R.M., 1986b: The Wakeby distribution. IBM Math. Res. Rep., RC12302, Yorktown Heights, New York, 21p. Hosking, J.R.M. and Wallis, J.R., 1984: Palaeoflood hydrology and flood frequency analysis. EOS, 65(45), p890 (Pres. at Amer. Geophys. Union Fan Meeting, San Francisco, Dec. 1984). Hosking, J.R.M. and Wallis, J.R., 1985: The effect of inter-site dependence on regional flood frequency analysis. EOS, 66(46), p906, (presented AGU Fan Meeting, San Francisco, Dec. 1985). Hosking, J.R.M. and Wallis, J.R., 1986a: Palaeoflood hydrology and flood frequency Analysis. Water Resour. Res., 22(4),543 - 550. Hosking, J.R.M. and Wallis, J.R., 1986b: The value of historical data in flood frequency analysis. Water Resour. Res., 22(11), 1606 - 1612. Hosking, J.R.M., Wallis, J.R., and Wood, E.F., 1985 (a): An appraisal of the regional flood frequency procedure in the UK Flood Studies Report. Hydro!. Sci. J., 30(1), 85-109. Hosking, J.R.M., Wallis, J.R. 
and Wood, E.F., 1985 (b): Estimation of the generalised extreme value distribution by the method of probability weighted moments. Technometrics, 27(3), 251 - 261. Houghton, J.C., 1977: Robust estimation of the frequency of extreme events in a flood frequency.conte Ph.D. Dissertation, Div. of App!. Sci., Harvard Univ., Cambridge, Mass. Houghton, J.e., 1978 (a): Birth of a parent: The Wakeby distribution for modelling flood flows. Water Resour. Res., 14(6), 1105 - 1109. Houghton, J.C., 1978 (b): The incomplete means estimation procedure applied to flood frequency analysis. Water Res. Res., 14(6), llll - 1115. Hua, Shi-Qian, 1985: A general survey of flood frequency analysis in China. Paper pres. at US-China bilateral Symposium on the analysis of extraordinary flood events, Nanjing. Pub!. in J. Hydro!., 96,(1-4), 15 - 24, 1987 Hydrologic Engineering Center (HEC), 1975: Hydrologic frequency analysis. Hydrologic Engineering Methods for Water Resour. Development, (US contrib. to lHD),Vo!. 3, Corps of Engineers, Davis, Calif., HEC-lHD0300,16 pages. 1n-na, Nophadal ,1988: A new plotting formula for Pearson Type 3 distribution. M.Sc. Thesis Subm. to Dept. Engng. Hydro!., Univ. Col!., Galway, Ireland, 49 pages. Institution of Engineers, Australia, 1977: Australian rainfall and runoff - flood analysis and design. I.E.A., 149 pp. Jenkinson, A.F., 1955: The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Quart. J. Roy. Meteor. Soc., 81 158 - 171. 68 REFERENCES Jenkinson, A.F., 1969: Statistics of extremes. In: Estimation of maximum floods. WMO No 233,TPI26, (Tech. Note No. 98), 183 " 228. Jennings, M.E. and Benson, M.A., 1969: Frequency curves for annual flood series with some zero events or incomplete data. Water Resour. Res., 5(1), 276 - 280. Ji Xue-wu, Ding Jing, H.W. Shen and Salas J.D., 1984: Plotting positions for Pearson type 3 Distrihution. J. Hydro!. (74), 1 - 29. 
Kaczmarek, Z., 1957: Efficiency of the estimation of floods with a given return period. Proc. Toronto Sympos., IAHS Public. No. 45, 145-159.
Kavvas, M.L., 1982 (a): Stochastic trigger model for flood peaks, 1. Development of the model. Water Resour. Res., 18(2), 383-398.
Kavvas, M.L., 1982 (b): Stochastic trigger model for flood peaks, 2. Application of the model to the flood peaks of Goksu Karahaali. Water Resour. Res., 18(2), 399-411.
Kendall, M.G. and Stuart, A., 1961: The advanced theory of statistics. Griffin, London, Vol. 2, 522-527.
Kirby, W., 1974: Algebraic boundedness of sample statistics. Water Resour. Res., 10(2), 220-222.
Klemes, V., 1986: Dilettantism in Hydrology: Transition or Destiny? Water Resour. Res., 22(9), 177S-188S.
Klemes, V., 1987: Hydrological and engineering relevance of flood frequency analysis. Paper presented at Int. Sympos. on Flood Frequency and Risk Analyses, Baton Rouge. Publ. in Singh, V.P. (Ed.), 1987, Hydrologic Frequency Modelling, 1-18, Reidel Publ. Co., Dordrecht.
Kottegoda, N.T., 1984: Investigation of outliers in annual maximum flow series. J. Hydrol., 72, pp. 105-137.
Kuczera, G., 1982a: Combining site-specific and regional information: An empirical Bayes approach. Water Resour. Res., 18(2), 306-314.
Kuczera, G., 1982b: Robust flood frequency models. Water Resour. Res., 18(2), 315-324.
Kuczera, G., 1983: Effect of sampling uncertainty and spatial correlation on an empirical Bayes procedure for combining site and regional information. J. Hydrol., 65(4), 373-398.
Lall, U. and Beard, L.R., 1982: Estimation of Pearson type 3 moments. Water Resour. Res., 18(5), 1563-1569.
Lamberti, P. and Pilati, S., 1985: Probability distributions of annual maxima of seasonal hydrological variables. Hydrol. Sci. Jour., 30(1), 111-136.
Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1978: Some comparisons of flood statistics in real and log space. Water Resour. Res., 14(5), 902-920; and CORRECTION in 15(6), p. 1672.
Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1979 (a): Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles. Water Resour. Res., 15(5), 1055-1064.
Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1979 (b): Estimation of parameters and quantiles of Wakeby distribution. Water Resour. Res., 15(6), 1361-1379.
Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1980: Quantile estimation with more or less flood-like distributions. Water Resour. Res., 16(3), 547-555.
Landwehr, J.M., Tasker, G.D. and Jarret, R.D., 1987: Discussion of "Relative accuracy of log-Pearson III procedures" by Wallis and Wood (1985). ASCE, J. Hydraul. Engng., 113(9), 1206-1210.
Langbein, W.B. and others, 1947: Topographic characteristics of drainage basins. U.S. Geol. Surv. Prof. Paper 968-C, pp 125-157.
Langbein, W.B., 1949: Annual floods and the partial duration flood series. Trans. Am. Geophys. Union, 30, 879-881.
Leese, M.N., 1973: Use of censored data in the estimation of Gumbel distribution parameters for annual maximum flood series. Water Resour. Res., 9(6), 1534-1542.
Lettenmaier, D.P., 1985: Regionalization in flood frequency analysis: Is it the answer? Paper pres. at US-China Bilateral Symposium on the analysis of extraordinary flood events, Nanjing.
Lettenmaier, D.P. and Potter, K.W., 1985: Testing flood frequency estimation methods using a regional flood generating model. Water Resour. Res., 21(12), 1903-1914.
Lettenmaier, D.P., Wallis, J.R. and Wood, E.F., 1985: Note on the comparative robustness of estimates of extreme flood quantiles. Pres. at American Geophysical Union, Spring Meeting, Baltimore.
Lettenmaier, D.P., Wallis, J.R. and Wood, E.F., 1987: Effect of heterogeneity on flood frequency estimation. Water Resour. Res., 23(2), 313-323.
Lieblein, J., 1953: A new method of analysing extreme value data. U.S. Nat. Adv. Comm. Aeronaut., Tech. Note 3053, 88 pages.
Lloyd, E.H., 1952: Least squares estimation of location and scale parameters using order statistics. Biometrika, 39, 88-95.
Lowery, M.D. and Nash, J.E., 1970: A comparison of methods of fitting the double exponential distribution. J. Hydrol., 10(3), 259-275.
Matalas, N.C. and Wallis, J.R., 1972: An approach to formulating strategies for flood frequency analysis. Proc. Int. Symp. on Uncertainties in Hydrologic and Water Resource Systems, Tucson, Arizona, 940-961.
Matalas, N.C. and Wallis, J.R., 1973: Eureka! It fits a Pearson type 3 distribution. Water Resour. Res., 9(2), 281-289.
Matalas, N.C., Wallis, J.R. and Slack, J.R., 1975: Regional skew in search of a parent. Water Resour. Res., 11(6), 815-826.
McMahon, T.A., 1979 (a): Hydrologic characteristics of Australian streams. Monash Univ., Clayton, Vict., Civ. Eng. Res. Rep. No. 3/79, 79 pp.
McMahon, T.A., 1979 (b): Hydrologic characteristics of arid zones. Proc. Canberra Sympos., "The hydrology of areas of low precipitation", IAHS Publ. No. 128, pp 105-124.
McMahon, T.A. and Srikanthan, R., 1981 (a): Log-Pearson type 3 distribution - is it applicable to flood frequency analysis of Australian streams? J. Hydrol., 52, 139-147.
McMahon, T.A. and Srikanthan, R., 1981 (b): Log-Pearson type 3 distribution - effect of dependence, distribution parameters and sample size on peak annual flood estimates. J. Hydrol., 52, 149-159.
Moran, P.A.P., 1957: The statistical treatment of flood flows. Trans. AGU, 38(4), 519-523.
Mosley, M.P., 1981: Delimitation of New Zealand hydrological regions. J. Hydrol., 49, 173-192.
Nash, J.E. and Amorocho, J., 1966: The accuracy of the prediction of floods of high return period. Water Resour. Res., 2(2), 191-198.
Nash, J.E. and Amorocho, J., 1967: Note on "The accuracy of the prediction of floods of high return period." Water Resour. Res. (Letters), 3(2), p. 635.
Nash, J.E. and Shaw, B.L., 1966: Flood frequency as a function of catchment characteristics. Proc. Sympos. on River Flood Hydrology (1965), Inst. Civ. Engnrs., London, 115-136.
NERC, 1975: Flood Studies Report. Nat. Environ. Res. Council, London, Vols. 1-5, 1100 pp.
Nguyen, V., In-na, N. and Bobee, B.: A new plotting position formula for Pearson Type 3 distribution. Subm. for publication, ASCE J. Hydraul. Engng., 20 pp, 1988.
Njenga, M., 1985: Simulation applied to the inference problem of the underlying distribution of hydrologic random variables. Thesis subm. in partial fulfillment of M.Sc., Dept. Eng. Hydrol., Univ. Coll., Galway, 150 pp.
Nozdryn-Plotnicki, M.J. and Watt, W.E., 1979: Assessment of fitting techniques for the log-Pearson type 3 distribution using Monte Carlo simulation. Water Resour. Res., 15(3), 714-718.
O'Donnell, T., Hall, M.J. and O'Connell, P.E., 1972: Some applications of stochastic hydrological models. Int. Symp. on Modelling Techniques in Water Resources Systems, (Editor A.K. Biswas), Vol. 1, pp 227-239, Publ. by Environment Canada, Ottawa.
Olin, D.A. and Bingham, R.H., 1982: Synthesised flood frequency of urban streams in Alabama. US Geol. Survey Water Resour. Invest. 82-683.
Otten, A. and van Montfort, M.A.J., 1978: The power of two tests on the type of distribution of extremes. J. Hydrol., 37, 195-199.
Phien, H.N. and Hira, M.A., 1983: Log-Pearson type 3 distribution: parameter estimation. J. Hydrol., 64, 25-37.
Phien, H.N. and Lung-Cheng Hsu, 1984: Variance of the T-year event in the log-Pearson type 3 distribution. J. Hydrol., 77, 141-158.
Potter, W.D., 1958: Upper and lower frequency curves for peak rates of runoff. Trans. Amer. Geophys. Union, 39(1), pp 100-105.
Quimpo, R.G., 1967: Stochastic model of daily flow sequences. Hydrology Paper No. 18, Colorado State Univ.
Reich, B.M., 1970: Flood series compared to rainfall extremes. Water Resour. Res., 6(6), 1655-1667.
Reich, B.M., 1977: Lysenkoism in U.S. flood determinations. Special session on flood frequency methods, Amer. Geophys. Union, San Francisco, 13 pages.
Reich, B.M., 1985: Flood probability effects from progressive fluvial erosion. Paper presented at US-China Bilateral Sympos. on the analysis of extraordinary flood events, Nanjing, China.
Riggs, H.C., 1978: Streamflow characteristics from channel size. A.S.C.E., J. Hydraul. Div., 104, HY1, 87-96.
Rosbjerg, D., 1984: Estimation in partial duration series with independent and dependent peak values. J. Hydrol., 76, 183-195.
Rossi, F., Fiorentino, M. and Versace, P., 1984: Two component extreme value distribution for flood frequency analysis. Water Resour. Res., 20(7), 847-856.
Rossi, F., Fiorentino, M. and Versace, P., 1986: Reply to comment by Beran et al. (1986). Water Resour. Res., 22(2), 267-269.
Sangal, B.P. and Biswas, A.K., 1970: The three-parameter log-normal distribution and its application in hydrology. Water Resour. Res., 6(2), 505-515.
Shane, R.M. and Lynn, W.R., 1964: Mathematical model for flood risk evaluation. J. Hydraul. Div., ASCE, 90, 1-20.
Sinclair, C.D. and Ahmad, M.J., 1987: Modified Anderson-Darling test. Techn. Rept., Dept. Math. Sci., Univ. of St. Andrews.
Simmons, R.R. and Carpenter, D.H., 1978: Technique for estimating the magnitude and frequency of floods in Delaware. US Geol. Surv. Water Resour. Invest., Open File Rept 78-93.
Singh, K.P. and Sinclair, R.A.: Two distribution method for flood frequency analysis. J. Hydraul. Div., ASCE, HY1, 98, 29-44.
Singh, V.P. and Aminian Hossein, 1986: An empirical relation between volume and peak of direct runoff. Water Resour. Bull., 22(5), 725-730.
Singh, V.P. (Editor): "Regional Flood Frequency Analysis". Proc. Int. Sympos. on Flood Frequency and Risk Analyses, Baton Rouge, Louisiana, May 1986. Publ. by Reidel Publ. Co., Dordrecht, 400 pp, 1987.
Slack, J.R., Wallis, J.R. and Matalas, N.C., 1975: On the value of information to flood frequency analysis. Water Resour. Res., 11(5), 629-647.
Slade, J.J., 1936: The reliability of statistical methods in the determination of flood frequencies. U.S. Geol. Surv., Water Supply Paper 771, pp 421-432.
Song Dedun and Ding Jing, 1988: Estimation of Pearson type 3 parameters by method of probability weighted moments. J. Hydrol., 101, 47-61.
Spence, E.S., 1973: Theoretical frequency distributions for the analysis of plains streamflow. Can. J. Earth Sciences, 10, 130-139.
Stedinger, J.R., 1980: Fitting log-normal distributions to hydrologic data. Water Resour. Res., 16(3), 481-490.
Stedinger, J.R., 1983 (a): Design events with specified flood risk. Water Resour. Res., 19(2), 511-522.
Stedinger, J.R., 1983 (b): Estimating a regional flood frequency distribution. Water Resour. Res., 19(2), 503-510.
Stedinger, J.R., 1983 (c): Confidence intervals for design events. ASCE, J. Hydraul. Engng., 109(1), pp 13-27.
Stedinger, J.R. and Tasker, G.D., 1985: Regional hydrological analysis: 1. Ordinary, weighted, and generalised least squares compared. Water Resour. Res., 21(9), 1421-1432.
Stedinger, J.R. and Tasker, G.D., 1986: Regional hydrological analysis: 2. Model error estimators, estimation of sigma and log-Pearson type 3 distributions. Water Resour. Res., 22(10), 1487-1499.
Sukhatme, P.V., 1937: Tests of significance for the Chi square population with two degrees of freedom. Annals of Eugenics, 8, 52-56.
Taesombut, V. and Yevjevich, V., 1978: Use of partial flood series for estimating distribution of maximum annual flood peak. Hydrol. Paper No. 97, Colorado State University, Fort Collins, 71 pp.
Takeuchi, K., 1984: Annual maximum series and partial duration series - evaluation of Langbein's formula and Chow's discussion. J. Hydrol., 68, 275-284.
Tasker, G.D., 1982: Comparing methods of hydrologic regionalization. Water Resour. Bull., 18(6), 965-970.
Tasker, G.D., 1983: Effective record length for the T-year event. J. Hydrol., 64, 39-47.
Tasker, G.D.: Regional Analysis of Flood Frequencies. In Singh, V.P. (Ed.), "Regional Flood Frequency Analysis", Reidel Publ. Co., Dordrecht, pp 1-9, 1987.
Tasker, G.D. and Moss, M.E., 1979: Analysis of Arizona flood data network for regional information. Water Resour. Res., 15(6), 1791-1796.
Tavares, L.V. and Da Silva, J.E., 1983: Partial duration series method revisited. J. Hydrol., 64, 1-24.
Thomas, H.A., 1949: Frequency of minor floods. Jour. Boston Soc. Civ. Engrs., 35, p. 425.
Thomas, D.M. and Benson, M.A., 1970: Generalization of streamflow characteristics from drainage-basin characteristics. U.S.G.S. Water-Supply Paper No. 1975.
Thomas, W.O., Jr., 1985: A uniform technique for flood frequency analysis. ASCE, J. Water Resour. Plann. and Manag., 111(3), 321-327.
Thomas, W.O., Jr., 1987: Techniques used by U.S. Geological Survey in estimating the magnitude and frequency of floods. Proc. 18th Binghampton Geomorphol. Sympos., Publ. by Unwin and Hyman, London, pp 267-288.
Todorovic, P., 1970: On some problems involving a random number of random variables. Annals Math. Statist., 41(3), 1059-1063.
Todorovic, P. and Zelenhasic, E., 1970: A stochastic model for flood analysis. Water Resour. Res., 6(6), 1641-1648.
Todorovic, P. and Rousselle, J., 1971: Some problems of flood analysis. Water Resour. Res., 7(5), 1144-1150.
Todorovic, P., 1978: Stochastic Models of Floods. Water Resour. Res., 14(2), 345-356.
U.S.W.R.C., 1967: A uniform technique for determining flood flow frequencies. Bulletin 15, Hydrology Committee, Water Resources Council, Washington.
U.S.W.R.C., 1976: Guidelines for determining flood flow frequency. Bulletin 17, Hydrology Committee, Water Resources Council, Washington. (Also revised versions, Bulletin 17A, 1977 and Bulletin 17B, 1981).
Van Montfort, M.A.J., 1970: On testing that the distribution of extremes is of type I when type 2 is the alternative. J. Hydrol., 11(4), 421-427.
Van Montfort, M.A.J. and Gomes, M.I., 1985: Statistical choice of extremal models for complete and censored data. J. Hydrol., 77, 77-87.
Wallis, J.R., 1980: Risk and uncertainties in the evaluation of flood events for the design of hydrologic structures. Keynote address at "Seminar on Extreme Hydrological Events - Floods and Droughts", Erice, Italy, 33 pp.
Wallis, J.R., 1982: Hydrological problems associated with oil shale development. In Environmental Systems Analysis and Management. Ed. S. Rinaldi, North Holland Publ. Co., pp 85-102.
Wallis, J.R. and Wood, E.F., 1985: Relative accuracy of log-Pearson III procedures. A.S.C.E. Journal of Hydraulic Engineering, 111(7), 1043-1056. (See also discussion and reply, 113(9), 1205-1214, 1987).
Wallis, J.R. and Wood, E.F., 1987: Reply to discussion, by Beard (1987) and by Landwehr et al. (1987) of "Relative accuracy of log-Pearson III procedures" by Wallis and Wood (1985). ASCE, J. Hydraul. Engng., 113(9), 1210-1214.
Wallis, J.R., Matalas, N.C. and Slack, J.R., 1974: Just a moment! Water Resour. Res., 10(2), 211-219.
Wallis, J.R., Matalas, N.C. and Slack, J.R., 1977: Apparent regional skew. Water Resour. Res., 13(1), 159-182.
Waylen, P. and Woo, Ming-ko, 1982: Prediction of annual floods generated by mixed processes. Water Resour. Res., 18(4), 1283-1286.
White, E.L., 1975: Factor analysis of drainage basin properties: classification of flood behaviour in terms of basin geomorphology. Water Resources Bulletin, 11(4), 676-687.
Wiltshire, S.W., 1985: Grouping basins for regional flood frequency analysis. Hydrol. Sci. Journ., 30(1), 151-159.
Wiltshire, S.W., 1986 (a): Identification of homogeneous regions for flood frequency analysis. J. Hydrol., 84, 287-302.
Wiltshire, S.W., 1986 (b): Regional flood frequency analysis I: Homogeneity statistics. Hydrol. Sci. Jour., 31(3), 321-333.
Wiltshire, S.W., 1986 (c): Regional flood frequency analysis II: Multivariate classification of drainage basins in Britain. Hydrol. Sci. Jour., 31(3), 335-346.
Wiltshire, S. and Beran, M., 1986: Multivariate techniques for the identification of homogeneous flood frequency regions. Pres. at Internat. Sympos. on flood frequency and risk analyses, Baton Rouge, USA, 17 pp, 1986. Publ. in Singh, V.P. (Ed.), 1987, Hydrologic frequency modelling, Reidel Publ. Co., Dordrecht, pp 133-146.
Wood, E.F., 1974: A Bayesian approach to analysing uncertainty among stochastic models. Research Report 74-16, Int. Inst. for Appl. Syst. Analy., Laxenburg, Austria, 19 pp.
Wood, E.F., 1976: An analysis of the effects of parameter uncertainty in deterministic hydrologic models. Water Resources Research, 12(5), 925-935.
Wood, E.F. and Rodriguez-Iturbe, I., 1975: Bayesian inference and decision making for extreme hydrologic events. Water Resour. Res., 11(4), 533-542.
World Meteorological Organization, 1969: Estimation of maximum floods. World Meteorol. Organization, WMO-No. 233, TP 126, (Tech. Note No. 98), pp 183-228.
World Meteorological Organization, 1983: Guide to Hydrological Practices, Volume II. World Meteorol. Organization, WMO-No. 168, p 5-26.
World Meteorological Organization, 1981: Selection of distribution types for extremes of precipitation, by B. Sevruk and H. Geiger. World Meteorol. Organization, WMO-No. 560, OH Rep. No. 15, 64 pp.
World Meteorological Organization, 1986: Manual on estimation of probable maximum precipitation. World Meteorol. Organization, WMO-No. 332, OH Report 1, Revised edition.
Wu, B. and Goodridge, J.D., 1976: Selection of frequency distributions for hydrologic frequency analysis. Dept. of Water Resources, State of California, Sacramento, 85 pages.
Yevjevich, V., 1968: Misconceptions in hydrology and their consequences. Water Resour. Res., 4(2), 225-232.
Yevjevich, V., 1979: Extraction of full information on flood peaks in arid areas. Proc. Canberra Sympos., "The hydrology of areas of low precipitation", IAHS Publ. No. 128, pp 223-234.
Yevjevich, V.
and Taesombut, V., 1978: Information on flood peaks in daily flow series. Proc. Int. Symp. on Risk and Reliability in Water Resources, Univ. of Waterloo, Ontario, Canada, pp. 451-470.
Yevjevich, V. and Obeysekera, J.B.T., 1984: Estimation of skewness of hydrologic variables. Water Resour. Res., 20(7), 935-943.
Zhang, Y., 1982: Plotting positions of annual flood extremes considering extraordinary values. Water Resour. Res., 18(4), 859-864.

APPENDIX 1. VOLUME FLOODS.

Introduction

As indicated in Chapter 1, the vast majority of published work regarding flood distribution choice and performance relates to instantaneous peak flows or to peak one day mean flows. However, instantaneous peak flow information (or that on one day flow) alone is insufficient for certain very important types of work. Knowledge of the instantaneous flood peak - return period relationship enables calculation of the probability that a bridge, flood plain or building may be inundated or that a levee may be overtopped. However, it provides no information on the volume of water which may flow over a levee and which may have to be pumped back into the river at a later stage. The amount and frequency of such pumping would need to be estimated for economic analysis of an existing or proposed scheme.

There are two possible categories of volume - frequency problems. These relate to (a) volumes over a threshold flow and (b) total volumes of flow.

Figure A1.1 Definition sketches for volume flood variables.

(a) VOLUMES OVER A THRESHOLD.

The data series $S_1, S_2, S_3, \ldots$ involved in such a study would consist of volumes of flow exceeding $Q_0$, where $Q_0$ is the existing within-bank or within-levee flow or a proposed design channel flow. While analyses of such series have undoubtedly been undertaken, reports of them have not been widely published. Hence no immediate advice about distributions for such series is available.
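The abstraction of a volume-over-threshold series can be sketched as follows. This is only an illustration under stated assumptions: flows are taken as equally spaced ordinates (daily means here), each run of consecutive ordinates above the threshold is treated as one event, and the excess is integrated by simple rectangles. The function name `volumes_over_threshold` and the sample hydrograph are hypothetical and do not come from the report.

```python
def volumes_over_threshold(flows, q0, dt_seconds=86400.0):
    """Return one excess volume (m3) per run of consecutive flows above q0.

    flows      : flow ordinates in m3/s at a fixed time step (assumed daily)
    q0         : threshold flow in m3/s (e.g. within-bank capacity)
    dt_seconds : length of the time step in seconds
    """
    volumes = []
    current = 0.0
    in_event = False
    for q in flows:
        if q > q0:
            # accumulate only the part of the flow above the threshold
            current += (q - q0) * dt_seconds
            in_event = True
        elif in_event:
            # the flow has dropped back to or below q0: close the event
            volumes.append(current)
            current = 0.0
            in_event = False
    if in_event:
        volumes.append(current)
    return volumes


# Illustrative (made-up) daily mean flows in m3/s
daily_flows = [40, 55, 80, 120, 90, 60, 45, 70, 110, 65, 50]
series_s = volumes_over_threshold(daily_flows, q0=60.0)
print(series_s)
```

A real study would also need a rule for deciding when two nearby exceedances belong to the same flood; that choice is left open here.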
However, because of the Central Limit Theorem of statistics the variable S must be less skewed than instantaneous flood peak Q. The theoretical derivation of the distribution of flood volumes in this context has been discussed by Todorovic (1978), Ashkar and Rousselle (1982) and Correia (1987).

(b) TOTAL FLOW VOLUMES.

Information on total flow volume - return period relationships, V(D) - T, for a series of different durations D, is a normal requirement for spillway assessment of any reservoir and is a necessity in designing a flood control reservoir or in assessing its operating rules. Let $\{V_i(D),\ i = 1, 2, \ldots, N\}$ be a series of annual maximum flow volumes in N years of record relating to duration D. Typically such series would be abstracted for D = 1, 3, 7 or 10, 30, 90 and 365 days (Hydrologic Engineering Center (HEC), 1975). Typically $V_i(D)$ would be expressed as an average flow so that "peak flows and volumes can be readily compared and coordinated" (HEC, 1975). That is, if $V_i^*(D)$ is the actual maximum volume of flow for duration D, in m3, in year i, then $V_i(D)$ is taken as $V_i^*(D) / (86400\,D)$ m3/s.

As indicated above, because of the Central Limit Theorem of statistics, the skewness of the variable V(D) should be less than that of instantaneous peak Q, and one would expect for sufficiently large D that V(D) would have a Normal (Gaussian) distribution. HEC (1975) report typical skewness values of log V(D) for the United States of America ranging from zero for instantaneous peaks (LN distribution) to -0.23 for D = 10, -0.37 for D = 90 and -0.40 for D = 365 days. These values are recommended for use in the USA in the absence of locally derived regional values. These, along with the first two moments in the log domain obtained from the at-site data, are then used with the LP3 assumption (in a manner analogous to the USWRC (1977) recommendation for instantaneous peak flows) to calculate V(D) - T relationships.
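The two steps just described, expressing a D-day volume as an average flow and then applying the LP3 assumption with a given log-domain skewness, can be sketched as below. Several things here are assumptions rather than part of the report: the function names are hypothetical, the sample series is made up, and the LP3 frequency factor is obtained from the Wilson-Hilferty approximation instead of the tabulated factors that HEC or USWRC procedures would use.

```python
import math
from statistics import NormalDist, mean, stdev


def max_mean_flow(daily_flows, d):
    """Largest d-day average flow (m3/s): V*(D) / (86400 D).

    daily_flows are daily mean flows in m3/s; the d-day volume in m3 is
    the sum of d consecutive daily means times 86400 s.
    """
    best = None
    for start in range(len(daily_flows) - d + 1):
        volume_m3 = sum(daily_flows[start:start + d]) * 86400.0
        average = volume_m3 / (86400.0 * d)
        if best is None or average > best:
            best = average
    return best


def lp3_quantile(values, skew_log, t_years):
    """T-year quantile under an LP3 assumption (log10 domain).

    Mean and standard deviation come from the at-site logs; skew_log is
    supplied externally (e.g. a regional value). The frequency factor uses
    the Wilson-Hilferty approximation, an assumption made for this sketch.
    """
    logs = [math.log10(v) for v in values]
    z = NormalDist().inv_cdf(1.0 - 1.0 / t_years)
    if abs(skew_log) < 1e-9:
        k = z  # zero skew in the log domain reduces to the lognormal case
    else:
        g = skew_log
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
    return 10.0 ** (mean(logs) + k * stdev(logs))


# Illustrative (made-up) annual maximum 10-day mean flows in m3/s
v10 = [120.0, 95.0, 210.0, 160.0, 130.0, 88.0, 175.0, 140.0]
q100_zero_skew = lp3_quantile(v10, 0.0, 100)
q100_hec_skew = lp3_quantile(v10, -0.23, 100)  # skew quoted for D = 10 days
```

With a negative log-domain skew the 100-year estimate comes out below the zero-skew (lognormal) estimate, which is the direction the quoted HEC values would be expected to push the upper tail.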
Note that the negative skewness values imply upper bounds for V(D) at large T. Statistical modelling of some aspects of the relationships between flood peak, duration and volume has also been discussed by Todorovic (1978), while Singh and Hossein (1986) have examined empirical relationships between peak and flood volume.

In view of the superiority of index flood methods over the LP3 / regional skew procedure it is recommended that V(D) quantiles be also estimated by at-site / regional index flood methods. No precise guidelines can be offered about the choice of parent distribution for standardized flood volumes at this time. However, those distributions which have been found robust for instantaneous flood peaks should prove satisfactory for volumes also.

In any regional study of flood volumes the several standardised V(D) - T relationships for D = 1, 3, 7, ... days should be examined for the possibility of finding one or more relationships between the V(D) - T parameters and the duration D. This would allow for more efficient joint estimation of all parameters. For instance NERC (1975, I, p 354) found that the dependence of $\bar{Q}(D)$ on D could be expressed as

$$\bar{Q}(D) = \bar{Q}_i \,(1 + BD)^{-n}, \qquad 1 \le D \le 10 \text{ days}, \qquad \text{(A1.1)}$$

a form of relationship often used for rainfall intensity statistics. This was suggested by comparison with the rainfall intensity duration relationship derived in the meteorological studies of NERC (1975, Volume 2, Chapter 3). Here $\bar{Q}_i$ is the mean of the AM instantaneous flood peak series while $\bar{Q}(D)$ is the mean of the AM flood volumes series (expressed as m3/s). Unfortunately the estimated parameters B and n interact with one another and as a result were found to be poorly correlated with catchment characteristics. In addition NERC (1975, I, pp 358-361) concluded that Cv(D), although displaying a small reduction in value with increase in D, could be regarded as constant for all durations up to 10 days.
This effectively means a common growth curve $x_T$ - T could be adopted for all durations, where $x_T = Q(D)_T / \bar{Q}(D)$. This degree of simplification however may not apply for D considerably larger than 10 days nor may it extend reliably to other regions.

APPENDIX 2. ESTIMATES OF POPULATION MOMENTS AND THEIR BIASES.

Let $x_1, x_2, \ldots, x_N$ denote values in a random sample from some X population. Define crude sample moments as

$$m_r = \sum (x_i - \bar{x})^r / N \qquad \text{(A2.1)}$$

where $\bar{x}$ is the sample mean and summation is from i = 1 to i = N, and let

$$s_N = m_2^{1/2} \qquad \text{(A2.2)}$$

In general $m_r$ is a downwards biased estimate of $\mu_r$, the population rth central moment, and

$$g_N = m_3 / m_2^{3/2} \qquad \text{(A2.3)}$$

is a downwards biased estimate of skewness, $C_s$. These biases (Wallis et al., 1974) are larger, in small samples, than can be corrected by replacing the denominator N of the sample moments by (N - 1) or some other simple expression in N. For instance

$$\hat{m}_2 = \sum (x_i - \bar{x})^2 / (N - 1) \qquad \text{(A2.4)}$$

is an unbiased estimate of the second central moment $\mu_2$ but its square root is not unbiased for the standard deviation, $\sigma$ (Cureton, 1968). Similarly

$$\hat{m}_3 = N \sum (x_i - \bar{x})^3 / [(N - 1)(N - 2)] \qquad \text{(A2.5)}$$

is considered an unbiased estimator for $\mu_3$ in samples from a Normal distribution but

$$\hat{g} = \hat{m}_3 / \hat{m}_2^{3/2} \qquad \text{(A2.6)}$$

is a biased estimator for $C_s$. The expression

$$\hat{m}_4 = N^2 \sum (x_i - \bar{x})^4 / [(N - 1)(N - 2)(N - 3)] \qquad \text{(A2.7)}$$

has been proposed as an unbiased estimator for $\mu_4$ in samples from a Normal distribution but

$$\hat{h} = \hat{m}_4 / \hat{m}_2^{2} \qquad \text{(A2.8)}$$

is a biased estimator for kurtosis, $C_k$. Use of N alone as denominator in equation (A2.1) leads to biased estimators for $\sigma$, $C_s$ and $C_k$, but use of the denominators (N - 1), (N - 1)(N - 2)/N and (N - 1)(N - 2)(N - 3)/N^2 for $\hat{m}_2$, $\hat{m}_3$ and $\hat{m}_4$ respectively is not sufficient to eliminate bias in $C_s$ and $C_k$ estimates.

The amounts of bias in $s_N$ and $g_N$ of equations (A2.2) and (A2.3) as estimators for $\sigma$ and $C_s$ respectively were shown by Wallis et al. (1974) to be functions of three things:

(i) sample size
(ii) skewness of parent population
(iii) form of parent population distribution

This dependence is shown in Table A2.1, in which a small selection of bias ratios, $\alpha(s_N)$ and $\alpha(g_N)$, are quoted from Wallis et al. (1974, Tables 3 and 4). The bias ratios are defined by

$\alpha(s_N)$ = (population $\sigma$) / (Mean of 100 000 values of $s_N$)
$\alpha(g_N)$ = (population $C_s$) / (Mean of 100 000 values of $g_N$)

There is only weak dependence between $\alpha(s_N)$ and the form of the population distribution for population $C_s$ < 5, but $\alpha(g_N)$ is quite dependent on the form of population distribution, especially for $C_s$ > 3, and it is very dependent on the actual value of population $C_s$.

Wallis et al. (1974) discovered that the sampling distribution of skewness obtained by simulation seemed to have an upper bound which was never exceeded no matter how many simulations were performed. Kirby (1974) investigated the theoretical possibility of such an upper bound on $C_v$ and $g_N$ and proved theoretically that such bounds exist, thus:

$$0 < C_v < (N - 1)^{1/2} \qquad \text{(A2.9)}$$

$$|g_N| \le (N - 2) / (N - 1)^{1/2} \qquad \text{(A2.10)}$$

These are distribution-free results and hold for any set of N values, randomly selected or otherwise. To see this, consider samples of size 10 containing 9 values of unity and a large 10th value, e.g. 10, 10^2, 10^3, ..., 10^6, and note that however large the 10th value becomes, $g_N$ never exceeds the bound of equation (A2.10).

Hazen (1932) noticed that skewness calculated from AM series increased with sample size N and suggested that sample values of skewness be multiplied by (1 + 8.5/N) to correct for this bias. The effectiveness of this factor in conjunction with equation (A2.6) for estimating $C_s$ as

$$G^* = \hat{g}\,(1 + 8.5/N) \qquad \text{(A2.11)}$$

where $\hat{g}$ is defined by equation (A2.6), was investigated by Wallis et al. (1974), who computed

$$\alpha^*(G) = (\text{population } C_s) / (\text{Mean of 100 000 values of } G^*) \qquad \text{(A2.12)}$$

and from comparison of $\alpha^*(G)$ and $\alpha(g_N)$ they concluded that $G^*$ is an approximately unbiased estimator of $C_s$ for a small range of $C_s$ values, say 0.5 < $C_s$ < 2, for the lognormal distribution. It is also a good estimator for the EV1 distribution.

From the above it can be concluded that the populations, from which AM series are assumed to be random samples, have values of $C_v$, $C_s$ and $C_k$ which are in excess of the average regional value calculated by the usual formulae.

The Wallis et al. (1974) findings, based on Monte Carlo simulation work, have serious implications for methods of flood estimation which depend on the use of sample moments. One such method is the US WRC (1967, 1977, 1981) recommended use of the log Pearson Type 3 distribution with parameter estimation by moments in the log-domain. Bobee and Robitaille (1975) suggested a skewness estimator for the Pearson Type 3 distribution as

$$\hat{G}_s = \bar{G}_s \left[ \left( 1 + \frac{6.51}{N} + \frac{20.2}{N^2} \right) + \left( \frac{1.48}{N} + \frac{6.77}{N^2} \right) \bar{G}_s^{\,2} \right] \qquad \text{(A2.13)}$$

where $\bar{G}_s$ is the average regional skew. Lall and Beard (1982) pointed out that Bobee and Robitaille's estimator is based on Wallis et al.'s (1974) investigation into the expected value of $g_N$ given $C_s$ while it is used to estimate $C_s$ given $g_N$. They state that "In the light of the bounded nature of sample skew and its skewed distribution it is suspected that there is a lack of a one-to-one correspondence between the expected value of $C_s$ given $g_N$ and the expected value of $g_N$ given $C_s$". In conclusion they point out the need to take the magnitude of the Kirby (1974) bound on skewness into account and they state that the nature of the (required) bias correction factor for skewness is probably markedly different from what is currently used.
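The estimators of equations (A2.1) to (A2.6), the Hazen-type correction of (A2.11) and the Kirby bound of (A2.10) can be checked numerically with a short sketch. The function names are illustrative, and the test samples are the artificial ones suggested in the text (nine values of unity plus one large value). For this particular construction $g_N$ works out to the same value, (N - 2)/(N - 1)^{1/2}, whatever the size of the 10th value, so it sits at the bound to within rounding and does not pass it.

```python
import math


def crude_moment(x, r):
    """Crude r-th central sample moment with denominator N, eq. (A2.1)."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** r for xi in x) / n


def skewness_estimates(x):
    """Return (g_N, g_hat, G_star) following eqs (A2.3), (A2.6), (A2.11)."""
    n = len(x)
    m2 = crude_moment(x, 2)
    m3 = crude_moment(x, 3)
    g_n = m3 / m2 ** 1.5                        # (A2.3)
    m2_hat = n * m2 / (n - 1)                   # (A2.4): sum of squares / (N-1)
    m3_hat = n * n * m3 / ((n - 1) * (n - 2))   # (A2.5): N * sum of cubes / [(N-1)(N-2)]
    g_hat = m3_hat / m2_hat ** 1.5              # (A2.6)
    g_star = g_hat * (1.0 + 8.5 / n)            # (A2.11)
    return g_n, g_hat, g_star


def kirby_skew_bound(n):
    """Algebraic bound on |g_N| for a sample of size n, eq. (A2.10)."""
    return (n - 2) / math.sqrt(n - 1)


results = []
for exponent in range(1, 7):
    big = 10.0 ** exponent
    sample = [1.0] * 9 + [big]
    g_n, g_hat, g_star = skewness_estimates(sample)
    # store the large value, g_N, and the multiplier G*/g_N
    results.append((big, g_n, g_star / g_n))

for big, g_n, ratio in results:
    print(f"10th value {big:>9.0f}: g_N = {g_n:.4f}, "
          f"bound = {kirby_skew_bound(10):.4f}, G*/g_N = {ratio:.3f}")
```

The multiplier G*/g_N depends only on N, since it equals [N(N - 1)]^{1/2}/(N - 2) times (1 + 8.5/N); no sample, however skewed its parent, can therefore yield a corrected skewness beyond that multiple of the Kirby bound.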
Yevjevich and Obeysekera (1984) also consider methods of skewness estimation, especially from gamma and lognormally distributed samples. They point out that estimates of population skewness obtained by distributional A2.3 methods of parameter estimation (e.g. ML with form of population known) show marked improvement in bias and efficiency over distribution-free methods (e.g. use of sample moments with distribution-free correction factors). Distribution Population Skewness a(sN) a(gN) N=IO N=50 N=90 Normal 0.00 1.084 1.016 1.009 Gumbel 1.14 1.108 1.021 1.012 2.172 1.217 1.123 Lognormal 0.25 1.14 5.00 15.00 1.085 1.104 1;276 1;581 1;016 1.020 1;083 1.226 1;009 1.011 1;054 1.164 1;903 2.161 4.234 10.239 1.141 1;221 2.064 4.654 1;076 1.139 1;757 3.788 Pareto 3.00 5.00 15.00 1.191 1;265 1.392 1.047 1.079 1.147 1;027 1.050 1.102 2.744 4.202 11.784 1;484 2.118 5.613 1.316 1;813 4.659 0.25 1.14 5.00 1.084 1.104 1.390 1.015 1.020 1.094 1.008 1.011 1.055 1.868 1;972 2.735 1.129 1.174 1;499 1.066 1.096 1.323 0.25 1.14 5.00 15.00 1;080 1;014 1;018 1.086 1;304 1.008 1;010 1.053 1;209 1;863 1;819 3.325 7.990 1.125 1.141 1;722 3.706 1;068 1;079 1;490 3.008 2.194 1;207 1.113 Pearson 3 Weibull l.1oo 1;327 1;861 N=1O Not Quoted a*(G) . Table A2.1 N=50 N=90 Cs Selection of Bias Factor Values for Sample Standard Deviation and Skewness. a = (population parameter value){(Mean of sample statistic value over 100 000 random samples), (After Wallis et aI. 1974). APPENDIX 3 MOMENT RATIO DIAGRAMS. Conventional moment ratio diagrams A distribution function expressed in parametric form has a small number of parameters (I, 2, 3 or 4 usually). AIl moments are expressible as functions of these and moments of order higher than the number of parameters are necessarily expressible as functions of lower order moments. A moment ratio diagram is a graph of one such moment as a unique function of a lower order one. 
Populations corresponding to different applications can be represented on such a diagram by use of dimensionless moments such as Cv, Cs and Ck. Figure 3.2 in Chapter 3 is an example of Ck versus Cs relations for lognormal, Weibull, GEV and Pearson Type 3 distributions. EV1 and exponential distributions plot as points on this diagram, while the (Cs, Ck) points for Wakeby distributions plot neither as a single point nor as a curve but cover an area of the diagram. This Wakeby area encloses most of the curves shown, except that Weibull and Gamma distributions having Cs greater than approximately 2, and LN distributions having Cs greater than approximately 3.5, fall below and outside the valid Wakeby area.

Plotting sample values of (Cs, Ck), or sample averages of (Cs, Ck) from a region-full of data, on such diagrams ought to help diagnosis of the correct distributional form for floods. However, the bias corrections required to be applied to such values prior to plotting depend on sample size, parent skewness and parent distributional form (Appendix 2). If parent skewness < 2 then eqn (A2.11) might reasonably be used to remove bias in Cs, but no such simple adjustment is readily known for Ck. If parent skewness > 2 but the parent distributional form is unknown, the choice of bias correction to be made to Cs and Ck is even less certain, since the bias corrections for a given sample size are very dependent on distributional form when parent skewness > 2 (Wallis et al., 1974). This is true especially of the realistic flood-like distributions such as WAK, GEV and TCEV.

Apart from bias, the standard error of such (Cs, Ck) values, or even of regional mean values of (Cs, Ck), is very large. For instance, if M = 25 stations and N = 30 years, then se(Cs) ≈ 0.15 before bias correction (≈ 0.30 after bias correction) when the parent is GEV with skewness = 2.5. This is of the same order of magnitude as the horizontal width (i.e. in the Cs direction) of the band enclosing 4 distributions on Figure 3.2.
When se(Ck) is taken into account simultaneously it must be appreciated that it is virtually impossible to select a distribution unambiguously using such a moment ratio diagram.

Commons (1986) has investigated some sampling properties of skewness - kurtosis diagrams by Monte Carlo methods using a GEV parent distribution. GEV was chosen because of its flood-like properties and convenience of use. The location and scale parameters were arbitrarily set to 0 and 1 respectively, while k values corresponding to the range of skewness given in Table A3.1 were used. Because of the known bias in small sample estimates of moments, it is anticipated that a curve joining plotted points representing the expected value of skewness Cs and the expected value of kurtosis Ck in small samples should plot below the curve representing the population (Cs, Ck) relation. Such a curve would depend on sample size N and one would expect a family of such curves to exist for different N values.

A selection of (Ck, Cs) relations obtained by Commons (1986) for N = 10(20)90, i.e. N = 10, 30, 50, 70 and 90, is shown on Fig A3.1a, while Fig A3.1b shows the same information in the form of path lines taken by (Cs, Ck) points from a common parent, fixed k, as a function of sample size N. These curves are obtained from M = 100 000 samples of each size. Figs A3.1c and d show corresponding plots obtained from M = 100 samples. These are much less well defined, indicating that (Cs, Ck) values obtained from 100 samples or fewer would be poor indicators of the true GEV parent even if the "correct" biased plots of Figures A3.1a and b were available. Since hydrological regions rarely have more than 100 gauging sites in them, the hydrologist seeking to verify the suitability of a certain distribution has to contend with the uncertainty caused by erratic (random) effects in Figures A3.1c and d. It goes without saying that such plots would be even less useful for discriminating between distribution types on a diagram such as Figure 3.2, Chapter 3.
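The kind of experiment described above can be sketched in a few lines. This is an illustration under stated assumptions, not Commons' procedure: it uses GEV(0, 1, k) with k = -0.109 (the value that Table A3.1 pairs with a population skewness of 2.00), divisor-N sample moments, far fewer replications than M = 100 000, and hypothetical function names.

```python
import math
import random

def gev_quantile(f, k, u=0.0, alpha=1.0):
    """Inverse GEV for k != 0: x = u + alpha * (1 - (-ln F)**k) / k."""
    return u + alpha * (1.0 - (-math.log(f)) ** k) / k

def sample_skew_kurt(values):
    """Divisor-N sample skewness and kurtosis (assumed definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def mean_skew_kurt(n, k, n_samples, rng):
    """Average sample skewness and kurtosis over n_samples GEV samples of size n."""
    total_skew = total_kurt = 0.0
    for _ in range(n_samples):
        sample = []
        while len(sample) < n:
            f = rng.random()
            if f > 0.0:  # F = 0 has no finite logarithm, so it is skipped
                sample.append(gev_quantile(f, k))
        skew, kurt = sample_skew_kurt(sample)
        total_skew += skew
        total_kurt += kurt
    return total_skew / n_samples, total_kurt / n_samples

rng = random.Random(1974)  # fixed seed so that the run is repeatable
for n in (10, 30, 90):
    skew, kurt = mean_skew_kurt(n, -0.109, 1000, rng)
    print(f"N = {n:2d}: mean sample skewness = {skew:.3f}, mean sample kurtosis = {kurt:.3f}")
```

With settings like these the averages should lie below the population values and rise towards them as N increases, which is the behaviour that the family of curves for different N is meant to display.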
[Figure A3.1, panels (a) to (d): kurtosis plotted against skewness. Panels (a) and (b) are based on M = 100 000 simulations, panels (c) and (d) on M = 100.]

Figure A3.1: Plots of (Cs, Ck) obtained by simulation from GEV parent population, as a function of sample size, (a) and (c), and of parent shape parameter k, (b) and (d). M is the number of simulations on which the diagrams are based.

L-moment ratio diagrams

The L-moment diagram is a new, superior diagnostic tool based on the relatively recently introduced probability weighted moments, PWMs, defined in Appendix 4. Hosking (1986a) has defined L-moments, which are linear combinations of PWMs, as follows:

$\lambda_1 = M_{100}$ (A3.1)

$\lambda_2 = 2M_{110} - M_{100}$ (A3.2)

$\lambda_3 = 6M_{120} - 6M_{110} + M_{100}$ (A3.3)

$\lambda_4 = 20M_{130} - 30M_{120} + 12M_{110} - M_{100}$ (A3.4)

      k        μ        σ        Cs       Ck
    0.278    6.849    1.001    0.00     2.717
    0.201    9.532    1.051    0.25     2.876
    0.133   14.528    1.109    0.50     3.258
    0.084   23.451    1.164    0.707    3.764
    0.025   80.451    1.243    1.00     4.734
    0.000    0.5772   1.283    1.14     5.390
   -0.041    0.620    1.358    1.414    6.923
   -0.109    0.697    1.515    2.00    11.920
   -0.177    0.787    1.734    3.00    23.397
   -0.216    0.845    1.899    4.00    76.200
   -0.240    0.884    2.022    5.00   297.442
   -0.288    0.970    2.336   10.00      --

Table A3.1: Population mean, standard deviation, skewness and kurtosis values of GEV(0, 1, k) populations used by Commons (1986). Skewness values fixed as in Wallis et al. (1974).

$\lambda_2$ is a measure of distribution scale or spread, and L-Cv = $\lambda_2 / \lambda_1$ is analogous to ordinary Cv. The dimensionless L-moments

$\tau_3 = \lambda_3 / \lambda_2$ (A3.5)

$\tau_4 = \lambda_4 / \lambda_2$ (A3.6)

can be considered as L-skewness and L-kurtosis respectively, each lying in the range (-1, +1). These quantities can also be defined in terms of quantities $M_{10k}$ in view of the relations in equations A4.2 and A4.3 of Appendix 4.
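A short sketch may help to fix the L-moment definitions (A3.1) to (A3.6). It assumes that sample values of $M_{1j0}$ are obtained with the unbiased estimator given as (A4.4) in Appendix 4; the function names are illustrative only.

```python
from math import comb

def unbiased_pwm(sample, j):
    """Unbiased sample estimate of M_1j0 (estimator of form A4.4, assumed here)."""
    x = sorted(sample)  # x[0] is the smallest value, i.e. rank i = 1
    n = len(x)
    # enumerate() counts from 0, which is (i - 1) for the 1-based rank i
    return sum(comb(i, j) * xi for i, xi in enumerate(x)) / (n * comb(n - 1, j))

def l_moment_ratios(m100, m110, m120, m130):
    """Return (lambda1, lambda2, L-skewness, L-kurtosis) from M_1j0, j = 0..3."""
    lam1 = m100
    lam2 = 2.0 * m110 - m100
    lam3 = 6.0 * m120 - 6.0 * m110 + m100
    lam4 = 20.0 * m130 - 30.0 * m120 + 12.0 * m110 - m100
    return lam1, lam2, lam3 / lam2, lam4 / lam2

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # small symmetric sample, so L-skewness is zero
pwms = [unbiased_pwm(sample, j) for j in range(4)]
print(l_moment_ratios(*pwms))
```

Because the L-moments are linear in the ordered sample values, a sample that is symmetric about its mean gives zero L-skewness exactly, which makes a convenient check on an implementation.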
Hosking (1988, pp 3 - 5) advocates the use of t3 - t4 plots (t3 and t4 being sample estimates of $\tau_3$ and $\tau_4$ respectively) for the purpose of identifying the underlying distribution from which samples were drawn. He demonstrates that (Cs, Ck) values from several samples drawn from distinct distributions show very considerable overlap when plotted on a (Cs, Ck) moment ratio diagram, see Figure A3.2. On the other hand the corresponding (t3, t4) values show considerably less overlap and hence could be expected to help identify underlying parent distributions with greater reliability than is possible with the conventional moment ratio diagram. The t3 - t4 diagram also has the advantage of being based on unbiased sample quantities, in contrast to Cs - Ck quantities which have to be corrected for bias, a requirement which has its own difficulties as explained above and in Appendix 2.

Hosking (1988) demonstrated this effect with samples from a GEV and two Weibull distributions. The same effect can be seen in Figure A3.2 for samples from GEV and Pearson Type 3 populations, each having the same value of Cv and Cs. It is clear that in this case the L-moments show better ability to discriminate between the distributions. This ability depends on parent skewness, however: it increases as skewness increases and decreases as skewness decreases.

[Figure A3.2, two panels: sample kurtosis against sample skewness, and sample L-kurtosis against sample L-skewness, with separate symbols for GEV and P3 samples.]

Figure A3.2: Plots of sample skewness versus sample kurtosis, and of sample L-skewness versus sample L-kurtosis, for 50 samples of size 100 simulated from GEV and P3 parents each having μ = 100, Cv = 0.5 and Cs = 1.90. The L-moments show better ability to discriminate between the two.

APPENDIX 4

PARAMETER ESTIMATION BY PROBABILITY WEIGHTED MOMENTS (PWM)
A4.1 Introduction

Probability weighted moments were introduced by Greenwood et al. (1979) and further analysed by Hosking (1986). They are useful in deriving expressions for the parameters of distributions whose inverse forms x = x(F) can be explicitly defined. In particular they allow parameter estimates to be obtained for distributions which are defined only in inverse form, such as the Wakeby distribution which was introduced as a general flood frequency distribution model by Houghton (1978). Note that EV1 and GEV distributions can be written in both forms F = F(x) and x = x(F) and hence their parameters can also be estimated by PWM.

PWMs are defined (Greenwood et al., 1979) by

$M_{ijk} = E[x^i F^j (1-F)^k] = \int_0^1 [x(F)]^i F^j (1-F)^k \, dF$ (A4.1)

where i, j, k are real numbers. If j = k = 0 and i is a non-negative integer then $M_{i00}$ represents the conventional moment of order i about the origin. The special cases of i = 1 and either k = 0 or j = 0, namely $M_{1j0}$ and $M_{10k}$, are linear in x and are of sufficient generality for parameter estimation. $M_{1j0}$ and $M_{10k}$ are interdependent and related as follows (Greenwood et al., 1979, Eqn 4):

$M_{10k} = \sum_{j=0}^{k} \binom{k}{j} (-1)^j M_{1j0}$ (A4.2)

$M_{1j0} = \sum_{k=0}^{j} \binom{j}{k} (-1)^k M_{10k}$ (A4.3)

The form $M_{10k}$ is used by Greenwood et al. (1979) for parameter estimation in the Weibull, Generalised Lambda, Logistic and Wakeby distributions. The other form, $M_{1j0}$, is used by them for EV1 and Kappa distributions and is also adopted by Hosking et al. (1985a,b) in the GEV case. Table A4.1 gives formulae for $M_{1j0}$ for the EV1 and GEV cases and for $M_{10k}$ for the Wakeby case.

A4.2 Principles of parameter estimation by PWM

These may be summarised as:

(i) For the chosen distribution find expressions for either $M_{1j0}$ or $M_{10k}$, as convenient, in terms of the unknown parameters (Table A4.1).

(ii) Calculate sample estimates $\hat{M}_{1j0}$ (or $\hat{M}_{10k}$) of the PWMs used in (i) (see Section A4.3 below).

(iii) Equate the sample estimates of (ii) to the expressions in (i) for as many values of j (or k) as there are unknown parameters.
(iv) Solve the algebraic equations in (iii) for the unknown parameters (see Section A4.4 below).

These are precisely the same type of steps as are involved in estimation by ordinary moments.

Table A4.1: Inverse form and PWM formulae for 3 common flood frequency distributions. Both Wakeby forms refer to the same distribution. The second form allows for easier checking on the validity of estimated parameters. In all cases F = Pr(X ≤ x).

EV1
  Inverse form:  $x = u + \alpha[-\ln(-\ln F)]$
  PWM formula:   $M_{1j0} = \frac{u}{1+j} + \frac{\alpha[\ln(1+j) + \varepsilon]}{1+j}$, where $\varepsilon$ = Euler's constant = 0.57721...   (*)

GEV
  Inverse form:  $x = u + \frac{\alpha}{k}[1 - (-\ln F)^k]$
  PWM formula:   $M_{1j0} = \frac{1}{1+j}\left\{u + \alpha[1 - (j+1)^{-k}\,\Gamma(1+k)]/k\right\}$   (**)

Wakeby (original form)
  Inverse form:  $x = e - a(1-F)^b + c(1-F)^{-d}$   (+)
            or   $x = m + a[1-(1-F)^b] - c[1-(1-F)^{-d}]$   (++)
  PWM formula:   $M_{10k} = \frac{m + a - c}{1+k} - \frac{a}{1+k+b} + \frac{c}{1+k-d}$   (*)

Wakeby (reparameterised form)
  Inverse form:  $x = \xi + \frac{\alpha}{\beta}[1-(1-F)^{\beta}] - \frac{\gamma}{\delta}[1-(1-F)^{-\delta}]$   (+++)
  PWM formula:   $M_{10k} = \frac{1}{1+k}\left[\xi + \frac{\alpha}{1+k+\beta} + \frac{\gamma}{1+k-\delta}\right]$   (***)

(* From Greenwood et al., 1979, Table 2)
(** From Hosking et al., 1985b, eqn 9)
(*** From Hosking 1986b, eqns 2, 4 and 5)
(+ Houghton's (1978a) form)
(++ Greenwood et al. (1979) form)
(+++ Hosking's (1986b) reparameterised form; $\xi = m$, $\beta = b$, $\alpha = ab$, $\gamma = cd$, $\delta = d$)

A4.3 Sample Estimates of PWMs

Unbiased sample estimates of $M_{1j0}$ and $M_{10k}$ can be expressed (Hosking, 1986b, p17; Landwehr et al., 1979, eqn. 16) as:

$\hat{M}_{1j0} = \frac{1}{N}\sum_{i=1}^{N}\left[\binom{i-1}{j}\Big/\binom{N-1}{j}\right] x_{(i)}$,  j = 0, 1, ..., N-1 (A4.4)

$\hat{M}_{10k} = \frac{1}{N}\sum_{i=1}^{N}\left[\binom{N-i}{k}\Big/\binom{N-1}{k}\right] x_{(i)}$,  k = 0, 1, ..., N-1 (A4.5)

where $x_{(i)}$, i = 1, 2, ..., N are the ordered N sample values, with i = 1 denoting the smallest value, and the quantities in square brackets represent $F^j$ and $(1-F)^k$ type quantities or weights respectively. An alternative "plotting position" type weighting function

$F_{(i)} = \frac{i - 0.35}{N}$ (A4.6)

leading to

$\hat{M}_{1j0} = N^{-1}\sum_{i=1}^{N} F_{(i)}^{\,j}\, x_{(i)}$ (A4.7)

$\hat{M}_{10k} = N^{-1}\sum_{i=1}^{N} [1-F_{(i)}]^{k}\, x_{(i)}$ (A4.8)

has been recommended by Landwehr et al. (1979b,c), Wallis (1980) and Hosking et al. (1985a,b). This latter form is also recommended here for hydrological purposes.

Note that $\hat{M}_{1j0}$ for j = 0 and $\hat{M}_{10k}$ for k = 0 are identical to the sample mean. Also, the relations (A4.2) and (A4.3) hold for $\hat{M}_{1j0}$ and $\hat{M}_{10k}$ calculated from (A4.4) and (A4.5), and also separately for $\hat{M}_{1j0}$ and $\hat{M}_{10k}$ calculated from (A4.7) and (A4.8).

A4.3.1 Example calculations of PWMs

Annual maximum peak floods for River Trent at Nottingham, England for 1884 - 1933 (N = 50 years) are used for illustration. These are taken from NERC (1975, Vol. IV, pp 266-288). The sample values are given in Table A4.2, while unbiased PWM estimates obtained by equations (A4.4) and (A4.5) and biased PWM estimates obtained by equations (A4.7) and (A4.8) are also shown in that table. The two sets of PWM estimates differ by only very small amounts but they lead to different parameter values and quantile estimates in most estimating schemes. The biased PWM estimates are favoured for purposes of quantile estimation in hydrology, as indicated earlier.

Calculation of $\hat{M}_{1j0}$ values by equation (A4.7), the biased but favoured method, is illustrated in outline form in Table A4.3. Calculation of $\hat{M}_{10k}$ values proceeds in an analogous manner with $F_{(i)}^{\,j}$ values replaced by $[1 - F_{(i)}]^k$. Alternatively, $\hat{M}_{10k}$ could be obtained from $\hat{M}_{1j0}$ values using equation (A4.2), thus:

$\hat{M}_{101} = \hat{M}_{100} - \hat{M}_{110}$ (A4.8a)

$\hat{M}_{102} = \hat{M}_{100} - 2\hat{M}_{110} + \hat{M}_{120}$ (A4.8b)

$\hat{M}_{103} = \hat{M}_{100} - 3\hat{M}_{110} + 3\hat{M}_{120} - \hat{M}_{130}$ (A4.8c)

$\hat{M}_{104} = \hat{M}_{100} - 4\hat{M}_{110} + 6\hat{M}_{120} - 4\hat{M}_{130} + \hat{M}_{140}$ (A4.8d)

Sample data in chronological order.

  950.90  195.99  368.46  688.81  658.34  270.53  787.46  118.88  565.16  399.00
  186.34  276.32  493.12  474.52  599.95  866.78  801.53  410.09  393.52  336.79
  822.93  324.94  221.49  643.43  410.09  526.68  618.38  720.14  446.86  512.16
  482.22  966.72  908.31  393.52  505.76  537.65  594.59  681.11  505.76  945.48
  370.79  337.73  360.41  720.14  704.37  221.20  526.68  200.19  480.67  206.16

Ranked sample data.

  118.88  186.34  195.99  200.19  206.16  221.20  221.49  270.53  276.32  324.94
  336.79  337.73  360.41  368.46  370.79  393.52  393.52  399.00  410.09  410.09
  446.86  474.52  480.67  482.22  493.12  505.76  505.76  512.16  526.68  526.68
  537.65  565.16  594.59  599.95  618.38  643.43  658.34  681.11  688.81  704.37
  720.14  720.14  787.46  801.53  822.93  866.78  908.31  945.48  950.90  966.72

Estimates of probability weighted moments

                      j=k=0      j=k=1      j=k=2      j=k=3      j=k=4
  (a) Unbiased
      M1j0          514.7810   321.6082   237.5128   189.5726   158.3115
      M10k          514.7810   193.1728   109.0774    72.9222    53.4462
  (b) Biased
      M1j0          514.7810   321.8682   238.0372   190.3035   159.2062
      M10k          514.7810   192.9128   109.0817    72.9844    53.5233

Table A4.2: Example data of River Trent at Trent Bridge and sample PWM values (1884 - 1933, N = 50).

The PWM values are calculated to four decimal places to avoid possible inconsistencies when they are used for parameter estimation purposes. In Wakeby parameter estimation some linear combinations of PWMs occur whose coefficients are large and which alternate in sign. Insufficient accuracy due to deliberate rounding of PWMs leads to quite substantial changes in the results, or leads to invalid Wakeby parameters (see Table A4.4 and Section A4.4.3 below). Rounding errors also affect GEV parameter estimates from PWMs. Hence it is recommended that PWM values are not rounded off during hand-calculations.

A4.4 Parameter estimating equations or algorithms

For information on a computer program for Wakeby parameter estimation see Section A4.5 of this Appendix.
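The two kinds of sample PWM estimate can be sketched as follows. The data are the ranked River Trent values of Table A4.2; the function names and the `form` argument are illustrative choices, not part of any published program.

```python
from math import comb

# Ranked River Trent annual maxima, 1884 - 1933 (Table A4.2), in m3/s
TRENT_RANKED = [
    118.88, 186.34, 195.99, 200.19, 206.16, 221.20, 221.49, 270.53, 276.32, 324.94,
    336.79, 337.73, 360.41, 368.46, 370.79, 393.52, 393.52, 399.00, 410.09, 410.09,
    446.86, 474.52, 480.67, 482.22, 493.12, 505.76, 505.76, 512.16, 526.68, 526.68,
    537.65, 565.16, 594.59, 599.95, 618.38, 643.43, 658.34, 681.11, 688.81, 704.37,
    720.14, 720.14, 787.46, 801.53, 822.93, 866.78, 908.31, 945.48, 950.90, 966.72,
]

def pwm_unbiased(sample, order, form="1j0"):
    """Unbiased estimate of M_1j0 (form='1j0') or M_10k (form='10k'), eqns (A4.4)/(A4.5)."""
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i, xi in enumerate(x, start=1):  # i is the rank, 1 = smallest
        weight = comb(i - 1, order) if form == "1j0" else comb(n - i, order)
        total += weight * xi
    return total / (n * comb(n - 1, order))

def pwm_plotting_position(sample, order, form="1j0"):
    """Biased estimate using F(i) = (i - 0.35)/N, eqns (A4.6) to (A4.8)."""
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i, xi in enumerate(x, start=1):
        f = (i - 0.35) / n
        total += (f if form == "1j0" else 1.0 - f) ** order * xi
    return total / n

for order in range(5):
    print(order,
          round(pwm_unbiased(TRENT_RANKED, order), 4),
          round(pwm_unbiased(TRENT_RANKED, order, "10k"), 4),
          round(pwm_plotting_position(TRENT_RANKED, order), 4),
          round(pwm_plotting_position(TRENT_RANKED, order, "10k"), 4))
```

Order 0 returns the sample mean in every case, and for order 1 the two forms add up to the mean, as relation (A4.8a) requires.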
  Rank i   F(i) =        Flow      x(i)F(i)     x(i)F(i)^2   x(i)F(i)^3   x(i)F(i)^4
           (i-0.35)/50   x(i)
     1      0.0130      118.88       1.5454       0.0201     2.612E-4     3.395E-6
     2      0.0330      186.34       6.1492       0.2029     6.697E-3     2.210E-4
     3      0.0530      195.99      10.3875       0.5505     0.0292       1.546E-3
     4      0.0730      200.19      14.6139       1.0668     0.0779       5.685E-3
     5      0.0930      206.16      19.1729       1.7831     0.1658       1.542E-2
    ...
    46      0.9130      866.78     791.3701     722.5209   659.6616     602.2711
    47      0.9330      908.31     847.4532     790.6739   737.6987     688.2729
    48      0.9530      945.48     901.0424     858.6934   818.3348     779.8731
    49      0.9730      950.90     925.2257     900.2246   875.9380     852.2877
    50      0.9930      966.72     959.9530     953.2333   946.5607     939.9347

  Sum                 25739.05   16093.410    11901.860    9515.175     7960.310
  Mean = Sum/50         514.78     321.8682     238.0372    190.3035     159.2062
                       = M100      = M110       = M120      = M130       = M140

Table A4.3: Outline illustration of calculation of $\hat{M}_{1j0}$ values by Eqn. (A4.7).

A4.4.1 EV1 CASE

The equations of Table A4.1 are simple to handle and yield

$\hat{\alpha} = (\hat{M}_{100} - 2\hat{M}_{101}) / \ln 2$ (A4.9)

or

$\hat{\alpha} = (2\hat{M}_{110} - \hat{M}_{100}) / \ln 2$ (A4.10)

and

$\hat{u} = \hat{M}_{100} - \varepsilon\,\hat{\alpha}$ (A4.11)

Inserting the sample PWM values from part (b) of Table A4.2 gives

$\hat{\alpha}$ = [514.7810 - 2(192.9128)] / 0.69315 = 186.04

$\hat{u}$ = 514.7810 - (0.5772)(186.04) = 407.40.

The estimated 50-year return period flood is

$\hat{x}_{50} = \hat{u} + \hat{\alpha}\,(-\ln(-\ln(1 - 1/50)))$

A4.4.2 GEV CASE

The equations involved are not immediately soluble but Hosking et al. (1985a,b) give a simple, accurate algorithm:

$\hat{k} = 7.8590\,c + 2.9554\,c^2$ (A4.12)

where

$c = \frac{2\hat{M}_{110} - \hat{M}_{100}}{3\hat{M}_{120} - \hat{M}_{100}} - \frac{\log 2}{\log 3}$ (A4.12a)

$\hat{\alpha} = \frac{[2\hat{M}_{110} - \hat{M}_{100}]\,\hat{k}}{\Gamma(1+\hat{k})\,[1 - 2^{-\hat{k}}]}$ (A4.13)

$\hat{u} = \hat{M}_{100} + \hat{\alpha}\,[\Gamma(1+\hat{k}) - 1]/\hat{k}$ (A4.14)

Inserting the sample PWM values from part (b) of Table A4.2 gives

c = [2(321.8682) - 514.7810] / [3(238.0372) - 514.7810] - log 2 / log 3 = 0.6469423 - 0.630929 = 0.016005

$\hat{k}$ = 0.125
$\Gamma(1+\hat{k})$ = 0.941743
$\hat{\alpha}$ = 206.23
$\hat{u}$ = 418.67

The estimated 50 year return period flood corresponds to F = 0.98 and

$\hat{x}_{50} = \hat{u} + (\hat{\alpha}/\hat{k})\{1 - (-\ln 0.98)^{\hat{k}}\}$ = 1055.49 m³/s.
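The EV1 and GEV estimating equations (A4.10) to (A4.14) can be sketched directly. The function names are illustrative. No intermediate value is rounded here, so the GEV results come out close to, but not identical with, the hand-calculated figures quoted above, which use k = 0.125.

```python
import math

EULER = 0.5772156649  # Euler's constant

def ev1_from_pwm(m100, m110):
    """EV1 parameters (u, alpha) from eqns (A4.10) and (A4.11)."""
    alpha = (2.0 * m110 - m100) / math.log(2.0)
    return m100 - EULER * alpha, alpha

def ev1_quantile(u, alpha, return_period):
    """EV1 flood of the given return period, x = u + alpha * (-ln(-ln F))."""
    f = 1.0 - 1.0 / return_period
    return u - alpha * math.log(-math.log(f))

def gev_from_pwm(m100, m110, m120):
    """GEV parameters (u, alpha, k) from eqns (A4.12) to (A4.14)."""
    c = (2.0 * m110 - m100) / (3.0 * m120 - m100) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    gam = math.gamma(1.0 + k)
    alpha = (2.0 * m110 - m100) * k / (gam * (1.0 - 2.0 ** (-k)))
    u = m100 + alpha * (gam - 1.0) / k
    return u, alpha, k

def gev_quantile(u, alpha, k, return_period):
    """GEV flood of the given return period, x = u + (alpha/k) * (1 - (-ln F)**k)."""
    f = 1.0 - 1.0 / return_period
    return u + (alpha / k) * (1.0 - (-math.log(f)) ** k)

# Biased PWM values from part (b) of Table A4.2
M100, M110, M120 = 514.7810, 321.8682, 238.0372

u1, a1 = ev1_from_pwm(M100, M110)
print("EV1:", round(u1, 2), round(a1, 2), round(ev1_quantile(u1, a1, 50), 1))

u, a, k = gev_from_pwm(M100, M110, M120)
print("GEV:", round(u, 2), round(a, 2), round(k, 4), round(gev_quantile(u, a, k, 50), 1))
```

The formula for k is not valid at k = 0 exactly, where the GEV reduces to EV1; a production routine would need to branch on that case, which this sketch does not do.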
In the calculation of c, $\hat{k}$ and $\Gamma(1+\hat{k})$ considerable care should be taken to avoid rounding errors in order to obtain consistent results. The two components of c are of the same order of magnitude, so the less significant digits determine its value.

A4.4.2.1 TEST OF HYPOTHESIS THAT k = 0

A statistical test of the hypothesis that an observed random sample has come from an EV1 distribution when EV2 is the alternative has been given by van Montfort (1970), while Hosking (1984) has compared a number of methods of testing whether the shape parameter, k, is zero in the GEV distribution (i.e. the EV1 hypothesis with GEV as alternative). The most powerful is a likelihood ratio test, while a simple test based on the probability weighted moment (PWM) estimate of k is almost as powerful. This latter test is also given by Hosking et al. (1985b). In it the PWM estimate of k is taken to be distributed as N[0, 0.5635/N], so that the test consists of comparing the statistic

$Z = \hat{k}\,(N/0.5635)^{1/2}$

with the critical values of the standardised Normal variate.

In this case N = 50, $\hat{k}$ = 0.125 and Z = 0.125 (50/0.5635)^{1/2} = 1.177, which is not significantly large at the 5% significance level. Hence the hypothesis that k = 0 is not rejected.

A $\hat{k}$ value of 0.125 corresponds to a GEV population skewness value of approximately 0.5, which is in disagreement with the general observation that flood populations are more positively skewed. However the disagreement is only an apparent one, in view of the fact that $\hat{k}$ is not significantly different from zero (EV1 skewness = 1.14) or even from -0.05 (GEV skewness = 1.53). The above calculations with a different 50 year River Trent data set (1917 - 1969, years 1955-57 missing) give $\hat{k}$ = -0.01.

A4.4.3 WAKEBY CASE

Explicit solutions for the parameters in terms of PWMs, $M_{10k}$, have been given by Greenwood et al. (1979) for both the 4-parameter (m = 0) and 5-parameter (m ≠ 0) Wakeby distributions.
These are given here first while Hosking's (1986) solution for the re-parameterised fonn is given later, in Section A4.4.3.2. A4.4.3.1 Greenwood ~ Let M(k) = (1979) estim£ltes. M 'Dk for k = 0,1,2,3,4. (i) Case m - 0 (4 parameter Wakebyl 1. Calculate N l' N z' N, as " M(D) (A4.15) Nz = " + 8 M(l) " - M(D) " -9 M(2) (A4.16) = " - M(O) " -3M(2)+ 4 M(l) (M.17) N, N, 2. 3. = - 27 " M(Z) " + 16 M(l) " Calculate C 1• C 2 and C, as " C, = 64 ~') + 54 " 8M(I) (A4.18) Cz = " + 18 M(Z) " - 4M(I) " 16 M(,) (M.19) C, = " 4 M(,) + 6 M(2) - 2M(I) A " M(Z) " (M.W) Calculate b" = (N,C , - N , C,) ± 2(N2 C, - N,Cz) H (A4.21) where Ifb, llnd b2 are the two values ofb in (A4.21) let (A4.2Ia) 4. " Calculated "d 5. = (N, + b" Nz) I (Nz + b" N,). (A4.22) Next calculate " - d) " M(D) { O} = (I + b)(1 (A4.23) A4.8 A A (A4.24) (I J =2(2+ b)(2-d)M(1)' A A Then estimate a and c A A 6. a = A ~ (~+ .LQ.l.) + ~ l..llA (b+ l)(b+ 2) ( 8) 2 + b (M.25) 1 and A A 7. C = A I (l - d)(2 - d) ( - ( 1 1\ 1\ 1\/1. d (b + d) 2 + d + \O_L) (A4.26) Applying these formulae to the example data of section A4.3.1 gives N, = - 373.3834 Nz = 46.7857 N3 = - 70.3751 C, = - 161.9436 Cz = 12.0352 C3 = - 11.6363 b, = 22.86140 b. = = 22.8614 1\ b A {OJ= 17757.94898 A .a A 1\/\ 1\ d = - 0.4457 (1J = 23459.43566 = -952.4527. A 230.8253 = 0.44569 c 1\/\ Note that b> - I, d < 1 and (a b + cd) > 0 as required for valid solutions, see "Checks on parameter validity", Section A4.4.3.2, below. Then 1\./\ A "T = x(F = I-1m = A 1\ a(I-(I-F)b) -c(I-(I-F)-d) The 50 year return period flood corresponds to F = 0.98, A xso- = x(0.98) = 230.8253 [1 - (0.02)22.8614] - (-952.5427).[1- (0.02y(-0.4457)j = 230.8253 + (952.4527).[0.825108] = 1016.702 m 3/s, say 1020 m3/s. Ii!} Case m '" 0 I 5 parameter Wakeby). I. Calculate N[, N z and N3 as (A4.27) A4.9 A 1\ N z = + 16 M(,) 1\ /I. 27 M(z) + 12 M(l) - - M(O) (A4.28) (A4.29) 2. 3. Calculate Cl' C z and C, as Cl = Cz = C, = 1\ 1\ 1\ 1\ O) (A4.30) 48 M(,) + 27 M(Z) - 4 M(l) (A4.31) 125 M(.) 
- 192 M(,) + 81 M(Z) - 8 25 A A M(.) • A A A A 5 M(.)- 12 M(,) + 9 M(2) - M A A 2 M(z) (A4.32) A Calculate b ( N, C l A b = - N l C,) ± H 2(Nz C, - N, C z) (M.33) where H = A This gives two values of b, say bl and bz. Let (M.34) 4. A Calculate d A A d = N l + bNz Nz 5. (M.35) + bN, Calculate A A A (0 ] = ( 1] = 2(2 + (2 ] = 3(3 + b)(3 - d .M(Z) (3 ] = 4(4 + b) (4 - d .M(,) A A (1 + b) (1 - d).~o) (M.36) ~ (2 - 3)'~(I) (A4.37) A 3) A A 3) A (A4.38) (M.39) A Then estimate m, a and c 6. A m = [[3]-[2]-(1]+[OlJ/4 (MAO) A4.1O A A 7. a 8. C = A (b + n(b + 2) ( -l!L I\f\A " b (b + d) 2 + b (M.4I) 3)(2-d) ( - [I) (A4.42) (1- A = 1\" 1\ d (b + d) " 2 + d Applying these formulae to the example data of Section (A4.31) gives Nt C, b, = = = 49.4962 30.2650 6.69148 C2 = 6.69148 d A b A = 104.4811 = 168.8859 A a A c b2 = = = 22.7152 8.3893 0.35047 = 0.35047 A N, C, = = 47.1025 12.2857 [ I } = 7882.0622 [ 3 ) = 13578.8845 {O } = 5347.0944 (2 } = 10625.9922 m N2 = - 1014.8514 and the estimated 50 year flood is = M.4.3.2 x(O.98) = 104.4811 + 168.8859 [I - (0.02)6.69150)_ (-1014.8514) [1- (0.02) -(-0.35047)) = 104.4811 + 168.8859 + 757.2706 =1030.6376 say 1030 m3/s. Checks on parameter validity~ Not all arbitrary combinations of values of a, b, c, d yield an x-F relation corresponding to a valid distriQution i.e. one whose F value lies in the range 0 to I and one for Which the PWMs exist. Valid and invalid parameter combinations were given by Landwehr l1.llL(1978, with corrections, 1979) and were reproduced by Wallis (1980). These are given here in Table A4.4. In addition to these conditions it is necessary that b>-I for the Wakeby mean (= and d<1 (A4.43) A M(O)) to exist (Landwehr tl.ll!., 1978, p914). If these conditions are not also satisfied by band A dthen one of the assumptions on which the algorithm is based, namely existence of M(Ol' is invalid and the algorithm must be considered to have failed. 
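The quantile calculations of the two worked examples, and the mean-existence conditions of (A4.43), can be sketched as below. The sketch uses the Greenwood et al. form x = m + a[1 - (1-F)^b] - c[1 - (1-F)^(-d)] from Table A4.1. It takes d as negative in both examples, which is what the worked arithmetic (the exponent written as -(-0.4457) and -(-0.35047)) implies. Only three necessary conditions are tested; the fuller sign conditions of Table A4.4 are not reproduced, and the function names are illustrative.

```python
def wakeby_quantile(f, m, a, b, c, d):
    """Wakeby x(F) in the Greenwood et al. (1979) form of Table A4.1."""
    return m + a * (1.0 - (1.0 - f) ** b) - c * (1.0 - (1.0 - f) ** (-d))

def passes_basic_checks(a, b, c, d):
    """Necessary (not sufficient) checks: b > -1, d < 1 and (ab + cd) > 0."""
    return b > -1.0 and d < 1.0 and (a * b + c * d) > 0.0

# 4-parameter example (m = 0) and 5-parameter example from the text
four_par = dict(m=0.0, a=230.8253, b=22.8614, c=-952.4527, d=-0.4457)
five_par = dict(m=104.4811, a=168.8859, b=6.69148, c=-1014.8514, d=-0.35047)

for name, p in (("4-parameter", four_par), ("5-parameter", five_par)):
    ok = passes_basic_checks(p["a"], p["b"], p["c"], p["d"])
    x50 = wakeby_quantile(0.98, **p)
    print(f"{name}: checks passed = {ok}, x(0.98) = {x50:.1f} m3/s")
```

Both parameter sets pass the three checks, and the 50 year values agree with the figures of about 1020 and 1030 m³/s quoted in the examples.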
The conditions of Table A4.4 have been expressed more simply for the reparameterised form by Hosking (1986b) and these are given in Table A4.4 both in Hosking's notation and also in terms of a, b, c, d. The first three conditions are necessary to ensure the uniqueness of x and F, i.e. that no two distinct parameter sets yield the same x(F), while the last two, in conjunction with the first three, ensure that x(F) does define a proper probability distribution.

If an invalid parameter combination arises from the above calculation steps, Landwehr et al. (1979b,c) and Wallis (1980) suggest selecting a maximum allowable value of $\hat{b}$, say 50, and calculating the remaining parameter values conditional on that $\hat{b}$ value. If the resulting estimates yield another invalid combination then $\hat{b}$ is decreased by a preselected amount Δb and another trial is made. These iterations are continued until a valid parameter set is found or else until $\hat{b}$ becomes less than a preselected minimum allowable value, say $\hat{b}$ = 1, in which case the algorithm is considered to fail to provide any valid parameter estimates. Failure may occur with PWMs from individual small samples from thin tailed distributions (Wallis, 1984, pers. comm.). It rarely if ever occurs with PWM values in the form of the regionally averaged standardised PWMs used in the recommended at-site / regional WAK / PWM flood frequency procedure, whose details are given in Appendix 5 and whose properties were discussed in Chapter 5.

Imposing a large value on $\hat{b}$ minimises the contribution of $(1-F)^{\hat{b}}$ to the value of x(F), except for very small values of F, i.e. in the non-flood end of the distribution. In such a case ($\hat{m}$ + $\hat{a}$) combine to form a single location parameter and the Wakeby distribution is effectively reduced to a 3 parameter distribution, equivalent in form to a generalised Pareto distribution.
The above procedure, whereby 4 of the parameters are estimated conditional on a fairly arbitrarily assigned large value of $\hat{b}$, with the consequent degeneration of Wakeby into Pareto, does not appeal to Hosking (1986b), who states that "the eventual parameter estimates are effectively on the boundary of the parameter space and the sampling properties of the estimates are difficult to assess. It seems preferable to accept that failure of the estimation procedure is an indication that the Wakeby distribution is not a suitable choice to fit to the data".

A4.4.3.3 Numerical Accuracy

The number of decimal places used above may seem excessive for hydrological calculation. However, once the hydrological data have been specified, what remains is statistical calculation, and calculations based on linear combinations of PWMs as used above are quite sensitive to rounding errors. It is quite in order to round off hydrological data to one or zero decimal places according to the appropriate level of accuracy attributed to the flood data on hydrological or hydrometric grounds. Such rounding should be distinguished from arbitrary rounding within a set of statistical calculations. Finally, the calculated $\hat{x}_T$ value should be rounded at the end of calculations to avoid any illusions of hydrological accuracy. In fact, in the two Wakeby examples shown, the calculated $\hat{x}_T$ values are not significantly different from 1000 m³/s.

A4.4.3.4 Effect of rounding

It is useful to note that if the $\hat{M}_{10k}$ values used in the example are rounded off to (515.0, 193.0, 109.0, 73.0 and 53.5), i.e. a maximum of 0.07% round off, then the 5 parameter Wakeby calculations lead to a value of $\hat{d}$ = 10.76, which exceeds 1.0 and hence is invalid, see eqn. A4.43. This occurs partly because the internal relations between the several $\hat{M}_{10k}$ values have been slightly but significantly disturbed, to the extent that the rounded $\hat{M}_{10k}$ values are incapable of describing a 5-parameter Wakeby distribution.
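The sensitivity to rounding can be reproduced with a sketch of the step that solves for b and d in the 5 parameter case. The N and C combinations below are reconstructed from the Wakeby PWM expression of Table A4.1 (they are third differences of (k+1)^p M_10k for p = 1, 2, 3) and are written in one consistent sign convention; reversing the sign of all six leaves b and d unchanged. The function name is illustrative, and this is a sketch under those assumptions rather than the program referred to in Section A4.5.

```python
import math

def wakeby_b_d_from_pwm(a0, a1, a2, a3, a4):
    """Estimate (b, d) of the 5 parameter Wakeby from M_10k, k = 0..4."""
    n1 = a0 - 24.0 * a1 + 81.0 * a2 - 64.0 * a3
    n2 = a0 - 12.0 * a1 + 27.0 * a2 - 16.0 * a3
    n3 = a0 - 6.0 * a1 + 9.0 * a2 - 4.0 * a3
    c1 = 8.0 * a1 - 81.0 * a2 + 192.0 * a3 - 125.0 * a4
    c2 = 4.0 * a1 - 27.0 * a2 + 48.0 * a3 - 25.0 * a4
    c3 = 2.0 * a1 - 9.0 * a2 + 12.0 * a3 - 5.0 * a4
    # b and -d are the two roots of qa*z**2 + qb*z + qc = 0
    qa = n2 * c3 - n3 * c2
    qb = n1 * c3 - n3 * c1
    qc = n1 * c2 - n2 * c1
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        raise ValueError("no real solution for b and d")
    root = math.sqrt(disc)
    z1 = (-qb + root) / (2.0 * qa)
    z2 = (-qb - root) / (2.0 * qa)
    return max(z1, z2), -min(z1, z2)

four_decimal = (514.7810, 192.9128, 109.0817, 72.9844, 53.5233)  # Table A4.2, part (b)
rounded = (515.0, 193.0, 109.0, 73.0, 53.5)                      # rounded as in the text

for label, pwms in (("four decimals", four_decimal), ("rounded", rounded)):
    b, d = wakeby_b_d_from_pwm(*pwms)
    print(f"{label}: b = {b:.4f}, d = {d:.4f}, mean exists = {b > -1.0 and d < 1.0}")
```

With the four-decimal PWMs the sketch returns values close to the b and d of the worked example; with the rounded PWMs it returns d of about 10.76, which fails the condition d < 1.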
A4.4.3.5 i):10k values are Estimation of Wakeby parameters in reparameterised form. This form, due to Hosking (1986b), is given in the last entry of Table A4.1. In this case Hosking (1986b) gives the following estimating algorithm for the 5 parameter Wakeby. A Let ak = M lOk = Sample estimate of M lOk obtained by one of the methods of Section A4.3. I. Calculate (M.44) A4.12 Sign of Parameter Valid distribution? (i) a b c d Yes + + + + + + + + + + + + X - + + + - - + - - + - - + + - 1: 2: 3: 4: X X + +. - + + - + + - + + - 2 X X 3 4 + + + + 5 - 1 X X - - No 1 'I- - Maybe X X Valid if (ab + cd) > O. Valid if (ab + cd) > 0 anda> c and b"; Idl. Valid if (ab + cd) >0 and a> c and b" Idl. Valid if (ab + cd) > 0 and either Ibl < Idl or c > a when Ibl = Idl. Valid if (ab + cd) > 0 and either Ibl > Idl or c > a when Ibl = Idl. 5: In addition to the above it is necessary for the mean to exist i.e. b> -1 and d < I, (Eqn. A4.43). Condition Hosking's notation Greenwood ~ notation (ii) Table A4.4 1 ~+ Ii > 0 or ~=r=Ii=O 2 if a = 0 then~=O ifab=O then b=O 3 ify=O then Ii = 0 if cd = 0 then d=O 4 y"O cd"O 5 a+y"O ab+cd"O b + d > 0 or b=cd=d=O Necessary conditions to be satisfied by Wakeby distribution parameters. (i) from Landwehr et aI. (1978, corrected 1979), (ii) from Hosking (1986b). A4.13 (M.45) Nz = 80 - 12 at + 27l1z - 16 8:l N 3 = 80 - 6 at + 9 liz - (M.46) 4 a3 and 2. C l = 4(m-3)(m-4)al - 27(m-2)(m4)lIz + 32(m-2)(m-3)a3 - m3am. 1 (M.47) C2 = 2(m-3)(m-4)al - 9(m-2)(m-4)a2 + 8(m-2)(m-3)a3 - m2am_l (M.48) (m-3)(m-4)al - 3(m-2)(m-4)a2 + 2(m-2)(m-3)a3 - m am- l (A4.49) C3 = where m takes the value m = 5 or m = N (i.e sample size) in the 5 parameter Wakeby case. (m is not to be confused with the parameter m in Greenwood ~.(1979) version of the first form in Table A4.1). Using m = 5 corresponds to equating the sample and theoretical values of the first 5 PWM's as in Greenwood tl..lI!.. (1979). 
Using m = N in effect makes the estimated lower bound parameter = x(l)' the smallest sample value. In this example, let m = 5. Hence C l = 8 a l - 81 liz + 192 a3 - 125 a. (A4.50) C z = 4 at - 27 liz + 48 a 3 - 25 a. (A4.51) C 3 = 2 at - 12 a, - 5 a. (A4.52) 9 az + Note that Nand C values used here are equal in magnitude but opposite in sign to those used in equations (A4.27 - A4.32). These changes in sign do not affect the following parameter estimates. 3. ~ and - g are the roots of the quadratic equation (A4.53) where i.e Zl' H = z2 - (N l C 3 - N 3 C l ) ± 2(N z C 3 - N 3 C z) = H (A4.54) (Nt C 3 - N 3 C I )2 - 4 (Nz C3 - N3 Cz} (Nl Cz - NZC I ) A 13 max (z" zz} (A4.55) = - min (z" zz} (M.56) = A o A A 4. ~ = {(80 - 8al - 27l1z + 64a3) + 5. a 6. Y A A A A A A AA (13 - 8)(80 - 4al - 9l1z + 1003) - J3 8 (80 - 2al - 3l1z + 4a3) l A A = (1+ (3)(2 + (3)[ - ~ - (1- 8) 1Io + 2(2 - 0) all A A A A A = (1 - 8)(2 - 0) ( ~ + (1+ (3) 80 - 2(2 + (3) all A A I (J3 + 0) A (A4.58) A I (J3 + 0) Applying these formulae to the example data of Section (A4.3.1) gives (A4.57) (A4.59) A4.14 = - 49.4962 Nz = 22.7152 N~ = - 47.1025 C1 = - 30.2650 Cz = 8.3893 C3 = - 12.2857 = 0.35047 = 355.6754 N, ~ = 6.69149 ~ = 6.69149 /) /I IJ /I /I l; = 104.4811 /I IX = = 0.35047 /I 1130.0976 'Y For F = 0,98, T = 50 ~50 = 104.4811 + (1130.0976/6.69149).[1-(0.02)6.69149]_- (355.6754/-0.3505).[1-(0.02)'(-0.3505)] = = 104.4811 + 168.8858 + 1014.7658(0,74619) 1030.57, say 1030 m3/s. Checks on parameter validity. In the re-parameterised form of Table A4.2, the checks on parameter validity to ensure that x(F) is a valid inverse distribntion function and that no two distinct sels of parameters. yield the same value of x(F) are simpler than those required for the first form. These conditions (Hosking 1986) are given in Table A4.4 and are satisfied by the parameter values calcu1ated above. A4.4.3.6 Sampling Properties of Wakeby parameter and quantile estimates. Landwehr tl.l!L. 
(1979a,b) investigated these properties by simulation methods. They found that parameter estimates from sample sizes prevalent in hydrology are extremely variable and biased, and hence may bear little resemblance to the true values. In such samples there is also a possibility that no valid distribution is described by the estimated parameters. On the other hand, estimated quantiles display less erratic behaviour, although in hydrologically sized samples they are still quite variable. However, Wakeby quantiles of an index-flood scaled distribution, estimated from regionally averaged PWMs, are practically unbiased and have acceptably small standard errors, regardless of the true form of the parent distribution. This property of robustness is discussed in more detail in Chapter 5. Failure of the estimating algorithm to produce valid parameter combinations is also unlikely in the regional estimation context.

A4.5 Computer program for parameter estimation

A computer program is available for performing the calculations involved in the numerical examples of this Appendix. It is anticipated that this program will be available through the WMO HOMS programme.

APPENDIX 5

NUMERICAL EXAMPLES

Introduction

Numerical examples are included here to illustrate some of the at-site/regional techniques mentioned in the text. Such examples serve only to illustrate the calculation steps involved. They do not in themselves show how appropriate or how efficient such methods are for estimating flood quantiles, $Q_T$. Only appropriate statistical tests and analyses, carried out algebraically or by simulation methods, can answer such questions. Such studies were discussed in Chapter 5.

DATA

A medium-sized data set of M = 20 stations, each having N = 20 years of AM flood record, from East Central England is used for illustrative purposes. The AM series and their probability weighted moment (PWM) values are given in Table A5.1.
Some AM floods in this region are caused by frontal rain (winter or summer), some by convective rainstorms (mostly summer) and some by snowmelt (winter/spring). Regional average values of statistics obtained from a larger data set indicate that population regional average values of Cv and Cs are of the order of 0.50 and 3.0 respectively.

Index flood method based on regionally averaged standardised probability weighted moments

This method was initially proposed by Wallis (1980) in conjunction with the Wakeby distribution. That procedure is denoted WAK/PWM in the text. The same method was used with EV1 and GEV distributions by Greis and Wood (1981) and by Hosking et al. (1985a) respectively. Basically the variate Q is replaced by X = Q / E(Q), which is assumed to have a common distribution at all sites in the region. The scaling quantity, in this case E(Q), is called an index flood because for a given catchment it characterises the flood magnitudes for that catchment and it reflects the effect of the very important controlling factor of catchment area. A graphical implementation of this type of technique was introduced by Dalrymple (1960).

The steps involved are as follows. The PWMs of this X distribution are first estimated, steps (1) to (3) below, in a distribution-free manner. These are then used to estimate the parameters and quantiles of a distribution chosen to represent the X distribution, steps (4) to (6) below. These X quantile estimates are then scaled (i.e. multiplied) by the estimated mean annual flood, i.e. an estimate of E(Q), for any site at which flood quantile estimates are desired (step 7).

Step 1. For each AM record compute $\hat{M}_{10k}$, k = 0, 1, 2, ..., v, using equation (A4.8). Here v = 1 for EV1, v = 2 for GEV, v = 3 for the 4-parameter Wakeby and v = 4 for the 5-parameter Wakeby. $\hat{M}_{100}$ is the mean of the series.

Step 2. For each record compute $\hat{m}_k = \hat{M}_{10k} / \hat{M}_{100}$, k = 0, 1, 2, ..., v. These are standardised PWM values.
The $\hat{m}_k$ values for the present example are tabulated in Table A5.1. Note that $\hat{m}_0 = 1.0$.

Step 3. For each k, calculate weighted average values of $\hat{m}_k$ over the M records as follows:

$\bar{m}_k = \sum_{j=1}^{M} (\hat{m}_k)_j \, [N_j / L]$   (A5.1)

where

$L = \sum_{j=1}^{M} N_j$   (A5.2)

These $(\bar{m}_0 = 1, \bar{m}_1, \bar{m}_2, \ldots)$ values are now considered as sample estimates of the PWMs of the regional standardised population, whose variate X = Q / E(Q) has a mean of unity. In the present example the weights $[N_j / L]$ are all equal and $\bar{m}_k$ is simply the arithmetic mean of $(\hat{m}_k)_1, (\hat{m}_k)_2, \ldots, (\hat{m}_k)_{20}$. The $\bar{m}_k$ values for the present example are (1.0, 0.36217, 0.20679, 0.13976, 0.10336).

Step 4. Choose a distributional form for the X population.

Step 5. Apply the methods of PWM parameter estimation described in Appendix 4 to estimate the X population parameters from $(1, \bar{m}_1, \bar{m}_2, \ldots)$.

Step 6. Calculate estimates of quantiles, $\hat{x}_T$, of the estimated X population. These are assumed to apply to every site in the region.

Step 7. The at-site quantile estimate $\hat{Q}_T$ is obtained as

$\hat{Q}_T = \hat{\bar{Q}} \cdot \hat{x}_T$   (A5.3)

where $\hat{\bar{Q}}$ is the at-site mean annual flood, obtained preferably from observed data or, in their absence, from a regionally calibrated relation between $\bar{Q}$ and catchment characteristics.

Example (a): Application of GEV distribution.

Since the GEV estimating equations (A4.12) to (A4.14) are expressed in terms of $M_{1j0}$ rather than $M_{10k}$, convert the $\bar{m}_k$ values obtained above to the corresponding $\bar{m}_j$ values using equations (A4.3). Thus

$(\bar{m}_j)_{j=1} = 1.0 - 0.36217 = 0.63783$
$(\bar{m}_j)_{j=2} = 1.0 - 2(0.36217) + 0.20679 = 0.48245$

Note that exactly the same values would have been obtained for $\bar{m}_j$ if $M_{1j0}$ values had been used in steps 1 to 5 above. Now (1.0, 0.63783, 0.48245) are used to estimate the GEV values of (u, α, k) using equations (A4.12), (A4.12a), (A4.13) and (A4.14), to give

$\hat{k} = -0.1164$, $\hat{\alpha} = 0.3529$, $\hat{u} = 0.7508$

The quantile estimate follows from

$\hat{x}_T = \hat{u} + \dfrac{\hat{\alpha}}{\hat{k}} \left\{ 1 - \left[ -\ln\left(1 - \dfrac{1}{T}\right) \right]^{\hat{k}} \right\}, \quad k \ne 0$   (A5.4)

which for T = 50 gives $\hat{x}_{50} = 2.4934$. Thus for site number 10, whose mean annual flood estimate is $\hat{\bar{Q}} = 17.352$ m3/s,

$\hat{Q}_{50} = 17.352\,(2.4934) = 43.265$, say 43 m3/s.
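Steps 5 to 7 for the GEV case can be sketched numerically as below. This is an illustrative sketch only: equations (A4.12) to (A4.14) are not reproduced in this Appendix, so the code assumes the closed-form approximation of Hosking et al. (1985) for the shape parameter, which may differ slightly from the estimating equations actually used in the text. The function names `gev_from_pwm` and `gev_quantile` are hypothetical. With the regional values of Step 3 it gives parameter and quantile estimates that agree with the worked example to about two decimal places.

```python
import math


def gev_from_pwm(b0, b1, b2):
    """Estimate GEV (u, alpha, k) from PWMs of the M_{1j0} type.

    Assumption: uses the approximation of Hosking et al. (1985),
    k ~ 7.8590 c + 2.9554 c^2, and requires k != 0.
    """
    c = (2.0 * b1 - b0) / (3.0 * b2 - b0) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    g = math.gamma(1.0 + k)
    alpha = (2.0 * b1 - b0) * k / (g * (1.0 - 2.0 ** (-k)))
    u = b0 + alpha * (g - 1.0) / k
    return u, alpha, k


def gev_quantile(u, alpha, k, T):
    """GEV quantile for return period T, as in equation (A5.4), k != 0."""
    y = -math.log(1.0 - 1.0 / T)
    return u + alpha * (1.0 - y ** k) / k


# Regional standardised PWMs of the M_{10k} type from Step 3.
m_bar = [1.0, 0.36217, 0.20679]

# Convert to the M_{1j0} type, as in Example (a).
b0 = m_bar[0]
b1 = m_bar[0] - m_bar[1]
b2 = m_bar[0] - 2.0 * m_bar[1] + m_bar[2]

u, alpha, k = gev_from_pwm(b0, b1, b2)
x50 = gev_quantile(u, alpha, k, 50.0)

# Step 7: scale by the at-site mean annual flood of site number 10.
q50 = 17.352 * x50
print(round(u, 4), round(alpha, 4), round(k, 4), round(x50, 3), round(q50, 1))
```

The small differences from the values quoted in the example come from the approximation for k and from rounding of the regional PWMs; the shape parameter is sensitive to the fifth decimal place of the inputs.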
Example (b): Application of 5-parameter Wakeby distribution.

Use $(1.0, \bar{m}_1, \bar{m}_2, \bar{m}_3, \bar{m}_4)$ = (1.0, 0.36217, 0.20679, 0.13976, 0.10336) for $[M_{10k}, k = 0, \ldots, 4]$ in the estimating procedure described in Appendix 4, pp A4.9 - A4.10. This leads to the following estimates in the Greenwood et al. (1979) notation:

$\hat{m} = 0.2442$, $\hat{a} = 0.4598$, $\hat{b} = 4.1904$, $\hat{c} = 1.3874$, $\hat{d} = 0.2170$

The corresponding values in Hosking's parameterisation are $\hat{\xi} = 0.2442$, $\hat{\alpha} = 1.9267$, $\hat{\beta} = 4.1904$, $\hat{\gamma} = 0.3011$ and $\hat{\delta} = 0.2170$. Each of these sets of parameter estimates satisfies the conditions, expressed in Table A4.4, for the definition of a valid Wakeby distribution. Then from

$\hat{x}(F) = \hat{m} + \hat{a}\,[1 - (1 - F)^{\hat{b}}] - \hat{c}\,[1 - (1 - F)^{-\hat{d}}]$   (A5.5)

the T = 50 year quantile estimate is

$\hat{x}_{50} = 0.2442 + 0.4598[1 - (0.02)^{4.1904}] - 1.3874[1 - (0.02)^{-0.2170}] = 0.2442 + 0.4598 + 1.8551 = 2.5591$

Thus for site number 10,

$\hat{Q}_{50} = (17.352)(2.5591) = 44.405$, say 44.5 m3/s.

STATION     28002    28010    28070    28804    30001    32002    32003    32004    32006    32007
YEARS     1940-59  1935-55  1950-69  1944-66  1960-79  1960-79  1960-79  1950-69  1960-79  1940-59

AM DATA (m3/s)
            27.36   172.75     2.34   560.82    28.88     6.94     6.90    23.73    11.88    31.51
            26.60   137.07     4.13  1006.53     8.63     2.55     3.53    14.56     4.59    24.68
            17.45   164.50     4.16  1107.33     7.93     5.77     5.85    13.11     8.88    22.90
            22.34   181.17    10.12   495.39    12.63     6.26     7.01    10.18    14.05     3.47
            31.34   240.28     6.18   624.08     5.38     2.41     2.63    26.32    10.70    25.28
            34.51   356.96     4.81   476.74    18.32     5.02     8.18    10.77    13.36    14.01
            41.53   105.13     3.01   624.08    22.93     8.37    10.52    16.61    14.29    24.80
            25.48   140.85    27.81   458.53    11.61     7.26     9.74    30.03    13.82     6.00
            27.36    63.13     4.07   374.00    25.90     8.74    10.45    24.05    14.84    10.10
            20.72   140.85     4.29   279.08    14.72     5.00     7.84    16.61    15.13    18.17
            23.70   306.74     4.55   747.15    14.11     6.29     6.04    18.22    19.53    22.31
            25.48   176.94     3.26   547.16     9.71     2.68     2.69     9.53    18.00    17.56
            10.57   137.07     5.33   732.82     7.82     6.52    10.48    17.46    19.16    14.01
            11.70   148.55     7.20   809.99     7.46     2.38     4.18    16.73     6.03    13.33
            24.69   115.35     6.10   265.71    34.30     7.82    17.01     5.91     7.36    21.26
            17.64   140.85     6.25   295.86     5.16     2.56     0.70    13.67     1.84    10.21
            13.65   111.90     2.91   356.04    37.54     7.73    16.06    16.94     7.53    11.08
            14.16   148.55     5.23   337.56    28.06     7.47    17.39    13.87     7.43    17.56
            13.14   203.02     5.51   731.25    25.86     7.77    18.58    15.77     7.36    26.49
            24.32   133.33     4.81   385.12    27.99     7.37    20.50    14.97     6.99    12.30

M100       22.687  166.250    6.104  560.762   17.747    5.846    9.314   16.452   11.139   17.352
M101        8.997   65.179    2.049  211.720    5.922    2.288    2.996    6.542    4.103    6.477
M102        5.279   39.280    1.207  122.770    3.128    1.290    1.534    3.884    2.276    3.609
M103        3.616   27.546    0.838   84.275    2.020    0.852    0.953    2.683    1.498    2.375
M104        2.701   20.879    0.632   63.336    1.460    0.618    0.658    2.008    1.083    1.714

m0          1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
m1          0.397    0.392    0.336    0.378    0.334    0.391    0.322    0.398    0.368    0.373
m2          0.233    0.236    0.198    0.219    0.176    0.221    0.165    0.236    0.204    0.208
m3          0.159    0.166    0.137    0.150    0.114    0.146    0.102    0.163    0.134    0.137
m4          0.119    0.126    0.104    0.113    0.082    0.106    0.071    0.122    0.097    0.099

Table A5.1: Annual maximum flood data from East Central England for regional flood frequency estimation example.
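The tabulated PWMs and the Wakeby quantile of Example (b) can be checked numerically with the sketch below. It is illustrative only and rests on one assumption that should be noted: equation (A4.8) is not reproduced in this Appendix, so the sample PWMs are computed here with the plotting position $p_i = (i - 0.35)/N$ for the i-th smallest value, a choice that appears to reproduce the tabulated $\hat{M}_{10k}$ values for site number 10 (mean annual flood 17.352 m3/s). The function names `sample_pwm` and `wakeby_quantile` are hypothetical.

```python
def sample_pwm(series, kmax, a=0.35):
    """Sample PWMs M_{10k}, k = 0..kmax, of one AM series.

    Assumption: plotting position p_i = (i - a) / N with a = 0.35,
    where i is the rank in ascending order.
    """
    x = sorted(series)
    n = len(x)
    pwm = []
    for k in range(kmax + 1):
        total = 0.0
        for i, xi in enumerate(x, start=1):
            p = (i - a) / n
            total += (1.0 - p) ** k * xi
        pwm.append(total / n)
    return pwm


def wakeby_quantile(F, m, a, b, c, d):
    """Wakeby x(F) in the Greenwood et al. (1979) notation, equation (A5.5)."""
    return m + a * (1.0 - (1.0 - F) ** b) - c * (1.0 - (1.0 - F) ** (-d))


# AM series (m3/s) of site number 10, taken from Table A5.1.
site10 = [31.51, 24.68, 22.90, 3.47, 25.28, 14.01, 24.80, 6.00, 10.10, 18.17,
          22.31, 17.56, 14.01, 13.33, 21.26, 10.21, 11.08, 17.56, 26.49, 12.30]

pwm = sample_pwm(site10, 4)                 # Step 1
standardised = [v / pwm[0] for v in pwm]    # Step 2

# Regional Wakeby parameters of Example (b), F = 0.98 for T = 50 years.
x50 = wakeby_quantile(0.98, 0.2442, 0.4598, 4.1904, 1.3874, 0.2170)
q50 = pwm[0] * x50                          # Step 7
print([round(v, 3) for v in pwm], round(x50, 4), round(q50, 2))
```

Sorting inside `sample_pwm` keeps the function independent of the order in which the AM values are listed, since the table does not give the year of each value.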
Table A5.1 (continued)

STATION     32008    32010    33006    33009    34001    34002    34003    36001    36008    37001
YEARS     1945-66  1940-59  1956-77  1951-70  1958-77  1958-77  1959-78  1955-74  1961-80  1950-69

AM DATA (m3/s)
             3.66    89.09     6.47    78.60    10.42    13.73     5.60    25.20    12.77    28.03
            29.56    65.45     9.69    80.90     9.90    10.44    10.07    20.39    15.78    16.99
             3.53    75.28     9.41    65.03    11.13    12.90     5.18    40.07    21.72    26.61
             3.92    21.45     8.73   152.72     7.14     5.28     2.62    40.07     5.98    21.23
             7.09    58.30     8.85    77.74     6.00     5.00     3.66    40.07    13.09    17.55
            12.87    38.61     5.63    77.74     7.89     6.12     2.72    23.22    11.87    14.72
             6.06   255.00     4.74    82.90     6.63     3.37     8.23    24.92    85.00    15.57
             4.13    28.67     5.60   147.16    17.98    13.02     6.42    33.76    22.30    31.14
             8.84    48.65     4.36    77.98     7.22     6.92     9.35    38.91    17.78    22.93
            15.80    62.68    10.39   145.34    21.80    61.92     6.38    23.22    20.38    20.95
             3.14    71.51     5.40    85.85    13.08    10.98     5.19    20.39    19.83    36.80
             7.72    54.15     6.60    78.10     8.74     8.89     9.02    18.69    10.88    25.48
             9.45    53.86     9.82    95.35    14.21    12.58     8.11    99.12     9.97    13.99
            26.72    47.83     7.59    36.98    14.21     9.89     3.43    34.92    31.20    38.64
             7.33    80.57     9.25    81.63     2.88     2.61     2.88    29.88     8.50     9.48
            15.31    48.39     9.12    89.16     7.10     3.81     9.99    28.67    18.52    24.52
             4.12    46.43     2.89   183.06    15.23    11.31     2.40    31.01    38.00    19.96
            10.43    68.28    13.30   128.92     5.01     4.13    10.65     4.07    34.34    34.11
            14.84    98.14    28.03    83.77     7.71     9.07     3.72    31.06    21.11    31.14
            11.66    62.53     6.46    88.74     8.04     7.31     6.04    29.38    22.96    20.95

M100       10.309   68.744    8.617   96.884   10.116   10.964    6.083   31.851   22.099   23.540
M101        3.219   24.413    3.129   38.929    3.737    3.209    2.221   11.815    7.213    9.385
M102        1.712   14.283    1.824   23.717    2.152    1.746    1.236    6.886    4.039    5.545
M103        1.118    9.792    1.249   16.803    1.468    1.144    0.824    4.679    2.704    3.831
M104        0.814    7.291    0.932   12.841    1.094    0.828    0.607    3.448    1.985    2.879

m0          1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
m1          0.312    0.355    0.363    0.402    0.369    0.293    0.365    0.371    0.326    0.399
m2          0.166    0.208    0.212    0.245    0.213    0.159    0.203    0.216    0.183    0.236
m3          0.108    0.142    0.145    0.173    0.145    0.104    0.135    0.147    0.122    0.163
m4          0.079    0.106    0.108    0.133    0.108    0.076    0.100    0.108    0.090    0.122

APPENDIX 6

WMO SURVEY ON
DISTRIBUTION TYPES CURRENTLY IN USE FOR FREQUENCY ANALYSIS OF EXTREMES OF FLOODS BY HYDROLOGICAL AND OTHER SERVICES

Prepared for the WMO Secretariat by B. Sevruk and H. Geiger, Institute of Geography, Federal Institute of Technology, Zurich, Switzerland.

Introduction

The report contains an analysis of replies to a questionnaire sent to selected members of the WMO Commission for Hydrology. Fifty-five agencies from 28 countries took part in the survey. The information and results are summarized in five tables as follows:

Table I List of countries and agencies
Table II Distribution types currently in use for extremes of precipitation
Table III Distribution types currently in use for floods
Table IV Plotting positions currently in use for extremes of precipitation and floods
Table V The most frequently used distribution types for precipitation and floods

Each country was also requested to indicate specific references of publications, manuals or textbooks containing descriptions of the national/agency standard or recommended distribution type used operationally. These references are also included in this Appendix.

Selection of distribution types

In most countries there is no "standard" or adopted distribution, but some statistical procedure is recommended in a general manner and applied. Usually a number of different distributions are applied to the data, as shown in Tables II and III, and the choice of a particular distribution is made empirically and/or by comparison using statistical tests. The "best fit" distribution is eventually used. A frequently used statistical test is the Kolmogorov-Smirnov test, but the χ² and Van Montfort tests [the latter for the extreme value type 1 (EV1) and extreme value type 2 (EV2) distributions], as well as many other tests (Anderson-Darling; Brunet-Moret; Cramer-von Mises; Kritski-Menkel; and some subjective tests), are also applied.
However, in many countries the selection of an AM distribution is actually not made in any objective manner; the choice of distribution is argued in a general manner, as follows. The distribution is:

- widely, most or generally accepted
- simple, easy, quick or convenient to apply
- consistent, flexible or robust (low sensitivity to outliers)
- theoretically well based
- documented in the WMO Guide or elsewhere.

No special method of parameter estimation is preferred, and the graphical method is used as frequently as, or even more frequently than, any other method.

Currently used distribution types and plotting positions

The most frequently used or recommended frequency distribution types for extremes of both precipitation and floods are the EV1 and the log-normal distributions. Almost one half of all agencies use these distribution types. This agrees well with the conclusions in the WMO report by Sevruk and Geiger (1981), which reviewed the literature on the extremes of precipitation. The EV2, the Pearson 3 (P3) and the log-Pearson 3 (LP3) distributions are also in use, but mainly for floods, where they account for one third of all the cases. Among the plotting positions, the Weibull formula is by far the most favoured (1/2 of agencies), followed by the Gringorten, Hazen and Blom formulas (1/3 of all agencies).

TABLE I List of countries and agencies

No. COUNTRY AGENCY ADDRESS ABBREVIATION REFERENCE
1 Australia Queensland Water Resources Commission GPO Box 2454, Brisbane QLD 4001 AA1
2 M.W.S. & D.B. PO Box A53, Sydney South, Australia 2000 AA2
3 N.S.W. Public Works Department Office Phillip St., Sydney 2000 AA3 29
4 Engineering & Water Supply Department GPO Box 1751, Adelaide 5001 AA4 24,29,37,47
5 Bureau of Meteorology GPO Box 1289K, Melbourne VIC., 3001 AA5 37,48,49
6 " AA6 36,48,49 AA7 29,31,32 6, 7, 8, 12, 14, 34
7 State Rivers & Water Supply Commission of Victoria, 8 Hydro-Electric Comm.
GPO Box 355D, Hobart, Tasmania, 7001 AA8 29 9 Water Resources Branch Public Works Department 2 Havelock Street, West Perth, Western Australia AA9 29 10 Water Division, Dept. of Transpon & Works Darwin, Northern Territory AA 10 . 590 Orrong Road, ArrnadaIe, Victoria 3143 29,37,47 11 Auslria Hydrographical Service Marxergasse 2, A-1030 Vienna All 20,40 12 Belgium Section d'Hydrologie, Institut Royal Meteorologique de Belgique 3, ave. Circulaire, 1180 Brus,els BE I 45 13 Brazil Departamento Nacional deObrasde Saneamento-Ministerio do Interior Rua Debrct, 23-99 Andar-Rio de Janeiro RJ Brazil BR I 14 Bulgaria Institute d'Hydrologie et de meteorologic Blvd. Lenine, No.66 Sofia No. BU I 55,56,57 15 Canada Inland Water Directorate Environment Canada CA I Ottawa, Ontario,KIA OE7 22 A6.4 TABLE I (continued) No. COUNlRY AGENCY ADDRESS ABREVIATION REFERENCE 16 Costa Rica Instituto Costaricense de Electricidad Apartado 10032, San Jose CR I 17 Cyprus DejJartment of Water Development Nicosia CY I 18 Czechoslovakia Czech Hydrometeonr logical Institute' Slovak Hydrometeorological Institute 15129 Praha 5, Smichov, Holeckova 8, 833 15 BratislavaKolilia, Jeseniova 17 CZl' 19 Finland The National Board of Waters,"Hydrological Office P,O. Box 250 SF-OOIO! 
Helsinki 10 FII 20 Fnmce Service Hydrologique de I'Orstom 7~ 74ROlited'Aulnay 91140 Bondy FRI 12 ·21 German Democratic Republic Meteorologischer Dienst der DDR, Forschungsinstitut fllr Hydrometeorologie DDR-1020 Berlin GDI 41 Institut fllr Wasserwirtschafi DDR-ll90 Berlin GD2 20 14, 27, 35, 52 22 Germany, Federal Republic Institut fUr Hydrologie und Wasserwirtschafi, Universitllt Karlsruhe " KaiserSlrasse 12, 7500 Karlsruhe I GF I 29 23 Hong Kong Royal Observatory Nathan Road~ KoWloon HKI 25 24 Hungary ResearchCenlre for Water Resources Development PO Box 27, H-1453 Budapest HUI 16 25 Netherlands Rijkswaterstaat' Directie PO Box 20907, Waterhuishouding en ,·2500 EX The Hague Waterbeweging NE 1 3, .:is Royal Netherlands Meteorological Institute (KNMI) PO Box 201, 3730 AE De Bilt NE2 9, 10 26 27 New Zealand Water Resource Survey DSIR P.O. Box 12 - 043 Wellington NZI "5, 15, 50 28 Norway The Norwegian Meteorological Inst. Box 320, Blindem Oslo 3 NOI 38 A6.5 TABLE I (continued) No. COUNTRY AGENCY ADDRESS ABREVIATION 29 Poland Institute of Meteorology & Water Management 61 , Podlesna str., 01 - 673 Warsaw, PO I 11,30 30 Romania Institute de Meteorologie et Hydrologie Soseaua BucurestiPloiesti, Nr. 97Bucuresti-Romanie RO I 19 31 Sweden Swedish Meteorological & Hydrologicallnst. Box 923, S - 601 19 Norrk5ping SE I 32 Switzerland InstilUt federal de recherches foresticres CH-8903 Birmensdorf SW I 33 Service hydrologique nationale Case postale 2742 3001 Berne SW2 34 Service federal des routes et des digues Case postale 2743 3001 Bcrne SW3 Hydrology Division Royal Irrigation Dept. 
811 Samsen Road Bangkok 10300 TH I 36 Electricity Generating Authority of Thailand Rama 6 Bridge Nonthaburi, Bangkok TH2 32 37 Hydrology Section National Energy Administration Ban Phibuntham, Kasatsuk Bridge, Rama I Road Bangkok 10500 TH3 36 35 Thailand REFERENCE 54 21 38 Turkey General Directorate of State Hydraulic Works Yilcetepe, Ankara TU I 4,23 39 United Kingdom Wessex Water Authority Wessex House Passage Street Bristol BS20JQ UK I 39 40 North West Water Authority Dawson House, Liverpool Road, Warrington WA5 3LW UK2 39 41 Thames Waler Authority Vastern Road Reading RG I 8DB UK3 39 42 Southern Water Authority Guildbourne House Worthing, BNll ILD UK4 39 43 Anglian Water Authority Ambury Road, Huntingdon PEl86NZ UK5 39 A6.6 TABLE I (continued) AGENCY ADDRESS ABREVIATION Institute of Hydrology Wallingford OXlOSBB UK6 2S.49 Greater LondilU Council Dept. of Public Health Engineering, River Branch South Block, County Hall, London SEI 7PB UK7 39 46 Norihumbrian Water Authority Gosforlh, Newcastle upon Tyne NE33PX UK8 1,2, IS 41' Sevem"TtentWater Authority 2297 CoventryRoad Birmingham, B263PU UK9 17.39 4S' WelshWater Authority Cambrian Way. Brecon, Powys, LD37HP UK 10 39' 49 South West Water Authority Matford Lane,Exeter EX24QX, Devon UKlI 39 No. COUN1RY 44 45 UnitedKingdom (Continued) REFERENCE 50; Uruguay Direccion Nacional de Meteorologia ' Castilla de Correa No; 64 UR I 53 51 USA NOAA,National Weather Service S060 13th St. Silver Spring MD US 1 26.51 52 US Army Corps of Engineers. HQDA (DAEN"CWH) Washington DC 20314 US 2 51 53 US Geological Survey 415 National Center Reston Virginia 22092 US 3 51 National Council for Scientific Research & Meteorological Dept. POBox 30200, Lusaka Zambia ZA 1 54 zambia A6.7 TABLE II Distribution types currently in use for extremes of precipitation (') indicates distribution recommimded or used as "standard" distribution DISTRIBUTION TYPE COUNTRY AND AGENCY TOTAL % TOTAL' % NO. 
OF COUNTRIES Extreme Value (EVI) AAI, BEl', CRI', CZ',Fll', FRI, GD, GFI', HKI', NE2', NZI', NOl', POI', SEI, SWI, THI', TH2', TUI', UK2', UK4', UKS, UK7', UK11', URI', USI' 2S 30 19 41 20 Extreme value (EV2) AAI, BEl', HI, FRI, SWI, TUI', UKS, USI' 8 10 3 6 8 Extreme value (EV3) AAI, GD', SEI, UKS 4 S I 2 3 General extreme value (GEV) FRI, NE2', UKl", UK3', UKS', UKl!' 6 7 S 11 3 Pearson 3 (P3) AAI, BUI, FRI, POI', ROl', TH3, TUI' 7 9 3 6 7 Gamma 2 or 3 parameters BUI, CYI', CZ', SEI 4 S 2 4 4 Exponential GFI', SWI 2 2 I 2 2 Log-Pearson 3 (LP3) AAI, AA3, FRI, TR3', TVI' S 6 2 4 4 Normal SEI, ZAI' 2 2 I 2 2 Normal- power-transformed AAI I I 0 0 I Log-Normal AAl', AAS', AA7', AA8', BEl, BUI', FRI,HUI, POI', SEI, TH3,TUI, , UKS 13 16 7 IS 10 General symmetrical distribution AAI I I 0 0 I Boughton AAI I I 0 I I Wakeby AAI I 0 0 I Loi des fuils FRI I I 0 0 I Krilski-Menkel ROI' I I I 2 I I I I I I I 2 2 I I 84 100% 47 100% Empirical methods Log-log Hazen Total BRI' TRI' A6,S TABLE III Distribution types currently iu use for floods (*) indicates recommended or used as "standard" distribution DIS1RIBUTION TYPE COUN1RY AND AGENCY TOTAL % TOTAL* % NUMBER OF COUNlRIES Extreme value (EVI) AAI, AAZ, AA3*, All*, BEl, CAl, CRI', CYI', FII*, FRI, GD, GFI, HKI', NZI*, SEI, TRI', THZ', TH3*, TVI*, UKI, UKZ*, UK3*, UK4*, UK5*, UK7*, UK9, UKI9*, UK))* ZS ZO.O IS Z5.0 16 Extreme value (EVZ) AAI, AAZ, AA3*, BEl, FII,FRI, GEl, NZI, TVI*,UK5, URI* )) 7.7 3 40 9 Extreme value (EV3) AAI, CAl, CRI, CZ, FII, SI, TH3, UK5 S 5.6 0 0.0 S Geneml extreme value (GEV) FRI, UKI*, UKZ*, UK3* UK5*, UK9, UK))* 7 4.9 5 7.0 Z Pearson 3 (P3) AAI, AAIO, All', CZ', BU!, FRI, HU!, POI', ROl*, SWZ, SW3, THZ', TH3, TVI*, UKI', UKIO', UK)) l7 11.9 8 IZ.O IZ Gamma with Z or 3 parameters BEl, BU!, CRI', CZ', GFI', SEI 6 4.Z 3 4.0 6 Exponential BEl, NEI', SW3, UK7',UKS* 5 3.5 3 4.0 4 Log-Pearson 3 (LP3) AAI*, AA3', AA4',AA6', AA7*, AAS' ,AA9', AA10', All', CAl, CRI', CZ', FRI, GFI', NZI, SWZ, TH3', TUl', UK)), 
USI', USZ', US3' ZZ 15.4 17 Z3.0 13 Normal SEI I 0.7 0 0.0 I Nonnalpower transformed AAI' I 0.7 0 0.0 I Normal-Box-Cox transfonned AA4 I 0.7 I 1.0 I Log-Normal AAI, AAZ, AA3', AM, AA7', AA9', BEl, BUl', CAl, CRI', CYI, CZ', FIl, FRI, GFI, SEI, SWZ, SW3, THZ', TH3, TVI', UK5, UK7', UK9, UKIO', UK)), ZAI' 27 18.9 11 15.0 16 A6.9 TABLE III (continued) DIS1RIBUTION TYPE COUNTRY AND AGENCY TOTAL % TOTAL' % NUMBER OF COUNTRIES General symmetrical distribution AAI I 0.7 0 0.0 I Boughton distribution AAI 1 0.7 0 0.0 I Wakeby Distribution AAI I 0.7 0 0.0 1 Loi des foits FRI 1 0.7 0 0.0 1 Kritski-Menkel ROI' I 0.7 I 1.0 1 Beta FRI 0.7 0 0.0 1 0.7 1 I 1 0.6 1 I 1 Empirical methods Hazen TIll' Schreiber-Noblis All' Total 1 143 72 A6.10 TABLE IV Plotting positions used in use for extremes of precipitation and floods (n; sample size, m ; rank in descending order) PLOTTING POSmON Weibull COUNlRY AND AGENCY T ; n + I m AAI, AA2, AA3, AA6, AA7 TOTAL (%) NUMBER OF COUNlRIES 29 47 16 8 13 3 7 11 6 6 10 2 AA8, AA9, AA11, All, BEl, CRI, FII, GFI, HKI, NZI, POI, RO!, SWI, SW2, THI, TH2, TUI, UK2, UK4, UK11, US I, US2, US3, ZAI Gringorton T ; n + 0.12 m - 0.44 NE2, NZl, UK2, UK3, UK4, UK5, UK9, UK11 Hazen T ; n m - 0.5 AA2, BRl, CZ, FRl, THl UK9, UKll Blom T ; n + 0.25 m - 0.375 AA7, UK2, UK5, UK8, UK9 UK 11 Cunnane T; n + 0.2 m - 0.4 AAl, AAlO, CAl, GFI 4 7 3 California T ; n m AA2, All, BRI 3 5 3 Chegodayev T ; n + 0.4 m - 0.3 NEI, POI 2 3 2 Adamowski T ; n + 0.5 m - 0.24 CAl 1 2 1 T ; (n + I) - a m-a 1 2 1 61 100 Total AA4 A6.11 TABLE V The most frequently used distribution types for precipitation and floods (*) indicates reeommended or used as "sl1lndard" distribution DISTRIBUTION TYPE FLOODS PRECIPITATION % %* % %* Extreme value I (BVl) 30 41 20 27 Log-normal (LN) 16 15 19 15 Extreme value 2 (BV2) 10 6 g 4 General extreme value (GEV) 7 11 5 7 Pearson 3 (P3) 9 6 12 12 Log-Pearson 3 (LP3) 6 4 15 23 Extreme value 3 (BV3) 5 2 6 0 Gamma 5 4 4 6 Others 12 II A6d2 References containing 
descriptions of distribution types used by national hydrological agencies I .2 Archer, D.P., 1981: Seasonality of flooding and assessment of seasonal flood risk. Proc. Inst. Civ. Eng. Pt2, 70. Archer, DR, 1981: A catchment approach to flood estimation. J. Inst. Water Eng. and Sci., 35 (3). 3 Barlow, R.E. et aI., 1972: Statistical inference under order restrictions. Wiley, New York. 4 Bayazii, M.: Statistical methods in hydrology. Ankara. 5 Beable, M.E. and McKerchar, A.I., 1982: Regional flood estimation in New Zealand. Tech. Pub. No. 20, Water & Soil Div., Ministry of Works and Development, Wellington. 6 Bobee, B. and Boucher, P., 1981: Adjustment des distributions Pearson type 3, gamma, log-Pearson type 3 ei 10g:gamma. Rapport Scientifique No. 105, INRS-Eau, Universite du Quebec. 7 Bonghton, W.C., 1980: A frequeney distribution for annual floods. Water Resour. Res., 16 (2), pp.347-354. 8 Box, G.E.P. and Tiao, G.C., 1973: Bayesian influence in statistical analysis. Addison-Wesley, Reading, pp . 156-160. 9 Buishand, T.A. and Velds, C.A., 1980: Klimaat van Nederland, Neerslag en Verdamping. Staatsuitgeverij, Den Haag, pp 206. 10 Buishand, T.A., 1983: Uitzonderlijk hoge neerslaghoeveelheden en de theorie van de extreme waarden. Cultuurtechnisch Tijdschrift, 23, 9-20 11 Byczkowski, A., 1972: Hydrological background for the design of the land reclamation structures. Extremal flows. PWRiL, Warsaw. 12 Cahiers de I'Orstom, Serie d'Hydrologie, 6 (3), 1969; 10 (2), 1973; 11 (4), 1974; 12 (2), 1975; 15 (3), 1978. 13 Chander, S., Spolia, S.K. and KUMAR, 1978: Flood frequency analysis by power transformation. J. Hydraul. Div., A.S.C.E., 104 (HYll), 1495-1504. 14 Chow, V.T. (ed), 1964: Handbook of applied hydrology. McGraw-Hill Book Company, pp 8-25 to 8-26. 15 Coulter, J.D. and Hessel!, J.W.D., 1980: The frequency of high intensity rainfall in New Zealand, Part II, point estimates. NZ Met. Ser. 16 Csoma, J. 
and Szigyarto, Z., 1975: A matematikai statisztika alkalmazasa a hidrol6giaban. Vizgazdlilkodesi Tudomanyos KutatO Intezet, Budapest. 17 Cunnane, C., 1978: Unbiased plotting positions - a review. J. Hydrol., 37 (3/4), pp 205-222. 18 Dalrymple, T., 1960: Flood frequency analysis. Manual of Hydrology. Pt. 3, US Geol. Survey. 19 Diaconu, C. and Mociornita, D., Instructions techniques pour la determination des particularites des cmes de calcul (tMoriques) sur les rivieres, Bucarest. 20 DVWK, 1976: Empfehlung zur Berechnung der Hochwasserwahrscheinlichkeit. H. 101, DVWK-Reglen ZUf Wasserwirtschaft. Hamburg, Paul Parey Verlag. 21 Eidgenoessisches amt fUr Strassen nnd Flussbau, 1974: Die grossten bis zum Jahre 1969 beobachteten Abflussmengen von schweizerischen Gewassem, Bern. A6.13 22 Environment Canada Manuals Flood Damage Reduction Program, Flood Frequency Analysis (FDRPFFA) Flood Frequency Analysis with Low Outliers (LOWOUT) Flood Frequency Analysis with Historic Information (ISTORIC) Inland Waters Directorate, Environment Canada, Ottawa, Ontario, KIAOE7 23 Erkek, C.: Statistical methods in flood estimations. Ankara. 24 Gnanadesikan, R., 1977: Methods for statistical data analysis of multi-variate observations. J. Wiley & Sons. 25 Gumbel, E.J., 1954: Statistical theory of extreme values and some practical applications. US Nat. Bur. Stds. Appl. Math. Ser. 33. 26 Gumbel, E.1., 1958: Statistics of Extremes. Columbia University Press, New York, pp 375. 27 Haan, T.C., 1977: Statistical methods in hydrology. The Iowa State University Press. 28 Hamlin, M.1. and Wright, C.E., 1978: The effect of drought on the river systems. Pap. to Royal Soc., pp 7393,London. 29 Institution of Engineers, 1977: Australian rainfall and runoff-flood analysis and design. 2nd ed., Australia. 30 Kaczmarek, Z., 1970: Statistical methods in hydrology and meteorology, WKi, Warsaw. 31 Karoly, A. and Alexander, G.N., 1960: Analysis of storm rainfall in Victoria. 
32 Kite, G.W., 1977: Frequency and risk 'analysis in hydrology, Water Resour. Publ., Fort Collins, Colorado 80522, U.S.A. 33 Koppittke, R., Steward, B. and Tickle, K., 1976: Frequency analysis of flood data in Queensland. Paper presented at Institution of Eng. Austr. Hydrol. Symp., Sydney. 34 Landwehr, J.M., Matalas, N.C and Wallis, J.R., 1979: Estimation of parameters and quantiles of Wakeby distribution 1 and 2. Water Resour. Res., 15 (6), pp 1361-1379. 35 Linsley, R.K., Kohler, M.A. and Paulhus, J.L.H.: Applied hydrology. McGraw-Hill, New York. 36 Maniak, U., De Haar, U., Hofius, K., Johannsen, H.H., Liebscher, H., Schroeder, R., Schultz, G., Wohr, F. and Zayc, R, 1971: Theoretische Hydrologie, Heft 1, Stochastische Verfahren. Deutsche Forschungsgemeinschaft, Bonn. 37 McMahon, T.A. and Srikanthan, R., 1981: LP III distribution - is it applicable to flood frequency analysis of Australian streams? J. Hydrol., 52 (lj2), pp 139-147. 38 Nemec, J., 1972: Engineering Hydrology. McGraw-Hill Publishing Company Limited, England. 39 NERC, 1975: Flood studies report. Nat. Env. Res. Council, London. 40 Nobilis, F., 1981: Zur Berechnung der n-jlihrlichkeit von Hochwassern, und zur Interpretation von KonfidenzintervalJen. Mitteilungsblatt des Hydrographischcn Dienstes in Oesterreich, Nr. 49, S.44-59, Wein. 41 QWRC, 1982: An empirical study of flood frequency data. Queensland Water Resour. Comm., Surface Water Branch, Internal Rep. No. 000716. PR, Febr. 42 QWRC, 1983: Plotting positions for Queensland flood frequency data. Queensland Water Resour. Comm., Surface Water Branch, Internal Rep. No. 000717, Jan. 43 Reich, T.: Zur Hliufigifkeitsverteilung extremer Tagessummen der Niederschlagshohe, Meteorol. Z. (in press). A6.14 44 Sevruk, B. and Geiger, H.; 1981: Selection of distribution types for extremes of precipitation. World Meteoro!. Org., Operational Hydrol. Rep., 15, WMO-No. 560, pp 64. 45 Sneyers, R., 1975: Sur l'analyse statistique des series d'observations. WMO-Tech. 
Note No. 143, pp 189. 46 Srikanthan, R. and McMahon, T.A., 1981: Log Pearson III distribution - effect of dependance, distribution parameters and sample size on peak annual flood estimates. J. Hydro!., 52 (1/2), pp 149-159. 47 Srikanthan, R. and McMahon, T.A., 1981: Log Pearson III distribution - an empirically-derived plotting position. J. Hydrol., 52 (1/2), pp 161-163. 48 Stephens, M.E., 1974: EDF Statistics for goodness of fit and some comparisons. J. Am. Stat Ass., Vol. 6g, pp 730-737. 49 Tabony, R.G., 1977: The variability of long duration rainfall over Great Britain. Metoorol. Off. Sci. Pap. No. 37. 50 Tomlinson, A.I., 1980: The frequency of high intensity rainfalls in New Zealand. Part I. Tech. Pub. No. 19, Water & Soil Div. Ministry of Works and Development, Wellington, pp 36 + 4 maps. 51 U.S. Interagency Committee on Water Data, 1982: Guidelines for determining flood flow frequency: Hydrology Subcommittee Bulletin 17-B (with editorial corrections, March 1982), U.S. Gool. Survey, Office of Water Data Coordination, Reston, Virginia 22092. 52 Viesmann, W.R. et aI., 1972: Introduction to hydrology. Intext Educational Publishers, New York. 53 WMO, 1981: Hydrological Operational Multi-purpose Subprogramme (HOMS) Reference Manual. World Meteorol. Org., 1st Ed. 54 Zeller, J., Geiger, H. and Roethlisberger, G.. 1976·1981: Starknieder-schlfige des schweizerischen Alpen - und Alpenrandgebietes. Band 1-5, Eidg. Anstalt fUr das forstliche Versuchswesen, Birmensdorf, Schweiz. 55 Marinov, I. et aI., 1979 and 1980: Manuel hydrologique, Vol. I & II, Hydrologuitchen naratchnik, "Tecnica", Sofia. 56 Guerassimov, S., 1980: Manual de determination des crnes en Bulgarie, Blvd. Lennine 66, Sofia. 57 Guerassimov, S., 1976: Determination de la distribution empirique precise a composition des components statistiques, Hydrologie et .meteorologie, No.2, 1976, Sofia.