Download homogeneity testing for peak flow in catchments in the

HOMOGENEITY TESTING FOR PEAK FLOW IN CATCHMENTS IN THE EQUATORIAL NILE BASINS Opere A.O.1, Mkhandi S.2, Willems P.3 1 2 Department of Meteorology, University of Nairobi, Kenya, [email protected] Departme nt of water resources Engineering, University of Dar-Es-Salaam, Tanzania, [email protected] 3 Hydraulics Laboratory, University of Leuven, Belgium, [email protected] ABSTRACT Regional flood frequency analysis deals with the identification of homogeneous regions of which the distribution of peak flows from sites from such a region are similar. Once a homogenous region is identified, standardised data from different sites within the region can be pooled together and a single frequency curve applicable to the region can be derived. A region can be considered homogeneous for flood frequency analysis if sufficient evidence can be established that data at different sites in the region are drawn from the same distribution (except for the scale parameter). Hosking and Wallis (1993) developed several homogeneity tests for use in regional studies. The aim of these tests is to estimate the degree of heterogeneity in a group of sites and to assess whether they might reasonably be treated as a homogeneous region. A method commonly used with flow data to determine regional homogeneity is the Lmoment ratio diagrams. The method is based on the conce pt that all sites in a homogeneous region have the same population L-moments. Their sample L-moments will, however, be different owing to sampling variability. This has been taken into account by simulation. In addition to the L-moments, also the STU-index has been considered, which is based on the differences between the arithmetic averages of the at site data, including and excluding the largest value. From both the STU-index method and the L-moment diagrams, it is concluded that the Kenyan stations and the Tanzanian stations considered in this study (these are stations in Equatorial Nile basins) each belong to the same region. This homogeneity grouping forms the fundamental base for regional modeling in the study area. 1 INTRODUCTION In many water -engineering applications, the time series are too short for a reliable estimation of extreme events. The difficulties are related to the uncertainty in the calibration of the appropriate extreme value distribution. Regionalization provides a means to cope w ith this problem by assisting in the identification of the shape of the extreme value distribution, leaving only a measure of scale (or more general, an ‘index’) to be estimated from the at-site data. Regional extreme value analysis involves three major steps: (1) Grouping of sites into homogeneous regions. (2) Estimation of the regional extreme value distribution for each homogeneous region (after dividing the random variable under study by the scale parameter). The data at the different stations of the same homogeneous region can be combined for this task. (3) Estimation of the at-site scale parameter and corresponding extreme value distribution. This paper focuses on the application and comparison of techniques for (1) homogeneity testing as well as (2) the identif ication of regional distributions. Flow data from the Lake Victoria subbasins in Kenya and Tanzania were considered. METHODS FOR IDENTIFICATION OF HOMOGENEOUS REGIONS The grouping into homogeneous regions can be done by the identification of geographically contiguous regions. Geographical proximity does, however, not guarantee hydrological similarity. Therefore, it is better to define similarity between sites based on catchment -related characteristics or statistical flow characteristics. Different sites are only considered to belong to different homogeneous regions when the hypothesis of complete homogeneity between the stations is rejected by the test (following statistical hypothesis testing). Another approach may be based on correlations. Sites are grouped when they have similar catchment characteristics (e.g. flood response, when regionalized flood frequency analysis is conducted). An ungauged (or gauged) site can then be assigned to a region based on catchment characteristics alone. Often a common regional distribution can be assumed within each homogeneous region delineated. Therefore, a region can also be considered homogeneous if sufficient evidence can be established that data at different sites in the region are drawn from the same distribution (except for the scale parameter). The regional distribution after dividing the variable by the scale parameter is also called the ‘regional growth curve’. Hosking and Wallis (1993) developed several tests for use in regional studies. They gave guidelines for judging the degree of homogeneity of a group of sites, and for choosing and estimating a regional distribution. The three regional homogeneity measures selected for this study are the STU-index method, the heterogeneity measure, and the L-moments diagram method. L-moment ratio diagrams have become most popular tools for regional distribution identification, and for testing for outlier stations. L-moment diagrams as a tool for identifying a regional distribution have been used in numerous other studies, including 2 Chowdhury et al. (1989), Pilon & Adamowski (1992), Vogel & Fennessey (1993), and Vogel et al. (1993a, 1993b). An alternative test for homogeneity based on estimated dimensionless 10-year floods was developed by Lu & Stedinger (1992). Chowdhury et al. (1989) compared several goodness-of-fit tests for the regional GEV distribution and found that a new chisquare test based on the L-coefficient of variation and the L-skewness outperformed other classical tests. Identification of homogeneous regions (regions in which the distribution of peak flows are similar) is important for regional flood frequency analysis. Once a homogenous region is identified, standardized data from different sites within the region can be pooled together and a single frequency curve applicable to the region can be derived. In cases where adequate rainfall or river flow records are not available at or near the site of interest, it is difficult for hydrologists and engineers to derive reliable flood estimates directly and regional studies can be useful. Hereafter, the STU-index method, the heterogeneity measure, and the L-moments diagram method for identification of homogeneous regions are summarized and discussed. Discordancy measure (STU-index method) The STU-index method is based on the differences between the arithmetic averages of the at site data, including and excluding the largest value. Consider Q 1j < Q2j < Q3j < … < Qnj as the ordered set of observations of the annual maximal floods at site j, for n is the number of observations in the sample. The mean flood, Quj , at site j including the largest value in the sample (outlier) is given by: Quj = 1 n ∑ Q , i = 1, 2, …, n n i =1 ij (1) while the mean flood, Q j , at site j excludes the largest value in the sample (outlier) and is given by: 1 n −1 (2) Qj = ∑ Q , i = 1, 2, …, (n-1) ( n − 1) i =1 ij The STU-method is based on a quantity STU j (called discordancy measure) that indicates the existence of an upper outlier at each station j by the following equation: STU j = (Q uj − Q j ) / D j (3) in which D j is defined as follows: [ ] 2 2 D j = (Quj ) / n + (Q j ) /(n − 1) 1/ 2 (4) To determine homogeneity, in the context of outliers, the procedure is as follows: - Step (1): Calculate the quantity STU j for all stations within the region. - Step (2): Rank the STU j values in ascending order and plot them against their rank. 3 If the region were homogeneous, in the context of outliers, the STU j values would lie on a line parallel to the axis of their rank. However, for a moderately heterogeneous region, they appear as a straight line. Then if there are STU j values among the higher ranks that show significant deviatio n from this line, the stations that correspond to these STU j values have upper outliers. Heterogeneity measure statistics The statistics are considered to be constant across a homogeneous region. Departures from such assumptions could lead to a bias in the flood estimates at some sites. Those catchments whose Cv, Cs, Ck, and Qmax / Q values happen to coincide with the regional mean values would not suffer such bias. Under this method, the hydrologic homogeneity can be invest igated in order to select the catchments whose Cv, Cs, Ck , and Q max / Q values happen to coincide with the regional mean value. Also statistics such as Cv, Cs, Ck, Q max / Q , L-moments, etc. can be used to estimate the degree of heterogeneity in a group of sites and to assess whether they might reasonably be treated as a homogeneous region. The statistics are considered to be constant across a homogeneous region. Departures from such assumptions could lead to a bias in the flood estimates at some sites. Those catchments whose statistics’ values happen to coincide with the regional mean values would not suffer such bias. Under this method, the hydrologic homogeneity can be investigated in order to select the catchments whose statistics’ values happen to coincide with the regional mean value. The method is hereafter applied to the use of L-moments. Hereafter, first the L-moments are defined, whereafter the use with heterogeneity measure statistics is discussed. L-moments For a random variable X with cumulative distribution function F , the following quantiles: { βr = E X [F ( X )] r } (5) are the probability-weighted moments (PWMs), defined by Greenwood et al. (1979) and used by them to estimate the parameters of the probability distributions. Hosking (1986, 1990) defined L -moments to be linear combinations of PWMs: λ r +1 = r ∑ Pr*,k β k (6) k =0 where: P * r , k = ( − 1) In particular the first four L-moments are: λ1 = β0 4 r −k  r   r + k     k  k  (7) λ2 = 2 β 1 - β0 λ3 = 6 β 2 - 6 β1 + β0 λ4 = 20 β 3 - 30 β2 + 12 β1 - β0 (8) L-moment ratios are the quantiles τ = λ 2/λ 1 and τr = λ r /λ 2, r = 3, 4, … In the Figures 2 and 3, hereafter, examples are given of L -moment ratio diagrams. L-moments are similar to but more convenient than PWMs because they are more easily interpretable as measures of distribution shape. In particular, λ 1 is the mean of the distribution, a measure of location; λ 2 is a measure of scale; τ3 and τ4 are measures of skewness and kurtosis, respectively. The L-CV, τ 2 = λ2/λ1 , is analogous to the usual coefficient of variation. The foregoing quantities are defined for a probability distribution but, in practice, must often be estimated from a finite sample. If we let Q1 < Q2 < … < Q n be the ordered sample and define the sample L -moments: r l r +1 = ∑ Pr*,k bk (9) k =0 where: br = 1 n ( j − 1)( j − 2 ) ...( j − r ) Qj j = r +1 ( n − 1)(n − 2) ... (n − r) n ∑ (10) then lr is an unbiased sample -based estimator of λr. The estimators tr=lr/l2 of τr are consistent but not unbiased. The quantiles l1, l2 , l3 , t3 and t4 are useful summary statistics for data samples. They can be used to judge which distributions are consistent with a given data sample (Hosking, 1990). They can also be used to estimate parameters when fitting a distribution to a sample, by equating the sample and population L-moments. In this study, they are used on the basis of a heterogeneity measure statistic. Heterogeneity measure statistic by means of L-moments In a homogeneous region, all sites have the same population L-moments. Their sample Lmoments will, however, be different owing to sampling variability. The easiest (but subjective) method is the visual assessment of the dispersion of the at-site Lmoments such as the L-moment coefficient of variation (L-CV), the L-skewness (L-CS) or the L-kurtosis (L -CK). This can be done based on a plot of L-CS versus L-CV or L-CK (also called L-moment diagram). An alternative (and more objective) simple measure of the dispersion of the sample Lmoments is the standard deviation of the at-site L-CVs. We use L-CVs because between-site variation in L-CV has a much larger effect than variation in the L-CS or L-CK on the variance of the estimates of quantiles, except those in the far tail of the distribution (Hosking et al., 1985a). To establish what would be expected as dispersion, simulation can be used. By repeated simulation of a homogeneous region with sites having record lengths the same as those of the 5 observed data, the mean and the standard deviation of the chosen dispersion measure can be obtained (Madsen et al., 1997b). The level of homogeneity or heterogeneity measure, H, of a region then can be expressed as: Y − Yˆ (11) H= σ where Y is the statistic or variable considered to test the homgeneity, Yˆ the mean of the simulated values for this variable and σ is the standard deviation of the simulated values. However, a more appropriate statistic compares the observed and the simulated dispersion, through the weighted standard deviation (V) of the at-site L -CVs: N V= ∑ n (t i i i =1 − t )2 (12) N ∑n i=1 i where t i and t are the sample L-CVs at each site i and the regional mean L-CV respectively, while ni are the sample record lengths at each site i and N the number of sites. Then fit a distribution to the group average L-moments 1, t , t 3 , t 4 , which are the mean sample Lmoment ratios obtained from a finite sample. The regional or group average L-moment ratios, with sites weighted proportionally to their record lengths, are to be calculated: N t = N ∑ n i ti i =1 N ∑ i =1 , tr = ni ∑ i =1 N nit r ∑ i =1 i , r = 3 , 4 , ... (13) ni where the sample L -moment ratios at site i are denoted by ti2, ti3 , ti4 , etc. Finally, a large number of regions (Nsim) is to be simulated from this distribution; the regions are homogeneous and have no cross correlation or serial correlation, and sites have the same record lengths as the observed data. For each simulated region calculate V and from the simulations determine the mean and standard deviation of the N sim values of V and call these µv and σv. Mathematically, equation (11) can be written as: H = V − µV σV (14) in which H is the heterogeneity measure statistic. The region is declared to be heterogeneous if H is sufficiently large. The region is regarded as “acceptably homogeneous” if H < 1, “possibly heterogeneous” if 1≤ H <2, and “definitely heterogene ous” if H ≥ 2 (Hosking and Wallis, 1993). 6 Regional distribution analysis For each of the homogeneous regions identified, a regional distribution needs to be derived. One approach is based on the empirical distributions determined for all the sites within the region. The average of the empirical distributions is determined to represent the frequency curve for the region. An investigation using this approach can be found hereafter in Figure 4 for Tanzania. Regional flood frequenc y analysis can also be based on the L-moments: The L-moments λ 1, τ2, τ3, …, τp are estimated by the corresponding sample L-moments of the at-site statistics. Hosking et al. (1985a), Lettenmaier and Potter (1985), Wallis and Wood (1985), Lettenmaier et al. (1987), Hosking and Wallis (1988), and Potter and Lettenmaier (1990) have shown that index flood procedures based on PWMs or L-moments yield robust and accurate quantile estimates. 7 NILE BASIN RESULTS The regional homogeneity methods mentioned above are applied to the flow data at gauging stations in Kenya and Tanzania. Table 1 gives an overview of the stations considered in the study. Serial River no. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 Table 1: List of discharge stations used in the analysis Station Area Country Period (km2) record Ngono Ngono Ngono Ruvuma Kagera Moame Magogo Simiyu Simiyu Kwoittobos Kwoittobos Noigameget Nundoro Sergoit Sosiani Miriu Yala Nzoia Yala Nzoia Yala Kipkarren Kipwen Little nzoia Rongit Nyando Kuja Migori Kipkarren Muhutwe Kalebe brg Kyaka rd brg Mwendo ferry Nyakanyasi Mabuki brg Shinyanga rd Road crossing Ndagalu 1be06 1be01 1bc01 1cb08 1ca02 1cb05 1jf06 1fg02 1ee01 1fg01 1da02 1fe02 1ce01 1cb09 1bb01 1bg07 1gd04 1kc03 780 1185 2608 48228 1410 1212 10659 10560 808 715 681 167 717 697 394 2864 11849 2388 8417 1577 2440 80 1474 684 2520 3046 Tanzania Tanzania Tanzania Tanzania Tanzania Tanzania Tanzania Tanzania Tanzania Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya Kenya of Years of record 1971-1982 9 1970-1982 10 1970-1982 11 1970-1982 10 1970-1978 9 1970-1982 11 1970-1982 11 1970-1978 9 1970-1982 11 1956-1984 29 1956–1975 20 1950-1985 36 1964-1984 21 1960 – 1985 26 1960 – 1989 30 1964-1988 25 1959 –1985 27 1963-1994 32 1950-1985 36 1950-1988 39 1962-1985 24 1950-1987 38 1964-1984 21 1957-1985 29 1961-1984 24 1956-1988 33 1951-1985 35 1cd01 67 Kenya 1932-1987 56 Homogeneity test results Results of the methods that were used to determine regional homogeneity are presented in this section. The STU-index method has been applied to 19 stations on the Kenyan side of the basin in order to test the homogeneity of this part of the basin. The results are shown in figure1. 8 STU-index versus Rank 0.4 0.35 STU-index 0.3 0.25 0.2 0.15 0.1 0.05 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Rank Figure 1: STU-index versus Rank It can be seen in figure 1 that a line fitted to the 19 locations is close to linear which is indicative that the 19 sites together show some degree of homogeneity. It should be noted that the grouping under this method is largely based on outlier detection and should not be conclusive. The 19 sites were further subjected to other statistical tests for homogeneity in order to validate the results. These statistics are based on the visual assessment of the L-moment statistics and the homogeneity measure statistics as presented by equations 5-14. The homogeneity measure criterion involves the computation of a heterogeneity measure statistics for the observed and simulated data and comparing the results: The resulting H statistics computed is approximately -1.25 and the absolute value is indicative of a fairly good homogeneous region. The results also agree very well with the homogeneity results from the STU-index method. 9 L-moment ratio diagrams The L-moment ratio diagrams below (Figure 2 and 3) display the results of the visual assessment of the dispersion of the at-site L-moments obtained by plotting L-CS versus L -CV or L-CK on a graph. Simulated data 0.3 0.25 LCV 0.2 0.15 0.1 0.05 0 -1.4 -1.2 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 L-skewness LCV Observed 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -1.5 -1 -0.5 0 0.5 L-skewness Figure 2: L-moment ratio diagrams for L-CV and L-skewness (L -CS) 10 1 Simulated 0.7 0.6 L-kurtosis 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 L-skewness L-kurtosis Observed 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -1.5 -1 -0.5 0 0.5 1 L-skewness Figure 3: L -moment ratio diagrams for L -kurtosis (L-CK) and L-skewness (L-CS) The figure 1 and Figure 2 show similar spread for the observed and simulated statistics in each case. This is an indication that the stations belong to a similar homogeneous region and therefore can be represented by common L-moment statistics. The stations can thus be recommended for regional analysis. Similar work has been done for the Tanzanian stations indicated in Table 1. For these stations, it was concluded that they belong to one homogeneous region. Results of regional analysis based on empirical distributions For each of the homogeneous regions, regional flood frequency distributions can be derived. Example of the results based on empirical distributions is shown in Figure 4 for the Tanzanian stations. Empirical distributions are determined first for each site by ranking the standardized data (after dividing by the scale parameter) in ascending order and then assigning to each of the ranked standardised flow magnitudes the probability of nonexceedance by using the Gringorten plotting position formula. Plots of empirical distributions 11 for all the stations, i.e. plots of ranked standardised flow versus non-exceedance probability, were plotted on the same graph. From the figure it is observed that the empirical distributions for the different sites plot close to each other indicating that the stations in the study area constitute a single homogenous region. This finding suggests that a single theoretical extreme distribution can be used to derive a regional frequency curve for the Lake Victoria subbasin in Tanzania. Referring to the calibration results reported in Opere et al. (2005), the selected distribution to fit the AM series was the EV1/Gumbel distribution for the 9 Tanzanian sites. On this basis the EV1/Gumbel distribution using the PWM method of parameter estimation was used to derive the regional frequency curve for the region. The derived curve is shown in Figure 4. From the figure it can be observed that EV1/Gumbel distribution fitted by PWM method gives a good fit to the observed data. The equation of the regional frequency curve is ( Q(T) = 0.801 + 0.3451 y(T) ), with y(T) the reduced variate -ln(-ln(F(Q))). Plot of Empirical Distributions Nyakanyasi Shinyanga SimiyuRb Ndagalu Muhutwe Mean Kalebe PWM Kyaka Mwendo Mabuki 2.50 2.00 Q/MAF 1.50 1.00 0.50 -1.500 -1.000 -0.500 0.00 0.000 0.500 1.000 1.500 2.000 2.500 -ln(-ln(F(Q))) Figure 4: The plot of empirical distributions for the Tanzanian sites studied 12 3.000 CONCLUSION The results indicated that the 19 locations on the Kenyan side of the lake Victoria basin could be grouped together into hydrological homogeneous regions. Observations from the sites on the Tanzania side of the basin, based on empirical distributions also indicate that the stations in the study area constitute a single homogenous region which suggests that a single theoretical extreme value distribution can be used for each of the regions to derive a regional frequency curve for the Lake Victoria sub-basins in Kenya and Tanzania. The homogeneity grouping formed the fundamental base for regional modeling. On this basis the EV1/Gumbel distribution using the PWM method of parameter estimation was used to derive the regional frequency curves for the region. ACKNOWLEDGMENTS This paper was prepared based on the research activities of the FRIEND/Nile Project which is funded by the Flemish Government of Belgium through the Flanders-UNESCO Science Trust Fund cooperation and executed by UNESCO Cairo Office. The authors would like to express their great appreciation to the Flemish Government of Belgium, the Flemish experts and universities for their financial and technical support to the project. The authors are indebted to UNESCO Cairo Office, the FRIEND/Nile Project management team, overall coordinator, thematic coordinators, themes researchers and the implementing institutes in the Nile countries for the successful execution and smooth implementation of the project. Thanks are also due to UNESCO Offices in Nairobi, Dar Es Salaam and Addis Ababa for their efforts to facilitate the implementation of the FRIEND/Nile activities. REFERENCES Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis (1979: Probability weighted moments: Definition and relation to parameters of several distributions expressable in inverse form, Water Resour. Res., 15, 1049-1054. Hosking, J.R.M , Wallis J.R. and Wood, E.F. (1985a): An appraisal of of the regional flood frequency procedure in the U.K. Flood Studies Report, Hydrol. Sci. J., 30(1), 85109. Hosking, J.R.M (1986): The theory of probability weighted moments, Res. Rep. RC12210, IBM Res., Yorktown Heights, N.Y. Hosking, J.R.M, and J.R.Wallis (1988): The effect of intersite dependence on regional flood frequency analysis, Water Resour. Res., 24(4), 588-600. Hosking, J.R.M (1990): L -moments: Analysis and estimation of distributions using linear combinations of order statistics, J. R. Stat. Soc. B, 52(1), 105-124. Hosking, J.R.M and J.R. Wallis (1993): Some Statistics Useful in Regional Frequency Analysis.Water Resour. Res.,29(2), 271-281. Lettenmaier, D.P., and K.W. Potter (1985): Testing flood frequency estimation methods using a regional flood generation model, Water Resour. Res.,21, 1903-1914. 13 Lettenmaier, D.P., J.R. Wallis, and E.F. Wood (1987): Effect of regional heterogeneity on flood fr equency estimation, Water Resour. Res., 23(2), 313-323. Lu, L.H. and Stedinger, J.R.1992: Sampling variance of normalized GEV/PWM quantile estimators and a regional homogeneity test. J. Hydrol. 138, 223-245. Madsen, H., C.P. Pearson and D. Rosbjerg (1997b): Comparison of annual maximum series and partial duration series methods for modelling extreme hydrologic events. 2. Regional Modelling. Water Resour. Res.,33(4), 759-769. Opere A.O., Mkhandi S., and Willems P. (2005): At site flood frequency analysis For the nile equatorial basins (submitted to the conference) Pilon, P.J., and K. Adamowski (1992): The value of regional information to flood frequency analysis using the method of L-moments, Can. J. Civ. Eng.,19, 137-147. Potter, K.W, and D.P. Lettenmaier (1990): A comparison of regional flood frequency estimation methods using a resampling method, Water Resour. Res., 26, 415-424. Vogel, R.M., and N.M. Fennessay (1993): L-moment diagrams should replace product moment diagrams, Water Resour. Res.,29(6), 1745-1752. Vogel, R.M., T.A. McMahon, and F.H.S Chiew (1993a): Flood flow frequency model selection in Australia, J. Hydrol.,146, 421-449. Vogel, R.M., W.O. Thomas, and T.A. McMahon (1993b): Flood flow frequency model selection in Southwestern United States, J. Water Resour. Plann. Manage.,119(3), 353366. Wallis, J.R., and E.F. Wood (1985): Relative accuracy of log Pearson III procedures, J. Hydraul. Eng., 111(7), 1043-1056. 14

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download homogeneity testing for peak flow in catchments in the