Patrick Belmont
Utah Watershed Coordinators Workshop
July 21, 2015

Flood Frequency Analysis

Background:

Flood frequency analysis involves predicting the probability that a given flow rate will be equaled or exceeded in a given time period (usually one year) based on an observed record of peak discharges. The methods used for making this prediction rely on some basic statistical principles, so the first part of this discussion reviews the relevant parts of statistics theory.

Given a sequence of data, it is possible to develop a set of descriptive statistics. The most commonly used statistics in flood frequency analysis are the mean μ, standard deviation σ, and skew coefficient γ. In theory, we could know the true values of the mean, standard deviation, and skew only if we had perfect knowledge of an infinitely long series of data. In practice, we never know the true values of these statistics. We can only estimate them from the n available observations x_i using equations 1, 2 and 3.

Estimate for μ:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{1}$$

Estimate for σ:

$$\hat{\sigma} = \left[\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \hat{\mu}\right)^2\right]^{1/2} \tag{2}$$

Estimate for γ:

$$\hat{\gamma} = \frac{n\sum_{i=1}^{n}\left(x_i - \hat{\mu}\right)^3}{(n-1)(n-2)\,\hat{\sigma}^3} \tag{3}$$

These 3 statistics are used to describe probability density functions, or PDFs. The area under the PDF curve, above or below a particular Q, represents the probability that the given Q value will be exceeded (area to the right of Q) or not exceeded (area to the left of Q). In the same way, the area under the curve between two Q values gives the probability of a flood occurring that falls between those two Q values (Figure 1).

Figure 1. Graphical description of a Probability Density Function (PDF). By definition, the total area beneath a PDF is 1.

In flood frequency analysis, we are generally interested in the probability that a flood will be greater than a given value. To make this prediction, we use the cumulative distribution function or CDF, denoted by F(Q), which is defined as the area beneath the PDF up to a given value of Q (equation 4).
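Equations 1 to 3 can be evaluated directly from a record of peaks. The Python sketch below is a minimal illustration, assuming the standard sample estimators (n − 1 in the standard deviation and n/((n − 1)(n − 2)) in the skew). The function name `sample_statistics` and the peak values are made up for illustration and do not come from any gage.

```python
import math

def sample_statistics(x):
    """Return (mean, standard deviation, skew) estimates for a sequence x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 values to estimate skew")
    mean = sum(x) / n                                              # equation 1
    std = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))   # equation 2
    skew = (n * sum((xi - mean) ** 3 for xi in x)
            / ((n - 1) * (n - 2) * std ** 3))                      # equation 3
    return mean, std, skew

if __name__ == "__main__":
    # Hypothetical annual peak discharges (illustrative values only).
    peaks = [120.0, 95.0, 310.0, 150.0, 88.0, 540.0, 132.0, 101.0]
    mean, std, skew = sample_statistics(peaks)
    print(f"mean = {mean:.1f}, std = {std:.1f}, skew = {skew:.2f}")
```

Estimates like these set the shape of the PDF f(Q), and through it the CDF F(Q) of equation 4.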
In other words, F(Q) is the nonexceedance probability (the probability that the given flood will not occur or be exceeded in a given time period).

$$F(Q) = \int_{-\infty}^{Q} f(q)\,dq \tag{4}$$

The probability that a flood would be higher than a given value is called the exceedance probability P and is simply one minus the nonexceedance probability F (equation 5).

$$P(Q) = 1 - F(Q) \tag{5}$$

The exceedance probability is often expressed in terms of its reciprocal (equation 6), which is referred to as the recurrence interval or return period (T).

$$P = \frac{1}{T} \tag{6}$$

It is important to remember that the return period is a statistical construct that represents the average time between events of a given magnitude. It does not imply that the event of interest occurs once every T years.

Finally, if the exceedance probability P(Q) for a given discharge is known, it is possible to compute the probability of that particular discharge being exceeded at least once over any specific time period. This is typically referred to as hydrologic risk (R_H) and is computed using equation 7.

Probability of exceedance at least once in n years = Hydrologic Risk:

$$R_H = 1 - (1 - P)^n \tag{7}$$

For example, we can compute the probability of a 100-year flood occurring (or being exceeded) in any given year as:

T = 100
P = 1/T = 1/100 = 0.01 (i.e., a 1% probability)

Probability of exceedance of the 100-year event at least once in a given 100-yr period:

$$P(Q > Q_{100}\ \text{at least once in 100 years}) = 1 - (1 - 0.01)^{100} = 1 - 0.99^{100} = 0.634$$

The probability is not one, since it is not guaranteed that the 100-year flood will occur in any of the 100 given years.

Parametric Probability Distributions:

Flood frequency analysis is generally performed by fitting a CDF to peak flow data observed at a stream gaging site. There are many possible analytical distributions to choose from. Perhaps the best known is the normal distribution, whose PDF and CDF are fully described by two parameters: the mean μ and the standard deviation σ.
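As a minimal sketch of how equations 5 to 7 chain together, the Python snippet below evaluates a normal CDF from μ and σ, then converts the result to an exceedance probability, a return period, and a hydrologic risk. The function names and the values of μ, σ and Q are illustrative assumptions, not data from any gage.

```python
import math

def normal_cdf(q, mu, sigma):
    """Nonexceedance probability F(q) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((q - mu) / (sigma * math.sqrt(2.0))))

def exceedance_probability(F):
    """Equation 5: P = 1 - F."""
    return 1.0 - F

def return_period(P):
    """Equation 6 rearranged: T = 1 / P."""
    return 1.0 / P

def hydrologic_risk(P, n):
    """Equation 7: probability of at least one exceedance in n years."""
    return 1.0 - (1.0 - P) ** n

if __name__ == "__main__":
    # Hypothetical parameters and discharge (illustrative values only).
    mu, sigma, q = 1000.0, 300.0, 1600.0
    P = exceedance_probability(normal_cdf(q, mu, sigma))
    print(f"P = {P:.4f}, T = {return_period(P):.1f} years")
    # The 100-year event over a 100-year period, as in the worked example.
    print(f"risk = {hydrologic_risk(0.01, 100):.3f}")
```

The normal CDF is used here only because it needs nothing beyond μ and σ.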
However, the normal distribution does not do a good job describing flood frequency data, which tend to exhibit skewed distributions, with a 'fat tail' to the right (called 'right skewed'), indicating a disproportionate number of low frequency, high magnitude events (Figure 2).

Figure 2. The PDF f(x) plotted against x for a normal distribution (blue, skew = 0) and a skewed probability distribution with the same mean μ and a similar standard deviation σ. The skewed distribution is said to be positively (or right) skewed, since it represents a larger probability for large events to occur than does the unskewed distribution.

Flood data are not typically symmetrically distributed like the normal distribution. In fact, flood data typically exhibit a high degree of skew. One reason for this is that flood flows vary over orders of magnitude but have a lower limit of zero, so that the high floods cause the mean to be very high relative to most floods (i.e., the median or mode). Skew can be reduced by taking the log of the flood peaks. However, even log-transformed peak flow data very rarely follow a normal distribution and still tend to be skewed.

There are several skewed parametric distributions (i.e., PDFs that are defined by a few parameters) that are commonly used for describing flood frequency data. The most common among these are the Log Normal distribution and the Log Pearson Type III (LPIII) distribution. The log normal distribution is simply defined by the PDF and CDF for a normal distribution (i.e., the z-score table), except that the variable is no longer the observed discharge Q but the log of Q, and the mean μ_logQ and standard deviation σ_logQ refer to the mean and standard deviation of the log-transformed data.
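A minimal sketch of the log normal approach follows, assuming base-10 logs and a made-up series of annual peaks. The statistics are computed on log Q, and the normal CDF is then evaluated at the log of the discharge of interest. The function name `lognormal_nonexceedance` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def lognormal_nonexceedance(peaks, q):
    """Nonexceedance probability F(q) from a log normal fit to peaks."""
    logs = [math.log10(x) for x in peaks]   # log-transform every peak first
    mu_log = mean(logs)                     # mean of the log-transformed data
    sigma_log = stdev(logs)                 # sample standard deviation (n - 1)
    return NormalDist(mu_log, sigma_log).cdf(math.log10(q))

if __name__ == "__main__":
    # Hypothetical annual peak discharges (illustrative values only).
    peaks = [120.0, 95.0, 310.0, 150.0, 88.0, 540.0, 132.0, 101.0]
    F = lognormal_nonexceedance(peaks, 400.0)
    print(f"F(400) = {F:.3f}, exceedance P = {1.0 - F:.3f}")
```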
Note that the average of a series of log-transformed values is not the same as the log of the average of a series of values:

$$\mu(\log Q) \neq \log(\mu(Q))$$

The LPIII distribution also uses the log-transformed Q data, but adds another parameter to account for skew in the log-transformed dataset. The procedure for fitting this distribution to data is presented in the next section.

To establish common practices in hydrologic analysis, in 1981 the US Water Resources Council (WRC) recommended using the Log Pearson Type III probability distribution for flood frequency analysis. Since that time, the LPIII distribution has become the most commonly used approach. Estimating skew from small datasets potentially introduces a large amount of error. Consequently, Bulletin 17B recommends weighting the skew computed from the data at a gage with a regional skew coefficient computed from as many other nearby gages as possible. The assumption is that most streams in a region will have a similar skew in their flood frequency distributions, and that this can be used to get a more accurate estimate.

Using the LPIII distribution to estimate peak discharge for a given recurrence interval:

Equation 8 is used to predict the discharge for a given recurrence interval (T) flood:

$$\log Q_T = \mu_{\log Q} + K_T\,\sigma_{\log Q} \tag{8}$$

where μ_logQ is the mean of the log-transformed annual peak flow series, σ_logQ is the standard deviation of the log-transformed annual peak flow series, and K_T is called the frequency factor, which depends on the skew of the log-transformed annual peak flow series as well as the return period of interest and is taken from a table (see the spreadsheet or Table 1 below, from Mays, 2005, Water Resources Engineering, Wiley). In the spreadsheet provided, this K_T value is automatically computed using a macro when you click the 'compute KT' button.
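Where neither the table nor the spreadsheet macro is at hand, equation 8 can be sketched in Python as below. Note the swap: this sketch does not look K_T up in a table. It approximates K_T with the Wilson-Hilferty transformation, whose accuracy degrades as the magnitude of the skew grows, so tabulated values should take precedence where they differ. The function names and the values of μ_logQ, σ_logQ and skew are hypothetical.

```python
from statistics import NormalDist

def frequency_factor(skew, T):
    """Approximate K_T (Wilson-Hilferty), not the tabulated value."""
    # Standard normal quantile for the nonexceedance probability 1 - 1/T.
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)
    if abs(skew) < 1e-6:
        return z                      # zero skew reduces to the normal case
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)

def log_q_t(mu_log, sigma_log, skew, T):
    """Equation 8: log Q_T = mu_logQ + K_T * sigma_logQ."""
    return mu_log + frequency_factor(skew, T) * sigma_log

if __name__ == "__main__":
    # Hypothetical log10 statistics of an annual peak series.
    mu_log, sigma_log, skew = 3.0, 0.25, 0.2
    for T in (2, 10, 100):
        print(f"T = {T:>3}: K_T = {frequency_factor(skew, T):.3f}, "
              f"log Q_T = {log_q_t(mu_log, sigma_log, skew, T):.3f}")
```

The result is still in log space.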
To compute the discharge of interest (Q_T), you must back-transform the log Q_T value according to the following equation (do this by simply hitting the 10^x button on your calculator):

$$Q_T = 10^{\log Q_T}$$

Table 1. K_T values based on skew and return interval for Log-Pearson III flood frequency analysis