Patrick Belmont
Utah Watershed Coordinators Workshop
July 21, 2015
Flood Frequency Analysis
Background:
Flood frequency analysis involves predicting the probability that a given flow rate will be
equaled or exceeded in a given time period (usually one year) based on an observed record of
peak discharges. The methods used for making this prediction rely on some basic statistical
principles, so the first part of this discussion will be a review of the relevant parts of statistics
theory.
Given a sequence of data, it is possible to develop a set of descriptive statistics. The most
commonly used statistics in flood frequency analysis are the mean μ, standard deviation σ, and
skew coefficient γ. Theoretically we could only know the true values of the mean, standard
deviation, and skew if we had perfect knowledge of an infinitely long series of data. In practice,
we never know the true values of these statistics. We can only estimate them based on available
data using equations 1, 2 and 3.
Estimate for μ = $\frac{1}{n}\sum_{i=1}^{n} x_i$    (1)
Estimate for σ = $\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \mu\right)^2\right]^{1/2}$    (2)
Estimate for γ = $\frac{n\sum_{i=1}^{n}\left(x_i - \mu\right)^3}{(n-1)(n-2)\,\sigma^3}$    (3)
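As a minimal sketch of equations 1, 2 and 3, the three estimates can be computed directly from a list of values. The function name and the sample peak discharges are hypothetical and serve only as an illustration.

```python
import math

def sample_statistics(x):
    """Return estimates of the mean, standard deviation, and skew coefficient."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 values to estimate skew")
    # Equation 1: sample mean
    mean = sum(x) / n
    # Equation 2: sample standard deviation (n - 1 in the denominator)
    std = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    # Equation 3: sample skew coefficient
    skew = n * sum((xi - mean) ** 3 for xi in x) / ((n - 1) * (n - 2) * std ** 3)
    return mean, std, skew

# Hypothetical annual peak discharges (illustrative values only)
peaks = [120.0, 85.0, 310.0, 150.0, 95.0, 640.0, 210.0, 130.0]
mean, std, skew = sample_statistics(peaks)
print(f"mean = {mean:.1f}, std = {std:.1f}, skew = {skew:.2f}")
```

A single large peak in a short record pulls the skew estimate strongly positive, which illustrates how sensitive this statistic is to the available data.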
These 3 statistics are used to describe probability density functions or PDFs. The area under the
PDF curve, above or below a particular Q, represents the probability that the given Q value will
be exceeded (area to the right of Q) or not exceeded (area to the left of Q). In the same way,
calculating the area under the curve between two Q values gives you the probability of a flood
occurring that falls between those two Q values (Figure 1).
Figure 1. Graphical description of a Probability Density Function (PDF).
By definition, the total area beneath a PDF is 1. In flood frequency analysis, we are generally
interested in the probability that a flood will be greater than a given value. To make this
prediction, we use the cumulative distribution function or CDF, denoted by F(Q), which is defined as
the area beneath the PDF up to a given value of Q (equation 4). In other words, this is the non-exceedance probability (the probability that the given flow will not be equaled or exceeded in a
given time period).
$F(Q) = \int_{-\infty}^{Q} f(Q)\,dQ$    (4)
The probability that a flood would be higher than a given value is called the exceedance
probability P and is simply one minus the non-exceedance probability F (equation 5).
$P(Q) = 1 - F(Q)$    (5)
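The relationship between the PDF area, the non-exceedance probability F and the exceedance probability P can be sketched with Python's standard library. A normal distribution is assumed here only because it is simple to evaluate; its parameters and the function names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical distribution of annual peak flow (illustrative parameters;
# a normal distribution is an assumption made for simplicity).
dist = NormalDist(mu=200.0, sigma=50.0)

def non_exceedance(q):
    """F(Q): area under the PDF to the left of q."""
    return dist.cdf(q)

def exceedance(q):
    """P(Q) = 1 - F(Q): area under the PDF to the right of q."""
    return 1.0 - non_exceedance(q)

def between(q_low, q_high):
    """Area under the PDF between q_low and q_high."""
    return non_exceedance(q_high) - non_exceedance(q_low)

print(exceedance(300.0))      # probability of exceeding 300
print(between(150.0, 250.0))  # probability of a peak between 150 and 250
```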
The exceedance probability is often expressed in terms of its reciprocal (equation 6), which is
referred to as the recurrence interval or return period (T).
$P = \frac{1}{T}$    (6)
It is important to remember that the return period is a statistical construct that represents the
average time period between events of a given magnitude and does not imply that the event of
interest occurs once every T years. Finally, if the exceedance probability, P(Q) for a given
discharge is known, it is possible to compute the probability for that particular discharge being
exceeded at least once over any specific time period. This is typically referred to as hydrologic
risk (RH) and is computed using equation 7.
Probability of exceedance at least once in n years = Hydrologic Risk:
$R_H = 1 - (1 - P)^n$    (7)
For example, we can compute the probability of a 100-year flood occurring (or being exceeded)
in any given year as:
T = 100
P = 1/T = 1/100 = 0.01 (i.e., a 1% probability)
Probability of exceedance of the 100-year event at least once in a given 100-yr period =
$P(Q > Q_{100} \text{ at least once in 100 years}) = 1 - (1 - 0.01)^{100} = 1 - 0.99^{100} = 0.634$
The probability is not one, since it is not guaranteed that the 100-year flood will occur in any of
the 100 given years.
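The same calculation can be repeated for any return period and planning horizon. The sketch below is a minimal illustration with hypothetical function names; it assumes that each year is independent of the others, which is what the formula for hydrologic risk implies.

```python
def exceedance_probability(return_period):
    """Annual exceedance probability P for a return period T (in years)."""
    return 1.0 / return_period

def hydrologic_risk(p, n_years):
    """Probability of at least one exceedance in n_years, assuming independent years."""
    return 1.0 - (1.0 - p) ** n_years

p100 = exceedance_probability(100)
for n in (1, 10, 30, 100):
    print(n, round(hydrologic_risk(p100, n), 3))
```

For the 100-year flood the risk is about 0.26 over a 30-year period and about 0.63 over a 100-year period.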
Parametric Probability Distributions:
Flood frequency analysis is generally performed by fitting a CDF to peak flow data observed at a
stream gaging site. There are many possible analytical distributions to choose from. Perhaps the
best known is the normal distribution whose PDF and CDF are fully described by two
parameters: the mean μ and the standard deviation σ.
However, the normal distribution does not do a good job describing flood frequency data, which
tend to exhibit skewed distributions, with a ‘fat tail’ to the right (called ‘right skewed’),
indicating a disproportionate number of low frequency, high magnitude events (Figure 2).
[Figure 2 plot: f(x) versus x; curves: Normal Distribution and A Skewed Distribution With the Same Mean]
Figure 2. The PDF for a normal distribution (blue, skew = 0) and a skewed probability distribution with the same
mean μ and a similar standard deviation σ. The skewed distribution is said to be positively (or right) skewed, since
it represents a larger probability for large events to occur than does the unskewed distribution.
Flood data are not typically symmetrically distributed like the normal distribution. In fact, flood
data typically exhibit a high degree of skew. One reason for this is that flood flows vary over
orders of magnitude, but have a lower limit of zero, so that the high floods cause the mean to be
very high relative to most floods (i.e., the median or mode). Skew can be reduced by taking the
log of the flood peaks. However, even log-transformed peak flow data very rarely follow a normal
distribution and still tend to be skewed.
There are several skewed parametric distributions (i.e., PDFs that are defined by a few
parameters) that are commonly used for describing flood frequency data. The most common
among these are the Log Normal distribution and the Log Pearson type III (LPIII) distribution.
The log normal distribution is simply defined by the PDF and CDF for a normal distribution (i.e.
the z-score table), except that the variable Q is no longer observed discharge but the log of Q,
and the mean $\mu_{\log Q}$ and standard deviation $\sigma_{\log Q}$ refer to the mean and standard deviation of the
log-transformed data. Note that the average of a series of log-transformed values is not the same
as the log of the average of a series of values:
$\mu(\log(Q)) \neq \log(\mu(Q))$
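A quick numerical check makes the inequality concrete. The three flow values below are hypothetical and were chosen to span orders of magnitude.

```python
import math

# Hypothetical peak flows spanning orders of magnitude (illustrative only)
flows = [10.0, 100.0, 1000.0]

# Mean of the log-transformed values
mean_of_logs = sum(math.log10(q) for q in flows) / len(flows)
# Log of the mean of the untransformed values
log_of_mean = math.log10(sum(flows) / len(flows))

print(mean_of_logs)
print(log_of_mean)
```

The mean of the logs corresponds to the geometric mean of the flows (100 here), while the log of the mean corresponds to the arithmetic mean (370 here), so the two are not interchangeable when fitting a log-based distribution.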
The LPIII distribution also uses the log-transformed Q data, but then uses another parameter to
account for skew in the log transformed dataset. The procedure for fitting this distribution to data
will be presented in the next section.
To establish common practices in hydrologic analysis, in 1981 the US Water Resources Council
(WRC) recommended using the Log Pearson Type III probability distribution for flood
frequency analysis. Since that time, the LPIII distribution has become the most commonly used
approach. Estimating skew from small datasets potentially introduces a large amount of error.
Consequently, Bulletin 17B, the federal guidelines for determining flood flow frequency, recommends weighting the skew computed from the data at a gage
with a regional skew coefficient computed from as many other nearby gages as possible. The
assumption is that most streams in a region will have a similar skew in their flood frequency
distributions, and that this can be used to get a more accurate estimate.
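One way to picture the weighting is the sketch below. It assumes that each skew estimate is weighted in inverse proportion to its mean square error (MSE), so the less certain estimate counts for less. The function name and all input values are hypothetical; in practice the MSE values would come from Bulletin 17B or a regional study, not from this sketch.

```python
def weighted_skew(station_skew, mse_station, regional_skew, mse_regional):
    """Combine station and regional skew, weighting each inversely to its MSE.

    Assumption: inverse-MSE weighting; the MSE inputs are supplied by the user.
    """
    numerator = mse_regional * station_skew + mse_station * regional_skew
    return numerator / (mse_regional + mse_station)

# Hypothetical values: a short gage record (large MSE) and a regional estimate
print(weighted_skew(station_skew=0.6, mse_station=0.3,
                    regional_skew=0.1, mse_regional=0.1))
```

With these inputs the result lies closer to the regional value, because the station estimate carries the larger error.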
Using the LPIII distribution to estimate peak discharge for a given recurrence interval:
Equation 8 is used to predict the discharge for a given recurrence interval (T) flood:
$\log Q_T = \mu_{\log Q} + K_T\,\sigma_{\log Q}$    (8)
where $\mu_{\log Q}$ is the mean of the log-transformed annual peak flow series, $\sigma_{\log Q}$ is the standard
deviation of the log-transformed annual peak flow series, and $K_T$ is called the frequency factor,
which depends on the skew of the log-transformed annual peak flow series as well as the return
period of interest and is taken from a table (see the spreadsheet or Table 1 below, from Mays, 2005,
Water Resources Engineering, Wiley). In the spreadsheet provided, this $K_T$ value is
automatically computed using a macro when you click the ‘compute KT’ button. To compute the
discharge of interest ($Q_T$) you must back-transform the $\log Q_T$ value according to the following
equation (do this by simply hitting the 10^x button on your calculator):
$Q_T = 10^{\log Q_T}$
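The calculation can also be sketched in code. Note the substitution: instead of looking up the frequency factor in the table, this sketch approximates it with the Wilson-Hilferty transformation, which is generally reasonable for modest skew but is not the table or the spreadsheet macro. The function names and input statistics are hypothetical, and the table values should be preferred where they are available.

```python
from statistics import NormalDist

def frequency_factor(return_period, skew):
    """Approximate K_T (Wilson-Hilferty approximation, NOT the table lookup)."""
    # Standard normal deviate with non-exceedance probability 1 - 1/T
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    if abs(skew) < 1e-6:
        return z  # zero skew: K_T is the standard normal deviate (log normal case)
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)

def lp3_discharge(mean_log_q, std_log_q, skew_log_q, return_period):
    """Discharge for a given return period from log10-space statistics."""
    k_t = frequency_factor(return_period, skew_log_q)
    log_q_t = mean_log_q + k_t * std_log_q  # log Q_T = mean + K_T * std
    return 10.0 ** log_q_t                  # back-transform from log10

# Hypothetical statistics of the log10-transformed annual peak series
for T in (2, 10, 50, 100):
    print(T, round(lp3_discharge(3.0, 0.25, 0.2, T)))
```

The return period must be greater than 1 year, since the exceedance probability 1/T has to be less than one.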
Table 1. KT values based on skew and return interval for Log-Pearson III flood frequency
analysis