04/11/2006
Frequency Analysis
Reading: Applied Hydrology Chapter 12
Slides prepared by Venkatesh Merwade
Hydrologic extremes

Extreme events
  Floods
  Droughts
Magnitude of extreme events is related to their frequency of occurrence:
$$\text{Magnitude} \propto \frac{1}{\text{Frequency of occurrence}}$$
The objective of frequency analysis is to relate the magnitude of events to their frequency of occurrence through a probability distribution.
It is assumed that the events (data) are independent and come from an identical distribution.
Return Period





Random variable: $X$
Threshold level: $x_T$
Extreme event occurs if: $X \ge x_T$
Recurrence interval: $\tau$ = time between occurrences of $X \ge x_T$
Return period: $E(\tau)$, the average recurrence interval between events equalling or exceeding a threshold

If $p$ is the probability of occurrence of an extreme event, then
$$E(\tau) = T = \frac{1}{p}$$
or
$$P(X \ge x_T) = \frac{1}{T}$$
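The relation $E(\tau) = 1/p$ follows if each year is treated as an independent trial, so that the recurrence interval has a geometric distribution. A short derivation under that independence assumption (the index $k$ is introduced here for illustration):

```latex
\begin{aligned}
P(\tau = k) &= (1-p)^{k-1}\,p, \qquad k = 1, 2, \dots \\
E(\tau) &= \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\,p
         = \frac{p}{\bigl[1-(1-p)\bigr]^{2}}
         = \frac{1}{p} = T
\end{aligned}
```

The middle step uses the series $\sum_{k \ge 1} k\,q^{k-1} = 1/(1-q)^2$ with $q = 1 - p$.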
More on return period


If $p$ is the probability of success, then $(1 - p)$ is the probability of failure.
Find the probability that $X \ge x_T$ at least once in $N$ years.
$$p = P(X \ge x_T)$$
$$P(X < x_T) = 1 - p$$
$$P(X \ge x_T \text{ at least once in } N \text{ years}) = 1 - P(X < x_T \text{ all } N \text{ years})$$
$$P(X \ge x_T \text{ at least once in } N \text{ years}) = 1 - (1 - p)^N = 1 - \left(1 - \frac{1}{T}\right)^N$$
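The N-year formula can be sketched in a few lines of Python. The function name and the 100-year, 30-year values are illustrative assumptions, not taken from the slides.

```python
def prob_at_least_once(T: float, N: int) -> float:
    """Probability that X >= x_T occurs at least once in N years,
    assuming independent years with annual exceedance probability 1/T."""
    if T < 1:
        raise ValueError("return period T must be at least 1 year")
    p = 1.0 / T
    return 1.0 - (1.0 - p) ** N


# Illustrative values only: a 100-year event over a 30-year horizon.
risk = prob_at_least_once(T=100, N=30)
print(f"P(at least one exceedance in 30 years) = {risk:.3f}")
```

Even a rare event has a sizeable chance of occurring at least once over a long horizon, which is why this quantity is often more useful than T alone.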
Return period example

Dataset: annual maximum discharge for 106 years on the Colorado River near Austin
[Figure: annual maximum flow ($10^3$ cfs) by year, 1905 to 1998]
For $x_T$ = 200,000 cfs:
  No. of occurrences = 3
  2 recurrence intervals in 106 years
  T = 106/2 = 53 years
If $x_T$ = 100,000 cfs:
  7 recurrence intervals
  T = 106/7 = 15.1 years
  $P(X \ge 100{,}000 \text{ cfs at least once in the next 5 years}) = 1 - (1 - 1/15.1)^5 = 0.29$
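The arithmetic of this example can be checked with a small sketch. It uses only the counts quoted on the slide (record length and number of recurrence intervals); the function name is an assumption made for illustration. Note that 106/7 evaluates to about 15.1 years.

```python
def return_period_from_intervals(record_years: int, n_intervals: int) -> float:
    """Empirical return period: record length divided by the number of
    recurrence intervals, following the counting used in the example."""
    if n_intervals < 1:
        raise ValueError("need at least one recurrence interval")
    return record_years / n_intervals


T_200 = return_period_from_intervals(106, 2)   # threshold 200,000 cfs
T_100 = return_period_from_intervals(106, 7)   # threshold 100,000 cfs
risk_5yr = 1 - (1 - 1 / T_100) ** 5            # at least once in 5 years

print(f"T(200,000 cfs) = {T_200:.0f} years")
print(f"T(100,000 cfs) = {T_100:.1f} years, 5-year risk = {risk_5yr:.2f}")
```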
Data series
[Figure: annual maximum flow ($10^3$ cfs) by year, 1905 to 1998]
Considering the annual maximum series, T for 200,000 cfs = 53 years.
The annual maximum flow for 1935 is 481,000 cfs. The annual maximum data series probably excluded some flows that are greater than 200,000 cfs and less than 481,000 cfs.
Will T change if we consider a monthly maximum series or a weekly maximum series?
Hydrologic data series

Complete duration series
  All the data available
Partial duration series
  Magnitude greater than base value
Annual exceedance series
  Partial duration series with # of values = # years
Extreme value series
  Includes largest or smallest values in equal intervals
  Annual series: interval = 1 year
  Annual maximum series: largest values
  Annual minimum series: smallest values
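The series definitions above can be sketched as simple selections from a record. This is a minimal illustration that assumes the record is a list of (year, value) pairs; the function names and the sample flows are hypothetical, and no check is made that selected peaks are independent events.

```python
def annual_maximum_series(records):
    """Largest value in each year, ordered by year."""
    best = {}
    for year, value in records:
        if year not in best or value > best[year]:
            best[year] = value
    return [best[year] for year in sorted(best)]


def partial_duration_series(records, base):
    """All values whose magnitude is greater than the base value."""
    return [value for _, value in records if value > base]


def annual_exceedance_series(records):
    """The n largest values overall, where n is the number of years of record."""
    n_years = len({year for year, _ in records})
    return sorted((value for _, value in records), reverse=True)[:n_years]


# Hypothetical peak flows as (year, 1000 cfs), made up for illustration only.
peaks = [(2001, 90), (2001, 150), (2002, 40), (2002, 60), (2003, 210), (2003, 180)]
print(annual_maximum_series(peaks))
print(partial_duration_series(peaks, base=100))
print(annual_exceedance_series(peaks))
```

In this made-up record the annual exceedance series keeps two values from 2003 and none from 2002, while the annual maximum series keeps exactly one value per year.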
Probability distributions

Normal family
  Normal, lognormal, lognormal-III
Generalized extreme value family
  EV1 (Gumbel), GEV, and EVIII (Weibull)
Exponential/Pearson type family
  Exponential, Pearson type III, Log-Pearson type III
Normal distribution

Central limit theorem: if X is the sum of n independent and identically distributed random variables with finite variance, then with increasing n the distribution of X becomes normal regardless of the distribution of the random variables.
pdf for normal distribution:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$
$\mu$ is the mean and $\sigma$ is the standard deviation.
Hydrologic variables such as annual precipitation, annual average streamflow, or annual average pollutant loadings tend to follow a normal distribution.
Standard Normal distribution


A standard normal distribution is a normal distribution with mean ($\mu$) = 0 and standard deviation ($\sigma$) = 1.
A normal distribution is transformed to the standard normal distribution by using the following formula:
$$z = \frac{X - \mu}{\sigma}$$
z is called the standard normal variable.
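A minimal sketch of the standardization, using `statistics.NormalDist` from the Python standard library. The mean, standard deviation and test value are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist


def standardize(x: float, mu: float, sigma: float) -> float:
    """Standard normal variable z = (x - mu) / sigma."""
    return (x - mu) / sigma


# Hypothetical annual precipitation statistics (inches), illustration only.
mu, sigma = 30.0, 5.0
z = standardize(38.0, mu, sigma)

# The non-exceedance probability is unchanged by the transformation.
p_original = NormalDist(mu, sigma).cdf(38.0)
p_standard = NormalDist().cdf(z)
print(z, round(p_original, 4), round(p_standard, 4))
```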
Lognormal distribution


If the pdf of X is skewed, it is not normally distributed.
If Y = log(X) is normally distributed, then X is said to be lognormally distributed.
$$f(x) = \frac{1}{x\,\sigma_y\sqrt{2\pi}} \exp\left[-\frac{(y-\mu_y)^2}{2\sigma_y^2}\right], \qquad x > 0,\ y = \log x$$
Hydraulic conductivity and the distribution of raindrop sizes in a storm follow the lognormal distribution.
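The lognormal pdf can be checked against the normal pdf of y through a change of variables. The sketch below assumes the natural logarithm for "log" and uses hypothetical parameter values.

```python
import math
from statistics import NormalDist


def lognormal_pdf(x: float, mu_y: float, sigma_y: float) -> float:
    """pdf of X when y = ln(x) is normal with mean mu_y and std dev sigma_y."""
    if x <= 0:
        return 0.0
    y = math.log(x)
    coeff = 1.0 / (x * sigma_y * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((y - mu_y) ** 2) / (2.0 * sigma_y ** 2))


# Cross-check: f_X(x) equals the normal pdf of y = ln(x) divided by x.
x, mu_y, sigma_y = 2.5, 0.8, 0.4   # hypothetical values, illustration only
direct = lognormal_pdf(x, mu_y, sigma_y)
via_normal = NormalDist(mu_y, sigma_y).pdf(math.log(x)) / x
print(abs(direct - via_normal) < 1e-12)
```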
Extreme value (EV) distributions



Extreme values: maximum or minimum values of sets of data
Examples: annual maximum discharge, annual minimum discharge
When the number of selected extreme values is large, the distribution converges to one of the three forms of EV distributions called Type I, II and III
EV type I distribution

Let M1, M2, ..., Mn be a set of daily rainfall or streamflow values, and let X = max(Mi) be the maximum for the year. If the Mi are independent and identically distributed, then for large n, X has an extreme value type I or Gumbel distribution.
$$f(x) = \frac{1}{\alpha} \exp\left[-\frac{x-u}{\alpha} - \exp\left(-\frac{x-u}{\alpha}\right)\right]$$
$$\alpha = \frac{\sqrt{6}\,s_x}{\pi}, \qquad u = \bar{x} - 0.5772\,\alpha$$
The distribution of annual maximum streamflow follows an EV1 distribution.
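A minimal sketch of the EV1 parameter estimates and density, assuming the sample mean and standard deviation are already known. The function names and the sample statistics are hypothetical; the cdf used for the check is the standard Gumbel form $F(x) = \exp[-\exp(-(x-u)/\alpha)]$.

```python
import math


def gumbel_parameters(mean, std):
    """Moment estimates: alpha = sqrt(6) * s / pi, u = mean - 0.5772 * alpha."""
    alpha = math.sqrt(6.0) * std / math.pi
    u = mean - 0.5772 * alpha
    return u, alpha


def gumbel_pdf(x, u, alpha):
    z = (x - u) / alpha
    return math.exp(-z - math.exp(-z)) / alpha


def gumbel_cdf(x, u, alpha):
    return math.exp(-math.exp(-(x - u) / alpha))


# Hypothetical sample statistics (1000 cfs), illustration only.
u, alpha = gumbel_parameters(mean=100.0, std=20.0)
print(u, alpha, gumbel_cdf(u, u, alpha))   # the cdf at x = u is exp(-1)
```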
EV type III distribution

If Wi are the minimum streamflows on different days of the year, let X = min(Wi) be the smallest. X can be described by the EV type III or Weibull distribution.
$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left[-\left(\frac{x}{\lambda}\right)^k\right], \qquad x > 0;\ \lambda, k > 0$$
The distribution of low flows (e.g. 7-day min flow) follows the EV3 distribution.
Exponential distribution


Poisson process: a stochastic process in which the numbers of events occurring in two disjoint subintervals are independent random variables.
In hydrology, the interarrival time (time between stochastic hydrologic events) is described by the exponential distribution.
$$f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0;\ \lambda = \frac{1}{\bar{x}}$$
Interarrival times of polluted runoffs, rainfall intensities, etc. are described by the exponential distribution.
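A short sketch of fitting the exponential rate from observed interarrival times, using $\lambda = 1/\bar{x}$. The gaps below are hypothetical, and the survival formula $P(X > t) = e^{-\lambda t}$ is the standard result for this distribution.

```python
import math


def fit_rate(interarrival_times):
    """Rate lambda = 1 / (mean interarrival time)."""
    return len(interarrival_times) / sum(interarrival_times)


def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def prob_wait_exceeds(t, lam):
    """P(interarrival time > t) = exp(-lambda * t)."""
    return math.exp(-lam * t)


# Hypothetical times between storms in days, illustration only.
gaps = [2.0, 5.0, 1.0, 8.0, 4.0]
lam = fit_rate(gaps)   # mean gap is 4 days, so lambda is 0.25 per day
print(lam, prob_wait_exceeds(7.0, lam))
```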
Gamma Distribution


The time taken for a number of events ($\beta$) in a Poisson process is described by the gamma distribution.
Gamma distribution: the distribution of a sum of $\beta$ independent and identical exponentially distributed random variables.
$$f(x) = \frac{\lambda^{\beta} x^{\beta-1} e^{-\lambda x}}{\Gamma(\beta)}, \qquad x \ge 0;\ \Gamma = \text{gamma function}$$
Skewed distributions (e.g. hydraulic conductivity) can be represented using gamma without log transformation.
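The gamma pdf can be sketched with `math.gamma` from the standard library. The parameter values are hypothetical, and the boundary x = 0 is treated as zero density for simplicity, which is only exact when beta is greater than 1.

```python
import math


def gamma_pdf(x, lam, beta):
    """f(x) = lam**beta * x**(beta - 1) * exp(-lam * x) / Gamma(beta) for x > 0."""
    if x <= 0:
        return 0.0   # simplification at the boundary x = 0
    return lam ** beta * x ** (beta - 1) * math.exp(-lam * x) / math.gamma(beta)


# With beta = 1 the gamma pdf reduces to the exponential pdf lam * exp(-lam * x).
lam, x = 0.5, 2.0   # hypothetical values, illustration only
print(gamma_pdf(x, lam, beta=1.0), lam * math.exp(-lam * x))
```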
Pearson Type III

Named after the statistician Pearson, it is also called the three-parameter gamma distribution. A lower bound is introduced through the third parameter ($\epsilon$).
$$f(x) = \frac{\lambda^{\beta} (x-\epsilon)^{\beta-1} e^{-\lambda(x-\epsilon)}}{\Gamma(\beta)}, \qquad x \ge \epsilon;\ \Gamma = \text{gamma function}$$
It is also a skewed distribution, first applied in hydrology for describing the pdf of annual maximum flows.
Log-Pearson Type III

If log X follows a Pearson Type III distribution, then X is said to have a log-Pearson Type III distribution.
$$f(x) = \frac{\lambda^{\beta} (y-\epsilon)^{\beta-1} e^{-\lambda(y-\epsilon)}}{x\,\Gamma(\beta)}, \qquad y = \log x \ge \epsilon$$
Frequency analysis for extreme events
Q. Find a flow (or any other event) that has a return period of T years.
EV1 pdf and cdf:
$$f(x) = \frac{1}{\alpha} \exp\left[-\frac{x-u}{\alpha} - \exp\left(-\frac{x-u}{\alpha}\right)\right]$$
$$\alpha = \frac{\sqrt{6}\,s_x}{\pi}, \qquad u = \bar{x} - 0.5772\,\alpha$$
$$F(x) = \exp\left[-\exp\left(-\frac{x-u}{\alpha}\right)\right]$$
Define a reduced variable y:
$$y = \frac{x-u}{\alpha}$$
$$F(x) = \exp\left[-\exp(-y)\right]$$
$$y = -\ln\left[-\ln F(x)\right] = -\ln\left[-\ln(1-p)\right], \qquad \text{where } p = P(x \ge x_T)$$
$$y_T = -\ln\left[-\ln\left(1 - \frac{1}{T}\right)\right]$$
If you know T, you can find $y_T$, and once $y_T$ is known, $x_T$ can be computed by
$$x_T = u + \alpha\,y_T$$
Example 12.2.1


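Given annual maxima for 10-minute storms
Find 5- and 50-year return period 10-minute storms
$$\bar{x} = 0.649 \text{ in}, \qquad s = 0.177 \text{ in}$$
$$\alpha = \frac{\sqrt{6}\,s}{\pi} = \frac{\sqrt{6} \times 0.177}{\pi} = 0.138$$
$$u = \bar{x} - 0.5772\,\alpha = 0.649 - 0.5772 \times 0.138 = 0.569$$
$$y_5 = -\ln\left[\ln\left(\frac{T}{T-1}\right)\right] = -\ln\left[\ln\left(\frac{5}{5-1}\right)\right] = 1.5$$
$$x_5 = u + \alpha\,y_5 = 0.569 + 0.138 \times 1.5 = 0.78 \text{ in}$$
$$x_{50} = 1.11 \text{ in}$$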
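The example can be reproduced with a short sketch that chains the EV1 steps: estimate the parameters by moments, compute the reduced variate, then the quantile. The function name is an assumption; the inputs are the sample mean and standard deviation given in the example.

```python
import math


def gumbel_quantile(mean, std, T):
    """x_T for an EV1 distribution fitted by moments, for return period T > 1."""
    alpha = math.sqrt(6.0) * std / math.pi
    u = mean - 0.5772 * alpha
    y_T = -math.log(math.log(T / (T - 1.0)))   # reduced variate
    return u + alpha * y_T


for T in (5, 50):
    print(f"x_{T} = {gumbel_quantile(0.649, 0.177, T):.2f} in")
# x_5 = 0.78 in
# x_50 = 1.11 in
```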
Frequency Factors



The previous example only works if the distribution is invertible; many are not.
Once a distribution has been selected and its parameters estimated, how do we use it?
Chow proposed using:
$$x_T = \bar{x} + K_T\,s$$
where
  $x_T$ = estimated event magnitude
  $K_T$ = frequency factor
  $T$ = return period
  $\bar{x}$ = sample mean
  $s$ = sample standard deviation
[Figure: pdf $f_X(x)$ showing $x_T$ at a distance $K_T s$ above $\bar{x}$, with upper tail area $P(X \ge x_T) = 1/T$]
Normal Distribution

Normal distribution
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$
$$K_T = \frac{x_T - \bar{x}}{s} = z_T$$
So the frequency factor for the normal distribution is the standard normal variate:
$$x_T = \bar{x} + K_T\,s = \bar{x} + z_T\,s$$
Example: 50-year return period
$$T = 50; \quad p = \frac{1}{50} = 0.02; \quad K_{50} = z_{50} = 2.054$$
Look in Table 11.2.1, use -NORMSINV(.) in Excel, or see page 390 in the textbook.
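As an alternative to the table or Excel lookup, the normal frequency factor can be sketched with `statistics.NormalDist` from the Python standard library. The function name is an assumption made for illustration.

```python
from statistics import NormalDist


def normal_frequency_factor(T: float) -> float:
    """K_T = z_T, the standard normal variate with exceedance probability 1/T.
    Requires T > 1 so that the probability lies strictly between 0 and 1."""
    return NormalDist().inv_cdf(1.0 - 1.0 / T)


K_50 = normal_frequency_factor(50)
print(f"K_50 = {K_50:.3f}")   # K_50 = 2.054
```

Here `inv_cdf(1 - p)` plays the role of -NORMSINV(p), because the standard normal distribution is symmetric about zero.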
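EV-I (Gumbel) Distribution

$$F(x) = \exp\left[-\exp\left(-\frac{x-u}{\alpha}\right)\right]$$
$$\alpha = \frac{\sqrt{6}\,s}{\pi}, \qquad u = \bar{x} - 0.5772\,\alpha$$
$$y_T = -\ln\left[\ln\left(\frac{T}{T-1}\right)\right]$$
$$x_T = u + \alpha\,y_T = \bar{x} - 0.5772\,\frac{\sqrt{6}}{\pi}\,s - \frac{\sqrt{6}}{\pi}\,s\,\ln\left[\ln\left(\frac{T}{T-1}\right)\right]$$
$$x_T = \bar{x} - \frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\left[\ln\left(\frac{T}{T-1}\right)\right]\right\} s$$
$$x_T = \bar{x} + K_T\,s$$
$$K_T = -\frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\left[\ln\left(\frac{T}{T-1}\right)\right]\right\}$$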
Example 12.3.2

Given annual maximum rainfall, calculate the 5-yr storm using the frequency factor
$$K_T = -\frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\left[\ln\left(\frac{T}{T-1}\right)\right]\right\}$$
$$K_5 = -\frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\left[\ln\left(\frac{5}{5-1}\right)\right]\right\} = 0.719$$
$$x_T = \bar{x} + K_T\,s = 0.649 + 0.719 \times 0.177 = 0.78 \text{ in}$$
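The frequency factor calculation can be sketched as follows. The function name is an assumption; the sample mean 0.649 in and standard deviation 0.177 in are the values used in the example.

```python
import math


def gumbel_frequency_factor(T: float) -> float:
    """K_T = -(sqrt(6) / pi) * (0.5772 + ln(ln(T / (T - 1)))) for T > 1."""
    return -(math.sqrt(6.0) / math.pi) * (0.5772 + math.log(math.log(T / (T - 1.0))))


K_5 = gumbel_frequency_factor(5)
x_5 = 0.649 + K_5 * 0.177
print(f"K_5 = {K_5:.3f}, x_5 = {x_5:.2f} in")   # K_5 = 0.719, x_5 = 0.78 in
```

The result agrees with the value obtained from the reduced variate, since the frequency factor form is an algebraic rearrangement of the same equations.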
Probability plots



A probability plot is a graphical tool to assess whether or not the data fit a particular distribution.
The data are plotted against a theoretical distribution in such a way that the points should form approximately a straight line (the distribution function is linearized).
Departures from a straight line indicate departure from the theoretical distribution.
Normal probability plot
Steps
1. Rank the data from largest (m = 1) to smallest (m = n)
2. Assign a plotting position to the data
   Plotting position: an estimate of exceedance probability
   Use Blom's formula p = (m - 3/8)/(n + 1/4)
3. Find the standard normal variable z corresponding to the plotting position (use -NORMSINV(.) in Excel)
4. Plot the data against z
If the data fall on a straight line, the data come from a normal distribution.
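The steps for a normal probability plot can be sketched as below. This is a minimal illustration that assumes Blom's plotting position p = (m - 3/8)/(n + 1/4); the function name and the sample flows are hypothetical, and the plotting itself is left to any charting tool.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Return (z, value) pairs for a normal probability plot."""
    ranked = sorted(data, reverse=True)        # m = 1 is the largest value
    n = len(ranked)
    points = []
    for m, value in enumerate(ranked, start=1):
        p = (m - 3.0 / 8.0) / (n + 0.25)       # Blom exceedance probability
        z = NormalDist().inv_cdf(1.0 - p)      # equivalent to -NORMSINV(p)
        points.append((z, value))
    return points


# Hypothetical annual maximum flows (1000 cfs), illustration only.
flows = [120, 85, 240, 60, 150]
for z, q in normal_plot_points(flows):
    print(f"z = {z:6.3f}  Q = {q}")
```

If the printed pairs fall close to a straight line when plotted, the normal distribution is a plausible fit for the data.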
Normal Probability Plot
[Figure: normal probability plot of Q (1000 cfs) against the standard normal variable (z), with series "Data" and "Normal"]
Annual maximum flows for Colorado River near Austin, TX
The pink line on the plot is $x_T$ for T = 2, 5, 10, 25, 50, 100, 500, derived using the frequency factor technique for the normal distribution.
EV1 probability plot

Steps
1. Sort the data from largest to smallest
2. Assign plotting position using the Gringorten formula $p_i = (m - 0.44)/(n + 0.12)$
3. Calculate the reduced variate $y_i = -\ln(-\ln(1 - p_i))$
4. Plot sorted data against $y_i$
If the data fall on a straight line, the data come from an EV1 distribution.
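The EV1 plotting steps can be sketched in the same way. The function name and the sample flows are hypothetical; the Gringorten formula and the reduced variate are as given in the steps.

```python
import math


def ev1_plot_points(data):
    """Return (reduced variate y, value) pairs for an EV1 probability plot."""
    ranked = sorted(data, reverse=True)        # m = 1 is the largest value
    n = len(ranked)
    points = []
    for m, value in enumerate(ranked, start=1):
        p = (m - 0.44) / (n + 0.12)            # Gringorten exceedance probability
        y = -math.log(-math.log(1.0 - p))      # EV1 reduced variate
        points.append((y, value))
    return points


# Hypothetical annual maximum flows (1000 cfs), illustration only.
flows = [120, 85, 240, 60, 150]
for y, q in ev1_plot_points(flows):
    print(f"y = {y:6.3f}  Q = {q}")
```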
EV1 probability plot
[Figure: EV1 probability plot of Q (1000 cfs) against the EV1 reduced variate, with series "Data" and "EV1"]
Annual maximum flows for Colorado River near Austin, TX
The pink line on the plot is $x_T$ for T = 2, 5, 10, 25, 50, 100, 500, derived using the frequency factor technique for the EV1 distribution.