Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
FREQUENCY ANALYSIS • Basic Problem: To relate the magnitude of extreme events to their frequency of occurrence through the use of probability distributions. FREQUENCY ANALYSIS • Basic Assumptions: (a) Data analyzed are to be statistically independent & identically distributed - selection of data (Time dependence, time scale, mechanisms). (b) Change over time due to man-made (eg. urbanization) or natural processes do not alter the frequency relation - temporal trend in data (stationarity). FREQUENCY ANALYSIS • Practical Problems: (a) Selection of reasonable & simple distribution. (b) Estimation of parameters in distribution. (c) Assessment of risk with reasonable accuracy. REVIEW OF BASIC CONCEPTS Probabilistic Outcome of a hydrologic event (e.g., rainfall amount & duration; flood peak discharge; wave height, etc.) is random and cannot be predicted with certainty. Terminologies - Population The collection of all possible outcomes relevant to the process of interest. Example: (1) Max. 2-hr rainfall depth: all non-negative real numbers; (2) No. of storm in June: all non-negative integer numbers. - Sample A measured segment (or subset) of the population. REVIEW OF BASIC CONCEPTS Terminologies - Random Variable A variable describable by a probability distribution which specifies the chance that the variable will assume a particular value. Convention: Capital letter for random variables (say, X) whereas the lower case letter (say, x) for numerical realization that the random variable X will take. Example: X = rainfall amount in 2 hours (a random variable) ; x = 100.2 mm/2hr (realization). - Random variables can be - discrete (eg., no. of rainy days in June) or - continuous (eg., max. 2-hr rainfall amount, flood discharge).. REVIEW OF BASIC CONCEPTS Terminologies Frequency & Relative Frequency o For discrete random variables: o Frequency is the number of occurrences of a specific event. Relative frequency is resulting from dividing frequency by the total number of events. e.g. n = no. of years having exactly 50 rainy days; N = total no. of years. Let n=10 years and N=100 years. Then, the frequency of having exactly 50 rainy days is 10 and the relative frequency of having exactly 50 rainy days in 100 years is n/N = 0.1. o For continuous random variables: o Frequency needs to be defined for a class interval. o A plot of frequency or relative frequency versus class intervals is called histogram or probability polygon. o As the number of sample gets infinitely large and class interval length approaches to zero, the histogram will become a smooth curve, called probability density function. REVIEW OF BASIC CONCEPTS Terminologies Probability Density Function (PDF) – • For a continuous random variable, the PDF must satisfy f ( x) dx 1 - and f(x) ≥ 0 for all values of x. • For a discrete random variable, the PDF must satisfy p ( x) 1 all x and 1≥ p(x) ≥ 0 for all values of x. REVIEW OF BASIC CONCEPTS Terminologies - Cumulative Distribution Function For a continuous random variable, xo Pr( X xo ) f ( x) dx - For a discrete random variable, by Pr( X x o) = all xi xo p( xi ) Statistical Properties of Random Variables • Population - Synonymous to sample space, which describes the complete assemblage of all the values representative of a particular random process. • Sample - Any subset of the population. • Parameters - Quantities that are descriptive of the population in a statistical model. Normally, Greek letters are used to denote statistical parameters. • Sample statistics (or simply statistics): Quantities calculated on the basis of sample observations. Statistical Moments of Random Variables • Descriptors commonly used to show statistical properties of a RV are those indicative (1) Central tendency; (2) Dispersion; (3) Asymmetry. • Frequently used descriptors in these three categories are related to statistical moments of a RV. • Two types of statistical moments are commonly used in hydrosystem engineering applications: (1) product-moments and (2) L-moments. Product-Moments • rth-order product-moment of X about any reference point X=xo is defined, for continuous case, as r r r E X - xo = x - xo f x x dx = x - xo dFx x - - whereas for discrete case, K r r E X - xo = xk - xo p x xk k=1 • • where E[] is a statistical expectation operator. In practice, the first three moments (r=1, 2, 3) are used to describe the central tendency, variability, and asymmetry. Two types of product-moments are commonly used: – Raw moments: µr'=E[Xr] rth-order moment about the origin; and – Central moments: µr=E[(X-µx)r] = rth-order central moment – Relations between two types of product-moments are: r μ r = -1 C r,i xi μ'r - i i= 0 . • i r μ'r = C r,i xi μ r - i i= 0 where Cn,x = binomial coefficient = n!/(x!(n-x)!) Main disadvantages of the product-moments are: (1) Estimation from sample observations is sensitive to the presence of outliers; and (2) Accuracy of sample product-moments deteriorates rapidly with increase in the order of the moments. Mean, Mode, Median, and Quantiles • Expectation (1st-order moment) measures central tendency of random variable X E X = μx = x f x dx = x dF x = 1 - F x dx x - x - x - – Mean () = Expectation = l1 = location of the centroid of PDF or PMF. – Two operational properties of the expectation are useful: K K • E ak X k = a k k k=1 k=1 in which k=E[Xk] for k = 1,2, …, K. • For independent random variables, • K K E X k = k k=1 k=1 Mode (xmo) - the value of a RV at which its PDF is peaked. The mode, xmo, can be obtained by f x solving =0 x x x xmo • Median (xmd) - value that splits the distribution into two equal halves, i.e, Fx xmd = xmd f x dx = 0.5 x - • • Quantiles - 100pth quantile of a RV X is a quantity xp that satisfies P(X xp) = Fx(xp) = p A PDF could be uni-modal, bimodal, or multi-modal. Generally, the mean, median, and mode of a random variable are different, unless the PDF is symmetric and uni-modal. Uni-modal and bi-modal distributions fx(x) x (a) Uni-modal distribution fx(x) x (b) Bi-modal distribution Variance, Standard Deviation, and Coefficient of Variation • Variance is the second-order central moment measuring the spreading of a RV over its range, Var X = 2 s = E X - μx = 2 x 2 x - μx 2 f x x dx - • Standard deviation (sx) is the positive square root of the variance. • Coefficient of variation, Wx=sx/x, is a dimensionless measure; useful for comparing the degree of uncertainty of two RVs with different units. • Three important properties of the variance are: – (1) Var[c] = 0 when c is a constant. – (2) Var[X] = E[X2] - E2[X] – For multiple independent random variables, K K 2 2 Var ak X k = ak s k k=1 k=1 where ak =a constant and sk = standard deviation of Xk, k=1,2, ..., K. Skewness Coefficient • Measures asymmetry of the PDF of a random variable • Skewness coefficient, gx, defined as 3 μ 3 E X - μx g x = 1.5 = s x3 μ2 • The sign of the skewness coefficient indicates the degree of symmetry of the probability distribution function. • Pearson skewness coefficient – g1 = μ x - x mo sx • In practice, product-moments higher than 3rd-order are less used because they are unreliable and inaccurate when estimated from a small number of samples • See Table for equations to compute the sample productmoments. Relative locations of mean, median, and mode for positively-skewed, symmetric, and negatively-skewed distributions. fx(x) fx(x) (a) Positively Skewed, gx>0 (b) Symmetric, gx=0 x x xmo xmd x xxmo=xmd fx(x) (c) Negatively skewed, gx<0 x xmd xmo x Product-moments of random variables Moment First Measure of Central Location Definition Mean, Expected value E(X)=x Variance, Var(X)=2= sx2 Continuous Variable x x f ( x) dx x - s (x - μx ) fx (x) dx 2 x 2 - Second Dispersion Third 3 ( x - μx ) f x ( x) dx - Asymmetry Skewness coefficient, gx gx = 3 / sx3 μ4 (x - μ ) x - Fourth Peakedness Kurtosis, x Excess coefficient, x x = 4 / sx4 xx- 4 μx xk p( xk ) x xi / n all x's 2 2 s xk - x Px xk x all x ' s Wx = sxx μ3 Sample Estimator s x Var ( X ) Standard deviation, sx s x Var ( X ) Coefficient of variation,Wx Wx = sxx Skewness Discrete Variable f x ( x) dx 1 xi - x n -1 2 2 Cv s x gx = 3 / sx3 g = m3 / s3 μ4 ( xk - μx )4 px ( xk ) m4 x = 4 / sx4 xx- s m3 all x's 1 xi - x n -1 μ3 ( xk - μx )3 px ( xk ) all x's s2 n xi - x (n - 1)(n - 2) 3 n (n 1) xi - x (n - 1)(n - 2)(n - 3) k = m4 / s4 4 Kurtosis (x) • Measure of the peakedness of a distribution. • Related to the 4th central product-moment as X - μx 4 E μ4 x = 2 = s x4 μ2 • For a normal RV, its kurtosis is equal to 3. Sometimes, coefficient of excess, x=x-3, is used. • All feasible distribution functions, skewness coefficient and kurtosis must satisfy g x2 1 x Some Commonly Used Distributions • NORMAL DISTRIBUTION f N x | x, s 2 x = 1 2π s x 1 exp 2 x - μx σ x 2 , for - < x < Standardized Variable: Z X - s Z has mean 0 and standard deviation 1. Some Commonly Used Distributions • STANDARD NORMAL DISTRIBUTION: 0.5 0.4 (z ) 0.3 0.2 0.1 0 -3 -2 -1 0 z 1 2 3 Some Commonly Used Distributions • LOG-NORMAL DISTRIBUTION 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0 (a) x = 1.0 Wx=0. 3 Wx=0. 6 0 1 2 Wx=1.3 3 x 4 0.7 5 6 x =1.65 0.6 (b) Wx = 1.30 0.5 0.4 x =2.25 fLN(x) f LN x | ln x 1 ln( x) - 2 ln x , s ln2 x = exp - ,x>0 2 s ln x x 2 s ln x 1 0.3 x =4.50 0.2 0.1 0 0 1 2 3 x 4 5 6 fLN(x) Some Commonly Used Distributions • Gumbel (Extreme-Value Type I) Distribution x - F EV 1 x | , = exp - exp - for maxima f EV 1 x | , = 1 x - exp - x - exp - for maxima x - ξ = 1 - exp - exp + for minima β = x - exp + x - exp + for minima 1 fEV1(y) 0.4 Max Min 0.3 0.2 0.1 0 -4 -3 -2 -1 0 y 1 2 3 4 Some Commonly Used Distributions • Log-Pearson Type 3 Distribution f P 3 x | , , = 1 β Γ x-ξ β α -1 e - x - / 0.30 4,1 0.25 1,4 0.20 fG(x) 0.15 2,4 0.10 0.05 0.00 0 2 4 6 8 x 10 12 14 Some Commonly Used Distributions • Log-Pearson Type 3 Distribution 1 f LP 3 x | , , = x β Γ ln x - ξ β α -1 e - ln( x ) - / with >0, xe when >0 and with >0, xe when <0