Concepts in Probability, Statistics and Stochastic Modeling
• Loucks et al., 2005, Chapter 7

Learning Objective
• Be able to use probability and statistics to quantify uncertainty and natural variability in physical quantities

How to Express a Distribution
[Figure: one distribution shown four ways: cumulative distribution, probability density, probability plot, and equation]
Which method conveys the information best to you?

[Image: Carl Friedrich Gauß, immortalized]

A random variable X is a variable whose outcomes (values) are governed by the laws of chance.

Probability density function
$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$$
[Figure: probability density f(x) for x from 0 to 12, with x_1 and x_2 marked]

Cumulative distribution function
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = \frac{dF(x)}{dx}$$
[Figure: cumulative distribution F(x) for x from 0 to 12]

Continuous and Discrete Random Variables
[Figure from: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p, http://hdl.handle.net/1813/2804]

Generating a random variable from a given distribution
[Figure: a uniform value u on the vertical F(x) axis mapped through the CDF to x = F^{-1}(u)]
1. Generate U from a uniform distribution between 0 and 1.
2. Solve for $X = F^{-1}(U)$.
$F^{-1}(U)$ is randomly distributed with CDF F(x).
Basis: $P(X < x) = P(F^{-1}(U) < x) = P(U < F(x)) = F(x)$, because F is non-decreasing and U is uniform on (0, 1).

Generating a Pseudo-random Number
• There is a lot of lore about this. Refer to: Press, W. H., B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, (1988), Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, 735 p.
• Congruential method: $r_{next} = (a\,r_{prev} + c) \bmod m$, i.e. the remainder of $(a\,r_{prev} + c)$ divided by m.
• Each r is an integer random number between 0 and m-1. Dividing by (m-1) gives a number between 0 and 1; the sequence repeats after at most m numbers. Numerical Recipes gives "good" choices for a, c and m.
• R has built-in functions to generate random numbers: runif for uniform random numbers, and others such as rnorm and rgamma for other distributions.
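The two recipes above (a congruential generator for U, then $X = F^{-1}(U)$) can be combined in a short Python sketch. The constants a = 1664525, c = 1013904223, m = 2^32 are an illustrative choice commonly quoted for 32-bit generators, and the exponential target distribution is an assumed example picked because its inverse CDF has a closed form; neither comes from the slides.

```python
import math


class LCG:
    """Linear congruential generator: r_next = (a * r_prev + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        # Illustrative constants (an assumption), not taken from the slides
        self.a, self.c, self.m = a, c, m
        self.r = seed % m

    def next_int(self):
        # Integer in 0..m-1; the sequence repeats after at most m draws
        self.r = (self.a * self.r + self.c) % self.m
        return self.r

    def uniform(self):
        # Scale to [0, 1); dividing by m rather than m-1 keeps u < 1,
        # so the inverse CDF below never evaluates log(0)
        return self.next_int() / self.m


def exponential_inverse_cdf(u, mean):
    # Assumed example distribution: F(x) = 1 - exp(-x / mean),
    # so X = F^{-1}(U) = -mean * ln(1 - U)
    return -mean * math.log(1.0 - u)


gen = LCG(seed=12345)
samples = [exponential_inverse_cdf(gen.uniform(), mean=5.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # sample mean; should be near 5.0
```

Any other distribution with an invertible CDF can be substituted for `exponential_inverse_cdf`; in R the same idea underlies generating from `runif` output.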
Moments of Random Variables
Population versus sample (N observations $X_1, \ldots, X_N$):
• Mean: $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$; sample: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$
• Expectation: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$; sample: $\hat{E}(X) = \frac{1}{N}\sum_{i=1}^{N} X_i$
• Expectation operator: $E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$; sample: $\hat{E}(g(X)) = \frac{1}{N}\sum_{i=1}^{N} g(X_i)$
• Variance: $\sigma^2 = E([X - E(X)]^2) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$; sample: $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})^2$
• Skewness: $\gamma = E([X - E(X)]^3)/\sigma^3 = \frac{1}{\sigma^3}\int_{-\infty}^{\infty} (x - \mu)^3 f(x)\,dx$; sample: $\hat{\gamma} = \frac{\frac{1}{N}\sum_{i=1}^{N} (X_i - \bar{X})^3}{S^3}$

L-Moments
$$\lambda_2 = \tfrac{1}{2} E\left[X_{(2|2)} - X_{(1|2)}\right]$$
where $X_{(i|n)}$ is the i-th smallest of n independent draws, so $\lambda_2$ is half the expected difference between the larger and the smaller of two draws.

Probability weighted moments
L-moment estimators
L-Moment Diagrams
[Figures and tables for these slides from: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p, http://hdl.handle.net/1813/2804; and Salas, J. D., J. W. Delleur, V. Yevjevich and W. L. Lane, (1980), Applied Modeling of Hydrologic Time Series, Water Resources Publications, Littleton, Colorado, 484 p.]
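The sample formulas above translate directly into code. This Python sketch computes the sample mean, variance (divisor N-1) and the slide's skewness estimator, plus a sample estimate of $\lambda_2$. For $\lambda_2$ it uses the unbiased probability-weighted-moment estimator $l_2 = 2b_1 - b_0$ (Hosking's form), which is a substitution on my part since the slide's estimator table is not reproduced; the data values are hypothetical.

```python
import math


def sample_moments(xs):
    """Sample mean, variance (divisor N-1) and skewness as defined on the slide."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    s = math.sqrt(s2)
    # Skewness: (1/N) * sum (X_i - Xbar)^3 / S^3
    skew = sum((x - xbar) ** 3 for x in xs) / n / s ** 3
    return xbar, s2, skew


def sample_l2(xs):
    """Estimate lambda_2 from probability weighted moments: l2 = 2*b1 - b0."""
    x = sorted(xs)  # ascending order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    b0 = sum(x) / n
    # b1 = (1/n) * sum_j (j-1)/(n-1) * x_(j)
    b1 = sum((j - 1) / (n - 1) * x[j - 1] for j in range(1, n + 1)) / n
    return 2.0 * b1 - b0


data = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical values
print(sample_moments(data))
print(sample_l2(data))
```

For these data $l_2$ equals half the average absolute difference over all pairs of observations, which is the sample analogue of $\tfrac{1}{2}E[X_{(2|2)} - X_{(1|2)}]$.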
Fitting a probability distribution to data
Hillsborough River at Zephyr Hills, September flows: $\bar{x}$ = 8621 mgal, S = 8194 mgal, n = 31
[Figure: histogram of September flows, density vs. flow from 0 to 35,000 mgal]

Method of Moments
• Use the sample moments as the estimates of the population moments: $\hat{E}(X) = \bar{x}$, $\widehat{Var}(X) = \hat{\sigma}^2 = S^2$

Method of Moments: Gamma distribution
$$f(x) = \frac{\lambda (\lambda x)^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}$$
$$\hat{\lambda} = \frac{\bar{x}}{S^2} = 1.3 \times 10^{-4}, \qquad \hat{\alpha} = \left(\frac{\bar{x}}{S}\right)^2 = 1.1$$
[Figure: fitted gamma density over the histogram]

Method of Moments: Log-Normal distribution
$$f(x) = \frac{1}{x\,\sigma_y\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln(x) - \mu_y}{\sigma_y}\right)^2\right]$$
$$CV = \frac{S}{\bar{x}}, \qquad \hat{\sigma}_y^2 = \ln(CV^2 + 1) = 0.643, \qquad \hat{\mu}_y = \ln\left(\bar{x}\,\exp\left(-\tfrac{1}{2}\hat{\sigma}_y^2\right)\right) = 8.74$$
[Figure: fitted log-normal density over the histogram]

Method of Maximum Likelihood
• "Back into" the estimate by treating the parameters we are trying to estimate from the data as if they were known.
• How likely are the sample values we have, given a certain set of parameter values?
• For an independent random sample, we can express this as the joint density of the sample given the parameter value θ:
$$f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta)$$
• After we obtain the data (the random sample), we use the joint density to define the likelihood function, and choose θ to maximize it:
$$L(\theta \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i \mid \theta)$$

Likelihood
$L = \prod_{i} f_X(x_i \mid \theta)$ for the two fitted distributions:
• ln(L) = -312 (log-normal)
• ln(L) = -311 (gamma)
The gamma fit has the slightly larger likelihood.
[Figure: fitted log-normal and gamma densities over the histogram]

Normalization
• Much statistical theory relies on the central limit theorem and so applies to normal distributions.
• Where the data are not normally distributed, normalizing transformations are used:
  – Log
  – Box-Cox (the log is a special case of Box-Cox)

Box-Cox Normalization
The Box-Cox family of transformations includes the logarithmic transformation as a special case (λ = 0). It is defined as:
$$z = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\ \ln(x), & \lambda = 0 \end{cases}$$
where z is the transformed data, x is the original (positive) data and λ is the transformation parameter.
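A minimal Python sketch of the method-of-moments formulas above, plugging in the slide's summary statistics ($\bar{x}$ = 8621 mgal, S = 8194 mgal). The log-likelihood functions show how ln(L) would be evaluated for each fitted distribution; the 31 observed flows are not reproduced here, so the -312/-311 values cannot be recomputed from this sketch. The `box_cox` helper implements the transformation just defined.

```python
import math


def gamma_mom(xbar, s):
    """Method-of-moments gamma parameters: alpha = (xbar/S)^2, lambda = xbar/S^2."""
    return (xbar / s) ** 2, xbar / s ** 2


def lognormal_mom(xbar, s):
    """Method-of-moments log-normal parameters (mu_y, sigma_y^2)."""
    cv = s / xbar
    s2y = math.log(cv ** 2 + 1.0)
    muy = math.log(xbar) - 0.5 * s2y  # = ln(xbar * exp(-sigma_y^2 / 2))
    return muy, s2y


def gamma_loglik(xs, alpha, lam):
    # ln f(x) = ln(lam) + (alpha - 1) ln(lam x) - lam x - ln Gamma(alpha)
    return sum(math.log(lam) + (alpha - 1.0) * math.log(lam * x) - lam * x - math.lgamma(alpha)
               for x in xs)


def lognormal_loglik(xs, muy, s2y):
    # ln f(x) = -ln(x sigma_y sqrt(2 pi)) - (ln x - mu_y)^2 / (2 sigma_y^2)
    sy = math.sqrt(s2y)
    return sum(-math.log(x * sy * math.sqrt(2.0 * math.pi)) - (math.log(x) - muy) ** 2 / (2.0 * s2y)
               for x in xs)


def box_cox(x, lam):
    # z = (x^lam - 1)/lam for lam != 0, z = ln(x) for lam = 0; requires x > 0
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


# Summary statistics of the Hillsborough River September flows (mgal)
alpha, lam = gamma_mom(8621.0, 8194.0)
muy, s2y = lognormal_mom(8621.0, 8194.0)
print(f"gamma: alpha={alpha:.3f}, lambda={lam:.3e}")       # alpha ~ 1.107, lambda ~ 1.284e-04
print(f"log-normal: mu_y={muy:.2f}, sigma_y^2={s2y:.3f}")  # mu_y ~ 8.74, sigma_y^2 ~ 0.644
```

Given the flow values as a list, `gamma_loglik(flows, alpha, lam)` and `lognormal_loglik(flows, muy, s2y)` give the two ln(L) values to compare; maximum likelihood would instead search for the parameters that make these sums as large as possible.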
Box-Cox Normalization
So… the log looked OK (λ = 0). Is that what we really want? Let's skip the derivations for now and look at the answer for our three proposed methods.

Determining Transformation Parameters
• Trial and error: apply a series of trial λ values and evaluate a goodness-of-fit statistic for each.
• PPCC (Filliben's statistic): the correlation coefficient of the best-fit line through the normal Q-Q plot
• Kolmogorov-Smirnov (KS) test (any distribution): p-value
• Shapiro-Wilk test for normality: p-value

Quantiles
• Rank the data: $x_1 \le x_2 \le \ldots \le x_n$
• Assign each ranked value the probability $p_i = \frac{i}{n+1}$, an estimate of $P(X \le x_i)$
• Read off $q_i = F^{-1}(p_i)$ from a theoretical distribution, e.g. the standard normal with CDF F(y)
$q_i$ is the distribution-specific theoretical quantile associated with the ranked data value $x_i$.
[Figure: standard normal CDF F(y) for y from -3 to 3, mapping p_i to q_i]

Quantile-Quantile Plots
[Figure: two normal Q-Q plots against theoretical quantiles q_i: raw flows x_i (left) and log-transformed flows ln(x_i) (right)]
The raw flows need a transformation to be normally distributed.

Example: Determining Transformation Parameters
• Alafia River historical monthly flows
• Evaluate using all three criteria
• Test λ values from -2 to 2 in steps of 0.1 for Filliben's statistic and KS
• Test λ values from -1 to 1 in steps of 0.1 for Shapiro-Wilk (larger λ values produced errors)

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using PPCC
[Figure: Filliben's statistic vs. Box-Cox λ from -2 to 2]
Optimal λ = -0.14. This is close to 0.

Kolmogorov-Smirnov Test
• The test computes the largest difference between the target CDF $F_X(x)$ and the observed (empirical) CDF $F^*(x)$.
• The test statistic is:
$$D_2 = \max_{1 \le i \le n} \left| F^*(X_{(i)}) - F_X(X_{(i)}) \right| = \max_{1 \le i \le n} \left| \frac{i}{n} - F_X(X_{(i)}) \right|$$
where $X_{(i)}$ is the i-th smallest observed value in the random sample of size n.
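The quantile and PPCC steps described above can be sketched in Python: rank the transformed data, compute standard normal quantiles at $p_i = i/(n+1)$, and take the correlation of the Q-Q plot for each trial λ on the same -2 to 2 grid in steps of 0.1. The data here are hypothetical seeded log-normal draws (not the Alafia River flows), so λ near 0 is expected to do best; `NormalDist` is from Python's standard library.

```python
import math
import random
from statistics import NormalDist


def box_cox(x, lam):
    # z = (x^lam - 1)/lam for lam != 0, z = ln(x) for lam = 0; requires x > 0
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def normal_quantiles(n):
    # Plotting positions p_i = i/(n+1) mapped through the standard normal inverse CDF
    std = NormalDist()
    return [std.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def ppcc(data, lam):
    # Filliben-style statistic: correlation of ranked transformed data with normal quantiles
    z = sorted(box_cox(x, lam) for x in data)
    return pearson_r(z, normal_quantiles(len(z)))


# Hypothetical stand-in for observed flows: log-normal draws
rng = random.Random(1)
flows = [rng.lognormvariate(8.7, 0.8) for _ in range(300)]
lambdas = [round(-2.0 + 0.1 * k, 1) for k in range(41)]  # -2.0, -1.9, ..., 2.0
best = max(lambdas, key=lambda lam: ppcc(flows, lam))
print("PPCC-optimal lambda:", best, "PPCC:", round(ppcc(flows, best), 4))
```

A grid search is the "trial and error" approach from the slide; a finer grid or a one-dimensional optimizer could refine λ further.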
Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Kolmogorov-Smirnov (KS) Statistic
[Figure: KS p-value vs. Box-Cox λ from -2 to 2]
Optimal λ = -0.39. This is not as close to 0.

Shapiro-Wilk test reference: http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/wilkshap.htm

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Shapiro-Wilk Statistic
[Figure: Shapiro-Wilk p-value vs. Box-Cox λ from -1 to 1]
Optimal λ = -0.14. This is close to 0, and the same as PPCC.
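The KS criterion behind the first of these plots can be sketched in Python using the slide's form of the statistic, $D_2 = \max_i |i/n - F_X(X_{(i)})|$, with a normal distribution fitted to each Box-Cox-transformed sample. For a fixed sample size the KS p-value decreases as D grows, so minimizing D over the λ grid plays the role of maximizing the p-value. Two caveats: the data are hypothetical seeded log-normal draws, and because the normal's mean and standard deviation are estimated from the same data, standard KS p-values would be conservative (the Lilliefors issue), which is one reason this criterion can pick a λ farther from 0.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def box_cox(x, lam):
    # z = (x^lam - 1)/lam for lam != 0, z = ln(x) for lam = 0; requires x > 0
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def ks_statistic(data, cdf):
    # Slide's form: D = max_i | i/n - F_X(x_(i)) | over ascending order statistics.
    # (The usual two-sided statistic also checks F_X(x_(i)) - (i-1)/n.)
    x = sorted(data)
    n = len(x)
    return max(abs((i + 1) / n - cdf(xi)) for i, xi in enumerate(x))


def ks_after_box_cox(data, lam):
    # Fit a normal to the transformed data by its sample mean and standard deviation
    z = [box_cox(x, lam) for x in data]
    fitted = NormalDist(fmean(z), stdev(z))
    return ks_statistic(z, fitted.cdf)


rng = random.Random(2)
flows = [rng.lognormvariate(8.7, 0.8) for _ in range(300)]  # hypothetical stand-in data
lambdas = [round(-2.0 + 0.1 * k, 1) for k in range(41)]     # -2.0, -1.9, ..., 2.0
best = min(lambdas, key=lambda lam: ks_after_box_cox(flows, lam))
print("KS-optimal lambda:", best, "D:", round(ks_after_box_cox(flows, best), 4))
```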
