Concepts in Probability, Statistics and Stochastic Modeling
• Loucks et al., 2005, Chapter 7

Learning Objective
• Be able to use probability and statistics to quantify uncertainty and natural variability in physical quantities

How to Express a Distribution
• Cumulative Density
• Probability Density
• Probability Plot
• Equation
Which method conveys the information best to you?
[Figure: Carl Friedrich Gauß, immortalized]

A random variable X is a variable whose outcomes (values) are governed by the laws of chance.

Probability density function
$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$
[Figure: probability density f(x) versus x]

Cumulative distribution function
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$, so that $f(x) = \dfrac{dF}{dx}$
[Figure: cumulative distribution F(x) versus x]

Continuous and Discrete Random Variables
[Figure] From: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p, http://hdl.handle.net/1813/2804

Generating a random variable from a given distribution
[Figure: F(u) versus u for the uniform variable U, beside F(x) versus x for the target variable X]
1. Generate U from a uniform distribution between 0 and 1
2. Solve for $X = F^{-1}(U)$
$F^{-1}(U)$ is randomly distributed with CDF F(x).
Basis: $P(X < x) = P(F^{-1}(U) < x) = P(U < F(x)) = F(x)$, where the last step holds because U is uniform on (0, 1).

Generating a Pseudo random number
• There is a lot of lore about this. Refer to: Press, W. H., B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, (1988), Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, 735 p.
• Congruential method: $r_{next}$ = remainder of $[(r_{prev}\,a + c)/m]$, that is, $r_{next} = (a\,r_{prev} + c) \bmod m$
• Each r is an integer random number between 0 and m−1. Dividing by (m−1) gives a number between 0 and 1. The sequence repeats after at most m numbers. Numerical Recipes gives "good" choices for a, c and m.
• R has built-in functions such as runif to generate uniform random numbers, as well as functions for other distributions, e.g. rnorm, rgamma.
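The two ideas above (a congruential generator for U, then X = F^{-1}(U)) can be combined in a short sketch. This is an illustration under stated assumptions, not the course's code: it is written in Python rather than R, the constants a, c and m are one commonly quoted "quick" choice attributed to Numerical Recipes, the exponential distribution is picked only because its inverse CDF has a closed form, and the function names are hypothetical. The sketch divides by m rather than (m−1) so that U stays strictly below 1 and the logarithm is always defined.

```python
import math

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield uniforms in [0, 1) from r_next = (a * r_prev + c) mod m.

    The constants are an assumed, commonly quoted choice; any "good"
    (a, c, m) triple could be substituted.
    """
    r = seed
    while True:
        r = (a * r + c) % m
        yield r / m  # divide by m (not m - 1) so the value never equals 1

def exponential_inverse_cdf(u, mean):
    """Solve u = F(x) = 1 - exp(-x / mean) for x."""
    return -mean * math.log(1.0 - u)

def sample_exponential(n, mean, seed=12345):
    """Draw n exponential variates by inverse-transform sampling."""
    gen = lcg(seed)
    return [exponential_inverse_cdf(next(gen), mean) for _ in range(n)]

samples = sample_exponential(10000, mean=2.0)
sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.3f} (target 2.0)")
```

The same pattern applies to any distribution whose inverse CDF can be evaluated, analytically or numerically. In R, runif plays the role of lcg here.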
Moments of Random Variables

Mean
  Population: $\mu = \int x\,f(x)\,dx$
  Sample: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$
Expectation
  Population: $E(X) = \int x\,f(x)\,dx$
  Sample: $\hat{E}(X) = \frac{1}{N}\sum_{i=1}^{N} X_i$
Expectation operator
  Population: $E(g(X)) = \int g(x)\,f(x)\,dx$
  Sample: $\hat{E}(g(X)) = \frac{1}{N}\sum_{i=1}^{N} g(X_i)$
Variance
  Population: $\sigma^2 = \int (x-\mu)^2 f(x)\,dx = E([X - E(X)]^2)$
  Sample: $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})^2$
Skewness
  Population: $\gamma = \frac{1}{\sigma^3}\int (x-\mu)^3 f(x)\,dx = E([X - E(X)]^3)/\sigma^3$
  Sample: $\hat{\gamma} = \frac{\frac{1}{N}\sum_{i=1}^{N} (X_i - \bar{X})^3}{S^3}$

L-Moments
$\lambda_2 = \tfrac{1}{2}\,E[X_{(2|2)} - X_{(1|2)}]$

Probability weighted moments
L-moment estimators
L-Moment Diagrams
[Figure] From: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p, http://hdl.handle.net/1813/2804
[Figures] From: Salas, J. D., J. W. Delleur, V. Yevjevich and W. L. Lane, (1980), Applied Modeling of Hydrologic Time Series, Water Resources Publications, Littleton, Colorado, 484 p.

Fitting a probability distribution to data
Hillsborough River at Zephyr Hills, September flows: $\bar{x}$ = 8621 mgal, S = 8194 mgal, n = 31
[Figure: density histogram of September flows, 0 to 35000 mgal]

Method of Moments
• Using the sample moments as the estimate for the population parameters: $\hat{\mu} = \hat{E}(X) = \bar{x}$; $\hat{\sigma}^2 = \widehat{Var}(X) = S^2$

Method of Moments: Gamma distribution
$f(x) = \dfrac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$
$\hat{\alpha} = (\bar{x}/S)^2 = 1.1$
$\hat{\beta} = \bar{x}/S^2 = 1.3 \times 10^{-4}$
[Figure: fitted gamma density over the flow histogram]

Method of Moments: Log-Normal distribution
$f(x) = \dfrac{1}{x\,\sigma_y\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{\ln(x) - \mu_y}{\sigma_y}\right)^2\right]$
$CV = S/\bar{x}$
$\hat{\sigma}_y^2 = \ln(CV^2 + 1) = 0.643$
$\hat{\mu}_y = \ln\left(\bar{x}\,\exp(-\tfrac{1}{2}\hat{\sigma}_y^2)\right) = 8.74$
[Figure: fitted log-normal density over the flow histogram]

Method of Maximum Likelihood
• "Back into" the estimate by assuming the parameters we are trying to estimate from the data are known.
• How likely are the sample values we have, given a certain set of parameter values?
• We can express this as the joint density of the random sample given the parameter value: $f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta)$
• After we obtain the data (random sample), we use the joint density to define the likelihood function: $L(\theta \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i \mid \theta)$

Likelihood
$L = \prod_{i=1}^{n} f_X(x_i \mid \theta)$
ln(L) = −312 (for log-normal)
ln(L) = −311 (for gamma)
[Figure: fitted densities over the flow histogram, 0 to 35000 mgal]

Normalization
• Much theory relies on the central limit theorem and so applies to Normal distributions
• Where the data are not normally distributed, normalizing transformations are used
  – Log
  – Box-Cox (Log is a special case of Box-Cox)

Box-Cox Normalization
The Box-Cox family of transformations includes the logarithmic transformation as a special case (λ = 0). It is defined as:
$z = (x^{\lambda} - 1)/\lambda$ ; λ ≠ 0
$z = \ln(x)$ ; λ = 0
where z is the transformed data, x is the original data and λ is the transformation parameter.

Box-Cox Normalization
So… the log looked OK (λ = 0). Is that what we really want? Let's skip the derivations for now and look at the answer for our three proposed methods.

Determining Transformation Parameters
• Trial and error: apply a series of trial lambda values and evaluate a statistic for each.
• PPCC (Filliben's Statistic): R² of the best-fit line of the Q-Q plot
• Kolmogorov-Smirnov (KS) Test (any distribution): p-value
• Shapiro-Wilk Test for Normality: p-value

Quantiles
• Rank the data: $x_1 \le x_2 \le x_3 \le \ldots \le x_n$
• $p_i = \dfrac{i}{n+1} \approx \mathrm{prob}(X \le x_i)$
• Theoretical distribution, e.g. Standard Normal
• $q_i$ is the distribution-specific theoretical quantile associated with ranked data value $x_i$
[Figure: standard normal CDF F(y) versus y, showing the quantile $q_i$ at cumulative probability $p_i$]

Quantile-Quantile Plots
[Figure: Normal Q-Q plot for raw flows (sample quantiles $x_i$ versus theoretical quantiles $q_i$), beside Normal Q-Q plot for log-transformed flows (sample quantiles $\ln(x_i)$ versus theoretical quantiles $q_i$)]
A transformation is needed to make the raw flows Normally distributed.

Example: Determining Transformation Parameters
• Alafia River historical monthly flows
• Evaluate using all three criteria
• Test a range of lambda values from −2 to 2 by 0.1 for Filliben's and KS
• Test a range of lambda values from −1 to 1 by 0.1 for Shapiro-Wilk (errors for larger lambda values).

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using PPCC
[Figure: Filliben's statistic versus Box-Cox lambda value, −2 to 2]
Optimal λ = −0.14. This is close to 0.

Kolmogorov-Smirnov Test
• Specifically, it computes the largest difference between the target CDF $F_X(x)$ and the observed CDF, $F^*(X)$.
• The test statistic $D_2$ is:
$D_2 = \max_{i=1,\ldots,n} \left| F^*(X^{(i)}) - F_X(X^{(i)}) \right| = \max_{i=1,\ldots,n} \left| \dfrac{i}{n} - F_X(X^{(i)}) \right|$
where $X^{(i)}$ is the ith smallest observed value (the ith order statistic) in the random sample of size n.

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Kolmogorov-Smirnov (KS) Statistic
[Figure: KS p-value versus Box-Cox lambda value, −2 to 2]
Optimal λ = −0.39. This is not as close to 0.

Shapiro-Wilk reference: http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/wilkshap.htm

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Shapiro-Wilk Statistic
[Figure: Shapiro-Wilk p-value versus Box-Cox lambda value, −1 to 1]
Optimal λ = −0.14. This is close to 0, and the same as PPCC.
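The trial-and-error search over lambda using the PPCC criterion can be sketched as follows. This is a minimal illustration under stated assumptions: it is written in Python rather than R, the flows are synthetic log-normal values generated in the code (not the Alafia River record), the plotting position is $p_i = i/(n+1)$ as on the Quantiles slide, and the function names are hypothetical. The sketch uses the correlation coefficient r of the Q-Q plot; because r is positive here, the lambda that maximizes r also maximizes R².

```python
import math
import random
from statistics import NormalDist, mean

def box_cox(x, lam):
    """Box-Cox transform: (x^lam - 1)/lam for lam != 0, ln(x) for lam = 0."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def ppcc(z):
    """Correlation between sorted data and standard normal quantiles q_i."""
    n = len(z)
    zs = sorted(z)
    # Theoretical quantiles at plotting positions p_i = i / (n + 1)
    q = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    mz, mq = mean(zs), mean(q)
    szq = sum((a - mz) * (b - mq) for a, b in zip(zs, q))
    szz = sum((a - mz) ** 2 for a in zs)
    sqq = sum((b - mq) ** 2 for b in q)
    return szq / math.sqrt(szz * sqq)

def best_lambda(x, lambdas):
    """Return the trial lambda whose transformed data has the highest PPCC."""
    return max(lambdas, key=lambda lam: ppcc(box_cox(x, lam)))

# Synthetic, strictly positive "flows" (an assumption for illustration only)
rng = random.Random(0)
flows = [rng.lognormvariate(7.0, 0.8) for _ in range(200)]

# Trial values from -2 to 2 in steps of 0.1, as in the example above
lambdas = [round(-2.0 + 0.1 * k, 1) for k in range(41)]
lam_opt = best_lambda(flows, lambdas)
print("optimal lambda by PPCC:", lam_opt)
```

The KS and Shapiro-Wilk criteria follow the same loop with the p-value of the respective test replacing ppcc as the quantity to maximize.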