FW853--Topic 7. Stochastic Simulations

1. Readings
   a. Bartell, S.M., J.E. Breck, R.H. Gardner, and A.L. Brenkert. 1986. Individual parameter perturbation and error analysis of fish bioenergetics models. Canadian Journal of Fisheries and Aquatic Sciences 43:160-168.
   b. Higgins, K., A. Hastings, J.N. Sarvela, and L.W. Botsford. Stochastic dynamics and deterministic skeletons: population behavior of Dungeness crab. Science 276:1431-1435.

2. Why do stochastic simulations?
   a. Empirical evaluation of statistical tests or methods.
   b. System dynamics with stochasticity may differ from the deterministic dynamics.
      i. More 'realistic'.
   c. Managing in the face of uncertainty. My view is that uncertainty is a basic 'fact' of life; we need to develop and evaluate management strategies that recognize this fact explicitly.
      i. Types or sources of uncertainty (uncertainty is a 'buzzword'):
         (1) Uncertain knowledge about the current state.
             (a) Measurement error (variance).
             (b) Unknown bias in measurement methods (actual errors).
         (2) Uncertain future: process variability.
             (a) Examples: rainfall over the course of the next year, recruitment of fish, insect outbreaks.
         (3) Uncertain models: we can never be sure that we have the 'right' model. In fact, we can be sure that our model almost never incorporates all of the system's characteristics; we hope it captures the important ones.
   d. How to incorporate stochasticity. The key problem is to choose appropriate distribution(s) and to incorporate them in the appropriate places in the model. Example: in modeling stream flow, you might have measurement error in your inputs, process error in the form of variable rainfall, and a model that operates on a coarser scale than the actual processes. Each of these may have a different distribution, and all contribute to the overall uncertainty in the model output. The point is not that it is useless to try to model; rather, these are the facts of life for virtually all models of real systems.
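As a toy illustration of how these different error sources can enter one model, here is a minimal sketch in Python. The scenario and all numbers are hypothetical, not from the course materials: lognormal process error acting multiplicatively on monthly stream flow, plus additive normal measurement error on the gauge readings.

```python
import math
import random

random.seed(0)

# Hypothetical numbers for a one-year stream-flow sketch: process error
# enters as month-to-month flow variability, measurement error as gauge noise.
true_flow = 100.0      # assumed mean monthly flow (arbitrary units)
sigma_process = 0.25   # sd of the lognormal process error (log scale)
sigma_measure = 5.0    # sd of the additive gauge error

observed = []
for month in range(12):
    # Multiplicative process error whose mean is 1 (mu = -sigma^2/2)
    shock = math.exp(random.gauss(-sigma_process**2 / 2, sigma_process))
    actual = true_flow * shock
    # Additive measurement error with mean 0
    observed.append(actual + random.gauss(0.0, sigma_measure))

print(round(sum(observed) / 12, 1))  # varies around true_flow
```

Both error terms are centered (mean-1 factor, mean-0 noise), so neither one shifts the expected flow; they only add spread to the simulated observations.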
Our challenge as modellers/data analysts is to (1) reduce model uncertainty, (2) acknowledge variability, and (3) try to develop management strategies that are robust or 'optimal' given the uncertainties.

3. Introduction: some basic probability terms and concepts
   a. Random variables, discrete and continuous.
   b. Probability density function (continuous) or probability mass function (discrete).
   c. Cumulative distribution function.
   d. Expected values.
   e. Variance.
   f. Covariance.

4. How to incorporate variability
   a. Choosing a distribution. The first step is deciding whether a continuous or a discrete distribution is appropriate.
      i. When shopping for distributions, first consider the processes at work and which distribution makes 'sense'. One problem if you don't do this is that many distributions are flexible enough that several will fit the data nearly equally well. Philosophically this may not be a problem, since if they fit the data nearly the same, then over the range of the data the distributions must be pretty similar.
         (1) Testing data against theoretical distributions:
             (a) One-sample chi-square test. Good for discrete variates, but the problem is how to specify intervals for continuous variates.
             (b) Kolmogorov-Smirnov. Appropriate for all continuous variates, but not always the most powerful method.
             (c) For the normal, specialized tests such as Shapiro-Wilk (implemented in SAS).
      ii. Continuous distributions:
         (1) Uniform
             (a) The starting point for almost all other continuous distributions.
             (b) Rarely used alone; appropriate only when you know nothing but the max and min of a distribution and think everything in between is equally likely. Note that lots of people nevertheless use this as their first (and unfortunately sometimes last) distribution in stochastic simulations.
         (2) Normal
             (a) One of the most important distributions, and the one that should be most familiar. One thing to note: there is no closed form for the CDF, so when we get to generating normals we will need an indirect method.
             (b) Processes generating normals: many measurements are distributed normally, and means of variables based on moderate to large sample sizes are approximately normal. Other examples?
             (c) Usually incorporated in an additive sense into outcomes or rates; we will see how this is related to the lognormal. The key is that the added variation should have mean 0, so that we don't unintentionally shift the mean of the state variables.
         (3) Lognormal
             (a) Closely related to the normal; basically exp(normal).
             (b) Processes generating lognormals: when there is normally distributed process error in rates, there is lognormally distributed variation in outcomes. (Show example.)
             (c) Usually incorporated in a multiplicative sense into outcomes. The key is that the multiplicative factor should have mean 1, so that we don't unintentionally shift the mean of the state variables.
         (4) Exponential
             (a) Lifetime of objects with a constant hazard rate.
         (5) Gamma (Erlang)
             (a) Time to complete a task that consists of several independent steps.
         (6) Weibull
             (a) Also used to represent lifetimes of devices and time to complete a task.
         (7) Beta
             (a) A truly flexible distribution.
         (8) Multivariate distributions (i.e., non-independent or correlated variables): we won't cover these in class. For the multivariate normal, use the Cholesky factorization.
      iii. Discrete distributions:
         (1) Binomial/Bernoulli
             (a) Number of successes in n independent trials.
         (2) Geometric
             (a) Number of failures before a success.
             (b) Number of items examined before a defect is found.
         (3) Poisson
             (a) Occurrence of rare events.
         (4) Negative binomial
             (a) Often describes the number of animals in a quadrat, particularly when the animals are clustered.
      iv. Bootstrapping.
      v. A basic question: is the variance incorporated into rates or into outcomes?
      vi. Another basic question: is the error additive or multiplicative?
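To make the additive-versus-multiplicative question concrete, here is a small sketch (my own illustration, not from the notes). Additive normal error must have mean 0, while a multiplicative lognormal factor must have mean 1, which requires setting the mean of the underlying normal to -sigma^2/2 (since exp(N(mu, sigma^2)) has expected value exp(mu + sigma^2/2)):

```python
import math
import random

random.seed(42)
n = 200_000
x = 10.0       # a state variable whose mean we do not want to shift
sigma = 0.3

# Additive error: normal with mean 0, so E[x + e] = x
additive = [x + random.gauss(0.0, sigma) for _ in range(n)]

# Multiplicative error: exp(N(mu, sigma^2)) has mean exp(mu + sigma^2/2),
# so choosing mu = -sigma^2/2 makes the factor's mean exactly 1
mu = -sigma**2 / 2
multiplicative = [x * math.exp(random.gauss(mu, sigma)) for _ in range(n)]

print(round(sum(additive) / n, 2))        # close to 10.0
print(round(sum(multiplicative) / n, 2))  # close to 10.0
```

If you instead exponentiated a mean-0 normal, the factor's mean would be exp(sigma^2/2) > 1, and the simulation would drift upward for no biological reason.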
   b. Generating variables
      i. If the cumulative distribution function (CDF) can be inverted, use it directly in conjunction with the uniform.
         (1) One key to generating uniform random variables is to use a 'good' pseudo-random number generator.
         (2) Example for the exponential, where f(x) = (1/λ) e^(-x/λ): setting u = F(x) = 1 - e^(-x/λ) and solving for x gives x = -λ ln(1 - u), where u ~ U(0,1) is the height of the CDF.
             (a) Show picture.
      ii. Otherwise, transformation methods.
         (1) Box-Muller for the normal.
      iii. Otherwise, acceptance/rejection methods.

Expected value and variance
If c and d are constants, and X and Y are random variables:

Rules for expected values
   E(cX) = c E(X)
   E(X + c) = c + E(X)
   E(X + Y) = E(X) + E(Y)
   If X and Y are independent, then E(XY) = E(X) E(Y)

Rules for variances
   Var(cX) = c^2 Var(X)
   Var(X + c) = Var(X)
   Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X,Y)
   Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X,Y)
   If X and Y are independent, then
   Var(XY) = (E(X))^2 Var(Y) + (E(Y))^2 Var(X) + Var(X) Var(Y)

Rules for covariances
   Cov(X + c, Y + d) = Cov(X,Y)
   Cov(cX, dY) = cd Cov(X,Y)
   Cov(X + Y, Z) = Cov(X,Z) + Cov(Y,Z)
   Cov(X,X) = Var(X)
   If X and Y are independent, then Cov(X,Y) = 0

Computational formulae
   Var(X) = [n ΣX^2 - (ΣX)^2] / [n(n - 1)]
   Cov(X,Y) = [Σ(X - X̄)(Y - Ȳ)] / (n - 1)

Box-Muller method for generating N(0,1) variates
Begin by generating X1 ~ U(0,1) and X2 ~ U(0,1). To get Y1 ~ N(0,1), take the transformation:
   Y1 = sqrt(-2 ln(X1)) * cos(2π X2)
A second normally distributed random variable can be generated from the same pair by:
   Y2 = sqrt(-2 ln(X1)) * sin(2π X2)
To get a normally distributed variable with a specific variance (and mean of zero), multiply Y by the square root of the target variance. To get a specific mean, add the desired mean to Y. Key: you need to do this in the "correct" order: first scale by the standard deviation, then add the mean.

To get a lognormally distributed variable, transform the normally distributed variable:
   Z = e^X, where X ~ N(0,1)
To get a lognormal variable with a specific mean and variance, use
   mean (of the normal) = ln[μ^2 / (μ^2 + σ^2)^(1/2)]
   variance (of the normal) = ln[(μ^2 + σ^2) / μ^2]
where μ and σ^2 are the target mean and variance of the lognormal.
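The generation recipes above can be combined into a short sketch. This is a minimal implementation of the formulas in these notes (the function names are my own): Box-Muller to produce N(0,1) variates, then the lognormal-parameter formulas to hit a target lognormal mean and variance.

```python
import math
import random

random.seed(7)

def box_muller():
    """One Box-Muller draw: two U(0,1) variates -> two independent N(0,1)."""
    x1 = 1.0 - random.random()   # in (0, 1]; avoids log(0)
    x2 = random.random()
    r = math.sqrt(-2.0 * math.log(x1))
    return r * math.cos(2.0 * math.pi * x2), r * math.sin(2.0 * math.pi * x2)

def lognormal(target_mean, target_var):
    """Lognormal variate with the given mean and variance."""
    m2 = target_mean ** 2
    norm_var = math.log((m2 + target_var) / m2)
    norm_mean = math.log(m2 / math.sqrt(m2 + target_var))
    y, _ = box_muller()
    # Correct order: scale by the standard deviation first, then add the mean
    return math.exp(norm_mean + math.sqrt(norm_var) * y)

draws = [lognormal(5.0, 4.0) for _ in range(100_000)]
m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)
print(round(m, 1), round(v, 1))  # close to 5.0 and 4.0
```

Note the order of operations inside lognormal(): the N(0,1) draw is scaled by the standard deviation and then shifted by the mean before exponentiating, exactly as the notes emphasize.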