FW853--Topic 7. Stochastic Simulations
1.
Readings
a.
Bartell, S.M., J.E. Breck, R. H. Gardner, and A.L. Brenkert. 1986.
Individual parameter perturbation and error analysis of fish bioenergetics models.
Canadian Journal of Fisheries and Aquatic Sciences 43:160-168.
b.
Higgins, K., A. Hastings, J. N. Sarvela, and L. W. Botsford. 1997. Stochastic
dynamics and deterministic skeletons: population behavior of Dungeness
crab. Science 276:1431-1435.
2.
Why do stochastic simulations?
a.
Empirical evaluation of statistical tests or methods
b.
System dynamics with stochasticity may differ from the deterministic dynamics
i.
More ‘realistic’
c.
Managing in the face of uncertainty - my view is that uncertainty is a basic 'fact' of
life. We need to develop and evaluate management strategies that recognize this
fact explicitly.
i.
Types or sources of uncertainty (uncertainty is a 'buzzword')
(1)
Uncertain knowledge about current state
(a)
Measurement error (variance)
(b)
Unknown bias in measurement methods (actual errors)
(2)
Uncertain future - process variability
(a)
examples: rainfall over the course of the next year, recruitment of
fish, insect outbreaks
(3)
Uncertain models - we can never be sure that we have the 'right'
model. In fact, we can be sure that our model almost never
incorporates all of the system characteristics. We hope it captures
the important ones, however.
d.
How to incorporate stochasticity - the key problem is to choose appropriate
distribution(s) and incorporate it (them) in the appropriate places in the model.
Example - in modeling stream flow, you might have measurement error in your
inputs, process error in terms of variable rainfall, and models that operate on a
coarser scale than the actual processes. Each of these may have a
different distribution, and all contribute to the overall uncertainty in the model
output. The point of this is not that it's useless to try to model - rather that these
are the facts of life for virtually all models of real systems. Our challenge as
modellers/data analysts is to (1) reduce model uncertainty, (2) acknowledge
variability, and (3) try to develop management strategies that are robust or 'optimal'
given the uncertainties
3.
Introduction: Some basic probability terms and concepts (standard definitions are summarized after this list)
a.
Random variables-discrete and continuous
b.
Probability Density Function (continuous) or Probability Mass Function (discrete)
c.
Cumulative Distribution Function
d.
Expected values
e.
Variance
f.
Covariance
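For reference, a brief summary of the standard definitions behind these terms (a sketch added here, not part of the original notes; continuous case shown, with the discrete case replacing integrals by sums):

```latex
% Standard definitions for a continuous random variable X with density f(x)
F(x) = \Pr(X \le x) = \int_{-\infty}^{x} f(t)\,dt            % cumulative distribution function
E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx                  % expected value
\mathrm{Var}(X) = E\left[ (X - E(X))^{2} \right]             % variance
\mathrm{Cov}(X,Y) = E\left[ (X - E(X))\,(Y - E(Y)) \right]   % covariance
```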
4.
How to incorporate variability
a.
Choosing a distribution - the first step is deciding whether a continuous or discrete distribution is appropriate
i.
When shopping for distributions - need to first consider the processes and
which distribution makes 'sense'. One problem if you don't do this is that
many distributions are flexible enough that several will fit the data nearly
equally well. Philosophically this may not be a problem, since if they
fit the data nearly the same, then over the range of the data the
distributions must be pretty similar.
(1)
Testing data against theoretical distributions
(a)
One-sample Chi-square test. Good for discrete variates,
but the problem is how to specify intervals for continuous
variates.
(b)
Kolmogorov-Smirnov. Appropriate for all continuous
variates, but not always the most powerful method.
(c)
Normal - specialized tests such as Shapiro-Wilk
(implemented in SAS). A short sketch using these kinds of tests follows this list.
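A minimal sketch of testing data against a theoretical distribution, assuming NumPy and SciPy are available (the simulated 'data' and the lognormal choice are illustrative assumptions, not from the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
data = rng.lognormal(mean=0.0, sigma=0.5, size=200)   # stand-in for observed data

# Kolmogorov-Smirnov test against a lognormal fitted to the data
# (fitting parameters from the same data makes the p-value approximate)
shape, loc, scale = stats.lognorm.fit(data, floc=0)
ks_stat, ks_p = stats.kstest(data, "lognorm", args=(shape, loc, scale))
print(f"K-S: D = {ks_stat:.3f}, p = {ks_p:.3f}")

# Shapiro-Wilk test for normality (should reject here, since the data are lognormal)
w_stat, w_p = stats.shapiro(data)
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {w_p:.3f}")
```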
ii.
Continuous distributions
(1)
Uniform
(a)
Starting point for almost all other continuous distributions.
(b)
Rarely used alone - only when you know just the max and
min of a distribution and think everything in between is
equally likely. Note that lots of people use this, however,
as their first (and unfortunately sometimes last) distribution
in stochastic simulations.
(2)
Normal
(a)
One of the most important distributions, and should be the most
familiar to people. One thing to note - there is no closed
form for the CDF, so when we get to generating normals, we
need to use an indirect method.
(b)
Processes generating the normal - many measurements are
distributed normally, and means of variables based on moderate
to large sample sizes are approximately normal (Central Limit Theorem); other examples?
(c)
Usually incorporated in an additive sense to some outcomes
or some rates - we will see how this is related to the lognormal.
Key is that we want the mean of the added error to equal 0 so we don't
unintentionally shift the mean of the state variables (see the sketch after the lognormal item below).
(3)
Lognormal
(a)
Closely related to the normal - basically exp(normal).
(b)
Processes generating the lognormal - when there is normally distributed
process error in rates, the outcomes have lognormally distributed
variation - show example.
(c)
Usually incorporated in a multiplicative sense to outcomes.
Key is that we want the mean of the multiplicative factor to equal 1 so
we don't unintentionally shift the mean of the state variables (see the sketch below).
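A minimal sketch contrasting additive normal error and multiplicative lognormal error in a one-step growth update, assuming NumPy; the growth rate and error size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_reps = 100_000
biomass0, growth, sigma = 50.0, 1.2, 0.3

# Additive normal error: mean of the added term is 0, so the mean outcome is unchanged
additive = biomass0 * growth + rng.normal(0.0, sigma * biomass0, size=n_reps)

# Multiplicative lognormal error: E[exp(N(mu, sigma^2))] = exp(mu + sigma^2/2),
# so use mu = -sigma^2/2 to make the mean of the multiplier equal 1
multiplier = rng.lognormal(mean=-sigma**2 / 2, sigma=sigma, size=n_reps)
multiplicative = biomass0 * growth * multiplier

print(additive.mean(), multiplicative.mean())   # both close to 60 = 50 * 1.2
```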
(4)
Exponential
(a)
Lifetime of objects with constant hazard rate
(5)
Gamma (Erlang)
(a)
Time to complete a task made up of several independent steps
(6)
Weibull
(a)
Also used to represent the lifetime of devices or the time to
complete a task
(7)
Beta
(a)
Truly flexible distribution
(8)
Multivariate distributions (i.e., non-independent or correlated
variables) - won't cover in class. For the multivariate normal, use the
Cholesky factorization; a short sketch follows this list.
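A minimal sketch of generating correlated (multivariate normal) variates via the Cholesky factorization, assuming NumPy; the mean vector and covariance matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
mean = np.array([10.0, 5.0])
cov = np.array([[4.0, 2.4],
                [2.4, 9.0]])                   # target covariance matrix

L = np.linalg.cholesky(cov)                    # cov = L @ L.T
z = rng.standard_normal(size=(100_000, 2))     # independent N(0,1) draws
x = mean + z @ L.T                             # correlated draws with the target covariance

print(np.cov(x, rowvar=False))                 # should be close to cov
```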
iii.
Discrete distributions (a sketch of drawing each of these follows the list)
(1)
Binomial/Bernoulli
(a)
Number of successes in n independent trials
(2)
Geometric
(a)
Number of failures before a success
(b)
Number of items examined before a defect is found
(3)
Poisson
(a)
Occurrence of rare events
(4)
Negative Binomial
(a)
Often describes the number of animals in a quadrat,
particularly when the animals are clustered
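A minimal sketch of drawing from these discrete distributions with NumPy (parameter values are illustrative assumptions; note NumPy's geometric counts trials up to and including the first success):

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n_draws = 10_000

bernoulli = rng.binomial(n=1, p=0.3, size=n_draws)           # single success/failure trial
binomial = rng.binomial(n=20, p=0.3, size=n_draws)           # successes in 20 independent trials
geometric = rng.geometric(p=0.1, size=n_draws)               # trials until the first success
poisson = rng.poisson(lam=2.5, size=n_draws)                 # counts of rare events
negbinom = rng.negative_binomial(n=3, p=0.4, size=n_draws)   # clustered counts (e.g., animals per quadrat)

print(binomial.mean(), geometric.mean(), poisson.mean(), negbinom.mean())
```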
iv.
Bootstrapping - resample the observed data (with replacement) rather than assuming a parametric distribution; a short sketch follows below
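A minimal sketch of a nonparametric bootstrap of the mean, assuming NumPy; the simulated 'data' stand in for observations and are not from the notes:

```python
import numpy as np

rng = np.random.default_rng(seed=5)
data = rng.lognormal(mean=0.0, sigma=0.7, size=50)     # stand-in for observed data

n_boot = 5_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(data, size=data.size, replace=True)   # resample with replacement
    boot_means[i] = resample.mean()

lo, hi = np.percentile(boot_means, [2.5, 97.5])        # 95% percentile interval for the mean
print(data.mean(), (lo, hi))
```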
v.
Basic question - is variance incorporated into the rates or into the outcomes?
vi.
Another basic question - is the error additive or multiplicative?
b.
Generating variables
i.
If the Cumulative Distribution Function (CDF) can be inverted in closed form,
then use the inverse directly in conjunction with Uniform(0,1) variates (the inversion method)
(1)
One of the keys in generating uniform random variables is to use a
'good' pseudo-random number generator
(2)
Example for the Exponential, where f(x) = (1/θ) e^(-x/θ) and F(x) = 1 - e^(-x/θ);
inverting gives x = -θ ln(1 - u), where u is the height of the CDF (a Uniform(0,1) draw). A sketch follows below.
(a)
show picture
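A minimal sketch of the inversion method for exponential variates, assuming NumPy; the value of θ (here `theta = 2.0`) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(seed=6)
theta = 2.0                            # mean of the target exponential
u = rng.uniform(size=100_000)          # U(0,1) 'heights' of the CDF
x = -theta * np.log(1.0 - u)           # invert F(x) = 1 - exp(-x/theta)

print(x.mean(), x.var())               # roughly theta and theta**2
```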
ii.
Otherwise, transformation methods
(1)
Box-Muller for normal
iii.
Otherwise, acceptance/rejection methods (a short sketch follows below)
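A minimal sketch of the acceptance/rejection idea, assuming NumPy; the Beta(2,2) target and Uniform(0,1) proposal are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def target_pdf(x):
    return 6.0 * x * (1.0 - x)          # Beta(2,2) density on [0, 1]

c = 1.5                                 # envelope constant: max of the density is 1.5
samples = []
while len(samples) < 10_000:
    x = rng.uniform()                   # candidate from the Uniform(0,1) proposal
    u = rng.uniform()                   # acceptance test
    if u * c <= target_pdf(x):          # accept with probability f(x) / (c * 1)
        samples.append(x)

print(np.mean(samples))                 # close to 0.5, the Beta(2,2) mean
```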
Expected Value and Variance
If c and d are constants, and X and Y are random variables
Rules for Expected Value
E(cX) = c E(X)
E(X+c) = c + E(X)
E(X+Y) = E(X) + E(Y)
If X and Y are independent, then
E(XY) = E(X) E(Y)
Rules for Variances
Var(cX) = c^2 Var(X)
Var(X+c) = Var(X)
Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y)
Var(X-Y) = Var(X) + Var(Y) - 2 Cov(X,Y)
If X and Y are independent, then
Var(XY) = (E(X))^2 Var(Y) + (E(Y))^2 Var(X) + Var(X) Var(Y)
Rules for Covariances
Cov(X+c,Y+d) = Cov(X,Y)
Cov(cX,dY) = cd Cov(X,Y)
Cov(X+Y,Z) = Cov(X,Z) + Cov(Y,Z)
Cov(X,X) = Var(X)
If X and Y are independent, then
Cov(X,Y) = 0
Computational Formulae
Var(X) = [n X2 + ( X)2 ] / n(n_1)
Cov(X,Y) = [ (X_E(X))(Y_E(Y))] / (n_1)
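A quick Monte Carlo check of a couple of these rules, assuming NumPy; the distributions used are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=8)
n = 1_000_000
x = rng.normal(loc=2.0, scale=1.5, size=n)
y = 0.5 * x + rng.normal(loc=1.0, scale=1.0, size=n)   # correlated with x

# Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y)
print(np.var(x + y), np.var(x) + np.var(y) + 2 * np.cov(x, y)[0, 1])

# For independent X and Z: Var(XZ) = E(X)^2 Var(Z) + E(Z)^2 Var(X) + Var(X) Var(Z)
z = rng.normal(loc=3.0, scale=2.0, size=n)
print(np.var(x * z),
      x.mean()**2 * np.var(z) + z.mean()**2 * np.var(x) + np.var(x) * np.var(z))
```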
Box-Muller method for generating N(0,1) variates
Begin by generating X1 ~ U(0,1) and X2 ~ U(0,1)
To get Y ~ N(0,1) take the transformation:
Y1 = sqrt(-2 ln(X1)) * cos(2π X2)
can also generate a 2nd normally distributed random variable by:
Y2 = sqrt(-2 ln(X1)) * sin(2π X2)
To get a normally distributed variable with a specific variance
(and mean of zero), multiply Y by the square root of the target
variance. To get a specific mean, add the desired mean to
Y. Key - you need to do this in the "correct" order: first scale
by the standard deviation, then add the mean.
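A minimal sketch of the Box-Muller transform as described above, assuming NumPy; the target mean and variance used for the scaling step are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=9)
n = 100_000
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)

y1 = np.sqrt(-2.0 * np.log(x1)) * np.cos(2.0 * np.pi * x2)   # N(0,1)
y2 = np.sqrt(-2.0 * np.log(x1)) * np.sin(2.0 * np.pi * x2)   # second, independent N(0,1)

# For N(mu, sigma^2): first scale by sigma, then add mu
mu, sigma = 10.0, 3.0
y_scaled = mu + sigma * y1

print(y1.mean(), y1.var())               # roughly 0 and 1
print(y_scaled.mean(), y_scaled.var())   # roughly 10 and 9
```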
To get a lognormally distributed variable, transform the
normally distributed variable by:
Z = e^X
where X ~ N(0,1)
To get a lognormal variable with a specific mean and variance,
Mean (of normal) = ln [μ^2 / (μ^2 + σ^2)^(1/2)]
Variance (of normal) = ln [(μ^2 + σ^2) / μ^2]
where μ and σ^2 are the target mean and variance of the
lognormal
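A minimal sketch of generating lognormal variates with a specified mean and variance using the formulas above, assuming NumPy; the target values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=10)
target_mean, target_var = 5.0, 4.0       # desired mean and variance of the lognormal

# Parameters of the underlying normal, from the formulas above
m = np.log(target_mean**2 / np.sqrt(target_mean**2 + target_var))
v = np.log((target_mean**2 + target_var) / target_mean**2)

z = np.exp(rng.normal(loc=m, scale=np.sqrt(v), size=1_000_000))
print(z.mean(), z.var())                 # roughly 5.0 and 4.0
```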