Simulation Wrap-up, Statistics
COS 323
Today
• Simulation Re-cap
• Statistics
• Variance and confidence intervals for
simulations
• Simulation wrap-up
• FYI: No class or office hours Thursday
Simulation wrap-up
Last time
• Time-driven, event-driven
• “Simulation” from differential equations
• Cellular automata, microsimulation, agent-based simulation
– see e.g. http://www.microsimulation.org/IMA/What%20is%20microsimulation.htm
• Example applications: SIR disease model,
population genetics
Simulation: Pros and Cons
• Pros:
– Building a model can be easier than with other approaches
– Outcomes can be easy to understand
– Cheap, safe
– Good for comparisons
• Cons:
– Hard to debug
– No guarantee of optimality
– Hard to establish validity
– Can’t produce absolute numbers
Simulation: Important Considerations
• Are outcomes statistically significant? (Need
many simulation runs to assess this)
• What should initial state be?
• How long should the simulation run?
• Is the model realistic?
• How sensitive is the model to parameters,
initial conditions?
Statistics Overview
Random Variables
• A random variable is any “probabilistic outcome”
– e.g., a coin flip, height of someone randomly chosen from a
population
• An R.V. takes on a value in a sample space
– the space can be discrete, e.g., {H, T}
– or continuous, e.g., height in (0, ∞)
• R.V. denoted with capital letter (X), a realization with
lowercase letter (x)
– e.g., X is a coin flip, x is the value (H or T) of that coin flip
Probability Mass Function
• Describes probability for a discrete R.V.
• e.g., a fair coin flip: p(H) = p(T) = 0.5
Probability Density Function
• Describes probability for a continuous R.V.
• e.g., the distribution of heights in a population
[Population] Mean of a Random Variable
• aka expected value, first moment
• for discrete RV:
$E[X] = \mu = \sum_i x_i \, p_i$
• for continuous RV:
$E[X] = \mu = \int_{-\infty}^{\infty} x \, p(x) \, dx$
[Population] Variance
$\sigma^2 = E[(X - \mu)^2] = E[X^2 - 2X\mu + \mu^2] = E[X^2] - \mu^2 = E[X^2] - (E[X])^2$
• for discrete RV:
$\sigma^2 = \sum_i p_i (x_i - \mu)^2$
• for continuous RV:
$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \, p(x) \, dx$
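As a concrete check of the two variance forms above, here is a minimal Python sketch; the loaded-die values and probabilities are made-up illustrative numbers, not from the slides.

```python
# Population mean and variance of a discrete R.V. (a hypothetical loaded die)
values = [1, 2, 3, 4, 5, 6]                  # sample space x_i
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]       # p_i, must sum to 1

mu = sum(x * p for x, p in zip(values, probs))                # E[X]
var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))   # E[(X - mu)^2]

# Equivalent via the second form: E[X^2] - (E[X])^2
ex2 = sum(p * x ** 2 for x, p in zip(values, probs))
assert abs(var - (ex2 - mu ** 2)) < 1e-12
print(mu, var)
```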
Sample mean and sample variance
• Suppose we have N independent observations of X: x1, x2, …, xN
• Sample mean:
$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad E[\bar{x}] = \mu$
• Sample variance:
$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad E[s^2] = \sigma^2$
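A minimal sketch of these estimators in Python; numpy's ddof=1 gives the 1/(N−1) normalization discussed on the next slide, and the distribution parameters here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)  # N observations of X

xbar = x.mean()        # sample mean, E[xbar] = mu
s2 = x.var(ddof=1)     # sample variance with 1/(N-1), E[s^2] = sigma^2
print(xbar, s2)        # should be close to mu = 5 and sigma^2 = 4
```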
1/(N-1) and the sample variance
• The N differences $x_i - \bar{x}$ are not independent: $\sum_i (x_i - \bar{x}) = 0$
• If you know N−1 of these values, you can deduce the last one
– i.e., only N−1 degrees of freedom
• Could treat the sample as a population and compute the population variance:
$\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$
– BUT this underestimates the true population variance (especially bad if the sample is small)
Sample variance using 1/(N-1) is unbiased
$E[s^2] = E\left[ \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 \right] = \frac{1}{N-1} E\left[ \sum_{i=1}^{N} x_i^2 - N\bar{x}^2 \right]$
$= \frac{1}{N-1} \left[ N(\sigma^2 + \mu^2) - N\left( \frac{\sigma^2}{N} + \mu^2 \right) \right] = \sigma^2$
(using $E[x_i^2] = \sigma^2 + \mu^2$ and $E[\bar{x}^2] = \sigma^2/N + \mu^2$)
Computing sample variance
• Can compute as
$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
• Prefer:
$s^2 = \frac{\left( \sum_{i=1}^{N} x_i^2 \right) - N\bar{x}^2}{N-1}$
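A sketch comparing the two formulas on arbitrary data: the second form needs only a single pass over the observations (convenient when they stream out of a simulation), though it can lose precision when the mean is large relative to the spread.

```python
import numpy as np

x = np.random.default_rng(1).normal(size=10_000)
N = len(x)
xbar = x.sum() / N

# Two-pass: subtract the mean, then sum squared deviations
two_pass = np.sum((x - xbar) ** 2) / (N - 1)
# One-pass: accumulate sum of squares, correct by N * xbar^2
one_pass = (np.sum(x ** 2) - N * xbar ** 2) / (N - 1)
print(two_pass, one_pass)  # agree to many digits here
```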
The Gaussian Distribution
$p(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$
$E[X] = \mu, \qquad \mathrm{Var}[X] = \sigma^2$
Why so important?
• the sum of many independent observations of a random variable converges to a Gaussian
• in nature, events having variations resulting
from many small, independent effects tend to
have Gaussian distributions
– demo: http://www.mongrav.org/math/falling-balls-probability.htm
– e.g., measurement error
– if effects are multiplicative, logarithm is often
normally distributed
Central Limit Theorem
• Suppose we sample x1, x2, …, xN from a distribution with mean μ and variance σ2
• Let $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
• then $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}} \to N(0, 1)$
• i.e., $\bar{x}$ is distributed normally with mean μ and variance σ2/N
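A quick simulation sketch of this claim: means of N draws from a decidedly non-Gaussian exponential distribution, standardized as above, behave like a standard normal. N and the run count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 1.0            # exponential(1) has mean 1, variance 1
N, runs = 30, 100_000

samples = rng.exponential(scale=1.0, size=(runs, N))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(N))
print(z.mean(), z.std())        # approximately 0 and 1
```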
Important Properties of Normal Distribution
1. Family of normal distributions closed under linear
transformations:
if X ~ N(μ, σ2) then
(aX + b) ~ N(aμ+b, a2σ2)
2. Linear combination of normals is also normal:
if X1 ~ N(μ1, σ12) and X2 ~ N(μ2, σ22) then
aX1+bX2 ~ N(aμ1 + bμ2, a2σ12 + b2σ22)
Important Properties of Normal Distribution
3. Of all distributions with a given mean and variance, the normal has maximum entropy
Information theory: Entropy like “uninformativeness”
Principle of maximum entropy: choose to represent
the world with as uninformative a distribution as
possible, subject to “testable information”
If we know x is in [a, b], then the uniform distribution on [a, b] has greatest entropy
If we know the distribution has mean µ and variance σ2, the normal distribution N(µ, σ2) has greatest entropy
Important Properties of Normal Distribution
4. If errors are normally distributed, a least-squares fit
yields the maximum likelihood estimator
Finding the least-squares x s.t. Ax ≈ b finds the value of x that maximizes the likelihood of the data under such a model
Important Properties of Normal Distribution
5. Many derived random variables have analytically-known densities
e.g., sample mean, sample variance
6. Sample mean and variance of n identical independent samples are independent; the sample mean is a normally-distributed random variable:
$\bar{X}_n \sim N(\mu, \sigma^2 / n)$
Distribution of Sample Variance
(For Gaussian R.V. X)
$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
define $U = \frac{(n-1) s^2}{\sigma^2}$
then U has a χ2 distribution with (n−1) d.o.f., where the χ2 density with n d.o.f. is
$p(x) = \left[ 2^{n/2} \, \Gamma\!\left( \frac{n}{2} \right) \right]^{-1} x^{\frac{n}{2} - 1} e^{-x/2}, \quad x \geq 0$
$E[U] = n - 1, \qquad \mathrm{Var}[U] = 2(n-1)$
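A sketch verifying the claim by simulation: for Gaussian data, U = (n−1)s²/σ² should have mean n−1 and variance 2(n−1). The values of n, σ², and the run count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, runs = 10, 4.0, 200_000

x = rng.normal(scale=np.sqrt(sigma2), size=(runs, n))
s2 = x.var(axis=1, ddof=1)        # sample variance of each run
U = (n - 1) * s2 / sigma2
print(U.mean(), U.var())          # approximately 9 and 18
```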
The Chi-Squared Distribution
What if we don’t know true variance?
• Sample mean is a normally distributed R.V.:
$\bar{X}_n \sim N(\mu, \sigma^2 / n)$
• Taking advantage of this presumes we know σ2
• Instead, $\frac{\bar{x} - \mu}{s_n / \sqrt{n}}$ has a t distribution with (n−1) d.o.f.
[Student’s] t-distribution
Forming a confidence interval
• e.g., given that I observed a sample mean of ____, I’m 99% confident that the true mean lies between ____ and ____.
• Know that $\frac{\bar{x} - \mu}{s_n / \sqrt{n}}$ has a t distribution
• Choose q1, q2 such that a Student’s t with (n−1) d.o.f. has 99% probability of lying between q1 and q2
Confidence interval for the mean
if $P\left\{ q_1 < \frac{\bar{x}_n - \mu}{s_n / \sqrt{n}} < q_2 \right\} = 0.99$
then $P\left\{ \bar{x}_n - q_2 \frac{s_n}{\sqrt{n}} < \mu < \bar{x}_n - q_1 \frac{s_n}{\sqrt{n}} \right\} = 0.99$
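A minimal sketch of this interval using scipy's Student's t quantiles; by symmetry q1 = −q2, so only one quantile is needed. The data here are arbitrary (the slide shows a 99% interval; this sketch uses the same construction).

```python
import numpy as np
from scipy import stats

x = np.random.default_rng(4).normal(loc=5.0, scale=2.0, size=20)
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

q2 = stats.t.ppf(0.995, df=n - 1)   # 99% of t mass lies in (-q2, q2)
half = q2 * s / np.sqrt(n)
print(f"99% CI for mu: ({xbar - half:.3f}, {xbar + half:.3f})")
```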
Interpreting Simulation Outcomes
• How long will customers have to wait, on
average?
– e.g., for given # tellers, arrival rate, service time
distribution, etc.
• Simulate bank for N customers
• Let xi be the wait time of customer i
• Is mean(x) a good estimate for µ?
• How to compute a 95% confidence interval
for µ ?
– Problem: xi are not independent!
Replications
• Run simulation to get M observations
• Repeat the simulation N times (different random numbers each time)
• Treat the sample means of the different runs as approximately uncorrelated:
$s^2 = \frac{1}{n-1} \sum_i (\bar{X}_i - \bar{X})^2$
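A sketch of the replications approach; run_simulation is a hypothetical stand-in for one full simulation run (e.g., the bank model), and all parameters are illustrative.

```python
import numpy as np
from scipy import stats

def run_simulation(seed):
    """Hypothetical stand-in: returns the mean wait time over
    M = 500 simulated customers for one run."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=3.0, size=500).mean()

# N = 30 replications, each with a different random number stream
means = np.array([run_simulation(seed) for seed in range(30)])
xbar, s, n = means.mean(), means.std(ddof=1), len(means)

q = stats.t.ppf(0.975, df=n - 1)    # 95% interval across replications
half = q * s / np.sqrt(n)
print(f"mean wait ~ {xbar:.3f}, 95% CI ({xbar - half:.3f}, {xbar + half:.3f})")
```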
Batch Means
• Run the simulation for N observations (N large)
• Divide the xi into k consecutive batches of size b
• If b is large enough, mean(batch1) is approximately uncorrelated with mean(batch2), etc.
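A sketch of batch means on a single long run: split the N observations into k batches of size b and treat the batch means as approximately independent. The stand-in data and batch size are arbitrary.

```python
import numpy as np

x = np.random.default_rng(6).normal(size=100_000)  # stand-in for wait times
b = 1000                       # batch size, chosen large
k = len(x) // b                # number of batches
batch_means = x[:k * b].reshape(k, b).mean(axis=1)

# Overall estimate and its standard error from the k batch means
print(batch_means.mean(), batch_means.std(ddof=1) / np.sqrt(k))
```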
Other approaches
• Use an estimate of the autocorrelation between the xi to derive a better variance estimate that can be used for a confidence interval
• Regenerative method: Take advantage of
“regeneration points” or cycles in behavior
– e.g., points when bank is empty of customers
Simulation Wrap-up
Finally…
Implications
• Who designed it all?
• How should we behave?
• What if we start running too many of our own
simulations?
Software
• http://en.wikipedia.org/wiki/List_of_computer_simulation_software