Simulation Wrap-up, Statistics
COS 323

Today
• Simulation re-cap
• Statistics
• Variance and confidence intervals for simulations
• Simulation wrap-up
• FYI: No class or office hours Thursday

Simulation wrap-up

Last time
• Time-driven, event-driven
• "Simulation" from differential equations
• Cellular automata, microsimulation, agent-based simulation
  – see e.g. http://www.microsimulation.org/IMA/What%20is%20microsimulation.htm
• Example applications: SIR disease model, population genetics

Simulation: Pros and Cons
• Pros:
  – Building a model can be easier than with other approaches
  – Outcomes can be easy to understand
  – Cheap, safe
  – Good for comparisons
• Cons:
  – Hard to debug
  – No guarantee of optimality
  – Hard to establish validity
  – Can't produce absolute numbers

Simulation: Important Considerations
• Are outcomes statistically significant? (Need many simulation runs to assess this)
• What should the initial state be?
• How long should the simulation run?
• Is the model realistic?
• How sensitive is the model to parameters and initial conditions?

Statistics Overview

Random Variables
• A random variable is any "probabilistic outcome"
  – e.g., a coin flip, or the height of someone randomly chosen from a population
• A R.V. takes on a value in a sample space
  – the space can be discrete, e.g., {H, T}
  – or continuous, e.g., height in (0, infinity)
• A R.V. is denoted with a capital letter (X), a realization with a lowercase letter (x)
  – e.g., X is a coin flip, x is the value (H or T) of that coin flip

Probability Mass Function
• Describes the probability of each value of a discrete R.V.
• e.g., [example plot not shown]

Probability Density Function
• Describes the probability distribution of a continuous R.V.
• e.g., [example plot not shown]

[Population] Mean of a Random Variable
• aka expected value, first moment
• for a discrete R.V.: $E[X] = \mu = \sum_i x_i p_i$
• for a continuous R.V.: $E[X] = \mu = \int_{-\infty}^{\infty} x \, p(x) \, dx$

[Population] Variance
$\sigma^2 = E[(X-\mu)^2] = E[X^2 - 2X\mu + \mu^2] = E[X^2] - \mu^2 = E[X^2] - (E[X])^2$
• for a discrete R.V.: $\sigma^2 = \sum_i p_i (x_i - \mu)^2$
• for a continuous R.V.: $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \, p(x) \, dx$

Sample mean and sample variance
• Suppose we have N independent observations of X: x_1, x_2, ..., x_N
• Sample mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$, with $E[\bar{x}] = \mu$
• Sample variance: $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$, with $E[s^2] = \sigma^2$

1/(N−1) and the sample variance
• The N differences $x_i - \bar{x}$ are not independent: $\sum_i (x_i - \bar{x}) = 0$
• If you know N−1 of these values, you can deduce the last one
  – i.e., only N−1 degrees of freedom
• Could treat the sample as a population and compute the population variance: $\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$
  – BUT this underestimates the true population variance (especially bad if the sample is small)

Sample variance using 1/(N−1) is unbiased
$E[s^2] = E\left[\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2\right]
        = \frac{1}{N-1} E\left[\sum_{i=1}^{N} x_i^2 - N\bar{x}^2\right]
        = \frac{1}{N-1} \left[N(\sigma^2 + \mu^2) - N\left(\frac{\sigma^2}{N} + \mu^2\right)\right]
        = \sigma^2$

Computing sample variance
• Can compute as $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
• Prefer: $s^2 = \frac{\left(\sum_{i=1}^{N} x_i^2\right) - N\bar{x}^2}{N-1}$ (can be accumulated in a single pass over the data)
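The two formulas above are easy to check numerically. Below is a minimal Python sketch (an addition, not from the original slides; the function names and test data are illustrative) that computes the sample variance both ways and compares against NumPy's built-in:

```python
import numpy as np

def sample_variance_two_pass(x):
    """Unbiased sample variance via the definition: 1/(N-1) * sum((x_i - xbar)^2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    return np.sum((x - xbar) ** 2) / (n - 1)

def sample_variance_one_pass(x):
    """Equivalent 'shortcut' form: (sum(x_i^2) - N*xbar^2) / (N-1).
    Accumulable in a single pass, but note it can suffer cancellation
    error when the mean is large relative to the standard deviation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return (np.sum(x ** 2) - n * x.mean() ** 2) / (n - 1)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)
print(sample_variance_two_pass(x))   # close to sigma^2 = 4
print(sample_variance_one_pass(x))   # same value, up to roundoff
print(np.var(x, ddof=1))             # NumPy's built-in, for comparison
```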
The Gaussian Distribution
$p(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad E[X] = \mu, \quad Var[X] = \sigma^2$

Why so important?
• the sum of independent observations of a random variable converges to a Gaussian
• in nature, events whose variations result from many small, independent effects tend to have Gaussian distributions
  – demo: http://www.mongrav.org/math/falling-balls-probability.htm
  – e.g., measurement error
  – if the effects are multiplicative, the logarithm is often normally distributed

Central Limit Theorem
• Suppose we sample x_1, x_2, ..., x_N from a distribution with mean μ and variance σ²
• Let $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
• Then $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}} \to N(0, 1)$
• i.e., $\bar{x}$ is (approximately) normally distributed with mean μ and variance σ²/N

Important Properties of the Normal Distribution
1. The family of normal distributions is closed under linear transformations: if X ~ N(μ, σ²) then aX + b ~ N(aμ + b, a²σ²)
2. A linear combination of normals is also normal: if X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) then aX₁ + bX₂ ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²)
3. Of all distributions with a given mean and variance, the normal has maximum entropy
  – Information theory: entropy is like "uninformativeness"
  – Principle of maximum entropy: choose to represent the world with as uninformative a distribution as possible, subject to "testable information"
  – If we know only that x is in [a, b], the uniform distribution on [a, b] has maximum entropy
  – If we know only that the distribution has mean µ and variance σ², the normal distribution N(µ, σ²) has maximum entropy
4. If errors are normally distributed, a least-squares fit yields the maximum likelihood estimator
  – Finding the least-squares x such that Ax ≈ b finds the value of x that maximizes the likelihood of the data under such a model
5. Many derived random variables have analytically known densities, e.g., the sample mean and sample variance
6. The sample mean and variance of n independent, identically distributed samples are independent; the sample mean is a normally distributed random variable: $\bar{X}_n \sim N(\mu, \sigma^2/n)$

Distribution of Sample Variance (for a Gaussian R.V. X)
$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
• Define $U = \frac{(n-1)s^2}{\sigma^2}$; then U has a χ² distribution with (n−1) d.o.f.
• The χ² density with n d.o.f.: $p(x) = \left[2^{n/2} \, \Gamma\!\left(\frac{n}{2}\right)\right]^{-1} x^{n/2 - 1} e^{-x/2}, \quad x \ge 0$
• $E[U] = n - 1, \quad Var[U] = 2(n-1)$

The Chi-Squared Distribution
[density plot not shown]

What if we don't know the true variance?
• The sample mean is a normally distributed R.V.: $\bar{X}_n \sim N(\mu, \sigma^2/n)$
• Taking advantage of this presumes we know σ²
• Instead, $\frac{\bar{x} - \mu}{s_n / \sqrt{n}}$ has a t distribution with (n−1) d.o.f.

[Student's] t-distribution
[density plot not shown]

Forming a confidence interval
• e.g., "given that I observed a sample mean of ____, I'm 99% confident that the true mean lies between ____ and ____."
• Know that $\frac{\bar{x} - \mu}{s_n / \sqrt{n}}$ has a t distribution
• Choose q₁, q₂ such that a Student's t variable with (n−1) d.o.f. has 99% probability of lying between q₁ and q₂

Confidence interval for the mean
• If $P\left\{q_1 < \frac{\bar{x}_n - \mu}{s_n / \sqrt{n}} < q_2\right\} = 0.99$
• then $P\left\{\bar{x}_n - q_2 \frac{s_n}{\sqrt{n}} < \mu < \bar{x}_n - q_1 \frac{s_n}{\sqrt{n}}\right\} = 0.99$
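As a concrete illustration of this recipe, here is a short Python sketch (an addition, not from the original slides; the function name and test data are illustrative) that forms a 99% confidence interval for the mean using Student's t quantiles from SciPy:

```python
import numpy as np
from scipy import stats

def t_confidence_interval(x, confidence=0.99):
    """Confidence interval for the true mean, using the Student's t
    distribution with n-1 degrees of freedom (sigma^2 unknown)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    s = x.std(ddof=1)                  # sample standard deviation, 1/(n-1) form
    alpha = 1.0 - confidence
    # By symmetry of the t density, q1 = -q2 with q2 the (1 - alpha/2) quantile
    q2 = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)
    half_width = q2 * s / np.sqrt(n)
    return xbar - half_width, xbar + half_width

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=3.0, size=50)
lo, hi = t_confidence_interval(x)
print(f"99% CI for the mean: ({lo:.3f}, {hi:.3f})")
```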
Interpreting Simulation Outcomes
• How long will customers have to wait, on average?
  – e.g., for a given # of tellers, arrival rate, service time distribution, etc.
• Simulate the bank for N customers
• Let x_i be the wait time of customer i
• Is mean(x) a good estimate for µ?
• How do we compute a 95% confidence interval for µ?
  – Problem: the x_i are not independent!

Replications
• Run the simulation to get M observations
• Repeat the whole simulation n times (with different random numbers each time)
• Treat the sample means $\bar{X}_1, \ldots, \bar{X}_n$ of the different runs as approximately uncorrelated, and estimate their variance:
  $s^2 = \frac{1}{n-1} \sum_i (\bar{X}_i - \bar{X})^2$, where $\bar{X}$ is the mean of the run means

Batch Means
• Run the simulation for N steps (N large)
• Divide the x_i into k consecutive batches of size b
• If b is large enough, mean(batch 1) is approximately uncorrelated with mean(batch 2), etc. (see the sketch at the end of these notes)

Other approaches
• Use an estimate of the autocorrelation between the x_i to derive a better estimate of the variance, which can then be used for a confidence interval
• Regenerative method: take advantage of "regeneration points" or cycles in behavior
  – e.g., points when the bank is empty of customers

Simulation Wrap-up

Finally... Implications
• Who designed it all?
• How should we behave?
• What if we start running too many of our own simulations?

Software
• http://en.wikipedia.org/wiki/List_of_computer_simulation_software
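Finally, the batch-means sketch referenced above: a minimal Python illustration (an addition, not from the original slides) of a confidence interval for the mean of an autocorrelated output sequence. The batch size and the AR(1) test data standing in for, e.g., waiting times are arbitrary choices.

```python
import numpy as np
from scipy import stats

def batch_means_ci(x, b, confidence=0.95):
    """Batch-means confidence interval for the mean of a (possibly
    autocorrelated) simulation output sequence x, with batch size b."""
    x = np.asarray(x, dtype=float)
    k = len(x) // b                      # number of complete batches
    batches = x[:k * b].reshape(k, b)
    means = batches.mean(axis=1)         # approx. uncorrelated if b is large
    grand_mean = means.mean()
    s = means.std(ddof=1)                # sample std of the k batch means
    q = stats.t.ppf(1.0 - (1.0 - confidence) / 2.0, df=k - 1)
    half_width = q * s / np.sqrt(k)
    return grand_mean - half_width, grand_mean + half_width

# Toy correlated data: an AR(1) process with true mean 0
rng = np.random.default_rng(2)
x = np.empty(100_000)
x[0] = 0.0
for i in range(1, len(x)):
    x[i] = 0.9 * x[i - 1] + rng.normal()
lo, hi = batch_means_ci(x, b=1000)
print(f"95% CI for the mean: ({lo:.3f}, {hi:.3f})")
```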