Simulation Wrap-up, Statistics
COS 323

Last time
• Time-driven vs. event-driven simulation
• “Simulation” from differential equations
• Cellular automata, microsimulation, agent-based simulation
• Example applications: SIR disease model, population genetics

Simulation: Pros and Cons
• Pros:
– Building a model can be easier than with other approaches
– Outcomes can be easy to understand
– Cheap, safe
– Good for comparisons
• Cons:
– Hard to debug
– No guarantee of optimality
– Hard to establish validity
– Can’t produce absolute numbers

Simulation: Important Considerations
• Are outcomes statistically significant? (Need many simulation runs to assess this)
• What should the initial state be?
• How long should the simulation run?
• Is the model realistic?
• How sensitive is the model to parameters and initial conditions?

Statistics Overview
[Diagram: descriptive statistics takes an experiment to data, and data to sample statistics (μ, σ², …); inferential statistics combines these with a model for the probabilistic mechanism to produce estimates with confidence intervals, inferences, and predictions]

Random Variables
• A random variable is any “probabilistic outcome”
– e.g., a coin flip, or the height of someone randomly chosen from a population
• A R.V. takes on a value in a sample space
– the space can be discrete, e.g. {H, T}
– or continuous, e.g. height in (0, ∞)
• A R.V. is denoted with a capital letter (X), a realization with a lowercase letter (x)
– e.g., X is a coin flip, x is the value (H or T) of that coin flip

Probability Mass Function
• Describes the probability distribution of a discrete R.V.
• e.g.: [example plot not reproduced]

Probability Density Function
• Describes the probability distribution of a continuous R.V.
• e.g.: [example plot not reproduced]

[Population] Mean of a Random Variable
• aka expected value, first moment
• for a discrete R.V.: E[X] = μ = Σ_i x_i p_i   (requires that Σ_i p_i = 1)
• for a continuous R.V.: E[X] = μ = ∫_{−∞}^{∞} x p(x) dx   (requires that ∫_{−∞}^{∞} p(x) dx = 1)

[Population] Variance
• σ² = E[(X − μ)²] = E[X² − 2Xμ + μ²] = E[X²] − μ² = E[X²] − (E[X])²
• for a discrete R.V.: σ² = Σ_i p_i (x_i − μ)²
• for a continuous R.V.: σ² = ∫_{−∞}^{∞} (x − μ)² p(x) dx

Sample mean and sample variance
• Suppose we have N independent observations of X: x_1, x_2, …, x_N
• Sample mean: x̄ = (1/N) Σ_{i=1}^{N} x_i
– Unbiased: E[x̄] = μ
• Sample variance: s² = (1/(N−1)) Σ_{i=1}^{N} (x_i − x̄)²
– Unbiased: E[s²] = σ²

1/(N−1) and the sample variance
• The N differences x_i − x̄ are not independent: Σ_i (x_i − x̄) = 0
• If you know N−1 of these values, you can deduce the last one
– i.e., there are only N−1 degrees of freedom
• Could treat the sample as a population and compute the population variance (1/N) Σ_{i=1}^{N} (x_i − x̄)²
– BUT this underestimates the true population variance (especially bad if the sample is small)

Computing Sample Variance
• Can compute as s² = (1/(N−1)) Σ_{i=1}^{N} (x_i − x̄)²   (requires two passes over the data)
• Prefer: s² = (Σ_{i=1}^{N} x_i² − N x̄²) / (N−1) = (Σ_{i=1}^{N} x_i² − (1/N)(Σ_{i=1}^{N} x_i)²) / (N−1)
(one pass, fewer operations; note, though, that this form can lose accuracy to floating-point cancellation when the mean is large relative to the spread)

The Gaussian Distribution
• p(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))
• E[X] = μ, Var[X] = σ²

Why so important?
• a sum of independent observations of random variables converges to a Gaussian (with some assumptions)
• in nature, events whose variation results from many small, independent effects tend to have Gaussian distributions
– demo: http://www.mongrav.org/math/falling-ballsprobability.htm
– e.g., measurement error
– if the effects are multiplicative, the logarithm is often normally distributed

Central Limit Theorem
• Suppose we sample x_1, x_2, …, x_N from a distribution with mean μ and variance σ²
• Let x̄ = (1/N) Σ_{i=1}^{N} x_i
• Then z = (x̄ − μ) / (σ/√N) → N(0, 1)
• Holds for (almost) any parent distribution!
• i.e., x̄ is approximately normally distributed with mean μ and variance σ²/N (demonstrated in the sketch below)
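To make the Central Limit Theorem concrete, here is a minimal Python sketch (not part of the original slides): it draws many sample means from a strongly skewed parent distribution (an exponential, chosen here purely for illustration) and checks that the standardized means behave like N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Parent distribution: exponential with scale 1, so mu = 1 and sigma = 1.
# The parent is strongly skewed -- nothing like a Gaussian.
mu, sigma = 1.0, 1.0
N = 50          # observations averaged into each sample mean
runs = 10_000   # number of sample means to draw

samples = rng.exponential(scale=1.0, size=(runs, N))
xbar = samples.mean(axis=1)               # one sample mean per run
z = (xbar - mu) / (sigma / np.sqrt(N))    # standardize: (xbar - mu)/(sigma/sqrt(N))

# If the CLT holds, z should be approximately N(0, 1):
print(f"mean(z) = {z.mean():+.3f}   (expect ~0)")
print(f"var(z)  = {z.var():.3f}    (expect ~1)")
print(f"P(|z| < 1.96) = {(np.abs(z) < 1.96).mean():.3f}   (expect ~0.95)")
```

Increasing N tightens the agreement; with N = 1 you would just recover the skewed parent distribution.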
Important Properties of Normal Distribution
1. The family of normal distributions is closed under linear transformations:
if X ~ N(μ, σ²) then aX + b ~ N(aμ + b, a²σ²)
2. A linear combination of independent normals is also normal:
if X_1 ~ N(μ_1, σ_1²) and X_2 ~ N(μ_2, σ_2²) are independent, then aX_1 + bX_2 ~ N(aμ_1 + bμ_2, a²σ_1² + b²σ_2²)

Important Properties of Normal Distribution
3. Of all distributions with a given mean and variance, the normal has maximum entropy
– Information theory: entropy is like “uninformativeness”
– Principle of maximum entropy: choose to represent the world with as uninformative a distribution as possible, subject to “testable information”
– If we know only that x is in [a, b], the uniform distribution on [a, b] has maximum entropy
– If we know only that the distribution has mean μ and variance σ², the normal distribution N(μ, σ²) has maximum entropy

Important Properties of Normal Distribution
4. If errors are normally distributed, a least-squares fit yields the maximum likelihood estimator
– Finding the least-squares x s.t. Ax ≈ b finds the value of x that maximizes the likelihood of the data b under a model with normally distributed errors

Important Properties of Normal Distribution
5. Many derived random variables have analytically-known densities
– e.g., sample mean, sample variance
6. The sample mean and sample variance of n independent, identically distributed normal samples are independent; the sample mean is itself a normally distributed random variable:
X̄_n ~ N(μ, σ²/n)

What if we don’t know the true variance?
• The sample mean is a normally distributed R.V.: X̄_n ~ N(μ, σ²/n)
• Taking advantage of this presumes we know σ²
• (x̄ − μ) / (s_n/√n) has a t distribution with (n−1) d.o.f.

[Student’s] t-distribution
[Figure: density of the t distribution; not reproduced]

Forming a confidence interval
• e.g., given that I observed a sample mean of ____, I’m 99% confident that the true mean lies between ____ and ____.
• Know that (x̄ − μ) / (s_n/√n) has a t distribution
• Choose q_1, q_2 such that a Student’s t with (n−1) d.o.f. has 99% probability of lying between q_1 and q_2
[Figure: t density with the central 99% region between quantiles q_1 and q_2; not reproduced]

Interpreting Simulation Outcomes
• How long will customers have to wait, on average?
– e.g., for a given # of tellers, arrival rate, service time distribution, etc.

Interpreting Simulation Outcomes
• Simulate the bank for N customers
• Let x_i be the wait time of customer i
• Is mean(x) a good estimate for μ?
• How do we compute a 95% confidence interval for μ?
– Problem: the x_i are not independent!

Replications
• Run the simulation to get M observations
• Repeat the simulation N times (with different random numbers each time)
• Treat the sample means X̄_1, …, X̄_N of the different runs as approximately uncorrelated, and estimate their variance as
s² = (1/(N−1)) Σ_{i=1}^{N} (X̄_i − X̄)²
(see the sketch below)
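A minimal Python sketch (not from the slides) of this replication procedure. The function simulate_bank is a hypothetical stand-in for one simulation run of M customers; a real event-driven bank model would go in its place. Everything else follows the slides: N independent runs, the sample mean and variance across run means, and a t-based confidence interval since the true variance is unknown.

```python
import numpy as np
from scipy import stats

def simulate_bank(rng, M=1000):
    # Hypothetical stand-in: returns the average wait time over M
    # customers for one run. Here we fake waits as exponential draws;
    # a real discrete-event simulation would replace this body.
    return rng.exponential(scale=4.0, size=M).mean()

N = 30                      # number of independent replications
rng = np.random.default_rng(42)
run_means = np.array([simulate_bank(rng) for _ in range(N)])

xbar = run_means.mean()     # mean of the per-run means
s = run_means.std(ddof=1)   # sample std dev across runs (divides by N-1)

# 95% confidence interval using Student's t with N-1 degrees of
# freedom, since sigma^2 is estimated rather than known:
q = stats.t.ppf(0.975, df=N - 1)
half_width = q * s / np.sqrt(N)
print(f"mean wait ~ {xbar:.3f}")
print(f"95% CI: [{xbar - half_width:.3f}, {xbar + half_width:.3f}]")
```

For the 99% interval described on the confidence-interval slide, replace 0.975 with 0.995.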