Lesson 1: Introduction to Monte Carlo

• Go over outline and syllabus
• Background and course overview
• Statistical formulae:
  • Mean
  • SD of population
  • SD of mean
• A little practice

Background and course overview

• Monte Carlo methods are a branch of mathematics that involves the stochastic solution of problems: an experimental approach to solving a problem (the "Method of Statistical Trials").
• When the analyst is trying to use a Monte Carlo approach to estimate a physically measurable variable, the approach breaks down into two steps:
  1. Devise a numerical experiment whose expected value corresponds to the desired measurable value $x$.
  2. Run the problem to determine an estimate of this variable. We call the estimate $\hat{x}$.
• Many (probably most) Monte Carlo problems are of the "hit or miss" category, which finds the probability of some event occurring (e.g., hitting a straight flush, a neutron escaping through a surface, etc.).

BG and Overview (2)

• The first step can be either very simple or very complicated, depending on the particular features of the problem.
• If the problem is itself stochastic, the experimental design step is very simple: let the mathematical simulation simulate the problem. This is an analog simulation, since the calculation is a perfect analog of the problem.
• Luckily for us, the transport of neutral particles is a stochastic situation. All we HAVE to do to get an estimate of a measurable effect from a transport situation is to simulate the stochastic "decisions" that nature makes.

BG and Overview (3)

• For processes that are NOT inherently stochastic, the experimental design is more complex and generally requires that the analyst:
  1. Derive an equation (e.g., the heat transfer equation, the Boltzmann transport equation) from whose solution an estimate of the effect of interest can be inferred.
  2. Develop a Monte Carlo method to solve the equation.
• In this course we will take BOTH approaches:
  • 3 weeks of event-based mathematical basics
  • 4 weeks of optimization of event-based transport methods
  • 3 weeks function-based, using a more formal "functional analysis" approach to the solution of integral and differential equations (with the BTE as an example)

Our First Example: Finding π

• Our first example will be a numerical estimation of $\pi$, based on use of a "hit or miss" approach. We know that the ratio of the area of a circle to the area of the square that (just barely) encloses it is:

$$\Pr\{\text{hitting circle}\} = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}$$

• Knowing this, we can design an experiment that will deliver an expected value of $\pi$.

Our First Example (2)

1. Choose a point at random inside a 2×2 square by:
   A. choosing a random number ($x_1$) between -1 and 1 for the x coordinate, and
   B. choosing a random number ($x_2$) between -1 and 1 for the y coordinate.

NOTE: By doing this you have made an implicit "change of variable" to what is called a "unit hypercube of order 2".

Our First Example (3)

2. Score the result of the trial: consider a "hit" (score $s_i = 4$) to be the situation when the chosen point is inside the circle, i.e., $x^2 + y^2 \le 1$, with a "miss" scoring 0. (Why does a success score 4?)
3. Run the experiment a large number ($N$) of times, with the final estimate of the circle's area being an average of the results:

$$\hat{\pi}_N = \frac{\sum_{i=1}^{N} s_i}{N}$$
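The scoring scheme above translates directly into code. Below is a minimal Java sketch of the hit-or-miss estimator; the class and variable names are our own illustrative choices, not from the course materials. (Note also why a success scores 4: the enclosing square has area 4, so the average score converges to $4 \cdot \Pr\{\text{hit}\} = \pi$.)

```java
import java.util.Random;

public class PiEstimator {
    public static void main(String[] args) {
        final int N = 10_000;       // number of trials
        Random rng = new Random();  // stream of random numbers in [0, 1)
        double sum = 0.0;           // running sum of the scores s_i

        for (int i = 0; i < N; i++) {
            // Random point in the 2x2 square centered at the origin
            double x = 2.0 * rng.nextDouble() - 1.0;
            double y = 2.0 * rng.nextDouble() - 1.0;
            // A "hit" (inside the unit circle) scores 4; a "miss" scores 0
            if (x * x + y * y <= 1.0) {
                sum += 4.0;
            }
        }

        // Final estimate: the unweighted average of the scores
        System.out.println("Estimate of pi after " + N + " trials: " + sum / N);
    }
}
```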
Coding

• This course will require lots of coding. You need to be able to write code in Java:
  • Syntax similar to other languages (if you stay non-object-oriented)
  • Free download of the language
  • Useful for other things (web pages)
  • A simple tutorial is available in the Public area of the course (JavaLite.pdf)

Basic view of the MC process

• Our basic view of a Monte Carlo process is a black box that takes a stream of random numbers (between 0 and 1) as input and produces a stream of estimates of the effect of interest as output.
• Sometimes the estimates can be quite approximate, but with a long enough stream of estimates, we can get a good sample.

3 Formulae

• There are three statistical formulae that we will be using over and over in this course:
  • our estimate of the expected value,
  • our estimate of the variance of the sample, and
  • our estimate of the variance of the expected value.
• You must be able to tell them apart.

Estimate of the expected value

• The first, and most important, deals with how we gather from the stream of estimates the BEST POSSIBLE estimate of the expected value. The resulting formula for $\hat{x}_N$ is:

$$\hat{x}_N = \frac{\sum_{i=1}^{N} x_i}{N}$$

• Thus, our best estimate is the unweighted average of the individual estimates. This is not surprising, of course.
• Let's compare this with a couple of exact formulae.

Mean of a continuous distribution

• For a continuous distribution $p(x)$ over a range $(a,b)$ (i.e., $x = a$ to $x = b$), the true mean $\bar{x}$ is the first moment of $x$:

$$\bar{x} = \int_a^b x\, p(x)\, dx$$

• where we have assumed that $p(x)$ is a true probability density function (pdf), obeying the following:

$$p(x) \ge 0 \text{ in the range } (a,b), \qquad \int_a^b p(x)\, dx = 1$$

Mean of a discrete distribution

• For a discrete distribution we choose one of $M$ choices, each of which has probability $p_i$.
• The equation for the mean is:

$$\bar{x} = \sum_{i=1}^{M} p_i x_i$$

• Again, we have limitations on the probabilities:

$$p_i \ge 0 \text{ for all } i, \qquad \sum_{i=1}^{M} p_i = 1$$

Example: π problem

• For our example of finding $\pi$, we were dealing with a binomial distribution (i.e., two possible outcomes):
  • Outcome 1 = hit the circle: $x_1 = 4$, $p_1 = \pi/4 \approx 0.7854$
  • Outcome 2 = miss the circle: $x_2 = 0$, $p_2 = 1 - \pi/4 \approx 0.2146$
• Therefore, the expected value is:

$$\bar{x} = \sum_{i=1}^{M} p_i x_i = 0.7854 \times 4 + 0.2146 \times 0 = \pi$$

Estimate of the sample variance

• Variance = expected squared error:

$$\sigma^2(x) = \int_a^b p(x)\,\left(x - \bar{x}\right)^2 dx \quad \text{for a continuous distribution}$$

$$\sigma^2(x) = \sum_{i=1}^{M} p_i \left(x_i - \bar{x}\right)^2 \quad \text{for a discrete distribution}$$

• MC estimate of the variance:

$$S^2(x_n) = \frac{\sum_{i=1}^{N}\left(x_i - \hat{x}_N\right)^2}{N-1} = \frac{N}{N-1}\left[\frac{\sum_{i=1}^{N} x_i^2}{N} - \left(\frac{\sum_{i=1}^{N} x_i}{N}\right)^2\right]$$

Sample standard deviation

• The standard deviation is the square root of the variance. The same is true of our estimate of it:

$$S(x_n) = \sqrt{S^2(x_n)}$$

  (Many texts call this estimate the "standard error.")
• Recall that we have been talking about properties of the sample distribution: how much the individual estimates differ from each other.

Example: π problem

• Using the same outcomes as before:

$$\sigma^2(x) = \sum_{i=1}^{M} p_i \left(x_i - \bar{x}\right)^2 = 0.7854\,(4-\pi)^2 + 0.2146\,(0-\pi)^2 \approx 2.697$$

$$\sigma(x) = \sqrt{2.697} \approx 1.642$$

• A very non-normal distribution!
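The second (running-sum) form of the variance estimate is how these quantities are usually accumulated in practice, since it needs only $\sum x_i$ and $\sum x_i^2$. Here is a short Java sketch applying it to the π scores; the names and structure are our own illustration, not the course's code:

```java
import java.util.Random;

public class SampleStatistics {
    public static void main(String[] args) {
        final int N = 10_000;
        Random rng = new Random();
        double sum = 0.0;    // running sum of the scores x_i
        double sumSq = 0.0;  // running sum of the squared scores x_i^2

        for (int i = 0; i < N; i++) {
            double x = 2.0 * rng.nextDouble() - 1.0;
            double y = 2.0 * rng.nextDouble() - 1.0;
            double score = (x * x + y * y <= 1.0) ? 4.0 : 0.0;
            sum += score;
            sumSq += score * score;
        }

        // Estimate of the expected value: the unweighted average
        double mean = sum / N;
        // Sample variance: S^2(x_i) = N/(N-1) * (mean of squares - square of mean)
        double sampleVar = (N / (N - 1.0)) * (sumSq / N - mean * mean);
        double sampleSd = Math.sqrt(sampleVar);

        System.out.println("Mean estimate (pi-hat): " + mean);
        System.out.println("Sample SD estimate:     " + sampleSd + "  (exact value ~1.642)");
    }
}
```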
Estimate of the variance of the mean

• Now we turn our attention to the variance and standard deviation of the mean: how much confidence do we have in the mean that we obtained from $N$ samples?
• We could estimate this by making many estimates of the mean (each using $N$ independent samples) and doing a statistical analysis on those estimates.
• To our great relief, we can streamline this process and get an estimate of the variance of the mean from a single set of $N$ samples.

Variance of mean (cont'd)

• The text has a derivation showing that the variance of the mean is related to the variance of the distribution by:

$$\sigma^2(\hat{x}_N) = \frac{\sigma^2(x)}{N}$$

• Since we do not know the actual variance, we have to use our estimate:

$$S^2(\hat{x}_N) = \frac{S^2(x_i)}{N}, \qquad S(\hat{x}_N) = \frac{S(x_i)}{\sqrt{N}}$$

Example: π problem

• Back to our example of finding $\pi$: using the probabilities from the previous example, the standard deviation of the mean for a sample of $N = 10{,}000$ would be:

$$\sigma(\hat{x}_N) \approx S(\hat{x}_N) = \frac{S(x_i)}{\sqrt{N}} = \frac{1.642}{\sqrt{10{,}000}} = 0.01642$$

• This brings us to the famous $N^{-1/2}$ principle of Monte Carlo: each extra digit of accuracy requires that the problem be run with 100 times as many histories.

Markov inequality

• Most distributions are not normal.
• What can we say about the probability of a selection being within $1\sigma$ when the distribution is NOT normal?
• An upper bound is given by the Chebyshev inequality, but before attacking it, we need to build a tool we will use: the Markov inequality.
• Thought experiment: if I tell you that a group of people has an average weight of 100 pounds, what can you say about the number that weigh more than 200 pounds?

Markov inequality (2)

For a non-negative random variable $x$:

$$E(x) = \bar{x} = \int_0^\infty x\, p(x)\, dx$$
$$\ge \int_{n\bar{x}}^\infty x\, p(x)\, dx \quad \text{(because the range of integration is smaller)}$$
$$\ge \int_{n\bar{x}}^\infty n\bar{x}\, p(x)\, dx \quad \text{(because } n\bar{x} \text{ is the lower limit of } x\text{)}$$
$$= n\bar{x} \int_{n\bar{x}}^\infty p(x)\, dx \quad \text{(because } n\bar{x} \text{ is a constant)}$$
$$= n\bar{x}\, \Pr\{x \ge n\bar{x}\} \quad \text{(because the integral defines the probability)}$$

$$\Rightarrow \quad \Pr\{x \ge n\bar{x}\} \le \frac{1}{n}$$

(For the thought experiment: with $\bar{x} = 100$ and $n = 2$, at most half the group can weigh 200 pounds or more.)

Chebyshev inequality

• The Chebyshev inequality applies the Markov inequality to the variance instead of the average:

$$E\left[(x-\bar{x})^2\right] = \sigma^2 \quad\Rightarrow\quad \Pr\left\{(x-\bar{x})^2 \ge n\sigma^2\right\} \le \frac{1}{n}$$

• Replace $n$ with its square (it's just a positive number!):

$$\Pr\left\{(x-\bar{x})^2 \ge n^2\sigma^2\right\} \le \frac{1}{n^2} \quad\Rightarrow\quad \Pr\left\{\left|x-\bar{x}\right| \ge n\sigma\right\} \le \frac{1}{n^2}$$

Chebyshev inequality (2)

• The resulting statements are not very satisfying to us (especially since we are used to normal distributions), since Chebyshev guarantees only $\Pr\{|x-\bar{x}| < n\sigma\} \ge 1 - 1/n^2$:
  • Normal: 68.3% within $1\sigma$ vs. Chebyshev: at least 0%
  • Normal: 95.4% within $2\sigma$ vs. Chebyshev: at least 75%
  • Normal: 99.7% within $3\sigma$ vs. Chebyshev: at least 88.9%
• But Chebyshev is extremely valuable to theoretical mathematicians because it proves that the integral over the "tails" is guaranteed to decrease with $n$, with a limit of 0.

Law of Large Numbers

• The theoretical basis of Monte Carlo is the Law of Large Numbers.
• LLN: the weighted average value of the function $f$ satisfies

$$\bar{f} = \int_a^b f(x)\, p(x)\, dx = \lim_{N\to\infty} \frac{\sum_{i=1}^{N} f(x_i)}{N}, \quad \text{where the } x_i \text{ are chosen using } p(x)$$

• This relates the result of a continuous integration to the result of a discrete sampling. All MC comes from this.

Law of Large Numbers (2)

• At first glance this looks like it would be useful to mathematicians trying to estimate integrals, but not particularly useful to us: we are not performing integrations, we are simulating physical phenomena.
• That attitude indicates that you are just not thinking abstractly enough. All Monte Carlo processes are (once you dig down) integrations over a domain of "all possible outcomes."
• Our values of "$x$" range over all possible histories that a particle might have.

Central limit theorem

• The second most important (i.e., useful) theoretical result for Monte Carlo is the Central Limit Theorem.
• CLT: the sum of a sufficiently large number of independent, identically distributed (i.i.d.) random variables becomes normally distributed as $N$ increases.
• This is useful for us because we can draw useful conclusions from the results of a large number of samples (e.g., 68.3% within one standard deviation, etc.).
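To see the $N^{-1/2}$ principle numerically, one can run the π experiment at several sample sizes and watch $S(\hat{x}_N)$ shrink by roughly a factor of 10 for every 100-fold increase in $N$. The Java sketch below is our own illustration under that framing (class and method names are assumptions, not course code):

```java
import java.util.Random;

public class ConvergenceDemo {
    // Run the hit-or-miss pi experiment; return {mean, S(mean)}
    static double[] run(int n, Random rng) {
        double sum = 0.0, sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double x = 2.0 * rng.nextDouble() - 1.0;
            double y = 2.0 * rng.nextDouble() - 1.0;
            double score = (x * x + y * y <= 1.0) ? 4.0 : 0.0;
            sum += score;
            sumSq += score * score;
        }
        double mean = sum / n;
        double sampleVar = (n / (n - 1.0)) * (sumSq / n - mean * mean); // S^2(x_i)
        double sdOfMean = Math.sqrt(sampleVar / n);                     // S(x_i)/sqrt(N)
        return new double[] { mean, sdOfMean };
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Each 100-fold increase in N should cut S(mean) by about a factor of 10
        for (int n : new int[] { 100, 10_000, 1_000_000 }) {
            double[] r = run(n, rng);
            System.out.printf("N = %9d   pi-hat = %.5f   S(mean) = %.5f%n", n, r[0], r[1]);
        }
    }
}
```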