Computer science homework: Uniform and non-uniform distributions of real-valued random numbers

1 Rules for the homework

This homework must be handed in on paper, in groups of two, at the beginning of your lesson in week 45 (week of 7 November) for students of the filière classique, or at the beginning of your lesson in week 46 (week of 14 November) for groups containing at least one CFA or FIE student. Two points will be deducted per day of delay.

2 Introduction

To simulate physical processes, it is useful to know how to generate real random numbers that follow uniform or non-uniform distributions (for example, a Lorentzian distribution for atomic line profiles or a Gaussian distribution for laser beam profiles). We have seen in class how to generate uniformly distributed integer random numbers. In the first part of this problem you will adapt these results to the case of real numbers and develop some statistical tools. We will then study how to generate a random variable with a Gaussian probability density from a uniform random variable. This method is called the reject method.

3 Generation of a uniform variable of real numbers

Write a function that generates random numbers of type double between two bounds with a uniform probability density. Write a function that calculates the mean and the standard deviation of a distribution of real numbers.

4 Histogram

Computing a histogram for real numbers is more complicated than for integers. The method that we propose is the following: suppose that you have a set of N real numbers, called the sample, whose values lie between two values min and max. The interval [min, max] is divided into classes. The number of classes is $M = \sqrt{N}$ (or its integer part). As we are considering classes of equal size, the size of a class is then fixed.
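As a minimal sketch of the two functions requested in section 3 and of the class layout described so far, the following Python code may help fix ideas. It is only an illustration under stated assumptions: the names `uniform`, `mean_std` and `class_layout` are hypothetical, Python's `float` stands in for the type double, and the standard deviation is taken as the population one (division by N), which the text does not specify.

```python
import math
import random


def uniform(rng, low, high):
    """Return a float uniformly distributed between low and high.

    rng.random() lies in [0, 1), so the result lies in [low, high),
    up to floating-point rounding at the upper bound.
    """
    return low + (high - low) * rng.random()


def mean_std(sample):
    """Return (mean, standard deviation) of a non-empty list of floats.

    Assumption: population standard deviation (divide by N, not N - 1).
    """
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / n
    return mean, math.sqrt(variance)


def class_layout(sample):
    """Return (M, width) for M = floor(sqrt(N)) equal classes on [min, max]."""
    m = math.isqrt(len(sample))
    width = (max(sample) - min(sample)) / m
    return m, width


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so that a run is reproducible
    sample = [uniform(rng, -2.0, 3.0) for _ in range(10_000)]
    print(mean_std(sample))
    print(class_layout(sample))
```

For a uniform sample on [-2, 3] the mean should be close to 0.5 and the standard deviation close to 5/√12, which gives a quick sanity check of the generator.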
We define the following sequence: $(x_i)_{1 \le i \le M+1}$ with $M = \sqrt{N}$, so that class $i$ is the interval $[x_i, x_{i+1}[$. Each of these intervals has the same width $\ell$:

$$x_{i+1} - x_i = \ell$$

When an element of the sample falls in class $i$, a counter linked to this class is incremented. In other words, counter $i$ counts the number of times an element of the sample falls in class $i$. The set of counters of all the classes gives the occurrences of each class. To obtain an estimate of the probability density of the data, divide the value of each counter by the product of the sample size N and the class width $\ell$. Why does this normalisation give an estimate of the probability density? In the following, this "estimator" of the probability density will be called the histogram.

1. Write a function that calculates the histogram of a sample of real numbers.
2. Test your function with a sample generated by your generator of real-valued random variables. Use Excel to visualize your histogram.
3. Calculate the histogram for samples of different sizes N. What can you conclude from comparing these histograms?

5 $\chi^2$ test

The $\chi^2$ test is a very useful statistical test for determining whether the values obtained from an experimental realization are distributed according to a given, known probability function. In this test you calculate

$$\chi^2 = \frac{1}{M} \sum_{i=1}^{M} \frac{\left(\hat{P}[i] - P[i]\right)^2}{P[i]}$$

where M is the number of classes. The vector $\hat{P}[i]$ represents the histogram of the sample and the vector $P[i]$ represents the expected probabilities for each class $i$. The value of $\chi^2$ corresponds to a normalized "quadratic distance" between the estimated histogram and the theoretical histogram. When this value is small, we can consider that the measured sample follows the probability law $P[i]$.

1. Write a function computing the $\chi^2$.
2. Test it with a realization of a sample of uniform random numbers. If you use another realization, what is the result?
3. Plot in Excel the value of $\chi^2$ versus the sample size N. For each value of N, plot the mean of the estimated $\chi^2$ over 20 realizations of the sample.

6 Reject method

This method can be used to generate any kind of random variable. Data are first generated following a density close to the desired one. A portion of these data is then eliminated so that only the data that follow the expected distribution are kept. We will apply this method to the generation of a Gaussian random variable, which is widely used in physics and engineering. The mathematical proof is not given here; you just have to follow this algorithm:

1. Generate U1, a uniform random variable on [0,1] (denoted U[0,1] in the following).
2. Calculate $X = -\ln(U_1)$.
3. Generate U2, a U[0,1].
4. Test whether $U_2 \le e^{-\frac{1}{2}(X-1)^2}$.
5. If this condition is not satisfied, return to step 1 (X is rejected); otherwise continue (X is accepted).
6. Generate U3, a U[0,1].
7. If $U_3 \le 0.5$, give X a negative sign; otherwise keep it positive.

The accepted values of X follow a Gaussian probability law with zero mean and variance 1. Implement this algorithm. Plot in Excel the histogram of the probability distribution that is generated.

7 Application

Using the function written in the previous section, write a function that generates a Gaussian random variable with any mean $m$ and any variance $\sigma^2$. Generate a sample of 200 uniform random numbers with mean 1 and variance 2. Recall that the variance of a uniform probability distribution between two bounds MIN and MAX is $(\mathrm{MIN} - \mathrm{MAX})^2 / 12$. Generate a sample of Gaussian random numbers with the same mean and the same variance. How can you distinguish these two samples?

If you want to know more about the reject method, you can consult, for example, the following site: www.stat.ucl.ac.be/cours/stat2430/documents/random.pdf
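The histogram estimator of section 4, the $\chi^2$ formula of section 5, the seven steps of section 6 and the rescaling of section 7 could be sketched in Python as follows. This is one possible reading, not a reference solution: the function names are hypothetical, the sample is assumed to contain at least two distinct values, and the Gaussian with mean $m$ and variance $\sigma^2$ is obtained as $m + \sigma X$, a standard affine transform that the text does not state explicitly.

```python
import math
import random


def histogram(sample):
    """Return (densities, width) with M = floor(sqrt(N)) equal classes.

    Each counter is divided by N * width, as described in section 4.
    Assumption: the sample has at least two distinct values.
    """
    n = len(sample)
    m = math.isqrt(n)
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / m
    counts = [0] * m
    for x in sample:
        # The maximum of the sample is placed in the last class.
        i = min(int((x - lo) / width), m - 1)
        counts[i] += 1
    return [c / (n * width) for c in counts], width


def chi2(p_hat, p):
    """Normalized quadratic distance of section 5 (same units for both)."""
    m = len(p)
    return sum((a - b) ** 2 / b for a, b in zip(p_hat, p)) / m


def gaussian_reject(rng):
    """Draw one value with zero mean and unit variance (steps 1 to 7)."""
    while True:
        u1 = 1.0 - rng.random()      # in (0, 1], so log(u1) is defined
        x = -math.log(u1)            # step 2
        u2 = rng.random()            # step 3
        if u2 <= math.exp(-0.5 * (x - 1.0) ** 2):  # steps 4 and 5
            break
    u3 = rng.random()                # step 6
    return -x if u3 <= 0.5 else x    # step 7


def gaussian(rng, m, variance):
    """Assumed rescaling: m + sqrt(variance) * X with X standard normal."""
    return m + math.sqrt(variance) * gaussian_reject(rng)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so that a run is reproducible
    sample = [gaussian(rng, 1.0, 2.0) for _ in range(200)]
    densities, width = histogram(sample)
    print(len(densities), width)
```

The densities returned by `histogram` sum to 1 once multiplied by the class width, which is a convenient check before exporting the values to Excel.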