Computer science homework: Uniform and non-uniform distribution of real-valued random numbers
1 Rules for the homework
Hand in this homework on paper, in groups of two, at the beginning of your lesson in week 45
(week of 7 November) for students of the filière classique, or at the beginning of your lesson
in week 46 (week of 14 November) for groups containing at least one CFA or FIE student.
Two points will be deducted for each day of lateness.
2 Introduction
In order to simulate physical processes it can be useful to know how to generate real random numbers
that follow uniform or non-uniform distributions (for example, a Lorentzian distribution for atomic line
profiles or a Gaussian distribution for laser beam profiles). We have seen in classwork how to generate
uniformly distributed integer random numbers. In the first part of this problem you will adapt these
results to the case of real numbers and develop some statistical tools. In the second part we will
study how to generate a random variable with a Gaussian probability density from a uniform random
variable. This method is called the reject method (also known as rejection sampling).
3 Generation of a uniform variable of real numbers
Write a function that generates random numbers of type double between two bounds with a uniform
probability density. Write a function that calculates the mean and the standard deviation of a distribution
of real numbers.
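As an illustration only, the two functions could be sketched as follows. The assignment does not prescribe a language; Python is used here, where `float` is a double-precision number. The names `uniform_real` and `mean_and_std` are hypothetical, and the standard deviation is computed in its population form (division by N), which is an assumption since the text does not say which form is expected.

```python
import math
import random

def uniform_real(low, high):
    """Return one float drawn with uniform density between low and high."""
    # random.random() returns a float in [0, 1); rescale it to [low, high).
    return low + (high - low) * random.random()

def mean_and_std(sample):
    """Return (mean, standard deviation) of a non-empty sequence of floats."""
    n = len(sample)
    mean = sum(sample) / n
    # Population form: divide by N (assumption, see the text above).
    variance = sum((x - mean) ** 2 for x in sample) / n
    return mean, math.sqrt(variance)

if __name__ == "__main__":
    sample = [uniform_real(-1.0, 3.0) for _ in range(10_000)]
    print(mean_and_std(sample))
```

For a large sample, the printed mean should approach the midpoint of the two bounds.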
4 Histogram
The calculation of a histogram for real numbers is more complicated than the calculation for integer
numbers. The method that we propose is the following: suppose that you have a set of $N$ real numbers.
This set will be called the sample. Suppose that the values of this sample lie between two values min and
max. The interval [min, max] is divided into classes. The number of classes is $M = \sqrt{N}$ (or its integer
part). As we are considering classes of equal size, the size of a class is then fixed. We define the following
sequence:
$$(x_i)_{1 \le i \le M+1}$$
with $M = \sqrt{N}$, so that class $i$ is the interval
$$[x_i, x_{i+1}[.$$
Each of these intervals has the same width $\ell$:
$$x_{i+1} - x_i = \ell$$
When an element of the sample is in class $i$, a counter linked to this class is incremented. In
other words, counter $i$ counts the number of times an element of the sample falls in class $i$. The
set of the counters of all the classes gives the occurrences of each class.
To obtain an estimate of the probability density of the data of the sample, divide the
value of each counter by the product of the sample size $N$ and the class width $\ell$.
Why does this normalisation give an estimate of the probability density?
In the following, this "estimator" of the probability density will be called the histogram.
1. Write a function that calculates the histogram of a sample of real numbers.
2. Test your function with a sample generated with your generator of real-valued
random variables. Use Excel to visualize your histogram.
3. Calculate the histogram for samples of different sizes $N$. What can you conclude from comparing
these histograms?
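The histogram construction described in this section can be sketched as below. This is one possible reading, not a reference solution. The function name `histogram` is hypothetical. Placing an element equal to max in the last class is an assumption, since the half-open intervals in the text leave that case unspecified. The sketch also assumes the sample contains at least two distinct values, so that the class width is not zero.

```python
import math

def histogram(sample):
    """Return (edges, density) using M = floor(sqrt(N)) classes of equal width."""
    n = len(sample)
    m = math.isqrt(n)                  # number of classes M (integer part of sqrt(N))
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / m              # class width l
    counts = [0] * m
    for x in sample:
        i = int((x - lo) / width)      # index of the class containing x
        i = min(i, m - 1)              # assumption: x == max goes into the last class
        counts[i] += 1
    edges = [lo + i * width for i in range(m + 1)]   # the sequence x_1 .. x_{M+1}
    density = [c / (n * width) for c in counts]      # counter / (N * l)
    return edges, density

if __name__ == "__main__":
    edges, density = histogram([0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 1.0])
    for left, d in zip(edges, density):
        print(f"{left:.3f}\t{d:.3f}")  # tab-separated, easy to paste into a spreadsheet
```

With this normalisation, the density values multiplied by the class width sum to 1, which is a quick sanity check on any implementation.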
5 $\chi^2$ test
The $\chi^2$ test is a very useful statistical test to determine whether the values found from an experimental
realization are distributed according to a given, known probability function. In this test you calculate
$$\chi^2 = \frac{1}{M} \sum_{i=1}^{M} \frac{\left(\hat{P}[i] - P[i]\right)^2}{P[i]}$$
where $M$ is the number of classes.
The vector $\hat{P}[i]$ represents the histogram of the sample and the vector $P[i]$ represents the expected
probabilities for each class $i$. The value of $\chi^2$ corresponds to a normalized "quadratic distance"
between the estimated histogram and the theoretical histogram. When this value is small we can consider
that the measured sample follows the probability law $P[i]$.
1. Write a function computing the $\chi^2$.
2. Test it with a realization of a sample of uniform random numbers. If you use another realization,
what is the result?
3. Plot in Excel the value of $\chi^2$ versus the size $N$ of the sample. For each value of $N$, plot
the mean of the estimated $\chi^2$ over 20 realizations of the sample.
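A minimal sketch of the $\chi^2$ computation and of the averaging over 20 realizations follows. All function names are hypothetical. Two assumptions are made: the expected vector is expressed on the same scale as the histogram (a density, so 1/(max − min) in every class for a uniform law), and the classes are built over the known bounds of the generator rather than over the sample minimum and maximum.

```python
import math
import random

def chi2(estimated, expected):
    """Return (1/M) * sum over classes of (estimated - expected)^2 / expected."""
    m = len(expected)
    if len(estimated) != m:
        raise ValueError("both vectors must have one entry per class")
    return sum((e - p) ** 2 / p for e, p in zip(estimated, expected)) / m

def density_histogram(sample, low, high):
    """Density histogram with M = floor(sqrt(N)) classes over [low, high] (assumed known)."""
    n = len(sample)
    m = math.isqrt(n)
    width = (high - low) / m
    counts = [0] * m
    for x in sample:
        counts[min(int((x - low) / width), m - 1)] += 1
    return [c / (n * width) for c in counts]

def mean_chi2_uniform(n, realizations=20, low=0.0, high=1.0):
    """Average chi2 over several independent uniform samples of size n."""
    total = 0.0
    for _ in range(realizations):
        sample = [random.uniform(low, high) for _ in range(n)]
        estimated = density_histogram(sample, low, high)
        expected = [1.0 / (high - low)] * len(estimated)  # uniform density in each class
        total += chi2(estimated, expected)
    return total / realizations

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, mean_chi2_uniform(n))  # two columns, ready to plot
```

Each run gives slightly different values because every realization is a new random sample, which is why the assignment asks for a mean over 20 realizations.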
6 Reject method
This method can be used to generate many kinds of random variables. Data are first generated following a density distribution close to the desired one. Then a portion of these data is eliminated
so that only the data that follow the expected distribution are kept. We will apply this method to the
generation of a Gaussian random variable, which is widely used in physics and engineering. The mathematical demonstration is not given here; you just have to follow this algorithm:
1. Generate U1, a uniform random variable on [0,1] (denoted U[0,1] in the following).
2. Calculate $X = -\ln(U_1)$.
3. Generate U2, a U[0,1].
4. Test whether
$$U_2 \le e^{-\frac{1}{2}(X-1)^2}$$
5. If this condition is not satisfied, return to step 1 (X is rejected); otherwise continue (X is accepted).
6. Generate U3, a U[0,1].
7. If
$$U_3 \le 0.5$$
give a negative sign to X; otherwise keep it positive.
The accepted values of X follow a Gaussian probability law with zero mean and variance 1.
Implement this algorithm. Plot in Excel the histogram of the probability distribution that is generated.
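The seven steps above can be sketched as follows. The function name `standard_gaussian` is hypothetical. One detail is an addition not present in the text: U1 is drawn from (0, 1] instead of [0, 1), so that the logarithm in step 2 is never evaluated at zero.

```python
import math
import random

def standard_gaussian():
    """Draw one value with zero mean and unit variance using the reject method."""
    while True:
        u1 = 1.0 - random.random()      # step 1, shifted to (0, 1] to avoid log(0)
        x = -math.log(u1)               # step 2: x >= 0
        u2 = random.random()            # step 3
        if u2 <= math.exp(-0.5 * (x - 1.0) ** 2):   # step 4
            break                       # step 5: x accepted; otherwise loop back to step 1
    u3 = random.random()                # step 6
    return -x if u3 <= 0.5 else x       # step 7: random sign

if __name__ == "__main__":
    sample = [standard_gaussian() for _ in range(10_000)]
    mean = sum(sample) / len(sample)
    variance = sum((v - mean) ** 2 for v in sample) / len(sample)
    print(mean, variance)
```

Checking that the sample mean is close to 0 and the sample variance close to 1 is a useful first test before plotting the histogram.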
7 Application
Using the function written in the previous section, write a function that generates a Gaussian
random variable with any mean $m$ and any variance $\sigma^2$.
Generate a sample of 200 uniform random numbers with mean 1 and variance 2. Recall that the variance of a uniform probability distribution between two bounds MIN and MAX is $(MAX - MIN)^2/12$.
Generate a sample of Gaussian random numbers with the same mean and the same variance. How can
you distinguish these two samples?
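One way to set up the two samples is sketched below, under stated assumptions. The names `gaussian` and `uniform_bounds` are hypothetical, and `standard_gaussian` is a compact restatement of the reject method of the previous section (with U1 drawn from (0, 1] to avoid a logarithm of zero). The Gaussian variable is obtained by scaling a unit-variance value by $\sigma$ and shifting it by $m$. The uniform bounds follow from the variance formula: $(MAX - MIN)^2/12 = \sigma^2$ gives a half-width of $\sqrt{3\sigma^2}$ around the mean.

```python
import math
import random

def standard_gaussian():
    """Zero-mean, unit-variance value from the reject method (compact restatement)."""
    while True:
        x = -math.log(1.0 - random.random())
        if random.random() <= math.exp(-0.5 * (x - 1.0) ** 2):
            return -x if random.random() <= 0.5 else x

def gaussian(m, variance):
    """Scale by the standard deviation and shift by the mean m."""
    return m + math.sqrt(variance) * standard_gaussian()

def uniform_bounds(mean, variance):
    """Return (MIN, MAX) of the uniform law with the given mean and variance."""
    half_width = math.sqrt(3.0 * variance)   # from (MAX - MIN)^2 / 12 = variance
    return mean - half_width, mean + half_width

if __name__ == "__main__":
    low, high = uniform_bounds(1.0, 2.0)
    uniform_sample = [random.uniform(low, high) for _ in range(200)]
    gaussian_sample = [gaussian(1.0, 2.0) for _ in range(200)]
    print(low, high)
    print(min(gaussian_sample), max(gaussian_sample))
```

The two samples can then be passed to the same mean, standard deviation, histogram and $\chi^2$ functions written earlier in the homework.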
If you want to know more about the reject method, see for example the following document:
www.stat.ucl.ac.be/cours/stat2430/documents/random.pdf