Richard Kass
Spring 2006
Physics 416 LAB 3
One of the goals of this lab is to familiarize the student with the Central Limit Theorem, an
amazing result from probability theory that explains why the Gaussian distribution (aka "Bell
Shaped Curve" or Normal distribution) applies to areas as far ranging as economics and physics.
Below are two statements of the Central Limit Theorem (C.L.T.).
I) "If an overall random variable is the sum of many random variables, each having its own
arbitrary distribution law, but all of them being small, then the distribution of the overall random
variable is Gaussian".
II) Let Y1, Y2, ..., Yn be an infinite sequence of independent random variables each with the same
probability distribution. Suppose that the mean (μ) and variance (σ²) of this distribution are both
finite. Then for any numbers a and b:
$$\lim_{n\to\infty} P\left(a < \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} < b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\,dy$$
Thus the C.L.T. tells us that under a wide range of circumstances the probability distribution that
describes the sum of random variables tends towards a Gaussian distribution as the number of
terms in the sum goes to infinity.
Some things to note about the C.L.T. and the above statements:
a) A random variable is not the same as a random number! Devore in "Probability and
Statistics for Engineering and the Sciences" defines a random variable as (page 81):
"A random variable is any rule that associates a number with each outcome in S".
b) If y is described by a Gaussian distribution with mean μ = 0 and variance σ² = 1 then
the probability that a < y < b is:
$$P(a < y < b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\,dy$$
c) The C.L.T. is still true even if the Yi's are from different probability distributions! All
that is required for the C.L.T. to hold is that the distribution(s) have a finite mean(s) and
variance(s) and that no one term in the sum dominates the sum. This is more general than
definition II).
1) In this exercise we will use the computer and definition II) to illustrate the C.L.T. This exercise
uses the properties of the random number generator (RAN). The random number generator gives
us numbers uniformly distributed in the interval [0, 1]. This uniform distribution (p(x)) can be
described by:
p(x) = 1 for 0 < x < 1 and p(x) = 0 for all other x.
a) Show using the integral definitions of the mean and variance that the uniform distribution has
μ = 1/2 and σ² = 1/12.
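b) According to definition II) if we add together 12 numbers (Y1, Y2, ..., Y12) taken from our
random number generator RAN then, since nμ = 12 × 1/2 = 6 and σ√n = √(12/12) = 1:
$$P\left(a < \frac{Y_1 + Y_2 + \cdots + Y_{12} - 6}{1} < b\right) \approx \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\,dy$$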
This says that just by adding 12 random numbers (each between 0 and 1) together and subtracting
off 6 we will get something that very closely approximates a Gaussian distribution for the sum
(Z = Y1 + Y2 + ... + Y12 - 6) with mean μ = 0 and variance σ² = 1! Write a program to see if this is true.
Generate 10⁶ values of Z and make a histogram of your results. I suggest using x bins of 0.5 unit,
e.g. Z < -5.5, -5.5 ≤ Z < -5.0, ..., Z ≥ 5.5. Superimpose (using e.g. Kaleidagraph or by hand) a
Gaussian pdf with μ = 0 and σ² = 1 on your histogram and comment on how well your histogram
reproduces a Gaussian distribution.
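Before histogramming, it can help to confirm that the generated sums have roughly the expected mean and variance. The following is a minimal sketch, not the course's reference solution. It assumes free-form Fortran 90 and uses the standard intrinsic random_number in place of RAN; the program name and the variables sumz, sumz2 and ntrials are illustrative choices.

```fortran
program clt_sum12
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ntrials = 1000000   ! 10**6 values of Z
  integer :: i
  real(dp) :: y(12), z, sumz, sumz2, mean, var

  sumz = 0.0_dp
  sumz2 = 0.0_dp
  do i = 1, ntrials
     call random_number(y)        ! assumption: stands in for 12 calls to RAN
     z = sum(y) - 6.0_dp          ! Z = Y1 + Y2 + ... + Y12 - 6
     sumz = sumz + z
     sumz2 = sumz2 + z*z
  end do

  mean = sumz/ntrials
  var = sumz2/ntrials - mean**2   ! <Z**2> - <Z>**2
  print *, 'sample mean of Z     =', mean
  print *, 'sample variance of Z =', var
end program clt_sum12
```

If the statement above holds, the printed mean should be close to 0 and the printed variance close to 1.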
In your (FORTRAN) program you will need to put the (10⁶) Z’s into bins. The following
FORTRAN code will do this job. There are lots of ways to accomplish this task. Here we are using
“if” statements (see the FORTRAN tutorial for details on how an “if” statement works). The
variable “gauss” holds the current value of Z. The variables “bin” and inbin(24) are integer
variables (and in the program you would need to declare them as integers, e.g. “integer bin, inbin(24)”).
The (integer) variable “inbin(24)” is an array of dimension 24 and keeps track of the number of Z’s
in each interval.
if(gauss.lt.-5.5)bin=1
if(gauss.ge.-5.5.and.gauss.lt.-5.0)bin=2
if(gauss.ge.-5.0.and.gauss.lt.-4.5)bin=3
if(gauss.ge.-4.5.and.gauss.lt.-4.0)bin=4
if(gauss.ge.-4.0.and.gauss.lt.-3.5)bin=5
if(gauss.ge.-3.5.and.gauss.lt.-3.0)bin=6
if(gauss.ge.-3.0.and.gauss.lt.-2.5)bin=7
if(gauss.ge.-2.5.and.gauss.lt.-2.0)bin=8
if(gauss.ge.-2.0.and.gauss.lt.-1.5)bin=9
if(gauss.ge.-1.5.and.gauss.lt.-1.0)bin=10
if(gauss.ge.-1.0.and.gauss.lt.-0.5)bin=11
if(gauss.ge.-0.5.and.gauss.lt.0.0)bin=12
if(gauss.ge.0.0.and.gauss.lt.0.5)bin=13
if(gauss.ge.0.5.and.gauss.lt.1.0)bin=14
if(gauss.ge.1.0.and.gauss.lt.1.5)bin=15
if(gauss.ge.1.5.and.gauss.lt.2.0)bin=16
if(gauss.ge.2.0.and.gauss.lt.2.5)bin=17
if(gauss.ge.2.5.and.gauss.lt.3.0)bin=18
if(gauss.ge.3.0.and.gauss.lt.3.5)bin=19
if(gauss.ge.3.5.and.gauss.lt.4.0)bin=20
if(gauss.ge.4.0.and.gauss.lt.4.5)bin=21
if(gauss.ge.4.5.and.gauss.lt.5.0)bin=22
if(gauss.ge.5.0.and.gauss.lt.5.5)bin=23
if(gauss.ge.5.5)bin=24
inbin(bin)=inbin(bin)+1
NOTE: save this program, we will use it again in LAB 5.
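As one of the other ways to do the binning in exercise 1, the bin number can be computed arithmetically instead of with a chain of “if” statements. The sketch below is an illustration under stated assumptions rather than the intended solution: it uses free-form Fortran, the intrinsic random_number in place of RAN, and the Fortran 2008 intrinsic erf to estimate how many entries a unit Gaussian would put in each bin. The names lo, hi and expected are introduced here for illustration.

```fortran
program clt_histogram
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ntrials = 1000000
  integer :: i, bin, inbin(24)
  real(dp) :: y(12), gauss, lo, hi, expected

  inbin = 0                                  ! clear all counters first
  do i = 1, ntrials
     call random_number(y)                   ! assumption: replaces RAN
     gauss = sum(y) - 6.0_dp
     ! floor((gauss+5.5)/0.5) is 0 for -5.5 <= gauss < -5.0, so add 2;
     ! max/min clamp the underflow (bin 1) and overflow (bin 24)
     bin = max(1, min(24, floor((gauss + 5.5_dp)/0.5_dp) + 2))
     inbin(bin) = inbin(bin) + 1
  end do

  ! Compare bins 2..23 with a unit Gaussian integrated over each bin
  do bin = 2, 23
     lo = -5.5_dp + 0.5_dp*(bin - 2)
     hi = lo + 0.5_dp
     expected = ntrials*0.5_dp*(erf(hi/sqrt(2.0_dp)) - erf(lo/sqrt(2.0_dp)))
     print '(2f6.1, i10, f12.1)', lo, hi, inbin(bin), expected
  end do
end program clt_histogram
```

Each output row lists the bin edges, the observed count and the Gaussian expectation, which gives a numerical version of the superimposed curve.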
2) In class (Lecture 3, page 7) we used the CLT to calculate the probability of winning $500 or
more in 60 days assuming that on a daily basis a gambler’s earnings (winnings or losses) can be
described by a uniform distribution in the interval [-$40, $50]. In this exercise you will write a
computer program that uses the Monte Carlo technique (see Section 10 of the FORTRAN
Tutorial) to calculate (simulate) this probability and in the process check the CLT. Here’s an
outline of the procedure:
a) Use the computer’s random number generator (which is uniform in [0,1]) to generate a number
uniform over the interval [-40,50]. This number represents one day’s earnings. (Hint: a+b*ran(iseed) is
uniform in the interval [a, a+b]).
b) Repeat a) 60 times adding each daily earning to the next. This sum represents one estimate of
the gambler’s earnings in 60 days.
c) Repeat b) 100000 (= TRIALS) times and for each repetition increment the variable BIGWIN by
one if the sum is greater than or equal to 500. (The variable BIGWIN is called a “counter” and keeps track of
the number of times the gambler wins ≥ $500 in 60 days).
d) The probability to win ≥ $500 in 60 days is calculated from BIGWIN/TRIALS.
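Steps a) through d) can be sketched as follows. This is a minimal illustration, assuming free-form Fortran and the intrinsic random_number in place of ran(iseed); the array r and the variable total are names introduced here. Note that if BIGWIN and TRIALS are both integers, BIGWIN/TRIALS is an integer division in Fortran and evaluates to 0, so the sketch converts to real before dividing.

```fortran
program gambler
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: trials = 100000, ndays = 60
  integer :: i, bigwin
  real(dp) :: r(ndays), total

  bigwin = 0
  do i = 1, trials
     call random_number(r)                  ! 60 uniform numbers in [0,1)
     ! a + b*r with a = -40 and b = 90 is uniform on [-40, 50];
     ! the sum is one estimate of the earnings in 60 days
     total = sum(-40.0_dp + 90.0_dp*r)
     if (total >= 500.0_dp) bigwin = bigwin + 1
  end do

  ! convert to real first to avoid integer division
  print *, 'P(win >= $500 in 60 days) ~', real(bigwin, dp)/trials
end program gambler
```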
Now (the fun part) answer the following questions:
i) How does your result, d), compare with the answer given in the lecture notes?
ii) Modify your program to calculate the probability that the gambler loses money in 60 days.
Compare your result (use TRIALS=100000) with the result obtained using the CLT.
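For the comparison with the CLT, one way to set up the calculation is sketched below. It relies on the standard result that a uniform distribution on [A, B] has mean (A+B)/2 and variance (B-A)²/12. S denotes the 60-day sum and Φ the cumulative distribution of a unit Gaussian; both symbols are introduced here, and the numerical values are rounded.

$$\mu_{\rm day} = \frac{-40 + 50}{2} = 5, \qquad \sigma_{\rm day}^2 = \frac{90^2}{12} = 675$$
$$\mu_{60} = 60 \times 5 = 300, \qquad \sigma_{60} = \sqrt{60 \times 675} \approx 201.2$$
$$P(S \ge 500) \approx 1 - \Phi\left(\frac{500 - 300}{201.2}\right) \approx 1 - \Phi(0.99) \approx 0.16$$
$$P(S < 0) \approx \Phi\left(\frac{0 - 300}{201.2}\right) \approx \Phi(-1.49) \approx 0.068$$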
3) Let’s apply the computer technique of exercise 2) to a different problem, sometimes called a
“random walk.” Let’s assume that a molecule in a liquid can only move along the x-axis (so this is
a problem in one dimension). Let’s assume that the molecule scatters every time it moves one
micron and after a scattering there is a 50% chance that the molecule will be moving in the +x
direction and a 50% chance that the molecule will be moving in the –x direction. What is the
probability that the molecule will be located at |x| ≥ 7 microns away from its starting point after
49 scatterings? Let the first scattering occur at x=0. Use your computer program to generate the
directions of the 49 scatterings and keep track of where the molecule is along the x axis after each
scattering. Repeat the process 100000 times and then calculate the probability that the molecule
will be located at |x| ≥ 7 microns away from its starting point.
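A minimal sketch of this simulation is given below. It rests on several assumptions: free-form Fortran, the intrinsic random_number as the random number generator, each of the 49 scatterings being followed by a one-micron step, and the condition being read as |x| ≥ 7. The names nscat, nfar and r are introduced here for illustration.

```fortran
program random_walk
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: trials = 100000, nscat = 49
  integer :: i, j, x, nfar
  real(dp) :: r(nscat)

  nfar = 0
  do i = 1, trials
     call random_number(r)          ! one uniform number per scattering
     x = 0                          ! first scattering occurs at x = 0
     do j = 1, nscat
        if (r(j) < 0.5_dp) then
           x = x + 1                ! 50%: one micron in the +x direction
        else
           x = x - 1                ! 50%: one micron in the -x direction
        end if
     end do
     if (abs(x) >= 7) nfar = nfar + 1   ! assumed reading of the condition
  end do

  print *, 'P(|x| >= 7 microns) ~', real(nfar, dp)/trials
end program random_walk
```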
You can also do this experiment using a yardstick and two coins. One coin is flipped to determine
the direction of the molecule after a scattering; the other coin (the “molecule”) is moved along the
yardstick (1 inch = 1 micron) and tracks the molecule’s location along the x-axis. Here you would
flip the coin 49 times and record the final position of the molecule/coin. Repeat this process 10
times (or more if you can stand it) and calculate the probability that the molecule is located at |x| ≥
7 inches. Does the result from your experiment agree with the result from the computer program?
Super Extra Credit: Use the CLT to predict the results of your computer program!
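One possible line of attack is sketched here, reading the condition as |x| ≥ 7 and assuming 49 independent steps of ±1 micron. Each step then has mean 0 and variance 1, so the sum has mean 0 and standard deviation 7. Φ denotes the cumulative distribution of a unit Gaussian (a symbol introduced here), and the values are rounded.

$$\mu_x = 49 \times 0 = 0, \qquad \sigma_x = \sqrt{49 \times 1} = 7$$
$$P(|x| \ge 7) \approx 2\left[1 - \Phi\left(\frac{7}{7}\right)\right] \approx 0.32$$

Because x can only take odd integer values after 49 steps, a continuity correction that places the boundary halfway between 5 and 7 may track the simulation more closely:

$$P(|x| \ge 7) \approx 2\left[1 - \Phi\left(\frac{6}{7}\right)\right] \approx 0.39$$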