Probability Distributions and Random Variables
Lecture notes : M. S. Santhanam
Physical Research Laboratory, Ahmedabad.
1 Take a chance
For any event, you can ask the question: what is the chance that it will happen?
Fig 1(a) is a plot of the BSE Sensex from 2000 to 2004. Fig 1(b) shows the same data
in the form of a probability distribution. Both are built from the same data. One
is the 'plain vanilla' perspective: the time series, i.e. the Sensex as a function of time. The other is the probability
distribution perspective of the events. The probability distribution answers the question: how
many times during 2000-2004 did the BSE Sensex have values between, say, 2900 and 3000? The
answer is the area under the histogram in Fig 1(b) marked as a dark patch. While the train
of events recorded by the time series tells you the trend, the finer details are revealed by the
histogram of the probability distribution. The details include everything from the mean and variance
to the higher order moments of the distribution. This should help you guess which process gives rise
to the observed time series. Exercise : Explain why Fig 1 cannot be used to predict the Sensex
trend.
Figure 1: (a) Daily BSE Sensex for the period 2000-2004 in time series form and (b) the same
data as an (unnormalised) probability distribution.
2 Quantifying chance
We now get down to quantifying chance. In our example, an event is the range of values in each
bin of Fig 1(b). Each bin is of size (8000 − 2000)/80 = 75. In other words, our total space consists of
80 events. The probability of an event, say, the Sensex hovering around 2900-2975, is obtained
by counting the number of times it did so and dividing by the total number of all outcomes. If
A, B, C, ..., Z denote the events and $n_a, n_b, n_c, \ldots, n_z$ denote the number of times each occurs, then
the probability of the ith event happening is given by
$$ f_i = \frac{n_i}{n_a + n_b + \cdots + n_z} $$
This would be the correct probability for the ith event if the sum $n_a + n_b + \cdots + n_z \to \infty$. Thus,
probability is defined as the limit of the relative frequency of an event as the total number of
observations tends to infinity.
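The counting recipe above can be sketched in a few lines of Python. This is only a minimal illustration: the data are synthetic random numbers standing in for the Sensex values (not the real series), and the helper name `relative_frequencies` is made up for this sketch.

```python
import random

def relative_frequencies(values, low, high, n_bins):
    """Return the list f_i = n_i / (n_a + n_b + ...) for equal-width bins on [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            counts[int((v - low) // width)] += 1
    total = sum(counts)
    return [c / total for c in counts], width

# Synthetic stand-in data (NOT real Sensex values): 1000 numbers between 2000 and 8000.
rng = random.Random(0)
data = [rng.uniform(2000, 8000) for _ in range(1000)]

freqs, width = relative_frequencies(data, 2000, 8000, 80)
bin_index = int((2900 - 2000) // width)   # index of the bin covering 2900-2975
print(width, bin_index, freqs[bin_index])
```

With only 1000 observations the relative frequency in each bin is a rough estimate; it approaches the probability only as the number of observations grows.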
However, more sophisticated definitions of probability require the ideas of set theory. We
define Ω to be the sample space, a space of elementary events $X_i$. Events are elementary if the
occurrence of one of them precludes the occurrence of all the others. In the Sensex example,
if on a given day the value of the Sensex falls in the range 2900-2975, then the other possible values
do not occur on the same day. For the detailed axioms of probability for elementary events,
readers are referred to the bibliography.
Elementary events are the simplest possible cases. In the case of a coin toss, the elementary
events are H and T. If one occurs, the other does not. From here, one can imagine more
complicated sets of events that are non-elementary. Take two coins. The possible sets of events
are
$$ A_1 = \{H, H\}, \quad A_2 = \{H, T\}, \quad A_3 = \{T, H\}, \quad A_4 = \{T, T\} \tag{1} $$
Each set is composed of the elementary events H and T. The occurrence of, say, H in the set $A_1$ does
not prevent it from appearing in $A_3$. A set composed of elementary events is again an
event. Occurrence of at least one of the events in $A_1$ means that event $A_1$ has occurred. Hence,
$A_1, \ldots, A_4$ qualify as 4 different events. Now, what is the probability of occurrence of $A_1$? For fair
coins, it is simply 1/4. Questions asked in real life are rarely this simple. We now move on to the complex ones.
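As a quick check of the two-coin example, the four outcomes can be enumerated directly. The sketch below assumes fair coins, so that all four outcomes are equally likely; the variable names are chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of tossing two coins: (H,H), (H,T), (T,H), (T,T).
outcomes = list(product("HT", repeat=2))

# Assumption: fair coins, so every outcome is equally likely.
prob = {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}

p_A1 = prob[("H", "H")]
print(outcomes)
print(p_A1)
```

Exact fractions are used instead of floating point numbers so that the result can be compared directly with 1/4.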
3 More complex...
What is the probability that the Sensex took values in the bin 2900-2975 or 2975-3050? We
just add up the number of events in these two bins and divide by the total number of recorded
outcomes. What we just did is apply the addition law of probability. The formal question is:
what is the probability of $A_1$ or $A_2$ occurring? That is to say,
$$ P(A_1 \text{ or } A_2) = P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \text{ and } A_2) $$
This is the statement of the addition law of probability. For the two Sensex bins the last term
vanishes, because the value on a given day cannot fall in both bins. This raises the question: what is $P(A_1 \text{ and } A_2)$?
It is the probability that the events $A_1$ and $A_2$ both occur. It is given by
$$ P(A_1 \text{ and } A_2) = P(A_1 \cap A_2) = P(A_1)\, P(A_2) $$
if the events $A_1$ and $A_2$ are independent. If they are not, we invoke the conditional probability,
defined as follows:
$$ P(A_1 \cap A_2) = P(A_1 | A_2)\, P(A_2) $$
Here, $P(A_1 | A_2)$ is the conditional probability for $A_1$ to occur given that $A_2$ has occurred. Note
that $P(A_1 | A_2) = P(A_1)$ if $A_1$ and $A_2$ are independent.
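The addition law and the conditional probability can be verified on a small example. The sketch below uses one roll of a fair die rather than the coin experiment (so as not to give away the exercise); the events E and G and the helper P are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (illustrative choice)

def P(event):
    """Probability of an event (a subset of the sample space), all outcomes equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

E = {2, 4, 6}   # "the roll is even"
G = {4, 5, 6}   # "the roll is greater than 3"

p_union = P(E | G)                        # P(E or G), counted directly
p_addition_law = P(E) + P(G) - P(E & G)   # the same quantity from the addition law
p_E_given_G = P(E & G) / P(G)             # conditional probability P(E | G)

print(p_union, p_addition_law, p_E_given_G, P(E))
```

Here P(E | G) differs from P(E), so E and G are not independent, and P(E and G) is not equal to P(E) P(G).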
Exercise : Calculate $P(A_1 | A_2)$ and $P(A_2 | A_1)$ for the two coin toss experiment, all
possible outcomes of which are given in Eq 1.
4 P to PDF
Now that probabilities and their operational details are defined, we take a leap to the probability
distribution functions (pdf) that are defined for random variables. Random variables, usually
denoted by a capital letter such as X, are those that take random values as the outcome of each trial. If the Bombay
Stock Exchange is an experiment done by the 'market forces', then the Sensex is the random variable
and it takes different possible values in every session. Then, a random event can be designated
by the bin range, as done in Fig 1(b). What is the probability $P(x < X < x + dx)\,dx$ that
the values lie in the bin of size dx between x and x + dx? Here, $P(x < X < x + dx)$ is the
probability density function and $P(x < X < x + dx)\,dx$ is the probability. This is an instance
of a continuous probability distribution. If the random variable is the number of students in class
5 in schools all over India, then the variable takes discrete values. We could ask for $P(X = x)$,
the probability that there are exactly x students in a class. We should maintain the distinction
between continuous and discrete probability distributions.
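The difference between a probability density and a probability can be seen numerically. The following sketch is only an illustration under assumed choices: X is taken to be uniform on [0, 0.5), for which the density is 2, and the bin position x and width dx are arbitrary.

```python
import random

rng = random.Random(1)
n_samples = 100_000

# Illustrative choice: X uniform on [0, 0.5), whose density is 2 on that interval.
samples = [0.5 * rng.random() for _ in range(n_samples)]

x, dx = 0.2, 0.01
count = sum(1 for s in samples if x <= s < x + dx)

probability = count / n_samples   # chance of landing in the bin [x, x + dx), never above 1
density = probability / dx        # estimated density near x, which can exceed 1

print(probability, density)
```

The probability of the bin is about 0.02, while the density is about 2. Only the density multiplied by the bin size dx is a probability.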
5 Simple examples
Let us look at simple cases of idealised distributions. These are models of pdfs that occur in real
life, or at least real-life pdfs can be understood in terms of these model distributions. The
best known is of course the Gaussian distribution. The Maxwell distribution of each velocity
component of the molecules in a gas is an instance of it.
Figure 2: (a) Continuous distribution : Histogram of 5000 computer generated random numbers.
The solid line is the uniform distribution P (x). (b) Discrete distribution : Bernoulli pdf. The
probability of H is 0.4 and T is 0.6 .
An example of a continuous distribution is the uniform distribution, i.e., P(x) = 1 for 0 ≤
x ≤ 1, shown in Fig 2(a). Computer programs that generate pseudo-random numbers typically
produce uniformly distributed random numbers. A rather trivial example of a discrete pdf is the Bernoulli
pdf shown in Fig 2(b). A Bernoulli trial is a single experiment such as a coin toss. The outcome is
either H or T, with probability p for H and q = 1 − p for T. Note that p and q need not be 1/2. It
might seem a rather bland experiment at first sight, until we realise that the discrete Binomial
distribution describes n trials of the Bernoulli experiment which, in turn, is connected to the random walk
problem. For the details of the Binomial distribution and other important distributions like the
Gaussian, exponential, Lorentzian etc., the reader is directed to the bibliography at the end of
these notes.
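The link between Bernoulli trials and the Binomial distribution can be checked by simulation. The sketch below assumes p = 0.4 for H, as in Fig 2(b); the number of trials n, the number of repeated experiments and the helper name `binomial_pmf` are arbitrary choices made for this illustration.

```python
import random
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

rng = random.Random(2)
p, n, n_experiments = 0.4, 10, 20_000   # p as in Fig 2(b); n and n_experiments are arbitrary

counts = [0] * (n + 1)
for _ in range(n_experiments):
    heads = sum(1 for _ in range(n) if rng.random() < p)   # n Bernoulli trials
    counts[heads] += 1

# Compare the observed frequency of k heads with the Binomial formula.
for k in range(n + 1):
    print(k, counts[k] / n_experiments, round(binomial_pmf(k, n, p), 4))
```

The observed frequencies should agree with the Binomial values to within the statistical scatter expected for 20000 experiments.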
6 The ways and means
What more can pdfs tell us? All the moments of the distribution can be calculated
from the pdf. We might want to know the mean value of the Sensex recorded in Fig 1(a). In the
pdf picture, this is related to the expectation value of $X^n$ (the n = 1 case is the mean),
$$ \langle X^n \rangle = E(X^n) = \int_{-\infty}^{\infty} x^n P(x)\, dx \tag{2} $$
In this, if X is shifted suitably such that the mean is zero, i.e., $\langle X \rangle = 0$, then n = 2 gives the variance.
These moments are important because, if all of them are given, we can in many cases reconstruct the corresponding
pdf. A word of warning : Eq 2 does not say anything about the existence of the integral. The moments
need not exist. For instance, the variance does not exist for the Lorentzian. This
implies that the Lorentzian has no definite characteristic scale. This has interesting consequences
in various fields ranging from stochastic processes to financial markets to fractals.
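As a small worked case of Eq 2, the moments of the uniform distribution P(x) = 1 on 0 ≤ x ≤ 1 can be computed numerically. This is a minimal sketch using a simple midpoint rule; the function names and the number of integration steps are choices made for the illustration, and the variance is obtained as $\langle X^2 \rangle - \langle X \rangle^2$, which is equivalent to shifting the mean to zero first.

```python
def moment(pdf, n, low, high, steps=100_000):
    """Approximate the integral of x**n * pdf(x) over [low, high] by the midpoint rule."""
    h = (high - low) / steps
    total = 0.0
    for i in range(steps):
        x = low + (i + 0.5) * h
        total += (x ** n) * pdf(x)
    return total * h

def uniform_pdf(x):
    """Uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

mean = moment(uniform_pdf, 1, 0.0, 1.0)
second = moment(uniform_pdf, 2, 0.0, 1.0)
variance = second - mean ** 2

print(mean, second, variance)
```

For this distribution the exact values are 1/2 for the mean, 1/3 for the second moment and 1/12 for the variance, which the numerical integration reproduces closely.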
Exercise : Explain what is meant by a characteristic scale for a distribution. Why is it
absent in the Lorentzian ? Hint : Compare the Lorentzian with any distribution that has a characteristic
scale.
Beyond the moments, one could calculate the correlation between two random variables X
and Y. It is given by
$$ \mathrm{cor}(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sqrt{\mathrm{var}(X)\, \mathrm{var}(Y)}} $$
where var(.) is the variance. As the name suggests, it is an indicator of the correlation between X
and Y. It can be shown that −1 ≤ cor(X, Y) ≤ 1. (Prove it and you will have proved the Schwarz
inequality.)
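The correlation can be estimated from paired data by replacing each expectation value with a sample average. The sketch below uses the standard normalisation, with the square root of var(X) var(Y) in the denominator, and a few made-up data points; the function names are chosen for this illustration only.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def cor(xs, ys):
    """Sample version of (E(XY) - E(X)E(Y)) / sqrt(var(X) var(Y))."""
    ex, ey = mean(xs), mean(ys)
    exy = mean([x * y for x, y in zip(xs, ys)])
    var_x = mean([x * x for x in xs]) - ex ** 2
    var_y = mean([y * y for y in ys]) - ey ** 2
    return (exy - ex * ey) / sqrt(var_x * var_y)

# Made-up data, purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = cor(xs, [2.0 * x + 1.0 for x in xs])     # Y rises linearly with X
r_down = cor(xs, [-x for x in xs])              # Y falls linearly with X
r_mixed = cor(xs, [2.0, 1.0, 4.0, 3.0, 5.0])    # Y follows X only roughly

print(r_up, r_down, r_mixed)
```

The two linear cases reach the bounds +1 and −1, while the roughly related data give a value in between.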
7 And there's much more
Obviously, there is much more to statistics and probability than is given here. One singularly
important result in the theory of probability distributions is the Central Limit Theorem (CLT). It states
that the distribution P(X) of a sum of independent random variables, $X = X_1 + X_2 + X_3 + \cdots + X_n$,
whose individual means and variances exist, tends to a Gaussian as n → ∞. This is a powerful
theorem for many reasons, the most important being that it provides an argument as to why
the Gaussian is ubiquitous in nature. Refer to texts for precise statements and applications of the CLT.
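The tendency towards a Gaussian can be seen in a small simulation. The sketch below assumes that each term in the sum is an independent uniform random number on [0, 1), with mean 1/2 and variance 1/12; the number of terms and the number of sums are arbitrary illustrative choices.

```python
import random
from math import sqrt

rng = random.Random(3)
n, n_sums = 30, 20_000   # arbitrary illustrative sizes

# Each entry is a sum of n independent uniform numbers on [0, 1).
sums = [sum(rng.random() for _ in range(n)) for _ in range(n_sums)]

mu = n * 0.5             # mean of the sum
sigma = sqrt(n / 12.0)   # standard deviation of the sum

# Fraction of sums lying within one and two standard deviations of the mean.
within_1 = sum(1 for s in sums if abs(s - mu) < sigma) / n_sums
within_2 = sum(1 for s in sums if abs(s - mu) < 2 * sigma) / n_sums

print(within_1, within_2)   # a Gaussian would give about 0.683 and 0.954
```

Although each individual term is uniformly distributed, the fractions come out close to the Gaussian values, as the theorem suggests.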
Exercise : As we see, the Gaussian is an all-important distribution. The international business magazine
Forbes gives a 6σ rating to businesses whose efficiency results in only about one error in $10^6$. This is based
on the Gaussian distribution. What does 6σ signify for the Gaussian ? Note that Mumbai’s dabbawalas
were awarded this 6σ rating by Forbes. Read about it if you are not already aware of it.
Bibliography
(This write-up is a starting point. The books below are meant to be guidelines. Consult any
accessible book, including these ones, on probability and/or statistics.)
• Statistical Methods in Experimental Physics, W. T. Eadie et al.
• Introduction to Mathematical Statistics, Paul G. Hoel.
• Introduction to Probability Theory and its Applications, Vol-1, William Feller.
• Statistical Procedures for Engg., Management and Science, Leland Blank.
• Probability and Statistics for Engineers and Scientists, R. E. Walpole et al.
• Schaum Series on Probability and Statistics, M. R. Spiegel.
• Take a look at internet resources on probability distributions and don’t miss the by-now
legendary story of Mumbai’s dabbawalas.