Probability Distributions and Random Variables

Lecture notes: M. S. Santhanam, Physical Research Laboratory, Ahmedabad.

1 Take a chance

For every event, you can ask: what is the chance that it will happen? Fig 1(a) is the plot of the BSE Sensex from 2000 to 2004. Fig 1(b) shows the same data in the form of a probability distribution. The two panels present the same data from different perspectives. One is the 'plain vanilla' perspective: the time series as a function of time. The other is the probability distribution perspective of the events. The probability distribution answers the question: how many times during 2000-2004 did the BSE Sensex have values, say, between 2900 and 3000? The answer is the area under the histogram in Fig 1(b) marked as a dark patch. While the train of events recorded by the time series tells you the trend, the finer details are revealed by the histogram of the probability distribution. The details include everything from the mean and variance to the higher order moments of the distribution. This should help you guess which process gives rise to the observed time series.

Exercise: Explain why Fig 1 cannot be used to predict the Sensex trend.

Figure 1: (a) Daily BSE Sensex for the period 2000-2004 in time series form and (b) the same data as an (unnormalised) probability distribution.

2 Quantifying chance

We now get down to quantifying chance. In our example, an event is the range of values in each bin of Fig 1(b). Each bin is of size $(8000 - 2000)/80 = 75$. In other words, our total space consists of 80 events. The probability of an event, say, the Sensex hovering in the range 2900-2975, is obtained by counting the number of times it did so and dividing by the total number of recorded outcomes.
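The counting recipe above can be sketched in a few lines of Python. This is only an illustration: the actual Sensex data are not reproduced in these notes, so the `values` list below is a synthetic stand-in, and the function name `bin_probabilities` is a hypothetical helper.

```python
import random

def bin_probabilities(values, lo=2000.0, hi=8000.0, n_bins=80):
    """Relative frequency of each equal-width bin between lo and hi."""
    width = (hi - lo) / n_bins            # (8000 - 2000) / 80 = 75
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) // width)] += 1
    total = sum(counts)                   # number of recorded outcomes
    return [c / total for c in counts], width

# Synthetic stand-in for the daily index values (NOT real Sensex data).
random.seed(0)
values = [random.uniform(2500.0, 6000.0) for _ in range(1000)]

probs, width = bin_probabilities(values)
k = int((2900.0 - 2000.0) // width)       # index of the bin 2900-2975
print(f"bin width = {width}, P(2900 <= X < 2975) ~ {probs[k]:.4f}")
```

Dividing each count by the total is what turns the unnormalised histogram of Fig 1(b) into relative frequencies that sum to one.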
If $A, B, C, \ldots, Z$ denote the events and $n_a, n_b, n_c, \ldots, n_z$ denote the number of times each of them occurs, then the probability of the $i$-th event happening is given by

$$ f_i = \frac{n_i}{n_a + n_b + \cdots + n_z} $$

This would be the correct probability for the $i$-th event if the sum $n_a + n_b + \cdots + n_z \to \infty$. Thus, probability is defined as the limit of the relative frequency as the number of occurrences of the events tends to infinity.

However, more sophisticated definitions of probability require the ideas of set theory. We define $\Omega$ to be the sample space, a space of elementary events $X_i$. Events are elementary if the occurrence of one of them precludes the occurrence of all the others. In the Sensex example, if on a given day the value of the Sensex falls in the range 2900-2975, then the other possible values do not occur on the same day. For the detailed axioms of probability for elementary events, the reader is referred to the bibliography.

Elementary events are the simplest possible cases. In the case of a coin toss, the elementary events are $H$ and $T$. If one occurs, the other does not occur. From here, one can imagine more complicated sets of events that are non-elementary. Take two coins. The possible sets of events are

$$ A_1 = \{H, H\}, \quad A_2 = \{H, T\}, \quad A_3 = \{T, H\}, \quad A_4 = \{T, T\} \tag{1} $$

Each set is composed of the elementary events $H$ and $T$. Occurrence of, say, $H$ in the set $A_1$ does not prevent it from appearing in $A_3$. A set composed of elementary events is again an event. Occurrence of at least one of the events in $A_1$ means that event $A_1$ has occurred. Hence, $A_1, \ldots, A_4$ qualify as 4 different events. Now, what is the probability of occurrence of $A_1$? For two fair coins, it is simply 1/4. Questions asked in real life are rarely this simple. We go to the complex ones.

3 More complex.....

What is the probability that the Sensex took values in the bin 2900-2975 or in the bin 2975-3050? We just add up the number of events in these two bins and divide by the total number of recorded outcomes. What we just did is to apply the addition law of probability.
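The frequency definition and the bin-adding step above can be tried out on the two-coin example of Eq 1. The sketch below assumes fair coins and treats each of $A_1, \ldots, A_4$ as an ordered pair of outcomes. The number of trials and the random seed are arbitrary choices.

```python
import random
from collections import Counter

random.seed(1)
n_trials = 100_000

# Each trial tosses two fair coins; the ordered pair is one of A1..A4 in Eq 1.
counts = Counter(
    (random.choice("HT"), random.choice("HT")) for _ in range(n_trials)
)

# Relative frequency f_i = n_i / (total number of trials) for every event.
f = {event: n / n_trials for event, n in counts.items()}
f_A1 = f[("H", "H")]
f_A2 = f[("H", "T")]

# A1 and A2 cannot occur in the same trial, so "A1 or A2" is counted by
# adding the two counts, exactly as for two neighbouring histogram bins.
f_A1_or_A2 = (counts[("H", "H")] + counts[("H", "T")]) / n_trials

print(f"f(A1) = {f_A1:.4f}   (1/4 is expected for fair coins)")
print(f"f(A1 or A2) = {f_A1_or_A2:.4f}, f(A1) + f(A2) = {f_A1 + f_A2:.4f}")
```

Increasing `n_trials` should bring the relative frequencies closer to their limiting values, which is the content of the frequency definition.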
The formal question is: what is the probability of $A_1$ or $A_2$ occurring? That is to say,

$$ P(A_1 \text{ or } A_2) = P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \text{ and } A_2) $$

This is the statement of the addition law of probability. It raises the question: what is $P(A_1 \text{ and } A_2)$? This is the probability that the events $A_1$ and $A_2$ both occur. It is given by

$$ P(A_1 \text{ and } A_2) = P(A_1 \cap A_2) = P(A_1) \, P(A_2) $$

if the events $A_1$ and $A_2$ are independent. If they are not, we invoke the conditional probability, defined as follows:

$$ P(A_1 \cap A_2) = P(A_1 | A_2) \, P(A_2) $$

Here, $P(A_1 | A_2)$ is the conditional probability for $A_1$ to occur given that $A_2$ has occurred. Note that $P(A_1 | A_2) = P(A_1)$ if $A_1$ and $A_2$ are independent.

Exercise: Calculate $P(A_1 | A_2)$ and $P(A_2 | A_1)$ for the two-coin toss experiment, all possible outcomes of which are given in Eq 1.

4 P to PDF

Now that probabilities and their operational details are defined, we take a leap to probability distribution functions (pdf), which are defined for random variables. Random variables, usually denoted by a capital letter $X$, are those that take random values as the outcome of each trial. If the Bombay Stock Exchange is an experiment done by the 'market forces', then the Sensex is the random variable, and it takes different possible values in every session. Then, a random event can be designated by the bin range, as done in Fig 1(b). What is the probability $P(x < X < x + dx) \, dx$ that the values lie in the bin of size $dx$ between $x$ and $x + dx$? Here, $P(x < X < x + dx)$ is the probability density function and $P(x < X < x + dx) \, dx$ is the probability. This is an instance of a continuous probability distribution. If the random variable is the number of students in class 5 in schools all over India, then the variable takes discrete values. We could ask for $P(X = x)$, the probability that there are exactly $x$ students in a class. We should maintain the distinction between continuous and discrete probability distributions.
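The distinction between a probability density and a probability, and between the continuous and discrete cases, can be made concrete with a short sketch. Both data sets below are synthetic assumptions: Gaussian samples stand in for a continuous variable, and class sizes drawn uniformly from a hypothetical range of 20 to 40 stand in for a discrete one.

```python
import random
from collections import Counter

random.seed(2)
n = 200_000

# Continuous case: synthetic samples standing in for a continuous variable.
samples = [random.gauss(0.0, 1.0) for _ in range(n)]
x, dx = 0.0, 0.1
in_bin = sum(1 for s in samples if x <= s < x + dx)
probability = in_bin / n          # P(x < X < x + dx) dx, a pure number
density = probability / dx        # estimated pdf near x, per unit of X

# Discrete case: synthetic class sizes (hypothetical range 20..40).
sizes = [random.randint(20, 40) for _ in range(n)]
p_size = {k: c / n for k, c in Counter(sizes).items()}

print(f"probability of bin = {probability:.4f}, density = {density:.3f}")
print(f"P(X = 30) = {p_size[30]:.4f}")
```

The probability of the bin shrinks with `dx`, while the density stays roughly the same. In the discrete case no bin width is needed, and the probabilities themselves sum to one.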
5 Simple examples

Let us look at simple cases of idealised distributions. These are models of pdfs that occur in real life, or at least the real-life pdfs can be understood in terms of these model distributions. The most well known is of course the Gaussian distribution. The Maxwell distribution of velocities is an instance of it.

Figure 2: (a) Continuous distribution: histogram of 5000 computer generated random numbers. The solid line is the uniform distribution $P(x)$. (b) Discrete distribution: Bernoulli pdf. The probability of H is 0.4 and of T is 0.6.

An example of a continuous distribution is the uniform distribution, i.e., $P(x) = 1$ for $0 \le x \le 1$, shown in Fig 2(a). The computer programs that generate pseudo-random numbers typically produce uniformly distributed random numbers. A rather trivial example of a discrete pdf is the Bernoulli pdf shown in Fig 2(b). A Bernoulli trial is a single experiment of a coin toss. The outcome is either $H$ or $T$, with probability $p$ for $H$ and $q = 1 - p$ for $T$. Note that $p$ and $q$ need not be 1/2. It might seem a rather bland experiment at first sight, until we figure out that the discrete Binomial distribution describes $n$ trials of the Bernoulli experiment which, in turn, is connected to the random walk problem. For the details of the Binomial distribution and other important distributions like the Gaussian, exponential, Lorentzian etc., the reader is directed to the bibliography at the end of these notes.

6 The ways and means

What more can pdfs tell us? All the moments of the distribution can be calculated using them. We might want to know the mean value of the Sensex recorded in Fig 1(a). In the pdf picture, this is related to the expectation value of $X^n$ (the $n = 1$ case is the mean),

$$ \langle X^n \rangle = E(X^n) = \int_{-\infty}^{\infty} x^n P(x) \, dx \tag{2} $$

In this, if $X$ is shifted suitably such that the mean is zero, i.e., $\langle X \rangle = 0$, then the $n = 2$ moment is the variance.
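As a worked instance of Eq 2, the sketch below evaluates the first two moments of the uniform distribution $P(x) = 1$ on $[0, 1]$ by a simple midpoint rule and compares them with moments estimated from pseudo-random samples. The helper name `moment`, the number of integration steps, and the sample size are illustrative choices, not part of the notes.

```python
import random

def moment(pdf, n, lo, hi, steps=100_000):
    """Midpoint-rule estimate of the integral of x**n * pdf(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h        # midpoint of the i-th sub-interval
        total += (x ** n) * pdf(x)
    return total * h

def uniform_pdf(x):
    """P(x) = 1 on [0, 1]; the integration range excludes everything else."""
    return 1.0

mean = moment(uniform_pdf, 1, 0.0, 1.0)
second = moment(uniform_pdf, 2, 0.0, 1.0)
variance = second - mean ** 2         # equals the n = 2 moment of X - mean

random.seed(3)
xs = [random.random() for _ in range(100_000)]
sample_mean = sum(xs) / len(xs)
sample_var = sum((v - sample_mean) ** 2 for v in xs) / len(xs)

print(f"integral: mean = {mean:.4f}, variance = {variance:.4f}")
print(f"samples : mean = {sample_mean:.4f}, variance = {sample_var:.4f}")
```

For this distribution the exact values are a mean of 1/2 and a variance of 1/12, so both routes should agree with those numbers to within their respective errors.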
These moments are important because, if they are given, we can in many cases reconstruct the corresponding pdf. A word of warning: Eq 2 does not say anything about the existence of the integral. The moments need not necessarily exist. For instance, the variance does not exist for the Lorentzian. This implies that the Lorentzian has no definite characteristic scale. This has interesting consequences in various fields ranging from stochastic processes to financial markets to fractals.

Exercise: Explain what is meant by a characteristic scale for any distribution. Why is it absent in the Lorentzian? Hint: Compare the Lorentzian with any distribution that has a characteristic scale.

Beyond the moments, one could calculate the correlation between two random variables $X$ and $Y$. It is given by

$$ \mathrm{cor}(X, Y) = \frac{E(XY) - E(X) E(Y)}{\sqrt{\mathrm{var}(X) \, \mathrm{var}(Y)}} $$

where $\mathrm{var}(\cdot)$ is the variance. As the name suggests, it is an indicator of the correlation between $X$ and $Y$. It can be shown that $-1 \le \mathrm{cor}(X, Y) \le 1$. (Prove it and you would have proved the Schwarz inequality.)

7 And there's much more

Obviously, there is much more to statistics and probability than is given here. One singularly important result in the theory of probability distributions is the Central Limit Theorem (CLT). It states that the distribution $P(X)$ of a suitably normalised sum of independent random variables, $X = X_1 + X_2 + X_3 + \cdots + X_n$, whose individual means and variances exist, tends to a Gaussian as $n \to \infty$. This is a powerful theorem for many reasons, the most important being that it provides an argument as to why the Gaussian is ubiquitous in nature. Refer to texts for precise statements and applications of the CLT.

Exercise: As we see, the Gaussian is an all-important distribution. The international business magazine Forbes gives a $6\sigma$ rating to businesses whose efficiency results in only about one error in $10^6$. This is based on the Gaussian distribution. What does $6\sigma$ signify for the Gaussian? Note that Mumbai's Dabbawalas were awarded this $6\sigma$ rating by Forbes.
Read about it if you are not already aware of it.

Bibliography

(This write-up is a starting point. The books below are meant to be guidelines. Consult any accessible book, including these, on probability and/or statistics.)

• Statistical Methods in Experimental Physics, W. T. Eadie et al.
• Introduction to Mathematical Statistics, Paul G. Hoel.
• Introduction to Probability Theory and its Applications, Vol-1, William Feller.
• Statistical Procedures for Engg., Management and Science, Leland Blank.
• Probability and Statistics for Engineers and Scientists, R. E. Walpole et al.
• Schaum Series on Probability and Statistics, M. R. Spiegel.
• Take a look at internet resources on probability distributions, and don't miss the by-now legendary story of Mumbai's dabbawalas.