Random Variables
Math 146

1 Introduction

Recall that we introduced random variables as functions defined on a sample space. In other words, we have a sample space S, and functions (often denoted by X, Y, Z, ...) which take specific values depending on the outcome of our observations. The point is that we will mostly deal with events defined by values taken by one or more random variables, as in

{X = x},   {a ≤ X ≤ b},   {X = x, a ≤ Y ≤ b}

etc. We'll sometimes write r.v. for random variable, for brevity.

The collection of probabilities

P[X = x] = pX(x)   (for all possible values of x)     (1)

is called the (probability) distribution of the random variable X. When we consider more than one random variable, we may speak of the collection

P[X = x, Y = y] = pX,Y(x, y)   (for all possible values of x and y)

as the joint distribution of X and Y.

Since a full description of the distribution is sometimes difficult to obtain, and at other times is not needed, we often use some parameters of the distribution. Specifically, we will often be interested in:

• The mean (or expected value) of the r.v.:

EX = Σ_x pX(x) x     (2)

where the sum is over all possible values of x.

• The mean (or expected value) of some interesting function of X: sometimes we consider a function f(X) of X (common examples are X², X³, ..., e^{kX}, and so on). It is a random variable too, and with a little reflection it is not difficult to show that

E f(X) = Σ_x pX(x) f(x)     (3)

• As special cases of the previous definition, we have the absolute moments of the distribution:

EX^k = Σ_x pX(x) x^k     (4)

for k = 2, 3, ..., and the centered moments

E(X − EX)^k = Σ_x pX(x) (x − EX)^k = Σ_x pX(x) (x − Σ_{x′} pX(x′) x′)^k     (5)

• An especially commonly used centered moment is the one with k = 2, called the variance of the r.v.:

Var(X) = Σ_x pX(x) (x − EX)²     (6)

Using the formulas above and a little algebra, it is not hard to show a few properties of the operation of taking the expected value.
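The definitions of mean and variance can be checked numerically straight from a pmf. Here is a minimal Python sketch (the fair-die pmf is an illustrative choice, not taken from the notes):

```python
# Mean and variance of a discrete r.v., computed directly from its pmf
# as in formulas (2) and (6). The pmf below (a fair six-sided die) is
# just an illustrative example.
pmf = {x: 1 / 6 for x in range(1, 7)}  # p_X(x) = 1/6 for x = 1, ..., 6

mean = sum(p * x for x, p in pmf.items())               # EX = sum_x p_X(x) x
var = sum(p * (x - mean) ** 2 for x, p in pmf.items())  # Var(X), formula (6)

print(mean)  # ≈ 3.5
print(var)   # ≈ 35/12 ≈ 2.9167
```

The same two sums work for any finite distribution given as a value-to-probability table.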
For example:

• E(aX + bY) = a EX + b EY (where a and b are two numbers)

• E(aX + bY)² = a² EX² + b² EY² + 2ab E(XY)

• Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y), where cov(X, Y) = E[(X − EX)(Y − EY)] is usually called the covariance of X and Y

We'll see several cases where the whole point of our experiments will reduce to estimating the true value of the expected values and the variance of one or more random variables.

2 A Few Consequences

Concentrating for a moment on the first few moments, in particular the mean and the variance, there are a few consequences that we may want to draw.

2.1 Linear Transformations

Given a r.v. X, we are sometimes interested in working with a new r.v. defined as aX + b, where a and b are two numbers. Note that:

• X + b is the same r.v., but with its values shifted by b. Suppose, for example, that X represents the time until a certain event occurs (e.g., a bus arrives). To study it we need to decide a starting time, when it is that X = 0. X + b shifts the starting time to −b.

• aX is the same r.v., but using a different scale. Suppose we measure the time X mentioned above in hours. If we decide to change our units to minutes, all readings will be multiplied by 60, hence we will be considering the r.v. 60X.

A little algebra and the definitions show that

• E(aX + b) = a EX + b

• Var(aX + b) = a² Var(X)

In particular, a shift does not change the variance, since we are computing it with respect to EX, which is shifted by the same amount.

2.2 Using Moments to Get Estimates

A simple (and very rough, because of its vast generality) estimate illustrates one use of moments. Consider E|X|^k for some k (we use absolute values, so as to deal only with non-negative values, whether k is even or odd). Applying the formula we saw, we have

E|X|^k = Σ_x pX(x) |x|^k     (7)

Now, suppose we are interested in the probability that |X| exceeds a certain value, P[|X| > M].
To get a rough handle on it, we can split the sum in (7) in two parts: |x| < M, and |x| ≥ M:

Σ_x pX(x) |x|^k = Σ_{|x|<M} pX(x) |x|^k + Σ_{|x|≥M} pX(x) |x|^k

Now, if |x| ≥ M, we lower the value of the sum if we write M in place of x:

Σ_{|x|≥M} pX(x) |x|^k ≥ Σ_{|x|≥M} pX(x) M^k = M^k Σ_{|x|≥M} pX(x) = M^k P[|X| ≥ M]

Also, no matter what, we'll have that Σ_{|x|<M} pX(x) |x|^k ≥ 0 (that's a quite rough estimate, but we are assuming almost nothing on pX, so we can only apply very rough information). Combining the two,

E|X|^k ≥ M^k P[|X| ≥ M]

P[|X| ≥ M] ≤ E|X|^k / M^k

This is known as Markov's Inequality.

In particular, consider a r.v. Y and define X = Y − EY. Then EX² = Var(Y), and Markov's inequality, for k = 2, becomes

P[|Y − EY| ≥ M] ≤ Var(Y) / M²

This is known as Chebyshev's Inequality. Hence, knowing the variance of a r.v. allows us a worst case estimate of the probability of ending up far from the mean.

2.3 What is Expected in the Expected Value?

Actually, nothing is expected. EX is not (necessarily) the most likely outcome, and, quite often, it is not even a value that X will ever take (think of X equal to 0 or 1, each with probability 1/2: the expected value is 0 · 1/2 + 1 · 1/2 = 1/2, which is not a value X can take).

The significance of EX, in practical terms, is given by the (mathematical) Law of Large Numbers, which we will discuss when turning to statistical applications. In a hand-waving way, it is a good approximation to the average of very many independent observations of X. For example, suppose you play a lottery with probability of winning p (say, p = 10⁻⁵). If a win will gain you $100, we can represent a win as a random variable X such that P[X = 100] = 10⁻⁵, and P[X = 0] = 1 − 10⁻⁵.
Hence,

EX = 100 · 10⁻⁵ = 10⁻³

If n successive attempts at this lottery can be considered as independent, identically distributed copies of X, and n is sufficiently large, we may expect to end up with an average win (that is, total dollars won, divided by number of attempts) approximately equal to 10⁻³.

This does not mean that you can be pretty sure that after, say, 10,000 attempts, you will end up with $10! To make this clear, consider a simpler calculation: suppose you play a fair game of chance, one in which the probability of winning is 1/2, and you are looking at the number of wins in this game, over a large number of attempts. If the Law of Large Numbers applies, the percentage of wins will be close to 50%. This means that if you play N times, and you win n times,

n/N ≈ 1/2

You would think that this implies that n ≈ N/2, that is, n ≈ N − n, but that's not so! The statement above means, in precise language, that for any ε,

|n/N − 1/2| < ε     (8)

provided N is sufficiently large. If, for example, we had n = N/2 + √N, and N > 1/ε², (8) would indeed hold very well:

|n/N − 1/2| = |(N/2 + √N)/N − 1/2| = √N/N = 1/√N < ε

Consequently, for, say, N = 10⁶ (you play a million games), you would be winning

n = 10⁶/2 + 10³

times, and so the difference between your wins and those of your opponent would be

(10⁶/2 + 10³) − (10⁶/2 − 10³) = 2 · 10³

a relatively small number compared to 10⁶, but marking a significant difference in number of wins; and, under this assumption, things would get worse and worse as the game proceeded!

3 A Note on Continuous Random Variables

We may consider any r.v. as discrete, i.e., taking a number of values (maybe, theoretically, infinite) that we can list. In some cases, this is obvious. For instance, the toss of two dice results in 11 possible values for Z = X + Y, where X and Y are the points shown by the first and second die. Also, if we play a game repeatedly, and for a very long time, the attempt number of our first win, let's call it N, can take value 1, 2, 3, ...
and so on, potentially without bound, if we keep losing and are very persistent.

However, in other cases this idea of listing all values is a bit of a stretch. Suppose we are measuring the time T needed for a piece of equipment to fail. In principle, T can take any non-negative real number as a value. However, we may still treat it as a discrete r.v., if we take into account that our measurements will be inevitably limited in precision. Hence, if we can be accurate to the minute, and T is measured in hours, the only values we can observe will be 0, 1/60, 2/60, ..., 1, 61/60, .... Note that, in theory, this sequence can go on indefinitely.

Hence, formulas like (1), (2), (3), (4), (5), (6), and (7) can always be thought of as making sense. Of course, if we are dealing with a huge number of values, each with a very small probability, calculating these formulas can become exceedingly difficult. For this reason, mathematicians have developed a tool to evaluate these sums to great accuracy, without having to actually add all these many small terms. If you take a calculus class, you will learn how this problem of computing long sums of very small addends has been solved by the introduction of integrals. Since we will not really need to perform those sums (we'll rely on the work of others who bothered to do that), we won't go into this field. Of course, if you should pursue the study of statistics beyond introductory courses, you will definitely need to include calculus in your bag of tools!

The one thing that we need to remember is the following: suppose that we have a r.v. taking very many values, very close one to the next, each with very small probability.
Draw a histogram of this distribution: the probability of the event a ≤ X ≤ b is given by the area below the histogram between the two values a and b.

[figure: histogram of the distribution]

Now, if we think of each step as very small, almost invisible (and note how this can be done by selecting a suitable scale for the units used to measure our variable!), we may substitute a smooth curve for the ragged line we have above:

[figure: smooth curve replacing the histogram]

And, just as with the ragged curve, the probability for X to be between two values will be the area under the curve, between these two values. More often than not this is not a simple calculation, but there are plenty of tables and computer programs that can do the work for us.

4 A Few Examples

The book concentrates on two cases: the binomial distribution, as the example discrete distribution, and the so-called normal distribution, as the example continuous distribution. There are good reasons to concentrate on the latter (the main one being a deep theorem known as the Central Limit Theorem, which we will discuss when applying all this material to statistical problems), and less so for the former (that is, the binomial distribution is a useful model, but it is extremely far from the exclusive discrete model). Here are a few different examples, with a brief mention of where they may arise.

4.1 Discrete Distributions

4.1.1 The Geometric Distribution

Suppose an event may occur with probability p. If we repeat the experiment over and over, in an independent way, the time N of first occurrence will be given by

P[N = k] = p (1 − p)^{k−1}

If you are curious, you may inquire about the time by which the event will have occurred 2, 3, ..., m times. You will find the result in any book on probability, as the negative binomial distribution.

4.1.2 The Multinomial Distribution

The binomial distribution is nice, but it considers two outcomes only (say, win or lose).
What if we have several different possible outcomes? For instance, we repeatedly toss a die, in an independent way: what is the probability of having a certain number of 1's, of 2's, ...? The formula is a slight complication of the binomial formula: if you have possible outcomes a₁, a₂, ..., a_m, each with probability p₁, p₂, ..., p_m, and you repeat this experiment n times, you'll find that the probability of having n₁ times the outcome a₁, n₂ times the outcome a₂, and so on, will be given by the formula (note that, necessarily, n₁ + n₂ + · · · + n_m = n)

n! / (n₁! n₂! · · · n_m!) · p₁^{n₁} p₂^{n₂} · · · p_m^{n_m}

For m = 2 this is just the binomial formula.

4.1.3 The Poisson Distribution

This is a distribution associated with rare events. In fact, it can be deduced as the limiting case (in an appropriate sense) of the binomial distribution when the probability of one of the outcomes is very small, and the number of attempts is very large. The classical example is the number of calls arriving at a switchboard over a fixed amount of time. This distribution is extremely useful in crucial applications since, for example, it is a simple but not unrealistic model for the number of requests to use a resource in a network (a computer network, an electrical network, the number of customers arriving at a teller, ...). It turns out that the number N in question has distribution

P[N = k] = e^{−λ} λ^k / k!

where λ is a parameter that, in a sense, is connected to the average time between requests (the higher the value of λ, the more intense the flow of requests). This is a probability distribution because of the remarkable formula (which won't make sense until you learn some calculus)

Σ_{k=0}^{∞} λ^k / k! = e^λ

Incidentally, it turns out that EN = λ, and Var(N) = λ as well!

4.2 Continuous Distributions

Depending on the applications, there are myriads of distributions that have been considered. A few examples follow.
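Before turning to the continuous examples, the Poisson formulas above can be verified numerically: the probabilities sum to 1, and both the mean and the variance come out equal to λ. A minimal Python sketch (λ = 3 and the truncation point 100 are arbitrary illustrative choices; the tail beyond it is negligible):

```python
import math

# Poisson pmf P[N = k] = exp(-lam) * lam**k / k!, truncated at k = 99;
# lam = 3.0 is an arbitrary illustrative value.
lam = 3.0
pmf = {k: math.exp(-lam) * lam**k / math.factorial(k) for k in range(100)}

total = sum(pmf.values())                                # should be essentially 1
mean = sum(k * p for k, p in pmf.items())                # EN = lam
var = sum((k - mean) ** 2 * p for k, p in pmf.items())   # Var(N) = lam

print(round(total, 12), round(mean, 6), round(var, 6))  # → 1.0 3.0 3.0
```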
4.2.1 The Exponential Distribution

This is the continuous analog of the geometric distribution above (and it is connected to it in a very precise mathematical sense). Here you have a possible model of first time to an event, like the breakdown of a piece of equipment. One way of giving a formula is through

P[T > t] = e^{−λt}

from which you can easily deduce that

P[a < T < b] = e^{−λa} − e^{−λb}

With more calculations than we can perform here, you would find that

ET = 1/λ,   Var(T) = 1/λ²

If this reminds you of the Poisson distribution's formulas, you are right: there is a strong connection between the two distributions. If the number of arrivals has a Poisson distribution, the time between arrivals has an exponential distribution, and the λ is the same!

4.2.2 The Weibull Distribution

This is a variation on the exponential distribution, used in survival analysis, when the exponential model is not appropriate:

P[T > t] = e^{−λt^α}

where α > 0 is another parameter.

4.2.3 The Beta and the Gamma Distribution

Just as the exponential distribution concerns the first arrival time in a Poisson flow, and, in the discrete case, considering multiple arrivals led from the geometric to the negative binomial distribution, in the continuous case the analog is the so-called Gamma distribution. Also, in a number of problems more or less connected to the same setup, another distribution comes up, called the Beta distribution. We won't be concerned with these more complex cases.

4.2.4 The Cauchy Distribution

Once you allow for infinitely many outcomes, you can't really be sure that the parameters we defined, EX, EX², and so on, make sense. In fact, there are many examples where they don't. These examples have become of greater interest since the explosion of Financial Mathematics, where the modeling of the distribution of stock prices has led to considering some of these exotic examples. The granddaddy of these examples is the Cauchy distribution.
This is a distribution where the probability of the random variable falling between a and b is given by the area under the curve

1/π · 1/(1 + x²)

between a and b. If you think this looks very much like the normal distribution, you are only seeing a superficial similarity. In fact, while for a normally distributed variable X, with parameters µ and σ,

EX = µ,   Var(X) = σ²

attempts to compute the expected value and variance for a Cauchy variable are fruitless: the numbers become infinitely large...
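This last remark can be seen in a small simulation (not part of the notes): sample means of Cauchy observations refuse to settle down, in sharp contrast with the Law of Large Numbers behavior discussed earlier. The generator tan(π(U − 1/2)), with U uniform on (0, 1), is a standard way to produce Cauchy samples; the seed and sample sizes below are illustrative choices:

```python
import math
import random

# Sample means of a Cauchy r.v. do not converge (EX does not exist):
# the running average keeps being thrown off by occasional huge samples.
random.seed(0)  # arbitrary seed, so the illustration is reproducible

def cauchy_sample() -> float:
    # Standard Cauchy via the inverse-CDF method: tan(pi * (U - 1/2))
    return math.tan(math.pi * (random.random() - 0.5))

averages = []
for n in (1_000, 10_000, 100_000):
    avg = sum(cauchy_sample() for _ in range(n)) / n
    averages.append(avg)
    print(n, avg)  # no sign of settling toward any fixed number
```

Running the same loop with, say, fair-coin samples instead would show the averages closing in on 1/2, exactly as the Law of Large Numbers predicts.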