9 Expectation and Variance
Two numbers are often used to summarize a probability distribution for a random variable X. The mean is a measure of the center or middle of the probability distribution, and the variance is a
measure of the dispersion, or variability in the distribution. These
two measures do not uniquely identify a probability distribution.
That is, two different distributions can have the same mean and
variance. Still, these measures are simple, useful summaries of the
probability distribution of X.
9.1 Expectation of Discrete Random Variable
The most important characteristic of a random variable is its expectation. Synonyms for expectation are expected value, mean,
and first moment.
The definition of expectation is motivated by the conventional
idea of numerical average. Recall that the numerical average of n
numbers, say a1, a2, . . . , an, is
(1/n) ∑_{k=1}^{n} ak.
We use the average to summarize or characterize the entire collection of numbers a1 , . . . , an with a single value.
Example 9.1. Consider 10 numbers: 5, 2, 3, 2, 5, -2, 3, 2, 5, 2.
The average is
(5 + 2 + 3 + 2 + 5 + (−2) + 3 + 2 + 5 + 2)/10 = 27/10 = 2.7.
We can rewrite the above calculation as
−2 × (1/10) + 2 × (4/10) + 3 × (2/10) + 5 × (3/10).
Definition 9.2. Suppose X is a discrete random variable; we define the expectation (or mean or expected value) of X by
EX = ∑_x x × P[X = x] = ∑_x x × pX(x).    (15)
In other words, the expected value of a discrete random variable
is a weighted mean of the values the random variable can take on
where the weights come from the pmf of the random variable.
• Some references use mX or µX to represent EX.
• For conciseness, we simply write x under the summation symbol in (15); this means that the sum runs over all x values in
the support of X. (Of course, for x outside of the support,
pX (x) is 0 anyway.)
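A minimal numerical sketch of (15) in MATLAB, using the numbers from Example 9.1 (the value −2 occurs with probability 1/10, the value 2 with 4/10, 3 with 2/10, and 5 with 3/10):

```matlab
% Expectation of a discrete RV as a weighted sum, Equation (15).
x  = [-2 2 3 5];        % support of X
p  = [1 4 2 3]/10;      % pmf pX(x); sum(p) is 1
EX = sum(x .* p)        % 2.7000, matching the average in Example 9.1
```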
9.3. Analogy: In mechanics, think of point masses on a line with
a mass of pX (x) kg. at a distance x meters from the origin.
In this model, EX is the center of mass (the balance point).
This is why pX (x) is called probability mass function.
Example 9.4. When X ∼ Bernoulli(p) with p ∈ (0, 1), EX = 0 × (1 − p) + 1 × p = p.
Note that, since X takes only the values 0 and 1, its expected
value p is “never seen”.
9.5. Interpretation: The expected value is in general not a typical
value that the random variable can take on. It is often helpful to
interpret the expected value of a random variable as the long-run
average value of the variable over many independent repetitions
of an experiment.

Example 9.6. pX(x) = 1/4 for x = 0, 3/4 for x = 2, and 0 otherwise.
Example 9.7. For X ∼ P(α),
EX = ∑_{i=0}^{∞} i e^{−α} α^i / i! = ∑_{i=1}^{∞} i e^{−α} α^i / i!
   = e^{−α} α ∑_{i=1}^{∞} α^{i−1} / (i − 1)!
   = e^{−α} α ∑_{k=0}^{∞} α^k / k! = e^{−α} α e^{α} = α.
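A quick numerical check of this result in MATLAB (a sketch; the infinite sum is truncated at i = 100, where the neglected tail is negligible for a small α):

```matlab
% Check of Example 9.7: EX = alpha for X ~ Poisson(alpha). Base MATLAB only.
alpha = 2;
i   = 0:100;
pmf = exp(-alpha) .* alpha.^i ./ factorial(i);
EX  = sum(i .* pmf)     % approximately 2 = alpha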
Example 9.8. For X ∼ B(n, p),
EX = ∑_{i=0}^{n} i C(n, i) p^i (1 − p)^{n−i} = ∑_{i=1}^{n} i [n! / (i!(n − i)!)] p^i (1 − p)^{n−i}
   = n ∑_{i=1}^{n} [(n − 1)! / ((i − 1)!(n − i)!)] p^i (1 − p)^{n−i} = n ∑_{i=1}^{n} C(n − 1, i − 1) p^i (1 − p)^{n−i}.
Let k = i − 1. Then,
EX = n ∑_{k=0}^{n−1} C(n − 1, k) p^{k+1} (1 − p)^{n−(k+1)} = np ∑_{k=0}^{n−1} C(n − 1, k) p^k (1 − p)^{n−1−k}.
We now have the expression in a form where we can apply the binomial theorem, which finally gives
EX = np(p + (1 − p))^{n−1} = np.
We shall revisit this example again using another approach in Example 11.48.
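As with the Poisson case, this can be verified numerically in MATLAB (a sketch; the binomial pmf is built from nchoosek, so no toolbox is needed):

```matlab
% Check of Example 9.8: EX = n*p for X ~ Binomial(n, p).
n = 10;  p = 0.3;
i   = 0:n;
pmf = arrayfun(@(k) nchoosek(n, k), i) .* p.^i .* (1-p).^(n-i);
EX  = sum(i .* pmf)     % 3.0000 = n*p
```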
Example 9.9. Pascal’s wager : Suppose you concede that you
don’t know whether or not God exists and therefore assign a 50
percent chance to either proposition. How should you weigh these
odds when deciding whether to lead a pious life? If you act piously
and God exists, Pascal argued, your gain–eternal happiness–is infinite. If, on the other hand, God does not exist, your loss, or
negative return, is small–the sacrifices of piety. To weigh these
possible gains and losses, Pascal proposed, you multiply the probability of each possible outcome by its payoff and add them all up,
forming a kind of average or expected payoff. In other words, the
mathematical expectation of your return on piety is one-half infinity (your gain if God exists) minus one-half a small number (your
loss if he does not exist). Pascal knew enough about infinity to
know that the answer to this calculation is infinite, and thus the
expected return on piety is infinitely positive. Every reasonable
person, Pascal concluded, should therefore follow the laws of God.
[14, p 76]
• Pascal’s wager is often considered the founding of the mathematical discipline of game theory, the quantitative study of
optimal decision strategies in games.
9.10. Technical issue: Definition (15) is only meaningful if the
sum is well defined.
The sum of infinitely many nonnegative terms is always well-defined, with +∞ as a possible value for the sum.
• Infinite Expectation: Consider a random variable X whose
pmf is defined by
pX(x) = 1/(cx²) for x = 1, 2, 3, . . . , and 0 otherwise.
Then, c = ∑_{n=1}^{∞} 1/n², which is a finite positive number (π²/6).
However,
EX = ∑_{k=1}^{∞} k pX(k) = ∑_{k=1}^{∞} k · 1/(ck²) = (1/c) ∑_{k=1}^{∞} 1/k = +∞.
Some care is necessary when computing expectations of signed
random variables that take infinitely many values.
• The sum over countably infinite many terms is not always well
defined when both positive and negative terms are involved.
• For example, the infinite series 1 − 1 + 1 − 1 + . . . has the sum
0 when you sum the terms according to (1 − 1) + (1 − 1) + · · · ,
whereas you get the sum 1 when you sum the terms according
to 1 + (−1 + 1) + (−1 + 1) + (−1 + 1) + · · · .
• Such abnormalities cannot happen when all terms in the infinite summation are nonnegative.
It is the convention in probability theory that EX should be evaluated as
EX = ∑_{x≥0} x pX(x) − ∑_{x<0} (−x) pX(x),
• If at least one of these sums is finite, then it is clear what
value should be assigned as EX.
• If both sums are +∞, then no value is assigned to EX, and
we say that EX is undefined.
Example 9.11. Undefined Expectation: Let
pX(x) = 1/(2cx²) for x = ±1, ±2, ±3, . . . , and 0 otherwise.
Then,
EX = ∑_{k=1}^{∞} k pX(k) − ∑_{k=−∞}^{−1} (−k) pX(k).
The first sum gives
∑_{k=1}^{∞} k pX(k) = ∑_{k=1}^{∞} k · 1/(2ck²) = (1/(2c)) ∑_{k=1}^{∞} 1/k = ∞.
The second sum gives
∑_{k=−∞}^{−1} (−k) pX(k) = ∑_{k=1}^{∞} k pX(−k) = ∑_{k=1}^{∞} k · 1/(2ck²) = (1/(2c)) ∑_{k=1}^{∞} 1/k = ∞.
Because both sums are infinite, we conclude that EX is undefined.
9.12. More rigorously, to define EX, we let X + = max {X, 0} and
X − = − min {X, 0}. Then observe that X = X + − X − and that
both X + and X − are nonnegative r.v.’s. We say that a random
variable X admits an expectation if EX + and EX − are not
both equal to +∞. In which case, EX = EX + − EX − .
9.2 Function of a Discrete Random Variable
Given a random variable X, we will often have occasion to define
a new random variable by Y ≡ g(X), where g(x) is a real-valued
function of the real-valued variable x. More precisely, recall that
a random variable X is actually a function taking points of the
sample space, ω ∈ Ω, into real numbers X(ω). Hence, we have the
following definition.
Definition 9.13. The notation Y = g(X) is actually shorthand
for Y (ω) := g(X(ω)).
• The random variable Y = g(X) is sometimes called derived
random variable.
Example 9.14. Let
pX(x) = x²/c for x = ±1, ±2, and 0 otherwise,
and
Y = X⁴.
Find pY (y) and then calculate EY .
9.15. For discrete random variable X, the pmf of a derived random variable Y = g(X) is given by
pY(y) = ∑_{x: g(x)=y} pX(x).
Note that the sum is over all x in the support of X which satisfy
g(x) = y.
Example 9.16. A “binary” random variable X takes only two
values a and b with
P [X = b] = 1 − P [X = a] = p.
X can be expressed as X = (b − a)I + a, where I is a Bernoulli
random variable with parameter p.
9.3 Expectation of a Function of a Discrete Random Variable
Recall that for discrete random variable X, the pmf of a derived
random variable Y = g(X) is given by
pY(y) = ∑_{x: g(x)=y} pX(x).
If we want to compute EY , it might seem that we first have to
find the pmf of Y . Typically, this requires a detailed analysis of g
which can be complicated, and it is avoided by the following result.
9.17. Suppose X is a discrete random variable. Then,
E[g(X)] = ∑_x g(x) pX(x).
This is referred to as the law/rule of the lazy/unconscious
statistician (LOTUS) [22, Thm 3.6 p 48],[9, p. 149],[8, p. 50]
because it is so much easier to use the above formula than to first
find the pmf of Y . It is also called substitution rule [21, p 271].
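A small numerical sketch of 9.17 in MATLAB, using a hypothetical pmf (for illustration only, not one of the examples in these notes), comparing the two routes to E[g(X)]:

```matlab
% LOTUS check: E[g(X)] from pX directly vs. via the pmf of Y = g(X).
x = [1 2 3 4];  p = [1 1 1 1]/4;              % X uniform on {1,2,3,4}
g = @(t) t.^2;
Eg_lotus = sum(g(x) .* p)                     % 7.5, directly from 9.17
y  = unique(g(x));                            % support of Y
pY = arrayfun(@(v) sum(p(g(x) == v)), y);     % pY(y): sum of pX over g(x) = y
Eg_via_pY = sum(y .* pY)                      % also 7.5
```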
Example 9.18. Back to Example 9.14. Recall that
pX(x) = x²/c for x = ±1, ±2, and 0 otherwise.
(a) When Y = X 4 , EY =
(b) E [2X − 1]
9.19. Caution: A frequently made mistake of beginning students
is to set E[g(X)] equal to g(EX). In general, E[g(X)] ≠ g(EX).
(a) In particular, E[1/X] is not the same as 1/(EX).
(b) An exception is the case of an affine function g(x) = ax + b.
See also (9.23).
Example 9.20. Continue from Example 9.4. For X ∼ Bernoulli(p),
(a) EX = p
(b) E[X²] = 0² × (1 − p) + 1² × p = p ≠ (EX)².
Example 9.21. Continue from Example 9.7. Suppose X ∼ P(α).
E[X²] = ∑_{i=0}^{∞} i² e^{−α} α^i / i! = e^{−α} α ∑_{i=1}^{∞} i α^{i−1} / (i − 1)!    (16)
We can evaluate the infinite sum in (16) by rewriting i as i − 1 + 1:
∑_{i=1}^{∞} i α^{i−1}/(i − 1)! = ∑_{i=1}^{∞} (i − 1 + 1) α^{i−1}/(i − 1)!
   = ∑_{i=1}^{∞} (i − 1) α^{i−1}/(i − 1)! + ∑_{i=1}^{∞} α^{i−1}/(i − 1)!
   = α ∑_{i=2}^{∞} α^{i−2}/(i − 2)! + ∑_{i=1}^{∞} α^{i−1}/(i − 1)!
   = α e^{α} + e^{α} = e^{α}(α + 1).
Plugging this back into (16), we get
E[X²] = α(α + 1) = α² + α.
9.22. Continue from Example 9.8. For X ∼ B(n, p), one can find E[X²] = np(1 − p) + (np)².
9.23. Some Basic Properties of Expectations
(a) For c ∈ R, E [c] = c
(b) For c ∈ R, E [X + c] = EX + c and E [cX] = cEX
(c) For constants a, b, we have
E [aX + b] = aEX + b.
(d) For constants c1 and c2 ,
E [c1 g1 (X) + c2 g2 (X)] = c1 E [g1 (X)] + c2 E [g2 (X)] .
(e) For constants c1, c2, . . . , cn,
E[∑_{k=1}^{n} ck gk(X)] = ∑_{k=1}^{n} ck E[gk(X)].
Definition 9.24. Some definitions involving expectation of a function of a random variable:
(a) Absolute moment: E[|X|^k], where we define E[|X|^0] = 1.
(b) Moment: mk = E[X^k] = the k-th moment of X, k ∈ N.
• The first moment of X is its expectation EX.
• The second moment of X is E[X²].
9.4 Variance and Standard Deviation
An average (expectation) can be regarded as one number that
summarizes an entire probability model. After finding an average,
someone who wants to look further into the probability model
might ask, “How typical is the average?” or, “What are the
chances of observing an event far from the average?” A measure
of dispersion/deviation/spread is an answer to these questions
wrapped up in a single number. (The opposite of this measure is
the peakedness.) If this measure is small, observations are likely
to be near the average. A high measure of dispersion suggests that
it is not unusual to observe events that are far from the average.
Example 9.25. Consider your score on the midterm exam. After
you find out your score is 7 points above average, you are likely to
ask, “How good is that? Is it near the top of the class or somewhere
near the middle?”.
Example 9.26. In the case that the random variable X is the
random payoff in a game that can be repeated many times under
identical conditions, the expected value of X is an informative
measure on the grounds of the law of large numbers. However, the
information provided by EX is usually not sufficient when X is
the random payoff in a nonrepeatable game.
Suppose your investment has yielded a profit of $3,000 and you
must choose between the following two options:
• the first option is to take the sure profit of $3,000 and
• the second option is to reinvest the profit of $3,000 under the
scenario that this profit increases to $4,000 with probability
0.8 and is lost with probability 0.2.
The expected profit of the second option is
0.8 × $4, 000 + 0.2 × $0 = $3, 200
and is larger than the $3,000 from the first option. Nevertheless,
most people would prefer the first option. The downside risk is
too big for them. A measure that takes into account the aspect of
risk is the variance of a random variable. [21, p 35]
9.27. The most important measures of dispersion are the
standard deviation and its close relative, the variance.
Definition 9.28. Variance:
Var X = E[(X − EX)²].    (17)
• Read “the variance of X”.
• Notation: DX, or σ²(X), or σX², or VX [22, p. 51]
• In some references, to avoid confusion from the two expectation symbols, they first define m = EX and then define the
variance of X by
Var X = E[(X − m)²].
• We can also calculate the variance via another identity:
Var X = E[X²] − (EX)².
• The units of the variance are squares of the units of the random variable.
9.29. Basic properties of variance:
• Var X ≥ 0.
• Var X ≤ E[X²].
• Var[cX] = c2 Var X.
• Var[X + c] = Var X.
• Var[aX + b] = a2 Var X.
Definition 9.30. Standard Deviation:
σX = √(Var X).
• It is useful to work with the standard deviation since it has
the same units as EX.
• Informally we think of outcomes within ±σX of EX as being
in the center of the distribution. Some references would informally interpret sample values within ±σX of the expected
value, x ∈ [EX − σX , EX + σX ], as “typical” values of X and
other values as “unusual”.
• σaX+b = |a| σX .
9.31. σX and Var X: Note that the √· function is a strictly increasing function. Because σX = √(Var X), if one of them is large, the other is also large. Therefore, both values quantify
the amount of spread/dispersion in RV X (which can be observed
from the spread or dispersion of the pmf or the histogram or the
relative frequency graph). However, Var X does not have the same
unit as the RV X.
9.32. In finance, standard deviation is a key concept and is used
to measure the volatility (risk) of investment returns and stock
returns.
It is common wisdom in finance that diversification of a portfolio
of stocks generally reduces the total risk exposure of the investment. We shall return to this point in Example 11.67.
Example 9.33. Continue from Example 9.25. If the standard
deviation of exam scores is 12 points, the student with a score of
+7 with respect to the mean can think of herself in the middle of
the class. If the standard deviation is 3 points, she is likely to be
near the top.
Example 9.34. Suppose X ∼ Bernoulli(p).
(a) E[X²] = 0² × (1 − p) + 1² × p = p.
(b) Var X = E[X²] − (EX)² = p − p² = p(1 − p).
Alternatively, if we directly use (17), we have
Var X = E[(X − EX)²] = (0 − p)² × (1 − p) + (1 − p)² × p = p(1 − p)(p + (1 − p)) = p(1 − p).
Example 9.35. Continue from Example 9.7 and Example 9.21.
Suppose X ∼ P(α). We have
Var X = E[X²] − (EX)² = α² + α − α² = α.
Therefore, for a Poisson random variable, the expected value is the
same as the variance.
Example 9.36. Consider the two pmfs shown in Figure 11. The random variable X with pmf at the left has a smaller variance than the random variable Y with pmf at the right because more of its probability mass is concentrated near zero (their mean) in the graph at the left than in the graph at the right. [9, p. 85]

Figure 11: Example 9.36 shows that a random variable whose probability mass is concentrated near the mean has smaller variance. (The pmf on the left puts mass 1/3 at ±1 and 1/6 at ±2; the pmf on the right puts mass 1/6 at ±1 and 1/3 at ±2. Both have zero mean, with Var X = 2 and Var Y = 3.) [9, Fig. 2.9]

9.37. We have already talked about variance and standard deviation as a number that indicates the spread/dispersion of the pmf. More specifically, let's imagine a pmf that is shaped like a bell curve. As the value of σX gets smaller, the spread of the pmf will be smaller and hence the pmf would “look sharper”. Therefore, the probability that the random variable X would take a value far from the mean would be smaller.
The next property involves the use of σX to bound “the tail
probability” of a random variable.
9.38. Chebyshev’s Inequality:
P[|X − EX| ≥ α] ≤ σX² / α²,
or equivalently,
P[|X − EX| ≥ nσX] ≤ 1/n².
• Useful only when α > σX.
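A quick empirical sanity check of 9.38 in MATLAB (a sketch; exprnd requires the Statistics Toolbox):

```matlab
% Chebyshev check: the empirical tail probability never exceeds 1/n^2.
% Exponential samples with lambda = 1, so EX = 1 and sigma_X = 1.
X = exprnd(1, 1, 1e6);                    % 10^6 samples
for n = 2:4
    tail = mean(abs(X - 1) >= n);         % empirical P[|X - EX| >= n*sigma_X]
    fprintf('n = %d: empirical %.4f <= bound %.4f\n', n, tail, 1/n^2);
end
```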
Example 9.39. If X has mean m and variance σ 2 , it is sometimes
convenient to introduce the normalized random variable
Y = (X − m)/σ.
Definition 9.40. Central Moments: A generalization of the
variance is the nth central moment which is defined to be
µn = E[(X − EX)^n].
(a) µ1 = E [X − EX] = 0.
(b) µ2 = σX² = Var X: the second central moment is the variance.
Sirindhorn International Institute of Technology
Thammasat University
School of Information, Computer and Communication Technology
ECS315 2014/1 Part IV.1 Dr.Prapun
10 Continuous Random Variables

10.1 From Discrete to Continuous Random Variables
In many practical applications of probability, physical situations
are better described by random variables that can take on a continuum of possible values rather than a discrete number of values.
For this type of random variable, the interesting fact is that
• any individual value has probability zero:
P [X = x] = 0 for all x
(18)
and that
• the support is always uncountable.
These random variables are called continuous random variables.
10.1. We can see from (18) that the pmf is going to be useless for
this type of random variable. It turns out that the cdf FX is still
useful and we shall introduce another useful function called probability density function (pdf) to replace the role of pmf. However,
integral calculus36 is required to formulate this continuous analog
of a pmf.
10.2. In some cases, the random variable X is actually discrete
but, because the range of possible values is so large, it might be
more convenient to analyze X as a continuous random variable.
Footnote 36: This is always a difficult concept for the beginning student.
Example 10.3. Suppose that current measurements are read from
a digital instrument that displays the current to the nearest one-hundredth of a mA. Because the possible measurements are limited, the random variable is discrete. However, it might be a more
convenient, simple approximation to assume that the current measurements are values of a continuous random variable.
Example 10.4. If you can measure the heights of people with
infinite precision, the height of a randomly chosen person is a continuous random variable. In reality, heights cannot be measured
with infinite precision, but the mathematical analysis of the distribution of heights of people is greatly simplified when using a
mathematical model in which the height of a randomly chosen
person is modeled as a continuous random variable. [17, p 284]
Example 10.5. Continuous random variables are important models for
(a) voltages in communication receivers
(b) file download times on the Internet
(c) velocity and position of an airliner on radar
(d) lifetime of a battery
(e) decay time of a radioactive particle
(f) time until the occurrence of the next earthquake in a certain
region
Example 10.6. The simplest example of a continuous random
variable is the “random choice” of a number from the interval
(0, 1).
• In MATLAB, this can be generated by the command rand.
In Excel, use rand().
• The generation is “unbiased” in the sense that “any number
in the range is as likely to occur as another number.”
• Histogram is flat over (0, 1).
• Formally, this is called a uniform RV on the interval (0, 1).
Definition 10.7. We say that X is a continuous random variable37 if we can find a (real-valued) function38 f such that, for any
set B, P [X ∈ B] has the form
P[X ∈ B] = ∫_B f(x) dx.    (19)
• In particular,
P[a ≤ X ≤ b] = ∫_a^b f(x) dx.    (20)
In other words, the area under the graph of f (x) between
the points a and b gives the probability P [a ≤ X ≤ b].
• The function f is called the probability density function
(pdf) or simply density.
• When we want to emphasize that the function f is a density
of a particular random variable X, we write fX instead of f .
Footnote 37: To be more rigorous, this is the definition for an absolutely continuous random variable. At
this level, we will not distinguish between the continuous random variable and absolutely
continuous random variable. When the distinction between them is considered, a random
variable X is said to be continuous (not necessarily absolutely continuous) when condition (18)
is satisfied. Alternatively, condition (18) is equivalent to requiring the cdf FX to be continuous.
Another fact worth mentioning is that if a random variable is absolutely continuous, then it
is continuous. So, absolute continuity is a stronger condition.
Footnote 38: Strictly speaking, the δ-“function” is not a function; so, we can’t use a δ-function here.
Figure 13: For a continuous random variable, the probability distribution is
described by a curve called the probability density function, f (x). The total
area beneath the curve is 1.0, and the probability that X will take on some
value between a and b is the area beneath the curve between points a and b.
Example 10.8. For the random variable generated by the rand
command in MATLAB39 or the rand() command in Excel,
Definition 10.9. Recall that the support SX of a random variable
X is any set S such that P [X ∈ S] = 1. For continuous random
variable, SX is usually set to be {x : fX (x) > 0}.
Footnote 39: The rand command in MATLAB is an approximation for two reasons:
(a) It produces pseudorandom numbers; the numbers seem random but are actually the
output of a deterministic algorithm.
(b) It produces a double precision floating point number, represented in the computer
by 64 bits. Thus MATLAB distinguishes no more than 2^64 unique double precision
floating point numbers. By comparison, there are uncountably infinite real numbers in
the interval from 0 to 1.
10.2 Properties of PDF and CDF for Continuous Random Variables
10.10. fX is determined only almost everywhere40 . That is, given
a pdf f for a random variable X, if we construct a function g by
changing the function f at a countable number of points41 , then g
can also serve as a pdf for X.
10.11. The cdf of any kind of random variable X is defined as
FX (x) = P [X ≤ x] .
Note that even though there are more than one valid pdfs for
any given random variable, the cdf is unique. There is only one
cdf for each random variable.
10.12. For continuous random variable, given the pdf fX (x), we
can find the cdf of X by
FX(x) = P[X ≤ x] = ∫_{−∞}^{x} fX(t) dt.
10.13. Given the cdf FX (x), we can find the pdf fX (x) by
• If FX is differentiable at x, we will set
(d/dx) FX(x) = fX(x).
• If FX is not differentiable at x, we can set the values of fX (x)
to be any value. Usually, the values are selected to give simple
expression. (In many cases, they are simply set to 0.)
Footnote 40: Lebesgue-a.e., to be exact.
Footnote 41: More specifically, if g = f Lebesgue-a.e., then g is also a pdf for X.
Example 10.14. For the random variable generated by the rand
command in MATLAB or the rand() command in Excel,
Example 10.15. Suppose that the lifetime X of a device has the
cdf
FX(x) = 0 for x < 0;  (1/4)x² for 0 ≤ x ≤ 2;  1 for x > 2.
Observe that it is differentiable at each point x except at x = 2.
The probability density function is obtained by differentiation of
the cdf which gives
fX(x) = (1/2)x for 0 < x < 2, and 0 otherwise.
At x = 2 where FX has no derivative, it does not matter what
values we give to fX . Here, we set it to be 0.
10.16. In many situations when you are asked to find pdf, it may
be easier to find cdf first and then differentiate it to get pdf.
Exercise 10.17. A point is “picked at random” in the inside of a
circular disk with radius r. Let the random variable X denote the
distance from the center of the disk to this point. Find fX (x).
10.18. Unlike the cdf of a discrete random variable, the cdf of a
continuous random variable has no jump and is continuous everywhere.
10.19. pX(x) = P[X = x] = P[x ≤ X ≤ x] = ∫_x^x fX(t) dt = 0.
Again, it makes no sense to speak of the probability that X will
take on a pre-specified value. This probability is always zero.
10.20. P [X = a] = P [X = b] = 0. Hence,
P [a < X < b] = P [a ≤ X < b] = P [a < X ≤ b] = P [a ≤ X ≤ b]
• The corresponding integrals over an interval are not affected
by whether or not the endpoints are included or excluded.
• When we work with continuous random variables, it is usually
not necessary to be precise about specifying whether or not
a range of numbers includes the endpoints. This is quite different from the situation we encounter with discrete random
variables where it is critical to carefully examine the type of
inequality.
10.21. fX is nonnegative and ∫_ℝ fX(x) dx = 1.
Example 10.22. Random variable X has pdf
fX(x) = c e^{−2x} for x > 0, and 0 otherwise.
Find the constant c and sketch the pdf.
Definition 10.23. A continuous random variable is called exponential if its pdf is given by
fX(x) = λe^{−λx} for x > 0, and 0 for x ≤ 0,
for some λ > 0.
Theorem 10.24. Any nonnegative42 function that integrates to
one is a probability density function (pdf) of some random
variable [8, p.139].
Footnote 42: or nonnegative a.e.
10.25. Intuition/Interpretation:
The use of the word “density” originated with the analogy to the distribution of matter in space. In physics, any finite volume, no matter how small, has a positive mass, but there is no mass at a single point. A similar description applies to continuous random variables.
Approximately, for a small ∆x,
P[X ∈ [x, x + ∆x]] = ∫_x^{x+∆x} fX(t) dt ≈ fX(x)∆x.
This is why we call fX the density function.

Figure 14: (a) P[a ≤ X ≤ b] is the area of the shaded region under the density. (b) P[x ≤ X ≤ x + ∆x] is the area of the shaded vertical strip.

In other words, the probability of random variable X taking on a value in a small interval around point c is approximately equal to fX(c)∆c when ∆c is the length of the interval.
• In fact, fX(x) = lim_{∆x→0} P[x < X ≤ x + ∆x] / ∆x.
• The number fX(x) itself is not a probability. In particular, it does not have to be between 0 and 1.
• fX(c) is a relative measure for the likelihood that random variable X will take a value in the immediate neighborhood of point c.
Stated differently, the pdf fX(x) expresses how densely the probability mass of random variable X is smeared out in the neighborhood of point x. Hence, the name density function.
10.26. Histogram and pdf [17, p. 143 and 145]:

Figure 15: From histogram to pdf. (A histogram of 5000 samples, its rescaling to relative frequency, and the estimated pdf compared with the true pdf.)
(a) A probability histogram is a bar chart that divides the range
of values covered by the samples/measurements into intervals
of the same width, and shows the proportion (relative frequency) of the samples in each interval.
• To make a histogram, you break up the range of values
covered by the samples into a number of disjoint adjacent
intervals each having the same width, say width ∆. The
height of the bar on each interval [j∆, (j + 1)∆) is taken
such that the area of the bar is equal to the proportion
of the measurements falling in that interval (the proportion of measurements within the interval is divided by the
width of the interval to obtain the height of the bar).
• The total area under the probability histogram is thus
standardized/normalized to one.
(b) If you take sufficiently many independent samples from a continuous random variable and make the width ∆ of the base
intervals of the probability histogram smaller and smaller, the
graph of the probability histogram will begin to look more and
more like the pdf.
(c) Conclusion: A probability density function can be seen as a
“smoothed out” version of a probability histogram.
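A minimal MATLAB sketch of this histogram-to-pdf idea. The exponential samples are generated from rand via the inverse-cdf trick X = −log(U)/λ (an aside not used elsewhere in these notes), so only base MATLAB is needed:

```matlab
% Probability histogram of many samples vs. the true pdf (see 10.26).
lambda = 1/3;
X = -log(rand(1, 1e5)) / lambda;              % 10^5 samples of E(lambda)
[counts, centers] = hist(X, 50);              % 50 equal-width bins
w = centers(2) - centers(1);                  % bin width
bar(centers, counts/(numel(X)*w), 1)          % bar areas now sum to 1
hold on
t = linspace(0, max(X), 200);
plot(t, lambda*exp(-lambda*t), 'LineWidth', 2)   % true pdf
legend('probability histogram', 'pdf'), hold off
```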
10.3 Expectation and Variance
10.27. Expectation: Suppose X is a continuous random variable
with probability density function fX (x).
EX = ∫_{−∞}^{∞} x fX(x) dx    (21)
E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx    (22)
In particular,
E[X²] = ∫_{−∞}^{∞} x² fX(x) dx,
Var X = ∫_{−∞}^{∞} (x − EX)² fX(x) dx = E[X²] − (EX)².
Example 10.28. For the random variable generated by the rand
command in MATLAB or the rand() command in Excel,
Example 10.29. For the exponential random variable introduced
in Definition 10.23,
10.30. If we compare other characteristics of discrete and continuous random variables, we find that with discrete random variables,
many facts are expressed as sums. With continuous random variables, the corresponding facts are expressed as integrals.
10.31. All of the properties for the expectation and variance of
discrete random variables also work for continuous random variables as well:
(a) Intuition/interpretation of the expected value: As n → ∞,
the average of n independent samples of X will approach EX.
This observation is known as the “Law of Large Numbers”.
(b) For c ∈ R, E [c] = c
(c) For constants a, b, we have E [aX + b] = aEX + b.
(d) E[∑_{i=1}^{n} ci gi(X)] = ∑_{i=1}^{n} ci E[gi(X)].
(e) Var X = E[X²] − (EX)².
(f) Var X ≥ 0.
(g) Var X ≤ E[X²].
(h) Var[aX + b] = a2 Var X.
(i) σaX+b = |a| σX .
10.32. Chebyshev’s Inequality:
P[|X − EX| ≥ α] ≤ σX² / α²,
or equivalently,
P[|X − EX| ≥ nσX] ≤ 1/n².
• This inequality uses the variance to bound the “tail probability” of a random variable.
• Useful only when α > σX.
Example 10.33. A circuit is designed to handle a current of 20
mA plus or minus a deviation of less than 5 mA. If the applied
current has mean 20 mA and variance 4 mA2 , use the Chebyshev
inequality to bound the probability that the applied current violates the design parameters.
Let X denote the applied current. Then X is within the design
parameters if and only if |X − 20| < 5. To bound the probability
that this does not happen, write
P[|X − 20| ≥ 5] ≤ Var X / 5² = 4/25 = 0.16.
Hence, the probability of violating the design parameters is at most
16%.
10.34. Interesting applications of expectation:
(a) fX (x) = E [δ (X − x)]
(b) P [X ∈ B] = E [1B (X)]
Sirindhorn International Institute of Technology
Thammasat University
School of Information, Computer and Communication Technology
ECS315 2014/1 Part IV.2 Dr.Prapun
10.4 Families of Continuous Random Variables
Theorem 10.24 states that any nonnegative function f (x) whose
integral over the interval (−∞, +∞) equals 1 can be regarded as
a probability density function of a random variable. In real-world
applications, however, special mathematical forms naturally show
up. In this section, we introduce a couple of families of continuous
random variables that frequently appear in practical applications.
The probability densities of the members of each family all have the
same mathematical form but differ only in one or more parameters.
10.4.1 Uniform Distribution
Definition 10.35. For a uniform random variable on an interval
[a, b], we denote its family by uniform([a, b]) or U([a, b]) or simply
U(a, b). Expressions that are synonymous with “X is a uniform
random variable” are “X is uniformly distributed”, “X has a uniform distribution”, and “X has a uniform density”. This family is
characterized by
fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 for x < a or x > b.
• The random variable X is just as likely to be near any value
in [a, b] as any other value.
• In MATLAB,
(a) use X = a+(b-a)*rand or X = random(’Uniform’,a,b)
to generate the RV,
(b) use pdf(’Uniform’,x,a,b) and cdf(’Uniform’,x,a,b)
to calculate the pdf and cdf, respectively.
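A short usage sketch of the first command above, checking the sample mean and variance against (a + b)/2 and (b − a)²/12 (the values derived in Exercise 10.40 below):

```matlab
% Generating U(a,b) samples with rand and checking the first two moments.
a = 1;  b = 2;
X = a + (b-a)*rand(1, 1e6);        % 10^6 samples of U(a,b)
[mean(X), (a+b)/2]                 % both close to 1.5
[var(X),  (b-a)^2/12]              % both close to 0.0833
```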
Exercise 10.36. Show that FX(x) = 0 for x < a;  (x − a)/(b − a) for a ≤ x ≤ b;  1 for x > b.
Figure 16: The pdf and cdf for the uniform random variable. [16, Fig. 3.5]
Example 10.37 (F2011). Suppose X is uniformly distributed on the interval (1, 2). (X ∼ U(1, 2).)
(a) Plot the pdf fX(x) of X.
(b) Plot the cdf FX(x) of X.

10.38. The uniform distribution provides a probability model for selecting a point at random from the interval [a, b].
• Use with caution to model a quantity that is known to vary randomly between a and b but about which little else is known.
Example 10.39. [9, Ex. 4.1 p. 140-141] In coherent radio communications, the phase difference between the transmitter and the
receiver, denoted by Θ, is modeled as having a uniform density on
[−π, π].
(a) P[Θ ≤ 0] = 1/2
(b) P[Θ ≤ π/2] = 3/4
Exercise 10.40. Show that when X ∼ U([a, b]), EX = (a + b)/2, Var X = (b − a)²/12, and E[X²] = (b² + ab + a²)/3.

10.4.2 Gaussian Distribution
10.41. This is the most widely used model for the distribution
of a random variable. When you have many independent random
variables, a fundamental result called the central limit theorem
(CLT) (informally) says that the sum (technically, the average) of
them can often be approximated by a normal distribution.
Definition 10.42. Gaussian random variables:
(a) Often called normal random variables because they occur so
frequently in practice.
(b) In MATLAB, use X = random(’Normal’,m,σ) or X = σ*randn
+ m.
(c) fX(x) = (1/(√(2π) σ)) e^{−(1/2)((x−m)/σ)²}.
• In Excel, use NORMDIST(x,m,σ,FALSE).
In MATLAB, use normpdf(x,m,σ) or pdf(’Normal’,x,m,σ).
• Figure 17 displays the famous bell-shaped graph of the
Gaussian pdf. This curve is also called the normal curve.
(d) FX(x) has no closed-form expression. However, see 10.48.
• In MATLAB, use normcdf(x,m,σ) or cdf(’Normal’,x,m,σ).
• In Excel, use NORMDIST(x,m,σ,TRUE).
(e) We write X ∼ N(m, σ²).
Figure 17: The pdf and cdf of N(µ, σ²). [16, Fig. 3.6]

10.43. EX = m and Var X = σ².

10.44. Important probabilities:
P[|X − µ| < σ] = 0.6827;    P[|X − µ| > σ] = 0.3173;
P[|X − µ| > 2σ] = 0.0455;   P[|X − µ| < 2σ] = 0.9545.
These values are illustrated in Figure 20.

Example 10.45. Figure 21 compares several deviation scores and the normal distribution:
(a) Standard scores have a mean of zero and a standard deviation of 1.0.
(b) Scholastic Aptitude Test scores have a mean of 500 and a standard deviation of 100.

Figure 18: Electrical activity of a skeletal muscle: (a) A sample skeletal muscle (emg) signal, and (b) its histogram and pdf fits. [16, Fig. 3.14]

Figure 19: Plots of the zero-mean Gaussian pdf for different values of standard deviation, σX. [16, Fig. 3.15]

Figure 20: Probability density function of X ∼ N(µ, σ²).

Figure 21: Comparison of Several Deviation Scores and the Normal Distribution.
(c) Binet Intelligence Scale43 scores have a mean of 100 and a
standard deviation of 16.
In each case there are 34 percent of the scores between the
mean and one standard deviation, 14 percent between one and
two standard deviations, and 2 percent beyond two standard
deviations. [Source: Beck, Applying Psychology: Critical and
Creative Thinking.]
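A quick MATLAB check of the probabilities in 10.44 (a sketch; normcdf is in the Statistics Toolbox):

```matlab
% P[|X - mu| < k*sigma] depends only on k, so Phi = normcdf suffices.
k = [1 2];
inside  = normcdf(k) - normcdf(-k)   % [0.6827  0.9545]
outside = 1 - inside                 % [0.3173  0.0455]
```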
10.46. N (0, 1) is the standard Gaussian (normal) distribution.
• In Excel, use NORMSINV(RAND()).
In MATLAB, use randn.
• The standard normal cdf is denoted by Φ(z).
◦ It inherits all properties of cdf.
◦ Moreover, note that Φ(−z) = 1 − Φ(z).
10.47. Relationship between N (0, 1) and N (m, σ 2 ).
(a) An arbitrary Gaussian random variable with mean m and
variance σ 2 can be represented as σZ +m, where Z ∼ N (0, 1).
Footnote 43: Alfred Binet, who devised the first general aptitude test at the beginning of the 20th
century, defined intelligence as the ability to make adaptations. The general purpose of the
test was to determine which children in Paris could benefit from school. Binet’s test, like its
subsequent revisions, consists of a series of progressively more difficult tasks that children of
different ages can successfully complete. A child who can solve problems typically solved by
children at a particular age level is said to have that mental age. For example, if a child can
successfully do the same tasks that an average 8-year-old can do, he or she is said to have a
mental age of 8. The intelligence quotient, or IQ, is defined by the formula:
IQ = 100 × (Mental Age/Chronological Age)
There has been a great deal of controversy in recent years over what intelligence tests measure.
Many of the test items depend on either language or other specific cultural experiences for
correct answers. Nevertheless, such tests can rather effectively predict school success. If
school requires language and the tests measure language ability at a particular point of time
in a child’s life, then the test is a better-than-chance predictor of school performance.
Table 3.1: The standard normal CDF Φ(z), tabulated for z = 0.00 to 2.99 in steps of 0.01.
This relationship can be used to generate a general Gaussian RV from a standard Gaussian RV.
(b) If X ∼ N(m, σ²), the random variable Z = (X − m)/σ is a standard normal random variable. That is, Z ∼ N(0, 1).
• Creating a new random variable by this transformation
is referred to as standardizing.
• The standardized variable is called “standard score” or
“z-score”.
10.48. It is impossible to express the integral of a Gaussian PDF
between non-infinite limits (e.g., (20)) as a function that appears
on most scientific calculators.
• An old but still popular technique to find integrals of the
Gaussian PDF is to refer to tables that have been obtained
by numerical integration.
◦ One such table is the table that lists Φ(z) for many values
of positive z.
◦ For X ∼ N(m, σ²), we can show that the CDF of X can be calculated by
FX(x) = Φ((x − m)/σ).
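In MATLAB this relationship is what normcdf evaluates; a small sketch (Statistics Toolbox assumed):

```matlab
% F_X(x) = Phi((x - m)/sigma): both calls below return the same number.
m = 1;  sigma = sqrt(2);  x = 2;
FX_direct  = normcdf(x, m, sigma)    % cdf of N(m, sigma^2) evaluated at x
FX_via_Phi = normcdf((x - m)/sigma)  % standard normal cdf Phi at (x-m)/sigma
```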
Example 10.49. Suppose Z ∼ N (0, 1). Evaluate the following
probabilities.
(a) P [−1 ≤ Z ≤ 1]
(b) P [−2 ≤ Z ≤ 2]
Example 10.50. Suppose X ∼ N (1, 2). Find P [1 ≤ X ≤ 2].
10.51. Q-function: Q(z) = ∫_z^∞ (1/√(2π)) e^{−x²/2} dx corresponds to P[X > z] where X ∼ N(0, 1); that is, Q(z) is the probability of the “tail” of N(0, 1). The Q function is then a complementary cdf (ccdf).
Figure 22: Q-function
(a) Q is a decreasing function with Q(0) = 1/2.
(b) Q (−z) = 1 − Q (z) = Φ(z)
10.52. Error function (MATLAB): erf(z) = (2/√π) ∫_0^z e^{−x²} dx = 1 − 2Q(√2 z).
(a) It is an odd function of z.
(b) For z ≥ 0, it corresponds to P[|X| < z] where X ∼ N(0, 1/2).
(c) lim_{z→∞} erf(z) = 1.
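Base MATLAB has no built-in Q, but the relations above give one via erfc (a sketch; the Q ↔ erfc identity is standard, and qfunc also exists if the Communications Toolbox is available):

```matlab
% Q(z) = 0.5*erfc(z/sqrt(2)); erfc is built into base MATLAB.
Q = @(z) 0.5*erfc(z/sqrt(2));
Q(0)                    % 0.5, consistent with 10.51
Q(1) + Q(-1)            % 1, since Q(-z) = 1 - Q(z)
1 - 2*Q(sqrt(2)*1)      % equals erf(1), the identity in 10.52
erf(1)
```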
Table 3.2: The standard normal complementary CDF Q(z), tabulated for z = 3.00 to 4.99 in steps of 0.01.
(d) erf(−z) = −erf(z).
(e) Φ(x) = (1/2)[1 + erf(x/√2)] = (1/2) erfc(−x/√2).
(f) The complementary error function: erfc(z) = 1 − erf(z) = 2Q(√2 z) = (2/√π) ∫_z^∞ e^{−x²} dx.

Figure 23: erf-function and Q-function.
10.4.3 Exponential Distribution
Definition 10.53. The exponential distribution is denoted by
E (λ).
(a) λ > 0 is a parameter of the distribution, often called the rate
parameter.
(b) Characterized by
• fX(x) = λe^{−λx} for x > 0, and 0 for x ≤ 0.
• FX(x) = 1 − e^{−λx} for x > 0, and 0 for x ≤ 0.
• Survival-, survivor-, or reliability-function:
(c) MATLAB:
• X = exprnd(1/λ) or random(’exp’,1/λ)
• fX (x) = exppdf(x,1/λ) or pdf(’exp’,x,1/λ)
• FX (x) = expcdf(x,1/λ) or cdf(’exp’,x,1/λ)
Example 10.54. Suppose X ∼ E(λ), find P [1 < X < 2].
Exercise 10.55. Exponential random variable as a continuous
version of geometric random variable: Suppose X ∼ E (λ). Show
that ⌊X⌋ ∼ G0(e^{−λ}) and ⌈X⌉ ∼ G1(e^{−λ}).
Example 10.56. The exponential distribution is intimately related to the Poisson process. It is often used as a probability
model for the (waiting) time until a “rare” event occurs.
• time elapsed until the next earthquake in a certain region
• decay time of a radioactive particle
• time between independent events such as arrivals at a service
facility or arrivals of customers in a shop.
• duration of a cell-phone call
• time it takes a computer network to transmit a message from
one node to another.
10.57. EX = 1/λ.
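For reference, here is a short derivation of 10.57 (a sketch using (21), Definition 10.23, and integration by parts):
EX = ∫_0^∞ x λe^{−λx} dx = [−x e^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 0 + 1/λ = 1/λ.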
Example 10.58. Phone Company A charges $0.15 per minute
for telephone calls. For any fraction of a minute at the end of
a call, they charge for a full minute. Phone Company B also
charges $0.15 per minute. However, Phone Company B calculates
its charge based on the exact duration of a call. If T , the duration
of a call in minutes, is exponential with parameter λ = 1/3, what
are the expected revenues per call E [RA ] and E [RB ] for companies
A and B?
Solution: First, note that ET = 1/λ = 3. Hence,
  E[RB] = E[0.15 × T] = 0.15 ET = $0.45,
and
  E[RA] = E[0.15 × ⌈T⌉] = 0.15 E⌈T⌉.
Now, recall, from Exercise 10.55, that ⌈T⌉ ∼ G1(e^(−λ)). Hence, E⌈T⌉ = 1/(1 − e^(−λ)) ≈ 3.53. Therefore,
  E[RA] = 0.15 E⌈T⌉ ≈ $0.5292.
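These two expected revenues can also be double-checked by simulation; the sketch below is only an illustration (the sample size 10^6 is arbitrary):
  lambda = 1/3;
  T = exprnd(1/lambda, 1e6, 1);   % simulated call durations (minutes)
  R_B = 0.15*mean(T)              % should be close to 0.45
  R_A = 0.15*mean(ceil(T))        % should be close to 0.5292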
10.59. Memoryless property : The exponential r.v. is the only
continuous44 r.v. on [0, ∞) that satisfies the memoryless property:
P [X > s + x |X > s] = P [X > x]
for all x > 0 and all s > 0 [18, p. 157–159]. In words, the future
is independent of the past. The fact that it hasn’t happened yet,
tells us nothing about how much longer it will take before it does
happen.
• Imagining that the exponentially distributed random variable
X represents the lifetime of an item, the residual life of an item
has the same exponential distribution as the original lifetime,
44
For discrete random variable, geometric random variables satisfy the memoryless property.
140
regardless of how long the item has been already in use. In
other words, there is no deterioration/degradation over time.
If it is still currently working after 20 years of use, then today,
its condition is “just like new”.
• In particular, suppose we define the set B+x to be {x + b : b ∈ B}.
For any x > 0 and set B ⊂ [0, ∞), we have
P [X ∈ B + x|X > x] = P [X ∈ B]
because
P[X ∈ B + x | X > x] = P[X ∈ B + x] / P[X > x] = (∫_(B+x) λe^(−λt) dt) / e^(−λx)   (substituting τ = t − x)
= (∫_B λe^(−λ(τ+x)) dτ) / e^(−λx) = ∫_B λe^(−λτ) dτ = P[X ∈ B].
10.60. Summary:
• Uniform U(a, b): support SX = (a, b); fX(x) = 1/(b − a) for a < x < b, and 0 otherwise.
• Normal (Gaussian) N(m, σ²): support SX = R; fX(x) = (1/(√(2π) σ)) e^(−(1/2)((x−m)/σ)²).
• Exponential E(λ): support SX = (0, ∞); fX(x) = λe^(−λx) for x > 0, and 0 for x ≤ 0.
Table 4: Examples of probability density functions. Here, λ, σ > 0.
141
10.5
Function of Continuous Random Variables: SISO
Reconsider the derived random variable Y = g(X).
Recall that we can find EY easily by (22):
  EY = E[g(X)] = ∫_R g(x) fX(x) dx.
However, there are cases when we have to evaluate probability directly involving the random variable Y or find fY(y) directly.
Recall that for discrete random variables, it is easy to find pY(y) by adding all pX(x) over all x such that g(x) = y:
  pY(y) = Σ_(x: g(x)=y) pX(x).    (23)
For continuous random variables, it turns out that we can’t45 simply integrate the pdf of X to get the pdf of Y .
10.61. For Y = g(X), if you want to find fY (y), the following
two-step procedure will always work and is easy to remember:
(a) Find the cdf FY (y) = P [Y ≤ y].
(b) Compute the pdf from the cdf by “finding the derivative”:
  fY(y) = (d/dy) FY(y)  (as described in 10.13).
10.62. Linear Transformation: Suppose Y = aX + b. Then,
the cdf of Y is given by
  FY(y) = P[Y ≤ y] = P[aX + b ≤ y] = P[X ≤ (y − b)/a] if a > 0, and P[X ≥ (y − b)/a] if a < 0.
Now, by definition, we know that
  P[X ≤ (y − b)/a] = FX((y − b)/a),
and
  P[X ≥ (y − b)/a] = P[X > (y − b)/a] + P[X = (y − b)/a] = 1 − FX((y − b)/a) + P[X = (y − b)/a].
For a continuous random variable, P[X = (y − b)/a] = 0. Hence,
  FY(y) = FX((y − b)/a) for a > 0, and FY(y) = 1 − FX((y − b)/a) for a < 0.
Finally, the fundamental theorem of calculus and the chain rule give
  fY(y) = (d/dy) FY(y) = (1/a) fX((y − b)/a) for a > 0, and −(1/a) fX((y − b)/a) for a < 0.
Note that we can further simplify the final formula by using the | · | function:
  fY(y) = (1/|a|) fX((y − b)/a),   a ≠ 0.    (24)
45 When you apply Equation (23) to continuous random variables, what you get is 0 = 0, which is true but neither interesting nor useful.
Graphically, to get the plots of fY , we compress fX horizontally
by a factor of a, scale it vertically by a factor of 1/|a|, and shift it
to the right by b.
Of course, if a = 0, then we get the uninteresting degenerated
random variable Y ≡ b.
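Formula (24) is also easy to check numerically. The sketch below is one possible check (the choices a = −2, b = 1, and X ∼ E(1) are arbitrary; exact plotting commands may vary between MATLAB versions):
  a = -2; b = 1; lambda = 1;                    % arbitrary test case
  X = exprnd(1/lambda, 1e6, 1);                 % samples of X
  Y = a*X + b;                                  % samples of Y = aX + b
  t = linspace(-10, 1, 200);
  fY = (1/abs(a))*exppdf((t-b)/a, 1/lambda);    % formula (24)
  histogram(Y, 'Normalization', 'pdf'); hold on
  plot(t, fY, 'r', 'LineWidth', 2)              % the histogram and the curve should match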
Example 10.63. Suppose X ∼ E(λ). Let Y = 5X. Find fY (y).
143
10.64. Suppose X ∼ N (m, σ 2 ) and Y = aX+b for some constants
a and b. Then, we can use (24) to show that Y ∼ N (am + b, a2 σ 2 ).
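Sketch of that calculation, filling in the step that (24) gives: with fX(x) = (1/(√(2π) σ)) e^(−(x−m)²/(2σ²)),
  fY(y) = (1/|a|) fX((y − b)/a) = (1/(√(2π) |a| σ)) exp(−((y − b)/a − m)² / (2σ²)) = (1/(√(2π) |a| σ)) exp(−(y − (am + b))² / (2a²σ²)),
which is exactly the N(am + b, a²σ²) density.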
Example 10.65. Amplitude modulation in certain communication systems can be accomplished using various nonlinear devices
such as a semiconductor diode. Suppose we model the nonlinear
device by the function Y = X 2 . If the input X is a continuous
random variable, find the density of the output Y = X 2 .
Example 10.66. Suppose X ∼ E(λ). Let Y = 1/X². Find fY(y).
Exercise 10.67 (F2011). Suppose X is uniformly distributed on the interval (1, 2). (X ∼ U(1, 2).) Let Y = 1/X².
(a) Find fY (y).
(b) Find EY .
Exercise 10.68 (F2011). Consider the function g(x) = x for x ≥ 0, and g(x) = −x for x < 0.
Suppose Y = g(X), where X ∼ U(−2, 2).
Remark: The function g operates like a full-wave rectifier in
that if a positive input voltage X is applied, the output is Y = X,
while if a negative input voltage X is applied, the output is Y =
−X.
(a) Find EY .
(b) Plot the cdf of Y .
(c) Find the pdf of Y
145
• P[X ∈ B]: discrete Σ_(x∈B) pX(x); continuous ∫_B fX(x) dx.
• P[X = x]: discrete pX(x) = F(x) − F(x⁻); continuous 0.
• Interval probabilities: for a discrete random variable,
  P_X((a,b]) = F(b) − F(a),  P_X([a,b]) = F(b) − F(a⁻),  P_X([a,b)) = F(b⁻) − F(a⁻),  P_X((a,b)) = F(b⁻) − F(a);
  for a continuous random variable,
  P_X((a,b]) = P_X([a,b]) = P_X([a,b)) = P_X((a,b)) = ∫_a^b fX(x) dx = F(b) − F(a).
• EX: discrete Σ_x x pX(x); continuous ∫_(−∞)^(+∞) x fX(x) dx.
• For Y = g(X): discrete pY(y) = Σ_(x: g(x)=y) pX(x); continuous fY(y) = (d/dy) P[g(X) ≤ y].
  Alternatively, fY(y) = Σ_k fX(x_k)/|g′(x_k)|, where the x_k are the real-valued roots of the equation y = g(x).
• For Y = g(X), P[Y ∈ B]: discrete Σ_(x: g(x)∈B) pX(x); continuous ∫_(x: g(x)∈B) fX(x) dx.
• E[g(X)]: discrete Σ_x g(x) pX(x); continuous ∫_(−∞)^(+∞) g(x) fX(x) dx.
• E[X²]: discrete Σ_x x² pX(x); continuous ∫_(−∞)^(+∞) x² fX(x) dx.
• Var X: discrete Σ_x (x − EX)² pX(x); continuous ∫_(−∞)^(+∞) (x − EX)² fX(x) dx.
Table 5: Important Formulas for Discrete and Continuous Random Variables
146
Sirindhorn International Institute of Technology
Thammasat University
School of Information, Computer and Communication Technology
ECS315 2014/1 Part V.1 Dr.Prapun
11
Multiple Random Variables
One is often interested not only in individual random variables, but
also in relationships between two or more random variables. Furthermore, one often wishes to make inferences about one random
variable on the basis of observations of other random variables.
Example 11.1. If the experiment is the testing of a new medicine,
the researcher might be interested in cholesterol level, blood pressure, and the glucose level of a test person.
11.1
A Pair of Discrete Random Variables
In this section, we consider two discrete random variables, say X
and Y , simultaneously.
11.2. The analysis is different from Section 9.2 in two main
aspects. First, there may be no deterministic relationship (such as
Y = g(X)) between the two random variables. Second, we want
to look at both random variables as a whole, not just X alone or
Y alone.
Example 11.3. Communication engineers may be interested in
the input X and output Y of a communication channel.
149
Example 11.4. Of course, to rigorously define (any) random variables, we need to go back to the sample space Ω. Recall Example
7.4 where we considered several random variables defined on the
sample space Ω = {1, 2, 3, 4, 5, 6} where the outcomes are equally
likely. In that example, we define X(ω) = ω and Y (ω) = (ω − 3)2 .
Example 11.5. Consider the scores of 20 students below:
10, 9, 10, 9, 9, 10, 9, 10, 10, 9  (Room #1),   1, 3, 4, 6, 5, 5, 3, 3, 1, 3  (Room #2).
The first ten scores are from (ten) students in room #1. The last ten scores are from (ten) students in room #2.
Suppose we have a score report card for each student. Then,
in total, we have 20 report cards.
Figure 24: In Example 11.5, we pick a report card randomly from a pile of
cards.
I pick one report card up randomly. Let X be the score on that
card.
• What is the chance that X > 5? (Ans: P [X > 5] = 11/20.)
150
• What is the chance that X = 10? (Ans: pX (10) = P [X = 10] =
5/20 = 1/4.)
Now, let the random variable Y denote the room# of the student
whose report card is picked up.
• What is the probability that X = 10 and Y = 2?
• What is the probability that X = 10 and Y = 1?
• What is the probability that X > 5 and Y = 1?
• What is the probability that X > 5 and Y = 2?
Now suppose someone informs me that the report card which I
picked up is from a student in room #1. (He may be able to tell
this by the color of the report card of which I have no knowledge.)
I now have an extra information that Y = 1.
• What is the probability that X > 5 given that Y = 1?
• What is the probability that X = 10 given that Y = 1?
151
11.6. Recall that, in probability, “,” means “and”. For example,
P [X = x, Y = y] = P [X = x and Y = y]
and
P [3 ≤ X < 4, Y < 1] = P [3 ≤ X < 4 and Y < 1]
= P [X ∈ [3, 4) and Y ∈ (−∞, 1)] .
In general, the event
[“Some condition(s) on X”,“Some condition(s) on Y ”]
is the same as the intersection of two events:
[“Some condition(s) on X”] ∩ [“Some condition(s) on Y ”]
which simply means both statements happen.
More technically,
[X ∈ B, Y ∈ C] = [X ∈ B and Y ∈ C] = [X ∈ B] ∩ [Y ∈ C]
and
P [X ∈ B, Y ∈ C] = P [X ∈ B and Y ∈ C]
= P ([X ∈ B] ∩ [Y ∈ C]) .
Remark: Linking back to the original sample space, this shorthand actually says
[X ∈ B, Y ∈ C] = [X ∈ B and Y ∈ C]
               = {ω ∈ Ω : X(ω) ∈ B and Y(ω) ∈ C}
               = {ω ∈ Ω : X(ω) ∈ B} ∩ {ω ∈ Ω : Y(ω) ∈ C}
               = [X ∈ B] ∩ [Y ∈ C].
152
11.7. The concept of conditional probability can be straightforwardly applied to discrete random variables. For example,
P [“Some condition(s) on X” | “Some condition(s) on Y ”] (25)
is the conditional probability P (A|B) where
A = [“Some condition(s) on X”] and
B = [“Some condition(s) on Y ”].
Recall that P(A|B) = P(A ∩ B)/P(B). Therefore,
  P[X = x | Y = y] = P[X = x and Y = y] / P[Y = y],
and
  P[3 ≤ X < 4 | Y < 1] = P[3 ≤ X < 4 and Y < 1] / P[Y < 1].
More generally, (25) is
  P([“Some condition(s) on X”] ∩ [“Some condition(s) on Y ”]) / P([“Some condition(s) on Y ”])
  = P([“Some condition(s) on X”, “Some condition(s) on Y ”]) / P([“Some condition(s) on Y ”])
  = P[“Some condition(s) on X”, “Some condition(s) on Y ”] / P[“Some condition(s) on Y ”].
More technically,
  P[X ∈ B | Y ∈ C] = P([X ∈ B] | [Y ∈ C]) = P([X ∈ B] ∩ [Y ∈ C]) / P([Y ∈ C]) = P[X ∈ B, Y ∈ C] / P[Y ∈ C].
Definition 11.8. Joint pmf : If X and Y are two discrete random variables (defined on a same sample space with probability
measure P ), the function pX,Y (x, y) defined by
pX,Y (x, y) = P [X = x, Y = y]
is called the joint probability mass function of X and Y .
(a) We can visualize the joint pmf via stem plot. See Figure 25.
(b) To evaluate the probability for a statement that involves both
X and Y random variables:
We first find all pairs (x, y) that satisfy the condition(s) in
the statement, and then add up all the corresponding values
from the joint pmf .
More technically, we can then evaluate P [(X, Y ) ∈ R] by
  P[(X, Y) ∈ R] = Σ_((x,y): (x,y)∈R) pX,Y(x, y).
Example 11.9 (F2011). Consider random variables X and Y
whose joint pmf is given by
  pX,Y(x, y) = c(x + y) for x ∈ {1, 3} and y ∈ {2, 4}, and 0 otherwise.
(a) Check that c = 1/20.
(b) Find P[X² + Y² = 13].
In most situations, it is much more convenient to focus on the
“important” part of the joint pmf. To do this, we usually present
the joint pmf (and the conditional pmf) in their matrix forms:
154
Definition 11.10. When both X and Y take finitely many values (both have finite supports), say SX = {x1, . . . , xm} and SY = {y1, . . . , yn}, respectively, we can arrange the probabilities pX,Y(xi, yj) in an m × n matrix
  [ pX,Y(x1, y1)  pX,Y(x1, y2)  · · ·  pX,Y(x1, yn)
    pX,Y(x2, y1)  pX,Y(x2, y2)  · · ·  pX,Y(x2, yn)
      ...            ...                 ...
    pX,Y(xm, y1)  pX,Y(xm, y2)  · · ·  pX,Y(xm, yn) ]    (26)
• We shall call this matrix the joint pmf matrix.
• The sum of all the entries in the matrix is one.

Figure 25: Example of the plot of a joint pmf. [9, Fig. 2.8]

• pX,Y(x, y) = 0 if x ∉ SX or y ∉ SY.46 In other words, we don't have to consider the x and y outside the supports of X and Y, respectively.
46 To see this, note that pX,Y(x, y) cannot exceed pX(x) because P(A ∩ B) ≤ P(A). Now, suppose at x = a, we have pX(a) = 0. Then pX,Y(a, y) must also be 0 for any y because it cannot exceed pX(a) = 0. Similarly, suppose at y = a, we have pY(a) = 0. Then pX,Y(x, a) = 0 for any x.
155
11.11. From the joint pmf, we can find pX(x) and pY(y) by
  pX(x) = Σ_y pX,Y(x, y)    (27)
  pY(y) = Σ_x pX,Y(x, y)    (28)
In this setting, pX(x) and pY(y) are called the marginal pmfs (to distinguish them from the joint one).
(a) Suppose we have the joint pmf matrix in (26). Then, the sum of the entries in the ith row is47 pX(xi), and the sum of the entries in the jth column is pY(yj):
  pX(xi) = Σ_(j=1)^(n) pX,Y(xi, yj)  and  pY(yj) = Σ_(i=1)^(m) pX,Y(xi, yj).
(b) In MATLAB, suppose we save the joint pmf matrix as P XY, then
the marginal pmf (row) vectors p X and p Y can be found by
p_X = (sum(P_XY,2))’
p_Y = (sum(P_XY,1))
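For instance, with a made-up joint pmf matrix (the numbers below are only for illustration):
  P_XY = [0.1 0.2 0.1;
          0.3 0.2 0.1];    % rows <-> values of X, columns <-> values of Y
  p_X = (sum(P_XY,2))'     % returns [0.4 0.6]
  p_Y = (sum(P_XY,1))      % returns [0.4 0.4 0.2]
  sum(P_XY(:))             % returns 1, as it should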
Example 11.12. Consider the following joint pmf matrix
47 To see this, we consider A = [X = xi] and a collection defined by Bj = [Y = yj] and B0 = [Y ∉ SY]. Note that the collection B0, B1, . . . , Bn partitions Ω. So, P(A) = Σ_(j=0)^(n) P(A ∩ Bj). Of course, because the support of Y is SY, we have P(A ∩ B0) = 0. Hence, the sum can start at j = 1 instead of j = 0.
156
Definition 11.13. The conditional pmf of X given Y is defined
as
pX|Y (x|y) = P [X = x|Y = y]
which gives
pX,Y (x, y) = pX|Y (x|y)pY (y) = pY |X (y|x)pX (x).
(29)
11.14. Equation (29) is quite important in practice. In most
cases, systems are naturally defined/given/studied in terms of their
conditional probabilities, say pY |X (y|x). Therefore, it is important
that we know how to construct the joint pmf from the conditional
pmf.
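In MATLAB, one convenient way to carry out (29) is to scale the rows of the conditional pmf matrix by the corresponding marginal probabilities. The sketch below uses made-up numbers (they are not taken from any example in these notes); Q(i,j) stores pY|X(yj|xi):
  p_X = [0.6 0.4];       % marginal pmf of X; entry i corresponds to value x_i
  Q   = [0.8 0.2;
         0.2 0.8];       % Q(i,j) = p_{Y|X}(y_j|x_i); each row sums to 1
  P_XY = diag(p_X)*Q     % P_XY(i,j) = p_X(x_i)*p_{Y|X}(y_j|x_i) = p_{X,Y}(x_i,y_j)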
Example 11.15. Consider a binary symmetric channel. Suppose
the input X to the channel is Bernoulli(0.3). At the output Y of
this channel, the crossover (bit-flipped) probability is 0.1. Find
the joint pmf pX,Y (x, y) of X and Y .
Exercise 11.16. Toss-and-Roll Game:
Step 1 Toss a fair coin. Define X by X = 1 if the result is H, and X = 0 if the result is T.
Step 2 You have two dice, Dice 1 and Dice 2. Dice 1 is fair. Dice 2 is unfair with p(1) = p(2) = p(3) = 2/9 and p(4) = p(5) = p(6) = 1/9.
(i) If X = 0, roll Dice 1.
(ii) If X = 1, roll Dice 2.
157
Record the result as Y .
Find the joint pmf pX,Y (x, y) of X and Y .
Exercise 11.17 (F2011). Continue from Example 11.9. Random
variables X and Y have the following joint pmf
  pX,Y(x, y) = c(x + y) for x ∈ {1, 3} and y ∈ {2, 4}, and 0 otherwise.
(a) Find pX(x).
(b) Find EX.
(c) Find pY|X(y|1). Note that your answer should be of the form
  pY|X(y|1) = ? for y = 2,  ? for y = 4,  and 0 otherwise.
(d) Find pY|X(y|3).
Definition 11.18. The joint cdf of X and Y is defined by
FX,Y (x, y) = P [X ≤ x, Y ≤ y] .
158
Definition 11.19. Two random variables X and Y are said to be
identically distributed if, for every B, P [X ∈ B] = P [Y ∈ B].
Example 11.20. Let X ∼ Bernoulli(1/2). Let Y = X and
Z = 1 − X. Then, all of these random variables are identically
distributed.
11.21. The following statements are equivalent:
(a) Random variables X and Y are identically distributed .
(b) For every B, P [X ∈ B] = P [Y ∈ B]
(c) pX (c) = pY (c) for all c
(d) FX (c) = FY (c) for all c
Definition 11.22. Two random variables X and Y are said to be
independent if the events [X ∈ B] and [Y ∈ C] are independent
for all sets B and C.
11.23. The following statements are equivalent:
(a) Random variables X and Y are independent.
(b) [X ∈ B] ⫫ [Y ∈ C] for all B, C.
(c) P [X ∈ B, Y ∈ C] = P [X ∈ B] × P [Y ∈ C] for all B, C.
(d) pX,Y (x, y) = pX (x) × pY (y) for all x, y.
(e) FX,Y (x, y) = FX (x) × FY (y) for all x, y.
Definition 11.24. Two random variables X and Y are said to be
independent and identically distributed (i.i.d.) if X and
Y are both independent and identically distributed.
11.25. Being identically distributed does not imply independence.
Similarly, being independent, does not imply being identically distributed.
159
Example 11.26. Roll a dice. Let X be the result. Set Y = X.
Example 11.27. Suppose the pmf of a random variable X is given
by
  pX(x) = 1/4 for x = 3,  α for x = 4,  and 0 otherwise.
Let Y be another random variable. Assume that X and Y are
i.i.d.
Find
(a) α,
(b) the pmf of Y , and
(c) the joint pmf of X and Y .
160
Example 11.28. Consider a pair of random variables X and Y
whose joint pmf is given by
  pX,Y(x, y) = 1/15 for (x, y) = (3, 1),  2/15 for (4, 1),  4/15 for (3, 3),  β for (4, 3),  and 0 otherwise.
(a) Are X and Y identically distributed?
(b) Are X and Y independent?
161
11.2
Extending the Definitions to Multiple RVs
Definition 11.29. Joint pmf:
pX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = P [X1 = x1 , X2 = x2 , . . . , Xn = xn ] .
Joint cdf:
FX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = P [X1 ≤ x1 , X2 ≤ x2 , . . . , Xn ≤ xn ] .
11.30. Marginal pmf:
Definition 11.31. Identically distributed random variables:
The following statements are equivalent.
(a) Random variables X1 , X2 , . . . are identically distributed
(b) For every B, P [Xj ∈ B] does not depend on j.
(c) pXi (c) = pXj (c) for all c, i, j.
(d) FXi (c) = FXj (c) for all c, i, j.
Definition 11.32. Independence among finite number of random variables: The following statements are equivalent.
(a) X1 , X2 , . . . , Xn are independent
(b) [X1 ∈ B1 ], [X2 ∈ B2 ], . . . , [Xn ∈ Bn ] are independent, for all
B1 , B2 , . . . , Bn .
(c) P[Xi ∈ Bi, ∀i] = ∏_(i=1)^(n) P[Xi ∈ Bi], for all B1, B2, . . . , Bn.
(d) pX1,X2,...,Xn(x1, x2, . . . , xn) = ∏_(i=1)^(n) pXi(xi) for all x1, x2, . . . , xn.
(e) FX1,X2,...,Xn(x1, x2, . . . , xn) = ∏_(i=1)^(n) FXi(xi) for all x1, x2, . . . , xn.
Example 11.33. Toss a coin n times. For the ith toss, let
Xi = 1 if H happens on the ith toss, and Xi = 0 if T happens on the ith toss.
We then have a collection of i.i.d. random variables X1 , X2 , X3 , . . . , Xn .
162
Example 11.34. Roll a dice n times. Let Ni be the result of the
ith roll. We then have another collection of i.i.d. random variables
N1 , N2 , N3 , . . . , Nn .
Example 11.35. Let X1 be the result of tossing a coin. Set X2 =
X3 = · · · = Xn = X1 .
11.36. If X1 , X2 , . . . , Xn are independent, then so is any subcollection of them.
11.37. For i.i.d. Xi ∼ Bernoulli(p), Y = X1 + X2 + · · · + Xn is
B(n, p).
Definition 11.38. A pairwise independent collection of random variables is a collection of random variables any two of which
are independent.
(a) Any collection of (mutually) independent random variables is
pairwise independent
(b) Some pairwise independent collections are not independent.
See Example (11.39).
Example 11.39. Suppose X, Y, and Z have the following joint probability distribution: pX,Y,Z(x, y, z) = 1/4 for (x, y, z) ∈ {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}. This, for example, can be constructed by starting with independent X and Y that are Bernoulli(1/2). Then set Z = X ⊕ Y = X + Y mod 2.
(a) X, Y, Z are pairwise independent.
(b) X, Y, Z are not independent.
163
Sirindhorn International Institute of Technology
Thammasat University
School of Information, Computer and Communication Technology
ECS315 2014/1 Part V.2 Dr.Prapun
11.3
Function of Discrete Random Variables
11.40. Recall that for a discrete random variable X, the pmf of a derived random variable Y = g(X) is given by
  pY(y) = Σ_(x: g(x)=y) pX(x).
Similarly, for discrete random variables X and Y, the pmf of a derived random variable Z = g(X, Y) is given by
  pZ(z) = Σ_((x,y): g(x,y)=z) pX,Y(x, y).
Example 11.41. Suppose the joint pmf of X and Y is given by
  pX,Y(x, y) = 1/15 for (x, y) = (0, 0),  2/15 for (1, 0),  4/15 for (0, 1),  8/15 for (1, 1),  and 0 otherwise.
Let Z = X + Y. Find the pmf of Z.
Exercise 11.42 (F2011). Continue from Example 11.9. Let Z = X + Y.
(a) Find the pmf of Z.
(b) Find EZ.
11.43. In general, when Z = X + Y,
  pZ(z) = Σ_((x,y): x+y=z) pX,Y(x, y) = Σ_y pX,Y(z − y, y) = Σ_x pX,Y(x, z − x).
Furthermore, if X and Y are independent,
  pZ(z) = Σ_((x,y): x+y=z) pX(x) pY(y)    (30)
        = Σ_y pX(z − y) pY(y) = Σ_x pX(x) pY(z − x).    (31)
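When X and Y are independent and take values in {0, 1, 2, . . .}, (31) is a discrete convolution, so MATLAB's conv can produce the whole pmf of Z at once. A small sketch with made-up pmfs:
  p_X = [0.5 0.5];          % pmf of X on {0,1}
  p_Y = [0.25 0.5 0.25];    % pmf of Y on {0,1,2}
  p_Z = conv(p_X, p_Y)      % pmf of Z = X + Y on {0,1,2,3}; returns [0.125 0.375 0.375 0.125]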
Example 11.44. Suppose Λ1 ∼ P(α1 ) and Λ2 ∼ P(α2 ) are independent. Let Λ = Λ1 +Λ2 . Use (31) to show48 that Λ ∼ P(α1 +α2 ).
First, note that pΛ(ℓ) would be positive only on nonnegative integers because a sum of nonnegative integers (Λ1 and Λ2) is still a nonnegative integer. So, the support of Λ is the same as the support of Λ1 and Λ2. Now, we know, from (31), that
  P[Λ = ℓ] = P[Λ1 + Λ2 = ℓ] = Σ_i P[Λ1 = i] P[Λ2 = ℓ − i].
Of course, we are interested in ℓ that is a nonnegative integer. The summation runs over i = 0, 1, 2, . . .. Other values of i would make P[Λ1 = i] = 0. Note also that if i > ℓ, then ℓ − i < 0 and P[Λ2 = ℓ − i] = 0. Hence, we conclude that the index i can only
48
Remark: You may feel that simplifying the sum in this example (and in Exercise 11.45) is difficult and tedious. In Section 13, we will introduce another technique which will make the answer obvious. The idea is to realize that (31) is a convolution and hence we can use the Fourier transform to work with a product in another domain.
165
be integers from 0 to ℓ:
  P[Λ = ℓ] = Σ_(i=0)^(ℓ) e^(−α1) (α1^i / i!) · e^(−α2) (α2^(ℓ−i) / (ℓ − i)!)
           = e^(−(α1+α2)) (1/ℓ!) Σ_(i=0)^(ℓ) (ℓ! / (i! (ℓ − i)!)) α1^i α2^(ℓ−i)
           = e^(−(α1+α2)) (α1 + α2)^ℓ / ℓ!,
where the last equality is from the binomial theorem. Hence, the sum of two independent Poisson random variables is still Poisson!
  pΛ(ℓ) = e^(−(α1+α2)) (α1 + α2)^ℓ / ℓ!  for ℓ ∈ {0, 1, 2, . . .},  and 0 otherwise.
Exercise 11.45. Suppose B1 ∼ B(n1 , p) and B2 ∼ B(n2 , p) are
independent. Let B = B1 + B2 . Use (31) to show that B ∼
B(n1 + n2 , p).
11.4
Expectation of Function of Discrete Random Variables
11.46. Recall that the expected value of “any” function g of a discrete random variable X can be calculated from
  E[g(X)] = Σ_x g(x) pX(x).
Similarly49, the expected value of “any” function g of two discrete random variables X and Y can be calculated from
  E[g(X, Y)] = Σ_x Σ_y g(x, y) pX,Y(x, y).
49
Again, these are called the law/rule of the lazy statistician (LOTUS) [22, Thm 3.6
p 48],[9, p. 149] because it is so much easier to use the above formula than to first find the
pmf of g(X) or g(X, Y ). It is also called substitution rule [21, p 271].
166
• P[X ∈ B] = Σ_(x∈B) pX(x)
• P[(X, Y) ∈ R] = Σ_((x,y): (x,y)∈R) pX,Y(x, y)
• Joint to Marginal (Law of Total Prob.): pX(x) = Σ_y pX,Y(x, y),  pY(y) = Σ_x pX,Y(x, y)
• P[X > Y] = Σ_x Σ_(y: y<x) pX,Y(x, y) = Σ_y Σ_(x: x>y) pX,Y(x, y)
• P[X = Y] = Σ_x pX,Y(x, x)
• X ⫫ Y: pX,Y(x, y) = pX(x) pY(y)
• Conditional: pX|Y(x|y) = pX,Y(x, y) / pY(y)
• E[g(X, Y)] = Σ_x Σ_y g(x, y) pX,Y(x, y)
Table 6: Joint pmf: A Summary
11.47. E [·] is a linear operator: E [aX + bY ] = aEX + bEY .
(a) Homogeneous: E [cX] = cEX
(b) Additive: E [X + Y ] = EX + EY
(c) Extension: E[Σ_(i=1)^(n) ci gi(Xi)] = Σ_(i=1)^(n) ci E[gi(Xi)].
Example 11.48. Recall from 11.37 that when i.i.d. Xi ∼ Bernoulli(p), Y = X1 + X2 + · · · + Xn is B(n, p). Also, from Example 9.4, we have EXi = p. Hence,
  EY = E[Σ_(i=1)^(n) Xi] = Σ_(i=1)^(n) E[Xi] = Σ_(i=1)^(n) p = np.
Therefore, the expectation of a binomial random variable with
parameters n and p is np.
167
Example 11.49. A binary communication link has bit-error probability p. What is the expected number of bit errors in a transmission of n bits?
Theorem 11.50 (Expectation and Independence). Two random
variables X and Y are independent if and only if
E [h(X)g(Y )] = E [h(X)] E [g(Y )]
for “all” functions h and g.
• In other words, X and Y are independent if and only if for
every pair of functions h and g, the expectation of the product
h(X)g(Y ) is equal to the product of the individual expectations.
• One special case is that
  X ⫫ Y implies E[XY] = EX × EY.    (32)
However, independence means more than this property. In other words, having E[XY] = (EX)(EY) does not necessarily imply X ⫫ Y. See Example 11.61.
11.51. Let's combine what we have just learned about independence into the definition/equivalent statements that we already have in 11.32.
The following statements are equivalent:
(a) Random variables X and Y are independent.
(b) [X ∈ B] ⫫ [Y ∈ C] for all B, C.
(c) P [X ∈ B, Y ∈ C] = P [X ∈ B] × P [Y ∈ C] for all B, C.
(d) pX,Y (x, y) = pX (x) × pY (y) for all x, y.
(e) FX,Y (x, y) = FX (x) × FY (y) for all x, y.
(f)
168
Exercise 11.52 (F2011). Suppose X and Y are i.i.d. with EX =
EY = 1 and Var X = Var Y = 2. Find Var[XY ].
11.53. To quantify the amount of dependence between two
random variables, we may calculate their mutual information.
This quantity is crucial in the study of digital communications
and information theory. However, in introductory probability class
(and introductory communication class), it is traditionally omitted.
11.5
Linear Dependence
Definition 11.54. Given two random variables X and Y , we may
calculate the following quantities:
(a) Correlation: E [XY ].
(b) Covariance: Cov [X, Y ] = E [(X − EX)(Y − EY )].
(c) Correlation coefficient: ρX,Y = Cov[X, Y] / (σX σY).
Exercise 11.55 (F2011). Continue from Exercise 11.9.
(a) Find E [XY ].
(b) Check that Cov[X, Y] = −1/25.
11.56. Cov [X, Y ] = E [(X − EX)(Y − EY )] = E [XY ] − EXEY
• Note that Var X = Cov [X, X].
11.57. Var [X + Y ] = Var X + Var Y + 2Cov [X, Y ]
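The cross term comes from expanding the square (a short derivation, included here for completeness):
  Var[X + Y] = E[((X − EX) + (Y − EY))²] = E[(X − EX)²] + E[(Y − EY)²] + 2E[(X − EX)(Y − EY)] = Var X + Var Y + 2Cov[X, Y].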
169
Definition 11.58. X and Y are said to be uncorrelated if and
only if Cov [X, Y ] = 0.
11.59. The following statements are equivalent:
(a) X and Y are uncorrelated.
(b) Cov [X, Y ] = 0.
(c) E [XY ] = EXEY .
(d)
11.60. Independence implies uncorrelatedness; that is, if X ⫫ Y, then Cov[X, Y] = 0.
The converse is not true. Uncorrelatedness does not imply independence. See Example 11.61.
Example 11.61. Let X be uniform on {±1, ±2} and Y = |X|.
11.62. The variance of the sum of uncorrelated (or independent)
random variables is the sum of their variances.
170
Exercise 11.63. Suppose two fair dice are tossed. Denote by the
random variable V1 the number appearing on the first dice and by
the random variable V2 the number appearing on the second dice.
Let X = V1 + V2 and Y = V1 − V2 .
(a) Show that X and Y are not independent.
(b) Show that E [XY ] = EXEY .
11.64. Cov [aX + b, cY + d] = acCov [X, Y ]
Cov [aX + b, cY + d] = E [((aX + b) − E [aX + b]) ((cY + d) − E [cY + d])]
= E [((aX + b) − (aEX + b)) ((cY + d) − (cEY + d))]
= E [(aX − aEX) (cY − cEY )]
= acE [(X − EX) (Y − EY )]
= acCov [X, Y ] .
Definition 11.65. Correlation coefficient:
  ρX,Y = Cov[X, Y] / (σX σY) = (E[XY] − EX EY) / (σX σY) = E[((X − EX)/σX) · ((Y − EY)/σY)].
• ρX,Y is dimensionless
• ρX,X = 1
• ρX,Y = 0 if and only if X and Y are uncorrelated.
• Cauchy-Schwartz Inequality 50 :
|ρX,Y | ≤ 1.
In other words, ρXY ∈ [−1, 1].
50
The Cauchy-Schwartz inequality shows up in many areas of mathematics. A general form of this inequality can be stated in any inner product space:
  |⟨a, b⟩|² ≤ ⟨a, a⟩ ⟨b, b⟩.
Here, the inner product is defined by ⟨X, Y⟩ = E[XY]. The Cauchy-Schwartz inequality then gives
  |E[XY]|² ≤ E[X²] E[Y²].
171
11.66. Linear Dependence and the Cauchy-Schwartz Inequality
(a) If Y = aX + b, then ρX,Y = sign(a), i.e., 1 if a > 0 and −1 if a < 0.
  • To be rigorous, we should also require that σX > 0 and a ≠ 0.
(b) When σY, σX > 0, equality occurs if and only if the following conditions hold:
  ∃ a ≠ 0 such that (X − EX) = a(Y − EY)
  ≡ ∃ a ≠ 0 and b ∈ R such that X = aY + b
  ≡ ∃ c ≠ 0 and d ∈ R such that Y = cX + d
  ≡ |ρXY| = 1.
In which case, |a| = σX/σY and ρXY = a/|a| = sgn a. Hence, ρXY is used to quantify the linear dependence between X and Y. The closer |ρXY| is to 1, the higher the degree of linear dependence between X and Y.
Example 11.67. [21, Section 5.2.3] Consider an important fact
that investment experience supports: spreading investments over
a variety of funds (diversification) diminishes risk. To illustrate,
imagine that the random variable X is the return on every invested
dollar in a local fund, and random variable Y is the return on every
invested dollar in a foreign fund. Assume that random variables X
and Y are i.i.d. with expected value 0.15 and standard deviation
0.12.
If you invest all of your money, say c, in either the local or the
foreign fund, your return R would be cX or cY .
• The expected return is ER = cEX = cEY = 0.15c.
• The standard deviation is cσX = cσY = 0.12c
Now imagine that your money is equally distributed over the
two funds. Then, the return R is (1/2)cX + (1/2)cY. The expected return is ER = (1/2)cEX + (1/2)cEY = 0.15c. Hence, the expected return remains at 15%. However,
  Var R = Var[(c/2)(X + Y)] = (c²/4) Var X + (c²/4) Var Y = (c²/2) × 0.12².
So, the standard deviation is (0.12/√2) c ≈ 0.0849c.
In comparison with the distributions of X and Y , the pmf of
(1/2)(X + Y) is concentrated more around the expected value. The
centralization of the distribution as random variables are averaged
together is a manifestation of the central limit theorem.
11.68. [21, Section 5.2.3] Example 11.67 is based on the assumption that return rates X and Y are independent from each other.
In the world of investment, however, risks are more commonly
reduced by combining negatively correlated funds (two funds are
negatively correlated when one tends to go up as the other falls).
This becomes clear when one considers the following hypothetical situation. Suppose that two stock market outcomes ω1 and ω2
are possible, and that each outcome will occur with a probability of 1/2. Assume that domestic and foreign fund returns X and Y are determined by X(ω1) = Y(ω2) = 0.25 and X(ω2) = Y(ω1) = −0.10. Each of the two funds then has an expected return of 7.5%, with equal probability for actual returns of 25% and −10%. The random variable Z = (1/2)(X + Y) satisfies Z(ω1) = Z(ω2) = 0.075. In other words, Z is equal to 0.075 with certainty. This means that an investment that is equally divided between the domestic and foreign funds has a guaranteed return of 7.5%.
173
Exercise 11.69. The input X and output Y of a system subject
to random perturbations are described probabilistically by the following joint pmf matrix:
            y
            2      4      5
   x   1    0.02   0.10   0.08
       3    0.08   0.32   0.40
(a) Evaluate the following quantities.
(i) EX
(ii) P [X = Y ]
(iii) P [XY < 6]
(iv) E [(X − 3)(Y − 2)]
(v) E[X(Y³ − 11Y² + 38Y)]
(vi) Cov [X, Y ]
(vii) ρX,Y
(b) Calculate the following quantities using what you got from
part (a).
(i) Cov [3X + 4, 6Y − 7]
(ii) ρ3X+4,6Y −7
(iii) Cov [X, 6X − 7]
(iv) ρX,6X−7
174
Answers:
(a)
(i) EX = 2.6
(ii) P [X = Y ] = 0
(iii) P [XY < 6] = 0.2
(iv) E [(X − 3)(Y − 2)] = −0.88
(v) E[X(Y³ − 11Y² + 38Y)] = 104
(vi) Cov [X, Y ] = 0.032
(vii) ρX,Y = 0.0447
(b)
(i) Hence, Cov [3X + 4, 6Y − 7] = 3 × 6 × Cov [X, Y ] ≈ 3 ×
6 × 0.032 ≈ 0.576 .
(ii) Note that
  ρ_(aX+b, cY+d) = Cov[aX + b, cY + d] / (σ_(aX+b) σ_(cY+d)) = ac Cov[X, Y] / (|a| σX |c| σY) = (ac/|ac|) ρX,Y = sign(ac) × ρX,Y.
Hence, ρ_(3X+4, 6Y−7) = sign(3 × 6) ρX,Y = ρX,Y = 0.0447.
(iii) Cov [X, 6X − 7] = 1 × 6 × Cov [X, X] = 6 × Var[X] ≈
3.84 .
(iv) ρX,6X−7 = sign(1 × 6) × ρX,X = 1 .
175
11.6
Multiple Continuous Random Variables
• P[X ∈ B]: discrete Σ_(x∈B) pX(x); continuous ∫_B fX(x) dx.
• P[(X, Y) ∈ R]: discrete Σ_((x,y): (x,y)∈R) pX,Y(x, y); continuous ∬_((x,y)∈R) fX,Y(x, y) dx dy.
• Joint to Marginal (Law of Total Prob.):
  discrete pX(x) = Σ_y pX,Y(x, y), pY(y) = Σ_x pX,Y(x, y);
  continuous fX(x) = ∫_(−∞)^(+∞) fX,Y(x, y) dy, fY(y) = ∫_(−∞)^(+∞) fX,Y(x, y) dx.
• P[X > Y]: discrete Σ_x Σ_(y: y<x) pX,Y(x, y) = Σ_y Σ_(x: x>y) pX,Y(x, y);
  continuous ∫_(−∞)^(+∞) ∫_(−∞)^(x) fX,Y(x, y) dy dx = ∫_(−∞)^(+∞) ∫_(y)^(∞) fX,Y(x, y) dx dy.
• P[X = Y]: discrete Σ_x pX,Y(x, x); continuous 0.
• X ⫫ Y: discrete pX,Y(x, y) = pX(x) pY(y); continuous fX,Y(x, y) = fX(x) fY(y).
• Conditional: discrete pX|Y(x|y) = pX,Y(x, y)/pY(y); continuous fX|Y(x|y) = fX,Y(x, y)/fY(y).
• E[g(X, Y)]: discrete Σ_x Σ_y g(x, y) pX,Y(x, y); continuous ∫_(−∞)^(+∞) ∫_(−∞)^(+∞) g(x, y) fX,Y(x, y) dx dy.
• P[g(X, Y) ∈ B]: discrete Σ_((x,y): g(x,y)∈B) pX,Y(x, y); continuous ∬_((x,y): g(x,y)∈B) fX,Y(x, y) dx dy.
• Z = X + Y: discrete pZ(z) = Σ_x pX,Y(x, z − x) = Σ_y pX,Y(z − y, y);
  continuous fZ(z) = ∫_(−∞)^(+∞) fX,Y(x, z − x) dx = ∫_(−∞)^(+∞) fX,Y(z − y, y) dy.
Table 7: pmf vs. pdf
Sirindhorn International Institute of Technology
Thammasat University
School of Information, Computer and Communication Technology
ECS315 2014/1 Part VI Dr.Prapun
12
Limiting Theorems
12.1
Law of Large Numbers (LLN)
Definition 12.1. Let X1 , X2 , . . . , Xn be a collection of random
variables with a common mean E [Xi ] = m for all i. In practice,
since we do not know m, we use the numerical average, or sample
mean,
  Mn = (1/n) Σ_(i=1)^(n) Xi,
in place of the true, but unknown, value m.
Q: Can this procedure of using Mn as an estimate of m be
justified in some sense?
A: This can be done via the law of large number.
12.2. The law of large numbers basically says that if you have a sequence of i.i.d. random variables X1, X2, . . ., then the sample means Mn = (1/n) Σ_(i=1)^(n) Xi will converge to the actual mean as n → ∞.
12.3. The LLN is easy to see via the property of variance. Note that
  E[Mn] = E[(1/n) Σ_(i=1)^(n) Xi] = (1/n) Σ_(i=1)^(n) EXi = m
and
  Var[Mn] = Var[(1/n) Σ_(i=1)^(n) Xi] = (1/n²) Σ_(i=1)^(n) Var Xi = (1/n) σ².    (33)
Remarks:
(a) For (33) to hold, it is sufficient to have uncorrelated Xi ’s.
(b) From (33), we also have
  σ_(Mn) = (1/√n) σ.    (34)
In words, “when uncorrelated (or independent) random variables each having the same distribution are averaged together,
the standard deviation is reduced according to the square root
law.” [21, p 142].
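The square root law in (34) is easy to see in a small simulation; the sketch below (with an arbitrary choice of U(0,1) samples) is only illustrative:
  n = 10000;
  X = rand(n, 100);               % 100 independent experiments, each with n U(0,1) samples
  Mn = mean(X);                   % 100 realizations of the sample mean M_n
  [std(Mn), sqrt(1/12)/sqrt(n)]   % empirical std of M_n vs. sigma/sqrt(n); should be close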
Exercise 12.4 (F2011). Consider i.i.d. random variables X1 , X2 , . . . , X10 .
Define the sample mean M by
  M = (1/10) Σ_(k=1)^(10) Xk.
Let
  V1 = (1/10) Σ_(k=1)^(10) (Xk − E[Xk])²
and
  V2 = (1/10) Σ_(j=1)^(10) (Xj − M)².
Suppose E [Xk ] = 1 and Var[Xk ] = 2.
(a) Find E [M ].
(b) Find Var[M ].
(c) Find E [V1 ].
(d) Find E [V2 ].
177
12.2
Central Limit Theorem (CLT)
In practice, there are many random variables that arise as a sum
of many other random variables. In this section, we consider the
sum
  Sn = Σ_(i=1)^(n) Xi    (35)
where the Xi are i.i.d. with common mean m and common variance σ².
• Note that when we talk about the Xi being i.i.d., the definition is that they are independent and identically distributed. It is then convenient to talk about a random variable X which shares the same distribution (pdf/pmf) with these Xi. This allows us to write
  Xi ∼ X  (i.i.d.),    (36)
which is much more compact than saying that the Xi are i.i.d. with the same distribution (pdf/pmf) as X. Moreover, we can also use EX and σX² for the common expected value and variance of the Xi.
Q: How does Sn behave?
In the previous section, we considered the sample mean of identically distributed random variables. More specifically, we considered the random variable Mn = (1/n) Sn. We found that Mn will converge to m as n increases to ∞. Here, we don't want to rescale the sum Sn by the factor 1/n.
12.5 (Approximation of densities and pmfs using the CLT). The
actual statement of the CLT is a bit difficult to state. So, we first
give you the interpretation/insight from CLT which is very easy
to remember and use:
For n large enough, we can approximate Sn by a Gaussian random variable with the same mean and variance as
Sn .
178
Note that the mean and variance of Sn are nm and nσ², respectively. Hence, for n large enough we can approximate Sn by N(nm, nσ²). In particular,
(a) FSn(s) ≈ Φ((s − nm)/(σ√n)).
(b) If the Xi are continuous random variables, then
  fSn(s) ≈ (1/(√(2π) σ√n)) e^(−(1/2)((s−nm)/(σ√n))²).
(c) If the Xi are integer-valued, then
  P[Sn = k] = P[k − 1/2 < Sn ≤ k + 1/2] ≈ (1/(√(2π) σ√n)) e^(−(1/2)((k−nm)/(σ√n))²)
[9, eq (5.14), p. 213]. The approximation is best for k near nm [9, p. 211].
Example 12.6. Approximation for Binomial Distribution: For
X ∼ B(n, p), when n is large, binomial distribution becomes difficult to compute directly because of the need to calculate factorial
terms.
(a) When p is not close to either 0 or 1, so that the variance is also large, we can use the CLT to approximate
  P[X = k] ≈ (1/√(2π Var X)) e^(−(k−EX)²/(2 Var X))    (37)
           = (1/√(2πnp(1−p))) e^(−(k−np)²/(2np(1−p))).    (38)
This is called the Laplace approximation to the Binomial distribution [25, p. 282].
(b) When p is small, the binomial distribution can be approximated by P(np) as discussed in 8.45.
(c) If p is very close to 1, then n − X will behave approximately
Poisson.
179
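A quick numerical comparison of (38) with the exact binomial pmf (the values n = 100, p = 0.5, and k = 40:60 below are arbitrary choices):
  n = 100; p = 0.5; k = 40:60;                               % arbitrary test case
  exact  = binopdf(k, n, p);
  approx = exp(-(k - n*p).^2 ./ (2*n*p*(1-p))) ./ sqrt(2*pi*n*p*(1-p));
  [exact; approx]                                            % the two rows should be close for k near np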
Figure 26: Gaussian approximation to the Binomial, Poisson, and Gamma distributions. The figure compares (1) the Poisson pmf (at integer x), (2) the Gaussian pdf, (3) the Gamma pdf, and (4) the Binomial pmf, for p = 0.05 with n = 100, λ = 5 (left) and with n = 800, λ = 40 (right).

Exercise 12.7 (F2011). Continue from Exercise 6.53. The stronger person (Kakashi) should win the competition if n is very large. (By the law of large numbers, the proportion of fights that Kakashi wins should be close to 55%.) However, because the results are random and n cannot be very large, we cannot guarantee that Kakashi will win. However, it may be good enough if the probability that Kakashi wins the competition is greater than 0.85.
We want to find the minimal value of n such that the probability that Kakashi wins the competition is greater than 0.85.
Let N be the number of fights that Kakashi wins among the n fights. Then, we need
  P[N > n/2] ≥ 0.85.    (39)
Use the central limit theorem and Table 3.1 or Table 3.2 from [Yates and Goodman] to approximate the minimal value of n such that (39) is satisfied.
180
Sirindhorn International Institute of Technology
Thammasat University
School of Information, Computer and Communication Technology
ECS315 2014/1 Part VII Dr.Prapun
13
Three Types of Random Variables
13.1. Review: You may recall51 the following properties for cdf
of discrete random variables. These properties hold for any kind
of random variables.
(a) The cdf is defined as FX(x) = P[X ≤ x]. This is valid for any type of random variable.
(b) Moreover, the cdf for any kind of random variable must satisfy three properties which we have discussed earlier:
CDF1 FX is non-decreasing
CDF2 FX is right-continuous
CDF3 lim_(x→−∞) FX(x) = 0 and lim_(x→∞) FX(x) = 1.
(c) P[X = x] = FX(x) − FX(x⁻) = the jump or saltus in F at x.
Theorem 13.2. If you find a function F that satisfies CDF1,
CDF2, and CDF3 above, then F is a cdf of some random variable.
51
If you don’t know these properties by now, you should review them as soon as possible.
181
Example 13.3. Consider an input X to a device whose output Y
will be the same as the input if the input level does not exceed 5.
For input level that exceeds 5, the output will be saturated at 5.
Suppose X ∼ U(0, 6). Find FY (y).
13.4. We can categorize random variables into three types according to its cdf:
(a) If FX (x) is piecewise flat with discontinuous jumps, then X
is discrete.
(b) If FX (x) is a continuous function, then X is continuous.
(c) If FX (x) is a piecewise continuous function with discontinuities, then X is mixed.
182
Figure 27: Typical cdfs: (a) a discrete random variable, (b) a continuous random variable, and (c) a mixed random variable [16, Fig. 3.2].
We have seen in Example 13.3 that some function can turn a
continuous random variable into a mixed random variable. Next,
we will work on an example where a continuous random variable
is turned into a discrete random variable.
Example 13.5. Let X ∼ U(0, 1) and Y = g(X) where g(x) = 1 for x < 0.6, and g(x) = 0 for x ≥ 0.6.
Before going deeply into the math, it is helpful to think about the
nature of the derived random variable Y . The definition of g(x)
tells us that Y has only two possible values, Y = 0 and Y = 1.
Thus, Y is a discrete random variable.
Example 13.6. In MATLAB, we have the rand command to generate U(0, 1). If we want to generate a Bernoulli random variable
with success probability p, what can we do?
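One common approach (a sketch; the value of p below is whatever success probability is desired):
  p = 0.25;           % hypothetical success probability
  X = (rand < p)      % X = 1 with probability p, and X = 0 otherwise
Since rand is U(0, 1), we have P[rand < p] = p, so X ∼ Bernoulli(p).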
Exercise 13.7. In MATLAB, how can we generate X ∼ binomial(2, 1/4)
from the rand command?
184