1. Derived Distributions

170B Probability Theory, Puck Rombach. Last updated: September 28, 2016.

Bertsekas & Tsitsiklis: Section 4.1. Assumed knowledge: random variables, PMFs, PDFs, CDFs. The book only considers continuous random variables in Section 4.1, but we will think about both.

Lecture 1

We spend a part of this lecture talking about the logistics of the course. Please think about the following questions yourself: Why am I taking this course? Which parts do I find interesting, and which parts will be useful to me later? You will probably find that your answers to these questions change throughout the course, as you discover new concepts of value to you.

We discussed the following concepts. It is very important that you understand these and use them precisely. They should be familiar, but you will learn to use them more carefully than before.

• Probability space, sample space, probability law. (B&T Ch. 1)
• A random variable X is a real-valued function of the outcomes of a sample space. (It is not random and not a variable.) The support $R_X$ of a random variable is the set of all values it can take with non-zero probability. A discrete random variable X has a probability mass function (PMF) $p_X(x)$ such that $P(X = x) = p_X(x)$. An (absolutely) continuous random variable X has a probability density function (PDF) $f_X(x)$ such that
$$P(a \le X \le b) = \int_a^b f_X(x)\, dx.$$
In general, the cumulative distribution function (CDF) $F_X(x)$ is given by
$$F_X(x) = P(X \le x) = \begin{cases} \sum_{k \le x} p_X(k) & \text{if } X \text{ discrete}, \\ \int_{-\infty}^{x} f_X(t)\, dt & \text{if } X \text{ continuous}. \end{cases}$$

We are interested in situations where one random variable is a function of another, Y = g(X), where we know something about the distribution of X and want to know about Y. For example, we know how sales are distributed, but we are interested in the profit. Or, we know how road surface resistance is distributed, but we want to know how fast tires will wear down. Think about examples from your area of expertise.
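The discrete branch of the CDF definition above can be checked directly. A minimal Python sketch (the two-coin PMF and the helper name `cdf` are illustrative choices, not from the book):

```python
# F_X(x) = P(X <= x) = sum of p_X(k) over k <= x (discrete case)
p_X = {0: 0.25, 1: 0.5, 2: 0.25}   # e.g. number of heads in 2 fair coin tosses

def cdf(p_X, x):
    """Discrete CDF: accumulate the PMF over all support points k <= x."""
    return sum(p for k, p in p_X.items() if k <= x)

print(cdf(p_X, 1))   # 0.75
print(cdf(p_X, 2))   # 1.0
```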
(Of course, in interesting/useful cases the relationship between rvs is more complicated than a direct function, but that is where we should start.)

General derived distribution for discrete rv

Suppose we have Y = g(X), where we know the distribution of X, and are interested in the distribution of Y.

Discrete case

Suppose we have $Y = g(X)$, and we know $p_X(x)$. It is easy to see that
$$p_Y(y) = \sum_{\{x \mid g(x) = y\}} p_X(x). \tag{1.1}$$

Example. Toss 2 fair coins. X is the number of Hs. Y is 0 if X is even and 1 if X is odd. What is $p_Y(y)$?

Lecture 2

The above sum is very general, but it is not the most direct solution in many cases. What if g(x) is an invertible function? Then the above is not really a sum, as there is only one x such that g(x) = y for each y. Let's say that more rigorously. Suppose that X has support $R_X$. Then $R_Y = \{y \mid \text{there exists an } x \in R_X \text{ such that } g(x) = y\}$. The function g(x) is invertible if for every $y \in R_Y$, this x such that g(x) = y is unique. The following alternative statement may not be obvious to you, but it's good practice for mathematical statements: g(x) is invertible if for any $x_1, x_2 \in R_X$, $g(x_1) = g(x_2)$ implies $x_1 = x_2$.

In the lecture, we show the following lemma. If I omit a proof in the notes, this means that I expect you to be able to do it yourself. You can use that as an extra exercise if needed.

Lemma 1.1. If X is a discrete random variable and $g : X \to Y$ is an invertible function (on $R_X$), then
$$p_Y(y) = \begin{cases} p_X(g^{-1}(y)) & \text{if } y \in R_Y, \\ 0 & \text{if } y \notin R_Y. \end{cases}$$

Corollary 1.2. If X is a discrete random variable and $Y = aX + b$, $a \ne 0$, then
$$p_Y(y) = \begin{cases} p_X\!\left(\frac{y-b}{a}\right) & \text{if } y \in R_Y, \\ 0 & \text{if } y \notin R_Y. \end{cases}$$

Continuous case

For the continuous case, things get a little bit more complicated. The general analogue of equation 1.1 is
$$f_Y(y) = \frac{dF_Y(y)}{dy}, \tag{1.2}$$
$$F_Y(y) = \int_{\{x \mid g(x) \le y\}} f_X(x)\, dx. \tag{1.3}$$
Both the integral and the differentiation may be arbitrarily awful or impossible.
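Equation (1.1) translates almost literally into code. A minimal Python sketch for the two-coin example (the function name `derived_pmf` is an illustrative choice, not from the book):

```python
from collections import defaultdict

# PMF of X = number of heads in 2 fair coin tosses
p_X = {0: 0.25, 1: 0.5, 2: 0.25}

def derived_pmf(p_X, g):
    """Equation (1.1): p_Y(y) = sum of p_X(x) over all x with g(x) = y."""
    p_Y = defaultdict(float)
    for x, p in p_X.items():
        p_Y[g(x)] += p
    return dict(p_Y)

# Y = 0 if X is even, 1 if X is odd
p_Y = derived_pmf(p_X, lambda x: x % 2)
print(p_Y)  # {0: 0.5, 1: 0.5}
```

Note that no invertibility is needed here: the loop simply groups the probability mass of all x that map to the same y.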
The most manageable cases here are when g(x) is not just invertible, but also strictly monotone. The reason for this is that we cannot enquire about single values for X and Y; we can only consider sets of values. We walked through the proofs for the following lemmas in class, but they are also in the book in Section 4.1. However, you should be able to fluently prove them yourself, so practice if necessary.

Lemma 1.3. If X is a continuous random variable and $g : X \to Y$ is a strictly monotone differentiable function (on $R_X$), then, if g(x) is increasing,
$$f_Y(y) = \begin{cases} f_X(g^{-1}(y)) \dfrac{dg^{-1}(y)}{dy} & \text{if } y \in R_Y, \\ 0 & \text{if } y \notin R_Y, \end{cases}$$
and if g(x) is decreasing,
$$f_Y(y) = \begin{cases} -f_X(g^{-1}(y)) \dfrac{dg^{-1}(y)}{dy} & \text{if } y \in R_Y, \\ 0 & \text{if } y \notin R_Y. \end{cases}$$

Corollary 1.4. If X is a continuous random variable and $Y = g(X) = aX + b$, $a \ne 0$, then
$$f_Y(y) = \begin{cases} \frac{1}{|a|} f_X\!\left(\frac{y-b}{a}\right) & \text{if } y \in R_Y, \\ 0 & \text{if } y \notin R_Y. \end{cases}$$

We then apply these lemmas to show the following statements about functions of exponential and normal random variables. Again, it is worth checking these for yourself as practice. (Proofs also in the book in Examples 4.4–4.5 on page 206.) This might also be a good time to remind yourself of the PMFs, PDFs, CDFs of a few standard distributions: normal, exponential, Bernoulli, binomial, uniform, geometric, Poisson.

• If X is an exponential random variable with parameter λ, and Y = aX, a > 0, then Y is exponential with parameter λ/a.
• If X is a normal random variable with parameters µ and σ², and Y = aX + b, a ≠ 0, then Y is normal with parameters aµ + b and a²σ².

General CDF for monotone g(x)

The continuous case is forgiving when it comes to differentiating between P(X ≤ x) and P(X < x), allowing us to claim that $F_Y(y) = 1 - F_X(g^{-1}(y))$ in the case of decreasing g(x) in the proof above.
Let's not forget that the full statement, which is correct in both the discrete and the continuous case, is
$$F_Y(y) = 1 - F_X(g^{-1}(y)) + P(X = g^{-1}(y)), \quad \text{if } y \in R_Y,$$
when g(x) is strictly monotone and decreasing. The last term is of course equal to 0 for continuous variables.

Recommended exercises
• Examples 4.1–4.3 (p. 203–204)
• Example 4.6 (p. 209)
• Problems 1–4 (p. 246)

Lecture 3

We start the lecture with one more example of a function of a random variable.

Exercise 1.5. Suppose we are given a uniform continuous random variable X, with
$$f_X(x) = \begin{cases} 1 & \text{if } x \in (0, 1], \\ 0 & \text{otherwise.} \end{cases}$$
If $Y = g(X) = -\frac{1}{\lambda} \ln(X)$, then find $f_Y(y)$.

Functions of two random variables in general

You may want to do a little TBT to the first time you did an example of a function of two random variables: Romeo and Juliet on p. 13. You may also want to look back at joint distributions. Remember that if X, Y are independent, then $p_{X,Y}(x, y) = p_X(x) p_Y(y)$ and $f_{X,Y}(x, y) = f_X(x) f_Y(y)$, respectively.

The reasons for being interested in functions of two (or more) random variables are similar to functions of single variables. Life is complicated. We may be interested in functions such as Z = max(X, Y) (profit if selling to the highest bidder), $Z = \sqrt{X^2 + Y^2}$ (distance between two 2D data points), Z = X + Y (output that is signal + noise), etc. Can you find more examples from real-life problems?

Examples 4.7 and 4.8 are covered in the lecture. I'd also like to write down the explicit general expressions of distribution functions. In the discrete case, we have that if Z = g(X, Y),
$$p_Z(z) = \sum_{\{x, y \mid g(x, y) = z\}} p_{X,Y}(x, y). \tag{1.4}$$
Here, the hardest part of the problem is usually to determine the set $\{x, y \mid g(x, y) = z\}$ for every z. In the continuous case, with $F_Z(z)$ as in eq. 1.2, we have that
$$F_Z(z) = \iint_{\{x, y \mid g(x, y) \le z\}} f_{X,Y}(x, y)\, dx\, dy. \tag{1.5}$$
Again, the hardest part of the problem is usually to determine the region $\{x, y \mid g(x, y) \le z\}$ for every z.
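Equation (1.4) is again easy to compute mechanically for small discrete cases. A sketch for Z = max(X, Y), the highest-bidder example, with two independent fair dice (the setup and the name `derived_pmf2` are illustrative choices, not from the book):

```python
from collections import defaultdict

# Two independent fair dice: p_{X,Y}(x, y) = p_X(x) p_Y(y) = 1/36
p_XY = {(x, y): 1 / 36 for x in range(1, 7) for y in range(1, 7)}

def derived_pmf2(p_XY, g):
    """Equation (1.4): p_Z(z) = sum of p_{X,Y}(x, y) over {(x, y) | g(x, y) = z}."""
    p_Z = defaultdict(float)
    for (x, y), p in p_XY.items():
        p_Z[g(x, y)] += p
    return dict(p_Z)

p_max = derived_pmf2(p_XY, max)  # Z = max(X, Y)
# Each value z is hit by the (2z - 1) pairs whose maximum is z, so
# p_Z(z) = (2z - 1)/36; e.g. p_Z(6) = 11/36.
print(p_max[6])
```

The hard part that the notes mention, determining $\{x, y \mid g(x, y) = z\}$, is hidden in the loop: the code enumerates all pairs and lets g(x, y) decide which z each pair belongs to.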
Getting practice with this is useful.

Recommended exercises
• Example 4.9 (p. 211)
• Problems 5–7 (p. 246)
• Problem 14 (p. 247)

Sums of two (independent) random variables

Let's start with a general case, where we sum variables that are not necessarily independent. Let Z = X + Y, where X, Y are discrete, integer-valued random variables. Then
$$p_Z(z) = P(X + Y = z) = \sum_{\{(x, y) \mid x + y = z\}} P(X = x, Y = y) = \sum_x P(X = x, Y = z - x) = \sum_x p_{X,Y}(x, z - x). \tag{1.6}$$

Let Z = X + Y, where X, Y are continuous random variables. Then
$$F_Z(z) = P(Z \le z) = P(X + Y \le z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{z - x} f_{X,Y}(x, y)\, dy\, dx. \tag{1.7}$$
This is tricky. If you know how to differentiate an integral (Leibniz rule), which you do not need for this course, you will find
$$f_Z(z) = \int_{-\infty}^{+\infty} f_{X,Y}(x, z - x)\, dx. \tag{1.8}$$
I add this result in here for completeness. It is the clear analogue of equation 1.6. You do not need to know how to derive equation 1.8.

I would also like to highlight that the following approach, which may seem tempting, is not correct:
$$\text{WRONG!} \qquad F_{X,Z}(x, z) = P(X \le x, Z \le z)$$
$$= P(X \le x, X + Y \le z)$$
$$= P(X \le x, Y \le z - x)$$
$$= F_{X,Y}(x, z - x).$$
Can you pinpoint the problematic line? Draw a picture of the XY-plane. Draw a line for X = x and one for X + Y = z. Then shade the regions representing lines 2 and 3 in the equation above.

The case where X, Y are independent is the most straightforward one to handle in general, and is computed by the so-called convolution of the two marginal distribution functions for X and Y. The proofs for these are described in detail in the book (p. 213). A key observation here is how we line up the PMFs and PDFs. For any Z = z, we line up $p_X(x)$ with $p_Y(z - x)$, and then for each such pair, we multiply the probabilities.

Example. Consider the example of the throw of a die X and a fair coin (0 or 1) Y, and consider the probability that the sum is Z = X + Y = 4. The following picture is very analogous in the continuous case.
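The line-up-and-multiply observation is exactly a discrete convolution, and the die-plus-coin example can be reproduced with a few lines of Python (the function name `convolve_pmfs` is an illustrative choice, not from the book):

```python
from collections import defaultdict

def convolve_pmfs(p_X, p_Y):
    """Discrete convolution for independent X, Y:
    p_Z(z) = sum_x p_X(x) * p_Y(z - x), gathered here by looping over all (x, y) pairs."""
    p_Z = defaultdict(float)
    for x, px in p_X.items():
        for y, py in p_Y.items():
            p_Z[x + y] += px * py
    return dict(p_Z)

die = {k: 1 / 6 for k in range(1, 7)}   # fair die
coin = {0: 0.5, 1: 0.5}                 # fair coin scored 0/1

p_Z = convolve_pmfs(die, coin)
print(p_Z[4])  # 1/6, matching the worked example below
```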
Notice how we line up the horizontal axes such that the values add up to 4 everywhere. Of course we find
$$P(Z = 4) = \ldots + 0 \cdot 0 + \tfrac{1}{6} \cdot 0 + \tfrac{1}{6} \cdot 0 + \tfrac{1}{6} \cdot \tfrac{1}{2} + \tfrac{1}{6} \cdot \tfrac{1}{2} + \tfrac{1}{6} \cdot 0 + \tfrac{1}{6} \cdot 0 + 0 \cdot 0 + \ldots = \tfrac{1}{6}.$$

Lemma 1.6. Let Z = X + Y, with X, Y independent. Then,
• if X, Y discrete and integer-valued,
$$p_Z(z) = \sum_x p_X(x) p_Y(z - x);$$
• if X, Y continuous,
$$f_Z(z) = \int_{-\infty}^{+\infty} f_X(x) f_Y(z - x)\, dx.$$

Lemma 1.7. Let Z = X − Y, with X, Y independent. Then,
• if X, Y discrete and integer-valued,
$$p_Z(z) = \sum_x p_X(x) p_Y(x - z);$$
• if X, Y continuous,
$$f_Z(z) = \int_{-\infty}^{+\infty} f_X(x) f_Y(x - z)\, dx.$$

In the lecture and in the proof for these in the book, you will find the following line (or something similar):
$$P(Z \le z \mid X = x) = P(Y \le z - x).$$
This only holds when X and Y are independent. Consider the following example. Let X be uniform over [0, 1], and let Y = X. Let x = 0.5, z = 0.6. Now, $P(Z \le z \mid X = x) = 0$, while $P(Y \le z - x) > 0$.

Recommended exercises
• Examples 4.10–4.12 (p. 214–216)
• Problems 9–12 (p. 246–247)

Solutions to Exercises

Solution 1.5. We have $R_X = (0, 1]$, which gives us $R_Y = [0, \infty)$. The function g(X) is strictly decreasing on this domain. We have $X = g^{-1}(Y) = e^{-\lambda Y}$. So, we directly apply Lemma 1.3:
$$f_Y(y) = \begin{cases} -f_X(g^{-1}(y)) \dfrac{dg^{-1}(y)}{dy} = \lambda e^{-\lambda y} & \text{if } y \ge 0, \\ 0 & \text{if } y < 0. \end{cases}$$
This shows us that Y is an exponential random variable. What this implies is that, for example, if we have a method to sample from a uniform distribution (which many software packages or calculators can do), we can also sample from an exponential distribution, simply by transforming the random variable.
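That sampling trick from Solution 1.5 fits in a few lines. A minimal Python sketch (the parameter value, sample size, and function name `sample_exponential` are illustrative choices; the standard library's `random.random()` returns values in [0, 1), which we shift to (0, 1] to match $R_X$):

```python
import math
import random

random.seed(1)
lam = 2.0  # illustrative rate parameter

def sample_exponential(lam):
    """Transform a Uniform(0, 1] sample U into Y = -(1/lam) ln(U),
    which by Solution 1.5 is Exponential(lam)."""
    u = 1.0 - random.random()   # shift [0, 1) to (0, 1] so ln(u) is defined
    return -math.log(u) / lam

# Sanity check: an Exponential(lam) variable has mean 1/lam
ys = [sample_exponential(lam) for _ in range(100_000)]
mean_y = sum(ys) / len(ys)      # should be close to 1/lam = 0.5
```

This is the inverse-transform idea in its simplest form: since the CDF of the exponential is invertible, pushing uniform samples through the inverse map produces exponential samples.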