1. Derived Distributions
170B Probability Theory, Puck Rombach
Last updated: September 28, 2016
Bertsekas & Tsitsiklis: Section 4.1.
Assumed knowledge: Random variables, PMFs, PDFs, CDFs.
The book only considers continuous random variables in section 4.1, but we will think about both.
Lecture 1
We spend a part of this lecture talking about the logistics of the course. Please think about the following questions yourself: Why am I taking this course? Which parts do I find interesting, and which parts will be useful to me later? You will probably find that your answers to these questions change throughout the course, as you discover new concepts of value to you.
We discussed the following concepts. It is very important that you understand these and use them
precisely. They should be familiar, but you will learn to use them more carefully than before.
• Probability space, sample space, probability law. (B&T Ch.1)
• A random variable X is a real-valued function of the outcomes of a sample space. (It is not
random and not a variable.)
• The support RX of a random variable is the set of all values it can take with non-zero probability.
• A discrete random variable X has a probability mass function (PMF) pX(x) such that P(X = x) = pX(x).
• An (absolutely) continuous random variable X has a probability density function (PDF) fX(x) such that
P(a ≤ X ≤ b) = ∫_a^b fX(x) dx.
• In general, the cumulative distribution function (CDF) FX(x) is given by
FX(x) = P(X ≤ x) = Σ_{k ≤ x} pX(k) if X is discrete, and FX(x) = ∫_{−∞}^{x} fX(t) dt if X is continuous.
(A small worked example follows this list.)
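As a small worked example of these definitions (my own illustration, not from the book): if X is a Bernoulli random variable with parameter p ∈ (0, 1), then RX = {0, 1}, pX(0) = 1 − p, pX(1) = p, and the CDF is FX(x) = 0 for x < 0, FX(x) = 1 − p for 0 ≤ x < 1, and FX(x) = 1 for x ≥ 1.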
We are interested in situations where one random variable is a function of the other, Y = g(X),
where we know something about the distribution of X, and want to know about Y. For example, we
know how sales are distributed, but we are interested in the profit. Or, we know how road surface
resistance is distributed, but we want to know how fast tires will wear down. Think about examples
from your area of expertise. (Of course, in interesting/useful cases the relationships between rvs are more complicated than a direct function, but that is where we should start.)
General derived distributions
Suppose we have Y = g(X), where we know the distribution of X, and are interested in the distribution of Y.
Discrete case
Suppose we have Y = g(X), and we know pX(x). It is easy to see that
pY(y) = Σ_{x | g(x)=y} pX(x).    (1.1)
Example. Toss 2 fair coins. X is the number of heads. Y is 0 if X is even and 1 if X is odd. What is pY(y)?
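For the coin example above, here is a minimal Python sketch of equation 1.1 (my own illustration, not part of the original notes; the names pX, g and pY are only meant to mirror the notation):

    from fractions import Fraction

    # PMF of X = number of heads in two tosses of a fair coin
    pX = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

    def g(x):
        return x % 2  # Y is 0 if X is even and 1 if X is odd

    # Equation (1.1): p_Y(y) is the sum of p_X(x) over all x with g(x) = y
    pY = {}
    for x, p in pX.items():
        pY[g(x)] = pY.get(g(x), Fraction(0)) + p

    for y in sorted(pY):
        print(y, pY[y])  # prints: 0 1/2 and 1 1/2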
Lecture 2
The above sum is very general, but it is not the most direct solution in many cases. What if g(x) is
an invertible function? Then the above is not really a sum, as there is only one x such that g(x) = y
for each y. Let’s say that more rigorously.
Suppose that X has support RX. Then
RY = {y | there exists an x ∈ RX such that g(x) = y}.
The function g(x) is invertible if for every y ∈ RY, this x such that g(x) = y is unique. The following equivalent statement may not be obvious to you, but checking the equivalence is good practice with mathematical statements:
g(x) is invertible if for any x1, x2 ∈ RX, g(x1) = g(x2) implies x1 = x2.
In the lecture, we show the following lemma. If I omit a proof in the notes, this means that I expect
you to be able to do it yourself. You can use that as an extra exercise if needed.
Lemma 1.1. If X is a discrete random variable and g : X → Y is an invertible function (on RX), then
pY(y) = pX(g⁻¹(y)) if y ∈ RY, and pY(y) = 0 if y ∉ RY.
Corollary 1.2. If X is a discrete random variable and Y = aX + b, a ≠ 0, then
pY(y) = pX((y − b)/a) if y ∈ RY, and pY(y) = 0 if y ∉ RY.
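For example (a quick illustration of my own, not from the book): if X is the throw of a fair die and Y = 2X + 1, then RY = {3, 5, 7, 9, 11, 13}, and Corollary 1.2 gives pY(y) = pX((y − 1)/2) = 1/6 for every y ∈ RY, and pY(y) = 0 otherwise.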
Continuous case
For the continuous case, things get a little bit more complicated. The general analogue of equation 1.1 is
fY(y) = dFY(y)/dy,    (1.2)
FY(y) = ∫_{x | g(x) ≤ y} fX(x) dx.    (1.3)
Both the integral and the differentiation may be arbitrarily awful or impossible. The most manageable cases here are when g(x) is not just invertible, but also strictly monotone. The reason for this is that we cannot enquire about single values for X and Y; we can only consider sets of values. We walked through the proofs for the following lemmas in class, but they are also in the book in section 4.1. However, you should be able to fluently prove them yourself, so practice if necessary.
Lemma 1.3. If X is a continuous random variable and g : X → Y is a strictly monotone differentiable function (on RX), then, if g(x) is increasing,
fY(y) = fX(g⁻¹(y)) · dg⁻¹(y)/dy if y ∈ RY, and fY(y) = 0 if y ∉ RY,
and if g(x) is decreasing,
fY(y) = −fX(g⁻¹(y)) · dg⁻¹(y)/dy if y ∈ RY, and fY(y) = 0 if y ∉ RY.
Corollary 1.4. If X is a continuous random variable and Y = g(X) = aX + b, a ≠ 0, then
fY(y) = (1/|a|) fX((y − b)/a) if y ∈ RY, and fY(y) = 0 if y ∉ RY.
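For example (again an illustration of my own): if X is uniform on (0, 1) and Y = 3X + 2, then RY = (2, 5), and Corollary 1.4 gives fY(y) = (1/3) fX((y − 2)/3) = 1/3 for y ∈ (2, 5) and 0 otherwise, so Y is uniform on (2, 5).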
We then apply these lemmas to show the following statements about functions of exponential and normal random variables. Again, it is worth checking these for yourself as practice. (Proofs are also in the book, Examples 4.4-4.5 on page 206.) This might also be a good time to remind yourself of the PMFs, PDFs, and CDFs of a few standard distributions: normal, exponential, Bernoulli, binomial, uniform, geometric, Poisson.
• If X is an exponential random variable with parameter λ, and Y = aX, a > 0, then Y is exponential with parameter λ/a. (A quick simulation check of this fact is sketched after this list.)
• If X is a normal random variable with parameters µ and σ², and Y = aX + b, a ≠ 0, then Y is normal with parameters aµ + b and a²σ².
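Here is a rough simulation check of the first bullet (a minimal sketch of my own; the choices λ = 2, a = 3, the sample size and the seed are arbitrary). If Y = aX really is exponential with parameter λ/a, then its sample mean should be close to a/λ:

    import random

    random.seed(0)
    lam, a, n = 2.0, 3.0, 100_000

    # Sample X ~ Exponential(lam) and transform to Y = a * X
    ys = [a * random.expovariate(lam) for _ in range(n)]

    # If Y ~ Exponential(lam / a), then E[Y] = a / lam = 1.5
    print(sum(ys) / n)  # should be close to 1.5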
General CDF for monotone g(x)
The continuous case is forgiving when it comes to differentiating between P(X ≤ x) and P(X < x), allowing us to claim that FY(y) = 1 − FX(g⁻¹(y)) in the case of decreasing g(x) in the proof above. Let’s not forget that the full statement, which is correct in both the discrete and the continuous case, is
FY(y) = 1 − FX(g⁻¹(y)) + P(X = g⁻¹(y)), if y ∈ RY,
when g(x) is strictly monotone and decreasing. The last term is of course equal to 0 for continuous variables.
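One way to see where the extra term comes from (a short derivation in my own words, consistent with the statement above): for strictly decreasing g, the event {g(X) ≤ y} is exactly the event {X ≥ g⁻¹(y)}, so
FY(y) = P(X ≥ g⁻¹(y)) = 1 − P(X < g⁻¹(y)) = 1 − FX(g⁻¹(y)) + P(X = g⁻¹(y)),
using P(X < g⁻¹(y)) = FX(g⁻¹(y)) − P(X = g⁻¹(y)).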
Recommended exercises
• Example 4.1-4.3 (p.203-204)
• Example 4.6 (p.209)
• Problem 1-4 (p.246)
Lecture 3
We start the lecture with one more example of a function of a random variable.
Exercise 1.5. Suppose we are given a uniform continuous random variable X, with
fX(x) = 1 if x ∈ (0, 1], and fX(x) = 0 otherwise.
If Y = g(X) = −(1/λ) ln(X), then find fY(y).
Functions of two random variables in general
You may want to do a little TBT to the first time you did an example of a function of two random variables: Romeo and Juliet on p13. You may also want to look back at joint distributions.
Remember that if X, Y are independent, then, in the discrete and the continuous case respectively,
pX,Y(x, y) = pX(x) pY(y)
and
fX,Y(x, y) = fX(x) fY(y).
The reasons for being interested in functions of two (or more) random variables are similar to those for functions of single variables. Life is complicated. We may be interested in functions such as Z = max(X, Y) (profit if selling to the highest bidder), Z = √(X² + Y²) (distance between two 2D data points), Z = X + Y (output that is signal + noise), etc.
Can you find more examples from real-life problems?
Examples 4.7 and 4.8 are covered in the lecture. I’d also like to write down the explicit general expressions of the distribution functions. In the discrete case, we have that if Z = g(X, Y),
pZ(z) = Σ_{x,y | g(x,y)=z} pX,Y(x, y).    (1.4)
Here, the hardest part of the problem is usually to determine the set {x, y | g(x, y) = z} for every z.
In the continuous case, with FZ(z) as in eq. 1.2, we have that
FZ(z) = ∫∫_{x,y | g(x,y)≤z} fX,Y(x, y) dx dy.    (1.5)
Again, the hardest part of the problem is usually to determine the region {x, y | g(x, y) ≤ z} for every
z. Getting practice with this is useful.
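As a concrete instance (my own illustration; the distributions, sample size and seed are arbitrary choices): take Z = max(X, Y) with X, Y independent and uniform on (0, 1). The region {x, y | max(x, y) ≤ z} is the square [0, z] × [0, z], so equation 1.5 gives FZ(z) = z² for z ∈ [0, 1]. A minimal Monte Carlo sketch to check this:

    import random

    random.seed(0)
    n, z = 100_000, 0.7

    # Estimate F_Z(z) = P(max(X, Y) <= z) by direct sampling
    hits = sum(1 for _ in range(n)
               if max(random.random(), random.random()) <= z)

    print(hits / n, z ** 2)  # the two values should be close (about 0.49)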
Recommended exercises
• Example 4.9 (p.211)
• Problem 5-7 (p.246)
• Problem 14 (p.247)
Sums of two (independent) random variables
Let’s start with a general case, where we sum two variables that are not necessarily independent. Let X and Y be discrete, integer-valued random variables, and let Z = X + Y. Then
pZ(z) = P(X + Y = z)
      = Σ_{(x,y) | x+y=z} P(X = x, Y = y)
      = Σ_x P(X = x, Y = z − x)
      = Σ_x pX,Y(x, z − x).    (1.6)
Now let X and Y be continuous random variables, and let Z = X + Y. Then
FZ(z) = P(Z ≤ z)
      = P(X + Y ≤ z)
      = ∫_{−∞}^{+∞} ∫_{−∞}^{z−x} fX,Y(x, y) dy dx.    (1.7)
This is tricky. If you know how to differentiate an integral (Leibniz rule), which you do not need for this course, you will find
fZ(z) = ∫_{−∞}^{+∞} fX,Y(x, z − x) dx.    (1.8)
I add this result here for completeness. It is the clear analogue of equation 1.6. You do not need to know how to derive equation 1.8. I would also like to highlight that the following approach, which may seem tempting, is not correct:
WRONG!
FX,Z(x, z) = P(X ≤ x, Z ≤ z)
           = P(X ≤ x, X + Y ≤ z)
           = P(X ≤ x, Y ≤ z − x)
           = FX,Y(x, z − x).
Can you pinpoint the problematic line? Draw a picture of the XY-plane. Draw a line for X = x and one for X + Y = z. Then shade the regions representing lines 2 and 3 in the equation above.
The case where X, Y are independent is the most straightforward one to handle in general, and is computed by the so-called convolution of the two marginal distribution functions for X and Y. The proofs for these are described in detail in the book (p.213).
A key observation here is how we line up the PMFs and PDFs. For any Z = z, we line up pX (x) with
pY (z − x), and then for each such pair, we multiply the probabilities.
Example. Consider the example of the throw of a die X and a fair coin (0 or 1) Y, and consider
the probability that the sum is Z = X + Y = 4. The following picture is very analogous in the continuous case. Notice how we line up the horizontal axes such that they add up to 4 everywhere.
Of course we find
P(Z = 4) = . . . + 0 · 0 + 1/6 · 0 + 1/6 · 0 + 1/6 · 1/2 + 1/6 · 1/2 + 1/6 · 0 + 1/6 · 0 + 0 · 0 + . . . = 1/6.
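The same computation as a minimal Python sketch (my own illustration; pX and pY below are simply the PMFs of the die and the coin):

    from fractions import Fraction

    pX = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die
    pY = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # fair 0/1 coin

    def p_Z(z):
        # line up pX(x) with pY(z - x) and multiply, then sum over x
        return sum(pX[x] * pY.get(z - x, Fraction(0)) for x in pX)

    print(p_Z(4))  # prints 1/6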
Lemma 1.6. Let Z = X + Y, with X, Y independent. Then,
• if X, Y discrete and integer-valued,
pZ(z) = Σ_x pX(x) pY(z − x);
• if X, Y continuous,
fZ(z) = ∫_{−∞}^{+∞} fX(x) fY(z − x) dx.
Lemma 1.7. Let Z = X − Y, with X, Y independent. Then,
• if X, Y discrete and integer-valued,
pZ(z) = Σ_x pX(x) pY(x − z);
• if X, Y continuous,
fZ(z) = ∫_{−∞}^{+∞} fX(x) fY(x − z) dx.
In the lecture and in the proofs of these in the book, you will find the following line (or something similar):
P(Z ≤ z | X = x) = P(Y ≤ z − x).
This only holds when X and Y are independent. Consider the following example. Let X be uniform
over [0, 1], and let Y = X. Let x = .5, z = .6. Now, P(Z ≤ z|X = x) = 0, and P(Y ≤ z − x) > 0.
Recommended exercises
• Example 4.10-4.12 (p.214-216)
• Problem 9-12 (p.246-247)
Solutions to Exercises
Solution 1.5. We have RX = (0, 1], which gives us RY = [0, ∞). The function g(X) is strictly decreasing in the domain. We have X = g⁻¹(Y) = e^{−λY}. So, we directly apply
fY(y) = −fX(g⁻¹(y)) · dg⁻¹(y)/dy = λ e^{−λy} if y ≥ 0, and fY(y) = 0 if y < 0.
This shows us that Y is an exponential random variable. What this implies is that, for example, if we have a method to sample from a uniform distribution (which many software
packages or calculators can do), we can also sample from an exponential distribution, simply
by transforming the random variable.
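For instance, a minimal Python sketch of this idea (my own illustration; λ = 2, the sample size and the seed are arbitrary choices):

    import math
    import random

    random.seed(0)
    lam, n = 2.0, 100_000

    # random.random() is uniform on [0, 1); using 1 - random.random() gives a
    # value in (0, 1], matching the support of X and avoiding log(0).
    ys = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

    # Y should be Exponential(lam), so the sample mean should be near 1/lam = 0.5
    print(sum(ys) / n)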