Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
ST 521: Statistical Theory I
Armin Schwartzman
Class topic 4
Transformations of Random Variables
2
Functions of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Probability mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Discrete RVs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Continuous RVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Continuous RVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Examples
8
Linear Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Square root of an exponential RV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Probability Integral Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Probability Integral Transform (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Inverse Probability Integral Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Inverse Probability Integral Transform (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Example: Cauchy Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Non-monotone transformations
17
One-to-many. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Quadratic transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
General Result–Theorem C-B 2.1.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Example: A wrapped distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1
Transformations of Random Variables
2 / 22
Functions of Random Variables
(C-B Chap 2.1)
If X is a rv with sample space X ⊂ R and cdf FX (x) then any function of X, say Y = g(X) is also a
random variable. The new random variable Y has a new sample space Y = g(X ) ⊂ R. The
objective is to find the cdf FY (y) of Y .
Example: Suppose X is an exponential random variable with parameter 1, i.e. F X (x) = 1 − e−x ,
fX (x) = e−x . What is the distribution of Y = X/λ?
ST 521 – Class topic 4
Fall 2014 – 3 / 22
Probability mapping
For any Borel set A:
P (Y ∈ A) = P (g(X) ∈ A)
= P ({x ∈ X : g(x) ∈ A})
= P (X ∈ g −1 (A))
where we have defined
g −1 (A) = {x ∈ X : g(x) ∈ A}
Notice that g −1 (A) is well defined even if g(·) is not necessarily bijective.
Example: Suppose X is uniform between -1 and 1. What is the distribution of Y = X 2 ?
ST 521 – Class topic 4
Fall 2014 – 4 / 22
2
Discrete RVs
Suppose that X is a discrete random variable with probability mass function p(x) = P (X = x).
Then, the pmf of a 1-1 transformation Y = g(X) is given by
P (Y = y) = P (g(X) = y) = P ({x : g(x) = y}) =
p(x)
x:g(x)=y
In practice, one never sees many general results about transformations of discrete random variables
because the results are so simple!
Example: Suppose X is uniform between -1 and 1. What is the distribution of Y = X 2 ?
ST 521 – Class topic 4
Fall 2014 – 5 / 22
Continuous RVs
Consider the transformation Y = g(X) where g(x) is strictly increasing (consequently a one-to-one
transformation), and suppose g is differentiable. This means that we can also define the inverse
function, g −1 (y).
FY (y) = P {Y ≤ y} = P {g(X) ≤ y}
= P {X ≤ g −1 (y)} = FX (g−1 (y)).
The pdf of Y is thus,
fY (y) =
d
dg−1 (y)
dx
FY (y) = FX [g−1 (y)]
= fX (g−1 (y))
dy
dy
dy
since
x = g −1 (y), so that
ST 521 – Class topic 4
dg−1 (y)
dx
=
dy
dy
Fall 2014 – 6 / 22
3
Continuous RVs
Suppose Y = g(x) is still one-to-one, but decreasing instead of increasing.
FY (y) = P {g(X) ≤ y} = P {X > g−1 (y)} = 1 − FX (g−1 (y))
and
dg−1 (y)
dx
= −fX (g−1 (y))
dy
dy
dx
= fX (g−1 (y))
dy
fY (y) = −FX (g−1 (y))
The last step follows because dx
dy is negative.
Therefore, regardless of whether Y = g(x) is increasing or decreasing, so long as it is monotonic,
we have
dx
fY (y) = fX (g−1 (y))
dy
ST 521 – Class topic 4
Fall 2014 – 7 / 22
Examples
8 / 22
Linear Transformation
Given X with pdf fX (x), let
Y = a + bX,
Then
fY (y) = fX [g−1 (y)]
dy
=b
dx
dx
= fX
dy
y−a
b
1
|b|
This transformation is often used when X has mean 0 and standard deviation 1. The linear
transformation above creates a rv Y with a distribution that has the same shape as that of X but has
mean a and standard deviation b.
Conversely, if Y has mean a and standard deviation b, then X = (Y − a)/b has mean 0 and standard
deviation 1. This is called sometimes the “Studentized” transform.
ST 521 – Class topic 4
Fall 2014 – 9 / 22
4
Normal Distribution
Let X ∼ N (0, 1):
x2
1
fX (x) = √ e− 2 ,
2π
−∞ < x < ∞
The transformation
Y = μ + σX,
yields
fY (y) = fX
X=
y−μ
σ
Y −μ
σ
(y−μ)2
1
1
e− 2σ2
=√
σ
2πσ
More generally, a distribution is a member of the class of location-scale distributions if the
distribution of a linear transformation of a random variable with that distribution has the same
distribution, but with different parameters.
ST 521 – Class topic 4
Fall 2014 – 10 / 22
Square root of an exponential RV
We have already seen that a constant times an exponential random variable leads to another
exponential random variable. Suppose X ∼ exp(λ), so that
fX (x) = λe−λx 1(x ≥ 0)
and consider the distribution of Y =
√
X.
The transformation
y = g(x) =
√
x,
x≥0
is one-to-one and has an inverse x = y 2 with dx/dy = 2y. Thus
2
fY (y) = fX (y 2 )2y = 2λye−λy ,
y≥0
This distribution is a particular form of the Rayleigh distribution and is a special case of the χ, Rice
and Weibull distributions.
ST 521 – Class topic 4
Fall 2014 – 11 / 22
5
Probability Integral Transform
Let X ∼ FX (x). Define the transformation
Y = FX (X) ∈ [0, 1],
Here
X = FX−1 (Y )
dy
= FX (x) = fX (x)
dx
1
=1
fY (y) = fX [FX−1 (y)]
fX [FX−1 (y)]
i.e. Y is uniform over [0, 1].
Another way to see it is through the distribution function:
FY (y) = P (Y ≤ y) = P (FX (X) ≤ y)
= P (X ≤ FX−1 (y)) = FX (FX−1 (y)) = y
What if FX (x) is not strictly increasing?
ST 521 – Class topic 4
Fall 2014 – 12 / 22
Probability Integral Transform (cont.)
The probability integral transform is useful in statistics for checking goodness of fit of a distribution to
a set of data.
Example:
X ∼ exp(λ) ⇒ Y = 1 − exp(−λX) ∼ U [0, 1]
If one has data X1 , . . . , Xn , one could compute the transformed data Y i = 1 − exp(−λXi ),
i = 1, . . . , n, and check whether the Y i ’s appear uniformly distributed over the interval [0, 1].
ST 521 – Class topic 4
Fall 2014 – 13 / 22
6
Inverse Probability Integral Transform
We can also start from the uniform distribution and do the inverse procedure.
Suppose X ∼ U [0, 1], so that f X (x) = 1 and FX (x) = x for x ∈ [0, 1]. Let
Y = F −1 (X),
X = F (Y )
where F (·) is a non-decreasing absolutely continuous function F : R → [0, 1], F (y) =
Then
dx
= F (y) = f (y) ⇒ fY (y) = f (y)
dy
y
−∞ f (x) dx.
i.e. Y has the pdf corresponding to F .
Another way to see it is through the distribution function:
FY (y) = P (Y ≤ y) = P (F −1 (X) ≤ y)
= P (X ≤ F (y)) = F (y)
ST 521 – Class topic 4
Fall 2014 – 14 / 22
Inverse Probability Integral Transform (cont.)
The Inverse Probability Integral Transform is used extensively in simulation of random variables.
Example: Most random number generators generate random numbers uniformly in the interval [0, 1].
Suppose we want to generate random numbers from an exponential distribution with parameter λ.
Let X ∼ U [0, 1]. We want Y with distribution function F (y) = 1 − exp(−λy). Then we need the
transformation
−1
log(1 − X)
Y = F −1 (X) =
λ
The recipe is
1.
2.
Generate random numbers X i uniformly over [0, 1].
Compute Yi = − log(1 − Xi )/λ.
ST 521 – Class topic 4
Fall 2014 – 15 / 22
7
Example: Cauchy Distribution
Let θ be distributed uniformly between (−π/2, π/2):
f (θ) =
1
,
π
−π/2 < θ < π/2
Consider Y = tan θ.
dy
= sec2 θ = 1 + tan2 θ = 1 + y 2
dθ
1
1 dθ
1
−∞<y <∞
fY (y) =
=
π dy
π (1 + y 2 )
The distribution with this density is known as the Cauchy distribution.
ST 521 – Class topic 4
Fall 2014 – 16 / 22
Non-monotone transformations
17 / 22
One-to-many
What if the transformation is not 1-1? The trick is to start with the cdf of the transformed random
variable.
Example: Let Y = |X|, and assume X is continuous.
FY (y) = P {Y ≤ y} = P {−y ≤ X ≤ y} = FX (y) − FX (−y)
fY (y) = FX (y) − FX (−y)(−1) = fX (y) + fX (−y)
Suppose
X ∼ N (0, 1),
Then
1
2
fX (x) = √ e−x /2
2π
2
2
fY (y) = √ e−y /2 ,
2π
ST 521 – Class topic 4
0<y<∞
Fall 2014 – 18 / 22
8
Quadratic transformation
Let
√
dy
=2 y
dx
dy
= 2x,
dx
Y = X 2,
Then
√
√
FY (y) = P {Y ≤ y} = P {X 2 ≤ y} = P {− y < X ≤ y}
√
√
= FX ( y) − FX (− y)
√
FX ( y)
fY (y) =
1 −1
y 2
2
−
√
FX (− y)
1
√
√
√ [fX ( y) + fX (− y)],
2 y
=
1 1
− y− 2
2
y>0
ST 521 – Class topic 4
Fall 2014 – 19 / 22
Example
Suppose X ∼ N (0, 1), Y = X 2 :
1 2
1
−∞<x<∞
fX (x) = √ e− 2 x
2π
2 −1y
1
fY (y) = √ √ e 2
2 y
2π
y
1
=
(y/2)− 2 e− 2
√
,
2 π
y>0
This is the density of a χ2 distribution with 1 degree of freedom.
Result: If X ∼ N (0, 1), then X 2 ∼ χ2 (1).
ST 521 – Class topic 4
Fall 2014 – 20 / 22
9
General Result–Theorem C-B 2.1.8
Suppose Y = g(X) is not 1-1, but there are disjoint sets A 1 , ...Ak that span the domain (sample
space) of X such that g(.) = gj (.) is continuous and 1-1 on each A j . This means that the inverse,
x = gj−1 (y) exists on each Aj . Then
fY (y) =
k
f (gj−1 (y))
j=1
dgj−1 (y)
dy
ST 521 – Class topic 4
Fall 2014 – 21 / 22
Example: A wrapped distribution
Suppose X ∈ R with density f X (x) represents a random angle of rotation (in radians) from the
x-axis on the unit circumference. The observed angle is
Θ=X
mod 2π,
Θ ∈ [0, 2π)
because it is impossible to tell if the rotation involved more than one full turn. In this case
fΘ (θ) =
∞
f (θ + 2πj),
0 ≤ θ < 2π
j=−∞
If X ∼ N (0, σ2 ), then
fΘ (θ) =
∞
j=−∞
√
(θ + 2πj)2
1
exp −
2σ 2
2πσ
ST 521 – Class topic 4
Fall 2014 – 22 / 22
10