Chapter 4
Jointly distributed Random variables
Multivariate distributions
Joint probability function
• For a Discrete RV, the joint probability function:
p(x,y) = P[X = x, Y = y]
• Marginal distributions:
p_X(x) = Σ_y p(x, y)
p_Y(y) = Σ_x p(x, y)
• Conditional distributions:
p_{Y|X}(y|x) = p(x, y) / p_X(x)
p_{X|Y}(x|y) = p(x, y) / p_Y(y)
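A quick illustration in R (the joint pmf below is a made-up example, not from the slides): marginals are row and column sums of the joint table, and conditionals are rows rescaled by the corresponding marginal.

> p <- matrix(c(0.10, 0.20, 0.15, 0.30, 0.05, 0.20), nrow = 2)  # hypothetical joint pmf p(x, y); rows = x, cols = y
> pX <- rowSums(p)              # marginal p_X(x) = Σ_y p(x, y)
> pY <- colSums(p)              # marginal p_Y(y) = Σ_x p(x, y)
> pY.X <- sweep(p, 1, pX, "/")  # conditional p_{Y|X}(y|x): each row divided by p_X(x)
> rowSums(pY.X)                 # each conditional distribution sums to 1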
Joint probability function
• For a continuous RV, the joint probability density function f(x, y) satisfies:
P[(X, Y) ∈ A] = ∫∫_A f(x, y) dy dx
• Marginal distributions:
f_X(x) = ∫ f(x, y) dy
f_Y(y) = ∫ f(x, y) dx
• Conditional distributions:
f_{Y|X}(y|x) = f(x, y) / f_X(x)
f_{X|Y}(x|y) = f(x, y) / f_Y(y)
Independence
Definition: Independence
Two random variables X and Y are defined to be independent if
p(x, y) = p_X(x) p_Y(y)    if X and Y are discrete
f(x, y) = f_X(x) f_Y(y)    if X and Y are continuous
Note: under independence,
p_{Y|X}(y|x) = p(x, y) / p_X(x) = p_X(x) p_Y(y) / p_X(x) = p_Y(y)
p_{X|Y}(x|y) = p(x, y) / p_Y(y) = p_X(x) p_Y(y) / p_Y(y) = p_X(x)
Thus, in the case of independence:
marginal distributions ≡ conditional distributions
The Multiplicative Rule for densities
If X and Y are discrete:
p(x, y) = p_X(x) p_{Y|X}(y|x) = p_Y(y) p_{X|Y}(x|y)
        = p_X(x) p_Y(y)    if X and Y are independent
If X and Y are continuous:
f(x, y) = f_X(x) f_{Y|X}(y|x) = f_Y(y) f_{X|Y}(x|y)
        = f_X(x) f_Y(y)    if X and Y are independent
The Multiplicative Rule for densities
Proof:
This follows from the definition of conditional densities:
p_{Y|X}(y|x) = p(x, y) / p_X(x)    and    p_{X|Y}(x|y) = p(x, y) / p_Y(y)
Hence
p(x, y) = p_X(x) p_{Y|X}(y|x)    and    p(x, y) = p_Y(y) p_{X|Y}(x|y)
The same is true for continuous random variables.
Finding CDFs (or pdfs) is difficult - Example
Suppose that a rectangle is constructed by first choosing its length, X,
and then choosing its width, Y.
Its length X is selected from an exponential distribution with mean μ = 1/λ = 5. Once the length has been chosen, its width, Y, is selected from a uniform distribution from 0 to half its length.
Find the probability that its area A = XY is less than 4.
Solution:
f_X(x) = (1/5) e^{-x/5}    for x ≥ 0
f_{Y|X}(y|x) = 1/(x/2) = 2/x    if 0 ≤ y ≤ x/2
f(x, y) = f_X(x) f_{Y|X}(y|x) = (1/5) e^{-x/5} (2/x) = [2/(5x)] e^{-x/5}    if 0 ≤ y ≤ x/2, x ≥ 0
Finding CDFs (or pdfs) is difficult - Example
yx 24 x
x 2  8 or x  8  2 2
yx 22 2 2 2
xy = 4
2
2, 2
y = x/2

4
RS – 4 - Jointly distributed RV (b)
Finding CDFs (or pdfs) is difficult - Example
P  XY  4 
x
2 2 2
 
0
2
2, 2
4

f  x, y  dydx 
x
  f  x, y  dydx
0
2 2 0

Finding CDFs (or pdfs) is difficult - Example
P XY  4 
2
2

2

2
2
 
0

x
2
5x
e
 15 x
dydx 
2
2
e
 15 x x
2
2

1
5

0
This part can be
evaluated
2
e
 15 x


dx 
8
5
2

2
5x
e
 15 x
dydx
2 0

dx 
0
2
x
 
 x, y  dydx
f
0
2
4

0
2
5x
x
 x, y  dydx   
f
0
2
4

2
 
0
2
x
2
5x
e
 15 x 4
x
dx
2
x 2 e
 15 x
dx
2
This part may require Numerical
evaluation
5
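As a check, both pieces can be evaluated numerically in R with integrate() (this sketch is not part of the original slides; g1 and g2 are just labels for the two integrands):

> g1 <- function(x) (1/5) * exp(-x/5)          # first integrand, after the inner y-integral over (0, x/2)
> g2 <- function(x) (8/5) * exp(-x/5) / x^2    # second integrand, after the inner y-integral over (0, 4/x)
> integrate(g1, 0, 2*sqrt(2))$value + integrate(g2, 2*sqrt(2), Inf)$value
# should return roughly 0.6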
Functions of Random Variables
Methods for determining the distribution of
functions of Random Variables
Given some random variable X, we want to study some function Y = h(X).
X is a transformation from (Ω, Σ, P) into the probability space (χ, A_X, P_X). Now, Y = h(X) is a transformation from (χ, A_X, P_X) into the probability space (Υ, A_Y, P_Y).
That is, we need to discover how the probability measure P_Y relates to the measure P_X.
For some F ∈ A_Y, we have that
P_Y[F] = P_X{x ∈ χ : h(x) ∈ F} = P{ω ∈ Ω : h(X(ω)) ∈ F} = P{ω ∈ Ω : X(ω) ∈ h⁻¹(F)} = P_X[h⁻¹(F)]
Methods for determining the distribution of
functions of Random Variables
With non-transformed variables, we step "backwards" from the values of X to the set of events in Ω. In the transformed case, we take two steps backwards: 1) once from the range of the transformation back to the values of X, and 2) again back to the set of events in Ω.
Potential problem: The transformation h(x) --we need to work with h⁻¹(x)-- may not yield unique results if h(X) is not monotonic.
Methods to get PY:
1. Distribution function method
2. Transformation method
3. Moment generating function method
Method 1: Distribution function method
Let X, Y, Z …. have joint density f(x,y,z, …)
Let W = h( X, Y, Z, …)
First step
Find the distribution function of W
G(w) = P[W ≤ w] = P[h( X, Y, Z, …) ≤ w]
Second step
Find the density function of W
g(w) = G'(w).
Distribution function method: Example 1 - χ²
Let X have a normal distribution with mean 0, and variance 1 -i.e., a
standard normal distribution. That is:
f(x) = (1/√(2π)) e^{-x²/2}
Let W = X². Find the distribution of W.
Example 1: Chi-square (χ²)
First step
Find the distribution function of W
G(w) = P[W ≤ w] = P[X² ≤ w]
= P[-√w ≤ X ≤ √w]    if w ≥ 0
= ∫_{-√w}^{√w} (1/√(2π)) e^{-x²/2} dx = F(√w) - F(-√w)
where F is the standard normal CDF, with F'(x) = f(x) = (1/√(2π)) e^{-x²/2}
Example 1: Chi-square (χ²)
Second step
Find the density function of W
g(w) = G'(w) = F'(√w) (d√w/dw) - F'(-√w) (d(-√w)/dw)
= f(√w) [1/(2√w)] + f(-√w) [1/(2√w)]
= (1/√(2π)) e^{-w/2} (1/(2√w)) + (1/√(2π)) e^{-w/2} (1/(2√w))
= (1/√(2π)) w^{-1/2} e^{-w/2}    if w ≥ 0.
Example 1: Chi-square (χ²)
Thus, if X has a standard Normal distribution => W = X² follows
g(w) = (1/√(2π)) w^{-1/2} e^{-w/2}    if w ≥ 0.
This distribution is the Gamma distribution with α = ½ and λ = ½.
This distribution is the χ² distribution with 1 degree of freedom.
• Using the additive properties of a gamma distribution, the sum of T independent χ²₁ RVs produces a χ² distributed RV with T degrees of freedom. Alternatively, the sum of T independent N(0,1)² RVs produces a χ²_T distributed RV.
• Note: If we add T independent N(μᵢ, σᵢ²)² RVs, then Σᵢ (Xᵢ/σᵢ)² follows a non-central χ² distribution, with non-centrality parameter Σᵢ (μᵢ/σᵢ)². This distribution is common in the power analysis of statistical tests in which the null distribution is (asymptotically) a χ².
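A quick simulation check in R (a sketch, not from the slides; the sample size 10,000 is arbitrary) that squared standard normals behave like χ²₁ draws:

> set.seed(90)
> W <- rnorm(10000)^2            # W = X² with X ~ N(0,1)
> mean(W)                        # the χ²(1) mean is 1, so this should be near 1
> ks.test(W, "pchisq", df = 1)   # compare the empirical CDF with the χ²(1) CDF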
Distribution function method: Example 2
Suppose that X and Y are independent random variables each having
an exponential distribution with parameter λ (=> E(X) = 1/λ):
f₁(x) = λ e^{-λx}    for x ≥ 0
f₂(y) = λ e^{-λy}    for y ≥ 0
f(x, y) = f₁(x) f₂(y) = λ² e^{-λ(x+y)}    for x ≥ 0, y ≥ 0
Find the distribution of W = X + Y.
Example 2: Sum of Exponentials
First step
Find the distribution function of W = X + Y
G(w) = P[W ≤ w] = P[ X + Y ≤ w]
Example 2: Sum of Exponentials
P[X + Y ≤ w] = ∫₀^w ∫₀^{w-x} f₁(x) f₂(y) dy dx = ∫₀^w ∫₀^{w-x} λ² e^{-λ(x+y)} dy dx
Example 2: Sum of Exponentials
P X  Y  w 
w w x
0
0
w w x

f1  x  f 2  y  dydx
 
 
0
 2 e    x  y  dydx
0
w
 w x

  2  e   x   e   y dy dx
0
 0


w x
w
2
e
 x
0

w
2
e
 x
0
 e y 

 dx
  0
 e  w x  e 0 

dx




Example 2: Sum of Exponentials
P  X  Y  w  
w
2
e
0
 x
 e    w x   e0 

dx




w
    e   x  e   w dx
0
w
 e x


 xe   w 
 
0
 e   w
  e 0  
  
 we   w   

    
 
 1  e   w   we   w 
12
Example 2: Sum of Exponentials
Second step
Find the density function of W
g(w) = G'(w) = d/dw [1 - e^{-λw} - λw e^{-λw}]
= λ e^{-λw} - λ e^{-λw} + λ² w e^{-λw}
= λ² w e^{-λw}    for w ≥ 0
Example 2: Sum of Exponentials
Hence, if X and Y are independent random variables each having an exponential distribution with parameter λ, then W = X + Y has density
g(w) = λ² w e^{-λw}    for w ≥ 0
This can be recognized as the Gamma distribution with parameters α = 2 and λ.
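A simulation sketch in R (not from the slides; λ = 3 and the sample size are arbitrary choices) comparing X + Y with the Gamma(α = 2, λ) distribution:

> set.seed(90)
> lambda <- 3
> W <- rexp(10000, lambda) + rexp(10000, lambda)   # W = X + Y
> mean(W)                                          # Gamma(2, λ) has mean 2/λ ≈ 0.667
> ks.test(W, "pgamma", shape = 2, rate = lambda)   # compare with the Gamma(2, λ) CDF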
Distribution function method: Example 3
Student’s t distribution
Let Z and U be two independent random variables with:
1. Z having a Standard Normal distribution -i.e., Z ~ N(0,1):
f(z) = (1/√(2π)) e^{-z²/2}
2. U having a χ² distribution with ν degrees of freedom:
h(u) = [1/(2^{ν/2} Γ(ν/2))] u^{ν/2 - 1} e^{-u/2}
Find the distribution of
t = Z / √(U/ν)
Example 3: Student-t Distribution
Therefore, the joint density of Z and U is:
f(z, u) = f(z) h(u) = [1/(√(2π) 2^{ν/2} Γ(ν/2))] u^{ν/2 - 1} e^{-(z² + u)/2}
The distribution function of T is:
G(t) = P[T ≤ t] = P[Z/√(U/ν) ≤ t] = P[Z ≤ t √(U/ν)]
= ∫₀^∞ ∫_{-∞}^{t√(u/ν)} [1/(√(2π) 2^{ν/2} Γ(ν/2))] u^{ν/2 - 1} e^{-(z² + u)/2} dz du
Example 3: Student-t Distribution
G(t) = ∫₀^∞ ∫_{-∞}^{t√(u/ν)} [1/(√(2π) 2^{ν/2} Γ(ν/2))] u^{ν/2 - 1} e^{-(z² + u)/2} dz du
Then:
g(t) = G'(t) = ∫₀^∞ d/dt [∫_{-∞}^{t√(u/ν)} [1/(√(2π) 2^{ν/2} Γ(ν/2))] u^{ν/2 - 1} e^{-(z² + u)/2} dz] du
Example 3: Student-t Distribution
Using differentiation under the integral sign:
d/dt ∫_a^b F(x, t) dx = ∫_a^b ∂F(x, t)/∂t dx
and the Fundamental Theorem of Calculus:
F(x) = ∫_a^x f(s) ds  =>  F'(x) = f(x)
Then
g(t) = ∫₀^∞ [1/(√(2π) 2^{ν/2} Γ(ν/2))] u^{ν/2 - 1} e^{-u/2} e^{-t²u/(2ν)} √(u/ν) du
Example 3: Student-t Distribution
Hence
g(t) = [1/(√(2πν) 2^{ν/2} Γ(ν/2))] ∫₀^∞ u^{(ν+1)/2 - 1} e^{-(u/2)(1 + t²/ν)} du
Using
∫₀^∞ x^{a-1} e^{-λx} dx = Γ(a)/λ^a
with a = (ν+1)/2 and λ = (1 + t²/ν)/2:
g(t) = [1/(√(2πν) 2^{ν/2} Γ(ν/2))] Γ((ν+1)/2) [2/(1 + t²/ν)]^{(ν+1)/2}
Example 3: Student-t Distribution
or
g(t) = K [1 + t²/ν]^{-(ν+1)/2}
where
K = Γ((ν+1)/2) / [√(πν) Γ(ν/2)]
This is the Student's t distribution. It is denoted as t ~ t_ν, where ν is referred to as the degrees of freedom (df).
William Gosset (1876 - 1937)
Example 3: Student-t Distribution
[Figure: the t distribution compared with the standard normal distribution]
Example of a t-distributed variable
Let
z = (x̄ - μ)/(σ/√n) = √n (x̄ - μ)/σ ~ N(0, 1)
Let
U = (n-1)s²/σ² ~ χ²_{n-1}
Assume that Z and U are independent (later, we will learn how to check this assumption). Then,
t = [√n (x̄ - μ)/σ] / √{[(n-1)s²/σ²] / (n-1)} = √n (x̄ - μ)/s ~ t_{n-1}
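A simulation sketch in R (not from the slides; n = 5, μ = 0, σ = 2, and the replication count are arbitrary) checking that this t statistic follows t_{n-1}:

> set.seed(90)
> n <- 5; mu <- 0
> tstat <- replicate(10000, { x <- rnorm(n, mu, 2); sqrt(n) * (mean(x) - mu) / sd(x) })
> ks.test(tstat, "pt", df = n - 1)   # compare with the t distribution with n-1 = 4 df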
Distribution function method: Example 4
Distribution of the Max and Min Statistics
Let x1, x2, … , xn denote a sample of size n from the density f(x).
Find the distribution of M = max(xi).
Repeat this computation for m = min(xᵢ).
Assume that the density is the uniform density from 0 to θ. Thus,
f(x) = 1/θ    for 0 ≤ x ≤ θ;    0 elsewhere
F(x) = P[X ≤ x] = 0 for x < 0;    x/θ for 0 ≤ x ≤ θ;    1 for x > θ
Example 4: Distribution of Max & Min
Finding the distribution function of M:
G(t) = P[M ≤ t] = P[max(xᵢ) ≤ t] = P[x₁ ≤ t, …, x_n ≤ t] = P[x₁ ≤ t] ⋯ P[x_n ≤ t]
= 0 for t < 0;    (t/θ)ⁿ for 0 ≤ t ≤ θ;    1 for t > θ
Example 4: Distribution of Max & Min
Differentiating, we find the density function of M:
g(t) = G'(t) = n t^{n-1}/θⁿ    for 0 ≤ t ≤ θ;    0 otherwise
[Figure: the uniform density f(x) and the density g(t) of the maximum M]
Example 4: Distribution of Max & Min
Finding the distribution function of m:
G(t) = P[m ≤ t] = 1 - P[min(xᵢ) > t] = 1 - P[x₁ > t, …, x_n > t] = 1 - P[x₁ > t] ⋯ P[x_n > t]
= 0 for t < 0;    1 - (1 - t/θ)ⁿ for 0 ≤ t ≤ θ;    1 for t > θ
Example 4: Distribution of Max & Min
Differentiating, we find the density function of m:
g(t) = G'(t) = (n/θ)(1 - t/θ)^{n-1}    for 0 ≤ t ≤ θ;    0 otherwise
[Figure: the uniform density f(x) and the density g(t) of the minimum m]
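A simulation sketch in R (not from the slides; θ = 10 and n = 4 are arbitrary) comparing the sample maximum with its derived CDF G(t) = (t/θ)ⁿ:

> set.seed(90)
> theta <- 10; n <- 4
> M <- replicate(10000, max(runif(n, 0, theta)))   # draws of the maximum
> mean(M)                                          # E[M] = n θ/(n+1) = 8 here
> ks.test(M, function(t) (t/theta)^n)              # compare with G(t) = (t/θ)^n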
The Probability Integral Transformation
Theorem: Let U ~ Uniform(0,1) and let F be a CDF. Then a RV with CDF F can be generated as F⁻¹(U), where F⁻¹ is the inverse of F, if it exists (more generally, it is the quantile function for F).
This transformation allows one to convert observations that come
from a uniform distribution from 0 to 1 (easy to generate using a
standard function) to observations that come from an arbitrary pdf.
Let U denote an observation having a U(0, 1):
1
g (u )  

0  u  1
e ls e w h e r e
Let f(x) denote an arbitrary pdf and F(x) its corresponding CDF. Let X = F⁻¹(U); we want to find the distribution of X.
The Probability Integral Transformation
Now, we go from the u's -i.e., from (0,1)- to the x's, where U has density
g(u) = 1    for 0 ≤ u ≤ 1;    0 elsewhere
The Probability Integral Transformation
Find the distribution of X:
G(x) = P[X ≤ x] = P[F⁻¹(U) ≤ x] = P[U ≤ F(x)] = F(x)    (because U is uniform on (0,1))
Hence:
g(x) = G'(x) = F'(x) = f(x)
Thus, if U ~ Uniform(0,1), then X = F⁻¹(U) has density f(x).
The Probability Integral Transformation
• The goal of some estimation methods is to simulate an expectation,
say E[h(Z)]. To do this, we need to simulate Z from its distribution.
The probability integral transformation is very handy for this task.
Example: Exponential distribution (X ~ Exponential(λ)).
Let U ~ Uniform(0,1) and F(x) = 1 – exp(- λx).
=>
X = -log(1 – U )/ λ ~ F (exponential distribution)
Code in R
> set.seed(90)
> lambda <- 3
> U <- runif(500,0,1)
> XX <- -log(1-U)/lambda
> mean(XX)
[1] 0.3127794
The Probability Integral Transformation
Example (continuation):
hist(XX, main="Histogram for Simulated Draws",xlab="Freq", breaks=20)
The Probability Integral Transformation
Example: If F is the standard normal CDF, F⁻¹ has no closed-form solution. Most computer programs have a routine to approximate F⁻¹ for the standard normal distribution. We can use this to simulate other distributions -for example, truncated normals.
Say, we want to sample from X ~ N(μ, σ2) with truncation points a &
b. Suppose we can evaluate points from the untruncated inverse
normal cdf F-1. Then,
1. Calculate endpoints pa = F(a) & pb = F(b).
2. Sample U ~ Unif (pa; pb).
3. Set X = F-1(U).
Steps (1) and (2) avoid calculating the normalizing constant for the truncated pdf -i.e., 1/[F(b) - F(a)].
The Probability Integral Transformation
Example: Code in R (a=-1; b=5, μ=0, σ=1)
>set.seed(90)
>pa <- pnorm(-1)
>pb <- pnorm(5)
> U <- runif(500,pa,pb)
> XX <- qnorm(U)
# inverse normal from mu=0, sd=1
> mean(XX)
[1] 0.2416027
> hist(XX, main="Histogram for Simulated Draws",xlab="Observations", breaks=20)
Method 2: The Transformation Method
Theorem
Let X denote a RV with pdf f(x) and let U = h(X).
Assume that h(x) is either strictly increasing (or decreasing). Then the probability density of U is:
g(u) = f(h⁻¹(u)) |d h⁻¹(u)/du| = f(x) |dx/du|
Proof
Use the distribution function method.
Step 1: Find the distribution function, G(u)
Step 2: Differentiate G(u) to find the probability density function g(u)
Method 2: The Transformation Method
G(u) = P[U ≤ u] = P[h(X) ≤ u]
= P[X ≤ h⁻¹(u)]    if h is strictly increasing
= P[X ≥ h⁻¹(u)]    if h is strictly decreasing
Thus,
G(u) = F(h⁻¹(u))        if h is strictly increasing
G(u) = 1 - F(h⁻¹(u))    if h is strictly decreasing
and
g(u) = G'(u) = F'(h⁻¹(u)) [d h⁻¹(u)/du]     if h is strictly increasing
g(u) = G'(u) = -F'(h⁻¹(u)) [d h⁻¹(u)/du]    if h is strictly decreasing
Method 2: The Transformation Method
g u   G  u 

d h 1 u 
1
 F  h u 

du
 
1
  F  h 1 u d h u 
 

du




h s tric tly in c re a s in g
h s tric tly d e c re a s in g
Or,
h
g u   f
1
(u )

d h 1 (u )
 f
du
x 
dx
du
Example 1: Log Normal Distribution
Suppose that X ~ N(μ, σ²). Find the distribution of U = h(X) = e^X.
Solution: Recall transformation formula:
g(u) = f(h⁻¹(u)) |d h⁻¹(u)/du| = f(x) |dx/du|
f(x) = [1/(√(2π) σ)] e^{-(x - μ)²/(2σ²)}
h⁻¹(u) = ln(u)    and    d h⁻¹(u)/du = d ln(u)/du = 1/u
Example 1: Log Normal Distribution
Replacing in the transformation formula:
g(u) = f(h⁻¹(u)) |d h⁻¹(u)/du| = [1/(√(2π) σ)] (1/u) e^{-(ln(u) - μ)²/(2σ²)}    for u > 0
Since the distribution of log(U) is normal, the distribution of U is
called the log-normal distribution.
Note: It is easy to obtain the mean of a log-normal variable:
E[U] = ∫₀^∞ u g(u) du = ∫₀^∞ [1/(√(2π) σ)] e^{-(ln(u) - μ)²/(2σ²)} du
Example 1: Log Normal Distribution
E[U] = ∫₀^∞ [1/(√(2π) σ)] e^{-(ln(u) - μ)²/(2σ²)} du
Substitution: y = (ln(u) - μ)/σ  =>  u = e^{yσ + μ}
dy σ = (1/u) du  =>  du = σ e^{yσ + μ} dy
Then,
E[U] = ∫_{-∞}^∞ [1/√(2π)] e^{-y²/2} e^{yσ + μ} dy
= e^{μ + σ²/2} ∫_{-∞}^∞ [1/√(2π)] e^{-(y - σ)²/2} dy = e^{μ + σ²/2}
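A quick check in R (a sketch, not from the slides; μ = 0.1 and σ = 0.5 are arbitrary) that the simulated mean matches e^{μ + σ²/2}:

> set.seed(90)
> mu <- 0.1; sig <- 0.5
> U <- exp(rnorm(100000, mu, sig))   # log-normal draws via U = e^X, X ~ N(mu, sig²)
> mean(U)                            # should be close to exp(mu + sig^2/2) ≈ 1.25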
Example 1: Log Normal Distribution - Graph
In finance, a popular assumption is to assume that stock prices follow a log-normal distribution. We have the following links between normal and log-normal variables:

Asset (Price):   S_T                         ln(S_T)
(Return):        (S_T - S_{T-1})/S_{T-1}     ln(S_T) - ln(S_{T-1})
Distribution:    Lognormal                   Normal
First moment:    exp(μ + 0.5σ²)              μ
Second moment:   exp(2μ + 2σ²)               σ²
Example 1: Log Normal Distribution - Graph
[Figure: log-normal density]
Example 1: Log Normal Distribution
Example: We value a call option, c, by discounting the expected option
payoff at expiration, using r as the discount rate, where the expectation
is calculated in a risk-neutral world (under “Q measure”):
c = e^{-rT} E_Q[max(S_T - K, 0)]
• Under the Black-Scholes model, S is lognormally distributed, so ln(S) follows a normal distribution. Then,
ln(S_T) ~ N(μ, σ²)  =>  E[S_T] = e^{μ + σ²/2}  and  E[S_T²] = e^{2μ + 2σ²}
where μ is the expected return and σ is the volatility. Rates of return -i.e., ln(S_T) - ln(S_t)- are also normally distributed!
We want to value a call option, c, with a strike price K:
c = e^{-rT} E_Q[max(S_T - K, 0)]
Example 1: Log Normal Distribution
Divide it into two terms:
c = e^{-rT} E_Q[S_T | S_T > K] - e^{-rT} E_Q[K | S_T > K]
Looking at the second part:
E_Q[K | S_T > K] = K Prob[ln S_T > ln K]
= K Prob[(ln S_T - μ)/σ > (ln K - μ)/σ]
= K Prob[Z_T > (ln K - μ)/σ]
= K N(d₂)
The first part is more complicated and relies on the following result:
If ln(S_T) ~ N(μ, σ²), then E[S_T | S_T > K] = E[S_T] N(σ - {ln(K) - μ}/σ).
Then,
E_Q[S_T | S_T > K] = E_Q[S_T] N(σ - {ln(K) - μ}/σ)
= E_Q[S_T] N(d₂ + σ)
= E_Q[S_T] N(d₁) = exp(μ + 0.5σ²) N(d₁)
Example 2: Inverse Gamma Distribution
Suppose that X has a Gamma distribution with parameters α and λ. Find the distribution of U = h(X) = 1/X.
Solution: Recall transformation formula:
g(u) = f(h⁻¹(u)) |d h⁻¹(u)/du| = f(x) |dx/du|
f(x; α, λ) = [λ^α/Γ(α)] x^{α-1} e^{-λx}    for x ≥ 0
h⁻¹(u) = 1/u    and    d h⁻¹(u)/du = -1/u²
Example 2: Inverse Gamma Distribution
Replacing in the transformation formula:
g(u) = f(h⁻¹(u)) |d h⁻¹(u)/du| = [λ^α/Γ(α)] (1/u)^{α-1} e^{-λ/u} (1/u²) = [λ^α/Γ(α)] u^{-α-1} e^{-λ/u}    for u > 0
Since the distribution of X is gamma, the distribution of U is called
the inverse-gamma distribution.
It's straightforward to calculate its mean:
E[U] = ∫₀^∞ u g(u) du = ∫₀^∞ [λ^α/Γ(α)] u^{-α} e^{-λ/u} du
= [λ^α/Γ(α)] Γ(α - 1)/λ^{α-1}
= λ Γ(α - 1)/Γ(α) = λ/(α - 1)
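A simulation check in R (a sketch, not from the slides; α = 5 and λ = 2 are arbitrary), using the fact that 1/X is inverse-gamma when X is gamma:

> set.seed(90)
> alpha <- 5; lambda <- 2
> U <- 1 / rgamma(100000, shape = alpha, rate = lambda)   # inverse-gamma draws
> mean(U)                                                 # should be close to lambda/(alpha - 1) = 0.5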
Example 2: Inverse Gamma Distribution - Graph
• In Bayesian econometrics, it is common to assume that σ² follows an inverse gamma distribution (& σ⁻² follows a gamma!). Usual values for (α, λ): α = T/2 and λ = 1/(2η²) = Φ/2, where η² is related to the variance of the T N(0, η²) variables we are implicitly adding.
Method 3: Use of moment generating functions
Review: MGF - Definition
Let X denote a random variable with probability density function f(x) if continuous (probability mass function p(x) if discrete). Then the moment generating function of X is:
m_X(t) = E[e^{tX}] = ∫ e^{tx} f(x) dx    if X is continuous
m_X(t) = E[e^{tX}] = Σ_x e^{tx} p(x)     if X is discrete
The distribution of a random variable X is described by either
1. The density function f(x) if X continuous (probability mass
function p(x) if X discrete), or
2. The cumulative distribution function F(x), or
3. The moment generating function mX(t)
Review: MGF - Properties
1. m_X(0) = 1
2. m_X^(k)(0) = k-th derivative of m_X(t) at t = 0 = μ_k = E[X^k]
   μ_k = ∫ x^k f(x) dx    if X is continuous
   μ_k = Σ_x x^k p(x)     if X is discrete
3. m_X(t) = 1 + μ₁ t + (μ₂/2!) t² + (μ₃/3!) t³ + ⋯ + (μ_k/k!) t^k + ⋯
4. Let X be a RV with MGF m_X(t). Let Y = bX + a.
   Then m_Y(t) = m_{bX+a}(t) = E(e^{(bX+a)t}) = e^{at} m_X(bt)
Review: MGF - Properties
5. Let X and Y be two independent random variables with moment
generating function mX(t) and mY(t) .
Then mX+Y(t) = mX (t) mY (t)
6.
Let X and Y be two random variables with moment generating
function mX(t) and mY(t) and two distribution functions FX(x) and
FY(y) respectively.
If mX (t) = mY (t),
then FX(x) = FY(x).
This ensures that the distribution of a random variable can be identified
by its moment generating function.
7. m_{X+b}(t) = e^{bt} m_X(t)
Use of moment generating functions to find the distribution of functions of Random Variables
Example: Sum of 2 independent exponentials
Suppose that X and Y are two independently distributed exponential random variables with pdf's given by
f(x) = λ e^{-λx}
f(y) = λ e^{-λy}
Find the distribution of W = X + Y.
Solution:
m_X(t) = λ/(λ - t) = (1 - t/λ)⁻¹
m_Y(t) = (1 - t/λ)⁻¹
m_{X+Y}(t) = m_X(t) m_Y(t) = (1 - t/λ)⁻¹ (1 - t/λ)⁻¹ = (1 - t/λ)⁻²
= MGF of the gamma distribution with α = 2.
Example: Affine Transformation of a Normal
• Suppose that X ~ N(μ, σ²). What is the distribution of Y = aX + b?
• Solution:
m_X(t) = e^{μt + σ²t²/2}
m_{aX+b}(t) = e^{bt} m_X(at) = e^{bt} e^{μ(at) + σ²(at)²/2} = e^{(aμ + b)t + a²σ²t²/2}
= MGF of the normal distribution with mean aμ + b and variance a²σ².
Thus, Y = aX + b ~ N(aμ + b, a²σ²).
Example: Affine Transformation of a Normal
Special Case: the z transformation
Z = (X - μ)/σ = (1/σ) X + (-μ/σ) = aX + b,    with a = 1/σ and b = -μ/σ
μ_Z = aμ + b = (1/σ)μ - μ/σ = 0
σ²_Z = a²σ² = (1/σ)² σ² = 1
Thus Z has a standard normal distribution.
Example 1: Distribution of X+Y
Suppose that X and Y are independent. Let X ~ N(μ_X, σ_X²) and Y ~ N(μ_Y, σ_Y²). Find the distribution of S = X + Y.
Solution:
m_X(t) = e^{μ_X t + σ_X² t²/2}
m_Y(t) = e^{μ_Y t + σ_Y² t²/2}
Now,
m_{X+Y}(t) = m_X(t) m_Y(t) = e^{μ_X t + σ_X² t²/2} e^{μ_Y t + σ_Y² t²/2} = e^{(μ_X + μ_Y)t + (σ_X² + σ_Y²) t²/2}
Thus, S = X + Y ~ N(μ_X + μ_Y, σ_X² + σ_Y²).
Example 2: Distribution of aX+bY
Suppose that X and Y are independent. Let X ~ N(μ_X, σ_X²) and Y ~ N(μ_Y, σ_Y²). Find the distribution of L = aX + bY.
Solution:
m_X(t) = e^{μ_X t + σ_X² t²/2}
m_Y(t) = e^{μ_Y t + σ_Y² t²/2}
Now,
m_{aX+bY}(t) = m_{aX}(t) m_{bY}(t) = m_X(at) m_Y(bt)
= e^{μ_X(at) + σ_X²(at)²/2} e^{μ_Y(bt) + σ_Y²(bt)²/2} = e^{(aμ_X + bμ_Y)t + (a²σ_X² + b²σ_Y²) t²/2}
Thus, L = aX + bY ~ N(aμ_X + bμ_Y, a²σ_X² + b²σ_Y²).
Example 2: Distribution of aX+bY
Special Case: a = +1 and b = -1.
Thus D = X - Y has a normal distribution with mean μ_X - μ_Y and variance
(+1)² σ_X² + (-1)² σ_Y² = σ_X² + σ_Y²
Example 3: (Extension to n independent RV’s)
Suppose that X₁, X₂, …, X_n are independent, each having a normal distribution with mean μᵢ and standard deviation σᵢ (for i = 1, 2, …, n).
Find the distribution of L = a₁X₁ + a₂X₂ + … + a_nX_n.
Solution:
m_{X_i}(t) = e^{μᵢ t + σᵢ² t²/2}    (for i = 1, 2, …, n)
Now,
m_{a₁X₁ + ⋯ + a_nX_n}(t) = m_{a₁X₁}(t) ⋯ m_{a_nX_n}(t) = m_{X₁}(a₁t) ⋯ m_{X_n}(a_nt)
= e^{μ₁(a₁t) + σ₁²(a₁t)²/2} ⋯ e^{μ_n(a_nt) + σ_n²(a_nt)²/2}
Example 3: (Extension to n independent RV’s)
or
m_{a₁X₁ + ⋯ + a_nX_n}(t) = e^{(a₁μ₁ + ⋯ + a_nμ_n)t + (a₁²σ₁² + ⋯ + a_n²σ_n²) t²/2}
= the MGF of the normal distribution with mean a₁μ₁ + ⋯ + a_nμ_n and variance a₁²σ₁² + ⋯ + a_n²σ_n²
Thus, Y = a₁X₁ + ⋯ + a_nX_n has a normal distribution with mean a₁μ₁ + ⋯ + a_nμ_n and variance a₁²σ₁² + ⋯ + a_n²σ_n².
Example 3: (Extension to n independent RV’s)
Special case:
a₁ = a₂ = ⋯ = a_n = 1/n
μ₁ = μ₂ = ⋯ = μ_n = μ
σ₁² = σ₂² = ⋯ = σ_n² = σ²
In this case X₁, X₂, …, X_n is a sample from a normal distribution with mean μ and standard deviation σ, and
L = (1/n)(X₁ + X₂ + ⋯ + X_n) = x̄ = the sample mean
Example 3: (Extension to n independent RV's)
Thus
x̄ = a₁x₁ + ⋯ + a_nx_n = (1/n)x₁ + ⋯ + (1/n)x_n
has a normal distribution with mean
μ_x̄ = a₁μ₁ + ⋯ + a_nμ_n = (1/n)μ + ⋯ + (1/n)μ = μ
and variance
σ²_x̄ = a₁²σ₁² + ⋯ + a_n²σ_n² = (1/n)²σ² + ⋯ + (1/n)²σ² = n(1/n)²σ² = σ²/n
Summary
Let x₁, x₂, …, x_n be a sample from a normal distribution with mean μ and standard deviation σ. Then
x̄ ~ N(μ, σ²/n)
Example 3: (Extension to n independent RV's)
[Figure: the distribution of x̄ vs. the population distribution]
Application: Simple Regression Framework
Let the random variable Yi be determined by:
Yi = α + βXi + εi,
i=1, 2, ..., n.
where the Xi’s are (exogenous or pre-determined) numbers, α and β
constants. Let the random variable εi follow a normal distribution with
zero mean and constant variance. That is,
εi ~ N(0, σ2),
i = 1, 2, ..., n.
Then,
E(Yi)= E(α + βXi + εi) = α + βXi
Var(Yi)= Var(α + βXi + εi) = Var(εi) = σ2.
Since Y is a linear function of a normal RV
=> Yi ~ N(α + βXi, σ2).
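A simulation sketch in R (not from the slides; α = 1, β = 2, σ = 0.5, and Xᵢ = 3 are arbitrary) illustrating that Yᵢ at a fixed Xᵢ is normal with mean α + βXᵢ and variance σ²:

> set.seed(90)
> alpha <- 1; beta <- 2; sig <- 0.5; x1 <- 3
> Y <- alpha + beta * x1 + rnorm(10000, 0, sig)   # draws of Y_i at X_i = 3
> c(mean(Y), var(Y))                              # should be near alpha + beta*x1 = 7 and sig^2 = 0.25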
Central and Non-central Distributions
• Noncentrality parameters are parameters of families of probability
distributions which are related to other central families of distributions.
If the noncentrality parameter of a distribution is zero, the
distribution is identical to a distribution in the central family.
For example, the standard Student's t-distribution is the central family
of distributions for the Noncentral t-distribution family.
• Noncentrality parameters are used in the following distributions:
- t-distribution
- F-distribution
- χ2 distribution
- χ distribution
- Beta distribution
Central and Non-central Distributions
• In general, noncentrality parameters occur in distributions that are
transformations of a normal distribution. The central versions are
derived from zero-mean normal distributions; the noncentral versions
generalize to arbitrary means.
Example: The standard (central) χ² distribution is the distribution of a sum of squared independent N(0, 1) RVs. The noncentral χ² distribution generalizes this to normals with nonzero means, N(μ, σ²).
• There are extended versions of these distributions with two noncentrality parameters: the doubly noncentral beta distribution, the doubly noncentral F-distribution, and the doubly noncentral t-distribution.
Central and Non-central Distributions
• There are extended versions of these distributions with two
noncentrality parameters. This happens for distributions that are
defined as the quotient of two independent distributions.
• When both source distributions are central, the result is a central
distribution. When one is noncentral, a (singly) noncentral distribution
results, while if both are noncentral, the result is a doubly noncentral
distribution.
Example: A t-distribution is defined (ignoring constants) as the ratio of a N(0,1) RV and the square root of an independent χ² RV. Both RVs can have noncentrality parameters. This produces a doubly noncentral t-distribution.
The Central Limit Theorem (CLT): Preliminaries
The proof of the CLT is very simple using moment generating
functions. We will rely on the following result:
Let m₁(t), m₂(t), … be a sequence of moment generating functions corresponding to the sequence of distribution functions F₁(x), F₂(x), …
Let m(t) be a moment generating function corresponding to the distribution function F(x). Then, if
lim_{i→∞} mᵢ(t) = m(t)    for all t in an interval about 0,
then
lim_{i→∞} Fᵢ(x) = F(x)    for all x.
The Central Limit Theorem (CLT)
Let x₁, x₂, …, x_n be a sequence of independent and identically distributed RVs with finite mean μ and finite variance σ². Then as n increases, x̄, the sample mean, approaches the normal distribution with mean μ and variance σ²/n.
This theorem is sometimes stated as
√n (x̄ - μ)/σ →d N(0, 1)
where →d means "the limiting distribution (asymptotic distribution) is" (or convergence in distribution).
Note: This CLT is sometimes referred to as the Lindeberg-Lévy CLT.
The Central Limit Theorem (CLT)
Proof: (use moment generating functions)
Let x1, x2, … be a sequence of independent random variables from a
distribution with moment generating function m(t) and CDF F(x).
Let S_n = x₁ + x₂ + … + x_n. Then
m_{S_n}(t) = m_{x₁ + x₂ + ⋯ + x_n}(t) = m_{x₁}(t) m_{x₂}(t) ⋯ m_{x_n}(t) = [m(t)]ⁿ
Now x̄ = S_n/n, so
m_{x̄}(t) = m_{S_n/n}(t) = m_{S_n}(t/n) = [m(t/n)]ⁿ
The Central Limit Theorem (CLT)
Let z = (x̄ - μ)/(σ/√n) = √n (x̄ - μ)/σ. Then
m_z(t) = e^{-√n μt/σ} m_{x̄}(√n t/σ) = e^{-√n μt/σ} [m(t/(σ√n))]ⁿ
and
ln m_z(t) = -√n μt/σ + n ln m(t/(σ√n))
Let u = t/(σ√n), so that √n = t/(σu) and n = t²/(σ²u²). Then
ln m_z(t) = -(μt²)/(σ²u) + [t²/(σ²u²)] ln m(u) = (t²/σ²) [ln m(u) - μu]/u²
The Central Limit Theorem (CLT)
ln m_z(t) = (t²/σ²) [ln m(u) - μu]/u²
Now, as n → ∞, u → 0:
lim_{n→∞} ln m_z(t) = (t²/σ²) lim_{u→0} [ln m(u) - μu]/u²
= (t²/σ²) lim_{u→0} [m'(u)/m(u) - μ]/(2u)    using L'Hopital's rule
= (t²/σ²) lim_{u→0} [m''(u) m(u) - (m'(u))²] / [2 (m(u))²]    using L'Hopital's rule again
The Central Limit Theorem (CLT)
= (t²/σ²) [m''(0) m(0) - (m'(0))²]/2
= (t²/σ²) [E(xᵢ²) - (E xᵢ)²]/2 = (t²/σ²)(σ²/2) = t²/2
Thus
lim_{n→∞} ln m_z(t) = t²/2    and    lim_{n→∞} m_z(t) = e^{t²/2}
Now m(t) = e^{t²/2} is the moment generating function of the standard normal distribution.
• Thus, the limiting distribution of z is the standard normal distribution:
lim_{n→∞} F_z(x) = ∫_{-∞}^x (1/√(2π)) e^{-u²/2} du
Q.E.D.
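A small illustration in R (not from the slides; the Exp(1) population and n = 50 are arbitrary choices) of the standardized sample mean approaching N(0,1):

> set.seed(90)
> n <- 50
> z <- replicate(10000, { x <- rexp(n, 1); sqrt(n) * (mean(x) - 1) / 1 })   # Exp(1) has mu = sigma = 1
> hist(z, breaks = 40)   # roughly bell-shaped despite the skewed population
> ks.test(z, "pnorm")    # the fit improves as n grows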
CLT: Asymptotic Distribution
The CLT states that the asymptotic distribution of the sample mean is a
normal distribution.
Q: What is meant by “asymptotic distribution”?
In general, the “asymptotic distribution of Xn” means a non-constant RV
X, along with real-number sequences {an} and {bn}, such that an(Xn − bn)
has as limiting distribution the distribution of X.
In this case, the distribution of X might be referred to as the asymptotic or
limiting distribution of either Xn or of an(Xn − bn), depending on the context.
The real number sequence {an} plays the role of a stabilizing
transformation to make sure the transformed RV –i.e., an(Xn−bn)– does not
have zero variance as n →∞.
Example: x̄ has variance σ²/n, which goes to zero as n → ∞. But n^{1/2} x̄ has a finite variance, σ².
CLT: Remarks
• The CLT gives only an asymptotic distribution. As an approximation for a
finite number of observations, it provides a reasonable approximation only
when the observations are close to the mean; it requires a very large
number of observations to stretch it into the tails.
• The CLT also applies in the case of sequences that are not identically
distributed. Extra conditions need to be imposed on the RVs.
• Lindeberg found a condition on the sequence {Xn}, which guarantees that
the distribution of Sn is asymptotically normally distributed. W. Feller
showed that Lindeberg's condition is necessary as well (if the condition
does not hold, then the sum Sn is not asymptotically normally distributed).
• A sufficient condition that is stronger (but easier to state) than
Lindeberg's condition is that there exists a constant A, such that |Xn|< A
for all n.
Jarl W. Lindeberg (1876 – 1932)
Vilibald S. (Willy) Feller (1906-1970)
Paul Pierre Lévy (1886–1971)