Lecture 8: Characteristic Functions

Course: Theory of Probability I
Term: Fall 2013
Instructor: Gordan Zitkovic
First properties
A characteristic function is simply the Fourier transform, in probabilistic language. Since we will be integrating complex-valued functions, we define (both integrals on the right need to exist)
$$\int f\, d\mu = \int \Re f\, d\mu + i \int \Im f\, d\mu,$$
where $\Re f$ and $\Im f$ denote the real and the imaginary part of a function $f : \mathbb{R} \to \mathbb{C}$. The reader will easily figure out which properties of the integral transfer from the real case.
Definition 8.1. The characteristic function of a probability measure $\mu$ on $\mathcal{B}(\mathbb{R})$ is the function $\varphi_\mu : \mathbb{R} \to \mathbb{C}$ given by
$$\varphi_\mu(t) = \int e^{itx}\, \mu(dx).$$
When we speak of the characteristic function $\varphi_X$ of a random variable $X$, we have the characteristic function $\varphi_{\mu_X}$ of its distribution $\mu_X$ in mind. Note, moreover, that
$$\varphi_X(t) = E[e^{itX}].$$
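Since $\varphi_X(t) = E[e^{itX}]$, a characteristic function can be approximated by a sample average. The following small numerical sketch (ours, not part of the notes; it assumes NumPy, and the seed and sample size are arbitrary) compares the Monte Carlo estimate for a standard normal $X$ with the closed form $e^{-t^2/2}$ that appears in Example 8.6 below.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal(100_000)            # sample from N(0, 1)
    t = np.linspace(-5.0, 5.0, 11)

    # Monte Carlo estimate of phi_X(t) = E[exp(itX)] as a sample average
    phi_hat = np.exp(1j * np.outer(t, X)).mean(axis=1)
    phi_exact = np.exp(-t**2 / 2)               # closed form for N(0, 1)

    print(np.max(np.abs(phi_hat - phi_exact)))  # small; error is O(n^{-1/2})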
While difficult to visualize, characteristic functions can be used to
learn a lot about the random variables they correspond to. We start
with some properties which follow directly from the definition:
Proposition 8.2. Let $X$, $Y$ and $\{X_n\}_{n \in \mathbb{N}}$ be random variables.
1. $\varphi_X(0) = 1$ and $|\varphi_X(t)| \le 1$, for all $t$.
2. $\varphi_{-X}(t) = \overline{\varphi_X(t)}$, where the bar denotes complex conjugation.
3. $\varphi_X$ is uniformly continuous.
4. If $X$ and $Y$ are independent, then $\varphi_{X+Y} = \varphi_X \varphi_Y$.
5. For all $t_1 < t_2 < \cdots < t_n$, the matrix $A = (a_{jk})_{1 \le j,k \le n}$ given by
$$a_{jk} = \varphi_X(t_j - t_k)$$
is Hermitian and positive semi-definite, i.e., $A^* = A$ and $\xi^\tau A \bar{\xi} \ge 0$, for any $\xi \in \mathbb{C}^n$.
6. If $X_n \overset{D}{\to} X$, then $\varphi_{X_n}(t) \to \varphi_X(t)$, for each $t \in \mathbb{R}$.

Note: We do not prove (or use) it in these notes, but it can be shown that a function $\varphi : \mathbb{R} \to \mathbb{C}$, continuous at the origin with $\varphi(0) = 1$, is a characteristic function of some probability measure $\mu$ on $\mathcal{B}(\mathbb{R})$ if and only if it is positive semi-definite, i.e., if it satisfies part 5. of Proposition 8.2. This is known as Bochner's theorem.
Proof.
1. Immediate.
2. $\overline{e^{itx}} = e^{-itx}$.
3. We have $|\varphi_X(t) - \varphi_X(s)| = \big| \int (e^{itx} - e^{isx})\, \mu(dx) \big| \le h(t - s)$, where $h(u) = \int |e^{iux} - 1|\, \mu(dx)$. Since $|e^{iux} - 1| \le 2$, the dominated convergence theorem implies that $\lim_{u \to 0} h(u) = 0$, and, so, $\varphi_X$ is uniformly continuous.
4. Independence of $X$ and $Y$ implies the independence of $\exp(itX)$ and $\exp(itY)$. Therefore,
$$\varphi_{X+Y}(t) = E[e^{it(X+Y)}] = E[e^{itX} e^{itY}] = E[e^{itX}] E[e^{itY}] = \varphi_X(t) \varphi_Y(t).$$
5. The matrix $A$ is Hermitian by 2. above. To see that it is positive semi-definite, note that $a_{jk} = E[e^{it_j X} e^{-it_k X}]$, and so
$$\sum_{j=1}^{n} \sum_{k=1}^{n} \xi_j \bar{\xi}_k a_{jk} = E\Big[ \Big( \sum_{j=1}^{n} \xi_j e^{it_j X} \Big) \overline{\Big( \sum_{k=1}^{n} \xi_k e^{it_k X} \Big)} \Big] = E\Big[ \Big| \sum_{j=1}^{n} \xi_j e^{it_j X} \Big|^2 \Big] \ge 0.$$
6. The functions $x \mapsto \cos(tx)$ and $x \mapsto \sin(tx)$ are bounded and continuous, so it suffices to apply the definition of weak convergence.
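Property 4. is easy to check by simulation. A minimal sketch, assuming NumPy, with two exponential samples that are independent by construction (all constants arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    n, t = 200_000, 1.7                         # arbitrary sample size and test point

    X = rng.exponential(scale=1.0, size=n)      # X and Y independent by construction
    Y = rng.exponential(scale=2.0, size=n)

    def phi(sample, t):
        """Monte Carlo estimate of the characteristic function at t."""
        return np.exp(1j * t * sample).mean()

    # Property 4. of Proposition 8.2: phi_{X+Y} = phi_X * phi_Y
    print(phi(X + Y, t))
    print(phi(X, t) * phi(Y, t))                # agrees up to sampling error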
Here is a simple problem you can use to test your understanding of the definitions:

Problem 8.1. Let $\mu$ and $\nu$ be two probability measures on $\mathcal{B}(\mathbb{R})$, and let $\varphi_\mu$ and $\varphi_\nu$ be their characteristic functions. Show that Parseval's identity holds:
$$\int_{\mathbb{R}} e^{-its} \varphi_\mu(t)\, \nu(dt) = \int_{\mathbb{R}} \varphi_\nu(t - s)\, \mu(dt), \text{ for all } s \in \mathbb{R}.$$
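The identity can be sanity-checked by Monte Carlo. A sketch (our illustration, assuming NumPy): take both $\mu$ and $\nu$ to be $N(0,1)$, whose characteristic function $e^{-t^2/2}$ appears in Example 8.6, and approximate both integrals by sample averages.

    import numpy as np

    rng = np.random.default_rng(2)
    s = 0.8                                     # arbitrary shift
    phi = lambda u: np.exp(-u**2 / 2)           # ch. function of N(0, 1)

    T = rng.standard_normal(500_000)            # sample from nu
    X = rng.standard_normal(500_000)            # sample from mu

    lhs = (np.exp(-1j * T * s) * phi(T)).mean() # integral against nu(dt)
    rhs = phi(X - s).mean()                     # integral against mu(dt)
    print(lhs, rhs)                             # agree up to Monte Carlo error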
Our next result shows that $\mu$ can be recovered from its characteristic function $\varphi_\mu$:
Theorem 8.3 (Inversion theorem). Let $\mu$ be a probability measure on $\mathcal{B}(\mathbb{R})$, and let $\varphi = \varphi_\mu$ be its characteristic function. Then, for $a < b \in \mathbb{R}$, we have
$$\mu((a,b)) + \tfrac{1}{2}\mu(\{a,b\}) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt. \tag{8.1}$$
Proof. We start by picking $a < b$ and noting that
$$\frac{e^{-ita} - e^{-itb}}{it} = \int_a^b e^{-ity}\, dy,$$
so that, by Fubini's theorem, the integral
$$F(a, b; T) = \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt$$
appearing in (8.1) is well-defined and can be written as
$$F(a, b; T) = \int_{[-T,T] \times [a,b]} \exp(-ity)\, \varphi(t)\, dy\, dt.$$
Another use of Fubini's theorem yields:
$$F(a, b; T) = \int_{\mathbb{R}} \int_{[-T,T] \times [a,b]} \exp(-ity) \exp(itx)\, dy\, dt\, \mu(dx) = \int_{\mathbb{R}} \int_{[-T,T] \times [a,b]} \exp(-it(y - x))\, dy\, dt\, \mu(dx) = \int_{\mathbb{R}} \int_{[-T,T]} \tfrac{1}{it} \big( e^{-it(a-x)} - e^{-it(b-x)} \big)\, dt\, \mu(dx).$$
Set
$$f(a, b, T; x) = \int_{-T}^{T} \tfrac{1}{it} \big( e^{-it(a-x)} - e^{-it(b-x)} \big)\, dt \quad\text{and}\quad K(T; c) = \int_0^T \frac{\sin(ct)}{t}\, dt,$$
and note that, since $\cos$ is an even and $\sin$ an odd function, we have
$$f(a, b, T; x) = 2 \int_0^T \Big( \frac{\sin((x-a)t)}{t} - \frac{\sin((x-b)t)}{t} \Big)\, dt = 2K(T; x - a) - 2K(T; x - b).$$

Note: The integral $\int_{-T}^{T} \tfrac{1}{it} \exp(-it(a-x))\, dt$ on its own is not defined; we really need to work with the full $f(a, b, T; x)$ to get the right cancellation.
Since
$$K(T; c) = \int_0^T \frac{\sin(ct)}{ct}\, d(ct) = \begin{cases} \int_0^{cT} \frac{\sin(s)}{s}\, ds = K(cT; 1), & c > 0, \\ 0, & c = 0, \\ -K(|c| T; 1), & c < 0, \end{cases} \tag{8.2}$$
Problem 5.11 implies that
$$\lim_{T \to \infty} K(T; c) = \begin{cases} \frac{\pi}{2}, & c > 0, \\ 0, & c = 0, \\ -\frac{\pi}{2}, & c < 0, \end{cases}$$
and so
$$\lim_{T \to \infty} f(a, b, T; x) = \begin{cases} 0, & x \in [a,b]^c, \\ \pi, & x = a \text{ or } x = b, \\ 2\pi, & a < x < b. \end{cases}$$
Observe first that the function $T \mapsto K(T; 1)$ is continuous on $[0, \infty)$ and has a finite limit as $T \to \infty$, so that $\sup_{T \ge 0} |K(T; 1)| < \infty$. Furthermore, (8.2) implies that $|K(T; c)| \le \sup_{T \ge 0} |K(T; 1)|$ for any $c \in \mathbb{R}$ and $T \ge 0$, so that
$$\sup \{ |f(a, b, T; x)| : x \in \mathbb{R},\ T \ge 0 \} < \infty.$$
Therefore, we can use the dominated convergence theorem to get
$$\lim_{T \to \infty} \tfrac{1}{2\pi} F(a, b; T) = \lim_{T \to \infty} \tfrac{1}{2\pi} \int f(a, b, T; x)\, \mu(dx) = \tfrac{1}{2\pi} \int \lim_{T \to \infty} f(a, b, T; x)\, \mu(dx) = \tfrac{1}{2} \mu(\{a\}) + \mu((a, b)) + \tfrac{1}{2} \mu(\{b\}).$$
Corollary 8.4. For probability measures $\mu_1$ and $\mu_2$ on $\mathcal{B}(\mathbb{R})$, the equality $\varphi_{\mu_1} = \varphi_{\mu_2}$ implies that $\mu_1 = \mu_2$.

Proof. By Theorem 8.3, we have $\mu_1((a,b)) = \mu_2((a,b))$ for all $a, b \in C$, where $C$ is the set of all $x \in \mathbb{R}$ such that $\mu_1(\{x\}) = \mu_2(\{x\}) = 0$. Since $C^c$ is at most countable, it is straightforward to see that the family $\{(a,b) : a, b \in C\}$ of intervals is a $\pi$-system which generates $\mathcal{B}(\mathbb{R})$.
Corollary 8.5. Suppose that $\int_{\mathbb{R}} |\varphi_\mu(t)|\, dt < \infty$. Then $\mu \ll \lambda$ and $\frac{d\mu}{d\lambda}$ is a bounded and continuous function given by
$$\frac{d\mu}{d\lambda} = f, \text{ where } f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \varphi_\mu(t)\, dt \text{ for } x \in \mathbb{R}.$$
Proof. Since $\varphi_\mu$ is integrable and $|e^{-itx}| = 1$, $f$ is well defined. For $a < b$ we have
$$\int_a^b f(x)\, dx = \frac{1}{2\pi} \int_a^b \int_{\mathbb{R}} e^{-itx} \varphi_\mu(t)\, dt\, dx = \frac{1}{2\pi} \int_{\mathbb{R}} \varphi_\mu(t) \int_a^b e^{-itx}\, dx\, dt = \frac{1}{2\pi} \int_{\mathbb{R}} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi_\mu(t)\, dt = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi_\mu(t)\, dt = \mu((a, b)) + \tfrac{1}{2} \mu(\{a, b\}),$$
by Theorem 8.3, where the use of Fubini's theorem above is justified by the fact that the function $(t, x) \mapsto e^{-itx} \varphi_\mu(t)$ is integrable on $[a, b] \times \mathbb{R}$, for all $a < b$. For $a, b$ such that $\mu(\{a\}) = \mu(\{b\}) = 0$, the above equality implies that $\mu((a, b)) = \int_a^b f(x)\, dx$. The claim now follows by the $\pi$-$\lambda$-theorem.
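Corollary 8.5 suggests a direct numerical recipe: truncate the integral over $\mathbb{R}$ and evaluate it with a Riemann sum. A sketch, assuming NumPy, for $\varphi(t) = e^{-t^2/2}$, which is integrable and whose density should come out as the $N(0,1)$ density (truncation range and grid size are arbitrary choices):

    import numpy as np

    phi = lambda t: np.exp(-t**2 / 2)           # integrable ch. function of N(0, 1)

    t = np.linspace(-40.0, 40.0, 20_001)        # truncation of the integral over R
    dt = t[1] - t[0]

    def f(x):
        """Riemann-sum version of f(x) = (1/2pi) int exp(-itx) phi(t) dt."""
        return ((np.exp(-1j * t * x) * phi(t)).sum() * dt / (2 * np.pi)).real

    x = 1.3
    print(f(x))                                     # ~ 0.1714
    print(np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))   # N(0, 1) density at x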
Example 8.6. Here is a list of some common distributions and the corresponding characteristic functions:

1. Continuous distributions.

   Name                    Parameters                              Density $f_X(x)$                                                                Ch. function $\varphi_X(t)$
   1. Uniform              $a < b$                                 $\frac{1}{b-a} 1_{[a,b]}(x)$                                                    $\frac{e^{itb} - e^{ita}}{it(b-a)}$
   2. Symmetric Uniform    $a > 0$                                 $\frac{1}{2a} 1_{[-a,a]}(x)$                                                    $\frac{\sin(at)}{at}$
   3. Normal               $\mu \in \mathbb{R}$, $\sigma > 0$      $\frac{1}{\sqrt{2\pi\sigma^2}} \exp\big(-\frac{(x-\mu)^2}{2\sigma^2}\big)$      $\exp(i\mu t - \frac{1}{2}\sigma^2 t^2)$
   4. Exponential          $\lambda > 0$                           $\lambda \exp(-\lambda x) 1_{[0,\infty)}(x)$                                    $\frac{\lambda}{\lambda - it}$
   5. Double Exponential   $\lambda > 0$                           $\frac{1}{2} \lambda \exp(-\lambda |x|)$                                        $\frac{\lambda^2}{\lambda^2 + t^2}$
   6. Cauchy               $\mu \in \mathbb{R}$, $\gamma > 0$      $\frac{\gamma}{\pi(\gamma^2 + (x-\mu)^2)}$                                      $\exp(i\mu t - \gamma |t|)$

2. Discrete distributions.

   Name                    Parameters            Distribution $\mu_X$                                                        Ch. function $\varphi_X(t)$
   7. Dirac                $c \in \mathbb{R}$    $\delta_c$                                                                  $\exp(itc)$
   8. Biased Coin-toss     $p \in (0,1)$         $p\delta_1 + (1-p)\delta_{-1}$                                              $\cos(t) + (2p-1) i \sin(t)$
   9. Geometric            $p \in (0,1)$         $\sum_{n \in \mathbb{N}_0} p^n (1-p) \delta_n$                              $\frac{1-p}{1 - p e^{it}}$
   10. Poisson             $\lambda > 0$         $\sum_{n \in \mathbb{N}_0} e^{-\lambda} \frac{\lambda^n}{n!} \delta_n$      $\exp(\lambda(e^{it} - 1))$

3. A singular distribution.

   Name                    Ch. function $\varphi_X(t)$
   11. Cantor              $e^{it/2} \prod_{k=1}^{\infty} \cos(t/3^k)$
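Any row of the table can be checked against a simulated sample. A sketch, assuming NumPy, for the Poisson entry (parameter value, test point and sample size arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    lam, t = 2.5, 0.9                           # arbitrary parameter and test point

    # Entry 10 of the table: Poisson(lam) has phi(t) = exp(lam * (e^{it} - 1))
    closed_form = np.exp(lam * (np.exp(1j * t) - 1))
    sample = rng.poisson(lam, size=400_000)
    estimate = np.exp(1j * t * sample).mean()

    print(closed_form, estimate)                # agree up to sampling error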
Tail behavior
We continue by describing several methods one can use to extract useful information about the tails of the underlying probability distribution from a characteristic function.
Proposition 8.7. Let $X$ be a random variable. If $E[|X|^n] < \infty$, then $\frac{d^n}{dt^n} \varphi_X(t)$ exists for all $t$, and
$$\frac{d^n}{dt^n} \varphi_X(t) = E[e^{itX} (iX)^n].$$
In particular,
$$E[X^n] = (-i)^n \frac{d^n}{dt^n} \varphi_X(0).$$
Proof. We give the proof in the case $n = 1$ and leave the general case to the reader:
$$\lim_{h \to 0} \frac{\varphi(h) - \varphi(0)}{h} = \lim_{h \to 0} \int_{\mathbb{R}} \frac{e^{ihx} - 1}{h}\, \mu(dx) = \int_{\mathbb{R}} \lim_{h \to 0} \frac{e^{ihx} - 1}{h}\, \mu(dx) = \int_{\mathbb{R}} ix\, \mu(dx),$$
where the passage of the limit under the integral sign is justified by the dominated convergence theorem which, in turn, can be used since
$$\Big| \frac{e^{ihx} - 1}{h} \Big| \le |x|, \quad\text{and}\quad \int_{\mathbb{R}} |x|\, \mu(dx) = E[|X|] < \infty.$$
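Proposition 8.7 also works well numerically: a finite-difference approximation of $\varphi'(0)$, multiplied by $-i$, recovers the mean. A sketch, assuming NumPy, using the exponential entry of Example 8.6 ($\lambda$ and the step size $h$ are arbitrary):

    import numpy as np

    lam = 2.0
    phi = lambda t: lam / (lam - 1j * t)        # exponential(lam), Example 8.6

    # E[X] = -i * phi'(0); approximate phi'(0) by a central difference
    h = 1e-5
    d_phi = (phi(h) - phi(-h)) / (2 * h)
    print((-1j * d_phi).real)                   # ~ 0.5 = 1/lam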
Remark 8.8.
1. It can be shown that, for $n$ even, the existence of $\frac{d^n}{dt^n} \varphi_X(0)$ (in the appropriate sense) implies the finiteness of the $n$-th moment $E[|X|^n]$.
2. When $n$ is odd, it can happen that $\frac{d^n}{dt^n} \varphi_X(0)$ exists, but $E[|X|^n] = \infty$; see Problem 8.6.
Finer estimates of the tails of a probability distribution can be obtained by a finer analysis of the behavior of $\varphi$ around $0$:

Proposition 8.9. Let $\mu$ be a probability measure on $\mathcal{B}(\mathbb{R})$, and let $\varphi = \varphi_\mu$ be its characteristic function. Then, for $\varepsilon > 0$ we have
$$\mu([-\tfrac{2}{\varepsilon}, \tfrac{2}{\varepsilon}]^c) \le \frac{1}{\varepsilon} \int_{-\varepsilon}^{\varepsilon} (1 - \varphi(t))\, dt.$$
Proof. Let $X$ be a random variable with distribution $\mu$. We start by using Fubini's theorem to get
$$\frac{1}{2\varepsilon} \int_{-\varepsilon}^{\varepsilon} (1 - \varphi(t))\, dt = \frac{1}{2\varepsilon} E\Big[ \int_{-\varepsilon}^{\varepsilon} (1 - e^{itX})\, dt \Big] = \frac{1}{\varepsilon} E\Big[ \int_0^{\varepsilon} (1 - \cos(tX))\, dt \Big] = E\Big[ 1 - \frac{\sin(\varepsilon X)}{\varepsilon X} \Big].$$
It remains to observe that $1 - \frac{\sin(x)}{x} \ge 0$ and $1 - \frac{\sin(x)}{x} \ge 1 - \frac{1}{|x|}$ for all $x$. Therefore, if we use the first inequality on $[-2, 2]$ and the second one on $[-2, 2]^c$, we get $1 - \frac{\sin(x)}{x} \ge \frac{1}{2} 1_{\{|x| > 2\}}$, so that
$$\frac{1}{2\varepsilon} \int_{-\varepsilon}^{\varepsilon} (1 - \varphi(t))\, dt \ge \tfrac{1}{2} P[|\varepsilon X| > 2] = \tfrac{1}{2} \mu([-\tfrac{2}{\varepsilon}, \tfrac{2}{\varepsilon}]^c).$$
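The inequality of Proposition 8.9 is easy to sanity-check numerically. A sketch, assuming NumPy, for $\mu = N(0,1)$ with $\varphi(t) = e^{-t^2/2}$ ($\varepsilon$ arbitrary; the exact Gaussian tail is expressed through the complementary error function):

    import numpy as np
    from math import erfc, sqrt

    eps = 0.5                                   # arbitrary epsilon

    # Left side: mu([-2/eps, 2/eps]^c) = 2 P[Z > 2/eps] for mu = N(0, 1)
    lhs = erfc((2 / eps) / sqrt(2.0))

    # Right side: (1/eps) * integral of (1 - phi(t)) over [-eps, eps],
    # approximated by a Riemann sum
    t = np.linspace(-eps, eps, 100_001)
    dt = t[1] - t[0]
    rhs = (1 - np.exp(-t**2 / 2)).sum() * dt / eps

    print(lhs, rhs)                             # lhs <= rhs, as the proposition asserts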
Problem 8.2. Use the inequality of Proposition 8.9 to show that if $\varphi(t) = 1 + O(|t|^\alpha)$ for some $\alpha > 0$, then $\int_{\mathbb{R}} |x|^\beta\, \mu(dx) < \infty$, for all $\beta < \alpha$. Give an example where $\int_{\mathbb{R}} |x|^\alpha\, \mu(dx) = \infty$.

Note: "$f(t) = g(t) + O(h(t))$" means that, for some $\delta > 0$, we have $\sup_{|t| \le \delta} \frac{|f(t) - g(t)|}{h(t)} < \infty$.

Problem 8.3 (Riemann-Lebesgue theorem). Suppose that $\mu \ll \lambda$. Show that
$$\lim_{t \to \infty} \varphi_\mu(t) = \lim_{t \to -\infty} \varphi_\mu(t) = 0.$$
Hint: Use (and prove) the fact that $f \in \mathcal{L}^1_+(\mathbb{R})$ can be approximated in $\mathcal{L}^1(\mathbb{R})$ by a function of the form $\sum_{k=1}^{n} \alpha_k 1_{[a_k, b_k]}$.
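The dichotomy in Problem 8.3 is visible numerically: characteristic functions of absolutely continuous laws decay at infinity, while, e.g., a Dirac mass gives $|\varphi(t)| = 1$ for all $t$. A sketch, assuming NumPy, using the closed forms from Example 8.6 (parameters arbitrary):

    import numpy as np

    t = np.array([10.0, 100.0, 1000.0])

    # mu << lambda (exponential density): |phi(t)| = 1/sqrt(1 + t^2) -> 0
    print(np.abs(1.0 / (1.0 - 1j * t)))

    # no density (Dirac mass at c = 2): |phi(t)| = 1 for all t, no decay
    print(np.abs(np.exp(1j * t * 2.0)))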
The continuity theorem

Theorem 8.10 (Continuity theorem). Let $\{\mu_n\}_{n \in \mathbb{N}}$ be a sequence of probability measures on $\mathcal{B}(\mathbb{R})$, and let $\{\varphi_n\}_{n \in \mathbb{N}}$ be the sequence of their characteristic functions. Suppose that there exists a function $\varphi : \mathbb{R} \to \mathbb{C}$ such that
1. $\varphi_n(t) \to \varphi(t)$, for all $t \in \mathbb{R}$, and
2. $\varphi$ is continuous at $t = 0$.
Then $\varphi$ is the characteristic function of a probability measure $\mu$ on $\mathcal{B}(\mathbb{R})$, and $\mu_n \overset{w}{\to} \mu$.
Proof. We start by showing that the continuity of the limit $\varphi$ implies the tightness of $\{\mu_n\}_{n \in \mathbb{N}}$. Given $\varepsilon > 0$, there exists $\delta > 0$ such that $|1 - \varphi(t)| \le \varepsilon/2$ for $|t| \le \delta$. By the dominated convergence theorem we have
$$\limsup_{n \to \infty} \mu_n([-\tfrac{2}{\delta}, \tfrac{2}{\delta}]^c) \le \limsup_{n \to \infty} \frac{1}{\delta} \int_{-\delta}^{\delta} (1 - \varphi_n(t))\, dt = \frac{1}{\delta} \int_{-\delta}^{\delta} (1 - \varphi(t))\, dt \le \varepsilon.$$
By taking an even smaller $\delta' > 0$, we can guarantee that
$$\sup_{n \in \mathbb{N}} \mu_n([-\tfrac{2}{\delta'}, \tfrac{2}{\delta'}]^c) \le \varepsilon,$$
which, together with the arbitrariness of $\varepsilon > 0$, implies that $\{\mu_n\}_{n \in \mathbb{N}}$ is tight.

Let $\{\mu_{n_k}\}_{k \in \mathbb{N}}$ be a weakly convergent subsequence of $\{\mu_n\}_{n \in \mathbb{N}}$ (one exists by tightness), and let $\mu$ be its limit. Since $\varphi_{n_k} \to \varphi$, we conclude that $\varphi$ is the characteristic function of $\mu$. It remains to show that the whole sequence converges to $\mu$ weakly. This follows, however, directly from Problem 7.4, since any convergent subsequence $\{\mu_{n_k}\}_{k \in \mathbb{N}}$ has the same limit $\mu$.
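Theorem 8.10 is the engine behind central-limit-type results: one checks pointwise convergence of the characteristic functions and continuity of the limit at $0$. A sketch, assuming NumPy, for normalized sums of independent symmetric uniforms on $[-1, 1]$ (so $\varphi(t) = \sin(t)/t$ by Example 8.6, and the variance is $1/3$), whose characteristic functions converge to $e^{-t^2/2}$:

    import numpy as np

    t = np.linspace(-3.0, 3.0, 7)

    for n in (1, 10, 100, 1000):
        s = t * np.sqrt(3.0 / n)                # scaling: Var(Uniform[-1,1]) = 1/3
        phi_n = np.sinc(s / np.pi) ** n         # np.sinc(x) = sin(pi x)/(pi x)
        print(n, np.max(np.abs(phi_n - np.exp(-t**2 / 2))))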
Problem 8.4. Let $\varphi$ be a characteristic function of some probability measure $\mu$ on $\mathcal{B}(\mathbb{R})$. Show that $\hat{\varphi}(t) = e^{\varphi(t) - 1}$ is also a characteristic function of some probability measure $\hat{\mu}$ on $\mathcal{B}(\mathbb{R})$.
Additional Problems

Problem 8.5 (Atoms from the characteristic function). Let $\mu$ be a probability measure on $\mathcal{B}(\mathbb{R})$, and let $\varphi = \varphi_\mu$ be its characteristic function.
1. Show that $\mu(\{a\}) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} e^{-ita} \varphi(t)\, dt$.
2. Show that if $\lim_{t \to \infty} |\varphi(t)| = \lim_{t \to -\infty} |\varphi(t)| = 0$, then $\mu$ has no atoms.
3. Show that the converse of (2) is false. Hint: Prove that $|\varphi(t_n)| = 1$ along a suitably chosen sequence $t_n \to \infty$, where $\varphi$ is the characteristic function of the Cantor distribution.

Problem 8.6 (Existence of $\varphi_X'(0)$ does not imply that $X \in \mathcal{L}^1$). Let $X$ be a random variable which takes values in $\mathbb{Z} \setminus \{-2, -1, 0, 1, 2\}$ with
$$P[X = k] = P[X = -k] = \frac{C}{k^2 \log(k)}, \quad \text{for } k = 3, 4, \ldots,$$
where $C = \frac{1}{2} \big( \sum_{k \ge 3} \frac{1}{k^2 \log(k)} \big)^{-1} \in (0, \infty)$. Show that $\varphi_X'(0) = 0$, but $X \not\in \mathcal{L}^1$. Hint: Argue that, in order to establish that $\varphi_X'(0) = 0$, it is enough to show that
$$\lim_{h \to 0} \frac{1}{h} \sum_{k \ge 3} \frac{\cos(hk) - 1}{k^2 \log(k)} = 0.$$
Then split the sum at $k$ close to $2/h$ and use (and prove) the inequality $|\cos(x) - 1| \le \min(x^2/2, x)$. Bounding sums by integrals may help, too.
Problem 8.7 (Multivariate characteristic functions). Let $X = (X_1, \ldots, X_n)$ be a random vector. The characteristic function $\varphi = \varphi_X : \mathbb{R}^n \to \mathbb{C}$ is given by
$$\varphi(t_1, t_2, \ldots, t_n) = E\Big[ \exp\Big( i \sum_{k=1}^{n} t_k X_k \Big) \Big].$$
We will also use the shortcut $t$ for $(t_1, \ldots, t_n)$ and $t \cdot X$ for the random variable $\sum_{k=1}^{n} t_k X_k$. Prove the following statements:

1. Random variables $X$ and $Y$ are independent if and only if $\varphi_{(X,Y)}(t_1, t_2) = \varphi_X(t_1) \varphi_Y(t_2)$ for all $t_1, t_2 \in \mathbb{R}$.

2. Random vectors $X^1$ and $X^2$ have the same distribution if and only if the random variables $t \cdot X^1$ and $t \cdot X^2$ have the same distribution for all $t \in \mathbb{R}^n$. (This fact is known as Wald's device.) Note: Take for granted the following statement (the proof of which is similar to the proof of the one-dimensional case): Suppose that $X^1$ and $X^2$ are random vectors with $\varphi_{X^1}(t) = \varphi_{X^2}(t)$ for all $t \in \mathbb{R}^n$. Then $X^1$ and $X^2$ have the same distribution, i.e., $\mu_{X^1} = \mu_{X^2}$.

An $n$-dimensional random vector $X$ is said to be Gaussian (or, to have the multivariate normal distribution) if there exist a vector $\mu \in \mathbb{R}^n$ and a symmetric positive semi-definite matrix $\Sigma \in \mathbb{R}^{n \times n}$ such that
$$\varphi_X(t) = \exp(i\, t \cdot \mu - \tfrac{1}{2} t^\tau \Sigma t),$$
where $t$ is interpreted as a column vector and $(\cdot)^\tau$ denotes transposition. This is denoted as $X \sim N(\mu, \Sigma)$. $X$ is said to be non-degenerate if $\Sigma$ is positive definite.

3. Show that a random vector $X$ is Gaussian if and only if the random variable $t \cdot X$ is normally distributed (with some mean and variance) for each $t \in \mathbb{R}^n$. Note: Be careful; nothing in the second statement tells you what the mean and variance of $t \cdot X$ are.

4. Let $X = (X_1, X_2, \ldots, X_n)$ be a Gaussian random vector. Show that $X_k$ and $X_l$, $k \ne l$, are independent if and only if they are uncorrelated.
5. Construct a random vector $(X, Y)$ such that both $X$ and $Y$ are normally distributed, but $X = (X, Y)$ is not Gaussian.

6. Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector consisting of $n$ independent random variables with $X_i \sim N(0, 1)$. Let $\Sigma \in \mathbb{R}^{n \times n}$ be a given positive semi-definite symmetric matrix, and $\mu \in \mathbb{R}^n$ a given vector. Show that there exists an affine transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ such that the random vector $T(X)$ is Gaussian with $T(X) \sim N(\mu, \Sigma)$. (A numerical sketch of one such construction follows this problem.)

7. Find a necessary and sufficient condition on $\mu$ and $\Sigma$ such that the converse of the previous part holds true: for a Gaussian random vector $X \sim N(\mu, \Sigma)$, there exists an affine transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ such that $T(X)$ has independent components with the $N(0,1)$-distribution (i.e., $T(X) \sim N(0, I)$, where $I$ is the identity matrix).
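For part 6., one standard construction (ours; the problem itself only asks for existence) takes $T(x) = \mu + Ax$ with $A A^\tau = \Sigma$, e.g. a Cholesky factor when $\Sigma$ is positive definite. A sketch, assuming NumPy (a merely semi-definite $\Sigma$ would need, e.g., an eigendecomposition instead):

    import numpy as np

    rng = np.random.default_rng(4)

    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])              # symmetric positive definite

    A = np.linalg.cholesky(Sigma)               # A @ A.T == Sigma
    X = rng.standard_normal((100_000, 2))       # i.i.d. N(0, 1) components
    Y = mu + X @ A.T                            # T(X) = mu + A X, so Y ~ N(mu, Sigma)

    print(Y.mean(axis=0))                       # ~ mu
    print(np.cov(Y.T))                          # ~ Sigma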
Problem 8.8 (Slutsky's Theorem). Let $X$, $Y$, $\{X_n\}_{n \in \mathbb{N}}$ and $\{Y_n\}_{n \in \mathbb{N}}$ be random variables defined on the same probability space, such that
$$X_n \overset{D}{\to} X \text{ and } Y_n \overset{D}{\to} Y. \tag{8.3}$$

1. Show that it is not necessarily true that $X_n + Y_n \overset{D}{\to} X + Y$. For that matter, we do not necessarily have $(X_n, Y_n) \overset{D}{\to} (X, Y)$ (where the pairs are considered as random elements in the metric space $\mathbb{R}^2$). A simulated counterexample follows the problem.

2. If, in addition to (8.3), there exists a constant $c \in \mathbb{R}$ such that $P[Y = c] = 1$, show that $g(X_n, Y_n) \overset{D}{\to} g(X, c)$, for any continuous function $g : \mathbb{R}^2 \to \mathbb{R}$. Hint: It is enough to show that $(X_n, Y_n) \overset{D}{\to} (X, c)$. Use Problem 8.7.
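For part 1., a standard counterexample (ours, as an illustration only) takes $X_n = Z$ and $Y_n = -Z$ for a single $Z \sim N(0,1)$: each sequence converges in distribution to $N(0,1)$, but $X_n + Y_n \equiv 0$, while the sum of two independent $N(0,1)$ limits would be $N(0,2)$. A sketch, assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(5)
    Z = rng.standard_normal(100_000)

    # X_n = Z and Y_n = -Z each converge in distribution to N(0, 1),
    # but X_n + Y_n = 0 for every n, not N(0, 2).
    Xn, Yn = Z, -Z
    print(np.var(Xn + Yn))                      # 0.0, not 2.0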
Problem 8.9 (Convergence of a normal sequence).
1. Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of normally-distributed random variables converging weakly towards a random variable $X$. Show that $X$ must be a normal random variable itself. Hint: Use the following fact: for a sequence $\{\mu_n\}_{n \in \mathbb{N}}$ of real numbers, the statements (a) $\mu_n \to \mu \in \mathbb{R}$ and (b) $\exp(it\mu_n) \to \exp(it\mu)$ for all $t$, are equivalent. You don't need to prove it, but feel free to try.
2. Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of normal random variables such that $X_n \overset{a.s.}{\to} X$. Show that $X_n \overset{\mathcal{L}^p}{\to} X$ for all $p \ge 1$.
Last Updated: December 8, 2013