Lecture 8: Characteristic Functions
Course: Theory of Probability I
Term: Fall 2013
Instructor: Gordan Zitkovic
Last Updated: December 8, 2013

First properties

A characteristic function is simply the Fourier transform, in probabilistic language. Since we will be integrating complex-valued functions, we define

  ∫ f dµ = ∫ ℜf dµ + i ∫ ℑf dµ

(both integrals on the right need to exist), where ℜf and ℑf denote the real and the imaginary parts of a function f : R → C. The reader will easily figure out which properties of the integral transfer from the real case.

Definition 8.1. The characteristic function of a probability measure µ on B(R) is the function ϕ_µ : R → C given by

  ϕ_µ(t) = ∫ e^{itx} µ(dx).

When we speak of the characteristic function ϕ_X of a random variable X, we have the characteristic function ϕ_{µ_X} of its distribution µ_X in mind. Note, moreover, that ϕ_X(t) = E[e^{itX}]. While difficult to visualize, characteristic functions can be used to learn a lot about the random variables they correspond to. We start with some properties which follow directly from the definition:

Proposition 8.2. Let X and Y be random variables, and let {X_n}_{n∈N} be a sequence of random variables.

1. ϕ_X(0) = 1 and |ϕ_X(t)| ≤ 1, for all t.
2. ϕ_{−X}(t) = \overline{ϕ_X(t)}, where the bar denotes complex conjugation.
3. ϕ_X is uniformly continuous.
4. If X and Y are independent, then ϕ_{X+Y} = ϕ_X ϕ_Y.
5. For all t_1 < t_2 < ⋯ < t_n, the matrix A = (a_{jk})_{1≤j,k≤n} given by a_{jk} = ϕ_X(t_j − t_k) is Hermitian and positive semidefinite, i.e., A* = A and ξ* A ξ ≥ 0 for any ξ ∈ C^n.
6. If X_n →D X (convergence in distribution), then ϕ_{X_n}(t) → ϕ_X(t), for each t ∈ R.

Note: We do not prove (or use) it in these notes, but it can be shown that a function ϕ : R → C, continuous at the origin with ϕ(0) = 1, is a characteristic function of some probability measure µ on B(R) if and only if it is positive semidefinite, i.e., if and only if it satisfies part 5. of Proposition 8.2.
This is known as Bochner's theorem.

Proof.
1. Immediate.
2. \overline{e^{itx}} = e^{−itx}.
3. We have |ϕ_X(t) − ϕ_X(s)| = |∫ (e^{itx} − e^{isx}) µ(dx)| ≤ h(t − s), where h(u) = ∫ |e^{iux} − 1| µ(dx). Since |e^{iux} − 1| ≤ 2, the dominated convergence theorem implies that lim_{u→0} h(u) = 0, and, so, ϕ_X is uniformly continuous.
4. Independence of X and Y implies the independence of exp(itX) and exp(itY). Therefore,

  ϕ_{X+Y}(t) = E[e^{it(X+Y)}] = E[e^{itX} e^{itY}] = E[e^{itX}] E[e^{itY}] = ϕ_X(t) ϕ_Y(t).

5. The matrix A is Hermitian by 2. above. To see that it is positive semidefinite, note that a_{jk} = E[e^{it_j X} e^{−it_k X}], and so

  ∑_{j=1}^n ∑_{k=1}^n ξ_j \overline{ξ_k} a_{jk} = E[ (∑_{j=1}^n ξ_j e^{it_j X}) \overline{(∑_{k=1}^n ξ_k e^{it_k X})} ] = E[ |∑_{j=1}^n ξ_j e^{it_j X}|² ] ≥ 0.

6. The functions x ↦ cos(tx) and x ↦ sin(tx) are bounded and continuous, so it suffices to apply the definition of weak convergence.

Here is a simple problem you can use to test your understanding of the definitions:

Problem 8.1. Let µ and ν be two probability measures on B(R), and let ϕ_µ and ϕ_ν be their characteristic functions. Show that Parseval's identity holds:

  ∫_R e^{−its} ϕ_µ(t) ν(dt) = ∫_R ϕ_ν(t − s) µ(dt), for all s ∈ R.

Our next result shows that µ can be recovered from its characteristic function ϕ_µ:

Theorem 8.3 (Inversion theorem). Let µ be a probability measure on B(R), and let ϕ = ϕ_µ be its characteristic function. Then, for a < b ∈ R, we have

  µ((a, b)) + ½ µ({a, b}) = lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) ϕ(t) dt.  (8.1)

Proof. We start by picking a < b and noting that

  (e^{−ita} − e^{−itb})/(it) = ∫_a^b e^{−ity} dy,

so that, by Fubini's theorem, the integral in (8.1) is well-defined. Set

  F(a, b, T) = ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) ϕ(t) dt = ∫_{[−T,T]×[a,b]} exp(−ity) ϕ(t) dy dt.

Another use of Fubini's theorem yields

  F(a, b, T) = ∫_R ∫_{[−T,T]×[a,b]} exp(−ity) exp(itx) dy dt µ(dx) = ∫_R ∫_{[−T,T]×[a,b]} exp(−it(y − x)) dy dt µ(dx)
  = ∫_R ∫_{−T}^{T} (1/it)(e^{−it(a−x)} − e^{−it(b−x)}) dt µ(dx).

Set

  f(a, b, T; x) = ∫_{−T}^{T} (1/it)(e^{−it(a−x)} − e^{−it(b−x)}) dt  and  K(T; c) = ∫_0^T (sin(ct)/t) dt,

and note that, since cos is an even and sin an odd function, we have

  f(a, b, T; x) = 2 ∫_0^T ( sin((x−a)t)/t − sin((x−b)t)/t ) dt = 2K(T; x − a) − 2K(T; x − b).

Note: The integral ∫_{−T}^{T} (1/it) exp(−it(a − x)) dt on its own is not defined; we really need to work with the full f(a, b, T; x) to get the right cancellation.

Since

  K(T; c) = ∫_0^T (sin(ct)/(ct)) d(ct) = ∫_0^{cT} (sin(s)/s) ds = { K(cT; 1), c > 0; 0, c = 0; −K(|c|T; 1), c < 0 },  (8.2)

Problem 5.11 implies that

  lim_{T→∞} K(T; c) = { π/2, c > 0; 0, c = 0; −π/2, c < 0 },

and so

  lim_{T→∞} f(a, b, T; x) = { 0, x ∈ [a, b]^c; π, x = a or x = b; 2π, a < x < b }.

Observe first that the function T ↦ K(T; 1) is continuous on [0, ∞) and has a finite limit as T → ∞, so that sup_{T≥0} |K(T; 1)| < ∞. Furthermore, (8.2) implies that |K(T; c)| ≤ sup_{T≥0} |K(T; 1)| for any c ∈ R and T ≥ 0, so that sup{ |f(a, b, T; x)| : x ∈ R, T ≥ 0 } < ∞. Therefore, we can use the dominated convergence theorem to get

  lim_{T→∞} (1/2π) F(a, b, T) = lim_{T→∞} (1/2π) ∫ f(a, b, T; x) µ(dx) = (1/2π) ∫ lim_{T→∞} f(a, b, T; x) µ(dx) = ½ µ({a}) + µ((a, b)) + ½ µ({b}).

Corollary 8.4. For probability measures µ_1 and µ_2 on B(R), the equality ϕ_{µ_1} = ϕ_{µ_2} implies that µ_1 = µ_2.

Proof. By Theorem 8.3, we have µ_1((a, b)) = µ_2((a, b)) for all a, b ∈ C, where C is the set of all x ∈ R such that µ_1({x}) = µ_2({x}) = 0. Since C^c is at most countable, it is straightforward to see that the family {(a, b) : a, b ∈ C} of intervals is a π-system which generates B(R).

Corollary 8.5. Suppose that ∫_R |ϕ_µ(t)| dt < ∞. Then µ ≪ λ and dµ/dλ is a bounded and continuous function given by dµ/dλ = f, where

  f(x) = (1/2π) ∫_R e^{−itx} ϕ_µ(t) dt, for x ∈ R.

Proof. Since ϕ_µ is integrable and |e^{−itx}| = 1, f is well defined.
For a < b we have

  ∫_a^b f(x) dx = (1/2π) ∫_a^b ∫_R e^{−itx} ϕ_µ(t) dt dx
  = (1/2π) ∫_R ϕ_µ(t) ∫_a^b e^{−itx} dx dt
  = (1/2π) ∫_R ((e^{−ita} − e^{−itb})/(it)) ϕ_µ(t) dt
  = lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) ϕ_µ(t) dt
  = µ((a, b)) + ½ µ({a, b}),

by Theorem 8.3, where the use of Fubini's theorem above is justified by the fact that the function (t, x) ↦ e^{−itx} ϕ_µ(t) is integrable on [a, b] × R, for all a < b. For a, b such that µ({a}) = µ({b}) = 0, the above equalities imply that µ((a, b)) = ∫_a^b f(x) dx. The claim now follows by the π-λ theorem.

Example 8.6. Here is a list of some common distributions and the corresponding characteristic functions:

1. Continuous distributions.

  #  Name                Parameters      Density f_X(x)                      Ch. function ϕ_X(t)
  1  Uniform             a < b           (1/(b−a)) 1_{[a,b]}(x)              (e^{itb} − e^{ita}) / (it(b−a))
  2  Symmetric uniform   a > 0           (1/2a) 1_{[−a,a]}(x)                sin(at)/(at)
  3  Normal              µ ∈ R, σ > 0    (1/√(2πσ²)) exp(−(x−µ)²/(2σ²))      exp(iµt − ½σ²t²)
  4  Exponential         λ > 0           λ exp(−λx) 1_{[0,∞)}(x)             λ/(λ − it)
  5  Double exponential  λ > 0           (λ/2) exp(−λ|x|)                    λ²/(λ² + t²)
  6  Cauchy              µ ∈ R, γ > 0    γ/(π(γ² + (x−µ)²))                  exp(iµt − γ|t|)

2. Discrete distributions.

  #   Name              Parameters   Distribution µ_X                 Ch. function ϕ_X(t)
  7   Dirac             c ∈ R        δ_c                              exp(itc)
  8   Biased coin-toss  p ∈ (0, 1)   pδ_1 + (1−p)δ_{−1}               cos(t) + (2p−1) i sin(t)
  9   Geometric         p ∈ (0, 1)   ∑_{n∈N_0} p^n (1−p) δ_n          (1−p)/(1 − p e^{it})
  10  Poisson           λ > 0        ∑_{n∈N_0} e^{−λ} (λ^n/n!) δ_n    exp(λ(e^{it} − 1))

3. A singular distribution.

  #   Name    Ch. function ϕ_X(t)
  11  Cantor  e^{it/2} ∏_{k=1}^{∞} cos(t/3^k)

Tail behavior

We continue by describing several methods one can use to extract useful information about the tails of the underlying probability distribution from a characteristic function.

Proposition 8.7. Let X be a random variable. If E[|X|^n] < ∞, then (d^n/dt^n) ϕ_X(t) exists for all t, and

  (d^n/dt^n) ϕ_X(t) = E[e^{itX} (iX)^n].

In particular,

  E[X^n] = (−i)^n (d^n/dt^n) ϕ_X(0).
Proof. We give the proof in the case n = 1 and leave the general case to the reader:

  lim_{h→0} (ϕ(h) − ϕ(0))/h = lim_{h→0} ∫_R ((e^{ihx} − 1)/h) µ(dx) = ∫_R lim_{h→0} ((e^{ihx} − 1)/h) µ(dx) = ∫_R ix µ(dx),

where the passage of the limit under the integral sign is justified by the dominated convergence theorem which, in turn, can be used since |(e^{ihx} − 1)/h| ≤ |x| and ∫_R |x| µ(dx) = E[|X|] < ∞.

Remark 8.8.
1. It can be shown that for n even, the existence of (d^n/dt^n) ϕ_X(0) (in the appropriate sense) implies the finiteness of the n-th moment E[|X|^n].
2. When n is odd, it can happen that (d^n/dt^n) ϕ_X(0) exists, but E[|X|^n] = ∞; see Problem 8.6.

Finer estimates of the tails of a probability distribution can be obtained by finer analysis of the behavior of ϕ around 0:

Proposition 8.9. Let µ be a probability measure on B(R), and let ϕ = ϕ_µ be its characteristic function. Then, for ε > 0 we have

  µ([−2/ε, 2/ε]^c) ≤ (1/ε) ∫_{−ε}^{ε} (1 − ϕ(t)) dt.

Proof. Let X be a random variable with distribution µ. We start by using Fubini's theorem to get

  (1/2ε) ∫_{−ε}^{ε} (1 − ϕ(t)) dt = (1/2ε) E[ ∫_{−ε}^{ε} (1 − e^{itX}) dt ] = (1/ε) E[ ∫_0^{ε} (1 − cos(tX)) dt ] = E[ 1 − sin(εX)/(εX) ].

It remains to observe that 1 − sin(x)/x ≥ 0 and 1 − sin(x)/x ≥ 1 − 1/|x| for all x. Therefore, if we use the first inequality on [−2, 2] and the second one on [−2, 2]^c, we get 1 − sin(x)/x ≥ ½ 1_{|x|>2}, so that

  (1/2ε) ∫_{−ε}^{ε} (1 − ϕ(t)) dt ≥ ½ P[|εX| > 2] = ½ µ([−2/ε, 2/ε]^c).

Problem 8.2. Use the inequality of Proposition 8.9 to show that if ϕ(t) = 1 + O(|t|^α) for some α > 0, then ∫_R |x|^β µ(dx) < ∞ for all β < α. Give an example where ∫_R |x|^α µ(dx) = ∞.

Note: "f(t) = g(t) + O(h(t))" means that, for some δ > 0, we have sup_{|t|≤δ} |f(t) − g(t)|/h(t) < ∞.

Problem 8.3 (Riemann-Lebesgue theorem). Suppose that µ ≪ λ. Show that

  lim_{t→∞} ϕ_µ(t) = lim_{t→−∞} ϕ_µ(t) = 0.
Hint: Use (and prove) the fact that every f ∈ L¹_+(R) can be approximated in L¹(R) by functions of the form ∑_{k=1}^n α_k 1_{[a_k, b_k]}.

The continuity theorem

Theorem 8.10 (Continuity theorem). Let {µ_n}_{n∈N} be a sequence of probability distributions on B(R), and let {ϕ_n}_{n∈N} be the sequence of their characteristic functions. Suppose that there exists a function ϕ : R → C such that

1. ϕ_n(t) → ϕ(t), for all t ∈ R, and
2. ϕ is continuous at t = 0.

Then ϕ is the characteristic function of a probability measure µ on B(R), and µ_n → µ weakly.

Proof. We start by showing that the continuity of the limit ϕ implies tightness of {µ_n}_{n∈N}. Given ε > 0, there exists δ > 0 such that |1 − ϕ(t)| ≤ ε/2 for |t| ≤ δ. By the dominated convergence theorem we have

  lim sup_{n→∞} µ_n([−2/δ, 2/δ]^c) ≤ lim sup_{n→∞} (1/δ) ∫_{−δ}^{δ} (1 − ϕ_n(t)) dt = (1/δ) ∫_{−δ}^{δ} (1 − ϕ(t)) dt ≤ ε.

By taking an even smaller δ' > 0, we can guarantee that

  sup_{n∈N} µ_n([−2/δ', 2/δ']^c) ≤ ε,

which, together with the arbitrariness of ε > 0, implies that {µ_n}_{n∈N} is tight. Let {µ_{n_k}}_{k∈N} be a convergent subsequence of {µ_n}_{n∈N}, and let µ be its limit. Since ϕ_{n_k} → ϕ, we conclude that ϕ is the characteristic function of µ. It remains to show that the whole sequence converges to µ weakly. This follows, however, directly from Problem 7.4, since any convergent subsequence {µ_{n_k}}_{k∈N} has the same limit µ.

Problem 8.4. Let ϕ be a characteristic function of some probability measure µ on B(R). Show that ϕ̂(t) = e^{ϕ(t)−1} is also a characteristic function of some probability measure µ̂ on B(R).

Additional Problems

Problem 8.5 (Atoms from the characteristic function). Let µ be a probability measure on B(R), and let ϕ = ϕ_µ be its characteristic function.

1. Show that µ({a}) = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−ita} ϕ(t) dt.
2. Show that if lim_{t→∞} |ϕ(t)| = lim_{t→−∞} |ϕ(t)| = 0, then µ has no atoms.
3. Show that the converse of (2) is false. Hint: Prove that |ϕ(t_n)| = 1 along a suitably chosen sequence t_n → ∞, where ϕ is the characteristic function of the Cantor distribution.

Problem 8.6 (Existence of ϕ'_X(0) does not imply that X ∈ L¹). Let X be a random variable which takes values in Z \ {−2, −1, 0, 1, 2} with

  P[X = k] = P[X = −k] = C/(k² log(k)), for k = 3, 4, . . . ,

where C = ½ (∑_{k≥3} 1/(k² log(k)))^{−1} ∈ (0, ∞). Show that ϕ'_X(0) = 0, but X ∉ L¹.

Hint: Argue that, in order to establish that ϕ'_X(0) = 0, it is enough to show that

  lim_{h→0} (1/h) ∑_{k≥3} (cos(hk) − 1)/(k² log(k)) = 0.

Then split the sum at k close to 2/h and use (and prove) the inequality |cos(x) − 1| ≤ min(x²/2, |x|). Bounding sums by integrals may help, too.

Problem 8.7 (Multivariate characteristic functions). Let X = (X_1, . . . , X_n) be a random vector. The characteristic function ϕ = ϕ_X : R^n → C is given by

  ϕ(t_1, t_2, . . . , t_n) = E[exp(i ∑_{k=1}^n t_k X_k)].

We will also use the shortcut t for (t_1, . . . , t_n) and t · X for the random variable ∑_{k=1}^n t_k X_k. Prove the following statements:

1. Random variables X and Y are independent if and only if ϕ_{(X,Y)}(t_1, t_2) = ϕ_X(t_1) ϕ_Y(t_2) for all t_1, t_2 ∈ R.

2. Random vectors X¹ and X² have the same distribution if and only if the random variables t · X¹ and t · X² have the same distribution for all t ∈ R^n. (This fact is known as Wald's device.)

Note: Take for granted the following statement (the proof of which is similar to the proof of the 1-dimensional case): Suppose that X¹ and X² are random vectors with ϕ_{X¹}(t) = ϕ_{X²}(t) for all t ∈ R^n. Then X¹ and X² have the same distribution, i.e., µ_{X¹} = µ_{X²}.
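The factorization in part 1. can be sanity-checked numerically for discrete distributions. The sketch below is not part of the notes (the function names are ours): it enumerates the joint pmf of two independent fair ±1 coin tosses and compares ϕ_{(X,Y)}(t_1, t_2) with ϕ_X(t_1) ϕ_Y(t_2); for p = 1/2 the marginal characteristic function reduces to cos(t), matching entry 8 of Example 8.6.

```python
import cmath

def phi_marginal(t, pmf):
    """Characteristic function E[exp(itX)] of a discrete random variable
    given by its probability mass function {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def phi_joint(t1, t2, pmf):
    """Characteristic function E[exp(i(t1*X + t2*Y))] of a discrete
    random vector (X, Y) given by its joint pmf {(x, y): probability}."""
    return sum(p * cmath.exp(1j * (t1 * x + t2 * y)) for (x, y), p in pmf.items())

# Two independent fair coin tosses with values +1/-1 (entry 8 of Example 8.6
# with p = 1/2); independence means the joint pmf is the product of marginals.
marg = {1: 0.5, -1: 0.5}
joint = {(x, y): px * py for x, px in marg.items() for y, py in marg.items()}

for t1, t2 in [(0.3, -1.2), (2.0, 0.7)]:
    lhs = phi_joint(t1, t2, joint)
    rhs = phi_marginal(t1, marg) * phi_marginal(t2, marg)
    assert abs(lhs - rhs) < 1e-12   # factorization under independence
    # for p = 1/2 the marginal is cos(t) + (2p-1)i sin(t) = cos(t)
    assert abs(phi_marginal(t1, marg) - cmath.cos(t1)) < 1e-12
```

The same enumeration with a non-product joint pmf would break the first assertion, which is the "only if" direction of the statement.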
An n-dimensional random vector X is said to be Gaussian (or, to have the multivariate normal distribution) if there exists a vector µ ∈ R^n and a symmetric positive semi-definite matrix Σ ∈ R^{n×n} such that

  ϕ_X(t) = exp(i t · µ − ½ tᵀ Σ t),

where t is interpreted as a column vector and (·)ᵀ denotes transposition. This is denoted by X ∼ N(µ, Σ). X is said to be non-degenerate if Σ is positive definite.

3. Show that a random vector X is Gaussian if and only if the random variable t · X is normally distributed (with some mean and variance) for each t ∈ R^n. Note: Be careful; nothing in the second statement tells you what the mean and variance of t · X are.

4. Let X = (X_1, X_2, . . . , X_n) be a Gaussian random vector. Show that X_k and X_l, k ≠ l, are independent if and only if they are uncorrelated.

5. Construct a random vector (X, Y) such that both X and Y are normally distributed, but X = (X, Y) is not Gaussian.

6. Let X = (X_1, X_2, . . . , X_n) be a random vector consisting of n independent random variables with X_i ∼ N(0, 1). Let Σ ∈ R^{n×n} be a given positive semi-definite symmetric matrix, and µ ∈ R^n a given vector. Show that there exists an affine transformation T : R^n → R^n such that the random vector T(X) is Gaussian with T(X) ∼ N(µ, Σ).

7. Find a necessary and sufficient condition on µ and Σ such that the converse of the previous statement holds: for a Gaussian random vector X ∼ N(µ, Σ), there exists an affine transformation T : R^n → R^n such that T(X) has independent components with the N(0, 1)-distribution (i.e., T(X) ∼ N(0, I), where I is the identity matrix).

Problem 8.8 (Slutsky's theorem). Let X, Y, {X_n}_{n∈N} and {Y_n}_{n∈N} be random variables defined on the same probability space, such that

  X_n →D X and Y_n →D Y.  (8.3)

Show that:

1. It is not necessarily true that X_n + Y_n →D X + Y.
For that matter, we do not necessarily have (X_n, Y_n) →D (X, Y) (where the pairs are considered as random elements in the metric space R²).

2. If, in addition to (8.3), there exists a constant c ∈ R such that P[Y = c] = 1, show that g(X_n, Y_n) →D g(X, c), for any continuous function g : R² → R. Hint: It is enough to show that (X_n, Y_n) →D (X, c). Use Problem 8.7.

Problem 8.9 (Convergence of a normal sequence).

1. Let {X_n}_{n∈N} be a sequence of normally-distributed random variables converging weakly towards a random variable X. Show that X must be a normal random variable itself.

2. Let {X_n}_{n∈N} be a sequence of normal random variables such that X_n → X a.s. Show that X_n → X in L^p for all p ≥ 1.

Hint: Use the following fact: for a sequence {µ_n}_{n∈N} of real numbers, the following two statements are equivalent: (a) µ_n → µ ∈ R, and (b) exp(itµ_n) → exp(itµ) for all t. You don't need to prove it, but feel free to try.
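The continuity theorem is the engine behind the central limit theorem, and the pointwise convergence of characteristic functions it requires can be watched numerically. The following sketch is an illustration of ours, not part of the notes: by part 4. of Proposition 8.2, the characteristic function of a standardized sum of n independent fair ±1 coin tosses is cos(t/√n)^n, and it approaches exp(−t²/2), the N(0, 1) entry of Example 8.6, so Theorem 8.10 yields weak convergence to the standard normal.

```python
import math

def phi_standardized_sum(t, n):
    """Characteristic function of (X_1 + ... + X_n)/sqrt(n) for independent
    fair +/-1 coin tosses: by Proposition 8.2, part 4, it equals
    phi_X(t/sqrt(n))**n, where phi_X(s) = cos(s) for a fair coin toss."""
    return math.cos(t / math.sqrt(n)) ** n

def phi_std_normal(t):
    """Characteristic function of N(0, 1), entry 3 of Example 8.6."""
    return math.exp(-0.5 * t * t)

# Pointwise convergence phi_n(t) -> exp(-t^2/2); by the continuity theorem
# this gives weak convergence to N(0, 1) (the CLT for coin tosses).
for t in (0.5, 1.0, 2.0):
    gaps = [abs(phi_standardized_sum(t, n) - phi_std_normal(t))
            for n in (10, 100, 10000)]
    assert gaps[2] < gaps[0]      # the gap shrinks as n grows
    assert gaps[2] < 1e-3         # and is already small at n = 10000
```

Note that the convergence here is of the characteristic functions at each fixed t; the limit exp(−t²/2) is continuous at 0, which is exactly the hypothesis Theorem 8.10 needs.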