Contents
* The weak law of large numbers
* The central limit theorem
* Gaussian random vectors
* Covariance matrices
* The multidimensional Gaussian law
* Multidimensional Gaussian density
* Marginal distributions
* Eigenvalues of the covariance matrix
* Uncorrelation and independence
* Linear combinations
* Conditional densities

October 11, 2011

The weak law of large numbers

Theorem. Let $X_1, X_2, \dots, X_k, \dots$ be a sequence of independent and identically distributed r.v., each having $E(X_k) = m$ and finite variance. Let
$$\overline{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}, \qquad n \ge 1.$$
Then, for all $\epsilon > 0$,
$$\lim_{n\to\infty} P\left(\left|\overline{X}_n - m\right| \ge \epsilon\right) = 0.$$

* The r.v. $\overline{X}_n$ has mean $m$:
$$E\left(\overline{X}_n\right) = E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{E(X_1) + E(X_2) + \cdots + E(X_n)}{n} = \frac{nm}{n} = m.$$

* If $\mathrm{Var}(X_k) = \sigma^2$, then the variance of $\overline{X}_n$ is $\sigma^2/n$:
$$\mathrm{Var}\left(\overline{X}_n\right) = E\left[\left(\overline{X}_n - m\right)^2\right] = \frac{1}{n^2}\, E\left[\left(\sum_{k=1}^n (X_k - m)\right)^2\right] = \frac{1}{n^2}\left[\sum_{k=1}^n E\left((X_k - m)^2\right) + 2\sum_{1\le i<j\le n} E\big((X_i - m)(X_j - m)\big)\right] = \frac{\sigma^2}{n}.$$
Notice that $E\big((X_i - m)(X_j - m)\big) = 0$ because the r.v. $X_i$ are independent and, therefore, pairwise uncorrelated.

* By Chebyshev's inequality,
$$P\left(\left|\overline{X}_n - m\right| \ge \epsilon\right) \le \frac{\sigma^2/n}{\epsilon^2}.$$
Hence, as $n \to \infty$,
$$P\left(\left|\overline{X}_n - m\right| \ge \epsilon\right) \to 0.$$

Example: probability as the limit of the relative frequency

* Let $A$ be a given event with $P(A) = p$.
* Let us repeat the random experiment $n$ times (independent repetitions).
* Let $X_k$ be the indicator of the event "$A$ happens in the $k$-th repetition". Hence, $E(X_k) = P(A) = p$.
* Then
$$\overline{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = f_n(A)$$
is the relative frequency of the event $A$.
* Therefore, for all $\epsilon > 0$,
$$P\big(|f_n(A) - p| \ge \epsilon\big) \to 0 \quad \text{as } n \to \infty,$$
or, equivalently, $P\big(|f_n(A) - p| < \epsilon\big) \to 1$ as $n \to \infty$.

Hence, in a certain sense, the relative frequency of the event $A$ converges to its probability.

The central limit theorem

Theorem. Let $X_1, X_2, \dots, X_k, \dots$ be a sequence of independent and identically distributed r.v., each with $E(X_k) = m$ and $\mathrm{Var}(X_k) = \sigma^2$. Let
$$S_n^* = \frac{1}{\sqrt{n}} \sum_{k=1}^n \frac{X_k - m}{\sigma}.$$
Then
$$\lim_{n\to\infty} F_{S_n^*}(x) = F_{N(0,1)}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2}\, dt.$$
We write $S_n^* \xrightarrow{d} N(0,1)$.

* We have $E(S_n^*) = 0$ and $\mathrm{Var}(S_n^*) = 1$.
* Notice that $S_n^*$ is the normalized arithmetic mean $\overline{X}_n$:
$$S_n^* = \frac{\overline{X}_n - m}{\sigma/\sqrt{n}}, \qquad \overline{X}_n = \frac{\sigma}{\sqrt{n}}\, S_n^* + m.$$

Covariance matrices

* The covariance matrix $K_X$ of an $n$-dimensional r.v. $X = (X_1, X_2, \dots, X_n)^t$ is the square $n \times n$ matrix defined by
$$K_X = E\left[(X - m_X)(X - m_X)^t\right] = \begin{pmatrix} k_{11} & k_{12} & \cdots & k_{1n} \\ k_{21} & k_{22} & \cdots & k_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ k_{n1} & k_{n2} & \cdots & k_{nn} \end{pmatrix},$$
where $m_X = E(X) = (m_{X_1}, m_{X_2}, \dots, m_{X_n})^t$ is the expectation vector.
* The diagonal entries of $K_X$ are $k_{ii} = E\left[(X_i - m_{X_i})^2\right] = \mathrm{Var}(X_i)$.
* For $i \ne j$, $k_{ij} = E\left[(X_i - m_{X_i})(X_j - m_{X_j})\right] = \mathrm{Cov}(X_i, X_j)$.

The matrix $K_X$ is
* symmetric: $k_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = k_{ji}$;
* positive-semidefinite: for all $z = (z_1, z_2, \dots, z_n)^t \in \mathbb{R}^n$, $z^t K_X z \ge 0$.

Moreover, the r.v. $X_1 - m_{X_1}, X_2 - m_{X_2}, \dots, X_n - m_{X_n}$ are linearly independent if and only if $K_X$ is positive-definite, that is, if and only if $z^t K_X z > 0$ for all $z \ne 0$.

Indeed, if $Y = z_1 X_1 + \cdots + z_n X_n = z^t X$, then
$$z^t K_X z = z^t E\left[(X - m_X)(X - m_X)^t\right] z = E\left[z^t (X - m_X)(X - m_X)^t z\right] = E\left[(Y - m_Y)^2\right] = \sigma_Y^2 \ge 0.$$
(Notice that $z^t m_X = \sum_{i=1}^n z_i m_{X_i} = m_Y$.)

Moreover,
$$z^t K_X z = 0 \text{ for some } z \ne 0$$
$$\iff \sigma_Y^2 = 0 \text{ for some } z \ne 0$$
$$\iff Y - m_Y = 0, \text{ with probability } 1, \text{ for some } z \ne 0$$
$$\iff \sum_{i=1}^n z_i (X_i - m_{X_i}) = 0, \text{ with probability } 1, \text{ for some } z = (z_1, z_2, \dots, z_n)^t \ne 0$$
$$\iff X_1 - m_{X_1}, X_2 - m_{X_2}, \dots, X_n - m_{X_n} \text{ are linearly dependent, with probability } 1.$$
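As a quick numerical illustration of the law of large numbers and of the covariance-matrix properties above, here is a minimal simulation sketch. It assumes NumPy is available; the particular 3-dimensional vector, the sample size and names such as `K_hat` are illustrative choices, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples of a 3-dimensional random vector X whose components are
# built from a few independent sources, so that K_X is not diagonal.
N = 100_000
Z = rng.standard_normal((N, 3))
X = np.column_stack([Z[:, 0],
                     0.5 * Z[:, 0] + Z[:, 1],
                     Z[:, 1] - 0.3 * Z[:, 2]])

# Weak law of large numbers: the sample mean approaches m_X = E(X) = (0, 0, 0).
m_hat = X.mean(axis=0)

# Sample covariance matrix, an estimate of K_X = E[(X - m_X)(X - m_X)^t].
Xc = X - m_hat
K_hat = (Xc.T @ Xc) / N

# K_hat is symmetric and positive-semidefinite: its eigenvalues are
# non-negative (up to rounding), equivalently z^t K z >= 0 for every z.
eigvals = np.linalg.eigvalsh(K_hat)
print("sample mean:", np.round(m_hat, 3))
print("eigenvalues of K_hat:", np.round(eigvals, 3))
z = rng.standard_normal(3)
print("z^t K z =", z @ K_hat @ z)   # non-negative
```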
Linear transformations

Theorem. Let $X = (X_1, X_2, \dots, X_n)^t$ be an $n$-dimensional r.v., let $A$ be an $m \times n$ real matrix, and let $Y = (Y_1, Y_2, \dots, Y_m)^t$ be the $m$-dimensional r.v. defined by $Y = AX$. Then
$$m_Y = A\, m_X, \qquad K_Y = A K_X A^t.$$

Indeed, we have
$$m_Y = E(Y) = E(AX) = A\, E(X) = A\, m_X,$$
and
$$K_Y = E\left[(Y - m_Y)(Y - m_Y)^t\right] = E\left[A (X - m_X)(X - m_X)^t A^t\right] = A\, E\left[(X - m_X)(X - m_X)^t\right] A^t = A K_X A^t.$$

Multidimensional Gaussian density

Let $X_1, X_2, \dots, X_n$ be $n$ independent Gaussian r.v. We have
$$f_X(x_1, x_2, \dots, x_n) = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\, \sigma_{X_i}}\, e^{-\frac{1}{2}\left((x_i - m_{X_i})/\sigma_{X_i}\right)^2} = \frac{1}{(2\pi)^{n/2}\, \sigma_{X_1} \sigma_{X_2} \cdots \sigma_{X_n}}\, e^{-\frac{1}{2} \sum_{i=1}^n \left((x_i - m_{X_i})/\sigma_{X_i}\right)^2}.$$
Hence,
$$f_X(x_1, x_2, \dots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{\det(K_X)}} \exp\left(-\frac{1}{2}(x - m_X)^t K_X^{-1} (x - m_X)\right),$$
where
* $x = (x_1, x_2, \dots, x_n)^t$;
* $m_X = (m_{X_1}, \dots, m_{X_n})^t$ is the expectation vector;
* $K_X = \mathrm{diag}\big(\sigma_{X_1}^2, \sigma_{X_2}^2, \dots, \sigma_{X_n}^2\big)$ is the covariance matrix. It is a diagonal matrix because the r.v. are independent and, hence, pairwise uncorrelated.

Now, let us consider a linear transformation
$$Y = AX,$$
where $A$ is a non-singular $n \times n$ matrix.
* The linear system $y = Ax$ has a unique solution $x = A^{-1} y$.
* The Jacobian determinant is $J(x_1, x_2, \dots, x_n) = \det(A)$.

Moreover,
* $m_Y = A\, m_X \implies m_X = A^{-1} m_Y$;
* $K_Y = A K_X A^t \implies K_Y^{-1} = (A^t)^{-1} K_X^{-1} A^{-1} = (A^{-1})^t K_X^{-1} A^{-1}$;
* $\det(K_Y) = \det(K_X) \det(A)^2$.

In this way, we have
$$f_Y(y_1, \dots, y_n) = \left.\frac{f_X(x_1, \dots, x_n)}{|J(x_1, \dots, x_n)|}\right|_{x = A^{-1} y} = \frac{1}{(2\pi)^{n/2} \sqrt{\det(K_X)}\, |\det(A)|} \exp\left(-\frac{1}{2}\big(A^{-1} y - A^{-1} m_Y\big)^t K_X^{-1} \big(A^{-1} y - A^{-1} m_Y\big)\right)$$
$$= \frac{1}{(2\pi)^{n/2} \sqrt{\det(K_Y)}} \exp\left(-\frac{1}{2}\big(A^{-1}(y - m_Y)\big)^t K_X^{-1}\, A^{-1}(y - m_Y)\right)$$
$$= \frac{1}{(2\pi)^{n/2} \sqrt{\det(K_Y)}} \exp\left(-\frac{1}{2}(y - m_Y)^t (A^{-1})^t K_X^{-1} A^{-1} (y - m_Y)\right)$$
$$= \frac{1}{(2\pi)^{n/2} \sqrt{\det(K_Y)}} \exp\left(-\frac{1}{2}(y - m_Y)^t K_Y^{-1} (y - m_Y)\right).$$

* Notice that this expression is analogous to the one obtained in the case of independent r.v.
* But now the covariance matrix $K_Y$ will not, in general, be a diagonal matrix.

Definition. If the random vector $X$ has probability density
$$f_X(x_1, x_2, \dots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{\det(K)}} \exp\left(-\frac{1}{2}(x - m)^t K^{-1} (x - m)\right),$$
where $x = (x_1, x_2, \dots, x_n)^t$, $m$ is a column $n \times 1$ vector, and $K$ is a square positive-semidefinite $n \times n$ matrix, we say that $X$ is an $n$-dimensional Gaussian r.v.
* We also say that the r.v. $X_1, X_2, \dots, X_n$ are jointly Gaussian.

For instance, for $n = 2$ we obtain
$$f_{XY}(x, y) = \frac{1}{2\pi \sqrt{1 - \rho^2}\, \sigma_X \sigma_Y} \exp\left(-\frac{1}{2} \cdot \frac{1}{1 - \rho^2} \cdot a(x, y)\right),$$
where
$$a(x, y) = \left(\frac{x - m_X}{\sigma_X}\right)^2 - 2\rho\, \frac{x - m_X}{\sigma_X} \cdot \frac{y - m_Y}{\sigma_Y} + \left(\frac{y - m_Y}{\sigma_Y}\right)^2.$$

[Figures: surface plots of the bivariate Gaussian density for $\sigma_X = \sigma_Y$ with $\rho = 0$, $\rho = 0.5$ and $\rho = 0.9$, and for $\sigma_X = 2\sigma_Y$ with $\rho = 0.7$.]

Marginal distributions

If $m = (m_i)$ and $K = (k_{ij})$,
* each component $X_i$ is a (1-dimensional) Gaussian r.v. with parameters $m_{X_i} = m_i$ and $\sigma_i^2 = k_{ii}$;
* $K$ is the covariance matrix of $X$.
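To make the change-of-variables result concrete, the following sketch builds a correlated Gaussian vector as $Y = AX$ from independent components, checks $K_Y = A K_X A^t$ against the sample covariance, evaluates the multidimensional density formula, and checks that the marginal variance of $Y_1$ equals $k_{11}$. It assumes NumPy; the matrix `A`, the means, the standard deviations and the helper `gaussian_density` are our own illustrative choices, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent Gaussian components X_i, so K_X is diagonal.
n, N = 2, 200_000
m_X = np.array([1.0, -2.0])
sigma = np.array([1.0, 0.5])                  # sigma_{X_1}, sigma_{X_2}
A = np.array([[1.0, 0.8],
              [0.0, 1.2]])                    # non-singular 2x2 matrix

X = m_X + sigma * rng.standard_normal((N, n)) # each row is one sample of X
Y = X @ A.T                                   # linear transformation Y = A X

K_X = np.diag(sigma**2)
m_Y = A @ m_X                                 # m_Y = A m_X
K_Y = A @ K_X @ A.T                           # K_Y = A K_X A^t
print("K_Y (theory):\n", K_Y)
print("K_Y (sample):\n", np.cov(Y, rowvar=False))

# Multidimensional Gaussian density evaluated at a point y.
def gaussian_density(y, m, K):
    d = y - m
    norm = (2 * np.pi) ** (len(m) / 2) * np.sqrt(np.linalg.det(K))
    return np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / norm

print("f_Y(m_Y) =", gaussian_density(m_Y, m_Y, K_Y))

# Marginal distribution: Y_1 is Gaussian with mean (m_Y)_1 and variance k_11.
print("Var(Y_1): theory", K_Y[0, 0], " sample", Y[:, 0].var())
```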
Eigenvalues of the covariance matrix

The matrix $K_X$ is symmetric. Hence, it can be transformed into a diagonal matrix by means of an orthogonal transformation. That is,
* there exists an orthogonal matrix $C$ such that
$$C K_X C^t = D = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n);$$
* the numbers $\lambda_i \in \mathbb{R}$ are the eigenvalues of $K_X$;
* equivalently, $K_X = C^t D C$.

Hence,
$$0 \le z^t K_X z = z^t C^t D C z = (Cz)^t D\, (Cz) = y^t D y = \sum_{i=1}^n \lambda_i y_i^2,$$
where $y = Cz$. Therefore,
$$\lambda_i \ge 0, \qquad i = 1, 2, \dots, n.$$

* $K_X$ is a positive-definite matrix if and only if $\lambda_i > 0$, $i = 1, 2, \dots, n$ (that is, if and only if $X_i - m_{X_i}$, $i = 1, \dots, n$, are linearly independent r.v.). This condition is equivalent to $\det(K_X) \ne 0$.
* Gaussian random vectors can be defined in a more general framework, in such a way that $K_X$ is not necessarily invertible; that is, the $X_i - m_{X_i}$, $i = 1, \dots, n$, need not all be linearly independent. But only in the case $\det(K_X) \ne 0$ does there exist a density $f_X(x_1, x_2, \dots, x_n)$.

Uncorrelation and independence

Theorem. If the r.v. $X_1, X_2, \dots, X_n$ are jointly Gaussian and pairwise uncorrelated, then they are jointly independent.

Indeed, $\mathrm{Cov}(X_i, X_j) = 0$ for $i \ne j$ implies $K_X = \mathrm{diag}\big(\sigma_{X_1}^2, \sigma_{X_2}^2, \dots, \sigma_{X_n}^2\big)$ and, hence,
$$f_X(x_1, x_2, \dots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{\det(K_X)}} \exp\left(-\frac{1}{2}(x - m_X)^t K_X^{-1} (x - m_X)\right) = \frac{1}{(2\pi)^{n/2}\, \sigma_{X_1} \sigma_{X_2} \cdots \sigma_{X_n}}\, e^{-\frac{1}{2} \sum_{i=1}^n \left((x_i - m_{X_i})/\sigma_{X_i}\right)^2} = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\, \sigma_{X_i}}\, e^{-\frac{1}{2}\left((x_i - m_{X_i})/\sigma_{X_i}\right)^2} = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n).$$

Linear combinations

Theorem. Let $X$ be an $n$-dimensional Gaussian r.v., let $A$ be an $m \times n$ real matrix, and let $Y = AX$. Then $Y$ is an $m$-dimensional Gaussian r.v. with $m_Y = A\, m_X$ and $K_Y = A K_X A^t$.

If $m \le n$ and $A$ has full rank $m$, the random vector $Y$ has a probability density $f_Y(y_1, \dots, y_m)$.

Theorem. The $n$-dimensional r.v. $X = (X_1, \dots, X_n)^t$ is Gaussian if and only if the 1-dimensional r.v.
$$Y = a_1 X_1 + \cdots + a_n X_n = a^t X$$
is Gaussian for all $a = (a_1, a_2, \dots, a_n)^t \in \mathbb{R}^n$.

Conditional densities

Let $X$, $Y$ be jointly Gaussian. Then
$$f_{Y|X}(y \mid X = x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{1}{\sqrt{2\pi}\, \sqrt{1 - \rho^2}\, \sigma_Y} \exp\left(-\frac{1}{2}\left(\frac{y - m_{Y|X}}{\sigma_{Y|X}}\right)^2\right),$$
where
* $m_{Y|X}$ is the expected value of $Y$ given $X$:
$$m_{Y|X} = E(Y \mid X = x) = \rho\, \frac{\sigma_Y}{\sigma_X}\,(x - m_X) + m_Y;$$
* the conditional variance is
$$\sigma_{Y|X}^2 = (1 - \rho^2)\, \sigma_Y^2.$$
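The conditional-density formulas can be checked by simulation. A minimal sketch follows, assuming NumPy; the values of $m_X$, $m_Y$, $\sigma_X$, $\sigma_Y$, $\rho$ and the window width 0.05 are arbitrary illustrative choices. It draws a jointly Gaussian pair, keeps the samples with $X$ close to a fixed $x_0$, and compares the conditional sample mean and variance with $\rho\,(\sigma_Y/\sigma_X)(x_0 - m_X) + m_Y$ and $(1 - \rho^2)\,\sigma_Y^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Jointly Gaussian pair (X, Y) with given means, standard deviations and
# correlation rho, built from two independent standard normals.
m_X, m_Y = 1.0, -1.0
s_X, s_Y = 2.0, 1.5
rho = 0.7
N = 500_000

X = m_X + s_X * rng.standard_normal(N)
Z = rng.standard_normal(N)
Y = m_Y + rho * (s_Y / s_X) * (X - m_X) + np.sqrt(1 - rho**2) * s_Y * Z

# Conditional law of Y given X = x0: Gaussian with
#   m_{Y|X} = rho*(s_Y/s_X)*(x0 - m_X) + m_Y,   var_{Y|X} = (1 - rho^2)*s_Y^2
x0 = 2.0
mask = np.abs(X - x0) < 0.05          # samples with X close to x0
print("E(Y | X=x0):  theory", rho * (s_Y / s_X) * (x0 - m_X) + m_Y,
      " sample", Y[mask].mean())
print("Var(Y | X=x0): theory", (1 - rho**2) * s_Y**2,
      " sample", Y[mask].var())
```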