Machine Learning 1 — WS2014 — Module IN2064
Machine Learning Worksheet 5
Gaussians and Probability Theory - II

Submit to [email protected] with subject line "homework sheet 5" by 2014/11/10, 23:59 CET.

1 Basic Probability

Problem 1: Show that the sum of two independent Gaussian random variables ($X_1$ and $X_2$) is Gaussian. Some of the properties of Gaussians mentioned in the lecture can help.

Let $X_1 \sim \mathcal{N}(\mu_1, \Sigma_1)$ and $X_2 \sim \mathcal{N}(\mu_2, \Sigma_2)$ be two independent $n$-dimensional random vectors. Stack both vectors to form a random vector $Y$. $Y$ is a Gaussian random variable:
$$Y = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix} \right)$$
Consider $Z = X_1 + X_2$. $Z$ can be written as $Z = AY$ with $A = [I_n \ I_n]$. So $Z$ is Gaussian with mean $\mu_1 + \mu_2$ and covariance $\Sigma_1 + \Sigma_2$ (see the rule for linear transformations of Gaussians from the slides).

Problem 2: We say that two random variables are pairwise independent if
$$P(X_2 \mid X_1) = P(X_2) \quad \text{and hence} \quad P(X_2, X_1) = P(X_1)P(X_2 \mid X_1) = P(X_1)P(X_2).$$
We say that $n$ random variables are mutually independent if
$$P(X_i \mid X_S) = P(X_i) \quad \forall S \subseteq \{1, \ldots, n\} \setminus \{i\} \quad \text{and hence} \quad P(X_{1:n}) = \prod_{i=1}^{n} P(X_i).$$
Show that pairwise independence between all pairs of variables does not necessarily imply mutual independence. It suffices to give a counterexample.

There are 4 balls in a bag, denoted a, b, c, and d. Suppose we draw one at random. We define 3 different events based on this draw:
• $X_1$: ball a or b is drawn.
• $X_2$: ball b or c is drawn.
• $X_3$: ball a or c is drawn.
We have $P(X_1) = P(X_2) = P(X_3) = 0.5$ and $P(X_1, X_2) = P(X_1, X_3) = P(X_2, X_3) = 0.25$. Thus $P(X_i, X_j) = P(X_i)P(X_j)$ for $i \neq j$, $i, j = 1, \ldots, 3$, i.e., the events are pairwise independent. However, $P(X_1, X_2, X_3) = 0 \neq 1/8$.

Problem 3: Let $X$ and $Y$ be two random variables. Express $\mathrm{Var}[X + Y]$ in terms of $\mathrm{Var}[X]$, $\mathrm{Var}[Y]$ and $\mathrm{Cov}[X, Y]$.

$$\mathrm{Var}[X + Y] = E[(X + Y)^2] - (E[X] + E[Y])^2 = E[X^2] - E^2[X] + E[Y^2] - E^2[Y] + 2E[XY] - 2E[X]E[Y]$$
and hence $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X, Y]$.

Problem 4: Let $X$ and $Y$ be two random variables. Prove that $-1 \leq \rho(X, Y) \leq 1$. You may want to use the result from the previous problem.

The first observation is that $\mathrm{Cov}[X/\sigma_X, Y/\sigma_Y] = E[(X/\sigma_X - E[X/\sigma_X])(Y/\sigma_Y - E[Y/\sigma_Y])] = \rho(X, Y)$. Because $\mathrm{Var}[X/\sigma_X + Y/\sigma_Y] \geq 0$ (a property of $\mathrm{Var}[\cdot]$), we get
$$0 \leq \mathrm{Var}[X/\sigma_X + Y/\sigma_Y] = \frac{\mathrm{Var}[X]}{\sigma_X^2} + \frac{\mathrm{Var}[Y]}{\sigma_Y^2} + 2\,\mathrm{Cov}[X/\sigma_X, Y/\sigma_Y] = 1 + 1 + 2\rho(X, Y) = 2(1 + \rho(X, Y)).$$
So $\rho(X, Y) \geq -1$. Similarly, consider $\mathrm{Var}[X/\sigma_X - Y/\sigma_Y]$ to show $\rho(X, Y) \leq 1$.

Problem 5: Let $X$ be a random variable. Show that, if $Y = aX + b$ for some parameters $a > 0$ and $b$, then $\rho(X, Y) = 1$. Similarly show that if $a < 0$, then $\rho(X, Y) = -1$.

First, compute $\mathrm{Cov}[X, Y]$:
$$\mathrm{Cov}[X, Y] = E[(X - E(X))(Y - E(Y))] = E[(X - E(X))(aX + b - aE(X) - b)] = E[aX^2 + bX - aXE(X) - bX - aXE(X) - E(X)b + aE(X)^2 + E(X)b],$$
which reduces to $E[aX^2 - 2aXE(X) + aE(X)^2] = a\,\mathrm{Var}[X]$. For $\rho(X, Y)$ we also need $\mathrm{Var}[Y] = a^2\,\mathrm{Var}[X]$. So we finally get
$$\rho(X, Y) = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{a\,\mathrm{Var}[X]}{|a|\,\mathrm{Var}[X]} = \frac{a}{|a|}.$$
Depending on the sign of $a$, we get $\rho(X, Y) = \pm 1$.

Problem 6: Let $X \sim U(-1, 1)$ and $Y = X^2$. Obviously, $Y$ is dependent on $X$ (in fact, $Y$ is uniquely determined by $X$). However, show that $\rho(X, Y) = 0$ (i.e., uncorrelated does not imply independent in general; but see the next problem for a special case).

Let us consider $\mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])]$:
$$\mathrm{Cov}[X, Y] = E[(X - E[X])(X^2 - E[X^2])] = E[X^3] - E[X]E[X^2].$$
Because $U(-1, 1)$ is symmetric around 0, all odd moments, in particular $E[X^3]$ and $E[X]$, are zero. Thus $\rho(X, Y) = 0$.
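As a sanity check, the identities in Problems 3 to 6 can also be verified numerically with a short Monte Carlo simulation. The NumPy sketch below estimates the relevant moments from samples; the chosen distributions, the coefficients a = 2 and a = -3, and the sample size are arbitrary illustrative choices, not part of the worksheet.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two correlated variables for Problems 3 and 4.
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)

# Problem 3: Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y]
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)                                  # agree up to floating-point error

# Problem 4: the correlation coefficient lies in [-1, 1]
assert -1.0 <= np.corrcoef(x, y)[0, 1] <= 1.0

# Problem 5: rho(X, aX + b) = sign(a)
for a in (2.0, -3.0):
    print(a, np.corrcoef(x, a * x + 1.0)[0, 1])  # approx. +1 for a > 0, -1 for a < 0

# Problem 6: X ~ U(-1, 1) and Y = X^2 are uncorrelated although dependent
u = rng.uniform(-1.0, 1.0, size=n)
print(np.corrcoef(u, u ** 2)[0, 1])              # approx. 0 up to Monte Carlo error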
Problem 7: Let $Z = (X, Y)$ be a bivariate normally distributed random variable. Furthermore, let $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$. Assume that $\rho(X, Y) = 0$. Show that in this case $X$ and $Y$ are independent.

Because $\rho(X, Y) = 0$ we know that $\mathrm{Cov}[X, Y] = \mathrm{Cov}[Y, X] = 0$. Therefore, $\mu_Z = (\mu_X, \mu_Y)^T$ and
$$\Sigma_Z = \begin{pmatrix} \sigma_X^2 & 0 \\ 0 & \sigma_Y^2 \end{pmatrix}.$$
Therefore
$$p_Z(z) = \frac{1}{\sqrt{|2\pi\Sigma_Z|}} \exp\!\left( -\frac{1}{2} (z - \mu_Z)^T \Sigma_Z^{-1} (z - \mu_Z) \right) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\!\left( \frac{-(x - \mu_X)^2}{2\sigma_X^2} \right) \frac{1}{\sqrt{2\pi\sigma_Y^2}} \exp\!\left( \frac{-(y - \mu_Y)^2}{2\sigma_Y^2} \right) = p_X(x)\,p_Y(y).$$

Problem 8: Using Jensen's Inequality, show that for a finite random variable $X$ (with $n$ different values), its entropy is always bounded above by $\ln n$. Additionally, prove that the Kullback-Leibler divergence between any two discrete probability distributions is always non-negative.

Since $\ln$ is concave, Jensen's inequality gives
$$H[X] = -\sum_x p(x) \ln p(x) = \sum_x p(x) \ln \frac{1}{p(x)} \leq \ln \sum_x p(x) \frac{1}{p(x)} = \ln n$$
and
$$D_{KL}(p \,\|\, q) = \sum_x p(x) \ln \frac{p(x)}{q(x)} = -\sum_x p(x) \ln \frac{q(x)}{p(x)} \geq -\ln \sum_x p(x) \frac{q(x)}{p(x)} = -\ln \sum_x q(x) = -\ln 1 = 0.$$
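The two bounds in Problem 8 can likewise be checked numerically. The sketch below draws random discrete distributions over n outcomes (from a symmetric Dirichlet, an arbitrary choice) and confirms H[X] <= ln n and D_KL(p||q) >= 0 on every draw; the small tolerances only absorb floating-point round-off.

import numpy as np

rng = np.random.default_rng(0)
n = 10  # number of outcomes of the finite random variable

for _ in range(1000):
    # Two random discrete distributions over n outcomes.
    p = rng.dirichlet(np.ones(n))
    q = rng.dirichlet(np.ones(n))

    entropy = -np.sum(p * np.log(p))  # H[X]
    kl = np.sum(p * np.log(p / q))    # D_KL(p || q)

    assert entropy <= np.log(n) + 1e-9
    assert kl >= -1e-9

print("all checks passed")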