Machine Learning 1 — WS2014 — Module IN2064
Machine Learning Worksheet 5
Gaussians and Probability Theory - II
1 Basic Probability
Problem 1: Show that the sum of two independent Gaussian random variables (X1 and X2 ) is Gaussian.
Some of the properties of Gaussians mentioned in the lecture can help.
Let $X_1 \sim \mathcal{N}(\mu_1, \Sigma_1)$ and $X_2 \sim \mathcal{N}(\mu_2, \Sigma_2)$ be two independent $n$-dimensional random vectors. Stack both vectors to form a random vector $Y$. Because $X_1$ and $X_2$ are independent, $Y$ is a Gaussian random variable with block-diagonal covariance:
$$Y = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix} \right)$$
Consider $Z = X_1 + X_2$. $Z$ can be written as $Z = AY$ with $A = [I_n \; I_n]$. So $Z$ is Gaussian with mean $\mu_1 + \mu_2$ and covariance $\Sigma_1 + \Sigma_2$ (see the rule for linear transformations of Gaussians from the slides).
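As a quick sanity check (not part of the original solution), one can sample two independent Gaussian vectors and verify that the empirical mean and covariance of their sum match $\mu_1 + \mu_2$ and $\Sigma_1 + \Sigma_2$. The dimensions and parameters below are arbitrary illustrative choices; the check only confirms the moments, not Gaussianity itself.
```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters (not from the worksheet).
mu1, mu2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])
Sigma1 = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma2 = np.array([[1.5, -0.2], [-0.2, 0.8]])

# Draw independent samples of X1 and X2 and form Z = X1 + X2.
n = 200_000
X1 = rng.multivariate_normal(mu1, Sigma1, size=n)
X2 = rng.multivariate_normal(mu2, Sigma2, size=n)
Z = X1 + X2

# Empirical mean and covariance should be close to mu1 + mu2 and Sigma1 + Sigma2.
print(Z.mean(axis=0))           # ~ [1.5, 1.0]
print(np.cov(Z, rowvar=False))  # ~ Sigma1 + Sigma2
```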
Problem 2: We say that two random variables are pairwise independent if $P(X_2 \mid X_1) = P(X_2)$ and hence
$$P(X_2, X_1) = P(X_1)\,P(X_2 \mid X_1) = P(X_1)\,P(X_2).$$
We say that $n$ random variables are mutually independent if
$$P(X_i \mid X_S) = P(X_i) \quad \forall S \subseteq \{1, \dots, n\} \setminus \{i\}$$
and hence
$$P(X_{1:n}) = \prod_{i=1}^{n} P(X_i).$$
Show that pairwise independence between all pairs of variables does not, however, necessarily imply mutual independence. It suffices to give a counterexample.
There are 4 balls in a bag, denoted a, b, c, and d, respectively. Suppose we draw one at random. We
define 3 different events based on this draw:
• X1 : ball a or b is drawn.
• X2 : ball b or c is drawn.
• X3 : ball a or c is drawn.
We have $P(X_1) = P(X_2) = P(X_3) = 0.5$ and $P(X_1, X_2) = P(X_1, X_3) = P(X_2, X_3) = 0.25$. Thus $P(X_i, X_j) = P(X_i)P(X_j)$ for $i \neq j$, $i, j = 1, \dots, 3$, i.e., the events are pairwise independent. However, $P(X_1, X_2, X_3) = 0 \neq 1/8 = P(X_1)P(X_2)P(X_3)$, so the events are not mutually independent.
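A small enumeration over the four equally likely outcomes (added here as an illustration, not part of the original solution) reproduces these probabilities directly:
```python
from itertools import combinations

# The four equally likely outcomes and the three events from the counterexample.
balls = ['a', 'b', 'c', 'd']
events = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'a', 'c'}}

def prob(*event_ids):
    """P(all given events occur) under a uniform draw from the bag."""
    hits = [ball for ball in balls if all(ball in events[i] for i in event_ids)]
    return len(hits) / len(balls)

# Pairwise independence: P(Xi, Xj) = P(Xi) P(Xj) for every pair.
for i, j in combinations(events, 2):
    print(i, j, prob(i, j), prob(i) * prob(j))     # 0.25 == 0.25

# But not mutually independent: P(X1, X2, X3) = 0 != 1/8.
print(prob(1, 2, 3), prob(1) * prob(2) * prob(3))  # 0.0 vs 0.125
```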
Problem 3: Let $X$ and $Y$ be two random variables. Express $\mathrm{Var}[X + Y]$ in terms of $\mathrm{Var}[X]$, $\mathrm{Var}[Y]$ and $\mathrm{Cov}[X, Y]$.
$$\mathrm{Var}[X+Y] = E[(X+Y)^2] - (E[X]+E[Y])^2 = E[X^2] - E^2[X] + E[Y^2] - E^2[Y] + 2E[XY] - 2E[X]E[Y]$$
And hence $\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X, Y]$.
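The identity is easy to confirm numerically on correlated samples; the covariance below is an arbitrary illustrative choice, not from the worksheet.
```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated random variables (arbitrary illustrative covariance).
cov = np.array([[2.0, 0.7], [0.7, 1.5]])
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

lhs = np.var(X + Y, ddof=1)
rhs = np.var(X, ddof=1) + np.var(Y, ddof=1) + 2 * np.cov(X, Y)[0, 1]
print(lhs, rhs)  # both ~ 2.0 + 1.5 + 2 * 0.7 = 4.9
```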
Problem 4: Let X and Y be two random variables. Prove that −1 ≤ ρ(X, Y ) ≤ 1. You may want to
use the result from the previous problem.
First observation is that $\mathrm{Cov}[X/\sigma_X, Y/\sigma_Y] = E[(X/\sigma_X - E[X/\sigma_X])(Y/\sigma_Y - E[Y/\sigma_Y])] = \rho(X, Y)$. Because $\mathrm{Var}[X/\sigma_X + Y/\sigma_Y] \ge 0$ (property of $\mathrm{Var}[\cdot]$), we get
$$0 \le \mathrm{Var}[X/\sigma_X + Y/\sigma_Y] = \frac{\mathrm{Var}[X]}{\sigma_X^2} + \frac{\mathrm{Var}[Y]}{\sigma_Y^2} + 2\,\mathrm{Cov}[X/\sigma_X, Y/\sigma_Y] = 1 + 1 + 2\rho(X, Y) = 2(1 + \rho(X, Y))$$
So $\rho(X, Y) \ge -1$. Similarly, consider $\mathrm{Var}[X/\sigma_X - Y/\sigma_Y] \ge 0$ to show $\rho(X, Y) \le 1$.
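As an illustrative numerical check (not in the original), the two variances used in the proof equal $2(1 + \rho)$ and $2(1 - \rho)$ on sample data, and both are non-negative, which pins $\rho$ into $[-1, 1]$. The covariance matrix below is an arbitrary choice.
```python
import numpy as np

rng = np.random.default_rng(2)

# Strongly correlated pair (arbitrary illustrative choice).
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

sx, sy = np.std(X, ddof=1), np.std(Y, ddof=1)
rho = np.corrcoef(X, Y)[0, 1]

# Var[X/sx + Y/sy] = 2(1 + rho) >= 0  and  Var[X/sx - Y/sy] = 2(1 - rho) >= 0.
print(np.var(X / sx + Y / sy, ddof=1), 2 * (1 + rho))
print(np.var(X / sx - Y / sy, ddof=1), 2 * (1 - rho))
```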
Problem 5: Let X be a random variable. Show that, if Y = aX + b for some parameters a > 0 and b,
then ρ(X, Y ) = 1. Similarly show that if a < 0, then ρ(X, Y ) = −1.
First, compute $\mathrm{Cov}[X, Y]$:
$$\mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])] = E[(X - E[X])(aX + b - aE[X] - b)] = E[a(X - E[X])^2] = a\,\mathrm{Var}[X]$$
For $\rho(X, Y)$ we also need $\mathrm{Var}[Y] = a^2\,\mathrm{Var}[X]$. So we finally get
$$\rho(X, Y) = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{a\,\mathrm{Var}[X]}{|a|\,\mathrm{Var}[X]} = \frac{a}{|a|}$$
Depending on the sign of $a$, we get $\rho(X, Y) = \pm 1$.
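A short check (illustrative only; the values of $a$ and $b$ are arbitrary) confirms that the sample correlation of $X$ and $aX + b$ sits at $\pm 1$ according to the sign of $a$:
```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=100_000)  # any distribution with positive variance works

for a, b in [(2.5, 1.0), (-0.7, 4.0)]:  # arbitrary illustrative parameters
    Y = a * X + b
    print(a, np.corrcoef(X, Y)[0, 1])   # ~ +1 for a > 0, ~ -1 for a < 0
```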
Problem 6: Let X ∼ U (−1, 1) and Y = X 2 . Obviously, Y is dependent on X (in fact, Y is uniquely
determined by X). However, show that ρ(X, Y ) = 0 (i.e. uncorrelated does not imply independent, in
general; but see next problem for a special case).
Let us consider $\mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])]$:
$$\mathrm{Cov}[X, Y] = E[(X - E[X])(X^2 - E[X^2])] = E[X^3] - E[X]E[X^2]$$
Because $U(-1, 1)$ is symmetric around 0, all odd moments, in particular $E[X^3]$ and $E[X]$, are zero. Thus $\mathrm{Cov}[X, Y] = 0$ and hence $\rho(X, Y) = 0$.
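A Monte Carlo check (an illustration, not part of the worksheet) shows the sample correlation vanishing even though $Y$ is a deterministic function of $X$:
```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=1_000_000)
Y = X ** 2  # fully determined by X, yet uncorrelated with it

print(np.corrcoef(X, Y)[0, 1])  # ~ 0 (up to Monte Carlo noise)
```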
Problem 7: Let $Z = (X, Y)$ be a bivariate normally distributed random variable. Furthermore, let $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$. Assume that $\rho(X, Y) = 0$. Show that in this case $X$ and $Y$ are
independent.
Because $\rho(X, Y) = 0$ we know that $\mathrm{Cov}[X, Y] = \mathrm{Cov}[Y, X] = 0$. Therefore $\mu_Z = (\mu_X, \mu_Y)^T$ and
$$\Sigma_Z = \begin{pmatrix} \sigma_X^2 & 0 \\ 0 & \sigma_Y^2 \end{pmatrix}.$$
Therefore
$$p_Z(z) = \frac{1}{\sqrt{|2\pi\Sigma_Z|}} \exp\!\left( -\tfrac{1}{2}(z - \mu_Z)^T \Sigma_Z^{-1} (z - \mu_Z) \right) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\!\left( \frac{-(x - \mu_X)^2}{2\sigma_X^2} \right) \frac{1}{\sqrt{2\pi\sigma_Y^2}} \exp\!\left( \frac{-(y - \mu_Y)^2}{2\sigma_Y^2} \right) = p_X(x)\,p_Y(y)$$
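The factorisation can also be checked numerically. The sketch below (an illustration, not part of the worksheet; all parameter values are arbitrary) compares the diagonal-covariance bivariate normal density from scipy against the product of the two marginal densities at a few test points.
```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Arbitrary illustrative parameters with Cov[X, Y] = 0.
mu_x, mu_y = 1.0, -2.0
sigma_x, sigma_y = 1.5, 0.7
mean = [mu_x, mu_y]
cov = [[sigma_x**2, 0.0], [0.0, sigma_y**2]]

rng = np.random.default_rng(5)
points = rng.normal(size=(5, 2))  # a few test points (x, y)

joint = multivariate_normal(mean, cov).pdf(points)
product = norm(mu_x, sigma_x).pdf(points[:, 0]) * norm(mu_y, sigma_y).pdf(points[:, 1])
print(np.allclose(joint, product))  # True: p_Z(x, y) = p_X(x) p_Y(y)
```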
Problem 8: Using Jensen’s Inequality, show that for a finite random variable X (with n different values),
its entropy is always bounded above by ln n. Additionally, prove that the Kullback-Leibler divergence
between any two discrete probability distributions is always non-negative.
$$H[X] = -\sum_x p(x) \ln p(x) = \sum_x p(x) \ln \frac{1}{p(x)} \le \ln \sum_x p(x) \frac{1}{p(x)} = \ln n,$$
using Jensen's inequality for the concave function $\ln$.
$$D_{KL}(p\,\|\,q) = \sum_x p(x) \ln \frac{p(x)}{q(x)} = -\sum_x p(x) \ln \frac{q(x)}{p(x)} \ge -\ln \sum_x p(x) \frac{q(x)}{p(x)} = -\ln \sum_x q(x) = -\ln 1 = 0,$$
again by Jensen's inequality applied to the concave $\ln$.
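Both bounds are easy to confirm on random discrete distributions; the sketch below (illustrative, not part of the worksheet) draws two arbitrary distributions over $n = 10$ outcomes and evaluates the entropy and the KL divergence directly.
```python
import numpy as np

rng = np.random.default_rng(6)
n = 10

# Two random discrete distributions over n outcomes (illustrative).
p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()

entropy = -np.sum(p * np.log(p))
kl = np.sum(p * np.log(p / q))

print(entropy, np.log(n), entropy <= np.log(n))  # H[X] <= ln n
print(kl, kl >= 0)                               # D_KL(p || q) >= 0
```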
Submit to [email protected] with subject line homework sheet 5 by 2014/11/10, 23:59 CET