Recursive Estimation
Raffaello D’Andrea
Spring 2011
Problem Set:
Probability Review
Last updated: February 28, 2011
Notes:
• Notation: Unless otherwise noted, x, y, and z denote random variables, fx(x) (or the shorthand f(x)) denotes the probability density function of x, and fx|y(x|y) (or f(x|y)) denotes the conditional probability density function of x conditioned on y. The expected value is denoted by E[·], the variance is denoted by Var[·], and Pr(Z) denotes the probability that the event Z occurs.
• Please report any errors found in this problem set to the teaching assistants
([email protected] or [email protected]).
Problem Set
Problem 1
Prove f (x|y) = f (x) ⇔ f (y|x) = f (y).
Problem 2
Prove f (x|y, z) = f (x|z) ⇔ f (x, y|z) = f (x|z) f (y|z).
Problem 3
Come up with an example where f(x, y|z) = f(x|z) f(y|z) but f(x, y) ≠ f(x) f(y) for continuous random variables x, y, and z.
Problem 4
Come up with an example where f(x, y|z) = f(x|z) f(y|z) but f(x, y) ≠ f(x) f(y) for discrete random variables x, y, and z.
Problem 5
Let x and y be binary random variables: x, y ∈ {0, 1}. Prove that fx,y (x, y) ≥ fx (x) + fy (y) − 1,
which is called Bonferroni’s inequality.
Problem 6
Let y be a scalar continuous random variable with probability density function fy(y), and g be a function transforming y to a new variable x by x = g(y). Derive the change of variables formula, i.e. the equation for the probability density function fx(x), under the assumption that g is continuously differentiable with dg(y)/dy < 0 for all y.
Problem 7
Let x ∈ X be a scalar valued continuous random variable with probability density function
fx (x). Prove the law of the unconscious statistician, that is, for a function g(·) show that the
expected value of y = g(x) is given by
E[y] = ∫_X g(x) fx(x) dx.
You may consider the special case where g(·) is a continuously differentiable, strictly monotonically increasing function.
Problem 8
Prove E [ax + b] = aE [x] + b, where a and b are constants.
Problem 9
Let x and y be scalar random variables. Prove that, if x and y are independent, then for any
functions g(·) and h(·), E [g(x)h(y)] = E [g(x)] E [h(y)] .
Problem 10
From above, it follows that if x and y are independent, then E [xy] = E [x] E [y]. Is the converse
true, that is, if E [xy] = E [x] E [y], does it imply that x and y are independent? If yes, prove it.
If no, find a counter-example.
Problem 11
Let x ∈ X and y ∈ Y be independent, scalar valued discrete random variables. Let z = x + y.
Prove that
fz(z) = ∑_{y∈Y} fx(z − y) fy(y) = ∑_{x∈X} fx(x) fy(z − x),
that is, the probability density function is the convolution of the individual probability density
functions.
Problem 12
Let x be a scalar continuous random variable that takes on only non-negative values. Prove
that, for a > 0,
Pr(x ≥ a) ≤ E[x]/a,

which is called Markov's inequality.
Problem 13
Let x be a scalar continuous random variable. Let x̄ = E [x], σ 2 = Var[x]. Prove that for any
k > 0,
Pr(|x − x̄| ≥ k) ≤ σ²/k²,

which is called Chebyshev's inequality.
Problem 14
Let x1 , x2 , . . . , xn be independent random variables, each having a uniform distribution over
[0, 1]. Let the random variable m be defined as the maximum of x1 , . . . , xn , that is, m =
max{x1 , x2 , . . . , xn }. Show that the cumulative distribution function of m, Fm (·), is given by
Fm(m) = mⁿ,   0 ≤ m ≤ 1.
What is the probability density function of m?
Problem 15
Prove that E[x²] ≥ (E[x])². When does the statement hold with equality?
Problem 16
Suppose that x and y are independent scalar continuous random variables. Show that
Pr(x ≤ y) = ∫_{−∞}^{∞} Fx(y) fy(y) dy.
Problem 17
Use Chebyshev’s inequality to prove the weak law of large numbers. Namely, if x1 , x2 , . . . are
independent and identically distributed with mean µ and variance σ 2 then, for any ε > 0,
Pr(|(x1 + x2 + · · · + xn)/n − µ| > ε) → 0 as n → ∞.
Problem 18
Suppose that x is a random variable with mean 10 and variance 15. What can we say about
Pr (5 < x < 15)?
Problem 19
Let x and y be independent random variables with means µx and µy and variances σx² and σy², respectively. Show that

Var[xy] = σx² σy² + µy² σx² + µx² σy².
Problem 20
a)
Use the method presented in class to obtain a sample x of the exponential distribution

f̂x(x) = λe^{−λx} for x ≥ 0,   f̂x(x) = 0 for x < 0,

from a given sample u of a uniform distribution.
b)
Exponential random variables are used to model failures of components that do not wear,
for example solid state electronics. This is based on their property
Pr (x > s + t|x > t) = Pr (x > s) ∀ s, t ≥ 0.
A random variable with this property is said to be memoryless. Prove that a random
variable with an exponential distribution satisfies this property.
Sample solutions
Problem 1
From the definition of conditional probability, we have f(x, y) = f(x|y) f(y) and f(x, y) = f(y|x) f(x), and therefore

f(x|y) f(y) = f(y|x) f(x).    (1)

We can assume that f(x) ≠ 0 and f(y) ≠ 0; otherwise f(y|x) and f(x|y), respectively, are not defined, and the statement under consideration does not make sense.
We will check both “directions” of the equivalence statement:

f(x|y) = f(x) ⟹ f(x) f(y) = f(y|x) f(x)   (by (1)) ⟹ f(y) = f(y|x)   (since f(x) ≠ 0)
f(y|x) = f(y) ⟹ f(x|y) f(y) = f(y) f(x)   (by (1)) ⟹ f(x) = f(x|y)   (since f(y) ≠ 0)

Therefore, f(x|y) = f(x) ⇔ f(y|x) = f(y).
Problem 2
Using

f(x, y|z) = f(x|y, z) f(y|z),    (2)

both “directions” of the equivalence statement can be shown as follows:

f(x|y, z) = f(x|z) ⟹ f(x, y|z) = f(x|z) f(y|z)   (by (2))
f(x, y|z) = f(x|z) f(y|z) ⟹ f(x|y, z) f(y|z) = f(x|z) f(y|z)   (by (2)) ⟹ f(x|y, z) = f(x|z),

since f(y|z) ≠ 0; otherwise f(x|y, z) is not defined.
Problem 3
Let x, y, z ∈ [0, 1] be continuous random variables.
Consider the joint pdf f (x, y, z) = c gx (x, z) gy (y, z) where gx and gy are functions (not necessarily pdfs) and c is a constant such that
∫₀¹ ∫₀¹ ∫₀¹ f(x, y, z) dx dy dz = 1.

Then

f(x, y|z) = f(x, y, z)/f(z) = (c gx(x, z) gy(y, z))/(c Gx(z) Gy(z)),

with

Gx(z) = ∫₀¹ gx(x, z) dx,   Gy(z) = ∫₀¹ gy(y, z) dy,

since

f(z) = ∫₀¹ ∫₀¹ c gx(x, z) gy(y, z) dx dy = c ∫₀¹ Gx(z) gy(y, z) dy = c Gx(z) Gy(z).

Furthermore,

f(x|z) = f(x, z)/f(z) = (∫₀¹ f(x, y, z) dy)/f(z) = (c gx(x, z) Gy(z))/(c Gx(z) Gy(z)) = gx(x, z)/Gx(z)

and, by analogy,

f(y|z) = gy(y, z)/Gy(z).

Continuing the above,

f(x, y|z) = (gx(x, z) gy(y, z))/(Gx(z) Gy(z)) = f(x|z) f(y|z),

that is, x and y are conditionally independent.
Now, let gx(x, z) = x + z and gy(y, z) = y + z. Then,

f(x, y) = ∫₀¹ f(x, y, z) dz = c ∫₀¹ (x + z)(y + z) dz = c ∫₀¹ (z² + (x + y)z + xy) dz
= c [z³/3 + (x + y)z²/2 + xyz]₀¹ = c (1/3 + (x + y)/2 + xy),

f(y) = ∫₀¹ f(x, y) dx = c ∫₀¹ (1/3 + (x + y)/2 + xy) dx = c (1/3 + 1/4 + y/2 + y/2) = c (y + 7/12),

f(x) = c (x + 7/12)   (by symmetry).

Therefore,

f(x) f(y) = c² (49/144 + 7(x + y)/12 + xy) ≠ f(x, y),

that is, x and y are not independent.
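Remark: a quick numerical sanity check of this example, sketched in Python (NumPy assumed available; the grid resolution n = 100 is an arbitrary choice). It discretizes f(x, y, z) = c (x + z)(y + z) on a midpoint grid, so that integrals over [0, 1] become grid means:

```python
import numpy as np

# Midpoint grid on [0, 1]; an integral over [0, 1] becomes a mean over the grid.
n = 100
t = (np.arange(n) + 0.5) / n
X, Y, Z = np.meshgrid(t, t, t, indexing="ij")

g = (X + Z) * (Y + Z)      # unnormalized density gx(x, z) gy(y, z)
c = 1.0 / g.mean()         # normalizing constant (analytically c = 12/13)
f = c * g                  # joint pdf f(x, y, z), indexed [x, y, z]

f_z = f.mean(axis=(0, 1))                        # f(z)
f_x_given_z = f.mean(axis=1) / f_z               # f(x|z), indexed [x, z]
f_y_given_z = f.mean(axis=0) / f_z               # f(y|z), indexed [y, z]
prod = f_x_given_z[:, None, :] * f_y_given_z[None, :, :]
print(np.abs(f / f_z - prod).max())              # ~1e-15: f(x,y|z) = f(x|z) f(y|z)

f_xy = f.mean(axis=2)                            # f(x, y)
f_x, f_y = f_xy.mean(axis=1), f_xy.mean(axis=0)  # f(x), f(y)
print(np.abs(f_xy - np.outer(f_x, f_y)).max())   # clearly > 0: f(x,y) != f(x) f(y)
```

The first printed value is at floating-point precision; the second is bounded well away from zero.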
Problem 4
As in Problem 3, let f (x, y, z) = c gx (x, z)gy (y, z). This ensures that f (x, y|z) = f (x|z)f (y|z).
Let x, y, z ∈ {0, 1}.
Define gx and gy by value tables:

x  z  gx(x, z)        y  z  gy(y, z)
0  0  0               0  0  1
0  1  1               0  1  0
1  0  1               1  0  0
1  1  0               1  1  1

With c = 1/2 for normalization, the joint pdf f(x, y, z) = (1/2) gx(x, z) gy(y, z) is:

x  y  z  f(x, y, z)
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  0.5
1  0  0  0.5
1  0  1  0
1  1  0  0
1  1  1  0

Compute f(x, y), f(x), f(y):

x  y  f(x, y)         x  f(x)        y  f(y)
0  0  0               0  0.5         0  0.5
0  1  0.5             1  0.5         1  0.5
1  0  0.5
1  1  0

x  y  f(x) f(y)
0  0  0.25
0  1  0.25
1  0  0.25
1  1  0.25

From this we see

f(x, y) ≠ f(x) f(y).

Remark: For verification, one may also compute f(x, y|z), f(x|z), and f(y|z) in order to verify f(x, y|z) = f(x|z) f(y|z), although this holds by construction of f(x, y, z) and the derivation in Problem 3.
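Remark: the tables above can be checked by a short enumeration, sketched here in standard-library Python:

```python
import itertools

B = (0, 1)
g_x = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}   # gx(x, z) from the table
g_y = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}   # gy(y, z) from the table

f = {(x, y, z): 0.5 * g_x[x, z] * g_y[y, z]
     for x, y, z in itertools.product(B, repeat=3)}

f_xy = {(x, y): sum(f[x, y, z] for z in B) for x in B for y in B}
f_x = {x: sum(f_xy[x, y] for y in B) for x in B}
f_y = {y: sum(f_xy[x, y] for x in B) for y in B}
print(f_xy)                                             # 0, 0.5, 0.5, 0
print({(x, y): f_x[x] * f_y[y] for x in B for y in B})  # all 0.25

# Conditional independence given z (holds by construction):
f_z = {z: sum(f[x, y, z] for x in B for y in B) for z in B}
print(all(abs(f[x, y, z] / f_z[z]
              - (sum(f[x, v, z] for v in B) / f_z[z])
              * (sum(f[u, y, z] for u in B) / f_z[z])) < 1e-12
          for x, y, z in itertools.product(B, repeat=3)))  # True
```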
Problem 5
For all x, y ∈ {0, 1} we have, by marginalization,

fx(x) = ∑_{y∈{0,1}} fxy(x, y) = fxy(x, 0) + fxy(x, 1) = fxy(x, y) + fxy(x, 1 − y),
fy(y) = ∑_{x∈{0,1}} fxy(x, y) = fxy(x, y) + fxy(1 − x, y).

Therefore,

fx(x) + fy(y) = 2 fxy(x, y) + fxy(x, 1 − y) + fxy(1 − x, y)
= 2 fxy(x, y) + fxy(x, 1 − y) + fxy(1 − x, y) + fxy(1 − x, 1 − y) − fxy(1 − x, 1 − y)
= fxy(x, y) + [fxy(x, y) + fxy(x, 1 − y) + fxy(1 − x, y) + fxy(1 − x, 1 − y)] − fxy(1 − x, 1 − y)
= fxy(x, y) + 1 − fxy(1 − x, 1 − y)   (the bracketed term sums fxy over all outcomes and hence equals 1)
≤ 1 + fxy(x, y)

⟹ fxy(x, y) ≥ fx(x) + fy(y) − 1.
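Remark: since the proof holds for an arbitrary joint pmf on {0, 1}², one can spot-check the inequality on random pmfs; a sketch in Python (NumPy assumed available, Dirichlet draws being just a convenient way to generate random pmfs):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(4)).reshape(2, 2)   # random joint pmf fxy
    fx, fy = p.sum(axis=1), p.sum(axis=0)         # marginals
    assert all(p[x, y] >= fx[x] + fy[y] - 1 - 1e-12
               for x in (0, 1) for y in (0, 1))
print("Bonferroni's inequality held in all trials")
```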
Problem 6
Consider the probability that y is in a small interval [ȳ, ȳ + ∆y] (with ∆y > 0 small),

Pr(y ∈ [ȳ, ȳ + ∆y]) = ∫_{ȳ}^{ȳ+∆y} fy(y) dy,

which approaches fy(ȳ) ∆y in the limit as ∆y → 0.
Let x̄ := g(ȳ). For small ∆y, we have by a Taylor series approximation,

g(ȳ + ∆y) ≈ g(ȳ) + (dg/dy)(ȳ) ∆y =: x̄ + ∆x.

Since g(y) is decreasing, ∆x < 0 (see Figure 1).

Figure 1: Decreasing function g(y) on a small interval [ȳ, ȳ + ∆y].

We are interested in the probability that x ∈ [x̄ + ∆x, x̄]. For small ∆x, this probability is equal to the probability that y ∈ [ȳ, ȳ + ∆y] (since y ∈ [ȳ, ȳ + ∆y] ⇒ g(y) ∈ [g(ȳ + ∆y), g(ȳ)] ⇒ x ∈ [x̄ + ∆x, x̄] for small ∆y and ∆x).
Therefore, for small ∆y (and thus small ∆x),

Pr(y ∈ [ȳ, ȳ + ∆y]) = fy(ȳ) ∆y = Pr(x ∈ [x̄ + ∆x, x̄]) = −fx(x̄) ∆x

⇒ fy(ȳ) ∆y = −fx(x̄) ∆x ⇒ fx(x̄) = fy(ȳ)/(−∆x/∆y).

As ∆y → 0 (and thus ∆x → 0), ∆x/∆y → (dg/dy)(ȳ), and therefore

fx(x̄) = fy(ȳ)/(−(dg/dy)(ȳ)),   or   fx(x) = fy(y)/(−(dg/dy)(y)) with y = g⁻¹(x).
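Remark: a Monte Carlo illustration of the formula for one assumed example: y exponentially distributed with fy(y) = e^{−y} and the decreasing map g(y) = e^{−y}. The formula then gives fx(x) = fy(−ln x)/e^{−y} = 1, i.e. x = g(y) should be uniform on (0, 1):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(1.0, 1_000_000)   # fy(y) = e^{-y} for y >= 0
x = np.exp(-y)                        # g(y) = e^{-y}, dg/dy = -e^{-y} < 0

# fx(x) = fy(y) / (-dg/dy(y)) evaluated at y = -ln(x) equals 1 on (0, 1)
hist, _ = np.histogram(x, bins=10, range=(0.0, 1.0), density=True)
print(np.round(hist, 3))              # all entries close to 1
```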
Problem 7
From the change of variables formula (see lecture), we have

fy(y) = fx(x)/((dg/dx)(x))

since g is continuously differentiable and strictly monotonically increasing. Furthermore, y = g(x) ⇒ dy = (dg/dx)(x) dx.
Let Y = g(X), that is, Y = {y | ∃ x ∈ X : y = g(x)}. Then

E[y] = ∫_Y y fy(y) dy = ∫_X g(x) (fx(x)/((dg/dx)(x))) (dg/dx)(x) dx = ∫_X g(x) fx(x) dx.
Problem 8
Using the law of the unconscious statistician (see Problem 7), we have for a discrete random variable x,

E[ax + b] = ∑_{x∈X} (ax + b) fx(x) = a ∑_{x∈X} x fx(x) + b ∑_{x∈X} fx(x) = a E[x] + b,

since ∑_{x∈X} x fx(x) = E[x] and ∑_{x∈X} fx(x) = 1, and for a continuous random variable x,

E[ax + b] = ∫_{−∞}^{∞} (ax + b) fx(x) dx = a ∫_{−∞}^{∞} x fx(x) dx + b ∫_{−∞}^{∞} fx(x) dx = a E[x] + b.

Problem 9
Using the law of the unconscious statistician (see Problem 7) applied to the vector random variable (x, y) with pdf fxy(x, y), we have

E[g(x) h(y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) fxy(x, y) dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) fx(x) fy(y) dx dy   (by independence of x and y)
= (∫_{−∞}^{∞} g(x) fx(x) dx) (∫_{−∞}^{∞} h(y) fy(y) dy)
= E[g(x)] E[h(y)].
And similarly for discrete random variables.
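Remark: a quick Monte Carlo sketch of this identity for one assumed choice of distributions and functions (x standard normal, y uniform, g = exp, h = square; NumPy assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)     # x and y independent
y = rng.uniform(size=1_000_000)
g, h = np.exp(x), y**2
print(np.mean(g * h))              # E[g(x) h(y)]  ~ e^{1/2} / 3 ~ 0.5496
print(np.mean(g) * np.mean(h))     # E[g(x)] E[h(y)], approximately equal
```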
Problem 10
We will construct a counter-example.
• Let x be a discrete random variable that takes the values −1, 0, and 1 with probabilities 1/4, 1/2, and 1/4, respectively.
• Let w be a discrete random variable taking the values −1 and 1 with equal probability, and let x and w be independent.
• Let y be the discrete random variable defined by y = xw (it follows that y can take the values −1, 0, and 1).
By construction, it is clear that y depends on x. We will now show that y and x are not independent, but E [xy] = E [x] E [y].
PDF fxy(x, y):

 x    y    fxy(x, y)
−1   −1    1/8    (x = −1 with prob. 1/4 and w = +1 with prob. 1/2)
−1    0    0      (impossible, since w ≠ 0)
−1    1    1/8    (x = −1 with prob. 1/4 and w = −1 with prob. 1/2)
 0   −1    0      (impossible, since y = 0 whenever x = 0)
 0    0    1/2    (x = 0 with prob. 1/2; w is either ±1)
 0    1    0      (impossible)
 1   −1    1/8
 1    0    0
 1    1    1/8

Marginalization fx(x), fy(y):

 x    f(x)        y    f(y)
−1    0.25       −1    0.25
 0    0.5         0    0.5
 1    0.25        1    0.25

It follows that fxy(x, y) = fx(x) fy(y) ∀ x, y does not hold. For example, fxy(−1, 0) = 0 ≠ 1/8 = fx(−1) fy(0).
Expected values:

E[xy] = ∑_x ∑_y x y fxy(x, y)
= (−1)(−1)·1/8 + (−1)(1)·1/8 + (0)(0)·1/2 + (1)(−1)·1/8 + (1)(1)·1/8
= 0

E[x] = ∑_x x fx(x) = −1/4 + 1/4 = 0

E[y] = 0

So E[xy] = E[x] E[y], but fxy(x, y) ≠ fx(x) fy(y). Hence, we have shown by counterexample that E[xy] = E[x] E[y] does not imply independence of x and y.
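Remark: the counterexample is easy to simulate; a sketch in Python (NumPy assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.choice([-1, 0, 1], size=n, p=[0.25, 0.5, 0.25])
w = rng.choice([-1, 1], size=n)
y = x * w

print(np.mean(x * y), np.mean(x) * np.mean(y))  # both ~0: E[xy] = E[x] E[y]
print(np.mean((x == -1) & (y == 0)))            # ~0      (fxy(-1, 0))
print(np.mean(x == -1) * np.mean(y == 0))       # ~0.125  (fx(-1) fy(0))
```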
Problem 11
From the definition of the probability density function for discrete random variables,
fz (z̄) = Pr (z = z̄) .
The probability that z takes on the value z̄ is given by the sum of the probabilities of all combinations of x and y such that z̄ = x + y, that is, all ȳ ∈ Y such that x = z̄ − ȳ and y = ȳ (or, alternatively, all x̄ ∈ X such that x = x̄ and y = z̄ − x̄). That is,

fz(z̄) = Pr(z = z̄) = ∑_{ȳ∈Y} Pr(x = z̄ − ȳ) · Pr(y = ȳ)   (by the independence assumption)
= ∑_{ȳ∈Y} fx(z̄ − ȳ) fy(ȳ).

Rewriting the last equation yields fz(z) = ∑_{y∈Y} fx(z − y) fy(y). Similarly, fz(z) = ∑_{x∈X} fx(x) fy(z − x) may be obtained.
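Remark: for pmfs on a finite integer support this is an ordinary discrete convolution; a sketch with assumed example pmfs on {0, 1, 2}:

```python
import numpy as np

fx = np.array([0.2, 0.5, 0.3])    # assumed pmf of x on {0, 1, 2}
fy = np.array([0.6, 0.1, 0.3])    # assumed pmf of y on {0, 1, 2}

fz = np.convolve(fx, fy)          # pmf of z = x + y on {0, ..., 4}
print(fz, fz.sum())               # sums to 1

fz_check = np.zeros(5)            # brute-force enumeration for comparison
for x, px in enumerate(fx):
    for y, py in enumerate(fy):
        fz_check[x + y] += px * py
print(np.allclose(fz, fz_check))  # True
```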
Problem 12
E[x] = ∫₀^∞ x f(x) dx = ∫₀^a x f(x) dx + ∫_a^∞ x f(x) dx
≥ ∫_a^∞ x f(x) dx   (since the first integral is non-negative)
≥ ∫_a^∞ a f(x) dx = a ∫_a^∞ f(x) dx = a Pr(x ≥ a),

and dividing by a > 0 yields Pr(x ≥ a) ≤ E[x]/a.
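Remark: a numerical illustration of the bound for an assumed example, the exponential distribution with E[x] = 1, where Pr(x ≥ a) = e^{−a}:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(1.0, 1_000_000)     # non-negative, E[x] = 1
for a in (1.0, 2.0, 5.0):
    print(a, np.mean(x >= a), 1.0 / a)  # empirical Pr(x >= a) vs bound E[x]/a
```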
Problem 13
This inequality can be obtained using Markov's inequality (see Problem 12):
• (x − x̄)² is a non-negative random variable; apply Markov's inequality with a = k² (k > 0):

Pr((x − x̄)² ≥ k²) ≤ E[(x − x̄)²]/k² = Var[x]/k² = σ²/k²

• Since (x − x̄)² ≥ k² ⇔ |x − x̄| ≥ k, the result follows:

Pr(|x − x̄| ≥ k) ≤ σ²/k².
Problem 14
For 0 ≤ t ≤ 1, we write

Fm(t) = Pr(m ∈ [0, t]) = Pr(max{x1, . . . , xn} ∈ [0, t])
= Pr(xi ∈ [0, t] ∀ i ∈ {1, . . . , n})
= Pr(x1 ∈ [0, t]) · Pr(x2 ∈ [0, t]) · · · Pr(xn ∈ [0, t])   (by independence)
= tⁿ.

By substituting m for t, we obtain the desired result.
Since Fm(m) = ∫₀^m fm(τ) dτ, the pdf of m is obtained by differentiating,

fm(m) = (dFm/dm)(m) = (d/dm)(mⁿ) = n mⁿ⁻¹,   0 ≤ m ≤ 1.
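Remark: a Monte Carlo sketch checking Fm(t) = tⁿ for an assumed n = 5 (NumPy assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 1_000_000
m = rng.uniform(size=(trials, n)).max(axis=1)   # max of n uniforms, per trial

for t in (0.3, 0.6, 0.9):
    print(t, np.mean(m <= t), t**n)             # empirical CDF vs t^n
```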
Problem 15
Let x̄ = E[x]. Then

Var[x] = ∫_{−∞}^{∞} (x − x̄)² f(x) dx = ∫_{−∞}^{∞} x² f(x) dx − 2x̄ ∫_{−∞}^{∞} x f(x) dx + x̄² ∫_{−∞}^{∞} f(x) dx
= E[x²] − 2x̄² + x̄² = E[x²] − x̄²,

where we used ∫_{−∞}^{∞} x f(x) dx = x̄ and ∫_{−∞}^{∞} f(x) dx = 1.
Since Var[x] ≥ 0, we have

E[x²] = x̄² + Var[x] ≥ x̄²,

with equality if and only if Var[x] = 0.
Problem 16
For fixed ȳ,

Pr(x ≤ ȳ) = Fx(ȳ).

The probability that y ∈ [ȳ, ȳ + ∆y] is

Pr(y ∈ [ȳ, ȳ + ∆y]) = ∫_{ȳ}^{ȳ+∆y} fy(y) dy ≈ fy(ȳ) ∆y   for small ∆y.

Therefore, the probability that x ≤ ȳ and y ∈ [ȳ, ȳ + ∆y] is

Pr(x ≤ ȳ ∧ y ∈ [ȳ, ȳ + ∆y]) = Pr(x ≤ ȳ) · Pr(y ∈ [ȳ, ȳ + ∆y]) ≈ Fx(ȳ) fy(ȳ) ∆y    (3)

by independence of x and y.
Now, since we are interested in Pr(x ≤ y), i.e. in the probability of x being less than y for any value of y and not just a fixed ȳ, we sum (3) over a partition of the y-axis into intervals [ȳi, ȳi + ∆y]:

Pr(x ≤ y) = lim_{N→∞} ∑_{i=−N}^{N} Fx(ȳi) fy(ȳi) ∆y.

By letting ∆y → 0, we obtain

Pr(x ≤ y) = ∫_{−∞}^{∞} Fx(y) fy(y) dy.
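Remark: a numerical sketch of this identity for one assumed pair of independent distributions, x ~ N(0, 1) and y ~ N(0.5, 4), with SciPy assumed available for the normal cdf and pdf:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1_000_000)
y = rng.normal(0.5, 2.0, 1_000_000)
print(np.mean(x <= y))                    # Monte Carlo estimate of Pr(x <= y)

t = np.linspace(-12.0, 13.0, 200_001)     # numerical integral of Fx(y) fy(y)
integrand = norm.cdf(t, 0.0, 1.0) * norm.pdf(t, 0.5, 2.0)
print(np.sum(integrand) * (t[1] - t[0]))  # both ~0.588
```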
Problem 17
Consider the random variable (x1 + x2 + · · · + xn)/n. It has mean

E[(x1 + x2 + · · · + xn)/n] = (1/n)(E[x1] + · · · + E[xn]) = (1/n)(n µ) = µ

and variance

Var[(x1 + x2 + · · · + xn)/n] = E[((x1 + x2 + · · · + xn)/n − µ)²]
= (1/n²) E[(x1 − µ)² + · · · + (xn − µ)² + “cross terms”].

The expected value of the cross terms is zero, for example,

E[(xi − µ)(xj − µ)] = E[xi − µ] · E[xj − µ] = 0   for i ≠ j, by independence.

Hence,

Var[(x1 + x2 + · · · + xn)/n] = (1/n²)(Var[x1] + · · · + Var[xn]) = nσ²/n² = σ²/n.

Applying Chebyshev's inequality with ε > 0 gives

Pr(|(x1 + x2 + · · · + xn)/n − µ| ≥ ε) ≤ σ²/(n ε²) → 0 as n → ∞

⇒ Pr(|(x1 + x2 + · · · + xn)/n − µ| > ε) → 0 as n → ∞.

Remark: Changing “≥” to “>” is valid here, since

Pr(|(x1 + · · · + xn)/n − µ| ≥ ε) = Pr(|(x1 + · · · + xn)/n − µ| > ε) + Pr(|(x1 + · · · + xn)/n − µ| = ε),

where the last term is zero since (x1 + · · · + xn)/n is a continuous random variable.
Problem 18
First, we rewrite

Pr(5 < x < 15) = Pr(|x − 10| < 5) = 1 − Pr(|x − 10| ≥ 5).

Using Chebyshev's inequality with x̄ = 10, σ² = 15, and k = 5 yields

Pr(|x − 10| ≥ 5) ≤ 15/5² = 15/25 = 3/5.

Therefore, we can say that

Pr(5 < x < 15) ≥ 1 − 3/5 = 2/5.
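Remark: the bound need not be tight; for a normal distribution with the same mean and variance (one assumed example), the actual probability is much larger:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, np.sqrt(15.0), 1_000_000)  # mean 10, variance 15
print(np.mean((5 < x) & (x < 15)))              # ~0.80, consistent with >= 2/5
```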
Problem 19
First, we compute the mean of xy,

E[xy] = E[x] · E[y] = µx µy.

Hence, by using independence of x and y and the result of Problem 9,

Var[xy] = E[(xy)²] − (µx µy)² = E[x²] E[y²] − µx² µy².

Using E[x²] = σx² + µx² and E[y²] = σy² + µy², we obtain the result

Var[xy] = (σx² + µx²)(σy² + µy²) − µx² µy² = σx² σy² + µy² σx² + µx² σy².
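Remark: a Monte Carlo sketch of the identity for one assumed pair of independent distributions with µx = µy = 2, σx² = 9, σy² = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, 1_000_000)   # mu_x = 2, sigma_x^2 = 9
y = rng.gamma(4.0, 0.5, 1_000_000)    # mu_y = 2, sigma_y^2 = 1, independent of x
print(np.var(x * y))                  # ~49
print(9 * 1 + 2**2 * 9 + 2**2 * 1)    # sx2 sy2 + my2 sx2 + mx2 sy2 = 49
```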
Problem 20
a)
The cumulative distribution function is

F̂x(x) = ∫₀^x λe^{−λt} dt = [−e^{−λt}]₀^x = 1 − e^{−λx}.
Let u be a sample from a uniform distribution. According to the method presented in
class, we obtain a sample x by solving u = F̂x (x) for x, that is,
u = 1 − e^{−λx} ⇔ e^{−λx} = 1 − u ⇔ −λx = ln(1 − u) ⇔ x = −ln(1 − u)/λ.

Notice the following properties: u → 0 ⇒ x → 0, and u → 1 ⇒ x → +∞.
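Remark: a sketch of this inverse-transform sampling in Python (NumPy assumed available; λ = 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(size=1_000_000)   # samples of the uniform distribution
x = -np.log(1.0 - u) / lam        # invert u = 1 - e^{-lambda x}

print(np.mean(x), 1.0 / lam)      # sample mean vs E[x] = 1/lambda
print(np.var(x), 1.0 / lam**2)    # sample variance vs Var[x] = 1/lambda^2
```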
b)
For s ≥ 0, t ≥ 0,
Pr(x > s + t | x > t) = Pr(x > s + t ∧ x > t)/Pr(x > t) = Pr(x > s + t)/Pr(x > t)
= (∫_{s+t}^{∞} λe^{−λx} dx)/(∫_{t}^{∞} λe^{−λx} dx) = [−e^{−λx}]_{s+t}^{∞}/[−e^{−λx}]_{t}^{∞} = e^{−λ(s+t)}/e^{−λt}
= e^{−λs} = ∫_{s}^{∞} λe^{−λx} dx = Pr(x > s).
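Remark: a Monte Carlo sketch of the memoryless property for assumed values λ = 0.7, s = 1, t = 2:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, s, t = 0.7, 1.0, 2.0
x = rng.exponential(1.0 / lam, 10_000_000)  # NumPy parametrizes by scale = 1/lambda

print(np.mean(x[x > t] > s + t))            # Pr(x > s + t | x > t)
print(np.mean(x > s))                       # Pr(x > s) = e^{-0.7} ~ 0.497
```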