Recursive Estimation
Raffaello D'Andrea, Spring 2011

Problem Set: Probability Review

Last updated: February 28, 2011

Notes:

• Notation: Unless otherwise noted, $x$, $y$, and $z$ denote random variables, $f_x(x)$ (or the shorthand $f(x)$) denotes the probability density function of $x$, and $f_{x|y}(x|y)$ (or $f(x|y)$) denotes the conditional probability density function of $x$ conditioned on $y$. The expected value is denoted by $E[\cdot]$, the variance is denoted by $\mathrm{Var}[\cdot]$, and $\Pr(Z)$ denotes the probability that the event $Z$ occurs.

• Please report any errors found in this problem set to the teaching assistants ([email protected] or [email protected]).

Problem Set

Problem 1
Prove $f(x|y) = f(x) \Leftrightarrow f(y|x) = f(y)$.

Problem 2
Prove $f(x|y,z) = f(x|z) \Leftrightarrow f(x,y|z) = f(x|z)\,f(y|z)$.

Problem 3
Come up with an example where $f(x,y|z) = f(x|z)\,f(y|z)$ but $f(x,y) \neq f(x)\,f(y)$ for continuous random variables $x$, $y$, and $z$.

Problem 4
Come up with an example where $f(x,y|z) = f(x|z)\,f(y|z)$ but $f(x,y) \neq f(x)\,f(y)$ for discrete random variables $x$, $y$, and $z$.

Problem 5
Let $x$ and $y$ be binary random variables: $x, y \in \{0,1\}$. Prove that
$$f_{x,y}(x,y) \ge f_x(x) + f_y(y) - 1,$$
which is called Bonferroni's inequality.

Problem 6
Let $y$ be a scalar continuous random variable with probability density function $f_y(y)$, and $g$ be a function transforming $y$ to a new variable $x$ by $x = g(y)$. Derive the change of variables formula, i.e. the equation for the probability density function $f_x(x)$, under the assumption that $g$ is continuously differentiable with $\frac{dg(y)}{dy} < 0 \;\;\forall\, y$.

Problem 7
Let $x \in \mathcal{X}$ be a scalar valued continuous random variable with probability density function $f_x(x)$. Prove the law of the unconscious statistician, that is, for a function $g(\cdot)$ show that the expected value of $y = g(x)$ is given by
$$E[y] = \int_{\mathcal{X}} g(x)\,f_x(x)\,dx.$$
You may consider the special case where $g(\cdot)$ is a continuously differentiable, strictly monotonically increasing function.

Problem 8
Prove $E[ax + b] = aE[x] + b$, where $a$ and $b$ are constants.

Problem 9
Let $x$ and $y$ be scalar random variables. Prove that, if $x$ and $y$ are independent, then for any functions $g(\cdot)$ and $h(\cdot)$,
$$E[g(x)h(y)] = E[g(x)]\,E[h(y)].$$

Problem 10
From above, it follows that if $x$ and $y$ are independent, then $E[xy] = E[x]E[y]$. Is the converse true, that is, if $E[xy] = E[x]E[y]$, does it imply that $x$ and $y$ are independent? If yes, prove it. If no, find a counter-example.

Problem 11
Let $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ be independent, scalar valued discrete random variables. Let $z = x + y$. Prove that
$$f_z(z) = \sum_{y \in \mathcal{Y}} f_x(z - y)\,f_y(y) = \sum_{x \in \mathcal{X}} f_x(x)\,f_y(z - x),$$
that is, the probability density function of $z$ is the convolution of the individual probability density functions.

Problem 12
Let $x$ be a scalar continuous random variable that takes on only non-negative values. Prove that, for $a > 0$,
$$\Pr(x \ge a) \le \frac{E[x]}{a},$$
which is called Markov's inequality.

Problem 13
Let $x$ be a scalar continuous random variable. Let $\bar{x} = E[x]$, $\sigma^2 = \mathrm{Var}[x]$. Prove that for any $k > 0$,
$$\Pr(|x - \bar{x}| \ge k) \le \frac{\sigma^2}{k^2},$$
which is called Chebyshev's inequality.

Problem 14
Let $x_1, x_2, \ldots, x_n$ be independent random variables, each having a uniform distribution over $[0,1]$. Let the random variable $m$ be defined as the maximum of $x_1, \ldots, x_n$, that is, $m = \max\{x_1, x_2, \ldots, x_n\}$. Show that the cumulative distribution function of $m$, $F_m(\cdot)$, is given by
$$F_m(m) = m^n, \quad 0 \le m \le 1.$$
What is the probability density function of $m$?

Problem 15
Prove that $E[x^2] \ge (E[x])^2$. When does the statement hold with equality?
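The claim in Problem 14 lends itself to a quick empirical check. The following is a minimal Python sketch (the choices of $n$, the sample count, and the evaluation points are arbitrary) that compares the empirical CDF of $m$ against $m^n$:

```python
import numpy as np

# Empirical check of Problem 14: for x_1, ..., x_n i.i.d. uniform on [0, 1],
# the CDF of m = max{x_1, ..., x_n} should be F_m(m) = m^n on [0, 1].
rng = np.random.default_rng(0)
n, samples = 5, 200_000              # arbitrary choices for this sketch
m = rng.uniform(size=(samples, n)).max(axis=1)

for t in (0.2, 0.5, 0.8, 0.95):
    empirical = np.mean(m <= t)      # empirical CDF evaluated at t
    print(f"t = {t:4}: empirical = {empirical:.4f}, t^n = {t**n:.4f}")
```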
Problem 16
Suppose that $x$ and $y$ are independent scalar continuous random variables. Show that
$$\Pr(x \le y) = \int_{-\infty}^{\infty} F_x(y)\,f_y(y)\,dy.$$

Problem 17
Use Chebyshev's inequality to prove the weak law of large numbers. Namely, if $x_1, x_2, \ldots$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2$, then, for any $\varepsilon > 0$,
$$\Pr\left( \left| \frac{x_1 + x_2 + \cdots + x_n}{n} - \mu \right| > \varepsilon \right) \to 0 \quad \text{as } n \to \infty.$$

Problem 18
Suppose that $x$ is a random variable with mean 10 and variance 15. What can we say about $\Pr(5 < x < 15)$?

Problem 19
Let $x$ and $y$ be independent random variables with means $\mu_x$ and $\mu_y$ and variances $\sigma_x^2$ and $\sigma_y^2$, respectively. Show that
$$\mathrm{Var}[xy] = \sigma_x^2 \sigma_y^2 + \mu_y^2 \sigma_x^2 + \mu_x^2 \sigma_y^2.$$

Problem 20
a) Use the method presented in class to obtain a sample $x$ of the exponential distribution
$$\hat{f}_x(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
from a given sample $u$ of a uniform distribution.
b) Exponential random variables are used to model failures of components that do not wear, for example solid state electronics. This is based on their property
$$\Pr(x > s + t \,|\, x > t) = \Pr(x > s) \quad \forall\, s, t \ge 0.$$
A random variable with this property is said to be memoryless. Prove that a random variable with an exponential distribution satisfies this property.

Sample solutions

Problem 1
From the definition of conditional probability, we have $f(x,y) = f(x|y)f(y)$ and $f(x,y) = f(y|x)f(x)$, and therefore
$$f(x|y)f(y) = f(y|x)f(x). \tag{1}$$
We can assume that $f(x) \neq 0$ and $f(y) \neq 0$; otherwise $f(y|x)$ and $f(x|y)$, respectively, are not defined, and the statement under consideration does not make sense. We check both "directions" of the equivalence statement:
$$f(x|y) = f(x) \;\overset{(1)}{\Longrightarrow}\; f(x)f(y) = f(y|x)f(x) \;\overset{f(x)\neq 0}{\Longrightarrow}\; f(y) = f(y|x)$$
$$f(y|x) = f(y) \;\overset{(1)}{\Longrightarrow}\; f(x|y)f(y) = f(y)f(x) \;\overset{f(y)\neq 0}{\Longrightarrow}\; f(x) = f(x|y)$$
Therefore, $f(x|y) = f(x) \Leftrightarrow f(y|x) = f(y)$.

Problem 2
Using
$$f(x,y|z) = f(x|y,z)\,f(y|z) \tag{2}$$
both "directions" of the equivalence statement can be shown as follows:
$$f(x|y,z) = f(x|z) \;\overset{(2)}{\Longrightarrow}\; f(x,y|z) = f(x|z)\,f(y|z)$$
$$f(x,y|z) = f(x|z)\,f(y|z) \;\overset{(2)}{\Longrightarrow}\; f(x|y,z)\,f(y|z) = f(x|z)\,f(y|z) \;\Longrightarrow\; f(x|y,z) = f(x|z),$$
since $f(y|z) \neq 0$; otherwise $f(x|y,z)$ is not defined.

Problem 3
Let $x, y, z \in [0,1]$ be continuous random variables. Consider the joint pdf
$$f(x,y,z) = c\,g_x(x,z)\,g_y(y,z),$$
where $g_x$ and $g_y$ are functions (not necessarily pdfs) and $c$ is a constant such that
$$\int_0^1 \int_0^1 \int_0^1 f(x,y,z)\,dx\,dy\,dz = 1.$$
Then
$$f(x,y|z) = \frac{f(x,y,z)}{f(z)} = \frac{c\,g_x(x,z)\,g_y(y,z)}{c\,G_x(z)\,G_y(z)}, \quad \text{with} \quad G_x(z) = \int_0^1 g_x(x,z)\,dx, \quad G_y(z) = \int_0^1 g_y(y,z)\,dy,$$
since
$$f(z) = c \int_0^1 \int_0^1 g_x(x,z)\,g_y(y,z)\,dx\,dy = c\,G_x(z) \int_0^1 g_y(y,z)\,dy = c\,G_x(z)\,G_y(z).$$
Furthermore,
$$f(x|z) = \frac{f(x,z)}{f(z)} = \frac{\int_0^1 f(x,y,z)\,dy}{f(z)} = \frac{c\,g_x(x,z)\,G_y(z)}{c\,G_x(z)\,G_y(z)} = \frac{g_x(x,z)}{G_x(z)}$$
and, by analogy,
$$f(y|z) = \frac{g_y(y,z)}{G_y(z)}.$$
Continuing the above,
$$f(x,y|z) = \frac{g_x(x,z)\,g_y(y,z)}{G_x(z)\,G_y(z)} = f(x|z)\,f(y|z),$$
that is, $x$ and $y$ are conditionally independent.
Now, let $g_x(x,z) = x + z$ and $g_y(y,z) = y + z$. Then,
$$f(x,y) = \int_0^1 f(x,y,z)\,dz = c \int_0^1 (x+z)(y+z)\,dz = c \int_0^1 \left( z^2 + (x+y)z + xy \right) dz$$
$$= c \left[ \tfrac{1}{3}z^3 + \tfrac{1}{2}(x+y)z^2 + xyz \right]_{z=0}^{z=1} = c \left( \tfrac{1}{3} + \tfrac{1}{2}(x+y) + xy \right),$$
$$f(x) = \int_0^1 f(x,y)\,dy = c \int_0^1 \left( \tfrac{1}{3} + \tfrac{1}{2}x + \tfrac{1}{2}y + xy \right) dy = c \left( \tfrac{1}{3} + \tfrac{1}{2}x + \tfrac{1}{4} + \tfrac{1}{2}x \right) = c \left( x + \tfrac{7}{12} \right),$$
$$f(y) = c \left( y + \tfrac{7}{12} \right) \quad \text{(by symmetry)}.$$
Therefore,
$$f(x)\,f(y) = c^2 \left( \tfrac{49}{144} + \tfrac{7}{12}(x+y) + xy \right) \neq f(x,y),$$
that is, $x$ and $y$ are not independent.
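This example can also be verified numerically. The following Python sketch (a simple midpoint-rule quadrature on a grid; the resolution N is an arbitrary choice) checks both the conditional factorization and the marginal dependence for $f(x,y,z) = c\,(x+z)(y+z)$:

```python
import numpy as np

# Numerical check of the Problem 3 example: f(x, y, z) = c (x + z)(y + z) on [0, 1]^3.
# Midpoint-rule quadrature on a regular grid; the resolution N is an arbitrary choice.
N = 100
t = (np.arange(N) + 0.5) / N                # midpoints of [0, 1]
x, y, z = np.meshgrid(t, t, t, indexing="ij")
g = (x + z) * (y + z)                       # unnormalized joint density
c = 1.0 / g.mean()                          # mean over the unit cube ~ the integral
f = c * g                                   # normalized joint pdf on the grid

f_xy = f.mean(axis=2)                       # marginalize over z
f_x = f_xy.mean(axis=1)                     # marginalize over y
f_y = f_xy.mean(axis=0)                     # marginalize over x

# Conditional factorization on a fixed z-slice: f(x, y | z) = f(x | z) f(y | z)
k = N // 2
f_z = f[:, :, k].mean()                     # f(z_k)
f_xy_z = f[:, :, k] / f_z                   # f(x, y | z_k)
f_x_z = f[:, :, k].mean(axis=1) / f_z       # f(x | z_k)
f_y_z = f[:, :, k].mean(axis=0) / f_z       # f(y | z_k)
print("max |f(x,y|z) - f(x|z) f(y|z)| =",
      np.abs(f_xy_z - np.outer(f_x_z, f_y_z)).max())   # ~ 0 (up to rounding)

# Marginal dependence: f(x, y) differs from f(x) f(y)
print("max |f(x,y) - f(x) f(y)|       =",
      np.abs(f_xy - np.outer(f_x, f_y)).max())          # clearly nonzero
```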
Problem 4
As in Problem 3, let $f(x,y,z) = c\,g_x(x,z)\,g_y(y,z)$. This ensures that $f(x,y|z) = f(x|z)\,f(y|z)$.
Let $x, y, z \in \{0,1\}$, and define $g_x$ and $g_y$ by value tables:

  x  z  g_x(x,z)        y  z  g_y(y,z)
  0  0  0               0  0  1
  0  1  1               0  1  0
  1  0  1               1  0  0
  1  1  0               1  1  1

With $c = \tfrac{1}{2}$ (from normalization), the joint pdf $f(x,y,z) = \tfrac{1}{2}\,g_x(x,z)\,g_y(y,z)$ is:

  x  y  z  f(x,y,z)
  0  0  0  0
  0  0  1  0
  0  1  0  0
  0  1  1  0.5
  1  0  0  0.5
  1  0  1  0
  1  1  0  0
  1  1  1  0

Compute $f(x,y)$, $f(x)$, and $f(y)$:

  x  y  f(x,y)          x  f(x)        y  f(y)
  0  0  0               0  0.5         0  0.5
  0  1  0.5             1  0.5         1  0.5
  1  0  0.5
  1  1  0

and therefore

  x  y  f(x)f(y)
  0  0  0.25
  0  1  0.25
  1  0  0.25
  1  1  0.25

From this we see that $f(x,y) \neq f(x)\,f(y)$.
Remark: For verification, one may also compute $f(x,y|z)$, $f(x|z)$, and $f(y|z)$ in order to verify $f(x,y|z) = f(x|z)\,f(y|z)$, although this holds by construction of $f(x,y,z)$ and the derivation in Problem 3.

Problem 5
For all $x, y \in \{0,1\}$ we have, by marginalization,
$$f_x(x) = \sum_{y \in \{0,1\}} f_{xy}(x,y) = f_{xy}(x,0) + f_{xy}(x,1) = f_{xy}(x,y) + f_{xy}(x,1-y),$$
$$f_y(y) = \sum_{x \in \{0,1\}} f_{xy}(x,y) = f_{xy}(x,y) + f_{xy}(1-x,y).$$
Therefore,
$$f_x(x) + f_y(y) = 2 f_{xy}(x,y) + f_{xy}(x,1-y) + f_{xy}(1-x,y)$$
$$= 2 f_{xy}(x,y) + f_{xy}(x,1-y) + f_{xy}(1-x,y) + f_{xy}(1-x,1-y) - f_{xy}(1-x,1-y)$$
$$= f_{xy}(x,y) + \underbrace{f_{xy}(x,y) + f_{xy}(x,1-y) + f_{xy}(1-x,y) + f_{xy}(1-x,1-y)}_{=\; \sum_{y \in \{0,1\}} \sum_{x \in \{0,1\}} f_{xy}(x,y) \;=\; 1} - f_{xy}(1-x,1-y)$$
$$= 1 + f_{xy}(x,y) - f_{xy}(1-x,1-y) \;\le\; 1 + f_{xy}(x,y)$$
$$\Longrightarrow \; f_{xy}(x,y) \ge f_x(x) + f_y(y) - 1.$$

Problem 6
Consider the probability that $y$ is in a small interval $[\bar{y}, \bar{y} + \Delta y]$ (with $\Delta y > 0$ small),
$$\Pr(y \in [\bar{y}, \bar{y} + \Delta y]) = \int_{\bar{y}}^{\bar{y} + \Delta y} f_y(y)\,dy,$$
which approaches $f_y(\bar{y})\,\Delta y$ in the limit as $\Delta y \to 0$. Let $\bar{x} := g(\bar{y})$. For small $\Delta y$, we have by a Taylor series approximation,
$$g(\bar{y} + \Delta y) \approx g(\bar{y}) + \underbrace{\frac{dg}{dy}(\bar{y})\,\Delta y}_{=:\,\Delta x} = \bar{x} + \Delta x.$$
Since $g(y)$ is decreasing, $\Delta x < 0$; see Figure 1.
[Figure 1: Decreasing function $g(y)$ on a small interval $[\bar{y}, \bar{y} + \Delta y]$, which is mapped to $[\bar{x} + \Delta x, \bar{x}]$.]
We are interested in the probability of $x \in [\bar{x} + \Delta x, \bar{x}]$. For small $\Delta x$, this probability is equal to the probability that $y \in [\bar{y}, \bar{y} + \Delta y]$ (since $y \in [\bar{y}, \bar{y} + \Delta y] \Rightarrow g(y) \in [g(\bar{y} + \Delta y), g(\bar{y})] \Rightarrow x \in [\bar{x} + \Delta x, \bar{x}]$ for small $\Delta y$ and $\Delta x$). Therefore, for small $\Delta y$ (and thus small $\Delta x$),
$$\Pr(y \in [\bar{y}, \bar{y} + \Delta y]) = f_y(\bar{y})\,\Delta y = \Pr(x \in [\bar{x} + \Delta x, \bar{x}]) = -f_x(\bar{x})\,\Delta x \quad \Rightarrow \quad f_x(\bar{x}) = \frac{f_y(\bar{y})}{-\frac{\Delta x}{\Delta y}}.$$
As $\Delta y \to 0$ (and thus $\Delta x \to 0$), $\frac{\Delta x}{\Delta y} \to \frac{dg}{dy}(\bar{y})$ and therefore
$$f_x(\bar{x}) = \frac{f_y(\bar{y})}{-\frac{dg}{dy}(\bar{y})} \quad \text{or} \quad f_x(x) = \frac{f_y(y)}{-\frac{dg}{dy}(y)}.$$

Problem 7
From the change of variables formula (see lecture), we have
$$f_y(y) = \frac{f_x(x)}{\frac{dg}{dx}(x)}$$
since $g$ is continuously differentiable and strictly monotonically increasing. Furthermore, $y = g(x) \Rightarrow dy = \frac{dg}{dx}(x)\,dx$. Let $\mathcal{Y} = g(\mathcal{X})$, that is, $\mathcal{Y} = \{ y \mid \exists\, x \in \mathcal{X} : y = g(x) \}$. Then
$$E[y] = \int_{\mathcal{Y}} y\,f_y(y)\,dy = \int_{\mathcal{X}} g(x)\,\frac{f_x(x)}{\frac{dg}{dx}(x)}\,\frac{dg}{dx}(x)\,dx = \int_{\mathcal{X}} g(x)\,f_x(x)\,dx.$$

Problem 8
Using the law of the unconscious statistician (Problem 7), we have for a discrete random variable $x$,
$$E[ax + b] = \sum_{x \in \mathcal{X}} (ax + b)\,f_x(x) = a \underbrace{\sum_{x \in \mathcal{X}} x\,f_x(x)}_{E[x]} + b \underbrace{\sum_{x \in \mathcal{X}} f_x(x)}_{1} = aE[x] + b,$$
and for a continuous random variable $x$,
$$E[ax + b] = \int_{-\infty}^{\infty} (ax + b)\,f_x(x)\,dx = a \underbrace{\int_{-\infty}^{\infty} x\,f_x(x)\,dx}_{E[x]} + b \underbrace{\int_{-\infty}^{\infty} f_x(x)\,dx}_{1} = aE[x] + b.$$

Problem 9
Using the law of the unconscious statistician (Problem 7) applied to the vector random variable $\begin{bmatrix} x \\ y \end{bmatrix}$ with pdf $f_{xy}(x,y)$, we have
$$E[g(x)h(y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x)h(y)\,f_{xy}(x,y)\,dx\,dy$$
$$= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x)h(y)\,f_x(x)\,f_y(y)\,dx\,dy \quad \text{(by independence of } x, y\text{)}$$
$$= \int_{-\infty}^{\infty} g(x)\,f_x(x)\,dx \int_{-\infty}^{\infty} h(y)\,f_y(y)\,dy = E[g(x)]\,E[h(y)],$$
and similarly for discrete random variables.
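A quick Monte Carlo illustration of this product rule in Python (the distributions and the functions $g$, $h$ below are arbitrary choices):

```python
import numpy as np

# Monte Carlo illustration of Problem 9: for independent x and y,
# E[g(x)h(y)] = E[g(x)] E[h(y)].  Distributions and g, h are arbitrary choices.
rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=1_000_000)   # x ~ N(2, 1)
y = rng.uniform(0.0, 1.0, size=1_000_000)  # y ~ U[0, 1], drawn independently

g = np.sin(x)
h = y**2

print("E[g(x)h(y)]    =", np.mean(g * h))
print("E[g(x)]E[h(y)] =", np.mean(g) * np.mean(h))
```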
Problem 10
We will construct a counter-example.
• Let $x$ be a discrete random variable that takes the values $-1$, $0$, and $1$ with probabilities $\tfrac{1}{4}$, $\tfrac{1}{2}$, and $\tfrac{1}{4}$, respectively.
• Let $w$ be a discrete random variable taking the values $-1$ and $1$ with equal probability. Let $x$ and $w$ be independent.
• Let $y$ be a discrete random variable defined by $y = xw$ (it follows that $y$ can take the values $-1$, $0$, $1$).
By construction, it is clear that $y$ depends on $x$. We will now show that $y$ and $x$ are not independent, but $E[xy] = E[x]E[y]$.
The joint pdf $f_{xy}(x,y)$:

   x   y   f_xy(x,y)
  -1  -1   1/8    (x = -1 with prob. 1/4, and w = 1 with prob. 1/2)
  -1   0   0      (impossible since w ≠ 0)
  -1   1   1/8    (x = -1 with prob. 1/4, and w = -1 with prob. 1/2)
   0  -1   0      (impossible: x = 0 ⇒ y = 0, w being ±1)
   0   0   1/2    (x = 0 with prob. 1/2, for either value of w)
   0   1   0      (impossible: x = 0 ⇒ y = 0)
   1  -1   1/8    (x = 1 and w = -1)
   1   0   0      (impossible since w ≠ 0)
   1   1   1/8    (x = 1 and w = 1)

Marginalization gives $f_x(x)$ and $f_y(y)$:

   x  f(x)           y  f(y)
  -1  0.25          -1  0.25
   0  0.5            0  0.5
   1  0.25           1  0.25

It follows that $f_{xy}(x,y) = f_x(x)\,f_y(y)\;\forall\, x, y$ does not hold. For example, $f_{xy}(-1,0) = 0 \neq f_x(-1)\,f_y(0) = \tfrac{1}{8}$.
Expected values:
$$E[xy] = \sum_x \sum_y xy\,f_{xy}(x,y) = (-1)(-1)\tfrac{1}{8} + (-1)(1)\tfrac{1}{8} + (0)(0)\tfrac{1}{2} + (1)(-1)\tfrac{1}{8} + (1)(1)\tfrac{1}{8} = 0$$
$$E[x] = \sum_x x\,f_x(x) = -\tfrac{1}{4} + \tfrac{1}{4} = 0, \qquad E[y] = 0.$$
So $E[xy] = E[x]E[y]$, but $f_{xy}(x,y) \neq f_x(x)\,f_y(y)$. Hence, we have shown by counter-example that $E[xy] = E[x]E[y]$ does not imply independence of $x$ and $y$.

Problem 11
From the definition of the probability density function for discrete random variables,
$$f_z(\bar{z}) = \Pr(z = \bar{z}).$$
The probability that $z$ takes on $\bar{z}$ is given by the sum of the probabilities of all possibilities of $x$ and $y$ such that $\bar{z} = x + y$, that is, all $\bar{y} \in \mathcal{Y}$ such that $x = \bar{z} - \bar{y}$ and $y = \bar{y}$ (or, alternatively, all $\bar{x} \in \mathcal{X}$ such that $x = \bar{x}$ and $y = \bar{z} - \bar{x}$). That is,
$$f_z(\bar{z}) = \Pr(z = \bar{z}) = \sum_{\bar{y} \in \mathcal{Y}} \Pr(x = \bar{z} - \bar{y}) \cdot \Pr(y = \bar{y}) \quad \text{(by the independence assumption)}$$
$$= \sum_{\bar{y} \in \mathcal{Y}} f_x(\bar{z} - \bar{y})\,f_y(\bar{y}).$$
Rewriting the last equation yields $f_z(z) = \sum_{y \in \mathcal{Y}} f_x(z - y)\,f_y(y)$. Similarly, $f_z(z) = \sum_{x \in \mathcal{X}} f_x(x)\,f_y(z - x)$ may be obtained.

Problem 12
$$E[x] = \int_0^{\infty} x\,f(x)\,dx = \underbrace{\int_0^a x\,f(x)\,dx}_{\ge\, 0} + \int_a^{\infty} x\,f(x)\,dx \;\ge\; \int_a^{\infty} x\,f(x)\,dx \;\ge\; \int_a^{\infty} a\,f(x)\,dx = a \int_a^{\infty} f(x)\,dx = a \Pr(x \ge a).$$

Problem 13
This inequality can be obtained using Markov's inequality (Problem 12):
• $(x - \bar{x})^2$ is a non-negative random variable; apply Markov's inequality with $a = k^2$ ($k > 0$):
$$\Pr\left( (x - \bar{x})^2 \ge k^2 \right) \le \frac{E\left[ (x - \bar{x})^2 \right]}{k^2} = \frac{\mathrm{Var}[x]}{k^2} = \frac{\sigma^2}{k^2}.$$
• Since $(x - \bar{x})^2 \ge k^2 \Leftrightarrow |x - \bar{x}| \ge k$, the result follows:
$$\Pr(|x - \bar{x}| \ge k) \le \frac{\sigma^2}{k^2}.$$

Problem 14
For $0 \le t \le 1$, we write
$$F_m(t) = \Pr(m \in [0,t]) = \Pr(\max\{x_1, \ldots, x_n\} \in [0,t]) = \Pr(x_i \in [0,t]\;\forall\, i \in \{1, \ldots, n\})$$
$$= \Pr(x_1 \in [0,t]) \cdot \Pr(x_2 \in [0,t]) \cdots \Pr(x_n \in [0,t]) = t^n.$$
By substituting $m$ for $t$, we obtain the desired result. Since $F_m(m) = \int_0^m f_m(\tau)\,d\tau$, the pdf of $m$ is obtained by differentiating,
$$f_m(m) = \frac{dF_m}{dm}(m) = \frac{d}{dm}\left( m^n \right) = n\,m^{n-1}, \quad 0 \le m \le 1.$$

Problem 15
Let $\bar{x} = E[x]$. Then
$$\mathrm{Var}[x] = \int_{-\infty}^{\infty} (x - \bar{x})^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - 2\bar{x} \underbrace{\int_{-\infty}^{\infty} x\,f(x)\,dx}_{=\,\bar{x}} + \bar{x}^2 \underbrace{\int_{-\infty}^{\infty} f(x)\,dx}_{=\,1} = E[x^2] - 2\bar{x}^2 + \bar{x}^2 = E[x^2] - \bar{x}^2.$$
Since $\mathrm{Var}[x] \ge 0$, we have
$$E[x^2] = \bar{x}^2 + \mathrm{Var}[x] \ge \bar{x}^2,$$
with equality if and only if $\mathrm{Var}[x] = 0$.

Problem 16
For fixed $\bar{y}$, $\Pr(x \le \bar{y}) = F_x(\bar{y})$. The probability that $y \in [\bar{y}, \bar{y} + \Delta y]$ is
$$\Pr(y \in [\bar{y}, \bar{y} + \Delta y]) = \int_{\bar{y}}^{\bar{y} + \Delta y} f_y(y)\,dy \approx f_y(\bar{y})\,\Delta y \quad \text{for small } \Delta y.$$
Therefore, the probability that $x \le \bar{y}$ and $y \in [\bar{y}, \bar{y} + \Delta y]$ is
$$\Pr(x \le \bar{y} \wedge y \in [\bar{y}, \bar{y} + \Delta y]) = \Pr(x \le \bar{y}) \cdot \Pr(y \in [\bar{y}, \bar{y} + \Delta y]) \approx F_x(\bar{y})\,f_y(\bar{y})\,\Delta y \tag{3}$$
by independence of $x$ and $y$. Now, since we are interested in $\Pr(x \le y)$, i.e. in the probability of $x$ being less than the random variable $y$ and not just a fixed $\bar{y}$, we sum (3) over a partition of the real line into intervals $[\bar{y}_i, \bar{y}_i + \Delta y]$:
$$\Pr(x \le y) \approx \sum_{i=-N}^{N} F_x(\bar{y}_i) \cdot f_y(\bar{y}_i) \cdot \Delta y.$$
By letting $\Delta y \to 0$ and $N \to \infty$, we obtain
$$\Pr(x \le y) = \int_{-\infty}^{\infty} F_x(y)\,f_y(y)\,dy.$$
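As a numerical illustration of this result: for independent exponential random variables the integral has a simple closed form, $\Pr(x \le y) = \lambda_1 / (\lambda_1 + \lambda_2)$, which a Monte Carlo estimate should reproduce. A minimal Python sketch (the rates and the sample count are arbitrary choices):

```python
import numpy as np

# Numerical illustration of Problem 16 with x ~ Exp(lam1), y ~ Exp(lam2),
# drawn independently (the distributions are arbitrary choices for this sketch).
# Closed form of the integral of F_x(y) f_y(y) dy: lam1 / (lam1 + lam2).
rng = np.random.default_rng(2)
lam1, lam2 = 1.5, 0.5
x = rng.exponential(1.0 / lam1, size=1_000_000)  # numpy parameterizes by the mean
y = rng.exponential(1.0 / lam2, size=1_000_000)

print("Monte Carlo Pr(x <= y) =", np.mean(x <= y))
print("closed form            =", lam1 / (lam1 + lam2))
```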
Problem 17
Consider the random variable $\frac{x_1 + x_2 + \cdots + x_n}{n}$. It has the mean
$$E\left[ \frac{x_1 + x_2 + \cdots + x_n}{n} \right] = \frac{1}{n}\left( E[x_1] + \cdots + E[x_n] \right) = \frac{1}{n}(n\mu) = \mu$$
and variance
$$\mathrm{Var}\left[ \frac{x_1 + x_2 + \cdots + x_n}{n} \right] = E\left[ \left( \frac{x_1 + x_2 + \cdots + x_n}{n} - \mu \right)^2 \right] = \frac{1}{n^2} E\left[ (x_1 - \mu)^2 + \cdots + (x_n - \mu)^2 + \text{``cross terms''} \right].$$
The expected value of the cross terms is zero; for example, for $i \neq j$,
$$E[(x_i - \mu)(x_j - \mu)] = E[x_i - \mu] \cdot E[x_j - \mu] = 0 \quad \text{by independence}.$$
Hence,
$$\mathrm{Var}\left[ \frac{x_1 + x_2 + \cdots + x_n}{n} \right] = \frac{1}{n^2}\left( \mathrm{Var}[x_1] + \cdots + \mathrm{Var}[x_n] \right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
Applying Chebyshev's inequality with $\varepsilon > 0$ gives
$$\Pr\left( \left| \frac{x_1 + x_2 + \cdots + x_n}{n} - \mu \right| \ge \varepsilon \right) \le \frac{\sigma^2}{n\,\varepsilon^2} \longrightarrow 0 \quad \text{as } n \to \infty$$
$$\Rightarrow \; \Pr\left( \left| \frac{x_1 + x_2 + \cdots + x_n}{n} - \mu \right| > \varepsilon \right) \longrightarrow 0 \quad \text{as } n \to \infty.$$
Remark: Changing "$\ge$" to "$>$" is valid here, since
$$\Pr\left( \left| \tfrac{x_1 + \cdots + x_n}{n} - \mu \right| \ge \varepsilon \right) = \Pr\left( \left| \tfrac{x_1 + \cdots + x_n}{n} - \mu \right| > \varepsilon \right) + \underbrace{\Pr\left( \left| \tfrac{x_1 + \cdots + x_n}{n} - \mu \right| = \varepsilon \right)}_{=\,0},$$
where the last term is zero since $\frac{x_1 + \cdots + x_n}{n}$ is a continuous random variable.

Problem 18
First, we rewrite
$$\Pr(5 < x < 15) = \Pr(|x - 10| < 5) = 1 - \Pr(|x - 10| \ge 5).$$
Using Chebyshev's inequality with $\bar{x} = 10$, $\sigma^2 = 15$, and $k = 5$ yields
$$\Pr(|x - 10| \ge 5) \le \frac{15}{5^2} = \frac{15}{25} = \frac{3}{5}.$$
Therefore, we can say that
$$\Pr(5 < x < 15) \ge 1 - \frac{3}{5} = \frac{2}{5}.$$

Problem 19
First, we compute the mean of $xy$,
$$E[xy] = E[x] \cdot E[y] = \mu_x \mu_y.$$
Hence, by using the independence of $x$ and $y$ and the product rule of Problem 9,
$$\mathrm{Var}[xy] = E\left[ (xy)^2 \right] - (\mu_x \mu_y)^2 = E[x^2]\,E[y^2] - \mu_x^2 \mu_y^2.$$
Using $E[x^2] = \sigma_x^2 + \mu_x^2$ and $E[y^2] = \sigma_y^2 + \mu_y^2$, we obtain the result
$$\mathrm{Var}[xy] = \left( \sigma_x^2 + \mu_x^2 \right)\left( \sigma_y^2 + \mu_y^2 \right) - \mu_x^2 \mu_y^2 = \sigma_x^2 \sigma_y^2 + \mu_y^2 \sigma_x^2 + \mu_x^2 \sigma_y^2.$$

Problem 20
a) The cumulative distribution function is, for $x \ge 0$,
$$\hat{F}_x(x) = \int_0^x \lambda e^{-\lambda t}\,dt = \left[ -e^{-\lambda t} \right]_{t=0}^{x} = 1 - e^{-\lambda x}.$$
Let $u$ be a sample from a uniform distribution over $[0,1)$. According to the method presented in class, we obtain a sample $x$ by solving $u = \hat{F}_x(x)$ for $x$, that is,
$$u = 1 - e^{-\lambda x} \;\Leftrightarrow\; e^{-\lambda x} = 1 - u \;\Leftrightarrow\; -\lambda x = \ln(1 - u) \;\Leftrightarrow\; x = \frac{-\ln(1 - u)}{\lambda}.$$
Notice the following properties: $u \to 0 \Rightarrow x \to 0$, and $u \to 1 \Rightarrow x \to +\infty$.
b) For $s \ge 0$, $t \ge 0$,
$$\Pr(x > s + t \,|\, x > t) = \frac{\Pr(x > s + t \wedge x > t)}{\Pr(x > t)} = \frac{\Pr(x > s + t)}{\Pr(x > t)} = \frac{\int_{s+t}^{\infty} \lambda e^{-\lambda x}\,dx}{\int_{t}^{\infty} \lambda e^{-\lambda x}\,dx} = \frac{\left[ -e^{-\lambda x} \right]_{s+t}^{\infty}}{\left[ -e^{-\lambda x} \right]_{t}^{\infty}} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda t}} = e^{-\lambda s} = \int_{s}^{\infty} \lambda e^{-\lambda x}\,dx = \Pr(x > s).$$
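The inverse-transform construction in part a) translates directly into code. The following minimal Python sketch ($\lambda$, $s$, $t$, and the sample count are arbitrary choices) draws samples via the inverse transform and also checks the memoryless property of part b) empirically:

```python
import numpy as np

# Inverse-transform sampling for the exponential distribution (Problem 20 a):
# if u ~ U[0, 1), then x = -ln(1 - u)/lam has pdf lam * exp(-lam * x), x >= 0.
rng = np.random.default_rng(3)
lam = 2.0                                  # arbitrary rate for this sketch
u = rng.uniform(size=1_000_000)
x = -np.log1p(-u) / lam                    # log1p(-u) = ln(1 - u), numerically stable

print("sample mean:", x.mean(), " (theory: 1/lam =", 1 / lam, ")")

# Empirical check of memorylessness (Problem 20 b):
# Pr(x > s + t | x > t) should equal Pr(x > s).
s, t = 0.4, 0.7                            # arbitrary choices
lhs = np.mean(x > s + t) / np.mean(x > t)  # conditional probability estimate
print("Pr(x > s+t | x > t) =", lhs)
print("Pr(x > s)           =", np.mean(x > s))
```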