Independent random variables

E6711: Lectures 3
Prof. Predrag Jelenković

1 Last two lectures

- probability spaces
- probability measure
- random variables and stochastic processes
- distribution functions
- independence
- conditional probability
- memoryless property of geometric and exponential distributions
- expectation
- conditional expectation (double expectation)
- mean-square estimation

Let $\{X_j, j \ge 1\}$ be a sequence of independent random variables and let

  $N_n = \sum_{j=1}^{n} X_j$

be the partial sum of the first $n$ of these r.v.s. In many applications, understanding the statistical behavior of these sums is very important; thus, a big part of probability theory studies the characteristics of $N_n$. In this lecture we review some of the well-known theorems of probability theory:

- Markov's and Chebyshev's inequalities
- Laws of Large Numbers
- Central Limit Theorem

2 Inequalities

Proposition 2.1 (Markov's inequality) If $X$ is a nonnegative random variable, then for any $a > 0$

  $P[X \ge a] \le \frac{E X}{a}.$

Proof: For $a > 0$, define the indicator function

  $1[X \ge a] = \begin{cases} 1 & \text{if } X \ge a \\ 0 & \text{otherwise.} \end{cases}$

Then $1[X \ge a] \le X/a$; thus, by taking the expected value on both sides of the preceding inequality, we obtain

  $E\, 1[X \ge a] = P[X \ge a] \le \frac{E X}{a}. \qquad \Box$

Corollary 2.1 (Chebyshev's inequality) If $X$ is a random variable with finite mean $\mu$ and variance $\sigma^2 = E(X - \mu)^2$, then for any $\epsilon > 0$

  $P[|X - \mu| \ge \epsilon] \le \frac{\sigma^2}{\epsilon^2}.$

Proof: Let $Y = (X - \mu)^2$; then

  $P[|X - \mu| \ge \epsilon] = P[(X - \mu)^2 \ge \epsilon^2] = P[Y \ge \epsilon^2] \le \frac{E Y}{\epsilon^2} = \frac{\sigma^2}{\epsilon^2},$

where the last inequality follows from Markov's inequality. $\Box$

Corollary 2.2 (Chernoff's bound) Let $M(t) \stackrel{def}{=} E e^{tX} < \infty$ for some $t > 0$. Then

  $P[X \ge y] \le e^{-ty} M(t).$

Proof: Let $Y = e^{tX}$ and $a = e^{ty}$; then

  $P[X \ge y] = P[tX \ge ty] = P[e^{tX} \ge e^{ty}] = P[Y \ge a] \le \frac{E Y}{a} = e^{-ty} M(t);$

note that the last inequality follows from Markov's inequality. $\Box$

3 Laws of Large Numbers: ergodic theorems

Ergodic theory studies the conditions under which the sample path average

  $Y \stackrel{def}{=} \frac{X_1 + \cdots + X_n}{n}$   (3.1)

converges to the mean $\mu = E X_1$ as $n \to \infty$.

Theorem 3.1 (Weak Law of Large Numbers) Let $X_1, X_2, \ldots$ be a sequence of independent random variables with finite mean $\mu$ and variance $\sigma^2$. Then, for any $\epsilon > 0$,

  $P\left[\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| \ge \epsilon\right] \to 0 \quad \text{as } n \to \infty.$

Proof: Recall the definition of $Y$ from equation (3.1); then

  $E Y = E\,\frac{X_1 + X_2 + \cdots + X_n}{n} = \mu$

and

  $\mathrm{Var}(Y) = \mathrm{Var}\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)}{n^2} = \frac{\sigma^2}{n}.$

Thus, by Chebyshev's inequality,

  $P[|Y - \mu| \ge \epsilon] \le \frac{\sigma^2}{n \epsilon^2} \to 0 \quad \text{as } n \to \infty,$

and we conclude the proof of the theorem. $\Box$

Now we know that

  $P\left[\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| \ge \epsilon\right]$

converges to zero; however, it is not clear how fast. This question is investigated by the theory of Large Deviations. Recall $M(t) = E e^{tX_1}$ and define the rate function

  $l(a) \stackrel{def}{=} -\log \inf_{t \ge 0} e^{-ta} M(t) = \sup_{t \ge 0}\,(ta - \log M(t)).$

Then

Theorem 3.2 For every $a > E X_1$ and $n \ge 1$,

  $P\left[\frac{X_1 + X_2 + \cdots + X_n}{n} \ge a\right] \le e^{-n\, l(a)}.$

Proof: For any $t > 0$,

  $P\left[\frac{X_1 + X_2 + \cdots + X_n}{n} \ge a\right] = P\left[e^{t(X_1 + X_2 + \cdots + X_n)} \ge e^{tan}\right] \le \frac{E\, e^{t(X_1 + X_2 + \cdots + X_n)}}{e^{tan}} \quad \text{(Chernoff's inequality)} = e^{-tan} \left(E e^{tX_1}\right)^n.$

Thus,

  $P\left[\frac{X_1 + X_2 + \cdots + X_n}{n} \ge a\right] \le \inf_{t \ge 0} e^{-tan} \left(E e^{tX_1}\right)^n = e^{-n\, l(a)}. \qquad \Box$
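As a quick numerical illustration of how much sharper the large-deviations bound is than Chebyshev's bound, one can compare both with the empirical tail probability of the sample mean. The sketch below is not part of the notes; it assumes, purely for concreteness, $X_j \sim$ Bernoulli$(1/2)$, for which the rate function evaluates to $l(a) = a \log(a/p) + (1-a)\log((1-a)/(1-p))$ with $p = 1/2$.

```python
import numpy as np

# Monte Carlo comparison of the empirical tail probability P[S_n/n >= a] with
# Chebyshev's bound sigma^2/(n*eps^2) (Theorem 3.1; it bounds the two-sided
# event |S_n/n - mu| >= eps, which dominates the one-sided probability) and the
# large-deviations bound e^{-n*l(a)} (Theorem 3.2).
# Illustrative (assumed) choice: X_j ~ Bernoulli(1/2), so mu = 1/2, sigma^2 = 1/4.
rng = np.random.default_rng(0)
p, a = 0.5, 0.6                                   # threshold a > E[X_1]
eps = a - p
l_a = a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))   # rate function at a

for n in (50, 100, 200):
    s_n_over_n = rng.binomial(n, p, size=200_000) / n            # 200,000 copies of S_n/n
    empirical = np.mean(s_n_over_n >= a)
    chebyshev = p * (1 - p) / (n * eps ** 2)
    large_dev = np.exp(-n * l_a)
    print(f"n={n:4d}  empirical={empirical:.5f}  "
          f"Chebyshev={chebyshev:.5f}  large-dev={large_dev:.5f}")
```

Both bounds dominate the empirical probability, but only the large-deviations bound decays exponentially in $n$; Chebyshev's bound decays only like $1/n$.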
The preceding two theorems estimate the probability that the sample path mean deviates from the (ensemble) mean. The following theorem goes one step further, showing that for almost every fixed $\omega$ the sample path average converges to the mean (in the ordinary deterministic sense).

Theorem 3.3 (Strong Law of Large Numbers) Let $X_1, X_2, \ldots$ be a sequence of independent random variables with finite mean $\mu$ and $K \stackrel{def}{=} E X_1^4 < \infty$. Then, for almost every $\omega$ (or with probability 1),

  $\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{as } n \to \infty.$

Remark: For this theorem to hold it is enough to assume that the mean $\mu = E X_1$ exists (i.e., it could even be infinite). However, in order to present a simpler proof, we impose the stronger assumption $E X_1^4 < \infty$.

Proof: To begin, assume that $\mu = E X_j = 0$; then

  $E N_n^4 = E[(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)].$

Expanding the right-hand side of the equation above results in terms of the form ($i \ne j \ne k \ne l$)

  $E X_i^4,$
  $E[X_i^3 X_j] = E X_i^3\, E X_j = 0$ by independence,
  $E[X_i^2 X_j^2],$
  $E[X_i^2 X_j X_k] = E X_i^2\, E X_j\, E X_k = 0$ by independence,
  $E[X_i X_j X_k X_l] = E X_i\, E X_j\, E X_k\, E X_l = 0$ by independence.

Next, there are $n$ terms of the form $E X_i^4$, and for each pair $i \ne j$ there are $\binom{4}{2} = 6$ terms in the expansion that are equal to $E X_i^2 X_j^2$. Hence,

  $E N_n^4 = n\, E X_1^4 + 6 \binom{n}{2} (E X_1^2)^2 = nK + 3n(n-1)(E X_1^2)^2.$   (3.2)

Also, $K < \infty$ implies $(E X_1^2)^2 < \infty$, since

  $0 \le \mathrm{Var}(X_1^2) = E X_1^4 - (E X_1^2)^2 \;\Rightarrow\; (E X_1^2)^2 \le K.$   (3.3)

Now, by replacing (3.3) in (3.2), we obtain

  $E N_n^4 \le nK + 3n^2 K \le 4n^2 K.$

Thus,

  $E \sum_{n=1}^{\infty} \frac{N_n^4}{n^4} = \sum_{n=1}^{\infty} \frac{E N_n^4}{n^4} \le \sum_{n=1}^{\infty} \frac{4K}{n^2} < \infty.$

Therefore, with probability 1,

  $\sum_{n=1}^{\infty} \frac{N_n^4}{n^4} < \infty,$

which implies that, with probability 1,

  $\lim_{n \to \infty} \frac{N_n^4}{n^4} = 0,$

or, equivalently,

  $P\left[\lim_{n \to \infty} \frac{N_n}{n} = 0\right] = 1.$

This concludes the proof for the case $\mu = 0$. If $\mu \ne 0$, then define $X_j' = X_j - E X_j$ and use the same proof. $\Box$

4 Central Limit Theorem

The Central Limit Theorem, similarly to the Large Deviations theorem, measures the deviation of the sample mean from the expected value $\mu$.

Theorem 4.1 (Central Limit Theorem (CLT)) Let $X_j$, $j \ge 1$, be a sequence of i.i.d. r.v.s with mean $\mu$ and variance $\sigma^2 < \infty$. Then the distribution of

  $Z_n \stackrel{def}{=} \frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}}$

tends to the standard normal distribution as $n \to \infty$, i.e., for any real number $a$,

  $P\left[\frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}} \le a\right] \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\, dx \quad \text{as } n \to \infty.$

First, we state the following key lemma that will be used in the proof of the CLT.

Lemma 4.1 Let $Z_1, Z_2, \ldots$ be a sequence of r.v.s having distribution functions $F_{Z_n}$ and moment generating functions $M_{Z_n}(t) = E e^{t Z_n}$, $n \ge 1$, and let $Z$ be a random variable having distribution $F_Z$ and moment generating function $M_Z(t)$. If $M_{Z_n}(t) \to M_Z(t)$ as $n \to \infty$ for all $t$, then

  $F_{Z_n}(x) \to F_Z(x) \quad \text{as } n \to \infty$

for all $x$ at which $F_Z$ is continuous.

Proof: Omitted. $\Box$

Proof of CLT: Assume that $\mu = 0$ and $\sigma^2 = 1$. Then the moment generating function (m.g.f.) of $X_j / \sqrt{n}$ is equal to

  $E\left[e^{t X_j / \sqrt{n}}\right] = M(t/\sqrt{n}), \quad \text{where } M(t) = E e^{tX}.$

Thus, the m.g.f. of $\sum_{j=1}^{n} X_j / \sqrt{n}$ is equal to $M(t/\sqrt{n})^n$. Now, if $L(t) \stackrel{def}{=} \log M(t)$, then

  $\log M(t/\sqrt{n})^n = n L(t/\sqrt{n}).$

Thus,

  $\lim_{n \to \infty} n L(t/\sqrt{n}) = \lim_{n \to \infty} \frac{L(t/\sqrt{n})}{n^{-1}} = \lim_{n \to \infty} \frac{-L'(t/\sqrt{n})\, n^{-3/2}\, t/2}{-n^{-2}} \quad \text{(by L'Hospital's rule)}$
  $= \lim_{n \to \infty} \frac{L'(t/\sqrt{n})\, t}{2 n^{-1/2}} = \lim_{n \to \infty} \frac{-L''(t/\sqrt{n})\, n^{-3/2}\, t^2/2}{-n^{-3/2}} \quad \text{(by L'Hospital's rule)}$
  $= \lim_{n \to \infty} L''(t/\sqrt{n})\, \frac{t^2}{2}.$

Next, note that $L(0) = 0$,

  $L'(0) = \frac{M'(0)}{M(0)} = \mu = 0,$

  $L''(0) = \frac{M(0) M''(0) - (M'(0))^2}{(M(0))^2} = E X^2 = 1.$

Hence, for any finite $t$,

  $\lim_{n \to \infty} L''(t/\sqrt{n}) = 1,$

and, therefore,

  $\lim_{n \to \infty} n L(t/\sqrt{n}) = \frac{t^2}{2},$

or, equivalently,

  $\lim_{n \to \infty} M(t/\sqrt{n})^n = e^{t^2/2}.$

On the other hand, if $Z$ is a standard normal r.v., then

  $E e^{tZ} = e^{t^2/2},$

which, by Lemma 4.1, concludes the proof of the theorem for $\mu = 0$ and $\sigma^2 = 1$.

For the general case $\mu \ne 0$ and $\sigma^2 \ne 1$, we can introduce new variables

  $X_j' \stackrel{def}{=} \frac{X_j - \mu}{\sigma};$

clearly $E X_j' = 0$ and $\mathrm{Var}(X_j') = 1$, and, therefore, we can use the already proved case. $\Box$
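The convergence in Theorem 4.1 is easy to observe numerically. The following sketch is only an illustration, with the distribution Exp(1) (so $\mu = \sigma = 1$), the path length $n$, and the number of replications chosen arbitrarily; it compares the empirical distribution of $Z_n$ with the standard normal c.d.f. $\Phi(a)$ at a few points.

```python
import numpy as np
from math import erf, sqrt

def phi(a):
    # standard normal CDF, expressed through the error function
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))

# Illustrative (assumed) choice: X_j ~ Exp(1), so mu = 1 and sigma = 1.
# Simulate 50,000 independent sample paths of length n = 200 and standardize
# each sum into Z_n = (X_1 + ... + X_n - n*mu) / (sigma * sqrt(n)).
rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 1.0, 200, 50_000

x = rng.exponential(scale=1.0, size=(reps, n))
z = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))   # one Z_n per sample path

for a in (-1.0, 0.0, 1.0, 2.0):
    print(f"a={a:+.1f}  empirical P[Z_n <= a] = {np.mean(z <= a):.4f}  "
          f"Phi(a) = {phi(a):.4f}")
```

Even though the summands are quite skewed, the empirical values of $P[Z_n \le a]$ already agree with $\Phi(a)$ to within a couple of hundredths for $n = 200$.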