Lecture 7: Chapter 7. Sums of Random Variables and Long-Term Averages
ELEC206 Probability and Random Processes, Fall 2014
Gil-Jin Jang, [email protected], School of EE, KNU

Overview
7.1 Sums of Random Variables (7.1.1)
7.2 The Sample Mean and the Laws of Large Numbers
Review of 4.6: The Markov and Chebyshev Inequalities*
7.3 The Central Limit Theorem (7.3.1)

Many problems involve counting the number of event occurrences, measuring cumulative effects, or computing arithmetic averages in a series of measurements. These problems can be reduced to the problem of finding, exactly or approximately, the distribution of a random variable that consists of the sum of n independent, identically distributed random variables. We investigate sums of random variables and their properties as n becomes large.

7.1 Sums of Random Variables

Let $S_n = X_1 + X_2 + \cdots + X_n$.

7.1.1 Mean and Variance of Sums of Random Variables

Regardless of statistical dependence, the expected value of a sum of n random variables is equal to the sum of the expected values (proof: see Example 5.24):
$$E[S_n] = E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$$

Example 7.1. Find the variance of $Z = X + Y$.
$$\mathrm{VAR}[Z] = E[(Z - m_Z)^2] = E[\{(X + Y) - (m_X + m_Y)\}^2] = E[\{(X - m_X) + (Y - m_Y)\}^2] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] + 2\,\mathrm{COV}(X, Y)$$

Generalisation to many RVs

The variance of the sum of n RVs is NOT equal to the sum of the individual variances:
$$\mathrm{VAR}[S_n] = E\left[\sum_{j=1}^{n}(X_j - E[X_j])\sum_{k=1}^{n}(X_k - E[X_k])\right] = \sum_{j=1}^{n}\sum_{k=1}^{n}\mathrm{COV}(X_j, X_k) = \sum_{k=1}^{n}\mathrm{VAR}[X_k] + \sum_{j=1}^{n}\sum_{k=1,\,k\neq j}^{n}\mathrm{COV}(X_j, X_k)$$

The variance of the sum of n INDEPENDENT RVs is equal to the sum of the variances:
$$\mathrm{VAR}[S_n] = \mathrm{VAR}[X_1] + \cdots + \mathrm{VAR}[X_n]$$

Example 7.2. Sum of iid RVs. If $X_1, X_2, \ldots, X_n$ are iid (independent, identically distributed) with mean $\mu$ and variance $\sigma^2$, then
$$E[S_n] = E[X_1] + \cdots + E[X_n] = n\mu, \qquad \mathrm{VAR}[S_n] = n\,\mathrm{VAR}[X_j] = n\sigma^2$$

7.2 The Sample Mean and the Laws of Large Numbers

Let X be a RV for which the mean, $E[X] = \mu$, is unknown. Let $X_1, \ldots, X_n$ denote n independent, repeated measurements of X; i.e., the $X_j$ are iid RVs with the same pdf as X. The sample mean is a RV used to estimate the true mean $E[X]$:
$$M_n = \frac{1}{n}\sum_{j=1}^{n} X_j$$
We assess the effectiveness of $M_n$ as an estimator for $E[X]$ by computing the expected value and variance of $M_n$ and investigating the behaviour of $M_n$ as n becomes large. This is very important in real situations: "How many measurements are necessary to obtain a reliable mean?" "We made n measurements and took their average as the mean value. Can we be confident in our mean?"

Sample Mean

The sample mean is itself a RV, so it exhibits random variation. A good estimator has two properties:
1. On average, it gives the correct value: $E[M_n] = \mu$.
2. It does not vary too much: $\mathrm{VAR}[M_n] = E[(M_n - \mu)^2]$ is small enough.

The sample mean is an unbiased estimator for $\mu$:
$$E[M_n] = E\left[\frac{1}{n}\sum_{j=1}^{n} X_j\right] = \frac{1}{n}\sum_{j=1}^{n} E[X_j] = \frac{1}{n}\,n\mu = \mu$$
since $E[X_j] = E[X] = \mu$ for all j. The estimator is unbiased: it is centered at the true value.
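As a quick numerical check of Example 7.2 and the unbiasedness of the sample mean, here is a minimal Monte Carlo sketch in Python/NumPy. The choice $X_j \sim$ Exponential(1) is an arbitrary illustrative assumption (so $\mu = 1$ and $\sigma^2 = 1$); any distribution with finite mean and variance would do.

```python
import numpy as np

# Monte Carlo check of E[S_n] = n*mu and VAR[S_n] = n*sigma^2 (Example 7.2),
# and of E[M_n] = mu (unbiasedness of the sample mean).
# Assumption: X_j ~ Exponential(1), so mu = 1 and sigma^2 = 1.
rng = np.random.default_rng(0)
n, trials = 50, 100_000
mu, sigma2 = 1.0, 1.0

X = rng.exponential(scale=1.0, size=(trials, n))  # 'trials' realisations of (X_1,...,X_n)
S_n = X.sum(axis=1)                               # one sum S_n per realisation
M_n = X.mean(axis=1)                              # one sample mean M_n per realisation

print(f"E[S_n]  : empirical {S_n.mean():8.3f}, theory {n * mu:8.3f}")
print(f"VAR[S_n]: empirical {S_n.var():8.3f}, theory {n * sigma2:8.3f}")
print(f"E[M_n]  : empirical {M_n.mean():8.4f}, theory {mu:8.4f} (unbiased)")
```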
Variance of Sample Mean

Compute the variance of $M_n$ about its mean $\mu$:
$$\mathrm{VAR}[M_n] = E[(M_n - \mu)^2] = E[(M_n - E[M_n])^2]$$
Since $M_n = \frac{1}{n}\sum_{j=1}^{n} X_j = \frac{1}{n} S_n$ where the $X_j$ are iid RVs, and from Example 7.2 $\mathrm{VAR}[S_n] = n\,\mathrm{VAR}[X_j] = n\sigma^2$, we have
$$\mathrm{VAR}[M_n] = \frac{1}{n^2}\,\mathrm{VAR}[S_n] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
The variance of the sample mean approaches zero as the number of samples is increased. In other words, the probability of the sample mean being close to the true mean approaches one as n becomes very large.

4.6 Markov and Chebyshev Inequalities

The mean and variance of a random variable X help obtain bounds for probabilities of the form $P[|X| \geq t]$.

Markov inequality: for X nonnegative,
$$P[X \geq a] \leq \frac{E[X]}{a}$$
Proof:
$$E[X] = \int_0^a t f_X(t)\,dt + \int_a^\infty t f_X(t)\,dt \geq \int_a^\infty t f_X(t)\,dt \geq \int_a^\infty a f_X(t)\,dt = a\,P[X \geq a]$$

Chebyshev inequality: suppose that the mean $E[X] = m$ and the variance $\mathrm{VAR}[X] = \sigma^2$. Then
$$P[|X - m| \geq a] \leq \frac{\sigma^2}{a^2}$$
Proof: let $D^2 = (X - m)^2$ be the squared deviation from the mean. Applying the Markov inequality to $D^2$,
$$P[D^2 \geq a^2] \leq \frac{E[(X - m)^2]}{a^2} = \frac{\sigma^2}{a^2}$$
In the case $\sigma^2 = 0$, the inequality implies that $P[X = m] = 1$.

Applying the Chebyshev Inequality

Keeping in mind that $E[M_n] = \mu$ and $\mathrm{VAR}[M_n] = \frac{\sigma^2}{n}$, the Chebyshev inequality can be formulated as
$$P[|M_n - E[M_n]| \geq \varepsilon] \leq \frac{\mathrm{VAR}[M_n]}{\varepsilon^2} \;\iff\; P[|M_n - \mu| \geq \varepsilon] \leq \frac{\sigma^2}{n\varepsilon^2} \;\iff\; P[|M_n - \mu| < \varepsilon] \geq 1 - \frac{\sigma^2}{n\varepsilon^2}$$
If the true variance $\sigma^2$ is known, we can select the number of samples n so that $M_n$ is within $\varepsilon$ of the true mean with probability $1 - \delta$ or greater.

Example 7.9. A voltage of (unknown) constant value is to be measured. Each measurement $X_j$ is the sum of the desired voltage v (the true mean) and a noise voltage $N_j$ of zero mean and standard deviation of 1 microvolt ($\mu V$): $X_j = v + N_j$. Assume that the noise voltages are independent RVs. How many measurements are required so that the probability that $M_n$ is within $\varepsilon = 1\,\mu V$ of the true mean is at least 0.99?

Solution. $E[X_j] = E[v + N_j] = v + E[N_j] = v$ and $\mathrm{VAR}[X_j] = \mathrm{VAR}[v + N_j] = \mathrm{VAR}[N_j] = 1$. For $M_n = \frac{1}{n}\sum_{j=1}^{n} X_j$,
$$P[|M_n - v| < \varepsilon] \geq 1 - \frac{\sigma_X^2}{n\varepsilon^2} = 1 - \frac{1}{n} = 0.99 \quad\therefore\; n = 100$$

Laws of Large Numbers

Weak Law of Large Numbers: let $X_1, X_2, \ldots$ be a sequence of iid RVs with finite mean $E[X] = \mu$. Then for $\varepsilon > 0$, and even if the variance of the $X_j$'s does not exist,
$$\lim_{n\to\infty} P[|M_n - \mu| < \varepsilon] = 1 \quad \left(= \lim_{n\to\infty} 1 - \frac{\sigma^2}{n\varepsilon^2}\right)$$

Strong Law of Large Numbers: let $X_1, X_2, \ldots$ be a sequence of iid RVs with finite mean $E[X] = \mu$ and finite variance. Then
$$P\left[\lim_{n\to\infty} M_n = \mu\right] = 1$$
(proof: beyond sophomore level)
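To make the Chebyshev bound concrete, a minimal Python sketch follows: it computes the sample size the bound prescribes (reproducing n = 100 from Example 7.9) and empirically confirms that $\mathrm{VAR}[M_n] = \sigma^2/n$ shrinks with n. The Gaussian noise model and the true voltage v = 5.0 are illustrative assumptions, not given in the lecture.

```python
import math
import numpy as np

def chebyshev_sample_size(sigma2: float, eps: float, delta: float) -> int:
    """Smallest n with 1 - sigma2/(n*eps^2) >= 1 - delta, from the bound
    P[|M_n - mu| < eps] >= 1 - sigma^2/(n*eps^2)."""
    return math.ceil(sigma2 / (delta * eps ** 2))

# Example 7.9: sigma^2 = 1 (muV)^2, eps = 1 muV, confidence 0.99 -> n = 100
print(chebyshev_sample_size(sigma2=1.0, eps=1.0, delta=0.01))

# Empirical check that VAR[M_n] = sigma^2/n shrinks as n grows.
# Assumptions: Gaussian noise N_j and true voltage v = 5.0 (illustrative only).
rng = np.random.default_rng(1)
v = 5.0
for n in (10, 100, 1000):
    M_n = (v + rng.normal(0.0, 1.0, size=(20_000, n))).mean(axis=1)
    print(f"n={n:5d}  empirical VAR[M_n]={M_n.var():.5f}  theory {1.0 / n:.5f}")
```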
Example 7.10. In order to estimate the probability of an event A, a sequence of Bernoulli trials is carried out and the relative frequency of A is observed. How large should n be in order to have a 0.95 probability that the relative frequency is within 0.01 of $p = P[A]$?

Solution. Let $X = I_A$ be the indicator function of the occurrence of A. Since $p = P[A]$ is the occurrence probability in each Bernoulli trial, $E[X] = E[I_A] = p$ and $\mathrm{VAR}[X] = p(1 - p)$. The sample mean for X is
$$M_n = \frac{1}{n}\sum_{k=1}^{n} X_k = \frac{1}{n}\sum_{k=1}^{n} I_{A,k} = f_A(n)$$
Since $M_n$ is an estimator for $E[X] = p$, $f_A(n)$ is also an estimator for p. Noting that
$$\sigma_X^2 = \mathrm{VAR}[X] = p(1 - p) = -\left(p - \tfrac{1}{2}\right)^2 + \tfrac{1}{4} \leq \tfrac{1}{4},$$
the Chebyshev inequality gives
$$P[|f_A(n) - p| \geq \varepsilon] \leq \frac{\sigma_X^2}{n\varepsilon^2} \leq \frac{1}{4n\varepsilon^2} \;\Rightarrow\; P[|f_A(n) - p| < \varepsilon] \geq 1 - \frac{1}{4n\varepsilon^2}$$
Setting $1 - \frac{1}{4n\varepsilon^2} = 0.95$ with $\varepsilon = 0.01$ yields $n = 1/\{(1 - 0.95)\cdot 4\varepsilon^2\} = 50{,}000$.

7.3 The Central Limit Theorem

Let $X_1, X_2, \ldots$ be a sequence of iid RVs with finite mean $\mu$ and finite variance $\sigma^2$. Regardless of the probability distribution of the $X_j$, the sum of the first n RVs, $S_n = X_1 + \cdots + X_n$, approaches a Gaussian RV with mean $E[S_n] = nE[X_j] = n\mu$ and variance $\mathrm{VAR}[S_n] = n\,\mathrm{VAR}[X_j] = n\sigma^2$ as n becomes large.

Mathematical formulation: standardise $S_n$ by
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
Then
$$\lim_{n\to\infty} P[Z_n \leq z] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-x^2/2}\,dx$$
In terms of $M_n = \frac{1}{n}\sum_{j=1}^{n} X_j$,
$$Z_n = \frac{\sqrt{n}\,(M_n - \mu)}{\sigma}$$

Importance of the Gaussian RV: the central limit theorem explains why the Gaussian RV appears in so many diverse applications; real observations arise as mixtures of a great many random variables.

Example 7.11. Suppose that orders at a restaurant are iid random variables with mean $\mu = \$8$ and standard deviation $\sigma = \$2$.

1. Estimate the probability that the first 100 customers spend a total of more than \$840.
$$S_{100} = \sum_{i=1}^{100} X_i, \qquad Z_{100} = \frac{S_{100} - n\mu}{\sigma\sqrt{n}} = \frac{S_{100} - 100 \cdot 8}{2\sqrt{100}} = \frac{S_{100} - 800}{20}$$
$$P[S_{100} > 840] = P\left[Z_{100} > \frac{840 - 800}{20}\right] \approx Q(2) = 2.28 \times 10^{-2}$$

2. Estimate the probability that the first 100 customers spend a total of between \$780 and \$820.
$$P[780 \leq S_{100} \leq 820] = P[-1 \leq Z_{100} \leq 1] = \Phi(1) - \Phi(-1) = 1 - Q(1) - Q(1) = 1 - 2Q(1) \approx 0.682$$

Example 7.12. In Example 7.11, after how many orders can we be 90% sure that the total spent by all customers is more than \$1000?
$$P[S_n > 1000] \geq 0.9 \;\iff\; P\left[Z_n > \frac{1000 - 8n}{2\sqrt{n}}\right] = Q\left(\frac{1000 - 8n}{2\sqrt{n}}\right) \geq 0.9$$
Since $Q(x) \geq 0.9$ exactly when $x \leq -Q^{-1}(0.1)$, this is equivalent to
$$\frac{8n - 1000}{2\sqrt{n}} \geq Q^{-1}(0.1) \approx 1.2815 \quad\therefore\; n \geq 129$$
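The CLT calculations in Examples 7.11 and 7.12 are easy to reproduce numerically. Below is a minimal Python sketch using the standard identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the brute-force scan over n in the last part is an illustrative choice, not the lecture's method.

```python
import math

def Q(x: float) -> float:
    """Standard normal tail probability Q(x) = P[Z > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

mu, sigma = 8.0, 2.0

# Example 7.11(1): P[S_100 > 840] ~ Q(2) ~ 0.0228
n = 100
z = (840 - n * mu) / (sigma * math.sqrt(n))
print(f"P[S_100 > 840]      ~ Q({z:.0f}) = {Q(z):.4f}")

# Example 7.11(2): P[780 <= S_100 <= 820] ~ 1 - 2Q(1) ~ 0.6827
print(f"P[780<=S_100<=820]  ~ {1 - 2 * Q(1.0):.4f}")

# Example 7.12: smallest n with P[S_n > 1000] >= 0.9 under the CLT
# approximation, found by scanning n upward; prints 129.
n = 1
while Q((1000 - n * mu) / (sigma * math.sqrt(n))) < 0.9:
    n += 1
print(f"smallest n: {n}")
```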