Limit Theorems
Chia-Ping Chen, Professor
Department of Computer Science and Engineering
National Sun Yat-sen University
Probability

Introduction
- bounds for probability
- sequences of random variables
- very large number of random variables

Probability Bounds

Markov Inequality
For any non-negative random variable X and a > 0,
    P(X ≥ a) ≤ E[X]/a

Proof
Define Y(X) = 0 if X < a, and Y(X) = a if X ≥ a.
⇒ Y(X) ≤ X
⇒ E[Y(X)] ≤ E[X]
⇒ a P(X ≥ a) ≤ E[X]
⇒ P(X ≥ a) ≤ E[X]/a

Chebyshev Inequality
For any random variable X and c > 0,
    P(|X − E[X]| ≥ c) ≤ var(X)/c²

Proof
Define Z(X) = 0 if |X − E[X]| < c, and Z(X) = c² otherwise.
⇒ Z(X) ≤ (X − E[X])²
⇒ E[Z(X)] ≤ E[(X − E[X])²]
⇒ c² P(|X − E[X]| ≥ c) ≤ var(X)
⇒ P(|X − E[X]| ≥ c) ≤ var(X)/c²

Comparison
Markov inequality
- for non-negative random variables only
- the probability of being away from 0 by a
- the bound is inversely proportional to a
Chebyshev inequality
- good for any random variable
- the probability of being away from the mean by c
- the bound is inversely proportional to c²

Example 5.1 & 5.2
Let X be uniformly distributed in the interval [0, 4], so
    E[X] = 2,  var(X) = 4/3
By the Markov inequality,
    P(X ≥ 2) ≤ E[X]/2 = 1
    P(X ≥ 3) ≤ E[X]/3 = 2/3
By the Chebyshev inequality,
    P(|X − 2| ≥ 1) ≤ var(X)/1² = 4/3

Example 5.3
When X takes values in [a, b], it can be shown that
    var(X) ≤ (b − a)²/4
Using this result, the upper bound in the Chebyshev inequality can be replaced by a simpler bound:
    P(|X − E[X]| ≥ c) ≤ (b − a)²/(4c²)

Sequences of Random Variables

Definition
A sequence of random variables is X₁, X₂, . . .
An instantiation of a sequence of random variables is called a sample sequence.
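The bounds of Examples 5.1 and 5.2 can be checked numerically. The sketch below, a Monte Carlo estimate with an arbitrary seed and sample count, compares the empirical probabilities for X uniform on [0, 4] against the Markov and Chebyshev bounds:

```python
import random

# Monte Carlo check of the Markov and Chebyshev bounds for
# X uniform on [0, 4]: E[X] = 2, var(X) = 4/3.
random.seed(0)          # arbitrary seed for reproducibility
N = 200_000             # arbitrary sample count
xs = [random.uniform(0.0, 4.0) for _ in range(N)]

p_ge_3 = sum(x >= 3 for x in xs) / N            # exact value: 1/4
markov_bound = 2 / 3                            # E[X]/3

p_dev_1 = sum(abs(x - 2) >= 1 for x in xs) / N  # exact value: 1/2
chebyshev_bound = (4 / 3) / 1**2                # var(X)/c², c = 1

print(p_ge_3, markov_bound)      # empirical probability stays below the bound
print(p_dev_1, chebyshev_bound)
```

As the slides note, the bounds can be loose: here Chebyshev gives 4/3, which exceeds 1 and is vacuous, while the actual probability is 1/2.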
Examples
- packet arrivals in a slot
- inter-arrival times
- daily currency exchange rates
- feature sequence of an acoustic signal
- daily rainfalls at a particular location
- sequence of DNA
- words in a sentence

Independent and Identically Distributed Sequence
A sequence of random variables X₁, X₂, . . . is said to be iid (independent and identically distributed) if the random variables are independent and have the same probability distribution function. The common mean and variance of the random variables in an iid sequence will be denoted by
    E[Xᵢ] = µ,  var(Xᵢ) = σ²

Sample Mean
The sample mean of a sequence X₁, X₂, . . . is defined by
    Mₙ = (X₁ + · · · + Xₙ)/n
For an iid sequence of random variables,
    E[Mₙ] = (Σᵢ₌₁ⁿ E[Xᵢ])/n = µ
    var(Mₙ) = (Σᵢ₌₁ⁿ var(Xᵢ))/n² = σ²/n

Weak Law of Large Numbers
For an iid sequence of random variables X₁, X₂, . . . the sequence of sample means M₁, M₂, . . . converges to µ in the following sense: for any ε > 0,
    lim_{n→∞} P(|Mₙ − µ| ≥ ε) = 0
That is, the probability that Mₙ differs from µ by more than ε vanishes as n increases.

Proof
By the Chebyshev inequality, for any ε > 0, the probability that Mₙ differs from µ by more than ε is bounded by
    P(|Mₙ − µ| ≥ ε) ≤ var(Mₙ)/ε²
Substituting var(Mₙ) = σ²/n, we have
    P(|Mₙ − µ| ≥ ε) ≤ σ²/(nε²) → 0

Example 5.4 Relative Frequency
The relative frequency of occurrence of an event converges to the probability of the event. Loosely speaking,
    probability = relative frequency of occurrence

Explanation
Consider an event A. Let
    Xᵢ = 1 if A occurs in the ith experiment, 0 otherwise
Consider the sample mean of the iid sequence X₁, X₂, . . .
    Mₙ = (X₁ + · · · + Xₙ)/n
The right-hand side is the relative frequency of event A. By the weak law, Mₙ converges to E[X], which is
    E[X] = 1 · P(A) + 0 · (1 − P(A)) = P(A)
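The relative-frequency interpretation can be illustrated with a short simulation. In this sketch, p = 0.3 and the sample sizes are arbitrary choices; the sample mean of iid Bernoulli(p) indicators is exactly the relative frequency of the event, and it settles near p as n grows:

```python
import random

# Weak-law / relative-frequency sketch (Example 5.4): the sample
# mean of iid Bernoulli(p) indicators of an event A is the relative
# frequency of A, and it approaches p = P(A) as n increases.
random.seed(1)   # arbitrary seed for reproducibility
p = 0.3          # hypothetical event probability
for n in (100, 10_000, 1_000_000):
    freq = sum(random.random() < p for _ in range(n)) / n
    print(n, freq)   # relative frequency, drifting toward 0.3
```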
Example 5.5 Poll
Let p be the fraction of voters who support Trump. We poll n randomly selected voters and record Mₙ, the fraction of them that support Trump. We view Mₙ as our estimate of p and would like to investigate its properties.

Analysis of Poll Result
Let Xᵢ indicate the support of the ith selected voter for Trump.
⇒ E[Xᵢ] = p · 1 + (1 − p) · 0 = p
    var(Xᵢ) = E[Xᵢ²] − E²[Xᵢ] = p(1 − p)
    Mₙ = (X₁ + · · · + Xₙ)/n
⇒ E[Mₙ] = p,  var(Mₙ) = p(1 − p)/n
By the Chebyshev inequality, for any ε > 0,
    P(|Mₙ − p| ≥ ε) ≤ p(1 − p)/(nε²) ≤ 1/(4nε²)
which can be made arbitrarily small by increasing n.

Application
In a poll, accuracy and confidence of the estimate can be achieved simultaneously by using a large n. To have the probability that Mₙ is within 0.01 of p not less than 0.95, i.e.
    P(|Mₙ − p| ≥ 0.01) ≤ 0.05
we choose n such that
    1/(4n(0.01)²) ≤ 0.05 ⇒ n ≥ 50000

Convergence of Random Variables

Convergence of Real Numbers
A sequence of real numbers r₁, r₂, . . . is said to converge to a real number r if for any δ > 0 there exists an n₀ such that
    |rₙ − r| ≤ δ for all n ≥ n₀
This is denoted by lim_{n→∞} rₙ = r, or rₙ → r. As n increases, rₙ gets arbitrarily close to r, and never gets away.

From Real Numbers to Random Variables
Consider a sequence of random variables Y₁, Y₂, . . . What does it mean to say that the sequence converges to a random variable?
- convergence in probability
- convergence with probability 1

Convergence in Probability
A sequence Y₁, Y₂, . . . is said to converge in probability to a random variable Y if for any ε > 0,
    lim_{n→∞} P(|Yₙ − Y| ≥ ε) = 0
This is denoted by Yₙ →ᴾ Y. As n increases, the probability that Yₙ gets away from Y becomes arbitrarily close to 0 and never gets away from 0.
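The poll's sample-size calculation from the Chebyshev bound can be written as a one-line function. This is a sketch of that arithmetic only; the function name is ours:

```python
# Sample size from the Chebyshev poll bound: the guarantee
# P(|Mn - p| >= eps) <= 1/(4 n eps^2) <= delta holds, for any
# unknown p, once n is at least 1/(4 delta eps^2).
def chebyshev_min_n(eps: float, delta: float) -> float:
    """Smallest n (as a real number) making 1/(4 n eps^2) <= delta."""
    return 1.0 / (4.0 * delta * eps ** 2)

n_min = chebyshev_min_n(0.01, 0.05)
print(n_min)  # about 50000, matching the slide's n >= 50000
```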
A Closer Look
Convergence in probability can be understood via convergence of real numbers. Specifically, for any ε > 0, the sequence of real numbers
    pₙ(ε) = P(|Yₙ − Y| ≥ ε) → 0
On the other hand, sample sequences of Yₙ are not required to converge!

Accuracy and Confidence
Convergence in probability, Yₙ →ᴾ Y, also means that for any accuracy ε > 0 and confidence δ > 0, there exists an n₀ such that
    P(|Yₙ − Y| ≥ ε) ≤ δ for n ≥ n₀

Test of Convergence in Probability
To decide whether Yₙ →ᴾ Y, decide whether
    lim_{n→∞} P(|Yₙ − Y| ≥ ε) = 0
for every ε > 0.

Example 5.6 Convergence in Probability
Consider an iid sequence of random variables X₁, X₂, . . . in which Xᵢ is uniformly distributed in the interval [0, 1], and let
    Yₙ = min(X₁, . . . , Xₙ)
Is it true that Yₙ →ᴾ 0? For any ε > 0,
    lim_{n→∞} P(|Yₙ − 0| ≥ ε) = lim_{n→∞} (1 − ε)ⁿ = 0

Example 5.7 Convergence in Probability
Let Y be an exponential random variable with parameter λ = 1. For any positive integer n, let
    Yₙ = Y/n,  n = 1, 2, . . .
Is it true that Yₙ →ᴾ 0? For any ε > 0,
    lim_{n→∞} P(|Yₙ − 0| ≥ ε) = lim_{n→∞} e⁻ⁿᵉ = 0

Example 5.8 Convergence in Probability
Consider a sequence of random variables Y₁, Y₂, . . . with
    P(Yₙ = y) = 1 − 1/n if y = 1,  1/n if y = n²,  0 otherwise
Is it true that Yₙ →ᴾ 1? How about E[Yₙ] → 1?

Re-Statement of the Weak Law of Large Numbers
For an iid sequence of random variables, the sample mean converges to the common mean in probability:
    Mₙ →ᴾ µ

Normalized Sample Mean
For an iid sequence of random variables X₁, X₂, . . . the normalized sample mean is defined by
    Zₙ = ((X₁ + · · · + Xₙ) − nµ)/(σ√n)
    E[Zₙ] = 0,  var(Zₙ) = 1

Central Limit Theorem
The normalized sample mean of an iid sequence of random variables has the limiting distribution of a standard normal random variable.
    lim_{n→∞} P(Zₙ ≤ t) = P(Y ≤ t),  Y standard normal
                        = Φ(t)
                        = (1/√(2π)) ∫₋∞ᵗ e^(−s²/2) ds

Approximation
The sum of iid random variables is approximately Gaussian:
    Sₙ = X₁ + · · · + Xₙ = σ√n Zₙ + nµ ≈ N(nµ, nσ²)
⇒ P(Sₙ ≤ c) = P(X₁ + · · · + Xₙ ≤ c)
            = P((X₁ + · · · + Xₙ − nµ)/(σ√n) ≤ (c − nµ)/(σ√n))
            = P(Zₙ ≤ (c − nµ)/(σ√n))
            ≈ P(Y ≤ (c − nµ)/(σ√n))
            = Φ((c − nµ)/(σ√n))

Example 5.9 Loading Packages
We load on a plane 100 packages whose weights are independent random variables that are uniformly distributed between 5 and 50 pounds. What is the probability that the total weight will exceed 3000 pounds?

Solution
Let Xᵢ be the weight of package i.
    E[Xᵢ] = 27.5,  var(Xᵢ) = (50 − 5)²/12 = 168.75
The total weight of n packages is Sₙ = X₁ + · · · + Xₙ.
⇒ P(S₁₀₀ > 3000) = P((S₁₀₀ − 100 · 27.5)/√(100 · 168.75) > (3000 − 100 · 27.5)/√(100 · 168.75))
                 = P(Zₙ > 1.92)
                 = 1 − P(Zₙ ≤ 1.92)
                 ≈ 1 − Φ(1.92) = 0.0274

Example 5.10 Processing Parts
A machine processes parts, one part at a time. The processing times of different parts are independent random variables, uniformly distributed in [1, 5]. We wish to approximate the probability that the number of parts processed within 320 time units, denoted by N₃₂₀, is at least 100.

Solution
    N₃₂₀ ≥ 100 ⇔ S₁₀₀ = T₁ + · · · + T₁₀₀ ≤ 320
Since E[Tᵢ] = 3 and var(Tᵢ) = 4/3,
⇒ P(S₁₀₀ ≤ 320) = P((S₁₀₀ − 100 · 3)/√(100 · 4/3) ≤ (320 − 100 · 3)/√(100 · 4/3))
                = P(Zₙ ≤ 1.73)
                ≈ P(Y ≤ 1.73) = 0.9582

Example 5.11 Poll
We poll n voters and record the fraction Mₙ of those polled who are in favor of a particular candidate. If p is the fraction of the entire voter population that supports this candidate, then
    Mₙ = (X₁ + · · · + Xₙ)/n
where the Xᵢ's are iid Bernoulli random variables with parameter p.
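The normal-approximation arithmetic of Example 5.9 can be reproduced without a table by building Φ from the error function (Φ(t) = (1 + erf(t/√2))/2). This sketch only redoes the computation above:

```python
import math

# CLT approximation of Example 5.9: P(S_100 > 3000) for 100 iid
# Uniform(5, 50) package weights, with Phi built from math.erf.
def phi(t: float) -> float:
    """Standard normal CDF Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

n, mu, var = 100, 27.5, (50 - 5) ** 2 / 12  # mean, variance of Uniform(5, 50)
z = (3000 - n * mu) / math.sqrt(n * var)    # standardized threshold, about 1.92
print(1 - phi(z))                           # close to the slide's 0.0274
```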
Given sample size n and accuracy ε, the central limit theorem can be used to bound
    P(|Mₙ − p| > ε)
Given accuracy ε and bound δ, a sample size n satisfying
    P(|Mₙ − p| > ε) < δ
can be decided.

Bounding the Probability
    Mₙ − p = (X₁ + · · · + Xₙ)/n − p
           = (X₁ + · · · + Xₙ − np)/n
           = ((X₁ + · · · + Xₙ − np)/(σ√n)) · (σ/√n)
           = Zₙ · σ/√n
⇒ P(|Mₙ − p| ≥ ε) ≈ 2P(Mₙ − p ≥ ε) = 2P(Zₙ ≥ ε√n/σ)
                  ≈ 2P(Y ≥ ε√n/σ)
                  = 2[1 − Φ(ε√n/σ)]
                  ≤ 2(1 − Φ(2ε√n))
where the last step uses σ = √(p(1 − p)) ≤ 1/2.

Sample Size
Consider the same accuracy and confidence set earlier:
    ε = 0.01,  δ = 0.05
We want to find a sample size n such that
    P(|Mₙ − p| ≥ 0.01) ≤ 0.05
Since P(|Mₙ − p| ≥ ε) ≤ 2(1 − Φ(2√n ε)), it suffices that
    2(1 − Φ(2√n · 0.01)) ≤ 0.05
Using the standard normal table,
    Φ(2√n · 0.01) ≥ 0.975 ⇒ 2√n · 0.01 ≥ 1.96 ⇒ n ≥ 9604

Convergence with Probability 1
A sequence of random variables Y₁, Y₂, . . . is said to converge with probability 1, or converge almost surely, to a random variable Y if
    P(lim_{n→∞} Yₙ = Y) = 1
That is,
    P({ω | lim_{n→∞} Yₙ(ω) = Y(ω)}) = 1
This is denoted by Yₙ →ᵃˢ Y.

Comparison with Convergence in Probability
Convergence with probability 1 implies convergence in probability. The converse is not true.
    Yₙ →ᵃˢ Y
⇒ P(lim_{n→∞} Yₙ = Y) = 1
⇒ P(lim_{n→∞} Yₙ ≠ Y) = 0
⇒ P(lim_{n→∞} |Yₙ − Y| ≠ 0) = 0
⇒ lim_{n→∞} P(|Yₙ − Y| > ε) = 0 ⇒ Yₙ →ᴾ Y
The converse is not true since convergence in probability does not require sample sequences of Yₙ to converge, which is required by convergence with probability 1.

Example 5.15 Convergence with Probability 1
Let X₁, X₂, . . . be a sequence of iid random variables uniformly distributed in [0, 1], and
    Yₙ = min(X₁, . . . , Xₙ),  n = 1, 2, . . .
Show that Yₙ converges to 0 with probability 1.

Solution
    Y₁ ≥ Y₂ ≥ · · · ≥ 0
Any sample sequence of Y₁, Y₂, . . . is non-increasing and bounded from below, so its convergence is assured.
For any ε > 0,
    P(Yₙ > ε) = P(X₁ > ε, . . . , Xₙ > ε) = (1 − ε)ⁿ → 0
so the limit of any sample sequence exceeds ε with probability 0, hence
    P(lim_{n→∞} Yₙ = 0) = 1

Example 5.16 Convergence
Consider a discrete-time arrival process. The time slots are partitioned into consecutive intervals of the form Iₖ = {2ᵏ, 2ᵏ + 1, . . . , 2ᵏ⁺¹ − 1}. Note that the length of Iₖ is 2ᵏ, which increases with k. During each Iₖ there is exactly one arrival, and all times within an interval are equally likely. The arrival times within different intervals are assumed independent. Define Yₙ = 1 if there is an arrival at time n, and Yₙ = 0 otherwise. Show that Yₙ converges to 0 in probability, but does not converge to 0 with probability 1.

Comparison
Yₙ converges to 0 in probability, since
    P(|Yₙ − 0| > ε) = P(Yₙ ≠ 0) = 1/2^⌊log₂ n⌋ → 0 as n → ∞
On the other hand, Yₙ does not converge to 0 with probability 1. In fact, no sample sequence converges at all!

Strong Law of Large Numbers
For an iid sequence of random variables X₁, X₂, . . . the sequence of sample means M₁, M₂, . . . converges to µ with probability 1:
    Mₙ →ᵃˢ µ
That is,
    P(lim_{n→∞} (X₁ + · · · + Xₙ)/n = µ) = 1

Weak and Strong
Consider an iid sequence.
- By the weak law, the probability that the sample mean gets away from the common mean converges to 0.
- By the strong law, the probability of the set of sample sequences whose sample means do not converge to the common mean is 0.
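The strong law's statement about individual sample sequences can be illustrated by simulation. In this sketch, three independent sample sequences of iid Uniform(0, 1) variables are drawn (the seed and n are arbitrary); each sequence's own sample mean ends up near µ = 0.5:

```python
import random

# Strong-law sketch: three independent sample sequences of iid
# Uniform(0, 1) variables. The almost-sure statement concerns the
# individual sample sequences themselves, and indeed each one's
# sample mean lands near mu = 0.5.
random.seed(2)   # arbitrary seed for reproducibility
n = 100_000      # arbitrary sequence length
means = [sum(random.random() for _ in range(n)) / n for _ in range(3)]
print(means)     # every entry close to 0.5
```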