Stat 501 – Probability Theory I
Some Lecture Notes

Ryan Martin
Department of Mathematics, Statistics, and Computer Science
University of Illinois at Chicago
[email protected]
www.math.uic.edu/~rgmartin

October 20, 2014

Abstract

These notes are based on the two lectures¹ I gave in Stat 501, Probability Theory I, as a substitute for Professor Cheng Ouyang. There is a brief discussion of the so-called "basic grouping lemma" concerning independent σ-algebras, followed by a more detailed discussion of the Borel–Cantelli lemma, some applications, and some elaborations on independence.

Independence, and the basic grouping lemma

Let {B_t : t ∈ T} be a collection of independent σ-algebras, i.e., for any k ≥ 1, any t_1, ..., t_k, and any events B_{t_1} ∈ B_{t_1}, ..., B_{t_k} ∈ B_{t_k}, the events B_{t_1}, ..., B_{t_k} are independent. It makes sense that disjoint sub-collections of the σ-algebras are also independent. Here is the formal result.

Lemma (Grouping Lemma). Let {B_t : t ∈ T} be an independent collection of σ-algebras. Let S be an index set with the property that, for each s ∈ S, T_s ⊂ T and the {T_s : s ∈ S} are pairwise disjoint. Define B_{T_s} = the smallest σ-algebra containing all B_t, t ∈ T_s. Then {B_{T_s} : s ∈ S} is an independent collection of σ-algebras.

Proof. Pretty easy; see pages 101–102 in Resnick.

Despite the complicated σ-algebra terminology, the Grouping Lemma is quite intuitive. For example, let X_1, ..., X_n be a collection of independent random variables. Then the Grouping Lemma says that

• σ({X_1, ..., X_k}) and σ({X_{k+1}, ..., X_n}) are independent σ-algebras;
• Σ_{i=1}^k X_i and Σ_{i=k+1}^n X_i are independent random variables;
• and, more generally, f(X_1, ..., X_k) and g(X_{k+1}, ..., X_n) are independent random variables for any suitable real-valued functions f and g.

¹These lectures are based, in part, on Sidney Resnick's A Probability Path.
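The bullet points above can be sanity-checked numerically. The following is a minimal Monte Carlo sketch (my illustration, not from the notes; the choices of f, g, and the sample sizes are arbitrary): for independent X_1, ..., X_5, the grouped quantities f(X_1, X_2) = X_1 + X_2 and g(X_3, X_4, X_5) = X_3² + X_4² + X_5² should be independent, so in particular their sample correlation should be near zero. (Zero correlation is implied by, though strictly weaker than, independence.)

```python
import numpy as np

# Monte Carlo check of the Grouping Lemma's consequence: for independent
# X_1,...,X_5, f(X_1,X_2) and g(X_3,X_4,X_5) are independent, so their
# sample correlation over many replications should be near zero.
rng = np.random.default_rng(42)
X = rng.standard_normal((100_000, 5))  # each row is one draw of (X_1,...,X_5)

S1 = X[:, :2].sum(axis=1)            # f(X_1, X_2) = X_1 + X_2
S2 = (X[:, 2:] ** 2).sum(axis=1)     # g(X_3, X_4, X_5) = X_3^2 + X_4^2 + X_5^2

corr = np.corrcoef(S1, S2)[0, 1]
print(f"sample correlation: {corr:.4f}")  # should be close to 0
```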
Borel–Cantelli lemma, part I

For a given sequence of events {A_n : n ≥ 1} in a common probability space, recall the notion of "limsup" of events. That is,

    lim sup_n A_n = ⋂_{N≥1} ⋃_{n≥N} A_n =: {A_n i.o.}.

That is, A_n occurs "infinitely often" in the sense that, for every N, there exists n ≥ N such that A_n occurs. The first Borel–Cantelli lemma concerns the probability of this "infinitely often" event.

Theorem (Borel–Cantelli, part I). Let A_n be events in the probability space (Ω, A, P). If Σ_n P(A_n) < ∞, then P(lim sup_n A_n) = P(A_n i.o.) = 0.

Proof. It is clear that lim sup_n A_n ⊆ ⋃_{n≥N} A_n for any N. Furthermore, by monotonicity of P and Boole's inequality,

    P(lim sup_n A_n) ≤ P(⋃_{n≥N} A_n) ≤ Σ_{n≥N} P(A_n).

Since the summation over all n is finite, given any ε > 0, there exists N = N_ε such that Σ_{n≥N_ε} P(A_n) < ε. Since ε is arbitrary, it follows that P(lim sup_n A_n) = 0.

Application: Strong law of large numbers

The Borel–Cantelli lemma above looks very simple, but it has important consequences. In fact, this result is commonly used to prove various almost sure convergence results. Besides the application here, I am aware of many applications of the Borel–Cantelli lemma in proofs of consistency for general Bayesian posterior distributions. This section gives an example that is a bit ahead of the course schedule. However, since the Borel–Cantelli lemma is so useful, it makes sense to give an interesting application to highlight its importance.

Consider a sequence of iid random variables X_1, X_2, ..., with common mean µ. A result that is justified, but not proved, in an introductory probability course is the law of large numbers, i.e.,

    P(|X̄_n − µ| > ε) → 0, as n → ∞, for all ε > 0,

where X̄_n = n^{−1} Σ_{i=1}^n X_i is the sample mean. That is, for large samples, the sample mean X̄_n will be close to the population mean µ with high probability. This is often referred to as a weak law of large numbers.
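As a rough numerical companion to the weak law just stated, here is a small simulation sketch (my own illustration, not part of the notes; the Exp(1) distribution, the tolerance ε = 0.1, and the replication counts are arbitrary choices): the fraction of replications with |X̄_n − µ| > ε should shrink as n grows.

```python
import numpy as np

# Weak law of large numbers, empirically: estimate P(|Xbar_n - mu| > eps)
# by simulation for increasing n, using iid Exp(1) draws (mean mu = 1).
rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.1, 1_000

fracs = []
for n in [10, 100, 1_000, 10_000]:
    X = rng.exponential(scale=mu, size=(reps, n))  # reps independent samples of size n
    xbar = X.mean(axis=1)                          # one sample mean per replication
    fracs.append(np.mean(np.abs(xbar - mu) > eps))
    print(f"n = {n:>6}:  P(|Xbar_n - mu| > {eps}) ~ {fracs[-1]:.4f}")
```

The printed probabilities decrease toward zero, which is exactly the "in probability" convergence the weak law asserts.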
Here, using the Borel–Cantelli lemma, we will prove a stronger result, namely, a strong law of large numbers,² but only for the special case of standard normal random variables. Let X_1, X_2, ... be a sequence of independent N(0, 1) random variables, i.e., their common density function is

    (2π)^{−1/2} e^{−x²/2},  x ∈ R.

²The difference between the "weak" and "strong" law of large numbers is the mode of convergence: the former is "in probability," while the latter is "with probability 1." You will learn more about the difference between these two modes of convergence later in Stat 510.

The crucial step in the application of the Borel–Cantelli lemma to the strong law of large numbers is the following "concentration inequality" for normal random variables.

Lemma. Let Z ∼ N(0, 1). Then, for any ε > 0,

    P(|Z| > ε) ≤ ε^{−1} e^{−ε²/2}.

Proof. For Z ∼ N(0, 1) and ε > 0, we get

    P(Z > ε) = ∫_ε^∞ (2π)^{−1/2} e^{−z²/2} dz ≤ ∫_ε^∞ (2π)^{−1/2} (z/ε) e^{−z²/2} dz.

By substitution (u = z²/2), we can simplify the upper bound to get

    P(Z > ε) ≤ ε^{−1} (2π)^{−1/2} e^{−ε²/2}.

By symmetry of the standard normal,

    P(|Z| > ε) = 2 P(Z > ε) ≤ 2 (2π)^{−1/2} ε^{−1} e^{−ε²/2} ≤ ε^{−1} e^{−ε²/2},

where the last inequality holds since 2(2π)^{−1/2} < 1.

This can be extended to the case of the sample mean of n independent standard normals, due to the well-known fact that X̄_n ∼ N(0, n^{−1}) when X_1, ..., X_n are iid N(0, 1).

Corollary. For all n and all ε > 0,

    P(|X̄_n| > ε) ≤ n^{−1/2} ε^{−1} e^{−nε²/2}.

Proof. The distribution of X̄_n is the same as that of n^{−1/2} Z, for Z ∼ N(0, 1). Then P(|X̄_n| > ε) = P(|Z| > n^{1/2} ε). Write ε′ = n^{1/2} ε and apply the previous lemma.

The main theorem of this section is a law of large numbers for X̄_n; that is, we claim that the sample mean, X̄_n, of standard normal random variables converges to the population mean, 0, with probability 1. A key step in proving this claim is representing the event of non-convergence as a lim sup. Given ε > 0, let A_n = {|X̄_n| > ε}. Then the event {X̄_n ↛ 0} is the same as lim sup_n A_n.
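The Lemma's tail bound can be checked numerically. Below is a quick sketch (my addition, not part of the notes; the sample size and the grid of ε values are arbitrary choices) comparing the empirical tail probability of |Z| against the bound ε^{−1} e^{−ε²/2}.

```python
import numpy as np

# Numerical sanity check of the concentration inequality
# P(|Z| > eps) <= eps^{-1} exp(-eps^2/2) for Z ~ N(0, 1).
rng = np.random.default_rng(1)
Z = rng.standard_normal(1_000_000)

for eps in [0.5, 1.0, 2.0, 3.0]:
    emp = np.mean(np.abs(Z) > eps)        # empirical tail probability
    bound = np.exp(-eps**2 / 2) / eps     # the Lemma's upper bound
    assert emp <= bound                   # the bound should hold at every eps
    print(f"eps = {eps}:  empirical {emp:.5f}  <=  bound {bound:.5f}")
```

Note that the bound is loose for small ε (for ε = 0.5 it exceeds 1, so it is trivially true), but it tightens quickly as ε grows, and the exponential decay in ε² is what makes it useful below.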
To understand this, think about convergence using the calculus definition of limit: a sequence of numbers x_n converges to x if, for any ε > 0, there exists N = N(ε, x) such that |x_n − x| < ε for all n > N. In this sense, a sequence x_n does not converge to x if there exists ε > 0 such that |x_n − x| > ε for infinitely many n. This proves that {X̄_n ↛ 0} = lim sup_n A_n. (Strictly speaking, non-convergence is the union of lim sup_n A_n over, say, rational ε > 0; since that union is countable, it suffices to show each lim sup_n A_n has probability 0.)

Theorem. For X_1, X_2, ... iid N(0, 1), P(X̄_n ↛ 0) = 0.

Proof. For A_n as defined above, the Corollary gives

    P(A_n) ≤ n^{−1/2} ε^{−1} e^{−nε²/2} ≤ ε^{−1} (e^{−ε²/2})^n.

The upper bound is a convergent geometric series in n, so Σ_n P(A_n) < ∞, and it follows from the Borel–Cantelli lemma that P(lim sup_n A_n) = 0. From the above equivalence, we get P(X̄_n ↛ 0) = 0.

The above theorem says that X̄_n converges to its mean, 0, with probability 1. This is a stronger result than the weak law of large numbers in Stat 401. The latter only requires a finite variance (which clearly holds for the normal), while the former, as proved here, requires some stronger control on the tails of the normal distribution. It turns out that even the strong law of large numbers holds in broad generality, without needing the Borel–Cantelli lemma or concentration inequalities like the Lemma above; in fact, all that is needed is E|X_1| < ∞ in the iid case. You will see these details later on in Stat 501/502.

Borel–Cantelli lemma, part II

Note that the first Borel–Cantelli lemma above does not have anything to do with independence. It turns out that there is a second version of the result which says, for independent events A_n, that P(A_n i.o.) is either 0 or 1, depending on whether the series Σ_n P(A_n) converges or diverges.

Theorem (Borel–Cantelli, part II). Let {A_n : n ≥ 1} be independent events. Then

    P(A_n i.o.) = 0 if Σ_n P(A_n) < ∞,  and  P(A_n i.o.) = 1 if Σ_n P(A_n) = ∞.

Proof. The first part follows from the previous result. For the second part, we show that

    [lim sup_n A_n]^c = lim inf_n A_n^c = ⋃_{N≥1} ⋂_{n≥N} A_n^c

has probability 0 if Σ_n P(A_n) = ∞. Since this is a countable union, it suffices to show that P(⋂_{n≥N} A_n^c) = 0 for every N ≥ 1. By independence, this set has probability

    P(⋂_{n≥N} A_n^c) = ∏_{n≥N} {1 − P(A_n)}.

The claim is that the right-hand side above equals zero. Pick a finite integer M. Since 1 − x ≤ e^{−x}, we get

    ∏_{n=N}^{N+M} {1 − P(A_n)} ≤ ∏_{n=N}^{N+M} e^{−P(A_n)} = e^{−Σ_{n=N}^{N+M} P(A_n)}.

Since the series Σ_n P(A_n) diverges, the exponent in the upper bound approaches −∞ as M → ∞, so P(⋂_{n≥N} A_n^c) = 0 for every N, and it follows that P(lim sup_n A_n) = 1 − P(lim inf_n A_n^c) = 1 − 0 = 1.

In applications, at least in statistics, it is rare for the events A_n of interest to be independent. So, as far as I know, the first Borel–Cantelli lemma is the more useful in practice. However, the second Borel–Cantelli lemma provides a first peek at an interesting phenomenon that occurs more generally. That is, for certain "limiting events," the probability can be either 0 or 1, nothing in between. You will see this later under the name Kolmogorov Zero–One Law, which implies, for example, that a law of large numbers convergence holds with either probability 1 or probability 0, no middle ground. An interesting and somewhat philosophical implication is that there is no uncertainty in the limiting case; this is, effectively, the fundamental theorem of statistics.³

³This is my terminology, not given in a book, etc., but I don't think that makes it wrong!

Resnick considers an independent coin-flipping experiment, where A_n is the event that the nth flip lands heads, and the event of interest is lim sup_n A_n = {coin lands heads infinitely often}. If the coin is fixed, so that P(A_n) ≡ p > 0, then clearly the series diverges, so the coin lands on heads infinitely often with probability 1, which is quite intuitive. On the other hand, in order for heads not to appear infinitely often, the probability P(A_n) of heads on the nth flip must decay rapidly enough that Σ_n P(A_n) < ∞.