Probability Theory
Chapter 6: Convergence
Thommy Perlinger

Four different convergence concepts

Let X₁, X₂, … be a sequence of (usually dependent) random variables.

Definition 1.1. Xn converges almost surely (a.s.), or with probability 1 (w.p.1), to the random variable X as n→∞ iff

$$P(\{\omega : X_n(\omega) \to X(\omega) \text{ as } n\to\infty\}) = 1.$$

Notation: Xn →(a.s.) X as n→∞.

Definition 1.2. Xn converges in probability to the random variable X as n→∞ iff, for every ε>0,

$$P(|X_n - X| > \varepsilon) \to 0 \quad \text{as } n\to\infty.$$

Notation: Xn →(p) X as n→∞.

Definition 1.3. Xn converges in r-mean to the random variable X as n→∞ iff

$$E|X_n - X|^r \to 0 \quad \text{as } n\to\infty.$$

Notation: Xn →(r) X as n→∞.

Definition 1.4. Xn converges in distribution to the random variable X as n→∞ iff

$$F_{X_n}(x) \to F_X(x) \quad \text{as } n\to\infty \text{ for all } x \in C(F_X),$$

where C(F_X) is the continuity set of F_X. Notation: Xn →(d) X as n→∞.

Convergence in probability

In situations where the limiting distribution is degenerate, that is, where the limiting random variable X is a constant, convergence in probability is (in statistics) also known as consistency.

Chebyshev's inequality. Let X be a random variable with mean μ and finite variance σ². Then, for every ε>0,

$$P(|X - \mu| > \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}.$$

The weak law of large numbers

The weak law of large numbers. Let X₁, X₂, … be a sequence of i.i.d. random variables with mean μ and finite variance σ², and set Sn = X₁+X₂+…+Xn. Then

$$\frac{S_n}{n} \xrightarrow{p} \mu \quad \text{as } n\to\infty.$$

Proof. The statement is a simple consequence of Chebyshev's inequality:

$$P\left(\left|\frac{S_n}{n} - \mu\right| > \varepsilon\right) \le \frac{\operatorname{Var}(S_n/n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0 \quad \text{as } n\to\infty. \qquad \blacksquare$$

Problem 6.8.6(a)

Let X₁, X₂, … be i.i.d. Pa(1,2)-distributed random variables, and set Yn = min{X₁, X₂, …, Xn}. Show that Yn →(p) 1 as n→∞.

Since

$$F_X(x) = 1 - x^{-2}, \quad x \ge 1,$$

the distribution function of Yn is given by

$$F_{Y_n}(y) = 1 - (1 - F_X(y))^n = 1 - y^{-2n}, \quad y \ge 1,$$

and so (for any ε>0) it follows that

$$P(|Y_n - 1| > \varepsilon) = P(Y_n > 1 + \varepsilon) = (1+\varepsilon)^{-2n} \to 0 \quad \text{as } n\to\infty.$$

It thus follows that Yn →(p) 1 as n→∞.

Convergence in probability: Extension (Theorem 6.7)

Theorem 6.7. Suppose that X₁, X₂, … converges in probability to a constant a and that h is a continuous function. Then

$$h(X_n) \xrightarrow{p} h(a) \quad \text{as } n\to\infty.$$

Proof. h is continuous, so given ε>0 there exists a δ>0 such that

$$|x - a| \le \delta \implies |h(x) - h(a)| \le \varepsilon.$$

The continuity of h thus makes sure that

$$P(|h(X_n) - h(a)| > \varepsilon) \le P(|X_n - a| > \delta),$$

and since Xn converges in probability to a, the right-hand side tends to 0 as n→∞. ∎

Example: Consistency of S²

Consistency of S². Let X₁, X₂, … be a sequence of i.i.d. random variables with mean μ and finite variance σ². Define the sample variance by

$$S^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(X_k - \bar{X}_n\right)^2.$$

Since E(S²) = σ² (prove this!), it follows from Chebyshev's inequality that

$$P(|S^2 - \sigma^2| > \varepsilon) \le \frac{\operatorname{Var}(S^2)}{\varepsilon^2},$$

and thus a sufficient condition for S² →(p) σ² is that Var(S²) → 0 as n→∞.

Convergence in probability: Extension (Exercise 6.6.2)

Exercise 6.6.2. Suppose that X₁, X₂, … converges in probability to a random variable X and that h is a continuous function. Then

$$h(X_n) \xrightarrow{p} h(X) \quad \text{as } n\to\infty.$$

Proof. To prove this we use the fact that on a closed interval (a compact set) any continuous function h is actually uniformly continuous. We therefore divide the sample space into the two disjoint subsets {|X| ≤ A} and {|X| > A}, so that

$$P(|h(X_n) - h(X)| > \varepsilon) = P(|h(X_n) - h(X)| > \varepsilon,\ |X| \le A) + P(|h(X_n) - h(X)| > \varepsilon,\ |X| > A).$$

For the second term we use the fact that for any given η>0 there exists an A such that

$$P(|X| > A) < \eta.$$

Since h, for any A, is uniformly continuous on [−A, A], we can (on this interval) for any given η>0 and ε>0 find a δ>0 and an m such that for any n>m

$$P(|h(X_n) - h(X)| > \varepsilon,\ |X| \le A) \le P(|X_n - X| > \delta) < \eta,$$

and it now follows that h(Xn) →(p) h(X) as n→∞. ∎
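A small simulation can tie these results together: by the weak law of large numbers the sample mean converges in probability to μ, and by Theorem 6.7 (with the continuous function h(x) = x²) the squared sample mean converges in probability to μ². The following is a minimal sketch of this, not part of the course material; the Exp(1) distribution, the seed, the tolerance ε, and the replication counts are arbitrary illustrative choices.

```python
import numpy as np

# Illustration of the WLLN and Theorem 6.7: for i.i.d. Exp(1) variables
# (mean mu = 1), the sample mean converges in probability to mu, and the
# continuous map h(x) = x**2 carries this over to h(mean) -> h(mu) = 1.

rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.05, 10_000

for n in [10, 100, 1000, 10_000]:
    # reps independent sample means, each based on n observations
    means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    p_mean = np.mean(np.abs(means - mu) > eps)       # WLLN: should shrink
    p_h = np.mean(np.abs(means**2 - mu**2) > eps)    # Theorem 6.7: likewise
    print(f"n={n:6d}  P(|mean-mu|>eps)~{p_mean:.4f}  "
          f"P(|mean^2-mu^2|>eps)~{p_h:.4f}")
```

Both estimated probabilities shrink toward 0 as n grows, as Definition 1.2 requires.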
Almost sure convergence

When we want to prove that Xn converges (or fails to converge) almost surely, we can use the following result:

Xn →(a.s.) X as n→∞ iff, ∀ε>0 and 0<δ<1, ∃n₀ such that

$$P\left(\bigcap_{n > n_0}\{|X_n - X| \le \varepsilon\}\right) > 1 - \delta.$$

Example: Almost sure convergence

Let the sample space S be [0,1] with the uniform distribution. Define the random variables Xn(s) = s + sⁿ and X(s) = s. As n→∞ we have that

$$X_n(s) = s + s^n \to \begin{cases} s, & 0 \le s < 1, \\ 2, & s = 1, \end{cases}$$

which means that Xn(s) → X(s) for every s ∈ [0,1) but not for s = 1. But since P([0,1)) = 1, it follows by Definition 1.1 that Xn →(a.s.) X as n→∞.

Relationship: Almost sure convergence and convergence in probability

Comparison of Definitions 1.1 and 1.2. We have that, for every n,

$$P(|X_n - X| > \varepsilon) \le P\left(\bigcup_{m \ge n}\{|X_m - X| > \varepsilon\}\right),$$

and almost sure convergence forces the right-hand side to tend to 0 as n→∞. It is thus "proven" that almost sure convergence implies convergence in probability.

Example: Convergence in probability but not almost sure convergence

Let the sample space S be [0,1] with the uniform distribution. Define X(s) = s and the sequence X₁, X₂, … by

$$X_n(s) = s + \mathbf{1}_{I_n}(s),$$

where In is the interval related to Xn: the intervals [0,1], [0,1/2], [1/2,1], [0,1/3], [1/3,2/3], [2/3,1], … traverse [0,1] over and over again with ever shorter intervals. It is clear that for any 0<ε<1

$$P(|X_n - X| > \varepsilon) = P(I_n) \to 0 \quad \text{as } n\to\infty,$$

and it is thus clear that Xn →(p) X as n→∞. However, since Xn(s) alternates between s and s+1 infinitely often, that is, since for every s

$$|X_n(s) - X(s)| = 1 \quad \text{for infinitely many } n,$$

it is clear that Xn does not converge to X almost surely as n→∞.

Relationship: Convergence in r-mean and convergence in probability

Recall (Definition 1.3) that Xn →(r) X as n→∞ iff E|Xn − X|^r → 0 as n→∞.

Convergence in r-mean is a stronger convergence concept than convergence in probability. By Markov's inequality (for any ε>0),

$$P(|X_n - X| > \varepsilon) \le \frac{E|X_n - X|^r}{\varepsilon^r},$$

which implies that Xn →(r) X ⟹ Xn →(p) X as n→∞.

Convergence in distribution (and relationships between the concepts)

Recall (Definition 1.4) that Xn →(d) X as n→∞ iff F_{Xn}(x) → F_X(x) for all x ∈ C(F_X), the continuity set of F_X.

Convergence in distribution is the weakest concept of the four, but also the most useful. The (complete) relationships can be described as

$$X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{p} X \implies X_n \xrightarrow{d} X, \qquad X_n \xrightarrow{r} X \implies X_n \xrightarrow{p} X,$$

where all implications are strict; in general there is no implication in either direction between almost sure convergence and convergence in r-mean.

Problem 6.8.6(b)

Let X₁, X₂, … be i.i.d. Pa(1,2)-distributed random variables, and set Yn = min{X₁, X₂, …, Xn}. Show that Un = n(Yn − 1) converges in distribution as n→∞ and determine the limit distribution.

Since

$$P(U_n > u) = P\left(Y_n > 1 + \frac{u}{n}\right) = \left(1 + \frac{u}{n}\right)^{-2n} \to e^{-2u} \quad \text{as } n\to\infty,$$

it follows that

$$F_{U_n}(u) \to 1 - e^{-2u}, \quad u > 0.$$

It is thus clear that Un →(d) X as n→∞, where X ∈ Exp(1/2).
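This distributional limit is easy to check by simulation. A minimal sketch, assuming (as in the solution above) that Pa(1,2) has distribution function F(x) = 1 − x^(−2) for x ≥ 1, and that Exp(1/2) denotes the exponential distribution with mean 1/2 (distribution function 1 − e^(−2u)); the helper name pareto12, the seed, and the Monte Carlo settings are illustrative choices only.

```python
import numpy as np

# Monte Carlo check of Problem 6.8.6(b): U_n = n(Y_n - 1) should be
# approximately Exp(1/2) (mean 1/2, i.e. rate 2) for large n.

rng = np.random.default_rng(1)

def pareto12(size):
    """Pa(1,2) samples via inverse transform: F(x) = 1 - x**(-2), x >= 1."""
    return rng.uniform(size=size) ** (-0.5)

n, reps = 200, 50_000
u_n = n * (pareto12((reps, n)).min(axis=1) - 1)   # reps draws of U_n

# Compare the empirical CDF of U_n with the limit F(u) = 1 - exp(-2u).
for u in [0.1, 0.25, 0.5, 1.0, 2.0]:
    print(f"u={u:4.2f}  empirical={np.mean(u_n <= u):.4f}  "
          f"limit={1 - np.exp(-2 * u):.4f}")
```

For n = 200 the empirical and limiting distribution functions already agree to a few decimal places.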
Important results concerning limits

Consider two functions f and g such that f(x) → A and g(x) → B as x → x₀. Then

$$f(x) + g(x) \to A + B, \qquad f(x)\,g(x) \to AB, \qquad \frac{f(x)}{g(x)} \to \frac{A}{B} \ (B \ne 0).$$

If h is a continuous function, then it also follows that h(f(x)) → h(A) as x → x₀.

The single most important limit (in its most general form) is the following. Let an → a as n→∞. Then

$$\left(1 + \frac{a_n}{n}\right)^n \to e^{a} \quad \text{as } n\to\infty.$$

Important results concerning limits (when using Taylor series expansion)

The Taylor series expansion of a function f that is infinitely differentiable in a neighborhood of x = a is the power series

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x - a)^k.$$

Some terms in the expansion might be insignificant in the limit. For such terms we can use the "little-o" concept: the function f(x) is said to be little-o of g(x) if f(x)/g(x) → 0 as x → 0, and we write f(x) = o(g(x)).

Convergence via transforms

Theorem 4.1. Let X, X₁, X₂, … be nonnegative, integer-valued random variables, with generating functions g_X, g_{X₁}, g_{X₂}, …. Then

$$X_n \xrightarrow{d} X \quad \text{iff} \quad g_{X_n}(t) \to g_X(t) \ \text{as } n\to\infty \text{ for every fixed } t \in (0,1).$$

Theorem 4.2. Let X₁, X₂, … be random variables whose mgf's exist for −h<t<h for some h>0, and suppose that X is a random variable whose mgf ψ_X(t) exists for −h₁≤t≤h₁, where 0<h₁<h. If

$$\psi_{X_n}(t) \to \psi_X(t) \quad \text{as } n\to\infty \text{ for } |t| \le h_1,$$

then Xn →(d) X as n→∞.

Problem 6.8.19

Let X₁, X₂, … be i.i.d. random variables with mean μ<∞, and let Nn ∈ Ge(pn), 0<pn<1, be independent of X₁, X₂, …. Determine the limit distribution of

$$Y_n = p_n \sum_{k=1}^{N_n} X_k$$

as n→∞, if pn → 0 as n→∞.

It follows from Theorem 3.6.3 that

$$\psi_{Y_n}(t) = g_{N_n}\left(\psi_X(p_n t)\right),$$

and from Theorem 3.3.4 we have that

$$g_{N_n}(t) = \frac{p_n}{1 - (1 - p_n)t}.$$

We now note that for a general probability distribution with mean μ (where the mgf exists) it holds that

$$\psi_X(t) = 1 + \mu t + o(t) \quad \text{as } t \to 0,$$

that is,

$$\psi_X(p_n t) = 1 + \mu p_n t + o(p_n) \quad \text{as } n\to\infty.$$

Since ψ_X(pn t) → ψ_X(0) = 1 as n→∞, it therefore follows that

$$\psi_{Y_n}(t) = \frac{p_n}{1 - (1 - p_n)\left(1 + \mu p_n t + o(p_n)\right)} = \frac{1}{1 - \mu t + o(1)} \to \frac{1}{1 - \mu t} \quad \text{as } n\to\infty,$$

and so it is clear that Yn →(d) Exp(μ) as n→∞.

The weak law of large numbers, revisited

The weak law of large numbers (LLN). Let X₁, X₂, … be a sequence of i.i.d. random variables with mean μ and mgf ψ_X(t). Then

$$\frac{S_n}{n} \xrightarrow{p} \mu \quad \text{as } n\to\infty.$$

Proof. By the properties of moment generating functions,

$$\psi_{S_n/n}(t) = \left(\psi_X\!\left(\frac{t}{n}\right)\right)^n = \left(1 + \frac{\mu t}{n} + o\!\left(\frac{1}{n}\right)\right)^n \to e^{\mu t} \quad \text{as } n\to\infty.$$

This is the mgf of the distribution degenerate at μ, so Sn/n →(d) μ or, equivalently (since the limit is a constant), Sn/n →(p) μ. ∎

The Central Limit Theorem

The Central Limit Theorem (CLT). Let X₁, X₂, … be i.i.d. random variables with mean μ, variance σ², and mgf ψ_X(t), and set Sn = X₁+X₂+…+Xn. Then

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1) \quad \text{as } n\to\infty.$$

Proof. Because of the linear properties of moment generating functions it is no restriction to let μ=0 and σ²=1. It is clear that

$$\psi_{S_n/\sqrt{n}}(t) = \left(\psi_X\!\left(\frac{t}{\sqrt{n}}\right)\right)^n,$$

and therefore we get that

$$\psi_{S_n/\sqrt{n}}(t) = \left(1 + \frac{t^2}{2n} + o\!\left(\frac{1}{n}\right)\right)^n \to e^{t^2/2} \quad \text{as } n\to\infty,$$

and we are done, since this is the moment generating function of N(0,1). ∎

Convergence of sums of sequences of random variables

Theorems 6.1–6.3. Let X₁, X₂, … and Y₁, Y₂, … be sequences of random variables. Then

$$X_n \to X \ \text{and} \ Y_n \to Y \implies X_n + Y_n \to X + Y,$$

where the convergence is almost surely, in probability, or in r-mean.

Theorem 6.6. Let X₁, X₂, … and Y₁, Y₂, … be sequences of random variables, and suppose further that Xn and Yn are independent for all n and that X and Y are independent. Then

$$X_n \xrightarrow{d} X \ \text{and} \ Y_n \xrightarrow{d} Y \implies X_n + Y_n \xrightarrow{d} X + Y.$$

Slutsky's theorem (or Cramér's theorem)

Theorem 6.5. Let X₁, X₂, … and Y₁, Y₂, … be sequences of random variables, and suppose that

$$X_n \xrightarrow{d} X \quad \text{and} \quad Y_n \xrightarrow{p} a \quad \text{as } n\to\infty,$$

where a is a constant. Then

$$X_n + Y_n \xrightarrow{d} X + a, \qquad X_n - Y_n \xrightarrow{d} X - a, \qquad X_n Y_n \xrightarrow{d} aX, \qquad \frac{X_n}{Y_n} \xrightarrow{d} \frac{X}{a} \ (a \ne 0).$$

Exercise 6.6.3

Let X₁, X₂, … be i.i.d. Be(p)-distributed random variables, where 0<p<1. We would like to construct a confidence interval for the population proportion p. What about the random behavior of the sample proportion? Set Sn = X₁+X₂+…+Xn and consider Y₁, Y₂, …, where Yn = Sn/n. Since the Xk are i.i.d. with mean p and variance p(1−p), it follows by the Central Limit Theorem (CLT) that

$$\frac{\sqrt{n}\,(Y_n - p)}{\sqrt{p(1-p)}} \xrightarrow{d} N(0,1) \quad \text{as } n\to\infty,$$

which, for instance, implies the approximate confidence interval

$$Y_n \pm \lambda_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}.$$

Since p is unknown, in the denominator we have to replace p(1−p) with Yn(1−Yn); that is,

$$\frac{\sqrt{n}\,(Y_n - p)}{\sqrt{p(1-p)}} \quad \text{is to be replaced by} \quad \frac{\sqrt{n}\,(Y_n - p)}{\sqrt{Y_n(1-Y_n)}}.$$

With the aid of Slutsky's theorem we can prove that the CLT still "works". First, it follows by the law of large numbers that

$$Y_n \xrightarrow{p} p \quad \text{as } n\to\infty,$$

and therefore, by the second result of Theorem 6.5 (Slutsky), we have that

$$1 - Y_n \xrightarrow{p} 1 - p \quad \text{as } n\to\infty.$$

Now it follows by the third and the fourth results of Theorem 6.5 (Slutsky) that

$$Y_n(1 - Y_n) \xrightarrow{p} p(1-p) \quad \text{as } n\to\infty.$$

Since the square root is a continuous function, it follows by Theorem 6.7 that

$$\sqrt{Y_n(1 - Y_n)} \xrightarrow{p} \sqrt{p(1-p)} \quad \text{as } n\to\infty.$$

Finally, it follows by the fourth result of Theorem 6.5 (Slutsky) that

$$\frac{\sqrt{n}\,(Y_n - p)}{\sqrt{Y_n(1-Y_n)}} = \frac{\sqrt{n}\,(Y_n - p)\big/\sqrt{p(1-p)}}{\sqrt{Y_n(1-Y_n)}\big/\sqrt{p(1-p)}} \xrightarrow{d} N(0,1)$$

as n→∞. It is thus clear that the approximation is still valid for sufficiently large sample sizes.

Problem 6.8.26

Let X₁, X₂, … be positive i.i.d. random variables with mean μ and variance σ²<∞, and set Sn = X₁+X₂+…+Xn. Determine the limit distribution of

$$U_n = \frac{S_n - n\mu}{\sqrt{S_n}}.$$

In order to use the central limit theorem we rewrite the expression as

$$U_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \cdot \frac{\sigma}{\sqrt{S_n/n}}.$$

The expression in the numerator of the first factor meets the requirements of the central limit theorem, and the expression in the denominator of the second factor meets the requirements of the law of large numbers. So we therefore have that

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1) \quad \text{and, since the square root is continuous (Theorem 6.7),} \quad \sqrt{\frac{S_n}{n}} \xrightarrow{p} \sqrt{\mu} \quad \text{as } n\to\infty.$$

Hence, it follows from Theorem 6.5 (Slutsky) that

$$U_n \xrightarrow{d} \frac{\sigma}{\sqrt{\mu}}\, N(0,1) = N\!\left(0, \frac{\sigma^2}{\mu}\right) \quad \text{as } n\to\infty,$$

since a linear function of a normal random variable is also normal.
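To see Slutsky's theorem at work in Exercise 6.6.3, one can estimate how often the approximate 95% interval Yn ± 1.96·√(Yn(1−Yn)/n) actually covers p. The following is a minimal sketch, assuming a hypothetical true proportion p = 0.3; the seed and replication counts are illustrative choices.

```python
import numpy as np

# Coverage check for Exercise 6.6.3: the 95% interval
#   Y_n +/- 1.96 * sqrt(Y_n (1 - Y_n) / n)
# should cover p with probability close to 0.95 for large n,
# even though p(1-p) was replaced by Y_n(1-Y_n) (Slutsky).

rng = np.random.default_rng(2)
p, lam = 0.3, 1.96            # true proportion and lambda_{0.025}
reps = 20_000

for n in [20, 100, 500]:
    y = rng.binomial(n, p, size=reps) / n         # reps copies of Y_n
    half = lam * np.sqrt(y * (1 - y) / n)         # estimated half-width
    coverage = np.mean((y - half <= p) & (p <= y + half))
    print(f"n={n:4d}  empirical coverage={coverage:.4f}")
```

The empirical coverage approaches the nominal 0.95 as n grows, illustrating that replacing p(1−p) by its consistent estimate does not destroy the normal approximation.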