EE603 Class Notes
12/05/13
John Stensby
Chapter 11: Sequences of Finite-Second-Moment Random Variables
The theory of sequences of finite-second-moment random variables is the topic of this chapter. We study their application to system theory, where they serve as the system's input and output. It is natural to ask if a given random sequence has a limit, and in what sense the limit is approached. Convergence of random variable sequences is discussed in this chapter. This chapter deals with discrete phenomena and mathematics.
Random sequences occur in applications where analog signals are sampled. They have
applications in the fields of signal and image processing, digital control and digital
communications. They have many applications outside of electrical engineering (for example, in
the world of games, stocks, money and finance).
Sequence of Random Variables – A Basic Definition
Let (S, F, P) be a probability space (see Chapter 1 of these notes). A random variable X(ζ) maps S into the extended real line. (See Chapter 2 for the definition of a random variable.)
A mapping from a sample space into a set of discrete-time sample functions is called a random, or stochastic, sequence X(n;ζ), also known as a discrete-time random process. Often, we suppress the argument ζ and write X(n). For each fixed ζ in some sample space S, the function of n denoted by X(n;ζ) is an "ordinary" deterministic sequence of numbers known as a sample function. Alternatively, also true is the fact that X(n;ζ) is a sequence of random variables that is indexed on n. That is, for a fixed index n0, X(n0;ζ) is a random variable.
Example 11-1: X(n;)  X()f(n), where X() is a random variable, and f(n) is a deterministic
sequence of real numbers, is a simple random sequence.
Example 11-2: X(n;)  A()sin(n/10 + ()), where A() and () are random variables, is a
random sequence.
These two elementary examples have the feature that their future values are predictable from
their present and past values.
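As a quick numerical sketch (an addition to these notes, not from the original text), the following Python lines generate one sample function of the sequence in Example 11-2. The Rayleigh amplitude and uniform phase are assumptions chosen only for this illustration.

import numpy as np

rng = np.random.default_rng(0)
n = np.arange(0, 50)                     # time index
A = rng.rayleigh(scale=1.0)              # assumed amplitude distribution (illustrative only)
theta = rng.uniform(0.0, 2.0 * np.pi)    # assumed phase distribution (illustrative only)

x = A * np.sin(np.pi * n / 10 + theta)   # one sample function X(n; zeta) of Example 11-2
print(x[:5])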
Repeated Bernoulli Trials - the Quintessential Example of an Infinite Random Sequence
Consider the tossing of a fair coin. Here, we have the sample space S = [H, T], the set of events (i.e., σ-algebra) F = {[H], [T], [H,T], Ø}, and the probability measure P that is usually associated with the tossing of a fair coin (i.e., P[H] = P[T] = 1/2, etc.). (S, F, P) is the probability space for the coin tossing experiment. We define a random variable X: S → R as

X(H) ≡ 1
X(T) ≡ 0 .        (11-1)

We know that {X < 1/2} = [T], {X > 1/2} = [H], etc. In what follows, probability space (S, F, P) and random variable X will be used "to build" the Bernoulli trials random sequence.
This random sequence X(n) is defined easily. On the nth toss, assign X(n) = 1 (alternatively, X(n) = 0) if a heads (alternatively, tails) is obtained. We call X(n) the Bernoulli trials random sequence. This simple sequence must be described by using the methodology outlined above, a task that introduces some complexity. We need a probability space (S∞, F∞, P∞) so that X(n;ζ) can be defined as a mapping from S∞ into a set of binary functions. The space (S∞, F∞, P∞), the development of which is outlined below, is a product space.
Our product space (S , F, P) is developed by using ideas from Chapter 1 of these notes
(also see Chapter 8 of Stark and Woods, Probability and Random Processes, 4rd ed.). Instead of
considering individual heads and tails as elementary outcomes of separate experiments, our
product space has elementary outcomes that are infinite head/tail sequences. We define the
Bernoulli trials random sequence X(n;) as a mapping from sample space S into a set of binary
functions.
Our product space will be built as an infinite Cartesian product of (S, F, P) with itself
(recall that (S, F, P) describes the coin tossing experiment). But first, by (Sn, Fn, Pn), we denote
the “nth repetition” of (S, F, P); that is, Sn = {[Hn], [Tn]} and Fn = {[Hn], [Tn], [Hn, Tn], Ø}, where
Hn and Tn denotes “heads on the nth toss” and “tails on the nth toss”, respectively. Pn is the usual
probability measure that is used for the tossing of a fair coin (i.e., Pn[Hn] = Pn[Tn] = 1/2, etc.). Now, denoted as (S∞, F∞, P∞), our infinite-dimensional product space is determined from the (Sk, Fk, Pk), k ≥ 1, as outlined in what follows.
The sample space S∞ is the infinite Cartesian product

S∞ = ⨉_{k=1}^{∞} S_k = S_1 × S_2 × S_3 × ⋯ × S_n × ⋯ .        (11-2)
Elements of S consist of infinite sequences of heads and tales. Element   S has the form
   1,   ,   ,   ,
(11-3)
where k  Sk is either heads or tails, 1  k <  ( is a sequence of heads/tails outcomes, not the
outcome of a specific trial).
F denotes the set of events (i.e., the -algebra) for the product space. F includes all sets
of the form

 A k  A1  A2 
k 1
  An   ,
(11-4)
where Ak  Fk, 1  k <  (set (11-4) is called a generalized rectangle). Also, all countable
intersections and unions of such sets are included in F. For example, consider the event [the
first two tosses produce different outcomes]  F . This event in F is represented as
{H1}  {T2}  S3  S4 


 {T1}  {H 2 }  S3  S 4    ,
(11-5)
the union of two generalized rectangles each of the form (11-4). Also, the intersection of events
[{H1}{S2}{S3} … ] [{S1}{T2}{S3} … ] must mean the event [{H1}{ T2}{S3} …
]. As it turns out, F is the -algebra generated by the collection (i.e., set) of all generalized
rectangles of the form (11-4) (see Chapter 1 for details on how a -algebra can be generated by a
collection of sets).
To finish our product space, we must define P∞, a probability measure on the product space. To accomplish this, we use the fact that the successive trials are independent, and probabilities can be multiplied (without this assumption, it would not be possible to define P∞ without knowing the interdependence of each trial on the other trials). We start with events of the form given by (11-4), and we define

P∞[ ⨉_{n=1}^{∞} A_n ] = ∏_{n=1}^{∞} P_n(A_n) = P_1(A_1) P_2(A_2) P_3(A_3) ⋯ P_n(A_n) ⋯        (11-6)
(note the different interpretations of the product symbols: a Cartesian product of events on the left, an algebraic product of probabilities on the right). We realize that every event in F∞ can be represented as countable unions and/or intersections of events of the form (11-4). And, we use the Axioms of Probability (specifically, the Countable Additivity property - possessed by all valid probability measures) to extend definition (11-6) to all of F∞. For example, consider the event [the first two tosses are different] ∈ F∞ given by (11-5). The probability of this event is
P the first two tosses are different 
 {T1}  {H 2 }  S3  S 4   

 P {H1}  {T2 }  S3  S 4  


 P {H1}  {T2 }  S3  S 4  

 
 P[H1 ] P [T2 ] P[S3 ] P [S 4 ] 

P  {T1}  {H 2 }  S3  S 4    



P[T1 ] P [H 2 ] P[S3 ] P [S 4 ] 
(11-7)
 1/ 4  1/ 4  1/ 2 ,
where we have used the fact that the event [the first two tosses are different] can be represented as the union of two events of the form (11-4). This finishes the definitions of P∞ and our product space (S∞, F∞, P∞). Note that we have developed the same product space that is discussed in Chapter 8 of Stark and Woods, 4th edition (also in the 3rd edition, Ch. 6). Finally, using our infinite-dimensional product space (S∞, F∞, P∞), we are in a position to define the Bernoulli trials random sequence. Denote an elementary outcome in S∞ as ζ. That is, ζ = (ζ_1, ζ_2, …) ∈ S∞, where each ζ_k ∈ S_k, k ≥ 1, is either a head or tail (so that ζ is an infinite indexed sequence of heads and tails). We define the Bernoulli trials random sequence as
X (n ;  )  1,  n  H n
 0,
(11-8)
 n  Tn
a mapping from  S into the set of binary functions (remember that k is the kth component
of ).
The Limit of Nested Event Sequences
When dealing with an infinite sequence of random variables, we need to be able to define
the notion of a limit of an event sequence. In general, the limit of an event sequence is
somewhat complicated and abstract. Before considering the general case, we first consider the
important simple special case of nested sequences.
A nested decreasing sequence of events is a simple concept. The event sequence A_k, k ≥ 1, is nested and decreasing if for each integer n ≥ 1 we have

A_1 ⊃ A_2 ⊃ … ⊃ A_n .        (11-9)
A convenient feature of such a sequence is that

A_N = ∩_{k=1}^{N} A_k .        (11-10)
Like a bounded and monotone sequence of real numbers, all of which have a real-number limit, a nested decreasing sequence of events has a well-defined limit event. As N → ∞ in (11-10), we obtain A∞, a countable intersection of events. And, a countable intersection of events is an event (recall that the set of events, a σ-algebra, is closed under countable unions and intersections). So, the limit of (11-10) is well defined. Often, we write A_N ↓ A∞, where A∞ is the limit.
In some applications, an event can be expressed as the limit of a nested decreasing sequence of events, a sometimes-valuable representation. For example, let X(n), n ≥ 1, be a sequence of random variables and consider

A ≡ [X(n) < 5, n ≥ 1] = [X(1) < 5] ∩ [X(2) < 5] ∩ [X(3) < 5] ∩ ⋯ ∩ [X(N) < 5] ∩ ⋯ = limit_{N→∞} A_N ,        (11-11)

where

A_N = ∩_{n=1}^{N} {X(n) < 5} .        (11-12)

Note that A_1 ⊃ A_2 ⊃ … ⊃ A_N so that A_N is a nested decreasing sequence that has the limit A = {X(n) < 5, n ≥ 1}.
Similar results and statements can be made for a nested increasing sequence of events. The sequence B_N is a nested increasing event sequence if B_1 ⊂ B_2 ⊂ … ⊂ B_N for all N. Furthermore, we can write

B_N = ∪_{n=1}^{N} B_n .        (11-13)
A nested increasing event sequence always has a limit

B∞ ≡ limit_{n→∞} B_n        (11-14)

since B∞ can be written as a countable union of events. Often, we write B_N ↑ B∞ .
Nested sequences of events are special cases of general event sequences. In Appendix
11b, we define the limit, when it exists, of an arbitrary sequence of events (unlike the case of
nested sequences, the limit of an arbitrary event sequence may not exist!).
Concerning infinite intersections and unions, some standard notation needs to be
reviewed. For Bn, n  1, an arbitrary sequence of events, we utilize the standard notation

∩_{n=1}^{∞} B_n ≡ limit_{N→∞} ∩_{n=1}^{N} B_n        (11-15)

∪_{n=1}^{∞} B_n ≡ limit_{N→∞} ∪_{n=1}^{N} B_n .        (11-16)
Of course, the limits (11-15) and (11-16) may, or may not, exist when the Bn are non-nested.
Computing P[A], where A is the Limit of a Nested Event Sequence
We need to be able to compute probabilities like P[X(n) < 5, n ≥ 1]. This probability can,
we will argue, be computed as the limit of P[AN], where AN is represented by (11-12). That is,
we need to show the second equality in (the first equality is a definition)

P[ ∩_{n=1}^{∞} {X(n) < 5} ] = P[ limit_{N→∞} ∩_{n=1}^{N} {X(n) < 5} ] = limit_{N→∞} P[ ∩_{n=1}^{N} {X(n) < 5} ] .        (11-17)
To any specified accuracy, this limit can be approximated by using sufficiently large N.
The second equality in Equation (11-17) follows from the continuity of the probability
measure P, a fact that we will argue in what follows. The events

A_N = ∩_{n=1}^{N} {X(n) < 5}        (11-18)

form an indexed set of nested, decreasing events. The limit of the nested sequence is

A = limit_{N→∞} A_N = limit_{N→∞} ∩_{n=1}^{N} {X(n) < 5} = ∩_{n=1}^{∞} {X(n) < 5} .        (11-19)
As will be shown in a section that follows, for the nested sequence of decreasing events, we have
P  A   P  limit AN   limit P  AN  .
 N 
 N 
(11-20)
That is, we can interchange P and the limit operations. A similar statement will be made for a nested sequence of increasing events, B_N ↑ B∞.
Nested sequences are just special cases. In Appendix 11-B, we define what is meant by
the limit of an event sequence where the events are not generally nested. Also, we argue that
(11-20) is true for arbitrary convergent sequences of events.
Continuity of a Probability Measure
On a general probability space (S, F, P), the probability measure P has a continuity
property. This is satisfying from an intuitive sense; it allows us to use P as a metric, or “gauge”,
to “measure” the “size” of an event. Also, the continuity of P is used when we approximate the
probability of an event that is represented as the limit of an infinite sequence of nested events.
Figure 11-1: An increasing sequence of events, B_1 ⊂ B_2 ⊂ B_3 ⊂ B_4 ⊂ ⋯. (Figure not reproduced.)

There is an analog here to the theory of continuous functions. Let f(x) be any function with domain that includes x_0. Then f(x) is continuous at x_0 if and only if
limit f ( xn )  f (limit xn )  f ( x0 )
n 
n 
(11-21)
for all sequences {xn} that converge to x0. In words, Equation (11-21) states that one can
interchange limit and function computation. In the sense described by Theorem 11-1 (and the
more inclusive results given in Appendix 11B), this basic idea carries over to probability
measures.
Theorem 11-1: Consider an increasing sequence of events as shown by Figure 11-1. That is, the events are such that B_n ⊂ B_{n+1} for all n ≥ 1. Define the infinite union of these events as
B  limit BN  limit
N 
N
 Bn 
N  n=1

 Bn ,
(11-22)
n=1
a well-defined event (since a σ-algebra is closed under countable unions). Then, to any degree of accuracy that is required, P[B∞] can be approximated by P[B_n] for sufficiently large n. That is, we have

limit_{n→∞} P[B_n] = P[ limit_{n→∞} B_n ] = P[B∞] .        (11-23)
In words, (11-23) says that we can move the limit operation from “outside” to “inside” the
probability measure (interchange the limit and probability operations).
Proof: We define the sequence of events
A1  B1
A 2  B2  B


(11-24)
An  Bn  Bn 1


where the over-bar denotes set complement. The B_n are nested, and the disjoint A_n, 1 ≤ n ≤ N, "union up" to B_N so we can write

B_N = ∪_{n=1}^{N} B_n = ∪_{n=1}^{N} A_n ,   1 ≤ N ≤ ∞ (i.e., including N = ∞) .        (11-25)
As a result of this, we have

P[B_N] = P[ ∪_{n=1}^{N} B_n ] = P[ ∪_{n=1}^{N} A_n ] = Σ_{n=1}^{N} P[A_n]        (11-26)
for all finite N. Now take the limit of (11-26) to obtain
limit P  BN   limit
N 
N
 P[ An ] 
N  n 1

 P[ An ] .
(11-27)
n 1
Now, the most crucial step in the proof answers the question: does the sum on the right-hand
side of (11-27) converge? If yes, what does it converge to? Since the An are disjoint, we can use
the Countable Additivity Property of P (see Chapter 1) to write
limit P  BN  
N 



 P[ An ] P   An   1 .
n 1
 n =1 

(11-28)
In (11-27), the middle Nth partial sum is an increasing sequence of real numbers that is bounded
above by unity, as can be seen by (11-28). Hence, the limits in (11-27) and (11-28) converge.
To find out what they converge to, simply use

∪_{n=1}^{∞} A_n = ∪_{n=1}^{∞} B_n = B∞ ,        (11-29)
in (11-28) to obtain the desired result
limit P  Bn   P  B   P  limit Bn  . 
n 
 n  
(11-30)
Corollary 11-1: A version of Theorem 11-1 holds for a decreasing nested sequence of events. That is, suppose B_n ⊃ B_{n+1} for n ≥ 1. Then we can write

P[B∞] = P[ limit_{n→∞} B_n ] = limit_{n→∞} P[B_n] ,        (11-31)
where

B_N = ∩_{n=1}^{N} B_n ,   B∞ = ∩_{n=1}^{∞} B_n .        (11-32)
Proof: Similar to the proof given for Theorem 11-1.
Appendix 11B extends Theorem 11-1 to more general, non-nested sequences of events.
In the appendix, we define the limit, if it exists, of an event sequence, not necessarily nested. If
event A is the limit of an infinite event sequence An , we show that P[A] is the limit of P[An] as
index n approaches infinity. So, the probability measure P is continuous!! The analogy, drawn
in the paragraph preceding Theorem 11-1, to continuous functions is valid!
Example 11-3: Theorem 11-1 and its corollary are used to approximate the probability of an event that is represented as the limit of an infinite sequence of events. For example, for each n ≥ 0, let B_n = {X[k] < 2 for 0 ≤ k ≤ n}. This is a decreasing and nested sequence of events: B_{n+1} ⊂ B_n, n ≥ 0. Suppose we wanted to calculate P[B∞], where B∞ = {X[k] < 2 for 0 ≤ k}. We know that

B∞ = ∩_{n=0}^{∞} B_n .        (11-33)
We use the corollary to approximate (as closely as desired) P[B∞] as the probability of a finite intersection. That is, based on our accuracy requirements, we select N and approximate

P[B∞] = P[ ∩_{n=0}^{∞} B_n ] ≈ P[ ∩_{n=0}^{N} B_n ] = P[ X(0) < 2, X(1) < 2, …, X(N) < 2 ] .        (11-34)
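The approximation (11-34) can be checked numerically. The sketch below is an addition to these notes; it assumes an i.i.d. standard Gaussian sequence for X[k] purely for illustration, estimates P[B_N] by Monte Carlo, and shows it decreasing toward a limit as N grows.

import numpy as np

rng = np.random.default_rng(0)
trials, Nmax = 100_000, 50
X = rng.standard_normal((trials, Nmax + 1))      # assumed i.i.d. N(0,1) sequence

for N in (1, 5, 10, 25, 50):
    B_N = np.all(X[:, :N + 1] < 2.0, axis=1)     # event {X(0)<2, ..., X(N)<2}
    print(N, B_N.mean())                         # estimate of P[B_N]; decreases toward P[B_infinity]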
Example 11-4: Back in Chapter 2 of these class notes, we were told that probability distribution functions are right continuous. We were told that

F(x) = limit_{n→∞} F(x + 1/n)        (11-35)
for any distribution function F(x) and all x. However, Equation (11-35) follows directly from
Theorem 11-1 since

limit_{n→∞} F(x + 1/n) = limit_{n→∞} P[X ≤ x + 1/n]
                       = P[ limit_{n→∞} {X ≤ x + 1/n} ]
                       = P[{X ≤ x}]        (11-36)
                       = F(x).
Statistical Specification of a Random Sequence
In this chapter, we assume that all random variables are real-valued. This assumption
greatly simplifies the notation, definitions and theory. From a conceptual standpoint, little is lost
by assuming that everything is real valued (however, complex-valued random sequences are
important - and often used - in many applications where band-pass signals are represented by
their complex-valued, low-pass equivalents).
A random sequence is statistically specified by its distribution functions; all orders are required in general. That is, for each positive integer n, and for all positive integer sequences k_1 ≤ k_2 ≤ … ≤ k_n, we need knowledge of the nth-order distribution function

F(x_{k_1}, x_{k_2}, …, x_{k_n}; k_1, k_2, ..., k_n) ≡ P[ X[k_1] ≤ x_{k_1}, X[k_2] ≤ x_{k_2}, …, X[k_n] ≤ x_{k_n} ] .        (11-37)
Note that a complete statistical specification requires an infinite set of distribution functions. In (11-37), the algebraic variables x_{k_1}, x_{k_2}, …, x_{k_n} are called realization variables. The subscripts on these variables serve only to distinguish one variable from another; F(α, β, γ; k_1, k_2, k_3) is just as meaningful as F(x_{k_1}, x_{k_2}, x_{k_3}; k_1, k_2, k_3).
The probability density functions are obtained by differentiating distribution functions.
That is, the nth-order probability density function is defined as

f(x_{k_1}, x_{k_2}, …, x_{k_n}; k_1, k_2, ..., k_n) ≡ ∂^n F(x_{k_1}, x_{k_2}, …, x_{k_n}; k_1, k_2, ..., k_n) / (∂x_{k_1} ∂x_{k_2} ⋯ ∂x_{k_n}) .        (11-38)
The moments of a random sequence are important in applications. The mean (sometimes called the first-order average) is defined as

η[n] ≡ E[X(n)] = ∫_{−∞}^{∞} x f(x; n) dx        (11-39)

for a sequence of continuous random variables.
Second-order statistical averages appear often in practice. For example, the autocorrelation function is defined as

R_X(n, m) ≡ E[X(n)X(m)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x_n x_m f(x_n, x_m; n, m) dx_n dx_m .        (11-40)
In a similar manner, the autocovariance function is defined as

C_X(n, m) ≡ E[ {X(n) − η(n)}{X(m) − η(m)} ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} {x_n − η(n)}{x_m − η(m)} f(x_n, x_m; n, m) dx_n dx_m .        (11-41)
Note that both RX and CX are symmetric

R_X(n, m) ≡ E[X(n)X(m)] = E[X(m)X(n)] = R_X(m, n)        (11-42)

C_X(n, m) ≡ E[ {X(n) − η(n)}{X(m) − η(m)} ] = E[ {X(m) − η(m)}{X(n) − η(n)} ] = C_X(m, n) .        (11-43)
Also, we can write
CX (n, m)  R X (n, m)  (n)(m) .
(11-44)
The sequence X(n) is said to have uncorrelated elements (or to be uncorrelated) if

R_X(n, m) ≡ E[X(n)X(m)] = E[X(n)] E[X(m)] = η(n)η(m) ,   n ≠ m .
For such a sequence, (11-44) leads to the conclusion that

C_X(n, m) = R_X(n, m) − η(n)η(m) = σ²(n),   n = m
          = 0,                              n ≠ m ,        (11-45)

where σ²(n) denotes the sequence variance.
Example 11-5: Many applications involve the arrival of objects. For example, we may be interested in the arrival of cars at an intersection, the arrival of electrons at the plate of a vacuum tube, etc. A commonly-used simplifying assumption is that the objects arrive independently of one another. Let τ(n) denote the interval of time (in seconds) between the arrival of the (n-1)th and nth objects (relative to a given initial time 0, τ(1) is the arrival time for the first object). The
time line is depicted by Fig. 11-2 below. For n ≥ 1, we assume that τ(n) is a sequence of independent, identically distributed random variables, each with the exponential density

f_τ(t; n) = λ exp[−λt] U(t) .        (11-46)
The mean of (n) is

  (n)  E[ (n)]   x  e  x dx  1/  ,
(11-47)
0
and its variance is

σ_τ²(n) ≡ E[τ(n)²] − (1/λ)² = ∫_{0}^{∞} x² λ e^{−λx} dx − (1/λ)² = 2/λ² − 1/λ² = 1/λ² .        (11-48)
Relative to a given initial time 0, the running sum of these intervals gives the arrival times of the objects. That is, the arrival time of the nth object is

T(n) = Σ_{k=1}^{n} τ(k) ,        (11-49)
(1)

0
(2)

(3)

(4)




T(1)
T(2)
T(3)
T(4)
Fig. 11-2: Random arrival times. (n) is the time between arrivals, and T(n) is the
actual arrival time (relative to origin 0).
a sequence of random variables indexed on n. Since the time intervals are independent, the density function f_T(t;n) for T(n) is an (n−1)-fold convolution of (11-46) with itself. We claim that this first-order density function is

f_T(t; n) = λ ( (λt)^{n−1} / (n−1)! ) exp(−λt) U(t) .        (11-50)
This result can be established by induction (by using a different approach, this same result was
derived in Appendix 9B). Clearly, the result is correct for n = 1; assume it is true for n-1. Now,
we convolve again to obtain
f T (t ; n)  f T (t ; n -1)   exp( t)U(t) 
 t

({t  }) n  2
    exp( )
exp( {t  })d U(t)
(n  2)!
 0



t  n-2
d  U(t)
  nexp( t) 
0 (n  2)!


(11-51)


t n 1
exp( t)  U(t)
  n
 (n  1)!

as claimed. Equation (11-51) is the Erlang density, and T(n) is an Erlang distributed random
variable (this same result was obtained in Appendix 9B). The expected value of random variable
T(n) is

η_T(n) = n η_τ(n) = n/λ .        (11-52)
Since the interval random variables are independent, the variance of T(n) is

σ_T²(n) = n σ_τ²(n) = n/λ² .        (11-53)
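A short simulation sketch (added here, not part of the original notes) checks (11-52) and (11-53): draw independent exponential interarrival times τ(k), form T(n) by (11-49), and compare the sample mean and variance of T(n) with n/λ and n/λ². The rate λ = 2 and n = 10 are assumptions chosen for the demonstration.

import numpy as np

rng = np.random.default_rng(0)
lam, n, trials = 2.0, 10, 200_000

tau = rng.exponential(scale=1.0 / lam, size=(trials, n))   # tau(1), ..., tau(n)
T_n = tau.sum(axis=1)                                      # arrival time T(n), Eq. (11-49)

print(T_n.mean(), n / lam)        # sample mean     vs  n/lambda,   Eq. (11-52)
print(T_n.var(),  n / lam**2)     # sample variance vs  n/lambda^2, Eq. (11-53)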
Gaussian Random Sequences
A random sequence X(n) is called a Gaussian random sequence if all its nth-order
probability density functions are Gaussian. Such sequences are very popular. Because of the
Central Limit Theorem, Gaussian sequences occur in many applications.
Also, they are
completely described by only first- and second-order statistical averages (i.e., means and
covariances). Finally, use of Gaussian statistics simplifies many technical developments and
makes mathematically tractable many problems in the areas of filtering, estimation, detection and
control.
Example 11-6: Let X(n) be a zero-mean Gaussian sequence; that is, E[X(n)] = 0 for all n. Also, let X be delta correlated; that is, R_X(n,m) = E[X(n)X(m)] = σ²δ(n−m), where σ² is the variance and

δ(k) = 1,   k = 0
     = 0,   k ≠ 0 .        (11-54)
Often, delta-correlated sequences are said to be white; in many applications, delta-correlated Gaussian sequences are called white Gaussian noise. For n ≠ m, X(n) and X(m) are uncorrelated and, since they are Gaussian, independent. As a result, an Nth-order density function factors into a product of N first-order density functions.
Most computer-based math packages (such as MATLAB, Mathcad, etc.) generate periodic sequences that, for many purposes, can be used to approximate white Gaussian noise. For these sequences, the correlation between elements can be very low, and the sequence period is very long relative to the number of sequence values that are needed.
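As a sketch of the point just made (an addition to these notes), NumPy's Gaussian generator produces a sequence that behaves like white Gaussian noise for most purposes; the sample autocorrelation estimated below is approximately σ²δ(k). The record length and σ are assumptions for the demonstration.

import numpy as np

rng = np.random.default_rng(0)
sigma, N = 1.5, 100_000
x = sigma * rng.standard_normal(N)            # approximate white Gaussian noise

def sample_autocorr(z, k):
    """Estimate R_X(k) = E[X(n+k)X(n)] from one long record."""
    return np.mean(z[k:] * z[:len(z) - k]) if k > 0 else np.mean(z * z)

for k in range(4):
    print(k, sample_autocorr(x, k))           # near sigma^2 at k = 0, near 0 elsewhere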
Independent Increments
Random sequence X(n) is said to have independent increments if for all N > 1 and n_1 < n_2 < ... < n_N the process increments X(n_1), X(n_2) − X(n_1), X(n_3) − X(n_2), ... , X(n_N) − X(n_{N−1}) are
jointly independent. Such processes have the nice feature that Nth-order density and distribution
functions can be “built up” as products of the densities of the individual increments. For
example, the second order distribution, for the case n2 > n1, can be written as

F(x_1, x_2; n_1, n_2) ≡ P[X(n_1) ≤ x_1, X(n_2) ≤ x_2]
                      = P[X(n_1) ≤ x_1, X(n_2) − X(n_1) ≤ x_2 − x_1]        (11-55)
                      = P[X(n_1) ≤ x_1] P[X(n_2) − X(n_1) ≤ x_2 − x_1] .
We have seen independent increment processes in previous chapters. For example, the
Random Walk process, introduced in Chapter 6, has independent increments.
Stationarity
Often, random sequences are generated by a mechanism that is not changing with time.
In these cases, the sequence moments are constant. More precisely, a random sequence is said to
be stationary if, for all positive integers N, the Nth-order density function of the sequence is
invariant to any shift of the index. That is, stationarity requires

f(x_{n_1}, x_{n_2}, …, x_{n_N}; n_1, n_2, ..., n_N) = f(x_{n_1}, x_{n_2}, …, x_{n_N}; n_1 + n_0, n_2 + n_0, ..., n_N + n_0)        (11-56)
for all orders N and index shift values n0.
Example 11-5 introduces a random sequence τ(n) of interval times. Since the interval times are independent, Nth-order densities can be built up as products of first-order densities, each of a form given by (11-46). Clearly, the random sequence τ(n) of interval times is described by an Nth-order density that satisfies (11-56); the sequence τ(n) is stationary. On the other hand, the total waiting time to the nth arrival, the sum T(n) given by (11-49), is not
stationary as is obvious by inspection of (11-52) and (11-53).
Wide-Sense Stationarity (WSS)
A weaker form of stationarity is adequate in some applications. Sometimes, all that is
required is “stationarity in all second-order statistics”. We say that a random sequence is wide
sense stationary (WSS) if its mean function is constant and its covariance depends only on the
time difference. That is, the sequence is WSS if
(n)  (0)
(11-57)
R X (n, m)  R X (n - m)  R X (k) ,
(11-58)
where k  n - m is the time difference between the two sequence values. Clearly, all stationary
sequences are WSS. However, the converse is not true. Gaussian sequences provide an interesting example for which there is no difference between the two forms of stationarity.
Two distinct sequences can have "mutual stationarity" properties. Wide sense stationary sequences X(n) and Y(m) are said to be jointly wide sense stationary if

R_xy[n, m] ≡ E[X(n)Y(m)] = R_xy[n − m] .        (11-59)

That is, the cross correlation depends only on the time difference n−m.
Suppose X(n) and Y(m) are jointly WSS so that R_xy[n−m] = E[X(n)Y(m)]. We define k ≡ n−m and write

R_xy[k] = E[X(m+k)Y(m)] .        (11-60)

That is, for R_xy[k], k denotes the shift applied to the first indexed sequence (i.e., X(m)). Note that R_xy[k] ≠ R_yx[k], in general. However, note that R_xy[k] = E[X(m+k)Y(m)] = E[X(m)Y(m−k)] = R_yx[−k].
Power Spectral Density
Let X(n) be a real-valued, wide-sense-stationary sequence with finite average power E[X(n)²] < ∞. Denote the autocorrelation of X as R_x(k) = E[X(n+k)X(n)]. The power spectrum (or power spectral density) is denoted as S_x(ω). The celebrated Wiener-Khinchine theorem states that the power spectrum and autocorrelation comprise a discrete-time Fourier transform (DTFT) pair. That is, we write
S x ()  F  R x (k)  
R x (k) 


k 
R x (k) e jk ,
    
(11-61)
1 
S x () e jk d .


2
Actually, Sx is 2 periodic in  and only need be specified on – <   . The average power in
X(n) can be expressed as
1 
S ()d
Avg. Pwr  E  X 2 (n)  

 2   x
(11-62)
watts.
White Noise Sequence
A zero-mean X(n) is said to be a white noise sequence if

R_x(k) = E[X(n+k)X(n)] = σ²δ(k) .        (11-63)
Note that 2 is the finite variance of X(n). The power spectral density of X is
S x ()  F  R x (k)  


2 (k) e  jk   2
     .
(11-64)
k 
The average power in X is

Avg. Pwr = E[X²(n)] = (1/2π) ∫_{−π}^{π} σ² dω = σ²   watts.        (11-65)
Note that a discrete-time white sequence has a finite average power (contrast this with the
continuous-time case discussed in Chapter 8).
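A numerical sketch (an addition, not from the original notes): averaging periodograms of independent segments of an approximately white sequence gives an estimate of S_x(ω) that is roughly flat at the level σ², consistent with (11-64). Segment length and the number of segments are assumptions for the demonstration.

import numpy as np

rng = np.random.default_rng(0)
sigma, seg_len, n_seg = 1.5, 256, 400
x = sigma * rng.standard_normal(seg_len * n_seg)

segs = x.reshape(n_seg, seg_len)
periodograms = np.abs(np.fft.fft(segs, axis=1))**2 / seg_len    # per-segment periodograms
S_est = periodograms.mean(axis=0)                               # averaged estimate of S_x(omega)

print(S_est.mean(), sigma**2)     # approximately flat spectrum near sigma^2 = 2.25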
Systems
We are interested in systems with random sequence inputs. First, we review some basic
definitions involving systems. Then, we focus on determining the mean and autocorrelation of
the output of a linear system given descriptions of the input process and system impulse
response.
Given input sequence X(n,), we denote the system output as
Y(n,  )  L  X(n, ) ,
(11-66)
where operator L[·] maps input X into output Y. Often, we do not explicitly write the  variable
in the notation; we write Y(n) = L[X(n)] with the  implied.
The system is said to be linear if

L[αX_1(n) + βX_2(n)] = αL[X_1(n)] + βL[X_2(n)]        (11-67)

for all inputs X_1, X_2 and all constants α, β. A linear system can be described by a unit sample response, denoted as h(n,m), assumed to be real-valued in what follows. This function is the response at time n to a unit sample function applied at time m. In general, impulse response h(n,m) may, or may not, depend on the absolute values of indices n and m, and h(n,m) may, or may not, be nonzero for values of n less than m. Given input X(n) and impulse response
h(n,m), we can express the output as
Y(n) = L  X(n)  =


h(n, )X( ) .
(11-68)
 = 
A linear system is said to be shift invariant (or time invariant) if a simple delay in the
input sequence produces a corresponding delay in the output sequence. More formally, we say
that linear system L[·] is shift invariant if
Y(n) = L  X(n)   Y(n - n 0 ) = L  X(n - n 0 ) 
(11-69)
for all input/output pairs (X, Y) and all index shifts n0. Shift invariant systems depend only on
the difference of n and , not their absolute values. In this case, we can write h(n,) = h(n - ).
Also, for shift invariant systems, Equation (11-68) becomes
Y(n) = L  X(n)  =


h(n - )X( ) = h * X ,
(11-70)
 = 
the convolution of input X with impulse response h.
A system is said to be bounded input - bounded output (BIBO) stable if bounded input
sequences produce bounded output sequences. A linear, shift-invariant system is BIBO stable if,
and only if, its impulse response is absolutely summable; that is, BIBO stability is equivalent to

Σ_{n=−∞}^{∞} |h(n)| < ∞ .        (11-71)
A linear, shift-invariant system is said to be causal if it does not respond before it is
excited. More explicitly, for a causal system, if two inputs X1(n) and X2(n) are equal up to some
index n0, then the corresponding outputs Y1(n) = L[X1(n)] and Y2(n) = L[X2(n)] must be equal up
to index n0; what happens to the inputs after index n0 in no way influences the outputs before
index n0. One can show that a linear, shift-invariant system is causal if, and only if, h(n) = 0 for
n < 0. For a linear, shift-invariant and causal system, the input-output relationship becomes
Y(n) = L  X(n) =
n

h(n  )X( ) .
(11-72)
 = 
One should consider the differences between (11-68), the most general I/O formula, (11-70) for
the shift-invariant case and (11-72) which describes the most restrictive case.
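To make the contrast concrete, here is a small Python sketch (added for illustration, not part of the original notes) of the shift-invariant sum (11-70) and the causal sum (11-72), truncated to finite index ranges. The impulse response values are assumptions chosen only for the demo; for a causal h the two forms agree.

import numpy as np

def lsi_output(h, x):
    """Shift-invariant I/O sum (11-70), truncated to the length of x."""
    return np.convolve(h, x)[:len(x)]

def causal_lsi_output(h, x):
    """Causal form (11-72): only past and present inputs contribute (h(n) = 0 for n < 0 here)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for l in range(n + 1):
            if n - l < len(h):
                y[n] += h[n - l] * x[l]
    return y

h = 0.6 ** np.arange(10)      # an example causal impulse response (assumed values)
x = np.ones(20)
print(np.allclose(lsi_output(h, x), causal_lsi_output(h, x)))    # True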
Linear, shift-invariant systems can be analyzed in the frequency domain. For this purpose, we describe the Fourier Transform of signal X(k) as

X_F(e^{jω}) ≡ Σ_{k=−∞}^{∞} X(k) e^{−jωk}        (11-73)
(we will use a subscript of F to denote a Fourier transform). If (11-73) converges, X_F is a continuous, 2π-periodic function of frequency variable ω. The inverse Fourier transform is

X(n) = (1/2π) ∫_{−π}^{π} X_F(e^{jω}) e^{jωn} dω .        (11-74)
The Fourier transform of a linear, shift-invariant system's output can be found easily.
Simply use the convolution theorem with (11-70) to obtain
YF (e j )  F [h(n)  X(n)]  H F (e j )X F (e j ) .
Updates at http://www.ece.uah.edu/courses/ee385/
(11-75)
11-24
EE603 Class Notes
12/05/13
John Stensby
Systems With Random Inputs
Given a system with a random input, we determine below the mean and autocorrelation
of the output. A more general, difficult problem is to find the Nth-order density function that
describes the system output. A linear system with a Gaussian input will have a Gaussian output.
Unfortunately, a general statement of this scope cannot be made for nonlinear systems or
systems driven by non-Gaussian inputs.
Theorem 11-2: Consider the linear system with input X(n) and output Y(n) = L[X(n)] (we do not require shift-invariance or causality). Suppose that both η_X(n) = E[X(n)] and η_Y(n) = E[Y(n)] exist. For this case, we can write

η_Y(n) = E[Y(n)] = E[ L[X(n)] ] = L[ E[X(n)] ] = L[ η_X(n) ] .        (11-76)
That is, it is possible to interchange the operations of L[·] and E[·]. We write

E[Y(n)] = E[ Σ_{m=−∞}^{∞} h(n, m) X(m) ] .        (11-77)
Then, we formally interchange the summation and expectation to obtain

η_Y(n) = E[Y(n)] = E[ Σ_{m=−∞}^{∞} h(n, m) X(m) ] = Σ_{m=−∞}^{∞} h(n, m) E[X(m)] = Σ_{m=−∞}^{∞} h(n, m) η_X(m) ,        (11-78)
and (11-76) is established.
Note that our derivation of (11-76) is not rigorous. A potential problem with (11-78) is
the formal interchange of expectation and summation. In cases where the mean of Y(n) does not
exist, this interchange is not valid (can you construct a simple example where the mean of Y(n)
does not exist, i.e., the interchange in (11-78) is not valid?). We will consider this “interchange
problem” again once we have studied some convergence concepts.
Let's consider a special case of (11-78); suppose that input X(n) is wide-sense stationary and the system is shift invariant. Then, h(n, m) = h(n − m) and η_X(n) = η_X is a constant so that

η_Y(n) = Σ_{m=−∞}^{∞} h(n − m) η_X = [ Σ_{m=−∞}^{∞} h(m) ] η_X = H(e^{j0}) η_X ,        (11-79)

so η_Y(n) = η_Y is a constant as well. The bracketed quantity on the right-hand side of (11-79) is the DC gain of the system (which we assume to be finite in the development of (11-79)).
Example 11-7: Consider a low-pass filter with impulse response h(n) = αⁿU(n), where 0 < α < 1 to ensure stability. The Fourier transform of h is

H(e^{jω}) = 1 / (1 − α e^{−jω}) .        (11-80)

According to (11-79), the mean of the filter output is η_X/(1 − α).
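A quick numerical check (added as a sketch, assuming a white input with nonzero mean purely for illustration): filtering a WSS input through h(n) = αⁿU(n), truncated for computation, gives an output whose sample mean is close to η_X/(1 − α), the DC-gain result of (11-79).

import numpy as np

rng = np.random.default_rng(0)
alpha, eta_X, N = 0.6, 2.0, 200_000

x = eta_X + rng.standard_normal(N)            # WSS input with mean eta_X (assumed model)
h = alpha ** np.arange(60)                    # truncated impulse response h(n) = alpha^n U(n)
y = np.convolve(x, h)[:N]                     # filter output; start-up transient is negligible here

print(y[1000:].mean(), eta_X / (1 - alpha))   # sample output mean vs eta_X/(1 - alpha) = 5.0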
Next, we determine the cross correlation between a system input X(n) and its output Y(n), both input and output assumed to be real valued. This quantity is defined as

R_XY(n, m) ≡ E[X(n)Y(m)] .        (11-81)
Then, we use this result to find the autocorrelation RY of the system output.
Theorem 11-3: Let X(n) and Y(n) denote the input and output, respectively, of a linear operator
L[·]; that is, Y(n) = L[X(n)]. The cross-correlation between the input X and output Y can be
calculated by the formula

R_XY(n, m) = L_2[ R_X(n, m) ] ,        (11-82)

where L_2 signifies that L operates with respect to the second variable (i.e., "m" is the independent variable in the operation), treating the first variable (i.e., "n") as a constant. In a similar manner,
the autocorrelation of the output can be calculated by the formula
R Y (n, m) = L1  R XY (n, m) ,
(11-83)
where L1 signifies that L operate with respect to the first variable only (i.e., “n” is the
independent variable in the operation).
Proof (see Theorems 7-1 and 7-2 for continuous-time version of this result): First, we write

X(n)Y(m) = X(n)L[X(m)] = L_2[X(n)X(m)] ,        (11-84)

where L_2 operates on X(m). Now, take the expected value of this result to obtain

E[X(n)Y(m)] = E[ L_2[X(n)X(m)] ] = L_2[ E[X(n)X(m)] ] = L_2[ R_X(n, m) ] ,        (11-85)
and this establishes (11-82). The formula for the autocorrelation of the output can be developed
by taking the expectation of the product Y(n)Y(m) to obtain
R Y (n, m) = E  Y(n)Y(m) = E  L[X(n)]Y(m) = E  L1[X(n)Y(m)]
= L1  E[X(n)Y(m)]
(11-86)
= L1  R XY (n, m) ,
(L1 operates on functions of n) and this establishes (11-83) so that the theorem is established. 
Note that Theorem 11-3 does not require that operator L (i.e., linear system) be time invariant or
that the input be wide-sense stationary.
Let us consider Theorem 11-3 specialized to the case of a WSS input sequence X(n) and
a shift-invariant, linear system described by unit sample function h(n). For this case, formula
(11-82) yields

R_XY(n, m) = Σ_{ℓ=−∞}^{∞} R_X(n, m − ℓ) h(ℓ)
           = Σ_{ℓ=−∞}^{∞} R_X([n − m] + ℓ) h(ℓ) = Σ_{ℓ=−∞}^{∞} R_X([n − m] − ℓ) h(−ℓ) .        (11-87)

Observe that the right-hand side of (11-87) depends on n, m only through the difference k ≡ n−m. Hence, X and Y are jointly wide sense stationary, and we can write

R_XY(k) = R_X(k) ∗ h(−k) .        (11-88)
For the WSS case, the output correlation formula (11-86) becomes

R_Y(n, m) = Σ_{ℓ=−∞}^{∞} R_XY({n − ℓ} − m) h(ℓ) = Σ_{ℓ=−∞}^{∞} R_XY({n − m} − ℓ) h(ℓ) ,        (11-89)

a formula depending on k ≡ n−m. Hence, we write

R_Y(k) = R_XY(k) ∗ h(k) .        (11-90)
Finally, combining (11-88) and (11-90) yields

R_Y(k) = R_X(k) ∗ h(−k) ∗ h(k) = R_X(k) ∗ {h(−k) ∗ h(k)} ,        (11-91)

and we see that a WSS input produces a WSS output.
Example 11-8: Suppose output Y is related to input X by the simple relationship
Y(n) = L[X(n)] = X(n) - X(n -1) ,
(11-92)
the first-order, backwards difference. For example, sequence Y(n) might be subjected to a
threshold to implement a “pulse detector” function. The mean of the output is

E[Y(n)] = E[X(n)] − E[X(n−1)] = η_X(n) − η_X(n−1) .        (11-93)
The cross-correlation between input and output is
R XY (n, m) = L2  R X (n, m) = R X (n, m) - R X (n, m -1) .
(11-94)
Finally, the autocorrelation of the output is
R Y (n, m) = L1  R XY (n, m)  = R XY (n, m) - R XY (n -1, m)
= R X (n, m) - R X (n, m -1) - { R X (n -1, m) - R X (n -1, m -1) }
(11-95)
= R X (n, m) - R X (n -1, m) - R X (n, m -1) + R X (n -1, m -1) .
Suppose the input is WSS with autocorrelation

R_X(n, m) = a^{|n−m|} ,   0 < a < 1.        (11-96)
Then Equations (11-93) and (11-94) yield

η_Y = 0
R_XY(n, m) = a^{|n−m|} − a^{|n−m+1|} ,        (11-97)

and (11-95) yields

R_Y(n, m) = 2a^{|n−m|} − a^{|n−1−m|} − a^{|n−m+1|} .        (11-98)
The output sequence Y(n) is WSS; if k ≡ n − m, then (11-96) and (11-98) become

R_X(k) = a^{|k|}        (11-99)

R_Y(k) = 2a^{|k|} − a^{|k−1|} − a^{|k+1|} ,        (11-100)
respectively. A comparison of Fig. 11-3 and Fig. 11-4 (both correlations were computed and plotted for a = .6) reveals that the "pulse detector" (11-92) "decorrelates" the input data X(n), at least to some extent.

Fig. 11-3: R_x(k), Eqn. (11-99) with a = .6. (Figure not reproduced.)
Fig. 11-4: R_y(k), Eqn. (11-100) with a = .6. (Figure not reproduced.)
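The decorrelation can be checked numerically. The sketch below is an addition to the notes: it builds an input with R_X(k) = a^|k| as a first-order autoregression driven by white Gaussian noise (one standard construction, assumed here), differences it as in (11-92), and compares the sample autocorrelation of Y with (11-100).

import numpy as np

rng = np.random.default_rng(0)
a, N = 0.6, 200_000

# AR(1) construction with R_X(k) = a^|k| (unit variance): X(n) = a X(n-1) + w(n)
w = np.sqrt(1 - a**2) * rng.standard_normal(N)
x = np.zeros(N)
for n in range(1, N):
    x[n] = a * x[n - 1] + w[n]

y = x[1:] - x[:-1]                                   # Y(n) = X(n) - X(n-1), Eq. (11-92)

def sample_autocorr(z, k):
    return np.mean(z[k:] * z[:len(z) - k]) if k > 0 else np.mean(z * z)

for k in range(4):
    theory = 2 * a**abs(k) - a**abs(k - 1) - a**abs(k + 1)    # Eq. (11-100)
    print(k, sample_autocorr(y, k), theory)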
Vector Space of Random Variables
All real-valued random variables with finite second moments (i.e., finite average power)
comprise a vector space over the field of real numbers. We define vector space L2 as

L2 ≡ { X : E[X²] < ∞ } ,        (11-101)

all real-valued, finite-second-moment random variables. We take the real number field, denoted here by R, as our scalar field. To show that L2 is a valid vector space, we must show, among other things, that L2 is closed under vector addition (i.e., X ∈ L2 and Y ∈ L2 implies that X + Y ∈ L2) and scalar multiplication (i.e., X ∈ L2 and c ∈ R implies that cX ∈ L2).
The fact that L2 is closed under scalar multiplication follows easily. Clearly if X ∈ L2 and c ∈ R then E[(cX)²] = c²E[X²] < ∞ so cX ∈ L2.
The fact that L2 is closed under vector addition follows from use of the Schwarz inequality (sometimes called the Cauchy-Schwarz inequality).
Theorem 11-4 (Schwarz): Let X ∈ L2 and Y ∈ L2. Then

( E[XY] )² ≤ E[X²] E[Y²] .        (11-102)
Proof: Let  be any real-valued number and consider
2
2
2
2
E[ X  Y ]  E[ Y ]   E[XY]   E[XY]   E[ X ]  0 .
(11-103)
Now, Equation (11-103) is a quadratic equation in , and the roots are either complex-valued or
real and equal (see Fig. 5). Hence, in the quadratic equation, the discriminant must be nonpositive, or
Updates at http://www.ece.uah.edu/courses/ee385/
11-31
EE603 Class Notes
12/05/13
John Stensby
E[ Y ] 2  2E[XY]   E[ X ]
2
2
 -axis
Fig. 5: Graph of quadratic equation
2
2
2
4 E[XY]  4E[ X ]E[ Y ]  0 .
(11-104)
The Schwarz inequality follows directly from (11-104). In (11-102), equality results when Y is a
scalar multiple of X.
Now, we show that L2 is closed under vector addition. Let X ∈ L2 and Y ∈ L2 and consider the sum X + Y. The second moment of the sum satisfies

E[ (X + Y)² ] = E[X²] + 2E[XY] + E[Y²]
             ≤ E[X²] + 2√(E[X²]) √(E[Y²]) + E[Y²] .        (11-105)
However, all quantities on the right-hand-side of (11-105) are finite since X ∈ L2 and Y ∈ L2. Hence, the sum X + Y ∈ L2, and L2 is closed under vector addition. Closure under vector addition and scalar multiplication is necessary for L2 to be a valid vector space. The remaining requirements (found in any elementary text on linear algebra) that L2 must satisfy are shown easily. Hence, we can consider the set of all real-valued random variables with finite second moments to be a valid vector space.
Equality of Random Variables
Let X and Y be random variables. The statement X = Y can be interpreted in different
ways. Everything said about statement X = Y can be said about the equivalent statement X - Y =
0, and vice-versa. Hence, without loss of generality, we discuss the meaning of the statement
random variable X = 0.
X  0 Identically
The statement X  0 identically means that the numerical value of X() = 0 for all   S.
This is a very restrictive form of equality, one that is rarely needed in applications. Hence, we
seek a “looser” interpretation of statement X = 0.
X = 0 Almost Surely (a.s.) Means P[X = 0] = 1
The statement X = 0 almost surely (a.s.) means P[X = 0] = P[{ζ ∈ S : X(ζ) = 0}] = 1. Often, this condition is stated as
1) X = 0 almost everywhere (a.e.)
2) X = 0 with probability one,
both equivalent phrases (used by different authors). It should be noted that X = 0 (a.s.) is NOT equivalent to X ≡ 0 (i.e., X(ζ) = 0 for all ζ ∈ S, or everywhere). If X = 0 (a.s.), the event B = {ζ ∈ S : X(ζ) ≠ 0} has probability zero; however, it can be nonempty.
E[X2] = 0 is Equivalent to P[X = 0] = 1 (Same as X = 0 (a.s.))
In words, E[X ] = 0 is stated as X = 0 in mean square, or more simply, X = 0 (m.s.).
E[X ] = 0 is equivalent to P[X  0] = 0. To prove this, we show E[X = 0 if, and only if,
P[X  0] = 0. First, we show the “if” part: assume P[X  0] = 0. Then, X is a discrete random
variable with all probably concentrated at the origin; that is, its distribution function is F(x) =
U(x), a unit step. Observe that

E[X²] = ∫_{−∞}^{∞} x² (dF/dx) dx = ∫_{−∞}^{∞} x² δ(x) dx = 0 .        (11-106)
Second, we show the "only if" part: assume E[X²] = 0. With this, use the Generalized Tchebycheff Inequality (see Chapter 2 of these class notes) to write

P[X² ≥ 1/N] ≤ E[X²] / (1/N) = N E[X²] = 0        (11-107)
for each integer N > 0. Now, note that

P[X ≠ 0] = P[X² > 0] = P[ ∪_{n=1}^{∞} {X² ≥ 1/n} ] = P[ limit_{N→∞} ∪_{n=1}^{N} {X² ≥ 1/n} ] .        (11-108)
But, as indexed on n, the sequence of events {X² ≥ 1/n} is nested increasing. Use continuity of probability and (11-107) to write

P[X ≠ 0] = P[ limit_{N→∞} ∪_{n=1}^{N} {X² ≥ 1/n} ] = limit_{N→∞} P[ ∪_{n=1}^{N} {X² ≥ 1/n} ] = limit_{N→∞} P[X² ≥ 1/N] = 0 .        (11-109)
Equations (11-106) and (11-109) lead to the conclusion that

E[X²] = 0 if, and only if, P[X ≠ 0] = 0 (same as X = 0 (a.s.)).        (11-110)
It is worth repeating that P[X = 0] = 1 is not equivalent to the statement X(ζ) = 0 for all ζ ∈ S.
Subspace of L2
M is said to be a subspace of L2 if it is a valid vector space (i.e., closed under scalar multiplication and vector addition, in addition to the other requirements given in any linear algebra text) and M ⊂ L2. Subspaces play a crucial role in many applications that involve
optimization problems.
Inner Product and Norm
It is natural to define an inner product on L2 as the expected value of a product. That is,
for any X ∈ L2 and Y ∈ L2, we denote the inner product (dot product or scalar product) as ⟨X,Y⟩, and we define

⟨X, Y⟩ ≡ E[XY] .        (11-111)
The Cauchy-Schwarz inequality (11-102) implies that

|⟨X, Y⟩| ≤ √⟨X, X⟩ √⟨Y, Y⟩ < ∞ .        (11-112)

That is, the inner product exists as a real number for every vector X and Y in L2. It can be shown that inner product ⟨X,Y⟩ = E[XY] satisfies the properties
1. X, X  0, and X, X  0 if and only if X = 0 almost surely (i.e., P[X = 0] = 1),
2. X,Y  Y,X and
(11-113)
3. cX,Y  c X,Y , where c  R.
If E[X] = 0, then second moment X, X  E[X 2 ] is the variance of random variable X. Random
variables X and Y are said to be orthogonal if X, Y  E[XY]  0 .
In Part 1) of (11-113), the statement "X = 0 almost surely" is not equivalent to X ≡ 0 (i.e., X identically zero); so, ⟨X, X⟩ = 0 is not equivalent to X ≡ 0. However, the equivalence of ⟨X, X⟩ = 0 and X ≡ 0 is a general requirement of an inner product, as defined in almost all linear algebra books. However, in the applications literature, this subtle "issue" is overlooked, and (11-111) is declared a valid inner product.
Some authors change how random variables are defined/interpreted in an attempt to
remove the phrase “almost surely” from Part 1) of (11-113) and “fix” the above-mentioned
“issue”. They interpret a given X as a class of all random variables that are equal to X almost
surely. Two members of the same class can differ on some set B as long as P[B] = 0. All class
members will have the same expected value; sets of probability zero do not influence
expectations. When computing ⟨X, Y⟩ = E[XY], X represents any member from its class, as
does Y; the expectation will be the same regardless of which class members are used.
Interpreting a random variable as a class of equivalent random variables allows us to “fix” Part 1
of (11-113), removing the phrase “almost surely”. In terms of equivalent classes, the statement
“X = 0” refers to a class of random variables, all of which are zero almost surely.
On a vector space, a vector norm maps vectors into real numbers in a manner that adapts
the concept of length to vectors. Almost universally, the norm of vector X is denoted as ‖X‖. On vector space L2, we define the norm of X as

‖X‖ ≡ √⟨X, X⟩ = √(E[X²])        (11-114)

(we say that the inner product induces the norm). From (11-113) it follows directly that (11-114) satisfies
1. X  0. X  0 if, and only if, X = 0 almost surely (i.e., P[X = 0] = 1),
2. cX  c X , for any c  R and
(11-115)
3. X+Y  X  Y (the triangle inequality).
If E[X] = 0, then X is the standard deviation of X.
In Part 1) of (11-115), P[X = 0] = 1 is not equivalent to X (i.e., X identically zero);
so, X  0 is not equivalent to X  However, the equivalence of X  0 and X  0 is a
general requirement of a vector norm (see any text on linear analysis). However, in the applications literature, this subtle "issue" is overlooked, and (11-114) is declared to be a valid vector norm. This "problem" can be "fixed" by interpreting each random variable as a class, as discussed above.
Often, norm (11-114) is called the mean-square norm since it involves the mean of the square of a random variable. In terms of (11-114), we can restate the Schwarz inequality as

|⟨X, Y⟩| ≤ ‖X‖ ‖Y‖ .        (11-116)
Equation (11-116) is how the Schwarz inequality is usually stated in the analysis literature where
the notions of inner product and norm play central roles.
The triangle inequality (Part 3 of (11-115)) has a form similar to the well-known triangle inequality for real numbers (which states that |r_1 + r_2| ≤ |r_1| + |r_2| for any real numbers r_1 and r_2). This inequality follows from the observation

‖X + Y‖² = ⟨X + Y, X + Y⟩ = ⟨X, X⟩ + 2⟨X, Y⟩ + ⟨Y, Y⟩ ≤ ‖X‖² + 2‖X‖ ‖Y‖ + ‖Y‖²        (11-117)
         = ( ‖X‖ + ‖Y‖ )² ,

which leads to the triangular inequality ‖X + Y‖ ≤ ‖X‖ + ‖Y‖.
The norm (11-114) gives us a way to define the equality of two vectors (random variables). If X and Y are random variables for which

‖X − Y‖ = 0        (11-118)

we say that X = Y in the mean-square sense, or we say X = Y (m.s.). From (11-110), we see that (11-118) is equivalent to P[X = Y] = 1 and P[X ≠ Y] = 0.
Convergence of Random Sequences
Often, one has to deal with sequences of random variables that converge to a random
variable. We say that the random sequence X(n;ζ) converges to random variable X_0(ζ) if for every fixed ζ_0 ∈ S the sequence of numbers X(n; ζ_0) converges to the number X_0(ζ_0). This is
"ordinary", sometimes called point-wise, sequence convergence (a topic that is usually covered
in a Calculus course) that has nothing to do with the fact that we are dealing with random
variables. Also, it is very restrictive. In applications, we can get by with much "weaker" modes
of convergence; we discuss three alternative convergence modes. In what follows, we discuss
almost sure (a.s.) convergence, convergence in probability (i.p.) and mean-square (m.s.)
convergence. Mean square convergence is convergence in the mean-square norm (11-114). We
discuss m.s. convergence first.
Mean-Square Convergence (m.s. Convergence)
As n goes to infinity, a sequence of random variables X(n) ∈ L2 converges in m.s. to a random variable X_0 ∈ L2 if

limit_{n→∞} ‖X_0 − X(n)‖ = 0   ( same as limit_{n→∞} ( E[{X_0 − X(n)}²] )^{½} = 0 ) .        (11-119)
The norm used in (11-119) is the mean-square norm given by (11-114). Often, this type of convergence is denoted as

X(n) →(m.s.) X_0 ,        (11-120)

or

l.i.m_{n→∞} X(n) = X_0 ,        (11-121)
where l.i.m denotes limit in the mean.
Example 11-9: Let Z be a random variable with E[Z²] < ∞ (i.e., Z ∈ L2). Let c_n, n ≥ 0, be a sequence of deterministic real numbers converging to real number c. Then, c_nZ, n ≥ 0, is a sequence of random variables. We show that

l.i.m_{n→∞} c_n Z = cZ .        (11-122)
To see this, consider

E[ |c_n Z − cZ|² ] = E[ |c_n − c|² Z² ] = |c_n − c|² E[Z²] .

Now, c_n → c and E[Z²] < ∞ implies E[|c_nZ − cZ|²] → 0, and this proves (11-122).
Example 11-10: Consider the probability space (S, B, P), where S = [0, 1], B the Borel sets (B is the σ-algebra generated by the open intervals on S; see Chapter 1 of class notes), and

P[B] = ∫_B dζ ,   B ∈ B        (11-123)

(if B is an interval, then P[B] is the interval length. P can be thought of as a "generalized length" of event B). Consider the sequence of random variables defined by

Fig. 6: Sequence of random variables. (Figure not reproduced.)
0 1
X(n; )  1,
n
 0,
1
n
(11-124)
   1,
as illustrated by Fig. 6. This sequence has a point-wise limit given by

limit_{n→∞} X(n; ζ) = 1,   ζ = 0
                    = 0,   ζ ≠ 0 .        (11-125)
This sequence has zero as its mean-square limit since

limit_{n→∞} ‖X(n; ζ) − 0‖² = limit_{n→∞} E[X²(n; ζ)] = limit_{n→∞} [ 1²·(1/n) ] = 0 .        (11-126)
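A Monte Carlo check of (11-126) (added as a sketch): drawing ζ uniformly on [0,1] realizes the probability measure (11-123), and the sample average of X²(n;ζ) is close to 1/n, which goes to zero, confirming the mean-square limit of zero.

import numpy as np

rng = np.random.default_rng(0)
zeta = rng.uniform(0.0, 1.0, size=500_000)     # outcomes drawn according to (11-123)

for n in (1, 2, 10, 100, 1000):
    X_n = (zeta <= 1.0 / n).astype(float)      # X(n; zeta) of Eq. (11-124)
    print(n, np.mean(X_n**2), 1.0 / n)         # estimate of E[X^2(n)] vs 1/n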
Theorem 11-5: Mean-square convergence is additive. That is, if
X 0  l.i.m X(n)
n 
(11-127)
Y0  l.i.m Y(n) ,
n 
then for any real-valued constants a and b we have
aX 0  bY0  l.i.m  aX(n)  bY(n)  .
n 
(11-128)
Proof: Note that

‖{aX(n) + bY(n)} − {aX_0 + bY_0}‖ = ‖a{X(n) − X_0} + b{Y(n) − Y_0}‖
                                  ≤ ‖a{X(n) − X_0}‖ + ‖b{Y(n) − Y_0}‖        (11-129)
                                  = |a| ‖X(n) − X_0‖ + |b| ‖Y(n) − Y_0‖ .
However, Equation (11-127) ensures that the right-hand-side of (11-129) approaches zero as n
approaches infinity, and this proves (11-128).
Not every sequence of random variables has a mean square limit. We need tools and
techniques for determining if a sequence has a mean-square limit. Fortunately, our intuition is
helpful in this regard. Also helpful is some knowledge of real number sequences. Recall that
real number sequences have the Cauchy property. This property states that a real number sequence {r_n} converges if, and only if, |r_n − r_m| → 0 as both n and m approach infinity. When equipped with the Euclidean norm, the set of real numbers is complete, we say. Similarly, sequences in L2 have the Cauchy property. This property states that a sequence X(n) ∈ L2 converges (in the mean square norm) if, and only if, ‖X(n) − X(m)‖ → 0 as both n and m approach infinity. When equipped with the mean square norm, the set of L2 random variables is complete, we say. Stated again, a random sequence X(n) ∈ L2 has a mean-square limit X_0 if, and only if, it is Cauchy (that is, ‖X(n) − X(m)‖ → 0 as both n and m approach infinity).
Mean-Square Cauchy Sequences and Completeness
Let X(n), n  0, be a sequence in L2. The sequence is said to be a mean-square Cauchy
sequence if
limit X(n) - X(m) = 0 .
n,m 
(11-130)
More tersely, we say that the sequence is m.s. Cauchy if (11-130) is true. For a m.s. Cauchy
sequence, the quantity ‖X(n) − X(m)‖ approaches zero as n and m approach infinity, in any manner whatever. Basically, the further you "go out" in a mean-square Cauchy sequence the "closer" (in the mean-square sense) the elements become.
It is easy to show that mean-square convergence implies the mean-square Cauchy property (i.e., (11-120) implies (11-130)). Actually, this is true for arbitrary normed vector spaces (i.e., all convergent sequences are Cauchy sequences, regardless of the normed vector space under consideration). However, for the general normed vector space, Cauchy sequences are not necessarily convergent. But, for L2 space equipped with the mean-square norm, the
mean-square Cauchy property implies mean square convergence. This is stated by the following
theorem.
Theorem 11-6 (Special Case of Riesz-Fischer Theorem)
Vector space L2 is complete in the sense that a mean-square Cauchy sequence has a
unique limit in L2. That is, for sequence X(n) in L2, there exists a unique element X_0 ∈ L2 such that

limit_{n→∞} ‖X_0 − X(n)‖ = 0   ( denoted symbolically as l.i.m_{n→∞} X(n) = X_0 )        (11-131)

if

limit_{n,m→∞} ‖X(n) − X(m)‖ = 0   ( denoted symbolically as l.i.m_{n,m→∞} [X(n) − X(m)] = 0 ) .        (11-132)
Since the converse is true (see paragraph before the theorem statement), (11-131) and (11-132)
are equivalent for vector space L2. In (11-132), one must remember that the double limit is zero
regardless of how n and m approach infinity.
The value of Theorem 11-6 is this: we do not have to know/find the m.s. limit of a sequence to know that the sequence is m.s. convergent. To show that L2 sequence X(n) converges to some m.s. limit X_0, we need not know/find X_0. Instead, to show convergence, it is
sufficient to show that X(n) has elements that come arbitrarily close to one another as you “go
out” in the sequence. In some cases, establishing (11-132) is much easier than finding X0
described by (11-131).
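For readers who like to experiment, the Cauchy criterion (11-132) translates directly into a numerical check. The following Python sketch is illustrative only; the particular sequence X(n) = Σ_{k=1}^{n} Z_k/2^k (with Z_k i.i.d. standard normal) is a hypothetical choice, made so that the m.s. limit never has to be exhibited.

```python
# A minimal Monte Carlo sketch of the mean-square Cauchy check in (11-132).
# The sequence X(n) = sum_{k=1}^{n} Z_k / 2^k (Z_k i.i.d. standard normal) is
# hypothetical -- chosen only because its m.s. limit need not be exhibited.
import numpy as np

rng = np.random.default_rng(0)
trials, n_max = 20000, 60
Z = rng.standard_normal((trials, n_max))
weights = 0.5 ** np.arange(1, n_max + 1)          # 1/2^k
X = np.cumsum(Z * weights, axis=1)                # X(n) for n = 1..n_max, per trial

def ms_distance(n, m):
    """Monte Carlo estimate of E[(X(n) - X(m))^2]."""
    return np.mean((X[:, n - 1] - X[:, m - 1]) ** 2)

for n, m in [(5, 10), (20, 40), (40, 60)]:
    print(n, m, ms_distance(n, m))                # estimates shrink toward zero
```

The printed estimates of E[(X(n) - X(m))²] shrink as n and m grow, which is exactly the behavior (11-132) asks for.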
With the introduction of Theorem 11-6, we have established L2 as a complete vector space with the norm (11-114) that is induced by the inner product (11-111). In the literature, such vector spaces are referred to as Hilbert spaces. They are the natural setting for many significant problems in Fourier series, communication theory, optimal filtering, etc.
Mean-square convergence has a number of useful properties. We discuss the ability to interchange l.i.m. and expectation. Also, we show that a mean-square limit is unique (with equality in the mean-square sense). To develop these results, we must mention some (almost) obvious facts. Note that

$$\operatorname*{l.i.m.}_{n\to\infty} X(n)$$   (11-133)

is a random variable, but

$$\lim_{n\to\infty} E[X(n)]$$   (11-134)

is an "ordinary" limit of an "ordinary" sequence. Also, for any random variable X in L2, we have

$$|E[X]| \le E[|X|] = E[\,|X|\cdot 1\,] \le \bigl(E[X^2]\bigr)^{1/2}\bigl(E[1^2]\bigr)^{1/2} = \|X\|.$$   (11-135)

The first inequality results from the fact that the absolute value of an integral is less than, or equal to, the integral of the absolute value. The second inequality comes from the Cauchy-Schwarz inequality (11-102) with Y = 1. Now, we show that we can interchange expectation and l.i.m.
Theorem 11-7: Let X(n) be a sequence in L2. Suppose X(n) has a m.s. limit X_0 ∈ L2; that is,

$$X(n) \xrightarrow{\;\mathrm{m.s.}\;} X_0 \quad\Longleftrightarrow\quad \lim_{n\to\infty} \|X(n) - X_0\| = 0.$$   (11-136)

Then it follows that

$$E[X_0] = E\Bigl[\operatorname*{l.i.m.}_{n\to\infty} X(n)\Bigr] = \lim_{n\to\infty} E[X(n)].$$   (11-137)

That is, expectation and l.i.m. are interchangeable.

Proof: Since L2 is complete, the mean-square limit X_0 is in L2 (X_0 has a finite second moment), so E[X_0] exists (i.e., the mean is finite). Now, from (11-135), we have

$$\bigl|E[X(n)] - E[X_0]\bigr| = \bigl|E[X(n) - X_0]\bigr| \le E\bigl[\,|X(n) - X_0|\,\bigr] \le \|X(n) - X_0\|.$$   (11-138)

However, from (11-136) we know that the norm on the right-hand side of (11-138) goes to zero as n approaches infinity. Hence, we have the desired result (11-137).
An important use of Theorem 11-7 deals with interchanging expectations and summations. For k = 1, 2, … , let X_k ∈ L2 be a sequence of random variables with finite second moments. Define the nth partial sum

$$Y_n = \sum_{k=1}^{n} X_k.$$   (11-139)

Suppose that

$$Y = \operatorname*{l.i.m.}_{n\to\infty} Y_n = \operatorname*{l.i.m.}_{n\to\infty} \sum_{k=1}^{n} X_k.$$   (11-140)

We say that the partial sum (11-139) converges in mean square to Y. By Theorem 11-7, we can write

$$E[Y] = E\Bigl[\operatorname*{l.i.m.}_{n\to\infty} Y_n\Bigr] = E\Bigl[\operatorname*{l.i.m.}_{n\to\infty} \sum_{k=1}^{n} X_k\Bigr] = \lim_{n\to\infty} \sum_{k=1}^{n} E[X_k] = \sum_{k=1}^{\infty} E[X_k].$$   (11-141)
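As a quick illustration of (11-141), the following sketch compares a Monte Carlo estimate of E[Y_n] with the sum of the individual means. The specific choice X_k = Z_k/2^k with E[Z_k] = 1 is a hypothetical one, used only because Σ_k E[X_k] = Σ_k 2^{-k} = 1 is easy to verify.

```python
# A minimal sketch of (11-141): for a m.s. convergent partial sum, E[Y] equals
# the sum of the individual means.  The choice X_k = Z_k / 2^k with Z_k i.i.d.,
# E[Z_k] = 1, is hypothetical; then sum_k E[X_k] = sum_k 2^{-k} = 1.
import numpy as np

rng = np.random.default_rng(1)
trials, n = 50000, 40
Z = rng.exponential(scale=1.0, size=(trials, n))   # E[Z_k] = 1
X = Z * (0.5 ** np.arange(1, n + 1))               # X_k = Z_k / 2^k
Y_n = X.sum(axis=1)                                # partial sum Y_n, per trial

print(np.mean(Y_n))                                # Monte Carlo estimate of E[Y_n], close to 1
print(sum(0.5 ** k for k in range(1, n + 1)))      # sum of the means, also close to 1
```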
Theorem 11-8: The mean-square limit of a sequence is unique. That is, if

$$X_0 = \operatorname*{l.i.m.}_{n\to\infty} X(n) \;\Longleftrightarrow\; \lim_{n\to\infty}\|X_0 - X(n)\| = 0, \qquad Y_0 = \operatorname*{l.i.m.}_{m\to\infty} X(m) \;\Longleftrightarrow\; \lim_{m\to\infty}\|Y_0 - X(m)\| = 0,$$   (11-142)

then ||X_0 - Y_0|| = 0 and P[X_0 = Y_0] = 1.

Proof: Observe that

$$\|X_0 - Y_0\| = \bigl\|\{X_0 - X(n)\} + \{X(n) - Y_0\}\bigr\| \le \|X_0 - X(n)\| + \|X(n) - Y_0\|$$   (11-143)

from the triangle inequality. Now, on the right-hand side of (11-143), both norms go to zero as a consequence of (11-142). Hence, we have ||X_0 - Y_0|| = 0 as claimed. The fact that P[X_0 = Y_0] = 1 follows immediately from (11-110).
Example 11-11: We are trying to sample a DC voltage (for example, the output of a strain gauge, water tank level detector, etc.). However, our samples contain additive noise; the kth sample is Y(k) = m_dc + ε(k), where m_dc is the DC voltage we are trying to measure, and ε(k) is a real-valued sample of stationary, zero-mean noise with variance σ². We assume that ε(k) is uncorrelated from sample to sample (any two different-indexed samples are uncorrelated). We try the "time-honored" technique of averaging out the noise. That is, we form the average

$$X(n) = \frac{1}{n}\sum_{k=1}^{n} Y(k).$$   (11-144)
Note that X(n) has m_dc as its mean and σ²/n as its variance (indeed, with increasing n, we are "averaging out" the noise). However, the question remains: as n → ∞, does the random sequence X(n) ∈ L2 converge in mean square to a random variable? Let's see if the sequence is mean-square Cauchy; consider

$$\begin{aligned}
\|X(m) - X(n)\|^2 &= E\bigl[\,[\{X(m) - m_{dc}\} - \{X(n) - m_{dc}\}]^2\,\bigr] \\
&= E\bigl[\{X(m) - m_{dc}\}^2\bigr] - 2E\bigl[\{X(m) - m_{dc}\}\{X(n) - m_{dc}\}\bigr] + E\bigl[\{X(n) - m_{dc}\}^2\bigr] \\
&= \frac{\sigma^2}{m} + \frac{\sigma^2}{n} - 2E\bigl[\{X(m) - m_{dc}\}\{X(n) - m_{dc}\}\bigr].
\end{aligned}$$   (11-145)

Consider the case n > m and use the fact that the noise is uncorrelated from sample to sample to evaluate the middle term. Writing X(m) - m_dc = (1/m)Σ_{k=1}^{m} ε(k) and X(n) - m_dc = (1/n)Σ_{j=1}^{n} ε(j), only the m "matching" index pairs k = j contribute, so

$$E\bigl[\{X(m) - m_{dc}\}\{X(n) - m_{dc}\}\bigr] = \frac{1}{mn}\sum_{k=1}^{m}\sum_{j=1}^{n} E[\varepsilon(k)\varepsilon(j)] = \frac{m\sigma^2}{mn} = \frac{\sigma^2}{n}, \qquad n > m.$$   (11-146)

Similarly, E[{X(m) - m_dc}{X(n) - m_dc}] = σ²/m for the case m > n; in either case, the cross term equals σ²/max{n, m}. Therefore, we can write (11-145) as

$$\|X(m) - X(n)\|^2 = \sigma^2\left[\frac{1}{m} + \frac{1}{n} - \frac{2}{\max\{n, m\}}\right] = \sigma^2\left|\frac{1}{m} - \frac{1}{n}\right|.$$   (11-147)
As m and n approach infinity (in any order), (11-147) approaches zero, so the sequence is mean-square Cauchy. By Theorem 11-6, the sequence is mean-square convergent. But what is its limit? The obvious "candidate" is m_dc. To see that this is the limit, consider

$$\lim_{n\to\infty} \|X(n) - m_{dc}\| = \lim_{n\to\infty} \left\|\frac{1}{n}\sum_{k=1}^{n} \varepsilon(k)\right\| = \lim_{n\to\infty} \frac{\sigma}{\sqrt{n}} = 0.$$   (11-148)

So, we see that X(n) converges in mean square to m_dc (and we can expect to get "better" results the more samples are included in the average).
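A short simulation of this example is easy to set up. The numbers below (m_dc = 5 volts, σ = 2, Gaussian noise) are hypothetical choices; the example itself requires only zero-mean, uncorrelated noise with finite variance.

```python
# A minimal simulation sketch of Example 11-11, assuming (hypothetically)
# m_dc = 5.0 volts and Gaussian noise with sigma = 2.0.
import numpy as np

rng = np.random.default_rng(2)
m_dc, sigma = 5.0, 2.0
trials, n_max = 5000, 1000
Y = m_dc + rng.normal(0.0, sigma, size=(trials, n_max))        # noisy samples Y(k)
X = np.cumsum(Y, axis=1) / np.arange(1, n_max + 1)              # running average X(n), per trial

for n in (10, 100, 1000):
    ms_error = np.mean((X[:, n - 1] - m_dc) ** 2)               # estimate of ||X(n) - m_dc||^2
    print(n, ms_error, sigma**2 / n)                            # compare with sigma^2 / n
```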
With Example 11-11, we have established a mean-square Law of Large Numbers for sequences of uncorrelated random variables. More generally, let Y_k, k = 1, 2, … , be a sequence of uncorrelated random variables with common mean E[Y_k] = m and common variance VAR[Y_k] = σ². Then the sample mean

$$X(n) = \frac{1}{n}\sum_{k=1}^{n} Y(k)$$   (11-149)

converges in mean square to m.

In a subsequent section, we will show that (11-149) converges to m in probability, a yet-to-be-defined mode of convergence that is weaker than mean-square convergence. That the sample mean (11-149) converges in probability to m is just the well-known Law of Large Numbers (weak version) that is cited often in the popular press.
Example 11-12: Let X(k), k ≥ 1, be a sequence of independent random variables, each of which is either 1 or 0. Furthermore, suppose that

$$P[X(k) = 1] = 1/k, \qquad P[X(k) = 0] = 1 - 1/k.$$   (11-150)

As k → ∞, does X(k) converge in mean square? Let's check the obvious candidate X_0 = 0; consider

$$\lim_{n\to\infty} \|X(n) - 0\| = \lim_{n\to\infty} \frac{1}{\sqrt{n}} = 0.$$   (11-151)

So, we see that X(n) converges in mean square to the random variable X_0 = 0. However, in Example 11-16, we will see that X(n) does not converge (to zero) in a point-wise manner.
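Here E[X(k)²] = 1·(1/k) + 0·(1 − 1/k) = 1/k, which is where the 1/√k in (11-151) comes from. The sketch below simply checks this second moment by simulation; the trial count is arbitrary.

```python
# A minimal sketch for Example 11-12: E[(X(k) - 0)^2] = 1/k, so the m.s. norm
# ||X(k) - 0|| = 1/sqrt(k) -> 0.  The Monte Carlo below checks the second moment.
import numpy as np

rng = np.random.default_rng(3)
trials = 200000
for k in (10, 100, 1000):
    X_k = (rng.random(trials) < 1.0 / k).astype(float)   # 1 with probability 1/k, else 0
    print(k, np.mean(X_k ** 2), 1.0 / k)                 # estimate versus the exact value 1/k
```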
Example 11-13: Let X(k), k ≥ 1, be a sequence of independent random variables similar to the previous example. However, suppose that X(k) is either k or 0 with

$$P[X(k) = k] = 1/k^2, \qquad P[X(k) = 0] = 1 - 1/k^2.$$   (11-152)

So, as k becomes large, X(k) takes on a larger value with a smaller probability. Is X(k) mean-square convergent? To find out, note that E[X(k)²] = k²(1/k²) = 1 and consider, for m ≠ n,

$$\|X(m) - X(n)\|^2 = E\bigl[X(m)^2 - 2X(m)X(n) + X(n)^2\bigr] = 1 - \frac{2}{mn} + 1 = 2\Bigl[1 - \frac{1}{nm}\Bigr],$$   (11-153)

a result that converges to 2 as m and n approach infinity. Hence, X(n) is not mean-square Cauchy; hence, it is not mean-square convergent. The last two examples illustrate the fact that mean-square convergence depends on both the numerical values a sequence takes on and the probabilities of taking on those values.
Theorem 11-7 tells us that expectation and l.i.m. are interchangeable for m.s. convergent sequences. A similar result holds for the inner product operation defined by (11-111).

Theorem 11-9 (Continuity of the Inner Product): Let X(n) and Y(m) be m.s. convergent sequences with m.s. limits X_0 and Y_0, respectively, so that

$$X_0 = \operatorname*{l.i.m.}_{n\to\infty} X(n) \;\Longleftrightarrow\; \lim_{n\to\infty}\|X_0 - X(n)\| = 0, \qquad Y_0 = \operatorname*{l.i.m.}_{m\to\infty} Y(m) \;\Longleftrightarrow\; \lim_{m\to\infty}\|Y_0 - Y(m)\| = 0.$$   (11-154)

Under these conditions, we claim that

$$\langle X_0, Y_0\rangle = \Bigl\langle \operatorname*{l.i.m.}_{n\to\infty} X(n),\; \operatorname*{l.i.m.}_{m\to\infty} Y(m) \Bigr\rangle = \lim_{n,m\to\infty} \langle X(n), Y(m)\rangle.$$   (11-155)
Proof: First, consider the simple algebra

$$\begin{aligned}
\bigl|\langle X(n), Y(m)\rangle - \langle X_0, Y_0\rangle\bigr|
&= \bigl|\langle X(n), Y(m)\rangle - \langle X(n), Y_0\rangle + \langle X(n), Y_0\rangle - \langle X_0, Y_0\rangle\bigr| \\
&= \bigl|\langle X(n), Y(m) - Y_0\rangle + \langle X(n) - X_0, Y_0\rangle\bigr| \\
&\le \bigl|\langle X(n), Y(m) - Y_0\rangle\bigr| + \bigl|\langle X(n) - X_0, Y_0\rangle\bigr| \\
&\le \|X(n)\|\,\|Y(m) - Y_0\| + \|X(n) - X_0\|\,\|Y_0\|.
\end{aligned}$$   (11-156)

Now, since X(n) → X_0 in mean square as n → ∞, the sequence of norms ||X(n)|| is bounded (can you show this?), say ||X(n)|| < M. Use this fact, (11-154) and (11-156) to conclude

$$\lim_{n,m\to\infty} \bigl|\langle X(n), Y(m)\rangle - \langle X_0, Y_0\rangle\bigr| \le \lim_{n,m\to\infty} \bigl[\, M\,\|Y(m) - Y_0\| + \|X(n) - X_0\|\,\|Y_0\| \,\bigr] = 0,$$   (11-157)

a result that proves (11-155) and the continuity of the inner product.
Theorem 11-9 establishes continuity of the inner product ⟨X, Y⟩ ≡ E[XY]. What we mean by this is simple. Suppose we are given sequences X(n) and Y(m) with m.s. limits X_0 and Y_0, respectively, as described by (11-154). For "large" n and m, X(n) and Y(m) "get close" to X_0 and Y_0, respectively, and ⟨X(n), Y(m)⟩ ≡ E[X(n)Y(m)] "gets close" to ⟨X_0, Y_0⟩ ≡ E[X_0 Y_0]. This intuitive idea is known as continuity of the inner product.
Convergence in Probability (i.p. Convergence)

Some results that involve mean-square convergence of random sequences can be generalized to a "weaker" convergence mode. This new mode is called convergence in probability. It is "weaker" (i.e., more general) than m.s. convergence; m.s. convergent sequences also converge in probability, but the converse is not true.

As n → ∞, a random sequence X(n) converges in probability (i.p.) to a random variable X_0 if, for every ε > 0, we have

$$\lim_{n\to\infty} P\bigl[\,|X(n) - X_0| > \varepsilon\,\bigr] = 0.$$   (11-158)

Often, this type of convergence is denoted by either of

$$X(n) \xrightarrow{\;\mathrm{i.p.}\;} X_0$$   (11-159)

$$\operatorname*{l.i.p.}_{n\to\infty} X(n) = X_0.$$   (11-160)
For convergence in probability, many of the results parallel those given above for m.s. convergence. First, as we "go out" in a sequence (i.e., as the index becomes large), it may become increasingly likely that the terms are close together (this does not mean that the terms must be closer together in the m.s. sense). We say that a random sequence X(n) is Cauchy in probability if, for every ε > 0, we have

$$\lim_{n,m\to\infty} P\bigl[\,|X(m) - X(n)| > \varepsilon\,\bigr] = 0.$$   (11-161)

Cauchy in probability is a "weaker" condition than Cauchy in the mean-square sense: a sequence that is mean-square Cauchy is also Cauchy in probability, but the converse is not true. That is, condition (11-130) implies condition (11-161), but not conversely. Next, we provide a theorem that does for convergence in probability what Theorem 11-6 did for convergence in mean square.
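Definition (11-158) also suggests a direct numerical test: simulate many realizations of X(n) and count how often |X(n) - X_0| exceeds ε. The sketch below is generic; the particular sampler shown (the Example 11-12 sequence) is only a placeholder.

```python
# A minimal sketch of estimating the defining probability in (11-158) by Monte
# Carlo.  The sampler below (the Example 11-12 sequence, equal to 1 with
# probability 1/n and 0 otherwise) is a placeholder; any simulable sequence works.
import numpy as np

rng = np.random.default_rng(7)

def sample_X(n, trials):
    """Draw `trials` independent realizations of X(n) for the placeholder sequence."""
    return (rng.random(trials) < 1.0 / n).astype(float)

def prob_far_from(x0, n, eps, trials=200000):
    """Monte Carlo estimate of P[|X(n) - x0| > eps]."""
    return np.mean(np.abs(sample_X(n, trials) - x0) > eps)

for n in (10, 100, 1000):
    print(n, prob_far_from(0.0, n, eps=0.5))   # about 1/n, consistent with i.p. convergence to 0
```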
Theorem 11-10: As n → ∞, a sequence X(n) converges in probability to a random variable X_0 if, and only if, the sequence is Cauchy in probability.

Proof: First, we show that if X(n) converges in probability to X_0 then it is Cauchy in probability. Suppose that the sequence converges in probability. Then note the event (i.e., set) relationship

$$\bigl\{\,|X(m) - X(n)| > \varepsilon\,\bigr\} \subset \bigl\{\,|X(m) - X_0| > \varepsilon/2\,\bigr\} \cup \bigl\{\,|X(n) - X_0| > \varepsilon/2\,\bigr\},$$   (11-162)

as depicted by Figure 11-7. From (11-162), we see that

$$P\bigl[\,|X(m) - X(n)| > \varepsilon\,\bigr] \le P\bigl[\,|X(m) - X_0| > \varepsilon/2\,\bigr] + P\bigl[\,|X(n) - X_0| > \varepsilon/2\,\bigr].$$   (11-163)

[Figure 11-7: If |X(n) - X(m)| > ε then either |X(n) - X_0| > ε/2 or |X(m) - X_0| > ε/2.]

Now, since X(n) converges to X_0 in probability, both terms on the right-hand side of (11-163) approach zero as n and m approach infinity. Hence, the sequence is Cauchy in probability as claimed. The converse (if X(n) is Cauchy in probability then it converges in probability) is harder to prove and is not given here (see M. Loève, Probability Theory I, 4th Edition, pp. 117-118).
Theorem 11-11: If a sequence converges in probability, then the limit is unique. That is, suppose X(n) converges in probability to both X_0 and Y_0. Then it necessarily follows that P[X_0 ≠ Y_0] = 0.

Proof: Using the same reasoning that led to (11-163), we can write

$$\bigl\{\,|X_0 - Y_0| > \varepsilon\,\bigr\} \subset \bigl\{\,|X_0 - X(n)| > \varepsilon/2\,\bigr\} \cup \bigl\{\,|Y_0 - X(n)| > \varepsilon/2\,\bigr\}$$   (11-164)

$$P\bigl[\,|X_0 - Y_0| > \varepsilon\,\bigr] \le P\bigl[\,|X_0 - X(n)| > \varepsilon/2\,\bigr] + P\bigl[\,|Y_0 - X(n)| > \varepsilon/2\,\bigr].$$   (11-165)

However, both terms on the right-hand side of (11-165) approach zero as n approaches infinity. Hence, for every ε > 0 we have

$$P\bigl[\,|X_0 - Y_0| > \varepsilon\,\bigr] = 0,$$   (11-166)

so that

$$\lim_{\varepsilon\to 0^{+}} P\bigl[\,|X_0 - Y_0| > \varepsilon\,\bigr] = 0.$$   (11-167)

Continuity of the probability measure (see Appendix 11B) and (11-167) lead to the conclusion

$$P\bigl[\,|X_0 - Y_0| > 0\,\bigr] = 0,$$   (11-168)

and this establishes the claim that P[X_0 ≠ Y_0] = 0. ∎
As claimed previously, convergence in mean square implies convergence in probability. This claim is substantiated by the following theorem (which is a nice application of the Tchebycheff inequality).

Theorem 11-12: Convergence in mean square implies convergence in probability.

Proof: Let X(n) be a sequence that converges in mean square to the random variable X_0. For each n, apply the generalized Tchebycheff inequality (see Chapter 2 of these notes) to X(n) - X_0 and obtain

$$P\bigl[\,|X(n) - X_0| > \varepsilon\,\bigr] \le \frac{E\bigl[\,|X(n) - X_0|^2\,\bigr]}{\varepsilon^2} = \frac{\|X(n) - X_0\|^2}{\varepsilon^2}$$   (11-169)

for every ε > 0. However, we know that X(n) → X_0 in mean square, so that ||X(n) - X_0|| → 0 as n → ∞. Hence, with (11-169), we have

$$\lim_{n\to\infty} P\bigl[\,|X(n) - X_0| > \varepsilon\,\bigr] = 0,$$   (11-170)

so that X(n) → X_0 in probability, as claimed. ∎
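The Tchebycheff bound in (11-169) is easy to visualize numerically. The sketch below applies it to the sample-mean sequence of Example 11-11 with hypothetical values m_dc = 0, σ = 1, ε = 0.1; the estimated exceedance probability always sits below the bound σ²/(nε²).

```python
# A minimal numerical illustration of the Tchebycheff bound (11-169) for the
# sample-mean sequence of Example 11-11, assuming (hypothetically) m_dc = 0,
# sigma = 1, eps = 0.1 and Gaussian noise; for Gaussian noise the sample mean
# X(n) - m_dc is exactly N(0, sigma^2/n), so it can be sampled directly.
import numpy as np

rng = np.random.default_rng(4)
trials, eps, sigma = 200000, 0.1, 1.0
for n in (10, 100, 1000):
    X_n = rng.normal(0.0, sigma / np.sqrt(n), size=trials)   # realizations of X(n) - X_0
    p_hat = np.mean(np.abs(X_n) > eps)                       # estimate of P[|X(n) - X_0| > eps]
    bound = (sigma**2 / n) / eps**2                          # ||X(n) - X_0||^2 / eps^2
    print(n, p_hat, bound)                                   # the estimate never exceeds the bound
```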
Let's reconsider Examples 11-11 and 11-12, both of which provided sequences that converge in the mean-square sense. Now, we know that these sequences also converge in probability, as implied by Theorem 11-12. Actually, that the sequence in Example 11-11 converges in probability is just a statement of the Law of Large Numbers (weak version).

Theorem 11-13 (The Weak Law of Large Numbers): Let X(n) be a sequence of independent, identically distributed (i.i.d.) random variables with mean μ_X and variance σ_X². Then, the sample mean

$$\hat{\mu}_n = \frac{1}{n}\sum_{k=1}^{n} X(k)$$   (11-171)

converges in probability to the "real" mean μ_X as n approaches infinity.

Proof: The proof of this theorem follows from Example 11-11 and Theorem 11-12.

The Law of Large Numbers is the basis for estimating μ_X from measurements. In applications, it is common to take the sample mean (11-171) as an estimate of the "real" mean μ_X.
Example 11-14: In Example 11-13, we considered a sequence of independent random variables X(k), k ≥ 1, with

P[X(k) = k] = 1/k²
P[X(k) = 0] = 1 - 1/k².

We found that this sequence does not converge in the mean-square sense (a "sufficient number" of the sample-function sequences contain a "sufficient number" of instances where X(k) = k, so that m.s. convergence is not achieved). Now, we show that it does converge in probability to X_0 = 0. For every ε > 0 (and k > ε, so that the event {X(k) > ε} is just {X(k) = k}), we have

$$\lim_{k\to\infty} P\bigl[\,|X(k) - X_0| > \varepsilon\,\bigr] = \lim_{k\to\infty} P\bigl[\,X(k) > \varepsilon\,\bigr] = \lim_{k\to\infty} P\bigl[\,X(k) = k\,\bigr] = \lim_{k\to\infty} \frac{1}{k^2} = 0,$$   (11-172)

and we see that the sequence converges in probability to zero (in (11-172), only the probabilities that X(k) = k are involved; for k ≥ 1, the actual numerical values of X(k) do not enter into the computation).
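The contrast between the two modes can be seen numerically: the exceedance probability P[|X(k)| > ε] = 1/k² vanishes, while the second moment E[X(k)²] = k²(1/k²) = 1 does not. The sketch below checks both by simulation; the values of k and the trial count are arbitrary.

```python
# A minimal sketch for Example 11-14: P[|X(k)| > eps] = 1/k^2 -> 0, yet
# E[X(k)^2] = k^2 * (1/k^2) = 1 for every k, which is why the sequence converges
# in probability but not in mean square.
import numpy as np

rng = np.random.default_rng(5)
trials, eps = 1_000_000, 0.5
for k in (5, 20, 50):
    X_k = np.where(rng.random(trials) < 1.0 / k**2, float(k), 0.0)
    print(k, np.mean(np.abs(X_k) > eps), np.mean(X_k ** 2))   # about 1/k^2 and about 1
```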
The converse of Theorem 11-12 is not true (convergence in probability does not imply convergence in mean square), and Example 11-14 is a counterexample that establishes this fact. Basically, convergence in mean square depends upon both the numerical values of the sequence elements and the probabilities associated with those values. On the other hand, convergence in probability is concerned only with the probabilities.
Example 11-15: For convergence in probability, this example shows that one cannot interchange the limit and expectation operations. For n ≥ 1, consider the sequence X(n), where X(n) is either -1 or n. Also, suppose that

$$P[X(n) = n] = 1/n, \qquad P[X(n) = -1] = 1 - 1/n.$$   (11-173)

The sequence converges in probability to X_0 = -1 since

$$\lim_{k\to\infty} P\bigl[\,|X(k) - X_0| > \varepsilon\,\bigr] = \lim_{k\to\infty} P\bigl[\,|X(k) - (-1)| > \varepsilon\,\bigr] = \lim_{k\to\infty} P\bigl[\,X(k) = k\,\bigr] = \lim_{k\to\infty} \frac{1}{k} = 0.$$   (11-174)

Now, we look at mean values. Clearly, E[X_0] = -1, and

$$E[X(n)] = n\left(\frac{1}{n}\right) + (-1)\left(1 - \frac{1}{n}\right) = \frac{1}{n},$$   (11-175)

which has a zero limit as n approaches infinity. Hence, we have shown that

$$\lim_{n\to\infty} E[X(n)] = 0 \ne E\Bigl[\operatorname*{l.i.p.}_{n\to\infty} X(n)\Bigr] = E[X_0] = -1.$$   (11-176)
Example 11-15 serves as a counterexample showing that you cannot, in general, interchange the operations of limit in probability and expectation. That is, it is not generally true that

$$E\Bigl[\operatorname*{l.i.p.}_{n\to\infty} X(n)\Bigr] \quad\text{and}\quad \lim_{n\to\infty} E\bigl[X(n)\bigr]$$

produce the same value. (This differs from mean-square convergence; recall that Theorem 11-7 proved that expectation and l.i.m. are interchangeable.) So, while convergence in probability is very general (and weak), there are limitations on what you can do with it.
Convergence Almost Surely (a.s. Convergence)

The last form of convergence we will study is called almost-sure (a.s.) convergence. The random sequence X(n) converges almost surely to the random variable X_0 if the sequence of functions X(n;ζ) converges to X_0(ζ) for all ζ ∈ S except possibly on a set of probability zero (recall that S denotes the sample space). Almost-sure convergence requires that

$$P\Bigl[\lim_{n\to\infty} X(n) = X_0\Bigr] = P\Bigl[\,\bigl\{\zeta \in S : \lim_{n\to\infty} X(n;\zeta) = X_0(\zeta)\bigr\}\,\Bigr] = 1.$$   (11-177)

In other words, X(n) converges almost surely to random variable X_0 if there exists an event A, with P(A) = 1 (and P(Ā) = 0), for which X(n;ζ) → X_0(ζ) for all ζ ∈ A. Often we write

$$X(n) \xrightarrow{\;\mathrm{a.s.}\;} X_0.$$   (11-178)
Obviously, this type of convergence is "weaker" than pointwise (p.w.) convergence (p.w. convergence requires that X(n;ζ) → X_0(ζ) for all ζ ∈ S). However, as shown below, almost-sure (a.s.) convergence implies convergence in probability (i.p.). And, it neither implies, nor is implied by, convergence in mean square (m.s.). In the literature, a.s. convergence also goes by the names convergence almost everywhere and convergence with probability one (other names are used as well).
Like convergence in mean square and in probability, in the context of almost-sure convergence it is possible to examine the separation, or distance, between sequence elements as we "go farther out" in a sequence. We say that X(n) is an almost-surely Cauchy sequence if

$$P\Bigl[\lim_{n,m\to\infty} |X(n) - X(m)| = 0\Bigr] = P\Bigl[\,\bigl\{\zeta \in S : \lim_{n,m\to\infty} |X(n;\zeta) - X(m;\zeta)| = 0\bigr\}\,\Bigr] = 1.$$   (11-179)

In other words, there exists an event A, P(A) = 1, for which

$$\lim_{n,m\to\infty} |X(n;\zeta) - X(m;\zeta)| = 0$$   (11-180)

for all ζ ∈ A. To establish that X(n) is an almost-surely Cauchy sequence, we do not require knowledge of a sequence limit.
With regard to necessary and sufficient conditions involving the Cauchy criterion, almost-sure convergence parallels m.s. and i.p. convergence. To show almost-sure convergence of a sequence, it is not necessary to produce a limit (in the almost-sure sense) for the sequence. Instead, as shown by the following theorem, we can use the Cauchy criterion.

Theorem 11-14: A sequence X(n) is almost-surely convergent if, and only if, it is an almost-surely Cauchy sequence.

Proof: This theorem follows from the fact that, in the real number system, sequences of real numbers converge if, and only if, they are Cauchy sequences.
A practical and useful test for almost-sure convergence is given by the following theorem.

Theorem 11-15: Let X(n) denote a sequence of random variables. Suppose that X(n) converges to random variable X_0 almost surely; that is, we suppose that

$$X(n) \xrightarrow{\;\mathrm{a.s.}\;} X_0.$$   (11-181)

Then, for every ε > 0 we have

$$\lim_{m\to\infty} P\bigl[\,|X(n) - X_0| \le \varepsilon \text{ for all } n \ge m\,\bigr] = \lim_{m\to\infty} P\Bigl[\,\bigcap_{n=m}^{\infty} \bigl\{|X(n) - X_0| \le \varepsilon\bigr\}\,\Bigr] = 1,$$   (11-182)

which we write as

$$\lim_{m\to\infty} P\bigl[A_m\bigr] = 1,$$   (11-183)

where A_m is defined as

$$A_m \equiv \bigl\{\zeta \in S : |X(n;\zeta) - X_0(\zeta)| \le \varepsilon \text{ for all } n \ge m\bigr\} = \bigcap_{n \ge m} \bigl\{\zeta \in S : |X(n;\zeta) - X_0(\zeta)| \le \varepsilon\bigr\},$$   (11-184)

an event that depends on m and ε. The converse is true as well; hence, (11-182) and (11-181) are equivalent (i.e., one implies the other).
Note: The sequence A_m, m ≥ 0, is nested increasing with m; that is, A_m ⊂ A_{m+1} for all m and all ε > 0. Also, the complement of (11-184) is (DeMorgan's laws come in handy here)

$$\overline{A}_m = \bigl\{\zeta \in S : |X(n;\zeta) - X_0(\zeta)| > \varepsilon \text{ for some } n \ge m\bigr\} = \bigcup_{n \ge m} \bigl\{\zeta \in S : |X(n;\zeta) - X_0(\zeta)| > \varepsilon\bigr\}.$$   (11-185)

So, Theorem 11-15 is sometimes stated as: X(n) → X_0 almost surely if, and only if, for all ε > 0 we have

$$\lim_{m\to\infty} P\bigl[\,|X(n) - X_0| > \varepsilon \text{ for some } n \ge m\,\bigr] = \lim_{m\to\infty} P\bigl[\overline{A}_m\bigr] = 0.$$   (11-186)
Proof: First, suppose that X(n) → X_0 almost surely. Then, there exists an event Ω_1 for which

$$P[\Omega_1] = 1, \qquad P[\{S - \Omega_1\}] = P[\overline{\Omega}_1] = 0, \qquad \lim_{n\to\infty} X(n;\zeta) = X_0(\zeta) \text{ for each } \zeta \in \Omega_1.$$   (11-187)

Now, we show that Ω_1 ⊂ ∪_{k=1}^{∞} A_k. Take any ζ_0 ∈ Ω_1. As shown by (11-187), X(n;ζ_0) converges in an "ordinary" sense to X_0(ζ_0); this means that, given any ε > 0, there exists an integer m(ζ_0,ε) (integer m depends on ζ_0 and ε) with the property

$$|X(n;\zeta_0) - X_0(\zeta_0)| \le \varepsilon$$   (11-188)

for n ≥ m(ζ_0,ε). Hence, we see that ζ_0 ∈ A_k for all k ≥ m(ζ_0,ε); that is, we can write

$$\zeta_0 \in \Omega_1 \;\Rightarrow\; \zeta_0 \in A_k = \bigcap_{n=k}^{\infty} \bigl\{\zeta \in S : |X(n;\zeta) - X_0(\zeta)| \le \varepsilon\bigr\}, \qquad k \ge m(\zeta_0,\varepsilon).$$   (11-189)

Since the A_k are nested increasing, we have

$$\Omega_1 \subset \bigcup_{k=1}^{\infty} A_k.$$   (11-190)

Since P(Ω_1) = 1, Equation (11-190) yields

$$P\Bigl[\,\bigcup_{k=1}^{\infty} A_k\,\Bigr] = 1.$$   (11-191)
This leads to the conclusion

$$1 = P\Bigl[\,\bigcup_{k=1}^{\infty} A_k\,\Bigr] = P\Bigl[\,\lim_{n\to\infty} \bigcup_{k=1}^{n} A_k\,\Bigr] = \lim_{n\to\infty} P\Bigl[\,\bigcup_{k=1}^{n} A_k\,\Bigr] = \lim_{n\to\infty} P(A_n)$$   (11-192)

(the last equality holds because the A_k are nested increasing, so that ∪_{k=1}^{n} A_k = A_n),
and we have proven that (11-181), which states X(n) → X_0 almost surely, implies (11-182), which states P[A_n] → 1 as n → ∞. Now, we show the converse; we show that (11-182) implies (11-181). We do this by showing that a false (11-181) implies a false (11-182) (this is the contrapositive of the statement "(11-182) implies (11-181)"). Hence, assume that (11-181) is false and show that P[A_m] does not approach unity as m → ∞ (i.e., (11-182) is false). If (11-181) is false, there exists an event Λ, P(Λ) > 0, such that X(n;ζ) does not converge to X_0(ζ) for ζ ∈ Λ (i.e., convergence does not occur for ζ ∈ Λ). Now, consider the random variable

$$Z(\zeta) \equiv \limsup_{n\to\infty} |X(n;\zeta) - X_0(\zeta)|, \qquad \zeta \in S.$$   (11-193)

The event {ζ ∈ S : Z(ζ) > 0} can be expressed as

$$\{\zeta \in S : Z(\zeta) > 0\} = \bigcup_{n \ge 1} \{\zeta \in S : Z(\zeta) > 1/n\}.$$   (11-194)

For each ζ_0 ∈ Λ, we have Z(ζ_0) > 0, so ζ_0 ∈ {ζ ∈ S : Z(ζ) > 0}; this fact implies that

$$\Lambda \subset \{\zeta \in S : Z(\zeta) > 0\}.$$   (11-195)

Now, P(Λ) > 0 implies P({ζ ∈ S : Z(ζ) > 0}) > 0 and the existence of some integer n_0 for which the event {ζ ∈ S : Z(ζ) > 1/n_0} has a strictly positive probability (to see this, equate the probabilities of both sides of (11-194) and use the continuity of P). That is, we have

$$P\bigl[\,\{\zeta \in S : Z(\zeta) > 1/n_0\}\,\bigr] > 0.$$   (11-196)

But this positive-probability event is contained in the complement of A_m, m ≥ 1, defined using ε = 1/n_0. This observation is written as

$$\{\zeta \in S : Z(\zeta) > 1/n_0\} \subset \overline{A}_m = \bigcup_{n=m}^{\infty} \{\zeta \in S : |X(n;\zeta) - X_0(\zeta)| > 1/n_0\},$$   (11-197)

for every integer m (apply DeMorgan's law to (11-184) to get this complement). Hence, for every integer m, we have

$$P(\overline{A}_m) \ge P\bigl(\,\{\zeta \in S : Z(\zeta) > 1/n_0\}\,\bigr) > 0,$$   (11-198)

so that P(Ā_m) is bounded away from zero, and P(A_m) is bounded away from unity, as m → ∞. Hence, Equation (11-183) (equivalently, Equation (11-182)) cannot be true; we have shown that a false (11-181) implies a false (11-182) (equivalently, we have shown that (11-182) implies (11-181)). ∎
Theorem 11-16: Almost-sure (a.s.) convergence implies convergence in probability (i.p.).

Proof: This is easy to show. Suppose that X(n) → X_0 almost surely (a.s.), so that P(Ā_m) → 0 as m → ∞ for any fixed (but arbitrary) ε > 0 used in the definition of A_m. Note that

$$\{\zeta \in S : |X(m;\zeta) - X_0(\zeta)| > \varepsilon\} \subset \overline{A}_m = \bigcup_{n=m}^{\infty} \{\zeta \in S : |X(n;\zeta) - X_0(\zeta)| > \varepsilon\}.$$   (11-199)

Hence, P(Ā_m) → 0 as m → ∞ implies that P[{|X(m) - X_0| > ε}] → 0 as m → ∞, and we have X(m) → X_0 in probability (i.p.).

Theorem 11-16 shows that a.s. convergence implies convergence in probability; however, the converse is not true, as shown by the next example.
Example 11-16: This example shows that convergence in mean square (m.s.) does not imply convergence almost surely (a.s.). Recall that Example 11-12 discussed a binary random sequence X(k) of independent random variables with

$$P[X(k) = 1] = 1/k, \qquad P[X(k) = 0] = 1 - 1/k.$$   (11-200)

In Example 11-12, we saw that X(k) converges in mean square (m.s.) to X_0 = 0 (hence, it also converges in probability (i.p.) to X_0 = 0). Now, we show that this sequence does not converge almost surely (a.s.). In terms of A_m given by (11-184), observe that, for 0 < ε < 1,

$$\begin{aligned}
\lim_{n\to\infty} P[A_n] &= \lim_{n\to\infty} P\Bigl[\,\bigcap_{m=n}^{\infty} \{|X(m) - X_0| \le \varepsilon\}\,\Bigr] = \lim_{n\to\infty} P\Bigl[\,\bigcap_{m=n}^{\infty} \{X(m) = 0\}\,\Bigr] \\
&= \lim_{n\to\infty} \Bigl(1 - \tfrac{1}{n}\Bigr)\Bigl(1 - \tfrac{1}{n+1}\Bigr)\cdots = \lim_{n\to\infty} \prod_{m=0}^{\infty}\Bigl(1 - \tfrac{1}{n+m}\Bigr) \\
&\le \lim_{n\to\infty} \exp\Bigl(-\sum_{m=0}^{\infty} \tfrac{1}{n+m}\Bigr) = 0,
\end{aligned}$$   (11-201)

where we used independence of the X(m), the bound 1 - x ≤ e^{-x}, and the divergence of the harmonic series. Since this limit is not unity, X(m) cannot converge almost surely to X_0 = 0 (study again Equation (11-182)). What we have provided here is a counterexample showing that mean-square (m.s.) convergence does not imply almost-sure (a.s.) convergence. Also, the example shows that convergence in probability (i.p.) does not imply convergence almost surely (a.s.). See Stark and Woods (3rd Edition), Example 6.7-3, p. 381 for a similar example.
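The product appearing in (11-201) can also be evaluated directly. For fixed n, Π_{m=n}^{N}(1 - 1/m) telescopes to (n - 1)/N, which tends to zero as N grows; the sketch below confirms this numerically (the cutoffs are arbitrary).

```python
# A minimal numerical check of (11-201): for fixed n, the probability that
# X(m) = 0 for every n <= m <= N is prod_{m=n}^{N} (1 - 1/m), which tends to 0
# as N grows because the harmonic series diverges.
import numpy as np

n = 100
for N in (10**3, 10**5, 10**6):
    m = np.arange(n, N + 1, dtype=float)
    prob_all_zero = np.exp(np.sum(np.log1p(-1.0 / m)))   # prod (1 - 1/m), computed in the log domain
    print(N, prob_all_zero)                              # equals (n - 1)/N exactly, by telescoping
```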
Example 11-17: This example shows that convergence almost surely (a.s.) does not imply convergence in mean square (m.s.). Recall that Example 11-13 presented a binary random sequence X(k) of independent random variables with

$$P[X(k) = k] = 1/k^2, \qquad P[X(k) = 0] = 1 - 1/k^2.$$   (11-202)

As shown by Example 11-13, this sequence is not mean-square (m.s.) convergent. We show that X(k) converges almost surely (a.s.) to X_0 = 0. In terms of Ā_m defined by (11-199), observe that (for ε > 0, using the union bound)

$$\lim_{n\to\infty} P[\overline{A}_n] = \lim_{n\to\infty} P\Bigl[\,\bigcup_{m=n}^{\infty} \{|X(m) - X_0| > \varepsilon\}\,\Bigr] = \lim_{n\to\infty} P\Bigl[\,\bigcup_{m=n}^{\infty} \{X(m) = m\}\,\Bigr] \le \lim_{n\to\infty} \sum_{m=n}^{\infty} \frac{1}{m^2} = 0.$$   (11-203)

Equivalently, in terms of A_n given by (11-184), this last result implies that

$$\lim_{n\to\infty} P[A_n] = \lim_{n\to\infty} P\Bigl[\,\bigcap_{m=n}^{\infty} \{|X(m) - X_0| \le \varepsilon\}\,\Bigr] = 1.$$   (11-204)

From Theorem 11-15 (see Equation (11-182)), we can conclude that X(n) converges almost surely (a.s.) to X_0 = 0. Together with Example 11-13, this example shows that convergence almost surely (a.s.) does not imply convergence in mean square (m.s.). Also, this example shows that convergence in probability (i.p.) does not imply convergence in mean square (m.s.).
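The tail sum in (11-203) is easy to evaluate numerically; for large n it behaves roughly like 1/n, so P[Ā_n] → 0. In the sketch below, the finite cutoff standing in for the infinite upper limit is arbitrary.

```python
# A minimal check of the bound in (11-203): sum_{m >= n} 1/m^2 is small for
# large n, so the probability that X(m) = m for some m >= n goes to zero.
import numpy as np

for n in (10, 100, 1000):
    m = np.arange(n, 10**6, dtype=float)     # finite cutoff stands in for infinity
    print(n, np.sum(1.0 / m**2))             # tail sum, roughly 1/n
```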
The next example is somewhat counterintuitive. It demonstrates that pointwise convergence does not, in general, imply convergence in mean square. Even though X(n;ζ) → X_0(ζ) for all ζ ∈ S (i.e., the random variable converges pointwise), the integral in the computation of E[|X(n;ζ) - X_0|²] may diverge, so that X(n) does not converge to X_0 in mean square.

Example 11-18: Consider the probability space (S, B, P), where S = [0, 1], B is the collection of Borel sets (B is the σ-algebra generated by the open intervals on S; see Chapter 1 of these class notes), and

$$P[B] = \int_{B} d\zeta, \qquad B \in \mathcal{B}$$   (11-205)

(if B is an interval, then P[B] is the interval length; P can be thought of as a "generalized length" of event B). For ζ ∈ S, define the random variable sequence

$$X(n;\zeta) = n\, I_{[\frac{1}{n},\frac{2}{n}]}(\zeta) = \begin{cases} n, & \tfrac{1}{n} \le \zeta \le \tfrac{2}{n} \\ 0, & \text{otherwise} \end{cases}$$   (11-206)

(note that I_B(ζ) is called the indicator function). On S, X(n) converges to zero in a pointwise manner; we say that X(n) → 0 pointwise (p.w.). Sometimes, we say that X(n) converges everywhere, or surely. However, the sequence X(n) does not converge to zero in the mean-square sense, since

$$\|X(n) - 0\|^2 = E\bigl[X(n)^2\bigr] = n^2\left(\frac{2}{n} - \frac{1}{n}\right) = n.$$   (11-207)
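A quick simulation makes the point vivid: every simulated sample path of X(n;ζ) is eventually zero, yet the estimated second moment grows like n. The trial count below is arbitrary.

```python
# A minimal sketch of Example 11-18: sample zeta uniformly on [0,1]; then
# X(n;zeta) = n on [1/n, 2/n] and 0 elsewhere.  Every sample path goes to zero,
# yet E[X(n)^2] = n grows without bound, so there is no mean-square convergence.
import numpy as np

rng = np.random.default_rng(6)
zeta = rng.random(2_000_000)
for n in (10, 100, 1000):
    X_n = np.where((zeta >= 1.0 / n) & (zeta <= 2.0 / n), float(n), 0.0)
    print(n, np.mean(X_n ** 2))      # Monte Carlo estimate of E[X(n)^2], close to n
```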
Venn Diagram Describing Convergence Modes

[Figure 11-8: Relationship between modes of convergence.]

Figure 11-8 shows a Venn diagram that depicts the interrelationships between i.p., m.s., a.s., and p.w. convergence. The diagram follows directly from the definitions, theorems and counterexamples given in this chapter. Mean-square convergence neither implies, nor is implied by, a.s. convergence; see Examples 11-16 and 11-17 for the relevant counterexamples. The fact that p.w. convergence does not imply m.s. convergence is established by Example 11-18. Theorem 11-12 (alternatively, Theorem 11-16) establishes that m.s. (alternatively, a.s.) convergence implies i.p. convergence.