All of Statistics, Chapter 5: Convergence of Random Variables
Nick Schafer

Overview
What are we studying? Probability.
What is probability? The mathematical language for quantifying uncertainty.
Why are we studying probability? Dan has his reasons…

Motivation
These are the kinds of questions that we hope to be able to answer.

Review
Definition (random variable): a random variable X is a mapping from the sample space to the real line.
Example: Flip a fair coin twice. The sample space is every combination of heads and tails, {HH, HT, TH, TT}. Choose the random variable X to be the number of heads. The outcome "one head and one tail" is then mapped to 1:

    X : {HH, HT, TH, TT} → ℝ,   X(HT) = 1

Note: the notation can be confusing. In the book, X usually denotes the map and X with a subscript denotes a real number. However, this is not always the case, so you must examine the context to be sure of the meaning.

Sequences of Random Variables
Much of probability theory is concerned with long sequences of random variables X₁, …, Xₙ. This study is sometimes known as large sample theory, limit theory, or asymptotic theory. What is a sequence of random variables? Simply an indexed set of random variables. We will be interested in sequences that have some interesting limiting behavior (i.e., we can say something about them as n gets large).

A Special Kind of Sequence of Random Variables
As it turns out, a very common and particularly useful class of sequences of random variables is IID.
Definition (IID): independent and identically distributed.
Independent: essentially, the value of one random variable doesn't affect the value of any other; for instance, a coin doesn't remember which side last landed up, so consecutive flips are said to be independent.
Identically distributed: each random variable X has associated with it a cumulative distribution function (CDF), which is derived from the probability measure P : A ↦ P(A) ∈ [0, 1].
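The coin-flip mapping above can be sketched in code (a hypothetical Python illustration, not from the book; the names sample_space and X are mine):

```python
import itertools

# Sample space for two flips of a fair coin: {HH, HT, TH, TT}.
sample_space = ["".join(flips) for flips in itertools.product("HT", repeat=2)]

def X(outcome):
    """The random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

print(sample_space)                      # ['HH', 'HT', 'TH', 'TT']
print({w: X(w) for w in sample_space})   # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
```

The point is only that X is an ordinary function from outcomes to real numbers; the "randomness" lives in the probability measure on the sample space, not in X itself.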
The CDF gives the probability that the random variable takes a value less than or equal to a given value:

    F_X(x) = P(X ≤ x)

When two or more random variables have the same CDF, we say they are identically distributed: X₁, …, Xₙ ∼ F.

Statements About Sequences of Random Variables
Given a sequence of random variables, most likely IID, it would be useful to be able to make statements of the form: "The average of all Xᵢ will be between two values with a certain probability," or "What is the probability that the average of the Xᵢ is less/greater than a certain value?" These kinds of statements can be made with the help of the Weak Law of Large Numbers (WLLN) and the Central Limit Theorem (CLT), respectively. So why not state them now? "Hold your horses" there, Makarand. The statements of the WLLN and the CLT make use of a few different types of convergence, which must be discussed first.
Example: Flip a fair coin n times. The average number of heads per toss will be between 0.4 and 0.6 with probability greater than or equal to 70% if we flip 84 times (n = 84).

Types of Convergence
There are two main types of convergence:
Convergence in probability (CIP): a sequence of random variables Xₙ is said to converge in probability to X, written Xₙ →ᴾ X, if the probability of it differing from X by more than any fixed ε > 0 goes to zero as n gets large:

    P(|Xₙ − X| > ε) → 0

Convergence in distribution (CID): a sequence of random variables is said to converge in distribution to X, written Xₙ ⇝ X, if the limit of the corresponding CDFs is the CDF of X:

    limₙ→∞ Fₙ(t) = F(t)

There is also another type, convergence in quadratic mean (Xₙ →ᵠᵐ X), which is used primarily because it is stronger than CIP or CID (it implies both) and it can be verified relatively easily:

    qm ⇒ CIP ⇒ CID

Weak Law of Large Numbers
If a sequence of random variables X₁, …, Xₙ is IID, then the sample average converges in probability to the expectation:

    X̄ₙ = (1/n) Σᵢ Xᵢ,   X̄ₙ →ᴾ E(X₁) = ∫ x f(x) dx
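A quick simulation makes the WLLN concrete (a sketch under my own assumptions, not part of the chapter): estimate P(|X̄ₙ − 0.5| > 0.1) for fair-coin flips at several values of n and watch it shrink toward zero.

```python
import random

random.seed(0)

def sample_average(n):
    """Average of n IID Bernoulli(1/2) coin flips (1 = heads)."""
    return sum(random.randint(0, 1) for _ in range(n)) / n

# Estimate P(|X_bar_n - 0.5| > 0.1) by repeating the experiment 1000 times.
for n in (10, 100, 10_000):
    misses = sum(abs(sample_average(n) - 0.5) > 0.1 for _ in range(1000))
    print(n, misses / 1000)  # the estimated miss probability shrinks as n grows
```

Nothing here proves convergence, of course; it just shows the "distribution of the sample average concentrating around the expectation" that the WLLN guarantees.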
On the left we have information about many trials, and on the right we have information about the relative likelihood of the different values a random variable can take on. In words, the WLLN says that the distribution of the sample average becomes more concentrated around the expectation as n gets large.

Example of Using the WLLN
Consider flipping a coin for which the probability of heads is p. Let Xᵢ denote the outcome of a single toss (either 0 or 1). Hence p = P(Xᵢ = 1) = E(Xᵢ). The first equality is a definition; the second is obtained by averaging over the distribution. For a fair coin (p = 1/2):

    E(Xᵢ) = Σᵢ xᵢ f(xᵢ) = 0·(1/2) + 1·(1/2) = 1/2

The fraction of heads after n tosses is equal to the sample average X̄ₙ = (1/n) Σᵢ Xᵢ. Note that the Xᵢ are IID, so the WLLN can be applied: the sample average converges to p = E(Xᵢ) in probability, X̄ₙ →ᴾ E(X₁). You may find yourself wondering: how many times must I flip this coin such that the sample average is between 0.4 and 0.6 with probability greater than or equal to 70%? The WLLN tells you that it is possible to find such an n. The inequalities that Justin presented from Chapter 4 can be used to show that n = 84 does the trick in this case, but I'll spare you the details.

The Central Limit Theorem
Given a sequence of random variables X₁, …, Xₙ with mean μ and variance σ², the CLT says that the sample average has a distribution which is approximately Normal, and gives the new mean and variance:

    X̄ₙ ≈ N(μ, σ²/n)

Notice that nothing at all need be assumed about the P, CDF, or PDF associated with X, which could have any distribution from which a mean and variance can be derived.

Example of Using the CLT
Suppose that the number of errors per computer program has a Poisson distribution with mean 5. We get 125 programs. Approximately what is the probability that the average number of errors per computer program is less than 5.5?
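The arithmetic behind this example can be carried out with the standard Normal CDF (a sketch: std_normal_cdf is my own helper; only the numbers 5, 5.5, and 125 come from the example):

```python
from math import erf, sqrt

mu, var, n = 5.0, 5.0, 125   # for a Poisson distribution, variance = mean

def std_normal_cdf(z):
    """CDF of the standard Normal, computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# CLT: X_bar_n is approximately N(mu, var/n), so standardize 5.5.
z = (5.5 - mu) / sqrt(var / n)
print(z)                             # 2.5
print(round(std_normal_cdf(z), 4))   # 0.9938
```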
In this case 125 is the sample size, which we hope is large enough to make a good approximation. The approximation we are making here is that the sample average has a Normal distribution. Taking the sample size, mean, and variance into account (for a Poisson distribution the variance equals the mean, so σ² = 5), the question asked is equivalent to the probability of a standard Normal variable being less than (5.5 − 5)/√(5/125) = 2.5, which turns out to be approximately 0.9938.

Topics in Chapter 5 Not Covered in This Presentation
All proofs
Slutsky's theorem and related theorems (the effect of adding sequences of random variables on their convergence behavior)
The multivariate central limit theorem (the CLT with IID random vectors instead of random variables)
The delta method (the effect of applying a smooth function to a sequence of random variables on its limiting behavior)
Interesting problems: 6 and 8

Bibliography
Chapters 1–5 of All of Statistics by Larry Wasserman