Continuous Distributions, Mainly the Normal Distribution
STA 281 Fall 2011

1 Continuous Random Variables

Discrete distributions place probability on specific numbers. A Bin(n,p) distribution, for example, has possible values 0, 1, …, n, and each of these numbers has positive probability of occurring. Some random phenomena, instead of taking values in a discrete set, can take values in a range. For example, a person's height can be anywhere between 0 and 10 feet. It MIGHT be exactly 6 feet, but odds are it's more like 6.003243556 feet. It is a mathematical fact that if you have an interval of numbers, you cannot place positive probability on each of them. However, we'd still like to talk about the probability that a randomly chosen person has a height between 72 and 75 inches. We need another way of expressing probability models for random variables that can take on any value in an interval of the real line. Such random variables are called continuous random variables.

Just like discrete random variables, continuous random variables have a set of possible values. However, this set is an interval of numbers (xmin, xmax) instead of a set of numbers like 0, 1, …, n. It is possible that xmin = -∞ and/or xmax = ∞. To compute probabilities for continuous random variables, we have a probability density function f(x) defined over the range of possible values. To find the probability of any interval (a,b), integrate the density over that range. Thus

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

This idea is similar to adding probabilities for discrete random variables. If X ~ Bin(n=100, p=0.3) and you want P(23 ≤ X ≤ 26), you compute it by taking P(X=23)+P(X=24)+P(X=25)+P(X=26). This amounts to

P(23 ≤ X ≤ 26) = ∑_{x=23}^{26} P(X = x)

For continuous random variables, integration replaces adding the values together.

Not just any f(x) can be a density. Since by integrating densities we are supposed to arrive at probabilities, densities must obey the conditions

1. ∫_{-∞}^{∞} f(x) dx = 1. Just like any probabilities, the probability of the sample space must be 1.
2. f(x) ≥ 0 for all x. If f(x) were negative anywhere, we could integrate in that region and get a negative probability. Since probabilities cannot be negative, densities can't be either.

One important fact to keep in mind when dealing with densities is that all individual points have 0 probability. Thus, if X is a continuous random variable, P(X=37) = 0. Points do not have probability, but intervals do.

Example. Let X be a random variable with density f(x) = 0.5x over the range 0 ≤ x ≤ 2. This is a valid density since it is nonnegative over its entire range and ∫_0^2 0.5x dx = 1. Suppose we wanted to find P(0.2 < X < 0.6). We just integrate the density over that range:

P(0.2 < X < 0.6) = ∫_{0.2}^{0.6} 0.5x dx = 0.25x² |_{0.2}^{0.6} = 0.09 − 0.01 = 0.08

Notice that, since points have zero probability, P(X=0.2) = 0 and P(X=0.6) = 0 (similarly, any probability involving a single point is zero). You CANNOT just read probabilities off the density; you have to perform the integration. Furthermore, since points don't have any probability, whether you use strict or non-strict inequality signs makes no difference. For example,

P(0.2 < X < 0.6) = P(0.2 ≤ X < 0.6) = P(0.2 < X ≤ 0.6) = P(0.2 ≤ X ≤ 0.6)

Be careful to keep the range in mind when computing probabilities. Continuing with the example, suppose you wanted P(-0.4 < X < -0.1). There is no need to perform any integration here, since the interval (-0.4, -0.1) is outside the range of X (given to be the interval between 0 and 2). Thus, P(-0.4 < X < -0.1) = 0. Also, P(-0.3 < X < 0.7) must be "edited" to P(0 < X < 0.7) before the probability is computed. The interval (-0.3, 0) contains no probability by assumption, so we don't need to integrate over that region. In general, look at the probability you are interested in, decide where you need to integrate based on the range of the random variable (don't integrate outside the range!), and only then perform the integration to find the probability.

2 Cumulative Distribution Functions

We are interested in interval probabilities such as P(a ≤ X ≤ b).
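Both calculations so far — the discrete sum for the binomial and the continuous integral for the example density — can be sketched numerically. The handout's own commands use R; the Python/scipy version below is an illustrative stand-in, not part of the course software:

```python
from scipy.stats import binom
from scipy.integrate import quad

# discrete case: P(23 <= X <= 26) for X ~ Bin(n=100, p=0.3) is a sum of point probabilities
prob_discrete = sum(binom.pmf(k, 100, 0.3) for k in range(23, 27))

# continuous case: for the example density f(x) = 0.5x on (0, 2),
# P(0.2 < X < 0.6) is the integral of the density over the interval
f = lambda x: 0.5 * x
prob_cont, _ = quad(f, 0.2, 0.6)    # 0.25*(0.6**2 - 0.2**2) = 0.08

# intervals outside the range carry no probability, so P(-0.3 < X < 0.7) = P(0 < X < 0.7)
prob_trim, _ = quad(f, 0.0, 0.7)    # 0.25*(0.7**2) = 0.1225
```

The same replacement of sums by integrals shows up in the code: `sum` over the binomial pmf for the discrete variable, `quad` over the density for the continuous one.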
Instead of doing the integration again and again for every a and b, one useful function is the cumulative distribution function, or cdf. Specifically, the cumulative distribution function F(x) is

F(x) = P(X ≤ x) = ∫_{-∞}^x f(t) dt, which works out to
F(x) = 0 for x < xmin,
F(x) = ∫_{xmin}^x f(t) dt for xmin ≤ x ≤ xmax,
F(x) = 1 for x > xmax.

Notice that we switched the variable of integration from x to t. The variable of integration is irrelevant; we could use any letter we want. However, the variable of integration cannot match the letter used for the boundary of integration. Thus, for the cdf, the boundary of integration involves x, and we use t here for the variable of integration. The cdf values outside of (xmin, xmax) follow simply because X cannot be outside that range: the probability of being below any value less than xmin is 0 by the definition of xmin, and the probability of being below any value greater than xmax is 1 because the entire range of X is below xmax.

To continue our example, reconsider the density f(x) = 0.5x for x between 0 and 2. The cdf in that range is

F(x) = ∫_0^x 0.5t dt = 0.25x²  for 0 ≤ x ≤ 2

The cdf allows us to compute many kinds of probabilities directly, without performing any integration (the integration has already been "performed"). In particular,

1. P(X ≤ x) = F(x) (this is by definition).
2. P(X > x) = 1 − P(X ≤ x) = 1 − F(x) (this is by the complement rule). Remember that for continuous random variables P(X > x) = P(X ≥ x), since points have no probability.
3. P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a). Again note that for continuous random variables points have no probability, so P(a < X ≤ b) = P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X < b).

3 Expected Values and Variances

Recall that, for discrete random variables, we had the formulas E[X] = ∑ x P(X=x), E[g(X)] = ∑ g(x) P(X=x), and V[X] = E[(X − E[X])²] = E[X²] − (E[X])². Similar formulas hold for continuous distributions, except that the sums are replaced with integrals, and the P(X=x) is replaced with the density f(x).
Specifically,

E[X] = ∫ x f(x) dx
E[g(X)] = ∫ g(x) f(x) dx
V[X] = E[(X − E[X])²] = E[X²] − (E[X])²

with all integrals taken over the range of X. In the example with f(x) = 0.5x for x in (0, 2), we find

E[X] = ∫_0^2 x (0.5x) dx = x³/6 |_0^2 = 4/3
E[X²] = ∫_0^2 x² (0.5x) dx = x⁴/8 |_0^2 = 2
V[X] = E[X²] − (E[X])² = 2 − (4/3)² = 2/9

As with discrete distributions, the square root of the variance is called the standard deviation.

4 Normal Distributions

4.1 Introduction

A normal distribution has two parameters, µ and σ². The parameter µ is called the mean, and determines the central location of the distribution. The parameter σ² is called the variance (σ is the standard deviation) and determines the spread of the distribution. If a random variable X has a normal distribution with mean µ and variance σ², then we write X ~ N(µ, σ²). One particular normal, called the standard normal, occurs when µ = 0 and σ² = 1. Usually a random variable with a standard normal distribution is written as Z.

The density function f(x) for a normal curve is rather messy. The range of a normal distribution is the entire real line (-∞, ∞) and the density is

f(x) = (1/(σ√(2π))) exp{ −(x − µ)²/(2σ²) }

Plotting this results in a "bell curve". The expectation and variance of the normal may be found using the integration formulas from Section 3. These integrals result in

E[X] = ∫ x (1/(σ√(2π))) exp{ −(x − µ)²/(2σ²) } dx = µ
V[X] = E[X²] − (E[X])² = σ²

You can find probabilities (in theory!) for the normal distribution using the density f(x) just like any other continuous density. For any numbers a < b,

P(a < X < b) = ∫_a^b f(x) dx

The difficulty is that the function f(x) cannot be integrated analytically. This is not to say it's just extremely hard. It's more than that. It just can't be done. The function f(x) does NOT have an antiderivative expressible in elementary functions. Finding antiderivatives is not the only way of computing integrals, however. One can also use numerical integration (such as the trapezoidal rule or Simpson's rule from your calculus classes).
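Those numerical methods work well here. As a sketch, composite Simpson's rule applied to the standard normal density recovers its cdf to high accuracy; the Python below is illustrative (the function names `phi` and `Phi` and the scipy cross-check are my own, not part of the handout):

```python
import math
from scipy.stats import norm

def phi(t):
    # standard normal density: exp(-t^2/2) / sqrt(2*pi)
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def Phi(z, lower=-8.0, n=2000):
    # composite Simpson's rule for the integral of phi from -infinity to z,
    # truncating the lower limit at -8, where the density is negligible
    h = (z - lower) / n
    total = phi(lower) + phi(z)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * phi(lower + i * h)
    return total * h / 3

approx = Phi(1.0)
exact = norm.cdf(1.0)   # the value a Z table reports, about 0.8413
```

With 2000 panels the Simpson approximation agrees with the library cdf to many decimal places, which is essentially how the tabulated values were produced in the first place.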
At some point, people did this for the standard normal, which has density

φ(z) = (1/√(2π)) e^{−z²/2}

The numerical integration was focused on finding the cumulative distribution function of the standard normal, which is

Φ(z) = ∫_{−∞}^z φ(t) dt

Recall from the previous sections that knowing the cumulative distribution function allows you to compute all kinds of interval probabilities, such as P(Z ≤ z), P(Z > z), and P(a < Z ≤ b). The numerical results for Φ(z) were typically reported in tables, one of which is available on the course website. Today, we more commonly use computer programs to find answers. We will focus on both approaches in this course.

Normal tables provide values of Φ(z) for z between about −3.5 and 3.5. For values of z less than −3.5, P(Z ≤ z) is essentially 0, and for values of z above 3.5, P(Z ≤ z) is essentially 1. If you are working problems where z is outside the range of the table, you may substitute 0 or 1 as appropriate for Φ(z). If z is not a multiple of 0.01, just use the closest z value in the table. For example, if you are trying to find P(Z ≤ 2.3125), just use P(Z ≤ 2.31). Computer programs, by contrast, allow you to put in any number and get an answer, and are thus easier to use and more accurate (surprise!). In R, the command is

pnorm(2.3125)

Of course, 2.3125 can be replaced with any number.

The remainder of this document describes a method for computing probabilities for arbitrary normal random variables (normal random variables without µ = 0 and σ² = 1) using the cdf of the standard normal, Φ(z).

4.2 Computing Normal Probabilities and Quantiles

4.2.1 Linear Transformations of Normal Distributions

Normal distributions are strange in that, despite having an ugly-looking density and being impossible to integrate analytically, virtually everything else you want to do with them works out nicely (albeit occasionally after a lot of work). One quite useful fact about normal distributions concerns linear transformations. Suppose X ~ N(µ, σ²) and Y = aX + b, so Y is a linear transformation of X.
We already know from a previous handout that E[Y] = aE[X] + b = aµ + b and V[Y] = a²V[X] = a²σ². These facts do not require the normality of X. However, when X is normally distributed, the linear transformation Y = aX + b also has a normal distribution, with the mean and variance given by those same formulas. Thus µY = aµX + b and σY² = a²σX². This result applies even more generally, since linear combinations of independent normal random variables also have normal distributions. In other words, if X ~ N(µX, σX²) and Y ~ N(µY, σY²) with X independent of Y, then W = aX + bY + c also has a normal distribution. Using the formulas for the mean and variance of a linear combination, we find E[W] = aµX + bµY + c and (because X and Y are independent) V[W] = a²σX² + b²σY². We summarize these results in the following theorem:

THEOREM (Linear Transformations and Combinations of Normal Random Variables)
a) If X ~ N(µX, σX²) and Y = aX + b, then Y ~ N(aµX + b, a²σX²).
b) If X ~ N(µX, σX²), Y ~ N(µY, σY²), and X and Y are independent, then W = aX + bY + c ~ N(aµX + bµY + c, a²σX² + b²σY²).
c) In general, if X1, …, Xn are all independent, Xi ~ N(µi, σi²), and W = a1X1 + … + anXn + b, then W ~ N(a1µ1 + … + anµn + b, a1²σ1² + … + an²σn²).

The most important use of this theorem is the Z transformation, which provides a link between standard normals and any other normal.

COROLLARY (Z Transformations)
a) If X ~ N(µ, σ²), then Z = (X − µ)/σ ~ N(0, 1).
b) If Z ~ N(0, 1), then X = σZ + µ ~ N(µ, σ²).

Z transformations, together with the standard normal cdf Φ(z), allow us to compute interval probabilities for any normal distribution.

4.2.2 Computing Normal Probabilities and Quantiles Using the Z Table

The building block for finding probabilities for any normal distribution is the standard normal cdf Φ(z). Suppose X ~ N(µ, σ²). To find probabilities involving X, we use the Z transformation and then find probabilities in terms of a standard normal.

1. P(X ≤ x). Performing some algebra, we find

P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = P(Z ≤ (x − µ)/σ) = Φ((x − µ)/σ)

Thus, to find P(X ≤ x), look up Φ((x − µ)/σ).
2. Recall P(X ≥ x) = 1 − P(X ≤ x).
3. P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a).

A quantile is a percentile expressed as a probability.
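Before turning to quantiles: rule 1 above and the linear-combination theorem can both be checked numerically. A Python/scipy sketch (the specific means, variances, and coefficients below are made up for illustration; the handout's own commands use R):

```python
import numpy as np
from scipy.stats import norm

# rule 1: P(X <= x) = Phi((x - mu)/sigma), here for X ~ N(50, 10^2) and x = 55
mu, sigma = 50, 10
direct = norm.cdf(55, loc=mu, scale=sigma)
via_z = norm.cdf((55 - mu) / sigma)          # Phi(0.5), same quantity

# theorem part (b), checked by simulation: with X ~ N(3, 6^2) and Y ~ N(-2, 3^2)
# independent, W = 2X + 3Y + 1 should be N(2*3 + 3*(-2) + 1, 4*36 + 9*9) = N(1, 225)
rng = np.random.default_rng(0)
w = 2 * rng.normal(3, 6, 200_000) + 3 * rng.normal(-2, 3, 200_000) + 1
```

The simulated mean and variance of `w` land close to the theorem's 1 and 225, and the two cdf computations agree exactly.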
The 80th percentile of a distribution, for example, is the point where 80% of the population falls beneath that value. This is equivalent to the 0.8 quantile. Similarly, the 30th percentile is the 0.3 quantile. Suppose Z ~ N(0, 1) and we want to find the point z such that P(Z ≤ z) = 0.4. The margins of a Z table contain values of z, while the interior of the table contains probabilities. Thus, if you want to find a value of z, you are looking for a value on the margin of the table. The given information, 0.4, is a probability, so you have to find 0.4 in the body of the table. You will find that the closest value to 0.4 in the table is 0.4013, corresponding to z = −0.25.

To find the quantile of an arbitrary normal X ~ N(µ, σ²), you first have to find the Z-score for the given quantile, and then use the formula X = σZ + µ to convert it back to the original X. For example, suppose X ~ N(µ=50, σ²=100) and we want to find the 0.7 quantile. Finding 0.7 in the body of the table, the closest value is 0.6985, which corresponds to a Z-score of 0.52. Converting this back to the original scale, we find X = (10)(0.52) + 50 = 55.2.

4.2.3 Computing Normal Probabilities and Quantiles Using R

Although the equations above involving standard normals will work, R provides easier ways of finding normal probabilities that avoid the Z transformation. In addition to pnorm(z) providing P(Z ≤ z) for a standard normal Z, you can also just specify the mean and standard deviation. R uses the standard deviation, not the variance, to measure spread for normal distributions. Thus, if X ~ N(5, 3), where 3 is the variance, you can find P(X ≤ 2) using the R command

pnorm(2, 5, sqrt(3))

The formulas in the previous section (P(X ≥ x) = 1 − P(X ≤ x) and P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a)) still apply, so you can find P(X > 6) using the command

1 - pnorm(6, 5, sqrt(3))

and you could find P(1 ≤ X ≤ 7) using the command

pnorm(7, 5, sqrt(3)) - pnorm(1, 5, sqrt(3))

Quantiles in R are also easy to compute. The qnorm command supplies quantiles.
Thus, to find the point x such that P(X ≤ x) = 0.6, just use

qnorm(0.6, 5, sqrt(3))

Obviously, using the computer is easier and more accurate than using the table, so it is the recommended method. Unfortunately, as computers are unavailable as standard equipment for exams, you still need to learn the Z-table method. In addition, should you decide to take more statistics, understanding the Z transformation is the foundation of some more advanced theorems.

4.3 Examples of Finding Probabilities with Normal Tables

Let X ~ N(15, 3²).
a) Find P(X < 12.5), P(X > 14.2), and P(X = 15).
b) Find P(3.5 < X < 16.4) and P(10.4 < X < 13.6).
c) Find c such that P(X < c) = 0.05 and d such that P(15 − d < X < 15 + d) = 0.80.

5 Problems

1) Let X be a continuous random variable with density f(x) = 1 − (x/2) over the range 0 to 2.
a) Find the cdf of X.
b) What are P(X=0.5), P(X<-1.5), P(X<0.7), P(X>0.2), and P(0.1<X<1.2)?
c) What are E[X] and V[X]?

2) Let X ~ N(µ=5, σ²=9).
a) What are P(X=5), P(X<7), P(X>8), and P(5.2<X≤8.6)?
b) What is the 0.4 quantile of X?
c) Find a central region containing 90% of the probability.

3) Let X be a continuous random variable with density f(x) = x − 0.5 for 1 < x < 2.
a) Find P(0<X<1.5).
b) Find P(X<4).
c) Find the cdf F(x).

4) Let X ~ N(5, 4²). Find
a) P(X≥18).
b) P(X<-0.5).
c) P(6<X<11).

5) Let X be a continuous random variable with density 2(1−x) over the range (0,1). Find P(X>0.2), E[X], and V[X].

6) Let X and Y be independent random variables with X ~ N(3, 6²) and Y ~ N(-2, 3²). Define W = 3X − 2Y − 1. Find
a) the distribution of W.
b) P(W=3).
c) P(W>40).

7) A student turns in 10 homework assignments over the course of a semester. Suppose scores on each homework are normally distributed with mean 70 and variance 4², and assume all homeworks are independent.
a) Let Y be the TOTAL number of points on the 10 homework assignments. Find the distribution of Y.
b) Let X̄ be the AVERAGE number of points on the 10 homework assignments. Find the distribution of X̄.
c) Find P(X̄ > 72).
8) A potential buyer is evaluating two different types of servers. Server A's response times (in ms) are distributed N(24, 3²), while server B's response times are distributed N(22, 4²). To test the servers, the buyer makes 5 requests to server A and 5 requests to server B. Assume the requests are independent. Let X̄_A be the mean response time for server A and let X̄_B be the mean response time for server B.
a) Find the distribution of X̄_A − X̄_B.
b) Find P(X̄_A > X̄_B).
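As a closing cross-check, the Section 4.3 example can be worked in code as well as with the table. A Python sketch using scipy.stats.norm, whose cdf and ppf functions play the roles of R's pnorm and qnorm (the values in the comments are approximate):

```python
from scipy.stats import norm

# Section 4.3 example: X ~ N(15, 3^2)
mu, sigma = 15, 3
pa = norm.cdf(12.5, mu, sigma)                             # P(X < 12.5), about 0.2
pb = 1 - norm.cdf(14.2, mu, sigma)                         # P(X > 14.2), about 0.6
# P(X = 15) = 0: single points carry no probability for a continuous X

pc = norm.cdf(16.4, mu, sigma) - norm.cdf(3.5, mu, sigma)  # P(3.5 < X < 16.4)

# part (c) via quantiles: P(X < c) = 0.05, and P(15 - d < X < 15 + d) = 0.80
c = norm.ppf(0.05, mu, sigma)
d = sigma * norm.ppf(0.90)   # central 80% leaves 10% in each tail, so d = sigma * z_0.90
```

The table method gives the same answers to two decimal places; the computer simply skips the round-to-the-nearest-z step.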