CIVL 3103 – Approximation and Uncertainty
J.W. Hurley, R.W. Meier
4. CONTINUOUS DISTRIBUTIONS
The beautiful thing about learning is nobody can
take it away from you.
– B. B. King
In the last chapter, we said that continuous random variables can take on any value within a continuous
range. Examples of continuous random variables could be time, distance, mass, force, temperature, or any
combination of these (such as stress or velocity). This makes them fundamentally different from discrete
random variables, which can only take on certain specific values (heads/tails,
hearts/spades/clubs/diamonds, etc.)
PROPERTIES OF CONTINUOUS RANDOM VARIABLES
With a continuous random variable, P(X = x) is zero, so you have to address probability in a different
way. You have to talk, instead, about the probability that the random variable X will fall within some
range of values a ≤ X ≤ b. Those probabilities are determined by finding the area under a curve called the
probability density function:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx

[Figure: density curve f(x) with the area under the curve between x = a and x = b shaded]
The probability density function has to satisfy two conditions:
f(x) ≥ 0   and   ∫_{−∞}^{∞} f(x) dx = 1
The former ensures that the area under the curve (probability) is always non-negative and the latter says
that it’s 100% certain that the experiment will have some outcome! These are, of course, the same
conditions that had to be satisfied by the probability mass function for discrete random variables.
Now, no one likes to integrate functions (well, mathematicians might, but engineers sure don't!) and that's
all you can do with the probability density function. You do have a friend, though, and it's called the
cumulative distribution function (or simply the distribution function). Just as for discrete random
variables, the cumulative distribution function gives the probability that the random variable X will take
on a value less than or equal to x when the experiment is performed:
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du
If you have the distribution function, you don't have to integrate anything! To determine the probability
that X will be less than or equal to some value a, simply pull the values off the curve:

P(X ≤ a) = F(a)
[Figure: cumulative distribution function F(x) rising from 0 to 1, with F(a) and F(b) marked at x = a and x = b]
If you want to know the probability that the random variable X will take on a range of values:
P(a ≤ X ≤ b) = F(b) − F(a)
Unfortunately, we don't have distribution functions for all of the various probability density functions
because some of them can’t be integrated in closed form. Instead, they must be integrated numerically
(more about that at the end of the course when we talk about approximation).
Fortunately, someone has already done the numerical integration for you and summarized the results in
tabular form. There is an appendix full of tables in the back of your textbook covering the most
commonly used continuous distribution functions (all of which will be covered later in this chapter and at
various points throughout the rest of the course).
The expected value of a continuous random variable is the probability-weighted average of the possible
outcomes (as it was for discrete random variables). We said earlier that the expected value (also called the
mean) is just the point about which the probability distribution is centered. In engineering terms, it’s the
centroid of the probability distribution. As you no doubt remember from Statics, the centroid of an area
can be calculated as
E(X) = µₓ = ∫_{−∞}^{∞} x f(x) dx
The variance of a continuous random variable is just the probability-weighted average of the squared
deviations from the expected value (as it was for discrete random variables). It can be calculated in an
analogous manner:
V(X) = σₓ² = ∫_{−∞}^{∞} (x − µₓ)² f(x) dx
As before, we can write the variance as the expected value of the squared deviations:
V(X) = E[(X − µₓ)²]
and we can also calculate the variance as
V(X) = E(X²) − [E(X)]²
which is easier to compute than squaring a bunch of deviations.
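Just to make these formulas concrete, here is a minimal Python sketch (added for illustration; it is not part of the original notes) that approximates E(X) and V(X) by numerical integration for a hypothetical density f(x) = 3x² on [0, 1]:

N = 100_000
dx = 1.0 / N
xs = [(i + 0.5) * dx for i in range(N)]   # midpoint-rule grid on [0, 1]

def f(x):
    # Hypothetical example density: f(x) = 3x^2 on [0, 1], zero elsewhere
    return 3.0 * x * x

mean = sum(x * f(x) for x in xs) * dx     # E(X); the exact value is 3/4
ex2 = sum(x * x * f(x) for x in xs) * dx  # E(X^2); the exact value is 3/5
var = ex2 - mean ** 2                     # shortcut formula E(X^2) - [E(X)]^2
print(mean, var)                          # ~0.75, ~0.0375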
SOME IMPORTANT CONTINUOUS DISTRIBUTIONS
In this chapter, we’ll look at three of the most commonly used continuous distributions:
o The Uniform Distribution
o The Exponential Distribution
o The Normal Distribution
In subsequent chapters, we’ll add several more to this list.
THE UNIFORM DISTRIBUTION
The uniform distribution is one in which a value is equally likely to occur anywhere in the range between
the smallest possible value a and the largest possible value b:
[Figure: uniform density f(x), a horizontal line of height c between x = a and x = b]
By definition, the total area under the probability density function must be one, so:
c(b − a) = 1   ⇒   c = 1/(b − a)
From this, we conclude that the probability density function for a uniformly distributed random variable
is simply
f(x) = 1/(b − a)   for a ≤ x ≤ b   (0 elsewhere)
The cumulative distribution function, which gives P(X ≤ x), can be calculated by integrating the probability
density function between −∞ and x. This task is made simpler by the fact that f(x) is zero for any x < a:
F(x) = ∫_{−∞}^{x} f(u) du = ∫_{−∞}^{a} 0 du + ∫_{a}^{x} 1/(b − a) du = (1/(b − a)) ∫_{a}^{x} du
Completing the integration, we get
F(x) = (x − a)/(b − a)   for a ≤ x ≤ b
which is just the equation of a straight line extending from 0 at X = a (there's no chance of X taking on a
value less than a) to 1 at X = b (it's a certainty that X will take on a value less than or equal to b).
The expected value of a uniformly distributed random variable can be found from our earlier equation as:
E(X) = ∫_{−∞}^{∞} x f(x) dx = (1/(b − a)) ∫_{a}^{b} x dx = (1/(b − a)) · (b² − a²)/2
This simplifies to
E(X) = µ = (a + b)/2
which is just the midpoint of the range. Of course you probably didn’t need to integrate to figure this out. If
the random variable X has to be somewhere between a and b then your best guess for the value of that
variable would probably be the midpoint. The phrase “best guess” is synonymous with “expected value.”
The variance of a uniformly distributed random variable can be found using the simplified formula
V(X) = E(X²) − [E(X)]²
The expected value of X² can be calculated as

E(X²) = ∫_{−∞}^{∞} x² f(x) dx = (1/(b − a)) ∫_{a}^{b} x² dx = (1/(b − a)) · (b³ − a³)/3 = (b² + ab + a²)/3
Substituting for E(X) from the equation above,
V(X) = (b² + ab + a²)/3 − [(a + b)/2]²
which can be simplified to
V(X) = σ² = (b − a)²/12
Recalling that the standard deviation is just the square root of the variance, the standard deviation of a
uniformly distributed random variable is
σ = (b − a)/√12
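These three results are easy to sanity-check by simulation. The following short Python sketch (added for illustration, with arbitrarily chosen endpoints a = 2 and b = 10) compares sample statistics against the formulas:

import random
import statistics

a, b = 2.0, 10.0                                      # arbitrary endpoints
sample = [random.uniform(a, b) for _ in range(100_000)]

print(statistics.mean(sample), (a + b) / 2)           # both ~6.0
print(statistics.pvariance(sample), (b - a)**2 / 12)  # both ~5.33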
EXAMPLE: The Uniform Distribution
Most calculators can generate “random numbers” as can most spreadsheets (e.g., Excel uses the function
RAND to put a random number in a cell). These electronically generated “random numbers” appear,
statistically, to be random. By this we mean that we cannot conclude, using various statistical tests, that
these numbers do not follow a uniform distribution. The numbers are not really random, though, because
they are generated by a computer algorithm that uniquely determines the next number to be generated
from the last number generated or from some “seed value” entered by the user or stored in memory. For
example, a very simple random number generator is:
xᵢ = (xᵢ₋₁ + π)⁵ − int[(xᵢ₋₁ + π)⁵]
This generates pseudo-random numbers between 0 and 1, which we abbreviate as X ∼ U[0, 1], meaning
“the random variable X is uniformly distributed over the range [0,1].”
What is the probability that the random number generator (RNG) above will generate a value between 0.2
and 0.3 the next time it is invoked?
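For a U[0,1] random variable, the answer comes straight from the uniform cumulative distribution function: P(0.2 ≤ X ≤ 0.3) = F(0.3) − F(0.2) = 0.3 − 0.2 = 0.1. As a sanity check, here is a small Python sketch (added for illustration; the seed value is arbitrary) that implements the generator above and estimates the probability empirically:

import math

def next_value(x):
    # One step of the (x + pi)^5 fractional-part generator shown above
    y = (x + math.pi) ** 5
    return y - int(y)

x = 0.5                    # arbitrary seed
hits, n = 0, 100_000
for _ in range(n):
    x = next_value(x)
    hits += 0.2 <= x <= 0.3
print(hits / n)            # should be close to the exact answer, 0.1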
THE EXPONENTIAL DISTRIBUTION
The exponential distribution is often used to model the times between successive events, such as
customers arriving at a drive-thru window. If the arrivals are at random (i.e., completely independent of
one another), then the number of arrivals per unit time is Poisson distributed (as we learned in the last
chapter) and the time intervals between arrivals will be exponentially distributed.
The exponential distribution is defined by a single parameter, λ, which is the average number of arrivals
per unit of time. (Yes, that’s the same λ that defines the Poisson distribution):
[Figure: exponential density f(x) = λe^(−λx), starting at height λ at x = 0 and decaying toward zero]
The probability density function for the exponential distribution is:
f(x) = λe^(−λx)   for x ≥ 0
which can be integrated to provide the cumulative distribution function
F(x) = 1 − e^(−λx)
The cumulative distribution function gives the probability that the next arrival will be in no more than x time
units given an expected (or average) time between arrivals of:
E(X) = µ = 1/λ
The variance of the exponential distribution is given by:
V(X) = σ² = 1/λ²
It is just as common to ask “What is the probability that the next arrival will be at least x minutes from now?”
as it is to ask “What is the probability that the next arrival will be in less than x minutes?” The two questions
are complementary:
P(X ≥ x) = 1 − P(X ≤ x)
Substituting the cumulative probability function for P(X ≤ x) gives
P(X ≥ x) = e^(−λx)
An important property of the exponential distribution is that it is memoryless. That means that it doesn’t
matter if the last arrival was one second ago or one hour ago, the probability that an arrival will (or won’t)
occur in the next x minutes is exactly the same. The Poisson process doesn’t remember when the last arrival
was.
This derives from the assumption that the arrivals are completely independent of one another, so the timing of
the last arrival should have no effect on the timing of the next arrival.
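To see the memoryless property in numbers, here is a tiny Python sketch (added for illustration; the rate and times are arbitrary) verifying that P(X > s + t | X > s) = P(X > t):

import math

lam = 0.25                 # arbitrary arrival rate
s, t = 10.0, 3.0           # arbitrary times

def p_gt(x):
    # Survival function P(X > x) for the exponential distribution
    return math.exp(-lam * x)

print(p_gt(s + t) / p_gt(s))   # P(X > s + t | X > s)
print(p_gt(t))                 # identical: the past doesn't matter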
EXAMPLE: The Exponential Distribution
Telephone calls arrive at the help desk of a small computer software company at the rate of 15 per hour.
What is the probability that the next call arrives within 3 minutes?
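Working this out: 15 calls per hour is λ = 15/60 = 0.25 calls per minute, so P(X ≤ 3) = 1 − e^(−0.25·3) ≈ 0.528. A quick Python check (added for illustration):

import math

lam_per_min = 15.0 / 60.0                  # 0.25 arrivals per minute
p = 1.0 - math.exp(-lam_per_min * 3.0)     # F(3) = P(next call within 3 min)
print(round(p, 3))                         # 0.528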
MODELING FAILURE WITH AN EXPONENTIAL DISTRIBUTION
Random equipment failures (as opposed to, say, those that are age-related or triggered by an event, such
as a voltage spike) are often modeled as Poisson processes. Each failure is considered a random “arrival”
so the length of time between failures follows an exponential distribution. In this context, the parameter λ
is referred to as the failure rate (the expected number of failures per unit time) and its inverse (the
expected amount of time between failures) is called the Mean Time Between Failures (MTBF).
EXAMPLE:
If the mean time between failures for a SCSI disk drive is 1,000,000 hours (approximately 114 years),
what is the probability that the disk drive will last 5 years?
E(X) = 1/λ = 1,000,000 hours/failure   (MTBF)

λ = 1/1,000,000 failures/hour   (failure rate)
Asking the probability that the disk drive will last 5 years is the same as asking the probability that the
next failure will be more than 5 years away:
P(X > 5 years) = P(X > 43,800 hours) = e^(−43,800/1,000,000) = 0.957
So there’s a 4.3% chance that the disk drive will fail in 5 years, even though its MTBF is nearly 114 years!
Most equipment is made up of multiple components. The MTBF for a system with multiple components is
given by
MTBF = 1/(λ₁ + λ₂ + ⋯ + λₙ)
For example, for a large corporate server with perhaps 100 disk drives, the MTBF for the system is
MTBF = 1/(100 × 1/1,000,000) = 10,000 hours
which is just slightly more than one year! The probability that the system will last 5 years without a single disk
drive going bad is
P(X > 5 years) = P(X > 43,800 hours) = e^(−43,800/10,000) = 0.0125
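The same calculation in a short Python sketch (added for illustration; the numbers come from the example above):

import math

mtbf_drive = 1_000_000.0                  # hours per failure for one drive
n_drives = 100

# Failure rates of independent components add, so the system rate is
# n * lambda and the system MTBF is its reciprocal.
lam_system = n_drives / mtbf_drive
mtbf_system = 1.0 / lam_system            # 10,000 hours

five_years = 5 * 8760                     # 43,800 hours
p_survive = math.exp(-lam_system * five_years)
print(mtbf_system, round(p_survive, 4))   # 10000.0 0.0125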
THE NORMAL DISTRIBUTION
The normal distribution was discovered in 1733 by De Moivre, who found it to be the limiting case of the
binomial distribution (which, you’ll recall, gives the probability of x successes in n trials) as the number of
trials becomes infinite. The normal distribution was “discovered” again a half century later by both Laplace
and Gauss, each of whom found that it seemed to describe the distribution of errors in astronomical
measurements. The normal distribution is the foundation of many statistical methods used in data analysis
because it does accurately describe the distribution of random errors. Its importance cannot be overstated.
The normal distribution is bell-shaped, symmetrical about the mean, µ, and ranges from -∞ to ∞.
[Figure: bell-shaped normal density f(x), symmetric about its mean µ]
The probability density function is given by
1  x −µ  2

σ 
− 
1
f ( x) =
e 2
σ 2π
−∞< x<∞
−∞<µ <∞
σ >0
We have a shorthand way of describing a normally distributed random variable:
X ∼ N[µ, σ²]
This says “the random variable X is normally distributed with mean µ and variance σ2.”
If, for example, we have X ∼ N[25,100] , the interpretation would be that X is normally distributed with a
mean of 25 and a variance of 100.
µ = 25   (population mean)
σ² = 100   (population variance)
σ = 10   (population standard deviation)
This is known as a specific normal distribution, because it has specified values of µ and σ2. (We could, of
course, talk about specific binomial distributions and specific Poisson distributions, too.)
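In code, drawing a value from this specific normal distribution looks like the following one-liner (an added aside, not part of the original notes). Note that Python's random.gauss takes the standard deviation, 10, not the variance, 100:

import random

x = random.gauss(25, 10)   # one draw from N[25, 100]; gauss takes sigma, not sigma squared
print(x)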
Because the normal distribution is symmetrical, the mean µ determines where on the x-axis the bell-curve is
centered:
[Figure: three identical bell curves centered at different means µ₁, µ₂, µ₃]
The standard deviation σ determines the breadth of the bell-curve and, indirectly, the height of the curve.
Because the area under the curve must always be 1, the height of the curve varies in inverse proportion to its
breadth. For example, if one bell curve is twice as wide as another, it must also be half as high:
[Figure: two bell curves centered at µ, one with σ = 1 and a wider, flatter one with σ = 2]
As with all continuous distributions, the probability density function doesn’t do us much good. What we really
need to know is the cumulative probability function,
F(x) = (1/(σ√(2π))) ∫_{−∞}^{x} e^(−½((u − µ)/σ)²) du
Unfortunately, this integral cannot be evaluated in closed form. Instead, it has to be integrated numerically and
the results tabulated for future use.
This presents a bit of a problem. There are an infinite number of normal random variables, each having a
unique µ and σ. To calculate any probabilities associated with a specific normal variable, we'd have to have a
table corresponding to those exact values of µ and σ. That means we’d need an infinite number of tables!
Fortunately, there is an easy way out.
THE STANDARD NORMAL DISTRIBUTION
If X is a normally distributed random variable with mean µ and variance σ², then (X − µ)/σ is a normally
distributed random variable with zero mean and unit variance. In "shorthand" notation:
If X ∼ N[µ, σ²], then (X − µ)/σ ∼ N[0, 1]
For convenience, we define the Z statistic as:
Z = (X − µ)/σ
so we can write
Z ∼ N[0,1]
The standard normal curve is symmetric about the origin and lies almost entirely within the range [−3, +3], as
shown below:
[Figure: standard normal density f(Z) plotted from Z = −4 to Z = +4, with essentially all of its area between −3 and +3]
“So what?” you say. Well, now we don’t have to integrate specific normal distributions. We can, instead,
numerically integrate the standard normal distribution once, store the values in a table, and use the table for
every problem involving normally distributed random variables. Table 1 of Appendix B in the textbook is an
example of such a table. Table 1 contains values of
F(z) = P(Z ≤ z)
for selected non-negative values of Z. Get used to this table…you will use it a lot!
Wait a minute … read that last paragraph again. Table 1 in Appendix B contains values of F(z) for selected
non-negative values of Z. What happens if Z is negative (meaning X is less than the mean)?
Fortunately, we can use the symmetry of the standard normal distribution to easily calculate probabilities
associated with negative values of Z.
Examining the figure below, it is readily apparent that the area to the left of Z = – z is identical to the area to
the right of Z = + z because the bell curve is symmetrical about the origin:
[Figure: standard normal density with the left-tail area P(Z ≤ −z) and the mirror-image right-tail area P(Z ≥ +z) shaded]
To determine the area to the left of Z = – z , we just have to calculate the area to the right of Z = + z . The area
to the right of Z = + z is P(Z ≥ +z) and, by definition,
P(Z ≥ +z) + P(Z ≤ +z) = 1

(since Z ≥ z is the complement of Z ≤ z), so

P(Z ≥ +z) = 1 − P(Z ≤ +z)

From this, we conclude that

F(−z) = 1 − F(+z)
so you only need to worry about one side of the distribution!
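A quick Python check of this identity (added for illustration; the standard normal CDF is expressed through the standard library's error function):

import math

def phi(z):
    # Standard normal cumulative distribution function F(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.5                  # arbitrary test value
print(phi(-z))           # 0.0668...
print(1.0 - phi(z))      # the same number, confirming F(-z) = 1 - F(+z)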
EXAMPLE: The Normal Distribution
The paper “Reliability Enhancement of a New Computer” (ESS Proceedings, 1995) describes a program to
enhance the reliability of the motherboard in a massively parallel RISC computer system. Part of the paper
concentrates on a specific parity error that seems to be temperature dependent. The temperatures at which the
error occurs follow a normal distribution with a mean of 80ºC and a standard deviation of 20ºC.
a. What is the probability that an error occurs at a temperature of less than 77ºC?
b. What is the probability of an error occurring between 77ºC and 83ºC?
c. Over what range of temperatures will 2/3 of the errors occur?
d. What is the probability that an error will occur at a temperature within ±2σ of the mean?
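For reference, here is a Python sketch (added for illustration, not part of the original handout) that works all four parts numerically. The standard normal CDF is built from math.erf, and part (c) inverts it by bisection:

import math

def phi(z):
    # Standard normal cumulative distribution function F(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 80.0, 20.0

# (a) P(X < 77)
print(phi((77 - mu) / sigma))                           # ~0.440

# (b) P(77 < X < 83)
print(phi((83 - mu) / sigma) - phi((77 - mu) / sigma))  # ~0.119

# (c) symmetric range about the mean holding 2/3 of the errors:
#     solve phi(z) - phi(-z) = 2/3 by bisection, then convert back to X
lo, hi = 0.0, 4.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if phi(mid) - phi(-mid) < 2.0 / 3.0:
        lo = mid
    else:
        hi = mid
z = 0.5 * (lo + hi)
print(mu - z * sigma, mu + z * sigma)                   # ~60.7 to ~99.3 deg C

# (d) P(X within +/- 2 sigma of the mean)
print(phi(2.0) - phi(-2.0))                             # ~0.954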
THE NORMAL AS AN APPROXIMATION OF THE BINOMIAL
We said earlier that the normal distribution is the limiting case of the binomial distribution as the number of
trials goes to infinity. If the number of trials is large, it becomes difficult to calculate
P(X ≤ a) = F(a) = Σ_{x=0}^{a} C(n, x) pˣ qⁿ⁻ˣ
because of the sheer number of terms in the summation. Since the binomial distribution approaches the normal
distribution for large n, we can approximate the probability above using the normal distribution and forget
about calculating all those terms!
Recall that the mean of the binomial distribution is µₓ = np and the variance is σₓ² = npq. If X follows a
binomial distribution and n is large (actually, if npq is greater than about 10), then:
Z = (a − np)/√(npq)
follows a standard normal distribution and F(z) can be determined from standard tables.
Example
A student is taking a true/false test for which he hasn’t studied, so he randomly circles answers. If there are 50
questions on the test, what are his chances of getting a C or better on the test (i.e., 70% or more correct
answers)?
What we’re asking is the probability of 35 or more successes in 50 trials where the probability of success on
each trial is 50%. Ordinarily, we’d have to calculate the probabilities of getting 35, 36, 37, … , 50 answers
correct and add the results. Using the normal approximation, though, we can determine the probability of
getting 34 or less correct with a single calculation, then simply subtract the result from one:
p = 0.5  ⇒  q = 1 − p = 0.5  ⇒  µ = np = 25  ⇒  σ² = npq = 12.5
Z = (a − µ)/σ = (34 − 25)/√12.5 = 2.546
P(X ≥ 35) = 1 − P(Z ≤ 2.546) = 1 − 0.9945 = 0.0055 !!!
Remember this next time you decide to guess on a test!
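Out of curiosity, here is a Python sketch (an added comparison, not from the handout) that computes the exact binomial tail alongside the normal approximation used above:

import math

n, p = 50, 0.5
q = 1.0 - p

# Exact tail: P(X >= 35), summed term by term from the binomial pmf
exact = sum(math.comb(n, x) * p**x * q**(n - x) for x in range(35, n + 1))

# Normal approximation from the notes: 1 - P(Z <= (34 - np)/sqrt(npq))
z = (34 - n * p) / math.sqrt(n * p * q)
approx = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(exact)    # ~0.0033
print(approx)   # ~0.0054 (the table value used in the notes rounds to 0.0055)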
End Note:
These aren't all of the continuous distributions that we'll cover … there are still more to come, but we’ll be
exploring them in a slightly different way.
SOME CONTINUOUS DISTRIBUTIONS IN EXCEL
You can use Excel’s Function Wizard to implement the exponential distribution and the normal
distribution. Excel can give you both the value of the probability density function, f(x), and the value of
the cumulative probability distribution, F(x). Just make the last argument of the function TRUE if you
want F(x) and FALSE if you want f(x). If you don’t want to use the function wizard, you can simply type
the functions into a cell just like any other function:
Exponential Distribution
f(x) = EXPONDIST(x, lambda, FALSE)
F(x) = EXPONDIST(x, lambda, TRUE)
Normal Distribution
f(x) = NORMDIST(x, mean, stdev, FALSE)
F(x) = NORMDIST(x, mean, stdev, TRUE)
Excel also lets you implement the cumulative standard normal distribution. The function has just one
argument, which is z:
Standard Normal Distribution
F(z) = NORMSDIST(z)
Of course you could obtain the same results by using the NORMDIST function with a mean of zero and a
standard deviation of one. In fact, that's the only way you can obtain the probability density function, f(z),
for the standard normal distribution, because the NORMSDIST function doesn't have a TRUE/FALSE
argument to toggle between the two.
Many times, you need to use the cumulative normal distribution backwards. For example, to determine
the 95th percentile value of X, you have to go into the body of the Cumulative Standard Normal Table,
find the value 0.950, figure out what value of Z corresponds to that entry, and convert that value into X.
This is called inversion and Excel provides functions for solving inverse problems involving both the
normal distribution and the standard normal distribution:
Normal Distribution
X = NORMINV(probability, mean, stdev)
Standard Normal Distribution
Z = NORMSINV(probability)
The former returns the value of X corresponding to the input probability and the latter returns the value of
Z corresponding to the input probability.
There is no uniform distribution function in Excel (because the calculations are so straightforward), but
Excel has a couple of random number generators. The function RAND() returns a U[0,1] random number
that changes every time the spreadsheet is recalculated. The function RANDBETWEEN(bottom, top)
returns a random integer between bottom and top; if you want a continuous uniform value over a range
[a, b] other than [0,1], you can compute a + (b − a)*RAND().