Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 9 Continuous Probability Models 9.1 Introduction We have seen how discrete random variables can be modelled by discrete probability distributions such as the binomial and Poisson distributions. We now consider how to model continuous random variables. A variable is discrete if it takes a countable number of values, for example, r = 0, 1, 2, . . . , n or r = 0, 1, 2, . . . or r = 0, 0.1, 0.2, . . . , 0.9, 1.0. In contrast, the values which a continuous variable can take form a continuous scale. One simple example of a continuous variable is height. Although in practice we might only record height to the nearest cm, if we could measure height exactly (to an infinite number of decimal places) we would find that everyone had a different height. This is the essential difference between discrete and continuous variables. For example, suppose I measure my height (in cm) exactly, and you have to guess my height. What’s the probability that my height X equals your guess x? Well, to guess my height correctly you have to guess every one of the infinite decimal digits correctly – and this has zero probability, P (X = x) = 0. On the other hand, events such as P (170 < X ≤ 175) will have positive probability. If P (X = x) = 0 for all values of x of a continuous random variable, then we can’t write down the probability distribution of X like we did for discrete random variables such as the binomial or Poisson. We need some other way to define a continuous random variable. The solution can be found by considering a (relative frequency) histogram of a sample of values from the continuous random variable, and thinking about what happens to the histogram as the sample size increases. For example, consider the following graphs which show histograms for samples of 100, 1000 and 10000 observations made on a continuous random variable which can take values between 0 and 20. The final graph shows what happens when the sample size becomes infinitely big. 87 CHAPTER 9. CONTINUOUS PROBABILITY MODELS 88 0 5 10 15 20 15 20 0.10 0.00 Relative frequency 0.10 0.00 Relative frequency 0.20 n = 1000 0.20 n = 100 0 5 10 15 20 0 5 10 15 20 0 5 10 0.10 0.00 Probability density 0.10 0.00 Relative frequency 0.20 0.20 n = 10000 As the population size gets larger and the histogram intervals get smaller, the jagged profile of the histogram smooths out to become a curve. We call this curve the probability density function (pdf) and it is usually written as f (x). Note that probabilities such as P (X < x) can be determined using the pdf as they equate to areas under the curve. The key features of pdfs are 1. pdfs never take negative values; 2. the area under a pdf is one: P (−∞ < X < ∞) = 1; 3. areas under the curve correspond to probabilities; 4. P (X ≤ x) = P (X < x) since P (X = x) = 0. Over the next two weeks we will consider some particular probability distributions that are often used to describe continuous random variables. We start with the most important, most widely–used statistical distribution of all time . . . CHAPTER 9. CONTINUOUS PROBABILITY MODELS 89 9.2 The Normal Distribution 9.2.1 Introduction The normal distribution is possibly the best known and most used continuous probability distribution. It provides a good model for data in so many different applications – for example, the level of rainfall on a particular day, the height of people in a class, the IQ levels of the population as a whole. The outcomes of many production processes also follow normal distributions and hence it is used widely in industry. Similarly, the normal distribution provides a good model for many biological phenomenon; animal weight, crop yield and so on, and so it is also widely used in agriculture. Recall the “parameters” of the binomial and Poisson distributions: the binomial distribution has two parameters, n and p, and the Poisson distribution has one parameter, λ. The normal distribution has two parameters: the mean, µ (the Greek letter “mu”), and the standard deviation, σ (the lower-case Greek letter “sigma”). Its probability density function (pdf) has a “bell shaped” profile: f (x) µ − 4σ µ µ − 2σ µ + 2σ µ + 4σ x There are four important characteristics of the normal distribution: 1. It is symmetrical about its mean, µ. 2. The mean, median and mode all coincide. 3. The area under the curve is equal to 1. 4. The curve extends in both directions to infinity (∞). The formula for the pdf is f (x) = √ 1 2πσ 2 e− 2 ( 1 x−µ 2 σ ). Unlike the binomial and Poisson distributions, there is no simple formula for calculating probabilities such as P (X ≤ x). However, such probabilities can be determined using tables (see tables at the end of this chapter) or statistical packages such as Minitab. CHAPTER 9. CONTINUOUS PROBABILITY MODELS 90 Below are plots of the pdf for normal distributions with different values of µ and σ: Normal pdfs with means 10, 30, 50 and sd 10 Normal pdfs with mean 30 and sds 5, 10, 15 0 10 20 30 x 40 50 60 0.00 0.04 Density 0.08 0.04 0.00 0.02 Density 0.02 0.00 Density 0.04 Normal pdf with mean 30 and sd 10 -20 0 40 20 x 60 80 -20 0 40 20 60 80 x Note that the mean µ locates the distribution on the x–axis and the standard deviation σ affects the spread of the distribution, with larger values giving flatter and wider curves. 9.2.2 Notation If a random variable X has a normal distribution with mean µ and variance σ 2 , then we write X ∼ N µ, σ 2 . For example, a random variable X which follows a normal distribution with mean 10 and variance 25 is written as X ∼ N (10, 25) or X ∼ N (10, 52). It is important to note that the second parameter in this notation is the variance and not the standard deviation. 9.2.3 The standard normal distribution For various reasons, all probabilities for the normal distribution can be expressed in terms of those for a normal distribution with mean 0 and variance 1. Usually, a random variable with this standard normal distribution is called Z, that is Z ∼ N (0, 1) . If our random variable follows a standard normal distribution, then we can obtain cumulative probabilities from statistical tables (see the table at the end of this chapter, which give “less than or equal to” probabilities). For example, if Z ∼ N(0, 1), then: 1. The probability that Z is less than −1.46 is P (Z < −1.46). Therefore we look for the probability in tables corresponding to z = −1.46: row labelled −1.4, column headed −0.06. This gives P (Z < −1.46) = 0.0721. 2. The probability that Z is less than −0.01 is P (Z < −0.01). Therefore we look for the probability in tables corresponding to z = −0.01: row labelled 0.0, column headed −0.01. This gives P (Z < −0.01) = 0.4960. 3. The probability that Z is less than 0.01 is P (Z < 0.01). Therefore we look for the probability in tables corresponding to z = 0.01: row labelled 0.0, column headed 0.01. This gives P (Z < 0.01) = 0.5040. CHAPTER 9. CONTINUOUS PROBABILITY MODELS 91 4. The probability that Z is greater than 1.5 is P (Z > 1.5). Now our tables give “less than” probabilities, and here we want a “greater than” probability. But! The area under the curve is 1: So we find P (Z ≤ 1.5) = 0.9332 and subtract this from 1 to give 0.0668. 5. What about the probability that Z lies between −1.2 and 1.5? Graphically, this is: And so P (−1.2 < Z < 1.5) = P (Z < 1.5) − P (Z ≤ −1.2) = 0.9332 − 0.1151 = 0.8181. CHAPTER 9. CONTINUOUS PROBABILITY MODELS 92 So how do we calculate probabilities for any normal distribution, not just the standard normal distribution for which we have tables? The easiest approach is to “make” the normal distribution that we have “look like” the standard normal distribution, and then we can just use the tables as before. But how can we “make” any old normal distribution look like the standard normal distribution? We can transform it by using the “slide–squash” technique! This is best demonstrated through an example. IQ Example Suppose we are interested in the IQ of 18–19 year olds and that IQs follow a normal distribution with mean µ = 100 and standard deviation σ = 15. Thus, we have: X: IQ of 18–19 year olds, and X ∼ N 100, 152 . We don’t have tables of probabilities for this distribution, but we do have tables for the standard normal distribution with mean 0 and standard deviation 1. So to make our distribution look like the standard normal distribution, we first need to slide it along to the left so it has the same mean, and then squash it in so it has the same spread. For the slide, we subtract the mean from our distribution, i.e. subtract 100. Doing so centres the distribution on zero, just like the standard normal distribution. Then, for the squash, we divide by the standard deviation (in this case 15), which squashes our normal distribution so it has the same spread as the standard normal distribution. The formula for slide–squash, where X ∼ N(µ, σ 2 ), Z = X−µ and Z ∼ N(0, 1), is thus: σ X −µ x−µ x−µ P (X ≤ x) = P =P Z≤ , ≤ σ σ σ which transforms any normal distribution into the standard normal distribution. Thus, in the IQs example, let’s suppose we wanted to find the probability that an 18–19 year old has an IQ less than 85, i.e. P (X < 85). Using the slide–squash formula, this is transformed into a statement about the standard normal distribution as follows: P (X < 85) = = = = x−µ P Z< σ 85 − 100 P Z< 15 P (Z < −1) 0.1587. CHAPTER 9. CONTINUOUS PROBABILITY MODELS 93 What about the following probabilities? (i) The probability that an 18–19 year old has an IQ less than 110. (ii) The probability that an 18–19 year old has an IQ greater than 110. (iii) The probability that an 18–19 year old has an IQ greater than 125. (iv) The probability that an 18–19 year old has an IQ between 95 and 115. Solutions CHAPTER 9. CONTINUOUS PROBABILITY MODELS This page has been left blank for your solutions to the last example 94 CHAPTER 9. CONTINUOUS PROBABILITY MODELS 95 Vitamin C example Suppose that the vitamin C content per 100g tin of tomato juice is normally distributed with mean µ = 20mg and standard deviation σ = 4mg. Let X be the vitamin C content of a randomly chosen tin. Then X ∼ N 20, 42 . Find (i) The probability that a tin has less than 25mg of vitamin C. (ii) The probability that the tin has between 18mg and 25mg of vitamin C. For (i), we have P (X < 25) = = = = 25 − µ P Z< σ 25 − 20 P Z< 4 P (Z < 1.25) 0.8944 (from tables). For (ii), we have P (18 < X < 25). That is, P (18 < X < 25) = P (X < 25) − P (X ≤ 18) 25 − 20 18 − 20 = P Z< −P Z ≤ 4 4 = P (Z < 1.25) − P (Z ≤ −0.5) = 0.8944 − 0.3085 = 0.5859. 9.2.4 Using tables in reverse We can also use the tables in reverse. For example, we might want to know below what value are 95% of the population. This is equivalent to determining the value of z that satisfies P (Z < z) = 0.95. From tables, we can see that P (Z < 1.64) = 0.9495 P (Z < 1.65) = 0.9505. and Therefore, the value we want for z lies between 1.64 and 1.65. If a more accurate value is needed we can interpolate between these values: 0.95 is half-way between 0.9495 and 0.9505 and so we take z = 1.645. This is a more accurate answer and sufficient in most cases. However, the exact value for z can be found from more detailed tables or via a computer package such as Minitab. CHAPTER 9. CONTINUOUS PROBABILITY MODELS 96 Here are some more examples. 1. Below what value does 10% of the standard normal population fall? From tables we get P (Z < −1.29) = 0.0985 and P (Z < −1.28) = 0.1003. 0.1003 is closer to 10% (0.1) than 0.0985, and so, roughly, 10% of the standard normal population falls below −1.28. We can get a more accurate answer by using linear interpolation, as follows. The value we want, z, is given by, 0.1 − 0.0985 z = −1.29 + × (−1.28 − −1.29) 0.1003 − 0.0985 = −1.29 + 0.83 × 0.01 = −1.2817. So, P (Z < −1.2817) = 0.1. 2. A similar calculation can be used to calculate the IQ that identifies the bottom 10% of 18– 19 year olds. We need the value of x, where P (X < x) = 0.1. Now this population has µ = 100 and σ = 15. Also x−µ P (X ≤ x) = P Z ≤ σ and so we need x so that P x − 100 Z≤ 15 = 0.1. We know (from earlier) that P (Z < −1.2817) = 0.1 and therefore we solve x − 100 = −1.2817, 15 that is x = 100 − 1.2817 × 15 = 100 − 19.2255 = 80.7745. Notice that the calculation that transforms the z–value onto the x–scale is x = µ + zσ. 3. What is the IQ that identifies the top 1% of 18-19 year olds? Again, we first determine the value z that identifies the top 1% of a standard normal population and then translate this into an IQ. So we need the value z that satisfies P (Z > z) = 0.01. This is the same value as satisfies P (Z ≤ z) = 0.99. A quick examination of tables gives the two key probabilities as P (Z ≤ 2.32) = 0.9898 and P (Z ≤ 2.33) = 0.9901; CHAPTER 9. CONTINUOUS PROBABILITY MODELS 97 and linear interpolation gives 0.99 − 0.9898 z = 2.32 + 0.9901 − 0.9898 = 2.32 + 0.67 × 0.01 = 2.3267. × (2.33 − 2.32) So, P (Z ≤ 2.3267) = 0.99. Moving back to the IQ scale, we need the value x such that P (X > x) = 0.01 and so we take x = µ + zσ = 100 + 2.3267 × 15 = 134.9. So the top 1% of 18-19 year-olds have IQs over 134.9. CHAPTER 9. CONTINUOUS PROBABILITY MODELS 98 9.3 Exercises 1. A company promises delivery within 20 working days of receipt of order. However, in reality, they deliver according to a normal distribution with a mean of 16 days and a standard deviation of 2.5 days. (a) What proportion of customers receive their order late? (b) What proportion of customers receive their orders between 10 and 15 days of placing their order? (c) A new order processing system promises to reduce the standard deviation of delivery times to 1.5 days. If this system is used, what proportion of customers will receive their deliveries within 20 days? 2. A drinks machine is regulated by its manufacturer so that it dispenses an average of 200ml per cup. However, the machine is not particularly accurate and actually dispenses an amount that has a normal distribution with standard deviation 15ml. (a) What percentage of cups contain below the minimum permissible volume of 170ml? (b) What percentage of cups contain over 225ml? (c) What percentage of cups contain between 175ml and 225ml? (d) How many cups would you expect to overflow if 240ml cups are used for the next 10000 drinks? CHAPTER 9. CONTINUOUS PROBABILITY MODELS 99 Probability Tables for the Standard Normal Distribution The table contains values of P (Z ≤ z), where Z ∼ N(0, 1). z -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 -0.09 0.0014 0.0019 0.0026 0.0036 0.0048 0.0064 0.0084 0.0110 0.0143 0.0183 0.0233 0.0294 0.0367 0.0455 0.0559 0.0681 0.0823 0.0985 0.1170 0.1379 0.1611 0.1867 0.2148 0.2451 0.2776 0.3121 0.3483 0.3859 0.4247 0.4641 -0.08 0.0014 0.0020 0.0027 0.0037 0.0049 0.0066 0.0087 0.0113 0.0146 0.0188 0.0239 0.0301 0.0375 0.0465 0.0571 0.0694 0.0838 0.1003 0.1190 0.1401 0.1635 0.1894 0.2177 0.2483 0.2810 0.3156 0.3520 0.3897 0.4286 0.4681 -0.07 0.0015 0.0021 0.0028 0.0038 0.0051 0.0068 0.0089 0.0116 0.0150 0.0192 0.0244 0.0307 0.0384 0.0475 0.0582 0.0708 0.0853 0.1020 0.1210 0.1423 0.1660 0.1922 0.2206 0.2514 0.2843 0.3192 0.3557 0.3936 0.4325 0.4721 -0.06 0.0015 0.0021 0.0029 0.0039 0.0052 0.0069 0.0091 0.0119 0.0154 0.0197 0.0250 0.0314 0.0392 0.0485 0.0594 0.0721 0.0869 0.1038 0.1230 0.1446 0.1685 0.1949 0.2236 0.2546 0.2877 0.3228 0.3594 0.3974 0.4364 0.4761 -0.05 0.0016 0.0022 0.0030 0.0040 0.0054 0.0071 0.0094 0.0122 0.0158 0.0202 0.0256 0.0322 0.0401 0.0495 0.0606 0.0735 0.0885 0.1056 0.1251 0.1469 0.1711 0.1977 0.2266 0.2578 0.2912 0.3264 0.3632 0.4013 0.4404 0.4801 -0.04 0.0016 0.0023 0.0031 0.0041 0.0055 0.0073 0.0096 0.0125 0.0162 0.0207 0.0262 0.0329 0.0409 0.0505 0.0618 0.0749 0.0901 0.1075 0.1271 0.1492 0.1736 0.2005 0.2296 0.2611 0.2946 0.3300 0.3669 0.4052 0.4443 0.4840 -0.03 0.0017 0.0023 0.0032 0.0043 0.0057 0.0075 0.0099 0.0129 0.0166 0.0212 0.0268 0.0336 0.0418 0.0516 0.0630 0.0764 0.0918 0.1093 0.1292 0.1515 0.1762 0.2033 0.2327 0.2643 0.2981 0.3336 0.3707 0.4090 0.4483 0.4880 -0.02 0.0018 0.0024 0.0033 0.0044 0.0059 0.0078 0.0102 0.0132 0.0170 0.0217 0.0274 0.0344 0.0427 0.0526 0.0643 0.0778 0.0934 0.1112 0.1314 0.1539 0.1788 0.2061 0.2358 0.2676 0.3015 0.3372 0.3745 0.4129 0.4522 0.4920 -0.01 0.0018 0.0025 0.0034 0.0045 0.0060 0.0080 0.0104 0.0136 0.0174 0.0222 0.0281 0.0351 0.0436 0.0537 0.0655 0.0793 0.0951 0.1131 0.1335 0.1562 0.1814 0.2090 0.2389 0.2709 0.3050 0.3409 0.3783 0.4168 0.4562 0.4960 0.00 0.0019 0.0026 0.0035 0.0047 0.0062 0.0082 0.0107 0.0139 0.0179 0.0228 0.0287 0.0359 0.0446 0.0548 0.0668 0.0808 0.0968 0.1151 0.1357 0.1587 0.1841 0.2119 0.2420 0.2743 0.3085 0.3446 0.3821 0.4207 0.4602 0.5000 z 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 0.00 0.5000 0.5398 0.5793 0.6179 0.6554 0.6915 0.7257 0.7580 0.7881 0.8159 0.8413 0.8643 0.8849 0.9032 0.9192 0.9332 0.9452 0.9554 0.9641 0.9713 0.9772 0.9821 0.9861 0.9893 0.9918 0.9938 0.9953 0.9965 0.9974 0.9981 0.01 0.5040 0.5438 0.5832 0.6217 0.6591 0.6950 0.7291 0.7611 0.7910 0.8186 0.8438 0.8665 0.8869 0.9049 0.9207 0.9345 0.9463 0.9564 0.9649 0.9719 0.9778 0.9826 0.9864 0.9896 0.9920 0.9940 0.9955 0.9966 0.9975 0.9982 0.02 0.5080 0.5478 0.5871 0.6255 0.6628 0.6985 0.7324 0.7642 0.7939 0.8212 0.8461 0.8686 0.8888 0.9066 0.9222 0.9357 0.9474 0.9573 0.9656 0.9726 0.9783 0.9830 0.9868 0.9898 0.9922 0.9941 0.9956 0.9967 0.9976 0.9982 0.03 0.5120 0.5517 0.5910 0.6293 0.6664 0.7019 0.7357 0.7673 0.7967 0.8238 0.8485 0.8708 0.8907 0.9082 0.9236 0.9370 0.9484 0.9582 0.9664 0.9732 0.9788 0.9834 0.9871 0.9901 0.9925 0.9943 0.9957 0.9968 0.9977 0.9983 0.04 0.5160 0.5557 0.5948 0.6331 0.6700 0.7054 0.7389 0.7704 0.7995 0.8264 0.8508 0.8729 0.8925 0.9099 0.9251 0.9382 0.9495 0.9591 0.9671 0.9738 0.9793 0.9838 0.9875 0.9904 0.9927 0.9945 0.9959 0.9969 0.9977 0.9984 0.05 0.5199 0.5596 0.5987 0.6368 0.6736 0.7088 0.7422 0.7734 0.8023 0.8289 0.8531 0.8749 0.8944 0.9115 0.9265 0.9394 0.9505 0.9599 0.9678 0.9744 0.9798 0.9842 0.9878 0.9906 0.9929 0.9946 0.9960 0.9970 0.9978 0.9984 0.06 0.5239 0.5636 0.6026 0.6406 0.6772 0.7123 0.7454 0.7764 0.8051 0.8315 0.8554 0.8770 0.8962 0.9131 0.9279 0.9406 0.9515 0.9608 0.9686 0.9750 0.9803 0.9846 0.9881 0.9909 0.9931 0.9948 0.9961 0.9971 0.9979 0.9985 0.07 0.5279 0.5675 0.6064 0.6443 0.6808 0.7157 0.7486 0.7794 0.8078 0.8340 0.8577 0.8790 0.8980 0.9147 0.9292 0.9418 0.9525 0.9616 0.9693 0.9756 0.9808 0.9850 0.9884 0.9911 0.9932 0.9949 0.9962 0.9972 0.9979 0.9985 0.08 0.5319 0.5714 0.6103 0.6480 0.6844 0.7190 0.7517 0.7823 0.8106 0.8365 0.8599 0.8810 0.8997 0.9162 0.9306 0.9429 0.9535 0.9625 0.9699 0.9761 0.9812 0.9854 0.9887 0.9913 0.9934 0.9951 0.9963 0.9973 0.9980 0.9986 0.09 0.5359 0.5753 0.6141 0.6517 0.6879 0.7224 0.7549 0.7852 0.8133 0.8389 0.8621 0.8830 0.9015 0.9177 0.9319 0.9441 0.9545 0.9633 0.9706 0.9767 0.9817 0.9857 0.9890 0.9916 0.9936 0.9952 0.9964 0.9974 0.9981 0.9986