8 The normal distribution

8.1 Introduction

One continuous distribution is of particular importance. This is the normal distribution (sometimes called the Gaussian distribution). Many variables are found to have, at least approximately, normal distributions. Others might have normal distributions after being transformed, for example by taking logarithms or square roots.

The probability density function is

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\} \]

for −∞ < x < ∞. We cannot write down the distribution function explicitly. We obtain values by using tables based on numerical integration.

The distribution has two parameters: the mean µ, which governs the location of the distribution, and the variance σ², which governs its spread.

We usually specify a particular normal distribution by giving these two parameters. We write X ∼ N(µ, σ²) when X has a normal distribution with mean µ and variance σ². For example, if X has mean 10 and variance 4, we write X ∼ N(10, 4). Note, however, that some computer software, such as Excel, uses the mean and standard deviation rather than the mean and variance.

8.2 Linear transformations

If X ∼ N(µ, σ²) and Y = a + bX, where a and b are fixed numbers, then Y ∼ N(a + bµ, b²σ²). That is, Y also has a normal distribution.

For example, if X ∼ N(10, 1) is the temperature of the sea water at a particular location and time of year, in degrees Celsius, then Y = 32 + 1.8X is the temperature in degrees Fahrenheit. Here a = 32 and b = 1.8, so the mean is 32 + 1.8 × 10 = 50 and the variance is 1.8² × 1 = 3.24. Hence Y ∼ N(50, 3.24).

8.3 The standard normal distribution

Because each pair of values µ, σ² will specify a different normal distribution and because of the need to evaluate probabilities using tables, we define a standard normal distribution. This is the normal distribution with mean 0 and variance 1. Thus, if Z ∼ N(0, 1), we say that Z has a standard normal distribution.

We often use the symbol φ(x) for the standard normal probability density function, where

\[ \phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{x^2}{2} \right\}. \]

We often use the symbol Φ(x) for the standard normal distribution function. Thus

\[ \Phi(x) = \int_{-\infty}^{x} \phi(u)\,du = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{u^2}{2} \right\} du. \]

We can apply linear transformations to convert normal random variables into standard normal random variables. So, if X ∼ N(µ, σ²) and

\[ Z = \frac{X - \mu}{\sigma}, \]

then Z ∼ N(0, 1).

8.4 Finding probabilities

These days we can often use computer software, such as Excel, to calculate probabilities from normal distributions. However, it can also be done using tables of the standard normal distribution function, Φ(z).

For example, the weight of coal loaded onto a barge has a normal distribution with mean µ = 150 tonnes and standard deviation σ = 2.5 tonnes. Find the probability that the weight is more than 153 tonnes.

The probability that the weight W is greater than 153 is

\[ \Pr(W > 153) = \Pr\left( \frac{W - 150}{2.5} > \frac{153 - 150}{2.5} \right) = \Pr(Z > 1.2), \]

where Z ∼ N(0, 1). So, since Φ(1.2) is the probability that Z < 1.2, the required probability is 1 − Φ(1.2). We simply find Φ(1.2) = 0.8849 from tables and hence the required probability is 1 − 0.8849 = 0.1151.

Because the normal distribution is symmetric about µ and, in particular, the standard normal distribution is symmetric about 0, tables usually give values of Φ(z) only for z > 0. However, it is easily seen that Φ(z) = 1 − Φ(−z) and this can be used to find probabilities for negative z.

For example, for the barge above, what is the probability that the weight of coal is less than 149 tonnes? The probability is

\[ \Pr(W < 149) = \Pr\left( \frac{W - 150}{2.5} < \frac{149 - 150}{2.5} \right) = \Pr(Z < -0.4) = \Phi(-0.4) = 1 - \Phi(0.4) = 1 - 0.6554 = 0.3446. \]

8.5 Adding and subtracting normal variables

Suppose that X₁ ∼ N(µ₁, σ₁²) and X₂ ∼ N(µ₂, σ₂²) and that the covariance of X₁ and X₂ is γ.

If Y₁ = X₁ + X₂ then Y₁ ∼ N(µ₁ + µ₂, σ₁² + σ₂² + 2γ).

If Y₂ = X₁ − X₂ then Y₂ ∼ N(µ₁ − µ₂, σ₁² + σ₂² − 2γ).

That is, if you add, or subtract, two normal random variables, the result is also a normal random variable.
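
The two rules above, and the table look-ups of section 8.4, can be checked numerically. The following is a minimal Python sketch using only the standard library. It replaces tables with the identity Φ(z) = ½[1 + erf(z/√2)]. The function names `normal_cdf`, `sum_params` and `diff_params` are illustrative and do not come from these notes, and the values used for X₁ and X₂ are made up for the demonstration.

```python
import math


def normal_cdf(x, mean=0.0, var=1.0):
    """Pr(X < x) for X ~ N(mean, var), found by standardising and using erf."""
    z = (x - mean) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sum_params(mean1, var1, mean2, var2, cov=0.0):
    """Mean and variance of X1 + X2 (hypothetical helper)."""
    return mean1 + mean2, var1 + var2 + 2.0 * cov


def diff_params(mean1, var1, mean2, var2, cov=0.0):
    """Mean and variance of X1 - X2 (hypothetical helper)."""
    return mean1 - mean2, var1 + var2 - 2.0 * cov


# Barge example from section 8.4: W ~ N(150, 2.5^2).
p_over_153 = 1.0 - normal_cdf(153.0, mean=150.0, var=2.5 ** 2)
p_under_149 = normal_cdf(149.0, mean=150.0, var=2.5 ** 2)
print(round(p_over_153, 4), round(p_under_149, 4))

# Illustrative values, not from the notes:
# X1 ~ N(10, 4), X2 ~ N(3, 1), covariance 0.5.
print(sum_params(10.0, 4.0, 3.0, 1.0, cov=0.5))
print(diff_params(10.0, 4.0, 3.0, 1.0, cov=0.5))
```

The first line of output should agree with the table-based answers 0.1151 and 0.3446 from section 8.4. Note that `normal_cdf` takes the variance, matching the N(µ, σ²) notation used here, whereas software such as Excel expects the standard deviation.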
The means and variances are given by the ordinary rules for adding random variables (see section 5.9). Note that the variances are always added, even when the variables are subtracted.

For example, consider two barges with the distribution for weight of coal given above. Suppose that they are independent, so that γ = 0. Then the total weight of coal in the two barges has a normal distribution with mean 300 and variance 2.5² + 2.5² = 12.5. So the standard deviation is √12.5 = 3.536 tonnes. Notice that we add variances, not standard deviations.

8.6 The central limit theorem

One of the reasons why the normal distribution is important is the central limit theorem. Suppose that we observe a sample X₁, ..., Xₙ of independent observations from some distribution, which need not be normal, with mean µ and variance σ² (where 0 < σ² < ∞). Then, if

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \]

the distribution of

\[ T = \frac{\bar{X} - \mu}{\sqrt{\sigma^2 / n}} \]

tends to a standard normal distribution as n → ∞.

In practice this means that, for moderately large samples, sample means are often approximately normally distributed, even if the original data are not.

8.7 The normal approximation to the binomial distribution

Calculating probabilities in binomial(n, p) distributions can be rather tedious when n is large. However, under some circumstances, we can use the normal distribution as an approximation. The approximation is good provided that n is large and neither np nor n(1 − p) is too small. As a rough guide, we might say n > 40 and np > 5 and n(1 − p) > 5.

To approximate a binomial(n, p) distribution, we use a normal distribution with the same mean and variance. That is, µ = np and σ² = np(1 − p).

So, roughly speaking, if X ∼ binomial(n, p) and we want to find the probability that X < x, we calculate the probability that Y < x, where Y ∼ N(np, np(1 − p)). In fact we usually make a refinement called the continuity correction. This is needed because we are approximating a discrete distribution (binomial) with a continuous distribution (normal).
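
A small numerical comparison suggests why a refinement is worthwhile. The sketch below, in Python with only the standard library, computes Pr(X < x) exactly for a binomial distribution and compares it with two normal approximations: the plain one, Pr(Y < x), and one in which x is moved down by half a unit, which is the shift that the continuity correction introduces. The values n = 50, p = 0.3 and x = 15 are illustrative choices that satisfy the rough guide above; they are not taken from these notes, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, var):
    """Pr(Y < x) for Y ~ N(mean, var)."""
    z = (x - mean) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_less_than(x, n, p):
    """Exact Pr(X < x) for X ~ binomial(n, p), where x is an integer."""
    return sum(
        math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        for k in range(0, x)  # k = 0, 1, ..., x - 1
    )


# Illustrative values, not from the notes.
n, p, x = 50, 0.3, 15
mean, var = n * p, n * p * (1.0 - p)

exact = binomial_less_than(x, n, p)
plain = normal_cdf(x, mean, var)          # no correction: Pr(Y < x)
shifted = normal_cdf(x - 0.5, mean, var)  # half-unit shift: Pr(Y < x - 0.5)

print(f"exact={exact:.4f}  plain={plain:.4f}  shifted={shifted:.4f}")
```

For these particular values the shifted approximation lies closer to the exact probability than the plain one. This is one example only and says nothing about how large the improvement is in general.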
In the normal distribution, Pr(Y < x) + Pr(Y > x) = 1. If X has a binomial distribution and x is an integer with 0 ≤ x ≤ n, then Pr(X < x) + Pr(X = x) + Pr(X > x) = 1. We have to allow for the possibility that X = x. So, in the normal approximation to the binomial, we count as being x everything between x − 0.5 and x + 0.5.

The probability that X < x, where X ∼ binomial(n, p), is then approximated by the probability that Y < x − 0.5, where Y ∼ N(np, np(1 − p)).

For example, in a batch of 500 fastenings, each fastening has, independently, a probability of 0.02 of being defective. Find the probability that there are more than 15 defective fastenings in the batch.

Let the number of defective fastenings be X. Then X ∼ binomial(500, 0.02). We approximate this with a normal distribution with µ = 500 × 0.02 = 10 and σ² = 500 × 0.02 × 0.98 = 9.8. Since X > 15 means X ≥ 16, and the value 16 is counted as everything from 15.5 to 16.5, we use Y > 15.5. The required probability is

\[ \Pr(X > 15) \approx \Pr(Y > 15.5) = \Pr\left( \frac{Y - 10}{\sqrt{9.8}} > \frac{15.5 - 10}{\sqrt{9.8}} \right) = \Pr(Z > 1.757) = 1 - \Phi(1.757) \approx 1 - 0.96 = 0.04. \]

8.8 Problems

1. The amount of fuel, in litres, which a vessel will use on a short voyage has a normal distribution with mean 90 and variance 16. Find the probability that it will use more than 95 litres.

2. The amount of fuel that a vessel has in its tank at the beginning of a voyage has a normal distribution with mean 1200 litres and standard deviation 80 litres. The amount which it will use on the voyage has a normal distribution with mean 800 litres and standard deviation 50 litres. Find the probability that it has less than 300 litres in the tank at the end of the voyage.

3. Cylindrical metal bolts are manufactured to fit inside close-fitting metal collars. If the diameter of a bolt has a normal distribution with mean 15 mm and standard deviation 0.3 mm and the internal diameter of the collar has a normal distribution with mean 16 mm and standard deviation 0.3 mm, independently of the bolt, find the probability that the bolt will be too big to fit inside the collar.

4. Loaded containers are placed temporarily on an area of jetty. One container is added to the group on the jetty at a time. The weights of the containers are independent and each has a normal distribution with mean 4.2 tonnes and standard deviation 0.8 tonnes.
   (a) Find the probability that the weight of four containers exceeds 18 tonnes.
   (b) Find the probability that the weight of five containers exceeds 18 tonnes.
   (c) Find the probability that the total weight first exceeds 18 tonnes when the fifth container is placed on the jetty.

5. A passenger ferry on a short crossing has an on-board cafeteria. Suppose that each passenger has, independently, a probability of 0.3 of using the cafeteria during a crossing. Using a suitable approximation, find the probability that, out of 400 passengers, at least 130 use the cafeteria.