8 The normal distribution
8.1 Introduction
One continuous distribution is of particular importance. This is the normal distribution (sometimes
called the Gaussian distribution). Many variables are found to have, at least approximately, normal
distributions. Others might have normal distributions after being transformed, for example by
taking logarithms or square roots.
The probability density function is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad \text{for } -\infty < x < \infty.$$
We cannot write down the distribution function F(x) = Pr(X ≤ x) explicitly in terms of elementary
functions. Instead, we obtain its values from tables based on numerical integration of the density.
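To illustrate how such tables can be produced, here is a minimal Python sketch that codes the density above directly and integrates it numerically with Simpson's rule. The function names, the cut-off at µ − 10σ and the number of steps are illustrative choices, not part of these notes; the standard-library class `statistics.NormalDist` (Python 3.8+) is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density f(x) of N(mu, var), coded directly from the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def normal_cdf_simpson(x: float, mu: float, var: float, steps: int = 2000) -> float:
    """Approximate F(x) = Pr(X <= x) by integrating f with Simpson's rule.

    The lower limit of -infinity is replaced by mu - 10*sigma; the
    probability below that point is about 7.6e-24, so it is ignored.
    """
    sigma = math.sqrt(var)
    lower = mu - 10 * sigma
    if x <= lower:
        return 0.0
    if steps % 2:                      # Simpson's rule needs an even number of intervals
        steps += 1
    h = (x - lower) / steps
    total = normal_pdf(lower, mu, var) + normal_pdf(x, mu, var)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2     # interior weights alternate 4, 2, 4, ...
        total += weight * normal_pdf(lower + i * h, mu, var)
    return total * h / 3


# Cross-check against the standard library for X ~ N(10, 4) at x = 13.
approx = normal_cdf_simpson(13, mu=10, var=4)
exact = NormalDist(mu=10, sigma=2).cdf(13)
print(f"Simpson: {approx:.6f}   library: {exact:.6f}")
```

Printed tables are built in much the same way, only once and for the standard normal distribution introduced in Section 8.3.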
The distribution has two parameters:
The mean µ, which governs the location of the distribution, and
The variance σ², which governs the spread of the distribution.
We usually specify a particular normal distribution by giving these two parameters. We write
X ∼ N(µ, σ²)
when X has a normal distribution with mean µ and variance σ². For example, if X has mean
10 and variance 4, we write X ∼ N(10, 4). Note, however, that some computer software, such as
Excel, uses the mean and standard deviation rather than the mean and variance.
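Python's standard library behaves like Excel in this respect. The short sketch below, assuming Python 3.8+ for `statistics.NormalDist`, shows that its second argument is the standard deviation, so X ∼ N(10, 4) needs σ = √4 = 2.

```python
import math
from statistics import NormalDist

# The notes write X ~ N(10, 4): mean 10, VARIANCE 4.
mean, variance = 10, 4

# statistics.NormalDist takes the STANDARD DEVIATION, so take a square root first.
X = NormalDist(mu=mean, sigma=math.sqrt(variance))
print(X.stdev, X.variance)   # 2.0 4.0

# A common slip: passing the variance where the standard deviation is expected
# silently describes N(10, 16) instead.
wrong = NormalDist(mu=mean, sigma=variance)
print(wrong.variance)        # 16.0
```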
8.2 Linear transformations
If X ∼ N(µ, σ²) and Y = a + bX, where a and b are fixed numbers with b ≠ 0, then Y ∼ N(a + bµ, b²σ²).
That is, Y also has a normal distribution.
For example, if X ∼ N(10, 1) is the temperature of sea water at a particular location
and time of year, in degrees Celsius, then Y = 1.8X + 32 is the temperature in degrees Fahrenheit and
Y ∼ N(32 + 1.8 × 10, 1.8² × 1) = N(50, 3.24).
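The rule for linear transformations is simple enough to code directly. This sketch uses a hypothetical helper `linear_transform` for the rule itself, then cross-checks it with `statistics.NormalDist`, which supports multiplying and shifting a distribution by constants.

```python
import math
from statistics import NormalDist


def linear_transform(mu: float, var: float, a: float, b: float):
    """Return (mean, variance) of Y = a + bX when X ~ N(mu, var)."""
    return a + b * mu, b ** 2 * var


# Sea temperature in degrees Celsius: X ~ N(10, 1).
mean_f, var_f = linear_transform(mu=10, var=1, a=32, b=1.8)
print(mean_f, round(var_f, 2))           # 50.0 3.24

# Cross-check: NormalDist scales both mu and sigma by 1.8, then shifts mu by 32.
Y = 1.8 * NormalDist(mu=10, sigma=1) + 32
print(Y.mean, round(Y.variance, 2))      # 50.0 3.24
```

The variance is multiplied by b², not b, which is why it grows from 1 to 3.24 rather than to 1.8.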
8.3 The standard normal distribution
Because each pair of values (µ, σ²) specifies a different normal distribution, and we cannot print a
separate table for every pair, we define a standard normal distribution and tabulate only that one. This is the
normal distribution with mean 0 and variance 1. Thus, if Z ∼ N(0, 1), we say that Z has a
standard normal distribution. We often use the symbol φ(x) for the standard normal probability
density function, where
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right).$$
We often use the symbol Φ(x) for the standard normal distribution function. Thus
$$\Phi(x) = \int_{-\infty}^{x} \phi(u)\,du = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du.$$
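In software, Φ is usually computed from the error function erf, using the identity Φ(x) = ½[1 + erf(x/√2)]. The sketch below assumes Python's `math.erf`; erf is itself evaluated numerically, so this is consistent with the remark that Φ has no closed form in elementary functions. The names `phi` and `Phi` are just illustrative.

```python
import math


def phi(x: float) -> float:
    """Standard normal density φ(x)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def Phi(x: float) -> float:
    """Standard normal distribution function Φ(x), via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


# A miniature version of a printed table of Φ(z).
for z in (0.0, 0.4, 0.8, 1.2, 1.6, 2.0):
    print(f"{z:.1f}  {Phi(z):.4f}")
```

Rounded to four places, the entries for z = 0.4 and z = 1.2 are the table values 0.6554 and 0.8849 used in Section 8.4.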
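We can apply linear transformations to convert normal random variables into standard normal
random variables. So, if X ∼ N(µ, σ²) and
$$Z = \frac{X - \mu}{\sigma},$$
then Z ∼ N(0, 1). This is the linear transformation of Section 8.2 with a = −µ/σ and b = 1/σ, which gives
mean a + bµ = 0 and variance b²σ² = 1.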
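Standardising means that one function, Φ, answers questions about every normal distribution. A minimal sketch, with a hypothetical helper `prob_less_than`, compares the standardised route with asking `statistics.NormalDist` directly.

```python
from statistics import NormalDist


def prob_less_than(x: float, mu: float, sigma: float) -> float:
    """Pr(X < x) for X ~ N(mu, sigma**2), by standardising to Z = (X - mu) / sigma."""
    z = (x - mu) / sigma
    return NormalDist().cdf(z)        # NormalDist() is N(0, 1), so .cdf is Φ


# X ~ N(10, 4), so sigma = 2 and x = 13 standardises to z = 1.5.
p_standardised = prob_less_than(13, mu=10, sigma=2)
p_direct = NormalDist(mu=10, sigma=2).cdf(13)
print(p_standardised, p_direct)
```

The two values agree up to floating-point rounding, which is exactly what Z ∼ N(0, 1) promises.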
8.4 Finding probabilities
These days we can often use computer software, such as Excel, to calculate probabilities from
normal distributions. However, it can also be done using tables of the standard normal distribution
function, Φ(z).
For example, the weight of coal loaded onto a barge has a normal distribution with mean
µ = 150 tonnes and standard deviation σ = 2.5 tonnes. Find the probability that the weight is
more than 153 tonnes. The probability that the weight W is greater than 153 is
$$\Pr(W > 153) = \Pr\left(\frac{W - 150}{2.5} > \frac{153 - 150}{2.5}\right) = \Pr(Z > 1.2)$$
where Z ∼ N(0, 1). So, since Φ(1.2) is the probability that Z < 1.2, the required probability is
1 − Φ(1.2). We simply find Φ(1.2) = 0.8849 from tables and hence the required probability is
1 − 0.8849 = 0.1151.
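The same barge calculation can be done in software. Here Python's `statistics.NormalDist` stands in for Excel; the sketch goes either through the standardised value, as with tables, or directly through W's own distribution.

```python
from statistics import NormalDist

mu, sigma = 150, 2.5                              # barge load, tonnes
z = (153 - mu) / sigma                            # standardised cut-off
p_table_route = 1 - NormalDist().cdf(z)           # 1 - Φ(1.2)
p_direct = 1 - NormalDist(mu, sigma).cdf(153)     # no standardising needed
print(round(z, 2), round(p_table_route, 4), round(p_direct, 4))   # 1.2 0.1151 0.1151
```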
Because the normal distribution is symmetric about µ, and, in particular, the standard normal
distribution is symmetric about 0, tables usually give values of Φ(z) only for z ≥ 0. However, by
symmetry Pr(Z < z) = Pr(Z > −z), so Φ(z) = 1 − Φ(−z), and this can be used to find probabilities for negative z.
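The sketch below imitates a table that lists only z ≥ 0. The hypothetical helper `Phi_from_positive_table` only ever looks up non-negative arguments and uses Φ(z) = 1 − Φ(−z) for the rest; `NormalDist().cdf` plays the role of the printed table and of the reference value.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal distribution function


def Phi_from_positive_table(z: float) -> float:
    """Look up Φ(z) using only values for z >= 0, as a printed table would.

    For negative z, apply the symmetry identity Φ(z) = 1 - Φ(-z).
    """
    if z >= 0:
        return Phi(z)
    return 1 - Phi(-z)


for z in (-2.0, -1.2, -0.4, 0.0, 0.4):
    print(z, round(Phi_from_positive_table(z), 4), round(Phi(z), 4))
```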
For example, for the barge above, what is the probability that the weight of coal is less than
149 tonnes? The probability is
$$\begin{aligned}
\Pr(W < 149) &= \Pr\left(\frac{W - 150}{2.5} < \frac{149 - 150}{2.5}\right) \\
&= \Pr(Z < -0.4) = \Phi(-0.4) \\
&= 1 - \Phi(0.4) = 1 - 0.6554 = 0.3446.
\end{aligned}$$

8.5 Adding and subtracting normal variables
Suppose that X₁ ∼ N(µ₁, σ₁²) and X₂ ∼ N(µ₂, σ₂²), that X₁ and X₂ are jointly normally distributed
(as they are, for example, when they are independent), and that the covariance of X₁ and X₂ is γ.
If Y₁ = X₁ + X₂ then Y₁ ∼ N(µ₁ + µ₂, σ₁² + σ₂² + 2γ).
If Y₂ = X₁ − X₂ then Y₂ ∼ N(µ₁ − µ₂, σ₁² + σ₂² − 2γ).
That is, if you add, or subtract, two jointly normal random variables, the result is also a normal
random variable. The means and variances are given by the ordinary rule for adding random
variables (see section 5.9). Note that the variances σ₁² and σ₂² are always added, even when the
variables are subtracted.
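A quick simulation can make these rules concrete. The sketch below is an illustration under stated assumptions: the means, standard deviations and correlation ρ are arbitrary choices, and X₂ is built from X₁'s standard normal draw plus an independent one, a standard way to obtain a jointly normal pair with covariance γ = ρσ₁σ₂. The helper `sum_and_difference` is hypothetical.

```python
import math
import random
import statistics


def sum_and_difference(mu1, var1, mu2, var2, cov):
    """(mean, variance) of X1 + X2 and of X1 - X2, using the rules above."""
    return (mu1 + mu2, var1 + var2 + 2 * cov), (mu1 - mu2, var1 + var2 - 2 * cov)


# Illustrative parameters (not from the notes).
mu1, sd1, mu2, sd2, rho = 5.0, 2.0, 3.0, 1.0, 0.6
cov = rho * sd1 * sd2                                  # γ = 1.2

rng = random.Random(1)                                 # fixed seed for repeatability
sums, diffs = [], []
for _ in range(50_000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)          # two independent N(0, 1) draws
    x1 = mu1 + sd1 * z1
    x2 = mu2 + sd2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    sums.append(x1 + x2)
    diffs.append(x1 - x2)

(sum_mean, sum_var), (diff_mean, diff_var) = sum_and_difference(
    mu1, sd1 ** 2, mu2, sd2 ** 2, cov)
print("X1 + X2: theory", sum_mean, sum_var, " simulated",
      round(statistics.fmean(sums), 3), round(statistics.variance(sums), 3))
print("X1 - X2: theory", diff_mean, diff_var, " simulated",
      round(statistics.fmean(diffs), 3), round(statistics.variance(diffs), 3))
```

With positive covariance, the sum is more variable and the difference less variable than in the independent case, where both would have variance σ₁² + σ₂² = 5.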
For example, consider two barges, each with the distribution for the weight of coal given above. Suppose
that the two weights are independent, so that γ = 0. Then the total weight of coal in the two barges has a
normal distribution with mean 300 and variance 2.5² + 2.5² = 12.5. So the standard deviation is
√12.5 = 3.536 tonnes. Notice that we add variances, not standard deviations.
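For independent variables, `statistics.NormalDist` implements this rule directly: adding two `NormalDist` objects adds the means and the variances. The sketch below reproduces the two-barge figures; it is not a substitute for the covariance formula when γ ≠ 0.

```python
from statistics import NormalDist

# Weights of coal in the two barges, in tonnes, as separate (independent) variables.
barge1 = NormalDist(mu=150, sigma=2.5)
barge2 = NormalDist(mu=150, sigma=2.5)

# Adding two NormalDist objects adds the means and the variances,
# i.e. it treats the two variables as independent (γ = 0).
total = barge1 + barge2
print(total.mean, round(total.variance, 6), round(total.stdev, 3))   # 300.0 12.5 3.536

# Adding standard deviations instead would wrongly give 5.0 tonnes.
print(barge1.stdev + barge2.stdev)
```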
8.6 The central limit theorem
One of the reasons why the normal distribution is important is the central limit theorem. Suppose
that we observe a sample X₁, …, Xₙ of independent observations from some distribution,
which need not be normal, with mean µ and variance σ² (where 0 < σ² < ∞). If
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
then the distribution of
$$T = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}$$
tends to a standard normal distribution as n → ∞.
In practice this means that, for moderately large samples, sample means are often approximately
normally distributed, even if the original data are not.
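A small simulation shows the theorem at work. In this sketch, the exponential distribution with rate 1 (skewed, mean 1, variance 1), the sample size n = 40 and the 10,000 repetitions are all illustrative assumptions; the code computes T for each simulated sample and compares the proportion of T values below a few points t with Φ(t).

```python
import math
import random
from statistics import NormalDist, fmean

# Observations from an exponential distribution with rate 1: strongly skewed,
# not normal, with mean µ = 1 and variance σ² = 1.
mu, var = 1.0, 1.0
n, reps = 40, 10_000                  # sample size and number of simulated samples
rng = random.Random(2024)             # fixed seed so the run is repeatable

t_values = []
for _ in range(reps):
    xbar = fmean([rng.expovariate(1.0) for _ in range(n)])
    t_values.append((xbar - mu) / math.sqrt(var / n))

# Compare the simulated Pr(T <= t) with Φ(t) at a few points.
Phi = NormalDist().cdf
for t in (-1.0, 0.0, 1.0, 1.645):
    frac = sum(v <= t for v in t_values) / reps
    print(f"t = {t:6.3f}  simulated {frac:.3f}  Φ(t) = {Phi(t):.3f}")
```

The agreement is approximate rather than exact at n = 40, because some of the skewness of the exponential distribution survives in the sample mean.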
8.7 The normal approximation to the binomial distribution
Calculating probabilities in binomial(n, p) distributions can be rather tedious when n is large.
However, under some circumstances, we can use the normal distribution as an approximation. The
approximation is good provided that n is large and neither np nor n(1 − p) is too small. As a rough
guide, we might say n > 40 and np > 5 and n(1 − p) > 5.
To approximate a binomial(n, p) distribution, we use a normal distribution with the same mean
and variance. That is, µ = np and σ² = np(1 − p). So, roughly speaking, if X ∼ binomial(n, p)
and we want to find the probability that X < x, we calculate the probability that Y < x, where
Y ∼ N(np, np(1 − p)).
In practice we usually make a refinement called the continuity correction. This is needed because
we are approximating a discrete distribution (binomial) with a continuous distribution (normal).
In the normal distribution, Pr(Y < x) + Pr(Y > x) = 1. If X has a binomial distribution and x is
an integer with 0 ≤ x ≤ n, then Pr(X < x) + Pr(X = x) + Pr(X > x) = 1. We have to allow for
the possibility that X = x. So, in the normal approximation to the binomial, we treat every value of Y
between x − 0.5 and x + 0.5 as corresponding to X = x. The probability that X < x, where X ∼ binomial(n, p), is
then approximated by the probability that Y < x − 0.5, where Y ∼ N(np, np(1 − p)). Similarly,
Pr(X > x) is approximated by Pr(Y > x + 0.5).
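To see what the continuity correction buys, the sketch below compares the exact binomial probability Pr(X < x) with the normal approximation with and without the correction. The case X ∼ binomial(100, 0.3), x = 25 is an illustrative choice, not from these notes; the helper names are hypothetical, and `math.comb` needs Python 3.8+.

```python
import math
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) for X ~ binomial(n, p), summing the probability function."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))


def approx_less_than(x: int, n: int, p: float, continuity: bool = True) -> float:
    """Normal approximation to Pr(X < x), with or without the continuity correction."""
    Y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return Y.cdf(x - 0.5) if continuity else Y.cdf(x)


# Illustrative case: X ~ binomial(100, 0.3), find Pr(X < 25).
n, p, x = 100, 0.3, 25
exact = binom_cdf(x - 1, n, p)            # Pr(X < 25) = Pr(X <= 24)
with_cc = approx_less_than(x, n, p, continuity=True)
without_cc = approx_less_than(x, n, p, continuity=False)
print(f"exact {exact:.4f}  with correction {with_cc:.4f}  without {without_cc:.4f}")
```

Without the correction, the approximation gives the value X = 25 half of a probability it should not receive, which is why it overstates Pr(X < 25) here.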
For example, in a batch of 500 fastenings, each fastening has, independently, a probability
of 0.02 of being defective. Find the probability that there are more than 15 defective fastenings
in the batch. Let the number of defective fastenings be X. Then X ∼ binomial(500, 0.02). We
approximate this with a normal distribution with µ = 500 × 0.02 = 10 and σ² = 500 × 0.02 × 0.98 =
9.8. The required probability is
$$\begin{aligned}
\Pr(X > 15) \approx \Pr(Y > 15.5) &= \Pr\left(\frac{Y - 10}{\sqrt{9.8}} > \frac{15.5 - 10}{\sqrt{9.8}}\right) \\
&= \Pr(Z > 1.757) = 1 - \Phi(1.757) \\
&\approx 1 - 0.96 = 0.04.
\end{aligned}$$
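Because n = 500 is not too large for a computer, we can also compute the exact binomial probability and compare it with the approximation. This is a minimal sketch using only the Python standard library (`math.comb` needs Python 3.8+). Since np = 10 is not far above the rough guide of 5, the comparison gives a feel for how accurate the approximation is in the tail.

```python
import math
from statistics import NormalDist

n, p = 500, 0.02                               # batch size, defect probability
mu, var = n * p, n * p * (1 - p)               # 10 and 9.8

# Normal approximation with continuity correction: Pr(X > 15) ≈ Pr(Y > 15.5).
approx = 1 - NormalDist(mu, math.sqrt(var)).cdf(15.5)

# Exact binomial value for comparison: Pr(X > 15) = 1 - Pr(X <= 15).
exact = 1 - sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(16))

print(f"normal approximation {approx:.4f}, exact binomial {exact:.4f}")
```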
8.8 Problems
1. The amount of fuel, in litres, which a vessel will use on a short voyage has a normal distribution with mean 90 and variance 16. Find the probability that it will use more than 95
litres.
2. The amount of fuel that a vessel has in its tank at the beginning of a voyage has a normal
distribution with mean 1200 litres and standard deviation 80 litres. The amount which it
will use on the voyage has a normal distribution with mean 800 litres and standard deviation
50 litres. Find the probability that it has less than 300 litres in the tank at the end of the
voyage.
3. Cylindrical metal bolts are manufactured to fit inside close-fitting metal collars. If the diameter of a bolt has a normal distribution with mean 15 mm and standard deviation 0.3 mm and
the internal diameter of the collar has a normal distribution with mean 16 mm and standard
deviation 0.3 mm, independently of the bolt, find the probability that the bolt will be too big
to fit inside the collar.
4. Loaded containers are placed temporarily on an area of jetty. One container is added to the
group on the jetty at a time. The weights of the containers are independent and each has a
normal distribution with mean 4.2 tonnes and standard deviation 0.8 tonnes.
(a) Find the probability that the weight of four containers exceeds 18 tonnes.
(b) Find the probability that the weight of five containers exceeds 18 tonnes.
(c) Find the probability that the total weight first exceeds 18 tonnes when the fifth container
is placed on the jetty.
5. A passenger ferry on a short crossing has an on-board cafeteria. Suppose that each passenger
has, independently, a probability of 0.3 of using the cafeteria during a crossing. Using a
suitable approximation, find the probability that, out of 400 passengers, at least 130 use the
cafeteria.