Chapter 5
The normal distribution
This chapter deals in detail with one of the most versatile models for variation, the
normal distribution or 'bell-shaped' curve. You will learn how to use printed tables
to calculate normal probabilities. The normal curve also provides a useful approximation to other probability distributions: this is one of the consequences of the central
limit theorem.
In Chapter 2, Section 2.4 you were introduced to an important continuous
distribution called the normal distribution. It was noted that many real data
sets can reasonably be treated as though they were a random sample from the
normal distribution and it was remarked that the normal distribution turns
out to play a central role in statistical theory as well as in practice. This
entire chapter is devoted to the study of the normal distribution.
The chapter begins with a review of all that has been said so far about the
normal distribution. The main point to bear in mind is that in many cases a
probability model for random variation follows necessarily as a mathematical
consequence of certain assumptions: for instance, many random processes can
be modelled as sets or sequences of Bernoulli trials, the distribution theory
following from the twin assumptions that the trials are independent and that
the probability of success from trial to trial remains constant. Quite often,
however, data arise from a situation for which no model has been proposed:
nevertheless, even when the data sets arise from entirely different sampling
contexts, they often seem to acquire a characteristic peaked and symmetric
shape that is essentially the same. This shape may often be adequately represented through a normal model. The review is followed by an account of
the genesis of the normal distribution.
In Section 5.2, you will discover how to calculate normal probabilities. As
for any other continuous probability distribution, probabilities are found by
calculating areas under the curve of the probability density function. But for
the normal distribution, this is not quite straightforward, because applying
the technique of integration does not in this case lead to a formula that is
easy to write down. So, in practice, probabilities are found by referring to
printed tables, or by using a computer.
The remaining sections of the chapter deal with one of the fundamental theorems in statistics and with some of the consequences of it. It is called the
central limit theorem. This is a theorem due to Pierre Simon Laplace (1749-1827) that was read before the Academy of Sciences in Paris on 9 April 1810.
The theorem is a major mathematical statement: however, we shall be concerned not with the details of its proof, but with its application to statistical
problems.
5.1 Some history
5.1.1 Review
The review begins with a set of data collected a long time ago. During the
mapping of the state of Massachusetts in America, one hundred readings were
taken on the error involved when measuring angles. The error was measured
in minutes (a minute is 1/60 of a degree). The data are shown in Table 5.1.
Table 5.1 Errors in angular measurements

Error (in minutes)      Frequency
Between +6 and +5           1
Between +5 and +4           2
Between +4 and +3           2
Between +3 and +2           3
Between +2 and +1          13
Between +1 and 0           26
Between 0 and -1           26
Between -1 and -2          17
Between -2 and -3           8
Between -3 and -4           2

United States Coast Survey Report (1854). The error was calculated
by subtracting each measurement from 'the most probable' value.
A histogram of this sample is given in Figure 5.1. This graphical representation shows clearly the main characteristics of the data: the histogram is
unimodal (it possesses just one mode) and it is roughly symmetric about that
mode.
Another histogram, which corresponds to a different data set, is shown in
Figure 5.2. You have seen these data before.
This is a graphical representation of the sample of Scottish soldiers' chest
measurements that you met in Chapter 2, Section 2.4. This histogram is also
unimodal and roughly symmetric. The common characteristics of the shape
of both the histograms in Figures 5.1 and 5.2 are shared with the normal
distribution whose p.d.f. is illustrated in Figure 5.3.
Figure 5.1 Errors in angular measurements (minutes)

Figure 5.2 Chest measurements of Scottish soldiers (inches)

Figure 5.3 The normal p.d.f. (For clarity, the vertical axis has been omitted
in this graph of the normal density function.)

What is it about Figures 5.1, 5.2 and 5.3 that makes them appear similar?
Well, each diagram starts at a low level on the left-hand side, rises steadily
until reaching a maximum in the centre and then decreases, at the same rate
that it increased, to a low value towards the right-hand side. The diagrams
are unimodal and symmetric about their modes (although this symmetry is
only approximate for the two data sets). A single descriptive word often
used to describe the shape of the normal p.d.f., and likewise histograms of
data sets that might be adequately modelled by the normal distribution, is
'bell-shaped'.
Note that there is more than one normal distribution. No single distribution
could possibly describe both the data of Figure 5.1, which have their mode
around zero and which vary from about -4 minutes of arc to over 5 minutes,
and those of Figure 5.2, whose mode is at about 40 inches and which range
from approximately 33 inches to 48 inches. In the real world there are many
instances of random variation following this kind of pattern: the mode and the
range of observed values will alter from random variable to random variable,
but the characteristic bell shape of the data will be apparent.
The four probability density functions shown in Figure 5.4 all correspond to
different normal distributions.
Figure 5.4 Four normal densities
What has been described is another family of probability models, just like
the binomial family (with two parameters, n and p) and the Poisson family
(with one parameter, the mean μ). The normal family has two parameters,
one specifying location (the centre of the distribution) and one describing the
degree of dispersion. In Chapter 2 the location parameter was denoted by μ
and the dispersion parameter was denoted by σ; in fact, the parameter μ is
the mean of the normal distribution and σ is its standard deviation.
This information may be summarized as follows. The probability density
function for the normal family of random variables is also given.
The normal probability density function

If the continuous random variable X is normally distributed with mean
μ and standard deviation σ (variance σ²), then this may be written

    X ~ N(μ, σ²);

the probability density function of X is given by

    f(x) = (1/(σ√(2π))) exp[−½((x − μ)/σ)²],   −∞ < x < ∞.   (5.1)
A sketch of the p.d.f. of X is as follows.
The shape of the density function of X is often called 'bell-shaped'. The
p.d.f. of X is positive for all values of x; however, observations more
than about three standard deviations away from the mean are rather
unlikely. The total area under the curve is 1.
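As a quick illustration of (5.1), here is a minimal Python sketch (an added illustration, not part of the course text, using only the standard library) that evaluates the density directly and checks numerically that the total area under the curve is 1 and that nearly all of it lies within three standard deviations of the mean; the values μ = 40 and σ = 2 are simply convenient examples.

    import math

    def normal_pdf(x, mu, sigma):
        """The density of N(mu, sigma^2), as in equation (5.1)."""
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

    def area_under(mu, sigma, lo, hi, n=20000):
        """Trapezium-rule approximation to the area under the density from lo to hi."""
        h = (hi - lo) / n
        total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
        total += sum(normal_pdf(lo + i * h, mu, sigma) for i in range(1, n))
        return total * h

    mu, sigma = 40.0, 2.0  # illustrative values only
    print(round(area_under(mu, sigma, mu - 8 * sigma, mu + 8 * sigma), 4))  # ~1.0
    print(round(area_under(mu, sigma, mu - 3 * sigma, mu + 3 * sigma), 4))  # ~0.9973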
There are very few random variables for which possible observations include
all negative and positive numbers. But for the normal distribution, extreme
values may be regarded as occurring with negligible probability. One should
not say 'the variation in Scottish chest measurements is normally distributed
with mean 40 inches and standard deviation about 2 inches' (the implication
being that negative observations are possible); rather, say 'the variation in
Scottish chest measurements may be adequately modelled by a normal distribution with mean 40 inches and standard deviation 2 inches'.
In the rest of this chapter we shall see many more applications in the real
world where different members of the normal family provide good models of
variation. But first, we shall explore some of the history of the development
of the normal distribution.
(As remarked already, you do not need to remember this formula in order to
calculate normal probabilities.)
Exercise 5.1
Without attempting geometrical calculations, suggest values for the parameters μ and σ for each of the normal probability densities that are shown
in Figure 5.4.
An early history
Although the terminology was not standardized until after 1900, the normal
distribution itself was certainly known before then (under a variety of different names). The following is a brief account of the history of the normal
distribution before the twentieth century.
Credit for the very first appearance of the normal p.d.f. goes to Abraham
de Moivre (1667-1754), a Protestant Frenchman who emigrated to London
in 1688 to avoid religious persecution and lived there for the rest of his life,
becoming an eminent mathematician. Prompted by a desire to compute the
probabilities of winning in various games of chance, de Moivre obtained what
is now recognized as the normal p.d.f., an approximation to a binomial probability function (these were early days in the history of the binomial distribution). The pamphlet that contains this work was published in 1733. In
those days, the binomial distribution was known as a discrete probability
distribution in the way we think of discrete distributions today, but it is not
generally claimed that de Moivre thought of his normal approximation as
defining a continuous probability distribution, although he did note that it
defined 'a curve'.
Then, around the end of the first decade of the nineteenth century, two
famous figures in the history of science published derivations of the normal
distribution. The first, in 1809, was the German Carl Friedrich Gauss (1777-1855); the second, in 1810, was the Frenchman Pierre Simon Laplace (1749-1827). Gauss was a famous astronomer and mathematician. The range of
his influence, particularly in mathematical physics, has been enormous: he
made strides in celestial mechanics, geometry and geodesy, number theory,
optics, electromagnetism, real and complex analysis, theoretical physics and
astronomy as well as in statistics. Motivated by problems of measurement
in astronomy, Gauss had for a long time recognized the usefulness of the
'principle of least squares', an idea still very frequently used and which you
will meet in Chapter 10. Allied to this, Gauss had great faith in the use of
the mean as the fundamental summary measure of a collection of numbers.
Moreover, he wanted to assert that the most probable value of an unknown
quantity is the mean of its observed values (that is, in current terminology,
that the mean equals the mode). Gauss then, quite rightly, obtained the normal distribution as a probability distribution that would yield these desirable
properties: the normal distribution is relevant to the least squares method of
estimation and its mode and its mean are one and the same. Having said that,
Gauss's argument, or his claims for the consequences of his argument, now
look distinctly shaky. He took the use of the mean as axiomatic, arguing for its
appropriateness in all circumstances, saw that the normal distribution gave
the answer he wanted and consequently inferred that the normal distribution
should also be the fundamental probability model for variation.
Figure 5.5 (a) Gauss and (b) Laplace
The Marquis de Laplace, as he eventually became, lived one of the most
influential and successful careers in science. He made major contributions to
mathematics and theoretical astronomy as well as to probability and statistics.
Laplace must also have been an astute political mover, maintaining a high
profile in scientific matters throughout turbulent times in France; he was even
Napoleon Bonaparte's Minister of the Interior, if only for six weeks! Laplace's
major contribution to the history of the normal distribution was a first version
of the central limit theorem, a very important idea that you will learn about in
Section 5.3. (Laplace's work is actually a major generalization of de Moivre's.)
It is the central limit theorem that is largely responsible for the widespread use
of the normal distribution in statistics. Laplace, working without knowledge
of Gauss's interest in the same subject, presented his theorem early in 1810
as an elegant result in mathematical analysis, but with no hint of the normal
curve as a p.d.f. and therefore as a model for random variation. Soon after,
Laplace encountered Gauss's work and the enormity of his own achievement
hit him. Laplace brought out a sequel to his mathematical memoir in which
he showed how the central limit theorem gave a rationale for the choice of the
normal curve as a probability distribution, and consequently how the entire
development of the principle of least squares fell into place, as Gauss had
shown.
This synthesis between the work of Gauss and Laplace provided the basis for
all the further interest in and development of statistical methods based on
the normal distribution over the ensuing years. Two contemporary derivations of the normal distribution by an Irish-American, Robert Adrain (1775-1843), working in terms of measurement errors, remained in obscurity. It is
interesting to note that of all these names in the early history of the normal
distribution, it is that of Gauss that is still often appended to the distribution
today when, as is often done, the normal distribution is referred to as the
Gaussian distribution.
The motivating problems behind all this and other early work in
mathematical probability were summarized recently by S.M. Stigler thus:
'The problems considered were in a loose sense motivated by other problems, problems in the social sciences, annuities, insurance, meteorology, and
medicine; but the paradigm for the mathematical development of the field
was the analysis of games of chance'. However, 'Why men of broad vision
and wide interests chose such a narrow focus as the dicing table and why the
concepts that were developed there were applied to astronomy before they
were returned to the fields that originally motivated them, are both interesting questions . . . '. Unfortunately, it would be getting too far away from our
main focus to discuss them further here.
Such was the progress of the normal distribution in the mid-nineteenth century. The normal distribution was not merely accepted, it was widely advocated as the one and only 'law of error'; as, essentially, the only continuous
probability distribution that occurred in the world! Much effort, from many
people, went into obtaining 'proofs' of the normal law. The idea was to
construct a set of assumptions and then to prove that the only continuous
distribution satisfying these assumptions was the normal distribution. While
some, no doubt, were simply wrong, many of these mathematical derivations
were perfectly good characterizations of the normal distribution. That is, the
normal distribution followed uniquely from the assumptions. The difficulty lay
in the claims for the assumptions themselves. 'Proofs' and arguments about
Stigler, S.M. (1986) The History of Statistics: The Measurement of
Uncertainty before 1900. The Belknap Press of Harvard University Press.
proofs, or at least the assumptions on which they were based, abounded, but
it is now known that, as the normal distribution is not universally applicable,
all this effort was destined to prove fruitless.
This acceptance of the normal distribution is especially remarkable in light of
the fact that other continuous distributions were known at the time. A good
example is due to Siméon Denis Poisson (1781-1840) who, as early as 1824,
researched the continuous distribution with p.d.f.

    f(x) = 1/(π(1 + x²)),   −∞ < x < ∞,
which has very different properties from the normal distribution. An amusing
aside is that this distribution now bears the name of Augustin Louis Cauchy
(1789-1857) who worked on it twenty years or so later than Poisson did while,
on the other hand, Poisson's contribution to the distribution that does bear
his name is rather more tenuous compared with those of other researchers
(including de Moivre) of earlier times.
What of the role of data in all this? For the most part, arguments were solely
mathematical or philosophical, idealized discussions concerning the state of
nature. On occasions when data sets were produced, they were ones that
tended to support the case for the normal model. Two such samples were illustrated at the beginning of this section. The data on chest measurements of
Scottish soldiers were taken from the Edinburgh Medical and Surgical Journal
of 1817. They are of particular interest because they (or a version of them)
were analysed by the Belgian astronomer, meteorologist, sociologist and statistician, Lambert Adolphe Jacques Quetelet (1796-1874) in 1846. Quetelet was
a particularly firm believer in, and advocate of, the universal applicability of
the normal distribution, and such data sets that do take an approximately
normal shape did nothing to challenge that view. Quetelet was also a major
figure in first applying theoretical developments to data in the social sciences.
The angular data in Table 5.1 are quoted in an 1884 textbook entitled 'A Text-Book on the Method of Least Squares' by Mansfield Merriman, an American
author. Again, the book is firmly rooted in the universal appropriateness of
the normal distribution.
In a paper written in 1873, the American C.S. Peirce presented analyses
of 24 separate tables each containing some 500 experimental observations.
Peirce drew smooth densities which, in rather arguable ways, were derived
from these data and from which he seemed to infer that his results confirmed (yet again) the practical validity of the normal law. An extensive
reanalysis of Peirce's data in 1929 (by E.B. Wilson and M.M. Hilferty) found
every one of these sets of data to be incompatible with the normal model
in one way or another! These contradictory opinions based on the same observations are presented here more as an interesting anecdote rather than
because they actually had any great influence on the history of the normal
distribution, but they do nicely reflect the way thinking changed in the late
nineteenth century. Peirce's (and Merriman's) contributions were amongst
the last from the school of thought that the normal model was the only
model necessary to express random variation. By about 1900, so much evidence of non-normal variation had accumulated that the need for alternatives to complement the normal distribution was well appreciated (and by
1929, there would not have been any great consternation at Wilson's and
Hilferty's findings). Prime movers in the change of emphasis away from normal
models for continuous data were a number of Englishmen including Sir Francis
Galton (1822-1911), Francis Ysidro Edgeworth (1845-1926) and Karl Pearson
(1857-1936).

(Poisson's work on the binomial distribution and the eponymous approximating distribution was described in Chapter 4.)
But to continue this history of the normal distribution through the times of
these important figures and beyond would be to become embroiled in the
whole fascinating history of the subject of statistics as it is understood today,
so we shall cease our exploration at this point.
There is, however, one interesting gadget to do with the normal distribution
developed during the late nineteenth century. It was called the quincunx,
and was an invention of Francis Galton in 1873 or thereabouts. Figure 5.6
shows a contemporary sketch of Galton's original quincunx; Figure 5.7 is a
schematic diagram of the quincunx which more clearly aids the description of
its operation. The mathematical sections of good modern science museums
often have a working replica of this device, which forms a fascinating exhibit.
What does the quincunx do and how does it work? The idea is to obtain
in dynamic fashion a physical representation of a binomial distribution. The
word 'quincunx' actually means an arrangement of five objects in a square
or rectangle with one at each corner and one in the middle; the spots on
the '5' face of a die form a good example. Galton's quincunx was made
up of lots of these quincunxes. It consists of a glass-enclosed board with
several rows of equally spaced pins. Each row of pins is arranged so that
each pin in one row is directly beneath the midpoint of the gap between
two adjacent pins in the row above; thus each pin is the centre of a quincunx. Metal shot is poured through a funnel directed at the pin in the top
row. Each ball of shot can fall left or right of that pin with probability ½.

Figure 5.6 Galton's quincunx

Figure 5.7 Diagram of the quincunx
The same holds for all successive lower pins that the shot hits. Finally, at the
bottom, there is a set of vertical columns into which the shot falls, and a kind
of histogram or bar chart is formed.
The picture so obtained is roughly that of a unimodal symmetric distribution.
In fact, the theoretical distribution corresponding to the histogram formed by
the shot is the binomial distribution with parameters p = ½ and n equal to the
number of rows of pins, which is 19 in Galton's original device. However, a
histogram from the binomial distribution B(19, ½) looks very much the same
as a histogram from a normal distribution, so the quincunx also serves as
a method of demonstrating normal data. More precisely, the relationship
between the binomial distribution B(19, ½) and the normal distribution is
a consequence of the central limit theorem. Therefore, Laplace would have
understood the reason for us to be equally happy with the quincunx as a
device for illustrating the binomial distribution or as a device for illustrating
the normal distribution; by the end of this chapter, you will understand why.
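The behaviour of the quincunx is easy to imitate on a computer. The following Python sketch (an added illustration; the 19 rows and 10 000 balls are arbitrary choices echoing Galton's device) simulates each ball's run of left-or-right bounces, so that the final column occupied is an observation from B(19, ½), and prints a rough text histogram; the bell shape that emerges is the whole point of the device.

    import random
    from collections import Counter

    def quincunx(n_rows=19, n_balls=10000, seed=1):
        """Simulate Galton's quincunx: each ball bounces left or right with
        probability 1/2 at each of n_rows pins, so its final column (the number
        of rightward bounces) is an observation from Binomial(n_rows, 1/2)."""
        rng = random.Random(seed)
        columns = Counter()
        for _ in range(n_balls):
            rights = sum(rng.random() < 0.5 for _ in range(n_rows))
            columns[rights] += 1
        return columns

    counts = quincunx()
    for column in sorted(counts):
        print(f"{column:2d} {'*' * (counts[column] // 50)}")  # rough text histogram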
5.2 The standard normal distribution
In each of the following examples, a normal distribution has been proposed
as an adequate model for the variation observed in the measured attribute.
Example 5.1 Chest measurements
After extensive sampling, it was decided to adopt a normal model for the chest
measurement in a large population of adult males. Measured in inches, the
model parameters were (for the mean) μ = 40 and (for the standard deviation)
σ = 2.
A sketch of this normal density is shown in Figure 5.8. The area under
the curve, shown shaded in the diagram, gives (according to the model) the
proportion of adult males in the population whose chest measurements are 43
inches or more.
The chest measurement of 43 inches is greater than the average measurement
within the population, but it is not very extreme, coming well within 'plus or
minus 3 standard deviations'.
The shaded area is given by the integral

    ∫ f(x) dx,   taken over 43 ≤ x < ∞

(writing X ~ N(μ, σ²) with μ = 40 and σ = 2, and using (5.1)). But it is
much easier to think of the problem in terms of 'standard deviations away
from the mean'. The number 43 is one and a half standard deviations above
the mean measurement, 40. Our problem is to establish what proportion of
the population would be at least as extreme as this.

Figure 5.8 A sketch of the normal density f(x), where X ~ N(40, 4). (Again,
in this diagram the vertical axis has been omitted.)
Example 5.2 IQ measurements
There are many different ways of assessing an individual's 'intelligence' (and
no single view on exactly what it is that is being assessed, or how best to make
the assessment). One test is designed so that in the general population the
variability in the scores attained should be normally distributed with mean
100 and standard deviation 15. Denoting this score by the random variable
W, the statistical model is W ~ N(100, 225).
A sketch of the p.d.f. of this normal distribution is given in Figure 5.9. The
shaded area in the diagram gives the proportion of individuals who (according
to the model) would score between 80 and 120 on the test. The area may be
expressed formally as an integral of the density between 80 and 120, but again
it is easier to think in terms of a standard measure: how far away
from the mean are these two scores? At 20 below the mean, the score of 80
is 20/15 = 1.33 standard deviations below the mean, and the score of 120 is
1.33 standard deviations above the mean. Our problem reduces to this: what
proportion of the population would score within 1.33 standard deviations of
the mean (either side)?

Figure 5.9 A sketch of the normal density f(w), where W ~ N(100, 225)
Example 5.3 Osteoporosis
In Chapter 2, Example 2.17 observations were presented on the height of 351
elderly women, taken as part of a study of the bone disease osteoporosis.
A histogram of the data suggests that a normal distribution might provide
an adequate model for the variation in height of elderly women within the
general population. Suppose that the parameters of the proposed model are
μ = 160, σ = 6 (measured in cm; the model may be written H ~ N(160, 36),
where H represents the variation in height, in cm, of elderly women within
the population). According to this model, the proportion of women over
180 cm tall is rather small. The number 180 is (180 - 160)/6 = 3.33 standard
deviations above the mean: our problem is to calculate the small area shown
in Figure 5.10 or, equivalently, to calculate the corresponding integral of the
density from 180 to ∞.
5.2.1 The standard normal distribution
In all the foregoing examples, problems about proportions have been expressed
in terms of integrals of different normal densities. You have seen that a
sketch of the model is a useful aid in clarifying the problem that has been
posed. Finally, and almost incidentally, critical values have been standardized
in terms of deviations from the mean, measured in multiples of the standard
deviation.
Is this standardization a useful procedure, or are the original units of measurement essential to the calculation of proportions (that is, probabilities)?
Figure 5.10 A normal model for the variation in height of elderly women. (In
this diagram the vertical scale has been slightly distorted so that the shaded
area is evident: in a diagram drawn to scale it would not show up at all.)
The answer to this question is that it is useful: any normal random variable
X with mean μ and standard deviation σ (so that X ~ N(μ, σ²)) may be re-expressed in terms of a standardized normal random variable, usually denoted
Z, which has mean 0 and standard deviation 1. Then any probability for
observations on X may be calculated in terms of observations on the random
variable Z. This result can be proved mathematically; but in this course we
shall only be concerned with applying the result. First, the random variable
Z will be explicitly defined.
The standard normal distribution

The random variable Z following a normal distribution with mean 0 and
standard deviation 1 is said to follow the standard normal distribution, written Z ~ N(0, 1). The p.d.f. of Z is given by

    φ(z) = (1/√(2π)) exp(−z²/2),   −∞ < z < ∞.

Notice the use of the reserved letter Z for this particular random variable,
and of the letter φ for the probability density function of Z. This follows the
common conventions that you might see elsewhere. (φ is the Greek lower-case
letter phi, and is pronounced 'fye'.)

The graph of the p.d.f. of Z is shown in Figure 5.11. Again, the p.d.f. of Z
is positive for any value of z, but observations much less than −3 or greater
than +3 are unlikely. Integrating this density function gives normal probabilities. (Notice the Greek upper-case letter phi in the following definition.
It is conventionally used to denote the c.d.f. of Z.)

Figure 5.11 The p.d.f. of Z ~ N(0, 1), φ(z) = (1/√(2π)) exp(−z²/2)

The c.d.f. of the standard normal variate Z is given by

    Φ(z) = P(Z ≤ z) = ∫ φ(x) dx,   the integral taken over −∞ < x ≤ z.

It gives the 'area under the curve', shaded in the diagram of the density
of Z (see Figure 5.12).

Figure 5.12 The c.d.f. of Z, Φ(z)
Where in other parts of the course the integral notation has been used to
describe the area under the curve defined by a probability density function,
an explicit formula for the integral has been given, and that formula is used as
the starting point in future calculations. In this respect, the normal density
is unusual. No explicit formula for Φ(z) exists, though it is possible to obtain
an expression for Φ(z) in the form of an infinite series of powers of z. So,
instead, values of Φ(z) are obtained from tables or calculated on a computer.
Exercise 5.2
On four rough sketches of the p.d.f. of the standard normal distribution copied
from Figure 5.11, shade in the areas corresponding to the following standard
normal probabilities.
(a) P(Z ≤ 2)
(b) P(Z > 1)
(c) P(−1 < Z ≤ 1)
(d) P(Z ≤ −2)
Before we look at tables which will allow us to attach numerical values to
probabilities like those in Exercise 5.2, and before any of the other important properties of the standard normal distribution are discussed, let us pause
to establish the essential relationship between the standard normal distribution and other normal distributions. It is this relationship that allows us to
calculate probabilities associated with (for example) Victorian soldiers' chest
measurements or mapmakers' measurement errors, or any other situation for
which the normal distribution provides an adequate model.
Once again, let X follow a normal distribution with arbitrary mean μ and
variance σ², X ~ N(μ, σ²), and write Z for the standard normal variate,
Z ~ N(0, 1). These two random variables are related as follows.
If X ~ N(μ, σ²), then the random variable

    Z = (X − μ)/σ ~ N(0, 1).

Conversely, if Z ~ N(0, 1), then the random variable

    X = σZ + μ ~ N(μ, σ²).
The great value of this result is that we can afford to do most of our thinking
about normal probabilities in terms of the easier standard normal distribution
and then adjust results appropriately, using these simple relationships between
X and Z , to answer questions about any given general normal random variable.
Figure 5.13 gives a graphical representation of the idea of standardization.
The shaded area gives the probability P(X ≥ μ + 2σ) = P(Z ≥ 2).

Figure 5.13 Standardization portrayed graphically
We can now formalize the procedures of Examples 5.1 to 5.3.
Example 5.1 continued
Our model for chest measurements (in inches) in a population of adult males is
normal with mean 40 and standard deviation 2: this was written as
X ~ N(40, 4). We can rewrite the required probability P(X > 43) as

    P(X > 43) = P((X − 40)/2 > (43 − 40)/2) = P(Z > 1.5).

This is illustrated in Figure 5.14, which may be compared directly with
Figure 5.8.
Example 5.2 continued
In this case the random variable of interest is the intelligence score W, where
W ~ N(100, 225), and we require the probability P(80 ≤ W ≤ 120). This
may be found by rewriting it as follows:

    P(80 ≤ W ≤ 120) = P((80 − 100)/15 ≤ Z ≤ (120 − 100)/15) = P(−1.33 ≤ Z ≤ 1.33).

This probability is illustrated by the shaded region in Figure 5.15 (and see
also Figure 5.9).

Figure 5.14 The probability P(Z > 1.5)

Figure 5.15 The probability P(−1.33 ≤ Z ≤ 1.33)
Example 5.3 continued
In Example 5.3 a normal model H ~ N(160, 36) was proposed for the height
distribution of elderly women (measured in cm). We wanted to find the
proportion of this population who are over 180 cm tall. This probability
P(H > 180) can be rewritten

    P(H > 180) = P((H − 160)/6 > (180 − 160)/6) = P(Z > 3.33),

and is represented by the shaded area in Figure 5.16. The diagram may be
compared with that in Figure 5.10. (As in Figure 5.10, the vertical scale in
this diagram has been slightly distorted.)

Figure 5.16 The probability P(Z > 3.33)
Exercise 5.3
(a) Measurements were taken on the level of ornithine carbonyltransferase
(a liver enzyme) present in individuals suffering from acute viral hepatitis.
After a suitable transformation, the corresponding random variable may
be assumed to be adequately modelled by a normal distribution with
mean 2.60 and standard deviation 0.33. Show on a sketch of the standard
normal density the proportion of individuals with this condition, whose
measured enzyme level exceeds 3.00.
(b) For individuals suffering from aggressive chronic hepatitis, measurements
on the same enzyme are normally distributed with mean 2.65 and standard deviation 0.44. Show on a sketch of the standard normal density the
proportion of individuals suffering from aggressive chronic hepatitis with
an enzyme level below 1.50.
(c) At a ball-bearing production site, a sample of 10 ball-bearings was taken
from the production line and their diameters measured (in mm). The
recorded measurements were
(i) Find the mean diameter x̄ and the standard deviation s for the sample.
(ii) Assuming that a normal model is adequate for the variation in
measured diameters, and using x̄ as an estimate for the normal parameter
μ and s as an estimate for σ, show on a sketch of the standard normal density the proportion of the production output whose diameter is between
0.8 mm and 1.2 mm.
See Chapter 2, Table 2.18.
See Chapter 2, Table 2.19.
The foregoing approach may be summarized simply as follows.
Calculating normal probabilities

If the random variable X follows a normal distribution with mean μ and
variance σ², written X ~ N(μ, σ²), then the probability P(X ≤ x) may
be written

    P(X ≤ x) = Φ((x − μ)/σ),

where Φ(z) is the c.d.f. of the standard normal distribution.
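In Python (an added sketch, not the course's own software, and using only the standard library) the boxed rule can be applied directly: Φ is available through the error function, and the three examples above can be checked against the printed tables.

    import math

    def Phi(z):
        """Standard normal c.d.f.: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def normal_cdf(x, mu, sigma):
        """P(X <= x) for X ~ N(mu, sigma^2), via standardization."""
        return Phi((x - mu) / sigma)

    # Example 5.1: chest measurements, X ~ N(40, 4); P(X > 43)
    print(round(1 - normal_cdf(43, 40, 2), 4))                           # about 0.0668
    # Example 5.2: IQ scores, W ~ N(100, 225); P(80 <= W <= 120)
    print(round(normal_cdf(120, 100, 15) - normal_cdf(80, 100, 15), 4))  # about 0.8176
    # Example 5.3: heights, H ~ N(160, 36); P(H > 180)
    print(round(1 - normal_cdf(180, 160, 6), 5))                         # about 0.00043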
5.2.2 Tables of the standard normal distribution
We are not yet able to assign numerical values to the probabilities so far
represented only as shaded areas under the curve given by the standard normal
density function. What is the probability that an IQ score is more than 115?
What proportion of Victorian Scottish soldiers had chests measuring 38 inches
or less? What is the probability that measurement errors inherent in the
process leading to Merriman's data would be less than 2 minutes of arc in
absolute value?
The answer to all such questions is found by reference to sets of printed tables,
or from a computer. In this subsection you will see how to use the table of
standard normal probabilities, Table A2.
You have already seen that any probability statement about the random variable X (when X is N(μ, σ²)) can be re-expressed as a probability statement
about Z (the standard normal variate). So only one page of tables is required:
we do not need reams of paper to print probabilities for other members of the
normal family. To keep things simple, therefore, we shall begin by finding
probabilities for values observed on Z, and only later make the simple extension to answering questions about more general normally distributed random
variables useful in modelling the real world. The statistics table entitled
'Probabilities for the standard normal distribution' gives the left-hand tail
probability

    Φ(z) = P(Z ≤ z)

for values of z from 0 to 4 by steps of 0.01, printed accurate to 4 decimal places.
(Other versions of this table might print the probability P(Z ≥ z) for a range
of values of z; or the probability P(0 ≤ Z ≤ z); or even P(−z ≤ Z ≤ z)!
There are so many variations on possible questions that might be asked, that
no one formulation is more convenient than any other.)

Values of z are read off down the leftmost column and across the top row (the
top row gives the second decimal place). Thus the probability P(Z ≤ 1.58),
for example, may be found by reading across the row for z = 1.5 until the
column headed 8 is found.
Then the entry in the body of the table in the same row and column gives the
probability required: in this case, it is 0.9429. (So, only about 6% of a normal
population measure in excess of 1.58 standard deviations above the mean.)
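If you want to check the entry quoted above, the row of Table A2 for z = 1.5 can be regenerated in Python (an added sketch; it assumes the table tabulates Φ(z) = P(Z ≤ z) to 4 decimal places, as described).

    import math

    def Phi(z):
        """Left-hand tail probability P(Z <= z) for the standard normal distribution."""
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # Reproduce the row for z = 1.5; the columns give the second decimal place.
    row = 1.5
    print(row, " ".join(f"{Phi(row + k / 100):.4f}" for k in range(10)))
    # The entry in the column headed 8 should read 0.9429.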
As a second example, we can find the probability P(Z ≤ 3.00) so frequently
mentioned. In the row labelled 3.0 and the column headed 0, the entry in
the table is 0.9987, and this is the probability required. It follows that only
a proportion 0.0013, about one-tenth of one per cent, will measure in excess
of 3 standard deviations above the mean, in a normal population. These
probabilities can be illustrated on sketches of the standard normal density, as
shown in Figure 5.17.
Figure 5.17 (a) P(Z ≤ 1.58) (b) P(Z ≤ 3.00)
Exercise 5.4
Use the table to find the following probabilities.
(a) P(Z ≤ 1.00)
(b) P(Z ≤ 1.96)
(c) P(Z ≤ 2.25)
Illustrate these probabilities in sketches of the standard normal density.
Of course, required probabilities will not necessarily always be of the form
P(Z ≤ z). For instance, we might need to find probabilities of the form
P(Z ≥ z) or P(z1 ≤ Z ≤ z2).
In such cases it often helps to draw a rough sketch of what is required and
include on the sketch information obtained from tables. The symmetry of the
normal distribution will often prove useful; as will the fact that the total area
under the standard normal curve is 1. To find P(Z ≥ 1.50), for example, we
would start with a sketch of the standard normal density, showing the area
required, as in Figure 5.18.
From the tables, we find that the probability P(Z ≤ 1.50) is 0.9332. By
subtraction from 1, it follows that the probability required, the area of the
shaded region, is 1 − 0.9332 = 0.0668. This is illustrated in Figure 5.19.

Figure 5.18 The area required, P(Z ≥ 1.50)

Figure 5.19 P(Z ≥ 1.50) = 1 − 0.9332 = 0.0668
Example 5.4 Calculating normal probabilities after standardization
According to the particular design of IQ tests which results in scores that
are normally distributed with mean 100 and standard deviation 15, what
proportion of the population tested will record scores of 120 or more?
The question may be expressed in terms of a normally distributed random
variable X ~ N(100, 225) as 'find the probability P(X ≥ 120)'. This is found
by standardizing X, thus transforming the problem into finding a probability
involving Z:

    P(X ≥ 120) = P(Z ≥ (120 − 100)/15) = P(Z ≥ 1.33).

This is found from the tables to be 1 − Φ(1.33) = 1 − 0.9082 = 0.0918. Not
quite 10% of the population will score 120 or more on tests to this design.
This sort of example demonstrates the importance of the standard deviation
in quantifying 'high' scores. Similarly, less than 2.5% of the population will
score 130 or more:

    P(X ≥ 130) = P(Z ≥ 2.00) = 1 − 0.9772 = 0.0228.
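The same two tail probabilities can be confirmed numerically; in this Python sketch (an added illustration) the small differences from the table-based figures arise only because the tables round z to two decimal places.

    from math import erf, sqrt

    def Phi(z):
        """Standard normal c.d.f."""
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    mu, sigma = 100, 15                      # the IQ model W ~ N(100, 225)
    for score in (120, 130):
        z = (score - mu) / sigma
        print(score, round(1 - Phi(z), 4))   # about 0.0912 and 0.0228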
Exercise 5.8
A reasonable model for the nineteenth century Scottish soldiers' chest measurements is to take X ~ N(40, 4) (measurements in inches). What proportion of
that population would have had chest measurements between 37 inches and
42 inches inclusive?
At this point you might wonder precisely how the actual data (the 5732
Scottish soldiers' chest measurements) enter the calculation. They figure
implicitly in the first sentence of the exercise: a reasonable model for the
distribution of the data is N(40, 4). That the normal distribution provides a
reasonable model for the general shape can be seen by looking at the histogram
in Figure 5.2. That 40 is a reasonable value to take for μ and 4 for σ² can be
seen from calculations based on the data, which we shall explore further in
Chapter 6. Once the data have been used to formulate a reasonable model,
then future calculations can be based on that model.
Exercise 5.9
A good model for the angular measurement errors (minutes of arc) mentioned
in Section 5.1 is that they be normally distributed with mean 0 and variance
2.75. What is the probability that such an error is positive but less than 2?
Exercise 5.10
Blood plasma nicotine levels in smokers (see Chapter 2, Table 2.16) can be
modelled as T ~ N(315, 131² = 17 161). (The units are nanograms per millilitre, ng/ml.)
(a) Make a sketch of this distribution marking in μ + kσ for k = −3, −2, −1,
0, 1, 2, 3.
(b) What proportion of smokers has nicotine levels lower than 300? Sketch
the corresponding area on your graph.
(c) What proportion of smokers has nicotine levels between 300 and 500?
(d) If 20 other smokers are to be tested, what is the probability that at most
one has a nicotine level higher than 500?
Here the adequacy of a normal
model becomes questionable.
Notice that a nicotine level of zero
is only 315/131 = 2.40 standard
deviations below the mean. A
normal model would thus permit a
proportion of Φ(−2.40) = 0.008
negative recordings, though
negative recordings are not
realizable in practice.
5.2.3 Quantiles
So far questions of this general form have been addressed: if the distribution of
the random variable X ~ N(μ, σ²) is assumed to be an adequate model for the
variability observed in some measurable phenomenon, with what probability
P(x1 ≤ X ≤ x2) will some future observation lie within stated limits? Given
the boundaries illustrated in Figure 5.21, we have used the tables to calculate
the shaded area representing the probability P(x1 ≤ X ≤ x2).

Figure 5.21 The probability P(x1 ≤ X ≤ x2)

Conversely, given a probability α we might wish to find x such that
P(X ≤ x) = α. For instance, assuming a good model of IQ scores to be
N(100, 225), what score is attained by only the top 2.5% of the population?
This problem is illustrated in Figure 5.22. Quantiles were defined in Chapter 3, Section 3.5. For a continuous random variable X with c.d.f. F(x), the
α-quantile is the value x which is the solution to the equation F(x) = α, where
0 < α < 1. This solution is denoted qα.

You may remember these special cases: the lower quartile, q0.25 or qL; the
median, q0.5 or m; and the upper quartile, q0.75 or qU. These are shown in
Figure 5.23 for the standard normal distribution.
The median of Z is clearly 0: this follows from the symmetry of the normal
distribution. From the tables, the closest we can get to qU is to observe that

    Φ(0.67) = 0.7486,   Φ(0.68) = 0.7517,

so (splitting the difference) perhaps qU ≈ 0.675 or thereabouts. It would be
convenient to have available a separate table of standard normal quantiles,
and this is provided in Table A3. The table gives values of qα to 3 decimal
places for various values of α from 0.5 to 0.999.

So, for instance, the upper quartile of Z is q0.75 = 0.674; the 97.5% point of Z
is q0.975 = 1.96. If X ~ N(μ, σ²), then it follows from the relationship
X = σZ + μ that the 97.5% point of X is 1.96σ + μ. So the unknown IQ score
illustrated in Figure 5.22 is 1.96 × 15 + 100 = 129.4.

Figure 5.22 The 97.5% point of N(100, 225) (x unknown)

Figure 5.23 qL, m, qU for Z ~ N(0, 1)
The symmetry of the normal distribution may also be used to find quantiles
lower than the median. For instance, the 30% point of Z is

    q0.3 = −q0.7 = −0.524.
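Where a computer is available, quantiles can be read off with an inverse normal c.d.f. rather than from Table A3. The sketch below (an added illustration, assuming the scipy library rather than the course's own software) reproduces the quantiles just quoted, including the unknown IQ score of Figure 5.22.

    from scipy.stats import norm

    # Quantiles of the standard normal distribution Z ~ N(0, 1).
    print(round(norm.ppf(0.75), 3))     # upper quartile, about 0.674
    print(round(norm.ppf(0.975), 3))    # 97.5% point, 1.960
    print(round(norm.ppf(0.30), 3))     # 30% point, about -0.524

    # The 97.5% point of N(mu, sigma^2) is mu + 1.96 * sigma; for the IQ model
    # N(100, 225) this is the score attained by only the top 2.5% of the population.
    print(round(norm.ppf(0.975, loc=100, scale=15), 1))   # about 129.4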
Exercise 5.11
Find q0.2, q0.4, q0.6, q0.8 for the distribution of IQ test scores, assuming the
normal distribution N(100,225) to be an adequate model, and illustrate these
quantiles in a sketch of the distribution of scores.
There now follows a further exercise summarizing the whole of this section
so far. Take this opportunity to investigate the facilities available on your
computer to answer this sort of question.
While the tables often provide the quickest and easiest way of obtaining normal probabilities to answer isolated questions, in other circumstances it is
more convenient to use a computer, and computer algorithms have been developed for this purpose. In general, too, computers work to a precision much
greater than 4 decimal places, and more reliance can be placed on results
which, without a computer, involve addition and subtraction of several probabilities read from the tables.
Exercise 5.12
The answers given to the various questions in this exercise are all based on
computer calculations. There may be some inconsistencies between these
answers and those you would obtain if you were using the tables, with all the
implied possibilities of rounding error. However, these inconsistencies should
never be very considerable.
(a) The random variable Z has a standard normal distribution N(0, 1). Use
your computer to find the following.
(i) P(Z ≥ 1.7)
(ii) P(Z ≥ −1.8)
(iii) P(−1.8 ≤ Z ≤ 2.5)
(iv) P(1.5 ≤ Z ≤ 2.8)
(v) q0.10, the 10% point of the distribution of Z
(vi) q0.95, the 95% point of the distribution of Z
(vii) q0.975, the 97.5% point of the distribution of Z
(viii) q0.99, the 99% point of the distribution of Z
(b) Let X be a randomly chosen individual's score on an IQ test. By the
design of the test, it is believed that X ~ N(100, 225).
(i) What is the probability that X is greater than 125?
(ii) What is the probability P(80 ≤ X ≤ 90)?
(iii) What is the median of the distribution of IQ scores?
(iv) What IQ score is such that only 10% of the population have that
score or higher?
(v) What is the 0.1-quantile of the IQ distribution?
(c) Suppose the heights (in cm) of elderly females follow a normal distribution with mean 160 and standard deviation 6.
(i) What proportion of such females are taller than 166 cm?
(ii) What is the 0.85-quantile of the distribution of females' heights?
(iii) What is the interquartile range of the distribution? (The population
interquartile range is the difference between the quartiles.)
(iv) What is the probability that a randomly chosen female has height
between 145 and 157 cm?
(d) Nicotine levels in smokers are modelled by a random variable T with a
normal distribution N(315, 17 161).
(i) What is the probability that T is more than 450?
(ii) What is the 0.95-quantile of the nicotine level distribution?
(iii) What is the probability P(150 < T < 400)?
(iv) What is the probability P(|T − 315| ≤ 100)?
(v) What nicotine level is such that 20% of smokers have a higher level?
(vi) What range of levels is covered by the central 92% of the smoking
population?
(vii) What is the probability that a smoker's nicotine level is between
215 and 300 or between 350 and 400?
5.2.4 Other properties of the normal distribution
In Chapter 4 you looked at some properties of sums and multiples of random
variables. In particular, if the random variables X1, ..., Xn are independent
with mean μi and variance σi², then their sum ΣXi has mean Σμi and variance Σσi².
You learned the particular result that sums of independent Poisson variates
also follow a Poisson distribution.
A corresponding result holds for sums of independent normal random variables: they follow a normal distribution.

If Xi are independent normally distributed random variables with mean
μi and variance σi², i = 1, 2, ..., n, then their sum ΣXi is also normally
distributed, with mean Σμi and variance Σσi²:

    ΣXi ~ N(Σμi, Σσi²).

This result is stated without proof.
Example 5.5 Bags of sugar
Suppose that the normal distribution provides an adequate model for the
weight X of sugar in paper bags of sugar labelled as containing 2 kg. There is
some variability, and to avoid penalties the manufacturers overload the bags
slightly. Measured in grams, suppose X ~ N(2003, 1).
In fact, items marked with the 'e'
next to their weight do
weigh 2 kg (or whatever) on
average, and that is all that a
manufacturer might be required to
demonstrate.
It follows that the probability that a bag is underweight is given by

    P(X < 2000) = P(Z < (2000 − 2003)/1) = P(Z < −3) = 1 − Φ(3) = 0.0013.

So about one bag in a thousand is underweight.

A cook requiring 6 kg of sugar to make marmalade purchases three of the
bags. The total amount of sugar purchased is the sum

    S = X1 + X2 + X3.

Assuming independence between the weights of the three bags, their expected
total weight is

    E(S) = 2003 + 2003 + 2003 = 6009,

and the variance in the total weight is

    V(S) = σ1² + σ2² + σ3² = 3;

that is, S ~ N(6009, 3). The standard deviation in the total weight is

    SD(S) = √3 = 1.732 gm.

The probability that altogether the cook has too little sugar for the purpose
(less than 6 kg) is given by

    P(S < 6000) = P(Z < (6000 − 6009)/1.732) = P(Z < −5.20).

This probability is negligible. (Your computer, if you are using one, will give
you the result 1.02 × 10⁻⁷, about one in ten million!)
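Both calculations in this example are quick to reproduce; the following Python sketch (an added illustration, assuming scipy is available) evaluates the two probabilities directly from the stated models.

    from scipy.stats import norm

    # A single bag: X ~ N(2003, 1) grams; probability of falling below the labelled 2000 g.
    print(norm.cdf(2000, loc=2003, scale=1))           # about 0.00135, one bag in a thousand

    # Three independent bags: S ~ N(6009, 3), so SD(S) = sqrt(3).
    print(norm.cdf(6000, loc=6009, scale=3 ** 0.5))    # about 1.0e-07, one in ten million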
You also saw in Chapter 4 that if the random variable X has mean μ and
variance σ², then for constants a and b, the random variable aX + b has mean
and variance

    E(aX + b) = aμ + b,    V(aX + b) = a²σ².

This holds whatever the distribution of X. However, if X is normally distributed, the additional result holds that aX + b is also normally distributed.

If X is normally distributed with mean μ and variance σ², written
X ~ N(μ, σ²), and if a and b are constants, then

    aX + b ~ N(aμ + b, a²σ²).
5.3 The central limit theorem
In the preceding sections of this chapter and at the first mention of the normal
distribution in Chapter 2, it has been stressed that the distribution has an
important role in statistics as a good approximate model for the variability
inherent in measured quantities in all kinds of different contexts.
This section is about one of the fundamental results of statistical theory: it
describes particular circumstances where the normal distribution arises not in
the real world (chest measurements, enzyme levels, intelligence scores), but at
the statistician's desk. The result is stated as a theorem, the central limit
theorem. It is a theoretical result, and one whose proof involves some deep
mathematical analysis: we shall be concerned, however, only with its consequences, which are to ease the procedures involved when seeking to deduce
characteristics of a population from characteristics of a sample drawn from
that population.
5.3.1 Characteristics of large samples
The central limit theorem is about the distributions of sample means and
sample totals. You met these sample quantities in Chapter 1. Suppose we
have a random sample of size n from a population. The data items in the
sample may be listed

    x1, x2, ..., xn.

The sample total is simply the sum of all the items in the data set:

    tn = x1 + x2 + ... + xn.

The sample mean is what is commonly called the 'average', the sample total
divided by the sample size:

    x̄n = tn / n.

Notice that in both these labels, tn and x̄n, the subscript n has been included.
This makes explicit the size of the sample from which these statistics have been
calculated.
We know that in repeated sampling experiments from the same population
and with the same sample size, we would expect to observe variability in the
individual data items and also in the summary statistics, the sample total and
the sample mean. In any single experiment therefore, the sample total tn is
just one observation on a random variable Tn; and the sample mean x̄n is just
one observation on a random variable X̄n.
You saw in Chapter 4, that notwithstanding this variability in the summary
statistics, they are useful consequences of the experiment. In particular, assuming the population mean μ and the population variance σ² to be unknown,
the following important result for the distribution of the mean of samples of
size n was obtained:

    E(X̄n) = μ,    V(X̄n) = σ²/n.

(See Chapter 4, page 157.)

That is, if a sample of size n is collected from a large population, and if that
sample is averaged to obtain the sample mean, then the number obtained, x̄n,
should constitute a reasonable estimate for the unknown population mean μ.
Moreover, the larger the sample drawn, the more reliance can be placed on
the number obtained, since the larger the value of n, the less deviation that
should be observed in x̄n from its expected value μ.
Exercise 5.13
(a) Obtain a sample of size 5 from a Poisson distribution with mean 8, and
calculate the sample mean x̄5. Next, obtain 100 observations on the
random variable X̄5. How many of the 100 observations (all 'estimating
the number 8') are between 6 and 10?
(b) Now obtain a sample of size 20 from a Poisson distribution with mean 8
and calculate the sample mean x̄20. Obtain 100 observations altogether
on X̄20. How many of these are between 6 and 10? How many are between
7 and 9?
(c) Now obtain a sample of size 80 from a Poisson distribution with mean 8,
and calculate the sample mean x̄80. Obtain 100 observations on X̄80, and
calculate the number of them that are between 7 and 9.
(d) Summarize in non-technical language any conclusions you feel able to
draw from the experiments of parts (a) to (c).
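One possible way of carrying out the sampling experiments described in this exercise, if you are working in Python rather than the course software, is sketched below (an added illustration; the seed and the use of numpy are arbitrary choices).

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def sample_means(sample_size, n_repeats=100, mean=8):
        """Draw n_repeats samples of the given size from Poisson(mean)
        and return the sample mean of each."""
        draws = rng.poisson(lam=mean, size=(n_repeats, sample_size))
        return draws.mean(axis=1)

    for n in (5, 20, 80):
        means = sample_means(n)
        in_6_10 = int(np.sum((means >= 6) & (means <= 10)))
        in_7_9 = int(np.sum((means >= 7) & (means <= 9)))
        print(f"n = {n:2d}: {in_6_10} of 100 means in [6, 10], {in_7_9} in [7, 9]")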
Exercise 5.14
Investigate the sampling properties of means of samples of size 5, 20, 80 from
the exponential distribution with mean 8.
In Exercise 5.13, and Exercise 5.14 if you tried it, the same phenomenon
should have been evident: that is, variation in the sample mean is reduced as
the sample size increases.
But all this is a consequence of a result that you already know, and have known
for some time: the point was made in Chapter 4 that increasing the sample
size increases the usefulness of x̄ as an estimate for the population mean μ.
However, knowledge of the mean (E(X̄n) = μ) and variance (V(X̄n) = σ²/n)
of the sample mean does not permit us to make probability statements about
likely values of the sample mean, because we still do not know the shape of
its probability distribution.
Exercise 5.15
(a) The exponential distribution is very skewed with a long right tail.
Figure 5.24 is a sketch of the density for an exponentially distributed
random variable with mean 1.
(i) Generate 100 observations on the random variable X̄2 from this distribution; obtain a histogram of these observations.
(ii) Now generate 100 observations on the random variable X̄30 from this
distribution. Obtain a histogram of these observations.
(iii) Comment on any evident differences in the shape of the two histograms.
Figure 5.24 The exponential density f(x) = exp(−x), x ≥ 0
(b) The continuous uniform distribution is flat. The density of the uniform
distribution U(0, 2) (with mean 1) is shown in Figure 5.25.
(i) Generate 100 observations on the random variable X̄2 from this distribution and obtain a histogram of these observations.
(ii) Now generate 100 observations on X̄30, and obtain a histogram of
the observations.
(iii) Are there differences in the shape of the two histograms?

Figure 5.25 The uniform distribution U(0, 2)
5.3.2 Statement of the theorem
The point illustrated by the solution to Exercise 5.15 is that even for highly
non-normal populations, repeated experiments to obtain the sample mean
result in observations that peak at the population mean μ, with frequencies
tailing off roughly symmetrically above and below the population mean. This
is a third phenomenon to add to the two results noted already, giving the
following three properties of the sample mean.
(a) In a random sample from a population with unknown mean μ,
the sample mean is a good indicator of the unknown number μ
(E(X̄n) = μ).
(b) The larger the sample, the more reliance can be placed on the sample
mean as an estimator for the unknown number μ (V(X̄n) = σ²/n).
(c) Notwithstanding any asymmetry in the parent population, and for
samples of sufficient size, the sample mean in repeated experiments
overestimates or underestimates the population mean μ with roughly
equal probability. Specifically, the distribution of the sample mean
is approximately 'bell-shaped'.
It is also of interest that this bell-shaped effect happens not just with highly
asymmetric parent populations, but also when the parent population is discrete. Figure 5.26 shows the histogram that resulted when 1000 observations
were taken on X̄30 from a Poisson distribution with mean 2.

Figure 5.26 1000 observations on X̄30 from Poisson(2)
Again, the 'bell-shaped' nature of the distribution of the sample mean is
apparent in this case.
Putting these three results together leads us to a statement of the central
limit theorem.
The central limit theorem

If X1, X2, ..., Xn are n independent and identically distributed random
observations from a population with mean μ and finite variance σ², then
for large n the distribution of their mean X̄n is approximately normal
with mean μ and variance σ²/n: this is written

    X̄n ≈ N(μ, σ²/n).

The symbol '≈' is read 'has approximately the same distribution as'.
The theorem is an asymptotic result: that is, the approximation improves
as the sample size increases. The quality of the approximation depends on
a number of things including the nature of the population from which the
n observations are drawn, and one cannot easily formulate a rule such as
'the approximation is good for n at least 30'. There are cases where the
approximation is good for n as small as 3; and cases where it is not so good
even for very large n . However, certain 'rules of thumb' can be developed for
the common applications of this theorem, as you will see. One thing that is
certain is that in any sampling context the approximation will get better as
the sample size increases.
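The improvement with increasing n can be seen numerically. In the Python sketch below (an added illustration, assuming scipy), the parent population is exponential with mean 1, for which the mean of n observations has an exact Gamma(n, 1/n) distribution; the central limit theorem approximation N(1, 1/n) is compared with it for one illustrative probability.

    from scipy.stats import gamma, norm

    # Mean of n independent exponential observations, each with mean 1:
    # exactly Gamma(shape=n, scale=1/n); the CLT approximation is N(1, 1/n).
    for n in (3, 10, 30, 100):
        exact = gamma.cdf(1.2, a=n, scale=1 / n)                # P(sample mean <= 1.2)
        approx = norm.cdf(1.2, loc=1, scale=(1 / n) ** 0.5)     # normal approximation
        print(f"n = {n:3d}: exact {exact:.4f}, normal approximation {approx:.4f}")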
5.3.3 A corollary to the theorem
We have concentrated so far on the distribution of the mean of a sample of
independent identically distributed random variables: this has evident applications to estimation, as we have seen.
As well as the mean X̄ₙ, we might also be interested in the total Tₙ of n
independent identically distributed random variables. This has mean and
variance given by

    E(Tₙ) = nμ,    V(Tₙ) = nσ².

A corollary to the central limit theorem states that for large n the distribution
of the sample total Tₙ is approximately normal, with mean nμ and variance
nσ²:

    Tₙ = X₁ + X₂ + · · · + Xₙ ≈ N(nμ, nσ²).
Example 5.6 A traffic census
In a traffic census, vehicles are passing an observer in such a way that the
waiting time between successive vehicles may be adequately modelled by an
exponential distribution with mean 15 seconds. As it passes, certain details of
each vehicle are recorded on a sheet of paper; each sheet has room to record
the details of twenty vehicles.
What, approximately, is the probability that it takes less than six minutes to
fill one of the sheets?
If the waiting time T measured in seconds has mean 15, then we know from
properties of the exponential distribution that it has standard deviation 15
and variance 225. The time taken to fill a sheet is the sum

    W = T₁ + T₂ + · · · + T₂₀

of twenty such waiting times. Assuming the times to be independent, then

    E(W) = 20 × 15 = 300

and

    V(W) = 20 × 225 = 4500.

Also, by the central limit theorem, W is approximately normally distributed:

    W ≈ N(300, 4500).

We need to find the probability that the total time W is less than six minutes:
that is, less than 360 seconds, seconds being our unit of measurement. This
is given by

    P(W < 360) ≈ P(Z < (360 − 300)/√4500) = P(Z < 0.89).
From the tables, this probability is 0.8133. (Using a computer directly without
introducing incidental approximations yields the answer 0.8145.)
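This calculation is easily reproduced by computer. The sketch below is an addition to the text: it evaluates the normal approximation using only the standard library, and (assuming the scipy library is available) compares it with the exact probability, which follows because a sum of independent exponential waiting times has a gamma distribution.

```python
# Traffic census sketch: normal approximation to P(W < 360) for the sum W of
# twenty exponential waiting times with mean 15 seconds, plus the exact value.
from math import erf, sqrt
from scipy.stats import gamma  # used only for the exact comparison

def normal_cdf(x, mean, var):
    """Phi((x - mean)/sd), computed via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * var)))

n, mu, sigma2 = 20, 15.0, 225.0          # twenty waiting times, each with mean 15
mean_W, var_W = n * mu, n * sigma2       # E(W) = 300, V(W) = 4500

approx = normal_cdf(360.0, mean_W, var_W)     # central limit theorem approximation
exact = gamma.cdf(360.0, a=n, scale=mu)       # W has a Gamma(20, scale 15) distribution

print(f"normal approximation: {approx:.4f}")  # about 0.8145
print(f"exact (gamma) value : {exact:.4f}")
```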
Exercise 5.16
A dentist keeps track, over a very long period, of the time T it takes her to
attend to individual patients at her surgery. She is able to assess the average
duration of a patient's visit, and the variability in duration, as follows:
μ = 20 minutes,
σ = 15 minutes.
(In reality, she arrives at these estimates through the sample mean and sample
standard deviation of her data collection; but these will suffice as parameter
estimates.)
A histogram of her data proves to be extremely jagged, suggestive of none of
the families of distributions with which she is familiar. (Although the data
set is large, it is not sufficiently large to result in a smooth and informative
histogram.)
Her work starts at 9.00 each morning. One day there are 12 patients waiting
in the waiting room: her surgery is scheduled to end at noon.
What (approximately) is the probability that she will be able to attend to all
12 patients within the three hours?
See (4.25).
Exercise 5.17
Rather than keep an accurate record of individual transactions, the holder of a
bank account only records individual deposits into and withdrawals from her
account to the nearest pound. Assuming that the error in individual records
may be modelled as a continuous uniform random variable U(−½, ½), what is
the probability that at the end of a year in which there were 400 transactions,
her estimate of her bank balance is less than ten pounds in error?
(Remember, if the random variable W is U(a, b), then W has variance
(b − a)²/12.)
5.4 Normal approximations to continuous
distributions
The probability density function of the normal distribution is a symmetric
bell-shaped curve: many other random variables which are not exactly normally distributed nonetheless have density functions of a qualitatively similar
form. So, when, as so often, it is difficult to determine a precise model, using
a normal distribution as an approximation and basing our efforts on that is
an appealing approach.
In many cases, the central limit theorem is the explanation for the apparently
normal nature of a distribution: the random variables we are interested in
are really made up of sums or averages of other independent identically distributed random variables, and so the central limit theorem applies to explain
the resulting approximate normal distribution. More than that, the central
limit theorem tells us the appropriate mean and variance of the approximate
normal distribution in terms of the mean and variance of the underlying random variables. So probabilities may be calculated approximately by using the
appropriate normal distribution. In Exercises 5.16 and 5.17, you have already
done this when given examples of underlying distributions and questions explicitly framed in terms of sums of the associated random variables. But we
can also use normal approximations in cases where we know the exact distribution, but where it is not easy to work with the exact result. Examples of
this include the binomial distribution-recall from Chapter 2, Section 2.3 that
binomial random variables are sums of independent identically distributed
Bernoulli random variables-and the Poisson distribution (sums of independent Poisson variates are again Poisson variates). Normal approximations to
the binomial and Poisson distributions are considered further in Section 5.5.
How large a sample is needed for the central limit theorem to apply? The
central limit theorem is a limiting result that, we have seen, we can use as
an approximate result for finite sample size: when is that approximation
good? Unfortunately, there is no neat single answer to these questions. It all
depends on the particular underlying distribution we are concerned with; for
the binomial distribution a 'rule of thumb' that has been established over long
experience will be given, to provide a reasonable guide. For some distributions,
approximate normality can hold for surprisingly small n (like 5 or 10), even
when the underlying distribution is very non-normal.
This section deals only with normal approximations to continuous distributions. Normal approximations to discrete distributions will be considered
in Section 5.5.
In Exercise 5.13 you used a computer to mimic the repeated drawing of
samples from a Poisson distribution; the results of that sampling experiment
were illustrated using histograms, and that was your first intimation of the
consequences of the central limit theorem. In Exercise 5.14, if you had the
time to try it, you would have seen the phenomenon repeated for a continuous
model, the highly skewed exponential distribution.
In this section we will also look at the densities of means and sums of continuous variables, rather than at histograms, their jagged sampling analogues.
That is to say, we shall be considering the exact form of the distribution
obtained when continuous random variables are added together.
Now, so far, the only exact result we have used is that the sum of normal
random variables is itself normally distributed (remember Example 5.5 where
weights of bags of sugar were added together). Even that result was merely
stated, and not proved. Otherwise, we have used approximate results based
on the central limit theorem. The problem is that, in general, the exact
distribution of a sum of continuous random variables is rather difficult to
obtain. For the three examples that follow, you are not expected to appreciate
all the theoretical detail underlying the results; indeed, not much detail is
given. Just try to understand the main message behind the examples. This
section is entirely illustrative.
Example 5.7 Summing exponential random variables
The exponential random variable X ~ M(1) has mean 1 and variance 1; its
probability density function is given by

    f(x) = e^(-x),    x ≥ 0.

The density, sketched in Figure 5.24, illustrates the highly skewed nature of
the distribution.
The mean of samples of size 2 from this distribution,

    X̄₂ = ½(X₁ + X₂),

has mean 1 and variance ½. The p.d.f. of X̄₂ is not something you need to
know, far less be able to obtain; but the shape of the density of X̄₂ is
given in Figure 5.27.
Figure 5.27  The density of X̄₂ when X ~ M(1)

The variance is σ²/n, where σ² = 1 and n = 2.
Already you can see the reduction in the skewness of the density: although
far from being symmetric, there is a very apparent peak, and the density tails
off either side of this peak.
Figure 5.28 shows the density of X̄₁₀, the mean of samples of size 10 from the
exponential distribution with mean 1. This random variable has mean 1 and
variance 1/10.

Figure 5.28  The density of X̄₁₀ when X ~ M(1)
The dashed curve shown in Figure 5.28 is that of the normal density with
mean 1 and variance 1/10. You can see that the two curves are very similar.
For modelling purposes, one might as well use the approximating and tractable
normal curve: the exact distribution of X̄₁₀ is not at all simple.
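The comparison in Figure 5.28 can be reproduced numerically. The sketch below is an addition to the text (the grid of x values is an arbitrary choice, and the scipy library is assumed to be available); it uses the fact that the mean of ten independent unit exponential variables has a gamma distribution, and tabulates that exact density alongside the approximating normal density N(1, 1/10).

```python
# Compare the exact density of the mean of 10 unit exponentials (a gamma
# distribution: the sum is Gamma(10, 1), so the mean is Gamma(10, scale 1/10))
# with the approximating normal density N(1, 1/10).
import numpy as np
from scipy.stats import gamma, norm

x = np.linspace(0.4, 1.8, 8)                        # a few points across the range
exact = gamma.pdf(x, a=10, scale=1/10)              # density of the sample mean
approx = norm.pdf(x, loc=1.0, scale=np.sqrt(1/10))  # N(1, 1/10) density

for xi, e, a in zip(x, exact, approx):
    print(f"x = {xi:4.2f}   exact = {e:5.3f}   normal approx = {a:5.3f}")
```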
In fact when X is exponential, the distribution of the mean of a random
sample from the distribution of X is known to belong to the gamma family of
distributions. Some computer programs for statistics can supply exact
probabilities for this family, eliminating the need for approximate normal
probabilities.

Example 5.8 Summing uniform random variables
The uniform random variable X ~ U(0, 1) has the density shown in Figure 5.29.
The density is flat; the random variable X has a minimum observable value
at 0 and a maximum at 1.
The mean of a sample of size 2, X̄₂ = ½(X₁ + X₂), again has a range extending
from 0 to 1. The p.d.f. of X̄₂ is symmetric; but now there is a clear mode at
the midpoint of the range, x̄₂ = ½. This is shown in Figure 5.30.
Figure 5.29  The density of X, X ~ U(0, 1)

Figure 5.30  The density of X̄₂, X ~ U(0, 1)
The mean of a sample of size 3, X̄₃ = ⅓(X₁ + X₂ + X₃), again has a density
defined only over the range from 0 to 1; the mean of X̄₃ is ½ and the variance
of X̄₃ is (1/12)/3 = 1/36. Its p.d.f. is drawn in Figure 5.31; the superimposed
dotted line is the p.d.f. of the normal distribution with mean ½ and variance
1/36. You can see that the approximation is already extremely good.

Figure 5.31  The density of X̄₃ when X ~ U(0, 1)

When X is U(0, 1), V(X) = 1/12.
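As a quick numerical check (an addition to the text; the 100 000 repetitions and the seed are arbitrary choices), the following sketch simulates a large number of means of samples of size 3 from U(0, 1) and confirms that their mean and variance are close to ½ and 1/36.

```python
# Simulation check: means of samples of size 3 from U(0, 1) have mean 1/2 and
# variance (1/12)/3 = 1/36, and their histogram is already close to normal.
import numpy as np

rng = np.random.default_rng(0)
means = rng.uniform(0.0, 1.0, size=(100_000, 3)).mean(axis=1)

print(f"simulated mean     = {means.mean():.4f}  (theory 0.5)")
print(f"simulated variance = {means.var(ddof=1):.5f}  (theory {1/36:.5f})")
```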
Example 5.9 Summing beta random variables
The random variable X whose p.d.f. is given by

    f(x) = [x(1 − x)]^(−1/2)/π,    0 < x < 1,

is a member of the beta family. (You do not need to know any general
properties of the beta family of probability distributions.) It has a highly
skewed U-shaped distribution defined over the range (0, 1), as you can see
from Figure 5.32. It is not at all easy to obtain the algebraic form of the
density of the random variable X̄₂ = ½(X₁ + X₂), let alone of means of larger
samples. Instead, the histograms for 1000 observations on each of the random
variables X̄₂, X̄₁₀, X̄₂₀ are shown in Figure 5.33.
Figure 5.32  f(x) = [x(1 − x)]^(−1/2)/π, 0 < x < 1. (In this diagram the scales have been slightly distorted to exaggerate the main features of the U-shaped density.)

Figure 5.33  Histograms of 1000 observations on (a) X̄₂, (b) X̄₁₀ and (c) X̄₂₀
The three histograms are suggestive of the shape of the theoretical density
functions for X̄₂, X̄₁₀ and X̄₂₀. You can see that even for samples of size
10 some characteristics of the normal distribution are beginning to become
apparent and the distribution of the sample mean for samples of size 20 would
appear to be quite usefully approximated by a normal distribution.
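A simulation along the lines of Figure 5.33 can be produced with a few lines of code. The sketch below is an addition to the text (the seed and the number of histogram bins are arbitrary choices); the bin counts it prints play the role of the histograms in the figure.

```python
# Mimic Figure 5.33: 1000 observations on the mean of samples of size 2, 10 and
# 20 from the U-shaped beta distribution with parameters (1/2, 1/2).
import numpy as np

rng = np.random.default_rng(1)
for n in (2, 10, 20):
    means = rng.beta(0.5, 0.5, size=(1000, n)).mean(axis=1)
    counts, edges = np.histogram(means, bins=10, range=(0.0, 1.0))
    bars = "  ".join(f"{c:3d}" for c in counts)
    print(f"n = {n:2d}: bin counts over (0, 1): {bars}")
```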
The purpose of this section has been to provide graphical support for the
statement of the central limit theorem. Irrespective of the shape of the
distribution of the random variable X (even when it is flat, or U-shaped), the
distribution of the random variable X̄ₙ, the mean of random samples of size n,
has shown to some extent the characteristics of the normal distribution: that
is, the distribution of X̄ₙ is unimodal and approximately symmetric.
5.5 The normal approximation to discrete
distributions
It has been stressed that the central limit theorem applies equally to continuous and discrete underlying distributions. However, in the discrete case,
it is possible to improve the normal approximation to probabilities further by
using a simple device called a continuity correction. This is developed in the
context of the binomial distribution, but the idea is applicable to any discrete
underlying distribution, including the Poisson distribution.
5.5.1 The normal approximation to the binomial
distribution
The binomial distribution, B(n, p), with parameters n and p is a discrete
distribution with probability mass function

    p(x) = (n!/(x!(n − x)!)) p^x q^(n−x),    x = 0, 1, . . . , n,

where q = 1 − p.
You have seen that the random variable X ~ B(n, p) can be thought of as the
sum of n independent Bernoulli variates each with parameter p. (A Bernoulli
random variable takes the value 1 with probability p and the value 0 with
probability q.) So, we can apply the central limit theorem to X and hence obtain
a normal approximation to the binomial distribution, and this will prove
especially useful for calculating probabilities because normal probabilities avoid
the difficult sums that make up binomial tail probabilities. Now, μ, the mean
of the Bernoulli distribution, is p and σ², its variance, is pq. So, the
approximating normal distribution for X (a sum of n independent identically
distributed Bernoulli variates) has mean nμ = np and variance nσ² = npq.
Notice that, as you should have expected, these are also the mean and variance
of the exact binomial distribution B(n, p), so we approximate the binomial
distribution by a normal distribution with the same mean and variance.
Example 5.10 Comparing the distributions B(16, ½) and N(8, 4)
As an example of this approximation let us take n = 16, p = ½. Then
np = 8 and npq = 4. Graphs of the binomial probability mass function and
the approximating normal p.d.f. are shown superimposed in Figure 5.34.
Figure 5.34  The distributions B(16, ½) and N(8, 4) compared
Apart from the obvious differences between a discrete and a continuous
distribution, these graphs are really very similar. But what should be our
approach if we wish to use the normal approximation to estimate the probability
P(X ≤ 6), say? In general, we have seen that when approximating the distribution
of X (with mean μ and standard deviation σ) by a normal distribution
with the same mean and standard deviation, we have used the approximation

    P(X ≤ x) ≈ P(Y ≤ x) = Φ((x − μ)/σ),

where Φ(·) is the c.d.f. of the standard normal variate Z. In this case, X is
binomial with mean np = 8 and variance npq = 4: so we set μ equal to 8 and
σ equal to √4 = 2. Here, in Table 5.2, are the corresponding binomial and
normal c.d.f.s.
You can see that calculated values for the two c.d.f.s are quite close but that,
at these particular values of x, the c.d.f. of the normal variate is always smaller
than that of the binomial. You can see from Figure 5.35 some indication of
why this is happening. This figure shows superimposed on the same diagram
the distribution of the binomial random variable X ~ B(16, ½) and the
approximating normal random variable Y ~ N(8, 4). The shaded area gives the
exact binomial probability required, P(X ≤ 6), and the hatched area gives
the normal probability P(Y ≤ 6).
Figure 5.35  The probabilities P(X ≤ 6) and P(Y ≤ 6) compared
Table 5.2  Binomial and normal c.d.f.s (columns: x; binomial c.d.f.; normal c.d.f.)
The diagram suggests that a more accurate estimate of the binomial probability
P(X ≤ 6) would be obtained from the normal approximation
P(Y ≤ 6½). This comparison is shown in Figure 5.36.
Figure 5.36  The probabilities P(X ≤ 6) and P(Y ≤ 6½) compared
The normal approximation gives

    P(Y ≤ 6½) = P(Z ≤ (6.5 − 8)/2) = P(Z ≤ −0.75) = 0.2266,

and this is indeed very close to the exact value of the binomial probability,
0.2272.
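The calculations of this example are easily checked by computer. The following sketch is an addition to the text; it computes the exact binomial probability and the normal approximations with and without the half-unit adjustment.

```python
# Check of Example 5.10: exact P(X <= 6) for X ~ B(16, 1/2), the plain normal
# approximation, and the continuity-corrected approximation.
from math import comb, erf, sqrt

n, p = 16, 0.5
mean, var = n * p, n * p * (1 - p)        # 8 and 4

def phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(7))
plain = phi((6 - mean) / sqrt(var))        # no continuity correction
corrected = phi((6.5 - mean) / sqrt(var))  # with continuity correction

print(f"exact P(X <= 6)          = {exact:.4f}")      # 0.2272
print(f"normal, no correction    = {plain:.4f}")      # 0.1587
print(f"normal, with correction  = {corrected:.4f}")  # 0.2266
```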
We may also need to use the normal distribution to approximate values of the
binomial probability mass function: then the same approach is adopted, as
shown in the following example.
Example 5.11 Approximating a binomial probability
Suppose we wish to approximate the binomial probability mass function when
x = 6. We know that P(X = 6) = P(X ≤ 6) − P(X ≤ 5) and that this can
be approximated by

    P(Y ≤ 6½) − P(Y ≤ 5½) = P(Z ≤ −0.75) − P(Z ≤ −1.25) = 0.2266 − 0.1056 = 0.1210.

Again, this is reasonably close to the true value of 0.1222.
What has been done in this example is to approximate the binomial probability
function p_X(6) by the probability that the corresponding continuous
random variable lies in the interval from 5½ to 6½, as shown in Figure 5.37.
That is, the area of the bar centred on 6 is approximated by the hatched area;
you can see that a good approximation ensues.

Figure 5.37  The normal approximation to the binomial probability p_X(6)

The approach that has been used is an intuitively sensible way to behave,
and it is found in general to give
extremely satisfactory approximations for quite moderately sized n. This adjustment for discrete distributions to the normal approximation is often called
the continuity correction.
The continuity correction
When approximating the c.d.f. of an integer-valued discrete random variable X
using the central limit theorem, write

    P(X ≤ x) ≈ P(Y ≤ x + ½),    (5.7)

where Y is normally distributed with the same mean and variance as X:
that is, Y ~ N(μ_X, σ_X²).
In the case where X is a binomial random variable with mean np and variance
npq, the expression (5.7) becomes

    P(X ≤ x) ≈ Φ((x + ½ − np)/√(npq)).
Exercise 5.18
If X ~ B(16, ½), use the normal distribution to approximate the probability
P(12 < X < 15). Compare your answer with the true value of the probability.
So far we have looked at approximations to the binomial distribution when
p = ½ and found them to be reasonably good. We should expect the
approximation to work best in this case because when p = ½ the probability mass
function of the binomial distribution is symmetric like the normal p.d.f. This
is, of course, precisely what was driving Galton's exhibition of approximate
normality using the quincunx, as was described in Section 5.1. However, the
approximation is also useful when p is not equal to ½. Here is a diagram of the
B(20, 0.3) probability mass function and the approximating N(6, 4.2) p.d.f.
to help convince you of this.
Figure 5.38  The probability distributions B(20, 0.3) and N(6, 4.2) compared
Example 5.12 An asymmetric binomial distribution
Suppose the random variable X has a binomial distribution B(20, 0.3). Then
the mean of X is μ = np = 6, and its variance is σ² = npq = 4.2. Then, for
example,

    P(X ≤ 5) = 0.4164,

while (writing Y ~ N(6, 4.2) and employing a continuity correction)

    P(X ≤ 5) ≈ P(Y ≤ 5½) = P(Z ≤ (5.5 − 6)/√4.2) = P(Z ≤ −0.24),

and from the tables this is 0.4052. Similarly,

    P(X = 4) = 0.1304.

This probability may be approximated by

    P(3½ ≤ Y ≤ 4½),

shown in the hatched area in Figure 5.39. From the tables this probability is
0.2327 − 0.1112 = 0.1215. The normal approximation to the exact binomial
probability required is not as good in this example as it was in Example 5.11,
but you can see that any differences are not serious.
Figure 5.39
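As before, the figures in this example can be checked directly. The sketch below is an addition to the text and reproduces the exact and approximate values of P(X ≤ 5) and P(X = 4) for X ~ B(20, 0.3); small differences from the values quoted above arise only because the tables round z to two decimal places.

```python
# Check of Example 5.12 for X ~ B(20, 0.3): exact P(X <= 5) and P(X = 4), and
# their continuity-corrected normal approximations based on Y ~ N(6, 4.2).
from math import comb, erf, sqrt

n, p = 20, 0.3
mean, var = n * p, n * p * (1 - p)   # 6 and 4.2

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

exact_cdf5 = sum(binom_pmf(k) for k in range(6))
approx_cdf5 = phi((5.5 - mean) / sqrt(var))

exact_pmf4 = binom_pmf(4)
approx_pmf4 = phi((4.5 - mean) / sqrt(var)) - phi((3.5 - mean) / sqrt(var))

print(f"P(X <= 5): exact {exact_cdf5:.4f}, normal approx {approx_cdf5:.4f}")
print(f"P(X = 4) : exact {exact_pmf4:.4f}, normal approx {approx_pmf4:.4f}")
```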
Exercise 5.19
Suppose that the binomial random variable X has parameters n = 25, p =
9.
(a) Use your computer to obtain the following probabilities to 6 decimal
places.
(i) P(X = 5)
(ii) P(X = 6)
(iii) P(X = 7)
(iv) P(X = 8)
(b) Write down the probability P(6 ≤ X ≤ 8).
(c) Give the parameters of the normal approximation to this binomial
distribution indicated by the central limit theorem.
(d) Use the normal approximation to find the following probabilities, by
rewriting these probabilities in terms of the normal random variable Y
approximating the distribution of X.
(i) P(X = 6)
(ii) P(X = 7)
(iii) P(X = 8)
(iv) P(6 ≤ X ≤ 8)
You have seen that the central limit theorem provides us with a very useful
approximation to the binomial distribution provided n is fairly large; it is also
important that the binomial distribution is not too far from being symmetric.
Asymmetry is most evident when p is close to either 0 or 1. Here is a rough
rule for deciding when it may be appropriate to use a normal approximation
for a binomial distribution.
The c.d.f. of the normal distribution with a continuity correction provides a
usable approximation to the c.d.f. of the binomial distribution
B(n, p) when both np ≥ 5 and nq ≥ 5, where q = 1 − p.
This rule is of course fairly arbitrary and whether or not it works depends on
how close an approximation is required, but it provides a reasonable basis for
deciding when the approximation may be used.
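One way to get a feel for this rule of thumb is to compute, for several choices of n and p, the largest discrepancy between the binomial c.d.f. and its continuity-corrected normal approximation. The sketch below is an addition to the text, and the particular (n, p) combinations are arbitrary illustrative choices.

```python
# Largest error in the continuity-corrected normal approximation to the
# binomial c.d.f., for a few (n, p) combinations near the np >= 5, nq >= 5 boundary.
from math import comb, erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def max_cdf_error(n, p):
    mean, sd = n * p, sqrt(n * p * (1 - p))
    cdf, worst = 0.0, 0.0
    for x in range(n + 1):
        cdf += comb(n, x) * p**x * (1 - p)**(n - x)
        approx = phi((x + 0.5 - mean) / sd)
        worst = max(worst, abs(cdf - approx))
    return worst

for n, p in [(10, 0.5), (20, 0.25), (50, 0.1), (50, 0.02)]:
    print(f"n = {n:3d}, p = {p:4.2f}, np = {n*p:4.1f}: "
          f"max c.d.f. error = {max_cdf_error(n, p):.4f}")
```

The last combination, with np = 1, falls well outside the rule and shows a noticeably larger error than the others.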
5.5.2 Normal approximations to other discrete
distributions
This chapter ends by looking at two more discrete distributions, one already
familiar to you, and their normal approximations. The method used is the
same as for the binomial distribution; the c.d.f. of our discrete random variable
is approximated by that of a continuous random variable with a normal distribution, by invoking the central limit theorem, and a continuity correction is
included to improve the approximation to calculated probabilities.
The familiar case is the Poisson distribution. It was mentioned in Chapter 4,
Section 4.3 that if X₁, X₂, . . . , Xₙ are independent Poisson variates
with means all equal to μ, then their total also follows a Poisson distribution:

    X₁ + X₂ + · · · + Xₙ ~ Poisson(nμ).

So, if we wish to approximate by a normal distribution a Poisson distribution
with large mean μ, then we may think of the Poisson distribution as arising
as a sum of a large number n of independent Poisson variates, each with mean
μ/n. By the central limit theorem, their sum will be approximately normally
distributed.
In other words, the central limit theorem tells us that a Poisson distribution
with mean μ may be approximated by a normal distribution with the same
mean and variance. In this case, these are both equal to μ.
Table 5.3  Exact and approximate Poisson probabilities (columns: x, p_X(x), approximation)
For instance, if X is Poisson(16), then we might try the approximation

    P(X ≤ x) ≈ P(Y ≤ x + ½),

where Y ~ N(16, 16), and

    P(X = x) ≈ P(x − ½ ≤ Y ≤ x + ½).
Values based on this last approximation are given in Table 5.3.
Although the approximating probabilities are not very close, they are always
within about 0.006 of the true probability, and so usually would be sufficiently
accurate.
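The comparison in Table 5.3 can be reproduced as follows. The sketch below is an addition to the text (the particular values of x printed are an arbitrary choice); it evaluates a few exact Poisson(16) probabilities alongside their continuity-corrected normal approximations.

```python
# Exact Poisson(16) probabilities against the continuity-corrected normal
# approximation N(16, 16), in the spirit of Table 5.3.
from math import erf, exp, factorial, sqrt

mu = 16.0

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def poisson_pmf(k):
    return exp(-mu) * mu**k / factorial(k)

for x in (10, 13, 16, 19, 22):
    exact = poisson_pmf(x)
    approx = phi((x + 0.5 - mu) / sqrt(mu)) - phi((x - 0.5 - mu) / sqrt(mu))
    print(f"x = {x:2d}: exact {exact:.4f}, approx {approx:.4f}")
```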
Exercise 5.20
If the random variable X has a Poisson distribution with mean 40, use your
computer to find as accurately as you can the probability P(30 ≤ X ≤ 45)
and then find an approximation to this probability using the central limit
theorem.
The point at which the normal approximation N(μ, μ) to the Poisson
distribution Poisson(μ) becomes a useful approximation depends essentially on the
purpose to which the resulting calculations will be put; but a rough rule which
many practitioners use is that μ should be at least 30. As μ becomes larger
than 30 the approximation gets better and better.
The next example is about sums of discrete uniform distributions, a context
that so far we have not considered.
Example 5.13 Rolling three dice
Finally, let us try rolling some dice! The probability distribution of the total
score when three fair dice are rolled is actually quite difficult to find: it is the
sum of three independent identically distributed discrete uniform scores on
1, 2, 3, . . . , 6. Let us try using the central limit theorem to obtain an
approximation to the probability that the total score exceeds 15.
We know that if X is uniformly distributed on the integers 1, 2, . . . , n, then

    E(X) = (n + 1)/2,    V(X) = (n² − 1)/12.

These results were stated in Chapter 3.
Setting n equal to 6 gives

    μ = 3.5,    σ² = 35/12.

The sum S = X₁ + X₂ + X₃ of three independent scores has mean 3μ = 10.5
and variance 3σ² = 8.75. Then the probability required (that the total score
S exceeds 15) may be approximated by writing

    P(S > 15) ≈ P(Y > 15½),

where Y ~ N(10.5, 8.75). This is

    P(Z > (15.5 − 10.5)/√8.75) = P(Z > 1.69) = 1 − 0.9545 = 0.0455.

In fact, the distribution of S is given in the following table.
Table 5.4  The distribution of the total score S, when three dice are rolled

s          3      4      5      6      7      8      9      10     11     12     13     14     15     16     17     18
P(S = s)   1/216  3/216  6/216  10/216 15/216 21/216 25/216 27/216 27/216 25/216 21/216 15/216 10/216 6/216  3/216  1/216
The probability that the total score on a throw of three dice exceeds 15 is
therefore

    6/216 + 3/216 + 1/216 = 10/216 = 0.0463,

and so the normal approximation to this probability is not bad.
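Because there are only 216 equally likely outcomes, the exact answer is also easy to obtain by brute force. The following sketch is an addition to the text; it enumerates all rolls and compares the exact probability with the normal approximation.

```python
# Three-dice check: exact P(S > 15) by enumeration against the
# continuity-corrected normal approximation N(10.5, 8.75).
from itertools import product
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

outcomes = list(product(range(1, 7), repeat=3))       # all 216 equally likely rolls
exact = sum(1 for roll in outcomes if sum(roll) > 15) / len(outcomes)

mean, var = 3 * 3.5, 3 * 35 / 12                      # 10.5 and 8.75
approx = 1.0 - phi((15.5 - mean) / sqrt(var))         # P(S > 15) = P(S >= 16)

print(f"exact  P(S > 15) = {exact:.4f}")   # 10/216 = 0.0463
print(f"normal approx    = {approx:.4f}")  # about 0.0455
```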
Summary
1. Many manifestations of variability in the real world may be adequately
modelled by the two-parameter normal distribution N(μ, σ²) with mean
μ and variance σ² (standard deviation σ).
2. The p.d.f. of the normal random variable X ~ N(μ, σ²) is given by

    f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),    −∞ < x < ∞.

The p.d.f. of the standard normal random variable Z ~ N(0, 1) is given by

    φ(z) = (1/√(2π)) e^(−z²/2),    −∞ < z < ∞.

The c.d.f. of Z is given by

    Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} φ(t) dt.
3. In the usual way, probabilities are found by integrating under the normal
curve, but this approach does not yield a closed formula. Instead,
reference is made to tables of standard normal probabilities and the
relationship between X ~ N(μ, σ²) and the standard normal variable
Z ~ N(0, 1):

    Z = (X − μ)/σ,

and so

    P(X ≤ x) = Φ((x − μ)/σ).

4. Standard normal quantiles qα are defined by

    P(Z ≤ qα) = α.
To find the quantiles of X ~ N(μ, σ²) it is necessary to use the relation

    xα = μ + σ qα.
5. If the random variables Xᵢ, i = 1, . . . , n, are independent normally
distributed random variables with mean μᵢ and variance σᵢ², then their sum
is also normally distributed:

    X₁ + X₂ + · · · + Xₙ ~ N(Σμᵢ, Σσᵢ²);

also, if X is normally distributed with mean μ and variance σ², then for
constants a and b,

    aX + b ~ N(aμ + b, a²σ²).
6. The central limit theorem states that if X₁, X₂, . . . , Xₙ are independent
identically distributed random variables with mean μ and variance σ²,
then their mean X̄ₙ has an approximate normal distribution

    X̄ₙ ≈ N(μ, σ²/n).

Equivalently, their sum Tₙ has an approximate normal distribution

    Tₙ = X₁ + X₂ + · · · + Xₙ ≈ N(nμ, nσ²).
7. By the central limit theorem, if X is binomial B(n, p), then the distribution
of X may be approximated by a normal model with corresponding
mean μ = np and corresponding variance σ² = npq, where q = 1 − p. The
approximation will be useful if both np and nq are at least 5.
8. By the central limit theorem, if X is Poisson(μ), then the distribution of
X may be approximated by a normal model with mean μ and variance μ.
The approximation will be useful if μ is at least 30.
9. When approximating the distribution of a discrete integer-valued random
variable X with mean μ and variance σ² by a normally distributed random
variable Y ~ N(μ, σ²), it is appropriate to use a continuity correction:

    P(X ≤ x) ≈ P(Y ≤ x + ½)

and

    P(X = x) ≈ P(x − ½ ≤ Y ≤ x + ½).

The symbol '≈' is read 'has approximately the same distribution as'.