The Open University
Mathematics/Social Sciences/Science/Technology
An inter-faculty second level course
STATISTICS
MDST242 Statistics in Society
Block C Data in context
Unit C2 Scientific experiments
Prepared by the course team
The Open University
The Open University
Walton Hall, Milton Keynes
MK7 6AA
First published 1983. Reprinted 1986, 1989, 1993, 1997, 1998
Copyright © 1983 The Open University
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other
means, without permission in writing from the publisher.
Designed by the Graphic Design Group of the Open University.
Printed in Great Britain at The Alden Press, Oxford.
This text forms part of an Open University course. The complete list of units in the course appears at
the end of this text.
For general availability of supporting material referred to in this text, please write to Open University
Educational Enterprises Limited, 12 Cofferidge Close, Stony Stratford, Milton Keynes MK11 1BY,
Great Britain.
Further information on Open University courses may be obtained from the Admissions Office, The
Open University, P.O. Box 48, Walton Hall, Milton Keynes MK7 6AB.
Contents
Introduction
Scientific experiments
What are experiments?
Different kinds of experiment
Experiments and statistics
Carrying out an experiment
Purpose of the experiment
Setting up the experiment
Maintaining the experiment
Cutting the stems
Completing the experiment
Variability and systematic errors
The t-test
Which test?
The z-test reconsidered
The two-sample t-test
Analysis of mustard seedling data
The t-test continued
Why the t-test?
Checking the conditions
The one-sample t-test
The matched-pairs t-test
Other matched-pairs tests
Confidence intervals from t-tests
Analysis of variance
Solutions to exercises
Index
Acknowledgements
Introduction
Unit C1 introduced just one particular area, drug testing, in which experimentation
plays an important role. Although drug testing is undoubtedly an extremely important
activity, it is by no means the only one in which experimentation figures prominently.
Much of present-day technological knowledge in areas as diverse as agriculture and
electronics depends to a greater or lesser extent on knowledge which has been collected
from scientific experiments. This unit discusses more fully than was possible in the last
exactly what is involved in a scientific experiment and what are its key features.
In Section 2 of this unit, you are asked to set up an experiment on the growth of plants,
using mustard (or cress) seeds. This should give you an idea of some of the problems
that are involved in even the simplest of experiments. The timing of the experiment is
important and you therefore need to plan your schedule accordingly.
It will probably take you about one-and-a-half hours to set up the experiment,
assuming that you have already collected the items in the list which we sent you some
weeks ago. You will have to spend a few minutes on your experiment for each one of the
4 or 5 days after setting it up and then about one-and-a-half hours taking
measurements from your plants.
The analysis of the data from this experiment uses a new test which is similar to the
z-test. This test is called the t-test and it is the major topic of Sections 3 and 4.
1 Scientific experiments
In this section we shall first describe those forms of inquiry for which we use the term
experiment. We shall then describe three different kinds of experiment common in
scientific inquiry before looking at one of these kinds in more detail because it is the one
which most commonly involves statistical analysis.
1.1 What are experiments?
The term experiment means a variety of things to a variety of people. To many it
conjures up a vision of a white-coated individual (usually a balding man with National
Health spectacles) surrounded by dials, flashing lights and vaporous fluids bubbling
sullenly in mysteriously coiled vessels. To others an experiment involves no more than
adding a new ingredient to a tried and trusted flan recipe to see if the taste is improved,
or planting the bulbs earlier than in previous years to see if they do better. These uses
are all perfectly proper: it is not what you do that qualifies an activity as an experiment,
it is the way you do it. In other words, the area in which you carry out an experiment
might be nuclear physics or it might be cookery, neither lies outside the province of
experimentation, but if other people are going to accept that you have carried out an
experiment then you must use certain methods and procedures.
One fundamental feature of any experiment is that it is a public activity. This does not
mean that an experiment has to be performed in front of a crowd. It means that
everything that an experimenter does should be open to inspection by other people. If
person (A) carries out an experiment then he or she should be able to explain everything
that took place during the experiment in such a way that another person (B) could, if
necessary, go through exactly the same procedure. The results of this experiment should,
again, be open to public scrutiny so that B's results can be compared with A's. This idea
is already familiar from Unit C1. If a drug company carries out a clinical trial on a new
beta-blocker then they must be able to describe every part of their procedure to the
outside world: including how the experiment was designed; how many subjects were
involved; how the drug was administered and in what quantities; how its effects were
measured etc. In fact, the Committee on the Safety of Medicines requires such
information before they grant a product licence. In just the same way, a cook who
experiments by altering an ingredient in a recipe should be able to explain every part of
the revised recipe so that other people could repeat the revised procedure exactly.
A second fundamental feature of an experiment is that it sets out to answer a specific
question. Does this drug alter blood pressure? Does this ingredient improve the taste of
the flan? Does planting the bulbs earlier produce better flowers? Notice that each of
these questions is framed in such a way as to demand that something be measured if the
question is to be answered: namely blood pressure, taste or flower quality.
Activity 1.1 How should a cook carry out an experiment to discover whether the
addition of an extra ingredient improves the taste of a flan?
Solution The procedure could be exactly the same as that which a drug company uses
to discover whether a new drug reduces headaches. The cook prepares two flans, one
with and one without the new ingredient. A panel of tasters is recruited and each taster
tries each flan in a double-blind experiment. A crossover design would probably be
most suitable so that the tasters act as their own controls, i.e. each tastes both flans. Half
of the tasters test the old recipe first and the other half test the new recipe first. Each
taster rates the taste of each flan on a 5-point scale and the ratings of the new and the old
recipes are analysed by using a hypothesis test.
This activity shows that it is possible to carry out scientific experiments on all sorts of
things, not just on formally recognized scientific subjects. To illustrate the difference
between the scientific approach to a question and other forms of inquiry, it is instructive
to consider how a scientific experiment might be carried out on such an unlikely subject
as poetry appreciation. Take a poem such as Keats's ode To Autumn (Figure 1.1).
Figure 1.1
A literary critic might ask the question: 'Is this a great poem?'. Framed in this way
the question is not amenable to scientific experiment. People might vary in their opinion
of the poem and they would probably produce more or less convincing evidence to
support their viewpoint. One might point to the images which the poet uses and argue
that they successfully convey the mood of autumn, another might complain that the
overall effect is too languid for his taste etc. Although people might agree as to what are
good and bad criteria for judging the greatness of a poem, and although informed
opinion might generally agree as to the greatness of this particular poem, there is no
sense in which assessing the greatness of this poem is a scientific experiment. However,
it is possible to ask questions about the poem which are open to experimental
investigation.
Activity 1.2 Since the poem has three verses, someone might argue that the third verse
is inferior to the others, and that without it the poem is greatly improved. How might
you test this experimentally?
Solution One possibility is to use two groups of people, as similar as possible with
respect to those features which you think might be relevant to poetry appreciation, such
as age, sex, educational background, and ignorance of the poem in question. Each
person in one group would be asked to read the three-verse version of the poem and
rate how good it is on a 7-point scale. Each person in the other group would be asked to
read, and rate, the truncated two-verse version. The two batches of ratings would then
be analysed by applying the Mann-Whitney test.
Of course nobody would imagine that this particular experiment is of any literary
value. Our purpose is simply to show that experimental investigation need not be
limited to the physical world.
1.2 Different kinds of experiment
All experiments have in common these two important features
they are public
they produce measurements or observations designed to answer specific questions.
Nonetheless there are different kinds of experiment: they differ in that they address
different kinds of question. This is not a case of, for example, some experiments being
about physics and others about chemistry; it is a case of the questions being different in
their purpose.
Some experiments answer questions of the form
What happens if I do this?
Some of the experiments that you met in Unit C1 answered questions of this sort. For
example, the question
What happens if I give this person this new drug?
might arise in the early stages of drug testing. Such experiments are exploratory in
nature, in the same way that toddlers' investigations of their surroundings are
exploratory.
A second kind of experiment has, as its primary purpose, not exploration but the
measurement of a particular attribute. Such experiments are designed to answer
questions such as
What is the velocity of light?
How heavy is the earth?
How old is this rock?
What is the population of the UK?
This kind of experiment is a very important part of scientific investigation but we shall
not discuss such experiments at length in this unit.
A third kind of experiment, however, is both important and relevant to this block of the
course. This is the kind that tests a specific hypothesis. Perhaps the best way to explain
this kind of experiment is to give some examples.
Francis Bacon, a sixteenth-century philosopher, encouraged his
contemporaries to carry out
experiments of this type and,
for this reason, you may find
such exploratory experiments
referred to as Baconian.
Example 1.1
Hypothesis The baby is crying because it is cold.
Prediction based on hypothesis If the baby is made warmer then its crying will stop.
Test Make baby warmer but make sure that you do not change anything else in its
environment.
Possible results and conclusions The baby continues crying, therefore the hypothesis is
wrong. The baby stops crying, therefore the hypothesis is supported.
Example 1.2
Hypothesis The television is not working because the fuse in the plug is broken.
Prediction based on hypothesis If the fuse is replaced with one that does work then the
television will work again.
Test Replace the fuse but do not alter anything else.
This ensures that the
experiment has proper
controls.
Since you cannot discount the
possibility that it would have
stopped anyway, you cannot
say that the hypothesis is
correct.
Possible results and conclusions The television still does not work, therefore the
hypothesis is wrong. The television works, therefore the hypothesis is supported.
Example 1.3
Hypothesis Food putrefies if left for too long because of the action of microbes (small
organisms) that are present in the air and that come into contact with it.
Prediction based on hypothesis If the microbes are prevented from acting on the food
then it will not putrefy.
Test Prevent the microbes from acting by killing them or otherwise preventing them
from acting (for example, by deep-freezing).
Possible results and conclusions The food putrefies, therefore the hypothesis is wrong.
The food does not putrefy, therefore the hypothesis is supported.
Example 1.4
Hypothesis Sound is transmitted through air because sound is a form of mechanical
vibration and air contains particles (molecules) that can jostle each other and so pass
the vibration from one particle to the next.
Prediction based on hypothesis If all the air is removed from a container so that a
vacuum remains then it should not be possible to transmit sound through that vacuum.
For example, it should not be possible to hear an electric bell ringing inside a container
from which the air has been removed (Figure 1.2).
Test Remove air from a container and discover whether an electric bell inside it
becomes inaudible.
Possible results and conclusions The bell can be heard through the vacuum, therefore
the hypothesis is wrong. The bell cannot be heard, therefore the hypothesis is
supported.
The first two of these examples are homely examples of hypothesis-testing experiments
of a kind that people routinely carry out in their daily lives. The latter two examples are
well-known, formal, scientific experiments. Example 1.3 was first carried out by Louis
Pasteur (1822-95), who successfully prevented microbes from attacking food by
filtering the air in which the food was kept, and by taking food to the tops of mountains
where the combination of a low temperature and relatively microbe-free air prevented
the food from decaying.
All four of these examples have the following important features in common. In each
there is a specific hypothesis about the cause of a phenomenon. Hypothesis-testing
experiments try to explain how something works. The experimenter makes predictions
that flow directly from the hypothesis: if the hypothesis is true then certain things must
follow. For example, if it is true that microbes are the sole cause of food decay then it
automatically follows that food should not decay if microbes are absent.
The experiment itself consists of testing whether the things that the hypothesis predicts
actually happen. If the result of the experiment is not what the hypothesis predicted
then the experimenter has to accept that the hypothesis is wrong. For example, if food
were to continue to decay, even when it was absolutely certain that no microbes were
present, then the hypothesis that microbes are the only cause of food decay would have
to be rejected.
Figure 1.2 Electric bell
It is important to note, however, that even if the prediction that the hypothesis makes
turns out to be correct then it may still be wrong to assume that the hypothesis itself is
perfectly correct. Take, for example, the hypothesis that malaria is caused by a
mosquito (more specifically, by a particular type of mosquito called Anopheles). A
prediction which follows from this hypothesis is that malaria should cease to occur in a
district from which Anopheles mosquitos have been eradicated. In fact, this is what
normally happens in practice, so it would seem reasonable to conclude that the
hypothesis is correct. However, although it is correct in one sense, it is not in another. In
a district from which the Anopheles mosquito has been eradicated, some people may
still continue to contract malaria, apparently spontaneously, long after the mosquitos
have gone. If you were ignorant of these cases of malaria then it would be legitimate to
believe in the original hypothesis that Anopheles mosquitos cause malaria. As soon as
these few instances come to light, the original hypothesis must be discarded and a new
one sought. In the case of malaria, the truth is that the mosquito is only a carrier for a
parasite, called Plasmodium, which is the direct cause of malaria. Plasmodium can
survive for long periods in the human body without giving rise to malaria. From time to
time, however, it invades the blood and malaria then develops.
To summarize, if the predictions that follow from a hypothesis turn out to be incorrect
then the hypothesis has to be abandoned or, at least, modified. If the predictions turn
out to be correct then the hypothesis is supported in the sense that it can be accepted
provisionally that the hypothesis is correct. There is always the possibility that, one
day, somebody may find that under certain conditions the predictions are incorrect.
When that happens, the hypothesis has to be replaced by a new one.
1.3 Experiments and statistics
Before going any further, it is important to consider how this description of scientific
hypothesis-testing experiments fits in with the ideas of statistical hypothesis-testing in
Block B. From the statistician's point of view, the above four examples of hypothesis-testing
experiments are framed in rather unconventional terms, because each
concentrates on a hypothesis of the form something affects something and tests
predictions flowing from that hypothesis. Nevertheless, it is perfectly possible to
formulate each of the above experiments in statistical terms with a null and an
alternative hypothesis and it is essential to do so if you want to apply statistical
hypothesis tests to data arising from such experiments.
The null hypothesis, as its name implies, very often postulates the absence of a given effect
or relationship. For example, in the clinical trial of a drug, the null hypothesis postulates
that its effect does not differ from that of the placebo. The alternative hypothesis proposes
that its effect does differ from that of the placebo. When we carry out a statistical
hypothesis test, we concentrate on the null hypothesis. We say
What is the probability of obtaining a result at least as extreme as that which
we have obtained, if we assume that the null hypothesis is true?
See Section 2.2 of Unit B2.
If this probability is too low then we reject the null hypothesis in favour of the
alternative.
We shall now devise a null and an alternative hypothesis for two of the four
experiments. In each case we shall assume that the tests are the same as those described
earlier and state what conclusions we draw from the different results that could be
obtained.
Example 1.1 (continued)
Null hypothesis Cold has no effect on the baby's crying.
Alternative hypothesis Cold makes the baby cry.
Test Alter the temperature of the baby's surroundings and set up suitable controls.
Possible results and conclusions The baby's crying stops, therefore reject the null
hypothesis. The baby's crying continues, therefore the null hypothesis is supported.
Often, as with these examples,
the null hypothesis in the
statistical hypothesis test is
the converse of the hypothesis
in the scientific experiment; as
a result, the reformulation is
often not logically equivalent
to the original experiment.
Example 1.2 (continued)
Null hypothesis The present condition of the fuse is not responsible for the television's
refusal to work.
Alternative hypothesis The television is not working because the fuse is broken.
Test Replace the fuse but do not alter anything else.
Possible results and conclusions The television works, therefore reject the null
hypothesis. The television still does not work, therefore the null hypothesis is
supported.
Activity 1.3 Express the experiments in Examples 1.3 and 1.4 as statistical hypothesis
tests.
Solution
Example 1.3
Null hypothesis Microbes do not make food putrefy.
Alternative hypothesis Microbes do make food putrefy.
Test Prevent microbes from acting on the food.
Possible results and conclusions The food does not putrefy, therefore reject the null
hypothesis. The food putrefies, therefore the null hypothesis is supported.
Example 1.4
Null hypothesis Sound is not transmitted by the jostling of air molecules.
Alternative hypothesis Sound is transmitted by the jostling of air molecules.
Test Ring a bell inside a vessel with all the air pumped out.
Possible results and conclusions The bell is inaudible, therefore reject the null
hypothesis. The bell can be heard, therefore the null hypothesis is supported.
One final point needs to be made about the relationship between hypothesis-testing
experiments and statistical hypothesis tests. A great deal of scientific experimentation
consists of testing specific hypotheses of the sort just described. If the experiments
require statistical analysis then the experimenter should use statistical hypothesis tests.
This will be the case if the experiments involve things that are intrinsically variable:
such as people, plants or animals. Sometimes, however, a scientist may not be so much
interested in testing whether or not a given treatment has an effect (maybe somebody
has already done an experiment which shows that it does) but rather in investigating
how big that effect is. An agriculturalist might want to know, for example, by how much
a new fertilizer increases the yield of a crop. In such instances the scientist conducting
the research project would use statistical estimation procedures (for example, finding a
confidence interval) rather than carrying out a hypothesis test.
Exercise 1.1
State whether each of the following is a scientific experiment.
(a) Measurement of the distance between the earth and the sun.
(b) Accelerating and braking less fiercely in a car to see whether petrol consumption is reduced.
(c) Reading a tea-taster's report on a brand of tea in order to decide whether to buy it or not.
(d) Investigating whether overweight is caused by overeating.
Solution on p. 43
Exercise 1.2
For each of the scientific experiments in Exercise 1.1, state which of the three kinds it is:
exploratory, measurement or hypothesis-testing.
Solution on p. 43
Exercise 1.3
For those scientific experiments in Exercise 1.1 which are hypothesis-testing experiments,
formulate them as statistical hypothesis tests.
Solution on p. 43
Hypothesis-testing experiments involve testing the deductions that can be made from a
hypothesis. For this reason, scientific experimentation is sometimes said to proceed by
the hypothetico-deductive method. There is currently considerable debate about
whether science does proceed by this method under all circumstances but this debate
lies outside the scope of this course. Many scientists feel that carrying out an
experiment is rather like riding a bicycle: if you think too hard about what you are
doing, you fall off! Accordingly, it is probably most profitable to leave such discussions
of the nature of scientific experimentation and to get down to the practicalities of
carrying out an experiment.
This description is frequently
associated with the
philosopher Karl Popper.
Summary
Scientific experiments are distinguished by two main features.
All stages of an experiment are open to public scrutiny and verification.
Experiments are designed to answer specific questions by making specific
measurements.
At least three kinds of experiment can be recognized: they are distinguished by the kind
of questions they attempt to answer. These are
exploratory (Baconian) experiments
measurement experiments
hypothesis-testing (hypothetico-deductive) experiments.
After completing your work on this section you should be able to
distinguish between experimental and non-experimental forms of inquiry
distinguish between the three kinds of experiments described above.
2 Carrying out an experiment
Now that you have read about some of the basic principles of experimentation, it is
worth trying to put these principles into practice by setting up and carrying out your
own experiment. You should read the whole of this section before you start the
experiment. The experiment is concerned with the growth of mustard (or cress)
seedlings.
You should set up the experiment and then leave the seedlings to grow for 4 or 5 days,
checking them from time to time and pruning some of them after 2 or 3 days. You will
then need to take measurements from them; by this time you should have read Section
3 and so be ready to analyse your results statistically. Provided that you have already
collected all the necessary items it should not take you more than about one-and-a-half
hours to set up the experiment. Here is the list of items that ,you need.
Items needed for the experiment
Two identical plastic containers such as small flower pots or empty cartons, e.g. of
margarine. These need to be at least 2½ inches (6 cm) in diameter at the open end or,
if rectangular, need to have sides at least 2 inches (5 cm) long.
Two large containers (ice cream tubs, sandwich boxes, small buckets or basins)
which will hold a pint of water without leaking, and which can each hold one of the
plastic containers mentioned above. These larger containers will need to stand
side-by-side in a well-illuminated position, e.g. on a window ledge.
Mustard is better because it
grows faster but cress will do
perfectly well.
The details of these activities
are in Sections 2.2-2.5.
Figure 2.1 A pint of water in each large container; holes in the base of each small container
One small packet of mustard seeds. (If these are difficult to obtain then use cress
seeds.)
One piece of aluminium kitchen foil about 30 cm × 30 cm (12 inches × 12 inches).
One piece of clear kitchen foil (not cling-film) about 30 cm × 30 cm (12 inches × 12
inches).
Some Andrex, Dixcel or other superior toilet tissue.
Two sheets of high quality (typing or writing) paper, preferably coloured so that
the seedlings show up against the paper.
Four elastic bands to go round the tops of the flower pots or margarine or yoghurt
cartons.
A milk bottle or other pint container.
A pair of dividers or a piece of plasticine (or a similar material) plus two pins.
A ruler measuring in mm.
Two teaspoons.
A pair of fairly small, sharp-pointed scissors.
A magnifying glass would be helpful but is not essential.
2.1 Purpose of the experiment
The purpose of the experiment is to discover whether light affects the growth of plant
roots. Most people are familiar with the fact that light is important to plants and realize
that it affects the growth of the stems and leaves. If you keep a potted plant on a window
sill, for example, and never turn it round then it is likely to grow quite noticeably
towards the light. But what effect, if any, does light have on the growth of the roots?
Normally, of course, the roots are underground and so are not exposed to the light but
it is quite conceivable that the roots could become exposed as they grew; for example, if
the plant was growing on an irregular surface or if rain washed some of the soil away.
Under such circumstances the plant would probably not benefit from the root
continuing to grow out into the open. Roots give plants mechanical support and they
absorb water and nutrients from the surrounding soil. They can do none of these things
if they are growing in the open air. It seems a plausible hypothesis, therefore, that light
suppresses root growth so that the roots will tend not to grow in the open air.
It is also possible to argue
plausibly that the roots will
tend to grow longer in the
open air.
The experiment which you are asked to carry out is on mustard seedlings. Thus the
precise question to be answered is
Does the light affect the root growth of mustard seedlings?
The principle of the experiment is simple. You are asked to grow two groups of mustard
seedlings, one entirely in the dark and the other entirely in the light, but otherwise in as
near identical conditions as possible. After some time has elapsed you should measure
the lengths of the roots of the seedlings and compare the two groups. As in any
experiment, it is essential to control various factors.
Activity 2.1 List some factors which need to be controlled and suggest how such
control might in each case be achieved.
Solution The seedlings need to grow under conditions that are as similar as possible
except for the presence or absence of light. It is especially important that the seedlings in
the two groups are grown at the same temperature and humidity, and that they are
equally spaced from each other, so that they do not suffer from unequal crowding
effects. By growing the two sets of seedlings side-by-side in adjacent pots it should be
possible to control for temperature. By carefully spacing the seeds in the pots it should
be possible to avoid unequal crowding effects.
Darkness will be achieved for one set of seedlings by covering the pot in which they are
growing with aluminium kitchen foil. As well as cutting out the light, this is likely to
have the effect of raising the humidity in the enclosed air space. It is important that the
seedlings grown in the light should experience similar humidity and this can be
achieved by covering their pot as well. The simplest solution is to cover the pot with
transparent foil. However, aluminium might conduct heat differently from transparent
foil so this might introduce an unwanted difference into the experiment. The difference
in temperature conditions induced by the two different coverings will probably be
small, especially if the room temperature is fairly constant during the experiment. It is
up to you whether you try to devise ingenious ways of reducing this source of error but
it is not worth spending too long looking for ways of improving this particular aspect of
the experiment.
There is another, quite subtle, factor that needs to be controlled. Light will affect the
stems and leaves of the seedlings and the stems and leaves of the two groups of seedlings
will differ quite strikingly in their appearance as a result. It is possible that these
changes in the leaves and stems themselves affect the roots. Perhaps a big stem
stimulates the growth of a big root, whereas a small stem does not. If this is so then it
will not be certain whether any differences which emerge are directly due to the effect of
light on the roots, or whether they are due to the effect of the light on the leaves and stems,
which in turn affect the growth of the roots. To control for this you should cut off the
growing shoots from the seedlings.
This experiment should provide an impressive set of data. Before going on to analyse
your measurements you will need to consider the hypothesis being tested (as in Section
1).
Activity 2.2 State the null hypothesis that is being tested by this experiment. State the
prediction that follows if the null hypothesis is correct. State the conclusions that you
can draw from the different results which you might get.
Solution
Null hypothesis Light has no effect on root growth.
Prediction based on null hypothesis The seedlings grown in the light will not differ in
average root length from the seedlings grown in the dark.
Possible results and conclusions If there is a difference in average root lengths between
the two groups of seedlings then the null hypothesis must be rejected and the alternative
hypothesis, that light does influence root growth, should be accepted. If there is no
difference then the null hypothesis is supported.
We suggest that you do not
treat all the seedlings in this
way. This will allow you to
see how long the roots grow
on the seedlings which are not
cut (details in Section 2.4).
If you were to discover, however, that the difference appears only in seedlings whose
stems have not been cut then you should suspect that light does not directly affect
root growth but does so indirectly through its effect on the leaves and stem.
2.2 Setting up the experiment
Here are the instructions for setting up the experiment.
1 Empty all the seeds (i.e. at least 40) into a basin containing cold tap water and leave
them to soak for an hour.
Seeds remain dormant while
they are dry; soaking them
ensures that all the seeds start
germinating at the same
moment and that they all get
off to a good start.
2 Make sure that the two large and the two small containers which you are going to
use are thoroughly clean, and thoroughly rinsed free of detergent, soap or any other
contaminant.
3 Put a pint of tap water into each of the large containers.
4 Take the two small containers and if they do not already have holes in the bottom
then cut one or two so that the water can easily seep up through the holes (see Figure
2.1).
5 Crumple several lengths of tissue paper and press them firmly into each small
container. Continue until both containers are just over-full. The paper should be firm
but not jammed solid.
6 Fold several lengths of tissue paper to make a smooth platform, then lay a sheet of
high quality (typing or writing) paper on top. Ensure that this platform touches the
crumpled tissue paper. Fix this platform of paper over the top of one of the small
containers with an elastic band (see Figure 2.2). Repeat for the other small container.
Figure 2.2
Typing or writing paper
Folded
tissue
Paper
\
Elastic band
/
Crumpled tissue paper
(absorbs water)
7
Stand each small container in a large container (see Figure 2.1).
8 By the time that the seeds have been soaked for one hour (see Instruction 1) the
paper in each small container should be thoroughly damp. If it is not then gently spoon
over the paper surface some of the water from the big container in which the small
container stands.
9 Reject any seeds that seem unusual: for example, those that are a different colour
from the rest and those that float rather than sink in the water.
10 Mix the seeds well and then allocate them alternately to two groups until 20 have
been allocated to each.
11 Arrange the two groups of 20 seeds as follows: one group on the writing paper
surface over each small container. A teaspoon may help you to manoeuvre the seeds
into position. Arrange each group of seeds in four rows of five, each row being at least
1cm (half an inch) away from the next (see Figure 2.3).
The further apart you can put
them, provided that they are
spaced regularly, the better.
Figure 2.3
12 Toss a coin to select randomly the pot whose seeds will grow in the dark; then
make a hood out of the aluminium foil and wrap it over the top of the chosen pot,
leaving a space of at least 5 cm (2 inches) between the seeds and the top of the foil (see
Figure 2.4). Secure the foil with an elastic band. Repeat the hood-making procedure,
using the clear foil, for the second pot.
13 Place the two sets of containers side-by-side in a well-lit spot safe from disturbance
by children, pets or curious passers-by. Try to ensure that the temperatures of the two
groups are the same.
This completes the setting-up procedure.
2.3 Maintaining the experiment
You also have to undertake a few tasks during the days that follow. Exactly how fast the
seedlings develop will depend on the temperature but you can expect to cut the stems
off some of the seedlings (see below) after about 2 to 5 days and to measure the root
lengths after 3 to 7 days. The maintenance tasks are as follows.
1 To control for any dissimilarity in the temperature of the two groups of seedlings,
swap the positions of the two large containers each day.
2 Make sure that there is still plenty of water in the large containers. If they begin to
dry out, add another pint of water to both of the large containers.
3 Check the seeds which you can see (i.e. those growing in the light) once a day to see
how rapidly the stems and roots are developing.
2.4 Cutting the stems
When most of the seedlings growing in the light have grown root hairs and have stems
which are a little over 1cm (about half an inch) long, cut the stems off 10 seedlings in
each pot (i.e. 20 seedlings in all). Any seedling which has not grown sufficiently for its
stem to be cut should be left uncut.
Figure 2.5 shows you where to cut. You will see on most of the roots a white, fluffy area
which is due to the presence of a multitude of very fine hairs, called root hairs. It is
primarily through these hairs that the root absorbs water and nutrients. Use the top of
the root, where the root hairs stop, as a reference point and cut across the stem about
3 mm (⅛ inch) above this point. Cuts should be made, without moving the seedlings,
using small, sharp-pointed scissors. Any seedling which has not produced any root
hairs should be left uncut.
When you have finished cutting the stems, replace the aluminium and clear foils on the
pots they were previously on, taking care not to squash the seedlings, and return them
to their positions (in the large containers) by the window. If the cut stems start to grow
rapidly then repeat the cutting process a few days later.
Figure 2.5 Developing leaves; root hairs; cut here with small scissors
2.5 Completing the experiment
Leave the pots until the roots of most of the seedlings in the light have grown to at least
1 cm (about half an inch) long.
To measure the root lengths you will need either a pair of dividers, a pair of compasses,
or a piece of plasticine and two pins, or needles, assembled as shown in Figure 2.6. Start
with one pot of seedlings and working along each row in turn, carefully lift each seedling
off the paper, taking great care to ensure that the tip of the root is not left behind. Lay
the plant down on a flat and, preferably, dark-coloured surface and straighten the root
as far as possible. Measure from the tip of the root to the position near the top of the
root where the root hairs stop (see Figure 2.7). You should do this by putting one point
of the dividers etc. against the root tip and then moving the other until it lies opposite
the position where the root hairs stop. Then measure the distance between the two
points of the dividers etc. with the ruler. Measure it in millimetres (to the nearest whole
millimetre).
Some seeds may not
germinate at all and these
should be ignored at this
stage.
This is the trickiest part of the
experiment.
You may find it difficult to
measure the root lengths in
this way: if you do then try
measuring the roots directly
with the ruler instead.
Figure 2.6 Embed heads of pins, or
needles, in plasticine about 1 cm apart
Figure 2.7 Measure this length
Write the measurement down in the appropriate place in Figure 2.8. For each seedling,
indicate very clearly whether or not you have previously cut its stem. Indicate also any
seeds that failed to germinate by putting a cross in the appropriate box. Measure all the
seedlings in one pot in this way and then repeat the procedure for the second pot.
Figure 2.8 Lengths of roots (mm): a recording grid for the Light group and the Dark group
2.6 Variability and systematic errors
Whatever differences might exist between your two groups of seedlings, it is likely that
they will not be very large. If they are large then there would be no need to use a
hypothesis test to analyse the data. One of the reasons why any differences which might
exist may not be obvious, despite all the controls which you have used, is that the
seedlings themselves are variable. You may try very hard to ensure that all the seedlings
grow in the same conditions but nevertheless there will be some small variations which
affect the plants. More importantly, even if they are grown under identical conditions,
different plants will still grow differently.
Variability was discussed in
Section 2.3 of Unit C1; it is
similar to sampling error (see
Section 4.1 of Unit A3).
We have explained that you might find some difficulty in measuring accurately the
lengths of the roots. Because you cannot straighten the roots perfectly, it is possible that
you will consistently underestimate their length. This kind of error, which is consistent
in its direction and approximately constant in its magnitude, is called systematic error.
Most scientific experiments are subject to both variability and systematic errors. The
task of the scientist (i.e. you) in carrying out an experiment is to discover, despite the
presence of variability, bias and systematic errors, whether genuine differences exist
between the two groups.
One major source of variability which, as we have explained, you are very likely to
experience is that some of your seeds may not germinate at all. When, later in this unit,
you calculate the means of the root lengths for your two groups of seedlings, should
these non-starters be included? There is no hard-and-fast rule as to how to deal with a
problem like this but it would seem sensible in this experiment to exclude them. Thus
you will need to answer the following, even more precise, question.
Does light affect the root growth of those mustard seedlings which germinate?
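For readers who record their results on a computer, a minimal Python sketch of this exclusion might look as follows; the variable names and the root lengths are purely illustrative, with None marking a seed that failed to germinate.

```python
# Illustrative only: root lengths in mm for one pot, with None marking seeds
# that failed to germinate (the crosses in Figure 2.8).
light = [14, 11, None, 17, 9, 13, None, 15]

# Exclude the non-starters before calculating the mean root length.
germinated = [length for length in light if length is not None]
mean_light = sum(germinated) / len(germinated)
print(round(mean_light, 1))  # 13.2
```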
Summary
In this section we have described how you should set up, maintain and complete a
scientific experiment to investigate the growth of mustard seedlings. This led to a
discussion of the problems of variability and systematic error which the experiment
illustrates. We also considered how you should analyse the data which you will get from
this experiment and formulated a statistical hypothesis which you will be able to
test using this data. This leads to the question
Which hypothesis test should be used to test this hypothesis?
We shall answer this question in the next section but now you should set up the
experiment as described in Section 2.2. While you are maintaining it as described in
Section 2.3, you should work through Section 3 to learn how to analyse the data which
you will collect when the experiment is completed (see Section 2.5).
3 The t-test
You now need to decide how to analyse the data from the mustard-seed experiment. In
this section we shall first consider this decision in the context of the hypothesis tests
introduced in Block B, and then we shall introduce a further test which is particularly
useful for data like this.
3.1 Which test?
In Block B we introduced several hypothesis tests, including the χ² test, the Mann-Whitney
test, the Wilcoxon matched-pairs test and the z-test, and in Unit B5 we
summarized how to decide which test is suitable for analysing a particular sample of
data. We shall now apply these principles to the data which you will obtain from your
mustard seedlings.
Activity 3.1
(a) Will the data from your seedlings be from categorical, ordinal or interval scale
measurements, i.e. will it be categorical, ordinal or interval data?
(b) Are the seeds in the two samples matched in pairs or are they unmatched?
(c) Of the tests which you have already met in the course, which would be appropriate
for analysing your seedling data?
Solution
(a) They will be on an interval scale. Not only is it possible to say that a length of, say,
30 mm is longer than a length of 20 mm; you can also say how much longer it is.
(b) There is no obvious method for matching the seeds from the two samples in pairs
so they are not matched. Thus the samples are unrelated.
(c) You have come across several different tests which can be applied to two unrelated
samples of data measured on an interval scale. Remember that tests which apply to
categorical and ordinal data can also be applied to interval data. Thus the χ² test or
Mood's median test could be applied, if you were to construct a contingency table from
the data by dividing the measurements into categories. However, both these tests need
fairly large samples and, unless almost all your seeds germinate, it is likely that your
samples will not be large enough.
The Mann-Whitney test can also be applied to your data. Although it needs only
ordinal data (because it takes account only of the order of the sample values), it can of
course be applied to interval data as well. It also works well on small samples.
The two-sample z-test can be used on unrelated samples of data measured on an
interval scale but, like the χ² test, it requires samples larger than those which you have.
Thus the Mann-Whitney test is the most appropriate one for your data.
You could go ahead and carry out a Mann-Whitney test but think for a moment about
what it involves. The test statistic is found by taking each value in one sample and
counting up how many values in the other sample are less than it. It does not matter how
much less they are: that is precisely why it can be applied to ordinal measurements.
Now, you will have interval data: in comparing two sample values, you know not only
which one is the larger but also how much larger it is than the other. Yet the Mann-Whitney
test does not use this information. This raises two questions.
Does this matter?
Is there a test which can be used on small samples and does use this
information?
The answer to the first question is somewhat complicated (we shall return to it in
Section 4) but the second is easier. There is such a test but it can be applied only to data
from populations satisfying certain distributional conditions.
See Section 4.5 of Unit C1.
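As an informal illustration of the counting rule just described, here is a minimal Python sketch; the function name and the data values are purely illustrative, and ties and the subsequent comparison with a critical value are not dealt with.

```python
def mann_whitney_count(sample_a, sample_b):
    """For each value in sample_a, count how many values in sample_b are less
    than it, and add these counts up (the counting rule described above)."""
    return sum(1 for a in sample_a for b in sample_b if b < a)

# Illustrative root lengths (mm) for seedlings grown in the light and in the dark.
light = [12, 15, 9, 20]
dark = [18, 22, 25, 17]

print(mann_whitney_count(light, dark))  # 2: only two light-dark pairs have the light root longer
```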
Amongst the tests introduced in Block B, the one which comes closest to what is needed
is probably the two-sample z-test. It can be used to compare two unrelated samples of
interval data. Thus the only limitation which prevents you using it for your seedling
data is that the z-test can be used only for large samples. It is worth looking again at the
rationale behind the z-test to see how this limitation arises. We shall then show how,
and under what conditions, it can be modified to give the test which you require.
3.2 The z-test reconsidered
Suppose that you have unrelated samples of data from two populations, whose
population means are μA and μB, and that you want to use the two-sample z-test to test
the following null hypothesis H0 against the two-sided alternative hypothesis H1.
H0: μA - μB = 0
H1: μA - μB ≠ 0.
The rationale of the two-sided z-test is as follows.
1 If the null hypothesis were true then the means μA and μB of the two populations
would be identical so, on average, you would expect the means from two samples, one
taken from each population, to be the same. In other words, on average you would
expect the difference between the means of the two samples to be 0. If the means of the
two samples are denoted by mA and mB, respectively, then, on average, you would
expect that mA - mB = 0.
2 If the null hypothesis were true and you took repeated pairs of samples from two
populations like this then you would find that sometimes mA was a bit bigger than mB
and that sometimes mB was a bit bigger than mA. Thus mA - mB would sometimes be a
bit bigger than 0 and sometimes be a bit smaller. In other words, mA - mB has a
sampling distribution. In Unit B4 we explained that, provided the samples are large
enough, this sampling distribution of mA - mB is approximately Normal with a mean
value of 0. The standard deviation of this sampling distribution is called the standard
error (SE). The value of this standard error depends on the two population standard
deviations and also on the two sample sizes.
3 The test statistic z is calculated as follows:
z = (mA - mB)/SE.
If the null hypothesis were true then the sampling distribution of the test statistic z
would be the standard Normal distribution.
4 Unfortunately the standard error is not usually known because the populations
from which the samples are taken are inaccessible and so the population standard
deviations are unknown. In Unit B4 we calculated the following estimate of the
standard error from the data being analysed:
√(sA²/nA + sB²/nB),
For example, neither you nor
anyone else has access to the
entire population of mustard
seedlings. We therefore have
to estimate the standard error
(SE).
where sA² and sB² are the variances of the two samples and nA and nB are their sizes.
5 So the test statistic z is calculated as follows:
z = (mA - mB)/√(sA²/nA + sB²/nB).
6 The null hypothesis is rejected whenever z is far enough away from zero. For
example, using a two-sided test at the 5% significance level, the null hypothesis is
rejected if
either z > 1.96 or z < -1.96.
The figure 1.96 is the critical
value.
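The six stages can be gathered into a short computational sketch. The Python below is purely illustrative: it assumes the estimated standard error √(sA²/nA + sB²/nB), with the sample variances calculated using the n divisor as elsewhere in the course, and the 5% two-sided critical value of 1.96.

```python
import math

def two_sample_z(sample_a, sample_b, critical_value=1.96):
    """Illustrative two-sample z-test following Stages 1-6 above (large samples assumed)."""
    n_a, n_b = len(sample_a), len(sample_b)
    m_a = sum(sample_a) / n_a                                # sample means
    m_b = sum(sample_b) / n_b
    var_a = sum((x - m_a) ** 2 for x in sample_a) / n_a      # sample variances (n divisor)
    var_b = sum((x - m_b) ** 2 for x in sample_b) / n_b
    est_se = math.sqrt(var_a / n_a + var_b / n_b)            # estimated standard error of m_a - m_b
    z = (m_a - m_b) / est_se                                 # test statistic
    reject = z > critical_value or z < -critical_value       # two-sided test at the 5% level
    return z, reject
```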
For large samples the procedure in Stages 5 and 6 above works perfectly well but for
small samples, particularly those involving less than 25 values, two problems arise.
As we mentioned in Unit B4, the sampling distribution of mA - mB is Normal only
for reasonably large sample sizes (at least 25). Thus the z-test does not necessarily
work for small samples because the sampling distribution of mA - mB may not then
be Normal. This problem does not arise if the two populations in question
themselves have Normal distributions: in this case the sampling distribution of
mA - mB is Normal for all sample sizes, however large or small. Thus, if it is
reasonable to assume that the populations have Normal distributions then this
difficulty is removed.
The second problem is in the estimate of the standard error, which is
√(sA²/nA + sB²/nB).
To be more precise, this is the sample estimate of the standard deviation of the
sampling distribution of the difference between the sample means: mA - mB. It is
used because the actual value of this standard deviation (i.e. the standard error) is
not known. For large sample sizes, this estimate is likely to be very close to the true
standard error but, even for large samples, it is a number calculated from a sample
so it is not the same for all possible samples. Because of this variability between
samples, the test statistic
(mA - mB)/√(sA²/nA + sB²/nB)
has a distribution different from the standard Normal distribution which you
would get if you divided mA - mB by its actual standard deviation (i.e. the actual
standard error) rather than this sample estimate thereof.
If the sample sizes nA and nB are large then
√(sA²/nA + sB²/nB)
will vary very little from one sample to another. Its distribution will have a very
small spread and so
(mA - mB)/√(sA²/nA + sB²/nB)
will have a distribution which is very close to the standard Normal distribution. If
either nA or nB is small then this distribution will not be close to the standard
Normal distribution: it will be more spread out than the standard Normal
distribution, i.e. it will tend to have fewer values close to zero and more values away
from zero.
This has the following consequence: if you wish to compare small samples then,
even if you think that the populations have Normal distributions, you cannot use
the z-test.
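The point can be illustrated by a small simulation; the Python sketch below is purely illustrative and assumes two Normal populations with the same mean and standard deviation and samples of only four values each.

```python
import math
import random

random.seed(1)
ratios = []
for _ in range(10000):
    # Two small samples from Normal populations with identical means and standard
    # deviations, so the null hypothesis is true.
    a = [random.gauss(0, 1) for _ in range(4)]
    b = [random.gauss(0, 1) for _ in range(4)]
    m_a, m_b = sum(a) / 4, sum(b) / 4
    var_a = sum((x - m_a) ** 2 for x in a) / 4
    var_b = sum((x - m_b) ** 2 for x in b) / 4
    ratios.append((m_a - m_b) / math.sqrt(var_a / 4 + var_b / 4))

# Far more than 5% of the ratios fall outside -1.96 to 1.96, so applying the
# z-test's critical value to such small samples would reject a true null
# hypothesis much too often.
print(sum(1 for r in ratios if abs(r) > 1.96) / len(ratios))
```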
However, under certain circumstances another test is available: it is called Student's
t-test or, simply, the t-test.
We are still assuming that the
null hypothesis (μA - μB = 0)
is true.
Student investigated what distribution of values you would get for an expression like
(mA - mB)/√(sA²/nA + sB²/nB)
if you took repeated pairs of small samples from two populations whose distributions
are Normal with identical means and variances.
Since the distribution is not the standard Normal distribution, the letter z can no longer
be used to represent such a test statistic when it is calculated from small samples, so
Student used the letter t.
It is called Student's t-test
after a famous statistician, W.
S. Gosset (1876-1937), who
published the results of his
mathematical research in this
area in 1908 under the pen-name Student. He worked for
the brewing company Guinness
and was required by the
conditions of his employment
to remain anonymous.
Using the results of his work it is possible to carry out a hypothesis test very similar to
the z-test provided certain quite stringent conditions are satisfied.
3.3 The two-sample t-test
You might be wondering why we have chosen to introduce yet another hypothesis test
for comparing two unrelated samples. There are two reasons.
The test in question is very widely used in all kinds of applications of statistics.
After you finish this course you will probably meet many statistical techniques
which have not been described in the course. So we want to show you how a
hypothesis test (which you might well find in a textbook) can be related to the
principles of hypothesis testing which you have met in the course, and also to
illustrate how to do this.
If you were to look in a statistics reference book to find a hypothesis test to compare
two small unrelated (i.e. unpaired) samples of measurements on an interval scale, then
you might well find something like this.
Two-sample t-test : summary
Applies to unrelated samples of data measured on an interval scale from two
populations, each of which is assumed to have a Normal distribution. Also, the
standard deviations of the two populations are assumed to be equal.
Denoting the means of the two populations by μX and μY, the null and alternative
hypotheses for the two-sided test are:
H0: μX = μY
H1: μX ≠ μY.
The test is carried out as follows.
1 Calculate the sample means mX and mY of the two samples.
2 Calculate an estimate s² of the common population variance as follows.
Denote the data in the two samples by x and y respectively, and denote the
corresponding sample sizes by nX and nY, then
s² = [Σ(x - mX)² + Σ(y - mY)²]/(nX - 1 + nY - 1).
3 Calculate the test statistic
t = (mX - mY)/√(s²/nX + s²/nY).
4 Look up the critical value tc for sample sizes nX and nY: this is the value
corresponding to nX - 1 + nY - 1 (the number of degrees of freedom).
5 For the two-sided test, reject H0 in favour of H1 if
either t > tc or t < -tc.
Otherwise the conclusion is that there is insufficient evidence to reject H0.
The table in Figure 3.1 lists
the critical values for this test
at the 5% significance level
when the number of degrees of
freedom, i.e. nX - 1 + nY - 1,
is 1, 2, 3, ... up to 40.
Figure 3.1 Critical values (tc) for Student's t-test (two-sided) at the 5% significance level, listed by degrees of freedom from 1 to 40
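Purely as an illustration, the five stages can be sketched in Python as follows; the critical value is taken as an argument, to be read off from Figure 3.1 for the appropriate number of degrees of freedom, and the function name is ours.

```python
import math

def two_sample_t(sample_x, sample_y, t_critical):
    """Illustrative pooled two-sample t-test following Stages 1-5 above.

    t_critical is the value read from Figure 3.1 for
    nX - 1 + nY - 1 degrees of freedom (5% level, two-sided).
    """
    n_x, n_y = len(sample_x), len(sample_y)
    m_x = sum(sample_x) / n_x                                 # Stage 1: sample means
    m_y = sum(sample_y) / n_y
    ss_x = sum((x - m_x) ** 2 for x in sample_x)              # Stage 2: pooled variance estimate
    ss_y = sum((y - m_y) ** 2 for y in sample_y)
    s2 = (ss_x + ss_y) / (n_x - 1 + n_y - 1)
    t = (m_x - m_y) / math.sqrt(s2 / n_x + s2 / n_y)          # Stage 3: test statistic
    reject = t > t_critical or t < -t_critical                # Stages 4-5: compare with tc
    return t, reject
```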
This procedure is fairly similar to that for the two-sample z-test but there are several
important differences.
The critical value for the t-test depends on the sizes of the samples: more precisely it
depends on the number of degrees of freedom.
The t-test can be applied only if it is reasonable to assume that the populations
involved have Normal distributions.
The t-test can be applied only if it is reasonable to assume that the population
standard deviations are equal. If the population standard deviations are, as we
have assumed, equal then the population variances are also equal (i.e. there is a
common population variance); this is why, in Stage 2, s² is an estimate of this
common population variance.
This last difference is perhaps the most important so, before looking at some examples
of the use of this t-test, we shall show you how to calculate s².
The formula for calculating s² in Stage 2 tells you to
add up the squared residuals of the measurements in the first sample (x) from their
mean mX to get Σ(x - mX)²
do the same for the second sample to get Σ(y - mY)²
add these two sums together and divide by the sum of the sample sizes each minus
one, i.e. by the number of degrees of freedom: nX - 1 + nY - 1.
Example 3.1
In an agricultural experiment to investigate the effect of different diets
on the weights of calves, eight calves were allocated to two groups which were fed
different diets, X and Y. Both diets consisted of milk, hay and manufactured
concentrates; the difference between them was that the concentrates in diet Y were
different from those in diet X.
Unfortunately, one of the calves (on diet Y) suffered from a disease which prevented
proper digestion so did not eat very much. That calf was therefore excluded from
consideration when assessing the effects of the two diets.
The calves were kept in similar conditions and the food intake and weight of each calf
was monitored from birth. The allocation of the calves to the two groups was designed
to control for birth-weight but was otherwise random. Figure 3.2 contains some of the
data collected in this experiment.
This procedure is somewhat similar to, but more complicated than, that used to
calculate the sample variance of a single sample, which is:
variance = s² = Σ(x − m)² / n
(see Section 4.2 of Unit A2).
Figure 3.2 Average daily weight-gain over five weeks from birth-date (kg per day)
We shall use the two-sample t-test to investigate whether there is a difference between
the population means, μ_X and μ_Y, of the daily weight-gains of calves fed on the two diets.
For measurements such as these it is reasonable to assume that the populations have
Normal distributions and that their standard deviations (and hence variances) are equal.
To use the t-test we must first calculate the estimate S_C² of this common population
variance. Here is one way of going about this calculation: we first calculate
Σ(x − m_X)² and Σ(y − m_Y)².
For the sample fed on diet X, we could add up the squared residuals (x − m_X)² directly.
Rather than continuing with this calculation, let us pause for a moment and consider
the method of calculation. As with the calculation of the standard deviation, the
method we have just used is not the best method to calculate Σ(x − m_X)². It is much
quicker to start by calculating Σx² and Σx (and hence m_X), as we recommended for the
standard deviation in Section 4.2 of Unit A2. Then, to calculate Σ(x − m_X)², we can use
the fact that
Σ(x − m_X)² = Σx² − n_X m_X².
This is because
s_X² = Σx²/n_X − m_X²
and
Σ(x − m_X)² = n_X s_X².
Thus
Σ(x − m_X)² = n_X (Σx²/n_X − m_X²) = Σx² − n_X m_X².
Thus it is this second formula which is the most convenient to use. Using this method, as
in the similar method for the calculation of the standard deviation, only two columns
are required and the mean is calculated at the same time.
Do not worry if you cannot
follow the algebraic proof: it
is closely related to Section 4.4
of Unit A2.
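This identity is easy to verify numerically; the following short check (ours, with an arbitrary small sample) shows that the direct and shortcut calculations of Σ(x − m)² agree.

    x = [3.0, 5.0, 4.0, 6.0]                          # any small illustrative sample
    n = len(x)
    m = sum(x) / n
    direct = sum((v - m) ** 2 for v in x)             # add up the squared residuals directly
    shortcut = sum(v ** 2 for v in x) - n * m ** 2    # sum of squares minus n*m^2
    print(direct, shortcut)                           # both print 5.0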
So we shall now do the whole calculation for S_C² using this method.
For the sample fed on diet X, Σx = 2.05 and Σx² = 1.0625.
Thus m_X = 2.05/4 = 0.5125.
Thus Σ(x − m_X)² = Σx² − n_X m_X²
                 = 1.0625 − 4 × (0.5125)²
                 = 0.011875.
For the sample fed on diet Y, Σy = 2.03 and Σy² = 1.3769.
Thus m_Y = 2.03/3 = 0.6766667.
Thus Σ(y − m_Y)² = Σy² − n_Y m_Y²
                 = 1.3769 − 3 × (0.6766667)²
                 = 0.0032667.
Thus S_C² = [Σ(x − m_X)² + Σ(y − m_Y)²] / (n_X − 1 + n_Y − 1)
          = 0.0151417/5
          = 0.0030283.
Activity 3.2 Here are some pairs of samples (labelled X and Y). In each case the
samples come from two populations (also labelled X and Y) whose means μ_X and μ_Y are
to be compared. The populations can all be assumed to have Normal distributions and
each pair of populations can be assumed to have a common standard deviation. For
each pair, calculate from the samples the estimate S_C² of the common variance of the two
populations.
(a) Two groups of children are asked to solve a simple puzzle in which they have to
manoeuvre a ball around an obstacle course and into a hole. One group, X, of children
have seen the obstacle course before but have not been told how to negotiate it. The other
group, Y, of children have not seen the obstacle course but have been told in advance how
to negotiate it. The length of time (in seconds) taken by each child to manoeuvre the ball
round the course is shown in Figure 3.3.
Figure 3.3 Times (in seconds) taken by the children in Group X and Group Y
(b) An agricultural experiment to determine the best method of applying fertilizer to
fields growing winter wheat produced the data in Figure 3.4. The fields in Group X had
some fertilizer applied at one stage of growth of the wheat and some more later whereas
the fields in Group Y had the same total amount of fertilizer all applied at the later
stage. The yield of grain in kg per hectare was recorded for each field in the experiment.
Figure 3.4 Grain yields (kg per hectare) for the fields in Group X and Group Y
This method of calculation is summarized on p. 29.
It is important not to cut this figure as it will be used in further calculations.
Solution
(a) Thus m_X = 25/5 = 5.
Thus Σ(x − m_X)² = Σx² − n_X m_X²
                 = 151 − 5 × (5)²
                 = 151 − 125
                 = 26.
Similarly m_Y = 7, so
Σ(y − m_Y)² = Σy² − n_Y m_Y²
            = 283 − 5 × (7)²
            = 283 − 245
            = 38.
Thus S_C² = [Σ(x − m_X)² + Σ(y − m_Y)²] / (n_X − 1 + n_Y − 1)
          = (26 + 38)/8
          = 8.
(b) Thus m_X = 21.60/3 = 7.2.
Thus m_Y = 31.49/5 = 6.298.
Thus Σ(x − m_X)² = Σx² − n_X m_X² = 0.3912
and Σ(y − m_Y)² = Σy² − n_Y m_Y² = 0.91468.
Thus S_C² = (0.3912 + 0.91468) / (3 − 1 + 5 − 1)
          = 1.30588/6
          = 0.2176467.
The next stage (Stage 3) in the t-test procedure is exactly the same as in the z-test: the
test statistic is
t = (m_X − m_Y) / √(S_C²/n_X + S_C²/n_Y).
Example 3.1 (continued) For the calves experiment
t = (0.5125 − 0.6766667) / √(0.0030283/4 + 0.0030283/3)
  = −0.1641667 / √0.0017665.
We cut this to 3 decimal places to get t = −3.905.
Now, if the null hypothesis (H₀: μ_X − μ_Y = 0) is true then we would expect t to be
close to zero. Is −3.905 close enough to zero, or should the null hypothesis be rejected
in favour of the alternative hypothesis, that the mean weight-gains differ? This stage of
the t-test is much more similar to the Mann-Whitney test than it is to the z-test, in that
the answer depends on the sample sizes. This is because the shape of the sampling
distribution of the test statistic t varies with the sample sizes. It is therefore necessary to
consult a table of critical values.
The table of critical values is in Figure 3.1. One noticeable feature of this table is that the
first column is headed Degrees of freedom. You have to use the number of degrees of
freedom rather than the sample sizes themselves when looking up the critical value. For a
two-sample t-test with sample sizes n_X and n_Y the number of degrees of freedom is
n_X − 1 + n_Y − 1 (the same as the denominator in the formula for S_C²).
If you look at Figure 3.6 you will see that when the number of degrees of freedom is small,
the corresponding curve is low in the middle and high (i.e. further from the horizontal
axis) at the extremes; as the number of degrees of freedom increases, the curves get nearer
the axis at the extremes. Thus the probability of getting a large (i.e. extreme) value of t
(either negative or positive) is bigger when the number of degrees of freedom is small than
when it is large.
Example 3.1 (continued) Here n_X = 4 and n_Y = 3. Therefore the number of degrees of
freedom is 4 − 1 + 3 − 1. This equals 5, thus the row of the table to consult is the one
with 5 in the Degrees of freedom column. This gives the critical value t_c = 2.571.
This means that if the null hypothesis is true then the probability of obtaining a value of
t greater than 2.571 is 0.025, or 2.5 %, and the probability of obtaining a value of t less
than −2.571 is also 0.025. So, if the null hypothesis is true then the probability of
obtaining a value of t less than −2.571 or greater than 2.571 is 0.05, or 5 % (see Figure 3.5).
In our example, the value of t was calculated to be −3.905.
Figure 3.5 Sampling distribution of the test statistic t under the null hypothesis: 5 degrees of freedom
You met this term in Section 3, Frame 10, of Unit B3 in
connection with the χ² test.
Activity 3.3 Does this mean that the null hypothesis of no difference between the diets
should be rejected at the 5 % significance level?
Solution Yes. The value - 3.905 calculated for the test statistic t is less than - 2.571
(i.e. it is more extreme than the critical value) so the null hypothesis should be rejected
at the 5 % significance level.
In general, the null hypothesis should be rejected whenever the calculated value of the
test statistic t is further from zero than the critical value, i.e. whenever the value of t,
ignoring any minus sign, is greater than or equal to the critical value t_c given in Figure
3.1 for the appropriate number of degrees of freedom.
Figure 3.6 Sampling distribution of the test statistic t, for various numbers of degrees of freedom,
compared with the standard Normal distribution. (Curves shown: standard Normal distribution;
16 degrees of freedom; 5 degrees of freedom; 3 degrees of freedom; 1 degree of freedom.)
If you look at the table of critical values in Figure 3.1 more closely then you may notice
that, as the number of degrees of freedom increases, the critical value decreases. For
example, for 10 degrees of freedom the critical value is 2.228, whereas for 40 degrees of
freedom it is 2.021. The reasons for this are illustrated in Figure 3.6: for small numbers
of degrees of freedom the tails of the distribution of the test statistic t die away less
rapidly than for large numbers, so the critical value must be further from zero. This
difference can be seen quite clearly (from the curves in the figure) for 1 or 3 degrees of
freedom but the curve for 16 degrees of freedom looks very similar to that of the
standard Normal distribution.
For a larger number of degrees of freedom, the curve (drawn at the scale of the figure)
would be indistinguishable from that of the standard Normal distribution. However,
even these small differences in the curves produce noticeable differences in the sizes of the
tails. The tails of these distributions of t all represent a larger proportion of the
distribution than do the tails of the standard Normal distribution. This larger
proportion makes the critical value noticeably larger than 1.96: for 16 degrees of freedom
it is 2.120 and for 40 degrees of freedom it is 2.021 (this is smaller than 2.120 but still larger
than 1.96). Thus, as the number of degrees of freedom increases, the corresponding
critical value decreases: the critical values get closer and closer to 1.96 but they never
become smaller than 1.96.
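If a table like Figure 3.1 is not to hand, the same two-sided 5 % critical values can be computed from the t distribution itself; the short sketch below uses the scipy library (our addition, not something the unit relies on).

    from scipy.stats import t as t_dist

    for df in (1, 3, 5, 10, 16, 40):
        # two-sided 5% level means 2.5% in each tail
        print(df, round(t_dist.ppf(0.975, df), 3))
    # e.g. 5 degrees of freedom gives 2.571, 16 gives 2.120, 40 gives 2.021,
    # approaching the standard Normal value 1.96 as the degrees of freedom grow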
This rejection rule is for the two-sided test; we do not describe the one-sided t-test in this course.
Activity 3.4 For each of the pairs of samples in Activity 3.2, carry out a two-sided t-test at the 5 % significance level to test the null hypothesis
H₀: The population means are equal: μ_X = μ_Y
against the two-sided alternative hypothesis
H₁: The population means are not equal: μ_X ≠ μ_Y.
Solution
(a) In this case
t = (m_X − m_Y) / √(S_C²/n_X + S_C²/n_Y)
  = (5 − 7) / √(8/5 + 8/5)
  = −1.118 (cut to 3 decimal places).
The number of degrees of freedom is 8 so the critical value t_c = 2.306. The value of the
test statistic −1.118 is nearer to zero than the critical value 2.306, so we cannot reject
the null hypothesis (of no difference between the population means) at the 5 %
significance level.
This critical value comes from
the row of Figure 3.1 with 8
in the Degrees of freedom
column.
(b) In this case
t = (7.2 − 6.298) / √(0.2176467/3 + 0.2176467/5)
  = 2.647 (cut to 3 decimal places).
The number of degrees of freedom is 6 so, from Figure 3.1, the critical value t_c = 2.447.
The value of the test statistic 2.647 is further from zero than the critical value 2.447, so
we reject the null hypothesis (of no difference between the population means) at the 5 %
significance level.
Often the data from the samples is not presented in detail but the summary measures,
mean and standard deviation, are given, as in some of the examples in Unit B4. If the
data is presented in this way, we can still carry out a t-test. To do this we need a formula
for calculating S_C² from s_X and s_Y.
The formula is
S_C² = (n_X s_X² + n_Y s_Y²) / (n_X − 1 + n_Y − 1).
This is because
Σ(x − m_X)² = n_X s_X²  and  Σ(y − m_Y)² = n_Y s_Y².
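In code, the same calculation from summary measures only might look like the following sketch (ours; note that s_X and s_Y here are the divide-by-n standard deviations used throughout this unit).

    from math import sqrt

    def two_sample_t_from_summaries(nx, mx, sx, ny, my, sy):
        """t statistic from sample sizes, means and (divide-by-n) standard deviations."""
        s2_c = (nx * sx ** 2 + ny * sy ** 2) / (nx - 1 + ny - 1)   # S_C^2
        t = (mx - my) / sqrt(s2_c / nx + s2_c / ny)
        return t, nx - 1 + ny - 1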
Activity 3.5 This concerns an experiment to investigate two different varieties of
lupin: Lupinus arboreus and Lupinus hartwegii.
Here are the summaries of the data from two samples: one of each of these varieties. All
the plants were grown at the same time in similar conditions in a nursery and the height
of each (in metres) was measured on the same day.
arboreus: sample size n_A = 4; mean m_A = 1.452; standard deviation s_A = 0.061.
hartwegii: sample size n_H = 5; mean m_H = 1.023; standard deviation s_H = 0.023.
Carry out a t-test on these samples to investigate whether there is a significant
difference between the heights of these samples of the two varieties.
Solution First we calculate the estimate S_C² of the common population variance using the above
formula as follows.
S_C² = (4 × (0.061)² + 5 × (0.023)²) / (4 − 1 + 5 − 1)
     = 0.017529/7
     = 0.0025041.
Now,
t = (1.452 − 1.023) / √(0.0025041/4 + 0.0025041/5)
  = 12.779 (cut to 3 decimal places).
The number of degrees of freedom is 7 thus the critical value t_c = 2.365. The value of the
test statistic 12.779 is further from zero than the critical value 2.365, so the varieties
differ significantly in height. ■
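As a check, the sketch function given after the formula above reproduces this value from the lupin summaries:

    t, df = two_sample_t_from_summaries(4, 1.452, 0.061, 5, 1.023, 0.023)
    print(int(t * 1000) / 1000, df)   # 12.779 ("cut" to 3 decimal places) with 7 degrees of freedom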
Before going on to apply the t-test to the mustard seed data, which you should now
collect following the instructions in Section 2.4, here are some exercises.
Exercise 3.1
Solution on p. 43
Exercise 3.2
Solution on p. 44
A biscuit manufacturer wishes to compare the performance of two biscuit production lines, A and
B. The lines produce packets of biscuits with a nominal weight of 300 grams. Two random
samples of 15 packets, one from each of the two lines, are weighed (in grams). The sample data is
summarized as follows.
Line A: mean m_A = 309.8; standard deviation s_A = 3.5.
Line B: mean m_B = 305.2; standard deviation s_B = 7.0.
Carry out a t-test on this sample data to test whether the two production lines produce packets of
different weight.
3.4 Analysis of mustard seedling data
When you have collected the data from your mustard seedlings experiment, you will be
able to test whether light affects root growth. You should have 10 measurements from
seedlings grown in the light and 10 from seedlings grown in the dark: all from seedlings
whose stems you cut.
Before carrying out the t-test on this data, you should consider whether you can assume
that the populations satisfy the necessary distributional conditions. The only method
which you have available to do this is to examine the data which you have collected. We
shall now demonstrate how to do this using the following data, which we hope is not
too different from yours.
Even if you have not got 10 measurements from each group, you can still carry out
the test, provided you have at least ... altogether, including at least 1 from each group!
Figure 3.7 Lengths of roots (in mm) obtained in a mustard seedlings experiment.
One block of measurements is given for the seedlings grown in light and one for the
seedlings grown in dark.
Measurements which are underlined were obtained from seedlings whose stems were
cut during their growth. A cross indicates a seed which did not germinate.
Activity 3.6 Prepare a back-to-back stemplot of the lengths of the roots of the two
samples of 10 cut seedlings from Figure 3.7 (i.e. of the values which are underlined).
Solution
For the light sample, the largest value is E_U = 55 and the smallest is E_L = 13.
For the dark sample, E_U = 41 and E_L = 14.
This suggests a back-to-back stemplot with stems 1 to 5 (tens of mm), the light-sample
leaves on the left and the dark-sample leaves on the right, with the key 1 | 4 representing
14 mm; there are n = 10 leaves on each side.
Both of these samples of data look as if they could well have come from populations
with a Normal distribution. Also, their spreads are very similar so the population
standard deviations (and hence also variances) could well be equal. With such small
samples it is not possible to be more precise than this but there is certainly no strong
evidence that the distributional conditions required for the t-test are not satisfied. Nor,
again because they are so small, would there be any such evidence even if the samples
were less symmetric or had less similar spreads. Thus such a stemplot, together with a
large amount of similar data about living things collected by scientists, suggests that we
can apply the t-test to these samples of data.
The analysis of your mustard seedling data will form part of the tutor-marked
assignment covering this unit.
Summary
This section explained why a two-sample z-test cannot be used when the sample sizes
are small, and introduced the t-test which can be used with small samples under certain,
quite stringent, distributional conditions. Full details of these conditions, together with
a summary of the whole test procedure, are on p. 19. Here we shall summarize the
calculation of the test statistic.
Procedure To calculate the test statistic for the two-sample t-test
Denote the two samples by X and Y and their sizes by n_X and n_Y.
1 Calculate the sample means m_X and m_Y and the sums of squares Σx² and Σy².
2 Calculate Σx² − n_X m_X² and Σy² − n_Y m_Y².
3 Calculate S_C² by summing Σx² − n_X m_X² and Σy² − n_Y m_Y² and dividing by
n_X − 1 + n_Y − 1 (i.e. the number of degrees of freedom).
4 Calculate m_X − m_Y.
5 Calculate √(S_C²/n_X + S_C²/n_Y).
6 Calculate the test statistic t by dividing m_X − m_Y by √(S_C²/n_X + S_C²/n_Y).
Cut this to 3 decimal places.
Full accuracy should be kept for all these calculations but the value of the test statistic t
should be cut to 3 decimal places.
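The whole procedure is also available ready-made in statistical software; for example, the scipy library's pooled-variance t-test gives the same statistic as the steps above (a sketch of ours, with the same made-up samples as before, and a p-value in place of a table look-up).

    from scipy.stats import ttest_ind

    x = [12.1, 11.4, 13.0, 12.6]
    y = [10.2, 11.1, 10.8]
    result = ttest_ind(x, y, equal_var=True)   # equal_var=True is the common-variance t-test of this unit
    print(round(result.statistic, 3), round(result.pvalue, 3))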
After completing this section you should be able to
carry out a two-sample t-test for unrelated samples
understand the stringent distributional conditions required for the use of the t-test.
4 The t-test continued
In this section we shall introduce a version of the t-test which can be used to analyse a
single sample of suitable data and then show how this test can be adapted to produce a
useful test for data from experiments involving matched pairs. We shall then show that,
like other hypothesis tests, these t-tests can be used to find confidence intervals. The
section ends by briefly considering more complicated designs of experiment and how
these might be analysed but, first, we return to a question which was raised briefly in
Section 3.1.
4.1 Why the t-test?
You may have wondered why people bother ever to use the t-test. After all, the Mann-Whitney test can almost always be used instead and the t-test seems to involve
calculations which are comparatively lengthy. Moreover, the t-test requires quite
demanding conditions: the variance of the two populations from which the samples are
drawn must be equal and these two populations must have Normal distributions. In
fact, this Normal distributions condition is the key to why the t-test is useful in many
circumstances. The t-test is based on the individual data values, rather than just on their
order, and hence appears to be using more information from the data than does the
Mann-Whitney test. However, the extra information can be used only if a condition on
the exact shape of the population distributions, the Normality condition, can be
assumed.
Tests like this, which involve specific distributions, are called parametric. In contrast,
non-parametric tests like the Mann-Whitney test and the Wilcoxon matched-pairs test
do not require conditions on the exact shape of the population distributions, though
they may require some limitations on the distributions. The name parametric arises
because, for example, the t-test assumes that only the means and variances of the two
population distributions are unknown: the shape must be that of a Normal
distribution. (The mean and variance of a Normal distribution are called its parameters.)
Also, tests like the Mann-Whitney test can be used for data from ordinal measurements,
whilst the t-test can be used only for data from interval scale measurements.
For example, the Mann-Whitney test assumes that the
two population distributions must be the same shape as
each other, but it does not matter what shape this is.
When the specific conditions for using the t-test are satisfied, the test certainly uses
more of the information in the data than does the Mann-Whitney test. This shows up in
the fact that the t-test is more powerful than the Mann-Whitney test. When statisticians
talk of the power of a test, in a context like this, they mean the ability to pick out a true
difference between the means of the two populations. So, judged by this criterion, the t-test is more likely than the Mann-Whitney test to detect a difference between
population means when such a difference really exists.
Accepting the null hypothesis when it is really false is a type 2 error so the power of a test
is a measure of its ability to avoid a type 2 error. To be precise it is the probability that,
when using the test, a type 2 error will not occur. Other things being equal, the t-test is
less likely to result in a type 2 error than is the Mann-Whitney test. However, the
difference between the powers of these two tests is not very great.
In general, the power of a test
is a measure of that test's
ability to reject the null
hypothesis when it is really
false.
The slightly lower power of the Mann-Whitney test seems a small price to pay for the
advantage of its simplicity and lack of restrictive conditions. For this reason, many
statisticians prefer, whenever possible, to use it, and other non-parametric tests, rather
than the t-test. However, the t-test and related tests do have advantages over non-parametric tests, particularly in analysing data from experiments with complicated
designs.
One other difference between the t-test and the Mann-Whitney test is that the former is
a test for differences between means and the latter for differences between medians. If the
conditions required by the t-test are satisfied then the population distributions are
symmetric because they are Normal. Therefore, in this case the population mean and
median are equal. However, in other cases the mean and median may not be equal and
it could be important to use the median rather than the mean.
4.2 Checking the conditions
There are statistical techniques which help to decide whether it is reasonable to assume
that a sample comes from a Normally distributed population. (A Normally distributed
population is a population with a Normal distribution.) These are often called
goodness-of-fit tests because they test how well the distribution of the data from a
sample fits a specified distribution such as the Normal distribution. Other techniques
help to assess whether it is reasonable to assume that two samples of data come from
populations whose variances are equal. It is, however, always a good idea to produce
stemplots of the measurements from the two samples and use these to check whether it
looks as if the samples come from populations which are Normally distributed with
equal variances. (In Section 3.4 we did this for the data from the mustard seedlings
experiment.) The problem with all these methods, however, is that with small
samples it is very difficult to detect any departure from Normal distributions or from
equal variances, even if these really exist. The distributions of the two populations need
to have very different variances, or need to be very far from Normal, before these
features show up in small samples. Usually, therefore, you must rely on knowledge of
similar experiments or the data sources, to decide whether the distributional conditions
required by the t-test are likely to be satisfied. For example, if the data was people's pay,
or earnings, then it would not be reasonable to assume that the populations were
Normally distributed. As was illustrated by Section 2.1 of Unit A2, such data often has
a very right-skew distribution. (A skew distribution is one which is not symmetric.)
Fortunately, even if the conditions are not completely satisfied the t-test does not
usually give wildly misleading results: the analysis is not usually very much more likely
to lead to errors (of either type 1 or type 2) than if the conditions are met. For example,
as was discovered by the statistician Boneau, if the two populations from which the
samples are drawn have markedly skew distributions then the value of the test statistic t
is likely to be slightly smaller (i.e. closer to zero) than it would be if the populations were
Normally distributed. Thus, it is slightly more likely that the null hypothesis will be accepted
when it is actually false (i.e. that a type 2 error will occur) if the population distributions
are skew than if they are Normal. Most statisticians would probably be happy to accept
this risk.
Boneau also found that, provided the samples are of equal size, even if the variance of
one of the Normally distributed populations were four times that of the other, the
values of the test statistic t obtained from samples drawn from those populations would
be only slightly different from those obtained from two (Normally distributed)
populations with equal variances. If, however, the sample sizes are markedly different
(e.g. one sample of size 5 and the other of size 15) then the values of the test statistic can
behave very erratically. Thus it is advisable to have samples of the same size, or of as
similar size as possible, when using a t-test.
4.3 The one-sample t-test
The t-test described in Section 3.2 is used to compare two unrelated (i.e. unpaired)
samples. We pointed out there that in many ways the procedure is similar to the z-test
for the difference between two population means. You may have wondered whether a t-test like the one-sample z-test could be used to investigate whether the mean of a
population, from which a single small sample of data is available, is equal to some
particular value: if the sample were large then the one-sample z-test could be used. A
version of the t-test can indeed be used for this purpose and the procedure is again very
similar to that for the related z-test.
For example, in Section 3.1 of Unit B4, the one-sample z-test
was used to investigate whether a sample of examination marks
could come from a population with mean 58.
Example 4.1 A tomato grower decides to try out a new variety of outdoor bush
tomato. The seed merchant claims that this variety produces an average yield of 4 kg of
tomatoes per plant. The grower wants to investigate whether the claim is true because if
it is then it would be worth his growing this new variety. He has room to experiment
with only 5 plants of the new variety. The yields of tomatoes from each of his five plants,
in kg, are
3.6 3.2 3.1 2.6 3.9.
If the population mean of the yield (in kg per plant) of tomatoes from plants of this
variety is denoted by μ, then the grower's null hypothesis is that μ is as the seed
merchant claims. In symbols
H₀: μ = 4.
Now, the grower needs to see whether there is a significant difference between the mean
yield of his 5 plants and the seed supplier's claim. Let us assume for the moment that he
has no views about the direction of any difference, thus his alternative hypothesis is
two-sided. In symbols
H₁: μ ≠ 4.
If the sample size had been large then the grower could have used the one-sample z-test
but the sample is certainly not large enough. He could use the sign test provided he
changed his hypothesis to refer to the population median rather than the mean.
However, this data is from interval scale measurements and the sign test takes account
only of whether each data value is above or below a particular number. Thus the sign
test does not use all the information in the data. It is possible to use much more of this
information by using a t-test but this can be done only if it is reasonable to assume that
the population distribution is Normal. Our tomato grower might well feel that this
assumption is justified.
Activity 4.1
(a) What test statistic would you use if you were trying to apply a z-test to this data?
(b) Why is the z-test not appropriate here?
Solution
(a) The test statistic would be
z = (m − 4)/SE,
where m is the mean of the sample data and SE is the sample estimate of the standard
error of the sample mean. You would calculate this estimate from the formula
SE = s/√n,
where s is the sample standard deviation and n is the sample size.
See Section 3.1 of Unit B4.
See Section 4.1 of Unit B1.
Many measurements of living things are Normally distributed:
for example, the heights of adult men in Section 1.3 of Unit B4.
(b) The z-test is not appropriate here for the same reason that it was not appropriate
in Section 3. The sample size is so small that s/√n may not be a good estimate of the
standard error. Consequently, for such a small sample size, this test statistic has not got
a standard Normal distribution.
Even though the z-test is not appropriate, the test statistic still seems reasonable. If the
null hypothesis were true then the sample mean m would tend to be close to 4, so it
makes sense to look at m − 4. If it is close to zero then H₀ may well be true. If it is far
away from zero then perhaps H₀ is false. Also, it still makes sense to divide m − 4 by an
estimate of its standard error, even if the small sample size means that this estimate may
not be accurate. For these reasons the test statistic used for this t-test is almost the same
as the z-test statistic. The only difference lies in the estimate of the standard error SE.
Activity 4.2 Calculate the sample mean m and sample standard deviation s for the
tomato grower's data.
Solution The sample mean is
m = Σx/n.
The calculation of the sample standard deviation by Method 2 uses the following
formula:
variance = Σx²/n − m²  and  s = √variance.
Laying the calculations out in a table gives the following (the sample size n = 5):
Σx = 16.4 and Σx² = 54.78.
Thus the sample mean is m = 16.4/5 = 3.28.
Thus the sample variance is 54.78/5 − (3.28)² = 10.956 − (3.28)², which is 0.1976.
Finally, the standard deviation s = √0.1976 = 0.4445222.
In calculating the test statistic for the t-test the only change from the z-test is that
instead of using the sample standard deviation s, you should use
S = √[Σ(x − m)²/(n − 1)].
Remember that
s = √[Σ(x − m)²/n].
Thus the only difference between s and S is that to calculate S you should divide by
n − 1 instead of by n.
If you were to use Method 1 to calculate s, by adding up the values (x − m)² directly,
then you could calculate S by dividing by n − 1 instead of by n. If, as we advise, you use
Method 2 then, instead of calculating
Σx²/n − m²,
you should calculate
(Σx² − nm²) / (n − 1).
Example 4.1 (continued) For the tomato data, where n = 5,
S = √[(Σx² − nm²)/(n − 1)]
  = √[(54.78 − 5 × (3.28)²)/4]
  = √0.247
  = 0.4969909.
It is important not to cut this figure as it will be used in further calculations.
The reason for using S instead of s is that, for some purposes, S is a better estimate of the
population standard deviation than is s. It has particular advantages in the t-test and
related tests. For large samples, i.e. when n is large, it makes very little difference
whether s or S is used. For example, if Σ(x − m)² were 200 and n were 100 then
s = √(200/100) = 1.414…,
whilst
S = √(200/99) = 1.421….
However, for small samples the difference is more marked.
Some books refer to S rather than to s as the sample standard deviation. Also, some
calculators which have a key to calculate standard deviations automatically calculate
S rather than s. If you have one of these calculators then you can check whether it
calculates s or S by using it to find the standard deviation of the sample of size 2
consisting of the two numbers 0 and 2. For this sample, s = 1 and S = 1.414….
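The same check can be done in code: in the numpy library (our example, not part of the unit) the ddof argument switches between the divide-by-n and divide-by-(n − 1) versions.

    import numpy as np

    sample = [0, 2]
    print(np.std(sample, ddof=0))   # s: divide by n,     gives 1.0
    print(np.std(sample, ddof=1))   # S: divide by n - 1, gives 1.414...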
Example 4.1 (continued) For the t-test the sample estimate of the standard error is
S/√n. Thus the test statistic for the tomato grower is
t = (m − 4)/(S/√n),
i.e. m − 4 divided by this sample estimate of the standard error. This is
t = (3.28 − 4)/(0.4969909/√5) = −3.239.
If H₀ were true then the population mean would be 4, so the sample mean m would
probably be close to 4. Hence m − 4 would be near 0 and so t would be near 0. Thus, as
with the z-test, values of t sufficiently far from 0 (positive or negative) result in rejection
of the null hypothesis. To know whether to reject the null hypothesis you need a critical
value and probably you will not be surprised to learn that this comes from the table of
critical values in Figure 3.1. To use this table you need to know what number of degrees
of freedom to look up. For the one-sample t-test the rule is:
for a sample of size n, the number of degrees of freedom is n - 1.
Activity 4.3 Find the critical value at the 5 % significance level for the tomato grower's
test.
Solution The sample size n is 5, so the number of degrees of freedom is 4. Thus from
Figure 3.1 the critical value is 2.776.
As with the two-sided t-test
we cut the value of the test
statistic t here to 3 decimal
places.
Example 4.1 (continued) The rejection rule for the tomato grower is therefore: reject
H₀ in favour of H₁ if the sample value of the test statistic t is greater than or equal to
2.776 or less than or equal to −2.776. Since t = −3.239, which is less than −2.776, the
tomato grower should reject H₀ and conclude that, on the basis of his sample, the seed
merchant's claim that the new variety yields 4 kg per plant is false.
Of course he may have some reservations about this conclusion. Perhaps the seed
merchant's claim referred to a growing season with average weather, and the weather
had been bad this year, or perhaps the growing conditions or the amount of fertilizer
used were not optimal.
Here is a summary of the procedure for the one-sample t-test: the test described here is
two-sided.
One-sided t-tests can be used
but they need a different table
of critical values and we shall
not describe them in this
course.
Procedure One-sample t-test (two-sided)
The test applies to a sample of data measured on an interval scale when the
population from which the data comes can be assumed to have a Normal
distribution.
1 Denoting the population mean by μ, the null and alternative hypotheses are
H₀: μ = A
H₁: μ ≠ A.
2 Calculate the sample mean m and calculate
S = √[Σ(x − m)²/(n − 1)] = √[(Σx² − nm²)/(n − 1)],
where n is the sample size. The second formula is usually easier to calculate.
3 The test statistic is
t = (m − A)/(S/√n).
4 The critical value at the 5 % significance level is the value of t_c for n − 1
degrees of freedom in Figure 3.1.
5 Reject H₀ in favour of H₁ at the 5 % significance level if
either t > t_c or t < −t_c.
Otherwise the conclusion is that there is insufficient evidence to reject H₀.
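Applied to the tomato grower's data, the procedure looks like this in Python (a sketch of ours; the function name is not from the unit).

    from math import sqrt

    def one_sample_t(data, a):
        """One-sample t statistic for H0: mu = a, using S (divide by n - 1)."""
        n = len(data)
        m = sum(data) / n
        big_s = sqrt((sum(v ** 2 for v in data) - n * m ** 2) / (n - 1))
        return (m - a) / (big_s / sqrt(n)), n - 1

    yields = [3.6, 3.2, 3.1, 2.6, 3.9]   # kg per plant, from Example 4.1
    t, df = one_sample_t(yields, 4)
    print(int(t * 1000) / 1000, df)      # -3.239 with 4 degrees of freedom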
In scientific experimentation of the kind described in Units C1 and C2, the one-sample
t-test is not often appropriate. What makes this test important in experimentation is
that it can easily be extended to deal with paired data, e.g. data from experiments using
crossover or matched-pairs designs.
4.4 The matched-pairs t-test
To illustrate this use of the one-sample t-test we shall look at some data which comes
from a paper by Cushny and Peebles published in 1904.
These researchers wanted to investigate whether two different forms, L and R, of a drug,
hyoscyamine hydrobromide, differ in their capacity to produce sleep. They conducted a
crossover trial on ten patients. Five of the patients received form R of the drug first, and
their gain in sleep was recorded. After a suitable time had elapsed they were given form
L instead and the same measurements recorded. The other five patients received the
two drugs in the opposite order. The results are in Figure 4.1.
This data was analysed by Gosset ('Student') in the 1908
paper in which he published the results which led to the t-test.
Hyoscyamine hydrobromide is
not now commonly used as a
sleep-inducing drug.
Figure 4.1 Sleep gained (hours) by the use of hyoscyamine hydrobromide
Patient    Form L    Form R
1          +1.9      +0.7
2          +0.8      −1.6
3          +1.1      −0.2
4          +0.1      −1.2
5          −0.1      −0.1
6          +4.4      +3.4
7          +5.5      +3.7
8          +1.6      +0.8
9          +4.6       0.0
10         +3.4      +2.0
A positive figure means that the patient got more sleep with the drug than without; a
negative figure means that he or she got less sleep with the drug.
Note that patients vary considerably in their responsiveness to both drugs. For
example, patient 7 is very responsive to both drugs, whilst patient 5 is much less
affected.
It would not be appropriate to analyse this data using the two-sample t-test from
Section 3, or using the Mann-Whitney test, because the data is in matched pairs: there
is a pair of sleep gained measurements for each patient. You could therefore use the
Wilcoxon matched-pairs test.
Activity 4.4 If you were to use the Wilcoxon matched-pairs test on this data, what
would be the first calculations you would perform?
Solution You would find the difference, for each patient, between the hours of sleep
gained using form L and the hours gained using form R.
The t-test for matched-pairs starts in exactly the same way.
Activity 4.5 For each patient in Figure 4.1 calculate the difference in hours of sleep
gained: L - R.
Solution
Figure 4.2 Sleep gained (hours)
Patient    Form L    Form R    Difference (L − R)
1          +1.9      +0.7      +1.2
2          +0.8      −1.6      +2.4
3          +1.1      −0.2      +1.3
4          +0.1      −1.2      +1.3
5          −0.1      −0.1       0.0
6          +4.4      +3.4      +1.0
7          +5.5      +3.7      +1.8
8          +1.6      +0.8      +0.8
9          +4.6       0.0      +4.6
10         +3.4      +2.0      +1.4
From this stage onwards the Wilcoxon test uses just the differences: the original data is
not used again. So let us also concentrate on these differences. The t-test tests
hypotheses about means rather than medians. The null hypothesis is that the two forms
of the drug are equally effective: i.e. on average, it does not make any difference which
form a patient receives. If we denote the population mean of the hours of sleep gained
with form L by μ_L and the population mean for form R by μ_R then the null hypothesis is
H₀: μ_L = μ_R
or
H₀: μ_L − μ_R = 0.
If the population mean of the difference between the hours of sleep gained with form L
and with form R is denoted by μ_D, then μ_D = μ_L − μ_R.
Now the null hypothesis is just that the population mean difference is zero:
H₀: μ_D = 0.
Assuming that the researchers have no particular belief that the mean difference is likely
to be positive rather than negative, so that a two-sided alternative is appropriate, then
the alternative hypothesis is
H₁: μ_D ≠ 0.
We now have a single sample of 10 numbers, the differences, and a null and an
alternative hypothesis about the population mean difference of the population from
which this sample of 10 differences comes. If it is reasonable to assume that the
distribution of this population of differences is Normal then we can carry out a one-sample t-test on these differences.
Activity 4.6 Carry out the test. What do you conclude?
Solution The hypotheses are
H₀: μ_D = 0
H₁: μ_D ≠ 0.
First it is necessary to calculate S for the sample of differences. We do this as follows.
For the 10 differences x,
Σx = 15.8 and Σx² = 38.58.
Thus the mean of these differences is
m = 15.8/10 = 1.58.
Thus S = √[(Σx² − nm²)/(n − 1)]
       = √[(38.58 − 10 × (1.58)²)/9]
       = 1.2299955.
Now, the test statistic is
t = (m − A)/(S/√n),
with A = 0 since the null hypothesis is H₀: μ_D = 0.
Thus t = (1.58 − 0)/(1.2299955/√10)
       = 4.062 (cut to 3 decimal places).
Note that this t-test differs from the Wilcoxon test in that the zero difference is included.
The critical value is obtained from Figure 3.1. The number of degrees of freedom is
n − 1 = 10 − 1 = 9, so the critical value is 2.262. The value of t is 4.062, which is greater
than 2.262, so H₀ is rejected.
The conclusion is that the population mean difference is not zero. Thus, on the basis of
this experiment it seems that the two forms of hyoscyamine hydrobromide do differ
in their effectiveness in increasing sleep. ■
The procedure is on p. 34.
So, to carry out a matched-pairs t-test, the procedure is as follows.
Procedure Matched-pairs t-test (two-sided)
1 Calculate the differences between the two values in each pair.
2 The null and alternative hypotheses are
H₀: μ_A = μ_B
H₁: μ_A ≠ μ_B,
where μ_A and μ_B are the population means of the two populations involved.
Replace these by the equivalent hypotheses
H₀: μ_D = 0
H₁: μ_D ≠ 0,
where μ_D is the population mean of the population of differences between the
matched pairs.
3 If it can be assumed that this population of differences has a Normal
distribution then apply the one-sample t-test with A = 0 to the sample of
differences.
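Here is the matched-pairs procedure carried out in Python on the sleep-gain differences from Figure 4.2 (a sketch of ours; it simply applies the one-sample calculation to the differences).

    from math import sqrt

    d = [1.2, 2.4, 1.3, 1.3, 0.0, 1.0, 1.8, 0.8, 4.6, 1.4]   # differences L - R from Figure 4.2
    n = len(d)
    m = sum(d) / n                                            # 1.58
    big_s = sqrt((sum(v * v for v in d) - n * m * m) / (n - 1))
    t = (m - 0) / (big_s / sqrt(n))                           # A = 0 under H0
    print(int(t * 1000) / 1000)                               # 4.062, to compare with 2.262 (9 degrees of freedom)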
4.5 Other matched-pairs tests
This procedure, which turns a one-sample test into a matched-pairs test, can be used on
other tests. For example, if you had carried out an experiment which produced a large
sample of matched pairs of data (on an interval scale) then you could calculate the
difference between the two data values in each pair, and use the one-sample z-test to
investigate whether the population mean of the differences is zero. If the samples were
small then, in a similar way, the sign test could be used to investigate whether the
population median of the differences is zero.
Working backwards, the Wilcoxon matched-pairs test can be used to test whether the
median of a single sample is equal to a particular number (i.e. it can be used like the sign
test in Unit B1). If the null and alternative hypotheses are
H₀: M = 0
H₁: M ≠ 0,
where M is the population median, then a sample of data from this population can be
analysed in exactly the same way that the sample differences are analysed in the
Wilcoxon matched-pairs test. A small adaptation of this technique produces a one-sample Wilcoxon test for any null hypothesis about the value of the population median.
So you have now met two different tests to use with small samples of matched pairs of
data (on an interval scale): the matched-pairs t-test and the Wilcoxon matched-pairs
test. Most of the remarks in Section 4.1, where we were comparing the two-sample t-test
with the Mann-Whitney test, apply equally to any comparison between the matched-pairs t-test and the Wilcoxon matched-pairs test. For example, if the condition that the
differences have a Normal distribution is satisfied then the t-test is more powerful than
the Wilcoxon test but the difference in power is not great and in some ways the
Wilcoxon test is easier to use.
4.6 Confidence intervals from t-tests
As with other hypothesis tests, such as the Mann-Whitney test and the z-tests, it is often
important to go beyond the test and obtain estimates in the form of confidence
intervals. For example, the tomato grower in Section 4.3 would probably like an
estimate of the yield of tomatoes which he is likely to obtain from the new variety using
his growing methods. This would help him to decide whether it was worth growing this
new variety even though it did not match up to the seed merchant's claim.
So the tomato grower wants a 95 % confidence interval for μ, the mean yield (in kg per
plant) of tomatoes which he will obtain from this new variety. If he were to calculate a
confidence interval based on the z-test then he would calculate the sample mean m and
the sample standard deviation s. Then the 95 % confidence interval for μ would be
[m − 1.96 × s/√n, m + 1.96 × s/√n].
Recall that 1.96 is the critical value for the z-test and that s/√n is the sample estimate of
the standard error. However, because his samples are small he had to use a t-test rather
than a z-test and for this same reason he must also calculate his confidence interval
differently: it must be based on the t-test rather than the z-test.
The calculation is almost exactly the same as for the interval based on the z-test, but
there are two modifications.
He must use the critical value t_c from Figure 3.1. This is the same critical value as
the one which he used in the hypothesis test in Example 4.1; that for n − 1, i.e. 4,
degrees of freedom.
He must use the modified estimate S/√n of the standard error, just as in Section 4.3.
Thus the 95 % confidence interval for μ is
[m − t_c × S/√n, m + t_c × S/√n].
From Example 4.1, m = 3.28, S = 0.4969909, n = 5 and t_c = 2.776.
Thus
t_c × S/√n = 2.776 × 0.4969909/√5 = 0.61 (cut to the same level of accuracy as the sample mean),
so the tomato grower's 95 % confidence interval is [3.28 − 0.61, 3.28 + 0.61].
This is [2.67, 3.89].
Figure 4.3 The 95 % confidence interval [2.67, 3.89]
We can summarize this method as follows.
Procedure To find a 95 % confidence interval for the population mean of a
Normally distributed population
The confidence interval is
[m − t_c × S/√n, m + t_c × S/√n],
where n, m, t_c and S are as in the procedure for the one-sample t-test. Thus t_c is the
critical value from Figure 3.1 for n − 1 degrees of freedom.
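The tomato grower's interval can be reproduced with a few lines of Python (a sketch of ours, using the quantities from Example 4.1).

    from math import sqrt

    def t_confidence_interval(m, big_s, n, t_c):
        """95% confidence interval for the mean of a Normally distributed population."""
        half_width = t_c * big_s / sqrt(n)
        return m - half_width, m + half_width

    print(t_confidence_interval(3.28, 0.4969909, 5, 2.776))
    # about (2.663, 3.897); cutting the half-width to 0.61 first, as in the text, gives [2.67, 3.89]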
In Section 4.4 we showed that a hypothesis test for a single sample can give a test for the
difference between samples of matched pairs of data by applying it to the sample of
differences. Similarly, a confidence interval for the difference between the samples can
also be found.
Activity 4.7 Use the solution to Activity 4.6 to find a 95 % confidence interval for μ_D.
Solution Here we have n = 10, m = 1.58, S = 1.2299955 and the critical value
t_c = 2.262.
From this we calculate
t_c × S/√n = 2.262 × 1.2299955/√10 = 0.87 (cut to the same level of accuracy as the sample mean).
Thus a 95 % confidence interval is
[1.58 − 0.87, 1.58 + 0.87].
This is [0.71, 2.45]. ■
Figure 4.4 The 95 % confidence interval [0.71, 2.45]
Now μ_D = μ_L − μ_R, so you have calculated a confidence interval for μ_L − μ_R. In words,
μ_D is the population mean of the difference between the hours of sleep gained by the
same patient using form L and using form R of the drug hyoscyamine hydrobromide.
Thus you have calculated a 95 % confidence interval for the mean difference between
the hours of sleep gained using the two forms of the drug.
The two-sample t-test (for unrelated samples) from Section 3 also gives a confidence
interval for the difference between the population means. Again the form is similar to
that given by the two-sample z-test for the difference between two population means.
The two-sample z-test gives this confidence interval:
[(m_A − m_B) − 1.96 × √(s_A²/n_A + s_B²/n_B), (m_A − m_B) + 1.96 × √(s_A²/n_A + s_B²/n_B)].
The modifications required to apply this to small samples are
replace s_A² and s_B² by S_C² (calculated as described in the summary of Section 3);
replace 1.96 by the critical value t_c from Figure 3.1 for n_A − 1 + n_B − 1 degrees of
freedom.
This first replacement can be done only if it can be assumed that the two population
standard deviations are equal. This is the condition which we required for the two-sample t-test in Section 3. Also, as with all t-tests and related confidence intervals, the
populations must have Normal distributions. These replacements result in the
following.
Procedure To find a 95 % confidence interval for the difference between the
population means of two unrelated Normally distributed populations with equal
standard deviations
The confidence interval is
[(m_X − m_Y) − t_c × √(S_C²/n_X + S_C²/n_Y), (m_X − m_Y) + t_c × √(S_C²/n_X + S_C²/n_Y)],
where n_X, n_Y, m_X, m_Y, t_c and S_C² are as in the summary of Section 3.
Thus t_c is the critical value from Figure 3.1 for n_X − 1 + n_Y − 1 degrees of
freedom.
Example 4.2 We can use this procedure and the calculations in Example 3.1 to
calculate a confidence interval for the difference in average weight-gain (kg per day)
between the calves fed on diets X and Y.
n_X = 4, n_Y = 3
m_X = 0.512, m_Y = 0.676 (cut to the same level of accuracy as the sample means)
Thus
t_c × √(S_C²/n_X + S_C²/n_Y) = 2.571 × √(0.0030283/4 + 0.0030283/3)
                             = 2.571 × √0.0017665
                             = 0.108 (cut to 3 decimal places).
Also m_X − m_Y = 0.512 − 0.676 = −0.164.
Thus the confidence interval is
[−0.164 − 0.108, −0.164 + 0.108],
which is [−0.272, −0.056].
Figure 4.5 The 95 % confidence interval [−0.272, −0.056]
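The interval in Example 4.2 can be checked the same way (a sketch of ours, re-using the quantities already computed in Example 3.1).

    from math import sqrt

    s2_c = 0.0030283                                  # common variance estimate from Example 3.1
    half_width = 2.571 * sqrt(s2_c / 4 + s2_c / 3)    # t_c times the estimated standard error
    diff = 0.512 - 0.676                              # m_X - m_Y, cut to 3 decimal places
    print(diff - half_width, diff + half_width)       # about -0.272 and -0.056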
Before we describe briefly the analysis of more complicated experiments, here are some
exercises to test your understanding of the techniques in this section.
Exercise 4.1
Solution on p. 44
Use the matched-pairs t-test to analyse the data from Section 6.1 of Unit C1 on the effect of
urastarvin on male patients. The data on the 5 matched pairs of males is in Figure 4.6.
Figure 4.6
Experimental group (drug taken): patient number, weight change (D)
Control group (placebo taken): patient number, weight change (P)
Difference for each pair (D − P)
Is there a significant difference between the effect of the drug and that of the placebo?
Exercise 4.2
Solution on p. 45
Calculate the following confidence intervals based on the data in Exercises 3.1 and 3.2.
(a) For the difference μ_B − μ_H, where μ_B and μ_H are the population mean weight-gains (kg) of
calves fed on the diets B and H respectively.
(b) For the difference μ_A − μ_B, where μ_A and μ_B are the mean weights of packets of biscuits from
the two production lines.
4.7 Analysis of variance
So far, most of the experiments discussed in Units C1 and C2 have been relatively
simple in design. Each has involved a comparison of two groups, whether they be of
people, plants or calves, that are as nearly identical as possible except for one factor of
interest, such as a drug treatment, light or diet. However, experiments need not be as
simple as this in their design. It is often convenient to compare, not just one new drug
or one new diet with a control, but a whole range of them. Also in Section 6.2 of
Unit C1, we described a complicated version of the clinical trial from Section 6.1, in
which each patient received both drug and placebo one after the other, in a crossover
design. In Activity 6.1 of that unit you saw that the data from this trial could not be
analysed properly by using simple comparisons between two groups of measurements.
Let us look at another, simpler, example in which such multiple comparisons are
required. Suppose that agricultural scientists want to try out 6 different diets to see
which produces the best development in chickens. It would be cumbersome to carry out
6 separate experiments, each with one of the new diets tested against an existing
standard diet. Why not instead test all 6 new diets simultaneously against a single
control group fed on the standard diet? This would seem to make sense in terms of
efficiency: for instance, only one control group is needed rather than six.
Suppose also that the scientists wished to try out these 6 new diets on 4 separate breeds
of poultry. If they fed 10 chickens of each breed on each diet then the design of the
experiment would be as in Figure 4.9.
Figure 4.9
            Diet 1       Diet 2       Diet 3       Diet 4       Diet 5       Diet 6       Control diet
Breed 1     10 chickens  10 chickens  10 chickens  10 chickens  10 chickens  10 chickens  10 chickens
Breed 2     10 chickens  10 chickens  10 chickens  10 chickens  10 chickens  10 chickens  10 chickens
Breed 3     10 chickens  10 chickens  10 chickens  10 chickens  10 chickens  10 chickens  10 chickens
Breed 4     10 chickens  10 chickens  10 chickens  10 chickens  10 chickens  10 chickens  10 chickens
At the end of the experiment, the experimenters would weigh the chickens and
altogether they would have measurements on 280 birds, in 28 groups of 10.
Where does one begin to analyse data of this sort? One way might be to carry out a
hypothesis test of each experimental group of a given breed against the control group of
that breed. That would, however, involve 24 hypothesis tests, and this would be both
tedious and undesirable. It would be undesirable because each test would be conducted
on a relatively small number of chickens: those in only two of the positions in Figure
4.9, i.e. just 20 out of the 280 chickens. Thus each test would not be very likely to identify
a difference as significant unless it was fairly large.
It would be undesirable also because, with 24 hypothesis tests and a significance level of
5 %, there is a large chance of wrongly rejecting the null hypothesis in at least one of the
tests. The reason for this is as follows. Suppose that you were to carry out a hypothesis
test at the 5 % significance level on data from two samples. If the null hypothesis of no
difference between the two populations is true then the probability of obtaining a value
of the test statistic which led you to reject the null hypothesis is 5 %. If you carried out
only one such significance test that leads to rejection of the null hypothesis then you
would probably be quite happy to accept this small risk that you have made the wrong
decision (i.e. a type 1 error). However, still assuming that the null hypothesis is true, if you
were to carry out 100 hypothesis tests, on 100 completely independent pairs of samples
drawn from two populations, each at the 5 % significance level, then, on average, 5 % of
the 100 tests (i.e. 5 of them) would lead you to reject the null hypothesis although it is
true. In summary, if you carry out a large number of hypothesis tests at the 5 %
significance level then you are very likely to sometimes reject the null hypothesis when
it is true.
In the chicken experiment, if it happened that none of the new diets really made a
difference to any of the breeds (i.e. all 24 null hypotheses were really true) then it is more
likely than not that at least one of the null hypotheses would be wrongly rejected (i.e.
that a type 1 error would be made somewhere).
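The figure behind 'more likely than not' is easy to check. If all 24 null hypotheses were true and the 24 tests were treated as independent (a simplification of ours: tests for the same breed share a control group, so they are not strictly independent), the chance of at least one wrongly rejected null hypothesis would be 1 − (0.95)²⁴, roughly 0.71.

    print(1 - 0.95 ** 24)   # about 0.708: the chance of at least one type 1 error in 24 independent 5% tests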
It would seem to be much better to use an analysis which can deal with the complete set
of data, using information about all 280 chickens. The typical weight of a chicken of one
breed could then be based on a larger rather than a smaller sample (size 70 rather than
10). Similarly, the typical effect of each diet could be based on a larger number of birds
(40 rather than 10). Moreover, the overall pattern of the results would show whether the
effect of the diets varied with the breeds to which they were fed.
The sort of experiment just described is very common in agriculture and animal
husbandry, and it is therefore not surprising that methods of analysing such data were
first developed by scientists working in this area. The technique that is used is called
analysis of variance. It would take more space than is available in this course to describe
the analysis of variance in detail, although the principles are quite straightforward. It
analyses the reasons why the measurements differ from one to another.
In the chicken example, the mean weight of all 280 chickens could be calculated. The
reasons why the weights of individual chickens differ from this overall mean could then be
investigated. Here are some examples of possible reasons.
Chickens of one breed are heavier than average, so individuals of that breed tend to
weigh more than the overall mean.
One diet is particularly poor, so individuals fed on that diet tend to weigh less than
the overall mean.
Individual chickens, even of the same breed fed on the same diet, will vary in weight.
To be more precise, the possible sources of these differences are as follows:
the diets
the breeds
the variability of individual chickens.
The analysis of variance makes it possible to estimate the extent to which the differences
between the measurements are due to each of these sources and, most important, to
assess whether any of the differences observed are due to the diets or breeds having a
real effect rather than being due to the random variability of individual chickens. If this
These differences are
sometimes called sources of
variability, in which case the
last is called random
variability.
random variability is rather small but the differences due to the diets are large then the
analysis would probably lead to the conclusion that the diets are having a real effect. If,
however, the random variability is large and the differences due to both diets and breeds
are small then the analysis would probably lead to the conclusion that, on the basis of
this experiment, neither diets nor breeds make any difference, i.e. that the small
differences which were observed could be due to random variation alone. It might even
turn out that one diet is particularly suitable for one breed whilst another diet is
particularly suitable for some other breed: in this case further analysis will be required,
again using analysis of variance.
Analysis of variance can be applied to many kinds of experiment. For example, analysis
of variance could be used to investigate the complicated version of the drug trial from
Section 6.2 of Unit C1. A slightly more complicated version of the mustard seed
experiment from Section 2 of this unit could also be analysed this way. In the
experiment which we asked you to carry out, you concentrated on comparing seedlings
whose stems had been cut (although you were asked not to cut the stems on a few
seedlings just to see what would happen). Another approach would have been to
compare the root lengths of seedlings grown in the light and dark, both with the stems
on and with the stems cut off. The design for such an experiment would have been as in
Figure 4.10.
Figure 4.10
                 Grown in light   Grown in dark
Stem not cut     10 seedlings     10 seedlings
Stem cut         10 seedlings     10 seedlings
There are two reasons why the analysis of variance has been introduced here, at the end
of a unit about experiments and t-tests.
The statistical techniques used in the analysis of variance are closely related to, but
rather more systematic than, those used in the t-test.
In addition to an analysis of variance, an experimenter may well carry out one or
more t-tests. For example, the analysis of variance might show that the diets vary
significantly. A t-test might then be used to analyse the difference between two
particular diets in which the experimenter was especially interested. There are
considerable difficulties in interpreting the results of such t-tests, especially when
there are several possible comparisons which could be made. Nonetheless, the t-test is widely used in conjunction with complicated experiments involving the
analysis of variance. However, we do not recommend such use.
Summary
In this section we stated that the two-sample t-test is not markedly affected by quite
wide deviations from Normality in the population under investigation and that it is not
greatly affected by unequal variances in the two populations provided that the sizes of
the samples involved are nearly equal. We then introduced a one-sample t-test and
showed how this and other tests, such as the z-test, give tests for matched pairs of data.
Confidence intervals based on t-tests were also described and the principles of an
important statistical technique, called analysis of variance, were briefly illustrated.
After completing your work on this section you should be able to
recognize both samples and areas of investigation in which it would be unwise to
use the t-test
carry out a one-sample t-test
carry out a matched-pairs t-test
understand that other one-sample tests, e.g. the sign test and the z-test, can be used
as matched-pairs tests
find a confidence interval from small sample(s) of data from population(s) satisfying
the distributional conditions required by the t-tests.
The number of root
measurements in each group is
shown as 10. Although of
course other numbers could
be used, it is important that
there is the same number of
seedlings in each group.
Solutions to exercises
Exercise 1.1 (p. 9)
(a)(b)(d) All three of these activities are designed to answer specific questions by pursuing
specific investigations. Thus they are all scientific experiments provided they are properly
conducted and all stages are open to public scrutiny.
(c) This is not a scientific experiment since the decision is to be based on a single individual's
opinion. It is an expert opinion but it is none-the-less personal and private.
Exercise 1.2 (p. 9)
(a) This is a measurement experiment.
(b) This is an exploratory experiment. You could, however, rephrase it as a hypothesis-testing
experiment, as follows.
Hypothesis Rapid acceleration and braking causes high petrol consumption.
Prediction Accelerating and braking less fiercely will decrease the petrol consumption.
Test See what happens to petrol consumption when braking and accelerating are less fierce.
Possible results and conclusions Petrol consumption unchanged (or goes up), therefore
hypothesis is false. Petrol consumption goes down, therefore hypothesis is supported.
(d) This is a hypothesis-testing experiment. It seeks to test a hypothesis about the cause of a
phenomenon, as follows.
Hypothesis Overweight is caused by overeating.
Prediction Eating a lot will cause people to be overweight.
Test Measure the weights of people who eat a lot and people who do not and compare these
with their ideal weights.
Possible results and conclusions People who eat a lot are no more overweight than people who
do not, therefore hypothesis is false. People who eat a lot are more overweight than people who
do not, therefore hypothesis is supported.
Exercise 1.3 (p. 9)
(b) (Treating this as a hypothesis-testing experiment.)
Null hypothesis Rapid acceleration and braking does not cause high petrol consumption.
Alternative hypothesis Rapid acceleration and braking causes high petrol consumption.
Test See what happens to petrol consumption when braking and accelerating are less fierce.
Possible results and conclusions Petrol consumption goes down, therefore reject the null
hypothesis. Petrol consumption unchanged (or goes up), therefore the null hypothesis is
supported.
(d) Null hypothesis Overweight is not caused by overeating.
Alternative hypothesis Overweight is caused by overeating.
Test Measure the weights of people who eat a lot and people who do not and compare these
with their ideal weights.
Possible results and conclusions People who eat a lot are more overweight than people who do
not, therefore reject the null hypothesis. People who eat a lot are no more overweight than people
who do not, therefore the null hypothesis is supported.
Exercise 3.1 (p. 27)
Exercise 3.2 (p. 27)
Thus the value of the test statistic
The number of degrees of freedom is 28, so the critical value t_c is 2.048.
The value of the test statistic is further from zero than the critical value, so we should reject the null
hypothesis. Thus the two production lines do appear to produce packets of different weight.
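The critical value quoted here can be checked numerically. The short sketch below is an editorial addition, not part of the unit; it assumes Python with the scipy library is available and simply looks up the two-sided 5% point of Student's t-distribution with 28 degrees of freedom.

# A minimal sketch (not from the unit): the two-sided 5% critical value of
# Student's t-distribution with 28 degrees of freedom.
from scipy import stats

degrees_of_freedom = 28
t_critical = stats.t.ppf(0.975, degrees_of_freedom)  # upper 2.5% point
print(f"critical value: {t_critical:.3f}")            # about 2.048

# Reject the null hypothesis if the test statistic lies further from zero
# than the critical value, i.e. if abs(test_statistic) > t_critical.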
Exercise 4.1 (p. 40)
We need to analyse the sample of 5 differences (D - P).
Here Σx = -12 and n = 5.
Thus the mean difference m = Σx/n = -12/5 = -2.4.
Thus s = √[(Σx² - nm²)/(n - 1)] = 4.2778499.
Thus the test statistic t = m/(s/√n) = -2.4/(4.2778499/√5) = -1.254.
The number of degrees of freedom is n - 1 = 4. Thus the critical value is 2.776. The test statistic
- 1.254 is closer to zero than the critical value 2.776, so we cannot reject the null hypothesis. Thus
there appears to be no difference between the drug and the placebo in their effect on male
patients.
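The calculation above can be reproduced from the summary statistics alone. The sketch below is an editorial addition, not part of the unit; it assumes Python with the scipy library and recomputes the test statistic and the critical value.

# A minimal sketch (not from the unit): the matched-pairs t-test of
# Exercise 4.1 recomputed from the summary statistics quoted above.
import math
from scipy import stats

n = 5            # number of differences D - P
m = -12 / n      # mean difference, -2.4
s = 4.2778499    # sample standard deviation of the differences

test_statistic = m / (s / math.sqrt(n))
t_critical = stats.t.ppf(0.975, n - 1)

print(f"test statistic: {test_statistic:.3f}")   # about -1.254
print(f"critical value: {t_critical:.3f}")       # about 2.776
print("reject the null hypothesis?", abs(test_statistic) > t_critical)  # False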
Exercise 4.2 (p. 40)
(a) From Exercise 3.1, t_c = 2.026, and the half-width of the confidence interval is
t_c × (estimated standard error) = 0.056.
Thus, since the difference between the sample means is 0.012, the confidence interval is
[0.012 - 0.056, 0.012 + 0.056].
This is [-0.044, 0.068]. (Figure 4.7 illustrates this confidence interval.)
(b) From Exercise 3.2, t_c = 2.048, and the half-width of the confidence interval is
t_c × (estimated standard error) = 4.2.
Thus, since the difference between the sample means is 4.6, the confidence interval is
[4.6 - 4.2, 4.6 + 4.2].
This is [0.4, 8.8].
(Figure 4.8 illustrates this confidence interval.)
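Both intervals follow the same pattern: the difference between the sample means, plus or minus a half-width equal to the critical value times the estimated standard error. The short sketch below is an editorial addition, not part of the unit, reproducing the two intervals from the half-widths quoted in the solution.

# A minimal sketch (not from the unit): the confidence intervals of
# Exercise 4.2, built from the difference in sample means and the
# half-width (critical value times estimated standard error).
def confidence_interval(difference, half_width):
    """Return [difference - half_width, difference + half_width]."""
    return difference - half_width, difference + half_width

lower, upper = confidence_interval(0.012, 0.056)
print(f"part (a): [{lower:.3f}, {upper:.3f}]")  # [-0.044, 0.068]

lower, upper = confidence_interval(4.6, 4.2)
print(f"part (b): [{lower:.1f}, {upper:.1f}]")  # [0.4, 8.8]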
Index
alternative hypothesis 1.3, 3.2, 4.3, 4.4
analysis of variance 4.7
one-sample z-test 4.3
ordinal measurement 3.1, 4.1
back-to-back stemplot 3.4
Baconian experiment 1.2
Boneau 4.2
parameter 4.1
parametric test 4.1
Pasteur 1.2
Popper 1.3
population mean 3.2, 4.3
population standard deviation
population variance 3.2, 3.4
power of a test 4.1
common population variance 3.3
confidence interval 4.6
control 2.1
control group 4.7
critical value 3.3, 4.3
degrees of freedom 3.3, 4.3
distribution of t 3.3
distributional condition 3.1, 3.4, 4.1, 4.3
error 2.6, 4.1
estimate 3.2, 3.3, 4.3, 4.6
experiment 1.1, 2.1
exploratory experiment 1.2
goodness-of-fit test 4.2
Gossett 3.2, 4.4
Guinness 3.2
hyoscyamine hydrobromide 4.4
hypothesis testing 1.3
hypothesis-testing experiment 1.2
hypothetico-deductive experiment 1.2
interval scale measurement 3.1, 4.1, 4.3
Mann-Whitney test 3.1, 4.1
matched pairs 4.4, 4.5
matched-pairs sign test 4.5
matched-pairs t-test 4.4
matched-pairs z-test 4.5
mean 3.2, 4.1
measurement 2.5, 3.1
measurement experiment 1.2
median 4.1, 4.5
mustard seedlings 2.1, 3.4
random variability 4.7
root hairs 2.4, 2.5
S 4.3
Sc 3.3
sample 3.2, 3.3, 4.2, 4.3, 4.4
sample mean 3.2, 3.3, 4.3
sample size 3.2, 3.3, 4.3
sample standard deviation 3.2, 4.3
sample variance 3.2
sampling distribution 3.2
shape 4.1
sign test 4.3
skew 4.2
sources of variability 4.7
standard deviation 3.2, 3.3, 3.4, 4.3
standard error 3.2, 4.3
standard Normal distribution 3.2, 3.3, 4.3
stemplot 3.4
Student's t-test 3.2, 3.3, 4.4
symmetric 4.2
systematic error 2.6
t 3.3
t_c 3.3
tail 3.3
test statistic 3.2, 3.3, 4.3
To Autumn 1.1
t-test 3.2, 3.3, 3.4, 4.1, 4.3, 4.4
two-sample t-test 3.3
two-sample z-test 3.2
type 1 error 4.1, 4.7
non-parametric test 4.1
type 2 error 4.1
Normal distribution 3.2, 3.3, 3.4, 4.2, 4.3, 4.4
null hypothesis 1.3, 3.2, 4.3, 4.4, 4.7
number of degrees of freedom 3.3, 4.3
variability 2.6, 4.7
variance 3.2, 3.3, 3.4
one-sample t-test 4.3
one-sample Wilcoxon test 3.2, 3.4, 4.3
Wilcoxon matched-pairs test 4.5
z-test 3.1, 3.2, 4.3, 4.4, 4.5
Acknowledgements
Grateful acknowledgement is made to the following for the material in this unit: Figure
1.1 reproduced, by permission, from The Oxford Book of English Verse 1250-1918,
Oxford University Press; photographs p. 6 Wellcome Institute, p. 9 courtesy of Sir Karl
Popper, p. 18 from Student's Collected Papers, courtesy Biometrika; data for Figure 3.2 and
Exercise 3.1, Ministry of Agriculture, Fisheries and Food.
MDST 242 Statistics in Society
Block A   Exploring the data
Unit A0   Introduction
Unit A1   Prices
Unit A2   Earnings
Unit A3   Relationships
Unit A4   Surveys
Unit A5   Review
Block B   Statistical inference
Unit B1   How good are the schools?
Unit B2   Does education pay?
Unit B3   Education: does sex matter?
Unit B4   What about large samples?
Unit B5   Review
Block C   Data in context
Unit C1   Testing new drugs
Unit C2   Scientific experiments
Unit C3   Is my child normal?
Unit C4   Smoking, statistics and society
Unit C5   Review