Topic 5
Hypothesis Tests
Contents
5.1 Introduction to Tests of Hypothesis
    5.1.1 Type 1 and 2 Errors
    5.1.2 One-tailed and two-tailed tests
    5.1.3 Different Significance Levels
5.2 Single mean - large samples
5.3 Single proportion - large samples
5.4 Difference of two means - large samples
5.5 Difference of two proportions - large samples
5.6 Small Samples
    5.6.1 Single mean
    5.6.2 Confidence Intervals with Small Samples
    5.6.3 Difference of 2 Means from Small Samples
    5.6.4 Paired t test
5.7 The Chi-Squared Distribution
    5.7.1 Checking for Association - Hair and Eye Colour
    5.7.2 Limitations of Chi-squared test
    5.7.3 Goodness of Fit Tests
5.8 Coursework 1
5.9 Summary and assessment
Learning Objectives

• identify situations in experimentation where a hypothesis test will produce a useful result
• appreciate the ideas of null and alternative hypotheses
• use the standardised Normal distribution in hypothesis tests involving large samples
• use the student's t distribution in hypothesis tests involving small samples
• explain Type 1 and Type 2 Errors
• use the formulae for standard error and test statistic in the cases of
  a) single mean - large samples
  b) single proportion - large samples
  c) difference between two means - large samples
  d) difference between proportions - large samples
  e) single mean - small samples
  f) difference between two means - small samples
• decide when to use a One or Two Tailed Test
• appreciate the concept of degrees of freedom
• calculate confidence intervals for a population mean based on a sample mean from a small sample
• use a paired t test
5.1 Introduction to Tests of Hypothesis
In the last Topic it was seen that a sample could be used to infer a confidence interval
for the mean of the population that it was taken from. A very useful fact is that this
method can be turned on its head: instead of being used to estimate a property of the population, a sample can be used to judge whether it is plausible that the population has a particular mean value (or proportion). In this chapter most of the worked examples will start off by suggesting a hypothesis (an assumption) and effectively either accepting it or deciding that it is false.
Imagine that a company states in its sales pitch that a particular model of its mobile
phones lasts for 150 hours before it is required to be next charged up. If you were
thinking of buying one, you may like to obtain some proof that this assertion is true. One
way of doing this is to take a sample of phones, make a number of measurements and
then calculate the mean number of hours between charging. It would be impossible to
do this for every phone produced since the population is so large, so the best that can
be done is to calculate a sample mean.
Suppose that a sample of 40 was taken and this produced a mean value of 147.4 hours.
Does this mean that the manufacturer’s claim has been disproved? Clearly 147.4 is less
than 150 so it looks as if the manufacturer is over-estimating the time between charging.
However, it must be appreciated that this was just one sample; it was shown in the last
topic that if another sample was taken it might give a very different result (for example,
it could give a value of 152.3 hours, in which case the phones are doing better than the
manufacturer's claim!).
The method of hypothesis testing starts by making an assertion about the population,
usually an assumption that the mean is equal to a stated result. In this case it is
hypothesised that the population mean, μ, for the mobile phones is 150 hours. The
Central Limit Theorem will next be used, and to do this a value for the standard deviation
is required. Assume that in this case the population standard deviation is 12 hours.
From the last topic, 95% of all sample means lie between μ ± 1.96 × σ/√n.
The term σ/√n (which is the standard deviation of the sample means) is often called the Standard Error (S.E.). In this case it is equal to 12/√40 = 1.90.
So the upper and lower bounds calculate as 150 ± 1.96 × 1.90, i.e. 150 ± 3.72.
So 95% of the sample means lie between 146.28 and 153.72.
All the calculations so far have been based on the population; it is only now that the
sample mean value needs to be used - recall that this was calculated as 147.4 hours.
This is within the range of values that 95% of sample means are expected to fall
between, so it has not been possible to disprove the hypothesis that the mean is 150.
In other words, if the population mean really is 150, only 5% of samples would give a mean outside this range. This is known as a significance test with level 0.05.
There is no evidence to dispute the manufacturer’s claim at the 5% level.
The supposition that the population mean is equal to 150 can be written as
H0: μ = 150
This is called the Null Hypothesis.
To decide whether or not this assertion is true, it is necessary to have a comparison with
an alternative hypothesis (so that one or the other will be true). This is written as
H1: μ ≠ 150
It is usual to then draw a Normal distribution curve and shade in the appropriate
significance level (here 5%).
The whole calculation can then be expressed more briefly in a diagram as:
Since the sample mean, 147.4, is not in the shaded region H 0 is accepted. There is no
evidence at the 0.05 level of significance that the population mean is not 150.
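For readers who like to check such calculations with a short program, the following minimal Python sketch reproduces the phone example using the figures quoted above (hypothesised mean 150, σ = 12, n = 40, sample mean 147.4); the variable names are purely illustrative.

import math

# Figures from the mobile phone example
mu0 = 150.0      # hypothesised population mean (hours)
sigma = 12.0     # population standard deviation (hours)
n = 40           # sample size
x_bar = 147.4    # observed sample mean (hours)

se = sigma / math.sqrt(n)                        # standard error, about 1.90
lower, upper = mu0 - 1.96 * se, mu0 + 1.96 * se  # range containing 95% of sample means

print(f"S.E. = {se:.2f}, acceptance region {lower:.2f} to {upper:.2f}")
if lower <= x_bar <= upper:
    print("Sample mean inside the region: accept H0 at the 0.05 level")
else:
    print("Sample mean outside the region: reject H0 at the 0.05 level")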
5.1.1 Type 1 and 2 Errors
Since probabilities are used in the hypothesis tests, there is always the chance of an
error in the conclusion being made. In the mobile phone example it is only being said that
the sample mean value is consistent with a population mean value with 95% confidence.
There is a 5% chance that the population mean value is not 150 hours. If the population
mean is, in fact, not 150 hours but the hypothesis test resulted in accepting H 0 , it is said
that a Type 2 Error has occurred. Conversely, if H 0 is actually true, but the sample mean
resulted in it being rejected, it is said that a Type 1 Error has been made. This can be
summarised in the table below.
                   State of Nature
Decision           H0 is true           H0 is false
Accept H0          correct decision     Type 2 Error
Reject H0          Type 1 Error         correct decision

5.1.2 One-tailed and two-tailed tests
In the mobile phone example, recall that the diagram of the normal distribution curve
had a 5% area shaded and this was split between both "tails". This will always be the
case when the alternative hypothesis has a "not equal to" sign and is called a two-tailed
test (for obvious reasons!).
In some hypothesis tests, the alternative hypothesis is given as " is less than" or "
is greater than" some value. In cases like this, only one side of the normal distribution
curve is shaded and, not surprisingly, the test is called a one-tailed test.
The example will now be re-worked as a one-tailed test. A competing mobile phone
manufacturer wishes to prove that the time between charging for his rival’s phone is
less than 150 hours. The hypotheses (plural of hypothesis) now become
H0: μ ≥ 150
H1: μ < 150
The Normal distribution curve in this case now has only one side (the left-hand tail) shaded, with an area of 0.05.
To calculate the "cut-off" point, this time it is not 1.96 that is used, but 1.64. (From tables, the value of 1.64 gives an area under the normal distribution of approximately 0.05, whereas 1.96 gave 0.025.)
This means that the lower bound is 150 - 1.64 × 12/√40 = 146.89.
Using the same sample value as before since the sample mean, 147.4, is not in the
shaded region, again the null hypothesis is accepted. There is no evidence at the 0.05
level of significance that the population mean is less than 150.
Notice that for one-tailed tests with "<" in the alternative hypothesis it is the left hand side of the Normal distribution curve that is shaded, whilst if it is ">" in the alternative hypothesis the right hand side is shaded.
5.1.3 Different Significance Levels
The significance level of 5% (or 0.05 in decimals) has been used in the mobile phone
example. This is a very common value to use but it is not the only one that can be
employed. It implies that there is a 5% chance of wrongly rejecting a true null hypothesis. However, if it is
necessary for the margin of error to be less (in medical matters, say) then this can be
reduced to 1% or even 0.1% (or, indeed any other value). Changing the significance
level will have an effect on the "cut-off" point. For example, in the mobile phone example
for a two-tailed test and a significance level of 1%, the upper and lower bounds would
be calculated as
150 ± 2.58 × S.E., i.e. from 145.10 to 154.90
The lower the significance level, the more difficult it is to prove the alternative hypothesis
(which is often what you hope to do). If an alternative hypothesis is proved at the 5%
level it is said to be significant; a result at the 1% level is termed highly significant, whilst one at the 0.1% level is deemed to be a very highly significant result.
To help in calculations at different significance levels, for the general result P(Z > z) = α, the appropriate z values are given in the table below.
α      z          α      z          α      z
.50    0.0000     .050   1.6449     .020   2.0537
.45    0.1257     .048   1.6646     .019   2.0749
.40    0.2533     .046   1.6849     .018   2.0969
.35    0.3853     .044   1.7060     .017   2.1201
.30    0.5244     .042   1.7279     .016   2.1444
.25    0.6745     .040   1.7507     .015   2.1701
.20    0.8416     .038   1.7744     .014   2.1973
.15    1.0364     .036   1.7991     .013   2.2262
.10    1.2816     .034   1.8250     .012   2.2571
.05    1.6449     .032   1.8522     .011   2.2904
                  .030   1.8808     .010   2.3263
                  .029   1.8957     .009   2.3656
                  .028   1.9110     .008   2.4089
                  .027   1.9268     .007   2.4573
                  .026   1.9431     .006   2.5121
                  .025   1.9600     .005   2.5758
                  .024   1.9774     .004   2.6521
                  .023   1.9954     .003   2.7478
                  .022   2.0141     .002   2.8782
                  .021   2.0335     .001   3.0902

Commonly used levels:
α         z          α          z
.050      1.6449     .025       1.9600
.010      2.3263     .005       2.5758
.001      3.0902     .0005      3.2905
.0001     3.7190     .00005     3.8906
.00001    4.2649     .000005    4.4172

α: significance level

5.2 Single mean - large samples
Hypothesis tests can be carried out on many different types of experimental data but
the method of implementation is always the same. The main points to note are that
the analysis should always begin by stating the null and alternative hypotheses, an
appropriate measure of standard error should then be calculated and finally the sample
value should be plotted on the appropriate distribution curve - depending on where it
lies the null or alternative hypothesis will be accepted.
Comparisons with the Normal distribution curve are only valid if the sample size is
greater than 30; when this is the case the sample is categorised as large. Small samples
will be considered later.
The formula for the Standard Error in problems involving one large sample comes
straight from the Central Limit Theorem given earlier.
S.E. = σ/√n
Examples
1. The time between server failures in an organisation is recorded for a sample of 32
failures and the mean value calculates as 992 hours. The organisation works on the
assumption that the mean time between server failures is 1000 hours with a standard
deviation of 20. Is it justified to use this figure of 1000 hours? Use a significance level of
0.05.
H0: μ = 1000
H1: μ ≠ 1000
S.E. = σ/√n = 20/√32 = 3.536
This is a two-tailed test with 2.5% shaded on each side of the Normal distribution, so the cut-off points are given by 1000 ± 1.96 × 3.536, i.e. 993.070 and 1006.930.
This is shown on the diagram below, together with the sample mean of 992.
Since the sample mean is in the shaded area, the null hypothesis is rejected and so
the alternative hypothesis accepted. This means that there is evidence at the 5% level
that the population mean is not 1000, so the organisation might like to review their
specification for the server which, in fact, is performing better than they indicate.
It is often the case when performing hypothesis tests that a test statistic is calculated
from the sample value and this is compared with the standardised normal curve. This
is doing exactly the same thing that was shown in Chapter 2 when converting Normal
distributions into a form that could be compared with the tables.
In this case, the test statistic is
z = (x̄ - μ)/(σ/√n)
This gives z = (992 - 1000)/3.536 = -2.26
This is now compared with the standardised Normal curve.
It is clearly seen that the test statistic falls in the shaded area so H 1 is accepted as
before.
The two previous diagrams show that the two methods are identical but simply involve
considering different scales.
2. It is suspected that in a particular experiment the method used gives an underestimate of the boiling point of a liquid. 50 determinations of the boiling point of water
were made in an experiment in which the standard deviation was known to be 0.9
degrees C. The mean value is calculated to be 99.6 degrees C. The correct boiling point of water
is 100 degrees C. Use a significance level of 0.01.
Since it would be desirable to prove that the population mean is less than 100, it is
sensible to use a one-tailed test with alternative hypothesis μ < 100. So the hypotheses become
H0: μ ≥ 100
H1: μ < 100
The standard error is
S.E. = σ/√n = 0.9/√50 = 0.127
Test statistic: z = (99.6 - 100)/0.127 = -3.15
The standardised normal distribution curve gives a z value of -2.33 for an area of 0.01.
Thus the diagram, with test statistic marked in, has the appearance:
Since the test statistic is in the shaded region, H 1 is accepted. There is evidence at the
1% level that the population mean is less than 100. It is logical to assume, then, that the
method of the experiment is underestimating the boiling point.
In the above examples the standard deviation of the population mean was known. Often
this will not be the case, so as long as the samples are large (n > 30) it is acceptable
to estimate this value by using the sample values (as was done in Topic 3 with the
confidence intervals).
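As a minimal sketch of the test-statistic approach in Python (the helper function and its name are illustrative only, not a library routine), the figures from Example 2 give:

import math

def z_single_mean(x_bar, mu0, sd, n):
    # z test statistic for a single mean with a large sample (n > 30);
    # sd is the population standard deviation or, failing that, the sample one
    return (x_bar - mu0) / (sd / math.sqrt(n))

# Example 2 (boiling point): x_bar = 99.6, mu0 = 100, sigma = 0.9, n = 50
z = z_single_mean(99.6, 100, 0.9, 50)
print(round(z, 2))                                # about -3.14
# one-tailed test at the 1% level: cut-off -2.33 from the z table
print("reject H0" if z < -2.33 else "accept H0")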
Hypothesis testing
Q1:
A particular questionnaire is designed so that it can be completed in 2 minutes. Over
a number of days a researcher measures the time taken by everyone who fills in the
form. The results are given in the table below (and can be downloaded here). Take a
random sample and carry out a hypothesis test to check whether the 2-minute expected
completion time is valid. Times are given in minutes.
2.44  2.71  2.46  2.53  1.76  2.52  2.60  2.26  1.87  1.80
2.20  2.48  2.14  2.71  1.89  2.97  2.32  1.46  2.16  2.16
1.49  2.99  2.95  2.33  2.54  1.37  2.01  2.04  2.21  1.93
2.39  1.92  2.19  2.12  2.62  0.91  1.25  1.53  2.19  2.29
2.59  2.59  1.67  2.12  1.86  2.99  1.79  1.78  2.08  2.49
2.63  3.21  2.31  2.08  2.05  2.87  1.84  1.87  2.24  1.26
2.20  1.92  2.38  1.73  1.03  2.22  2.03  1.98  2.40  2.34
2.66  1.73  1.52  2.41  2.49  2.58  2.10  1.72  1.73  2.12
2.16  1.60  1.90  2.04  1.77  2.02  2.19  1.49  1.94  2.29
2.12  2.23  1.88  1.75  1.81  2.41  2.21  2.03  2.96  2.12
1.73  1.99  2.19  1.06  1.77  1.66  2.54  1.69  2.26  1.13
1.81  2.80  2.22  1.95  1.50  1.83  2.38  1.97  1.99  2.22
1.83  2.25  2.56  2.35  2.67  2.57  2.32  2.58  2.23  2.12
2.29  1.97  1.50  2.06  2.23  1.95  2.07  2.42  2.39  1.99
1.51  1.91  1.44  2.59  2.58  1.88  2.57  2.19  2.04  2.04

5.3 Single proportion - large samples
It was shown in Topic 3 that sample proportions also follow the theory of the Central
Limit Theorem. The standard deviation of the proportions (which will now be referred
to as the Standard Error) was given by the formula
S.E. = √(π(1 - π)/n)
where π is the population proportion and n is the sample size (again considered to be > 30). Hypothesis tests can be carried out in much the same way as before.
Examples
1.
A survey of the first beverage that residents of the UK take when they waken up in the
morning has shown that 17% have a cup of tea. It is thought that this figure might be
higher in the county of Yorkshire, so a random sample of 550 Yorkshire residents is
questioned and out of that number 115 said they had tea first thing. Using a significance
level of 0.05, test the idea that the tea figure is higher in Yorkshire.
The population proportion is thought to be 17% (or 0.17 as a decimal) so this is the figure
that must be used in the hypotheses (like in the "mean" case, where it was always the
population mean that was mentioned in the null hypothesis). Since it is hoped that it can
be proved that the Yorkshire figure is higher than average, the alternative hypothesis must have the form π > 0.17.
The hypotheses are therefore:
H0: π ≤ 0.17
H1: π > 0.17
Now calculate the Standard Error (S.E.). In this case,
S.E. = √(π(1 - π)/n) = √(0.17 × 0.83/550) = 0.016
The test statistic here is comparable to the one for means:
z = (P - π)/S.E., where P is the sample proportion, in this case 115/550 = 0.209
So z = (0.209 - 0.17)/0.016 = 2.44
The standardised Normal curve can be drawn as before, with the value of 2.33 being
used as the cut-off point for 1% (or 0.01).
Since the test statistic for the sample proportion is in the shaded area, there is enough
evidence to reject the null hypothesis and accept the alternative one. In other words,
Yorkshire folk drink more tea than the National average (using a significance level of
1%). Note that it is harder to prove a fact using a significance level of 1% than it is for
5%, so it can be said that this is a highly significant result.
2.
In an ESP test, a subject has to identify which of the five shapes appears on a card. In
a test consisting of 100 cards, would you be fairly convinced that a subject does better
than just guessing if he gets 30 correct? Test at 1% significance level and at 0.1%
significance levels.
If he just guesses the proportion of times he would get the answer right is 1/5 = 0.2. So
it is hoped to prove that the sample corresponds to a population proportion greater than
0.2. The hypotheses are therefore:
H0: π ≤ 0.2
H1: π > 0.2
Now calculate the Standard Error (S.E.). In this case,
S.E. = √(π(1 - π)/n) = √(0.2 × 0.8/100) = 0.04
The test statistic is z = (P - π)/S.E., where P = 30/100 = 0.3
So z = (0.3 - 0.2)/0.04 = 2.5
The standardised Normal curve with appropriately shaded regions is shown below.
The test statistic is in the shaded region for the 1% significance test so accept H 1 here.
There is evidence at the 1% level that the subject displays powers of ESP.
However at the 0.1% level of significance, the test statistic is not in the shaded region.
Therefore the null hypothesis has to be accepted in this case.
This shows that there is highly significant evidence of the subject displaying ESP, but not a very highly significant result.
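The same test can be sketched in a few lines of Python, assuming the ESP figures above and reading the cut-off values 2.33 (1%) and 3.09 (0.1%) from the z table in Section 5.1.3 (the function name is illustrative only):

import math

def z_single_proportion(successes, n, pi0):
    # z test statistic for a single proportion; requires n*pi0 and n*(1 - pi0) > 5
    p = successes / n                        # sample proportion
    se = math.sqrt(pi0 * (1 - pi0) / n)      # standard error under H0
    return (p - pi0) / se

z = z_single_proportion(30, 100, 0.2)        # 30 correct out of 100 cards, guessing p = 0.2
print(round(z, 2))                           # 2.5
print("1% level:", "reject H0" if z > 2.33 else "accept H0")     # reject
print("0.1% level:", "reject H0" if z > 3.09 else "accept H0")   # accept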
Note that in both examples nπ and n(1 - π) are greater than 5, a property that is required for the Central Limit Theorem to be valid.

5.4 Difference of two means - large samples
So far in this chapter the hypothesis tests have been used to compare one sample mean
or proportion with a known value. However, it is very often the case that comparisons
are required between two samples in order to decide which is the better of the two for a
certain purpose. For example, if a new piece of software is introduced into an office and
workers think that their job is now taking longer on the new system, it would be useful to
have a statistical test to check out their claims.
The Central Limit Theorem provides useful information about the distribution of sample
means. However, it can be extended to also give information about the distribution of
the difference of two sample means. In fact, it can be proved that this distribution
is Normally distributed with mean 0. This is a very useful and interesting result and it
highlights once again why the Normal distribution is so important in statistics! The same
rules apply as for the single mean case that the original populations do not have to be
Normally distributed as long as the sample size is greater than 30.
The standard deviation of the difference of two sample means, referred to again in this
section as the Standard Error (S.E.) is given by the formula:
S.E. = √(σ1²/n1 + σ2²/n2)
where the subscripts 1 and 2 refer to population 1 and population 2 respectively.
As in previous examples for large samples, if the population standard deviation is
unknown it is fine to use the sample standard deviations (usually referred to as s 1 and
s2 )
The hypothesis tests usually start off by assuming that there is no difference between the population means (μ1 - μ2 = 0) and either confirming this or proving the assumption to be wrong.
Example
The response times of two hard drives are measured and the values are given in the
table below (times are measured in seconds).
          Disk 1      Disk 2
n         n1 = 35     n2 = 38
s         s1 = 5      s2 = 4
x̄         x̄1 = 116    x̄2 = 113
Is there a significant difference between response times?
Start off by assuming that there is no difference between the populations that the two
samples come from.
H0: μ1 - μ2 = 0
There is no need to check whether one disk is better or worse than the other so a
two-tailed test is a reasonable thing to use. Therefore the alternative hypothesis is:
H1: μ1 - μ2 ≠ 0
The method of the test follows the same pattern as the previous examples in this chapter.
The next step is to calculate the standard error and use it in the test statistic.
S.E. = √(5²/35 + 4²/38) = 1.066
Since now it is the difference of means that is being considered, the test statistic takes the form:
z = ((x̄1 - x̄2) - (μ1 - μ2))/S.E.
Since it is being assumed in the null hypothesis that μ1 = μ2, the second bracketed term on the numerator is equal to zero.
Thus
z = (116 - 113)/1.066 = 2.81
Now make a sketch of the standardised Normal distribution curve and choose a
significance level of 0.05.
It can be seen that the test statistic is in the shaded region so the null hypothesis is
rejected and the alternative hypothesis accepted. It has therefore been shown that
there is a significant difference between the response times.
It would, of course, have been possible to carry out a one-tailed test if required in the
example. The hypotheses would change to:
H0: μ1 - μ2 ≤ 0
H1: μ1 - μ2 > 0
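A short Python sketch of the two-sample calculation, using the summary figures from the hard-drive example (the function name is illustrative only):

import math

def z_two_means(x1, s1, n1, x2, s2, n2):
    # z test statistic for the difference of two means (large samples),
    # under H0: mu1 - mu2 = 0
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2) / se

z = z_two_means(116, 5, 35, 113, 4, 38)               # Disk 1 and Disk 2 summaries
print(round(z, 2))                                    # about 2.8
print("reject H0" if abs(z) > 1.96 else "accept H0")  # two-tailed test, 5% level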
5.5 Difference of two proportions - large samples
In the same way as the theory relating to one sample mean was extended to the
comparison of two sample means, exactly the same thing can be done for sample
proportions.
The Central Limit Theorem provides useful information about the distribution of sample
proportions. However, it can be extended to also give information about the distribution
of the difference of two sample proportions. In fact, it can be proved that this
distribution is Normally distributed with mean 0. This is a very useful and interesting
result and it highlights once again why the Normal distribution is so important in
statistics! The same rules apply as for the single proportion case that the original
populations do not have to be Normally distributed as long as the sample size is greater
than 30. Also it is required that np and n(1- p) are greater than 5 for each population.
The standard deviation of the difference of two sample proportions, referred to again in
this section as the Standard Error (S.E.) is given by the formula:
S.E. = √(π1(1 - π1)/n1 + π2(1 - π2)/n2)
where the subscripts 1 and 2 refer to population 1 and population 2 respectively.
Now, usually the population proportions are unknown and the null hypothesis will be
assuming in any case that they are the same. For these reasons, a pooled value of the
sample proportions is used in the formula instead of π1 and π2. This is referred to as p̄.
Thus the formula for the standard error becomes:
S.E. = √(p̄(1 - p̄)/n1 + p̄(1 - p̄)/n2)
The hypothesis tests usually start off by assuming that there is no difference between the population proportions (π1 - π2 = 0) and either confirming this or proving the assumption
to be wrong.
Example
It is desired to investigate the proportion of people who attend church regularly in
Scotland and in England, so two random samples are taken and the results are given
below.
                          Scotland    England
Attend regularly          47          31
Do not attend regularly   136         106
Total                     183         137
Is there any evidence that more people in Scotland attend church than in England?
This is a problem dealing with two proportions so the method of solution is to use the
formulae for the difference of two proportions.
Since it is desired to prove that the Scottish proportion is higher than the English
proportion, a one-tailed test has to be used. If Scotland is referred to with subscripts "1"
and England with subscripts "2", the alternative hypothesis will have to be of the form
H1: π1 - π2 > 0
So the null hypothesis will be
H0: π1 - π2 ≤ 0
The problem is solved using exactly the same procedures as all the previous ones.
1. Hypotheses
H0: π1 - π2 ≤ 0
H1: π1 - π2 > 0
2. Calculation of Standard Error
First calculate p̄, the pooled proportion:
p̄ = (47 + 31)/(183 + 137) = 78/320 = 0.244
Now,
S.E. = √(0.244 × 0.756/183 + 0.244 × 0.756/137) = 0.0485
3. Calculate test statistic
z = ((P1 - P2) - (π1 - π2))/S.E.
In the null hypothesis we have π1 - π2 ≤ 0, so take the extreme case that π1 - π2 = 0.
Now, P1 = 47/183 = 0.257 and P2 = 31/137 = 0.226
This gives
z = (0.257 - 0.226)/0.0485 = 0.639
4. Compare the test statistic with the standardised Normal distribution curve (use a 5% significance level).
5. Offer a conclusion.
Since the test statistic is not in the shaded area the null hypothesis is accepted.
There is no evidence, at the 5% level, that a higher proportion of the Scottish
population attend church than does the English population.
Notice that in this example np̄ and n(1 - p̄) are greater than 5 for both sample sizes.
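The pooled calculation can be sketched in Python as follows, assuming the church attendance figures above (the function name is illustrative only):

import math

def z_two_proportions(x1, n1, x2, n2):
    # z statistic for the difference of two proportions, with the pooled
    # estimate p_bar used in the standard error under H0: pi1 = pi2
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    return (p1 - p2) / se

z = z_two_proportions(47, 183, 31, 137)          # Scotland 47 of 183, England 31 of 137
print(round(z, 2))                               # about 0.63
print("reject H0" if z > 1.64 else "accept H0")  # one-tailed test, 5% level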
5.6 Small Samples
In the large sample (n > 30) problems discussed earlier in this Topic it was acceptable
to estimate the population standard deviation by using the sample standard deviation.
With small samples, where more chance variation must be allowed for, there is more
uncertainty in estimating this value and hence also the standard error. Some modification of the procedure of using the test statistic is needed, and the technique to use
is the t test. Its foundations were laid by W.S. Gosset [1876-1937], who wrote under the
pseudonym "student", so that it is sometimes known as student’s t test. The procedure
does not differ greatly from the one used for large samples, and in the one sample case
the test statistic looks very like the one used earlier, namely t = (x̄ - μ)/(s/√n).
This t value is no longer compared with the standardised Normal distribution curve. In
fact, if the underlying distribution is Normal then this random variable is said to follow a
student t distribution with parameter ν = n - 1.
This is similar to the Normal distribution in the sense that it is a symmetrical "bellshaped" curve, but it is slightly flatter and hence wider (the total area under it, of course,
still equals 1). Note, though, that as n gets larger, the curve becomes indistinguishable
from the Normal distribution.
Unlike the Normal distribution, however, where the same values for "cut-off" points were
used whatever the sample size (e.g. 1.96 for an area of 0.025), this is not the case in
the t distribution. These values change depending on what the sample size is. They
can be obtained from Statistical tables (or computer packages) and are categorised in
terms of a quantity called the degrees of freedom (ν). To grasp the concept of degrees
of freedom, imagine you have been asked to select 5 numbers whose mean is 30 - the
sum of these numbers will therefore be 150. If the first four numbers selected were 25,
26, 29 and 33 there is no choice for the fifth one other than 37. In other words there are
only 4 degrees of freedom. In general if you have n numbers and the mean is specified
then you have n - 1 degrees of freedom.
An example of the t distribution curve with 10 degrees of freedom (ν = 10) is drawn
below with a shaded area of 2.5% in each tail.
Part of the t tables is shown below. For example, for an area of 5% and degrees of freedom ν = 6 the tabulated value is 1.943.
ν      0.10     0.05     0.025     0.01      0.005     0.001     0.0005
1      3.078    6.314    12.706    31.821    63.657    318.31    636.62
2      1.886    2.920    4.303     6.965     9.925     22.326    31.598
3      1.638    2.353    3.182     4.541     5.841     10.213    12.924
4      1.533    2.132    2.776     3.747     4.604     7.173     8.610
5      1.476    2.015    2.571     3.365     4.032     5.893     6.869
6      1.440    1.943    2.447     3.143     3.707     5.208     5.959
7      1.415    1.895    2.365     2.998     3.499     4.785     5.408
8      1.397    1.860    2.306     2.896     3.355     4.501     5.041
9      1.383    1.833    2.262     2.821     3.250     4.297     4.781
10     1.372    1.812    2.228     2.764     3.169     4.144     4.587
Notice that the numbers are all positive, so if the shading is on the left-hand side of the
curve, a negative sign is placed in front of the appropriate number.
The following diagram shows how the t-distribution changes as ν (and hence the sample
size) increases.
To summarise, the properties of the t-distribution are:
1. The t-distribution is "bell-shaped" and symmetric
2. The t-distribution is actually a family of curves, each determined by a parameter
called the degrees of freedom (ν), with ν = n - 1
3. The total area under a t-curve is 1
4. The mean, median, and mode of the t-distribution are equal to zero
5. As the degrees of freedom increase, the t-distribution approaches the standard
normal z-distribution
5.6.1 Single mean
The method of the t test is best illustrated by an example.
Example
A paint manufacturer claims that on average one litre of paint will cover 14 square
metres. A firm buying the paint suspects that this is an exaggeration so they take a
random sample of 12 litres and measure the area covered by each. The data are:
13.6 13.9 13.2 14.5 12.6 12.6 13.2 13.8 13.4 12.4 14.3 13.2
The population standard deviation is unknown so it has to be estimated from the sample.
Since it has a size less than 30 the z test statistic cannot be used, but the t test can be
employed (as long as the original data follow a Normal distribution).
It is a straightforward process to show that the sample mean, x̄ = 13.39, and the sample standard deviation, s = 0.665.
Now the hypotheses are set up as before. Since it is suspected that the area of paint
coverage is less than 14 square metres, a one-tailed test is used. The alternative
hypothesis should therefore be of the form μ < 14.
The hypotheses are summarised as:
H0: μ ≥ 14
H1: μ < 14
Now the standard error has to be calculated. In the case of small samples with a single
mean the formula is simply S.E. = s/√n
In this case,
S.E. = 0.665/√12 = 0.192
The t statistic is calculated by the formula
t = (x̄ - μ)/S.E.
So
t = (13.39 - 14)/0.192 = -3.18
Now the t distribution curve is drawn with a 0.05 significance interval shaded. Note that
the "cut-off" value is obtained from tables using 11 degrees of freedom ( µ = 11) and
reading down the appropriate column.
Since the test statistic is in the shaded region the null hypothesis is rejected and the
alternative accepted. There is evidence at the 5% level that the paint coverage is less
than 14 square metres and so the manufacturer is exaggerating.
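A minimal Python sketch of the same t test, using the paint data above; the cut-off of about -1.80 for ν = 11 comes from fuller t tables than the short excerpt shown earlier (which stops at ν = 10):

import math
import statistics

coverage = [13.6, 13.9, 13.2, 14.5, 12.6, 12.6, 13.2, 13.8, 13.4, 12.4, 14.3, 13.2]

n = len(coverage)
x_bar = statistics.mean(coverage)       # about 13.39
s = statistics.stdev(coverage)          # sample standard deviation, about 0.665
t = (x_bar - 14) / (s / math.sqrt(n))   # about -3.17

print(round(x_bar, 2), round(s, 3), round(t, 2))
print("reject H0" if t < -1.80 else "accept H0")   # one-tailed test, 5% level, nu = 11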
5.6.2 Confidence Intervals with Small Samples
In this section a short diversion from hypothesis tests is taken to fill in the gap of
estimating a population mean from a small sample. As is the case with large samples,
a point estimate of the population mean is given by the sample mean. However in the
small sample case, the confidence interval will depend on the sample size as well as
the mean. The formula is given by
x̄ - tα,ν × s/√n ≤ μ ≤ x̄ + tα,ν × s/√n
where x̄ is the sample mean, s is the sample standard deviation, n is the sample size and tα,ν is available from tables.
Example
The lengths in cm of a random sample of 7 components taken from the output of a manufacturing process are:
3.1 3.4 3.4 3.3 3.2 3.3 3.0
Give a 95% confidence interval for the population mean.
By calculation, x̄ = 3.243 and s = 0.151.
A 95% confidence interval results in a shaded area of 2.5% in the tails of the t curve.
From tables then, using a significance level of 0.025 and a value of ν = 6, the value
2.447 is obtained. Substituting the values in the appropriate formula gives:
x̄ - tα,ν × s/√n ≤ μ ≤ x̄ + tα,ν × s/√n
3.243 - 2.447 × 0.151/√7 ≤ μ ≤ 3.243 + 2.447 × 0.151/√7
3.103 ≤ μ ≤ 3.383
In other words, it can be deduced that the population mean will lie between 3.103 and
3.383 with 95% confidence.
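The same interval can be sketched in Python, assuming the component data above and the tabulated value 2.447 for ν = 6:

import math
import statistics

lengths = [3.1, 3.4, 3.4, 3.3, 3.2, 3.3, 3.0]

n = len(lengths)
x_bar = statistics.mean(lengths)    # about 3.243
s = statistics.stdev(lengths)       # about 0.151
t_crit = 2.447                      # t table: area 0.025 in each tail, nu = n - 1 = 6

half_width = t_crit * s / math.sqrt(n)
print(f"95% CI: {x_bar - half_width:.3f} to {x_bar + half_width:.3f}")   # about 3.103 to 3.383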
5.6.3 Difference of 2 Means from Small Samples
The sampling distribution of the difference between two means of small samples follows
a t distribution with mean μ1 - μ2 and standard error given by the rather complicated looking formula:
S.E. = √(ŝ²/n1 + ŝ²/n2)
with ŝ² being estimated from the two sample standard deviations as:
ŝ² = ((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2)
The subscripts 1 and 2 refer to sample and population 1 and 2 respectively.
The format of the hypothesis test follows exactly that of the large sample case, but
clearly uses a different formula for the standard error and the comparisons are made
with t distribution curves rather than standardised Normal.
The degrees of freedom (ν) for problems of this type can be calculated as n1 + n2 - 2.
Example
A survey was carried out to investigate the number of hours worked per week by people
in various countries and two specific countries were highlighted, Japan and Russia. It
had always been believed previously that Russians worked the highest number of hours
per week, but the data do not seem to support this. The results are given below.
          Russians (hours worked)    Japanese (hours worked)
n         n1 = …                     n2 = …
x̄         x̄1 = …                     x̄2 = …
s         s1 = …                     s2 = …
Is there a significant difference between the number of hours worked per week by
Russians and the Japanese? Test at the 5% level.
1. Set up the hypotheses:
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
(Subscripts 1 refer to Russia and 2 to Japan.)
2. Calculate the Standard Error:
ŝ² = ((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2)
Therefore,
S.E. = √(ŝ²/n1 + ŝ²/n2)
3. Use the standard error in the test statistic. In the case of difference of two means for small samples this is given by
t = ((x̄1 - x̄2) - (μ1 - μ2))/S.E.
And by the null hypothesis, μ1 - μ2 = 0. So,
t = (x̄1 - x̄2)/S.E.
4. Compare with the t distribution curve with 27 degrees of freedom:
5.
Make a conclusion:
Since the test statistic is not in the shaded region the null hypothesis is accepted.
There is no evidence of a significant difference, at the 5% level, between
the number of hours worked per week by Japanese and Russian people. To
summarise, it has not been proved that Russians still work the longest hours per
week (as had been previously thought), but it has been shown that although the
Japanese figures initially seemed higher, there is, in fact, no significant difference
between the Russians and the Japanese in terms of the number of hours worked
per week.
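The pooled calculation is easy to wrap up as a short Python function. The summary figures used in the call below are purely hypothetical, chosen only to show the shape of the calculation; they are not the Russian and Japanese figures from the example.

import math

def t_two_means_small(x1, s1, n1, x2, s2, n2):
    # pooled two-sample t statistic for small independent samples,
    # under H0: mu1 - mu2 = 0; returns the statistic and its degrees of freedom
    s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(s2_pooled / n1 + s2_pooled / n2)
    return (x1 - x2) / se, n1 + n2 - 2

# hypothetical summary figures: samples of sizes 14 and 15
t, nu = t_two_means_small(42.5, 4.1, 14, 44.8, 3.7, 15)
print(round(t, 2), nu)    # about -1.59 with 27 degrees of freedom
# compare |t| with the tabulated value for nu degrees of freedom
# (about 2.05 for a two-tailed test at the 5% level)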
Play length
Q2:
A music producer is interested in estimating whether there is a difference in the average
play length of Country and Pop CD singles. Random samples are taken from each
category and the results are shown here.
Country (duration in minutes)    Pop (duration in minutes)
3.80                             3.88
3.30                             4.13
3.43                             4.11
3.30                             3.98
3.03                             3.98
4.18                             3.93
3.18                             3.92
3.83                             3.98
3.22                             4.67
3.38
Assuming that the duration times of both types of music come from Normal distributions,
carry out a hypothesis test to investigate for a significant difference in duration.
5.6.4 Paired t test
In the last section, hypotheses were tested about the difference in two population means
when the samples were independent. A method is presented here to analyse situations
when this is not the case, for example, if some quantity was measured before and after
a specific treatment, clearly one set of results would depend on the other.
The test statistic in situations like this is given by a much simpler formula than that of
5.6.3, namely
t = (d̄ - D)/(Sd/√n), with ν = n - 1
Note that:
n = number of pairs (by default, both sample sizes must be the same)
d = sample difference in pairs
D = mean population difference
Sd = standard deviation of the sample differences
d̄ = mean sample difference
Example
Five keyboard operators were asked to perform the same task on two types of machine.
Test if there is any significant difference in the time taken to do the task. Test at the 5%
level. Times are in minutes.
Operator    Machine A    Machine B
1           9.6          7.2
2           8.4          7.1
3           7.7          6.8
4           10.1         9.2
5           8.3          7.1
The null hypothesis assumes there is no difference in the population means, so D = 0.
Therefore:
H0 : D = 0
H1: D ≠ 0
Now calculate the differences, d.
These are 2.4, 1.3, 0.9, 0.9 and 1.2 (Note that they are all positive here, but it would be
perfectly feasible to have both negative and positive results).
Now the mean and standard deviation of d are calculated in the usual way.
d̄ = 1.34 and Sd = 0.619
The test statistic is given by
t = (d̄ - D)/(Sd/√n) = (1.34 - 0)/(0.619/√5) = 4.84
The t distribution curve with ν = 5 - 1 = 4 and a significance level of 0.05 (0.025 each side) is given below.
Since the test statistic is in the shaded region the null hypothesis is rejected and the
alternative accepted. There is evidence at the 5% level of a difference in times taken
to perform the task on both machines. Note that if it was desired to prove that machine
B takes longer to do a task, a one-tailed test can be employed in the usual way. The
hypotheses would then become:
H0: D ≥ 0
H1: D < 0
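A minimal Python sketch of the paired t test, assuming the keyboard data above and the tabulated value 2.776 for ν = 4:

import math
import statistics

machine_a = [9.6, 8.4, 7.7, 10.1, 8.3]
machine_b = [7.2, 7.1, 6.8, 9.2, 7.1]

d = [a - b for a, b in zip(machine_a, machine_b)]   # 2.4, 1.3, 0.9, 0.9, 1.2
d_bar = statistics.mean(d)                          # 1.34
s_d = statistics.stdev(d)                           # about 0.62
t = d_bar / (s_d / math.sqrt(len(d)))               # about 4.84

print(round(d_bar, 2), round(s_d, 2), round(t, 2))
print("reject H0" if abs(t) > 2.776 else "accept H0")   # two-tailed test, 5% level, nu = 4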
5.7 The Chi-Squared Distribution
So far the distributions discussed in the examples have all had graphs with a very similar
shape. Apart from having different points where they cut the axes, both the standardised
Normal and the t distributions have bell-shaped, symmetric curves as shown below.
However, do not be misled into thinking that every statistical distribution looks like this. The first non-parametric test now considered deals with comparison with the chi-squared distribution, which has a graph shaped like the one below. (Chi is pronounced "kye" and is the Greek letter χ.)
The chi-square distribution, like the t distribution, depends on the degrees of freedom
and so is actually a family of curves. Some examples are given below.
Note that the curve is NOT symmetrical.
There are two main uses of the chi-squared distribution. The first is to test whether
there is a significant association between two variables (like hair colour and a person’s
sex) and the second is what is called a "goodness of fit" test - a check as to whether
observed data follows a particular expected distribution.
5.7.1 Checking for Association - Hair and Eye Colour
The contingency table below was obtained from an experiment designed to examine
whether there is a relationship between hair and eye colour in humans.
          Blond    Brown    Black    Red     Total
Blue      60       40       60       40      200
Grey      20       50       20       10      100
Hazel     10       50       10       30      100
Brown     10       160      10       20      200
Total     100      300      100      100     600

(Rows give eye colour, columns hair colour.)
The first thing to do when analysing problems of this type is none other than the old
familiar process of setting up hypotheses. In testing for association there is only one
possibility for what they should be so there is no need to worry about whether it is a one
or two tailed test that is required. The general form of the hypothesis test is:
H0 : The two criteria of classification are independent
H1 : The two criteria of classification are not independent
In this particular case, then, the hypotheses will be
H0 : There is no relationship between hair and eye colour
H1 : There is a relationship between hair and eye colour
There is no concept of standard error in non-parametric tests, but it is still necessary
to calculate a test statistic. In examples checking for association, this test statistic will
follow the chi-squared distribution with an appropriate number of degrees of freedom.
In order to calculate its value, the contingency table has to be redrawn with expected
values in each cell.
These expected values are calculated by assuming that both of the classifications are
independent and therefore probabilities can be multiplied using the equation
p(A and B) = p(A) × p(B)
There are 16 numbers to be calculated here, so only two will be carried out in full.
• Blue eyes / blond hair
p(blue) = 200/600 = 1/3
p(blond) = 100/600 = 1/6
p(blue and blond) = 1/3 × 1/6 = 1/18
Out of 600 people, then, it would be expected that 1/18 of them would have blue eyes and blond hair. This calculates as 33.3 to one decimal place (expected values are often not whole numbers and should be given to an appropriate degree of accuracy in problems).
• Grey eyes / brown hair
p(grey) = 100/600 = 1/6
p(brown) = 300/600 = 1/2
p(grey and brown) = 1/6 × 1/2 = 1/12
Out of 600 people, then, it would be expected that 1/12 of them would have grey eyes and brown hair. This calculates as 50.
The contingency table can be redrawn now to show expected values.
          Blond    Brown    Black    Red     Total
Blue      33.3     100      33.3     33.4    200
Grey      16.7     50       16.7     16.6    100
Hazel     16.7     50       16.7     16.6    100
Brown     33.3     100      33.3     33.4    200
Total     100      300      100      100     600
Notice that the results in the "red" column were all rounded so that the "total" column was the same for both the expected results and for the original observed values. This
should always be done (and, in fact, saves some calculations of probabilities, since all
that is then required at that last stage is a subtraction).
Now the test statistic needs to be defined as it is this that follows the chi-squared distribution. The easiest way to do this is to let O represent each original "observed" value and E represent each "expected" value in turn and then calculate:
Test statistic = Σ (O - E)²/E
This is often referred to as χ²calc.
The degrees of freedom, ν, for contingency table problems is calculated by (number of rows - 1) × (number of columns - 1). Note that the "total" row and column are not counted.
So in this case,
ν = (4 - 1) × (4 - 1) = 9
Also in this problem then, the test statistic is given by
χ²calc = (60 - 33.3)²/33.3 + (40 - 100)²/100 + … + (20 - 33.3)²/33.3 = 174.2
Just like there are statistical tables for the standardised Normal and t distributions, so
there are tables for the chi-squared distribution. Part of a set of tables is shown here.
Since the curve is not symmetrical, separate "cut-off" points need to be given for the left
and right hand sides.

ν     .99         .975        .95        .90       .50       .20        .10
1     0.000157    0.000982    0.00393    0.0158    0.455     1.642      2.706
2     0.0201      0.0506      0.103      0.211     1.386     3.219      4.605
3     0.115       0.216       0.352      0.584     2.366     4.642      6.251
4     0.297       0.484       0.711      1.064     3.357     5.989      7.779
5     0.554       0.831       1.145      1.610     4.351     7.289      9.236
6     0.872       1.237       1.635      2.204     5.348     8.558      10.645
7     1.239       1.690       2.167      2.833     6.346     9.803      12.017
8     1.646       2.180       2.733      3.490     7.344     11.030     13.362
9     2.088       2.700       3.325      4.168     8.343     12.242     14.684
10    2.558       3.247       3.940      4.865     9.342     13.442     15.987

ν     .05        .025       .02        .01        .005       .001
1     3.841      5.024      5.412      6.635      7.879      10.827
2     5.991      7.378      7.824      9.210      10.597     13.815
3     7.815      9.348      9.837      11.345     12.838     16.268
4     9.488      11.143     11.668     13.277     14.860     18.465
5     11.070     12.832     13.388     15.086     16.750     20.517
6     12.592     14.449     15.033     16.812     18.548     22.457
7     14.067     16.013     16.622     18.475     20.278     24.322
8     15.507     17.535     18.168     20.090     21.955     26.125
9     16.919     19.023     19.679     21.666     23.589     27.877
10    18.307     20.483     21.161     23.209     25.188     29.588
These tables are taken from Murdoch and Barnes, Statistical Tables.
The tables reveal that for 9 degrees of freedom, the "cut-off" points for 5%, 1% and 0.1%
are 16.919, 21.666 and 27.877 respectively.
A diagram is now shown with the area shaded appropriate to a significance level of
0.001.
Since the test statistic is in the shaded region, the null hypothesis is rejected. There is
evidence at the 0.1% level, therefore, that there is an association between hair and eye
colour. In other words there is a very highly significant relationship between hair and
eye colour.
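The whole association test can be sketched in Python as below, assuming the observed hair and eye colour table above. Because the expected values are not rounded here, the statistic comes out at about 174, slightly different from the rounded hand calculation, but the conclusion is the same.

observed = [
    # rows: Blue, Grey, Hazel, Brown eyes; columns: Blond, Brown, Black, Red hair
    [60, 40, 60, 40],
    [20, 50, 20, 10],
    [10, 50, 10, 30],
    [10, 160, 10, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand   # expected count under independence
        chi_sq += (o - e) ** 2 / e

nu = (len(observed) - 1) * (len(observed[0]) - 1)   # 9 degrees of freedom
print(round(chi_sq, 1), nu)
print("reject H0" if chi_sq > 27.877 else "accept H0")   # 0.1% cut-off for nu = 9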
5.7.2 Limitations of Chi-squared test
Chi squared is a mathematical distribution and has been used so far without any proof
given as to why it is useful in measuring whether there is an association between
criteria. The mathematical details are not required in this course so are not provided
here (although at the end of this topic the distribution will be re-visited and put in a
different context which may shed some light on how it comes about). However, account
must be taken of some limitations so that it can be used validly for statistical tests.
The first problem occurs if there is only one degree of freedom. This happens more often
than you might think, since if the contingency table only has 2 rows and 2 columns, the
degrees of freedom will be (2 - 1) ž (2 - 1) = 1. In cases like this, a Yates’ continuity
correction must be made. This also occurs in other areas in probability where discrete
distributions are being approximated by continuous ones. Basically what happens is
that 0.5 is subtracted from each calculated value of "O - E", ignoring the sign (plus or
minus). In other words, an "O - E" value of + 5 becomes + 4.5, and an "O - E" value of
-5 becomes -4.5. That number is then squared and divided by E. In terms of a formula,
the test statistic is now given by:
χ²calc = Σ (|O - E| - 0.5)²/E
The second limitation in the use of the chi-squared distribution, again to satisfy the underlying mathematical assumptions, is that the expected values should be relatively large.
The following simple rules are applied:
1. No expected category should be less than 1 (it does not matter what the observed
values are)
2. AND no more than one-fifth of expected categories should be less than 5.
If data do not meet these criteria then either larger samples have to be taken, or the data
for the smaller "expected" categories can be combined until their combined expected
value is 5 or more. This should only be done, however, if combinations are sensible.
Example
The example from Section 5.5, where differences between two sample
proportions were considered, will now be re-worked using a chi-squared test instead of
the method used previously of calculating a z value and comparing it with standardised
Normal distribution.
The problem examined church attendance in two countries, Scotland and England, and
asked if there was a significant difference between the church visiting patterns of the
Scots and the English. These were the results:
                          Scotland    England    Total
Attend regularly          47          31         78
Do not attend regularly   136         106        242
Total                     183         137        320
The hypothesis test is given as:
H0 : There is no relationship between church attendance and Country
H1 : There is a relationship between church attendance and Country
A table of expected values is now calculated in the same way as before assuming the
null hypothesis to be true. These expected values are listed below. (This can be done
very quickly by noticing that, in fact, only one probability calculation is required - the
others are obtained by subtractions).
                          Scotland    England    Total
Attend regularly          44.6        33.4       78
Do not attend regularly   138.4       103.6      242
Total                     183         137        320
χ²calc = Σ (|O - E| - 0.5)²/E
      = (|47 - 44.6| - 0.5)²/44.6 + (|31 - 33.4| - 0.5)²/33.4 + (|136 - 138.4| - 0.5)²/138.4 + (|106 - 103.6| - 0.5)²/103.6
      = 0.081 + 0.108 + 0.026 + 0.035
      = 0.25
Now compare with a chi-squared curve with one degree of freedom. The "cut-off" point
for 5% is 3.841.
Since the test statistic is not in the shaded region, the null hypothesis is accepted. There
is no evidence, at the 5% level, of a relationship between people who attend church
regularly and whether they live in Scotland or England. This supports the conclusion
reached in Section 5.5 using the difference of two proportions.
Note: No worked examples have been given in this section which show the limitations of
the test when small expected values are calculated, but the reader should be aware of
these limitations and address them appropriately if they are encountered in calculations.
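For a 2 x 2 table the same few lines of Python can be used with Yates' correction built in; the sketch below assumes the church attendance table above:

observed = [[47, 31],       # attend regularly:        Scotland, England
            [136, 106]]     # do not attend regularly: Scotland, England

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand
        chi_sq += (abs(o - e) - 0.5) ** 2 / e        # Yates' continuity correction

print(round(chi_sq, 2))                              # about 0.25
print("reject H0" if chi_sq > 3.841 else "accept H0")   # 5% cut-off, nu = 1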
Charter airlines
A consumer association has done some research on customer views on the reliability of
charter airlines. The results are tabulated below:
Airline         Good    Average    Poor
High Life       50      40         30
Sky Coaxing     40      80         55
Up and Away     35      50         50
Carry out an appropriate hypothesis test to determine if there is an association between
the airline and reliability.
5.7.3 Goodness of Fit Tests
The Chi-squared test can also be used in other situations where observed and
expected values are being compared. The test statistic will again be:
χ²calc = Σ (O - E)²/E
The degrees of freedom, ν, will depend on the particular problem, but in general
ν = (number of classes) - (number of parameters estimated) - 1
Examples
1. Unfair die
A gambler suspects the die being used for a game is loaded and producing unfair results.
A survey of 120 throws gave the following results:
Throw        1     2     3     4     5     6
Frequency    17    16    19    23    22    23
These are clearly the observed values. The expected values are quite simply 20 for
each throw if the die is fair.
The hypothesis test takes the form:
H0 : The expected distribution is true (in this case "the die is fair")
H1 : The expected distribution is false (in this case "the die is loaded")
The test will be carried out using a significance level of 0.01.
The following table shows how the calculations are carried out.
O     E     (O - E)    (O - E)²    (O - E)²/E
17    20    -3         9           0.45
16    20    -4         16          0.80
19    20    -1         1           0.05
23    20    3          9           0.45
22    20    2          4           0.20
23    20    3          9           0.45
                             Total: 2.40
̛ÍÏÎ{ÐÒÑ
No parameters have been estimated in this problem so,
Tables give a Ô
2
value of 15.086 (1% level)
The diagram is as follows:
Õ
c
H ERIOT-WATT U NIVERSITY 2003
2
E
Ó
=6-1=5
Since the test statistic is not in the shaded region the null hypothesis must be accepted.
There is no evidence that the die is loaded.
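A goodness of fit test of this kind takes only a few lines in Python; the sketch below assumes the die frequencies above:

observed = [17, 16, 19, 23, 22, 23]            # frequencies for throws 1 to 6
expected = [sum(observed) / 6] * 6             # 20 for each face if the die is fair

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
nu = len(observed) - 1                         # 5 (no parameters estimated)

print(round(chi_sq, 2), nu)                    # 2.40 and 5
print("reject H0" if chi_sq > 15.086 else "accept H0")   # 1% cut-off for nu = 5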
2. Company feelings
In this example a test will be carried out to verify whether a particular distribution is
Normally distributed.
An attitude survey is taken by employees to see how they feel about their company.
Answers from a questionnaire could potentially produce scores from 0 to 50 and the
actual results are shown below:
Class intervals    Frequency (f)
10 - under 15      11
15 - under 20      14
20 - under 25      24
25 - under 30      28
30 - under 35      13
35 - under 40      10
                   Σf = 100
Test at the 5% level whether this is a Normal distribution.
Solution
There are two parameters to be estimated here, the mean and standard deviation. Using
the usual formulae (and approximating each class interval by its mid-point) these are
calculated as
x̄ = [(12.5 × 11) + (17.5 × 14) + (22.5 × 24) + (27.5 × 28) + (32.5 × 13) + (37.5 × 10)]/100 = 24.9
Similarly, the standard deviation, s, is calculated as 7.194.
Now various probabilities using the Normal curve and statistical tables must be
calculated. As an example, consider 30 - under 35. The Normal curve has the
appearance:
The area to the right of 35 is given by a z value of (35 - 24.9)/7.194 = 1.40. Tables give the area to the right of 1.40 as 0.0808.
The area to the right of 30 is given by a z value of (30 - 24.9)/7.194 = 0.71. Tables give the area to the right of 0.71 as 0.2389.
This means the probability of obtaining a score between 30 and 35 is 0.2389 - 0.0808 = 0.1581.
Multiplying this by 100 gives the expected number in this category, namely 15.81.
The other expected values are calculated in the same way and the results are as follows:
Class intervals    Expected frequency (E)
under 10           1.92
10 - under 15      6.46
15 - under 20      16.45
20 - under 25      25.57
25 - under 30      25.71
30 - under 35      15.81
35 - under 40      6.29
40 and over        1.79
Notice that these add to 100 (that is why the extra categories at the start and end had
to be added).
Now though, since these "extra" categories give values less than 5 (one of the limitations
of the chi-squared test) it makes sense to combine them with the adjacent categories.
The revised table is as follows:
Class intervals    Expected frequency (E)
under 15           8.38
15 - under 20      16.45
20 - under 25      25.57
25 - under 30      25.71
30 - under 35      15.81
35 and over        8.08
Compare this now with the observed values.
All that remains now is to set up the hypotheses and calculate the test statistic (notice
that the most time-consuming part of this problem was the mundane calculations!).
H0 : Data follow a normal distribution with mean 24.9 and standard deviation 7.194
H1 : Data do not follow a normal distribution with mean 24.9 and standard deviation
7.194
χ²calc = Σ (O - E)²/E = (11 - 8.38)²/8.38 + (14 - 16.45)²/16.45 + … + (10 - 8.08)²/8.08 = 2.44
The degrees of freedom, ν, is given by ν = (number of classes) - (number of parameters estimated) - 1
so ν = 6 - 2 (mean and standard deviation) - 1 = 3
The critical value from chi-squared tables is 7.815 as shown on the graph below.
Since the test statistic is not in the shaded region, the null hypothesis cannot be
rejected. There is no evidence, at the 5% level, against the hypothesis that these data follow a normal distribution with mean 24.9 and standard deviation 7.194.
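The most laborious part of this example is finding the expected frequencies, and that step is easy to sketch in Python using the fitted mean and standard deviation above. The values printed are close to, but not exactly equal to, the tabulated ones because the hand calculation rounds each z value to two decimal places.

import math

def phi(z):
    # standard Normal cumulative distribution function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd, total = 24.9, 7.194, 100
boundaries = [10, 15, 20, 25, 30, 35, 40]     # class boundaries, plus open-ended tails

cdf = [phi((b - mean) / sd) for b in boundaries]
probs = [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])] + [1 - cdf[-1]]
expected = [total * p for p in probs]

# categories: under 10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40 and over
print([round(e, 2) for e in expected])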
5.8 Coursework 1
This is the first of two coursework exercises; the second is at the end of Topic 10.
This work should be submitted to your tutor at a date to be notified. For the exercise
it is expected that you will have access to an appropriate computer package (such as
Microsoft Excel or Minitab) in order to help analyse the data. You are not required to
perform calculations manually.
Task 1
An insurance company wishes to investigate if there is a difference between the claims
received by their Aberdeen and Dumfries offices. One week of the year is randomly
selected and all the claims to each office during that week are recorded. The results are
given in Table 5.1 and Table 5.2
Table 5.1: Aberdeen Claims
339   297   392   345   342   335   335   201   284   268   259   222
332   353   342   160   447   191   292   412   349   223   186   205
350   267   197   280   134   292   119   1293  374   378   320   270
220   283   171   219   268   363   323   105   272   307   408   241
292   285   456   403   349   59    281   135   247   246   344   221
270   278   328   1381  334   277   400   173   198   253   160   371
364   245   382   476   351   256   349   318   198   398   196   191
224   310   171   1249  383   206   299   365   190   420   208   188
290   418   224   301   361   344   275   394   363   231
Table 5.2: Dumfries Claims
193   164   486   331   319   208   506   371   128   445   400   445
372   51    374   256   174   265   275   257   355   230   79    325
319   189   422   313   307   224   168   560   451   1303  420   255
51    370   60    333   201   408   300   234   334   247   458   385
137   273   343   413
a) Summarise each data set.
Obtain the mean, median, standard deviation and interquartile range.
Produce relevant graphs that will show any patterns in the data. (A brief illustration of
how the summary statistics might be obtained in software is given after part c.)
b) Use a hypothesis test to investigate if there is a significant difference between the
claims received by the two offices.
Set out your hypotheses clearly and show all your working.
c) Produce 95% confidence intervals that will give an estimation of the average
amount of all the claims received by each office.
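If you choose to work in Python rather than Excel or Minitab, the summary statistics asked for in part a) can be obtained along the following lines. This is only an illustrative sketch: the arrays below contain a few values from the tables, and you would replace them with the complete lists of claims.

import numpy as np

# Replace these short lists with the full sets of claims from Table 5.1 and Table 5.2
aberdeen = np.array([339, 297, 392, 345, 342])   # illustrative subset only
dumfries = np.array([193, 164, 486, 331, 319])   # illustrative subset only

for name, data in [("Aberdeen", aberdeen), ("Dumfries", dumfries)]:
    q1, q3 = np.percentile(data, [25, 75])
    print(name,
          "mean =", round(data.mean(), 1),
          "median =", np.median(data),
          "sd =", round(data.std(ddof=1), 1),
          "IQR =", q3 - q1)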
Task 2
A warehouse ships out 500 cartons of strawberries one day, each of which contains 20
strawberries. It is desired to analyse the distribution of rotten strawberries. Tests are
carried out and the results are shown in the frequency table, Table 5.3.
Table 5.3:
Number of rotten strawberries   0   1   2   3    4    5   6   7   8   9
Observed counts                 3  10  34  63  100  100  82  56  34  13

Number of rotten strawberries  10  11  12  more than 12
Observed counts                 0   2   2       1
Perform an appropriate test, showing all the details, to check whether or not the
distribution of rotten strawberries can be represented by a Binomial distribution with
n = 20. (You will need to calculate p).
5.9
Summary and assessment
At this stage you should be able to:
identify situations in experimentation where a hypothesis test will produce a useful result
appreciate the ideas of null and alternative hypotheses
use the standardised Normal distribution in hypothesis tests involving large samples
use the student’s t distribution in hypothesis tests involving small samples
explain Type 1 and Type 2 Errors
use the formulae for standard error and test statistic in the cases of
a) single mean - large samples
b) single proportion - large samples
c) difference between two means - large samples
d) difference between proportions - large samples
e) single mean - small samples
f) difference between two means - small samples
decide when to use a One or Two Tailed Test
appreciate the concept of degrees of freedom
calculate a confidence interval for a population mean based on a sample mean from a small sample
use a paired t test
ANSWERS: TOPIC 5
Answers to questions and activities
5 Hypothesis Tests
Hypothesis testing (page 9)
Q1:
Using a systematic sampling technique of taking every fifth number, a sample is
obtained.
2.59  2.16  1.51  2.59  1.60  1.91  1.67  1.90  1.44  2.04
2.59  2.12  1.86  2.58  1.77  2.99  2.02  1.88  1.79  2.19
2.57  1.78  1.49  2.19  2.08  1.94  2.04  2.49  2.29  2.04
Now using the statistical functions on a calculator, or using Excel, Minitab or another
statistical package, the mean and standard deviation for the sample are found:
x̄ = 2.070, s = 0.384
From the sample results it looks as if the time taken may be more than 2 minutes, so set
up a hypothesis test in the form:
H0 : μ = 2.000
H1 : μ > 2.000
The sample standard deviation can be used as an estimate for the population value
since this is a large sample. Now the standard error can be calculated as
S.E. = s / √n = 0.384 / √30 = 0.070
The test statistic is
z = (x̄ - μ) / S.E. = (2.070 - 2.000) / 0.070 = 1.00
Comparing with the standardised Normal distribution curve, and using a significance
level of 5%, the test statistic is not in the shaded region.
Therefore the null hypothesis cannot be rejected. The claim that the questionnaire takes
2 minutes to complete is consistent with the data at the 5% significance level.
Note: Your numbers will be different if you took a different sample
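The arithmetic for this particular sample can be reproduced with a short script. This sketch is not part of the original answer; it assumes Python with numpy, and simply repeats the calculation above for the thirty sampled times.

import numpy as np

sample = np.array([2.59, 2.16, 1.51, 2.59, 1.60, 1.91, 1.67, 1.90, 1.44, 2.04,
                   2.59, 2.12, 1.86, 2.58, 1.77, 2.99, 2.02, 1.88, 1.79, 2.19,
                   2.57, 1.78, 1.49, 2.19, 2.08, 1.94, 2.04, 2.49, 2.29, 2.04])

mu0 = 2.000                        # hypothesised population mean
mean = sample.mean()               # sample mean, about 2.07
s = sample.std(ddof=1)             # sample standard deviation, estimating sigma
se = s / np.sqrt(len(sample))      # standard error of the mean
z = (mean - mu0) / se              # test statistic, about 1.0

print(round(mean, 3), round(s, 3), round(se, 3), round(z, 2))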
Play length (page 23)
Q2:
1.
Set up the hypotheses:
H0 : μ1 - μ2 = 0
H1 : μ1 - μ2 ≠ 0
(Subscript 1 refers to Country and subscript 2 to Pop.)
2.
Calculate the sample means (x̄1 and x̄2), the sample standard deviations (s1 and s2)
and the sample sizes (n1 and n2) for the two groups from the data.
3.
Calculate the Standard Error:

S.E. = s_p √(1/n1 + 1/n2), where the pooled standard deviation is
s_p = √( ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) )

and evaluate it using the sample values from step 2.
4.
Use the standard error in the test statistic:
In the case of the difference of two means for small samples this is given by

t = ( (x̄1 - x̄2) - (μ1 - μ2) ) / S.E.

and by the null hypothesis, μ1 - μ2 = 0, so t = (x̄1 - x̄2) / S.E.
5.
Compare with the t distribution curve with 17 degrees of freedom (recall that ν is
calculated as n1 + n2 - 2). Use a significance level of 0.05 (0.025 each side):
6.
Make a conclusion:
Since the test statistic is in the shaded region, reject the null hypothesis and accept
the alternative one. There is evidence at the 5% level that the duration times of
Country and Pop CD singles are different.
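For reference, a pooled two-sample t test of this kind can be run in a few lines of Python using scipy. The sketch below is not part of the original answer and the two arrays are only placeholders: substitute the actual Country and Pop durations from the activity.

import numpy as np
from scipy import stats

# Placeholder data - replace with the Country and Pop durations from the activity
country = np.array([3.9, 4.2, 3.7, 4.5, 4.1, 3.8, 4.4, 4.0, 4.3, 3.6])
pop = np.array([3.4, 3.1, 3.6, 3.2, 3.0, 3.5, 3.3, 2.9, 3.4])

# Pooled (equal-variance) two-sample t test, the same form as the worked answer
t_stat, p_value = stats.ttest_ind(country, pop, equal_var=True)

df = len(country) + len(pop) - 2   # degrees of freedom: 10 + 9 - 2 = 17 for these arrays
print(round(t_stat, 3), round(p_value, 4), df)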
Charter airlines (page 32)
Step 1: Add totals to the table and calculate probabilities
Airline         Good    Average    Poor    Total
High Life        50        40       30      120
Sky Coaxing      40        50       80      170
Up and Away      35        55       50      140
Total           125       145      160      430
p(H) = 120/430 = 0.279; p(S) = 0.395; p(U) = 0.326;
p(G) = 125/430 = 0.291; p(A) = 0.337; p(P) = 0.372.
Step 2: Hypothesis Test and Expected Values (EV)
H0 : There is no relationship between airline and reliability
H1 : There is a relationship between airline and reliability
p(H and G) = 0.279 × 0.291 = 0.081
EV(H and G) = 0.081 × 430 = 34.9
EV(H and A) = 0.279 × 0.337 × 430 = 40.4
By subtraction, EV(H and P) = 120 - 40.4 - 34.9 = 44.7
EV(S and G) = 0.395 × 0.291 × 430 = 49.3
EV(S and A) = 0.395 × 0.337 × 430 = 57.2
EV(S and P) = 170 - 49.3 - 57.2 = 63.5
The last row can be done by subtractions to make up column totals.
Expected Values
Airline         Good    Average    Poor    Total
High Life       34.9      40.4     44.7     120
Sky Coaxing     49.3      57.2     63.5     170
Up and Away     40.8      47.4     51.8     140
Total            125       145      160     430
Step 3: Calculate Test Statistic
Test statistic = Σ (O - E)² / E
The degrees of freedom, ν, for contingency table problems is calculated by (number of
rows - 1) × (number of columns - 1).
So in this case, ν = (3 - 1) × (3 - 1) = 4
Also in this problem then, the test statistic is given by:
χ² = (50 - 34.9)² / 34.9 + (40 - 40.4)² / 40.4 + ... + (50 - 51.8)² / 51.8 = 20.425
Step 4: Compare with chi-squared tables.
With 4 degrees of freedom the "cut-off" point for a 5% significance test is 9.488. Since
the test statistic, 20.425, exceeds this critical value, there is evidence at the 0.05 level
that there is a relationship between airline and reliability.
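This contingency table test can also be carried out directly with scipy. The sketch below is not part of the original answer; note that chi2_contingency computes the expected counts from the exact marginal totals, so its statistic differs slightly from the hand calculation above, which used rounded probabilities.

import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are High Life, Sky Coaxing, Up and Away;
# columns are Good, Average, Poor
observed = np.array([[50, 40, 30],
                     [40, 50, 80],
                     [35, 55, 50]])

stat, p_value, df, expected = chi2_contingency(observed)
print(round(stat, 2), df, round(p_value, 4))   # statistic about 20.7 with 4 degrees of freedom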