S2
MATHEMATICS
SUPPORT CENTRE
Title: Measures of Spread
Target: On completion of this worksheet you should understand what is meant by a
measure of spread and be able to calculate range, interquartile range and standard
deviation.
Suppose there are two football teams, A and B,
and we need to choose one of them to take part
in a competition. In this competition consistency
is important. In order to make our decision we
will use the number of goals they scored in their
last 11 matches.
A 4 7 0 1 2 0 6 7 4 2 0
B 3 3 2 3 3 3 2 4 4 3 3
We will first consider the mean number of goals
for each team:
mean A = 33/11 = 3
mean B = 33/11 = 3
The means are equal so we must use other
criteria. As consistency is important we need to
find some way of measuring the spread of the
data.
The Range
range = largest value – smallest value
range A = 7 – 0 = 7
range B = 4 – 2 = 2
As team B has a lower range than team A we
would choose team B.
In general, although the range is very easy to
find, it can be distorted by a single very high or
very low value, particularly if there is a large
amount of data.
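
The range calculation above can be checked with a short Python sketch. The function name value_range and the extra score of 12 are illustrative assumptions and are not part of the worksheet; the extra score is only there to show how one unusual value can distort the range.

```python
# Goals scored in the last 11 matches (data from the worksheet).
team_a = [4, 7, 0, 1, 2, 0, 6, 7, 4, 2, 0]
team_b = [3, 3, 2, 3, 3, 3, 2, 4, 4, 3, 3]


def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


print(value_range(team_a))  # 7
print(value_range(team_b))  # 2

# Hypothetical: one unusually high score added to team B's results.
team_b_with_outlier = team_b + [12]
print(value_range(team_b_with_outlier))  # 10
```

A single extra result moves team B's range from 2 to 10, even though the other eleven scores are unchanged.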
The Interquartile Range
The median divides the data into two halves (see
sheet S1). The quartiles, together with the median,
divide the data into quarters.
To find the median and quartiles the data must
first be put in numerical order.
Mathematics Support Centre,Coventry University, 2001
Example
Team A: 0 0 0 1 2 2 4 4 6 7 7
The median is the 6th value, the lower quartile
(Q1) is the 3rd value and the upper quartile (Q3)
is the 9th value:
0  0  0  1  2  2  4  4  6  7  7
      Q1     median     Q3
We can see that these values divide the data
into four equal parts. Now another measure of
spread is the
interquartile range = Q3 – Q1
interquartile range for A = 6 – 0 = 6
For team B we must first arrange the data in
order as before:
2  2  3  3  3  3  3  3  3  4  4
      Q1       M        Q3
The quartiles and median are in the same
positions as for team A.
interquartile range for B = 3 – 3 = 0
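
The interquartile ranges for the two teams can be reproduced with a minimal Python sketch using the standard library's statistics.quantiles. The function name interquartile_range is an assumption made for illustration. For 11 values the default "exclusive" method picks the 3rd, 6th and 9th ordered values, as in the worksheet. For other data sizes it interpolates between values, and since conventions for quartiles differ, results may not always match answers found by hand.

```python
import statistics

# Goals scored in the last 11 matches (data from the worksheet).
team_a = [4, 7, 0, 1, 2, 0, 6, 7, 4, 2, 0]
team_b = [3, 3, 2, 3, 3, 3, 2, 4, 4, 3, 3]


def interquartile_range(values):
    """Return Q3 - Q1 using Python's default 'exclusive' quantile method."""
    # quantiles() sorts the data itself and returns [Q1, median, Q3] for n=4.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


print(interquartile_range(team_a))  # 6.0
print(interquartile_range(team_b))  # 0.0
```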
Exercise
Find the range and interquartile range for the
following:
1. The test marks for a group of students are:
50, 42, 76, 38, 12, 56, 62.
2. Bolts are packed in boxes of 10. A sample of
boxes was examined and the number of
defective bolts in each was:
1, 0, 3, 0, 0, 1, 1, 1, 3, 1, 0, 1, 2, 0, 1.
3. The number of minutes late for a particular
bus: 5, 2, 7, 5, 6, 4, 1, 0, 3, 4.
(Answers: 64, 24; 3, 1; 7, 4)
The Standard Deviation
The standard deviation uses all the values and gives
a more useful measure of spread. The formula is

standard deviation = $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

where $\bar{x}$ is the mean.

Using the football teams data (mean = 3):

Team A                          Team B
x     x − x̄    (x − x̄)²         x     x − x̄    (x − x̄)²
4       1         1             3       0         0
7       4        16             3       0         0
0      -3         9             2      -1         1
1      -2         4             3       0         0
2      -1         1             3       0         0
0      -3         9             3       0         0
6       3         9             2      -1         1
7       4        16             4       1         1
4       1         1             4       1         1
2      -1         1             3       0         0
0      -3         9             3       0         0
       Σ(x − x̄)² = 76                  Σ(x − x̄)² = 4

Standard deviation A = σ = $\sqrt{\dfrac{76}{11}}$ = 2·63
Standard deviation B = σ = $\sqrt{\dfrac{4}{11}}$ = 0·60

Note the use of σ (sigma) for the standard deviation.

The formula can also be written as

σ = $\sqrt{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2}$

Exercise
Find the standard deviation for the questions in the
previous exercise using both formulae.
(Answers: 18·8, 0·966, 2·1)

For a frequency distribution the formula is

Standard deviation = σ = $\sqrt{\dfrac{\sum fx^2}{\sum f} - \left(\dfrac{\sum fx}{\sum f}\right)^2}$

This formula is also used for a grouped frequency
distribution where x is the mid-point of the
interval.
You can use your calculator to work out the mean
and standard deviation. First put your calculator
into statistics mode and then clear all memories.
Each x value is entered (× frequency if needed)
followed by the ‘DATA’ key. How to display the mean
and standard deviation depends on your calculator.

Example Find the standard deviation:

Goals scored (x)   Number of matches (f)    fx     fx²
0                  10                        0       0
1                  29                       29      29
2                  32                       64     128
3                  23                       69     207
4                   6                       24      96
Totals             Σf = 100          Σfx = 186   Σfx² = 460

standard deviation = $\sqrt{\dfrac{460}{100} - \left(\dfrac{186}{100}\right)^2}$ = 1·068

This is the standard deviation for the population
of 100 matches, i.e. for the data given. If we
want to use this data as a sample to estimate the
standard deviation for the population of all
matches then we must use a slightly different
formula which gives a better estimate. We use s
to denote this standard deviation.

Standard deviation = s = $\sqrt{\dfrac{\sum fx^2 - \dfrac{(\sum fx)^2}{\sum f}}{\sum f - 1}}$

Using this with the above example gives

s = $\sqrt{\dfrac{460 - \dfrac{186^2}{100}}{100 - 1}}$ = 1·073

As the number of values is large there is little
difference between the two standard deviations
but this second formula should be used when
estimating a population standard deviation.
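
The standard deviation formulae in this section can be checked with a minimal Python sketch. The function names and the sample flag are assumptions made for illustration; the data and expected results come from the worksheet. The sketch covers the definition, the alternative form using Σx², and the frequency table version with either Σf or Σf − 1 as the divisor.

```python
import math

# Goals scored in the last 11 matches (data from the worksheet).
team_a = [4, 7, 0, 1, 2, 0, 6, 7, 4, 2, 0]
team_b = [3, 3, 2, 3, 3, 3, 2, 4, 4, 3, 3]


def population_sd(values):
    """sqrt( sum((x - mean)^2) / n ), the defining formula."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def population_sd_shortcut(values):
    """sqrt( sum(x^2)/n - (sum(x)/n)^2 ), the alternative form."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


def sd_from_frequencies(xs, fs, sample=False):
    """SD of a frequency table; sample=True divides by (sum of f) - 1."""
    total_f = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    divisor = total_f - 1 if sample else total_f
    return math.sqrt((sum_fx2 - sum_fx ** 2 / total_f) / divisor)


print(round(population_sd(team_a), 2))           # 2.63
print(round(population_sd_shortcut(team_b), 2))  # 0.6

# Frequency table from the worked example.
goals = [0, 1, 2, 3, 4]
matches = [10, 29, 32, 23, 6]
print(round(sd_from_frequencies(goals, matches), 3))               # 1.068
print(round(sd_from_frequencies(goals, matches, sample=True), 3))  # 1.073
```

Both forms of the formula give the same value for the same data; only rounding in intermediate steps can make hand calculations differ slightly.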
Exercise
Estimate the population standard deviation for
the following sets of data:
1.  x    4    5    6    7    8
    f    2    7    8    3    2

2.  x   23   24   25   26   27
    f    9   21   24   13    3

3.  x   1-3   4-6   7-9   10-12   13-15
    f    5    24    27     15       9
(Answers: 1·10, 1·05, 3·29)
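
For grouped data such as question 3, x is taken as the mid-point of each interval. The following Python sketch shows one way to set that out, assuming each class is represented by its lower and upper limits; the variable names are illustrative only.

```python
import math

# Class intervals and frequencies from question 3 of the exercise.
intervals = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15)]
frequencies = [5, 24, 27, 15, 9]

# Use the mid-point of each interval as the x value.
midpoints = [(low + high) / 2 for low, high in intervals]

total_f = sum(frequencies)
sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))

# Estimate of the population standard deviation (divisor is sum of f, minus 1).
s = math.sqrt((sum_fx2 - sum_fx ** 2 / total_f) / (total_f - 1))

print(midpoints)    # [2.0, 5.0, 8.0, 11.0, 14.0]
print(round(s, 2))  # 3.29
```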
The variance is the square of the standard
deviation, so the estimated population variance is

variance = $s^2 = \dfrac{\sum fx^2 - \dfrac{(\sum fx)^2}{\sum f}}{\sum f - 1}$
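
The link between variance and standard deviation can be confirmed with Python's standard statistics module, as a rough check rather than a required method. Here pvariance and pstdev divide by n, while variance and stdev divide by n − 1. The team A data comes from the worksheet; the variable names are illustrative.

```python
import math
import statistics

# Goals scored by team A in the last 11 matches (data from the worksheet).
team_a = [4, 7, 0, 1, 2, 0, 6, 7, 4, 2, 0]

pop_var = statistics.pvariance(team_a)  # divides by n
est_var = statistics.variance(team_a)   # divides by n - 1

# In each case the variance is the square of the matching standard deviation.
print(math.isclose(pop_var, statistics.pstdev(team_a) ** 2))  # True
print(math.isclose(est_var, statistics.stdev(team_a) ** 2))   # True
print(round(pop_var, 3), round(est_var, 3))                   # 6.909 7.6
```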