Averages and spread 2
Teacher guidance notes: worked examples incorporating solutions
1. Raw data
Raw data refers to a data set where each data value is known individually. All of the
different measures of location and dispersion can be calculated from raw data.
Example
In an experiment, scientists from CERN were tested for their level of radiation
exposure (millisievert, mSv), over a year. The results were as follows:
12.3, 21.2, 19.0, 13.1, 17.1, 18.1, 24.0, 15.1, 15.4, 21.7, 18.2
a. Calculate the following numerical measures.
i. Mean
\[ \bar{x} = \frac{12.3 + 21.2 + 19.0 + 13.1 + 17.1 + 18.1 + 24.0 + 15.1 + 15.4 + 21.7 + 18.2}{11} = \frac{195.2}{11} = 17.7\dot{4}\dot{5} = \mathbf{17.7} \text{ mSv (1 dp)}. \]
ii. Median
If there are n data values in ascending order, the median is the $\frac{n+1}{2}$th data
value. In this case, n = 11, so the median is the 6th value.
In ascending order:
12.3, 13.1, 15.1, 15.4, 17.1, 18.1, 18.2, 19.0, 21.2, 21.7, 24.0
The median is the middle value, 18.1.
iii. Mode
Each data value appears once, so there is no mode. If the data were rounded to
the nearest whole number: (12, 13, 15, 15, 17, 18, 18, 19, 21, 22, 24) there
would be twin modes at 15 and 18. The mode is of limited use here.
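The three averages above can be checked with a short Python sketch. It relies only on the standard library `statistics` module; the variable names are illustrative and not part of the original notes, and rounding half up with `int(x + 0.5)` is an assumption about how the rounded values in the text were produced.

```python
from statistics import mean, median, multimode

# Radiation exposure readings (mSv) from the example above.
exposure = [12.3, 21.2, 19.0, 13.1, 17.1, 18.1, 24.0, 15.1, 15.4, 21.7, 18.2]

mean_exposure = mean(exposure)      # sum of the values divided by n
median_exposure = median(exposure)  # middle value of the sorted data (n is odd)

# multimode returns every value that shares the highest frequency.
# When all values are distinct it returns all of them, so no useful mode exists.
modes_raw = multimode(exposure)

# Assumed rounding rule: round half up to the nearest whole number.
rounded = sorted(int(x + 0.5) for x in exposure)
modes_rounded = multimode(rounded)

print(round(mean_exposure, 1))
print(median_exposure)
print(len(modes_raw) == len(exposure))
print(modes_rounded)
```

`multimode` requires Python 3.8 or later.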
iv. Interquartile range
If there are n data values in ascending order, the lower quartile is the $\frac{n+1}{4}$th
data value and the upper quartile is the $\frac{3(n+1)}{4}$th data value. Here, n = 11, so
the quartiles are the 3rd and 9th values.
In ascending order:
12.3, 13.1, 15.1, 15.4, 17.1, 18.1, 18.2, 19.0, 21.2, 21.7, 24.0
The upper and lower quartiles are 21.2 and 15.1, so the interquartile range is
21.2 – 15.1 = 6.1 mSv.
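The position rule for the quartiles can be sketched in Python as follows. The helper `value_at_position` is a hypothetical name introduced here; it assumes positions are either whole numbers or end in .5 (in which case the two neighbouring values are averaged) and does not handle other fractions.

```python
def value_at_position(sorted_values, position):
    """Return the value at a 1-based position in sorted data.

    Whole positions return a single value; a position ending in .5
    returns the average of the two neighbouring values.
    """
    lower = int(position)
    if position == lower:
        return sorted_values[lower - 1]
    if position - lower != 0.5:
        raise ValueError("only whole or .5 positions are handled in this sketch")
    return (sorted_values[lower - 1] + sorted_values[lower]) / 2


exposure = [12.3, 21.2, 19.0, 13.1, 17.1, 18.1, 24.0, 15.1, 15.4, 21.7, 18.2]
ordered = sorted(exposure)
n = len(ordered)

lower_quartile = value_at_position(ordered, (n + 1) / 4)      # 3rd value
upper_quartile = value_at_position(ordered, 3 * (n + 1) / 4)  # 9th value
iqr = upper_quartile - lower_quartile

print(lower_quartile, upper_quartile, round(iqr, 1))
```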
v. Range
The range is the difference between the highest data value (24.0) and the lowest
(12.3).
Range = 24.0 – 12.3 = 11.7 mSv.
vi. Variance
The variance is the average of the squared deviations from the mean:
\[ \sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2. \]
\[ \sum x^2 = 12.3^2 + 21.2^2 + 19.0^2 + 13.1^2 + 17.1^2 + 18.1^2 + 24.0^2 + 15.1^2 + 15.4^2 + 21.7^2 + 18.2^2 = 3596.66 \]
So
\[ \sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{3596.66}{11} - (17.7\dot{4}\dot{5})^2 = 326.9691 - 314.9012 = \mathbf{12.0679} \]
vii. Standard deviation
The standard deviation is the square root of the variance:
\[ \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{12.0679} = 3.4739 = \mathbf{3.5} \text{ mSv (1 dp)}. \]
b. Interpretation:
The safe level for one year’s exposure is 20.0 mSv. Explain whether the following
statement is correct, using the measures you have just calculated.
‘The scientists at CERN are working within the safe levels of radioactive
exposure.’
The statement is incorrect. The upper quartile of 21.2 mSv is above the safe
limit, so at least a quarter of the scientists exceeded it (three of the eleven
readings, 21.2, 21.7 and 24.0 mSv, are above 20.0 mSv). The safe limit should apply
to each scientist individually, rather than to the average scientist’s exposure.
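Because the limit applies to each individual, one direct check is to count the readings above it. This is a minimal sketch; `SAFE_LIMIT` is an illustrative name for the 20.0 mSv figure quoted in the question.

```python
SAFE_LIMIT = 20.0  # mSv per year, as stated in the question

exposure = [12.3, 21.2, 19.0, 13.1, 17.1, 18.1, 24.0, 15.1, 15.4, 21.7, 18.2]

# Readings strictly above the safe limit, in ascending order.
over_limit = sorted(x for x in exposure if x > SAFE_LIMIT)
fraction_over = len(over_limit) / len(exposure)

print(over_limit, round(fraction_over, 2))
```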
Worked examples:
2. Frequency distribution
A frequency distribution is a compact way of describing raw data when some of the
readings occur more than once. Instead of listing all the data values individually, a
frequency distribution lists each different value along with its frequency (the number of
times it occurs). Averages and measures of spread can be calculated by
reconstructing the raw data, but for the mean, variance and standard deviation it is
easier to adapt the formulas to take frequencies into account:
\[ \bar{x} = \frac{\sum fx}{n} \quad \text{where } n = \sum f, \]
\[ \sigma^2 = \frac{\sum fx^2}{n} - \bar{x}^2, \]
\[ \sigma = \sqrt{\frac{\sum fx^2}{n} - \bar{x}^2}. \]
These are best calculated from a table, as in the example following.
Example
Isabella went up and down the street to find out how many parking spaces each house
has. Here are her results:
Number of parking spaces    Frequency
1                           15
2                           27
3                           8
4                           3
a. Calculate the mean, variance and standard deviation.
This can be done by extending the table:
Number of parking spaces, x    Frequency, f    fx           fx²
1                              15              15           15
2                              27              54           108
3                              8               24           72
4                              3               12           48
Totals                         n = Σf = 53     Σfx = 105    Σfx² = 243
Mean:
\[ \bar{x} = \frac{\sum fx}{n} = \frac{105}{53} = 1.981132 = 1.98 \text{ (3 s.f.)}. \]
Variance:
\[ \sigma^2 = \frac{\sum fx^2}{n} - \bar{x}^2 = \frac{243}{53} - (1.981132)^2 = 4.584906 - 3.924884 = 0.660021. \]
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum fx^2}{n} - \bar{x}^2} = \sqrt{0.660021} = 0.812417 = 0.81. \]
The mean number of parking spaces is 2.0 and the standard deviation is 0.8
parking spaces.
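The extended-table method amounts to three column totals. A minimal Python sketch, assuming the frequency distribution is held in a dictionary mapping each value x to its frequency f (the name `parking` is illustrative):

```python
from math import sqrt

# Number of parking spaces x -> frequency f, from Isabella's survey.
parking = {1: 15, 2: 27, 3: 8, 4: 3}

n = sum(parking.values())                              # total frequency
sum_fx = sum(f * x for x, f in parking.items())        # total of the fx column
sum_fx2 = sum(f * x ** 2 for x, f in parking.items())  # total of the fx^2 column

mean_spaces = sum_fx / n
variance = sum_fx2 / n - mean_spaces ** 2
std_dev = sqrt(variance)

print(n, sum_fx, sum_fx2)
print(round(mean_spaces, 2), round(variance, 6), round(std_dev, 2))
```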
b. Calculate the median and interquartile range.
If there are n data values in ascending order, the median is the $\frac{n+1}{2}$th data value.
In this case, n = 53, so the median is the 27th value. To work this out, it is necessary
to imagine (but not write out) the raw data in ascending order. The raw data looks
like this:
\[ \overbrace{1, 1, 1, \ldots, 1, 1}^{15 \text{ times}},\ \overbrace{2, 2, 2, \ldots, 2, 2}^{27 \text{ times}},\ \overbrace{3, 3, 3, \ldots, 3, 3}^{8 \text{ times}},\ \overbrace{4, 4, 4}^{3 \text{ times}} \]
The 27th value is one of the 2s, so the median is 2.
The quartiles are the $\frac{n+1}{4}$th and $\frac{3(n+1)}{4}$th data values. Here, n = 53, so the
quartiles are the 13.5th and 40.5th values. The 13.5th value is the average of the
13th and 14th values, both of which are 1s (the 1s occupy positions 1 to 15). The
lower quartile is therefore 1. The 40.5th value is the average of the 40th and 41st
values, both of which are 2s (the 2s occupy positions 16 to 42). The upper quartile
is therefore 2.
The interquartile range is 2 – 1 = 1.
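Finding a positional value without writing out the raw data can be sketched with running totals of the frequencies: each running total is the last position occupied by that value. The helper names below are hypothetical, and the sketch assumes positions are whole numbers or end in .5.

```python
from itertools import accumulate

values = [1, 2, 3, 4]           # number of parking spaces
frequencies = [15, 27, 8, 3]    # houses with that many spaces

# Running totals: the last position each value occupies in the sorted data.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]


def value_at_rank(rank):
    """Value at a whole-number, 1-based rank of the implied sorted raw data."""
    for value, last_position in zip(values, cumulative):
        if rank <= last_position:
            return value
    raise ValueError("rank is larger than n")


def value_at_position(position):
    """Whole positions return one value; .5 positions average two neighbours."""
    lower = int(position)
    if position == lower:
        return value_at_rank(lower)
    if position - lower != 0.5:
        raise ValueError("only whole or .5 positions are handled in this sketch")
    return (value_at_rank(lower) + value_at_rank(lower + 1)) / 2


median = value_at_position((n + 1) / 2)
lower_quartile = value_at_position((n + 1) / 4)
upper_quartile = value_at_position(3 * (n + 1) / 4)

print(cumulative, median, lower_quartile, upper_quartile)
```

Printing `cumulative` shows which block of positions each value covers, which is the same reasoning as imagining the raw data in order.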
c. Calculate the mode and the range
The mode is the data value with the highest frequency. In this case the mode is 2.
The range is the difference between the highest and lowest data values. These are 4
and 1, so the range is 4 – 1 = 3.
Worked examples:
3. Grouped data
Sometimes data is grouped, either because the data is continuous or because there
are a lot of different data values. For example, we might group together all the values
between 0 and 10 in one group, those from 10 to 20 in another and so on. Grouped
data looks a bit like a frequency distribution but, instead of having the frequency for
each data value, we have the frequency for each group of data.
We describe the grouping using intervals: 0 ≤ x < 10, 10 ≤ x < 25, etc. The intervals
can be different sizes, but we need to be careful that they do not overlap or leave
gaps. For example: the intervals 0 ≤ x ≤ 10, 10 ≤ x < 25 overlap, since 10 could go
into either group; the intervals 0 ≤ x ≤ 9, 10 ≤ x ≤ 24 leave a gap, since 9.5 does not
fit into either group.
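One way to see why intervals of the form a ≤ x < b avoid both problems is to store only the shared boundaries and look each value up against them. The sketch below is illustrative: `boundaries`, `labels` and `group_of` are names introduced here, and `bisect_right` from the standard library is used to find the group.

```python
from bisect import bisect_right

# Shared boundaries for the groups 0 <= x < 10 and 10 <= x < 25.
boundaries = [0, 10, 25]
labels = ["0 <= x < 10", "10 <= x < 25"]


def group_of(x):
    """Return the label of the single group that contains x."""
    if not boundaries[0] <= x < boundaries[-1]:
        raise ValueError("value lies outside all groups")
    # bisect_right counts the boundaries that are <= x, so a value equal
    # to a boundary is placed in the group that starts at that boundary.
    return labels[bisect_right(boundaries, x) - 1]


print(group_of(9.5))   # falls in the first group, so there is no gap
print(group_of(10))    # falls only in the second group, so there is no overlap
```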
Once data has been grouped, we do not know the individual data values. This means
we can only estimate the averages and measures of spread. Grouped data is dealt
with in much the same way as a frequency distribution, but there is one extra step to
deal with: we have to decide what to use as the x-value of each group. So that all
mathematicians do this in the same way, we agree to use the midpoint of each
group. We do not know the data values, so we assume that they are all at the
midpoint.
The midpoint is calculated by averaging the group boundaries. For example, using
intervals 0 ≤ x < 10, 10 ≤ x < 25, the first midpoint is at $\frac{0+10}{2} = 5$, and the second
midpoint is at $\frac{10+25}{2} = 17.5$.
Example
A survey is conducted to look into the amount of money the average customer spends
at a supermarket checkout. This was done with a sample of 100 people. The
information was then grouped into the following intervals:
Amount spent (£)    Frequency
5 ≤ x < 25          10
25 ≤ x < 40         13
40 ≤ x < 70         12
70 ≤ x < 100        29
100 ≤ x < 150       23
150 ≤ x < 200       13
a. Estimate the mean and standard deviation of this data.
Note that the interval 5 ≤ x < 25 has a frequency of 10. This tells us that 10 people
spent between £5 and £24.99, since £25 would be included in the next group up.
The estimates can be made by extending the table:
Amount spent (£)    Midpoint, x    Frequency, f    fx              fx²
5 ≤ x < 25          15.0           10              150.0           2250.00
25 ≤ x < 40         32.5           13              422.5           13731.25
40 ≤ x < 70         55.0           12              660.0           36300.00
70 ≤ x < 100        85.0           29              2465.0          209525.00
100 ≤ x < 150       125.0          23              2875.0          359375.00
150 ≤ x < 200       175.0          13              2275.0          398125.00
Totals                             n = 100         Σfx = 8847.5    Σfx² = 1019306.25
Mean:
\[ \bar{x} = \frac{\sum fx}{n} = \frac{8847.5}{100} = 88.475 = \text{£}88.48 \]
Variance:
\[ \sigma^2 = \frac{\sum fx^2}{n} - \bar{x}^2 = \frac{1019306.25}{100} - (88.475)^2 = 10193.0625 - 7827.825625 = 2365.236875 \]
Standard deviation:
\[ \sigma = \sqrt{\frac{\sum fx^2}{n} - \bar{x}^2} = \sqrt{2365.236875} = 48.6337 = \text{£}48.63 \]
b. Estimate the median and interquartile range of this data.
The estimates can be made by working out the cumulative frequency table. This
indicates how many data values lie below each group boundary value:
Amount spent (£)    Endpoint, x    Cumulative frequency
                    5              0
5 ≤ x < 25          25             10
25 ≤ x < 40         40             23
40 ≤ x < 70         70             35
70 ≤ x < 100        100            64
100 ≤ x < 150       150            87
150 ≤ x < 200       200            100
A cumulative frequency graph has cumulative frequency on the y-axis and data
values on the x-axis. The lower quartile, median and upper quartile are the data
values on the x-axis corresponding to the cumulative frequencies of $\frac{n}{4}$, $\frac{n}{2}$ and $\frac{3n}{4}$
respectively on the y-axis.
In this case, where n = 100, we can read off the lower quartile, median and upper
quartile as the values on the x-axis corresponding to cumulative frequencies of 25,
50 and 75.
LQ = £45
Median = £86
UQ = £124
The median is £86 to the nearest pound.
The interquartile range is £124 – £45 = £79, to the nearest pound.
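Reading values off the graph can be imitated numerically by linear interpolation between the plotted points. This sketch assumes the points are joined by straight lines (a cumulative frequency polygon); a hand-drawn smooth curve may give slightly different readings. The function name `read_off` is hypothetical.

```python
# Plotted points of the cumulative frequency graph: (endpoint, cumulative frequency).
endpoints = [5, 25, 40, 70, 100, 150, 200]
cumulative = [0, 10, 23, 35, 64, 87, 100]


def read_off(target):
    """Data value at which the straight-line graph reaches the target cumulative frequency."""
    points = list(zip(endpoints, cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Move along the segment in proportion to how far target is from c0 to c1.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the range of the cumulative frequencies")


n = cumulative[-1]
lower_quartile = read_off(n / 4)
median = read_off(n / 2)
upper_quartile = read_off(3 * n / 4)

print(round(lower_quartile), round(median), round(upper_quartile))
print(round(upper_quartile) - round(lower_quartile))
```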