Chapter 6
Outline solutions
In all these solutions, don’t be put off if you have minor disagreements with my
answers – this can be due to rounding off of figures within calculations, and is nothing
to get excited about. You should be able to tell whether your answers are near enough
to be correct, or whether they are sufficiently different to suggest that you’ve actually
made a mistake. Definite signs that something has gone wrong include a negative number
under the square root sign in a standard deviation calculation, or a mean which is outside
the range of the original data.
1 (a) 1.5, 2, 2, 3, 3.5, 4, 4, 4, 6, 7
It’s easiest to arrange the data as a table, and to use the formulae for ungrouped data:
x        x^2
1.5      2.25
2        4
2        4
3        9
3.5      12.25
4        16
4        16
4        16
6        36
7        49
Total
37       164.5

Mean, $\bar{x} = \frac{\sum x}{n} = \frac{37}{10} = 3.7$ minutes

Standard deviation $= \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{164.5}{10} - 3.7^2} = 1.66$ minutes
Median will be the (10 + 1)/2 = 5.5th value – that’s halfway between 3.5 and 4, which
is 3.75 minutes.
Quartiles are at the 11/4 = 2.75th and the 3 x 11/4 = 8.25th position, giving a value ¾
of the way between 2 and 2 – that’s 2 minutes – and one ¼ of the way between 4 and
6 – that’s 4.5 minutes.
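The arithmetic for question 1 can be checked with a short Python sketch. It assumes the same conventions as the solution: the standard deviation divides by n, and the median and quartiles sit at the (n + 1)/2, (n + 1)/4 and 3(n + 1)/4 positions, with linear interpolation between neighbouring values. The helper name `value_at_position` is made up for this illustration.

```python
import math

# Data from question 1(a), in minutes
data = [1.5, 2, 2, 3, 3.5, 4, 4, 4, 6, 7]
n = len(data)

mean = sum(data) / n
# Formula used in the solution: sqrt(sum of x^2 / n - mean^2)
sd = math.sqrt(sum(x * x for x in data) / n - mean ** 2)


def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = math.floor(position)
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    fraction = position - lower
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


ordered = sorted(data)
median = value_at_position(ordered, (n + 1) / 2)
q1 = value_at_position(ordered, (n + 1) / 4)
q3 = value_at_position(ordered, 3 * (n + 1) / 4)

print("mean:", round(mean, 2), "sd:", round(sd, 2))
print("median:", median, "Q1:", q1, "Q3:", q3)
```

Other software may use a different quartile convention, so small differences from these figures are not necessarily mistakes.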
2. Here we need to set up the following table – we’ve had to make a decision about
closing the top class, so I’ve assumed that no call lasts more than 20 minutes. If
you’ve made a different assumption you will have different answers:
Duration (mins)    No. of calls (f)    x       fx       fx^2
less than 1               6            0.5      3          1.5
1 but under 2            19            1.5     28.5       42.75
2 but under 3            32            2.5     80        200
3 but under 5            65            4      260       1040
5 but under 10           36            7.5    270       2025
10 or more               11           15      165       2475
Total                   169                   806.5     5784.25

and to use the formulae for grouped data.
Mean $= \frac{\sum fx}{\sum f} = \frac{806.5}{169} = 4.77$ minutes.

Standard deviation $= \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{5784.25}{169} - 4.77^2} = 3.38$ minutes.
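As a check on the grouped-data figures, here is a minimal Python sketch. It assumes the class midpoints used in the solution, including the decision to close the top class at 20 minutes (midpoint 15); the variable names are illustrative only.

```python
import math

# (class midpoint x, frequency f) for question 2.
# Assumption from the solution: "10 or more" is treated as 10 but under 20.
classes = [(0.5, 6), (1.5, 19), (2.5, 32), (4.0, 65), (7.5, 36), (15.0, 11)]

total_f = sum(f for _, f in classes)
sum_fx = sum(f * x for x, f in classes)
sum_fx2 = sum(f * x * x for x, f in classes)

grouped_mean = sum_fx / total_f
grouped_sd = math.sqrt(sum_fx2 / total_f - grouped_mean ** 2)

print("totals:", total_f, sum_fx, sum_fx2)
print("mean:", round(grouped_mean, 2), "sd:", round(grouped_sd, 2))
```

Changing the assumed upper limit of the top class changes its midpoint, and with it both the mean and the standard deviation.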
For the median and quartiles we need the cumulative table:
Duration (mins)    Cumulative frequency
less than 1               6
less than 2              25
less than 3              57
less than 5             122
less than 10            158
less than 20            169
Median = (169 + 1)/2 = 85th value (it would be OK here to use the 169/2 = 84.5th
value since the sample is large – the discrepancy in the answers will be small).
The 85th value comes in the class 3 but less than 5. The proportion calculation as
shown in the book gives:
$\text{median} = 3 + \frac{85 - 57}{122 - 57} \times 2 = 3 + \frac{28}{65} \times 2 = 3.86$ minutes.
Quartile 1 is the 170/4 = 42.5th value, lying in the class 2 but less than 3, so we get
$Q_1 = 2 + \frac{42.5 - 25}{57 - 25} \times 1 = 2.55$ minutes,
and in the same way Q3 = 127.5th value = 5.76 minutes.
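The proportion calculation for the median and quartiles can be sketched in Python as follows. This assumes the cumulative frequencies given for question 2, with a starting boundary of 0 minutes added for the first class; the function name `interpolate` is invented for this example.

```python
# Class boundaries and the cumulative frequency below each boundary (question 2).
# Assumption: durations start at 0 and the top class is closed at 20 minutes.
boundaries = [0, 1, 2, 3, 5, 10, 20]
cumulative = [0, 6, 25, 57, 122, 158, 169]


def interpolate(position):
    """Estimate the value at a given position within the grouped data."""
    for i in range(1, len(cumulative)):
        if position <= cumulative[i]:
            lower, upper = boundaries[i - 1], boundaries[i]
            below = cumulative[i - 1]            # values before this class
            in_class = cumulative[i] - below     # frequency of this class
            return lower + (position - below) / in_class * (upper - lower)
    raise ValueError("position is beyond the last cumulative frequency")


n = cumulative[-1]
median = interpolate((n + 1) / 2)      # 85th value
q1 = interpolate((n + 1) / 4)          # 42.5th value
q3 = interpolate(3 * (n + 1) / 4)      # 127.5th value

print(round(median, 2), round(q1, 2), round(q3, 2))
```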
The mean is bigger than the median here, and Q3 is further from the median than Q1,
because the distribution has a long tail at the upper end, and those high values ‘pull’
the mean and Q3 upwards. So the median probably gives a better picture of the
‘typical’ length of a phone call.
3. Use the same formulae and table layout as above, except that we don’t need to use
a class midpoint for x here, since the values (marks) are discrete.
You should find that mean = 1192/189 = 6.31 marks, standard deviation = 1.79
marks, median = 6 marks, Q1 = 5 marks and Q3 = 8 marks (there is no need to use the
interpolation formula to calculate median and quartiles in this case, since the marks
are discrete).
4. Working out the class mid-point is always tricky with age data. The point is that
we go on quoting our age as, say, 70 until the day before our 71st birthday. Thus the
class 61-70 actually includes everyone up to 70 years and 364 days – so it’s really ‘61
but less than 71’. Its mid-point is thus 66. The same applies for all the other classes –
for example, 71-75 is really ‘71 but under 76’, thus having mid-point 73.5.
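A small Python sketch of the mid-point rule for ages: each quoted class "low-high" is treated as running from low up to, but not including, high + 1. The list name `quoted_classes` is illustrative only.

```python
# Age classes as quoted in the question (inclusive whole-year labels)
quoted_classes = [(61, 70), (71, 75), (76, 80), (81, 85), (86, 90), (91, 95), (96, 100)]

# True upper limit is high + 1, because someone aged "70" may be almost 71
midpoints = [(low + (high + 1)) / 2 for low, high in quoted_classes]

print(midpoints)
```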
We then have the table below:
Age range    No of residents (f)    x        fx        fx^2
61-70                3              66       198       13068
71-75               12              73.5     882       64827
76-80               16              78.5    1256       98596
81-85               27              83.5    2254.5    188250.8
86-90               18              88.5    1593      140980.5
91-95                7              93.5     654.5     61195.75
96-100               2              98.5     197       19404.5
Total               85                      7035      586322.5
so mean = 7035/85 = 82.76 years, standard deviation = 6.92 years.
The cumulative frequency table is:
Age range    Cumulative frequency
under 71             3
under 76            15
under 81            31
under 86            58
under 91            76
under 96            83
under 101           85
(again, the limits are 71, 76 etc not 70, 75 etc.)
So median = 43rd value = 81 + 12/27 x 5 years = 83.22 years.
Similarly Q1 = 78.03 years, and Q3 = 87.81 years.
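All five figures for question 4 can be checked together with one short Python sketch. It assumes the true class limits described earlier (61 to 71, 71 to 76, and so on) and the (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 positions; the function name `age_at_position` is made up for this illustration.

```python
import math

# (lower limit, true upper limit, frequency) for each age class in question 4
age_classes = [(61, 71, 3), (71, 76, 12), (76, 81, 16), (81, 86, 27),
               (86, 91, 18), (91, 96, 7), (96, 101, 2)]

n = sum(f for _, _, f in age_classes)
mean_age = sum(f * (lo + hi) / 2 for lo, hi, f in age_classes) / n
mean_square = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in age_classes) / n
sd_age = math.sqrt(mean_square - mean_age ** 2)


def age_at_position(position):
    """Interpolate within the class that contains the given position."""
    running = 0  # cumulative frequency before the current class
    for lo, hi, f in age_classes:
        if position <= running + f:
            return lo + (position - running) / f * (hi - lo)
        running += f
    raise ValueError("position exceeds the number of residents")


median_age = age_at_position((n + 1) / 2)      # 43rd value
q1_age = age_at_position((n + 1) / 4)          # 21.5th value
q3_age = age_at_position(3 * (n + 1) / 4)      # 64.5th value

print(round(mean_age, 2), round(sd_age, 2))
print(round(median_age, 2), round(q1_age, 2), round(q3_age, 2))
```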
Here the mean is slightly pulled downwards by the small proportion of younger
people in the sample.