AS-Level Maths:
Statistics 1
for Edexcel
S1.2 Calculating means
and standard deviations
© Boardworks Ltd 2005
Contents
Means
Calculating means
Calculating standard deviations
Coding
Mean
The mean is the most widely used average in statistics. It is
found by adding up all the values in the data and dividing by
how many values there are.
Notation: If the data values are $x_1, x_2, x_3, \dots, x_n$, then the mean is
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x_i}{n}$$
Here $\bar{x}$ is the mean symbol, and the symbol $\sum x_i$ means the total of all the x values.
Note: The mean takes into account every piece of
data, so it is affected by outliers in the data. The
median is preferred over the mean if the data
contains outliers or is skewed.
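The effect of an outlier on the mean can be illustrated with a minimal sketch. The two data sets below are hypothetical (they are not from the slides) and differ only in their largest value; `statistics.fmean` and `statistics.median` are standard-library functions.

```python
import statistics

# Hypothetical data sets, invented for illustration only.
without_outlier = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]  # same data, but the largest value is an outlier

for data in (without_outlier, with_outlier):
    # The mean uses every value, so one extreme value drags it upwards;
    # the median only depends on the middle of the ordered data.
    print(data, "mean:", statistics.fmean(data), "median:", statistics.median(data))
```

In this sketch the median is the same for both lists, while the mean changes substantially.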
Mean
If data are presented in a frequency table:
Value        x_1   x_2   …   x_n
Frequency    f_1   f_2   …   f_n
then the mean is
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_n f_n}{\sum f_i} = \frac{\sum x_i f_i}{\sum f_i}$$
Mean
Example: The table shows the results of a survey into household size. Find the mean size.
To find the mean, we add a third column to the table.
Household size, x    Frequency, f    x × f
1                    20              20
2                    28              56
3                    25              75
4                    19              76
5                    16              80
6                    6               36
TOTAL                114             343
Mean = 343 ÷ 114 = 3.01 (to 3 s.f.)
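The household-size calculation can be checked with a short Python sketch. The dictionary layout and the function name `frequency_mean` are illustrative choices, not part of the original slides.

```python
# Household-size survey from the example: value x -> frequency f.
household_sizes = {1: 20, 2: 28, 3: 25, 4: 19, 5: 16, 6: 6}


def frequency_mean(table):
    """Return sum(x * f) / sum(f) for a {value: frequency} table."""
    total_f = sum(table.values())                      # the "Frequency" total
    total_xf = sum(x * f for x, f in table.items())    # the "x × f" total
    return total_xf / total_f


mean_size = frequency_mean(household_sizes)
print(round(mean_size, 2))  # 3.01
```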
Standard deviation
There are three commonly used measures of spread (or
dispersion) – the range, the inter-quartile range and the
standard deviation.
The standard deviation is widely used in statistics to measure
spread. It is based on all the values in the data, so it is
sensitive to the presence of outliers in the data.
The variance is related to the standard deviation:
variance = (standard deviation)^2
The following formulae can be used to find the variance and s.d.:
$$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n}$$
$$\text{s.d.} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
Standard deviation
Example: The mid-day temperatures (in °C) recorded for one week in June were: 21, 23, 24, 19, 19, 20, 21
First we find the mean:
$$\bar{x} = \frac{21 + 23 + \dots + 21}{7} = \frac{147}{7} = 21$$ °C
x_i      x_i − x̄     (x_i − x̄)^2
21       0           0
23       2           4
24       3           9
19       -2          4
19       -2          4
20       -1          1
21       0           0
Total:               22
$$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n}$$
So variance = 22 ÷ 7 = 3.143
So, s.d. = 1.77°C (3 s.f.)
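The temperature example can be reproduced with a minimal Python sketch of the definition formula. The function name `population_variance` is an illustrative choice. The sketch divides by n, as the slides do, rather than by n − 1.

```python
import math

temperatures = [21, 23, 24, 19, 19, 20, 21]  # mid-day temperatures in °C


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


variance = population_variance(temperatures)
sd = math.sqrt(variance)
print(round(variance, 3), round(sd, 2))  # 3.143 1.77
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev`, which also divide by n.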
Standard deviation
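There is an alternative formula which is usually a more convenient way to find the variance:
$$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n}$$
But,
$$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2 x_i \bar{x} + \bar{x}^2)$$
$$= \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2$$
$$= \sum x_i^2 - 2\bar{x} \times n\bar{x} + n\bar{x}^2$$
$$= \sum x_i^2 - n\bar{x}^2$$
Therefore,
$$\text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2 \quad \text{and} \quad \text{s.d.} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$$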
Standard deviation
Example (continued): Looking again at the temperature data for June: 21, 23, 24, 19, 19, 20, 21
We know that $\bar{x} = \frac{147}{7} = 21$ °C
Also, $\sum x_i^2 = 21^2 + 23^2 + \dots + 21^2 = 3109$
So,
$$\text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{3109}{7} - 21^2 = 3.143$$
s.d. = 1.77 °C
Note: Essentially the standard deviation is a measure of how close the values are to the mean value.
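As a check that the alternative formula agrees with the definition, the sketch below evaluates both for the temperature data. Exact fractions are used (an illustrative choice) so that rounding cannot hide a difference; the variable names are not from the slides.

```python
from fractions import Fraction

temperatures = [21, 23, 24, 19, 19, 20, 21]
n = len(temperatures)
mean = Fraction(sum(temperatures), n)

# Definition: mean of the squared deviations from the mean.
by_definition = sum((x - mean) ** 2 for x in temperatures) / n

# Alternative formula: (sum of squares) / n minus the square of the mean.
by_shortcut = Fraction(sum(x * x for x in temperatures), n) - mean ** 2

print(by_definition, by_shortcut)  # 22/7 22/7
```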
Calculating standard deviation from a table
When the data is presented in a frequency table, the formula for finding the standard deviation needs to be adjusted slightly:
$$\text{s.d.} = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2}$$
Example: A class of 20 students were asked how many times they exercise in a normal week. Find the mean and the standard deviation.
Number of times exercise taken    Frequency
0                                 5
1                                 3
2                                 5
3                                 4
4                                 2
5                                 1
Calculating standard deviation from a table
The table can be extended to help find the mean and the s.d.
No. of times exercise taken, x    Frequency, f    x × f    x^2 × f
0                                 5               0        0
1                                 3               3        3
2                                 5               10       20
3                                 4               12       36
4                                 2               8        32
5                                 1               5        25
TOTAL:                            20              38       116
$$\bar{x} = \frac{38}{20} = 1.9$$
$$\text{s.d.} = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2} = \sqrt{\frac{116}{20} - 1.9^2} = 1.48$$
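The extended-table method can be sketched in Python as follows. The helper name `frequency_mean_sd` and the dictionary representation of the table are assumptions made for illustration.

```python
import math

# Exercise survey from the example: times per week x -> frequency f.
exercise = {0: 5, 1: 3, 2: 5, 3: 4, 4: 2, 5: 1}


def frequency_mean_sd(table):
    """Return (mean, s.d.) for a {value: frequency} table, dividing by sum(f)."""
    total_f = sum(table.values())
    total_xf = sum(x * f for x, f in table.items())        # "x × f" column total
    total_x2f = sum(x * x * f for x, f in table.items())   # "x^2 × f" column total
    mean = total_xf / total_f
    variance = total_x2f / total_f - mean ** 2
    return mean, math.sqrt(variance)


mean, sd = frequency_mean_sd(exercise)
print(round(mean, 2), round(sd, 2))  # 1.9 1.48
```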
Calculating standard deviation from a table
If data is presented in a grouped frequency table, it is only
possible to estimate the mean and the standard deviation.
This is because the exact data values are not known.
An estimate is obtained by using the mid-point of an interval to
represent each of the values in that interval.
Example: The table shows the annual mileage for the employees of an insurance company. Estimate the mean and standard deviation.
Annual mileage, x          Frequency
0 ≤ x < 5000               6
5000 ≤ x < 10,000          17
10,000 ≤ x < 15,000        14
15,000 ≤ x < 20,000        5
20,000 ≤ x < 30,000        3
Calculating standard deviation from a table
Mileage            Frequency, f    Mid-point, x    f × x      f × x^2
0 – 5000           6               2500            15,000     37,500,000
5000 – 10,000      17              7500            127,500    956,250,000
10,000 – 15,000    14              12,500          175,000    2,187,500,000
15,000 – 20,000    5               17,500          87,500     1,531,250,000
20,000 – 30,000    3               25,000          75,000     1,875,000,000
TOTAL              45                              480,000    6,587,500,000
$$\bar{x} = \frac{480{,}000}{45} = 10{,}667 \text{ miles}$$
$$\text{s.d.} = \sqrt{\frac{6{,}587{,}500{,}000}{45} - \bar{x}^2} = 5711 \text{ miles}$$
Note: the value 5711 is obtained using the unrounded mean (10,666.67); substituting the rounded value 10,667 gives 5710.
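The mid-point estimate can be sketched in Python as below. The tuple layout `(lower, upper, frequency)` and the function name `grouped_estimates` are illustrative assumptions; the class boundaries and frequencies are those in the mileage table.

```python
import math

# (lower bound, upper bound, frequency) for each mileage class.
mileage_classes = [
    (0, 5000, 6),
    (5000, 10000, 17),
    (10000, 15000, 14),
    (15000, 20000, 5),
    (20000, 30000, 3),
]


def grouped_estimates(classes):
    """Estimate (mean, s.d.) by letting each class mid-point stand for its values."""
    total_f = total_fx = total_fx2 = 0
    for lower, upper, f in classes:
        mid = (lower + upper) / 2
        total_f += f
        total_fx += f * mid
        total_fx2 += f * mid ** 2
    mean = total_fx / total_f
    sd = math.sqrt(total_fx2 / total_f - mean ** 2)  # uses the unrounded mean
    return mean, sd


est_mean, est_sd = grouped_estimates(mileage_classes)
print(round(est_mean), round(est_sd))  # 10667 5711
```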
Notes about standard deviation
Here are some notes to consider about standard deviation.
For roughly symmetrical, bell-shaped distributions, about two-thirds of the data
(roughly 68% for a normal distribution) will lie within 1 standard deviation of the
mean, whilst nearly all the data values (roughly 95%) will lie within 2 standard
deviations of the mean.
Values that lie more than 2 standard deviations from the
mean are sometimes classed as outliers – any such
values should be treated carefully.
Standard deviation is measured in the same units as the
original data. Variance is measured in the same units
squared.
Most calculators have a built-in function which will find
the standard deviation for you. Learn how to use this
facility on your calculator.
Examination-style question
Examination-style question: The ages of the people in a cinema queue one Monday afternoon are shown in the stem-and-leaf diagram:
2 | 3 6
3 | 1 6 6
4 | 1 2 5 6 9
5 | 0 4 7
6 | 1
Key: 2 | 3 means 23 years old
a) Explain why the diagram suggests that the mean and standard deviation can be sensibly used as measures of location and spread respectively.
b) Calculate the mean and the standard deviation of the ages.
c) The mean and the standard deviation of the ages of the people in the queue on Monday evening were 29 and 6.2 respectively. Compare the ages of the people queuing at the cinema in the afternoon with those in the evening.
Examination-style question
2 | 3 6
3 | 1 6 6
4 | 1 2 5 6 9
5 | 0 4 7
6 | 1
Key: 2 | 3 means 23 years old
a) The mean and the standard deviation are appropriate, as the distribution of ages is roughly symmetrical and there are no outliers.
b) $\sum x_i = 597$ so, $\bar{x} = \frac{597}{14} = 42.64286 = 42.6$
$\sum x_i^2 = 27{,}131$ so, $\text{s.d.} = \sqrt{\frac{27{,}131}{14} - 42.64286^2} = 10.9$
c) The cinemagoers in the evening had a smaller mean age, meaning that they were, on average, younger than those in the afternoon.
The standard deviation for the ages in the evening was also smaller, suggesting that the evening audience were closer together in age.
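The totals in part b) can be verified with a short Python sketch that expands the stem-and-leaf diagram into a list of ages. Representing the diagram as a dictionary of stems to leaves is an illustrative choice.

```python
import math

# Stem-and-leaf diagram from the question: stem (tens) -> leaves (units).
stem_and_leaf = {2: [3, 6], 3: [1, 6, 6], 4: [1, 2, 5, 6, 9], 5: [0, 4, 7], 6: [1]}

# Rebuild each age as 10 * stem + leaf, e.g. 2 | 3 means 23.
ages = [10 * stem + leaf for stem, leaves in stem_and_leaf.items() for leaf in leaves]

n = len(ages)
sum_x = sum(ages)
sum_x2 = sum(a * a for a in ages)
mean_age = sum_x / n
sd_age = math.sqrt(sum_x2 / n - mean_age ** 2)

print(n, sum_x, sum_x2)                      # 14 597 27131
print(round(mean_age, 1), round(sd_age, 1))  # 42.6 10.9
```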
Combining sets of data
Sometimes in examination questions you are asked to pool
two sets of data together.
Example: Six male and five female students sit an
A-level examination.
The mean marks were 52% and 57% for the males
and females respectively. The standard deviations
were 14 and 18 respectively.
Find the combined mean and the standard deviation
for the marks of all 11 students.
Combining sets of data
Let $x_1, \dots, x_6$ be the marks for the 6 male students.
Let $y_1, \dots, y_5$ be the marks of the 5 female students.
To find the overall mean, we first need to find the total marks for all 11 students.
As $\bar{x} = 52$: $\sum x = 6 \times 52 = 312$
As $\bar{y} = 57$: $\sum y = 5 \times 57 = 285$
Therefore $\sum x + \sum y = 312 + 285 = 597$
So the combined mean is:
$$\frac{597}{11} = 54.2727\ldots = 54.3$$ %
Combining sets of data
To find the overall standard deviation, we need to find the total of the marks squared for all 11 students.
Notice that the formula $\text{s.d.} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$ rearranges to give
$$\sum x_i^2 = n \times (\text{s.d.}^2 + \bar{x}^2)$$
As $\text{s.d.}_x = 14$: $\sum x^2 = 6 \times (14^2 + 52^2) = 17{,}400$
As $\text{s.d.}_y = 18$: $\sum y^2 = 5 \times (18^2 + 57^2) = 17{,}865$
Therefore, $\sum x^2 + \sum y^2 = 35{,}265$
So the combined s.d. is:
$$\sqrt{\frac{35{,}265}{11} - 54.27^2} = 16.1$$ % (to 3 s.f.)
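The pooling method can be sketched in Python: recover each group's total and total of squares from its size, mean and s.d., add them, then apply the usual formulae. The function names and the `(n, mean, sd)` tuple layout are illustrative assumptions.

```python
import math


def sum_and_sum_of_squares(n, mean, sd):
    """Recover (sum of x, sum of x^2) from a group's size, mean and s.d."""
    return n * mean, n * (sd ** 2 + mean ** 2)


def combine(groups):
    """Combine (n, mean, sd) summaries into an overall (mean, s.d.)."""
    total_n = total = total_sq = 0
    for n, mean, sd in groups:
        s, sq = sum_and_sum_of_squares(n, mean, sd)
        total_n += n
        total += s
        total_sq += sq
    overall_mean = total / total_n
    overall_sd = math.sqrt(total_sq / total_n - overall_mean ** 2)
    return overall_mean, overall_sd


# 6 males (mean 52, s.d. 14) and 5 females (mean 57, s.d. 18).
combined_mean, combined_sd = combine([(6, 52, 14), (5, 57, 18)])
print(round(combined_mean, 1), round(combined_sd, 1))  # 54.3 16.1
```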
Coding
Coding is a technique that can simplify the numerical effort
required in finding a mean or standard deviation.
Coding
Adding
So, if a number b is added to each piece of data, the
mean value is also increased by b.
The standard deviation is unchanged.
Multiplying
If each piece of data is multiplied by a, the mean value
is multiplied by a.
The standard deviation is also multiplied by a.
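More formally, if $y_i = a x_i + b$ then:
$$\bar{y} = a\bar{x} + b$$
$$\text{s.d.}_y = |a| \times \text{s.d.}_x$$
For positive a this is simply a × s.d. of x; the absolute value is needed when a is negative, because a standard deviation cannot be negative.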
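The coding rules can be checked numerically with a minimal sketch. The constants `a` and `b` below are arbitrary illustrative values (a negative `a` is chosen deliberately), and the data are the June temperatures used earlier in the slides. `statistics.fmean` and `statistics.pstdev` are standard-library functions; `pstdev` divides by n.

```python
import math
import statistics

data = [21, 23, 24, 19, 19, 20, 21]
a, b = -2, 5  # arbitrary illustrative coding constants

coded = [a * x + b for x in data]  # y_i = a * x_i + b

mean_x, sd_x = statistics.fmean(data), statistics.pstdev(data)
mean_y, sd_y = statistics.fmean(coded), statistics.pstdev(coded)

# The mean follows the coding exactly; the s.d. ignores b and scales by |a|.
print(mean_y, a * mean_x + b)
print(sd_y, abs(a) * sd_x)
print(math.isclose(mean_y, a * mean_x + b), math.isclose(sd_y, abs(a) * sd_x))
```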
Coding
Example: Find the mean and the standard deviation of the values in the table. Use the transformation below to help you.
$$y = \frac{1}{10}x - 5$$
Using the given transformation, add a y column to the table.
x     Frequency    y
50    3            0
60    5            1
70    7            2
80    4            3
90    1            4
Coding
y        Frequency, f    y × f    y^2 × f
0        3               0        0
1        5               5        5
2        7               14       28
3        4               12       36
4        1               4        16
Total    20              35       85
To find the mean:
$$\bar{y} = \frac{35}{20} = 1.75$$
To find the s.d.:
$$\text{s.d.} = \sqrt{\frac{\sum f_i y_i^2}{\sum f_i} - \bar{y}^2} = \sqrt{\frac{85}{20} - 1.75^2} = 1.09$$
Coding
You have now found the mean and standard deviation of y.
To find them for the x values, you must reverse the coding.
We can rearrange $y = \frac{1}{10}x - 5$ to get $x = 10y + 50$.
Therefore the mean of x is: $\bar{x} = 10\bar{y} + 50 = 10 \times 1.75 + 50 = 67.5$
And the standard deviation of x is: 10 × 1.09 = 10.9
Note how the coding helped to simplify the calculations by making the numbers smaller.
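The whole coding example can be sketched in Python, comparing the decoded results with a direct calculation on the original x values. The helper name `mean_sd` and the dictionary layout are illustrative assumptions.

```python
import math

# Frequency table from the example: x -> frequency.
x_table = {50: 3, 60: 5, 70: 7, 80: 4, 90: 1}


def mean_sd(table):
    """Return (mean, s.d.) for a {value: frequency} table, dividing by sum(f)."""
    total_f = sum(table.values())
    mean = sum(v * f for v, f in table.items()) / total_f
    mean_sq = sum(v * v * f for v, f in table.items()) / total_f
    return mean, math.sqrt(mean_sq - mean ** 2)


# Code the data with y = x/10 - 5 (every x here is a multiple of 10).
y_table = {x // 10 - 5: f for x, f in x_table.items()}
mean_y, sd_y = mean_sd(y_table)

# Reverse the coding: x = 10y + 50, so the s.d. is multiplied by 10 only.
mean_x, sd_x = 10 * mean_y + 50, 10 * sd_y

direct_mean, direct_sd = mean_sd(x_table)  # same quantities without coding
print(round(mean_y, 2), round(sd_y, 2))  # 1.75 1.09
print(round(mean_x, 1), round(sd_x, 1))  # 67.5 10.9
```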