Download Topic 3: Measures of Central Tendency

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Time series wikipedia , lookup

World Values Survey wikipedia , lookup

Transcript
Topic 3: Measures of Central Tendency
With any set of numerical data, there is always a temptation to summarise the data to a single
value that attempts to be a typical value for the data. There are three commonly used measures to
describe the 'central' or 'typical' value of the data, the Mean, Median and Mode.
Mean
The mean is commonly referred to as the average.
The mean is the most common measure used. It is found by adding up all the values and dividing
by the number of values. Because of this, every value contributes to the mean.
For a set of values written as X1, X2, X3, X4, X5,………. Xn, the sample mean is calculated using the
equation:
=
X
sum of the values
=
number of values in the sample
∑X
n
where X is the mean of the n values in the sample.
The symbol Σ is shorthand for "the sum of".
The equation for the population mean uses some different pronumerals.
=
m
sum of the values
=
number of values in the population
∑X
N
where μ is the mean of the N values in the population.
Because the calculation of the sample and population means is the same (except for the
pronumerals used in the equation), calculators have just one key to cover both.
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 1
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
Finding the mean of Individual Values
This data is the test results of a sample 15 students on a maths test (in order).
23, 45, 50, 54, 55, 57, 59, 59, 59, 61, 63, 75, 75, 81, 90.
X=
∑X
n
23 + 45 + 50 + ............. + 75 + 81 + 90
X=
15
X = 60.4
The mean can easily be calculated on a calculator, especially a scientific calculator with STATS
mode (covered later).
Finding the mean from a Frequency Distribution Table
Let’s recall the data about the number of matches in 50 boxes which was organised in a FDT with
no grouping. The table has a new column added headed f x x. This column is the score multiplied
by the frequency.
f ×X
Number of
Matches (X)
Frequency (f)
46
2
2 x 46 = 92
47
3
3 x 47 = 141
48
5
240
49
6
294
50
16
800
51
6
306
52
4
208
53
3
159
54
3
162
55
2
110
Total
50
Σf =
ΣfX =
2512
92 represents the total of all
the 46s in the data.
141 represents the total of
all the 47s in the data.
2512 is the total of all the
values in the data.
The mean number of matches per box is given by the equation:
sum of the values
number of values in the sample
ΣfX
X=
Σf
2512
X=
50
X = 50.24
X=
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 2
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
The mean number per box is 50.24 matches.
Now let’s look at the example about the money spent by children at the show. This was organised
using a Grouped FDT.
Finding the mean is performed in a similar manner to above except the Group Midpoint is used to
represent the group.
Group
Midpoint
(X)
f ×X
Amount Spent$
Frequency
(f)
$0 but less than $10
2
5
2 x 5 = 10
$10 but less than $20
3
15
3 x 15 = 45
$20 but less than $30
5
25
125
$30 but less than $40
4
35
140
$40 but less than $50
2
45
90
$50 but less than $60
8
55
440
$60 but less than $70
14
65
910
$70 but less than $80
18
75
1350
$80 but less than $90
12
85
1020
$90 but less than $100
2
95
190
Total
70
Σf =
ΣfX =
4320
The mean amount spent is given by the equation:
sum of the values
number of values in the sample
ΣfX
X=
Σf
4320
X=
70
X = $61.71 (to the nearest cent)
X=
Remember this method assumes that the values in each group are evenly spread. This
assumption is not always true so the figure obtained for the mean using this method can slightly
different to the mean if the original values where used.
Finding the mean using a Calculator
A scientific calculator is usually capable of performing this operation when STATS mode is used.
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 3
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
The instructions below apply to the Casio fx-82AU and the Sharp EL531 only. If you have a
different scientific calculator to this, you will need to read the user's guide for your calculator.
There are 3 steps to using a scientific calculator in STATS mode:
1. Getting the calculator into STATS mode.
2. Entering the values (data).
3. Obtaining the statistical measure required.
On a Casio fx-82AU:
On a Sharp EL531
To get this calculator into STATS mode
To get this calculator into STATS mode
Press
Mode . The display will be:
Press
Mode .
Press
1 for STAT
Press
0 for Standard Deviation (+ more)
The calculator should display Stat 0 in the top left
of the display.
Press
2 for STAT
Press
1 for 1-VAR
The calculator should display a small STAT on
the top line of the display.
Now your calculator is in STATs mode, the next stage is to enter the data. There are slight
differences depending on how the data is organised.
Finding the Mean of Individual Values
This data is the test results of a sample 15 students on a maths test (in order).
23, 45, 50, 54, 55, 57, 59, 59, 59, 61, 63, 75, 75, 81, 90.
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 4
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
On a Casio fx-82AU:
On a Sharp EL531
Each number is entered into the list, 23 =, 45 =:
Each number from the list is entered as:
23 then press
M+
the calculator display will show
45 then press
If you make a mistake, use the big round blue
‘Replay’ button to navigate back to the position of
the incorrect value and enter the correct value.
1
M+
the calculator display will show
.
.
.
.
.
90 then press
DATA SET =
DATA SET=
2
DATA SET=
15
M+
the calculator display will show
DATA SET=
The display
15 informs you that
15 numbers have been entered into the STATs
memory.
If you make a mistake and need to start again,
the STATs memory is cleared by pressing
2ndF CA
Now the data is entered, the calculated value of the sample and population standard deviations
can be obtained.
On a Casio fx-82AU:
On a Sharp EL531
When all the values have been entered, press
AC This is important.
To obtain the value of the population standard
deviation; press
Now press SHIFT 1 . This is the same as
population standard deviation.
SHIFT STAT .
The calculator display will be as shown below.
RCL σ x the symbol σ is used to represent the
To obtain the value of the sample standard
deviation; press
RCL sx the symbol s is used to represent the
sample standard deviation.
(Think of RCL as recall)
Other values are found by;
To obtain the value of the mean; press
RCL x the symbol x means the mean.
To obtain the number of values entered; press
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 5
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
RCL n the symbol n means the number of
numbers entered.
To obtain the sum of the values entered; press
RCL Σx the symbol Σx means the sum of the
numbers entered.
To obtain the sum of the squares of the values
entered; press
RCL Σx 2 the symbol Σx 2 means the sum of the
squares of the numbers entered.
Now press 4 to get more options.
Press 2 = – for the mean
Press 3 = – for the population standard dev.
Press 4 = – for the sample standard dev.
Always press = to get the value required.
Do the calculation yourself, the values obtained should be:
Mean = 60.4
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 6
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
Finding the Mean from a Frequency Distribution Table
Let’s recall the data about the number of matches in 50 boxes which was organised in a FDT with
no grouping.
Number of
Matches (X)
Frequency (f)
46
2
47
3
48
5
49
6
50
16
51
6
52
4
53
3
54
3
55
2
Total
Σf =
50
It is worth recalling that this table is informing us that there are 2 occurrences of the value 46, 3
occurrences of 47, 5 occurrences of 48, ….
Instead of entering the value of 50, 16 times a modified entry process is used. The process for
finding the standard deviation is exactly the same.
Remember to clear STATs memory first.
With a Frequency Distribution Table, the data is entered into the Sharp EL531 calculator in the
format: score, frequency .
On the Casio fx-82AU, the table the values are entered into needs to be changed into a frequency
like the example above. To do this:
Press SHIFT MODE . Eight options will be displayed.
Press the big blue key down
.
This will take you to another screen with 5
options.
Press 3: STAT The screen will then display
Press 1: ON This will add a frequency column
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 7
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
On a Casio fx-82AU:
On a Sharp EL531
Get the calculator into STATS mode:
Each row from the table is entered as:
Press
Mode . The display will be:
46 , 2 M +
the calculator display will show
DATA SET =
1
47 , 3 M +
the calculator display will show
DATA SET=
2
DATA SET=
Press
2 for STAT
The display
2 informs you that 2
rows have been entered into the STATs memory.
48 , 5 M +
Press
1 for 1-VAR
This time the calculator will display an X column
and a FREQ column. Enter a value followed by =.
Navigate to the frequency column using the big
blue REPLAY key.
the calculator display will show
.
.
.
.
.
.
.
DATA SET=
3
55 , 2 M + the calculator display will show
DATA SET=
15
If you make a mistake and need to start again,
the STATs memory is cleared by pressing
2ndF CA
Obtaining values for the mean is exactly the same process as in the previous section.
Do the calculation yourself, the values obtained should be:
Mean = 50.24
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 8
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
If the data is grouped, the process is almost identical except that the group midpoint is used.
Amount Spent$
Group
Midpoint
(X)
Frequency
(f)
$0 but less than $10
5
2
$10 but less than $20
15
3
$20 but less than $30
25
5
$30 but less than $40
35
4
$40 but less than $50
45
2
$50 but less than $60
55
8
$60 but less than $70
65
14
$70 but less than $80
75
18
$80 but less than $90
85
12
$90 but less than $100
95
2
Total
Σf =
70
The data is entered into the calculator as Group Midpoint, Frequency or Group Midpoint;
Frequency depending on the calculator used.
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 9
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
Median
The median is the middle value of the data if the values are arranged in order.
As a measure of centre, it is a value based on its position; it is not influenced by the size of the
values that are above or below it.
Therefore it is quite different to the mean because it is based on position and for the mean, every
value contributes.
Finding the median of Individual Values
This data is the test results of a sample 15 students on a maths test (in order).
23, 45, 50, 54, 55, 57, 59, 59, 59, 61, 63, 75, 75, 81, 90.
When finding the median for a set of individual values, the first step is to order the values. Usually
this is done in ascending order, but descending order gives the same result.
The position of the median can always be found by using a simple expression:
The position of the median is
n +1
.
2
As there are 15 values, the position of the median is:
n + 1 15 + 1 16
=
= = 8
2
2
2
th
The 8 value is the median.
The median is the 8th value, which is 59. The median is the 8th value, so there are 7 values before
it and 7 values after it.
23, 45, 50, 54, 55, 57, 59
59
59, 61, 63, 75, 75, 81, 90
7 values
median
7 values
= 15 values
The median cannot be calculated on a scientific calculator.
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 10
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
Finding the median from a Frequency Distribution Table
Let’s recall the data about the number of matches in 50 boxes which was organised in a FDT with
no grouping.
To find the median, the cumulative frequency column is required.
Number of
Matches (X)
Frequency (f)
Cumulative
Frequency
46
2
2
47
3
5
48
5
10
49
6
16
50
16
32
51
6
38
52
4
42
53
3
45
54
3
48
55
2
50
Total
Σf =
50
th
The 17 through to the
nd
32 values are in this
group.
th
As there are 50 values, the position of the median is:
This means that the 25
th
and 26 values are both
50.
n + 1 50 + 1 51
=
= = 25.5
2
2
2
In this case the 25th and 26th values are required. The median is the average of the 25th and 26th
values. As the 25th and 26th values are both 50, the median is 50.
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 11
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
Now let’s look at the example using a Grouped FDT. Finding the median can be performed in one
of two ways: interpolation method or graphical method.
(a) Interpolation Method (Amount Spent at a Show Data)
Amount Spent$
Frequency
(f)
Group
Midpoint
(X)
Cumulative
Frequency
$0 but less than $10
2
5
2
$10 but less than $20
3
15
5
$20 but less than $30
5
25
10
$30 but less than $40
4
35
14
$40 but less than $50
2
45
16
$50 but less than $60
8
55
24
$60 but less than $70
14
65
38
$70 but less than $80
18
75
56
95
70
The group width
$80 but =70-60
less than $90
=10
12
$90 but less than $100
2
Total
Σf =
70
This means that the
th
24 score is $60
This means that the
th
38 score is $70
There are 14 values
in this 85
group.
68
As there are 70 values, the position of the median is:
n + 1 70 + 1 71
=
= = 35.5
2
2
2
In this case the 35th and 36th values are required. Both of these values are located in the group
‘$60 but less than $70’. This means that the median is greater than 60 but less than 70. This
means that the median is (35.5 – 24 =) 11.5/14 of the way through the ‘$60 but less than $70’ group.
The median is:
( 35.5 − 24 ) ×10 (Group width)
Median =
60 +
14
Median = 68.2
Or expressed more generally:
 n +1

 -cf m -1
2 
× Group width
Median =
Lm + 
fm
where Lm is the lower limit of the median group
f m is the frequency of the median group
cf m -1 is the cumulative frequency of the previous group
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 12
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
(b) Graphical method (using the Ogive)
There are 70 values in the table. This is shown on the vertical axis (Cumulative Frequency).
The first step is to come half way up the vertical axis.
This is to 35.
The second step; from this position, move horizontally across the graph until the cumulative
frequency Ogive is found.
The third step is to move down (vertically) to the horizontal axis and read the value.
Ogive of Money Spent
80
70
Cumulative Frequency
60
50
40
30
20
10
0
0
20
40
60
80
100
120
Amount Spent $
From the graph, the median is approximately 68.
Remember this method assumes that the values in each group are evenly spread. This
assumption is not always true, so the figure obtained for the median using this method can slightly
different to the actual median if the original values where used.
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 13
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
Mode
The Mode is the value that occurs the most often.
The mode may not exist because values only occur once.
The mode could even be quite different to the mean and median, it may be much higher or much
lower depending on the meaning of the data.
There may also be more than one mode. If there are two modes, the data is said to be ‘bimodal’.
Finding the mode from Graphs
Number of Children in 40 households
16
14
Frequency
12
10
8
6
4
2
0
0
1
2
3
4
5
Number of Children
From the graph, it can be observed that '1 child per household' has the highest frequency. This is
the mode of these values.
Finding the mode of Individual Values
This data is the test results of a sample 15 students on a maths test (in order).
23, 45, 50, 54, 55, 57, 59, 59, 59, 61, 63, 75, 75, 81, 90.
When inspecting this data it can be seen that the 59 occurs 3 times.
This is the value that occurs the most frequently, that is, it has the highest frequency.
The mode for this data is 59.
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 14
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
Finding the mode from a Frequency Distribution Table
Let’s recall the data about the number of matches in 50 boxes which was organised in a FDT with
no grouping.
To find the mode, the frequency column is inspected. The highest frequency of 16 is related to a
box containing 50 matches. The mode of this data is 50 matches. Take care to give the mode as
the score (50 matches) not the frequency (16).
Number of
Matches (X)
Frequency (f)
46
2
47
3
48
5
49
6
50
16
51
6
52
4
53
3
54
3
55
2
Total
Σf =
50
The mode for this
data is 50 matches
as it has the
highest frequency
(occurs the most
often)
When data is grouped, it is more relevant to refer to the modal group.
From the scenario, ‘Amount Spent at the Show’, the group with the highest frequency is '$70 but
less than $80' which has a frequency of 18.
It can be stated that the modal group is '$70 but less than $80'.
Amount Spent$
Frequency
(f)
Group
Midpoint
(X)
$0 but less than $10
2
5
$10 but less than $20
3
15
$20 but less than $30
5
25
$30 but less than $40
4
35
$40 but less than $50
2
45
$50 but less than $60
8
55
$60 but less than $70
14
65
$70 but less than $80
18
75
$80 but less than $90
12
85
The modal group
for this data is ‘$70
but less than $80’
as it has the
highest frequency
(occurs the most
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 15
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
$90 but less than $100
2
Total
Σf =
70
95
Either the group ‘$70 but less than $80’ or the Group Midpoint ‘$75’ can be given as the mode.
Which measure of centre should be used and when?
Every value from the data is used to calculate the mean. As long as the data is spread out without
excessive variation, the mean is usually used. A lecturer would use the mean to find a measure of
centre because the marks would be usually spread out from (say) 30 to 100%. In this data the low
or high values are not excessively different to a measure of centre.
In real estate, the median is often used as a measure of centre because there are some
excessively high values that are quite different to the bulk of the market. Real estate sales
generally consist of many properties at the cheaper part of the market and fewer properties at the
expensive part of the market. If the mean was to be used, the fewer, more expensive properties
would have a huge effect on the mean. Because the median is a measure of centre based on
location, variation in the price of high value properties and the number of high value properties will
have no effect on the median but would have a significant effect on the mean.
In the clothing industry, the mode is often used as a measure of centre. If a shop sells mostly size
12 clothing, then size 12 is the mode of the sizes sold. The mean is of little use because it could be
a value that is not even a possible size. The median is a better measure but fails to reflect the
nature of the sales environment.
Video ‘Measures of Central Tendency’
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 16
[last edited on 21 July 2015] CRICOS Provider: 01241G
Centre for Teaching and Learning
Numeracy
Activity
1.
The lifetime, in hours, of a sample of 15 light bulbs is: 351, 429, 885, 509, 317, 753, 827,
737, 487, 726, 395, 773, 926, 688, 485.
Calculate the mean, mode and median of the values. Do not organise into a table.
2.
The number of children in a 10 families is: 1, 5, 2, 2, 2, 3, 1, 4, 3, 2. Calculate the mean,
mode and median of the values. Do not organise into a table.
3.
For the Car Speed Data, calculate the mean, mode and median of the values after
organising the data in Frequency Distribution Table (The FDT can be found in the answers to
the Topic ‘Graphs’).
12
31
22
40
4.
41
35
50
52
44
31
46
57
Car Speed Data (in km/hr)
45
28
40
20
59
27
14
33
48
27
61
42
32
49
25
39
62
19
32
64
46
58
52
52
25
38
69
27
The number of students attending a class (maximum 25) for 30 lessons is given in the table
below:
25
25
24
22
25
24
24
25
24
23
Students Attending Class
24
25
23
24
20
23
25
24
24
25
24
25
25
23
24
23
25
24
21
22
(a) Calculate the mean, mode and median using a Frequency Distribution Table. (The FDT
can be found in the answers to the Topic ‘Graphs’).
(b) Discuss the appropriateness of each measure of central tendency as a typical value.
5.
The Shot Put distances thrown by 27 world champion shot putters are given in the table
below. The unit is metres (m).
22.25
21.72
21.07
20.22
22.67
20.19
20.37
21.55
20.38
21.58
21.39
20.45
23.12
22.37
21.72
21.25
23.09
20.91
22.19
21.19
21.19
22.58
21.70
22.07
21.22
21.97
20.54
Calculate the mean, mode and median using a grouped Frequency Distribution Table. (The
FDT can be found in the answers to the Topic ‘Graphs’)
Centre for Teaching and Learning | Academic Practice | Academic Skills
T +61 2 6626 9262 E [email protected] W www.scu.edu.au/teachinglearning
Page 17
[last edited on 21 July 2015] CRICOS Provider: 01241G