§2: Using a calculator to find measures of central tendency and measures of dispersion around a mean
NB
Participants must have a scientific calculator for this workshop. In these notes a
CASIO fx-82ES is used.
The material in this workshop covers some of the aspects of the following Core
Assessment Standards:
AS 10.4.1 (a)
Collect, organise and interpret univariate numerical data in order to determine
measures of central tendency of ungrouped data
AS 11.4.1 (a)
Calculate and represent measures of central tendency and dispersion in univariate
data by calculating the variance and standard deviation of sets of data manually
(for small sets of data) and using available technology (for larger sets of data) and
representing results graphically using histograms and frequency polygons
USING A CALCULATOR TO FIND THE MEAN
Before you can use the calculator to work out the mean of a data set you must first
put it into statistical (STAT) mode. You can find the mean of a data set on a
calculator without going into STAT mode, but STAT mode is easier to use because it
reduces the mistakes that are easily made when working out the sum of the data set.
1) Use the [MODE] key when you want to perform statistical calculations. To get
into STAT mode, press [MODE] [2:STAT].
2) The following Statistical Calculation Types then appear on the display:
1: 1-VAR (Single variable)
2: A+BX (Linear regression)
3: _+CX² (Quadratic regression)
4: ln X (Logarithmic regression)
5: e^X (e exponential regression)
6: A.B^X (ab exponential regression)
7: A.X^B (Power regression)
8: 1/X (Inverse regression)
Press [1: 1-VAR] and the following STAT editor screen will appear on the
display:
   X
1
2
3
Values are entered by typing the number and pressing [=].
Written by Jackie Scheiber and Meg Dickson
© RADMASTE Centre, University of the Witwatersrand – May 2007
§2: Mean, variance and standard deviation
FET Data Handling
3) The STAT Calculation Screen is used for performing statistical calculations with
the data you input with the STAT editor screen. Pressing the [AC] key while the
STAT editor screen is displayed switches to the STAT calculation screen.
4) While the STAT editor screen or STAT calculation screen is on the display, press
[SHIFT] [1] (STAT) to display the STAT menu. The following appears on the
display:
Option      Used when you want to:
1: Type     Display the Statistical Calculation Type
2: Data     Display the STAT editor screen
3: Edit     Display the Edit sub-menu for editing the STAT editor screen
4: Sum      Display the Sum sub-menu of commands for calculating sums
5: Var      Display the Var sub-menu of commands for calculating the mean,
            standard deviation, etc.
6: MinMax   Display the MinMax sub-menu of commands for obtaining the maximum
            and minimum
Activity 1
1) Thandi’s marks at the end of the first term are:
English 63%;
Biology 31%;
History 25%;
Geography 63%;
Maths 57%;
Technology 37%;
Zulu 77%
Use your calculator to find her mean mark (average of her marks for the term).
a) Get into STAT mode by entering [MODE] [2: STAT]
b) Enter [1: 1-VAR]
c) Enter these marks into the calculator:
63 [=] 31 [=] ……….
Look at the display each time. What does it show?
d) After entering all the items press [AC]. What does the display show?
e) To work out the mean press [SHIFT] [1] (STAT) [5: VAR]
Then press [2: x̄] [=]
What does the display show?
f) To find out how many data items you entered press:
[SHIFT] [1] (STAT) [5: VAR] [n] [=]
What does the display show?
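To check the calculator readings, the same two quantities can be computed in a few lines of Python. This is a minimal sketch that is not part of the original workshop. The variable names are illustrative, and the code simply adds the marks and divides by the number of marks, which should match the calculator's n and x̄ readings.

```python
# Thandi's end-of-term marks (%) from Activity 1
marks = [63, 31, 25, 63, 57, 37, 77]

n = len(marks)          # number of data items entered
mean = sum(marks) / n   # sum of the marks divided by the number of marks

print(f"n = {n}")
print(f"mean = {mean:.2f}%")
```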
Written by Jackie Scheiber and Meg Dickson
© RADMASTE Centre, University of the Witwatersrand – April 2008
2) The height and mass of five sprinters in the school's athletics team are given in
the table below. In order to develop a suitable training diet for the athletes you
need to know the average height and average mass of the sprinters. Work this
out.
Height in cm    Mass in kg
170             85
190             91
185             74
178             68
188             82
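Here is a short Python sketch of the same calculation. It assumes the table is read row by row as height/mass pairs and uses `statistics.mean` from Python's standard library. The list names are illustrative.

```python
from statistics import mean

# Heights (cm) and masses (kg) of the five sprinters, row by row
heights_cm = [170, 190, 185, 178, 188]
masses_kg = [85, 91, 74, 68, 82]

avg_height = mean(heights_cm)
avg_mass = mean(masses_kg)

print(f"Average height: {avg_height:.1f} cm")
print(f"Average mass: {avg_mass:.1f} kg")
```

Rounding to one decimal place is a presentation choice here. The worksheet does not specify one.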
3) The table below shows the results of anonymous HIV surveys of women at
antenatal clinics, carried out since 1990 by the Department of Health in South Africa.
Figures given are the estimated percentage HIV infection.
Province      1996    1997    1998
W. Cape       3,09    6,29    5,20
E. Cape       8,10   12,61   15,90
N. Cape       6,47    8,63    9,90
Free State   17,49   19,57   22,80
KZN          19,90   26,92   32,50
Mpum.        15,77   22,55   30,02
Limpopo       7,96    8,20   11,50
Gauteng      15,49   17,10   22,50
North West   25,13   18,10   21,30
Because the survey was limited to women of childbearing age, the estimates reflect
only 15 to 49-year-olds.
(This means that in 1996, 3,09% of women at the antenatal clinics in the Western
Cape were HIV positive.)
a) Find the mean percentage of HIV infected women in South Africa in 1996,
1997 and 1998. (Give answers correct to 2 decimal places)
b) What does the average tell you about HIV in South Africa?
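Here is a minimal Python sketch for part a). It assumes the provinces are read in the order W. Cape, E. Cape, N. Cape, Free State, KZN, Mpum., Limpopo, Gauteng, North West. This mean carries a built-in assumption: every province counts equally, regardless of how many women were surveyed there. It is therefore the average of the nine provincial figures and not necessarily the same as a national infection rate.

```python
from statistics import mean

# Estimated HIV infection (%) per province, in the order:
# W. Cape, E. Cape, N. Cape, Free State, KZN, Mpum., Limpopo, Gauteng, North West
hiv = {
    1996: [3.09, 8.10, 6.47, 17.49, 19.90, 15.77, 7.96, 15.49, 25.13],
    1997: [6.29, 12.61, 8.63, 19.57, 26.92, 22.55, 8.20, 17.10, 18.10],
    1998: [5.20, 15.90, 9.90, 22.80, 32.50, 30.02, 11.50, 22.50, 21.30],
}

for year, values in hiv.items():
    # Unweighted mean of the nine provincial figures, to 2 decimal places
    print(year, f"{mean(values):.2f}%")
```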
4) The table below shows the area of each of the 9 provinces in South Africa in km².
Province      Area (km²)
W. Cape       129 370
E. Cape       169 580
N. Cape       361 830
Free State    129 480
KZN            92 100
Mpum.          79 490
Limpopo       123 910
Gauteng        17 010
North West    116 320
a) Find the average area of the provinces.
b) Does this figure have any meaning? Why/why not?
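The same approach works for the areas. This minimal sketch assumes each figure belongs to the province heading directly above it in the table. Printing the smallest and largest areas alongside the mean is an illustrative addition that may help with part b).

```python
from statistics import mean

# Areas in km², in the order: W. Cape, E. Cape, N. Cape, Free State, KZN,
# Mpum., Limpopo, Gauteng, North West
areas_km2 = [129_370, 169_580, 361_830, 129_480, 92_100,
             79_490, 123_910, 17_010, 116_320]

avg_area = mean(areas_km2)
print(f"Total area: {sum(areas_km2):,} km²")
print(f"Average area: {avg_area:,.0f} km²")
print(f"Smallest: {min(areas_km2):,} km², largest: {max(areas_km2):,} km²")
```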
MEASURES OF DISPERSION
Measures of central tendency (or averages) are very important as they can give a
picture of the group that they represent. However, taken by themselves, they give a
very limited view of the whole picture. As well as the average, you need to know
how the rest of the data is grouped around the average – whether it is closely
grouped or scattered more widely. You need to consider a MEASURE OF THE
SPREAD or DISPERSION of data items around the middle values.
You can find a measure of dispersion around a mean or a median. In this
workshop we consider dispersion around a mean.
1) VARIANCE
The mean is the balance point of a distribution of data. There are various ways you
can measure the spread of a distribution around its mean. A measure that gives an
idea of the spread of a data set is the deviation from the mean – i.e. how far away
from the mean each data item is.

The differences from the mean are written $(x - \bar{x})$ or $(X - \bar{X})$, where $x$ is an
element of the set of data and $\bar{x}$ is the mean of the set of data.
The variance is the average of the squares of the deviations of each data item
from the mean.
Note:
• This measure of spread takes into account all data items.
• It is a measure of the variability of the data items.
• If the value of the variance is large, the data items are widely spread. If the
value of the variance is small, the data items are closely clustered around the
mean.
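The definition above translates directly into code. Below is a minimal Python sketch of the divide-by-n variance used in this workshop. The function name `variance` and the two small data sets are made up for illustration and do not come from the original.

```python
def variance(data):
    """Population variance: the mean of the squared deviations from the mean."""
    n = len(data)
    x_bar = sum(data) / n                            # the mean
    squared_devs = [(x - x_bar) ** 2 for x in data]  # (x - x̄)² for each item
    return sum(squared_devs) / n                     # average of the squares


# Made-up illustrative data sets, both with mean 5
clustered = [4, 5, 6, 5]
spread_out = [1, 9, 1, 9]
print(variance(clustered))   # small: items close to the mean
print(variance(spread_out))  # large: items far from the mean
```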
Activity 2
Suppose two men, Fred and Sipho, each have three sisters.
• The ages of Fred's sisters (rounded off to the nearest year) are: 22 years, 17
years and 21 years.
• The ages of Sipho's sisters (rounded off to the nearest year) are: 10 years, 12
years and 38 years.
1) Calculate
a) The mean age of Fred’s sisters
b) The mean age of Sipho’s sisters
2) Work out the deviations from the mean of each data item by completing the
tables below:
Fred's sisters
Age $x$    Deviation $(x - \bar{x})$    Deviation² $(x - \bar{x})^2$
22
17
21
Total: $\sum (x - \bar{x})^2 =$

Sipho's sisters
Age $x$    Deviation $(x - \bar{x})$    Deviation² $(x - \bar{x})^2$
10
12
38
Total: $\sum (x - \bar{x})^2 =$
3) Find the mean (average) of the squared deviations. This is called the variance.
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
where $n$ = the number of terms.
Fred's sisters, variance = ……………… ÷ 3 = ……………… (2 decimal places)
Sipho's sisters, variance = ……………… ÷ 3 = ……………… (2 decimal places)
As you can see, these numbers bear no obvious relation to the ages of the sisters in
each case. Because the deviations are squared, the variance is measured in squared
units (here, years²), so it is sometimes difficult to understand what the variance
is saying about a data set.
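The tables and variance calculations for Activity 2 can be checked with a short Python sketch. The helper `deviation_table` is a hypothetical name introduced here. It returns the mean, one row per sister (age, deviation, squared deviation) and the total of the squared deviations.

```python
# Ages of the sisters from Activity 2
fred = [22, 17, 21]
sipho = [10, 12, 38]


def deviation_table(ages):
    """Return (mean, rows, total), where rows are (x, x - mean, (x - mean)**2)."""
    n = len(ages)
    x_bar = sum(ages) / n
    rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in ages]
    total = sum(sq for _, _, sq in rows)  # sum of the squared deviations
    return x_bar, rows, total


for name, ages in [("Fred", fred), ("Sipho", sipho)]:
    x_bar, rows, total = deviation_table(ages)
    print(f"{name}'s sisters: mean = {x_bar:g}")
    for x, dev, sq in rows:
        print(f"  x = {x:2d}   x - mean = {dev:+g}   (x - mean)^2 = {sq:g}")
    print(f"  total = {total:g}, variance = {total / len(ages):.2f}")
```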
2) STANDARD DEVIATION
The most common measure of dispersion is the standard deviation. It is simply
the square root of the variance and is usually represented by the letter s or σ
(lower-case sigma).
To find the standard deviation we use the formula:
$$\text{Standard deviation} = \sqrt{\text{variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
For Fred's sisters, standard deviation $= \sqrt{4{,}66666\ldots} \approx 2{,}16$ years.
And for Sipho's sisters, standard deviation $= \sqrt{162{,}66666\ldots} \approx 12{,}75$ years.
Note:
• The standard deviation has the same units as the data items and as the mean.
• A small standard deviation tells you that the data items are closely clustered
around the mean, while a large standard deviation tells you that the items are
more spread out.
• The standard deviation is the most commonly used measure of dispersion.
• Although the standard deviation looks complicated to calculate, it takes little
time and is easy to work out with a calculator or a computer spreadsheet.
• The standard deviation can be used to compare different sets of data.
• Two versions of the formula for standard deviation are used in statistics. The
formula above divides by $n$. The other divides by $(n - 1)$: this gives a slightly
larger value and works better when the data are a sample used to estimate the
spread of a larger population.
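The difference between the two versions can be seen with Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n − 1. Here is a minimal sketch using the ages from Activity 2:

```python
import statistics

fred = [22, 17, 21]
sipho = [10, 12, 38]

for name, ages in [("Fred", fred), ("Sipho", sipho)]:
    sd_n = statistics.pstdev(ages)         # divides by n
    sd_n_minus_1 = statistics.stdev(ages)  # divides by n - 1
    print(f"{name}: divide by n -> {sd_n:.2f}, divide by n - 1 -> {sd_n_minus_1:.2f}")
```

With only three data items the two versions differ noticeably. The gap shrinks as n grows.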
Activity 3
The table below shows the average longevity (in years) of domesticated animals.
Animal        Longevity (years)
Cat           12
Cow           15
Dog           12
Donkey        12
Goat           8
Guinea pig     4
Horse         20
Pig           10
Rabbit         5
Sheep         12
1) Find the mean longevity of the animals listed in the table.
2) Calculate the squares of the deviations from the mean by filling in the
following table:
Age $x$    Deviation from the mean $(x - \bar{x})$    (Deviation)² $(x - \bar{x})^2$
12
15
12
12
8
4
20
10
5
12
Total: $\sum (x - \bar{x})^2 =$
3) Find the standard deviation of the data by substituting into the following
formula:
$$\text{Standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
4) What do you think the mean and the standard deviation tell you about the data?
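Here is a Python sketch of steps 1 to 3 for Activity 3, following the divide-by-n formula. The list order (cat, cow, dog, donkey, goat, guinea pig, horse, pig, rabbit, sheep) is taken from the table layout, and the variable names are illustrative.

```python
import math

# Average longevity (years) from Activity 3, in table order:
# cat, cow, dog, donkey, goat, guinea pig, horse, pig, rabbit, sheep
longevity = [12, 15, 12, 12, 8, 4, 20, 10, 5, 12]

n = len(longevity)
x_bar = sum(longevity) / n                       # step 1: the mean
sq_devs = [(x - x_bar) ** 2 for x in longevity]  # step 2: squared deviations
total = sum(sq_devs)                             # sum of (x - x̄)²
sd = math.sqrt(total / n)                        # step 3: standard deviation

print(f"mean = {x_bar:g} years")
print(f"sum of squared deviations = {total:g}")
print(f"standard deviation = {sd:.2f} years")
```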
3) USING THE CALCULATOR TO FIND THE STANDARD DEVIATION
It is easy to use a calculator to find the standard deviation. Remember to first get
into STAT mode on the calculator.
To work out the standard deviation on the calculator all you need to do is enter the
data and then enter:
[SHIFT] [1] (STAT)
[5: VAR]
[3: xσn]
[=]
Look again at Fred's and Sipho's sisters.
Fred's sisters' ages are: 22; 17 and 21
Sipho's sisters' ages are: 10; 12 and 38
To work out the standard deviation for Fred's sisters, press the following keys:
[MODE] [2: STAT]
[1: 1-VAR]
22 [=] 17 [=] 21 [=]
[SHIFT] [1] (STAT)
[5: VAR]
[3: xσn]
[=]
You should find that the standard deviation of the ages of Fred’s sisters = 2,16
Similarly you should find the standard deviation of ages of Sipho’s sisters = 12,75
What do these values tell you about the spread of the data?
Activity 4
1) The table shows the lowest temperatures ever recorded in 9 cities around the
world.
City            Country         Temperature (°C)
Addis Ababa     Ethiopia          0
Algiers         Algeria           0
Bangkok         Thailand         10
Johannesburg    South Africa     –8
Madrid          Spain           –10
Nairobi         Kenya             5
Sao Paulo       Brazil            0
Warsaw          Poland          –30
Washington      USA             –26
a) Use your calculator to find
i) the mean of the temperatures
ii) the standard deviation of the temperatures.
b) What does the standard deviation tell you about the data?
c) What does this information tell you about Johannesburg’s temperature?
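Here is a minimal Python sketch for question 1, using `statistics.mean` and `statistics.pstdev`. The `pstdev` function is the divide-by-n standard deviation, which should match the calculator's xσn reading. Checking whether Johannesburg falls within one standard deviation of the mean is one possible way into part c), not the only one. The names are illustrative.

```python
import statistics

# Lowest recorded temperatures (°C) from Activity 4, in table order
temps = {
    "Addis Ababa": 0, "Algiers": 0, "Bangkok": 10, "Johannesburg": -8,
    "Madrid": -10, "Nairobi": 5, "Sao Paulo": 0, "Warsaw": -30,
    "Washington": -26,
}

values = list(temps.values())
x_bar = statistics.mean(values)
sigma = statistics.pstdev(values)  # divide-by-n standard deviation

print(f"mean = {x_bar:.2f} °C")
print(f"standard deviation = {sigma:.2f} °C")

# Interval one standard deviation either side of the mean
low, high = x_bar - sigma, x_bar + sigma
print(f"one-SD interval: ({low:.2f}; {high:.2f})")
print("Johannesburg within one SD:", low <= temps["Johannesburg"] <= high)
```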
2) The figures below show the life expectancy at birth (1995–2000) in countries
belonging to the Southern African Development Community (SADC).
Country               Life expectancy at birth, 1995–2000 (years)
Angola                45,2
Botswana              40,3
Dem Rep of Congo      51,3
Lesotho               45,7
Malawi                40,0
Mauritius             71,3
Mozambique            39,3
Namibia               44,7
Seychelles            72,7
South Africa          52,1
Swaziland             44,4
Tanzania              51,1
Zambia                41,4
Zimbabwe              42,9
a) Use the calculator to find the mean of the data.
b) Use the calculator to find the standard deviation of the data.
c) What does this tell you about the data?
d) Why do you think there is such a spread in the life expectancy?
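Here is a Python sketch for parts a) and b). It assumes each country is paired with the value listed in the same position in its column. As an illustrative extra for parts c) and d), it lists the countries lying more than one standard deviation from the mean.

```python
import statistics

# Life expectancy at birth (years), 1995-2000, for the SADC countries listed
life_expectancy = {
    "Angola": 45.2, "Botswana": 40.3, "Dem Rep of Congo": 51.3,
    "Lesotho": 45.7, "Malawi": 40.0, "Mauritius": 71.3,
    "Mozambique": 39.3, "Namibia": 44.7, "Seychelles": 72.7,
    "South Africa": 52.1, "Swaziland": 44.4, "Tanzania": 51.1,
    "Zambia": 41.4, "Zimbabwe": 42.9,
}

values = list(life_expectancy.values())
x_bar = statistics.mean(values)
sigma = statistics.pstdev(values)  # divide-by-n standard deviation
print(f"mean = {x_bar:.1f} years, standard deviation = {sigma:.1f} years")

# Countries lying more than one standard deviation from the mean
beyond_one_sd = [country for country, years in life_expectancy.items()
                 if abs(years - x_bar) > sigma]
print("More than one SD from the mean:", ", ".join(beyond_one_sd))
```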
4) THE STANDARD DEVIATION AND THE MEAN
Measures of central tendency and spread of a data set help you describe the data
more fully. The measures usually combined are either the mean and the
standard deviation, or the median and the quartiles. The right choice of summaries
depends on the 'shape' of the distribution.
The standard deviation and the mean together provide a measure of variability
within a single data set (for example, by marking off the intervals one, two or three
standard deviations on either side of the mean and counting how many data items
fall within each interval), as well as a way of contrasting two data sets. They give a
way of characterising distributions of data.
This first diagram shows frequency polygons of three distributions having the same
mean and varying standard deviations. The second diagram shows three
distributions having the same standard deviation and different means:
[Hodge,S & Seed, M (1972) Statistics and probability Blackie & Sons, Glasgow, page 78]
Activity 5
The maths marks of two learners in your class are given below.
Jabu:      20; 16; 10; 3; 12; 10; 11; 14; 5; 19
Mmatsie:   13; 12; 11; 13; 13; 11; 12; 12; 11; 12
1) Find the mean of each data set
2) Find the standard deviation of each data set
3) The headmaster asks you to write a report comparing the progress of the two
learners. Using measures of central tendency and measures of dispersion, what
can you say about the learners' work?
5) USING THE STANDARD DEVIATION TO REACH CONCLUSIONS
Provided that the sample size is reasonably large and the data are not too skewed
(that is, there are no very large or very small values), it is possible to make
the following approximate statements:
• About 68% (roughly two-thirds) of the individual observations will lie within one
standard deviation of the mean.
• For most data sets, about 95% of the individual observations will lie within two
standard deviations of the mean.
• Almost all of the data will lie within three standard deviations of the mean.
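The percentages in these statements come from a bell-shaped (normal) distribution. If the data followed a normal distribution exactly, Python's `statistics.NormalDist` (Python 3.8 or later) would give the proportions within one, two and three standard deviations of the mean. Here is a minimal sketch:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

for k in (1, 2, 3):
    # Proportion of values between (mean - k SD) and (mean + k SD)
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD of the mean: {share:.1%}")
```

This prints 68.3%, 95.4% and 99.7%. Real data sets only approximate these values.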
Activity 6
The office manager of a small office wants to get an idea of the number of phone
calls made by the people working in the office during a typical day in one week in
June. The number of calls on each day of the (5-day) week is recorded. They are as
follows:
Monday – 15; Tuesday – 23; Wednesday – 19; Thursday – 31; Friday – 22
1) Determine
a) the mean number of phone calls per day
b) the standard deviation (correct to 1 decimal place).
2) On what percentage of the days is the number of calls within one standard
deviation of the mean?
One standard deviation from the mean is:
$\bar{x} \pm \sigma$ ………………………………
So the interval is (…… − …… ; …… + ……) = ……………………
The phone calls on ………………………………………………………………………….
fall within the interval.
……… × 100% = ……………………
So the number of phone calls on ………. of the days lies within one standard
deviation of the mean.
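Here is a Python sketch of Activity 6 that follows the worksheet's steps: the mean, the standard deviation rounded to 1 decimal place, the one-standard-deviation interval, and the share of days inside it. The day-to-count dictionary and the other names are illustrative.

```python
import statistics

# Number of phone calls on each day of the week (Activity 6)
calls = {"Monday": 15, "Tuesday": 23, "Wednesday": 19,
         "Thursday": 31, "Friday": 22}

counts = list(calls.values())
x_bar = statistics.mean(counts)              # 22
sigma = round(statistics.pstdev(counts), 1)  # 5.3 (1 decimal place)

# The interval one standard deviation either side of the mean
low, high = x_bar - sigma, x_bar + sigma
inside = [day for day, c in calls.items() if low <= c <= high]
share = len(inside) / len(calls)

print(f"interval = ({low:.1f}; {high:.1f})")
print("days inside:", ", ".join(inside))
print(f"share of days = {share:.0%}")
```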