A Mathematical Skills Fundamental for the Pulp and Paper Industry
STATISTICS AND PROBABILITIES 4
Facilitator Guide
NQF Level 4
Credits: 5
Unit Standard 9015
Compiled by:
Amanda Gilfillan
Johan Els
for
FIETA
Sparrow Research and Industrial Consultants © July 2005
Statistics and Probabilities 4
Learning Outcomes
Upon studying this module, the learner will be able to apply his / her knowledge of
statistics and probability to:

critically interrogate and effectively communicate results

look at samples in terms of size and representativeness

understand what a normal distribution is

have a basic understanding of probability
Specific Outcomes
Unit Standard 9015: Apply knowledge of statistics and probability to critically
interrogate and effectively communicate findings on life related problems

Critique and use techniques for collecting, organising and representing data

Use theoretical and experimental probability to develop models

Critically interrogate and use probability and statistical models
Facilitator Guide
US 9015 – Statistics and Probabilities 4
Table of Contents
CHAPTER 1: PROBABILITY
1 WHAT IS PROBABILITY?
2 CALCULATING PROBABILITY
2.1 Theoretical probability
2.2 Experimental probability
2.3 Subjective probability
3 NOTATION
4 ADDITIONAL RULES
4.1 Mutually exclusive events
4.2 Exhaustive
5 TREE DIAGRAMS
6 TERMS USED
6.1 Probability statements
6.2 Sample space
6.3 Joint and disjoint outcomes
6.4 Independent and dependent outcomes
7 SUMMARY
CHAPTER 2: STATISTICS
1 THE NORMAL DISTRIBUTION
1.1 Skewed and symmetrical distributions
1.2 Normal distribution curves
1.3 Characteristics of the normal curve
1.4 Skewed data
2 SAMPLE SIZE AND REPRESENTATIVENESS
2.1 Sample representativeness
2.2 Sample size
2.3 Central limit theorem
3 STATISTICS IN THE MEDIA
3.1 Misleading statistics
Annexure A – Normal distribution table
CHAPTER 1:
PROBABILITY
On completion of this chapter, you will be able to:

have a basic understanding of probability
1
WHAT IS PROBABILITY?
Probability is an attempt to quantify (put a value to) uncertainty by measuring or calculating
the likelihood of some event happening or not happening. Since we need a measure that
can be easily understood, probabilities are usually represented either as percentages (say, an
80% chance) or as decimal fractions of the whole, given to two decimal places, for
example: “the probability is 0,50” (out of 1,00). Probability is the relative frequency with
which a certain event will occur in the long run.
In real life, we make many decisions based on our perception of probability. People who
won’t do anything unless they are certain of its success will never do anything at all. Typical
examples are:

“This furniture should last a long time!” (before buying new dining room furniture)

“A computer course should really get my career going!” (before deciding to enrol
for new studies)

“Rather just buy ice-cream for dessert, everybody will enjoy that” (when preparing a
menu for weekend guests)

“This fence will keep the burglars out!”
2
CALCULATING PROBABILITY
A few essential definitions:

An event is the outcome of an experiment or trial

When all outcomes are equally likely, the probability of an event occurring =
(number of successful outcomes) / (total number of outcomes)

Probability is measured on a scale from 0 to 1
Exercise 1 - Probabilities
1. When two football teams play against each other, how many outcomes are there?
Mostly three – win, lose or draw. When there are rules to force a result (penalty
shootout, golden goal, etc.) then there are only two outcomes – win or lose.
2. What is the probability that the following is true:
a) tomorrow is Wednesday
1/7 (It is a Wednesday once every seven days)
b) the sun will shine tomorrow
½ (“shine” or “not shine” are the two possible outcomes, treated here as equally likely)
3. If the probability that Susan is on time is ⅓, what is the probability that she is late?
⅔ (1 – ⅓)
4. An ordinary die is thrown. What is the probability that the number thrown is:
a) less than 3
Less than 3 is 1 and 2, so 2/6 = ⅓
b) more than 7
0
c) a factor of 6?
Factors of 6 are 1, 2, 3, 6, so 4/6 = ⅔
There are several methods of determining probability:
2.1
Theoretical probability
Theoretical probability applies when we can establish the probability by reasoning from what
we already know about the situation, without carrying out any trials. For example, the
probability of throwing a 5 with a fair die is one out of six, or 1/6.
2.2
Experimental probability
Experimental (or empirical) probability means we have to carry out a large number of trials to
determine the probability of a particular outcome. For example, to determine whether a slice
of bread will land with the butter side down, we will have to drop a large number of buttered
slices. By recording the outcomes we can estimate the probability of any one slice landing
buttered side down.
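The difference between theoretical and experimental probability can be illustrated with a short simulation. The sketch below is only an illustration, not part of the original guide: it simulates throws of a fair die (the function name, the seed and the numbers of trials are assumptions chosen for the example) and compares the experimental estimate for throwing a 5 with the theoretical value of 1/6.

```python
import random
from fractions import Fraction


def experimental_probability(target, trials, seed=2005):
    """Estimate P(die shows `target`) from `trials` simulated throws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == target)
    return hits / trials


theoretical = Fraction(1, 6)  # one favourable outcome out of six

# The estimate should settle closer to 1/6 as the number of trials grows.
for trials in (60, 600, 60000):
    estimate = experimental_probability(5, trials)
    print(f"{trials:>6} throws: {estimate:.4f} (theoretical {float(theoretical):.4f})")
```

With only 60 throws the estimate can be noticeably off; this is why experimental probability calls for a large number of trials.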
2.3
Subjective probability
If it is not possible or practical to carry out a large number of trials, a subjective probability
has to be formed. To judge whether it will rain on New Year’s Day, for example, we cannot
repeat the day itself; we would have to look at the weather records for many previous years.
On the basis of that information it would be possible to estimate the chances that it will rain
this New Year’s Day.
3
NOTATION
If we consider 100 learners registered at Richardsbay Technical College, of whom 20 take
mathematics courses, 15 study chemistry and 8 study both maths and chemistry, we can
illustrate the information in a Venn diagram.
[Venn diagram: maths only 12; both maths and chemistry 8; chemistry only 7; neither 73]
In a Venn diagram each group of objects is represented by a circle, and the numbers inside a
circle represent the number of people (objects) in that particular set. So for maths that would
be 12 + 8 = 20. The number in the intersection represents the number of people taking both
maths and chemistry, and the number outside the circles but inside the box represents the
number of people in the group who take neither maths nor chemistry.
The probability that a learner takes maths (indicated by “M”) is written as P(M), where
P(M) = 20/100 = 1/5
Similarly, the probability that a randomly chosen learner takes chemistry is
P(C) = 15/100 = 3/20
The shaded region is written as M ∩ C, and is read as M intersection C. The probability that
a learner takes both maths and chemistry is written P(M ∩ C), where
P(M ∩ C) = 8/100 = 2/25
The probability that a learner takes either maths or chemistry (or both) is written as
P(M U C), where M U C is read as M union C.
P(M U C) = (12 + 8 + 7)/100 = 27/100
We can also find the probability that a learner does not study maths from the Venn diagram.
The notation P(M’) is used for “not Maths”.
Thus P(M’) = (7 + 73)/100 = 80/100 = 4/5
or
P(M’) = 1 – P(M) = 1 – 1/5 = 4/5
It is important to point out that this is known as complementary probability.
P(M’ ∩ C) is the probability that a learner does not study maths but does study chemistry.
P(M’ ∩ C) = 7/100
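The Venn diagram calculations can be checked by counting set members directly. The sketch below is a minimal illustration, assuming the learners are simply numbered 0 to 99 (the numbering and the helper name prob are made up for the example); Python's set operators &, | and - play the roles of intersection, union and complement.

```python
from fractions import Fraction

# Hypothetical numbering: learners 0..99, arranged so that the counts
# match the text (20 maths, 15 chemistry, 8 taking both).
learners = set(range(100))
maths = set(range(0, 20))        # 20 learners take maths
chemistry = set(range(12, 27))   # 15 learners take chemistry; 12..19 overlap


def prob(event):
    """Probability that a randomly chosen learner belongs to `event`."""
    return Fraction(len(event), len(learners))


not_maths = learners - maths

print("P(M)      =", prob(maths))                  # 1/5
print("P(C)      =", prob(chemistry))              # 3/20
print("P(M ∩ C)  =", prob(maths & chemistry))      # 2/25
print("P(M U C)  =", prob(maths | chemistry))      # 27/100
print("P(M')     =", prob(not_maths))              # 4/5
print("P(M' ∩ C) =", prob(not_maths & chemistry))  # 7/100
```

Fraction keeps the answers in lowest terms, so they can be compared directly with the simplified fractions in the text.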
Exercise 2 – Venn diagrams
1. Draw a Venn diagram to show the following information. In a group of 40 learners,
25 play netball, 17 play hockey and 5 play both. Use your diagram to find:
a) The probability that a student chosen at random from the group will play netball.
P(N) = 25/40 = 5/8
b) The probability that the student plays either netball or hockey.
P(N U H) = (20 + 5 + 12)/40 = 37/40
c) The probability that the student does not play netball but does play hockey.
P(N’ ∩ H) = 12/40 = 3/10
2. You may find it useful to use a Venn diagram to answer the question. In a class of
65 learners 15 are left-handed. There are 41 girls in the class, of whom 4 are left-handed.
If a student is chosen at random, calculate the probability that the student is:
a) right-handed.
P(L’) = 50/65 = 10/13
b) a left-handed male.
P(L ∩ G’) = 11/65
c) a right-handed female.
P(L’ ∩ G) = 37/65
3. In a row of 30 houses, three have both a security fence and an alarm system. Eleven
houses have neither a fence nor an alarm system, and 17 have an alarm system.
What is the probability that a house chosen at random has:
a) A security fence but no alarm system?
Of the 30 – 11 = 19 houses with at least one of the two, 17 have an alarm, so 2 have
only a fence.
P(S ∩ A’) = 2/30 = 1/15
b) Either an alarm or a security fence?
P(S U A) = (14 + 3 + 2)/30 = 19/30
4. In a secondary school class there are 28 pupils. Seven are in the chess team and
also play snooker. There are 16 pupils involved in snooker and ten in the chess team.
Find the probability that a student chosen at random:
a) is only playing chess.
P(C ∩ S’) = 3/28
b) is in either the snooker or chess team.
P(S U C) = (9 + 7 + 3)/28 = 19/28
c) is in neither the snooker nor the chess team.
P(S U C)’ = 9/28
5. In a road of 30 houses, 25 are known to have mobile phones. Eight of these houses
have a video recorder. One house has neither the phone nor the video recorder. If
one house is chosen at random what is the probability that:
a) it has a video recorder
P(V) = (8 + 4)/30 = 12/30 = 2/5
b) it does not have a video recorder
P(V’) = (17 + 1)/30 = 18/30 = 3/5, or 1 – P(V) = 1 – 2/5 = 3/5
c) it has either a video recorder or a mobile phone
P(V U M) = (17 + 8 + 4)/30 = 29/30
6. A group of 60 people were asked if they had watched the cricket game or the news
during the past week on TV. Thirty-five said they had watched cricket, 20 said they had
watched the news and 14 said they had watched neither. What is the probability
that a person chosen at random watched:
a) both
P(C ∩ N) = 9/60 = 3/20
b) cricket but not the news
P(C ∩ N’) = 26/60 = 13/30
c) either cricket or the news?
P(C U N) = (26 + 9 + 11)/60 = 46/60 = 23/30
4
ADDITIONAL RULES
The diagram shows two events A and B. The shaded area represents P(A U B), that is the
probability of A or B occurring. This can be found by adding the probability of A to the
probability of B and subtracting the probability of both A and B occurring.
This gives the important result
P(A U B) = P(A) + P(B) – P(A ∩ B)
For the maths and chemistry learners in the Notation section,
P(M) = 20/100; P(C) = 15/100 and P(M ∩ C) = 8/100
So P(M U C) = P(M) + P(C) – P(M ∩ C)
P(M U C) = 20/100 + 15/100 – 8/100 = 27/100
P(M ∩ C) is subtracted because the members of the intersection are counted twice: once in
P(M) and once in P(C).
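The addition rule can be written as a one-line function and cross-checked by counting. The following is a hedged sketch only: the function name p_union and the way the pack of cards is built are assumptions made for the illustration. It first applies the rule to the maths and chemistry figures, then confirms the rule against a direct count of "spade or ace" in a 52-card pack.

```python
from fractions import Fraction
from itertools import product


def p_union(p_a, p_b, p_a_and_b):
    """Addition rule: P(A U B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b


# Maths / chemistry figures from the text.
p_m_or_c = p_union(Fraction(20, 100), Fraction(15, 100), Fraction(8, 100))
print("P(M U C) =", p_m_or_c)  # 27/100

# Cross-check on a 52-card pack: count the cards that are a spade or an ace.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
pack = list(product(ranks, suits))

spade_or_ace = [card for card in pack if card[1] == "spades" or card[0] == "A"]
counted = Fraction(len(spade_or_ace), len(pack))
by_rule = p_union(Fraction(13, 52), Fraction(4, 52), Fraction(1, 52))
print("counted:", counted, "by rule:", by_rule)  # both 4/13
```

Counting directly avoids double counting the ace of spades automatically, which is exactly what the subtracted term achieves in the formula.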
4.1
Mutually exclusive events
A card is chosen from a pack of 52 playing cards. If the card is a spade then it cannot be a
red card.
Events such as these, where two outcomes cannot occur at the same time, are said to be
mutually exclusive. Because P(R ∩ S) = 0, we have:
P(R U S) = P(R) + P(S)
4.2
Exhaustive
If P(R U S) = 1 then the events R and S are said to be exhaustive. This means that every
possible outcome belongs to R or to S. For example, if a playing card is chosen at random it
can be either red or black.
Half of the pack is red and the other half is black.
P(red card) = ½ ; P(black card) = ½
P(red U black) = ½ + ½ = 1
A card is drawn at random from an ordinary pack of 52 playing cards. Find the probability
that the card is:
a) a spade or a heart
b) a spade or an ace
Solution
a) A card cannot be a spade and a heart at the same time, so the events are mutually exclusive.
P(S U H) = P(S) + P(H) = 1/4 + 1/4 = 1/2
or
P(S U H) = P(S) + P(H) = 13/52 + 13/52 = 26/52 = 1/2
b) A card can be both a spade and an ace, so the intersection must be subtracted.
P(S U A) = P(S) + P(A) – P(S ∩ A) = 13/52 + 4/52 – 1/52 = 16/52 = 4/13
or
P(S U A) = P(S) + P(A) – P(S ∩ A) = 1/4 + 1/13 – 1/52 = 4/13
Exercise 3 - Probabilities
1. A card is drawn at random from a pack of 52 playing cards. Work out the probability
that:
a) The card is either a heart or a queen.
P(H U Q) = P(H) + P(Q) – P(H ∩ Q) = 13/52 + 4/52 – 1/52 = 16/52 = 4/13
b) The card is either a heart or a diamond.
P(H U D) = P(H) + P(D) – P(H ∩ D) = 13/52 + 13/52 – 0/52 = 26/52 = 1/2
c) The card is either red or a ten.
P(R U T) = P(R) + P(T) – P(R ∩ T) = 26/52 + 4/52 – 2/52 = 28/52 = 7/13
2. A ten-sided dice, numbered 1 to 10, is thrown. Calculate the probability that:
a) The number scored is a prime number.
Prime numbers: 2; 3; 5; 7
P(P) = 4/10 = 2/5
b) The number scored is either a prime number or a multiple of 4.
P(P U M4) = P(P) + P(M4) – P(P ∩ M4) = 4/10 + 2/10 – 0/10 = 6/10 = 3/5
c) The number scored is either a multiple of 4 or a multiple of 3.
P(M4 U M3) = P(M4) + P(M3) – P(M4 ∩ M3) = 2/10 + 3/10 – 0/10 = 5/10 = 1/2
3. If E and F are two events such that P(E) = ¼; P(F) = ½, and P(E ∩ F) = ⅛, find:
a) P(E U F)
P(E U F) = P(E) + P(F) – P(E ∩ F) = 1/4 + 1/2 – 1/8 = (2 + 4 – 1)/8 = 5/8
b) P(E U F)’
P(E U F)’ = 1 – P(E U F) = 1 – 5/8 = 3/8
4. For each of the following pairs of events, X and Y, say whether or not they are
mutually exclusive and / or exhaustive.
a) A student is chosen at random from a tutor group. Event X: student is right-handed,
Event Y: student is left-handed.
mutually exclusive and exhaustive
b) A fair die is thrown. Event X: die shows a multiple of 3, Event Y: die shows a
prime number.
neither mutually exclusive (3 is in both) nor exhaustive (1 and 4 are in neither)
c) A card is dealt from a pack of playing cards. Event X: the card is a spade, Event
Y: the card is a King.
neither mutually exclusive (the King of Spades is in both) nor exhaustive
5
TREE DIAGRAMS
Tree diagrams are very useful in solving probability problems, either where one event is
repeated, or where more than one event occurs.
1. The probability that Jeffrey is late for class on any one day is 0,15 and is
independent of whether he was late on the previous day. Find the probability that
Jeffrey will:
a) be late on Monday and Tuesday
b) arrive on time on exactly one of these days
Solution
Outcome: Probability
Late-Late: 0,15 x 0,15 = 0,0225
Late-On time: 0,15 x 0,85 = 0,1275
On time-Late: 0,85 x 0,15 = 0,1275
On time-On time: 0,85 x 0,85 = 0,7225
a) 0,0225 = 2,25% chance (late-late)
b) 0,1275 + 0,1275 = 0,255 = 25,5% chance (on time-late, and late-on time)
2. If we investigate the probability of drawing an ace when two cards are drawn from a
pack of cards without replacement, the tree will look like this.
Outcome: Probability
Ace – Ace: 4/52 x 3/51 = 0,0045
Ace – Other card: 4/52 x 48/51 = 0,0724
Other card – Ace: 48/52 x 4/51 = 0,0724
Other card – Other card: 48/52 x 47/51 = 0,8507
3. If we now extend the diagram to a third draw of cards, the probability will look like
this:
[Tree diagram: three successive draws from the pack, each branching into Ace and Other card]
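The three-draw tree can also be enumerated in code. The sketch below is an illustration under the same assumption as the two-draw table (cards are not replaced after each draw); the function name path_probability and the labels "Ace" and "Other" are made up for the example. Each path through the tree is multiplied out branch by branch, exactly as on a hand-drawn tree diagram.

```python
from fractions import Fraction
from itertools import product


def path_probability(path, aces=4, total=52):
    """Multiply the branch probabilities along one path of the tree.

    `path` is a sequence of "Ace" / "Other" labels; cards are not replaced,
    so the pack shrinks by one card after every draw.
    """
    p = Fraction(1)
    for card in path:
        others = total - aces
        if card == "Ace":
            p *= Fraction(aces, total)
            aces -= 1          # one ace fewer for later draws
        else:
            p *= Fraction(others, total)
        total -= 1             # one card fewer in the pack
    return p


# All 2 x 2 x 2 = 8 paths of the three-draw tree.
paths = {path: path_probability(path) for path in product(("Ace", "Other"), repeat=3)}
for path, p in paths.items():
    print(" - ".join(path), "=", p, f"({float(p):.4f})")

print("sum of all paths:", sum(paths.values()))  # the branches must add up to 1
```

Calling path_probability(("Ace", "Ace")) reproduces the first row of the two-draw table, which is a quick way to check the function against the text.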
Exercise 4 - Probabilities
1. Research shows that if it rains on one day in Cape Town, the probability that it will
rain the next day is 30%. If it is a fair day the probability that it will rain the next day
is 15%. Use a tree diagram and determine the following: If it is a fair day on Monday
what is the probability that it will rain on Wednesday?
A probability of 30% can be written as 0,3 and 15% as 0,15. If it is fair on Monday
the probability that it will rain on Tuesday is 0,15 and the probability that it will be a
fair day is 0,85. If it does rain on Tuesday the probability that it will rain on Wednesday
is 0,3 and that it will be fair 0,7. If Tuesday is fair the probability that it will rain on
Wednesday is 0,15 and that it will be fair 0,85.
Rain on Wednesday can be reached along two branches of the tree:
Rain-Rain: 0,15 x 0,3 = 0,045
Fair-Rain: 0,85 x 0,15 = 0,1275
P(rain on Wednesday) = 0,045 + 0,1275 = 0,1725, that is a 17,25% chance.
(The probability that both Tuesday and Wednesday are fair days is 0,85 x 0,85 = 0,7225,
about 72%.)
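The same tree can be followed one day at a time in code. This is a minimal sketch, assuming the two conditional probabilities given in the exercise stay the same from day to day; the function and constant names are hypothetical.

```python
P_RAIN_AFTER_RAIN = 0.30   # P(rain tomorrow | rain today)
P_RAIN_AFTER_FAIR = 0.15   # P(rain tomorrow | fair today)


def rain_probability(days_ahead, raining_today=False):
    """Probability of rain `days_ahead` days from now, following the tree."""
    p_rain = 1.0 if raining_today else 0.0
    for _ in range(days_ahead):
        # Two branches lead to rain: rain -> rain and fair -> rain.
        p_rain = p_rain * P_RAIN_AFTER_RAIN + (1 - p_rain) * P_RAIN_AFTER_FAIR
    return p_rain


# Monday is fair; Tuesday is 1 day ahead and Wednesday is 2 days ahead.
print(f"Tuesday:   {rain_probability(1):.4f}")
print(f"Wednesday: {rain_probability(2):.4f}")
```

Each pass through the loop adds one level to the tree, so the function gives the same figure as adding up the branches that end in rain.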
6
TERMS USED
6.1
Probability statements
It is usual to make statements about probability in the form
P(A) = …
This is read as: “The probability of an event A happening is …” Thus the probability of
drawing a spade from a pack of cards is
P(S) = 13/52 = 0,25 (25%)
This is because there are 13 spades in a pack of cards and a pack has 52 cards.
6.2
Sample space
Sample space refers to the set of all possible outcomes. When taking cards from a pack the
sample space consists of the 52 cards (without the jokers). There are 7 days in a week, so the
sample space for days of the week has 7 outcomes, while the sample space for months in a
year has 12. There are 7 grades in primary school, so that sample space has 7 outcomes.
6.3
Joint and disjoint outcomes
In many situations events are disjoint or mutually exclusive. If you consider picking Aces
and Kings from a pack of cards, the two events are mutually exclusive. If you do pick an Ace
it can never be a King, and if you pick a King it can never be an Ace.
However if the two events are a King and a Spade, it is possible to pick the King of Spades.
Such events are said to be joint or non-exclusive.
6.4
Independent and dependent outcomes
The question whether events are dependent on one another sometimes arises. With
independent events the result of one event has no effect on the outcome of another
event. If we toss a coin in the air (Heads or Tails), the result of one throw has no effect on the
result of the next throw.
If we remove an Ace from a pack of cards we reduce the probability of finding another Ace in
the pack, so the outcome is influenced by the first event. This is called a dependent outcome.
The successive chances of picking an Ace would be:
Full pack: P(A) = 4/52 = 0,077
One ace removed: P(A) = 3/51 = 0,059
Two aces removed: P(A) = 2/50 = 0,040
Three aces removed: P(A) = 1/49 = 0,020
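The successive chances above follow one pattern: every ace removed lowers both the number of aces and the size of the pack by one. A short sketch, with a hypothetical function name, makes the pattern explicit:

```python
from fractions import Fraction


def p_next_ace(aces_removed, aces=4, pack=52):
    """P(next card is an ace) after `aces_removed` aces have been taken out."""
    return Fraction(aces - aces_removed, pack - aces_removed)


for removed in range(4):
    p = p_next_ace(removed)
    print(f"{removed} aces removed: P(A) = {p} = {float(p):.3f}")
```

For an independent event, such as tossing a coin, no adjustment of this kind is needed: the probability stays the same from one trial to the next.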
Exercise 5 - Probabilities
Where necessary, give the answer correct to three decimal places.
1. What are the chances of selecting from a full pack of cards (excluding jokers):
a) The jack of diamonds?
1/52 = 0,019
b) Any jack?
4/52 = 1/13 = 0,077
c) Any diamond?
13/52 = 1/4 = 0,25
2. From a new pack of cards only the aces, kings, queens, jacks and tens are taken.
What are the chances of selecting from this reduced pack:
Aces = 4 ; Kings = 4 ; Queens = 4 ; Jacks = 4 ; Tens = 4
Total = 20
a) Any queen?
4/20 = 1/5
b) Any diamond?
5/20 = 1/4 (one diamond in each of the five ranks)
c) The queen of hearts?
1/20
3. The probability that a taxi will arrive on time or late is put at 0,78. What is the
probability that the taxi will arrive early?
1 – 0,78 = 0,22
4. A bag contains three green balls and five yellow balls. One ball is chosen and its
colour noted before being replaced in the bag. A second ball is selected and its
colour noted. With the help of a probability tree work out:
Green = 3 and Yellow = 5; Total = 8 balls
So Green = ⅜ = 0,375 and Yellow = ⅝ = 0,625
a) The probability that two green balls are chosen.
P(Green, Green) = ⅜ x ⅜ = 9/64 = 0,141
b) The probability that the two balls are different colours.
Two different colours can be drawn as follows:
Draw Green then Yellow, or draw Yellow then Green, therefore:
P(2 colours) = P(Green, Yellow) + P(Yellow, Green)
P(2 colours) = ⅜ x ⅝ + ⅝ x ⅜ = 15/64 + 15/64 = 30/64 = 15/32 = 0,469
5. At an activity holiday centre children choose either painting or drama as their
activity on the first morning and horse-riding, football or swimming as their activity on
the first afternoon. Past records show that in the morning 60% choose painting,
and in the afternoon 45% choose horse-riding and 30% choose football, and that the
afternoon choices are made independently of the morning choices.
a) Draw a tree diagram to illustrate the probabilities of the various choices for a child
selected at random.
Drama = 100% – 60% = 40%; Swimming = 100% – 45% – 30% = 25%
P(P, H) = 60% x 45% = 27%
P(P, F) = 60% x 30% = 18%
P(P, S) = 60% x 25% = 15%
P(D, H) = 40% x 45% = 18%
P(D, F) = 40% x 30% = 12%
P(D, S) = 40% x 25% = 10%
b) Use your tree diagram to find the probability that a child selected at random
chooses:
i. Drama in the morning and swimming in the afternoon. 0,1
ii. Neither drama nor horse-riding. 0,18 + 0,15 = 0,33
6. In a group of 50 students, 18 take History and 26 take English. If 14 students take
neither History nor English, find the probability that a student chosen at random
takes:
a) Both History and English
50 – 14 = 36 students take at least one subject, so 18 + 26 – 36 = 8 take both.
P(H ∩ E) = 8/50 = 4/25
b) History but not English
P(H ∩ E’) = (18 – 8)/50 = 10/50 = 1/5
c) Either History or English.
P(H U E) = 36/50 = 18/25
7
SUMMARY
1. Probability is a way of measuring the likelihood of some event taking place.
a) The probability of events A and B both taking place = P(A ∩ B)
b) The probability of event A or event B or both events occurring = P(A U B)
c) The probability of event A not taking place = P(A’)
d) The sum of the probability of event A occurring and A not occurring is equal to 1:
P(A) + P(A’) = 1 so P(A’) = 1 – P(A)
e) The addition rule for probability is
P(A U B) = P(A) + P(B) – P(A ∩ B)
f) For mutually exclusive events P(A U B) = P(A) + P(B) and P(A ∩ B) = 0
g) Probabilities are usually given as a percentage or as a decimal to two places (e.g.
the probability is 80% or 0,80)
h) The smallest value a probability can have is 0 – the event will never happen.
i) The largest value is 1 – the event is bound to happen.
2. Statements of probabilities always refer to the long run.
3. The sample space is the set of all possible outcomes.
4. Tree diagrams and Venn diagrams are techniques for investigating probabilities.
CHAPTER 2:
STATISTICS
On completion of this chapter, you will be able to:

look at samples in terms of size and representativeness

understand what a normal distribution is

critically interrogate and effectively communicate results
No one can afford to be without some knowledge of statistical methods today. In any
working situation you will have to deal with some form of statistics.
In this unit standard we will apply our knowledge of statistics and probability to:

critically interrogate and effectively communicate results

look at samples in terms of size and representativeness

understand what a normal distribution is

have a basic understanding of probability.
1
THE NORMAL DISTRIBUTION
1.1
Skewed and symmetrical distributions
By now, you should be able to calculate averages and measures of dispersion in order to
describe any data collected. However, it is still possible to have three sets of data with the
same mean and standard deviation, but completely differently shaped distributions.
Suppose a company owns three petrol stations. The average weekly wages at each are shown
below:
Garage    Mean (R)    Standard Deviation (R)
A         180         10
B         180         10
C         180         10
The wages structure appears to be the same, but plots of the three frequency curves might
be as follows:
The three petrol stations clearly have very different wage structures, but this was not
apparent from the mean and standard deviation.
1.2
Normal distribution curves
The concept of a ‘normal’ distribution curve is a very important one in statistics. Whether we
find the average by the mean, the median or the mode, we need to know how the data that
make up the distribution under discussion are spread around the average chosen. Are they
symmetrically or asymmetrically (skew) arranged around the average? Even if they are
symmetrically arranged, are they widely spread or narrowly spread? For example, if we take
five pieces of data:
98   99   100   101   102
The average is clearly 100 (total 500 ÷ 5) and the data are closely clustered around the
average. By contrast, consider the following five pieces of data:
25   50   100   150   175
The average is still 100 (500 ÷ 5) but now the data are widely spread around the average.
Both sets of data are symmetrically distributed around the average, but the distributions are
far from alike. In any case of normal distribution the three averages must by definition
coincide. The arithmetic mean and the central item (the median) of a symmetrically arranged
distribution must coincide; while the fact that the mode is the most frequent item and is
always to be found at the high point of the curve will also mean that it is centrally placed.
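The two small data sets above can be compared with Python's standard statistics module. This is only an illustrative sketch; the variable names are made up, and the population standard deviation (pstdev) is used here as one reasonable measure of spread.

```python
from statistics import mean, median, pstdev

narrow = [98, 99, 100, 101, 102]   # closely clustered around 100
wide = [25, 50, 100, 150, 175]     # widely spread around 100

for name, data in (("narrow", narrow), ("wide", wide)):
    print(f"{name:>6}: mean = {mean(data)}, median = {median(data)}, "
          f"standard deviation = {pstdev(data):.2f}")
```

Both sets are symmetrical, so the mean and the median coincide at 100 in each case; only the standard deviation shows how differently the values are spread.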
Consider an experiment in which ten truly balanced coins are tossed into the air. We would
expect to have five heads and five tails, but in any actual experiment we might have other
results, such as 6:4 or 7:3, or, rather more rarely, 10:0. If we repeat this experiment 100
times, we might have results as shown in this table.
Number of heads    Frequency
0                  1
1                  3
2                  5
3                  10
4                  16
5                  24
6                  17
7                  11
8                  7
9                  3
10                 1
We could draw this set of data as a histogram (bar chart):
[Histogram: Heads and Tails Distribution. Horizontal axis: Number of Heads (0 to 10); vertical axis: Frequency (0 to 30)]
If we consider the kind of curve that will result from a very large number of experiments with
a large number of coins, we would expect a normal distribution curve.
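The frequencies we would expect in theory for the ten-coin experiment can be worked out by counting the ways of getting each number of heads. The sketch below is an illustration only (the function name is hypothetical); it uses math.comb to count the combinations and sets the result next to the observed frequencies from the table.

```python
from math import comb


def expected_frequencies(coins=10, repetitions=100):
    """Expected number of experiments showing 0, 1, ..., `coins` heads."""
    total_outcomes = 2 ** coins  # each fair coin has two equally likely faces
    return [repetitions * comb(coins, heads) / total_outcomes
            for heads in range(coins + 1)]


observed = [1, 3, 5, 10, 16, 24, 17, 11, 7, 3, 1]  # frequencies from the table

for heads, (obs, exp) in enumerate(zip(observed, expected_frequencies())):
    print(f"{heads:>2} heads: observed {obs:>2}, expected {exp:5.1f}")
```

The expected values rise to a peak at five heads and fall away symmetrically on either side, which is the bell shape the text goes on to describe.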
1.3
Characteristics of the normal curve
The characteristics of a normal curve are the following:

It is bell-shaped.

It is symmetrical about the mean.

It extends indefinitely in both directions, but in practice it is indistinguishable from the
horizontal axis once we get more than three standard deviations either side of the
mean.

The parts of the curve which approach the horizontal axis are called the tails.

If we know the mean and the standard deviation of the curve, the curve is
completely determined mathematically.
A large number of 1 kg bags of sugar are weighed to check how accurately they have been
filled, and the results are shown in the histogram below:
[Histogram: Weight Distribution of Sugar Bags. Horizontal axis: Weight (kg), from 0.945 to 1.055 in steps of 0.005; vertical axis: Frequency (0 to 25)]
This histogram is similar to the histograms you might obtain for a variety of different types of
data, such as heights or weight of learners, analysis of a chemical component, thickness of
paper, strength of tissue, etc. The most important features of this histogram are:

It is (almost) symmetrical about the mean.

There are more values close to the mean than further away from the mean.
As the values of σ (standard deviation) and μ (mean) vary, so the shape of the curve will
change, although it will always retain its characteristic bell shape.
In each example the area under the curve is equal to one. This follows from the results of the
previous chapter, where the probabilities of all possible outcomes add up to 1. The ‘peak’
occurs at the mean μ, while the value of the standard deviation, σ, determines the spread of
the curve: tall and narrow or wide and flat.
It is true, however, that for all these different normal curves almost all (99,7%) of the values
lie within ± 3 standard deviations of the mean, approximately 95% of all values lie
within ± 2 standard deviations of the mean, and approximately 68% of all values lie
within ± 1 standard deviation of the mean.
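These percentages can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module (available from Python 3.8); the function name share_within and the example values of μ and σ are assumptions chosen purely for illustration.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within ± {k} standard deviations: {share_within(k):.1%}")

# The share does not depend on the particular mean and standard deviation
# (the values 180 and 10 are made-up example numbers).
print(f"{share_within(2, mu=180, sigma=10):.1%}")
```

The cdf method gives the area under the curve to the left of a value, so subtracting two such areas gives the area, and therefore the probability, between them.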
As a normal distribution curve is continuous, the probability that x lies between a and b can
be calculated by finding the area under the curve between the values a and b.
The probability is written as P(a < x < b).
The area under the normal distribution curve can be calculated by converting a distribution
into a “standard normal distribution”. The area (and therefore the probability) for the
standard normal curve is given in normal distribution tables.
The standard normal distribution is the special normal distribution where the mean, μ = 0 and
the standard deviation, σ = 1.
The curve shows the standard normal distribution with its mean (μ = 0) in the middle
and a range of ± 3σ.
All normal distributions can be converted into the standard normal distribution by a process
known as standardising. In order to use the standard normal tables it is necessary to
convert a distribution to a standard normal distribution. This is done by calculating the value
of z using the formula:
$$ z = \frac{X - \mu}{\sigma} $$
One only needs to know the mean and standard deviation of a sample to be able to convert
any value into a value on the standard normal curve. This applies equally to lengths and
weights of learners, weight of sugar bags, chlorine concentration in water, strength and
thickness of paper, etc.
The average basis weight for imported 80 g/m2 paper is in actual fact 81,1 g/m2, with a
standard deviation of 1,3 g/m2. Calculate z for basis weights of 79 g/m2, 82 g/m2 and 83
g/m2. Also determine the z value for the mean basis paper weight of 81,1 g/m2.
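$$ z = \frac{X - \mu}{\sigma} = \frac{79 - 81{,}1}{1{,}3} = -1{,}62 $$
$$ z = \frac{X - \mu}{\sigma} = \frac{82 - 81{,}1}{1{,}3} = 0{,}69 $$
$$ z = \frac{X - \mu}{\sigma} = \frac{83 - 81{,}1}{1{,}3} = 1{,}46 $$
$$ z = \frac{X - \mu}{\sigma} = \frac{81{,}1 - 81{,}1}{1{,}3} = 0 $$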
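The same four z values can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name z_score and the constant names are assumptions, while the mean (81,1 g/m2) and standard deviation (1,3 g/m2) are taken from the example.

```python
def z_score(x, mean, std_dev):
    """Standardise x: how many standard deviations it lies from the mean."""
    return (x - mean) / std_dev


MEAN_BASIS_WEIGHT = 81.1  # g/m2, from the example
STD_DEV = 1.3             # g/m2, from the example

# Print the z value for each basis weight in the example.
for weight in (79, 82, 83, 81.1):
    z = z_score(weight, MEAN_BASIS_WEIGHT, STD_DEV)
    print(f"{weight} g/m2 -> z = {z:.2f}")
```

A negative z value means the basis weight lies below the mean, a positive value means it lies above.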
So what is the use of this conversion? Let us first investigate the meaning of the two results
we obtained by finding these values on the standardised normal curve:
On its own, a z value gives at least a general feeling of how far a value is from the mean
(in terms of the standard deviation). However, from the standard normal tables one can
obtain the cumulative area under the curve, which is also an indication of the probability that a
value would fall within this range. The area (and therefore probability) is written as:
Φ(z) = P(Z < z)
This indicates that the standard normal curve tables always give the probability of a value less
than z, in other words, the area to the left of z on the curve.
Reading from the table is done by finding the first decimal of the z value in the left-hand column
and the second decimal in the top row, and then reading the corresponding Φ(z) value from the
body of the table.
Find Φ(z) for z = 0,58, z = 0,46 and z = 0,03 by reading the values from the table:
Φ(0,58) = 0,71904 = 71,904%
(0,46) = 0,66724 = 66,724%
(0,03) = 0,51197 = 51,197%
The next step is to work through the whole process in a single example.
For the basis weight problem in the previous example, z values were calculated for basis
weights of 79 g/m2, 82 g/m2, 83 g/m2 and 81,1 g/m2. Using these z values, determine the
following probabilities:
a) The imported paper has a basis weight of less than 81,1 g/m2.
b) The imported paper has a basis weight of less than 83 g/m2.
c) The imported paper has a basis weight of less than 79 g/m2.
d) The imported paper has a basis weight between 79 and 83 g/m2.
From the standard normal tables one can obtain P(Z < z):
a) For X = 81,1 g/m2, z = 0,00. From the table P(Z < 0,00) = 0,50 = 50%. It is to be
expected that 50% of the values would be smaller than the average!
b) For X = 83 g/m2, z = 1,46. From the table P(Z < 1,46) = 0,92785 = 92,785%.
c) For X = 79 g/m2, z = –1,62. Since the table gives no probabilities for negative
values of z, the probability has to be calculated as follows:
P(Z < z) = 1 – P(Z < |z|), which in this case is: P(Z < –1,62) = 1 – P(Z < 1,62)
where |z| is the positive value of z, which is the value looked up in the table.
Therefore, from the table, P(Z < –1,62) = 1 – P(Z < 1,62) = 1 – 0,94738 = 0,05262 = 5,262%
d) For the imported paper to have a basis weight between 79 and 83 g/m2, the area under the
standard normal curve between z = –1,62 and z = 1,46 needs to be determined.
The probability is therefore:
P(Z < 1,46) – P(Z < -1,62) = 92,785% - 5,262% = 87,523%
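The four answers can be cross-checked without the table. The following is a hedged sketch: the names phi and z_score are assumptions, and the z values are rounded to two decimals only to match the way the table is used in the example.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_score(x, mean, std_dev):
    """Standardised value, rounded to two decimals as when using the table."""
    return round((x - mean) / std_dev, 2)


MEAN, STD_DEV = 81.1, 1.3  # g/m2, from the example

p_a = phi(z_score(81.1, MEAN, STD_DEV))  # a) less than 81,1 g/m2
p_b = phi(z_score(83, MEAN, STD_DEV))    # b) less than 83 g/m2
p_c = phi(z_score(79, MEAN, STD_DEV))    # c) less than 79 g/m2
p_d = p_b - p_c                          # d) between 79 and 83 g/m2

for label, p in (("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)):
    print(f"{label}) {p:.5f}")
```

Note that the computed function handles negative z directly, so the symmetry step needed for the printed table is not required here.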
Find the area under the standard normal curve:
a. between 1 and 2 standard deviations from the mean
b. between 0,5 and 1,5 standard deviations from the mean.
Solution:
a. For z = 2 the table indicates Φ(2) = 0,97725, and for z = 1, Φ(1) = 0,84134.
As explained before, the area between z = 1 and z = 2 is given as Φ(2) – Φ(1).
Φ(2) – Φ(1) = 0,97725 – 0,84134 = 0,13591 (13,591%)
b. For z = 1,5 the table indicates Φ(1,5) = 0,93319, and for z = 0,5, Φ(0,5) = 0,69146.
(1,5) – (0,5) = 0,93319 – 0,69146 = 0,24173 (24,173%)
Since the total area under the curve equals 1, we can from the above diagram that the area
under the normal curve for z > 1,5 we can simply find the area under the curve for z<1,5 and
subtract this value from 1.
(>1,5) = 1 – (1,5)
= 1 – 0,93319
= 0,06681
Find the area under the curve which is more than:
a. 1 standard deviation above the mean
b. 2,4 standard deviations above the mean
Solution:
a. First draw a sketch to show the area required.
P(Z > 1,0) = 1 – Φ(1,0)
= 1 – 0,84134
= 0,15866
b. First draw a sketch to show the area required.
P(Z > 2,4) = 1 – Φ(2,4) = 1 – 0,99180 = 0,00820
Note:
The table only gives values of Φ(z) for z > 0. One can work out values for z < 0 by
remembering that the normal curve is symmetrical about the mean (z = 0) and using:
Φ(z) = 1 – Φ(–z) for z < 0
Find the area under the curve that is less than:
a. 1 standard deviation below the mean
b. 1,75 standard deviations below the mean.
Solution:
a. First draw a sketch to show the area required.
As can be seen, this is the same as the area more than 1 standard deviation above the mean. The
formula used is:
Φ(–1,0) = 1 – Φ(1,0)
= 1 – 0,84134
= 0,15866
b. On the same basis as a. above:
Φ(–1,75) = 1 – Φ(1,75)
= 1 – 0,95994
= 0,04006
Exercise 6 – Normal distribution
1. Find the probability of an event occurring between 1 and 1,5 standard deviations from
the mean.
The area to the left of z = 1,5 is 0,93319 and the area to the left of z = 1 is 0,84134.
Therefore the area between z = 1 and z = 1,5 is:
P(1,0 < Z < 1,5) = Φ(1,5) – Φ(1,0)
From the tables: Φ(1,5) = 0,93319; Φ(1,0) = 0,84134
P(1,0 < Z < 1,5) = 0,93319 – 0,84134 = 0,09185
2. Find the probability of an event occurring which is more than 1,3 standard deviations
above the mean.
First draw a sketch to show the area required.
Shaded area = P(Z > 1,3) = 1 – Φ(1,3)
= 1 – 0,90320
= 0,09680
3. Find the area under the curve that is less than 2 standard deviations below the mean.
First draw a sketch to show the area required.
P(Z < –2,0) = 1 – Φ(2)
= 1 – 0,97725
= 0,02275
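The three answers to Exercise 6 can be verified with the NormalDist class from Python's standard statistics module (available from Python 3.8). This is offered as a checking aid only; the variable names are assumptions made for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# 1. Between 1 and 1,5 standard deviations above the mean.
answer_1 = standard_normal.cdf(1.5) - standard_normal.cdf(1.0)

# 2. More than 1,3 standard deviations above the mean.
answer_2 = 1.0 - standard_normal.cdf(1.3)

# 3. Less than 2 standard deviations below the mean.
answer_3 = standard_normal.cdf(-2.0)

print(f"{answer_1:.5f} {answer_2:.5f} {answer_3:.5f}")
```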
1.4
Skewed data
Although the normal distribution is an important theoretical distribution, it is unlikely to be
exactly met in real life. However, a number of sets of data tend towards normality, e.g.
educational data, sample quality control measurements, or some market research
information, particularly where the numbers concerned are very large. Most sets of data will
display some skewness, and the reader will see that this causes some separation between
the mean, the median and the mode. Since it is helpful, when describing any set of data, to
know where the ‘average’ is, it follows that our three averages (if they do not coincide, as they
would for a normal distribution) may give us some clues as to the extent of the skewness they
display. It is this observed difference between the averages which is used to measure skewness.
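One common way of turning the difference between the averages into a single number is Pearson's second skewness coefficient, 3 × (mean – median) / standard deviation. The guide does not name a specific measure, so this choice is an assumption, and the two small data sets below are hypothetical values invented for illustration.

```python
import statistics


def pearson_median_skewness(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std dev.

    Zero for symmetrical data, positive when the mean is pulled above the
    median (tail to the right), negative when it is pulled below.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    std_dev = statistics.pstdev(data)  # population standard deviation
    return 3 * (mean - median) / std_dev


symmetric = [1, 2, 3, 4, 5]              # hypothetical, mean equals median
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical, one large value

print(pearson_median_skewness(symmetric))
print(pearson_median_skewness(right_skewed))
```

The function assumes the data are not all identical, since the standard deviation would then be zero.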
2
SAMPLE SIZE AND REPRESENTATIVENESS
2.1
Sample representativeness
The choice of a sample in itself is a major statistical exercise, because the type of sample
must be balanced against the cost to collect it: If the sample choice is narrowed down to
lower the cost, it will not be as representative as a completely random sample. For example,
basing an enquiry about the family expenditure of South Africans on citizens that live in
Westville will be less representative of the total population than one based on the citizens of
the whole Durban area or further expanding it to all the people staying near the KwaZulu
Natal coast. However, interviewing a thousand people in Westville or Durban is relatively
easy, but interviewing a thousand in rural areas is not so easy, and will involve greater
travelling costs and much more time.
Ensuring that the sample is free from bias is not just a matter of avoiding obvious tendencies.
Bias of an unconscious nature must not be included either. For example, a survey on the
educational level of South Africans could not take its sample solely from Universities – it
should be fairly obvious that the degree of achievement would be grossly overstated.
However, if you conducted a survey by questioning the first five thousand people you
encountered in Smith Street, it would not necessarily be biased but it would not be
representative of the whole population either.
In order to overcome such problems, random sampling is used. It is important to realise that
although random sampling is a method of selecting a sample free of bias, it cannot
guarantee that the respondent who happened to be selected in the sample, is not biased –
the randomly selected respondent may still have very biased views on the matter in hand.
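A simple random sample gives every member of the population the same chance of being selected. The sketch below illustrates this with Python's standard random module; the population of household labels, the sample size and the seed are all hypothetical values chosen for this example.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: a list of 5000 numbered households.
population = [f"household_{i:04d}" for i in range(1, 5001)]

sample = simple_random_sample(population, 100, seed=2005)
print(sample[:5])
```

The selection method is free of bias, but, as noted above, this says nothing about the views of the respondents who happen to be drawn.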
A major problem in survey sampling is matching the studied population to the target
population. As an example, after an in-line sample has been analysed in a pulping plant’s
chemical recovery unit, it is not available for further analysis or investigation since it either
re-enters the system or is disposed of. Samples belonging to certain strata may not be
externally recognisable or extractable from their environment (e.g. certain polymers in a fibre
stream).
Sometimes there is a convenient list of groups in a population, which can then be used to
specify the sample that is sought, but this is not always the case – there is no list of people
who buy stationery.
2.2
Sample size
Before carrying out a test or survey it is often useful to be able to calculate the size of a
sample needed to give the mean of the population with a certain accuracy. Suppose the
mean height of a population was to be quoted to the nearest centimetre, with 95%
confidence. What size of sample is required?
The factors that affect the sample size are:

The variation between different people – this can be expressed as the standard
deviation. The standard deviation is therefore a function of the characteristics of
the specific parameter (height, weight, hair colour, etc.) and the specific population
under consideration (see graph below).

The confidence level that we require. To be 80% sure, a much smaller sample is
required than when you want to be 99% sure.

The limit within which the variation should be established. For example it would be
much more difficult to establish the average height to within 1 mm than to within
100 mm with 99% confidence.
2.3
Central limit theorem
The central limit theorem provides us with a method to determine sample size. In fact, it is
useful in a number of cases:

Given a confidence level (e.g. 99% certainty) and limit (e.g. within 10 cm), a required
sample size can be determined

Given confidence level (e.g. 99% certainty) and sample size (e.g. 30 respondents),
the limit can be determined

Given sample size (e.g. 30 respondents) and limit (e.g. within 10 cm) the confidence
level can be determined
To determine these parameters, the following equation is used:
$$ X - \mu = z \frac{\sigma}{\sqrt{n}} $$
X – μ = Limit (distance from the mean value)
μ = Mean
z = Standardised score from the standard normal distribution
σ = Standard deviation
n = Sample size
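Rearranging the equation above for each unknown in turn gives one form for each of the three cases listed. This is a plain algebraic rearrangement using the same symbols, shown here as a convenience rather than as part of the original guide:

$$ n = \left( \frac{z \, \sigma}{X - \mu} \right)^{2} \qquad \text{(sample size, given confidence level and limit)} $$
$$ X - \mu = z \frac{\sigma}{\sqrt{n}} \qquad \text{(limit, given confidence level and sample size)} $$
$$ z = \frac{(X - \mu) \sqrt{n}}{\sigma} \qquad \text{(confidence level via the tables, given sample size and limit)} $$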
The mean height of a population, with a standard deviation of 5 cm, is to be quoted to the
nearest centimetre, with 95% confidence. What size of sample is required?
Solution:
Since the height has to be quoted to the nearest centimetre, the limit is ± 0,5 cm from
the mean value. For a confidence level of 95%, the standardised score from the
standard normal distribution tables is z = 1,96 (Φ(z) = 0,975).
$$ 1{,}96 \times \frac{5}{\sqrt{n}} = 0{,}5 $$
$$ \sqrt{n} = \frac{1{,}96 \times 5}{0{,}5} $$
$$ \sqrt{n} = 19{,}6 $$
$$ n = 384{,}16 $$
Since the sample size must be an integer, a sample size of 385 is required to be within ± 0,5 cm.
From this example it can be seen that to achieve a small limit the required size of sample is
quite large. A balance must be kept between the size of the limit and the size of the sample.
If we enlarge the limit to within 4 cm (± 2 cm) the following sample size can be calculated:
1.96  5  2
n
1.96  5  n
2
4.9  n
24.0  n
This is clearly a much easier achievable target.
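Both sample-size calculations follow the same steps, so they can be captured in one short function. The sketch below is illustrative; the function name required_sample_size is an assumption, and the result is rounded up because a sample must contain a whole number of items.

```python
import math


def required_sample_size(z, std_dev, limit):
    """Smallest whole sample size n satisfying z * std_dev / sqrt(n) <= limit."""
    return math.ceil((z * std_dev / limit) ** 2)


# 95% confidence (z = 1,96), standard deviation 5 cm.
print(required_sample_size(1.96, 5, 0.5))  # limit of 0,5 cm either side
print(required_sample_size(1.96, 5, 2))    # limit of 2 cm either side
```

For the wider limit the formula gives n = 24,01, and rounding up to the next whole number gives 25.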
The mean fuel consumption of a fleet of 9 Toyota taxis has been determined by the owner,
Mr Twala, as 10,5 km/l with a standard deviation of 1 km/l. He claims that the economy of
similar Toyotas should “definitely” be between 10,0 and 11,0 km/l. What is the actual
confidence level of his determination?
Solution:
The consumption limit claimed is actually ± 0,5 km/l from the mean value.
$$ X - \mu = z \frac{\sigma}{\sqrt{n}} $$
$$ 0{,}5 = z \times \frac{1{,}0}{\sqrt{9}} $$
$$ z = \frac{0{,}5 \times \sqrt{9}}{1{,}0} = 1{,}5 $$
From the tables it can be seen that:
Φ(1,5) = 0,93319
The top and bottom tails are therefore each:
P(Z > 1,5) = 1 – 0,93319 = 0,06681
The top and bottom tails together are therefore:
0,06681 × 2 = 0,13362
The probability of the value being between 10 km/l and 11 km/l is therefore:
P(–1,5 < Z < 1,5) = 1 – 0,13362 = 0,86638 or 86,638%
Mr Twala is therefore a bit optimistic in using the word “definitely”. He would be more
accurate if he said that he is 86,638% sure that similar Toyotas will achieve between 10 and
11 km/l!
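The steps of this solution (find z from the limit, look up the area, remove both tails) can be sketched in Python as follows. The function names are assumptions for this illustration, and the area is computed from the error function rather than read from the table.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confidence_level(limit, std_dev, sample_size):
    """Confidence that the value lies within +/- limit of the mean."""
    z = limit * math.sqrt(sample_size) / std_dev
    one_tail = 1.0 - phi(z)      # area beyond the limit on one side
    return 1.0 - 2.0 * one_tail  # remove both tails from the total area


# Mr Twala: limit 0,5 km/l, standard deviation 1,0 km/l, 9 taxis.
print(f"{confidence_level(0.5, 1.0, 9):.2%}")
```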
Exercise 7 – Sample size
Find the size of sample required at the:
a) 90%
b) 95%
confidence level for the following distributions and intervals.
σ = 15 cm, interval ± 8 cm
1. a)
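$$ X - \mu = z \frac{\sigma}{\sqrt{n}} $$
$$ 8 = 1{,}645 \times \frac{15}{\sqrt{n}} $$
$$ \sqrt{n} = \frac{15 \times 1{,}645}{8} = 3{,}084 $$
$$ \therefore n = 9{,}51 $$
Since one cannot take 9,51 samples, n ≥ 10
b)
$$ X - \mu = z \frac{\sigma}{\sqrt{n}} $$
$$ 8 = 1{,}96 \times \frac{15}{\sqrt{n}} $$
$$ \sqrt{n} = \frac{15 \times 1{,}96}{8} = 3{,}675 $$
$$ \therefore n = 13{,}51 $$
Since one cannot take 13,51 samples, n ≥ 14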
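The z values 1,645 and 1,96 used above correspond to 90% and 95% confidence with the excluded area split equally between the two tails. The sketch below, which is an illustration and not part of the guide, derives z from the confidence level with NormalDist.inv_cdf from Python's standard statistics module (Python 3.8 or later) and then applies the sample-size formula. The function names are assumptions.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """z value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)


def required_sample_size(confidence, std_dev, limit):
    """Smallest whole sample size for the given confidence level and limit."""
    z = z_for_confidence(confidence)
    return math.ceil((z * std_dev / limit) ** 2)


# Question 1: standard deviation 15 cm, interval of 8 cm either side.
for confidence in (0.90, 0.95):
    n = required_sample_size(confidence, 15, 8)
    print(f"{confidence:.0%}: z = {z_for_confidence(confidence):.3f}, n >= {n}")
```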
σ = 25 cm, interval ± 20 cm
2. a) n ≥ 67,2, ∴ n ≥ 68
b) n ≥ 96,04, ∴ n ≥ 97
σ = 12 cm, interval ± 0,5 cm
3. a) n ≥ 4,2, ∴ n ≥ 5
b) n ≥ 6,0025, ∴ n ≥ 7
σ = 50 cm, interval ± 10 cm
4. a) n ≥ 1549,2, ∴ n ≥ 1550
b) n ≥ 2212,76, ∴ n ≥ 2213
σ = 0,5 cm, interval ± 0,1 cm
5. a) n ≥ 67,62, ∴ n ≥ 68
b) n ≥ 96,04, ∴ n ≥ 97
σ = 22,5 cm, interval ± 8 cm
6. a) n ≥ 67,65, ∴ n ≥ 68
b) n ≥ 96,04, ∴ n ≥ 97
σ = 1000 cm, interval ± 50 cm
7. a) n ≥ 21,405, ∴ n ≥ 22
b) n ≥ 30,39, ∴ n ≥ 31
σ = 0,1 cm, interval ± 0,001 cm
8. a) n ≥ 1082,41, ∴ n ≥ 1083
b) n ≥ 1536,6, ∴ n ≥ 1537
σ = 3,4 cm, interval ± 0,25 cm
9. a) n ≥ 27060,25, ∴ n ≥ 27061
b) n ≥ 38416, ∴ n ≥ 38416
3
STATISTICS IN THE MEDIA
Sometimes statistics are misrepresented to imply a much better (or worse) situation than is
really the case. In this section we will consider a few techniques used to create misleading
statistics.
3.1
Misleading statistics
3.1.1
Axes’ scales
The starting point of the scale on one or both axes can be changed. Diagrams A and B show
the South African budgeted expenditure on education since 1999. Which is the misleading
display? Discuss.
Diagram A:
[Chart: "Education Budget". Vertical axis: Budget (R billion), scaled from 0 to 50. Horizontal axis: years 1999 to 2004.]
Diagram B:
[Chart: "Education Budget". Vertical axis: Budget (R billion), scaled from 25 to 50. Horizontal axis: years 1999 to 2004.]
The spacing of the scale on one or both axes can also be changed. Diagrams C and D show
the price of 1 US dollar in rand from 1979 to 2002. Which is the misleading
display? Discuss.
Diagram C:
[Chart: "Average Rand / US Dollar Exchange Rate (1979 - 2002)". Vertical axis: ZAR / US$, scaled from 0 to 12. Horizontal axis: every year from 1979 to 2002.]
Diagram D:
[Chart: "Average Rand / US Dollar Exchange Rate (1979 - 2002)". Vertical axis: ZAR / US$, scaled from 0 to 12. Horizontal axis: every second year from 1979 to 2001.]
3.1.2
Perspective
Perspective in 3D can be misused to make things seem larger or smaller than is really the
case. The pie charts below display the percentage owned by shareholders in a company.
Why can the second display be considered misleading?
3.1.3
Size
Area or volume can be misused. These displays show the increase in turnover in the
company from 2001 to 2004. Are any of them misleading?
Exercise 8 – Displaying statistics
The table below shows a dramatic increase in the number of women diagnosed with
HIV each year. In 1990, females accounted for just 15% of HIV diagnoses, but in
2004 that figure was 43%.
1. Draw a graph of your choice to illustrate the findings for 1990 up to 2004.
2. Is there any noticeable trend in your graph?
3. Discuss the decline in 2004 – do you think it is real or is it a statistical or
administrative change in measurement?
4. What is your forecast of what would happen in the future?
HIV infected individuals and AIDS cases by year of diagnosis and sex

                     HIV                         AIDS
Year of diagnosis    Male    Female   Total*     Male    Female   Total
1989 or earlier      12892   1319     14238      3398    155      3553
1990                 2177    371      2553       1145    97       1242
1991                 2278    450      2728       1253    138      1391
1992                 2202    541      2744       1405    173      1578
1993                 2101    533      2635       1551    238      1789
1994                 2042    534      2577       1626    227      1853
1995                 2085    572      2657       1487    284      1771
1996                 2117    587      2704       1172    272      1444
1997                 2093    669      2763       863     217      1080
1998                 2086    757      2844       593     195      788
1999                 2154    943      3099       564     192      756
2000                 2482    1390     3872       586     245      831
2001                 3108    1959     5068       487     240      727
2002                 3575    2636     6211       548     319      867
2003                 3980    3156     7136       496     390      886
2004                 3681    2721     6403       395     303      698
Total                51053   19138    70232      17569   3685     21254
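The percentages quoted in the exercise (15% in 1990 and 43% in 2004) can be recovered from the HIV columns of the table above. The sketch below assumes that the share is taken over the male and female counts only, rather than over the Total* column; that assumption, and the function name female_share, are introduced for this illustration.

```python
def female_share(female, male):
    """Female diagnoses as a fraction of male plus female diagnoses."""
    return female / (female + male)


# HIV diagnoses from the table: (female, male) for each year.
hiv_1990 = (371, 2177)
hiv_2004 = (2721, 3681)

print(f"1990: {female_share(*hiv_1990):.1%}")
print(f"2004: {female_share(*hiv_2004):.1%}")
```

Comparing shares rather than raw counts is one way to see the change in the mix of diagnoses independently of the rise in the overall total.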
Solution
1. [Chart: "HIV and AIDS in South Africa". Vertical axis: Cases Diagnosed, scaled from 0 to 4500. Horizontal axis: years 1990 to 2004. Series: HIV Male, HIV Female, AIDS Male, AIDS Female.]
2. Sharp increase from 1998 to 2003 followed by a decline in all cases in 2004.
3. It is important to investigate the results to determine the reason for the changes.
4. The HIV cases will in future convert to AIDS, with the result that AIDS cases may well
increase due to the high number of HIV diagnoses in the late 1990s and early 2000s. The
upward trend in HIV has hopefully been broken, and HIV diagnoses may decline to a low
plateau in future.
Annexure A – Normal distribution table
z     0        0,01     0,02     0,03     0,04     0,05     0,06     0,07     0,08     0,09
0,0   0,50000  0,50399  0,50798  0,51197  0,51595  0,51994  0,52392  0,52790  0,53188  0,53586
0,1   0,53983  0,54380  0,54776  0,55172  0,55567  0,55962  0,56356  0,56749  0,57142  0,57535
0,2   0,57926  0,58317  0,58706  0,59095  0,59483  0,59871  0,60257  0,60642  0,61026  0,61409
0,3   0,61791  0,62172  0,62552  0,62930  0,63307  0,63683  0,64058  0,64431  0,64803  0,65173
0,4   0,65542  0,65910  0,66276  0,66640  0,67003  0,67364  0,67724  0,68082  0,68439  0,68793
0,5   0,69146  0,69497  0,69847  0,70194  0,70540  0,70884  0,71226  0,71566  0,71904  0,72240
0,6   0,72575  0,72907  0,73237  0,73565  0,73891  0,74215  0,74537  0,74857  0,75175  0,75490
0,7   0,75804  0,76115  0,76424  0,76730  0,77035  0,77337  0,77637  0,77935  0,78230  0,78524
0,8   0,78814  0,79103  0,79389  0,79673  0,79955  0,80234  0,80511  0,80785  0,81057  0,81327
0,9   0,81594  0,81859  0,82121  0,82381  0,82639  0,82894  0,83147  0,83398  0,83646  0,83891
1,0   0,84134  0,84375  0,84614  0,84849  0,85083  0,85314  0,85543  0,85769  0,85993  0,86214
1,1   0,86433  0,86650  0,86864  0,87076  0,87286  0,87493  0,87698  0,87900  0,88100  0,88298
1,2   0,88493  0,88686  0,88877  0,89065  0,89251  0,89435  0,89617  0,89796  0,89973  0,90147
1,3   0,90320  0,90490  0,90658  0,90824  0,90988  0,91149  0,91309  0,91466  0,91621  0,91774
1,4   0,91924  0,92073  0,92220  0,92364  0,92507  0,92647  0,92785  0,92922  0,93056  0,93189
1,5   0,93319  0,93448  0,93574  0,93699  0,93822  0,93943  0,94062  0,94179  0,94296  0,94408
1,6   0,94520  0,94630  0,94738  0,94845  0,94950  0,95053  0,95154  0,95254  0,95352  0,95449
1,7   0,95543  0,95637  0,95728  0,95818  0,95907  0,95994  0,96080  0,96164  0,96246  0,96327
1,8   0,96407  0,96485  0,96562  0,96638  0,96712  0,96784  0,96856  0,96926  0,96995  0,97062
1,9   0,97128  0,97193  0,97257  0,97320  0,97381  0,97441  0,97500  0,97558  0,97615  0,97670
2,0   0,97725  0,97778  0,97831  0,97882  0,97932  0,97982  0,98030  0,98077  0,98124  0,98169
2,1   0,98214  0,98257  0,98300  0,98341  0,98382  0,98422  0,98461  0,98500  0,98537  0,98574
2,2   0,98610  0,98645  0,98679  0,98713  0,98745  0,98778  0,98809  0,98840  0,98870  0,98899
2,3   0,98928  0,98956  0,98983  0,99010  0,99036  0,99061  0,99086  0,99111  0,99134  0,99158
2,4   0,99180  0,99202  0,99224  0,99245  0,99266  0,99286  0,99305  0,99324  0,99343  0,99361
2,5   0,99379  0,99396  0,99413  0,99430  0,99446  0,99461  0,99477  0,99492  0,99506  0,99520
2,6   0,99534  0,99547  0,99560  0,99573  0,99585  0,99598  0,99609  0,99621  0,99632  0,99643
2,7   0,99653  0,99664  0,99674  0,99683  0,99693  0,99702  0,99711  0,99720  0,99728  0,99736
2,8   0,99744  0,99752  0,99760  0,99767  0,99774  0,99781  0,99788  0,99795  0,99801  0,99807
2,9   0,99813  0,99819  0,99825  0,99831  0,99836  0,99841  0,99846  0,99851  0,99856  0,99861
3,0   0,99865  0,99869  0,99874  0,99878  0,99882  0,99886  0,99889  0,99893  0,99896  0,99900
3,1   0,99903  0,99906  0,99910  0,99913  0,99916  0,99918  0,99921  0,99924  0,99926  0,99929
3,2   0,99931  0,99934  0,99936  0,99938  0,99940  0,99942  0,99944  0,99946  0,99948  0,99950
3,3   0,99952  0,99953  0,99955  0,99957  0,99958  0,99960  0,99961  0,99962  0,99964  0,99965
3,4   0,99966  0,99968  0,99969  0,99970  0,99971  0,99972  0,99973  0,99974  0,99975  0,99976
3,5   0,99977  0,99978  0,99978  0,99979  0,99980  0,99981  0,99981  0,99982  0,99983  0,99983
3,6   0,99984  0,99985  0,99985  0,99986  0,99986  0,99987  0,99987  0,99988  0,99988  0,99989
3,7   0,99989  0,99990  0,99990  0,99990  0,99991  0,99991  0,99992  0,99992  0,99992  0,99992
3,8   0,99993  0,99993  0,99993  0,99994  0,99994  0,99994  0,99994  0,99995  0,99995  0,99995
3,9   0,99995  0,99995  0,99996  0,99996  0,99996  0,99996  0,99996  0,99996  0,99997  0,99997