Download econs 2 - unimaid.edu.ng

UNIVERSITY OF MAIDUGURI
Maiduguri, Nigeria
CENTRE FOR DISTANCE LEARNING
MANAGEMENT SCIENCES
ECON 103: INTRODUCTION TO STATISTICS II
UNIT: 3
Published 2006 ©
All rights reserved. No part of this work may be reproduced in any
form, by mimeograph or any other means without prior permission in
writing from the University of Maiduguri.
This text forms part of the learning package for the academic
programme of the Centre for Distance Learning, University of
Maiduguri.
Further enquiries should be directed to the:
Coordinator
Centre for Distance Learning
University of Maiduguri
P. M. B. 1069
Maiduguri, Nigeria.
This text is being published by the authority of the Senate, University
of Maiduguri, Maiduguri – Nigeria.
ISBN: 978-8133-50-9
CDL, University of Maiduguri, Maiduguri
PREFACE
This study unit has been prepared for learners so that they can do
most of the study on their own. The structure of the study unit is
different from that of a conventional textbook. The course writers have
made efforts to make the study material rich enough, but learners need
to do some extra reading to further enrich the knowledge
required.
The learners are expected to make best use of library facilities and
where feasible, use the Internet. References are provided to guide the
selection of reading materials required.
The University expresses its profound gratitude to our course writers
and editors for making this possible. Their efforts will no doubt help
in improving access to University education.
Professor J. D. Amin
Vice-Chancellor
HOW TO STUDY THE UNIT
You are welcome to this study unit. The unit is arranged to
simplify your study. Each topic of the unit has an introduction,
objectives, in-text, summary and self-assessment exercise.
The study unit should take 6-8 hours to complete. Tutors will be
available at designated contact centers for tutorials. The center expects
you to plan your work well. Should you wish to read further, you could
supplement the study with more information from the list of
references and suggested readings available in the study unit.
PRACTICE EXERCISES/TESTS
1. Self-Assessment Exercises (SAEs)
These are provided at the end of each topic. The exercises can help
you to assess whether or not you have actually studied and understood
the topic. Solutions to the exercises are provided at the end of the
study unit for you to assess yourself.
2. Tutor-Marked Assignment (TMA)
This is provided at the end of the study unit. It consists of
examination-type questions for you to answer and send to the center.
You are expected to work on your own in responding to the
assignments. The TMA forms part of your continuous assessment
(C.A.) scores, which will be marked and returned to you. In addition,
you will also write an end-of-semester examination, which will be
added to your TMA scores.
Finally, the center wishes you success as you go through the
different units of your study.
ECON 103: INTRODUCTION TO STATISTICS II
Unit: 2
INTRODUCTION TO THE COURSE
In this study unit we shall cover four topics. Each topic
discusses a particular aspect of the course. However, it should be noted that
there are relationships among the topics.
The topics that we shall cover in this study unit are given below:
1. Statistical Notation;
2. Measures of Central Tendency;
3. Measures of Dispersion; and
4. Measures of Skewness and Kurtosis
ECON 103: INTRODUCTION TO STATISTICS II UNIT: 2
TABLE OF CONTENTS
PREFACE
HOW TO STUDY THE UNIT
INTRODUCTION TO STUDY UNIT 2
TOPIC
1. STATISTICAL NOTATIONS
2. MEASURES OF CENTRAL TENDENCY
3. MEASURES OF DISPERSION
4. MEASURES OF SKEWNESS AND KURTOSIS
SOLUTION TO EXERCISES
TOPIC 1: TABLE OF CONTENTS
1.0 TOPIC 1: STATISTICAL NOTATIONS
1.1 INTRODUCTION
1.2 OBJECTIVES
1.3 IN-TEXT
1.3.1 SUBSCRIPT OR INDEX
1.3.2 SYMBOLS
1.4 SUMMARY
1.5 SELF-ASSESSMENT EXERCISE (SAE)
1.6 REFERENCES
1.7 SUGGESTED READINGS
1.0 TOPIC 1: STATISTICAL NOTATIONS
1.1 INTRODUCTION
Subscripts and symbols are frequently used in the field of
statistics. These notations help to condense a large amount of
information on a series of data into a reduced form.
1.2 OBJECTIVES
At the end of this topic you should be able to:
i. Provide a reduced form of a series of data.
ii. Assign a definite value to the series by summing up or
multiplying the scores contained in it.
iii. Calculate, compute and evaluate using symbols/notations.
1.3 IN-TEXT
1.3.1 SUBSCRIPT OR INDEX
We use a subscript or index to distinguish between the scores of a
series of data and to isolate some from others based on our interest. Let
$X_i$ (read "X subscript i") denote any of the k values $x_1, x_2, \ldots, x_k$
assumed by X, the variable of interest. In the expression $X_i$, i stands for
any of the numbers 1, 2, 3, ..., k; it is called the subscript or
index.
E.g.: In a series X1, X2, X3, X4, X5, i = 1, 2, 3, 4, 5.
At
X1, i = 1
X2, i = 2
X3, i = 3
X4, i = 4
X5, i = 5
1.3.2 SYMBOLS
Symbols are generally used to quantify statistical information.
Here, after the summation or multiplication of the score, a definite
value is assigned to the series under consideration. Let ∑ (sigma) be
interpreted as the summation symbol of all scores or observations in
the series. This is generally written as follows:
$$\sum_{i=1}^{k} X_i = X_1 + X_2 + X_3 + \cdots + X_k$$
E.g.: In the case of our series with 5 observations we can apply
the formula and rewrite the information as:
$$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$$
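Given this, the following also hold:
$$\sum_{i=1}^{k} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_k y_k$$
$$\sum_{i=1}^{k} a x_i = a x_1 + a x_2 + \cdots + a x_k = a (x_1 + x_2 + \cdots + x_k) = a \sum_{i=1}^{k} x_i \quad \text{(a is a constant number)}$$
$$\sum_{i=1}^{k} \sum_{j=1}^{k} x_i x_j = \left( \sum_{i=1}^{k} x_i \right)^2 = \sum_{i=1}^{k} x_i^2 + 2 \sum_{i<j} x_i x_j,$$
where the $x_j$ are the same observations as the $x_i$.
$$\sum_{i=1}^{k} (a x_i + b y_i - c z_i) = a \sum_{i=1}^{k} x_i + b \sum_{i=1}^{k} y_i - c \sum_{i=1}^{k} z_i,$$
if a, b and c are all constants.
$$\sum_{i=1}^{k} x_i \cdot x_i = \sum_{i=1}^{k} x_i^2 \neq \left( \sum_{i=1}^{k} x_i \right)^2$$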
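The summation rules above can be checked numerically. The following is a minimal sketch in Python; the data values and the constant are illustrative assumptions, not taken from the text.

```python
# Illustrative data (assumed for this sketch).
x = [4, 1, 3]
a = 6  # a constant number

# Rule: the sum of a*x_i equals a times the sum of x_i.
lhs = sum(a * xi for xi in x)
rhs = a * sum(x)

# Rule: the double sum over all pairs (i, j) equals the square of the sum.
double_sum = sum(xi * xj for xi in x for xj in x)
square_of_sum = sum(x) ** 2

# Rule: square of the sum = sum of squares + 2 * (sum over pairs with i < j).
sum_of_squares = sum(xi ** 2 for xi in x)
cross_terms = sum(x[i] * x[j] for i in range(len(x)) for j in range(i + 1, len(x)))

print(lhs == rhs)
print(double_sum == square_of_sum == sum_of_squares + 2 * cross_terms)
print(sum_of_squares != square_of_sum)  # sum of squares differs from square of sum
```

Integer data are used so that every comparison is exact.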
The symbol ∏ (capital pi) is used in the same fashion to denote
the product of all scores or observations in the series. This is written
as follows:
$$\prod_{i=1}^{k} x_i = x_1 \cdot x_2 \cdot x_3 \cdots x_k$$
E.g.: Considering our former example with 5 observations, we
have:
$$\prod_{i=1}^{5} x_i = x_1 \cdot x_2 \cdot x_3 \cdot x_4 \cdot x_5$$
1.4 SUMMARY
The topic has attempted to distinguish between subscripts and
symbols. In addition, it has succeeded in combining the two in order
to quantify statistical information.
1.5 SELF-ASSESSMENT EXERCISE (SAE)
1. Given the following series of observations, where $x_i$ = 2, 3, 1, 5, 4, 2, compute:
a. $\sum_{i=1}^{6} x_i$, $\sum_{i=1}^{6} x_i^2$
b. $\sum_{i=3}^{6} x_i$
c. $\prod_{i=2}^{5} x_i$
d. $\prod_{i=3}^{5} x_i$
2. Consider the following values of $x_j$, where $x_i = x_j$:
a.
i:     1    2    3    4
x_i:   3   -7    9   -1
b. Compute: $\left( \sum_{i=1}^{K} x_i \right)^2$, $\sum_{i=1}^{K} (x_i)^2$
c. Calculate $\sum_{i=1}^{K} \sum_{j=1}^{K} (-1) x_i x_j$
d. Evaluate $\sum_{i=1}^{K} 6 x_i$
1.6 REFERENCES
Walpole, R. E. (1982) Introduction to Statistics, 3rd Edition, New
York: Macmillan Publishing Co., Chap. 1
1.7 SUGGESTED READINGS
Spiegel, M. R. and Stephens, L. J. (1999), Schaum’s Outline of
Theory and Problems of Statistics, New York, London:
McGraw Hill, chap. 3
TOPIC 2: TABLE OF CONTENTS
2.0 TOPIC 2: MEASURES OF CENTRAL TENDENCY
2.1 INTRODUCTION
2.2 OBJECTIVES
2.3 IN-TEXT
2.3.1 MEAN
2.3.2 MODE
2.3.3 MEDIAN
2.3.4 GEOMETRIC AND HARMONIC MEAN
2.4 SUMMARY
2.5 SELF-ASSESSMENT EXERCISE (SAE)
2.6 REFERENCES
2.7 SUGGESTED READINGS
2.0 TOPIC 2: MEASURES OF CENTRAL TENDENCY
2.1 INTRODUCTION
Measures of central tendency are also known as measures of
location. These measures describe how centrally placed or
representative a particular value is among the scores of a series of
data. The mean, mode and median are the most common measures of
central tendency.
2.2 OBJECTIVES
At the end of the topic you should be able to:
i. Define the mean, mode and median.
ii. Compute them for both the discrete and continuous
variables of a series.
iii. Use graphs to determine the mode and median of a
distribution.
2.3 IN-TEXT
2.3.1 THE MEAN OR ARITHMETIC MEAN
The mean is the average value of the scores or observations
occurring in a set of data. It is calculated by summing all the values
assigned to the scores and dividing by the total number of observations in
the series. We distinguish two cases: the simple arithmetic mean and
the weighted arithmetic mean. The simple arithmetic mean of a series of
data with k observations $x_1, x_2, x_3, \ldots, x_k$ is given as:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_k}{N} = \frac{\sum_{i=1}^{k} x_i}{N} \ \text{(reduced form)} = \frac{1}{N} \sum_{i=1}^{k} x_i$$
For the sake of convenience and neatness in the expression, we
may omit the lower and upper limits of the summation sign ∑ and
write:
$$\bar{x} = \frac{1}{N} \sum x_i,$$
where
$x_i$: observation, with i = 1, 2, ..., k
N: total number of observations in the series
In the case of the weighted (classified or grouped) arithmetic mean,
every $x_i$ is associated with a corresponding frequency $f_i$, such that
$x_1, x_2, x_3, \ldots, x_k$ have frequencies $f_1, f_2, f_3, \ldots, f_k$. The weighted
arithmetic mean is thus given as:
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum f_i x_i}{\sum f_i}$$
NUMERICAL APPLICATION:
Given the following series of data 3, 6, 9, 8, 4 of a discrete
variable x, compute the simple arithmetic mean.
$$\bar{x} = \frac{1}{N} \sum x_i = \frac{3 + 6 + 9 + 8 + 4}{5} = \frac{30}{5} = 6$$
Assume that the observations are associated with the
corresponding frequencies 1, 3, 2, 5 and 7. Then the arithmetic mean
becomes:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1(3) + 3(6) + 2(9) + 5(8) + 7(4)}{1 + 3 + 2 + 5 + 7} = \frac{3 + 18 + 18 + 40 + 28}{18} = \frac{107}{18} = 5.94$$
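The numerical application above can be reproduced in a few lines. This is a minimal sketch in Python; the function names are assumptions chosen for illustration, while the data are those of the worked example.

```python
def simple_mean(values):
    """Simple arithmetic mean: sum of the scores divided by their number."""
    return sum(values) / len(values)


def weighted_mean(values, freqs):
    """Weighted arithmetic mean: sum of f_i * x_i divided by the sum of f_i."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


x = [3, 6, 9, 8, 4]
f = [1, 3, 2, 5, 7]

print(simple_mean(x))                  # 6.0
print(round(weighted_mean(x, f), 2))   # 5.94
```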
PROPERTIES OF THE ARITHMETIC MEAN:
1. The algebraic sum of the deviations of a set of
numbers about the mean is zero:
$$\sum (x - \bar{x}) = 0$$
Proof: since $\bar{x}$ is a constant, $\sum \bar{x} = \bar{x} + \bar{x} + \cdots + \bar{x}$ (n times) $= n\bar{x}$, and $\bar{x} = \frac{\sum x}{n}$. Hence
$$\sum (x - \bar{x}) = \sum x - \sum \bar{x} = \sum x - n\bar{x} = \sum x - n \frac{\sum x}{n} = \sum x - \sum x = 0$$
2. If two sets of data of sizes $N_1$ and $N_2$ have means $\bar{x}_1$ and $\bar{x}_2$
respectively, then the weighted arithmetic mean of the means (the mean
of the two sets combined) is obtained as:
$$\bar{x} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} \quad \left( \text{compare } \frac{f_1 m_1 + f_2 m_2}{f_1 + f_2} \right)$$
3. If there are N observations $x_1, x_2, \ldots, x_n$ with mean $\bar{x}$
and $y_i = k x_i$ (i = 1, 2, ..., n; k = constant), then
the arithmetic mean $\bar{y}$ can be expressed as:
$$\bar{y} = k \bar{x}$$
4. If the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$ and a constant c is added to
every value of x, so that the observations become
$x_1 + c, x_2 + c, \ldots, x_n + c$, then the arithmetic mean of the set will
be $\bar{x} + c$ (where c is a negative or positive number).
5. If A is an assumed mean (guessed mean) and
$d_i = x_i - A$ is the deviation of $x_i$ from A, then the
mean can be calculated as follows:
$$\bar{x} = A + \frac{\sum d_i}{N} \quad \text{for the simple arithmetic mean}$$
$$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i} \quad \text{for the weighted arithmetic mean}$$
In short, $\bar{x} = A + \bar{d}$, where $\bar{d} = \frac{\sum d_i}{N}$.
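Several of the properties above can be verified on the series 3, 6, 9, 8, 4 used earlier. This is a minimal sketch in Python; the assumed mean A and the constants k and c are arbitrary illustrative choices.

```python
x = [3, 6, 9, 8, 4]
n = len(x)
mean = sum(x) / n  # 6.0

# Property 1: the deviations about the mean sum to zero.
deviations_sum = sum(xi - mean for xi in x)

# Property 3: multiplying every score by k multiplies the mean by k.
k = 2
mean_scaled = sum(k * xi for xi in x) / n

# Property 4: adding c to every score adds c to the mean.
c = 10
mean_shifted = sum(xi + c for xi in x) / n

# Property 5: mean recovered from an assumed (guessed) mean A.
A = 5
d = [xi - A for xi in x]
mean_from_assumed = A + sum(d) / n

print(deviations_sum, mean_scaled, mean_shifted, mean_from_assumed)
```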
2.3.2 THE MODE
The mode of a set of n observations is the number that occurs
most frequently in that set. In other words, it is the number that has
the highest frequency of occurrence in the set. For a discrete variable,
the mode is determined by mere observation of the frequency of
occurrence of the scores.
E.g.: In the series 3, 2, 2, 2, 4, the mode is
Mo = 2 (2 occurs more often than 3 and 4).
For a continuous variable or distribution, the mode is located
within the modal class (the class with the highest frequency). There are
two methods for determining the mode: the mathematical and the
graphical method.
2.3.2.1 THE MATHEMATICAL METHOD
With this method, we use the following formula:
$$Mo = L_o + \frac{f_1}{f_1 + f_2} \times Z,$$ where
$L_o$: lower class boundary (true limit, x - 0.5) of the modal class,
$f_1$: difference between the frequency of the modal class and
the frequency of the class immediately preceding (before)
the modal class,
$f_2$: difference between the frequency of the modal class and
the frequency of the class immediately following (after)
the modal class,
Z: size of the modal class (upper true limit - lower true limit).
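The formula for the mode of grouped data can be sketched as a small function. This is a minimal illustration in Python; the function name, its parameters and the class frequencies used are hypothetical and are not taken from the text.

```python
def grouped_mode(lower_boundary, f_modal, f_before, f_after, class_size):
    """Mode of grouped data: Lo + f1 / (f1 + f2) * Z."""
    f1 = f_modal - f_before   # modal frequency minus frequency of the class before
    f2 = f_modal - f_after    # modal frequency minus frequency of the class after
    return lower_boundary + f1 / (f1 + f2) * class_size


# Hypothetical classes 10-19, 20-29, 30-39 with frequencies 5, 12, 8.
# The modal class is 20-29: lower true limit 19.5, size 10.
mo = grouped_mode(lower_boundary=19.5, f_modal=12, f_before=5, f_after=8, class_size=10)
print(round(mo, 2))
```

The result falls inside the modal class and is pulled towards the neighbouring class with the larger frequency.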
2.3.2.2 THE GRAPHICAL METHOD
The mode of a distribution can also be read from the histogram.
Draw guide lines from the two top corners of the modal class bar
diagonally to the adjacent top corners of the neighbouring bars, then
project a straight vertical line from their point of intersection onto the
x-axis to locate the mode.
[Figure: histogram with frequency on the y-axis and x_i (class boundary) on the x-axis; the mode Mo = x_0 is read on the x-axis.]
2.3.3 THE MEDIAN
The median is the middle value of a set of data arranged in
order of magnitude (ascending or descending order). In other words,
it is the measure of location that divides an ordered set of data
into two equal parts. In short, it corresponds to the
$\left( \frac{N}{2} \right)$th observation of a series of data.
E.g.: Find the median of the following series of data:
1, 5, 2, 7, 3 (discrete variable)
Arranging the data in order of magnitude (ascending or
descending order) we obtain:
1, 2, 3, 5, 7 (ascending order)
7, 5, 3, 2, 1 (descending order)
In both cases 3 is the value that divides the series into two equal
parts. Thus the median Me = 3.
If another observation or value, say 9, is added to the series, the
median will lie within the interval 3-5 (or 5-3) and the median will be
the average of these two numbers. Thus,
$$Me = \frac{3 + 5}{2} = \frac{8}{2} = 4 \quad \text{or} \quad Me = \frac{5 + 3}{2} = \frac{8}{2} = 4$$
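The two cases of the example (an odd and an even number of observations) can be sketched as follows. This is a minimal illustration in Python; the function name is an assumption, and the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics


def median(values):
    """Middle value of the ordered data; average of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 5, 2, 7, 3]))      # 3
print(median([1, 5, 2, 7, 3, 9]))   # 4.0
print(median([1, 5, 2, 7, 3]) == statistics.median([1, 5, 2, 7, 3]))
```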
For classified (grouped) data, we can use either the mathematical
approach or the graphical approach to find the median.
2.3.3.1 THE MATHEMATICAL APPROACH
This approach requires the application of the following formula:
$$Me = L_e + \frac{\frac{N}{2} - F}{f_e} \times Z,$$ where
$L_e$: lower class boundary of the median class (the class containing
the $\left( \frac{N}{2} \right)$th observation of the series),
N: total number of observations in the series,
$\frac{N}{2}$: middle value,
F: cumulative frequency of the class preceding (before) the class containing the
$\left( \frac{N}{2} \right)$th observation, that is, the cumulative frequency before the median
class,
$f_e$: frequency of the median class (relative or absolute, but not
cumulative),
Z: size of the median class.
2.3.3.2 THE GRAPHICAL APPROACH
With this approach, the median can be read from the cumulative
frequency (ogive) curve. To do this, we locate the $\left( \frac{N}{2} \right)$th value on the
cumulative frequency axis (y-axis), draw a straight horizontal line from that
point to intersect the ogive curve, and from the point of
intersection project a straight vertical line onto the x-axis to read
the median of the distribution.
[Figure: ogive (cumulative frequency curve) with cumulative frequency on the y-axis and x_i on the x-axis; the median Me is read on the x-axis.]
2.3.3.3 QUARTILES, DECILES AND PERCENTILES
These are extensions of the concept of the median. To
compute them we adapt the formula for computing the median,
according to the number of equal parts into which the
distribution is to be divided. In this regard, quartiles, deciles
and percentiles divide a set of data into four, ten and one hundred
equal parts respectively. Quartiles are 3 in number: Q1, Q2 and Q3.
Deciles are 9 in number: D1, D2, D3, …, D9. Percentiles are 99 in
number: P1, P2, P3, …, P99.
The general formula for the computation of quartiles is given
below:
$$Q_r = L_r + \frac{\frac{rN}{4} - F}{f_r} \times Z,$$ where
r = 1, 2, 3: the quartile of interest,
$L_r$: lower class boundary of the class containing the
$\left( \frac{rN}{4} \right)$th observation,
N: total number of observations,
F: cumulative frequency of the class preceding (before)
the class containing the $\left( \frac{rN}{4} \right)$th observation,
$f_r$: frequency of the class containing the $\left( \frac{rN}{4} \right)$th observation,
$\frac{rN}{4}$: position of the value dividing the series at the r-th quarter (the
group of interest),
Z: size of the class containing the $\left( \frac{rN}{4} \right)$th observation.
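The quartile formula can be sketched as one function in which the number of equal parts is a parameter, so that the same routine also serves for deciles (10 parts) and percentiles (100 parts). This is a minimal illustration in Python; the function name, its parameters and the grouped data are hypothetical and are not taken from the text.

```python
def grouped_quantile(boundaries, freqs, r, parts=4):
    """r-th quantile of grouped data divided into `parts` equal parts.

    boundaries: (lower, upper) true class limits, in ascending order.
    freqs: class frequencies, in the same order.
    """
    n = sum(freqs)            # N, total number of observations
    target = r * n / parts    # position rN/4 (or rN/10, rN/100)
    cumulative = 0            # F, cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("r is too large for the given number of parts")


# Hypothetical grouped data (assumed for this sketch), N = 20.
classes = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]
counts = [4, 6, 8, 2]

q1 = grouped_quantile(classes, counts, 1)
q2 = grouped_quantile(classes, counts, 2)              # second quartile = median
q3 = grouped_quantile(classes, counts, 3)
p90 = grouped_quantile(classes, counts, 90, parts=100)
print(round(q1, 2), q2, q3, p90)
```

The sketch takes the quantile class to be the first class whose cumulative frequency reaches the target position; other conventions exist for positions that fall exactly on a class boundary.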
The general formulae for the computation of deciles and
percentiles follow the same procedure, with small
adjustments where necessary. The formulae are
given below:
Deciles ($D_1, D_2, D_3, \ldots, D_9$):
$$D_r = L_r + \frac{\frac{rN}{10} - F}{f_r} \times Z$$
Percentiles ($P_1, P_2, P_3, \ldots, P_{99}$):
$$P_r = L_r + \frac{\frac{rN}{100} - F}{f_r} \times Z$$
2.3.4 THE GEOMETRIC AND HARMONIC MEAN
The geometric and harmonic means are not frequently used
in statistical analysis. However, we need to discuss them here
for the purpose of completeness.
The geometric mean of N positive values $x_1, x_2, \ldots, x_n$ of a
set is the Nth root of the product of all the numbers of the set. It is
mathematically defined as:
$$G = \sqrt[N]{x_1 \cdot x_2 \cdots x_n} \quad \text{(simple geometric mean)}$$
$$G = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}} \quad \text{(weighted geometric mean, with } N = \sum f_i \text{)}$$
E.g.: Find the geometric mean of the following numbers of a series: 3,
4, 5.
$$G = \sqrt[3]{3 \cdot 4 \cdot 5} = \sqrt[3]{60} \approx 3.91$$
On a calculator: enter 3, press shift (or 2nd F, or inv), then the $\sqrt[x]{y}$ key, then 60.
Similarly, we can use the decimal logarithm (log base 10)
to find the geometric mean of a series of observations. The
formulae are given as:
$$\log G = \frac{1}{N} \sum \log x_i \quad \text{(simple)}$$
$$\log G = \frac{1}{N} \sum f_i \log x_i \quad \text{(weighted)}$$
Written out in full, the simple form is:
$$\log G = \frac{1}{N} (\log x_1 + \log x_2 + \log x_3 + \cdots + \log x_n)$$
E.g.: Using our previous data 3, 4, 5, we get:
$$\log G = \frac{1}{3} (\log 3 + \log 4 + \log 5) = \frac{1}{3} (0.477 + 0.602 + 0.699) = \frac{1}{3} (1.778) \approx 0.5927$$
$$G = 10^{0.5927} \approx 3.91$$
On a calculator: press shift (or 2nd F, or inv), then the $10^x$ key, then 0.5927.
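Both approaches to the geometric mean (the Nth root of the product, and the decimal logarithm) can be compared directly. This is a minimal sketch in Python for the series 3, 4, 5; the variable names are assumptions, and `math.prod` requires Python 3.8 or later.

```python
import math

x = [3, 4, 5]
n = len(x)

# Direct approach: Nth root of the product of all the numbers.
g_direct = math.prod(x) ** (1 / n)

# Logarithmic approach: log G = (1/N) * sum of log10(x_i), then G = 10 ** log G.
log_g = sum(math.log10(v) for v in x) / n
g_log = 10 ** log_g

print(round(g_direct, 2), round(g_log, 2))   # 3.91 3.91
```

Because no intermediate rounding is applied, the two approaches agree to within floating-point precision.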
The harmonic mean of N numbers $x_1, x_2, \ldots, x_n$ of a series of
data is the reciprocal of the arithmetic mean of their reciprocals.
The harmonic mean is mathematically expressed as:
$$\frac{1}{H} = \frac{1}{N} \sum \frac{1}{x_i}, \quad \text{so that} \quad H \sum \frac{1}{x_i} = N$$
$$H = \frac{N}{\sum \frac{1}{x_i}} \quad \text{(simple)}$$
$$H = \frac{N}{\sum \frac{f_i}{x_i}} \quad \text{(weighted, with } N = \sum f_i \text{)}$$
E.g.: Find the harmonic mean of the following numbers:
2, 7, 5, 3
$$H = \frac{N}{\sum \frac{1}{x_i}} = \frac{4}{\frac{1}{2} + \frac{1}{7} + \frac{1}{5} + \frac{1}{3}} = \frac{4}{1.176} \approx 3.40$$
The relationship between the arithmetic mean, geometric
mean and harmonic mean of a series of data:
If the (positive) observations $x_1, x_2, x_3, \ldots, x_n$ of the series are different,
then $H < G < \bar{X}$. However, if the observations are identical, then
$H = G = \bar{X}$.
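The harmonic mean and its relationship with the other two means can be checked on the series 2, 7, 5, 3. This is a minimal sketch in Python; the function names are assumptions, and `math.isclose` is used for the identical-observations case because the Nth root is computed in floating point.

```python
import math


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


def geometric_mean(values):
    """Nth root of the product of the values."""
    return math.prod(values) ** (1 / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


data = [2, 7, 5, 3]
h, g, a = harmonic_mean(data), geometric_mean(data), arithmetic_mean(data)
print(round(h, 2), round(g, 2), round(a, 2))
print(h < g < a)   # different observations: H < G < arithmetic mean

same = [4, 4, 4]
print(math.isclose(harmonic_mean(same), geometric_mean(same)))
print(math.isclose(geometric_mean(same), arithmetic_mean(same)))
```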
The empirical relationship between the mean, mode and
median: for unimodal (one mode) frequency curves that are
moderately skewed (asymmetrical), the following relationship
holds: $\bar{X} - Mo = 3(\bar{X} - Me)$.
2.4 SUMMARY
The topic has introduced the student to the meaning of measures
of central tendency. The topic has also exposed the students to various
approaches of getting these measures. Besides this, the topic has
succeeded in defining the existing relationship between the measures
discussed.
2.5 SELF-ASSESSMENT EXERCISE
1. Define measures of central tendency.
2. Compute the mean, mode and median of the
following numbers: 7, 9, 10, 5, 6, 15, 4, 7, 8, 7, 13, 3
3. Find the geometric mean of the series of data in 2
above using the two approaches discussed in the
main text.
4. Prove that the sum of the deviations about the mean
of the series of data in 2 above is zero.
5. A hypothetical pattern of people’s weekly expenses
in a ward in Maiduguri is given below:
Daily Expenses in N000    No. of people
1 - 4                     10
5 - 8                     15
9 - 12                    50
13 - 16                   15
17 - 20                   10
Total                     100
Use the data to compute the mean, mode and median
of the distribution. Graphically find the position of
the mode and median of the distribution.
2.6 REFERENCES
Spiegel, M. R. and Stephens, L. J. (1999), Schaum’s Outline of
Theory and Problems of Statistics, New York, London:
McGraw Hill, chap. 3
2.7 SUGGESTED READINGS
Walpole, R. E. (1982) Introduction to Statistics, 3rd Edition, New
York: Macmillan Publishing Co.
TOPIC 3: TABLE OF CONTENTS
3.0 TOPIC 3: MEASURES OF DISPERSION
3.1 INTRODUCTION
3.2 OBJECTIVES
3.3 IN-TEXT
3.3.1 VARIATION RATIO
3.3.2 RANGE
3.3.3 VARIANCE
3.3.4 STANDARD DEVIATION
3.4 SUMMARY
3.5 SELF-ASSESSMENT EXERCISE (SAE)
3.6 REFERENCES
3.7 SUGGESTED READINGS
3.0 TOPIC 3: MEASURES OF DISPERSION
3.1 INTRODUCTION
Measures of dispersion describe the extent of
spread, scatter or dispersion among the scores in a distribution. They are
generally used to indicate how typical a measure of central tendency
is: the smaller the variability among the scores, the more typical the
measure. In a narrow distribution, for example, the measure of central
tendency will be close to all the scores of the distribution. We will restrict
ourselves to the study of the variation ratio, range, variance, standard
deviation and coefficient of variation.
3.2 OBJECTIVES
At the end of this topic, you should be able to:
i. Calculate and assign a definite value to the spread of a
distribution.
ii. Identify the appropriate measure of variability to be used
for assessing the spread among the scores of a distribution
when a particular measure of central tendency is
considered.
iii. Make comparisons between the variability of distributions,
and
iv. Overcome the difficulties that may arise from the input
data requirements of distributions (differences in
variables or differences in arithmetic means).
3.3 IN-TEXT
3.3.1 VARIATION RATIO
The variation ratio measures the proportion or percentage of
subjects/scores of a distribution outside (not included in) the modal
class. It is appropriate to use this measure as a measure of variability
when the typical measure of central tendency under consideration is
the mode.
The variation ratio is given in percentage as:
$$VR = \left( 1 - \frac{f_o}{N} \right) \times 100\%,$$ where
$f_o$: frequency of the modal class,
N: total number of observations in the distribution.
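The variation ratio can be sketched in one short function. This is a minimal illustration in Python; the function name and the class frequencies are hypothetical, and the percentage is computed in the algebraically equivalent form 100(N - f_o)/N.

```python
def variation_ratio(freqs):
    """Percentage of observations that fall outside the modal class."""
    total = sum(freqs)    # N, total number of observations
    modal = max(freqs)    # f_o, frequency of the modal class
    return 100 * (total - modal) / total


# Hypothetical class frequencies (assumed for this sketch), N = 20.
vr = variation_ratio([4, 6, 8, 2])
print(vr)   # 60.0
```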
3.3.2 RANGE
The range of a distribution is the difference between the largest
and smallest scores of the distribution. We use it as a measure of
variability when the typical measure of central tendency under
consideration is the median. The range is mathematically defined as:
$$R = L - l,$$ where
L: largest upper true limit of the distribution,
l: smallest lower true limit of the distribution.
3.3.3 VARIANCE
The variance of a distribution is the average of the squared
deviations of the scores about their mean. It is the most commonly
used indicator of the degree of variability and the most dependable estimate
of variability in the total population from which samples are generally
drawn. We use the variance to measure the extent of spread in a
distribution when the typical measure of central tendency under
consideration is the arithmetic mean.
We distinguish two types of variance, namely the population
variance ($\sigma^2$) and the sample variance ($s^2$).
NB: the variance is expressed in squared units of the variable, not in
the original unit of measurement.
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}, \quad \text{population variance}$$
$\mu$: population mean
N: total number of observations in the population/set (used as a divisor
on sample data, it gives a biased estimate).
$$s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{n}, \quad \text{sample variance}$$
$\bar{x}$: sample mean
n = N - 1: divisor giving an unbiased estimate, where N is the total number of
observations in the sample/sub-set.
In the case of classified data/grouped data/categorized data, we
express the variance as follows:
$$\sigma^2 = \frac{\sum f (x - \mu)^2}{N}, \quad \text{population variance}$$
$$s^2 = \frac{\sum f (x - \bar{x})^2}{n} = \frac{\sum f x^2 - \frac{(\sum f x)^2}{N}}{n}, \quad n = N - 1, \ N = \sum f, \quad \text{sample variance}$$
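The two kinds of variance differ only in the divisor. The following is a minimal sketch in Python on an illustrative series; the function names are assumptions, and the standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import statistics


def population_variance(values):
    """Average squared deviation about the mean, divisor N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations about the sample mean, divisor N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


x = [3, 6, 9, 8, 4]   # illustrative series; mean = 6, squared deviations sum to 26
print(population_variance(x))   # 5.2
print(sample_variance(x))       # 6.5

# Shortcut form: (sum of x^2 - (sum of x)^2 / N) / (N - 1)
n = len(x)
shortcut = (sum(v ** 2 for v in x) - sum(x) ** 2 / n) / (n - 1)
print(shortcut)                 # 6.5
```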
3.3.4 THE STANDARD DEVIATION
The standard deviation of a distribution is the square root of the
variance of the distribution. Similarly, we have the population standard
deviation and the sample standard deviation. These are computed as
follows:
$$\sigma = \sqrt{\sigma^2}, \quad \text{population standard deviation}$$
$$s = \sqrt{s^2}, \quad \text{sample standard deviation}$$
Unlike the variance, which is expressed in squared units, the standard
deviation carries the unit of measurement assigned to the variable under study.
3.3.4.1 THE USES AND INTERPRETATION OF THE STANDARD DEVIATION
1. The standard deviation is used to calculate many
other statistics (coefficient of variation, moments, ...).
2. We also use the standard deviation to compare the
variability of scores between two groups/distributions. E.g.:
A with S.D = 3.0: easy to get the subject
B with S.D = 15: very difficult to get the subject because of the wide spread.
The value of the standard deviation of a distribution indicates
the magnitude of the spread among the scores of the distribution. If
the standard deviation of a distribution is small, it means that the
scores are concentrated near the mean. On the other hand, if it is
large, the scores are scattered widely about the mean of the
distribution.
3.3.5 THE COEFFICIENT OF VARIATION
When a comparison between the variability of distributions is
not satisfactory or conclusive after use of the variance and standard
deviation, due to differences in their variables or arithmetic means
(with the same variables), we use the coefficient of variation. The
coefficient of variation is the ratio of the standard deviation to the
arithmetic mean of a distribution. It is expressed in percentage as:
$$CV = \frac{S.D}{\bar{X}} \times 100$$
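The coefficient of variation follows directly from the standard deviation and the mean. This is a minimal sketch in Python; the function names are assumptions, the data are illustrative, and the sample standard deviation (divisor N - 1) is assumed here since the text does not say which of the two to use.

```python
import math


def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the arithmetic mean."""
    mean = sum(values) / len(values)
    return sample_sd(values) / mean * 100


x = [3, 6, 9, 8, 4]        # illustrative series
y = [v * 10 for v in x]    # same series expressed in a different unit

print(round(coefficient_of_variation(x), 2))
print(math.isclose(coefficient_of_variation(x), coefficient_of_variation(y)))
```

Rescaling the data changes the standard deviation but not the coefficient of variation, which is why it suits comparisons between distributions measured in different units.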
Other measures are defined as follows:
Inter Quartile Range:
$$IQR = Q_3 - Q_1$$
Semi-Inter Quartile Range: the arithmetic mean of the
deviations of the first and third
quartiles round the median:
$$SIQR = \frac{(Me - Q_1) + (Q_3 - Me)}{2} = \frac{Q_3 - Q_1}{2}$$
Mean Absolute Deviation:
$$M.A.D = \frac{\sum |x - \bar{x}|}{N}$$
3.4 SUMMARY
The topic has attempted to define measures of dispersion and
exposed the student to different formulae for computing the various
measures of spread discussed. Besides, the uses and interpretation of
the standard deviation are given.
3.5 SELF-ASSESSMENT EXERCISE (SAE)
1. What is a measure of dispersion?
2. Give mathematical formulae for measures of spread
known to you.
3. With the following information:
Age Group    No. of Students
16-18        5
19-21        10
22-24        13
25-27        35
28-30        7
Total        70
Compute:
i. Variation ratio
ii. Range
iii. Variance
iv. Standard deviation
v. Coefficient of variation
4. Compare the distribution with that of Exercise 5 of
section 2.5 in Topic 2 and freely comment.
3.6 REFERENCES
Spiegel, M. R. and Stephens, L. J. (1999), Schaum’s Outline of
Theory and Problems of Statistics, New York, London:
McGraw Hill, chap. 3
3.7 SUGGESTED READINGS
Walpole, R. E. (1982) Introduction to Statistics, 3rd Edition, New
York: Macmillan Publishing Co.
TOPIC 4: TABLE OF CONTENTS
4.0 TOPIC 4: MEASURES OF SKEWNESS AND KURTOSIS
4.1 INTRODUCTION
4.2 OBJECTIVES
4.3 IN-TEXT
4.3.1 PEARSON COEFFICIENT
4.3.2 BOWLEY COEFFICIENT
4.3.3 MOMENTS
4.3.4 KURTOSIS
4.4 SUMMARY
4.5 SELF-ASSESSMENT EXERCISE (SAE)
4.6 REFERENCES
4.7 SUGGESTED READINGS
4.0 TOPIC 4: MEASURES OF SKEWNESS AND KURTOSIS
4.1 INTRODUCTION
In real-world situations, it is possible to come across two
different series of data with exactly the same arithmetic mean and
standard deviation. When this occurs, we need to ascertain the extent to
which the scores are distributed around the central measure and how
their curves depart from that of a normal (Gaussian) distribution
before drawing any conclusion about the series. This requires the
study of the measures of skewness and kurtosis.
4.2 OBJECTIVES
At the end of this topic, you should be able to:
i. Define skewness
ii. Define kurtosis
iii. Compute their measures
iv. Conclude about the type of skewness and kurtosis
assumed by a given distribution.
4.3 IN-TEXT
Skewness describes the degree of asymmetry of a distribution, as seen
in its frequency polygon. The Bowley coefficient always lies between -1
and 1; the other coefficients discussed below are not restricted to this
interval.
4.3.1 PEARSON COEFFICIENT (FIRST AND SECOND)
The Pearson coefficient is used to measure the skewness when
the emphasis is placed on the mean, mode, median and S.D of the
distribution. The coefficient is given as $V_1$ or $V_2$:
$$V_1 = \frac{\bar{x} - Mo}{S.D}; \quad V_2 = \frac{3(\bar{x} - Me)}{S.D}$$
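The two Pearson coefficients can be sketched as follows. This is a minimal illustration in Python; the function names and the summary statistics are hypothetical and are not taken from the text.

```python
def pearson_first(mean, mode, sd):
    """First Pearson coefficient: (mean - mode) / S.D."""
    return (mean - mode) / sd


def pearson_second(mean, median, sd):
    """Second Pearson coefficient: 3 * (mean - median) / S.D."""
    return 3 * (mean - median) / sd


# Hypothetical summary statistics of a moderately right-skewed distribution.
mean, median, mode, sd = 50, 48, 44, 10

v1 = pearson_first(mean, mode, sd)
v2 = pearson_second(mean, median, sd)
print(v1, v2)   # 0.6 0.6
```

The two coefficients coincide here because the assumed figures satisfy mean - mode = 3(mean - median) exactly; with real data they will generally differ somewhat.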
4.3.2 BOWLEY COEFFICIENT
When the emphasis shifts to the median and quartiles, we apply
the Bowley coefficient to ascertain the skewness of the distributions.
The Bowley coefficient is given as:
$$K = \frac{Q_3 + Q_1 - 2Me}{Q_3 - Q_1}$$
NB: Me = Q2
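The Bowley coefficient needs only the three quartiles. This is a minimal sketch in Python, assuming the usual quartile form of the coefficient, (Q3 + Q1 - 2Me) / (Q3 - Q1); the function name and the quartile values are hypothetical.

```python
def bowley_coefficient(q1, median, q3):
    """Quartile-based skewness: (Q3 + Q1 - 2*Me) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical quartiles (assumed for this sketch).
k_right = bowley_coefficient(q1=20, median=30, q3=50)   # median nearer Q1: positive
k_sym = bowley_coefficient(q1=20, median=35, q3=50)     # median midway: zero
k_left = bowley_coefficient(q1=20, median=45, q3=50)    # median nearer Q3: negative
print(round(k_right, 3), k_sym, round(k_left, 3))
```

Because the median lies between Q1 and Q3, the result always falls between -1 and 1.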
A frequency polygon can be symmetrical, positively skewed or
negatively skewed. The skewness coefficient of a symmetrical
distribution is always zero, and $\bar{x}$, Mo and Me all coincide. The
skewness coefficient of a positively skewed (skewed to the right)
distribution is always greater than zero; here, $\bar{x}$ > Me > Mo. The
skewness coefficient of a negatively skewed (skewed to the left)
distribution is always less than zero; here, Mo > Me > $\bar{x}$.
[Figure 1: Negatively skewed distribution (skewness < 0), with $\bar{x}$, Me and Mo marked on the x-axis]
[Figure 2: Symmetrical distribution (skewness = 0), with $\bar{x}$, Me and Mo coinciding]
[Figure 3: Positively skewed distribution (skewness > 0), with $\bar{x}$, Me and Mo marked on the x-axis]
4.3.3 MOMENTS
Moments are the measures of skewness most widely accepted by
mathematical statisticians. The general formula for moments is given
as:
$$M_r = \frac{1}{n} \sum (x - \bar{x})^r,$$ where r: rank (order) of the moment.
Given this general formula, it follows:
$$M_1 = \frac{1}{n} \sum (x - \bar{x}) = 0 \quad \text{(first moment)}$$
$$M_2 = \frac{1}{n} \sum (x - \bar{x})^2 = \text{variance} \quad \text{(second moment)}$$
$$M_3 = \frac{1}{n s^3} \sum (x - \bar{x})^3 \quad \text{(third moment, standardised by } s^3 \text{)}$$
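The moments about the mean can be sketched with one general function. This is a minimal illustration in Python; the function name and the data are hypothetical, and s is taken here as the square root of the second moment, which is an assumption since the text does not specify which standard deviation is meant.

```python
def central_moment(values, r):
    """r-th moment about the mean: (1/n) * sum of (x - mean) ** r."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


x = [1, 2, 3, 10]   # hypothetical series with a long right tail; mean = 4

m1 = central_moment(x, 1)   # always 0
m2 = central_moment(x, 2)   # variance (divisor n)
s = m2 ** 0.5               # assumption: s = sqrt(M2)
m3_standardised = central_moment(x, 3) / s ** 3

print(m1, m2, round(m3_standardised, 3))
```

A positive standardised third moment indicates a tail to the right, in line with the sign convention of the other skewness coefficients.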
4.3.4 KURTOSIS
Kurtosis describes the amount of peakedness in a distribution. It
explains how a particular distribution departs from the shape of the
normal distribution curve. Kurtosis shows whether a distribution is
very pointed with wide tails or humped with short tails.
A distribution that is very pointed with wide tails is known as
leptokurtic. A broad humped distribution with short tails is referred
to as platykurtic. A distribution that is neither leptokurtic nor
platykurtic is known as mesokurtic. A mesokurtic distribution
conforms to the shape of the normal distribution curve.
The amount of peakedness or kurtosis is measured by the fourth
moment. The coefficient is given as:
$$\alpha_4 = \frac{\sum (x - \bar{x})^4}{n s^4}$$
The coefficient of kurtosis for mesokurtic distributions is always
$\alpha_4 = 3$. The coefficient of kurtosis for leptokurtic distributions is
always greater than 3, while the amount of peakedness for platykurtic
distributions is always less than 3.
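The coefficient of kurtosis and the three-way classification can be sketched as follows. This is a minimal illustration in Python; the function names and the data are hypothetical, and s squared is taken here as the second moment about the mean (divisor n), which is an assumption.

```python
def kurtosis_coefficient(values):
    """Fourth moment about the mean divided by s**4, with s**2 = second moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


def classify(coefficient, tolerance=1e-9):
    """Compare the coefficient with 3, the value for a mesokurtic curve."""
    if abs(coefficient - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if coefficient > 3 else "platykurtic"


flat = [1, 2, 3, 4, 5]              # evenly spread scores
peaked = [5, 5, 5, 5, 5, 5, 0, 10]  # most scores at the centre, two far out

k_flat = kurtosis_coefficient(flat)
k_peaked = kurtosis_coefficient(peaked)
print(round(k_flat, 2), classify(k_flat))
print(round(k_peaked, 2), classify(k_peaked))
```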
[Figure: frequency curves of a platykurtic, a mesokurtic (normal) and a leptokurtic distribution]
4.4 SUMMARY
This topic has explained the degree of asymmetry and amount of
peakedness in distributions. The topic has also exposed the student to
mathematical formulae for measuring these statistics and displayed
their respective graphical presentation.
4.5 SELF-ASSESSMENT EXERCISE (SAE)
1. Define skewness and kurtosis.
2. Based on employment records obtained
from a ministry in your state over a period
of time, you have been able to compile the
following information:
43,34,45,35,55,44,45,65,48,
53,70,60,45,48,44,69,73,55,51,58,66,77,40,
43,35,54,77,54,77,49,56,30,48,55,67,36,47,
52,53,46,74,45,64,45,71,32,44,55,43,35,60,
55,66,53,44,36,48,46,76,73,34,49,47,57,50,
54,65,53,46,44,56.
a. Construct a frequency distribution
table of 6 classes
b. Calculate the mean, mode and median
of the distribution.
c. Construct a histogram and frequency
polygon of the distribution
d. Construct a cumulative frequency of
the distribution
e. Compute the Pearson coefficient and
the fourth moment of the distribution.
f. Graphically show the degree of the
asymmetry and amount of peakedness
in the distribution.
4.6 REFERENCES
Spiegel, M. R. and Stephens, L. J. (1999), Schaum’s Outline of
Theory and Problems of Statistics, New York, London:
McGraw Hill, chap. 5
4.7 SUGGESTED READINGS
Karmel, P. H. and Polasek, M. (1970): Applied Statistics for
Economics, 3rd edition, Great Britain: Pitman.
SOLUTIONS TO EXERCISES
TOPIC 1:
1. $x_i = 2, 3, 1, 5, 4, 2$

a. $$\sum_{i=1}^{6} x_i = 2 + 3 + 1 + 5 + 4 + 2 = 17$$

$$\sum_{i=1}^{6} x_i^2 = (2)^2 + (3)^2 + (1)^2 + (5)^2 + (4)^2 + (2)^2 = 4 + 9 + 1 + 25 + 16 + 4 = 59$$

b. $$\sum_{i=3}^{6} x_i = 1 + 5 + 4 + 2 = 12$$

c. $$\prod_{i=2}^{6} x_i = 3 \times 1 \times 5 \times 4 \times 2 = 120$$
TOPIC 2:
5. Computation of the mean, mode and median of the distribution.

Daily Expenses   No. of people   Midpoint   Product    Cumulative     True limits
(N000)           f_i             x_i        f_i x_i    frequency F    (L - 0.5 to L + 0.5)
1-4              10              2.5        25         10             0.5 - 4.5
5-8              15              6.5        97.5       25             4.5 - 8.5
9-12             50              10.5       525        75             8.5 - 12.5
13-16            15              14.5       217.5      90             12.5 - 16.5
17-20            10              18.5       185        100            16.5 - 20.5
TOTAL            100             -          1050

The mean of the distribution is:

$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{1050}{100} = 10.5$$

that is, 10.5 x N1000 = N10,500.
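The mode of the distribution is located within the
interval 9-12, which is the modal class. The mode is therefore:

$$Mo = L_o + \frac{\Delta f_1}{\Delta f_1 + \Delta f_2} \times Z$$

where $L_o = 8.5$ is the lower true limit of the modal class,
$\Delta f_1 = 50 - 15 = 35$ and $\Delta f_2 = 50 - 15 = 35$ are the differences
between the modal frequency and the frequencies of the preceding and
following classes, and $Z = 4$ is the class width.

$$Mo = 8.5 + \left(\frac{35}{35 + 35}\right)4 = 8.5 + \left(\frac{35}{70}\right)4 = 8.5 + \frac{4}{2} = 10.5$$

Mo = 10.5 x N1000 = N10,500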
The median of the distribution is located within the
median class of 9-12, where the (N/2)th observation of the distribution
falls.
The median:

$$Me = L_e + \frac{\frac{N}{2} - F}{f_e} \times Z$$

$$L_e = 8.5,\quad \frac{N}{2} = 50,\quad F = 25,\quad f_e = 50,\quad Z = 4$$

$$Me = 8.5 + \left(\frac{50 - 25}{50}\right)4 = 8.5 + \left(\frac{25}{50}\right)4 = 8.5 + \frac{4}{2} = 10.5$$

Me = 10.5 x N1000 = N10,500
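The three grouped-data formulas used in this solution can be sketched in Python. The variable names are illustrative assumptions; the class boundaries and frequencies are those of the daily expenses table. The mode calculation assumes the modal class is not the first or last class.

```python
# True lower limits, class width and frequencies of the daily expenses table.
lower = [0.5, 4.5, 8.5, 12.5, 16.5]
width = 4
freq = [10, 15, 50, 15, 10]

n = sum(freq)
mid = [lo + width / 2 for lo in lower]  # class midpoints

# Mean: sum of f * x divided by N.
mean = sum(f * x for f, x in zip(freq, mid)) / n

# Mode: interpolate inside the modal class (assumed to be an interior class).
k = freq.index(max(freq))
d1 = freq[k] - freq[k - 1]  # difference with the preceding class
d2 = freq[k] - freq[k + 1]  # difference with the following class
mode = lower[k] + d1 / (d1 + d2) * width

# Median: find the class containing the (N/2)th observation.
below = 0  # cumulative frequency before the median class
for j, f in enumerate(freq):
    if below + f >= n / 2:
        break
    below += f
median = lower[j] + (n / 2 - below) / freq[j] * width

print(mean, mode, median)  # values are in N000
```

Because this distribution is symmetrical, the three averages coincide.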
Graphical location of the mode and median of the distribution.
To locate the mode with the help of a graph, we shall construct a
histogram of the distribution, whereas for the median, we shall
construct the cumulative frequency or ogive curve of the distribution.
[Figure: histogram of the distribution, plotting frequency against class boundary (N000), used to locate the mode]
Location of the median: Construction of the ogive curve:
[Figure: ogive curve, plotting cumulative frequency against the upper true limit (4.5, 8.5, 12.5, 16.5), used to locate the median]
TOPIC 3:
1. In order to compare this distribution with the
distribution of Exercise 5, section 2.5 of topic 2, we
proceed as follows:
We compute the standard deviation of the two
distributions.
Standard Deviation for exercise 3 of topic 3.

$$\sigma^2 = \frac{1}{N}\sum f (x - \bar{x})^2$$

$$N = 70,\quad \bar{x} = \frac{1697}{70} = 24.24$$

$$\sum f (x - \bar{x})^2 = 728.34;\quad \sigma^2 = \frac{728.34}{70} = 10.40$$

Standard Deviation $\sigma = \sqrt{\sigma^2} = \sqrt{10.40} = 3.22$
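The standard deviation formula for grouped data can be sketched as follows. The function name is an illustrative assumption. Since only the totals are quoted for Exercise 3 of topic 3, the general function is applied here to the class midpoints and frequencies tabulated in the Topic 2 solution.

```python
import math


def grouped_std(midpoints, freqs):
    """Standard deviation of grouped data: sqrt((1/N) * sum(f * (x - mean)**2))."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
    return math.sqrt(variance)


# Exercise 3 of topic 3: only the totals are given in the solution.
variance_ex3 = 728.34 / 70
sd_ex3 = math.sqrt(variance_ex3)

# Topic 2, Exercise 5: midpoints and frequencies from the solution table.
sd_ex5 = grouped_std([2.5, 6.5, 10.5, 14.5, 18.5], [10, 15, 50, 15, 10])

print(round(sd_ex3, 2), round(sd_ex5, 2))
```

Computing from the full table rather than from rounded intermediate totals avoids carrying rounding errors into the final figure.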
Standard Deviation for Exercise 5 of topic 2

$$\sigma^2 = \frac{1}{N}\sum f (x - \bar{x})^2$$

$$N = 100,\quad \bar{x} = \frac{1050}{100} = 10.5$$

$$\sum f (x - \bar{x})^2 = 10(-8)^2 + 15(-4)^2 + 50(0)^2 + 15(4)^2 + 10(8)^2 = 1760;\quad \sigma^2 = \frac{1760}{100} = 17.60$$

Standard Deviation $\sigma = \sqrt{\sigma^2} = \sqrt{17.60} = 4.20$

The data are given in '000'. Thus, σ = 4.20 x 1000 = 4,200.

The scores are more closely concentrated around the mean in the
distribution of Exercise 3 of topic 3. This means that it is a narrowly
spread distribution, and the subjects are easy to locate near the
mean. In Exercise 5 of topic 2, the distribution is more widely
spread: the subjects lie farther from the mean, so they are harder
to locate. In short, we may say that σ = 4.20 > σ = 3.22.
TOPIC 4:
2. e. Computation of the Pearson coefficient and the fourth
moment of the distribution.
We shall consider both the first and second Pearson
coefficients ($V_1$ and $V_2$).

$$V_1 = \frac{\bar{x} - Mo}{\text{S.D}}$$ (Pearson first coefficient)

$$\bar{x} = 52.20,\quad Mo = 49.86,\quad \text{S.D} = 12.15 \Rightarrow V_1 \approx 0.192$$

$$V_2 = \frac{3(\bar{x} - Me)}{\text{S.D}}$$ (Pearson second coefficient)

$$\bar{x} = 52.20,\quad Me = 51.18,\quad \text{S.D} = 12.15 \Rightarrow V_2 \approx 0.251$$
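The two Pearson coefficients can be checked with a short sketch. It assumes the usual definitions, (mean - mode) / S.D for the first coefficient and 3(mean - median) / S.D for the second, which reproduce the values quoted in this solution up to rounding; the function names are illustrative.

```python
def pearson_first(mean, mode, sd):
    """Pearson first coefficient of skewness: (mean - mode) / S.D."""
    return (mean - mode) / sd


def pearson_second(mean, median, sd):
    """Pearson second coefficient of skewness: 3 * (mean - median) / S.D."""
    return 3 * (mean - median) / sd


# Summary statistics quoted in the solution.
mean, mode, median, sd = 52.20, 49.86, 51.18, 12.15

v1 = pearson_first(mean, mode, sd)
v2 = pearson_second(mean, median, sd)

# Both coefficients are positive, pointing to a positively skewed distribution.
print(v1, v2)
```

A positive sign on either coefficient indicates that the mean lies to the right of the mode and median.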
The Fourth Moment

$$\alpha_4 = \frac{\sum f (x - \bar{x})^4}{N S^4}$$

$$\sum f (x - \bar{x})^4 = 3354549.68,\quad N = 71,\quad S^4 = 21792.40 \Rightarrow \alpha_4 \approx 2.16$$
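The arithmetic behind the fourth moment coefficient can be retraced from the totals quoted above. This is only a check of the figures given in the solution; the variable names are assumptions.

```python
# Totals quoted in the solution.
sum_f_dev4 = 3354549.68  # sum of f * (x - mean)**4
n = 71                   # number of observations
sd = 12.15               # standard deviation S

s4 = sd ** 4                      # S to the fourth power
alpha4 = sum_f_dev4 / (n * s4)    # coefficient of kurtosis

# Compare with 3, the value for a mesokurtic (normal) distribution.
if alpha4 < 3:
    shape = "platykurtic"
elif alpha4 > 3:
    shape = "leptokurtic"
else:
    shape = "mesokurtic"

print(round(s4, 2), alpha4, shape)
```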
f) Illustration of the skewness and kurtosis of the
distribution. For the skewness we focus attention on the
position of the mean, mode and median of the distribution.

$$\bar{x} = 52.20,\quad Mo = 49.86,\quad Me = 51.18$$

Here Mo < Me < x̄, so we have a positively skewed
(skewed to the right) distribution.
[Figure: frequency curve of the distribution, with Mo = 49.86, Me = 51.18 and x̄ = 52.20 marked on the horizontal axis]
For the amount of peakedness, we concentrate on the
value of $\alpha_4$. Since $\alpha_4 = 2.16 < 3$, we conclude that the
distribution is platykurtic. This can be
illustrated as follows:
[Figure: platykurtic frequency curve of the distribution, plotting frequency against the variable]
TUTOR-MARKED ASSIGNMENT
UNIT-TEST 1
1. The effect of a nutrient, made by a hypothetical drug
manufacturing company, on the weight of rabbits has been
recorded over a given period of time as follows:
27,26,25,23,27,28,32,31,30,29,30,30,28,
43,44,45,47,33,35,37,35,36,34,34,33,37,
36,36,34,33,34,37,33,42,40,41,38,39,40,
42,41,40,39,38,40,42,41,39,39,41,40,41,
40,39,38.
a. Consider the data and construct a five class
frequency distribution.
b. Calculate the mean, mode and median of the
distribution.
c. Locate the position of the mode and median of the
distribution using separate graphs or techniques
of data presentation.
2. The number of eggs produced on a poultry farm by ten
layers in an hour is given below:
10,11,23,9,7,10,16,20,10,14
a. What is the nature of the variable under
consideration?
b. Compute the arithmetic mean for the first five
layers.
c. Calculate the arithmetic mean of the series of
data.
d. What is the mode?
e. Find the median of the series of data.
f. Calculate the standard deviation of the series and
comment.
3. Explain briefly the following terms:
a. Measures of dispersion
b. Skewness and kurtosis
c. Quantiles
d. Measures of central tendency.