UNIT I
1
Variable: A variable is a quantity whose value may change
within the scope of a given problem or set of
operations.
Data: The term data refers to qualitative or
quantitative attributes of a variable or set of
variables.
Data Collection: Data collection is a term used
to describe a process of preparing and collecting
data.
Data are generally classified into following two
groups:
a) Internal data
b) External data
2
Internal data: It comes from internal sources related
with the functioning of an organization or firm
where records regarding purchase, production,
sales, profits etc. are kept on regular basis.
External data: External data are collected and
published by external agencies. External data
can be further classified as:
a) Primary data
b) Secondary data
Primary data: These are original, first-hand
data.
Secondary data: These are data that have already
been collected by a source other than the present
investigator.
3
[Diagram: Sources of data, divided into Internal and External; External data divided into Primary and Secondary]
4
The Collected data or raw data or ungrouped data
are always in an unorganized form and need to be
organized and presented in meaningful and readily
comprehensible form in order to facilitate further
statistical analysis.
Classification: It is the process of arranging things in
groups according to their resemblances and
affinities, giving expression to the unity of
attributes that may subsist amongst a diversity of
individuals. In simple words, it is the grouping of data
according to their identity, similarity or
resemblance. For example, letters in a post office are
classified according to their destinations, viz. Delhi,
Raipur, Agra, Kanpur etc.
5
[Diagram: Types of classification: Chronological or Temporal; Geographical or Spatial; Qualitative; Quantitative]
6
Chronological or Temporal classification: In
Chronological classification, the collected data
are arranged according to the order of time
expressed in years, months, weeks etc. The data
are generally classified in ascending order of
time.
Example: The estimates of birth rates in India
during 1970-79 are:
Year    Birth rate
1970    36.8
1971    33.0
1972    33.0
1973    36.9
1974    36.6
1975    34.6
1976    34.5
1977    35.2
1978    34.2
1979    33.3
7
Geographical or Spatial classification: In this type of classification
the data are classified according to geographical region or place.
The observations are either classified in the alphabetical order of
the reference places or in the order of size of the observation.
Example:
1. When the names of countries are in alphabetical order:
Country    Yield of wheat
America    1925
China      893
Denmark    225
France     439
India      862
2. When observations are in descending order:
Country    Yield of wheat
America    1925
China      893
India      862
France     439
Denmark    225
8
Qualitative classification: In this type of
classification, data are classified on the basis of
some attribute or quality such as literacy, religion,
employment etc. Such attributes cannot be
measured on a scale.
When the classification is done with respect to one attribute
that is dichotomous in nature, two classes
are formed, one possessing the attribute and
the other not possessing it. This type
of classification is called simple or dichotomous
classification.
9
The classification where two or more attributes
are considered and several classes are formed, is
called a manifold classification.
10
[Diagram: Manifold classification of a population: Population divided into Urban and Rural, each further divided into Male and Female]
11
Quantitative classification: The collected data
are grouped with reference to characteristics
which can be measured and numerically
described, such as height, weight, sales, imports,
age, income etc.
12
If data are arranged in ascending or descending
order of magnitude, they are said to form an array.
Example: Consider the marks of 50 students.
Ungrouped data
21  50  42  75  55  67  74  55  47  64
71  61  40  25  25  54  64  37  88  44
31  70  81  51  45  63  49  43  35  67
68  31  38  45  59  75  57  29  66  50
56  84  56  88  63  32  55  88  79  78
Arranged in array (read down each column)
21  31  40  45  51  56  61  66  71  79
25  32  42  47  54  56  63  67  74  81
25  35  43  49  55  57  63  67  75  84
29  37  44  50  55  58  64  68  75  84
31  38  45  50  55  59  64  70  78  88
13
Diagrammatical representation: In this
presentation we make use of geometric figures
like bars, squares, rectangles, circles etc.
14
[Diagram: Types of diagrams: One-dimensional diagrams; Two-dimensional diagrams; Pictograms; Cartograms]
15
1. One-dimensional diagrams: One-dimensional
diagrams, also called bar diagrams, are widely
used for the visual presentation of
data.
16
[Diagram: Types of one-dimensional diagram: Simple bar diagram; Multiple bar diagram; Subdivided bar diagram; Percentage bar diagram; Deviation bar diagram; Broken bars]
17
i. Simple Bar-diagrams: It consists of a number
of rectangles and is used only for one-dimensional
comparisons. It is generally used
to show changes in the magnitudes of a
phenomenon over time or space.
18
Example: Draw a bar diagram to represent the
following data related to a school.
Year              1990  1991  1992  1993  1994  1995
No. of students    210   242   290   315   340   355
Present the data with a suitable diagram.
19
20
ii. Multiple Bar-diagrams: It is used when a
comparison is to be made between two or more
variables. These are also used for comparing
magnitudes of one variable in two or three
aspects.
Example: Following data relate to the faculty-wise enrolment of students in a college:
Years                       1993   1994   1995
No. of arts students          95    110    120
No. of science students      160    170    165
No. of commerce students      75     95    110
Represent the data by a suitable diagram.
21
22
iii. Subdivided Bar-diagrams: A subdivided bar diagram, also known as a
component bar diagram, is useful
when it is necessary to show and compare the
breakup of one variable into several
components.
Example: Following data relate to year-wise
enrolment in a college, classified according to
sex:
Year            1990-91  1991-92  1992-93  1993-94  1994-95
No. of girls        810      825      844      780      820
No. of boys        1215     1160     1325     1410     1480
Total              2025     1985     2169     2190     2300
Represent the data by a suitable diagram.
23
24
iv. Percentage Bar-diagrams: The construction of
percentage bar diagram is similar to the
subdivided bar chart. The difference between
the two is that, in subdivided bar diagram , the
component parts are shown in absolute
quantities, while in the percentage bar diagram,
the component parts are transformed into
percentages of the total. In this diagram all the
bars are of equal heights. These bars are then
divided in terms of percentages of the
components.
25
Example: Following data relate to the faculty-wise enrolment of students in a college:
Years                       1993   1994   1995
No. of arts students          95    110    120
No. of science students      160    170    165
No. of commerce students      75     95    110
Represent the data by a percentage bar diagram.
26
Data can be represented as
          % of students
Year    Arts    Science   Commerce   Total %
1993    28.79   48.48     22.73      100
1994    29.33   45.34     25.33      100
1995    30.38   41.77     27.85      100
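The percentages in the table above are each component divided by the year's total. A minimal Python sketch of that step follows; the names `enrolment` and `percentage_shares` are illustrative assumptions and do not come from the notes.

```python
# Enrolment by faculty and year, taken from the example above.
enrolment = {
    1993: {"Arts": 95, "Science": 160, "Commerce": 75},
    1994: {"Arts": 110, "Science": 170, "Commerce": 95},
    1995: {"Arts": 120, "Science": 165, "Commerce": 110},
}


def percentage_shares(counts):
    """Return each component as a percentage of the total of all components."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


shares = {year: percentage_shares(counts) for year, counts in enrolment.items()}

for year, row in shares.items():
    # Each row sums to 100, so every bar in the diagram has the same height.
    print(year, {name: round(pct, 2) for name, pct in row.items()})
```

Rounded to two places, the 1994 science share comes out as 45.33; the table shows 45.34, which makes the rounded row add up to exactly 100.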
27
28
v. Deviation Bar-diagrams: These are used to
show the magnitudes of a phenomenon, i.e. net
profit, net loss, net exports or imports etc. Bars
in these diagrams can assume both negative and
positive values.
Example: Depict the following data by a suitable
diagram (Balance of trade = Export - Import).
(Millions Rs.)
Year    Export   Import   Balance of trade
1993      98      115      -17
1994     110      140      -30
1995     115       96      +19
1996     120      100      +20
29
30
vi. Broken bars: It is used to represent series
having wide variations in values.
Example: The following data relate to sales in
five firms A, B, C, D, E.
Firms                   A    B    C     D     E
Sales (in Lakh Rs.)    25   38   300   200   56
Use a suitable bar diagram to represent the data.
31
32
2. Two dimensional diagrams: Such diagrams
are useful in situations when the proportion
between the magnitudes of the given values of
the variable is quite large.
33
[Diagram: Types of two-dimensional diagram: Rectangle diagram; Square and circle diagram; Pie diagram; Multiple pie diagram]
34
i. Rectangle Diagrams: These diagrams are used
for two dimensional comparisons. These
rectangles vary in height as well as in the width,
so that the areas of rectangles represent the
magnitude of the variable over time or space or
over some other characteristic of variation.
35
Example: The following data represent the
expenditure of two families on various items.
Represent the data by a rectangle diagram.
                                Expenditure (Rs.)
S. No.  Items                   Family A   Family B
1       Food                    1200       1700
2       Clothing                 500        800
3       House Rent               600        900
4       Fuel and electricity     250        300
5       Miscellaneous            450        800
        Total                   3000       4500
36
37
ii. Square and circle diagrams: These are useful when
the proportion between the magnitudes of the
given values is quite large.
Since the areas must be proportional to the values, the sides of the squares are kept
proportional to the square roots of the magnitudes,
and for circle diagrams the radii of the circles are kept
proportional to the same square roots.
Example: The following data relate to the plan
outlay of a country for three plans.
Five year plan               I     IV     VII
Outlay (Rs. ‘000 crores)   196   2060   8820
Represent the data by a square and circle
diagram.
38
Plan   Outlay   Ratio (square root of outlay)   Side of square or radius of circle
I       196     14                              0.7
IV     2060     45.39                           2.26
V      8820     93.91                           4.7
39
Square Diagram
Plan I a=0.7”
Plan IV a=2.26”
Plan V a=4.7”
40
Circle Diagram
Plan I r=0.7”
Plan IV r=2.26”
Plan V r=4.7”
41
iii. Pie diagrams: This diagram is generally used
to compare the relations between the various
subdivisions of a value. A pie diagram is a circle
divided into sectors with areas proportional to the
corresponding components. A pie diagram
shows the components or subdivisions in terms
of percentages only and not in absolute terms.
Example: The following data relate to faculty-wise
enrolment in a college.
Faculty           Science   Arts   Commerce   Total
No. of students      2010   1100       2390    5500
Represent the data by a pie diagram.
42
Faculty    No. of students   Angle in degrees
Science    2010              (2010/5500) × 360° = 131.56°
Arts       1100              (1100/5500) × 360° = 72°
Commerce   2390              (2390/5500) × 360° = 156.44°
Total      5500              360°
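The sector angles above follow from giving each component the same share of 360° that it has of the total. A short Python sketch of that rule is given below; `sector_angles` is a hypothetical helper name used only for illustration.

```python
# Faculty-wise enrolment from the pie diagram example.
students = {"Science": 2010, "Arts": 1100, "Commerce": 2390}


def sector_angles(components):
    """Angle of each pie sector: (component / total) * 360 degrees."""
    total = sum(components.values())
    return {name: 360 * value / total for name, value in components.items()}


angles = sector_angles(students)
for name, angle in angles.items():
    print(name, round(angle, 2))
# Science 131.56
# Arts 72.0
# Commerce 156.44
```

The same helper applies to a multiple pie diagram by calling it once per pie.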
43
44
iv. Multiple Pie Diagrams: A multiple pie
diagram is used for two dimensional
comparisons, where a variable value is shown
over time, space or in terms of some other
characteristic and the variable values are also
broken into components.
45
Example: The following data represent the
expenditure of two families on various items.
Represent the data by a multiple pie diagram.
                                Expenditure (Rs.)
S. No.  Items                   Family A   Family B
1       Food                    1200       1700
2       Clothing                 500        800
3       House Rent               600        900
4       Fuel and electricity     250        300
5       Miscellaneous            450        800
        Total                   3000       4500
46
                                Expenditure (Rs.)       Angles in degrees
S. No.  Items                   Family A   Family B     Family A                      Family B
1       Food                    1200       1700         (1200/3000) × 360° = 144°     (1700/4500) × 360° = 136°
2       Clothing                 500        800         (500/3000) × 360° = 60°       (800/4500) × 360° = 64°
3       House Rent               600        900         (600/3000) × 360° = 72°       (900/4500) × 360° = 72°
4       Fuel and electricity     250        300         (250/3000) × 360° = 30°       (300/4500) × 360° = 24°
5       Miscellaneous            450        800         (450/3000) × 360° = 54°       (800/4500) × 360° = 64°
        Total                   3000       4500         360°                          360°
47
48
3. Pictorial diagrams or Pictogram: Statistical data
may also be represented with the help of pictures.
Such a presentation is called a pictorial diagram or
pictogram. In pictograms, the magnitudes of the
values are explained with the help of pictures. In a
pictogram, a symbolic picture represents the total
magnitude of the values.
Example: The following data relate to the
production of electric bulbs in a factory.
Year                                1992  1993  1994  1995
Production of bulbs (in millions)     32    57    79    89
Represent the data by a pictogram.
49
[Figure: pictogram of bulb production for the years 1992 to 1995, with a vertical scale from 10 to 90]
50
4. Cartograms or Maps: Statistical data classified
according to geographical regions are also
representable with the help of suitable maps.
The representation of statistical data by maps is
called cartogram.
51
52
Graphical representation: It is used in the
situations when we observe some functional
relationship between the values of the variables.
It provides us an accurate conception of the
shape of a frequency distribution. There are
many forms of graphs which can be broadly
classified as:
1. Graphs of frequency distribution
2. Graphs of time series or line graphs
53
[Diagram: Graphs of frequency distributions: Histogram; Frequency polygon; Frequency curve; Cumulative frequency curve or ogive]
54
Graphs of frequency distribution: The graphs
representing a frequency distribution are:
1. Histogram
2. Frequency Polygon
3. Frequency curve
4. Cumulative frequency curve or ‘Ogive’
55
Example: The table below gives the distribution
of the ages of members of a sports club.
Age group (years)   No. of members
15-19               11
20-24               36
25-29               28
30-34               13
35-39                7
40-44                3
45-49                2
56
The smoothened frequency distribution
will be
Age group (years)   No. of members
14.5-19.5           11
19.5-24.5           36
24.5-29.5           28
29.5-34.5           13
34.5-39.5            7
39.5-44.5            3
44.5-49.5            2
57
[Figure: histogram of the above data]
58
[Figure: frequency polygon of the above data; horizontal axis: age group (14.5 to 44.5); vertical axis: number of members (0 to 40)]
59
[Figure: frequency curve of the above data; horizontal axis: age group]
60
Measures of Dispersion
61
The extent or degree to which data tend to
spread around an average is called the dispersion
or variation. Measures of dispersion help us in
studying the extent to which observations are
scattered around the average or central value.
62
Types of Dispersion: There are two types of
measures of dispersion.
a) Absolute measures of dispersion: These are
expressed in the same unit in which the
observations are given. Thus, absolute
measures of dispersion are useful for
comparing variation in two or more
distributions where the unit of measurement is
the same. Such measures are not suitable for
comparing the variability of distributions
expressed in different units of measurement.
63
b) Relative measure of dispersion: These are
expressed as ratio or percentage or the
coefficient of the absolute measure of
dispersion. Relative measures are useful for
comparing variability in two or more
distributions where units of measurement may
be different.
64
Various measures of Dispersion
The following are some important measures of
dispersion:
1. Range
2. Interquartile Range and Quartile Deviation
3. Mean Deviation or average deviation
4. Standard Deviation
65
Range: Range is the simplest measure of
Dispersion. For a given set of observations, the
range is the difference between the largest and
the smallest observation. Thus
Range=R=L-S
Where
L=the largest observation
S= the smallest observation
R= the Range
In the case of grouped data, the range is defined as
the difference between the upper limit of the
highest class and the lower limit of the lowest
class.
66
Coefficient of Range: Range is an absolute
measure of dispersion which is unsuitable for
comparing variation in two or more distributions
expressed in different units. So a relative
measure of dispersion called the coefficient of
range is defined as:
Coefficient of range $= \dfrac{L - S}{L + S}$
67
Example: Marks of 10 students in Mathematics
and Statistics are given below:
Marks in Mathematics   25  40  30  35  21  45  23  33  10  29
Marks in Statistics    30  39  23  42  20  40  25  30  18  19
a) Compare the range of marks in the two
subjects.
b) Compare the coefficients of range for both
the subjects.
68
Solution: Highest marks in Mathematics = 45
Lowest marks in Mathematics = 10
Range of marks in Mathematics: $R = L - S = 45 - 10 = 35$
Coefficient of range $= \dfrac{L - S}{L + S} = \dfrac{45 - 10}{45 + 10} = 0.64$
Highest marks in Statistics = 42
Lowest marks in Statistics = 18
Range of marks in Statistics: $R = L - S = 42 - 18 = 24$
Coefficient of range $= \dfrac{L - S}{L + S} = \dfrac{42 - 18}{42 + 18} = 0.4$
The range as well as the coefficient of range for
marks in Mathematics are higher than those for
marks in Statistics.
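The comparison above can be checked with a small Python sketch. The function names `value_range` and `coefficient_of_range` are illustrative assumptions; they implement R = L - S and (L - S)/(L + S) for ungrouped data.

```python
def value_range(data):
    """Range R = L - S: largest observation minus smallest observation."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Relative measure (L - S) / (L + S), free of the unit of measurement."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


maths = [25, 40, 30, 35, 21, 45, 23, 33, 10, 29]
stats = [30, 39, 23, 42, 20, 40, 25, 30, 18, 19]

print(value_range(maths), round(coefficient_of_range(maths), 2))  # 35 0.64
print(value_range(stats), round(coefficient_of_range(stats), 2))  # 24 0.4
```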
69
Example: Find the range and coefficient of range
from the following:
Mid value    5   10   15   20   25   30   35
Frequency    7    5    8   12    8    9    8
Solution: Class limits are
(2.5-7.5), (7.5-12.5), ..., (32.5-37.5)
Range: $R = L - S = 37.5 - 2.5 = 35$
Coefficient of range $= \dfrac{L - S}{L + S} = \dfrac{37.5 - 2.5}{37.5 + 2.5} = 0.875$
70
Interquartile Range and Quartile Deviation:
Interquartile range includes the middle fifty
percent of the distribution or it is the difference
between the third quartile (Q3) and the first
quartile (Q1).
Interquartile Range= Q3-Q1
Quartile Deviation or semi interquartile range is
defined as the average amount by which the two
quartiles differ from the median.
Quartile deviation or semi interquartile
range=(Q3-Q1)/2
71
Quartile deviation is an absolute measure of
dispersion. For comparing two or more
distributions in respect of variation, the
coefficient of quartile deviation is defined as
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
Example: From the following information on the
wages of 15 workers, find the interquartile range,
quartile deviation and coefficient of Q.D.
S. No.         1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
Wages (Rs.)  520  550  440  580  450  620  470  680  400  490  420  480  440  480  500
72
Solution: Arrange the wages in ascending order.
S. No.         1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
Wages (Rs.)  400  420  440  440  450  470  480  480  490  500  520  550  580  620  680
$Q_1 = \left(\dfrac{N + 1}{4}\right)^{\text{th}}$ term = 4th term = 440
$Q_3 = 3\left(\dfrac{N + 1}{4}\right)^{\text{th}}$ term = 12th term = 550
Interquartile range $= Q_3 - Q_1 = 550 - 440 = 110$
Quartile deviation $= \dfrac{Q_3 - Q_1}{2} = \dfrac{550 - 440}{2} = 55$
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{550 - 440}{550 + 440} = 0.11$
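The positional rule used in this solution can be sketched in Python as below. The helper `quartile` is a hypothetical name; it takes the k(N+1)/4-th ordered term and, as an assumption that goes beyond the worked example, interpolates linearly when that position is fractional.

```python
def quartile(values, k):
    """Return the k-th quartile (k = 1, 2, 3) as the k(N+1)/4-th ordered term."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4      # 1-based position in the ordered data
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    # Linear interpolation between neighbouring terms (an assumed convention).
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


wages = [520, 550, 440, 580, 450, 620, 470, 680, 400, 490, 420, 480, 440, 480, 500]

q1, q3 = quartile(wages, 1), quartile(wages, 3)
iqr = q3 - q1                       # interquartile range
qd = iqr / 2                        # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)    # coefficient of quartile deviation

print(q1, q3, iqr, qd, round(coeff_qd, 2))  # 440.0 550.0 110.0 55.0 0.11
```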
73
Example: Calculate quartile deviation and its
coefficient from the following distribution:
Weekly income (Rs.)   58  59  60  61  62  63  64  65  66
No. of workers         2   3   6  15  10   5   4   3   1
74
Solution:
Weekly income (Rs.)   No. of workers   Cumulative frequency
58                     2                2
59                     3                5
60                     6               11
61                    15               26
62                    10               36
63                     5               41
64                     4               45
65                     3               48
66                     1               49
$Q_1 = \left(\dfrac{N + 1}{4}\right)^{\text{th}}$ term = 12.5th term = 61
$Q_3 = 3\left(\dfrac{N + 1}{4}\right)^{\text{th}}$ term = 37.5th term = 63
Quartile deviation $= \dfrac{Q_3 - Q_1}{2} = \dfrac{63 - 61}{2} = 1$
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{63 - 61}{63 + 61} = 0.016$
75
Example: The following is the age distribution of
799 workers.
Age group        20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60
No. of workers      50     70    100    180    150    120     70     59
Find the quartile deviation and its coefficient.
Solution:
Age group   No. of workers   Cumulative frequency
20-25        50               50
25-30        70              120
30-35       100              220
35-40       180              400
40-45       150              550
45-50       120              670
50-55        70              740
55-60        59              799
76
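$Q_1 = \left(\dfrac{N}{4}\right)^{\text{th}}$ term = 199.75th term, which lies in the class 30-35.
$Q_1 = l_1 + \dfrac{\frac{N}{4} - C}{f} \times h = 30 + \dfrac{\frac{799}{4} - 120}{100} \times 5 = 33.9875 \approx 34$
$Q_3 = 3\left(\dfrac{N}{4}\right)^{\text{th}}$ term = 599.25th term, which lies in the class 45-50.
$Q_3 = l_1 + \dfrac{\frac{3N}{4} - C}{f} \times h = 45 + \dfrac{\frac{3 \times 799}{4} - 550}{120} \times 5 = 47.05 \approx 47$
Here $l_1$ is the lower limit of the quartile class, $C$ the cumulative frequency of the class before it, $f$ its frequency and $h$ its width.
Quartile deviation $= \dfrac{Q_3 - Q_1}{2} = \dfrac{47 - 34}{2} = 6.5$
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{47 - 34}{47 + 34} = 0.16$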
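For grouped data the quartile is located by interpolating inside the class that contains the required term. The following Python sketch illustrates the formula l1 + ((position - C)/f) × h; `grouped_quantile` is a hypothetical helper name, and the code keeps the unrounded quartiles, whereas the worked solution rounds them to 34 and 47 before computing the quartile deviation.

```python
def grouped_quantile(classes, freqs, position):
    """Interpolate the term at `position` (e.g. N/4) within its class.

    classes: list of (lower limit, upper limit); freqs: class frequencies.
    """
    cumulative = 0                                # C: cumulative frequency so far
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")


classes = [(20, 25), (25, 30), (30, 35), (35, 40),
           (40, 45), (45, 50), (50, 55), (55, 60)]
freqs = [50, 70, 100, 180, 150, 120, 70, 59]
n = sum(freqs)                                    # 799 workers

q1 = grouped_quantile(classes, freqs, n / 4)
q3 = grouped_quantile(classes, freqs, 3 * n / 4)

print(round(q1, 4), round(q3, 2))                 # 33.9875 47.05
```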
77
Mean Deviation or Average deviation: Mean
deviation of a series is the arithmetic mean of
the absolute deviations of various items from
some central value, such as mean, median,
mode.
1. For ungrouped data
a) Mean deviation from mean $\bar{X}$
$M.D._{\bar{X}} = \dfrac{1}{N} \sum |X - \bar{X}|$
b) Mean deviation from median $M_d$
$M.D._{M_d} = \dfrac{1}{N} \sum |X - M_d|$
c) Mean deviation from mode $M_o$
$M.D._{M_o} = \dfrac{1}{N} \sum |X - M_o|$
78
2. For grouped data
a) Mean deviation from mean $\bar{X}$
$M.D._{\bar{X}} = \dfrac{1}{\sum f_i} \sum f_i |X_i - \bar{X}|$
b) Mean deviation from median $M_d$
$M.D._{M_d} = \dfrac{1}{\sum f_i} \sum f_i |X_i - M_d|$
c) Mean deviation from mode $M_o$
$M.D._{M_o} = \dfrac{1}{\sum f_i} \sum f_i |X_i - M_o|$
79
Coefficient of Mean deviation: Mean deviation
is an absolute measure of dispersion. The
corresponding relative measure, called the coefficient
of mean deviation, is obtained by dividing the mean
deviation by the average or central value used
for calculating it.
Coefficient of M.D. $= \dfrac{M.D.}{\text{Mean or median or mode}}$
80
Example: Compute mean deviation from mean
and its coefficient from the following data
relating to the marks obtained by a batch of 11
students in a class test:
Marks
10 70 50 53 20 95 55 42 60 48 80
81
Solution:
Marks (X)   |X - mean|
10          43
70          17
50           3
53           0
20          33
95          42
55           2
42          11
60           7
48           5
80          27
Total 583   190
$\bar{X} = \dfrac{583}{11} = 53$
Mean deviation $= \dfrac{1}{N} \sum |X - \bar{X}| = \dfrac{190}{11} = 17.27$
Coefficient of mean deviation $= \dfrac{M.D.}{\text{Mean}} = \dfrac{17.27}{53} = 0.326$
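The steps of this solution can be sketched in Python as follows. `mean_deviation` is an illustrative helper name; it averages the absolute deviations from whichever central value is passed in.

```python
def mean_deviation(data, centre):
    """Arithmetic mean of the absolute deviations of the data from `centre`."""
    return sum(abs(x - centre) for x in data) / len(data)


marks = [10, 70, 50, 53, 20, 95, 55, 42, 60, 48, 80]

mean = sum(marks) / len(marks)      # 583 / 11
md = mean_deviation(marks, mean)    # 190 / 11
coeff_md = md / mean                # relative measure

print(mean, round(md, 2), round(coeff_md, 3))  # 53.0 17.27 0.326
```

Passing the median or the mode as `centre` gives the other two forms of mean deviation.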
82
Example: Calculate mean deviation from median
from the following data. Also compute the
coefficient of M.D.
Size         2   4   6   8   10   12   14   16
Frequency    2   2   4   5    3    2    1    1
83
Solution:
X       f     C.F.   |X - Md|   f|X - Md|
2       2      2     6          12
4       2      4     4           8
6       4      8     2           8
8       5     13     0           0
10      3     16     2           6
12      2     18     4           8
14      1     19     6           6
16      1     20     8           8
Total   20           32         56
Median $(M_d) = \left(\dfrac{N + 1}{2}\right)^{\text{th}}$ term = 10.5th term = 8
Mean deviation $= \dfrac{1}{N} \sum f |X - M_d| = \dfrac{56}{20} = 2.8$
Coefficient of mean deviation $= \dfrac{M.D.}{\text{Median}} = \dfrac{2.8}{8} = 0.35$
84
Example: Compute the mean deviation (M.D.)
from mean from the following data.
Classes      0-20   20-40   40-60   60-80   80-100   100-120
Frequency       5      50      84      32       10         6
Also find the coefficient of M.D.
85
Solution:
Classes    Frequency (f)   Mid point (x)   d = (x - 70)/20   fd     |x - 51|   f|x - 51|
0-20         5              10             -3                 -15   41          205
20-40       50              30             -2                -100   21         1050
40-60       84              50             -1                 -84    1           84
60-80       32              70              0                   0   19          608
80-100      10              90              1                  10   39          390
100-120      6             110              2                  12   59          354
Total      187                                               -177              2691
Mean $= A + \dfrac{\sum fd}{\sum f} \times h = 70 + \dfrac{-177}{187} \times 20 = 51.07 \approx 51$
Mean deviation $= \dfrac{1}{N} \sum f |x - \bar{x}| = \dfrac{2691}{187} = 14.39$
Coefficient of mean deviation $= \dfrac{M.D.}{\text{Mean}} = \dfrac{14.39}{51} = 0.28$
86
Standard Deviation: The standard deviation is
defined as the positive square root of the
arithmetic mean of the squares of deviations of
the observations from the arithmetic mean.
a) For ungrouped data
$\sigma = \sqrt{\dfrac{1}{N} \sum (X - \bar{X})^2}$
b) For grouped data or frequency distribution
$\sigma = \sqrt{\dfrac{1}{N} \sum f (X - \bar{X})^2}$
87
Variance: The square of standard deviation is
known as variance.
a) For ungrouped data
$\sigma^2 = \dfrac{1}{N} \sum (X - \bar{X})^2$
b) For grouped data or frequency distribution
$\sigma^2 = \dfrac{1}{N} \sum f (X - \bar{X})^2$
88
Example: Calculate standard deviation from the
following set of observations:
X   10  11  17  25  7  13  21  10  12  14
89
Solution:
X      X - 14   (X - 14)^2
10     -4        16
11     -3         9
17      3         9
25     11       121
7      -7        49
13     -1         1
21      7        49
10     -4        16
12     -2         4
14      0         0
Total 140       274
Mean $= \dfrac{\sum X}{N} = \dfrac{140}{10} = 14$
Standard deviation $= \sqrt{\dfrac{1}{N} \sum (X - \bar{X})^2} = \sqrt{\dfrac{274}{10}} = 5.23$
90
Example: Calculate standard deviation of the
following discrete frequency distribution:
Size (X)     4    5    6    7    8    9   10
Frequency    6   12   15   28   20   14    5
91
Solution:
Size (X)   Frequency (f)   d = X - 7   fd    fd^2
4            6             -3          -18   54
5           12             -2          -24   48
6           15             -1          -15   15
7           28              0            0    0
8           20              1           20   20
9           14              2           28   56
10           5              3           15   45
Total      100                           6  238
Mean $= A + \dfrac{\sum fd}{\sum f} \times h = 7 + \dfrac{6}{100} \times 1 = 7.06$
Standard deviation $= h \sqrt{\dfrac{\sum fd^2}{\sum f} - \left(\dfrac{\sum fd}{\sum f}\right)^2} = 1 \times \sqrt{\dfrac{238}{100} - \left(\dfrac{6}{100}\right)^2} = 1.54$
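The step-deviation shortcut used in this solution can be sketched in Python as below. The function name `mean_and_sd` and its parameters `assumed_mean` (A) and `step` (h) are illustrative assumptions.

```python
import math


def mean_and_sd(values, freqs, assumed_mean, step=1):
    """Mean and standard deviation by the step-deviation method.

    d = (x - A) / h, mean = A + h * sum(fd) / N,
    sd = h * sqrt(sum(fd^2) / N - (sum(fd) / N)^2).
    """
    n = sum(freqs)
    d = [(x - assumed_mean) / step for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = assumed_mean + step * sum_fd / n
    sd = step * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd


sizes = [4, 5, 6, 7, 8, 9, 10]
freqs = [6, 12, 15, 28, 20, 14, 5]

mean, sd = mean_and_sd(sizes, freqs, assumed_mean=7)
print(round(mean, 2), round(sd, 2))  # 7.06 1.54
```

The result does not depend on the choice of `assumed_mean`; a value near the centre of the data only keeps the hand arithmetic small.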
92
Moments: Moments are used to describe the
characteristics of a distribution. The moments of
a distribution are the arithmetic mean of the
various powers of the deviations of items from
some given numbers.
93
Moments about mean (Central moment):
a) For an individual series: If $x_1, x_2, \dots, x_n$ are the n
observations in a data set with mean $\bar{x}$, then the
rth moment about the mean of a variable is
defined as
$\mu_r = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^r, \quad r = 0, 1, 2, \dots$
94
b) For grouped data or frequency distribution:
Let $x_1, x_2, \dots, x_n$ be the n observations in a data set
with corresponding frequencies $f_1, f_2, \dots, f_n$
respectively. Then the rth moment about the mean of
a variable is defined as
$\mu_r = \dfrac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{x})^r, \quad r = 0, 1, 2, \dots$
where $N = \sum_{i=1}^{n} f_i$
95
In particular
$\mu_0 = 1$
$\mu_1 = 0$
$\mu_2 = \sigma^2$
96
Moments about an arbitrary point (Raw
moment):
a) For an individual series: If $x_1, x_2, \dots, x_n$ are the n
observations in a data set, then the rth moment
about an arbitrary point A is defined as
$\mu_r' = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - A)^r, \quad r = 0, 1, 2, \dots$
97
b) For grouped data or frequency distribution:
Let $x_1, x_2, \dots, x_n$ be the n observations in a data set
with corresponding frequencies $f_1, f_2, \dots, f_n$
respectively. Then the rth moment about an arbitrary
point A is defined as
$\mu_r' = \dfrac{1}{N} \sum_{i=1}^{n} f_i (x_i - A)^r, \quad r = 0, 1, 2, \dots$
where $N = \sum_{i=1}^{n} f_i$
98
In particular
$\mu_0' = 1$
$\mu_1' = \bar{x} - A$
$\mu_2' = \dfrac{1}{N} \sum_{i=1}^{n} f_i (x_i - A)^2$
99
Moment about zero or origin:
Let $x_1, x_2, \dots, x_n$ be the n observations in a data set
with corresponding frequencies $f_1, f_2, \dots, f_n$
respectively. Then the rth moment about the origin, written $\nu_r$, is
defined as
$\nu_r = \dfrac{1}{N} \sum_{i=1}^{n} f_i x_i^r, \quad r = 0, 1, 2, \dots$
where $N = \sum_{i=1}^{n} f_i$
100
In particular, for the moments $\nu_r$ about the origin
$\nu_0 = 1$
$\nu_1 = \bar{x}$
$\nu_2 = \dfrac{1}{N} \sum_{i=1}^{n} f_i x_i^2$
101
Relation between $\mu_r$ and $\mu_r'$
$\mu_r = \mu_r' - {}^{r}C_1 \mu_{r-1}' \mu_1' + {}^{r}C_2 \mu_{r-2}' \mu_1'^2 - \dots + (-1)^r \mu_1'^r$
In particular
$\mu_2 = \mu_2' - \mu_1'^2$
$\mu_3 = \mu_3' - 3\mu_2' \mu_1' + 2\mu_1'^3$
$\mu_4 = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' \mu_1'^2 - 3\mu_1'^4$
102
Relation between $\nu_r$ (moments about the origin) and $\mu_r'$
$\nu_r = \mu_r' + {}^{r}C_1 \mu_{r-1}' A + {}^{r}C_2 \mu_{r-2}' A^2 + \dots + A^r$
In particular
$\nu_1 = \bar{x}$
$\nu_2 = \mu_2' + 2\mu_1' A + A^2$
103
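Relation between $\nu_r$ (moments about the origin) and $\mu_r$
$\nu_r = \mu_r + {}^{r}C_1 \mu_{r-1} \bar{x} + {}^{r}C_2 \mu_{r-2} \bar{x}^2 + \dots + \bar{x}^r$
In particular
$\nu_1 = \bar{x}$
$\nu_2 = \mu_2 + \bar{x}^2$
$\nu_3 = \mu_3 + 3\mu_2 \bar{x} + \bar{x}^3$
$\nu_4 = \mu_4 + 4\mu_3 \bar{x} + 6\mu_2 \bar{x}^2 + \bar{x}^4$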
104
Example: Calculate the first four moments about
mean from the following distribution:
X   0   1    2    3    4    5    6   7   8
f   1   8   28   56   70   56   28   8   1
105
Solution:
X       Frequency (f)   fX     X - 4   f(X - 4)   f(X - 4)^2   f(X - 4)^3   f(X - 4)^4
0         1               0    -4       -4         16           -64          256
1         8               8    -3      -24         72          -216          648
2        28              56    -2      -56        112          -224          448
3        56             168    -1      -56         56           -56           56
4        70             280     0        0          0             0            0
5        56             280     1       56         56            56           56
6        28             168     2       56        112           224          448
7         8              56     3       24         72           216          648
8         1               8     4        4         16            64          256
Total   256            1024              0        512             0         2816
Mean $= \bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{1024}{256} = 4$
106
$\mu_1 = \dfrac{\sum f (X - \bar{X})}{\sum f} = 0$
$\mu_2 = \dfrac{\sum f (X - \bar{X})^2}{\sum f} = \dfrac{512}{256} = 2$
$\mu_3 = \dfrac{\sum f (X - \bar{X})^3}{\sum f} = 0$
$\mu_4 = \dfrac{\sum f (X - \bar{X})^4}{\sum f} = \dfrac{2816}{256} = 11$
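The four central moments above can be reproduced with a short Python sketch. `central_moment` is a hypothetical helper name implementing the grouped-data definition of the rth moment about the mean.

```python
def central_moment(values, freqs, r):
    """r-th moment about the mean of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / n
    return sum(f * (x - mean) ** r for f, x in zip(freqs, values)) / n


values = list(range(9))                       # X = 0, 1, ..., 8
freqs = [1, 8, 28, 56, 70, 56, 28, 8, 1]

moments = [central_moment(values, freqs, r) for r in range(1, 5)]
print(moments)  # [0.0, 2.0, 0.0, 11.0]
```

The first moment about the mean is always zero, and the third is zero here because the distribution is symmetrical about 4.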
107
Example: The first three moments of a
distribution about the value 2 of the variable are
1, 16 and -40. Show that the mean is 3, the
variance is 15 and the third moment about mean is -86. Also show that the first three moments
about the origin are 3, 24 and 76.
108
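Solution: Given that
$A = 2, \quad \mu_1' = 1, \quad \mu_2' = 16, \quad \mu_3' = -40$
$\mu_1' = \bar{x} - A \Rightarrow \bar{x} = \mu_1' + A = 3$
$\mu_2 = \mu_2' - \mu_1'^2 = 16 - 1 = 15$
$\mu_3 = \mu_3' - 3\mu_2' \mu_1' + 2\mu_1'^3 = -40 - 3(16)(1) + 2(1)^3 = -86$
Writing $\nu_r$ for the moments about the origin:
$\nu_1 = \bar{x} = 3$
$\nu_2 = \mu_2 + \bar{x}^2 = 24$
$\nu_3 = \mu_3 + 3\mu_2 \bar{x} + \bar{x}^3 = 76$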
109
Skewness: Skewness means lack of symmetry. A
frequency distribution of a set of values that is
not symmetrical is called asymmetrical or
skewed. In a skewed distribution, extreme values
in a data set move towards one side of the
distribution. When extreme values move
towards the upper or right tail, the distribution is
positively skewed. When extreme values move
towards the lower or left tail, the distribution is
negatively skewed. The basic purpose of
measuring skewness is to estimate the extent to
which a distribution is distorted from a perfectly
symmetrical distribution.
110
[Figure: curves of a symmetrical distribution, a positively skewed distribution and a negatively skewed distribution]
111
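Symmetrical distribution: Mean = Median = Mode
Positively skewed distribution: Mean > Median > Mode
Negatively skewed distribution: Mean < Median < Mode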
112
Measure of Skewness: The degree of skewness
in a distribution can be classified as follows:
a) Absolute measure of skewness
b) Relative measure of skewness
114
Absolute measure of skewness: Skewness can
be measured in absolute terms by finding the
difference between the mean and the mode or
mean and median.
Skewness = Mean-Mode
Skewness = Mean-Median
Skewness = Q3+Q1-2Median
115
Relative measure of skewness: The relative
measure of skewness, known as the coefficient of
skewness, is obtained by dividing the absolute
measure of skewness by a measure of
dispersion.
116
[Diagram: Relative measures of skewness: Karl Pearson coefficient of skewness; Bowley coefficient of skewness; Kelly’s coefficient of skewness; Method of moments]
117
1. Karl Pearson’s coefficient of skewness: Karl
Pearson’s coefficient of skewness is based on
the difference between mean and mode. Using the
empirical relation Mean - Mode ≈ 3(Mean - Median), it
is given by
Coefficient of skewness $(Sk_p) = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
118
2. Bowley coefficient of skewness: This method
is based on the fact that in a symmetrical
distribution, the quartiles are equidistant from
the median.
Coefficient of skewness $(Sk_B) = \dfrac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1}$
119
3. Kelly’s coefficient of skewness: Kelly’s
coefficient of skewness is based on percentiles
and deciles.
Coefficient of skewness $(Sk_k) = \dfrac{P_{10} + P_{90} - 2\,\text{Median}}{P_{90} - P_{10}} = \dfrac{D_1 + D_9 - 2\,\text{Median}}{D_9 - D_1}$
120
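4. Method of moments: It is denoted by $Sk_M$.
Coefficient of skewness $(Sk_M) = \dfrac{\mu_3}{\mu_2^{3/2}} = \pm\sqrt{\beta_1}$, where $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ and the sign is that of $\mu_3$.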
121
Example: Calculate Karl Pearson’s coefficient of
skewness from the following:
Marks above        0    10    20   30   40   50   60   70   80
No. of students   150  140   100   80   80   70   30   14    0
122
Solution:
Class    Frequency (f)   Cumulative frequency   Mid point (x)   d = (x - 45)/10   fd     fd^2
0-10     10               10                     5              -4                 -40   160
10-20    40               50                    15              -3                -120   360
20-30    20               70                    25              -2                 -40    80
30-40     0               70                    35              -1                   0     0
40-50    10               80                    45               0                   0     0
50-60    40              120                    55               1                  40    40
60-70    16              136                    65               2                  32    64
70-80    14              150                    75               3                  42   126
Total   150                                                                        -86   830
Median $= \left(\dfrac{N}{2}\right)^{\text{th}}$ term = 75th term, which lies in the class 40-50.
Median $= l + \dfrac{\frac{N}{2} - C}{f} \times h = 40 + \dfrac{75 - 70}{10} \times 10 = 45$
123
Mean $= A + \dfrac{\sum fd}{\sum f} \times h = 45 + \dfrac{-86}{150} \times 10 = 39.27$
Standard deviation $= h \sqrt{\dfrac{\sum fd^2}{\sum f} - \left(\dfrac{\sum fd}{\sum f}\right)^2} = 10 \sqrt{\dfrac{830}{150} - \left(\dfrac{-86}{150}\right)^2} = 22.81$
Coefficient of skewness $(Sk_p) = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \dfrac{3(39.27 - 45)}{22.81} = -0.75$
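The whole calculation can be checked with the Python sketch below. It is an illustration under stated assumptions: the class frequencies are first recovered from the "marks above" counts, `grouped_median` is a hypothetical helper name, and the mean and standard deviation are computed directly from the mid points rather than by step deviations, which gives the same values.

```python
import math

# "Marks above" counts: students scoring above 0, 10, ..., 80.
above = [150, 140, 100, 80, 80, 70, 30, 14, 0]
classes = [(10 * i, 10 * (i + 1)) for i in range(8)]     # 0-10, ..., 70-80
freqs = [above[i] - above[i + 1] for i in range(8)]      # class frequencies

n = sum(freqs)
mids = [(lower + upper) / 2 for lower, upper in classes]
mean = sum(f * x for f, x in zip(freqs, mids)) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)


def grouped_median(classes, freqs):
    """Median by interpolation: l + ((N/2 - C) / f) * h."""
    half = sum(freqs) / 2
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")


median = grouped_median(classes, freqs)
sk_p = 3 * (mean - median) / sd       # Karl Pearson's coefficient

print(round(mean, 2), median, round(sd, 2), round(sk_p, 2))
# 39.27 45.0 22.81 -0.75
```

The negative sign indicates that the distribution is negatively skewed.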
124
Example: From the following distribution,
calculate the first four moments about mean,
and the coefficient of skewness based on moments:
Income (Rs.)   0-10   10-20   20-30   30-40
Frequency         1       3       4       2
125
Solution:
Income (Rs.)   Frequency (f)   Mid point (x)   fx    x - 22   f(x - 22)   f(x - 22)^2   f(x - 22)^3   f(x - 22)^4
0-10            1               5                5   -17      -17         289           -4913          83521
10-20           3              15               45    -7      -21         147           -1029           7203
20-30           4              25              100     3       12          36             108            324
30-40           2              35               70    13       26         338            4394          57122
Total          10                              220              0         810           -1440         148170
Mean $= \bar{X} = \dfrac{\sum fx}{\sum f} = \dfrac{220}{10} = 22$
126
$\mu_1 = \dfrac{\sum f (x - \bar{X})}{\sum f} = 0$
$\mu_2 = \dfrac{\sum f (x - \bar{X})^2}{\sum f} = \dfrac{810}{10} = 81$
$\mu_3 = \dfrac{\sum f (x - \bar{X})^3}{\sum f} = \dfrac{-1440}{10} = -144$
$\mu_4 = \dfrac{\sum f (x - \bar{X})^4}{\sum f} = \dfrac{148170}{10} = 14817$
Coefficient of skewness $(Sk_M) = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{-144}{81^{3/2}} = -\dfrac{16}{81}$
127
Example: Calculate Bowley’s coefficient of
skewness from the following:
Wages (Rs.)      30-40  40-50  50-60  60-70  70-80  80-90  90-100
No. of persons       1      3     11     21     43     32       9
Solution:
Wages (Rs.)   No. of persons   Cumulative frequency
30-40           1                1
40-50           3                4
50-60          11               15
60-70          21               36
70-80          43               79
80-90          32              111
90-100          9              120
128
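$Q_1 = \left(\dfrac{N}{4}\right)^{\text{th}}$ term = 30th term, which lies in the class 60-70.
$Q_1 = l_1 + \dfrac{\frac{N}{4} - C}{f} \times h = 60 + \dfrac{\frac{120}{4} - 15}{21} \times 10 = 67.14 \approx 67$
$Q_3 = 3\left(\dfrac{N}{4}\right)^{\text{th}}$ term = 90th term, which lies in the class 80-90.
$Q_3 = l_1 + \dfrac{\frac{3N}{4} - C}{f} \times h = 80 + \dfrac{\frac{3 \times 120}{4} - 79}{32} \times 10 = 83.44 \approx 83$
Median $= \left(\dfrac{N}{2}\right)^{\text{th}}$ term = 60th term, which lies in the class 70-80.
Median $= l + \dfrac{\frac{N}{2} - C}{f} \times h = 70 + \dfrac{\frac{120}{2} - 36}{43} \times 10 = 75.58 \approx 76$
Coefficient of skewness $(Sk_B) = \dfrac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1} = \dfrac{67 + 83 - 2 \times 76}{83 - 67} = -0.125$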
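Bowley's coefficient for this example can be sketched in Python as below. The helper names `grouped_quantile` and `bowley_skewness` are illustrative assumptions. Following the worked solution, the quartiles and median are rounded to whole rupees first; the unrounded value is also computed to show how much that rounding matters.

```python
def grouped_quantile(classes, freqs, position):
    """Interpolate the term at `position` within its class: l + ((pos - C) / f) * h."""
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")


def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q1 + Q3 - 2 * Median) / (Q3 - Q1)."""
    return (q1 + q3 - 2 * median) / (q3 - q1)


classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [1, 3, 11, 21, 43, 32, 9]
n = sum(freqs)                                       # 120 persons

q1 = grouped_quantile(classes, freqs, n / 4)         # about 67.14
median = grouped_quantile(classes, freqs, n / 2)     # about 75.58
q3 = grouped_quantile(classes, freqs, 3 * n / 4)     # about 83.44

sk_rounded = bowley_skewness(round(q1), round(median), round(q3))
sk_exact = bowley_skewness(q1, median, q3)

print(sk_rounded)           # -0.125
print(round(sk_exact, 3))   # -0.036
```

Both values are negative, so the conclusion of slight negative skewness does not change, but the magnitude is sensitive to rounding the quartiles.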
129
Kurtosis: The measure of kurtosis describes the
degree of concentration of observed frequencies
in a given data set. Kurtosis is used to test how closely
a frequency distribution conforms to the normal
curve; it is the degree of peakedness of a
distribution, usually taken relative to a normal
distribution.
130
Measures of Kurtosis: Karl Pearson’s coefficient
of kurtosis is defined as
$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
The kurtosis of a distribution is also defined as
$\gamma_2 = \beta_2 - 3$
If $\gamma_2 > 0$, the distribution is leptokurtic.
If $\gamma_2 < 0$, the distribution is platykurtic.
If $\gamma_2 = 0$, the distribution is mesokurtic.
131
132
Example: Calculate the coefficient of skewness
and kurtosis from the following data:
Profit (Rs. in lakh)   10-20   20-30   30-40   40-50   50-60
No. of companies          18      20      30      22      10
133
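Solution:
Class interval   Frequency (f)   Mid point (x)   d = (x - 35)/10   fd    fd^2   fd^3   fd^4
10-20            18              15              -2                -36    72    -144   288
20-30            20              25              -1                -20    20     -20    20
30-40            30              35               0                  0     0       0     0
40-50            22              45               1                 22    22      22    22
50-60            10              55               2                 20    40      80   160
Total           100                                                -14   154     -62   490
$\mu_1' = \dfrac{\sum fd}{\sum f} \times h = \dfrac{-14}{100} \times 10 = -1.4$
$\mu_2' = \dfrac{\sum fd^2}{\sum f} \times h^2 = \dfrac{154}{100} \times 100 = 154$
134
$\mu_3' = \dfrac{\sum fd^3}{\sum f} \times h^3 = \dfrac{-62}{100} \times 10^3 = -620$
$\mu_4' = \dfrac{\sum fd^4}{\sum f} \times h^4 = \dfrac{490}{100} \times 10^4 = 49000$
$\mu_2 = \mu_2' - \mu_1'^2 = 152.04$
$\mu_3 = \mu_3' - 3\mu_2' \mu_1' + 2\mu_1'^3 = 21.312$
$\mu_4 = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' \mu_1'^2 - 3\mu_1'^4 = 47327.52$
Coefficient of skewness $\sqrt{\beta_1} = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{21.312}{152.04^{3/2}} = 0.0114$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{47327.52}{152.04^2} = 2.0474$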
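The conversion from raw moments about the assumed mean A = 35 to central moments, and then to the skewness and kurtosis coefficients, can be sketched in Python as follows. The variable names are illustrative assumptions; the raw moments are computed directly from (x - A), which is equivalent to the step-deviation sums multiplied by powers of h.

```python
import math

mids = [15, 25, 35, 45, 55]        # mid points of the profit classes
freqs = [18, 20, 30, 22, 10]
A = 35                             # assumed mean used in the solution
n = sum(freqs)

# Raw moments about A: mu'_r = sum(f * (x - A)^r) / N for r = 1..4.
m1, m2, m3, m4 = (
    sum(f * (x - A) ** r for f, x in zip(freqs, mids)) / n for r in range(1, 5)
)

# Central moments from raw moments.
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

skewness = mu3 / mu2 ** 1.5        # moment coefficient of skewness
beta2 = mu4 / mu2 ** 2             # Karl Pearson's coefficient of kurtosis
gamma2 = beta2 - 3                 # negative here, so the curve is platykurtic

print(round(mu2, 2), round(mu3, 3), round(mu4, 2))   # 152.04 21.312 47327.52
print(round(skewness, 4), round(beta2, 4))           # 0.0114 2.0474
```

Since beta2 is below 3, the distribution in this example is platykurtic, and the small positive skewness coefficient indicates a nearly symmetrical curve.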
135
Example: Prove that the frequency distribution
curve of the following frequency distribution is
leptokurtic:
Class       10-15  15-20  20-25  25-30  30-35  35-40  40-45  45-50  50-55
Frequency       1      4      8     19     35     20      7      5      1
136
Solution:
Class interval   Frequency (f)   Mid point (x)   d = (x - 32.5)/5   fd    fd^2   fd^3   fd^4
10-15             1              12.5            -4                  -4    16    -64    256
15-20             4              17.5            -3                 -12    36   -108    324
20-25             8              22.5            -2                 -16    32    -64    128
25-30            19              27.5            -1                 -19    19    -19     19
30-35            35              32.5             0                   0     0      0      0
35-40            20              37.5             1                  20    20     20     20
40-45             7              42.5             2                  14    28     56    112
45-50             5              47.5             3                  15    45    135    405
50-55             1              52.5             4                   4    16     64    256
Total           120                                                   2   212     20   1520
$\mu_1' = \dfrac{\sum fd}{\sum f} \times h = \dfrac{2}{120} \times 5 = \dfrac{1}{12}$
137
$\mu_2' = \dfrac{\sum fd^2}{\sum f} \times h^2 = \dfrac{212}{120} \times 5^2 = \dfrac{265}{6}$
$\mu_3' = \dfrac{\sum fd^3}{\sum f} \times h^3 = \dfrac{20}{120} \times 5^3 = \dfrac{125}{6}$
$\mu_4' = \dfrac{\sum fd^4}{\sum f} \times h^4 = \dfrac{1520}{120} \times 5^4 = \dfrac{23750}{3}$
$\mu_2 = \mu_2' - \mu_1'^2 = \dfrac{6359}{144}$
$\mu_4 = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' \mu_1'^2 - 3\mu_1'^4 = \dfrac{54684719}{6912}$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{54684719/6912}{(6359/144)^2} = 4.057 > 3$
Since $\beta_2 > 3$, the distribution is leptokurtic.
138
Example: The first four moments about the
working mean 28.5 of a distribution are 0.294,
7.144, 42.409 and 454.98. Calculate the
moments about mean. Also evaluate β1 and β2, and
comment upon the skewness and kurtosis of the
distribution.
139
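Solution: Given that
$A = 28.5, \quad \mu_1' = 0.294, \quad \mu_2' = 7.144, \quad \mu_3' = 42.409, \quad \mu_4' = 454.98$
$\mu_2 = \mu_2' - \mu_1'^2 = 7.057564$
$\mu_3 = \mu_3' - 3\mu_2' \mu_1' + 2\mu_1'^3 = 36.16$
$\mu_4 = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' \mu_1'^2 - 3\mu_1'^4 = 408.79$
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{36.16^2}{7.058^3} = 3.719$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{408.79}{7.058^2} = 8.206 > 3$
Since $\mu_3 > 0$, the distribution is positively skewed, and since $\beta_2 > 3$, it is leptokurtic.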