PSYCHOLOGICAL STATISTICS
II SEMESTER
Complementary Course
For
B.Sc. Counselling Psychology
(CU-CBCSS)
(2014 Admission onwards)
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
Calicut university P.O, Malappuram Kerala, India 673 635.
STUDY MATERIAL
Prepared by:
Ms. Sajila
Research Scholar
University of Calicut
Layout: Computer Section, SDE
© Reserved
CONTENT
Module 1: Frequency Distribution and Graphs
Module 2: Measures of Central Tendency
Module 3: Measures of Dispersion
Module 1: Frequency Distribution and Graphs
Horace Secrist defines statistics as, “aggregate of facts, affected to a marked extent
by multiplicity of causes, numerically expressed, enumerated or estimated according to a
reasonable standard of accuracy, collected in a systematic manner for a predetermined
purpose and placed in relation to each other”.
Meaning of Data
The term ‘data’ refers to facts or evidence relating to a group, situation or phenomenon. Data may include raw facts such as names, measurements of height and weight, and scores on tests, experiments or surveys.
Measures of Data: Continuous and Discrete
Data may be either continuous or discrete. Data relating to psychological and physical traits are generally continuous. A continuous series can be subdivided to any degree: each measure, whether an integer or a fraction, may lie anywhere within the range of the scale used. That is, continuous data are not restricted to defined, separate values but can take any value over a continuous range, and between any two continuous data values there may be an infinite number of other values. Examples: heights such as 160.5 cm and 159.6 cm, distances such as 25.7 km and 56.5 km, and exam scores such as 85.5 and 73.5.
Discrete data can take only particular values. There may potentially be an infinite number of those values, but each is distinct and there is no grey area in between; that is, the measures in a discrete series are separate and distinct, with a real gap between them. Examples: the number of students in a class (50, 45, etc.) and the number of books in a library (1000, 2345, etc.).
Organisation of Data
Different aspects of psychology may be studied by conducting tests, surveys and experiments, which yield valuable data. Data in their original form, which have little meaning to the reader or investigator, are termed raw data. To make raw data meaningful, they have to be organised or arranged systematically. This process of organising or arranging original data in a systematic manner in order to make meaningful interpretations is termed organisation or grouping of data.
There are different methods for the organisation of data. Data may be organised in any of the following forms:
1. Statistical Tables
2. Rank Order
3. Frequency Distribution
Organising data in the form of Statistical Tables: Under this method, data are presented in tabular form, arranged into rows and columns under different headings. The tables may contain the original data or raw scores as well as percentages, means, standard deviations, etc. Consider the statistical table given below.
Example 1.1
Organising data in the form of Rank Order: Under this method, raw data are arranged in an ascending or descending series, which reveals the rank or merit position of each individual. Consider the following example.
Example 1.2: The following are the scores obtained by 40 students in a test. Present the data in tabular form depicting the rank order. The scores are:
72  55  64  65  54  85  45  60  60  76
52  65  55  39  53  40  80  64  35  52
45  63  55  40  53  46  63  42  76  62
38  78  50  42  53  48  60  63  62  52
Solution: The rank order tabulation of the data
Sl No.  Score   Sl No.  Score   Sl No.  Score   Sl No.  Score   Sl No.  Score
1       35      9       45      17      53      25      60      33      65
2       38      10      46      18      53      26      62      34      65
3       39      11      48      19      54      27      62      35      72
4       40      12      50      20      55      28      63      36      76
5       40      13      52      21      55      29      63      37      76
6       42      14      52      22      55      30      63      38      78
7       42      15      52      23      60      31      64      39      80
8       45      16      53      24      60      32      64      40      85
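The rank order tabulation above can be reproduced with a short Python sketch. The helper name `rank_order` is hypothetical; it simply sorts the scores in ascending order and numbers them serially, so tied scores receive consecutive serial numbers, as in the table.

```python
# Scores of the 40 students in Example 1.2.
SCORES = [72, 55, 64, 65, 54, 85, 45, 60, 60, 76,
          52, 65, 55, 39, 53, 40, 80, 64, 35, 52,
          45, 63, 55, 40, 53, 46, 63, 42, 76, 62,
          38, 78, 50, 42, 53, 48, 60, 63, 62, 52]


def rank_order(values):
    """Return (serial number, score) pairs with scores in ascending order."""
    return list(enumerate(sorted(values), start=1))


table = rank_order(SCORES)
for sl_no, score in table[:8]:   # first column of the table: Sl No. 1 to 8
    print(sl_no, score)
```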
Frequency Distribution: A frequency distribution is a method of presenting data that shows the frequency, or the number of times a score or group of scores occurs, in a given distribution. Under this method, data are organised into groups or classes, and each score is allotted a place in its respective group or class. The number of times a particular score or group of scores occurs in the given distribution is also given; this is known as the frequency of that score or group of scores.
Construction of Frequency Distribution Table
Data are organised into a frequency distribution systematically. The following steps are used to construct a frequency distribution table:
1. Finding the Range: The first step is to find the range of the given series of data. Range is computed by subtracting the lowest score from the highest score in the data series. In Example 1.2, the range of the distribution is
Range = Highest score – Lowest score, i.e., 85 – 35 = 50.
2. Determining the Class Interval: The class interval denotes the number and size of the classes or groups used for grouping or organising data. There are two methods for this:
i. Computing the class interval (i) using the formula:
$$ i = \frac{\text{Range}}{\text{Number of classes desired}} $$
As a general rule, Tate (1955) has given the following guide for deciding the number of classes desired:
For 50 or fewer items, the number of classes may be 10.
For 50 to 100 items, 10 to 15 classes are appropriate.
For more than 100 items, 15 or more classes may be appropriate.
Ordinarily, not fewer than 10 or more than 20 classes are used.
ii. Under the second method, the class interval (i) is decided first and the number of classes is then determined. For this purpose, class intervals of 2, 3, 5 or 10 units in length are usually used.
Thus, in Example 1.2, the range is 50. As the number of scores is 40, which is less than 50, it may be sufficient to take 10 classes. Hence the class interval (i) will be
$$ i = \frac{\text{Range}}{\text{Number of classes desired}} = \frac{50}{10} = 5 $$
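Steps 1 and 2 can be sketched in Python for Example 1.2. The function names are hypothetical, and rounding a fractional class interval up to the next whole number is an assumption added here (in the text's example the division is exact, so it makes no difference).

```python
import math

# Scores of the 40 students in Example 1.2.
SCORES = [72, 55, 64, 65, 54, 85, 45, 60, 60, 76,
          52, 65, 55, 39, 53, 40, 80, 64, 35, 52,
          45, 63, 55, 40, 53, 46, 63, 42, 76, 62,
          38, 78, 50, 42, 53, 48, 60, 63, 62, 52]


def score_range(values):
    """Range = highest score - lowest score."""
    return max(values) - min(values)


def class_interval(values, n_classes):
    """i = Range / number of classes desired, rounded up (assumption)."""
    return math.ceil(score_range(values) / n_classes)


print(score_range(SCORES))          # 50
print(class_interval(SCORES, 10))   # 5
```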
3. Preparing the Frequency Distribution Table
After determining the number and size of the class intervals, we proceed to prepare the frequency distribution table. This involves two steps:
i. Writing the classes of the distribution
ii. Tallying the scores and checking the tallies
The first step is writing the classes of the distribution. For this, the lowest class is written first, followed by the successively higher classes.
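In the example, a class interval of 10 (one of the conventional lengths listed under the second method) is used for convenience, so the lowest class will be 30-39 and the subsequent higher classes 40-49, 50-59, 60-69, 70-79 and 80-89.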
The second step involves tallying the scores and checking the tallies. For this, the scores in the distribution are taken one by one and tallied in their proper classes, as shown in Table 1.1. The tally marks against each class are then counted and checked to determine the frequency of that class. The total of the frequencies should equal the number of individuals whose scores have been tabulated.
Table 1.1: Frequency Distribution Table
Class      Tallies            Frequency
80 – 89    ||                  2
70 – 79    ||||                4
60 – 69    ||||| ||||| ||     12
50 – 59    ||||| ||||| |      11
40 – 49    ||||| |||           8
30 – 39    |||                 3
Total                         40
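Tallying can also be checked mechanically. The sketch below (hypothetical helper `frequency_table`) counts the Example 1.2 scores into inclusive classes of width 10 starting at 30, matching the classes of Table 1.1, and prints one tally mark per score.

```python
# Scores of the 40 students in Example 1.2.
SCORES = [72, 55, 64, 65, 54, 85, 45, 60, 60, 76,
          52, 65, 55, 39, 53, 40, 80, 64, 35, 52,
          45, 63, 55, 40, 53, 46, 63, 42, 76, 62,
          38, 78, 50, 42, 53, 48, 60, 63, 62, 52]


def frequency_table(values, lowest, width, n_classes):
    """Count scores falling in inclusive classes such as 30-39, 40-49, ..."""
    table = []
    for k in range(n_classes):
        lower = lowest + k * width
        upper = lower + width - 1   # inclusive upper limit, e.g. 39 for 30-39
        freq = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, freq))
    return table


table = frequency_table(SCORES, lowest=30, width=10, n_classes=6)
for lower, upper, freq in reversed(table):   # highest class first, as in Table 1.1
    print(f"{lower} - {upper}  {'|' * freq:<12}  {freq}")
print("Total:", sum(freq for _, _, freq in table))   # should equal N = 40
```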
Cumulative frequency and Cumulative Percentage Frequency Distributions
A frequency distribution table shows how frequencies are distributed over the
different class intervals. For determining the number of scores or percentage of scores
lying above or below a class interval, another category of tables called cumulative
frequency and cumulative percentage frequency tables are constructed. The cumulative
frequency and cumulative percentage frequency distributions may be directly obtained
from frequency distribution. Consider Table 1.2.
Table 1.2: Cumulative Frequencies and Cumulative Percentage Frequencies
Class      Frequency   Cumulative Frequency   Cumulative Percentage Frequency
80 – 89    2           40                     100
70 – 79    4           38                     95
60 – 69    12          34                     85
50 – 59    11          22                     55
40 – 49    8           11                     27.5
30 – 39    3           3                      7.5
N = 40
In Table 1.2, the cumulative frequencies are obtained by successively adding the individual frequencies, starting from the lowest class. These cumulative frequencies are converted to cumulative percentage frequencies by multiplying each cumulative frequency by 100/N, where N is the total number of frequencies.
The cumulative percentage frequencies show the percentage of cases lying below a given score or class (and, by subtraction from 100, the percentage lying above it). In Table 1.2, consider, for example, the cumulative percentage 55%, which is computed as 22 × 100/40. This shows that 55% of the 40 students in the class obtained achievement scores in mathematics below 59.5, the actual or exact upper limit of the class 50-59.
Thus, cumulative frequencies and cumulative percentage frequencies help us determine the relative position, rank or merit of an individual with respect to the members of a group.
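The two columns of Table 1.2 can be generated with a few lines of Python. This is a minimal sketch with a hypothetical function name; it uses `itertools.accumulate` from the standard library for the running totals and expects the frequencies ordered from the lowest class upward.

```python
from itertools import accumulate


def cumulative_table(freqs):
    """Return cumulative frequencies and cumulative percentage
    frequencies (cf x 100 / N), lowest class first."""
    n = sum(freqs)
    cf = list(accumulate(freqs))
    cum_pct = [c * 100 / n for c in cf]
    return cf, cum_pct


# Frequencies of classes 30-39, 40-49, ..., 80-89 (Table 1.1, lowest class first).
freqs = [3, 8, 11, 12, 4, 2]
cf, cum_pct = cumulative_table(freqs)
print(cf)        # [3, 11, 22, 34, 38, 40]
print(cum_pct)   # [7.5, 27.5, 55.0, 85.0, 95.0, 100.0]
```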
Diagrams and Graphs
The data obtained from surveys, tests and experiments may be organised in the form
of statistical and frequency distribution tables. Such an organisation helps in the better
understanding of data and interpreting them to derive valuable conclusions. The numerical
data may be easily analysed if they are represented graphically in the form of pictures and
graphs.
Meaning of Graphical Data Representation
Graphical representation of data means representing numerical data in visual form, using pictures, diagrams and graphs, so that the data can be analysed more easily and effectively. It is considered an effective and economical way of presenting, understanding, analysing and interpreting statistical data.
Advantages
1. Precise and easy to understand.
2. More economical and effective method of representing data.
3. Attractive and appealing.
4. Easy to remember.
5. Easy to make comparisons with other data effectively.
6. Proper estimation, evaluation and interpretation of data is possible.
7. Easy computations of mean, median, mode, etc.
8. Helps in determining the nature of data and forecasting the trends.
Modes of Graphical Representation
The two types of data, ungrouped (data in raw form) and grouped (data organised into a frequency distribution), use different methods of graphical representation.
Graphical Representation of Ungrouped Data
Ungrouped data are usually represented by the following:
(1) Bar Diagrams (2) Pie Diagrams (3) Pictograms (4) Line Graphs
1. Bar Diagrams
Data in different forms, such as raw scores, total scores or frequencies, computed statistics and summarised figures like percentages and averages, can be represented by bars. This form of graphical data representation is called a bar diagram. Bar diagrams may be vertical or horizontal.
The lengths of the bars are proportional to the values of the variable (height, weight, intelligence, marks, price, etc.). The widths of the bars are chosen arbitrarily. It is conventional to make the space between bars about one half of the width of a bar. Consider Example 1.3 for an illustration of the bar diagram.
Example 1.3: The following data relates to the student enrolment in Zenith College in different years. Represent the following data using bar diagram.
Year           Number of Students Enrolled
2010 – 2011    1000
2011 – 2012    1220
2012 – 2013    900
2013 – 2014    1100
2014 – 2015    1400
2015 – 2016    1500
The above data can be represented using bar diagrams in vertical and horizontal forms, as given in Figures 1.1 and 1.2.
Figure 1.1: Vertical Bar Diagram – Student enrolment in Zenith College during the years 2010 to 2015 (X-axis: Year; Y-axis: Student enrolment).
Figure 1.2: Horizontal Bar Diagram – Student enrolment at Zenith College during the years 2010 to 2015 (X-axis: Student enrolment; Y-axis: Year).
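The rule that bar lengths are proportional to the values can be illustrated without any plotting library. This hypothetical `text_bars` sketch scales the Example 1.3 enrolments so that the largest value gets a bar of `max_width` characters; it is only a rough text version of a horizontal bar diagram, not a reproduction of the figures.

```python
# Student enrolment in Zenith College (Example 1.3).
ENROLMENT = [
    ("2010-2011", 1000),
    ("2011-2012", 1220),
    ("2012-2013", 900),
    ("2013-2014", 1100),
    ("2014-2015", 1400),
    ("2015-2016", 1500),
]


def text_bars(data, max_width=40):
    """Return one line per item; bar length is proportional to the value."""
    largest = max(value for _, value in data)
    lines = []
    for label, value in data:
        length = round(value / largest * max_width)
        lines.append(f"{label}  {'#' * length} {value}")
    return lines


lines = text_bars(ENROLMENT)
for line in lines:
    print(line)
```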
2. Pie Diagram
In a pie diagram, data are represented as sectors of a circle of 360°, in which each part represents its share of the data converted into an angle. The total frequency is equated to 360°, and the angles corresponding to the component parts are then calculated. Using these angles, the different sectors are drawn. Consider Example 1.4 for an illustration of preparing a pie diagram.
Example 1.4: The following data relates to Subjects offered for study in an institution and the number of students enrolled. Present the data graphically in the form of a pie diagram.
Subjects          : Science   Arts   Commerce
Students enrolled : 100       130    170
The above data can be presented in the form of pie diagram as given below.
Courses Offered   No. of Students   Angle of the Circle
Science           100               (100/400) × 360 = 90°
Arts              130               (130/400) × 360 = 117°
Commerce          170               (170/400) × 360 = 153°
Total             400               360°
Figure 1.3 Representation of Pie Diagram – Subjects offered for study and percentage of
students enrolled.
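The sector angles of Example 1.4 follow directly from the rule "angle = (value / total) × 360°". A minimal Python sketch with hypothetical function names; multiplying before dividing keeps these particular results exact in floating point. The percentage shares are shown alongside, since pie sectors are often labelled with them.

```python
def pie_angles(data):
    """Angle of each sector = (value / total) x 360 degrees."""
    total = sum(data.values())
    return {name: value * 360 / total for name, value in data.items()}


def percentages(data):
    """Share of each part as a percentage of the total."""
    total = sum(data.values())
    return {name: value * 100 / total for name, value in data.items()}


enrolled = {"Science": 100, "Arts": 130, "Commerce": 170}
print(pie_angles(enrolled))    # {'Science': 90.0, 'Arts': 117.0, 'Commerce': 153.0}
print(percentages(enrolled))   # {'Science': 25.0, 'Arts': 32.5, 'Commerce': 42.5}
```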
3. Pictograms
In data representation using pictograms, numerical data is represented by means of
picture figures appropriately designed in proportion to the numerical data.
Example 1.5: The number of students in classes 1 to 5 is given. Represent the data using pictogram.
Class    : I    II   III  IV   V
Strength : 70   70   60   50   40
Figure 1.4: Pictogram representation of number of students in classes 1 to 5.
4. Line Graphs
In a line graph, data relating to one variable are plotted on the horizontal X-axis and those relating to the other variable on the vertical Y-axis, and the plotted points are joined by straight lines. Consider Example 1.3 for drawing a line graph.
Figure 1.5: Line graph – Student enrolment in Zenith College in different years (X-axis: Year; Y-axis: No. of students enrolled).
Graphical Representation of Grouped Data
Raw data are organised into a frequency distribution to obtain grouped data. The methods of representing grouped data graphically are given below:
(1) Histogram (2) Frequency Polygon (3) Cumulative Frequency Graph (4) Cumulative Percentage Frequency Curve or Ogive
1. Histogram
A histogram is essentially a bar diagram of a frequency distribution in which the ‘actual’ class intervals plotted on the X-axis represent the widths of the bars (rectangles) and the respective frequencies of these classes represent the heights of the bars.
To determine the actual class limits, 0.5 is subtracted from the lower limit of the class and 0.5 is added to the upper limit. For example, for the class 50-54, the actual class limits are obtained by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit, giving the actual class interval 49.5 - 54.5.
The steps in the construction of histograms are given below:
1. Convert the scores into actual class limits, i.e., 20 – 24 becomes 19.5 – 24.5.
2. Take two extra class intervals, one above and one below the given classes, with zero frequency.
3. Plot the actual or exact lower limits of the classes on the X-axis.
4. Plot the frequencies of the distribution on the Y-axis.
5. Represent each class by a separate rectangle, in which the base of each rectangle is the width of the class interval (i) and the height is its respective frequency.
Consider the following example, which illustrates the representation of data in the form of a histogram.
Example 1.6
Score          : 30-39  40-49  50-59  60-69  70-79  80-89
No. of students:   3      8     11     12      4      2
To draw the histogram, mark the actual or exact lower limits of the classes of scores on the X-axis and the corresponding frequencies of the classes on the Y-axis.
Figure 1.6: Histogram representation of scores and frequencies.
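The construction steps for a histogram can be sketched in Python for Example 1.6. The function names are hypothetical; the sketch only computes the rectangles (left edge on the actual-limit scale, width and height), including the two empty classes added at the ends, and assumes all classes have equal width.

```python
def actual_limits(lower, upper):
    """Exact limits of an inclusive class, e.g. 30-39 -> (29.5, 39.5)."""
    return lower - 0.5, upper + 0.5


def histogram_bars(classes, freqs):
    """Return (left edge, width, height) for each rectangle, adding an
    empty class below and above the given classes."""
    width = classes[1][0] - classes[0][0]
    first, last = classes[0], classes[-1]
    padded = ([(first[0] - width, first[1] - width)] + list(classes)
              + [(last[0] + width, last[1] + width)])
    heights = [0] + list(freqs) + [0]
    bars = []
    for (lo, hi), f in zip(padded, heights):
        left, right = actual_limits(lo, hi)
        bars.append((left, right - left, f))
    return bars


classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [3, 8, 11, 12, 4, 2]
for bar in histogram_bars(classes, freqs):
    print(bar)
```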
2. Frequency Polygon
A frequency polygon is essentially a line graph used for the graphical representation of a frequency distribution. A frequency polygon is drawn from a histogram by joining the midpoints of the upper bases of the rectangular bars with straight lines. It can also be drawn directly by plotting the frequencies of the classes against their midpoints.
Steps in the construction of a frequency polygon are given below.
1. Take two extra classes one above and one below the given intervals with zero
frequency.
2. Compute the midpoints of classes.
3. Mark the midpoints along the X-axis and the corresponding frequencies on the Y-axis.
4. Join the points marked on the graph by using straight lines to obtain a frequency
polygon.
Example 1.7: Construct a frequency polygon from the data given below.
Score          : 50-59  60-69  70-79  80-89  90-99
No. of students:   5     10     30     40     15
Figure 1.7: Frequency Polygon
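The points to be joined for Example 1.7 can be listed with a short Python sketch (hypothetical function name). The mid-point is taken as the average of the stated class limits, e.g. (50 + 59)/2 = 54.5, which equals the mid-point of the actual limits 49.5 to 59.5; an extra zero-frequency class is added at each end, and equal class widths are assumed.

```python
def polygon_points(classes, freqs):
    """Points (mid-point, frequency) for a frequency polygon, with an extra
    zero-frequency class added below and above the given classes."""
    width = classes[1][0] - classes[0][0]
    mids = [(lo + hi) / 2 for lo, hi in classes]
    return ([(mids[0] - width, 0)] + list(zip(mids, freqs))
            + [(mids[-1] + width, 0)])


classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
freqs = [5, 10, 30, 40, 15]
print(polygon_points(classes, freqs))
# [(44.5, 0), (54.5, 5), (64.5, 10), (74.5, 30), (84.5, 40), (94.5, 15), (104.5, 0)]
```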
3. The Cumulative Frequency Graph
The data organised in the form of a cumulative frequency distribution may be
represented graphically using cumulative frequency graph. It is essentially a line graph
drawn by plotting actual upper limits of the class intervals on the X-axis and the respective
cumulative frequencies of these class intervals on the Y-axis.
Steps in the construction of cumulative frequency graph are given below.
1. Take one extra class with cumulative frequency as zero to plot the origin of the
graph on the X-axis.
2. Compute the actual upper limits of classes.
3. Compute the cumulative frequencies.
4. Mark the actual upper limits of classes on X-axis and mark the corresponding
cumulative frequencies on Y-axis.
5. Join the points plotted on graph by using straight lines resulting in a cumulative
frequency graph or a cumulative frequency line graph.
Example 1.8: Consider the following data for constructing a cumulative frequency graph.
Scores         : 30-39  40-49  50-59  60-69  70-79
No. of students:  20     35     25     15      5
Solution:
Actual upper limits of classes: 39.5  49.5  59.5  69.5  79.5
Cumulative frequencies        :  20    55    80    95   100
Figure 1.8: Cumulative Frequency Graph
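The plotted points for Example 1.8 can be generated as follows. In this minimal sketch (hypothetical function name), the starting point with zero cumulative frequency is placed at the actual lower limit of the first class, which is the actual upper limit of the extra empty class mentioned in step 1.

```python
from itertools import accumulate


def cumulative_graph_points(classes, freqs):
    """Points (actual upper limit, cumulative frequency), preceded by
    (actual lower limit of the first class, 0)."""
    start = classes[0][0] - 0.5
    uppers = [hi + 0.5 for _, hi in classes]
    return [(start, 0)] + list(zip(uppers, accumulate(freqs)))


classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [20, 35, 25, 15, 5]
print(cumulative_graph_points(classes, freqs))
# [(29.5, 0), (39.5, 20), (49.5, 55), (59.5, 80), (69.5, 95), (79.5, 100)]
```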
4. The Cumulative Percentage Frequency Curve or Ogive
The cumulative percentage frequency curve or ogive represents the cumulative
percentage frequency distribution by plotting exact or actual upper limits of classes on the
X-axis and their respective cumulative percentage frequencies of classes on the Y-axis.
Ogives can be useful in the computation of medians, quartiles, deciles, percentiles,
percentile ranks and percentile norms as well as for the overall comparison of two or more
groups or frequency distributions.
Consider the data given in Example 1.8 for the illustration of the ogive.
Solution:
Scores   Actual upper limits (X)   Frequencies (f)   Cumulative Frequencies (CF)   Cumulative Percentage Frequency
30-39    39.5                      20                20                            20
40-49    49.5                      35                55                            55
50-59    59.5                      25                80                            80
60-69    69.5                      15                95                            95
70-79    79.5                       5               100                           100
                                   N = 100
$$ \text{Cumulative Percentage Frequency} = \frac{CF}{N} \times 100 $$
Figure 1.9: Cumulative Percentage Frequency Curve or Ogive
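As an illustration of using an ogive to locate a median or percentile, the sketch below computes the ogive points for Example 1.8 and reads off the score at a given cumulative percentage. Reading the curve by straight-line interpolation between neighbouring plotted points is an assumption of this sketch (a smooth hand-drawn curve may give slightly different readings), and the function names are hypothetical.

```python
from itertools import accumulate


def ogive_points(classes, freqs):
    """(actual upper limit, cumulative percentage) pairs, starting at 0%."""
    n = sum(freqs)
    points = [(classes[0][0] - 0.5, 0.0)]
    for (_, hi), cf in zip(classes, accumulate(freqs)):
        points.append((hi + 0.5, cf * 100 / n))
    return points


def read_ogive(points, pct):
    """Score corresponding to a cumulative percentage, found by straight-line
    interpolation between neighbouring plotted points."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= pct <= p1 and p1 > p0:
            return x0 + (pct - p0) / (p1 - p0) * (x1 - x0)
    raise ValueError("percentage lies outside the ogive")


classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [20, 35, 25, 15, 5]
points = ogive_points(classes, freqs)
print(points)
print(round(read_ogive(points, 50), 2))   # estimated median, about 48.07
```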
Module 2: Measures of Central Tendency
Meaning
It is usually not practical, and in many circumstances impossible, to present in full all the scores obtained from tests, surveys and experiments. It can be seen that only a few scores are very high or very low, while most of the scores tend to cluster around a central value. This central value reflects the average characteristic of the distribution.
The tendency of scores in a distribution to cluster around a central value is termed central tendency, and the typical score or value lying between the extremes and reflecting the average characteristic is referred to as a measure of central tendency.
The three most common measures of central tendency are given below.
1. Arithmetic Mean or Mean
2. Median
3. Mode
Arithmetic Mean
The arithmetic mean, or simply the mean, is the sum of all the values of a given distribution divided by the number of values. In simple words, it is the average of a distribution. It is represented by the symbol M or X̄.
$$ \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}} $$
Characteristics of Arithmetic Mean
1. The value of the mean reflects the magnitude of every value in a given distribution.
2. A distribution has only one mean.
3. The mean can be manipulated algebraically.
4. The mean may be calculated even if the individual values are unknown, provided the sum of the values and the size of the sample ‘N’ are given.
5. There is no need for ordering or grouping of data to compute the mean.
6. It is not possible to compute the mean of an open-ended distribution.
Types of Mean
There are mainly four types of mean. They are:
1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
4. Quadratic Mean
The arithmetic mean is simply the ‘average value’: the sum of all scores divided by the number of scores. The geometric mean is computed by multiplying all the N values in a distribution and taking the Nth root of their product. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of a set of values. The quadratic mean is the square root of the arithmetic mean of the squares of a set of values.
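The four definitions can be compared side by side in a minimal Python sketch, applied here to the hypothetical values 2 and 8. The geometric mean is computed through logarithms (the exponential of the mean log), which is mathematically equal to the Nth root of the product but avoids overflow for long lists; the geometric and harmonic means assume strictly positive values.

```python
import math


def arithmetic_mean(xs):
    """Sum of the values divided by their number."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """Nth root of the product of N positive values, via logarithms."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(xs) / sum(1 / x for x in xs)


def quadratic_mean(xs):
    """Square root of the arithmetic mean of the squares."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


values = [2, 8]   # hypothetical data
for name, fn in [("AM", arithmetic_mean), ("GM", geometric_mean),
                 ("HM", harmonic_mean), ("QM", quadratic_mean)]:
    print(name, round(fn(values), 4))
# prints AM 5.0, GM 4.0, HM 3.2 and QM 5.831 on separate lines
```

For positive values the four means always satisfy HM ≤ GM ≤ AM ≤ QM, with equality only when all the values are equal.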
Advantages
1. It is easy to understand.
2. It is simple to calculate.
3. There is no need to order data in ascending or descending manner.
4. All the scores in a distribution are taken in to consideration while computing Mean.
5. It is very useful for comparing values.
Limitations
1. It is difficult to estimate the mean from the frequencies of values alone.
2. It is not appropriate for qualitative analysis.
3. If the frequency of even one value is missing, it is difficult to calculate the mean.
4. The mean gives more importance to large frequencies than to small ones.
5. The same mean for different categories may carry different meanings.
6. It is not appropriate for computing ratios.
Computation of Mean from Ungrouped Data
Direct Method
If X1, X2, X3, ..., X10 are the scores obtained by 10 students on a test, the arithmetic mean is computed as:
$$ M = \frac{X_1 + X_2 + X_3 + \dots + X_{10}}{10} $$
The general formula for calculating the mean of ungrouped data is
$$ M = \frac{\sum X}{N} $$
Where,
ΣX is the sum of the scores of the distribution
N is the total number of scores in the distribution.
Example 2.1: Consider the marks obtained by 10 students in an achievement test in
Psychology. Marks: 65, 76, 50, 80, 73, 64, 57, 45, 78, 82. Compute mean marks from the
data given.
Solution:
Marks (X): 65, 76, 50, 80, 73, 64, 57, 45, 78, 82
ΣX = 670
$$ \text{Mean} = \frac{\sum X}{N} = \frac{670}{10} = 67 $$
Short-cut Method
$$ M = A + \frac{\sum d}{N} $$
Where,
A is the assumed mean
d is the deviation of each score from the assumed mean, d = (X – A)
N is the number of scores in the distribution
X         d = (X – A)
65          1
76         12
50        -14
80         16
73          9
64 (A)      0
57         -7
45        -19
78         14
82         18
          Σd = 30
$$ M = A + \frac{\sum d}{N} = 64 + \frac{30}{10} = 64 + 3 = 67 $$
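Both methods for ungrouped data can be checked with a short Python sketch; the function names are hypothetical. Because the deviations from any assumed mean A add up to N(M - A), the short-cut method returns the same mean whatever value of A is chosen.

```python
def mean_direct(scores):
    """M = sum(X) / N."""
    return sum(scores) / len(scores)


def mean_short_cut(scores, assumed_mean):
    """M = A + sum(d) / N, where d = X - A."""
    deviations = [x - assumed_mean for x in scores]
    return assumed_mean + sum(deviations) / len(scores)


marks = [65, 76, 50, 80, 73, 64, 57, 45, 78, 82]   # Example 2.1
print(mean_direct(marks))         # 67.0
print(mean_short_cut(marks, 64))  # 67.0
```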
Computation of mean from Grouped Data
Direct Method
In a frequency distribution, where the scores are grouped into classes, the mean is calculated by the formula given below.
$$ M = \frac{\sum fX}{N} $$
Where,
X is the mid-point of each class
f is the frequency of the class
N is the total of all frequencies
Example 2.2: Compute mean from the data given below.
Scores   Frequency (f)
85-89     1
80-84     1
75-79     3
70-74     1
65-69     2
60-64    10
55-59     3
50-54     8
45-49     4
40-44     4
35-39     3
         N = 40
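Solution:
Scores   Frequency (f)   Mid-point (X)   fX
85-89     1              87               87
80-84     1              82               82
75-79     3              77              231
70-74     1              72               72
65-69     2              67              134
60-64    10              62              620
55-59     3              57              171
50-54     8              52              416
45-49     4              47              188
40-44     4              42              168
35-39     3              37              111
         N = 40                   ΣfX = 2280
The mid-point of each class is the average of its limits; for example, the mid-point of 75-79 is (75 + 79)/2 = 77.
$$ M = \frac{\sum fX}{N} = \frac{2280}{40} = 57 $$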
Shortcut Method
$$ M = A + \frac{\sum fx'}{N} \times i $$
Where,
A = assumed mean
i = class interval
f = frequency
N = total frequency
x' = (X – A)/i, where X is the mid-point of the class.
Consider the data given in Example 2.2. Compute the mean by using the shortcut method, taking the mid-point of the class 60-64 as the assumed mean (A = 62) and i = 5.
Scores   Frequency (f)   Mid-point (X)   x' = (X - A)/i   fx'
85-89     1              87                5                5
80-84     1              82                4                4
75-79     3              77                3                9
70-74     1              72                2                2
65-69     2              67                1                2
60-64    10              62                0                0
55-59     3              57               -1               -3
50-54     8              52               -2              -16
45-49     4              47               -3              -12
40-44     4              42               -4              -16
35-39     3              37               -5              -15
         N = 40                                    Σfx' = -40
$$ M = A + \frac{\sum fx'}{N} \times i = 62 + \frac{-40}{40} \times 5 = 62 - 5 = 57 $$
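Both grouped-data methods for Example 2.2 can be sketched in Python (hypothetical function names). Mid-points are taken as the average of the stated class limits, and the class interval i is read off as the difference between successive lower limits, which assumes equal class widths. With these mid-points, both methods give a mean of 57.

```python
def mid_points(classes):
    """Mid-point of each class = (lower limit + upper limit) / 2."""
    return [(lo + hi) / 2 for lo, hi in classes]


def grouped_mean_direct(classes, freqs):
    """M = sum(fX) / N, where X is the class mid-point."""
    xs = mid_points(classes)
    return sum(f * x for f, x in zip(freqs, xs)) / sum(freqs)


def grouped_mean_short_cut(classes, freqs, assumed_mean):
    """M = A + (sum(fx') / N) * i, where x' = (X - A) / i."""
    i = classes[1][0] - classes[0][0]          # class interval
    x_dash = [(x - assumed_mean) / i for x in mid_points(classes)]
    return assumed_mean + sum(f * d for f, d in zip(freqs, x_dash)) / sum(freqs) * i


# Example 2.2, lowest class first.
classes = [(35, 39), (40, 44), (45, 49), (50, 54), (55, 59), (60, 64),
           (65, 69), (70, 74), (75, 79), (80, 84), (85, 89)]
freqs = [3, 4, 4, 8, 3, 10, 2, 1, 3, 1, 1]
print(grouped_mean_direct(classes, freqs))          # 57.0
print(grouped_mean_short_cut(classes, freqs, 62))   # 57.0
```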
Median
When the items of a series are arranged in ascending or descending order of magnitude, the measure or value of the central item in the series is called the Median. The median is the value that divides the distribution into two equal parts, i.e., half of the values lie above the median and half below it.
Characteristics of Median
1. It is the value that occupies the middle point of the distribution, such that half the items fall above it and half below it.
2. The value of the median does not reflect the magnitude of every value in a given distribution.
3. A distribution has only one median.
4. The median cannot be manipulated algebraically.
5. Computation of the median requires proper ordering of values.
6. It is possible to compute the median of an open-ended distribution.
Advantages
1. It is simple to calculate.
2. Easy to understand.
3. It is possible to calculate median in all distributions.
4. Median can be calculated even with extreme values.
5. It is very useful in quantitative analysis where order of score is emphasised (ie.,
ordinal).
Limitations
1. It has only limited use.
2. Not appropriate for qualitative phenomena.
3. Not applicable where items are assigned weights.
Computation of Median for Ungrouped Data
i. When the number of items in a distribution (N) is odd
When N, the number of items in a distribution, is an odd number, the median is computed using the following formula:
Median (Md) = the measure or value of the ((N + 1)/2)th item.
Example 2.3: The marks obtained by 5 students in a test are 42, 50, 64, 56, 35. Compute the median mark obtained in the test.
The first step in the calculation of the median is to arrange the scores in either ascending or descending order. Arranging the marks in ascending order, we get 35, 42, 50, 56, 64.
Since N = 5, which is an odd number, we compute the median using the formula Md = the measure of the ((N + 1)/2)th item, viz.,
= the measure of the ((5 + 1)/2)th item
= the measure of the 3rd item, i.e., 50.
ii. When the number of items in a distribution (N) is even
When N, the number of items in a distribution, is an even number, the median is computed using the following formula:
$$ \text{Median}(M_d) = \frac{\text{Value of the } (N/2)\text{th item} + \text{Value of the } [(N/2) + 1]\text{th item}}{2} $$
Example 2.4: The marks obtained by 8 students in an achievement test are 50, 42, 60, 35, 56, 65, 40, 62. Calculate the median mark obtained.
Arranging the marks in ascending order, we get 35, 40, 42, 50, 56, 60, 62, 65.
Here N = 8, so
Value of the (N/2)th item = value of the (8/2) = 4th item, i.e., 50
Value of the [(N/2) + 1]th item = value of the (4 + 1) = 5th item, i.e., 56
Therefore, the median is (50 + 56)/2, i.e., 53.
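The odd and even cases can be combined into one function. This Python sketch (hypothetical name `median_ungrouped`) sorts the values and converts the 1-based item positions used in the text into 0-based list indices.

```python
def median_ungrouped(values):
    """Median of ungrouped data using the (N+1)/2 rule for odd N and the
    average of the N/2th and (N/2 + 1)th items for even N."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]              # the (N+1)/2 th item
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_ungrouped([42, 50, 64, 56, 35]))              # 50  (Example 2.3)
print(median_ungrouped([50, 42, 60, 35, 56, 65, 40, 62]))  # 53.0 (Example 2.4)
```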
Example 2.5: The table gives the monthly salaries of employees in a firm. There are 50 employees working. Compute the median salary paid to employees in a month.
Salary (in thousands):  4  7  8  10  11  12  13  14  15
Number of employees  :  3  4  7   9  12   8   4   2   1
Solution:
Salary (in thousands)   No. of employees (f)   Cumulative Frequency (cf)
4                        3                       3
7                        4                       7
8                        7                      14
10                       9                      23
11                      12                      35
12                       8                      43
13                       4                      47
14                       2                      49
15                       1                      50
                        N = 50
Median (Md) = Measure of the ((N + 1)/2)th item, where
$$ \frac{N + 1}{2} = \frac{50 + 1}{2} = \frac{51}{2} = 25.5 $$
The 25.5th item lies beyond the cumulative frequency 23 but within the cumulative frequency 35. It therefore falls in the salary group whose cumulative frequency is 35, and hence the median salary is Rs. 11,000.
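For a discrete series such as Example 2.5, the (N + 1)/2th item can be located by walking through the cumulative frequencies. This sketch (hypothetical name `median_discrete`) follows the rule used in the example, returning the first value whose cumulative frequency reaches that position; it does not handle the case where the position falls exactly between two different values, where the two values would have to be averaged.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Value of the (N+1)/2 th item in a discrete frequency series."""
    position = (sum(freqs) + 1) / 2
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return value
    raise ValueError("no frequencies given")


salaries = [4, 7, 8, 10, 11, 12, 13, 14, 15]   # in thousands
employees = [3, 4, 7, 9, 12, 8, 4, 2, 1]
print(median_discrete(salaries, employees))    # 11, i.e. Rs. 11,000
```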
Computation of Median for Grouped Data
Consider the following example for the computation of the median for grouped data, or data in a continuous series.
Example 2.6: The monthly income of the staff members of an institution is given below.
Monthly Income : 2000 – 2500  1500 – 2000  1000 – 1500  500 – 1000  0 – 500
No. of Staff   :      3           14            27          34         46
The median for grouped data is computed using the formula
$$ M_d = l + \frac{i\,(N/2 - F)}{f} $$
Where,
l = Exact or actual lower limit of the median class
F = Total of all frequencies below the median class (the cumulative frequency of the class preceding it)
f = Frequency of the median class
i = Class interval
N = Total frequencies
Monthly Income   f    Cumulative Frequency (cf)
2000 – 2500       3   124
1500 – 2000      14   121
1000 – 1500      27   107
500 – 1000       34    80
0 – 500          46    46
The median class can be found as follows:
First, find N/2 = 124/2, viz., 62.
Then find the cumulative frequency within which the 62nd item is included. Here, 62 is included in the cumulative frequency 80, so the median class is 500 – 1000. Since these classes are continuous (the upper limit of one class is the lower limit of the next), the exact lower limit of the median class is 500 itself.
Now, applying the formula with l = 500, i = 500, N/2 = 62, F = 46 and f = 34, we get
$$ M_d = 500 + \frac{500\,(62 - 46)}{34} = 500 + \frac{500 \times 16}{34} = 500 + 235.29 = 735.29 $$
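The grouped-data formula can be sketched in Python by accumulating frequencies from the lowest class upward until the N/2th item is reached. The function name and argument layout are hypothetical; the exact lower limits must be supplied, so continuous classes such as 0 – 500, 500 – 1000 use 0, 500, ... directly. As a second check, Example 1.8 (exact lower limits 29.5, 39.5, ..., class interval 10) gives a median of about 48.07.

```python
def median_grouped(lower_limits, width, freqs):
    """Md = l + i (N/2 - F) / f for classes listed from lowest to highest;
    lower_limits are the exact lower limits of the classes."""
    half = sum(freqs) / 2
    below = 0                      # F: total frequency below the current class
    for l, f in zip(lower_limits, freqs):
        if below + f >= half:      # the N/2th item lies in this class
            return l + width * (half - below) / f
        below += f
    raise ValueError("no frequencies given")


# Example 2.6, lowest class (0 - 500) first.
md = median_grouped([0, 500, 1000, 1500, 2000], 500, [46, 34, 27, 14, 3])
print(round(md, 2))   # 735.29
```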
Mode
Mode is the value or measure that occurs most frequently in a distribution. It is the score or value corresponding to the maximum frequency of the distribution.
Characteristics of Mode
1. It is the most frequently occurring value in a distribution.
2. A distribution may have two or more modes.
3. Mode does not reflect the other values in a given distribution.
4. It cannot be manipulated algebraically.
5. The computation of mode requires proper ordering of data.
6. It is possible to calculate mode of an open ended distribution.
Advantages
1. Mode can be easily computed.
2. It can be also identified by graph.
3. It is not affected by extreme values.
4. It is very useful for business purposes.
Limitations
1. It is not a stable measure of central tendency.
2. It cannot be put to algebraic treatment.
3. It remains indeterminate when there are two or more modal values in a series.
4. It is not suitable where the relative importance of items is under consideration.
Computation of Mode from Ungrouped Data
In the case of ungrouped data, mode is the value or score that occurs maximum
number of times in a distribution. That is, it is the value or measure that has the maximum
frequency.
Example 2.7: Compute mode from the following distribution: 34, 23, 45, 34, 48, 54, 56,
34, 76, 45.
Here, 34 occurs the greatest number of times, i.e., three times. Hence, the mode of the given distribution is 34.
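Counting occurrences makes the mode of ungrouped data easy to find. This sketch uses `collections.Counter` from the Python standard library; returning every value that shares the highest count (a design choice of this hypothetical `modes` helper) also exposes distributions with two or more modes.

```python
from collections import Counter


def modes(values):
    """All values that occur with the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([34, 23, 45, 34, 48, 54, 56, 34, 76, 45]))   # [34]  (Example 2.7)
```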
Computation of Mode from Grouped Data
For data given in the form of a frequency distribution (grouped data or a continuous series), the mode may be computed using the empirical formula
$$ M_o = 3M_d - 2M $$
where Md is the median and M is the mean of the given distribution. The mean and median are computed first, and the mode is then obtained from them. This relation holds only approximately, for moderately skewed distributions.
The mode can also be computed directly from the frequency distribution table, without calculating the mean and median. For this, the following formula is used:
$$ M_o = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (l_2 - l_1) $$
Where,
l1 = lower limit of the modal class (the class with the highest frequency)
l2 = upper limit of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding (before) the modal class
f2 = frequency of the class succeeding (after) the modal class
Example 2.8: The following data relates to the different income groups of 45 farmers in a
village.
Psychological Statistics
Page 25
School of Distance Education
Income groups      No. of farmers
30000 – 35000            2
35000 – 40000            5
40000 – 45000           10
45000 – 50000            8
50000 – 55000            3
55000 – 60000           10
60000 – 65000            7
Total                 N = 45
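Solution:
The modal class is the class with the highest frequency. Here the highest frequency, 10,
occurs in two classes, 40000 – 45000 and 55000 – 60000, so the distribution is bimodal and,
as noted under Limitations, its mode is not uniquely determined. Taking the first of these,
40000 – 45000, as the modal class: l1 = 40000, l2 = 45000, f1 = 10, f0 = 5 and f2 = 8.

$$Mo = 40000 + \frac{10 - 5}{2(10) - 5 - 8} \times 5000 = 40000 + \frac{5}{7} \times 5000 = 40000 + 3571.43 = 43571.43$$

Applying the same formula to the second modal class, 55000 – 60000 (l1 = 55000, f1 = 10,
f0 = 3, f2 = 7), gives

$$Mo = 55000 + \frac{10 - 3}{2(10) - 3 - 7} \times 5000 = 55000 + \frac{7}{10} \times 5000 = 58500$$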
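The interpolation formula can be sketched in Python as below. This is an illustrative helper, not a procedure from the text: `grouped_mode` is a hypothetical name, equal class widths are assumed, and a class lying outside the table is assumed to have frequency 0. It returns one estimate per class sharing the highest frequency, which makes a bimodal table such as Example 2.8 visible.

```python
def grouped_mode(lower_limits, freqs, width):
    """Interpolated mode, Mo = l1 + (f1 - f0) / (2f1 - f0 - f2) * (l2 - l1).

    lower_limits: lower limit l1 of each class, classes in ascending order
    freqs:        frequency of each class, in the same order
    width:        class width l2 - l1 (assumed equal for every class)
    Returns one estimate for each class that has the highest frequency.
    """
    top = max(freqs)
    modes = []
    for k, f1 in enumerate(freqs):
        if f1 != top:
            continue
        # Neighbouring frequencies; a class outside the table counts as 0 (assumption).
        f0 = freqs[k - 1] if k > 0 else 0
        f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
        denominator = 2 * f1 - f0 - f2
        if denominator == 0:
            raise ValueError("formula undefined: 2*f1 - f0 - f2 is zero")
        modes.append(lower_limits[k] + (f1 - f0) * width / denominator)
    return modes


# Example 2.8: income classes of width 5000 and the number of farmers in each
limits = [30000, 35000, 40000, 45000, 50000, 55000, 60000]
farmers = [2, 5, 10, 8, 3, 10, 7]
print(grouped_mode(limits, farmers, 5000))
```

For a table with a single highest-frequency class the returned list has one element; for Example 2.8 it has two, approximately 43571.43 and 58500.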
Module 3: Measures of Dispersion
Measures of central tendency provide a single value that can be used to represent the
characteristic of the entire distribution or group. But they do not show how the
individual scores are ‘spread’ or ‘scattered’, which is very important when we have to
describe and compare two or more frequency distributions or sets of scores.
There is a tendency for data to be dispersed, scattered or to show variability around
the average. The tendency of scores to ‘scatter’, ‘spread’ or deviate from the average or
central value is termed dispersion or variability. Note that the smaller the dispersion, the
more representative the average is of the distribution, and vice versa.
Measures of Dispersion
A measure of dispersion expresses the degree of variability by a single value, which tells
us how the individual scores are scattered or spread throughout the distribution. There are
four measures of variability or dispersion:
1. Range (R)
2. Quartile Deviation (QD)
3. Average Deviation (AD)
4. Standard Deviation (SD)
1. Range (R)
Range is the simplest measure of variability or dispersion. It is computed by
subtracting the lowest score in the series from the highest score. The lower the range, the
less scattered the scores; the higher the range, the more scattered they are. However, range
is a very crude measure, as it takes into account only the two extreme values and ignores
the variation of the individual items between them.
Range (R) = Largest value – Smallest value
Coefficient of Range
For comparative purposes, an absolute measure of dispersion has to be converted into a
relative measure. For the range, this relative measure is the coefficient of range:

$$\text{Coefficient of range} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Largest value} + \text{Smallest value}}$$
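A minimal Python sketch of the range and the coefficient of range follows. The function names are illustrative, and the scores are borrowed from Example 3.2 purely as sample data. The coefficient is meaningful only for positive scores, since a zero denominator is otherwise possible.

```python
def score_range(scores):
    # Range = largest value - smallest value
    return max(scores) - min(scores)


def coefficient_of_range(scores):
    # (largest - smallest) / (largest + smallest): a unit-free ratio
    hi, lo = max(scores), min(scores)
    return (hi - lo) / (hi + lo)


scores = [35, 32, 17, 20, 31]  # sample data borrowed from Example 3.2
print(score_range(scores))                      # 18
print(round(coefficient_of_range(scores), 3))   # 0.346
```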
Quartile Deviation (QD)
The three quartiles Q1, Q2 and Q3 divide a distribution into four parts, each containing
25% of the cases: 25% of the scores lie below Q1, 50% below Q2 (the median) and 75% below
Q3. Quartile Deviation (QD) is one half of the difference between the third quartile (Q3)
and the first quartile (Q1). The formula for Quartile Deviation is given below:
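$$QD = \frac{Q_3 - Q_1}{2}$$

Where,

$$Q_3 = l + \frac{i\,(3N/4 - F)}{f} \qquad Q_1 = l + \frac{i\,(N/4 - F)}{f}$$

l = exact lower limit of the class in which the quartile falls
F = cumulative frequency below that class
f = frequency of that class
i = size of the class interval
N = total number of cases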
The value Q3 – Q1 is the difference between the third and the first quartiles, and is
called the interquartile range. Since Quartile Deviation divides this interquartile range
by 2, it is also called the semi-interquartile range.
Example 3.1: Compute quartile deviation from the data given below.
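Class      f      cf
90–99      1     100
80–89      5      99
70–79     12      94
60–69     20      82    (Q3 class)
50–59     26      62    (Q2, median class)
40–49     13      36    (Q1 class)
30–39      8      23
20–29      7      15
10–19      4       8
0–9        4       4
N = 100
(f = frequency, cf = cumulative frequency)

Solution:
3rd quartile: 3N/4 = (3 × 100)/4 = 75. Since 75 is included in the cumulative frequency
82, the Q3 class is 60 – 69 (exact lower limit l = 59.5, F = 62, f = 20).
1st quartile: N/4 = 100/4 = 25. Since 25 is included in the cumulative frequency 36, the
Q1 class is 40 – 49 (exact lower limit l = 39.5, F = 23, f = 13).
2nd quartile: N/2 = 50 is included in the cumulative frequency 62, hence the median class
is 50 – 59.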
$$QD = \frac{Q_3 - Q_1}{2}$$

$$Q_3 = l + \frac{i\,(3N/4 - F)}{f} = 59.5 + \frac{10\,(75 - 62)}{20} = 59.5 + \frac{10 \times 13}{20} = 59.5 + 6.5 = 66$$

$$Q_1 = l + \frac{i\,(N/4 - F)}{f} = 39.5 + \frac{10\,(100/4 - 23)}{13} = 39.5 + \frac{10\,(25 - 23)}{13} = 39.5 + \frac{10 \times 2}{13} = 41.04$$

$$QD = \frac{Q_3 - Q_1}{2} = \frac{66 - 41.04}{2} = 12.48$$
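The quartile formulas can be sketched in Python as follows. `grouped_quantile` is a hypothetical helper, not a library function. It walks up the cumulative frequencies until the class containing the target (N/4, N/2 or 3N/4) is found and then interpolates with l + i(target − F)/f, using exact lower limits as in the worked example.

```python
def grouped_quantile(lower_limits, freqs, width, fraction):
    """Interpolate a quantile of grouped data: l + i * (target - F) / f.

    lower_limits: exact lower limits of the classes, ascending (e.g. 39.5)
    freqs:        class frequencies in the same order
    width:        class interval i
    fraction:     0.25 for Q1, 0.5 for the median (Q2), 0.75 for Q3
    """
    n = sum(freqs)
    target = fraction * n          # N/4, N/2 or 3N/4
    cum_below = 0                  # F: cumulative frequency below the class
    for l, f in zip(lower_limits, freqs):
        if cum_below + f >= target:
            return l + width * (target - cum_below) / f
        cum_below += f
    raise ValueError("fraction must not exceed 1")


# Example 3.1, classes listed from 0-9 up to 90-99
lower_limits = [10 * k - 0.5 for k in range(10)]   # -0.5, 9.5, ..., 89.5
freqs = [4, 4, 7, 8, 13, 26, 20, 12, 5, 1]

q1 = grouped_quantile(lower_limits, freqs, 10, 0.25)
q3 = grouped_quantile(lower_limits, freqs, 10, 0.75)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))   # 41.04 66.0 12.48
```

Listing the classes in ascending order keeps the running total equal to F, the cumulative frequency below each class.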
Mean Deviation or Average Deviation
Garrett (1971) defines Average Deviation as the mean of the deviations of all the
separate scores in the series taken from their mean. In computing it, the signs of the
deviations are ignored. This measure of variability takes into account the fluctuation or
variation of all the items in a series.
Computation of Mean Deviation from Ungrouped Data
The following formula is used for ungrouped data:
$$MD = \frac{\sum |x|}{N}$$

Where,
x = X – X̄, the deviation of each score from the mean
X = the raw score
X̄ (M) = the mean
|x| = the absolute value of x, i.e., the value of x ignoring its + or – sign
N = the number of scores
Example 3.2: Find the Mean Deviation of the scores 35, 32, 17, 20, 31.
Solution:
N=5
Mean (X̄) = (35 + 32 + 17 + 20 + 31) / 5 = 135 / 5 = 27

X      x = X – X̄     |x|
35          8           8
32          5           5
17        –10          10
20         –7           7
31          4           4
N = 5              Σ|x| = 34

$$MD = \frac{\sum |x|}{N} = \frac{34}{5} = 6.8$$
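A short Python sketch of the ungrouped formula, applied to the scores of Example 3.2 (the function name is illustrative only):

```python
def mean_deviation(scores):
    n = len(scores)
    mean = sum(scores) / n
    # Average of the absolute deviations |X - mean|; the signs are ignored.
    return sum(abs(x - mean) for x in scores) / n


print(mean_deviation([35, 32, 17, 20, 31]))  # 6.8
```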
Computation of Mean Deviation from Grouped Data
The following formula is used to compute Mean Deviation for grouped data:
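$$MD = \frac{\sum f|x|}{N}$$

where X is the midpoint of each class, x = X – X̄ is the deviation of the midpoint from the
mean, f is the frequency of the class and N = Σf.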
Example 3.3: Compute the mean deviation from the data given below.
Scores     Frequency
50–54          3
45–49          4
40–44          6
35–39         11
30–34         14
25–29         12
20–24          9
15–19          4
10–14          2
Solution:
Scores    f    X    fX    x = X – X̄    fx    f|x|
50–54     3   52   156       20         60     60
45–49     4   47   188       15         60     60
40–44     6   42   252       10         60     60
35–39    11   37   407        5         55     55
30–34    14   32   448        0          0      0
25–29    12   27   324       –5        –60     60
20–24     9   22   198      –10        –90     90
15–19     4   17    68      –15        –60     60
10–14     2   12    24      –20        –40     40
N = 65            ΣfX = 2065                  Σf|x| = 485
(X = midpoint of the class)
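$$\text{Mean } (\bar{X}) = \frac{\sum fX}{N} = \frac{2065}{65} = 31.77 \approx 32$$

The mean is rounded to 32 to simplify the deviations x in the table; using the unrounded
mean 31.77 gives MD ≈ 7.50 instead.

$$MD = \frac{\sum f|x|}{N} = \frac{485}{65} = 7.46$$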
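A Python sketch of the grouped formula follows; `grouped_mean_deviation` and its `center` parameter are illustrative names. By default it uses the exact mean ΣfX/N; passing `center=32` reproduces the rounded mean used in Example 3.3, which is why the two printed values differ slightly.

```python
def grouped_mean_deviation(midpoints, freqs, center=None):
    """Mean deviation for grouped data: sum(f * |X - center|) / N.

    By default the center is the exact mean sum(fX) / N.
    """
    n = sum(freqs)
    if center is None:
        center = sum(f * x for f, x in zip(freqs, midpoints)) / n
    return sum(f * abs(x - center) for f, x in zip(freqs, midpoints)) / n


# Example 3.3: class midpoints X and frequencies f
midpoints = [52, 47, 42, 37, 32, 27, 22, 17, 12]
freqs = [3, 4, 6, 11, 14, 12, 9, 4, 2]
print(round(grouped_mean_deviation(midpoints, freqs, center=32), 2))  # 7.46
print(round(grouped_mean_deviation(midpoints, freqs), 2))
```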
2. Variance and Standard Deviation
Variance is a measure of dispersion that eliminates the sign problem caused by negative
deviations cancelling out positive deviations. The procedure is to square the deviation of
each score from the mean, add these squares, and divide the sum by the number of scores in
the distribution:

$$\text{Variance } S^2 = \frac{\sum (X_i - \bar{X})^2}{n}$$
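The variance formula can be sketched in Python and checked against the standard library's `statistics.pvariance`, which, like the formula above, divides by n. (`statistics.variance` divides by n − 1 instead, so it would give a different value.) The scores of Example 3.2 are reused here only as sample data.

```python
from statistics import pvariance


def variance(scores):
    n = len(scores)
    mean = sum(scores) / n
    # Square each deviation so positive and negative deviations cannot cancel.
    return sum((x - mean) ** 2 for x in scores) / n


scores = [35, 32, 17, 20, 31]  # sample data reused from Example 3.2
print(variance(scores))
print(pvariance(scores))  # standard-library cross-check, also divides by n
```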
Standard Deviation (SD) is regarded as the most stable measure of variability, as it is
computed from the deviations of all the scores about the mean. The standard deviation of a
set of scores is defined as the square root of the average of the squares of the deviations
of each score from the mean. It is never negative. SD indicates how much dispersion there is
in the given data and is interpreted as an index of variation: the larger the standard
deviation, the greater the variation or spread of the scores in the distribution. If there
is no variation at all, i.e., all the scores are equal, the standard deviation is zero.
Standard deviation is often referred to as the root mean square deviation and is denoted
by the Greek letter sigma (σ). Since the signs of the deviations are removed by squaring
rather than simply ignored, it is considered more accurate than the Mean Deviation.
Characteristics of Standard Deviation
1. It is the most important measure of dispersion.
2. It measures the variability or spread of scores in a distribution.
3. Standard deviation is never negative.
4. It is a more accurate and justified measure of dispersion.
5. It is more accurate than mean deviation, since the + and – signs of the deviations are
not ignored in the calculation but removed by squaring.
The formula for computing SD is given below.

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{\sum x^2}{N}}$$

Where,
X = individual score
X̄ = mean of all scores
N = total number of items
x = deviation of each score from the mean, i.e., X – X̄
Computation of Standard Deviation from Ungrouped Data
Standard deviation of ungrouped data can be computed using the formula given
below.
$$SD = \sqrt{\frac{\sum x^2}{N}}$$
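As a rough Python sketch, the ungrouped formula is shown below, applied to the scores of Example 3.4 (68, 62, 58, 64, 52, 58, 50, 68). The standard library's `statistics.pstdev` also divides by N and serves as a cross-check; `statistics.stdev` divides by N − 1 and would not match.

```python
import math
from statistics import pstdev


def standard_deviation(scores):
    n = len(scores)
    mean = sum(scores) / n
    sum_sq = sum((x - mean) ** 2 for x in scores)  # sum of x^2, with x = X - mean
    return math.sqrt(sum_sq / n)


scores = [68, 62, 58, 64, 52, 58, 50, 68]  # Example 3.4
print(round(standard_deviation(scores), 2))  # 6.32
print(round(pstdev(scores), 2))              # 6.32
```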
Example 3.4: Compute standard deviation of the following distribution.
Score: 68, 62, 58, 64, 52, 58, 50, 68
$$\text{Mean} = \frac{\sum X}{N} = \frac{68 + 62 + 58 + 64 + 52 + 58 + 50 + 68}{8} = \frac{480}{8} = 60$$

Score (X)   x = X – X̄    x²
68              8          64
62              2           4
58             –2           4
64              4          16
52             –8          64
58             –2           4
50            –10         100
68              8          64
                     Σx² = 320

$$SD\ (\sigma) = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{320}{8}} = \sqrt{40} = 6.32$$
Computation of Standard Deviation from Grouped Data
Standard deviation of grouped data can be computed using the formula given below.
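$$SD = \sqrt{\frac{\sum f x^2}{N}}$$

where X is the midpoint of each class, x = X – X̄ is its deviation from the mean, f is the
frequency of the class and N = Σf.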
Example 3.5: Compute Standard Deviation for the frequency distribution given below.
The mean of the distribution is 115.
Scores     Frequency
127–129        1
124–126        2
121–123        3
118–120        1
115–117        6
112–114        4
109–111        3
106–108        2
103–105        1
100–102        1
N = 24
Solution:
Scores     f     X     x = X – X̄     x²     fx²
127–129    1    128       13         169    169
124–126    2    125       10         100    200
121–123    3    122        7          49    147
118–120    1    119        4          16     16
115–117    6    116        1           1      6
112–114    4    113       –2           4     16
109–111    3    110       –5          25     75
106–108    2    107       –8          64    128
103–105    1    104      –11         121    121
100–102    1    101      –14         196    196
N = 24                               Σfx² = 1074
(X = midpoint of the class)

$$SD = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\frac{1074}{24}} = \sqrt{44.75} = 6.69$$
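A Python sketch of the grouped formula, using the midpoints and frequencies of Example 3.5, follows. The helper name is illustrative. It computes the mean ΣfX/N from the midpoints rather than taking the stated value; for this table that mean comes out exactly 115, agreeing with the example.

```python
import math


def grouped_sd(midpoints, freqs):
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    # Sum of f * x^2, where x = X - mean is the deviation of each midpoint.
    sum_fx2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
    return math.sqrt(sum_fx2 / n)


midpoints = [128, 125, 122, 119, 116, 113, 110, 107, 104, 101]
freqs = [1, 2, 3, 1, 6, 4, 3, 2, 1, 1]
print(round(grouped_sd(midpoints, freqs), 2))  # 6.69
```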
Coefficient of Variation or Coefficient of Relative Variability
It is often desirable to compare variabilities when means are unequal or when the units
of measurement from test to test are incommensurable. A statistic useful in making such
comparisons is the coefficient of variation, V, sometimes called the coefficient of relative
variability. This measure was first suggested by Karl Pearson as the percentage variation in
the mean, the standard deviation being treated as the total variation in the mean.
The coefficient of variation expresses the standard deviation as a percentage of the mean.
That is, if the standard deviation is divided by the mean and multiplied by 100, we get the
coefficient of variation.
The following formula is used for computing coefficient of variation:
$$\text{Coefficient of Variation } (V) = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$$
Example 3.6: The mean of a distribution is 50 and the SD is 10. Find the coefficient of
variation.
Solution:

$$\text{Coefficient of Variation } (V) = \frac{10}{50} \times 100 = 20\%$$
This means that the SD is 20% of the mean. Because it is expressed as a percentage, the
coefficient of variation (V) does not depend on the units in which the variables are
measured, which makes it a useful tool for comparing the variability of different
distributions. Problems of converting variables measured in different units into a standard
unit for uniform expression therefore do not arise. The coefficient of variation is simply
the SD expressed as a percentage of the mean of a given distribution.
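A small Python sketch of V follows, reproducing Example 3.6. As an illustration of the comparison use described above, it also applies V to the results of Examples 3.4 (SD = √40, mean 60) and 3.5 (SD = √44.75, mean 115), whose means differ. On this measure, Example 3.4 (about 10.5%) is relatively more variable than Example 3.5 (about 5.8%), even though its SD is smaller. The function name is illustrative.

```python
import math


def coefficient_of_variation(sd, mean):
    # V = (SD / Mean) x 100, i.e. the SD as a percentage of the mean
    return 100 * sd / mean


print(coefficient_of_variation(10, 50))  # 20.0  (Example 3.6)

# Relative spread of Examples 3.4 and 3.5, which have different means
print(round(coefficient_of_variation(math.sqrt(40), 60), 2))
print(round(coefficient_of_variation(math.sqrt(44.75), 115), 2))
```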
*********************