PSYCHOLOGICAL STATISTICS

II SEMESTER
Complementary Course
For B.Sc. Counselling Psychology
(CU-CBCSS)
(2014 Admission onwards)

UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
Calicut University P.O., Malappuram, Kerala, India 673 635

STUDY MATERIAL
Prepared by: Ms. Sajila, Research Scholar, University of Calicut
Layout: Computer Section, SDE
© Reserved

CONTENT
Module - 1
Module - 2
Module - 3

Module 1: Frequency Distribution and Graphs

Horace Secrist defines statistics as an “aggregate of facts, affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other”.

Meaning of Data

The term ‘data’ refers to facts or evidence relating to a group, situation or phenomenon. It may include raw facts such as names, measures of height and weight, and scores on different forms of tests, experiments or surveys.

Measures of Data: Continuous and Discrete

Data may be either continuous or discrete. Data relating to psychological and physical traits are continuous. A continuous series can have any degree of subdivision: each measure, which may be an integer or a fraction, can exist anywhere within the range of the scale used. That is, continuous data are not restricted to defined separate values, but can occupy any value over a continuous range. Between any two continuous data values there may be an infinite number of other values.
Examples: measures of height like 160.5 cm and 159.6 cm, measures of distance like 25.7 km and 56.5 km, and scores obtained in exams like 85.5 and 73.5.

Discrete data can only take particular values. There may potentially be an infinite number of those values, but each is distinct and there is no grey area in between. That is, measures that fall under a discrete series are separate and distinct, with a real gap between them.

Examples: number of students in a class like 50 or 45, number of books in a library like 1000 or 2345.

Organisation of Data

The different aspects of psychology may be studied by conducting different forms of tests, surveys and experiments, which yield valuable data. Data in its original form, which has little meaning to the reader or investigator, is termed raw data. To make raw data meaningful, it has to be organised or arranged systematically. This process of arranging original data in a systematic manner so that meaningful interpretations can be made is termed organisation or grouping of data.

Data may be organised in any of the following forms:

1. Statistical Tables
2. Rank Order
3. Frequency Distribution

Organising data in the form of Statistical Tables: Under this method, data are presented in tabular form, arranged into rows and columns under different headings. The tables contain the original data or raw scores as well as percentages, means, standard deviations, etc. Consider the statistical table given below.

Example 1.1: [statistical table not reproduced]

Organising data in the form of Rank Order: Under this method, raw data are arranged in an ascending or descending series, which reveals the rank or merit position of each individual. Consider the following example.

Example 1.2: The following are scores obtained by 40 students in a test.
Present the data in a tabular form depicting the rank order. The scores are:

72 55 64 65 54 85 45 60 60 76
52 65 55 39 53 40 80 64 35 52
45 63 55 40 53 46 63 42 76 62
38 78 50 42 53 48 60 63 62 52

Solution: The rank order tabulation of the data

Sl No.  Score    Sl No.  Score    Sl No.  Score    Sl No.  Score    Sl No.  Score
1       35       9       45       17      53       25      60       33      65
2       38       10      46       18      53       26      62       34      65
3       39       11      48       19      54       27      62       35      72
4       40       12      50       20      55       28      63       36      76
5       40       13      52       21      55       29      63       37      76
6       42       14      52       22      55       30      63       38      78
7       42       15      52       23      60       31      64       39      80
8       45       16      53       24      60       32      64       40      85

Frequency Distribution: A frequency distribution is a method of presenting data that shows the frequency, or the number of times, a score or group of scores occurs in a given distribution. Under this method, data are organised into groups or classes, and each score is allotted a place in its respective class. The number of times a particular score or group of scores occurs in the distribution is also given. This is known as the frequency of a score or group of scores.

Construction of Frequency Distribution Table

Data are organised into a frequency distribution systematically. The following steps are used to construct a frequency distribution table:

1. Finding the Range: The first step is to find the range of the given series of data. The range is computed by subtracting the lowest score from the highest score in the series. In Example 1.2, the range of the distribution is

Range = Highest score – Lowest score = 85 – 35 = 50.

2. Determining the Class Interval: The class interval denotes the number and size of the classes or groups used for organising the data. There are two methods for this. Under the first method, the class interval (i) is computed using the formula:

$$ i = \frac{\text{Range}}{\text{Number of classes desired}} $$

Tate (1955) has given the following general rule for deciding the number of classes desired. For items less than or equal to 50, the number of classes may be 10.
For 50 to 100 items, 10 to 15 classes are appropriate. For more than 100 items, 15 or more classes may be appropriate. Ordinarily, not fewer than 10 classes or more than 20 classes are used.

Under the second method, the class interval (i) is decided first and then the number of classes is determined. For this purpose, class intervals of 2, 3, 5 or 10 units in length are usually used.

Thus in Example 1.2, the class interval (i) under the first method will be

$$ i = \frac{\text{Range}}{\text{Number of classes desired}} $$

Here, the range is 50. As the number of scores is 40, which is less than 50, it may be sufficient to take 10 classes. Hence the class interval (i) will be 50/10, i.e., 5.

3. Preparing Frequency Distribution Table

After determining the number and size of the classes, we proceed to prepare the frequency distribution table. This follows two steps:

i. Writing the classes of the distribution
ii. Tallying the scores and checking the tallies

The first step is writing the classes of the distribution. For this, the lowest class is formed first and then the subsequent higher classes. The worked example groups the scores with a class interval of 10, one of the customary lengths listed under the second method: the lowest class is 30-39 and the higher classes are 40-49, 50-59, 60-69, 70-79 and 80-89.

The second step involves tallying the scores and checking the tallies. The scores given in the distribution are taken one by one and tallied in their proper classes, as shown in Table 1.1. The tally marks against each class are then counted and checked to determine the frequency of that class. The total of the frequencies should be equal to the number of individuals whose scores have been tabulated.

Table 1.1: Frequency Distribution Table

Class      Tallies            Frequency
80 – 89    ||                 2
70 – 79    ||||               4
60 – 69    ||||| ||||| ||     12
50 – 59    ||||| ||||| |      11
40 – 49    ||||| |||          8
30 – 39    |||                3

Cumulative Frequency and Cumulative Percentage Frequency Distributions

A frequency distribution table shows how frequencies are distributed over the different class intervals.
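The tallying that produces Table 1.1 can be sketched in Python. This is only an illustration: the function name `frequency_table` and its parameters are assumptions made here, and the classes are taken to start at 30 with a width of 10, as in the table.

```python
from collections import Counter

# Scores of the 40 students from Example 1.2.
scores = [
    72, 55, 64, 65, 54, 85, 45, 60, 60, 76,
    52, 65, 55, 39, 53, 40, 80, 64, 35, 52,
    45, 63, 55, 40, 53, 46, 63, 42, 76, 62,
    38, 78, 50, 42, 53, 48, 60, 63, 62, 52,
]


def frequency_table(values, start, width):
    """Return (lower, upper, frequency) for each class, lowest class first.

    Hypothetical helper: assumes integer scores, none smaller than `start`.
    """
    # Each score is "tallied" into the class whose index is (score - start) // width.
    counts = Counter((v - start) // width for v in values)
    n_classes = max(counts) + 1
    return [
        (start + k * width, start + (k + 1) * width - 1, counts.get(k, 0))
        for k in range(n_classes)
    ]


score_range = max(scores) - min(scores)  # Range = highest score - lowest score
table = frequency_table(scores, start=30, width=10)

print("Range:", score_range)
for lower, upper, freq in reversed(table):  # highest class first, as in Table 1.1
    print(f"{lower} - {upper}: {freq}")
```

As a check, the frequencies should add up to the number of scores tabulated, which is the same check the text applies to the tally marks.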
For determining the number or percentage of scores lying above or below a class interval, another category of tables, called cumulative frequency and cumulative percentage frequency tables, is constructed. These distributions may be obtained directly from the frequency distribution. Consider Table 1.2.

Table 1.2: Cumulative Frequencies and Cumulative Percentage Frequencies

Class      Frequency    Cumulative Frequency    Cumulative Percentage Frequency
80 – 89    2            40                      100
70 – 79    4            38                      95
60 – 69    12           34                      85
50 – 59    11           22                      55
40 – 49    8            11                      27.5
30 – 39    3            3                       7.5
           N = 40

In Table 1.2, cumulative frequencies are obtained by successively adding the individual frequencies, starting from the lowest class. These cumulative frequencies are converted to cumulative percentage frequencies by multiplying each cumulative frequency by 100/N, where N is the total number of frequencies.

The cumulative percentage frequencies show the percentage of cases lying above or below a given score or class. In Table 1.2, consider for example the cumulative percentage 55%, which is computed as 22 × 100/40. This shows that 55% of the 40 students have achievement scores in mathematics below 59.5, which is the actual or exact upper limit of the class 50-59.

Thus, cumulative frequencies and cumulative percentage frequencies help us to determine the relative position, rank or merit of an individual with respect to the members of a group.

Diagrams and Graphs

The data obtained from surveys, tests and experiments may be organised in the form of statistical and frequency distribution tables. Such organisation helps in understanding the data better and in interpreting them to derive valuable conclusions. Numerical data may be analysed more easily still if they are represented graphically in the form of pictures and graphs.
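Before the graphical methods are taken up, the cumulative columns of Table 1.2 can be checked with a short Python sketch. The variable names are illustrative assumptions; the frequencies are those of Table 1.1, listed from the lowest class upwards.

```python
from itertools import accumulate

# Classes and frequencies from Table 1.1, lowest class first.
classes = ["30-39", "40-49", "50-59", "60-69", "70-79", "80-89"]
frequencies = [3, 8, 11, 12, 4, 2]

n = sum(frequencies)  # N, the total number of frequencies

# Add the frequencies successively, starting from the lowest class.
cumulative = list(accumulate(frequencies))

# Convert each cumulative frequency to a percentage: CF x 100 / N.
cumulative_pct = [cf * 100 / n for cf in cumulative]

for label, f, cf, pct in zip(classes, frequencies, cumulative, cumulative_pct):
    print(f"{label}: f={f}, CF={cf}, cumulative %={pct}")
```

The row for the class 50-59 gives a cumulative frequency of 22 and a cumulative percentage of 55, matching the value discussed under Table 1.2.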
Meaning of Graphical Data Representation

Graphical representation of data means representing numerical data in visual form using pictures, diagrams and graphs, so that the data can be analysed more easily and effectively. It is considered an effective and economical way of presenting, understanding, analysing and interpreting statistical data.

Advantages

1. Precise and easy to understand.
2. More economical and effective method of representing data.
3. Attractive and appealing.
4. Easy to remember.
5. Easy to make comparisons with other data effectively.
6. Proper estimation, evaluation and interpretation of data is possible.
7. Easy computation of mean, median, mode, etc.
8. Helps in determining the nature of data and forecasting trends.

Modes of Graphical Representation

The two types of data, ungrouped (data in raw form) and grouped (data organised into a frequency distribution), use separate methods of graphical representation.

Graphical Representation of Ungrouped Data

Ungrouped data usually use the following graphical representations: (1) Bar Diagrams (2) Pie Diagrams (3) Pictograms (4) Line Graphs

1. Bar Diagrams

Data in different forms, such as raw scores, total scores or frequencies, computed statistics and summarised figures like percentages and averages, can be represented using bars. This form of graphical data representation is called a bar diagram. It may take two forms: vertical and horizontal. The lengths of the bars are in proportion to the values of the variable (height, weight, intelligence, marks, price, etc.). The widths of the bars are chosen arbitrarily. It is conventional to keep the space between the bars at about one half of the width of a bar. Consider Example 1.3 for the illustration of a bar diagram.

Example 1.3: The following data relates to the student enrolment in Zenith College in different years.
Represent the following data using a bar diagram.

Year           Number of Students Enrolled
2010 – 2011    1000
2011 – 2012    1220
2012 – 2013    900
2013 – 2014    1100
2014 – 2015    1400
2015 – 2016    1500

The above data can be represented using bar diagrams in vertical and horizontal forms, as given in Figures 1.1 and 1.2.

[Figure 1.1: Vertical Bar Diagram – Student enrolment in Zenith College during the years 2010 to 2015. X-axis: Year; Y-axis: Student enrolment.]

[Figure 1.2: Horizontal Bar Diagram – Student enrolment at Zenith College during the years 2010 to 2015. X-axis: Student enrolment; Y-axis: Year.]

2. Pie Diagram

In a pie diagram, data are represented as sections or portions of a circle of 360°, in which each part represents an amount of data converted into an angle. The total frequency is equated to 360° and the angles corresponding to the component parts are then calculated. Using these angles, the different sectors are drawn. Consider Example 1.4 for the illustration of preparing a pie diagram.

Example 1.4: The following data relates to the subjects offered for study in an institution and the number of students enrolled. Present the data graphically in the form of a pie diagram.

Subjects:           Science    Arts    Commerce
Students enrolled:  100        130     170

The above data can be presented in the form of a pie diagram as given below.

Courses Offered    No. of Students    Angle of the Circle
Science            100                (100/400) × 360 = 90°
Arts               130                (130/400) × 360 = 117°
Commerce           170                (170/400) × 360 = 153°
Total              400                360°

[Figure 1.3: Representation of Pie Diagram – Subjects offered for study and percentage of students enrolled. Sectors: Science 25%, Arts 32%, Commerce 43%.]

3. Pictograms

In data representation using pictograms, numerical data are represented by means of picture figures appropriately designed in proportion to the numerical data.
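The idea of proportion behind a pictogram can be sketched in plain text with Python. This is only an illustration under stated assumptions: the data are the class strengths of Example 1.5, while the symbol, the scale of one symbol per 10 students and the function name `pictogram_rows` are all choices made here, not part of the original material.

```python
# Class strengths for classes I to V (Example 1.5).
strengths = {"I": 70, "II": 70, "III": 60, "IV": 50, "V": 40}

UNIT = 10     # assumed scale: one symbol stands for 10 students
SYMBOL = "*"  # assumed picture figure


def pictogram_rows(data, unit, symbol):
    """Return one row of symbols per category, in proportion to its value."""
    return {label: symbol * round(value / unit) for label, value in data.items()}


rows = pictogram_rows(strengths, UNIT, SYMBOL)
for label, row in rows.items():
    print(f"Class {label:<4}{row}")
```

Each row contains one symbol for every 10 students, so classes I and II get seven symbols and class V gets four.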
Example 1.5: The number of students in classes 1 to 5 is given. Represent the data using a pictogram.

Class:     I     II    III   IV    V
Strength:  70    70    60    50    40

[Figure 1.4: Pictogram representation of number of students in classes 1 to 5.]

4. Line Graphs

In the line graph form of data representation, data relating to one variable are plotted on the horizontal X-axis and data relating to the other variable on the vertical Y-axis. Consider Example 1.3 for drawing a line graph.

[Figure 1.5: Line graph – Student enrolment in Zenith College in different years. X-axis: Year; Y-axis: No. of students enrolled.]

Graphical Representation of Grouped Data

Raw data are organised into a frequency distribution to get grouped data. The methods of representing grouped data graphically are given below: (1) Histogram (2) Frequency Polygon (3) Cumulative Frequency Graph (4) Cumulative Percentage Frequency Curve or Ogive.

1. Histogram

A histogram is essentially a bar diagram of a frequency distribution in which the ‘actual’ class intervals plotted on the X-axis give the widths of the bars (rectangles) and the respective frequencies of these classes give the heights of the bars.

To determine the actual class limits, 0.5 is subtracted from the lower limit of the class and 0.5 is added to the upper limit. For example, for the class 50-54, subtracting 0.5 from the lower limit and adding 0.5 to the upper limit gives the actual class interval 49.5 - 54.5.

The steps in the construction of a histogram are given below:

1. Convert the class limits into actual class limits, i.e., 20 – 24 as 19.5 – 24.5.
2. Take two extra class intervals, one above and one below the given classes, with zero as frequency.
3. Plot the actual or exact lower limits of classes on the X-axis.
4. Frequencies of distributions are to be plotted on the Y-axis.
5. Represent each class by a separate rectangle in which the base is the width of the class interval (i) and the height is its respective frequency.

Consider the following example for the illustration of representing data in the form of a histogram.

Example 1.6

Score:            30-39   40-49   50-59   60-69   70-79   80-89
No. of students:  3       8       11      12      4       2

To draw the histogram, take the actual or exact lower limits of the classes of scores as the values to be marked on the X-axis, and the corresponding frequencies of the classes on the Y-axis.

[Figure 1.6: Histogram representation of scores and frequencies.]

2. Frequency Polygon

A frequency polygon is essentially a line graph used for the graphical representation of a frequency distribution. A frequency polygon is drawn from a histogram by connecting the midpoints of the upper bases of the rectangular bars with straight lines. A frequency polygon can also be drawn directly by plotting the midpoints of classes. The steps in the construction of a frequency polygon are given below.

1. Take two extra classes, one above and one below the given intervals, with zero frequency.
2. Compute the midpoints of classes.
3. Mark the midpoints along the X-axis and mark the corresponding frequencies on the Y-axis.
4. Join the points marked on the graph by using straight lines to obtain a frequency polygon.

Example 1.7: Construct a frequency polygon from the data given below.

Score:            50-59   60-69   70-79   80-89   90-99
No. of students:  5       10      30      40      15

[Figure 1.7: Frequency Polygon]

3. The Cumulative Frequency Graph

Data organised in the form of a cumulative frequency distribution may be represented graphically using a cumulative frequency graph. It is essentially a line graph drawn by plotting the actual upper limits of the class intervals on the X-axis and the respective cumulative frequencies of these class intervals on the Y-axis.
The steps in the construction of a cumulative frequency graph are given below.

1. Take one extra class with cumulative frequency as zero to plot the origin of the graph on the X-axis.
2. Compute the actual upper limits of classes.
3. Compute the cumulative frequencies.
4. Mark the actual upper limits of classes on the X-axis and mark the corresponding cumulative frequencies on the Y-axis.
5. Join the points plotted on the graph by using straight lines, resulting in a cumulative frequency graph or cumulative frequency line graph.

Example 1.8: Consider the following for constructing a cumulative frequency graph.

Scores:           30-39   40-49   50-59   60-69   70-79
No. of students:  20      35      25      15      5

Solution:

Actual upper limits of classes:  39.5   49.5   59.5   69.5   79.5
Cumulative frequencies:          20     55     80     95     100

[Figure 1.8: Cumulative Frequency Graph]

4. The Cumulative Percentage Frequency Curve or Ogive

The cumulative percentage frequency curve or ogive represents the cumulative percentage frequency distribution by plotting the exact or actual upper limits of classes on the X-axis and the respective cumulative percentage frequencies of the classes on the Y-axis. Ogives can be useful in the computation of medians, quartiles, deciles, percentiles, percentile ranks and percentile norms, as well as for the overall comparison of two or more groups or frequency distributions.
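How a median or quartile is read off an ogive can be sketched in Python. The sketch below is an assumption-laden illustration, not the author's procedure: it treats the ogive as straight line segments between the plotted points and interpolates linearly, and the function name `ogive_value` is hypothetical. The data are those of Example 1.8, where N = 100, so the cumulative frequencies are also the cumulative percentages.

```python
def ogive_value(upper_limits, cumulative_pct, percent):
    """Score below which `percent` of the cases fall, read off the ogive.

    Hypothetical helper: walks along the straight segments joining the
    plotted points and interpolates linearly within the matching segment.
    """
    points = list(zip(upper_limits, cumulative_pct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= percent <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (percent - y0) / (y1 - y0)
    raise ValueError("percent must lie between 0 and 100")


# Points of the ogive for Example 1.8. The first point (29.5, 0) is the
# extra class with zero cumulative frequency where the curve starts.
upper_limits = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
cumulative_pct = [0, 20, 55, 80, 95, 100]

median = ogive_value(upper_limits, cumulative_pct, 50)  # 50% point
q1 = ogive_value(upper_limits, cumulative_pct, 25)      # 25% point
q3 = ogive_value(upper_limits, cumulative_pct, 75)      # 75% point

print(f"Q1 = {q1:.2f}, Median = {median:.2f}, Q3 = {q3:.2f}")
```

For the median, the 50% point lies between the plotted points (39.5, 20) and (49.5, 55), so the interpolation gives 39.5 + 10 × (50 − 20)/35.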
Consider the data given in Example 1.8 for the illustration of an ogive.

Solution:

Scores    Actual upper limits (X)    Frequencies (f)    Cumulative Frequencies (CF)    Cumulative Percentage Frequency (CF × 100/N)
30-39     39.5                       20                 20                             20
40-49     49.5                       35                 55                             55
50-59     59.5                       25                 80                             80
60-69     69.5                       15                 95                             95
70-79     79.5                       5                  100                            100
                                     N = 100

[Figure 1.9: Cumulative Percentage Frequency Curve or Ogive]

Module 2: Measures of Central Tendency

Meaning

The scores obtained by conducting tests, surveys and experiments are usually not presented in their entirety, which in many circumstances would be impossible anyway. It can be seen that only a very few scores are very high or very low, while most of the scores tend to cluster around a central value. This central value reflects the average characteristic of the distribution. The tendency of scores in a distribution to cluster around a central value is termed central tendency, and the typical score or value lying between the extremes and reflecting the average characteristic is referred to as a measure of central tendency. The three most common measures of central tendency are given below.

1. Arithmetic Mean or Mean
2. Median
3. Mode

Arithmetic Mean

The arithmetic mean or Mean is the sum of all the values of a given distribution divided by the number of values. In simple words, it is the average of a distribution. It is represented by the symbol M or $\bar{X}$.

$$ \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}} $$

Characteristics of Arithmetic Mean

1. The value of mean reflects the magnitude of every value in a given distribution.
2. A distribution has only one mean.
3. It is possible to manipulate mean algebraically.
4. Mean may be calculated even if individual values are unknown, provided the sum of values and the size of sample ‘N’ are given.
5. There is no need for ordering or grouping of data for the computation of mean.
6. 
It is not possible to compute mean of an open ended distribution.

Types of Mean

There are mainly four types of mean. They are:

1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
4. Quadratic Mean

Arithmetic mean is simply the ‘average value’. It is the sum of all scores divided by the number of scores.

Geometric mean is computed by multiplying all the N values in a distribution and taking the Nth root of their product.

Harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of a set of values.

Quadratic mean is the square root of the arithmetic mean of the squares of a set of values.

Advantages

1. It is easy to understand.
2. It is simple to calculate.
3. There is no need to order data in an ascending or descending manner.
4. All the scores in a distribution are taken into consideration while computing Mean.
5. It is very useful for comparing values.

Limitations

1. It is difficult to assume Mean from frequencies of values alone.
2. It is not appropriate for qualitative analysis.
3. If the frequency of one value is missing, it would be difficult to calculate Mean.
4. The Mean gives more importance to large frequencies than to smaller ones.
5. The same Mean of different categories may give different meanings.
6. It is not appropriate for computing ratios.

Computation of Mean from Ungrouped Data

Direct Method

If X1, X2, X3, ..... , X10 are the scores obtained by 10 students on a test, the arithmetic mean is computed as:

$$ M = \frac{X_1 + X_2 + X_3 + \dots + X_{10}}{10} $$

The formula for calculating the mean of ungrouped data is

$$ \bar{X} = \frac{\sum X}{N} $$

Where,
$\sum X$ is the sum of scores of the distribution
N is the total number of scores in the distribution.

Example 2.1: Consider the marks obtained by 10 students in an achievement test in Psychology. Marks: 65, 76, 50, 80, 73, 64, 57, 45, 78, 82. Compute mean marks from the data given.
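For the marks just listed, the four types of mean described above can be compared with a short Python sketch. It relies on the standard `statistics` module (Python 3.8 or later); the quadratic mean has no ready-made function there, so it is written out from its definition. The variable names are illustrative assumptions.

```python
import math
import statistics

# Marks of the 10 students in Example 2.1.
marks = [65, 76, 50, 80, 73, 64, 57, 45, 78, 82]

# Arithmetic mean: sum of all scores divided by the number of scores.
arithmetic = statistics.fmean(marks)

# Geometric mean: Nth root of the product of the N values.
geometric = statistics.geometric_mean(marks)

# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals.
harmonic = statistics.harmonic_mean(marks)

# Quadratic mean: square root of the arithmetic mean of the squares.
quadratic = math.sqrt(statistics.fmean([m * m for m in marks]))

print(f"Arithmetic mean: {arithmetic:.2f}")
print(f"Geometric mean:  {geometric:.2f}")
print(f"Harmonic mean:   {harmonic:.2f}")
print(f"Quadratic mean:  {quadratic:.2f}")
```

For positive values that are not all equal, the four results always fall in the order harmonic, geometric, arithmetic, quadratic, from smallest to largest.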
Marks (X)
65
76
50
80
73
64
57
45
78
82
ΣX = 670

$$ \text{Mean} = \frac{\sum X}{N} = \frac{670}{10} = 67 $$

Short-cut Method

$$ \bar{X} = A + \frac{\sum d}{N} $$

Where,
A is the assumed mean
d is the deviation of a score from the assumed mean, d = X – A
N is the number of scores in the distribution

X         d = (X – A)
65        1
76        12
50        -14
80        16
73        9
64 (A)    0
57        -7
45        -19
78        14
82        18
          Σd = 30

$$ \bar{X} = A + \frac{\sum d}{N} = 64 + \frac{30}{10} = 64 + 3 = 67 $$

Computation of Mean from Grouped Data

Direct Method

In a frequency distribution, where scores are grouped into classes with their frequencies, the mean is calculated by the formula given below.

$$ M = \frac{\sum fX}{N} $$

Where,
X is the mid-point of the class
f is the frequency
N is the total of all frequencies

Example 2.2: Compute mean from the data given below.

Scores    Frequency (f)
85-89     1
80-84     1
75-79     3
70-74     1
65-69     2
60-64     10
55-59     3
50-54     8
45-49     4
40-44     4
35-39     3
          N = 40

Solution:

Scores    Frequency (f)    Mid-point (X)    fX
85-89     1                87               87
80-84     1                82               82
75-79     3                77               231
70-74     1                72               72
65-69     2                67               134
60-64     10               62               620
55-59     3                57               171
50-54     8                52               416
45-49     4                47               188
40-44     4                42               168
35-39     3                37               111
          N = 40                            ΣfX = 2280

$$ M = \frac{\sum fX}{N} = \frac{2280}{40} = 57 $$

Shortcut Method

$$ M = A + \frac{\sum fx'}{N} \times i $$

Where,
A = assumed mean
i = class interval
f = frequency
N = total frequency
x' = (X – A)/i, where X is the mid-point of the class.

Consider the data given in Example 2.2. Compute mean by using the shortcut method, taking A = 62 and i = 5.

Scores    Frequency (f)    Mid-point (X)    x' = (X – A)/i    fx'
85-89     1                87               5                 5
80-84     1                82               4                 4
75-79     3                77               3                 9
70-74     1                72               2                 2
65-69     2                67               1                 2
60-64     10               62               0                 0
55-59     3                57               -1                -3
50-54     8                52               -2                -16
45-49     4                47               -3                -12
40-44     4                42               -4                -16
35-39     3                37               -5                -15
          N = 40                                              Σfx' = -40

$$ M = A + \frac{\sum fx'}{N} \times i = 62 + \frac{-40}{40} \times 5 = 62 - 5 = 57 $$

Median

When the items of a series are arranged in ascending or descending order of magnitude, the measure or value of the central item in the series is called the Median.
Median is a value that divides the distribution into two parts, i.e., half of the values lie above the Median and half below it.

Characteristics of Median

1. It is the value that occupies the middle point of the distribution, such that half the items fall above it and half below it.
2. The value of median does not reflect the magnitude of every value in a given distribution.
3. A distribution has only one median.
4. Median cannot be manipulated algebraically.
5. Computation of median requires the proper ordering of values.
6. It is possible to compute median of an open ended distribution.

Advantages

1. It is simple to calculate.
2. Easy to understand.
3. It is possible to calculate median in all distributions.
4. Median can be calculated even with extreme values.
5. It is very useful in quantitative analysis where order of score is emphasised (i.e., ordinal).

Limitations

1. It has only limited use.
2. Not appropriate for qualitative phenomenon.
3. Not applicable where items are assigned weights.

Computation of Median for Ungrouped Data

i. When the number of items in a distribution (N) is odd

When N, i.e., the number of items in a distribution, is an odd number, Median is computed using the following formula:

Median (Md) = the measure or value of the (N+1)/2th item.

Example 2.3: The marks obtained by 5 students in a test are 42, 50, 64, 56, 35. Compute the Median mark obtained in the test.

The first step in the calculation of Median is to arrange the scores in either ascending or descending order. Arranging the marks in ascending order we get 35, 42, 50, 56, 64. Since N = 5, which is an odd number, we compute Median by using the formula

Median (Md) = the measure of (N+1)/2th item
= the measure of (5+1)/2th item
= the measure of 3rd item, i.e., 50

ii. 
When the number of items in a distribution (N) is even

When N, i.e., the number of items in a distribution, is an even number, Median is computed using the following formula:

$$ \text{Median } (M_d) = \frac{\text{Value of } (N/2)\text{th item} + \text{Value of } [(N/2) + 1]\text{th item}}{2} $$

Example 2.4: The marks obtained by 8 students in an achievement test are 50, 42, 60, 35, 56, 65, 40, 62. Calculate the Median mark obtained.

Arranging the marks in ascending order we get 35, 40, 42, 50, 56, 60, 62, 65.

Where N = 8,
Value of (N/2)th item = 8/2 = 4th item, i.e., 50
Value of [(N/2) + 1]th item = 4 + 1 = 5th item, i.e., 56

Therefore, Median is (50 + 56)/2, i.e., 53.

Example 2.5: The table gives the salary paid to employees in a firm. There are 50 employees working. Compute the median salary paid to employees in a month.

Salary (in thousands):  4   7   8   10   11   12   13   14   15
Number of employees:    3   4   7   9    12   8    4    2    1

Solution:

Salary (in thousands)    No. of employees (f)    Cumulative Frequency (cf)
4                        3                       3
7                        4                       7
8                        7                       14
10                       9                       23
11                       12                      35
12                       8                       43
13                       4                       47
14                       2                       49
15                       1                       50
                         N = 50

Median (Md) = Measure of the (N+1)/2th item = (50+1)/2 = 51/2 = 25.5th item

Here, the 25.5th item comes after the cumulative frequency 23. Therefore it is included in the cumulative frequency 35, and hence the Median salary will be Rs. 11000.

Computation of Median for Grouped Data

Consider the following example for the computation of Median for grouped data or data in continuous series.

Example 2.6: The monthly income of staff members of an institution

Monthly Income:  2000 – 2500   1500 – 2000   1000 – 1500   500 – 1000   0 – 500
No. of Staff:    3             14            27            34           46

$$ M_d = l + \frac{i(N/2 - F)}{f} $$

Where,
l = Exact or actual lower limit of the median class
F = Total of all frequencies below the median class
f = Frequency of the median class
i = Class interval
N = Total frequencies

Monthly Income    f     F (cumulative frequency)
2000 – 2500       3     124
1500 – 2000       14    121
1000 – 1500       27    107
500 – 1000        34    80
0 – 500           46    46

The median class can be found as follows. Firstly, find N/2 = 124/2, viz., 62. Then, find the cumulative frequency in which 62 is included. Here, 62 is included in the cumulative frequency 80. Therefore the median class is 500 – 1000. Because the classes are continuous (each upper limit coincides with the next lower limit), the exact lower limit of the median class is l = 500, and the total of frequencies below it is F = 46.

Now, applying the formula we get,

$$ M_d = l + \frac{i(N/2 - F)}{f} = 500 + \frac{500(62 - 46)}{34} = 500 + \frac{500 \times 16}{34} = 735.29 $$

Mode

Mode is the value or measure that occurs most frequently in a distribution. It is the score or value that corresponds to the maximum frequency of the distribution.

Characteristics of Mode

1. It is the most frequently occurring value in a distribution.
2. A distribution may have two or more modes.
3. Mode does not reflect the other values in a given distribution.
4. It cannot be manipulated algebraically.
5. The computation of mode requires proper ordering of data.
6. It is possible to calculate mode of an open ended distribution.

Advantages

1. Mode can be easily computed.
2. It can also be identified from a graph.
3. It is not affected by extreme values.
4. It is very useful for business purposes.

Limitations

1. It is not a stable measure of central tendency.
2. It cannot be put to algebraic treatment.
3. It remains indeterminate when there exist two or more modal values in a series.
4. It is not suitable where the relative importance of items is under consideration.

Computation of Mode from Ungrouped Data

In the case of ungrouped data, mode is the value or score that occurs the maximum number of times in a distribution. That is, it is the value or measure that has the maximum frequency.
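This definition can be sketched in Python with the standard `statistics` module. The function `multimode` (Python 3.8 or later) is used here because, as noted among the characteristics, a distribution may have two or more modes. The first list holds the scores of Example 2.7; the second list is a made-up illustration of a distribution with two modes.

```python
from statistics import multimode

# Scores from Example 2.7.
scores = [34, 23, 45, 34, 48, 54, 56, 34, 76, 45]

# multimode returns every value that shares the maximum frequency.
modes = multimode(scores)
print("Mode(s):", modes)

# Hypothetical scores, invented for illustration, in which two values tie.
two_peaks = [12, 15, 15, 18, 20, 20, 25]
print("Mode(s):", multimode(two_peaks))
```

When the returned list holds more than one value, the mode is not unique, which is the indeterminate case mentioned under the limitations.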
Example 2.7: Compute mode from the following distribution: 34, 23, 45, 34, 48, 54, 56, 34, 76, 45.

Here, 34 occurs the greatest number of times, i.e., three times. Hence, in the example given, the value of mode is 34.

Computation of Mode from Grouped Data

For data given in the form of a frequency distribution (grouped data or continuous series), Mode is computed using the formula

Mode (Mo) = 3Md – 2M

Where, Md is the Median and M is the Mean of the given distribution. The Mean and Median are computed first, and Mode is computed subsequently.

Mode can also be computed directly from the frequency distribution table without calculating mean and median. For this, the following formula is used:

$$ M_o = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}\,(l_2 - l_1) $$

Where,
l1 = lower limit of the modal class
l2 = upper limit of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding (before) the modal class
f2 = frequency of the class succeeding (after) the modal class

Example 2.8: The following data relates to the different income groups of 45 farmers in a village.

Income groups     No. of farmers
30000 – 35000     2
35000 – 40000     5
40000 – 45000     10
45000 – 50000     8
50000 – 55000     3
55000 – 60000     10
60000 – 65000     7
                  N = 45

Solution: The modal class is the class with the highest frequency. Here the highest frequency, 10, occurs in two classes, 40000 – 45000 and 55000 – 60000, so the distribution has two modal classes. Applying the formula to the first of them, l1 = 40000, l2 = 45000, f1 = 10, f0 = 5 and f2 = 8:

$$ M_o = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}\,(l_2 - l_1) = 40000 + \frac{10 - 5}{2(10) - 5 - 8} \times 5000 = 40000 + \frac{25000}{7} = 40000 + 3571 = 43571 $$

Module 3: Measures of Dispersion

Measures of central tendency provide a single value that can be used to represent the characteristic of an entire distribution or group. But they do not show how the individual scores are ‘spread’ or ‘scattered’, which is very important in cases where we have to describe and compare two or more frequency distributions or sets of scores.
There is a tendency for data to be dispersed, scattered or to show variability around the average. The tendency of scores to ‘scatter’, ‘spread’ or deviate from the average or central value is termed dispersion or variability. It is to be noted that if dispersion is less, the average is more representative of the distribution, and vice versa.

Measures of Dispersion

A measure of dispersion gives the degree of variability or dispersion as a single value, which tells us how the individual scores are scattered or spread throughout the distribution or data. There are four measures of variability or dispersion. They are the following:

1. Range (R)
2. Quartile Deviation (QD)
3. Average Deviation (AD)
4. Standard Deviation (SD)

Range (R)

Range is the simplest measure of variability or dispersion. It is computed by subtracting the lowest score in the series from the highest score. The lower the range, the less scattered the scores; the higher the range, the more scattered the scores. However, range is a very crude or rough measure, as it takes into account only the extreme values and ignores the variation of the individual items.

Range (R) = Largest value – Smallest value

Coefficient of Range

For comparative purposes, an absolute measure has to be converted into a relative measure. This is done by computing a coefficient. Here we are considering the range, and hence we have to compute the coefficient of range:

$$\text{Coefficient of Range} = \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Largest Value} + \text{Smallest Value}}$$

Quartile Deviation (QD)

The total distribution is divided into four quarters or parts, each containing 25% of the scores; the points that mark these divisions are the quartiles Q1, Q2 (the median) and Q3. Quartile Deviation (QD) is one half of the difference between the 3rd quartile, Q3, and the 1st quartile, Q1.
The formula for Quartile Deviation is given below:

$$QD = \frac{Q_3 - Q_1}{2}$$

Where,

$$Q_3 = l + \frac{i\,(3N/4 - F)}{f} \qquad Q_1 = l + \frac{i\,(N/4 - F)}{f}$$

The value Q3 – Q1 is the difference or range between the 3rd and the 1st quartiles. This value is also called the interquartile range. While computing Quartile Deviation, this interquartile range is divided by 2, and hence Quartile Deviation is also called the semi-interquartile range.

Example 3.1: Compute the quartile deviation from the data given below.

Class     f      F
90-99     1    100
80-89     5     99
70-79    12     94
60-69    20     82    (Q3)
50-59    26     62    (Q2)
40-49    13     36    (Q1)
30-39     8     23
20-29     7     15
10-19     4      8
0-9       4      4
        N = 100

For the 3rd quartile, 3N/4 = (3 × 100)/4 = 75, and 75 is included in the cumulative frequency 82, so Q3 lies in the class 60 – 69. For the 2nd quartile, N/2 = 50 is included in the cumulative frequency 62, hence the median class is 50 – 59. For the 1st quartile, N/4 = 100/4 = 25 is included in the cumulative frequency 36, so Q1 lies in the class 40 – 49.

$$Q_3 = l + \frac{i\,(3N/4 - F)}{f} = 59.5 + \frac{10\,(75 - 62)}{20} = 59.5 + \frac{10 \times 13}{20} = 66$$

$$Q_1 = l + \frac{i\,(N/4 - F)}{f} = 39.5 + \frac{10\,(100/4 - 23)}{13} = 39.5 + \frac{10\,(25 - 23)}{13} = 39.5 + \frac{10 \times 2}{13} = 41.04$$

$$QD = \frac{Q_3 - Q_1}{2} = \frac{66 - 41.04}{2} = 12.48$$

Mean Deviation or Average Deviation

Garrett (1971) defines Average Deviation as the mean of the deviations of all the separate scores in the series taken from their mean. This measure of variability takes into account the fluctuation or variation of all the items in a series.

Computation of Mean Deviation from Ungrouped Data

The following formula is used for ungrouped data:

$$MD = \frac{\sum |x|}{N}$$

Where,
x = X – $\bar{X}$
X is the raw score
$\bar{X}$ (also written M) is the mean value
|x| is the absolute value of x, i.e., the value of x ignoring the signs +ve or –ve.

Example 3.2: Find the Mean Deviation of the scores 35, 32, 17, 20, 31.
Solution: N = 5

Mean = (35 + 32 + 17 + 20 + 31) / 5 = 135 / 5 = 27

X      x = X – $\bar{X}$     |x|
35          8                 8
32          5                 5
17        -10                10
20         -7                 7
31          4                 4
N = 5                   Σ|x| = 34

$$MD = \frac{\sum |x|}{N} = \frac{34}{5} = 6.8$$

Computation of Mean Deviation from Grouped Data

The following formula is used to compute Mean Deviation for grouped data:

$$MD = \frac{\sum |fx|}{N}$$

Example 3.3: Compute the mean deviation from the data given below.

Scores    frequency
50-54      3
45-49      4
40-44      6
35-39     11
30-34     14
25-29     12
20-24      9
15-19      4
10-14      2

Solution: Here X is the midpoint of each class, and the deviations x are taken from the mean rounded to 32.

Scores     f     X     fX    x = X – $\bar{X}$    fx    |fx|
50-54      3    52    156        20               60     60
45-49      4    47    188        15               60     60
40-44      6    42    252        10               60     60
35-39     11    37    407         5               55     55
30-34     14    32    448         0                0      0
25-29     12    27    324        -5              -60     60
20-24      9    22    198       -10              -90     90
15-19      4    17     68       -15              -60     60
10-14      2    12     24       -20              -40     40
        N = 65      ΣfX = 2065                       Σ|fx| = 485

$$\bar{X} = \frac{\sum fX}{N} = \frac{2065}{65} = 31.77 \approx 32$$

$$MD = \frac{\sum |fx|}{N} = \frac{485}{65} = 7.46$$

Variance and Standard Deviation

Variance is a measure of dispersion which eliminates the sign problem caused by the negative deviations cancelling out the positive deviations. The procedure is to square the deviation scores and divide their sum by the number of scores in the distribution.

$$\text{Variance} = S^2 = \frac{\sum (X - \bar{X})^2}{N}$$

Standard Deviation (SD) is regarded as the most stable measure of variability, as the mean is used for its computation. The Standard Deviation of a set of scores is defined as the square root of the average of the squares of the deviations of each score from the mean. It can never be negative. SD explains how much dispersion there is in the distribution of the given data. Standard Deviation is interpreted as an index of variation: the larger the standard deviation, the greater the variation or spread of the scores in the distribution. If there is no variation of scores, then the standard deviation is zero. Standard deviation is often referred to as the root mean square deviation and is denoted by the Greek letter sigma (σ).
Since the algebraic signs (+ve and –ve) are not ignored, it is more accurate than Mean Deviation.

Characteristics of Standard Deviation
1. It is the most important measure of dispersion.
2. It measures the variability or spread of scores in a distribution.
3. Standard deviation is never negative.
4. It is a more accurate and justified measure of dispersion.
5. It is more accurate than mean deviation since + and – signs are not ignored in the calculation.

The formula for computing SD is given below.

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{\sum x^2}{N}}$$

Where,
X = individual score
$\bar{X}$ = mean of all scores
N = total number of items
x = deviation of each score from the mean, i.e., X – $\bar{X}$

Computation of Standard Deviation from Ungrouped Data

Standard deviation of ungrouped data can be computed using the formula given below.

$$SD = \sqrt{\frac{\sum x^2}{N}}$$

Example 3.4: Compute the standard deviation of the following distribution.

Scores: 68, 62, 58, 64, 52, 58, 50, 68

$$\bar{X} = \frac{\sum X}{N} = \frac{68 + 62 + 58 + 64 + 52 + 58 + 50 + 68}{8} = \frac{480}{8} = 60$$

Score (X)    x = X – $\bar{X}$     x²
68                8                64
62                2                 4
58               -2                 4
64                4                16
52               -8                64
58               -2                 4
50              -10               100
68                8                64
                              Σx² = 320

$$SD\ (\sigma) = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{320}{8}} = \sqrt{40} = 6.32$$

Computation of Standard Deviation from Grouped Data

Standard deviation of grouped data can be computed using the formula given below, where X is the midpoint of each class, f is its frequency and x = X – $\bar{X}$.

$$SD = \sqrt{\frac{\sum f x^2}{N}}, \qquad \bar{X} = \frac{\sum fX}{N}$$

Example 3.5: Compute the Standard Deviation for the frequency distribution given below. The mean of the distribution is 115.
Scores     Frequency
127-129     1
124-126     2
121-123     3
118-120     1
115-117     6
112-114     4
109-111     3
106-108     2
103-105     1
100-102     1
          N = 24

Solution:

Scores     Frequency     X     x = X – $\bar{X}$     x²     fx²
127-129     1           128        13               169     169
124-126     2           125        10               100     200
121-123     3           122         7                49     147
118-120     1           119         4                16      16
115-117     6           116         1                 1       6
112-114     4           113        -2                 4      16
109-111     3           110        -5                25      75
106-108     2           107        -8                64     128
103-105     1           104       -11               121     121
100-102     1           101       -14               196     196
          N = 24                                        Σfx² = 1074

$$SD = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\frac{1074}{24}} = \sqrt{44.75} = 6.69$$

Coefficient of Variation or Coefficient of Relative Variability

It is often desirable to compare variabilities when means are unequal or when units of measurement from test to test are incommensurable. A statistic useful in making such comparisons is the coefficient of variation or V, sometimes called the coefficient of relative variability. This measure was first suggested by Karl Pearson as the percentage variation in a mean, the standard deviation being treated as the total variation in the mean. The coefficient of variation stands for the percentage which the value of the standard deviation is of the value of the mean. That is, if the standard deviation is divided by the mean and multiplied by 100, we get the coefficient of variation.

The following formula is used for computing the coefficient of variation:

$$\text{Coefficient of Variation}\ (V) = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$$

Example 3.6: The mean of a distribution is 50 and the SD is 10. Find the coefficient of variation.

Solution:

$$V = \frac{10}{50} \times 100 = 20\%$$

It means that the SD is 20% of the mean.

Coefficient of variation (V) is a primary tool in the statistical analysis of data because, being expressed as a percentage, the units of the variables can be ignored. Problems relating to the conversion of different units of the variables into a standard unit for the purpose of uniform expression do not arise.
Coefficient of variation is only a percentage of SD to the mean of a given distribution.
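The formulas of this module can be checked with a short program. The following Python sketch recomputes Example 3.2 (mean deviation), Example 3.4 (standard deviation), Example 3.5 (standard deviation of grouped data, using class midpoints) and Example 3.6 (coefficient of variation). All function names are illustrative assumptions introduced here. The sketch divides by N, as the formulas in the text do, and not by N – 1.

```python
from math import sqrt


def mean(scores):
    """Arithmetic mean of ungrouped scores."""
    return sum(scores) / len(scores)


def mean_deviation(scores):
    """MD = sum of |x| / N, where x = X - mean."""
    m = mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


def standard_deviation(scores):
    """SD = square root of (sum of x^2 / N), where x = X - mean."""
    m = mean(scores)
    return sqrt(sum((x - m) ** 2 for x in scores) / len(scores))


def grouped_standard_deviation(midpoints, frequencies):
    """SD = square root of (sum of f * x^2 / N), using class midpoints as X."""
    n = sum(frequencies)
    m = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    return sqrt(sum(f * (x - m) ** 2 for x, f in zip(midpoints, frequencies)) / n)


def coefficient_of_variation(sd, m):
    """V = SD / mean * 100, expressed as a percentage."""
    return 100 * sd / m


# Example 3.2
print(mean_deviation([35, 32, 17, 20, 31]))                               # 6.8

# Example 3.4
print(round(standard_deviation([68, 62, 58, 64, 52, 58, 50, 68]), 2))     # 6.32

# Example 3.5: class midpoints and frequencies from the table
midpoints = [128, 125, 122, 119, 116, 113, 110, 107, 104, 101]
frequencies = [1, 2, 3, 1, 6, 4, 3, 2, 1, 1]
print(round(grouped_standard_deviation(midpoints, frequencies), 2))       # 6.69

# Example 3.6
print(coefficient_of_variation(10, 50))                                   # 20.0
```

The printed values agree with the worked solutions above.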