UNIT I

Variable: A variable is a quantity whose value may change within the scope of a given problem or set of operations.
Data: The term data refers to qualitative or quantitative attributes of a variable or set of variables.
Data Collection: Data collection is the process of preparing and collecting data. Data are generally classified into the following two groups:
a) Internal data
b) External data

Internal data: These come from internal sources related to the functioning of an organization or firm, where records of purchases, production, sales, profits etc. are kept on a regular basis.
External data: These are collected and published by external agencies. External data can be further classified as:
a) Primary data
b) Secondary data
Primary data: These are original, first-hand information.
Secondary data: These are data that have already been collected by a source other than the present investigator.

Sources of data:
a) Internal
b) External: Primary, Secondary

The collected data (raw or ungrouped data) are always in an unorganized form and need to be organized and presented in a meaningful and readily comprehensible form in order to facilitate further statistical analysis.
Classification: It is the process of arranging things in groups according to their resemblances and affinities, giving expression to the unity of attributes that may subsist amongst a diversity of individuals. In simple words, it is the grouping of data according to their identity, similarity or resemblance. For example, letters in the post office are classified according to their destinations, viz. Delhi, Raipur, Agra, Kanpur etc.

Types of Classification:
a) Chronological or Temporal classification
b) Geographical or Spatial classification
c) Qualitative classification
d) Quantitative classification

Chronological or Temporal classification: In chronological classification, the collected data are arranged according to the order of time expressed in years, months, weeks etc.
The data are generally classified in ascending order of time.
Example: The estimates of birth rates in India during 1970-79 are:
Year | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976 | 1977 | 1978 | 1979
Birth rate | 36.8 | 33.0 | 33.0 | 36.9 | 36.6 | 34.6 | 34.5 | 35.2 | 34.2 | 33.3

Geographical or Spatial classification: In this type of classification the data are classified according to geographical region or place. The observations are classified either in the alphabetical order of the reference places or in the order of size of the observations.
Example:
1. When the names of countries are in alphabetical order:
Country | America | China | Denmark | France | India
Yield of wheat | 1925 | 893 | 225 | 439 | 862
2. When observations are in descending order:
Country | America | China | India | France | Denmark
Yield of wheat | 1925 | 893 | 862 | 439 | 225

Qualitative classification: In this type of classification data are classified on the basis of some attribute or quality such as literacy, religion, employment etc. Such attributes cannot be measured on a scale. When the classification is done with respect to one attribute that is dichotomous in nature, two classes are formed, one possessing the attribute and the other not possessing it. This type of classification is called simple or dichotomous classification.
A classification in which two or more attributes are considered and several classes are formed is called a manifold classification. For example, a population may be divided into Urban and Rural, and each of these into Male and Female:
Population
a) Urban: Male, Female
b) Rural: Male, Female

Quantitative classification: The collected data are grouped with reference to characteristics which can be measured and numerically described, such as height, weight, sales, imports, age, income etc.
If data are arranged in ascending or descending order of magnitude, the arrangement is said to be an array.
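The orderings described above can be sketched in a few lines of Python. This is only an illustration using the wheat-yield figures from the geographical example; the variable names are hypothetical and not part of the original notes.

```python
# Wheat yields by country, taken from the geographical classification example.
wheat_yield = {"America": 1925, "China": 893, "Denmark": 225,
               "France": 439, "India": 862}

# Geographical classification, alphabetical order of the reference places.
alphabetical = sorted(wheat_yield.items())

# Geographical classification, descending order of the observations.
by_size = sorted(wheat_yield.items(), key=lambda item: item[1], reverse=True)

# An array: the observations alone, in ascending order of magnitude.
array = sorted(wheat_yield.values())

print([country for country, _ in alphabetical])
print([country for country, _ in by_size])
print(array)
```

The same `sorted` call with `reverse=True` would give the array in descending order of magnitude.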
Example: Consider the marks of 50 students.
Ungrouped data:
21 50 42 75 55 67 74 55 47 64
71 61 40 25 25 54 64 37 88 44
31 70 81 51 45 63 49 43 35 67
68 31 38 45 59 75 57 29 66 50
56 84 56 88 63 32 55 88 79 78
Arranged in an array (read down each column):
21 31 40 45 51 56 61 66 71 79
25 32 42 47 54 56 63 67 74 81
25 35 43 49 55 57 63 67 75 84
29 37 44 50 55 58 64 68 75 84
31 38 45 50 55 59 64 70 78 88

Diagrammatical representation: In this form of presentation we make use of geometric figures like bars, squares, rectangles, circles etc.

Types of Diagrams:
1. One dimensional diagrams
2. Two dimensional diagrams
3. Pictograms
4. Cartograms

1. One Dimensional diagrams: One dimensional diagrams, also called bar diagrams, are the most widely used diagrams for the visual presentation of data. The types of one dimensional diagram are:
i. Simple bar diagram
ii. Multiple bar diagram
iii. Subdivided bar diagram
iv. Percentage bar diagram
v. Deviation bar diagram
vi. Broken bars

i. Simple Bar-diagrams: A simple bar diagram consists of a number of rectangles and is used only for one-dimensional comparisons. It is generally used to show changes in the magnitude of a phenomenon over time or space.
Example: Draw a bar diagram to represent the following data relating to a school.
Year | 1990 | 1991 | 1992 | 1993 | 1994 | 1995
No. of students | 210 | 242 | 290 | 315 | 340 | 355

ii. Multiple Bar-diagrams: A multiple bar diagram is used when a comparison is to be made between two or more variables. It is also used for comparing magnitudes of one variable in two or three aspects.
Example: The following data relate to the faculty-wise enrolment of students in a college. Represent the data by a suitable diagram.
Years | 1993 | 1994 | 1995
No. of arts students | 95 | 110 | 120
No. of science students | 160 | 170 | 165
No. of commerce students | 75 | 95 | 110

iii. Subdivided Bar-diagrams: A subdivided bar diagram, also known as a component bar diagram, is useful when it is necessary to show and compare the breakup of one variable into several components.
Example: The following data relate to year-wise enrolment in a college, classified according to sex. Represent the data by a suitable diagram.
Year | 1990-91 | 1991-92 | 1992-93 | 1993-94 | 1994-95
No. of girls | 810 | 825 | 844 | 780 | 820
No. of boys | 1215 | 1160 | 1325 | 1410 | 1480
Total | 2025 | 1985 | 2169 | 2190 | 2300

iv. Percentage Bar-diagrams: The construction of a percentage bar diagram is similar to that of the subdivided bar diagram. The difference between the two is that in the subdivided bar diagram the component parts are shown in absolute quantities, while in the percentage bar diagram the component parts are transformed into percentages of the total. In this diagram all the bars are of equal height, and each bar is divided according to the percentages of its components.
Example: The following data relate to the faculty-wise enrolment of students in a college. Represent the data by a percentage bar diagram.
Years | 1993 | 1994 | 1995
No. of arts students | 95 | 110 | 120
No. of science students | 160 | 170 | 165
No. of commerce students | 75 | 95 | 110
The data can be expressed as percentages of students:
Year | Arts (%) | Science (%) | Commerce (%) | Total (%)
1993 | 28.79 | 48.48 | 22.73 | 100
1994 | 29.33 | 45.34 | 25.33 | 100
1995 | 30.38 | 41.77 | 27.85 | 100

v. Deviation Bar-diagrams: These are used to show net magnitudes of a phenomenon, such as net profit, net loss, net exports or imports etc. Bars in these diagrams can assume both negative and positive values.
Example: Depict the following data by a suitable diagram (Balance of trade = Export - Import), in millions of Rs.
Year | Export | Import | Balance of trade
1993 | 98 | 115 | -17
1994 | 110 | 140 | -30
1995 | 115 | 96 | +19
1996 | 120 | 100 | +20

vi. Broken bars: Broken bars are used to represent series having wide variations in values.
Example: The following data relate to sales in five firms A, B, C, D, E. Use a suitable bar diagram to represent the data.
Firms | A | B | C | D | E
Sales (in lakh Rs.) | 25 | 38 | 300 | 200 | 56

2.
Two dimensional diagrams: Such diagrams are useful in situations where the proportion between the magnitudes of the given values of the variable is quite large. The types of two dimensional diagram are:
i. Rectangle diagram
ii. Square and circle diagram
iii. Pie diagram
iv. Multiple pie diagram

i. Rectangle Diagrams: These diagrams are used for two dimensional comparisons. The rectangles vary in height as well as in width, so that the areas of the rectangles represent the magnitude of the variable over time, space or some other characteristic of variation.
Example: The following data represent the expenditure of two families on various items. Represent the data by a rectangle diagram.
S. No. | Items | Expenditure of Family A (Rs.) | Expenditure of Family B (Rs.)
1 | Food | 1200 | 1700
2 | Clothing | 500 | 800
3 | House rent | 600 | 900
4 | Fuel and electricity | 250 | 300
5 | Miscellaneous | 450 | 800
  | Total | 3000 | 4500

ii. Squares and circle Diagrams: These are useful when the proportion between the magnitudes of the given values is quite large. Because the areas must be proportional to the values, the sides of the squares are kept proportional to the square roots of the values, and for circle diagrams the radii of the circles are kept proportional to the square roots of the values.
Example: The following data relate to the plan outlay of a country for three plans. Represent the data by a square and circle diagram.
Five year plan | I | IV | VII
Outlay (Rs. '000 crores) | 196 | 2060 | 8820
Plan | Outlay | Side of square or radius of circle | Ratio
I | 196 | 14 | 0.7
IV | 2060 | 45.39 | 2.26
V | 8820 | 93.91 | 4.7
Square diagram: Plan I, a = 0.7"; Plan IV, a = 2.26"; Plan V, a = 4.7"
Circle diagram: Plan I, r = 0.7"; Plan IV, r = 2.26"; Plan V, r = 4.7"

iii. Pie Diagrams: This diagram is generally used to compare the relations between the various subdivisions of a value. A pie diagram is a circle divided into sectors whose areas are proportional to the corresponding components. A pie diagram shows the components or subdivisions in terms of percentages only and not in absolute terms.
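Since each sector must be proportional to its component, the angle of a sector is (component / total) × 360°. A minimal sketch of that calculation follows; the function name `sector_angles` is hypothetical, and the enrolment figures are used purely as illustrative input.

```python
def sector_angles(components):
    """Return the pie-diagram angle (in degrees) for each component."""
    total = sum(components.values())
    return {name: value / total * 360 for name, value in components.items()}

# Illustrative faculty-wise enrolment figures.
enrolment = {"Science": 2010, "Arts": 1100, "Commerce": 2390}
angles = sector_angles(enrolment)

for name, angle in angles.items():
    print(f"{name}: {angle:.2f} degrees")
```

The angles always add up to 360°, which is a convenient check before drawing the diagram.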
Example: The following data relate to faculty-wise enrolment in a college. Represent the data by a pie diagram.
Faculty | Science | Arts | Commerce | Total
No. of students | 2010 | 1100 | 2390 | 5500
Solution:
Faculty | No. of students | Angle in degrees
Science | 2010 | (2010/5500) × 360° = 131.56°
Arts | 1100 | (1100/5500) × 360° = 72°
Commerce | 2390 | (2390/5500) × 360° = 156.44°
Total | 5500 | 360°

iv. Multiple Pie Diagrams: A multiple pie diagram is used for two dimensional comparisons, where the value of a variable is shown over time, space or some other characteristic and the values are also broken into components.
Example: The following data represent the expenditure of two families on various items. Represent the data by a multiple pie diagram.
S. No. | Items | Expenditure of Family A (Rs.) | Expenditure of Family B (Rs.)
1 | Food | 1200 | 1700
2 | Clothing | 500 | 800
3 | House rent | 600 | 900
4 | Fuel and electricity | 250 | 300
5 | Miscellaneous | 450 | 800
  | Total | 3000 | 4500
Solution:
S. No. | Items | Family A (Rs.) | Family B (Rs.) | Angle for Family A | Angle for Family B
1 | Food | 1200 | 1700 | (1200/3000) × 360° = 144° | (1700/4500) × 360° = 136°
2 | Clothing | 500 | 800 | (500/3000) × 360° = 60° | (800/4500) × 360° = 64°
3 | House rent | 600 | 900 | (600/3000) × 360° = 72° | (900/4500) × 360° = 72°
4 | Fuel and electricity | 250 | 300 | (250/3000) × 360° = 30° | (300/4500) × 360° = 24°
5 | Miscellaneous | 450 | 800 | (450/3000) × 360° = 54° | (800/4500) × 360° = 64°
  | Total | 3000 | 4500 | 360° | 360°

3. Pictorial diagrams or Pictograms: Statistical data may also be represented with the help of pictures. Such a presentation is called a pictorial diagram or pictogram. In pictograms, the magnitude of the values is explained with the help of pictures; a symbolic picture represents the total magnitude of the values.
Example: The following data relate to the production of electric bulbs in a factory. Represent the data by a pictogram.
Year | 1992 | 1993 | 1994 | 1995
Production of bulbs (in millions) | 32 | 57 | 79 | 89
[Figure: pictogram of bulb production for 1992 to 1995, vertical scale 10 to 90 million]

4.
Cartograms or Maps: Statistical data classified according to geographical regions can also be represented with the help of suitable maps. The representation of statistical data by maps is called a cartogram.

Graphical representation: It is used in situations where we observe some functional relationship between the values of the variables. It gives an accurate conception of the shape of a frequency distribution. There are many forms of graphs, which can be broadly classified as:
1. Graphs of frequency distribution
2. Graphs of time series or line graphs

Graphs of frequency distribution: The graphs representing a frequency distribution are:
1. Histogram
2. Frequency polygon
3. Frequency curve
4. Cumulative frequency curve or ‘Ogive’

Example: The table below gives the distribution of the age of members in a sports club.
Age group (years) | No. of members
15-19 | 11
20-24 | 36
25-29 | 28
30-34 | 13
35-39 | 7
40-44 | 3
45-49 | 2
With the class limits adjusted to continuous class boundaries, the frequency distribution becomes:
Age group (years) | No. of members
14.5-19.5 | 11
19.5-24.5 | 36
24.5-29.5 | 28
29.5-34.5 | 13
34.5-39.5 | 7
39.5-44.5 | 3
44.5-49.5 | 2
[Figure: histogram for the above data]
[Figure: frequency polygon; horizontal axis: age group (14.5 to 44.5), vertical axis: number of members (0 to 40)]
[Figure: frequency curve; horizontal axis: age group]

Measures of Dispersion

The extent or degree to which data tend to spread around an average is called dispersion or variation. Measures of dispersion help us in studying the extent to which observations are scattered around the average or central value.

Types of Dispersion: There are two types of measures of dispersion.
a) Absolute measure of dispersion: These are expressed in the same unit in which the observations are given.
Thus, absolute measures of dispersion are useful for comparing variation in two or more distributions where the unit of measurement is the same. Such measures are not suitable for comparing the variability of distributions expressed in different units of measurement.
b) Relative measure of dispersion: These are expressed as a ratio, a percentage or the coefficient of the absolute measure of dispersion. Relative measures are useful for comparing variability in two or more distributions where the units of measurement may be different.

Various measures of Dispersion: The following are some important measures of dispersion:
1. Range
2. Interquartile Range and Quartile Deviation
3. Mean Deviation or average deviation
4. Standard Deviation

Range: Range is the simplest measure of dispersion. For a given set of observations, the range is the difference between the largest and the smallest observation. Thus
$R = L - S$
where L = the largest observation, S = the smallest observation and R = the range.
In the case of grouped data, the range is defined as the difference between the upper limit of the highest class and the lower limit of the lowest class.

Coefficient of Range: Range is an absolute measure of dispersion, which is unsuitable for comparing variation in two or more distributions expressed in different units. So a relative measure of dispersion called the coefficient of range is defined as:
$\text{Coefficient of range} = \dfrac{L - S}{L + S}$

Example: Marks of 10 students in Mathematics and Statistics are given below:
Marks in Mathematics | 25 | 40 | 30 | 35 | 21 | 45 | 23 | 33 | 10 | 29
Marks in Statistics | 30 | 39 | 23 | 42 | 20 | 40 | 25 | 30 | 18 | 19
a) Compare the range of marks in the two subjects.
b) Compare the coefficients of range for both the subjects.
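The two quantities asked for in this example can be checked with a short Python sketch. The function names `value_range` and `coefficient_of_range` are hypothetical helpers written for this illustration; they simply apply R = L - S and (L - S)/(L + S) to the marks listed above.

```python
def value_range(values):
    """Range R = L - S (largest minus smallest observation)."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

mathematics = [25, 40, 30, 35, 21, 45, 23, 33, 10, 29]
statistics = [30, 39, 23, 42, 20, 40, 25, 30, 18, 19]

for subject, marks in (("Mathematics", mathematics), ("Statistics", statistics)):
    print(subject, value_range(marks), round(coefficient_of_range(marks), 2))
```

Because the coefficient is a pure ratio, it stays comparable even when two series are measured in different units, which the plain range does not.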
Solution:
Highest marks in Mathematics = 45, lowest marks in Mathematics = 10
Range of marks in Mathematics: $R = L - S = 45 - 10 = 35$
Coefficient of range $= \dfrac{L - S}{L + S} = \dfrac{45 - 10}{45 + 10} = 0.64$
Highest marks in Statistics = 42, lowest marks in Statistics = 18
Range of marks in Statistics: $R = L - S = 42 - 18 = 24$
Coefficient of range $= \dfrac{L - S}{L + S} = \dfrac{42 - 18}{42 + 18} = 0.4$
The range as well as the coefficient of range for marks in Mathematics are higher than those for marks in Statistics.

Example: Find the range and coefficient of range from the following:
Mid value | 5 | 10 | 15 | 20 | 25 | 30 | 35
Frequency | 7 | 5 | 8 | 12 | 8 | 9 | 8
Solution: The class limits are (2.5-7.5), (7.5-12.5), ..., (32.5-37.5).
Range: $R = L - S = 37.5 - 2.5 = 35$
Coefficient of range $= \dfrac{L - S}{L + S} = \dfrac{37.5 - 2.5}{37.5 + 2.5} = 0.875$

Interquartile Range and Quartile Deviation: The interquartile range covers the middle fifty percent of the distribution; it is the difference between the third quartile (Q3) and the first quartile (Q1).
$\text{Interquartile range} = Q_3 - Q_1$
Quartile deviation, or semi-interquartile range, is defined as the average amount by which the two quartiles differ from the median.
$\text{Quartile deviation} = \dfrac{Q_3 - Q_1}{2}$
Quartile deviation is an absolute measure of dispersion. For comparing two or more distributions in respect of variation, the coefficient of quartile deviation is defined as
$\text{Coefficient of Q.D.} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

Example: From the following information on the wages of 15 workers, find the interquartile range, quartile deviation and coefficient of Q.D.
S. No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
Wages (Rs.) | 520 | 550 | 440 | 580 | 450 | 620 | 470 | 680 | 400 | 490 | 420 | 480 | 440 | 480 | 500
Solution: Arrange the wages in ascending order.
S. No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
Wages (Rs.) | 400 | 420 | 440 | 440 | 450 | 470 | 480 | 480 | 490 | 500 | 520 | 550 | 580 | 620 | 680
$Q_1 = \left(\dfrac{N+1}{4}\right)^{th}$ term = 4th term = 440
$Q_3 = 3\left(\dfrac{N+1}{4}\right)^{th}$ term = 12th term = 550
Interquartile range $= Q_3 - Q_1 = 550 - 440 = 110$
Quartile deviation $= \dfrac{Q_3 - Q_1}{2} = \dfrac{550 - 440}{2} = 55$
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{550 - 440}{550 + 440} = 0.11$

Example: Calculate the quartile deviation and its coefficient from the following distribution:
Weekly income (Rs.) | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66
No. of workers | 2 | 3 | 6 | 15 | 10 | 5 | 4 | 3 | 1
Solution:
Weekly income (Rs.) | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66
No. of workers | 2 | 3 | 6 | 15 | 10 | 5 | 4 | 3 | 1
Cumulative frequency | 2 | 5 | 11 | 26 | 36 | 41 | 45 | 48 | 49
$Q_1 = \left(\dfrac{N+1}{4}\right)^{th}$ term = 12.5th term = 61
$Q_3 = 3\left(\dfrac{N+1}{4}\right)^{th}$ term = 37.5th term = 63
Quartile deviation $= \dfrac{Q_3 - Q_1}{2} = \dfrac{63 - 61}{2} = 1$
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{63 - 61}{63 + 61} = 0.016$

Example: The following is the age distribution of 799 workers. Find the quartile deviation and its coefficient.
Age group | 20-25 | 25-30 | 30-35 | 35-40 | 40-45 | 45-50 | 50-55 | 55-60
No. of workers | 50 | 70 | 100 | 180 | 150 | 120 | 70 | 59
Solution:
Age group | 20-25 | 25-30 | 30-35 | 35-40 | 40-45 | 45-50 | 50-55 | 55-60
No. of workers | 50 | 70 | 100 | 180 | 150 | 120 | 70 | 59
Cumulative frequency | 50 | 120 | 220 | 400 | 550 | 670 | 740 | 799
$Q_1 = \left(\dfrac{N}{4}\right)^{th}$ term = 199.75th term, which lies in 30-35.
$Q_1 = l_1 + \dfrac{\frac{N}{4} - C}{f} \times h = 30 + \dfrac{\frac{799}{4} - 120}{100} \times 5 = 33.9875 \approx 34$
$Q_3 = 3\left(\dfrac{N}{4}\right)^{th}$ term = 599.25th term, which lies in 45-50.
$Q_3 = l_1 + \dfrac{\frac{3N}{4} - C}{f} \times h = 45 + \dfrac{\frac{3 \times 799}{4} - 550}{120} \times 5 = 47.05 \approx 47$
Quartile deviation $= \dfrac{Q_3 - Q_1}{2} = \dfrac{47 - 34}{2} = 6.5$
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{47 - 34}{47 + 34} = 0.16$

Mean Deviation or Average deviation: The mean deviation of a series is the arithmetic mean of the absolute deviations of the various items from some central value, such as the mean, median or mode.
1. For ungrouped data
a) Mean deviation from mean $\bar{X}$: $M.D.(\bar{X}) = \dfrac{1}{N}\sum |X - \bar{X}|$
b) Mean deviation from median $M_d$: $M.D.(M_d) = \dfrac{1}{N}\sum |X - M_d|$
c) Mean deviation from mode $M_o$: $M.D.(M_o) = \dfrac{1}{N}\sum |X - M_o|$
2. For grouped data
a) Mean deviation from mean $\bar{X}$: $M.D.(\bar{X}) = \dfrac{1}{\sum f_i}\sum f_i |X_i - \bar{X}|$
b) Mean deviation from median $M_d$: $M.D.(M_d) = \dfrac{1}{\sum f_i}\sum f_i |X_i - M_d|$
c) Mean deviation from mode $M_o$: $M.D.(M_o) = \dfrac{1}{\sum f_i}\sum f_i |X_i - M_o|$

Coefficient of Mean deviation: Mean deviation is an absolute measure of dispersion. The corresponding relative measure, called the coefficient of mean deviation, is obtained by dividing the mean deviation by the average or central value used for calculating it.
$\text{Coefficient of M.D.} = \dfrac{M.D.}{\text{Mean or median or mode}}$

Example: Compute the mean deviation from the mean and its coefficient from the following data relating to the marks obtained by a batch of 11 students in a class test:
Marks | 10 | 70 | 50 | 53 | 20 | 95 | 55 | 42 | 60 | 48 | 80
Solution:
Marks (X) | $|X - \bar{X}|$
10 | 43
70 | 17
50 | 3
53 | 0
20 | 33
95 | 42
55 | 2
42 | 11
60 | 7
48 | 5
80 | 27
Total 583 | 190
$\bar{X} = \dfrac{583}{11} = 53$
Mean deviation $= \dfrac{1}{N}\sum |X - \bar{X}| = \dfrac{190}{11} = 17.27$
Coefficient of mean deviation $= \dfrac{M.D.}{\text{Mean}} = \dfrac{17.27}{53} \approx 0.326$

Example: Calculate the mean deviation from the median from the following data. Also compute the coefficient of M.D.
Size | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16
Frequency | 2 | 2 | 4 | 5 | 3 | 2 | 1 | 1
Solution:
X | f | C.F. | $|X - M_d|$ | $f|X - M_d|$
2 | 2 | 2 | 6 | 12
4 | 2 | 4 | 4 | 8
6 | 4 | 8 | 2 | 8
8 | 5 | 13 | 0 | 0
10 | 3 | 16 | 2 | 6
12 | 2 | 18 | 4 | 8
14 | 1 | 19 | 6 | 6
16 | 1 | 20 | 8 | 8
Total | 20 | | 32 | 56
Median $(M_d) = \left(\dfrac{N+1}{2}\right)^{th}$ term = 10.5th term = 8
Mean deviation $= \dfrac{1}{N}\sum f|X - M_d| = \dfrac{56}{20} = 2.8$
Coefficient of mean deviation $= \dfrac{M.D.}{\text{Median}} = \dfrac{2.8}{8} = 0.35$

Example: Compute the mean deviation (M.D.) from the mean from the following data. Also find the coefficient of M.D.
Classes | 0-20 | 20-40 | 40-60 | 60-80 | 80-100 | 100-120
Frequency | 5 | 50 | 84 | 32 | 10 | 6
Solution:
Classes | Frequency (f) | Mid point (x) | d = (x - 70)/20 | fd | $|X - 51|$ | $f|X - 51|$
0-20 | 5 | 10 | -3 | -15 | 41 | 205
20-40 | 50 | 30 | -2 | -100 | 21 | 1050
40-60 | 84 | 50 | -1 | -84 | 1 | 84
60-80 | 32 | 70 | 0 | 0 | 19 | 608
80-100 | 10 | 90 | 1 | 10 | 39 | 390
100-120 | 6 | 110 | 2 | 12 | 59 | 354
Total | 187 | | | -177 | | 2691
Mean $= A + \dfrac{\sum fd}{\sum f} \times h = 70 + \dfrac{-177}{187} \times 20 = 51.07 \approx 51$
Mean deviation $= \dfrac{1}{N}\sum f|X - \bar{X}| = \dfrac{2691}{187} = 14.39$
Coefficient of mean deviation $= \dfrac{M.D.}{\text{Mean}} = \dfrac{14.39}{51} = 0.28$

Standard Deviation: The standard deviation is defined as the positive square root of the arithmetic mean of the squares of the deviations of the observations from the arithmetic mean.
a) For ungrouped data: $\sigma = \sqrt{\dfrac{1}{N}\sum (X - \bar{X})^2}$
b) For grouped data or a frequency distribution: $\sigma = \sqrt{\dfrac{1}{N}\sum f (X - \bar{X})^2}$

Variance: The square of the standard deviation is known as the variance.
a) For ungrouped data: $\sigma^2 = \dfrac{1}{N}\sum (X - \bar{X})^2$
b) For grouped data or a frequency distribution: $\sigma^2 = \dfrac{1}{N}\sum f (X - \bar{X})^2$

Example: Calculate the standard deviation from the following set of observations:
X | 10 | 11 | 17 | 25 | 7 | 13 | 21 | 10 | 12 | 14
Solution:
X | X - 14 | (X - 14)^2
10 | -4 | 16
11 | -3 | 9
17 | 3 | 9
25 | 11 | 121
7 | -7 | 49
13 | -1 | 1
21 | 7 | 49
10 | -4 | 16
12 | -2 | 4
14 | 0 | 0
Total 140 | | 274
Mean $= \dfrac{\sum X}{N} = \dfrac{140}{10} = 14$
Standard deviation $= \sqrt{\dfrac{1}{N}\sum (X - \bar{X})^2} = \sqrt{\dfrac{274}{10}} = 5.23$

Example: Calculate the standard deviation of the following discrete frequency distribution:
Size (X) | 4 | 5 | 6 | 7 | 8 | 9 | 10
Frequency | 6 | 12 | 15 | 28 | 20 | 14 | 5
Solution:
Size (X) | Frequency (f) | d = X - 7 | fd | fd^2
4 | 6 | -3 | -18 | 54
5 | 12 | -2 | -24 | 48
6 | 15 | -1 | -15 | 15
7 | 28 | 0 | 0 | 0
8 | 20 | 1 | 20 | 20
9 | 14 | 2 | 28 | 56
10 | 5 | 3 | 15 | 45
Total | 100 | | 6 | 238
Mean $= A + \dfrac{\sum fd}{\sum f} \times h = 7 + \dfrac{6}{100} \times 1 = 7.06$
Standard deviation $= h\sqrt{\dfrac{\sum fd^2}{\sum f} - \left(\dfrac{\sum fd}{\sum f}\right)^2} = 1 \times \sqrt{\dfrac{238}{100} - \left(\dfrac{6}{100}\right)^2} = 1.54$

Moments: Moments are used to describe the characteristics of a distribution. The moments of a distribution are the arithmetic means of the various powers of the deviations of items from some given number.

Moments about mean (Central moments):
a) For an individual series: If $x_1, x_2, \ldots, x_n$ are the n observations in a data set with mean $\bar{x}$, then the rth moment about the mean is defined as
$\mu_r = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r, \quad r = 0, 1, 2, \ldots$
b) For grouped data or a frequency distribution: Let $x_1, x_2, \ldots, x_n$ be the n observations in a data set with corresponding frequencies $f_1, f_2, \ldots, f_n$ respectively. Then the rth moment about the mean is defined as
$\mu_r = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r, \quad r = 0, 1, 2, \ldots, \quad \text{where } N = \sum_{i=1}^{n} f_i$
In particular, $\mu_0 = 1$, $\mu_1 = 0$, $\mu_2 = \sigma^2$.

Moments about an arbitrary point (Raw moments):
a) For an individual series: If $x_1, x_2, \ldots, x_n$ are the n observations in a data set, then the rth moment about an arbitrary point A is defined as
$\mu_r' = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - A)^r, \quad r = 0, 1, 2, \ldots$
b) For grouped data or a frequency distribution: Let $x_1, x_2, \ldots, x_n$ be the n observations in a data set with corresponding frequencies $f_1, f_2, \ldots, f_n$ respectively. Then the rth moment about an arbitrary point A is defined as
$\mu_r' = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^r, \quad r = 0, 1, 2, \ldots, \quad \text{where } N = \sum_{i=1}^{n} f_i$
In particular, $\mu_0' = 1$, $\mu_1' = \bar{x} - A$, $\mu_2' = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2$.

Moments about zero or origin: Let $x_1, x_2, \ldots, x_n$ be the n observations in a data set with corresponding frequencies $f_1, f_2, \ldots, f_n$ respectively. Then the rth moment about the origin, written here as $\nu_r$, is defined as
$\nu_r = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i^r, \quad r = 0, 1, 2, \ldots, \quad \text{where } N = \sum_{i=1}^{n} f_i$
In particular, $\nu_0 = 1$, $\nu_1 = \bar{x}$, $\nu_2 = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i^2$.

Relation between $\mu_r$ and $\mu_r'$:
$\mu_r = \mu_r' - {}^rC_1\, \mu_{r-1}'\, \mu_1' + {}^rC_2\, \mu_{r-2}'\, \mu_1'^2 - \ldots + (-1)^r \mu_1'^r$
In particular,
$\mu_2 = \mu_2' - \mu_1'^2$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4$

Relation between $\nu_r$ and $\mu_r'$:
$\nu_r = \mu_r' + {}^rC_1\, \mu_{r-1}'\, A + {}^rC_2\, \mu_{r-2}'\, A^2 + \ldots + A^r$
In particular,
$\nu_1 = \mu_1' + A = \bar{x}$
$\nu_2 = \mu_2' + 2\mu_1' A + A^2$

Relation between $\nu_r$ and $\mu_r$:
$\nu_r = \mu_r + {}^rC_1\, \mu_{r-1}\, \bar{x} + {}^rC_2\, \mu_{r-2}\, \bar{x}^2 + \ldots + \bar{x}^r$
In particular,
$\nu_1 = \bar{x}$
$\nu_2 = \mu_2 + \bar{x}^2$
$\nu_3 = \mu_3 + 3\mu_2 \bar{x} + \bar{x}^3$
$\nu_4 = \mu_4 + 4\mu_3 \bar{x} + 6\mu_2 \bar{x}^2 + \bar{x}^4$

Example: Calculate the first four moments about the mean from the following distribution:
X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
F | 1 | 8 | 28 | 56 | 70 | 56 | 28 | 8 | 1
Solution:
X | Frequency (f) | fX | X - 4 | f(X - 4) | f(X - 4)^2 | f(X - 4)^3 | f(X - 4)^4
0 | 1 | 0 | -4 | -4 | 16 | -64 | 256
1 | 8 | 8 | -3 | -24 | 72 | -216 | 648
2 | 28 | 56 | -2 | -56 | 112 | -224 | 448
3 | 56 | 168 | -1 | -56 | 56 | -56 | 56
4 | 70 | 280 | 0 | 0 | 0 | 0 | 0
5 | 56 | 280 | 1 | 56 | 56 | 56 | 56
6 | 28 | 168 | 2 | 56 | 112 | 224 | 448
7 | 8 | 56 | 3 | 24 | 72 | 216 | 648
8 | 1 | 8 | 4 | 4 | 16 | 64 | 256
Total | 256 | 1024 | | 0 | 512 | 0 | 2816
Mean $= \bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{1024}{256} = 4$
$\mu_1 = \dfrac{\sum f(X - \bar{X})}{\sum f} = 0$
$\mu_2 = \dfrac{\sum f(X - \bar{X})^2}{\sum f} = \dfrac{512}{256} = 2$
$\mu_3 = \dfrac{\sum f(X - \bar{X})^3}{\sum f} = 0$
$\mu_4 = \dfrac{\sum f(X - \bar{X})^4}{\sum f} = \dfrac{2816}{256} = 11$

Example: The first three moments of a distribution about the value 2 of the variable are 1, 16 and -40. Show that the mean is 3, the variance is 15 and the third moment about the mean is -86.
Also show that the first three moments about the origin are 3, 24 and 76.
Solution: Given that A = 2, $\mu_1' = 1$, $\mu_2' = 16$, $\mu_3' = -40$.
Since $\mu_1' = \bar{x} - A$, the mean is $\bar{x} = \mu_1' + A = 1 + 2 = 3$.
$\mu_2 = \mu_2' - \mu_1'^2 = 16 - 1 = 15$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = -40 - 3(16)(1) + 2(1)^3 = -86$
The moments about the origin are:
first moment $= \bar{x} = 3$
second moment $= \mu_2 + \bar{x}^2 = 15 + 9 = 24$
third moment $= \mu_3 + 3\mu_2\bar{x} + \bar{x}^3 = -86 + 135 + 27 = 76$

Skewness: Skewness means lack of symmetry. A frequency distribution that is not symmetrical is called asymmetrical or skewed. In a skewed distribution, the extreme values in the data set lie towards one side of the distribution. When the extreme values lie towards the upper or right tail, the distribution is positively skewed. When the extreme values lie towards the lower or left tail, the distribution is negatively skewed. The basic purpose of measuring skewness is to estimate the extent to which a distribution departs from a perfectly symmetrical distribution.
Symmetrical distribution: Mean = Median = Mode
Positively skewed distribution: Mean > Median > Mode
Negatively skewed distribution: Mean < Median < Mode

Measure of Skewness: The measures of the degree of skewness in a distribution can be classified as follows:
a) Absolute measure of skewness
b) Relative measure of skewness

Absolute measure of skewness: Skewness can be measured in absolute terms by finding the difference between the mean and the mode, or between the mean and the median, or from the quartiles.
Skewness = Mean - Mode
Skewness = Mean - Median
Skewness = Q3 + Q1 - 2 Median

Relative measure of skewness: The relative measure of skewness, known as the coefficient of skewness, is obtained by dividing the absolute measure of skewness by a measure of dispersion. The relative measures of skewness are:
1. Karl Pearson’s coefficient of skewness
2. Bowley’s coefficient of skewness
3. Kelly’s coefficient of skewness
4. Method of moments

1.
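Karl Pearson’s coefficient of skewness: Karl Pearson’s coefficient of skewness is based on the difference between the mean and the mode. Using the empirical relation Mean - Mode ≈ 3(Mean - Median), it is given by
$Sk_p = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$

2. Bowley’s coefficient of skewness: This method is based on the fact that in a symmetrical distribution the quartiles are equidistant from the median.
$Sk_B = \dfrac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1}$

3. Kelly’s coefficient of skewness: Kelly’s coefficient of skewness is based on percentiles and deciles.
$Sk_k = \dfrac{P_{10} + P_{90} - 2\,\text{Median}}{P_{90} - P_{10}} = \dfrac{D_1 + D_9 - 2\,\text{Median}}{D_9 - D_1}$

4. Method of moments: The coefficient of skewness based on moments is denoted by $Sk_M$.
$Sk_M = \gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}}$

Example: Calculate Karl Pearson’s coefficient of skewness from the following:
Marks above | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80
No. of students | 150 | 140 | 100 | 80 | 80 | 70 | 30 | 14 | 0
Solution:
Class | Frequency (f) | Cumulative frequency | Mid point (x) | d = (x - 45)/10 | fd | fd^2
0-10 | 10 | 10 | 5 | -4 | -40 | 160
10-20 | 40 | 50 | 15 | -3 | -120 | 360
20-30 | 20 | 70 | 25 | -2 | -40 | 80
30-40 | 0 | 70 | 35 | -1 | 0 | 0
40-50 | 10 | 80 | 45 | 0 | 0 | 0
50-60 | 40 | 120 | 55 | 1 | 40 | 40
60-70 | 16 | 136 | 65 | 2 | 32 | 64
70-80 | 14 | 150 | 75 | 3 | 42 | 126
Total | 150 | | | | -86 | 830
$\left(\dfrac{N}{2}\right)^{th}$ term = 75th term, which lies in 40-50.
Median $= l + \dfrac{\frac{N}{2} - C}{f} \times h = 40 + \dfrac{75 - 70}{10} \times 10 = 45$
Mean $= A + \dfrac{\sum fd}{\sum f} \times h = 45 + \dfrac{-86}{150} \times 10 = 39.27$
Standard deviation $= h\sqrt{\dfrac{\sum fd^2}{\sum f} - \left(\dfrac{\sum fd}{\sum f}\right)^2} = 10\sqrt{\dfrac{830}{150} - \left(\dfrac{-86}{150}\right)^2} = 22.81$
Coefficient of skewness $Sk_p = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \dfrac{3(39.27 - 45)}{22.81} = -0.75$

Example: From the following distribution, calculate the first four moments about the mean and the coefficient of skewness based on moments:
Income (Rs.) | 0-10 | 10-20 | 20-30 | 30-40
Frequency | 1 | 3 | 4 | 2
Solution:
Class | Frequency (f) | Mid point (X) | fX | X - 22 | f(X - 22) | f(X - 22)^2 | f(X - 22)^3 | f(X - 22)^4
0-10 | 1 | 5 | 5 | -17 | -17 | 289 | -4913 | 83521
10-20 | 3 | 15 | 45 | -7 | -21 | 147 | -1029 | 7203
20-30 | 4 | 25 | 100 | 3 | 12 | 36 | 108 | 324
30-40 | 2 | 35 | 70 | 13 | 26 | 338 | 4394 | 57122
Total | 10 | | 220 | | 0 | 810 | -1440 | 148170
Mean $= \bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{220}{10} = 22$
$\mu_1 = \dfrac{\sum f(X - \bar{X})}{\sum f} = 0$
$\mu_2 = \dfrac{\sum f(X - \bar{X})^2}{\sum f} = \dfrac{810}{10} = 81$
$\mu_3 = \dfrac{\sum f(X - \bar{X})^3}{\sum f} = \dfrac{-1440}{10} = -144$
$\mu_4 = \dfrac{\sum f(X - \bar{X})^4}{\sum f} = \dfrac{148170}{10} = 14817$
Coefficient of skewness $Sk_M = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{-144}{81^{3/2}} = -\dfrac{16}{81}$

Example: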
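Calculate Bowley’s coefficient of skewness from the following:
Wages (Rs.) | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | 80-90 | 90-100
No. of persons | 1 | 3 | 11 | 21 | 43 | 32 | 9
Solution:
Wages (Rs.) | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | 80-90 | 90-100
No. of persons | 1 | 3 | 11 | 21 | 43 | 32 | 9
Cumulative frequency | 1 | 4 | 15 | 36 | 79 | 111 | 120
$Q_1 = \left(\dfrac{N}{4}\right)^{th}$ term = 30th term, which lies in 60-70.
$Q_1 = l_1 + \dfrac{\frac{N}{4} - C}{f} \times h = 60 + \dfrac{30 - 15}{21} \times 10 = 67.14 \approx 67$
$Q_3 = 3\left(\dfrac{N}{4}\right)^{th}$ term = 90th term, which lies in 80-90.
$Q_3 = l_1 + \dfrac{\frac{3N}{4} - C}{f} \times h = 80 + \dfrac{90 - 79}{32} \times 10 = 83.44 \approx 83$
Median $= \left(\dfrac{N}{2}\right)^{th}$ term = 60th term, which lies in 70-80.
Median $= l + \dfrac{\frac{N}{2} - C}{f} \times h = 70 + \dfrac{60 - 36}{43} \times 10 = 75.58 \approx 76$
Coefficient of skewness $Sk_B = \dfrac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1} = \dfrac{67 + 83 - 2 \times 76}{83 - 67} = -0.125$

Kurtosis: The measure of kurtosis describes the degree of concentration of the observed frequencies in a given data set. Kurtosis is used to test how closely a frequency distribution conforms to the normal curve; it is the degree of peakedness of a distribution, usually taken relative to a normal distribution.

Measures of Kurtosis: Karl Pearson’s coefficient of kurtosis is defined as
$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
The kurtosis of a distribution is also measured by
$\gamma_2 = \beta_2 - 3$
If $\gamma_2 > 0$, the distribution is leptokurtic.
If $\gamma_2 < 0$, the distribution is platykurtic.
If $\gamma_2 = 0$, the distribution is mesokurtic.

Example: Calculate the coefficients of skewness and kurtosis from the following data:
Profit (Rs. in lakh) | 10-20 | 20-30 | 30-40 | 40-50 | 50-60
No. of companies | 18 | 20 | 30 | 22 | 10
Solution:
Class interval | Frequency (f) | Mid point (x) | d = (x - 35)/10 | fd | fd^2 | fd^3 | fd^4
10-20 | 18 | 15 | -2 | -36 | 72 | -144 | 288
20-30 | 20 | 25 | -1 | -20 | 20 | -20 | 20
30-40 | 30 | 35 | 0 | 0 | 0 | 0 | 0
40-50 | 22 | 45 | 1 | 22 | 22 | 22 | 22
50-60 | 10 | 55 | 2 | 20 | 40 | 80 | 160
Total | 100 | | | -14 | 154 | -62 | 490
$\mu_1' = \dfrac{\sum fd}{\sum f} \times h = \dfrac{-14}{100} \times 10 = -1.4$
$\mu_2' = \dfrac{\sum fd^2}{\sum f} \times h^2 = \dfrac{154}{100} \times 100 = 154$
$\mu_3' = \dfrac{\sum fd^3}{\sum f} \times h^3 = \dfrac{-62}{100} \times 10^3 = -620$
$\mu_4' = \dfrac{\sum fd^4}{\sum f} \times h^4 = \dfrac{490}{100} \times 10^4 = 49000$
$\mu_2 = \mu_2' - \mu_1'^2 = 152.04$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = 21.312$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 = 47327.52$
Coefficient of skewness $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{21.312}{(152.04)^{3/2}} = 0.0114$
Coefficient of kurtosis $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{47327.52}{(152.04)^2} = 2.0474$

Example: Prove that the frequency distribution curve of the following frequency distribution is leptokurtic:
Class | 10-15 | 15-20 | 20-25 | 25-30 | 30-35 | 35-40 | 40-45 | 45-50 | 50-55
Frequency | 1 | 4 | 8 | 19 | 35 | 20 | 7 | 5 | 1
Solution:
Class interval | Frequency (f) | Mid point (x) | d = (x - 32.5)/5 | fd | fd^2 | fd^3 | fd^4
10-15 | 1 | 12.5 | -4 | -4 | 16 | -64 | 256
15-20 | 4 | 17.5 | -3 | -12 | 36 | -108 | 324
20-25 | 8 | 22.5 | -2 | -16 | 32 | -64 | 128
25-30 | 19 | 27.5 | -1 | -19 | 19 | -19 | 19
30-35 | 35 | 32.5 | 0 | 0 | 0 | 0 | 0
35-40 | 20 | 37.5 | 1 | 20 | 20 | 20 | 20
40-45 | 7 | 42.5 | 2 | 14 | 28 | 56 | 112
45-50 | 5 | 47.5 | 3 | 15 | 45 | 135 | 405
50-55 | 1 | 52.5 | 4 | 4 | 16 | 64 | 256
Total | 120 | | | 2 | 212 | 20 | 1520
$\mu_1' = \dfrac{\sum fd}{\sum f} \times h = \dfrac{2}{120} \times 5 = \dfrac{1}{12}$
$\mu_2' = \dfrac{\sum fd^2}{\sum f} \times h^2 = \dfrac{212}{120} \times 5^2 = \dfrac{265}{6}$
$\mu_3' = \dfrac{\sum fd^3}{\sum f} \times h^3 = \dfrac{20}{120} \times 5^3 = \dfrac{125}{6}$
$\mu_4' = \dfrac{\sum fd^4}{\sum f} \times h^4 = \dfrac{1520}{120} \times 5^4 = \dfrac{23750}{3}$
$\mu_2 = \mu_2' - \mu_1'^2 = \dfrac{6359}{144}$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 = \dfrac{54684719}{6912}$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{54684719/6912}{(6359/144)^2} = 4.057 > 3$
Since $\beta_2 > 3$, the curve is leptokurtic.

Example: The first four moments about the working mean 28.5 of a distribution are 0.294, 7.144, 42.409 and 454.98. Calculate the moments about the mean. Also evaluate β1 and β2 and comment upon the skewness and kurtosis of the distribution.
Solution: Given that A = 28.5, $\mu_1' = 0.294$, $\mu_2' = 7.144$, $\mu_3' = 42.409$, $\mu_4' = 454.98$.
$\mu_2 = \mu_2' - \mu_1'^2 = 7.057564$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = 36.16$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 = 408.79$
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{(36.16)^2}{(7.058)^3} = 3.719$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{408.79}{(7.058)^2} = 8.206 > 3$
Since $\mu_3 > 0$, the distribution is positively skewed, and since $\beta_2 > 3$, it is leptokurtic.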
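The conversion from moments about a working mean to moments about the mean, used in the last few examples, can be sketched in Python as follows. The function name `central_from_raw` is a hypothetical helper introduced for this illustration; it applies the relations μ2 = μ2' - μ1'², μ3 = μ3' - 3μ2'μ1' + 2μ1'³ and μ4 = μ4' - 4μ3'μ1' + 6μ2'μ1'² - 3μ1'⁴ to the raw moments of the final example.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert the first four raw moments (about a point A) to central moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

# Raw moments about the working mean 28.5 from the final example.
mu2, mu3, mu4 = central_from_raw(0.294, 7.144, 42.409, 454.98)

beta1 = mu3 ** 2 / mu2 ** 3   # measure of skewness
beta2 = mu4 / mu2 ** 2        # measure of kurtosis
gamma2 = beta2 - 3            # positive for a leptokurtic distribution

print(round(mu2, 4), round(mu3, 2), round(mu4, 2))
print(round(beta1, 3), round(beta2, 3), round(gamma2, 3))
```

Working with unrounded intermediate values may change the last decimal place of β2 compared with a hand calculation that rounds μ2 first.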