CHAPTER # 02: PRESENTATION OF DATA

2.1 Introduction
The data collected for a statistical inquiry sometimes consist of a few fairly simple figures that can be easily understood without any special treatment. More often, however, the mass of raw data has no structure and is not easily understood or interpreted. To make the data simple and easily understandable, the first task is to condense them so that irrelevant details are removed and the significant features stand out prominently. The procedures adopted for this purpose are classification and tabulation.

2.2 Classification
The process of arranging data into groups or classes according to some common characteristic is called classification.

2.3 Types of classification

One-way classification
When the data are classified according to one characteristic, the classification is called one-way classification. For example, the blank table given below may be used to show the number of adults in different occupations in a locality.

The number of adults in different occupations in a locality
Occupation        No. of adults


Total

Two-way classification
When the data are classified according to two characteristics, the classification is called two-way classification.

Three-way classification
When the data are classified according to three characteristics, the classification is called three-way classification.

2.4 Tabulation
The systematic arrangement of data into rows and columns is called tabulation. A table can be simple, double, treble, or complex.

2.5 Frequency Distribution
The organization of raw data in the form of a table using classes and frequencies is known as a frequency distribution. Each class is defined by two numbers (a lower and an upper limit), and the frequency, denoted by f, is the number of values that fall in a specified class of the distribution.

2.6 Components of frequency distribution
1. Class limits
2. Class boundaries
3. Class mark
4. Class interval/width

Class limits
The values which describe the classes; the smaller number is the lower class limit and the larger number is the upper class limit. For example, 15-19, 20-24, etc.

Class boundaries
The precise numbers which separate one class from another are called class boundaries. For example, 14.5-19.5, 19.5-24.5, etc.

Class mark/Midpoint
If the sum of the lower and upper boundaries of a class is divided by 2, the value obtained is called the class mark or class midpoint. For example, the class mark of 14.5-19.5 is (14.5 + 19.5)/2 = 17.

Class interval/Width
The difference between the upper and lower class boundaries of a class is called the class interval. It is denoted by h. For example, the class 14.5-19.5 has h = 19.5 - 14.5 = 5.

2.7 Steps for constructing a grouped frequency distribution
A. Decide on the number of classes into which the data are to be classified, using the formula k = 1 + 3.3 log N, where k is the number of classes, N is the total number of observations, and log is the common (base-10) logarithm.
B. Determine the range of variation in the data: Range = largest value - smallest value.
C. Divide the range of variation by the number of classes to determine the class width.
D. Start the first class at, or just below, the smallest value in the data.
E. Distribute the raw data into classes and determine the frequency of each class by listing the actual values or by using tally marks.

Example. The heights of 30 students measured at the time of registration are 91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112. Make a suitable frequency distribution.

Solution.
k = 1 + 3.3 log N = 1 + 3.3 log(30) = 1 + 4.87 = 5.87, rounded up to k = 6
Range = largest value - smallest value = 112 - 87 = 25
h = 25/6 = 4.167, rounded up to h = 5

Class limits   Class boundaries   Midpoint   Tally        Frequency
86-90          85.5-90.5          88         |||| |       6
91-95          90.5-95.5          93         ||||         4
96-100         95.5-100.5         98         |||| ||||    10
101-105        100.5-105.5        103        |||| |       6
106-110        105.5-110.5        108        |||          3
111-115        110.5-115.5        113        |            1
Total                                                     30

2.8 Relative frequency
The frequency of a class divided by the total frequency is called relative frequency.
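The steps of 2.7 and the relative frequency of 2.8 can be checked with a short Python sketch. It is only an illustration under stated assumptions: the variable names are invented for this example, k and h are rounded up as in the worked solution, and the first class is assumed to start at 86 because that is the lower limit the solution uses.

```python
import math

# Heights of the 30 students from the worked example in 2.7.
heights = [91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101,
           96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112]

n = len(heights)

# Step A: number of classes, k = 1 + 3.3 log N (base-10 log), rounded up.
k = math.ceil(1 + 3.3 * math.log10(n))

# Step B: range of variation.
data_range = max(heights) - min(heights)

# Step C: class width, rounded up to a whole number.
h = math.ceil(data_range / k)

# Step D: first lower class limit (assumption: 86, as in the worked solution).
start = 86
class_limits = [(start + i * h, start + i * h + h - 1) for i in range(k)]

# Step E: count the observations falling inside each pair of class limits.
frequencies = [sum(lo <= x <= hi for x in heights) for lo, hi in class_limits]

# Relative frequency: class frequency divided by total frequency.
relative = [f / n for f in frequencies]

for (lo, hi), f, rf in zip(class_limits, frequencies, relative):
    boundaries = f"{lo - 0.5}-{hi + 0.5}"
    midpoint = (lo + hi) / 2
    print(f"{lo}-{hi:<5} {boundaries:<12} {midpoint:<6} {f:<3} {rf:.3f}")
```

Counting with the class limits works here because the heights are whole numbers; for measurements with decimals, the class boundaries would be the safer comparison.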
Class boundaries   f     R.F.
85.5-90.5          6     6/30 = 0.200
90.5-95.5          4     4/30 = 0.133
95.5-100.5         10    10/30 = 0.333
100.5-105.5        6     6/30 = 0.200
105.5-110.5        3     3/30 = 0.100
110.5-115.5        1     1/30 = 0.033
Total              30    1.000

2.9 Cumulative frequency
The frequency obtained by adding each successive frequency to the cumulative total of the frequencies of the preceding classes is known as cumulative frequency.

Class boundaries   f     C.F.
85.5-90.5          6     6
90.5-95.5          4     6 + 4 = 10
95.5-100.5         10    10 + 10 = 20
100.5-105.5        6     20 + 6 = 26
105.5-110.5        3     26 + 3 = 29
110.5-115.5        1     29 + 1 = 30

2.10 Grouped data
Data presented in the form of a frequency distribution are called grouped data.

2.11 Ungrouped data
Data in their original form are known as ungrouped data.

2.12 Graphical representation
The visual display of statistical data in the form of points, lines, areas, and other geometrical forms and symbols is known as graphical representation. It has two major branches: diagrams and graphs.

2.13 Diagrams
A diagram is a one-, two-, or three-dimensional visual representation of data.

2.14 Types of diagrams
Simple bar diagram
Multiple bar diagram
Component bar diagram
Pictograms
Pie diagram

Simple bar diagram
It consists of horizontal or vertical bars of equal width whose lengths are proportional to the values they represent. The values of the variable are taken on the x-axis and the frequencies on the y-axis.

Multiple bar diagram
An extension of the simple bar diagram, used to represent two or more related sets of data in the form of grouped bars.

Component bar diagram
A diagram in which each bar is divided into two or more sections, proportional in size to the component parts of the total represented by that bar.

Pictograms
The representation of data by means of pictures or small symbols. A picture is worth ten thousand words.

Pie/sector diagram
A graphic device consisting of a circle divided into sectors, or pie-shaped pieces, whose areas are proportional to the various parts into which the whole quantity is divided.
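Because the area of a sector is proportional to its central angle, the pie diagram rule amounts to giving each part the same share of 360 degrees that it has of the total. The sketch below illustrates this; using the class frequencies of the height example as the parts of the whole is an assumption made only to have concrete numbers, and the variable names are invented for this example.

```python
# Parts of the whole: class frequencies from the height example (assumed data).
class_boundaries = ["85.5-90.5", "90.5-95.5", "95.5-100.5",
                    "100.5-105.5", "105.5-110.5", "110.5-115.5"]
frequencies = [6, 4, 10, 6, 3, 1]

total = sum(frequencies)

# Central angle of each sector = (part / whole) * 360 degrees.
# Multiplying before dividing keeps the arithmetic exact for these numbers.
angles = [f * 360 / total for f in frequencies]

for label, f, angle in zip(class_boundaries, frequencies, angles):
    print(f"{label:<12} f = {f:<3} angle = {angle:.1f} degrees")

print("Sum of angles:", sum(angles))
```

The angles must add up to 360, which is a quick check that no part of the whole has been left out.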
2.15 Graphs
A graph is the representation of data by a continuous curve, usually drawn on graph paper.

2.16 Types of graphs
Histogram
Frequency polygon
Frequency curve
Cumulative frequency curve

Histogram
The graphical representation of data that gives a visual impression of its distribution is called a histogram. It is constructed from grouped data by taking the class boundaries along the x-axis and the corresponding frequencies along the y-axis.

Historigram
The graph of time series data is called a historigram.

Frequency polygon
A closed geometric figure used to display a frequency distribution graphically is called a frequency polygon. Here the class midpoints are taken along the x-axis and the corresponding frequencies along the y-axis.

Frequency curve
When a frequency polygon or histogram, constructed over class intervals made sufficiently small for a large number of observations, is smoothed, it approaches a continuous curve called a frequency curve.

Cumulative frequency polygon
A graph obtained by plotting the cumulative frequencies of a distribution against the upper or lower class boundaries.

Assignments

Exercise

Q # 01
Construct the frequency table for the following data. Also calculate the relative and cumulative frequencies.
100, 96, 92, 88, 86, 84, 82, 80, 78, 91, 87, 83, 79, 77, 75, 73, 71, 69, 58, 56, 73, 50, 57, 55, 53, 51, 48, 46, 63, 59, 55, 51, 49, 47, 45, 41, 43, 58, 54, 50, 56, 44, 42, 40, 38, 36, 46, 53, 50, 43.

Q # 02
Draw a multiple bar diagram for the following data.

Items       Clothing   House rent   Fuel   Miscellaneous
Family A    600        100          400    100
Family B    800        100          500    300

Q # 03
Draw a frequency polygon for the given data.

Mid values   32   37   42   47   52   57   62   67
Frequency    3    17   28   47   54   31   14   4
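As a companion to the graph definitions in 2.16, the following Python sketch computes the points that would be plotted for a frequency polygon and a cumulative frequency polygon. It uses the height example from 2.7, not the exercise data. Two choices are assumptions rather than part of the text: the polygon is closed by adding one zero-frequency midpoint at each end, and the cumulative frequencies are plotted against the upper class boundaries.

```python
from itertools import accumulate

# Grouped data from the height example in 2.7.
midpoints = [88, 93, 98, 103, 108, 113]
upper_boundaries = [90.5, 95.5, 100.5, 105.5, 110.5, 115.5]
frequencies = [6, 4, 10, 6, 3, 1]
h = 5  # class width

# Frequency polygon: (midpoint, frequency) pairs, closed on the x-axis by an
# extra midpoint with zero frequency on each side (assumed convention).
polygon_points = (
    [(midpoints[0] - h, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + h, 0)]
)

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative frequency polygon: starts at zero at the first lower boundary,
# then one point per upper class boundary.
ogive_points = [(upper_boundaries[0] - h, 0)] + list(zip(upper_boundaries, cumulative))

print("Frequency polygon points:", polygon_points)
print("Cumulative frequency polygon points:", ogive_points)
```

Joining the listed points in order with straight lines gives the two figures; the last cumulative point should always equal the total frequency.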