CHAPTER 3: DESCRIPTIVE STATISTICS: NUMERICAL MEASURES

3.1 Measures of Central Tendency

• A measure of central tendency gives the center of a histogram or a frequency distribution curve.

3.1.1 Different measures of central tendency

i. Mean

• The mean of a sample is the sum of the measurements divided by the number of measurements in the set. The mean is denoted by $\bar{x}$.

Mean = Sum of all values / Number of values

• The mean $\bar{x}$ can be obtained as follows.

For ungrouped data, the mean is defined by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad i = 1, 2, \ldots, n$$

For grouped data, the mean is defined by

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{\sum f}$$

where f = class frequency; x = class mark (midpoint).

Example 3.1:

| MLB Team | 2002 Total Payroll (millions of dollars) |
|---|---|
| Anaheim Angels | 62 |
| Atlanta Braves | 93 |
| New York Yankees | 126 |
| St. Louis Cardinals | 75 |
| Tampa Bay Devil Rays | 34 |
| Total | 390 |

Table 3.1

The sample mean of the total payroll (raw data) is

$$\bar{x} = \frac{\sum x}{n} = \frac{390}{5} = 78 \text{ (million dollars)}$$

Example 3.2: The sample mean for Table 3.2

| CGPA (Class) | Frequency, f | Class Mark, x | fx |
|---|---|---|---|
| 2.50 - 2.75 | 2 | 2.625 | 5.250 |
| 2.75 - 3.00 | 10 | 2.875 | 28.750 |
| 3.00 - 3.25 | 15 | 3.125 | 46.875 |
| 3.25 - 3.50 | 13 | 3.375 | 43.875 |
| 3.50 - 3.75 | 7 | 3.625 | 25.375 |
| 3.75 - 4.00 | 3 | 3.875 | 11.625 |
| Total | 50 | | 161.750 |

Table 3.2

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{161.75}{50} = 3.235$$

ii. Median

• The median is the middle value of a set of observations arranged in order of magnitude and is usually denoted by $\tilde{x}$.

1. The median for ungrouped data.
- The median depends on the number of observations in the data, n.
- If n is odd, then the median is the $\left(\frac{n+1}{2}\right)$th observation of the ordered observations.
- If n is even, then the median is the arithmetic mean of the $\left(\frac{n}{2}\right)$th observation and the $\left(\frac{n}{2} + 1\right)$th observation.

2. The median of grouped data / frequency distribution.
The median of a frequency distribution is defined by:

$$\tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right)$$

where
• L = the lower class boundary of the median class;
• c = the size of the median class interval;
• $F_{j-1}$ = the sum of frequencies of all classes lower than the median class; and
• $f_j$ = the frequency of the median class.

Example 3.3 for ungrouped data:
The median of the data 4, 6, 3, 1, 2, 5, 7, 3 is 3.5.
- Rearranged in order of magnitude, the data become 1, 2, 3, 3, 4, 5, 6, 7. As n = 8 (even), the median is the mean of the 4th and 5th observations, that is (3 + 4)/2 = 3.5.

Example 3.4 for grouped data:

$$\tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right) = 3.217$$

Proof:

| CGPA (Class) | Frequency | Cumulative Frequency |
|---|---|---|
| 2.50 - 2.75 | 2 | 2 |
| 2.75 - 3.00 | 10 | 12 |
| 3.00 - 3.25 | 15 | 27 (median class) |
| 3.25 - 3.50 | 13 | 40 |
| 3.50 - 3.75 | 7 | 47 |
| 3.75 - 4.00 | 3 | 50 |
| Total | 50 | |

Table 3.3

$$\text{Median, } \tilde{x} = 3.00 + 0.25\left(\frac{25 - 12}{15}\right) = 3.217$$

iii. Mode

• The mode of a set of observations is the observation with the highest frequency and is usually denoted by $\hat{x}$. The mode can also be used to describe qualitative data.

1. Mode of ungrouped data:
- Defined as the value that occurs most frequently.
- The mode has the advantage that it is easy to calculate and is not affected by extreme values.
- However, the mode may not exist, and even if it does exist, it may not be unique.

*Note:
• If two measurements in a data set share the highest frequency, both are taken as modes and the data are called bimodal.
• If more than two measurements share the highest frequency, the data are treated as having no mode.

2. The mode for grouped data / frequency distribution data.
- When data have been grouped into classes and a frequency curve is drawn to fit the data, the mode is the value of x corresponding to the maximum point on the curve.
- Determining the mode using a formula:
$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$

where
L = the lower class boundary of the modal class;
c = the size of the modal class interval;
$\Delta_1$ = the difference between the modal class frequency and the frequency of the class before it; and
$\Delta_2$ = the difference between the modal class frequency and the frequency of the class after it.

*Note:
- The class which has the highest frequency is called the modal class.

Example 3.5 for ungrouped data:
The mode for the observations 4, 6, 3, 1, 2, 5, 7, 3 is 3.

Example 3.6 for grouped data based on Table 3.4:

$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 3.179$$

Proof:

| CGPA (Class) | Frequency |
|---|---|
| 2.50 - 2.75 | 2 |
| 2.75 - 3.00 | 10 |
| 3.00 - 3.25 | 15 |
| 3.25 - 3.50 | 13 |
| 3.50 - 3.75 | 7 |
| 3.75 - 4.00 | 3 |
| Total | 50 |

Table 3.4

The modal class is 3.00 - 3.25, so $\Delta_1 = 15 - 10 = 5$ and $\Delta_2 = 15 - 13 = 2$:

$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 3.00 + 0.25\left(\frac{5}{5 + 2}\right) = 3.179$$

3.2 Measure of Dispersion

• The measure of dispersion or spread is the degree to which a set of data tends to spread around the average value.
• It shows whether a data set is concentrated around the mean or scattered.
• The common measures of dispersion are variance and standard deviation.
• The standard deviation is the square root of the variance.
• The sample variance is denoted by $s^2$ and the sample standard deviation is denoted by $s$.

3.2.1 Range

• The range is the simplest measure of dispersion to calculate.

Range = Largest value – Smallest value

Example 3.7:
The table below gives the total areas in square miles of the four western South Central states of the United States.

| State | Total Area (square miles) |
|---|---|
| Arkansas | 53,182 |
| Louisiana | 49,651 |
| Oklahoma | 69,903 |
| Texas | 267,277 |

Table 3.4

Range = Largest value – Smallest value = 267,277 – 49,651 = 217,626 square miles.

3.2.2 Variance

i. Variance for ungrouped data
- The variance of a sample (also known as mean square) for the raw (ungrouped) data is denoted by $s^2$ and defined by:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

ii. Variance for grouped data
- The variance for the frequency distribution is defined by:

$$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f - 1} = \frac{\sum fx^2 - n\bar{x}^2}{n - 1}$$

Example 3.8 for ungrouped data:
The variance for the Students' CGPA for Data 1 is 0.105.
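The ungrouped measures defined so far (mean, median, mode, range and sample variance) can be checked with Python's standard `statistics` module. This is a minimal sketch, not part of the original notes: it reuses the observations from Examples 3.3 and 3.5 and the payroll figures from Example 3.1, because Data 1 is not reproduced here, and all variable names are illustrative.

```python
import statistics

# Observations from Examples 3.3 and 3.5
data = [4, 6, 3, 1, 2, 5, 7, 3]

mean = statistics.mean(data)          # sum of all values / number of values
median = statistics.median(data)      # n is even: mean of the 4th and 5th ordered values
mode = statistics.mode(data)          # value with the highest frequency
data_range = max(data) - min(data)    # largest value - smallest value
variance = statistics.variance(data)  # sample variance, divisor n - 1
std_dev = statistics.stdev(data)      # square root of the sample variance

# Payroll figures (millions of dollars) from Example 3.1
payroll = [62, 93, 126, 75, 34]
payroll_mean = statistics.mean(payroll)

print("mean:", mean, "median:", median, "mode:", mode, "range:", data_range)
print("variance:", round(variance, 4), "std dev:", round(std_dev, 4))
print("payroll mean:", payroll_mean)
```

The median and mode agree with Examples 3.3 and 3.5 (3.5 and 3), and the payroll mean agrees with Example 3.1 (78). Note that `statistics.variance` uses the divisor n - 1, matching the definition of $s^2$ above, whereas `statistics.pvariance` would divide by n.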
Example 3.9 for grouped data:
The variance for the frequency distribution in Table 3.5 is:

$$s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1}$$

| CGPA (Class) | Frequency, f | Class Mark, x | fx | fx² |
|---|---|---|---|---|
| 2.50 - 2.75 | 2 | 2.625 | 5.250 | 13.781 |
| 2.75 - 3.00 | 10 | 2.875 | 28.750 | 82.656 |
| 3.00 - 3.25 | 15 | 3.125 | 46.875 | 146.484 |
| 3.25 - 3.50 | 13 | 3.375 | 43.875 | 148.078 |
| 3.50 - 3.75 | 7 | 3.625 | 25.375 | 91.984 |
| 3.75 - 4.00 | 3 | 3.875 | 11.625 | 45.047 |
| Total | 50 | | 161.750 | 528.031 |

Table 3.5

$$s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{528.031 - 50(3.235)^2}{49} = 0.0973$$

3.2.3 Standard Deviation

i. Standard deviation for ungrouped data:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

ii. Standard deviation for grouped data:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f - 1}} = \sqrt{\frac{\sum fx^2 - n\bar{x}^2}{n - 1}}$$

Example 3.10 (based on Example 3.8) for ungrouped data:

$$s = \sqrt{s^2} = \sqrt{0.105} = 0.3240$$

Example 3.11 (based on Example 3.9) for grouped data:

$$s = \sqrt{\frac{\sum fx^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{528.031 - 50(3.235)^2}{49}} = \sqrt{0.0973} = 0.3119$$

3.2.4 Rules of Data Dispersion

• Chebyshev's Theorem
- At least $\left(1 - \frac{1}{k^2}\right)$ of the observations lie within k standard deviations of the mean, where k is any number greater than 1.

Steps:
1) Determine the interval $\bar{x} \pm ks$.
2) Find the value of $\left(1 - \frac{1}{k^2}\right)$.
3) Change the value in step 2 to a percent.
4) Write the statement: at least the percent of data found in step 3 is in the interval found in step 1.

- Example 3.12: For k = 2,

$$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 0.75$$

Hence, according to Chebyshev's Theorem, at least 75% of the values of a data set lie within two standard deviations of the mean.

• Empirical Rule
- For data that are normally distributed, approximately
i. 68% of the observations lie in the interval $(\bar{x} - s, \bar{x} + s)$
ii. 95% of the observations lie in the interval $(\bar{x} - 2s, \bar{x} + 2s)$
iii. 99.7% of the observations lie in the interval $(\bar{x} - 3s, \bar{x} + 3s)$
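The grouped-data results in Examples 3.2, 3.4, 3.6, 3.9 and 3.11, together with the Chebyshev bound in Example 3.12, can be reproduced with a short Python sketch. It is an illustration only and not part of the original notes: the names `classes` and `chebyshev_bound` are hypothetical, and treating a missing neighbouring class as having frequency 0 in the mode formula is an assumption the notes do not state.

```python
import math

# CGPA frequency distribution from Tables 3.2 to 3.5:
# (lower class boundary, upper class boundary, frequency)
classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]

freqs = [f for _, _, f in classes]
marks = [(lo + hi) / 2 for lo, hi, _ in classes]  # class marks (midpoints)
n = sum(freqs)

# Grouped mean: sum(f * x) / sum(f)
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Grouped median: L + c * (n/2 - F) / f_j, where the median class is the
# first class whose cumulative frequency reaches n/2
cumulative = 0
for lo, hi, f in classes:
    if cumulative + f >= n / 2:
        median = lo + (hi - lo) * (n / 2 - cumulative) / f
        break
    cumulative += f

# Grouped mode: L + c * d1 / (d1 + d2) for the class with the highest frequency
# (assumption: a missing neighbouring class counts as frequency 0)
j = freqs.index(max(freqs))
d1 = freqs[j] - (freqs[j - 1] if j > 0 else 0)
d2 = freqs[j] - (freqs[j + 1] if j < len(freqs) - 1 else 0)
modal_lo, modal_hi, _ = classes[j]
mode = modal_lo + (modal_hi - modal_lo) * d1 / (d1 + d2)

# Grouped variance: (sum(f * x^2) - n * mean^2) / (n - 1)
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
variance = (sum_fx2 - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)


def chebyshev_bound(k):
    """Minimum fraction of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


print("mean:", round(mean, 3), "median:", round(median, 3), "mode:", round(mode, 3))
print("variance:", round(variance, 4), "std dev:", round(std_dev, 4))
print("Chebyshev bound for k = 2:", chebyshev_bound(2))
```

The mean, median, mode and variance round to the values in the examples (3.235, 3.217, 3.179 and 0.0973). The standard deviation computed from the unrounded variance is about 0.312; the 0.3119 in Example 3.11 results from taking the square root of the rounded value 0.0973.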