GROUPED DATA

SUBTOPIC
8.3 : Measures of Location
8.4 : Measures of Dispersion

LEARNING OUTCOMES
8.3(b) Find and interpret the mean, mode, median, quartiles and percentiles for grouped data
8.3(c) Describe the symmetry and skewness of a data distribution
8.4(b) Find and interpret the variance, standard deviation and coefficient of variation for grouped data

Sketch of Median, Quartiles, Interquartile Range, Deciles and Percentiles from an Ogive

[Figure: ogive of cumulative frequency against class boundaries. The readings X1, X2 and X3 on the class-boundary axis correspond to P25 = Q1 = D2.5, Median = P50 = Q2 = D5, and P75 = Q3 = D7.5 respectively.]

Example 1
Using the ogive drawn below, determine the
(a) Median
(b) First quartile
(c) Third decile
(d) Seventieth percentile

[Figure: ogive; horizontal axis marked 5, 10, 15, 20, 25, 30, 35, 40.]

Solution
The total frequency is 60, so each measure is located by its position among the 60 observations.
(a) Median: 60/2 = 30th observation. From the ogive, the median = 20.
(b) First quartile: 60/4 = 15th observation. From the ogive, the first quartile = 12.5.
(c) Third decile: 3/10 x 60 = 18th observation. From the ogive, the third decile = 14.
(d) Seventieth percentile: 70/100 x 60 = 42nd observation. From the ogive, the seventieth percentile = 24.5.

[Figure: the same ogive with the first quartile (12.5), third decile (14), median (20) and seventieth percentile (24.5) marked on the horizontal axis.]

Shape of data distribution
Symmetry and Skewness
• The general shape of a data distribution can be determined from the mean, median and mode, as illustrated by a histogram or frequency curve.
• For a strongly skewed distribution, the median is the more appropriate measure of central tendency.
• For a symmetrical or almost symmetrical distribution, the mean is the appropriate measure of central tendency.
• Three important shapes:
  i. Symmetrical
  ii. Positively skewed or right-skewed distribution
  iii. Negatively skewed or left-skewed distribution

(i) Symmetrical
~ The values of the mean, median and mode are identical.
~ They lie at the center.
[Figure: SYMMETRICAL frequency curve (frequency against variable); Mean = Median = Mode at the center.]

A set of observations is symmetrically distributed if its graphical representation (histogram, bar chart) is symmetric with respect to a vertical axis passing through the mean. For a symmetrically distributed population or sample, the mean, median and mode have the same value. Half of all measurements are greater than the mean, while half are less than the mean.

(ii) Positively skewed or skewed to the right
~ The value of the mean is the largest.
~ The mode is the smallest.
~ The median lies between these two values.

[Figure: POSITIVELY SKEWED frequency curve (frequency against variable); Mean > Median > Mode.]

A set of observations that is not symmetrically distributed is said to be skewed. It is positively skewed if a greater proportion of the observations are less than or equal to (as opposed to greater than or equal to) the mean; this indicates that the mean is larger than the median. The histogram of a positively skewed distribution will generally have a long right tail; thus, this distribution is also known as being skewed to the right.

(iii) Negatively skewed or skewed to the left
~ The value of the mean is the smallest.
~ The mode is the largest.
~ The median lies between these two values.

[Figure: NEGATIVELY SKEWED frequency curve (frequency against variable); Mean < Median < Mode.]

A negatively skewed distribution has more observations that are greater than or equal to the mean. Such a distribution has a mean that is less than the median. The histogram of a negatively skewed distribution will generally have a long left tail; thus, the phrase skewed to the left is applied here.
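The three orderings of mean, median and mode described above can be turned into a small check. The following is only a minimal sketch: the function name `describe_shape`, the tolerance `tol` and the sample values are hypothetical and are not part of the original notes. Real data may not satisfy any of the three orderings exactly, in which case the sketch reports the result as inconclusive.

```python
def describe_shape(mean, median, mode, tol=1e-9):
    """Classify a distribution's shape from its mean, median and mode.

    Hypothetical helper: applies the ordering rules from the notes.
    `tol` is an assumed tolerance for treating two values as equal.
    """
    if abs(mean - median) <= tol and abs(median - mode) <= tol:
        return "symmetrical"            # Mean = Median = Mode
    if mean > median > mode:
        return "positively skewed"      # Mean > Median > Mode
    if mean < median < mode:
        return "negatively skewed"      # Mean < Median < Mode
    return "inconclusive"               # none of the three orderings holds


# Illustrative (made-up) values, one per case.
print(describe_shape(10, 10, 10))   # symmetrical
print(describe_shape(12, 10, 8))    # positively skewed
print(describe_shape(8, 10, 12))    # negatively skewed
```

The ordering rule is a rule of thumb for reading a histogram or frequency curve, so the sketch is best treated as a quick check rather than a formal test of skewness.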
RANGE
Range = upper boundary of the last class - lower boundary of the first class

INTERQUARTILE RANGE
• Defined as the difference between the third quartile and the first quartile
Interquartile range = Q3 - Q1

VARIANCE AND STANDARD DEVIATION
Variance: $S^2 = \dfrac{\sum fx^2 - \dfrac{(\sum fx)^2}{\sum f}}{\sum f - 1}$
Standard deviation: $S = \sqrt{S^2}$

Example 2: Find the range, variance and standard deviation.

Class intervals   Frequency f   Class mark x   fx     fx^2
1-3               5             2              10     20
4-6               3             5              15     75
7-9               2             8              16     128
10-12             1             11             11     121
13-15             6             14             84     1176
16-18             4             17             68     1156
Total             21                           204    2676

Solution:
Range = upper boundary of the last class - lower boundary of the first class = 18.5 - 0.5 = 18

$S^2 = \dfrac{\sum fx^2 - \dfrac{(\sum fx)^2}{\sum f}}{\sum f - 1} = \dfrac{2676 - \dfrac{204^2}{21}}{20} = 34.71$

$S = \sqrt{34.71} = 5.892$

REMARK
Sometimes we would like to compare the variability of two data sets that have different units of measurement. Standard deviation is not suitable, since it is a measure of absolute variability and not of relative variability. The most appropriate measure is the coefficient of variation (CV), which expresses the standard deviation as a percentage of the mean.

Coefficient of variation: $CV = \dfrac{\text{standard deviation}}{\text{mean}} \times 100\%$

• Note: A larger coefficient of variation means that the data are more dispersed and less consistent.

Example: Suppose we want to compare two production processes that fill containers with products.
• Process A is filling fertilizer bags, which have a nominal weight of 80 pounds.
• For process A: $\bar{x} = 80.2$ pounds, $s = 1.2$ pounds
• Process B is filling cornflakes boxes, which have a nominal weight of 24 ounces.
• For process B: $\bar{x} = 24.6$ ounces, $s = 0.4$ ounces

For process A, $CV = \dfrac{1.2}{80.2} \times 100\% = 1.50\%$
For process B, $CV = \dfrac{0.4}{24.6} \times 100\% = 1.63\%$

Is process A much more variable than process B because 1.2 is three times larger than 0.4? No, because the two processes have very similar variability relative to the size of their means.
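The arithmetic in Example 2 and in the coefficient of variation comparison can be checked with a short script. This is a sketch under stated assumptions: the function names `grouped_variance` and `coefficient_of_variation` are hypothetical, and the variance uses the sample form with divisor $\sum f - 1$, matching the formula in these notes.

```python
import math


def grouped_variance(class_marks, freqs):
    """Sample variance for grouped data (hypothetical helper).

    Uses S^2 = (sum(f*x^2) - (sum(f*x))^2 / sum(f)) / (sum(f) - 1).
    """
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(class_marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean (hypothetical helper)."""
    return std_dev / mean * 100


# Example 2: class marks and frequencies from the table.
class_marks = [2, 5, 8, 11, 14, 17]
freqs = [5, 3, 2, 1, 6, 4]

variance = grouped_variance(class_marks, freqs)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 3))   # 34.71 5.892

# Coefficient of variation for the two filling processes.
cv_a = coefficient_of_variation(1.2, 80.2)     # process A, pounds
cv_b = coefficient_of_variation(0.4, 24.6)     # process B, ounces
print(round(cv_a, 2), round(cv_b, 2))          # 1.5 1.63
```

Because the coefficient of variation is a ratio, the units cancel, which is why the pounds-based and ounces-based processes can be compared directly.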