Chapter 3

◦ Methods for organizing, displaying and describing data using tables, graphs and summary measures
• Raw data is made more manageable
• Raw data is presented in a logical form
• Patterns can be seen from organised data
• Frequency tables
• Graphical techniques
• Measures of central tendency
• Measures of spread (variability)

Organize and display data using tables and graphs
a) presentation of qualitative data
b) presentation of quantitative data

Describe the characteristics of a data set using statistical measures
a) measures of central tendency
b) measures of dispersion
c) measures of skewness
d) box-and-whisker plot
e) population vs sample

Definition: Data recorded in the sequence in which they are collected, before they are processed or ranked, are called raw data.

Ex: Ages of 50 students
21 18 25 22 25 19 20 19 28 23
24 19 31 21 18 25 22 19 20 37
29 19 23 22 27 34 19 18 22 23
26 25 23 21 21 27 22 19 20 25
37 25 23 19 21 33 23 26 21 24

Qualitative Data
Data that cannot be measured but can be classified into different categories.
Example: gender, status of a student, nationality, race

Quantitative Data
Data that can be measured numerically.
Example: income, height, gross sales, prices of homes, number of cars owned, number of accidents

a) Organizing qualitative data
◦ (i) Frequency distributions
◦ (ii) Relative frequency and percentage distributions
b) Graphing qualitative data
◦ (i) Bar graphs
◦ (ii) Pie charts

A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.

A sample of 20 employees from large companies was selected, and these employees were asked how stressful their jobs were. In the recorded responses, "very" means very stressful, "somewhat" means somewhat stressful and "none" means not stressful at all.
somewhat  none      somewhat  very      none
very      somewhat  none      somewhat  somewhat
very      somewhat  very      somewhat  somewhat
somewhat  very      very      none      very

Frequency Distribution of Stress on Job

Stress on job   Tally        Frequency (f)
Very            |||| ||      7
Somewhat        |||| ||||    9
None            ||||         4
                             Sum = 20

$\text{Relative frequency of a category} = \dfrac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$

$\text{Percentage} = (\text{Relative frequency}) \times 100$

Relative frequency and percentage distributions of stress on job

Stress on job   Relative frequency   Percentage (%)
Very            7/20 = 0.35          0.35(100) = 35
Somewhat        9/20 = 0.45          0.45(100) = 45
None            4/20 = 0.20          0.20(100) = 20
                Sum = 1.00           Sum = 100%

A bar graph is a graph made of bars, with the categories on the horizontal axis and the frequencies (or relative frequencies) on the vertical axis.
[Figure: bar graph; horizontal axis: heart, cancer, stroke, CLRD, accident; vertical axis: 0 to 60]

A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories is called a pie chart.
[Figure: pie chart with the categories heart, cancer, stroke, CLRD, accident]

Numerical Data
1. Array
2. Types of quantitative data
3. Frequency distributions
   a. Histogram
   b. Polygon
   c. Ogive
   d. Stem & Leaf

1. Array
Organizes data to focus on major features
i. Ascending. Example: 1, 2, 3, 4, 5, ...
ii. Descending. Example: 10, 9, 8, 7, 6, ...
iii. Range (difference between the largest and smallest values)
Example: the largest height is 74 inches and the smallest height is 60 inches, so the range is 74 - 60 = 14 inches.

A data array makes it possible to:
• Quickly notice the lowest and highest values in the data
• Easily divide the data into sections
• Easily see values that occur frequently
• Observe variability in the data

Raw Data: Yards Produced by 30 Carpet Looms (ungrouped data)
16.2 15.4 16.0 16.6 15.9 15.8 16.0 16.8 16.9 16.8
15.7 16.4 15.2 15.8 15.9 16.1 15.6 15.9 15.6 16.0
16.4 15.8 15.7 16.2 15.6 15.9 16.3 16.3 16.0 16.3

Data Array: Daily Production in Yards of 30 Carpet Looms
15.2 15.4 15.6 15.6 15.6 15.7 15.7 15.8 15.8 15.8
15.9 15.9 15.9 15.9 16.0 16.0 16.0 16.0 16.1 16.2
16.2 16.3 16.3 16.3 16.4 16.4 16.6 16.8 16.8 16.9

2. Types of quantitative data
Discrete data: integer values 0, 1, 2, ...
Example: number of children, number of cars
Continuous data: values measured on a continuous scale, such as 256.312 grams
Example: weight, length, time, area, price

A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.

Weekly Earnings (RM)   Num of Employees (f)
401 - 600              9
601 - 800              12
801 - 1000             39
1001 - 1200            15
1201 - 1400            9
1401 - 1600            6

The variable is weekly earnings. The rows are the classes (the third class is 801 - 1000), and the second column is the frequency column holding the frequencies. The lower limit of the sixth class is 1401 and its upper limit is 1600.

$\text{Class boundary} = \dfrac{\text{upper limit} + \text{lower limit of next class}}{2}$
Ex: Upper boundary of the first class = (600 + 601)/2 = 600.5
Lower boundary of the second class = (601 + 600)/2 = 600.5
The upper boundary of one class equals the lower boundary of the next class.

$\text{Class width} = \text{upper boundary} - \text{lower boundary}$
Example: Width of the first class = 600.5 - 400.5 = 200
Width of the second class = 800.5 - 600.5 = 200

$\text{Class midpoint} = \dfrac{\text{lower limit} + \text{upper limit}}{2}$
Ex: Midpoint of the first class = (401 + 600)/2 = 500.5
Ex: Midpoint of the second class = (601 + 800)/2 = 700.5

Height (cm) (class interval)   Number of Students (frequency)
60 - 62                        10
63 - 65                        18
66 - 68                        42
69 - 71                        27
72 - 74                        8
Total                          105

i. First class limits: lower class limit = 60, upper class limit = 62
ii. First class boundaries: lower boundary = 59.5, upper boundary = 62.5
iii. Class width. Example: c = 62.5 - 59.5 = 3
iv. First class midpoint = (60 + 62)/2 = 61
v. Class frequency = number of students

Weekly Earnings (RM)   Num of Employees (f)
400 - 600              9
600 - 800              12
800 - 1000             39
1000 - 1200            15
1200 - 1400            9
1400 - 1600            6

When consecutive classes share an endpoint, as here, class limit = class boundary.

Raw Data:
15.2 15.2 15.3 15.3 15.3 15.3 15.3 15.4 15.4 15.4
15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.5 15.5
15.5 15.5 15.5 15.5 15.6 15.6 15.6 15.7 15.7 15.7

Frequency Distribution

Class   Tallies          Frequency
15.2    //               2
15.3    ////             5
15.4    //// //// /      11
15.5    //// /           6
15.6    ///              3
15.7    ///              3

Relative Frequency Distribution

Class   Frequency (1)   Relative Freq. (1) ÷ 30   Cumulative Relative Frequency
15.2    2               0.07                      0.07
15.3    5               0.17                      0.23
15.4    11              0.37                      0.60
15.5    6               0.20                      0.80
15.6    3               0.10                      0.90
15.7    3               0.10                      1.00
Total   30              1.00
(The rounded relative frequencies add to 1.01; the exact fractions sum to 1.)

When constructing a frequency distribution table, we need to make the following three major decisions:
• Number of classes
• Class width
• Lower limit of the first class (starting point)

Number of classes: $k = 1 + 3.3 \log n$

Class width: $i \ge \dfrac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes } (k)}$

Lower limit of the first class (starting point)
◦ Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.

1. Determine the class interval size (class width)
Example: Given the following data
100  74  84  95  95 110  99  87 100 108
 85 103  99  83  91  91  84 110 113 105
100  98 100 108 100  98 100 107  79  86
123 107  87 105  88  85  99 101  93  99

Range: R = 123 - 74 = 49

Number of classes:
$k = 1 + 3.3 \log n = 1 + 3.3 \log 40 \approx 6.3 \approx 6$

Class width:
$i \ge \dfrac{\text{Largest value} - \text{Smallest value}}{k} = \dfrac{49}{6} \approx 8.2$, so $i \ge 9$

Grouped Frequency Distribution (6 classes)
Columns: Class; Frequency (1); Relative Frequency (1) ÷ 40; Cumulative; %
Classes: 71 - 80, 81 - 90, 91 - 100, 101 - 110, 111 - 120, 121 - 130
• Class interval: for example 71 - 80
• Midpoint of the first class: (71 + 80)/2 = 75.5
• Third class (91 - 100): lower limit = 91, upper limit = 100
• Class width = 130.5 - 120.5 = 10

Class boundary: given by the midpoint of the upper limit of one class and the lower limit of the next class. Class boundaries are also called real class limits.

• A histogram is a kind of graph that can be drawn for a frequency distribution, a relative frequency distribution or a percentage distribution.
• To draw a histogram, mark the classes on the horizontal axis and the frequencies (or relative frequencies or percentages) on the vertical axis.
• A histogram is called a frequency histogram, a relative frequency histogram or a percentage histogram, depending on what is marked on the vertical axis.

Class         Frequency
15.2 - 15.5   2
15.5 - 15.8   5
15.8 - 16.1   11
16.1 - 16.4   6
16.4 - 16.7   3
16.7 - 17.0   3
[Figure: frequency histogram of the classes above; vertical axis: Frequency, 0 to 12]

• A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
• A polygon has the class midpoints on the horizontal axis and the frequencies, relative frequencies or percentages on the vertical axis.
• A polygon is called a frequency polygon, a relative frequency polygon or a percentage polygon, depending on what is marked on the vertical axis.
[Figure: frequency polygon; horizontal axis: Production Level in Yards, 15.0 to 17.1; vertical axis: Frequency, 0 to 12]

• An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.

• In a stem-and-leaf display, each value is divided into two portions (a stem and a leaf).
• The leaves for each stem are shown separately in the display.
• An advantage of a stem-and-leaf display is that we do not lose information on individual observations. It can be used only for quantitative data.

The following are the scores of 30 college students on a statistics test:
75 93 79 71 52 95 68 87 80 69
50 72 96 72 92 92 65 81 83 57
79 61 84 98 71 76 77 87 86 64
Construct a stem-and-leaf display.

1. Split each score into two parts.
2. The first part contains the first digit, which is called the stem.
3. The second part contains the second digit, which is called the leaf.
4. Arrange the leaves for each stem in increasing order.
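The steps above can be sketched in a few lines of Python. This is only an illustration for two-digit whole numbers; the function name `stem_and_leaf` is an assumption made for this example and is not part of the slides.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for two-digit whole numbers."""
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # first digit is the stem, second digit the leaf
        display[stem].append(leaf)
    # Sort the stems, and rank the leaves of each stem in increasing order.
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


# Scores of the 30 college students from the example above.
scores = [75, 93, 79, 71, 52, 95, 68, 87, 80, 69,
          50, 72, 96, 72, 92, 92, 65, 81, 83, 57,
          79, 61, 84, 98, 71, 76, 77, 87, 86, 64]

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every leaf is kept, the original scores can be read back from the display, which is the advantage noted above.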
After the first two scores (75 and 52) have been entered:

stem | leaves
5 | 2
6 |
7 | 5
8 |
9 |

The complete stem-and-leaf display for the scores is shown below:

5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8

From the display, stem 7 has the highest frequency, followed by stems 8, 9, 6 and 5.

The leaves for each stem are ranked in increasing order as below:

5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8

Diastolic blood pressure of 120 people: 60 Type A people vs. 60 Type B people
Type A: extremely hostile, competitive, impatient
53, 57, 58, 59, 59, 60, ...
Type B: laid-back people
51, 52, 59, 59, 60, ...

Type A
5 | 37899
6 | 00001111
6 | 2223333
6 | 444455555
6 | 666777778888
7 | 0000111
7 | 333444789
8 | 011

Type B
5 | 1299
6 | 0001122233333
6 | 4445555555777
6 | 888889
7 | 0000111
7 | 222333466899
8 | 0000
9 | 3

[Figure: the Type A display repeated with its modes marked]

Objectives:
• distinguish among the measures of central tendency, measures of dispersion and measures of skewness.
• calculate values for common measures of location, including the arithmetic mean, median and mode.
• calculate values for common measures of dispersion, including range, variance, standard deviation and quartile deviation.
• calculate values for measures of skewness.

Statistical Measures
• Measure of central tendency (measure of location): shows where the centre of the data is
• Measure of dispersion (measure of spread): shows how spread out the data are around the centre
• Measure of skewness (measure of asymmetry): shows whether the frequency distribution is symmetrical about the mean or skewed

MEASURE OF CENTRAL TENDENCY
1.
Mean (average, $\bar{x}$)
- Add all the observations
- Divide this sum by the number of observations

a) Set of values: $\bar{x} = \dfrac{\sum x}{n}$
b) Simple frequency distribution: $\bar{x} = \dfrac{\sum fx}{\sum f}$
c) Grouped frequency: $\bar{x} = \dfrac{\sum fx}{\sum f}$ ($x$ = class midpoint)

MEASURE OF CENTRAL TENDENCY
◦ Advantages
• it is widely understood.
• the value of every item is included in the computation of the mean.
• it is well suited to further statistical analysis.
◦ Disadvantages
• its value may not correspond to any actual value.
• it might be distorted by extremely high or low values.

MEASURE OF CENTRAL TENDENCY
Example
a. The arithmetic mean (mean) of the numbers 8, 3, 5, 12 and 10 is ...
b. If 5, 8, 6 and 2 occur with frequencies 3, 2, 4 and 1, the mean is ...
c. Find the mean of the following frequency distribution

Class     Frequency
1 - 3     1
4 - 6     4
7 - 9     8
10 - 12   6
13 - 15   3
16 - 18   1

MEASURE OF CENTRAL TENDENCY
a. $\bar{x} = \dfrac{\sum x}{n} = \dfrac{8 + 3 + 5 + 12 + 10}{5} = 7.6$

b. $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{5(3) + 8(2) + 6(4) + 2(1)}{3 + 2 + 4 + 1} = 5.7$

c.
Class     f    x (midpoint)   fx
1 - 3     1    2              2
4 - 6     4    5              20
7 - 9     8    8              64
10 - 12   6    11             66
13 - 15   3    14             42
16 - 18   1    17             17
          Σf = 23             Σfx = 211

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{211}{23} = 9.17$

MEASURE OF CENTRAL TENDENCY
2. Median (middle value of a distribution or array)
- Arrange the observations in order of increasing size
- Find the number of observations and the middle observation
- Identify the median as this middle value

a) Set of data
b) Simple frequency distribution
   n odd: the median is the observation in position $\dfrac{n+1}{2}$
   n even: the median is the average of the observations in positions $\dfrac{n}{2}$ and $\dfrac{n}{2} + 1$
   (n = sample size)
c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method

MEASURE OF CENTRAL TENDENCY
(i) Graphical Method
[Figure: graphical method, with the median read off the graph as Median = 700]

MEASURE OF CENTRAL TENDENCY
(ii) Interpolation Method

$\text{Median} = L_m + \left( \dfrac{\frac{n}{2} - F_{m-1}}{f_m} \right) C_m$

Where:
$L_m$ = the lower boundary of the class containing the median.
$n$ = the total frequency.
$F_{m-1}$ = the cumulative frequency of the classes immediately preceding the class containing the median.
$f_m$ = the frequency of the class containing the median.
$C_m$ = the width of the class in which the median lies.

MEASURE OF CENTRAL TENDENCY
Advantages
• it is unaffected by extremely high or low values.
• it can be used when certain end values of a set or distribution are difficult, expensive or impossible to obtain, which makes it particularly appropriate for 'life' data.
• it can be used with non-numeric data if desired, provided the measurements can be naturally ordered.
• it will often assume a value equal to one of the original data values.
Disadvantages
• it is difficult to handle theoretically in more advanced statistical work, so its use is restricted to analysis at a basic level.
• it fails to reflect the full range of values.

MEASURE OF CENTRAL TENDENCY
Example
a. The times taken to inspect five units coming from a production line are recorded as 13, 14, 11, 17 and 11 minutes. What is the median?
b. Find the median of the following frequency distribution

Class       Frequency
118 - 126   3
127 - 135   5
136 - 144   9
145 - 153   12
154 - 162   5
163 - 171   4
172 - 180   2

MEASURE OF CENTRAL TENDENCY
a. Ordered data: 11, 11, 13, 14, 17
Position of the median: $\dfrac{n+1}{2} = \dfrac{5+1}{2} = 3$
Median = 13

b.
Class       f    F
118 - 126   3    3
127 - 135   5    8
136 - 144   9    17
145 - 153   12   29   (median class)
154 - 162   5    34
163 - 171   4    38
172 - 180   2    40

$\dfrac{n}{2} = \dfrac{40}{2} = 20$

$\text{Median} = L_m + \left( \dfrac{\frac{n}{2} - F_{m-1}}{f_m} \right) C_m = 144.5 + \left( \dfrac{\frac{40}{2} - 17}{12} \right)(153.5 - 144.5) = 146.75 \approx 147$

MEASURE OF CENTRAL TENDENCY
3. Mode (value which occurs most often)
- Draw a frequency table for the data
- Identify the mode as the most frequent value

a) Set of data
b) Simple frequency distribution
   Mode = value that appears most frequently
c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method

MEASURE OF CENTRAL TENDENCY
(i) Graphical Method
[Figure: histogram of No. of cars (vertical axis, 0 to 16) against Mileage (km) (classes 110 - 120 to 170 - 180), with the mode read off as Mode = 146]

MEASURE OF CENTRAL TENDENCY
(ii) Interpolation Method

$\text{Mode} = L + \left( \dfrac{D_1}{D_1 + D_2} \right) C$

Where:
$L$ = the lower class boundary of the class containing the mode.
$C$ = the class width of the class containing the mode.
$D_1$ = the difference between the largest frequency and the frequency immediately preceding it $(f_0 - f_-)$.
$D_2$ = the difference between the largest frequency and the frequency immediately following it $(f_0 - f_+)$.

MEASURE OF CENTRAL TENDENCY
Mode
The mode of a set of data is the value which occurs most often or, equivalently, has the largest frequency.

MEASURE OF CENTRAL TENDENCY
◦ Advantages
• it is a more appropriate average to use in situations where it is useful to know the most common value.
• it is easy to understand, not difficult to calculate and can be used when a distribution has open-ended classes.
• it is not affected by extreme values.
◦ Disadvantages
• it ignores dispersion around the modal value and does not take all the values into account.
• it is unsuitable for further statistical analysis.
• although it ignores extreme values, it is thought to be too much affected by the most popular class when a distribution is significantly skewed.

MEASURE OF CENTRAL TENDENCY
Example
a. Find the mode of the following frequency distribution

Class     Frequency
1 - 3     1
4 - 6     4
7 - 9     8
10 - 12   6
13 - 15   3
16 - 18   1

MEASURE OF CENTRAL TENDENCY
Class     Frequency
1 - 3     1
4 - 6     4
7 - 9     8    (mode class)
10 - 12   6
13 - 15   3
16 - 18   1

$\text{Mode} = L + \left( \dfrac{D_1}{D_1 + D_2} \right) C = 6.5 + \left( \dfrac{8 - 4}{(8 - 4) + (8 - 6)} \right)(9.5 - 6.5) = 8.5$

MEASURE OF DISPERSION
1. Range
Range = maximum value - minimum value

MEASURE OF DISPERSION
2.
Standard deviation
- Calculate the mean value
- Find the deviation of each observation from this mean
- Square these deviations
- Add the squares
- Divide this sum by the number of observations
- Take the square root of the value obtained

a) Set of data
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{\sum x^2}{n} - \left( \dfrac{\sum x}{n} \right)^2}$

b) Simple frequency distribution
$s = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2}$

c) Grouped frequency
$s = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2}$, where $x$ = class midpoint

MEASURE OF DISPERSION
Comparing standard deviation
[Figure: comparing standard deviations]

MEASURE OF DISPERSION
3. Variance
variance = (standard deviation)$^2$

a) Set of data
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{\sum x^2}{n} - \left( \dfrac{\sum x}{n} \right)^2$

b) Simple frequency distribution
$s^2 = \dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2$

c) Grouped frequency
$s^2 = \dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2$, where $x$ = class midpoint

MEASURE OF DISPERSION
Example
a. Find the variance and standard deviation of the following data:

Class       Frequency
0 - 4.9     3
5 - 9.9     5
10 - 14.9   7
15 - 19.9   6
20 - 24.9   2

MEASURE OF DISPERSION
Class       f    x       x^2        fx       fx^2
0 - 4.9     3    2.45    6.0025     7.35     18.0075
5 - 9.9     5    7.45    55.5025    37.25    277.5125
10 - 14.9   7    12.45   155.0025   87.15    1085.0175
15 - 19.9   6    17.45   304.5025   104.7    1827.015
20 - 24.9   2    22.45   504.0025   44.9     1008.005
            Σf = 23                 Σfx = 281.35   Σfx^2 = 4215.5575

$s^2 = \dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2 = \dfrac{4215.5575}{23} - \left( \dfrac{281.35}{23} \right)^2 = 183.2851 - 149.6367 = 33.6484 \approx 33.65$

$s = \sqrt{s^2} \approx 5.8$

MEASURE OF DISPERSION
4. Chebyshev's Theorem
- By using the mean and standard deviation, we can find the proportion or percentage of the total observations that fall within a given interval about the mean using Chebyshev's theorem.
- For any number $k$ greater than 1, at least $\left(1 - \dfrac{1}{k^2}\right)$ of the data values lie within $k$ standard deviations of the mean.
[Figure: at least $1 - 1/k^2$ of the values lie in the shaded area between $\mu - k\sigma$ and $\mu + k\sigma$]

MEASURE OF DISPERSION
Example
The average systolic blood pressure for 4000 women who were screened for high blood pressure was found to be 187 with a standard deviation of 22. Using Chebyshev's theorem, find at least what percentage of women in this group have a systolic blood pressure between 143 and 231.
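The bound in Chebyshev's theorem is easy to evaluate numerically. The sketch below applies it to the blood pressure figures in the example; the function name `chebyshev_lower_bound` and its checks are assumptions made for illustration, not part of the slides.

```python
def chebyshev_lower_bound(mean, sd, low, high):
    """Minimum proportion of values in [low, high], an interval symmetric about the mean."""
    if abs((mean - low) - (high - mean)) > 1e-9:
        raise ValueError("interval must be symmetric about the mean")
    k = (high - mean) / sd  # number of standard deviations to each endpoint
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k**2


# Mean 187, standard deviation 22, interval 143 to 231 (values from the example).
bound = chebyshev_lower_bound(mean=187, sd=22, low=143, high=231)
print(f"at least {bound:.0%} of the values")
```

The theorem makes no assumption about the shape of the distribution, so the result is a guaranteed minimum rather than an estimate.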
MEASURE OF DISPERSION
Solution: $\mu = 187$ and $\sigma = 22$
To find the percentage of women with blood pressure between 143 and 231:
143 - 187 = -44
231 - 187 = 44
$k$ is obtained by dividing the distance between the mean and each endpoint of the interval by the standard deviation:

$k = \dfrac{44}{22} = 2$

$1 - \dfrac{1}{k^2} = 1 - \dfrac{1}{(2)^2} = 1 - 0.25 = 0.75$

MEASURE OF DISPERSION
At least 75% of the women have systolic blood pressure between 143 and 231.
[Figure: interval from $143 = \mu - 2\sigma$ to $231 = \mu + 2\sigma$, centred on $\mu = 187$]

MEASURE OF DISPERSION
5. Empirical Rule
- The empirical rule applies only to a specific type of distribution called a bell-shaped distribution, also known as a normal curve.
• 68% of the observations lie within one standard deviation of the mean
• 95% of the observations lie within two standard deviations of the mean
• 99.7% of the observations lie within three standard deviations of the mean
[Figure: bell-shaped curve with the 68%, 95% and 99.7% regions marked at one, two and three standard deviations either side of the mean]

MEASURE OF DISPERSION
Example 1
The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.

MEASURE OF DISPERSION
Solution: $\bar{x} = 40$ and $s = 12$
To find the percentage of people aged between 16 and 64:
16 - 40 = -24
64 - 40 = 24
Dividing the distance, 24, by the standard deviation, 12, shows that the distance is equal to $2s$:

$\dfrac{24}{12} = 2$

MEASURE OF DISPERSION
16 - 40 = -24 = $-2s$, so $16 = \bar{x} - 2s$
64 - 40 = 24 = $2s$, so $64 = \bar{x} + 2s$
Because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the people in the sample are 16 to 64 years old.

MEASURE OF DISPERSION
Example 2
Assume that the incomes of all single-parent households last year produce a bell-shaped distribution with a mean of RM23,500 and a standard deviation of RM4,500.
Determine the range of income that contains 68%, 95% and 99.7% of the households.
68% = (RM19,000, RM28,000)
95% = (RM14,500, RM32,500)
99.7% = (RM10,000, RM37,000)

MEASURE OF DISPERSION
6. Coefficient of variation

$\text{Coefficient of variation} = \dfrac{\text{standard deviation } (s)}{\bar{x}} \times 100\%$

• The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.
• Investopedia on the coefficient of variation (CV): in the investing world, the coefficient of variation allows you to determine how much volatility (risk) you are assuming in comparison to the amount of return you can expect from your investment. In simple language, the lower the ratio of standard deviation to mean return, the better your risk-return tradeoff.

MEASURE OF DISPERSION
Comparing coefficient of variation
The higher the coefficient of variation, the more dispersed the data are.

MEASURE OF DISPERSION
Example 2
            Mean       Standard deviation
New Car     RM20,100   RM6,125
Used Car    RM5,485    RM2,730

MEASURE OF DISPERSION
7. Quartile Deviation
a) Set of data
b) Simple frequency distribution
- Quartiles are the values that divide the ordered data into quarters
- $Q_1$: first quartile, the value below which 25% of the observations lie
- $Q_2$: second quartile, the value below which half of the data lie (the median)
- $Q_3$: third quartile, the value below which 75% of the observations lie

$\text{Quartile deviation} = \dfrac{Q_3 - Q_1}{2}$

$\text{Inter-quartile range} = Q_3 - Q_1$

Position of $Q_1$: $\dfrac{n+1}{4}$
Position of $Q_3$: $\dfrac{3(n+1)}{4}$

c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method

MEASURE OF DISPERSION
(i) Graphical Method
[Figure: cumulative frequency curve; vertical axis F with n/4, 3n/4 and n marked; Q1 and Q3 read off the horizontal axis x]

MEASURE OF DISPERSION
(ii) Interpolation Method

$Q_1 = L_{Q_1} + \left( \dfrac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}} \right) C_{Q_1}$

Where:
$L_{Q_1}$ = the lower boundary of the class containing $Q_1$.
$n$ = the total frequency.
$F_{Q_1 - 1}$ = the cumulative frequency of the classes immediately preceding the class containing $Q_1$.
$f_{Q_1}$ = the frequency of the class containing $Q_1$.
$C_{Q_1}$ = the width of the class in which $Q_1$ lies.

MEASURE OF DISPERSION

$Q_3 = L_{Q_3} + \left( \dfrac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}} \right) C_{Q_3}$

Where:
$L_{Q_3}$ = the lower boundary of the class containing $Q_3$.
$n$ = the total frequency.
$F_{Q_3 - 1}$ = the cumulative frequency of the classes immediately preceding the class containing $Q_3$.
$f_{Q_3}$ = the frequency of the class containing $Q_3$.
$C_{Q_3}$ = the width of the class in which $Q_3$ lies.

MEASURE OF DISPERSION
Example
a. Find the quartile deviation of the following data:

Class       Frequency
0 - 9.9     5
10 - 19.9   19
20 - 29.9   38
30 - 39.9   43
40 - 49.9   34
50 - 59.9   17
60 - 69.9   4

MEASURE OF DISPERSION
Class       f    F
0 - 9.9     5    5
10 - 19.9   19   24
20 - 29.9   38   62    ($Q_1$ class, since n/4 = 40)
30 - 39.9   43   105
40 - 49.9   34   139   ($Q_3$ class, since 3n/4 = 120)
50 - 59.9   17   156
60 - 69.9   4    160

$Q_1 = L_{Q_1} + \left( \dfrac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}} \right) C_{Q_1} = 19.95 + \left( \dfrac{\frac{160}{4} - 24}{38} \right)(10) = 24.16$

$Q_3 = L_{Q_3} + \left( \dfrac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}} \right) C_{Q_3} = 39.95 + \left( \dfrac{\frac{3(160)}{4} - 105}{34} \right)(10) = 44.36$

MEASURE OF DISPERSION
Therefore the quartile deviation is

$\text{Quartile deviation} = \dfrac{Q_3 - Q_1}{2} = \dfrac{44.36 - 24.16}{2} = 10.1$

MEASURE OF SKEWNESS
• Skewness is the degree of asymmetry
• It is a method of describing the distribution of the data
• Data which are not symmetrical may be either positively or negatively skewed.
[Figure: a negatively skewed curve and a positively skewed curve]

MEASURE OF SKEWNESS
• Symmetric histogram: Mean = Median = Mode
• Positively skewed histogram: Mode < Median < Mean
• Negatively skewed histogram: Mean < Median < Mode

MEASURE OF SKEWNESS
Example
a. What type of distribution is described by the following information?
Mean = 56, Median = 58.1, Mode = 63
Answer: Negatively skewed

b.
1 | 1 1 2 2 3 3 4 5 6 7
2 | 3 4 4 5 6 6
3 | 1 1 2 2 2 3
4 | 0 0 1
Based on the stem-and-leaf plot above, find the i) median, ii) mode, iii) mean and iv) describe the shape of the distribution.
Answer: i) 24 ii) 32 iii) 23.76 iv) Negatively skewed distribution

MEASURE OF SKEWNESS
c.
Class       Frequency
0 - 100     5
100 - 200   19
200 - 300   38
300 - 400   43
400 - 500   34
Based on the distribution table, i) construct a histogram, and ii) describe the shape of the distribution.

MEASURE OF SKEWNESS
[Figure: three distribution curves labelled Curve A, Curve B and Curve C]
Curve A: Positively Skewed
Curve B: Negatively Skewed

BOX-AND-WHISKER PLOT
A plot that shows the center, spread and skewness of a data set. It is constructed by drawing a box and two whiskers that use the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.
[Figure: box-and-whisker plot marked Minimum, Q1, Q2, Q3, Maximum]

BOX-AND-WHISKER PLOT
Example
The following data are the incomes (in thousands of dollars) for a sample of 12 households.
35 29 44 72 34 64 41 50 104 39 58 54
Construct a box-and-whisker plot for these data.

BOX-AND-WHISKER PLOT
Solution:
Step 1: Rank the data
29 34 35 39 41 44 50 54 58 64 72 104

$\text{Median} = \dfrac{44 + 50}{2} = 47$

$Q_1 = \dfrac{35 + 39}{2} = 37$

$Q_3 = \dfrac{58 + 64}{2} = 61$

$\text{IQR} = Q_3 - Q_1 = 61 - 37 = 24$

Step 2: Determine the lower and upper inner fences
$1.5 \times \text{IQR} = 1.5 \times 24 = 36$
Lower inner fence = $Q_1 - 36 = 37 - 36 = 1$
Upper inner fence = $Q_3 + 36 = 61 + 36 = 97$

Step 3: Determine the smallest and the largest values in the data set within the two inner fences
Smallest value = 29
Largest value = 72

Step 4: Draw the box
[Figure: box drawn from the first quartile to the third quartile with the median marked inside; axis from 25 to 105]

Step 5: Draw the whiskers
[Figure: the same box with two lines, called whiskers, extending to the smallest and largest values within the two inner fences; an outlier is marked with * near 105]
Outlier: a value that falls outside the two inner fences (a value that is very small or very large relative to the rest of the data).
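The five steps above can be checked with a short Python sketch. It follows the method used in the solution (quartiles as the medians of the lower and upper halves of the ranked data); the function name `box_plot_summary` and the dictionary keys are assumptions made for this illustration. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def box_plot_summary(data):
    """Quartiles as medians of the lower and upper halves, plus inner fences."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower, upper = ranked[:half], ranked[n - half:]  # middle value is left out when n is odd
    q1, q2, q3 = median(lower), median(ranked), median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ranked if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": low_fence, "upper_fence": high_fence,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [x for x in ranked if x < low_fence or x > high_fence],
    }


# Incomes (in thousands of dollars) of the 12 households in the example.
incomes = [35, 29, 44, 72, 34, 64, 41, 50, 104, 39, 58, 54]
summary = box_plot_summary(incomes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The whisker ends are the smallest and largest data values that fall inside the fences, not the fences themselves, which is why 104 is reported separately as an outlier.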
The data are skewed to the right.

BOX-AND-WHISKER PLOT
S < 0: Negatively skewed
S = 0: Symmetric (not skewed)
S > 0: Positively skewed

BOX-AND-WHISKER PLOT
[Figure: three box-and-whisker plots with Q1, Q2 and Q3 marked, labelled Left-Skewed, Symmetric and Right-Skewed]

BOX-AND-WHISKER PLOT
• Median close to the center of the box: symmetrical
• Median to the left of the center of the box: positively skewed
• Median to the right of the center of the box: negatively skewed
• Whiskers are the same length: symmetrical
• Right whisker is longer than the left whisker: positively skewed
• Left whisker is longer than the right whisker: negatively skewed

BOX-AND-WHISKER PLOT
A bimodal distribution has two modes.
In a uniform distribution, all classes occur with approximately the same frequency.
An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph.

POPULATION VERSUS SAMPLE
Measurement          Sample      Population
Mean                 $\bar{x}$   $\mu$
Standard deviation   $s$         $\sigma$
Variance             $s^2$       $\sigma^2$

POPULATION VERSUS SAMPLE
1. The following are the ages of all eight employees of a small company.
53 32 61 27 39 44 49 57
Find the mean age of these employees.
(population data)

2. The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership.
10 5 19 8 3
Find the median.
(sample data)

POPULATION VERSUS SAMPLE
Example
3. The data in the table below are the 2002 payrolls (in millions of dollars) of five MLB teams.

MLB Team                2002 Total Payroll (millions of dollars)
Anaheim Angels          62
Atlanta Braves          93
New York Yankees        126
St Louis Cardinals      75
Tampa Bay Devil Rays    34

Find the variance and standard deviation of these data.
(sample data)

POPULATION VERSUS SAMPLE
Example
4. The following are the 2002 earnings (in thousands of dollars) before taxes for all six employees of a small company.
48.50 38.40 65.50 22.6
Calculate the variance and standard deviation for these data.
(population data)
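The distinction between sample and population data matters when computing the variance. The sketch below uses Python's standard `statistics` module on the MLB payroll figures from Example 3. One assumption should be stated plainly: the formulas shown earlier in these slides divide by n, whereas the common convention for a sample, used by `statistics.variance` and `statistics.stdev`, divides by n - 1. The slides do not say which divisor they intend for sample data, so both results are printed.

```python
from statistics import pstdev, pvariance, stdev, variance

# 2002 payrolls (millions of dollars) of the five MLB teams in Example 3.
payrolls = [62, 93, 126, 75, 34]

# Treating the data as a sample: the sum of squared deviations is divided by n - 1.
sample_var = variance(payrolls)
sample_sd = stdev(payrolls)

# Treating the same data as a whole population: divided by n.
pop_var = pvariance(payrolls)
pop_sd = pstdev(payrolls)

print(f"sample:     variance = {sample_var:.1f}, standard deviation = {sample_sd:.2f}")
print(f"population: variance = {pop_var:.1f}, standard deviation = {pop_sd:.2f}")
```

The sample figures are always the larger of the two, and the gap shrinks as n grows.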