Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Transcript

© 2012 McGraw-Hill Ryerson Limited © 2009 McGraw-Hill Ryerson Limited 1 Lind Marchal Wathen Waite © 2012 McGraw-Hill Ryerson Limited 2 Learning Objectives LO 1 Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean. LO 2 Explain the characteristics, uses, advantages, and disadvantages of each measure of central location. LO 3 Identify the position of the mean, the median, and the mode for both symmetric and skewed distributions. LO 4 Compute and explain the range, the mean deviation, the variance, and the standard deviation. LO 5 Explain the characteristics, uses, advantages, and disadvantages of each measure of dispersion. © 2012 McGraw-Hill Ryerson Limited 3 Learning Objectives LO 6 Explain Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations. LO 7 Compute and explain the coefficient of skewness and the coefficient of variation. LO 8 Compute and interpret quartiles, deciles, and percentiles. LO 9 Construct and interpret box plots. © 2012 McGraw-Hill Ryerson Limited 4 Introduction © 2012 McGraw-Hill Ryerson Limited 5 Numerical Descriptive Measures Measures of Location • • • • • Arithmetic Mean Weighted Mean Median Mode Geometric Mean Measures of Dispersion • Range • Mean Absolute Deviation • Variance • Standard Deviation © 2012 McGraw-Hill Ryerson Limited 6 LO 1 THE POPULATION MEAN © 2012 McGraw-Hill Ryerson Limited 7 Population Mean For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values. The formula for Population Mean is : Sum of all the values in the population Population Mean = Number of values in the population LO 1 © 2012 McGraw-Hill Ryerson Limited 8 Population Mean The mean of a population using mathematical symbols is: Where: μ represents the population mean. It is the Greek lowercase letter “mu.” N is the number of items in the population. X represents any particular value. Σ is the Greek capital letter “sigma” and indicates the operation of adding. ΣX is the sum of the X values. LO 1 © 2012 McGraw-Hill Ryerson Limited 9 Example – Population Mean There are 15 teams in the Eastern Conference of the NHL. Listed below is the number of goals scored by each team in the 2010–2011 season (www.nhl.com). Team Goals Scored Team Goals Scored Philadelphia Flyers 256 Washington Capitals 219 Boston Bruins 244 Atlanta Thrashers 218 Tampa Bay Lightning 241 Toronto Maple Leafs 213 Buffalo Sabres 240 Montreal Canadiens 213 Carolina Hurricanes 231 Florida Panthers 191 Pittsburgh Penguins 228 Ottawa Senators 190 New York Islanders 225 New Jersey Devils 171 New York Rangers 224 What is the arithmetic mean number of goals scored? LO 1 © 2012 McGraw-Hill Ryerson Limited 10 Solution – Population Mean X N 256 244 ... 190 171 15 3304 15 220 LO 1 © 2012 McGraw-Hill Ryerson Limited 11 LO 1 THE SAMPLE MEAN © 2012 McGraw-Hill Ryerson Limited 12 Sample Mean For ungrouped data, the sample mean is the sum of all the sample values divided by the total number of sampled values. The formula for Sample Mean is : Sum of all the values in the sample Sample Mean = Number of values in the Sample LO 1 © 2012 McGraw-Hill Ryerson Limited 13 Sample Mean The mean of a sample using mathematical symbols is: Where: X is the sample mean. It is read “X bar.” n is the number sample. LO 1 © 2012 McGraw-Hill Ryerson Limited 14 Example – Sample Mean Microcorp is doing research about the number of minutes spent on social networking website by internet users. A random sample of 14 internet users showed the following number of minutes used last month. 85 93 105 87 120 97 120 79 114 107 115 88 109 113 What is the arithmetic mean number of minutes used? LO 1 © 2012 McGraw-Hill Ryerson Limited 15 Solution – Sample Mean X X n 85 93 105 ... 113 14 1432 14 102.28 LO 1 © 2012 McGraw-Hill Ryerson Limited 16 The Mean In Excel 1. From the tool bar, select the Paste Function, or use Insert, Function. 2. From the Function Category list, select Statistical. In the Function name list, select AVERAGE. Click OK. A dialog box opens. 3. Enter the range A2:A14 in the Number1 box. The answer appears in the dialog box. Click OK. LO 1 © 2012 McGraw-Hill Ryerson Limited 17 LO 2 THE PROPERTIES OF THE ARITHMETIC MEAN © 2012 McGraw-Hill Ryerson Limited 18 Properties of the Arithmetic Mean 1. Every set of interval-level and ratio-level data has a mean. 2. All the values are included in computing the mean. 3. The mean is unique. 4. The sum of the deviations of each value from the mean is zero. LO 2 © 2012 McGraw-Hill Ryerson Limited 19 You Try It Out! The annual growth of a sample of manufacturing industries at Northernhouse are: $70000, $72000, $65000, $75000 and $68000. a) Find the sample mean. b) Is the mean you computed in (a) a statistic or a parameter? Why? c) What is your best estimate of the population mean? LO 2 All 511 employees in Micronet are considered the population. Their working hours in a week are 45, 40, 38, 37, 42, 41, 35, 44 a) Compute the mean working hours in a week. b) Is the mean you computed in (a) a statistic or a parameter? Why? © 2012 McGraw-Hill Ryerson Limited 20 LO 1 THE WEIGHTED MEAN © 2012 McGraw-Hill Ryerson Limited 21 Weighted Mean The weighted mean of a set of numbers designated X1, X2, ..., Xn, with corresponding weights w1, w2, ...,wn, is computed from the following formula: LO 1 © 2012 McGraw-Hill Ryerson Limited 22 Example – Weighted Mean A car manufacturer recorded the speeds of newly designed models. The grades follow. What is the mean grade of the speed? Grade on Speed Number of cars LO 1 5 4 6 5 7 13 © 2012 McGraw-Hill Ryerson Limited 8 11 9 8 10 7 23 Solution – Weighted Mean 4(5) 5(6) 13(7) 11(8) 8(9) 7(10) 4 5 13 11 8 7 371 48 7.73 Xw LO 1 © 2012 McGraw-Hill Ryerson Limited 24 You Try It Out! NewEra Bakery sold 100 cakes for the regular price of $200. For Christmas the cake price was reduced to $150 and 130 were sold. On New Year’s, the price was reduced to $100 and 175 cakes were sold. a) What was the weighted mean price of cake? b) NewEra Bakery spent $150 a cake for the 400 cakes. Comment on the Bakery’s profit per cake if a baker receives a $20 commission for each one sold. LO 1 © 2012 McGraw-Hill Ryerson Limited 25 LO 1 THE MEDIAN © 2012 McGraw-Hill Ryerson Limited 26 The Median The median is the midpoint of the values after they have been ordered from the smallest to the largest. 1. There are as many values above the median as below it in the data array. 2. For an even set of values, the median will be the arithmetic average of the two middle numbers. LO 1 © 2012 McGraw-Hill Ryerson Limited 27 Example – The Median LO The ages for a sample of seven bank employees are: The weights of six patients, in kilograms, are: 31, 35, 29, 30, 32, 33,34 66, 63, 70, 65,64,62 Find the median age. Find the median weight. 1 © 2012 McGraw-Hill Ryerson Limited 28 Solution – The Median Arranging the data in ascending order gives: Arranging the data in ascending order gives: 29,30,31,32,33,34,35 62,63,64,65,66,70 Thus the median is 32. LO 1 Thus the median is 64.5. © 2012 McGraw-Hill Ryerson Limited 29 LO 2 THE PROPERTIES OF THE MEDIAN © 2012 McGraw-Hill Ryerson Limited 30 Properties of the Median 1. There is a unique median for each data set. 2. It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur. 3. It can be computed for ratio-level, interval-level, and ordinal-level data. LO 2 © 2012 McGraw-Hill Ryerson Limited 31 The Median In Excel 1. From the tool bar, select the Paste Function, or use Insert, Function. 2. From the Function category list, select Statistical. In the Function name list, select MEDIAN. Click OK. A dialogue box opens. 3. Enter the range in the Number1 box. The answer appears in the dialogue box. Click OK. LO 2 © 2012 McGraw-Hill Ryerson Limited 32 LO 1 THE MODE © 2012 McGraw-Hill Ryerson Limited 33 The Mode The mode is the value of the observation that appears most frequently. CHART 3–1 Number of Respondents Favoring Various Bath Oils LO 1 © 2012 McGraw-Hill Ryerson Limited 34 Example – The Mode Average earnings of Canadians 15 years and older with university degrees in selected cities are shown below. (Statistics Canada, Census of Population, 2001) What is the modal average earning? LO 1 City Salary ($) City Salary ($) Calgary 58000 Regina 46000 Edmonton 45000 Saint John 46000 Halifax 43000 Greater Sudbury 50000 Hamilton 52000 Toronto 56000 London 48000 Winnipeg 43000 Montreal 47000 Vancouver 46000 Ottawa-Gatineau 55000 Victoria 42000 © 2012 McGraw-Hill Ryerson Limited 35 Solution – The Mode A perusal of the earnings reveals that $46 000 appears more often (three times) than any other amount. Therefore, the mode is $46 000. LO 1 © 2012 McGraw-Hill Ryerson Limited 36 You Try It Out! The average weekly vehicle insurance benefits for 2008, by category, are: $433, $325, $176, $469, $245, and $287. The number of work stoppages in the manufacturing industry for selected months are 8, 5, 12, 7, 9, and 0. a) a) b) What is the median monthly benefit? How many observations are below the median? Above it? b) c) LO 1 What is the median number of stoppages? How many observations are below the median? Above it? What is the modal number of work stoppages? © 2012 McGraw-Hill Ryerson Limited 37 The Mode In Excel 1. Select the Paste Function. 2. In the Function name list, select MODE. Click OK. A dialogue box opens. 3. Enter the range in the Number1 box. The answer appears in the dialogue box. Click OK. LO 2 © 2012 McGraw-Hill Ryerson Limited 38 Excel Solutions 1. Open Excel and the Excel file Table 02-1 from the DataSets on the CD provided. 2. From the menu bar, select Tools, Data Analysis, and Descriptive Statistics; then click OK. In Excel 2007, select Data in place of Tools. 3. Enter A1 : A97 as the Input Range. For Grouped By, select Columns to indicate that your data is in a column; select Labels in First Row to indicate that you have the label List Price in the first cell of the input range. Place the output in the same worksheet by entering C3 in the Output Range. 4. Select the Summary statistics box; click OK. LO 2 © 2012 McGraw-Hill Ryerson Limited 39 Excel Solutions LO 2 © 2012 McGraw-Hill Ryerson Limited 40 LO 3 THE RELATIVE POSITIONS OF THE MEAN, MEDIAN, AND MODE © 2012 McGraw-Hill Ryerson Limited 41 The Relative Positions of the Mean, Median and the Mode CHART 3-2 A Symmetric Distribution CHART 3-3 A Positively Skewed Distribution CHART 3-4 Negatively Skewed Distribution LO 3 © 2012 McGraw-Hill Ryerson Limited 42 You Try It Out! The monthly profits of Hi-Mark corporation were organized into a frequency distribution. The mean of monthly profits was computed to be $105 600, the median $104 700, and the mode $104 200. a) b) LO 3 Sketch the profit in the form of a smoothed frequency polygon. Note the location of the mean, median, and mode on the X-axis. Is the distribution symmetrical, positively skewed, or negatively skewed? Explain. © 2012 McGraw-Hill Ryerson Limited 43 LO 1 THE GEOMETRIC MEAN © 2012 McGraw-Hill Ryerson Limited 44 The Geometric Mean Useful in finding the average change of percentages, ratios, indexes, or growth rates over time Has a wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or build on each other LO 1 © 2012 McGraw-Hill Ryerson Limited 45 The Geometric Mean Will always be less than or equal to the arithmetic mean Defined as the nth root of the product of n values The formula for the geometric mean is written: LO 1 © 2012 McGraw-Hill Ryerson Limited 46 Example – The Geometric Mean The profit on investment earned by Super Constructions Company for five successive years was: 20 percent, 30 percent, 30 percent, 50 percent and 300 percent. What is the geometric mean rate of profit on investment? LO 1 © 2012 McGraw-Hill Ryerson Limited 47 Solution– The Geometric Mean GM 5 (1.2)(1.3)(0.7)(1.5)(4.0) 5 6.552 1.456 LO 1 © 2012 McGraw-Hill Ryerson Limited 48 Another Application Another use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to another. LO 1 © 2012 McGraw-Hill Ryerson Limited 49 Example – Another Application The population of Alberta grew from 2974807 in January 2001 to 3290350 in January 2007. What was the average annual rate of percentage increase during the period? LO 1 © 2012 McGraw-Hill Ryerson Limited 50 Solution – Another Application 3 290 350 GM 1 2 974 807 1.0169 1 0.0169 6 LO 1 © 2012 McGraw-Hill Ryerson Limited 51 You Try It Out! The percent increase in profits, for the last five years at Cure Medico are: 5.61, 7.85, 11.22, 19.77 and 23.87. a) b) c) LO 1 Find the geometric percent increase. Find the arithmetic percent increase. Is the arithmetic mean equal to or greater than the geometric mean? Sale of Shine Bulbs increased from 65 000 units in 2006 to 132 745 units in 2010. a) Find the geometric mean annual percent increase. © 2012 McGraw-Hill Ryerson Limited 52 Why Study Dispersion? A measure of location, such as the mean or the median, only describes the centre of the data, but it does not tell us anything about the spread of the data. For example, if your nature guide told you that the river ahead averaged 1 m in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth. A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions. © 2012 McGraw-Hill Ryerson Limited 53 Samples of Dispersions CHART 3-5 Histogram of Years of Employment at Hammond Iron Works Inc. CHART 3-6 Hourly Production of Computers at the Kanata and Waterloo Plans © 2012 McGraw-Hill Ryerson Limited 54 LO 4 MEASURES OF DISPERSION © 2012 McGraw-Hill Ryerson Limited 55 Range The range is the difference between the largest and the smallest value in a data set. LO 4 © 2012 McGraw-Hill Ryerson Limited 56 LO 5 ADVANTAGES and DISADVANTAGES of RANGE © 2012 McGraw-Hill Ryerson Limited 57 Advantages and Disadvantages of Range 1) Only two values are used in its calculation 2) It is influenced by extreme values. 3) It is easy to compute and understand. LO 5 © 2012 McGraw-Hill Ryerson Limited 58 Example – Range Below are the hourly production rates of Vision Quest computers at the Kanata and Waterloo plants. Find the range for each of the plants. CHART 3-6 Hourly Production of Computers at the Kanata and Waterloo Plants LO 5 © 2012 McGraw-Hill Ryerson Limited 59 Solution – Range Kanata: 52 – 48 =4 Waterloo: 60 – 40 = 20 LO 5 © 2012 McGraw-Hill Ryerson Limited 60 Mean Deviation The mean deviation is the arithmetic mean of the absolute values of the deviations from the arithmetic mean. LO 4 © 2012 McGraw-Hill Ryerson Limited 61 LO 5 ADVANTAGES AND DISADVANTAGES OF MEAN DEVIATION © 2012 McGraw-Hill Ryerson Limited 62 Advantages and Disadvantages of Mean Deviation 1) All values are used in the calculation. 2) It is easy to understand because it is the average of the deviations from the mean. 3) Major drawback is the use of absolute values. LO 5 © 2012 McGraw-Hill Ryerson Limited 63 Example – Mean Deviation The number of chocolate drinks sold at the Choco-Coffee shop in Ottawa between 5 and 8 p.m. for a sample of 6 days last year were 30, 50, 60, 70, 80 and 100. Determine the mean deviation. X X Number of Chocolate Drinks Sold Daily 30 (30 – 65) = –35 Absolute Deviation 35 50 (50 – 65) = –15 15 60 (60 – 65) = –5 5 70 (70 – 65) = 5 5 80 (80 – 65) = 15 15 100 (100 – 65) = 35 35 Total LO 5 110 © 2012 McGraw-Hill Ryerson Limited 64 Solution – Mean Deviation XX MeanDeviation n 110 6 18.33 LO 5 © 2012 McGraw-Hill Ryerson Limited 65 You Try It Out! The heights of a group of newly arrived plants in a nursery are given below (in centimetres) 10 a) b) c) LO 5 15 14 16 25 13 17 23 What is the range of the heights? Compute the arithmetic mean height. Compute the mean deviation of the heights. © 2012 McGraw-Hill Ryerson Limited 66 LO 4 VARIANCE AND STANDARD DEVIATION © 2012 McGraw-Hill Ryerson Limited 67 Population Variance The Population Variance is the arithmetic mean of the squared deviations from the population mean. Where: σ2 is the symbol for the population variance. It is usually referred to as “sigma squared.” μ is the arithmetic mean of the population. X is the value of each observation in the population. N is the number of observations in the population. LO 4 © 2012 McGraw-Hill Ryerson Limited 68 LO 5 ADVANTAGES AND DISADVANTAGES OF POPULATION VARIANCE © 2012 McGraw-Hill Ryerson Limited 69 Advantages and Disadvantages of Population Variance 1. Advantage: All values are used in the calculation 2. Disadvantage: The units are awkward, the square of the original units LO 5 © 2012 McGraw-Hill Ryerson Limited 70 Steps in computing Population Variance 1. Begin by finding the mean. 2. Next, find the difference between each observation and the mean. 3. Square the difference. 4. Sum all of the squared differences. 5. Divide the sum by the total number of observations in the population. LO 5 © 2012 McGraw-Hill Ryerson Limited 71 Example – Population Variance The ages of all the patients in the isolation ward of Mountainview Hospital are 35, 24, 11, 42, and 23 years. Determine the population variance. Age (X) X X 35 24 11 42 23 135 8 -3 -16 15 -4 0* 64 9 256 225 16 570 2 *Sum of the deviations from mean must equal 0. LO 5 © 2012 McGraw-Hill Ryerson Limited 72 Example – Population Variance X N 135 6 27 2 X 2 N 570 6 95 LO 5 © 2012 McGraw-Hill Ryerson Limited 73 Population Standard Deviation The population standard deviation σ is the square root of the population variance. LO 5 © 2012 McGraw-Hill Ryerson Limited 74 You Try It Out! An office of Fine Tune Telecommunications hired seven trainee engineers this year. Their monthly starting salaries were: $5356; $5651; $5423; $5534; $5467; $5289 and $5670. a) b) c) d) LO 5 Compute the population mean. Compute the population variance. Compute the population standard deviation. Another location hired eight trainee engineers. Their mean monthly salary was $5650, and the standard deviation was $480. Compare the two groups. © 2012 McGraw-Hill Ryerson Limited 75 Sample Variance The population standard deviation σ is the square root of the population variance. It tends to underestimate the population variance σ 2. The use of (n – 1) in the denominator provides the appropriate correction for this tendency. LO 5 © 2012 McGraw-Hill Ryerson Limited 76 Sample Variance Where: s2 is the sample variance. X is the vale of each observation in the sample. X is the mean of the sample. n is the number of observations in the sample. LO 5 © 2012 McGraw-Hill Ryerson Limited 77 Example – Sample Variance The sample of prices of different cuts of beef available at a restaurant is given below. $13, $21, $15, $19, and $22. What is the sample variance? LO 5 © 2012 McGraw-Hill Ryerson Limited 78 Solution – Sample Variance Deviation Method Note the sample mean is: 90 = 18 5 Price ($) X 13 21 15 19 22 90 LO 5 $ X X –5 3 –3 1 4 0 $2 X X 25 9 9 1 16 60 2 s 2 X X 2 n 1 60 5 1 15$2 © 2012 McGraw-Hill Ryerson Limited 79 Solution – Sample Variance Continued Direct Method Price ($) LO 5 2 $ 2 X 13 X 169 21 441 15 225 19 361 22 484 90 1680 s2 X2 X n 1 n 90 1680 5 1 2 2 5 60 5 1 15$2 © 2012 McGraw-Hill Ryerson Limited 80 LO 5 SAMPLE STANDARD DEVIATION © 2012 McGraw-Hill Ryerson Limited 81 Sample Standard Deviation The sample standard deviation is the square root of the sample variance. LO 5 © 2012 McGraw-Hill Ryerson Limited 82 Example – Sample Standard Deviation The sample variance in the previous example involving hourly quantity for cuts of beef is $152. What is the sample standard deviation? LO 5 © 2012 McGraw-Hill Ryerson Limited 83 Solution – Sample Standard Deviation The sample standard deviation is $3.87, found by 15 . Note again that the sample variance is in terms of dollars squared, but taking the square root of 15 gives us $3.87, which is in the same units (dollars) as the original data. LO 5 © 2012 McGraw-Hill Ryerson Limited 84 You Try It Out! A sample of seven employees who will stop working after a few months is given below: 3, 6, 2, 7, 1, 4 and 3. a) b) LO 5 What is the sample variance? Compute the sample standard deviation. © 2012 McGraw-Hill Ryerson Limited 85 LO 6 INTERPRETATION AND USES OF THE STANDARD DEVIATION © 2012 McGraw-Hill Ryerson Limited 86 Chebyshev’s Theorem For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least 1-1/k2, where k is any constant greater than 1. Allows us to determine the minimum proportion of the values that lie within a specified number of standard deviations of the mean. Concerned with any set of values; that is the distribution of values can have any shape. LO 6 © 2012 McGraw-Hill Ryerson Limited 87 Example – Chebyshev’s Theorem The average number of students present today in each class is 89.9, and the standard deviation is 11.31. At least what percent of students lie within plus and minus 3.5 standard deviations? LO 6 © 2012 McGraw-Hill Ryerson Limited 88 Solution – Chebyshev’s Theorem 1 1 1 1 2 k2 3.5 1 12.25 0.92 1 LO 6 © 2012 McGraw-Hill Ryerson Limited 89 LO 6 THE EMPIRICAL RULE © 2012 McGraw-Hill Ryerson Limited 90 The Empirical Rule For a symmetrical, bell-shaped frequency distribution: 1) Approximately 68 percent of the observations will lie within ±1 standard deviation of the mean. 2) About 95 percent of the observations will lie within ±2 standard deviations of the mean. 3) Practically all (99.7 percent) CHART 3–7 A Symmetrical Bell-shaped Curve Showing the relationship between the standard Deviation and the Observations will lie within ±3 standard deviations of the mean. LO 6 © 2012 McGraw-Hill Ryerson Limited 91 Example – The Empirical Rule A sample of the rental rates at the Street Park apartments approximates a symmetrical, bell-shaped distribution. The sample mean is $600; the standard deviation is $24. Using the Empirical Rule, answer these questions: 1. About 68 percent of the rental rates are between what two amounts? 2. About 95 percent of the rental rates are between what two amounts? 3. Almost all of the rental rates are between what two amounts? LO 6 © 2012 McGraw-Hill Ryerson Limited 92 Solution – The Empirical Rule 1. About 68 percent are between $576 ($600 - $24) and $624 ($600 + $24). 2. About 95 percent are between $552 ($600 - 2($24)) and $648 ($600 + 2($24)). 3. Almost all (99.7 percent) are between $528($600 - 3($24)) and $672 ($600 + 3($24)). LO 6 © 2012 McGraw-Hill Ryerson Limited 93 You Try It Out! The Quality Metal Company is one of several domestic manufacturers of PVC pipe. The quality control department sampled 700 20m lengths. At a point 2 m from the end of the pipe they measured the outside diameter. The mean was 2.4 m and the standard deviation 0.2 m. a) If the shape of the distribution is not known, at least what percent of the observations will lie between 2.05 m and 2.35 m? b) If we assume that the distribution of diameters is symmetrical and bell-shaped, about 68 percent of the observations will be between what two values? c) If we assume that the distribution of diameters is symmetrical and bell-shaped, about 95 percent of the observations will be between what two values? LO 6 © 2012 McGraw-Hill Ryerson Limited 94 LO 7 RELATIVE DISPERSION © 2012 McGraw-Hill Ryerson Limited 95 Relative Dispersion In order to make a meaningful comparison of different measures, we need to convert each of these measures to a relative value – that is, a percent. The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage. LO 7 © 2012 McGraw-Hill Ryerson Limited 96 Example – Relative Dispersion A study of the amount of money spent on the maintenance of a car and the years of service of the car resulted in these statistics: The mean amount spent was $150; the standard deviation was $30; the mean number of years of service was 15 years; the standard deviation was 1.5 years. Compare the relative dispersion in the two distributions using the coefficient of variation. LO 7 © 2012 McGraw-Hill Ryerson Limited 97 Solution – Relative Dispersion Money Spent Years of Service s CV (100) X $30 CV (100) $150 CV 20% s CV (100) X $1.5 CV (100) $15 CV 10% There is more dispersion relative to the mean in the distribution of money spent compared with the distribution of years of service (because 20 percent 10 percent). LO 7 © 2012 McGraw-Hill Ryerson Limited 98 You Try It Out! A large group of management trainees was given two types of aptitude tests, a quantitative aptitude test and a management aptitude test. The arithmetic mean score on the quantitative aptitude test was 400, with a standard deviation of 20. The mean was 50 and the standard deviation for the management aptitude test was 10. Compare the relative dispersion in the two groups. LO 7 © 2012 McGraw-Hill Ryerson Limited 99 LO 7 SKEWNESS © 2012 McGraw-Hill Ryerson Limited 100 Skewness In a symmetric set of observations the mean and median are equal and the data values are evenly spread around these values. The data values below the mean and median are a mirror image of those above. A set of values is skewed to the right or positively skewed if there is a single peak and the values extend much further to the right of the peak than to the left of the peak. In this case the mean is larger than the median. LO 7 © 2012 McGraw-Hill Ryerson Limited 101 Skewness In a negatively skewed distribution there is a single peak but the observations extend further to the left, in the negative direction, than to the right. In a negatively skewed distribution the mean is smaller than the median. A bimodal distribution will have two or more peaks. LO 7 © 2012 McGraw-Hill Ryerson Limited 102 Shapes of Frequency Polygons CHART 3 – 8 Shapes of Frequency Polygons LO 7 © 2012 McGraw-Hill Ryerson Limited 103 Pearson’s Coefficient of Skewness The major characteristics of the coefficient of skewness are: 1) It can range from –3.00 up to 3.00. 2) A value of 0 indicates a symmetric distribution. It is computed as follows: LO 7 © 2012 McGraw-Hill Ryerson Limited 104 Example – Skewness Following are the earnings per share, in dollars, for a sample of 16 software companies for the year 2008. $0.08 0.12 0.44 0.52 4.55 7.93 8.62 11.15 14.88 17.43 13.13 7.36 1.10 1.19 2.49 1.18 The mean is $5.76. The standard deviation is $5.85. The median is $3.52. Find the coefficient of skewness using Pearson’s estimate. LO 7 © 2012 McGraw-Hill Ryerson Limited 105 Solution – Skewness 3($5.76 $3.52) sk $5.85 1.143 LO 7 © 2012 McGraw-Hill Ryerson Limited 106 Skewness In Excel LO 7 © 2012 McGraw-Hill Ryerson Limited 107 You Try It Out! A sample of ten customer care technicians employed in the customer service department of a large telecommunication company received the following number of calls yesterday: 85, 78, 90, 97, 86, 72, 95, 89, 90, and 75. a) b) c) LO 7 Find the mean, median, and the standard deviation. Compute the coefficient of skewness using Pearson’s method. What is your conclusion regarding the skewness of the data? © 2012 McGraw-Hill Ryerson Limited 108 LO 8 OTHER MEASURES OF DISPERSION © 2012 McGraw-Hill Ryerson Limited 109 Other Measures of Dispersion Quartiles divide a set of observations into four equal parts. Deciles divide a set of observations into 10 equal parts. Percentiles divide a set of observations into 100 equal parts. LO 8 © 2012 McGraw-Hill Ryerson Limited 110 Example – Percentiles Listed below are the commissions earned, in dollars, last month by a sample of 17 brokers at a real estate office. $2345 1657 1876 1235 2789 2354 1987 2309 1985 2001 1967 1976 2343 2654 2313 1768 2650 Locate the median, the first quartile, and the third quartile for the commissions earned. LO 8 © 2012 McGraw-Hill Ryerson Limited 111 Solution - Percentiles The first step is to organize the data from the smallest commission to the largest. $1235 1657 1768 1876 1967 1976 1985 1987 2001 2309 2313 2343 2345 2354 2650 2654 2789 25 L25 (17 1) 4.5 $1967 100 50 L50 (17 1) 9 $2001 100 L75 (17 1) LO 8 75 13.5 $2345 100 © 2012 McGraw-Hill Ryerson Limited 112 You Try It Out! The quality control department of a tomato sauce company is responsible for checking the mass of the 250g jar of tomato sauce. The masses of a sample of eight jars produced last hour are: 251 249 246 248 250 250 243 242 245 a) b) LO 8 What is the median mass? Determine the masses corresponding to the first and third quartiles. © 2012 McGraw-Hill Ryerson Limited 113 LO 9 BOX PLOT © 2012 McGraw-Hill Ryerson Limited 114 Box Plots A box plot is a graphical display, based on quartiles, that helps to picture a set of data Five pieces of data are needed to construct a box plot: 1. 2. 3. 4. 5. LO 9 Minimum value First quartile Median Third quartile Maximum value © 2012 McGraw-Hill Ryerson Limited 115 EXAMPLE – Box Plots Stuart’s Pizza offers free delivery of its pizza within 15 km. Stuart, the owner, wants some information on the time it takes for delivery. How long does a typical delivery take? Within what range of times will most deliveries be completed? For a sample of 20 deliveries, he determined the following information: Minimum value = 15 minutes 1st Quartile (Q1) = 16 minutes Median = 18 minutes 3rd Quartile (Q3) = 26 minutes Maximum value = 31 minutes Develop a box plot for the delivery times. LO 9 © 2012 McGraw-Hill Ryerson Limited 116 Solution – Box Plots LO 9 © 2012 McGraw-Hill Ryerson Limited 117 You Try It Out! The following box plot is given. What are the median, the largest and smallest values, and the first and third quartiles? Would you agree that the distribution is symmetrical? LO 9 © 2012 McGraw-Hill Ryerson Limited 118 THE MEAN, MEDIAN, AND STANDARD DEVIATION OF GROUPED DATA © 2012 McGraw-Hill Ryerson Limited 119 The Mean of Grouped Data To approximate the arithmetic mean of data organized into a frequency distribution, we begin by assuming the observations in each class are represented by the midpoint of the class Where: X is the designation for each sample mean. X is the midpoint of each class. f is the frequency in each class. fX is the frequency in each class times the midpoint of the class. fX is the sum of these products. n is the total number of frequencies. © 2012 McGraw-Hill Ryerson Limited 120 Example – The Mean of Grouped Data The computations for the arithmetic mean of data grouped into a frequency distribution will be shown based on the real estate data, from the south-east area of Edmonton. The frequency distribution for the list prices is given here. Determine the arithmetic mean of the listed prices. List Price ($ thousands) Frequency $250 to under $375 33 375 to under 500 41 500 to under 625 11 625 to under 750 5 750 to under 875 4 875 to under 1000 1 1000 to under 1125 1 Total 96 © 2012 McGraw-Hill Ryerson Limited 121 Solution – The Mean of Grouped Data List Prices, South-East Area of Edmonton List Price ($ thousands) Frequency f Midpoint ($) X fX ($) $250 to under $375 33 $312.5 $10312.5 375 to under 500 41 437.5 17937.5 500 to under 625 11 562.5 6187.5 625 to under 750 5 687.5 3437.5 750 to under 875 4 812.5 3250.0 875 to under 1000 1 937.5 937.5 1000 to under 1125 1 1062.5 1062.5 Total 96 X fX n $43125.0 $43125 $449.2(thousands) 96 © 2012 McGraw-Hill Ryerson Limited 122 The Median of Grouped Data If the data is grouped some of the raw data values may not be available, and so we cannot necessarily determine the exact value of the median. But we can estimate the median by first finding the position of the median (which class it falls in), and then calculating an estimate of the median within this median class. Where: L is the lower limit of the median class. N is the size of the population. f is the frequency of the median class. fc is the cumulative frequencies up to but excluding the median class i is the class width of the median class © 2012 McGraw-Hill Ryerson Limited 123 Example – The Median of Grouped Data Professor Law lists the following quiz marks for his class of 30 students. Determine the median quiz mark. Note the marks have already been sorted from lowest to highest. Grade on Quiz 0 1 2 3 4 5 6 7 8 9 10 Number of Students 0 0 1 0 1 2 4 12 5 3 2 © 2012 McGraw-Hill Ryerson Limited 124 Solution – The Median of Grouped Data Find the position of the median mark. 50 L50 (30 1) 15.5 7 100 Quiz Mark Number of Students fc 0 0 0 1 0 0 2 1 1 3 0 1 4 1 2 5 2 4 6 4 8 7 12 20 8 5 25 9 3 28 10 2 30 © 2012 McGraw-Hill Ryerson Limited 125 The Standard Deviation of Grouped Data If the data of interest are in grouped form (in a frequency distribution), the sample standard deviation can be approximated by substituting ∑fX2 for ∑X2 and ∑fX for ∑X. Where: s X f n is the symbol for the sample standard deviation. is the midpoint of a class. is the class frequency. is the total number of sample observations. © 2012 McGraw-Hill Ryerson Limited 126 Solution – The Standard Deviation of Grouped Data List Prices, South-East Area of Edmonton fX ($) fX2 List Price ($ thousands) Frequency f Midpoint ($) X $250 to under $375 33 $312.5 $10312.5 3222656.25 375 to under 500 41 437.5 17937.5 7847656.25 500 to under 625 11 562.5 6187.5 3480468.75 625 to under 750 5 687.5 3437.5 2363281.25 750 to under 875 4 812.5 3250.0 2640625.00 875 to under 1000 1 937.5 937.5 878906.25 1000 to under 1125 1 1062.5 1062.5 1128906.24 Total 96 $43125.0 21562499.99 (43125)2 21562499.9996 s 151.83 96 1 © 2012 McGraw-Hill Ryerson Limited 127 You Try It Out! The net incomes of a sample of large art dealers were organized into the table given to the right. a) b) c) d) What is the table called? Based on the distribution, what is the estimate of the arithmetic mean net income? Based on the distribution, what is the estimate of the median net income? Based on the distribution, what is the estimate of the standard deviation? Net Income ($Millions) Number of Importers $4 to under $8 2 8 to under 12 3 12 to under 16 12 16 to under 20 7 20 to under 24 5 © 2012 McGraw-Hill Ryerson Limited 128 Chapter Summary I. A measure of central location is a value used to describe the centre of a set of data. A. The arithmetic mean is the most widely reported measure of central location. 1. It is calculated by adding the values of the observations and dividing by the total number of observations. a. The formula for a population mean of ungrouped or raw data is: X [3–1] N b. The formula for the mean of a sample is: X X= [3–2] n © 2012 McGraw-Hill Ryerson Limited 129 Chapter Summary c The formula for the sample mean of data in a frequency distribution is X = fX . [3–16] n 2. The major characteristics of the arithmetic mean are: a. At least the interval scale of measurement is required. b. All the data values are used in the calculation. c. A set of data has only one mean. That is, it is unique. d. The sum of the deviations from the mean equals 0. © 2012 McGraw-Hill Ryerson Limited 130 Chapter Summary B. The weighted mean is found by multiplying each observation by it’s corresponding weight. 1. The formula for determining the weighted mean is: X w w1 X1 w2 X 2 w3 X 3 wn X n [3–3] w1 w2 w3 wn 2. It is a special case of the arithmetic mean. C. The median is the value in the middle of a set of ordered data. 1. To find the median, sort the observations from smallest to largest and identify the middle value. 2. The major characteristics of the median are: a. At least the ordinal scale of measurement is required. © 2012 McGraw-Hill Ryerson Limited 131 Chapter Summary b. It is not influenced by extreme values. c. Fifty percent of the observations are larger than the median. d. It is unique to a set of data. 3. The formula for the median of grouped data is: N fc [3–17] 2 Median = L (i ) f D. The mode is the value that occurs most often in a set of data. 1. The mode can be found for nominal-level data. 2. A set of data can have more than one mode. © 2012 McGraw-Hill Ryerson Limited 132 Chapter Summary E. The geometric mean is the nth root of the product of n values. 1. The formula for the geometric mean is: [3–4] GM = n ( X1 )( X 2 )( X 3 ) ( X n ) 2. The geometric mean is also used to find the rate of change from one period to another. [3–5] Value at end of period GM = n Value at beginning of period 1 3. The geometric mean is always equal to or less than the arithmetic mean. © 2012 McGraw-Hill Ryerson Limited 133 Chapter Summary II. The dispersion is the variation or spread in a set of data. A. The range is the difference between the largest and the smallest value in a set of data. 1. The formula for the range is: Range Largest value Smallest value [3–6] 2. The major characteristics of the range are: a. Only two values are used in its calculation. b. It is influenced by extreme values. c. It is easy to compute and to understand. B. The mean absolute deviation is the sum of the absolute values of the deviations from the mean divided by the number of observations. © 2012 McGraw-Hill Ryerson Limited 134 Chapter Summary 1. The formula for computing the mean absolute deviation is: MD = X X | [3–7] n 2. The major characteristics of the mean absolute deviation are: a. It is not unduly influenced by large or small values. b. All observations are used in the calculation. c. The absolute values are difficult to work with. C. The variance is the mean of the squared deviations from the arithmetic mean. 1. The formula for the population variance is: X 2 2 [3–8] N © 2012 McGraw-Hill Ryerson Limited 135 Chapter Summary 2. The formula for the sample variance is: X X )2 s n 1 2 [3–10] 3. The major characteristics of the variance are: a. All observations are used in the calculation. b. It is not unduly influenced by extreme observations. c. The units are somewhat difficult to work with; they are the original units squared. D. The standard deviation is the square root of the variance. © 2012 McGraw-Hill Ryerson Limited 136 Chapter Summary 1. The major characteristics of the standard deviation are: a. It is in the same units as the original data. b. It is the square root of the average squared deviation from the mean. c. It cannot be negative. d. It is the most widely reported measure of dispersion. 2. The formula for the sample standard deviation is: [3–12] X )2 2 X s= n 1 n © 2012 McGraw-Hill Ryerson Limited 137 Chapter Summary 3. The formula for the standard deviation of fX )2 grouped data is: [3–18] fX 2 s= n1 n III. We interpret the standard deviation using two measures. A. Chebyshev’s theorem states that regardless of the shape of the distribution, at least 1 - 1/k2 of the observations will be within k standard deviations of the mean, where k is greater than 1. B. The Empirical Rule states that for a bell-shaped distribution about 68 percent of the values will be within 1 standard deviation of the mean, about 95 percent within 2, and virtually all within 3. © 2012 McGraw-Hill Ryerson Limited 138 Chapter Summary IV. The coefficient of variation is a measure of relative dispersion. A. The formula for the coefficient of variation is: s CV = (100) [3–13] X B. It reports the variation relative to the mean. C. It is useful for comparing distributions with different units. V. The coefficient of skewness measures the symmetry of a distribution. A. In a positively skewed set of data the long tail is to the right. B. In a negatively skewed distribution the long tail is to the left. © 2012 McGraw-Hill Ryerson Limited 139 Chapter Summary C. There are two formulas for the coefficient of skewness. 1. The formula developed by pearson is: sk = 3( X Median) s [3–14] 2. The coefficient of skewness computed by statistical software is: X X 3 n sk = ( n 1)( n 2) s VI. Measures of location also describe the spread in a set of observations. A. A quartile divides a set of observations into four equal parts. © 2012 McGraw-Hill Ryerson Limited 140 Chapter Summary 1. Twenty-five percent of the observations are less than the first quartile, 50 percent are less than the second quartile (the median), and 75 percent are less than the third quartile. 2. The interquartile range is the difference between the third and the first quartile. B. Deciles divide a set of observations into ten equal parts and percentiles into 100 equal parts. © 2012 McGraw-Hill Ryerson Limited 141 Chapter Summary C. A box plot is a graphic display of a set of data. 1. A box is drawn enclosing the regions between the first and third quartiles. a. A line is drawn inside the box at the median value. b. Dotted line segments are drawn from the third quartile to the largest value to show the highest 25 percent of the values and from the first quartile to the smallest value to show the lowest 25 percent of the values. 2. A box plot is based on five statistics: the maximum and minimum values, the first and third quartiles, and the median. © 2012 McGraw-Hill Ryerson Limited 142