Descriptive Statistics – Central Tendency & Variability
Chapter 3 (Part 2)
MSIS 111
Prof. Nick Dedeke

Learning Objectives
- Distinguish between measures of central tendency, measures of variability, measures of shape, and measures of association.
- Compute variance, standard deviation, and mean absolute deviation on ungrouped data.
- Understand the meaning of standard deviation as it is applied by using the empirical rule and Chebyshev’s theorem.
- Introduce skewness and box and whisker plots.

Measures of Central Tendency: Ungrouped Data
Measures of central tendency yield information about the center, or middle part, of a group of numbers.

Common Measures of Central Tendency
- Mode
- Median
- Mean
- Percentiles
- Quartiles

Exercise: Computing Central Tendency Measures using Frequency Tables
We want to choose one of the two suppliers. We have data about their lateness in delivery (in hours). Which one has better statistical measures of central tendency?

Supplier 1                      Supplier 2
Xi      Fi      Fi * Xi         Xi      Fi      Fi * Xi
1       2       2               0       2       0
4       4       16              1       0       0
6       3       18              4       3       12
10      3       30              6       5       30
12      2       24              10      4       40
        n=14    90                      n=14    82

Response: Computing Central Tendency Measures using Frequency Tables
Which one has better statistical measures of central tendency?

                   Supplier 1                 Supplier 2
Mode               4 hours                    6 hours
Median position    (n + 1)/2 = 15/2 = 7.5     (n + 1)/2 = 15/2 = 7.5
Median value       6 hours                    6 hours
Mean               90/14 ≈ 6.43 hours         82/14 ≈ 5.86 hours

Measures of Dispersion: Variability
[Figure: cash flows with no variability (same amounts) versus cash flows with variability (different amounts), each shown against its mean]

Measures of Variability: Ungrouped Data
Measures of variability describe the spread or the dispersion of a set of data.
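Returning to the supplier exercise above, the frequency-table calculations can be sketched in Python using the standard `statistics` module. This is a minimal sketch: the dictionary layout and the helper names `expand` and `central_tendency` are illustrative assumptions, not part of the course material.

```python
from statistics import mean, median, mode

# Frequency tables from the supplier exercise: {lateness in hours: frequency}.
supplier_1 = {1: 2, 4: 4, 6: 3, 10: 3, 12: 2}
supplier_2 = {0: 2, 1: 0, 4: 3, 6: 5, 10: 4}


def expand(freq_table):
    """Repeat each value Xi exactly Fi times (hypothetical helper)."""
    return [x for x, f in freq_table.items() for _ in range(f)]


def central_tendency(freq_table):
    """Return n, mode, median, and mean (2 decimals) for a frequency table."""
    data = expand(freq_table)
    return {
        "n": len(data),
        "mode": mode(data),
        "median": median(data),
        "mean": round(mean(data), 2),
    }


if __name__ == "__main__":
    for name, table in [("Supplier 1", supplier_1), ("Supplier 2", supplier_2)]:
        print(name, central_tendency(table))
```

Expanding the table keeps the code close to the hand calculation: the median is read off the ordered observations, just as the median position (n + 1)/2 is used on the slide.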
Common Measures of Variability
- Range
- Interquartile Range
- Mean Absolute Deviation
- Variance
- Standard Deviation
- Z scores
- Coefficient of Variation

Range
- The difference between the largest and the smallest values in a set of data
- Simple to compute
- Ignores all data points except the two extremes
- Weakness: depends only on two extreme values

Example data:
35  37  37  39  40  40
41  41  43  43  43  43
44  44  44  44  44
45  45  46  46  46  46  48

Range = Largest - Smallest = 48 - 35 = 13

Interquartile Range
- Range of values between the first and third quartiles
- Range of the middle 50% of the ordered data set
- Less influenced by extremes

$\text{Interquartile Range} = Q_3 - Q_1$

Deviation from the Mean
Data set: 5, 9, 16, 17, 18
Mean: $\mu = 13$
Deviations $(X_i - \mu)$ from the mean: -8, -4, +3, +4, +5
[Figure: number line from 0 to 20 showing each deviation from the mean]

Mean Absolute Deviation
Average of the absolute deviations from the mean ($\mu = 13$)

X       X - μ     |X - μ|
5       -8        8
9       -4        4
16      +3        3
17      +4        4
18      +5        5
Sum     0         24

$\text{M.A.D.} = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$

Variance and Standard Deviation of Grouped Data
Population: $\sigma^2 = \frac{\sum f (M - \mu)^2}{N}$, $\sigma = \sqrt{\sigma^2}$
Sample: $s^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}$, $s = \sqrt{s^2}$

Population Variance
Average of the squared deviations from the arithmetic mean ($\mu = 13$)

X       X - μ     (X - μ)²
5       -8        64
9       -4        16
16      +3        9
17      +4        16
18      +5        25
Sum     0         130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$

Population Standard Deviation
Square root of the variance

$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{26.0} \approx 5.1$

Mathematically, 26.0 has two square roots, +5.1 and -5.1; the standard deviation is taken as the positive root.

Computing Dispersion Measures for a Sample

Xi      Fi      Fi * Xi
55      2       110
60      1       60
100     3       300
125     5       625
140     4       560
Sum     15      1655

Mean = Σ(Fi * Xi) / ΣFi = 1655/15 = 110.33

Computing Dispersion Measures for Ungrouped Samples (Formula 1)

Xi      Fi      Fi * Xi     (Xi - Mean)     (Xi - Mean)²     Fi * (Xi - Mean)²
55      2       110         -55.33          3061.409         6122.818
60      1       60          -50.33          2533.109         2533.109
100     3       300         -10.33          106.709          320.127
125     5       625         +14.67          215.209          1076.045
140     4       560         +29.67          880.309          3521.236
Sum     15      1655                                         13573.335

Mean: $\bar{X} = \frac{\sum F_i X_i}{\sum F_i} = \frac{1655}{15} = 110.33$
Variance: $s^2 = \frac{\sum F_i (X_i - \bar{X})^2}{n - 1} = \frac{13573.335}{15 - 1} = 969.52$
Standard deviation: $s = \sqrt{969.52} = 31.137$ inches

Computing Dispersion Measures for Ungrouped Samples (Formula 2)

Xi      Fi      Fi * Xi     Xi²         Fi * Xi²
55      2       110         3025        6050
60      1       60          3600        3600
100     3       300         10000       30000
125     5       625         15625       78125
140     4       560         19600       78400
Sum     15      1655                    196175

Variance: $s^2 = \frac{\sum F_i X_i^2 - \frac{(\sum F_i X_i)^2}{n}}{n - 1} = \frac{196175 - \frac{1655^2}{15}}{15 - 1} = \frac{196175 - 182601.67}{14} = 969.52$
Standard deviation: $s = \sqrt{969.52} = 31.137$ inches

Both formulas give the same result; Formula 2 avoids computing each deviation from the mean.

Exercise: Dispersion Measures

Xi      Fi      Fi * Xi     Xi²     Fi * Xi²
5       2
6       1
10      3
12      2
14      1

Variance: $s^2 = \frac{\sum F_i X_i^2 - \frac{(\sum F_i X_i)^2}{n}}{n - 1} =$
Standard deviation (s) =

Exercise: Variability Measures with Frequency Tables
Which worker is more efficient?

Worker 1: Time in hours to do work      Worker 2: Time in hours to do work
Xi      Fi                              Xi      Fi
5       2                               5       0
6       1                               6       3
10      3                               10      4
12      2                               12      1
14      1                               14      1
        n=9                                     n=9

Mode =                                  Mode =
Median position =                       Median position =
Median value =                          Median value =
Mean =                                  Mean =

In-Class Exercise
For the supplier selection problem, calculate the standard deviation for Supplier 1. Put your names on the paper that you use.

Response: Computing Central Tendency Measures using Frequency Tables
Which one has better statistical measures of central tendency?

                   Supplier 1                 Supplier 2
Mode               4 hours                    6 hours
Median position    (n + 1)/2 = 15/2 = 7.5     (n + 1)/2 = 15/2 = 7.5
Median value       6 hours                    6 hours
Mean               90/14 ≈ 6.43 hours         82/14 ≈ 5.86 hours

Exercise: Computing Standard Deviation using Frequency Tables
Which one has better statistical measures of central tendency?
Supplier 2 (mean = 82/14 ≈ 5.86 hours)

Xi      Fi      Fi * Xi     (Xi - Mean)              (Xi - Mean)²     Fi * (Xi - Mean)²
0       2       0           (0 - 5.86) = -5.86       34.340           68.679
1       0       0           (1 - 5.86) = -4.86       23.620           0.000
4       3       12          (4 - 5.86) = -1.86       3.460            10.379
6       5       30          (6 - 5.86) = +0.14       0.020            0.098
10      4       40          (10 - 5.86) = +4.14      17.140           68.558
        n=14    82                                                    147.714

Mode = 6 hours
Median position = (n + 1)/2 = 15/2 = 7.5
Median value = 6 hours
Mean = 82/14 ≈ 5.86 hours
Variance (s²) = 147.714/(14 - 1) = 11.363 hours²
Standard deviation (s) = 3.371 hours

Exercise: Computing Standard Deviation using Frequency Tables
Which one has better statistical measures of central tendency?

Supplier 1 (mean = 90/14 ≈ 6.43 hours)

Xi      Fi      Fi * Xi     (Xi - Mean)              (Xi - Mean)²     Fi * (Xi - Mean)²
1       2       2           (1 - 6.43) = -5.43       29.485           58.970
4       4       16          (4 - 6.43) = -2.43       5.905            23.620
6       3       18          (6 - 6.43) = -0.43       0.185            0.555
10      3       30          (10 - 6.43) = +3.57      12.745           38.235
12      2       24          (12 - 6.43) = +5.57      31.025           62.050
        n=14    90                                                    183.430

Mode = 4 hours
Median position = (n + 1)/2 = 15/2 = 7.5
Median value = 6 hours
Mean = 90/14 ≈ 6.43 hours
Variance (s²) = 183.430/(14 - 1) = 14.110 hours²
Standard deviation (s) = 3.756 hours

Which supplier is better? Why?
We want to choose one of the two suppliers. We have data about their lateness in delivery (in hours).

              Mode     Median position    Median     Mean          Variance        Std. deviation
Supplier 1    4 hrs    7.5                6 hours    6.43 hours    14.11 hours²    3.76 hours
Supplier 2    6 hrs    7.5                6 hours    5.86 hours    11.36 hours²    3.37 hours

Which supplier is better? Supplier 2. Why?

              Mode     Median position    Median     Mean          Variance        Std. deviation    Mean ± 1 std. deviation
Supplier 1    4 hrs    7.5                6 hours    6.43 hours    14.11 hours²    3.76 hours        2.67 to 10.19 hours
Supplier 2    6 hrs    7.5                6 hours    5.86 hours    11.36 hours²    3.37 hours        2.49 to 9.23 hours

The medians are equal, but Supplier 2 has the lower mean lateness and the smaller standard deviation, so its deliveries are both less late on average and more consistent.
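The standard deviation calculations above can be cross-checked with the shortcut formula (Formula 2). The following is a minimal Python sketch; the function names `sample_variance` and `sample_std` and the dictionary layout are illustrative assumptions. It is checked here against the worked sample from the deck (variance 969.52, standard deviation 31.137) and can then be applied to the two supplier tables.

```python
from math import sqrt


def sample_variance(freq_table):
    """Sample variance from {Xi: Fi} using the shortcut formula:
    (sum(Fi * Xi^2) - (sum(Fi * Xi))^2 / n) / (n - 1)."""
    n = sum(freq_table.values())
    sum_fx = sum(f * x for x, f in freq_table.items())
    sum_fx2 = sum(f * x ** 2 for x, f in freq_table.items())
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


def sample_std(freq_table):
    """Sample standard deviation: positive square root of the variance."""
    return sqrt(sample_variance(freq_table))


# Worked sample from the deck (n = 15, sum of Fi * Xi = 1655).
worked_example = {55: 2, 60: 1, 100: 3, 125: 5, 140: 4}

# Supplier frequency tables: {lateness in hours: frequency}.
supplier_1 = {1: 2, 4: 4, 6: 3, 10: 3, 12: 2}
supplier_2 = {0: 2, 1: 0, 4: 3, 6: 5, 10: 4}

if __name__ == "__main__":
    print(round(sample_variance(worked_example), 2))  # 969.52
    print(round(sample_std(worked_example), 3))       # 31.137
    for name, table in [("Supplier 1", supplier_1), ("Supplier 2", supplier_2)]:
        print(name, round(sample_variance(table), 2), round(sample_std(table), 2))
```

Because the shortcut formula uses the exact sums rather than a rounded mean, its results can differ in the last digit from a table built on a mean rounded to two decimals.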
Grouped Data Examples

Class interval       Freq (Fi)    M (midpoint)    Fi * M        Fi * M²
[1 – 3) inches       16           2 inches        32 inches     64 inches²
[3 – 5) inches       2            4 inches        8 inches      32 inches²
[5 – 7) inches       4            6 inches        24 inches     144 inches²
[7 – 9) inches       3            8 inches        24 inches     192 inches²
[9 – 11) inches      9            10 inches       90 inches     900 inches²
[11 – 13) inches     6            12 inches       72 inches     864 inches²
Sum                  40                           250           2,196

Variance: $s^2 = \frac{\sum F_i M_i^2 - \frac{(\sum F_i M_i)^2}{n}}{n - 1} = \frac{2196 - \frac{250^2}{40}}{40 - 1} = \frac{2196 - 1562.5}{39} = 16.24$
Standard deviation: $s = \sqrt{16.24} = 4.03$ inches

Grouped Data Exercise
In-class exercise: grouped data

Class interval        Freq (Fi)    M    Fi * M    Fi * M²
[1 – 4) inches        4
[4 – 8) inches        4
[8 – 12) inches       6
[12 – 16) inches      12
[16 – 20) inches      8
[20 – 24) inches      6
Sum                   40

Variance: $s^2 = \frac{\sum F_i M_i^2 - \frac{(\sum F_i M_i)^2}{n}}{n - 1} =$
Standard deviation (s) =

Uses of Standard Deviation
- Indicator of financial risk
- Quality control
  - construction of quality control charts
  - process capability studies
- Comparing populations
  - household incomes in two cities
  - employee absenteeism at two plants

Standard Deviation as an Indicator of Financial Risk

                      Annualized Rate of Return
Financial Security    Mean (μ)    Standard deviation (σ)
A                     15%         3%
B                     15%         7%

Measures of Shape
- Skewness
  - Absence of symmetry
  - Extreme values on one side of a distribution
- Kurtosis
  - Peakedness of a distribution
  - Leptokurtic: high and thin
  - Mesokurtic: normal shape
  - Platykurtic: flat and spread out
- Box and Whisker Plots
  - Graphic display of a distribution
  - Reveals skewness

Relationship of Mean, Median and Mode
[Figures: relationship of mean, median, and mode]

Empirical Rule
If data are normally distributed (or approximately normal):

Distance from the mean    Percentage of values falling within distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7

Chebyshev’s Theorem
Applies to all distributions

$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2}$ for $k > 1$

Number of standard deviations    Distance from the mean    Minimum proportion of values falling within distance
K = 2                            μ ± 2σ                    1 - (1/2)² = 0.75
K = 3                            μ ± 3σ                    1 - (1/3)² = 0.89
K = 4                            μ ± 4σ                    1 - (1/4)² = 0.94

Box and Whisker Plot
Five specific values are used:
- Median, Q2
- First quartile, Q1
- Third quartile, Q3
- Minimum value in the data set
- Maximum value in the data set

Inner Fences
- IQR = Q3 - Q1
- Lower inner fence = Q1 - 1.5 IQR
- Upper inner fence = Q3 + 1.5 IQR

Outer Fences
- Lower outer fence = Q1 - 3.0 IQR
- Upper outer fence = Q3 + 3.0 IQR

Box and Whisker Plot
[Figure: box and whisker plot marking Minimum, Q1, Q2, Q3, and Maximum]

Exercises
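As a starting point for practice, Chebyshev’s bound and the fence formulas above translate directly into code. The sketch below is illustrative only: the function names are assumptions, and the quartile values Q1 = 41 and Q3 = 45 are hypothetical inputs chosen for the example, not values computed in the deck.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean.

    Chebyshev's theorem holds for any distribution, but only for k > 1.
    """
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def fences(q1, q3):
    """Inner and outer fences of a box and whisker plot from Q1 and Q3."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, round(chebyshev_min_proportion(k), 2))  # 0.75, 0.89, 0.94

    # Hypothetical quartiles, for illustration only.
    print(fences(41, 45))
```

Taking Q1 and Q3 as inputs sidesteps the choice of quartile method, which varies between textbooks and software.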