Business Statistics, 4e
by Ken Black
Chapter 3: Descriptive Statistics
Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons.

Learning Objectives
• Distinguish between measures of central tendency, measures of variability, measures of shape, and measures of association.
• Understand the meanings of mean, median, mode, quartile, percentile, and range.
• Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation on ungrouped data.
• Differentiate between sample and population variance and standard deviation.

Learning Objectives -- Continued
• Understand the meaning of standard deviation as it is applied by using the empirical rule and Chebyshev’s theorem.
• Compute the mean, median, standard deviation, and variance on grouped data.
• Understand box and whisker plots, skewness, and kurtosis.
• Compute a coefficient of correlation and interpret it.

Measures of Central Tendency: Ungrouped Data
• Measures of central tendency yield information about “particular places or locations in a group of numbers.”
• Common Measures of Location
  – Mode
  – Median
  – Mean
  – Percentiles
  – Quartiles

Mode
• The most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more than two modes

Mode -- Example
• The mode is 44.
• There are more 44s than any other value.

  35  41  44  45
  37  41  44  46
  37  43  44  46
  39  43  44  46
  40  43  44  46
  40  43  45  48

Median
• Middle value in an ordered array of numbers
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely small values

Median: Computational Procedure
• First Procedure
  – Arrange the observations in an ordered array.
  – If there is an odd number of terms, the median is the middle term of the ordered array.
  – If there is an even number of terms, the median is the average of the middle two terms.
• Second Procedure
  – The median’s position in an ordered array is given by (n+1)/2.

Median: Example with an Odd Number of Terms
Ordered Array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
• If the 22 is replaced by 100, the median is 15.
• If the 3 is replaced by -103, the median is 15.

Median: Example with an Even Number of Terms
Ordered Array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
• There are 16 terms in the ordered array.
• Position of median = (n+1)/2 = (16+1)/2 = 8.5
• The median is between the 8th and 9th terms, 14.5.
• If the 21 is replaced by 100, the median is 14.5.
• If the 3 is replaced by -88, the median is 14.5.

Arithmetic Mean
• Commonly called ‘the mean’
• Is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set

Population Mean
$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$
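The mode, median, and population mean examples above can be checked with a minimal Python sketch using the standard-library `statistics` module. The variable names are illustrative and do not come from the slides.

```python
import statistics

# Data from the mode example slide (24 values).
mode_data = [
    35, 41, 44, 45,
    37, 41, 44, 46,
    37, 43, 44, 46,
    39, 43, 44, 46,
    40, 43, 44, 46,
    40, 43, 45, 48,
]

# Ordered arrays from the two median examples.
odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]  # the same array without the final 22 (16 terms)

# Data from the population mean example.
population = [24, 13, 19, 26, 11]

mode_value = statistics.mode(mode_data)
median_odd = statistics.median(odd_array)
median_even = statistics.median(even_array)
population_mean = statistics.mean(population)

# Replacing the largest value with an extreme one leaves the median unchanged.
median_with_extreme = statistics.median(odd_array[:-1] + [100])

print("mode:", mode_value)
print("median (17 terms):", median_odd)
print("median (16 terms):", median_even)
print("median with 22 replaced by 100:", median_with_extreme)
print("population mean:", population_mean)
```

The last median call illustrates the slide's point that the median is unaffected by extreme values, whereas the mean would shift.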
Sample Mean
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$$

Percentiles
• Measures of central tendency that divide a group of data into 100 parts
• At least n% of the data lie below the nth percentile, and at most (100 - n)% of the data lie above the nth percentile
• Example: 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it
• The median and the 50th percentile have the same value.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data

Percentiles: Computational Procedure
• Organize the data into an ascending ordered array.
• Calculate the percentile location: $i = \frac{P}{100}(n)$
• Determine the percentile’s location and its value.
• If i is a whole number, the percentile is the average of the values at the i and (i+1) positions.
• If i is not a whole number, the percentile is at the (i+1) position in the ordered array.

Percentiles: Example
• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Location of 30th percentile: $i = \frac{30}{100}(8) = 2.4$
• The location index, i, is not a whole number; i+1 = 2.4+1 = 3.4; the whole number portion is 3; the 30th percentile is at the 3rd location of the array; the 30th percentile is 13.
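The percentile location rule above can be sketched in Python as follows. The function name `percentile` and the restriction to integer values of P strictly between 0 and 100 are assumptions made for this illustration; other textbooks and software packages use different percentile conventions, so results may differ from library functions.

```python
def percentile(values, p):
    """Return the p-th percentile using the location rule from the slides.

    Assumes p is an integer with 0 < p < 100 and values is non-empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")

    ordered = sorted(values)
    n = len(ordered)

    # i = (p / 100) * n, split into its whole part and remainder.
    # Integer arithmetic avoids floating-point surprises such as 0.3 * 8.
    whole, remainder = divmod(p * n, 100)

    if remainder == 0:
        # i is a whole number: average the values at positions i and i + 1
        # (1-based), which are indexes whole - 1 and whole (0-based).
        return (ordered[whole - 1] + ordered[whole]) / 2

    # i is not a whole number: use the whole-number part of i + 1 (1-based),
    # which is index whole (0-based).
    return ordered[whole]


raw_data = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile(raw_data, 30)
p50 = percentile(raw_data, 50)
print("30th percentile:", p30)
print("50th percentile:", p50)
```

For the raw data on the slide, the 30th percentile comes out at the 3rd position of the ordered array, and the 50th percentile equals the median of the eight values.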
Quartiles
• Measures of central tendency that divide a group of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile
• Q1 is equal to the 25th percentile
• Q2 is located at 50th percentile and equals the median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the data set

Quartiles
[Figure: Q1, Q2, and Q3 divide the data into four parts, each containing 25% of the values.]

Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121, 122, 125, 129
• Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
• Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
• Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$

Variability
[Figure: two cash-flow charts plotted around the mean, one labeled "No Variability in Cash Flow" and one labeled "Variability in Cash Flow".]

Variability
[Figure: two panels labeled "Variability" and "No Variability".]

Measures of Variability: Ungrouped Data
• Measures of variability describe the spread or the dispersion of a set of data.
• Common Measures of Variability
  – Range
  – Interquartile Range
  – Mean Absolute Deviation
  – Variance
  – Standard Deviation
  – Z scores
  – Coefficient of Variation

Range
• The difference between the largest and the smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest - Smallest = 48 - 35 = 13

  35  41  44  45
  37  41  44  46
  37  43  44  46
  39  43  44  46
  40  43  44  46
  40  43  45  48

Interquartile Range
• Range of values between the first and third quartiles
• Range of the “middle half”
• Less influenced by extremes

$$\text{Interquartile Range} = Q_3 - Q_1$$

Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
• Deviations from the mean: -8, -4, 3, 4, 5
[Figure: number line from 0 to 20 showing the deviations -8, -4, +3, +4, and +5 from the mean.]

Mean Absolute Deviation
• Average of the absolute deviations from the mean

     X    X - μ   |X - μ|
     5     -8       8
     9     -4       4
    16     +3       3
    17     +4       4
    18     +5       5
  Total     0      24

$$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$

Population Variance
• Average of the squared deviations from the arithmetic mean

     X    X - μ   (X - μ)²
     5     -8       64
     9     -4       16
    16     +3        9
    17     +4       16
    18     +5       25
  Total     0      130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

Population Standard Deviation
• Square root of the variance
• Uses the same data and squared deviations as the population variance example

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0 \qquad \sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$$

Sample Variance
• Sum of the squared deviations from the sample mean, divided by n - 1

      X      X - X̄    (X - X̄)²
    2,398     625     390,625
    1,844      71       5,041
    1,539    -234      54,756
    1,311    -462     213,444
    7,092       0     663,866   (totals)

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

Sample Standard Deviation
• Square root of the sample variance
• Uses the same data and squared deviations as the sample variance example

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67 \qquad S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$
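The mean absolute deviation, variance, and standard deviation calculations above can be reproduced with a short Python sketch. The function names and the `sample` flag are illustrative choices, not notation from the slides; the only difference between the population and sample versions is the divisor (N versus n - 1).

```python
import math


def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def variance(values, sample=False):
    """Population variance (divide by N) or sample variance (divide by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return sum_of_squares / divisor


def standard_deviation(values, sample=False):
    """Square root of the corresponding variance."""
    return math.sqrt(variance(values, sample=sample))


population = [5, 9, 16, 17, 18]
sample_data = [2398, 1844, 1539, 1311]

mad = mean_absolute_deviation(population)
pop_var = variance(population)
pop_sd = standard_deviation(population)
samp_var = variance(sample_data, sample=True)
samp_sd = standard_deviation(sample_data, sample=True)

print("M.A.D.:", mad)
print("population variance:", pop_var)
print("population standard deviation:", round(pop_sd, 1))
print("sample variance:", round(samp_var, 2))
print("sample standard deviation:", round(samp_sd, 2))
```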
Uses of Standard Deviation
• Indicator of financial risk
• Quality Control
  – construction of quality control charts
  – process capability studies
• Comparing populations
  – household incomes in two cities
  – employee absenteeism at two plants

Standard Deviation as an Indicator of Financial Risk

                        Annualized Rate of Return
  Financial Security        μ          σ
          A                15%         3%
          B                15%         7%

Empirical Rule
• Data are normally distributed (or approximately normal)

  Distance from the Mean    Percentage of Values Falling Within Distance
        μ ± 1σ                              68
        μ ± 2σ                              95
        μ ± 3σ                              99.7

Chebyshev’s Theorem
• Applies to all distributions

$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2} \quad \text{for } k > 1$$

Chebyshev’s Theorem
• Applies to all distributions

  Number of Standard Deviations   Distance from the Mean   Minimum Proportion of Values Falling Within Distance
              K = 2                      μ ± 2σ                 1 - 1/2² = 0.75
              K = 3                      μ ± 3σ                 1 - 1/3² = 0.89
              K = 4                      μ ± 4σ                 1 - 1/4² = 0.94

Coefficient of Variation
• Ratio of the standard deviation to the mean, expressed as a percentage
• Measurement of relative dispersion

$$C.V. = \frac{\sigma}{\mu}(100)$$

Coefficient of Variation

$$\mu_1 = 29, \quad \sigma_1 = 4.6, \quad C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$$

$$\mu_2 = 84, \quad \sigma_2 = 10, \quad C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$$

Measures of Central Tendency and Variability: Grouped Data
• Measures of Central Tendency
  – Mean
  – Median
  – Mode
• Measures of Variability
  – Variance
  – Standard Deviation

Mean of Grouped Data
• Weighted average of class midpoints
• Class frequencies are the weights

$$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$$
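Looking back at the Chebyshev’s theorem and coefficient of variation slides, both formulas are simple enough to check with a minimal Python sketch. The function names `chebyshev_minimum_proportion` and `coefficient_of_variation` are hypothetical names chosen for this illustration.

```python
def chebyshev_minimum_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean.

    Chebyshev's theorem applies for k > 1 and to any distribution.
    """
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# Reproduce the table of minimum proportions for k = 2, 3, 4.
for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_minimum_proportion(k):.2f}")

# Reproduce the two coefficient of variation examples.
cv1 = coefficient_of_variation(4.6, 29)
cv2 = coefficient_of_variation(10, 84)
print(f"C.V. 1 = {cv1:.2f}")
print(f"C.V. 2 = {cv2:.2f}")
```

Although the second data set has the larger standard deviation (10 versus 4.6), its coefficient of variation is smaller, which is the sense in which the measure captures relative dispersion.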
Calculation of Grouped Mean

  Class Interval   Frequency (f)   Class Midpoint (M)     fM
  20-under 30            6                25              150
  30-under 40           18                35              630
  40-under 50           11                45              495
  50-under 60           11                55              605
  60-under 70            3                65              195
  70-under 80            1                75               75
  Total                 50                               2150

$$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$$

Median of Grouped Data

$$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$$

Where:
L = the lower limit of the median class
cfp = cumulative frequency of class preceding the median class
fmed = frequency of the median class
W = width of the median class
N = total of frequencies

Median of Grouped Data -- Example

  Class Interval   Frequency   Cumulative Frequency
  20-under 30          6               6
  30-under 40         18              24
  40-under 50         11              35
  50-under 60         11              46
  60-under 70          3              49
  70-under 80          1              50
                    N = 50

$$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$$

Mode of Grouped Data
• Midpoint of the modal class
• Modal class has the greatest frequency

  Class Interval   Frequency
  20-under 30          6
  30-under 40         18
  40-under 50         11
  50-under 60         11
  60-under 70          3
  70-under 80          1

$$\text{Mode} = \frac{30 + 40}{2} = 35$$

Variance and Standard Deviation of Grouped Data

Population:
$$\sigma^2 = \frac{\sum f(M - \mu)^2}{N} \qquad \sigma = \sqrt{\sigma^2}$$

Sample:
$$S^2 = \frac{\sum f(M - \bar{X})^2}{n - 1} \qquad S = \sqrt{S^2}$$

Population Variance and Standard Deviation of Grouped Data

  Class Interval     f     M     fM    M - μ   (M - μ)²   f(M - μ)²
  20-under 30        6    25    150    -18       324        1944
  30-under 40       18    35    630     -8        64        1152
  40-under 50       11    45    495      2         4          44
  50-under 60       11    55    605     12       144        1584
  60-under 70        3    65    195     22       484        1452
  70-under 80        1    75     75     32      1024        1024
  Total             50         2150                         7200

$$\sigma^2 = \frac{\sum f(M - \mu)^2}{N} = \frac{7200}{50} = 144 \qquad \sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$$

Measures of Shape
• Skewness
  – Absence of symmetry
  – Extreme values in one side of a distribution
• Kurtosis
  – Peakedness of a distribution
  – Leptokurtic: high and thin
  – Mesokurtic: normal shape
  – Platykurtic: flat and spread out
• Box and Whisker Plots
  – Graphic display of a distribution
  – Reveals skewness

Skewness
[Figure: three distribution curves labeled "Negatively Skewed", "Symmetric (Not Skewed)", and "Positively Skewed".]

Skewness
[Figure: relative positions of the mean, median, and mode in negatively skewed, symmetric (not skewed), and positively skewed distributions.]

Coefficient of Skewness
• Summary measure for skewness

$$S = \frac{3(\mu - M_d)}{\sigma}$$

• If S < 0, the distribution is negatively skewed (skewed to the left).
• If S = 0, the distribution is symmetric (not skewed).
• If S > 0, the distribution is positively skewed (skewed to the right).

Coefficient of Skewness

$$\mu_1 = 23, \quad M_{d1} = 26, \quad \sigma_1 = 12.3, \quad S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$$

$$\mu_2 = 26, \quad M_{d2} = 26, \quad \sigma_2 = 12.3, \quad S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$$

$$\mu_3 = 29, \quad M_{d3} = 26, \quad \sigma_3 = 12.3, \quad S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$$

Kurtosis
• Peakedness of a distribution
  – Leptokurtic: high and thin
  – Mesokurtic: normal in shape
  – Platykurtic: flat and spread out
[Figure: three curves labeled "Leptokurtic", "Mesokurtic", and "Platykurtic".]
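The grouped-data formulas and the coefficient of skewness from the slides above can be sketched in Python as follows. The representation of each class as a `(lower limit, upper limit, frequency)` tuple and all function names are assumptions made for this illustration; the sketch also assumes every class has a positive frequency.

```python
import math

# (lower limit, upper limit, frequency) for each class interval on the slides.
classes = [
    (20, 30, 6),
    (30, 40, 18),
    (40, 50, 11),
    (50, 60, 11),
    (60, 70, 3),
    (70, 80, 1),
]


def grouped_mean(classes):
    """Weighted average of class midpoints, weighted by class frequency."""
    total = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total


def grouped_median(classes):
    """Median = L + ((N/2 - cfp) / fmed) * W, using the median class."""
    total = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if cumulative + f >= total / 2:
            return lo + (total / 2 - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("classes must contain at least one observation")


def grouped_mode(classes):
    """Midpoint of the class with the greatest frequency."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2


def grouped_variance(classes, sample=False):
    """Sum of f * (M - mean)^2 divided by N (population) or n - 1 (sample)."""
    total = sum(f for _, _, f in classes)
    mean = grouped_mean(classes)
    sum_of_squares = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes)
    return sum_of_squares / (total - 1 if sample else total)


def coefficient_of_skewness(mean, median, std_dev):
    """S = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


mean = grouped_mean(classes)
median = grouped_median(classes)
mode = grouped_mode(classes)
var = grouped_variance(classes)
sd = math.sqrt(var)

print("grouped mean:", mean)
print("grouped median:", round(median, 3))
print("grouped mode:", mode)
print("population variance:", var)
print("population standard deviation:", sd)
print("skewness (mean 23, median 26):", round(coefficient_of_skewness(23, 26, 12.3), 2))
```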
Box and Whisker Plot
• Five specific values are used:
  – Median, Q2
  – First quartile, Q1
  – Third quartile, Q3
  – Minimum value in the data set
  – Maximum value in the data set
• Inner Fences
  – IQR = Q3 - Q1
  – Lower inner fence = Q1 - 1.5 IQR
  – Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
  – Lower outer fence = Q1 - 3.0 IQR
  – Upper outer fence = Q3 + 3.0 IQR

Box and Whisker Plot
[Figure: box and whisker plot marking the Minimum, Q1, Q2, Q3, and Maximum.]

Skewness: Box and Whisker Plots, and Coefficient of Skewness
[Figure: three box and whisker plots labeled "S < 0, Negatively Skewed", "S = 0, Symmetric (Not Skewed)", and "S > 0, Positively Skewed".]

Pearson Product-Moment Correlation Coefficient

$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$

$$-1 \le r \le 1$$

Three Degrees of Correlation
[Figure: three scatter plots labeled "r < 0", "r > 0", and "r = 0".]

Computation of r for the Economics Example (Part 1)

  Day          Interest X   Futures Index Y      X²         Y²          XY
   1              7.43           221            55.205     48,841     1,642.03
   2              7.48           222            55.950     49,284     1,660.56
   3              8.00           226            64.000     51,076     1,808.00
   4              7.75           225            60.063     50,625     1,743.75
   5              7.60           224            57.760     50,176     1,702.40
   6              7.63           223            58.217     49,729     1,701.49
   7              7.68           223            58.982     49,729     1,712.64
   8              7.67           226            58.829     51,076     1,733.42
   9              7.59           226            57.608     51,076     1,715.34
  10              8.07           235            65.125     55,225     1,896.45
  11              8.03           233            64.481     54,289     1,870.99
  12              8.00           241            64.000     58,081     1,928.00
  Summations     92.93         2,725           720.220    619,207    21,115.07

Computation of r for the Economics Example (Part 2)

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{21{,}115.07 - \frac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619{,}207 - \frac{(2725)^2}{12}\right]}} = .815$$
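The computation of r above can be reproduced with a minimal Python sketch of the computational formula. The function name `pearson_r` and the list names are illustrative; the data are the twelve days of interest rates and futures index values from the economics example.

```python
import math

# Twelve days of data from the economics example.
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures_index = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]


def pearson_r(xs, ys):
    """Pearson correlation via the sums-of-squares (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    ss_xy = sum_xy - sum_x * sum_y / n   # SSXY
    ss_x = sum_x2 - sum_x ** 2 / n       # SSX
    ss_y = sum_y2 - sum_y ** 2 / n       # SSY
    return ss_xy / math.sqrt(ss_x * ss_y)


r = pearson_r(interest, futures_index)
print("r =", round(r, 3))
```

The computational formula is convenient by hand because it only needs the column totals from Part 1; for data with very large values relative to their spread, the deviation form of the formula is usually less sensitive to rounding.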
Scatter Plot and Correlation Matrix for the Economics Example
[Figure: scatter plot of Futures Index (vertical axis, 220 to 245) against Interest (horizontal axis, 7.40 to 8.20).]

                   Interest    Futures Index
  Interest            1
  Futures Index    0.815254          1