Business Statistics
Spring 2005

Summarizing and Describing Numerical Data

Topics
• Measures of Central Tendency: Mean, Median, Mode, Midrange, Midhinge
• Quartile
• Measures of Variation: The Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation
• Shape: Symmetric, Skewed, using Box-and-Whisker Plots

Numerical Data Properties
• Central Tendency (Location)
• Variation (Dispersion)
• Shape

Measures of Central Tendency for Ungrouped Data
Raw Data

Summary Measures
• Central Tendency: Mean, Median, Mode, Midrange, Midhinge
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Central Tendency
• Mean: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
• Median
• Mode
• Midrange
• Midhinge

Population Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:
\( \mu = \frac{\sum X}{N} \)
where
• \( \mu \) stands for the population mean.
• \( N \) is the total number of observations in the population.
• \( X \) stands for a particular value.
• \( \sum \) indicates the operation of adding.

Population Mean Example
Parameter: a measurable characteristic of a population.
The Kane family owns four cars. The following is the mileage attained by each car: 56,000, 23,000, 42,000, and 73,000. Find the average miles covered by each car.
The mean is (56,000 + 23,000 + 42,000 + 73,000)/4 = 48,500.

Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
\( \bar{X} = \frac{\sum X}{n} \)
where
• \( \bar{X} \) stands for the sample mean
• \( n \) is the total number of values in the sample

Return on Stock
Year     Stock X   Stock Y
1998     10%       17%
1997     8         -2
1996     12        16
1995     2         1
1994     8         8
Total    40%       40%
Average Return on Stock = 40 / 5 = 8%

The Mean (Arithmetic Average)
• It is the Arithmetic Average of data values.
  Sample Mean: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \)
• The Most Common Measure of Central Tendency
• Affected by Extreme Values (Outliers)
[Figure: two number-line plots, one on a 0 to 10 axis with Mean = 5 and one on a 0 to 14 axis with Mean = 6.]

Properties of the Arithmetic Mean
• Every set of interval-level and ratio-level data has a mean.
• All the values are included in computing the mean.
• A set of data has a unique mean.
• The mean is affected by unusually large or small data values.
• The mean is relatively reliable.
• The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.

EXAMPLE
Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the last property, (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0. In other words, \( \sum (X - \bar{X}) = 0 \).

The Median
Median: The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest. There are as many values above the median as below it in the data array.
Note: For an even set of numbers, the median will be the arithmetic average of the two middle numbers.
Position of the median in the ordered sequence: Positioning Point \( = \frac{n + 1}{2} \)

The Median
• Important Measure of Central Tendency
• In an ordered array, the median is the “middle” number.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.
• Not Affected by Extreme Values
[Figure: two number-line plots, one on a 0 to 10 axis and one on a 0 to 14 axis; Median = 5 in both.]

Properties of the Median
• There is a unique median for each data set.
• It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
• It can be computed for ratio-level, interval-level, and ordinal-level data.
• It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.
• No arithmetic properties

The Mode
• A Measure of Central Tendency
• Value that Occurs Most Often
• Not Affected by Extreme Values
• There May Not be a Mode
• There May be Several Modes
• Used for Either Numerical or Categorical Data
[Figure: a plot on a 0 to 14 axis with Mode = 9, and a plot on a 0 to 6 axis with No Mode.]

Midrange
• A Measure of Central Tendency
• Average of Smallest and Largest Observation: \( \text{Midrange} = \frac{x_{\text{largest}} + x_{\text{smallest}}}{2} \)
• Affected by Extreme Value
[Figure: two plots on a 0 to 10 axis; Midrange = 5 in both.]

Quartiles
• Not a Measure of Central Tendency
• Split Ordered Data into 4 Quarters: 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
• Position of i-th Quartile: position of point \( Q_i = \frac{i(n+1)}{4} \)
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of \( Q_1 = \frac{1 \cdot (9 + 1)}{4} = 2.50 \), so \( Q_1 = 12.5 \)

Quartiles
• See text page 107 for “rounding rules” for position of the i-th quartile
• Position (not value) of i-th Quartile: \( Q_i = \frac{i(n+1)}{4} \)

Midhinge
• A Measure of Central Tendency
• The Middle Point of the 1st and 3rd Quartiles: \( \text{Midhinge} = \frac{Q_1 + Q_3}{2} \)
• Used to Overcome Extreme Values
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
\( \text{Midhinge} = \frac{Q_1 + Q_3}{2} = \frac{12.5 + 19.5}{2} = 16 \)

Summary Measures
• Central Tendency: Mean \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \), Median, Mode, Midrange, Midhinge
• Quartile
• Variation: Range, Variance \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \), Standard Deviation, Coefficient of Variation

Measures of Variation
• Range
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation
• Sample Standard Deviation
• Coefficient of Variation: \( CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \)

The Range
• Measure of Variation
• Difference Between Largest & Smallest Observations: \( \text{Range} = x_{\text{largest}} - x_{\text{smallest}} \)
• Ignores How Data Are Distributed
[Figure: two data sets plotted on a 7 to 12 axis; Range = 12 - 7 = 5 for both.]

Return on Stock
Year     Stock X   Stock Y
1998     10%       17%
1997     8         -2
1996     12        16
1995     2         1
1994     8         8
Range on Stock X = 12 - 2 = 10%
Range on Stock Y = 17 - (-2) = 19%

Interquartile Range
• Measure of Variation
• Also Known as Midspread: Spread in the Middle 50%
• Difference Between Third & First Quartiles: Interquartile Range \( = Q_3 - Q_1 \)
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
\( Q_3 - Q_1 = 17.5 - 12.5 = 5 \)

Interquartile Range
• IQR = 75th percentile - 25th percentile
• The IQR is useful for checking for outliers
• Not Affected by Extreme Values
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
\( Q_3 - Q_1 = 17.5 - 12.5 = 5 \)

Variance & Standard Deviation
• Measures of Dispersion
• Most Common Measures
• Consider How Data Are Distributed
• Show Variation About Mean (\( \bar{X} \) or \( \mu \))
[Figure: data plotted on a 4 to 12 axis with \( \bar{X} = 8.3 \).]

Variance
• Important Measure of Variation
• Shows Variation About the Mean
• For the Population: \( \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \)
• For the Sample: \( s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \)
For the Population: use N in the denominator. For the Sample: use n - 1 in the denominator.

Population Variance
The population variance for ungrouped data is the arithmetic mean of the squared deviations from the population mean.
\( \sigma^2 = \frac{\sum (X - \mu)^2}{N} \)

Population Variance EXAMPLE
The ages of the Dunn family are 2, 18, 34, and 42 years. What is the population variance?
X        \( \mu \)    \( X - \mu \)    \( (X - \mu)^2 \)
2        24       -22          484
18       24       -6           36
34       24       10           100
42       24       18           324
Total 96                       944
\( \mu = \sum X / N = 96 / 4 = 24 \)
\( \sigma^2 = \sum (X - \mu)^2 / N = 944 / 4 = 236 \)

Population Standard Deviation
\( \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \)

Population Standard Deviation EXAMPLE
The ages of the Dunn family are 2, 18, 34, and 42 years. What is the population standard deviation?
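The Dunn family question can be checked numerically. The following is a minimal sketch in plain Python; the helper names `population_variance` and `population_std` are illustrative assumptions, not part of the slides.

```python
import math

def population_variance(values):
    """Arithmetic mean of squared deviations from the population mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

ages = [2, 18, 34, 42]                  # Dunn family ages from the example
print(population_variance(ages))        # 236.0
print(round(population_std(ages), 2))   # 15.36
```

Dividing by N rather than n - 1 is what makes these population measures; the sample versions change only the denominator.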
X        \( \mu \)    \( X - \mu \)    \( (X - \mu)^2 \)
2        24       -22          484
18       24       -6           36
34       24       10           100
42       24       18           324
Total 96                       944
\( \mu = \sum X / N = 96 / 4 = 24 \)
\( \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{944}{4} = 236 \), so \( \sigma = \sqrt{236} \approx 15.36 \)

Standard Deviation
• Most Important Measure of Variation
• Shows Variation About the Mean
• For the Population: \( \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \)
• For the Sample: \( s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \)
For the Population: use N in the denominator. For the Sample: use n - 1 in the denominator.

Sample Variance and Standard Deviation
The sample variance estimates the population variance.
NOTE: important computational formulas for the sample variance:
\( S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \)
\( S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} \)
The sample standard deviation: \( s = \sqrt{s^2} \)

Example of Standard Deviation
Amount (X)   \( \bar{X} \)   Deviation from Mean \( (X - \bar{X}) \)   \( (X - \bar{X})^2 \)
600          435     600 - 435 = 165                   27,225
350          435     350 - 435 = -85                   7,225
275          435     275 - 435 = -160                  25,600
430          435     430 - 435 = -5                    25
520          435     520 - 435 = 85                    7,225
Total                0                                 67,300
\( s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{67,300}{4}} = \sqrt{16,825} = 129.71 \)

Example of Standard Deviation (Computational Version)
Amount (X)   \( \bar{X} \)   \( (X - \bar{X}) \)   \( (X - \bar{X})^2 \)   \( X^2 \)
600          435     165         27,225        360,000
350          435     -85         7,225         122,500
275          435     -160        25,600        75,625
430          435     -5          25            184,900
520          435     85          7,225         270,400
Total 2,175                      67,300        1,013,425
\( s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} = \sqrt{\frac{1,013,425 - \frac{(2,175)^2}{5}}{5 - 1}} = 129.71 \)

Sample Standard Deviation
\( s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \)
NOTE: For the Sample: use n - 1 in the denominator.
Data: \( X_i \): 10 12 14 15 17 18 18 24
n = 8, Mean = 16
\( s = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + \cdots + (18 - 16)^2 + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095 \)

Interpretation and Uses of the Standard Deviation
Chebyshev’s theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least \( 1 - 1/k^2 \), where k is any constant greater than 1.
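Chebyshev's bound can be compared against an actual data set. The sketch below is illustrative only: the function names `chebyshev_lower_bound` and `proportion_within` are assumptions, the data are the eight values used in this deck's sample standard deviation example, and the population standard deviation (`statistics.pstdev`) is used for the comparison.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def proportion_within(values, k):
    """Observed proportion of values within k population standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)

data = [10, 12, 14, 15, 17, 18, 18, 24]
for k in (1.5, 2, 3):
    # The observed proportion should never fall below the Chebyshev bound.
    print(k, round(chebyshev_lower_bound(k), 4), proportion_within(data, k))
```

For k = 2 the bound is 0.75, while all eight values in this data set fall inside the interval. The theorem guarantees a minimum, not the typical proportion.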
Multiply the Chebyshev proportion \( 1 - 1/k^2 \) by 100% to get the minimum percentage of values within k standard deviations of the mean.

Interpretation and Uses of the Standard Deviation
Empirical Rule: For any symmetrical, bell-shaped distribution,
• approximately 68% of the observations will lie within \( 1\sigma \) of the mean (\( \mu \pm 1\sigma \));
• approximately 95% of the observations will lie within \( 2\sigma \) of the mean (\( \mu \pm 2\sigma \));
• approximately 99.7% will lie within \( 3\sigma \) of the mean (\( \mu \pm 3\sigma \)).

Comparing Standard Deviations
Data: \( X_i \): 10 12 14 15 17 18 18 24
N = 8, Mean = 16
\( s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095 \)
\( \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311 \)
Value for the Standard Deviation is larger for data considered as a Sample.

Comparing Standard Deviations
[Figure: three data sets plotted on an 11 to 21 axis, each with Mean = 15.5. Data A: s = 3.338; Data B: s = .9258; Data C: s = 4.57.]

Coefficient of Variation
• Measure of Relative Variation
• Always a %
• Shows Variation Relative to Mean
• Used to Compare 2 or More Groups
• Formula (for Sample): \( CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \)

Comparing Coefficient of Variation
• Stock A: Average Price last year = $50, Standard Deviation = $5
• Stock B: Average Price last year = $100, Standard Deviation = $5
Coefficient of Variation:
• Stock A: CV = (5 / 50) · 100% = 10%
• Stock B: CV = (5 / 100) · 100% = 5%

Shape
• Describes How Data Are Distributed
• Measures of Shape: Symmetric or skewed
  Left-Skewed: Mean < Median < Mode
  Symmetric: Mean = Median = Mode
  Right-Skewed: Mode < Median < Mean

Box-and-Whisker Plot
Graphical Display of Data Using 5-Number Summary: \( X_{\text{smallest}} \), \( Q_1 \), Median, \( Q_3 \), \( X_{\text{largest}} \)
[Figure: box-and-whisker plot on a 4 to 12 axis.]

Distribution Shape & Box-and-Whisker Plots
[Figure: box-and-whisker plots, each marking Q1, Median, and Q3, for Left-Skewed, Symmetric, and Right-Skewed distributions.]

Summary
• Discussed Measures of Central Tendency: Mean, Median, Mode, Midrange, Midhinge
• Quartiles
• Addressed Measures of Variation: The Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
• Determined Shape of Distributions: Symmetric, Skewed, Box-and-Whisker Plot
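The quartile-based measures and the coefficient of variation from this deck can be reproduced with a short sketch. It assumes the i(n+1)/4 positioning rule and handles fractional positions by linear interpolation between neighbors, which may differ from the text's rounding rules for positions that are not whole or half numbers. All function names are illustrative.

```python
def quartile(sorted_values, i):
    """Value at 1-based position i*(n+1)/4, interpolating between neighbors (assumption)."""
    n = len(sorted_values)
    position = i * (n + 1) / 4
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part used for interpolation
    if lower < 1:
        return float(sorted_values[0])
    if lower >= n:
        return float(sorted_values[-1])
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

def midhinge(sorted_values):
    """Average of the first and third quartiles."""
    return (quartile(sorted_values, 1) + quartile(sorted_values, 3)) / 2

def interquartile_range(sorted_values):
    """Third quartile minus first quartile."""
    return quartile(sorted_values, 3) - quartile(sorted_values, 1)

def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]    # quartile and midhinge example
print(quartile(data, 1), quartile(data, 3))    # 12.5 19.5
print(midhinge(data))                          # 16.0
print(interquartile_range([11, 12, 13, 16, 16, 17, 17, 18, 21]))   # 5.0
print(coefficient_of_variation(5, 50), coefficient_of_variation(5, 100))   # 10.0 5.0
```

The inputs must already be sorted, matching the "Data in Ordered Array" convention used on the slides.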