Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Section 2.3 Measures of Variation Figure 2.31 indicates that we need measures of variation to express how the two distributions differ. Figure 2.31 Repair Times for Personal Computers at Two Service Centers Chapter 2 Descriptive Statistics Measures of Variation Range Largest minus the smallest measurement The Population Variance 2 (pronounced sigma squared) The average of the squared deviations of all the population measurements from the population mean Standard Deviation (pronounced sigma) The square root of the variance Chapter 2 Descriptive Statistics The Range Range = largest measurement - smallest measurement The range measures the interval spanned by all the data Example 2.3: Internist’s Salaries (in thousands of dollars) 127 132 138 141 144 146 152 154 165 171 177 192 241 Range = 241 - 127 = 114 ($114,000) Chapter 2 Descriptive Statistics The Variance Population X1, X2, …, XN Sample x1, x2, …, xn s2 2 Population Variance N 2 Sample Variance n X i - 2 s 2= i=1 N Chapter 2 Descriptive Statistics 2 x x i i=1 n-1 The Variance For a population of size N, the population variance 2 is defined as N 2 2 x i i 1 N 2 2 2 x1 x2 xN N For a sample of size n, the sample variance s2 is defined as n s2 2 x x i i 1 n 1 2 2 2 x1 x x2 x xn x n 1 and is a point estimate for 2 Chapter 2 Descriptive Statistics The Standard Deviation Population Standard Deviation, : Sample Standard Deviation, s: s s Chapter 2 Descriptive Statistics 2 2 Example 2.5 Consider the population of profit margins for five of the best big companies in America as rated by Forbes magazine on its website on March 16, 2005. These profit margins are 8%, 10%, 15%, 12% and 5%. Population Mean 8 10 15 12 5 50 10 % 5 5 Population 2 8 102 10 102 15 102 12 102 5 102 Variance 5 2 2 2 0 2 52 2 2 5 5 4 0 25 4 25 58 2 11.6% 5 5 Population Standard Deviation Chapter 2 Descriptive Statistics 2 11 .6 3.406 % Example 2.6 The Car Mileage Case Sample variance and standard deviation for first five car mileages from Table 2.1 30.8, 31.7, 30.1, 31.6, 32.1 so x 31.26 2 xi x 5 s2 i 1 5 1 30.8 31.26 2 31.7 31.26 2 30.1 31.26 2 31.6 31.26 2 32.1 31.26 2 4 = 2.572 /4 = 0.643 s s 2 .643 0.8019 Chapter 2 Descriptive Statistics Sample variance and standard deviation for all car mileages from Table 2.1, Recall x 31.5531 . 49 s 2 2 x x i i 1 49 1 30.66204 0.638793 48 s s 2 0.638793 0.7992 The point estimate of the variance of all cars is 0.638793 mpg2 and the point estimate of the standard deviation of all cars is 0.7992 mpg. Chapter 2 Descriptive Statistics The computational formula for the sample variance s2 2 n xi 1 n xi2 i 1 n 1 i 1 n The Payment Time Case Example 2.7 Consider the sample of 65 payment times in Table 2.2. 65 x i 1 i 65 x i 1 2 i x1 x2 x65 22 19 21 1,177 2 x12 x22 x65 (22) 2 (19) 2 (21) 2 22,317 Therefore 1 (1,177) 2 1,004.2464 s 15.69135 22,317 (65 1) 65 64 2 and s s 2 15.69135 3.9612 Days. Chapter 2 Descriptive Statistics The Empirical Rule for Normal Populations If a population has mean and standard deviation and is described by a normal curve (symmetrical, bell-shaped distribution), then 1. 68.26% of the population measurements lie within one standard deviation of the mean: [, ] 2. 95.44% of the population measurements lie within two standard deviations of the mean: [2, 2] 3. 99.73% of the population measurements lie within three standard deviations of the mean: [3, 3] Chapter 2 Descriptive Statistics 3- 12 Empirical Rule 68% 95% 99.7% 3 2 1 1 2 3 Chapter 2 Descriptive Statistics Tolerance Intervals An Interval that contains a specified percentage of the individual measurements in a population is called a tolerance interval. The one, two, and three standard deviation intervals around given in (1), (2) and (3) are tolerance intervals containing, respectively, 68.26 percent, 95.44 percent and 99.73 percent of the measurements in a normally distributed population. The three-sigma interval 3 ] to be a tolerance interval that contains almost all of the measurements in a normally distributed population. Chapter 2 Descriptive Statistics Example 2.8 The Car Mileage Case Recall x 31.5531 and standard deviation S= 0.7992 68.26% of all individual cars will have mileages in the range x s] 31.6 0.8] 30.8,32.4] mpg 95.44% of all individual cars will have mileages in the range x 2s] 31.6 1.6] 30.0,33.2] mpg 99.73% of all individual cars will have mileages in the range x 3s] 31.6 2.4] 29.2,34.0] mpg Chapter 2 Descriptive Statistics Example 2.8 The Car Mileage Case Chapter 2 Descriptive Statistics Skewness and the Empirical Rule The Empirical Rule holds for normally distributed populations. This rule also approximately holds for populations having mound-shaped (single-peaked) distributions that are not very skewed to the right or left. For example , Recall that the distribution of 65 payment times, it indicates that the empirical rule holds. Chapter 2 Descriptive Statistics Chebyshev’s Theorem Let and be a population’s mean and standard deviation, then for any value k > 1, At least 100(1 - 1/k2 )% of the population measurements lie in the interval: [k, k] Use only for non-mound distributions • Skewed But not extremely skewed. (If a population is extremely skewed, it is best to measure variation by using percentiles.) • Bimodal Chapter 2 Descriptive Statistics Empirical Rule and Chebyshev’s Theorem Population Empirical Rule Normal Mound-shaped, not very skewed Chebyshev Any population but not extremely skewed [μ-σ, μ+σ] About 68% [μ-2σ, μ+2σ] About 95% At least 75% [μ-3σ, μ+3σ] About 99.7% At least 88.89% Chapter 2 Descriptive Statistics Exercise Shipping Times (Chebychev’s Theorem and the Empirical Rule) A group of 13 students is studying in Istanbul, Tukey, for five weeks. As part of their study of the local economy, they each purchased an Oriental rug and arranged for its shipment back to the United States. The shipping time, in days, For each rug was 31 31 42 39 42 43 34 30 28 36 37 35 40 Estimate the percentage of days that are within two standard deviations of the Mean. Is it likely to take 2 months for a delivery? Chapter 2 Descriptive Statistics z Scores For any x in a population or sample, the associated z score is x mean z standard deviation The z score is the number of standard deviations that x is from the mean A positive z score is for x above (greater than) the mean A negative z score is for x below (less than) the mean Chapter 2 Descriptive Statistics Example 2.9 Population of profit margins for five big American companies: 8%, 10%, 15%, 12%, 5% = 10%, = 3.406% x x – mean z score 8 8 – 10 = -2 -2/3.406 = -0.59 10 10- 10 = 0 0/3.406 = 0.00 15 15 – 10 = 5 5/3.406 = 1.47 12 12 – 10 = 2 2/3.406 = 0.59 5 5 – 10 = -5 -5/3.406 = -1.47 Chapter 2 Descriptive Statistics Coefficient of Variation Measures the size of the standard deviation relative to the size of the mean Coefficient of Variation = standard deviation ×100% mean Used to: compare the relative variabilities of values about the mean Compare the relative variability of populations or samples with different means and different standard deviations Measure risk Chapter 2 Descriptive Statistics Table: Rates of Return: Asset A and B Rates of Return Year 5 years ago 4 years ago 3 years ago 2 years ago 1 year ago Total Average rate of Return Standard deviation Expected Return Asset A (house) Asset B (stock) 11.3% 12.5 13.0 12.0 12.2 61.0 12.2% 0.63 9.4% 17.1 10.3 11.2 13.5 61.5 12.3% 3.08 Which of these two single asset is better? Chapter 2 Descriptive Statistics Section 2.4 Percentiles, Quartiles and Boxand-Whiskers Display For a set of measurements arranged in increasing order, the pth percentile is a value such that p percent of the measurements fall at or below the value and (100-p) percent of the measurements fall at or above the value The first quartile Q1 is the 25th percentile The second quartile (or median) Md is the 50th percentile The third quartile Q3 is the 75th percentile The interquartile range IQR is Q3 - Q1 Chapter 2 Descriptive Statistics One of Procedures for calculating Percentiles 1. Arrange the measurements in increasing order . p 2. Calculate the index i n 100 A. If i is not integer, the next integer greater than i denotes the Position of the pth percentile in the ordered arrangement. B. If i is integer, then the pth percentile is the average of the The measurement in positions i and i+1 in the ordered arrangement Chapter 2 Descriptive Statistics Example 2.10 DVD Recorder Satisfaction 20 customer satisfaction ratings: 1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10 Md = (8+8)/2 = 8 Q1 = (7+8)/2 = 7.5 Q3 = (9+9)/2 = 9 IQR = Q3 Q1 = 9 7.5 = 1.5 Chapter 2 Descriptive Statistics Five-Number Summary Smallest Value First Quartile Median Third Quartile Largest Value Chapter 2 Descriptive Statistics Using stem-and-leaf displays to find percentiles (a)The 75th percentile of the 65 payment times, and a fivenumber summary Chapter 2 Descriptive Statistics (b) The 5th percentile of the 60 bottle design ratings and a five-number summary Chapter 2 Descriptive Statistics Box-and-Whiskers Plots The box plots the: first quartile, Q1 median, Md third quartile, Q3 inner fences, located 1.5IQR away from the quartiles: = Q1 – (1.5 IQR) = Q3 + (1.5 IQR) outer fences, located 3IQR away from the quartiles: = Q1 – (3 IQR) = Q3 + (3 IQR) Chapter 2 Descriptive Statistics The “whiskers” are dashed lines that plot the range of the data A dashed line drawn from the box below Q1 down to the smallest measurement Another dashed line drawn from the box above Q3 up to the largest measurement Example: 20 customer satisfaction ratings: 1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10 Chapter 2 Descriptive Statistics Outliers Outliers are measurements that are very different from most of the other measurements Because they are either very much larger or very much smaller than most of the other measurements Outliers lie beyond the fences of the box-and-whiskers plot Measurements between the inner and outer fences are mild outliers Measurements beyond the outer fences are severe outliers Chapter 2 Descriptive Statistics Chapter 2 Descriptive Statistics Section 2.5 Describing Qualitative Data Pie charts of the proportion (as percent) of all cars sold in the United States by different manufacturers, 1970 versus 1997 Chapter 2 Descriptive Statistics Population and Sample Proportions Population X1, X2, …, XN Sample x1, x2, …, xn pˆ p Sample Proportion Population Proportion n pˆ x i i =1 n p^ is the point estimate of p Chapter 2 Descriptive Statistics Example 2.11 The Marketing Ethics Case 117 out of 205 marketing researchers disapproved of action taken in a hypothetical scenario X = 117, number of researches who disapprove n = 205, number of researchers surveyed Sample Proportion: p̂ X 117 0.57 n 205 Chapter 2 Descriptive Statistics Bar Chart Percentage of Automobiles Sold by Manufacturer, 1970 versus 1997 Chapter 2 Descriptive Statistics Pie Chart Percentage of Automobiles Sold by Manufacturer,1997 Chapter 2 Descriptive Statistics An Bar Chart of U.S Automobile Sales in 1997 Chapter 2 Descriptive Statistics Misleading Graphs and Charts: Scale Break Break the vertical scale to exaggerate effect Mean Salaries at a Major University, 2002 - 2005 Chapter 2 Descriptive Statistics Misleading Graphs and Charts: Scale Effects Compress vs. stretch the vertical axis to exaggerate or minimize the effect Mean Salary Increases at a Major University, 2002 - 2005 Chapter 2 Descriptive Statistics Weighted Means Sometimes, some measurements are more important than others Assign numerical “weights” to the data Weights measure relative importance of the value Calculate weighted mean as w x w i i i where wi is the weight assigned to the ith measurement xi Chapter 2 Descriptive Statistics Example 2.12 June 2001 unemployment rates in the U.S. by region Census Region Civilian Labor Force Unemployment (millions) Rate (%) Northeast 26.9 4.1 South 50.6 4.7 Midwest 34.7 4.4 West 32.5 5.0 Want the mean unemployment rate for the U.S. Chapter 2 Descriptive Statistics Calculate it as a weighted mean So that the bigger the region, the more heavily it counts in the mean The data values are the regional unemployment rates The weights are the sizes of the regional labor forces 26 .9 4.1 50 .6 4.7 34 .7 4.4 32 .5 5.0 26 .9 50 .6 34 .7 25 .5 32 .5 663 .29 4.58 % 144 .7 Note that the unweigthed mean is 4.55%, which underestimates the true rate by 0.03% That is, 0.0003 144.7 million = 43,410 workers Chapter 2 Descriptive Statistics