Chapter 3 Descriptive Statistics: Numerical Methods
McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.

Descriptive Statistics
3.1 Describing Central Tendency
3.2 Measures of Variation
3.3 Percentiles, Quartiles and Box-and-Whiskers Displays
3.4 Covariance, Correlation, and the Least Squares Line
3.5 Weighted Means and Grouped Data (Optional)
3.6 The Geometric Mean (Optional)

Describing Central Tendency
• In addition to describing the shape of a distribution, we want to describe the data set's central tendency
– A measure of central tendency represents the center or middle of the data

Parameters and Statistics
• A population parameter is a number calculated from all the population measurements that describes some aspect of the population
• A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample

Measures of Central Tendency
Mean, µ: The average or expected value
Median, Md: The value of the middle point of the ordered measurements
Mode, Mo: The most frequent value

The Mean
Population: $X_1, X_2, \ldots, X_N$
Sample: $x_1, x_2, \ldots, x_n$
Population Mean:
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]
Sample Mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

The Sample Mean
For a sample of size n, the sample mean is defined as
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
and is a point estimate of the population mean µ
• It is the value to expect, on average and in the long run

Example 3.1: The Car Mileage Case
• Sample mean for the first five car mileages from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1
\[ \bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{30.8 + 31.7 + 30.1 + 31.6 + 32.1}{5} = \frac{156.3}{5} = 31.26 \]

The Median
The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it
1. If the number of measurements is odd, the median is the middlemost measurement in the ordering
2. If the number of measurements is even, the median is the average of the two middlemost measurements in the ordering

Example: Car Mileage Case
• Example 3.1: First five observations from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1
• In order: 30.1, 30.8, 31.6, 31.7, 32.1
• The number of measurements is odd, so the median is the middle one, 31.6

The Mode
The mode Mo of a population or sample of measurements is the measurement that occurs most frequently
– Modes are the values that are observed "most typically"
– Sometimes the highest frequency occurs at two or more values
• If there are two modes, the data are bimodal
• If there are more than two modes, the data are multimodal
– When data are in classes, the class with the highest frequency is the modal class
• The tallest box in the histogram

[Figure: Histogram Describing the 50 Mileages]

[Figure: Relationships Among Mean, Median and Mode]

Measures of Variation
• Knowing the measures of central tendency is not enough
• Two distributions can have identical measures of central tendency and still differ in spread
[Figure: two distributions with identical measures of central tendency]

Measures of Variation
Range: Largest minus the smallest measurement
Variance: The average of the squared deviations of all the population measurements from the population mean
Standard Deviation: The square root of the variance

The Range
• Largest minus smallest
• Measures the interval spanned by all the data
• For Figure 3.13, largest repair time is 5 and smallest is 3
• Range is 5 – 3 = 2 days

Population Variance and Standard Deviation
• The population variance ($\sigma^2$) is the average of the squared deviations of the individual population measurements from the population mean (µ)
• The population standard deviation (σ) is the positive square root of the population variance

Variance
• For a population of size N, the population variance $\sigma^2$ is:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N} \]
• For a sample of size n, the sample variance $s^2$ is:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} \]

Standard Deviation
• Population standard deviation (σ): $\sigma = \sqrt{\sigma^2}$
• Sample standard deviation (s): $s = \sqrt{s^2}$

Example: Chris's Class Sizes This Semester
• Data points are: 60, 41, 15, 30, 34
• Mean is 36
• Variance is:
\[ \sigma^2 = \frac{(60 - 36)^2 + (41 - 36)^2 + (15 - 36)^2 + (30 - 36)^2 + (34 - 36)^2}{5} = \frac{576 + 25 + 441 + 36 + 4}{5} = \frac{1082}{5} = 216.4 \]
• Standard deviation is:
\[ \sigma = \sqrt{216.4} = 14.71 \]

Example: Sample Variance and Standard Deviation
• Example 3.7: data for the first five car mileages from Table 3.1 are 30.8, 31.7, 30.1, 31.6, 32.1
• The sample mean is 31.26
\[ s^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1} = \frac{(30.8 - 31.26)^2 + (31.7 - 31.26)^2 + (30.1 - 31.26)^2 + (31.6 - 31.26)^2 + (32.1 - 31.26)^2}{4} = \frac{2.572}{4} = 0.643 \]
\[ s = \sqrt{s^2} = \sqrt{0.643} = 0.8019 \]

The Empirical Rule for Normal Populations
• If a population has mean µ and standard deviation σ and is described by a normal curve, then
– 68.26% of the population measurements lie within one standard deviation of the mean: [µ−σ, µ+σ]
– 95.44% of the population measurements lie within two standard deviations of the mean: [µ−2σ, µ+2σ]
– 99.73% of the population measurements lie within three standard deviations of the mean: [µ−3σ, µ+3σ]

[Figure: The Empirical Rule and Tolerance Intervals]

Example 3.9: The Car Mileage Case Continued
• 68.26% of all individual cars will have mileages in the range [$\bar{x} \pm s$] = [31.6 ± 0.8] = [30.8, 32.4] mpg
• 95.44% of all individual cars will have mileages in the range [$\bar{x} \pm 2s$] = [31.6 ± 1.6] = [30.0, 33.2] mpg
• 99.73% of all individual cars will have mileages in the range [$\bar{x} \pm 3s$] = [31.6 ± 2.4] = [29.2, 34.0] mpg

[Figure: Estimated Tolerance Intervals in the Car Mileage Case]

Chebyshev's Theorem
• Let µ and σ be a population's mean and standard deviation; then, for any value k > 1,
• At least $100(1 - 1/k^2)\%$ of the population measurements lie in the interval [µ−kσ, µ+kσ]
• Only of practical use for a non-mound-shaped population that is not very skewed

z Scores
• For any x in a population or sample, the associated z score is
\[ z = \frac{x - \text{mean}}{\text{standard deviation}} \]
• The z score is the number of standard deviations that x is from the mean
– A positive z score is for x above (greater than) the mean
– A negative z score is for x below (less than) the mean

Example: z Score
• Population of profit margins for five American companies: 8%, 10%, 15%, 12%, 5%
• µ = 10%, σ = 3.406%

Coefficient of Variation
• Measures the size of the standard deviation relative to the size of the mean
• Coefficient of variation = (standard deviation / mean) × 100%
• Used to:
– Compare the relative variabilities of values about the mean
– Compare the relative variability of populations or samples with different means and different standard deviations
– Measure risk
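
The z score and coefficient of variation definitions above can be checked numerically. The following is a minimal Python sketch, not part of the original slides, that applies them to the five profit margins from the z score example. The function names (population_mean, population_std, z_scores, coefficient_of_variation) are hypothetical helpers chosen for illustration, and the data are treated as a whole population, so the variance divides by N rather than n − 1.

```python
import math


def population_mean(values):
    """Population mean: sum of the measurements divided by N."""
    return sum(values) / len(values)


def population_std(values):
    """Population standard deviation: square root of the average
    squared deviation from the population mean (divides by N)."""
    mu = population_mean(values)
    variance = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


def z_scores(values):
    """z = (x - mean) / standard deviation, for each measurement."""
    mu = population_mean(values)
    sigma = population_std(values)
    return [(x - mu) / sigma for x in values]


def coefficient_of_variation(values):
    """(standard deviation / mean) x 100, expressed as a percentage."""
    return population_std(values) / population_mean(values) * 100


# Profit margins (in percent) from the z score example.
profit_margins = [8, 10, 15, 12, 5]

if __name__ == "__main__":
    print("mean:", population_mean(profit_margins))
    print("std :", round(population_std(profit_margins), 3))
    for x, z in zip(profit_margins, z_scores(profit_margins)):
        print(f"x = {x:>2}%  z = {z:+.2f}")
    print("CV  :", round(coefficient_of_variation(profit_margins), 2), "%")
```

A margin equal to the mean (10%) has a z score of zero, margins above the mean have positive z scores, and margins below it have negative ones, in line with the bullets above. For a sample rather than a population, the divisor in the variance would be n − 1 instead.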