Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Stat 2411 Statistical Methods Chapter 4. Measure of Variation 4.1 The Range • Difference between the largest and smallest values 3, 4, 6, 2, 1, 9 1, 2, 3, 4, 6, 9 Range=9-1=8 4.2 Variance and Standard Deviation For a population with values x1, x2, … …, xn The center is the population mean x i n The deviations from the mean are x1 , x2 , , xn Consider the population – Diameters of all ball bearings produced by machine: x1, x2, … …, xn Let = population mean n = population size Then n in i 1 x 2 population variance n mean of (x - ) 2 Average squared deviation from mean 2 ( xi )2 i 1 n population standard deviation 2 Many calculators have a function for x Sample variance • For a sample of size n, the sample variance is n 1 2 s ( xi x ) n 1 i 1 2 • Why divide by n -1? This makes s 2 an unbiased estimator of 2 . Unbiased means on the average correct. Suppose we have a large population of ball bearings with diameters =1cm and 2 0.02 0.0004 s2 x Sample 1 2 3 4 . . ∞ Mean 0.98 1.03 1.01 1.02 . . -----1.00 0.00032 0.00031 0.00045 0.00052 . . -------0.0004 2 If we knew we would find ˆ 2 ( xi ) n Fact i 1 n min So (x (x i m) 2 ( x i x ) 2 x ) ( xi ) 2 i Dividing by n-1 makes s2 2 and come out right (xi x ) 2 would be too small for 2. n (2 )on average. Sample Standard Deviation Variance: 2 ( x x ) i s2 n 1 Standard Deviation: ( xi x )2 s n 1 The standard deviation (s) measures spread (or variation) by looking at how far observations are from the mean. Example On an exam I might ask you to write a numerical expression for s for the data for the sample. x1 8 x2 9 x3 4 894 x 7 3 2 2 2 ( 8 7 ) ( 9 7 ) ( 4 7 ) 14 2 s 7 3 1 2 s Sample standard deviation 7 2.65 (8 7)2 (9 7)2 (4 7)2 s 2 Choosing Measures of Center and Spread Use the mean & standard deviation for “bellshaped” distributions, where data are symmetric and the average score is typical, i.e. no outliers. Use the five number summary (Min, Q1, Median, Q3, Max) for skewed data where very large or small observations make the mean less representative and to highlight the range of outliers. 4.3 Application of the Standard Deviation • Chebyshev’s Theorem – skip • For bell – shaped histograms (or approximately normal distributed, we will talk more about this later) 3 2 2 3 Approx. 68% of the obs. are between 1 Approx. 95% of the obs. are between 2 Approx. 99.7% of the obs. are between 3 The same is true for s and x Standardizing Observations – z-scores If we measure in units of size , about the mean , we can transform our data to standard units: # of standard deviations from average. This is called standardizing. So if x is an observation from a data set that has mean and standard deviation , the standardized value of x is z x A standardized value is often called a z-score. Example In the US, the systolic blood pressure of men aged 20 has mean 120 and standard deviation 10. 1) We can expect 95% of our observations fall within 120 2 *10 120 20 or (100,140) 2) The systolic bp of a 20-yr old man is 130. Find the z-score for his bp: z x 130 120 1 10 1 standard deviation above the average. Exercise 1: The Standard Deviation (s) 26 systolic blood pressure 108 134 100 108 112 112 112 122 116 116 120 108 108 96 114 108 128 114 112 124 90 102 106 124 130 116 X = 113.08 mm Hg 2 2 2 (108-113.08) + (134-113.08) +... + (116-113.08) s2 105.67 26 1 s 105.67 10.28 Exercise 2: z-score In the US, the systolic blood pressure of men aged 20 has mean 120 and standard deviation 10. Q1. what proportion of the bps have a value outside the range 110 to 130? Approximately 68% of the bp values will fall within 1 standard deviations from the mean. So approximately 32% of the values will fall outside the range. Q2. What is the z-score of a blood pressure value of 100? 100 120 2, so 2 standard deviations below the mean. 10