Chapter 3 Part 2
Numerical Summaries of Symmetric Data
Measure of Center: Mean
Measure of Variability: Standard Deviation

Symmetric Data
[Figure: body temp. of 93 adults]

Recall: 2 characteristics of a data set to measure
center: measures where the "middle" of the data is located
variability: measures how "spread out" the data is

Measure of Center When Data Approx. Symmetric
mean (arithmetic mean)
notation
$x_i$: ith measurement in a set of observations $x_1, x_2, x_3, \ldots, x_n$
n: number of measurements in data set; sample size
$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$

Sample mean
$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$

Population mean (value typically not known)
N = population size
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Recall: Warmup
Six people in a room have a median age of 45 years and mean age of 45 years. One person who is 40 years old leaves the room.
Questions:
1. What is the median age of the 5 people remaining in the room? Can't answer
2. What is the mean age of the 5 people remaining in the room? 46
(45 × 6 = 270; 270 − 40 = 230; 230/5 = 46)

Connection Between Mean and Histogram
A histogram balances when supported at the mean.
[Figure: histogram of absences from work; vertical axis: frequency; mean $\bar{x}$ = 140.6]
Mean: balance point
Median: 50% area each half
right histo: mean 55.26 yrs, median 57.7 yrs

Properties of Mean, Median
1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).
2. The mean uses the value of every number in the data set; the median does not.
Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$; $m = \frac{4+6}{2} = 5$
Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$; $m = \frac{4+6}{2} = 5$

Example: class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
n = 23
$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$
m: location: 12th obs.
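The mean and median of the class pulse rates can be checked with software. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the step that drops the largest value (140) is an added illustration of property 2, not part of the original example.

```python
from statistics import mean, median

# Class pulse rates from the example (beats per minute), already sorted.
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse_rates)
x_bar = mean(pulse_rates)          # sum of the observations divided by n
m = median(pulse_rates)            # middle value of the sorted data
median_position = (n + 1) // 2     # 1-based location of the median when n is odd

# Illustration only (an assumption, not in the slides): remove the
# largest observation and recompute both summaries.
trimmed = pulse_rates[:-1]
x_bar_trimmed = mean(trimmed)
m_trimmed = median(trimmed)

print(f"n = {n}, mean = {x_bar:.2f}, median = {m} (obs. #{median_position})")
print(f"without 140: mean = {x_bar_trimmed:.2f}, median = {m_trimmed}")
```

The mean moves noticeably when the single large value is removed, while the median barely changes, because the mean uses every value and the median depends only on the middle of the ordered data.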
Median of the class pulse rates: m = 85

2010, 2014 baseball salaries
2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000
2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

Disadvantage of the mean
Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

Mean, Median, Maximum Baseball Salaries 1985 - 2014
[Figure: Baseball Salaries: Mean, Median and Maximum 1985-2014; left axis: mean, median salary; right axis: maximum salary; horizontal axis: year]

Skewness: comparing the mean and median
Skewed to the right (positively skewed): mean > median
[Figure: histogram of 2011 baseball salaries; horizontal axis: salary ($1,000's); vertical axis: frequency]
Skewed to the left (negatively skewed): mean < median
mean = 78; median = 87
[Figure: histogram of exam scores; horizontal axis: exam scores; vertical axis: frequency]
Symmetric data: mean, median approx. equal
[Figure: histogram of bank customers, 10:00-11:00 am; horizontal axis: number of customers; vertical axis: frequency]

DESCRIBING VARIABILITY OF SYMMETRIC DATA

Describing Symmetric Data (cont.)
Measure of center for symmetric data:
Sample mean $\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
Measure of variability for symmetric data?

Example
2 data sets:
$x_1 = 49$, $x_2 = 51$; $\bar{x} = 50$
$y_1 = 0$, $y_2 = 100$; $\bar{y} = 50$
On average, they're both comfortable

Ways to measure variability
1. range = largest − smallest: OK sometimes; in general, too crude; sensitive to one large or small obs.
2. measure spread from the middle, where the middle is the mean $\bar{x}$
deviation of $x_i$ from the mean: $x_i - \bar{x}$
sum the deviations of all the $x_i$'s from $\bar{x}$: $\sum_{i=1}^{n} (x_i - \bar{x})$
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ always; tells us nothing

Previous Example
sum of deviations from mean:
$x_1 = 49$, $x_2 = 51$; $\bar{x} = 50$: $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$
$y_1 = 0$, $y_2 = 100$; $\bar{y} = 50$: $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$

The Sample Standard Deviation, a measure of spread around the mean
Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations
$(x_i - \bar{x})^2$; sum them, $\sum_{i=1}^{n} (x_i - \bar{x})^2$, and find the "average", then take the square root of the average
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$, called the sample standard deviation

Calculations …
Women height (inches)

  i    x_i    mean    (x_i − mean)    (x_i − mean)²
  1    59     63.4       -4.4             19.0
  2    60     63.4       -3.4             11.3
  3    61     63.4       -2.4              5.6
  4    62     63.4       -1.4              1.8
  5    62     63.4       -1.4              1.8
  6    63     63.4       -0.4              0.1
  7    63     63.4       -0.4              0.1
  8    63     63.4       -0.4              0.1
  9    64     63.4        0.6              0.4
 10    64     63.4        0.6              0.4
 11    65     63.4        1.6              2.7
 12    66     63.4        2.6              7.0
 13    67     63.4        3.6             13.3
 14    68     63.4        4.6             21.6
                      Sum 0.0          Sum 85.2

Mean = 63.4
Sum of squared deviations from mean = 85.2
(n − 1) = 13; (n − 1) is called degrees of freedom (df)
s² = variance = 85.2/13 = 6.55 inches squared
s = standard deviation = √6.55 = 2.56 inches

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
[Figure: women's heights with the mean $\bar{x}$ and mean ± 1 s.d. marked]

1. First calculate the variance s²: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
2. Then take the square root to get the standard deviation s.
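The two steps can be reproduced in software. The following is a minimal sketch in Python for the women's height data; the variable names are illustrative, and `statistics.stdev` (which also divides by n − 1) is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

# Women's heights (inches) from the calculation table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                      # sample mean
deviations = [x - x_bar for x in heights]     # x_i - x_bar for each observation
sum_dev = sum(deviations)                     # zero, up to floating-point rounding
sum_sq = sum(d * d for d in deviations)       # sum of squared deviations

variance = sum_sq / (n - 1)                   # step 1: divide by degrees of freedom
std_dev = sqrt(variance)                      # step 2: square root of the variance

print(f"mean = {x_bar:.1f}, sum of squared deviations = {sum_sq:.1f}")
print(f"s^2 = {variance:.2f} inches squared, s = {std_dev:.2f} inches")
print(f"statistics.stdev gives {stdev(heights):.2f}")
```

The table rounds each entry to one decimal place, so summing the displayed values by hand may differ slightly from the unrounded computation.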
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Population Standard Deviation
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
value of population standard deviation $\sigma$ typically not known; use s to estimate value of $\sigma$

Remarks
1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Remarks (cont.)
2. Note that s and $\sigma$ are always greater than or equal to zero.
3. The larger the value of s (or $\sigma$), the greater the spread of the data.
When does s = 0? When does $\sigma$ = 0? When all data values are the same.

Remarks (cont.)
4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.
5. Variance
s² sample variance
$\sigma^2$ population variance
Units are squared units of the original data: square $, square gallons ??

Remarks 6): Why divide by n − 1 instead of n?
degrees of freedom
each observation has 1 degree of freedom
however, when you estimate an unknown population parameter like $\mu$, you lose 1 degree of freedom
In the formula for s, we use $\bar{x}$ to estimate the unknown value of $\mu$:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Remarks 6) (cont.): Why divide by n − 1 instead of n?
Example
Suppose we have 3 numbers whose average is 9
Choose ANY values for $x_1$ and $x_2$: $x_1$ = ___, $x_2$ = ___
Since the average (mean) is 9, $x_1 + x_2 + x_3$ must equal 9*3 = 27, so $x_3 = 27 - (x_1 + x_2)$
once we selected $x_1$ and $x_2$, $x_3$ was determined since the average was 9
3 numbers but only 2 "degrees of freedom"

Computational Example
observations 1, 3, 5, 9; $\bar{x} = \frac{18}{4} = 4.5$
$s = \sqrt{\frac{(1 - 4.5)^2 + (3 - 4.5)^2 + (5 - 4.5)^2 + (9 - 4.5)^2}{4 - 1}} = \sqrt{\frac{(-3.5)^2 + (-1.5)^2 + (.5)^2 + (4.5)^2}{3}} = \sqrt{\frac{12.25 + 2.25 + .25 + 20.25}{3}} = \sqrt{\frac{35}{3}} = \sqrt{11.67} = 3.42$
$s^2 = 11.67$

class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
n = 23; $\bar{x}$ = 84.48; m = 85
s² = 290.26 (beats per minute)²
s = 17.037 beats per minute

Review: Properties of s and $\sigma$
s and $\sigma$ are always greater than or equal to 0
when does s = 0? $\sigma$ = 0?
The larger the value of s (or $\sigma$), the greater the spread of the data
the standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE                       POPULATION
$\bar{y}$  sample mean       $\mu$  population mean
m  sample median             m  population median
s²  sample variance          $\sigma^2$  population variance
s  sample stand. dev.        $\sigma$  population stand. dev.

End of Chapter 3