GRAPHICAL METHODS FOR QUANTITATIVE DATA

Chapter 3 Part 2
Numerical Summaries of Symmetric Data
Measure of Center: Mean
Measure of Variability: Standard Deviation

Symmetric Data
[Figure: body temp. of 93 adults]

Recall: 2 characteristics of a data set to measure
• center: measures where the "middle" of the data is located
• variability: measures how "spread out" the data is

Measure of Center When Data Approx. Symmetric
• mean (arithmetic mean)
• notation
  \(x_i\): ith measurement in a set of observations \(x_1, x_2, x_3, \ldots, x_n\)
  \(n\): number of measurements in the data set; sample size
  \[ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n \]

Sample mean \(\bar{x}\)
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Population mean \(\mu\) (value typically not known)
\(N\) = population size
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

Recall: Warmup
• Six people in a room have a median age of 45 years and a mean age of 45 years.
• One person who is 40 years old leaves the room.
• Questions:
  1. What is the median age of the 5 people remaining in the room? Can't answer: the median depends on the individual ages, which are not given.
  2. What is the mean age of the 5 people remaining in the room? 46, since 45 × 6 = 270; 270 - 40 = 230; 230/5 = 46.

Connection Between Mean and Histogram
A histogram balances when supported at the mean.
[Figure: histogram "Absences from Work"; y-axis: frequency; mean \(\bar{x} = 140.6\)]

Mean: balance point
Median: 50% of the area in each half
[Figure: histograms; for the right histogram, mean 55.26 yrs and median 57.7 yrs]

Properties of Mean, Median
1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).
2. The mean uses the value of every number in the data set; the median does not.
   Ex. 2, 4, 6, 8: \(\bar{x} = \frac{20}{4} = 5\); \(m = \frac{4 + 6}{2} = 5\)
   Ex. 2, 4, 6, 9: \(\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}\); \(m = \frac{4 + 6}{2} = 5\)

Example: class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
\(n = 23\)
\[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]
\(m\): location: 12th obs.
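The mean and median reported for the class pulse rates can be checked with a short Python sketch. It assumes only the standard library `statistics` module; the variable names (`pulse_rates`, `median_position`) are chosen here for illustration and are not part of the slides.

```python
from statistics import mean, median

# Class pulse rates from the example (already sorted, n = 23).
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse_rates)
x_bar = mean(pulse_rates)        # sum of all values divided by n
median_position = (n + 1) // 2   # for odd n: 1-based position of the middle value
m = median(pulse_rates)          # value at that position in the sorted data

print(n, round(x_bar, 2), median_position, m)
```

Every value enters the sum used for the mean, while the median depends only on which value sits in the middle position.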
\(m = 85\)

2010, 2014 baseball salaries
• 2010: n = 845; \(\mu\) = $3,297,828; median = $1,330,000; max = $33,000,000
• 2014: n = 848; \(\mu\) = $3,932,912; median = $1,456,250; max = $28,000,000

Disadvantage of the mean
• Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

Mean, Median, Maximum Baseball Salaries 1985 - 2014
[Figure: mean, median, and maximum baseball salaries by year, 1985-2014; axes: year, mean and median salary, maximum salary]

Skewness: comparing the mean and median
• Skewed to the right (positively skewed): mean > median
  [Figure: histogram "2011 Baseball Salaries"; x-axis: salary ($1,000's); y-axis: frequency]
• Skewed to the left (negatively skewed): mean < median
  [Figure: histogram of exam scores; x-axis: exam scores; y-axis: frequency; mean = 78, median = 87]
• Symmetric data: mean and median approx. equal
  [Figure: histogram "Bank Customers: 10:00-11:00 am"; x-axis: number of customers; y-axis: frequency]

DESCRIBING VARIABILITY OF SYMMETRIC DATA

Describing Symmetric Data (cont.)
• Measure of center for symmetric data: sample mean \(\bar{x}\)
  \[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
• Measure of variability for symmetric data?

Example
2 data sets:
\(x_1 = 49,\ x_2 = 51\); \(\bar{x} = 50\)
\(y_1 = 0,\ y_2 = 100\); \(\bar{y} = 50\)
On average, they're both comfortable
[Figure: the two data sets, 0 and 100 versus 49 and 51]

Ways to measure variability
1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small obs.
2. measure spread from the middle, where the middle is the mean \(\bar{x}\)
   • deviation of \(x_i\) from the mean: \(x_i - \bar{x}\)
   • sum the deviations of all the \(x_i\)'s from \(\bar{x}\): \(\sum_{i=1}^{n} (x_i - \bar{x})\)
   • \(\sum_{i=1}^{n} (x_i - \bar{x}) = 0\) always; tells us nothing

Previous Example
sum of deviations from mean:
\(x_1 = 49,\ x_2 = 51\); \(\bar{x} = 50\)
\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]
\(y_1 = 0,\ y_2 = 100\); \(\bar{y} = 50\)
\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]

The Sample Standard Deviation, a measure of spread around the mean
• Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations
  square each deviation, \((x_i - \bar{x})^2\); sum them, \(\sum_{i=1}^{n} (x_i - \bar{x})^2\), and find the "average"; then take the square root of the average
  \[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
  called the sample standard deviation

Calculations ...
Women height (inches)

  i    x_i    x-bar    x_i - x-bar    (x_i - x-bar)^2
  1    59     63.4       -4.4             19.0
  2    60     63.4       -3.4             11.3
  3    61     63.4       -2.4              5.6
  4    62     63.4       -1.4              1.8
  5    62     63.4       -1.4              1.8
  6    63     63.4       -0.4              0.1
  7    63     63.4       -0.4              0.1
  8    63     63.4       -0.4              0.1
  9    64     63.4        0.6              0.4
 10    64     63.4        0.6              0.4
 11    65     63.4        1.6              2.7
 12    66     63.4        2.6              7.0
 13    67     63.4        3.6             13.3
 14    68     63.4        4.6             21.6
                    Sum   0.0        Sum  85.2

(The squared deviations are computed from the unrounded mean, 63.36, so they differ slightly from the squares of the rounded deviations shown.)

Mean = 63.4
Sum of squared deviations from mean = 85.2
n - 1 = 13; (n - 1) is called the degrees of freedom (df)
s^2 = variance = 85.2/13 = 6.55 inches squared
s = standard deviation = √6.55 = 2.56 inches

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.

[Figure: the height data with mean ± 1 s.d. marked]

1. First calculate the variance \(s^2\):
   \[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
2. Then take the square root to get the standard deviation s.
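The two-step calculation for the women's height data (variance first, then its square root) can be reproduced with a minimal Python sketch. It assumes the 14 heights from the table and uses only the standard library; the variable names are illustrative, not from the slides.

```python
from math import sqrt
from statistics import stdev, variance

# Women's heights in inches, from the table (n = 14).
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                       # sample mean (unrounded)
deviations = [x - x_bar for x in heights]      # x_i - x_bar; these sum to 0
sum_sq_dev = sum(d ** 2 for d in deviations)   # sum of squared deviations

s2 = sum_sq_dev / (n - 1)   # step 1: variance, dividing by n - 1 = 13
s = sqrt(s2)                # step 2: standard deviation

print(round(x_bar, 1), round(sum_sq_dev, 1), round(s2, 2), round(s, 2))

# The library functions also divide by n - 1, so they agree with the hand steps.
assert abs(s2 - variance(heights)) < 1e-9
assert abs(s - stdev(heights)) < 1e-9
```

Run as written, this should print 63.4 85.2 6.55 2.56, matching the table. Working from the unrounded mean is what makes the squared deviations add up to 85.2.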
\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Population Standard Deviation
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
• value of the population standard deviation \(\sigma\) typically not known; use \(s\) to estimate the value of \(\sigma\)

Remarks
1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
2. Note that \(s\) and \(\sigma\) are always greater than or equal to zero.
3. The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.
   When does \(s = 0\)? When does \(\sigma = 0\)? When all data values are the same.
4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).
5. Variance
   • \(s^2\): sample variance
   • \(\sigma^2\): population variance
   • Units are squared units of the original data (square $, square gallons ??)
6. Why divide by n - 1 instead of n?
   • degrees of freedom
   • each observation has 1 degree of freedom
   • however, when you estimate an unknown population parameter like \(\mu\), you lose 1 degree of freedom
   In the formula for \(s\), we use \(\bar{x}\) to estimate the unknown value of \(\mu\):
   \[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
   Example
   • Suppose we have 3 numbers whose average is 9
   • Choose ANY values for \(x_1\) and \(x_2\)
   • Since the average (mean) is 9, \(x_1 + x_2 + x_3\) must equal 9 × 3 = 27, so \(x_3 = 27 - (x_1 + x_2)\)
   • once we selected \(x_1\) and \(x_2\), \(x_3\) was determined since the average was 9
   • 3 numbers but only 2 "degrees of freedom"

Computational Example
observations 1, 3, 5, 9; \(\bar{x} = \frac{18}{4} = 4.5\)
\[ s = \sqrt{\frac{(1 - 4.5)^2 + (3 - 4.5)^2 + (5 - 4.5)^2 + (9 - 4.5)^2}{4 - 1}} = \sqrt{\frac{(-3.5)^2 + (-1.5)^2 + (0.5)^2 + (4.5)^2}{3}} \]
\[ = \sqrt{\frac{12.25 + 2.25 + 0.25 + 20.25}{3}} = \sqrt{\frac{35}{3}} = \sqrt{11.67} = 3.42 \]
\(s^2 = 11.67\)

class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
\(n = 23\); \(\bar{x} = 84.48\); \(m = 85\)
\(s^2 = 290.26\) (beats per minute)^2
\(s = 17.037\) beats per minute

Review: Properties of s and \(\sigma\)
• \(s\) and \(\sigma\) are always greater than or equal to 0
  when does \(s = 0\)? \(\sigma = 0\)?
• The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data
• the standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation
SAMPLE                               POPULATION
\(\bar{x}\)  sample mean             \(\mu\)       population mean
\(m\)        sample median           \(m\)         population median
\(s^2\)      sample variance         \(\sigma^2\)  population variance
\(s\)        sample stand. dev.      \(\sigma\)    population stand. dev.

End of Chapter 3
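The chapter's computational example (observations 1, 3, 5, 9), the pulse-rate standard deviation, and the degrees-of-freedom argument can be reproduced with a short Python sketch. It assumes only the standard library `statistics` module. The helper `third_value` is a hypothetical name introduced here to illustrate the "3 numbers whose average is 9" example; it does not come from the slides.

```python
from statistics import pstdev, stdev, variance

# Computational example: sample variance and standard deviation (divisor n - 1).
observations = [1, 3, 5, 9]
s2 = variance(observations)
s = stdev(observations)

# Dividing by n instead (the population formula) gives a smaller value.
sd_divide_by_n = pstdev(observations)

# Class pulse rates (n = 23).
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]
pulse_s2 = variance(pulse_rates)
pulse_s = stdev(pulse_rates)


def third_value(x1, x2, average=9):
    """Hypothetical helper: with 3 numbers and a fixed average,
    the third number is forced once the first two are chosen."""
    return 3 * average - (x1 + x2)


print(round(s2, 2), round(s, 2), round(sd_divide_by_n, 2))
print(round(pulse_s2, 2), round(pulse_s, 3))
print(third_value(10, 5))   # x3 is determined by x1 = 10 and x2 = 5
```

Only two of the three values can be chosen freely when the mean is fixed, which is the sense in which estimating the mean uses up one degree of freedom.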
