2.4 Describing Distributions Numerically (cont.)
Describing Symmetric Data

Symmetric Data
  [Figure: body temperatures of 93 adults]

Recall: 2 characteristics of a data set to measure
  center: measures where the "middle" of the data is located
  variability: measures how "spread out" the data is

Measure of Center When Data Are Approximately Symmetric
  mean (arithmetic mean)
  notation:
    \(x_i\): the \(i\)th measurement in a set of observations \(x_1, x_2, x_3, \ldots, x_n\)
    \(n\): number of measurements in the data set (sample size)
    \[ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n \]

Sample mean
  \[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Population mean (value typically not known)
  \(N\) = population size
  \[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

Connection Between Mean and Histogram
  A histogram balances when supported at the mean.
  Mean \(\bar{x} = 140.6\)
  [Figure: histogram of absences from work; vertical axis: frequency]

Mean: balance point
Median: 50% of the area in each half
  Right histogram: mean 55.26 yrs, median 57.7 yrs

Properties of Mean, Median
1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).
2. The mean uses the value of every number in the data set; the median does not.
  Ex. 2, 4, 6, 8: \(\bar{x} = \frac{20}{4} = 5\); \(m = \frac{4+6}{2} = 5\)
  Ex. 2, 4, 6, 9: \(\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}\); \(m = \frac{4+6}{2} = 5\)

Think about mean and median
Six people in a room have a median age of 45 years and a mean age of 45 years. One person who is 40 years old leaves the room.
Questions:
1. What is the median age of the 5 people remaining in the room? Can't answer (the median depends on the individual ages, which are not given).
2. What is the mean age of the 5 people remaining in the room? 46 (\(45 \times 6 = 270\); \(270 - 40 = 230\); \(230 / 5 = 46\))

Example: class pulse rates
  53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
  \(n = 23\)
  \[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]
  Median \(m\): location is the 12th ordered observation
  \(m = 85\)

2010, 2014 baseball salaries

  Year   n     Mean          Median        Max
  2010   845   $3,297,828    $1,330,000    $33,000,000
  2014   848   $3,932,912    $1,456,250    $28,000,000

Disadvantage of the mean
  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data.

Mean, Median, Maximum Baseball Salaries 1985-2014
  [Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014; left axis: mean and median salary; right axis: maximum salary; horizontal axis: year]

Skewness: comparing the mean and median
Skewed to the right (positively skewed)
  mean > median
  [Figure: histogram of 2013 MLB salaries; horizontal axis: 2013 salary ($1,000); vertical axis: frequency]

Skewed to the left (negatively skewed)
  mean < median
  mean = 78; median = 87
  [Figure: histogram of exam scores; horizontal axis: exam scores; vertical axis: frequency]

Symmetric data
  mean, median approximately equal
  [Figure: histogram, Bank Customers: 10:00-11:00 am; horizontal axis: number of customers; vertical axis: frequency]

Describing Symmetric Data (cont.)
  Measure of center for symmetric data: sample mean
  \[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
  Measure of variability for symmetric data?

Example
  2 data sets:
    \(x_1 = 49,\ x_2 = 51\); \(\bar{x} = 50\)
    \(y_1 = 0,\ y_2 = 100\); \(\bar{y} = 50\)

On average, they're both comfortable
  [Figure: one data set at 49 and 51, the other at 0 and 100]

Ways to measure variability
1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small observation.
2. Measure spread from the middle, where the middle is the mean \(\bar{x}\).
   Deviation of \(x_i\) from the mean: \(x_i - \bar{x}\)
   Sum the deviations of all the \(x_i\)'s from \(\bar{x}\): \(\sum_{i=1}^{n} (x_i - \bar{x})\)
   \(\sum_{i=1}^{n} (x_i - \bar{x}) = 0\) always; tells us nothing.

Previous Example
  Sum of deviations from the mean:
  \(x_1 = 49,\ x_2 = 51;\ \bar{x} = 50\)
  \[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]
  \(y_1 = 0,\ y_2 = 100;\ \bar{y} = 50\)
  \[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]

The Sample Standard Deviation, a measure of spread around the mean
  Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.
  Form \((x_i - \bar{x})^2\), sum to get \(\sum_{i=1}^{n} (x_i - \bar{x})^2\), find the "average", then take the square root of the average:
  \[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
  called the sample standard deviation.

Calculations ...
Women's height (inches)

   i    x_i    mean    (x_i - mean)    (x_i - mean)^2
   1    59     63.4       -4.4             19.0
   2    60     63.4       -3.4             11.3
   3    61     63.4       -2.4              5.6
   4    62     63.4       -1.4              1.8
   5    62     63.4       -1.4              1.8
   6    63     63.4       -0.4              0.1
   7    63     63.4       -0.4              0.1
   8    63     63.4       -0.4              0.1
   9    64     63.4        0.6              0.4
  10    64     63.4        0.6              0.4
  11    65     63.4        1.6              2.7
  12    66     63.4        2.6              7.0
  13    67     63.4        3.6             13.3
  14    68     63.4        4.6             21.6
  Mean         63.4    Sum  0.0        Sum 85.2

  Mean = 63.4
  Sum of squared deviations from the mean = 85.2
  (n - 1) = 13; (n - 1) is called the degrees of freedom (df)
  \(s^2\) = variance = 85.2/13 = 6.55 inches squared
  \(s\) = standard deviation = \(\sqrt{6.55}\) = 2.56 inches

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
  [Figure: mean ± 1 s.d.]
1. First calculate the variance \(s^2\):
   \[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
2. Then take the square root to get the standard deviation \(s\).
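The two steps can also be checked in software. The following is a minimal Python sketch, assuming the 14 heights from the calculation table; the variable names are illustrative, and only the standard library is used.

```python
import math
import statistics

# Heights (inches) of the 14 women from the calculation table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Deviations from the mean sum to zero (up to floating-point rounding).
deviations = [x - mean for x in heights]
sum_dev = sum(deviations)

# Step 1: variance = sum of squared deviations divided by n - 1.
sum_sq = sum(d ** 2 for d in deviations)
variance = sum_sq / (n - 1)

# Step 2: standard deviation = square root of the variance.
sd = math.sqrt(variance)

print(round(mean, 1), round(sum_sq, 1), round(variance, 2), round(sd, 2))
# prints: 63.4 85.2 6.55 2.56

# The standard library's sample (n - 1) functions give the same values.
assert math.isclose(variance, statistics.variance(heights))
assert math.isclose(sd, statistics.stdev(heights))
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead of n - 1, so they correspond to the population formulas rather than the sample formulas used here.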
Written as a formula, the sample standard deviation is
  \[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Population Standard Deviation
  \[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
  The value of the population standard deviation \(\sigma\) is typically not known; use \(s\) to estimate the value of \(\sigma\).

Remarks
1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)
2. Note that \(s\) and \(\sigma\) are always greater than or equal to zero.
3. The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.
   When does \(s = 0\)? When does \(\sigma = 0\)?

Remarks (cont.)
4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).
5. Variance
   \(s^2\): sample variance
   \(\sigma^2\): population variance
   Units are squared units of the original data (square $, square gallons ??).

Remarks 6): Why divide by n - 1 instead of n?
  Degrees of freedom: each observation has 1 degree of freedom.
  However, when you estimate an unknown population parameter like \(\mu\), you lose 1 degree of freedom.
  In the formula for \(s\), we use \(\bar{x}\) to estimate the unknown value of \(\mu\):
  \[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

Remarks 6) (cont.): Why divide by n - 1 instead of n?
  Example: suppose we have 3 numbers whose average is 9.
    \(x_1 =\) ___, \(x_2 =\) ___; then \(x_3\) must be ___
  Once we selected \(x_1\) and \(x_2\), \(x_3\) was determined since the average was 9.
  3 numbers but only 2 "degrees of freedom"

Computational Example
  Observations: 1, 3, 5, 9; \(\bar{x} = \frac{18}{4} = 4.5\)
  \[ s = \sqrt{\frac{(1 - 4.5)^2 + (3 - 4.5)^2 + (5 - 4.5)^2 + (9 - 4.5)^2}{4 - 1}} = \sqrt{\frac{(-3.5)^2 + (-1.5)^2 + (0.5)^2 + (4.5)^2}{3}} \]
  \[ s = \sqrt{\frac{12.25 + 2.25 + 0.25 + 20.25}{3}} = \sqrt{\frac{35}{3}} = \sqrt{11.67} = 3.42 \]
  \(s^2 = 11.67\)

Class pulse rates
  53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
  \(n = 23\)
  \(\bar{x} = 84.48\)
  \(m = 85\)
  \(s^2 = 290.26\) (beats per minute)\(^2\)
  \(s = 17.037\) beats per minute

Example

  Data set   Observations                  mean   s      median
  #1         32 41 44 47 50 53 56 59 68    50     10.6   50
  #2         33 35 45 50 52 54 58 59 64    50     10.6   52
  #3         38 39 39 40 56 57 58 61 62    50     10.6   56
  #4         37 42 45 46 47 48 50 67 68    50     10.6   47

Boxplots: same mean, standard deviation
  [Figure: boxplots of the 4 data sets]

More Boxplots of the 4 data sets
  [Figure: boxplots of the 4 data sets]

Review: Properties of \(s\) and \(\sigma\)
  \(s\) and \(\sigma\) are always greater than or equal to 0.
    When does \(s = 0\)? When does \(\sigma = 0\)?
  The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.
  The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

  SAMPLE                               POPULATION
  \(\bar{y}\)   sample mean            \(\mu\)        population mean
  \(m\)         sample median          \(m\)          population median
  \(s^2\)       sample variance        \(\sigma^2\)   population variance
  \(s\)         sample stand. dev.     \(\sigma\)     population stand. dev.
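
The sample quantities in this summary can be computed directly. As a hedged illustration, the sketch below recomputes the mean, sample standard deviation, and median for the four data sets of the earlier example, assuming the values as listed there. The dictionary and variable names are illustrative, and only Python's standard library is used.

```python
import statistics

# The four data sets from the example (nine observations each).
data_sets = {
    "#1": [32, 41, 44, 47, 50, 53, 56, 59, 68],
    "#2": [33, 35, 45, 50, 52, 54, 58, 59, 64],
    "#3": [38, 39, 39, 40, 56, 57, 58, 61, 62],
    "#4": [37, 42, 45, 46, 47, 48, 50, 67, 68],
}

# For each data set: (sample mean, sample standard deviation, sample median).
summary = {}
for label, values in data_sets.items():
    summary[label] = (
        statistics.mean(values),              # sample mean
        round(statistics.stdev(values), 1),   # sample s.d. (divides by n - 1)
        statistics.median(values),            # sample median
    )
    print(label, summary[label])
```

All four data sets share the same mean and standard deviation, yet their medians differ, which is why the boxplots are worth looking at alongside these two numbers.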