Chapter 12
Describing Distributions with Numbers

Thought Question 1
Suppose you are helping to prepare a meal for the homeless at a local shelter, and you are told that on average about 25 people attend such meals. Is there anything else you should know about the data, in addition to the average, before deciding how much food to prepare? (Hint: Do 25 people attend every meal?)

Thought Question 2
The 2000-2001 Los Angeles Lakers professional basketball team had a median salary of $1,760,000 and an average salary of $3,459,000. How are these values computed? Which do you think is a better measure of the typical salary of a player on this team, the median or the average?

Thought Question 3
The length of human pregnancies has a mean, or average, of 266 days for the entire population of women. It also has a standard deviation of 16 days. What do you think is meant by the term “standard deviation”?

Turning Data Into Information
Center of the data
– mean
– median
– mode
Spread of the data (Variability)
– variance
– standard deviation
– range
– interquartile range

Average or Mean ($\bar{x}$)
Traditional measure of center
Sum the values and divide by the number of values
$$\bar{x} = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right) = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Median (M)
A resistant measure of the data’s center
At least half of the ordered values are less than or equal to the median value
At least half of the ordered values are greater than or equal to the median value
If n is odd, the median is the middle ordered value
If n is even, the median is the average of the two middle ordered values

Median
Example 1 data: 2 4 6
Median (M) = 4
Example 2 data: 2 4 6 8
Median = 5 (average of 4 and 6)
Example 3 data: 6 2 4
Median ≠ 2 (order the values: 2 4 6, so Median = 4)

Measures of Center
Weight Data (STAT 208 Class Survey, Spring 1997, Virginia Commonwealth University)
Stemplot (stem = first two digits, leaf = last digit):
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257        <- mean
16 | 555          <- median
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Comparing the Mean & Median
The mean and median of data from a symmetric distribution should be close together. The actual (true) mean and median of a symmetric distribution are exactly the same.
In a skewed distribution, the mean is farther out in the long tail than is the median [the mean is ‘pulled’ in the direction of the possible outlier(s)].

Case Study
Airline fares
Appeared in the New York Times on November 5, 1995
“...about 60% of airline passengers ‘pay less than the average fare’ for their specific flight.”
How can this be?
13% of passengers pay more than 1.5 times the average fare for their flight

Spread or Variability
If all values are the same, then they all equal the mean. There is no spread.
Variability exists when some values are different from (above or below) the mean.
We will discuss the following measures of spread: variance, standard deviation, range, and interquartile range

Variance and Standard Deviation
When variability exists, each data value has an associated deviation from the mean: $x_i - \bar{x}$
What is a typical deviation from the mean? (standard deviation)
Small values of this typical deviation indicate small spread in the data
Large values of this typical deviation indicate large spread in the data

Variance
Find the mean
Find the deviation of each value from the mean
Square the deviations
Sum the squared deviations
Divide the sum by n - 1 (gives typical squared deviation from mean)

Variance Formula
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

Standard Deviation Formula
typical deviation from the mean
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
[ standard deviation = square root of the variance ]

Traditional Summary Statistics
Weight Data
Mean = 158.75
Standard deviation = 35.65

Quartiles
Three numbers that divide the ordered data into four equal-sized groups.
Q1 has 25% of the data below it.
Q2 has 50% of the data below it. (Median)
Q3 has 75% of the data below it.

Quartiles
[Figure: uniform histogram divided by Q1, Q2, and Q3 into 1st, 2nd, 3rd, and 4th quarters]

Obtaining the Quartiles
Order the data.
For Q2, just find the median.
For Q1, look at the lower half of the data values, those to the left of the median; find the median of this lower half.
For Q3, look at the upper half of the data values, those to the right of the median; find the median of this upper half.
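
The median rule and the quartile procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the function names `median` and `quartiles` are hypothetical, and the code assumes that when n is odd the middle value is left out of both halves (other textbooks and software use slightly different quartile conventions). The eight-value list in the last line is made-up data.

```python
def median(values):
    """Middle ordered value (n odd) or average of the two middle values (n even)."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) by taking the median of each half of the ordered data.

    Assumption: for odd n the middle value belongs to neither half.
    Needs at least two values so that both halves are non-empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values to the left of the median
    upper = ordered[half + n % 2:]    # values to the right of the median
    return median(lower), median(ordered), median(upper)


# The three median examples from the slides
print(median([2, 4, 6]))       # 4
print(median([2, 4, 6, 8]))    # 5.0
print(median([6, 2, 4]))       # 4 (values are ordered before the middle is taken)

# Hypothetical data, for illustration only
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.5, 4.5, 6.5)
```

Sorting inside each function means the Example 3 mistake (reading the middle of unordered data) cannot happen.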

Weight Data: Sorted
100 101 106 106 110 110 119 120 120 123
124 125 127 128 130 130 133 135 139 140
148 150 150 152 155 157 165 165 165 170
170 170 172 175 175 180 180 180 180 185
185 185 186 187 192 194 195 203 210 212
215 220 260

Weight Data: Quartiles
Q1 = 127.5
Q2 = 165 (Median)
Q3 = 185

Quartiles
Weight Data stemplot (stem = first two digits, leaf = last digit):
10 | 0166
11 | 009
12 | 0034578      <- first quartile
13 | 00359
14 | 08
15 | 00257
16 | 555          <- median or second quartile
17 | 000255
18 | 000055567    <- third quartile
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Five-Number Summary
minimum = 100
Q1 = 127.5
M = 165
Q3 = 185
maximum = 260
Interquartile Range (IQR) = Q3 - Q1 = 57.5
IQR gives spread of middle 50% of the data (“middle 50% of data ranges from Q1 to Q3”)

Range
One way to measure spread is to give the smallest (minimum) and largest (maximum) values in the data set.
Range = max - min (“the values range from min to max”)
The range is strongly affected by outliers and is rarely used.

Boxplot (from Five-Number Summary)
A central box spans Q1 and Q3.
A line in the box marks the median M.
Lines extend from the box out to the minimum and maximum.

Weight Data: Boxplot
[Figure: boxplot of Weight with min, Q1, M, Q3, and max marked; axis from 100 to 275]

Choosing a Summary
Outliers affect the values of the mean and standard deviation.
The five-number summary should be used to describe center and spread for skewed distributions, or when outliers are present.
Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.

Number of Books Read for Pleasure: Sorted
0 0 0 0 0 0 0 0 0 1 1 1 1
1 2 2 2 2 2 2 2 2 2 3 3 3
(Med)
3 4 4 4 4 4 4 5 5 5 5 5 5
6 10 10 12 13 14 14 15 15 20 20 30 99

Five-Number Summary, Boxplot
Median = 3
interquartile range (iqr) = 5.5 - 1.0 = 4.5
range = 99 - 0 = 99
Mean = 7.06
s.d. = 14.43
[Figure: boxplot of Number of books; axis from 0 to 100]

Number of Books Read for Pleasure
[Figure: histogram; horizontal axis: Number of Books, 0 to 98; vertical axis from 0 to 10]

Key Concepts
Numerical Summaries
– Center (mean, median)
– Spread (variance, std. dev., range, IQR)
– Five-number summary & Boxplots
Choosing mean versus median
Choosing standard deviation versus five-number summary

Variance and Standard Deviation Example
The following data consist of the metabolic rates (cal./24hr.) of 7 men from a dieting study:
1792 1666 1362 1614 1460 1867 1439
First, compute the sample mean:
$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$

Variance and Standard Deviation Example
Observations    Deviations                 Squared deviations
$x_i$           $x_i - \bar{x}$            $(x_i - \bar{x})^2$
1792            1792 - 1600 = 192          (192)² = 36,864
1666            1666 - 1600 = 66           (66)² = 4,356
1362            1362 - 1600 = -238         (-238)² = 56,644
1614            1614 - 1600 = 14           (14)² = 196
1460            1460 - 1600 = -140         (-140)² = 19,600
1867            1867 - 1600 = 267          (267)² = 71,289
1439            1439 - 1600 = -161         (-161)² = 25,921
                sum = 0                    sum = 214,870

Variance and Standard Deviation Example
$$s^2 = \frac{214{,}870}{7 - 1} = 35{,}811.67$$
$$s = \sqrt{35{,}811.67} = 189.24 \text{ calories}$$
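
The worked example can be checked with a short Python sketch that follows the five variance steps (mean, deviations, squares, sum, divide by n - 1) and then takes the square root. The function names `sample_variance` and `sample_std_dev` are hypothetical helpers introduced here for illustration; only the seven metabolic rates come from the slides.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_std_dev(values):
    """Standard deviation = square root of the variance."""
    return math.sqrt(sample_variance(values))


# Metabolic rates (cal./24hr.) of the 7 men from the example
metabolic_rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

print(sum(metabolic_rates) / len(metabolic_rates))    # 1600.0
print(round(sample_variance(metabolic_rates), 2))     # 35811.67
print(round(sample_std_dev(metabolic_rates), 2))      # 189.24
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which also divide by n - 1 and could be used as a cross-check.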