Warm-up

The numbers of deaths among persons aged 15 to 24 years in the United States in 1997 due to the seven leading causes of death for this age group were: accidents, 12,958; homicide, 5,793; suicide, 4,146; cancer, 1,583; heart disease, 1,013; congenital defects, 383; AIDS, 276. Make a bar graph to display these data. What additional information do you need to make a pie chart?

Warm-up 2

1. The histogram shows the number of major hurricanes that reached the East Coast of the United States from 1944 to 2000. Describe the shape, center, and spread of the distribution.

[Histogram: x-axis “Hurricanes” (0 to 7); y-axis “Frequency” (0 to 15)]

Warm-up 2 (cont.)

Hallux abducto valgus (call it HAV) is a deformation of the big toe that is not common in youth and often requires surgery. Doctors used X-rays to measure the angle (in degrees) of deformity in 38 consecutive patients under the age of 21 who came to a medical center for surgery to correct HAV. The angle is a measure of the seriousness of the deformity. Here are the data.

28 32 25 34 38 26 25 18 30 26
28 13 20 21 17 16 21 23 14 32
25 21 22 20 18 26 16 30 30 20
50 25 26 28 31 38 32 21

Make a stemplot and give a numerical description of this distribution. Are there any outliers? Write a brief discussion of the shape, center, and spread of the angle of deformity among young patients needing surgery for this condition.

Section 1.2: Describing Distributions with Numbers

Specific Ways to Describe Shape, Center, and Spread

Center:
• Mean – the ordinary arithmetic average, written $\bar{x}$ and pronounced “x-bar”:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Σ, pronounced “sigma,” means “the sum of.” In other words, you add up the terms 1 through n.
• Median – the midpoint of the data set. Denoted M.

Bonds vs. Aaron

Home runs per season:
Barry Bonds: 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Hank Aaron: 13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20

Have no fear… Your calculator is here!

You can get all this information from your calculator. Type your data into L1 and L2. Then go to Stat, 1-Var Stats, L1. Do the same thing for L2.
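For anyone without the calculator at hand, the same two measures of center can be sketched in Python. This is only an illustration using the standard `statistics` module; the names `bonds` and `aaron` are labels chosen here for the two home run lists, not anything from the slides.

```python
from statistics import mean, median

# Home runs per season, as listed on the Bonds vs. Aaron slide.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

for name, data in (("Bonds", bonds), ("Aaron", aaron)):
    # mean() is the arithmetic average (x-bar); median() is the midpoint M.
    print(f"{name}: n = {len(data)}, "
          f"mean = {mean(data):.4f}, median = {median(data)}")
```

The list plays the role of L1 or L2, and the two function calls report two of the values that 1-Var Stats shows.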
Compare Centers

Find the mean and median of both Bonds’ and Aaron’s home runs.

Bonds: $\bar{x} = 35.4375$, $M_x = 34$
Aaron: $\bar{y} = 34.9$, $M_y = 38$

Bonds has a higher average number of home runs, but this average is affected by the extreme value of 73. The median for Aaron is higher than the median for Bonds, indicating that Aaron hit more home runs than Bonds in a typical season.

Resistant and Non-resistant

The mean is affected by extreme observations, such as Bonds’ single-season record of 73 home runs. It is a non-resistant measure of center.
The median, however, is resistant to extreme observations. It is preferable when a data set has outliers.

Think About This

Change Bonds’ single-season record from 73 home runs to 100 home runs. How is the mean affected? The median?
How do the mean and median compare to each other in a symmetric distribution? In a (unimodal) skewed right distribution? In a (unimodal) skewed left distribution?

Introduction to Measures of Spread

Today, we’ll learn about quartiles. Oddly enough, they divide a data set into fourths (25% sections). Finding quartiles is like finding the median. You count to the midpoint, and average the middle two numbers if there is an even number of data points.

A Visual Representation of Quartiles

Q1: lower quartile, 25th percentile
Q2: median, 50th percentile
Q3: upper quartile, 75th percentile
Each of the four sections between these cut points holds 25% of the data.

So, there are really only THREE quartiles, and the middle one isn’t usually called a quartile (it’s called the median). We generally refer to Q1, M, and Q3.
To find Q1, you find the median of the lower half of the data. To find Q3, you find the median of the upper half of the data.

Try it!

16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73

Find the Range, Median, Q1, and Q3.

Solution

Lower half: 16 19 24 25 25 33 33 34, so Q1 = 25
All 16 values: the middle two are 34 and 34, so Median = 34
Upper half: 34 37 37 40 42 46 49 73, so Q3 = 41

So, the Range is 73 - 16 = 57. This gives us a little information about the variability of Bonds’ home runs in a season. The middle 50% of the data lies between 25 and 41, which shows how spread out the middle half of the data is.
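The median-of-halves rule used in the solution can be sketched in Python as follows. The function name `quartiles` is made up for this illustration, and leaving the middle observation out of both halves when n is odd is an assumption (a common textbook convention); the slides only work an even-sized example.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves rule.

    Assumption: when n is odd, the middle observation is left out of
    both halves. The slides only show an even-n example.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]                 # lower half of the ordered data
    upper = values[len(values) - half:]   # upper half of the ordered data
    return median(lower), median(values), median(upper)

# Bonds' home runs per season, from the "Try it!" slide.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

q1, m, q3 = quartiles(bonds)
data_range = max(bonds) - min(bonds)
print(f"Q1 = {q1}, M = {m}, Q3 = {q3}, range = {data_range}")
```

Other software may use a different quartile convention and report slightly different values for the same data, so small disagreements with a calculator or spreadsheet are not necessarily errors.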
Interquartile Range and the Outlier Rule

The IQR is simply Q3 - Q1. In our Barry Bonds example, IQR = 41 - 25 = 16. The IQR is a suitable measure of spread and is paired with the median.
We use the IQR to define what an outlier is. An outlier is any value (or values) that falls more than 1.5*IQR above the upper quartile or below the lower quartile.

“Fences”

Think of the 1.5*IQR rule as fences. They draw the boundary line beyond which values are outliers.

Is Barry Bonds’ 73 homer season an outlier???

Recall: Q1 = 25; Q3 = 41; IQR = 16
So, 1.5*IQR = 1.5*16 = 24. Add 24 to Q3 and subtract 24 from Q1:
Upper boundary = 41 + 24 = 65
Lower boundary = 25 - 24 = 1
Conclusion: 73 falls above the outlier boundary of 65, so it is an outlier!!!

5 Number Summary

The five number summary consists of the lowest value, Q1, the Median, Q3, and the highest value. It is important because we’ll use it to create a new kind of graph: a boxplot (also called a box-and-whiskers plot).

Bonds’ Boxplot

Recall his 5 number summary: L = 16; Q1 = 25; M = 34; Q3 = 41; H = 73

[Boxplot: axis “Number of home runs in a season” (10 to 70)]

Modified Boxplots

Modified boxplots show outliers as isolated points. Bonds’ 73 home run season was an outlier, so the whisker in a modified boxplot only extends to the last data point that was NOT an outlier. Any outlier is shown as a star (*).

CAUTION: Many students extend the whisker to the outlier “fence” (i.e., 65). This is WRONG! The whisker should stop at the last actual data point.

So tell me – where should the upper whisker end in a modified boxplot of Bonds’ home runs per season???
Answer: 49

We can look at these in the calculator as well. Go to StatPlot.
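The fence arithmetic and the whisker rule can be checked with a short Python sketch. The quartiles are entered as the values found on the slides rather than recomputed, and the variable names are chosen here only for illustration.

```python
# Bonds' home runs per season and the quartiles found on the slides.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
q1, q3 = 25, 41

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are outliers
upper_fence = q3 + 1.5 * iqr     # values above this are outliers

outliers = [x for x in bonds if x < lower_fence or x > upper_fence]
inside = [x for x in bonds if lower_fence <= x <= upper_fence]

# In a modified boxplot the whiskers stop at actual data points,
# not at the fences themselves.
whisker_low, whisker_high = min(inside), max(inside)

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"outliers = {outliers}, whiskers run from {whisker_low} to {whisker_high}")
```

Keeping the fences separate from the whisker ends mirrors the caution on the slide: the fence is only a cutoff for flagging outliers, while the whisker ends at the most extreme observation that is still inside it.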
It’s Never Too Soon for a Practice AP Question

2005 AP Statistics Problem #1
http://apcentral.collegeboard.com/apc/public/repository/_ap05_frq_statistics_45546.pdf

Question 1, Part a)

Part a) is graded Essentially Correct, Partially Correct, or Incorrect.
To receive an Essentially Correct, a student must successfully compare center, shape, and spread. Specific numeric values are not required.
To receive a Partially Correct, a student must successfully compare 2 of the 3 characteristics (center, shape, and spread).
All other responses are graded as Incorrect.

Special Notes

Compare means you state which is larger. For example, “the mean of the rural students’ daily caloric intake is greater than the mean for the urban students” is a correct comparison.
However, stating “the mean of the rural students’ daily caloric intake is 40.45 while the mean for the urban students is 32.6” is not a COMPARISON.

In Conclusion

Graders were looking for three comparisons:
Center: the mean caloric intake of the rural students is greater than the mean caloric intake of the urban students.
Spread: the spread of the rural students’ distribution is larger than the spread of the urban students’ distribution.
Shape: the rural students’ caloric intakes are roughly symmetric, while the urban students’ caloric intakes are skewed right.

There’s More to Spread than IQR

Section 1.2: Standard Deviation

Describing Data with Numbers

So far, we’ve learned the 5 Number Summary to describe a set of data: Min, Q1, M, Q3, and Max. We’ve also used the mean as another measure of center.

Measuring Spread: Standard Deviation

The most commonly used measure of spread is the standard deviation. Standard deviation tells us, roughly, how far the observations are from the mean on average.

Standard Deviation and Variance

Variance is the average of the squares of the deviations of the observations from the mean, where the sum is divided by n - 1 rather than n:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

WHAT??? But your calculator can tell you all of this!

Properties of Standard Deviation

$s^2$ is called the variance.
The square root of $s^2$ is $s$. $s$ measures spread about the mean and is called the standard deviation.
$s = 0$ only when there is NO SPREAD (in other words, all the data values are the same).
As the observations become more spread out about their mean, $s$ gets larger.
$s$ is not resistant to skewness or outliers. WHY?

Recap

Measures of spread: IQR, standard deviation
Measures of center: Median, Mean
When to use which???
The mean and the standard deviation are not resistant to outliers, so use them only when the distribution is roughly symmetric and there aren’t outliers.
Use the 5 Number Summary when the distribution is strongly skewed or has outliers.

How the AP Folks Test Your Ability to Reason

How do the following affect the mean? The median? The standard deviation?
Adding a certain amount to every value in a data set
Multiplying each value in a data set by the same number

Homework

Day 1: Chapter 1 #40, 41, 45, 50, 52
Day 2: Chapter 1 #63, 91, 94, 96, 101
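One way to explore the two transformation questions, and to check the standard deviation formula by hand, is a small Python experiment. This is only a sketch: the shift of 10 and the factor of 2 are arbitrary values picked for illustration, the helper name `summarize` is made up here, and Bonds’ home run data is reused as the example set.

```python
from math import sqrt
from statistics import mean, median, stdev

# Bonds' home runs per season, reused as an example data set.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Standard deviation straight from the formula, with the n - 1 divisor.
xbar = mean(bonds)
s2 = sum((x - xbar) ** 2 for x in bonds) / (len(bonds) - 1)
s_by_hand = sqrt(s2)

def summarize(data):
    """Return (mean, median, standard deviation); stdev() divides by n - 1."""
    return mean(data), median(data), stdev(data)

original = summarize(bonds)
shifted = summarize([x + 10 for x in bonds])   # add 10 to every value
scaled = summarize([x * 2 for x in bonds])     # multiply every value by 2

print(f"s by hand = {s_by_hand:.4f}, stdev() = {original[2]:.4f}")
for label, (avg, med, sd) in (("original", original),
                              ("add 10", shifted),
                              ("times 2", scaled)):
    print(f"{label:>8}: mean = {avg:.4f}, median = {med}, s = {sd:.4f}")
```

Comparing the three printed rows shows which summaries move when a constant is added and which change when every value is multiplied. Trying other shifts and factors is a quick way to test a conjecture before writing it up.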