Section 1.2 Describing Distributions with Numbers

Specific Ways to Describe Shape, Center and Spread
• Center:
  – Mean: the ordinary arithmetic average, written $\bar{X}$ and pronounced “x-bar.”
    $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
  – Median: the midpoint of the data set. Denoted M.

Bonds vs. Aaron
Home runs per season:
Barry Bonds: 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Hank Aaron:  13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20

Compare Centers
Find the mean and median of both Bonds’ and Aaron’s home runs. Bonds has a higher mean number of home runs, but Aaron has a higher median. Why?

Resistant and Non-resistant
Means are affected by extreme observations and outliers. The mean is a non-resistant measure of center. The median is resistant to extreme observations. It is preferable when a data set has outliers.

Think About This
Change Bonds’ single-season record from 73 home runs to 100 home runs. How is the mean affected? The median?
How do the mean and median compare to each other in a symmetric distribution? In a (unimodal) skewed right distribution? In a (unimodal) skewed left distribution?

Introduction to Measures of Spread
One measure of spread you’ve already studied is the range, where you subtract the lowest value from the highest value. It is not a dependable measure of spread, because it depends on only two values in the data set, and those are the two most extreme ones.
Today, we’ll learn about quartiles. They divide a data set into fourths. Finding quartiles is like finding the median: Q1 is the median of the lower half of the ordered data, and Q3 is the median of the upper half. As with the median, average the middle two numbers if there is an even number of data points.

A Visual Representation of Quartiles
Q1  Lower Quartile  25th %ile
Q2  Median          50th %ile
Q3  Upper Quartile  75th %ile
So, there are really only THREE quartiles, and the middle one isn’t usually called a quartile (it’s called the median). We generally refer to Q1, M, and Q3.

Try it!
Find the Range, Median, Q1, and Q3.
16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73

Solution
16 19 24 25 | 25 33 33 34 | 34 37 37 40 | 42 46 49 73
Q1 = 25 (average of 25 and 25)
Median = 34 (average of 34 and 34)
Q3 = 41 (average of 40 and 42)
So, the Range is 73 - 16 = 57.
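
The comparison of centers earlier in this section can be checked with a short script. This is a minimal Python sketch, not part of the original slides: the variable names and the `centers` helper are illustrative assumptions, and the data lists are the season totals from the Bonds vs. Aaron slide. It also tries the “Think About This” change of 73 to 100.

```python
from statistics import mean, median

# Season home run totals from the "Bonds vs. Aaron" slide.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]


def centers(data):
    """Return (mean, median) for a list of numbers (illustrative helper)."""
    return mean(data), median(data)


bonds_mean, bonds_median = centers(bonds)
aaron_mean, aaron_median = centers(aaron)

# "Think About This": replace the 73-homer season with 100 and recompute.
bonds_100 = [100 if x == 73 else x for x in bonds]
mean_100, median_100 = centers(bonds_100)

print(f"Bonds: mean = {bonds_mean:.2f}, median = {bonds_median}")
print(f"Aaron: mean = {aaron_mean:.2f}, median = {aaron_median}")
print(f"Bonds with 100: mean = {mean_100:.2f}, median = {median_100}")
```

Only the mean moves when the largest value is changed; the median stays put, which is what “resistant” means in practice.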
The range gives us a little information about the variability of Bonds’ home runs in a season. The middle 50% of the data lies between Q1 = 25 and Q3 = 41, so we see where the spread of the middle half of the data lies.

Interquartile Range and the Outlier Rule
The IQR is simply the difference between the upper quartile and the lower quartile: IQR = Q3 - Q1. In our Barry Bonds example, IQR = 41 - 25 = 16.
We use the IQR to define what an outlier is. An outlier is any value (or values) that falls more than 1.5*IQR above the upper quartile or below the lower quartile.

“Fences”
Think of the 1.5*IQR rule as fences. They draw the boundary line beyond which values are outliers: the lower fence is Q1 - 1.5*IQR and the upper fence is Q3 + 1.5*IQR.
Is Barry Bonds’ 73 homer season an outlier?

5 Number Summary
The five-number summary consists of the minimum, Q1, the median, Q3, and the maximum. It is important because we’ll use it to create a boxplot (also called a box-and-whiskers plot).

Bonds’ Boxplot
Recall his 5 number summary: L = 16; Q1 = 25; M = 34; Q3 = 41; H = 73
[Boxplot: Barry Bonds’ home runs in a season. Axis: number of home runs, 10 to 70.]

Describing Distributions using a Boxplot
• Spread: IQR or Range
• Center: Median
• Outliers: Use formula or a Modified Boxplot
• Shape:
  – If the Median is approx. centered: roughly symmetric
  – If the Median is closer to the maximum: skewed left
  – If the Median is closer to the minimum: skewed right

Graph Choices for Comparing Distributions
Boxplots alone contain little detail, but side-by-side boxplots effectively compare large sets of quantitative data.
Let’s plot Bonds’ vs. Aaron’s home runs and compare.

Keys to Remember
**Plot both distributions using the same scale.
**Always compare apples to apples. By that, I mean compare mean to mean, median to median, Q1 to Q1, etc. Students lose points on the AP exam when they make comparisons between two different measures.

Measuring Spread: Standard Deviation
The most commonly used measure of spread is the standard deviation. Standard deviation tells us, roughly, the average distance of the observations from the mean.
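
The five-number summary and the 1.5*IQR fences from the boxplot slides can be sketched in code. This is a hedged Python example, not part of the original slides: the function names `five_number_summary` and `fence_check` are hypothetical, and the quartiles are computed the way the slides describe (the median of each half of the ordered data), which is only one of several conventions in use.

```python
from statistics import median

# Bonds' season totals from the "Try it!" slide.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values. With an odd count, the middle value is
    left out of both halves.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def fence_check(data):
    """Return (lower fence, upper fence, values outside the fences)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [x for x in data if x < low_fence or x > high_fence]
    return low_fence, high_fence, flagged


summary = five_number_summary(bonds)
low_fence, high_fence, flagged = fence_check(bonds)

print("Five-number summary:", summary)
print("Fences:", low_fence, high_fence)
print("Outliers:", flagged)
```

Calculators and software packages may use a different quartile rule, so their Q1 and Q3 can differ slightly from the hand method used here.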
Standard Deviation and Variance
Variance is the average of the squares of the deviations of the observations from the mean, where the “average” divides by n - 1 rather than n. WHAT???
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Find this on your formula sheet!

Let Me ‘xplain
Observation    Deviation from Mean    Squared Deviation
1792           1792 - 1600 = 192      192^2 = 36,864
1666
1362
1614
1460
1867
1439
Mean = 1600                           Sum =

Standard Deviation Calculation Continued
s^2 = 214,870/6 = 35,811.67. This is the variance.
s = √35,811.67 = 189.24 calories

Properties of Standard Deviation
• s measures spread about the mean.
• s = 0 only when there is NO SPREAD (meaning all the values are the same).
• As the observations become more spread out about their mean, s gets larger.
• s is not resistant to skewness or outliers. WHY?

How the AP Folks Test Your Ability to Reason
How do the following affect the mean? The median? The Std. Dev.?
• Adding a certain amount to every value in a data set
• Multiplying each value in a data set by the same number

Recap
Measures of spread: Range, IQR, standard deviation
Measures of center: Median, Mean
When to use which???
The mean and the std. dev. are not resistant to outliers, so use them only when the distribution is roughly symmetric and there aren’t any outliers.
Use the 5 Number Summary when the distribution is strongly skewed or has outliers.

Height Project due ____day!
Height Project: Collect heights, in inches, from 50 high school girls and 50 high school boys (keep them separate). Organize these in two frequency tables and create side-by-side boxplots. Describe the distributions of both the boys’ and girls’ heights. Write 3 statements comparing the distributions.

Grading Rubric:
Frequency table of values (one for boys, one for girls)             10 points
Side-by-side boxplots                                               30 points
Describe the distributions for both the boys’ and girls’ heights    30 points
Comparison statements                                               30 points
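
As a check on the standard deviation calculation earlier in this section, the table of deviations can be filled in by a short script. This is a minimal Python sketch, not part of the original slides: the names `observations` and `sample_variance` are illustrative assumptions, and the seven values are the ones listed in the “Let Me ‘xplain” table. It also tries the two AP-style changes (adding a constant, multiplying by a constant) on the same data.

```python
from math import sqrt

# The seven observations from the deviation table (units: calories).
observations = [1792, 1666, 1362, 1614, 1460, 1867, 1439]


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    squared_devs = [(x - xbar) ** 2 for x in data]
    return sum(squared_devs) / (n - 1)


xbar = sum(observations) / len(observations)
variance = sample_variance(observations)
sd = sqrt(variance)

# Print each row of the table: observation, deviation, squared deviation.
for x in observations:
    print(x, x - xbar, (x - xbar) ** 2)
print(f"mean = {xbar}, variance = {variance:.2f}, s = {sd:.2f}")

# Adding 100 to every value shifts the mean but leaves s unchanged.
shifted = [x + 100 for x in observations]
sd_shifted = sqrt(sample_variance(shifted))

# Multiplying every value by 2 doubles both the mean and s.
scaled = [2 * x for x in observations]
sd_scaled = sqrt(sample_variance(scaled))

print(f"s after adding 100: {sd_shifted:.2f}")
print(f"s after doubling:   {sd_scaled:.2f}")
```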