Download 33_center_spread_with_standard_deviation

Describing Distributions of Quantitative Data: Center and Spread

Shape, Center, Spread
After spending some time in previous units describing the shape of quantitative data, in this unit we will describe the center and spread of quantitative data.

Objectives:
1. Students will be able to calculate measures of center, including mean, median, and midrange. Students will also be able to calculate measures of spread, including IQR and standard deviation.
2. Students will know which measures of center and spread are appropriate for the data being described.

Measures of Center

Midrange: A simple measure of center that takes the average of the maximum and minimum values.

$$\text{Midrange} = \frac{\text{Max} + \text{Min}}{2}$$

Example: Find the midrange of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
First, put the data in order: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
Max = 20, Min = 2

$$\text{Midrange} = \frac{20 + 2}{2} = \frac{22}{2} = 11$$

Mean: Commonly referred to as the "average" of a set of data, the mean takes the sum of the data and divides it by the number of data entries.

$$\bar{y} = \frac{\text{Sum of entries}}{\text{Number of entries}} = \frac{\sum y}{n}$$

Example: Find the mean of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
Add the 10 numbers and divide by 10:

$$\bar{y} = \frac{6 + 2 + 5 + 8 + 10 + 15 + 20 + 3 + 4 + 8}{10} = \frac{81}{10} = 8.1$$

Median: The middle value of an ordered set of data. If there is an odd number of data entries, the median is the middle value. If there is an even number of entries, the median is the mean of the two middle values.

Example: Find the median of the following sets of data.
a) 5, 4, 9, 20, 15
   In order: 4, 5, 9, 15, 20
   Median = 9
b) 6, 2, 5, 8, 10, 15, 20, 3, 4, 8
   In order: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
   Median = (6 + 8) / 2 = 7

Which Measure of CENTER?
Midrange: Very sensitive to changes in the extreme values, because it uses only the maximum and minimum. Not a very good measure for describing a whole set of data.
Mean: Good for describing symmetric data.
Median: Good for describing skewed data or data with outliers.
(If the data is symmetric, the median and mean will be very similar numbers.
If the median and mean are very different, the data is skewed or has outliers.)

Average?
Can you calculate:
a) Your average test grade?
b) The average heart rate?
c) The average family?
d) The average song title?
"Average" is a term used to mean "typical". With numeric data we need to be more specific. (Is your typical test grade the mean or the median of your test grades?) If your data is not numeric, it does not make sense to calculate a mean or median to describe an average.

Measures of Spread

Range: The maximum value minus the minimum value of a set of data. A simple measure of spread, useful for choosing the scale of a graph.

$$\text{Range} = \text{Max} - \text{Min}$$

Example: Find the range of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
First, put the data in order: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
Max = 20, Min = 2

$$\text{Range} = 20 - 2 = 18$$

IQR (interquartile range): The range of the middle 50% of the data, found by subtracting the first quartile from the third quartile. Best used to describe the spread of skewed data.

$$\text{IQR} = Q_3 - Q_1$$

Example: Find the IQR of the following data.
2, 3, 4, 5, 6, 8, 8, 10, 15, 20
Min = 2, $Q_1$ = 4, Median = 7, $Q_3$ = 10, Max = 20

$$\text{IQR} = 10 - 4 = 6$$

Standard Deviation: Roughly, the typical distance of the data values from the mean. Best used to describe the spread of symmetric data.

$$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$

* This formula is difficult to understand at first glance. It will be explained in subsequent slides.

Variance and Standard Deviation
Notation: For a set of data $\{y_1, y_2, y_3, y_4, \ldots, y_n\}$:
$n$: the number of data entries
$\sum y$: the sum of the data entries
$\bar{y}$: the mean, $\bar{y} = \frac{\sum y}{n}$
$s^2$: variance
$s$: standard deviation

Variance ($s^2$)
Another measure of spread (best used for symmetric data), variance finds the "almost average" squared distance of each data point from the mean. The symbol used for variance is $s^2$ because it is the square of the standard deviation.
(Standard deviation is the square root of variance.)

$$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$

Reading the formula:
$y - \bar{y}$: the distance of a data value from the mean
$(\ldots)^2$: each distance is squared
$\sum$: the squared distances are summed
$n - 1$: one less than the total number of data entries, which gives the "almost average"

Variance ($s^2$)
Example: Find the variance of the data.
6, 8, 10, 14, 17 (these are the $y$ values)

First calculate the mean ($\bar{y}$):

$$\bar{y} = \frac{6 + 8 + 10 + 14 + 17}{5} = \frac{55}{5} = 11$$

Then find the distance of each data point from the mean ($y - \bar{y}$):
(6 - 11), (8 - 11), (10 - 11), (14 - 11), (17 - 11)
= -5, -3, -1, 3, 6

Square each distance, then add ($\sum$) them together:

$$(-5)^2 + (-3)^2 + (-1)^2 + (3)^2 + (6)^2 = 25 + 9 + 1 + 9 + 36 = 80$$

Finally, divide this number by $(n - 1)$. Here $n$ is the number of data entries: $n = 5$ in this example, so $n - 1 = 4$.

$$s^2 = \frac{80}{4} = 20$$

The variance of this data is $s^2 = 20$.

Problem with Variance
The problem with variance is that it always yields square units. We don't usually want to compare (units)$^2$, e.g., square meters (m$^2$), mpg$^2$, or (test grade)$^2$.
We want to describe our spread in terms of the same units as our original data. If the original data is in meters, we want to know the spread in terms of meters. If the original data are test grades, we want to know the spread in terms of test grades.
To fix this problem, we use the standard deviation ($s$) as our measure of spread. Standard deviation is the square root of variance ($s^2$), so the units for standard deviation will be the same as the units of the original data.

$$\text{standard deviation} = \sqrt{\text{variance}}, \qquad s = \sqrt{s^2}$$

Standard Deviation

$$\text{variance: } s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$

$$\text{standard deviation: } s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$

Recall the data from the previous example:
6, 8, 10, 14, 17
We found that the variance ($s^2$) for this data is 20. Therefore, the standard deviation is

$$s = \sqrt{20} \approx 4.47$$

What this means is that a typical distance of a data point from the mean is approximately 4.47.
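The variance and standard deviation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the variable names are made up for the example, and the comparison against Python's standard `statistics` module is included only as a cross-check.

```python
import math
import statistics

# Data from the worked example in the slides.
data = [6, 8, 10, 14, 17]
n = len(data)

# Step 1: the mean (sum of entries divided by number of entries).
mean = sum(data) / n

# Step 2: the distance of each value from the mean (y - mean).
deviations = [y - mean for y in data]

# Step 3: square each distance and add them together.
sum_sq = sum(d ** 2 for d in deviations)

# Step 4: divide by n - 1 (the "almost average") to get the variance.
variance = sum_sq / (n - 1)

# Step 5: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean)               # 11.0
print(sum_sq)             # 80.0
print(variance)           # 20.0
print(round(std_dev, 2))  # 4.47

# Cross-check: statistics.variance and statistics.stdev also divide by n - 1.
assert math.isclose(statistics.variance(data), variance)
assert math.isclose(statistics.stdev(data), std_dev)
```

Dividing by `n` instead of `n - 1` would give the population variance (`statistics.pvariance`), which is a different quantity from the $s^2$ used in these slides.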
Standard Deviation
Recall that the mean of this data is 11.
6, 8, 10, 14, 17
[Figure: number line from 5 to 17 marking the data values, the mean at 11, and the distance of each value from the mean: 5, 3, 1, 3, and 6.]
Does it seem that the distances of the values from the mean have an approximate average of 4.47?

Finding the Variance and Standard Deviation using a table:
1. Find the mean of the data.
2. Set up three columns: $y$, $(y - \bar{y})$, and $(y - \bar{y})^2$.
3. Find the sum of the squared deviations.
4. Divide the sum by $(n - 1)$. This is the variance.
5. Take the square root of the variance. This is the standard deviation.

Example: Find the variance and standard deviation.
30, 32, 32, 40, 42, 46

$$\bar{y} = \frac{222}{6} = 37$$

$y$     $(y - \bar{y})$       $(y - \bar{y})^2$
30      (30 - 37) = -7        (-7)^2 = 49
32      (32 - 37) = -5        (-5)^2 = 25
32      (32 - 37) = -5        (-5)^2 = 25
40      (40 - 37) = 3         (3)^2 = 9
42      (42 - 37) = 5         (5)^2 = 25
46      (46 - 37) = 9         (9)^2 = 81

Add the squared deviations: 49 + 25 + 25 + 9 + 25 + 81 = 214

$$\text{Variance} = \frac{214}{n - 1} = \frac{214}{5} = 42.8$$

$$\text{Std. Dev.} = \sqrt{42.8} \approx 6.5$$

Try this: Find the mean and median of the data.
5, 7, 9, 9, 10, 11, 12
Given the histogram of the above data, which is the appropriate measure of center (mean or median)? Explain.
Hint: Is the data symmetric or skewed?

Try this: Find the IQR and the standard deviation.
5, 7, 9, 9, 10, 11, 12
Given the histogram of the above data, which is the appropriate measure of spread (IQR or standard deviation)? Explain.
Hint: Is the data symmetric or skewed?

Review
Create a box plot to describe the following data (be sure to identify any outliers).
5, 5, 6, 7, 8, 8, 9, 10, 12, 20, 23
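As a recap of the measures of center and spread from the earlier slides, the sketch below recomputes them for the ten-value data set used in those examples. It is a hedged illustration rather than the slides' own method: the helper names `median_of` and `quartiles_of` are hypothetical, and the quartile rule (the median of each half, leaving out the middle value when the count is odd) is an assumption chosen because it reproduces the slide's $Q_1 = 4$ and $Q_3 = 10$. Other software may use a different quartile convention and report slightly different values.

```python
def median_of(sorted_values):
    """Middle value, or the mean of the two middle values for an even count."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles_of(sorted_values):
    """Q1 and Q3 as the medians of the lower and upper halves (assumed rule)."""
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    # For an odd count, skip the middle value so it belongs to neither half.
    upper = sorted_values[half + (n % 2):]
    return median_of(lower), median_of(upper)


# Data from the midrange, mean, median, range, and IQR examples.
data = sorted([6, 2, 5, 8, 10, 15, 20, 3, 4, 8])

midrange = (max(data) + min(data)) / 2
mean = sum(data) / len(data)
median = median_of(data)
data_range = max(data) - min(data)
q1, q3 = quartiles_of(data)
iqr = q3 - q1

print(midrange)    # 11.0
print(mean)        # 8.1
print(median)      # 7.0
print(data_range)  # 18
print(q1, q3)      # 4 10
print(iqr)         # 6
```

Here the mean (8.1) is larger than the median (7), which matches the slides' point that the two measures separate when the data is skewed or has large values pulling the mean upward.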
