Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Review and Preview 3-2 Measures of Center 3-3 Measures of Variation 3-4 Measures of Relative Standing and Boxplots Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 1 Preview Descriptive Statistics In this chapter we’ll learn to summarize or describe the important characteristics of a known set of data Inferential Statistics In later chapters we’ll learn to use sample data to make inferences or generalizations about a population Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 2 Section 3-2 Measures of Center Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 3 Measure of Center Measure of Center the value at the center or middle of a data set The three common measures of center are the mean, the median, and the mode. 3.1 - 4 Mean the measure of center obtained by adding the data values and then dividing the total by the number of values What most people call an average also called the arithmetic mean. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 5 Notation Greek letter sigma used to denote the sum of a set of values. x is the variable usually used to represent the data values. n represents the number of data values in a sample. 3.1 - 6 Example of summation • If there are n data values that are denoted as: x1 , x2 , , xn Then: x x1 x2 xn 3.1 - 7 Example of summation • data 21,25,32,48,53,62,62,64 Then: x 21 25 32 48 53 62 62 64 367 3.1 - 8 Sample Mean x is pronounced ‘x-bar’ and denotes the mean of a set of sample values x = x n Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 9 Example of Sample Mean • data 21,25,32,48,53,62,62,64 Then: 21 25 32 48 53 62 62 64 x 8 367 45.875 45.9 8 3.1 - 10 Notation µ Greek letter mu used to denote the population mean N represents the number of data values in a population. 3.1 - 11 Population Mean µ = x N Note: here x represents the data values in the population 3.1 - 12 Mean Advantages Is relatively reliable: means of samples drawn from the same population don’t vary as much as other measures of center Takes every data value into account Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 13 Mean Disadvantage Is sensitive to every data value, one extreme value can affect it dramatically; is not a resistant measure of center Example: 21,25,32,48,53,62,62,64 → 21,25,32,48,53,62,62,300 → x 45.9 x 75.4 3.1 - 14 Median Median the measure of center which is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude often denoted by x~ (pronounced ‘x-tilde’) is not affected by an extreme value - is a resistant measure of the center Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 15 Finding the Median First sort the values (arrange them in order), the follow one of these 1. If the number of data values is odd, the median is the number located in the exact middle of the list. 2. If the number of data values is even, the median is found by computing the mean of the two middle numbers. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 16 Example of Median • 6 data values: 5.40 1.10 0.42 0.73 0.48 1.10 • Sorted data: 0.42 0.48 0.73 1.10 1.10 5.40 (even number of values – no exact middle) 0.73 1.1 median 0.915 2 3.1 - 17 Example of Median • 7 data values: 5.40 1.10 0.42 0.73 0.48 1.10 0.73 1.10 1.10 0.66 • Sorted data: 0.42 0.48 0.66 5.40 median 0.73 3.1 - 18 Mode the value that occurs with the greatest frequency Data set can have one, more than one, or no mode 3.1 - 19 Mode Bimodal two data values occur with the same greatest frequency Multimodal more than two data values occur with the same greatest frequency No Mode no data value is repeated Mode is the only measure of central tendency that can be used with nominal data Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 20 Mode - Examples a) 5.40 1.10 0.42 0.73 0.48 1.10 Mode is 1.10 b) 27 27 27 55 55 55 88 88 99 Bimodal - c) 1 2 3 6 7 8 9 10 No Mode Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 27 & 55 3.1 - 21 Definition Midrange the value midway between the maximum and minimum data values in the original data set (average of highest and lowest data values) = maximum data value + minimum data value 2 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 22 Example of Midrange • data values: 5.41 1.13 0.42 0.49 0.65 1.86 1.69 5.41 0.42 midrange 2.915 2 3.1 - 23 Midrange Sensitive to extremes because it uses only the maximum and minimum values, so rarely used Redeeming Features (1) very easy to compute (2) reinforces that there are several ways to define the center (3) avoids confusion with median Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 24 Best Measure of Center Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 25 Round-off Rule for Measures of Center Carry one more decimal place than is present in the original set of values. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 26 Example of Round-off Rule • data values: 5.41 1.13 0.42 0.49 0.65 1.86 1.69 5.41 1.13 0.42 0.49 0.65 1.86 1.69 x 7 11.65 1.6642857 7 x 1.664 3.1 - 27 Example problem 12, page 95 • These data values represent weight gain or loss in kg for a simple random sample (SRS) of18 college freshman (negative data values indicate weight loss) 11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2 • Do these values support the legend that college students gain 15 pounds (6.8 kg) during their freshman year? Explain 3.1 - 28 Example problem 12, page 95 • Sample Mean x 34 1.9 kg n 18 • Median -5 -2 -2 -2 -2 0 0 1 1 2 2 3 3 4 5 7 8 11 1 2 median 1.5 kg 2 3.1 - 29 Example problem 12, page 95 • Mode is -2 • Midrange 5 11 midrange 3.0 kg 2 3.1 - 30 Example problem 12, page 95 • All of the measures of center are below 6.8 kg (15 pounds) • Based on measures of center, these data values do not support the idea that college students gain 15 pounds (6.8 kg) during their freshman year 3.1 - 31 Critical Thinking Think about whether the results are reasonable. See example 6, page 89 Think about the method used to collect the sample data. See example 7, page 89 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 32 Mean/Median with Graphing Calculator • First, enter the list of data values •Then select 2nd STAT (LIST) and arrow right to MATH option 3:mean( or 4: median( •and input the desired list 3.1 - 33 Example of Computing the Mean Using Calculator Sorted amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979 Note: this data is related to Three Mile Island nuclear power plant Accident in 1979. x = x n = 149.2 3.1 - 34 Example of Computing the Mean Using Calculator Median is 150 3.1 - 35 Part 2 Beyond the Basics of Measures of Center Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 36 Mean from a Frequency Distribution Assume that all sample values in each class are equal to the class midpoint. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 37 Mean from a Frequency Distribution use class midpoints for variable x Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 38 Example of Computing the Mean from a Frequency Distribution Heights (inches) of 25 Women 67 64 65 65 64 59 67 67 72 65 64 62 66 67 66 60 70 68 61 64 60 68 65 66 62 3.1 - 39 Example (cont.) Frequency distribution of heights of women HEIGHT FREQUENCY 59-60 3 61-62 3 63-64 4 65-66 7 67-68 6 69-70 1 71-72 1 Class midpoints are: 59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5 3.1 - 40 Example (cont.) HEIGHT FREQUENCY 59-60 3 61-62 3 63-64 4 65-66 7 67-68 6 69-70 1 71-72 1 3(59 .5) 3(61 .5) 4(63 .5) 7(65 .5) 6(67 .5) 1(69 .5) 1(71 .5) 25 1621.5 / 25 64.86 3.1 - 41 Weighted Mean When data values are assigned different weights, we can compute a weighted mean. (w • x) x = w Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 42 Example: Weighted Mean In this class, homework/quizz average is weighted 10%, 2 exams are weighted 60%, and final exam is weighted 30%. Suppose a student makes homework/quiz average 87, exam scores of 80 and 92, and final exam score 85. Compute the weighted average. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 43 Example: Weighted Mean Weighted average is: 0.10 (87 ) 0.30 (80 ) 0.30 (92 ) 0.30 (85 ) 85 .8 1.00 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 44 Skewed and Symmetric Symmetric distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half Skewed distribution of data is skewed if it is not symmetric and extends more to one side than the other Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 45 Skewed Left or Right Skewed to the left (also called negatively skewed) have a longer left tail, mean and median are to the left of the mode Skewed to the right (also called positively skewed) have a longer right tail, mean and median are to the right of the mode Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 46 Shape of the Distribution The mean and median cannot always be used to identify the shape of the distribution. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 47 Shape of the Distribution • Consider the data: 5 5 5 5 5 10 10 10 10 10 10 15 15 15 15 15 20 20 20 20 25 25 25 30 30 30 35 35 40 45 • Mean: x 560 x 18.7 n 30 15 15 ~ x 15 2 • Median: • The distribution is skewed and the mean is “pulled” in the direction of the outliers 40 and 45 3.1 - 48 Distribution Skewed Left • Outliers that are much less than the median Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 49 Symmetric Distribution • No outliers Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 50 Distribution Skewed Right • Outliers that are much more than the median Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 51 Recap In this section we have discussed: Types of measures of center Mean Median Mode Mean from a frequency distribution Weighted means Best measures of center Skewness Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 52 Section 3-3 Measures of Variation Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 53 Key Concept Discuss characteristics of variation, in particular, measures of variation, such as standard deviation, for analyzing data. Make understanding and interpreting the standard deviation a priority. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 54 Part 1 Basics Concepts of Measures of Variation Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 55 Why is it important to understand variation? • • A measure of the center by itself can be misleading Example: Two nations with the same median family income are very different if one has extremes of wealth and poverty and the other has little variation among families (see the following table). 3.1 - 56 Example of variation MEAN MEDIAN Data Set A Data Set B 50,000 10,000 60,000 20,000 70,000 70,000 80,000 120,000 90,000 130,000 70,000 70,000 70,000 70,000 Data set B has more variation about the mean 3.1 - 57 Histograms: example of variation Data set B has more variation about the mean (Target). 3.1 - 58 How do we quantify variation? 3.1 - 59 Definition The range of a set of data values is the difference between the maximum data value and the minimum data value. Range = (maximum value) – (minimum value) Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 60 Example of range. Data: 27 28 25 6 27 30 26 Range = 30 - 6 = 24 3.1 - 61 Range (cont.) Ignoring the outlier of 6 in the previous data set gives data 27 28 25 27 30 26 Range = 30 - 25 = 5 This shows that the range is very sensitive to extreme values; therefore not as useful as other measures of variation. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 62 Definition The standard deviation of a set of sample values, denoted by s, is a measure of variation of values about the mean. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 63 Sample Standard Deviation Formula s= (x – x) n–1 Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 2 3.1 - 64 Steps to calculate the sample standard deviation 1. Calculate the sample mean x 2. Find the squared deviations from the sample mean for each sample data value: (x x) 2 3. Add the squared deviations 4. Divide the sum in step 3 by n-1 5. Take the square root of the quotient in step 4 3.1 - 65 Example: Standard Deviation Given the data set: 8, 5, 12, 8, 9, 15, 21, 16, 3 Find the standard deviation 3.1 - 66 Example: Standard Deviation • Find the mean x 8 5 12 8 9 15 21 16 3 x 10.78 n 9 3.1 - 67 Data Value Squared Deviations From the Mean 8 5 12 8 9 15 21 16 3 (8 10.78) 2 7.73 (5 10.78) 2 33.41 (12 10.78) 2 1.49 (8 10.78) 2 7.73 (9 10.78) 2 3.17 (15 10.78) 2 17.81 (21 10.78) 2 104.45 (16 10.78) 2 27.25 (3 10.78) 2 60.53 3.1 - 68 Example: Standard Deviation • Add the squared deviations (last column in the table above) 7.73 33.41 1.49 7.73 3.17 17.81 104.45 27.25 60.53 263.57 3.1 - 69 Example: Standard Deviation • Divide the sum by 9-1=8: 263.57 / 8 32.95 • Take the square root: 32.95 5.74 s 5.7 3.1 - 70 Round-Off Rule for Measures of Variation When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data. Round only the final answer, not values in the middle of a calculation. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 71 Sample Standard Deviation (Shortcut Formula) n(x ) – (x) n (n – 1) 2 s= Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 2 3.1 - 72 Example: Standard Deviation • Page 110, problem 10 Data: 2 1 1 1 1 1 1 4 1 2 2 1 2 3 3 2 3 1 3 1 3 1 3 2 2 The range is 4 - 1 = 3 Determine the standard deviation using the previous formula 3.1 - 73 Example: Standard Deviation • We need to find each the following: n ( x ) 2 x 2 x 3.1 - 74 Data Table (25 data values) TOTALS: 47 109 3.1 - 75 Example: Standard Deviation • Thus: n 25 ( x ) x 109 x 47 2 2 3.1 - 76 Example: Standard Deviation • And: s ( n x ( x 2 2 n(n 1) 25(109) (47 25(24) 2 516 0.86 0.9 600 3.1 - 77 Example: Standard Deviation • Page 110, problem 10 Do the results make sense? NO: although we can determine the standard deviation, the results make no sense because the data is actually categorical (describing phenotype) and do not represent counts. Note that if we renumber the categories, the standard deviation would change but the results would not. 3.1 - 78 Standard Deviation Important Properties The standard deviation is a measure of variation of all values from the mean. The value of the standard deviation s is never negative and usually not zero. The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others). The units of the standard deviation s are the same as the units of the original data values. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 79 Calculator Example • Data from problem on page 111, problem 20 is strontium90 data from previous class examples 3.1 - 80 Calculator Example Amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979 Note: this data is related to Three Mile Island nuclear power plant Accident in 1979. DIRECTIONS: Find standard deviation 3.1 - 81 Calculator Example •To compute standard deviation: 1) Enter the list of data values 2) Select 2nd STAT (LIST) and arrow right to MATH option 7:stdDev( 3) Input the data by choosing the desired list Previous example: s = 15.0 millibecquerels 3.1 - 82 Calculator Example Press STAT. Arrow to the right to CALC. Now choose option #1: 1-Var Stats. When 1-Var Stats appears on the home screen, tell the calculator the name of the list containing the data 3.1 - 83 Calculator Example • Strontium-90 data, 1-Var Stats Displays: x 149.15 (mean) x x 2 5966 898586 S x 14.98 (sample standard deviation) ETC. 3.1 - 84 Population Standard Deviation = (x – µ) 2 N This formula is similar to the previous formula, but instead, the population mean and population size are used. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 85 Variance The variance of a set of values is a measure of variation equal to the square of the standard deviation. Sample variance: s2 - Square of the sample standard deviation s Population variance: 2 - Square of the population standard deviation Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 86 Unbiased Estimator The sample variance s2 is an unbiased estimator of the population variance 2, which means values of s2 tend to target the value of 2 instead of systematically tending to overestimate or underestimate 2. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 87 Variance Unlike standard deviation, the units of variance do not match the units of the original data set, they are the square of the units in the original data set. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 88 Variance - Notation s = sample standard deviation s2 = sample variance = population standard deviation 2 = population variance Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 89 Part 2 Beyond the Basics of Measures of Variation Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 90 Range Rule of Thumb for many data sets, the vast majority (such as 95%) of sample values lie within two standard deviations of the mean. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 91 Range Rule of Thumb •usual values in a data set are those that are “typical” and not too extreme •rough estimates of the minimum and maximum “usual” sample values can be computed using the “range rule of thumb” Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 92 Range Rule of Thumb minimum “usual” value = (mean) – 2 (standard deviation) x 2s Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 93 Range Rule of Thumb maximum “usual” value = (mean) + 2 (standard deviation) x 2s Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - 94 Range Rule of Thumb Usual values fall between the maximum and minimum usual values: x 2s x x 2s Otherwise the value is unusual 3.1 - 95 Example: Standard Deviation • Page 110, problem 12 Data (kg): 11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2 3.1 - 96 Example (cont.) • Thus: n 18 ( x ) x 344 x 34 2 2 3.1 - 97 Example (cont.) x 34 x 1.9 kg n s ( 18 n x ( x 2 n(n 1) 2 18(344) (34 18(17) 2 5036 4.1 kg 306 3.1 - 98 Example (cont.) • Minimum usual value x 2s 1.9 2(4.1) 6.3 kg • Maximum usual value x 2s 1.9 2(4.1) 10.1 kg 3.1 - 99 Example (cont.) • Because 6.8 kg falls between the minimum usual value and the maximum usual value, it is not considered “unusual” 3.1 - Range Rule of Thumb for Estimating Standard Deviation s To roughly estimate the standard deviation from a collection of known sample data use range s 4 where range = (maximum value) – (minimum value) Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: Standard Deviation • Page 110, problem 12 Data (kg): 11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2 Range rule of thumb: 11 (5) 16 s 4 kg 4 4 3.1 - Properties of the Standard Deviation • Measures the variation among data values • Values close together have a small standard deviation, but values with much more variation have a larger standard deviation • Has the same units of measurement as the original data Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Properties of the Standard Deviation • For many data sets, a value is unusual if it differs from the mean by more than two standard deviations • Compare standard deviations of two different data sets only if the they use the same scale and units, and they have means that are approximately the same Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Empirical (or 68-95-99.7) Rule For data sets having a distribution that is approximately bell shaped, the following properties apply: About 68% of all values fall within 1 standard deviation of the mean. About 95% of all values fall within 2 standard deviations of the mean. About 99.7% of all values fall within 3 standard deviations of the mean. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - The Empirical Rule Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - The Empirical Rule Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - The Empirical Rule Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: The Empirical Rule Page 113, problem 34 One standard deviation from the mean is between: x s 125.0 0.3 124.7 volts x s 125.0 0.3 125.3 volts 68% of the data are between 124.7 volts and 125.3 volts Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: The Empirical Rule Page 113, problem 34 Two standard deviations from the mean is between: x 2s 125.0 2(0.3) 124.4 volts x 2s 125.0 2(0.3) 125.6 volts 95% of the data are between 124.4 volts and 125.6 volts Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: The Empirical Rule Page 113, problem 34 Three standard deviations from the mean is between: x 3s 125.0 3(0.3) 124.1 volts x 3s 125.0 3(0.3) 125.9 volts 99.7% of the data are between 124.1 volts and 125.9 volts Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: The Empirical Rule Page 113, problem 34 a) 95% of the data are between 124.4 volts and 125.6 volts b) 99.7% of the data are between 124.1 volts and 125.9 volts Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Chebyshev’s Theorem The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1–1/K2, where K is any positive number greater than 1. For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean. For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: Chebyshev’s Theorem Page 113, problem 36 By Chebyshev’s Theorem there must be 1 1 / 3 1 1 / 9 8 / 9 0.89 89% 2 within three standard deviations of the mean Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: Chebyshev’s Theorem Page 113, problem 36 The minimum voltage amount within 3 standard deviations of the mean is x 3s 125.0 3(0.3) 124.1 volts and the maximum amount is x 3s 125.0 3(0.3) 125.9 volts Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Comparing Variation in Different Samples It’s a good practice to compare two sample standard deviations with s only when the sample means are approximately the same. In general, it is better to use the coefficient of variation when comparing any two standard deviations. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Coefficient of Variation The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean. Sample CV = s 100% x Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. Population CV = 100% m 3.1 - Example: Comparing Variation Page 112, problem 24 Waiting times for Jefferson Valley: x 71.5 x 7.15 min n s ( 10 n x ( x 2 n(n 1) 2 0.48 min CV s / x 0.48 / 7.15 0.067 6.7% Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: Comparing Variation Page 112, problem 24 Waiting times for Providence: x 71.5 x 7.15 min n s ( 10 n x ( x 2 n(n 1) 2 1.82 min CV s / x 1.82 / 7.15 22.5% Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Example: Comparing Variation Page 112, problem 24 CV for Jefferson Valley wait times much smaller than CV for Providence wait times implies that variation for Jefferson Valley is much smaller than that for Providence Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Formula for Standard Deviation of a Sample s= (x – x) n–1 2 Why do we use n-1 instead of n? Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Rationale for using n – 1 versus n There are only n – 1 independent values. With a given mean, only n – 1 values can be freely assigned any number before the last value is determined (see homework problem from 3-2 number 35). Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Rationale for using n – 1 versus n Dividing by n – 1 yields better results than dividing by n. It causes s2 to target 2 whereas division by n causes s2 to underestimate 2. Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Recap In this section we have looked at: Range Standard deviation of a sample and population Variance of a sample and population Range rule of thumb Empirical distribution Chebyshev’s theorem Coefficient of variation (CV) Copyright © 2010 2010,Pearson 2007, 2004 Education Pearson Education, Inc. All Rights Reserved. 3.1 - Section 3-4 Measures of Relative Standing and Boxplots Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Key Concepts •This section introduces measures of relative standing, which are numbers showing the location of data values relative to the other values within a data set. •Measures of relative standing can be used to compare values from different data sets, or to compare values within the same data set. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Key Concepts •The most important concept is the z score. We will also discuss percentiles and quartiles, as well as a new statistical graph called the boxplot. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Part 1 Basics of z Scores, Percentiles, Quartiles, and Boxplots Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Z score z Score (or standardized value) the number of standard deviations that a given value x is above or below the mean Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Measures of Position z Score Sample x – x z= s Population x – µ z= Round z scores to 2 decimal places Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Interpreting Z Scores Whenever a value is less than the mean, its corresponding z score is negative Ordinary values: –2 ≤ z score ≤ 2 Unusual Values: z score < –2 or z score > 2 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 6 on page 127 Given mean and standard deviation: x 43.8 years and s 8.9 years a) Difference between Hoffman’s age and mean age is x x 38 43.8 5.8 years Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 6 on page 127 Note: book answer is absolute value of the difference between Hoffman’s age and mean age which is x x 38 43.8 5.8 years 5.8 years Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 6 on page 127 b) How many standard deviations is that? Here take the ratio of the absolute value of the difference and the standard deviation: 5.8 0.65 8.9 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 6 on page 127 c) Convert Hoffman’s age to a z score Here take the ratio of the difference and the standard deviation: 5.8 0.65 8.9 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 6 on page 127 d) Is Hoffman’s age usual or unusual? “Usual” means z score is between 2 and 2 and since -0.65 is between 2 and 2, Hoffman’s age is usual. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 10 on page 127 Given mean and standard deviation for heights of women: x 63.6 in and s 2.5 in Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 10 on page 127 Z scores for height requirements for women in army • Minimum height requirement z score x x 58 63.6 2.24 years s 2.5 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 10 on page 127 Z scores for height requirements for women in army • Maximum height requirement z score x x 80 63.6 6.56 years s 2.5 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 10 on page 127 • Minimum height requirement z score is less than -2 so it is considered unusual • Maximum height requirement z score is greater than 2 so it is considered unusual Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 13 on page 128 • Score on SAT test is 1190 with mean of scores on SAT test 1518 and standard deviation 325 has z score of x x 1840 - 1518 0.99 s 325 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 13 on page 128 • Score on ACT test is 26 with mean of scores on ACT test 21 and standard deviation 4.8 has z score of x x 26 21 1.02 s 4.8 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Z score • Problem 13 on page 128 Comparing z scores the ACT z score was better than the SAT z score which means the ACT score was higher above the mean than the SAT score (relative to the standard deviation) so the ACT score is better. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Percentiles are measures of location. There are 99 percentiles denoted P1, P2, . . . P99, which divide a set of data into 100 groups with about 1% of the values in each group. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Finding the Percentile of a Data Value Percentile of value x = number of values less than x 100 total number of values Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Finding Percentile • Problem 16 on page 128 Data (NFL superbowl total points): 36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75 Find the percentile corresponding to 65 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Finding Percentile • Problem 16 on page 128 Percentile of value x = number of values less than x 100 total number of values 20 100 83 24 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Converting from the kth Percentile to the Corresponding Data Value Notation total number of values in the data set k percentile being used L locator that gives the position of a value Pk kth percentile n L= k 100 •n Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Converting from the kth Percentile to the Corresponding Data Value Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Finding Data Value Given Its Percentile • Problem 22 on page 128 Data (NFL superbowl total points): 36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75 Find the data value at P80 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Finding Data Value Given Its Percentile • Problem 22 on page 128 • For 80th percentile P80 , the k value in the locator formula is 80 k 80 L n 24 19.2 100 100 • If L is not a whole number, then round up to next whole number: L 20 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Finding Data Value Given Its Quartile • The 20th score is 61: 36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75 x20 61 Thus: P80 61 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Quartiles Are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group. Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%. Q2 (Second Quartile) same as the median; separates the bottom 50% of sorted values from the top 50%. Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Quartiles The first quartile is the 25th percentile Q1 P25 • The second quartile is the 50th percentile Q2 P50 • The third quartile is the 75th percentile Q3 P75 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Finding Data Value Given Its Quartile • Problem 20 on page 128 Data (NFL superbowl total points): 36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75 Find the data value at Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Q1 3.1 - Example of Finding Data Value Given Its Quartile • Convert quartile to percentile: Q1 P25 • For 25th percentile P25 , the k value in the locator formula is 25 k 25 L n 24 6 100 100 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Finding Data Value Given Its Quartile • If L is a whole number, then round up to next whole number, find the percentile by adding the Lth value and the next value then dividing by 2 • Sixth value is 41 and seventh value is 43: x6 x7 41 43 Q1 42 2 2 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Quartiles Q1, Q2, Q3 divide ranked scores into four equal parts 25% (minimum) 25% 25% 25% Q1 Q2 Q3 (maximum) (median) Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Some Other Statistics Interquartile Range (or IQR): Q3 – Q1 Semi-interquartile Range: Q3 – Q1 2 Midquartile: Q3 + Q1 2 10 - 90 Percentile Range: P90 – P10 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - 5-Number Summary For a set of data, the 5-number summary consists of the minimum value; the first quartile Q1; the median (or second quartile Q2); the third quartile, Q3; and the maximum value. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Boxplot A boxplot (or box-and-whiskerdiagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Boxplots Boxplot of Movie Budget Amounts (See example 7 and 8 on page 121) Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Boxplots - Normal Distribution Normal Distribution: Heights from a Simple Random Sample of Women Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Boxplots - Skewed Distribution Skewed Distribution: Salaries (in thousands of dollars) of NCAA Football Coaches Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Five Number Summary and Boxplots • Problem 30 on page 128 uses 10 data values from Strontium-90 data ordered as follows: 128 130 133 137 138 142 142 144 147 149 151 151 151 155 156 161 163 163 166 172 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Five Number Summary and Boxplots • Min data value is 128 • First quartile Q1 P25 k 25 L n 20 5 100 100 x5 x6 138 142 Q1 140 2 2 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Five Number Summary and Boxplots • Second quartile Q2 P50 median k 50 L n 20 10 100 100 x10 x11 149 151 Q2 150 2 2 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Five Number Summary and Boxplots • Third quartile Q3 P75 k 75 L n 20 15 100 100 x15 x16 156 161 Q3 158.5 2 2 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Example of Five Number Summary and Boxplots • Max data value is 172 • Five Number Summary 128 140 150 158.5 • Boxplot? Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 172 3.1 - Example of Five Number Summary • Boxplot Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Calculator: Five Number Summary • Five Number Summary (from website mathbits.com) • Enter the data in a list: Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Calculator: Five Number Summary • Go to STAT - CALC and choose 1-Var Stats Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Calculator: Five Number Summary • On the HOME screen, when 1-Var Stats appears, type the list containing the data. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Calculator: Five Number Summary • Arrow down to the five number summary (last five items in the list) Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Calculator: Boxplot • CLEAR out the graphs under y = (or turn them off). • Enter the data into the calculator lists. (choose STAT, #1 EDIT and type in entries) Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Calculator: Boxplot • CLEAR out the graphs under y = (or turn them off). • Enter the data into the calculator lists. (choose STAT, #1 EDIT and type in entries) Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Calculator: Boxplot • Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the second box-andwhisker icon is highlighted, and that the list you will be using is indicated next to Xlist. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Calculator: Boxplot • To see the box-and-whisker plot, press ZOOM and #9 ZoomStat. Press the TRACE key to see on-screen data about the box-and-whisker plot. Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Part 2 Outliers and Modified Boxplots OMIT THIS PART OF 3-4 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Recap In this section we have discussed: z Scores z Scores and unusual values Percentiles Quartiles Converting a percentile to corresponding data values Other statistics 5-number summary Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 - Putting It All Together Always consider certain key factors: Context of the data Source of the data Sampling Method Measures of Center Measures of Variation Distribution Outliers Changing patterns over time Conclusions Practical Implications Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. 3.1 -