Section 3.4 Measures of Relative Standing and Boxplots

Part 1 Basics of z Scores, Percentiles, Quartiles, and Boxplots

z Score
The number of standard deviations that a given value x is above or below the mean. Also known as a standardized value.

Measures of Position: z Score
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
*Round z scores to 2 decimal places if necessary.

Interpreting z Scores
Whenever a value is less than the mean, its corresponding z score is negative.
Ordinary values: –2 ≤ z score ≤ 2
Unusual values: z score < –2 or z score > 2

Example 1: Helen Mirren was 61 when she earned her Oscar-winning Best Actress award. The Oscar-winning Best Actresses have a mean age of 35.8 years and a standard deviation of 11.3 years.
a) What is the difference between Helen Mirren’s age and the mean age?
b) How many standard deviations is that?
c) Convert Helen Mirren’s age to a z score.
d) If we consider “usual” ages to be those that convert to z scores between –2 and 2, is Helen Mirren’s age usual or unusual?

Example 2: Human body temperatures have a mean of 98.20°F and a standard deviation of 0.62°F (based on Data Set 2 in Appendix B). Convert each given temperature to a z score and determine whether it is usual or unusual.
a) 101.00°F
b) 96.90°F
c) 96.98°F

Example 3: Scores on the SAT test have a mean of 1518 and a standard deviation of 325. Scores on the ACT test have a mean of 21.1 and a standard deviation of 4.8. Which is relatively better: a score of 1840 on the SAT test or a score of 26.0 on the ACT test? Why?

Percentiles
Measures of location. There are 99 percentiles, denoted P1, P2, . . . , P99, which divide a set of data into 100 groups with about 1% of the values in each group.

Finding the Percentile of a Data Value
$\text{percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$

Example 4: Use the given sorted values (the numbers of points scored in the Super Bowl for a recent period of 24 years).
Find the percentile corresponding to the given number of points.
36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75
a) 47 points
b) 54 points

Converting from the kth Percentile to the Corresponding Data Value
Notation:
n: total number of values in the data set
k: percentile being used
L: locator that gives the position of a value
Pk: kth percentile
$L = \frac{k}{100} \cdot n$

Example 5: Use the given sorted values (the numbers of points scored in the Super Bowl for a recent period of 24 years). Find the indicated percentile.
36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75
a) P50
b) P22

Quartiles
Measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.
Q1 (First Quartile): separates the bottom 25% of sorted values from the top 75%.
Q2 (Second Quartile): same as the median; separates the bottom 50% of sorted values from the top 50%.
Q3 (Third Quartile): separates the bottom 75% of sorted values from the top 25%.

Q1, Q2, and Q3 divide ranked scores into four equal parts:
(minimum) 25% Q1 25% Q2 (median) 25% Q3 25% (maximum)

Some Other Statistics
• Interquartile Range (or IQR): $Q_3 - Q_1$
• Semi-interquartile Range: $\frac{Q_3 - Q_1}{2}$
• Midquartile: $\frac{Q_3 + Q_1}{2}$
• 10-90 Percentile Range: $P_{90} - P_{10}$

Example 6: Use the given sorted values (the numbers of points scored in the Super Bowl for a recent period of 24 years). Find the indicated percentile or quartile.
36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75
a) P20
b) Q3

5-Number Summary
For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value.
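The measures of position covered so far (z scores, the percentile of a data value, the locator L, quartiles, and the 5-number summary) can be sketched in Python. This is only an illustration: the function names are hypothetical, and the rule applied once L is computed (take the midpoint of the Lth and next values when L is a whole number, otherwise round L up) is a common textbook convention that this section does not state, so treat it as an assumption.

```python
# Sorted Super Bowl point totals used in Examples 4-7 (24 values).
POINTS = [36, 37, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
          54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 69, 75]


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_unusual(z):
    """Unusual values have z < -2 or z > 2."""
    return z < -2 or z > 2


def percentile_of_value(sorted_values, x):
    """(number of values less than x) / (total number of values) * 100."""
    below = sum(1 for v in sorted_values if v < x)
    return below / len(sorted_values) * 100


def kth_percentile(sorted_values, k):
    """Value of P_k located with L = (k / 100) * n, for integer k from 1 to 99.

    ASSUMPTION (not stated in this section): if L is a whole number, P_k is
    the midpoint of the Lth and (L + 1)th values; otherwise L is rounded up
    and P_k is the Lth value. Positions are counted from 1.
    """
    n = len(sorted_values)
    whole, remainder = divmod(k * n, 100)  # integer arithmetic avoids float error
    if remainder == 0:
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    return sorted_values[whole]  # rounding L up gives 1-based position whole + 1


def five_number_summary(sorted_values):
    """Minimum, Q1 (P25), median (P50), Q3 (P75), maximum."""
    return (sorted_values[0],
            kth_percentile(sorted_values, 25),
            kth_percentile(sorted_values, 50),
            kth_percentile(sorted_values, 75),
            sorted_values[-1])
```

Under these assumptions, `kth_percentile(POINTS, 50)` has L = 12, a whole number, so it returns the midpoint of the 12th and 13th values; `kth_percentile(POINTS, 22)` has L = 5.28, which rounds up to the 6th value. Other software may use different interpolation rules and give slightly different quartiles.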
Boxplot
A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.

[Figure: Boxplot of Movie Budget Amounts]
[Figure: Boxplots - Normal Distribution: Heights from a Simple Random Sample of Women]
[Figure: Boxplots - Skewed Distribution: Salaries (in thousands of dollars) of NCAA Football Coaches]

Example 7: Use the given sorted values (the numbers of points scored in the Super Bowl for a recent period of 24 years). Construct a boxplot and include the values of the 5-number summary.
36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75

Part 2 Outliers and Modified Boxplots

Outliers
An outlier is a value that lies very far away from the vast majority of the other values in a data set.

Important Principles
• An outlier can have a dramatic effect on the mean.
• An outlier can have a dramatic effect on the standard deviation.
• An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured.

Outliers for Modified Boxplots
For purposes of constructing modified boxplots, we can consider outliers to be data values meeting specific criteria. In modified boxplots, a data value is an outlier if it is
• above Q3 by an amount greater than 1.5 × IQR, or
• below Q1 by an amount greater than 1.5 × IQR.

Modified Boxplots
Boxplots described earlier are called skeletal (or regular) boxplots. Some statistical packages provide modified boxplots, which represent outliers as special points.

Modified Boxplot Construction
A modified boxplot is constructed with these specifications:
• A special symbol (such as an asterisk) is used to identify outliers.
• The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier.
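The outlier criteria and the construction rules for a modified boxplot can be sketched in Python as follows. The function names and the returned dictionary keys are hypothetical, and the quartiles are located with L = (k / 100) · n plus an assumed textbook rule (midpoint when L is whole, otherwise round up) that this section does not spell out.

```python
def kth_percentile(sorted_values, k):
    """P_k via the locator L = (k / 100) * n, for integer k from 1 to 99.

    ASSUMPTION: whole L -> midpoint of the Lth and (L + 1)th values;
    otherwise round L up and take that value (positions counted from 1).
    """
    n = len(sorted_values)
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    return sorted_values[whole]


def modified_boxplot_stats(values):
    """Quartiles, IQR, outliers, and line ends for a modified boxplot."""
    data = sorted(values)
    q1 = kth_percentile(data, 25)
    q3 = kth_percentile(data, 75)
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr   # values below this are outliers
    high_limit = q3 + 1.5 * iqr  # values above this are outliers
    outliers = [v for v in data if v < low_limit or v > high_limit]
    inside = [v for v in data if low_limit <= v <= high_limit]
    return {
        "q1": q1,
        "median": kth_percentile(data, 50),
        "q3": q3,
        "iqr": iqr,
        "outliers": outliers,         # drawn with a special symbol
        "line_min": inside[0],        # smallest value that is not an outlier
        "line_max": inside[-1],       # largest value that is not an outlier
    }
```

Calling `modified_boxplot_stats` on a list of data values returns the outliers to plot as special points and the two values where the horizontal line should stop. When a data set has no outliers, the line ends coincide with the minimum and maximum, so the modified boxplot matches the skeletal one.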
Modified Boxplots - Example
[Figure: Pulse rates of females listed in Data Set 1 in Appendix B]

Example 8: Use the 40 upper leg lengths (cm) listed for females from Data Set 1 in Appendix B. Construct a modified boxplot. Identify any outliers.
41.6 42.8 39 40.2 36.2 43.2 38.7 41 43.8 37.3 42.3 39.1 40.3 48.6 33.2 43.4 41.5 40 38.2 38.2 38.2 41 38.1 39 36.6 27 38 36 32.1 31.1 39.4 40.2 39.2 38.5 39.9 37.5 39.7 39 41.6 33.8

Deciles are the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles. They are denoted D1, D2, …, D9.

Quintiles are the 20th, 40th, 60th, and 80th percentiles.

Example 9: Using the following data, find the deciles D2, D5, and D9.
Sorted Movie Budget Amounts (in millions of dollars)
4.5 5 6.5 7 20 20 29 30 35 40 40 41 50 52 60 65 68 68 70 70 70 72 74 75 80 100 113 116 120 125 132 150 160 200 225
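Because deciles and quintiles are just particular percentiles, they can be found with the same locator L = (k / 100) · n. The Python sketch below illustrates this on the movie budget data; the function names are hypothetical, and the handling of whole versus fractional L is an assumed textbook convention rather than something stated in these notes.

```python
# Sorted movie budget amounts from Example 9 (millions of dollars, 35 values).
BUDGETS = [4.5, 5, 6.5, 7, 20, 20, 29, 30, 35, 40, 40, 41, 50, 52, 60,
           65, 68, 68, 70, 70, 70, 72, 74, 75, 80, 100, 113, 116, 120,
           125, 132, 150, 160, 200, 225]


def kth_percentile(sorted_values, k):
    """P_k via the locator L = (k / 100) * n, for integer k from 1 to 99.

    ASSUMPTION: whole L -> midpoint of the Lth and (L + 1)th values;
    otherwise round L up and take that value (positions counted from 1).
    """
    n = len(sorted_values)
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    return sorted_values[whole]


def decile(sorted_values, d):
    """D_d is the (10 * d)th percentile, for d = 1, ..., 9."""
    return kth_percentile(sorted_values, 10 * d)


def quintile(sorted_values, q):
    """The qth quintile is the (20 * q)th percentile, for q = 1, ..., 4."""
    return kth_percentile(sorted_values, 20 * q)
```

For D2 the locator is L = (20 / 100) · 35 = 7, a whole number, so under the assumed rule `decile(BUDGETS, 2)` averages the 7th and 8th values; for D5 and D9 the locator is fractional and is rounded up.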