Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 3: Descriptive Measures STP 226: Elements of Statistics Jenifer Boshes Arizona State University 3.1: Measures of Center Mean The mean of a data set is the sum of the observations divided by the number of observations. (average) Example 1: Example 2: The following data set is comprised of a set of homework grades. Find the mean homework grade. The following data set is comprised of the lengths of a rare orchid (in inches). Find the mean orchid length. 93 87 90 90 82 85 88 90 93 83 90 13 18 14.5 14 15 14 Interpret: Interpret: Median To find the median of a data set: Arrange the data in increasing order. • If the number of observations is odd, then the median is the observation exactly in the middle. • If the number of observations is even, the median is the mean of the two middle observations in the ordered list. Example 3: Find the median homework score. 93 87 90 90 82 85 88 90 93 83 90 Interpret: Example 4: Find the median orchid length. 13 18 14.5 14 15 14 Interpret: Mode The mode of a data set is value that occurs with greatest frequency. First, find the frequency of each value in the data set. • If no value occurs more than once, there is no mode. • Otherwise, any value that occurs with greatest frequency is a mode. Example 5: Example 6: Find the mode homework score. Find the mode orchid length. 93 87 90 90 82 85 88 90 93 83 90 13 18 14.5 14 15 14 Interpret: Interpret: Example 7: Find the mean, median, and mode of each of the data sets. Data Set I Data Set II 45 90 78 88 61 99 82 68 16 86 49 72 80 96 88 86 82 76 78 77 66 Skewed vs. Symmetric (a) Right skewed: The mean is to the right of the median. (b) Symmetric: The mean is equal to the median. (c) Left skewed: The mean is to the left of the median. When to use each… Median: Use the median when your data set has very extreme values. A resistant measure (or robust) is not sensitive to the influence of a few extreme observations. Mode: Use the mode when you have qualitative data. Sample Mean Example 8: The exam scores for a student are: 61, 97, 78, 86, and 73. (a) Use mathematical notation to represent the individual exam scores. (b) Use summation notation to express the sum of the five exam scores. (c) Find x for the exam data. 3.2: Measures of Variation Example 1: The exam scores for student A are: 100, 100, 90, 90, and 70. The exam scores for student B are: 90, 88, 88, 93, and 91. Compare the means and medians. Who is the better student? Who is more consistent? Range Standard Deviation The standard deviation measures variation by indicating, on average, how far the observations are from the mean. Sample Standard Deviation 1. For each observation, calculate the deviation from the mean. 2. Square this value. 3. Add up the squares. 4. Divide by n – 1. 5. Take the square root. Example 2: Find the standard deviation for student A: 100, 100, 90, 90, and 70. 1. 2. 3. 4. 5. For each observation, calculate the deviation from the mean. Square this value. Add up the squares. Divide by n – 1. Take the square root. Example 3: Find the standard deviation for student B: 90, 88, 88, 93, and 91. 1. 2. 3. 4. 5. For each observation, calculate the deviation from the mean. Square this value. What can we say about the Add up the squares. relative performance between students A and B? Divide by n – 1. Take the square root. Comments on Standard Deviation s2 is called the sample variance. The units of s2 are the square of the original units. The units of s are the same as the original units. s is ALWAYS ≥ 0. Why? s is a measure of how much each point deviates from the mean deviation. Do not perform any rounding until the computation is complete; otherwise, substantial roundoff error can result. Almost all the observations in any data set lie within three standard deviations to either side of the mean. This is known as Chebyshev’s Rule. Example 3: • How many observations for student B are within one standard deviation of the mean? • How many observations for student B are within two standard deviation of the mean? • How many observations for student B are within three standard deviation of the mean? 3.3: The Five-Number Summary; Boxplots Robustness Recall: What does it mean for a statistic to be robust? Name a statistic that is not robust. Name a statistic that is robust Quartiles Quartiles divide a data set into quarters. Q1, Q2, and Q3 are the three quartiles. The second quartile (Q2) is the median of the entire data set. The first quartile (Q1) is the median of the portion of the data set that lies at or below Q 2. The third quartile (Q3) is the median of the portion of the data set that lies at or above Q 2. Example 1: Fifteen people were asked how many baseball games they had attended the previous season. Find the quartiles. 1. 2. 3. 4. Order the data. Find the median of the data set. This is Q2. Find the median of the data that lies at or below the median of the entire data set. This is Q1. Find the median of the data that lies at or above the median of the entire data set. This is Q3. 12 25 8 6 1 0 42 19 17 0 63 14 22 31 34 Interquartile Range (IQR) The IQR is the difference between the first and third quartiles; that is, IQR = Q3 – Q1. It is the preferred measure of variation when the median is used as the measure of center. Like the median, the IQR is a resistant or robust measure. Example 2: What is the IQR for the baseball data? Interpret: Five-Number Summary Min Q1 Q2 Q3 Max Example 3: Find the five-number summary for the baseball data. Outliers Outliers are observations that fall well outside the overall pattern of the data. They may result from a recording error, obtaining an observation from a different population, or an unusual extreme value. Lower and Upper Limits Lower limit: Q1 – 1.5 · IQR Upper limit: Q3 + 1.5 · IQR Observations that lie outside the upper and lower limits – either below the lower limit or above the upper limit – are potential outliers. Example 4: For the baseball data: (a) Obtain the lower and upper limits. (b) Determine the potential outliers, if any. (c) Construct a modified boxplot. 12 25 8 6 1 0 42 19 17 0 63 14 22 31 34 • Adjacent values of a set are the most extreme observations that are not potential outliers. Steps for Constructing a Modified Boxplot Steps for Constructing a Boxplot Boxplots Boxplots are useful for comparing two or more data sets. Notice how box width and whisker length relate to skewness and symmetry. Bibliography Some of the textbook images embedded in the slides were taken from: Elementary Statistics, Sixth Edition; by Weiss; Addison Wesley Publishing Company Copyright © 2005, Pearson Education, Inc.