Chapter 6: Interpreting the Measures of Variability
• To date we have discussed three measures of central tendency (mean, median, and mode) and three measures of variability (range, IQR, and standard deviation).
• To adequately summarize a set of data, we need both a measure of central tendency and a measure of variability.
• Remember that the mean and range may not be the best measures to use when a distribution is skewed or contains outliers. Better options would be the median and IQR.

Five-Number Summary
• The five-number summary of a distribution for numerical data consists of:
– The smallest observation (minimum)
– The first quartile
– The median
– The third quartile
– The largest observation (maximum)
• These five values are used to draw a boxplot (aka box-and-whisker plot).

Boxplots
• Boxplots are a graphical device for examining the shape of a distribution, its range and IQR, and the side on which the observations are most concentrated.
• Boxplots are also a tool for summarizing and comparing numerical data from two or more samples: where their medians are located, how spread out they are, and whether they are symmetric, positively skewed (skewed right), or negatively skewed (skewed left).
• Boxplots also identify the outliers of a distribution.
• The whiskers of the boxplot extend to the lowest and highest values in the data set that are not outliers.
• Outliers are marked separately with an asterisk or circle.

Positively Skewed Data
• When the median is closer to the bottom of the box and the whisker is shorter on the lower end of the box, the distribution is positively skewed (skewed right).

Negatively Skewed Data
• When the median is closer to the top of the box and the whisker is shorter on the upper end of the box, the distribution is negatively skewed (skewed left).

Symmetrical Data
• When the median is in the middle of the box and the whiskers are about the same length on both sides of the box, the distribution is symmetric.
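The five-number summary can also be computed directly. The following is a minimal Python sketch, not a definitive implementation. It assumes one particular quartile convention (each quartile is the median of the lower or upper half of the ordered data, leaving out the overall median when the number of observations is odd); other textbooks and software packages use slightly different conventions and may give slightly different quartiles. The function name five_number_summary and the ten data values are chosen only for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return the five-number summary as a dict.

    Assumed convention: Q1 and Q3 are the medians of the lower and
    upper halves of the ordered data; when the count is odd, the
    overall median is left out of both halves. Needs >= 2 values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd so the halves have equal size.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }


# Illustrative data only.
summary = five_number_summary([1, 6, 5, 4, 10, 16, 8, 3, 18, 13])
iqr = summary["q3"] - summary["q1"]            # spread of the middle half
data_range = summary["max"] - summary["min"]   # overall spread

print(summary)
print("IQR:", iqr, "Range:", data_range)
```

The box of a boxplot spans q1 to q3, so its length is the IQR, while the distance from min to max is the range.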
Outliers
• An outlier is any value that falls outside the pattern of the rest of the data (an unusually high or unusually low value in a distribution).
• Rule of thumb: an observation is an outlier if it lies more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile.

Example: On September 20, 2009, the Tennessee Titans played the Houston Texans. Here are the rushing yards Titans running back Chris Johnson had for each of his 16 rushing attempts. Determine if there are any outliers.

Steps to Make a Boxplot
1) Draw a central box (rectangle) from the first quartile to the third quartile
2) Draw a vertical line inside the box to mark the median
3) Draw horizontal lines (whiskers) that extend from the box out to the smallest and largest observations that are not outliers
4) If there are any outliers, mark them separately

Example: Draw a boxplot for the Chris Johnson data.
• From the boxplot, we can see that our distribution contains no outliers. In addition, because the median sits closer to the lower end of the box and the left whisker is shorter, we can also state that our distribution is slightly skewed right (positively skewed).

Example: Construct a boxplot for the following set of data: 1, 6, 5, 4, 10, 16, 8, 3, 18, 13
Order: 1, 3, 4, 5, 6, 8, 10, 13, 16, 18
Median: (6 + 8) / 2 = 7
Q1: 4
Q3: 13
IQR: 13 − 4 = 9
Outlier cutoffs: 4 − 1.5(9) = −9.5 and 13 + 1.5(9) = 26.5
Every value lies between −9.5 and 26.5, so there are no outliers. The whiskers extend to the minimum (1) and the maximum (18).

The Empirical Rule
• In symmetric (bell-shaped) distributions, the values are distributed symmetrically about the mean: they cluster most densely around the mean and become rarer as their distance from the mean increases.
• This distribution is called a Normal distribution and its graph is called the Normal curve.
• The empirical rule is a statement about the proportion of observations that fall within one, two, and three standard deviations of the mean when the distribution is a Normal distribution.
• The empirical rule is also known as the 68-95-99.7 rule, after the three percentages it states.
• In general, in a Normal distribution:
– Approximately 68% of the observations will be within 1 standard deviation of the mean.
– Approximately 95% of the observations will be within 2 standard deviations of the mean.
– Approximately 99.7% of the observations will be within 3 standard deviations of the mean.

A Visual Summary of the Empirical Rule
• Because the Normal curve is symmetric, the percentages split evenly on either side of the mean:
– About 34% of the observations lie between the mean and 1 standard deviation on each side (68% / 2).
– About 13.5% lie between 1 and 2 standard deviations on each side ((95% − 68%) / 2).
– About 2.5% lie more than 2 standard deviations away on each side ((100% − 95%) / 2).

Example: Suppose a sample of scores yields a mean of 100 and a standard deviation of 15. Assume that the distribution is Normal. What percent of scores should fall between 85 and 115? (Hint: Draw a diagram first!)
85 and 115 are both one standard deviation from the mean, so the percent of scores that fall between 85 and 115 is approximately 68%.

Let’s try some more with the same distribution. What percent of scores should fall:
a) Between 70 and 130? 95%
b) Between 55 and 145? 99.7%
c) Between 70 and 115? 13.5% + 68% = 81.5%
d) Greater than 115? 13.5% + 2.5% = 16%
e) Less than 70? 2.5%
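The empirical-rule answers are rounded rules of thumb, and they can be compared with the exact areas under the Normal curve. The following is a minimal Python sketch under the assumption that the scores are exactly Normal with mean 100 and standard deviation 15. It uses the standard library's math.erf to evaluate the Normal cumulative distribution; the helper names normal_cdf and percent_between are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x, mean, sd):
    """Proportion of a Normal(mean, sd) distribution that lies below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def percent_between(low, high, mean, sd):
    """Percent of a Normal(mean, sd) distribution between low and high."""
    return 100.0 * (normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd))


MEAN, SD = 100, 15  # values from the worked example

within_1_sd = percent_between(85, 115, MEAN, SD)   # rule of thumb: 68%
within_2_sd = percent_between(70, 130, MEAN, SD)   # rule of thumb: 95%
within_3_sd = percent_between(55, 145, MEAN, SD)   # rule of thumb: 99.7%
between_70_115 = percent_between(70, 115, MEAN, SD)  # rule of thumb: 81.5%
above_115 = 100.0 * (1.0 - normal_cdf(115, MEAN, SD))  # rule of thumb: 16%
below_70 = 100.0 * normal_cdf(70, MEAN, SD)            # rule of thumb: 2.5%

for label, value in [
    ("85 to 115", within_1_sd),
    ("70 to 130", within_2_sd),
    ("55 to 145", within_3_sd),
    ("70 to 115", between_70_115),
    ("above 115", above_115),
    ("below 70", below_70),
]:
    print(f"{label}: {value:.2f}%")
```

The exact values differ slightly from the rule-of-thumb answers (for example, the exact area below 70 is a little under 2.5%) because the empirical rule rounds the areas to convenient percentages.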