Section 1.3 Describing Quantitative Data with Numbers

Measuring Center: The Mean
• x̄ (pronounced "x bar") is used for the mean of a sample.
• μ (Greek letter mu, pronounced "mew") is used for the population mean.
• To find the mean of a set of observations, add up their values and divide by the number of observations: x̄ = (x1 + x2 + … + xn) / n
• A more compact notation for the mean is: x̄ = (Σ xi) / n
• The Σ (capital Greek letter sigma) in the formula for the mean is short for "sum." The subscripts on the observations xi are just a way of keeping the n observations distinct. They do not necessarily indicate order or any other special facts about the data.

Example of Calculating the Mean
EX: Refer to the data on travel times to work for the sample of 15 North Carolinians.
• (a) Find the mean travel time for all 15 workers.
• (b) Calculate the mean again, this time excluding the person who reported a 60-minute travel time to work. What do you notice?

Calculating the Mean: Solution

Things to think about…
The previous example illustrates an important weakness of the mean as a measure of center: it is sensitive to the influence of extreme observations. These may be outliers, but a skewed distribution that has no outliers will also pull the mean toward its long tail. Because the mean cannot resist the influence of extreme observations, we say that it is not a resistant measure of center.

Measuring Center: The Median
• The median M is the midpoint of a distribution: the number such that half of the observations are smaller and the other half are larger.
• To find the median:
1. Arrange the data in ascending order (from smallest to largest).
2. If the number of observations is odd, the median is the center observation in the ordered list.
3. If the number of observations is even, the median is the average of the two center observations in the ordered list.
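The definition of the mean and the three steps for the median can be sketched in Python as follows. This is only an illustration: the function names are made up here, and the five values are hypothetical numbers chosen to show how one extreme observation moves the mean but not the median. They are not data from these notes.

```python
def mean(values):
    """Add up the observations and divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Midpoint of the ordered list, following the three steps above."""
    ordered = sorted(values)          # step 1: ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # step 2: odd count, take the center observation
        return ordered[middle]
    # step 3: even count, average the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical values, used only for illustration.
sample = [1, 2, 3, 4, 100]
without_extreme = [v for v in sample if v != 100]

print(mean(sample), median(sample))                    # 22.0 3
print(mean(without_extreme), median(without_extreme))  # 2.5 2.5
```

Dropping the single extreme value changes the mean from 22 to 2.5, while the median only moves from 3 to 2.5, which is what "resistant" means in practice.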
Example of Finding the Median
EX: People say that it takes a long time to get to work in New York State due to the heavy traffic near big cities. What do the data say? Here are the travel times in minutes of 20 randomly chosen New York workers:
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45
a) Find the median of the travel times. Show your work.

Comparing Mean and Median
• The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out toward the long tail than the median.
• Also, the mean is not resistant (to outliers and skewness), while the median is.

Measuring Spread: The Range
• The range is one of the most basic ways of measuring spread. It is simply the difference between the largest and smallest observations. (The symbols σx for a population and sx for a sample denote the standard deviation, a different measure of spread, not the range.)

Quartiles and IQR (Interquartile Range)
To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M in the ordered list of observations.
2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the median.
3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the median.
• The interquartile range (IQR), which measures the range of the middle 50% of the data, is defined as: IQR = Q3 - Q1

Example of Finding the IQR
EX: Find and interpret the IQR of the data on reported travel times to work for the 20 randomly selected New Yorkers listed above.
Notes: If the number of observations is odd, do not include the median when calculating the quartiles. If the number of observations is even, include all of the observations.
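The quartile steps and the note about odd and even counts can be checked with a short Python sketch. The function names are illustrative assumptions, and the rule coded here is the one described in these notes. Statistical software often uses slightly different quartile conventions, so library results may not match exactly.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the rule in the notes."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # observations to the left of the median
    upper = ordered[half + (n % 2):]   # to the right; skips the median itself when n is odd
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


# New York travel times (minutes) from the example above.
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

q1, m, q3 = quartiles(ny_times)
iqr = q3 - q1
print(q1, m, q3, iqr)  # 15.0 22.5 42.5 27.5
```

An IQR of 27.5 minutes means the middle 50% of these travel times span 27.5 minutes.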
Identifying Outliers
An observation is an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile:
ABOVE: Q3 + (1.5 × IQR) OR BELOW: Q1 - (1.5 × IQR)
EX: Does the 1.5 × IQR rule identify any outliers for the New York travel time data? In the previous example, we found that Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes.

Five-Number Summary
The five-number summary of a set of observations consists of the minimum, Q1, median, Q3, and maximum.
The five-number summary of a distribution leads us to a new type of graph… THE BOXPLOT (sometimes called a box-and-whisker plot)
• A central box is drawn from the first quartile (Q1) to the third quartile (Q3).
• A line in the box marks the median.
• Lines (called whiskers) extend from the box out to the smallest and largest observations that are not outliers.
EX: The 2009 roster of the Dallas Cowboys professional football team included 10 offensive linemen. Their weights (in pounds) were
338 318 353 313 318 326 307 317 311 311
1. Find the five-number summary for these data by hand. Show your work.
2. Calculate the IQR.
3. Determine if there are any outliers.
4. Draw a boxplot of the data.
Now let's use our calculators to construct the boxplot for these data!

The Standard Deviation (sx or σx) and Variance (sx² or σx²)
• The standard deviation measures roughly the average distance of the observations from their mean. It is calculated by finding an average of the squared distances (for a sample, their sum divided by n - 1) and then taking the square root. This average squared distance is called the variance. In symbols, the sample standard deviation is given by
sx = √[ Σ (xi - x̄)² / (n - 1) ]

Standard Deviation
More important than the details of calculating the standard deviation are the properties that determine its usefulness:
- sx measures spread about the mean and should be used only when the mean is chosen as the measure of center.
- sx is always greater than or equal to 0. sx = 0 only when there is no variability, which happens only when all observations have the same value.
- sx has the same units of measurement as the original observations.
- Like the mean, sx is not resistant. A few outliers can make sx very large.
EX: The heights (in inches) of the five starters on a basketball team are 67, 72, 76, 76, and 84.
a) Find and interpret the mean.
b) Use your calculator to calculate the standard deviation.
c) Interpret the meaning of the standard deviation in this setting.

Numerical Summaries Using the Calculator
Enter the following data for travel times to work in List 1 and List 2 on your calculator: N.C. (L1), NY (L2)
Record the descriptive statistics for each set of data:

Choosing Measures of Center and Spread
We now have a choice between two descriptions of the center and spread of a distribution: the median and IQR, or x̄ and sx. Because x̄ and sx are sensitive to extreme observations, they can be misleading when a distribution is strongly skewed or has outliers. In these cases, the median and IQR, which are both resistant to extreme values, provide a better summary.
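To see why the choice of summary matters, the 1.5 × IQR rule and the sample standard deviation can be sketched in Python. The function names are illustrative assumptions. The quartiles passed in are the values stated earlier in these notes for the New York data, and the standard deviation uses the usual sample formula with n - 1 in the denominator.

```python
import math


def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


def outliers_by_iqr_rule(values, q1, q3):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# New York travel times (minutes) from the earlier example.
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

# Q1 = 15 and Q3 = 42.5 are the quartiles stated in the notes for these data.
flagged = outliers_by_iqr_rule(ny_times, q1=15, q3=42.5)
print(flagged)                         # [85]
print(sum(ny_times) / len(ny_times))   # 31.25

# Heights (inches) of the five basketball starters from the example above.
print(sample_sd([67, 72, 76, 76, 84]))  # about 6.24
```

For the New York data the mean (31.25 minutes) sits well above the median (22.5 minutes), consistent with the 85-minute outlier and the long right tail pulling the mean upward. For the heights, the sum of squared deviations from the mean of 75 is 156, so sx = √(156 / 4) = √39.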