Chapter 3: Displaying and Summarizing Quantitative Data
Copyright © 2014, 2012, 2009 Pearson Education, Inc.

NOTE on slides / What we can and cannot do
The following notice accompanies these slides, which have been downloaded from the publisher’s Web site:
“This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work (including on the World Wide Web) will destroy the integrity of the work and is not permitted. The work and materials from this site should never be made available to students except by instructors using the accompanying text in their classes. All recipients of this work are expected to abide by these restrictions and to honor the intended pedagogical purposes and the needs of other instructors who rely on these materials.”
Some of these slides are taken from the Third Edition; others are my own additions. We can use these slides because we are using the text for this course. Please help us stay legal: do not distribute these slides any further.
The original slides are done in green/red and black. My additions are in red and blue. Topics in brown and maroon are optional.

Overview: Organization of the Chapter
• Pictorial display
  ○ Histogram
  ○ Stem-and-leaf plot
  ○ Dotplot
• Numerical summary
  ○ Shape of the data
  ○ Center
  ○ Spread

Division of Mathematics, HCC
Course Objectives for Chapter 3
After studying this chapter, the student will be able to:
8. Appropriately display quantitative data using a frequency distribution, histogram, relative frequency histogram, stem-and-leaf display, or dotplot.
9. Describe the general features of a distribution in terms of shape, center, and spread.
10. Describe any anomalies or extraordinary features revealed by the display of a variable.
11. Compute and apply the concepts of mean and median to a set of data.
12. Compute and apply the concepts of standard deviation and IQR to a set of data.
13. Select a suitable measure of center/spread for a variable based on information about its distribution.
14. Create a five-number summary of a variable.
15. Construct a boxplot by hand and with technology.
16. Use the 1.5 IQR rule to identify possible outliers.

3.1 Displaying Quantitative Variables

Dealing With a Lot of Numbers…
• Summarizing the data helps us when we look at large sets of quantitative data.
• Without summaries of the data, it’s hard to grasp what the data tell us.
• The best thing to do is to make a picture…
• We can’t use bar charts or pie charts for quantitative data, since those displays are for categorical variables.

Histograms
A histogram of tsunami-generating earthquakes (the authors did not provide the raw data).
• Histogram: a chart that displays quantitative data
• Popularized as a quality-control tool by Kaoru Ishikawa (Japan, 1950s); the name “histogram” goes back to Karl Pearson in the 1890s
• Great for seeing the distribution of the data
• Most tsunami-generating earthquakes have magnitudes between 6.5 and 8.
• The Japan and Sumatra quakes (9.0 and 9.1) are rare.
• Quakes under magnitude 5 rarely cause tsunamis.
• Quakes between 7.0 and 7.5 are the most common cause of tsunamis.

Choosing the Bin Width
• Different bin widths tell different stories.
• Choose the width that best shows the important features.
• Presentations can feature two histograms that present the same data in different ways.
• A gap in the histogram means that there were no occurrences in that range.
Relative Frequency Histograms
• Relative frequency histogram:
  ○ The vertical axis represents the relative frequency: the frequency divided by the total.
  ○ The horizontal axis is the same as the horizontal axis for the frequency histogram.
• The shape of the relative frequency histogram is the same as that of the frequency histogram.
• Only the scale of the y-axis is different.

Histograms
Both histograms “look” the same. The only difference is the vertical axis. Did we see this in Chapter 2?

Histograms
• They can be displayed horizontally as well as vertically.
• I rotated this one 90 degrees clockwise.
• To publish this, I would put the “% of Earthquakes” axis across the bottom instead of the top.
• I’d also retype the labels so they can be read more easily!

Histogram with the TI
Example data: 62, 63, 65, 66, 68, 70, 71, 73, 75
Use [STAT][EDIT] to put the dataset in L1. The first few data points are shown.
NOTE: You will do this a lot in this course!

Histogram with the TI
• First, press [Y=] and turn off or erase any functions left over from Algebra class!
• Press [2nd][Y=] (STAT PLOT) and go to one of the three plots. Turn it on.
• Select the histogram.
• Make sure that L1 (or wherever you put the data) is in Xlist.
• Make sure that 1 is in Freq (unless there is a separate column giving frequencies).

Histogram with the TI (default)
• You can get a default window by selecting [ZOOM] and then 9 (ZoomStat).
• The window below shows a bin width of 3.25 and includes all of the values.
• Because we have integers, I’d rather use 3 as the bin width.
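To see how frequencies and relative frequencies are tallied behind a histogram, here is a minimal Python sketch for the TI example data with a bin width of 3 starting at 60. It assumes left-closed bins [start, start + width), so a value sitting exactly on a cut point goes into the bin that starts there; `bin_counts` is a hypothetical helper, not a TI or StatCrunch function.

```python
def bin_counts(data, start, width, num_bins):
    """Tally how many values fall into each left-closed bin
    [start + k*width, start + (k+1)*width) for k = 0, ..., num_bins - 1."""
    counts = [0] * num_bins
    for x in data:
        k = int((x - start) // width)  # index of the bin that contains x
        if 0 <= k < num_bins:          # values outside the window are ignored
            counts[k] += 1
    return counts


data = [62, 63, 65, 66, 68, 70, 71, 73, 75]  # the TI example data
start, width, num_bins = 60, 3, 6            # bins 60-63, 63-66, ..., 75-78

counts = bin_counts(data, start, width, num_bins)
n = len(data)
for k, count in enumerate(counts):
    low = start + k * width
    # Relative frequency = frequency / total number of values.
    print(f"[{low}, {low + width}): frequency {count}, relative frequency {count / n:.3f}")
```

Every relative frequency is just the frequency divided by the same total (9), so the bars keep their shape and only the vertical scale changes.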
Histogram with the TI
• Choose as window X: [60, 78]; Y: [-1, 3]. You may have to play with this.
  ○ For X, I picked a little lower than the min and a little higher than the max.
  ○ For Y, I picked a little more than the largest bin frequency I expected.
• Xscl is the width of each bin. In this case, choosing 3 makes cut points at 60, 63, 66, 69, 72, 75, and 78.

Usefulness of the Trace function
Use the horizontal arrows to navigate the bins.

Histograms and StatCrunch
• Enter the data.
• Graphics → Histogram
• Click on the data variable and Next.
• Select Frequency or Relative Frequency.
• Put in a starting value and/or bin width if desired.
• Click Next twice, and type in labels.
• Click Create Graph.

Histogram with StatCrunch
• Select Graphics.
• Select Histogram.
• Select the column you want graphed.
• Select Next. (Do not select “Create Graph” unless you do not want control over the bin size.)
• For the same bins as with the TI, set “Start Bins” at 60 and set Bin Width equal to 3.
• Then select “Create Graph”.

Results
(Pictured: the histogram with the default bin size, and with a better bin size.)

How many bins?
• There is no “hard and fast” rule; there is even some disagreement among professionals.
• Recommendations from slides from two Johns Hopkins graduate biostatistics courses both depend on the number (n) of data points:
  ○ Biostatistics 612: √n
  ○ Biostatistics 651: 2√n
• I personally would use √n, but would try different numbers to see what looks best.
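The two rules of thumb above are easy to compare side by side. This minimal Python sketch rounds √n and 2√n to the nearest whole number of bins; the rounding choice and the helper name `suggested_bins` are assumptions for illustration, not part of either course's recommendation.

```python
import math


def suggested_bins(n):
    """Rule-of-thumb bin counts for n data points: (sqrt(n), 2*sqrt(n)),
    rounded to the nearest whole number of bins (rounding is an assumption)."""
    root = math.sqrt(n)
    return max(1, round(root)), max(1, round(2 * root))


for n in (9, 24, 150):  # TI example, pulse-rate example, movie-length example
    a, b = suggested_bins(n)
    print(f"n = {n:3d}: sqrt(n) rule -> {a} bins, 2*sqrt(n) rule -> {b} bins")
```

For the nine-value TI example, the window X: [60, 78] with Xscl = 3 produces six bins, which matches the 2√n suggestion rather than √n = 3. Either way, treat the result as a starting point and try nearby values.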
Publisher Instructions: Histogram
• Histogram: Displays the frequency, relative frequency or density for numerical data combined into classes.
• Select the column(s) to be displayed in the plot(s). A separate plot will be generated for each column selected.
• Enter an optional Where clause to specify the data rows to be included in the computation.
• Select an optional Group by column to construct a histogram for each distinct value of this column.
• Click the Next button to select either the Frequency, Relative Frequency or Density histogram. In addition, optional values for the starting point of the bins and the bin width may be specified. These parameters will apply to all of the histograms to be constructed.
• Click the Next button again to specify graph layout options. Click the Create Graph! button to create the plot(s).

Thoughts on Histograms
• Histograms are useful and easy to apply to almost all types of quantitative data, especially larger data sets.
• They can use a lot of ink and space!
• Color is more useful than black-and-white or grayscale.
• It can be difficult to display several related datasets at the same time in order to compare them.
• When you get a default, accept it if you can live with it! If not, at least save (or remember) what you did.

“Reading” Histograms
The histogram on the right gives the percent of movie lengths for 150 selected movies. (Data are from the StatCrunch collection for Chapter 3.)

“Reading” Histograms
Question 1: How many movies had lengths of more than two hours (120 minutes)?
Answer: 18 + 4 + 5 + 2 + 1 = 30

“Reading” Histograms
Question 2: What percentage of movies were more than two hours (120 minutes) in length?
Answer: 18 + 4 + 5 + 2 + 1 = 30, and 30/150 = 20%

Histograms of test scores for two classes of the same course and instructor
Period 1   Period 2
95 83 98 75 93 82 96 75 93 81 92 73 91 81 87 72 87 78 84 72 87 77 82 70 86 69 80 69 84 28 77 63 77 58

Comparing the two data sets using histograms
• Make sure that you use the same starting point and bin width for both histograms.
• Do each histogram separately and look at them side by side. (Same settings for L2.)
• The gap and suspected outlier are apparent in the first class.
• The second one is a little more symmetric.

Stem-and-Leaf Displays
• Stem-and-leaf: shows both the shape of the distribution and all of the individual values
• Not as visually pleasing as a histogram; more technical looking
• Best suited to small collections of data
• The first column (stems) holds the leading digit(s).
• The second column (leaves) holds the next digit.

Stem-and-Leaf Displays
• Stem-and-leaf displays show the distribution of a quantitative variable, like histograms do, while preserving the individual values.
• Stem-and-leaf displays contain all the information found in a histogram and, when carefully drawn, satisfy the area principle and show the distribution.

Stem-and-Leaf Displays
• They can show a complete dataset in very little space.
• It is easy to put them back-to-back to compare groups.
• Invented in 1972 by John Tukey (1915–2000)
  ○ Bell Labs, NJ
  ○ “Exploratory Data Analysis”, 1977

Stem-and-Leaf Example
Compare the histogram and stem-and-leaf display for the pulse rates of 24 women at a health clinic. Which graphical display do you prefer?
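To make the split into stems and leaves concrete, here is a minimal Python sketch for whole-number data such as test scores or the TI histogram example. It assumes non-empty, non-negative integer data, uses all but the last digit as the stem and the last digit as the leaf, and keeps empty stems so gaps stay visible; `stem_and_leaf` is a hypothetical helper, not a StatCrunch feature.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Build a stem-and-leaf display for non-empty, non-negative integer data.
    Stem = every digit except the last; leaf = the last digit.
    Empty stems are kept so that gaps in the data stay visible."""
    leaves = defaultdict(list)
    for x in sorted(data):              # ordered data give ordered leaves
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        digits = " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem:>3} | {digits}")
    return rows


for row in stem_and_leaf([62, 63, 65, 66, 68, 70, 71, 73, 75]):
    print(row)
```

Each row plays the role of a histogram bar of width 10, yet every original value can still be read back from its stem and leaf.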
Stem-and-Leaf Displays
• Stem-and-leaf plots give all of the data in pictorial form.
• Stem-and-leaf plots are useful for smaller datasets.
• It is not possible to do a stem-and-leaf plot with the TI.
• They can be done with StatCrunch, but you have only limited control over the bin sizes.
• But if the data set is ordered, they are easy to do by hand.

Constructing a Stem-and-Leaf Display (by hand)
• First, draw a vertical line.
• Next, cut each data value into leading digits (“stems”), written to the left of the line, and trailing digits (“leaves”), written to the right of the line.
• Use the stems to label the bins.
• Use only one digit for each leaf: either round or truncate the data values to one digit after the stem.

Stem and Leaf with StatCrunch
• Enter the data.
• Graphics → Stem and Leaf
• Click on the variable name and Next.
• Select Outlier Trimming Type and Create Graph!

Stem and Leaf with StatCrunch (using movie length data)
• Select Graphics.
• Select “Stem and Leaf”.
• Select the variable you want graphed.
• Select a choice for “Leaf unit” (the rough equivalent of a bin width).
• You have more limited control over the leaf unit than you did with the bin width in a histogram.
• You can trim outliers if you wish (after we study outliers later in Chapter 3).

Publisher’s Instructions
• Stem and Leaf: Displays a character-based plot of a column that is similar to a histogram turned on its side. The actual (or approximate) data values are represented in the plot.
• Select the column(s) to be displayed in the plot(s). A separate plot will be generated for each column selected.
• Enter an optional Where clause to specify the data rows to be included in the computation.
• Select an optional Group by column to construct a separate stem and leaf plot for each distinct value of this column.
• Click the Create Graph! button to create the plot(s).

Stem and leaf plot of test scores for two classes of the same course and instructor
Period 1   Period 2
95 83 98 75 93 82 96 75 93 81 92 73 91 81 87 72 87 78 84 72 87 77 82 70 86 69 80 69 84 28 77 63 77 58

Dotplots
• Dotplot: displays dots to describe the shape of the distribution
• There were 30 races with a winning time of 122 seconds.
• Good for smaller data sets
• Visually more appealing than a stem-and-leaf display
• In StatCrunch: Graphics → Dotplot

Dotplots with StatCrunch
• Can’t be done with the TI or EXCEL.
• With StatCrunch, again select “Graphics”, then “DotPlot” (as with the Histogram and the Stem and Leaf).
• In the next panel, you can input axis labels and draw grid lines if you wish.
• In the following one, you can pick a color scheme.
• But you have no control over the bin size (the next slide shows a dotplot that is not very useful).

I personally recommend the histogram over the dotplot.

Publisher’s Instructions
• Dotplot: Displays a graphical representation of numerical values as points on a number line.
  Points with the same pixel representation are stacked on top of each other. If the number of points in a stack exceeds the height of the graphic, each point on the plot may represent more than one observation. If this occurs, the number of observations per point will be shown in the title of the graphic.
• Select the column(s) to be displayed in the plot(s). If multiple columns are selected, the plots will be stacked in the reverse order of selection in the same graphic.
• Enter an optional Where clause to specify the data rows to be included in the computation.
• Select an optional Group by column to construct dotplots for each distinct value of this column. If a Group by column is specified, select either to stack the plots of each group for each column or to stack plots of each column for each group.
• Click the Next button to specify graph layout options.
• Click the Create Graph! button to create the plot(s).

Think Before You Draw
• Is the variable quantitative? Is the answer to the survey question or the result of the experiment a number whose units are known?
• Histograms, stem-and-leaf diagrams, and dotplots can only display quantitative data.
• Bar and pie charts display categorical data.

Constructing Effective Graphs (Source: Agresti & Franklin)
• Label both axes and provide proper headings.
• To better compare relative size, the vertical axis should start at 0 (if practical).
• Be cautious in using anything other than bars, lines, or points. Don’t use birds, dollar signs, ships, etc.!
• It can be difficult to portray more than one group on a single graph when the variable values differ greatly.

3.2 Shape
Shape, Center, and Spread
• When describing a distribution, always tell about three things: shape, center, and spread.

What is the Shape of the Distribution?
1. Does the histogram have a single, central hump or several separated humps?
2. Is the histogram symmetric?
3. Do any unusual features stick out?

Modes
• A mode of a histogram is a hump or high-frequency bin.
• One mode → Unimodal
• Two modes → Bimodal
• Three or more → Multimodal
(Pictured: unimodal, bimodal, and multimodal histograms.)

Uniform Distributions
• Uniform distribution: all the bins have the same frequency, or at least close to the same frequency.
• The histogram for a uniform distribution will be flat.

Symmetry
• The histogram for a symmetric distribution will look the same on the left and the right of its center.
(Pictured: two symmetric histograms and one that is not symmetric.)

Skew
• A histogram is skewed right if the longer tail is on the right side of the mode.
• A histogram is skewed left if the longer tail is on the left side of the mode.
(Pictured: a right-skewed and a left-skewed histogram.)

Examples of Skewness
Source: Agresti & Franklin, “Statistics: The Art and Science of Learning from Data”; Pearson, 2007

Outliers
• An outlier is a data value that is far above or far below the rest of the data values.
• An outlier is sometimes just an error in the data collection.
• An outlier can also be the most important data value.
  ○ Income of a CEO
  ○ Temperature of a person with a high fever
  ○ Elevation at Death Valley

My note on outliers
• Currently, points that appear to be outliers are labeled (by me) as “suspected” outliers.
• There is a method (explained later in this chapter) for detecting outliers.
• Once we learn this method and apply it to our data, we have confirmed an outlier (or not), and if a data point is an outlier, we can remove the word “suspected” for that data point.

Example
• The histogram shows the amount of money spent by a credit card company’s customers. Describe and interpret the distribution.
• The distribution is unimodal. Customers most commonly spent a small amount of money.
• The distribution is skewed right. Many customers spent only a small amount, and a few were spread out at the high end.
• There is a suspected outlier at around $7000. One customer spent much more than the rest of the customers.

3.3 Center

The Median
• Median: the center of the ordered data values
• Half of the data values are to the left of the median and half are to the right of the median.
• For symmetric distributions, the median is directly in the middle.
• The median was first proposed by Sir Francis Galton in 1875 as a way of getting to the “average” value of a dataset without cumbersome calculations.

Calculating the Median: Odd Sample Size
• First, order the numbers.
• If there is an odd number of numbers, n, the median is at position $\frac{n+1}{2}$.
• Find the median of the numbers: 2, 4, 5, 6, 7, 9, 9.
• $\frac{n+1}{2} = \frac{7+1}{2} = 4$
• The median is the fourth number: 6.
• Note that there are 3 numbers to the left of 6 and 3 to the right.
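The position rule for the median translates directly into code. This minimal Python sketch (the helper name `median` is ours) sorts the data, then applies the odd-n rule above or, for even n, averages the two middle values as described for even sample sizes; the only subtlety is converting the slides' 1-based positions to Python's 0-based indexes.

```python
def median(values):
    """Median by the position rule (positions are 1-based, as on the slides):
    odd n  -> the value at position (n + 1) / 2
    even n -> the average of the values at positions n / 2 and n / 2 + 1."""
    xs = sorted(values)          # first, order the numbers
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]          # subtract 1: Python lists are 0-based
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


print(median([2, 4, 5, 6, 7, 9, 9]))  # odd example: prints 6
print(median([2, 2, 4, 6, 7, 8]))     # even example: prints 5.0
```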
Calculating the Median: Even Sample Size
• First, order the numbers.
• If there is an even number of numbers, n, the median is the average of the two middle numbers, at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
• Find the median of the numbers: 2, 2, 4, 6, 7, 8.
• $\frac{n}{2} = \frac{6}{2} = 3$
• The median is the average of the third and fourth numbers: $\text{Median} = \frac{4 + 6}{2} = 5$

3.4 Spread

Spread
• Locating the center is only part of the story.
• Are the data all near the center, or are they spread out?
• Is the highest value much higher than the lowest value?
• To describe data, we must discuss both the center and the spread.

Range
• The range is the difference between the maximum and minimum values: Range = Maximum – Minimum
• The ages of the guests at your dinner party are: 16, 18, 23, 23, 27, 35, 74
• The range is: 74 – 16 = 58
• The range is sensitive to outliers: a single high or low value will affect the range significantly. This makes the range of limited use as a measure of spread.

Percentiles and Quartiles
• Percentiles divide the data into one hundred groups.
• The pth percentile is the data value such that p percent of the data lie below that value.
• For large data sets, the median is the 50th percentile.
• The median of the lower half of the data is the 25th percentile and is called the first quartile (Q1).
• The median of the upper half of the data is the 75th percentile and is called the third quartile (Q3).

StatCrunch: Q1, Median, and Q3
• Enter the data.
• Stat → Summary Stats → Columns
• Click on the variable and then Calculate.
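Quartiles as “medians of halves” leave one choice open: for odd n, does the overall median belong to each half? The minimal Python sketch below implements both options and applies them to the page 53 text data used later in this chapter, plus the range of the dinner-party ages. Labeling `include_median=True` as the textbook's method and `include_median=False` as the TI's method follows these slides' own comparison; the function names are hypothetical, and other software may use yet another rule.

```python
def _median_sorted(xs):
    """Median of a list that is already sorted."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2


def quartiles(values, include_median):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data.

    include_median=True:  for odd n, the overall median is put in BOTH halves
                          (the textbook's method).
    include_median=False: for odd n, the overall median is left out of both halves
                          (the method the TI uses).
    For even n the two methods give the same halves. Requires n >= 2."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = xs[: half + 1], xs[half:]
    else:
        lower, upper = xs[:half], xs[n - half:]
    return _median_sorted(lower), _median_sorted(upper)


ages = [16, 18, 23, 23, 27, 35, 74]                  # dinner-party ages
print("Range of ages:", max(ages) - min(ages))       # 74 - 16 = 58

data = [-17.5, 2.8, 3.2, 13.9, 14.1, 25.3, 45.8]     # text data, page 53
for include in (True, False):
    q1, q3 = quartiles(data, include)
    print(f"include_median={include}: Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {q3 - q1:.1f}")
```

The first choice reproduces the text's Q1 = 3.0 and Q3 = 19.7; the second reproduces the TI's Q1 = 2.8 and Q3 = 25.3. The two IQRs differ as well, which is why it matters to state which technology you used.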
The Interquartile Range
• The interquartile range (IQR) is the difference between the upper quartile and the lower quartile: IQR = Q3 – Q1
• The IQR measures the range of the middle half of the data.
• Example: If Q1 = 23 and Q3 = 44, then IQR = 44 – 23 = 21.

The Interquartile Range
• The interquartile range for tsunami-causing earthquakes is 0.9.
• The picture below shows the meaning of the IQR.

Benefits and Drawbacks of the IQR
• The interquartile range is not sensitive to outliers.
• The IQR provides a reasonable summary of the spread of the distribution.
• The IQR shows where typical values are, except in the case of a bimodal distribution.
• The IQR is not great for a general audience, since most people do not know what it is.

3.5 Boxplots and 5-Number Summaries

5-Number Summary
• The 5-number summary provides a numerical description of the data. It consists of the:
  ○ Minimum
  ○ First Quartile (Q1)
  ○ Median
  ○ Third Quartile (Q3)
  ○ Maximum
• The list to the right shows the 5-number summary for the tsunami data.

Interpreting the 5-Number Summary
• The smallest tsunami-causing earthquake had magnitude 3.7.
• The largest tsunami-causing earthquake had magnitude 9.1.
• The middle half of tsunami-causing earthquakes have magnitudes between 6.7 and 7.6.
• Half of tsunami-causing earthquakes have magnitudes below 7.2 and half are above 7.2.
• A tsunami-causing earthquake of magnitude less than 6.7 is small.
• A tsunami-causing earthquake of magnitude more than 7.6 is large.

Example: Text data, page 53
• The ordered values from the first batch:
• -17.5, 2.8, 3.2, 13.9, 14.1, 25.3, 45.8
• Let’s verify the text results with our technology.
• Odd number of points
• Min = -17.5, Max = 45.8, Median = 13.9

Example: Text data, page 53
How about Q1 and Q3?
Book’s method:
• For Q1, take the median of the first four points (i.e., including the median). That is, take the median of -17.5, 2.8, 3.2, 13.9, which is 3.0.
• For Q3, take the median of the last four points (i.e., including the median). That is, take the median of 13.9, 14.1, 25.3, 45.8, which is 19.7.

5-Number Summary: TI (except newer 84s)
• Press [STAT].
• Select [CALC].
• Select #1, “1-Var Stats”, and then add L1.

5-Number Summary: TI (newer 84s)
Old operating system: 1-Var Stats L1

Hmmmmm!
• For Q1, the text got 3.0 and the TI got 2.8.
• For Q3, the text got 19.7 and the TI got 25.3.
• The difference is in methodology: the text included the median in both halves of the dataset; the TI did not.
• Let’s go on to StatCrunch.

5-Number Summary: StatCrunch
Select Stat, then Summary Statistics, then Columns. Then select the column you want summarized. You will see a list of summary statistics. De-select all except those you want, i.e., Max, Min, Q1, Q3, and Median.

The Result with StatCrunch
Summary statistics:
Column   Median   Min     Max    Q1    Q3
var1     13.9     -17.5   45.8   2.8   25.3

Publisher Instructions for Summary Statistics
• Columns: Provides the following descriptive statistics in tabular format for the column(s) selected: sample size (n), mean, variance, standard deviation (Std. Dev.), standard error (Std. Err.), median, range, minimum, maximum, first quartile (Q1), and third quartile (Q3).
• Select the columns for which summary statistics will be computed.
• Enter an optional Where clause to specify the data rows to be included in the computation.
• Select an optional Group By column to group results. If a Group By column is selected, choose whether to display the output in separate tables for each column selected or in separate tables for each group.
• Click the Next button to select the summary statistics (by default, all are selected) to be computed. The statistics will be displayed in the order in which they are selected (from right to left). Additional percentiles may also be entered as a space- or comma-delimited list.
• Check the Store output in data table option if the output is to be placed in the data table.
• Click the Calculate button to view the results.

Other technologies
• SAS, StatDisk, and MINITAB all agree with the TI and StatCrunch.
• EXCEL: PERCENTILE(Array, .25) = 3, PERCENTILE(Array, .75) = 19.7!
• Data Desk, an add-on to EXCEL, gives Q1 = 2.9 and Q3 = 22.5!
• There are different ways of computing Q1 (and likewise Q3):
  ○ Split the list into two halves and include the median in each (text)
  ○ Split the list into two halves and don’t include the median (TI, StatCrunch)
• I think that Data Desk used cut points of 0, 1/6, 2/6, 3/6, 4/6, 5/6, and 1, and interpolated.

Boxes on pp. 53 and 54 of the text
• There are several ways to compute a quartile (we’ve seen 3; the authors have seen 6; other texts say that there are 9).
• For large datasets, it makes very little difference.
• For smaller datasets (where it might make a difference), you may as well just give the whole dataset rather than the summary statistics!
• You will be using technology, so state which one (StatCrunch or TI).
• Even StatCrunch and the TI do not agree on some datasets!
• The IQR can also be different!

Professional disagreement on basic concepts is common.
• How do English grammar manuals direct us in writing about a hat that belongs to Boris?
  ○ Chicago Manual of Style: Boris’s hat.
  ○ American Psychological Association (APA): Boris’ hat.
  ○ The HCC English Department uses MLA (Modern Language Association), which accepts either one.
• Historians disagree on the date of birth of Portuguese explorer Vasco da Gama: 1460 or 1469?
• Astronomers disagree on the definition of a planet! Is Pluto a real planet or a dwarf planet?

Boxplots
• A boxplot is a chart that displays the 5-number summary and the outliers.
• Boxplots were invented in 1977 by John Tukey (1915–2000) of Bell Labs.
• The box shows the interquartile range.
• The dashed lines are called fences; outside the fences lie the outliers.
• Above and below the box are the whiskers, which extend to the most extreme data values within the fences.
• The line inside the box shows the median.

Finding the Fences
• The lower fence is defined by Lower Fence = Q1 – 1.5 × IQR
• The upper fence is defined by Upper Fence = Q3 + 1.5 × IQR
• Tsunami example: Q1 = 6.7, Q3 = 7.6, so IQR = 7.6 – 6.7 = 0.9
  ○ Lower Fence = 6.7 – 1.5 × 0.9 = 5.35
  ○ Upper Fence = 7.6 + 1.5 × 0.9 = 8.95

Constructing Boxplots by Hand
1. Draw a single vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.

Constructing Boxplots by Hand (cont.)
2. Erect “fences” around the main part of the data.
  • The upper fence is 1.5 IQRs above the upper quartile.
  • The lower fence is 1.5 IQRs below the lower quartile.
  • Note: the fences only help with constructing the boxplot and should not appear in the final display.

Constructing Boxplots by Hand (cont.)
3. Use the fences to grow “whiskers.”
  • Draw lines from the ends of the box up and down to the most extreme data values found within the fences.
  • If a data value falls outside one of the fences, we do not connect it with a whisker.

Constructing Boxplots by Hand (cont.)
4. Add the outliers by displaying any data values beyond the fences with special symbols.
  • We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.

Boxplots on the TI
• Clear any prior graph by going to [Y=] and then pressing [CLEAR] for each function.
• Press [2nd][Y=].
• Turn Plot1 ON and have your data in list L1. If the other plots are on, turn them off.
• Select the picture of the boxplot.

Boxplots on the TI
• With the standard window, you will likely not get anything! Try [ZOOM], 9.
• If that does not work, make the window reflect the data, i.e., X: [60, 80]. Y could be, say, [0, 10].
• Here’s what you get.

TI: 5-number summary, boxplot
Step through with the [TRACE] button.

Identifying Outliers: The 1.5 IQR Rule
• The lower fence is Q1 – 1.5 IQR. Anything below that is an outlier (low outlier).
• The upper fence is Q3 + 1.5 IQR. Anything above that is an outlier (high outlier).
• If a number is more than 3 IQRs away from the quartiles, it is a far outlier.
• We can now check suspected outliers.
• Error in the text: on page 55, the Lower Fence is stated incorrectly (it has a + instead of a -). The math is right, though.
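The 1.5 IQR rule is mechanical enough to automate. This minimal Python sketch computes the fences from Q1 and Q3, and also classifies a whole data set, using TI-style quartiles (median excluded from the halves for odd n) as an assumption, since that choice reproduces Q1 = 530 and Q3 = 570 for the SAT Math scores in the example that follows. The helper names `fences` and `find_outliers` are hypothetical.

```python
def _median_sorted(xs):
    """Median of a list that is already sorted."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2


def fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values):
    """Apply the 1.5 IQR rule. Quartiles are medians of the lower and upper halves,
    with the overall median excluded for odd n (TI-style). Requires n >= 2.
    Returns (outliers, far_outliers); far outliers lie more than 3 IQRs from the quartiles."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1, q3 = _median_sorted(xs[:half]), _median_sorted(xs[n - half:])
    low, high = fences(q1, q3)              # 1.5 IQR fences
    far_low, far_high = fences(q1, q3, 3)   # 3 IQR limits for "far" outliers
    outliers = [x for x in xs if x < low or x > high]
    far = [x for x in outliers if x < far_low or x > far_high]
    return outliers, far


print(fences(6.7, 7.6))    # tsunami example: about (5.35, 8.95)
print(fences(98, 116))     # movie lengths: (71.0, 143.0)

sat = [770, 740, 570, 560, 560, 560, 550, 540, 530, 530, 420]
print(find_outliers(sat))
```

For the SAT scores this flags 420, 740, and 770, matching the example. Applying the far-outlier definition above, 740 and 770 also lie more than 3 × 40 = 120 points above Q3, so they would count as far outliers too.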
Identifying Outliers: The 1.5 IQR Rule (Example)
• SAT Math scores for a group of students: 770, 740, 570, 560, 560, 560, 550, 540, 530, 530, 420
• Q1 = 530, Median = 560, Q3 = 570; IQR = 570 – 530 = 40
• Check for low outliers:
  • Q1 – (1.5 × IQR) = 530 – (1.5 × 40) = 470
  • 420 is a low outlier.
• Check for high outliers:
  • Q3 + (1.5 × IQR) = 570 + (1.5 × 40) = 630
  • 770 and 740 are high outliers.

Movie Lengths with StatCrunch
• Enter the data and go to Graphics → Boxplot.
• Click on the variable and then Next.
• Check “Use fences to identify outliers.” Then click Next.
• Type in labels and click Create Graph.
[StatCrunch boxplot screenshots]

Summary statistics (Column: Running Time, in minutes)
Min | Q1 | Median | Q3 | Max | IQR
43 | 98 | 104.5 | 116 | 160 | 18

Outlier Movie Lengths
• Upper fence: Q3 + (1.5 × IQR) = 116 + (1.5 × 18) = 143
  • Any movie with a running time above 143 min is a high outlier. There are seven, but you cannot see them all on the boxplot.
• Lower fence: Q1 – (1.5 × IQR) = 98 – (1.5 × 18) = 71
  • Any movie with a running time below 71 min is a low outlier. There is one.

[Image] Source: http://www.causeweb.org/resources/fun/

Step-by-Step Example of Shape, Center, Spread: Flight Cancellations
• Question: How often are flights cancelled?
• Who?
• What? Percentage of flights cancelled at U.S. airports
• When?
1995 – 2011
• Where? United States
• How? Bureau of Transportation Statistics data (the cases, the Who, are months)

Flight Cancellations: Think
• Identify the variable: percent of flights cancelled at U.S. airports.
  • Quantitative: the units are percentages.
• How will the data be summarized?
  • Histogram
  • Numerical summary
  • Boxplot

Flight Cancellations: Show
• Use StatCrunch (or the TI) to create the histogram, boxplot, and numerical summary.

Flight Cancellations: Tell
• Describe the shape, center, and spread of the distribution. Report on the symmetry, number of modes, and any gaps or outliers. You should also mention any concerns you may have about the data.
• Skewed to the right: a percentage can’t be negative, while bad weather and other airport troubles can cause extreme cancellations.
• The IQR is small (1.23%), so cancellation percentages are fairly consistent.
• Extraordinary outlier at 20.2%: September 2001.

Boxplots of Test Scores for Two Classes of the Same Course and Instructor
Period 1, Period 2
95 83 98 75 93 82 96 75 93 81 92 73 91 81 87 72 87 78 84 72 87 77 82 70 86 69 80 69 84 28 77 63 77 58

Boxplots of Test Scores for Two Classes (cont.)
• Put Period 1 in L1 and Period 2 in L2.
• Select [2nd][STAT PLOT].
• Turn the first two plots on.
• Set up Plot 1 for a boxplot with L1.
• Set up Plot 2 for a boxplot with L2.
• Execute [ZOOM] 9 (ZoomStat).
• The side-by-side boxplots tell us some things about the two classes’ scores.

3.6 The Center of Symmetric Distributions: The Mean

The Mean
• The mean is what most people think of as the average.
• Add up all the numbers and divide by the number of numbers.
$\bar{y} = \dfrac{\sum y}{n}$
• Recall that $\sum$ (Sigma) means “add them all.”
• In StatCrunch, the mean is listed in the Summary Statistics.

First Use of the “Average”
• 450 BC: Hippias used the average length of a king’s reign to estimate the date of the first Olympic Games (about 300 years before then, or about 750 BC).
• How he estimated the “average” is unknown; his estimate may have been subjective.
• He estimated the date by multiplying his “average” by the number of kings, which was precisely documented.
• The most accurate estimate that we have is 776 BC, based on engravings on Mount Olympus giving the names of the winners of a foot race held every four years since that date.

The Mean Is the “Balancing Point”
• If you put your finger on the mean, the histogram will balance perfectly.

Mean vs. Median
• For symmetric distributions, the mean and the median are (about) equal: the balancing point is at the center.
• A tail “pulls” the mean toward it more than it pulls the median.
• The mean is more sensitive to outliers than the median.

The Mean Is Attracted to the Outlier
• The mean is larger than the median since it is “pulled” to the right by the outlier.
• The median is a better measure of the center for skewed data.

Why Use the Mean?
• Although the median is a better measure of center for skewed data, the mean takes every value, large and small, into account.
• The mean is easier to work with.
• For symmetric data, statisticians would rather use the mean.
• For skewed data, statisticians prefer the median.
• It is always OK to report both the mean and the median.
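To see the mean being “pulled” by outliers, here is a short Python sketch using the SAT Math scores from the 1.5 IQR rule example. It computes $\bar{y} = \sum y / n$ directly and compares it with the median, first for all eleven scores and then after dropping the three values beyond the fences of 470 and 630.

```python
import statistics

# SAT Math scores from the 1.5 IQR rule example
sat = [770, 740, 570, 560, 560, 560, 550, 540, 530, 530, 420]

# Mean: add them all up, then divide by n
mean = sum(sat) / len(sat)
median = statistics.median(sat)
print(round(mean, 2), median)          # 575.45 560

# Drop the values beyond the fences (below 470 or above 630) and compare
trimmed = [y for y in sat if 470 <= y <= 630]
print(sum(trimmed) / len(trimmed), statistics.median(trimmed))   # 550.0 555.0
```

The mean drops by about 25 points when the outliers are removed, while the median moves only 5: the two high outliers were pulling the mean above the median.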
5-Number Summary on the TI (except newer 84s)
• Press [STAT] and select the CALC menu.
• Select 1: 1-Var Stats and add L1.

5-Number Summary on the TI (newer 84s)
[Calculator screenshot]

What’s Wrong with These Quotes?
• “We look forward to the day when everyone will receive more than the average wage.” (Australian Minister of Labour, 1973)
• “Lake Wobegon, Minnesota: where all the women are strong, all the men are good-looking, and all the children are above average.” (Garrison Keillor, in jest, on the show “A Prairie Home Companion”)

*Weighted Arithmetic Mean
The weighted arithmetic mean is computed with the following formula:
$\bar{x}_w = \dfrac{\sum w x}{\sum w}$
where $\bar{x}_w$ stands for the weighted arithmetic mean, $x$ stands for the values of the items, and $w$ stands for the weight of each item.
Source: http://www.emathzone.com/tutorials/basicstatistics/weighted-arithmetic-mean.html

*Example: Weighted Mean – GPA
A freshman receives the following grades. Assume 4 points for an A and 3 for a B. What is his grade point average?
Course | Credits | Grade | Points
Intro to Literature | 3 | B | 3
Russian I | 3 | A | 4
Physics I | 4 | A | 4
Calculus I | 4 | A | 4
Chemistry I | 4 | B | 3
Physical Education I | 1 | A | 4

*Example: Weighted Mean – GPA (cont.)
Use $\text{GPA} = \dfrac{\sum \text{Credits} \times \text{Points}}{\sum \text{Credits}}$
Credits | Grade | Points | Credits × Points
3 | B | 3 | 9
3 | A | 4 | 12
4 | A | 4 | 16
4 | A | 4 | 16
4 | B | 3 | 12
1 | A | 4 | 4
∑ Credits × Points = 69, ∑ Credits = 19, so GPA = 69/19 ≈ 3.63.

*Example: Weighted Mean – Customer Ratings
Amazon.com is reviewing the ratings on a line of products. Customers rate from 1 to 5 (1 = worst, 5 = best). The ratings and the number of customers giving each rating are shown in the table. What is the average rating for the product?
Rating | Number of Customers
5 | 57
4 | 73
3 | 36
2 | 7
1 | 10

*Example: Weighted Mean – Customer Ratings (cont.)
Use $\dfrac{\sum \text{Rating} \times \text{Customers}}{\sum \text{Customers}}$
Rating | Customers | Rating × Customers
5 | 57 | 285
4 | 73 | 292
3 | 36 | 108
2 | 7 | 14
1 | 10 | 10
∑ Rating × Customers = 709, ∑ Customers = 183, so the average rating is 709/183 ≈ 3.874.

With the TI
Put the ratings (5 to 1) in L1 and the numbers of customers in L2.
• Old operating system: do 1-Var Stats L1,L2 (L1 comma L2).
• New operating system: 1-Var Stats with List: L1 and Frequency: L2.

3.7 The Spread of Symmetric Distributions: The Standard Deviation

The Variance
$s^2 = \dfrac{\sum (y - \bar{y})^2}{n - 1}$
• The variance is a measure of how far the data are spread out from the mean.
• The difference from the mean is $y - \bar{y}$. To make every difference count as positive, square it.
• Then average all of these squared differences, except that instead of dividing by n, we divide by n – 1.
• Use $s^2$ to represent the variance.
• The variance will mostly be used to find the standard deviation s, which is the square root of the variance.

Standard Deviation
$s = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
• The variance’s units are the square of the original units. Taking the square root of the variance gives the standard deviation, which has the same units as y.
• The standard deviation was first used by Karl Pearson in 1894.
• The standard deviation is roughly the typical distance of the y values from the mean.
• If data values are close to the mean (less spread out), then the standard deviation will be small.
• If data values are far from the mean (more spread out), then the standard deviation will be large.

Standard Deviation by Hand (Technology Is Easier!)
X | X – Xbar | (X – Xbar)²
98 | 9 | 81
96 | 7 | 49
92 | 3 | 9
87 | –2 | 4
85 | –4 | 16
83 | –6 | 36
82 | –7 | 49
• The mean is Xbar = 623/7 = 89, and the squared deviations add up to 244.
• 244 / 6 = 40.66667 square dollars (what are those?).
• The square root of 40.66667 is 6.38 dollars.
• We have obtained the standard deviation.
• Its units are the same as in the original data (dollars).

Questions about Variance
$s^2 = \dfrac{\sum (y - \bar{y})^2}{n - 1}$
• Why n – 1 instead of n? It has to do with a concept called degrees of freedom, which we will see in later chapters (Chapter 18).
• Essentially, it is the number of values that can be freely changed if the sum (or the mean) remains constant.

The Standard Deviation and Histograms
• Order the histograms below from smallest standard deviation to largest standard deviation. [Histograms A, B, and C]
• Answer: C, A, B

Mean and Standard Deviation on the TI
For the numbers 62, 63, 65, 66, 68, 70, 71, 73, 75:
• Press [STAT], [CALC], 1-Var Stats.
• The mean is 68.1111.
• The standard deviation is 4.4845.
• Use Sx rather than σx (this will be explained later in the course).

Mean and Standard Deviation in StatCrunch
• Put the numbers in var1. Select Stat, then Summary Stats, then Column, as before.
• Give var1 as your input column.
• Under Statistics, make sure that Mean and Standard Deviation are checked. (You can check others.)
• Click Create.
Summary statistics:
Column | n | Mean | Std. Dev.
var1 | 9 | 68.111115 | 4.4845414
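The by-hand table, the TI/StatCrunch output, and the weighted-mean examples can all be cross-checked with a few lines of Python. This is a sketch, not how the TI works internally: `mean_and_sd` (our own name) divides by n – 1 like Sx, and its optional `freqs` argument plays the role of the TI frequency list from the customer-ratings example. `weighted_mean` (also our own) covers the GPA example, with credits as weights.

```python
import math


def mean_and_sd(values, freqs=None):
    """Sample mean and standard deviation (divide by n - 1, like Sx on the TI).

    freqs, if given, acts like a TI frequency list: freqs[i] is how many
    times values[i] occurs.
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    # Squared deviations from the mean, each counted f times
    ss = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs))
    return mean, math.sqrt(ss / (n - 1))


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Standard deviation by hand: mean 89, squared deviations total 244
m, s = mean_and_sd([98, 96, 92, 87, 85, 83, 82])
print(m, round(s, 2))                                # 89.0 6.38

# TI / StatCrunch example, plus its coefficient of variation s / mean
m, s = mean_and_sd([62, 63, 65, 66, 68, 70, 71, 73, 75])
print(round(m, 4), round(s, 4), round(s / m, 5))     # 68.1111 4.4845 0.06584

# Customer ratings: ratings as "L1", numbers of customers as "L2"
m, _ = mean_and_sd([5, 4, 3, 2, 1], [57, 73, 36, 7, 10])
print(round(m, 3))                                   # 3.874

# GPA: grade points weighted by credits
print(round(weighted_mean([3, 4, 4, 4, 3, 4], [3, 3, 4, 4, 4, 1]), 2))   # 3.63
```

Changing `n - 1` to `n` in the last line of `mean_and_sd` would give the TI’s σx instead of Sx.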
**Excel Summary Statistics
Data in cells A1:A7: the seven values 2.8, 3.2, 13.9, 14.1, 25.3, 45.8, and -17.5
Summary statistic | Excel function | Answer
Mean | =AVERAGE(A1:A7) | 12.514
Standard deviation | =STDEV(A1:A7) | 19.824
Median | =MEDIAN(A1:A7) | 13.9
1st quartile | =QUARTILE(A1:A7,1) | 3
3rd quartile | =QUARTILE(A1:A7,3) | 19.7
Minimum | =MIN(A1:A7) | -17.5
Maximum | =MAX(A1:A7) | 45.8
Skewness | =SKEW(A1:A7) | 0.3059
Kurtosis | =KURT(A1:A7) | 0.8924

*Other Summary Measures: Skewness
For data points $Y_1, Y_2, \ldots, Y_n$, the skewness is defined as
$\dfrac{\frac{1}{n}\sum (Y_i - \bar{Y})^3}{\left(\frac{1}{n}\sum (Y_i - \bar{Y})^2\right)^{3/2}}$
Note that it involves cubes (the third power). The data are positively or negatively skewed depending on whether this quantity is greater than or less than 0. The magnitude of this quantity is a measure of how skewed the data are.
Source: Wikipedia

*Other Summary Measures: Kurtosis
Kurtosis is a measure of how peaked or flat your data are. Mathematically, kurtosis is defined as:
$\dfrac{\frac{1}{n}\sum (x - \bar{x})^4}{\left(\frac{1}{n}\sum (x - \bar{x})^2\right)^2} - 3$
Note that this involves the fourth power.
• A value of 0 indicates a perfect bell shape.
• Greater than 0: more peaked.
• Less than 0: flatter.

*Other Summary Measures: Coefficient of Variation
You may see this in upper-level textbooks. The coefficient of variation (CV) is the standard deviation divided by the mean. For the most recent example, CV = 4.4845/68.1111 = 0.06584. This is normally expressed as a percent, i.e., CV = 6.584%. Notice that the CV is unitless. This is an advantage, since it allows us to compare different populations. We will see this a lot in the course.

Thinking About Variation
• Since Statistics is about variation, spread is an important fundamental concept of Statistics.
• Measures of spread help us talk about what we don’t know.
• When the data values are tightly clustered around the center of the distribution, the IQR and standard deviation will be small.
• When the data values are scattered far from the center, the IQR and standard deviation will be large.

Tell: Draw a Picture
When telling about quantitative variables, start by making a histogram or stem-and-leaf display and discuss the shape of the distribution.

3.8 Summary: What to Tell About a Quantitative Variable

What to Tell
• Histogram, stem-and-leaf display, boxplot
  • Describe modality, symmetry, and outliers.
• Center and spread
  • Median and IQR if not symmetric
  • Mean and standard deviation if symmetric
  • For unimodal symmetric data, the IQR is usually somewhat larger than s. If it isn’t, check for errors.
• Unusual features
  • For multiple modes, possibly split the data into groups.
  • When there are outliers, report the mean and standard deviation with and without the outliers.

Example: Fuel Efficiency
• The car owner has checked the fuel efficiency each time he filled the tank. How would you describe the fuel efficiency?
• Plan: Summarize the distribution of the car’s fuel efficiency.
• Variable: mpg for 100 fill-ups; quantitative.
• Mechanics: show a histogram.
  • Fairly symmetric
  • Low outlier

Fuel Efficiency (cont.)
• Which to report?
  • The mean and median are close.
  • Report the mean and standard deviation.
• Conclusion
  • The distribution is unimodal and symmetric.
  • The mean is 22.4 mpg.
  • The low outlier may be investigated, but it has limited effect on the mean.
  • s = 2.45: from one fill-up to the next, fuel efficiency typically differs from the mean by about 2.45 mpg.
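Returning to the Excel summary-statistics slide: Excel’s SKEW and KURT are not exactly the moment formulas on the skewness and kurtosis slides, because Excel applies small-sample adjustments. The Python sketch below reproduces that Excel table, assuming Excel’s documented adjusted formulas (both built on the sample standard deviation s) and assuming that QUARTILE interpolates linearly at position (n – 1) × q/4 in the sorted data. The helper names `quartile_inc`, `excel_skew`, and `excel_kurt` are our own.

```python
import math
import statistics

# The seven values in A1:A7 on the Excel slide
data = [2.8, 3.2, 13.9, 14.1, 25.3, 45.8, -17.5]


def quartile_inc(values, q):
    """Mimic Excel QUARTILE(range, q): interpolate at position (n-1)*q/4."""
    ys = sorted(values)
    pos = (len(ys) - 1) * q / 4
    lo = math.floor(pos)
    hi = min(lo + 1, len(ys) - 1)
    return ys[lo] + (pos - lo) * (ys[hi] - ys[lo])


def excel_skew(values):
    """Adjusted sample skewness, as in Excel SKEW."""
    n = len(values)
    m, s = statistics.mean(values), statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def excel_kurt(values):
    """Adjusted excess kurtosis (0 for a bell shape), as in Excel KURT."""
    n = len(values)
    m, s = statistics.mean(values), statistics.stdev(values)
    z4 = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


print(round(statistics.mean(data), 3), round(statistics.stdev(data), 3))  # 12.514 19.824
print(statistics.median(data), round(quartile_inc(data, 1), 3), round(quartile_inc(data, 3), 3))
print(round(excel_skew(data), 4), round(excel_kurt(data), 4))
```

For large data sets the adjusted and unadjusted versions agree closely; with only seven values, the slide formulas would give somewhat different numbers.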
Practice
Recall: Suppose a basketball player scored the following numbers of points in his last 15 games:
4, 4, 3, 4, 7, 16, 12, 15, 6, 8, 5, 9, 8, 25, 11
• Describe the shape of the distribution (modality, skew, and unusual features). Use a starting point of 3 and a bin width of 4.
• Reordered, the points are: 3, 4, 4, 4, 5, 6, 7, 8, 8, 9, 11, 12, 15, 16, 25. If you are using technology, you need not reorder.
• What measures of center or spread would be most appropriate for this data set?
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC

Answer to Practice Exercise
• Modality: unimodal
• Symmetry: skewed right
• Gap and suspected outlier
• Because of the skewness:
  • Measure of center: median
  • Measure of spread: IQR

Practice #26
A meteorologist preparing a talk about global warming compiled a list of weekly low temperatures (in degrees Fahrenheit) he observed at his south Florida home last year. The coldest temperature for any week was 36°F, but he inadvertently recorded the Celsius value of 2 degrees. Assuming he correctly listed all the other temperatures, explain how this error will affect these summary statistics:
• Measures of center: mean and median
• Measures of spread: range, IQR, and standard deviation
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC

Answer to Practice Exercise
Recording 2°C instead of 36°F:
• Mean: will decrease.
• Median: should remain the same or decrease by a small amount.
• Range: should increase.
• IQR: should remain the same or increase by a small amount.
• Standard deviation: should increase unless there are a lot of cold temperatures recorded.
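For the basketball practice problem above, a short Python sketch can build the histogram counts (bins starting at 3 with width 4, so [3, 7), [7, 11), and so on), then find the median, quartiles, and 1.5 IQR fences. As before, the quartiles use the median-of-halves method, an assumption that may differ slightly from other software; the text histogram is only a stand-in for a real plot.

```python
import statistics

points = [4, 4, 3, 4, 7, 16, 12, 15, 6, 8, 5, 9, 8, 25, 11]

# Histogram counts: bins start at 3 with width 4 -> [3,7), [7,11), ...
start, width = 3, 4
counts = {}
for p in points:
    left = start + (p - start) // width * width   # left edge of p's bin
    counts[left] = counts.get(left, 0) + 1
for left in range(start, max(points) + 1, width):
    print(f"{left:>2}-{left + width - 1:<2} {'*' * counts.get(left, 0)}")

# Median-of-halves quartiles (odd n: the median is left out of both halves)
ys = sorted(points)
half = len(ys) // 2
q1 = statistics.median(ys[:half])
q3 = statistics.median(ys[half + 1:])
iqr = q3 - q1
print("median", statistics.median(ys), "Q1", q1, "Q3", q3, "IQR", iqr)   # median 8 Q1 4 Q3 12 IQR 8
print("mean", round(statistics.mean(ys), 2))                             # mean 9.13
print("beyond fences:", [p for p in ys if p < q1 - 1.5 * iqr or p > q3 + 1.5 * iqr])   # [25]
```

The empty 19–22 bin is the gap, 25 lies above the upper fence of 24, and the mean sits above the median, all consistent with the right skew in the answer.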
Practice
The table displays the heights (in inches) of 130 members of a choir.
a) Find the median and IQR.
b) Find the mean and standard deviation.
c) Display these data with a histogram.
d) Write a few sentences describing the distribution.
Put the data into the TI as we did with the weighted mean calculation.
Height | Count | Height | Count
60 | 2 | 69 | 5
61 | 6 | 70 | 11
62 | 9 | 71 | 8
63 | 7 | 72 | 9
64 | 5 | 73 | 4
65 | 20 | 74 | 2
66 | 18 | 75 | 4
67 | 7 | 76 | 1
68 | 12 | |
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC

3.end Wrap-up

What Can Go Wrong?
• Don’t make a histogram for categorical data.
• Don’t look for shape, center, and spread for a bar chart.
• Choose a bin width appropriate for the data.

What Can Go Wrong? (cont.)
• Do a reality check.
  • Don’t blindly trust your calculator. For example, a mean student age of 193 years old is nonsense.
• Sort before finding the median and percentiles.
  • 315, 8, 2, 49, 97 does not have a median of 2; sorted (2, 8, 49, 97, 315), the median is 49.
• Don’t worry about small differences in the quartile calculation.
• Don’t compute numerical summaries for a categorical variable.
  • The mean Social Security number is meaningless.

What Can Go Wrong? (cont.)
• Don’t report too many decimal places.
  • Citing the mean fuel efficiency as 22.417822453 is going overboard.
• Don’t round in the middle of a calculation.
• For multiple modes, think about separating groups.
  • Heights of people → separate men and women.
• Beware of outliers: the mean and standard deviation are sensitive to outliers.
• Use a histogram or dotplot to ensure that the mean and standard deviation really do describe the data.

Division of Mathematics, HCC
Course Objectives for Chapter 3
After studying this chapter, the student will be able to:
8. Appropriately display quantitative data using a frequency distribution, histogram, relative frequency histogram, stem-and-leaf display, dotplot.
9. Describe a distribution in terms of its shape, center, and spread.
10. Describe any anomalies or extraordinary features revealed by the display of a variable.
11. Compute and apply the concepts of mean and median to a set of data.
12. Compute and apply the concept of the standard deviation and IQR to a set of data.
13. Select a suitable measure of center/spread for a variable based on information about its distribution.
14. Create a five-number summary of a variable.
15. Construct a boxplot by hand and with technology.
16. Use the 1.5 IQR rule to identify possible outliers.

Division of Mathematics, HCC
Course Objectives for Chapter 4
After studying this chapter, the student will be able to:
17. Construct side-by-side histograms or boxplots for two or more groups.
18. Compare the distributions of two or more groups by comparing their shapes, centers, spreads, and unusual features.
We have already completed this!