Chapter 4 Displaying and Summarizing Quantitative Data

Performance Scales:
1. Know the goals and the plots on the real number line (dot plots, histograms, and box plots).
2. Use the goals to understand the plots on the real number line (dot plots, histograms, and box plots).
3. Use the goals to represent data with plots on the real number line (dot plots, histograms, and box plots). MAFS.912.S-ID.1.1
4. Adapt and apply the goals to represent data with plots on the real number line (dot plots, histograms, and box plots) in different and more complex problems.

Learning Goals
1. Know how to display the distribution of a quantitative variable with a histogram, a stem-and-leaf display, or a dotplot.
2. Know how to display the relative position of a quantitative variable with a cumulative frequency curve, and analyze the cumulative frequency curve.
3. Be able to describe the distribution of a quantitative variable in terms of its shape.
4. Be able to describe any anomalies or extraordinary features revealed by the display of a variable.
5. Be able to determine the shape of the distribution of a variable by knowing something about the data.
6. Know the basic properties of the mean and median of a set of data and how to compute them.
7. Understand the properties of a skewed distribution.
8. Know the basic properties of the standard deviation and IQR of a set of data and how to compute them.
9. Understand which measures of center and spread are resistant and which are not.
10. Be able to select a suitable measure of center and a suitable measure of spread for a variable based on information about its distribution.
11. Be able to describe the distribution of a quantitative variable in terms of its shape, center, and spread.
Learning Goal 1
Know how to display the distribution of a quantitative variable with a histogram, a stem-and-leaf display, or a dotplot.

Learning Goal 1: Ways to Graph Quantitative Data
• Histograms and Stemplots: summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.
• Dotplots: quick and easy graphs for small data sets.
• Cumulative Frequency Curves (Ogives): used to compare the relative standing of data values.
• Line Graphs (Time Plots): use when there is a meaningful sequence, such as time. The line connecting the points helps emphasize any change over time.

Learning Goal 1: Dealing With a Lot of Numbers…
Summarizing the data helps when we look at large sets of quantitative data. Without summaries, it is hard to grasp what the data tell us. The best thing to do is to make a picture. We can't use bar charts or pie charts for quantitative data, since those displays are for categorical variables.

Learning Goal 1: Tabulating Numerical Data
What is a frequency distribution (table)? A frequency distribution is a list or a table containing class groupings (ranges within which the data fall) and the corresponding frequencies with which data fall within each grouping or class.

Learning Goal 1: Why Use a Frequency Distribution?
• It is a way to summarize numerical data.
• It condenses the raw data into a more useful form.
• It allows for a quick visual interpretation of the data.

Quantitative Data: HISTOGRAM

Learning Goal 1: Histograms
A histogram is a graph that uses bars to portray the frequencies or the relative frequencies of the possible outcomes for a quantitative variable. It is the most common graph used to display quantitative data for a single variable.

Learning Goal 1: Histograms
To make a histogram, we first need to organize the data in a quantitative frequency table. There are two types of quantitative data:
1. Discrete: use an ungrouped frequency table to organize.
2. Continuous: use a grouped frequency table to organize.

Learning Goal 1: Quantitative Frequency Tables – Ungrouped
• What is an ungrouped frequency table? An ungrouped frequency table simply lists the data values with the frequency count for each value.
• Commonly used with discrete quantitative data.

Learning Goal 1: Quantitative Frequency Tables – Ungrouped
• Example: The at-rest pulse rates for 16 athletes at a meet were 57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, and 58. Summarize the information with an ungrouped frequency distribution.

Learning Goal 1: Quantitative Frequency and Relative Frequency Tables – Ungrouped
Note: The (ungrouped) classes are the observed values themselves. The relative frequency for a class is obtained by computing f/n.

Class (pulse rate)   Frequency, f   Relative Frequency
53                   1              0.0625
54                   3              0.1875
55                   2              0.1250
56                   2              0.1250
57                   4              0.2500
58                   2              0.1250
59                   0              0
60                   1              0.0625
61                   0              0
62                   0              0
63                   0              0
64                   1              0.0625
Total                n = 16         1

Learning Goal 1: Relative Freq. Tables – Your Turn
TVs per Household: Trends in Television, published by the Television Bureau of Advertising, provides information on television ownership. The table gives the number of TV sets per household for 50 randomly selected households. Use classes based on a single value to construct an ungrouped-data relative frequency table for these data.

Learning Goal 1: Relative Freq. Tables – Solution

Learning Goal 1: Quantitative Frequency Tables – Grouped
• What is a grouped frequency table? A grouped frequency table is obtained by constructing classes (or intervals) for the data and then listing the number of values (frequency count) in each interval.
• Commonly used with continuous quantitative data.
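The ungrouped table above can also be produced programmatically. The following is a minimal Python sketch (not part of the original slides) using only the standard library; the function name `ungrouped_table` is hypothetical, and listing unobserved values such as 59 with a frequency of 0 assumes whole-number data, as in the pulse example.

```python
from collections import Counter

def ungrouped_table(data):
    """Rows of (value, frequency, relative frequency) for whole-number data.

    Every integer from the smallest to the largest value gets a row, so
    unobserved values (such as 59 in the pulse data) appear with f = 0.
    """
    counts = Counter(data)          # frequency of each observed value
    n = len(data)
    return [(v, counts[v], counts[v] / n)
            for v in range(min(data), max(data) + 1)]

pulse = [57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, 58]
for value, f, rel in ungrouped_table(pulse):
    print(f"{value:>4} {f:>3} {rel:.4f}")
```

Looking up a missing value in a `Counter` returns 0 without adding it, which is what lets the empty classes 59, 61, 62, and 63 appear in the table.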
Learning Goal 1: Quantitative Frequency Tables – Grouped
• Class: an interval of values. Example: 61 ≤ x ≤ 70.
• Frequency: the number of data values that fall within a class. "Five data values fall within the class 61 ≤ x ≤ 70."
• Relative Frequency: the proportion of data values that fall within a class. "18% of the data fall within the class 61 ≤ x ≤ 70."

Learning Goal 1: Grouped Frequency Tables – Example
A frequency table organizes quantitative data. It
• partitions the data into classes (intervals), and
• shows how many data values are in each class.

Test Score   Number of Students
61-70        4
71-80        8
81-90        15
91-100       7

Learning Goal 1: Grouped Frequency Table Terminology
• Class: one of the non-overlapping intervals the data are divided into.
• Class Limits: the smallest and largest values that can belong to a given class.
• Class Boundaries: values halfway between the upper class limit of one class and the lower class limit of the next larger class. They close the gap between classes.
• Class Width: the difference between the upper and lower class boundaries of a given class.
• Class Midpoint (or Mark): the midpoint of a class.

Learning Goal 1: Grouped Frequency Tables – Classes
• A grouped frequency table should have a minimum of 5 classes and a maximum of 20 classes.
• For small data sets, one can use between 5 and 10 classes.
• For large data sets, one can use up to 20 classes.

Learning Goal 1: Number of Classes
[Figures: the same data set displayed with different numbers of classes]
• Too many classes: not summarized enough.
• Too few classes: summarized too much.
• Correct number of classes: 5 to 10.

Learning Goal 1: Class Limits
Lower class limits are the smallest numbers that can belong to the different classes.
Upper class limits are the largest numbers that can belong to the different classes.
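For data recorded as whole numbers, the limits, boundaries, width, and midpoint defined above are related by simple arithmetic. The sketch below is a hypothetical helper (not from the slides) that assumes integer data, so consecutive classes are separated by a gap of 1 and each boundary sits 0.5 outside its limit.

```python
def class_structure(a, b):
    """Describe an integer-data class with lower limit a and upper limit b.

    Assumes whole-number data, so boundaries sit 0.5 outside the limits.
    """
    lower_boundary = a - 0.5
    upper_boundary = b + 0.5
    width = upper_boundary - lower_boundary   # same as b - a + 1
    midpoint = (a + b) / 2                    # (lower limit + upper limit) / 2
    return lower_boundary, upper_boundary, width, midpoint

print(class_structure(60, 69))   # (59.5, 69.5, 10.0, 64.5)
print(class_structure(0, 99))    # (-0.5, 99.5, 100.0, 49.5)
```

The first call uses the class a = 60, b = 69; the second uses the class 0 to 99.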
Learning Goal 1: Class Boundaries
Class boundaries are the numbers used to separate classes, but without the gaps created by class limits. Class boundaries split the gap between the class limits of two consecutive classes in half: half of the gap is given to the upper class and half to the lower class. This brings the bars of the two consecutive classes together, with no gap.

Learning Goal 1: Structure of a Data Class
A "class" is basically an interval on a number line. It has:
• A lower limit a and an upper limit b.
• A lower boundary and an upper boundary (for integer data, a - 0.5 and b + 0.5).
• A width: for integer data, $(b + 0.5) - (a - 0.5) = b - a + 1$.
• A midpoint.

Learning Goal 1: Structure of a Data Class - Problem
If a = 60 and b = 69 for integer data, what is the value of the lower boundary?
a) 60   b) 59.5   c) 9   d) 64.5
Answer: b) 59.5, since the lower boundary is a - 0.5 = 60 - 0.5 = 59.5.

Learning Goal 1: Class Boundaries
Class boundaries are the numbers separating classes. For the classes 0-99, 100-199, 200-299, 300-399, and 400-499, the class boundaries are -0.5, 99.5, 199.5, 299.5, 399.5, and 499.5.

Learning Goal 1: Class Midpoints or Class Mark
The class midpoint (or class mark) is the midpoint of each class. It can be found by adding the lower class limit to the upper class limit and dividing the sum by two. For the classes above, the class midpoints are 49.5, 149.5, 249.5, 349.5, and 449.5.

Learning Goal 1: Class Width
Class width is the difference between two consecutive lower class limits or two consecutive lower class boundaries. For the classes above, each class width is 100.

Learning Goal 1: Constructing A Frequency Table
1. Decide on the number of classes (should be between 5 and 20).
2. Calculate the class width, rounding up:
   $$\text{class width} = \frac{(\text{highest value}) - (\text{lowest value})}{\text{number of classes}}$$
3. Starting point: choose a lower limit for the first class.
4. Using the lower limit of the first class and the class width, list the lower class limits.
5. List the lower class limits in a vertical column, then enter the upper class limits.
6. Go through the data set, putting a tally in the appropriate class for each data value.

Learning Goal 1: Constructing A Frequency Table - Example
A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Learning Goal 1: Constructing A Frequency Table - Example
• Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find the range: 58 - 12 = 46.
• Select the number of classes: 5 (usually between 5 and 10).
• Compute the class width: 46/5 = 9.2, rounded up to 10.
• Determine the lower class limits: 10, 20, 30, 40, 50. List them in a vertical column.
• Compute the upper class limits, 19, 29, 39, 49, 59, and then the class midpoints, 14.5, 24.5, 34.5, 44.5, 54.5.
• Count the observations and assign them to classes (continued).

Learning Goal 1: Constructing A Frequency Table - Example (continued)
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class     Frequency   Relative Frequency   Percentage
10 - 19   3           .15                  15%
20 - 29   6           .30                  30%
30 - 39   5           .25                  25%
40 - 49   4           .20                  20%
50 - 59   2           .10                  10%
Total     20          1.00                 100%

Learning Goal 1: Tip for Constructing A Frequency Table
Use tally marks to count the data in each class. Record the frequencies (and relative frequencies if desired) in the table.

Learning Goal 1: Histogram
Then, to make the histogram, graph the frequency table data.

Learning Goal 1: Making a Histogram
• Make a frequency table.
• Choose an appropriate scale for the vertical axis (frequency or relative frequency) and the horizontal axis (based on the classes).
• Label both axes.
• Place the class boundaries on the horizontal axis.
• Place the frequencies on the vertical axis.
• For each class, draw a bar with height equal to the class frequency and width equal to the class width.
• Title the graph.

Learning Goal 1: Making a Histogram
Class     Class Midpoint   Frequency
10 - 19   15               3
20 - 29   25               6
30 - 39   35               5
40 - 49   45               4
50 - 59   55               2

[Figure: Histogram: Daily High Temperature. Horizontal axis: Temperatures (degrees); vertical axis: Frequency. No gaps between bars.]

Learning Goal 1: Frequency Table From a Histogram
• There are several procedures that one can use to construct a grouped frequency table.
• However, because of the many statistical software packages (MINITAB, SPSS, etc.) and graphing calculators (TI-84, etc.) available today, it is not necessary to construct such distributions with pencil and paper.

Learning Goal 1: Frequency Table From a Histogram
• The weights of 30 female students majoring in Physical Education on a college campus are as follows: 143, 113, 107, 151, 90, 139, 136, 126, 122, 127, 123, 137, 132, 121, 112, 132, 133, 121, 126, 104, 140, 138, 99, 134, 119, 112, 133, 104, 129, and 123.
• Summarize the data with a frequency distribution using seven classes.

Learning Goal 1: Frequency Table From a Histogram
• The MINITAB statistical software was used to generate the histogram (similar to the histogram on our TI-84) on the next slide.
• The histogram has seven classes.
• Classes for the weights are along the x-axis and frequencies are along the y-axis.
• The number at the top of each rectangular bar represents the frequency for the class.

Learning Goal 1: Frequency Table From a Histogram
[Figure: Histogram with 7 classes for the weights.]

Learning Goal 1: Frequency Table From a Histogram
• Observations
• From the histogram, the classes (intervals) are 85 – 95, 95 – 105, 105 – 115, etc., with corresponding frequencies of 1, 3, 4, etc.
• We will use this information to construct the grouped frequency distribution.
Learning Goal 1: Frequency Table From a Histogram
• Observations (continued)
• Observe that the upper class limit of 95 for the class 85 – 95 is listed as the lower class limit for the class 95 – 105.
• Since the value 95 cannot be included in both classes, we use the convention that the upper class limit is not included in the class.

Learning Goal 1: Frequency Table From a Histogram
• Observations (continued)
• That is, the class 85 – 95 should be interpreted as containing the values from 85 up to, but not including, 95.
• Using these observations, the grouped frequency distribution is constructed from the histogram and given on the next slide.

Learning Goal 1: Frequency Table From a Histogram
Class (weight)   Frequency
85 – 95          1
95 – 105         3
105 – 115        4
115 – 125        6
125 – 135        9
135 – 145        6
145 – 155        1
Total            n = 30

Learning Goal 1: Using the TI-84 to Make Histograms
Start by entering data into a list (STAT / Edit / L1).
Example: Enter the presidential data on the next slide into list L1.

Learning Goal 1: Using the TI-84 to Make Histograms
Choose 2nd: Stat Plot to choose a histogram plot.
Caution: Watch out for other plots that might be "turned on" or equations that might be graphed.

Learning Goal 1: Using the TI-84 to Make Histograms
Turn the plot "on" and choose the histogram plot. Xlist should point to the location of the data.

Learning Goal 1: Using the TI-84 to Make Histograms
Under the "Zoom" menu, choose option 9: ZoomStat.

Learning Goal 1: Using the TI-84 to Make Histograms
The result is a histogram in which the calculator has decided the width and location of the ranges. You can use the Trace key to get information about the ranges and the frequencies.

Learning Goal 1: Using the TI-84 to Make Histograms
You can change the size and location of the ranges by using the Window button. Use Xscl to change the class width on the graph.
Press the Graph button to see the results.

Learning Goal 1: Using the TI-84 to Make Histograms
Voila! Of course, you can still change the ranges if you don't like the results. And you can construct a frequency table from the histogram.

Learning Goal 1: Using the TI-84 to Make Histograms – Your Turn
Using the data given on sodium in cereals, construct a histogram on your TI-84, and then use your histogram to construct a frequency/relative frequency table. Use 8 classes, with a lower class limit of 0.
Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180

Learning Goal 1: Using the TI-84 to Make Histograms – Solution
STAT, EDIT (enter the data); STAT PLOT; ZOOM, #9: ZoomStat.
[Figures: calculator screens showing the sodium histogram]

Learning Goal 1: TI-84 to Make Histogram Using Freq. Table Data
Class Limits     Frequency
350 to < 450     11
450 to < 550     10
550 to < 650     2
650 to < 750     2
750 to < 850     2
850 to < 950     1

The procedure is the same as for raw data, using the class midpoint to represent each class:
1. Enter the data into 2 lists: L1 holds the classes (class midpoints) and L2 holds the frequencies.
2. Turn on Stat Plot1 and select the histogram. Xlist is L1 (the classes) and Freq is L2 (the frequencies).
3. Select ZoomStat to graph the histogram.
4. Adjust the WINDOW to improve the picture and/or make the values better.
5. Use the Trace key to determine values on the graph.

Learning Goal 1: Freq. Histogram vs Relative Freq. Histogram
• Frequency histogram: a bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies.
• Relative frequency histogram: has the same shape and horizontal scale as a frequency histogram, but the vertical scale is marked with relative frequencies.
They look the same except for the vertical axis scale.
[Figure: Freq. Histogram vs Relative Freq. Histogram - Example]

Learning Goal 1: Histograms - Facts
• Histograms are useful when the data values are quantitative.
• A histogram gives an estimate of the shape of the distribution of the population from which the sample was taken.
• If relative frequencies are plotted along the vertical axis, the shape of the histogram is the same as when frequencies are used.

Learning Goal 1: Anatomy of a Histogram
[Figure: annotated histogram] Callouts:
• Title.
• There are no spaces between bars (continuous data).
• Number of observations.
• The height of each bar represents the frequency in each class; the number of occurrences (frequencies) is shown on the vertical axis.
• Empty class: no data were recorded between 75 and 80.
• Each bar represents a class. The number of classes is usually between 5 and 20; here, there are 17 classes.
• The width of each class is determined by dividing the range of the data set by the number of classes and rounding up. In this data set, the range is 82, and 82/17 ≈ 4.8, rounded up to 5. For example, one class goes from 5 to 10.
• Label both the horizontal and vertical axes.
• The numbers shown on the horizontal axis are the boundaries of each class. NOTE: Sometimes the numbers shown on the horizontal axis are the midpoints of each class. (A class midpoint is also referred to as the mark of the class.)
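The grouped-table procedure in this goal (round the class width up, step through consecutive classes, count the values in each) can be sketched in Python as below. This is an illustration rather than the slides' method: the helper names are hypothetical, classes are treated as half-open intervals [lower, lower + width) following the convention adopted for the weights example, and the sodium class width of 40 is an assumed convenient choice, since the slides only specify 8 classes starting at 0.

```python
import math

def class_width(data, num_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((max(data) - min(data)) / num_classes)

def grouped_table(data, start, width, num_classes):
    """Rows of (lower, upper, frequency, relative frequency).

    Each class is the half-open interval [lower, upper): a value equal to
    the upper end belongs to the next class. For whole-number data,
    [10, 20) is the same class as the limits 10 - 19.
    """
    n = len(data)
    rows = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        f = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, f, f / n))
    return rows

def print_table(rows):
    # A crude text histogram: one '#' per observation in the class.
    for lower, upper, f, rel in rows:
        print(f"{lower:>4} to <{upper:<4} {f:>3} {rel:5.2f} {'#' * f}")

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
weights = [143, 113, 107, 151, 90, 139, 136, 126, 122, 127,
           123, 137, 132, 121, 112, 132, 133, 121, 126, 104,
           140, 138, 99, 134, 119, 112, 133, 104, 129, 123]
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

print_table(grouped_table(temps, 10, class_width(temps, 5), 5))
print_table(grouped_table(weights, 85, 10, 7))
print_table(grouped_table(sodium, 0, 40, 8))   # width 40 is an assumption
```

The first two calls reproduce the frequencies 3, 6, 5, 4, 2 (temperatures) and 1, 3, 4, 6, 9, 6, 1 (weights) from the tables in this goal; with the assumed width of 40, the sodium counts come out as 1, 1, 0, 4, 3, 7, 2, 2.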
Quantitative Data: STEM AND LEAF PLOT

Learning Goal 1: Stem-and-Leaf Plots
• What is a stem-and-leaf plot? A stem-and-leaf plot is a data plot that uses part of each data value as the stem to form groups or classes and the rest of the data value as the leaf.
• Most often used for small or medium-sized data sets. For larger data sets, histograms do a better job.
• Note: A stem-and-leaf plot has an advantage over a grouped frequency table or histogram, since it retains the actual data values while showing them in graphic form.

Learning Goal 1: Stem-and-Leaf Plots
Stem-and-leaf plots are used for summarizing quantitative variables.
• Separate each observation into a stem (first part of the number) and a leaf (typically the last digit of the number).
• Write the stems in a vertical column ordered from smallest to largest, including empty stems; draw a vertical line to the right of the stems.
• Write each leaf in the row to the right of its stem, in order.

[Figure: Stem and Leaf Plot Construction]

Learning Goal 1: Stem-and-Leaf Plots
How to make a stemplot:
1) Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which is that remaining final digit. Stems may have as many digits as needed, but each leaf has only one digit: either round or truncate the data values so that only one digit remains after the stem.
2) Write the stems in a vertical column with the smallest value at the top, and draw a vertical line at the right of this column.
3) Write each leaf in the row to the right of its stem, in increasing order out from the stem.
4) Add a title and include a key that shows how to read the stemplot.

Original data: 9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70

STEM | LEAVES
0    | 9 9
1    |
2    | 2
3    | 2 3 9 9
4    | 2 9
5    | 2 8
6    |
7    | 0
Key: 0|9 = 9

Learning Goal 1: Stem-and-Leaf Plots – Picking Stems
Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Here, use the 10's digit for the stem unit:
21 is shown as   2 | 1
38 is shown as   3 | 8
41 is shown as   4 | 1

Learning Goal 1: Stem-and-Leaf Plots – Picking Stems (continued)
Completed stem-and-leaf diagram:
Stem | Leaves
2    | 1 4 4 6 7 7
3    | 0 2 8
4    | 1
Key: 3|0 = 30

Learning Goal 1: Stem-and-Leaf Plots - Using Other Stem Units
Using the 100's digit as the stem, round to the nearest ten and use the 10's digit as the leaf:
613 would become 610:    6 | 1
776 would become 780:    7 | 8
...
1224 becomes 1220:      12 | 2

Learning Goal 1: Stem-and-Leaf Plots - Using Other Stem Units (continued)
Data: 613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891, 894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224
The completed stem-and-leaf display:
Stem | Leaves
6    | 1 3 6
7    | 2 2 5 8
8    | 3 4 6 6 9 9
9    | 1 3 3 6 8
10   | 3 5 6
11   | 4 7
12   | 2
Key: 6|3 = 630

Learning Goal 1: Stem-and-Leaf Plots - Example
Construct a stem-and-leaf diagram, which simultaneously groups the data and provides a graphical display similar to a histogram.
• Put the data in a list on the TI-84 (STAT/EDIT/L1). Order the data using the sort ascending function (STAT/EDIT/2:SortA(…)) applied to L1.
• Return to the list (STAT/EDIT) to view the ordered data.
• First, list the leading digits of the numbers in the table (3, 4, . . . , 9) in a column, as shown to the left of the vertical rule. Next, write the final digit of each number from the table to the right of the vertical rule in the row containing the appropriate leading digit. Do not forget the title and key.
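The stemplot rules above can be sketched in Python as follows. The function `stemplot` is a hypothetical helper, not part of the slides; it assumes non-negative data, keeps empty stems, and lets you choose a leaf unit (1 for the first example, 10 for the rounded hundreds example) and whether to round (half up) or truncate.

```python
import math

def stemplot(data, leaf_unit=1, round_leaves=False):
    """Return (stem, leaves) pairs for non-negative data, including empty stems.

    Each value is expressed as a whole number of leaf units (rounded half up
    or truncated); its last digit is the leaf and the remaining digits are
    the stem.
    """
    if round_leaves:
        scaled = sorted(math.floor(x / leaf_unit + 0.5) for x in data)
    else:
        scaled = sorted(math.floor(x / leaf_unit) for x in data)
    rows = []
    for stem in range(scaled[0] // 10, scaled[-1] // 10 + 1):
        # Values are sorted, so the leaves come out in increasing order.
        leaves = "".join(str(v % 10) for v in scaled if v // 10 == stem)
        rows.append((stem, leaves))
    return rows

def show(rows, key):
    for stem, leaves in rows:
        print(f"{stem:>3} | {' '.join(leaves)}")
    print(f"Key: {key}")

show(stemplot([9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]), "0|9 = 9")
data_hundreds = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
                 894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]
show(stemplot(data_hundreds, leaf_unit=10, round_leaves=True), "6|3 = 630")
```

Rounding half up is written out explicitly because Python's built-in `round` rounds halves to the nearest even number, which would not always match hand rounding.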
Learning Goal 1: Stem-and-Leaf Plots - Variation
Splitting stems (too few stems or classes): split stems to double the number of stems when all the leaves would otherwise fall on just a few stems. Each stem appears twice: leaves 0-4 go on the first line and leaves 5-9 go on the second.

Learning Goal 1: Stem-and-Leaf Plots – Split Stems Example
A pediatrician tested the cholesterol levels of several young patients and was alarmed to find that many had levels higher than 200 mg per 100 mL. The table below presents the readings of 20 patients with high levels. Construct a stem-and-leaf diagram for these data by using
a. one line per stem.
b. split stems: two lines per stem.

Learning Goal 1: Stem-and-Leaf Plots – Split Stems Example
The stem-and-leaf diagram in (a) is only moderately helpful because there are so few stems. Diagram (b) is a better stem-and-leaf diagram for these data. It uses split stems, two lines for each stem, with the first line for the leaf digits 0-4 and the second line for the leaf digits 5-9.
[Figures: (a) and (b), both titled "Cholesterol Levels", Key: 19|9 = 199]

Learning Goal 1: Stem-and-Leaf Plots - Your Turn
• A sample of the number of admissions to a psychiatric ward at a local hospital during the full phases of the moon is as follows: 22, 30, 21, 27, 31, 36, 20, 28, 25, 33, 21, 38, 32, 35, 26, 19, 43, 30, 30, 34, 27, and 41.
• Display the data in an appropriate stem-and-leaf plot.

Learning Goal 1: Stem-and-Leaf Plots – Correct Solution
Admissions to Psychiatric Ward
1 |
1 | 9
2 | 0 1 1 2
2 | 5 6 7 7 8
3 | 0 0 0 1 2 3 4
3 | 5 6 8
4 | 1 3
4 |
Key: 3|5 = 35

Learning Goal 1: Stem-and-Leaf Plots – Incorrect Solution
Admissions to Psychiatric Ward
1 | 9
2 | 0 1 1 2 5 6 7 7 8
3 | 0 0 0 1 2 3 4 5 6 8
4 | 1 3
Key: 1|9 = 19

Learning Goal 1: Stemplots versus Histograms
Stemplots are quick-and-dirty histograms that can easily be done by hand, which makes them very convenient for back-of-the-envelope calculations. However, they are rarely found in scientific or popular publications.
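The split-stem variation can be sketched the same way. In this hypothetical helper (not from the slides; whole-number data assumed), each stem gets two lines, leaves 0-4 first and then 5-9, and empty half-stems at either end are kept, as in the correct solution for the admissions data.

```python
def split_stemplot(data):
    """Return (stem, leaves) rows with two lines per stem (whole-number data).

    The first line of each stem holds leaves 0-4, the second holds leaves 5-9.
    """
    values = sorted(data)
    rows = []
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        for low, high in ((0, 4), (5, 9)):
            leaves = "".join(str(v % 10) for v in values
                             if v // 10 == stem and low <= v % 10 <= high)
            rows.append((stem, leaves))
    return rows

admissions = [22, 30, 21, 27, 31, 36, 20, 28, 25, 33, 21,
              38, 32, 35, 26, 19, 43, 30, 30, 34, 27, 41]
for stem, leaves in split_stemplot(admissions):
    print(f"{stem:>2} | {' '.join(leaves)}")
print("Key: 3|5 = 35")
```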
Learning Goal 1: Stemplots versus Histograms
Stem-and-leaf displays show the distribution of a quantitative variable, as histograms do, while preserving the individual values. Stem-and-leaf displays contain all the information found in a histogram and, when carefully drawn, satisfy the area principle and show the distribution.

Quantitative Data: DOTPLOTS

Learning Goal 1: Dotplots
• What is a dot plot? A dot plot displays a dot for each value in a data set along a number line. If a value occurs more than once, the dots are stacked vertically.

Learning Goal 1: Dotplots
A dotplot is a simple display. It just places a dot along an axis for each case in the data. The dotplot to the right shows Kentucky Derby winning times, plotting each race as its own dot. You may see a dotplot displayed horizontally or vertically.

Learning Goal 1: Dotplots
To construct a dot plot:
1. Draw a horizontal line.
2. Label it with the name of the variable.
3. Mark regular values of the variable (scale) on it.
4. For each observation, place a dot above its value on the number line.
[Figure: Dotplot, Sodium in Cereals]

Learning Goal 1: Dotplots - Example
The following data show the lengths of 50 movies in minutes. Construct a dot plot for the data.
64, 64, 69, 70, 71, 71, 71, 72, 73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 76, 76, 76, 77, 77, 78, 78, 79, 79, 80, 80, 81, 81, 81, 82, 82, 82, 83, 83, 83, 84, 86, 88, 89, 89, 90, 90, 92, 94, 120.
[Figure 2-5: Length of 50 Movies]

Learning Goal 1: Dotplots – Frequency Table Data
The following frequency distribution shows the number of defectives observed by a quality control officer over a 30-day period. Construct a dot plot for the data.

Learning Goal 1: Dotplots – Solution

Learning Goal 1: Dotplots – Your Turn
One of Professor Weiss's sons wanted to add a new DVD player to his home theater system. He used the Internet to shop and went to pricewatch.com.
There he found 16 quotes on different brands and styles of DVD players. Construct a dotplot for these data.

Learning Goal 1: Dotplots – Solution
To construct a dotplot for the data, we begin by drawing a horizontal axis that displays the possible prices. Then we record each price by placing a dot over the appropriate value on the horizontal axis. For instance, the first price is $210, which calls for a dot over the "210" on the horizontal axis.

Learning Goal 1: Think Before You Draw
Remember the "Make a picture" rule? Now that we have options for data displays, you need to think carefully about which type of display to make. Before making a stem-and-leaf display, a histogram, or a dotplot, check the Quantitative Data Condition: the data are values of a quantitative variable whose units are known.

Learning Goal 2
Know how to display the relative position of a quantitative variable with a cumulative frequency curve, and analyze the cumulative frequency curve.

Quantitative Data: OGIVE - CUMULATIVE FREQUENCY CURVE

Learning Goal 2: Cumulative Frequency and the Ogive
A histogram displays the distribution of a quantitative variable, but it tells little about the relative standing (percentile, quartile, etc.) of an individual observation. For this information, we use a cumulative frequency graph, called an ogive (pronounced "O-jive").

Learning Goal 2: Measures of Relative Standing
How many measurements lie below the measurement of interest? This is measured by the pth percentile.
[Figure: p% of the measurements lie below the pth percentile x, and (100 - p)% lie above it.]

Learning Goal 2: Percentile
The pth percentile is a value such that p percent of the observations fall at or below that value.

Learning Goal 2: Special Percentiles – Deciles and Quartiles
• Deciles and quartiles are special percentiles.
• Deciles divide an ordered data set into 10 equal parts.
• Quartiles divide an ordered data set into 4 equal parts.
• We usually denote the deciles by D1, D2, D3, … , D9.
• We usually denote the quartiles by Q1, Q2, and Q3.
Learning Goal 2: Special Percentiles – Deciles and Quartiles
• There are 9 deciles and 3 quartiles.
• Q1 = first quartile = P25
• Q2 = second quartile = P50
• Q3 = third quartile = P75
• D1 = first decile = P10
• D2 = second decile = P20
• . . .
• D9 = ninth decile = P90

Learning Goal 2: Percentile - Examples
90% of all men (16 and older) earn more than $319 per week (Bureau of Labor Statistics). So 10% earn $319 or less, and $319 is the 10th percentile.
• 50th percentile = median
• 25th percentile = lower quartile (Q1)
• 75th percentile = upper quartile (Q3)

Learning Goal 2: Calculating Percentile
• The percentile corresponding to a given data value x in a data set is obtained by using the following formula:
$$\text{Percentile of } x = \frac{\text{number of values at or below } x}{\text{number of values in data set}} \times 100\%$$

Learning Goal 2: Calculating Percentile - Example
• Example: The shoe sizes, in whole numbers, for a sample of 12 male students in a statistics class were as follows: 13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, and 9.
• What is the percentile rank for a shoe size of 12?

Learning Goal 2: Calculating Percentile - Solution
• Solution: First, arrange the values from smallest to largest: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
• Observe that the number of values at or below 12 is 10.

Learning Goal 2: Calculating Percentile - Solution
• Solution (continued): The total number of values in the data set is 12.
• Thus, using the formula, the corresponding percentile is
$$\frac{10}{12} \times 100\% \approx 83.3\%$$
• The value of 12 corresponds to approximately the 83rd percentile.

Learning Goal 2: Calculating Percentile - Example
• Example: The data below represent the total numbers of Olympic medals won by the 19 countries with the most medals at the 1996 Atlanta games, excluding the United States, which had 101 medals. Find the 65th percentile for the data set.
• 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15.
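The two percentile calculations in this goal can be sketched in Python: `percentile_rank` applies the formula above, and `percentile_value` applies the position rule c = n·k/100 (round up when c is not a whole number) used in the Olympic-medal solution that follows. Both names are hypothetical. The slides show only the case where c is not whole; averaging the cth and (c + 1)th values when c is whole is an assumed textbook convention, not something stated here.

```python
def percentile_rank(data, x):
    """Percent of the values that are at or below x."""
    at_or_below = sum(1 for v in data if v <= x)
    return at_or_below / len(data) * 100

def percentile_value(data, k):
    """Value at the kth percentile (0 < k < 100) using the position c = n*k/100."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    whole, remainder = divmod(n * k, 100)   # integer arithmetic avoids float error
    if remainder:
        return ordered[whole]                # c rounded up: the (whole + 1)th value
    # Assumed convention when c is a whole number: average the cth and (c+1)th values.
    return (ordered[whole - 1] + ordered[whole]) / 2

shoes = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15]
print(round(percentile_rank(shoes, 12), 1))   # 83.3
print(percentile_value(medals, 65))           # 27
```

For the medals, n·k = 19 × 65 = 1235, so c = 12.35 rounds up to 13, and the 13th ordered value (index 12 in Python's 0-based list) is 27.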
Learning Goal 2: Calculating Percentile - Solution
• Solution: First, arrange the data set in order: 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63, 65.
• Next, compute the position c of the percentile. Here n = 19 and k = 65, so
$$c = \frac{n \cdot k}{100} = \frac{19 \times 65}{100} = 12.35$$
• Since c is not a whole number, round up to 13.

Learning Goal 2: Calculating Percentile - Solution
• Solution (continued): Thus, the 13th value in the ordered data set corresponds to the 65th percentile.
• That is, P65 = 27.

Learning Goal 2: Cumulative Frequency
• What is a cumulative frequency for a class? The cumulative frequency for a specific class in a frequency table is the sum of the frequencies for all values at or below the given class.

Learning Goal 2: Cumulative Frequency Tables
Cumulative frequencies for a class are the sums of all the frequencies up to and including that class.

Learning Goal 2: Cumulative Frequency Tables - Example
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class     Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 - 20   3           15           3                      15
20 - 30   6           30           9                      45
30 - 40   5           25           14                     70
40 - 50   4           20           18                     90
50 - 60   2           10           20                     100
Total     20          100

Learning Goal 2: Cumulative Frequency Curve - Ogive
A line graph that depicts cumulative frequencies. Used to find quartiles and percentiles.
[Figure: Ogive: Daily High Temperature. Horizontal axis: temperature, 10 to 60; vertical axis: Cumulative Percentage, 0 to 100.]

Learning Goal 2: Constructing an Ogive
1. Make a frequency table and add a cumulative frequency column.
2. To fill in the cumulative frequency column, add the counts in the frequency column that fall in or below the current class interval.
3. Label and scale the axes and title the graph: the horizontal axis shows the classes, and the vertical axis shows the cumulative frequency or relative cumulative frequency.
4. Begin the ogive at zero on the vertical axis and at the lower boundary of the first class on the horizontal axis.
Then graph each additional Upper class boundary vs. cumulative frequency for that class. Learning Goal 2: Ogive - Example Learning Goal 2: Cumulative Frequency Curve – Example The frequencies of the scores of 80 students in a test are given in the following table. Complete the corresponding cumulative frequency table. A suitable table is as follows: Learning Goal 2: Cumulative Frequency Curve – Example The information provided by a cumulative frequency table can be displayed in graphical form by plotting the cumulative frequencies given in the table against the upper class boundaries, and joining these points with a smooth. Construct the Cumulative Frequency Curve. The cumulative frequency curve corresponding to the data is as follows: Learning Goal 2: Cumulative Frequency Curve – Class Problem The results obtained by 200 students in a mathematics test are given in the following table. Draw a cumulative frequency curve and use it to estimate. a) The median mark. b) The number of students who scored less than 22 marks. c) The pass mark if 120 students passed the test. d) The min. mark required to obtain an A grade if 10% of the students received an A grade. Learning Goal 2: Cumulative Frequency Curve – Solution a) b) c) d) The required cumulative frequency curve is as follows: The median mark: median mark is 26 The number of students who scored less than 22 marks: approximately 69 students scored less than 22 marks The pass mark if 120 students passed the test: pass mark is 28 The min. mark required to obtain an A grade if 10% of the students received an A grade: min. mark required for an A is 38 Learning Goal 3 Be able to describe the distribution of a quantitative variable in terms of its shape. Learning Goal 3: What is the Shape of the Distribution? 1. Does the histogram have a single central peak or several separated peaks? 2. Is the histogram symmetric? 3. Do any unusual features stick out? 
In any graph, look for the overall pattern and any striking deviations from that pattern.
Learning Goal 3: Shape, Center, and Spread
When describing a distribution, always discuss three things: shape, center, and spread. Strictly speaking, you should comment on four things: the three above plus any deviations from the overall shape. These deviations are called outliers and are discussed later.
Learning Goal 3: Shape - Peaks
Does the histogram have a single central peak or several separated peaks? Peaks in a histogram are also called modes. A histogram with one main peak is called unimodal, a histogram with two peaks is bimodal, and a histogram with three or more peaks is multimodal.
Learning Goal 3: Shape: Unimodal - Example
Unimodal – single peak.
Learning Goal 3: Shape: Bimodal - Example
Bimodal – two peaks.
Learning Goal 3: Shape: Multimodal - Example
Multimodal – three or more peaks.
Learning Goal 3: Shape: Bimodal or Multimodal
A bimodal or multimodal distribution might indicate that the data come from two or more different populations.
[Figure: Height of plants by color (red, pink, blue); number of plants against height in centimeters.]
Learning Goal 3: Shape: Uniform
A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform or rectangular: every class has about the same frequency, so there is no mode. A uniform distribution is symmetric, with the added property that the bars are the same height.
Learning Goal 3: Shape: Uniform - Example
Uniform – no mode, symmetric.
Learning Goal 3: Shape: Modal Comparison
Learning Goal 3: Shape: Symmetrical
• In a symmetric distribution, the data values are evenly distributed on both sides of the mean.
• If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.
Learning Goal 3: Shape: Symmetrical - Example
Symmetrical – the distribution’s shape is roughly the same on both sides when it is folded down the middle.
Learning Goal 3: Shape: Skewed
The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail. In the figure below, the histogram on the left is skewed left, while the histogram on the right is skewed right.
Learning Goal 3: Shape: Skewed Right - Example
In a right-skewed distribution, most of the data values fall to the left, and the “tail” of the distribution is to the right.
Learning Goal 3: Shape: Skewed Left - Example
In a left-skewed distribution, most of the data values fall to the right, and the “tail” of the distribution is to the left.
Learning Goal 3: Shape: Skewed - Comparison
A distribution is skewed to the left if the left tail is longer than the right tail.
A distribution is skewed to the right if the right tail is longer than the left tail.
Learning Goal 3: Shape: Other Common Terms
Hump – a high bar or group of bars
Valley – the low point between two peaks
Gap – an interval with no data
Learning Goal 3: Shapes
Learning Goal 4
Be able to describe any anomalies or extraordinary features revealed by the display of a variable.
Learning Goal 4: Overall Pattern - Anything Unusual?
Do any unusual features stick out? Sometimes it’s the unusual features that tell us something interesting or exciting about the data. You should always mention any stragglers, or outliers, that stand off away from the body of the distribution. Are there any gaps in the distribution? If so, we might have data from more than one group.
Learning Goal 4: Deviations from the Overall Pattern
Outlier – an individual observation that falls outside the overall pattern of the distribution; an extreme value, either high or low.
Causes of outliers:
1. A data mistake
2. The special nature of some observations
Learning Goal 4: Outliers
An outlier falls far from the rest of the data.
Learning Goal 4: Outliers
Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
[Figure: histogram of the states with Alaska and Florida labeled.]
The overall pattern is fairly symmetric except for two states that clearly do not belong to the main trend. Alaska and Florida have an unusual representation of the elderly in their populations. A large gap in the distribution is typically a sign of an outlier.
Learning Goal 5
Be able to determine the shape of the distribution of a variable by knowing something about the data.
Learning Goal 5: Determine the Shape of a Distribution - Example
It’s often a good idea to think about what the distribution of a data set might look like before we collect the data. What do you think the distribution of each of the following data sets will look like?
1. Number of miles run by Saturday morning joggers at a park: roughly symmetric, slightly skewed right.
2. Hours spent by U.S. adults watching football on Thanksgiving Day: bimodal. Many people watch no football; others watch most of one or more games.
3. Amount of winnings of all people playing a particular state’s lottery last week: strongly skewed to the right, with almost everyone at $0, a few small prizes, and the winner an outlier.
Learning Goal 5: Determine the Shape of a Distribution – Your Turn
Consider a data set containing IQ scores for the general public. What shape would you expect a histogram of this data set to have?
a. Symmetric
b. Skewed to the left
c. Skewed to the right
d. Bimodal
Answer: a. Symmetric
Learning Goal 5: Determine the Shape of a Distribution – Your Turn
Consider a data set of the scores of students on a very easy exam in which most score very well but a few score very poorly. What shape would you expect a histogram of this data set to have?
a. Symmetric
b. Skewed to the left
c. Skewed to the right
d. Bimodal
Answer: b. Skewed to the left
Learning Goal 6
Know the basic properties and how to compute the mean and median of a set of data.
Learning Goal 6: Measures of Central Tendency
A measure of central tendency for a collection of data values is a number that is meant to convey the idea of centralness or center of the data set. The most commonly used measures of central tendency for sample data are the mean, median, and mode.
Learning Goal 6: Measures of Central Tendency Overview
• Mean: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
• Median: midpoint of ranked values
• Mode: most frequently observed value
Learning Goal 6: The Mean
• Mean: The mean of a set of numerical (data) values is the (arithmetic) average for the set of values.
• When computing the value of the mean, the data values can be population values or sample values.
• Hence we can compute either the population mean or the sample mean.
Learning Goal 6: Mean Notation
• NOTATION: The population mean is denoted by the Greek letter µ (read as “mu”).
• NOTATION: The sample mean is denoted by \( \bar{x} \) (read as “x-bar”).
• Normally the population mean is unknown.
Learning Goal 6: The Mean
The mean is the most common measure of central tendency. It is often the preferred measure of center, because it uses all the data in calculating the center.
For a sample of size n:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
where n is the sample size and \( x_1, x_2, \ldots, x_n \) are the observed values.
Learning Goal 6: The Mean - Example
• What is the mean of the following 11 sample values? 3, 8, 6, 14, 0, 0, 12, -7, 0, -10, -4
Learning Goal 6: The Mean - Example (Continued)
• Solution: \( \bar{x} = \frac{3 + 8 + 6 + 14 + 0 + (-4) + 0 + 12 + (-7) + 0 + (-10)}{11} = \frac{22}{11} = 2 \)
Learning Goal 6: Mean – Frequency Table
• When a data set has a large number of values, we summarize it as a frequency table.
• The frequencies represent the number of times each value occurs.
• A mean calculated from a frequency table is an approximation: the raw data are not known, so each value is replaced by its class midpoint.
Learning Goal 6: Mean – Frequency Table Example
What is the mean of the following 11 sample values (the same data as before)?
Class          Frequency
-10 to < -4    2
-4 to < 2      4
2 to < 8       2
8 to < 14      2
14 to < 20     1
Learning Goal 6: Mean – Frequency Table Example
Solution:
Class          Midpoint   Frequency
-10 to < -4    -7         2
-4 to < 2      -1         4
2 to < 8       5          2
8 to < 14      11         2
14 to < 20     17         1
\[ \bar{x} \approx \frac{2(-7) + 4(-1) + 2(5) + 2(11) + 1(17)}{11} = \frac{31}{11} \approx 2.82 \]
Learning Goal 6: Calculate Mean on TI-84
Raw data:
1. Enter the raw data into a list: STAT/Edit.
2. Calculate the mean: STAT/CALC/1-Var Stats, List: L1, FreqList: (leave blank), Calculate.
Learning Goal 6: Calculate Mean on TI-84
Frequency table data (same data):
Class      Class Mark   Freq
0–50       25           1
50–100     75           1
100–150    125          3
150–200    175          4
200–250    225          7
250–300    275          4
1. Enter the frequency table data into two lists (L1 – class midpoint, L2 – frequency): STAT/Edit.
2. Calculate the mean: STAT/CALC/1-Var Stats, List: L1, FreqList: L2, Calculate.
Learning Goal 6: Calculate Mean on TI-84 – Your Turn
Raw data: 548, 405, 375, 400, 475, 450, 412, 375, 364, 492, 482, 384, 490, 492, 490, 435, 390, 500, 400, 491, 945, 435, 848, 792, 700, 572, 739, 572
Solution: 516.2
Learning Goal 6: Calculate Mean on TI-84 – Your Turn
Frequency table data (same data):
Class Limits    Frequency
350 to < 450    11
450 to < 550    10
550 to < 650    2
650 to < 750    2
750 to < 850    2
850 to < 950    1
Solution: 517.9
Learning Goal 6: Median
The median is the midpoint of the observations when they are ordered from smallest to largest (or from largest to smallest). If the number of observations is:
• odd, the median is the middle observation;
• even, the median is the average of the two middle observations.
Center of a Distribution – Median
The median is the value with exactly half the data values below it and half above it. It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas. It has the same units as the data.
Learning Goal 6: Finding the Median
The location of the median:
\[ \text{Median position} = \frac{n + 1}{2} \quad \text{(position in the ordered data)} \]
If the number of values is odd, the median is the middle number. If the number of values is even, the median is the average of the two middle numbers. Note that \( \frac{n+1}{2} \) is not the value of the median, only the position of the median in the ranked data.
Learning Goal 6: Finding the Median – Example (n odd)
• What is the median for the following sample values? 3, 8, 6, 2, 12, -7, 14, 0, -1, -10, -4
Learning Goal 6: Finding the Median – Example (n odd)
• First, arrange the data set in order (STAT/SortA).
• The ordered set is: -10, -7, -4, -1, 0, 2, 3, 6, 8, 12, 14
• Since the number of values is odd, the median is the 6th value in the ordered set (to find it, divide the number of values by 2 and round up: 11/2 = 5.5 ⇒ 6).
• Thus, the value of the median is 2.
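The mean and median calculations above can be checked with Python's standard `statistics` module. This is a minimal sketch: `grouped_mean` is a hypothetical helper that applies the frequency-table approximation (each value replaced by its class midpoint), and `median_by_position`, also hypothetical, spells out the (n + 1)/2 position rule instead of calling `statistics.median`.

```python
from statistics import mean, median

def grouped_mean(midpoints, frequencies):
    """Approximate mean from a frequency table: sum(f * m) / sum(f)."""
    total = sum(f * m for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)

def median_by_position(data):
    """Median via the position rule (n + 1)/2 in the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # position (n + 1)/2, counted from 1
    return (ordered[middle - 1] + ordered[middle]) / 2  # average of the two middle values

# Mean example: 11 sample values, raw and grouped
raw = [3, 8, 6, 14, 0, 0, 12, -7, 0, -10, -4]
print(mean(raw))                                                        # 2
print(round(grouped_mean([-7, -1, 5, 11, 17], [2, 4, 2, 2, 1]), 2))     # 2.82

# "Your Turn": the raw data, then the same data grouped into classes of width 100
your_turn = [548, 405, 375, 400, 475, 450, 412, 375, 364, 492, 482, 384, 490, 492,
             490, 435, 390, 500, 400, 491, 945, 435, 848, 792, 700, 572, 739, 572]
print(round(mean(your_turn), 1))                                        # 516.2
print(round(grouped_mean([400, 500, 600, 700, 800, 900],
                         [11, 10, 2, 2, 2, 1]), 1))                     # 517.9

# Median example (n odd): both approaches agree
odd = [3, 8, 6, 2, 12, -7, 14, 0, -1, -10, -4]
print(median_by_position(odd), median(odd))                             # 2 2
```

The gap between 516.2 and 517.9 is the price of grouping: the frequency-table mean only approximates the raw-data mean.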
Learning Goal 6: Finding the Median – Example (n even)
• Find the median age for the following eight college students: 23, 19, 32, 25, 26, 22, 24, 20
Learning Goal 6: Finding the Median – Example (n even)
• First, order the values: 19, 20, 22, 23, 24, 25, 26, 32. The middle two values are 23 and 24.
• Since there is an even number of ages, the median is the average of the two middle values (to find them, divide the number of values by 2; that position and the next are the two middle positions: 8/2 = 4 ⇒ the 4th and 5th values).
• Thus, median = (23 + 24)/2 = 23.5.
Learning Goal 6: The Median - Summary
The median is the midpoint of a distribution: the number such that half of the observations are smaller and half are larger.
Ordered data (n = 25): 0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1
1. Sort the observations from smallest to largest; n = number of observations.
2. If n is odd, the median is observation n/2 (rounded up) down the list: n = 25, 25/2 = 12.5 ⇒ 13th observation, so median = 3.4.
3. If n is even, the median is the mean of the two center observations: dropping 6.1 leaves n = 24, 24/2 = 12 ⇒ 12th and 13th observations, so median = (3.3 + 3.4)/2 = 3.35.
Learning Goal 6: Finding the Median on the TI-84
1. Enter data into L1.
2. STAT; CALC; 1:1-Var Stats.
Learning Goal 6: Find the Mean and Median – Your Turn
CO2 pollution levels in the 8 largest nations, measured in metric tons per person: 2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2
a. Mean = 4.6, Median = 1.5
b. Mean = 4.6, Median = 5.8
c. Mean = 1.5, Median = 4.6
Answer: a. Mean = 4.6, Median = 1.5
Learning Goal 6: Mode
• A measure of central tendency: the value that occurs most often.
• Used for either numerical or categorical data.
• There may be no mode or several modes.
• Rarely used as the measure of center for quantitative data.
[Figure: dot plot on a 0–14 scale with mode = 9; dot plot on a 0–6 scale with no mode.]
Learning Goal 6: Mode - Example
The mode is the measurement that occurs most frequently.
• The set 2, 4, 9, 8, 8, 5, 3: the mode is 8, which occurs twice.
• The set 2, 2, 9, 8, 8, 5, 3: there are two modes, 8 and 2 (bimodal).
• The set 2, 4, 9, 8, 5, 3: there is no mode (each value is unique).
Learning Goal 6: Summary Measures of Center
Learning Goal 7
Understand the properties of a skewed distribution.
Learning Goal 7: Where is the Center of the Distribution?
If you had to pick a single number to describe all the data, what would you pick? It’s easy to find the center when a histogram is unimodal and symmetric: it’s right in the middle. On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with outliers.
Learning Goal 7: Meaningful Measure of Center
Your measure of center must be meaningful. The distribution of women’s heights appears coherent and symmetric, so the mean is a good measure of center.
[Figure: Height of 25 women in a class; \( \bar{x} = 69.3 \).]
Is the mean always a good measure of center?
Learning Goal 7: Impact of Skewed Data
• Symmetric distribution (Disease X): \( \bar{x} = 3.4 \), M = 3.4. The mean and median are the same.
• Skewed distribution (Multiple myeloma): \( \bar{x} = 3.4 \), M = 2.5. The mean is pulled toward the skew.
Learning Goal 7: The Mean
Nonresistant – The mean is sensitive to the influence of extreme values and/or outliers.
Skewed distributions pull the mean away from the center toward the longer tail. The mean is located at the balancing point of the histogram. For a skewed distribution, the mean is not a good measure of center.
Learning Goal 7: Mean – Nonresistant Example
The mean is the most common measure of central tendency, but it is affected by extreme values (skewed distributions or outliers).
• Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
• Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Learning Goal 7: The Median
Resistant – The median is said to be resistant, because extreme values and/or outliers have little effect on it. In an ordered array, the median is the “middle” number (50% above, 50% below).
Learning Goal 7: Median – Resistant Example
The median is not affected by extreme values (skewed distributions or outliers).
[Figure: two data sets on a 0–10 scale, each with Median = 3.]
Learning Goal 7: Mean vs. Median with Outliers
Percent of people dying: without the outliers, \( \bar{x} = 3.4 \); with the outliers, \( \bar{x} = 4.2 \).
The mean (nonresistant) is pulled to the right a lot by the outliers (from 3.4 to 4.2). The median (resistant), on the other hand, is only slightly pulled to the right by the outliers (from 3.4 to 3.6).
Learning Goal 7: Effect of Skewed Distributions
• The figure below shows the relative positions of the mean and median for right-skewed, symmetric, and left-skewed distributions.
• Note that the mean is pulled in the direction of skewness, that is, in the direction of the extreme observations.
• For a right-skewed distribution, the mean is greater than the median; for a symmetric distribution, the mean and the median are equal; and for a left-skewed distribution, the mean is less than the median.
Learning Goal 7: Comparing the Mean and the Median
The mean and the median are the same (or nearly so) when the distribution is symmetric. The median is a measure of center that is resistant to skew and outliers. The mean is not.
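A quick way to see that the mean is nonresistant while the median is resistant is to recompute both after replacing 5 with 10, as in the nonresistant example. This sketch uses Python's `statistics` module; `modes` is a hypothetical helper that follows the Learning Goal 6 convention that a data set in which every value is unique has no mode. The house prices from the Learning Goal 7 example are included for comparison.

```python
from statistics import mean, median, multimode

def modes(data):
    """All most-frequent values, sorted; an empty list means 'no mode'."""
    if len(set(data)) == len(data):     # every value unique -> no mode
        return []
    return sorted(multimode(data))

without_extreme = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 10]
for data in (without_extreme, with_extreme):
    # The mean moves from 3 to 4; the median stays at 3.
    print(data, "mean =", mean(data), "median =", median(data))

# Mode examples from Learning Goal 6
print(modes([2, 4, 9, 8, 8, 5, 3]))   # [8]
print(modes([2, 2, 9, 8, 8, 5, 3]))   # [2, 8]
print(modes([2, 4, 9, 8, 5, 3]))      # []

# House prices: one extreme value pulls the mean far above the median
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
print(mean(prices), median(prices), modes(prices))   # 600000 300000 [100000]
```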
[Figure: mean and median for a symmetric distribution (mean = median) and for left-skewed and right-skewed distributions (mean pulled toward the longer tail).]
Learning Goal 7: Which measure of location is the “best”?
Because the median considers only the order of values, it is resistant to values that are extraordinarily large or small; it simply notes that they are one of the “big ones” or “small ones” and ignores their distance from center.
To choose between the mean and median, start by looking at the distribution.
• Use the mean for unimodal, symmetric distributions, unless extreme values (outliers) exist.
• Use the median for skewed distributions or when outliers are present, since the median is not sensitive to extreme values.
Learning Goal 7: Class Problem
Observed: mean = 2.28, median = 3, mode = 3.1. What is the shape of the distribution, and why?
Learning Goal 7: Solution
Solution: Skewed left, because mean < median < mode: the mean has been pulled toward the long left tail.
[Figure: positions of the measures of center. Left-skewed: mean, median, mode (from left to right). Symmetric: mean = median = mode. Right-skewed: mode, median, mean (from left to right).]
Learning Goal 7: Example
Five houses on a hill by the beach. House prices: $2,000 K, $500 K, $300 K, $100 K, $100 K.
Learning Goal 7: Example – Measures of Center
House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (sum = $3,000,000)
• Mean: $3,000,000/5 = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
Which is the best measure of center? The median.
Conclusion – Mean or Median?
• Mean – use with symmetric distributions (no outliers), because it is nonresistant.
• Median – use with skewed distributions or distributions with outliers, because it is resistant.
Learning Goal 8
Know the basic properties and how to compute the standard deviation and IQR of a set of data.
Learning Goal 8: How Spread Out is the Distribution?
Variation matters, and Statistics is about variation. Are the values of the distribution tightly clustered around the center or more spread out?
Always report a measure of spread along with a measure of center when describing a distribution numerically.
Learning Goal 8: Measures of Spread
A measure of variability for a collection of data values is a number that is meant to convey the idea of spread for the data set. The most commonly used measures of variability for sample data are the:
• range
• interquartile range
• variance and standard deviation
Learning Goal 8: Measures of Variation
Measures of variation (range, interquartile range, variance, standard deviation) give information on the spread or variability of the data values.
[Figure: two distributions with the same center but different variation.]
Learning Goal 8: The Interquartile Range
One way to describe the spread of a set of data might be to ignore the extremes and concentrate on the middle of the data. The interquartile range (IQR) lets us ignore extreme data values and concentrate on the middle of the data. To find the IQR, we first need to know what quartiles are.
Learning Goal 8: The Interquartile Range
Quartiles divide the data into four equal sections. One quarter of the data lies below the lower quartile, Q1. One quarter of the data lies above the upper quartile, Q3. The quartiles border the middle half of the data. The difference between the quartiles is the interquartile range (IQR):
IQR = upper quartile (Q3) – lower quartile (Q1)
Learning Goal 8: Interquartile Range
The IQR avoids some outlier and extreme-value problems: it sets aside the high- and low-valued observations and measures the range of the remaining middle 50% of the values.
IQR = 3rd quartile – 1st quartile = Q3 – Q1
Learning Goal 8: Finding Quartiles
1. Order the data.
2. Find the median; this divides the data into a lower and an upper half (when n is odd, the median itself is in neither half).
3. Q1 is the median of the lower half.
4. Q3 is the median of the upper half.
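The four steps above translate directly into code. This is a minimal sketch with hypothetical function names (`quartiles`, `five_number_summary`) that follows the median-of-halves rule exactly as described, leaving the median out of both halves when n is odd. Other software may use different quartile conventions (Python's own `statistics.quantiles`, for example, interpolates by default), so results from other tools can differ slightly. The grouped laps data from the later "Your Turn" is handled under the assumption that each class midpoint stands in for the raw values in that class.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule (needs n >= 2)."""
    ordered = sorted(data)                     # step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                     # values below the median
    # step 2: when n is odd, the median itself belongs to neither half
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)   # steps 3 and 4

def five_number_summary(data):
    q1, m, q3 = quartiles(data)
    return min(data), q1, m, q3, max(data)

travel = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
          15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
q1, m, q3 = quartiles(travel)
print(q1, m, q3, "IQR =", q3 - q1)     # 15.0 22.5 42.5 IQR = 27.5

grades = [42, 63, 47, 77, 46, 71, 68, 83, 91, 55, 67, 66, 63,
          57, 50, 69, 73, 82, 77, 58, 66, 79, 88, 97, 86]
print(five_number_summary(grades))     # (42, 57.5, 68, 80.5, 97)

# Grouped laps data, assuming class midpoints 3, 8, ..., 38 replace the raw values
laps_midpoints = [3, 8, 13, 18, 23, 28, 33, 38]
laps_freq = [2, 9, 15, 20, 17, 25, 2, 1]
laps = [m for m, f in zip(laps_midpoints, laps_freq) for _ in range(f)]
print(five_number_summary(laps))       # (3, 13, 18, 28, 38)
```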
Example:
• Even data: Q1 = 27, M = 39, Q3 = 50.5; IQR = 50.5 – 27 = 23.5
• Odd data: Q1 = 35, M = 46, Q3 = 54; IQR = 54 – 35 = 19
Learning Goal 8: Quartiles Example
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70. Each of the four segments contains 25% of the data; the middle fifty percent lies between Q1 and Q3.
Interquartile range = 57 – 30 = 27. The IQR is not influenced by extreme values (resistant).
Learning Goal 8: Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each), separated by Q1, Q2, and Q3.
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
• Q2 is the same as the median (50% are smaller, 50% are larger).
• Only 25% of the observations are greater than the third quartile, Q3.
Learning Goal 8: The Interquartile Range - Histogram
The lower and upper quartiles are the 25th and 75th percentiles of the data, so the IQR contains the middle 50% of the values of the distribution, as shown in the figure.
Learning Goal 8: Find and Interpret IQR
Travel times to work (minutes) for 20 randomly selected New Yorkers:
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45
Ordered: 5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85
Q1 = 15, M = 22.5, Q3 = 42.5
IQR = Q3 – Q1 = 42.5 – 15 = 27.5 minutes
Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.
Learning Goal 8: Interquartile Range on the TI-84
• Use STAT/CALC/1-Var Stats to find Q1 and Q3.
• Then calculate IQR = Q3 – Q1. For example, interquartile range = Q3 – Q1 = 9 – 6 = 3.
Learning Goal 8: Calculate IQR - Your Turn
The following scores for a 10-point statistics quiz were reported. What is the value of the interquartile range?
7 8 9 6 8 0 9 9 9 0 0 7 10 9 8 5 7 9
Solution: IQR = 3
Learning Goal 8: 5-Number Summary
Definition: The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
Minimum, Q1, M, Q3, Maximum
Learning Goal 8: 5-Number Summary
The 5-number summary of a distribution reports its minimum, 1st quartile Q1, median, 3rd quartile Q3, and maximum, in that order. Obtain the 5-number summary from 1-Var Stats. Example: Min. = 3.7, Q1 = 6.6, Med. = 7, Q3 = 7.6, Max. = 9.
Learning Goal 8: Calculate 5 Number Summary
1. Enter data into L1.
2. STAT; CALC; 1:1-Var Stats; Enter.
3. List: L1.
4. Calculate.
5. Scroll down to the 5-number summary.
Learning Goal 8: Calculate 5 Number Summary – Your Turn
The grades of 25 students are given below:
42, 63, 47, 77, 46, 71, 68, 83, 91, 55, 67, 66, 63, 57, 50, 69, 73, 82, 77, 58, 66, 79, 88, 97, 86.
Calculate the 5-number summary for the students’ grades.
Solution: 42, 57.5, 68, 80.5, 97
Learning Goal 8: Calculate 5 Number Summary – Your Turn
A group of university students took part in a sponsored race. The number of laps completed is given in the table.
Number of laps (x)   Frequency
1–5                  2
6–10                 9
11–15                15
16–20                20
21–25                17
26–30                25
31–35                2
36–40                1
Calculate the 5-number summary.
Solution (using the class midpoints 3, 8, 13, ..., 38): 3, 13, 18, 28, 38
Learning Goal 8: Standard Deviation
A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean. A deviation is the distance of a data value from the mean. Since the deviations always add up to zero, we square each deviation and find an average of sorts of the squared deviations. To calculate the standard deviation, you must first calculate the variance.
Learning Goal 8: Variance
The variance is a measure of variability that uses all the data. It measures the average squared deviation of the measurements about their mean.
Learning Goal 8: Variance
The variance, notated by \( s^2 \), is found by summing the squared deviations and (almost) averaging them:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
It is used to calculate the standard deviation.
The variance will play a role later in our study, but it is problematic as a measure of spread: it is measured in squared units, not the same units as the data, which is a serious disadvantage.
Learning Goal 8: Variance
The variance of a population of N measurements (sigma squared) is the average of the squared deviations of the measurements about their mean μ:
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]
The variance of a sample of n measurements (s squared) is the sum of the squared deviations of the measurements about their mean, divided by (n - 1):
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
Learning Goal 8: Standard Deviation
The standard deviation, s, is the square root of the variance. It is measured in the same units as the original data, which is why it is preferred over the variance.
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Learning Goal 8: Standard Deviation
In calculating the variance, we squared all of the deviations, and in doing so changed the scale of the measurements. To return this measure of variability to the original units of measure, we calculate the standard deviation, the positive square root of the variance.
Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)
Sample standard deviation: \( s = \sqrt{s^2} \)
Learning Goal 8: Finding Standard Deviation
The most common measure of spread looks at how far each observation is from the mean. This measure is called the standard deviation. Let’s explore it! Consider the following data on the number of pets owned by a group of 9 children: 1, 3, 4, 4, 4, 5, 7, 8, 9.
1) Calculate the mean: \( \bar{x} = 5 \).
2) Calculate each deviation: deviation = observation - mean (for example, 1 - 5 = -4 and 8 - 5 = 3).
Learning Goal 8: Finding Standard Deviation
3) Square each deviation.
4) Find the “average” squared deviation: calculate the sum of the squared deviations divided by (n - 1). This is called the variance.
5) Calculate the square root of the variance. This is the standard deviation.
x_i   x_i - mean    (x_i - mean)²
1     1 - 5 = -4    (-4)² = 16
3     3 - 5 = -2    (-2)² = 4
4     4 - 5 = -1    (-1)² = 1
4     4 - 5 = -1    (-1)² = 1
4     4 - 5 = -1    (-1)² = 1
5     5 - 5 = 0     0² = 0
7     7 - 5 = 2     2² = 4
8     8 - 5 = 3     3² = 9
9     9 - 5 = 4     4² = 16
Sum of squared deviations = 52
Variance (“average” squared deviation) = 52/(9 - 1) = 6.5
Standard deviation = square root of variance = √6.5 ≈ 2.55
Learning Goal 8: Standard Deviation - Example
The standard deviation is used to describe the variation around the mean.
1) First calculate the variance: \( s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \)
2) Then take the square root to get the standard deviation: \( s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \)
[Figure: distribution with \( \bar{x} \) and mean ± 1 s.d. marked.]
Learning Goal 8: Standard Deviation - Procedure
1. Compute the mean \( \bar{x} \).
2. Subtract the mean from each individual value to get a list of the deviations from the mean, \( x - \bar{x} \).
3. Square each of the differences to produce the squared deviations from the mean, \( (x - \bar{x})^2 \).
4. Add all of the squared deviations from the mean to get \( \sum (x - \bar{x})^2 \).
5. Divide the sum by n - 1. [variance]
6. Find the square root of the result.
Learning Goal 8: Standard Deviation - Example
Find the standard deviation of the Mulberry Bank customer waiting times. Those times (in minutes) are 1, 3, 14. Use a table. (We will not normally calculate the standard deviation by hand.)
Learning Goal 8: Calculate Standard Deviation
1. Enter data into L1.
2. STAT; CALC; 1:1-Var Stats; Enter.
3. List: L1; Calculate.
4. Sx is the sample standard deviation.
5. σx is the population standard deviation.
Learning Goal 8: Calculate Standard Deviation – Your Turn
The prices ($) of 18 brands of walking shoes:
90 70 70 70 75 70 65 68 60 74 70 95 75 70 68 65 40 65
Calculate the standard deviation.
Solution: Sx = $11.31
Learning Goal 8: Calculate Standard Deviation – Your Turn
During 3 hours at Heathrow airport, 55 aircraft arrived late. The number of minutes they were late is shown in the grouped frequency table.
Minutes late   Frequency
0–9            27
10–19          10
20–29          7
30–39          5
40–49          4
50–59          2
Calculate the standard deviation for the number of minutes late.
Solution: 14.9 min.
Learning Goal 8: Standard Deviation - Properties
• The value of s is never negative; s is zero only when all of the data values are the same number.
• Larger values of s indicate greater amounts of variation.
• The units of s are the same as the units of the original data, one reason s is preferred to \( s^2 \).
• s measures spread about the mean and should only be used to describe the spread of a distribution when the mean is used to describe the center (i.e., symmetric distributions).
• Nonresistant: like the mean, s can increase dramatically due to extreme values or outliers.
Learning Goal 8: Standard Deviation - Example
Larger values of standard deviation indicate greater amounts of variation.
[Figure: a distribution with a small standard deviation and one with a large standard deviation.]
Learning Goal 8: Standard Deviation - Example
Standard deviation: the more variation, the larger the standard deviation. Data set II has greater variation.
Learning Goal 8: Standard Deviation - Example
Data set II has greater variation than data set I, and the visual clearly shows that it is more spread out.
Learning Goal 8: Comparing Standard Deviations
The more variation, the larger the standard deviation.
[Figure: dot plots of data sets A, B, and C on a scale from 11 to 21.]
• Data A: mean = 15.5, S = 3.338
• Data B: mean = 15.5, S = 0.926
• Data C: mean = 15.5, S = 4.567
Values far from the mean are given extra weight (because deviations from the mean are squared).
Learning Goal 8: Spread: Range
The range of the data is the difference between the maximum and minimum values:
Range = max – min
A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.
Learning Goal 8: Range
The range is the simplest measure of variation: the difference between the largest and the smallest values in a set of data.
Example: Range = $X_{\text{largest}} - X_{\text{smallest}}$
(Figure: dot plot on a scale from 0 to 14.) Range = 14 − 1 = 13

Learning Goal 8: Disadvantages of the Range
• Ignores the way in which data are distributed:
(Figure: two dot plots on a scale from 7 to 12 whose values are distributed differently.) Range = 12 − 7 = 5 in both cases.
• Sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5   Range = 5 − 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120   Range = 120 − 1 = 119

Learning Goal 8: Range
• The range is affected by outliers (values that are very large or very small relative to the rest of the data set).
• The range does not use all the information in the data set, only the largest and smallest values.
• Thus, the range is not a very useful measure of spread or variation.

Learning Goal 8: Summary Measures
Describing Data Numerically
• Central Tendency: Mean, Median, Mode
• Quartiles
• Variation: Range, Interquartile Range, Variance, Standard Deviation
• Shape: Skewness

Learning Goal 9
Understand which measures of center and spread are resistant and which are not.

Learning Goal 9: Resistant or Non-Resistant
Which measures of center and spread are resistant?
1. Median – Extreme values and outliers have little effect.
2. IQR – Measures the spread of the middle 50% of the data; therefore, extreme values and outliers have little or no effect.
3. When using the median to measure the center of a distribution, use the IQR to measure the spread of the distribution.

Learning Goal 9: Resistant or Non-Resistant
Which measures of center and spread are non-resistant?
1. Mean – Extreme values and outliers pull the mean toward those values.
2. Standard Deviation – Measures the spread relative to the mean. Extreme values or outliers will increase the standard deviation of the distribution.
3. When using the mean to measure the center of a distribution, use the standard deviation to measure the spread of the distribution.
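The resistance claims can be checked numerically on the two 25-value lists from the "Sensitive to outliers" example, which differ only in their largest value (5 versus 120). The sketch below is a minimal Python illustration; `quartiles` and `summarize` are our own helper names, and the quartile rule (median of each half, leaving out the overall median when n is odd) is an assumed convention. Other software may compute quartiles slightly differently.

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumed convention: when n is odd, the overall median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    return statistics.median(data[:half]), statistics.median(data[-half:])

def summarize(values):
    """Resistant and non-resistant summaries side by side."""
    q1, q3 = quartiles(values)
    return {
        "range": max(values) - min(values),
        "IQR": q3 - q1,
        "median": statistics.median(values),
        "mean": round(statistics.mean(values), 2),
        "s": round(statistics.stdev(values), 2),  # sample standard deviation (n - 1)
    }

common = [1] * 11 + [2] * 8 + [3] * 4 + [4]  # the 24 values shared by both lists
ending_in_5 = common + [5]
ending_in_120 = common + [120]

print(summarize(ending_in_5))
print(summarize(ending_in_120))
```

Replacing 5 with 120 moves the range from 4 to 119, the mean from 1.92 to 6.52, and s from about 1.08 to about 23.66, while the median (2) and the IQR (1.5) do not change.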
Learning Goal 9: Resistant or Non-Resistant
Measures of center: mean (not resistant), median (resistant).
Measures of spread: standard deviation (not resistant), IQR (resistant), range (not resistant).
When the distribution allows it (roughly symmetric, with no outliers), the mean and standard deviation are usually preferred, because they are calculated from all the data values and so use all the available information.

Learning Goal 9: Resistant or Non-Resistant
Animated Center and Spread
(Figure: animated dot plot of Quiz Scores on a scale from 20 to 100, shown in three frames.)
The summary statistics shown for the three frames:
Mean: 63.33, Median: 70, S: 16.84, IQR: 30
Mean: 68.82, Median: 70, S: 12.56, IQR: 20
Mean: 72.5, Median: 72.5, S: 10.16, IQR: 15
What is the difference between the center and spread of a distribution? Which measure of center (mean or median) was affected more by adding data points that skewed the distribution? Explain your answer.

In a symmetric distribution:
• The mean, non-resistant, is used to represent the center.
• The standard deviation (S), non-resistant, is used to represent the spread.
In a skewed distribution:
• The median, resistant, is used to represent the center.
• The interquartile range (IQR), resistant, is used to represent the spread.
©2013 All rights reserved.
For each distribution below (A and B), which measure of center and spread would you use? How do you know? (Choices: Mean & S; Median & IQR.)
CCSS 6th Grade Statistics and Probability 2.0 Describe the distribution of a data set. Lesson to be used by EDI-trained teachers only.

Learning Goal 9: Resistant or Non-Resistant
Median and IQR are paired together: resistant.
Mean and standard deviation are paired together: non-resistant.

Learning Goal 10
Be able to select a suitable measure of center and a suitable measure of spread for a variable based on information about its distribution.
Learning Goal 10: Choosing Measures of Center and Spread
We now have a choice between two descriptions of center and spread:
• Mean and standard deviation
• Median and interquartile range

Choosing Measures of Center and Spread
• The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers.
• Use the mean and standard deviation only for reasonably symmetric distributions that don't have outliers.
• NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA!

Learning Goal 10: Choosing Measures of Center and Spread
1. Plot your data: dotplot, stemplot, or histogram.
2. Interpret what you see: shape, outliers, center, spread.
3. Choose a numerical summary: x̄ and s, or median and IQR.

Learning Goal 10: Choosing Center and Spread - Practice
The distribution of a data set shows the arrangement of values in the data set. The center of a distribution is a number that represents all the values in the data set. The spread of a distribution is a number that describes the variability in the data set.
Symmetric → Center: Mean, Spread: S. Skewed → Center: Median, Spread: IQR.
The dot plots below show the ratings given to a new movie by two different audiences.
(Figure: dot plots of Audience Rating, 1 to 10, for Audience #1 and Audience #2.)
1. Audience #1: Mean: 7, Median: 7, S: 1.43, IQR: 2
Shape: The shape of the distribution is mostly symmetric.
Center: Because the distribution is symmetric, the mean of 7 can be used as the measure of center.
Spread: The S of the distribution is 1.43.
2. Audience #2: Mean: 5.71, Median: 6, S: 1.67, IQR: 3
Shape: The shape of the distribution is mostly symmetric.
Center: Because the distribution is symmetric, the mean of 5.71 can be used as the measure of center.
Spread: The S of the distribution is 1.67.
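The plot, interpret, choose routine can be written as a small helper. This is a hypothetical sketch: `choose_summary` and `quartiles` are our own names, the shape label has to come from the analyst's reading of a plot (the code does not judge shape itself, which is why the data must be plotted first), and the two rating lists are made-up illustration data, not the audience ratings from the slide.

```python
import statistics

def quartiles(values):
    """Medians of the lower and upper halves (overall median left out when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    return statistics.median(data[:half]), statistics.median(data[-half:])

def choose_summary(values, shape):
    """Return the (center, spread) pair recommended for the given shape.

    `shape` must come from looking at a plot: "symmetric" or "skewed".
    """
    if shape == "symmetric":
        return ("mean", statistics.mean(values)), ("s", statistics.stdev(values))
    if shape == "skewed":
        q1, q3 = quartiles(values)
        return ("median", statistics.median(values)), ("IQR", q3 - q1)
    raise ValueError('shape must be "symmetric" or "skewed"')

# Hypothetical ratings, invented for illustration only.
symmetric_ratings = [5, 6, 7, 7, 7, 8, 9]
skewed_ratings = [1, 1, 2, 2, 2, 3, 4, 9]

print(choose_summary(symmetric_ratings, "symmetric"))
print(choose_summary(skewed_ratings, "skewed"))
```

Making the shape an explicit argument mirrors the slides' advice: the numbers alone cannot tell you which summary is appropriate.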
Learning Goal 10: Choosing Center and Spread - Practice
The histograms below show the number of hours studied in a week for students in two math classes.
(Figure: histograms of Hours Studied (0-2, 3-5, 6-8, 9-11, 12-14, 15-17) versus number of Students for Class #1 and Class #2.)
3. Class #1: Mean: 9.69, Median: 10.5, S: 3.6, IQR: 6.5
Shape: The shape of the distribution is skewed to the left.
Center: Because the distribution is skewed, the median of 10.5 can be used as the measure of center.
Spread: The IQR of the distribution is 6.5.
4. Class #2: Mean: 7.75, Median: 7, S: 2.93, IQR: 4.5
Shape: The shape of the distribution is skewed to the right.
Center: Because the distribution is skewed, the median of 7 can be used as the measure of center.
Spread: The IQR of the distribution is 4.5.

Learning Goal 10: Choosing Center and Spread - Practice
The dot plot below shows the number of hours of sleep per night for 33 students in a 6th-grade class. The histogram below shows the number of hours of sleep per night for 33 adults selected at random.
(Figure: dot plot of Hours of Sleep, 4 to 11, for the students; histogram of Hours Slept (0-1, 2-3, 4-5, 6-7, 8-9, 10+) versus number of Adults.)
1. Students (dot plot): Mean: 8.4, Median: 9, S: 1.53, IQR: 3
Shape: The shape of the distribution is skewed left.
Center: Because the distribution is skewed, the median of 9 can be used as the measure of center.
Spread: The IQR of the distribution is 3.
2. Adults (histogram): Mean: 6.8, Median: 7, S: 1.54, IQR: 2.5
Shape: The shape of the distribution is fairly symmetric, with a slight skew to the left.
Center: Because the distribution is mostly symmetric, the mean of 6.8 can be used as the measure of center.
Spread: The S of the distribution is 1.54.

Learning Goal 10: Choosing Center and Spread - Practice
The histograms below show the scores of 31 students on a pretest and posttest.
(Figure: histograms of Score (41-50 through 91-100) versus number of Students for the pretest and the posttest.)
1. Pretest: Mean: 57.67, Median: 54, S: 9.07, IQR: 14
Shape: The shape of the distribution is skewed right.
Center: Because the distribution is skewed, the median of 54 can be used as the measure of center.
Spread: The IQR of the distribution is 14.
2. Posttest: Mean: 76, Median: 76, S: 9.81, IQR: 24
Shape: The shape of the distribution is mostly symmetric.
Center: Because the distribution is mostly symmetric, the mean of 76 can be used as the measure of center.
Spread: The S of the distribution is 9.81.
Did scores on the test improve from the pretest to the posttest? Explain your answer.
Yes, test scores improved from the pretest to the posttest. This can be seen in the noticeably higher center of the posttest distribution (a mean of 76, compared with a pretest median of 54).
Learning Goal 10: Choosing Center and Spread - Practice
The dot plot below shows the number of pets in each household of 28 students in a 6th-grade class.
(Figure: dot plot of Number of Pets, 0 to 9.)
1. Mean: 1.82, Median: 2, S: 1.13, IQR: 1.5
Shape: The shape of the distribution is skewed right.
Center: Because the distribution is skewed, the median of 2 can be used as the measure of center.
Spread: The IQR of the distribution is 1.5.

Learning Goal 10: Choosing Center and Spread - Questions
Choose Yes or No to indicate whether each statement is true about these distributions.
A. Both distributions are symmetric. O Yes O No
B. The median is the best measure of center for Distribution A. O Yes O No
C. Overall, scores were higher in Distribution A than in Distribution B. O Yes O No
D. There is more variability in scores for Distribution A than for Distribution B. O Yes O No
E. Distribution A is skewed to the right. O Yes O No
F. The standard deviation can be used to describe the spread for Distribution B. O Yes O No

Learning Goal 11
Be able to describe the distribution of a quantitative variable in terms of its shape, center, and spread.

Learning Goal 11: How to Analyze Quantitative Data
2009 Fuel Economy Guide
Examine each variable by itself. Then study relationships among the variables.
2009 Fuel Economy Guide (highway MPG)
# Model MPG | # Model MPG | # Model MPG
1 Acura RL 22 | 9 Dodge Avenger 30 | 17 Mercury Milan 29
2 Audi A6 Quattro 23 | 10 Hyundai Elantra 33 | 18 Mitsubishi Galant 27
3 Bentley Arnage 14 | 11 Jaguar XF 25 | 19 Nissan Maxima 26
4 BMW 528i 28 | 12 Kia Optima 32 | 20 Rolls Royce Phantom 18
5 Buick Lacrosse 28 | 13 Lexus GS 350 26 | 21 Saturn Aura 33
6 Cadillac CTS 25 | 14 Lincoln MKZ 28 | 22 Toyota Camry 31
7 Chevrolet Malibu 33 | 15 Mazda 6 29 | 23 Volkswagen Passat 29
8 Chrysler Sebring 30 | 16 Mercedes-Benz E350 24 | 24 Volvo S80 25
Start with a graph or graphs. Add numerical summaries.

Learning Goal 11: How to Describe a Quantitative Distribution
The purpose of a graph is to help us understand the data. After you make a graph, always ask, "What do I see?"
How to Describe the Distribution of a Quantitative Variable
In any graph, look for the overall pattern and for striking departures from that pattern. Describe the overall pattern of a distribution by its:
• Shape
• Center
• Spread
Note individual values that fall outside the overall pattern. These departures are called outliers. Don't forget your SOCS (Shape, Outliers, Center, Spread)!

Learning Goal 11: Describing a Quantitative Distribution
We describe a distribution (the values the variable takes on and how often it takes these values) using the acronym SOCS.
Shape: We describe the shape of a distribution in one of two ways: symmetric/approximately symmetric, or skewed right/skewed left.
(Figure: dot plot of Babe Ruth's single-season home runs, 20 to 65; approximately symmetric, with extreme values.)

Learning Goal 11: Describing a Quantitative Distribution
Outliers: Observations that we would consider "unusual"; data that don't "fit" the overall pattern of the distribution. Babe Ruth had two seasons that appear to be somewhat different from the rest of his career. These may be "outliers".
(We'll learn a numerical way to determine whether observations are truly "unusual" later.)
(Figure: dot plot of Babe Ruth's single-season home runs, 20 to 65, with the possible outliers at 22 and 25 marked "Unusual observation???".)

Learning Goal 11: Describing a Quantitative Distribution
Center: A single value that describes the entire distribution. Symmetric distributions use the mean and skewed distributions use the median.
(Figure: the same dot plot; the median is 46.)

Learning Goal 11: Describing a Quantitative Distribution
Spread: Describe the variation of a distribution. Symmetric distributions use the standard deviation and skewed distributions use the IQR.
(Figure: the same dot plot with Q1 and Q3 marked; the IQR is 19.)

Learning Goal 11: Distribution Description using SOCS
The distribution of Babe Ruth's number of home runs in a single season is approximately symmetric (1), with two possible outlier observations at 22 and 25 home runs (2). He typically hit about 46 home runs in a season (3). Over his career, the number of home runs typically varied between 35 and 54 (4).
(1) Shape (2) Outliers (3) Center (4) Spread

Learning Goal 11: Describe the Distribution – Your Turn
The table and dotplot below display the Environmental Protection Agency's estimates of highway gas mileage in miles per gallon (MPG) for a sample of 24 model year 2009 midsize cars. Describe the shape, center, and spread of the distribution. Are there any outliers?
(Table: the 2009 Fuel Economy Guide highway MPG data for the 24 cars, as listed earlier in this section; dotplot of highway MPG.)

Learning Goal 11: Describe the Distribution – Solution
The distribution of highway gas mileage (MPG) for a sample of 24 model year 2009 midsize cars is skewed left, with two possible outliers at 14 and 18 miles per gallon. The gas mileage of a typical 2009 midsize car in the sample is 28 mpg. The gas mileage typically varied between 24.5 and 30 mpg.

Learning Goal 11: Describe the Distribution – Your Turn
Smart Phone Battery Life: Here is the estimated battery life, in minutes, for each of 9 different smart phones. Describe the distribution.
Smart Phone | Battery Life (minutes)
Apple iPhone | 300
Motorola Droid | 385
Palm Pre | 300
Blackberry Bold | 360
Blackberry Storm | 330
Motorola Cliq | 360
Samsung Moment | 330
Blackberry Tour | 300
HTC Droid | 460

Learning Goal 11: Describe the Distribution – Solution
(Figure: dot plot of Battery Life (minutes), 300 to 460.)
Sorted data: 300 300 300 330 330 360 360 385 460
Shape: There is a peak at 300, and the distribution has a long tail to the right (skewed to the right).
Center: The median value is 330 minutes.
Spread: The IQR is 72.5 minutes.
Outliers: There is one phone with an unusually long battery life, the HTC Droid at 460 minutes.

Assignment
Chapter 4 Notes Worksheet
Exercises pg. 72–79: #5–18, 30–33, 43, 44, 48
Read Ch. 5, pg. 80–94
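The center and spread in the battery-life solution, and the 28 mpg center in the fuel-economy solution, can be reproduced with a few lines of Python. This is a sketch under one stated assumption: quartiles are taken as medians of the lower and upper halves (leaving out the overall median when n is odd), the convention that gives the 72.5-minute IQR quoted in the battery-life solution; other tools may use a different quartile rule. The helper name `quartiles` is our own.

```python
import statistics

def quartiles(values):
    """Medians of the lower and upper halves (overall median left out when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    return statistics.median(data[:half]), statistics.median(data[-half:])

# Smart phone battery life (minutes), from the Your Turn table.
battery = {
    "Apple iPhone": 300, "Motorola Droid": 385, "Palm Pre": 300,
    "Blackberry Bold": 360, "Blackberry Storm": 330, "Motorola Cliq": 360,
    "Samsung Moment": 330, "Blackberry Tour": 300, "HTC Droid": 460,
}
minutes = list(battery.values())
q1, q3 = quartiles(minutes)
print("median:", statistics.median(minutes))      # median: 330
print("Q1, Q3, IQR:", q1, q3, q3 - q1)            # Q1, Q3, IQR: 300.0 372.5 72.5
print("longest:", max(battery, key=battery.get))  # longest: HTC Droid

# Highway MPG for cars 1 to 24 of the 2009 Fuel Economy Guide table, in table order.
mpg = [22, 23, 14, 28, 28, 25, 33, 30, 30, 33, 25, 32,
       26, 28, 29, 24, 29, 27, 26, 18, 33, 31, 29, 25]
print("typical MPG (median):", statistics.median(mpg))  # typical MPG (median): 28.0
```

Keeping the phone names in a dictionary makes it easy to name the unusual observation (the HTC Droid) rather than just report its value.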