LESSON 9.2
Data Distributions and Outliers

Essential Question: What statistics are most affected by outliers, and what shapes can data distributions have?

Common Core Math Standards
The student is expected to:
S-ID.1 Represent data with plots on the real number line (dot plots, histograms, and box plots). Also S-ID.2, S-ID.3, N-Q.1

Mathematical Practices
MP.2 Reasoning

Language Objective
Explain to a partner what an outlier is.

ENGAGE
Essential Question: What statistics are most affected by outliers, and what shapes can data distributions have?
Outliers affect the mean more than the median, and they affect the standard deviation more than the IQR. Data distributions can be described generally as symmetric, skewed to the left, or skewed to the right.

PREVIEW: LESSON PERFORMANCE TASK
View the Engage section online. Discuss why, if you owned a business, you might compare a competitor’s sales to your company’s sales, and how your findings might lead you to change the way you run your business. Then preview the Lesson Performance Task.

© Houghton Mifflin Harcourt Publishing Company. Image credit: ©Blend Images/Alamy

Explore  Using Dot Plots to Display Data
A dot plot is a data representation that uses a number line and Xs, dots, or other symbols to show frequency. Dot plots are sometimes called line plots.

Finance  Twelve employees at a small company make the following annual salaries (in thousands of dollars): 25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, and 60.

Choose the number line with the most appropriate scale for this problem. Explain your reasoning.
Number line 1: 0, 50, 100
Number line 2: 20, 30, 40, 50, 60, 70
Number line 3: 20, 35, 50, 65, 80, 95

The second number line has the most appropriate scale. The scale of the first number line includes a larger range of numbers than necessary, so dots will be clustered in the middle. The scale of the third number line does not have convenient tick marks for determining where values between the labels belong.
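For classes that use a programming tool, the idea that a dot plot shows frequency can be sketched in Python. This is a minimal sketch and not part of the published lesson: the function name text_dot_plot and its parameters are hypothetical, and it assumes that every data value falls exactly on a tick of the chosen scale (here, 20 to 70 in steps of 5).

```python
from collections import Counter

def text_dot_plot(values, start, stop, step):
    """Return a text dot plot with one X per occurrence above each tick.

    Hypothetical helper: values that do not land on a tick are ignored.
    """
    counts = Counter(values)                      # frequency of each value
    ticks = list(range(start, stop + 1, step))    # tick marks on the number line
    height = max(counts.values())                 # tallest stack of Xs
    rows = []
    for level in range(height, 0, -1):            # build the plot from the top row down
        cells = ("X" if counts.get(t, 0) >= level else " " for t in ticks)
        rows.append("".join(c.center(4) for c in cells).rstrip())
    rows.append("".join(str(t).center(4) for t in ticks).rstrip())
    return "\n".join(rows)

# Salaries from the Explore, in thousands of dollars.
salaries = [25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60]
print(text_dot_plot(salaries, 20, 70, 5))
```

The tallest stacks appear above 35 and 40, which each occur three times in the data.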
Create and label a dot plot of the data. Put an X above the number line for each time that value appears in the data set.

[Dot plot: Salary (thousands of dollars); number line from 20 to 70, labeled by 10s]

Reflect
1. Discussion  Recall that quantitative data can be expressed as a numerical measurement. Categorical, or qualitative, data is expressed in categories, such as attributes or preferences. Is it appropriate to use a dot plot for displaying quantitative data, qualitative data, or both? Explain.
A dot plot uses a number line, so it is only appropriate for displaying quantitative data.

HARDCOVER PAGES 317–326
Turn to these pages to find this lesson in the hardcover student edition.

EXPLORE
Using Dot Plots to Display Data

INTEGRATE TECHNOLOGY
To make it easier to create a dot plot for a large data set, students can enter the data values into one column of a spreadsheet, then use the spreadsheet’s data-sorting function to arrange them in increasing order.

Explain 1  The Effects of an Outlier in a Data Set
An outlier is a value in a data set that is much greater or much less than most of the other values in the data set. Outliers are determined by using the first or third quartiles and the IQR.

How to Identify an Outlier
A data value x is an outlier if x < Q1 - 1.5(IQR) or if x > Q3 + 1.5(IQR).

Example 1  Create a dot plot for the data set using an appropriate scale for the number line. Determine whether the extreme value is an outlier.

(A) Suppose that the list of salaries from the Explore is expanded to include the owner’s salary of $150,000. Now the list of salaries is 25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60, and 150.

To choose an appropriate scale, consider the minimum and maximum values, 25 and 150. A number line from 20 to 160 will contain all the values. A scale of 5 will be convenient for the data. Label tick marks by 20s. Plot each data value to see the distribution.
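The rule in "How to Identify an Outlier" can also be sketched in Python. This is a minimal sketch rather than the lesson's own procedure: the function names quartiles and find_outliers are hypothetical, and the sketch assumes that Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when the number of values is odd. It needs at least two data values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd count, the middle value belongs to neither half.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]       # values below the median position
    upper = data[-half:]      # values above the median position
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Expanded salary list from Example 1, in thousands of dollars.
salaries = [25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60, 150]
print(quartiles(salaries))      # (35.0, 47.5)
print(find_outliers(salaries))  # [150]
```

With Q1 = 35 and Q3 = 47.5, the upper boundary is 47.5 + 1.5(12.5) = 66.25, so only the owner's salary of 150 is flagged.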
[Dot plot: Salary (thousands of dollars); number line from 20 to 160, labeled by 20s]

Find the quartiles and the IQR to determine whether 150 is an outlier.
150 > Q3 + 1.5(IQR) ?
150 > 47.5 + 1.5(47.5 - 35) ?
150 > 66.25  True
Therefore, 150 is an outlier.

(B) Suppose that the salaries from Part A were adjusted so that the owner’s salary is $65,000. Now the list of salaries is 25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60, and 65.

To choose an appropriate scale, consider the minimum and maximum data values, 25 and 65. A number line from 20 to 70 will contain all the data values. A scale of 5 will be convenient for the data. Label tick marks by 10s. Plot each data value to see the distribution.

[Dot plot: Salary (thousands of dollars); number line from 20 to 70, labeled by 10s]

Find the quartiles and the IQR to determine whether 65 is an outlier.
65 > Q3 + 1.5(IQR) ?
65 > 47.5 + 1.5(47.5 - 35) ?
65 > 66.25  True / False
Therefore, 65 is / is not an outlier.

Reflect
2. Explain why the median was NOT affected by changing the maximum data value from 150 to 65.
The maximum value in the data set changed, but its ordered position did not, so the middle value in the ordered list was not moved or changed.

QUESTIONING STRATEGIES
How can you use a dot plot to find the interquartile range of a data set? First, find the median by counting the same number of marks from each end of the dot plot until the middle value is reached. If there is an even number of marks, find the mean of the two middle values. Then use the same process to find the first quartile (Q1, the middle value of the lower half) and the third quartile (Q3, the middle value of the upper half). Finally, subtract Q1 from Q3 to find the interquartile range.

How does an outlier affect the mean and median of a data set? If a data set includes an outlier, the mean can be increased or decreased significantly. This can make the mean misleading as a measure of center. When there are no outliers, most data values cluster closer to the mean. The median is much less affected by an outlier, because a single outlier shifts the middle of the data set by only a small amount, if at all.

EXPLAIN 1
The Effects of an Outlier in a Data Set

AVOID COMMON ERRORS
Students sometimes forget to take the square root of the mean of the squared deviations when calculating standard deviation. Review the steps for calculating the standard deviation.

PROFESSIONAL DEVELOPMENT
Integrate Mathematical Practices
This lesson provides an opportunity to address Mathematical Practice MP.2, which calls for students to “reason abstractly and quantitatively.” Students solve real-world problems by creating dot plots for data sets. They analyze and describe the shapes of the data distributions, recognizing how the shapes affect the measures of center and spread, and they use both dot plots and statistical measures to compare data sets. Thus, they first take a situation from its real-world context to represent it symbolically, then they interpret the results in the real-world context.

EXPLAIN 2
Comparing Data Sets

INTEGRATE MATHEMATICAL PRACTICES
Focus on Technology
MP.5 Review the steps for generating statistics using a graphing calculator. Students can create a list by pressing STAT, then selecting 1:Edit. A previously entered list can be cleared by highlighting the name of the list, pressing CLEAR, then pressing the down arrow. After entering data in a list, students can find the one-variable statistics by pressing STAT, selecting CALC, and then selecting 1:1-Var Stats. For data in lists other than L1, they must enter the list number before pressing ENTER to generate the statistics.

Your Turn
3.
Sports  Baseball pitchers on a major league team throw at the following speeds (in miles per hour): 72, 84, 89, 81, 93, 100, 90, 88, 80, 84, and 87. Create a dot plot using an appropriate scale for the number line. Determine whether the extreme value is an outlier.

[Dot plot: Pitching Speeds (mph); number line from 70 to 100, labeled by 5s]

72 < Q1 - 1.5(IQR) ?
72 < 81 - 1.5(9) ?
72 < 67.5  False
Therefore, 72 is not an outlier.

Explain 2  Comparing Data Sets
Numbers that characterize a data set, such as measures of center and spread, are called statistics. They are useful when comparing large sets of data.

AVOID COMMON ERRORS
Students may expect their graphing calculators to provide the value of the IQR. Remind them that they must calculate the IQR by finding the difference between the first and third quartiles.

Example 2  Calculate the mean, median, interquartile range (IQR), and standard deviation for each data set, and then compare the data.

Sports  The tables list the average ages of players on 15 teams randomly selected from the 2010 teams in the National Football League (NFL) and Major League Baseball (MLB). Describe how the average ages of NFL players compare to those of MLB players.

NFL Players’ Average Ages, by Team
25.8, 26.0, 26.3, 25.7, 25.1, 25.2, 26.1, 26.4, 25.9, 26.6, 26.3, 26.2, 26.8, 25.6, 25.7

MLB Players’ Average Ages, by Team
28.5, 29.0, 28.0, 27.8, 29.5, 29.1, 26.9, 28.9, 28.6, 28.7, 26.9, 30.5, 28.7, 28.9, 29.3

On a graphing calculator, enter the two sets of data into L1 and L2. Use the “1-Var Stats” feature to find statistics for the data in lists L1 and L2. Your calculator may use the following notations: mean x̄, standard deviation σx. Scroll down to see the median (Med), Q1, and Q3. Complete the table.

         Mean     Median    IQR (Q3 - Q1)    Standard deviation
NFL      25.98    26.00     0.60             0.46
MLB      28.62    28.70     1.10             0.91

Compare the corresponding statistics.
The mean age and median age are lower for the NFL than for the MLB, which means that NFL players tend to be younger than MLB players. In addition, the IQR and standard deviation are smaller for the NFL than for the MLB, which means that the ages of NFL players are closer together than those of MLB players.

COLLABORATIVE LEARNING
Peer-to-Peer Activity
Have students work in pairs. Have each student create a data set with 10 values, using the definition of outlier to verify that none of the values are outliers. Students then find the mean, median, range, and IQR for their data sets. Have students trade data sets with their partners. Ask each student to add an outlier to the partner’s data set, and then calculate the new mean, median, range, and IQR for the set. Students should compare their results and discuss how the outliers affected the statistics.

QUESTIONING STRATEGIES
What can you conclude about two data sets by comparing each of the following statistics: mean, median, IQR, and standard deviation? By comparing the mean and median values, you can conclude whether the typical value for one data set is higher or lower than the typical value for the other set. By comparing the IQR and standard deviation values, you can determine whether the data values in one set are more or less spread out than the values in the other set.

The tables list the ages of 10 contestants on 2 game shows.
Game Show 1: 18, 20, 25, 48, 35, 39, 46, 41, 30, 27
Game Show 2: 24, 29, 36, 32, 34, 41, 21, 38, 39, 26

On a graphing calculator, enter the two sets of data into L1 and L2. Complete the table. Then circle the correct items to compare the statistics.

           Mean    Median    IQR (Q3 - Q1)    Standard deviation
Show 1     32.9    32.5      16               10.00
Show 2     32      33        12               6.45
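Where a graphing calculator is not available, the statistics in tables like these can be reproduced with a short Python sketch. The sketch below is illustrative only: the function name one_var_stats is hypothetical, the quartiles are assumed to be the medians of the lower and upper halves of the ordered data, and the standard deviation is assumed to be the population standard deviation, which matches the σx notation mentioned above.

```python
from statistics import fmean, median, pstdev

def one_var_stats(values):
    """Return mean, median, IQR, and population standard deviation.

    Assumption: Q1 and Q3 are medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[-half:])
    return {
        "mean": fmean(data),
        "median": median(data),
        "iqr": q3 - q1,
        "sd": pstdev(data),   # divides by n, like the calculator's sigma-x
    }

nfl = [25.8, 26.0, 26.3, 25.7, 25.1, 25.2, 26.1, 26.4,
       25.9, 26.6, 26.3, 26.2, 26.8, 25.6, 25.7]
mlb = [28.5, 29.0, 28.0, 27.8, 29.5, 29.1, 26.9, 28.9,
       28.6, 28.7, 26.9, 30.5, 28.7, 28.9, 29.3]

for name, ages in (("NFL", nfl), ("MLB", mlb)):
    stats = one_var_stats(ages)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimal places, the results should agree with the NFL and MLB rows of the table in Example 2. Using the sample standard deviation instead (statistics.stdev) would give slightly larger values.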
The mean is lower for the 1st / 2nd game show, which means that contestants in the 1st / 2nd game show are on average younger than contestants in the 1st / 2nd game show. However, the median is lower for the 1st / 2nd game show, which means that although contestants are on average younger on the 1st / 2nd game show, there are more young contestants on the 1st / 2nd game show. Finally, the IQR and standard deviation are higher for the 1st / 2nd game show, which means that the ages of contestants on the 1st / 2nd game show are farther apart than the ages of contestants on the 1st / 2nd game show.

DIFFERENTIATE INSTRUCTION
Multiple Representations
Students may benefit from acting out a real-world example of how adding an outlier to a data set affects measures of center and spread. For example, have five students each begin with 1 to 5 slips of paper (or pennies or markers); each slip represents a dollar. Have the students calculate the mean by equally distributing all the slips of paper among the five students. Then have a sixth student with $25 (25 slips of paper) join the group. Again use the slips of paper to find the mean by distributing them among the six students. Ask whether the new mean is a reasonable measure of center.

Your Turn
4. The tables list the age of each member of Congress in two randomly selected states. Complete the table and compare the data.

Illinois: 26, 24, 28, 46, 39, 59, 31, 26, 64, 40, 69, 62, 31, 28, 26, 76, 57, 71, 58, 35, 32, 49, 51, 22, 33, 56
Arizona: 42, 37, 58, 32, 46, 42, 26, 56, 27

             Mean     Median    IQR (Q3 - Q1)    Standard deviation
Illinois     43.81    39.5      30               16.42
Arizona      40.67    42        21.5             10.84

The mean is lower for Arizona, which means that, on average, members of Congress tend to be younger in Arizona than in Illinois. However, the median is lower in Illinois, which means that there are more young members of Congress in Illinois despite the differences in average age. Finally, the IQR and standard deviation are lower for Arizona, which means that the ages of members of Congress are closer together in Arizona than they are in Illinois.

EXPLAIN 3
Comparing Data Distributions

AVOID COMMON ERRORS
Students often confuse the terms skewed to the left and skewed to the right. Encourage students to come up with a mnemonic to help them remember how the direction of a skew should be described. For example, students may easily remember how the “tail” of a data distribution looks on a dot plot. Point out that both tail and skew have four letters, and that a data distribution is skewed in the direction of its tail.

QUESTIONING STRATEGIES
Some data distributions are described as uniform. What do you think the general shape of a uniform distribution would be? The general shape of a uniform distribution is fairly even across the plot.

What would be true about the mean and median of a data set with a uniform distribution? The mean and median would be approximately equal.

Explain 3  Comparing Data Distributions
A data distribution can be described as symmetric, skewed to the left, or skewed to the right, depending on the general shape of the distribution in a dot plot or other data display.

[Figure: three sample dot plots labeled Skewed to the Left, Symmetric, and Skewed to the Right]

Example 3  For each data set, make a dot plot and determine the type of distribution. Then explain what the distribution means for each data set.

(A) Sports  The data table shows the number of miles run by members of two track teams during one day.
Miles                  3    3.5    4    4.5    5    5.5    6
Members of Team A      2    3      4    4      3    2      0
Members of Team B      1    2      2    3      3    4      3

[Dot plots: Team A and Team B; Miles, number lines from 3 to 6]

The data for team A show a symmetric distribution. This means that the distances run are evenly distributed about the mean.
The data for team B show a distribution skewed to the left. This means that more than half the team members ran a distance greater than the mean.

(B) The table shows the number of days, over the course of a month, that specific numbers of apples were sold by competing grocers.

Number of Apples Sold    0    50    100    150    200    250    300
Grocery Store A          1    4     8      8      4      1      0
Grocery Store B          3    6     8      8      2      2      1

[Dot plots: Grocery Store A and Grocery Store B; Number of Apples Sold, number lines from 0 to 400]

The distribution for grocery store A is: left-skewed / right-skewed / symmetric. This means that the number of apples sold each day is evenly / unevenly distributed about the mean.
The distribution for grocery store B is: left-skewed / right-skewed / symmetric. This means that the number of apples sold each day is evenly / unevenly distributed about the mean.

Reflect
7. Will the mean and median in a symmetric distribution always be approximately equal? Explain.
The mean and median in a symmetric distribution will always be approximately equal because the values are equally distributed on either side of the center.

8. Will the mean and median in a skewed distribution always be approximately equal? Explain.
The mean and median in a skewed distribution will not always be approximately equal because the median will sometimes be closer to where the values cluster than the mean will be.

LANGUAGE SUPPORT
Connect Vocabulary
English learners who are working on acquiring academic English in algebra may find that some terminology is difficult to pronounce or to differentiate when listening. Words such as effect and affect may be difficult to distinguish, and words such as skew or interquartile may be difficult to pronounce. Be sure to enunciate clearly so that students can understand and learn to pronounce the key words correctly.

Your Turn
9. Sports  The table shows the number of free throws attempted during a basketball game. Make a dot plot and determine the type of distribution. Then explain what the distribution means for the data set.

Free Throws Shot       0    2    4    6    8
Members of Team A      2    2    4    2    2
Members of Team B      3    4    2    2    1

[Dot plots: Team A and Team B; Number of Free Throws, number lines from 0 to 8]

The data for team A show a symmetric distribution. This means that the number of free throws shot is evenly distributed about the mean.
The data for team B show a distribution skewed to the right. This means that fewer than half of the team members shot a number of free throws that was greater than the mean.

ELABORATE

QUESTIONING STRATEGIES
Can a data set have more than one outlier? Explain. Yes; more than one value may be less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR).

INTEGRATE MATHEMATICAL PRACTICES
Focus on Critical Thinking
MP.3 Discuss with students whether all the values in a data set could be outliers. Review the definition of outlier. Students should understand that because an outlier must be less than Q1 or greater than Q3, values between Q1 and Q3 will never be outliers for a data set.

SUMMARIZE THE LESSON
How can you determine whether a value in a data set is an outlier? How does the inclusion of an outlier affect the mean, median, range, and IQR?
An outlier is a value that is less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR). Outliers significantly affect the mean and range, but affect the median and IQR very little or not at all.

10. If the mean increases after a single data point is added to a set of data, what can you tell about this data point?
If the mean increases after a single data point is added to a set of data, you can tell that the added data point was greater than the mean of the original set.

11. How can you use a calculation to decide whether a data point is an outlier in a data set?
Find the first and third quartiles, and subtract the first quartile from the third quartile to get the interquartile range. If the data point is greater than the third quartile plus 1.5 times the interquartile range, or less than the first quartile minus 1.5 times the interquartile range, then the data point is an outlier.

12. Essential Question Check-In  What three shapes can data distributions have?
Data distributions can be skewed to the left, skewed to the right, or symmetric.

Exercise    Depth of Knowledge (D.O.K.)    Mathematical Practices
1–8         1 Recall                       MP.4 Modeling
9           1 Recall                       MP.5 Using Tools
10          2 Skills/Concepts              MP.7 Using Structure
11          1 Recall                       MP.5 Using Tools
12          2 Skills/Concepts              MP.7 Using Structure

EVALUATE
Evaluate: Homework and Practice

Fitness  The numbers of members in 8 workout clubs are 100, 95, 90, 85, 85, 95, 100, and 90. Use this information for Exercises 1–2.

1. Create a dot plot for the data set using an appropriate scale for the number line.
Possible plot shown.
[Dot plot: Number of Members; number line from 60 to 110, labeled by 10s]

2. Suppose that a new workout club opens and immediately has 150 members. Is the number of members at this new club an outlier?
150 > 97.5 + 1.5(97.5 - 87.5) = 112.5  True
150 members is an outlier.

Sports  The number of feet to the left outfield wall for 10 randomly chosen baseball stadiums is 315, 325, 335, 330, 330, 330, 320, 310, 325, and 335. Use this information for Exercises 3–4.

3. Create a dot plot for the data set using an appropriate scale for the number line.
Possible plot shown.
[Dot plot: Number of Feet; number line from 300 to 350, labeled by 10s]

4. The longest distance to the left outfield wall in a baseball stadium is 355 feet. Is this stadium an outlier if it is added to the data set?
355 > 330 + 1.5(330 - 320) = 345  True
355 feet is an outlier.

Education  The numbers of students in 10 randomly chosen classes in a high school are 18, 22, 26, 31, 25, 20, 23, 26, 29, and 30. Use this information for Exercises 5–6.

5. Create a dot plot for the data set using an appropriate scale for the number line.
Possible plot shown.
[Dot plot: Number of Students; number line from 16 to 36, labeled by 4s]

6. Suppose that a new class is opened for enrollment and currently has 7 students. Is this class an outlier if it is added to the data set?
7 < 22 - 1.5(29 - 22) = 11.5  True
7 is an outlier.

ASSIGNMENT GUIDE
Concepts and Skills                                    Practice
Explore: Using Dot Plots to Display Data               Exercises 1, 3, 5, 7
Example 1: The Effects of an Outlier in a Data Set     Exercises 2, 4, 6, 8, 17–18
Example 2: Comparing Data Sets                         Exercises 9–12
Example 3: Comparing Data Distributions                Exercises 13–16, 19

INTEGRATE MATHEMATICAL PRACTICES
Focus on Critical Thinking
MP.3 Understanding how outliers can affect the mean and the median of a data set is an important skill, especially for interpreting data. Discuss how statistics can be misleading when outliers that affect the mean value for a data set are included.

Exercise    Depth of Knowledge (D.O.K.)    Mathematical Practices
13–15       2 Skills/Concepts              MP.4 Modeling
16          1 Recall of Information        MP.5 Using Tools
17–18       3 Strategic Thinking           MP.3 Logic
19          2 Skills/Concepts              MP.4 Modeling

Sports  The average bowling scores for a group of bowlers are 200, 210, 230, 220, 230, 225, and 240. Use this information for Exercises 7–8.

7. Create a dot plot for the data set using an appropriate scale for the number line.
Possible plot shown.
[Dot plot: Bowling Scores; number line from 200 to 250, labeled by 10s]

8. Suppose that a new bowler joins this group and has an average score of 275. Is this bowler an outlier in the data set?
275 > 230 + 1.5(230 - 210) = 260  True
275 is an outlier.

MODELING
To help students think about possible causes for outliers in a data set, ask them to consider the distribution of heights of all the people in a kindergarten classroom, in a high school classroom, and on a basketball court. Discuss how many outliers might be expected in each case, and what factors might affect the number of outliers in each situation. Students should recognize that there is often a reason why one value is very different from the others in a data set, such as the fact that a kindergarten teacher may be the only adult in the classroom.

The tables describe the average ages of employees from two randomly chosen companies. Use this information for Exercises 9–10.
Company A: 23, 29, 35, 46, 51, 50, 42, 37, 30
Company B: 24, 23, 45, 45, 42, 52, 55, 47, 55

9. Calculate the mean, median, interquartile range (IQR), and standard deviation for each data set.

              Mean    Median    IQR (Q3 - Q1)    Standard deviation
Company A     38.1    37        18.5             9.27
Company B     43.1    45        20.5             11.33

10. Compare the data sets.
Employees at company A tend to be younger than employees at company B. The ages of employees at company A are closer together than the ages of employees at company B.

AVOID COMMON ERRORS
Make sure students understand the process for determining the standard deviation for a data set. Encourage them to first create a table to record the deviation and squared deviation for each data value, then add the squared deviations, divide the sum by the number of values, and finally find the square root. Suggest that when they do not record their work, students can easily overlook a step in the process.

The tables describe the sizes of microwaves, in cubic feet, chosen randomly from two competing companies. Use this information for Exercises 11–12.
Company A: 1.8, 2.1, 3.1, 2.0, 3.3, 2.9, 3.3, 2.1, 3.2
Company B: 1.9, 2.6, 1.8, 3.0, 2.5, 2.8, 2.0, 3.6, 3.1

11. Calculate the mean, median, interquartile range (IQR), and standard deviation for each data set.

              Mean    Median    IQR (Q3 - Q1)    Standard deviation
Company A     2.6     2.9       1.2              0.59
Company B     2.6     2.6       1.1              0.57

12. Compare the data sets.
Microwaves from company B tend to be smaller than microwaves from company A. The sizes of microwaves tend to be closer together at company B than at company A.

For each data set, make a dot plot and determine the type of distribution. Then explain what the distribution means for each data set. Possible plots shown.

CRITICAL THINKING
Have students analyze and describe the shape of the distribution of a dot plot they created. Ask students how the shape relates to the statistics they would use to characterize the data.

13. Sports  The data table shows the number of miles run by members of two teams running a marathon.

Miles                  5    10    15    20    25
Members of Team A      3    5     10    5     3
Members of Team B      6    10    4     1     5
[The column order of the Team B values is unclear in the source.]

[Dot plots: Team A and Team B; Miles, number lines from 5 to 30]

The data for team A show a symmetric distribution. The distances run are evenly distributed about the mean.
The data for team B show a right-skewed distribution. This means that fewer than half of the team members ran a distance greater than the mean.

14. Sales  The data table shows the number of days that specific numbers of turkeys were sold. These days were in the two weeks before Thanksgiving.

Number of Turkeys      10    20    30    40
Grocery Store A        2     5     5     2
Grocery Store B        5     5     1     3

[Dot plots: Grocery Store A and Grocery Store B; Number of Turkeys, number lines from 0 to 50]

The data for grocery store A show a symmetric distribution. This means that the numbers of turkeys sold per day are evenly distributed about the mean.
The data for grocery store B show a right-skewed distribution. This means that the store sold fewer than the average number of turkeys for more than half of the days.

15. State whether each set of data is left-skewed, right-skewed, or symmetrically distributed.
A. 3, 5, 5, 3  symmetric
B. 1, 1, 3, 1  right-skewed
C. 7, 9, 9, 11  symmetric
D. 5, 5, 3, 3  symmetric
E. 19, 21, 21, 19  symmetric

JOURNAL
Have students create their own graphic organizers to share with classmates, outlining the steps for finding mean, median, Q1, Q3, IQR, and standard deviation from a dot plot.

H.O.T. Focus on Higher Order Thinking

16. What If?  Given the data set 8, 15, 12, 10, and 5, what happens to the mean if you add a data value of 40? Is 40 an outlier of the new data set?
The mean increases from 10 to 15. 40 is an outlier of the new data set because 40 > 25.5.

17. Critical Thinking  Can an outlier be a data value between Q1 and Q3? Justify your answer.
An extreme value such as the maximum or minimum value can be an outlier, but by definition, no value between Q1 and Q3 can be an outlier.

18. Justify Reasoning  If the distribution has outliers, why will they always have an effect on the range?
When present, outliers will always have an effect on the range, because one of the outliers will be either the highest or the lowest number in the data set, and the range is the difference between the highest and lowest numbers.

19. Education  The data table describes the average testing scores in 20 randomly selected classes in two randomly selected high schools, rounded to the nearest ten. For each data set, make a dot plot, determine the type of distribution, and explain what the distribution means in context.

Average Scores    0    10    20    30    40    50    60    70    80    90    100
School A          0    1     2     2     3     4     3     2     2     1     0
School B          0    1     1     1     2     4     5     4     2     0     0

[Dot plots: School A and School B; Test Scores, number lines from 0 to 100]

The data for school A show a symmetric distribution. This means that the test scores were evenly distributed about the mean test score.
The data for school B show a left-skewed distribution. This means that more than half of the classes received a test score that was above the mean.

Lesson Performance Task
The tables list the daily car sales of two competing dealerships.

Dealer A: 14, 13, 15, 12, 15, 16, 15, 17, 17, 12, 16, 14, 15, 16, 14, 16, 13, 14, 18, 15
Dealer B: 16, 17, 15, 20, 18, 19, 18, 17, 19, 10, 19, 18, 15, 17, 20, 19, 18, 18, 16, 17

INTEGRATE MATHEMATICAL PRACTICES
Focus on Reasoning
MP.2 Ask students whether the dealer who tended to sell more cars than a competitor would necessarily make the greater profit. Students should recognize that a greater number of car sales leads to a greater profit only when the profit per car is about the same in both cases.
If one dealer sold more cars by setting the prices so low that there was a very small profit margin, that dealer could end up with lower profits despite having more sales.

A. Calculate the mean, median, interquartile range (IQR), and standard deviation for each data set. Compare the measures of center for the two dealers.

            Mean    Median   IQR (Q3 – Q1)   Standard deviation
Dealer A    14.85   15       2               1.6
Dealer B    17.3    18       2.5             2.2

The number of cars sold by Dealer A tends to be lower than the number of cars sold by Dealer B. The number of cars sold by Dealer A is more consistent than the number of cars sold by Dealer B.

QUESTIONING STRATEGIES What might be some reasons for an outlier to occur in a set of daily car sale values? Possible answers: There might have been a day with very bad weather, so no one went car shopping, or a day when the best salespeople were out sick, so they didn't sell any cars.

B. Create a dot plot for each data set. Compare the distributions of the data sets.

[Dot plots: Dealer A and Dealer B; number line from 10 to 20]

The data for Dealer A show a symmetric distribution, so the number of cars sold daily by Dealer A is evenly distributed about the mean. The data for Dealer B show a distribution skewed to the left, so during more than half of the days, car sales were greater than the mean.

C. Determine if there are any outliers in the data sets. If there are, remove the outlier and find the statistics for that data set(s). What was affected by the outlier?

Dealer A:
x < 14 - 1.5(2)    x > 16 + 1.5(2)
x < 11             x > 19
There are no values in the data set that satisfy these inequalities for x. So, there are no outliers.

Dealer B:
x < 16.5 - 1.5(2.5)    x > 19 + 1.5(2.5)
x < 12.75              x > 22.75
10 is an outlier in the data set for Dealer B.
Removing the outlier increases the mean and decreases the standard deviation. The median is unaffected.

EXTENSION ACTIVITY Explain to students that a bimodal data distribution has two peaks. Have students create a set of 20 daily car-sale values with a bimodal distribution, then create a dot plot and calculate statistics for the data. Ask what situations might produce this distribution. Students may speculate that a sudden change in sales tactics or prices could lead to several days with much higher or lower sales values than preceding days. Point out that neither the mean nor the median accurately represents a bimodal distribution. Explain that in some cases, such as when the data originate from two different sets of conditions, it is appropriate to split the data into two data sets and evaluate them separately.

Scoring Rubric
2 points: Student correctly solves the problem and explains his/her reasoning.
1 point: Student shows good understanding of the problem but does not fully solve or explain his/her reasoning.
0 points: Student does not demonstrate understanding of the problem.
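The Lesson Performance Task answers can be checked with a short script. The following is a minimal sketch, not part of the lesson. It assumes that the 40 printed sales values alternate in groups of four between Dealer A and Dealer B (a split that reproduces the stated means of 14.85 and 17.3), that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, and that the standard deviation is the population standard deviation. The helper names quartiles and summarize are hypothetical.

```python
from statistics import mean, median, pstdev


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when the number of values is odd, the middle value
    is left out of both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(lower), median(upper)


def summarize(values):
    """Compute center, spread, and outliers using the 1.5 * IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "median": median(values),
        "iqr": iqr,
        "sd": pstdev(values),  # population standard deviation (assumption)
        "outliers": [x for x in values if x < low_fence or x > high_fence],
    }


# Assumed split of the printed table into the two dealers' data sets.
dealer_a = [14, 13, 15, 12, 15, 16, 15, 17, 17, 12,
            16, 14, 15, 16, 14, 16, 13, 14, 18, 15]
dealer_b = [16, 17, 15, 20, 18, 19, 18, 17, 19, 10,
            19, 18, 15, 17, 20, 19, 18, 18, 16, 17]

stats_a = summarize(dealer_a)
stats_b = summarize(dealer_b)
print("Dealer A:", stats_a)
print("Dealer B:", stats_b)

# Part C: remove any outliers from Dealer B and recompute the statistics.
trimmed_b = [x for x in dealer_b if x not in stats_b["outliers"]]
stats_b_trimmed = summarize(trimmed_b)
print("Dealer B without outliers:", stats_b_trimmed)
```

Under these assumptions, the rounded results should agree with the table in part A, and comparing stats_b with stats_b_trimmed shows which statistics change when the outlier is removed.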