Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Math 6 Notes – Unit 09: Statistics and Probability In this unit, students will learn to identify statistical and non statistical questions. They will draw dot plots, histograms and other graphical representations to visually display data sets. Then students will calculate measures of center and also analyze the spread of data sets by drawing box plots, finding the interquartile range, and finding the mean absolute deviations. NVACS 6.SP.A.1 - Recognize a statistical question as one that anticipates variability in the data related to the question and accounts for it in the answers. In order to achieve the goal in CCSS 6.SP.1, students must first understand what a “statistical question” is and what it is not. Simply put, a statistical question is one that can be answered (in a variety of ways) numerically. A statistical question anticipates an answer that varies from one individual to another; in doing so, the responses to a statistical question result in a set of data. Data are the numbers produced in response to a statistical question. If answers to a statistical question do not predict variability, then the question is not statistical. For example, a student asking themselves, “How old am I?” is not a statistical question; the answer is predictable. Example 1: Students should begin by simply “recognizing” statistical questions. Teachers should present a group of both statistical and non-statistical questions to students. Student teams or cooperative groups are allowed time to decide whether each question is statistical or non-statistical. Then team responses should be shared classroom wide. Questions - Statistical or not? a. How many pets do each of your teachers own? b. How old is the oldest member of a household? c. What is your classmate’s favorite flavor of ice cream? d. How many times does a sixth grader eat (on average) each day? e. What lunch item is served every day on “Pizza Day”? Questions a, b, and d are statistical; they can be responded to numerically and the questions will have varied responses. Questions c and e are non-statistical questions. Question c is not a numerically statistical question because students cannot respond numerically to the question. Question e is not statistical because the answer would be, predictably, pizza. Additionally, the answer is not numerical. Example 2: Teachers should provide students with slides of questions (either via a power point or Smartboard) to enhance student ability to identify “statistical” questions. Students will respond individually to whether the question is statistical or not with a thumbs up or thumbs down. If the question is statistical, students will be instructed to put their thumb up; if not, then Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 1 of 64 they will put their thumb down. After students have the opportunity to respond, teachers should expand on WHY the questions are either statistical or not. Possible questions: At what age do children learn to ride a 2-wheel bicycle? What are the favorite colors of the students in the class? Whom do you admire most? What time do you wake up on school days? What time does your school begin? How many pairs of shoes do you currently own? Which pair of shoes is your favorite? Next, teachers should allow students (or student groups) the opportunity to formulate both statistical and non-statistical questions. Teachers should instruct students (or student teams) to divide their notebook paper into two columns (one labeled “statistical”, one labeled “nonstatistical”). Individuals (or teams) should be given time to brainstorm examples of statistical or non-statistical questions. Student examples should be shared orally with the class with explanations provided by the students. STUDENT REFLECTION: What makes a question numerically “statistical”? Use the word “variability” in your response. Provide an example (with possible responses) to support your view. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 2 of 64 NVACS - 6.SP.A.2 - Understand that a set of data collected to answer a statistical question has a distribution which can be described by its center, spread, and overall shape. Dot plots (also known as line plots) can be used as a simple means for students to visualize data. Example: The number of various kinds of snakes found in a zoo is shown in the dot plot (below). What is the total number of snakes in the zoo? • • • • Black Cobra Number of Snakes in a Zoo • • • • • • • • • • • • • • • • • • • • • • • • • Python Anaconda Rattle Snakes Green Cobra Types of snakes A. B. C. D. 29 25 30 34 In addition to visualizing data, we use dot plots to view the distribution, center, spread, and overall shape of the data. Dot Plot A (below) shows the writing rubric scores of students based upon “organization”. Dot Plot B shows the writing rubric scores of the same group of 30 students based upon “ideas”. Dot Plot A – Organization X X X X X X X 1 Math 6, Unit 09 X X X X X X X X 2 X X X X X X X X 3 X X X X X X X 4 5 Statistics and Probability 2014-15-NVACS Page 3 of 64 Dot Plot B – Ideas X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 1 2 3 4 5 Showing the two dot plots vertically allows students to make direct comparisons in regard to the center of the graph, its spread, and its overall shape. Even though students have not been formally introduced to the mathematical definitions of mean, median, mode, and range, informal discussion about the center and spread of the data and picture can be discussed. Questions teachers might ask include: Where is the overall center of the data in graph A versus graph B? How is the shape of the graph different in graph B? What do you think causes this difference? Where are the scores “clustered” in graph A? Graph B? In which graph is data more spread out? Support your answer. Where is the actual middle of the set of data for A and B? Example 1: Students were asked their shoe size. The data was then collected and a dot plot was produced (with guidance from the teacher). An example dot plot is seen below. Shoe sizes for 6th Graders X X X X X X X X 4 5 6 Math 6, Unit 09 X X X X X X X X X 7 X X X X X X X X X X X X X 8 9 10 11 12 Statistics and Probability 2014-15-NVACS Page 4 of 64 Discussion questions: How many students responded to the question about their shoe size? How would you describe the “shape” of the graph? What does that shape indicate? Locate and describe the “center” of the data. That center is like the _______________? Most students have a shoe size around 6, 7, or 8; the data is “clustered” there. This means that if I were to calculate the measures of central tendency……… The data is not very spread out. Why do you think that is so? What does this tell us about the measures of variability? STUDENT REFLECTION: What does the shape of the scores have to do with the sameness or differences in their values? Explain. Does the range of the data affect its shape? Additional Examples: Teachers can also collect student data based upon other survey questions. Students can participate in the collection, organization, and representation of the data via a dot plot (line plot). Some possible example questions are: How many pets do your classmates own? At what age did your classmates begin riding a 2 wheel bicycle? How many siblings do your peers have? How much do the students in our math class weigh? How tall are the students in your math class? Each of these questions can be utilized by teachers and students to collect, organize, and display data. At that point students should be challenged to use their graphic representation of the data (line plot) to draw generalizations about the data set and its shape and spread and obvious “clusters”. Student Extension: For more advanced level students, teachers can show a set of data graphically represented on a dot plot. Teachers along with their students can describe the shape of the data and its overall spread. Teachers can then challenge student groups to hypothesize what the data could possibly and logically represent. In other words, students will be asked to identify the survey population (if any) and formulate a possible survey question that would result in that set of data. Now that students understand what statistical questions are they need to be given opportunities to gather data and make graphical representation of their data. It is necessary to show students how to create the various types of graphical representations as well as their benefits and limitations. To begin working with data, we must first understand the two types of data we will encounter in this unit—categorical data and numerical data. Categorical data consists of names, labels or other non- Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 5 of 64 numerical values such as movie preferences, types of animals, types of advertising, colors of medals given to winners of sporting events, etc. Categorical data is usually displayed in circle graphs and bar graphs. Numerical data consists of numbers such as weights of animals, lengths of rivers, heights of waterfalls, rainfall in a particular city, etc. Numerical data is displayed in histograms, line graphs, scatter plots, stem-and-leaf plots, and box-and-whisker plots. As students begin to work with data displays, it is important that they choose an appropriate display for the data set. Below is a quick summary of each of the displays students will encounter in this unit. Visual Display Usage/strength bar graph to compare categorical data. This graph type has no center and no spread. box-andwhisker plot to organize numerical data into four groups of approximately equal size. Used for large sets of data; appropriate for comparing two or more sets of data in terms of spread and skewness. histogram line graph 6 7 0359 48 Interval Key 4 2 = 42 Tally stem-and-leafplot to organize numerical data based on their digits frequency table to organize numerical data according to the number of times the item occurs Frequency 0-9 15 10 - 19 7 20 - 29 1 30 - 39 6 Math 6, Unit 09 to compare frequencies of numerical data that fall in equal intervals. Appropriate for displaying large sets of data or data sets with a large range. Strength is that it allows data to be manipulated by varying the intervals; gives an overall picture of the data by intervals without highlighting specific pieces of data within the set to display numerical data that change over time Statistics and Probability 2014-15-NVACS Page 6 of 64 X X X X X X 0 1 X X X X X 2 3 4 5 6 X X 7 8 Venn diagram To show how data are grouped or belong together. Dot plot (line plot) To display numerical data. Good to use for small to moderate sets of data with small to moderate range. Strength is that it is easy to create, highlights the distribution including clusters, gaps, and outliers DOT PLOTS (LINE PLOTS) Dot plots (line plots) is the most basic type of graph for representing data was featured in previous examples. It gives students a good visual representation of the shape of the data, where the center might be, and a visual of the data’s variability or spread. However, it is not very useful when there are a lot of data pieces because it can be cumbersome to create. FREQUENCY TABLE and HISTOGRAM The frequency of a data value is the number of times it occurs in a data set. How can you tell the frequency of a data value by looking at a dot plot? It is sometimes more convenient to show data that has been divided into intervals than to display individual data values. A Histogram is a type of bar graph whose bars represent the frequencies of data within intervals. Example: Make a histogram for the data given: 12, 3, 8, 1, 1, 6, 10, 14, 3, 6, 2, 1, 3, 2, 7 First, make a frequency table: Interval 1-4 5-8 9-12 13-16 Tally Frequency How many data values are in this interval? 8 4 2 1 A histogram is made up of adjoining vertical rectangles or bars. In a histogram the horizontal axis typically identifies the topic of the graph and the vertical axis describes the frequency of those observations. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 7 of 64 Age of Children at the Park 9 8 # of children 7 6 5 4 3 2 1 0 1-4 5–8 9 - 12 Age Groups 13 - 16 STEM-AND-LEAF PLOT The following test scores are used to construct a stem-and-leaf plot: 82, 97, 70, 72, 83, 75, 76, 84, 76, 88, 80 81, 81, 82, 82 First determine how the stems will be defined. In this case, the stem will represent the tens column in the scores, the leaf will be represented by the ones column. When the information is presented, it will be in two parts, the stem and leaf. For instance, 5 | 7 4 would be read as follows: The stem represents fifty, and the leaf has two scores, 7 and 4. Reading that information then gives 57 and a 54. Student’s Test Scores Since the lowest score is in the 70’s and the highest is in the 90’s, the stem will consist of 7, 8, and 9. Usually, the smaller stems are placed on top, but they can be arranged from largest to smallest. Another decision to be made is whether or not to put the scores in order in the leaf portion. Stems 7 8 9 Leaves 0 2 5 6 6 2 3 4 8 0 1 1 2 2 7 Key: 8|1 = 81 If the stem-and-leaf plot were to be rotated 90 degrees (a quarter turn), the graph would resemble a bar graph (which leads to the next type of graph to discuss). Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 8 of 64 BAR GRAPH Using the same information above, let’s construct a bar graph to show how many A, B, C, D, and F’s there are. A’s are defined as 90 and above, B’s from 80 to 89, C’s 70 to 79, etc. Sometimes, placing bars side-by-side in pairs makes it easier to display the kinds of comparisons you want to show. This is called a double-bar graph. Let’s compare the grades earned in the class discussed so far to another classroom of students. Both the old data and the new data is summed up in the table below: First classroom Second classroom Number of A’s 1 3 Number of B’s 9 7 Number of C’s 5 5 The double bar graph could look like: Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 9 of 64 POINTS ON THE COORDINATE PLANE (which helps with creating grids for line graphs and scatter plots) A coordinate plane is formed by the intersection of a horizontal number line (called the x-axis) and a vertical number line (called the y-axis). It consists of infinitely many points called ordered pairs. The intersection of the two axes is called the origin. The coordinate plane is divided into six parts: the xaxis, the y-axis, Quadrant I, Quadrant II, Quadrant III, and Quadrant IV. Quadrant II y-axis y Quadrant I 6 5 4 3 2 1 -6 -5 -4 -3 -2 -1 -1 -2 -3 -4 -5 -6 x-axis 1 2 3 4 5 6 x Quadrant III Quadrant IV The axes divide the plane into four regions called quadrants which are numbered I, II, III, and IV in the counterclockwise direction. An ordered pair is a pair of numbers that can be used to locate a point on a coordinate plane. The two numbers that form the ordered pair are called the coordinates. The origin is the intersection of the x- and y-axes. It is the ordered pair (0, 0). The ordered pairs are listed in alphabetical order; (x, y) – (input, output), (horizontal axis, vertical axis), (abscissa, ordinate), (domain, range). The alphabetical order will help you remember these terms. y To find the coordinates of point A in Quadrant I, start from the origin and move 2 units to the right, and up 3 units. Point A in Quadrant I has coordinates (2, 3). To find the coordinates of point B in Quadrant II, start from the origin and move 3 units to the left, and up 4 units. Point B in Quadrant II has coordinates (−3, 4). B 6 5 4 3 2 1 -6 -5 -4 -3 -2 -1 -1 -2 C -3 -4 -5 -6 A 1 2 3 4 5 6x D To find the coordinates of the point C in Quadrant III, start from the origin and move 4 units to the left, and down 2 units. Point C in Quadrant III has coordinates (−4, −2). To find the coordinates of the point D in Quadrant IV, start from the origin and move 2 units to the right, and down 5 units. Point D in Quadrant IV has coordinates (2, −5). Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 10 of 64 y 5 Notice that there are two other points on the above graph, one point on the x-axis and the other on the y-axis. For the point on the x-axis, you move 3 units to the right and do not move up or down. This point has coordinates of (3, 0). For the point on the y-axis, you do not move left or right, but you do move up 2 units on the y-axis. This point has coordinates of (0, 2). Points on the x-axis will have coordinates of (x, 0) and points on the y-axis will have coordinates of (0, y). -5 5 x -5 To plot points in the coordinate plane, you must follow the direction given. The first number in an ordered pair tells you to move left or right along the xaxis. The second number in the ordered pair tells you to move up or down along the y-axis. Example: To plot the point (4, 2), we start at the origin and move four units to the right and then move two units up. When you make a graph, the first thing you do is choose the axes. Next, you choose the scale— the numbers running along a side of the graph. The difference between numbers from one grid line to another is the interval. The interval will depend on the lowest and highest values in your data. When you can, you should choose scales with the numbers starting at zero and increasing by ones or other convenient equal intervals. LINE GRAPHS A line graph displays a set of data using line segments. This graph shows data that has changed over time. Example: The following data represents the monthly class average over a 12 month period. 82, 97, 70, 72, 83, 75, 76, 84, 76, 88, 80, 81 The data can be arranged on a line graph: Class Averages 100 95 90 85 80 75 70 65 60 55 J F M A M J J A S O N D The advantage of a line graph is that it shows changes in data over a period of time. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 11 of 64 Drawing Conclusions from Data When looking at results, you must consider several questions like: Is the data display misleading? What is the margin of error? Are the conclusions supported by the data? Margin of error of a random sample defines an interval centered on the sample percent in which the population percent is most likely to lie. Example: A sample percent of 27% has a margin of error of ±4%. Find the interval in which the population percent is most likely to lie. 27% − 4% = 23% and 27% + 4% = 31% . The interval is between 23% and 31%. If p % of a sample gives a particular response and the sample is representative of the population, then: # of people in the population p % ⋅ ( # of people in population ) = giving the response Example: A survey of 300 randomly selected cat owners finds that 120 cat owners prefer Brand C cat food. How many owners in a town of 2000 cat owners prefer Brand C cat food? The percent of cat owners in the sample that preferred Brand C is 120 = 40% ; 40% of 2000 = 800 cat owners 300 Example 1: Which of the following graphs most accurately depicts the hourly wages earned with respect to time worked? B C Hours Worked Hours Worked D Earnings Earnings Earnings Earnings A Hours Worked Hours Worked Discussion: Students should recognize that if one is paid by the hour, then only Graph C could illustrate it. A person would not earn money at no (zero) hours as shown in graph A. Looking at the other graphs, you could rule out B as a person would not earn the same amount for different amounts of hours worked. Graph D shows a person earning less money as the person works more hours. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 12 of 64 Example 2: The graphs below show the numbers of baskets made by Player A and Player B during 5 basketball practices. Each player takes 100 practice shots during each practice. According to the graphs, who was more successful at making baskets? Player B Player A 100 80 Number of Baskets Number of Baskets 100 80 60 40 20 A. Player A did much better. B. Player B did much better. C. Their scores appear to be about the same. D. More information is needed. 80 78 60 76 40 74 20 72 0 0 0 1 2 3 4 0 5 1 Practice 2 3 4 5 Practice Discussion: In this problem, there will be students who think Player B has scored the most baskets because of the steepness of the linear segments. Some students will think Player A scored the most baskets because his line segments looks consistently higher. They need to look carefully at the vertical scaling and note that Player A started about 70-71 and ended at 80, the same as Player B. (Answer is C.) Example 3: Why is this graph misleading? Season Attendance (in thousands) # of spectators (thousands) 50 The heights of the balls are used to represent the number of spectators. However, the area of the balls distorts the comparison. 40 30 The attendance for both sports doubled in the 10 year period; the size of the basketball makes it look like that increase may have been a lot more. 20 10 0 0 1990 2000 1990 2000 Year Example 4: Why is this graph misleading? The two bar graphs shown here picture the advertised gasoline mileage for four cars. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 13 of 64 The graph on the right distorts the differences in mileage because its left-hand scale does not start with 0. In reading a graph, always check to see that numbered scales start with 0. NVACS – 6.SP.B.4 - Display numerical data in plots on a number line, including dot plots, histograms, and box plots. Box-and-Whisker Plot The purpose of a box-and-whisker plot is to organize numerical data into four groups of approximately equal size. Let’s take a look at what might be a way of giving notes to your students to help them learn the steps for creating a box-and-whisker plot. As you demonstrate the example in the third column, students are working with you taking the notes. Column 1 allows for guided, group or independent practice once they have an idea how to make a boxand-whisker plot. You try: Making a Box & Whisker Plot Ex: Steps 5 10 7 9 8 6 11 10 12 8 14 16 16 11 13 11 15 8 1. Arrange data in increasing order (minimum value & maximum value are the endpoints). 2. Find median of the entire list (median value). 5 6 7 8 9 10 11 5 6 7 8 9 10 11 a. If there is a number in the list that is the middle term, circle it and draw a line thru it. (median) 8 is the median 5 6 7 8 9 10 11 b. If there is not a number that is in the middle, draw Does not apply to this problem a line between the two numbers. (Median is the number halfway between Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 14 of 64 the two numbers.) 3. Look at the bottom half of the numbers. Find the median of the bottom half of numbers (lower quartile, same as #2 above). 4. Look at the top half of the numbers. Find the median of the upper half of numbers (upper quartile, same as #2 above). 5. Draw a number line that will cover the range of data (evenly spaced marks). 5 6 7 9 10 11 4 6 8 10 12 14 16 4 6 8 10 12 14 16 6. Slightly above the number line place dots at the following points: minimum, lower quartile, median, upper quartile, and maximum. 7. Draw a box with side borders being the lower and upper quartiles. Draw two lines, one from each side of the box, connecting the minimum point on one side and the maximum point on the other side. Draw a vertical line at the median point from the top to the bottom of the box. lower quartile median 6 8 10 12 upper quartile maximum minimum 4 4 6 8 10 12 14 16 The data in the box represents the interquartile range – IQR, the average, the middle 50%. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 15 of 64 14 16 The whisker on the left represents the bottom quartile, the bottom 25%; the whisker on the right represents the top 25%. The difference between the upper and lower quartiles is called the “interquartile range” (IQR). A statistic useful for identifying extremely large or small values of data is called an “outlier”. An outlier is commonly defined as any value of the data that lies more than 1.5 IQR units below the lower quartile or more than 1.5 IQR units above the upper quartile. As we get more in depth with these plots we will find that an extreme value is commonly defined as any value of the data that lies more than 6 IQR units below the lower quartile or more than 6 IQR units above the upper quartile. In our example the lower quartile was at 6, the upper at 10. Using that the IQR = . Multiplying that by 1.5, we have . Therefore, any score below 6 − 6 = 0 is an outlier, as is any score above There are no points below 0, so we are OK on the left. There are no points greater than 16, so we are OK on the right. This would be an ideal place to use technology. Next are the instructions for drawing a box-and-whisker plot on the TI-84. The examples will address outliers. Entering Data and Drawing a Box-and-Whisker plot on the TI-84 1. STAT 2. EDIT 3. Enter the numbers in List 1 (L1) or List 2 (L2) 4. Return to the home screen 5. STAT PLOT modified regular 6. Turn Stat Plot 1 on and select the type of boxplot (modified or regular) 7. ZOOM 8. ZoomStat (9) Modified 9. GRAPH Regular The TI-84 graphing calculator may indicate whether a box-and-whisker plot includes outliers. One setting on the graphing calculator gives the regular box-and-whisker plot which uses all numbers, so the furthest outliers are shown as being the endpoints of the whiskers. Another calculator setting (modified) gives the box-and-whisker plot with the outliers specially marked (in this case, with a simulation of an open dot), and the whiskers going only as far as the highest and lowest values that aren't outliers. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 16 of 64 Find the outliers and extreme values, if any, for the following data set, and draw the boxand-whisker plot. Mark any outliers with an asterisk and any extreme values with an open dot. 20, 21, 21, 23, 23, 24, 25, 25, 26, 27, 29, 33, 40 To find the outliers and extreme values, I first have to find the IQR. Since there are thirteen values in the list, the median is the seventh value, so Q2 = 25. The first half of the list is 20, 21, 21, 23, 23, 24, so Q1 = 22; the second half is 25, 26, 27, 29, 33, 40 so Q3 = 28. Then IQR = 28 – 22 = 6. The outliers will be any values below 22 – 1.5×6 = 22 – 9 = 13 or above 28 + 1.5×6 = 28 + 9 = 37. The extreme values will be those below 22 – 3×6 = 22 – 18 = 4 or above 28 + 3×6 = 28 + 18 = 46 Another example: L2 =21, 23, 24, 25, 29, 33, 49 So I have an outlier at 49 but no extreme values, so I won't have a top whisker because Q3 is also the highest non-outlier, and my plot looks like this: Venn Diagrams Venn diagrams are used to show relationships among sets of objects. Let’s look at a Venn Diagram made up of three sets in which the regions are labeled. We’ll describe each region. U 1 3 2 B A 5 4 6 7 C Math 6, Unit 09 Statistics and Probability 2014-15-NVACS 8 Page 17 of 64 • • • • Region 5 is in all three circles. So any elements in region 5 would belong to all three sets. What about Region 2? Those are the elements in A and B, but not C. How might you describe Region 6? Those are elements in B and C, but not in A. Try Region 4. The elements in A and C, but not B. Let’s look at some more regions. Region 1 describes the elements in A only. What about region 3? Those elements are only in B. Region 7 then would be the elements in C only. Region 8 would describe elements that are not members of any of the sets, but belong to the universal set. It’s important that you become familiar with how each of those regions might be described. Being able to describe those regions would allow you to solve some problems. Let’s try a problem. Example: A survey was taken of 650 university students. It was reported that 240 were taking math, 290 were taking biology, and 270 were enrolled in chemistry. Of those students, 80 were taking biology and math, 70 were taking math and chemistry, 60 were taking biology and chemistry, and 50 were taking all three classes. How many students took math only? A Venn Diagram has been drawn to help show the problem. 140 30 200 B M 50 20 10 190 C How many students took only math? Other questions you can answer: How many students took math and biology, but not chemistry? How many students took math and chemistry, but not biology? Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 18 of 64 How many students took biology and chemistry, but not math? How many students took exactly two of the courses? Now that students make a graphical representation of the data and discuss the general shape, center and spread of the data, it is time to move them to be able to compute measure of center and variability. 3 Measures of Central Tendency: 1. Mean 2. Median 3. Mode MEAN Students need to conceptually understand the concept of mean. Be sure to give them time to experience that mean is an “equal distribution” or everyone getting a “fair share”. 1. One way to do this is to bring a large bag of treats (eg. M&M’s, Jolly Ranchers, etc.) and distribute them to students at random in varying amounts. The students who get nothing will usually begin to comment or complain that they didn’t get any, another will say they didn’t get their share, others will say they got less than their neighbor . This sets the stage for a great discussion on “equal distribution” or “fair share” where everyone gets the same amount. 2. Another way to do this is give a problem and have students solve it using unifix cubes, blocks, chips, etc. Example: The table shows the number of students absent from 4th period last week. What was the mean number of students absent per day? Day # of students absent Monday 2 Tuesday 5 Wednesday 2 Thursday 1 Friday 5 1. Have students model the data using their manipulatives. 2 Math 6, Unit 09 5 2 1 5 Statistics and Probability 2014-15-NVACS Page 19 of 64 Have students “even out” or “equally distribute” the counters until each column has the same number of counters. (remind students they have 5 columns, Monday – Friday, and must have 5 columns in the end.) 3 3 3 3 3 Example: With large groups or whole class, have each student create a stack (of unifix cubes) that represents the number of letters in their last name. (eg. Long = ) Have students display their stacks together. Have students compute the mean of the letters in their classes last names using only the unifix cubes and redistributing them or evening them out. Of course you must be ready to discuss remainders and what they represent. Let’s say the last names you were working with looked like this: Once distributed it might look like this. Since each of the columns are 6 blocks high and a few extra, the mean is 6 and something. What do we do with the 3 extra blocks? Since we have 3 extra blocks and 5 columns our mean is 6 3/5. Procedurally, the mean is the one that is probably most familiar; it’s the one often used in school for grades. To find the mean, you simply add all the scores and divide by the number of scores. For example, if a student scores 70, 80, and 90 on three tests, the mean is calculated as follows: add the three scores, 70 + 80 + 90 = 240, then divide the sum by the number of data pieces→ 240/3 = 80. The mean is 80. The average is 80. Example: Find the mean of 72, 65, 93, 85, and 55. First add those 5 scores together; 72 + 65 + 93 + 85 + 55 = 370 Second, divide that total by the number of scores, 370 ÷ 5 = 74. The mean is 74. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 20 of 64 Example: Five kids just finished bowling one game. The average score of the five kids is 82. What is the total of all 5 scores? Having a mean of 82 does not mean each kid scored an 82—it means if the scores were distributed equally, they would each have 82. Their total score is 82 ⋅ 5 = 410. When thinking of the mean, you need to think of the TOTAL units being distributed EQUALLY. Problem: If the mean is 6, find the missing value for the set of numbers 3, 4, 5, , 9. Method 1 Looking at the data given we know the mean is 6 and there are 5 data in the set. So the total sum of the data must be 6 x 5 = 30. Adding the data that we have 3 + 4 +5 + 9 = 21. =9 The missing value must be 30 – 21 = 9. Method 2 Knowing the mean is 6, examine each piece of data given with reference to the mean being 6. 3 4 5 9 -6 -6 -6 -6 -3 -2 -1 +3 Summing the differences you get -3. Since you are 3 short (negative) add 3 to the mean, 6 + 3 = 9. =9 Method 3 3 + 4 + 5 + x + 9 = 30 6 x + 21 = 30 x=9 Math 6, Unit 09 OR 3 + 4 + 5 + x +9 = 6 + 6 + 6 + 6 + 6 x+ 21 = 30 x=9 Statistics and Probability 2014-15-NVACS Page 21 of 64 MEDIAN The median, often used in finance, is the middle score when the data is listed in either ascending or descending order. If there is no middle score, take the two middle scores, add them and divide by 2. It’s also referred to as the average of the two middle scores. Example: Find the median of 72, 65, 93, 85, and 55. Step 1: Rewrite the data in ascending order: 55, 65, 72, 85 and 93. Step 2: The middle score is 72; therefore, the median is 72. Notice in the first two examples the data is the same, but the mean and median are not the same. Example: Find the median of 72, 65, 93, 85, 74, and 55. Step 1: Step 2: Step 3: Rewrite the data in ascending order: 55, 65, 72, 74, 85, 93 Notice there is no middle score. Add the two middle scores together and divide by 2. 72 + 74 = 146, 146 ÷ 2 = 73. The median is 73. When thinking about the median, you need to think of the MIDDLE score if they are listed in ORDER. MODE MODE The mode is the value point that appears most frequently. Example: The following are test scores: 55, 64, 64, 76, 78, 81, 81, 81, and 92. Find the mode. The score that appears most often is 81. When thinking about the mode, you need to think of the score that APPEARS MOST FREQUENTLY. Note: a distribution may have no mode, one mode or more than one mode. Example 1: Find the mode of 55, 64, 64, 76, 78, 81, 81, 81, and 92. What scores appears most often? The mode is 81. Example 2: Find the mode of 8, 9, 11, 14, 15, and 17. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 22 of 64 What scores appears most often? None of these, so there is no mode. Example 3: Find the mode of 17, 15, 15, 14, 14, 11. What scores appears most often? Both 15 and 14 appear twice, so the modes are 15 and 14. An outlier – an extreme value that is much smaller than or much larger than the rest of the data in a data set – can greatly affect the mean. It is important to note “which” measure of central tendency is “most” useful to use. Measure Most useful when mean the data are spread fairly symmetrically without outliers median the data set has an outlier the data involve a subject in which many data points of one value mode are important, such as election results These three measures of central tendency, most often referred to as averages, describe a set of data using a single number. By condensing the information like that, the whole picture may not be seen. The following example shows how using an average or mean to describe a performance may be misleading. Let’s say Abe, Ben and Carl each bowl 3 games. Three games later, they all found they had a mean of 80. Here are the scores for each person: Abe’s scores: 80, 80, 80 Ben’s scores: 70, 80, 90 Carl’s scores: 65, 75, 100 In this case, the mean may not be a good indicator or each person’s performance. Looking at Carl’s scores, it appears he’s a little erratic. It might be difficult to predict what he might score on the next game using the average. Abe, on the other hand, looks pretty stable as he’ll probably score an 80 on the next game. Abe and Ben both have the same average—this example shows that one mean is a pretty good descriptor, which would allow you to predict more comfortably what might happen next. In other words, the mean is doing a pretty good job of describing what’s happening. Carl’s mean is not as good of a descriptor as Abe’s. Although his average is 80 (like Abe’s), the mean does not do a good job of describing what is happening. Abe’s mean better describes what is occurring than Carl’s mean. But, if the scores were not shown, it would not be evident how consistent Abe is and how erratic Carl is because their averages are both 80. Sometimes, more information is needed. One way to do this is to look at all the scores and try to determine consistency. In math, rather than looking at the whole set of data, only the high and low scores might be examined. It might be someone just had one super high or low score that really affected the mean. In other words, determine the spread of the scores. In statistics, that’s referred to as variability. There are three ways to measure this spread or variability. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 23 of 64 3 Measures of Spread/Variability 1. Range 2. Interquartile Range (IQR) 3. Mean Absolute Deviation (MAD) RANGE The range is just the difference between the top score and the bottom score. The larger the range, the less likely the mean can be depended upon as a good descriptor or predictor. In the last example, the range of Abe’s scores was zero. The range of Carl’s scores was 35 and Ben’s range was 20. INTERQUARTILE RANGE (IQR) Students can describe measures of variability in a data set by finding the interquartile range and the mean absolute deviation (MAD). The interquartile range is found by subtracting the lower quartile from the upper quartile. Example: Student’s Heights (in inches) 60 65 58 61 54 62 56 59 63 56 61 58 When the student data is ordered it appears like this: 54 56 56 58 58 59 60 61 61 62 63 65 The median of the data is 59.5 (59 + 60)/2. The significance of this number is that half the data is less than 59.5 and half the data is greater than 59.5. The Lower Quartile is the median of the lower half of the data. In this example, the lower half of the data is 54, 56, 56, 58, 58, and 59. The median of these six data is 57 (56 + 58/2). The Upper Quartile is the median of the upper half of the data. In this example, the upper half of the data is 60, 61, 61, 62, 63, and 65. The median of these six data is 61.5 (61 + 62/2). The Interquartile Range is the difference between the upper quartile and the lower quartile. 61.5 – 57 = 4.5. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 24 of 64 The smaller the interquartile range, the less variability there is in the data. The greater the interquartile range, the greater the variability in the data. An interesting reason to study box-and-whisker plots is when we begin to compare two or more plots and examine the data. Following are several examples to highlight this concept. Example: Given below are boxplots displaying the annual temperatures for two cities, Seattle and Boston. What information and generalizations can you see in the plots? Seattle 41 45 52 61 66 Boston 29 20 30 51.5 36.5 40 50 74 66.5 60 70 80 Some information you should note during your discussion with students, could include, but not be limited to: • The median temperature for Seattle and Boston is essentially the same. • Boston’s data is more spread out. • Boston’s range is greater so its temperature fluctuates more than Seattle’s. • Boston’s temperature range is wider than Seattle’s. • Boston has greater high temperatures and lower low temperatures. • More than 25% of the days in Boston the low temperatures are below Seattle’s lowest temperature. • 1/4 of the days with high temperatures in Boston are higher than Seattle’s highest temperature. Example: Look at the three boxplots below. Even without a scale, what can you say about the 3 temperature plots in general? A. B. C. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 25 of 64 General discussions might include: Plot A • • • • This city has a lot of cold days (in comparison to the other two). ¼ of the days the temperatures are very close (in the lower quartile). The number of days of lowest temperatures appear to have a great range. The number of days of temperatures in the upper quartile have a great range. Plot B • • • This city has a lot of hot days (in comparison to the other two). The variability of cold temperatures is great. The variability of high temperatures for ½ the year is small. Plot C • • This city has real extremes - outliers. Without the outliers, this city has the smallest range of temperatures. Looking back at the boxplots, if you were told one of the graphs represents temperatures in Las Vegas and another represents temperatures in Hawaii which graphs might represent them? Let students explain why they would choose one plot over another. In several Take It To the Mat articles, the issue of box-and-whiskers is addressed. Below is an excerpt from one of the articles. For the complete articles go to www.rpdp.net > Math > Take It To The Mat> HS Edition>Data Analysis and Probability> October 2003, November 2003 or December 2003. The impression of Las Vegas residents is that October 2003 was unusually warm. Half of the days had high temperatures of 89º F or more and fully three-fourths of days were at or above 84º. The coolest high temperature was 63º, but after looking at the raw data that seems like an anomaly when compared to the rest. Was it really that warm in October 2003? How did it compare with the year before? One of the powers of boxplots is to answer questions about comparing distributions. In this case, we want to compare the highs in October 2003 with those from October 2002. The five-number summary for October 2002 is {59, 74, 80, 84, 92}. Parallel boxplots for both Octobers 2002 and 2003 are shown at right. 2002 2003 50 x 70 60 80 90 100 110 High Temperature (deg F) The boxplots clearly show that October 2003 was warmer, on the whole, than October 2002. The median high temperature in October 2003 is 9 degrees warmer than in October 2002. The coolest three-fourths of high temperatures in 2002 were below the first quartile in 2003. The middle halves of each data set don’t even overlap! Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 26 of 64 We know that October 2003 was much warmer than October 2002, but how does it compare with what is considered “normal?” (Normal is the average daily high temperature since 1937.) Norm Consider the parallel boxplots at right and draw your own conclusions. 2002 2003 x 50 70 60 80 90 100 110 High Temperature (deg F) Useful suggestions for ways to connect these vocabulary words and concepts with your students may include: Box and Whiskers Song By: Karl Spendlove Sung to the tune of “Oh My Darling, Clementine” Put in order Find the median Find the median of the top Find the median of the bottom Then you draw the whisker plot Chorus: Box and whiskers Box and whiskers Put the data to the test Boxes show the middle 50 And the whiskers show the rest. MEAN ABSOLUTE DEVIATION (MAD) Students should additionally be introduced to mean absolute deviation (MAD) as a way to gauge variability. MAD is the average of how far away each piece of data is from the mean. Simply put, the greater the relative MAD, the more variability in the data set. The smaller the MAD, the less the variability of the data set as a whole. To compute mean absolute deviation: Step 1. Find the mean of the data. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 27 of 64 Step 2. For each piece of data, find the distance that data is from the mean and take it’s absolute value. Step 3. Find the average of the distances from the mean. Example: 45 45 45 65 70 55 70 Period 1 Test Scores 55 55 60 70 70 70 75 60 60 60 65 65 75 75 80 85 100 Step 1. Mean (1575/24 = 65.625 ≈ 66%) Step 2. Find the distance each score is from the mean and take it’s absolute value. 45 45 45 55 55 55 60 60 60 60 65 65 absolute 21 distance from mean 21 21 11 11 11 6 6 6 6 1 1 65 70 70 70 70 70 75 75 75 80 85 100 4 4 4 4 4 9 9 9 14 19 34 score score absolute distance from mean 1 Step 3. Find the average of the distances: 234 ÷ 24 = 9.875 The mean absolute deviation is 9.875 Example: score 45 70 Period 2 Test Scores 70 75 75 80 80 80 80 85 85 85 absolute distance from mean 20 85 score 15 90 15 90 10 90 10 95 5 95 5 95 5 95 5 100 0 100 0 100 0 100 absolute distance from mean 5 5 5 10 10 10 10 15 15 15 15 0 Mean (2045/24 = 85.21= 85%) 225/24 = 9.375 Mean Absolute Deviation Period 1 has a mean absolute deviation of 9.875; period 2 has a mean absolute deviation of 9.375 (similar, but slightly lower). What does this tell us about the data sets? First of all it tells Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 28 of 64 us that according to this measure of variance, these two data sets are relatively similar. Overall, the data in each set is clustered reasonably close together. Outliers, students should understand, can have a great effect on the mean absolute deviation. NVACS - 6.SP.A.3 - Recognize that a measure of center for a numerical data set summarizes all of its values with a single number, while a measure of variation describes how its values vary with a single number. In order to master standard 6.SP.A.3, students will need an understanding of the difference between measures of center (mean, median, and mode) versus a measure of variability (range, IQR, MAD). These measures are typically taught as a unit (often in one day) however, it would be advantageous, in terms of student understanding, to separate them from the beginning of the lesson. Measures of center describe the data with a single numerical value; however, a measure of variability, describes how the data differs (or varies) from other data values in the set. Simple examples can help your students to see the difference between measures of center and measures of variability. Example 1: The following data set represents the ages of people who attended the 70th birthday party of your grandmother at the senior center: 70, 75, 80, 72, 82, 81, 73, 78, 82 Measures of Center Measure of Variability Mean= 693/9 = 77 Range = 82 – 70 = 12 Median= 78 Mode= 82 The measures of center reasonably describe the data. If someone was to ask you, “How old were the people who attended your grandmother’s birthday party?”, any one of the measures of central tendency would be a “reasonable” representation of the data although in this case the mean and median are better than the mode. However, the range does not reasonably describe the data in one number; were the people who attended grandmother’s birthday party around 12 years old? No, all the range tells us here is that the people who attended are elderly. Other data sets where the range is significantly different from the measures of center provide students with an understanding that sets “measures of center” apart from “measures of variability”. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 29 of 64 Example 2: Provide students with a dot plot of data. Student Scores on Math Constructed Response X X X X X X X X X X X X X X X X X X X X 0 1 2 3 Ask students to consider the data shown in the dot plot of constructed response scores among students. The following questions are appropriate discussion questions: • • • • • How many students are represented in the data set for students constructed response scores? How do you know the sample size? What are the measures of center (mean, median, and mode)? What do these values mean? How do they compare with one another? What is the range (measure of variability) of the data set? What does its value mean? If you have to choose one number to DESCRIBE the data set, which measure would you use? Defend your choice. In this example, why do you think the measures of center and the measure of variability are so similar? Explain your reasoning. Student Reflection: Identify the measures of center. Identify a measure of variability. In what ways are they alike? Different? Which measure of center best describes a set of data? NVACS – 6.SP.B.4 - Display numerical data in plots on a number line, including dot plots, histograms, and box plots. In order for students to display numerical data they need to be equipped to make decisions and perform mathematical calculations. Decisions need to be made in regard to which graphical display is MOST appropriate for the data. Basic mathematical calculations often need to be made to choose a graphical display and to accurately graph the given data. Students are expected to choose the most appropriate display for a given set of data based upon their understanding of the data and the various ways to display it. Additionally, students are expected to read and interpret graphs from other sources. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 30 of 64 Type of Display Dot Plot Example A Histogram Example B Box Plots Example C Strengths of the Display Simple; easy to create; highlights the distribution of the data including clusters, gaps, and outliers Allows data to be manipulated by varying the intervals used in the histogram; gives an overall picture of the data by intervals without highlighting specific pieces of data within the set Organizes the data into four manageable quarters; allows students to view the spread of the data and it clusters in a moderate or larger sets of data Type of Data Set Best Suited for the Display Small to moderate sets of data; data with a small to moderate range; repeated numbers Appropriate for displaying large sets of data or data sets with a large range Moderate to large sets of data; appropriate for comparing two or more sets of data in terms of spread and skewness Example A: Eighteen students received a score on their practice constructed response for the CRT. Students were scored on a rubric that ranged from 0 to 3 points. The following group of data was the result: 0, 1, 2, 2, 2, 1, 3, 3, 0, 3, 2, 2, 1, 2, 3, 2, 0, 2. The data display follows. Constructed ResponseScores for CRT Practice X X X X X X X X X X X X X X X X X X 0 1 2 3 Once the data set has been displayed, ask students to make some observations from the data (describe it) in terms of: shape, sample size, distribution, spread, clusters, etc. Model writing some observations from the data display, then allow student groups to practice writing their own sentences that express observations about the data. Tip: Teachers may want to provide students with a list of both “key words” to help students construct observation sentences. “Key words” might include: shape, distribution, variance, spread, cluster, center, etc. Example B: Sixth grade students collected data from 7th graders and 8th graders regarding the number of video games owned by each group. A total of eighty students were surveyed; forty 7th graders and forty 8th graders. The data is below. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 31 of 64 Number of Video Games Owned by 7th Graders 7 6 8 12 9 4 10 1 12 31 16 0 2 32 11 5 0 2 3 8 0 7 2 2 40 15 0 38 3 2 33 20 9 36 31 3 6 6 18 9 1 14 2 8 48 23 43 46 Number of Video Games Owned by 8th Graders 4 46 29 0 45 32 50 11 30 18 54 10 32 11 21 27 16 17 12 34 19 31 52 28 12 6 7 44 5 16 0 19 Frequency Table for 7th Graders Interval 0-9 10 - 19 20 - 29 30 - 39 40 - 49 50 - 59 Math 6, Unit 09 Tally Frequency 25 7 1 6 1 0 Statistics and Probability 2014-15-NVACS Page 32 of 64 Frequency Table for 8th Graders Interval 0-9 10 - 19 20 - 29 30 - 39 40 - 49 50 - 59 Tally Frequency 9 12 5 5 6 3 Video Games Owned by 8th Graders 30 # of students 25 20 15 10 5 0 0-9 1 0-1 9 20 -29 30 - 39 40 - 49 50 -59 # of video games By displaying two sets of data in the same way using the same intervals, students have the opportunity to compare and contrast the data sets using the histograms. Not only can students make generalizations/observations about each set of data, discussion can be initiated regarding the manipulation of the intervals in order to achieve particular conclusions. For example, would increasing the range of the interval further support the conclusion that 8th graders own more video games than 7th graders? Is there a way to display the data where the number of video games owned by 8th and seventh graders could appear more equitable? Tip: Teachers should not only questions students, but give students tools to question one another regarding data and observations of data. By modeling questioning continually and consistently, students will begin to feel more comfortable asking questions of each other. Teachers must provide students with the collaborative opportunity to do so. An easy way to Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 33 of 64 begin this is with a “script” and an available bank of “key words”. Gradually, students won’t need this to question each other meaningfully and appropriately. Example C: A physical representation of each piece of data in the set is helpful when students are beginning to create box plots. One simple way to achieve this is by using sticky notes to represent each individual piece of data in the set. The data set below was achieved by the teacher asking each of 32 students for their age in months. Each student wrote their age (in months) on a sticky note provided by the teacher. The sticky notes were then placed in order by the students in the front of the classroom. 130 133 139 142 130 134 139 143 131 136 140 144 131 136 140 145 131 137 141 147 132 137 141 148 132 138 142 149 132 139 142 150 Five Number Summary Minimum: 130 Quartile 1 (Q1): (132 + 133)/2 = 132.5 Median: 139 Quartile 3 (Q3): 142 Maximum: 150 Ages in Months of a Class of 32 Sixth Grade Students 130 135 140 # of months 145 150 Have students complete the following to demonstrate their understanding of the data: 1. 2. 3. 4. 5. Each 1/4th of the data is equal to ______ students. This is because _____ divided by 4 is ______. The youngest 1/4th of students is between _____ and _____ months old. The second youngest one quarter of the students are between ____ and ____ months old. The oldest one quarter of students is between _____ and ____ months old. The _____ quarter has students that are between 139 and 142 months old. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 34 of 64 6. The most “spread out” 1/4th of the data is the ____ quartile. I know this because __________________________________________________________________. 7. The data is most “concentrated” in the _____quartile. I know this because ___________________________________________________________________. 8. The median class age in months is _____ months. 9. One half of the class is from ______ to ______ months old (remember this can be answered with three different, yet correct, answers). Tip: Beginning with “closed” responses allows teachers to model for students how to correctly draw observations and conclusions from the data as well as modeling the correct interpretation of the display. Student Reflection: Identify one way to display data. Describe the strengths of your data display and what kind of data it is best suited for. Describe the limitations of your data display. Share your thoughts with a teammate. NVACS – 6.SP.B.5a - Summarize numerical data sets in relation to their context by reporting the number of observations. NVACS- 6.SP.B.5b – Summarize numerical data sets in relation to their context by describing the nature of the attribute under investigation, including how it was measured and its units of measurement. Students should be able to ascertain the number of pieces of data in a set by viewing a raw representation of the data, by viewing the data organized into a table, or by viewing the data in certain displays. Students should be able to express what the numerical data represents and what unit of measure the numerical data expresses. For example, when presented with the following data, students should be able to recognize that there are 20 chickens in the coup. Weights of Prize Winning Chickens at the County Fair (in pounds) 6.5 5.9 7.8 7.5 8.1 7.3 6.9 8.1 6.3 8.2 7.1 7.2 7.2 8.4 8.1 7.8 6.4 6.9 6.6 7.7 Each piece of data recognizes the existence of one chicken whose weight was measured. Students should also be able to see and understand that the chicken’s weight was measured in pounds. Example 1: History Test Scores Math 6, Unit 09 73, 45, 68, 90, 99, 67, 89, 61, 77, 59, 97, 83, 96 Statistics and Probability 2014-15-NVACS Page 35 of 64 1. How many students took the history test? Tell how you know. 2. In what unit of measure is their score given? Explain your reasoning. Example 2: Students’ Heights (inches) 60 56 58 60 58 59 54 60 61 57 54 62 51 64 55 49 60 55 65 63 61 58 54 60 52 1. How many students had their height observed (reported)? 2. What unit of measure was used to record student heights? 3. What additional information could be added to the title to make this data more meaningful? 4. Would another unit of measure have been appropriate? If so, which unit? Explain. Example 3: Number of Pets Owned by Mrs. Wheeler’s 6th Grade Class X X X X 0 X X X X X X X X X X 1 X X X X X X X X X X X X X X 2 3 4 5 6 7 8 1. How many students were surveyed? How many students are in Ms. Wheeler’s class? Were all of the students surveyed? How do you know? 2. What unit of measure is represented by each “x”? 3. Would you add anything to the display to make it more understandable and meaningful? If so, what would you add? Student Reflection: What are some ways to find the number of observations (pieces of data) in a data set? How do you find the unit of measure represented in the data set? Explain. NVACS - 6.SP.5c –Summarize numerical data sets in relation to their context, giving quantitative measures of center (median and/or mean) and variability (interquartile range Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 36 of 64 and/or mean absolute deviation) as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered. The measure of center that a student uses to describe or summarize a data set will depend both on the shape of the data set and the context of the data collection. Although the mode is the value that appears more frequently than the other values in the data set (and often the value easiest for students to discern), it is the least frequently used as a measure of center. This is because data sets may not have a mode, may have more than one mode, or the mode may not be descriptive of the data set as a whole. Example 1: Number of Times Grade Students Attended the Movies in the Last 30 Days (20 students sampled) 3 2 2 1 1 8 3 7 0 5 4 6 12 4 12 12 8 6 5 7 In the above data set, the mode is 12; it appears 3 times and none of the other pieces of data appear that often. This data set is an example of why the mode is often not the best measure of center to describe the data set. The three students who answered “12” to the question all have relatives who work in the local movie theater, therefore gaining them entry to the movies each and every weekend. The mode, in this example, is not representative of the data set as a whole. It is clearly substantially higher than most of the other responses. The mean is a very common measure of center computed by adding all of the values in a data set and dividing by the number of values in the data set. However, the mean can sometimes not be the best measure of center used to describe or summarize a data set because a few data points that are very high or low can greatly affect the mean. Ages of People Who Attended Ailee’s 12th Birthday Party Example 2: 10 9 18 12 12 16 13 13 10 40 14 9 11 11 12 12 11 13 13 80 13 8 84 12 In this data set, the mean is 19 (sum of 456/24 attendees). This mean is raised considerably by the attendance of Ailee’s grandparents (80 and 84) and also the attendance of Ailee’s mother (40). This data set is an example of a data set where the mean is strongly affected by one (or more) pieces of data that are much higher (or lower) than the rest of the data in the set. Clearly, the mean of 19 does not summarize or accurately describe the age of the attendees at Ailee’s birthday party. If the data were ordered to find the median: 8, 9, 9, 10, 10, 11, 11, 11, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 16, 18, 40, 80, 84 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 37 of 64 The median in this data set is 12. The median would be the best and most appropriate measure of center to describe the data set. As mentioned above, the mean is artificially inflated and there are two modes in this set of data: 12 and 13. Students will also need to be directed to look at the distribution of the data set. In data sets that are symmetrically distributed, the mean and median will be very close to the same value. In data sets that are clearly skewed, the mean and median will be different with the median generally providing a better overall description of the data set. Example: 2 student groups (6th graders surveyed at different schools) HOW MANY PEOPLE IN THEIR IMMEDIATE HOUSEHOLD? Group 1 Group 2 (18 students/observations including themselves) (18 students/observations including themselves) X X X X X X X X X X X X X X X X X X 2 3 4 5 6 7 8 9 10 even distribution X X X X X X X X X 2 3 4 5 6 7 X X X 8 X X X X X X 9 10 uneven distribution Questions: 1. Which student group had the most even distribution? Explain 2. Which student group had the most uneven distribution? Tell how you know. 3. In data groups with an even distribution, either the median or the mean can be used to best describe the data. Which measure of center would you use to best describe the data set in group 1? Explain your reasoning. 4. In data sets with skewed distribution, the median is often the best way to describe the center of the data. What is the median in the group 2 data set? Do you think it describes the center of the data better than the mean? Explain why. 5. (Extension) If you could add eight more pieces of data to group 2 to “even out the distribution”, where would you add them? Explain your reasoning. Student Reflection: The mean, median, and mode are all measures of center. Describe how you know which measure of center best describes a particular set of data? What do you look for in the data set to tell you which measure of center is most appropriate? NVACS - 6.SP.B.5c – Summarize numerical data sets in relation to their context, giving quantitative measures of measures of center (median and/or mean) variability (interquartile Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 38 of 64 range and/or mean absolute deviation) as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered. Students can describe measures of variability in a data set by finding the interquartile range and the mean absolute deviation. The interquartile range is found by subtracting the lower quartile from the upper quartile. Example 1: Student’s Heights (in inches) 60 65 58 61 54 62 56 59 63 56 61 58 When the student data is ordered it appears like this: 54 56 56 58 58 59 60 61 61 62 63 65 The median of the data is 59.5 (59 + 60)/2. The Lower Quartile is 57 (56 + 58)/2. The Upper Quartile is 61.5 (61 + 62)/2. The Interquartile Range is 4.5 (61.5 − 57). The lower the interquartile range, the less variability there is in the data. The greater the interquartile range, the greater the variability in the data. Another measure of variability is the mean absolute deviation. The mean absolute deviation is used in the sixth grade to develop a deeper understanding of variability. Students can be brought to understand numerically the distance each piece of data is from the mean of the data and from there find the mean of those deviations, or distances. This standard also gives teachers the opportunity to introduce or reinforce the concept of absolute value as distance (neither a negative or a positive directionally). Students can see that the larger the mean absolute deviation is, the greater the variability in the sets of data. Consider the two data sets below: Group A: Heights of Sixth Graders (as seen above): 54 5 56 3 56 3 58 1 58 1 59 0 60 1 61 2 61 2 62 3 63 4 65 6 62 0 62 0 64 2 66 4 68 6 68 6 70 8 Group B: Heights of Eighth Graders: 54 8 55 7 Math 6, Unit 09 57 5 59 3 59 3 Statistics and Probability 2014-15-NVACS Page 39 of 64 The mean for Group A is 59.4 (which would be rounded to 59). The mean for Group B is 62. To figure the absolute mean deviation, you must assess each data value’s distance from zero, take it’s absolute value and then average them. Distances from the mean are displayed in red in each table above. Group A: 5 + 3 + 3 + 1 + 1+ 0 + 1 + 2 + 2+ 3 + 4 + 6 = 31/12 = 2.58 Group B: 8 + 7 + 5 + 3 + 3 + 0 + 0 + 2 + 4 + 6 + 6 + 8 = 52/12 = 4.33 Group B has a greater mean absolute value than Group A; therefore there is greater variability in the data in group B. If the interquartile range is used as a measure of variability in the example above, Group B again shows greater variability. As expressed above, the interquartile range for Group A data is 4.5. The interquartile range for Group B data is 9 (67-58=9). By using examples in which both measures of variability consistently demonstrate a greater or less level of variability, students will begin to see the relationship and connection with measures of variability. Student Reflection: How would you describe “variability” within a data set? What are some ways to determine the variability of a data set? Explain. NVACS - 6.SP.5c – Summarize numerical data sets in relation to their context by giving quantitative measures of measures of center (median and/or mean) variability (interquartile range and/or mean absolute deviation) as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered. The overall pattern of data is related to the shape and spread (variance of the data). Example 1: Inning # of Runs Inning # of Runs Math 6, Unit 09 1 3 1 3 2 4 2 1 3 2 3 2 Baseball Team A Game 1 4 5 1 0 Baseball Team A Game 2 4 5 4 0 Statistics and Probability 2014-15-NVACS 6 0 7 0 8 1 9 0 6 0 7 1 8 0 9 0 Page 40 of 64 Inning # of Runs 1 0 Inning # of Runs 1 1 2 1 2 0 3 0 Baseball Team B Game 1 4 5 0 3 6 2 7 1 8 7 9 4 3 0 Baseball Team B Game 2 4 5 0 0 6 4 7 6 8 2 9 1 Students can view the scoring pattern of each baseball team and make generalizations about their scoring based upon the data. Clearly, each team has a distinct pattern to their scoring. Students can be asked to “describe” the pattern and make generalizations regarding the pattern. These generalizations can also aid students in making predictions about what might happen in future games for each team. Group discussion questions: 1. Overall, how do the scoring patterns for Team A and Team B differ? Are the scoring patterns similar in any way? 2. Which team would you predict to be most likely to score in the second inning? Explain your reasoning. 3. Which team has the greater variance among the innings? Which team is slightly more even in their scoring? 4. If you were to go and see one of the teams play, which team would be more exciting to watch? Why? 5. What might account for the difference in scoring patterns for the teams? Explain your reasoning. Example 2: Weights of 8th Grade Girls (Rounded to the nearest 10 pounds) (20 observations) X X X 80 X X X X X X X X X X X X X X X X X 90 100 110 120 130 140 150 160 170 180 pounds Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 41 of 64 Weights of 8th Grade Boys (Rounded to the nearest 10 pounds) (20 observations) X 80 X X X X X X X X X X X X X X X X X 90 100 110 120 130 140 150 160 170 180 pounds When comparing the weights of 8th grade boys and 8th grade girls, it is important for students to make the observation that although the simple range of the data is the same for boys and girls (180 − 80= 100) the data demonstrates different patterns and appears very different visually. The weights of the 8th grade girls observed, clusters between 80 and 120 pounds. The weights of the 8th grade boys observed, clusters from 120 to 180 lbs. These clusters greatly affect the shape of the data and its measures of center. In the set of data representing the weights of 8th grade girls, the 3 girls that weighed 140 lbs. or more stand out; in the data representing the weights of 8th grade boys, the lower weights of 80 and 100 lbs. stand out from the rest of the data. Because of the overall patterns and clustering in the graphic representation of each set of data, students can make the correct statement that based upon this data, 8th grade boys weigh more than 8th grade girls. Student Reflection: Is it possible for data sets with the same range to have different shapes? Provide an example of two data sets that have the same range yet appear different in shape. NVACS - 6.SP.5d –Summarize numerical data sets in relation to their context, by relating the choice of measures of center and variability to the shape of the data distribution and the context in which the data were gathered. As demonstrated in Example 2 above, simply finding the range (high − low) of a set of data does not provide very detailed information about the variability (spread) of the data set. Data sets with identical ranges can be very different and therefore cause us to draw opposite conclusions regarding the data set. Consider the example below. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 42 of 64 45 65 45 70 70 90 55 70 70 90 60 75 Period 2 Math Test Scores 24 Observations (%) 75 80 80 80 80 95 95 95 95 100 75 90 60 80 85 100 65 85 85 100 65 100 85 100 # of students 45 85 45 70 Period 1 Math Test Scores 24 Observations (%) 55 55 60 60 70 70 75 75 Letter Grade Period 2 12 # of students 10 8 6 4 2 0 A B C D F Letter Grade Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 43 of 64 When the data is viewed in a histogram, it is quite clear that the test scores for period 1 were much lower overall than the test scores for period 2 despite the fact that both classes had a range of 55 (100- 45= 55). Examples like the one above support the necessity for additional ways to determine the variability of a data set. As noted previously, a box plot provides significantly more information about the overall spread of a set of data (variability). By providing a physical representation of each quarter of the data, students can get an idea of where the data is clustered and where there are large differences in the spread of the data. Median: 65 Period 1 LQ: 57.5 Math Test UQ: 72.5 Outlier: 100 High: 85 Low: 45 45 50 55 60 65 70 75 Period 2 Math Test 80 85 90 95 Median: 85 LQ: 80 UQ: 95 High: 100 Low: 70 100 Outlier: 45 45 50 55 60 65 70 75 80 85 90 95 100 It is important for students to understand that even though the physical size of each quarter of the data differs in a box plot, it still represents only 25% of the total data set. Students should be questioned first about the basics of a box plot. Below are some sample questions that would help students feel comfortable with the basic components of a box plot. 1. What is the median test score for period 1? 2. For period 2, the Lower Quartile is 80%, what does that mean? How do we determine a Lower Quartile? 3. In period 2, why is there an unconnected dot/point on the left side of the graph? What causes this? 4. Why is the “box” in a different position along the number line in period 2 when compared to its position in period 1? 5. Can you tell from ONLY looking at the box plots how many people were observed? Explain. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 44 of 64 6. Can you tell ONLY from the histogram how many people were observed? Explain. Students should additionally be introduced to mean absolute deviation as a way to gauge variability. Simply put, the greater the relative mean absolute deviation, the more variability in the data set. The smaller the mean absolute deviation, the less the variability of the data set as a whole. Period 1 Mean (1575/24 = 65.625= 66%) 45 45 45 55 55 55 60 60 60 60 65 65 21 65 21 70 21 70 11 70 11 70 11 70 6 75 6 75 6 75 6 80 1 85 1 100 1 4 4 4 4 4 9 9 9 14 19 34 85 85 85 237/24 = 9.875 Mean Absolute Deviation 45 70 70 75 Period 2 Mean (2045/24 = 85.21= 85%) 75 80 80 80 80 40 85 15 90 15 90 10 90 10 95 5 95 5 95 5 95 5 100 0 100 0 100 0 100 0 5 5 5 10 10 10 10 15 15 15 15 225/24 = 9.375 Mean Absolute Deviation Period 1 has an absolute mean deviation of 9.875; period 2 has an absolute mean deviation of 9.375 (similar, but slightly lower). What does this tell us about the data sets? First of all it tells us that according to this measure of variance, these two data sets are relatively similar. Overall, the data in each set is clustered reasonably close together. Outliers, students should understand, can have a great effect on the mean absolute deviation. Consider the following set of data: Number of Pairs of Shoes Owned by Eight 6th Grade Students 6 10 8 5 4 5 4 32 With the outlier of 32, the mean of the data is 9.25 (74/8). The mean absolute deviation is 3 + 1 + 5 + 5 + 1 + 4 + 4 + 23 = 46/8 = 5.75 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 45 of 64 Without the outlier of 32, the mean of the data set is 6 (42/7). The mean absolute deviation is 0 + 2 + 2 + 2 + 4 + 1 + 1 = 12/7 = 1.7 The outlier not only had a great affect on the mean of the data, but also greatly affected the mean absolute deviation. Without the outlier of 32, the data is clearly more clustered around the mean and the shape of the data as seen in a dot plot is greatly altered. Student Reflection: Why is finding the range of a data set not enough information to truly describe the variability of the data set? What are the limitations of only finding the range? What can a box plot and the mean absolute deviation of a data set tell you that the range cannot? The “shape” of a data set can be greatly affected by an outlier within the data set. Of course, the mean can also be greatly affected by an outlier in the data set. Normally, however, the median and the mode are not affected by the presence of an outlier. Consider the following data set: Prices for a new bicycle 80 75 90 110 90 120 115 490 120 75 In this data set, the outlier of $490 affects the variability of the data set as well as the mean (center) of the data set. Students should be asked to calculate the mean WITH the $490 bicycle (1365/10 = 136.5) and WITHOUT the $490 bicycle (875/9 = 97.2). Additionally, the outlier of $490 affects the MAD as well. However, the inter quartile range is 40 (120 – 80) even without such a great outlier. Had 490 been replaced with a price of 120, the inter quartile range would remain 40. ItStudent is important for students to issee thatlikely outliers some by measures of center and some Reflection: Which more to affect be affected an outlier, the MAD or the Inter measures of variability more others. quartile range? Explain yourthan reasoning. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 46 of 64 SBAC Example: Standard 6.SP.4, 6.SP.5 DOK: 2 Difficulty: M Item Type: TE This will be a Technology Enhanced Question. On Core Sample Questions for Standard 6.SP.1 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 47 of 64 On Core Sample Questions for Standard 6.SP.2 On Core Sample Questions for Standard 6.SP.B.4 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 48 of 64 On Core Sample Questions for Standard 6.SP.5a Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 49 of 64 On Core Sample Questions for Standard 6.SP.B.5b On Core Sample Questions for Standard 6.SP.B.5c On Core Sample Questions for Standard 6.SP.B.5d Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 50 of 64 SBAC Example: Standard 6.SP.B.5 Math 6, Unit 09 DOK: 3 Difficulty: M Statistics and Probability 2014-15-NVACS Item Type: ER Page 51 of 64 More SBAC Examples Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 52 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 53 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 54 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 55 of 64 ***Please note the answer shown for this example is incorrect. There should be no bar in the 0-3 interval and the bar for the interval 12-15 should be 2 units tall. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 56 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 57 of 64 ***BEWARE*** Please note both answers B and C are correct since the graphs are identical. Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 58 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 59 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 60 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 61 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 62 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 63 of 64 Math 6, Unit 09 Statistics and Probability 2014-15-NVACS Page 64 of 64