Algebra 1 Summer Institute 2014

The Fair Allocation Paradigm

Summary
Participants will explore and interpret the sample mean and learn several ways to describe the degree of variation in data, based on how much the data values vary from the mean.

Goals
Understand the sample mean as an indicator of fair allocation
Explore deviations of data values from the sample mean
Understand the mean as the "balancing point" of a data set
Learn how to measure variation about the mean

Participant Handouts
1. The Noodle Conundrum
2. The Box Plot
3. Finding the Five Number

Materials
Paper, Colored Pencils, Sticky notes, Snap Cubes

Technology
LCD Projector, Facilitator Laptop

Source
Annenberg Learner Website

Estimated Time
90 minutes

Mathematics Standards
Common Core State Standards for Mathematics

MAFS.6.SP.1: Develop understanding of statistical variability
1.2: Understand that a set of data collected to answer a statistical question has a distribution which can be described by its center, spread, and overall shape.
1.3: Recognize that a measure of center for a numerical set of data summarizes all of its values with a single number, while a measure of variation describes how its values vary with a single number.

MAFS.6.SP.2: Summarize and describe distributions
2.5: Summarize numerical data sets in relation to their context, such as by:
a. Reporting the number of observations
b. Describing the nature of the attribute under investigation, including how it was measured and its units of measurement.
c. Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered.

Standards for Mathematical Practice
1. Make sense of problems and persevere in solving them
2. Reason abstractly and quantitatively
3.
Construct viable arguments and critique the reasoning of others
4. Model with mathematics

Instructional Plan

NOTE: In previous activities we have used the word “average” instead of mean. In this activity the sample mean is formally introduced. The words mean and median refer to the sample mean and sample median when estimating from a sample of data.

In the last activity, we explored the Five-Number Summary and its graphical representation, the box plot. We also explored the median, a common numerical summary for a data set. In this session, we'll investigate another common numerical summary, the mean. We'll also learn several ways to describe the degree of variation in data, based on how much the data values vary from the mean.

1. Begin the session by asking participants what the mean is. Listen to a couple of responses. In this session we are going to see how to interpret the mean. What does it say about the data? Is it a center point? If it is a measure of center, in what way is it a halfway point? (Slide 2)

The term average is a popular one; it is often used, and often used incorrectly. Although there are different types of averages, the typical definition of the word "average" when talking about a list of numbers is "what you get when you add all the numbers and then divide by how many numbers you have." This statement describes how you calculate the arithmetic mean, or average. But knowing how to calculate a mean doesn't necessarily tell you what it represents.

2. Let's begin our exploration of the mean. The problem is: how large are families? Ask participants to think back to when they were in the 4th grade. How many people were in their family? They can represent the number of members in their family at that time with snap cubes. Ask the participants to place their snap cubes on a table in the front of the room. Once they are all in the front of the room, ask what we could do with them to better view the distribution.
Suggest that we might want to place them in some order, from smallest to largest for example. Do it yourself or ask somebody to do it, so that there is a more visual representation of the distribution. We would like to figure out the mean of the distribution without using the rule of adding the values up and dividing by the number of data points.

3. Before figuring out the mean, introduce participants to another representation of data: the dot plot. Imagine you gave a similar survey to 9 people in the past and you have a representation of that data. Show them a poster paper with a dot plot and 9 sticky notes lined up vertically in the column representing the number 5. (Slide 3) Ask them what they think the mean of this distribution is. Is there any doubt that the median and the mode are also 5? This data set shows no variation.

4. Ask participants to draw a distribution of 9 family sizes with a mean of 5 that shows some variation. Does the median have to be 5? Participants will work in groups of 4, and every group will share its distribution using a poster paper and sticky notes. As a couple of groups present their results, ask them how they decided on that distribution and what the median of their distribution is. Encourage presentations that show both symmetrical and non-symmetrical distributions of the data. (Slide 4)

5. Ask participants to write the number of members in their families when they were in the 4th grade on a sticky note. On a poster paper, participants will stick their notes vertically in columns representing their number of family members. Now there are two representations of their data: the snap cubes and the sticky notes on the poster paper. Ask participants what the connection between the two representations is and whether it is possible to go from one to the other. Which one offers them a better visual representation of the data? Why?
What could we do to the snap cube representation to make it resemble the representation on the poster paper? Let participants figure it out; one way could be to lay down the columns of snap cubes and stack together the ones representing the same numbers. Since the snap cube representation and the dot plot look so different, the connection between them might not be obvious to some people. (Slide 5)

6. Return the snap cubes to their original single-column representation. Pose the question: what can be done to the cubes to find the mean without having to add and divide? (Slide 6) Ask a volunteer to come up, explain what they would do, and do it. Ideally the volunteer will try to make all the stacks the same size. What would be the mean, or an estimate of the mean, without doing any calculations? The purpose is to demonstrate to the participants how to find the mean without doing any calculations. One of the interpretations of the mean is the number that levels off the data. It is considered the fair-share value. The mean is the notion of putting all the data together and redistributing it evenly. What is the actual mean for this data? If appropriate, leave the result as a mixed number.

7. What other information could we gather from the data to help us identify where the data comes from? Direct the participants’ attention to the dot plots they created with the 9 imaginary families and the mean of 5. Ask them to think about how different the data values are and how much they vary from the quantity 5 that represents the mean. The goal is to measure variation about the mean. The distribution that has the least amount of variation is the one that has all the sticky notes in the same column, 5. Which distribution has data points that vary the most from 5? We would like to quantify the variation from 5. Pick one distribution and write down how many units each data point is away from 5.
For example, if the data point is 7, the difference from 5 is 2. Write down the difference of each data point from 5 on the sticky note. If the data point is 3, the difference is still a positive 2. Once all 9 differences are written for one distribution, ask participants what we could do to find the average of the differences.

8. Find the average of the differences on the board. This quantity is called the MAD. The MAD (Mean Absolute Deviation) tells us how much, on average, the values in a dot plot differ from the mean. (Slide 7) Find the MAD of another distribution. What is the MAD telling us? If the MAD is small, it tells us that the values in the set are clustered closely around the mean. If it is large, we know that at least some values are quite far away from the mean. If one distribution has a smaller MAD than another, its data show less variation about the mean.

How do we find the MAD? The deviation (D) is the distance between the data point value and the mean (D = |value - mean|). The difference value - mean could be positive or negative; since we are interested in the distance, we take the absolute value, which is never negative. Finally, in order to find the MAD, we add all the absolute deviations (D) and then divide by the number of data points (n) in the distribution. MAD = Sum(D) / n.

9. Ask participants to find the MAD of the rest of the displayed distributions of the 9 imaginary families. MAD is not commonly used in statistics, but it is a natural step before developing the idea of standard deviation. MAD and standard deviation could produce similar numbers, just like the mean and the median. The mean and median inform us about center; the MAD and standard deviation inform us how much the data points deviate from the mean.

10.
As an extension, ask participants whether it would be possible to come up with a distribution of 9 families with a mean of 5 members in the family and a MAD of 1. It is not possible. In these problems the MAD is the sum of the absolute deviations divided by 9, so for the MAD to equal 1, the sum of the deviations would have to be exactly 9 (9 / 9 = 1). Because the total deviation of the points below the mean always equals the total deviation of the points above the mean, each of these totals would have to equal 4.5. With a mean of 5, that would require families with a fractional number of members, which is not possible.

11. Let us return to the question of how the mean is a measure of center. Pick one of the distributions used to calculate the MAD, and draw a vertical line through its mean (5). One of the interpretations of the mean is that it is the fair-share value; another is that it is the center of the distances of the values below the mean and above the mean. Looking at the distribution with the vertical line: How many values are below the mean? How many are above the mean? Could the mean and the median be the same in this distribution? If not, why not? Ask participants to add the distances of each data point below the mean, and do the same for the data points above the mean. What do they notice? The total distances are the same on each side, and this is the reason why the mean is a measure of center. (Slide 9)

12. Professional statisticians more commonly use two other measures of variation: the variance and the standard deviation. The method for calculating variance is very similar to the method used to calculate the MAD. The first step in calculating the variance is the same one used to find the MAD: find the deviation for each value in the set (i.e., how much each value differs from the mean). The next step in calculating the variance is to square each deviation.
Note the difference between this and the MAD, which requires us to find the absolute value of each deviation. The final step is to find the variance by calculating the mean of the squares. As usual, find the mean of the squares by adding all the values and then dividing by how many there are. Find the variance of one of the distributions. Here is a table for this calculation for a possible distribution: (Slide 10)

Number of Members     Deviation from        Squared Deviation
in Family (x)         the Mean (x - 5)      from the Mean (x - 5)^2
2                     -3                    9
3                     -2                    4
3                     -2                    4
4                     -1                    1
5                      0                    0
6                     +1                    1
6                     +1                    1
8                     +3                    9
8                     +3                    9
______                ______                ______
45                     0                    38

The mean of the squared deviations is 38 / 9 = 4 2/9, or approximately 4.22. This value is the variance for this data set.

As with the MAD, the variance is a measure of variation about the mean. Data sets with more variation will have a higher variance. The variance is the mean of the squared deviations, so you could also say that it represents the average of the squared deviations.

The problem with using the variance as a measure of variation is that it is in squared units. To gauge a typical (or standard) deviation, we would need to calculate the square root of the variance. This measure, the square root of the variance, is called the standard deviation for a data set. For the data set given above, the standard deviation is the square root of 4.22, which is approximately 2.05. Note that this value is fairly close to the MAD for this distribution, which is 16/9 or about 1.78.

13. Ask each group of participants to find the variance and standard deviation for the distribution of the family size of the whole class and compare the standard deviation to the MAD of that distribution. The standard deviation, first introduced in the late 19th century, has become the most frequently used measure of variation in statistics today.
For example, IQ tests are created with an expected mean of 100 and a standard deviation of 15.

14. As an extension to this session, you could ask the following questions: (Slide 11)
a. What would happen to the mean of a data set if you added 3 to every number in it?
b. What would happen to the MAD of a data set if you added 3 to every number in it?
c. What would happen to the variance of a data set if you added 3 to every number in it?
d. What would happen to the standard deviation of a data set if you added 3 to every number in it?
e. What would happen to the mean of a data set if you doubled every number in it?
f. What would happen to the MAD of a data set if you doubled every number in it?
g. What would happen to the variance of a data set if you doubled every number in it?
h. What would happen to the standard deviation of a data set if you doubled every number in it?

Solutions:
a. The mean would increase by 3.
b. The MAD would not change. Since the values in the list are each 3 larger, and the mean is also 3 larger, the deviations from the mean would remain the same.
c. The variance would not change, since it depends only on the deviations from the mean, not the values themselves. Since the mean increases by 3 along with the rest of the data set, none of the deviations will change.
d. Since the standard deviation is the square root of the (unchanged) variance, it will not change.
e. The mean would be doubled.
f. The MAD would be doubled, since all the deviations are now doubled, and the MAD is the average of these deviations.
g. The variance would be multiplied by 4. Since calculating the variance involves squaring the deviations, the newly doubled deviations would all be squared, resulting in values that are four times as large. For example, if a deviation was (+3), it now becomes (+6). The value used in the variance calculation changes from 3^2 = 9 to 6^2 = 36, which is four times as large.
h.
The standard deviation would be doubled, since it is the square root of the variance, which is multiplied by 4.

Discussion on the Standard Deviation

The standard deviation is a measure of the spread of scores within a set of data. Usually, we are interested in the standard deviation of a population. However, as we are often presented with data from a sample only, we can estimate the population standard deviation from a sample standard deviation. These two standard deviations, sample and population, are calculated differently.

The formula for the population standard deviation is: (Slide 12)

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

where $\mu$ is the population mean.

The formula for the sample standard deviation is:

$$\mathrm{sd} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $\bar{x}$ is the sample mean.

Notice the difference: the population formula divides by “n”, the sample formula by “n-1”.

When to use the sample or the population standard deviation:

We are normally interested in knowing the population standard deviation because our population contains all the values we are interested in. Therefore, you would normally calculate the population standard deviation if: (1) you have the entire population or (2) you have a sample of a larger population, but you are only interested in this sample and do not wish to generalize your findings to the population.

However, in statistics, we are usually presented with a sample from which we wish to estimate (generalize to) a population, and the standard deviation is no exception to this. Therefore, if all you have is a sample, but you wish to make a statement about the standard deviation of the population from which the sample is drawn, you need to use the sample standard deviation. Confusion can often arise as to which standard deviation to use, because the name "sample" standard deviation is incorrectly interpreted as meaning the standard deviation of the sample itself rather than the estimate of the population standard deviation based on the sample.
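As a quick check of the two formulas, the minimal Python sketch below applies both to the nine family sizes from the variance table in step 12. It is an illustration rather than part of the session materials: the variable names are assumptions, and the comparison relies on the standard library functions `statistics.pstdev` (which divides by n) and `statistics.stdev` (which divides by n - 1).

```python
import math
import statistics

# Family sizes from the variance table in step 12 (their mean is 5).
family_sizes = [2, 3, 3, 4, 5, 6, 6, 8, 8]

n = len(family_sizes)
mean = sum(family_sizes) / n
sum_of_squared_deviations = sum((x - mean) ** 2 for x in family_sizes)

# Population formula: divide the sum of squared deviations by n.
population_sd = math.sqrt(sum_of_squared_deviations / n)

# Sample formula: divide the same sum by n - 1 instead.
sample_sd = math.sqrt(sum_of_squared_deviations / (n - 1))

# The standard library implements the same two formulas.
assert math.isclose(population_sd, statistics.pstdev(family_sizes))
assert math.isclose(sample_sd, statistics.stdev(family_sizes))

print(f"population SD: {population_sd:.2f}")  # 2.05
print(f"sample SD:     {sample_sd:.2f}")      # 2.18
```

The population value matches the 2.05 found in step 12. The sample value is slightly larger because the same sum of squared deviations, 38, is divided by 8 instead of 9.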
Quick explanation of the difference:

When the mean is calculated from the n data, there are only n-1 degrees of freedom left to calculate the spread of the data. Essentially, the mean used is not the real mean but an estimate of the mean based on samples of data found in the experiment. This estimate is, of course, biased towards fitting the found data because that is how you got it. Therefore estimating how widely spread around the real mean the data samples are by calculating how spread around the estimated mean they are is going to give a value biased slightly too low. Using n-1 instead of n compensates.

Source: https://statistics.laerd.com/statistical-guides/measures-of-spread-standarddeviation.php
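The bias described in the quick explanation can be checked exactly on a small case. The sketch below is an illustration, not part of the source material. It treats the nine family sizes from the variance table in step 12 as the whole population, lists every possible ordered sample of 3 values drawn with replacement, and averages two candidate variance estimates over all of those samples. The sample size of 3 and all names are assumptions chosen for illustration, and the check is done on the variance, that is, before the square root is taken.

```python
import itertools
import math

# Treat the nine family sizes from step 12 as the entire population.
population = [2, 3, 3, 4, 5, 6, 6, 8, 8]
mu = sum(population) / len(population)
population_variance = sum((x - mu) ** 2 for x in population) / len(population)

# Every ordered sample of 3 values drawn with replacement (9^3 = 729 samples).
sample_size = 3  # assumed value, chosen only to keep the enumeration small
samples = list(itertools.product(population, repeat=sample_size))


def spread_about_sample_mean(sample, divisor):
    """Sum of squared deviations from the sample's own mean, over divisor."""
    sample_mean = sum(sample) / len(sample)
    return sum((x - sample_mean) ** 2 for x in sample) / divisor


# Average each estimate over all possible samples.
avg_dividing_by_n = sum(
    spread_about_sample_mean(s, sample_size) for s in samples
) / len(samples)
avg_dividing_by_n_minus_1 = sum(
    spread_about_sample_mean(s, sample_size - 1) for s in samples
) / len(samples)

print(f"population variance:     {population_variance:.2f}")
print(f"average, dividing by n:  {avg_dividing_by_n:.2f}")
print(f"average, dividing by n-1: {avg_dividing_by_n_minus_1:.2f}")
```

Averaged over all 729 samples, dividing by n gives about 2.81, which is below the population variance of about 4.22, while dividing by n - 1 reproduces 4.22.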