Lecture 4.3 Instruction: Measures of Central Tendency

This lecture discusses three statistics of numerical data sets called measures of central tendency. A measure of central tendency is a statistic that assigns a single numerical value as representative of an entire data set.

One measure of central tendency is the arithmetic mean. The symbol x-bar, $\bar{x}$, denotes the arithmetic mean of a sample. The arithmetic mean, defined below, can be thought of as the average.

For a given numerical set of data $S = \{x_1, x_2, \ldots, x_n\}$ with $n$ elements, the arithmetic mean of the set is given by the formula:

$\bar{x} = \frac{\sum x}{n}$

The arithmetic mean has three significant characteristics. First, changing the value of any score, or adding to the data set a new score not equal to the mean, will change the mean of the data set. Second, if a constant $c$ is added to each value in the data set, the mean changes to $\bar{x} + c$. Third, if each value in the data set is multiplied by a constant $c$, the mean changes to $c \cdot \bar{x}$.

A second measure of central tendency is the median. The median, defined below, is the midpoint of the distribution of the data set.

For data set S arranged in ascending order, the median is the value that divides the data set exactly in half, and exactly 50% of the data will be equal to or less than the median. If $(n + 1)/2$ is an integer, it equals the position of the median. If $(n + 1)/2$ is not an integer, the median lies midway between the score in position $n/2$ and the score in position $(n + 2)/2$.

If $n(S)$ is odd for a sample S of non-rounded data arranged in ascending order, the median is the middle number in S. If $n(S)$ is even for a sample S of non-rounded data arranged in ascending order, the median is the mean of the two middle numbers.

The third measure of central tendency is the mode. The mode, defined below, is the most common number in a numerical data set.
For data set S with some frequency $f_k$ greater than any other frequency $f_j$, the mode is the value with the greatest frequency.

According to the definition above, there is no mode in a numerical data set in which the frequencies of all the data values are equal. If, however, at least one frequency is greater than another, the data set has a mode, and the mode equals the data value (or values) with the greatest frequency. Data sets with multiple modes are said to be multimodal. Data sets with two modes are said to be bimodal.

Consider a sample $V = \{6, 5, 2, 12, 1, 3, 2, 4, 0, 4, 13, 6, 6, 7, 1, 6\}$. To find the three measures of central tendency, we must find the arithmetic mean, the median, and the mode. Arranging the data set in ascending order will help identify frequencies and the median:

$V = \{0, 1, 1, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 12, 13\}$

The data point 6 appears in the data set the most (four times, the greatest frequency), so the mode equals 6. The arithmetic mean equals the ratio of the sum of the data points to the number of data points, as computed below.

$\bar{x} = \frac{0 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 5 + 6 + 6 + 6 + 6 + 7 + 12 + 13}{16} = \frac{78}{16} = 4.875$

Since $n(V) = 16$ is even, the median equals the mean of the two middle numbers (the 8th and 9th values), as computed below.

$\text{median} = \frac{4 + 5}{2} = \frac{9}{2} = 4.5$

In summary, for the given data set V, we have the three measures of central tendency: mean = 4.875, median = 4.5, and mode = 6.

Consider a larger set of data S displayed by the frequency distribution below.

x    22  23  24  25  26  27  28  29  30  31  32  33
f     5   3   7   1   1   2   4  10   4   1   1   1

Since the frequency distribution organizes the data set, finding the three measures of central tendency is not much more difficult for S than it was for V, even though $n(S) > n(V)$. Note that

$n(S) = \sum f = 5 + 3 + 7 + 1 + 1 + 2 + 4 + 10 + 4 + 1 + 1 + 1 = 40$

To find the mode, select the data value with the greatest frequency, which is 29.
To find the median, start by calculating its position: $(40 + 1)/2 = 20.5$. Since the position of the median is 20.5, the median equals the average of the 20th and 21st values in the data set arranged in ascending order. The cumulative frequency reaches 19 at x = 27 and 23 at x = 28, so both values are 28, and the median is $(28 + 28)/2 = 28$.

To find the arithmetic mean, calculate the ratio of the sum of the data points to the number of data points, as below.

$\bar{x} = \frac{\sum f \cdot x}{\sum f} = \frac{5 \cdot 22 + 3 \cdot 23 + 7 \cdot 24 + 25 + 26 + 2 \cdot 27 + 4 \cdot 28 + 10 \cdot 29 + 4 \cdot 30 + 31 + 32 + 33}{40} = \frac{1070}{40} = 26.75$

In summary, for the given data set S, we have the three measures of central tendency: mean = 26.75, median = 28, and mode = 29.

Application Exercise 4.3

Problems

Suppose NASA studies the effects of micro-gravity on the immune system. As part of this study, NASA collects thirty blood samples from astronauts after six consecutive weeks in orbit and records the number of white cells, in thousands per cubic millimeter, below.

3.6  5.9  6.3  5.1  5.0  7.2  5.2  9.3  8.1  7.1
9.9  9.2  5.9  9.9  5.7  7.9  9.9  8.4  6.0  8.5
6.7  7.9  7.7  4.4  8.0  4.7  6.9  7.8  9.1  4.9

#1 Calculate the mean number of white cells in thousands per cubic millimeter.
#2 Identify the median number of white cells in thousands per cubic millimeter.
#3 Identify the mode of the sample.
#4 Assume every measurement given above is actually eight times greater than the given amount. What would the new mean be?

Answers

#1 7.073
#2 7.15 (the median's position is 15.5, so the median is the average of the 15th and 16th values in ascending order, 7.1 and 7.2)
#3 9.9
#4 56.587
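The three measures from Lecture 4.3 can be checked with a short Python sketch. The helper names `modes` and `expand` are illustrative and not part of the lecture; `modes` follows the lecture's convention that a data set whose frequencies are all equal has no mode, and `expand` is assumed here as one simple way to turn a frequency distribution back into a list of scores.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return every value with the greatest frequency, or [] if all frequencies are equal."""
    counts = Counter(data)
    highest = max(counts.values())
    if all(f == highest for f in counts.values()):
        return []  # lecture's convention: equal frequencies means no mode
    return sorted(x for x, f in counts.items() if f == highest)


def expand(freq_table):
    """Turn a {value: frequency} table into the full list of scores in ascending order."""
    return [x for x, f in sorted(freq_table.items()) for _ in range(f)]


# Sample V and frequency distribution S from the lecture.
V = [6, 5, 2, 12, 1, 3, 2, 4, 0, 4, 13, 6, 6, 7, 1, 6]
S = expand({22: 5, 23: 3, 24: 7, 25: 1, 26: 1, 27: 2,
            28: 4, 29: 10, 30: 4, 31: 1, 32: 1, 33: 1})

print(mean(V), median(V), modes(V))   # 4.875 4.5 [6]
print(mean(S), median(S), modes(S))   # 26.75 28.0 [29]
```

Returning a list from `modes` keeps bimodal and multimodal data sets representable: a sample such as {1, 1, 2, 2, 3} would yield both 1 and 2.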
Lecture 4.4 Contemporary Mathematics Instruction: Measures of Dispersion

This lecture discusses three statistics of numerical data sets called measures of dispersion. Consider the two samples below, each with the same mean and median.

$A = \{47, 50, 53\}$
$B = \{0, 50, 100\}$

For both sets, $\bar{x} = 50$. For sample A, the mean is a good estimate for any score found in the set, but the mean is not a good estimate for any score found in sample B. The scores in sample B are spread farther apart than those in sample A. Sample B is said to have greater variability. Statistics that measure the magnitude of variability are called measures of dispersion.

A measure of dispersion is a statistic that assigns a numerical value to describe the variability of a data set.

Variability refers to the spread of a data set. A measure of dispersion measures how spread out, or how widely dispersed, a set of data is.

One particular measure of dispersion is the range. The range, defined below, is the distance between the largest and smallest values in a sample.

The range is the difference between the largest and smallest values in a sample.

The range of set A above equals 6 because 53 - 47 = 6. The range of set B above equals 100 because 100 - 0 = 100.

A second measure of dispersion is the sample variance. To discuss variance, we must first discuss deviations and the squares of deviations. A deviation is a score's signed distance from the mean.

A deviation score equals $x - \bar{x}$.

According to the definition above, the deviations of scores below the mean are negative, and the deviations of scores above the mean are positive. The table below shows the deviations for set A.

$x$    $x - \bar{x}$
47     -3
50      0
53      3

Scores below the mean have negative deviations. Scores above the mean have positive deviations. Scores equal to the mean have zero deviations. While deviations can be positive or negative depending on the position of the respective score, the squares of deviations, $(x - \bar{x})^2$, are never negative.
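The range and the deviation scores described above can be tabulated with a minimal Python sketch. The function names `value_range` and `deviations` are illustrative choices, not notation from the lecture, and the mean is computed directly as the sum divided by the count.

```python
def value_range(sample):
    """Range: the largest value minus the smallest value."""
    return max(sample) - min(sample)


def deviations(sample):
    """Deviation x - x_bar of each score from the sample mean, in the original order."""
    x_bar = sum(sample) / len(sample)
    return [x - x_bar for x in sample]


# Samples A and B from the lecture; both have mean 50.
A = [47, 50, 53]
B = [0, 50, 100]

print(value_range(A), value_range(B))     # 6 100
print(deviations(A))                      # [-3.0, 0.0, 3.0]
print([d ** 2 for d in deviations(A)])    # [9.0, 0.0, 9.0]
```

The squared deviations in the last line are the quantities that the sample variance sums.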
To calculate sample variance, we must calculate the deviation of each score in the sample as above, as well as the square of each deviation as below.

$x$    $x - \bar{x}$    $(x - \bar{x})^2$
47     -3               9
50      0               0
53      3               9

The population variance equals the mean of the squares of the deviations. The sample variance is an estimate of the population variance given by the formula below.

The sample variance, denoted var, equals the ratio:

$\text{var} = \frac{\sum (x - \bar{x})^2}{n - 1}$

The sample variance for sample $A = \{47, 50, 53\}$ is calculated below.

$\text{var} = \frac{9 + 0 + 9}{3 - 1} = \frac{18}{2} = 9$

The third measure of dispersion is the standard deviation, which equals the square root of the variance.

The standard deviation, denoted s, is a distance from the mean that equals the square root of the variance:

$s = \sqrt{\text{var}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

The standard deviation measures the typical or standard distance of scores in the sample from the mean. According to the definition above, widely dispersed data sets have large standard deviations. Indeed, the larger the sample's standard deviation, the more widely dispersed are the elements in the sample. The standard deviation of sample $A = \{47, 50, 53\}$ is given here: $s = \sqrt{9} = 3$.

The standard deviation has two key characteristics. First, adding a constant to each score in a sample will not change the standard deviation. Thus, if $A^* = \{46, 49, 52\}$ (each score in A decreased by 1), then s = 3. Second, multiplying each score by a positive constant causes the standard deviation to be multiplied by the same constant. Thus, if $A^* = \{94, 100, 106\}$ (each score in A doubled), then s = 6.

Application Exercise 4.4

Problems

Suppose NASA studies the effects of micro-gravity on the immune system. As part of this study, NASA collects thirty blood samples from astronauts after six consecutive weeks in orbit and records the number of white cells, in thousands per cubic millimeter, below.

3.6  5.9  6.3  5.1  5.0  7.2  5.2  9.3  8.1  7.1
9.9  9.2  5.9  9.9  5.7  7.9  9.9  8.4  6.0  8.5
6.7  7.9  7.7  4.4  8.0  4.7  6.9  7.8  9.1  4.9

#1 Calculate the range of the sample.
#2 Calculate the variance of the sample.
#3 Calculate the standard deviation of the sample.
#4 Assume every measurement in the sample is actually four times greater than the given amount. What would the new standard deviation be?

Answers

#1 6.3
#2 var ≈ 3.22
#3 s ≈ 1.79
#4 s ≈ 7.18

Assignment 4.4

Problems

#1 Which statistic equals the difference between the largest and smallest values in a sample?
#2 Which statistic measures the typical or standard distance of scores in the sample from the mean?
#3 Find the range and standard deviation for the sample below. Round answers to the nearest hundredth.

{206.3, 210.4, 209.3, 211.1, 210.8, 213.5, 212.6, 210.5, 211.0, 214.2}

#4 Find the standard deviation for the data set displayed by the frequency distribution below. Round answers to the nearest hundredth.

Value      9  7  5  3  1
Frequency  3  4  7  5  2

#5 Consider the sample $S = \{x_1, x_2, x_3\}$ whose standard deviation equals 5. What is the standard deviation of a data set comprising the three data values $\{6 \cdot x_1, 6 \cdot x_2, 6 \cdot x_3\}$?
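As a numerical check on the variance and standard deviation formulas of Lecture 4.4, the following Python sketch implements them directly with the $n - 1$ divisor and reproduces the results for sample A, including the two characteristics of the standard deviation. The names `sample_variance` and `sample_std` are illustrative, not taken from the lecture.

```python
from math import sqrt


def sample_variance(sample):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(sample)
    x_bar = sum(sample) / n
    return sum((x - x_bar) ** 2 for x in sample) / (n - 1)


def sample_std(sample):
    """Standard deviation: the square root of the sample variance."""
    return sqrt(sample_variance(sample))


A = [47, 50, 53]
print(sample_variance(A), sample_std(A))   # 9.0 3.0
print(sample_std([x - 1 for x in A]))      # 3.0  (shifted sample {46, 49, 52})
print(sample_std([2 * x for x in A]))      # 6.0  (doubled sample {94, 100, 106})
```

Python's standard `statistics.variance` and `statistics.stdev` use the same $n - 1$ divisor, so they can serve as a cross-check on hand calculations such as those in the exercises.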