Unit 2 Chapter 2 Summary

Section 2.1 Constructing a frequency table

1. Determine the number of classes and determine the class width.
   $\text{Class width} = \dfrac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$
   Increase this value to the next whole number.
2. Create lower class limits and upper class limits using the class width. To set up each interval properly, take the lower class limit, add the width, then subtract 1 to get the upper class limit.
3. Tally the data with tick marks as you classify each value into its respective class. Then complete a table column entitled Frequency with the actual count for each class.
4. Compute the midpoint (class mark) for each class. Do not round this value.
   $\text{Midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2}$
5. Determine class boundaries. To find the upper class boundary, add 0.5 to the upper class limit. To find the lower class boundary, subtract 0.5 from the lower class limit.
6. Compute the relative frequency for each class.
   $\text{Relative frequency} = \dfrac{\text{class frequency}}{\text{total of all frequencies}}$

Follow the directions in docsharing to construct a histogram and relative frequency histogram.

Section 2.2 Other types of graphs

Stem-and-leaf plots
Dot plots
Pareto charts
Circle graphs (pie charts)
Time-series graphs

A stem-and-leaf display is a method of displaying data that rank-orders the data and arranges it into groups. Be sure to include any stems that are empty, as shown in the “6” stem.

Example: for the data shown below

12 23 56 25 15 45 35 25 32 14 10 18 29 43 75 71 13

Stems | Leaves
  1   | 0 2 3 4 5 8
  2   | 3 5 5 9
  3   | 2 5
  4   | 3 5
  5   | 6
  6   |              (note the empty stem)
  7   | 1 5

Section 2.3 Measures of Central Tendency: Mode, Median, Mean

The two data sets that will be used are shown below.
Data set x = {19, 18, 23, 19, 25, 27}
Data set y = {2, 3, 4, 3, 2, 5, 6}

MODE of a data set is the value that occurs most frequently. For data set x, the MODE is 19. For data set y, the MODES are 2 and 3. This last situation is called BIMODAL.

MEDIAN is the central value of an ordered distribution.
The concept of order indicates that the measure is positional, so the first step is to rewrite each data set in ascending order.
Data set x = {18, 19, 19, 23, 25, 27}
Data set y = {2, 2, 3, 3, 4, 5, 6}
Data set x has 6 items. Data set y has 7 items.

To find the median:
1. If the number of items is odd, the median is the middle item. Data set y has an odd number of items, 7, so the middle item is the 4th. Therefore MEDIAN = 3.
2. If the number of items is even, the median is the average of the middle two items. Data set x has an even number of items, 6. The middle 2 values are 19 and 23. Therefore
   $\text{MEDIAN} = \dfrac{19 + 23}{2} = 21$

MEAN is the arithmetic average of all the data values.
$\text{Mean} = \dfrac{\text{sum of all values}}{\text{number of values}}$
Mean of data set x = $\dfrac{19 + 18 + 23 + 19 + 25 + 27}{6} \approx 21.8$ (rounded to 1 decimal place)
Mean of data set y = $\dfrac{2 + 2 + 3 + 3 + 4 + 5 + 6}{7} \approx 3.6$ (rounded to 1 decimal place)

The proper formulas use the summation notation, $\Sigma$.
Sample mean: $\bar{x} = \dfrac{\Sigma x}{n}$, where $\bar{x}$ is pronounced "x-bar" and n is the number of values in the sample.
Population mean: $\mu = \dfrac{\Sigma x}{N}$, where $\mu$ is pronounced "mu" and N is the number of values in the population.

Section 2.4 Measures of Variation: Range, Standard Deviation, Variance

Measures of variation show the spread of the data, either overall or about the mean.

RANGE is the difference between the largest and smallest values of a data set.
Data set x = {18, 19, 19, 23, 25, 27}
Data set y = {2, 2, 3, 3, 4, 5, 6}
Data set x has 6 items. Data set y has 7 items.
Mean of x = 21.8
Mean of y = 3.6
Range of x = 27 - 18 = 9
Range of y = 6 - 2 = 4

The range shows the spread of the data but not how the data relate to the mean. The standard deviation and variance show the spread in relation to the mean. As with the mean, there is a sample standard deviation and a population standard deviation.
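The mode, median, mean, and range of data sets x and y can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the helper name `summarize` is an illustrative assumption and does not come from the chapter.

```python
import statistics

# Data sets x and y from Section 2.3 (order does not matter here).
x = [19, 18, 23, 19, 25, 27]
y = [2, 3, 4, 3, 2, 5, 6]

def summarize(data):
    """Return the modes, median, mean (1 decimal place), and range of a list."""
    return {
        # multimode returns every value tied for most frequent (handles bimodal data)
        "modes": sorted(statistics.multimode(data)),
        # median sorts internally and averages the middle two items when n is even
        "median": statistics.median(data),
        "mean": round(statistics.mean(data), 1),
        "range": max(data) - min(data),
    }

summary_x = summarize(x)
summary_y = summarize(y)
print(summary_x)
print(summary_y)
```

The results should agree with the hand calculations: one mode for x, two modes for y, medians of 21 and 3, means of 21.8 and 3.6, and ranges of 9 and 4.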
Method 1
SAMPLE VARIANCE: $s^2 = \dfrac{\Sigma (x - \bar{x})^2}{n - 1}$
SAMPLE STANDARD DEVIATION: $s = \sqrt{\dfrac{\Sigma (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{s^2}$

Method 2
SAMPLE VARIANCE: $s^2 = \dfrac{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}}{n - 1}$
SAMPLE STANDARD DEVIATION: $s = \sqrt{\dfrac{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}}{n - 1}}$ or $s = \sqrt{s^2}$

Method 1 for data set x (n = 6, mean of x = 21.8):

x      x - mean    (x - mean)^2
18     -3.8        14.7
19     -2.8         8.0
19     -2.8         8.0
23      1.2         1.4
25      3.2        10.0
27      5.2        26.7
Sum of (x - mean)^2 = 68.8

Data set x: $s = \sqrt{\dfrac{68.8}{6 - 1}} \approx 3.7$ and $s^2 \approx 13.8$

Method 1 for data set y (n = 7, mean of y = 3.6):

y      y - mean    (y - mean)^2
2      -1.6         2.5
2      -1.6         2.5
3      -0.6         0.3
3      -0.6         0.3
4       0.4         0.2
5       1.4         2.0
6       2.4         5.9
Sum of (y - mean)^2 = 13.7

Data set y: $s = \sqrt{\dfrac{13.7}{7 - 1}} \approx 1.5$ and $s^2 \approx 2.3$

Method 2 for data set x:

x      x^2
18     324
19     361
19     361
23     529
25     625
27     729
Sum of x = 131, sum of x^2 = 2929

$s^2 = \dfrac{2929 - \dfrac{131^2}{6}}{6 - 1} = \dfrac{2929 - 2860.167}{5} = \dfrac{68.833}{5} \approx 13.8$
$s = \sqrt{13.8} \approx 3.7$

Method 2 for data set y:

y      y^2
2       4
2       4
3       9
3       9
4      16
5      25
6      36
Sum of y = 25, sum of y^2 = 103

$s^2 = \dfrac{103 - \dfrac{25^2}{7}}{7 - 1} = \dfrac{103 - 89.29}{6} = \dfrac{13.71}{6} \approx 2.3$
$s = \sqrt{2.3} \approx 1.5$

There is a corresponding standard deviation and variance for the population. The population standard deviation is denoted $\sigma$, pronounced "sigma". The population variance is $\sigma^2$, called "sigma squared". The formulas for these are found on page 88.

Data sets x and y are quite different and would be difficult to compare using the measures we have produced so far. To compare different data sets, we can use the coefficient of variation.

Chebyshev's Theorem

This theorem is used to show the data spread about the mean.

Results of Chebyshev's Theorem: for any set of data,
At least 75% of the data fall in the interval from $\mu - 2\sigma$ to $\mu + 2\sigma$
At least 88.9% of the data fall in the interval from $\mu - 3\sigma$ to $\mu + 3\sigma$
At least 93.8% of the data fall in the interval from $\mu - 4\sigma$ to $\mu + 4\sigma$

Using $\bar{x}$ to estimate $\mu$ and s to estimate $\sigma$, we can draw some conclusions about data set x. At least 75% of the data must fall within 2 standard deviations of the mean.
For data set x, this interval runs from $\bar{x} - 2s$ to $\bar{x} + 2s$: 21.8 - 2(3.7) to 21.8 + 2(3.7), that is, 14.4 to 29.2.

Section 2.5 Percentiles and Five-Number Summary

For a whole number P, where $1 \le P \le 99$, the Pth percentile of the distribution is a value such that P% of the data fall at or below it and (100 - P)% of the data fall at or above it.

Quartiles are special percentiles. The 25th percentile is the first quartile Q1, the 50th percentile is the second quartile Q2, and the 75th percentile is the third quartile Q3. The second quartile Q2 is the same as the median.

Data set D is shown. It has been arranged in ascending order.

 2   5   7   8   8  11  12  14  20  23
23  25  26  27  28  29  31  36  36  42

1. Find the median, which is Q2. Data set D has 20 items, so the median falls between the 10th and 11th items.
   $\text{Median} = \dfrac{23 + 23}{2} = 23$
2. Find Q1. This is the median of the lower half of the data (the 10th item and below).
   $Q_1 = \dfrac{8 + 11}{2} = 9.5$
3. Find Q3. This is the median of the upper half of the data (the 11th item and above).
   $Q_3 = \dfrac{28 + 29}{2} = 28.5$

Now the five-number summary can be given.
Lowest value = 2
Q1 = 9.5
Median or Q2 = 23
Q3 = 28.5
Highest value = 42

Interquartile Range: IQR = Q3 - Q1
The interquartile range for data set D is IQR = 28.5 - 9.5 = 19.

The interquartile range is used to check whether any extremely large or small values may have too much influence on the data analysis.
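The quartile steps for data set D can be sketched in Python as a check on the hand calculation. The function name `five_number_summary` is an illustrative assumption, and so is the handling of an odd number of items (the middle item is left out of both halves), since the chapter only works an even-sized example. Library routines that compute quartiles by interpolation may give slightly different values.

```python
import statistics

# Data set D from Section 2.5, in ascending order (20 items).
D = [2, 5, 7, 8, 8, 11, 12, 14, 20, 23,
     23, 25, 26, 27, 28, 29, 31, 36, 36, 42]

def five_number_summary(data):
    """Five-number summary using the median-of-halves method (needs at least 2 items)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]  # items below the median position
    # Assumption: with an odd count, the middle item belongs to neither half.
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return {
        "min": values[0],
        "Q1": statistics.median(lower),
        "median": statistics.median(values),
        "Q3": statistics.median(upper),
        "max": values[-1],
    }

summary = five_number_summary(D)
iqr = summary["Q3"] - summary["Q1"]
print(summary)
print("IQR =", iqr)
```

For data set D this should reproduce the values found by hand: Q1 = 9.5, median = 23, Q3 = 28.5, and IQR = 19.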