Chapter 6 ANALYZING DATA STATISTICALLY
There are several commonly used parameters that allow us to draw conclusions about the characteristics of a data set. They are the mean, median, mode, min, max, variance, and standard deviation.

Mean
• The mean is the most commonly used characteristic of a data set. It is also referred to as the average or the arithmetic average. It is an indication of the expected behavior of a data set.
• The mean is determined by the well-known formula
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (6.1)$$
• In Excel, the AVERAGE function is used to determine the mean. The argument contained in parentheses indicates a block of cells containing the values to be averaged. Thus, the expression =AVERAGE(B1:B12) will determine the mean of the values stored in cells B1 through B12. Within the indicated block of cells, zeros are included in the calculation but blank cells are ignored.

Median
• The median is a value such that half the data values lie above it and half lie below it. If the number of data values is odd, the median coincides with one of the data values.
• For example, 3 is the median for the data set (2, 0, 8, 3, 5), since in sorted order the values are (0, 2, 3, 5, 8). If the number of data values is even, however, the median is usually taken as the average of the two centermost values. Thus, 4 is the median for the data set (2, 8, 3, 5), since its two centermost sorted values are 3 and 5.
• In Excel, the MEDIAN function is used to determine the median. It is used in the same manner as the AVERAGE function described above. Thus, the expression =MEDIAN(B1:B12) will determine the median of the values stored in cells B1 through B12. The numerical values within the cells need not be sorted.

Mode
• The mode is the value that occurs with the greatest frequency within a data set. Not all data sets have a mode; on the other hand, some data sets have multiple modes.
• The data set (1, 2, 3, 4, 5), for example, does not have a mode because no value occurs more frequently than any other. However, in the data set (1, 2, 2, 4, 5), the mode is 2. The data set (1, 2, 2, 3, 4, 4, 5) has two modes, 2 and 4.
• In Excel, the mode can be determined with the MODE function. Again, the argument indicates the block of cells containing the data. Thus, the expression =MODE(B1:B12) will determine the mode of the values stored in cells B1 through B12. The MODE function returns an error message (#N/A) if the data set does not have a mode. If the data set has more than one mode, MODE returns only one of them.

Min and Max
• The min and the max (i.e., the minimum and the maximum) simply indicate the extremes of the data set.
• In Excel, the MIN and MAX functions return these values. The arguments again indicate the block of cells containing the data. Thus, the expression =MIN(B1:B12) returns the smallest value within cells B1 through B12, whereas =MAX(B1:B12) returns the largest value. Blank cells are ignored.
• Note that the MIN and MAX functions return the values that are the smallest and the largest algebraically, not the values that are the smallest and the largest in magnitude. Thus, for the data set (−5, −2, 1), the MIN function returns −5 (algebraically the smallest value), and the MAX function returns 1 (algebraically the largest), even though −5 has the largest magnitude.

Variance
• The variance provides an indication of the degree of spread in the data. The greater the variance, the greater the spread.
• The variance (more precisely, the sample variance) is defined as
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (6.2)$$
where $s^2$ represents the variance, $x_i$ represents an individual data value, $\bar{x}$ represents the mean, and $n$ represents the number of data values. Note that the equation involves summing the squares of the differences between each data value and the calculated mean. Statisticians refer to the variance as the second moment about the mean.
• Excel includes the VAR function, which is used to determine the variance. As before, the argument indicates the block of cells containing the data. Thus, the expression =VAR(B1:B12) returns the variance of the numerical values in cells B1 through B12. VAR computes the sample variance, with $n - 1$ in the denominator.

Standard Deviation
• The standard deviation also provides a measure of spread in the data set. It is simply the square root of the variance. Thus, the sample standard deviation is
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (6.3)$$
where $s$ represents the sample standard deviation and all other symbols are as defined for the variance.
• In Excel, the STDEV function returns the standard deviation of its arguments. Again, the arguments indicate the block of cells containing the data. Hence the expression =STDEV(B1:B12) will return the standard deviation of the data in cells B1 through B12. We could, of course, first determine the variance and then calculate its square root, but the direct calculation is easier.

The heights of 20 engineering students are given in the following table. Using Equations (6.1), (6.2), and (6.3), determine the mean, median, mode, min, max, variance, and standard deviation. Carry out the calculations by hand, using only a calculator, a pencil, and a piece of paper. (Do not use a spreadsheet to obtain a solution. Also, do not use any of the built-in statistical functions that may be present in your calculator.)

Install Analysis ToolPak
• For readers with some statistical background, Excel can generate a summary of all applicable descriptive statistics for a given data set (i.e., one or more columns or rows of numerical data). This capability is provided by the Descriptive Statistics tool in the Analysis ToolPak.
• To install the Analysis ToolPak, click on the Office Button and choose Excel Options at the bottom of the window. Then choose Add-Ins from the list at the left to display the list of add-ins. If the Analysis ToolPak is not listed as an active add-in, choose Excel Add-ins in the Manage box at the bottom and click Go. Then check the box labeled Analysis ToolPak, and click OK.
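The formulas in this chapter can be collected into a short Python sketch, useful for checking hand calculations or spreadsheet results. It follows the definitions given above (sorted-order median, most frequent value or values for the mode, $n - 1$ in the variance denominator); it is not a description of how Excel computes these values internally. Representing blank cells as None is an assumption made to mimic how the Excel functions skip blanks, and all function names (cell_values, mean, median, modes, sample_variance, sample_std_dev, summary) are hypothetical.

```python
import math
from collections import Counter


def cell_values(cells):
    # Hypothetical stand-in for an Excel range: blank cells are represented
    # as None (an assumption of this sketch) and skipped; zeros are kept,
    # mirroring how AVERAGE, MIN, and MAX treat blanks and zeros.
    return [x for x in cells if x is not None]


def mean(cells):
    # x̄ = (1/n) * Σ x_i
    xs = cell_values(cells)
    return sum(xs) / len(xs)


def median(cells):
    # Sort a copy first, so the input order does not matter.
    xs = sorted(cell_values(cells))
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                  # odd count: the centermost value
    return (xs[mid - 1] + xs[mid]) / 2  # even count: average of the two centermost values


def modes(cells):
    # Every value that occurs with the greatest frequency. An empty list
    # means no value repeats, the case in which Excel's MODE returns #N/A.
    counts = Counter(cell_values(cells))
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def sample_variance(cells):
    # s² = Σ (x_i − x̄)² / (n − 1)
    xs = cell_values(cells)
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_std_dev(cells):
    # s = √(s²)
    return math.sqrt(sample_variance(cells))


def summary(cells):
    # A small subset of what the Descriptive Statistics tool reports.
    xs = cell_values(cells)
    return {
        "mean": mean(xs),
        "median": median(xs),
        "mode": modes(xs),
        "min": min(xs),  # algebraically smallest, as with MIN
        "max": max(xs),  # algebraically largest, as with MAX
        "variance": sample_variance(xs),
        "std_dev": sample_std_dev(xs),
    }


if __name__ == "__main__":
    # The first list is a made-up range with one blank cell;
    # the remaining data sets come from the examples in the text.
    print(mean([2, None, 0, 4]))               # 2.0 (zero counted, blank ignored)
    print(median([2, 0, 8, 3, 5]))             # 3
    print(median([2, 8, 3, 5]))                # 4.0
    print(modes([1, 2, 3, 4, 5]))              # []
    print(modes([1, 2, 2, 4, 5]))              # [2]
    print(modes([1, 2, 2, 3, 4, 4, 5]))        # [2, 4]
    print(min([-5, -2, 1]), max([-5, -2, 1]))  # -5 1
    print(sample_variance([-5, -2, 1]))        # 9.0
    print(sample_std_dev([-5, -2, 1]))         # 3.0
```

One design choice differs from Excel on purpose: modes returns every mode as a list, whereas MODE reports only one, so a bimodal set such as (1, 2, 2, 3, 4, 4, 5) shows both 2 and 4.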
Example 6.1 Analyzing a Data Set
• A car manufacturer wishes to determine how accurately the cylinders are being machined in several engine blocks. The design specifications call for a cylinder diameter of 3.500 inches, with a tolerance of ±0.005 inch.
• To determine the accuracy of the cylinders, several engine blocks were taken from the assembly line during manufacture, and one cylinder was measured in each block. For consistency, the measurement was always taken perpendicular to the axis of the engine block (i.e., perpendicular to the straight line connecting the centers of the cylinders).
• Analyze the data by placing them in an Excel worksheet and then calculating the mean, median, mode, min, max, and standard deviation. Also, generate a table showing all summary statistics for the given data.

Problem 6.4
• Enter the data given in Problem 6.3 into an Excel worksheet and determine the mean, median, mode, min, max, variance, and standard deviation. Be sure the worksheet is legible and clearly labeled.
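Example 6.1 is meant to be worked in Excel, but its checks can also be sketched in Python with the standard-library statistics module, where statistics.stdev is the sample standard deviation (the quantity STDEV returns). The comparison against the 3.500 ± 0.005 in. specification is an addition of this sketch rather than part of the example as stated, and the names NOMINAL_DIAMETER, TOLERANCE, EPS, out_of_spec, and describe are hypothetical. The measurement table is not reproduced here, so no data are included.

```python
import statistics

# Specification from Example 6.1 (inches)
NOMINAL_DIAMETER = 3.500
TOLERANCE = 0.005

# Small allowance for binary floating-point rounding, so that a reading
# exactly on a limit (e.g., 3.505) is treated as within tolerance.
EPS = 1e-9


def out_of_spec(diameters, nominal=NOMINAL_DIAMETER, tol=TOLERANCE):
    # Readings whose deviation from nominal exceeds the ± tolerance,
    # returned in their original order
    return [d for d in diameters if abs(d - nominal) > tol + EPS]


def describe(diameters):
    # Summary statistics for the measured diameters plus a spec check.
    # statistics.stdev needs at least two readings.
    return {
        "mean": statistics.mean(diameters),
        "median": statistics.median(diameters),
        "min": min(diameters),
        "max": max(diameters),
        "std_dev": statistics.stdev(diameters),  # sample standard deviation, like STDEV
        "out_of_spec": out_of_spec(diameters),
    }
```

To use it, call describe(measurements) with the measured diameters as a list of floats; a non-empty "out_of_spec" entry lists the cylinders that fall outside the design tolerance.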