Chapter 6 ANALYZING DATA STATISTICALLY
There are several commonly used parameters that allow us to draw
conclusions about the characteristics of a data set. They are the mean,
median, mode, min, max, variance, and standard deviation.
Mean
• The mean is the most commonly used characteristic of a data set. It is
also referred to as the average or the arithmetic average. It is an
indication of the expected behavior of a data set.
• The mean is determined by the well-known formula
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• In Excel, the AVERAGE function is used to determine the mean. The
argument contained in parentheses indicates a block of cells
containing the values to be averaged. Thus, the expression
=AVERAGE(B1:B12) will determine the mean of the values stored in
cells B1 through B12. Within the indicated block of cells, zeros are
included in the calculation but blank cells are ignored.
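As a cross-check outside Excel, the same calculation can be sketched in Python. The helper name `average` and the use of `None` to stand in for a blank cell are assumptions made for illustration; they are not part of Excel.

```python
def average(cells):
    """Mean of the numeric entries in `cells`.

    Assumption: None represents a blank cell. Blanks are skipped,
    while zeros count as ordinary data values.
    """
    values = [v for v in cells if v is not None]
    if not values:
        raise ValueError("no numeric values to average")
    return sum(values) / len(values)


print(average([2, 0, 8, 3, 5]))   # 3.6
print(average([2, 0, None, 4]))   # 2.0 (the blank is skipped, the zero is kept)
```

Replacing the zero with a blank in the second call would change the result to 3.0, which is why the distinction between zeros and blank cells matters.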
Median
• The median is a value such that half the data values lie above and half lie
below. If the number of data values is odd, the median coincides with one
of the data values. For example, 3 is the median for the data set
(2, 0, 8, 3, 5).
• If the number of data values is even, the median is usually taken as the
average of the two centermost values of the sorted data. Thus, 4 is the
median for the data set (2, 8, 3, 5).
• In Excel, the MEDIAN function is used to determine the median. It is used
in the same manner as the AVERAGE function described above. Thus, the
expression =MEDIAN(B1:B12) will determine the median of the values
stored in cells B1 through B12. The numerical values within the cells need
not be sorted.
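The odd and even cases can be illustrated with a short Python sketch. The function name `median` is chosen here for illustration; Python's standard `statistics.median` follows the same convention.

```python
def median(values):
    """Median of a list of numbers; the input does not need to be sorted."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is the middle data value itself.
        return ordered[mid]
    # Even count: average the two centermost values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 0, 8, 3, 5]))  # 3
print(median([2, 8, 3, 5]))     # 4.0
```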
Mode
• The mode is the value that occurs with the greatest frequency within a
data set. Not all data sets have a mode. On the other hand, some data sets
have multiple modes.
• The data set (1, 2, 3, 4, 5), for example, does not have a mode because no
value occurs more frequently than any other. However, in the data set (1, 2,
2, 4, 5), the mode is 2. The data set (1, 2, 2, 3, 4, 4, 5) has two modes, 2 and
4.
• In Excel, the mode can be determined with the MODE function. Again, the
arguments indicate the block of cells containing the data. Thus, the
expression =MODE(B1:B12) will determine the mode of the values stored
in cells B1 through B12. The MODE function returns an error message
(#N/A) if the data set does not have a mode.
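The three data sets above can be checked with a small Python sketch. The function `modes` is a hypothetical helper, not an Excel feature. It assumes that a data set has no mode only when every value occurs exactly once, and it returns every tied value rather than a single one.

```python
from collections import Counter


def modes(values):
    """Return all modes in order of first appearance.

    Assumption: an empty list is returned when no value repeats,
    which is the "no mode" case described in the text.
    """
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 3, 4, 5]))        # []
print(modes([1, 2, 2, 4, 5]))        # [2]
print(modes([1, 2, 2, 3, 4, 4, 5]))  # [2, 4]
```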
Min and Max
• The min and the max (i.e., the minimum and the maximum) simply indicate
the extremities of the data set.
• In Excel, the MIN and MAX functions return these values. The arguments
again indicate the block of cells containing the data. Thus, the expression
=MIN(B1:B12) returns the smallest value within the cells B1 through B12,
whereas =MAX(B1:B12) returns the largest value. Blank cells are ignored.
• Note that the MIN and MAX functions return the values that are the
smallest and the largest algebraically. They do not return the values that
are the smallest and the largest in magnitude. Thus, for the data set (−5,
−2, 1), the MIN function would return −5 (which is algebraically the
smallest value), and the MAX function would return 1 (which is
algebraically the largest).
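The difference between algebraic and magnitude-based extremes can be seen in a brief Python sketch using the same data set. The variable names are illustrative only.

```python
data = [-5, -2, 1]

# Algebraic extremes, the same sense in which MIN and MAX are described above.
smallest = min(data)
largest = max(data)

# Extremes by magnitude, obtained by comparing absolute values instead.
smallest_magnitude = min(data, key=abs)
largest_magnitude = max(data, key=abs)

print(smallest, largest)                      # -5 1
print(smallest_magnitude, largest_magnitude)  # 1 -5
```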
Variance
• The variance provides an indication of the degree of spread in the data. The
greater the variance, the greater the spread.
• The variance (more precisely, the sample variance) is defined as
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
where $s^2$ represents the variance, $x_i$ represents an individual data value, $\bar{x}$
represents the mean, and $n$ represents the number of data values. Note that the
equation involves summing the square of the difference between each data value
and the calculated mean. Statisticians refer to the variance as the second
moment about the mean.
• Excel includes the VAR function, which is used to determine the variance. As
before, the argument indicates the block of cells containing the data. Thus, the
expression =VAR(B1:B12) returns the variance of the numerical values in cells B1
through B12.
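The definition of the sample variance translates directly into a few lines of Python. The function name `sample_variance` is chosen for illustration; the standard library's `statistics.variance` uses the same n − 1 divisor and is shown alongside it as a check.

```python
import statistics


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two data values are required")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [1, 2, 3, 4, 5]             # mean is 3, squared deviations sum to 10
print(sample_variance(data))       # 2.5
print(statistics.variance(data))   # 2.5
```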
Standard Deviation
• The standard deviation also provides a measure of spread in the data set. It
is simply the square root of the variance. Thus, the sample standard
deviation is
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
where $s$ represents the sample standard deviation and all other symbols are
as defined for the variance.
• In Excel, the function STDEV returns the standard deviation of its arguments.
Again, the arguments indicate the block of cells containing the data. Hence
the expression =STDEV(B1:B12) will return the standard deviation of the
data in cells B1 through B12. We could, of course, first determine the
variance and then calculate its square root, but the direct calculation is
easier.
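Both routes to the standard deviation, through the variance and by direct calculation, can be compared in a short Python sketch. The data set and variable names are illustrative only.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

# Route 1: compute the sample variance first, then take its square root.
variance = statistics.variance(data)
s_from_variance = math.sqrt(variance)

# Route 2: compute the sample standard deviation directly.
s_direct = statistics.stdev(data)

print(s_from_variance, s_direct)  # both are approximately 1.581
```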
The heights of 20 engineering students are given in the following table. Using
Equations (6.1), (6.2), and (6.3), determine the mean, median, mode, min, max,
variance, and standard deviation. Carry out the calculations by hand, using only a
calculator, a pencil, and a piece of paper. (Do not use a spreadsheet to obtain a
solution. Also, do not use any of the built-in statistical functions that may be
present in your calculator.)
Install Analysis ToolPak
• For those readers with some statistical background, Excel can
generate a summary of all applicable descriptive statistics for a
given data set (i.e., one or more columns or rows of numerical data).
This summary is produced by the Descriptive Statistics feature found in
the Analysis ToolPak.
• To install the Analysis ToolPak, click on the Office Button and choose
Excel Options at the bottom of the window. Then choose Add-Ins
from the list at the left to display the list of add-ins. If the
Analysis ToolPak is not included in the list, click on Manage: Excel
Add-ins/Go at the bottom. Then check the box labeled Analysis
ToolPak, and click on OK.
Example 6.1 Analyzing a Data Set
• A car manufacturer wishes to determine how accurately the cylinders are
being machined in several engine blocks. The design specifications call for a
cylinder diameter of 3.500 inches, with a tolerance of ±0.005 inch.
• To determine the accuracy of the cylinders, several engine blocks were
taken from the assembly line during manufacture and one cylinder was
measured in each block. For consistency, the measurement was always
perpendicular to the axis of the engine block (i.e., perpendicular to the
straight line connecting the centers of the cylinders).
• Analyze the data by placing them in an Excel worksheet and then
calculating the mean, median, mode, min, max, and standard deviation.
Also, generate a table showing all summary statistics for the given data.
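For readers who want to cross-check a worksheet of this kind, the same summary can be sketched in Python. The measured diameters are not reproduced in this transcript, so the values in `diameters` are hypothetical placeholders, and the tolerance check is an added illustration rather than part of the stated task. Only the nominal diameter and the tolerance are taken from the example.

```python
import statistics

# HYPOTHETICAL diameters in inches, made up for illustration only.
# Substitute the actual measurements from the example.
diameters = [3.502, 3.497, 3.500, 3.504, 3.493, 3.500, 3.507, 3.498]

NOMINAL = 3.500     # design diameter from the example, inches
TOLERANCE = 0.005   # allowed deviation from the example, inches

summary = {
    "mean": statistics.mean(diameters),
    "median": statistics.median(diameters),
    "modes": statistics.multimode(diameters),  # requires Python 3.8 or later
    "min": min(diameters),
    "max": max(diameters),
    "stdev": statistics.stdev(diameters),      # sample standard deviation
}

# Measurements that fall outside the design tolerance.
out_of_tolerance = [d for d in diameters if abs(d - NOMINAL) > TOLERANCE]

for name, value in summary.items():
    print(f"{name}: {value}")
print("outside tolerance:", out_of_tolerance)
```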
Problem 6.4
• Enter the data given in Problem 6.3 into an Excel worksheet and
determine the mean, median, mode, min, max, variance, and standard
deviation. Be sure the worksheet is legible and clearly labeled.