Statistics in Applied Science and Technology
Chapter 4: Summarizing Data
July, 2000 Guang Jin

Key Concepts in This Chapter
Mean, Median, Mode, Range, Standard Deviation, Variance, Coefficient of Variation

Measures of Central Tendency
Central tendency is the tendency of a set of data to center around certain values. The three most common measures are the mean, the median, and the mode.

The Mean
The arithmetic mean (or simply, the mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations. Symbolically, the mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $x_1$ is the first and $x_i$ is the $i$th in a series of observations, and $n$ is the total number of observations.

The Mean (Continued)
The arithmetic mean may be considered the balance point, or fulcrum, of a distribution: it is the point at which the positive and negative deviations balance each other. The mean is affected by the value of every observation in the distribution and may be distorted when extreme values exist.

The Median
The median is defined as the middle value when the observations are ordered. It is the value above which there are the same number of observations as below. For an even number of observations, the median is the average of the two middlemost values.

The Mode
The mode is the observation that occurs most frequently. The mode can be read from a graph as the value on the horizontal axis that corresponds to the peak of the distribution.

Which Average Should You Use for Quantitative Data?
When a distribution of observations is normal or not too skewed, the values of the mode, the median, and the mean are the same or similar, and any of them can be used to describe central tendency. When a distribution is skewed, the mean and the median can differ appreciably, so both the mean and the median should be reported.
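The three measures of central tendency described above can be sketched in Python with the standard-library `statistics` module. The data values below are hypothetical and chosen only for illustration; they are not from the original slides. The second part of the sketch adds one extreme value to show how the mean is distorted while the median barely moves.

```python
from statistics import mean, median, multimode

# Hypothetical sample, chosen only for illustration.
observations = [2, 3, 3, 5, 7, 10]

sample_mean = mean(observations)        # sum of the observations divided by n
sample_median = median(observations)    # n is even: average of the two middle values
sample_modes = multimode(observations)  # list of the most frequent value(s)

# Adding one extreme value shifts the mean far more than the median.
with_outlier = observations + [100]
mean_with_outlier = mean(with_outlier)
median_with_outlier = median(with_outlier)

print("mean:", sample_mean, "median:", sample_median, "mode(s):", sample_modes)
print("with outlier -> mean:", mean_with_outlier, "median:", median_with_outlier)
```

For this sample the mean rises from 5 to about 18.6 once the extreme value is added, while the median only moves from 4 to 5, which is why both are worth reporting for skewed data.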
Measures of Central Tendency for Qualitative Data
The mode can always be used with qualitative data.
The median can be used whenever the qualitative data are ordinal.
The mean is not appropriate for qualitative data.

Measures of Variation
Measures of variation (or variability) indicate whether observations tend to be quite similar (homogeneous) or vary considerably (heterogeneous). The three most common measures of variation are the range, the standard deviation, and the variance.

Range
The range is defined as the difference in value between the highest (maximum) and lowest (minimum) observation:
$$\text{Range} = x_{\max} - x_{\min}$$

Standard Deviation and Variance
By far the most widely used measure of variation is the standard deviation, represented by the symbol $s$. The standard deviation is the square root of the variance (represented by the symbol $s^2$) of the observations. The larger the standard deviation and variance, the more heterogeneous the distribution.

Variance
The variance ($s^2$) is computed by squaring each deviation from the mean, adding the squares up, and dividing their sum by one less than $n$, the sample size:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Standard Deviation
The standard deviation ($s$, sometimes represented by SD) is computed by taking the square root of the variance:
$$s = \sqrt{s^2}$$
The units of the standard deviation are the same as the units of the raw data.

Important Generalizations
For most frequency distributions, a majority (often as many as 68%) of all observations are within one standard deviation on either side of the mean.
For most frequency distributions, a small minority (often as many as 5%) of all observations deviate more than two standard deviations on either side of the mean.

Variability for Qualitative Data
For qualitative data that cannot be ordered, measures of variability are nonexistent.
For qualitative data that can be ordered, it is appropriate to describe variability by identifying extreme observations.

Coefficient of Variation
The coefficient of variation (represented by CV) is defined as the ratio of the standard deviation to the absolute value of the mean, expressed as a percentage:
$$CV = \frac{s}{|\bar{x}|} \times 100\%$$
CV depicts the size of the standard deviation relative to its mean and can be used to compare the relative variation of even unrelated quantities.

Equations for Population and Sample Means and Standard Deviation
Mean: population $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$; sample $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Variance: population $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$; sample $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard deviation: population $\sigma = \sqrt{\sigma^2}$; sample $s = \sqrt{s^2}$
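The range, variance, standard deviation, and coefficient of variation defined in this chapter can be worked through in a short Python sketch. The data values are hypothetical and chosen only for illustration. The formulas are written out by hand to mirror the definitions, and the standard-library functions `statistics.stdev` and `statistics.pstdev` are used only as a cross-check.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Hypothetical sample, chosen only for illustration.
observations = [4, 8, 6, 5, 7]

n = len(observations)
x_bar = mean(observations)

# Range: highest observation minus lowest observation.
data_range = max(observations) - min(observations)

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in observations)

# Sample variance divides by n - 1; the sample SD is its square root.
s2 = sum_sq_dev / (n - 1)
s = sqrt(s2)

# Population variance divides by N (here the same five values
# are treated as a whole population for comparison).
sigma2 = sum_sq_dev / n
sigma = sqrt(sigma2)

# Coefficient of variation: SD relative to |mean|, as a percentage.
cv_percent = 100 * s / abs(x_bar)

print("range:", data_range)
print("sample variance:", s2, "sample SD:", s)
print("population variance:", sigma2, "population SD:", sigma)
print("CV (%):", cv_percent)

# Cross-check the hand-written formulas against the standard library.
print("stdev:", stdev(observations), "pstdev:", pstdev(observations))
```

With these five values the squared deviations sum to 10, so the sample variance is 10 / 4 = 2.5 and the population variance is 10 / 5 = 2, which shows the effect of dividing by n - 1 rather than N.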