Measure of Location

The term "average" is vague. It could mean one of four things: the arithmetic mean, the median, the midrange, or the mode. For this reason, it is better to specify which average you are talking about.

Mean

When people say "average", they usually intend the mean.

Population mean ($\mu$), where N = number of observations in the population:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Sample mean ($\bar{x}$), where n = number of observations in the sample:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

When reporting $\bar{x}$, it is recommended to use one more decimal digit than the accuracy of the $x_i$'s. For instance, if $x_1 = 58$ and $x_2 = 67$, then $\bar{x} = 62.5$.

Physical interpretation of $\bar{x}$

Suppose there is a horizontal measurement axis, and each sample observation is represented by a 1-kg weight placed at the corresponding point on the axis. The only point at which a fulcrum can be placed to balance the system of weights is the point corresponding to the value of $\bar{x}$.

[Figure: weights on a measurement axis marked from 10 to 40, balanced at $\bar{x} = 21.18$]

Median

The word median is synonymous with "middle". The sample median ($\tilde{x}$) is the middle value when the observations are ordered from smallest to largest. To find the median, sort the data in ascending order, with any repeated values included. Then:

If n is odd, $\tilde{x}$ = the single middle value, that is, the $\left(\frac{n+1}{2}\right)$th ordered value.

If n is even, $\tilde{x}$ = the average of the two middle values, that is, the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th ordered values.

Ex. Find the median of the following data set:
3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7

Sol. Sorted data (n = 14): 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
Median = (6 + 7)/2 = 6.5

BOXPLOT

Ex. We can make a box plot by using medians as follows:

1. Split the ordered numbers into the halves to the left and right of the median:
   1, 2, 3, 4, 5, 6, 6 │ 7, 8, 9, 10, 11, 13, 14
2. Find the median of each half:
   Left median (lower quartile) = 4
   Right median (upper quartile) = 10
3. Draw a number line from the smallest to the largest number (here, from 1 to 14) without skipping any numbers.
4. Put circles at the lower and upper quartiles and at the median (6.5).
5. Draw a box connecting the circles at the lower and upper quartiles.
6. Draw a line connecting the median to the box.
7. Put circles at the high and low points.
8. Draw lines that connect the high and low points to the box.

[Figure: box plot of the data drawn on a number line from 1 to 14]

Mean vs. median

The population mean $\mu$ and the population median $\tilde{\mu}$ will not generally be equal to each other. If the population distribution is either positively or negatively skewed, then $\mu \neq \tilde{\mu}$.

[Figure: three population distributions: (a) negative skew, $\mu < \tilde{\mu}$; (b) symmetric, $\mu = \tilde{\mu}$; (c) positive skew, $\mu > \tilde{\mu}$]

Mode (x*)

The mode is the most frequent data value. There may be no mode if no value appears more often than any other. There may also be two modes (bimodal), three modes (trimodal), or more than three modes (multimodal).

Other measures of location: quartiles, percentiles, and trimmed means

The median divides the data set into two parts of equal size. Quartiles divide the data set into 4 equal parts. Percentiles divide the data set into 100 equal parts.

A 10% trimmed mean is computed by eliminating the smallest 10% and the largest 10% of the sample, and then averaging what is left over. A 15% trimmed mean is computed by eliminating the smallest 15% and the largest 15% of the sample, and then averaging what is left over.

The median is obtained by trimming the maximum possible amount from each end. The mean is obtained by trimming 0% from each end of the sample.
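The median rule and the split-at-the-median procedure from the box plot example can be checked with a short Python sketch. The function name `median_and_quartiles` is hypothetical, and the handling of an odd sample size (leaving the middle value out of both halves) is an assumption, since the notes only work an even-sized example and conventions for quartiles differ.

```python
from statistics import median


def median_and_quartiles(data):
    """Return (lower quartile, median, upper quartile) by splitting the
    ordered data at the median and taking the median of each half."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: for odd n the middle value belongs to neither half.
    upper_half = ordered[half + (n % 2):]
    return median(lower_half), median(ordered), median(upper_half)


data = [3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7]
q1, med, q3 = median_and_quartiles(data)
print(q1, med, q3)  # 4 6.5 10
```

Library routines for percentiles often interpolate between ordered values, so their quartiles may differ slightly from the halves method used here.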
Example

Consider the following 20 observations, ordered from smallest to largest, each one representing the lifetime (in hours) of a certain type of incandescent lamp:

612 623 666 744 883 898 964 970 983 1003
1016 1022 1029 1058 1085 1088 1122 1135 1197 1201

The average of all 20 observations is $\bar{x} = 965.0$, and the median is $\tilde{x} = 1009.5$.

The 10% trimmed mean is obtained by deleting the smallest two observations (612 and 623) and the largest two (1197 and 1201), then averaging the remaining 16: $\bar{x}_{tr(10)} = 979.1$.

The 20% trimmed mean is obtained by averaging the middle 12 values: $\bar{x}_{tr(20)} = 999.9$.

Note: the 20% trimmed mean is closer to the median than the 10% trimmed mean.

Summary

The mean is used in computing other statistics (such as the variance). It is often not appropriate for skewed distributions such as salary information.

The median is the center number and is good for skewed distributions because it is resistant to extreme values.

The mode is used to describe the most typical case. The mode can be used with nominal data, whereas the others cannot. The mode may or may not exist, and there may be more than one value for the mode.

Property                     Mean   Median   Mode
Uses all data values         Yes    No       No
Affected by extreme values   Yes    No       No

Measures of Variation

The following figure shows dotplots of 3 samples with the same mean and median.

[Figure: dotplots of samples 1, 2, and 3 on a common axis marked from 10 to 50]

Questions
- It can be seen that the spread about the center is different for all 3 samples.
- Which sample has the largest amount of variability?
- Which sample has the smallest amount of variability?

Range

The range is the simplest measure of variation. It is simply the highest value minus the lowest value.

RANGE = MAXIMUM - MINIMUM

Since the range uses only the largest and smallest values, it is greatly affected by extreme values; that is, it is not a resistant measure.
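The figures quoted in the lamp-lifetime example, together with the range of the same data, can be reproduced with a small sketch. The helper name `trimmed_mean` is hypothetical, and truncating the trim count when the percentage does not give a whole number of observations is an assumption; the notes only use cases where it does.

```python
from statistics import mean, median


def trimmed_mean(data, percent):
    """Average what is left after dropping the smallest and largest
    `percent` percent of the observations."""
    ordered = sorted(data)
    # Assumption: truncate if percent of n is not a whole number.
    k = int(len(ordered) * percent / 100)
    return mean(ordered[k:len(ordered) - k])


lifetimes = [612, 623, 666, 744, 883, 898, 964, 970, 983, 1003,
             1016, 1022, 1029, 1058, 1085, 1088, 1122, 1135, 1197, 1201]

print(f"mean   = {mean(lifetimes):.2f}")              # 964.95
print(f"median = {median(lifetimes)}")                # 1009.5
print(f"tr(10) = {trimmed_mean(lifetimes, 10):.1f}")  # 979.1
print(f"tr(20) = {trimmed_mean(lifetimes, 20):.1f}")  # 999.9
print(f"range  = {max(lifetimes) - min(lifetimes)}")  # 589
```

Calling `trimmed_mean(lifetimes, 0)` returns the ordinary mean, which matches the remark that the mean corresponds to trimming 0% from each end.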
Variance

"Average Deviation"

The range involves only the smallest and largest numbers, and it would be desirable to have a statistic that involves all of the data values. There are many ways to produce such a statistic. An easy way is to use the average deviation from the mean, which is defined as:

$$\text{Average Deviation} = \frac{\sum (x_i - \mu)}{N}$$

The problem is that this summation is always zero, so the average deviation is always zero. That is why the average deviation is never used.

Population Variance

To keep the sum from being zero, each deviation from the mean is squared, giving the "squared deviation from the mean". The average squared deviation from the mean is called the variance.

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Sample Variance

One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, that formula gives a biased estimate: on average, it underestimates the population variance. To counteract this, the sum of the squared deviations is divided by one less than the sample size.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Computational formula for $s^2$:

$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$

Proof

Because $\bar{x} = \frac{\sum x_i}{n}$, we have

$$n\bar{x}^2 = \frac{(\sum x_i)^2}{n} \qquad (1)$$

Therefore

$$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2\bar{x}x_i + \bar{x}^2)$$
$$= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$$
$$= \sum x_i^2 - 2\bar{x} \cdot n\bar{x} + n\bar{x}^2$$
$$= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2$$
$$= \sum x_i^2 - n\bar{x}^2$$
$$= \sum x_i^2 - \frac{(\sum x_i)^2}{n} \quad \text{(from (1))}$$

Standard Deviation

There is a problem with variances. Recall that the deviations were squared, which means that the units were also squared. To get the units back to those of the original data values, the square root must be taken.

Population standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

Sample standard deviation:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Example (From Devore, 2000)

The amount of light reflectance by leaves has been used for various purposes, including evaluation of turf color, estimation of nitrogen status, and measurement of biomass.
The following observations were obtained, using spectrophotogrammetry, on leaf reflectance under specified experimental conditions.

Observation   x_i     x_i^2       Observation   x_i     x_i^2
1             15.2    231.04      9             12.7    161.29
2             16.8    282.24      10            15.8    249.64
3             12.6    158.76      11            19.2    368.64
4             13.2    174.24      12            12.7    161.29
5             12.8    163.84      13            15.6    243.36
6             13.8    190.44      14            13.5    182.25
7             16.3    265.69      15            12.9    166.41
8             13.0    169.00

$\sum x_i = 216.1$, $\sum x_i^2 = 3168.13$

Find $s^2$ and $s$.

$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{3168.13 - \frac{(216.1)^2}{15}}{15 - 1} = 3.92$$

$$s = \sqrt{3.92} = 1.98$$

Because the numerator of $s^2$ in its defining form, $\sum (x_i - \bar{x})^2$, is a sum of nonnegative quantities, $s^2$ is guaranteed to be nonnegative.

Properties of $s^2$

The properties of $s^2$ can sometimes be used to increase computational efficiency. Let $x_1, x_2, \ldots, x_n$ be a sample and c be any nonzero constant.

1. If $y_1 = x_1 + c,\ y_2 = x_2 + c,\ \ldots,\ y_n = x_n + c$, then $s_y^2 = s_x^2$, and
2. If $y_1 = cx_1,\ \ldots,\ y_n = cx_n$, then $s_y^2 = c^2 s_x^2$ and $s_y = |c|\, s_x$,

where
$s_x^2$ = sample variance of the x's
$s_y^2$ = sample variance of the y's

In words, Property 1 says that if a constant c is added to (or subtracted from) each data value, the variance is unchanged. Property 2 says that multiplying each $x_i$ by c results in $s^2$ being multiplied by a factor of $c^2$.

These properties can be proved by noting in Property 1 that $\bar{y} = \bar{x} + c$ and in Property 2 that $\bar{y} = c\bar{x}$.
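As a check on the leaf-reflectance example and on the two properties, the following Python sketch computes the sample variance both from the definition and from the computational formula, then applies a shift and a scaling. The function names and the constant `c = 10.0` are hypothetical choices made for illustration only.

```python
from math import isclose, sqrt


def sample_variance(xs):
    """Definition: sum of squared deviations from the mean over n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Computational formula: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


reflectance = [15.2, 16.8, 12.6, 13.2, 12.8, 13.8, 16.3, 13.0,
               12.7, 15.8, 19.2, 12.7, 15.6, 13.5, 12.9]

s2 = sample_variance(reflectance)
print(f"s^2 = {s2:.2f}, s = {sqrt(s2):.2f}")  # s^2 = 3.92, s = 1.98
print(isclose(s2, sample_variance_shortcut(reflectance), rel_tol=1e-9))

c = 10.0  # hypothetical constant, chosen only for this illustration
shifted = [x + c for x in reflectance]
scaled = [c * x for x in reflectance]
# Property 1: adding c leaves the variance unchanged.
print(isclose(sample_variance(shifted), s2, rel_tol=1e-9))
# Property 2: multiplying by c multiplies the variance by c^2.
print(isclose(sample_variance(scaled), c ** 2 * s2, rel_tol=1e-9))
```

In floating-point arithmetic the computational formula can lose precision when the mean is large relative to the spread, so the definition-based version is usually the safer one to use in code.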