Chapter 4. The Average and the Standard Deviation

This chapter’s objective is to introduce basic statistical terms and formulas used as tools for analyzing and describing data sets.

Key Terms
Median – The value in the middle position when the measurements are ranked in order from smallest to largest.
Average (or Mean) – The sum of a list of numbers divided by how many numbers are in the list.
Root Mean Square – A measure of the size of the numbers on the list, without regard to whether they are positive or negative.
Standard Deviation – A measure of the spread of a data set around its average.
Cross-sectional study – Different subjects are compared with each other at one point in time.
Longitudinal study – Subjects are followed over time and compared with themselves at different points in time.

THE MEDIAN AND THE AVERAGE
The best way to understand the median and the average is through examples.

Example 1
What is the median of this list: 13, 2, 45, 21, 9, 7, 5?
Solution
First sort the list from lowest to highest: 2, 5, 7, 9, 13, 21, 45. The median is the middle number, which is 9.

Example 2
Find the median of 5, 7, 7, 8, 10, 12.
Solution
The list is already sorted, but there is no single middle number because the list has an even number of entries. When that happens, take the two middle numbers, here 7 and 8, and average them: median = (7 + 8)/2 = 7.5.

Example 3
Find the median of 10, 20, 30, 40, 50, 60.
Solution
Since the list has an even number of entries, the median is the average of the two middle numbers: (30 + 40)/2 = 35.

Example 4
Find the average of the following numbers: 0, 5, 9, 11 and 25.
Solution
To find the average, add the numbers in the list and divide by how many numbers are in the list:
$$\text{Average} = \frac{0 + 5 + 9 + 11 + 25}{5} = \frac{50}{5} = 10$$

Example 5
Find the average of these numbers: -3, -1, 7, 12 and 20.
Solution
$$\text{Average} = \frac{(-3) + (-1) + 7 + 12 + 20}{5} = \frac{35}{5} = 7$$

ROOT-MEAN-SQUARE
The root-mean-square, abbreviated r.m.s., is a measure of the average size of the numbers on a list. To find the r.m.s. size of a list:
1. SQUARE all the entries; this gets rid of the negative signs.
2. Take the MEAN of the squares.
3. Take the SQUARE ROOT of the mean.
$$\text{r.m.s. size of a list} = \sqrt{\text{average of } (\text{entries})^2}$$
The procedure is easy to remember because the initials r.m.s. spell out the steps in reverse order: Square, take the Mean, take the square Root. The accompanying RMS spreadsheet (Excel) will help you find the r.m.s.

STANDARD DEVIATION
As the definition states, the standard deviation measures the spread around the average of a data set. More precisely, the standard deviation, abbreviated SD, is the r.m.s. size of the deviations from the average. Do not confuse the SD of a list with its r.m.s. size: the SD is the r.m.s. not of the original numbers on the list, but of their deviations from the average.
To compute the SD, use the following steps:
1. Calculate the average AVG of the list.
2. Subtract AVG from each member of the list to get a new list of deviations.
3. Take the r.m.s. of the new list. The result is the SD.

Calculate the SD of a List
List | List - Avg | (List - Avg)^2
1.9 | -1.225 | 1.500625
3.4 | 0.275 | 0.075625
-1.7 | -4.825 | 23.280625
8.9 | 5.775 | 33.350625
Avg = 3.125; average of the squares = 14.551875; SD = √14.551875 ≈ 3.8147

The table shows how to calculate the SD in an Excel spreadsheet. The next example does the calculation by hand.

Example
Find the standard deviation of the list 41, 48, 50, 50, 54, 57.
Solution
The first step is to find the average:
$$\text{Average} = \frac{41 + 48 + 50 + 50 + 54 + 57}{6} = \frac{300}{6} = 50$$
The second step is to find the deviations from the average:
41 - 50 = -9, 48 - 50 = -2, 50 - 50 = 0, 50 - 50 = 0, 54 - 50 = 4, 57 - 50 = 7.
The last step is to find the r.m.s. of the deviations:
$$\text{SD} = \sqrt{\frac{(-9)^2 + (-2)^2 + 0^2 + 0^2 + 4^2 + 7^2}{6}} = \sqrt{\frac{81 + 4 + 0 + 0 + 16 + 49}{6}} = \sqrt{\frac{150}{6}} = \sqrt{25} = 5$$

RELATING TO A HISTOGRAM
1. The histogram balances at the mean.
2. The median is the point with half the area to its right and half the area to its left.
3. Histograms with a wide spread around the mean have a larger SD than histograms with a tight spread around the mean.
4. The median is NOT necessarily equal to the mean. It can be bigger or smaller.

First we will discuss the different shapes of histograms, and then what the shape means for the average and SD of a data set. A histogram is either symmetric or asymmetric. Symmetry means that if you draw an imaginary vertical line through the middle of the histogram, the two halves are mirror images: if you folded the histogram along the line, the two halves would match up. If the histogram is asymmetric, we say it is either skewed to the left or skewed to the right. When a histogram is skewed to the left, the bulk of the area is on the right and there is a long left-hand tail; when a histogram is skewed to the right, the bulk of the area is on the left and there is a long right-hand tail. The figures below show a histogram skewed to the left and one skewed to the right. Notice that the first has a long left-hand tail and the second a long right-hand tail. When a histogram has a long right-hand tail, typically MEAN > MEDIAN; when it has a long left-hand tail, typically MEAN < MEDIAN.

Figure 4.4. Histogram for the list 1, 2, 2, 3 (vertical axis: percentage per number in list; horizontal axis: the values 1, 2, 3).

If a histogram is symmetric around a value, then that value equals the average. Imagine the histogram as blocks on a plywood board, with the average as a pivot under the board: the histogram balances at the average. Referring to Figure 4.4, if we moved the block over 3 a little to the right, the average would also have to move a little to the right to maintain balance. The farther we move the block, the farther the average moves in the same direction.

As previously defined, the median is the middle measurement. For histograms we refer to areas rather than measurements, so the median of a histogram is the value with half the area to its left and half the area to its right.
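The procedures described so far (median, average, r.m.s., SD) and the balancing behavior of Figure 4.4 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original course materials: the helper names `average`, `rms` and `sd` are hypothetical, while `statistics.median` comes from Python's standard library. The last two lines move the block over 3 in the list 1, 2, 2, 3 out to 5, showing the average shifting right while the median stays put.

```python
import math
from statistics import median  # standard library; averages the two middle values for even-length lists


def average(xs):
    """Sum of the list divided by how many numbers are in the list."""
    return sum(xs) / len(xs)


def rms(xs):
    """r.m.s. size: Square the entries, take the Mean, take the square Root."""
    return math.sqrt(average([x * x for x in xs]))


def sd(xs):
    """SD: the r.m.s. size of the deviations from the average."""
    avg = average(xs)
    return rms([x - avg for x in xs])


# Worked examples from this chapter
print(median([13, 2, 45, 21, 9, 7, 5]))  # 9
print(median([5, 7, 7, 8, 10, 12]))      # 7.5
print(average([-3, -1, 7, 12, 20]))      # 7.0

scores = [41, 48, 50, 50, 54, 57]
print(sd(scores))   # 5.0
print(rms(scores))  # about 50.25: the r.m.s. size of the list, NOT its SD

# Figure 4.4: moving the block over 3 to the right drags the average along,
# but the median (half the area on each side) does not change.
print(average([1, 2, 2, 3]), median([1, 2, 2, 3]))  # 2.0 2.0
print(average([1, 2, 2, 5]), median([1, 2, 2, 5]))  # 2.5 2.0
```

The `sd` helper divides by the number of entries, matching the chapter's definition; the standard library's `statistics.pstdev` follows the same convention, whereas `statistics.stdev` divides by n - 1 and gives a slightly larger value.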
As defined earlier, the standard deviation measures the spread of a data set around the average. In other words, it tells you how far the numbers typically deviate from the average.

The 68-95-99 rule. For histograms that are roughly shaped like a bell curve:
About 68% of the data points are within 1 SD of the mean.
About 95% of the data points are within 2 SDs of the mean.
About 99% (more precisely, about 99.7%) of the data points are within 3 SDs of the mean.
So if we know that human IQs follow a roughly bell-shaped histogram with mean 100 and SD 10, this rule lets us estimate that about 68% of all people have an IQ between 90 and 110, about 95% between 80 and 120, and about 99% between 70 and 130. People with IQs above 130 are as rare as people with IQs below 70.

Example
Which numbers on the list 41, 48, 50, 50, 54, 57 are within 0.5 SDs of the average? Within 1.5 SDs of the average?
Solution
First we must find the average and standard deviation of the list. We did this in the previous example and found the average to be 50 and the SD to be 5.
To find the numbers within 0.5 SDs of the average, first multiply the SD by 0.5: 0.5 × 5 = 2.5. Add 2.5 to the average to get the upper bound, 50 + 2.5 = 52.5, and subtract 2.5 from the average to get the lower bound, 50 - 2.5 = 47.5. The numbers between 47.5 and 52.5 are within 0.5 SDs of the average: 48, 50 and 50.
Similarly, to find the numbers within 1.5 SDs, first compute 1.5 × 5 = 7.5. Adding 7.5 to and subtracting 7.5 from the average gives the bounds 50 + 7.5 = 57.5 and 50 - 7.5 = 42.5. The numbers in this range are 48, 50, 50, 54 and 57.

Example: finding the MEAN from a frequency table
Suppose a company has 5 employees. 3 of them earn 12000 dollars per year and 2 of them earn 6000 dollars per year.
Does that mean the average salary paid is (12000 + 6000)/2 = 9000? OF COURSE NOT: that calculation ignores how many employees earn each salary. Instead, we find the mean as follows:
Total salary paid = 3 × 12000 + 2 × 6000 = 48000.
Mean salary paid = 48000/5 = 9600.
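The last two examples, picking out the entries within k SDs of the average and computing a mean from a frequency table, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `within_k_sds` and `mean_from_frequency_table` are hypothetical, and the frequency table is assumed to be a Python dictionary mapping each value to how many times it occurs.

```python
from statistics import pstdev  # SD that divides by the number of entries, as in this chapter


def within_k_sds(xs, k):
    """Return the entries between average - k*SD and average + k*SD (inclusive)."""
    avg = sum(xs) / len(xs)
    spread = k * pstdev(xs)
    return [x for x in xs if avg - spread <= x <= avg + spread]


def mean_from_frequency_table(table):
    """table maps each value to its count; every value is weighted by its count."""
    total = sum(value * count for value, count in table.items())
    return total / sum(table.values())


scores = [41, 48, 50, 50, 54, 57]
print(within_k_sds(scores, 0.5))  # [48, 50, 50]
print(within_k_sds(scores, 1.5))  # [48, 50, 50, 54, 57]

salaries = {12000: 3, 6000: 2}  # 3 employees earn 12000, 2 earn 6000
print(mean_from_frequency_table(salaries))  # 9600.0
```

Weighting each salary by its count is exactly the "total salary paid divided by the number of employees" calculation above; averaging the two distinct salaries directly would wrongly treat each as if it belonged to a single employee.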