Chapter 4. The Average and the Standard Deviation
This chapter’s objective is to introduce basic statistical terms and formulas used as tools
for analyzing and describing data sets.
Key Terms
Median – The value that falls in the middle position when the measurements are ranked
from smallest to largest.
Average (or Mean) – The sum of a list of numbers divided by how many numbers are in
the list.
Root Mean Square – This gives you an idea of the size of the numbers on the list
without regard to whether they are positive or negative.
Standard Deviation – Measures the spread around the average of a data set.
Cross-sectional study – Different subjects are compared to each other at one point in
time.
Longitudinal study – Subjects are followed over time and compared with themselves at
different points in time.
THE MEDIAN AND THE AVERAGE
The best way to understand the median and the average is through examples:
Example 1
What is the median of this list: 13, 2, 45, 21, 9, 7, 5?
Solution
First sort the list from lowest to highest to get 2, 5, 7, 9, 13, 21, 45. The
median is then the middle number, which is 9.
Example 2
Find the median: 5, 7, 7, 8, 10, 12.
Solution
The list is already sorted, but there is no single middle number because there
is an even number of entries on the list. When that happens, take the two middle numbers,
here 7 and 8, and average them: median = (7 + 8)/2 = 7.5.
Example 3
Find the median: 10, 20, 30, 40, 50, 60.
Solution
The median is 35. Since there is an even number of entries, the median
is the average of the two middle numbers: (30 + 40)/2 = 35.
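The odd/even rule used in Examples 1 to 3 can be written as a short function. This is a minimal sketch; the name `median` is a hypothetical helper chosen for illustration (Python's standard `statistics.median` follows the same rule for an even count).

```python
def median(values):
    """Return the median of a non-empty list of numbers (hypothetical helper)."""
    ordered = sorted(values)          # rank from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: there is a single middle number
        return ordered[mid]
    # even count: average the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([13, 2, 45, 21, 9, 7, 5]))   # Example 1 -> 9
print(median([5, 7, 7, 8, 10, 12]))       # Example 2 -> 7.5
print(median([10, 20, 30, 40, 50, 60]))   # Example 3 -> 35.0
```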
Example 4
Find the average of the following numbers: 0, 5, 9, 11 and 15.
Solution
To find the average, add the numbers on the list and divide by how many numbers
there are:
$$\text{Average} = \frac{0 + 5 + 9 + 11 + 15}{5} = \frac{40}{5} = 8$$
Example 5
Find the average of these numbers: -3, -1, 7, 12 and 20.
Solution
$$\text{Average} = \frac{(-3) + (-1) + 7 + 12 + 20}{5} = \frac{35}{5} = 7$$
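The definition of the average (sum of the list divided by how many numbers are on it) translates directly into code. A minimal sketch; `average` is a hypothetical helper name, applied here to the lists from Examples 4 and 5.

```python
def average(values):
    """Sum of the list divided by how many numbers are on it (hypothetical helper)."""
    return sum(values) / len(values)

print(average([0, 5, 9, 11, 15]))     # Example 4 -> 8.0
print(average([-3, -1, 7, 12, 20]))   # Example 5 -> 7.0
```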
ROOT-MEAN-SQUARE
The Root-Mean-Square, abbreviated r.m.s., is a measure of the average size of the
numbers on a list. To find the r.m.s. size you:
1. SQUARE all the entries; this gets rid of the negative signs.
2. Take the MEAN of the squares.
3. Take the SQUARE ROOT of the mean.
$$\text{r.m.s. size of a list} = \sqrt{\text{average of } (\text{entries})^2}$$
It’s easy to remember what to do here because the letters r.m.s., read in reverse, give
the procedure to follow: Square, take the Mean, take the square Root. Here is a link to an
Excel spreadsheet that will help you find the r.m.s.: RMS spreadsheet
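The three r.m.s. steps (Square, Mean, Root) can be sketched as follows. The helper name `rms` and the list `[1, -7, 5, -5, 7, -1]` are hypothetical, chosen so the numbers come out round: the list's average is 0, yet its r.m.s. size is 5, which shows that the r.m.s. ignores the signs of the entries.

```python
import math

def rms(values):
    """Root-mean-square size of a list (hypothetical helper)."""
    squares = [x * x for x in values]               # Square: removes the negative signs
    mean_of_squares = sum(squares) / len(squares)   # Mean of the squares
    return math.sqrt(mean_of_squares)               # square Root of the mean

print(rms([1, -7, 5, -5, 7, -1]))   # -> 5.0
```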
STANDARD DEVIATION
As the definition states, the standard deviation measures spread around the average
of a data set. More technically, the standard deviation, abbreviated SD, is the r.m.s. size
of the deviations from the average. Do not confuse the SD of a list with its r.m.s. size: the
SD is the r.m.s. not of the original numbers on the list, but of their deviations from the
average. To compute the SD, use the following steps:
1. Calculate the mean AVG of the list.
2. Subtract AVG from each member of the list to get a new list of deviations.
3. Take the r.m.s. of the new list. The result is the SD.
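The three steps above can be sketched as a function that reuses an r.m.s. helper. This is an illustration under the chapter's definition (dividing by the number of entries); the names `rms` and `sd` are hypothetical.

```python
import math

def rms(values):
    """Root-mean-square size of a list (hypothetical helper)."""
    return math.sqrt(sum(x * x for x in values) / len(values))

def sd(values):
    """SD as the r.m.s. of the deviations from the average (hypothetical helper)."""
    avg = sum(values) / len(values)          # Step 1: the mean AVG
    deviations = [x - avg for x in values]   # Step 2: subtract AVG from each entry
    return rms(deviations)                   # Step 3: r.m.s. of the new list

print(sd([41, 48, 50, 50, 54, 57]))   # -> 5.0
```

Note that step 2 must subtract AVG from every entry; taking the r.m.s. of the original entries instead gives the r.m.s. size of the list, not its SD.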
Calculate the SD of a List
List     List - Avg     (List - Avg)^2
 1.9      -1.225          1.500625
 3.4       0.275          0.075625
-1.7      -4.825         23.280625
 8.9       5.775         33.350625
Avg = 3.125
Average of (List - Avg)^2 = 14.551875
SD = sqrt(14.551875) ≈ 3.8147
The table shows an Excel-style layout for calculating the SD. The next example
does the calculation by hand.
Example
Find the standard deviation of the list 41, 48, 50, 50, 54, 57.
Solution
The first step is to find the average:
$$\text{Average} = \frac{41 + 48 + 50 + 50 + 54 + 57}{6} = \frac{300}{6} = 50$$
The second step is to find the deviations from the average:
41-50=-9
48-50=-2
50-50=0
50-50=0
54-50=4
57-50=7
The last step is to find the r.m.s. of the deviations:
$$\text{SD} = \sqrt{\frac{(-9)^2 + (-2)^2 + 0^2 + 0^2 + 4^2 + 7^2}{6}} = \sqrt{\frac{81 + 4 + 0 + 0 + 16 + 49}{6}} = \sqrt{\frac{150}{6}} = \sqrt{25} = 5$$
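The hand calculation can be cross-checked with Python's standard `statistics` module. Its `pstdev` function divides by the number of entries, as this chapter's SD does; `stdev` divides by one less than the number of entries (the "sample" SD), so it comes out somewhat larger on a short list like this one.

```python
import statistics

data = [41, 48, 50, 50, 54, 57]

print(statistics.mean(data))     # average: 50
print(statistics.pstdev(data))   # divides by n, matching the chapter: 5.0

# stdev divides by (n - 1) instead, so the result is larger: sqrt(150/5), about 5.477
print(statistics.stdev(data))
```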
RELATING TO A HISTOGRAM
1. The histogram balances at the mean.
2. The median is the point at which 1/2 the area is to the right and 1/2 the area is to the
left.
3. Histograms with a wide spread around the mean have a larger SD than histograms with
a tight spread around the mean.
4. The Median is NOT necessarily equal to the Mean. It can be bigger or smaller.
First we will discuss the different shapes of histograms and then what they mean for the
average and SD of a data set. A histogram is either symmetric or asymmetric. Symmetry
means that if you draw an imaginary vertical line through the middle of the histogram,
the two halves are mirror images: if you folded the histogram along the line, the two
halves would match up. If the histogram is asymmetric, we say it is either skewed to the
left or skewed to the right. When a histogram is skewed to the left, the bulk of the area is
on the right, and when a histogram is skewed to the right, the bulk of the area is on the
left. The figures below show a histogram skewed to the left and one skewed to the right.
Notice that the left-skewed histogram has a long left-hand tail and the right-skewed
histogram has a long right-hand tail.
When a histogram has a long right tail the MEAN>MEDIAN.
When a histogram has a long left tail the MEAN<MEDIAN.
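The two rules above can be checked on small lists. The lists below are hypothetical, made up so that a single extreme value stretches one tail: the extreme value pulls the mean toward the tail, while the median stays with the bulk of the data.

```python
import statistics

right_tail = [1, 2, 2, 3, 12]   # hypothetical: one large value stretches the right tail
left_tail = [-8, 1, 2, 2, 3]    # hypothetical: one small value stretches the left tail

for name, data in [("long right tail", right_tail), ("long left tail", left_tail)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(name, "mean =", mean, "median =", median)
```

For the first list the mean is 4 and the median 2 (MEAN > MEDIAN); for the second the mean is 0 and the median 2 (MEAN < MEDIAN).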
[Figure 4.4. Histogram for the list 1, 2, 2, 3. Vertical axis: percentage per number in list; horizontal axis: 1, 2, 3.]
1. If a histogram is symmetric around a value, then that value equals the average.
Imagine the histogram as blocks sitting on a plywood board, with the average as a pivot
under the board: the histogram balances at the average. Referring to Figure 4.4, if we
moved the block over 3 a little to the right, the average would also move a little to the
right to maintain balance. The farther we move the block, the farther the average moves
in the same direction.
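The balancing picture can be checked numerically on the list behind Figure 4.4. In this sketch the block over 3 is slid one unit to the right (the value 4 is a hypothetical choice): the average follows it, while the median does not move.

```python
import statistics

original = [1, 2, 2, 3]   # the list behind Figure 4.4
moved = [1, 2, 2, 4]      # hypothetical: the block over 3 slid one unit right

print(statistics.mean(original), statistics.median(original))  # average 2, median 2
print(statistics.mean(moved), statistics.median(moved))        # average 2.25, median 2
```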
As previously defined, the median is the middle measurement. For histograms, we refer
to areas rather than measurements, so the median of a histogram is the value with half
the area to its left and half the area to its right.
As defined earlier, the standard deviation measures the spread of a data set around the
average; in other words, it tells you how far the numbers typically are from the average.
The 68-95-99.7 rule.
For histograms that are roughly shaped like a bell curve:
• about 68% of the data points are within 1 SD of the mean;
• about 95% of the data points are within 2 SDs of the mean;
• about 99.7% of the data points are within 3 SDs of the mean.
So if IQs had mean 100 and SD 10, this rule would let us estimate that about 68% of
people have an IQ between 90 and 110, about 95% between 80 and 120, and about 99.7%
between 70 and 130. By symmetry, people with IQs above 130 would be as rare as people
with IQs below 70.
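The percentages in the rule come from the area under an ideal normal (bell) curve. This sketch uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) to compute those areas; it assumes a perfectly normal curve, which real histograms only approximate. The IQ numbers reuse the illustrative mean 100 and SD 10.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # ideal bell curve in standard units

for k in (1, 2, 3):
    share = standard.cdf(k) - standard.cdf(-k)   # area within k SDs of the mean
    print(f"within {k} SD: {share:.1%}")

# Turning "within k SDs" into a range for the illustrative IQ numbers
mean, sd = 100, 10
for k in (1, 2, 3):
    print(f"within {k} SD: {mean - k * sd} to {mean + k * sd}")
```

The computed areas are roughly 68.3%, 95.4% and 99.7%, which is where the rounded figures in the rule come from.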
Example
Which numbers on the list 41, 48, 50, 50, 54, 57 are within 0.5 SDs of the
average? Within 1.5 SDs of the average?
Solution
First we must find the average and standard deviation of the list. We did this in the
previous example and found the average to be 50 and the SD to be 5. To find the numbers
within 0.5 SDs of the average, first multiply the standard deviation by 0.5:
0.5 × 5 = 2.5. Add 2.5 to the average to get the upper bound, 50 + 2.5 = 52.5, and
subtract 2.5 from the average to get the lower bound, 50 - 2.5 = 47.5. The numbers in the
range 47.5 to 52.5 are the numbers within 0.5 SDs of the average. They are 48, 50 and 50.
Similarly, to find the numbers within 1.5 SDs, first find 1.5 times the SD:
1.5 × 5 = 7.5. To obtain the upper and lower bounds, add 7.5 to and subtract 7.5 from
the average. This gives 50 + 7.5 = 57.5 and 50 - 7.5 = 42.5. The numbers within this
range are 48, 50, 50, 54 and 57.
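The procedure in this example (multiply the SD by k, then add to and subtract from the average) can be sketched as a filter. The name `within_k_sds` is hypothetical, and the sketch assumes that a value lying exactly on a bound counts as "within"; the SD comes from `statistics.pstdev`, which divides by the number of entries as this chapter does.

```python
import statistics

def within_k_sds(values, k):
    """Entries at most k SDs from the average (hypothetical helper; bounds inclusive)."""
    avg = statistics.mean(values)
    sd = statistics.pstdev(values)       # SD computed by dividing by n
    low, high = avg - k * sd, avg + k * sd
    return [x for x in values if low <= x <= high]

data = [41, 48, 50, 50, 54, 57]
print(within_k_sds(data, 0.5))   # [48, 50, 50]
print(within_k_sds(data, 1.5))   # [48, 50, 50, 54, 57]
```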
Example: finding the MEAN from a frequency table.
Suppose a company has 5 employees. Three of them earn 12000 dollars per year and
2 of them earn 6000 dollars per year. Does that mean the average salary paid is
(12000 + 6000)/2 = 9000? OF COURSE NOT. The mean has to account for how many
employees earn each salary, so we find it as follows.
Total salary paid = 3 × 12000 + 2 × 6000 = 48000.
Mean salary paid = 48000/5 = 9600.
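The same calculation works for any frequency table: multiply each value by its count, add up, and divide by the total count. In this sketch, `salary_counts` is a hypothetical name for the table of this example; it also shows the mistaken shortcut that ignores the counts.

```python
# Frequency table: salary -> number of employees earning it
salary_counts = {12000: 3, 6000: 2}

total_paid = sum(salary * count for salary, count in salary_counts.items())
employees = sum(salary_counts.values())
mean_salary = total_paid / employees

print(total_paid, employees, mean_salary)   # 48000 5 9600.0

# The mistaken shortcut averages the distinct salaries and ignores the counts
print(sum(salary_counts) / len(salary_counts))   # 9000.0
```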