Atkinson Statistical Measures
The Median, the Mean and the Mode
Before you can begin to understand statistics, there are four terms you will need to fully understand. The
first term, 'average', is something we have been familiar with from a very early age, when we start analyzing
our marks on report cards. We add together all of our test results and then divide the sum by the total
number of marks. We often call it the average. However, statistically it's the Mean!
The Mean
Example:
Four test results: 15, 18, 22, 20
The sum is: 75
Divide 75 by 4: 18.75
The 'Mean' (Average) is 18.75
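The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the example above; the variable names are made up for the sketch, and `statistics.mean` from the standard library is used as a cross-check.

```python
from statistics import mean

scores = [15, 18, 22, 20]        # the four test results from the example

total = sum(scores)              # add the results together
average = total / len(scores)    # divide the sum by the number of results

print(total, average)            # 75 18.75
print(mean(scores))              # 18.75, same answer from the standard library
```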
The Median
The Median is the 'middle value' in your list. When the number of entries in the list is odd, the median is
the middle entry in the list after sorting the list into increasing order. When the number of entries is even,
the median is equal to the sum of the two middle numbers (after sorting the list into increasing order)
divided by two. So remember to line up your values in order: the middle number is the median! Be sure to
remember the odd and even rule.
Examples:
Find the Median of: 9, 3, 44, 17, 15 (Odd amount of numbers)
Line up your numbers: 3, 9, 15, 17, 44 (smallest to largest)
The Median is: 15 (The number in the middle)
Find the Median of: 8, 3, 44, 17, 12, 6 (Even amount of numbers)
Line up your numbers: 3, 6, 8, 12, 17, 44
Add the 2 middle numbers and divide by 2: 8 + 12 = 20, and 20 ÷ 2 = 10
The Median is 10.
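The odd and even rule can be written as a short Python function. This is a minimal sketch; the function name `median` is chosen for illustration, and Python's `statistics.median` does the same job.

```python
def median(values):
    """Return the middle value of a list, using the odd/even rule."""
    ordered = sorted(values)          # line up the numbers, smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: take the middle entry
        return ordered[mid]
    # even count: average the two middle entries
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([9, 3, 44, 17, 15]))      # 15
print(median([8, 3, 44, 17, 12, 6]))   # 10.0
```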
The Mode
The mode of a list of numbers is the number (or numbers) that occurs most frequently. A trick to remember
this one is that mode starts with the same first two letters as most: most frequently, mode. You'll never forget that one!
Examples:
Find the mode of:
9, 3, 3, 44, 17, 17, 44, 15, 15, 15, 27, 40, 8
Put the numbers in order for ease:
3, 3, 8, 9, 15, 15, 15, 17, 17, 27, 40, 44, 44
The Mode is 15 (15 occurs the most at 3 times)
*It is important to note that there can be more than one mode and if no number occurs more than once in
the set, then there is no mode for that set of numbers.
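Counting occurrences is easy to sketch in Python. The function below is an illustration only (the name `modes` is made up here); it returns every value tied for most frequent, and an empty list when no number occurs more than once, following the note above.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)              # how many times each number occurs
    highest = max(counts.values())
    if highest == 1:                      # no number occurs more than once
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([9, 3, 3, 44, 17, 17, 44, 15, 15, 15, 27, 40, 8]))   # [15]
```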
Range
Occasionally in Statistics you'll be asked for the 'range' in a set of numbers. The range is simply the
smallest number subtracted from the largest number in your set. Thus, if your set is 9, 3, 44, 15, 6, the
range would be 44 - 3 = 41. Your range is 41.
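In Python the same calculation is a one-liner with the built-in `max` and `min`. The variable names are just for this sketch.

```python
values = [9, 3, 44, 15, 6]                 # the set from the example

value_range = max(values) - min(values)    # largest minus smallest
print(value_range)                         # 41
```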
Standard Deviation
The Standard Deviation (σ) is a measure of how spread out numbers are.
The formula is easy: it is the square root of the Variance. So now you ask, "What is the Variance?"
Variance
The Variance (which is the square of the standard deviation, i.e. σ²) is defined as:
The average of the squared differences from the Mean.
In other words, follow these steps:
1. Work out the Mean (the simple average of the numbers)
2. Now, for each number subtract the Mean and then square the result (the squared difference).
3. Then work out the average of those squared differences. (Why Square?)
Example
You and your friends have just measured the heights of your dogs (in millimeters):
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.
Find out the Mean, the Variance, and the Standard Deviation.
Answer:
$$\text{Mean} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394$$
so the average height is 394 mm.
Now, we calculate each dog's difference from the Mean: 206, 76, -224, 36 and -94.
To calculate the Variance, take each difference, square it, and then average the result:
$$\text{Variance: } \sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{108{,}520}{5} = 21{,}704$$
So, the Variance is 21,704.
And the Standard Deviation is just the square root of Variance, so:
Standard Deviation: σ = √21,704 = 147.32..., or 147 to the nearest mm
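The three steps for the Variance, followed by the square root, can be sketched in Python as below. The variable names are illustrative. The sketch divides by the number of values, exactly as the steps in the text do.

```python
import math

heights = [600, 470, 170, 430, 300]    # dog heights in mm, from the example

# Step 1: work out the Mean
mean = sum(heights) / len(heights)

# Step 2: subtract the Mean from each number and square the result
squared_diffs = [(h - mean) ** 2 for h in heights]

# Step 3: average the squared differences to get the Variance
variance = sum(squared_diffs) / len(heights)

# The Standard Deviation is the square root of the Variance
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev))  # 394.0 21704.0 147
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` compute the same two quantities.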
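And the good thing about the Standard Deviation is that it is useful. Now we can show which heights are
within one Standard Deviation (147 mm) of the Mean, that is, between 247 mm and 541 mm: the dogs at
470 mm, 430 mm and 300 mm are, while the dogs at 600 mm and 170 mm are not.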
So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra
large or extra small.
Rottweilers are tall dogs. And Dachshunds are a bit short ... but don't tell them!
*Note: Why square?
Squaring each difference makes them all positive numbers (to avoid negatives reducing the Variance)
And it also makes the bigger differences stand out. For example 100² = 10,000 is a lot bigger than
50² = 2,500.
But squaring them makes the final answer really big, and so un-squaring the Variance (by taking the square
root) makes the Standard Deviation a much more useful number.
Histogram
Definition of Histogram

A histogram is a bar graph that shows how frequently data occur within certain
ranges or intervals. The height of each bar gives the frequency in the respective
interval.
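To make the definition concrete, here is a minimal Python sketch that counts how many values fall into each interval and prints a text bar for each count. The data (a list of ages), the interval width of 5 and the function name are all hypothetical, chosen only for illustration.

```python
from collections import Counter

def histogram_counts(data, width):
    """Map the start of each interval to the number of values inside it."""
    counts = Counter((x // width) * width for x in data)
    return dict(sorted(counts.items()))

ages = [3, 5, 7, 8, 11, 12, 12, 14, 16, 19]   # hypothetical data
width = 5                                     # hypothetical interval width

for start, freq in histogram_counts(ages, width).items():
    # one '#' per observation, so the bar length gives the frequency
    print(f"{start:>2}-{start + width - 1:<2} {'#' * freq}")
```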
Examples of Histogram

The histogram shown below gives the number of children who visited a particular zoo.
The Normal Curve
The normal distributions are a very important class of statistical distributions. All normal distributions are
symmetric and have bell-shaped density curves with a single peak.
To speak specifically of any normal distribution, two quantities have to be specified: the mean μ, where the
peak of the density occurs, and the standard deviation σ, which indicates the spread or girth of the bell
curve. (The Greek symbol μ is pronounced mu and the Greek symbol σ is pronounced sig-ma.)
Different values of μ and σ yield different normal density curves and hence different normal distributions.
The normal density can actually be specified by means of an equation. The height of the density at any
value x is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Although there are many normal curves, they all share an important property that allows us to treat them in
a uniform fashion.
The 68-95-99.7% Rule
All normal density curves satisfy the following property which is often referred to as the Empirical Rule.
68% of the observations fall within 1 standard deviation of the mean, that is, between μ − σ and μ + σ.
95% of the observations fall within 2 standard deviations of the mean, that is, between μ − 2σ and μ + 2σ.
99.7% of the observations fall within 3 standard deviations of the mean, that is, between μ − 3σ and μ + 3σ.
Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean.
Remember that the rule applies to all normal distributions. Also remember that it applies only to normal
distributions.
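The percentages in the rule can be checked numerically. The sketch below is an illustration, not part of the original handout: it assumes the standard formula for the normal density, and it uses the fact that the share of a normal distribution lying within k standard deviations of the mean equals erf(k/√2), where `math.erf` is the error function from Python's standard library. The function names are made up for the sketch.

```python
import math

def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

def fraction_within(k):
    """Share of any normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))   # 68.3, 95.4 and 99.7 percent
```

The exact shares are slightly different from the rounded 68, 95 and 99.7 figures, which is why the rule is a rule of thumb.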
An Example
Let us apply the Empirical Rule to Example 1.17 from Moore and McCabe.
The distribution of heights of American women aged 18 to 24 is approximately normal, with
mean 65.5 inches and standard deviation 2.5 inches. From the above rule, it follows that
68% of these American women have heights between 65.5 - 2.5 and 65.5 + 2.5 inches, or between 63
and 68 inches,
95% of these American women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between
60.5 and 70.5 inches.
Therefore, the tallest 2.5% of these women are taller than 70.5 inches. (The extreme 5% fall more than two
standard deviations, or 5 inches, from the mean. And since all normal distributions are symmetric about
their mean, half of this extreme 5% are on the tall side.)
Using the 99.7% calculation, 65.5 - 3(2.5) and 65.5 + 3(2.5), almost all young American women are
between 58 and 73 inches in height.
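The three intervals for this example can be reproduced with a short Python sketch. The function name `interval` is hypothetical; the mean and standard deviation are the ones quoted in the example.

```python
def interval(mean, sd, k):
    """Return the (low, high) range lying within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)

mean_height = 65.5   # inches, from the example
sd_height = 2.5      # inches, from the example

for k, share in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = interval(mean_height, sd_height, k)
    print(f"{share} of heights lie between {low} and {high} inches")
```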
Pearson’s Correlation Coefficient
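The correlation coefficient $\rho_{X,Y}$ between two random variables X and Y with expected values $\mu_X$ and $\mu_Y$ and
standard deviations $\sigma_X$ and $\sigma_Y$ is defined as:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
where E is the expected value operator and cov means covariance. A widely used alternative notation is
$$\operatorname{corr}(X,Y) = \rho_{X,Y}$$
Since $\mu_X = E(X)$, $\sigma_X^2 = E[(X - E(X))^2] = E(X^2) - E^2(X)$ and likewise for Y, we may also write
$$\rho_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}}$$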
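For a finite list of paired values, the definition can be sketched in Python by treating each expected value as a plain average over the list. That substitution is an assumption made for illustration (the text defines the coefficient for random variables), and the function name and sample data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Correlation of paired lists: covariance divided by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # covariance: average product of the paired differences from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # standard deviations: square roots of the averaged squared differences
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# hypothetical data: ys rises in step with xs, so the result is close to 1
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

The function fails with a division by zero if either list is constant, because its standard deviation is then 0.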