Statistical Skills – Standard Deviation
The Normal Distribution Curve/The Bell Curve
To explain what standard deviation is and what it is used for, it is first important to understand something
called the ‘normal distribution curve’. This is a pattern that mathematicians noticed in data that is
collected.
When sample sizes are at least 100, if the results are quantified and displayed on a graph, the results will
tend to approximate what is called the "normal curve" of distribution (see diagram). That is, the majority of
people will give you an "average" response, a smaller number will give you a "below average" or an "above
average" response, and a very small number will give you an "exceptionally below average" or an
"exceptionally above average" response. This distribution is also known as a bell curve.
The larger your sample size is, the more likely it is that your results will reflect the normal distribution
curve. The steeper the curve, the more clustered the data is around the mean; the flatter the curve, the more
spread out the data is. Sometimes there can be a clear skew in the data:
This may be when you are researching something controversial which people have strong views about – for
example, if you ran a survey on the statement ‘The BNP should be the next leaders of the UK’, you would expect
the data to be skewed towards the negative responses. In a negative skew the mode lies to the right of the
mean, and in a positive skew the mode lies to the left of the mean. The greater the difference between the
mean and the mode, the greater the skew is likely to be.
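The relationship between the mean and the mode in a skewed sample can be checked with a few lines of Python. This is only a sketch: the `scores` list is hypothetical data invented for illustration (it is not from any survey), chosen so that most values sit at the high end with a thin tail of low values, which is the shape of a negative skew.

```python
from statistics import mean, mode

# Hypothetical scores on a 1-5 scale (invented for illustration).
# Most values pile up at the high end, with a thin tail of low values.
scores = [1, 3, 4, 4, 5, 5, 5, 5]

sample_mean = mean(scores)  # pulled towards the tail of low values
sample_mode = mode(scores)  # the most common value

print("mean:", sample_mean)
print("mode:", sample_mode)
print("mode lies to the right of the mean:", sample_mode > sample_mean)
```

The few low scores drag the mean down, while the mode stays at the most common value, so the gap between the two gives a rough feel for how strong the skew is.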
Standard Deviation
You might have two sets of data that produce the same mean but have a very different range of
values within them. You could then use the interquartile range to take out the extreme values and give you
a clearer idea of the spread of the data. Standard deviation is an additional statistical tool that produces a
figure indicating the extent to which data is clustered around the mean.
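A short Python sketch can show two data sets that share a mean but differ in spread. The lists `tight` and `wide` are hypothetical values made up for this illustration. `pstdev` from Python's standard `statistics` module divides by n, the number of values.

```python
from statistics import mean, pstdev

# Hypothetical data sets, chosen so that both have a mean of 50.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in (("tight", tight), ("wide", wide)):
    data_range = max(data) - min(data)
    print(name, "mean:", mean(data), "range:", data_range,
          "SD:", round(pstdev(data), 2))
```

Both sets report the same mean, but the range and the standard deviation are far larger for the second set, which is the difference the mean alone hides.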
Mathematically, under a normal distribution curve, 68.3 percent of all observations fall within plus or minus
one standard deviation of the middle of the curve (the mean); 95.5 percent of observations fall within two
standard deviations of the middle of the normal curve and 99.7 percent of observations fall within
three standard deviations. The key point is that the larger the sample size, the more closely the results
can be expected to follow these proportions.
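These percentages can be checked against the standard normal curve with Python's built-in `statistics.NormalDist`. This is a minimal sketch; the helper name `share_within` is an assumption introduced here for illustration. Note that the two-standard-deviation figure is about 95.45 percent, which is why it is quoted as either 95.4 or 95.5 percent depending on rounding.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k) * 100:.2f}%")
```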
The Formula
SD = √( Σ(x − x̄)² / n )
1. First work out the mean (x̄).
2. Put the data in a table and subtract the mean from each value.
3. Square the results of step 2.
4. Add up all the results of step 3 (this gives you the top part of the formula – Σ(x − x̄)²).
5. Divide your result from step 4 by n (the number in the sample).
6. Finally – don’t forget to find the square root of your result.
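The six steps can be followed one at a time in Python. This is a sketch under stated assumptions: the function name `standard_deviation` and the `example` list are hypothetical, introduced only for illustration, and the calculation divides by n as in step 5.

```python
from math import sqrt

def standard_deviation(values):
    """Standard deviation following the six worksheet steps (divides by n)."""
    n = len(values)
    mean = sum(values) / n                    # step 1: work out the mean
    deviations = [x - mean for x in values]   # step 2: subtract the mean from each value
    squared = [d ** 2 for d in deviations]    # step 3: square each result
    total = sum(squared)                      # step 4: add up the squares
    variance = total / n                      # step 5: divide by n
    return sqrt(variance)                     # step 6: take the square root

# Hypothetical values, made up so the arithmetic is easy to follow by hand.
example = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(example))
```

For the example list the mean is 5, the squared differences add up to 32, and 32 divided by 8 is 4, so the square root can be checked by hand.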
Upstream and Downstream Pebble Size
Pebble    Upstream (cm)    Downstream (cm)
1         15               4
2         8                8
3         22               10
4         32               6
5         16               19
6         18.5             14
7         34               6
8         32               13.5
9         19.5             7
10        13.5             5
11        28               12.5
12        10.5             12
13        13               8.5
14        24.5             6
15        45               13
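Upstream: Working
Pebble    Size (x)    x − x̄     (x − x̄)²
1         ______      ______    ______
2         ______      ______    ______
3         ______      ______    ______
4         ______      ______    ______
5         ______      ______    ______
6         ______      ______    ______
7         ______      ______    ______
8         ______      ______    ______
9         ______      ______    ______
10        ______      ______    ______
11        ______      ______    ______
12        ______      ______    ______
13        ______      ______    ______
14        ______      ______    ______
15        ______      ______    ______
Σx =
Mean =
Divide Σ(x − x̄)² by the number of samples:
Find the square root of the result above:
(the answer is the SD).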
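Downstream: Working
Pebble    Size (x)    x − x̄     (x − x̄)²
1         ______      ______    ______
2         ______      ______    ______
3         ______      ______    ______
4         ______      ______    ______
5         ______      ______    ______
6         ______      ______    ______
7         ______      ______    ______
8         ______      ______    ______
9         ______      ______    ______
10        ______      ______    ______
11        ______      ______    ______
12        ______      ______    ______
13        ______      ______    ______
14        ______      ______    ______
15        ______      ______    ______
Σx =
Mean =
Divide Σ(x − x̄)² by the number of samples:
Find the square root of the result above:
(the answer is the SD).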
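To check the hand working, the same table can be built in Python from the pebble sizes listed earlier. This is a sketch only: the function names `working_table` and `show_working` are assumptions made for illustration, and the calculation divides by n, as the worksheet steps do.

```python
# Pebble sizes in cm, copied from the worksheet's data table.
upstream = [15, 8, 22, 32, 16, 18.5, 34, 32, 19.5, 13.5, 28, 10.5, 13, 24.5, 45]
downstream = [4, 8, 10, 6, 19, 14, 6, 13.5, 7, 5, 12.5, 12, 8.5, 6, 13]

def working_table(sizes):
    """Return the mean and one row per pebble: (number, x, x - mean, (x - mean)^2)."""
    mean = sum(sizes) / len(sizes)
    rows = []
    for pebble, x in enumerate(sizes, start=1):
        deviation = x - mean
        rows.append((pebble, x, deviation, deviation ** 2))
    return mean, rows

def show_working(label, sizes):
    """Print the working table for one site and return its standard deviation."""
    mean, rows = working_table(sizes)
    print(label)
    print(f"{'Pebble':>6} {'x':>6} {'x - mean':>9} {'(x - mean)^2':>13}")
    for pebble, x, deviation, squared in rows:
        print(f"{pebble:>6} {x:>6} {deviation:>9.2f} {squared:>13.2f}")
    total = sum(squared for _, _, _, squared in rows)
    sd = (total / len(sizes)) ** 0.5
    print(f"sum of x = {sum(sizes)}, mean = {mean:.2f}")
    print(f"sum of squares = {total:.2f}, SD = {sd:.2f}")
    return sd

upstream_sd = show_working("Upstream", upstream)
downstream_sd = show_working("Downstream", downstream)
```

Each printed row matches one row of the working table, so any slip in the hand calculation can be traced to a specific pebble.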
Upstream
Interpreting your result – you should have got a result of …………..
As 68.3% of the data in a normal distribution lies within 1 standard deviation of the mean, you need to
add the standard deviation to the mean and subtract it from the mean to find this range. For example, for a
data set with a mean of 668.5 mm and a standard deviation of 123.6 mm:
668.5 + 123.6 = 792.1 mm
668.5 – 123.6 = 544.9 mm
Therefore this tells us that 68.3% of the data lies between 544.9 mm and 792.1 mm.
2 standard deviations either side of the mean is:
668.5 + 123.6 + 123.6 =
668.5 – 123.6 – 123.6 =
Therefore 95.5% of the data lies between _____ and _______
3 standard deviations either side of the mean is:
668.5 + 123.6 + 123.6 + 123.6 =
668.5 – 123.6 – 123.6 – 123.6 =
Therefore 99.7% of the data lies between ______ and ________
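The add-and-subtract arithmetic for one, two and three standard deviations can be checked with a small Python helper. This is a sketch: `sd_interval` is a hypothetical function name introduced here, and the mean of 668.5 mm and standard deviation of 123.6 mm are the figures used in the sums above.

```python
def sd_interval(mean, sd, k):
    """Lower and upper limits of the range mean ± k standard deviations."""
    return round(mean - k * sd, 1), round(mean + k * sd, 1)

# Figures used in the worked sums above (in mm).
example_mean = 668.5
example_sd = 123.6

for k, share in ((1, "68.3%"), (2, "95.5%"), (3, "99.7%")):
    low, high = sd_interval(example_mean, example_sd, k)
    print(f"{k} SD: about {share} of the data lies between {low} mm and {high} mm")
```

Swapping in your own mean and standard deviation gives the limits for the pebble data in the same way.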
The lower the standard deviation you get from your calculation, the more closely the data is centred around
the mean: there are fewer extreme values in the data set and the dispersion is narrow. A
high standard deviation value indicates that the data is more widely spread and that dispersion is large.
Downstream
Interpreting your result – you should have got a result of …………..
The values between which 68.3% of the data lies (1 standard deviation plus and minus of the mean)
= _____________ and _________________
The values between which 95.5% of the data lies = _______________ and ______________
The values between which 99.7% of the data lies = _______________ and ________________
Comparing
Standard deviation for Upstream: ______________
Standard deviation for Downstream: __________________
What can we conclude from these results?
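The comparison can be cross-checked in Python with the standard `statistics` module. This is a sketch, not a model answer: `summarise` is a hypothetical helper name, and `pstdev` is used because it divides by n, matching the steps in the formula section (`stdev` would divide by n − 1 and give a slightly larger figure).

```python
from statistics import mean, pstdev

# Pebble sizes in cm, copied from the worksheet's data table.
upstream = [15, 8, 22, 32, 16, 18.5, 34, 32, 19.5, 13.5, 28, 10.5, 13, 24.5, 45]
downstream = [4, 8, 10, 6, 19, 14, 6, 13.5, 7, 5, 12.5, 12, 8.5, 6, 13]

def summarise(sizes):
    """Return (mean, standard deviation) for one site, dividing by n."""
    return mean(sizes), pstdev(sizes)

up_mean, up_sd = summarise(upstream)
down_mean, down_sd = summarise(downstream)

print(f"Upstream:   mean = {up_mean:.2f} cm, SD = {up_sd:.2f} cm")
print(f"Downstream: mean = {down_mean:.2f} cm, SD = {down_sd:.2f} cm")

# The site with the larger SD has pebble sizes spread more widely around its mean.
more_spread = "upstream" if up_sd > down_sd else "downstream"
print("Wider dispersion:", more_spread)
```

The site with the smaller standard deviation has pebble sizes clustered more tightly around its mean.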