Statistical Skills – Standard Deviation

The Normal Distribution Curve/The Bell Curve

To explain what standard deviation is and what it is used for, it is first important to understand something called the ‘normal distribution curve’. This is a pattern that mathematicians noticed in collected data. When sample sizes are at least 100 and the results are quantified and displayed on a graph, the results will tend to approximate what is called the "normal curve" of distribution (see diagram). That is, the majority of people will give you an "average" response, a smaller number will give you a "below average" or an "above average" response, and a very small number will give you an "exceptionally below average" or an "exceptionally above average" response. This distribution is also known as a bell curve. The larger your sample size, the more likely it is that your results will reflect the normal distribution curve. The steeper the curve, the more clustered the data is around the mean, and vice versa.

Sometimes there can be a clear skew in the data. This may happen when you are researching something controversial about which people have strong views. For example, if you ran a survey on the statement ‘The BNP should be the next leaders of the UK’, you would expect the data to be skewed towards the negative responses. In a negative skew the mode lies to the right of the mean, and in a positive skew it lies to the left. The greater the difference between the mean and the mode, the greater the skew is likely to be.

Standard Deviation

You might have two sets of data that produce the same mean but contain a very different range of values. You could then use the interquartile range to take out the extreme values and give you a clearer idea of the spread of the data. Standard deviation is an additional statistical tool that produces a figure indicating the extent to which data is clustered around the mean.
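The point about two data sets sharing a mean but differing in spread can be illustrated with a short Python sketch. The two lists below are hypothetical values made up for illustration (they are not from this worksheet), and `summary` is an assumed helper name.

```python
import statistics

# Hypothetical data sets (not from the worksheet): same mean, different spread.
tight = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

def summary(values):
    """Return the mean, the range and the standard deviation (dividing by n)."""
    return (
        statistics.mean(values),
        max(values) - min(values),
        statistics.pstdev(values),
    )

tight_mean, tight_range, tight_sd = summary(tight)
wide_mean, wide_range, wide_sd = summary(wide)

print(f"tight: mean={tight_mean}, range={tight_range}, SD={tight_sd:.2f}")
print(f"wide:  mean={wide_mean}, range={wide_range}, SD={wide_sd:.2f}")
```

Both lists have the same mean, but the second has a much larger range and standard deviation, which is the kind of difference the mean alone hides.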
Mathematically, under a normal distribution curve, 68.3 percent of all observations fall within plus or minus one standard deviation of the middle of the curve, 95.5 percent fall within two standard deviations, and 99.7 percent fall within three standard deviations. The key point is that the larger the sample size, the more closely the results tend to match these proportions.

The Formula

Standard deviation = √( Σ(x − x̄)² ÷ n )

where x is each value, x̄ is the mean and n is the number of values in the sample.

1. First work out the mean.
2. Put the data in a table and subtract the mean from each value.
3. Square the results of step 2.
4. Add up all the results of step 3 (this gives you the top part of the formula, Σ(x − x̄)²).
5. Divide your result from step 4 by n (the number in the sample).
6. Finally, don’t forget to find the square root of your result.

Upstream and Downstream Pebble Size

Pebble    Upstream (cm)    Downstream (cm)
1         15               4
2         8                8
3         22               10
4         32               6
5         16               19
6         18.5             14
7         34               6
8         32               13.5
9         19.5             7
10        13.5             5
11        28               12.5
12        10.5             12
13        13               8.5
14        24.5             6
15        45               13

Upstream: Working

Pebble    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Size

Σx =
Mean =
Divide your result by the number of samples:
Find the square root of the result above (the answer is the SD):

Downstream: Working

Pebble    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Size

Σx =
Mean =
Divide your result by the number of samples:
Find the square root of the result above (the answer is the SD):

Upstream: Interpreting your result

You should have got a result of ………….. As 68.3% of the data in a normally distributed sample lies within 1 standard deviation of the mean, you need to add the standard deviation to the mean and subtract it from the mean.
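The add-and-subtract step can be sketched in Python as follows. This is only an illustration: `sd_bands` and `NORMAL_SHARE` are hypothetical names, and the mean (668.5) and standard deviation (123.6) are the figures used in the worksheet's own arithmetic.

```python
def sd_bands(mean, sd, max_k=3):
    """Return {k: (lower, upper)} for k = 1..max_k standard deviations."""
    return {k: (mean - k * sd, mean + k * sd) for k in range(1, max_k + 1)}

# Approximate share of a normal distribution inside each band (from the text).
NORMAL_SHARE = {1: 68.3, 2: 95.5, 3: 99.7}

# Figures taken from the worksheet's worked arithmetic (in mm).
bands = sd_bands(mean=668.5, sd=123.6)

for k, (lower, upper) in bands.items():
    print(f"{NORMAL_SHARE[k]}% of the data lies between "
          f"{lower:.1f} mm and {upper:.1f} mm ({k} SD)")
```

The same function can be called with the mean and standard deviation you calculate for the upstream or downstream pebbles, provided both are in the same unit.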
For example, with a mean of 668.5 mm and a standard deviation of 123.6 mm:

668.5 + 123.6 = 792.1 mm
668.5 - 123.6 = 544.9 mm

Therefore this tells us that 68.3% of the data lies between 544.9 mm and 792.1 mm.

2 standard deviations of the mean is:
668.5 + 123.6 + 123.6 =
668.5 - 123.6 - 123.6 =
Therefore 95.5% of the data lies between _____ and _______

3 standard deviations of the mean is:
668.5 + 123.6 + 123.6 + 123.6 =
668.5 - 123.6 - 123.6 - 123.6 =
Therefore 99.7% of the data lies between ______ and ________

The lower the standard deviation you get from your calculation, the more closely the data is centred around the mean: there are fewer extreme values in the data set and the dispersion is narrow. A high standard deviation indicates that the data is more widely spread and that the dispersion is large.

Downstream: Interpreting your result

You should have got a result of …………..

The values between which 68.3% of the data lies (1 standard deviation either side of the mean) = _____________ and _________________
The values between which 95.5% of the data lies = _______________ and ______________
The values between which 99.7% of the data lies = _______________ and ________________

Comparing

Standard deviation for Upstream: ______________
Standard deviation for Downstream: __________________

What can we conclude from these results:
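To check the two standard deviations being compared, the six steps listed under The Formula can be sketched in Python. The pebble sizes are copied from the worksheet table. The function name `standard_deviation` is an assumption made for this sketch, and the code divides by n, as the worksheet's step 5 does.

```python
import math

# Pebble sizes in cm, copied from the worksheet table.
upstream = [15, 8, 22, 32, 16, 18.5, 34, 32, 19.5, 13.5, 28, 10.5, 13, 24.5, 45]
downstream = [4, 8, 10, 6, 19, 14, 6, 13.5, 7, 5, 12.5, 12, 8.5, 6, 13]

def standard_deviation(values):
    """Follow the six worksheet steps; return (mean, standard deviation)."""
    n = len(values)
    mean = sum(values) / n                    # step 1: work out the mean
    deviations = [x - mean for x in values]   # step 2: subtract the mean
    squared = [d ** 2 for d in deviations]    # step 3: square each result
    total = sum(squared)                      # step 4: add up the squares
    variance = total / n                      # step 5: divide by n
    return mean, math.sqrt(variance)          # step 6: take the square root

up_mean, up_sd = standard_deviation(upstream)
down_mean, down_sd = standard_deviation(downstream)

print(f"Upstream:   mean = {up_mean:.2f} cm, SD = {up_sd:.2f} cm")
print(f"Downstream: mean = {down_mean:.2f} cm, SD = {down_sd:.2f} cm")
```

Some textbooks divide by n - 1 instead of n when the data is a sample from a larger population. In Python's standard library, `statistics.pstdev` divides by n and `statistics.stdev` divides by n - 1, so the first of these should agree with the sketch above.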