EART20170 STATISTICS PRACTICAL 1
Histograms, standard error and weighted mean

Attempt this part without using the statistical functions of your calculators.

1. A sedimentologist studying fluvial conglomerates has measured the dimensions of pebbles in a present-day point-bar deposit. The following set of numbers is a ranked list of the lengths (in cm) of pebble long axes (i.e. maximum diameters) as measured on a sample of 51 pebbles:

 7.6   8.1   8.2   8.5   8.8   8.9   9.0   9.1   9.2   9.2
 9.3   9.3   9.3   9.3   9.4   9.4   9.4   9.4   9.4   9.5
 9.5   9.6   9.7   9.7   9.8   9.9  10.0  10.0  10.1  10.2
10.4  10.5  10.7  10.7  10.9  10.9  11.1  11.3  11.3  11.5
11.6  11.8  12.2  12.6  12.7  12.9  13.4  13.5  14.2  15.7
17.3

(a) Given that the quoted precision (to the nearest mm) is appropriate, depict the distribution of pebble long axes on a histogram (use graph paper provided).
(b) Describe the distribution qualitatively.
(c) Derive the following central values: (i) mode; (ii) median; (iii) mean.
(d) Describe the spread of the distribution with reference to: (i) the total range; (ii) the interquartile range; (iii) the standard deviation.
(e) The distribution of pebble lengths contains information about two important sedimentological characteristics. One is the absolute grain size of the sediment. What is the other? What statistical parameter would you choose as a measure of this second characteristic?
(f) Describe the symmetry and sharpness of the distribution. Derive the following: (i) skewness; (ii) kurtosis.

2. An exploration geologist measures the gold concentration in a stream sediment to assess the likelihood of an area containing a gold deposit. The geologist uses atomic absorption spectroscopy (AAS) to determine the gold content in parts per million (ppm). Ten replicate analyses of the sediment gave the following results in ppm:
7.1  10.1  7.7  10.3  9.8  12.4  9.3  10.4  9.6  9.5

(a) Calculate the arithmetic mean and standard deviation.
(b) Calculate the standard error of the mean.

The cut-off grade for continued exploration for a gold deposit is estimated at 10 ppm in the stream sediment. Therefore, the geologist would like to be confident that the mean gold content is accurate to within 0.1 ppm of the true value. To obtain this level of accuracy the geologist has two options to consider: either (1) re-calibrate the AAS, which will lead to a five times improvement in the precision of the apparatus (total cost of re-calibration is £500), or (2) collect more stream sediment samples and analyse them for their gold content on the uncalibrated AAS (at a cost of £3 per sediment sample).

(c) Use your knowledge of statistics to decide which of the options is likely to be the most cost effective.

3. Mine drainage entering a river is suspected to be the cause of high As concentrations, measured in parts per billion in four river samples as:

25.2 ± 3.8   26.4 ± 2.0   21.2 ± 3.4   22.2 ± 3.1   (errors quoted are 1 standard deviation)

Following a clean-up operation, which attempted to prevent the mine drainage from entering the river, the As was measured in a further four samples of river water:

17.3 ± 3.6   15.4 ± 2.5   18.3 ± 3.2   12.7 ± 3.2   (errors quoted are 1 standard deviation)

Prove that the clean-up operation has had some success (hint: assume a significant difference as being greater than 2 standard deviations from the average).

Statistics Practical No. 1: Answers

1 (a) If the quoted precision on the measured pebble lengths is 0.1 cm, then the class widths should be larger than this. Class widths of 0.5 cm or 1 cm would be appropriate. However, a class width of 2 cm would be too large as it would be a large proportion of the total range. The histogram below has class widths of 1 cm. The data are continuous (i.e. any value is possible), so the histogram is drawn with adjoining columns.

(b) The distribution is unimodal (i.e.
has one peak) and is skewed to high values of pebble length (positive skew).

(c) (i) The data are continuous and therefore may be grouped into classes to give a mode. Once the data have been classed, the mode is the mid-point of the class with the highest frequency (i.e. the mid-point of the highest column in the histogram). On the histogram above the class with the highest frequency is 9-10 (frequency of 20). Therefore the mode is 9.5 cm.

(ii) The median is the middle value in the ranked list. For the pebble data this is the 26th value in the list of 51, i.e. the median is 9.9 cm.

(iii) The mean is the sum of all the values divided by the number of samples:

$\bar{x} = \frac{\sum x}{N} = \frac{536}{51}$, so $\bar{x} = 10.5$ cm (51 observations)

(d) (i) The total range is the difference between the maximum and minimum values = 17.3 - 7.6 = 9.7 cm.

(ii) The interquartile range is the difference between the value 3/4 of the way along the ranked list and the value 1/4 of the way along.
3/4 of 51 = 38.25, therefore take the 39th value = 11.3
1/4 of 51 = 12.75, therefore take the 13th value = 9.3
The interquartile range = 11.3 - 9.3 = 2.0 cm.
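As a cross-check on the central values, total range and interquartile range worked by hand above, the arithmetic can be reproduced with a short Python sketch. It assumes 1 cm classes starting at whole centimetres, takes the mode as the mid-point of the modal class, and uses the same quartile convention as the answer (round the position N/4 or 3N/4 up to the next whole rank). The variable names are illustrative only, and other quartile conventions give slightly different values.

```python
import math
from collections import Counter

# Ranked pebble long-axis lengths (cm) from question 1.
pebbles = [
    7.6, 8.1, 8.2, 8.5, 8.8, 8.9, 9.0, 9.1, 9.2, 9.2,
    9.3, 9.3, 9.3, 9.3, 9.4, 9.4, 9.4, 9.4, 9.4, 9.5,
    9.5, 9.6, 9.7, 9.7, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2,
    10.4, 10.5, 10.7, 10.7, 10.9, 10.9, 11.1, 11.3, 11.3, 11.5,
    11.6, 11.8, 12.2, 12.6, 12.7, 12.9, 13.4, 13.5, 14.2, 15.7,
    17.3,
]
n = len(pebbles)

# Class frequencies for 1 cm classes [7, 8), [8, 9), ..., keyed by lower bound.
class_counts = Counter(math.floor(x) for x in pebbles)

# Mode: mid-point of the class with the highest frequency.
modal_class, modal_freq = class_counts.most_common(1)[0]
mode = modal_class + 0.5

# Median: middle value of the ranked list (valid because n is odd here).
median = pebbles[n // 2]

mean = sum(pebbles) / n
total_range = pebbles[-1] - pebbles[0]

# Quartiles by the convention used in the answer: round the rank up.
q1 = pebbles[math.ceil(n / 4) - 1]      # 13th value
q3 = pebbles[math.ceil(3 * n / 4) - 1]  # 39th value
iqr = q3 - q1

print(f"modal class {modal_class}-{modal_class + 1} cm, frequency {modal_freq}")
print(f"mode {mode} cm, median {median} cm, mean {mean:.1f} cm")
print(f"range {total_range:.1f} cm, IQR {iqr:.1f} cm")
```

The list must already be ranked, as it is in the question, for the median and quartile look-ups to be valid.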
(iii) Standard deviation:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$

Deviations are taken from $\bar{x} = 10.51$ cm.

    x   x - x̄  (x - x̄)²  |      x   x - x̄  (x - x̄)²
  7.6   -2.91     8.468  |   10.0   -0.51     0.260
  8.1   -2.41     5.808  |   10.0   -0.51     0.260
  8.2   -2.31     5.336  |   10.1   -0.41     0.168
  8.5   -2.01     4.040  |   10.2   -0.31     0.096
  8.8   -1.71     2.924  |   10.4   -0.11     0.012
  8.9   -1.61     2.592  |   10.5   -0.01    0.0001
  9.0   -1.51     2.280  |   10.7    0.19     0.036
  9.1   -1.41     1.988  |   10.7    0.19     0.036
  9.2   -1.31     1.716  |   10.9    0.39     0.152
  9.2   -1.31     1.716  |   10.9    0.39     0.152
  9.3   -1.21     1.464  |   11.1    0.59     0.348
  9.3   -1.21     1.464  |   11.3    0.79     0.624
  9.3   -1.21     1.464  |   11.3    0.79     0.624
  9.3   -1.21     1.464  |   11.5    0.99     0.980
  9.4   -1.11     1.232  |   11.6    1.09     1.188
  9.4   -1.11     1.232  |   11.8    1.29     1.664
  9.4   -1.11     1.232  |   12.2    1.69     2.856
  9.4   -1.11     1.232  |   12.6    2.09     4.368
  9.4   -1.11     1.232  |   12.7    2.19     4.796
  9.5   -1.01     1.020  |   12.9    2.39     5.712
  9.5   -1.01     1.020  |   13.4    2.89     8.352
  9.6   -0.91     0.828  |   13.5    2.99     8.940
  9.7   -0.81     0.656  |   14.2    3.69    13.616
  9.7   -0.81     0.656  |   15.7    5.19    26.936
  9.8   -0.71     0.504  |   17.3    6.79    46.104
  9.9   -0.61     0.372  |

$\sum (x - \bar{x})^2 = 182.220$

$s = \sqrt{\frac{182.22}{51 - 1}} = \sqrt{3.644}$, so $s = 1.9$ cm

(e) The second sedimentological characteristic about which the data yield information is the degree of size sorting. Well-sorted sediments have narrow spreads of grain size and poorly-sorted sediments have wide spreads of grain size. Therefore, the degree of size sorting can be described by any of the statistical parameters that measure spread. Of these, the total range is the least satisfactory as it is strongly affected by outliers (i.e. extreme values), but the interquartile range (I.Q.R.) or standard deviation would do equally well.

(f) The distribution is positively skewed to high values of pebble length and has a sharp peak.

(i) Skewness:

$g_1 = \frac{\sqrt{N} \sum_{i=1}^{N} (x_i - \bar{x})^3}{\left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^{3/2}}$

The answer is 1.45, i.e. positively skewed (as expected).

(ii) Kurtosis:

$g_2 = \frac{N \sum_{i=1}^{N} (x_i - \bar{x})^4}{\left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^{2}} - 3$

The answer is 2.34, i.e. a sharp peak in the distribution.

2 (a) $\bar{x} = 9.6$ ppm; $s = 1.5$ ppm

(b) The standard error of the mean is

$SE(\bar{x}) = \frac{s}{\sqrt{N}} = \frac{1.5}{\sqrt{10}} = 0.5$

The gold concentration is 9.6 ± 0.5 ppm (10 observations).
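The standard deviation, skewness, kurtosis and standard error quoted above can be checked numerically. The following is a minimal Python sketch, assuming the N - 1 form of the standard deviation and the g1 and g2 definitions given in the answers. The function names are illustrative and are not part of the practical. Without intermediate rounding the gold standard error comes out near 0.46 ppm, which the answer quotes as 0.5.

```python
import math

# Pebble long-axis lengths (cm) from question 1.
pebbles = [
    7.6, 8.1, 8.2, 8.5, 8.8, 8.9, 9.0, 9.1, 9.2, 9.2,
    9.3, 9.3, 9.3, 9.3, 9.4, 9.4, 9.4, 9.4, 9.4, 9.5,
    9.5, 9.6, 9.7, 9.7, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2,
    10.4, 10.5, 10.7, 10.7, 10.9, 10.9, 11.1, 11.3, 11.3, 11.5,
    11.6, 11.8, 12.2, 12.6, 12.7, 12.9, 13.4, 13.5, 14.2, 15.7,
    17.3,
]
# Replicate gold analyses (ppm) from question 2.
gold = [7.1, 10.1, 7.7, 10.3, 9.8, 12.4, 9.3, 10.4, 9.6, 9.5]


def mean(values):
    return sum(values) / len(values)


def sum_dev_power(values, power):
    """Sum of (x - mean)**power over the sample."""
    m = mean(values)
    return sum((x - m) ** power for x in values)


def sample_sd(values):
    """Standard deviation with the N - 1 denominator."""
    return math.sqrt(sum_dev_power(values, 2) / (len(values) - 1))


def skewness_g1(values):
    """g1 = sqrt(N) * sum(d^3) / (sum(d^2))^(3/2)."""
    n = len(values)
    return math.sqrt(n) * sum_dev_power(values, 3) / sum_dev_power(values, 2) ** 1.5


def kurtosis_g2(values):
    """g2 = N * sum(d^4) / (sum(d^2))^2 - 3."""
    n = len(values)
    return n * sum_dev_power(values, 4) / sum_dev_power(values, 2) ** 2 - 3


def standard_error(values):
    """Standard error of the mean, s / sqrt(N)."""
    return sample_sd(values) / math.sqrt(len(values))


print(f"pebbles: s = {sample_sd(pebbles):.2f} cm, "
      f"g1 = {skewness_g1(pebbles):.2f}, g2 = {kurtosis_g2(pebbles):.2f}")
print(f"gold: mean = {mean(gold):.2f} ppm, s = {sample_sd(gold):.2f} ppm, "
      f"SE = {standard_error(gold):.2f} ppm")
```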
(c) Option (1) obtains a five-fold improvement in the precision of the AAS by re-calibration. The uncalibrated precision is taken to be the standard deviation of the ten samples analysed (1.5 ppm). A five-fold improvement therefore gives s = 1.5/5 = 0.3 ppm, and the standard error reduces to:

$SE(\bar{x}) = \frac{s}{\sqrt{N}} = \frac{0.3}{\sqrt{10}} = 0.1$

which is the required value.

Option (2) requires more samples to be taken until the standard error reaches 0.1 ppm:

$N = \left( \frac{s}{SE(\bar{x})} \right)^2 = \left( \frac{1.5}{0.1} \right)^2 = 225$

Therefore, a further 215 samples must be collected and analysed at a cost of 215 × £3 = £645. Thus, the cheaper option would be to re-calibrate the AAS at £500.

3. The As concentrations can be weighted according to their errors to obtain a weighted average*. Prior to clean-up the weighted mean As concentration is:

$\bar{x} \pm s_{\bar{x}} = \frac{\sum x / s^2}{\sum 1 / s^2} \pm \frac{1}{\sqrt{\sum 1 / s^2}} = \frac{12.49}{0.51} \pm \frac{1}{\sqrt{0.51}} = 24.5 \pm 1.4$ ppb As

After the clean-up this reduces to $\bar{x} = 15.8 \pm 1.5$ ppb As.

The weighted averages are more than two standard deviations apart and therefore the clean-up operation has had a significant effect in reducing the As concentration in the river.

*Note of caution: Averaging of results, whether weighted or not, should be done with caution and common sense. Even though a measurement has a small quoted error it can still be wrong. If two results are in obvious disagreement, any average is meaningless and wrong. If results differ by more than two standard deviations then you should be extremely suspicious. However, rejection of points outside certain limits can rapidly get out of hand, and points should only be rejected with reluctance.
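The sample-size argument in answer 2(c) and the weighted means in answer 3 can be reproduced with a short Python sketch. It assumes inverse-variance weights (1/s²) and reads the hint as requiring that the 2-standard-deviation intervals around the two weighted means do not overlap, which is one interpretation rather than the only possible test. The function names are illustrative.

```python
import math
from fractions import Fraction


def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


# As concentrations (ppb) with 1-standard-deviation errors, question 3.
before = weighted_mean([25.2, 26.4, 21.2, 22.2], [3.8, 2.0, 3.4, 3.1])
after = weighted_mean([17.3, 15.4, 18.3, 12.7], [3.6, 2.5, 3.2, 3.2])

# Assumed reading of the hint: the 2-sigma intervals must not overlap.
separated = before[0] - 2 * before[1] > after[0] + 2 * after[1]


def samples_needed(sd, target_se):
    """Smallest N with sd / sqrt(N) <= target_se.

    Arguments are decimal strings so the ratio is exact (no float rounding).
    """
    return math.ceil((Fraction(sd) / Fraction(target_se)) ** 2)


# Question 2(c), option (2): s = 1.5 ppm, target SE = 0.1 ppm, 10 samples in hand.
total_samples = samples_needed("1.5", "0.1")
extra_cost = (total_samples - 10) * 3  # pounds, at 3 pounds per sample

print(f"before: {before[0]:.1f} +/- {before[1]:.1f} ppb")
print(f"after:  {after[0]:.1f} +/- {after[1]:.1f} ppb")
print(f"2-sigma intervals separated: {separated}")
print(f"samples needed: {total_samples}, extra cost: {extra_cost} pounds")
```

Passing the standard deviation and target as strings keeps the ratio exact, so the required sample size is not pushed up by one through floating-point rounding.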