Stats Practical 1 2006

EART20170
STATISTICS PRACTICAL 1
Histograms, standard error and weighted mean
Attempt this part without using the statistical functions of your calculators.
1. A sedimentologist studying fluvial conglomerates has measured the dimensions of pebbles in a
present-day point-bar deposit. The following set of numbers is a ranked list of the lengths (in cm.) of
pebble long axes (i.e. maximum diameters) as measured on a sample of 51 pebbles:
7.6   8.1   8.2   8.5   8.8   8.9   9.0   9.1   9.2   9.2
9.3   9.3   9.3   9.3   9.4   9.4   9.4   9.4   9.4   9.5
9.5   9.6   9.7   9.7   9.8   9.9   10.0  10.0  10.1  10.2
10.4  10.5  10.7  10.7  10.9  10.9  11.1  11.3  11.3  11.5
11.6  11.8  12.2  12.6  12.7  12.9  13.4  13.5  14.2  15.7
17.3
(a) Given that the quoted precision (to the nearest mm) is appropriate, depict the distribution of
pebble long axes on a histogram (use graph paper provided).
(b) Describe the distribution qualitatively.
(c) Derive the following central values: (i) mode; (ii) median; (iii) mean.
(d) Describe the spread of the distribution with reference to: (i) the total range; (ii) the interquartile range; (iii) the standard deviation.
(e) The distribution of pebble lengths contains information about two important
sedimentological characteristics. One is the absolute grainsize of the sediment. What is the
other? What statistical parameter would you choose as a measure of this second characteristic?
(f) Describe the symmetry and sharpness of the distribution. Derive the following: (i) Skewness; and
(ii) Kurtosis.
2. An exploration geologist measures the gold concentration in a stream sediment to assess the
likelihood of an area containing a gold deposit. The geologist uses atomic absorption spectroscopy
(AAS) to determine the gold content in parts per million (ppm). Ten replicate analyses of the
sediment gave the following results in ppm.
7.1   10.1   7.7   10.3   9.8   12.4   9.3   10.4   9.6   9.5
(a) Calculate the arithmetic mean and standard deviation
(b) Calculate the standard error of the mean
The cut-off grade for continued exploration for a gold deposit is estimated at 10 ppm in the
stream sediment. Therefore, the geologist would like to be confident that the mean gold content is
accurate to within 0.1 ppm of the true value. To obtain this level of accuracy the geologist has two
options to consider, either (1) re-calibrate the AAS, which will lead to a five times improvement in
the precision of the apparatus (total cost of re-calibration is £500), or (2) collect more stream
sediment samples and analyse for their gold content on the uncalibrated AAS (at a cost of £3 per
sediment sample).
(c) Use your knowledge of statistics to decide which of the options is likely to be the most cost
effective.
3. Mine drainage entering a river is suspected to be the cause of high As concentrations, measured in
parts per billion in four river samples as:
25.2 ± 3.8
26.4 ± 2.0
21.2 ± 3.4
22.2 ± 3.1 (errors quoted are 1 standard deviation)
Following a clean-up operation, which attempted to prevent the mine drainage from entering the river,
the As was measured in a further four samples of river water:
17.3 ± 3.6
15.4 ± 2.5
18.3 ± 3.2
12.7 ± 3.2 (errors quoted are 1 standard deviation)
Prove that the clean-up operation has had some success (hint: assume a significant difference as being
greater than 2 standard deviations from the average).
Statistics Practical No. 1: Answers
1 (a) If the quoted precision on the measured pebble lengths is 0.1 cm, then the class widths should be larger
than this. Class widths of 0.5 cm or 1 cm would be appropriate. However a class width of 2 cm would be too
large as it would be a large proportion of the total range. The histogram below has class widths of 1 cm. The
data are continuous (i.e. any value is possible), so the histogram is drawn with adjoining columns.
(b) The distribution is unimodal (i.e. has one peak) and is skewed to high values of pebble length (positive
skew).
(c) (i) The data are continuous and therefore may be grouped into classes to give a mode. Once the data
have been classed the mode is the mid-point of the class with the highest frequency (i.e. mid-point of the
highest column in the histogram). On the histogram above the class with the highest frequency is 9-10
(frequency of 20). Therefore the mode is 9.5 cm.
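The class counts and the modal class can be checked with a short Python sketch. It assumes 1 cm classes of the form [k, k + 1) cm, so that a pebble of exactly 9.0 cm falls in the 9-10 class; the variable names are illustrative and are not part of the practical.

```python
import math
from collections import Counter

# Ranked pebble long-axis lengths (cm) from question 1
lengths_cm = [
    7.6, 8.1, 8.2, 8.5, 8.8, 8.9, 9.0, 9.1, 9.2, 9.2,
    9.3, 9.3, 9.3, 9.3, 9.4, 9.4, 9.4, 9.4, 9.4, 9.5,
    9.5, 9.6, 9.7, 9.7, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2,
    10.4, 10.5, 10.7, 10.7, 10.9, 10.9, 11.1, 11.3, 11.3, 11.5,
    11.6, 11.8, 12.2, 12.6, 12.7, 12.9, 13.4, 13.5, 14.2, 15.7,
    17.3,
]

# Assumed class convention: class k holds lengths in [k, k + 1) cm
counts = Counter(math.floor(x) for x in lengths_cm)

# Modal class = class with the highest frequency; mode = its mid-point
modal_class = max(counts, key=counts.get)
mode_cm = modal_class + 0.5

# Crude text histogram, one row per non-empty class
for k in sorted(counts):
    print(f"{k:2d}-{k + 1:2d} cm: {'#' * counts[k]} ({counts[k]})")
print("mode =", mode_cm, "cm")
```

A different class width or boundary convention would change the counts, and possibly the mode, which is why the choice of classes should always be stated.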
(ii) The median is the middle value in the ranked list. For the pebble data this is the 26th value in the list of
51, i.e. the median is 9.9 cm.
(iii) The mean is the sum of all the values divided by the number of samples.
$$ \bar{x} = \frac{536}{51} $$
$\bar{x} = 10.5$ cm (51 observations)
(d) (i) The total range is the difference between the maximum and minimum values
= 17.3 - 7.6 = 9.7 cm
(ii) The inter-quartile range is the difference between the value 3/4 of the way along the ranked list and 1/4
of the way along.
3/4 of 51 = 38.25, therefore take the 39th value = 11.3
1/4 of 51 = 12.75, therefore take the 13th value = 9.3
The interquartile range = 11.3 - 9.3 = 2.0 cm.
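As a check on the median, mean, total range and interquartile range, the following Python sketch applies the same rule as the working above: the quartile position is rounded up to the next whole position in the ranked list. Other quartile conventions exist and can give slightly different values; the variable names are illustrative.

```python
import math

# Ranked pebble long-axis lengths (cm) from question 1
lengths_cm = [
    7.6, 8.1, 8.2, 8.5, 8.8, 8.9, 9.0, 9.1, 9.2, 9.2,
    9.3, 9.3, 9.3, 9.3, 9.4, 9.4, 9.4, 9.4, 9.4, 9.5,
    9.5, 9.6, 9.7, 9.7, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2,
    10.4, 10.5, 10.7, 10.7, 10.9, 10.9, 11.1, 11.3, 11.3, 11.5,
    11.6, 11.8, 12.2, 12.6, 12.7, 12.9, 13.4, 13.5, 14.2, 15.7,
    17.3,
]
n = len(lengths_cm)

# Median: middle value of an odd-length ranked list (1-based position (n + 1) / 2)
median_cm = lengths_cm[(n + 1) // 2 - 1]

# Mean: sum of the values divided by the number of values
mean_cm = sum(lengths_cm) / n

# Total range: maximum minus minimum
total_range_cm = lengths_cm[-1] - lengths_cm[0]

# Quartile positions, rounded up to the next whole position (rule used in the answer)
upper_pos = math.ceil(0.75 * n)
lower_pos = math.ceil(0.25 * n)
iqr_cm = lengths_cm[upper_pos - 1] - lengths_cm[lower_pos - 1]

print(f"median = {median_cm} cm, mean = {mean_cm:.1f} cm")
print(f"range = {total_range_cm:.1f} cm, IQR = {iqr_cm:.1f} cm")
```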
(iii) Standard deviation:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}} $$

   x     (x - x̄)   (x - x̄)²   |     x     (x - x̄)   (x - x̄)²
  7.6    -2.91      8.468     |   10.0    -0.51      0.260
  8.1    -2.41      5.808     |   10.0    -0.51      0.260
  8.2    -2.31      5.336     |   10.1    -0.41      0.168
  8.5    -2.01      4.040     |   10.2    -0.31      0.096
  8.8    -1.71      2.924     |   10.4    -0.11      0.012
  8.9    -1.61      2.592     |   10.5    -0.01      0.0001
  9.0    -1.51      2.280     |   10.7     0.19      0.036
  9.1    -1.41      1.988     |   10.7     0.19      0.036
  9.2    -1.31      1.716     |   10.9     0.39      0.152
  9.2    -1.31      1.716     |   10.9     0.39      0.152
  9.3    -1.21      1.464     |   11.1     0.59      0.348
  9.3    -1.21      1.464     |   11.3     0.79      0.624
  9.3    -1.21      1.464     |   11.3     0.79      0.624
  9.3    -1.21      1.464     |   11.5     0.99      0.980
  9.4    -1.11      1.232     |   11.6     1.09      1.188
  9.4    -1.11      1.232     |   11.8     1.29      1.664
  9.4    -1.11      1.232     |   12.2     1.69      2.856
  9.4    -1.11      1.232     |   12.6     2.09      4.368
  9.4    -1.11      1.232     |   12.7     2.19      4.796
  9.5    -1.01      1.020     |   12.9     2.39      5.712
  9.5    -1.01      1.020     |   13.4     2.89      8.352
  9.6    -0.91      0.828     |   13.5     2.99      8.940
  9.7    -0.81      0.656     |   14.2     3.69     13.616
  9.7    -0.81      0.656     |   15.7     5.19     26.936
  9.8    -0.71      0.504     |   17.3     6.79     46.104
  9.9    -0.61      0.372     |

$$ \sum (x - \bar{x})^2 = 182.220 $$
$$ s = \sqrt{\frac{182.22}{51 - 1}} = \sqrt{3.644} $$
$s = 1.9$ cm
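The tabulated sum of squares and the resulting sample standard deviation can be reproduced with a few lines of Python. This is a minimal sketch using the N - 1 divisor from the formula above; `statistics.stdev` from the standard library is used only as a cross-check, and the variable names are illustrative.

```python
import math
import statistics

# Ranked pebble long-axis lengths (cm) from question 1
lengths_cm = [
    7.6, 8.1, 8.2, 8.5, 8.8, 8.9, 9.0, 9.1, 9.2, 9.2,
    9.3, 9.3, 9.3, 9.3, 9.4, 9.4, 9.4, 9.4, 9.4, 9.5,
    9.5, 9.6, 9.7, 9.7, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2,
    10.4, 10.5, 10.7, 10.7, 10.9, 10.9, 11.1, 11.3, 11.3, 11.5,
    11.6, 11.8, 12.2, 12.6, 12.7, 12.9, 13.4, 13.5, 14.2, 15.7,
    17.3,
]
n = len(lengths_cm)
mean_cm = sum(lengths_cm) / n

# Sum of squared deviations from the mean
sum_sq = sum((x - mean_cm) ** 2 for x in lengths_cm)

# Sample standard deviation (divisor N - 1)
s_cm = math.sqrt(sum_sq / (n - 1))

# Cross-check against the standard library's sample standard deviation
s_check = statistics.stdev(lengths_cm)

print(f"sum of squares = {sum_sq:.2f}, s = {s_cm:.1f} cm")
```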
(e) The second sedimentological characteristic about which the data yield information is the degree of size
sorting. Well-sorted sediments have narrow spreads of grain size and poorly-sorted sediments have wide
spreads of grain size. Therefore, the degree of size sorting can be described by any of the statistical
parameters that measure spread. Of these the total range is the least satisfactory as it is strongly affected by
outliers (i.e. extreme values), but the inter-quartile range (I.Q.R.) or standard deviation would do equally
well.
(f) The distribution is positively skewed to high values of pebble length and has a sharp peak.
(i) Skewness:
$$ g_1 = \frac{\sqrt{N} \sum_{i=1}^{N} (x_i - \bar{x})^3}{\left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^{3/2}} $$
The answer is 1.45, i.e. positively skewed (as expected).
(ii) Kurtosis:
$$ g_2 = \frac{N \sum_{i=1}^{N} (x_i - \bar{x})^4}{\left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^{2}} - 3 $$
The answer is 2.34, i.e. a sharp peak in the distribution.
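Both shape parameters can be checked numerically. The sketch below assumes the moment-based definitions g1 = sqrt(N) Σd³ / (Σd²)^(3/2) and g2 = N Σd⁴ / (Σd²)² - 3, where d is the deviation from the mean; these definitions reproduce the quoted answers. Statistical packages sometimes apply small-sample corrections, so their output may differ slightly.

```python
import math

# Ranked pebble long-axis lengths (cm) from question 1
lengths_cm = [
    7.6, 8.1, 8.2, 8.5, 8.8, 8.9, 9.0, 9.1, 9.2, 9.2,
    9.3, 9.3, 9.3, 9.3, 9.4, 9.4, 9.4, 9.4, 9.4, 9.5,
    9.5, 9.6, 9.7, 9.7, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2,
    10.4, 10.5, 10.7, 10.7, 10.9, 10.9, 11.1, 11.3, 11.3, 11.5,
    11.6, 11.8, 12.2, 12.6, 12.7, 12.9, 13.4, 13.5, 14.2, 15.7,
    17.3,
]
n = len(lengths_cm)
mean_cm = sum(lengths_cm) / n
deviations = [x - mean_cm for x in lengths_cm]

# Sums of the 2nd, 3rd and 4th powers of the deviations
sum2 = sum(d ** 2 for d in deviations)
sum3 = sum(d ** 3 for d in deviations)
sum4 = sum(d ** 4 for d in deviations)

# Assumed definitions (uncorrected moment coefficients)
g1 = math.sqrt(n) * sum3 / sum2 ** 1.5   # skewness: > 0 means a tail to high values
g2 = n * sum4 / sum2 ** 2 - 3            # excess kurtosis: > 0 means sharper than normal

print(f"skewness g1 = {g1:.2f}, kurtosis g2 = {g2:.2f}")
```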
2 (a) $\bar{x} = 9.6$; $s = 1.5$
(b) The standard error of the mean is
$$ \mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{N}} = \frac{1.5}{\sqrt{10}} \approx 0.5 $$
The gold concentration is 9.6 ± 0.5 ppm (10 observations).
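The mean, standard deviation and standard error for the ten replicate analyses can be verified as follows. This is a minimal sketch that uses the sample (N - 1) standard deviation; the unrounded values differ slightly from the rounded figures carried through the hand calculation, but agree to one decimal place.

```python
import math
import statistics

# Ten replicate AAS gold analyses (ppm) from question 2
gold_ppm = [7.1, 10.1, 7.7, 10.3, 9.8, 12.4, 9.3, 10.4, 9.6, 9.5]
n = len(gold_ppm)

mean_ppm = statistics.mean(gold_ppm)
s_ppm = statistics.stdev(gold_ppm)   # sample standard deviation (divisor N - 1)

# Standard error of the mean: s / sqrt(N)
se_ppm = s_ppm / math.sqrt(n)

print(f"mean = {mean_ppm:.1f} ppm, s = {s_ppm:.1f} ppm, SE = {se_ppm:.1f} ppm")
```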
(c) Option (1) obtains a five-fold improvement in precision of the AAS by recalibration. The uncalibrated
precision is taken to be the standard deviation of the ten samples analysed (1.5 ppm). A five times
improvement therefore gives s = (1.5/5) = 0.3, and the standard error reduces to:
$$ \mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{N}} = \frac{0.3}{\sqrt{10}} \approx 0.1 $$
which is the required value.
Option (2) requires more samples to be taken until the standard error reaches 0.1 ppm:
$$ N = \left( \frac{s}{\mathrm{SE}(\bar{x})} \right)^2 = \left( \frac{1.5}{0.1} \right)^2 = 225 $$
Ten samples have already been analysed. Therefore, a further 215 samples must be collected and analysed at a cost of 215 × 3 = £645.
Thus, the cheaper option would be to re-calibrate the AAS.
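The comparison of the two options can be laid out as a short calculation. The sketch below takes the rounded values used in the answer (s = 1.5 ppm, target standard error 0.1 ppm) and uses exact fractions so that the required sample size is not disturbed by floating-point rounding; all variable names are illustrative.

```python
import math
from fractions import Fraction

s_uncalibrated = Fraction("1.5")   # ppm, precision of the uncalibrated AAS
target_se = Fraction("0.1")        # ppm, required standard error of the mean
n_existing = 10                    # analyses already made
recalibration_cost = 500           # pounds, option (1)
cost_per_sample = 3                # pounds per extra sample, option (2)

# Option (1): five-fold better precision, same ten analyses
s_recalibrated = s_uncalibrated / 5
se_option1 = float(s_recalibrated) / math.sqrt(n_existing)

# Option (2): total N needed so that s / sqrt(N) reaches the target
n_required = math.ceil((s_uncalibrated / target_se) ** 2)
extra_samples = n_required - n_existing
sampling_cost = extra_samples * cost_per_sample

cheaper = "recalibrate" if recalibration_cost < sampling_cost else "more samples"
print(f"option 1: SE = {se_option1:.2f} ppm, cost = {recalibration_cost}")
print(f"option 2: {extra_samples} extra samples, cost = {sampling_cost}")
print("cheaper option:", cheaper)
```

Because the required N grows with the square of s / SE, halving the target standard error would quadruple the number of samples, which is why improving instrument precision is usually the more economical route.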
3. The As concentrations can be weighted according to their errors to obtain a weighted average*. Prior to
clean-up the weighted mean As concentration is:
$$ \bar{x} \pm s_{\bar{x}} = \frac{\sum x / s^2}{\sum 1 / s^2} \pm \sqrt{\frac{1}{\sum 1 / s^2}} = \frac{12.49}{0.51} \pm 1.4 = 24.5 \pm 1.4 \ \text{ppb} $$
After the clean-up this reduces to $\bar{x} = 15.8 \pm 1.5$ ppb As.
The weighted averages are more than two standard deviations apart and therefore the clean-up operation has
had a significant effect in reducing As concentration in the river.
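The two weighted means and their errors can be reproduced with the sketch below, which weights each result by 1/s². The final comparison combines the two errors in quadrature before applying the two-standard-deviation test; that particular way of combining them is an assumption made here for illustration, and the helper name `weighted_mean` is hypothetical.

```python
import math

def weighted_mean(values, errors):
    """Return the 1/s^2-weighted mean and its standard deviation."""
    weights = [1.0 / e ** 2 for e in errors]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    err = math.sqrt(1.0 / sum(weights))
    return mean, err

# As concentrations (ppb) and 1-standard-deviation errors from question 3
mean_before, err_before = weighted_mean([25.2, 26.4, 21.2, 22.2], [3.8, 2.0, 3.4, 3.1])
mean_after, err_after = weighted_mean([17.3, 15.4, 18.3, 12.7], [3.6, 2.5, 3.2, 3.2])

# Assumed test: difference compared with twice the errors combined in quadrature
difference = mean_before - mean_after
combined_err = math.hypot(err_before, err_after)
significant = difference > 2 * combined_err

print(f"before: {mean_before:.1f} +/- {err_before:.1f} ppb")
print(f"after:  {mean_after:.1f} +/- {err_after:.1f} ppb")
print("significant reduction:", significant)
```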
*Note of caution: Averaging of results, whether weighted or not, should be done with caution and common sense. Even
though a measurement has a small quoted error it can still be wrong. If two results are in obvious disagreement, any
average is meaningless and wrong. If results differ by more than two standard deviations then you should be extremely
suspicious. However, rejection of points outside certain limits can get rapidly out of hand, and points should only be
rejected with reluctance.