Chemistry 153
Clark College
Chemistry 153 Statistics Review
Pertinent Definitions
ERROR in a measurement is the difference between the observed (measured) value and the "true"
value of the quantity measured.
ACCURACY expresses the correctness (absence of error) in a measurement, i.e. the closeness of an
experimental determination to the "true" value.
PRECISION describes the reproducibility of a measurement, i.e. the agreement between the numerical
values of two or more experimental determinations done in identical fashion. Strictly speaking,
precision and accuracy do not have to be related. One could have data where the experimental values
varied quite a bit and therefore would be termed imprecise. However, it is possible that the average of
these numbers could be very close to the "true" value and the accuracy of the mean would be high.
(Dartboard analogy)
DETERMINATE ERRORS are those that do have a definite value which in principle can be defined.
Examples include use of uncalibrated equipment, incomplete chemical reaction, improper use of a
balance or colorblindness.
INDETERMINATE ERRORS are those that do not have a definite measurable value, but rather
fluctuate in a random manner. All physical measurements are subject to a degree of uncertainty, or
indeterminate error. The range or spread is the difference between the highest and lowest values in a
series of measurements.
ABSOLUTE ERROR in a measurement is the difference between the observed value and the accepted
("true") value.
$$\text{Absolute Error} = \text{experimental} - \text{accepted}$$
RELATIVE ERROR expresses the error in comparative terms, as a fraction of the accepted value. It is usually
reported in percent (%) or in parts per thousand (ppth).
$$\text{Relative Error} = \frac{\text{experimental} - \text{accepted}}{\text{accepted}} \times 100\% \quad (\text{or} \times 1000 \text{ for ppth})$$
Example: In a particular analysis the accepted value for the percentage of chlorine in a sample was
24.34%. An experimental result of 24.29% would have an absolute error of 24.29 - 24.34 = -0.05%. The relative error would be:
$$\frac{24.29 - 24.34}{24.34} \times 100\% = -0.2\%$$
or
$$\frac{24.29 - 24.34}{24.34} \times 1000 = -2 \text{ ppth}$$
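The chlorine example can be checked with a few lines of Python. This is a minimal sketch; the function names `absolute_error` and `relative_error` and the `scale` parameter are illustrative choices, not part of the handout.

```python
def absolute_error(experimental, accepted):
    """Signed absolute error: experimental minus accepted."""
    return experimental - accepted


def relative_error(experimental, accepted, scale=100):
    """Signed relative error; scale=100 gives percent, scale=1000 gives ppth."""
    return (experimental - accepted) / accepted * scale


abs_err = absolute_error(24.29, 24.34)
rel_pct = relative_error(24.29, 24.34, scale=100)
rel_ppth = relative_error(24.29, 24.34, scale=1000)

print(f"absolute error = {abs_err:.2f} %")   # -0.05 %
print(f"relative error = {rel_pct:.1f} %")   # -0.2 %
print(f"relative error = {rel_ppth:.0f} ppth")  # -2 ppth
```

The sign is kept on purpose: a negative error means the experimental value is below the accepted value.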
MEAN VALUE ($\bar{x}$, "x-bar") is the arithmetic average of a set of measurements. It is represented by the
equation below, where $x_i$ represents a single experimental value and N is the number of determinations.
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$
AVERAGE DEVIATION is a measure of precision. It is represented by the equation below, where $x_i$
represents a single experimental value, $\bar{x}$ is the mean and N is the number of determinations.
$$d = \frac{\sum_{i=1}^{N} |x_i - \bar{x}|}{N}$$
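As a rough illustration of the two definitions above, the following sketch computes the mean and the average deviation for a small made-up data set. The helper names `mean` and `average_deviation` are assumptions for this example.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count N."""
    return sum(values) / len(values)


def average_deviation(values):
    """Mean of the absolute deviations |x_i - mean|."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


data = [2, 3, 3, 6, 1]  # illustrative values, not real measurements
print(mean(data))               # 3.0
print(average_deviation(data))  # 1.2
```

Note that the average deviation divides by N, while the standard deviation defined next divides by N-1.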
Statistics Review
Rev Spring 2009 NF
Page 1 of 9
STANDARD DEVIATION is generally a more reliable measure of precision than the average deviation. It is a
statistical function representing in absolute terms the interval about the mean within which a majority of
experimental values should lie. The second expression below is often easier to use with a calculator.
You may also be able to use your calculator's built-in statistics functions to perform this calculation.
$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} \qquad \text{or} \qquad S = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N-1}}$$
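The two expressions for S are algebraically equivalent, which can be seen numerically in the sketch below. The function names and the sample data are illustrative assumptions; `statistics.stdev` from the Python standard library is used only as a cross-check, since it also uses the N-1 denominator.

```python
import math
import statistics


def stdev_definition(values):
    """First form: square root of sum of squared deviations over N-1."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))


def stdev_calculator(values):
    """Second form: uses only the running sums of x and x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


data = [67.48, 67.37, 67.47, 67.43, 67.40]  # illustrative percentages
print(f"{stdev_definition(data):.4f}")   # 0.0464
print(f"{stdev_calculator(data):.4f}")   # 0.0464
print(f"{statistics.stdev(data):.4f}")   # 0.0464
```

The calculator form subtracts two large, nearly equal numbers, so in floating-point arithmetic it can lose digits when the spread is tiny compared with the values themselves; the definition form is the safer choice in code.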
RELATIVE STANDARD DEVIATION, $S_R$ or RSD, is the standard deviation divided by the mean
value. This may be expressed in percent or in ppth.
$$S_R = \frac{S}{\bar{x}} \times 100 \quad \text{(in percent)}$$
$$S_R = \frac{S}{\bar{x}} \times 1000 \quad \text{(in ppth)}$$
Understanding Standard Deviation
(A Measure of the "Spread" in Data)
Data often tend to "pile up" near the average, so that they can be fit by a normal, or Gaussian, curve.
The Gaussian curve function is approximated by $f(x) = A e^{-(x-m)^2 / (2s^2)}$, where x is a particular measurement,
m is the arithmetic mean of the sample, s is the standard deviation of the sample and A is the height of
the curve at m.
[Figure: Gaussian curve with peak height A at the mean m. X = average, S = standard deviation.
68% of data in range X ± 1S; 95% of data in range X ± 2S; 99.7% of data in range X ± 3S.]
The empirical rule states that 68.27% of all data lies within ±1s of the mean, 95.45% lies
within ±2s, and 99.73% lies within ±3s. If the data is very precise, then the standard deviation will
be small, meaning that the “spread” of the data is narrow and our values are very close to the mean.
However, if the standard deviation is large, then our data is not very precise, and we must repeat the
experiment. Note that this does not determine the accuracy of the data, or how close the data is to an
“accepted value”. We may assume (which may not be a good assumption) that our experiment will
result in a solution that is accurate, and we must then obtain data that is precise.
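The percentages in the empirical rule can be checked numerically. The sketch below assumes the data follow an ideal normal distribution, for which the fraction within k standard deviations of the mean is erf(k/√2); the function name `fraction_within` is an illustrative choice. Printed tables may differ from these values in the second decimal place because of rounding.

```python
import math


def fraction_within(k):
    """Fraction of an ideal normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k}s: {100 * fraction_within(k):.2f} %")
# within ±1s: 68.27 %
# within ±2s: 95.45 %
# within ±3s: 99.73 %
```

Real data sets with only four or five measurements will not follow these percentages exactly; the rule describes the limiting behavior for many measurements.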
For any given set of data, we must determine the mean, the standard deviation and the relative
standard deviation. The following table shows an example set of data and the determination of its
mean and standard deviation.
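  $x_i$     $x_i - \bar{x}$     $(x_i - \bar{x})^2$
    2            -1                    1
    3             0                    0
    3             0                    0
    6             3                    9
    1            -2                    4
 Σ = 15                             Σ = 14

$$\bar{x} = \frac{\sum x_i}{N} = \frac{15}{5} = 3$$
$$N = 5; \quad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} = \sqrt{\frac{14}{5-1}} = 1.87$$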
Evaluation and Rejection of Data
All experimental procedures contain sources of error. Some of these, called "determinate errors", can
be treated in a straightforward manner. For example, the sensitivity of balances or calibration of
burets contributes to the number of places to which the data are significant, and hence to the number of
significant places in the reported results. Other errors, called "indeterminate errors", cause results to
deviate from the set in a random manner and for no apparent cause. These errors are recognized by
using statistical methods.
Except where specifically instructed otherwise, all analytical procedures in this course are conducted
with quadruplicate samples. One of these four samples could be rejected for cause. Naturally, if a
procedural error is made (e.g.: adding the wrong reagent, titrating past the end point, or dropping the
flask) the sample is rejected immediately and a fifth sample should be weighed out to replace the lost
sample.
Occasionally one result (an outlier) seems quite different from the remaining three results for no known
cause. In this case one should suspect a random, or indeterminate, error and apply statistical methods
in order to confirm the natural desire to reject the outlying value. A simple and statistically valid test is
the Q-test in which the difference between the outlying result and its nearest neighbor is compared
with the total sample range. If the ratio of the difference to the sample range exceeds a statistically
derived value (see Table below) the outlying result may be rejected. For this course, we will reject data
with a 90% certainty that the rejection is valid. It is essential that all Q-test calculations be shown on the
back of your report.
However, a Q-test ratio that exceeds the tabulated value is no guarantee that the outlying value
should be rejected. If the outlying value falls within the expected limits of error due to method and
instrument limitations, then the suspect result is actually a valid result.
Q-TEST REJECTION QUOTIENTS
Number of Observations    90% Confidence    95% Confidence
          3                   0.941             0.970
          4                   0.765             0.829
          5                   0.642             0.710
          6                   0.560             0.625
          7                   0.507             0.568
          8                   0.468             0.526
          9                   0.437             0.493
         10                   0.412             0.466
To perform the Q-Test, list all of your data points in order of increasing value. If only one of the
values seems different from the others, perform the Q-Test on that value in the following manner
(the examples use a 90% confidence level):
Example 1: Values: 86.20, 86.21, 86.27, 86.44
$$Q = \frac{\text{difference}}{\text{range}} = \frac{86.44 - 86.27}{86.44 - 86.20} = \frac{0.17}{0.24} = 0.71 < 0.765$$
Comment: Ratio is less than tabulated value of Q, and outlying result must be retained.
Example 2: Values: 87.80, 87.00, 86.98, 86.84
$$Q = \frac{\text{difference}}{\text{range}} = \frac{0.80}{0.96} = 0.83 > 0.765$$
Comment: Ratio is greater than tabulated value of Q, and outlying result may be rejected with
90% confidence.
Example 3: Values: 86.45, 86.25, 86.24, 86.21
$$Q = \frac{\text{difference}}{\text{range}} = \frac{0.20}{0.24} = 0.83$$
Comment: Ratio is the same as in Example 2. However, the difference between the outlying
result and the sample mean, 86.45 - 86.29 = 0.16, is only 1.9 parts per 1000 of the
sample mean ($\frac{0.16}{86.29} \times 1000$) and is therefore within acceptable limits of error due to
indeterminate causes. The outlying result should not be rejected since it falls within
the range that is expected for the given analytical procedure.
Example 4: Values: 87.80, 87.64, 86.96, 86.64
$$Q = \frac{\text{difference}}{\text{range}} = \frac{0.16}{1.16} = 0.14$$
Comment: This wide sample range and small difference is a frustrating situation. All four
results must be retained according to the Q-test, but the range is 13 parts per
thousand. If this is beyond acceptable limits, one must admit poor technique and
repeat the entire analysis.
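The four examples can be reproduced with a short Python sketch. The critical values are copied from the Q-test rejection quotient table; the names `Q_CRITICAL`, `q_ratio` and `q_rejects` are illustrative, and the function assumes that the analyst has already picked the suspect value, which must be the lowest or highest result.

```python
# Critical Q values from the rejection quotient table, keyed by confidence level and N.
Q_CRITICAL = {
    90: {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,
         8: 0.468, 9: 0.437, 10: 0.412},
    95: {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568,
         8: 0.526, 9: 0.493, 10: 0.466},
}


def q_ratio(values, suspect):
    """Gap between the suspect value and its nearest neighbor, divided by the range."""
    data = sorted(values)
    if suspect == data[-1]:
        gap = data[-1] - data[-2]
    elif suspect == data[0]:
        gap = data[1] - data[0]
    else:
        raise ValueError("suspect must be the lowest or highest value")
    return gap / (data[-1] - data[0])


def q_rejects(values, suspect, confidence=90):
    """True if the Q-test alone would allow rejecting the suspect value."""
    return q_ratio(values, suspect) > Q_CRITICAL[confidence][len(values)]


examples = [
    ([86.20, 86.21, 86.27, 86.44], 86.44),
    ([87.80, 87.00, 86.98, 86.84], 87.80),
    ([86.45, 86.25, 86.24, 86.21], 86.45),
    ([87.80, 87.64, 86.96, 86.64], 87.80),
]
for values, suspect in examples:
    print(f"Q = {q_ratio(values, suspect):.2f}, rejects: {q_rejects(values, suspect)}")
# Q = 0.71, rejects: False
# Q = 0.83, rejects: True
# Q = 0.83, rejects: True
# Q = 0.14, rejects: False
```

The code reports only the Q-test outcome. As Example 3 shows, a value that exceeds the critical Q may still have to be kept when it lies within the expected limits of error.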
IMPORTANT: The Q-Test is only one test for the evaluation and rejection of data. In this class, a
value must fail both the Q-Test and the 5 ppth test to be rejected. The 5 ppth test is discussed in the
next section.
Precision and Parts-per-thousand (ppth) Analysis
Like a percentage (parts per hundred), a parts-per-thousand (ppth) figure is a relative comparison, but on
a finer scale than a percent error. The same approach works at other scales: in
Environmental Chemistry, detection levels and precision are often measured in the parts-per-million or
parts-per-billion range.
There are three ways we will use ppth analysis:
1. Precision – using ppth to define a number of sig figs to report in a measurement.
For data collection, always report the number of digits given by the measuring device. Examples:
lab counter balances, ±0.01 g; analytical balances, ±0.0001 g; burets, ±0.01 mL. However, you will report a
piece of data using the number of significant figures that gives you a precision of 1 ppth, or very close to it.
For example, you make a measurement on the analytical balance that is 11.01576 g. To determine the
number of digits to report (and therefore to which decimal place to round the mass value), you want to
find an ‘x’ value that, when divided by the measurement rounded to the same decimal place, roughly
equals 1/1000. The example below shows three possible “rounding places”. By inspection, the middle
representation is nearest to 1/1000, so you report your value to the hundredths place.
$$x = \frac{\pm 0.1}{11.0} \qquad \frac{\pm 0.01}{11.02} \qquad \frac{\pm 0.001}{11.016}$$
Reported value = 11.02 ± 0.01 g
If the situation is more complicated, for example a calculation near 40%, you will need to go
through the same process and you will need to try and determine if it is OK to use 1/400, ±0.4 or
1/4000, ±0.04. There is sometimes no easy answer and it depends upon the error in your actual
measurements and the least precise decimal place for that piece of equipment. As we go through the
quarter, we will discuss various situations in the introduction to each experiment.
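The "rounding place" search described in this section can be sketched in Python. This is only one possible way to automate the inspection step: it assumes an uncertainty of ±1 in the last reported digit and measures "nearest to 1/1000" on a logarithmic scale, which is a choice made for this sketch rather than a rule from the handout. The function name `ppth_decimal_places` is hypothetical.

```python
import math


def ppth_decimal_places(value, max_places=6):
    """Decimal places at which (±1 in the last digit) / value is closest to 1/1000.

    Closeness is measured on a log scale (an assumption of this sketch).
    """
    best_places, best_distance = None, math.inf
    for places in range(max_places + 1):
        rounded = round(value, places)
        if rounded == 0:
            continue  # a value that rounds to zero has no meaningful ratio
        ratio = 10 ** -places / abs(rounded)
        distance = abs(math.log10(ratio) - math.log10(1 / 1000))
        if distance < best_distance:
            best_places, best_distance = places, distance
    return best_places


places = ppth_decimal_places(11.01576)
print(places, round(11.01576, places))  # 2 11.02
```

For a value near 40 the two candidate ratios (1/400 and 1/4000) are almost equally far from 1/1000, so any automatic rule is a judgment call; the measuring device's own precision should still take priority.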
2. Average error in a set of data, and the relative standard deviation (RSD).
You will often be taking 4 readings or measurements on a sample or unknown. After you do all
calculations and find the standard deviation (described previously), you will use this to find the
Relative Standard Deviation in ppth.
To find the RSD:
$$\text{RSD (in ppth)} = \frac{s}{\bar{x}} \times 1000 \qquad (\bar{x} = \text{the mean})$$
$$\text{RSD (in \%, or pph)} = \frac{s}{\bar{x}} \times 100$$
3. The 5-ppth Rule, also known as Relative Error
You will use two different criteria to determine if you may toss out an errant piece of data, the Q–
Test (described previously), and the 5-ppth rule. This rule is similar to a percent difference. For a piece
of data to fail the 5 ppth rule, the ppth difference must be greater than 5 ppth:
$$\frac{|\text{value} - \text{mean}|}{\text{mean}} \times 1000 > 5 \text{ ppth} \rightarrow \text{Fails!}$$
Remember! This test is used in conjunction with the Q-test. You may not discard a value unless it
fails both the Q-test and the 5-ppth rule.
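A minimal sketch of the 5-ppth rule follows. The function names are illustrative, the absolute value is used so that low and high outliers are treated alike, and the two test cases reuse numbers that appear elsewhere in this review (86.45 against a mean of 86.29, and 16.36 against a mean of 16.26).

```python
def ppth_difference(value, mean):
    """Relative difference between a value and the mean, in parts per thousand."""
    return abs(value - mean) / mean * 1000


def fails_5_ppth(value, mean, limit=5.0):
    """True if the value differs from the mean by more than `limit` ppth."""
    return ppth_difference(value, mean) > limit


print(round(ppth_difference(86.45, 86.29), 1), fails_5_ppth(86.45, 86.29))  # 1.9 False
print(round(ppth_difference(16.36, 16.26), 1), fails_5_ppth(16.36, 16.26))  # 6.2 True
```

A `True` result here is not enough on its own to discard a value; the Q-test must also be failed.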
For this class, you must always calculate the mean, standard deviation and relative standard
deviation for all data that you collect. The examples below go through the complete process. Pay
attention to the significant digits and decimal places in the values.
Example: Four different samples of iron ore were weighed on an analytical balance, and the following
masses were obtained: 18.6989 g, 18.6357 g, 18.6473 g, 18.6110 g. Tabulate this data, rounding to ppth
precision. Determine the mean, the standard deviation, and the RSD for this set of data.
Trial    Mass, in g    Mass, in g, to ppth precision
  1       18.6989               18.70
  2       18.6357               18.64
  3       18.6473               18.65
  4       18.6110               18.61

$$\text{mean} = \bar{x} = \frac{\sum x_i}{N} = \frac{18.70 + 18.64 + 18.65 + 18.61}{4} = 18.65 \text{ g}$$
Trial     $x_i$     $(x_i - \bar{x})$     $(x_i - \bar{x})^2$
  1       18.70          0.05                0.0025
  2       18.64         -0.01                0.0001
  3       18.65          0                   0
  4       18.61         -0.04                0.0016
                              $\sum (x_i - \bar{x})^2$ = 0.0042

$$\text{Std. Deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} = \sqrt{\frac{0.0042}{4-1}} = 0.0374 = 0.04 \text{ g}$$
Note: The standard deviation should be rounded to the same decimal place as the mean, and has the
same units.
$$\text{RSD} = \frac{s}{\bar{x}} \times 1000 = \frac{0.04}{18.65} \times 1000 = 2.145 = 2$$
Note: Because the standard deviation has only one significant digit, the RSD is rounded to only one
significant digit. It has no units!
Summary:
Mean = 18.65 g                  Rounded to reflect ppth precision.
Standard Deviation = 0.04 g     Rounded to the same decimal place as the mean.
RSD = 2 ppth                    Rounded to the same number of significant digits as the standard deviation.
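The iron ore example can be reproduced with Python's standard `statistics` module. This sketch follows the rounding order used in the worked example (round the data first, then round the standard deviation before computing the RSD); the variable names are illustrative, and the masses are the ones tabulated in the example.

```python
import statistics

masses = [18.6989, 18.6357, 18.6473, 18.6110]  # grams, as tabulated

# Step 1: round each mass to ppth precision (hundredths place for ~18.6 g).
rounded = [round(m, 2) for m in masses]

# Step 2: mean of the rounded data, reported to the same decimal place.
mean_g = round(statistics.mean(rounded), 2)

# Step 3: sample standard deviation (N-1), rounded to the same place as the mean.
s_g = round(statistics.stdev(rounded), 2)

# Step 4: RSD in ppth, from the rounded standard deviation and mean.
rsd_ppth = s_g / mean_g * 1000

print(rounded)                    # [18.7, 18.64, 18.65, 18.61]
print(mean_g, s_g)                # 18.65 0.04
print(f"{rsd_ppth:.0f} ppth")     # 2 ppth
```

Python's built-in `round` works on binary floating-point values, so results for numbers that end exactly in a 5 at the rounding position may differ from hand rounding; none of the values in this example are affected.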
Example: Five determinations of percent iron in iron ore yielded the results 67.48, 67.37, 67.47, 67.43
and 67.40%. Calculate the average deviation, the standard deviation, and the relative standard
deviation.
Trial    Percentage, %    $|x - \bar{x}|$    $(x - \bar{x})^2$
  1         67.48             0.05              0.0025
  2         67.37             0.06              0.0036
  3         67.47             0.04              0.0016
  4         67.43             0.00              0.0000
  5         67.40             0.03              0.0009
                            Σ = 0.18          Σ = 0.0086
mean = 67.43%

$$\text{Average Deviation} = d = \frac{0.18}{5} = 0.04\%$$
(Note: the same units, decimal place as in the data)
$$\text{Standard Deviation} = s = \sqrt{\frac{0.0086}{4}} = 0.05\%$$
(Note: the same units, decimal place as in the data)
$$\text{Relative Standard Deviation in ppth} = \frac{0.05}{67.43} \times 1000 = 0.7 \text{ ppth, less than 1 ppth.}$$
Practice Problem 1: Use your calculator or other method to find the mean, the standard deviation, and
the relative standard deviation (RSD) for the following set of mass measurements: 2.13 kg, 2.15 kg, 2.15
kg, 2.17 kg, 2.09 kg, 2.12 kg, 2.17 kg, 2.09 kg, 2.11 kg, 2.12 kg. (Assume all values are to be retained.)
Practice Problem 2: For the numbers 116.0, 97.9, 114.2, 106.8 and 108.3, find the mean, the standard
deviation, the range, and the RSD. Using the Q-test, determine if the value 97.9 should be discarded
at the 95% confidence level.
Answers:
Practice Problem 1: n = 10, $\bar{x}$ = 2.13 kg, $s_{n-1}$ = 0.03, RSD = 13.8 → 10 ppth to 1 s.f.
Practice Problem 2: n = 5, $\bar{x}$ = 108.6, $s_{n-1}$ = 7.1, Range = 18.1, RSD = 65 ppth, $Q_{obs}$ = 0.49 < $Q_{table}$ so
value must be kept.
A COMPLETE example:
You have just performed an analysis of chromium in steel samples by spectrophotometry, and you
have calculated the following 5 values:
Trial:    % Cr
  1      16.237
  2      16.251
  3      16.233
  4      16.361
  5      16.239
You will be reporting percent chromium on your data report sheet, so you will need to round these
values to ppth precision. To determine the correct decimal place to round, we consider the following
ratios:
$$x = \frac{\pm 0.1}{16.2} \qquad \frac{\pm 0.01}{16.24} \qquad \frac{\pm 0.001}{16.237}$$
(this can also be done purely by inspection – since the first number of the measurement is a one, we
need four total digits to be close to “1000”)
Trial:    % Cr
  1      16.24
  2      16.25
  3      16.23
  4      16.36
  5      16.24
Looking over the rounded data, the fourth trial seems “off”. We can use the Q-test and 5-ppth test to
determine if we can reject the data. Note that the data must fail both tests to be rejected.
Q-test (90% confidence)
$$Q = \frac{\text{difference}}{\text{range}} = \frac{16.36 - 16.25}{16.36 - 16.23} = \frac{0.11}{0.13} = 0.846 > 0.642$$
The value of Q90 for 5 data points is 0.642 (from the table in this packet). Because the determined Q
value is greater than Q90, this data point fails the Q test. However, we cannot reject it outright; we must
also perform the 5-ppth test to see if it fails that.
5-ppth test
$$\frac{|\text{value} - \text{mean}|}{\text{mean}} \times 1000 > 5 \text{ ppth} \rightarrow \text{Fails!}$$
For the 5-ppth test, we need to determine the mean of the data, using all 5 data points. The mean is
16.26% Cr.
$$\frac{16.36 - 16.26}{16.26} \times 1000 = 6.2 > 5 \text{ ppth} \rightarrow \text{Fails!}$$
The fourth trial will be rejected, as it fails both the Q-test and the 5-ppth test. The data now look
like this, with the rejected trial marked:
Trial:    % Cr
  1      16.24
  2      16.25
  3      16.23
  4      16.36 (rejected)
  5      16.24
With the remaining four pieces of data, a new mean, standard deviation and RSD will be calculated.
The mean:
$$\bar{x} = \frac{16.24 + 16.25 + 16.23 + 16.24}{4} = 16.24\%$$
The standard deviation (s):
Trial     $x_i$     $(x_i - \bar{x})$     $(x_i - \bar{x})^2$
  1       16.24          0                   0
  2       16.25          0.01                0.0001
  3       16.23         -0.01                0.0001
  4       16.24          0                   0
                              $\sum (x_i - \bar{x})^2$ = 0.0002

$$\text{Std. Deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} = \sqrt{\frac{0.0002}{4-1}} = 0.008165 = 0.01\%$$
The standard deviation is the amount of uncertainty in the last digit of the data points, or the mean.
Since those data are reported to the hundredths place, the standard deviation must be rounded to the
same decimal place to reflect the uncertainty in that decimal place. Also, the standard deviation carries
the same units as the data. So, the standard deviation is 0.01%.
The relative standard deviation (RSD):
$$\text{RSD} = \frac{s}{\bar{x}} \times 1000 = \frac{0.01}{16.24} \times 1000 = 0.6158 = 0.6 \text{ ppth}$$
The RSD is a measure of how large the standard deviation is as compared to the actual data. Most
calculation-based experiments in CHEM 153 will require an RSD no greater than a certain threshold (5
or 10 ppth). Since the RSD is based on the standard deviation, it should have the same number of
significant digits as the standard deviation. In this example, the standard deviation has one sig fig, so
the RSD is rounded to one sig fig as well.
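The whole chromium example, from rounding through rejection to the final statistics, can be strung together in one short script. This is a sketch under the same conventions as the worked example (90% confidence Q values, mean rounded to the hundredths place before the 5-ppth test, rejection only if both tests fail); the helper name `evaluate_suspect` and the variable names are hypothetical.

```python
import statistics

# 90% confidence critical Q values from the rejection quotient table.
Q90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,
       8: 0.468, 9: 0.437, 10: 0.412}


def evaluate_suspect(values, suspect):
    """Return (Q, ppth difference, reject?) for a suspect lowest or highest value."""
    data = sorted(values)
    gap = data[-1] - data[-2] if suspect == data[-1] else data[1] - data[0]
    q = gap / (data[-1] - data[0])
    mean_all = round(statistics.mean(values), 2)  # mean of all trials, as reported
    ppth = abs(suspect - mean_all) / mean_all * 1000
    reject = q > Q90[len(values)] and ppth > 5    # must fail BOTH tests
    return q, ppth, reject


raw = [16.237, 16.251, 16.233, 16.361, 16.239]
cr = [round(x, 2) for x in raw]                   # ppth precision: hundredths place

q, ppth, reject = evaluate_suspect(cr, 16.36)
print(f"Q = {q:.3f}, difference = {ppth:.1f} ppth, reject = {reject}")
# Q = 0.846, difference = 6.2 ppth, reject = True

kept = [x for x in cr if not (reject and x == 16.36)]
mean_cr = round(statistics.mean(kept), 2)
s_cr = round(statistics.stdev(kept), 2)
rsd_ppth = s_cr / mean_cr * 1000
print(f"mean = {mean_cr} %, s = {s_cr} %, RSD = {rsd_ppth:.1f} ppth")
# mean = 16.24 %, s = 0.01 %, RSD = 0.6 ppth
```

The script only automates the arithmetic. Choosing which value looks suspect, and deciding whether the final RSD is acceptable for a given experiment, remain judgment calls for the analyst.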
The final step is to report your data using the data report sheet provided, and to input your %Cr values
into the class spreadsheet. For this “experiment”, the data report sheet would appear as follows:
Trial:      1        2        3        4        5
% Cr      16.24    16.25    16.23    16.36    16.24
← All trials are reported, rounded to ppth precision. Trials that are rejected are circled, and calcs for
Q and 5-ppth tests are shown on the back of the data report sheet.

Mean                  16.24%      ← The mean is reported with units!
Standard Deviation    0.01%       ← S has the same units, decimal place as the mean.
RSD                   0.6 ppth    ← The RSD has the same number of sig figs as S.