Ch. 4: Statistics
Outline:
• 4-1 Gaussian Distribution
• 4-2 Confidence Intervals
• 4-3 Means and Studentʼs t-test
• 4-4 Standard Deviations and the F-test
• 4-5 Excel exercise: t tests with a spreadsheet
• 4-6 Grubbs Test for an outlier
• 4-7 Method of Least Squares
• 4-8 Calibration Curves
• 4-9 Excel exercise: least squares on a spreadsheet
Updated Oct. 7, 2011: minor fixes to slides 16, 36
Gaussian Distribution
If an experiment is done a large number of times, the resulting values tend to cluster around
an average value in a symmetric fashion. Many repetitions result in a Gaussian distribution
or the so-called bell curve.
In practice, in the lab you will be making between 3 and 5 measurements (not thousands);
nonetheless, the parameters that estimate a larger set of data can be easily obtained from
these measurements.
Mean & Standard Deviation
The arithmetic mean (or average) is defined by the sum of the measured values, x_i, divided
by the number of measurements, n:

$$\bar{x} = \frac{\sum_i x_i}{n}$$
The standard deviation gives a measure of how closely the data are clustered about the mean
(i.e., the precision), with a small s corresponding to a tight clustering.

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$
The plot to the right shows the number of bulbs plotted
as a function of lifetime. Though the two sets of bulbs
have the same mean lifetime, it is clear that the set of
bulbs with s = 47.1 h has undergone a more uniform
manufacturing process than the set with s = 94.2 h.
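As a quick illustration (not part of the original slides), the same quantities can be computed in Python; the statistics moduleʼs stdev uses the n − 1 denominator defined above. The lifetimes below are hypothetical values.

```python
import statistics

# Hypothetical bulb lifetimes in hours
lifetimes = [845.2, 812.9, 901.1, 860.4, 831.7]

mean = statistics.mean(lifetimes)    # arithmetic mean, sum(x_i)/n
s = statistics.stdev(lifetimes)      # sample std dev with n - 1 denominator

print(f"mean = {mean:.1f} h, s = {s:.1f} h")
```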
Mean & Standard Deviation, 2
If the dataset is infinite (i.e., n = ∞), then the mean and standard deviation are designated by
the Greek letters μ and σ. Of course, it is not possible in practice to measure these values;
however, as the number of measurements increases, the values of x̄ and s approach the
values of μ and σ.
Other important statistical values:
The quantity n - 1 in the denominator of s is referred to as the degrees of freedom.
The square of the standard deviation, s², is known as the variance.
Finally, if the standard deviation is expressed as a percentage of the mean, this is called the
coefficient of variation or the relative standard deviation.
$$\frac{s}{\bar{x}} \times 100\%$$
Significant figures: Experimental results are expressed in the form x̄ ± s (n = _), where n is
the number of data points; e.g., 823 ± 30 (n = 4) or 8.2 (±0.3) × 10² (n = 4) indicates that the
mean has just two significant figures.
Probability
The Gaussian curve is described by the formula

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}$$

where e is the base of the natural logarithm, and μ and σ
are approximated by x̄ and s.
The graph of this equation (with μ = 0 and σ = 1)
has a maximum value of y when x = μ, and the
curve is symmetric about this value.
Deviations from the mean are often expressed as
multiples of the standard deviation, z:

$$z = \frac{x - \mu}{\sigma} \approx \frac{x - \bar{x}}{s}$$
i.e., when z = +1, x is one standard deviation
above the mean, and when z = −2, x is two
standard deviations below the mean.
A Gaussian curve in which μ = 0 and σ = 1. A Gaussian
curve whose area is unity is called a normal error curve. The
abscissa z = (x − μ)/σ is the distance away from the mean,
measured in units of the standard deviation. When z = 2,
we are two standard deviations away from the mean.
Probability, 2
The probability of measuring z in a certain range is equal to the area of that range.
e.g., the probability of observing z between −2 and −1 is 0.136 (corresponds to the shaded
area on the previous slide). The area under each portion of the curve is shown below:
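As an aside (not in the original slides), these areas can be checked numerically; a minimal sketch using the standard normal cumulative distribution function from SciPy:

```python
from scipy.stats import norm

# Probability of observing z between -2 and -1 on a standard normal curve
p = norm.cdf(-1) - norm.cdf(-2)
print(f"P(-2 < z < -1) = {p:.3f}")   # ≈ 0.136, the shaded area
```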
Probability, 3
The sum of all probabilities must be equal to unity (1 or 100%); hence, the area under the
curve that goes from z = -∞ to z = +∞ must also be equal to 1. The factor 1/(σ√2π) is called
the normalization factor, and ensures that this is the case.
The standard deviation is therefore a measure of the width of the curve; the larger the
standard deviation, the broader the curve, and the less the precision.
Interestingly, for any Gaussian curve (whatever the nature of the data comprising it),
68.3% of the area lies in the range of 1 standard deviation (i.e., between μ − 1σ and μ + 1σ),
meaning that more than two thirds of the data are expected to lie within 1 standard
deviation of the mean.
Standard Deviation of the Mean
To measure the mean lifetime of a collection of light bulbs, we could measure each oneʼs
lifetime and compute the average. Alternatively, we could select four at a time, measure the
lifetime of each, and compute the average lifetime of the four (and repeat this for many
sets of four). From these data, we compute μ and σ₄ (the subscript 4 indicates sets of 4).
The means that are calculated from
both of these methods work out to be
the same, but the standard deviations
are different. In this case, σ4 = σ/√4.
Thus, σ4 is the standard deviation of
the mean for sets of 4 samples. The
standard deviation of the mean for sets
of n samples is expressed as:
$$\sigma_n = \frac{\sigma}{\sqrt{n}}$$
The more measurements that are made, the higher the confidence that the average is close
to the real mean. Uncertainty decreases proportional to 1/√n, where n is the number of
measurements. (e.g., uncertainty decreases by 10x with 100 measurements!)
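To make the σ/√n behaviour concrete, here is an illustrative simulation (not from the slides; the population parameters are hypothetical): values drawn from a Gaussian population are averaged in sets of 4, and the spread of those means is compared with σ/√4.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1000.0, 94.2                 # hypothetical population parameters

values = rng.normal(mu, sigma, size=40_000)
means_of_4 = values.reshape(-1, 4).mean(axis=1)   # averages of sets of 4

print(f"std dev of individual values: {values.std(ddof=1):.1f}")
print(f"std dev of means of 4       : {means_of_4.std(ddof=1):.1f}")
print(f"predicted sigma/sqrt(4)     : {sigma / 2:.1f}")
```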
Confidence Intervals
Studentʼs t is used to express confidence intervals and to compare results from different
experiments.
For a limited number of measurements, n, we determine the
sample mean x̄ and the sample standard deviation, s. The
confidence interval is computed from

$$\text{Confidence interval} = \bar{x} \pm \frac{ts}{\sqrt{n}}$$
where t is Studentʼs t, which is selected from the table on the next
slide for a desired level of confidence.
What does the confidence interval tell us? e.g., The 95%
confidence interval would include the true population mean
(unknown value) in 95% of the sets of n measurements.
“Student” was the pseudonym of W. S.
Gosset, whose employer, the Guinness
breweries of Ireland, restricted publications
for proprietary reasons. Because of the
importance of Gosset’s work, he was
allowed to publish it (Biometrika 1908, 6,
1), but under an assumed name.
Confidence Intervals
Pick a Studentʼs t value by selecting a confidence level, and determining the number of
degrees of freedom (e.g., if n = 21, n - 1 = 20, CL = 98%, then t = 2.528).
Confidence Intervals and Excel
Example: The carbohydrate content of a glycoprotein (a protein with sugars attached to it) is
found to be 12.6, 11.9, 13.0, 12.7, and 12.5 wt% (g carbohydrate/100 g glycoprotein) in
replicate analyses. Find the 50% and 90% confidence intervals for the carbohydrate content.
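A short script (an illustration alongside the Excel approach) reproduces this calculation; note that t.ppf takes the two-tailed quantile (1 + CL)/2:

```python
import statistics
from math import sqrt
from scipy.stats import t

data = [12.6, 11.9, 13.0, 12.7, 12.5]   # wt% carbohydrate
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)

for cl in (0.50, 0.90):
    t_val = t.ppf((1 + cl) / 2, df=n - 1)   # two-tailed Student's t
    ci = t_val * s / sqrt(n)
    print(f"{cl:.0%} CI: {xbar:.2f} ± {ci:.2f} wt%")
```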
Meaning of Confidence Intervals
A computer chose numbers at random from a Gaussian population with a population mean
(μ) of 10 000 and a population standard deviation (σ) of 1 000.
Trial 1: 4 numbers were chosen, and x̄ and s were calculated; with CI = 50%, 3 DOF, t = 0.765, x̄ = 9526 (leftmost bar).
The experiment was repeated for 100 trials, so 50 of the intervals (50% for an infinite number of experiments) should include
the true population mean of 10 000 (in fact, in this test, there are 45 white blocks).
[Figure legend: white blocks include the population mean; shaded blocks do not.]
The same experiment was conducted with CI = 90% (which for an infinite number of
experiments should have 90% of the intervals including the true population mean);
89 were found to include the true population mean.
SD, CI and Experimental Uncertainty
Suppose you measure the volume of a vessel five times and observe values of 6.375, 6.372,
6.374, 6.377, and 6.375 mL.
For 5 measurements:
DOF = 4, CL = 95%, x̄ = 6.3746 mL, s = 0.0018 mL
Studentʼs t = 2.776, CI = ±0.0023 mL

For 21 measurements (reduced uncertainty):
DOF = 20, CL = 95%, x̄ = 6.3746 mL, s = 0.0018 mL
Studentʼs t = 2.086, CI = ±0.0008 mL
Comparison of means
If two sets of measurements are made on the same sample, the mean from one set will
generally not be equal to the mean from the other set (small random errors!).
We can use the t test to compare the mean values to decide if there is a statistically
significant difference between them (i.e., do they agree within experimental error?).
The null hypothesis states that the mean
values from two sets of measurements are
not different. Statistics gives us a probability
that the observed difference between two
means arises from random measurement
error. We reject the null hypothesis if there is
less than a 5% chance that the observed
difference arises from random variations (i.e.,
there is a 95% chance that our conclusion is
correct; equivalently, 1 time out of 20 when we
conclude that two means are not different, we
will be wrong).
In the field of statistics, the null hypothesis is assumed to be
true. Unless you find strong evidence that it is not true, you
continue to believe that it is true. In the U.S. legal system, the
null hypothesis is that the accused person is innocent. It is up
to the prosecution to produce compelling evidence that the
accused person is not innocent; failing that, the jury must
acquit the defendant.
Comparison of means, 2
Case 1: Comparing a measured result with a “known” value
A quantity is measured several times, and the mean and standard deviation are obtained. We
need to compare our answer with an accepted answer. The average is not exactly the same as
the accepted answer. Does our measured answer agree with the accepted answer "within
experimental error"?
Case 2: Comparing replicate measurements
A quantity is measured multiple times by two different methods that give two different
answers, each with its own standard deviation. Do the two results agree with each other
“within experimental error”?
Case 3: Paired t test for comparing individual differences
Sample A is measured once by method 1 and once by method 2; the two measurements do
not give exactly the same result. Sample B is measured once by method 1 and once by
method 2; again, the results are not exactly equal. The procedure is repeated for n
different samples. Do the two methods agree with each other "within experimental error"?
Case 1
Case 1: Comparing a measured result with a “known” value
You purchased a Standard Reference Material coal sample certified by the National Institute
of Standards and Technology to contain 3.19 wt% sulphur. You are testing a new analytical
method to see whether it can reproduce the known value. The measured values are 3.29,
3.22, 3.30, and 3.23 wt% sulphur, with x̄ = 3.260 and s = 0.041.
Does your answer agree with the known answer? To find out, compute the 95% confidence
interval for your answer and see if that range includes the known answer. If the known
answer is not within your 95% confidence interval, then the results do not agree.
$$95\%\ \text{confidence interval} = \bar{x} \pm \frac{ts}{\sqrt{n}} = 3.26_0 \pm \frac{(3.182)(0.04_1)}{\sqrt{4}} = 3.26_0 \pm 0.06_5$$

so the 95% confidence interval = $3.19_5$ to $3.32_5$ wt%
The known answer (3.19 wt%) is just outside the 95% confidence interval; therefore, there is
less than a 5% chance that our method agrees with the known answer. Thus, we conclude
that our method gives a “different” result from the known result. However, in this case, the
95% confidence interval is so close to including the known result that it would be prudent to
make more measurements before concluding that our new method is not accurate.
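For comparison (an illustrative sketch, not part of the original slides), SciPyʼs one-sample t test reaches the same verdict directly from the p-value:

```python
from scipy.stats import ttest_1samp

measured = [3.29, 3.22, 3.30, 3.23]   # wt% sulphur
known = 3.19                          # certified value

result = ttest_1samp(measured, popmean=known)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# p < 0.05, so the measured mean differs from the certified
# value at the 95% confidence level (but only just)
```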
Case 2
Case 2: Comparing replicate measurements
Lord Rayleigh (John W. Strutt) discovered the element Argon (Nobel Prize in 1904) during a
time when it was thought that dry air was 1/5 O2 and 4/5 N2.
1. He took air and removed oxygen with hot copper (to make CuO) and collected the
remaining gas and accurately measured its weight and density.
2. He compared this to pure N2 gas from chemical decomposition of nitrous oxide, nitric
oxide and ammonium nitrite.
The average mass collected from air (2.31011 g) was
0.46% greater than the average mass of the same volume
of gas from chemical sources (2.29947 g).
Experiments were done with great care and repeated
many times - Rayleigh understood that the discrepancy
was outside his margin of error, and he postulated that
gas collected from the air was a mixture of N2 with a small
amount of a heavier gas, which turned out to be Ar.
Case 2, 2
We will use the t-test to see if the gas isolated from air is "significantly" different from
nitrogen isolated from chemical sources. There are two sets of measurements, no "known"
value for comparison, and it is assumed that the σ values from the two sets are similar.
Using two sets of data with n1 and n2 measurements, t is calculated with:
$$t_{\text{calc}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\text{pooled}}}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
where spooled is a pooled standard deviation that makes use of both sets of data.
$$s_{\text{pooled}} = \sqrt{\frac{\sum_{\text{set 1}}(x_i - \bar{x}_1)^2 + \sum_{\text{set 2}}(x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}$$
The t_calc is compared to the t from the table for n₁ + n₂ − 2 degrees of freedom. If t_calc is
greater than t_table at CL = 95%, then the results are considered to be different (i.e., there is
less than a 5% chance that the two sets of data have the same population mean).
N.B., There are also sets of equations for cases where the values of σ from each set are not
similar to one another.
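A minimal sketch of the pooled calculation (illustrative only: the slide quotes the two means, so the standard deviations and counts below are hypothetical placeholders):

```python
from math import sqrt
from scipy.stats import t

def t_calc_pooled(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic using the pooled standard deviation."""
    s_pooled = sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    return abs(x1 - x2) / s_pooled * sqrt(n1 * n2 / (n1 + n2))

# Means from the slide; the s and n values are hypothetical
tc = t_calc_pooled(2.31011, 0.0003, 8, 2.29947, 0.0014, 8)
t_table = t.ppf(0.975, df=8 + 8 - 2)   # 95% CL, two-tailed
print(f"t_calc = {tc:.1f}, t_table = {t_table:.3f}")
# t_calc >> t_table: the gas from air differs significantly from pure N2
```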
Case 3
Case 3: Paired t test for comparing individual differences
Two separate methods are used to measure the same values for multiple samples. For this
example, we look at aluminum concentrations in drinking water.
Results are similar, but not identical. Hence, to see if
the difference is significant (i.e., outside of experimental
error), we apply the paired t test. To get started, the differences
between methods are tabulated, and the mean and
S.D. are determined for these differences. Then,

$$t_{\text{calc}} = \frac{|\bar{d}|}{s_d}\sqrt{n} = \frac{2.4_{91}}{6.7_{48}}\sqrt{11} = 1.2_{24}$$
Note that the absolute value of the mean of the
differences is in the numerator, so t_calc > 0.
Since t_calc = 1.224 is less than t_table = 2.228 (for CL =
95% and 10 DOF), there is a more than 5%
chance that the two sets of data lie "within experimental
error" of one another, meaning that the results are not
significantly different (the different methods work equally well!).
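Using the summary statistics quoted on the slide, the same test in Python (an illustrative sketch):

```python
from math import sqrt
from scipy.stats import t

d_bar, s_d, n = 2.491, 6.748, 11    # mean and S.D. of the differences

t_calc = abs(d_bar) / s_d * sqrt(n)
t_table = t.ppf(0.975, df=n - 1)    # 95% CL, two-tailed, 10 DOF
print(f"t_calc = {t_calc:.3f}, t_table = {t_table:.3f}")
# t_calc < t_table: no significant difference between the two methods
```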
Significance: 1- and 2-Tails
The curve below (a) is the t distribution for 3 DOF. If the certified value lies in the outer 5%
of the area under the curve, we reject the null hypothesis and conclude with 95% confidence
that the measured mean is not equivalent to the certified value.
The critical value of t for rejecting the
null hypothesis is 3.182 for 3 degrees
of freedom (see Table 4-2, slide 10).
In (a), 2.5% of the area beneath the
curve lies above t = +3.182 and 2.5%
of the area lies below t = −3.182. We
call this a two-tailed test because we
reject the null hypothesis if the certified
value lies in the low-probability region
on either side of the mean.
We will discuss 2-tailed t-tests for the
most part.
Red Blood Cell Counts
Is todayʼs blood cell count anomalously high? Or, given the set of data, is the count
regularly expected to get up to 5.6 × 10⁶ cells/μL?
$$t_{\text{calc}} = \frac{|\text{todayʼs count} - \bar{x}|}{s}\sqrt{n} = \frac{|5.6 - 5.16|}{0.23}\sqrt{5} = 4.28$$
In Table 4-2 (slide 10), looking across the row for
4 degrees of freedom, we see that 4.28 lies
between the 98% (t = 3.747) and 99% (t = 4.604)
confidence levels. Todayʼs red cell count lies in
the upper tail of the curve containing less than
2% of the area of the curve. There is less than a
2% probability of observing a count of 5.6 × 106
cells/μL on “normal” days. It is reasonable to
conclude that todayʼs count is elevated.
F test and Standard Deviations
The F test tells us whether two standard deviations are “significantly” different from each
other. F is the quotient of the squares of the standard deviations:
$$F_{\text{calc}} = \frac{s_1^2}{s_2^2}$$
The larger standard deviation is placed in the numerator so that F ≥ 1. The hypothesis that
s₁ > s₂ is tested using the one-tailed F test in the table on the next slide. If F_calculated > F_table,
then the difference is significant, and different formulae (from those on slide 18) must be
used.
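A hedged sketch of the F test in Python (the standard deviations and counts below are hypothetical; scipy.stats.f supplies the one-tailed critical value):

```python
from scipy.stats import f

# Hypothetical data; the larger standard deviation goes on top so F >= 1
s1, n1 = 0.047, 7
s2, n2 = 0.025, 6

F_calc = s1**2 / s2**2
F_table = f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1)   # one-tailed, 95% CL

print(f"F_calc = {F_calc:.2f}, F_table = {F_table:.2f}")
if F_calc > F_table:
    print("standard deviations differ significantly (95% CL)")
else:
    print("no significant difference between the standard deviations")
```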
F test and Standard Deviations, 2
[Table of critical F values at the 95% confidence level.]
Grubbs Test for Outliers
Students dissolved zinc from a galvanized nail and measured the mass lost by the nail to tell
how much of the nail was zinc. Here are 12 results:
Mass loss (%): 10.2, 10.8, 11.6, 9.9, 9.4, 7.8 (possible outlier), 10.0, 9.2, 11.3, 9.5, 10.6, 11.6
Question: Should we discard or retain this result?
1. Compute the average and standard deviation:
x̄ = 10.16, s = 1.11
2. Calculate the Grubbs statistic:

$$G_{\text{calculated}} = \frac{|\text{questionable value} - \bar{x}|}{s}$$

3. If G_calculated > G_table, then discard the point.
Here G_calculated = 2.13 and G_table (n = 12) = 2.285, so the
point should be retained; there is a more than 5%
chance that the value is a member of the same population
as the other values. But... use common sense!
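The whole test in Python (an illustrative sketch; the critical value G_table is taken from the slide, as SciPy does not ship a Grubbs table):

```python
import statistics

masses = [10.2, 10.8, 11.6, 9.9, 9.4, 7.8, 10.0, 9.2, 11.3, 9.5, 10.6, 11.6]

xbar = statistics.mean(masses)
s = statistics.stdev(masses)
questionable = min(masses)       # 7.8, the value farthest from the mean

G_calc = abs(questionable - xbar) / s
G_table = 2.285                  # critical value for n = 12 (from the slide)
print(f"G_calc = {G_calc:.2f}, G_table = {G_table}")
# G_calc < G_table, so the point is retained
```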
Method of Least Squares
For most chemical analyses, the response of
the procedure must be evaluated for known
quantities of analyte (called standards) so that
the response to an unknown quantity can be
interpreted. We prepare a calibration curve,
which ideally is linear in the region of interest.
The method of least squares is used to predict
the “best” straight line through a dataset,
though some points of course will scatter from
the line.
In this section, we learn to estimate the
uncertainty in a chemical analysis from the
uncertainties in the calibration curve and in the
measured response to replicate samples of
unknown.
Calibration curves for analysis of caffeine and
theobromine content (see lecture 0). The black
points are from the standards, and the blue points
are from the unknown.
Method of Least Squares, 2
To use this procedure, we assume:
1. The uncertainties in y are greater
than those of x, which is typically the
case in analytical chemistry (i.e.,
response of an instrument vs. weight/
volume).
2. The uncertainties (std. dev.) in all of
the y values are similar.
The Gaussian curve drawn over the point (3,3) is a schematic
indication of the fact that each value of yi is normally distributed
about the straight line. That is, the most probable value of y will
fall on the line, but there is a finite probability of measuring y
some distance from the line.
Method of Least Squares, 3
The equation of a straight line is

$$y = mx + b$$

The vertical deviation for the point (x_i, y_i) is y_i − y, where y
is the ordinate of the line when x = x_i:

$$\text{vertical deviation} = d_i = y_i - y = y_i - (mx_i + b)$$

Since the deviations about the line have equal
chances of being positive or negative, we wish to
minimize the magnitude of the deviations (irrespective of
sign), so we take their squares (all positive numbers):

$$d_i^2 = (y_i - y)^2 = (y_i - mx_i - b)^2$$
Hence, since we are minimizing the squares of the deviations, this technique is known as
the method of least squares.
Method of Least Squares, 4
We omit the calculus used to derive this method, and express the final solutions in terms of
determinants, which are defined as:
$$\begin{vmatrix} e & f \\ g & h \end{vmatrix} = eh - fg; \qquad \text{e.g.,}\quad \begin{vmatrix} 6 & 5 \\ 4 & 3 \end{vmatrix} = (6 \times 3) - (5 \times 4) = -2$$
4 3
The slope and the intercept of the “best” straight line for a least squares fit are:
$$m = \begin{vmatrix} \sum(x_i y_i) & \sum x_i \\ \sum y_i & n \end{vmatrix} \div D \qquad \text{and} \qquad b = \begin{vmatrix} \sum(x_i^2) & \sum(x_i y_i) \\ \sum x_i & \sum y_i \end{vmatrix} \div D$$

where n is the number of points and D is

$$D = \begin{vmatrix} \sum(x_i^2) & \sum x_i \\ \sum x_i & n \end{vmatrix}$$
From the manual calculation, y = 0.61538x + 1.34615.
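As a check (an illustrative sketch; the four points below are an assumed example dataset, chosen because they reproduce the quoted line exactly), the determinant formulas can be evaluated directly:

```python
import numpy as np

# Assumed example dataset; reproduces y = 0.61538x + 1.34615
x = np.array([1.0, 3.0, 4.0, 6.0])
y = np.array([2.0, 3.0, 4.0, 5.0])
n = len(x)

D = np.linalg.det([[np.sum(x**2), np.sum(x)],
                   [np.sum(x),    n        ]])
m = np.linalg.det([[np.sum(x*y), np.sum(x)],
                   [np.sum(y),   n        ]]) / D
b = np.linalg.det([[np.sum(x**2), np.sum(x*y)],
                   [np.sum(x),    np.sum(y) ]]) / D

print(f"y = {m:.5f}x + {b:.5f}")
```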
Method of Least Squares, 5
The population standard deviation of all y values, σ_y, characterizes the little Gaussian
curve on the linear regression plot on slide 26, and depends upon the uncertainties in
m and b. We estimate it by calculating the standard deviation, s_y, for all measured values
of y, where the deviation of each y_i from the centre of its Gaussian curve is
d_i = y_i − y = y_i − (mx_i + b):

$$\sigma_y \approx s_y = \sqrt{\frac{\sum (d_i - \bar{d})^2}{\text{degrees of freedom}}}$$

where $\bar{d}$ is the average deviation, which is equal to 0 for a straight line. This means that
the numerator above can simply be expressed as $\sum(d_i^2)$. The degrees of freedom are the
number of independent pieces of information (n − 1 when only the average is determined
from the data, or n − 2 when both the slope and intercept are determined).
The uncertainty associated with the y values, and the standard deviations of m and b, are:

$$s_y = \sqrt{\frac{\sum(d_i^2)}{n - 2}}; \qquad s_m^2 = \frac{s_y^2\,n}{D}; \qquad s_b^2 = \frac{s_y^2 \sum(x_i^2)}{D}$$
The first decimal place of the standard deviation is the last significant figure of the slope or
intercept.
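Continuing the sketch above (same assumed four-point dataset), the uncertainties follow directly from these formulas:

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 6.0])      # assumed example dataset
y = np.array([2.0, 3.0, 4.0, 5.0])
n = len(x)

D = n * np.sum(x**2) - np.sum(x)**2     # same D as the determinant above
m = (n * np.sum(x*y) - np.sum(x) * np.sum(y)) / D
b = (np.sum(x**2) * np.sum(y) - np.sum(x*y) * np.sum(x)) / D

d = y - (m * x + b)                     # vertical deviations d_i
s_y = np.sqrt(np.sum(d**2) / (n - 2))   # n - 2 degrees of freedom
s_m = np.sqrt(s_y**2 * n / D)
s_b = np.sqrt(s_y**2 * np.sum(x**2) / D)

print(f"m = {m:.3f} ± {s_m:.3f}, b = {b:.3f} ± {s_b:.3f}, s_y = {s_y:.3f}")
```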
Method of Least Squares, 6
As you are probably already aware, it is trivial to perform least squares analyses using the
Microsoft Excel program. For simple averages and intercepts, the AVERAGE and
INTERCEPT commands are used. For the standard deviations of the slope and
intercept, the LINEST command is used. LINEST is an array formula (see
instructions below), and is input as LINEST(y values, x values, TRUE, TRUE), where the
values are expressed in the usual way (as a range of cells, e.g., B2:B5, etc.).
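Outside of Excel (an aside, not in the slides), SciPyʼs linregress returns the same fit parameters and their standard errors in one call:

```python
from scipy.stats import linregress

x = [1.0, 3.0, 4.0, 6.0]   # same assumed example dataset
y = [2.0, 3.0, 4.0, 5.0]

fit = linregress(x, y)
print(f"slope     = {fit.slope:.5f} ± {fit.stderr:.5f}")
print(f"intercept = {fit.intercept:.5f} ± {fit.intercept_stderr:.5f}")
```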
Calibration Curves
Consider the set of data below, from spectrophotometric measurements of absorbance of
light by aqueous protein samples (absorbance is proportional to protein concentration).
The corrected absorbance is the measured absorbance
minus the value for the blank sample.
This dataset involves known concentrations of analyte
(as well as a blank), and can be used to construct a
calibration curve. Note the outlier marked in
parentheses. Based on the curve on the right, this
point (0.392) can be omitted from further calculations.
In addition, the average absorbance value for the
25.0 μg sample seems a bit low; however, repetition of
this analysis shows that this always happens, and the
original data are good.
Calibration Curves, 2
Constructing a calibration curve:
Step 1: Prepare known samples of
analyte covering a range of
concentrations expected for unknowns.
Measure the response of the analytical
procedure to these standards.
Step 2: Subtract the average
absorbance of the blank samples
(0.0993) from each measured
absorbance to obtain corrected
absorbance.
Step 3: Make a graph of corrected
absorbance versus quantity of protein
analyzed and use the least-squares
procedure to find the best straight line
through the linear portion of the data (up
to and including 20.0 μg of protein).
The equation of the solid straight line fitting the 14 data points (open
circles) from 0 to 20 μg, derived by the method of least squares, is y =
0.01630 (±0.00022)x + 0.0047 (±0.0026) with sy = 0.0059. The equation
of the dashed quadratic curve that fits all 17 data points from 0 to 25 μg,
determined by a nonlinear least-squares procedure, is y = −1.17 (±0.21) ×
10⁻⁴x² + 0.01858 (±0.00046)x − 0.0007 (±0.0010) with s_y = 0.0046.
$$y(\pm s_y) = [m(\pm s_m)]\,x + [b(\pm s_b)]$$
Calibration Curves, 3
Calibration Curves, 4
The linear range of an analytical method is the
range over which the response to analyte
concentration is linear, whereas the dynamic
range simply describes the region over which
there is a response - even if it is not linear. In
this latter case, one may use non-linear
regression methods (see text) to fit curves that
deviate from linearity.
Finally, the uncertainty in x (s_x) can be calculated from this unwieldy equation:

$$s_x = \frac{s_y}{|m|}\sqrt{\frac{1}{k} + \frac{1}{n} + \frac{(y - \bar{y})^2}{m^2\sum(x_i - \bar{x})^2}}$$

where |m| is the absolute value of the slope, k is the number of replicate measurements of
the unknown, n is the number of data points for the calibration line, and x̄ and ȳ
have their usual meanings.
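A sketch of this calculation (illustrative; the calibration points are the assumed example set from earlier, which happens to reproduce the x = 2.2325 ± 0.3735 quoted in the Excel example below for a single measurement of y = 2.72):

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 6.0])      # assumed calibration data
y = np.array([2.0, 3.0, 4.0, 5.0])
m, b, s_y = 0.61538, 1.34615, 0.19612   # fit parameters from earlier
k = 1                                   # replicate measurements of the unknown
y_unknown = 2.72                        # measured response of the unknown

s_x = (s_y / abs(m)) * np.sqrt(1/k + 1/len(x)
                               + (y_unknown - y.mean())**2
                                 / (m**2 * np.sum((x - x.mean())**2)))
x_unknown = (y_unknown - b) / m
print(f"x = {x_unknown:.4f} ± {s_x:.4f}")   # 2.2325 ± 0.3735
```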
Good practice
1. Always make a graph of your data. The graph gives you an opportunity to reject bad
data or the stimulus to repeat a measurement or decide that a straight line is not an
appropriate function.
2. Do not extrapolate any calibration curve, linear or nonlinear, beyond the measured
range of standards.
3. At least six calibration concentrations and two replicate measurements of
unknown are recommended. The most rigorous procedure is to make each calibration
solution independently from a certified material.
4. Avoid serial dilution of a single stock solution. Serial dilution propagates any
systematic error in the stock solution.
5. Measure calibration solutions in random order, not in consecutive order by increasing
concentration.
Least squares and Excel
95% confidence interval for x (DOF = 2):

$$x \pm t s_x = 2.2325 \pm (4.303)(0.3735) = 2.2 \pm 1.6$$
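The Studentʼs t factor can be pulled from SciPy rather than the table (a small check, not from the slides):

```python
from scipy.stats import t

x_u, s_x, dof = 2.2325, 0.3735, 2     # values from the slide
t95 = t.ppf(0.975, df=dof)            # 4.303 for 2 DOF at 95% CL
print(f"x = {x_u:.1f} ± {t95 * s_x:.1f}")   # 2.2 ± 1.6
```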