CHE322_F06
CHAPTER4_EAD
CHAPTER 4
EVALUATING ANALYTICAL DATA
A. Characterizing a Measurement and Results
B. Characterizing Experimental Errors
C. Propagation of Uncertainty
D. Distribution of Measurements and Results
E. Statistical Analysis of Data
F. Detection Limits
A. MEASURES OF CENTRAL TENDENCY AND SPREAD
A.1 ESTIMATORS OF CENTRAL TENDENCY
The mean
Estimator of the central tendency or the true value
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
The median
$X_{med}$ is the middle value when the data are ordered from smallest to largest.
For an odd data set: the single middle value
For an even data set: the average of the two middle values
A.2 ESTIMATORS OF VARIABILITY (scatter)
The range
The range (w) is the difference between the largest and the smallest values in
a data set.
The standard deviation
Absolute standard deviation
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
n - 1 = degrees of freedom = number of independent pieces of information on which a
parameter estimate is based.
Relative standard deviation
$s_r = \frac{s}{\bar{X}}$
Percent relative standard deviation
$\%s_r = \frac{s}{\bar{X}} \times 100$
The variance
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
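The estimators of section A can be checked numerically. The sketch below uses Python's standard `statistics` module on a small data set that is hypothetical (the six masses are made up for illustration and are not course data). The function name `summarize` is also just an illustrative choice.

```python
import statistics

# Hypothetical replicate masses (g); illustrative values only.
data = [3.10, 3.12, 3.15, 3.11, 3.13, 3.17]

def summarize(values):
    """Return the estimators of central tendency and spread from section A."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)              # n - 1 in the denominator
    return {
        "mean": mean,
        "median": statistics.median(values),  # even n: average of the two middle values
        "range": max(values) - min(values),   # w
        "s": s,                               # absolute standard deviation
        "s_r": s / mean,                      # relative standard deviation
        "pct_s_r": 100 * s / mean,            # percent relative standard deviation
        "variance": statistics.variance(values),  # s squared
    }

summary = summarize(data)
for name, value in summary.items():
    print(f"{name}: {value:.5g}")
```

Because this data set has an even number of values, the median is the average of the two middle values after sorting.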
B. CHARACTERIZING EXPERIMENTAL ERRORS
B.1 Accuracy
Absolute error
$E = \bar{X} - \mu$
Percent relative error
$E_r = \frac{\bar{X} - \mu}{\mu} \times 100$
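As a quick numerical illustration of the two error definitions, the sketch below uses the mean and accepted value from the kerosene example in section E.2 (mean 0.116 % S, accepted value 0.123 % S). The function names are illustrative.

```python
def absolute_error(x_bar, mu):
    """E = mean - accepted value (same units as the measurement)."""
    return x_bar - mu

def percent_relative_error(x_bar, mu):
    """E_r = (mean - accepted value) / accepted value x 100."""
    return (x_bar - mu) / mu * 100

x_bar, mu = 0.116, 0.123   # % S, values taken from the example in section E.2
E = absolute_error(x_bar, mu)
E_r = percent_relative_error(x_bar, mu)
print(f"E = {E:.3f} % S, E_r = {E_r:.2f} %")
```

A negative sign indicates that the result is lower than the accepted value.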
Accuracy is determined by
Determinate Errors (systematic errors)
1. Sampling errors
Non representative sample
2. Method errors
$S_{meas} = k C_A + S_{reag}$
Incorrect k used
Incorrect reagent blank measurements
Other interferents
3. Measurement errors
Instruments (loss of calibration)
Equipment (page 59)
4. Personal errors
Example: Dust in molecular absorption spectrophotometry
Constant determinate errors
Can be detected by using different size samples for making a determination
Proportional determinate errors
B.2 PRECISION
Measure of the spread of data about a central value.
Repeatability
Spread of data obtained by one analyst using the same solutions and
equipment during one period of laboratory work.
Reproducibility
Reproducibility involves variations in analysts, laboratories, equipment,
instruments, work periods etc…
Indeterminate errors
Indeterminate errors are random errors, which affect the precision. These
errors cannot be eliminated.
Sources
Sampling process
Sample treatment
Measurement (reading errors, electronic noise, stray light etc…)
Evaluating/ Identifying Sources of Indeterminate errors
Examples
Make several determinations of a single sample/ item.
Obtain measurements of several samples of the same 'composition'.
C. PROPAGATION OF UNCERTAINTY
Error = difference between a single measurement or result and the true value
The uncertainty is the range of possible values that a measurement or result
may have. It includes all errors, determinate and indeterminate.
C.1 Uncertainty on the Result of Additions and/or Subtractions
$R = A + B - C$
$s_R = \sqrt{s_A^2 + s_B^2 + s_C^2}$
C.2 Uncertainty on the Result of Multiplications and Divisions
$R = \frac{A \times B}{C}$
$\frac{s_R}{R} = \sqrt{\left(\frac{s_A}{A}\right)^2 + \left(\frac{s_B}{B}\right)^2 + \left(\frac{s_C}{C}\right)^2}$
C.3 Uncertainty of Mixed Operations
Examples 4.7
Quiz 4.8
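A mixed operation can be handled one step at a time: absolute uncertainties combine for the addition or subtraction step, then relative uncertainties combine for the multiplication or division step. The sketch below assumes a hypothetical calculation R = (A - B) / C with made-up values. The helper names `s_add_sub` and `s_mul_div` are illustrative.

```python
import math

def s_add_sub(*abs_uncertainties):
    """C.1: absolute uncertainty of a sum or difference."""
    return math.sqrt(sum(s ** 2 for s in abs_uncertainties))

def s_mul_div(result, *pairs):
    """C.2: absolute uncertainty of a product or quotient.
    Each pair is (value, absolute uncertainty of that value)."""
    relative = math.sqrt(sum((s / v) ** 2 for v, s in pairs))
    return abs(result) * relative

# Hypothetical values for R = (A - B) / C
A, s_A = 10.0, 0.3
B, s_B = 4.0, 0.4
C, s_C = 2.0, 0.1

numerator = A - B
s_numerator = s_add_sub(s_A, s_B)                        # step 1: subtraction
R = numerator / C
s_R = s_mul_div(R, (numerator, s_numerator), (C, s_C))   # step 2: division
print(f"R = {R:.2f} +/- {s_R:.2f}")
```

Comparing the squared terms inside each square root shows which measurement contributes most to the overall uncertainty.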
C.4 Uncertainty for Other Mathematical Functions
FUNCTION           UNCERTAINTY
$R = \ln(A)$       $s_R = \frac{s_A}{A}$
$R = \log(A)$      $s_R = 0.4343 \times \frac{s_A}{A}$
$R = e^A$          $\frac{s_R}{R} = s_A$
$R = 10^A$         $\frac{s_R}{R} = 2.303 \times s_A$
$R = A^k$          $\frac{s_R}{R} = k \frac{s_A}{A}$
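The rules in the table can be written as small functions. This is a sketch only: the function names are illustrative, and the constants 0.4343 and 2.303 are computed as 1/ln(10) and ln(10) rather than typed in. The pH value in the usage lines is hypothetical.

```python
import math

def s_ln(A, s_A):
    """R = ln(A): s_R = s_A / A."""
    return s_A / A

def s_log10(A, s_A):
    """R = log(A): s_R = 0.4343 * s_A / A, where 0.4343 = 1 / ln(10)."""
    return s_A / (A * math.log(10))

def s_exp(A, s_A):
    """R = e^A: s_R / R = s_A."""
    return math.exp(A) * s_A

def s_pow10(A, s_A):
    """R = 10^A: s_R / R = 2.303 * s_A, where 2.303 = ln(10)."""
    return 10 ** A * math.log(10) * s_A

def s_power(A, s_A, k):
    """R = A^k: s_R / R = k * s_A / A (magnitudes are used here)."""
    return abs(A ** k) * abs(k) * s_A / abs(A)

# Hypothetical use: [H+] = 10^(-pH) for pH = 3.72 with s_pH = 0.03
pH, s_pH = 3.72, 0.03
conc = 10 ** (-pH)
s_conc = s_pow10(-pH, s_pH)
print(f"[H+] = {conc:.2e} +/- {s_conc:.1e} M")
```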
Calculations of the propagation of uncertainty are used for the following
purposes:
1) to compare the expected uncertainty of an analysis with the actual
uncertainty obtained
2) to determine major and minor contributions to the overall uncertainty
3) to compare two or more methods
4) to develop the best procedure for preparing a sample
D. DISTRIBUTION OF MEASUREMENTS AND RESULTS
Replicate 1: measurement 1, measurement 2, measurement 3 -> mean 1
Replicate 2: measurement 1, measurement 2, measurement 3 -> mean 2
Replicate 3: measurement 1, measurement 2, measurement 3 -> mean 3
Replicate 4: measurement 1, measurement 2, measurement 3 -> mean 4
Mean
Presentation of results
Two students determine the concentration of a solution of NaOH by titrating
several aliquots of a single stock solution.
            Student 1/ Sample 1    Student 2/ Sample 2
Aliquot     NaOH (M)               NaOH (M)
1           0.1007                 0.1005
2           0.1010                 0.1010
3           0.1011                 0.1002
4           0.1013                 0.1004
5           0.1005                 0.1009
6           0.1009                 0.1003
7           0.1008                 0.1010
$\bar{X}$
s
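The mean and standard deviation rows of the table can be filled in with a few lines of Python. The sketch below uses the standard `statistics` module and the seven results listed for each student. The variable names are illustrative.

```python
import statistics

# NaOH concentrations (M) from the table, aliquots 1 to 7
student_1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
student_2 = [0.1005, 0.1010, 0.1002, 0.1004, 0.1009, 0.1003, 0.1010]

for label, data in (("Student 1", student_1), ("Student 2", student_2)):
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation, n - 1 in the denominator
    print(f"{label}: mean = {x_bar:.4f} M, s = {s:.5f} M")
```

The two samples come from the same stock solution, yet their means and standard deviations differ, which is why a statement about the true concentration has to be a probabilistic one.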
What can you say about the 'True concentration'?
You need to predict
1) true spread of the population
2) true central value
Population
Population refers to all members of a system being investigated.
It is an infinite number of data or a universe of data.
Sample
A sample is a finite number of experimental observations/ measurements.
It is a tiny fraction of the population.
A sample is that part of the population that is collected and analyzed. It is a
subset of the population.
Analysis of the entire population provides the population's true central value
($\mu$) and spread ($\sigma$)
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
$P(V) = \frac{M}{N}$
P(V): the probability of occurrence of V
V: value of interest
M: frequency of occurrence
N: size of population
In experimental sciences, we seldom sample the whole population. Rather a
sample of the population is analyzed.
From properties of the sample to properties of the population
(How do we extend what we know about the sample to the population?)
Requirement
-Need to make assumptions about the distribution of the population
Distributions of samples of chemical systems display trends of well-defined
population distributions.
What are they?
D.3 PROBABILITY DISTRIBUTION
Distribution of a population: Frequency of occurrence versus individual
values
Distribution of data where the members of the population can take any
value, i.e. continuous distribution.
Example: Use data obtained for the calibration of a 10-mL pipet
Generate a histogram of the data
Calculate the mean and the standard deviation
What is the shape/ trend of the distribution around the 'central value'?
Can you predict the distribution of the population from this sample's
distribution?
D.3.1 NORMAL/ GAUSSIAN DISTRIBUTION
Members of the population may take any value
We will first discuss Gaussian statistics of populations; then we will show
how these relationships can be modified and applied to small samples of
data.
Gaussian Distribution Equation
f(X) versus X
$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(X - \mu)^2}{2\sigma^2}\right]$
f(X): frequency of occurrence for a value X
Defined by two parameters only:
$\mu = \frac{\sum_{i=1}^{n} X_i}{n}$ (true mean)
$\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}$ (true population's variance)
Properties of a normal distribution
1) The mean occurs at the point of maximum frequency
2) There is a symmetrical distribution of positive and negative deviations
about the maximum
3) There is an exponential decrease in frequency as the magnitude of
deviations increases
Universal Gaussian curve
Frequency of deviations from the mean versus deviation from the mean in
units of standard deviation ($z = \frac{X - \mu}{\sigma}$)
When $X = \mu$, $z = 0$
Appendix 1A: z deviations versus fraction of population to the right of z
The area under the curve between two limits gives the probability of occurrence
between the two limits
Limits               % population
$\mu \pm 1\sigma$    68.26
$\mu \pm 2\sigma$    95.44
$\mu \pm 3\sigma$    99.73
$\mu \pm 4\sigma$    99.99
$\int_{-\infty}^{+\infty} f(X)\,dX = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(X - \mu)^2}{2\sigma^2}\right] dX$
Let us set $\mu = 0$ and $\sigma = 1$
$\int_{-1}^{+1} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{X^2}{2}\right] dX$
Confidence interval
For $X_i$ taken from the population, we can state that:
$\mu = X_i \pm z\sigma$
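Confidence intervals for various values of z: $\mu \pm z\sigma$
The probability of finding $\mu$ within $\pm z\sigma$ of X
z       Confidence interval (%)    Interval
0.50    38                         $X \pm 0.50\sigma$
1.00    68.26                      $X \pm 1.00\sigma$
1.50    86.64                      $X \pm 1.50\sigma$
1.96    95.00                      $X \pm 1.96\sigma$
2.50    98.76                      $X \pm 2.50\sigma$
3.00    99.73                      $X \pm 3.00\sigma$
3.50    99.95                      $X \pm 3.50\sigma$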
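The percentages in this table are areas under the normal curve between -z and +z, and they can be reproduced with the error function from Python's standard `math` module. In this sketch the function name `coverage` is illustrative. The computed values agree with the table to within 0.01 percentage point, with small differences due to rounding in the table.

```python
import math

def coverage(z):
    """Fraction of a normal population lying within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

for z in (0.5, 1.0, 1.5, 1.96, 2.5, 3.0, 3.5):
    print(f"z = {z:4.2f}: {100 * coverage(z):.2f} %")
```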
D.3.2 What if a mean is obtained from a sample of the population
of known standard deviation?
Confidence intervals in the case of a sample of n measurements and a known
population $\sigma$
$\mu = \bar{X} \pm \frac{z\sigma}{\sqrt{n}}$
Examples
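One possible worked example, with hypothetical numbers: four measurements average 10.00 and the population standard deviation is known to be 0.20. The sketch below computes the 95 % confidence interval (z = 1.96). The function name and the input values are assumptions made for illustration.

```python
import math

def ci_known_sigma(x_bar, sigma, n, z=1.96):
    """Confidence interval for mu when the population sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical values: mean of n = 4 results, known sigma
low, high = ci_known_sigma(x_bar=10.00, sigma=0.20, n=4)
print(f"95 % confidence interval: {low:.3f} to {high:.3f}")
```

Quadrupling the number of measurements halves the width of the interval, because the width scales with one over the square root of n.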
D.3.3 PROBABILITY DISTRIBUTIONS FOR SAMPLES
In experimental sciences, we seldom know the parameters of the population.
Therefore we must make assumptions about the population distribution or
predict the distribution.
Measurements on a large sample can be used to verify the distribution trend.
Let us do that using data on the calibration of a pipet
Replicate data on the Calibration of a 10-mL Pipet
a) Construct histogram
b) Calculate the mean and the standard deviation
Central limit theorem
The distribution of measurements is normal when all errors are random,
independent of each other and of similar magnitude.
Then, the sample mean is a good estimate of the population mean, and the
sample variance is a good estimate of the population variance.
Therefore, we can
c) Generate a Gaussian curve using the mean and the standard deviation
calculated
Estimating the true mean ($\mu$) and the true standard deviation ($\sigma$)
Analysis of a large number of samples will yield the true mean and standard
deviation. When the sample size is 50 (> 20), the sample mean and the
sample standard deviation approach $\mu$ and $\sigma$, respectively.
Confidence intervals
As we have assumed a Gaussian distribution of the population, we can
determine a range within which the true mean is expected at a given
confidence level.
Can we use z to define intervals?
Recall z was calculated using population parameters
$z = \frac{X - \mu}{\sigma}$
So we use t and $\frac{s}{\sqrt{n}}$ instead of z and $\frac{\sigma}{\sqrt{n}}$
$\mu = \bar{X} \pm \frac{t s}{\sqrt{n}}$
$t \geq z$ at all confidence levels
$\frac{s}{\sqrt{n}} = s_m$: standard error of the mean
s: sample standard deviation
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
n - 1: degrees of freedom (df, $\nu$); the number of independent results
used to compute the standard deviation (when n - 1 deviations
have been computed, the final one is known)
Equivalent form for calculation:
$s = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}}$
Appendix 1B lists values of t for various confidence levels and degrees of
freedom.
How should t vary with sample size?
When n = 50, $t_{95}$ = 2.01
For the population (n = $\infty$), $t_{95}$ = 1.96
Example
Use Pipet data.
What is the 95 % confidence interval for the pipet data?
Mean volume = 9.982 mL
Standard deviation = 0.0056 mL
Number of trials = 50
$\nu$ = 49
$\mu = 9.982 \pm \frac{2.01 \times 0.0056}{\sqrt{50}} = 9.982 \pm 0.0016$ mL
There is a 95 % probability that the pipet's mean volume is between 9.980 and
9.984 mL.
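The arithmetic of the pipet example can be verified with a short script. The mean, standard deviation, number of trials and t value are the ones quoted in the example. The function name `ci_from_t` is illustrative, and t is passed in by hand because it comes from the table in Appendix 1B.

```python
import math

def ci_from_t(x_bar, s, n, t):
    """Confidence interval for mu from a sample: x_bar +/- t * s / sqrt(n)."""
    half_width = t * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width, half_width

low, high, half_width = ci_from_t(x_bar=9.982, s=0.0056, n=50, t=2.01)
print(f"9.982 +/- {half_width:.4f} mL ({low:.3f} to {high:.3f} mL)")
```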
E. Statistical Analysis of Data
We can make definite statements only about the probability that the true
value lies within a given range.
Q: How do we compare two or more samples of results, two or more
analysts' results, or results obtained from two or more methods, over
a long period of time, or from different sources/ subjects?
R: Use statistical tests to determine whether the results are significantly
different at a desired confidence level.
Note that there still remains a probability that the conclusion may be wrong,
because our hypothesis is tested statistically.
E.1 SIGNIFICANCE TESTING/ hypothesis testing
Construct probability distribution curves for each sample of measurements
Use figure on page 82
Q: Can the difference between the samples be explained by indeterminate
error?
R: One can only determine the probability that the difference is significant
Null hypothesis: assumes that the numerical quantities being compared are
equal
E.2 TEST OF SIGNIFICANCE FOR MEANS
Sample mean and population mean
Null hypothesis (H0): the mean of the sample is equal to the mean of the
population
Alternative hypothesis (HA): the mean of the sample is not equal to the
mean of the population
Choose a significance level:
95 % confidence level: the probability that H0 will be correctly retained
The probability that H0 will be incorrectly rejected is $\alpha$ = 0.05 (the significance level)
Confidence level (%) = $(1 - \alpha) \times 100$
Example
A new procedure for the rapid determination of sulfur in kerosenes was
tested on a sample known from its method of preparation to contain 0.123 %
S ($\mu$). The results were %S = 0.112, 0.118, 0.115, and 0.119. Do the data
indicate there is bias in the method?
$\sum X_i = 0.464$
$\bar{X} = 0.116$ % S
$\sum X_i^2 = 0.053854$
s = 0.0032
$|\bar{X} - \mu| = 0.007$ % S
Compute $t_{exp}$ and compare it to the critical t at the desired confidence level
$\mu = \bar{X} \pm \frac{t_{exp}\, s}{\sqrt{n}}$
$t_{exp} = \frac{|\mu - \bar{X}| \sqrt{n}}{s}$
$t_{exp} = \frac{(0.123 - 0.116) \sqrt{4}}{0.0032} = 4.375$
$t(\alpha, \nu) = 3.18$ ($\alpha = 0.05$, $\nu = 3$)
$t_{exp} > t(\alpha, \nu)$: the null hypothesis must be rejected
The probability of rejecting the null hypothesis incorrectly is 0.05.
Type 1 error: null hypothesis is incorrectly rejected
Type 2 error: null hypothesis is incorrectly retained
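The sulfur example can be reproduced directly from the four results. The sketch below uses the standard `statistics` module, and the critical value 3.18 is the tabulated t(0.05, 3) quoted in the example. Carrying the unrounded standard deviation gives a t value of about 4.43 instead of 4.375, but the conclusion is the same.

```python
import math
import statistics

results = [0.112, 0.118, 0.115, 0.119]   # % S, from the example
mu = 0.123                               # % S, known from the method of preparation
t_crit = 3.18                            # t(0.05, 3), from the example

x_bar = statistics.mean(results)
s = statistics.stdev(results)
t_exp = abs(mu - x_bar) * math.sqrt(len(results)) / s
reject_h0 = t_exp > t_crit

print(f"mean = {x_bar:.3f} % S, s = {s:.4f}, t_exp = {t_exp:.2f}")
print("reject H0 (evidence of bias)" if reject_h0 else "retain H0")
```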
E.3 TEST OF SIGNIFICANCE FOR STANDARD DEVIATIONS
A) Are analysis results within statistical control?
Can the difference between the standard deviation of the sample and the
population standard deviation be explained by random error?
Null hypothesis: $s^2 = \sigma^2$
F-test
$F_{exp} = \frac{s^2}{\sigma^2}$ (when $s^2 > \sigma^2$)
$F(\alpha, \nu_{num}, \nu_{den})$
If $F_{exp} > F_{crit}$
reject the null hypothesis
B) Are two variances of two samples significantly different?
$F_{exp} = \frac{s_A^2}{s_B^2}$
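As an illustration of case B, the sketch below computes F_exp for the two students' NaOH results from section D. It assumes the usual convention of placing the larger variance in the numerator so that F_exp is at least 1. The critical value is not computed here; it still has to be read from an F table for the chosen significance level and degrees of freedom.

```python
import statistics

# NaOH concentrations (M) from the table in section D
student_1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
student_2 = [0.1005, 0.1010, 0.1002, 0.1004, 0.1009, 0.1003, 0.1010]

def f_exp(sample_a, sample_b):
    """Ratio of the two sample variances, larger variance in the numerator (assumed convention)."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)

F = f_exp(student_1, student_2)
print(f"F_exp = {F:.3f} (compare with F_crit for 6 and 6 degrees of freedom)")
```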
E.4 COMPARING TWO EXPERIMENTAL MEANS
A) Unpaired data: samples are from the same source
Compare the means of two sets of identical analyses
$t_{exp} = \frac{|\bar{X}_A - \bar{X}_B|}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$    (1)
If the standard deviations are not significantly different, use the pooled
standard deviation and equation (2) to calculate t.
$t_{exp} = \frac{|\bar{X}_A - \bar{X}_B|}{s_{pool} \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}$    (2)
$s_{pool} = \sqrt{\frac{(n_A - 1) s_A^2 + (n_B - 1) s_B^2}{n_A + n_B - 2}}$    (3)
$\nu = n_A + n_B - 2$
$t(\alpha, \nu) = ?$
If the standard deviations are significantly different, use equation (1) to
calculate $t_{exp}$. Calculate the degrees of freedom using equation (4) and round
to the nearest integer.
$\nu = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{\left(s_A^2 / n_A\right)^2}{n_A + 1} + \frac{\left(s_B^2 / n_B\right)^2}{n_B + 1}} - 2$    (4)
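Equations (1) to (3) can be applied to the two students' NaOH results from section D. The sketch below is one way to do so with the standard library. The function names are illustrative, and the critical t value is left to the table in Appendix 1B.

```python
import math
import statistics

# NaOH concentrations (M) from the table in section D
student_1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
student_2 = [0.1005, 0.1010, 0.1002, 0.1004, 0.1009, 0.1003, 0.1010]

def t_separate(a, b):
    """Equation (1): t_exp without pooling the standard deviations."""
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff / math.sqrt(statistics.variance(a) / len(a)
                            + statistics.variance(b) / len(b))

def t_pooled(a, b):
    """Equations (2) and (3): returns (t_exp, s_pool, degrees of freedom)."""
    n_a, n_b = len(a), len(b)
    nu = n_a + n_b - 2
    s_pool = math.sqrt(((n_a - 1) * statistics.variance(a)
                        + (n_b - 1) * statistics.variance(b)) / nu)
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff / (s_pool * math.sqrt(1 / n_a + 1 / n_b)), s_pool, nu

t2, s_pool, nu = t_pooled(student_1, student_2)
t1 = t_separate(student_1, student_2)
print(f"pooled: t_exp = {t2:.3f}, s_pool = {s_pool:.2e} M, nu = {nu}")
print(f"separate variances: t_exp = {t1:.3f}")
```

When both samples have the same size, equations (1) and (2) give the same t value, and only the degrees of freedom differ.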
B) Paired data: samples are from different sources
$t_{exp} = \frac{|\bar{d}| \sqrt{n}}{s_d}$
$d_i$: difference between paired data
$s_d$: standard deviation of the differences
$\bar{d}$: average difference
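A minimal sketch of the paired calculation follows. The two lists are hypothetical results for four different samples, each analyzed by two methods (the values are made up for illustration). The function name `t_paired` is illustrative.

```python
import math
import statistics

def t_paired(first, second):
    """t_exp for paired data: |mean difference| * sqrt(n) / s_d."""
    d = [x - y for x, y in zip(first, second)]   # d_i for each pair
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)
    return abs(d_bar) * math.sqrt(len(d)) / s_d

# Hypothetical results: four different samples, each analyzed by two methods
method_a = [2.5, 3.1, 4.0, 5.2]
method_b = [2.3, 3.0, 3.7, 5.0]

t_exp = t_paired(method_a, method_b)
print(f"t_exp = {t_exp:.2f} with {len(method_a) - 1} degrees of freedom")
```

Working with the differences removes the sample-to-sample variation, which would otherwise swamp the difference between the methods.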
E.5 DETECTING GROSS ERRORS: OUTLIERS TEST/ Q-TEST
Should a measurement be rejected?
A) Outlier is the smallest value ($X_1$)
$Q_{exp} = \frac{X_2 - X_1}{X_n - X_1}$
B) Outlier is the largest value ($X_n$)
$Q_{exp} = \frac{X_n - X_{n-1}}{X_n - X_1}$
Appendix 1D: $Q(\alpha, n)$
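The two Q expressions can be computed together once the data are sorted. The sketch below uses hypothetical data (five made-up volumes with a suspiciously large value). The function name `q_exp` is illustrative, and the critical value still has to be taken from Appendix 1D.

```python
def q_exp(values):
    """Return (Q for the smallest value, Q for the largest value).
    Requires at least three values that are not all identical."""
    x = sorted(values)
    spread = x[-1] - x[0]                 # X_n - X_1
    q_low = (x[1] - x[0]) / spread        # case A: suspect is the smallest value
    q_high = (x[-1] - x[-2]) / spread     # case B: suspect is the largest value
    return q_low, q_high

# Hypothetical replicate volumes (mL); 10.30 is the suspect value
q_low, q_high = q_exp([9.98, 10.30, 10.00, 9.95, 10.01])
print(f"Q_exp (smallest) = {q_low:.3f}, Q_exp (largest) = {q_high:.3f}")
```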
Use caution when the sample is small, such as the three to five determinations
you make in the CHE 322 L laboratory course.
"Those who believe that they can discard observations with statistical
sanction by using statistical rules for rejection of outliers are simply
deluding themselves." J Mandel
F. DETECTION LIMITS
F.1 IUPAC Definition
The detection limit is the smallest concentration or absolute amount of
analyte that has a signal significantly larger than the signal arising from a
reagent blank. (Detectable signal)
This limit is determined by the blank signal / 'background noise' of the
method and the sensitivity of the method.
H0: no analyte in blank
$(S_A)_{DL} = S_{reag} + z\sigma_{reag}$
$(S_A)_{DL} = S_{reag} + t s_{reag}$
$\sigma_{reag}$: known standard deviation for the reagent blank's signal
$s_{reag}$: standard deviation determined for a reagent blank's signal
t: for one-tailed analysis
$(C_A)_{DL} = \frac{(S_A)_{DL}}{k}$
z = 3 ($\alpha$ = 0.00135)
The probability of type 1 error is 0.135 %, but the probability of type 2 error
is higher.
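The value of alpha quoted for z = 3 is the one-tailed area of the normal curve beyond three standard deviations. A short check using the complementary error function from Python's `math` module is sketched below. The function name is illustrative.

```python
import math

def one_tailed_alpha(z):
    """Area of the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

alpha = one_tailed_alpha(3.0)
print(f"alpha = {alpha:.5f} ({100 * alpha:.3f} %)")
```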
F.2 Limit of Identification (LOI)
S A LOI
 S reag  z reag  z samp
LOI: the smallest concentration or absolute amount of analyte such that the
probability of type 1 and type 2 errors are equal.
F.3
Limit of Quantitation (LOQ)
Committee on Environmental Chemistry: LOQ is the smallest concentration
or absolute amount of analyte that can be reliably determined. (Quantifiable
signal)
S A LOQ
 S reag  10 reag
26
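The three signal thresholds of section F can be compared side by side. In the sketch below, the blank signal and the standard deviations are hypothetical numbers chosen only for illustration, z = 3 is used as for the detection limit, and the function name `signal_limits` is an assumption.

```python
def signal_limits(s_reag, sigma_reag, sigma_samp, z=3.0):
    """Return the signals at the detection limit, the LOI and the LOQ."""
    s_dl = s_reag + z * sigma_reag                     # detection limit
    s_loi = s_reag + z * sigma_reag + z * sigma_samp   # limit of identification
    s_loq = s_reag + 10 * sigma_reag                   # limit of quantitation
    return s_dl, s_loi, s_loq

# Hypothetical blank signal and standard deviations (arbitrary signal units)
s_dl, s_loi, s_loq = signal_limits(s_reag=0.010, sigma_reag=0.002, sigma_samp=0.002)
print(f"S_DL = {s_dl:.3f}, S_LOI = {s_loi:.3f}, S_LOQ = {s_loq:.3f}")
```

With equal standard deviations for the blank and the sample, the LOI signal lies six standard deviations above the blank and the LOQ signal ten.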