Chapter 4: Statistical Treatment of Data

Chapter 4
Statistics
Standard Deviation

Sample standard deviation (for use with small samples, n < ~25):

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$

Population standard deviation (for use with samples n > 25):

$$\sigma = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}}$$

μ = population mean
In the absence of systematic error, the population mean approaches the true value for the measured quantity.
Example

The following results were obtained in
the replicate analysis of a blood sample
for its lead content: 0.752, 0.756,
0.752, 0.760 ppm lead. Calculate the
mean and standard deviation for the
data set.
Standard deviation

0.752, 0.756, 0.752, 0.760 ppm lead.

$$\bar{x} = 0.755$$

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} = 0.0038$$

You'd report the amount of lead in this sample of blood as 0.755 ± 0.0038 ppm.
Excel® Demo
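The same calculation can be sketched outside a spreadsheet. This is a minimal Python version for the blood-lead data; the function names `mean` and `sample_std` are illustrative choices, not something from the slides.

```python
import math

def mean(values):
    """Arithmetic mean of the replicate results."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation, using the n - 1 denominator."""
    x_bar = mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))

# Replicate lead results from the example, in ppm.
lead_ppm = [0.752, 0.756, 0.752, 0.760]

x_bar = mean(lead_ppm)
s = sample_std(lead_ppm)
print(f"mean = {x_bar:.3f} ppm, s = {s:.4f} ppm")
```

Python's built-in `statistics.mean` and `statistics.stdev` give the same numbers; the explicit version is shown only to mirror the formula.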
Distributions of Experimental
Data


We find that the distribution of replicate
data from most quantitative analytical
measurements approaches a Gaussian
curve.
Example – Consider the calibration of a
pipet.
Replicate data on the calibration of a 10-mL pipet.

Trial   Volume, mL     Trial   Volume, mL     Trial   Volume, mL
1       9.988          18      9.975          35      9.976
2       9.973          19      9.980          36      9.990
3       9.986          20      9.994          37      9.988
4       9.980          21      9.992          38      9.971
5       9.975          22      9.984          39      9.986
6       9.982          23      9.981          40      9.978
7       9.986          24      9.987          41      9.986
8       9.982          25      9.978          42      9.982
9       9.981          26      9.983          43      9.977
10      9.990          27      9.982          44      9.977
11      9.980          28      9.991          45      9.986
12      9.989          29      9.981          46      9.978
13      9.978          30      9.969          47      9.983
14      9.971          31      9.985          48      9.980
15      9.982          32      9.977          49      9.983
16      9.983          33      9.976          50      9.979
17      9.988          34      9.983

Mean: 9.982 mL
Median: 9.982 mL
Spread: 0.025 mL
Standard deviation: 0.0056 mL
Frequency distribution

Volume range, mL     Number in range     % in range
9.969 to 9.971       3                   6
9.972 to 9.974       1                   2
9.975 to 9.977       7                   14
9.978 to 9.980       9                   18
9.981 to 9.983       13                  26
9.984 to 9.986       7                   14
9.987 to 9.989       5                   10
9.990 to 9.992       4                   8
9.993 to 9.995       1                   2
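The summary statistics and the frequency distribution for the 50 pipet trials can be reproduced with a short script. This is a sketch using Python's standard `statistics` module; the variable names and the choice to bin in whole microliters (to avoid floating-point trouble at the bin edges) are assumptions of this example.

```python
import statistics

# The 50 calibration volumes in mL, trials 1 through 50.
volumes_ml = [
    9.988, 9.973, 9.986, 9.980, 9.975, 9.982, 9.986, 9.982, 9.981, 9.990,
    9.980, 9.989, 9.978, 9.971, 9.982, 9.983, 9.988, 9.975, 9.980, 9.994,
    9.992, 9.984, 9.981, 9.987, 9.978, 9.983, 9.982, 9.991, 9.981, 9.969,
    9.985, 9.977, 9.976, 9.983, 9.976, 9.990, 9.988, 9.971, 9.986, 9.978,
    9.986, 9.982, 9.977, 9.977, 9.986, 9.978, 9.983, 9.980, 9.983, 9.979,
]

mean_ml = statistics.mean(volumes_ml)
median_ml = statistics.median(volumes_ml)
spread_ml = max(volumes_ml) - min(volumes_ml)
std_ml = statistics.stdev(volumes_ml)  # sample standard deviation (n - 1)

# Nine bins of width 0.003 mL starting at 9.969 mL, handled as integers.
bin_start_ul = 9969
bin_width_ul = 3
counts = [0] * 9
for v in volumes_ml:
    index = (round(v * 1000) - bin_start_ul) // bin_width_ul
    counts[index] += 1

print(f"mean {mean_ml:.3f}, median {median_ml:.3f}, "
      f"spread {spread_ml:.3f}, std dev {std_ml:.4f} mL")
for i, count in enumerate(counts):
    low = (bin_start_ul + i * bin_width_ul) / 1000
    percent = 100 * count / len(volumes_ml)
    print(f"{low:.3f} to {low + 0.002:.3f}: {count:2d} ({percent:.0f}%)")
```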
1 ( xm )2 / 2 2
y
e
 2
[Figure: histogram of the pipet calibration data. x-axis: range of measured values (9.965 to 9.995); y-axis: number of measurements (0 to 14).]
[Figure: the same histogram annotated with Average = 9.982 and Std. Dev = ± 0.0056.]
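To get a feel for the Gaussian curve, it helps to evaluate it at a couple of points. The sketch below plugs in the pipet mean and standard deviation as stand-ins for μ and σ, which is an assumption of this example (50 trials only estimate the population values). The function name `gaussian` is illustrative.

```python
import math

def gaussian(x, mu, sigma):
    """Height of the Gaussian curve at x for mean mu and std dev sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Pipet calibration values used as estimates of mu and sigma (mL).
mu, sigma = 9.982, 0.0056

peak = gaussian(mu, mu, sigma)
one_sigma = gaussian(mu + sigma, mu, sigma)
print(f"height at the mean: {peak:.1f}")
print(f"height one sigma from the mean, relative to peak: {one_sigma / peak:.3f}")
```

One standard deviation from the mean, the curve has dropped to exp(-1/2) of its peak height, whatever the values of μ and σ.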
4-2 Confidence Intervals
For small data sets

$$\text{Confidence interval for } \mu = \bar{x} \pm \frac{ts}{\sqrt{n}}$$

μ is the true mean, and the equation above expresses that the “true mean” will lie in the calculated range at a given confidence.
Example

The following results were obtained in the
replicate analysis of a blood sample for its
lead content: 0.752, 0.756, 0.752, 0.760
ppm lead. Calculate the mean and standard
deviation for the data set.
0.755 ± 0.0038 ppm
Find (a) the 50% CL
and (b) the 90% CL
Confidence Intervals

$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{n}} = 0.755 \pm \frac{t \times 0.0038}{\sqrt{4}}$$

t comes from the Student's t table for n − 1 = 3 degrees of freedom.

(a) 50% CL, t = 0.765:

$$\text{50\% CL for } \mu = 0.755 \pm \frac{0.765 \times 0.0038}{\sqrt{4}} = 0.755 \pm 0.00146$$

(b) 90% CL, t = 2.353:

$$\text{90\% CL for } \mu = 0.755 \pm \frac{2.353 \times 0.0038}{\sqrt{4}} = 0.755 \pm 0.0045$$

(The final figures carry the unrounded s = 0.00383.)
Confidence Intervals

[Figure: the 50% CI and the 90% CI drawn as ranges centered on 0.755, on an axis running from 0.750 to 0.760.]

There is a 50% chance that the true mean, μ, lies in the range 0.755 ± 0.001 ppm (or from 0.754 to 0.756 ppm).
Likewise, these calculations mean that there is a 90% chance that the true mean, μ, lies in the range 0.755 ± 0.005 ppm (or from 0.750 to 0.760 ppm).
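The two confidence limits can be checked with a few lines of Python. This is a sketch: the t values are hard-coded from the worked example (3 degrees of freedom) rather than looked up by the program, and the name `confidence_half_width` is illustrative.

```python
import math
import statistics

# Student's t for 3 degrees of freedom, as quoted in the worked example.
T_TABLE_DF3 = {50: 0.765, 90: 2.353}

def confidence_half_width(values, t_value):
    """Return t * s / sqrt(n), the +/- part of the confidence limit."""
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return t_value * s / math.sqrt(len(values))

lead_ppm = [0.752, 0.756, 0.752, 0.760]
x_bar = statistics.mean(lead_ppm)

half_widths = {
    level: confidence_half_width(lead_ppm, t)
    for level, t in T_TABLE_DF3.items()
}
for level, half_width in half_widths.items():
    print(f"{level}% CL: {x_bar:.3f} +/- {half_width:.4f} ppm")
```

For a different number of replicates, the t values must be replaced with the table entries for n − 1 degrees of freedom.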
Confidence limits and
uncertainty

Suppose we measure the volume of a
vessel five times and observe values:
6.372, 6.375, 6.374, 6.377, and 6.375 mL.
We find average = 6.3746 mL and s = 0.0018 mL.
Use a 90% CL to estimate the uncertainty!
Experimental Uncertainty

Well, a 90% CI means that there is a
90% chance that the true volume is
within the range.
Average = 6.3746 mL and s = 0.0018 mL

$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{n}} = 6.3746 \pm \frac{t\,(0.0018)}{\sqrt{5}}$$

$$\text{90\% CL for } \mu = 6.3746 \pm \frac{\underline{\hspace{2em}}\,(0.0018)}{\sqrt{5}}$$
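One way to check your answer is sketched below. The t value of 2.132 (90% confidence, 4 degrees of freedom) is not given on the slide; it is taken here from a standard Student's t table and should be confirmed against the table used in class.

```python
import math
import statistics

# Five volume measurements of the vessel, in mL.
volumes_ml = [6.372, 6.375, 6.374, 6.377, 6.375]

# Assumption: t for 90% confidence and n - 1 = 4 degrees of freedom,
# from a standard Student's t table (not stated on the slide).
T_90_DF4 = 2.132

x_bar = statistics.mean(volumes_ml)
s = statistics.stdev(volumes_ml)
uncertainty = T_90_DF4 * s / math.sqrt(len(volumes_ml))

print(f"volume = {x_bar:.4f} +/- {uncertainty:.4f} mL (90% CL)")
```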
Comparison of Means with
Student’s t

A t test is used to compare one set of measurements with another to decide whether or not they are “the same”:

Comparing a measured result with a “true” value
Comparing two experimental means
Comparing a mean with a true value

Good for detecting systematic (determinate) errors
Uses Student's t values

$$t_{\text{calculated}} = \frac{|\bar{x} - x_t|}{s} \sqrt{n}$$

COMPARE TO t_table
If t_calc > t_table, the difference is significant
If t_calc < t_table, the difference is NOT significant
EXAMPLE

A new procedure for the rapid determination of sulfur in kerosene was tested on a KNOWN sample (μ or x_t = 0.123% S). The results were: % S = 0.112, 0.118, 0.115, and 0.119. Is there a difference at the 95% confidence level?

$$t_{\text{calculated}} = \frac{|\bar{x} - x_t|}{s} \sqrt{n} = \frac{|0.116 - 0.123|}{0.003162} \sqrt{4}$$

t_calculated =
t_table = 3.182
Are they significantly different?
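The comparison can be worked through in Python as a check. This is a sketch; `t_vs_known` is an illustrative name, and the critical value is the t_table figure quoted in the example (95% confidence, 3 degrees of freedom).

```python
import math
import statistics

def t_vs_known(values, known_value):
    """t statistic for comparing a sample mean with a known value."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return abs(x_bar - known_value) / s * math.sqrt(len(values))

sulfur_percent = [0.112, 0.118, 0.115, 0.119]
known_percent = 0.123
t_table = 3.182  # 95% confidence, 3 degrees of freedom (from the example)

t_calc = t_vs_known(sulfur_percent, known_percent)
significant = t_calc > t_table
print(f"t_calc = {t_calc:.2f}, t_table = {t_table}, significant: {significant}")
```

A t_calc larger than t_table points to a systematic error in the new procedure at this confidence level.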
Strutt’s Story
At the turn of the last century it was generally thought that dry air
contained about one-fifth oxygen and four-fifths nitrogen. One man
wanted to confirm this ….
1st sample: Dry air …
Added red-hot copper.
Cu would react with oxygen to make solid copper oxide (CuO).
Result: air without oxygen.

2nd sample: Make an equal volume of nitrogen.
Pure nitrogen can be generated by decomposition of N2O (nitrous oxide) or NO (nitric oxide).
Result: pure nitrogen.

Reasoned the amounts would be the same.
Is there a difference at the 95% confidence level?
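Comparison of two means

$$t_{\text{calculated}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\text{pooled}}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

$$s_{\text{pooled}} = \sqrt{\frac{\sum_{\text{set 1}} (x_i - \bar{x}_1)^2 + \sum_{\text{set 2}} (x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$$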
Comparison of two means

$$s_{\text{pooled}} = \sqrt{\frac{0.000143^2 (7 - 1) + 0.00138^2 (8 - 1)}{7 + 8 - 2}} = 0.00102$$

$$t_{\text{calculated}} = \frac{|2.31011 - 2.29947|}{0.00102} \sqrt{\frac{7 \times 8}{7 + 8}} = 20.2$$

If t_calc > t_table, the difference is significant.
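The pooled calculation is easy to get wrong by hand, so here is a sketch that repeats it from the summary values in the worked example (means 2.31011 and 2.29947, standard deviations 0.000143 and 0.00138, n = 7 and 8). The function names are illustrative.

```python
import math

def pooled_std(s1, n1, s2, n2):
    """Pooled standard deviation of two data sets."""
    numerator = s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)
    return math.sqrt(numerator / (n1 + n2 - 2))

def t_two_means(mean1, s1, n1, mean2, s2, n2):
    """t statistic for comparing two experimental means."""
    s_pooled = pooled_std(s1, n1, s2, n2)
    return abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))

# Summary values from the worked example.
s_pooled = pooled_std(0.000143, 7, 0.00138, 8)
t_calc = t_two_means(2.31011, 0.000143, 7, 2.29947, 0.00138, 8)

print(f"s_pooled = {s_pooled:.5f}, t_calc = {t_calc:.1f}")
print(f"degrees of freedom = {7 + 8 - 2}")
```

The result is compared with t_table for n1 + n2 − 2 degrees of freedom.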


Why the difference?
In 1904, Lord Rayleigh was awarded the Nobel Prize for discovering argon.
Comparison of Standard
deviations between data

The F-test may be used to provide insights into:

Whether there is a difference in the precision of two methods (may warrant a new calculation to compare means!)
Is method A more precise than method B?
F-test
(comparison of std. dev.)

$$F_{\text{calculated}} = \frac{s_1^2}{s_2^2}$$
We always put the larger standard deviation in the numerator, so that F>1.
If Fcalculated > Ftable then the difference is significant at the 95% CL.
Example
A well developed method for protein concentration
determination yields a standard deviation of 0.25 M
over many hundreds of replicates.
A) Dr. Skeels develops a rapid method for the determination of protein concentration that yields a standard deviation of 0.15 M (for 12 degrees of freedom).
B) Dr. Marano’s method yields 0.11 M (std dev) for the same number of degrees of freedom.
Is Dr. Skeels’ method more precise than the standard or is
Dr. Marano’s, or neither?
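The F values for the two new methods can be computed as below. This sketch only calculates F_calculated; the critical value F_table is not given in the example and has to be looked up for the matching degrees of freedom. The function name `f_statistic` is illustrative.

```python
def f_statistic(s_a, s_b):
    """F = larger variance / smaller variance, so that F >= 1."""
    larger, smaller = max(s_a, s_b), min(s_a, s_b)
    return larger ** 2 / smaller ** 2

s_standard = 0.25  # well-developed method, many hundreds of replicates
s_skeels = 0.15    # 12 degrees of freedom
s_marano = 0.11    # 12 degrees of freedom

f_skeels = f_statistic(s_standard, s_skeels)
f_marano = f_statistic(s_standard, s_marano)

print(f"Standard vs. Skeels: F = {f_skeels:.2f}")
print(f"Standard vs. Marano: F = {f_marano:.2f}")
```

Each F is then compared with F_table at the 95% CL; only if F_calculated exceeds F_table is the difference in precision significant.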

Throwing out “Bad data”
For an analysis of alcohol content in wine, Dr. Skeels finds the following: 12.53, 12.56, 12.47, 12.67, and 12.48%

Q-test for Bad Data
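$$Q_{\text{calculated}} = \frac{\text{gap}}{\text{range}}$$

Compare to Q_critical: if Q_calc > Q_critical, the questionable point can be rejected.

Sorted data: 12.47  12.48  12.53  12.56  12.67
Range: the total spread of the data (12.47 to 12.67)
Gap: the difference between the questionable point (12.67) and its nearest neighbor (12.56)

Q_calculated = ?
Q_table = 0.64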
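To check your Q value, the test can be sketched in a few lines. The function below tests whichever end of the sorted data has the larger gap, which is a design choice of this example; `q_statistic` is an illustrative name, and the critical value is the Q_table figure quoted above.

```python
def q_statistic(values):
    """Q = gap / range for the most extreme point in the data set."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    gap_low = ordered[1] - ordered[0]      # gap if the lowest point is suspect
    gap_high = ordered[-1] - ordered[-2]   # gap if the highest point is suspect
    return max(gap_low, gap_high) / data_range

alcohol_percent = [12.53, 12.56, 12.47, 12.67, 12.48]
q_table = 0.64  # critical value quoted in the example

q_calc = q_statistic(alcohol_percent)
can_reject = q_calc > q_table
print(f"Q_calc = {q_calc:.2f}, Q_table = {q_table}, reject: {can_reject}")
```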