Statistical Analysis
Significant Figures
The number of significant figures is the minimum number of digits needed to write a given value in scientific notation without loss of accuracy.
142.7 (4 sig. figs) = 1.427 × 10²
9.25 × 10⁴ (3 sig. figs)
9.250 × 10⁴ (4 sig. figs)
9.2500 × 10⁴ (5 sig. figs)
Zeros are significant when they occur
1) in the middle of a number;
2) at the end of a number, to the right of the decimal point.
Examples: 106 (3 sig. figs), 0.0106 (3 sig. figs), 0.01060 (4 sig. figs)
Estimates (the last digit read from a scale is itself an estimate):
Absorbance: 0.234 ± 1 in the last digit, i.e. 0.233 to 0.235
Transmittance: 58.3 ± 2 in the last digit, i.e. 58.1 to 58.5
Significant figures in arithmetic

  5.345            1.362 × 10⁻⁴
+ 6.728          + 3.111 × 10⁻⁴
-------          --------------
 12.073            4.473 × 10⁻⁴
In addition and subtraction the number of significant figures in the answer may exceed or be less than that in the original data.
In multiplication and division the number of significant figures is limited to the number of digits contained in the number with the fewest significant figures.
34.60 ÷ 2.46287 = 14.05
4.3179 × 10¹² × 3.6 × 10⁻¹⁹ = 1.6 × 10⁻⁶
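The two worked examples above can be checked with a short script; `round_sig` is a hypothetical helper written for this sketch, not a standard-library function.

```python
import math

def round_sig(value, n):
    """Round value to n significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, n - 1 - exponent)

# 34.60 / 2.46287: the answer keeps 4 sig. figs (the fewer of 4 and 6)
print(round_sig(34.60 / 2.46287, 4))        # 14.05
# 4.3179e12 * 3.6e-19: limited to 2 sig. figs by 3.6e-19
print(round_sig(4.3179e12 * 3.6e-19, 2))    # 1.6e-06
```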
Logarithms
n = 10ᵃ means that log n = a (the number n is the antilogarithm of a).
339 = 3.39 × 10², log 339 = 2.530 (2: characteristic, 530: mantissa)
The number of digits in the mantissa should equal the total number of significant figures.
Consequently, when we convert a logarithm to an antilogarithm, the number of sig. figs in the antilogarithm should equal the number of digits in the mantissa.
antilog(−3.42) = 10^(−3.42) = 3.8 × 10⁻⁴
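Both numerical examples above can be reproduced directly; the rounding to 3 digits in the mantissa and 2 sig. figs in the antilogarithm follows the rule just stated.

```python
import math

# log 339: the mantissa gets 3 digits because 339 has 3 sig. figs
log_n = math.log10(3.39e2)
print(round(log_n, 3))      # 2.53 (characteristic 2, mantissa .530)

# antilog(-3.42): the mantissa has 2 digits, so keep 2 sig. figs
antilog = 10 ** -3.42
print(antilog)              # ~3.8e-4 before rounding to sig. figs
```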
Experimental Error
Every measurement has some uncertainty ⇒ experimental error
Systematic (determinate) error: arises from a flaw in equipment or the design of
the experiment
Random (indeterminate) error: arises from the effects of uncontrolled (and maybe uncontrollable) variables in the measurement. It can be positive or negative and is always present (electrical noise, human readings, …). It cannot be eliminated, but it can be reduced.
Precision: describes the reproducibility of the results
Accuracy: describes how close a measured value is to the “true value”.
Absolute Uncertainty: expresses the margin of uncertainty associated with a
measurement. Buret: ± 0.02 mL
Relative Uncertainty: compares the size of the absolute uncertainty to the size of
its associated measurement.
{absolute uncertainty ÷ magnitude of measurement}
Example: 12.35 ± 0.02 → 0.02 / 12.35 = 0.002
Percent Relative Uncertainty: Relative Uncertainty × 100
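A minimal sketch of the buret example above, using the same reading and absolute uncertainty:

```python
# Buret reading 12.35 +/- 0.02 mL (values from the example above)
measurement = 12.35
absolute_uncertainty = 0.02

relative = absolute_uncertainty / measurement
percent = relative * 100
print(round(relative, 3))   # 0.002
print(round(percent, 1))    # 0.2  (percent relative uncertainty)
```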
Uncertainties in Data and Results
• Random Errors and Precision
We assume that the numerical result to which our discussion applies is obtained
with an instrument that measures a physical quantity for which the a priori range of
all physical values constitutes a continuum.
Average or mean of N measurements of an experimental variable x:
x̄ = (1/N) Σᵢ₌₁ᴺ xᵢ
Range of measured values: R = x_largest − x_smallest - not a clear measure of the precision.
Average deviation: ave. dev = (1/N) Σᵢ₌₁ᴺ |xᵢ − x̄| - of declining value in the age of computers.
A measure of precision unbiased by sample size is the variance, S²:
S² = (1/(N − 1)) Σᵢ₌₁ᴺ (xᵢ − x̄)² - Additive property! (= Estimates of random error from variable sources may be combined.) The divisor N − 1 is known as the degrees of freedom.
The number of degrees of freedom is equal to the number of independent data on which the calculation of the variance is based.
Alternative form of the variance equation:
S² = (1/(N − 1)) (Σᵢ₌₁ᴺ xᵢ² − N x̄²) - Useful for calculators and computers.
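A quick numerical check that the two variance formulas agree; the data values are illustrative, not from the notes.

```python
# Check that the two variance formulas give the same result
data = [10.2, 10.8, 11.6, 9.9, 9.4, 10.0]   # illustrative values
N = len(data)
mean = sum(data) / N

# Definitional form: S^2 = (1/(N-1)) * sum((x_i - mean)^2)
s2_def = sum((x - mean) ** 2 for x in data) / (N - 1)

# Computational form: S^2 = (1/(N-1)) * (sum(x_i^2) - N * mean^2)
s2_alt = (sum(x ** 2 for x in data) - N * mean ** 2) / (N - 1)

print(s2_def, s2_alt)   # the two agree
```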
The square root of the variance is called the estimated standard deviation:
S = [ (1/(N − 1)) Σᵢ₌₁ᴺ (xᵢ − x̄)² ]^(1/2)
- indicates the precision of individual measurements.
The precision of the mean of the measurements is given by the estimated standard deviation of the mean of N values:
Sₘ = S/√N = [ (1/(N(N − 1))) Σᵢ₌₁ᴺ (xᵢ − x̄)² ]^(1/2)
The precision of the mean can be increased by increasing the number of individual
measurements!
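The quantities S and Sₘ = S/√N can be sketched with the standard library; `statistics.stdev` uses the N − 1 divisor, matching the definition above. The data values are illustrative.

```python
import math
import statistics

data = [10.2, 10.8, 11.6, 9.9, 9.4, 10.0, 10.6]   # illustrative values
N = len(data)

s = statistics.stdev(data)     # estimated standard deviation (N - 1 divisor)
s_m = s / math.sqrt(N)         # estimated standard deviation of the mean

print(statistics.mean(data), s, s_m)   # s_m < s: the mean is more precise
```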
Rejection of Discordant Data
Grubbs Test for an Outlier
To determine whether a particular data point can be excluded on the grounds of its questionable veracity, we compute the Grubbs statistic G, defined as:
G_calculated = |questionable value − x̄| / s
If G_calculated < G_table, the questionable point should be retained.
Example: 10.2, 10.8, 11.6, 9.9, 9.4, 7.8, 10.00, 9.2, 11.3,
9.5, 10.6, 11.6
The value of 7.8 appears out of line.
We get s = 1.11 and x̄ = 10.16.
G_calculated = 2.13, and on comparison with G_table = 2.285 (for 12 observations), the questionable value should be retained.
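The example above can be reproduced directly; at full precision G comes out to about 2.12 (the notes' 2.13 results from using the rounded s and x̄), and the conclusion is the same.

```python
import statistics

data = [10.2, 10.8, 11.6, 9.9, 9.4, 7.8, 10.0, 9.2, 11.3, 9.5, 10.6, 11.6]
mean = statistics.mean(data)
s = statistics.stdev(data)

questionable = 7.8
g_calc = abs(questionable - mean) / s
print(round(g_calc, 2))   # ~2.12 at full precision

G_TABLE_12 = 2.285        # critical value for 12 observations (from the notes)
print("retain" if g_calc < G_TABLE_12 else "reject")
```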
Statistical Treatment of Random Errors
Error Frequency Distribution
For a physical quantity x we obtain a large number of measurements xᵢ (i = 1, 2, 3, …, N), which are subject to random errors εᵢ. For simplicity we assume that the true value x₀ is known, so the errors are known also. Therefore, we are concerned with the frequency of occurrence, n_ε, of errors of size ε.
The graph on the left represents the actual error frequency distribution n_ε for 376 measurements; the estimated normal error probability function P(ε) is given by the dashed curve. Estimated values of the standard deviation σ and the 95% confidence limit Δ are indicated in relation to the normal error curve. The width w is chosen as a compromise between the desirability of having the numbers in each bar as large as possible and the desirability of having the number of bars as large as possible.
P(ε) is normalized, so that ∫_{−∞}^{+∞} P(ε) dε = 1.
The significance of normalization is that the probability that a single measurement will be in error by an amount lying in the range between ε and ε + dε is equal to P(ε)dε. A probability function derived in this way is approximate. It can be assumed that the probability function is represented by a Gaussian distribution, which is called the normal error probability function:
P(ε) = (1/(√(2π) σ)) e^(−ε²/2σ²)
where the standard deviation σ is a parameter which characterizes the width of the distribution. It is the root-mean-square error expected with this probability function.
σ ≡ ⟨ε²⟩^(1/2) = [ (1/(√(2π) σ)) ∫_{−∞}^{+∞} ε² e^(−ε²/2σ²) dε ]^(1/2)
If the true value is x₀ and the errors εᵢ are known, then σ can be estimated by
σ = [ (1/N) Σᵢ₌₁ᴺ εᵢ² ]^(1/2)
The dashed curve represents a normal error probability function with this value of σ for the 376 measurements.
All the assumptions made are required for the validity of the central limit theorem.
Infinitely Large Sample
So far our discussion has dealt with the errors themselves. In real circumstances we do not know the errors εᵢ by which each measurement deviates from the true value x₀. We know only the deviations from a mean value, (xᵢ − x̄). If the errors are only random, then the mean value is the best estimate of the true value.
If we can make a very large (theoretically infinite) number of measurements then
we can determine the true mean µ exactly and the spread of the data points about
this mean would indicate the precision of the observation. The probability function
for the deviation will be
P(x − µ) = (1/(√(2π) σ)) exp[ −(x − µ)²/(2σ²) ], where σ = lim_{N→∞} [ (1/N) Σᵢ₌₁ᴺ (xᵢ − µ)² ]^(1/2)
In the absence of systematic errors µ should be equal to x₀.
The normal probability distribution function is used to establish the probability P that an error is less than a certain magnitude δ, or to establish the limiting width of the range −δ to +δ within which the integrated probability P has a certain value.
P = (1/(√(2π) σ)) ∫_{−δ}^{+δ} e^(−ε²/2σ²) dε
If δ = σ, then P = 0.6826. This means that 68.26% of all errors are less than the standard deviation in magnitude.
If P = 0.95, then δ₀.₉₅ = 1.96σ ≈ 2σ.
The value of P is given by the shaded area; the 95% confidence limit is shown in the right-hand graph. For σ to be known satisfactorily, N should be at least 20.
Correspondence between uncertainty value and confidence level

Uncertainty           ±σ      ±1.64σ   ±1.96σ   ±2.58σ   ±3.29σ
Confidence Level (%)  68.26   90       95       99       99.9
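The table entries can be reproduced from the Gaussian integral above: for δ = kσ, the integrated probability is P = erf(k/√2), which the standard library provides.

```python
import math

def confidence(k):
    """Integrated normal probability that |error| < k*sigma."""
    return math.erf(k / math.sqrt(2))

# Reproduce the table row by row (percent confidence for each multiple of sigma)
for k in (1.0, 1.64, 1.96, 2.58, 3.29):
    print(k, round(100 * confidence(k), 2))
```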
Large Finite Sample - Uncertainty in mean value
The error in the mean, εₘ, of N observations is the mean of the individual errors εᵢ:
εₘ = (1/N) Σᵢ₌₁ᴺ εᵢ
The estimated standard deviation Sₘ of the mean is given by:
Sₘ² = εₘ² = (1/N²) [ Σᵢ₌₁ᴺ εᵢ² + Σᵢ≠ⱼ εᵢεⱼ ]
For independent errors the cross terms εᵢεⱼ average to zero, so Sₘ² = S²/N, i.e. Sₘ = S/√N.
The meaning of the above equation is that the mean of a group of N independent
measurements of equal weight has a higher precision than any single of these
measurements.
For a sample of N ≥ 20 there is a 68.26% probability that the true value lies between (x̄ − Sₘ) and (x̄ + Sₘ).
We can determine a 95% confidence limit in the mean, denoted as Δ:
Δ = δₘ,₀.₉₅ = 1.96 S/√N ≅ 2S/√N
If N < 20, then the Student t distribution must be used.
The joint probability is given by
Pₘ(x̄) = (1/(√(2π) σₘ)) exp[ −(x̄ − µ)²/(2σₘ²) ]
In the case where σ is known, σₘ = σ/√N.
Small Samples (1 < N < 20) – Student t distribution function
Student t distribution functions P(τ) for ν = 1, 3, 5, …, ∞ (degrees of freedom). The quantities actually plotted are
P(τ)/k_norm = [1 + τ²/(N − 1)]^(−N/2) = [1 + τ²/ν]^(−(ν+1)/2), where N − 1 = ν and
τ ≡ (x̄ − µ)/Sₘ = (x̄ − µ)/(S/√N)
The curve for ν = ∞ is the normal error curve. The short vertical bars mark the
95% confidence level.
The t distribution curve can be used in the same way as the normal distribution
curve.
Suppose we seek to find the values of τ over which the integral of the Student probability function is a fraction P. Then we calculate the following integral:
∫_{−t}^{+t} P(τ) dτ = P
We define the limit of error δ as the value of (x̄ − µ) that corresponds to the limit of integration t:
δ = t·Sₘ = tS/√N, and for the 95% confidence limit, Δ = t₀.₉₅ Sₘ = t₀.₉₅ S/√N
The table below shows "critical" values of t for a given number of degrees of freedom ν and a given P.
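As a sketch of this formula, a 95% confidence limit can be attached to the mean of the Grubbs-test data used earlier (N = 12, ν = 11). The critical value t₀.₉₅ = 2.201 is taken from a standard two-tailed t table, since the notes' own table did not survive extraction.

```python
import math
import statistics

data = [10.2, 10.8, 11.6, 9.9, 9.4, 7.8, 10.0, 9.2, 11.3, 9.5, 10.6, 11.6]
N = len(data)
s = statistics.stdev(data)

T_95 = 2.201   # two-tailed 95% critical t for nu = N - 1 = 11 (standard table)
delta = T_95 * s / math.sqrt(N)
print(f"{statistics.mean(data):.2f} +/- {delta:.2f}")
```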
Propagation of Errors (random/systematic)
If one determines a quantity F(x, y, z, …), where x, y, z, … are measured values with uncertainties Δ(x), Δ(y), Δ(z), …, then the error in F is given by
Δ²(F) = (∂F/∂x)² Δ²(x) + (∂F/∂y)² Δ²(y) + (∂F/∂z)² Δ²(z) + …
In certain cases the propagation of errors can be carried out very simply:
a) For F = ax ± by ± cz: Δ²(F) = a²Δ²(x) + b²Δ²(y) + c²Δ²(z)
b) For F = axyz (or axy/z, or ax/yz, or a/xyz):
Δ²(F)/F² = Δ²(x)/x² + Δ²(y)/y² + Δ²(z)/z²
c) For F = axⁿ: Δ²(F)/F² = n² Δ²(x)/x² → Δ(F)/F = n Δ(x)/x
d) For F = a·eˣ: Δ²(F) = a²e²ˣ Δ²(x) → Δ(F)/F = Δ(x)
e) For F = a ln x: Δ²(F) = (a²/x²) Δ²(x) → Δ(F) = a Δ(x)/x
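Rule (b) can be checked numerically against the general partial-derivative formula; the values and uncertainties below are illustrative.

```python
import math

# F = a*x*y/z with illustrative values and uncertainties
a, x, y, z = 1.0, 2.0, 3.0, 4.0
dx, dy, dz = 0.1, 0.2, 0.05
F = a * x * y / z

# General formula: quadrature sum of (dF/dv) * delta(v) over each variable
dF_general = math.sqrt((a * y / z * dx) ** 2 +
                       (a * x / z * dy) ** 2 +
                       (a * x * y / z ** 2 * dz) ** 2)

# Rule (b): relative errors add in quadrature
dF_rule_b = abs(F) * math.sqrt((dx / x) ** 2 + (dy / y) ** 2 + (dz / z) ** 2)

print(dF_general, dF_rule_b)   # the two agree
```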
Method of Least Squares
We use this method in order to draw the best straight line through experimental data points that have some scatter and do not lie perfectly on a straight line. The equation of the straight line is:
y = mx + b
The vertical deviation is dᵢ = yᵢ − y = yᵢ − (mxᵢ + b); some are + and some are −.
We square: dᵢ² = (yᵢ − y)² = (yᵢ − mxᵢ − b)²
Now we minimize the sum of the squares of all the deviations: SSE = Σᵢ dᵢ²
The values of m and b are found which minimize SSE:
(∂SSE/∂m)_b = 0 and (∂SSE/∂b)_m = 0
LINEST in Excel and Regression Analysis in SigmaPlot
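Setting the two partial derivatives to zero gives the usual closed-form slope and intercept, which can be sketched without any spreadsheet; the data below are illustrative points scattered about y = 2x + 1.

```python
def least_squares(xs, ys):
    """Closed-form slope m and intercept b minimizing
    SSE = sum((y_i - m*x_i - b)^2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - m * sx) / n
    return m, b

# Illustrative data with slight scatter about y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
m, b = least_squares(xs, ys)
print(m, b)
```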