Statistical Methods
For UO Lab — Part 1
Calvin H. Bartholomew
Chemical Engineering
Brigham Young University
Background
 Statistics is the science of problem-solving in
the presence of variability (Mason 2003).
 Statistics enables us to:
• Assess the variability of measurements
• Avoid bias from unconsidered causes of variation
• Determine probability of factors, risks
• Build good models
• Obtain best estimates of model parameters
• Improve chances of making correct decisions
• Make most efficient and effective use of resources
Some U.S. Cultural Statistics
 58.4% have called into work sick when we weren't.
 3 out of 4 of us store our dollar bills in rigid order with
singles leading up to higher denominations.
 50% admit they regularly sneak food into movie
theaters to avoid the high prices of snack foods.
 39% of us peek in our host's bathroom cabinet.
 17% have been caught by the host.
 81.3% would tell an acquaintance to zip his pants.
 29% of us ignore RSVP.
 35% give to charity at least once a month.
 71.6% of us eavesdrop.
Population vs. Sample Statistics
 Population statistics
• Characterizes the entire population, which is generally the unknown information we seek
• Mean generally designated μ
• Variance & standard deviation generally designated as σ² and σ, respectively
 Sample statistics
• Characterizes a random, hopefully representative, sample, typically data from which we infer population statistics
• Mean generally designated x̄
• Variance & standard deviation generally designated as s² and s, respectively
Point vs. Model Estimation
 Point estimation
• Characterizes a single, usually global measurement
• Generally simple mathematical and statistical analysis
• Procedures are unambiguous
 Model development
• Characterizes a function of dependent variables
• Complexity of parameter estimation and statistical analysis depends on model complexity
• Parameter estimation and especially statistics are somewhat ambiguous
Overall Approach
 Use sample statistics to estimate
population statistics
 Use statistical theory to indicate the
accuracy with which the population
statistics have been estimated
 Use linear or nonlinear regression
methods/statistics to fit data to a model
and to determine goodness of fit
 Use trends indicated by theory to optimize
experimental design
Sample Statistics
 Estimate properties of probability distribution function (PDF),
i.e., mean and standard deviation using Gaussian statistics
 Use student t-test to determine variance and confidence
interval
 Estimate random errors in the measurement of data
 For variables that are geometric functions of several basic variables, use
the propagation of errors approach to estimate: (a) probable error (PE) and
(b) maximum possible error (MPE)
 PE and MPE can be estimated by differential method; MPE can also be
estimated by brute force method
 Determine systematic errors (bias)
 Compare estimated errors from measurements with calculated
errors from statistics—will reveal whether methods of measurement
or quantity of data is limiting
Random Error: Single Variable (i.e. T)
Questions
Several measurements
are obtained for a
single variable (i.e. T).
• What is the true value?
• How confident are you?
• Is the value different on
different days?
Some definitions:
x̄ = sample mean
s = sample standard deviation
μ = exact mean
σ = exact standard deviation
As the sampling becomes larger:
x̄ → μ
s → σ
t chart → z chart
Not valid if bias exists (i.e. calibration is off)
How do you determine bounds of μ?
 Let’s assume a “normal” Gaussian distribution
 For small sample: s is known
 For large sample: σ is assumed
Small sample (we’ll pursue this approach):
$$\bar{x} = \frac{\sum_i x_i}{n}, \qquad s^2 = \frac{1}{n-1}\sum_i \left(x_i - \bar{x}\right)^2$$
Large sample (n > 30; use z tables for this approach):
$$\sigma^2 \approx \frac{1}{n-1}\sum_i \left(x_i - \bar{x}\right)^2$$
Example 1
n    Temp
1    40.1
2    39.2
3    43.2
4    47.2
5    38.6
6    40.4
7    37.7
$$\bar{x} = \frac{40.1 + 39.2 + 43.2 + 47.2 + 38.6 + 40.4 + 37.7}{7} = 40.9$$
$$s^2 = \frac{1}{7-1}\left[(40.1-40.9)^2 + (39.2-40.9)^2 + (43.2-40.9)^2 + (47.2-40.9)^2 + (38.6-40.9)^2 + (40.4-40.9)^2 + (37.7-40.9)^2\right] = 10.7$$
$$s = 3.27$$
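The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch that uses only the seven temperatures listed above; the variable names are illustrative.

```python
import statistics

temps = [40.1, 39.2, 43.2, 47.2, 38.6, 40.4, 37.7]  # Example 1 data

n = len(temps)
x_bar = sum(temps) / n                                # sample mean
s2 = sum((x - x_bar) ** 2 for x in temps) / (n - 1)   # sample variance, n - 1 divisor
s = s2 ** 0.5                                         # sample standard deviation

print(f"mean = {x_bar:.1f}, variance = {s2:.1f}, std dev = {s:.2f}")

# Cross-check: statistics.variance also divides by n - 1
assert abs(s2 - statistics.variance(temps)) < 1e-9
```

The standard library functions `statistics.mean`, `statistics.variance`, and `statistics.stdev` give the same three numbers directly.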
Properties of a Normal PDF
 About 68.27%, 95.45%, and 99.73% of data lie
within 1, 2, and 3 standard deviations of the
mean, respectively.
 When mean is zero and standard deviation is
1, it is referred to as a standard normal
distribution.
 Plays fundamental role in statistical analysis
because of the Central Limit Theorem.
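The fraction of a normal population within k standard deviations of the mean equals erf(k/√2). The sketch below evaluates that expression for k = 1, 2, 3 so the coverage figures can be compared with the slide; the helper name `fraction_within` is illustrative.

```python
import math

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * fraction_within(k):.2f}%")
```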
Central Limit Theorem
 Distribution of means calculated from a large
data set is approximately normal
 Becomes more accurate with larger number of
samples
 Sample mean approaches true mean as n → ∞
 Assumes distributions are not peaked close to a
boundary and variances are finite
$$\mu = \bar{x} \pm Z\,\frac{\sigma_x}{\sqrt{n}}$$
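A small simulation can illustrate the 1/√n scaling of the spread of sample means. This is a sketch under stated assumptions: the population is taken to be uniform on (0, 1), and the seed, trial count, and function name are arbitrary choices made for the illustration, not part of the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n, trials=5000):
    """Means of `trials` samples, each of n draws from a uniform(0, 1) population."""
    return [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

sigma = (1 / 12) ** 0.5  # exact standard deviation of uniform(0, 1)

for n in (2, 10, 50):
    means = sample_means(n)
    observed = statistics.stdev(means)   # spread of the sample means
    predicted = sigma / n ** 0.5         # sigma / sqrt(n)
    print(f"n = {n:2d}: observed {observed:.4f}, predicted {predicted:.4f}")
```

The observed and predicted spreads should agree to within sampling noise, even though the underlying population is not normal.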
Student t-Distribution
 Widely used in hypothesis testing and determining confidence intervals
 Equivalent to normal distribution for large sample size
 "Student" is a pseudonym, not an adjective; the author's actual name was W. S. Gosset, who published in the early 1900s.
[Figure: probability density of the Student t-distribution vs. value of random variable, from -4 to 4]
Student t-Distribution
 Used to compute confidence intervals according to
$$\mu = \bar{x} \pm \frac{s\,t}{\sqrt{n}}$$
 Assumes mean and variance are estimated by sample values
 Value of t decreases with DOF or number of data points n; increases with increasing % confidence
[Figure: quantile value of t distribution vs. degrees of freedom for the 90%, 95%, and 99% confidence intervals]
Student t-test (determine error from s)
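[Figure: t distribution with 5% of the area in each tail]
$$\mu = \bar{x} \pm t\left(\frac{s}{\sqrt{n}}\right), \quad \text{where } t = f\left(\frac{\alpha}{2},\; n-1\right)$$
α = 1 - probability
r = n - 1
error = t s / n^0.5
e.g. From Example 1: n = 7, s = 3.27
Prob.    α/2     t        t s/n^0.5
90%      0.05    1.943    2.40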
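The error term for Example 1 can be reproduced in a few lines. A minimal sketch: the t value 1.943 is the 90% two-tailed entry for 6 degrees of freedom quoted on the slide, and the function name `t_error` is illustrative.

```python
import math

def t_error(t, s, n):
    """Half-width of the confidence interval: t * s / sqrt(n)."""
    return t * s / math.sqrt(n)

# Example 1: n = 7 measurements, s = 3.27, 90% confidence (alpha/2 = 0.05)
err = t_error(1.943, 3.27, 7)
print(f"error = {err:.2f}")
```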
Values of Student t Distribution
 Depend on both confidence
level desired and amount of
data.
 Degrees of freedom are n-1,
where n = number of data
points (assumes mean and
variance are estimated from
data).
 This table assumes two-tailed
distribution of area.
Two-tailed confidence level
df     90%        95%        99%
1      6.31375    12.7062    63.6567
2      2.91999    4.30265    9.92484
3      2.35336    3.18245    5.84091
4      2.13185    2.77645    4.60409
5      2.01505    2.57058    4.03214
6      1.94318    2.44691    3.70743
7      1.89457    2.36458    3.49892
8      1.85954    2.30598    3.3551
9      1.83311    2.26214    3.24968
10     1.81246    2.22813    3.16918
11     1.79588    2.20098    3.10575
12     1.78229    2.17881    3.0545
13     1.77093    2.16037    3.01225
14     1.76131    2.14479    2.97683
15     1.75305    2.13145    2.9467
16     1.74588    2.1199     2.92077
17     1.73961    2.10982    2.89822
18     1.73406    2.10092    2.87844
19     1.72913    2.09302    2.86093
20     1.72472    2.08596    2.84534
21     1.72074    2.07961    2.83136
22     1.71714    2.07387    2.81875
23     1.71387    2.06866    2.80733
24     1.71088    2.0639     2.79694
25     1.70814    2.05954    2.78743
inf    1.64486    1.95997    2.57583
Example 2
 Five data points with sample mean and standard
deviation of 713.6 and 107.8, respectively.
 The estimated population mean and 95% confidence
interval are (from previous table, t = 2.77645 for 4 degrees of freedom):
$$\mu = \bar{x} \pm \frac{s\,t}{\sqrt{n}} = 713.6 \pm \frac{107.8 \times 2.77645}{\sqrt{5}} = 713.6 \pm 133.9$$
also written 713.6 (±133.9)
Example 3: Comparing Averages
Day 1: x̄ = 40.9, s_x = 3.27, n_x = 7
Day 2: ȳ = 37.2, s_y = 2.67, n_y = 9
What is your confidence that μx ≠ μy?
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2}\left(\dfrac{1}{n_x} + \dfrac{1}{n_y}\right)}} = 2.5$$
Degrees of freedom: n_x + n_y - 2
99% confident different
1% confident same
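The pooled t statistic of Example 3 can be computed as follows. This is a sketch of the two-sample formula with a pooled variance, using the Day 1 and Day 2 values from the slide; the function name `pooled_t` is illustrative. For reference, the two-tailed table in these slides lists 2.14479 (95%) and 2.97683 (99%) for 14 degrees of freedom, so the computed t can be compared against those entries.

```python
import math

def pooled_t(x_bar, s_x, n_x, y_bar, s_y, n_y):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    dof = n_x + n_y - 2
    pooled_var = ((n_x - 1) * s_x ** 2 + (n_y - 1) * s_y ** 2) / dof
    t = (x_bar - y_bar) / math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return t, dof

t, dof = pooled_t(40.9, 3.27, 7, 37.2, 2.67, 9)
print(f"t = {t:.2f} with {dof} degrees of freedom")

# Two-tailed table values for 14 degrees of freedom
T_95_14, T_99_14 = 2.14479, 2.97683
print("exceeds 95% value:", t > T_95_14, "| exceeds 99% value:", t > T_99_14)
```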
Error Propagation: Multiple Variables
Obtain value (i.e. from model) using multiple input variables.
What is the uncertainty of your value?
Each input variable has its own error
Example: How much ice cream do you buy for
the AIChE event?
Ice cream = f (time of day, tests, …)
Example: You take measurements of ρ, A, v
to determine m = ρAv. What is the
range of m and its associated uncertainty?
Value and Uncertainty
• Values are used to make decisions by managers —
uncertainty of a value must be specified
• Ethics and societal impact of values are important
• How do you determine the uncertainty of a value?
Sources of uncertainty:
1. Estimation: we guess!
2. Discrimination: device accuracy (single data point)
3. Calibration: may not be exact (error of curve fit)
4. Technique: i.e. measure ID rather than OD
5. Constants and data: not always exact!
6. Noise: which reading do we take?
7. Model and equations: i.e. ideal gas law vs real gas
8. Humans: transposing, …
Estimates of Error (δ) for Input Variable
(Methods or rules)
1. Measured variable (as we just did):
measure multiple times; obtain s;
δ ≈ 2.57 s (the t chart gives more than 2.57 s for 99%
confidence)
e.g. s = 2.3 °C for thermocouple, δ = 5.9 °C
2. Tabulated variable:
δ ≈ 2.57 times last reported significant digit
(e.g. ρ = 1.0 g/ml at 0 °C, δ = 0.257 g/ml)
Estimates of Error (δ) for Variable
3. Manufacturer specs:
use given accuracy data
(e.g. Pump is ± 1 ml/min, δ = 1 ml/min)
4. Variable from regression (i.e. calibration curve):
δ ≈ standard error
(e.g. Velocity from equation with std error = 2 m/s)
5. Judgment for a variable:
use judgment for δ
(e.g. graph gives pressure to ± 1 psi, δ = 1 psi)
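Rules 1 and 2 are simple enough to express as code. A minimal sketch, assuming the tabulated value is supplied as a string so that the position of its last reported digit is preserved; the function names and the string-based interface are choices made for this illustration.

```python
from decimal import Decimal

FACTOR = 2.57  # multiplier used in Rules 1 and 2

def delta_measured(s):
    """Rule 1: error estimate from a sample standard deviation s."""
    return FACTOR * s

def delta_tabulated(reported):
    """Rule 2: 2.57 times one unit of the last reported digit.

    `reported` is the tabulated value as a string, e.g. "1.0".
    """
    exponent = Decimal(reported).as_tuple().exponent  # -1 for "1.0"
    return FACTOR * 10.0 ** exponent

print(f"thermocouple, s = 2.3: delta = {delta_measured(2.3):.3f}")
print(f"density reported as 1.0: delta = {delta_tabulated('1.0'):.3f}")
```

Rules 3 to 5 take the error directly from a specification, a regression standard error, or judgment, so they need no calculation.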
Calculating Maximum or Probable Error
1. Maximum error can be calculated as shown previously:
a) Brute force method
b) Differential method
2. Probable error is more realistic: positive and negative
errors can partially cancel, which lowers the overall error. You need standard
deviations (σ or s) to calculate probable error (PE)
(i.e. see previous example). PE = δ = 2.57 s
$$s_y^2 = \sum_i \left(\frac{\partial y}{\partial x_i}\right)^2 s_{x_i}^2$$
Ψ = y ± 1.96 SQRT(s_y²) 95%
Ψ = y ± 2.57 SQRT(s_y²) 99%
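The probable-error formula sums squared sensitivity terms. The sketch below applies it to a hypothetical two-variable product y = x1·x2; the input values and standard deviations are invented for illustration only and do not come from the slides.

```python
import math

def propagated_variance(partials, std_devs):
    """s_y^2 = sum over i of (dy/dx_i)^2 * s_xi^2, for independent inputs."""
    return sum((p * s) ** 2 for p, s in zip(partials, std_devs))

# Hypothetical illustration: y = x1 * x2
x1, x2 = 3.0, 4.0
s_x1, s_x2 = 0.1, 0.2            # invented standard deviations
partials = [x2, x1]              # dy/dx1 = x2, dy/dx2 = x1

y = x1 * x2
s_y2 = propagated_variance(partials, [s_x1, s_x2])

print(f"95%: {y:.1f} +/- {1.96 * math.sqrt(s_y2):.2f}")
print(f"99%: {y:.1f} +/- {2.57 * math.sqrt(s_y2):.2f}")
```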
Calculating Maximum (Worst) Error
1. Brute force method: substitute upper and lower limits
of all x’s into function to get max and min values of y.
Range of y (Ψ ) is between ymin and ymax.
2. Differential method: from a given model
y = f(a, b, c, …; x1, x2, x3, …)
where a, b, c, … are exact constants and x1, x2, x3, … are independent variables.
Range of y (Ψ) = y ± δy
$$\delta y = \sum_i \frac{\partial y}{\partial x_i}\,\delta_i$$
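The brute force method can be sketched by evaluating the model at every combination of lower and upper limits. The function name and the two-variable example are hypothetical; note that checking only the corner points finds the true extremes when the model is monotonic in each variable over the range, which is an assumption of this sketch.

```python
from itertools import product

def brute_force_range(f, values, deltas):
    """Evaluate f at every combination of lower/upper limits; return (ymin, ymax)."""
    limits = [(x - d, x + d) for x, d in zip(values, deltas)]
    ys = [f(*corner) for corner in product(*limits)]
    return min(ys), max(ys)

# Hypothetical illustration: y = x1 * x2 with x1 = 3.0 +/- 0.1 and x2 = 4.0 +/- 0.2
y_min, y_max = brute_force_range(lambda a, b: a * b, [3.0, 4.0], [0.1, 0.2])
print(f"y lies between {y_min:.2f} and {y_max:.2f}")
```

Unlike the differential method, the resulting range need not be symmetric about the nominal value.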
Example 4: Differential method
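m = ρ A v, written as y = x1 x2 x3
x1 = ρ = 2.0 g/cm³ (table), δ1 = 0.257 g/cm³ (Rule 2)
x2 = A = 3.4 cm² (measured avg), δ2 = 0.2 cm² (Rule 1)
x3 = v = 2 cm/s (calibration), δ3 = 0.1 cm/s (Rule 4)
$$\delta y = \sum_i \frac{\partial y}{\partial x_i}\,\delta_i$$
$$\frac{\partial y}{\partial x_1} = x_2 x_3 = A v = 6.8\ \text{cm}^3/\text{s}$$
$$\frac{\partial y}{\partial x_2} = x_1 x_3 = \rho v = 4.0\ \text{g}/(\text{cm}^2\,\text{s})$$
$$\frac{\partial y}{\partial x_3} = x_1 x_2 = \rho A = 6.8\ \text{g}/\text{cm}$$
y = (2.0)(3.4)(2) = 13.6 g/s
δy = (6.8)(0.257) + (4.0)(0.2) + (6.8)(0.1) = 3.2 g/s
Ψ = 13.6 ± 3.2 g/s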
Which product term contributes the most to uncertainty?
This method works only if errors are symmetrical
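Example 4 can be reproduced term by term, which also shows how each product term compares. A minimal sketch using the slide's values; taking the absolute value of each partial derivative is an assumption added here for the worst-case sum (it changes nothing in this example, where all derivatives are positive), and the variable names are illustrative.

```python
rho, A, v = 2.0, 3.4, 2.0                      # g/cm3, cm2, cm/s
deltas = {"rho": 0.257, "A": 0.2, "v": 0.1}    # Rules 2, 1, and 4

# Partial derivatives of y = rho * A * v
partials = {"rho": A * v, "A": rho * v, "v": rho * A}

# Contribution of each variable: |dy/dx_i| * delta_i (abs() is an assumption)
terms = {name: abs(partials[name]) * deltas[name] for name in deltas}

y = rho * A * v
dy = sum(terms.values())
largest = max(terms, key=terms.get)

for name, value in terms.items():
    print(f"{name}: {value:.2f} g/s")
print(f"y = {y:.1f} +/- {dy:.1f} g/s; largest term: {largest}")
```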