Download d - Unit Operations Lab @ Brigham Young University

Statistical Methods For UO Lab – Part 1
Calvin H. Bartholomew
Chemical Engineering
Brigham Young University

Background
• Statistics is the science of problem-solving in the presence of variability (Mason 2003).
• Statistics enables us to:
  • Assess the variability of measurements
  • Avoid bias from unconsidered causes of variation
  • Determine the probabilities of factors and risks
  • Build good models
  • Obtain the best estimates of model parameters
  • Improve the chances of making correct decisions
  • Make the most efficient and effective use of resources

Some U.S. Cultural Statistics
• 58.4% of us have called in sick to work when we weren't.
• 3 out of 4 of us store our dollar bills in rigid order, with singles leading up to higher denominations.
• 50% admit they regularly sneak food into movie theaters to avoid the high prices of snack foods.
• 39% of us peek in our host's bathroom cabinet.
• 17% have been caught by the host.
• 81.3% would tell an acquaintance to zip his pants.
• 29% of us ignore RSVPs.
• 35% give to charity at least once a month.
• 71.6% of us eavesdrop.

Population vs. Sample Statistics
Population statistics
• Characterize the entire population, which is generally the unknown information we seek
• Mean generally designated $\mu$
• Variance and standard deviation generally designated $\sigma^2$ and $\sigma$, respectively
Sample statistics
• Characterize a random, hopefully representative, sample: typically the data from which we infer population statistics
• Mean generally designated $\bar{x}$
• Variance and standard deviation generally designated $s^2$ and $s$, respectively

Point vs. Model Estimation
Point estimation
• Characterizes a single, usually global, measurement
• Generally requires simple mathematical and statistical analysis
• Procedures are unambiguous
Model development
• Characterizes a function relating dependent and independent variables
• Complexity of parameter estimation and statistical analysis depends on model complexity
• Parameter estimation, and especially the statistics, are somewhat ambiguous

Overall Approach
• Use sample statistics to estimate population statistics
• Use statistical theory to indicate the accuracy with which the population statistics have been estimated
• Use linear or nonlinear regression methods and statistics to fit data to a model and to determine goodness of fit
• Use trends indicated by theory to optimize experimental design

Sample Statistics
• Estimate properties of the probability density function (PDF), i.e., the mean and standard deviation, using Gaussian statistics
• Use the Student t-test to determine the variance and confidence interval
• Estimate random errors in the measurement of data
• For variables that are geometric functions of several basic variables, use the propagation-of-errors approach to estimate (a) the probable error (PE) and (b) the maximum possible error (MPE)
• PE and MPE can be estimated by the differential method; MPE can also be estimated by the brute force method
• Determine systematic errors (bias)
• Compare estimated errors from measurements with calculated errors from statistics; this reveals whether the measurement method or the quantity of data is limiting

Random Error: Single Variable (e.g., T)
Questions: Several measurements are obtained for a single variable (e.g., temperature T).
• What is the true value?
• How confident are you?
• Is the value different on different days?
Some definitions:
• $\bar{x}$ = sample mean, $s$ = sample standard deviation
• $\mu$ = exact mean, $\sigma$ = exact standard deviation
As the sample becomes larger, $\bar{x} \to \mu$, $s \to \sigma$, and the t chart approaches the z chart. This is not valid if bias exists (e.g., the calibration is off).
How do you determine the bounds of $\mu$?
• Let's assume a "normal" (Gaussian) distribution.
• Small sample: only $s$ is known (t chart); we'll pursue this approach.
  $\bar{x} = \dfrac{\sum_i x_i}{n}, \quad s = \left[\dfrac{\sum_i (x_i - \bar{x})^2}{n - 1}\right]^{1/2}$
• Large sample ($n > 30$): $\sigma$ is assumed known and taken as $\sigma \approx s = \left[\dfrac{\sum_i (x_i - \bar{x})^2}{n - 1}\right]^{1/2}$; use z tables for this approach.

Example 1
n   Temp
1   40.1
2   39.2
3   43.2
4   47.2
5   38.6
6   40.4
7   37.7

$\bar{x} = \dfrac{40.1 + 39.2 + 43.2 + 47.2 + 38.6 + 40.4 + 37.7}{7} = 40.9$

$s^2 = \dfrac{(40.1 - 40.9)^2 + (39.2 - 40.9)^2 + (43.2 - 40.9)^2 + (47.2 - 40.9)^2 + (38.6 - 40.9)^2 + (40.4 - 40.9)^2 + (37.7 - 40.9)^2}{7 - 1} = 10.7, \quad s = 3.27$

Properties of a Normal PDF
• About 68.27%, 95.45%, and 99.73% of the data lie within 1, 2, and 3 standard deviations of the mean, respectively.
• When the mean is zero and the standard deviation is 1, it is referred to as a standard normal distribution.
• It plays a fundamental role in statistical analysis because of the Central Limit Theorem.

Central Limit Theorem
• The distribution of means calculated from a large data set is approximately normal
• The approximation becomes more accurate with a larger number of samples
• The sample mean approaches the true mean as $n \to \infty$
• Assumes distributions are not peaked close to a boundary and variances are finite
$Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Student t-Distribution Probability Density
• Widely used in hypothesis testing and in determining confidence intervals
• Equivalent to the normal distribution for large sample sizes
• "Student" is a pseudonym, not an adjective; the actual author was W. S. Gosset, who published in the early 1900s.
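The normal-PDF percentages above, and the claim that the t distribution becomes equivalent to the normal for large samples, can both be checked with a minimal Python sketch that uses only the standard library (`math.erf`, `math.lgamma`). The function names are illustrative, and the t density is written out from its standard closed form rather than taken from a statistics package.

```python
import math


def normal_pdf(x):
    """Standard normal density (mean 0, standard deviation 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def t_pdf(x, nu):
    """Student t density with nu degrees of freedom.

    The gamma-function ratio is computed with math.lgamma so it does not
    overflow for large nu.
    """
    log_ratio = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    norm = math.exp(log_ratio) / math.sqrt(nu * math.pi)
    return norm * (1.0 + x * x / nu) ** (-(nu + 1) / 2)


def normal_coverage(k):
    """Fraction of a normal population within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within +/- {k} sigma: {100 * normal_coverage(k):.2f}%")

# Heavier tails than the normal for small nu; converges as nu grows
for nu in (1, 5, 30, 1000):
    print(f"nu = {nu:4d}: density at 0 = {t_pdf(0, nu):.4f}, at 3 = {t_pdf(3, nu):.4f}")
print(f"normal    : density at 0 = {normal_pdf(0):.4f}, at 3 = {normal_pdf(3):.4f}")
```

The coverage lines print 68.27%, 95.45%, and 99.73%. For small nu the t density is lower at the center and higher in the tails than the normal, which is why larger t values are needed for the same confidence with few data points.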
[Figure: Student t-distribution probability density versus value of random variable]

Student t-Distribution Quantile
• Used to compute confidence intervals according to $\mu = \bar{x} \pm \dfrac{t\,s}{\sqrt{n}}$
• Assumes the mean and variance are estimated by sample values
• The value of t decreases with increasing degrees of freedom (DOF), i.e., number of data points n, and increases with increasing % confidence
[Figure: value of t versus degrees of freedom for 90%, 95%, and 99% confidence intervals]

Student t-test (determine error from s)
$\mu = \bar{x} \pm \dfrac{t\,s}{\sqrt{n}}$, where $t = f\left(\dfrac{\alpha}{2}, \nu\right)$, $\alpha = 1 - \text{probability}$, $\nu = n - 1$
error $= t\,s / n^{0.5}$
For 90% confidence, 5% of the area lies in each tail ($\alpha/2 = 0.05$).
e.g., from Example 1: n = 7, s = 3.27
Prob.   α/2    t       t s/n^0.5
90%     0.05   1.943   2.40

Values of Student t Distribution
• Depend on both the confidence level desired and the amount of data.
• Degrees of freedom are n − 1, where n = number of data points (assumes the mean and variance are estimated from the data).
• This table assumes a two-tailed distribution of area.

        Two-tailed confidence level
df      90%       95%       99%
1       6.31375   12.7062   63.6567
2       2.91999   4.30265   9.92484
3       2.35336   3.18245   5.84091
4       2.13185   2.77645   4.60409
5       2.01505   2.57058   4.03214
6       1.94318   2.44691   3.70743
7       1.89457   2.36458   3.49892
8       1.85954   2.30598   3.3551
9       1.83311   2.26214   3.24968
10      1.81246   2.22813   3.16918
11      1.79588   2.20098   3.10575
12      1.78229   2.17881   3.0545
13      1.77093   2.16037   3.01225
14      1.76131   2.14479   2.97683
15      1.75305   2.13145   2.9467
16      1.74588   2.1199    2.92077
17      1.73961   2.10982   2.89822
18      1.73406   2.10092   2.87844
19      1.72913   2.09302   2.86093
20      1.72472   2.08596   2.84534
21      1.72074   2.07961   2.83136
22      1.71714   2.07387   2.81875
23      1.71387   2.06866   2.80733
24      1.71088   2.0639    2.79694
25      1.70814   2.05954   2.78743
inf     1.64486   1.95997   2.57583

Example 2
• Five data points with a sample mean and standard deviation of 713.6 and 107.8, respectively.
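Before working Example 2 by hand, the interval $\mu = \bar{x} \pm t\,s/\sqrt{n}$ from the Student t-test slide can be sketched in Python. This is a minimal sketch, not a general tool: the t values are hard-coded from the table because Python's standard library has no t quantile function (with SciPy installed, `scipy.stats.t.ppf(1 - alpha/2, df)` returns the same two-tailed values). `T_TABLE` and `t_half_width` are illustrative names.

```python
import math
import statistics

# Two-tailed Student t values copied from the t table (subset only).
# Outer key: degrees of freedom nu = n - 1; inner key: confidence level.
T_TABLE = {
    4: {0.90: 2.13185, 0.95: 2.77645, 0.99: 4.60409},
    6: {0.90: 1.94318, 0.95: 2.44691, 0.99: 3.70743},
}


def t_half_width(s, n, confidence):
    """Half-width t*s/sqrt(n) of the interval mu = x_bar +/- t*s/sqrt(n).

    Raises KeyError if nu = n - 1 or the confidence level is not tabulated.
    """
    t = T_TABLE[n - 1][confidence]
    return t * s / math.sqrt(n)


# Example 1: start from the raw temperature readings
temps = [40.1, 39.2, 43.2, 47.2, 38.6, 40.4, 37.7]
xbar = statistics.mean(temps)
s = statistics.stdev(temps)  # divides by n - 1, matching the slide's s
err1 = t_half_width(s, len(temps), 0.90)
print(f"Example 1: mean = {xbar:.1f}, s = {s:.2f}, 90% CI = {xbar:.1f} +/- {err1:.2f}")

# Example 2: only the summary statistics are given
err2 = t_half_width(107.8, 5, 0.95)
print(f"Example 2: 95% CI = 713.6 +/- {err2:.1f}")
```

Running it prints 40.9 +/- 2.40 for Example 1 and 713.6 +/- 133.9 for Example 2, matching the hand calculations.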
• The estimated population mean and 95% confidence interval (from the previous table, t = 2.77645 for 4 degrees of freedom) is:
$\mu = \bar{x} \pm \dfrac{t\,s}{\sqrt{n}} = 713.6 \pm \dfrac{107.8 \times 2.77645}{\sqrt{5}} = 713.6 \pm 133.9$

Example 3: Comparing Averages
Day 1: $\bar{x} = 40.9$, $s_x = 3.27$, $n_x = 7$
Day 2: $\bar{y} = 37.2$, $s_y = 2.67$, $n_y = 9$
What is your confidence that $\mu_x \neq \mu_y$?
$t_{n_x + n_y - 2} = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}}\,\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} \approx 2.5$
Result: roughly 99% confident the means are different (about 1% confident they are the same), reading the test one-tailed ($\mu_x > \mu_y$). For $n_x + n_y - 2 = 14$ degrees of freedom, $t \approx 2.5$ lies between the tabulated two-tailed 95% value (2.14479) and 99% value (2.97683), so the two-tailed confidence is between 95% and 99%.

Error Propagation: Multiple Variables
Obtain a value (e.g., from a model) using multiple input variables. What is the uncertainty of your value? Each input variable has its own error.
• Example: How much ice cream do you buy for the AIChE event? Ice cream = f(time of day, tests, …)
• Example: You take measurements of $\rho$, $A$, and $v$ to determine $m = \rho A v$. What is the range of m and its associated uncertainty?

Value and Uncertainty
• Managers use values to make decisions, so the uncertainty of a value must be specified
• The ethics and societal impact of values are important
• How do you determine the uncertainty of a value?
Sources of uncertainty:
1. Estimation: we guess!
2. Discrimination: device accuracy (single data point)
3. Calibration: may not be exact (error of curve fit)
4. Technique: e.g., measuring ID rather than OD
5. Constants and data: not always exact!
6. Noise: which reading do we take?
7. Model and equations: e.g., ideal gas law vs. real gas
8. Humans: transposing, …

Estimates of Error (δ) for Input Variables (Methods or Rules)
1. Measured variable (as we just did): measure multiple times and obtain s; δ ≈ 2.57 s (for 99% confidence the t chart gives t ≥ 2.57 at any sample size). E.g., s = 2.3 °C for a thermocouple gives δ ≈ 5.9 °C.
2. Tabulated variable: δ ≈ 2.57 times the last reported significant digit (e.g., ρ = 1.0 g/ml at 0 °C, δ = 0.257 g/ml).
3. Manufacturer specs: use the given accuracy data (e.g., a pump is ± 1 ml/min, δ = 1 ml/min).
4. Variable from regression (e.g., a calibration curve): δ ≈ standard error (e.g., velocity from an equation with a standard error of 2 m/s).
5. Judgment for a variable: use judgment for δ (e.g., a graph gives pressure to ± 1 psi, δ = 1 psi).

Calculating Maximum or Probable Error
1. Maximum error can be calculated, as noted previously, by:
   a) Brute force method
   b) Differential method
2. Probable error is more realistic: positive and negative errors can partially cancel, lowering the overall error. You need standard deviations (s or σ) to calculate the probable error (PE) (see the previous example): PE = δ = 2.57 s.
$\sigma_y^2 = \sum_i \left(\dfrac{\partial y}{\partial x_i}\right)^2 \sigma_{x_i}^2$
$\Psi = y \pm 1.96\sqrt{\sigma_y^2}$ (95%)
$\Psi = y \pm 2.57\sqrt{\sigma_y^2}$ (99%)

Calculating Maximum (Worst) Error
1. Brute force method: substitute the upper and lower limits of all x's into the function to get the maximum and minimum values of y. The range of y (Ψ) is between $y_{min}$ and $y_{max}$.
2. Differential method: from a given model $y = f(a, b, c, \ldots, x_1, x_2, x_3, \ldots)$, where $a, b, c, \ldots$ are exact constants and $x_1, x_2, x_3, \ldots$ are independent variables:
$\Psi = y \pm \delta y, \quad \delta y = \sum_i \left|\dfrac{\partial y}{\partial x_i}\right| \delta_i$

Example 4: Differential method
$m = \rho A v$, so $y = m$ with $x_1 = \rho$, $x_2 = A$, $x_3 = v$:
• $x_1 = \rho = 2.0$ g/cm³ (table), $\delta_1 = 0.257$ g/cm³ (Rule 2)
• $x_2 = A = 3.4$ cm² (measured average), $\delta_2 = 0.2$ cm² (Rule 1)
• $x_3 = v = 2$ cm/s (calibration), $\delta_3 = 0.1$ cm/s (Rule 4)
$\dfrac{\partial y}{\partial x_1} = x_2 x_3 = A v = 6.8$ cm³/s
$\dfrac{\partial y}{\partial x_2} = x_1 x_3 = \rho v = 4.0$ g/(cm²·s)
$\dfrac{\partial y}{\partial x_3} = x_1 x_2 = \rho A = 6.8$ g/cm
$y = (2.0)(3.4)(2) = 13.6$ g/s
$\delta y = (6.8)(0.257) + (4.0)(0.2) + (6.8)(0.1) = 3.2$ g/s
Ψ = 13.6 ± 3.2 g/s
Which product term contributes the most to the uncertainty?
This method works only if errors are symmetrical.
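Example 4 can be checked with a short Python sketch that applies both maximum-error methods and, under an extra assumption, the probable-error formula. The helper names (`mass_flow`, `partials`) are hypothetical. The probable-error line ASSUMES every δ_i equals 2.57 σ_i, so that 2.57 σ_y is the root-sum-square of the terms |∂y/∂x_i| δ_i; this is only approximate for δ_3, which came from Rule 4 (standard error), so treat that number as illustrative.

```python
import itertools
import math


def mass_flow(rho, area, vel):
    """Model y = m = rho * A * v (g/s for g/cm^3, cm^2, cm/s)."""
    return rho * area * vel


def partials(rho, area, vel):
    """Analytic partial derivatives dy/dx_i of y = rho * A * v."""
    return {"rho": area * vel, "area": rho * vel, "vel": rho * area}


# Nominal values x_i and error estimates delta_i from Example 4
nominal = {"rho": 2.0, "area": 3.4, "vel": 2.0}  # g/cm^3, cm^2, cm/s
delta = {"rho": 0.257, "area": 0.2, "vel": 0.1}  # Rule 2, Rule 1, Rule 4

y = mass_flow(**nominal)
dydx = partials(**nominal)

# Differential (maximum) error: delta_y = sum_i |dy/dx_i| * delta_i
terms = {k: abs(dydx[k]) * delta[k] for k in nominal}
max_err = sum(terms.values())

# Brute force: evaluate y at every combination of lower/upper limits
corners = []
for signs in itertools.product((-1, 1), repeat=len(nominal)):
    shifted = {k: nominal[k] + sgn * delta[k] for k, sgn in zip(nominal, signs)}
    corners.append(mass_flow(**shifted))

# Probable (root-sum-square) error, ASSUMING every delta_i = 2.57 sigma_i
prob_err = math.sqrt(sum(term ** 2 for term in terms.values()))

print(f"y = {y:.1f} g/s")
print(f"differential max error: +/- {max_err:.1f} g/s")
print(f"brute force range: {min(corners):.1f} to {max(corners):.1f} g/s")
print(f"probable error (99%, assumed): +/- {prob_err:.1f} g/s")
print("largest contributor:", max(terms, key=terms.get))
```

The density term (6.8 × 0.257 ≈ 1.75 g/s) is the largest contributor. The brute-force range, about 10.6 to 17.1 g/s, is not symmetric about 13.6 g/s because ρAv is a product (nonlinear in its inputs), whereas the differential method is a first-order, symmetric estimate.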
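Returning to Example 3 (comparing averages): a minimal Python sketch of the pooled two-sample t statistic, assuming, as the formula does, that both days share the same population variance. The critical values are copied from the t table for 14 degrees of freedom; `pooled_t` and `T_CRIT_14` are illustrative names.

```python
import math


def pooled_t(xbar, sx, nx, ybar, sy, ny):
    """Two-sample t statistic with a pooled variance estimate.

    Assumes both populations share the same variance (the pooled formula
    from Example 3). Returns (t, degrees of freedom).
    """
    dof = nx + ny - 2
    pooled_var = ((nx - 1) * sx**2 + (ny - 1) * sy**2) / dof
    t = (xbar - ybar) / (math.sqrt(pooled_var) * math.sqrt(1 / nx + 1 / ny))
    return t, dof


t, dof = pooled_t(40.9, 3.27, 7, 37.2, 2.67, 9)

# Two-tailed critical values for 14 degrees of freedom, from the t table
T_CRIT_14 = {0.90: 1.76131, 0.95: 2.14479, 0.99: 2.97683}

print(f"t = {t:.2f} with {dof} degrees of freedom")
for conf, t_crit in T_CRIT_14.items():
    verdict = "means differ" if abs(t) > t_crit else "difference not shown"
    print(f"{conf:.0%} two-tailed (t_crit = {t_crit}): {verdict}")
```

The statistic comes out at about 2.50, which exceeds the 90% and 95% two-tailed critical values but not the 99% one.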
