Part-III Treatment of Data

OVERVIEW

(1) Units of measurement
(a) Units must be indicated in tables and graphs.
(b) Use scientific notation.

Examples:
2.2 µV = $2.2 \times 10^{-6}$ V
71 mm = $7.1 \times 10^{-2}$ m
83.4 kJ = $8.34 \times 10^{4}$ J
0.21 MW = $2.1 \times 10^{5}$ W
73 pF = $7.3 \times 10^{-11}$ F

(2) Tabulation of Data
Whether one is measuring the distance to the moon using laser interferometry, the soil strength using a penetrometer, or bond angles/lengths using XRD, carefully made (and recorded) observations are the cornerstone of good experimental work.

(i) Repeat measurements
These are made even when the quantity (or the value) is NOT varying. Examples: timing the fall of an object through a given distance, or measuring the wavelength of light emitted from a lamp containing helium gas.

(ii) Relationship between two variables: $y = f(x)$
Assign different values to $x$ as $x_1, x_2, \ldots$ and measure $y_1, y_2, \ldots$

Table: Fall times (t) for an object to fall in air through 25 m (ambient temperature 298 K, 3 pm, 17-09-2013).
t (s): 2.2  2.0  2.6  2.1  1.9  2.2  2.4  2.0  2.3  2.3

(3) Use Scientific Notation
For example, 2 mg = $2 \times 10^{-3}$ g = $2 \times 10^{-6}$ kg.

Pressure values (Pa): $1.03 \times 10^{5}$, $1.01 \times 10^{5}$, $9.9 \times 10^{4}$, $9.83 \times 10^{4}$, $1.01 \times 10^{5}$, $1.05 \times 10^{5}$

Table: Measured values of pressure.
Pressure ($\times 10^{5}$ Pa): 1.03  1.01  0.99  0.983  1.01  1.05
or
Pressure (kPa): 103  101  99  98.3  101  105

(4) Uncertainties in measurements
Despite our best efforts and the quality of our instruments, there is always "variability" or "uncertainty" in experimental data, for example from the graduation of a thermometer.
∴ There is no such thing as an "exact" measurement. (One can only do the best possible!)

The fluctuating level of the mercury in a thermometer gives an uncertainty of about ± 0.5 °C, so a reading is reported as 20 ± 0.5 °C. What does it mean? It means that the temperature could be anything from 19.5 °C to 20.5 °C, with 20 °C as the most probable value!

One must always include uncertainty estimates.

Table: Dependence of electrical resistance on temperature of a copper wire (Kirkup: Experimental Methods, Wiley 1994).
T (K) ± 0.5 K:                          281    289.5  296.5  305.0  313.5  327.5
Electrical resistance (Ω) ± 0.001 Ω:    0.208  0.213  0.222  0.229  0.232  0.243

OTHER FACTORS
Compounding of uncertainties?
Does uncertainty depend upon the magnitude of the measurement?

(5) Significant Figures
If a value is recorded as 5.12, your experiment is able to distinguish between 5.11 and 5.13. Similarly, 5.123 implies that it can distinguish between 5.122 and 5.124.
5.12: 3 significant digits
5.123: 4 significant digits

Resist the temptation to record all the figures given by an instrument display, a calculator or a computer.

Let us look at the value 0.0020409. How many significant figures are there? Count the digits between the first non-zero digit and the last digit, inclusive: five significant figures.

Exercise
2.564: four
0.00489: three
64000: ?
1.20: three
1.2: two
0.20000878: eight

Rounding off Numbers
In calculations of derived variables, rounding off is needed. How to do it? Take 1.8671132:
Reduce to 2 SF: 1.8 or 1.9? It is 1.9, because the first dropped digit (6) is 5 or more.
Reduce to 3 SF: 1.87
Reduce to 4 SF: 1.867

CALCULATION & SIGNIFICANT FIGURES
Let us say, for a cylinder, D = 7.9 mm and L = 1.5 mm.

Cross-sectional area:
$A = \frac{\pi D^2}{4} = \frac{\pi (7.9)^2}{4} = 49.016699\ \text{mm}^2$

Is this sensible? Obviously not. Why? If D is known to 2 significant figures, the area cannot be calculated to 8 significant figures!!
∴ $A = 49\ \text{mm}^2$

Volume of cylinder:
$V = \frac{\pi D^2}{4} L = \frac{\pi}{4} \times (7.9)^2 \times 1.5 = 73.525049\ \text{mm}^3$
∴ $V = 74\ \text{mm}^3$ (2 significant figures)

Guidelines for rounding off

Rule # 1: Addition/Subtraction
Round off the result to the least number of decimal places found among the constituents.
Ex. 11.39 - 7.897 + 12.3538 = 15.8468. The correct answer is 15.85.

Rule # 2: Multiplication/Division
The number of significant digits in the answer = least number of significant digits in the primary numbers.
Ex. 3.17 × 3.393 × 3.3937 = 36.501992 (8 significant figures). The least precise factor has 3 significant figures, so the answer is 36.5.

CAVEAT
Using the rounded area, Volume = Area × L = 49 × 1.5 = 73.5 mm³, which already differs from the 73.525 mm³ obtained with the unrounded area. Rounding at every step lets such errors accumulate.
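The rounding rules above can be tried out numerically. The following is a minimal sketch rather than part of the original notes: `round_sig` is a hypothetical helper name, and it assumes that Python's scientific-notation formatting is an acceptable way to keep a given number of significant figures.

```python
import math


def round_sig(value: float, sig_figs: int) -> float:
    """Round value to sig_figs significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Scientific notation with (sig_figs - 1) decimals keeps exactly sig_figs digits.
    return float(f"{value:.{sig_figs - 1}e}")


# Rounding 1.8671132 to 2, 3 and 4 significant figures.
print(round_sig(1.8671132, 2), round_sig(1.8671132, 3), round_sig(1.8671132, 4))

# Cross-sectional area of the cylinder with D = 7.9 mm (2 significant figures).
area = math.pi * 7.9**2 / 4
print(area, "->", round_sig(area, 2))

# Rule 1 (addition/subtraction): keep the least number of decimal places (2 here).
rule1 = round(11.39 - 7.897 + 12.3538, 2)
print(rule1)

# Rule 2 (multiplication/division): keep the least number of significant figures (3 here).
rule2 = round_sig(3.17 * 3.393 * 3.3937, 3)
print(rule2)
```

The full-precision value is carried through each calculation and rounded only once, at the end.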
Do not round off the intermediate results.

PROBLEM
In an experiment, the density (ρ) of a metallic sphere is to be measured. Density is defined as mass per unit volume, i.e., $\rho = m/V$, with
$V = \frac{\pi d^3}{6}$

The lab notebook has the following entries:
Mass of sphere = 0.44 g
Diameter of sphere = 4.76 mm

$V = \frac{\pi d^3}{6} = \frac{\pi (4.76)^3}{6} = 56.470220\ \text{mm}^3$
∴ $\rho = \frac{0.44}{56.470220} = 7.791718 \times 10^{-3}\ \text{g/mm}^3$

Does it make sense?

PRESENTATION OF DATA (Graphs, bar/pie charts)

(1) Purpose
Graphs: our visual ability to detect trends is far stronger with a graph than with tabulated information.

x:  9.3   7.3   9.8  1.8  5.3
y:  19.0  15.0  20   4    11

A graph/plot can be used to show:
a) Range of data
b) Uncertainty in each measurement
c) Trend or the absence of a trend
d) Data which do not fall in line with the majority of data points.

(2) DEPENDENT/INDEPENDENT VARIABLES
2-D plots:
Horizontal axis: independent variable
Vertical axis: dependent variable

Independent variable: one which is controlled or varied systematically.
Example: P is the independent variable (P can be varied through a high-pressure gas line) and Q is the dependent variable.

Example: Effect of temperature on solubility of salt in water
Amount of water = 1 m³
Mass of salt which can be dissolved = M kg
∴ solubility, S = M/1 (kg/m³)

T (K) (independent variable):     298  315  325
S (kg/m³) (dependent variable):   S1   S2   S3

Example: Effect of temperature on the length of a wire
T (K)   L (m)
273     1.1155
298     1.1164
323     1.1170
348     1.1172
373     1.1180
398     1.1190
423     1.1199
448     1.1210
473     1.1213
498     1.1223
523     1.1223

(3) ORIGINS
Is it necessary to have a (0,0) point on both axes?
[Figure: the T-L data re-plotted.]
This is not a good graph! Why?

(4) Error bars
These show the uncertainties in both variables for any (x, y) data point:
For the independent variable: uncertainty in the x-value.
For the dependent variable: uncertainty in the y-value.

Cooling of an object
Time (s) ± 5 s:            10   70   125  190  260  320  370
Temperature (°C) ± 4 °C:   125  116  104  94   87   76   72

NOTE 1: If uncertainties are not constant, the size of the error bars must also vary.
NOTE 2: The % error is not the same for each data point.

(5) Types of Graphs
• Linear x-y graphs
• Semi-log graphs (one scale is linear and one is logarithmic)
• Log-log graphs (both scales are logarithmic)

(6) Linear x-y plots
Dependence of relative density on sugar concentration at 298 K.

Concentration (kg/m³):   0      50     100    150    200    250
Relative density (-):    1.005  1.034  1.066  1.095  1.122  1.150

Draw the best line by naked eye?
For RD = 1.053, calculate the sugar concentration.

CAVEATS
(1) Do not ignore outliers; investigate them further.
(2) Interpolation of results is justified only when there is a sufficient number of data points.
(3) Extrapolation should always be avoided.

What are the two characteristics of a linear graph/plot?
$y = mx + C$, where m is the slope and C is the intercept.

For two data points $(x_1, y_1)$ and $(x_2, y_2)$, one can draw only one line, and this is also the best line:
$m = \frac{y_2 - y_1}{x_2 - x_1}$ and C = value of y at x = 0.

But when we have a large number of data points $(x_i, y_i)$ (say i = 1 to 20), which pair of data points should be used to calculate the values of m and C?

BEST FIT
What are the uncertainties in the best-fit values of m and C?

Example
T (h)   Size (mm)
0.5     1.5 ± 0.3
1       2.3 ± 0.3
1.5     3.3 ± 0.3
2       4.3 ± 0.3
2.5     5.4 ± 0.3

Linearization of Equations
This relies on prior knowledge about the expected form of the dependence. The original data might exhibit a non-linear relationship. Transform one of the variables in such a fashion that the transformed variable leads to a linear relationship.
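Returning to the best-fit question raised for the size-versus-time example above: one common answer, assumed here rather than prescribed by these notes, is to use all the points at once through an ordinary least-squares fit. The sketch below uses a hypothetical helper `fit_line` and ignores the ± 0.3 mm error bars, which is reasonable only because they are the same for every point.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (m, C) of the unweighted least-squares line y = m*x + C."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sums of products of deviations from the means.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    m = s_xy / s_xx
    c = y_mean - m * x_mean
    return m, c


# Size-versus-time data from the example table.
time_h = [0.5, 1.0, 1.5, 2.0, 2.5]
size_mm = [1.5, 2.3, 3.3, 4.3, 5.4]

m, c = fit_line(time_h, size_mm)
print(f"m = {m:.2f} mm/h, C = {c:.2f} mm")
```

Every data point contributes to m and C, so no single pair of points has to be singled out. The same function can be applied to any data set once it has been transformed to a linear form.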
Period of oscillation (T) vs. mass of the body (M)

M (kg)   T (s)
0.02     0.7
0.05     1.11
0.10     1.6
0.20     2.25
0.30     2.76
0.40     3.18
0.50     3.58
0.60     3.97
0.70     4.16
0.80     4.60

The dependence is not linear. Let us recall our high school physics:
$T = 2\pi \sqrt{\frac{M}{k}}$ (k: spring constant) ∴ $T \propto \sqrt{M}$

$T = \frac{2\pi}{\sqrt{k}} \sqrt{M}$ has the form $y = mx + C$ with $y = T$ and $x = \sqrt{M}$. We expect C = 0.

Linear graphs offer a great advantage.

EXAMPLE-1
$s = ut + \frac{1}{2} a t^2$
The relationship between "s" and "t" is quadratic. Dividing by t:
$\frac{s}{t} = u + \frac{1}{2} a t$
which has the form $y = C + mx$ with $y = s/t$, $x = t$, $m = a/2$ and $C = u$.
a: constant acceleration
u: initial velocity at t = 0
s: distance travelled in time t
[Figure: s/t plotted against t (0 to 5) for a = 9.81 m/s² and u = 1 m/s; a straight line of slope m = a/2 and intercept C = u.]

Has linearization come at a price? We should now estimate the uncertainty in (s/t), NOT in s.

EXAMPLE-2
For a radioactive material, $N = N_0 \exp(-\lambda t)$
N: undecayed nuclei at time t
$N_0$: initial value of N at t = 0
λ: characteristic constant of the material

How to linearize it?
$\ln N = \ln N_0 + (-\lambda t) \ln e$, and $\ln e = 1$
$\ln N = (-\lambda) t + \ln N_0$
which has the form $y = mx + C$ with $m = -\lambda$ and $C = \ln N_0$.
[Figure: "semi-log" plot of ln N against t; a straight line of slope −λ and intercept ln N0.]

Now we must estimate the uncertainty in ln N.

Finally, let us come to log-log graphs.

Motivation
Sometimes, no matter what we do, it is not possible to choose suitable scales for linear graphs.

Table: Current-voltage relationship for a silicon diode
Voltage (V)   I (A)
0.35          $9 \times 10^{-7}$
0.40          $3 \times 10^{-6}$
0.45          $5 \times 10^{-5}$
0.50          $2 \times 10^{-4}$
0.55          $1.7 \times 10^{-3}$
0.60          $1.5 \times 10^{-2}$
0.65          $7.5 \times 10^{-2}$
0.70          0.55
0.75          3.5

[Figure: I plotted against V on linear axes and on semi-log axes.]

When both variables span several orders of magnitude, use double-log coordinates:
$y = a x^b$
$\ln y = \ln a + b \ln x$
which has the form $y_{new} = C + m\, x_{new}$ with $y_{new} = \ln y$, $x_{new} = \ln x$, $C = \ln a$ and $m = b$ (log-log scale).

NATURE OF UNCERTAINTIES
Uncertainty is an inevitable evil, both in experimental and numerical studies.

Let us look at a simple test: same object, constant value of s, same operator/equipment.
[Figure: object falling a distance s through water.]

Time (s): 0.74  0.71  0.73  0.63  0.69  0.75  0.70  0.71  0.74  0.81

What do you make of these measurements?
Uncertainty is an inherent part of experiments.

Two Questions
1) Identification of sources: temperature, tube not being vertical, object not being dropped at the same location, different initial conditions, air bubbles attached to it, ...
2) Quantification of the uncertainty

Practice problem
• Fill a kettle with 1 L of water and record the time it takes for the water to boil. What factors will contribute to the variability of the results?

(1) Uncertainty
(i) Single measurement:
• There is no method to establish the extent of uncertainty. Repeat the test at least one more time.
• There are situations when it is not possible to repeat a test: biology, radioactivity, CERN, on the surface of the moon, etc.
• The test itself may be varying with time.

(2) Uncertainty stems from:
(a) Resolution of instruments: What is the minimum value the instrument can measure?
Length: with 1 mm graduations, the resolution is ± 0.5 mm.
For better resolution, one can use a micrometer or Vernier callipers, but these also have their least count.
375 ± 1.2 mm → 373.8 mm ≤ true value ≤ 376.2 mm
This is what is used for a single test, i.e., the uncertainty introduced by the instrument.

(b) Reading uncertainty:
[Figure: thermometer (T) in water being heated.]
N = 0: the reading corresponds to the value at a fixed point.
N ≠ 0: the thermometer shows wild fluctuations.
So, if we make a single measurement, we cannot evaluate the uncertainty arising from the heating process.

(c) Calibration uncertainty: All instruments require benchmarking or calibration, which can change over a period of time!!

What is the way forward?
The mean or average comes in handy. Returning to our earlier example:

Time (s): 0.74  0.71  0.73  0.63  0.69  0.75  0.70  0.71  0.74  0.81
$t_{min} = 0.63$ s, $t_{max} = 0.81$ s

$\bar{t} = \frac{1}{n} \sum_{i=1}^{n} t_i = \frac{0.74 + 0.71 + 0.73 + \ldots}{10} = 0.721$ s

On average, this is the result we can expect. No. of significant figures?
What is the uncertainty in the mean value?
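The mean, minimum and maximum quoted for the ten timings can be checked with a few lines of code. This is only an illustrative sketch; the variable names are chosen here and do not come from the notes.

```python
# The ten repeat timings (in seconds) from the example above.
times_s = [0.74, 0.71, 0.73, 0.63, 0.69, 0.75, 0.70, 0.71, 0.74, 0.81]

t_min = min(times_s)
t_max = max(times_s)
t_mean = sum(times_s) / len(times_s)

print(f"t_min = {t_min} s, t_max = {t_max} s, mean = {t_mean:.3f} s")
```

The mean is printed to three decimal places only to match the value worked out by hand; how many of those figures are meaningful is exactly the question of its uncertainty.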
range (spread) = $x_{max} - x_{min}$

Uncertainty = $\frac{x_{max} - x_{min}}{n}$

For our example:
Uncertainty = $\frac{0.81 - 0.63}{10} = 0.018$ s

Now we should round off the mean value sensibly, to $x_{mean} = 0.72$ s.
∴ Probable value of x (or t in our case) = 0.72 ± 0.018 s

One can also quote the % uncertainty:
% uncertainty = $\frac{\text{uncertainty}}{\text{mean value}} \times 100 = \frac{0.018}{0.72} \times 100 = 2.5\%$

Therefore, $t_{mean} = \bar{x} = 0.72$ s with ± 2.5% uncertainty.

Without repeating measurements, one cannot estimate the uncertainty.

(3) True value, accuracy & precision
The aim of an experiment is to find the true value. But this is impossible to do. Instead, we try to approximate the true value by an average or mean value, i.e., $x_{true} \approx \bar{x}$.

How many times must we repeat the measurements? Recall
$\bar{x} = \frac{1}{n} \sum x_i$
The larger the value of n, the closer $\bar{x}$ will be to its true value.

If $\bar{x} \approx x_{true}$, our measurements are accurate.
For example, the charge of an electron is known to be $(-1.6021773 \pm 0.0000005) \times 10^{-19}$ C, i.e., an uncertainty of $3 \times 10^{-5}$ %.

What is precision?
The uncertainty (range) is small, but this does not mean that the results are accurate! How?

Let us look at an example: boiling point of water at 1 standard atmosphere.
T (°C): 102.4  102.6  102.3  102.6  102.4  102.4  102.5  102.6  102.4  102.7
$T_{mean} = \bar{x} = 102.49$ °C
Uncertainty = $\frac{102.7 - 102.3}{10} = 0.04$ °C
∴ Boiling point of water = 102.49 ± 0.04 °C

This looks very impressive in terms of precision, except that it is not very accurate!

Let us use a different thermometer (± 0.5 °C):
T (°C): 101.0  101.0  100.5  99.0  100.5  101.0  100.5  99.0  99.5
$\bar{x} = 100.2$ °C
Uncertainty = $\frac{101 - 99}{10} = 0.2$ °C
∴ Boiling point of water = 100.2 ± 0.2 °C

These are less precise, but more accurate, measurements.

In summary:
Accurate: close to the true value.
Precise: low uncertainty, but not necessarily close to the true value.
Accurate & precise: close to the true value, with a small uncertainty.

TYPES OF UNCERTAINTIES
(1) Systematic: difficult to detect and deal with.
Offset uncertainty
Melting point of ice, measured with a thermocouple in an ice + water mixture:
x (°C): -7.5, -6.9, -7.3, -7.4, -7.6, -7.4, -7.3, -7.7, -7.6, -7.6
$\bar{x} = -7.43$ °C, uncertainty = 0.08 °C

Very precise but inaccurate measurements! The true value is expected to be close to zero, so there is a big offset error here. Check your calibration, electronic gadgets, warm-up period, insensitive thermocouple, etc.

On the other hand, for a plasma furnace (~ 1500 °C), 7.5 °C is not a significant offset. Try to develop a feel for the answer you are looking for!

Gain uncertainty: This varies with the magnitude of the quantity itself.
Example: Calibration masses and electronic balance
Standard mass (g):              0.00  20.00  40.00  60.00  80.00  100.00
Electronic balance value (g):   0.00  20.18  40.70  61.00  81.12  101.68
The difference between the two values increases as the mass increases.

(2) Random uncertainties:
• These are responsible for scatter in the measurements.
• Environmental factors can also introduce random uncertainty: electrical interference (switching equipment on/off), vibrations caused by rotating machinery, fluctuations in the power supply or in the water/air/steam mains pressures, etc.
If these are truly random, averaging several measurements will even out their effect.

COMBINING UNCERTAINTIES
So far we have talked about uncertainties when we are interested in the measured quantity directly. In engineering experiments, we often need to combine several measurements to calculate the quantity of interest.

Let us say we are given a cylindrical bar (mass m, diameter D, length L) of an unknown metal and we want to calculate its density:
$\rho = \frac{\text{mass}}{\text{volume}} = \frac{m}{\frac{\pi}{4} D^2 L}$

The uncertainty in the value of ρ depends upon the uncertainties in the measured values of m, D and L:
$D \equiv D \pm \Delta D$, $L \equiv L \pm \Delta L$, $m \equiv m \pm \Delta m$, $\rho \equiv \rho \pm \Delta\rho$

One way to estimate Δρ is to evaluate ρ for every combination of the extreme values:
D + ΔD, L + ΔL, m + Δm
D + ΔD, L + ΔL, m − Δm
D + ΔD, L − ΔL, m + Δm
D + ΔD, L − ΔL, m − Δm
D − ΔD, L + ΔL, m + Δm
D − ΔD, L + ΔL, m − Δm
D − ΔD, L − ΔL, m + Δm
D − ΔD, L − ΔL, m − Δm

This looks like a lot of hard work!!
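The enumeration above is tedious by hand but short in code. The sketch below is illustrative only: the numerical values of m, D, L and their uncertainties are made up for the example (the notes give none), and taking half of the spread between the extreme values as Δρ is an assumption made here.

```python
import itertools
import math


def density(m: float, d: float, length: float) -> float:
    """Density of a cylindrical bar: rho = m / ((pi/4) * D**2 * L)."""
    return m / (math.pi / 4 * d**2 * length)


# Illustrative values only (NOT from the notes): mass in g, lengths in mm.
m, dm = 25.0, 0.1
d, dd = 10.0, 0.1
length, dl = 40.0, 0.5

rho_nominal = density(m, d, length)

# Evaluate rho at all 2**3 = 8 combinations of (value - delta, value + delta).
corners = [
    density(mi, di, li)
    for mi, di, li in itertools.product(
        (m - dm, m + dm), (d - dd, d + dd), (length - dl, length + dl)
    )
]
rho_low, rho_high = min(corners), max(corners)

# Assumption: quote half of the total spread as the uncertainty in rho.
delta_rho = (rho_high - rho_low) / 2

print(f"rho = {rho_nominal:.3e} g/mm^3 (from {rho_low:.3e} to {rho_high:.3e})")
print(f"delta rho ~ {delta_rho:.1e} g/mm^3")
```

The largest density comes from the largest mass combined with the smallest diameter and length, and the smallest density from the opposite corner.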
We can be a little smarter than this:
$\rho = \frac{m}{(\pi/4) D^2 L}$
$\ln\rho = \ln m - \ln\pi + \ln 4 - 2\ln D - \ln L$

Differentiate it:
$\frac{\Delta\rho}{\rho} = \frac{\Delta m}{m} - 2\frac{\Delta D}{D} - \frac{\Delta L}{L}$

Multiply this equation by 100 on both sides. Since the sign of each individual uncertainty is unknown, the magnitudes are added:
% uncertainty in ρ = % uncertainty in m + 2 × % uncertainty in D + % uncertainty in L

NOTE:
• The uncertainty in D is multiplied by 2.
• All terms have been added up.

STATISTICAL ANALYSIS OF DATA
Statistics is a science of numbers! It helps you draw good inferences, but it also gives you the confidence to tell lies!!
"There are three kinds of lies: lies, damned lies, and statistics." (attributed to Benjamin Disraeli & Mark Twain)
Naturally, statistics and statistical methods presuppose a "large dataset".

DEFINITIONS
Variance of a dataset
(i) $\bar{x}$ is assumed to be the best estimate of the true value of the quantity. Despite the uncertainty in each individual measurement, the single value $\bar{x}$ ≈ true value.

Example: Time to slide down an inclined plane (angle θ):
0.74, 0.74, 0.69, 0.68, 0.80, 0.71, 0.78, 0.65, 0.67, 0.73
$\bar{x} = \frac{1}{10} \sum x_i = 0.719$ s

$x_i$ (s)   $d_i = \bar{x} - x_i$ (s)   $d_i^2 = (\bar{x} - x_i)^2$ (s²)
0.74        -0.021                      0.000441
0.74        -0.021                      0.000441
0.69         0.029                      0.000841
0.68         0.039                      0.001521
0.80        -0.081                      0.006561
0.71         0.009                      0.000081
0.78        -0.061                      0.003721
0.65         0.069                      0.004761
0.67         0.049                      0.002401
0.73        -0.011                      0.000121
$\bar{x} = 0.719$, $\sum d_i = 0$, $\sum d_i^2 = 0.02089$

Variance:
$\sigma^2 = \frac{\sum d_i^2}{n} = \frac{\sum (\bar{x} - x_i)^2}{n}$
For our example, $\sigma^2 = \frac{0.02089}{10} = 0.002089$ s²

Another related parameter is the standard deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (\bar{x} - x_i)^2}{n}}$
$\sigma = \sqrt{0.002089} = 0.04571$ s

Usually σ is NOT strongly dependent on n.

Uncertainty in the mean of repeat measurements:
Let us say we have repeat data sets.
Series I: 43, 55, 53, 52, 55, 52, 51, 54, 50, 52
Series II, ..., Series VI: - (entries not shown)

Series:       I     II    III   IV    V     VI    VII   VIII
$\bar{x}$:    51    51.7  50.4  51.5  51.7  50.4  52.5  49.5
$\sigma$:     3.13  3.29  3.07  3.11  3.20  2.94  2.73  3.20

COMMENTS
• Variability in the "means" < variability in each set.
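The definitions of variance and standard deviation, and the comment about the variability of the means, can be checked numerically. The sketch below is a minimal illustration: the helper names `mean` and `std_dev` are chosen here, and the divisor n follows the definition used in this section.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of the values."""
    return sum(values) / len(values)


def std_dev(values: Sequence[float]) -> float:
    """Standard deviation with divisor n, as defined in this section."""
    x_bar = mean(values)
    variance = sum((x_bar - x) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


# Times to slide down the plane (s).
slide_times_s = [0.74, 0.74, 0.69, 0.68, 0.80, 0.71, 0.78, 0.65, 0.67, 0.73]
print(f"mean = {mean(slide_times_s):.3f} s, sigma = {std_dev(slide_times_s):.5f} s")

# Means of the eight series from the summary table.
series_means = [51, 51.7, 50.4, 51.5, 51.7, 50.4, 52.5, 49.5]
print(f"mean of means = {mean(series_means):.1f}")
print(f"spread of means = {std_dev(series_means):.2f}")
```

The spread of the eight means comes out at roughly 0.9, well below the value of about 3 found within each individual series.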
Mean of the means: $\langle \bar{x} \rangle = 51.1$
$\sigma_{\bar{x}} = \left[ \frac{\sum \left( \langle \bar{x} \rangle - \bar{x}_i \right)^2}{n} \right]^{1/2} = 0.893$
In general, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Therefore, the best estimate of x is 51.1 ± 0.893.
This, however, only reduces the role of random uncertainty & NOT of the systematic uncertainty.

If there is a sufficient number of data points without any systematic uncertainties:
[Figure: frequency (distribution) plotted against x.]
This is more or less the universal curve, encountered in virtually every application relying on numerous data points. It is called the normal distribution or bell-shaped curve.

What is so special about this curve? Two metrics are needed to describe this population:
Mean ($\bar{x}$): location of the peak
Standard deviation (σ): spread of the curve

$x = \bar{x}$: line of symmetry
The area under the curve between the limits $\bar{x} - \sigma \le x \le \bar{x} + \sigma$ ∝ number of data points lying in this range.
$\bar{x} \pm \sigma$: ∼ 68% of the total area
$\bar{x} \pm 2\sigma$: ∼ 95% of the total area
$\bar{x} \pm 6\sigma$: ∼ 99.9999% of the total area

Formal Treatment of Population & Sample
On one hand, we wish to have data which are reliable, reproducible and with as small an uncertainty as possible; on the other hand, one cannot go on making an infinite number of repeat measurements.

Let us consider a population of measurements with mean µ and standard deviation $\sigma_{pop}$:
$\mu = \frac{\sum x_i}{n}$, $\sigma_{pop} = \left[ \frac{\sum (\mu - x_i)^2}{n} \right]^{1/2}$
Evidently, µ = true value.

We would like our "sample" (a small subset of the population) to be such that the sample mean $\bar{x} \approx \mu$ and
$\sigma_{pop} \approx s = \left[ \frac{\sum (\bar{x} - x_i)^2}{n - 1} \right]^{1/2}$

Confidence Bands
If about 68% of the data lie within $\bar{x} \pm \sigma$, we can say that there is a 68% probability that the expected outcome lies within $\bar{x} \pm \sigma$. Similarly, if 95% of the data lie within $\bar{x} \pm 2\sigma$, there is a 95% probability that the expected outcome lies within $\bar{x} \pm 2\sigma$, etc.

REJECTION OF DATA
Some will argue that "all data are equal". ∴ It is not correct to throw away any data point.
The other extreme is that "one data set looks spurious or suspect" and is therefore less reliable than the other sets.
There are statistical tests to deal with this issue.
Therefore, automatic filtering by a computer program or another device should be assessed properly. The question is: "truly spurious" vs. "new phenomenon"?
Therefore, data, observations, unusual features, frequent voltage fluctuations, exceptional temperatures, etc. must all be recorded meticulously and in detail in lab notebooks.

Methodology of rejecting data
One data point strikingly disagrees with all the others.
Fall time (seconds) of an object in a liquid: 3.8, 3.5, 3.9, 3.9, 3.4, 1.8
The last value is very different from all the others!!

Recall that individual data can differ from each other within a band. However, a legitimate discrepancy of this size is highly improbable.
Data rejection is controversial, but important.

$\bar{t} = \frac{3.8 + 3.5 + 3.9 + 3.9 + 3.4 + 1.8}{6} \simeq 3.38$ s
∴ $\bar{t} = 3.4$ s, $\sigma = 0.8$ s

Our suspect measurement of 1.8 s deviates by 3.4 − 1.8 = 1.6 s, i.e., by 2σ.

Assuming a normal (Gaussian) distribution, we can calculate the probability of a measurement lying outside ± 2σ:
∴ Probability (outside 2σ) = 1 − Probability (within 2σ)
[Figure: normal curve centred on 3.4 s, with 95.45% of the area between −2σ and +2σ.]
∴ Probability (outside 2σ) = 1 − 0.9545 = 0.0455 < 0.05

Thus, there is only about a 5% chance of a measurement lying outside ± 2σ, i.e., 1 in 20 measurements could be beyond ± 2σ.
Out of 6 measurements, only 6 × 0.05 = 0.3 of a measurement is expected to be beyond ± 2σ.

Chauvenet's criterion: If this number is < 0.5, the data point can be rejected.

For N measurements $x_1, x_2, \ldots, x_N$:
$x_{sus}$: value in doubt
$t_{sus} = \frac{|x_{sus} - \bar{x}|}{\sigma}$ (number of standard deviations)
Find the probability of lying outside $t_{sus}\sigma$.
No. of expected deviants, n = N × Prob (outside $t_{sus}\sigma$)
If n < 0.5, reject the data point in question and re-calculate $\bar{x}$, σ, etc.

EXAMPLE
A student makes 10 measurements of mass (g) as follows:
46, 48, 44, 38, 45, 47, 58, 44, 45, 43
$\bar{x} = 45.8$, $\sigma = 5.1$
Our suspect is 58.
∴ $t_{sus} = \frac{58 - 45.8}{5.1} = 2.4$
i.e., our suspect deviates by 2.4σ.
Prob (outside ± 2.4σ) = 1 − Prob (inside ± 2.4σ) = 1 − 0.9836 = 0.016
∴ In a set of 10 measurements, 10 × 0.016 = 0.16 of a measurement is expected to lie outside ± 2.4σ.

Since 0.16 < 0.5, we can safely reject this data point.
New results are: $\bar{x} = 44.4$, $\sigma = 2.9$
There is not much change in $\bar{x}$, but σ has dropped significantly.
Remember, the choice of n < 0.5 is arbitrary.

CONCLUDING REMARKS
• Units of measurement, scientific notation, their representation, uncertainties, significant figures.
• Presentation of data (graphs, tables, bar/pie charts).
• Uncertainties: systematic/random.
• Statistical analysis of data: linear regression/nonlinear regression, adequacy of fit, R², etc.
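Chauvenet's criterion, as applied to the ten mass measurements in the rejection example, can be sketched in code. This is a hedged illustration, not a prescribed implementation: the function names are hypothetical, the normal-distribution probability is obtained from the error function, and the sample standard deviation (divisor n − 1) is assumed because it matches the quoted values σ = 5.1 and σ = 2.9.

```python
import math
import statistics
from typing import Sequence


def prob_outside(t: float) -> float:
    """Probability that a normal variate lies more than t sigma from the mean."""
    return 1.0 - math.erf(t / math.sqrt(2.0))


def expected_deviants(values: Sequence[float], suspect: float) -> float:
    """Expected number of readings at least as deviant as `suspect`: n = N * Prob."""
    x_bar = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (assumed, divisor n - 1)
    t_sus = abs(suspect - x_bar) / sigma
    return len(values) * prob_outside(t_sus)


masses_g = [46, 48, 44, 38, 45, 47, 58, 44, 45, 43]
suspect = 58

n_expected = expected_deviants(masses_g, suspect)
print(f"expected number of such readings: {n_expected:.2f}")

if n_expected < 0.5:  # the 0.5 threshold is the arbitrary choice noted above
    kept = [x for x in masses_g if x != suspect]
    new_mean = statistics.mean(kept)
    new_sigma = statistics.stdev(kept)
    print(f"after rejection: mean = {new_mean:.1f} g, sigma = {new_sigma:.1f} g")
```

The decision to reject remains a judgment call; the code only automates the arithmetic, so the lab-notebook record of the suspect reading should be kept either way.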