DATA ANALYSIS

1. Error

1. A. Error is always present

Scientific experiments are carried out to measure quantities of interest and to develop and test theories. Error is present in all experiments and prevents one from obtaining the "true value" of any measurable quantity. Although the true value of a quantity is unknowable due to error, well-defined bounds can be placed on experimental uncertainty.

1. B. Terminology about errors

Systematic error: reproducible inaccuracy (always the same sign and magnitude); can be discovered and corrected in principle.
Random error: indeterminate fluctuations (positive and negative); can be reduced by averaging independent measurements.
Accuracy: nearness to "truth"; depends on how well systematic errors are controlled or compensated for.
Precision: reproducibility; depends on how well random error can be overcome.

Pictorial example (figure not reproduced): one data set is imprecise (but fairly accurate on average); the other is precise (but inaccurate).

From the above definitions, it is seen that
• Minimizing systematic error increases the accuracy of a measurement
• Minimizing random error increases the precision of a measurement

Example: The Hubble telescope was precise (flat to λ/50) but inaccurate (focal length error of 1 mm). However, since the error was systematic, NASA was able to correct it with compensating optics.

2. One-Dimensional Measurements

One-dimensional measurements are repeated measurements of the value of a single physical property. A data set consists of a set of repeated measurements, $\{x_1, x_2, \ldots, x_n\}$. An example is the determination of the mass of a sample by several weighings.

2. A. Distribution of one-dimensional measurements

Repeated experiments will yield a histogram of measurements centered about an average value (mean) with a characteristic spread (standard deviation). In the limit of an infinite number of measurements, the probability distribution is observed to be a Gaussian distribution (or normal error distribution).
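The distinction between systematic and random error drawn in section 1. B, and the mean-and-spread picture of repeated measurements, can be illustrated with a small simulation. The sketch below is only an illustration under invented assumptions: the function name `simulate_weighings`, the 10.0 g "true" mass, the 0.5 g constant offset, and the 0.2 g Gaussian noise level are all hypothetical and do not come from the notes.

```python
import random
import statistics


def simulate_weighings(true_mass, bias, sigma, n, seed=0):
    """Return n simulated readings: true value + constant bias + Gaussian noise.

    `bias` plays the role of a systematic error (same sign and magnitude
    every time); `sigma` sets the size of the random error.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_mass + bias + rng.gauss(0.0, sigma) for _ in range(n)]


# Hypothetical numbers, chosen only for illustration.
TRUE_MASS = 10.0   # grams
BIAS = 0.5         # grams, systematic offset of the balance
SIGMA = 0.2        # grams, random scatter of a single weighing

for n in (10, 100, 10_000):
    readings = simulate_weighings(TRUE_MASS, BIAS, SIGMA, n)
    mean = statistics.mean(readings)
    spread = statistics.stdev(readings)  # sample standard deviation (n - 1 divisor)
    print(f"n={n:6d}  mean={mean:.3f} g  spread={spread:.3f} g  "
          f"mean - true = {mean - TRUE_MASS:+.3f} g")
```

With more readings the mean settles down (the random error averages out), but it settles near the biased value rather than the true one: averaging improves precision, not accuracy.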
[Figure not reproduced: a histogram of measurements versus x which, in the limit n → ∞, approaches the Gaussian curve P(x).]

68% of the area under a Gaussian distribution lies within $\mu \pm \sigma$; 95% of the area lies within $\mu \pm 2\sigma$.

2. B. Parent distribution

The parent distribution is the “true” distribution that would be obtained if an infinite number of measurements could be conducted.

parent mean:
\[ \mu = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} x_i \]

parent standard deviation:
\[ \sigma = \lim_{n \to \infty} \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 } \]

The mean, $\mu$, of the distribution is the average value. The standard deviation, $\sigma$, is the square root of the average squared deviation from the mean.

2. C. Sample distribution

The sample distribution is an observed distribution obtained from a finite number of measurements.

sample mean:
\[ m = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

sample standard deviation:
\[ s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 } \]

One “degree of freedom” is used to determine the mean of the distribution; hence, the divisor is $n-1$ in the sample standard deviation. Note the use of Greek letters for the parent distribution and Roman letters for the sample distribution.

2. D. Reporting values

When reporting values, always report the mean, standard deviation, and units:

\[ \bar{x} \pm s \ \text{units} \]

Use two significant figures for $s$ and match the precision of $\bar{x}$ to it.

Example: l = 12.5 ± 1.3 mm. Note that the mean has three significant figures and the standard deviation has two significant figures; however, the precision of both quantities is 0.1 mm.

2. E. Significant figures

Use of standard deviations can be thought of as “advanced significant figure theory” because the standard deviation specifies the uncertainty in a value more precisely. We will also see that there are methods to propagate uncertainty during calculations.

Preview (worked example not reproduced; the surviving values are 130 and 132 ± 6; 2500 and 2600; 2452 ± 18 and 2584 ± 19).

3. Two-Dimensional Measurements

Two-dimensional measurements are measurements that describe how one physical property depends on another. A data set consists of (x,y) pairs, $\{(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\}$. For example, a set of (T,p) data points describes how pressure depends on temperature.

3. A. Linear least squares fitting

Linear least squares fitting is a method which finds the best straight-line fit to a set of (x,y) data points, i.e., finds the slope m and intercept b of the function mx + b which best fits the observed data. (More generally, the method finds the best-fit values for parameters which appear linearly in the fitting function, but a straight line is the most common case.)

3. B. Derivation of the least squares best fit for a straight line

If
1) the two variables are linearly related, i.e., by y = mx + b,
2) the parent distribution is Gaussian, and
3) all standard deviations are equal,
then the best fit of the data $\{(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\}$ is obtained by minimizing the sum of squared differences between the observed data and the predicted fit

\[ R = \text{residual} = \sum_{i=1}^{n} \left( y_i - y_{\text{fit},i} \right)^2 \]

If the fitting function is a straight line

\[ y_{\text{fit}} = mx + b \]

then the residual may be written as

\[ R = \sum_{i=1}^{n} \left( y_i - m x_i - b \right)^2 \]

R is minimized with respect to variations in the fitting parameters m and b by setting its partial derivatives equal to zero

\[ \frac{\partial R}{\partial m} = \frac{\partial}{\partial m} \sum \left( y_i - m x_i - b \right)^2 = 0 \]
\[ \frac{\partial R}{\partial b} = \frac{\partial}{\partial b} \sum \left( y_i - m x_i - b \right)^2 = 0 \]

Evaluating these derivatives yields

\[ \sum 2 \left( y_i - m x_i - b \right)(-x_i) = 0 \]
\[ \sum 2 \left( y_i - m x_i - b \right)(-1) = 0 \]

which can be simplified by dividing by $-2$, separating the summations, and recognizing that $\sum 1 = n$

\[ \sum y_i x_i - m \sum x_i^2 - b \sum x_i = 0 \]
\[ \sum y_i - m \sum x_i - b\,n = 0 \]

This leaves two equations and two unknowns. Solving for m and b yields

\[ m = \frac{1}{\Delta} \left( n \sum x_i y_i - \sum x_i \sum y_i \right) \]
\[ b = \frac{1}{\Delta} \left( \sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i \right) \]

where

\[ \Delta = n \sum x_i^2 - \left( \sum x_i \right)^2 \]

Furthermore, the standard deviations may be shown to be

\[ s = \text{std dev of fit} = \sqrt{ \frac{1}{n-2} \sum \left( y_i - m x_i - b \right)^2 } \]
\[ s_m = \text{std dev of slope} = \left( \frac{s^2\, n}{\Delta} \right)^{1/2} \]
\[ s_b = \text{std dev of intercept} = \left( \frac{s^2 \sum x_i^2}{\Delta} \right)^{1/2} \]

Observe that two “degrees of freedom” are used to determine the slope and intercept of the fitting function; hence, the divisor is $n-2$ in the standard deviation of the fit.

3. C. Using the least squares best fit formulas

In practice, one uses a computer program or spreadsheet to accumulate the summations

\[ n, \quad \sum x_i, \quad \sum x_i^2, \quad \sum y_i, \quad \sum x_i y_i \]

and then calculate

\[ m \pm s_m, \quad b \pm s_b, \quad s \]

The units of m and s_m are the units of the slope, i.e., the y units divided by the x units. The units of b, s_b, and s are the same as the y units.

3. D. Intuitive definitions of s, s_m, and s_b

The following figure shows the best fit to a set of data points as a solid line. Two limiting “reasonable” fits are also shown as dashed lines.

[Figure not reproduced; its labels are: s_m = std dev of slope, s = std dev of fit, s_b = std dev of intercept.]

The standard deviation of the fit, s, is approximately the average difference in y between each data point and the best fit line. The standard deviation of the slope, s_m, is approximately the difference in slope between the best fit line and a limiting reasonable fit line. The standard deviation of the intercept, s_b, is approximately the difference in the y-intercept between the best fit line and a limiting reasonable fit line.
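The recipe in section 3. C (accumulate the five summations, then evaluate m, b, s, s_m, and s_b) might be coded as in the following sketch. It is one possible arrangement, not a prescribed program: the names `LineFit` and `linear_fit` are hypothetical, the data points are invented for illustration, and equal standard deviations on all points are assumed, as in the derivation of section 3. B.

```python
import math
from typing import NamedTuple, Sequence


class LineFit(NamedTuple):
    m: float    # slope
    b: float    # intercept
    s: float    # std dev of fit
    s_m: float  # std dev of slope
    s_b: float  # std dev of intercept


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> LineFit:
    """Unweighted straight-line least squares fit using the summation formulas."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 > 0")

    # Accumulate the summations.
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    delta = n * sum_x2 - sum_x ** 2
    if delta == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / delta
    b = (sum_x2 * sum_y - sum_x * sum_xy) / delta

    # Sum of squared residuals, with divisor n - 2 (two fitted parameters).
    ss_res = sum((y - m * x - b) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))
    s_m = math.sqrt(s * s * n / delta)
    s_b = math.sqrt(s * s * sum_x2 / delta)
    return LineFit(m, b, s, s_m, s_b)


if __name__ == "__main__":
    # Invented data, roughly y = 2x.
    x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_data = [2.1, 3.9, 6.2, 7.8, 10.1]
    fit = linear_fit(x_data, y_data)
    print(f"m = {fit.m:.3f} +/- {fit.s_m:.3f}")
    print(f"b = {fit.b:.3f} +/- {fit.s_b:.3f}")
    print(f"s = {fit.s:.3f}")
```

The residual sum is computed in a second pass over the data once m and b are known; this keeps the code close to the formula for s and avoids the round-off that can arise when it is expanded into further running sums.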
