Application of Statistics
Shakeel Ahmad, Department of Physics, A.M.U., Aligarh

All measurements, however carefully made, involve errors. These errors may be associated with technical difficulties: imperfection of the measuring instrument, limitations of the human eye, and several other factors that cannot be taken into account, such as fluctuations in temperature or the motion of air streams around the instrument.

Accuracy: how close a measured value is to the actual (true) value.
Precision: how close the measured values are to each other.
(The familiar target diagrams illustrate the combinations: low accuracy with high precision, high accuracy with low precision, and high accuracy with high precision.)

Accuracy signifies the degree of closeness between the measured value and the true value of the quantity. Precision is the limit or resolution to which a quantity is measured. No matter how carefully we measure, we can never obtain a result more precise than the measuring device: the limit of precision of a measuring device is ±½ of its smallest division.

Significant Figures: Accuracy of measurements
• While measuring a quantity, there is a limit to the accuracy of the measurement.
• This accuracy depends upon the number of digits up to which the value of the quantity is known.
• It is therefore necessary to use an appropriate number of digits to express the accuracy and reliability; no purpose is served by writing additional digits. We may include only one digit about which we are not sure.
• The digits in a measurement about which we are reasonably sure, plus the first uncertain digit, are called significant figures.

Example: a length written as 2.54 cm has 3 significant figures; 2.5 is reliable, while the third figure, 4, is uncertain. A length of 638.6 cm has 4 significant figures; the last digit, 6, is uncertain.

Points to remember
1. The number of significant figures does not change when the unit is changed: 638.6 cm = 6.386 m.
2. It is customary to write the decimal point after the first digit: 638.6 cm = 6.386 × 10² cm, 0.0438 kg = 4.38 × 10⁻² kg.

Rules for counting significant figures
1. All non-zero digits are significant; if there is a decimal point, its position does not matter. 23.73 has 4 sig. fig.
2. All zeros between two non-zero digits are significant, wherever the decimal point occurs. 2090.03 has 6 sig. fig.
3. If there is no decimal point, trailing zeros are not counted. 143000 has only 3 sig. fig.
4. If there is a decimal point, trailing zeros are significant: 36.00 and 0.3600 each have 4 sig. fig. But leading zeros after the decimal point (zeros to the right of the decimal and to the left of the first non-zero digit) are not significant: 0.0034 has only 2 sig. fig.
5. The zero to the left of the decimal point in a number less than 1 (as in 0.34) is never significant.
6. A number with 3 significant figures gives an accuracy of 1 part in 10² to 10³; a number with 6 significant figures gives an accuracy of 1 part in 10⁵ to 10⁶.

Rules for rounding off
• 9.876 → 9.88 (digit dropped > 5: round up)
• 9.874 → 9.87 (digit dropped < 5: round down)
• If the digit to be dropped is exactly 5, the preceding digit is incremented by one if it is odd (9.775 → 9.78) and left unchanged if it is even (9.785 → 9.78). Rounding off is thus done so that the last retained digit is even.

Exercise: Two cesium clocks, run for 100 years free from disturbances, may differ by only about 0.02 s. What does this imply for the accuracy of the clock in measuring a time interval of 1 s?
Time interval in 100 years = 100 × 365.25 × 24 × 60 × 60 s = 3 155 760 000 s
Difference between the two clocks = 0.02 s
Measured time interval = 3 155 760 000 ± 0.02 s, i.e. 3 155 760 000.02 s or 3 155 759 999.98 s.
Both these values have 12 sig. fig., so the accuracy of the clock is 1 part in 10¹¹ to 10¹².
Alternatively: ΔT = 0.02 s = 2 × 10⁻² s and T = 100 years = 3.15576 × 10⁹ s, so ΔT/T ≈ 6 × 10⁻¹², i.e. an accuracy of about 10⁻¹¹ to 10⁻¹².

Rules for Arithmetic Operations with Significant Figures
Results of additions and subtractions are retained to the least number of decimal places present among the numbers being added or subtracted; the sum or difference has significant figures only in those places where the least precise of the given numbers has them.
3.123 + 40.5 + 2.0123 = 45.6353, but the answer should be quoted as 45.6, since the least precise number, 40.5, is known only to one decimal place.
Similarly 53.312 − 53.3 = 0.012, but the answer should be 0.0, as the least precise number, 53.3, has only one digit after the decimal point.
A product or quotient should carry as many significant figures as the least precise of the given numbers. 4.08 × 16 = 65.28, but the answer is 65, because the less precise number, 16, has only 2 sig. fig. Similarly 6300/11.97 = 526.31578, but the answer should be 530, because the less precise number, 6300, has only 2 sig. fig. (the third digit of 526.31578 is 6 > 5, so 52 rounds up to 53, giving 530).

Example: For a rectangular block, l = 4.234 m, b = 1.005 m, h = 2.01 cm. Find the area and volume to the correct sig. fig.
Area = l × b = 4.25517 m² ≈ 4.255 m² (rounded off to 4 sig. fig.)
Volume = l × b × h = 0.0855289 m³ ≈ 0.0855 m³ (rounded off to 3 sig. fig.)

Example: Two masses, m₁ = 20.15 g and m₂ = 20.17 g. Difference = 0.02 g (one sig. fig.): significant figures are lost in difference calculations.

Errors in Measurements
In practice it is impossible to find the true value while measuring a physical quantity.
Error: the difference between the measured and true values of a physical quantity.

Degree of Accuracy
Accuracy depends on the instrument you are measuring with, but as a general rule the degree of accuracy is half a unit on each side of the unit of measure.
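The round-half-to-even rule described above is the same convention that Python's decimal module implements as ROUND_HALF_EVEN; a small sketch (passing values as strings avoids binary floating-point surprises):

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_even(value: str, places: str = "0.01") -> Decimal:
    """Round a decimal string so that a trailing 5 leaves the last kept digit even."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

print(round_even("9.876"))  # 9.88 (digit dropped > 5: round up)
print(round_even("9.874"))  # 9.87 (digit dropped < 5: round down)
print(round_even("9.775"))  # 9.78 (dropped digit is 5; the 7 is odd, so increment)
print(round_even("9.785"))  # 9.78 (dropped digit is 5; the 8 is even, so unchanged)
```

Python's built-in round() on floats also rounds half to even, but the binary representation of numbers like 9.775 can make its results look surprising, which is why Decimal strings are used in this sketch.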
Examples: When the instrument measures in units of 1, any value between 6½ and 7½ is recorded as 7 units; the value could thus lie between 6½ and 7½ and is written as 7.0 ± 0.5. The error is ±0.5. When the instrument measures in units of 2, any value between 7 and 9 is recorded as 8 units and written as 8 ± 1; the error is ±1.

Although physics is an exact science, physical instruments do not give the exact values of the quantities measured: all measurements in science are, generally, inaccurate to some degree. We often say that the true or actual value of a physical quantity cannot be found. We assume, however, that an exact value exists, and concern ourselves with estimating the limits between which this value lies; the closer these limits, the more accurate the measurement. Such errors follow no simple law and arise from various causes, and a single observer with the same apparatus will record different values of the same quantity. Errors are usually a combination of accidental and systematic errors.

Accidental errors occur due to the observer and can be minimized by repeated observation; they are disordered. Systematic errors arise due to the observer or the instrument and are usually more troublesome, because repeated observations do not (usually) reveal them even when their nature and existence are known; it is difficult to determine and eliminate such errors. Assessment of the possible errors in any measured quantity is of fundamental importance in science.

Types of Errors
• Accidental: estimated by certain statistical concepts.
• Systematic: systematic or personal errors are usually assumed to be small and neglected, but sometimes they become a serious issue and special treatment is required to eliminate them.
• Random

Systematic or Methodic Errors: errors which tend to be in one direction, positive or negative, and arise on account of
• shortcomings of the measuring method,
• imperfection of the theory of the physical phenomenon to which the measured quantity is related,
• lack of accuracy of the formula used for calculations.
For example, in weighing a body on a physical balance, a systematic error arises because the buoyant forces acting on the body and on the weights are not accounted for. Such errors may be reduced by changing or improving the measuring method and by applying corrections to the formula used.

Instrumental Errors: occur due to imperfection of design or inaccurate manufacture of the instrument. For example, the running of a stop watch may change with temperature, or the centre of its dial may not coincide with the axis of rotation of its hands. It is not possible to eliminate such errors completely.

Random Errors: occur irregularly, arising from a variety of factors which cannot be taken into account. They are not associated with any systematic cause or definite law of action. For example, the reading of a sensitive beam balance may be affected by vibrations of the building or by dust particles settling on the pan. Such errors cannot be completely eliminated, but they can be reduced by repeated observations.

Random errors are treated on the basis of the theory of probability, by applying the Gaussian law of normal distribution. According to this law, the probability of an error +ΔA in a measurement of a quantity A is the same as the probability of an error −ΔA in the same measurement (the frequency curve of the errors is symmetric about zero). The arithmetic mean of a large number of observations is therefore likely to be close to the true value.

Absolute Errors
Let A₁, A₂, …, Aₙ be the measured values of a quantity A in n attempts. The arithmetic mean is Am = ΣAᵢ/n. Since the true value of the quantity is not known, Am may be taken as the true value. The magnitude of the deviation of any measurement from the arithmetic mean is called the absolute error of that measurement:
ΔA₁ = |Am − A₁|, ΔA₂ = |Am − A₂|, …
(these are magnitudes and are always taken as positive).

Mean absolute error: the arithmetic mean of all the absolute errors,
ΔAm = ΣΔAᵢ/n (the final absolute error).
The final result of the measurement is A = Am ± ΔAm, i.e. any single measurement of the quantity A is likely to lie in
Am − ΔAm ≤ A ≤ Am + ΔAm.
Relative error = ΔAm/Am; percentage error = (ΔAm/Am) × 100%.

Example: The diameter of a ball was measured five times with an instrument whose absolute error is Δd_inst = ±0.01 mm. The observations are:
d₁ = 5.27 mm, d₂ = 5.30 mm, d₃ = 5.28 mm, d₄ = 5.32 mm, d₅ = 5.28 mm
Mean diameter d_mean = (5.27 + 5.30 + 5.28 + 5.32 + 5.28)/5 = 5.29 mm
Absolute errors:
Δd₁ = |d_mean − d₁| = 0.02 mm
Δd₂ = |d_mean − d₂| = 0.01 mm
Δd₃ = |d_mean − d₃| = 0.01 mm
Δd₄ = |d_mean − d₄| = 0.03 mm
Δd₅ = |d_mean − d₅| = 0.01 mm
Mean absolute error Δd_mean = (0.02 + 0.01 + 0.01 + 0.03 + 0.01)/5 ≈ 0.02 mm
Since Δd_mean > Δd_inst, the result is d_mean ± Δd_mean = 5.29 ± 0.02 mm.
Relative error = Δd_mean/d_mean = 0.02/5.29 ≈ 0.004, i.e. a percentage error of about 0.4%.

Example: A box is measured to the nearest 2 cm as 24 cm × 24 cm × 20 cm. Measuring to the nearest 2 cm means the true value could be up to 1 cm smaller or larger, so the three measurements are
l = 24 ± 1 cm, b = 24 ± 1 cm, h = 20 ± 1 cm
The smallest possible volume is 23 cm × 23 cm × 19 cm = 10051 cm³
The measured volume is 24 cm × 24 cm × 20 cm = 11520 cm³
The largest possible volume is 25 cm × 25 cm × 21 cm = 13125 cm³
Thus the true volume lies between 10051 cm³ and 13125 cm³.
Absolute error: from 10051 to 11520 the difference is 1469; from 11520 to 13125 it is 1605. We pick the bigger one, so the absolute error = 1605 cm³.
Relative error = 1605 cm³ / 11520 cm³ = 0.139…, i.e. a percentage error of 13.9%.

• Accuracy: how close the measured value is to the actual value.
• Precision: how close the measured values are to each other (quite independent of systematic errors). A set of measurements (X₁, X₂, …, Xₙ) is of high precision if the residuals dᵢ are small, whatever the value of (X̄ − X₀); the accuracy is high if the errors eᵢ are small, in which case X̄ − X₀ is small too.
• Accuracy includes precision, but precision does not include accuracy.

If the true value of a quantity is X₀ and its recorded value is X, then
e = X − X₀ is the error in X, and X = X₀ + e = X₀(1 + f), where f = e/X₀ is the fractional error.
e may be positive or negative, but we assume |e| ≪ |X₀|. Then
X/X₀ = 1 + f, and X₀/X = 1/(1 + f) ≈ 1 − f if f ≪ 1,
so that
e/X₀ = (e/X)·(X/X₀) = (e/X)(1 + f) ≈ e/X, as f ≪ 1.

For a single measurement the estimation of error may be widely wrong (lack of precision of the instrument, personal and accidental errors). If a quantity X₀ is measured n times, giving X₁, X₂, …, Xₙ, we write Xᵢ = X₀ + eᵢ, where Xᵢ is a recorded value and eᵢ the error associated with it. The arithmetic mean is
X̄ = (X₁ + X₂ + … + Xₙ)/n = X₀ + (e₁ + e₂ + … + eₙ)/n.
Of e₁, e₂, …, eₙ some may be positive and some negative, so (e₁ + e₂ + … + eₙ)/n may be very small, much smaller than a typical e. Then X̄ − X₀ ≪ e: X̄ is close to X₀ and represents the best value. It is not possible to find e₁, e₂, …, since X₀ is not known; we therefore examine the scatter or dispersion about X̄, not about X₀. Writing
X_r = X̄ + d_r,
d_r is the deviation from X̄, often called the residual of X_r. Since X_r = X₀ + e_r = X̄ + d_r,
e_r − d_r = X̄ − X₀, and also e₁ + e₂ + … + eₙ = n(X̄ − X₀), while d₁ + d₂ + … + dₙ = 0.
By repeated measurement of the same quantity accidental errors may be corrected to some degree, but systematic errors cannot be.

Combination/Propagation of Errors: Errors in Compound Quantities

Errors in a sum or difference: let A and B be two quantities with errors ΔA and ΔB, so the measured values are (A ± ΔA) and (B ± ΔB).
For Z = A + B:
Z ± ΔZ = (A ± ΔA) + (B ± ΔB) = (A + B) ± (ΔA + ΔB), so ΔZ = ±(ΔA + ΔB);
the maximum error in Z is ΔA + ΔB.
Similarly, for Z = A − B: Z ± ΔZ = (A − B) ± (ΔA + ΔB); the maximum error is again ΔA + ΔB.

Errors in a product or quotient:
Let Z = A·B. Then
Z ± ΔZ = (A ± ΔA)(B ± ΔB) = AB ± ΔA·B ± A·ΔB ± ΔA·ΔB.
Dividing by Z = AB:
1 ± ΔZ/Z = 1 ± ΔA/A ± ΔB/B ± (ΔA·ΔB)/(A·B).
Neglecting the last term (the product of two small quantities), the maximum relative error is
ΔZ/Z = ΔA/A + ΔB/B, i.e. ΔZ = A·B·(ΔA/A + ΔB/B).
Let Z = A/B. Then
Z ± ΔZ = (A ± ΔA)/(B ± ΔB) = (A/B)(1 ± ΔA/A)(1 ± ΔB/B)⁻¹ ≈ (A/B)(1 ± ΔA/A)(1 ∓ ΔB/B),
so Z ± ΔZ = Z ± Z·ΔA/A ± Z·ΔB/B (again neglecting second-order terms), and the maximum error is
ΔZ = ±(A/B)(ΔA/A + ΔB/B).

Error in a power:
Let Z = A². Then Z ± ΔZ = (A ± ΔA)² = A² ± 2A·ΔA + (ΔA)² ≈ Z ± 2A·ΔA, so the error in Z is ΔZ = 2A·ΔA, and the relative error is
ΔZ/Z = 2A·ΔA/A² = 2·ΔA/A,
twice the relative error in A.
In general, if Z = Aᵖ·B^q·Cʳ, the relative error is given by
ΔZ/Z = p·ΔA/A + q·ΔB/B + r·ΔC/C
(the same rule holds whether a factor appears in the numerator or in the denominator).

Q: The percentage errors in mass and speed are 2% and 3%. What is the maximum error in the kinetic energy E = ½mv²?
ΔE/E = Δm/m + 2·Δv/v = 2/100 + 2 × 3/100 = 8/100, i.e. ΔE/E = 8%.

Q: A physical quantity is given by P = a³b²/(√c·d), and the percentage errors in a, b, c and d are 1%, 3%, 4% and 2%. If the value of P is 3.763, to what value should it be rounded off?
ΔP/P = 3·Δa/a + 2·Δb/b + ½·Δc/c + Δd/d = (3×1 + 2×3 + ½×4 + 2)/100 = 13/100 = 13%
If P = 3.763, ΔP = 0.13 × 3.763 = 0.48919. The result should be rounded off to 3.8, as there is uncertainty even in the second significant figure (the 7 in 3.763).

Errors in Compound Quantities (continued): if y is some function of x, the error in y due to the error in x is obtained by a mathematical technique.
Error in a product in terms of fractional errors: let Q = a·b, with f₁ and f₂ the fractional errors in a and b. Then
Q = a₀(1 + f₁) · b₀(1 + f₂) = a₀b₀ + a₀b₀f₁ + a₀b₀f₂ + a₀b₀f₁f₂.
Neglecting the term a₀b₀f₁f₂,
Q ≈ a₀b₀(1 + f₁ + f₂):
the fractional error in Q is f₁ + f₂, the sum of the fractional errors in a and b.
Error in a quotient:
Q = a/b = a₀(1 + f₁)/[b₀(1 + f₂)] = (a₀/b₀)(1 + f₁)(1 − f₂ + …) ≈ (a₀/b₀)(1 + f₁ − f₂).
Thus the fractional error in a/b is approximately the difference of the fractional errors in a and b. In general, if Q = (a·b·c·…)/(l·m·n·…), the fractional error in Q = (sum of the fractional errors in a, b, c, …) − (sum of the fractional errors in l, m, n, …).
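The maximum-error rules above are easy to mechanize; a minimal sketch in Python (the helper names are illustrative, not a standard API):

```python
def max_err_sum(dA, dB):
    """Maximum error of A + B or A - B: the absolute errors add."""
    return dA + dB

def max_rel_err_product(A, dA, B, dB):
    """Maximum relative error of A*B or A/B: the relative errors add."""
    return dA / A + dB / B

def max_rel_err_powers(rel_errs, powers):
    """Relative error of a product of powers: sum of |p_i| * (dX_i / X_i)."""
    return sum(abs(p) * r for r, p in zip(rel_errs, powers))

# Kinetic energy E = (1/2) m v^2, with 2% error in m and 3% in v:
print(max_rel_err_powers([0.02, 0.03], [1, 2]))  # about 0.08, i.e. 8%

# P = a^3 b^2 / (sqrt(c) d), with 1%, 3%, 4%, 2% errors in a, b, c, d:
print(max_rel_err_powers([0.01, 0.03, 0.04, 0.02], [3, 2, 0.5, 1]))  # about 0.13, i.e. 13%
```

Both worked questions above are reproduced by the last helper, since a power p in the denominator contributes with the same magnitude as one in the numerator.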
Examples:
Ohm's law: I = E/R, with f_E and f_R the fractional errors in E and R; the fractional error in I is f_E − f_R.
Acceleration due to gravity: g = 4π²l/T², with l = l₀(1 + f₁) and T = T₀(1 + f₂):
g = 4π²l₀(1 + f₁)/[T₀²(1 + f₂)²] ≈ 4π²l₀(1 + f₁)/[T₀²(1 + 2f₂)] ≈ (4π²l₀/T₀²)(1 + f₁ − 2f₂) = g₀(1 + f₁ − 2f₂).

Use of Calculus: let X be a measured quantity with error δX, and let Y = f(X) be estimated with error δY. Since
lim(δX→0) δY/δX = dY/dX, we have δY/δX ≈ dY/dX if δX is small, i.e.
error in Y: δY = (dY/dX)·δX.
As an example, let Y be the area of a circle of radius X. Then Y = πX², dY/dX = 2πX, and δY = 2πX·δX; δY may be positive or negative, depending on the sign of δX. The fractional error in the area is
δY/Y = 2πX·δX/(πX²) = 2·δX/X,
i.e. twice the fractional error in the radius.

In general, if Q is a function of several measured quantities X, Y, Z, …, the error in Q due to the errors δX, δY, δZ, … is
δQ = (∂Q/∂X)·δX + (∂Q/∂Y)·δY + …,
often called the principle of superposition of errors; each term is the error in Q due to one variable alone, the others being held fixed (δY = δZ = … = 0).
If δX lies between −e₁ and +e₁, δY between −e₂ and +e₂, and so on, then
(δQ)² = (∂Q/∂X · e₁)² + (∂Q/∂Y · e₂)² + …,
i.e. δQ is the square root of the sum of squares of the greatest errors due to the errors in each quantity.

Example: simple pendulum, g = 4π²l/T²:
δg = (∂g/∂l)·δl + (∂g/∂T)·δT = (4π²/T²)·δl − (8π²l/T³)·δT,
so that δg/g = δl/l − 2·δT/T, i.e. f₁ − 2f₂.

Example: A quantity Y is expressed in terms of a measured quantity X by the relation Y = 4X − 2/X. Find the percentage error in Y if there is an error of 1% in X.
dY/dX = 4 + 2/X², so δY = (4 + 2/X²)·δX.
Percentage error in Y = (δY/Y) × 100 = 100·(4 + 2/X²)·δX/(4X − 2/X) = 100·δX·(4X² + 2)/[X(4X² − 2)].
If δX/X = 1/100, the percentage error in Y is (4X² + 2)/(4X² − 2).

For a freely falling object the distance is given by x = ½gt², so dx/dt = gt, or dx = gt·dt, i.e. a change dt in the time causes a change dx in x.
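The circle-area example above can be checked numerically: the finite change in the area agrees with the calculus estimate δY = 2πX·δX to first order in δX (a sketch; the numbers are illustrative):

```python
import math

def area(r):
    """Area of a circle of radius r."""
    return math.pi * r ** 2

x, dx = 10.0, 0.01                      # radius and a small error in it
dy_exact = area(x + dx) - area(x)       # actual change in the area
dy_linear = 2 * math.pi * x * dx        # calculus estimate dY = (dY/dX) * dX

print(dy_exact, dy_linear)              # agree to first order in dx
print(dy_linear / area(x), 2 * dx / x)  # fractional error in area = 2 * (dx/x)
```

The two estimates differ only by the neglected second-order term π·(dx)².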
If g = 10 m/s² and t = 4 ± 0.5 s, then x = ½gt² = ½ × 10 × 16 = 80 m, and
dx = gt·dt = 10 × 4 × 0.5 = 20 m.
In general dy = (df/dx)·dx, though the differentiation is often tedious.

Some Statistical Ideas
The experimental scientist does not regard statistics as an excuse for doing a bad experiment.

Frequency distributions: numerical data on scientific measurements (and on industrial and social statistics) are often represented graphically to aid their appreciation. The first step is to arrange the data in a convenient order if they are large in number. This is often done by grouping them into classes according to their magnitude, or according to suitable intervals of a variable on which they depend. The number of data in a particular class or group is usually called the frequency for that class.

The following table gives the frequency distribution of students' marks (in %) in an exam:

Class   Freq.   Class   Freq.
0-9     2       50-59   32
10-19   5       60-69   25
20-29   6       70-79   10
30-39   14      80-89   2
40-49   22      90-99   2

The width of a class is the difference between the first numbers of two consecutive classes, i.e. 10 here. From the table one can appreciate the distribution of marks, but it is clearer graphically: as a frequency polygon, a bar graph, or a histogram.

Histogram: a series of rectangles is constructed, each of width equal to the class width and of area equal to the frequency of the corresponding class. Here the areas are 2, 5, 6, 14, … units, and the total area of the histogram is 120 units, the total number of students. When the classes are of equal width the heights of the rectangles are proportional to the frequencies, and the mean height of the rectangles corresponds to the mean frequency.

The Mean: the mean frequency is not of particular significance; what is often important is the mean of the data. If f₁, f₂, … are the frequencies in the various classes and x₁, x₂, … are the mid-values of the variable in those classes, the mean value of the variable is given by
(f₁x₁ + f₂x₂ + … + fₙxₙ)/(f₁ + f₂ + … + fₙ).
This is the weighted mean of x₁, x₂, …, xₙ, the weights being the frequencies of the corresponding classes.
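The weighted (frequency) mean of the marks table above can be computed directly; a short sketch, taking the class mid-values as 4.5, 14.5, …, 94.5:

```python
# (mid-value of class, frequency) for the marks distribution above
data = [(4.5, 2), (14.5, 5), (24.5, 6), (34.5, 14), (44.5, 22),
        (54.5, 32), (64.5, 25), (74.5, 10), (84.5, 2), (94.5, 2)]

total_freq = sum(f for _, f in data)              # 120 students in all
mean = sum(x * f for x, f in data) / total_freq   # sum(f_i * x_i) / sum(f_i)

print(total_freq, mean)  # 120 51.25
```

The mean mark comes out at about 51%, consistent with the bulk of the frequencies lying in the 40-59 classes.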
This is written as x̄ = Σᵢ fᵢxᵢ / Σᵢ fᵢ = Σfx/Σf.

Evaluating x̄ directly may sometimes be tedious. The arithmetic can be reduced as follows: let xᵢ = x′ᵢ + m, where m is some constant. Then
f₁x₁ + f₂x₂ + … + fₙxₙ = f₁(x′₁ + m) + f₂(x′₂ + m) + … + fₙ(x′ₙ + m)
= (f₁x′₁ + f₂x′₂ + … + fₙx′ₙ) + m(f₁ + f₂ + … + fₙ).
Therefore
x̄ = (f₁x₁ + … + fₙxₙ)/(f₁ + … + fₙ) = (f₁x′₁ + … + fₙx′ₙ)/(f₁ + … + fₙ) + m = x̄′ + m,
where x̄′ is the mean of the x′ᵢ. By choosing m conveniently we can make the evaluation of x̄′ simpler than that of x̄; x̄′ will be small if m is close to x̄. m is often called the working mean or assumed mean.

Example: from the table below, the mean value is x̄ = 172/44 = 3 10/11 ≈ 3.9.

xᵢ    fᵢ   fᵢxᵢ    x′ᵢ = xᵢ − 3.5   fᵢx′ᵢ
0.5   1    0.5     −3               −3
1.5   5    7.5     −2               −10
2.5   7    17.5    −1               −7
3.5   9    31.5    0                0
4.5   10   45.0    1                10
5.5   8    44.0    2                16
6.5   4    26.0    3                12
SUM   44   172.0                    18

Examining the data, we find that the mean lies between 3.5 and 4.5. Taking m = 3.5, the values of x′ᵢ are tabulated in the 4th column and the values of fᵢx′ᵢ in the last column, whose sum is 18. Therefore
x̄ = x̄′ + m = 18/44 + 3.5 = 3 10/11 ≈ 3.9,
as found directly. The amount of arithmetic, and consequently the likelihood of error, is reduced.

The Median: if a set of observations is arranged in ascending or descending order of magnitude, the observation in the middle of the set is called the MEDIAN. If the number of observations is odd, say 2n + 1, the median is the (n + 1)th value; if it is even, say 2n, the median is the mean of the nth and (n + 1)th values.
e.g. 10, 12, 13, 7, 20, 18, 9, 15, 11; in order: 7, 9, 10, 11, 12, 13, 15, 18, 20, so the median is 12. If the last value, 20, were absent, the median would be (11 + 12)/2 = 11.5.
To find the median, place the numbers you are given in order and find the middle number. To find the median of {13, 23, 11, 16, 15, 10, 26}, we put them in order: {10, 11, 13, 15, 16, 23, 26}. The middle number is 15, so the median is 15.
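Python's statistics.median implements exactly this rule, averaging the two middle values when the count is even; a quick check of the examples above:

```python
from statistics import median

print(median([13, 23, 11, 16, 15, 10, 26]))        # 15   (odd count: middle value)
print(median([10, 12, 13, 7, 20, 18, 9, 15, 11]))  # 12
print(median([7, 9, 10, 11, 12, 13, 15, 18]))      # 11.5 (even count: mean of 11 and 12)
```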
(If there are two middle numbers, the average of the two is taken.)

MODE: the number which appears most often in a set of numbers. In {6, 3, 9, 6, 6, 5, 9, 3} the mode is 6 (it occurs most often). The mode corresponds to the maximum frequency. The histogram comes closer and closer to the frequency curve as the number of observations increases.

The Mean Deviation: the mean of the distances of each value from their mean. Three steps to find the mean deviation:
• find the mean of all values;
• find the distance of each value from that mean (subtract the mean from each value and ignore minus signs);
• then find the mean of those distances.
Mean deviation of 3, 6, 6, 7, 8, 11, 15, 16:
Mean = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16)/8 = 72/8 = 9.
The distances of the values from 9 are 6, 3, 3, 2, 1, 2, 6, 7, so
Mean deviation = (6 + 3 + 3 + 2 + 1 + 2 + 6 + 7)/8 = 30/8 = 3.75.
It tells how far, on average, the values are from the middle.

Measures of Dispersion: an important characteristic of a set of data is its scatter, or dispersion, about some value, e.g. the mean. For the numbers 40, 50, 60, 70, 80, the mean is 60 and the scatter or dispersion is 20 on each side: they all lie within the limits 60 ± 20. Various parameters are used to measure dispersion:
1. Range
2. Mean deviation
3. Standard deviation
The range of a frequency distribution is the maximum value minus the minimum value of the variable. It is a simple measure of dispersion, but its very simplicity limits its usefulness.

Mean Deviation: if x₁, x₂, …, xₙ are a set of data and x̄ is their mean, the deviations from the mean are
x₁ − x̄, x₂ − x̄, …, xₙ − x̄, or d₁, d₂, …, dₙ.
Some of the dᵢ are positive and some negative, and
d₁ + d₂ + … + dₙ = 0, i.e. x₁ + x₂ + … + xₙ − nx̄ = 0.
The mean deviation, or mean absolute deviation, is given by
(|d₁| + |d₂| + … + |dₙ|)/n = (1/n)Σ|dᵢ|.
Example: for the numbers 7, 5, 8, 10, 12, 6, the mean is 8 and the mean deviation is (1 + 3 + 0 + 2 + 4 + 2)/6 = 2.
For large (grouped) data we use the frequencies:
mean deviation = (f₁|d₁| + f₂|d₂| + … + fₙ|dₙ|)/(f₁ + f₂ + … + fₙ) = Σfᵢ|xᵢ − x̄| / Σfᵢ.

Standard Deviation (σ): the most important measure of dispersion, defined in terms of the deviations from the mean:
σ² = (d₁² + d₂² + … + dₙ²)/n = (1/n)Σ(xᵢ − x̄)² = (1/n)Σdᵢ².
If x₁, x₂, …, xₙ have frequencies f₁, f₂, …, fₙ respectively,
σ² = (f₁d₁² + f₂d₂² + … + fₙdₙ²)/(f₁ + f₂ + … + fₙ).
σ² is the variance of the data; σ is the root-mean-square deviation of the data measured from the mean.
For the numbers 5, 6, 7, 8, 10, 12: mean = 8 and
σ² = (9 + 4 + 1 + 0 + 4 + 16)/6 = 34/6 = 5.67, so σ = 2.38.

Frequency Distributions (three important types)
1. Binomial
2. Poisson
3. Normal
Most distributions based on scientific observations, or on industrial and social statistics, approach closely one or other of these three important distributions. They can be derived and expressed mathematically using the theory of probability.

The Binomial Distribution: on tossing a coin there are two possibilities, heads or tails, i.e. the chance of getting heads (or tails) is 50%. For a large number n of tosses, the number of times we get heads ≈ n/2; for small n the chances of deviation from 50% are large. Tossing a coin 10 times we may get heads 3 times; for another 10 tosses we may get heads 6 times.
Question: on tossing n times, what is the chance of getting heads m times (0 ≤ m ≤ n)?
For n tosses, the probabilities of getting heads 0, 1, 2, …, n times are given by the successive terms of the binomial expansion of (½ + ½)ⁿ. In general, if the probability that a certain event occurs is p and the probability that it does not is q, such that p + q = 1, then the probabilities that it occurs on 0, 1, 2, …, n out of n occasions are given by the successive terms of the binomial expansion
(q + p)ⁿ = qⁿ + n·qⁿ⁻¹p + [n(n − 1)/2!]·qⁿ⁻²p² + … + pⁿ.
On tossing a coin 10 times, the probabilities of getting heads 0, 1, 2, …, 10 times are given by the successive terms of the expansion of (½ + ½)¹⁰, which are
(½)¹⁰, 10 × (½)⁹ × (½), (10×9/2!) × (½)⁸(½)², (10×9×8/3!) × (½)⁷(½)³, …
= (1/2¹⁰) × {1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1}.
Thus the chance of getting heads 3 times in 10 tosses is (10×9×8/3!) × (½)⁷(½)³ = 120/1024 = 15/128.

If we toss a coin 3 times, we can get HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. Each outcome is equally likely, and there are 8 of them, so each has a probability of 1/8. The total number of outcomes is 2³ = 8, and the probability of "two heads" in "three tosses" is
(number of outcomes we want) × (probability of each outcome) = 3 × 1/8 = 3/8.
Thus we have
• P(3 heads) = P(HHH) = 1/8
• P(2 heads) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
• P(1 head) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8
• P(0 heads) = P(TTT) = 1/8
In terms of a random variable X (the number of heads):
P(X = 3) = 1/8, P(X = 2) = 3/8, P(X = 1) = 3/8, P(X = 0) = 1/8.
We can now make a formula. With the number of tosses n = 3 and the number of heads we want k = 2, the number of favourable outcomes is
n!/[k!(n − k)!] = 3!/[2!(3 − 2)!] = 3.
Similarly, the number of ways of getting 5 heads in 9 tosses is 9!/[5!(9 − 5)!] = 126, while for 9 tosses the total number of outcomes is 2⁹ = 512 and the probability of each outcome is 1/512. The probability of getting 5 heads in 9 tosses is therefore 126/512, i.e. P(X = 5) = 126/512 = 0.2460, about a 25% chance.
In general, the probability of k successes out of n trials is
P(k, n) = {n!/[k!(n − k)!]} pᵏ(1 − p)ⁿ⁻ᵏ,
which is the general binomial probability formula.

The Poisson Distribution: the form of the binomial distribution varies considerably depending upon the values of p and n. An important practical case is when the probability p of an event occurring is very small but n is very large, so that n × p is not insignificant. Taking np = m, it can be shown that the binomial expansion of (q + p)ⁿ approximates closely to the series
e⁻ᵐ(1 + m + m²/2! + m³/3! + … + mⁿ/n!), where e = 2.71828…,
called the Poisson series; any distribution whose frequencies correspond to its successive terms is called a Poisson distribution. The relative frequencies with which an event happens n = 0, 1, 2, 3, …, s times are thus
e⁻ᵐ × {1, m, m²/2!, m³/3!, …, mˢ/s!}.
For a true probability distribution the sum of the probabilities
= 1. For the Poisson distribution the sum of the relative frequencies is
e⁻ᵐ(1 + m + m²/2! + m³/3! + … + mˢ/s!) = e⁻ᵐ × eᵐ = 1 (approximately, if s is very large).
With the same approximation, the mean of the distribution is m and the standard deviation is m^(1/2) = √m. In general, the probability of an event occurring n times is
P(n) = e⁻ᵐmⁿ/n!.

Example: Arrivals of a particular charged particle at a detector follow a Poisson distribution, with an average of 4.5 particles in every 15 s. Obtain a bar plot of the distribution, assuming a maximum of 20 arrivals in 15 s, and calculate the probability of fewer than 3 arrivals in this time interval.
The probabilities of the arrival of 0, 1, 2, … particles are given by P(n) = e⁻ᵐmⁿ/n! with m = 4.5:
P(0) = e⁻⁴·⁵ × 4.5⁰/0! = 0.011
Similarly P(1) = 0.04999 and P(2) = 0.11248. Hence the probability of fewer than 3 arrivals is
P(0) + P(1) + P(2) = 0.17358.
(The slide shows bar plots of the distribution for m = 3, m = 5 and m = 10.)

Example: Given are the frequencies f of n successes in a set of 500 trials. Find the mean of the distribution and verify that the distribution is roughly of Poisson type; show that the mean ≅ the variance.

n     0   1   2    3    4   5   6   7   8   9   SUM
f     24  77  110  112  84  50  24  12  5   2   500
n×f   0   77  220  336  336 250 144 84  40  18  1505

Mean number of successes m = Σn·f/Σf = 1505/500 = 3.01.
The terms of the Poisson series with m = 3.01, multiplied by 500,
500 × e⁻³·⁰¹(1 + 3.01 + 3.01²/2! + 3.01³/3! + …),
are 24.6, 74.2, 111.6, 112.0, 84.3, 50.8, 25.4, 11.0, 4.1, 1.4, which match the values of f given in the table.
Using an assumed mean of 3, with d = n − 3, we may tabulate:

n    d = n−3   f·d    f·d²
0    −3        −72    216
1    −2        −154   308
2    −1        −110   110
3    0         0      0
4    1         84     84
5    2         100    200
6    3         72     216
7    4         48     192
8    5         25     125
9    6         12     72
sum            5      1523

Hence mean = 3 + 5/500 = 3.01 and variance = 1523/500 − (0.01)² = 3.0459 ≃ 3.05, so the mean ≃ the variance, as expected for a Poisson distribution.

Normal Distribution
• Discovered in 1733 by de Moivre as an approximation to the binomial distribution when the number of trials is large.
• Also obtained later by Laplace and Gauss.
• Its importance lies in the Central Limit Theorem, which states that the sum of a large number of independent random variables (binomial, Poisson, etc.) will approximate a normal distribution.
The equation
y = A·e^(−h²(x−m)²), where A, h, m are constants,
is known as the normal error curve. The normal distribution curve has its maximum value y = A at x = m and is symmetrical about the line x = m. (Figure: the normal error curve, with its height marked at σ and 2σ from the mean.)
The area under the curve is given by
∫₋∞^{+∞} A·e^(−h²(x−m)²) dx = A√π/h.
If A = h/√π, the area under the curve is unity and the equation of the curve is
y = (h/√π)·e^(−h²(x−m)²).
This frequency curve is called a normal or Gaussian distribution. Any Gaussian distribution is determined by the two parameters h and m: m is the mean of the distribution, and h, often called the precision constant, is related to the standard deviation σ by 2σ²h² = 1. Taking m = ⟨x⟩ and h² = 1/(2σ²),
y = [1/(σ√(2π))]·e^(−(x−⟨x⟩)²/2σ²),
which is the Gaussian distribution with mean ⟨x⟩ and standard deviation σ. (The slide shows Gaussian curves with the same mean and different σ.)

Example: From a sample of fish in a pond it is found that the mean length of the fish is m = 30 cm, with σ² = 4 cm². We assume that the length is a normal random variable. If we catch one of these fish, what is the probability that
• it will be at least 31 cm long?
• it will be no more than 32 cm long?
• its length will be between 26 and 29 cm?
(The corresponding probabilities are obtained from areas under the normal curve, shown shaded on the slide.)

The Normal or Gaussian Law of Errors: the function
y = [1/(σ√(2π))]·e^(−(x−⟨x⟩)²/2σ²)
defines the normal frequency distribution and is called the Gaussian law of errors. The two parameters ⟨x⟩ and σ characterize such a distribution. This law states that a set of measurements involving accidental errors is distributed about the mean. The mean is the best estimate of the value of the quantity, while σ estimates the accuracy. The value of the measured quantity is quoted as ⟨x⟩ ± α, where
α = σ/√n is the standard error of the mean,
n being the number of observations. The standard error of σ itself is σ/√(2n).

Example: The distances covered by a tourist on a vehicle are listed (in km/day): 782, 798, 786, 774, 771, 776. Find the mean distance and its standard error.
With a working mean of 780, the residuals d are 2, 18, 6, −6, −9, −4 (sum 7), and the values of d² are 4, 324, 36, 36, 81, 16 (sum 497).
Mean = working mean + ⟨residual⟩ = 780 + 7/6 ≈ 781.
Alternatively, directly: ⟨x⟩ = (782 + 798 + 786 + 774 + 771 + 776)/6 = 781.16 ≈ 781.
Standard deviation σ = √(⟨x²⟩ − ⟨x⟩²) ≈ 9.
Standard error of the mean = σ/√n = 9/√6 = 3.7 ≈ 4.
Standard error of the s.d. = σ/√(2n) = 9/√12 = 2.6 ≈ 3.
The mean distance is thus 781 ± 4 km/day.

Standard Errors of Compound Quantities: if a number of measured quantities have means m₁, m₂, m₃, …, mₙ with standard errors α₁, α₂, α₃, …, αₙ respectively, then the standard errors of compound quantities are:

Quantity                       Standard error
Sum, m₁ + m₂                   √(α₁² + α₂²)
Difference, m₁ − m₂            √(α₁² + α₂²)
Multiple, km₁ (k a constant)   k·α₁
Product, m₁m₂                  √(m₂²α₁² + m₁²α₂²)
Product, m₁m₂m₃                α = m₁m₂m₃·√[(α₁/m₁)² + (α₂/m₂)² + (α₃/m₃)²]
Power, m₁ᵖ                     α = p·m₁ᵖ⁻¹·α₁

Weighted Mean: a mean where some values contribute more than others. When we take a simple mean (or average), we give equal weight to each number.
For example, mean = (1 + 2 + 3 + 4)/4 = 2.5; each of the four numbers has a weight of ¼:
mean = ¼×1 + ¼×2 + ¼×3 + ¼×4 = 0.25 + 0.5 + 0.75 + 1.0 = 2.5.
If the weight of 3 is changed to 0.7, while the other three numbers keep equal weights of 0.1 each (so that the total weight is 1),
mean = 0.1×1 + 0.1×2 + 0.7×3 + 0.1×4 = 2.8.
This weighted mean is now a little higher, "pulled" there by the weight of 3. When some values get more weight than others, the central point (the mean) can change.

Least-Squares Fitting
Fitting requires a parametric model that relates the response data to the predictor data through one or more coefficients. The result of the fitting process is an estimate of the model coefficients, which are obtained by minimizing the summed squares of the residuals, defined as rᵢ = y_obs − y_cal. The residual is identified with the error associated with the datum.
Example: for the straight line y = a + bx,
a = [Σy·Σx² − Σx·Σxy] / [n·Σx² − (Σx)²] (intercept)
b = [n·Σxy − Σx·Σy] / [n·Σx² − (Σx)²] (slope)

Example:
x     y     ycal   |d|   d²
1.0   2.4   2.3    0.1   0.01
2.0   3.9   4.1    0.2   0.04
3.0   6.1   5.9    0.2   0.04
4.0   8.3   7.7    0.6   0.36
5.0   9.5   9.5    0.0   0.00
6.0   11.4  11.3   0.1   0.01
Σx = 21.0, Σy = 41.6, Σx² = 91.0, Σxy = 177.6, Σd² = 0.46
This gives y = 0.5 + 1.8x, with α² = Σd²/(n − 2) = 0.12. The errors in the coefficients are
α₁ = ±√[α²·Σx² / (n·Σx² − (Σx)²)] and α₂ = ±√[n·α² / (n·Σx² − (Σx)²)].

Fitting
• To find a functional form that describes the data within errors.
• Why fit data? To extract physical parameters from the data, to test the validity of the data or of a model, and to interpolate or extrapolate the data.
Goodness of fit
• The data are expected to fluctuate by about one error bar.
• The chi-square per degree of freedom (DoF) should be ≈ 1.
• DoF = number of data points − number of fitted parameters.
• Chi-square allows one to test whether the model (the fitted function) is compatible with the data.
How to fit data?
Vary the parameters of the function until you find the global optimum of the goodness-of-fit criterion.
Goodness-of-fit criteria:
• chi-square (the most common)
• likelihood
• (the Kolmogorov-Smirnov test)
Chi-square:
χ² = Σ (x_obs − x_cal)²/σ²,
where x_obs is the observed value, x_cal the calculated value, and σ the estimated uncertainty. Best fit: the global minimum of chi-square.

X   Y ± error    X²   X·Y    Ycal   d²     χ²
1   5.1 ± 0.6    1    5.1    4.83   0.072  0.202
2   10.2 ± 0.9   4    20.4   9.96   0.057  0.071
3   14.7 ± 1.4   9    44.1   15.09  0.152  0.077
4   19.5 ± 1.8   16   78.0   20.22  0.518  0.160
5   25.9 ± 2.2   25   129.5  25.35  0.302  0.062
6   30.7 ± 3.3   36   184.2  30.48  0.048  0.004
ΣX = 21, ΣY = 106.1, ΣX² = 91, ΣX·Y = 461.3, Σd² = 1.149, Σχ² = 0.576

With
a = [ΣY·ΣX² − ΣX·ΣX·Y] / [n·ΣX² − (ΣX)²] = −0.3,
b = [n·ΣX·Y − ΣX·ΣY] / [n·ΣX² − (ΣX)²] = 5.13,
α² = Σd²/(n − 2) = 0.287,
α₁ = ±√[α²·ΣX² / (n·ΣX² − (ΣX)²)] = ±0.460 and α₂ = ±√[n·α² / (n·ΣX² − (ΣX)²)] = ±0.118,
this gives
Ycal = (−0.3 ± 0.460) + (5.13 ± 0.118)·X, and χ²/DoF = 0.576/4 = 0.144.

The chi-square distribution: even if the error estimate is correct and the model is correct,
• chi-square will fluctuate from χ²/DoF < 1 to χ²/DoF > 1;
• the shape of the χ² distribution depends only on the number of degrees of freedom;
• the probability that data and model agree can be calculated.
Chi-square probability: the percentage of all measurements that would have a worse chi-square than the one expected.

Books
1. Ideas about Errors, by J. Topping
2. A Practical Guide to Data Analysis for Physical Science Students, by Louis Lyons
(A few figures are taken from the net.)

Thank you