Dealing with Data

During the course of this year you will collect data from many different experiments. You will have to report your data in a statistically correct way. Most importantly, you will have to report the uncertainty in your data. Any measurement without an indication of the uncertainty is meaningless! This hand-out tells you how to handle your data. (Note for the mathematicians in the class: this document assumes normally distributed data.)

Describing your sample data: mean (x̄) and standard deviation (s)

When you take several measurements (x_i) you can find the average (mean, x̄) by adding up (∑) all your values (x_i) and dividing the result by the total number (n) of measurements. Or in math form:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Reporting only the mean of your data set does not say much. Consider the following two data sets:

4, 4.5, 5, 5.5, 6
1, 2, 5, 8, 9

Both have a mean of 5 (check this!), but clearly the data from the first set is much more centered around the mean than the data from the second set. This spread of the data around the mean is expressed with the standard deviation (s). The standard deviation is calculated with the following formula (don't worry: your calculator can do this for you!):

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

When you report your data as x̄ ± s, you are effectively saying that a measurement of the quantity you sampled lies with about 68% certainty between x̄ - s and x̄ + s. For example, if we apply this to the two data sets given above we would report:

4, 4.5, 5, 5.5, 6:  5 ± 0.79
1, 2, 5, 8, 9:      5 ± 3.54

You can see that for the first data set we have a narrow range of ± 0.79 in which we expect a measurement to lie with 68% certainty. For the second data set the range is much wider (± 3.54).

In general about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations of the mean and about 99.7% lies within three standard deviations of the mean.
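The mean and standard deviation of the two example data sets can be checked with a short Python sketch. It assumes Python's standard `statistics` module is acceptable in place of a calculator; the helper name `describe` and the variable names are made up for illustration.

```python
import statistics

# The two example data sets from the text.
narrow = [4, 4.5, 5, 5.5, 6]
wide = [1, 2, 5, 8, 9]


def describe(data):
    """Return (mean, sample standard deviation) for a list of measurements."""
    mean = statistics.mean(data)
    # statistics.stdev divides by n - 1, i.e. it is the sample standard deviation.
    s = statistics.stdev(data)
    return mean, s


for data in (narrow, wide):
    mean, s = describe(data)
    print(f"{data}: {mean:.2f} ± {s:.2f}")
```

Running it prints 5.00 ± 0.79 for the first set and 5.00 ± 3.54 for the second, the same values as reported above.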
The ranges of one, two and three standard deviations are illustrated in the following graph:

[Graph: distribution of the data around the mean, with markers at 1s, 2s and 3s on either side of the mean]

The take-home lesson here: always report your data as x̄ ± s!

Outliers

When you take measurements you will sometimes find values that appear to be very different from the rest of your data set. You may suspect that an outlier has crept into your data. Here are the two rules for dealing with outliers:

1. Use EXTREME caution. You must have an experimental reason for throwing out a data point! In other words, you noted that something went wrong during the experiment and therefore you are justified in throwing out a data point. If this is not the case, be very careful about discarding your data. Sometimes the outlier can be the clue to a new theory.

2. If a value appears to be an outlier, use the following test:

$$T = \frac{|x_q - \bar{x}|}{s}$$

   where x_q is the questionable result, x̄ is the sample mean and s is the sample standard deviation. Notice that you divide the difference between the questionable value and the mean by the standard deviation. In other words, you are calculating how many standard deviations away from the mean your questionable value lies. If you look at the standard deviation graph above, you will see that it would be unusual to find a value more than 3 standard deviations away from the mean. If T is greater than the critical value in the table below for your sample size, then you can be 99% certain that your value is an outlier and can be rejected.

Sample size        3     4     5     6     7     8     9     10
Critical T value   1.15  1.49  1.75  1.94  2.10  2.22  2.32  2.41

Error from measurement, absolute and percentage

When you make measurements you must always keep track of the error in your measurement. Every piece of equipment comes with some uncertainty in the measurement. Any measurement without indication of the error is meaningless.

Absolute and percentage error

Many pieces of equipment have the absolute error printed on them. If this is not the case you can assume that the absolute error is half the distance between the smallest markings on the instrument.
For example, if you measure the volume of water in a cylinder with 1 ml markings, you can assume that the uncertainty in your measurement is ± 0.5 ml. The absolute error simply tells you the range in which your data lies with 68% certainty, or more simply put: the standard deviation of your measurement.

Percentage error gives an indication of how good a measurement is relative to the size of the thing being measured. In formula form:

$$\text{percentage error} = \frac{\text{absolute error}}{\text{value of thing measured}} \times 100\%$$

For example, if I measure the length of the room as 10 m ± 2 m and I also measure the distance from the Earth to the Moon as 384,403,000 m ± 2 m, I clearly did a much better job with the distance to the Moon. This shows up in the percentage error:

For the room: (2 / 10) * 100% = 20% error
For the Moon: (2 / 384403000) * 100% = 0.0000005% error

Error propagation in calculations

Once you have determined the error in your measurements you will often use your data to calculate new pieces of information. The error in the final answer depends on the operation that you perform.

Error in multiplications and divisions

Suppose a runner runs across a field and you measure the distance that she runs and the time it takes her to run across this field. Let's say you find that the distance is 100 meters and that your uncertainty in the measurement is ± 1 m. In other words, you are 68% certain that the distance lies between 99 and 101 meters. Say you measure the time with a stopwatch and you find 13.2 ± 0.2 seconds. To calculate the speed of the runner you now divide the distance by the time:

$$\text{speed} = \frac{\text{distance}}{\text{time}} = \frac{100 \pm 1\ \text{m}}{13.2 \pm 0.2\ \text{s}} = 7.58 \pm\ ??\ \text{m/s}$$

Now the question is: how do you find the uncertainty in the calculated speed based on the uncertainties in the measured distance and time? Here is the rule for multiplications and divisions: to find the percentage error of the calculated result, you add the percentage errors of the components.
For our example this means:

$$\text{speed} = \frac{100 \pm 1\ \text{m}}{13.2 \pm 0.2\ \text{s}} = \frac{100 \pm 1\%}{13.2 \pm 1.5\%} = 7.58\ \text{m/s} \pm 2.5\%$$

Since 2.5% of 7.58 is equal to 0.1895, your answer would be 7.58 ± 0.1895 m/s. The first digit that is uncertain is the tenths (the 1 in 0.1895), so your final answer should not be given to more places behind the decimal point than the tenths. You round the uncertainty to that digit, so that it becomes ± 0.2. Your final answer would be:

$$\text{speed} = \frac{100 \pm 1\ \text{m}}{13.2 \pm 0.2\ \text{s}} = 7.58\ \text{m/s} \pm 2.5\% = 7.6 \pm 0.2\ \text{m/s}$$

Error in additions and subtractions

There is a different rule for determining the uncertainty of a calculated addition or subtraction. You simply add the absolute uncertainties. For example, let's say that you are measuring the length and the width of a field. Suppose that you get the following measurements:

Width = 13.7 ± 0.2 m
Length = 11.3 ± 0.3 m

If you walk the width and the length of the field, how far do you walk? In other words, what is

(13.7 ± 0.2 m) + (11.3 ± 0.3 m) = 25.0 ± ? m

The uncertainty in the width shows that the true width lies somewhere between 13.7 - 0.2 and 13.7 + 0.2 meters, so between 13.5 and 13.9 meters. The uncertainty in the length shows that the true length lies somewhere between 11.3 - 0.3 and 11.3 + 0.3 meters, so between 11.0 and 11.6 meters. When you walk the length and the width, the shortest possible distance according to this is 13.5 + 11.0 = 24.5 meters. The longest possible distance is 13.9 + 11.6 = 25.5 meters. In other words, the distance walked is somewhere between 24.5 and 25.5 meters. Notice that this is the same as saying 25.0 ± 0.5 meters, and that the uncertainty in the answer is the sum of the uncertainties in the individual measurements! The uncertainty is in the tenths, so your final answer should contain the tenths as the final digit.
So your final answer would be:

(13.7 ± 0.2 m) + (11.3 ± 0.3 m) = 25.0 ± 0.5 m

Improving the quality of your result: multiple samples

You can improve the certainty of your result by doing more measurements. This will narrow the range of uncertainty. The following formula applies:

$$s_f = \frac{s}{\sqrt{N}}$$

where s_f is the final error, s is the sample error from one measurement and N is the number of samples.

For example, suppose you have measured the length of a nail as 5 ± 1 cm, 4 ± 1 cm and 6 ± 1 cm. The average (mean) of the three measurements is 5 cm. Each individual measurement has an uncertainty of ± 1 cm, but because you did multiple measurements you can report your final answer with greater certainty. When you apply the formula you get s_f = 1/√3 ≈ 0.577 ≈ 0.6, so your final answer for the length of the nail would be 5.0 ± 0.6 cm.
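The two propagation rules and the multiple-sample formula can be checked against the worked examples in this hand-out with a short Python sketch. The function names (`divide_with_error`, `add_with_error`, `error_of_mean`) are made up for illustration, and the code assumes the simple rules given here: add percentage errors for a division, add absolute errors for a sum.

```python
import math


def divide_with_error(a, da, b, db):
    """Divide (a ± da) by (b ± db) by adding the percentage errors."""
    value = a / b
    percent = (da / abs(a) + db / abs(b)) * 100  # percentage error of the result
    return value, abs(value) * percent / 100     # back to an absolute error


def add_with_error(a, da, b, db):
    """Add (a ± da) and (b ± db) by adding the absolute errors."""
    return a + b, da + db


def error_of_mean(s, n):
    """Final error s_f = s / sqrt(N) after N measurements with error s each."""
    return s / math.sqrt(n)


# Runner: 100 ± 1 m in 13.2 ± 0.2 s.
# The percentages are not rounded along the way, so the raw error is about
# 0.19 rather than 0.1895; it rounds to 0.2 either way.
speed, d_speed = divide_with_error(100, 1, 13.2, 0.2)
print(f"speed = {speed:.1f} ± {d_speed:.1f} m/s")

# Field: width 13.7 ± 0.2 m plus length 11.3 ± 0.3 m.
total, d_total = add_with_error(13.7, 0.2, 11.3, 0.3)
print(f"distance = {total:.1f} ± {d_total:.1f} m")

# Nail: three measurements, each with an error of ± 1 cm.
print(f"s_f = {error_of_mean(1, 3):.1f} cm")
```

The printed results, 7.6 ± 0.2 m/s, 25.0 ± 0.5 m and 0.6 cm, agree with the hand calculations above.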