Dealing with Data
During the course of this year you will collect data from many different experiments.
You will have to report your data in a statistically correct way. Most importantly you
will have to report the uncertainty in your data. Any measurement without an
indication of the uncertainty is meaningless! This hand-out tells you how to handle
your data. (Note for the mathematicians in the class: this document assumes normally
distributed data)
Describing your sample data: Mean (x̄) and standard deviation (s)
When you take several measurements (xi) you can find the average (mean, x̄) by
adding up (∑) all your values (xi) and dividing the result by the total number (n) of
measurements. Or in math form:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Reporting only the mean of your data set does not say much. Consider the following
two data sets:
4, 4.5, 5, 5.5, 6 and 1, 2, 5, 8, 9
Both have a reported mean of 5 (check this!), but clearly the data from the first set is
much more centered around the mean than the data from the second set. This spread
of the data around the mean is expressed with the standard deviation (s).
The standard deviation is calculated with the following formula (don’t worry: your
calculator can do this for you!):
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
When you report your data as x̄ ± s, you are effectively saying that the true mean of
the data that you sampled lies with 68% certainty between x̄ − s and x̄ + s.
For example, if we apply this to the two data sets given above we would report:
4, 4.5, 5, 5.5, 6: 5 ± 0.79
1, 2, 5, 8, 9: 5 ± 3.54
You can see that for the first data set we have a narrow range of ± 0.79 in which we
expect the true mean (which we sampled five times) to lie with 68% certainty. For the
second data set the range is much wider (± 3.54)
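The two reported results can be checked with a short script. This is a minimal sketch in Python using the standard library's statistics module; the variable names are illustrative and not part of the hand-out. It assumes the sample formula for the standard deviation (dividing by n − 1), which is what statistics.stdev computes and what reproduces the values quoted above.

```python
import statistics

# The two example data sets from the text.
set_a = [4, 4.5, 5, 5.5, 6]
set_b = [1, 2, 5, 8, 9]

for data in (set_a, set_b):
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    print(f"{mean:.2f} ± {s:.2f}")
# prints:
# 5.00 ± 0.79
# 5.00 ± 3.54
```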
In general about 68% of the data lies within one standard deviation of the mean, 95%
of the data lies within 2 standard deviations of the mean and 99.7% of the data lies
within 3 standard deviations of the mean. This is illustrated in the following graph:
[Figure: normal distribution curve centered on the mean, with bands marked 1s, 2s and 3s on either side]
The take-home lesson here: always report your data as x̄ ± s!
Outliers
When you take measurements you will sometimes find values that appear to be very
different from the rest of your data set. You may suspect that an outlier has crept into
your data. Here are the two rules for dealing with outliers:
1. Use EXTREME caution. You must have an experimental reason for throwing
out a data point! In other words, you noted that something went wrong during the
experiment and therefore you are justified in throwing out a data point. If this is
not the case, be very careful about discarding your data. Sometimes the outlier can
be the clue to a new theory.
2. If a value appears to be an outlier use the following test:
$$T = \frac{|x_q - \bar{x}|}{s}$$
where xq is the questionable result, x̄ is the sample mean and s is the sample
standard deviation. Notice that you divide the difference between the questionable
value and the mean by the standard deviation. In other words, you are calculating
how many standard deviations away from the mean your questionable value lies.
If you look at the standard deviation graph above, you will see that it would be
unusual to find a value more than 3 standard deviations away from the mean.
If T is greater than the critical value in the table below for your sample size, then
you can be 99% certain that your value is an outlier and can be rejected.
Sample size    Critical T value
3              1.15
4              1.49
5              1.75
6              1.94
7              2.10
8              2.22
9              2.32
10             2.41
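The outlier test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed procedure: the function names and the readings are hypothetical, the mean and standard deviation are computed with the questionable value included, and the n = 9 critical value is taken as 2.32, the value that standard Grubbs-test tables give at this confidence level.

```python
import statistics

# Critical T values at 99% confidence, keyed by sample size.
# Assumption: n = 9 is taken as 2.32, as in standard Grubbs-test tables.
CRITICAL_T = {3: 1.15, 4: 1.49, 5: 1.75, 6: 1.94,
              7: 2.10, 8: 2.22, 9: 2.32, 10: 2.41}

def outlier_t(data, questionable):
    """Return T = |x_q - mean| / s for the full data set."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return abs(questionable - mean) / s

def is_outlier(data, questionable):
    """True if T exceeds the critical value for this sample size."""
    return outlier_t(data, questionable) > CRITICAL_T[len(data)]

# Hypothetical readings: four close together and one far away.
readings = [10.1, 10.2, 10.0, 10.3, 12.9]
print(round(outlier_t(readings, 12.9), 2), is_outlier(readings, 12.9))
# prints: 1.78 True
```

Passing the test only shows that the value is statistically unusual; the first rule still applies, so look for an experimental reason before discarding it.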
Error from measurement, absolute and percentage
When you make measurements you must always keep track of the error in your
measurement. Every piece of equipment comes with some uncertainty in the
measurement. Any measurement without indication of the error is meaningless.

Absolute and percentage error
Many pieces of equipment have the absolute error printed on them. If this is not the
case you can assume that the absolute error is half the distance between the smallest
markings on the instrument. For example if you measure the volume of water in a
cylinder with 1 ml markings, you can assume that the uncertainty in your measurement
is ± 0.5 ml. The absolute error simply tells you the range in which your data lies with
68% certainty, or more simply put: the standard deviation of your measurement.
Percentage error gives an indication of how good a measurement is relative to the
size of the thing being measured. In formula form:
$$\text{percentage error} = \frac{\text{absolute error}}{\text{value of thing measured}} \times 100\%$$
For example if I measure the length of the room as 10 m ± 2 m and I also measure the
distance from earth to the moon as 384,403,000 m ± 2 m, I clearly did a much better
job with the distance to the moon. This shows up in the percentage error.
For the room: (2 / 10) * 100% = 20% error
For the moon: (2 / 384403000) * 100% = 0.0000005% error
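The same arithmetic as a minimal Python sketch; the function name percentage_error is illustrative and not taken from the hand-out.

```python
def percentage_error(value, absolute_error):
    """Absolute error expressed as a percentage of the measured value."""
    return abs(absolute_error / value) * 100

room = percentage_error(10, 2)            # room length: 10 m ± 2 m
moon = percentage_error(384_403_000, 2)   # earth-moon distance ± 2 m

print(f"room: {room:.0f}%")   # room: 20%
print(f"moon: {moon:.7f}%")   # moon: 0.0000005%
```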
Error propagation in calculations
Once you have determined the error in your measurements you will often use your
data to calculate new pieces of information. The error in the final answer depends on
the operation that you perform:

Error in multiplications and divisions
Suppose a runner runs across a field and you measure the distance that she runs and
the time it takes her to run across this field. Let’s say you find that the distance is 100
meters and that your uncertainty in the measurement is ± 1 m. In other words, you are
68% certain that the distance lies between 99 and 101 meters. Say you measure
the time with a stopwatch and you find 13.2 ± 0.2 seconds. To calculate the
speed of the runner you now divide the distance by the time:
$$\text{speed} = \frac{\text{distance}}{\text{time}} = \frac{100 \pm 1\ \text{meters}}{13.2 \pm 0.2\ \text{seconds}} = 7.58 \pm\ ??\ \text{m/s}$$
Now the question is: how do you find the uncertainty in the calculated speed based on
the uncertainties in measured distance and time? Here is the rule for multiplications
and divisions: To find the error for the calculated result, you add the percentage
errors of the components. For our example this means:
$$\text{speed} = \frac{\text{distance}}{\text{time}} = \frac{100 \pm 1\ \text{meters}}{13.2 \pm 0.2\ \text{seconds}} = \frac{100 \pm 1\%}{13.2 \pm 1.5\%} = 7.58\ \text{m/s} \pm 2.5\%$$
Since 2.5% of 7.58 is equal to 0.1895, your answer would be: 7.58 ± 0.1895 m/s.
The first uncertain digit is in the tenths place (0.1895), so your final answer should not
be given to more decimal places than the tenths. You round the uncertainty to that
place, so that it becomes ± 0.2. Your final answer would be:
$$\text{speed} = \frac{\text{distance}}{\text{time}} = \frac{100 \pm 1\ \text{meters}}{13.2 \pm 0.2\ \text{seconds}} = \frac{100 \pm 1\%}{13.2 \pm 1.5\%} = 7.58\ \text{m/s} \pm 2.5\% = 7.6 \pm 0.2\ \text{m/s}$$
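The rule for divisions can be sketched in Python as follows. The function name divide_with_error is hypothetical. Because the script does not round the intermediate percentages, its unrounded uncertainty (about 0.19) differs in the later digits from the 0.1895 worked out by hand, but both round to the same final answer.

```python
def divide_with_error(a, da, b, db):
    """Return (a / b, absolute error), adding the percentage errors of a and b."""
    result = a / b
    pct = (da / abs(a) + db / abs(b)) * 100  # summed percentage error
    return result, abs(result) * pct / 100

# Runner example: 100 ± 1 meters covered in 13.2 ± 0.2 seconds.
speed, err = divide_with_error(100, 1, 13.2, 0.2)
print(f"{speed:.1f} ± {err:.1f} m/s")  # 7.6 ± 0.2 m/s
```

The same percentage rule applies to a multiplication; only the line computing the result would change.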

Error in additions and subtractions
There is a different rule for determining the uncertainty of a calculated addition or
subtraction. You simply add the absolute uncertainties. For example let’s say
that you are measuring the length and the width of a field. Suppose that you get
the following measurements:
Width = 13.7 ± 0.2 m
Length = 11.3 ± 0.3 m
If you walk the width and the length of the field how far do you walk? In other words
what is
(13.7 ± 0.2 m) + (11.3 ± 0.3 m) = 25.0 ± ? m
The uncertainty in the width shows that the true width lies somewhere between
13.7 – 0.2 and 13.7 + 0.2 meters, that is, between 13.5 and 13.9 meters. The uncertainty in
the length shows that the true length is somewhere between 11.3 – 0.3 and 11.3 +
0.3 meters, that is, between 11.0 and 11.6 meters.
When you walk the length and the width, the shortest possible distance according to
this is 13.5 + 11.0 meters = 24.5 meters. The longest possible distance is 13.9 + 11.6
meters = 25.5 meters. In other words, the distance walked is somewhere between 24.5 and
25.5 meters. Notice that this is the same as saying 25.0 ± 0.5 meters. And notice that
the uncertainty in the answer is the same as the sum of the uncertainties in the
individual measurements! The uncertainty is in the tenths, so your final answer should
contain the tenths as the final digit. So your final answer would be:
(13.7 ± 0.2 m) + (11.3 ± 0.3 m) = 25.0 ± 0.5 m
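A minimal Python sketch of the addition rule, with the shortest and longest possible distances computed as a cross-check. The function name add_with_error is illustrative, not part of the hand-out.

```python
def add_with_error(a, da, b, db):
    """Return (a + b, absolute error) by adding the absolute uncertainties."""
    return a + b, da + db

# Field example: width 13.7 ± 0.2 m, length 11.3 ± 0.3 m.
total, err = add_with_error(13.7, 0.2, 11.3, 0.3)
print(f"{total:.1f} ± {err:.1f} m")  # 25.0 ± 0.5 m

# Cross-check against the extreme cases worked out in the text.
shortest = (13.7 - 0.2) + (11.3 - 0.3)
longest = (13.7 + 0.2) + (11.3 + 0.3)
print(f"{shortest:.1f} to {longest:.1f} m")  # 24.5 to 25.5 m
```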
Improving the quality of your result: multiple samples
You can improve the certainty of your result by doing more measurements. This will
narrow the range of uncertainty. The following formula applies:
$$s_f = \frac{s}{\sqrt{N}}$$
where sf is the final error, s is the sample error from one measurement and N is the
number of samples.
For example, suppose you have measured the length of a nail as 5 ± 1 cm, 4 ± 1 cm
and 6 ± 1 cm. The average (mean) of the three measurements is 5 cm. Each individual
measurement has an uncertainty of ± 1 cm, but because you did multiple measurements
you can report your final answer with greater certainty. When you apply the formula
you get:
sf = 1/√3 ≈ 0.577 ≈ 0.6 cm
So your final answer for the length of the nail would be: 5.0 ± 0.6 cm.
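The nail example can be reproduced with a short Python sketch. The function name final_error is hypothetical, and the sketch assumes, as the example does, that every measurement carries the same uncertainty s.

```python
import math

def final_error(s, n):
    """Uncertainty of the mean of n measurements that each have uncertainty s."""
    return s / math.sqrt(n)

lengths = [5, 4, 6]  # nail lengths in cm, each ± 1 cm
mean = sum(lengths) / len(lengths)
sf = final_error(1, len(lengths))
print(f"{mean:.1f} ± {sf:.1f} cm")  # 5.0 ± 0.6 cm
```

Because the uncertainty falls with the square root of N, halving it takes four times as many measurements.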