Exercise on Error bars
The following are replicate observations of % growth at each of four pH values. For
each set, give the mean, standard deviation, standard error of the mean and range and
arrange them as a table. (It is started for you.)
pH    % Growth                                   mean    s.d.    s.e.    range
4.0   2.6 2.9 2.7 2.7 3.8 2.9 2.6 3.1 2.7 3.8    2.98    0.459   0.145   1.2
5.0   2.8 3.4 4.9 3.6 3.8 2.7 3.3 3.1 3.5 4.7    3.58    0.728   0.230   2.2
6.0   3.7 3.6 5.9 4.2 4.3 3.6                    4.22    0.880   0.359   2.3
7.0   4.4 4.1 3.9 4.5 4.2 4.0 5.1 5.3 4.8        4.478   0.494   0.165   1.4
a) Having calculated the mean, s.d. and standard error for each set of observations,
draw a graph of mean % growth against pH and draw error bars at each point.
b) Do you think a straight line is justified as a fit to these data, or do they need a curve?
(Hint: What does the standard error bar tell you about the position of the true mean
% growth at a particular pH?)
Answers above for calculation in italics. Graph (not given) has pH along the horizontal
axis, mean % growth on vertical axis.
a) The point of this is to be sure you can use your calculator in s.d. mode to find mean
and s.d. of a small sample of data in one operation. The standard error has to be
calculated from the s.d. by you (it is not an automatic calculator function). If your s.d.
values are all slightly lower than the above you are using the wrong s.d. (use the
σn-1 or s key, NOT σn or σ). Also recall that the range of the data (maximum value –
minimum value) is related to the s.d.: a larger s.d. generally goes with a larger range.
b) The chance that the true mean % growth at each pH lies between the top and the
bottom of the standard error bar is only about 68%. To be 95% sure that we have
caught the true value we would need at least to double the bar length. With bars of
that length a single straight line can pass through all four of them, so there is no
evidence against a straight line.
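The calculator steps in answer a) can be mirrored in a few lines of Python. This is only a minimal sketch using the standard library `statistics` module; the names `growth` and `summarise` are illustrative and not part of the exercise. `stdev` uses the n-1 divisor (the sample s.d.), while `pstdev` uses the n divisor and so always comes out slightly lower, which is the "wrong s.d." warned about above.

```python
import math
import statistics

# Replicate % growth observations from the exercise table, keyed by pH.
growth = {
    4.0: [2.6, 2.9, 2.7, 2.7, 3.8, 2.9, 2.6, 3.1, 2.7, 3.8],
    5.0: [2.8, 3.4, 4.9, 3.6, 3.8, 2.7, 3.3, 3.1, 3.5, 4.7],
    6.0: [3.7, 3.6, 5.9, 4.2, 4.3, 3.6],
    7.0: [4.4, 4.1, 3.9, 4.5, 4.2, 4.0, 5.1, 5.3, 4.8],
}

def summarise(values):
    """Return mean, sample s.d. (n-1 divisor), standard error and range."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # n-1 divisor
    se = sd / math.sqrt(len(values))         # s.e. = s.d. / sqrt(n)
    return mean, sd, se, max(values) - min(values)

for ph, values in growth.items():
    mean, sd, se, rng = summarise(values)
    wrong_sd = statistics.pstdev(values)     # n divisor: slightly too low
    print(f"pH {ph}: mean={mean:.3f} sd={sd:.3f} se={se:.3f} "
          f"range={rng:.1f} (n-divisor sd={wrong_sd:.3f})")
```

Doubling `se` before drawing the bars gives the rough 95% bars discussed in answer b).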
What’s missing from this graph?
pH    Mean    Mean – Std. Error    Mean + Std. Error
4     2.98    2.84                 3.13
5     3.58    3.35                 3.81
6     4.22    3.86                 4.58
7     4.48    4.32                 4.65
Questions Page 8
1. Plate counts using the same basic medium at various pH levels were obtained (after 8
hours incubation). Unfortunately some plates were contaminated and the sample sizes for
them ended up smaller. So the results available were as follows:
pH       5.0    6.0    7.0    8.0
Count    52     89     110    120
         49     130    162    154
         72     75     98     175
         63     140    148
         47            131
         57            142
i) Calculate the mean and s.d. of the counts for each pH.
ii) Find the standard error of the mean for each pH.
iii) Draw a graph of mean count against pH incorporating error bars.
iv) Calculate the coefficient of variation for the mean count at each pH.
pH                    5.0      6.0      7.0      8.0
Mean                  56.7     108.5    131.8    149.7
S.d.                  9.47     31.40    24.07    27.76
Std. Error            3.87     15.70    9.83     16.02
Coeff of Variation    16.7%    28.9%    18.3%    18.5%
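As a check on these answers, the same quantities can be sketched in Python. The dictionary `counts` and the function `describe` are illustrative names. The grouping of counts by pH is the one that reproduces the tabulated means, and the coefficient of variation is taken here as s.d. divided by mean, expressed as a percentage, which is what the tabulated values correspond to. Note how the unequal sample sizes feed into the standard error.

```python
import math
import statistics

# Plate counts by pH; contaminated plates leave unequal sample sizes.
counts = {
    5.0: [52, 49, 72, 63, 47, 57],
    6.0: [89, 130, 75, 140],
    7.0: [110, 162, 98, 148, 131, 142],
    8.0: [120, 154, 175],
}

def describe(values):
    """Return (n, mean, sample s.d., standard error, CV in %)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # n-1 divisor
    se = sd / math.sqrt(n)                   # shrinks as n grows
    cv = 100 * sd / mean                     # s.d. relative to the mean
    return n, mean, sd, se, cv

for ph, values in counts.items():
    n, mean, sd, se, cv = describe(values)
    print(f"pH {ph}: n={n} mean={mean:.1f} sd={sd:.2f} "
          f"se={se:.2f} cv={cv:.1f}%")
```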
Page 9
2. A Physiology study took numerous measurements on a large number of people.
a) Below is a random sample of left ear and left foot lengths for males.
Left Ear (cm)     6.0   6.5   5.8   7.0   6.8   6.2   5.0   5.5   5.5
Left foot (cm)    28.0  25.0  25.5  27.0  25.2  25.2  26.0  26.0  24.5
Which set of measurements is the more variable? This is intentionally somewhat open-ended. You need to decide which measure of variation to use.
Answer
             Mean    S.d.     C.V.
Left ear     6.03    0.658    10.9%
Left foot    25.8    1.09     4.2%
To compare variation of two sets of measurements that are on different scales or that have
very different means, the coefficient of variation should be used. This applies here as the
mean lengths are very different. Hence we can say that the length of the ear is more
variable in relative terms than the foot.
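The difference between absolute and relative variation can be illustrated with a short sketch. The names `ear_cm`, `foot_cm` and `cv_percent` are illustrative. The last step, converting to millimetres, is an added illustration rather than part of the question: it shows that the s.d. depends on the measurement scale while the coefficient of variation does not.

```python
import statistics

ear_cm = [6.0, 6.5, 5.8, 7.0, 6.8, 6.2, 5.0, 5.5, 5.5]
foot_cm = [28.0, 25.0, 25.5, 27.0, 25.2, 25.2, 26.0, 26.0, 24.5]

def cv_percent(values):
    """Coefficient of variation: sample s.d. as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# In absolute terms the foot has the larger s.d. ...
print(statistics.stdev(ear_cm), statistics.stdev(foot_cm))
# ... but relative to its mean the ear is the more variable.
print(cv_percent(ear_cm), cv_percent(foot_cm))

# Changing units (cm to mm) multiplies the s.d. by 10
# but leaves the coefficient of variation unchanged.
ear_mm = [10 * x for x in ear_cm]
print(statistics.stdev(ear_mm), cv_percent(ear_mm))
```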
Page 9
3. Two methods are in use to determine the % moisture in wood. A sample of wood is
divided into 12 pieces. The % moisture is determined 6 times using Method A and 6 times
using Method B.
Method A    42% 51% 43% 50% 54% 44%
Method B    39% 41% 46% 39% 42% 41%
Which method is the more repeatable?
Answer
        Method A    Method B
Mean    47.3%       41.3%
S.d.    4.97%       2.58%
Method B has the lower s.d. and therefore it is more repeatable. Repeatability relates to
the s.d. of the observations rather than the standard error. Anyone who points out that the
range in Method B is smaller than that for Method A and quotes the ranges has also got
it right (and saved work!)
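Both routes to the answer, the s.d. and the quicker range, can be checked with a small sketch. The names `method_a`, `method_b` and `spread` are illustrative.

```python
import statistics

method_a = [42, 51, 43, 50, 54, 44]   # % moisture, Method A
method_b = [39, 41, 46, 39, 42, 41]   # % moisture, Method B

def spread(values):
    """Return (sample s.d., range) as two measures of repeatability."""
    return statistics.stdev(values), max(values) - min(values)

sd_a, range_a = spread(method_a)
sd_b, range_b = spread(method_b)
print(f"A: sd={sd_a:.2f} range={range_a}")
print(f"B: sd={sd_b:.2f} range={range_b}")

# The method with the smaller spread is the more repeatable one.
more_repeatable = "A" if sd_a < sd_b else "B"
print("More repeatable:", more_repeatable)
```

Here the s.d. and the range agree on the ranking. With only six observations per method the range is a crude measure, but it is enough for this comparison.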
Page 9
4. Two methods are in use for vitamin A determination in vitamin pills. To compare
methods a solution of known concentration 2 mg/ml was used. Each method was used for
5 repeats on Monday and another 5 repeats on the following Friday. The amount of
vitamin A should be identical in each case.
                                               Mean     s.d.
Method A   Monday   2.05 2.10 2.08 2.08 2.06   2.074    0.01949
           Friday   1.85 1.87 1.93 1.87 1.90   1.884    0.03130
Method B   Monday   2.10 1.92 2.07 1.96 1.95   2.000    0.07969
           Friday   2.06 1.97 2.09 2.01 1.92   2.010    0.06819
Which method (if any) is apparently
i) accurate? Method B: mean very near to 2, the known mean.
ii) repeatable? Method A: low s.d. within repeat sets.
iii) reproducible? Probably B: less difference in means Mon. to Fri.
iv) precise? Method A: lower s.d. than B on same no. of observations.
In practical terms there is a problem. Method B is more accurate but less precise than
Method A. I would want considerably more data if I had to decide between the methods
in real life. If all we want is to compare data to see if there has been an increase or
decrease then Method A may be better. If we need to be accurate against an outside
standard we may have to stick with Method B and do more repeats each time to increase
the precision, since its s.d. is higher.
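The four properties can be turned into rough numerical summaries, as in the sketch below. The way each property is scored here (largest bias from the known value, largest within-day s.d., and the Monday-to-Friday shift in means) is an assumption made for illustration, not a standard definition. The names `KNOWN`, `data` and `assess` are also illustrative.

```python
import statistics

KNOWN = 2.0  # mg/ml, the known concentration from the question

data = {
    "A": {"Monday": [2.05, 2.10, 2.08, 2.08, 2.06],
          "Friday": [1.85, 1.87, 1.93, 1.87, 1.90]},
    "B": {"Monday": [2.10, 1.92, 2.07, 1.96, 1.95],
          "Friday": [2.06, 1.97, 2.09, 2.01, 1.92]},
}

def assess(days):
    """Return (worst bias from KNOWN, largest within-day s.d., day-to-day shift)."""
    means = [statistics.mean(v) for v in days.values()]
    sds = [statistics.stdev(v) for v in days.values()]
    worst_bias = max(abs(m - KNOWN) for m in means)   # accuracy
    worst_sd = max(sds)                               # repeatability / precision
    shift = max(means) - min(means)                   # reproducibility
    return worst_bias, worst_sd, shift

for name, days in data.items():
    bias, sd, shift = assess(days)
    print(f"Method {name}: bias={bias:.3f} sd={sd:.4f} shift={shift:.3f}")
```

Smaller is better on all three scores. On these data Method B scores better on bias and shift, and Method A scores better on s.d.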
Answers for Confidence intervals for a population mean
Assume that all samples are normally distributed for the questions on this handout.
Remember that if the standard deviation has been calculated on a small sample you
need to use the t-distribution.
Page 13
1. An analysis was made of peat soils from a number of sites which had similar vegetation.
The total phosphate in the soil was as follows:
(mg/100g dry weight)
39.3 46.6 51.7 46.0 68.3 58.0
Calculate the mean and s.d. of the sample and hence find a 95% confidence interval for
the mean phosphate content of the soil.
Explain clearly in words what this confidence interval means. WRITTEN answer!!
Solution:
Mean: 51.65
Standard Deviation: 10.27
Critical Value: page 45 of the statistical tables.
Identify Column: (100 − 95)/2 = 2.5, i.e. the 2.5% (0.025) column.
Identify Row: degrees of freedom ν = n − 1; n = 6 ⇒ n − 1 = 5.
Thus: t(ν = 5, 0.025) = 2.571
[Distribution plot: t density with df = 5; area 0.025 in each tail, beyond −2.571 and +2.571]
Standard Error: 10.27/√6 = 4.193
95% confidence interval:
51.65 ± 2.571 × 4.193 = 51.65 ± 10.78, i.e. from 40.87 to 62.43 mg/100g
95% of the times when we do this type of calculation we will be correct when we claim
that the calculated interval includes the true mean.
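The steps of this solution can be sketched in Python as follows. The critical value is copied from the tables, exactly as in the solution, rather than computed, since the standard library has no t-distribution. The names `phosphate`, `T_CRIT_95_DF5` and `t_interval` are illustrative.

```python
import math
import statistics

phosphate = [39.3, 46.6, 51.7, 46.0, 68.3, 58.0]  # mg/100g dry weight

# Two-tailed 5% critical value of t on 5 degrees of freedom,
# taken from the tables as in the solution (not computed here).
T_CRIT_95_DF5 = 2.571

def t_interval(values, t_crit):
    """Return (lower, upper) limits of mean +/- t * standard error."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    half_width = t_crit * se
    return mean - half_width, mean + half_width

lower, upper = t_interval(phosphate, T_CRIT_95_DF5)
print(f"95% CI: {lower:.2f} to {upper:.2f} mg/100g")
```

The function keeps full precision throughout and rounds only when printing, in line with the advice on working precision given at the end of this handout.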
Page 13
2. Using specimens from 10 children, determination of the %calcium content of sound
teeth gave the following:
36.39 36.19 34.20 35.15 35.47
35.22 36.11 35.63 36.63 35.59
(i) Find 95% and 99% confidence intervals for the mean %calcium of the teeth.
Mean: 35.658
Standard Deviation: 0.7137
Standard Error: 0.7137/√10 = 0.2257
Degrees of Freedom (Row on page 45 of Tables): 𝜈 = 𝑛 − 1 = 10 − 1 = 9
Critical Values:
95% interval: 2.262
99% interval: 3.250
95% Confidence Interval: 35.658 ± 2.262 × 0.2257 = 35.658 ± 0.511
99% Confidence Interval: 35.658 ± 3.250 × 0.2257 = 35.658 ± 0.734
(ii) Important question. Explain why you cannot find a 100% confidence interval.
Because we are assuming an underlying normal distribution and we can never state a
finite range that contains 100% of a normal distribution. (The curve never quite touches
the horizontal axis.)
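The point about a 100% interval can be seen numerically. The sketch below uses the standard normal distribution from Python's `statistics` module for simplicity (the t-distribution used in this question behaves the same way, with larger values). The function name `z_critical` is illustrative.

```python
from statistics import NormalDist, StatisticsError

z = NormalDist()  # standard normal distribution

def z_critical(level):
    """Two-sided critical value: leaves (1 - level) / 2 in each tail."""
    return z.inv_cdf(1 - (1 - level) / 2)

levels = [0.90, 0.95, 0.99, 0.999, 0.999999]
criticals = [z_critical(level) for level in levels]
for level, crit in zip(levels, criticals):
    print(f"{level:.6f} -> {crit:.3f}")

# At exactly 100% there is no finite critical value to look up.
try:
    z_critical(1.0)
    finite_at_100 = True
except StatisticsError:
    finite_at_100 = False
print("Finite critical value at 100%?", finite_at_100)
```

The critical value, and with it the interval width, keeps growing as the confidence level approaches 100%.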
Page 13
3. The mean indirect bilirubin level of 16 four-day old infants was found to be 5.98
mg/100cc. The s.d. was calculated to be 3.5 mg/100cc.
Find 90%, 95% and 99% confidence intervals for the mean bilirubin level of the
population.
Critical Values (from t-tables) are based on 15 degrees of freedom:
90%: 1.753
95%: 2.131
99%: 2.947
Standard Error is: 3.5/√16 = 3.5/4 = 0.875
Confidence intervals are:
90%: 5.98 ± 1.753 × 0.875 = 5.98 ± 1.53
95%: 5.98 ± 2.131 × 0.875 = 5.98 ± 1.86
99%: 5.98 ± 2.947 × 0.875 = 5.98 ± 2.58
Page 13
4. A sample of 100 apparently normal adult males, aged 25, had a mean systolic blood
pressure of 125. Assuming that the s.d. of the sample is 15 find
(i) a 90% C.I. for the population mean
(ii) a 95% C.I. for the population mean.
“Large” sample size – use Normal distribution rather than t-distribution. This is the last
row of the t-tables.
Standard Error: 15/√100 = 1.5
Critical Values:
90%: 1.645
95%: 1.960
90% confidence interval: 125 ± 1.645 × 1.5 = 125 ± 2.47
95% confidence interval: 125 ± 1.960 × 1.5 = 125 ± 2.94
For this it is not necessary to use the t-distribution. The normal distribution can be used as the
sample size is greater than 30. The z-values can be found either by using Table 5 and looking up the
2.5% point for the 95% interval, etc., or by using the bottom row of Table 10, which is marked as having
an infinite number of degrees of freedom (written as ∞).
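For a large sample the z-values need not come from tables. As a sketch, they can be obtained from `NormalDist.inv_cdf` in Python's standard library. The variable and function names below are illustrative.

```python
import math
from statistics import NormalDist

mean, sd, n = 125, 15, 100          # values given in the question
se = sd / math.sqrt(n)              # 15 / 10 = 1.5

def z_interval_half_width(level):
    """Half-width of a two-sided normal-based interval at this level."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_crit * se

for level in (0.90, 0.95):
    hw = z_interval_half_width(level)
    print(f"{level:.0%} CI: {mean} ± {hw:.2f}")
```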
General point about the precision of the quoted mean and s.d. (s.e.)
The usual rules are that you give the mean to one more decimal place than the data.
The standard deviation and the standard error should also be quoted to one more
decimal place than the original. BUT this is as a final answer. You need at least one
extra figure of precision while doing the calculations.
For example, in question 1 above you work to 3 d.p. at least but you quote answers
rounded to 2 d.p. at the end.
If you do not work to greater precision than your final answer you can get a very
inaccurate answer.
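The effect of rounding too early can be shown with the phosphate data from question 1. This is a sketch only: the choice to round the s.d. to 1 d.p. before use is an illustrative assumption about what "not working to greater precision" might look like, and the names `phosphate` and `half_width` are illustrative.

```python
import math
import statistics

phosphate = [39.3, 46.6, 51.7, 46.0, 68.3, 58.0]  # data from question 1
n = len(phosphate)
t_crit = 2.571                                    # tabulated t value used there

sd_full = statistics.stdev(phosphate)             # unrounded working value

def half_width(sd):
    """t * s.d. / sqrt(n), rounded to 2 d.p. only at the very end."""
    return round(t_crit * sd / math.sqrt(n), 2)

careful = half_width(sd_full)             # extra figures carried through
early = half_width(round(sd_full, 1))     # s.d. cut to 1 d.p. before use

print(careful, early)
```

The two results differ in the second decimal place, which is the precision at which the final answer is quoted.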