Download 2. Standard Error

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Contents
Contents
1 Histograms: Densities or Numbers?
1
2 Computing σ with n or n − 1?
2
3 Mean values
3.1 example:
3.2 example:
3.3 example:
3
4
5
6
are usually nice but sometimes mean
picky wagtails . . . . . . . . . . . . . . . . . . . . .
spider men & spider women . . . . . . . . . . . . . .
copper-tolerant browntop bent . . . . . . . . . . . .
4 The standard error SE
8
4.1 example: drought stress in sorghum . . . . . . . . . . . . . . . 8
4.2 general consideration . . . . . . . . . . . . . . . . . . . . . . . 10
1
Histograms: Densities or Numbers?
0 2 4 6 8
Number
Number vs. Density
0
1
2
3
4
5
6
8
4
0
Number
12
Histograms with unequal intervals should
show densities, not
numbers!
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0.3
0.0
Density
0.6
0
Computing σ with n or n − 1?
2
0.20
Mean: 25.13
Standard deviation: 1.36
0.00
Density
Simulated population (N=10000 adults)
20
22
24
26
28
30
28
30
Length [cm]
Sample from the population (n=10)
M: 24.43
SD with (n−1): 1.15
SD with n: 1.03
20
● ● ●●
● ● ●● ● ●
22
24
26
Length [cm]
Another sample from the population (n=10)
M: 24.92
SD with (n−1): 1.61
SD with n: 1.45
20
●
22
● ● ●●
●
●
24
● ●
26
Length [cm]
2
●
28
30
0.8
0.0
Density
1000 samples, each of size n=10
0.5
1.0
1.5
2.0
2.5
2.0
2.5
0.8
0.0
Density
SD computed with n−1
0.5
1.0
1.5
SD computed with n
Computing σ with n or n − 1?
The standard deviation σ of a random variable with n equally probable
outcomes x1 , . . . , xn (z.B. rolling a dice) is clearly defined by
v
u n
u1 X
t
(x − xi )2 .
n i=1
If x1 , . . . , xn is a sample (the usual case in statistics) you should rather
use the formula
v
u
n
u 1 X
t
(x − xi )2 .
n − 1 i=1
3
Mean values are usually nice but sometimes
mean
Mean and SD. . .
3
• characterize data well if the distribution is bell-shaped
• and must be interpreted with caution in other cases
We will exemplify this with textbook examples from ecology, see e.g.
References
[BTH08] M. Begon, C. R. Townsend, and J. L. Harper. Ecology: From
Individuals to Ecosystems. Blackell Publishing, 4 edition, 2008.
When original data were not available, we generated similar data sets by
computer simulation. So do not believe all data points.
3.1
example: picky wagtails
Conjecture
• Size of flies varies.
• efficiency for wagtail = energy gain / time to capture and eat
• lab experiments show that efficiency is maximal when flies have size
7mm
References
[Dav77] N.B. Davies. Prey selection and social behaviour in wagtails (Aves:
Motacillidae). J. Anim. Ecol., 46:37–57, 1977.
available dung flies
captured dung flies
6
7
8
9
10
11
0.5
0.4
0.3
0.1
0
5
available
0.0
10
50
4
captured
0.2
fraction per mm
40
sd= 0.69
30
number
50
60
mean= 6.79
20
150
100
sd= 0.96
0
number
mean= 7.99
dung flies: available, captured
4
5
length [mm]
6
7
8
length [mm]
4
9
10
11
4
5
6
7
8
length [mm]
9
10
11
numerical comparison of size distributions
0.5
dung flies: available, captured
available
0.0
0.1
0.2
0.3
fraction per mm
0.4
captured
captured
available
mean
6.29
<
7.99
sd
0.69
<
0.96
4
5
6
7
8
9
10
11
length [mm]
Interpretation
The birds prefer dung-flies from a relatively narrow range around the
predicted optimum of 7mm.
The distributions in this example were bell-shaped, and the 4 numbers
(means and standard deviations) were appropriate to summarize the data.
3.2
example: spider men & spider women
Nephila madagascariensis
http://commons.wikimedia.org/wiki/File:Nphila_inaurata_Madagascar_02.jpgimage (c) by Bernard Gagnon
Simulated Data:
70 sampled spiders
mean size: 21.05 mm
sd of size :12.94 mm
14
12
10
14
mean= 21.06
8
6
Frequency
8
6
Frequency
3
0
10
20
30
40
50
4
0
2
0
2
4
2
1
0
Frequency
4
10
12
Nephila madagascariensis (n=70)
6
Nephila madagascariensis (n=70)
5
?????
0
10
size [mm]
20
30
size [mm]
5
40
50
0
10
20
30
size [mm]
40
50
12
females
8
10
males
2
0
0
2
4
6
8
Frequency
females
6
males
4
Frequency
10
12
14
Nephila madagascariensis (n=70)
14
Nephila madagascariensis (n=70)
0
10
20
30
40
50
0
10
20
size [mm]
30
40
50
size [mm]
Conclusion from spider example
If data comes from different groups, it may be reasonable to compute
mean an sd separately for each group.
3.3
example: copper-tolerant browntop bent
References
[Bra60] A.D. Bradshaw. Population Differentiation in agrostis tenius Sibth.
III. populations in varied environments. New Phytologist, 59(1):92 –
103, 1960.
[MB68] T. McNeilly and A.D Bradshaw. Evolutionary Processes in Populations of Copper Tolerant Agrostis tenuis Sibth. Evolution, 22:108–
118, 1968.
Again, we have no access to original data and use simulated data.
Adaptation to copper?
• root length indicates copper tolerance
• measure root lengths of plants near copper mine
• take seeds from clean meadow and sow near copper mine
• measure root length of these “meadow plants” in copper environment
6
Browntop Bent (n=50)
100
40
Browntop Bent (n=50)
30
Copper Mine Grass
0
0
20
10
20
density per cm
60
40
density per cm
80
Grass seeds from a meadow
50
100
150
200
0
50
100
150
root length (cm)
root length (cm)
Browntop Bent (n=50)
Browntop Bent (n=50)
200
0.03
0.04
meadow plants
0.02
20
density per cm
0.05
Grass seeds from a meadow
copper mine plants
0.01
10
density per cm
30
0.06
40
0.07
0
0
0.00
copper tolerant ?
50
100
150
200
0
50
100
root length (cm)
root length (cm)
Browntop Bent (n=50)
Browntop Bent (n=50)
150
200
100
40
0
m−s
meadow plants
60
0
0
20
10
20
m+s
density per cm
m
40
density per cm
80
m−s
m+s
m
30
copper mine plants
0
50
100
150
200
0
50
root length (cm)
Browntop Bent n=50+50
copper mine plants
●
●
meadow plants
●
●
0
●
●
● ●●
50
●
●●
●
●●
●
100
100
root length (cm)
●
150
●
200
root length (cm)
2/3 of the data within [m-sd,m+sd]???? No!
quartiles of root length [cm]
min Q1 median
Q3
copper adapted 12.9 80.1 100.8 120.9
from meadow
1.1 13.2
16.0
19.6
7
max
188.9
218.9
150
200
Conclusion from browntop bent example
Sometimes the two numbers
m and sd
give not enough information.
In this example the four quartiles
max, Q1, median, Q3, max
that are shown in the boxplot are more approriate.
Conclusions from this section
Always visually inspect the data!
Never rely on summarising values alone!
4
The standard error SE
The Standard Error
sd
SE = √
n
describes the variability of the sample mean.
n: sample size sd: sample standard deviance
4.1
example: drought stress in sorghum
drought stress in sorghum
8
References
[BB05] V. Beyel and W. Brüggemann. Differential inhibition of photosynthesis during pre-flowering drought stress in Sorghum bicolor genotypes
with different senescence traits. Physiologia Plantarum, 124:249–259,
2005.
• 14 sorghum plants were not watered for 7 days.
• in the last 3 days: transpiration was measured for each plant (mean
over 3 days)
• the area of the leaves of each plant was determined
transpiration rate
=
(amount of water per day)/area of leaves
·
ml
cm2 · day
¸
Aim: Determine mean transpiration rate µ under these
conditions.
If we hade many plants, we could determine µ quite
precisely.
Problem: How accurate is the estimation of µ with such
a small sample? (n = 14)
9
6
6
0.10
0.12
0.14
0.16
Transpiration (ml/(day*cm^2))
5
4
2
0
1
2
1
0
0.08
Standard Deviation=0.026
3
4
frequency
5
mean=0.117
3
frequency
4
3
0
1
2
frequency
5
6
drought stressed sorghum (variety B, n = 14)
0.08
0.10
0.12
0.14
0.16
Transpiration (ml/(day*cm^2))
0.08
0.10
0.12
0.14
Transpiration (ml/(day*cm^2))
transpiration data: x1, x2, . . . , x14
14
¡
¢
1 X
x = x1 + x2 + · · · + x14 /14 =
xi
14 i=1
x = 0.117
our estimation:
µ ≈ 0.117
how accurate is this estimation?
How much does x
4.2
deviate from µ
mean value)?
(our estimation)
(the true
general consideration
Assume we had made the experiment not just 14 times,
but repeated it 100 times, 1000 times, 1000000 times
10
0.16
We consider our 14 plants as
random sample
from a very large population of possible values.
population
(all rates of transpiration)
µ
sample
x
We estimate
the population mean
µ
by the sample mean
x.
Each new sample gives a new value of x.
11
x depends on randomness: it is a random variable
Problem: How variable is x?
More precisely: What is the typical deviation of x from
µ?
¡
¢
x = x1 + x2 + · · · + xn /n
What does the variability of x depend on?
1. From the variability of the single observations
x1, x2, . . . , xn
12
mean = 0.117
x varies a lot
⇒ x varies a lot
0.05
0.10
0.15
0.20
0.25
0.05
0.10
0.15
0.20
0.25
mean = 0.117
x varies little
⇒ x varies little
0.05
0.10
0.15
0.20
0.25
0.05
0.10
0.15
0.20
0.25
2.
from the sample size
n
The larger n, the smaller is the variability of
x.
To explore this dependency we perform a
(Computer-)Experiment.
Experiment: Take a population, draw samples and
examine how x varies.
13
We assume the distribtion of possible transpriration
rates looks like this:
10
0
5
density
15
hypothetical distribution of transpiration rates
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(day*cm^2))
10
5
0
density
15
SD=0.026
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(day*cm^2))
14
At first with small sample sizes:
n=4
10
0
5
Dichte
15
sample of size 4 second sample of size 4 third sample of size 4
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
10
5
0
Dichte
15
Transpiration (ml/(Tag*cm^2))
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(Tag*cm^2))
15
0
2
4
6
8
10
10 samples
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(Tag*cm^2))
0
10
20
30
40
50
50 samples
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(Tag*cm^2))
16
How variable are
the sample means?
10
8
6
4
2
0
0
2
4
6
8
10
10 samples of size 4 and the corresponding sample means
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(Tag*cm^2))
Transpiration (ml/(Tag*cm^2))
50
40
30
20
10
0
0
10
20
30
40
50
50 samples of size 4 and the corresponding sample means
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(Tag*cm^2))
Transpiration (ml/(Tag*cm^2))
distribution of sample means (sample size n = 4)
17
40
30
20
Sample mean
Mean=0.117
SD=0.013
0
10
Density
Population
Mean=0.117
SD=0.026
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(day*cm^2))
population: standard deviation = 0.026
sample means (n = 4):
standard deviation = 0.013 √
= 0.026/ 4
Increase the sample size from 4to 16
10 samples of size 16 and the corresponding sample
means
18
10
8
6
4
2
0
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(day*cm^2))
0
10
20
30
40
50
50 samples of size 16 and the corresponding sample
means
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(day*cm^2))
distribution of sample means (sample size n = 16)
19
80
Sample mean
Mean=0.117
SD=0.0064
40
0
20
Density
60
Population
Mean=0.117
SD=0.026
0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20
Transpiration (ml/(day*cm^2))
population: standard deviation = 0.026
sample mean (n = 16):
standard deviation = 0.0065 √
= 0.026/ 16
General Rule 1. Let x be the mean of a sample of size n from a distribution
(e.g. all values in a population) with standard deviation σ. Since x depends
on the random sample, it is a random variable. Its standard deviation σx
fulfills
σ
σx = √ .
n
Problem: σ is unknown
Idea: Estimate σ by sample standard deviation s:
σ≈s
σ
s
σx = √ ≈ √ =: SEM
n
n
SEM stands for Standard Error of the Mean, or Standard Error for short.
20
The distribution of x
Observation
Even if the distribution of x is asymmetric and has multiple peaks, the distribution of x will be bell-shaped (at least for larger sample sizes n.)
21
√
√
Pr(x ∈ [µ − (σ/ n), µ + (σ/ n)]) ≈
µ−σ
x − √snσ
µ − √n
x µ
x + √sn
µ+
22
2
3
µ+σ
√σ
n
The distribution of x is approximately of a certain
shape:
the normal distribution.
Density of the normal distribution
0.4
0.2
0.3
µ+σ
0.0
0.1
Normaldichte
µ
µ+σ
−1
0
1
2
3
4
5
The normal distribution is also called Gauß distribution (after Carl
Friedrich Gauß, 1777-1855)
23
Related documents