Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Contents Contents 1 Histograms: Densities or Numbers? 1 2 Computing σ with n or n − 1? 2 3 Mean values 3.1 example: 3.2 example: 3.3 example: 3 4 5 6 are usually nice but sometimes mean picky wagtails . . . . . . . . . . . . . . . . . . . . . spider men & spider women . . . . . . . . . . . . . . copper-tolerant browntop bent . . . . . . . . . . . . 4 The standard error SE 8 4.1 example: drought stress in sorghum . . . . . . . . . . . . . . . 8 4.2 general consideration . . . . . . . . . . . . . . . . . . . . . . . 10 1 Histograms: Densities or Numbers? 0 2 4 6 8 Number Number vs. Density 0 1 2 3 4 5 6 8 4 0 Number 12 Histograms with unequal intervals should show densities, not numbers! 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0.3 0.0 Density 0.6 0 Computing σ with n or n − 1? 2 0.20 Mean: 25.13 Standard deviation: 1.36 0.00 Density Simulated population (N=10000 adults) 20 22 24 26 28 30 28 30 Length [cm] Sample from the population (n=10) M: 24.43 SD with (n−1): 1.15 SD with n: 1.03 20 ● ● ●● ● ● ●● ● ● 22 24 26 Length [cm] Another sample from the population (n=10) M: 24.92 SD with (n−1): 1.61 SD with n: 1.45 20 ● 22 ● ● ●● ● ● 24 ● ● 26 Length [cm] 2 ● 28 30 0.8 0.0 Density 1000 samples, each of size n=10 0.5 1.0 1.5 2.0 2.5 2.0 2.5 0.8 0.0 Density SD computed with n−1 0.5 1.0 1.5 SD computed with n Computing σ with n or n − 1? The standard deviation σ of a random variable with n equally probable outcomes x1 , . . . , xn (z.B. rolling a dice) is clearly defined by v u n u1 X t (x − xi )2 . n i=1 If x1 , . . . , xn is a sample (the usual case in statistics) you should rather use the formula v u n u 1 X t (x − xi )2 . n − 1 i=1 3 Mean values are usually nice but sometimes mean Mean and SD. . . 3 • characterize data well if the distribution is bell-shaped • and must be interpreted with caution in other cases We will exemplify this with textbook examples from ecology, see e.g. References [BTH08] M. Begon, C. R. Townsend, and J. L. Harper. Ecology: From Individuals to Ecosystems. Blackell Publishing, 4 edition, 2008. When original data were not available, we generated similar data sets by computer simulation. So do not believe all data points. 3.1 example: picky wagtails Conjecture • Size of flies varies. • efficiency for wagtail = energy gain / time to capture and eat • lab experiments show that efficiency is maximal when flies have size 7mm References [Dav77] N.B. Davies. Prey selection and social behaviour in wagtails (Aves: Motacillidae). J. Anim. Ecol., 46:37–57, 1977. available dung flies captured dung flies 6 7 8 9 10 11 0.5 0.4 0.3 0.1 0 5 available 0.0 10 50 4 captured 0.2 fraction per mm 40 sd= 0.69 30 number 50 60 mean= 6.79 20 150 100 sd= 0.96 0 number mean= 7.99 dung flies: available, captured 4 5 length [mm] 6 7 8 length [mm] 4 9 10 11 4 5 6 7 8 length [mm] 9 10 11 numerical comparison of size distributions 0.5 dung flies: available, captured available 0.0 0.1 0.2 0.3 fraction per mm 0.4 captured captured available mean 6.29 < 7.99 sd 0.69 < 0.96 4 5 6 7 8 9 10 11 length [mm] Interpretation The birds prefer dung-flies from a relatively narrow range around the predicted optimum of 7mm. The distributions in this example were bell-shaped, and the 4 numbers (means and standard deviations) were appropriate to summarize the data. 3.2 example: spider men & spider women Nephila madagascariensis http://commons.wikimedia.org/wiki/File:Nphila_inaurata_Madagascar_02.jpgimage (c) by Bernard Gagnon Simulated Data: 70 sampled spiders mean size: 21.05 mm sd of size :12.94 mm 14 12 10 14 mean= 21.06 8 6 Frequency 8 6 Frequency 3 0 10 20 30 40 50 4 0 2 0 2 4 2 1 0 Frequency 4 10 12 Nephila madagascariensis (n=70) 6 Nephila madagascariensis (n=70) 5 ????? 0 10 size [mm] 20 30 size [mm] 5 40 50 0 10 20 30 size [mm] 40 50 12 females 8 10 males 2 0 0 2 4 6 8 Frequency females 6 males 4 Frequency 10 12 14 Nephila madagascariensis (n=70) 14 Nephila madagascariensis (n=70) 0 10 20 30 40 50 0 10 20 size [mm] 30 40 50 size [mm] Conclusion from spider example If data comes from different groups, it may be reasonable to compute mean an sd separately for each group. 3.3 example: copper-tolerant browntop bent References [Bra60] A.D. Bradshaw. Population Differentiation in agrostis tenius Sibth. III. populations in varied environments. New Phytologist, 59(1):92 – 103, 1960. [MB68] T. McNeilly and A.D Bradshaw. Evolutionary Processes in Populations of Copper Tolerant Agrostis tenuis Sibth. Evolution, 22:108– 118, 1968. Again, we have no access to original data and use simulated data. Adaptation to copper? • root length indicates copper tolerance • measure root lengths of plants near copper mine • take seeds from clean meadow and sow near copper mine • measure root length of these “meadow plants” in copper environment 6 Browntop Bent (n=50) 100 40 Browntop Bent (n=50) 30 Copper Mine Grass 0 0 20 10 20 density per cm 60 40 density per cm 80 Grass seeds from a meadow 50 100 150 200 0 50 100 150 root length (cm) root length (cm) Browntop Bent (n=50) Browntop Bent (n=50) 200 0.03 0.04 meadow plants 0.02 20 density per cm 0.05 Grass seeds from a meadow copper mine plants 0.01 10 density per cm 30 0.06 40 0.07 0 0 0.00 copper tolerant ? 50 100 150 200 0 50 100 root length (cm) root length (cm) Browntop Bent (n=50) Browntop Bent (n=50) 150 200 100 40 0 m−s meadow plants 60 0 0 20 10 20 m+s density per cm m 40 density per cm 80 m−s m+s m 30 copper mine plants 0 50 100 150 200 0 50 root length (cm) Browntop Bent n=50+50 copper mine plants ● ● meadow plants ● ● 0 ● ● ● ●● 50 ● ●● ● ●● ● 100 100 root length (cm) ● 150 ● 200 root length (cm) 2/3 of the data within [m-sd,m+sd]???? No! quartiles of root length [cm] min Q1 median Q3 copper adapted 12.9 80.1 100.8 120.9 from meadow 1.1 13.2 16.0 19.6 7 max 188.9 218.9 150 200 Conclusion from browntop bent example Sometimes the two numbers m and sd give not enough information. In this example the four quartiles max, Q1, median, Q3, max that are shown in the boxplot are more approriate. Conclusions from this section Always visually inspect the data! Never rely on summarising values alone! 4 The standard error SE The Standard Error sd SE = √ n describes the variability of the sample mean. n: sample size sd: sample standard deviance 4.1 example: drought stress in sorghum drought stress in sorghum 8 References [BB05] V. Beyel and W. Brüggemann. Differential inhibition of photosynthesis during pre-flowering drought stress in Sorghum bicolor genotypes with different senescence traits. Physiologia Plantarum, 124:249–259, 2005. • 14 sorghum plants were not watered for 7 days. • in the last 3 days: transpiration was measured for each plant (mean over 3 days) • the area of the leaves of each plant was determined transpiration rate = (amount of water per day)/area of leaves · ml cm2 · day ¸ Aim: Determine mean transpiration rate µ under these conditions. If we hade many plants, we could determine µ quite precisely. Problem: How accurate is the estimation of µ with such a small sample? (n = 14) 9 6 6 0.10 0.12 0.14 0.16 Transpiration (ml/(day*cm^2)) 5 4 2 0 1 2 1 0 0.08 Standard Deviation=0.026 3 4 frequency 5 mean=0.117 3 frequency 4 3 0 1 2 frequency 5 6 drought stressed sorghum (variety B, n = 14) 0.08 0.10 0.12 0.14 0.16 Transpiration (ml/(day*cm^2)) 0.08 0.10 0.12 0.14 Transpiration (ml/(day*cm^2)) transpiration data: x1, x2, . . . , x14 14 ¡ ¢ 1 X x = x1 + x2 + · · · + x14 /14 = xi 14 i=1 x = 0.117 our estimation: µ ≈ 0.117 how accurate is this estimation? How much does x 4.2 deviate from µ mean value)? (our estimation) (the true general consideration Assume we had made the experiment not just 14 times, but repeated it 100 times, 1000 times, 1000000 times 10 0.16 We consider our 14 plants as random sample from a very large population of possible values. population (all rates of transpiration) µ sample x We estimate the population mean µ by the sample mean x. Each new sample gives a new value of x. 11 x depends on randomness: it is a random variable Problem: How variable is x? More precisely: What is the typical deviation of x from µ? ¡ ¢ x = x1 + x2 + · · · + xn /n What does the variability of x depend on? 1. From the variability of the single observations x1, x2, . . . , xn 12 mean = 0.117 x varies a lot ⇒ x varies a lot 0.05 0.10 0.15 0.20 0.25 0.05 0.10 0.15 0.20 0.25 mean = 0.117 x varies little ⇒ x varies little 0.05 0.10 0.15 0.20 0.25 0.05 0.10 0.15 0.20 0.25 2. from the sample size n The larger n, the smaller is the variability of x. To explore this dependency we perform a (Computer-)Experiment. Experiment: Take a population, draw samples and examine how x varies. 13 We assume the distribtion of possible transpriration rates looks like this: 10 0 5 density 15 hypothetical distribution of transpiration rates 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(day*cm^2)) 10 5 0 density 15 SD=0.026 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(day*cm^2)) 14 At first with small sample sizes: n=4 10 0 5 Dichte 15 sample of size 4 second sample of size 4 third sample of size 4 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 10 5 0 Dichte 15 Transpiration (ml/(Tag*cm^2)) 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(Tag*cm^2)) 15 0 2 4 6 8 10 10 samples 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(Tag*cm^2)) 0 10 20 30 40 50 50 samples 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(Tag*cm^2)) 16 How variable are the sample means? 10 8 6 4 2 0 0 2 4 6 8 10 10 samples of size 4 and the corresponding sample means 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(Tag*cm^2)) Transpiration (ml/(Tag*cm^2)) 50 40 30 20 10 0 0 10 20 30 40 50 50 samples of size 4 and the corresponding sample means 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(Tag*cm^2)) Transpiration (ml/(Tag*cm^2)) distribution of sample means (sample size n = 4) 17 40 30 20 Sample mean Mean=0.117 SD=0.013 0 10 Density Population Mean=0.117 SD=0.026 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(day*cm^2)) population: standard deviation = 0.026 sample means (n = 4): standard deviation = 0.013 √ = 0.026/ 4 Increase the sample size from 4to 16 10 samples of size 16 and the corresponding sample means 18 10 8 6 4 2 0 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(day*cm^2)) 0 10 20 30 40 50 50 samples of size 16 and the corresponding sample means 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(day*cm^2)) distribution of sample means (sample size n = 16) 19 80 Sample mean Mean=0.117 SD=0.0064 40 0 20 Density 60 Population Mean=0.117 SD=0.026 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 Transpiration (ml/(day*cm^2)) population: standard deviation = 0.026 sample mean (n = 16): standard deviation = 0.0065 √ = 0.026/ 16 General Rule 1. Let x be the mean of a sample of size n from a distribution (e.g. all values in a population) with standard deviation σ. Since x depends on the random sample, it is a random variable. Its standard deviation σx fulfills σ σx = √ . n Problem: σ is unknown Idea: Estimate σ by sample standard deviation s: σ≈s σ s σx = √ ≈ √ =: SEM n n SEM stands for Standard Error of the Mean, or Standard Error for short. 20 The distribution of x Observation Even if the distribution of x is asymmetric and has multiple peaks, the distribution of x will be bell-shaped (at least for larger sample sizes n.) 21 √ √ Pr(x ∈ [µ − (σ/ n), µ + (σ/ n)]) ≈ µ−σ x − √snσ µ − √n x µ x + √sn µ+ 22 2 3 µ+σ √σ n The distribution of x is approximately of a certain shape: the normal distribution. Density of the normal distribution 0.4 0.2 0.3 µ+σ 0.0 0.1 Normaldichte µ µ+σ −1 0 1 2 3 4 5 The normal distribution is also called Gauß distribution (after Carl Friedrich Gauß, 1777-1855) 23