Environmental statistics
Assoc. Prof. Dr. Aleksandar Markoski
Technical Faculty – Bitola, 2008
Enviromatics 2008 - Environmental statistics

Introduction
• Statistical analysis of environmental data is an important means of extracting information on past and present states of ecosystems. The resulting estimates are known as sample statistics and form a basis for forecasts of how environmental systems will develop.
• Topics of statistical analysis of environmental data are:
1. Data analysis for the requirements of environmental administrations and associations (descriptive statistics, frequency distributions, averages, variances, error corrections, significance tests),
2. Data analysis for the requirements of different users such as companies, farmers and tourists (explanatory statistics, multivariate statistics, time series analysis),
3. Basic research (regression and correlation analysis, multivariate statistics, advanced statistical techniques).

Environmental data
• Environmental data are obtained from field samples and/or laboratory analysis.
• They are observed directly (direct observations) or indirectly (via calibrated analytical instruments and sensors).
• Summary data are derived from statistics or from a restricted set of observable indicators.
• Simulated data are obtained from simulation models.
• Measurement errors and outliers have to be removed from data sets; they are not to be taken into account in data processing.

Probability distributions of environmental data
• Environmental data series represent the behaviour of environmental processes as it varies in time and space. Some indicators show long-wave cycles overlaid by short-term variations. Other indicators display stochastic fluctuations. Some indicators show a unique behaviour with a few peak events.
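The sample statistics mentioned in the introduction (averages, variances) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `describe` and the measurement values are hypothetical and do not come from the slides, and the empirical variance is assumed to use the divisor n − 1.

```python
import math
import statistics


def describe(sample):
    """Return basic sample statistics for a list of positive measurements."""
    n = len(sample)
    mean = sum(sample) / n                       # arithmetic mean: 1/n * sum(x_i)
    median = statistics.median(sample)           # empirical median
    # Geometric mean via logarithms (requires strictly positive values).
    geo_mean = math.exp(sum(math.log(x) for x in sample) / n)
    value_range = max(sample) - min(sample)      # range: x_max - x_min
    variance = statistics.variance(sample)       # empirical variance, divisor n - 1 (assumption)
    std_dev = math.sqrt(variance)                # empirical standard deviation
    coeff_var = std_dev / mean * 100.0           # coefficient of variation in %
    return {
        "mean": mean,
        "median": median,
        "geometric_mean": geo_mean,
        "range": value_range,
        "variance": variance,
        "std_dev": std_dev,
        "coeff_var_percent": coeff_var,
    }


# Hypothetical measurement series, for illustration only.
measurements = [2.0, 4.0, 4.0, 5.0, 10.0]
for name, value in describe(measurements).items():
    print(f"{name}: {value:.3f}")
```

The geometric mean is computed through logarithms, so the sketch only applies to strictly positive data such as concentrations.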
Statistical measures
• Statistical measures of environmental data are represented by
– averages,
– variances and
– measures of correlation.

Averages
1. Arithmetic mean: x* = 1/n ⋅ Σ xi
2. Empirical median: x~
3. Empirical mode: M
4. Geometric mean: x°
5. Weighted arithmetic mean: x*g
6. Weighted geometric mean: lg x°

Variances
1. Range: R = xmax − xmin
2. Empirical variance: s²
3. Empirical standard deviation: s = √(s²)
4. Empirical coefficient of variation: v = s/x* ⋅ 100 (%)

Coefficients of correlation
1. Bivariate correlation coefficient r
2. Performance index (coefficient of determination): B = r²
3. Multiple correlation coefficient
4. Multiple performance index
5. Spearman's rank correlation (suitable for small sample sizes; a normal probability distribution is not required)

Statistical tests
• In sample statistics the characteristics of interest are often expressed in terms of parameters such as the average μ or the variance σ². Other questions arise from comparing two or more samples; they may be expressed by the differences of averages.
• A statistical hypothesis is a statement about the distribution of a random ecological variable.
• Hypothesis testing consists of comparing statistical measures called test criteria (or test statistics), computed from the data sample, with the values these criteria would take on the assumption that a given hypothesis is correct.

Hypothesis testing
• In hypothesis testing one examines a Null hypothesis H0 against one or more alternative hypotheses H1, H2, …, Hn, which are stated explicitly or implicitly.
• To reach a decision about the hypothesis, a significance level α is selected (typically 0.05, 0.01 or 0.001).
• The confidence coefficient ε is given by ε = 1 − α. For hypothesis testing, the test criterion (or test statistic) is set up. If this statistic falls into the region of acceptance, the Null hypothesis cannot be rejected.
• On the other hand, when this statistic falls into the region of rejection, the Null hypothesis is rejected. If the Null hypothesis is true, the probability of the test statistic falling into the region of rejection is equal to α (and the probability of it falling into the region of acceptance is ε). These probabilities are commonly expressed in percent.

Procedure for hypothesis testing
• The Null hypothesis H0 and an alternative hypothesis H1 have to be formulated.
• The significance level α has to be selected. The test statistic is chosen. The region of rejection of the test statistic is determined on the basis of its probability distribution and the significance level.
• The test statistic is calculated from the data set. The Null hypothesis is rejected and the alternative hypothesis is accepted when the value of the test statistic falls into the region of rejection.
• The Null hypothesis is retained (not rejected) if the value of the test statistic does not fall into the region of rejection.

Example
• From sampled data an average m was calculated and is now compared with an expected value K (a fixed number).
• The Null hypothesis H0: m = K is tested against the alternative hypothesis H1: m ≠ K. The significance level α = 0.05 is selected and the test statistic is chosen:
• t = |m − K| / s ⋅ √n. If the test statistic falls into the region of acceptance of the Null hypothesis, that is t_{α/2} < t < t_{1−α/2}, H0 cannot be rejected. Because t is defined with an absolute value, only the upper bound t < t_{1−α/2} is effective.
• The power of the test depends on the sample size n. The bigger the sample size (the more information is available), the stronger the confidence of the test.
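The example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `one_sample_t`, the data values and the expected value K = 4.0 are hypothetical. The critical value is passed in by the caller, as if read from a t-table; 2.776 is the two-sided value for α = 0.05 and n − 1 = 4 degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, expected, t_tab):
    """Compare a sample average m with a fixed expected value K.

    Returns the test statistic t = |m - K| / s * sqrt(n) and a flag
    telling whether H0: m = K is rejected at the level implied by t_tab.
    """
    n = len(sample)
    m = statistics.mean(sample)        # sample average m
    s = statistics.stdev(sample)       # sample standard deviation (divisor n - 1)
    t_calc = abs(m - expected) / s * math.sqrt(n)
    reject = t_calc >= t_tab           # region of rejection: t_calc >= tabulated value
    return t_calc, reject


# Hypothetical data and expected value, for illustration only.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
K = 4.0
T_TAB = 2.776  # two-sided critical t value, alpha = 0.05, 4 degrees of freedom

t_calc, reject = one_sample_t(data, K, T_TAB)
print(f"t_calc = {t_calc:.3f}, H0 rejected: {reject}")
```

Taking the tabulated value as a parameter keeps the sketch within the standard library; a statistics package could supply the quantile instead.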
t-Test (Student's test)
• The test statistic: tcalc = |x* − μ0| / s ⋅ √n, where
– x* is the sample mean,
– μ0 is the expectation value of the ensemble,
– s is the standard deviation,
– n is the sample size.
• Decision: Acceptance if tcalc < ttab, otherwise rejection.

Comparison of means (t-test)
• The test statistic: t = |x* − x**| / sd ⋅ √(n*⋅n** / (n* + n**)), where
– x* is the first sample mean and x** the second sample mean,
– s* is the first standard deviation and s** the second standard deviation,
– n* is the first sample size and n** the second sample size,
– n* + n** − 2 is the number of degrees of freedom,
– sd = √(((n* − 1)⋅s*² + (n** − 1)⋅s**²) / (n* + n** − 2)).
• Decision: Acceptance if tcalc < ttab, otherwise rejection.

Comparison of variances (F-test)
• The test statistic: F = (s*/s**)² ≥ 1, where s* is the standard deviation of the first sample and s** is the standard deviation of the second sample; the samples are labelled so that s* ≥ s**.
• Decision: Acceptance if Fcalc < Ftab, otherwise rejection.

Outlier test (NALIMOV test)
• The test statistic: r = |x+ − x*| / s ⋅ √(n / (n − 1)), where x+ is the suspected outlier, x* is the sample mean, s is the standard deviation of the sample, and n is the sample size.
• Decision: Acceptance if rcalc < rtab (x+ is not treated as an outlier), otherwise rejection.
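The three test statistics above (comparison of means, comparison of variances, NALIMOV outlier test) can be computed with a short Python sketch. The function names and the two data series are hypothetical, the pooled standard deviation is assumed to be taken over n* + n** − 2 degrees of freedom, and the tabulated critical values (ttab, Ftab, rtab) are left to the reader's tables rather than supplied here.

```python
import math
import statistics


def pooled_t(a, b):
    """t statistic for comparing two sample means using a pooled standard deviation."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    # Pooled standard deviation over n1 + n2 - 2 degrees of freedom.
    sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff / sd * math.sqrt(n1 * n2 / (n1 + n2))


def f_statistic(a, b):
    """F statistic: ratio of sample variances, larger one in the numerator (F >= 1)."""
    var1, var2 = statistics.variance(a), statistics.variance(b)
    return max(var1, var2) / min(var1, var2)


def nalimov_r(sample, suspect):
    """NALIMOV statistic r = |x+ - mean| / s * sqrt(n / (n - 1)) for a suspected outlier."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)
    return abs(suspect - mean) / s * math.sqrt(n / (n - 1))


def accept(calc, tab):
    """Decision rule from the slides: acceptance if calc < tab, otherwise rejection."""
    return calc < tab


# Hypothetical measurement series, for illustration only.
series_a = [2.0, 4.0, 4.0, 5.0, 10.0]
series_b = [1.0, 2.0, 3.0, 4.0, 5.0]

print(f"t = {pooled_t(series_a, series_b):.3f}")
print(f"F = {f_statistic(series_a, series_b):.3f}")
print(f"r = {nalimov_r(series_a, 10.0):.3f}")
```

Each statistic would then be passed to `accept` together with the tabulated value for the chosen significance level and degrees of freedom.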