Statistics for Water Science: Hypothesis Testing
Fundamental concepts and a survey of methods
Unit 5: Module 17, Lecture 2

Developed by: Host. Updated: Jan. 21, 2004.

Statistics
A branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data:
- Descriptive statistics (Lecture 1): basic description of a variable
- Hypothesis testing (Lecture 2): asks the question, "Is X different from Y?"
- Predictions (Lecture 3): what will happen if…

Objectives
- Introduce the basic concepts and assumptions of significance tests
  - Distributions on parade
  - Developing hypotheses
  - What is "true"?
- Survey statistical methods for testing for differences in populations of numbers
  - Sample size issues
  - Appropriate tests
- What we won't do: elaborate on the mathematical underpinnings of tests (take a good stats course for this!)

From our last lecture
- The mean: a measure of central tendency
- The standard deviation: a measure of the "spread" of the data

Tales of the normal distribution
- Many kinds of data follow this symmetrical, bell-shaped curve, often called a normal distribution.
- Normal distributions have statistical properties that allow us to predict the probability of getting a certain observation by chance.
- When sampling a variable, you are most likely to obtain values close to the mean: about 68% fall within 1 SD and about 95% within 2 SD.
- Note that a couple of values lie outside the 95% (2 SD) interval. These are improbable.
[Figure: bell-shaped normal curve; x-axis from -2.0 to 2.0]
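The 68% and 95% figures quoted for the normal distribution can be checked numerically. The following is a minimal sketch using Python's standard library; the function name coverage_within is a hypothetical helper chosen for illustration, not something from the lecture.

```python
from statistics import NormalDist


def coverage_within(k_sd: float) -> float:
    """Probability that a normally distributed value lies within
    k_sd standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area under the curve between -k_sd and +k_sd.
    return std_normal.cdf(k_sd) - std_normal.cdf(-k_sd)


for k in (1, 2):
    print(f"within {k} SD: {coverage_within(k):.4f}")
```

The values come out close to 0.68 and 0.95. The "95% within 2 SD" rule is a rounding, since the exact 95% interval is about 1.96 SD wide on each side.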
Tales of the normal distribution: the essence of hypothesis testing
- If an observation falls in one of the tails of a distribution, there is some probability that it does not belong to that population.
[Figure: normal curve; x-axis from -2.0 to 2.0]

"Significant differences"
- A difference is considered significant if the probability of getting that difference by random chance is very small.
- P value: the probability of obtaining a difference at least this large by chance alone when there is really no difference (that is, the risk of wrongly declaring a difference)
- Historically we use p < 0.05

The probability of detecting a significant difference is influenced by:
- The magnitude of the effect: a big difference is more likely to be significant than a small one
- The spread of the data: if the standard deviation is low, it will be easier to detect a significant difference
- The number of observations: a large sample is more likely to detect a difference than a small sample

Hypothesis testing
- Hypothesis: a statement that can be proven false
- Null hypothesis (H0): "There is no difference"
- Alternative hypothesis (HA): "There is a difference…"
- In statistical testing, we try to "reject the null hypothesis"
- If the null hypothesis is false, it is likely that our alternative hypothesis is true
- "False" here means there is only a small probability that the results we observed could have occurred by chance
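The three influences on significance listed earlier (magnitude of the effect, spread of the data, number of observations) can be illustrated with a small calculation. This is only a sketch under stated assumptions: it uses a large-sample normal (z) approximation for the difference between two group means, assumes both groups share the same standard deviation and sample size, and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_p(diff: float, sd: float, n: int) -> float:
    """Two-tailed p-value for a difference between two group means.

    Assumptions (illustrative only): normal approximation, and both
    groups have the same standard deviation `sd` and size `n`.
    """
    standard_error = sd * sqrt(2.0 / n)   # SE of a difference of two means
    z = abs(diff) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical baseline, then one factor changed at a time.
print("baseline      :", two_sample_p(diff=0.5, sd=1.0, n=10))
print("bigger effect :", two_sample_p(diff=1.0, sd=1.0, n=10))
print("smaller spread:", two_sample_p(diff=0.5, sd=0.5, n=10))
print("more samples  :", two_sample_p(diff=0.5, sd=1.0, n=40))
```

In this made-up setting the baseline is not significant at p < 0.05, while doubling the effect, halving the standard deviation, or quadrupling the sample size each brings the p-value below 0.05.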
Common probability levels

Alpha level   Odds        Result               Reject null hypothesis?
P > 0.05      -           Not significant      No
P < 0.05      1 in 20     Significant          Yes
P < 0.01      1 in 100    Significant          Yes
P < 0.001     1 in 1000   Highly significant   Yes

Types of statistical errors (you could be right, you could be wrong)

              Accept H0              Reject H0
H0 is true    Correct decision       Type I error (alpha)
H0 is false   Type II error (beta)   Correct decision

Examples of type I and type II errors
[Figure: distribution curve with the Type I and Type II error regions marked; x-axis from -2.0 to 2.0]

Common statistical tests

Question                                                       Test
Does a single observation belong to a population of values?   Z-test
Are two (or more) populations of numbers different?           t-test, F-test (ANOVA)
Is there a relationship between x and y?                       Regression
Is there a trend in the data? (special case of the above)      Regression

Does a single observation belong to a population of values: The Z-test
- On June 26, 2002, a temperature probe reading at 7 m depth in Medicine Lake was 20.3 °C. Is this unusually high for June?
- Note: this is a "one-tailed test"; we just want to know if it is high. We are not asking whether it is unusually low or high (two-tailed).
[Figure: histogram "Medicine Lake June 2002 Temp - 7 m"; x-axis: temperature, 18.00 to 21.00; y-axis: number of observations]

The Z distribution (standard normal distribution)
- The Z distribution is a normal distribution with special properties: mean = 0, variance = 1
- Z = (observed value - mean) / standard error
- Standard error = standard deviation / sqrt(n)
[Figure: the Z distribution]
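The Z formula can be sketched in a few lines of Python. This sketch takes the standard error to be the standard deviation divided by sqrt(n), which is the usual definition. The function names and the sample numbers are hypothetical and not taken from the Medicine Lake data.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(observed: float, mean: float, sd: float, n: int) -> float:
    """Z = (observed value - mean) / standard error,
    with standard error = sd / sqrt(n)."""
    standard_error = sd / sqrt(n)
    return (observed - mean) / standard_error


def critical_z(alpha: float, tails: int = 1) -> float:
    """Critical Z value: the point that leaves alpha / tails in the upper tail."""
    return NormalDist().inv_cdf(1.0 - alpha / tails)


# Hypothetical numbers, for illustration only.
z = z_statistic(observed=11.0, mean=10.0, sd=2.0, n=16)
print("z =", z)
print("one-tailed critical value at 0.05:", round(critical_z(0.05, tails=1), 3))
print("two-tailed critical value at 0.05:", round(critical_z(0.05, tails=2), 3))
```

An observation is judged unusually high (one-tailed) when its Z exceeds the one-tailed critical value. The two-tailed critical value is larger because the same alpha is split between both tails.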
Medicine Lake example
- Calculate the Z score for the observed data
- Compare the Z score with the critical value for a one-tailed test (1.645)
[Figure: histogram "Medicine Lake June 2002 Temp - 7 m"; x-axis: temperature, 18.00 to 21.00; y-axis: number of observations]

The deep math…
- Z = (observed value - mean) / standard error
- Standard error = standard deviation / sqrt(n)
- Z = (20.3 - 19.7) / 0.08 = 6.89 (the rounded values shown here give 7.5; the reported 6.89 presumably comes from unrounded inputs)
- Since 6.89 > the critical Z value of 1.645, our deep temperature is significantly higher than the June average temperature.
- Further exploration shows that a storm the previous day caused the warmer surface waters to mix into the deeper waters.

Are two populations different: The t-test
- Also called Student's t-test. "Student" was the pseudonym of a statistician who worked for the Guinness brewery
- Useful for "small" samples (< 30)
- One of the most basic statistical tests; it can be performed in Excel or any common statistical package
- Same principle as the Z-test: calculate a t value, and assess the probability of getting that value

In Excel
- Formula: @ttest(Pop1, Pop2, #Tails, TestType)
- #Tails: 1 or 2
- TestType: 1 = paired (if there is a logical pairing of XY data); 2 = equal variance; 3 = unequal variance
- The test returns the exact probability value

Example: 1-tailed temperature comparison
- @ttest(Pop1, Pop2, 1, 3) = 1.5 × 10^-149
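To show what "calculate a t value" means for the unequal-variance case (TestType 3 in the Excel formula), here is a minimal standard-library sketch of Welch's t statistic and its approximate degrees of freedom. The function name welch_t and the two temperature samples are hypothetical. The sketch stops short of a p-value because the Python standard library has no t distribution. A statistics package would be needed for that step, for example scipy.stats.ttest_ind with equal_var=False.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean (sample variance uses n - 1).
    se_sq_a = variance(sample_a) / n_a
    se_sq_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical temperature readings (degrees C), for illustration only.
surface = [22.1, 22.4, 21.9, 22.6, 22.3]
deep = [19.6, 19.8, 19.5, 19.9, 19.7]
t_value, dof = welch_t(surface, deep)
print(f"t = {t_value:.2f}, df = {dof:.1f}")
```

A large positive t with these made-up samples indicates that the first group's mean is well above the second's relative to the scatter in the data, which is the situation that produces very small probabilities like the one in the Excel example.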
ANOVA: Tests of multiple populations
- ANOVA: analysis of variance
- Compares 2 or more populations (for example, surface temperatures for 3 lakes)
- Can handle single or multiple factors
  - One-way ANOVA: comparing lakes
  - Two-way ANOVA: compare two factors (for example, temperature x light effects on algal populations)
  - Repeated-measures ANOVA: compare factors over time

Next time: Regression - finding relationships among variables
[Figure: "Halsted Surface - August 1999"; x-axis: pH, 7.5 to 9.5; y-axis: dissolved oxygen (ppm), 0 to 20]
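Returning to the one-way ANOVA case (comparing surface temperatures across lakes), the F statistic can be sketched with the standard library alone. This is an illustrative sketch, not the lecture's own procedure: the function name one_way_anova_f and the lake temperatures are hypothetical, and turning F into a probability would require an F distribution from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical surface temperatures (degrees C) for three lakes.
lakes = [
    [21.8, 22.1, 22.4, 22.0],
    [23.0, 23.3, 22.9, 23.2],
    [21.9, 22.2, 22.0, 22.3],
]
f_value, df_b, df_w = one_way_anova_f(lakes)
print(f"F = {f_value:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

F is the ratio of between-group to within-group variance, so a large value suggests that at least one lake differs from the others. It does not say which one.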