Modify: use bio. IB book

IB Biology Topic 1: Statistical Analysis
http://www.patana.ac.th/Secondary/Science/c4b/1/stat1.htm

An investigation of shell length variation in a mollusc species
• A marine gastropod (Thersites bipartita) has been sampled from two different locations:
  – Sample A: Shells found in full marine conditions
  – Sample B: Shells found in brackish water conditions
• sample size = 10 shells
• length of the shell measured as shown

Analysis of Gastropod Data
• measured height of shells (ruler)
• Units: mm
• +/- 1 mm (ERROR)
• Significant digits
Uncertainty must be consistent.
  – all measuring devices!
  – reflects the precision of the measurement
• There should be no variation in the precision of raw data

1.1.1 Error bars and the representation of variability in data.
• Biological systems are subject to a genetic program and environmental variation
• collect a set of data: it shows variation
• Graphs: show variation using error bars
  – show range of the data or
  – standard deviation

Mean & Range for each group
• Marine
• Brackish

Graph: Mean & Range for each group
• Quick comparison of the 2 data sets

1.1.2 Calculation of Mean and Std Dev
• 3 classes of data
• Mean
  – arithmetic mean (avg): measure of the central tendency (middle value)
• Std Dev
  – Measures spread around the mean
  – Measure of variation or accuracy of measurement
• Std Dev of sample = s
• is for the sample, not the total population
• Pop 1. Mean = 31.4 s = 5.7
• Pop 2. Mean = 41.6 s = 4.3

Graphing Mean and Std Dev: Error Bars
• Mean +/- 1 std dev
• no overlap between these two populations
• The question being considered is:
  – Is there a significant difference between the two samples from different locations?
• or
  – Are the differences in the two samples just due to chance selection?
StdDev graph: compares 68% of the population and begins to show that they look different.
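The "no overlap" point for the mean +/- 1 std dev error bars can be checked from the summary values quoted above. A minimal sketch in Python: the function names error_bar and bars_overlap are illustrative and not part of the original material, and the short list of lengths is hypothetical, included only to show how a sample mean and s are computed.

```python
import statistics

def error_bar(mean, s):
    """Return the (lower, upper) ends of a mean +/- 1 std dev error bar."""
    return (mean - s, mean + s)

def bars_overlap(bar_a, bar_b):
    """True if two (lower, upper) intervals share any values."""
    return bar_a[0] <= bar_b[1] and bar_b[0] <= bar_a[1]

# Summary values quoted on the slide (shell length, mm).
pop1 = error_bar(31.4, 5.7)   # roughly 25.7 to 37.1
pop2 = error_bar(41.6, 4.3)   # roughly 37.3 to 45.9
print(bars_overlap(pop1, pop2))  # False: the two bars do not overlap

# Hypothetical measurements (NOT the real shell data).
lengths = [30, 32, 34, 36, 38]
print(statistics.mean(lengths))   # arithmetic mean
print(statistics.stdev(lengths))  # sample std dev s (divides by n - 1)
```

statistics.stdev gives the sample value s; statistics.pstdev would give the value for a whole population.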
Range graph: misleads us to think the data may be similar

1.1.3 Standard deviation and the spread of values around the mean.
1. StdDev is a measure of how spread out the data values are from the mean.
2. Assume:
   1. normal distribution of values around the mean
   2. data not skewed to either end
3. 68% of all the data values in a sample can be found between the mean +/- 1 standard deviation
4. 95% of all the data values in a sample can be found between the mean + 2s and the mean - 2s.
• Animation of mean and standard deviation: http://www.patana.ac.th/Secondary/Science/c4b/1/stat1.htm#gastro

1.1.4 Comparing means and standard deviations of 2 or more samples.
• Sample w/ small StdDev suggests narrow variation
• Sample w/ larger StdDev suggests wider variation
Example: molluscs
• Pop 1. Mean = 31.4 Standard deviation (s) = 5.7
• Pop 2. Mean = 41.6 Standard deviation (s) = 4.3
Pop 2 has a greater mean shell length but slightly narrower variation. Why this is the case would require further observation and experiment on environmental and genetic factors.

1.1.5 Comparing 2 samples with t-Test
Null hypothesis: There is no significant difference between the two samples except as caused by chance selection of data.
OR
Alternative hypothesis: There is a significant difference between the height of shells in sample A and sample B.

For the examples you'll use in biology, tails is always 2, and type can be:
1, Paired
2, Two samples equal variance
3, Two samples unequal variance

Good idea to graph it
• Bar chart
• Error bars
• Stats

T-test: Are the mollusc shells from the two locations significantly different?
• T-test tells you the probability (P) of getting a difference at least this large by chance if the 2 sets are basically the same
(null hypothesis)
• P varies from 0 (not likely) to 1 (certain).
  – higher P = more likely that the two sets are the same, and that any differences are just due to random chance.
  – lower P = more likely that the two sets are significantly different, and that any differences are real.
• In biology the critical P is usually 0.05 (5%) (biology experiments are expected to produce quite varied results)
  – If P > 5% then the two sets are not significantly different (i.e. accept the null hypothesis).
  – If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).
• For t test, # replicates as large as possible
  – At least > 5

Drawing Conclusions
1. State null hypothesis & alternative hypothesis (based on research question)
2. Set critical P level at P = 0.05 (5%)
3. Write the decision rule:
   If P > 5% then the two sets are not significantly different (i.e. accept the null hypothesis).
   If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).
4. Write a summary statement based on the decision.
   The null hypothesis is rejected since calculated P = 0.003 (< 0.05; two-tailed test).
5. Write a statement of results in standard English.
   There is a significant difference between the height of shells in sample A and sample B.

1.1.6 Correlation & Causation
• Sometimes you're looking for an association between variables.
• Correlations see if 2 variables vary together
  +1 = perfect positive correlation
  0 = no correlation
  -1 = perfect negative correlation
• Relations see how 1 variable affects another

Pearson correlation (r)
• Data are continuous & normally distributed
Spearman's rank-order correlation (r_s)
• Data are not continuous & normally distributed
• Usually scatterplot for either type of correlation
• both correlation coefficients indicate a strong + corr.
  – large females pair with large males
  – Don't know why, but it shows there is a correlation to investigate further.
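The two correlation coefficients can be sketched in a few lines of Python. This is an illustration only: the function names are made up for this sketch, and the female and male body sizes are hypothetical values, not data from the study mentioned above. Spearman's r_s is computed here as Pearson's r applied to the ranks of the data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rs(xs, ys):
    """Spearman rank-order correlation: Pearson's r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical body sizes of paired females and males (NOT real data).
female = [10, 12, 15, 18, 20]
male = [11, 14, 13, 19, 22]

print(round(pearson_r(female, male), 2))    # about 0.93, strong + corr.
print(round(spearman_rs(female, male), 2))  # 0.9, strong + corr.
```

Both values are close to +1 for these made-up numbers, which is the pattern described for large females pairing with large males.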
Causative: Use linear regression
• Fits a straight line to data
• Gives slope & intercept
  – m and c in the equation y = mx + c
Doesn't PROVE causation, but suggests it... need further investigation!
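Fitting the straight line can be sketched as an ordinary least-squares calculation. The function name and the x, y values below are hypothetical, chosen so the points lie exactly on y = 2x + 1 and the result is easy to check by eye.

```python
def linear_regression(xs, ys):
    """Least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope
    c = mean_y - m * mean_x  # intercept
    return m, c

# Hypothetical data lying exactly on y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

m, c = linear_regression(x, y)
print(m, c)  # 2.0 1.0
```

Real measurements will scatter around the line, so m and c describe only the best-fitting trend.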