Inference in Biology
BIOL4062/5062
Hal Whitehead

• What are we trying to do?
• Null Hypothesis Significance Testing
• Problems with Null Hypothesis Significance Testing
• Alternatives:
  – Displays, confidence intervals, effect size statistics
  – Model comparison using information-theoretic approaches
  – Bayesian analysis
• Methods of Inference in Biology

What are we trying to do?
• Descriptive or exploratory analyses
  – What factors influence species diversity?
• Fitting predictive models
  – Can we make global maps of species diversity?
• Challenging research hypotheses
  – Is diversity inversely related to latitude?

The traditional approach: Null Hypothesis Significance Testing
• Formulate null hypothesis
• Formulate alternative hypothesis
• Decide on test statistic
• Collect data
• What is probability (P) of test statistic, or more extreme value, under null hypothesis?
• If P<α (usually 0.05) conclude:
  – Reject null in favour of alternative
• If P>α conclude:
  – Do not reject null hypothesis

Null Hypothesis Significance Testing: An example
• Formulate null hypothesis
  – “Species diversity does not change with latitude”
• Formulate alternative hypothesis
  – “Species diversity decreases with latitude”
• Decide on test statistic
  – Correlation between diversity measure and latitude, r
• Collect data
  – 405 measures of diversity at different latitudes
• What is probability (P) of test statistic, or more extreme value, under null hypothesis?
  – r = -0.1762; P = 0.002 (one-sided)
• If P<α (usually 0.05) conclude:
  – Reject “Species diversity does not change with latitude”

Criticisms of: Null Hypothesis Significance Testing (1)
• α is arbitrary
• Most null hypotheses are false, so why test them?
• Statistical significance is not equivalent to biological significance
  – with large samples, statistical significance but not biological significance
  – with small samples, biological significance but not statistical significance
• If statistical power is low, the null hypothesis will usually not be rejected when false
• Encourages arbitrary inferences when many tests carried out

Criticisms of: Null Hypothesis Significance Testing (2)
• Power analysis does not save NHST
  – arbitrary, confounded with P-value
  – “vacuous intellectual game” (Shaver 1993)
• Incomplete reporting and publishing
  – only report statistically significant results
  – only publish statistically significant results
• Focussing on one null and one alternative hypothesis limits scientific advance
• Emphasis on falsification obscures uncertainty about “best” explanation for phenomenon

Misuse of: Null Hypothesis Significance Testing
• Failure to reject null hypothesis does not imply null is true
• Probability of obtaining data given null hypothesis is not probability null hypothesis is true
• Poor support for null hypothesis does not imply alternative hypothesis is true

Practical importance        Statistical significance
of observed difference      Not significant     Significant
Not important               Happy               Annoyed
Important                   Very sad            Elated
Johnson (1999) “The insignificance of statistical significance testing” J. Wild. Manage.

Practical importance        Statistical significance
of observed difference      Not significant     Significant
Not important               n OK                n too large
Important                   n too small         n OK
Johnson (1999) “The insignificance of statistical significance testing” J. Wild. Manage.
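
The "n too large" and "n too small" cells can be illustrated with a minimal sketch, which is not part of the lecture. It holds a weak correlation fixed and shows how the P value changes with sample size alone. The value r = 0.1 and the sample sizes are hypothetical, and the P value uses the large-sample Fisher z approximation rather than an exact test.

```python
import math
from statistics import NormalDist


def approx_p_value(r, n):
    """Approximate two-sided P for H0: rho = 0 via the Fisher z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


r = 0.1  # hypothetical weak correlation, identical in every scenario
sample_sizes = [20, 100, 1000, 10000]  # hypothetical sample sizes
p_values = {n: approx_p_value(r, n) for n in sample_sizes}

for n, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>5}: P = {p:.4f} ({verdict} at alpha = 0.05)")
```

The effect size is the same in every row; only n decides which side of α the result falls on.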
Null Hypothesis Significance Testing:
• “no longer a sound or fruitful basis for statistical investigation” (Clarke 1963)
• “essential mindlessness in the conduct of research” (Bakan 1966)
• “In practice, of course, tests of significance are not taken seriously” (Guttman 1985)
• “simple P-values are not now used by the best statisticians” (Barnard 1998)
• “The most common and flagrant misuse of statistics... is the testing of hypotheses, especially the vast majority of them known beforehand to be false” (Johnson 1999)

“The problems with Null Hypothesis Significance Testing are so severe that some have argued for it to be completely banned from scholarly journals”
Denis (2003) Theory & Science

Alternatives to: Null Hypothesis Significance Testing
• Displays, confidence intervals, effect size statistics
• Model comparison using information-theoretic approaches
• Bayesian statistics

Diversity and latitude
• r = -0.1762; P = 0.002
• r = -0.1762; 95% c.i.: -0.2690; -0.0801
[Figure: diversity plotted against latitude, with the 95% c.i. marked]
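
The confidence interval quoted for the diversity-latitude correlation can be approximated from r and n alone. The lecture does not say how the interval was computed; the sketch below assumes the common Fisher z transformation, with n = 405 taken from the worked example. The function name `correlation_ci` is illustrative.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation (Fisher z method)."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Build the interval on the z scale, then back-transform to the r scale
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lower, upper = correlation_ci(-0.1762, 405)
print(f"95% c.i. for r: {lower:.4f} to {upper:.4f}")
```

Under these assumptions the result comes out close to the interval given above. The interval excludes zero but also shows that the correlation is weak, which the P value alone does not convey.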
Diversity and latitude: Maybe by focussing on the diversity-latitude hypothesis, we have missed the real story
[Figure: four panels of diversity: against latitude; against SST; against SST by ocean (Atlantic, Pacific); against SST by area (Galápagos, Gully, Other)]

Effect Size Statistics
• indicate the association that exists between two or more variables
  – Pearson’s r correlation coefficient (or r²)
    • for two continuous variables
  – Cohen’s d
    • for one continuous, one two-level category (t-test)
  – Hedges’ g
    • better than d when sample sizes are very different
  – Cohen’s f²
    • for one continuous, one multi-level category (F-test)
  – Cramer’s φ
    • for two categorical variables (Chi² test)
  – Odds ratio
    • for two binary variables

Cohen’s d
d = (difference between means of two groups) / (pooled standard deviation)
• d = 0.2 indicative of a small effect size
• d = 0.5 a medium effect size
• d = 0.8 a large effect size

Problems with effect size statistics
• No serious problems
• But they don’t tell the whole story

Model fitting: How can we best predict diversity?
[Figure: four panels of diversity: against latitude; against SST; against SST by ocean (Atlantic, Pacific); against SST by area (Galápagos, Gully, Other)]

Some models of diversity
SST = Sea Surface Temperature
lat = Latitude
ocean = Atlantic / Pacific
area = Ocean area (categorical)

• constant
• SST
• SST, SST²
• SST, SST², SST³
• lat
• lat, lat²
• lat, lat², lat³
• SST, SST², lat
• SST, SST², lat, lat²
• SST, SST², lat, lat², lat³
• ocean
• SST, SST², ocean
• area
• SST, SST², area

Which model is best?
Model                         Residual sum of squares   Parameters
constant                      0.854                     2
SST                           0.774                     3
SST, SST²                     0.724                     4
SST, SST², SST³               0.726                     5
lat                           0.835                     3
lat, lat²                     0.804                     4
lat, lat², lat³               0.785                     5
SST, SST², lat                0.725                     5
SST, SST², lat, lat²          0.722                     6    (lowest RSS, but many parameters)
SST, SST², lat, lat², lat³    0.724                     7
ocean                         0.844                     3
SST, SST², ocean              0.725                     5
area                          0.831                     4
SST, SST², area               0.723                     6

Which model is best?
• Information-theoretic AIC
  – Akaike Information Criterion
• A measure of the similarity between the statistical model and the true distribution
• Trades off the complexity of a model against how well it fits the data

Which model is best?
Model                         RSS      Parameters   AIC
constant                      0.854    2            -61.08
SST                           0.774    3            -99.81
SST, SST²                     0.724    4            -125.54   (lowest AIC: best model)
SST, SST², SST³               0.726    5            -123.64
lat                           0.835    3            -69.09
lat, lat²                     0.804    4            -83.19
lat, lat², lat³               0.785    5            -92.19
SST, SST², lat                0.725    5            -124.10
SST, SST², lat, lat²          0.722    6            -125.05
SST, SST², lat, lat², lat³    0.724    7            -123.05
ocean                         0.844    3            -64.88
SST, SST², ocean              0.725    5            -124.27
area                          0.831    4            -69.77
SST, SST², area               0.723    6            -124.59

How much support for different models?
Model                         AIC       ΔAIC
constant                      -61.08    64.46
SST                           -99.81    25.73
SST, SST²                     -125.54   0.00
SST, SST², SST³               -123.64   1.90
lat                           -69.09    56.45
lat, lat²                     -83.19    42.35
lat, lat², lat³               -92.19    33.35
SST, SST², lat                -124.10   1.45
SST, SST², lat, lat²          -125.05   0.49
SST, SST², lat, lat², lat³    -123.05   2.49
ocean                         -64.88    60.66
SST, SST², ocean              -124.27   1.27
area                          -69.77    55.77
SST, SST², area               -124.59   0.96

How much support for different models?
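
ΔAIC is each model's AIC minus the lowest AIC in the set. The sketch below recomputes it from the AIC values listed in the lecture. It then converts the differences to Akaike weights, exp(-ΔAIC/2) normalized to sum to 1. Akaike weights are a standard extension that these slides do not show, so treat that step as an assumption about how support might be quantified rather than as the lecturer's calculation.

```python
import math

# AIC values as listed in the lecture's table (SST2 = SST squared, etc.)
aic = {
    "constant": -61.08,
    "SST": -99.81,
    "SST, SST2": -125.54,
    "SST, SST2, SST3": -123.64,
    "lat": -69.09,
    "lat, lat2": -83.19,
    "lat, lat2, lat3": -92.19,
    "SST, SST2, lat": -124.10,
    "SST, SST2, lat, lat2": -125.05,
    "SST, SST2, lat, lat2, lat3": -123.05,
    "ocean": -64.88,
    "SST, SST2, ocean": -124.27,
    "area": -69.77,
    "SST, SST2, area": -124.59,
}

best = min(aic, key=aic.get)  # model with the lowest AIC
delta = {model: value - aic[best] for model, value in aic.items()}

# Akaike weights: relative likelihood of each model, normalized over the set
rel_likelihood = {model: math.exp(-0.5 * d) for model, d in delta.items()}
total = sum(rel_likelihood.values())
weights = {model: v / total for model, v in rel_likelihood.items()}

for model in sorted(aic, key=aic.get):
    print(f"{model:<28} dAIC = {delta[model]:6.2f}  weight = {weights[model]:.3f}")
```

Because the inputs are rounded to two decimals, a recomputed ΔAIC can differ from the tabulated one by 0.01. The weights depend on which models are in the candidate set.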
Model                         AIC       ΔAIC    Support
constant                      -61.08    64.46   No support
SST                           -99.81    25.73   No support
SST, SST²                     -125.54   0.00    Best model
SST, SST², SST³               -123.64   1.90    Some support
lat                           -69.09    56.45   No support
lat, lat²                     -83.19    42.35   No support
lat, lat², lat³               -92.19    33.35   No support
SST, SST², lat                -124.10   1.45    Some support
SST, SST², lat, lat²          -125.05   0.49    Some support
SST, SST², lat, lat², lat³    -123.05   2.49    Little support
ocean                         -64.88    60.66   No support
SST, SST², ocean              -124.27   1.27    Some support
area                          -69.77    55.77   No support
SST, SST², area               -124.59   0.96    Some support

Relative importance of variables from AIC
Variable   Relative importance
SST        1.000
SST²       1.000
SST³       0.211
lat        0.398
lat²       0.280
lat³       0.075
ocean      0.128
area       0.141

Best model of diversity:
Diversity = 0.293 + 0.261 SST - 0.00614 SST²
[Figure: diversity plotted against SST]

Global pattern of diversity
• apply equation to global SST map

Global pattern of diversity
• apply equation to SST predictions from global circulation models

Advantages and criticisms of information-theoretic model-fitting
• Indicates “best” model and support for other models
• Can compare very different models
• Balances complexity of model against fit
• Produces predictive models
• Fairly simple mathematically and computationally
• Model averaging
• Philosophical basis “nuanced”
• Which models to consider is subjective

Bayesian Analysis
• Given prior distribution of models or model parameters
• Collect data
• Work out probability of data for each model and combination of model parameters
• Work out posterior distribution of models or model parameters
  – using Bayes’ theorem

Bayes’ Theorem
Posterior probability of model given data = (Probability of data given model × Probability of model) / Probability of data

Bayesian Analysis
• So, Bayesian analysis gives:
  – the probability of models or parameters given prior knowledge and data
  – very nice!
  – but may need considerable computation

Example of Bayesian Analysis
• Trying to work out survival rate of newly studied species of rodent
• Ten other species in genus have mean survival per year of 0.72 (SD 0.13)
• Of 20 animals marked, 17 survive for 1 year
• Standard (binomial) estimate of survival = 0.850 (95% c.i. 0.621 - 0.968)
• Bayesian estimate of survival = 0.797 (95% c.i. 0.637 - 0.921)

Advantages and Difficulties with Bayesian Analysis
• Philosophically very nice
• Gives probability of model given data and prior information
• Updates estimates as more information becomes available
• Does not give biologically implausible estimates
  – e.g. survival >1
• Fits adaptive management paradigm
• Choice of priors somewhat arbitrary
• Bayesian analysis with “uninformative priors” gives similar results to simpler methods
• Complex
• Computation can be VERY time consuming and opaque

Methods of Inference in Biology
• Descriptive or exploratory analyses
  – Displays, confidence intervals, effect size statistics
  – Model comparisons using AIC, etc
  – Bayesian analysis (if prior information)
  – Null hypothesis significance tests?
• Fitting predictive models
  – Model comparisons using AIC, etc
  – Bayesian analysis (if prior information)
• Challenging research hypotheses
  – Model comparisons using AIC, etc
  – Null hypothesis significance tests

This class
• Displays, confidence intervals, effect size statistics ***
• Model comparisons using AIC, etc **
• Bayesian analysis
• Null hypothesis significance tests *
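
The rodent survival example can be sketched with a simple grid approximation of Bayes' theorem: multiply prior by likelihood at each candidate survival rate, then divide by the total. The lecture does not state what form the prior took. The normal-shaped prior restricted to (0, 1) used here is an assumption, as are the function names, so the output should land near the Bayesian estimate quoted above without necessarily matching it.

```python
import math


def grid_posterior(survived, marked, prior_mean, prior_sd, steps=2000):
    """Posterior for a survival rate p, evaluated on an even grid over (0, 1)."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    unnormalized = []
    for p in grid:
        # Assumed prior shape: normal density restricted to (0, 1)
        prior = math.exp(-0.5 * ((p - prior_mean) / prior_sd) ** 2)
        # Binomial likelihood; the constant binomial coefficient cancels out
        likelihood = p ** survived * (1.0 - p) ** (marked - survived)
        unnormalized.append(prior * likelihood)
    total = sum(unnormalized)  # plays the role of "probability of data"
    return grid, [u / total for u in unnormalized]


def summarize(grid, posterior, level=0.95):
    """Posterior mean and equal-tailed credible interval."""
    mean = sum(p * w for p, w in zip(grid, posterior))
    tail = (1.0 - level) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(grid, posterior):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = p
        if upper is None and cumulative >= 1.0 - tail:
            upper = p
    return mean, lower, upper


grid, posterior = grid_posterior(survived=17, marked=20, prior_mean=0.72, prior_sd=0.13)
mean, lower, upper = summarize(grid, posterior)
print(f"posterior mean = {mean:.3f}, 95% interval = {lower:.3f} to {upper:.3f}")
```

The posterior mean falls between the prior mean (0.72) and the binomial estimate (0.85), which shows how the prior pulls the estimate toward what is known about related species.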