Statistical Inference in Wildlife Science
• Goals
• Concerns with Nulls
• Better Approaches?
  – Information-theoretic
  – Metareplication
  – Data dredging
• Important References

Goal of Wildlife Research
• Gain “reliable knowledge” (Romesburg 1981)
  – Hypothetico-deductive (H-D) approach is preferred

[Figure: flowchart of the hypothetico-deductive method. Labels: Observed Facts, Induction, Retroduction, Research Hypothesis, Predictions, Experiment, Test of Statistical (Ho) Hypothesis, Reject, Fail to Reject, Modify Research Hypothesis, Dogmatic Laws, Reliable Knowledge]

Research vs. Statistical Hypotheses
• H-D method includes research and statistical hypotheses
• Research Hypothesis
  – Conjecture about a process (how nature works) based on theory (retroduction)
• Statistical Hypothesis
  – Conjecture about a class of facts associated with the process (induction); local questions about a single population or system

Statisticians Have Long Debated Hypothesis Testing
• The relative use of research vs. statistical hypotheses has brought a long-standing debate in statistics to wildlife science
  – We are overly concerned with testing statistical hypotheses and not concerned enough with rigorous development of, and sorting among, research hypotheses (Anderson et al. 2000)

Why are Statisticians Concerned with Our Overreliance on Statistical Hypothesis Testing?
• Null hypotheses are viewed incorrectly
  – Trivial to say there is no difference
  – Focus is on rejecting Ho rather than investigating the size and precision of a treatment effect
  – Alpha is arbitrary
  – Often only the P-value is reported (naked P-value)
  – P-value depends on data that were not collected as well as on those that were (it is the probability of an observation at least as extreme as that observed, given Ho)
  – P-value depends on N; when the true effect is not exactly zero, rejection becomes virtually certain given enough data
  – P-value does NOT indicate strength of Ha, but rather degree of consistency (or inconsistency) with Ho
(Cherry 1998; Johnson 1999, 2002; Anderson et al. 2000; Guthery et al. 2001)

A Better Approach?
• Focus on estimating effect size and providing a measure of its precision
  – Confidence intervals do this
      • Rely on SE, not SD: SD measures the variation observed in the sample, not the precision of the estimate

Focus on Getting a Good Set of Biologically Reasonable Hypotheses
• Embrace the concept of multiple working research hypotheses (Chamberlin 1890) rather than a single Ha vs. a single Ho
  – Can protect research from personal bias, as researchers no longer have a single favorite hypothesis they work to confirm (Guthery et al. 2001)
• Formulate each hypothesis as a mathematical model
• Requires close collaboration with a statistician to make sure the full complexity of the biological hypotheses can be represented (nonlinearity, etc.)
• Sort among multiple hypotheses using an information-theoretic (I-T) approach (Akaike 1973, 1974; Burnham and Anderson 1998; Anderson et al. 2000, 2001; Anderson and Burnham 2002)

Sorting Among Research Hypotheses
• Akaike’s Information Criterion (AIC)

[Figure: labels AIC, Bias, Best Model, Amount Unexplained Variance, Number of Parameters (k)]

$AIC = n \ln(\hat{\sigma}^2) + 2k$
(goodness of fit) + (number of parameters)
where $\hat{\sigma}^2$ is the estimated residual (unexplained) variance, n is the sample size, and k is the number of parameters

Rank Models
• Calculate AIC and rank hypotheses (models) from best (minimum AIC) to worst
  – A single AIC is not a useful value; it is the relative value that is important
  – Akaike weights ($w_i$) quantify the weight of evidence in favor of a model (evidence that the model is best in the defined set; the $w_i$ sum to 1)
  – Rules of thumb
      • $w_i > 0.9$ indicates a single, superior model
      • Relative importance of a model can be indicated by its change in AIC ($\Delta_i = AIC_i - AIC_{min}$). If the change in AIC for a model is < 10, it should be considered supported by the data.
  – Model averaging is a powerful way to estimate parameters and their precision
      • The averaged parameter value is the weighted average (using $w_i$) of the parameter estimates ($\theta_{average} = \sum_i w_i \theta_i$)

Some Issues with I-T Approach (Guthery et al. 2001; Robinson and Wainer 2002)
• Method is parametric
  – Requires assumptions about distributions to be met
• Definition of research hypotheses defines conclusions
  – One of those in the group of alternative models will have the minimum AIC
  – Need to make sure no trivial hypotheses are in the set
  – Hypotheses should reflect plausible, but different, ways that nature works (i.e., be true research hypotheses, not statistical hypotheses)
• Null effects are not necessarily trivial; they must be modeled if there is good reason
• Frequentist statistics are appropriate for analysis of well-designed experiments

Better Use of P-value (Fisher 1925; Robinson and Wainer 2002)
• IF you use a frequentist approach, then:
  – Follow Fisher’s lead and use P-values to screen for potentially real or useful associations that merit future investigation, rather than using them to identify end points (significant findings to draw conclusions from) of an investigation
  – Report the actual P-value and effect size plus a measure of precision
  – Do not make reject / fail-to-reject decisions
      • If P < 0.05, report evidence of an effect and look to confirm with other studies
      • If 0.05 < P < 0.2, report that evidence exists for further testing of the hypothesis with an improved design (replication). State that the result “leans” in a certain direction.
      • If P > 0.2, report that if there is an effect, it is too small to detect with the current experimental design
  – If you are doing a one-time experiment, then α should be reduced well below 0.05
  – Do not interpret P as the probability of Ho given the data; it is the probability of data at least as extreme as those observed, given a true Ho
      • If you want to discuss the likelihood of a hypothesis, then I-T or Bayesian approaches are more appropriate

Metareplication (Johnson 2002)
• This approach gets away from individual P-values by focusing on making inference in the context of prior related findings
  – A Bayesian approach following Fisher’s lead
      • Search for multiple studies that point in a common direction rather than a single definitive study with a low P-value to show the direction
• Replication of studies (metareplication) is the key
  – Exploit the value of small studies, each of which may not be able to support a definitive conclusion
  – Truth lies at the “intersection of independent lies” (Levins 1966)
  – Although independent studies each may suffer from various shortcomings (small n, etc.), if they paint substantially similar pictures, we have confidence in what we see

Making Management Recommendations
• Place less emphasis on the significant finding of an individual study
• Use estimates of effect size and precision from individual studies in meta-analysis to determine consistent effects before making management recommendations
• Look for truly replicated studies with consistent findings
  – Different methods, different locations, different observers

Dredging Data Along the Way
• Dredging data is not bad; it is the creative process. But analyzing dredged data with traditional statistical methods is a violation of assumptions
• Surprising findings should be heralded and used to stimulate new hypotheses and experiments
• Put dredged findings in Discussion, not Results
• Admit it when you dredge
• Use dredging to screen for possible effects to be considered in future studies
• “Any single study can yield a p-value, but only consistency among replicated studies will advance our science” (Johnson 2002)

If YOU Are Doing Research, YOU MUST Read:
• Anderson, DR, Link, WA, Johnson, DH, and KP Burnham. 2001. Suggestions for presenting the results of data analysis. J. Wildlife Management 65:373-378.
• Anderson, DR, Burnham, KP, and WL Thompson. 2000. Null hypothesis testing: problems, prevalence, and an alternative. J. Wildlife Management 64:912-923.
• Burnham, KP and DR Anderson. 1998. Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York.
• Chamberlin, TC. 1890. The method of multiple working hypotheses. Reprinted in Science 148:754-759.
• Cherry, S. 1998. Statistical tests in publications of The Wildlife Society. Wildlife Society Bulletin 26:947-953.
• Johnson, DH. 1999. The insignificance of statistical significance testing. J. Wildlife Management 63:763-772.
• Robinson, DH and H Wainer. 2002. On the past and future of null hypothesis significance testing. J. Wildlife Management 66:263-271.
• Fisher, RA. 1925. Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society 22:700-725.
• Fisher, RA. 1928. Statistical methods for research workers. 2nd edition. Oliver and Boyd, London.
• Anderson, DR and KP Burnham. 2002. Avoiding pitfalls when using information-theoretic methods. J. Wildlife Management 66:912-918.
• Akaike, H. 1973. Information theory as an extension of the maximum likelihood principle. Pp. 267-281 in BN Petrov and F Csaki, eds. Second international symposium on information theory. Akademiai Kiado, Budapest.
• Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19:716-723.
• Johnson, DH. 2002. The importance of replication in wildlife research. J. Wildlife Management 66:919-932.
• Johnson, DH. 2002. The role of hypothesis testing in wildlife science. J. Wildlife Management 66:272-276.
• Guthery, FS, JJ Lusk, and MJ Peterson. 2001. The fall of the null hypothesis: liabilities and opportunities. J. Wildlife Management 65:379-384.
• Hurlbert, SH. 1984. Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54:187-211.
• Romesburg, HC. 1981. Wildlife science: gaining reliable knowledge. J. Wildlife Management 45:293-313.
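
Worked Example: Ranking Models with AIC
The ranking steps described under "Sorting Among Research Hypotheses" and "Rank Models" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the models are taken to be least-squares fits, so the residual variance is estimated as RSS/n; the Akaike weights use the standard formula w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), which the slides do not spell out; and the model names, RSS values, parameter counts, and parameter estimates are hypothetical numbers chosen to show the arithmetic, not results from any study.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC = n * ln(sigma_hat^2) + 2k, assuming sigma_hat^2 = RSS / n."""
    return n * math.log(rss / n) + 2 * k


def akaike_weights(aic_values):
    """Return (deltas, weights) for a list of AIC values.

    delta_i = AIC_i - AIC_min; the weights sum to 1 over the model set.
    """
    aic_min = min(aic_values)
    deltas = [a - aic_min for a in aic_values]
    rel_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_likelihoods)
    weights = [r / total for r in rel_likelihoods]
    return deltas, weights


def model_average(estimates, weights):
    """Weighted average of a parameter estimate across models."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical candidate set: every number below is made up for illustration.
n = 50  # sample size shared by all models
models = [
    {"name": "habitat", "rss": 120.0, "k": 3, "theta": 0.42},
    {"name": "habitat+predation", "rss": 100.0, "k": 4, "theta": 0.35},
    {"name": "null", "rss": 150.0, "k": 2, "theta": 0.00},
]

aics = [aic_least_squares(m["rss"], n, m["k"]) for m in models]
deltas, weights = akaike_weights(aics)
theta_avg = model_average([m["theta"] for m in models], weights)

# Rank from best (minimum AIC) to worst.
ranked = sorted(zip(models, aics, deltas, weights), key=lambda row: row[1])
for m, aic, delta, w in ranked:
    print(f"{m['name']:<20} AIC={aic:8.2f}  delta={delta:6.2f}  w={w:.3f}")
print(f"model-averaged theta = {theta_avg:.3f}")
```

Only the differences in AIC matter here: adding a constant to every AIC value leaves the deltas and weights unchanged, which is why a single AIC on its own is not informative.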