Statistical Inference in Wildlife Science
• Goals
• Concerns with Nulls
• Better Approaches?
– Information-theoretic
– Metareplication
– Data dredging
• Important References
Goal of Wildlife Research
• Gain “reliable knowledge” (Romesburg 1981)
– Hypothetico-deductive approach is preferred
[Flow diagram: observed facts lead, by induction and retroduction, to a research hypothesis; predictions from the hypothesis are tested by experiment via a test of a statistical hypothesis (Ho); "reject" sends the researcher back to modify the research hypothesis, while "fail to reject" lends support; repeated cycles build reliable knowledge rather than untested dogmatic laws.]
Research vs. Statistical Hypotheses
• H-D method includes research and
statistical hypotheses
• Research Hypothesis
– Conjecture about a process (how nature
works) based on theory (retroduction)
• Statistical Hypothesis
– Conjecture about a class of facts associated
with the process (induction); local questions
about a single population or system
Statisticians Have Long Debated
Hypothesis Testing
• The relative emphasis on research vs. statistical hypotheses has brought a long-standing debate from the world of statistics into wildlife science
– We are overly concerned with testing
statistical hypotheses and not concerned
enough with rigorous development of, and
sorting among, research hypotheses
(Anderson et al. 2000)
Why are Statisticians Concerned with Our Overreliance on Statistical Hypothesis Testing?
• Null hypotheses are viewed incorrectly
– Trivial to say there is no difference
– Focus is on rejecting Ho rather than investigating the size
and precision of a treatment effect
– Alpha is arbitrary
– Often only the P-value is reported (naked P-value)
– The P-value is based not only on the data collected but also on data never collected (it is the probability of an observation at least as extreme as the one observed, given Ho)
– The P-value depends on N, so rejection of any point null is certain given enough data
– The P-value does NOT indicate the strength of evidence for Ha, only the degree of consistency (or inconsistency) with Ho
(Cherry 1998; Johnson 1999, 2002; Anderson et al. 2000; Guthery et al. 2001)
A Better Approach?
• Focus on estimating effect size and
providing a measure of its precision
– Confidence Intervals do this
• Rely on the SE, not the SD; the SD measures the variation observed in the sample, not the precision of the estimate
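As a minimal sketch of this recommendation (Python with NumPy and SciPy; the data are invented for illustration), the block below estimates a treatment effect, its standard error, and a 95% confidence interval instead of reporting only a p-value:

```python
import numpy as np
from scipy import stats

# Invented example data: a response measured on control and treatment units
control = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.5])
treatment = np.array([13.2, 12.8, 14.1, 13.5, 13.9, 12.9])

# Effect size: difference in means
effect = treatment.mean() - control.mean()

# SE of the difference (Welch form). Note that the SE, not the SD,
# measures the precision of the estimate; the SD describes the spread
# within each sample.
v_t = treatment.var(ddof=1) / len(treatment)
v_c = control.var(ddof=1) / len(control)
se = np.sqrt(v_t + v_c)

# Welch-Satterthwaite degrees of freedom for the t-based interval
df = (v_t + v_c) ** 2 / (v_t**2 / (len(treatment) - 1) + v_c**2 / (len(control) - 1))
t_crit = stats.t.ppf(0.975, df)

print(f"effect = {effect:.2f}, SE = {se:.2f}, "
      f"95% CI = ({effect - t_crit * se:.2f}, {effect + t_crit * se:.2f})")
```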
Focus on Getting a Good Set of
Biologically Reasonable Hypotheses
• Embrace the concept of multiple working
research hypotheses (Chamberlin 1890) rather than the
single Ha vs. single Ho
– Can protect research from personal bias as
researchers no longer have a single favorite
hypothesis they work to confirm (Guthery et al. 2001)
• Formulate each hypothesis as a mathematical model (see the sketch below)
• Requires close collaboration with a statistician to make sure the full complexity of the biological hypotheses can be represented (nonlinearity, etc.)
• Sort among multiple hypotheses using
information-theoretic approach (Akaike 1973, 1974; Anderson
et al. 2000; Burnham and Anderson 1998; Anderson and Burnham 2002; Anderson et
al. 2001)
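To make "formulate each hypothesis as a mathematical model" concrete, here is a minimal sketch of a candidate model set; the response and predictor names (density, rainfall, cover) and the four hypotheses are hypothetical placeholders, not from the slides:

```python
# Each research hypothesis becomes a candidate model, written here as a
# regression formula. Variable names are hypothetical placeholders.
candidate_models = {
    "H1: density driven by rainfall":  "density ~ rainfall",
    "H2: density driven by cover":     "density ~ cover",
    "H3: additive rainfall and cover": "density ~ rainfall + cover",
    "H4: nonlinear rainfall response": "density ~ rainfall + I(rainfall**2)",
}

# Each formula could then be fit to a data frame df with, e.g.:
#   import statsmodels.formula.api as smf
#   fits = {name: smf.ols(f, data=df).fit()
#           for name, f in candidate_models.items()}
```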
Sorting Among Research Hypotheses
• Akaike’s Information Criterion (AIC)
[Figure: AIC plotted against number of parameters (k); bias / unexplained variance falls as k grows while the parameter penalty rises, and the best model lies at the minimum of the curve.]

AIC = n ln(σ̂²) + 2k

(goodness of fit) + (penalty for number of parameters)
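A minimal sketch of this least-squares form of AIC (invented data; conventions differ on whether σ² is counted in k, so the k used here is an assumption; whichever convention you adopt, apply it consistently across the model set):

```python
import numpy as np

def aic_ls(y, yhat, k):
    """AIC for a least-squares fit: n * ln(RSS/n) + 2k.
    k = number of estimated parameters (whether sigma^2 is counted
    varies by convention; be consistent across the model set)."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

# Invented data and a simple linear fit
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)
coef = np.polyfit(x, y, deg=1)      # slope + intercept: 2 parameters
yhat = np.polyval(coef, x)
print(aic_ls(y, yhat, k=2))
```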
Rank Models
• Calculate AIC and rank hypotheses (models) from best (minimum AIC) to worst
– A single AIC value is not useful by itself; only relative values matter
– Akaike weights (wi) quantify the weight of evidence in favor of a model (evidence that the model is best in the defined set; the wi sum to 1)
– Rules of thumb
• wi > 0.9 indicates a single, superior model
• Relative support for a model is indicated by its change in AIC (ΔAICi = AICi − AICmin); a model with ΔAIC < 10 should be considered supported by the data
– Model averaging is a powerful way to estimate parameters and their precision (see the sketch below)
• The model-averaged estimate is the weighted average (using wi) of the per-model estimates: θ̂average = Σ wi θ̂i
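A minimal sketch of the ranking, weighting, and averaging steps described above (the AIC values and parameter estimates are invented; a function like aic_ls from the previous sketch could supply real ones):

```python
import numpy as np

aic = np.array([102.3, 104.1, 110.7, 125.0])   # invented AIC, one per model
theta = np.array([0.42, 0.55, 0.31, 0.10])     # invented estimates of one parameter

delta = aic - aic.min()                        # dAIC_i = AIC_i - AIC_min
w = np.exp(-0.5 * delta)
w /= w.sum()                                   # Akaike weights sum to 1

theta_avg = np.sum(w * theta)                  # model-averaged estimate
for i, (d, wi) in enumerate(zip(delta, w)):
    print(f"model {i}: dAIC = {d:5.1f}, weight = {wi:.3f}")
print(f"model-averaged theta = {theta_avg:.3f}")
```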
Some Issues with I-T Approach
(Guthery et al. 2001; Robinson and Wainer 2002)
• Method is parametric
– Requires assumptions about distributions to be met
• Definition of research hypotheses defines conclusions
– One of those in the group of alternative models will have the
minimum AIC
– Need to make sure no trivial hypotheses are in set
– Hypotheses should reflect plausible, but different, ways that
nature works (i.e., be true research hypotheses, not statistical
hypotheses)
• Null effects are not necessarily trivial; they must be modeled if there is good reason
• Frequentist statistics are appropriate for analysis of well-designed experiments
Better Use of P-value
(Fisher 1925; Robinson and Wainer 2002)
• IF you use frequentist approach, then:
– Follow Fisher's lead: use p-values to screen for potentially real or useful associations that merit future investigation, rather than to identify the end points of an investigation (significant findings from which conclusions are drawn)
– Report actual p-value and effect size plus measure of precision
– Do not make reject / fail-to-reject decisions (see the sketch below)
• If P < 0.05, report evidence of an effect and look to confirm it with other studies
• If 0.05 < P < 0.2, report that evidence exists for further testing of the hypothesis with an improved design (replication); state the direction the result "leans"
• If P > 0.2, report that if there is an effect, it is too small to detect with the current experimental design
– If you are doing a one-time experiment, then α should be reduced well below 0.05
– Do not interpret P as the probability of Ho given the data; it is the probability of the data, given a true Ho
• If you want to discuss likelihood of a hypothesis, then I-T or
Bayesian approaches are more appropriate
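The reporting rules above can be collected into a small screening helper; this is a sketch using the slide's thresholds (after Robinson and Wainer 2002), a screen for effects worth pursuing rather than a reject/fail-to-reject verdict:

```python
def screen_p_value(p):
    """Map a p-value to the reporting language suggested above.
    Screens for effects worth pursuing; not a significance verdict."""
    if p < 0.05:
        return "evidence of an effect; seek confirmation in other studies"
    if p < 0.2:
        return ("result leans in one direction; retest with an "
                "improved, replicated design")
    return ("any effect is too small to detect with the current "
            "experimental design")

for p in (0.01, 0.12, 0.45):
    print(p, "->", screen_p_value(p))
```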
Metareplication (Johnson 2002)
• This approach gets away from individual P-values by
focusing on making inference in the context of prior
related findings
– A Bayesian approach following Fisher’s lead
• Look for multiple studies that point in a common direction rather than for a single definitive study with a low p-value to show the direction
• Replication of studies (metareplication) is the key
– Exploit value of small studies, each of which may not be able to
make a definitive conclusion
– Truth lies at the “intersection of independent lies” (Levins 1966)
– Although independent studies each may suffer from various
shortcomings (small n, etc.), if they paint substantially similar
pictures, we have confidence in what we see
Making Management Recommendations
• Place less emphasis on the significant
finding of an individual study
• Use estimates of effect size and precision from individual studies in a meta-analysis (sketched below) to identify consistent effects before making management recommendations
• Look for truly replicated studies with
consistent findings
– Different methods, different locations, different
observers
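One common way to combine effect sizes and precisions across studies is fixed-effect inverse-variance weighting, a standard meta-analysis technique consistent with, though not named in, the slides. A minimal sketch with invented per-study estimates:

```python
import numpy as np

# Invented per-study effect sizes and their standard errors
effects = np.array([0.30, 0.45, 0.10, 0.38])
ses = np.array([0.20, 0.15, 0.25, 0.18])

w = 1.0 / ses**2                         # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w) # precision-weighted mean effect
pooled_se = np.sqrt(1.0 / np.sum(w))     # SE of the pooled estimate

print(f"pooled effect = {pooled:.3f}, 95% CI = "
      f"({pooled - 1.96 * pooled_se:.3f}, {pooled + 1.96 * pooled_se:.3f})")
```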
Dredging Data Along the Way
• Dredging data is not bad in itself; it is part of the creative process. But analyzing dredged data with traditional statistical methods violates their assumptions
• Surprising findings should be heralded and used
to stimulate new hypotheses and experiments
• Put dredged findings in Discussion not Results
• Admit it when you dredge
• Use dredging to screen for possible effects to be
considered in future studies
• “Any single study can yield a p-value, but
only consistency among replicated studies
will advance our science” (Johnson 2002)
If YOU Are Doing Research, YOU MUST Read:
Akaike, H. 1973. Information theory as an extension of the maximum likelihood principle. Pp. 267-281 in BN Petrov and F Csaki, eds. Second International Symposium on Information Theory. Akademiai Kiado, Budapest.
Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19:716-723.
Anderson, DR and KP Burnham. 2002. Avoiding pitfalls when using information-theoretic methods. Journal of Wildlife Management 66:912-918.
Anderson, DR, Burnham, KP, and WL Thompson. 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64:912-923.
Anderson, DR, Link, WA, Johnson, DH, and KP Burnham. 2001. Suggestions for presenting the results of data analysis. Journal of Wildlife Management 65:373-378.
Burnham, KP and DR Anderson. 1998. Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York.
Chamberlin, TC. 1890. The method of multiple working hypotheses. Science 148:754-759 (1965 reprint of the 1890 original).
Cherry, S. 1998. Statistical tests in publications of The Wildlife Society. Wildlife Society Bulletin 26:947-953.
Fisher, RA. 1925. Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society 22:700-725.
Fisher, RA. 1928. Statistical methods for research workers. 2nd edition. Oliver and Boyd, London.
Guthery, FS, JJ Lusk, and MJ Peterson. 2001. The fall of the null hypothesis: liabilities and opportunities. Journal of Wildlife Management 65:379-384.
Hurlbert, SH. 1984. Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54:187-211.
Johnson, DH. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772.
Johnson, DH. 2002. The importance of replication in wildlife research. Journal of Wildlife Management 66:919-932.
Johnson, DH. 2002. The role of hypothesis testing in wildlife science. Journal of Wildlife Management 66:272-276.
Robinson, DH and H Wainer. 2002. On the past and future of null hypothesis significance testing. Journal of Wildlife Management 66:263-271.
Romesburg, HC. 1981. Wildlife science: gaining reliable knowledge. Journal of Wildlife Management 45:293-313.