British Journal of Mathematical and Statistical Psychology (2013), 66, 1–7
© 2013 The British Psychological Society
www.wileyonlinelibrary.com
DOI:10.1111/bmsp.12004
Editorial
Prior approval: The growth of Bayesian methods in
psychology
Within the last few years, Bayesian methods of data analysis in psychology have
proliferated. In this paper, we briefly review the history of the Bayesian approach to
statistics, and consider the implications that Bayesian methods have for the theory and
practice of data analysis in psychology.
Until recently, Bayesian methods for data analysis in psychology were largely unheard of.
Standard textbooks barely gave them even a cursory mention and most researchers
could spend their entire careers unaware of the existence of anything beyond the
orthodox canon that they learned as students. For those who had heard of them,
Bayesian methods were often dismissed as infected by subjectivism, something that few
if any quantitative psychologists could endorse. Bayesian statistics was seen as a minority
topic with a small but fiery cult following. It was not part of the mainstream of the
discipline of statistics and could be safely ignored by scientists dealing with the practical
realities of data analysis.
Times have changed. Beginning in the early 1990s, there was an abrupt proliferation of
studies using Bayesian methods in mainstream statistics.
[Figure: number of articles on Bayesian statistics in five top-ranked statistics journals over four decades.]
This has continued for over 20 years—the figure above shows the number of articles on
Bayesian statistics in five top-ranked statistics journals over four decades—so that now the
topic of Bayesian methods comprises some 20% of published articles in statistics. This
trend has been accompanied by a growth in the number and popularity of textbooks (e.g.,
Gelman, Carlin, Stern, & Rubin, 2003; Gelman & Hill, 2007) and general purpose software
for Bayesian data analysis and modelling (e.g., BUGS, JAGS). The rising tide has not gone
unnoticed within psychology. Although perhaps delayed in its reaction, there has been a
remarkable increase in the use of Bayesian methods in quantitatively focused psychology
journals in the last decade, as the following figure makes clear:
[Figure: number of articles using Bayesian methods in quantitatively focused psychology journals, by year.]
In the last few years, popular textbooks (e.g., Kruschke, 2011) for Bayesian data analysis in
psychology have emerged. However one might feel about them, Bayesian methods can
no longer be ignored as an irrelevance.
The importance of these trends is that Bayesian methods are not just another set of
topics in advanced statistics such as, for example, structural equation modeling or
nonlinear regression. For some, they represent a new paradigm (in the Kuhnian sense of
the term) for the field. As such, their increasing adoption has potentially profound
implications for the nature and practice of data analysis in psychology, possibly affecting
everything from the editorial policies of journals to how statistics is taught to students.
Despite their growing appeal, however, there remains a troubling lack of clarity about
what exactly Bayesian methods do and do not entail and about how they differ from their
so-called classical counterparts. Bayesian methods are often portrayed as being based on a
subjective rather than frequentist interpretation of probability, with inference being an
updating of personal beliefs in light of evidence. In practice, however, most modern
applications of Bayesian methods to real-world data analysis problems are characterized
by pragmatism and expediency: Bayesian methods are adopted because they promise
(and arguably often deliver) solutions to important or difficult problems. Increasingly, the
subjective-Bayesian versus objective-frequentist account begins to seem like historical
baggage that has yet to be replaced by an account more in keeping with how Bayesian
methods have evolved and are presently being used in psychology and other disciplines.
A more realistic understanding of Bayesian methods is required by our discipline. The
better the appreciation of the nature of Bayesian methods and of their similarities and
differences with classical methods, the more constructive will be any debate over best
practices or the value of any given methodology. Likewise, it may avoid the dogmatism and
unnecessary polemic that have sometimes accompanied the advocacy of or opposition to
Bayesian methods in the past.
In what follows, we attempt to outline what Bayesian methods are, as they are
currently practised, and how they compare to their nominal rivals. To do so, we
first briefly outline the history of Bayesian methods from their origin up to the recent
past.
1. A brief history of Bayesian methods
The modern practice of Bayesian statistics has its origin in a single essay, An Essay towards
Solving a Problem in the Doctrine of Chances, written by the Reverend Thomas Bayes, and
posthumously published in 1763, two years after the author’s death. The topic being
addressed in the essay was clearly stated in its opening paragraph:
Given the number of times an unknown event has happened and failed: Required the chance
that the probability of its happening in a single trial lies somewhere between any two degrees
of probability that can be named. (Bayes & Price, 1763, p. 376)
In more modern terminology, Bayes was considering the problem of inferring the
probability parameter θ of a binomial distribution on the basis of observing n successes in
N trials. In particular, he considered how to infer that θ had a value between two
probabilities θ₀ and θ₁, and showed that

$$P(\theta_0 \leq \theta \leq \theta_1 \mid N, n) = \frac{\int_{\theta_0}^{\theta_1} \theta^{n}(1-\theta)^{N-n}\,d\theta}{\int_{0}^{1} \theta^{n}(1-\theta)^{N-n}\,d\theta},$$
when the possible values of θ are equally likely a priori.
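Under that uniform prior, the posterior is a Beta(n + 1, N − n + 1) distribution, so Bayes's quantity can be computed as a difference of Beta distribution functions. A minimal sketch in Python, assuming SciPy is available (the function name and the example numbers are ours):

```python
from scipy.stats import beta

def prob_in_interval(n, N, theta0, theta1):
    """P(theta0 <= theta <= theta1 | N, n) under a uniform prior on theta."""
    posterior = beta(n + 1, N - n + 1)
    return posterior.cdf(theta1) - posterior.cdf(theta0)

# For example: 7 successes in 10 trials; probability theta lies in [0.4, 0.8].
print(prob_in_interval(7, 10, 0.4, 0.8))
```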
The historical importance of Bayes’s essay was that it was the first clear solution to a
problem of inverse probability. Calculating forward probability, such as the probability of
drawing a red marble from an urn of n red and m black marbles, had been worked out for
many nontrivial problems by the end of the 17th century as a result of the work of, among
others, Pascal, Fermat, and Jacob Bernoulli. By contrast, solving the inverse problem—for
instance the inverse problem of inferring the values of n and m from observing a series of
draws from the urn of marbles—had remained elusive. For some of the original pioneers
like Jacob Bernoulli, solving this inverse problem was seen as the key to the eventual
application of probability theory beyond the gambling table to real problems in the
physical and social sciences. Bayes’s essay was the first presentation of a solution to a
special case of this problem.
As important as Bayes's essay was, it is undoubtedly a case of Stigler's Law of Eponymy
(Stigler, 1999) that we speak of Bayesian statistics or even Bayes’s theorem. Stigler’s law
states that no scientific discovery is named after its original discoverer, and indeed it was
Pierre-Simon Laplace, a giant of 19th century science, who was the first to present what we
now call Bayes's theorem—a generalization of Bayes's original work that he independently
developed—and who established and developed the practice we now call Bayesian
statistics. This methodology, which came to be known as inverse probability, was the
dominant method of statistical inference until the early 20th century. It was applied to
problems in the social sciences, earth sciences and, most famously, in astronomy (where
Laplace accurately estimated the masses of Jupiter and Saturn).
1.1. The rise of sampling-theory based inference
New statistical methods that began to gain traction in the late 19th and early 20th century
sought to be founded on rigorous and objective principles. Questions about the nature of
probability became increasingly important. From these new perspectives, the dominant
methods of inverse probability were found wanting. To the extent that this method
survived, it was only for lack of suitable alternatives. As Pearson noted: “The practical
man will … accept the results of inverse probability of the Bayes/Laplace brand till better
are forthcoming” (Pearson, 1920, p. 3). As such, R. A. Fisher’s declaration in his
monumental 1925 Statistical Methods for Research Workers (Fisher, 1925) that “… the
theory of inverse probability is founded upon an error, and must be wholly rejected.”
(Fisher, 1925, p. 10) can be seen as the coup de grâce for the use of inverse probability as a
method of inference.
Fisher made his central point of criticism of Bayesian methods clear: “Inferences
respecting populations, from which known samples have been drawn, cannot be
expressed in terms of probability, except in the trivial case when the population is itself a
sample of a super-population the specification of which is known with accuracy.”
(Fisher, 1925, p. 10). In other words, the objects of inference, such as parameters in a
probability distribution, are not random variables and so their possible values cannot be
expressed in terms of probabilities. In data analysis, the to-be-inferred variables are fixed
but unknown quantities and it is only the observed data that can be described
probabilistically. This perspective was based on the so-called aleatory definition of
probability, that is, the frequentist interpretation, which holds that probability applies only to the
outcomes of random physical processes. This general perspective was to drastically limit
the application of probability theory to problems of statistical inference. It became the
received view, justifying the near wholesale abandonment of Bayesian methods for
several decades.
1.2. Bayes gets personal
While a strict frequentist interpretation of probability entailed the abandonment of
Bayesian methods, an equally strict yet opposing interpretation allowed it to survive,
albeit with a minority status. This view strictly interpreted probability as degree of
personal belief (see e.g., De Finetti, 1974). Under this interpretation, Bayes’s rule took on a
central role as the means to update one’s degree of belief in light of new evidence. Initially,
our beliefs about some variable θ can be expressed by some probability distribution P(θ).
In light of evidence from a set of data D₀, we update these beliefs as

$$P(\theta \mid D_0) \propto P(D_0 \mid \theta)\,P(\theta).$$

With yet more data D₁, we continue this process as

$$P(\theta \mid D_1, D_0) \propto P(D_1 \mid \theta)\,P(\theta \mid D_0),$$
so that the posterior probability at one step becomes the prior probability at the next.
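As a concrete sketch of this updating cycle, consider a conjugate Beta-Binomial model, where updating amounts to adding observed successes and failures to the prior's parameters. The batches of data here are illustrative assumptions:

```python
def update(prior_a, prior_b, successes, failures):
    """Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures

a, b = 1, 1                  # uniform prior, Beta(1, 1)
a, b = update(a, b, 3, 1)    # first batch D0: posterior is Beta(4, 2)
a, b = update(a, b, 2, 4)    # second batch D1: the previous posterior is now the prior
print(a, b)                  # -> 6 6
```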
When applied to practical matters like data analysis, this subjective Bayesian approach
required the starting point for analysis to be the statement and quantification of one’s
beliefs about the variables to be inferred. Having done this, the relevant data allow us to
update our beliefs by the methodical application of Bayes’s rule.
1.3. A Bayesian revival
From the early to the late 20th century, the methods that the likes of Fisher promoted
became almost ubiquitous, and Bayesian methods remained marginalized. This
marginalization seems to have had less to do with philosophical commitments to theories of
probability than with the practical utility of Bayesian methods relative to their now
well-established
counterparts. For almost all of the commonplace data analysis methods, under general
conditions, the results of classical and Bayesian methods were roughly comparable. For
example, in a linear model, the frequentist confidence interval and the Bayesian posterior
interval were identical under certain circumstances. Bayesian methods did not appear to
offer much difference in practical terms, yet seemed to demand a dubious commitment to
subjectivity that many were reluctant to make.
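To illustrate the kind of agreement at issue, consider inferring a normal mean when the standard deviation is known: under a flat prior on μ, the 95% Bayesian posterior interval is numerically identical to the classical 95% confidence interval. A minimal sketch, with made-up data, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm

x = np.array([4.1, 5.3, 4.8, 5.9, 5.0])   # made-up observations
sigma = 1.0                                # standard deviation assumed known
xbar, n = x.mean(), len(x)

# Classical 95% confidence interval: xbar +/- 1.96 * sigma / sqrt(n).
half_width = norm.ppf(0.975) * sigma / np.sqrt(n)
conf_int = (xbar - half_width, xbar + half_width)

# Bayesian posterior under a flat prior on mu: mu | x ~ N(xbar, sigma^2 / n).
posterior = norm(loc=xbar, scale=sigma / np.sqrt(n))
cred_int = posterior.interval(0.95)

print(conf_int)
print(cred_int)   # the same interval
```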
An advantage of the Bayesian approach was that it was based on the application of
probability calculus to problems of statistics to an extent far beyond that of classical
methods. What this entailed was that whenever a probabilistic model of data could be
specified, how to infer the values of any unobserved variables or parameters in that model
could always be derived. This meant that, in principle at least, challenging data
analysis problems that required bespoke models could always be tackled. Classical methods,
by contrast, were often stymied by nuisance variables, missing data and small data-sets
(not to mention latent variables and hierarchical data structures).
The catalyst for the adoption of Bayesian methods was the increase in the availability,
and decrease in the cost, of computing power. When computing power was minimal, the “in principle”
advantages of Bayesian methods were not important because the calculations involved
were often intractable. However, the roughly exponential growth of computational
power from the 1970s onwards has meant that calculations that were almost beyond the
imagination in one decade could become commonplace in the next. In statistics, this
change seems to have begun in the 1980s and was in full sway by the 1990s.
2. The theory and practice of Bayesian methods
The flourishing of Bayesian methods that began in the early 1990s coincided with the
emergence of general Monte Carlo techniques for inference (e.g., Gelfand & Smith, 1990).
With this development, Bayesian methods could now be applied to previously intractable
problems in statistics and they began to be adopted widely in science, including in
psychology, as they offered solutions to challenging statistical problems.
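To give a flavour of these techniques, the following is a toy Gibbs sampler for normally distributed data with unknown mean and variance, in the spirit of Gelfand and Smith (1990). The simulated data and the prior (flat on the mean, Jeffreys on the variance) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=50)   # simulated data
n, xbar = len(x), x.mean()

mu, sigma2 = 0.0, 1.0    # arbitrary starting values
samples = []
for _ in range(5000):
    # Full conditional: mu | sigma2, x ~ N(xbar, sigma2 / n).
    mu = rng.normal(loc=xbar, scale=np.sqrt(sigma2 / n))
    # Full conditional: sigma2 | mu, x ~ Inverse-Gamma(n/2, sum((x - mu)^2)/2).
    sigma2 = 1.0 / rng.gamma(shape=n / 2, scale=2.0 / np.sum((x - mu) ** 2))
    samples.append((mu, sigma2))

posterior = np.array(samples[1000:])   # discard burn-in
print(posterior.mean(axis=0))          # approx. posterior means of mu and sigma2
```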
Although it certainly appears that the growth in popularity of Bayesian methods was
largely a consequence of their practical advantages, the debate about what exactly defines
Bayesian models does not seem to be fully resolved. At the heart of this debate is the
question of whether prior probabilities in Bayesian models are (or should be) a reflection
of our beliefs about the nature of the phenomenon being studied. As we have mentioned,
stock definitions of Bayesian methods seem to take this for granted. By contrast, in
practice, priors seem to be chosen on the basis of convenience or expediency.
As we see it, the choice of priors is like the choice of the probabilistic model of the data.
For example, given a set of observations x₁, …, xₙ, we might model this data as
$$x_i \sim N(\mu, \sigma^2), \quad \text{for } i \in \{1, \ldots, n\}.$$
The choice of this probabilistic model need not be a reflection of our true beliefs about
how this data was generated. Rather, it can be seen as literally just a model that can
potentially provide insight into the nature and structure of the data. By the same
reasoning, the priors on μ or σ² need not be a reflection of our true beliefs about the
parameters, but are just part of our general modelling assumptions. Just as the generative
model provides a probabilistic model of the data, the priors provide a probabilistic model
of parameters. Just as we assume that our data is drawn from some probability distribution
with fixed but unobserved parameters, so too we assume that the values of the parameters
are drawn from another probability distribution (also with fixed but unobserved
parameters). Priors, therefore, are just assumptions of our model. Like any other
assumptions, they can be good or bad and may need to be extended, revised or possibly
abandoned on the basis of their suitability to the data being studied.
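Seen this way, the full model is a two-level generative process: parameters are drawn from their prior model, and data are drawn from the model given those parameters. A minimal sketch, in which the specific prior choices are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Probabilistic model of the parameters (the priors; choices are illustrative):
# mu ~ N(0, 10^2) and sigma ~ half-normal with scale 5.
mu = rng.normal(loc=0.0, scale=10.0)
sigma = abs(rng.normal(loc=0.0, scale=5.0))

# Probabilistic model of the data given the parameters: x_i ~ N(mu, sigma^2).
x = rng.normal(loc=mu, scale=sigma, size=20)
print(mu, sigma, x[:5])
```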
In the current issue, Gelman and Shalizi (2013) address in depth this question about the
theory and practice of Bayesian methods. They argue that the practical use of Bayesian
methods is often at odds with their official philosophy. In particular, they advocate treating priors,
and models in general, as hypotheses that should be evaluated and possibly revised or abandoned
in light of how well they fit the data and the problem being addressed. This contrasts sharply with
the view of Bayesian models as ideally infallible representations of our beliefs.
The paper by Gelman and Shalizi is followed by commentaries from a set of
statisticians, philosophers and mathematical psychologists. Each provides a perspective on the general debate about the value of Bayesian methods and how they should be
used. From these papers, it seems clear that Bayesian methods have entered a period of
maturity. Defensive reactions either for or against Bayesian methods seem to have given
way to more balanced views. Anti-Bayesians are rare, and few who use Bayesian methods
treat them as the only method of statistical analysis.
3. The future of Bayesian methods in psychology?
In keeping with the general theme of the Gelman and Shalizi paper and its commentaries,
the position we take here is that psychology needs to move beyond the premises of the
standard critiques of frequentist and Bayesian methods and adopt methods that are useful
in tackling pressing research questions. This is not a new message. For instance, E. G.
Boring’s critique of significance testing in psychology made the point that “… statistical
ability, divorced from a scientific intimacy with the fundamental observations, leads
nowhere” (Boring, 1919, p. 338).
For a psychologist, one crucial advantage of Bayesian data analysis is that it now provides a
general, workable framework for incorporating prior information into a statistical
analysis. There are two main objections to this assertion. The first—which we have argued
is not true of modern Bayesian methods—is that this opens the doors to subjectivism in
quantitative psychology. The second is that classical, frequentist methods can and do
incorporate prior information into their analyses. This objection is reasonable and one that
we agree with. The limitation of this approach, however, is that priors typically enter into
a frequentist analysis in an ad hoc fashion.
For example, consider the problem of estimating an odds ratio from a 2 × 2
contingency table with one or more zero cells. A common ad hoc fix is to add 0.5 to each
observed cell value. This, in a sense, captures the prior intuition that an observed zero is an
underestimate and that the odds ratio in the population is not zero or infinity but
somewhere in between. More generally, we argue that prior information is already used to
structure an analysis whenever one assumes normal errors with constant variance or that
observations are sampled from a binomial distribution with a fixed probability.
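As a concrete sketch of the ad hoc fix just described (the function and the counts are ours): adding 0.5 to every cell keeps the estimated odds ratio finite even when a cell count is zero.

```python
def odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio for the 2 x 2 table [[a, b], [c, d]], with `correction`
    added to every cell (use correction=0 for the raw estimate)."""
    a, b, c, d = (cell + correction for cell in (a, b, c, d))
    return (a * d) / (b * c)

# A table with a zero cell: the raw odds ratio would be infinite.
print(odds_ratio(12, 0, 5, 9))
```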
Prior information, seen in this light, provides leverage to explore difficult analytic
problems by adding information about the context or from theory. The leverage the extra
information affords is particularly useful for analyses where data is not plentiful or the
number of plausible models is large (encompassing most psychological research). This
needs to be done with care and a degree of humility—regardless of the methods being
used. A poor Bayesian analysis is unlikely to offer any insights over and above a good
frequentist analysis (and may be actively misleading).
One reason for a degree of humility in our analysis is that no probability model and
hence no statistical model in psychology is complete. There will always be some degree of
uncertainty associated with the choice of model and the appropriateness of its
assumptions. As Macdonald (2002, p. 187) wrote: “if the incompleteness of probability
models … were more widely appreciated psychologists and others might adopt a more
reasonable attitude to statistical tests, the debate about statistical inference might die
down, and the emphasis could shift toward better understanding and presenting data”.
Bayesian methods are not a panacea for the problems of statistical modelling in
psychology. Rather, they extend the number and range of tools available to tackle
substantive research questions in our discipline.
Mark Andrews and Thom Baguley (Nottingham Trent University, UK)
References
Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. By the
late Rev. Mr. Bayes, F.R.S. Communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S.
Philosophical Transactions, 53, 370–418. doi:10.1098/rstl.1763.0053
Boring, E. G. (1919). Mathematical versus scientific significance. Psychological Bulletin, 16,
335–338. doi:10.1037/h0074554
De Finetti, B. (1974). Theory of probability: A critical introductory treatment. London, UK: Wiley.
Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh, UK: Oliver and Boyd.
Gelfand, A., & Smith, A. (1990). Sampling-based approaches to calculating marginal densities.
Journal of the American Statistical Association, 85(410), 398–409. doi:10.1080/01621459.1990.10476213
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis (2nd ed.).
Chapman & Hall.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models.
New York, NY: Cambridge University Press.
Gelman, A., & Shalizi, C. (2013). The philosophy and practice of Bayesian statistics. British
Journal of Mathematical and Statistical Psychology, 66, 8–34. doi:10.1111/j.2044-8317.2011.02037.x
Kruschke, J. K. (2011). Doing Bayesian data analysis. Burlington, MA: Academic Press.
Macdonald, R. R. (2002). The incompleteness of probability models and the resultant implications
for theories of statistical inference. Understanding Statistics, 1, 167–189. doi:10.1207/
S15328031US0103_03
Pearson, K. (1920). The fundamental problem of practical statistics. Biometrika, 13(1), 1–16.
doi:10.1093/biomet/13.1.1
Stigler, S. M. (1999). Statistics on the table: The history of statistical concepts and methods.
Cambridge, MA: Harvard University Press.