Bayes for Beginners
Reverend Thomas Bayes (1702-61)
Velia Cardin
Marta Garrido
http://www.fil.ion.ucl.ac.uk/spm/software/spm2/
“In addition to WLS estimators and classical inference, SPM2 also supports Bayesian estimation and inference. In this instance the statistical parametric maps become Posterior Probability Maps (PPMs), where the posterior probability is a probability of an effect given the data. There is no multiple comparison problem in Bayesian inference and the posterior probabilities do not require adjustment.”
Overview
Velia
Bayes vs Frequentist approach
An example
Bayes’ theorem
Marta
Bayesian Inference
Posterior Probability Distribution
Bayes in SPM
Summary
Frequentist statistics vs Bayesian statistics
In general, we want to relate an event (E) to a hypothesis (H)
…and the probability of E given H
With a frequentist approach….
We obtain a p-value: the probability, given a true H0, of an outcome as extreme as or more extreme than the observed outcome.
If the p-value is sufficiently small, you reject the null hypothesis, but…
it doesn’t say anything about the probability of Hi.
The frequentist conclusion is restricted to the data at hand; it doesn’t take into account previous, valuable information.
In general, we want to relate an event (E) to a hypothesis (H)
and the probability of E given H
With a Bayesian approach…
The probability of H being true is determined.
A probability distribution of the parameter or hypothesis is obtained.
You can compare the probabilities of different hypotheses H for the same E.
Conclusions depend on previous evidence. The Bayesian approach is not data analysis per se; it brings different types of evidence to bear on the questions of importance.
Given a prior state of knowledge or belief, it tells us how to update beliefs based upon observations (current data).
Dana Plato died of a drug
overdose at age 34
Todd Bridges on suspicion of
shooting and stabbing alleged
drug dealer in a crack house. ...
Macaulay Culkin
Busted for Drugs!
Feldman, arrested
and charged with
heroin possession
Corey Haim in a spiral of
prescription drug abuse!
Our observations….
DREW BARRYMORE REVEALS
ALCOHOL AND DRUG PROBLEMS
STARTED AGED EIGHT
Our hypothesis is: “Young actors have a higher probability of becoming drug addicts.”
We took a random sample of 40 people; 10 of them were young stars, 3 of whom were addicted to drugs. Of the other 30, just one was.
Drug-addicted (D+) vs not drug-addicted (D-):

                     D+    D-   Total
Young actors (YA+)    3     7      10
Control (YA-)         1    29      30
Total                 4    36      40
With a frequentist approach we will test:
Hi: ‘Conditions A and B have different effects’ (young actors have a different probability of becoming drug addicts than the rest of the people)
H0: ‘There is no difference in the effect of conditions A and B’
The statistical test of choice is χ² with Yates’ correction:
χ² = 3.33
p = 0.07
We can’t reject the null hypothesis, and the information the p-value gives us is basically that if we “did this experiment” many times, 7% of the time we would obtain a result like this if there were no difference between the two conditions.
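As a quick check, the χ² and p-value above can be reproduced in a few lines of Python; this is a minimal sketch assuming scipy is available (the slides themselves contain no code):

from scipy.stats import chi2_contingency

# 2x2 contingency table from the example:
#                D+   D-
# young actors    3    7
# control         1   29
table = [[3, 7],
         [1, 29]]

# correction=True applies Yates' continuity correction to the 2x2 table
chi2, p, dof, expected = chi2_contingency(table, correction=True)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # chi2 = 3.33, p = 0.07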
This is not what we want to know!!!
…and we have strong beliefs that young actors have a higher probability of becoming drug addicts!!!
With a Bayesian approach…
We want to know whether p(D+ | YA+) > p(D+ | YA-).

          D+           D-            Total
YA+       3 (0.075)    7 (0.175)     10 (0.25)
YA-       1 (0.025)    29 (0.725)    30 (0.75)
Total     4 (0.1)      36 (0.9)      40 (1)

p(D+ | YA+) = p(D+ and YA+) / p(YA+) = 0.075 / 0.25 = 0.3
p(D+ | YA-) = p(D+ and YA-) / p(YA-) = 0.025 / 0.75 = 0.033
Since 0.3 > 0.033, p(D+ | YA+) > p(D+ | YA-).
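These conditional probabilities follow directly from the joint proportions in the table; a tiny Python sketch (variable names are illustrative, not from the slides):

# joint and marginal probabilities from the 40-person sample
p_Dpos_and_YApos = 3 / 40   # 0.075
p_Dpos_and_YAneg = 1 / 40   # 0.025
p_YApos = 10 / 40           # 0.25
p_YAneg = 30 / 40           # 0.75

# conditional probabilities p(D+ | YA+) and p(D+ | YA-)
p_Dpos_given_YApos = p_Dpos_and_YApos / p_YApos   # 0.3
p_Dpos_given_YAneg = p_Dpos_and_YAneg / p_YAneg   # 0.033...
print(p_Dpos_given_YApos > p_Dpos_given_YAneg)    # True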
Reformulating,
p(D+ and YA+) = p(YA+ | D+) * p(D+)
Substituting p(D+ and YA+) into
p(D+ | YA+) = p(D+ and YA+) / p(YA+)
gives
p(D+ | YA+) = p(YA+ | D+) * p(D+) / p(YA+)
Note that p(D+ | YA+) ≠ p(YA+ | D+):
p(YA+ | D+) = p(D+ and YA+) / p(D+) = 0.075 / 0.1 = 0.75
and 0.3 ≠ 0.75.
This is Bayes’ Theorem !!!
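As a numerical check, the theorem reproduces the conditional probability computed directly above (again an illustrative sketch):

p_Dpos = 4 / 40                          # 0.1
p_YApos = 10 / 40                        # 0.25
p_YApos_given_Dpos = (3 / 40) / p_Dpos   # 0.75

# Bayes' theorem: p(D+ | YA+) = p(YA+ | D+) * p(D+) / p(YA+)
print(p_YApos_given_Dpos * p_Dpos / p_YApos)   # 0.3, same as the direct calculation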
Bayes’ Theorem for a given parameter θ:
p(θ | data) = p(data | θ) p(θ) / p(data)
1/p(data) is basically a normalizing constant.
Posterior ∝ likelihood × prior
The prior is the probability of the parameter and represents what was
thought before seeing the data.
The likelihood is the probability of the data given the parameter and
represents the data now available.
The posterior represents what is thought given both prior information
and the data just seen.
It relates the conditional density of a parameter (the posterior probability) to its unconditional density (the prior, since it depends only on information present before the experiment).
In fMRI….
• Classical: ‘What is the likelihood of getting these data given no activation occurred?’
• Bayesian (SPM2): ‘What is the chance of getting these parameters, given these data?’
Bayesian Inference
In order to make probability statements about θ given y, we begin with a model providing the joint probability distribution
p(θ, y) = p(θ) p(y | θ)
from which
p(θ | y) = p(θ, y) / p(y) = p(θ) p(y | θ) / p(y)
where
p(y) = Σθ p(θ) p(y | θ)   (discrete case)
or
p(y) = ∫ p(θ) p(y | θ) dθ   (continuous case)
In words: posterior ∝ prior × likelihood, i.e.
p(θ | y) ∝ p(θ) p(y | θ)
What you know about the model after the data arrive, p(θ | y), is what you knew before, p(θ), and what the data told you, p(y | θ).
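A minimal sketch of these formulas on a discrete grid of parameter values, using a hypothetical coin-bias example and assuming numpy (none of this appears on the slides); the normalizing constant p(y) is just the sum over the grid:

import numpy as np

# hypothetical data: y = 7 heads in 10 coin flips; theta = probability of heads
theta = np.linspace(0.01, 0.99, 99)        # discrete grid over the parameter
prior = np.ones_like(theta) / theta.size   # flat prior p(theta)
likelihood = theta**7 * (1 - theta)**3     # p(y | theta), binomial kernel

p_y = np.sum(prior * likelihood)           # p(y) = sum over theta of p(theta) p(y | theta)
posterior = prior * likelihood / p_y       # p(theta | y), sums to 1

print(posterior.sum())                     # 1.0
print(theta[np.argmax(posterior)])         # posterior mode, about 0.7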
Posterior Probability Distribution
precision λ = 1/σ²
Likelihood: p(y | θ) = N(Md, λd⁻¹)
Prior: p(θ) = N(Mp, λp⁻¹)
Posterior: p(θ | y) ∝ p(y | θ) p(θ) = N(Mpost, λpost⁻¹)
λpost = λd + λp
Mpost = (λd Md + λp Mp) / λpost
[Figure: prior N(Mp, λp⁻¹), likelihood N(Md, λd⁻¹) and posterior N(Mpost, λpost⁻¹) plotted over θ.]
The effects of different precisions (figure panels): λp = λd; λp < λd; λp > λd; λp ≈ 0.
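A small sketch of this precision-weighted combination in plain Python (the numbers are illustrative only, not from the slides):

def gaussian_posterior(m_d, l_d, m_p, l_p):
    """Combine a Gaussian likelihood N(m_d, 1/l_d) with a Gaussian prior N(m_p, 1/l_p)."""
    l_post = l_d + l_p                         # posterior precision
    m_post = (l_d * m_d + l_p * m_p) / l_post  # precision-weighted mean
    return m_post, l_post

m_d, m_p = 2.0, 0.0   # likelihood (data) mean and prior mean

print(gaussian_posterior(m_d, l_d=1.0, m_p=m_p, l_p=1.0))   # lp = ld : mean halfway between
print(gaussian_posterior(m_d, l_d=1.0, m_p=m_p, l_p=0.25))  # lp < ld : mean pulled towards the data
print(gaussian_posterior(m_d, l_d=1.0, m_p=m_p, l_p=4.0))   # lp > ld : mean pulled towards the prior
print(gaussian_posterior(m_d, l_d=1.0, m_p=m_p, l_p=1e-9))  # lp ≈ 0  : posterior ≈ likelihood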
Multivariate Distributions
Bayes in SPM
SPM uses priors for estimation in…
• spatial normalization
• segmentation
…and Bayesian inference in…
• Posterior Probability Maps (PPMs)
• Dynamic Causal Modelling (DCM)
Shrinkage Priors
[Figure: four cases: large, variable effect; small, variable effect; large, consistent effect; small, consistent effect.]
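Shrinkage can be illustrated with the same Gaussian combination as above: a zero-mean prior pulls the estimated effect towards zero, and the pull is strongest for variable (low-precision) effects. An illustrative sketch, not SPM's actual estimation code:

l_p = 1.0   # precision of a zero-mean shrinkage prior (illustrative value)

for label, m_d, l_d in [("large, consistent effect", 3.0, 4.0),
                        ("large, variable effect",   3.0, 0.25),
                        ("small, consistent effect", 0.5, 4.0),
                        ("small, variable effect",   0.5, 0.25)]:
    m_post = (l_d * m_d) / (l_d + l_p)   # prior mean is 0, so only the data term remains
    print(f"{label}: {m_d} shrunk to {m_post:.2f}")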
Thresholding
p(θ > γ | y) = 0.95
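For a Gaussian posterior this exceedance probability can be read off the normal survival function; a sketch assuming scipy and purely illustrative numbers:

from math import sqrt
from scipy.stats import norm

m_post, l_post = 1.2, 4.0   # illustrative posterior mean and precision
gamma = 0.5                 # effect-size threshold γ

# p(theta > gamma | y) under a N(m_post, 1/l_post) posterior
p_exceed = norm.sf(gamma, loc=m_post, scale=1 / sqrt(l_post))
print(p_exceed)             # report the effect if this exceeds 0.95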
Summary
Bayesian methods use probability models for quantifying
uncertainty in inferences based on statistical data analysis.
In Bayesian estimation we…
1. …start with the formulation of a model that we hope is adequate to describe the situation of interest, with priors over the parameters.
2. …observe the data; when the information available changes, it is necessary to update the degrees of belief (probability), obtaining posterior distributions.
3. …evaluate the fit of the model. If necessary we compute predictive distributions for future observations, and the posteriors serve as new priors over the parameters.
Prejudices or scientific judgment?
The selection of a prior is subjective and arbitrary.
It is reasonable to draw conclusions in the light of some reason.
References
• http://www.stat.ucla.edu/history/essay.pdf (Bayes’ original essay!!!)
• http://www.cs.toronto.edu/~radford/res-bayes-ex.html
• http://www.gatsby.ucl.ac.uk/~zoubin/bayesian.html
• Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis, 2nd ed. Chapman & Hall/CRC.
• MacKay D. Information Theory, Inference and Learning Algorithms. Chapter 37: Bayesian inference and sampling theory. Cambridge University Press, 2003.
• Berry D, Stangl D. Bayesian Methods in Health-Related Research. In: Bayesian Biostatistics. Berry D and Stangl D (eds). Marcel Dekker Inc, 1996.
• Friston KJ, Penny W, Phillips C, Kiebel S, Hinton G, Ashburner J. Classical and Bayesian inference in neuroimaging: theory. Neuroimage. 2002 Jun;16(2):465-83.
Bayes for Beginners
Reverend Thomas Bayes (1702-61)
“We don’t see what we don’t seek.”
E. M. Forster
…good-bayes!!!