Frequentist versus Bayesian
The Bayesian approach
In Bayesian statistics we can associate a probability with
a hypothesis, e.g., a parameter value θ.
Interpret the probability of θ as ‘degree of belief’ (subjective).
Need to start with a ‘prior pdf’ p(θ); this reflects our degree
of belief about θ before doing the experiment.
Our experiment has data x → likelihood function L(x|θ).
Bayes’ theorem tells how our beliefs should be updated in
light of the data x:
p(θ | x) = L(x | θ) p(θ) / ∫ L(x | θ′) p(θ′) dθ′
The posterior pdf p(θ|x) contains all our knowledge about θ.
Glen Cowan
Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester
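A minimal numerical sketch of this update (not from the slides; the Gaussian model and all numbers are invented for illustration) computes posterior ∝ likelihood × prior on a grid:

# Bayes' theorem evaluated on a grid, with an invented Gaussian toy model.
import numpy as np

theta = np.linspace(-5, 5, 1001)                 # grid of parameter values
x_obs, sigma = 1.2, 1.0                          # hypothetical single measurement
likelihood = np.exp(-0.5 * ((x_obs - theta) / sigma) ** 2)   # L(x|theta)
prior = np.exp(-0.5 * (theta / 2.0) ** 2)        # broad prior degree of belief p(theta)

posterior = likelihood * prior                   # numerator of Bayes' theorem
posterior /= np.trapz(posterior, theta)          # divide by the evidence integral

print(f"posterior mean = {np.trapz(theta * posterior, theta):.3f}")

The posterior mean (0.96 here) lands between the prior’s centre (0) and the measurement (1.2), weighted by the two widths.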
Case #4: Bayesian method
We need to associate prior probabilities with θ0 and θ1, e.g.,
p(θ0): reflects ‘prior ignorance’, in any case much broader than the likelihood;
p(θ1): based on a previous measurement.
Putting this into Bayes’ theorem gives:
p(θ0, θ1 | x) ∝ L(x | θ0, θ1) p(θ0) p(θ1)   [posterior ∝ likelihood × prior]
Bayesian method (continued)
We then integrate (marginalize) p(θ0, θ1 | x) over the nuisance
parameter to find p(θ0 | x) = ∫ p(θ0, θ1 | x) dθ1.
In this example we can do the integral in closed form (rare).
The ability to marginalize over nuisance parameters is an important
feature of Bayesian statistics.
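A sketch of this marginalization step in code (an illustrative model, not Cowan’s actual example): the data constrain only the sum θ0 + θ1, θ0 carries a broad ‘ignorance’ prior, θ1 a narrow prior from a previous measurement; integrating the joint posterior over θ1 yields p(θ0 | x):

# Marginalizing a nuisance parameter on a grid:
# p(theta0|x) = integral of p(theta0, theta1|x) over theta1.
import numpy as np

t0 = np.linspace(-5, 5, 401)                      # parameter of interest
t1 = np.linspace(-3, 3, 401)                      # nuisance parameter
T0, T1 = np.meshgrid(t0, t1, indexing="ij")

x_obs = 1.0
likelihood = np.exp(-0.5 * (x_obs - (T0 + T1)) ** 2)   # data see only the sum
prior = np.exp(-0.5 * (T0 / 10.0) ** 2) * np.exp(-0.5 * (T1 / 0.5) ** 2)

joint = likelihood * prior                        # ∝ p(theta0, theta1 | x)
p_t0 = np.trapz(joint, t1, axis=1)                # integrate out theta1
p_t0 /= np.trapz(p_t0, t0)                        # normalize

print(f"p(theta0|x) peaks at theta0 = {t0[np.argmax(p_t0)]:.2f}")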
Bayesian Statistics at Work: The Troublesome Extraction of the Angle α
Stéphane T’JAMPENS
LAPP (CNRS/IN2P3 & Université de Savoie)
J. Charles, A. Höcker, H. Lacker, F.R. Le Diberder, S. T’Jampens, hep-ph/0607246
Digression: Statistics
D.R. Cox, Principles of Statistical Inference, CUP (2006)
W.T. Eadie et al., Statistical Methods in Experimental Physics, NHP (1971)
www.phystat.org
Statistics tries to answer a wide variety of questions → two main, very different frameworks:
Frequentist: probability is about the data (randomness of measurements),
given the model:
P(data | model)   [only repeatable events (sampling theory)]
Hypothesis testing: given a model, assess the consistency of the data with a
particular parameter value → 1 − CL curve (by varying the parameter value)
Bayesian: probability is about the model (degree of belief), given the data:
P(model | data) ∝ Likelihood(data; model) × Prior(model)
Bayesian Statistics in 1 slide
The Bayesian approach is based on the use of inverse probability (the “posterior”):
probability about the model (degree of belief), given the data:
P(model | data) ∝ Likelihood(data; model) × Prior(model)   [Bayes’ rule]
Cox – Principles of Statistical Inference (2006)
• “It treats information derived from data (‘likelihood’) as on exactly equal
footing with probabilities derived from vague and unspecified sources (‘prior’).
The assumption that all aspects of uncertainties are directly comparable is often
unacceptable.”
• “Nothing guarantees that my uncertainty assessment is any good for you – I’m
just expressing an opinion (degree of belief). To convince you that it’s a good
uncertainty assessment, I need to show that the statistical model I created
makes good predictions in situations where we know what the truth is, and the
process of calibrating predictions against reality is inherently frequentist”
(e.g., MC simulations).
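A sketch of that calibration process (an invented toy model, not from the talk): draw many pseudo-experiments with a known truth and check how often the stated 68% interval actually covers it:

# Frequentist calibration by MC: check the coverage of a 68% interval.
# Toy model: x ~ N(mu_true, 1) with a flat prior on mu, for which the
# central 68% credible interval is simply x ± 1.
import numpy as np

rng = np.random.default_rng(42)
mu_true = 3.0                                    # known truth for the toys
x = rng.normal(mu_true, 1.0, size=100_000)       # one measurement per pseudo-experiment
covered = (x - 1.0 <= mu_true) & (mu_true <= x + 1.0)

print(f"observed coverage = {covered.mean():.3f} (nominal 0.683)")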
Uniform prior: model of ignorance?
Cox – Principles of Statistical Inference (2006)
A central problem: specifying a prior distribution for a parameter about
which nothing is known → flat prior.
Problems:
Not reparametrization invariant (metric dependent):
uniform in θ is not uniform in z = cos θ (sketched below).
Favors large values too much: the prior probability for the range 0.1 to 1 is 10
times less than for the range 1 to 10.
Flat priors in several dimensions may produce clearly unacceptable answers.
In simple problems, appropriate* flat priors yield essentially the same answer as
non-Bayesian sampling theory. However, in other situations, particularly those
involving more than two parameters, ignorance priors lead to different and
entirely unacceptable answers.
* (uniform prior for a scalar location parameter, Jeffreys’ prior for a scalar scale parameter)
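A sketch of the first problem (the sampling and binning here are only illustrative): sample θ uniformly and look at the implied distribution of z = cos θ:

# A prior flat in theta is not flat in z = cos(theta).
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi, size=100_000)    # 'ignorance': uniform in the angle
z = np.cos(theta)                                # the same ignorance, reparametrized

density, edges = np.histogram(z, bins=5, range=(-1, 1), density=True)
for lo, hi, d in zip(edges[:-1], edges[1:], density):
    print(f"z in [{lo:+.1f}, {hi:+.1f}): density {d:.2f}")
# The density piles up near z = ±1: analytically p(z) = 1 / (pi * sqrt(1 - z^2)).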
Uniform Prior in Multidimensional Parameter Space
Hypersphere in 6D space: one knows nothing about the individual Cartesian
coordinates x, y, z, … What do we know about the radius r = √(x² + y² + …)?
One has achieved the remarkable feat of learning something about the radius
of the hypersphere, whereas one knew nothing about the Cartesian coordinates,
and without making any experiment (see the sketch below).
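A sketch of this effect (the dimension and bounds are chosen arbitrarily): put an independent flat prior on each of six Cartesian coordinates and look at the implied prior on the radius:

# Flat priors on the Cartesian coordinates induce a sharply peaked prior on r.
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(100_000, 6))  # 'know nothing' about each coordinate
r = np.sqrt((pts ** 2).sum(axis=1))

print(f"radius: mean = {r.mean():.2f}, std = {r.std():.2f}")
# r clusters tightly around sqrt(2) ≈ 1.4 although no coordinate was constrained
# beyond its range and no experiment was performed.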
Isospin Analysis: B→hh
J. Charles et al., hep-ph/0607246; Gronau/London (1990)
[Figure: posteriors for α from two parametrizations of the amplitudes, MA
(Modulus & Argument) and RI (Real & Imaginary); the posterior is improper]
Isospin Analysis: removing information from B0→π0π0
No model-independent constraint on α can be inferred in this case.
→ Information is extracted on α that is introduced by the priors (where else?)
Conclusion
PHYSTAT Conferences:
http://www.phystat.org
Statistics is not a science, it is mathematics (Nature will not decide for us).
[You will not learn it in physics books → go to the professional literature!]
Many attempts to define an ‘ignorance’ prior to ‘let the data speak for themselves’,
but none is convincing. Priors are informative.
Quite generally, a prior that gives results that are reasonable from various viewpoints
for a single parameter will have unappealing features if applied independently to
many parameters.
In a multiparameter space, credible Bayesian intervals generally under-cover.
If the problem has some invariance properties, then the prior should have the
corresponding structure.
Specification of priors is fraught with pitfalls (especially in high dimensions).
Examine the consequences of your assumptions (metric, priors, etc.).
Check for robustness: vary your assumptions.
Exploring the frequentist properties of the result should be strongly encouraged.
α[ππ]: B-factories status, LP07
Isospin analysis: reminder
• Neglecting EW penguins, the amplitudes of the SU(2)-related B→ππ modes are:
√2 A+0 = √2 A(Bu→π+π0) = e^(-iα) (T+- + T00)      √2 Ā+0 = e^(+iα) (T+- + T00)      [ΔΦ = 2α]
A+- = A(Bd→π+π-) = e^(-iα) T+- + P+-              Ā+- = e^(+iα) T+- + P+-           [ΔΦ = 2α_eff]
√2 A00 = √2 A(Bd→π0π0) = e^(-iα) T00 - P+-        √2 Ā00 = e^(+iα) T00 - P+-
• SU(2) triangular relation: A+0 = A+-/√2 + A00 (and likewise for the Ā amplitudes)
• Same for B→ρρ decays dominated by the longitudinally polarized ρ (CP-even final state)
The measured observables and what they give:
• B+0 → |A+0| = |Ā+0|
• B+-, C+- → |A+-|, |Ā+-|
• S+- → sin(2α_eff) → 2-fold α_eff in [0, π]
• B00, C00 → |A00|, |Ā00|
• S00 → relative phase between A00 and Ā00
Closing the SU(2) triangle → 8-fold α
[Figure: the triangles A+0 = A+-/√2 + A00 and Ā+0 = Ā+-/√2 + Ā00 drawn in the
complex (Re, Im) plane, with the angle α indicated]
Isospin analysis: reminder
• sin(2α_eff) from B → (π/ρ)+(π/ρ)- → 2 solutions for α_eff in [0, π] (sketched after the figure below)
• Δα = α − α_eff from the SU(2) B/B̄ triangles → 1, 2 or 4 solutions for Δα (depending on triangle closure)
→ 2, 4 or 8 solutions for α = α_eff + Δα
[Figure: Δα from the B and B̄ triangles (sides A00/A+0 and A+-/(√2 A+0)) for three
configurations: ππ (C00 but no S00), ρρ (no C00/S00), ρρ (C00 and S00); the resulting
ambiguities are labelled 4-fold Δα, 2-fold Δα, 1-fold Δα (‘plateau’) and 1-fold Δα (peak)]
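A sketch of the 2-fold α_eff counting from the first bullet (the value of S is invented): solving sin(2α_eff) = S for α_eff in [0, π]:

# The two solutions of sin(2*alpha_eff) = S with alpha_eff in [0, pi].
import numpy as np

S = 0.67                                         # hypothetical measured coefficient
x = np.arcsin(S)                                 # principal value of 2*alpha_eff
two_alpha_eff = np.array([x, np.pi - x]) % (2 * np.pi)
alpha_eff = np.sort(two_alpha_eff / 2.0)         # 2-fold ambiguity in [0, pi]

print("alpha_eff solutions (rad):", np.round(alpha_eff, 3))
# Each Delta_alpha solution from the triangles then multiplies this 2-fold
# ambiguity, giving the 2-, 4- or 8-fold alpha quoted above.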
Developments in Bayesian Priors
Roger Barlow
Manchester IoP meeting
November 16th 2005
Plan
• Probability
– Frequentist
– Bayesian
• Bayes Theorem
– Priors
• Prior pitfalls (1): Le Diberder
• Prior pitfalls (2): Heinrich
• Jeffreys’ Prior
– Fisher Information
• Reference Priors: Demortier
Probability
Probability as the limit of frequency:
P(A) = lim N_A / N_total   (as N_total → ∞)
Usual definition taught to students.
Makes sense.
Works well most of the time. But not all.
Frequentist probability
A frequentist cannot say:
“It will probably rain tomorrow.”
“Mt = 174.3 ± 5.1 GeV means the top quark mass
lies between 169.2 and 179.4 GeV, with 68% probability.”
But can say:
“The statement ‘It will rain tomorrow.’ is probably true.”
“Mt = 174.3 ± 5.1 GeV means: the top quark mass
lies between 169.2 and 179.4 GeV, at 68% confidence.”
Bayesian Probability
P(A) expresses my belief that A is true.
Limits: 0 (impossible) and 1 (certain).
Calibrated off clear-cut instances (coins, dice, urns).
Frequentist versus Bayesian?
Two sorts of probability – totally different.
(Bayesian probability is also known as Inverse Probability.)
Rivals? Religious differences? Particle physicists tend to be
frequentists; cosmologists tend to be Bayesians.
No: two different tools for practitioners.
Important to:
• Be aware of the limits and pitfalls of both
• Always be aware which you’re using
Bayes Theorem (1763)
P(A|B) P(B) = P(A and B) = P(B|A) P(A)
⇒ P(A|B) = P(B|A) P(A) / P(B)
Frequentist use, e.g., a Čerenkov counter:
P(π | signal) = P(signal | π) P(π) / P(signal)
Bayesian use:
P(theory | data) = P(data | theory) P(theory) / P(data)
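A minimal numerical version of the Čerenkov-counter use (the efficiencies and beam abundances are invented): here the ‘prior’ P(π) is a genuine frequency, which is why this use of Bayes’ theorem is uncontroversial for a frequentist:

# P(pi | signal) for a Cherenkov counter, with made-up numbers.
p_signal_given_pi = 0.95        # probability the counter fires for a pion
p_signal_given_k = 0.05         # probability it fires for a kaon (mis-ID)
p_pi, p_k = 0.80, 0.20          # particle abundances in the beam (the 'prior')

p_signal = p_signal_given_pi * p_pi + p_signal_given_k * p_k   # P(signal)
p_pi_given_signal = p_signal_given_pi * p_pi / p_signal        # Bayes' theorem

print(f"P(pi | signal) = {p_pi_given_signal:.3f}")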
Bayesian Prior
P(theory) is the prior.
It expresses prior belief that the theory is true.
Can be a function of a parameter:
P(Mtop), P(MH), P(α, β, γ)
Bayes’ Theorem describes the way prior belief is
modified by experimental data.
But what do you take as the initial prior?
Uniform Prior
General usage: choose P(a) uniform in a
(principle of insufficient reason).
Often ‘improper’: ∫P(a) da = ∞, though the posterior
P(a|x) comes out sensible.
BUT!
If P(a) is uniform, P(a²), P(ln a), P(√a), … are not.
Insufficient reason is not valid (unless a is ‘most
fundamental’ – whatever that means).
Statisticians handle this: check results for
‘robustness’ under different priors (sketched below).
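A sketch of such a robustness check (the Gaussian model and numbers are invented): run the same inference under a prior flat in a and a prior flat in ln a, and compare:

# Robustness check: same likelihood, two different 'ignorance' priors.
import numpy as np

a = np.linspace(0.01, 10.0, 2000)                # a positive parameter
x_obs, sigma = 3.0, 1.0
likelihood = np.exp(-0.5 * ((x_obs - a) / sigma) ** 2)

for name, prior in [("flat in a", np.ones_like(a)),
                    ("flat in ln a", 1.0 / a)]:   # P(a) ∝ 1/a is flat in ln a
    post = likelihood * prior
    post /= np.trapz(post, a)                    # normalize the posterior
    print(f"{name:13s}: posterior mean = {np.trapz(a * post, a):.3f}")
# If the two answers differ appreciably, the data are not dominating the prior.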