Developments in Bayesian
Priors
Roger Barlow
Manchester IoP meeting
November 16th 2005
Plan
• Probability
– Frequentist
– Bayesian
• Bayes Theorem
– Priors
• Prior pitfalls (1): Le Diberder
• Prior pitfalls (2): Heinrich
• Jeffreys’ Prior
– Fisher Information
• Reference Priors: Demortier
Probability
Probability as limit of frequency
P(A) = limit of N_A/N_total as N_total → ∞
Usual definition taught to students
Makes sense
Works well most of the time. But not all
Frequentist probability
“It will probably rain tomorrow.”
“M_t = 174.3 ± 5.1 GeV means the top quark mass
lies between 169.2 and 179.4, with 68%
probability.”
“The statement ‘It will rain tomorrow.’ is
probably true.”
“M_t = 174.3 ± 5.1 GeV means: the top quark mass
lies between 169.2 and 179.4, at 68%
confidence.”
Bayesian Probability
P(A) expresses my belief that A is true
Limits: 0 (impossible) and 1 (certain)
Calibrated off clear-cut instances (coins,
dice, urns)
Frequentist versus Bayesian?
Two sorts of probability – totally different.
(Bayesian probability also known as Inverse
Probability.)
Rivals? Religious differences?
Particle Physicists tend to be frequentists.
Cosmologists tend to be Bayesians.
No: two different tools for practitioners
Important to:
• Be aware of the limits and pitfalls of both
• Always be aware which you’re using
Bayes Theorem (1763)
P(A|B) P(B) = P(A and B) = P(B|A) P(A)
P(A|B) = P(B|A) P(A) / P(B)
Frequentist use, e.g. Čerenkov counter particle ID:
P(π | signal) = P(signal | π) P(π) / P(signal)
Bayesian use
P(theory | data) = P(data | theory) P(theory) / P(data)
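To make the frequentist use concrete, here is a minimal numerical sketch (the numbers are invented for illustration; the slide quotes no values):

```python
# Hypothetical Cherenkov counter: fires ('signal') with 95% probability
# for a pion and 5% for a kaon; the beam is 80% pions, 20% kaons.
p_sig_given_pi = 0.95    # P(signal | pi)
p_sig_given_k = 0.05     # P(signal | K)
p_pi, p_k = 0.80, 0.20   # priors P(pi), P(K)

# Total probability of a signal, P(signal)
p_sig = p_sig_given_pi * p_pi + p_sig_given_k * p_k

# Bayes' theorem: P(pi | signal)
p_pi_given_sig = p_sig_given_pi * p_pi / p_sig
print(f"P(pi | signal) = {p_pi_given_sig:.3f}")   # ~0.987
```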
Bayesian Prior
P(theory) is the Prior
Expresses prior belief theory is true
Can be function of parameter:
P(M_top), P(M_H), P(α, β, γ)
Bayes’ Theorem describes the way prior
belief is modified by experimental data
But what do you take as initial prior?
Uniform Prior
General usage: choose P(a) uniform in a
(principle of insufficient reason)
Often ‘improper’: ∫P(a) da = ∞. Though the posterior
P(a|x) comes out sensible
BUT!
If P(a) is uniform, P(a²), P(ln a), P(√a) … are not
Insufficient reason is not valid (unless a is ‘most
fundamental’ – whatever that means)
Statisticians handle this: check results for
‘robustness’ under different priors
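The non-invariance is easy to see numerically. A minimal sketch (my illustration, not from the talk): sample a uniformly and histogram a².

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.uniform(0.0, 1.0, 1_000_000)   # uniform prior in a

# Induced density of b = a**2 is p(b) = 1/(2*sqrt(b)): far from uniform.
b = a**2
hist, edges = np.histogram(b, bins=10, range=(0.0, 1.0), density=True)
for lo, h in zip(edges[:-1], hist):
    print(f"a^2 in [{lo:.1f}, {lo + 0.1:.1f}): density ~ {h:.2f}")
# First bin ~3.2, last bin ~0.5: uniformity in a is a definite,
# non-flat prior statement about a**2.
```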
Example – Le Diberder
Sad Story
Fitting CKM angle α from B→ππ
6 observables
3 amplitudes: 6 unknown parameters
(magnitudes, phases)
α is the fundamentally interesting one
Results
[Plots: frequentist and Bayesian results for α]
Set one phase to zero
Uniform priors in the other
two phases and the 3
magnitudes
More Results
Bayesian
Parametrise Tree and Penguin
amplitudes:
A^{+−} = T e^{iδ_T} + P e^{iδ_P}
A^{+0} = (1/√2) (T e^{iδ_T} + T_C e^{iδ_TC})
A^{00} = (1/√2) (T_C e^{iδ_TC} − P e^{iδ_P})
Bayesian
3 amplitudes: 3 real parts, 3 imaginary parts
Interpretation
• B shows same
(mis)behaviour
• Removing all
experimental info gives
similar P(α)
• The curse of high
dimensions is at work
Uniformity in x,y,z makes
P(r) peak at large r
This result is not robust
under changes of prior
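The x, y, z remark is easy to verify by Monte Carlo (a minimal sketch, my illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Uniform priors on the Cartesian components x, y, z in [-1, 1]
xyz = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))
r = np.linalg.norm(xyz, axis=1)

# The induced density of r grows like r**2 near the origin (the volume
# element ~ 4*pi*r**2 dr): 'ignorance' about x, y, z is in fact a strong
# preference for large r.
hist, edges = np.histogram(r, bins=6, range=(0.0, np.sqrt(3.0)), density=True)
for lo, hi, h in zip(edges[:-1], edges[1:], hist):
    print(f"r in [{lo:.2f}, {hi:.2f}): density ~ {h:.2f}")
```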
Example - Heinrich
CDF statistics group looking at problem of estimating
signal cross section S in presence of background and
efficiency.
N= εS+b
Efficiency and Background from separate calibration
experiments (sidebands or MC). Scaling factors κ, ω
are known.
Everything done using Bayesian methods with uniform
priors and Poisson statistics formula. Calibration
experiments use uniform prior for ε and for b,
yielding posteriors used for S
P(N|S) = (1/N!) ∫∫ e^{−(εS+b)} (εS+b)^N P(ε) P(b) dε db
Check coverage – all fine
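A minimal sketch of such a coverage check for a single channel (my illustration: ε and b are taken as exactly known here, whereas the analysis above integrates over their posteriors; the numbers are invented):

```python
import numpy as np
from scipy.stats import poisson

eps, b, CL = 0.25, 0.75, 0.90           # efficiency, background, credibility
s_grid = np.linspace(0.0, 150.0, 6001)
ds = s_grid[1] - s_grid[0]

def upper_limit(n):
    """Bayesian 90% upper limit on S: uniform prior in S, Poisson likelihood."""
    like = poisson.pmf(n, eps * s_grid + b)
    post = like / (like.sum() * ds)     # normalised posterior
    cdf = np.cumsum(post) * ds
    return s_grid[np.searchsorted(cdf, CL)]

rng = np.random.default_rng(1)
s_true = 10.0
n = rng.poisson(eps * s_true + b, size=5000)    # repeated experiments
limit = {k: upper_limit(k) for k in np.unique(n)}
coverage = np.mean([limit[k] >= s_true for k in n])
print(f"coverage of the nominal 90% limit: {coverage:.3f}")
# For a single channel this comes out at or above 0.90 - 'all fine'.
```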
But it all goes pear shaped..
If particle decays in several channels
Hγγ H τ+ τ- Hbb
Each channel with different b and ε: total 2N+1
parameters, 2N+1 experiments
Heavy undercoverage!
E.g. with 4 channels,
all ε = 25 ± 10%, b = 0.75 ± 0.25:
for S ≈ 10 the ‘90% upper limit’ lies
above S in only 80% of cases
[Plot: coverage of the nominal 90% limit versus S, over S ≈ 10–20]
The curse strikes again
Uniform prior in ε: fine
Uniform priors in ε_1, ε_2 … ε_N:
an ε^{N−1} prior in the total ε
Prejudice in favour of
high efficiency
Signal size downgraded
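A quick numerical illustration of that ε^{N−1} behaviour (my sketch, not from the talk): with N independent uniform efficiencies, the chance that their sum is small falls like ε^N, so the induced prior density at low combined efficiency is suppressed like ε^{N−1}.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4                                           # number of channels
eps = rng.uniform(0.0, 1.0, size=(1_000_000, N))
t = eps.sum(axis=1)                             # combined efficiency scale

# Near t = 0 the induced density behaves like t**(N-1)/(N-1)!, so low
# combined efficiency gets almost no prior weight - a built-in prejudice
# for high efficiency, which drags the inferred signal down.
hist, _ = np.histogram(t, bins=[0.0, 1.0, 2.0, 3.0, 4.0], density=True)
print(dict(zip(["0-1", "1-2", "2-3", "3-4"], hist.round(4))))
# ~ {'0-1': 0.0417, '1-2': 0.4583, '2-3': 0.4583, '3-4': 0.0417}
```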
Happy ending
Effect avoided by using Jeffreys’ Priors
instead of uniform priors for ε and b
Not uniform but like 1/ε, 1/b
Not entirely realistic but interesting
Uniform prior in S is not a problem – but
maybe should consider 1/√S?
Coverage (a very frequentist concept) is a
useful tool for Bayesians
Fisher Information
P(x,a): everything
P(x|a) as a function of x is the pdf;
as a function of a it is the
likelihood L(a)
An informative experiment
is one for which a
measurement of x will give
precise information about
the parameter a.
Quantify: I(a) = −⟨∂² ln L/∂a²⟩
(Second derivative –
curvature)
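As a sanity check on the definition (a standard result, not from the slides): for a single Gaussian measurement x ~ N(a, σ) with σ known,

```latex
\ln L(a) = -\frac{(x-a)^2}{2\sigma^2} + \text{const}, \qquad
\frac{\partial^2 \ln L}{\partial a^2} = -\frac{1}{\sigma^2}, \qquad
I(a) = -\left\langle \frac{\partial^2 \ln L}{\partial a^2} \right\rangle = \frac{1}{\sigma^2}
```

so a more precise experiment (smaller σ) is more informative, and I(a) does not depend on a: a is a pure location parameter.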
Jeffreys’ Prior
A prior may be uniform in a –
but if I(a) depends on a it’s
still not ‘flat’: special values of
a give better measurements
Transform a → a′ such that I(a′) is
constant. Then choose a uniform prior
• location parameter – uniform prior OK
• scale parameter – a′ = ln a, prior 1/a
• Poisson mean – prior 1/√a
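The Poisson entry is quick to verify (a standard derivation, added for completeness): for n observed events with mean a,

```latex
\ln L(a) = n \ln a - a - \ln n!, \qquad
\frac{\partial^2 \ln L}{\partial a^2} = -\frac{n}{a^2}, \qquad
I(a) = \frac{\langle n \rangle}{a^2} = \frac{1}{a}
```

so uniformity in a′ = √a, i.e. a prior P(a) ∝ √I(a) = 1/√a, matching the ‘prior proportional to √I’ statement on the next slide.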
Objective Prior?
Jeffreys called this an ‘objective’ prior as
opposed to ‘subjective’ or straight guesswork,
but not everyone was convinced
For statisticians ‘flat prior’ means the Jeffreys
prior; for physicists it means a uniform prior
The prior depends on the likelihood: your ‘prior belief’
P(M_H) (or whatever) depends on the analysis
Equivalent to a prior proportional to √I
Reference Priors (Demortier)
4 steps
• Intrinsic Discrepancy
Between two PDFs
δ{P_1(z), P_2(z)} = min{ ∫P_1(z) ln(P_1(z)/P_2(z)) dz,
∫P_2(z) ln(P_2(z)/P_1(z)) dz }
Sensible measure of difference
δ = 0 iff P_1(z) and P_2(z) are the same, else positive
Invariant under all transformations of z
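A small numerical check of these properties (my sketch; the grid and the test densities are arbitrary choices):

```python
import numpy as np

def intrinsic_discrepancy(p1, p2, z):
    """Min of the two Kullback-Leibler divergences, on a grid z."""
    dz = z[1] - z[0]
    kl12 = np.sum(p1 * np.log(p1 / p2)) * dz
    kl21 = np.sum(p2 * np.log(p2 / p1)) * dz
    return min(kl12, kl21)

z = np.linspace(-10.0, 10.0, 4001)
gauss = lambda mu, s: np.exp(-0.5 * ((z - mu) / s)**2) / (s * np.sqrt(2 * np.pi))

print(intrinsic_discrepancy(gauss(0, 1), gauss(1, 1), z))  # ~0.5, positive
print(intrinsic_discrepancy(gauss(0, 1), gauss(0, 1), z))  # 0: same pdf
```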
Reference Priors (2)
2) Expected Intrinsic Information
Measurement M: x is sampled from p(x|a)
Parameter a has a prior p(a)
Joint distribution p(x,a)=p(x|a) p(a)
Marginal distribution p(x)=∫p(x|a) p(a) da
I(p(a),M)=δ{p(x,a),p(x)p(a)}
Depends on (i) x-a relationship and (ii) breadth
of p(a)
Expected Intrinsic (Shannon) Information from
measurement M about parameter a
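For a discrete toy case the expected intrinsic information can be computed directly (my illustration; the numbers are arbitrary):

```python
import numpy as np

# Toy measurement M: parameter a in {0, 1} with prior (0.5, 0.5);
# one observation x in {0, 1} with p(x=1|a=0) = 0.2, p(x=1|a=1) = 0.8.
prior = np.array([0.5, 0.5])
pxa = np.array([[0.8, 0.2],   # p(x | a=0)
                [0.2, 0.8]])  # p(x | a=1)

joint = prior[:, None] * pxa          # p(x, a)
px = joint.sum(axis=0)                # marginal p(x)
prod = np.outer(prior, px)            # p(a) p(x)

kl = lambda p, q: np.sum(p * np.log(p / q))
info = min(kl(joint, prod), kl(prod, joint))   # I(p(a), M)
print(f"I(p(a), M) = {info:.3f} nats")         # ~0.19
# A sharper x-a relationship, or a broader prior p(a), raises this.
```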
Reference Priors (3)
3) Missing information
Measurement M_k: k samples of x
Enough measurements fix a completely
In the limit k → ∞, I(p(a), M_k) is the difference
between the knowledge encapsulated in the
prior p(a) and complete knowledge of a:
hence the Missing Information given p(a).
Reference Priors(4)
4) Family of priors P (e.g. Fourier series,
polynomials, histograms), p(a) ∈ P
Ignorance principle: choose the least
informative (dumbest) prior in the family:
the one for which the missing information
lim_{k→∞} I(p(a), M_k) is largest
Technical difficulties in taking the k → ∞ limit and
integrating over an infinite range of a
Family of Priors (Google)
Reference Priors
Do not represent subjective belief – in fact the
opposite (like jury selection). Allow the most input
to come from the data. A formal consensus that
practitioners can use to arrive at a sensible posterior
Depend on measurement p(x|a) – cf Jeffreys
Also require a family P of possible priors
May be improper, but this doesn’t matter (they do
not represent belief)
For 1 parameter (if the measurement is asymptotically
Gaussian, which the CLT usually secures) they give the
Jeffreys prior
But can also (unlike Jeffreys) work for several
parameters
Summary
• Probability
– Frequentist
– Bayesian
• Bayes Theorem
– Priors
• Prior pitfalls (1): Le Diberder
• Prior pitfalls (2): Heinrich
• Jeffreys’ Prior
– Fisher Information
• Reference Priors: Demortier