BAYES versus FREQUENTISM
The Return of an Old Controversy
• The ideologies, with examples
• Upper limits
• Systematics

Louis Lyons, Oxford University and CERN
PHYSTAT 2003
SLAC, STANFORD, CALIFORNIA
8th–11th Sept 2003
Conference on:
STATISTICAL PROBLEMS IN:
PARTICLE PHYSICS
ASTROPHYSICS
COSMOLOGY
http://www-conf.slac.stanford.edu/phystat2003
It is possible to spend a lifetime
analysing data without realising that
there are two very different
approaches to statistics:
Bayesianism and Frequentism.
How can textbooks not even mention Bayes/Frequentism?
For simplest case of a measurement (m ± σ), Gaussian,
with no constraint on m(true), then
m − kσ ≤ m(true) ≤ m + kσ
at some probability, for both Bayes and Frequentist
(but different interpretations)
See Bob Cousins, "Why isn't every physicist a Bayesian?", Am. J. Phys. 63 (1995) 398
We need to make a statement about
Parameters, Given Data
The basic difference between the two:
Bayesian : Probability (parameter, given data)
(an anathema to a Frequentist!)
Frequentist : Probability (data, given parameter)
(a likelihood function)
PROBABILITY
MATHEMATICAL
Formal
Based on Axioms
FREQUENTIST
Ratio of frequencies as n → ∞
Repeated “identical” trials
Not applicable to single event or physical constant
BAYESIAN Degree of belief
Can be applied to single event or physical constant
(even though these have unique truth)
Varies from person to person
Quantified by “fair bet”
Bayesian versus Classical
Bayesian
P(A and B) = P(A; B) × P(B) = P(B; A) × P(A)
e.g. A = event contains t quark
B = event contains W boson
or
A = you are in CERN
B = you are at Workshop
Completely uncontroversial, provided….
P(A; B) = P(B; A) × P(A) / P(B)
Bayesian

Bayes' Theorem:
P(A; B) = P(B; A) × P(A) / P(B)

P(hypothesis; data) ∝ P(data; hypothesis) × P(hypothesis)
     posterior            likelihood            prior

Problems: P(hypothesis): true or false? "Degree of belief"
Prior: what functional form?
Coverage
Goodness of fit
P(hypothesis…..)
True or False
“Degree of Belief”
credible interval
Prior:
What functional form?
Uninformative prior:
flat?
In which variable? e.g. m, m², ln m, ....?
Unimportant if “data overshadows prior”
Important for limits
Subjective or Objective prior?
P(Data; Theory) ≠ P(Theory; Data)
HIGGS SEARCH at CERN
Is data consistent with Standard Model?
or with Standard Model + Higgs?
End of Sept 2000 Data not very consistent with S.M.
Prob (Data ; S.M.) < 1% valid frequentist statement
Turned by the press into: Prob (S.M. ; Data) < 1%
and therefore
Prob (Higgs ; Data) > 99%
i.e. “It is almost certain that the Higgs has been seen”
P(Data; Theory) ≠ P(Theory; Data)
Theory = male or female
Data = pregnant or not pregnant
P (pregnant ; female) ~ 3%
but
P (female ; pregnant) >>>3%
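The inversion is a one-line application of Bayes' theorem. In this sketch the 3% comes from the slide; the 50/50 prior and P(pregnant; male) = 0 are illustrative assumptions:

```python
# Invert P(pregnant; female) into P(female; pregnant) with Bayes' theorem.
# P(pregnant; female) = 0.03 is from the slide; the 50/50 prior and
# P(pregnant; male) = 0 are illustrative assumptions.
p_preg_given_female = 0.03
p_preg_given_male = 0.0
p_female = 0.5  # prior

# Total probability of the data (being pregnant)
p_preg = (p_preg_given_female * p_female
          + p_preg_given_male * (1 - p_female))

# Posterior: P(female; pregnant)
p_female_given_preg = p_preg_given_female * p_female / p_preg
print(p_female_given_preg)  # 1.0 -- vastly bigger than 3%
```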
Example 1 :
Is coin fair ?
Toss coin: 5 consecutive tails
What is P(unbiased; data) ? i.e. p = ½
Depends on prior(p)
If village priest: prior ~ δ(p − ½)
If stranger in pub: prior ~ 1 for 0 < p < 1
(also needs cost function)
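A minimal numerical sketch of how the prior drives the answer. The mixed prior (equal weight on the priest's delta function at p = ½ and the stranger's flat prior) is an illustrative assumption, not from the talk:

```python
# Posterior P(fair; 5 consecutive tails) under a mixed prior:
# weight w on p = 1/2 (the "village priest" component) and weight 1 - w
# spread flat on (0, 1) (the "stranger in pub" component).
# The 50/50 weight split is an illustrative assumption.
w = 0.5

# Likelihood of 5 tails given tail-probability q is q**5
like_fair = 0.5 ** 5   # = 1/32 for the fair-coin hypothesis
# Marginal likelihood under the flat prior: integral of q**5 dq on (0,1) = 1/6
like_flat = 1.0 / 6.0

posterior_fair = w * like_fair / (w * like_fair + (1 - w) * like_flat)
print(round(posterior_fair, 3))  # 0.158 -- the data alone do not settle it
```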
Example 2 :
Particle Identification
Try to separate π's and protons
probability (p tag; real p) = 0.95
probability (π tag; real p) = 0.05
probability (π tag; real π) = 0.90
probability (p tag; real π) = 0.10
Particle gives proton tag. What is it?
Depends on prior = fraction of protons
If proton beam, very likely
If general secondary particles, more even
If pure π beam, ~ 0
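The three cases can be worked out with Bayes' theorem. The tag probabilities are from the slide; the three prior proton fractions are illustrative choices for "proton beam", "general secondaries", and "pure π beam":

```python
# Posterior probability that a proton-tagged particle is a real proton.
# Tag efficiencies are from the slide; the priors are illustrative.
def p_proton_given_ptag(f_proton):
    """f_proton = prior fraction of protons in the sample."""
    p_tag_given_p = 0.95   # probability (p tag; real p)
    p_tag_given_pi = 0.10  # probability (p tag; real pi)
    num = p_tag_given_p * f_proton
    return num / (num + p_tag_given_pi * (1 - f_proton))

for f in (0.99, 0.5, 0.01):  # proton beam / mixed secondaries / near-pure pi beam
    print(f, round(p_proton_given_ptag(f), 3))  # 0.999, 0.905, 0.088
```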
Hunter and Dog
1) Dog d has 50% probability of being within 100 m of Hunter h
2) Hunter h has 50% probability of being within 100 m of Dog d
[Figure: hunter h and dog d on an axis x, with rivers at x = 0 and x = 1 km]
Given that: a) Dog d has 50% probability of being within 100 m of Hunter h
Is it true that: b) Hunter h has 50% probability of being within 100 m of Dog d?
Additional information:
• Rivers at zero and 1 km. Hunter cannot cross them: 0 ≤ h ≤ 1 km
• Dog can swim across river, so statement a) still true
If dog is at −101 m, hunter cannot be within 100 m of dog
⇒ Statement b) untrue
Classical Approach
Neyman "confidence interval" avoids pdf for μ;
uses only P(x; μ)
Confidence interval (μ1, μ2):
P( (μ1, μ2) contains μt ) = α
True for any fixed μt
Varying intervals from ensemble of experiments
Gives range of μ for which observed value x0 was "likely"
Contrast Bayes: Degree of belief = α that μt is in (μ1, μ2)
COVERAGE
If P = α for all μt: "correct coverage"
P < α for some μt: "undercoverage" (this is serious!)
P > α for some μt: "overcoverage" → conservative; loss of rejection power
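Coverage can be checked by simulation. A minimal sketch for the simplest Gaussian case from earlier (quote x ± σ with k = 1); the true value and seed are arbitrary:

```python
# Monte Carlo coverage check: measure x from a unit Gaussian around mu_true
# and quote the interval x - 1 <= mu <= x + 1. A frequentist interval should
# cover mu_true in ~68.3% of repeated experiments, for ANY fixed mu_true.
import random

random.seed(42)
mu_true = 3.7        # arbitrary fixed true value
n_exp = 200_000      # size of the ensemble of experiments

covered = sum(
    1 for _ in range(n_exp)
    if abs(random.gauss(mu_true, 1.0) - mu_true) <= 1.0
)
print(covered / n_exp)  # close to 0.683
```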
Frequentist: μl ≤ μ ≤ μu at 90% confidence
  μl and μu: known, but random
  μ: unknown, but fixed
  Probability statement about μl and μu

Bayesian: μl ≤ μ ≤ μu
  μl and μu: known, and fixed
  μ: unknown, and random
  Probability/credible statement about μ
Classical Intervals
• Problems:
  Hard to understand, e.g. d'Agostini e-mail
  Arbitrary choice of interval
  Possibility of empty range
  Over-coverage for integer observation, e.g. # of events
  Nuisance parameters (systematic errors)
• Advantages:
  Widely applicable
  Well-defined coverage
Importance of Ordering Rule
An aside: Determination of a single parameter μ via χ²
Neyman construction in 1 parameter μ, with 2 measurements x1, x2:
p(x; μ) = G(x − μ, 1)
[Figure: χ² versus μ, with the acceptable level of χ² and the measurements x1, x2 marked]
Range of parameters given by:
1) Values of μ for which data is likely, i.e. χ²(μ) is acceptable
or
2) χ²(μ) ≤ χ²_min + 1
2) is good
1) Range depends on χ²_min
["Confidence interval coupled to goodness of fit"]
Neyman Construction
For given μ, acceptable (x1, x2) satisfy
χ² = (x1 − μ)² + (x2 − μ)² ≤ C_cut
Defines cylinder in (μ, x1, x2) space
[Figure: acceptance region in the (x1, x2) plane]
Experiment gives (x1, x2) → μ interval
Range depends on x1 − x2:
χ²_min = (x1 − x2)²/2
Range and goodness of fit are coupled
That was using Probability Ordering.
Now change to Likelihood Ratio Ordering.
For x1 ≠ x2, no value of μ gives a very good fit.
For Neyman construction at fixed μ, compare
χ²(μ) = (x1 − μ)² + (x2 − μ)²
with
χ²_best = (x1 − μ_best)² + (x2 − μ_best)², where μ_best = (x1 + x2)/2,
giving
χ²(μ) − χ²_best = 2[μ − (x1 + x2)/2]²
Cutting on Likelihood Ratio Ordering gives:
| (x1 + x2)/2 − μ | ≤ C
Therefore, the range of μ is (x1 + x2)/2 ± C:
constant width, independent of x1 − x2
[Figure: band of constant width in the (x1, x2) plane around μ = (x1 + x2)/2]
Confidence Range and Goodness of Fit are completely decoupled
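The contrast between the two orderings can be seen numerically. A sketch assuming the two unit-Gaussian measurements above, with the standard 90% chi-square cuts (4.61 for 2 degrees of freedom for probability ordering, 2.71 for 1 for likelihood-ratio ordering):

```python
# Width of the accepted mu range under the two ordering rules, for two
# unit-Gaussian measurements x1, x2 of the same mu.
# chi2(mu) = 2*(mu - mu_best)^2 + chi2_min, with mu_best = (x1 + x2)/2.
import math

def intervals(x1, x2, c_cut=4.61, c_prime=2.71):
    """Return (probability-ordering width, LR-ordering width) of the mu range."""
    chi2_min = 0.5 * (x1 - x2) ** 2            # chi^2 at mu_best
    # Probability ordering: accept mu with chi2(mu) <= c_cut
    slack = c_cut - chi2_min
    prob_width = 2 * math.sqrt(slack / 2) if slack > 0 else 0.0  # can be empty!
    # Likelihood-ratio ordering: accept mu with chi2(mu) - chi2_min <= c_prime
    lr_width = 2 * math.sqrt(c_prime / 2)      # constant, whatever x1 - x2
    return prob_width, lr_width

print(intervals(0.0, 0.0))   # consistent data: prob interval at full width (~3.04)
print(intervals(0.0, 2.5))   # discrepant data: prob interval shrinks
print(intervals(0.0, 3.5))   # very discrepant: prob interval empty; LR unchanged
```

The likelihood-ratio width is the same in all three calls, which is exactly the decoupling of confidence range from goodness of fit stated above.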
Bayesian
Pros:
Easy to understand
Physical Interval
Cons:
Needs prior
Hard to combine
Coverage
Standard Frequentist
Pros:
Coverage
Cons:
Hard to understand
Small or Empty Intervals
Different Upper Limits
SYSTEMATICS
For example: N_events = σ · LA + b
  σ = physics parameter
  N_events = observed
  LA, b: we need to know these, probably from other measurements (and/or theory)
Uncertainties → error in σ
N ± √N for statistical errors
LA = LA0 ± σ_LA
b = b0 ± σ_b
Some are arguably statistical errors
Methods:
Shift Central Value
Bayesian
Frequentist
Mixed
N_events = σ · LA + b
Simplest Method
Evaluate σ0 using LA0 and b0
Move nuisance parameters (one at a time) by their errors ± σ_LA & ± σ_b
If nuisance parameters are uncorrelated,
combine these contributions in quadrature
→ total systematic
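The simplest method in a few lines. All numbers here are illustrative (N = 100 observed, LA0 = 10 ± 0.5, b0 = 20 ± 4), not from the talk:

```python
# "Simplest method" for systematics on sigma in N = sigma * LA + b:
# shift each nuisance parameter by its error, one at a time, then
# combine the resulting shifts in sigma in quadrature.
import math

N, LA0, err_LA, b0, err_b = 100.0, 10.0, 0.5, 20.0, 4.0  # illustrative numbers

def sigma(LA, b):
    return (N - b) / LA

sigma0 = sigma(LA0, b0)                       # central value: 8.0
shift_LA = sigma(LA0 + err_LA, b0) - sigma0   # move LA by its error
shift_b = sigma(LA0, b0 + err_b) - sigma0     # move b by its error

# Uncorrelated nuisance parameters: combine contributions in quadrature
syst = math.hypot(shift_LA, shift_b)
print(sigma0, round(syst, 3))  # 8.0 0.552
```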
Bayesian
Without systematics:
p(σ; N) ∝ p(N; σ) · π(σ)
        π = prior
With systematics:
p(σ, LA, b; N) ∝ p(N; σ, LA, b) · π(σ, LA, b)
where π(σ, LA, b) ~ π1(σ) · π2(LA) · π3(b)
Then integrate over LA and b:
p(σ; N) = ∫ p(σ, LA, b; N) dLA db
If π1(σ) = constant and π2(LA) = truncated Gaussian → TROUBLE!
Upper limit on σ from ∫ p(σ; N) dσ
Significance from likelihood ratio for σ = 0 and σ_max
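A numerical sketch of the marginalisation above, assuming a Poisson likelihood for N with mean σ·LA + b, a flat prior in σ, and Gaussian priors on LA and b. All the numbers (N = 10, LA0 = 1 ± 0.1, b0 = 3 ± 1) are illustrative:

```python
# Marginalise the posterior p(sigma, LA, b; N) over the nuisance parameters
# LA and b on a grid, then read off a 90% credible upper limit on sigma.
import math
import numpy as np

N_obs = 10                 # observed events (illustrative)
LA0, sLA = 1.0, 0.1        # LA = LA0 +/- sLA
b0, sb = 3.0, 1.0          # b  = b0  +/- sb

sig = np.linspace(0.0, 25.0, 251)                        # flat prior in sigma
LA = np.linspace(LA0 - 4 * sLA, LA0 + 4 * sLA, 41)
b = np.linspace(max(1e-3, b0 - 4 * sb), b0 + 4 * sb, 41) # truncated at ~0

S, L, B = np.meshgrid(sig, LA, b, indexing="ij")
mu = S * L + B                                           # expected events
log_poisson = N_obs * np.log(mu) - mu - math.lgamma(N_obs + 1)
log_prior = -0.5 * ((L - LA0) / sLA) ** 2 - 0.5 * ((B - b0) / sb) ** 2

dsig = sig[1] - sig[0]
post = np.exp(log_poisson + log_prior).sum(axis=(1, 2))  # integrate out LA, b
post /= post.sum() * dsig                                # normalise p(sigma; N)

cdf = np.cumsum(post) * dsig
upper = float(sig[np.searchsorted(cdf, 0.90)])           # 90% upper limit
print(round(upper, 2))
```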
Frequentist
Full Method
Imagine just 2 parameters: σ and LA
  σ = physics, LA = nuisance
and 2 measurements N and M
Do Neyman construction in 4-D
Use observed N and M to give confidence region in (σ, LA)
[Figure: 68% confidence region in the LA–σ plane]
Then project onto σ axis
This results in OVERCOVERAGE
Aim to get better shaped region, by suitable choice of ordering rule
Example: Profile likelihood ordering
L(N0, M0; σ, LA_best(σ)) / L(N0, M0; σ_best, LA_best)
Full frequentist method hard to apply in several dimensions
Used in ≤ 3 parameters
For example: Neutrino oscillations (CHOOZ)
  sin²(2θ), Δm², normalisation of data
Use approximate frequentist methods that reduce dimensions to just physics parameters
e.g. Profile pdf
  pdf_profile(N; σ) = pdf(N, M0; σ, LA_best(σ))
Contrast Bayes marginalisation
Distinguish "profile ordering"
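The profile construction can be sketched on a toy model: N ~ Poisson(σ·LA + b0) and an auxiliary measurement M ~ Gauss(LA, σ_LA). Instead of integrating over LA (Bayes marginalisation), fix it at its conditional best fit LA_best(σ). All numbers are illustrative:

```python
# Profile likelihood for the toy model N ~ Poisson(sigma*LA + b0),
# M ~ Gauss(LA, sLA): maximise over the nuisance parameter LA at each sigma.
import math

N_obs, M_obs = 10, 1.0     # observed counts and auxiliary measurement
b0, sLA = 3.0, 0.1         # known background; error on LA (illustrative)

def log_like(sigma, LA):
    mu = sigma * LA + b0
    return (N_obs * math.log(mu) - mu - math.lgamma(N_obs + 1)
            - 0.5 * ((M_obs - LA) / sLA) ** 2)

def profile_log_like(sigma):
    # Crude grid scan over LA for its conditional best value LA_best(sigma)
    grid = [M_obs - 6 * sLA + 12 * sLA * k / 300 for k in range(301)]
    return max(log_like(sigma, LA) for LA in grid)

# Profile log-likelihood ratio between sigma = 7 (near the best fit,
# since N - b0 = 7 with LA ~ 1) and sigma = 0
lam = profile_log_like(7.0) - profile_log_like(0.0)
print(round(lam, 2))
```

The same number obtained by marginalising over LA would in general differ; the slide's point is that profiling reduces the dimensionality back to the physics parameter alone.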
Properties being studied by Giovanni Punzi
Talks at FNAL CONFIDENCE LIMITS WORKSHOP (March 2000) by:
Gary Feldman
Wolfgang Rolke (hep-ph/0005187, version 2)
Acceptance uncertainty worse than background uncertainty
Limit of C.L. as σ_LA → 0 ≠ C.L. for σ_LA = 0
Need to check coverage
Method: Mixed Frequentist - Bayesian
Bayesian for nuisance parameters and
Frequentist to extract range
Philosophical/aesthetic problems?
Highland and Cousins
(Motivation was paradoxical behavior of Poisson limit
when LA not known exactly)
Bayesian versus Frequentism

                          Bayesian                   Frequentist
Basis of method           Bayes Theorem →            Uses pdf for data,
                          posterior probability      for fixed parameters
                          distribution
Meaning of probability    Degree of belief           Frequentist definition
Probability of
parameters?               Yes                        Anathema
Needs prior?              Yes                        No
Choice of interval?       Yes                        Yes (except F+C)
Data considered           Only data you have         ....+ more extreme
Likelihood principle?     Yes                        No
Bayesian versus Frequentism

                          Bayesian                   Frequentist
Ensemble of experiments   No                         Yes (but often not explicit)
Final statement           Posterior probability      Parameter values for which
                          distribution               data is likely
Unphysical/empty ranges   Excluded by prior          Can occur
Systematics               Integrate over prior       Extend dimensionality of
                                                     frequentist construction
Coverage                  Unimportant                Built-in
Decision making           Yes (uses cost function)   Not useful
Bayesianism versus Frequentism
“Bayesians address the question everyone is
interested in, by using assumptions no-one
believes”
“Frequentists use impeccable logic to deal
with an issue of no interest to anyone”