Slide 1
BAYES versus FREQUENTISM: The Return of an Old Controversy
• The ideologies, with examples
• Upper limits
• Systematics
Louis Lyons, Oxford University and CERN

Slide 2
PHYSTAT 2003, SLAC, Stanford, California, 8th-11th Sept 2003
Conference on statistical problems in particle physics, astrophysics and cosmology
http://www-conf.slac.stanford.edu/phystat2003

Slide 3
It is possible to spend a lifetime analysing data without realising that there are two very different approaches to statistics: Bayesianism and Frequentism.

Slide 4
How can textbooks not even mention Bayes/Frequentism?
In the simplest case, a Gaussian measurement m with no constraint on m(true), we have m - k sigma <= m(true) <= m + k sigma at some probability, for both Bayes and Frequentist (but with different interpretations).

Slide 5
See Bob Cousins, "Why isn't every physicist a Bayesian?", Amer. Jrnl. Phys. 63 (1995) 398.
We need to make a statement about parameters, given data. The basic difference between the two:
Bayesian: Probability (parameter, given data) (an anathema to a Frequentist!)
Frequentist: Probability (data, given parameter) (a likelihood function)

Slide 6
PROBABILITY
MATHEMATICAL: formal, based on axioms.
FREQUENTIST: the ratio of frequencies as n tends to infinity; repeated "identical" trials; not applicable to a single event or a physical constant.
BAYESIAN: degree of belief; can be applied to a single event or a physical constant (even though these have a unique truth); varies from person to person; quantified by a "fair bet".

Slide 7
Bayesian versus Classical
P(A and B) = P(A; B) x P(B) = P(B; A) x P(A)
e.g. A = event contains a t quark, B = event contains a W boson
or A = you are at CERN, B = you are at the Workshop
Completely uncontroversial, provided....
Rearranging gives P(A; B) = P(B; A) x P(A) / P(B)

Slide 8
Bayes' Theorem: P(A; B) = P(B; A) x P(A) / P(B)
P(hypothesis; data) ∝ P(data; hypothesis) x P(hypothesis)
i.e. posterior ∝ likelihood x prior
Problems:
• P(hypothesis): true or false, or "degree of belief"?
• Prior: what functional form?
• Coverage
• Goodness of fit
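The posterior ∝ likelihood x prior relation above can be sketched numerically for discrete hypotheses. This is a minimal illustration with invented numbers, not anything from the talk itself:

```python
# Discrete form of Bayes' theorem:
# P(hyp; data) = P(data; hyp) * P(hyp) / P(data).

def posterior(priors, likelihoods):
    """Return P(hyp; data) for each hypothesis.

    priors:      dict of P(hyp), summing to 1
    likelihoods: dict of P(data; hyp) for the same keys
    """
    evidence = sum(likelihoods[h] * priors[h] for h in priors)  # P(data)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

# Two invented hypotheses with equal priors; the observed data are
# three times likelier under H1 than under H0.
post = posterior({"H0": 0.5, "H1": 0.5}, {"H0": 0.1, "H1": 0.3})
print(post["H1"])  # ~0.75: with equal priors, posterior odds = likelihood ratio
```

Changing the priors changes the answer, which is exactly the dependence on the prior that the following slides dwell on.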
Slide 9
P(hypothesis; data): is the hypothesis true or false, or is this a "degree of belief" (a credible interval)?
Prior: what functional form? An uninformative prior: flat? But flat in which variable, e.g. m, m^2, ln m, ...?
Unimportant if the data overshadow the prior; important for limits.
Subjective or objective prior?

(Slides 10 and 11: figures.)

Slide 12
P(Data; Theory) is not P(Theory; Data)
HIGGS SEARCH at CERN: is the data consistent with the Standard Model, or with the Standard Model + Higgs?
End of Sept 2000: the data were not very consistent with the S.M.
Prob (Data; S.M.) < 1% is a valid frequentist statement. It was turned by the press into Prob (S.M.; Data) < 1%, and therefore Prob (Higgs; Data) > 99%, i.e. "it is almost certain that the Higgs has been seen".

Slide 13
P(Data; Theory) is not P(Theory; Data)
Theory = male or female; Data = pregnant or not pregnant.
P (pregnant; female) ~ 3%, but P (female; pregnant) >>> 3%.

Slide 14
Example 1: Is the coin fair?
Toss a coin: 5 consecutive tails. What is P(unbiased; data), i.e. the probability that p = 1/2?
It depends on the prior for p:
• village priest: prior ~ delta(p - 1/2)
• stranger in a pub: prior ~ flat for 0 < p < 1
(A decision also needs a cost function.)

Slide 15
Example 2: Particle identification
Try to separate pi's and protons:
probability (p tag; real p) = 0.95
probability (pi tag; real p) = 0.05
probability (pi tag; real pi) = 0.90
probability (p tag; real pi) = 0.10
A particle gives a proton tag. What is it? It depends on the prior, the fraction of protons:
• proton beam: very likely a proton
• general secondary particles: more even
• pure pi beam: ~ 0

Slide 16
Hunter and Dog
1) Dog d has a 50% probability of being within 100 m of hunter h.
2) Hunter h has a 50% probability of being within 100 m of dog d.
[Figure: hunter h and dog d on a line, with rivers at x = 0 and x = 1 km.]

Slide 17
Given that a) dog d has a 50% probability of being within 100 m of the hunter, is it true that b) the hunter has a 50% probability of being within 100 m of the dog?
Additional information:
• Rivers at 0 and 1 km; the hunter cannot cross them.
• The dog can swim across the river.
Statement a) is still true, but if the dog is at -101 m the hunter cannot be within 100 m of the dog, so statement b) is untrue.

(Slide 18: figure.)

Slide 19
Classical Approach
The Neyman "confidence interval" avoids a pdf for mu; it uses only P(x; mu).
Confidence interval (mu_1, mu_2): P(the interval (mu_1, mu_2) contains mu_true) = alpha, true for any fixed mu_true, with varying intervals from an ensemble of experiments.
It gives the range of mu for which the observed value x_0 was "likely" (at level alpha).
(Contrast Bayes: degree of belief = alpha that mu_true is in (mu_1, mu_2).)

Slide 20
COVERAGE
If P(mu_1 <= mu_true <= mu_2) = alpha for all mu_true: "correct coverage".
P < alpha for some mu_true: "undercoverage" (this is serious!).
P > alpha for some mu_true: "overcoverage": conservative, with loss of rejection power.

(Slide 21: figure.)

Slide 22
Frequentist: mu_l <= mu <= mu_u at 90% confidence. mu_l and mu_u are known but random; mu is unknown but fixed. This is a probability statement about mu_l and mu_u.
Bayesian: mu_l and mu_u are known and fixed; mu is unknown and random. This is a probability (credible) statement about mu.

Slide 23
Classical Intervals
Problems: hard to understand (e.g. the d'Agostini e-mail); arbitrary choice of interval; possibility of an empty range; over-coverage for integer observations, e.g. the number of events; nuisance parameters (systematic errors).
Advantages: widely applicable; well-defined coverage.

Slide 24
Importance of the Ordering Rule
Neyman construction in 1 parameter mu with 2 measurements x_1, x_2: p(x; mu) = G(x - mu, 1).
An aside: determination of the single parameter mu via chi-squared,
chi^2(mu) = (x_1 - mu)^2 + (x_2 - mu)^2.
With some acceptable level of chi^2, the range of the parameter is given by
1) the values of mu for which the data are likely, i.e.
for which p(chi^2) is acceptable; or
2) chi^2(mu) <= chi^2(mu_min) + 1.
2) is good; with 1) the range depends on chi^2_min. ["Confidence interval coupled to goodness of fit".]

Slide 25
Neyman Construction
For a given mu, the acceptable (x_1, x_2) satisfy chi^2 = (x_1 - mu)^2 + (x_2 - mu)^2 <= C_cut, which defines a cylinder in (mu, x_1, x_2) space. The experiment gives (x_1, x_2), hence a mu interval.
Writing xbar = (x_1 + x_2)/2, chi^2(mu) = 2 (mu - xbar)^2 + (x_1 - x_2)^2 / 2, so the interval is
mu = xbar +/- sqrt[(C_cut - (x_1 - x_2)^2 / 2) / 2].
The range depends on x_1 - x_2: range and goodness of fit are coupled.

Slide 26
That was probability ordering; now change to likelihood-ratio ordering.
For x_1 and x_2 far apart, no value of mu gives a very good fit. For the Neyman construction at fixed mu, compare chi^2(mu) with chi^2(mu_best), where mu_best = xbar = (x_1 + x_2)/2 and chi^2_best = (x_1 - x_2)^2 / 2.
Cutting on the likelihood-ratio ordering gives chi^2(mu) - chi^2_best = 2 (mu - xbar)^2 <= C.

Slide 27
Therefore the range of mu is xbar +/- sqrt(C/2): constant width, independent of x_1 - x_2.
The confidence range and the goodness of fit are completely decoupled.

Slide 28
Bayesian
Pros: easy to understand; physical interval.
Cons: needs a prior; hard to combine; coverage not guaranteed.

Slide 29
Standard Frequentist
Pros: coverage.
Cons: hard to understand; small or empty intervals; different upper limits.

Slide 30
SYSTEMATICS
For example N_events = sigma L_A + b, where N_events is observed, sigma is the physics parameter, and L_A (acceptance x luminosity) and b (background) need to be known, probably from other measurements (and/or theory).
N +/- sqrt(N) for the statistical errors; the uncertainties L_A = L_A0 +/- delta L_A and b = b_0 +/- delta b give an error in sigma (some of these are arguably statistical errors).
Methods: shift the central value; Bayesian; frequentist; mixed.

Slide 31
Simplest method: evaluate sigma_0 using L_A0 and b_0. Move the nuisance parameters, one at a time, by their errors, giving delta sigma(L_A) and delta sigma(b). If the nuisance parameters are uncorrelated, combine these contributions in quadrature to obtain the total systematic.

Slide 32
Bayesian
Without systematics: p(sigma; N) ∝ p(N; sigma) x pi(sigma), where pi(sigma) is the prior.
With systematics: p(sigma, L_A, b; N) ∝ p(N; sigma, L_A, b) x pi(sigma, L_A, b), with pi ~ pi_1(sigma) pi_2(L_A) pi_3(b).
Then integrate over L_A and b: p(sigma; N) = ∫ p(sigma, L_A, b; N) dL_A db.
If pi_1(sigma) = constant and pi_2(L_A) = a truncated Gaussian: TROUBLE!
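The marginalisation integral above can be sketched by brute force. This is a toy version only: the Poisson model N_events = sigma * L_A + b is taken from the slides, but every central value, error, and the crude grid integration are invented for illustration:

```python
import math

# Marginalise the Poisson likelihood for N ~ Pois(sigma * L_A + b) over the
# nuisance parameters L_A and b, with truncated-Gaussian priors for them and
# a flat prior for sigma. All numbers below are invented for illustration.
N = 10                    # observed number of events
LA0, d_LA = 2.0, 0.2      # L_A central value and error (invented)
b0, d_b = 3.0, 0.5        # background central value and error (invented)

def poisson(n, mu):
    return math.exp(-mu) * mu ** n / math.factorial(n)

def gauss(x, x0, s):
    return math.exp(-0.5 * ((x - x0) / s) ** 2)

def marginal_likelihood(sig, steps=41):
    """p(sig; N) up to a constant: grid-integrate L_A and b over +/- 3 errors."""
    total = 0.0
    for i in range(steps):
        LA = LA0 + d_LA * (-3.0 + 6.0 * i / (steps - 1))
        for j in range(steps):
            b = b0 + d_b * (-3.0 + 6.0 * j / (steps - 1))
            if LA <= 0.0 or b < 0.0:      # truncate unphysical values
                continue
            total += (poisson(N, sig * LA + b)
                      * gauss(LA, LA0, d_LA) * gauss(b, b0, d_b))
    return total

# Flat prior in sigma: posterior on a grid, normalised to unit sum.
grid = [0.1 * k for k in range(81)]           # sigma from 0 to 8
post = [marginal_likelihood(s) for s in grid]
norm = sum(post)
post = [p / norm for p in post]
peak = grid[post.index(max(post))]
print("posterior peak ~", peak)               # near (N - b0) / LA0 = 3.5

# 90% credible upper limit: smallest sigma whose cumulative posterior is 0.9
up, cum = None, 0.0
for sig_val, p in zip(grid, post):
    cum += p
    if cum >= 0.9:
        up = sig_val
        break
print("90% upper limit ~", up)
```

Replacing the flat prior `pi_1(sigma)` or widening the truncated Gaussians here is a quick way to see how sensitive the limit is to the prior choices the slide warns about.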
Slide 33
Upper limit on sigma from ∫_0^(sigma_up) p(sigma; N) dsigma; significance from the likelihood ratio for sigma = 0 and sigma = sigma_max.

Slide 34
Frequentist: Full Method
Imagine just 2 parameters, sigma (physics) and L_A (nuisance), and 2 measurements, N and M. Do the Neyman construction in 4-D; use the observed N and M to give a 68% confidence region in (sigma, L_A).

Slide 35
Then project onto the sigma axis. This results in OVERCOVERAGE.
Aim for a better-shaped region by a suitable choice of ordering rule. Example, profile-likelihood ordering:
lambda = L(N_0, M_0; sigma, L_A_best(sigma)) / L(N_0, M_0; sigma_best, L_A_best)

Slide 36
The full frequentist method is hard to apply in several dimensions. It has been used with 3 parameters, for example neutrino oscillations (CHOOZ): sin^2(2 theta), Delta m^2, and the normalisation of the data.
Use approximate frequentist methods that reduce the dimensions to just the physics parameters, e.g. the profile pdf:
pdf_profile(N; sigma) = pdf(N, M_0; sigma, L_A_best(sigma)).
Contrast Bayes marginalisation; distinguish "profile ordering".

Slide 37
Properties are being studied by Giovanni Punzi.
Talks at the FNAL CONFIDENCE LIMITS WORKSHOP (March 2000) by Gary Feldman and Wolfgang Rolke (hep-ph/0005187, version 2):
• acceptance uncertainty is worse than background uncertainty;
• the limit of the C.L. as the uncertainty tends to 0 need not equal the C.L. for zero uncertainty;
• need to check coverage.

Slide 38
Method: Mixed Frequentist - Bayesian
Bayesian for the nuisance parameters, frequentist to extract the range. Philosophical/aesthetic problems?
Cousins and Highland (the motivation was the paradoxical behaviour of the Poisson limit when L_A is not known exactly).

Slide 39
Bayesian versus Frequentism
• Basis of method: Bayesian: Bayes' Theorem, giving a posterior probability distribution. Frequentist: uses the pdf for the data, for fixed parameters.
• Meaning of probability: Bayesian: degree of belief. Frequentist: the frequentist definition.
• Probability of parameters? Bayesian: yes. Frequentist: anathema.
• Needs a prior? Bayesian: yes. Frequentist: no.
• Choice of interval? Bayesian: yes. Frequentist: yes (except F+C).
• Data considered: Bayesian: only the data you have. Frequentist: the data you have + more extreme.
• Likelihood principle? Bayesian: yes. Frequentist: no.

Slide 40
Bayesian versus Frequentism (continued)
• Ensemble of experiments: Bayesian: no. Frequentist: yes (but often not explicit).
• Final statement: Bayesian: posterior probability distribution. Frequentist: parameter values for which the data are "likely".
• Unphysical/empty ranges: Bayesian: excluded by the prior. Frequentist: can occur.
• Systematics: Bayesian: integrate over the prior. Frequentist: extend the dimensionality of the frequentist construction.
• Coverage: Bayesian: unimportant. Frequentist: built in.
• Decision making: Bayesian: yes (uses a cost function). Frequentist: not useful.

Slide 41
Bayesianism versus Frequentism
"Bayesians address the question everyone is interested in, by using assumptions no-one believes."
"Frequentists use impeccable logic to deal with an issue of no interest to anyone."
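The coverage notion contrasted above ("built-in" for the frequentist construction) can be checked with a toy Monte Carlo. This sketch covers only the simplest case from the earlier slides, a Gaussian measurement of known unit width; the sample sizes and test points are arbitrary:

```python
import random

# Frequentist coverage check: for x drawn from G(mu_true, 1), the interval
# x +/- 1.645 should contain the fixed mu_true in ~90% of repeated
# experiments, whatever mu_true is ("correct coverage").
random.seed(1)

def coverage(mu_true, n_exp=100_000, half_width=1.645):
    hits = 0
    for _ in range(n_exp):
        x = random.gauss(mu_true, 1.0)
        if x - half_width <= mu_true <= x + half_width:
            hits += 1
    return hits / n_exp

for mu in (0.0, 5.0, -2.3):
    print(mu, coverage(mu))   # each fraction comes out close to 0.90
```

Shrinking `half_width` below 1.645 would make the same loop report undercoverage; widening it gives the conservative overcoverage that the projection method of the earlier slides suffers from.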