Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester

Frequentist versus Bayesian: the Bayesian approach
• In Bayesian statistics we can associate a probability with a hypothesis, e.g., a parameter value θ.
• The probability of θ is interpreted as a 'degree of belief' (subjective).
• We need to start with a 'prior pdf' π(θ); this reflects our degree of belief about θ before doing the experiment.
• Our experiment has data x → likelihood function L(x|θ).
• Bayes' theorem tells us how our beliefs should be updated in light of the data x:
      p(θ|x) = L(x|θ) π(θ) / ∫ L(x|θ′) π(θ′) dθ′      (posterior ∝ likelihood × prior)
• The posterior pdf p(θ|x) contains all our knowledge about θ.

Case #4: Bayesian method
• We need to associate prior probabilities with θ0 and θ1: e.g., a broad π0(θ0) reflecting 'prior ignorance' (in any case much broader than the experimental resolution), and a π1(θ1) based on a previous measurement.
• Putting this into Bayes' theorem gives
      p(θ0, θ1 | x) ∝ L(x | θ0, θ1) π0(θ0) π1(θ1).

Bayesian method (continued)
• We then integrate (marginalize) p(θ0, θ1 | x) to find p(θ0 | x):
      p(θ0 | x) = ∫ p(θ0, θ1 | x) dθ1
• In this example the integral can be done in closed form (rare).
• The ability to marginalize over nuisance parameters is an important feature of Bayesian statistics.
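This workflow is easy to sketch numerically. Below is a minimal Python sketch of a grid-based version, using an illustrative Gaussian toy (the model, numbers, and variable names are assumptions for illustration, not Cowan's actual example): a joint posterior for a parameter of interest θ0 and a nuisance parameter θ1 is built from likelihood × priors and then summed over θ1.

```python
import numpy as np

# Illustrative toy model (assumed, not from the talk): one observation x is
# Gaussian around theta0 + theta1. theta0 gets a flat 'ignorance' prior;
# theta1 gets a Gaussian prior from a hypothetical previous measurement.
x_obs, sigma = 1.2, 0.5            # observed value and resolution (assumed)
t1_prev, t1_err = 0.0, 0.3         # previous measurement of theta1 (assumed)

theta0 = np.linspace(-3.0, 3.0, 601)
theta1 = np.linspace(-3.0, 3.0, 601)
T0, T1 = np.meshgrid(theta0, theta1, indexing="ij")

likelihood = np.exp(-0.5 * ((x_obs - (T0 + T1)) / sigma) ** 2)
prior_t1 = np.exp(-0.5 * ((T1 - t1_prev) / t1_err) ** 2)

posterior = likelihood * prior_t1       # flat prior on theta0 = constant factor
marginal = posterior.sum(axis=1)        # marginalize: integrate out theta1
marginal /= np.trapz(marginal, theta0)  # normalize p(theta0|x)

mean = np.trapz(theta0 * marginal, theta0)
width = np.sqrt(np.trapz((theta0 - mean) ** 2 * marginal, theta0))
print(f"p(theta0|x): mean = {mean:.3f}, width = {width:.3f}")
```

For a Gaussian toy like this the marginalization can also be done in closed form (the width comes out as √(σ² + σ₁²)), echoing the remark above that closed-form integrals are the exception rather than the rule.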
Bayesian Statistics at Work: The Troublesome Extraction of the Angle α
Stéphane T'Jampens, LAPP (CNRS/IN2P3 & Université de Savoie)
J. Charles, A. Hocker, H. Lacker, F.R. Le Diberder, S. T'Jampens, hep-ph/0607246

Digression: Statistics
D.R. Cox, Principles of Statistical Inference, CUP (2006)
W.T. Eadie et al., Statistical Methods in Experimental Physics, NHP (1971)
www.phystat.org
Statistics tries to answer a wide variety of questions. There are two main (and different!) frameworks:
• Frequentist: probability is about the data (the randomness of measurements), given the model: P(data|model). It applies only to repeatable events (sampling theory). Hypothesis testing: given a model, assess the consistency of the data with a particular parameter value; varying the parameter value yields a 1−CL curve.
• Bayesian: probability is about the model (degree of belief), given the data: P(model|data) ∝ Likelihood(data; model) × Prior(model).

Bayesian Statistics in 1 Slide
• The Bayesian approach is based on the use of inverse probability (the "posterior"). Bayes' rule gives
      P(model|data) ∝ Likelihood(data; model) × Prior(model).
• Cox, Principles of Statistical Inference (2006): "it treats information derived from data ('likelihood') as on exactly equal footing with probabilities derived from vague and unspecified sources ('prior'). The assumption that all aspects of uncertainties are directly comparable is often unacceptable."
• "Nothing guarantees that my uncertainty assessment is any good for you: I'm just expressing an opinion (degree of belief). To convince you that it's a good uncertainty assessment, I need to show that the statistical model I created makes good predictions in situations where we know what the truth is, and the process of calibrating predictions against reality is inherently frequentist." (e.g., MC simulations)
• Uniform prior: a model of ignorance?

A central problem (Cox, Principles of Statistical Inference, 2006): specifying a prior distribution for a parameter about which nothing is known. The usual answer is a flat prior, which has several problems:
• It is not reparametrization invariant (it is metric dependent): uniform in θ is not uniform in z = cos θ.
• It favors large values too much: the prior probability for the range 0.1 to 1 is 10 times less than for the range 1 to 10.
• Flat priors in several dimensions may produce clearly unacceptable answers.
• In simple problems, appropriate* flat priors yield essentially the same answer as non-Bayesian sampling theory. However, in other situations, particularly those involving more than two parameters, ignorance priors lead to different and entirely unacceptable answers.
* (uniform prior for a scalar location parameter, Jeffreys' prior for a scalar scale parameter)

Uniform Prior in Multidimensional Parameter Space
• Hypersphere in a 6D space: one knows nothing about the individual Cartesian coordinates x, y, z, … What do we know about the radius r = √(x² + y² + …)?
• Quite a lot, it turns out: one has achieved the remarkable feat of learning something about the radius of the hypersphere, whereas one knew nothing about the Cartesian coordinates, and without making any experiment.
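This feat is easy to reproduce with a short Monte Carlo. The Python sketch below (the coordinate range [−1, 1] and the sample size are illustrative assumptions) draws the six coordinates from uniform 'ignorance' priors and histograms the induced distribution of the radius, which turns out to be sharply concentrated rather than flat.

```python
import numpy as np

# Uniform ('ignorance') priors on each of 6 Cartesian coordinates in [-1, 1].
rng = np.random.default_rng(1)
xyz = rng.uniform(-1.0, 1.0, size=(1_000_000, 6))

# Induced prior on the radius r = sqrt(x1^2 + ... + x6^2).
r = np.sqrt((xyz ** 2).sum(axis=1))

counts, edges = np.histogram(r, bins=50)
peak = counts.argmax()
print(f"mean r = {r.mean():.3f}, std dev = {r.std():.3f}")
print(f"most probable r in [{edges[peak]:.2f}, {edges[peak + 1]:.2f}]")
```

Before any experiment, the 'ignorant' priors on the coordinates already state that r is very probably near √2: the information was smuggled in by the choice of metric.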
Isospin Analysis: B→hh
J. Charles et al., hep-ph/0607246; Gronau/London (1990)
• The same amplitudes can be parametrized as modulus & argument (MA) or as real & imaginary parts (RI); flat priors in one parametrization are not flat in the other, so the outcome depends on this arbitrary choice.
• Improper posterior.

Isospin Analysis: removing information from B0→π0π0
• No model-independent constraint on α can be inferred in this case.
• Information on α is nevertheless extracted; it is introduced by the priors (where else?).

Conclusion
PHYSTAT conferences: http://www.phystat.org
• Statistics is not a science, it is mathematics (Nature will not decide for us). You will not learn it in physics books: go to the professional literature!
• There have been many attempts to define an "ignorance" prior that would "let the data speak by themselves", but none is convincing. Priors are informative.
• Quite generally, a prior that gives results that are reasonable from various viewpoints for a single parameter will have unappealing features if applied independently to many parameters. In a multiparameter space, credible Bayesian intervals generally under-cover.
• If the problem has some invariance properties, then the prior should have the corresponding structure.
• The specification of priors is fraught with pitfalls (especially in high dimensions). Examine the consequences of your assumptions (metric, priors, etc.) and check for robustness: vary your assumptions.
• Exploring the frequentist properties of the result should be strongly encouraged.

α[ππ]: B-factory status at LP07

Isospin analysis: reminder
• Neglecting EW penguins, the amplitudes of the SU(2)-related B→ππ modes are:
      √2 A+0 = √2 A(Bu → π+π0) = e^(−iα) (T+− + T00)      √2 Ā+0 = e^(+iα) (T+− + T00)     [ΔΦ = 2α]
      A+−  =  A(Bd → π+π−)  = e^(−iα) T+− + P+−           Ā+−  = e^(+iα) T+− + P+−          [ΔΦ = 2αeff]
      √2 A00 = √2 A(Bd → π0π0) = e^(−iα) T00 − P+−        √2 Ā00 = e^(+iα) T00 − P+−
• SU(2) triangular relation: A+0 = A+−/√2 + A00 (and likewise for the Ā amplitudes).
• The same holds for B→ρρ decays dominated by the longitudinally polarized ρ (a CP-even final state).
• What the observables give:
   B+0: |A+0| = |Ā+0|
   B+−, C+−: |A+−|, |Ā+−|
   S+−: sin(2αeff), a 2-fold αeff in [0, π]
   B00, C00: |A00|, |Ā00|; closing the SU(2) triangles gives an 8-fold α
   S00: the relative phase between A00 and Ā00
[Figure: the B and Bbar SU(2) triangles, with sides A+0, A+−/√2, A00, drawn in the complex (Re, Im) plane with the angle α between them.]

Isospin analysis: reminder (continued)
• sin(2αeff) from B → (π/ρ)+ (π/ρ)−: 2 solutions for αeff in [0, π].
• Δα = α − αeff from the SU(2) B/Bbar triangles: 1, 2, or 4 solutions for Δα, depending on the triangles' closure, hence 2, 4, or 8 solutions for α = αeff + Δα.
[Figure: B and Bbar triangle configurations (sides A00/A+0 and A+−/(√2 A+0)) for three cases: ππ with C00 but no S00, ρρ with no C00/S00, and ρρ with both C00 and S00, illustrating the 4-fold, 2-fold, and 1-fold ('plateau' or peak) ambiguities in Δα.]

Developments in Bayesian Priors
Roger Barlow, Manchester IoP meeting, November 16th 2005

Plan
• Probability: frequentist, Bayesian
• Bayes' theorem: priors
• Prior pitfalls (1): Le Diberder
• Prior pitfalls (2): Heinrich
• Jeffreys' prior: Fisher information
• Reference priors: Demortier

Probability
• Probability as the limit of frequency: P(A) = lim N_A/N_total.
• This is the usual definition taught to students. It makes sense, and it works well most of the time, but not all.

Frequentist probability
• A frequentist cannot say "It will probably rain tomorrow", but can say "The statement 'It will rain tomorrow' is probably true."
• Likewise, "Mt = 174.3 ± 5.1 GeV" cannot mean "the top quark mass lies between 169.2 and 179.4 GeV with 68% probability"; it means "the top quark mass lies between 169.2 and 179.4 GeV, at 68% confidence."

Bayesian probability
• P(A) expresses my belief that A is true.
• Limits: 0 (impossible) and 1 (certain).
• Calibrated off clear-cut instances (coins, dice, urns).

Frequentist versus Bayesian?
• Two sorts of probability, totally different. (Bayesian probability is also known as inverse probability.)
• Rivals? Religious differences? Particle physicists tend to be frequentists, cosmologists tend to be Bayesians.
• No: they are two different tools for practitioners. It is important to be aware of the limits and pitfalls of both, and always to be aware which one you are using.

Bayes' Theorem (1763)
      P(A|B) P(B) = P(A and B) = P(B|A) P(A)
      ⟹ P(A|B) = P(B|A) P(A) / P(B)
• Frequentist use, e.g., a Čerenkov counter: P(π | signal) = P(signal | π) P(π) / P(signal).
• Bayesian use: P(theory | data) = P(data | theory) P(theory) / P(data).

Bayesian Prior
• P(theory) is the prior. It expresses the prior belief that the theory is true, and it can be a function of a parameter: P(Mtop), P(MH), P(α, β, γ).
• Bayes' theorem describes the way prior belief is modified by experimental data. But what do you take as the initial prior?

Uniform Prior
• General usage: choose P(a) uniform in a (the principle of insufficient reason).
• The prior is often 'improper': ∫P(a) da = ∞. The posterior P(a|x) nevertheless comes out sensible.
• BUT: if P(a) is uniform, then P(a²), P(ln a), P(√a), … are not (see the sketch below). Insufficient reason is not valid (unless a is 'most fundamental', whatever that means).
• Statisticians handle this by checking results for 'robustness' under different priors.
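The non-invariance is easy to demonstrate numerically. A minimal Python sketch (the range [0.1, 10] and the sample size are illustrative assumptions): draw a uniformly, transform it, and check how uniform the transformed variable looks.

```python
import numpy as np

# Uniform ('insufficient reason') prior on a over [0.1, 10].
rng = np.random.default_rng(0)
a = rng.uniform(0.1, 10.0, size=1_000_000)

# The same prior expressed in other parametrizations of the same quantity:
# a flat density in a is far from flat in a^2, ln a, or sqrt(a).
for name, t in [("a^2", a**2), ("ln a", np.log(a)), ("sqrt(a)", np.sqrt(a))]:
    counts, _ = np.histogram(t, bins=20)
    print(f"P({name}): max/min bin occupancy = {counts.max() / counts.min():.1f}"
          "  (would be ~1.0 if uniform)")

# The flat prior also favors large values of a:
p_low = ((a > 0.1) & (a < 1.0)).mean()
p_high = ((a > 1.0) & (a < 10.0)).mean()
print(f"P(0.1 < a < 1) = {p_low:.3f},  P(1 < a < 10) = {p_high:.3f}")
```

The last two lines reproduce the factor of 10 noted earlier: the flat prior gives the decade 1 to 10 ten times the probability of the decade 0.1 to 1, which is hardly a statement of ignorance about a scale parameter.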