Fourth lecture: Goodness of fit, confidence intervals and limits
Jorge Andre Swieca School, Campos do Jordão, January 2003

References
• Statistical Data Analysis, G. Cowan, Oxford, 1998
• Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, R. Barlow, J. Wiley & Sons, 1989
• Review of Particle Physics, Particle Data Group (PDG), 2002 electronic edition
• Data Analysis: Statistical and Computational Methods for Scientists and Engineers, S. Brandt, Third Edition, Springer, 1999

Limits
“Do you, like Hamlet, dread the unknown? But what is known? What is it that you know, that you should call anything in particular unknown?” Álvaro de Campos (Fernando Pessoa)
“If you have the truth, keep it!” Lisbon Revisited, Álvaro de Campos

Statistical tests
How well do the data agree with given predicted probabilities, i.e. with a hypothesis?
null hypothesis $H_0$: $f(x|H_0)$
alternative hypotheses: $f(x|H_1)$, $f(x|H_2)$, ...
test statistic, a function of the measured variables: $t(x)$, with p.d.f. $g(t|H_0)$, $g(t|H_1)$
$$\alpha = \int_{t_{cut}}^{\infty} g(t|H_0)\,dt$$
error of the first kind; $\alpha$ is the significance level
$$\beta = \int_{-\infty}^{t_{cut}} g(t|H_1)\,dt$$
error of the second kind
power $= 1-\beta$: the power to discriminate against $H_1$

Neyman-Pearson lemma
Where to place $t_{cut}$? ($H_0$: signal, $H_1$: background)
1-D: set by the efficiency (and purity)
m-D: $t = (t_1, \ldots, t_m)$, the definition of the acceptance region is not obvious
Neyman-Pearson lemma: the highest power (highest signal purity) for a given significance level $\alpha$ is obtained with the region of t-space such that
$$\frac{g(t|H_0)}{g(t|H_1)} > c$$
where $c$ is determined by the desired efficiency.

Goodness of fit
How well is a given null hypothesis $H_0$ compatible with the observed data (no reference to any alternative hypothesis)?
coins: $N$ tosses, $n_h$ heads, $n_t = N - n_h$ tails
test statistic: $n_h$
Is the coin "fair"? Are H and T equally likely?
binomial distribution, $p = 0.5$:
$$f(n_h; N) = \frac{N!}{n_h!\,(N-n_h)!} \left(\tfrac{1}{2}\right)^{n_h} \left(\tfrac{1}{2}\right)^{N-n_h}$$
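The binomial probability above can be evaluated directly. The following is a minimal sketch, not part of the lecture: it sums $f(n_h; N)$ over all outcomes at least as far from the expectation $N/2$ as a given count. The function names `binomial_pmf` and `two_sided_tail` are illustrative assumptions, and the example uses 17 heads in 20 tosses.

```python
from math import comb


def binomial_pmf(n_heads, n_tosses, p=0.5):
    """f(n_h; N): probability of n_heads heads in n_tosses tosses."""
    return comb(n_tosses, n_heads) * p**n_heads * (1 - p) ** (n_tosses - n_heads)


def two_sided_tail(n_obs, n_tosses):
    """Sum f(k; N) over all k at least as far from N/2 as n_obs (fair coin)."""
    centre = n_tosses / 2
    distance = abs(n_obs - centre)
    return sum(
        binomial_pmf(k, n_tosses)
        for k in range(n_tosses + 1)
        if abs(k - centre) >= distance
    )


# Example values (assumed for illustration): 17 heads in 20 tosses.
p_tail = two_sided_tail(17, 20)
print(f"{p_tail:.4f}")  # prints 0.0026
```

The sum runs over both tails because a fair-coin hypothesis is contradicted equally by too many heads and by too few.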
$N = 20$, $n_h = 17$, $E[n_h] = Np = 10$
Outcomes as far or farther from 10 than the observed 17: $n_h$ = 0, 1, 2, 3, 17, 18, 19, 20
$$P = f(0;20) + f(1;20) + f(2;20) + f(3;20) + f(17;20) + f(18;20) + f(19;20) + f(20;20) = 0.0026$$

P-value: the probability $P$, under $H_0$, of obtaining a result as compatible or less compatible with $H_0$ than the one actually observed.
The P-value is a random variable; $\alpha$ is a constant specified before carrying out the test.
Bayesian statistics: use Bayes' theorem to assign a probability to $H_0$ (this requires specifying the prior probability).
The P-value is often interpreted incorrectly as the probability of $H_0$.
P-value: the fraction of times one would obtain data as compatible with $H_0$ or less so, if the experiment (20 coin tosses) were repeated under similar circumstances.

"Optional stopping problem"
Here it is easy to identify the region of values of $t$ with an equal or lesser degree of compatibility with the hypothesis than the observed value (alternative hypothesis: $p \neq 0.5$).

Significance of an observed signal
Is a discrepancy between data and expectation sufficiently significant to merit a claim for a new discovery?
signal events $n_s$: Poisson variable with mean $\nu_s$
background events $n_b$: Poisson variable with mean $\nu_b$
$n = n_s + n_b$, $\nu = \nu_s + \nu_b$
prob. to observe $n$ events:
$$f(n; \nu_s, \nu_b) = \frac{(\nu_s+\nu_b)^n}{n!}\, e^{-(\nu_s+\nu_b)}$$
experiment: $n_{obs}$ events; quantify our degree of confidence in the discovery of a new effect ($\nu_s \neq 0$).
How likely is it to find $n_{obs}$ events or more from background alone?
$$P(n \geq n_{obs}) = \sum_{n=n_{obs}}^{\infty} f(n; \nu_s=0, \nu_b) = 1 - \sum_{n=0}^{n_{obs}-1} f(n; \nu_s=0, \nu_b) = 1 - \sum_{n=0}^{n_{obs}-1} \frac{\nu_b^n}{n!}\, e^{-\nu_b}$$
Ex: expect $\nu_b = 0.5$, observe $n_{obs} = 5$: $P(n \geq n_{obs}) = 1.7 \times 10^{-4}$
This is not the prob. of the hypothesis $\nu_s = 0$!
It is the prob., under the hypothesis $\nu_s = 0$, of obtaining as many events as observed or more.

How to report the measurement?
estimate of $\nu$: $5 \pm \sqrt{5}$
$\hat{\nu}_s = 4.5 \pm 2.2$
misleading:
• only two std. deviations from zero
• gives the impression that $\nu_s$ is not very incompatible with zero
yes: the prob. that a Poisson variable of mean $\nu_b$ will fluctuate up to $n_{obs}$ or higher
no: the prob.
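that a variable with mean $n_{obs}$ will fluctuate down to $\nu_b$ or lower

Pearson's $\chi^2$ test
histogram of $x$ with $N$ bins: $n_i$ observed entries, $\nu_i$ expected entries in bin $i$
Construct a statistic which reflects the level of agreement between the observed and expected histograms.
data $\mathbf{n} = (n_1, \ldots, n_N)$, expectation $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_N)$
$$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\nu_i}$$
If the $n_i$ are Poisson distributed with means $\nu_i$ and $n_i \geq 5$ (approx. Gaussian), the statistic follows a $\chi^2$ distribution for $N$ degrees of freedom
• regardless of the distribution of $x$
• distribution free
larger $\chi^2$: larger discrepancy between the data and the hypothesis

P-value:
$$P = \int_{\chi^2}^{\infty} f(z; n_d)\,dz$$
$E[\chi^2] = n_d$, so $\chi^2 / n_d \approx 1$ (rule of thumb for a good fit)
$\chi^2 = 15$, $n_d = 10$: $P = 0.13$
$\chi^2 = 150$, $n_d = 100$: $P = 9.0 \times 10^{-4}$

Before: $n_{tot} = \sum_{i=1}^{N} n_i$ is a Poisson variable with mean $\nu_{tot} = \sum_{i=1}^{N} \nu_i$.
Now set $n_{tot}$ fixed: the $n_i$ are distributed as multinomial with prob. $p_i = \nu_i / n_{tot}$.
We are not testing the total number of expected and observed events, only the distribution of $x$.
$$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - p_i n_{tot})^2}{p_i n_{tot}}$$
With a large number of entries in each bin and the $p_i$ known, this follows a $\chi^2$ distribution for $N-1$ degrees of freedom.
In general, if $m$ parameters are estimated from the data, $n_d = N - m$.

Standard deviation as stat. error
$n$ observations of $x$, hypothesis p.d.f. $f(x; \theta)$
ML: $\hat{\theta}(x_1, \ldots, x_n)$ estimator for $\theta$
standard deviation $\sigma_{\hat{\theta}}$ from: analytic method, RCF bound, Monte Carlo, graphical method
measurement: $\hat{\theta} \pm \hat{\sigma}_{\hat{\theta}}$
Repeated estimates, each based on $n$ obs.: the estimator distribution $g(\hat{\theta}; \theta)$ is centered around the true value $\theta$ with true standard deviation $\sigma_{\hat{\theta}}$, which are estimated by $\hat{\theta}$ and $\hat{\sigma}_{\hat{\theta}}$.
Most practical estimators: $g(\hat{\theta}; \theta)$ becomes approx. Gaussian in the large sample limit.

Classical confidence intervals
$n$ obs. of $x$; evaluate an estimator $\hat{\theta}(x_1, \ldots, x_n)$ for a param. $\theta$
$\hat{\theta}_{obs}$ is obtained, and the p.d.f. of the estimator is $g(\hat{\theta}; \theta)$ (for a given, unknown $\theta$)
prob. $\alpha$ that $\hat{\theta} \geq u_\alpha(\theta)$; prob. $\beta$ that $\hat{\theta} \leq v_\beta(\theta)$:
$$\alpha = P(\hat{\theta} \geq u_\alpha(\theta)) = \int_{u_\alpha(\theta)}^{\infty} g(\hat{\theta}; \theta)\,d\hat{\theta} = 1 - G(u_\alpha(\theta); \theta)$$
$$\beta = P(\hat{\theta} \leq v_\beta(\theta)) = \int_{-\infty}^{v_\beta(\theta)} g(\hat{\theta}; \theta)\,d\hat{\theta} = G(v_\beta(\theta); \theta)$$
prob.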
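for the estimator to be inside the belt, regardless of $\theta$:
$$P(v_\beta(\theta) \leq \hat{\theta} \leq u_\alpha(\theta)) = 1 - \alpha - \beta$$
$u_\alpha(\theta)$, $v_\beta(\theta)$ are monotonic increasing functions of $\theta$, so define
$$a(\hat{\theta}) \equiv u_\alpha^{-1}(\hat{\theta}), \qquad b(\hat{\theta}) \equiv v_\beta^{-1}(\hat{\theta})$$
$\hat{\theta} \geq u_\alpha(\theta)$ implies $a(\hat{\theta}) \geq \theta$
$\hat{\theta} \leq v_\beta(\theta)$ implies $b(\hat{\theta}) \leq \theta$
$$P(a(\hat{\theta}) \geq \theta) = \alpha, \qquad P(b(\hat{\theta}) \leq \theta) = \beta$$
$$P(a(\hat{\theta}) \leq \theta \leq b(\hat{\theta})) = 1 - \alpha - \beta$$

Usually: central confidence interval, $\alpha = \beta = \gamma/2$
$$P(a \leq \theta \leq b) = 1 - \gamma$$
$a$: the hypothetical value of $\theta$ for which a fraction $\alpha$ of the repeated estimates $\hat{\theta}$ would be higher than the obtained $\hat{\theta}_{obs}$ ($b$: the value for which a fraction $\beta$ would be lower).
$$\hat{\theta}_{obs} = u_\alpha(a) = v_\beta(b)$$
$$\alpha = \int_{\hat{\theta}_{obs}}^{\infty} g(\hat{\theta}; a)\,d\hat{\theta} = 1 - G(\hat{\theta}_{obs}; a)$$
$$\beta = \int_{-\infty}^{\hat{\theta}_{obs}} g(\hat{\theta}; b)\,d\hat{\theta} = G(\hat{\theta}_{obs}; b)$$

Relationship between a conf. interval and a test of goodness of fit:
Test the hypothesis $\theta = a$ using $\hat{\theta}_{obs}$, taking values $\hat{\theta} \geq \hat{\theta}_{obs}$ as having equal or less agreement than the result obtained. Then the P-value $= \alpha$: the P-value is a random variable and $\theta = a$ is specified.
Confidence interval: $\alpha$ is specified first, and $a$ is a random quantity depending on the data.
The interval $[a, b]$ is reported as $\hat{\theta}^{+d}_{-c}$, with $c = \hat{\theta} - a$ and $d = b - \hat{\theta}$.

Many experiments: the interval would include the true value $\theta$ in a fraction $1 - \alpha - \beta$ of them.
It does not mean that the probability that the true value of $\theta$ is in the fixed interval is $1 - \alpha - \beta$.
Frequency interpretation: $\theta$ is not a random variable, but the interval fluctuates since it is constructed from data.

Gaussian distributed $\hat{\theta}$
Simple and very important application.
Central limit theorem: any estimator that is a linear function of a sum of random variables becomes Gaussian in the large sample limit.
$$G(\hat{\theta}; \theta, \sigma_{\hat{\theta}}) = \int_{-\infty}^{\hat{\theta}} \frac{1}{\sqrt{2\pi\sigma_{\hat{\theta}}^2}} \exp\left(-\frac{(\hat{\theta}' - \theta)^2}{2\sigma_{\hat{\theta}}^2}\right) d\hat{\theta}'$$
$\sigma_{\hat{\theta}}$ known, experiment resulted in $\hat{\theta}_{obs}$:
$$\alpha = 1 - G(\hat{\theta}_{obs}; a, \sigma_{\hat{\theta}}) = 1 - \Phi\left(\frac{\hat{\theta}_{obs} - a}{\sigma_{\hat{\theta}}}\right)$$
$$\beta = G(\hat{\theta}_{obs}; b, \sigma_{\hat{\theta}}) = \Phi\left(\frac{\hat{\theta}_{obs} - b}{\sigma_{\hat{\theta}}}\right)$$
$$a = \hat{\theta}_{obs} - \sigma_{\hat{\theta}}\, \Phi^{-1}(1-\alpha)$$
$$b = \hat{\theta}_{obs} + \sigma_{\hat{\theta}}\, \Phi^{-1}(1-\beta)$$
using $\Phi^{-1}(\beta) = -\Phi^{-1}(1-\beta)$.

Choose the quantile:
central interval                       one-sided interval
$\Phi^{-1}(1-\gamma/2)$   $1-\gamma$          $\Phi^{-1}(1-\alpha)$   $1-\alpha$
1                 0.6827       1               0.8413
2                 0.9544       2               0.9772
3                 0.9973       3               0.9987

Choose the confidence level:
central interval                       one-sided interval
$1-\gamma$   $\Phi^{-1}(1-\gamma/2)$          $1-\alpha$   $\Phi^{-1}(1-\alpha)$
0.90     1.645                 0.90     1.282
0.95     1.960                 0.95     1.645
0.99     2.576                 0.99     2.326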
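The Gaussian interval formulas and the quantile tables above can be checked numerically. The following is a minimal sketch, not part of the lecture: it uses Python's standard-library `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$. The function names `gaussian_interval` and `central_interval`, and the example values $\hat{\theta}_{obs} = 10$, $\sigma_{\hat{\theta}} = 2$, are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian: cdf is Phi, inv_cdf is Phi^-1


def gaussian_interval(theta_obs, sigma, alpha, beta):
    """Limits a, b for a Gaussian estimator with known standard deviation.

    a = theta_obs - sigma * Phi^-1(1 - alpha)
    b = theta_obs + sigma * Phi^-1(1 - beta)
    """
    a = theta_obs - sigma * STD_NORMAL.inv_cdf(1 - alpha)
    b = theta_obs + sigma * STD_NORMAL.inv_cdf(1 - beta)
    return a, b


def central_interval(theta_obs, sigma, conf_level):
    """Central interval: alpha = beta = gamma / 2, with conf_level = 1 - gamma."""
    gamma = 1 - conf_level
    return gaussian_interval(theta_obs, sigma, gamma / 2, gamma / 2)


# Quantiles for chosen confidence levels (central and one-sided).
for cl in (0.90, 0.95, 0.99):
    central_q = STD_NORMAL.inv_cdf(1 - (1 - cl) / 2)
    one_sided_q = STD_NORMAL.inv_cdf(cl)
    print(f"{cl:.2f}  central: {central_q:.3f}  one-sided: {one_sided_q:.3f}")

# Example values (assumed for illustration): theta_obs = 10, sigma = 2.
a, b = central_interval(10.0, 2.0, 0.95)
print(f"95% central interval: [{a:.2f}, {b:.2f}]")
```

Passing unequal `alpha` and `beta` to `gaussian_interval` gives a non-central interval, and keeping only `a` or only `b` gives a one-sided limit.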