Fourth lecture
Goodness of fit, confidence intervals and limits
Jorge Andre Swieca School
Campos do Jordão, January 2003
Limits
“Do you, like Hamlet, dread the unknown?
But what is the known? What is it that you know,
that you should call anything in particular
unknown?”
Álvaro de Campos (Fernando Pessoa)
“If you have the truth, keep it!”
Lisbon Revisited, Álvaro de Campos
Statistical tests
How well do the data agree with given predicted probabilities, i.e. with a hypothesis?
Null hypothesis $H_0$: $f(x \mid H_0)$
Alternatives: $f(x \mid H_1)$, $f(x \mid H_2)$, ...
Test statistic: a function of the measured variables, $t(\mathbf{x})$, with p.d.f. $g(t \mid H_0)$
Significance level (error of the first kind):
$$\alpha = \int_{t_{\rm cut}}^{\infty} g(t \mid H_0)\, dt$$
Error of the second kind:
$$\beta = \int_{-\infty}^{t_{\rm cut}} g(t \mid H_1)\, dt$$
Power to discriminate against $H_1$: power $= 1 - \beta$
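A minimal sketch of these definitions, assuming (hypothetically, not from the lecture) that the test statistic is Gaussian with unit width, centred at 0 under $H_0$ and at 2 under $H_1$, with $H_0$ rejected for $t > t_{\rm cut}$. The function `error_rates` is an illustrative name.

```python
from statistics import NormalDist

# Hypothetical p.d.f.s of the test statistic (assumption, not from the lecture):
# g(t|H0) is N(0, 1) and g(t|H1) is N(2, 1); H0 is rejected when t > t_cut.
g_H0 = NormalDist(mu=0.0, sigma=1.0)
g_H1 = NormalDist(mu=2.0, sigma=1.0)


def error_rates(t_cut):
    """Return (alpha, beta, power) for the cut t > t_cut."""
    alpha = 1.0 - g_H0.cdf(t_cut)  # integral of g(t|H0) from t_cut to +infinity
    beta = g_H1.cdf(t_cut)         # integral of g(t|H1) from -infinity to t_cut
    return alpha, beta, 1.0 - beta


for t_cut in (1.0, 1.645, 2.0):
    alpha, beta, power = error_rates(t_cut)
    print(f"t_cut = {t_cut:5.3f}: alpha = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Moving $t_{\rm cut}$ upwards lowers α but raises β; choosing $t_{\rm cut}$ means settling this trade-off.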
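Neyman-Pearson lemma
Where to place $t_{\rm cut}$?
$H_0$: signal; $H_1$: background
1-D: the cut is set by the desired efficiency (and purity)
m-D: $\mathbf{t} = (t_1, \ldots, t_m)$; the definition of the acceptance region is not obvious
Neyman-Pearson lemma: the highest power (highest signal purity) for a given significance level α is obtained by taking as acceptance region for $H_0$ the region of t-space such that
$$\frac{g(\mathbf{t} \mid H_0)}{g(\mathbf{t} \mid H_1)} > c,$$
where $c$ is determined by the desired efficiency.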
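The lemma can be illustrated with a small Monte Carlo sketch. It assumes hypothetical 2-D Gaussian densities for signal ($H_0$) and background ($H_1$), picks $c$ so that a fraction $1 - \alpha$ of simulated signal passes the cut on $g(\mathbf{t} \mid H_0)/g(\mathbf{t} \mid H_1)$, and reports the fraction of background that still passes. All names, densities and parameter values are illustrative.

```python
import math
import random


def gauss2(t, mu, sigma):
    """2-D Gaussian with independent components of common width sigma."""
    r2 = (t[0] - mu[0]) ** 2 + (t[1] - mu[1]) ** 2
    return math.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def likelihood_ratio(t):
    # Hypothetical model: g(t|H0) signal centred at (1, 1) with width 1,
    # g(t|H1) background centred at (0, 0) with width 2.
    return gauss2(t, (1.0, 1.0), 1.0) / gauss2(t, (0.0, 0.0), 2.0)


rng = random.Random(1)
n = 20000
signal = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(n)]
background = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)) for _ in range(n)]

alpha = 0.10  # significance level: fraction of signal lost, efficiency = 1 - alpha
ratios = sorted(likelihood_ratio(t) for t in signal)
c = ratios[int(alpha * n)]  # a fraction alpha of the signal has a ratio below c

efficiency = sum(likelihood_ratio(t) > c for t in signal) / n
bkg_accepted = sum(likelihood_ratio(t) > c for t in background) / n
print(f"c = {c:.3f}, signal efficiency = {efficiency:.3f}, background accepted = {bkg_accepted:.3f}")
```

Here both densities are known exactly, so the ratio can be evaluated directly; the sketch does not address how to obtain them when they are only available as simulated samples.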
Goodness of fit
How well is a given null hypothesis $H_0$ compatible with the observed data (without reference to any alternative hypothesis)?
Coins: N tosses, $n_h$ heads, $n_t = N - n_h$ tails
Is the coin “fair”, i.e. are H and T equally likely? Test statistic: $n_h$
Binomial distribution with $p = 0.5$:
$$f(n_h; N) = \frac{N!}{n_h!\,(N - n_h)!} \left(\tfrac{1}{2}\right)^{n_h} \left(\tfrac{1}{2}\right)^{N - n_h}$$
$N = 20$, $n_h = 17$; $E[n_h] = Np = 10$
Values of $n_h$ at least as far from 10 as the observed 17: $n_h = 0, 1, 2, 3$ and $n_h = 17, 18, 19, 20$
$$P = f(0;20) + f(1;20) + f(2;20) + f(3;20) + f(17;20) + f(18;20) + f(19;20) + f(20;20)$$
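The sum above can be checked directly. A minimal sketch using `math.comb` (Python 3.8+); the helper names are illustrative. Summing over all $n_h$ at least as far from $E[n_h]$ as the observed value reproduces the eight terms listed, because the binomial with $p = 0.5$ is symmetric.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """f(k; n) for a binomial distribution with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_sided_p_value(n_h, n, p=0.5):
    """Sum f(k; n) over all k at least as far from E[n_h] = n p as the observed n_h."""
    mean = n * p
    distance = abs(n_h - mean)
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k - mean) >= distance)


P = two_sided_p_value(17, 20)
print(f"P = {P:.4f}")  # prints P = 0.0026
```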
Goodness of fit
P-value: the probability P, under $H_0$, of obtaining a result as compatible or less compatible with $H_0$ than the one actually observed.
The P-value is a random variable; α is a constant specified before carrying out the test.
P = 0.0026
Bayesian statistics: use Bayes' theorem to assign a probability to $H_0$ (this requires specifying the prior probability).
The P-value is often incorrectly interpreted as the probability of $H_0$.
P-value: the fraction of times one would obtain data as compatible with $H_0$ or less so if the experiment (20 coin tosses) were repeated under similar circumstances.
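The frequency interpretation can be made concrete by simulating many repetitions of the 20-toss experiment under $H_0$ and counting how often the outcome is at least as far from 10 heads as the observed 17. The function name, the number of repetitions and the seed are arbitrary choices for illustration.

```python
import random


def fraction_as_extreme(n_experiments, n_tosses=20, n_h_obs=17, seed=2003):
    """Repeat the coin experiment under H0 (p = 0.5) and return the fraction of
    outcomes at least as far from n_tosses / 2 as the observed n_h."""
    rng = random.Random(seed)
    mean = n_tosses / 2
    distance_obs = abs(n_h_obs - mean)
    count = 0
    for _ in range(n_experiments):
        n_h = sum(rng.random() < 0.5 for _ in range(n_tosses))  # heads in one experiment
        if abs(n_h - mean) >= distance_obs:
            count += 1
    return count / n_experiments


frac = fraction_as_extreme(100_000)
print(f"fraction of experiments as extreme as n_h = 17: {frac:.4f}")
```

With $10^5$ repetitions the fraction should agree with the exact value of about 0.0026 to within its statistical uncertainty of roughly $1.6 \times 10^{-4}$.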
Goodness of fit
“Optional stopping problem”
It is easy to identify the region of values of t with an equal or lesser degree of compatibility with the hypothesis than the observed value (alternative hypothesis: p ≠ 0.5).
Significance of an observed signal
Is a discrepancy between data and expectation significant enough to merit a claim of a new discovery?
Signal events $n_s$: Poisson variable with mean $\nu_s$
Background events $n_b$: Poisson variable with mean $\nu_b$
$n = n_s + n_b$: Poisson variable with mean $\nu = \nu_s + \nu_b$
Probability to observe n events:
$$f(n; \nu_s, \nu_b) = \frac{(\nu_s + \nu_b)^n}{n!}\, e^{-(\nu_s + \nu_b)}$$
Experiment: $n_{\rm obs}$ events; quantify our degree of confidence in the discovery of a new effect ($\nu_s \neq 0$).
How likely is it to find $n_{\rm obs}$ events or more from background alone?
Significance of an observed signal
$$P(n \geq n_{\rm obs}) = \sum_{n=n_{\rm obs}}^{\infty} f(n; \nu_s = 0, \nu_b) = 1 - \sum_{n=0}^{n_{\rm obs}-1} f(n; \nu_s = 0, \nu_b) = 1 - \sum_{n=0}^{n_{\rm obs}-1} \frac{\nu_b^n}{n!}\, e^{-\nu_b}$$
Ex: expect $\nu_b = 0.5$, observe $n_{\rm obs} = 5$: $P(n \geq n_{\rm obs}) = 1.7 \times 10^{-4}$
This is not the probability of the hypothesis $\nu_s = 0$!
It is the probability, under the hypothesis $\nu_s = 0$, of obtaining as many events as observed or more.
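A direct transcription of the sum above, assuming only the Python standard library; `p_value_background_only` is an illustrative name.

```python
from math import exp, factorial


def p_value_background_only(n_obs, nu_b):
    """P(n >= n_obs) for a Poisson variable of mean nu_b, i.e. under nu_s = 0."""
    p_below = sum(nu_b ** n * exp(-nu_b) / factorial(n) for n in range(n_obs))
    return 1.0 - p_below


P = p_value_background_only(5, 0.5)
print(f"P(n >= 5) = {P:.1e}")  # prints P(n >= 5) = 1.7e-04
```

For large $n_{\rm obs}$ and very small P, subtracting the sum from 1 loses precision; summing the tail from $n_{\rm obs}$ upwards would then be preferable.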
Significance of an observed signal
How to report the measurement?
Estimate of ν: $\hat{\nu} = n_{\rm obs} = 5 \pm \sqrt{5}$
$\hat{\nu}_s = \hat{\nu} - \nu_b = 4.5 \pm 2.2$
Misleading:
• only two standard deviations from zero
• gives the impression that $\nu_s$ is not very incompatible with zero
Yes: the probability that a Poisson variable of mean $\nu_b$ will fluctuate up to $n_{\rm obs}$ or higher
No: the probability that a variable with mean $n_{\rm obs}$ will fluctuate down to $\nu_b$ or lower
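To see why the “$4.5 \pm 2.2$” summary misleads, the sketch below compares the one-sided Gaussian tail probability for about two standard deviations with the Poisson probability computed under $\nu_s = 0$. Reading the naive summary through a Gaussian tail is our illustrative assumption; the numbers are those of the example.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

n_obs, nu_b = 5, 0.5
nu_s_hat = n_obs - nu_b                # estimate of nu_s: 4.5
sigma = sqrt(n_obs)                    # estimated std. dev. of n_obs: about 2.2
z = nu_s_hat / sigma                   # "about two standard deviations from zero"
p_gauss = 1.0 - NormalDist().cdf(z)    # one-sided Gaussian tail beyond z

# Right question: a Poisson variable of mean nu_b fluctuating up to n_obs or higher
p_poisson = 1.0 - sum(nu_b ** n * exp(-nu_b) / factorial(n) for n in range(n_obs))

print(f"nu_s = {nu_s_hat} +/- {sigma:.1f}, z = {z:.2f}")
print(f"Gaussian tail at z: {p_gauss:.3f}, Poisson P(n >= n_obs): {p_poisson:.1e}")
```

The Gaussian tail comes out at roughly 0.02, more than a hundred times larger than the Poisson probability.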
Pearson’s $\chi^2$ test
Histogram of x with N bins: data $\mathbf{n} = (n_1, \ldots, n_N)$, expected numbers $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_N)$
Construct a statistic which reflects the level of agreement between the observed and expected histograms:
$$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\nu_i}$$
If the $n_i$ are Poisson distributed and approximately Gaussian ($n_i \geq 5$), $\chi^2$ follows a $\chi^2$ distribution for N degrees of freedom
• regardless of the distribution of x
• i.e. the test is distribution free
Larger $\chi^2$: larger discrepancy between the data and the hypothesis
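A minimal sketch of the statistic for a hypothetical five-bin histogram; the function name and the warning for bins with $n_i < 5$ are our own choices.

```python
import warnings


def pearson_chi2(n, nu):
    """Pearson's statistic chi^2 = sum_i (n_i - nu_i)^2 / nu_i."""
    if len(n) != len(nu):
        raise ValueError("n and nu must have the same number of bins")
    if min(n) < 5:
        # the chi^2-distribution approximation assumes n_i >= 5 (roughly Gaussian bins)
        warnings.warn("some bins have n_i < 5; the chi^2 distribution may not apply")
    return sum((n_i - nu_i) ** 2 / nu_i for n_i, nu_i in zip(n, nu))


observed = [12, 20, 31, 18, 9]             # hypothetical histogram contents n_i
expected = [10.0, 22.0, 28.0, 20.0, 10.0]  # hypothetical predictions nu_i
chi2 = pearson_chi2(observed, expected)
print(f"chi2 = {chi2:.3f} for N = {len(observed)} degrees of freedom")
```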
Pearson’s $\chi^2$ test
$$P = \int_{\chi^2}^{\infty} f(z; n_d)\, dz, \qquad E[\chi^2] = n_d$$
$\chi^2 / n_d \approx 1$ (rule of thumb for a good fit)
$\chi^2 = 15$, $n_d = 10$: $P = 0.13$
$\chi^2 = 150$, $n_d = 100$: $P = 9.0 \times 10^{-4}$
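For even $n_d$ the integral has the closed form $P = e^{-\chi^2/2} \sum_{j=0}^{n_d/2 - 1} (\chi^2/2)^j / j!$, which is enough to check both examples ($n_d = 10$ and $n_d = 100$) with the standard library alone. The sketch is restricted to even $n_d$; for general $n_d$, `scipy.stats.chi2.sf(chi2, n_d)` computes the same tail probability.

```python
from math import exp


def chi2_sf_even(chi2, n_d):
    """P = integral of f(z; n_d) from chi2 to infinity, valid for EVEN n_d only.

    Uses the closed form exp(-x/2) * sum_{j < n_d/2} (x/2)^j / j!.
    """
    if n_d <= 0 or n_d % 2 != 0:
        raise ValueError("this closed form needs a positive even n_d")
    half = chi2 / 2.0
    term = exp(-half)       # j = 0 term
    total = term
    for j in range(1, n_d // 2):
        term *= half / j    # builds exp(-x/2) (x/2)^j / j! iteratively
        total += term
    return total


print(f"chi2 = 15,  n_d = 10:  P = {chi2_sf_even(15, 10):.2f}")
print(f"chi2 = 150, n_d = 100: P = {chi2_sf_even(150, 100):.1e}")
```

Both cases have $\chi^2 / n_d = 1.5$, yet their P-values differ by more than two orders of magnitude.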
Pearson’s $\chi^2$ test
Before: $n_{\rm tot} = \sum_{i=1}^{N} n_i$ is a Poisson variable with mean $\nu_{\rm tot} = \sum_{i=1}^{N} \nu_i$
Set $n_{\rm tot}$ fixed: the $n_i$ are distributed as a multinomial with probabilities $p_i = \nu_i / n_{\rm tot}$
This tests not the total number of expected and observed events, but only the distribution of x.
$$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - p_i n_{\rm tot})^2}{p_i n_{\rm tot}}$$
With a large number of entries in each bin and the $p_i$ known, this follows a $\chi^2$ distribution for N − 1 degrees of freedom.
In general, if m parameters are estimated from the data, $n_d = N - m$.
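A sketch of the fixed-$n_{\rm tot}$ version, assuming known bin probabilities $p_i$ and a hypothetical four-bin histogram. The returned $n_d = N - 1 - m$ treats `m` as the number of parameters fitted in addition to the fixed total; this bookkeeping and the function name are our assumptions.

```python
def pearson_chi2_fixed_total(n, p, m=0):
    """Pearson chi^2 with n_tot fixed: sum_i (n_i - p_i n_tot)^2 / (p_i n_tot).

    Returns (chi2, n_d) with n_d = N - 1 - m, where m (hypothetical parameter)
    counts parameters estimated from the data besides the fixed total.
    """
    if len(n) != len(p):
        raise ValueError("n and p must have the same number of bins")
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("bin probabilities p_i must sum to 1")
    n_tot = sum(n)
    chi2 = sum((n_i - p_i * n_tot) ** 2 / (p_i * n_tot) for n_i, p_i in zip(n, p))
    return chi2, len(n) - 1 - m


chi2, n_d = pearson_chi2_fixed_total([30, 20, 25, 25], [0.25] * 4)
print(f"chi2 = {chi2:.2f} for n_d = {n_d}")
```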
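Standard deviation as statistical error
n observations of x, hypothesis p.d.f. $f(x; \theta)$
ML: $\hat{\theta}(x_1, \ldots, x_n)$, estimator for θ
Ways to find its standard deviation: analytic method, RCF (Rao-Cramér-Fréchet) bound, Monte Carlo, graphical
Measurement: $\hat{\theta} \pm \hat{\sigma}_{\hat{\theta}}$, with $\hat{\sigma}_{\hat{\theta}}$ the estimated standard deviation
Repeated estimates, each based on n observations: the estimator is distributed as $g(\hat{\theta}; \theta)$, centred around the true value θ and with true standard deviation $\sigma_{\hat{\theta}}$, estimated by $\hat{\theta}$ and $\hat{\sigma}_{\hat{\theta}}$
Most practical estimators: $g(\hat{\theta}; \theta)$ becomes approximately Gaussian in the large-sample limit.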
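Of the methods listed, the Monte Carlo one is the easiest to sketch: generate many data sets with θ set to the estimated value, recompute $\hat{\theta}$ for each, and take the spread. The sketch assumes a hypothetical exponential p.d.f. $f(x; \tau) = e^{-x/\tau}/\tau$, whose ML estimator is the sample mean, so the analytic answer $\tau/\sqrt{n}$ is available as a check; names, seed and sample sizes are illustrative.

```python
import random
import statistics


def mc_sigma_of_estimator(tau, n, n_repeat=5000, seed=7):
    """Simulate n_repeat data sets of n exponential observations with mean tau and
    return (mean, standard deviation) of the ML estimates tau_hat = sample mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeat):
        sample = [rng.expovariate(1.0 / tau) for _ in range(n)]  # expovariate takes the rate 1/tau
        estimates.append(statistics.fmean(sample))
    return statistics.fmean(estimates), statistics.stdev(estimates)


tau_hat_obs, n = 2.0, 50  # hypothetical measurement: tau_hat from n = 50 observations
mean_est, sigma_mc = mc_sigma_of_estimator(tau_hat_obs, n)
print(f"tau = {tau_hat_obs} +/- {sigma_mc:.3f} (Monte Carlo), "
      f"analytic tau/sqrt(n) = {tau_hat_obs / n ** 0.5:.3f}")
```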
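Classical confidence intervals
n observations of x; evaluate an estimator $\hat{\theta}(x_1, \ldots, x_n)$ for a parameter θ
Value obtained: $\hat{\theta}_{\rm obs}$; its p.d.f. is $g(\hat{\theta}; \theta)$ (for a given, unknown θ), with cumulative distribution $G$
Probability α: $\hat{\theta} \geq u_\alpha(\theta)$; probability β: $\hat{\theta} \leq v_\beta(\theta)$
$$\alpha = P(\hat{\theta} \geq u_\alpha(\theta)) = \int_{u_\alpha(\theta)}^{\infty} g(\hat{\theta}; \theta)\, d\hat{\theta} = 1 - G(u_\alpha(\theta); \theta)$$
$$\beta = P(\hat{\theta} \leq v_\beta(\theta)) = \int_{-\infty}^{v_\beta(\theta)} g(\hat{\theta}; \theta)\, d\hat{\theta} = G(v_\beta(\theta); \theta)$$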
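Classical confidence intervals
Probability for the estimator to be inside the belt, regardless of θ:
$$P(v_\beta(\theta) \leq \hat{\theta} \leq u_\alpha(\theta)) = 1 - \alpha - \beta$$
$u_\alpha(\theta)$, $v_\beta(\theta)$: monotonically increasing functions of θ
$a(\hat{\theta}) \equiv u_\alpha^{-1}(\hat{\theta})$, $b(\hat{\theta}) \equiv v_\beta^{-1}(\hat{\theta})$
$\hat{\theta} \geq u_\alpha(\theta)$ implies $a(\hat{\theta}) \geq \theta$; $\hat{\theta} \leq v_\beta(\theta)$ implies $b(\hat{\theta}) \leq \theta$
$$P(a(\hat{\theta}) \geq \theta) = \alpha, \qquad P(b(\hat{\theta}) \leq \theta) = \beta$$
$$P(a(\hat{\theta}) \leq \theta \leq b(\hat{\theta})) = 1 - \alpha - \beta$$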
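Classical confidence intervals
Usually: central confidence interval, $\alpha = \beta = \gamma/2$
$$P(a \leq \theta \leq b) = 1 - \gamma$$
a: the hypothetical value of θ for which a fraction α of the repeated estimates $\hat{\theta}$ would be higher than the obtained $\hat{\theta}_{\rm obs}$
b: likewise, the value of θ for which a fraction β of the repeated estimates would be lower than $\hat{\theta}_{\rm obs}$
$$\hat{\theta}_{\rm obs} = u_\alpha(a) = v_\beta(b)$$
$$\alpha = \int_{\hat{\theta}_{\rm obs}}^{\infty} g(\hat{\theta}; a)\, d\hat{\theta} = 1 - G(\hat{\theta}_{\rm obs}; a)$$
$$\beta = \int_{-\infty}^{\hat{\theta}_{\rm obs}} g(\hat{\theta}; b)\, d\hat{\theta} = G(\hat{\theta}_{\rm obs}; b)$$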
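When $G$ is not Gaussian, these two equations can be solved numerically for a and b. A minimal sketch, assuming the hypothetical case of a single exponentially distributed observation with mean θ, for which $G(\hat{\theta}; \theta) = 1 - e^{-\hat{\theta}/\theta}$ and $u_\alpha(\theta) = -\theta \ln\alpha$ is monotonically increasing as required. Bisection is our choice of root finder, and the helper names are illustrative.

```python
import math


def G(theta_hat, theta):
    """Cumulative distribution of the estimator for the hypothetical case of a
    single exponential observation x with mean theta, taking theta_hat = x."""
    return 1.0 - math.exp(-theta_hat / theta)


def solve_increasing(f, target, lo, hi, rel_tol=1e-12):
    """Bisection for f(x) = target, assuming f increases on [lo, hi]."""
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid   # root lies above mid
        else:
            hi = mid   # root lies at or below mid
        if hi - lo <= rel_tol * hi:
            break
    return 0.5 * (lo + hi)


def central_interval(theta_obs, alpha, beta, lo=1e-6, hi=1e6):
    # alpha = 1 - G(theta_obs; a); 1 - G(theta_obs; theta) increases with theta
    a = solve_increasing(lambda th: 1.0 - G(theta_obs, th), alpha, lo, hi)
    # beta = G(theta_obs; b); G decreases with theta, so solve -G = -beta instead
    b = solve_increasing(lambda th: -G(theta_obs, th), -beta, lo, hi)
    return a, b


a, b = central_interval(theta_obs=1.0, alpha=0.05, beta=0.05)
print(f"[a, b] = [{a:.4f}, {b:.4f}]")
```

In this case the equations also have the closed-form solutions $a = -\hat{\theta}_{\rm obs}/\ln\alpha$ and $b = -\hat{\theta}_{\rm obs}/\ln(1 - \beta)$, which the bisection result should reproduce.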
Classical confidence intervals
Relationship between a confidence interval and a goodness-of-fit test:
test the hypothesis θ = a using the probability of obtaining $\hat{\theta} \geq \hat{\theta}_{\rm obs}$, i.e. equal or less agreement than the result obtained.
P-value = α (a random variable), and θ = a is specified.
Confidence interval: α is specified first, and a is a random quantity depending on the data.
Interval $[a, b]$, often reported as $\hat{\theta}^{+d}_{-c}$ with $c = \hat{\theta} - a$, $d = b - \hat{\theta}$
Classical confidence intervals
Many experiments: the interval would include the true value θ in a fraction $1 - \alpha - \beta$ of them.
This does not mean that the probability that the true value of θ lies in the fixed interval is $1 - \alpha - \beta$.
Frequency interpretation: θ is not a random variable, but the interval fluctuates since it is constructed from the data.
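The frequency statement can be checked by simulation: repeat a hypothetical experiment many times, build the interval from each result, and count how often it contains the fixed true value. The sketch assumes a Gaussian estimator with known σ, for which $a = \hat{\theta} - \sigma\,\Phi^{-1}(1 - \alpha)$ and $b = \hat{\theta} + \sigma\,\Phi^{-1}(1 - \beta)$; the function name and all numbers are illustrative.

```python
import random
from statistics import NormalDist


def coverage(theta_true=3.0, sigma=1.0, alpha=0.05, beta=0.05, n_exp=100_000, seed=11):
    """Fraction of simulated experiments whose interval [a, b] contains theta_true."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    hits = 0
    for _ in range(n_exp):
        theta_hat = rng.gauss(theta_true, sigma)  # one experiment's estimate
        a = theta_hat - sigma * z_alpha
        b = theta_hat + sigma * z_beta
        if a <= theta_true <= b:
            hits += 1
    return hits / n_exp


frac = coverage()
print(f"fraction of intervals containing the true value: {frac:.3f}")
```

With α = β = 0.05 the fraction should be close to 0.90; θ never changes between repetitions, only the interval does.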
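Gaussian distributed
Simple and very important application.
Central limit theorem: any estimator that is a linear function of a sum of random variables becomes Gaussian in the large-sample limit.
$$G(\hat{\theta}; \theta, \sigma_{\hat{\theta}}) = \int_{-\infty}^{\hat{\theta}} \frac{1}{\sqrt{2\pi\sigma_{\hat{\theta}}^2}} \exp\left(-\frac{(\hat{\theta}' - \theta)^2}{2\sigma_{\hat{\theta}}^2}\right) d\hat{\theta}'$$
$\sigma_{\hat{\theta}}$ known; the experiment resulted in $\hat{\theta}_{\rm obs}$:
$$\alpha = 1 - G(\hat{\theta}_{\rm obs}; a, \sigma_{\hat{\theta}}) = 1 - \Phi\left(\frac{\hat{\theta}_{\rm obs} - a}{\sigma_{\hat{\theta}}}\right)$$
$$\beta = G(\hat{\theta}_{\rm obs}; b, \sigma_{\hat{\theta}}) = 1 - \Phi\left(\frac{b - \hat{\theta}_{\rm obs}}{\sigma_{\hat{\theta}}}\right)$$
where Φ is the cumulative distribution of the standard Gaussian.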
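Gaussian distributed
$$a = \hat{\theta}_{\rm obs} - \sigma_{\hat{\theta}}\, \Phi^{-1}(1 - \alpha)$$
$$b = \hat{\theta}_{\rm obs} + \sigma_{\hat{\theta}}\, \Phi^{-1}(1 - \beta)$$
with $\Phi^{-1}(\beta) = -\Phi^{-1}(1 - \beta)$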
Gaussian distributed
Choose quantile:
   Central interval              One-sided interval
Φ⁻¹(1 - γ/2)   1 - γ          Φ⁻¹(1 - α)   1 - α
     1         0.6827             1        0.8413
     2         0.9544             2        0.9772
     3         0.9973             3        0.9987

Choose confidence level:
   Central interval              One-sided interval
1 - γ   Φ⁻¹(1 - γ/2)          1 - α   Φ⁻¹(1 - α)
0.90       1.645              0.90       1.282
0.95       1.960              0.95       1.645
0.99       2.576              0.99       2.326
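Both tables can be regenerated from Φ and $\Phi^{-1}$; a minimal sketch with `statistics.NormalDist`. The computed values agree with the tables up to rounding in the last digit (for example, $2\Phi(2) - 1$ rounds to 0.9545).

```python
from statistics import NormalDist

PHI = NormalDist()  # standard Gaussian

# "Choose quantile": coverage 1 - gamma = 2 Phi(k) - 1 (central), 1 - alpha = Phi(k) (one-sided)
quantile_rows = [(k, 2.0 * PHI.cdf(k) - 1.0, PHI.cdf(k)) for k in (1, 2, 3)]

# "Choose confidence level": Phi^-1(1 - gamma/2) (central) and Phi^-1(1 - alpha) (one-sided)
level_rows = [(cl, PHI.inv_cdf(1.0 - (1.0 - cl) / 2.0), PHI.inv_cdf(cl))
              for cl in (0.90, 0.95, 0.99)]

for k, central, one_sided in quantile_rows:
    print(f"{k}:  1-gamma = {central:.4f}   1-alpha = {one_sided:.4f}")
for cl, z_central, z_one_sided in level_rows:
    print(f"{cl:.2f}:  Phi^-1(1-gamma/2) = {z_central:.3f}   Phi^-1(1-alpha) = {z_one_sided:.3f}")
```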