Third lecture
Parameter estimation, maximum likelihood and least squares techniques
Jorge Andre Swieca School
Campos do Jordão, January 2003
References
• Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, R. Barlow, J. Wiley & Sons, 1989
• Statistical Data Analysis, G. Cowan, Oxford, 1998
• Particle Data Group (PDG), Review of Particle Physics, 2002 electronic edition
• Data Analysis: Statistical and Computational Methods for Scientists and Engineers, S. Brandt, Third Edition, Springer, 1999
Likelihood
"Verisimilitude (…) is often the whole truth."
Conclusão de Bento – Machado de Assis
"Whoever heard her would take it all for truth, such was the sincere tone, the gentleness of the words, and the verisimilitude of the details."
Quincas Borba – Machado de Assis
Parameter estimation
p.d.f. $f(x)$: the sample space is the set of all possible values of $x$.
Sample of size $n$: $\vec{x} = (x_1, x_2, \ldots, x_n)$, independent observations.
Joint p.d.f.: $f_{\mathrm{sam}}(x_1, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n)$
Central problem of statistics: from $n$ measurements of $x$, infer properties of $f(x; \vec{\theta})$, $\vec{\theta} = (\theta_1, \theta_2, \ldots, \theta_m)$.
A statistic: a function of the observed $x$'s.
To estimate a property of the p.d.f. (mean, variance, …): an estimator.
Estimator for $\theta$: $\hat{\theta}$.
A consistent estimator satisfies $\lim_{n \to \infty} P(|\hat{\theta} - \theta| > \varepsilon) = 0$ for any $\varepsilon > 0$ (large sample or asymptotic limit).
Parameter estimation
$\hat{\theta}(x_1, \ldots, x_n)$ is a random variable distributed as $g(\hat{\theta}; \theta)$, the sampling distribution: imagine an infinite number of similar experiments, each of size $n$.
$E[\hat{\theta}(\vec{x})] = \int \hat{\theta}\, g(\hat{\theta}; \theta)\, d\hat{\theta} = \int \cdots \int \hat{\theta}(\vec{x})\, f(x_1; \theta) \cdots f(x_n; \theta)\, dx_1 \cdots dx_n$
Bias: $b = E[\hat{\theta}] - \theta$
The bias depends on:
• the sample size
• the functional form of the estimator
• the true properties of the p.d.f.
If $b = 0$ independent of $n$: $\hat{\theta}$ is unbiased.
This is important when combining results of two or more experiments.
Parameter estimation
Mean square error:
$\mathrm{MSE} = E[(\hat{\theta} - \theta)^2] = E[(\hat{\theta} - E[\hat{\theta}])^2] + (E[\hat{\theta} - \theta])^2$
$\mathrm{MSE} = V[\hat{\theta}] + b^2$
Classical statistics: there is no unique method for building estimators; given an estimator, one can evaluate its properties.
Sample mean
From $\vec{x} = (x_1, x_2, \ldots, x_n)$, assumed to come from an unknown p.d.f. $f(x)$, estimate $E[x] = \mu$ (the population mean). One possibility:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Parameter estimation
Important property: the weak law of large numbers. If $V[x]$ exists, $\bar{x}$ is a consistent estimator for $\mu$: as $n \to \infty$, $\bar{x} \to \mu$ in the sense of probability.
$E[\bar{x}] = E\left[\frac{1}{n} \sum_{i=1}^{n} x_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[x_i] = \frac{1}{n}\, n\, \mu = \mu$
since $E[x_i] = \int \cdots \int x_i\, f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n = \mu$.
$\bar{x}$ is an unbiased estimator for the population mean $\mu$.
Parameter estimation
Sample variance:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{n-1} \left( \overline{x^2} - \bar{x}^2 \right)$
$E[s^2] = \sigma^2$: $s^2$ is an unbiased estimator for $V[x]$.
If $\mu$ is known:
$S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$
$S^2$ is an unbiased estimator for $\sigma^2$.
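As a quick numerical illustration (my addition, not from the original slides; a minimal sketch assuming NumPy), the unbiasedness of $s^2$ and the downward bias of the $1/n$ version can be checked by repeated sampling:

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # hypothetical seed, for reproducibility
n, n_exp = 10, 100_000               # small samples make the bias visible
sigma2 = 4.0                         # true variance of the sampled Gaussian

x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(n_exp, n))
s2 = x.var(axis=1, ddof=1)           # 1/(n-1) convention: sample variance s^2
v_n = x.var(axis=1, ddof=0)          # 1/n convention

print(s2.mean())    # ~4.0: E[s^2] = sigma^2, unbiased
print(v_n.mean())   # ~(n-1)/n * 4.0 = 3.6, biased low
```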
Maximum likelihood
Technique for estimating parameters given a finite sample of data $\vec{x} = (x_1, x_2, \ldots, x_n)$.
Suppose the functional form of $f(x; \theta)$ is known.
The probability for $x_1$ to be in $[x_1, x_1 + dx_1]$ is $f(x_1; \theta)\, dx_1$.
The probability for $x_i$ to be in $[x_i, x_i + dx_i]$ for all $i$ is $\prod_{i=1}^{n} f(x_i; \theta)\, dx_i$.
If the parameters are correct: high probability for the data.
Likelihood function:
$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$
• the joint probability of the data,
• with $\theta$ treated as the variables,
• and the $x_i$ treated as (fixed) parameters.
ML estimators for $\theta$: maximize the likelihood function,
$\frac{\partial L}{\partial \theta_i} = 0, \quad i = 1, \ldots, m \quad \Rightarrow \quad \hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_m)$
Maximum likelihood
Example: $n$ decay times of unstable particles, $t_1, \ldots, t_n$; hypothesis: the times follow an exponential p.d.f. with mean $\tau$:
$f(t; \tau) = \frac{1}{\tau} \exp\left( -\frac{t}{\tau} \right)$
$\log L(\tau) = \sum_{i=1}^{n} \log f(t_i; \tau) = \sum_{i=1}^{n} \left( \log \frac{1}{\tau} - \frac{t_i}{\tau} \right)$
$\frac{\partial (\log L)}{\partial \tau} = 0 \quad \Rightarrow \quad \hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} t_i$
Is $\hat{\tau}$ unbiased?
$E[\hat{\tau}(t_1, \ldots, t_n)] = \int \cdots \int \hat{\tau}(t_1, \ldots, t_n)\, f_{\mathrm{joint}}(t_1, \ldots, t_n; \tau)\, dt_1 \cdots dt_n$
$= \int \cdots \int \left( \frac{1}{n} \sum_{i=1}^{n} t_i \right) \frac{1}{\tau} e^{-t_1/\tau} \cdots \frac{1}{\tau} e^{-t_n/\tau}\, dt_1 \cdots dt_n$
$= \frac{1}{n} \sum_{i=1}^{n} \left( \int t_i\, \frac{1}{\tau} e^{-t_i/\tau}\, dt_i \prod_{j \neq i} \int \frac{1}{\tau} e^{-t_j/\tau}\, dt_j \right) = \frac{1}{n} \sum_{i=1}^{n} \tau = \tau$
so $\hat{\tau}$ is unbiased.
Maximum likelihood
50 decay
times
  1.0
ˆ  1.062
Maximum likelihood
What about $\lambda = 1/\tau$? Given a function $a(\theta)$:
$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial a} \frac{\partial a}{\partial \theta} = 0 \quad \Rightarrow \quad \frac{\partial L}{\partial a} = 0 \ \ \text{if} \ \frac{\partial a}{\partial \theta} \neq 0$
so $\hat{a} = a(\hat{\theta})$: transformation invariance of ML estimators.
$\hat{\lambda} = \frac{1}{\hat{\tau}} = \frac{n}{\sum_{i=1}^{n} t_i}$
$E[\hat{\lambda}] = \frac{n}{n-1}\, \lambda$
$\hat{\lambda}$ is biased; it becomes an unbiased estimator for $1/\tau$ only when $n \to \infty$.
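To see this bias numerically, a small simulation sketch (my addition, assuming NumPy), checking $E[\hat{\lambda}] \approx \frac{n}{n-1}\lambda$ for small $n$:

```python
import numpy as np

rng = np.random.default_rng(seed=5)  # hypothetical seed
tau, n, n_exp = 1.0, 5, 200_000      # n small so the bias is visible

t = rng.exponential(scale=tau, size=(n_exp, n))
lam_hat = 1.0 / t.mean(axis=1)       # lambda_hat = n / sum(t_i)

print(lam_hat.mean())                # ~ n/(n-1) * lambda = 1.25
print(n / (n - 1))                   # predicted bias factor
```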
Maximum likelihood
Example: $n$ measurements of $x$ assumed to come from a Gaussian:
$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$
$\log L(\mu, \sigma^2) = \sum_{i=1}^{n} \log f(x_i; \mu, \sigma^2) = \sum_{i=1}^{n} \left( -\frac{1}{2} \log 2\pi - \frac{1}{2} \log \sigma^2 - \frac{(x_i - \mu)^2}{2\sigma^2} \right)$
$\frac{\partial \log L}{\partial \mu} = 0 \quad \Rightarrow \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad E[\hat{\mu}] = \mu \ \text{(unbiased)}$
$\frac{\partial \log L}{\partial \sigma^2} = 0 \quad \Rightarrow \quad \widehat{\sigma^2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$
$E[\widehat{\sigma^2}] = \frac{n-1}{n}\, \sigma^2$: biased, but unbiased for large $n$.
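The same result can be obtained numerically; a minimal sketch (my addition, assuming SciPy): maximizing the Gaussian log-likelihood with a generic optimizer reproduces the closed-form estimators above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=7)    # hypothetical seed
x = rng.normal(loc=2.0, scale=1.5, size=200)

def neg_log_l(p):
    mu, log_sigma2 = p                 # fit log(sigma^2) to keep it positive
    sigma2 = np.exp(log_sigma2)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

res = minimize(neg_log_l, x0=[0.0, 0.0])
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, x.mean())                # numerical vs closed-form mu_hat
print(sigma2_hat, x.var(ddof=0))       # ML variance: 1/n convention
```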
Maximum likelihood
We showed that $s^2$ is an unbiased estimator for the variance of any p.d.f., so
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$
is an unbiased estimator for $\sigma^2$.
Maximum likelihood
Variance of ML estimators: over many experiments (same $n$), what is the spread of $\hat{\tau}$?
Analytically (exponential example, $\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} t_i$):
$V[\hat{\tau}] = E[\hat{\tau}^2] - (E[\hat{\tau}])^2$
$= \int \cdots \int \left( \frac{1}{n} \sum_{i=1}^{n} t_i \right)^2 \frac{1}{\tau} e^{-t_1/\tau} \cdots \frac{1}{\tau} e^{-t_n/\tau}\, dt_1 \cdots dt_n - \left[ \int \cdots \int \left( \frac{1}{n} \sum_{i=1}^{n} t_i \right) \frac{1}{\tau} e^{-t_1/\tau} \cdots \frac{1}{\tau} e^{-t_n/\tau}\, dt_1 \cdots dt_n \right]^2 = \frac{\tau^2}{n}$
By the transformation invariance of ML estimators, the ML estimate of $\sigma_{\hat{\tau}}^2 = \tau^2/n$ is
$\widehat{\sigma_{\hat{\tau}}^2} = \frac{\hat{\tau}^2}{n}, \qquad \hat{\sigma}_{\hat{\tau}} = \frac{\hat{\tau}}{\sqrt{n}}$
Maximum likelihood
Example: $\hat{\tau} = 7.82 \pm 0.43$.
If the experiment were repeated many times (with the same $n$), the standard deviation of the estimates would be 0.43.
• This is one possible interpretation; it is not the standard one when the distribution of the estimator is not Gaussian (the standard interpretation is a 68.3% central confidence interval, which coincides with $\pm$ one standard deviation if the p.d.f. for the estimator is Gaussian).
• In the large sample limit, ML estimates are distributed according to a Gaussian p.d.f., and the two procedures lead to the same result.
Maximum likelihood
Variance from the MC method, for cases too difficult to solve analytically (see the sketch after the figure below):
• simulate a large number of experiments;
• compute the ML estimate each time;
• study the distribution of the resulting values.
$s^2$ is an unbiased estimator for the variance of a p.d.f., so $s$ from the MC experiments estimates the statistical error of the parameter estimated from the real measurement.
Asymptotic normality is a general property of ML estimators for large samples.
Maximum likelihood
[Figure: distribution of $\hat{\tau}$ from 1000 MC experiments with 50 observations each; sample standard deviation $s = 0.151$, compared with $\hat{\sigma}_{\hat{\tau}} = \hat{\tau}/\sqrt{n} = 1.062/\sqrt{50} = 0.150$.]
RCF bound
A way to estimate the variance of any estimator without analytical calculations or MC.
Rao-Cramér-Frechet inequality:
$V[\hat{\theta}] \geq \left( 1 + \frac{\partial b}{\partial \theta} \right)^2 \bigg/ E\left[ -\frac{\partial^2 \log L}{\partial \theta^2} \right]$
Equality (minimum variance): the estimator is called efficient.
If efficient estimators exist for a problem, ML will find them. ML estimators are always efficient in the large sample limit.
Example: exponential, $f(t; \tau) = \frac{1}{\tau} e^{-t/\tau}$. With $b = 0$:
$V[\hat{\tau}] \geq 1 \bigg/ E\left[ -\frac{\partial^2 \log L}{\partial \tau^2} \right]$
$\frac{\partial^2 \log L}{\partial \tau^2} = \frac{n}{\tau^2} - \frac{2}{\tau^3} \sum_{i=1}^{n} t_i = \frac{n}{\tau^2} \left( 1 - \frac{2}{\tau} \frac{1}{n} \sum_{i=1}^{n} t_i \right) = \frac{n}{\tau^2} \left( 1 - \frac{2\hat{\tau}}{\tau} \right)$
$E\left[ -\frac{\partial^2 \log L}{\partial \tau^2} \right] = \frac{n}{\tau^2} \left( \frac{2 E[\hat{\tau}]}{\tau} - 1 \right) = \frac{n}{\tau^2}$
$\Rightarrow \quad V[\hat{\tau}] \geq \frac{\tau^2}{n}$
equal to the exact result: $\hat{\tau}$ is an efficient estimator.
RCF bound
For $\vec{\theta} = (\theta_1, \ldots, \theta_m)$, assuming efficiency and zero bias, the inverse covariance matrix of the estimators, $V_{ij} = \mathrm{cov}[\hat{\theta}_i, \hat{\theta}_j]$, is
$(V^{-1})_{ij} = E\left[ -\frac{\partial^2 \log L}{\partial \theta_i\, \partial \theta_j} \right]$
$= -\int \frac{\partial^2}{\partial \theta_i\, \partial \theta_j} \left( \sum_{k=1}^{n} \log f(x_k; \theta) \right) \prod_{l=1}^{n} f(x_l; \theta)\, dx_l$
$= -n \int \frac{\partial^2 \log f(x; \theta)}{\partial \theta_i\, \partial \theta_j}\, f(x; \theta)\, dx$
$V \propto \frac{1}{n} \quad \Rightarrow \quad \text{statistical errors} \propto \frac{1}{\sqrt{n}}$
RCF bound
For a large data sample: evaluate the second derivative with the measured data at the ML estimates $\hat{\theta}$:
$(\widehat{V^{-1}})_{ij} = -\frac{\partial^2 \log L}{\partial \theta_i\, \partial \theta_j} \bigg|_{\vec{\theta} = \hat{\vec{\theta}}}$
For a single parameter:
$\widehat{\sigma}_{\hat{\theta}}^{\,2} = \left( -\frac{\partial^2 \log L}{\partial \theta^2} \bigg|_{\theta = \hat{\theta}} \right)^{-1}$
This is the usual method for estimating the covariance matrix when the likelihood function is maximized numerically, e.g. MINUIT (CERN library):
• compute the second derivatives by finite differences;
• invert the matrix to get $V_{ij}$.
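A minimal sketch of this recipe for the single-parameter exponential example (my addition, assuming NumPy; the step size h is an arbitrary choice): estimate $\hat{\sigma}_{\hat{\tau}}$ from a finite-difference second derivative of $\log L$ at $\hat{\tau}$:

```python
import numpy as np

rng = np.random.default_rng(seed=13)  # hypothetical seed
t = rng.exponential(scale=1.0, size=50)

def log_l(tau):
    return np.sum(-np.log(tau) - t / tau)

tau_hat = t.mean()                    # ML estimate
h = 1e-4 * tau_hat                    # finite-difference step (arbitrary choice)
d2 = (log_l(tau_hat + h) - 2 * log_l(tau_hat) + log_l(tau_hat - h)) / h**2

sigma_hat = 1.0 / np.sqrt(-d2)        # (-d2 logL / dtau^2)^(-1/2)
print(sigma_hat, tau_hat / np.sqrt(len(t)))   # agrees with tau_hat/sqrt(n)
```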
Graphical method
For a single parameter $\theta$, expand $\log L(\theta)$ about the maximum:
$\log L(\theta) = \log L(\hat{\theta}) + \left[ \frac{\partial \log L}{\partial \theta} \right]_{\theta = \hat{\theta}} (\theta - \hat{\theta}) + \frac{1}{2!} \left[ \frac{\partial^2 \log L}{\partial \theta^2} \right]_{\theta = \hat{\theta}} (\theta - \hat{\theta})^2 + \ldots$
The first derivative vanishes at $\hat{\theta}$, so
$\log L(\theta) \approx \log L_{\max} - \frac{(\theta - \hat{\theta})^2}{2 \widehat{\sigma}_{\hat{\theta}}^{\,2}}$
$\log L(\hat{\theta} \pm \widehat{\sigma}_{\hat{\theta}}) = \log L_{\max} - \frac{1}{2}$
i.e. move $\theta$ away from $\hat{\theta}$ until $\log L$ decreases by $1/2$ (see the sketch below).
As shown later, $[\hat{\theta} - \widehat{\sigma}_{\hat{\theta}},\, \hat{\theta} + \widehat{\sigma}_{\hat{\theta}}]$ is a 68.3% central confidence interval.
[Figure: $\log L(\theta)$ vs. $\theta$, with the interval where $\log L$ drops by $1/2$ from $\log L_{\max}$ marked.]
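A minimal sketch of this graphical scan (my addition, assuming NumPy) for the exponential example: find where $\log L$ falls $1/2$ below its maximum:

```python
import numpy as np

rng = np.random.default_rng(seed=17)  # hypothetical seed
t = rng.exponential(scale=1.0, size=50)
tau_hat = t.mean()

def log_l(tau):
    return np.sum(-np.log(tau) - t / tau)

taus = np.linspace(0.5 * tau_hat, 2.0 * tau_hat, 2001)       # scan grid
delta = np.array([log_l(x) for x in taus]) - log_l(tau_hat)

inside = taus[delta >= -0.5]          # points within the Delta logL = 1/2 region
print(tau_hat - inside[0], inside[-1] - tau_hat)   # ~ tau_hat/sqrt(n), slightly asymmetric
```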
ML with two parameters
Example: angular distribution of the scattering angle $\theta$ ($x = \cos\theta$) in a particle reaction:
$f(x; \alpha, \beta) = \frac{1 + \alpha x + \beta x^2}{2 + \frac{2}{3}\beta}$
normalized for $-1 \leq x \leq +1$. If realistic measurements are possible only in $x_{\min} \leq x \leq x_{\max}$:
$f(x; \alpha, \beta) = \frac{1 + \alpha x + \beta x^2}{(x_{\max} - x_{\min}) + \frac{\alpha}{2}(x_{\max}^2 - x_{\min}^2) + \frac{\beta}{3}(x_{\max}^3 - x_{\min}^3)}$
ML with two parameters
Example: $\alpha = 0.5$, $\beta = 0.5$, 2000 events:
$\hat{\alpha} = 0.508 \pm 0.052$
$\hat{\beta} = 0.466 \pm 0.108$
ML with two parameters
500 experiments, 2000 events/experiment:
mean $\hat{\alpha} = 0.499$, $s_{\hat{\alpha}} = 0.051$
mean $\hat{\beta} = 0.498$, $s_{\hat{\beta}} = 0.111$
correlation coefficient $r = 0.42$
Both marginal p.d.f.'s are approximately Gaussian.
Least squares
Each measured value $y_i$ is a Gaussian random variable centered about the quantity's true value $\lambda(x_i; \theta)$:
$(x_i, y_i, \sigma_i), \quad i = 1, \ldots, n$
$L(y_1, \ldots, y_n; \lambda_1, \ldots, \lambda_n, \sigma_1^2, \ldots, \sigma_n^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{(y_i - \lambda_i)^2}{2\sigma_i^2} \right)$
To estimate the $\theta$:
$\log L(\theta) = -\frac{1}{2} \sum_{i=1}^{n} \frac{(y_i - \lambda(x_i; \theta))^2}{\sigma_i^2} + \text{terms not depending on } \theta$
$L$ is maximized by the $\theta$ that minimizes
$\chi^2(\theta) = \sum_{i=1}^{n} \frac{(y_i - \lambda(x_i; \theta))^2}{\sigma_i^2}$
Least squares
used to define the procedure even if yi are not gaussian
measurements not independent, described by a n-dim
Gaussian p.d.f. with nown cov. matrix but unknown mean
values:



1 n
1
log L( )    ( y i  ( xi , ))(V )ij ( y j  ( x j ; ))
2 i , j 1
n



2
1
 ( )   ( y i  ( xi , ))(V )ij ( y j  ( x j ; ))
i , j 1
ˆ1,,ˆm
LS estimators
Least squares
Important special case: $\lambda(x; \theta)$ linear in the parameters,
$\lambda(x; \theta) = \sum_{j=1}^{m} a_j(x)\, \theta_j$
with the $a_j(x)$ linearly independent. Then
$\lambda(x_i; \theta) = \sum_{j=1}^{m} a_j(x_i)\, \theta_j = \sum_{j=1}^{m} A_{ij}\, \theta_j$
• the estimators and their variances can be found analytically;
• the estimators have zero bias and minimum variance.
$\chi^2 = (\vec{y} - \vec{\lambda})^T V^{-1} (\vec{y} - \vec{\lambda}) = (\vec{y} - A\vec{\theta})^T V^{-1} (\vec{y} - A\vec{\theta})$
At the minimum:
$\nabla_{\theta}\, \chi^2 = -2\, (A^T V^{-1} \vec{y} - A^T V^{-1} A\, \vec{\theta}) = 0$
$\hat{\vec{\theta}} = (A^T V^{-1} A)^{-1} A^T V^{-1} \vec{y} \equiv B \vec{y}$
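A minimal NumPy sketch of this analytic solution (my addition; the straight-line model and data values are made up for illustration):

```python
import numpy as np

# toy data: straight line lambda(x; theta) = theta_0 + theta_1 * x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.7, 2.3, 3.8, 4.1, 5.2])
V = np.diag([0.2, 0.2, 0.3, 0.3, 0.4]) ** 2   # independent errors: diagonal V

A = np.column_stack([np.ones_like(x), x])     # A_ij = a_j(x_i), basis {1, x}
Vinv = np.linalg.inv(V)

U = np.linalg.inv(A.T @ Vinv @ A)             # covariance of the estimators (next slide)
theta_hat = U @ A.T @ Vinv @ y                # theta_hat = (A^T V^-1 A)^-1 A^T V^-1 y

print(theta_hat)                              # fitted intercept and slope
print(np.sqrt(np.diag(U)))                    # their standard deviations
```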
Least squares
Covariance matrix for the estimators:
$U_{ij} = \mathrm{cov}[\hat{\theta}_i, \hat{\theta}_j], \qquad U = B V B^T = (A^T V^{-1} A)^{-1}$
Equivalently,
$(U^{-1})_{ij} = \frac{1}{2} \frac{\partial^2 \chi^2}{\partial \theta_i\, \partial \theta_j} \bigg|_{\vec{\theta} = \hat{\vec{\theta}}}$
Since $\log L = -\chi^2/2$, this coincides with the RCF bound for the inverse covariance matrix if the $y_i$ are Gaussian distributed.
Least squares
$\lambda(x; \theta)$ linear in $\vec{\theta}$ implies that $\chi^2(\theta)$ is quadratic:
$\chi^2(\theta) = \chi^2(\hat{\theta}) + \frac{1}{2} \sum_{i,j=1}^{m} \left[ \frac{\partial^2 \chi^2}{\partial \theta_i\, \partial \theta_j} \right]_{\vec{\theta} = \hat{\vec{\theta}}} (\theta_i - \hat{\theta}_i)(\theta_j - \hat{\theta}_j)$
To interpret this, take a single parameter $\theta$:
$\chi^2(\theta) = \chi^2(\hat{\theta}) + \frac{(\theta - \hat{\theta})^2}{\widehat{\sigma}_{\hat{\theta}}^{\,2}}$
$\chi^2(\hat{\theta} \pm \widehat{\sigma}_{\hat{\theta}}) = \chi^2_{\min} + 1$
Chi-squared distribution
$f(z; n) = \frac{z^{n/2 - 1}\, e^{-z/2}}{2^{n/2}\, \Gamma(n/2)}, \qquad n = 1, 2, \ldots, \quad 0 \leq z < \infty$
$n$: degrees of freedom.
$\Gamma(n) = (n-1)!, \qquad \Gamma(x+1) = x\, \Gamma(x)$
$E[z] = \int_0^{\infty} z\, f(z; n)\, dz = n$
$V[z] = \int_0^{\infty} (z - n)^2 f(z; n)\, dz = 2n$
For $n$ independent Gaussian random variables $x_i$ with known $\mu_i$, $\sigma_i^2$:
$z = \sum_{i=1}^{n} \frac{(x_i - \mu_i)^2}{\sigma_i^2}$
is distributed as a $\chi^2$ for $n$ degrees of freedom.
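A quick numerical check of this last statement (my addition, assuming NumPy and SciPy): build $z$ from Gaussian variables and compare its sample moments with $n$ and $2n$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=23)  # hypothetical seed
n, n_exp = 5, 100_000
mu = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
sigma = np.array([0.5, 1.0, 2.0, 1.5, 0.8])

x = rng.normal(mu, sigma, size=(n_exp, n))
z = np.sum(((x - mu) / sigma) ** 2, axis=1)

print(z.mean(), n)                    # E[z] = n
print(z.var(ddof=1), 2 * n)           # V[z] = 2n
# compare the empirical distribution with the chi^2 p.d.f. for n dof
print(stats.kstest(z, stats.chi2(df=n).cdf).pvalue)   # large p-value expected
```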
Chi-squared distribution
[Figure-only slides: plots of the $\chi^2$ p.d.f. $f(z; n)$.]