Third Lecture
Parameter Estimation, Maximum Likelihood and Least Squares Techniques
Jorge Andre Swieca School, Campos do Jordão, January 2003

References
• R. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, J. Wiley & Sons, 1989
• G. Cowan, Statistical Data Analysis, Oxford, 1998
• Particle Data Group (PDG), Review of Particle Physics, 2002 electronic edition
• S. Brandt, Data Analysis: Statistical and Computational Methods for Scientists and Engineers, 3rd edition, Springer, 1999

Likelihood
"Verisimilitude (…) is very often the whole truth." (Bento's conclusion, Machado de Assis)
"Whoever heard her would accept it all as truth, such was the sincere tone, the gentleness of the words and the verisimilitude of the details." (Quincas Borba, Machado de Assis)

Parameter estimation
A p.d.f. $f(x)$ is defined on a sample space, the set of all possible values of $x$. A sample of size $n$, $\mathbf{x} = (x_1, \ldots, x_n)$, consists of independent observations, with joint p.d.f.

$$f_{\rm sam}(x_1,\ldots,x_n) = f(x_1)\,f(x_2)\cdots f(x_n).$$

The central problem of statistics: from $n$ measurements of $x$, infer properties of $f(x;\theta)$, where $\theta = (\theta_1, \ldots, \theta_m)$.

A statistic is a function of the observed $\mathbf{x}$. A statistic used to estimate properties of the p.d.f. (mean, variance, ...) is called an estimator; the estimator for $\theta$ is written $\hat\theta$. An estimator is consistent if

$$\lim_{n\to\infty} P(|\hat\theta - \theta| \ge \varepsilon) = 0 \quad \text{for any } \varepsilon > 0$$

(the large-sample or asymptotic limit).

Parameter estimation
$\hat\theta(x_1, \ldots, x_n)$ is itself a random variable, distributed according to the sampling distribution $g(\hat\theta;\theta)$. Over an infinite number of similar experiments of size $n$,

$$E[\hat\theta(\mathbf{x})] = \int \hat\theta\, g(\hat\theta;\theta)\, d\hat\theta = \int \hat\theta(\mathbf{x})\, f(x_1;\theta)\cdots f(x_n;\theta)\, dx_1\cdots dx_n.$$

The bias is $b = E[\hat\theta] - \theta$. It depends on:
• the sample size;
• the functional form of the estimator;
• the true properties of the p.d.f.
If $b = 0$ independently of $n$, $\hat\theta$ is unbiased. Unbiasedness is important when combining the results of two or more experiments.

Parameter estimation
The mean squared error,

$$\mathrm{MSE} = E[(\hat\theta-\theta)^2] = E[(\hat\theta - E[\hat\theta])^2] + (E[\hat\theta] - \theta)^2 = V[\hat\theta] + b^2,$$

combines variance and bias. Classical statistics gives no unique method for building estimators, but given an estimator one can evaluate its properties.

Sample mean: given $\mathbf{x} = (x_1, \ldots, x_n)$ supposed to come from an unknown p.d.f. $f(x)$, we want an estimator for the population mean $E[x] = \mu$. One possibility is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Parameter estimation
An important property follows from the weak law of large numbers: if $V[x]$ exists, $\bar{x}$ is a consistent estimator for $\mu$, i.e. as $n \to \infty$, $\bar{x} \to \mu$ in the sense of probability. Furthermore,

$$E[\bar{x}] = E\Big[\frac{1}{n}\sum_{i=1}^n x_i\Big] = \frac{1}{n}\sum_{i=1}^n E[x_i] = \mu, \qquad E[x_i] = \int x_i\, f(x_1)\cdots f(x_n)\, dx_1\cdots dx_n = \mu,$$

so $\bar{x}$ is an unbiased estimator for the population mean $\mu$.

Parameter estimation
Sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2 = \frac{n}{n-1}\big(\overline{x^2} - \bar{x}^2\big), \qquad E[s^2] = \sigma^2,$$

so $s^2$ is an unbiased estimator for $V[x]$. If $\mu$ is known,

$$S^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2$$

is an unbiased estimator for $\sigma^2$.

Maximum likelihood
A technique for estimating parameters given a finite sample of data $\mathbf{x} = (x_1, \ldots, x_n)$. Suppose the functional form of $f(x;\theta)$ is known. The probability that $x_1$ lies in $[x_1, x_1 + dx_1]$ is $f(x_1;\theta)\,dx_1$, so the probability that $x_i$ lies in $[x_i, x_i + dx_i]$ for all $i$ is $\prod_{i=1}^n f(x_i;\theta)\,dx_i$. If the parameters are correct, the observed data have high probability. This motivates the likelihood function

$$L(\theta) = \prod_{i=1}^n f(x_i;\theta),$$

which is:
• the joint probability density of the data,
• with $\theta$ treated as the variables,
• and $\mathbf{x}$ treated as parameters (fixed at the observed values).
The ML estimators for $\theta$ maximize the likelihood function:

$$\frac{\partial L}{\partial\theta_i} = 0, \quad i = 1,\ldots,m \quad\Rightarrow\quad \hat\theta = (\hat\theta_1,\ldots,\hat\theta_m).$$

Maximum likelihood
Example: $n$ decay times $t_1, \ldots, t_n$ of unstable particles; hypothesis: the distribution is an exponential p.d.f. with mean $\tau$,

$$f(t;\tau) = \frac{1}{\tau}\exp\Big(-\frac{t}{\tau}\Big).$$

Then

$$\log L(\tau) = \sum_{i=1}^n \log f(t_i;\tau) = \sum_{i=1}^n \Big(-\log\tau - \frac{t_i}{\tau}\Big), \qquad \frac{\partial\log L}{\partial\tau} = 0 \;\Rightarrow\; \hat\tau = \frac{1}{n}\sum_{i=1}^n t_i.$$

The estimator is unbiased for any $n$:

$$E[\hat\tau(t_1,\ldots,t_n)] = \int \hat\tau\, f_{\rm joint}(t_1,\ldots,t_n;\tau)\, dt_1\cdots dt_n = \int\!\!\cdots\!\!\int \Big(\frac{1}{n}\sum_{i=1}^n t_i\Big) \prod_{k=1}^n \frac{1}{\tau}\, e^{-t_k/\tau}\, dt_1\cdots dt_n = \frac{1}{n}\sum_{i=1}^n \int t_i\, \frac{1}{\tau}\, e^{-t_i/\tau}\, dt_i \prod_{j\neq i}\int \frac{1}{\tau}\, e^{-t_j/\tau}\, dt_j = \frac{1}{n}\, n\tau = \tau.$$

[Figure: ML fit to a sample of 50 decay times generated with τ = 1.0, giving τ̂ = 1.062.]
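A minimal sketch of this example (not part of the original lecture, which predates the libraries used here): generate 50 exponential decay times and form the ML estimate τ̂ = (1/n) Σ tᵢ. The seed and resulting value are illustrative; the slide's τ̂ = 1.062 came from its own particular sample.

```python
# Sketch: ML estimate of the exponential mean tau from simulated decay times.
import numpy as np

rng = np.random.default_rng(42)   # illustrative seed
tau_true, n = 1.0, 50             # assumed true mean and sample size

t = rng.exponential(scale=tau_true, size=n)   # decay times t_1 ... t_n
tau_hat = t.mean()                            # hat(tau) = (1/n) * sum(t_i)
sigma_hat = tau_hat / np.sqrt(n)              # ML error estimate (derived below)
print(f"tau-hat = {tau_hat:.3f} +/- {sigma_hat:.3f}")
```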
Maximum likelihood
What about $\lambda = 1/\tau$? For a function $a(\theta)$,

$$\frac{\partial L}{\partial a} = \frac{\partial L}{\partial\theta}\,\frac{\partial\theta}{\partial a} = 0 \quad\Rightarrow\quad \hat{a} = a(\hat\theta),$$

so the ML estimate transforms directly: $\hat\lambda = 1/\hat\tau = n/\sum_i t_i$. However,

$$E[\hat\lambda] = \frac{n}{n-1}\,\lambda,$$

so $\hat\lambda$ is biased: $\hat\lambda = 1/\hat\tau$ is an unbiased estimator for $\lambda = 1/\tau$ only when $n \to \infty$.

Maximum likelihood
$n$ measurements of $x$ assumed to come from a Gaussian,

$$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big),$$

$$\log L(\mu,\sigma^2) = \sum_{i=1}^n \log f(x_i;\mu,\sigma^2) = \sum_{i=1}^n \Big(-\frac{1}{2}\log 2\pi - \frac{1}{2}\log\sigma^2 - \frac{(x_i-\mu)^2}{2\sigma^2}\Big).$$

Setting $\partial\log L/\partial\mu = 0$ gives $\hat\mu = \frac{1}{n}\sum_i x_i$, with $E[\hat\mu] = \mu$: unbiased. Setting $\partial\log L/\partial\sigma^2 = 0$ gives

$$\widehat{\sigma^2} = \frac{1}{n}\sum_{i=1}^n (x_i - \hat\mu)^2, \qquad E[\widehat{\sigma^2}] = \frac{n-1}{n}\,\sigma^2,$$

so $\widehat{\sigma^2}$ is biased, though unbiased for large $n$.

Maximum likelihood
We showed that $s^2$ is an unbiased estimator for the variance of any p.d.f., so

$$s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \hat\mu)^2$$

is an unbiased estimator for $\sigma^2$.

Maximum likelihood
Variance of ML estimators: over many experiments (same $n$), what is the spread of $\hat\tau = \frac{1}{n}\sum_i t_i$? Analytically (exponential):

$$V[\hat\tau] = E[\hat\tau^2] - (E[\hat\tau])^2, \qquad E[\hat\tau^2] = \int\!\!\cdots\!\!\int \Big(\frac{1}{n}\sum_{i=1}^n t_i\Big)^2 \prod_{k=1}^n \frac{1}{\tau}\, e^{-t_k/\tau}\, dt_1\cdots dt_n = \frac{n+1}{n}\,\tau^2,$$

so $V[\hat\tau] = \tau^2/n$ and $\sigma_{\hat\tau} = \tau/\sqrt{n}$. By the transformation invariance of ML estimators, the ML estimate of $\sigma_{\hat\tau}$ is

$$\hat\sigma_{\hat\tau} = \frac{\hat\tau}{\sqrt{n}}.$$

Maximum likelihood
Example: $\hat\tau = 7.82 \pm 0.43$. If the experiment were repeated many times (with the same $n$), the standard deviation of the estimates would be 0.43.
• This is one possible interpretation of the quoted error.
• It is not the standard one when the distribution of the estimator is not Gaussian; the standard interpretation is a 68% central confidence interval, which coincides with $\pm$ one standard deviation if the p.d.f. of the estimator is Gaussian.
• In the large-sample limit, ML estimates are distributed according to a Gaussian p.d.f.
• The two procedures then lead to the same result.

Maximum likelihood
Variance: MC method. For cases too difficult to solve analytically, use the Monte Carlo method:
• simulate a large number of experiments;
• compute the ML estimate each time;
• examine the distribution of the resulting values.
Since $s^2$ is an unbiased estimator for the variance of a p.d.f., the sample standard deviation $s$ of the MC estimates gives the statistical error of the parameter estimated from the real measurement. Asymptotic normality is a general property of ML estimators for large samples.

[Figure: 1000 MC experiments, 50 observations per experiment; sample standard deviation s = 0.151, in agreement with σ̂_τ̂ = τ̂/√n = 1.062/√50 = 0.150.]

RCF bound
A way to estimate the variance of any estimator without analytical calculations or MC. The Rao-Cramer-Frechet inequality:

$$V[\hat\theta] \ge \Big(1 + \frac{\partial b}{\partial\theta}\Big)^2 \Big/ E\Big[-\frac{\partial^2\log L}{\partial\theta^2}\Big].$$

Equality (minimum variance) means the estimator is efficient. If efficient estimators exist for a problem, the ML method will find them. ML estimators are always efficient in the large-sample limit.

Example: exponential, $f(t;\tau) = \frac{1}{\tau} e^{-t/\tau}$, with $b = 0$:

$$\frac{\partial^2\log L}{\partial\tau^2} = \frac{n}{\tau^2} - \frac{2}{\tau^3}\sum_{i=1}^n t_i, \qquad E\Big[-\frac{\partial^2\log L}{\partial\tau^2}\Big] = -\frac{n}{\tau^2} + \frac{2n\tau}{\tau^3} = \frac{n}{\tau^2},$$

so $V[\hat\tau] \ge \tau^2/n$, equal to the exact result: $\hat\tau$ is an efficient estimator.

RCF bound
Several parameters $\theta = (\theta_1, \ldots, \theta_m)$: assuming efficiency and zero bias, the covariance matrix $V_{ij} = \mathrm{cov}[\hat\theta_i, \hat\theta_j]$ satisfies

$$(V^{-1})_{ij} = E\Big[-\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\Big] = -\sum_{k=1}^n \int \frac{\partial^2 \log f(x_k;\theta)}{\partial\theta_i\,\partial\theta_j} \prod_{l=1}^n f(x_l;\theta)\, dx_l = -n\int f(x;\theta)\, \frac{\partial^2\log f(x;\theta)}{\partial\theta_i\,\partial\theta_j}\, dx.$$

Since $V^{-1} \propto n$, we have $V \propto 1/n$: statistical errors scale as $1/\sqrt{n}$.

RCF bound
For a large data sample, evaluate the second derivatives with the measured data and the ML estimates:

$$(\widehat{V}^{-1})_{ij} = -\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\bigg|_{\theta=\hat\theta}, \qquad \hat\sigma^2_{\hat\theta} = \Big(-\frac{\partial^2\log L}{\partial\theta^2}\bigg|_{\hat\theta}\Big)^{-1} \text{ for a single parameter.}$$

This is the usual method for estimating the covariance matrix when the likelihood function is maximized numerically, e.g. with MINUIT (CERN library):
• compute the second derivatives by finite differences;
• invert the matrix to get $\widehat{V}_{ij}$.

Graphical method
For a single parameter $\theta$, expand $\log L(\theta)$ about its maximum:

$$\log L(\theta) = \log L(\hat\theta) + \frac{\partial\log L}{\partial\theta}\bigg|_{\hat\theta}(\theta - \hat\theta) + \frac{1}{2!}\,\frac{\partial^2\log L}{\partial\theta^2}\bigg|_{\hat\theta}(\theta - \hat\theta)^2 + \ldots$$

The first derivative vanishes at $\hat\theta$, so

$$\log L(\theta) \approx \log L_{\max} - \frac{(\theta - \hat\theta)^2}{2\hat\sigma^2_{\hat\theta}} \quad\Rightarrow\quad \log L(\hat\theta \pm \hat\sigma_{\hat\theta}) = \log L_{\max} - \frac{1}{2}.$$

As shown later, $[\hat\theta - \hat\sigma_{\hat\theta},\, \hat\theta + \hat\sigma_{\hat\theta}]$ is a 68.3% central confidence interval: vary $\theta$ until $\log L$ decreases by $1/2$ from $\log L_{\max}$.

ML with two parameters
Angular distribution of the scattering angle $\theta$ ($x = \cos\theta$) in a particle reaction:

$$f(x;\alpha,\beta) = \frac{1 + \alpha x + \beta x^2}{2 + 2\beta/3},$$

normalized for $-1 \le x \le +1$. Realistic measurements cover only $x_{\min} \le x \le x_{\max}$, so

$$f(x;\alpha,\beta) = \frac{1 + \alpha x + \beta x^2}{(x_{\max} - x_{\min}) + \frac{\alpha}{2}(x_{\max}^2 - x_{\min}^2) + \frac{\beta}{3}(x_{\max}^3 - x_{\min}^3)}.$$

[Figure: fit with true values α = 0.5, β = 0.5 and 2000 events, giving α̂ = 0.508 ± 0.052 and β̂ = 0.466 ± 0.108.]

ML with two parameters
MC check: 500 experiments with 2000 events per experiment give mean $\hat\alpha = 0.499$ with $s_{\hat\alpha} = 0.051$, and mean $\hat\beta = 0.498$ with $s_{\hat\beta} = 0.111$, with correlation coefficient $r = 0.42$. Both marginal p.d.f.s are approximately Gaussian.
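A minimal sketch of such a fit (not from the lecture, which used MINUIT): maximize log L numerically for the two-parameter angular distribution over the full range −1 ≤ x ≤ 1, taking the optimizer's approximate inverse Hessian of −log L as a rough covariance estimate V̂, in the spirit of the numerical method described above. The seed, sample size, and true values α = β = 0.5 are assumptions for illustration.

```python
# Sketch: numerical two-parameter ML fit of f(x; alpha, beta) with SciPy.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)          # illustrative seed
ALPHA, BETA, N_EVENTS = 0.5, 0.5, 2000  # assumed true values and sample size

def pdf(x, alpha, beta):
    # f(x; alpha, beta) normalized on [-1, 1]
    return (1.0 + alpha * x + beta * x**2) / (2.0 + 2.0 * beta / 3.0)

# Accept-reject sampling; for these parameter values f is largest at x = +1.
x = np.empty(0)
while x.size < N_EVENTS:
    trial = rng.uniform(-1.0, 1.0, 10_000)
    u = rng.uniform(0.0, pdf(1.0, ALPHA, BETA), 10_000)
    x = np.concatenate([x, trial[u < pdf(trial, ALPHA, BETA)]])[:N_EVENTS]

def neg_log_L(params):
    vals = pdf(x, params[0], params[1])
    if np.any(vals <= 0.0):
        return 1e10                     # penalty: keep f positive on the sample
    return -np.sum(np.log(vals))

res = minimize(neg_log_L, x0=[0.0, 0.0], method="BFGS")
cov = res.hess_inv                      # rough V-hat from the inverse Hessian
err = np.sqrt(np.diag(cov))
print(f"alpha-hat = {res.x[0]:.3f} +/- {err[0]:.3f}")
print(f"beta-hat  = {res.x[1]:.3f} +/- {err[1]:.3f}")
print(f"correlation r = {cov[0, 1] / (err[0] * err[1]):.2f}")
```

A production analysis would compute the Hessian by careful finite differences (as MINUIT's HESSE step does) rather than relying on the BFGS approximation, but the structure is the same: invert $-\partial^2\log L/\partial\theta_i\partial\theta_j$ evaluated at $\hat\theta$.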
Least squares
Each measured value $y_i$ is a Gaussian random variable centered about the quantity's true value $\lambda(x_i;\theta)$; the data are $(x_i, y_i, \sigma_i)$, $i = 1, \ldots, n$:

$$L(y_1,\ldots,y_n;\lambda_1,\ldots,\lambda_n,\sigma_1^2,\ldots,\sigma_n^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma_i^2}}\,\exp\Big(-\frac{(y_i - \lambda_i)^2}{2\sigma_i^2}\Big).$$

To estimate the $\theta$,

$$\log L(\theta) = -\frac{1}{2}\sum_{i=1}^n \frac{(y_i - \lambda(x_i;\theta))^2}{\sigma_i^2} + \text{terms not depending on } \theta,$$

which is maximized by the $\theta$ that minimize

$$\chi^2(\theta) = \sum_{i=1}^n \frac{(y_i - \lambda(x_i;\theta))^2}{\sigma_i^2}.$$

Least squares
This $\chi^2$ minimization is used to define the procedure even if the $y_i$ are not Gaussian. If the measurements are not independent but are described by an $n$-dimensional Gaussian p.d.f. with known covariance matrix $V$ and unknown mean values,

$$\log L(\theta) = -\frac{1}{2}\sum_{i,j=1}^n (y_i - \lambda(x_i;\theta))\,(V^{-1})_{ij}\,(y_j - \lambda(x_j;\theta)) + \ldots,$$

$$\chi^2(\theta) = \sum_{i,j=1}^n (y_i - \lambda(x_i;\theta))\,(V^{-1})_{ij}\,(y_j - \lambda(x_j;\theta)).$$

The minimizing values $\hat\theta_1, \ldots, \hat\theta_m$ are the LS estimators.

Least squares
Linear model:

$$\lambda(x;\theta) = \sum_{j=1}^m a_j(x)\,\theta_j,$$

with the $a_j(x)$ linearly independent functions, so that $\lambda(x_i;\theta) = \sum_j a_j(x_i)\,\theta_j = \sum_j A_{ij}\theta_j$. Then:
• the estimators and their variances can be found analytically;
• the estimators have zero bias and minimum variance.

$$\chi^2(\theta) = (\mathbf{y} - \boldsymbol\lambda)^T V^{-1} (\mathbf{y} - \boldsymbol\lambda) = (\mathbf{y} - A\theta)^T V^{-1} (\mathbf{y} - A\theta),$$

and at the minimum

$$\nabla_\theta\, \chi^2 = -2\,(A^T V^{-1}\mathbf{y} - A^T V^{-1} A\,\theta) = 0 \quad\Rightarrow\quad \hat\theta = (A^T V^{-1} A)^{-1} A^T V^{-1}\mathbf{y} \equiv B\mathbf{y}.$$

Least squares
Covariance matrix of the estimators: $U_{ij} = \mathrm{cov}[\hat\theta_i, \hat\theta_j]$, with

$$U = B V B^T = (A^T V^{-1} A)^{-1}, \qquad (U^{-1})_{ij} = \frac{1}{2}\,\frac{\partial^2\chi^2}{\partial\theta_i\,\partial\theta_j}\bigg|_{\hat\theta}.$$

This coincides with the RCF bound for the inverse covariance matrix if the $y_i$ are Gaussian distributed, since then $\log L = -\chi^2/2$.

Least squares
$\lambda(x;\theta)$ linear in $\theta$ means $\chi^2(\theta)$ is quadratic in $\theta$:

$$\chi^2(\theta) = \chi^2(\hat\theta) + \frac{1}{2}\sum_{i,j=1}^m \frac{\partial^2\chi^2}{\partial\theta_i\,\partial\theta_j}\bigg|_{\hat\theta}\, (\theta_i - \hat\theta_i)(\theta_j - \hat\theta_j)$$

(the first derivatives vanish at $\hat\theta$). To interpret this, take a single $\theta$:

$$\chi^2(\theta) = \chi^2(\hat\theta) + \frac{(\theta - \hat\theta)^2}{\hat\sigma^2_{\hat\theta}} \quad\Rightarrow\quad \chi^2(\hat\theta \pm \hat\sigma_{\hat\theta}) = \chi^2_{\min} + 1.$$

Chi-squared distribution

$$f(z;n) = \frac{z^{n/2 - 1}\, e^{-z/2}}{2^{n/2}\,\Gamma(n/2)}, \qquad n = 1, 2, \ldots \text{ (degrees of freedom)}, \quad 0 \le z < \infty,$$

where $\Gamma(n) = (n-1)!$ and $\Gamma(x+1) = x\,\Gamma(x)$. Its moments are

$$E[z] = \int_0^\infty z\, f(z;n)\, dz = n, \qquad V[z] = \int_0^\infty (z - n)^2 f(z;n)\, dz = 2n.$$

For $n$ independent Gaussian random variables $x_i$ with known $\mu_i$ and $\sigma_i^2$,

$$z = \sum_{i=1}^n \frac{(x_i - \mu_i)^2}{\sigma_i^2}$$

is distributed as $\chi^2$ for $n$ degrees of freedom.

[Figure: the chi-squared p.d.f. for several values of n.]
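A minimal closing sketch (not from the lecture): the linear LS solution $\hat\theta = (A^T V^{-1} A)^{-1} A^T V^{-1}\mathbf{y}$ with its covariance $U = (A^T V^{-1} A)^{-1}$, for a straight-line model $\lambda(x;\theta) = \theta_0 + \theta_1 x$ with independent errors, plus a goodness-of-fit p-value from the $\chi^2$ distribution. The data values are invented for illustration.

```python
# Sketch: linear least squares for a straight line, theta-hat = B y.
import numpy as np
from scipy.stats import chi2

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # control variable x_i
y = np.array([1.7, 2.3, 3.5, 3.3, 4.3])   # measured y_i (invented values)
sig = np.full_like(y, 0.5)                # known sigma_i

A = np.column_stack([np.ones_like(x), x]) # A_ij = a_j(x_i), basis {1, x}
Vinv = np.diag(1.0 / sig**2)              # V^-1 for independent measurements

U = np.linalg.inv(A.T @ Vinv @ A)         # covariance matrix of the estimators
theta_hat = U @ A.T @ Vinv @ y            # LS estimators

r = y - A @ theta_hat
chi2_min = r @ Vinv @ r                   # chi^2 at the minimum
ndof = len(y) - len(theta_hat)            # n - m degrees of freedom
print("theta-hat:", theta_hat)
print("errors:", np.sqrt(np.diag(U)))
print(f"chi2_min = {chi2_min:.2f}, p-value = {chi2.sf(chi2_min, ndof):.2f}")
```

When the model is correct and the errors Gaussian, $\chi^2_{\min}$ follows a $\chi^2$ distribution with $n - m$ degrees of freedom, so the p-value printed above quantifies the goodness of fit.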