Selected Topics in Particle Physics
Avner Soffer
Spring 2007
Lecture 6
Administrative stuff
• Projects status
• Other homework problems:
– Open questions in HW #1 (questions about the Quantum Universe) and HW #3 (difference between D mixing and Bs mixing analyses) – we will go over them when we return from break
• The plan for the next few weeks:
– Statistics (with as many real examples as possible)
– Root and RooFit
– Practicing statistics and analysis techniques
• Lecture on Tuesday, April 10 (Mimona) instead of Monday
(Passover break)?
Why do we use statistics in EPP?
• Scientific claims need to be based on solid mathematics
– How confident are we of the result? What is the probability that we are wrong?
– Especially important when working at the frontier of knowledge:
extraordinary claims require extraordinary proof
• Proving something with high certainty is usually expensive
– Many first measurements are made with marginal certainty
• Statistical standards:
– “Evidence”
– “Observation”
Probability
• Set S (sample space)
• Subset A ⊂ S
• The probability P(A) is a real number that satisfies the axioms:
  1. P(A) ≥ 0
  2. If A and B are disjoint subsets, i.e., A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
  3. P(S) = 1
Derived properties
• P(!A) = 1 − P(A), where !A = S − A
• P(A ∪ !A) = 1
• 0 ≤ P(A) ≤ 1
• P(∅) = 0
• If A ⊂ B, then P(B) ≥ P(A)
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
More definitions
• Subsets A and B are independent if P(A ∩ B) = P(A) P(B)
• A random variable x is a variable that has a specific value for each element of the set
• An element may have more than one random variable: x = {x1, …, xn}
Interpretation of Probability in data analysis
• Limiting relative frequency:
  – Elements of the sample space S = possible outcomes of a repeatable measurement
  – The probability of a particular outcome e (= element of S) is
    P(e) = lim_{n→∞} (1/n) × (number of occurrences of outcome e in n measurements)
    (note that the single element e belongs to a subset with one element = an elementary subset)
  – A non-elementary subset A corresponds to an occurrence of any of the outcomes in the subset, with probability
    P(A) = Σ_{e∈A} P(e)
Example 1
• Element e = D mixing parameter y’ measured to be 0.01
• Subset A = y’ measured to be in range [0.005, 0.015]
• P(A) = fraction of experiments in which y’ is measured in [0.005, 0.015],
given that its true value is 0.002
Example 2
• e = (x′², y′) measured to be (−0.0002, 0.01)
• A = (x′², y′) measured to be anywhere outside the brown (“4σ”) contour
• P(A) = fraction of experiments in which (x′², y′) are measured outside the contour, given that their true values are the measured ones
Example 3
• e = error on the CP-violating parameter θ− measured to be 42
• A = θ− error measured to be 42 or greater
• P(A) = fraction of experiments in which the θ− error is measured to be 42 or greater
About the relative frequency interpretation
• Straightforward when measurements are repeatable:
  – Particle collisions in an experiment
  – Radioactive decays of identical nuclei
  – Physical laws don’t change
• Also works when measurements are repeatable only in principle:
  – Measurement of the D mixing parameters using all the data we will ever have
  – Measurement of the average height of all humans
Probability density functions
• Outcome of an experiment is a continuous random variable x
  – Applies to most measurements in particle physics
• Define the probability density function (PDF) f(x), such that
  f(x) dx = probability to observe x in [x, x+dx]
          = fraction of experiments in which x will be measured in [x, x+dx]
• To satisfy axiom 3, P(S) = 1, normalize the PDF:
  ∫_S f(x) dx = 1
The PDF and finite number of observations
• A set of n_meas measurements x_m (m = 1…n_meas) can be presented as a histogram:
  n_b (b = 1…n_bins) = number of measurements for which x falls in bin b
• n_b / n_meas = probability for a measurement to be in bin b
  – Σ_b n_b / n_meas = 1
• n_b / (n_meas Δx_b) = (discrete) probability density function
• Continuum limit (infinite number of observations, infinitely fine binning):
  f(x) = lim_{n_meas→∞, Δx_b→0} n_b / (n_meas Δx_b)
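A small Python sketch of the n_b/(n_meas Δx_b) estimator, assuming a standard Gaussian sample purely for illustration; the binned density approaches the true f(x) as n_meas grows and the bins narrow.

  import numpy as np

  rng = np.random.default_rng(1)
  n_meas = 50_000
  x = rng.normal(size=n_meas)                    # measurements x_m, m = 1..n_meas

  # n_b = number of measurements falling in bin b
  counts, edges = np.histogram(x, bins=50, range=(-4.0, 4.0))
  density = counts / (n_meas * np.diff(edges))   # n_b / (n_meas * Dx_b)

  # Compare the discrete density with the true Gaussian PDF at the bin centres;
  # the residual shrinks as n_meas grows and the bins narrow
  centers = 0.5 * (edges[:-1] + edges[1:])
  true_pdf = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)
  print(np.max(np.abs(density - true_pdf)))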
Cumulative distribution
• The cumulative distribution of f(x) is
  F(x) = ∫_{−∞}^{x} f(x′) dx′
• Alternatively: F(x) = probability to obtain a measurement whose value is < x
• f(x) = dF(x)/dx (for differentiable F(x))
• The α-point x_α is the value of x such that F(x_α) = α, where 0 ≤ α ≤ 1.
  Or: x_α = F⁻¹(α)
• Median = x_½ = value of x such that F(x_½) = ½
• Mode = x_mode, such that f(x_mode) > f(all other values of x)
  – may not be useful or unique if f(x) has multiple local maxima
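A hedged numerical sketch of these definitions, using an exponential PDF f(x) = e^(−x) for x ≥ 0 (chosen only for illustration): it builds F(x) by numerical integration and reads off α-points by inverting the tabulated F.

  import numpy as np

  # f(x) = exp(-x) for x >= 0 (illustrative choice); build F(x) by trapezoid integration
  x = np.linspace(0.0, 10.0, 2001)
  f = np.exp(-x)
  F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))

  def alpha_point(alpha):
      """x_alpha = F^-1(alpha), obtained by interpolating the tabulated F(x)."""
      return np.interp(alpha, F, x)

  print("median    :", alpha_point(0.5), "(exact ln 2  =", np.log(2.0), ")")
  print("0.9-point :", alpha_point(0.9), "(exact ln 10 =", np.log(10.0), ")")
  print("mode      :", x[np.argmax(f)])   # f falls monotonically, so the mode is at x = 0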
Extension to multi-variable PDFs
• For f(x), x = {x1, …, xn}, the α-point turns into an α-contour of dimension n−1
• Marginal PDFs:
  – f_x(x) = ∫ f(x,y) dy
  – f_y(y) = ∫ f(x,y) dx
• x and y are independent variables if f(x,y) = f_x(x) f_y(y)
  – Also called uncorrelated variables
Functions of random variables
• a(x) is a continuous function of the random variable x, which has PDF f(x)
  – E.g., a = x², a = log(x), etc.
• What is the PDF g(a)?
• Require equal probabilities in corresponding infinitesimal regions:
  g(a) da = f(x) dx
  ⇒ g(a) = f(x(a)) |dx/da|
  – Absolute value keeps the PDF positive
  – Assumes a(x) can be inverted
[Figure: an interval da in a mapped onto the corresponding interval dx in x]
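A quick Python check of g(a) = f(x(a)) |dx/da|, under the illustrative assumptions a = x² and f(x) = e^(−x) for x ≥ 0 (so a(x) is uniquely invertible): the predicted g(a) is compared with the histogram density of the transformed sample.

  import numpy as np

  rng = np.random.default_rng(2)

  # x distributed as f(x) = exp(-x), x >= 0 (illustrative); transform a(x) = x^2
  x = rng.exponential(size=200_000)
  a = x**2

  # Prediction g(a) = f(x(a)) |dx/da| with x(a) = sqrt(a) and dx/da = 1 / (2 sqrt(a))
  def g_pred(a_vals):
      return np.exp(-np.sqrt(a_vals)) / (2.0 * np.sqrt(a_vals))

  # Histogram density of the transformed sample, n_b / (n * Da_b)
  counts, edges = np.histogram(a, bins=60, range=(0.05, 4.0))
  dens = counts / (len(a) * np.diff(edges))
  centers = 0.5 * (edges[:-1] + edges[1:])

  # Relative deviation is at the few-percent level, from statistics only
  print(np.max(np.abs(dens - g_pred(centers)) / g_pred(centers)))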
Example
• The CP-violation phases α, β, γ are not measured directly. We measure cos φ or sin φ or sin 2φ, then transform to the phases.
Multiple-valued x(a)
• If a(x) is not uniquely invertible, need to add up the different contributions:
  g(a) da = ∫_{dS} f(x′) dx′
  where dS(a) is the sum of the regions in x (e.g., dx1 and dx2) that map onto [a, a+da]
• For a(x) uniquely invertible we recover the previous result:
  ∫_{x(a)}^{x(a+da)} f(x′) dx′ = ∫_{x(a)}^{x(a)+|dx/da| da} f(x′) dx′ = f(x) |dx/da| da
[Figure: interval da in a fed by two disjoint intervals dx1 and dx2 in x; dS(a) = sum of the 2 regions]
Functions of multiple random variables
• What is g(a) for a(x), x = {x1, …, xn}?
  g(a) da = ∫_{dS} f(x) dx1 … dxn
  where dS is the hypersurface in x-space that encloses [a, a+da]
• Example: z = xy. What is f(z), given g(x) and h(y)?
  f(z) dz = ∫_{dS} g(x) h(y) dx dy = ∫_{−∞}^{∞} g(x) dx ∫_{z/|x|}^{(z+dz)/|x|} h(y) dy = dz ∫_{−∞}^{∞} g(x) h(z/x) / |x| dx
  So
  f(z) = ∫_{−∞}^{∞} g(x) h(z/x) / |x| dx = ∫_{−∞}^{∞} g(z/y) h(y) / |y| dy
• f(z) is the Mellin convolution of g(x) and h(y)
• Another example: z = x + y
  f(z) dz = ∫_{dS} g(x) h(y) dx dy = ∫_{−∞}^{∞} g(x) dx ∫_{z−x}^{z−x+dz} h(y) dy = dz ∫_{−∞}^{∞} g(x) h(z−x) dx
  So
  f(z) = ∫_{−∞}^{∞} g(x) h(z−x) dx = ∫_{−∞}^{∞} g(z−y) h(y) dy
• f(z) is the familiar Fourier convolution of g(x) and h(y) (see the numerical sketch after this slide)
• Recall from the D mixing analysis: the measured decay time t is the true decay time t′ (distributed as P(t′)) plus a random detector error Δt (distributed as r(Δt)):
  In reality:  P(t) ∝ e^{−Γt} [ (x′² + y′²)/4 |q_D/p_D|² (Γt)² + R_D + √R_D y′ |q_D/p_D| Γt ]
  We observe:  F(t) = ∫ P(t′) r(t − t′) dt′
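A numerical sketch of the Fourier convolution above, with an exponential "true decay time" and a Gaussian "detector error" whose widths are arbitrary illustrative choices (this is not the actual D mixing P(t)): the histogram of t = t′ + Δt matches F(t) = ∫ P(t′) r(t − t′) dt′ evaluated on a grid.

  import numpy as np

  rng = np.random.default_rng(3)

  # "True" decay times t' ~ exp(-t') smeared by a Gaussian error Dt with width 0.3
  # (both distributions and widths are illustrative, not the real D mixing P(t))
  n = 500_000
  t_true = rng.exponential(scale=1.0, size=n)
  t_obs = t_true + rng.normal(loc=0.0, scale=0.3, size=n)    # z = x + y

  # F(t) = int P(t') r(t - t') dt', evaluated as a discrete convolution on a grid
  grid = np.linspace(-2.0, 8.0, 2001)
  step = grid[1] - grid[0]
  P = np.where(grid >= 0.0, np.exp(-grid), 0.0)              # P(t')
  s = np.arange(-400, 401) * step                            # symmetric window for r
  r = np.exp(-s**2 / (2 * 0.3**2)) / (0.3 * np.sqrt(2 * np.pi))
  F = np.convolve(P, r, mode="same") * step

  # The histogram density of t_obs agrees with F(t) within statistical fluctuations
  counts, edges = np.histogram(t_obs, bins=100, range=(-2.0, 8.0))
  dens = counts / (n * np.diff(edges))
  centers = 0.5 * (edges[:-1] + edges[1:])
  print(np.max(np.abs(dens - np.interp(centers, grid, F))))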
Multiple functions of multiple random variables
• g(a1, …, an) = f(x1, …, xn) |J|, where the Jacobian is
  J = | ∂x1/∂a1  ∂x1/∂a2  … |
      | ∂x2/∂a1  ∂x2/∂a2  … |
      | …        …        … |
• To determine the marginal distribution g_i(a_i), need to integrate g(a1, …, an) over the a_{j≠i} variables
Expectation values
• The expectation value of a random variable x distributed according to the PDF f(x):
  E[x] = ⟨x⟩ = ∫_{−∞}^{∞} x f(x) dx
• Also called the population mean
• E[x] is the most commonly used location parameter (others are the α-point x_α and the mode)
• The expectation value of a function a(x) is
  E[a] = ∫_{−∞}^{∞} a g(a) da = ∫_{−∞}^{∞} a(x) f(x) dx
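A short Python sketch computing E[a] both ways, as ∫ a(x) f(x) dx on a grid and as a sample average over draws from f(x), assuming for illustration a standard Gaussian f and a(x) = x².

  import numpy as np

  rng = np.random.default_rng(4)

  # Illustrative choices: f(x) = standard Gaussian, a(x) = x^2, so E[a] should be 1
  x_grid = np.linspace(-8.0, 8.0, 4001)
  step = x_grid[1] - x_grid[0]
  f = np.exp(-x_grid**2 / 2) / np.sqrt(2 * np.pi)

  # E[a] = int a(x) f(x) dx, approximated by a sum on the grid
  E_a_integral = np.sum(x_grid**2 * f) * step

  # E[a] = int a g(a) da, estimated as the average of a(x) over draws from f(x)
  x_sample = rng.normal(size=1_000_000)
  E_a_sample = np.mean(x_sample**2)

  print(E_a_integral, E_a_sample)   # both close to 1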
Moments
• The nth algebraic moment of f(x):
  E[xⁿ] = ∫_{−∞}^{∞} xⁿ f(x) dx ≡ μ′_n
  – Note that the population mean μ is the special case μ′_1
• The nth central moment:
  E[(x − E[x])ⁿ] = ∫_{−∞}^{∞} (x − μ)ⁿ f(x) dx ≡ μ_n
• In particular,
  E[(x − E[x])²] = ∫_{−∞}^{∞} (x − μ)² f(x) dx = μ_2 ≡ V[x]
  is the population variance of f(x)
• The standard deviation
  σ = √V[x]
  gives an idea of the spread of f(x)
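A minimal sketch of the sample versions of these quantities, assuming an exponential sample with mean 2 purely for illustration; it also checks V[x] = E[x²] − μ².

  import numpy as np

  rng = np.random.default_rng(5)
  x = rng.exponential(scale=2.0, size=1_000_000)   # illustrative sample: mu = 2, V[x] = 4

  mu = x.mean()                        # population mean, estimated from the sample
  mu2_prime = np.mean(x**2)            # second algebraic moment E[x^2]
  variance = np.mean((x - mu)**2)      # second central moment = V[x]
  sigma = np.sqrt(variance)            # standard deviation

  print(mu, variance, sigma)
  print(variance, mu2_prime - mu**2)   # V[x] = E[x^2] - mu^2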
Mean and variance of functions
• Take a function of many random variables: a(x). Then
  E[a(x)] = μ_a = ∫_{−∞}^{∞} a g(a) da = ∫_{−∞}^{∞} a(x) f(x) dx1 … dxn
  V[a(x)] = σ_a² = E[(a − μ_a)²] = ∫_{−∞}^{∞} (a − μ_a)² g(a) da = ∫_{−∞}^{∞} (a − μ_a)² f(x) dx1 … dxn
Covariance
• For 2 random variables x, y, the covariance cov[x,y] or V_xy is
  V_xy = E[(x − μ_x)(y − μ_y)] = E[xy] − μ_x μ_y = ∫∫_{−∞}^{∞} xy f(x,y) dx dy − μ_x μ_y
• For 2 functions a(x), b(x), the covariance is
  V_ab = E[(a − μ_a)(b − μ_b)] = E[ab] − μ_a μ_b
       = ∫∫_{−∞}^{∞} ab g(a,b) da db − μ_a μ_b
       = ∫…∫ a(x) b(x) f(x) dx1 … dxn − μ_a μ_b
• Note that V_ab = V_ba and V_aa = σ_a²
• The dimensionless correlation coefficient is ρ_ab = V_ab / (σ_a σ_b)
  – Note that −1 ≤ ρ_ab ≤ 1
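A small Python illustration of cov[x, y] and ρ_xy, with x and y constructed so that ρ is about 0.8 (an arbitrary choice); the definition E[(x − μ_x)(y − μ_y)] and the shortcut E[xy] − μ_x μ_y agree.

  import numpy as np

  rng = np.random.default_rng(6)

  # Two correlated variables (construction chosen so that rho is about 0.8)
  n = 200_000
  x = rng.normal(size=n)
  y = 0.8 * x + 0.6 * rng.normal(size=n)

  V_xy = np.mean((x - x.mean()) * (y - y.mean()))   # E[(x - mu_x)(y - mu_y)]
  V_xy_alt = np.mean(x * y) - x.mean() * y.mean()   # E[xy] - mu_x mu_y
  rho = V_xy / (x.std() * y.std())                  # correlation coefficient

  print(V_xy, V_xy_alt)                  # equal up to rounding
  print(rho, np.corrcoef(x, y)[0, 1])    # both ~0.8 for this construction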
Understanding covariance and correlation
• V_xy = E[(x − μ_x)(y − μ_y)] is the expectation value of the product of the deviations from the means
• If having x > μ_x increases the probability of having y > μ_y, then V_xy > 0: x and y are positively correlated
• If having x > μ_x increases the probability of having y < μ_y, then V_xy < 0: x and y are negatively correlated or anti-correlated
• For independent variables (defined as f(x,y) = f_x(x) f_y(y)), we find E[xy] = E[x] E[y] = μ_x μ_y, so V_xy = 0
• Does V_xy = 0 necessarily mean that the variables are independent?...
Covariance and correlation
• …No: variables can have V_xy = 0 and still be dependent. [Figure: counterexample distribution]
Propagation of errors
• Take n random variables x with unknown PDF f(x), but with E[x] = μ and the covariances V_ij known (or estimated)
• Take the function y(x). What are E[y] and V[y]?
  – Remember: we don’t know f(x)
• Expand y to first order around the means:
  y(x) ≈ y(μ) + Σ_{i=1}^{n} (∂y/∂x_i)|_{x=μ} (x_i − μ_i)
• E[y(x)] ≈ y(μ), since E[x_i − μ_i] = 0
• For the variance we also need E[y²(x)]:
  E[y²(x)] ≈ y²(μ) + 2 y(μ) Σ_{i=1}^{n} (∂y/∂x_i)|_{x=μ} E[x_i − μ_i] + E[ (Σ_{i=1}^{n} (∂y/∂x_i)|_{x=μ} (x_i − μ_i)) (Σ_{j=1}^{n} (∂y/∂x_j)|_{x=μ} (x_j − μ_j)) ]
  (the middle term vanishes since E[x_i − μ_i] = 0)
• Evaluate the last term:
  E[…] = Σ_{i,j=1}^{n} (∂y/∂x_i ∂y/∂x_j)|_{x=μ} E[(x_i − μ_i)(x_j − μ_j)] = Σ_{i,j=1}^{n} (∂y/∂x_i ∂y/∂x_j)|_{x=μ} cov[x_i, x_j]
• So:
  σ_y² = E[y²(x)] − y²(μ) ≈ Σ_{i,j=1}^{n} (∂y/∂x_i ∂y/∂x_j)|_{x=μ} cov[x_i, x_j]
• Similarly, for m functions y1(x), …, ym(x):
  cov[y_k, y_l] ≈ Σ_{i,j=1}^{n} (∂y_k/∂x_i ∂y_l/∂x_j)|_{x=μ} cov[x_i, x_j]
• In matrix notation: U = D V Dᵀ, where D_ki = (∂y_k/∂x_i)|_{x=μ}
• Why is this “error propagation”? Because we often estimate errors from covariances
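A hedged numerical sketch of σ_y² = Σ_ij (∂y/∂x_i)(∂y/∂x_j) cov[x_i, x_j], using an illustrative y = x1·x2, finite-difference derivatives, and a made-up covariance matrix; a Monte Carlo spread over Gaussian pseudo-measurements provides a cross-check.

  import numpy as np

  rng = np.random.default_rng(7)

  # Assumed means and covariance matrix of x = (x1, x2); the numbers are illustrative
  mu = np.array([2.0, 3.0])
  V = np.array([[0.04, 0.01],
                [0.01, 0.09]])

  def y(x):
      return x[0] * x[1]               # the function whose error we propagate

  # D_i = dy/dx_i at x = mu, by central finite differences
  eps = 1e-5
  D = np.array([(y(mu + eps * np.eye(2)[i]) - y(mu - eps * np.eye(2)[i])) / (2 * eps)
                for i in range(2)])

  sigma2_y = D @ V @ D                 # sum_ij (dy/dx_i)(dy/dx_j) cov[x_i, x_j]

  # Monte Carlo cross-check: spread of y over Gaussian pseudo-measurements of x
  xs = rng.multivariate_normal(mu, V, size=500_000)
  print(sigma2_y, np.var(xs[:, 0] * xs[:, 1]))   # close, since y is nearly linear here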
Special cases
• For y = x1 + x2:
  σ_y² = σ1² + σ2² + 2 V12
• For y = x1 x2:
  σ_y² / y² = σ1²/x1² + σ2²/x2² + 2 V12/(x1 x2)
• For uncorrelated x_i, x_j:
  σ_y² = Σ_{i=1}^{n} (∂y/∂x_i)²|_{x=μ} σ_i²
  cov[y_k, y_l] = Σ_{i=1}^{n} (∂y_k/∂x_i ∂y_l/∂x_i)|_{x=μ} σ_i²
• Note: these formulae don’t work if y is significantly non-linear within a distance σ_i around the mean μ
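A quick Monte Carlo check of the two special cases, with an arbitrary illustrative covariance matrix and means; the product formula agrees only up to a small non-linear correction, in line with the note above.

  import numpy as np

  rng = np.random.default_rng(8)

  # Illustrative covariance matrix and means for (x1, x2)
  V = np.array([[1.0, 0.5],
                [0.5, 2.0]])
  mu = np.array([5.0, 5.0])
  xs = rng.multivariate_normal(mu, V, size=1_000_000)

  # y = x1 + x2:  sigma_y^2 = s1^2 + s2^2 + 2 V12
  y_sum = xs[:, 0] + xs[:, 1]
  print(np.var(y_sum), V[0, 0] + V[1, 1] + 2 * V[0, 1])

  # y = x1 x2:  sigma_y^2 / y^2 = s1^2/x1^2 + s2^2/x2^2 + 2 V12/(x1 x2), at x = mu;
  # the Monte Carlo value is slightly larger because of the neglected non-linear terms
  y_prod = xs[:, 0] * xs[:, 1]
  rel_pred = V[0, 0] / mu[0]**2 + V[1, 1] / mu[1]**2 + 2 * V[0, 1] / (mu[0] * mu[1])
  print(np.var(y_prod) / (mu[0] * mu[1])**2, rel_pred)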
Orthogonal transformation of variables
• It is often useful to work in variables in which the covariance matrix is diagonal: cov[y_i, y_j] = σ_i² δ_ij
• This can always be achieved with a linear transformation
  y_i = Σ_{j=1}^{n} A_ij x_j
  where the rows of the transformation matrix A_ij are the eigenvectors of cov[x_i, x_j]
• Then the σ_i² are the eigenvalues of cov[x_i, x_j]
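A minimal sketch of this diagonalization with numpy, using an illustrative covariance matrix whose correlation coefficient (−0.94) echoes the D mixing example on the last slide; the rows of A are the eigenvectors of cov[x_i, x_j], and the rotated variables have a diagonal covariance.

  import numpy as np

  rng = np.random.default_rng(9)

  # Correlated pair (x1, x2); the -0.94 correlation echoes the D mixing example below
  V = np.array([[1.0, -0.94],
                [-0.94, 1.0]])
  x = rng.multivariate_normal([0.0, 0.0], V, size=200_000)

  # Rows of A = eigenvectors of cov[x_i, x_j]; then y_i = sum_j A_ij x_j is uncorrelated
  eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
  A = eigvecs.T                       # each row is one eigenvector
  y = x @ A.T                         # apply the linear transformation event by event

  print(np.round(np.cov(y, rowvar=False), 6))   # diagonal, with sigma_i^2 = eigvals
  print(eigvals)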
Visualize for 2 dimensions
• Recall the definition of the correlation coefficient ρ_ab = V_ab / (σ_a σ_b). So we can write
  cov[x1, x2] = V = | σ1²      ρ σ1 σ2 |
                    | ρ σ1 σ2  σ2²     |
• The eigenvalues λ of V are found by solving |V − λI| = 0:
  λ = ½ [ σ1² + σ2² ± √( (σ1² + σ2²)² − 4(1 − ρ²) σ1² σ2² ) ]
• The transformation matrix is a simple rotation,
  A = |  cos θ   sin θ |
      | −sin θ   cos θ |
  with
  θ = ½ tan⁻¹[ 2ρ σ1 σ2 / (σ1² − σ2²) ]
  (the rows of A are eigenvector 1 and eigenvector 2)
More on linear variable transformations
• The uncorrelated variables y_i have a simpler covariance matrix, but may not always correspond to physically interesting quantities
• E.g., in D mixing, x′² and y′ have a very high correlation coefficient of ρ = −0.94
• But they are the physically interesting variables…