Selected Topics in Particle Physics – Avner Soffer, Spring 2007

Administrative stuff
• Projects status
• Other homework problems:
– Open questions in HW #1 (questions about the Quantum Universe) and HW #3 (difference between D mixing and Bs mixing analyses); we will go over them when we return from break
• The plan for the next few weeks:
– Statistics (with as many real examples as possible)
– ROOT and RooFit
– Practicing statistics and analysis techniques
• Lecture on Tuesday, April 10 (Mimouna) instead of Monday (Passover break)?

Why do we use statistics in EPP?
• Scientific claims need to be based on solid mathematics
– How confident are we of the result? What is the probability that we are wrong?
– Especially important when working at the frontier of knowledge: extraordinary claims require extraordinary proof
• Proving something with high certainty is usually expensive
– Many first measurements are made with marginal certainty
• Statistical standards:
– "Evidence"
– "Observation"

Probability
• Set S (the sample space)
• Subset A ⊆ S
• The probability P(A) is a real number that satisfies the axioms:
1. P(A) ≥ 0
2. If A and B are disjoint subsets, i.e., A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
3. P(S) = 1

Derived properties
• P(!A) = 1 − P(A), where !A = S − A
• P(A ∪ !A) = 1
• 0 ≤ P(A) ≤ 1
• P(∅) = 0
• If A ⊆ B, then P(A) ≤ P(B)
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

More definitions
• Subsets A and B are independent if P(A ∩ B) = P(A) P(B)
• A random variable x is a variable that has a specific value for each element of the set
• An element may have more than one random variable: x = {x1, …, xn}

Interpretation of probability in data analysis
• Limiting relative frequency:
– Elements of the sample space S = possible outcomes of a repeatable measurement
– The probability of a particular outcome e (an element of S) is
  $P(e) = \lim_{n \to \infty} \frac{\text{number of occurrences of outcome } e \text{ in } n \text{ measurements}}{n}$
  (note that the single element e belongs to a subset with one element, an elementary subset)
– A non-elementary subset A corresponds to an occurrence of any of the outcomes in the subset, with probability
  $P(A) = \sum_{e \in A} P(e)$

Example 1
• Element e = D mixing parameter y′ measured to be 0.01
• Subset A = y′ measured to be in the range [0.005, 0.015]
• P(A) = fraction of experiments in which y′ is measured in [0.005, 0.015], given that its true value is 0.002 (see the numerical sketch below)

Example 2
• e = (x′², y′) measured to be (−0.0002, 0.01)
• A = (x′², y′) measured to be anywhere outside the brown ("4σ") contour
• P(A) = fraction of experiments in which (x′², y′) are measured outside the contour, given that their true values are the measured ones

Example 3
• e = error on the CP-violating parameter θ₋ measured to be 42
• A = θ₋ error measured to be 42 or greater
• P(A) = fraction of experiments in which the θ₋ error is measured to be 42 or greater

About the relative frequency interpretation
• Straightforward when measurements are repeatable:
– Particle collisions in an experiment
– Radioactive decays of identical nuclei
– Physical laws don't change
• Also works when measurements are repeatable only in principle:
– Measurement of the D mixing parameters using all the data we will ever have
– Measurement of the average height of all humans

Probability density functions
• The outcome of an experiment is a continuous random variable x
– Applies to most measurements in particle physics
• Define the probability density function (PDF) f(x) such that
  f(x) dx = probability to observe x in [x, x + dx] = fraction of experiments in which x will be measured in [x, x + dx]
• To satisfy axiom 3, P(S) = 1, normalize the PDF: $\int_S f(x)\,dx = 1$
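The following is a minimal numerical sketch of the relative-frequency interpretation, applied to Example 1 above: repeated pseudo-experiments measure y′, and P(A) is estimated as the fraction of experiments landing in [0.005, 0.015]. The Gaussian resolution model and the numbers used are illustrative assumptions, not values from the actual analysis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model (assumed for illustration): each experiment measures y' with a
# Gaussian resolution around the true value quoted in Example 1.
y_true = 0.002
resolution = 0.005            # hypothetical per-experiment uncertainty
n_experiments = 1_000_000

# One simulated outcome e per repeated experiment
y_measured = rng.normal(y_true, resolution, size=n_experiments)

# Subset A: y' measured in [0.005, 0.015]
in_A = (y_measured >= 0.005) & (y_measured <= 0.015)

# Limiting relative frequency: P(A) ~ (occurrences of A) / n as n grows
print("P(A) estimate:", in_A.mean())
```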
The PDF and a finite number of observations
• A set of n_meas measurements x_m (m = 1, …, n_meas) can be presented as a histogram: n_b (b = 1, …, n_bins) = number of measurements for which x falls in bin b
• n_b / n_meas = probability for a measurement to be in bin b
– Σ_b n_b / n_meas = 1
• n_b / (n_meas Δx_b) = (discrete) probability density function
• Continuum limit (infinite number of observations, infinitely fine binning):
  $f(x) = \lim_{n_{\rm meas}\to\infty,\ \Delta x_b \to 0} \frac{n_b}{n_{\rm meas}\,\Delta x_b}$

Cumulative distribution
• The cumulative distribution of f(x) is $F(x) = \int_{-\infty}^{x} f(x')\,dx'$
• Alternatively: F(x) = probability to obtain a measurement whose value is < x
• f(x) = dF(x)/dx (for differentiable F(x))
• The α-point x_α is the value of x such that F(x_α) = α, where 0 ≤ α ≤ 1. Or: x_α = F⁻¹(α)
• Median = x_½ = value of x such that F(x_½) = ½
• Mode = x_mode such that f(x_mode) > f(all other values of x)
– may not be useful or unique if f(x) has multiple local maxima

Extension to multi-variable PDFs
• For f(x), x = {x1, …, xn}, the α-point turns into an α-contour of dimension n − 1
• Marginal PDFs:
– f_x(x) = ∫ f(x, y) dy
– f_y(y) = ∫ f(x, y) dx
• x and y are independent variables if f(x, y) = f_x(x) f_y(y)
– Also called uncorrelated variables

Functions of random variables
• a(x) is a continuous function of the random variable x, which has PDF f(x)
– e.g., a = x², a = log(x), etc.
• What is the PDF g(a)?
• Require equal probabilities in corresponding infinitesimal regions: g(a) da = f(x) dx, so
  $g(a) = f(x(a)) \left|\frac{dx}{da}\right|$
– The absolute value keeps the PDF positive
– Assumes a(x) can be inverted

Example
• The CP-violation phases α, β, γ are not measured directly. We measure cos φ, sin φ, or sin 2φ, then transform to the phases.

Multiple-valued x(a)
• If a(x) is not uniquely invertible, we need to add up the different contributions:
  $g(a)\,da = \int_{dS(a)} f(x')\,dx'$,
  where dS(a) is the sum of the regions of x (e.g., two intervals [x₁, x₁ + dx₁] and [x₂, x₂ + dx₂]) that map into [a, a + da]
• For a(x) uniquely invertible we recover the previous result:
  $g(a)\,da = \int_{x(a)}^{x(a+da)} f(x')\,dx' = f(x(a)) \left|\frac{dx}{da}\right| da$

Functions of multiple random variables
• What is g(a) for a(x), x = {x1, …, xn}?
  $g(a)\,da = \int_{dS} f(\mathbf{x})\,dx_1 \cdots dx_n$,
  where dS is the hypersurface in x-space that encloses [a, a + da]
• Example: z = x y; what is f(z) given g(x) and h(y)?
  $f(z)\,dz = \int_{dS} g(x)h(y)\,dx\,dy = \int_{-\infty}^{\infty} g(x)\,dx \int_{z/|x|}^{(z+dz)/|x|} h(y)\,dy$,
  so
  $f(z) = \int_{-\infty}^{\infty} g(x)\,h(z/x)\,\frac{dx}{|x|} = \int_{-\infty}^{\infty} g(z/y)\,h(y)\,\frac{dy}{|y|}$
• f(z) is the Mellin convolution of g(x) and h(y)
• Another example: z = x + y
  $f(z)\,dz = \int_{dS} g(x)h(y)\,dx\,dy = \int_{-\infty}^{\infty} g(x)\,dx \int_{z-x}^{z-x+dz} h(y)\,dy$,
  so
  $f(z) = \int_{-\infty}^{\infty} g(x)\,h(z-x)\,dx = \int_{-\infty}^{\infty} g(z-y)\,h(y)\,dy$
• f(z) is the familiar Fourier convolution of g(x) and h(y)
• Recall from the D mixing analysis: the measured decay time t is the true decay time t′ (distributed according to P(t′)) plus a random detector error Δt (distributed according to r(Δt)):
  $P(t) \propto e^{-\Gamma t}\left[ R_D + \sqrt{R_D}\, y'\, (\Gamma t) + \frac{x'^2 + y'^2}{4} (\Gamma t)^2 \right]$
• In reality we observe $F(t) = \int P(t')\, r(t - t')\, dt'$ (see the numerical sketch below)

Multiple functions of multiple random variables
• g(a₁, …, aₙ) = f(x₁, …, xₙ) |J|, where the Jacobian is
  $J = \det\begin{pmatrix} \partial x_1/\partial a_1 & \partial x_1/\partial a_2 & \cdots \\ \partial x_2/\partial a_1 & \partial x_2/\partial a_2 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$
• To determine the marginal distribution g_i(a_i), integrate g(a₁, …, aₙ) over the a_{j≠i} variables
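Below is a small numerical sketch of the z = x + y convolution, in the spirit of the decay-time smearing F(t) = ∫ P(t′) r(t − t′) dt′: an exponential "true" distribution is smeared by a Gaussian resolution, and the sampled distribution is compared with the convolution integral. The lifetime and resolution values are made-up illustrative numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (assumed numbers, for illustration only)
tau = 0.41           # lifetime of the exponential P(t')
sigma_t = 0.20       # width of the Gaussian resolution function r(Dt)
n = 500_000

# Sampling view of z = x + y: draw t' and Dt independently and add them
t_true = rng.exponential(tau, size=n)      # t' ~ P(t')
dt = rng.normal(0.0, sigma_t, size=n)      # Dt ~ r(Dt)
t_meas = t_true + dt                       # t  ~ F(t) = (P * r)(t)

# Direct numerical evaluation of F(t) = integral of P(t') r(t - t') dt'
tp = np.linspace(0.0, 10.0, 4001)          # integration grid over t'
P = np.exp(-tp / tau) / tau                # normalized exponential PDF

def F(t):
    r = np.exp(-0.5 * ((t - tp) / sigma_t) ** 2) / (sigma_t * np.sqrt(2 * np.pi))
    return np.trapz(P * r, tp)

# Compare the convolution integral with the sampled density in a few bins
for t0 in (0.0, 0.5, 1.0, 2.0):
    in_bin = np.abs(t_meas - t0) < 0.05    # bin of width 0.1 centered on t0
    print(f"t = {t0}: sampled {in_bin.mean() / 0.1:.3f}, convolved {F(t0):.3f}")
```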
Expectation values
• The expectation value of a random variable x distributed according to the PDF f(x):
  $E[x] = \langle x \rangle = \int_{-\infty}^{\infty} x\, f(x)\, dx$
• Also called the population mean μ
• E[x] is the most commonly used location parameter (others are the α-point x_α and the mode)
• The expectation value of a function a(x) is
  $E[a] = \int_{-\infty}^{\infty} a\, g(a)\, da = \int_{-\infty}^{\infty} a(x)\, f(x)\, dx$

Moments
• The n-th algebraic moment of f(x):
  $\mu'_n = E[x^n] = \int_{-\infty}^{\infty} x^n f(x)\, dx$
– Note that the population mean μ is the special case μ′₁
• The n-th central moment:
  $\mu_n = E[(x - E[x])^n] = \int_{-\infty}^{\infty} (x - \mu)^n f(x)\, dx$
• In particular,
  $\mu_2 = E[(x - E[x])^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx = V[x]$
  is the population variance of f(x)
• The standard deviation $\sigma = \sqrt{V[x]}$ gives an idea of the spread of f(x)

Mean and variance of functions
• Take a function of many random variables, a(x). Then
  $E[a(\mathbf{x})] = \mu_a = \int a\, g(a)\, da = \int a(\mathbf{x}) f(\mathbf{x})\, dx_1 \cdots dx_n$
  $V[a(\mathbf{x})] = \sigma_a^2 = E[(a - \mu_a)^2] = \int (a - \mu_a)^2 g(a)\, da = \int (a - \mu_a)^2 f(\mathbf{x})\, dx_1 \cdots dx_n$

Covariance
• For two random variables x, y, the covariance cov[x, y] or V_xy is
  $V_{xy} = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y = \int xy\, f(x, y)\, dx\, dy - \mu_x \mu_y$
• For two functions a(x), b(x), the covariance is
  $V_{ab} = E[(a - \mu_a)(b - \mu_b)] = E[ab] - \mu_a \mu_b = \int ab\, g(a, b)\, da\, db - \mu_a \mu_b = \int \cdots \int a(\mathbf{x})\, b(\mathbf{x})\, f(\mathbf{x})\, dx_1 \cdots dx_n - \mu_a \mu_b$
• Note that V_ab = V_ba and V_aa = σ_a²
• The dimensionless correlation coefficient is ρ_ab = V_ab / (σ_a σ_b)
– Note that −1 ≤ ρ_ab ≤ 1

Understanding covariance and correlation
• V_xy = E[(x − μ_x)(y − μ_y)] is the expectation value of the product of the deviations from the means
• If having x > μ_x increases the probability of having y > μ_y, then V_xy > 0: x and y are positively correlated
• If having x > μ_x increases the probability of having y < μ_y, then V_xy < 0: x and y are negatively correlated, or anti-correlated
• For independent variables (defined as f(x, y) = f_x(x) f_y(y)) we find E[xy] = E[x] E[y] = μ_x μ_y, so V_xy = 0
• Does V_xy = 0 necessarily mean that the variables are independent?…

Covariance and correlation
• …No. E.g., y = x² with x distributed symmetrically about zero gives V_xy = 0, even though y is completely determined by x.

Propagation of errors
• Take n random variables x with unknown PDF f(x), but with E[x] = μ and V_ij known (or estimated)
• Take the function y(x). What are E[y] and V[y]?
– Remember: we don't know f(x)
• Expand y to first order:
  $y(\mathbf{x}) \approx y(\boldsymbol{\mu}) + \sum_{i=1}^{n} \left.\frac{\partial y}{\partial x_i}\right|_{\mathbf{x}=\boldsymbol{\mu}} (x_i - \mu_i)$
• Then E[y(x)] ≈ y(μ), since E[x_i − μ_i] = 0
• For the second moment:
  $E[y^2(\mathbf{x})] \approx y^2(\boldsymbol{\mu}) + 2\, y(\boldsymbol{\mu}) \sum_{i} \left.\frac{\partial y}{\partial x_i}\right|_{\boldsymbol{\mu}} E[x_i - \mu_i] + E\left[ \sum_{i} \left.\frac{\partial y}{\partial x_i}\right|_{\boldsymbol{\mu}} (x_i - \mu_i) \sum_{j} \left.\frac{\partial y}{\partial x_j}\right|_{\boldsymbol{\mu}} (x_j - \mu_j) \right]$
• Evaluate the last term:
  $E\left[ \sum_{i,j} \left.\frac{\partial y}{\partial x_i} \frac{\partial y}{\partial x_j}\right|_{\boldsymbol{\mu}} (x_i - \mu_i)(x_j - \mu_j) \right] = \sum_{i,j=1}^{n} \left.\frac{\partial y}{\partial x_i} \frac{\partial y}{\partial x_j}\right|_{\boldsymbol{\mu}} \mathrm{cov}[x_i, x_j]$
• So:
  $\sigma_y^2 = E[y^2(\mathbf{x})] - y^2(\boldsymbol{\mu}) \approx \sum_{i,j=1}^{n} \left.\frac{\partial y}{\partial x_i} \frac{\partial y}{\partial x_j}\right|_{\boldsymbol{\mu}} \mathrm{cov}[x_i, x_j]$
• Similarly, for m functions y₁(x), …, y_m(x):
  $\mathrm{cov}[y_k, y_l] \approx \sum_{i,j=1}^{n} \left.\frac{\partial y_k}{\partial x_i} \frac{\partial y_l}{\partial x_j}\right|_{\boldsymbol{\mu}} \mathrm{cov}[x_i, x_j]$
• In matrix notation: $U = D V D^T$, where $D_{ij} = \left.\frac{\partial y_i}{\partial x_j}\right|_{\mathbf{x}=\boldsymbol{\mu}}$ (see the numerical sketch after the next slide)
• Why is this "error propagation"? Because we often estimate errors from covariances

Special cases
• For y = x₁ + x₂:
  $\sigma_y^2 = \sigma_1^2 + \sigma_2^2 + 2V_{12}$
• For y = x₁ x₂:
  $\frac{\sigma_y^2}{y^2} = \frac{\sigma_1^2}{x_1^2} + \frac{\sigma_2^2}{x_2^2} + \frac{2V_{12}}{x_1 x_2}$
• For uncorrelated x_i, x_j:
  $\sigma_y^2 = \sum_{i} \left(\left.\frac{\partial y}{\partial x_i}\right|_{\boldsymbol{\mu}}\right)^2 \sigma_i^2$, and $\mathrm{cov}[y_k, y_l] = \sum_{i} \left.\frac{\partial y_k}{\partial x_i} \frac{\partial y_l}{\partial x_i}\right|_{\boldsymbol{\mu}} \sigma_i^2$
• Note: these formulae don't work if y is significantly non-linear within a distance σ_i around the mean μ
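Below is a minimal numerical sketch of the propagation formula U = D V Dᵀ, using the two special cases above (y₁ = x₁ + x₂ and y₂ = x₁ x₂). The means and covariance matrix are illustrative assumptions; the Monte Carlo cross-check works because these functions are nearly linear within one σ of the mean.

```python
import numpy as np

# Illustrative inputs (assumed numbers): means and covariance of x = (x1, x2)
mu = np.array([2.0, 3.0])
V = np.array([[0.04, 0.01],
              [0.01, 0.09]])

# Two functions of x: y1 = x1 + x2 and y2 = x1 * x2
def y(x1, x2):
    return np.array([x1 + x2, x1 * x2])

# D_ij = dy_i/dx_j evaluated at the mean (written out analytically here)
D = np.array([[1.0,   1.0],
              [mu[1], mu[0]]])

# Propagated covariance matrix U = D V D^T
U = D @ V @ D.T
print("sigma^2(y1) =", U[0, 0])   # = s1^2 + s2^2 + 2 V12
print("sigma^2(y2) =", U[1, 1])   # = x2^2 s1^2 + x1^2 s2^2 + 2 x1 x2 V12

# Toy Monte Carlo cross-check of the linear approximation
rng = np.random.default_rng(0)
x1, x2 = rng.multivariate_normal(mu, V, size=1_000_000).T
print("MC covariance of (y1, y2):\n", np.cov(y(x1, x2)))
```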
Orthogonal transformation of variables
• It is often useful to work in variables in which the covariance matrix is diagonal: cov[y_i, y_j] = σ_i² δ_ij
• This can always be achieved with a linear transformation
  $y_i = \sum_{j=1}^{n} A_{ij} x_j$,
  where the rows of the transformation matrix A are the eigenvectors of cov[x_i, x_j]
• Then the σ_i² are the eigenvalues of cov[x_i, x_j]

Visualize for two dimensions
• Recall the definition of the correlation coefficient, ρ = V₁₂ / (σ₁ σ₂). So we can write
  $\mathrm{cov}[x_1, x_2] = V = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$
• The eigenvalues of V are found by solving |V − λI| = 0:
  $\lambda_{1,2} = \tfrac{1}{2}\left[ \sigma_1^2 + \sigma_2^2 \pm \sqrt{(\sigma_1^2 + \sigma_2^2)^2 - 4(1 - \rho^2)\,\sigma_1^2 \sigma_2^2} \right]$
• The transformation matrix is a simple rotation,
  $A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$, with $\theta = \tfrac{1}{2}\tan^{-1}\left( \frac{2\rho\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2} \right)$
[Figure: eigenvector 1 and eigenvector 2]

More on linear variable transformations
• The uncorrelated variables y_i have a simpler covariance matrix, but may not always correspond to physically interesting quantities
• E.g., in D mixing, x′² and y′ have a very high correlation coefficient, ρ = −0.94 (see the numerical sketch below)
• But they are the physically interesting variables…
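As a sketch of the two-dimensional case, the snippet below builds a covariance matrix with the correlation ρ = −0.94 quoted above (the widths σ₁ and σ₂ are made-up numbers) and diagonalizes it: the rows of the eigenvector matrix define the rotated, uncorrelated variables, and the eigenvalues are their variances.

```python
import numpy as np

# Illustrative 2D covariance matrix with strong anti-correlation (rho = -0.94);
# the widths sigma1 and sigma2 are made-up numbers for demonstration.
sigma1, sigma2, rho = 1.0, 2.5, -0.94
V = np.array([[sigma1**2,             rho * sigma1 * sigma2],
              [rho * sigma1 * sigma2, sigma2**2            ]])

# Eigen-decomposition of the symmetric matrix V
eigvals, eigvecs = np.linalg.eigh(V)

# Rows of A are the eigenvectors of V, so A V A^T is diagonal
A = eigvecs.T
V_rotated = A @ V @ A.T
print("eigenvalues (variances of the uncorrelated y):", eigvals)
print("rotated covariance matrix:\n", np.round(V_rotated, 12))

# Rotation angle from the closed-form expression on the slide
theta = 0.5 * np.arctan2(2 * rho * sigma1 * sigma2, sigma1**2 - sigma2**2)
print("rotation angle theta (rad):", theta)
```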