Mathematical statistics

A random variable (stochastic variable) assigns a probability (probability density) to a possible discrete (continuous) event from a certain discrete (continuous) set of events.
Discrete example: dice, pᵢ = 1/6 for i ∈ {1, 2, 3, 4, 5, 6}
Continuous example: time of nucleus decay, p(t) = k e^(−kt)

A continuous random variable in 1D (x ∈ ℝ) is described by a distribution function, density of probability, (continuous) probability distribution, ...:
p(x) dx = probability that event x ∈ [x, x + dx) occurs
In 2D, p(x, y) is defined so that event x ∈ [x, x + dx) and y ∈ [y, y + dy) happens with probability p(x, y) dx dy.
Normalization: Σᵢ pᵢ = 1 or ∫₋∞^∞ p(x) dx = 1

Probability distribution

Cumulative (integral) distribution function = probability that x′ ≤ x:
P(x) = ∫₋∞ˣ p(x′) dx′
and similarly in the discrete case.

Warning. In physics etc., the symbol of a random variable (x) and of one of its values (x, e.g., in integration) are not distinguished.

Example. Let's gamble: if a six is thrown, you will win 5 CZK; if anything else is thrown, you will lose 1 CZK. Is this game fair? Yes: the mean win = 5·(1/6) − 1·(5/6) = 0.

Mean value, expectation value (not the averaged value = arithmetic average of a sample):
E(x) ≡ ⟨x⟩ = ∫ x p(x) dx or Σᵢ xᵢ pᵢ

Variance, fluctuation, dispersion, mean square deviation (MSD):
Var(x) = ⟨(x − ⟨x⟩)²⟩ = ⟨∆x²⟩ = ⟨x²⟩ − ⟨x⟩², where ∆x = x − ⟨x⟩
σ = √(Var x) is called the standard deviation.

Example. Let distribution u be uniform in the interval [0, 1); on a computer e.g. rnd(0). Calculate the expectation and variance. ⟨u⟩ = 1/2, Var(u) = 1/12.

Function of a random variable

Let x be a real random variable with distribution p(x), and f(x) be a real function. The quantity (observable) f(x) has the distribution
p_f(y) = Σ_{x: f(x)=y} p(x) / |f′(x)|
where the sum is over all roots x of f(x) = y.

Example. Let u be uniform in [0, 1). Calculate the distribution function of t = −ln u. Answer: exp(−t); e.g., the time of atom decay for k = 1.

Function of a random variable: mean value

Mean value of a quantity f:
⟨f⟩ = ∫ f(x) p(x) dx
Or, based on the new random variable f = f(x):
⟨f⟩ = ∫ y p_f(y) dy
Both mean values are the same:
⟨f⟩ = ∫ f(x) p(x) dx = [subst. y = f(x)] ∫ y p(x)/f′(x) dy = ∫ y p_f(y) dy
where in the 2nd integral x = the root of the equation f(x) = y; for simplicity we assume that there is only one root and the function f is increasing.

Independent random variables and the covariance

Random variables x (with distribution p₁(x)) and y (with distribution p₂(y)) are independent if
p(x, y) = p₁(x) p₂(y)
In the discrete case (throw a dice twice, pᵢⱼ = 1/36): pᵢⱼ = p₁,ᵢ p₂,ⱼ

Covariance of x and y (with a 2D distribution p(x, y)):
Cov(x, y) = ⟨∆x ∆y⟩ = ∫ ∆x ∆y p(x, y) dx dy
The covariance of two independent random variables is zero:
Cov(x, y) = ⟨∆x ∆y⟩ = ∫ dx ∫ dy ∆x p₁(x) ∆y p₂(y) = ⟨∆x⟩ ⟨∆y⟩ = 0

Covariance of two quantities f(x) and g(x):
Cov(f, g) = ⟨∆f ∆g⟩ = ∫ ∆f ∆g p(x) dx
NB: Cov(x, x) = Var x

Correlation coefficient

r(x, y) = Cov(x, y) / √(Var(x) Var(y))

Example. Let u₁ and u₂ be two independent random variables with uniform distribution in [0, 1]. Calculate (see Maple):
a) r(u₁, −u₁)
b) r(u₁², u₁²)
c) r(u₁, u₂ + u₁)
Answers: a) −1, b) 1, c) 1/√2.

tab 1 100000 | tabproc "rnd(0)" "rnd(0)" | tabproc A A+B | lr
[plot/matnum2r.sh]
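These moments and correlations are easy to check numerically. Below is a minimal Python sketch (numpy serving as a stand-in for the tab/tabproc/lr pipeline above; the sample size and seed are arbitrary choices) reproducing ⟨u⟩ = 1/2, Var(u) = 1/12 and the answers a) and c):

```python
# Monte Carlo check of the mean, variance, and correlation examples.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000
u1 = rng.random(n)            # uniform in [0, 1)
u2 = rng.random(n)

print(u1.mean())              # -> ~0.5    (exact: 1/2)
print(u1.var(ddof=1))         # -> ~0.0833 (exact: 1/12)

def r(x, y):
    """Sample correlation coefficient r = Cov(x,y)/sqrt(Var x Var y)."""
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).mean() / np.sqrt(dx.var() * dy.var())

print(r(u1, -u1))             # -> -1 (perfect anticorrelation)
print(r(u1, u2 + u1))         # -> ~0.707 (exact: 1/sqrt(2))
```

The sample estimates fluctuate around the exact values at the level of ~1/√n, in line with the error analysis of averages later in this section.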
Sum of random variables

Let x and y be two continuous random variables with distribution p(x, y). The distribution of x + y is
p_{x+y}(z) dz = ∫∫_{x+y ∈ (z, z+dz)} p(x, y) dx dy = [y := z − x] ∫ p(x, z − x) dx dz
⇒ p_{x+y}(z) = ∫ p(x, z − x) dx

Now, let p(x, y) = p₁(x) p₂(y) (two independent distributions). Then
p_{x+y}(z) = ∫ p₁(x) p₂(z − x) dx ≡ (p₁ ∗ p₂)(z)
p₁ ∗ p₂ is called the convolution.

Discrete example. Let's generate two random variables (e.g., roll two dice). What is the distribution of the sum of points? p(2) = 1/36, p(3) = 2/36, ..., p(7) = 6/36, ..., p(12) = 1/36.

Example. Calculate the distribution of u₁ − u₂. Answer: 1 − |x| for |x| ≤ 1, 0 for |x| > 1.
tab 1 100000 | tabproc "rnd(0)-rnd(0)" | histogr -1.5 1.5 .1 | plot -
[plot/matnum2conv.sh]

Sum of independent random variables [show/convol.sh]

The mean value and variance of independent random variables are additive.

Verbose way using the convolution:
E(x + y) = ∫ z p_{x+y}(z) dz = ∫∫ z p₁(x) p₂(z − x) dx dz = [y := z − x] ∫∫ (x + y) p₁(x) p₂(y) dx dy = ⟨x⟩ + ⟨y⟩ = E(x) + E(y)

Directly:
E(x + y) = ∫∫ p₁(x) p₂(y) (x + y) dx dy = ∫∫ p₁(x) p₂(y) x dx dy + ∫∫ p₁(x) p₂(y) y dx dy = ∫ p₁(x) x dx + ∫ p₂(y) y dy = E(x) + E(y)

Var(x + y) = ⟨(∆x + ∆y)²⟩ = ⟨∆x²⟩ + 2⟨∆x ∆y⟩ + ⟨∆y²⟩ = Var(x) + Var(y)
(the cross term is the covariance, which vanishes for independent variables)

Central limit theorem [plot/randomwalk.sh]

The sum of n equal independent distributions with a finite mean value and variance approaches, for n → ∞, the Gaussian distribution with the mean value n⟨x⟩ and variance n Var x.

Example. Let us consider a discrete distribution b: p(−1/2) = p(1/2) = 1/2. Let us approximate the sum of n such distributions:
n = 1: p(−1/2) = 1/2, p(1/2) = 1/2, Var b = 1/4
n = 2: p(−1) = 1/4, p(0) = 1/2, p(1) = 1/4, Var b₂ = 2/4
n = 3: p(±3/2) = 1/8, p(±1/2) = 3/8, Var b₃ = 3/4

Let n be even (for simplicity). Then for k = −n/2 .. n/2:
p(k) = 2⁻ⁿ C(n, n/2 + k) ≈ 1/(σ√(2π)) exp(−k²/(2σ²)), σ² = Var(bₙ) = n/4
where C(n, m) is the binomial coefficient and we have used the Stirling formula n! ≈ nⁿ e⁻ⁿ √(2πn).

Central limit theorem – another verification

Write p(n, k) = 2⁻ⁿ C(n, n/2 + k). From
C(n, n/2 + 1) = n! / [(n/2 − 1)! (n/2 + 1)!] = C(n, n/2) · (n/2)/(n/2 + 1)
we get
ln p(n, 1) = ln p(n, 0) + ln[(n/2)/(n/2 + 1)] ≈ ln p(n, 0) − 2/n
The next term:
ln p(n, 2) = ln p(n, 1) + ln[(n/2 − 1)/(n/2 + 2)] ≈ ln p(n, 1) − 6/n
and in general
ln p(n, k) ≈ ln p(n, 0) − (2/n) Σⱼ₌₁ᵏ (2j − 1), Σⱼ₌₁ᵏ (2j − 1) ≈ ∫₀ᵏ (2k − 1) dk = k(k − 1) ≈ k²
Similarly for negative k. In the limit of large k and n:
p(n, k) ≈ p(n, 0) exp(−k²/(n/2))
After normalization we get the same result again.
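The agreement between the exact binomial p(n, k) and its Gaussian limit can be checked directly. A short Python sketch (the value of n and the sampled k values are arbitrary choices for illustration):

```python
# Numerical check of the CLT example: the sum of n variables with
# p(-1/2) = p(1/2) = 1/2 has the exact binomial distribution
# p(k) = 2**-n * C(n, n/2 + k), which should approach a Gaussian
# with sigma^2 = n/4.
from math import comb, exp, pi, sqrt

n = 100                      # even, as on the slide
sigma2 = n / 4               # Var(b_n) = n * Var(b) = n/4

for k in (0, 2, 5, 10):
    exact = 2**-n * comb(n, n // 2 + k)
    gauss = exp(-k**2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)
    print(k, exact, gauss)
```

Already for n = 100 the two columns agree to within about one percent near the peak; the relative error grows in the tails, where the expansion in k²/n breaks down.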
Estimates

We usually have a sample of a random variable, e.g., a dice rolled 100×.
Estimate of the expectation value = arithmetic average, averaged value:
⟨x⟩ ≈ x̄ₙ ≡ (1/n) Σᵢ₌₁ⁿ xᵢ

Let's calculate the variance of x̄ₙ:
Var(x̄ₙ) = ⟨(x̄ₙ − ⟨x⟩)²⟩ = ⟨((1/n) Σᵢ ∆xᵢ)²⟩ = σ²/n
where σ² = Var x. We assumed that the events are independent, ⟨∆xᵢ ∆xⱼ⟩ = 0 for i ≠ j.

Now, let us estimate σ²:
⟨Σᵢ (xᵢ − (1/n) Σⱼ xⱼ)²⟩ = n ⟨((1 − 1/n) ∆x₁ − ∆x₂/n − ···)²⟩ = n [(1 − 1/n)² + (n − 1)/n²] σ² = (n − 1) σ²

Consequently, the estimates are (NB: n − 1 = the number of degrees of freedom):
σ² ≈ Σᵢ ∆xᵢ²/(n − 1) = [Σᵢ xᵢ² − (1/n)(Σᵢ xᵢ)²]/(n − 1), Var(x̄ₙ) ≈ σ²/n, σ(x̄ₙ) ≈ σ/√n

Terminology notes

Thus, the square of the sample standard deviation*,
s² = [Σᵢ xᵢ² − (1/n)(Σᵢ xᵢ)²]/(n − 1),
is an unbiased estimate† of σ², the squared standard deviation of random variable x. Random variable x may represent a measurement which is subject to uncertainty; an estimate of the standard deviation (e.g., the above one) is called the standard uncertainty in metrology.

Similarly, [(1/n) Σᵢ xᵢ² − ((1/n) Σᵢ xᵢ)²]/(n − 1) is an unbiased estimate of σ(x̄ₙ)². Its square root is referred to as the standard error or standard uncertainty of the average; do not forget that the average is an unbiased estimate of ⟨x⟩.

For large n, it follows from the central limit theorem that:
⟨x⟩ ∈ (x̄ₙ − σ(x̄ₙ), x̄ₙ + σ(x̄ₙ)) with probability 68%
⟨x⟩ ∈ (x̄ₙ − 2σ(x̄ₙ), x̄ₙ + 2σ(x̄ₙ)) with probability 95%
⟨x⟩ ∈ (x̄ₙ − 3σ(x̄ₙ), x̄ₙ + 3σ(x̄ₙ)) with probability 99.7%

In physics, σ(x̄ₙ) is provided and called the "error"; in biology, economy, and engineering, 2σ(x̄ₙ) is common instead (95% level of confidence).

* or corrected sample standard deviation, to stress the division by n − 1 instead of n
† or estimator, to stress that it is an algorithm

Weights

A weighted average (the weights wᵢ are not necessarily normalized):
x̄ = Σᵢ wᵢ xᵢ / Σᵢ wᵢ

Let us know xᵢ (independent random variables) with errors (standard deviations, uncertainties) σᵢ. Which weights are the best? Those with the lowest σ! We will derive the result for two quantities:
x = w x₁ + (1 − w) x₂
σ²(x) = ⟨(x − ⟨x⟩)²⟩ = ⟨(w ∆x₁ + (1 − w) ∆x₂)²⟩ = w² σ₁² + (1 − w)² σ₂²
The minimum is for
w = w₁ = (1/σ₁²)/(1/σ₁² + 1/σ₂²), 1 − w = w₂ = (1/σ₂²)/(1/σ₁² + 1/σ₂²)
Consequently (this can be generalized to more variables):
wᵢ = 1/σᵢ²
But one must be careful if only estimates of σᵢ are known.

The method of least squares

x⃗ᵢ = independent variables (n vectors of any dimension, i = 1..n)
yᵢ = dependent variables (real numbers)
1/σᵢ² = weights
a⃗ = parameters (p real parameters written as a vector), p ≤ n (in practice p ≪ n)

We are looking for a function f_a⃗(x⃗) dependent on the p parameters a⃗ which describes the data (x⃗ᵢ, yᵢ) (fitting, correlation, regression). The parameters a⃗ are determined so that the sum of squared deviations is minimum:
min_a⃗ S², S² = Σᵢ [(f_a⃗(x⃗ᵢ) − yᵢ)/σᵢ]²

Theorem (Gauss–Markov): for a function f_a⃗ linearly dependent on a⃗, the above solution is the Best (gives the smallest variance of the estimated a⃗) Linear (by the assumptions) Unbiased (⟨a⃗⟩ is correct) Estimate (BLUE).

Example. For f_a(x) = a (a constant) and σᵢ = 1, find the estimate of a. Answer: a = ȳ, the arithmetic average.

The results = (1) the estimate of a⃗, (2) the estimates of the errors, (3) the correlations between the parameters.

Linear or linearizable parameters

f_a⃗(x⃗) = f₀(x⃗) + Σⱼ₌₁ᵖ aⱼ fⱼ(x⃗), fⱼ(x⃗) = ∂f(x⃗)/∂aⱼ
Without a loss of generality we can assume f₀(x⃗ᵢ) = 0.

Let all the weights be the same. The data with the errors are:
yᵢ = Σⱼ₌₁ᵖ aⱼ fⱼ(x⃗ᵢ) + δyᵢ, ⟨δyᵢ⟩ = 0, ⟨δyᵢ δyⱼ⟩ = σ² δᵢⱼ
where δᵢⱼ = 1 for i = j and δᵢⱼ = 0 for i ≠ j (the Kronecker delta).

S² = Σᵢ (Σⱼ₌₁ᵖ aⱼ fⱼ(x⃗ᵢ) − yᵢ)²

We are looking for the minimum, hence the derivative should be 0:
(1/2) ∂S²/∂aₖ = Σᵢ fₖ(x⃗ᵢ) (Σⱼ₌₁ᵖ aⱼ fⱼ(x⃗ᵢ) − yᵢ) = (A·a⃗ − b⃗)ₖ = 0

Let Fₖᵢ = fₖ(x⃗ᵢ) (a p × n matrix); then A = F·Fᵀ, b⃗ = F·y⃗.

Linear or linearizable parameters II

A·a⃗ = b⃗, a⃗ = A⁻¹·b⃗ = A⁻¹·F·y⃗

Errors of the estimates and the correlations of the parameters (RULE: the summation is over pairs of the same indices):
Cov(aᵢ, aⱼ) = ⟨∆aᵢ ∆aⱼ⟩ = ⟨(A⁻¹)ᵢα F_{αk} δyₖ (A⁻¹)ⱼβ F_{βl} δyₗ⟩
= (A⁻¹)ᵢα F_{αk} (A⁻¹)ⱼβ F_{βl} σ² δₖₗ
= (A⁻¹)ᵢα A_{αβ} (A⁻¹)ⱼβ σ²
= (A⁻¹)ᵢⱼ σ²

If σ is not known, it can be estimated by (in analogy with the arithmetic average, where p = 1):
σ² ≈ S²/(n − p)

Note: apparently the simplest case is if the functions fⱼ are perpendicular (orthogonal); then A is diagonal and the parameters are not correlated. This is difficult to fulfill for a nonlinear (but locally linearized) estimate.
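The whole procedure of the last two slides takes only a few lines of code. Below is a minimal Python sketch (the model y = a₁ + a₂x, i.e., f₁ = 1 and f₂ = x, together with the synthetic data, noise level, and seed, are invented for illustration; numpy supplies the linear algebra):

```python
# Least squares as in the slides: A = F.F^T, b = F.y, a = A^-1.b,
# Cov(a_i, a_j) = (A^-1)_ij sigma^2 with sigma^2 ~ S^2/(n - p).
import numpy as np

rng = np.random.default_rng(seed=2)
n, p = 50, 2
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 3.0 * x + rng.normal(0.0, 0.1, n)  # true a = (1, 3), sigma = 0.1

F = np.vstack([np.ones(n), x])     # p x n matrix, F[k, i] = f_k(x_i)
A = F @ F.T                        # normal matrix, p x p
b = F @ y
a = np.linalg.solve(A, b)          # parameter estimates

S2 = np.sum((F.T @ a - y) ** 2)    # residual sum of squares
sigma2 = S2 / (n - p)              # estimate of sigma^2 (n - p dof)
cov = np.linalg.inv(A) * sigma2    # Cov(a_i, a_j) = (A^-1)_ij sigma^2

print(a)                           # -> ~ [1, 3]
print(np.sqrt(np.diag(cov)))       # standard errors of the parameters
```

Note that a⃗ is obtained by solving A·a⃗ = b⃗ rather than by forming A⁻¹ explicitly, which is numerically safer; A⁻¹ is computed only because the covariance matrix (A⁻¹)ᵢⱼ σ² requires it.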