Mathematical statistics
A random variable (stochastic variable) assigns a probability (probability
density) to a possible discrete (continuous) event from a certain discrete
(continuous) set of events.
Discrete example: dice, $p_i = 1/6$ for $i \in \{1, 2, 3, 4, 5, 6\}$
Continuous example: time of nucleus decay, $p(t) = k\,e^{-kt}$
A continuous random variable in 1D ($x \in \mathbb{R}$) is described by a distribution function, density of probability, (continuous) probability distribution, . . . $p(x)$:
$p(x)\,dx$ = probability that the event $x \in [x, x + dx)$ occurs
In 2D, $p(x, y)$ is defined so that the event $x \in [x, x + dx)$ and $y \in [y, y + dy)$ happens with probability $p(x, y)\,dx\,dy$.
Normalization:
$$\sum_i p_i = 1 \qquad\text{or}\qquad \int_{-\infty}^{\infty} p(x)\,dx = 1$$
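As an aside (not part of the original slides), both normalizations can be checked numerically; a minimal Python/NumPy sketch, assuming k = 1 for the decay density:

import numpy as np

rng = np.random.default_rng(1)

# Continuous case: p(t) = k*exp(-k*t) on [0, inf) integrates to 1 (k = 1 assumed)
k = 1.0
t = np.linspace(0.0, 50.0, 200_001)
dt = t[1] - t[0]
print(np.sum(k * np.exp(-k * t)) * dt)        # ~ 1.0

# Discrete case: relative frequencies of die throws approach p_i = 1/6
throws = rng.integers(1, 7, size=100_000)
print(np.bincount(throws)[1:] / len(throws))  # each ~ 0.167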
Independent random variables and the covariance
Random variables x (with distribution $p_1(x)$) and y (with distribution $p_2(y)$) are independent if
$$p(x, y) = p_1(x)\,p_2(y)$$
In the discrete case (throw a dice twice, $p_{ij} = 1/36$):
$$p_{ij} = p_{1,i}\,p_{2,j}$$
Covariance of x and y (with a 2D distribution $p(x, y)$), where $\Delta x = x - \langle x\rangle$:
$$\mathrm{Cov}(x, y) = \langle\Delta x\,\Delta y\rangle = \int \Delta x\,\Delta y\; p(x, y)\,dx\,dy$$
The covariance of two independent random variables is zero:
$$\mathrm{Cov}(x, y) = \langle\Delta x\,\Delta y\rangle_{x,y} = \int dx \int dy\; \Delta x\,p_1(x)\,\Delta y\,p_2(y) = \langle\Delta x\rangle_x\,\langle\Delta y\rangle_y = 0$$
Covariance of two quantities $f(x)$ and $g(x)$:
$$\mathrm{Cov}(f, g) = \langle\Delta f\,\Delta g\rangle = \int \Delta f\,\Delta g\; p(x)\,dx$$
Cumulative (integral) distribution function = probability that the random variable takes a value $\le x$:
$$P(x) = \int_{-\infty}^{x} p(x')\,dx'$$
and similarly in the discrete case. NB: $\mathrm{Cov}(x, x) = \mathrm{Var}\,x$
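A quick numerical illustration (an added sketch, not from the slides): for two independently generated uniform samples the estimated covariance is close to zero, while Cov(x, x) reproduces Var x.

import numpy as np

rng = np.random.default_rng(2)
x = rng.random(1_000_000)   # uniform in [0, 1)
y = rng.random(1_000_000)   # generated independently of x

dx, dy = x - x.mean(), y - y.mean()
print((dx * dy).mean())     # Cov(x, y) ~ 0 for independent variables
print((dx * dx).mean())     # Cov(x, x) = Var x ~ 1/12 ≈ 0.0833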
Probability distribution
Warning. In physics etc., the symbols x (random variable) and x (a value, e.g., in integration) are not distinguished.
Mean value, expectation value (not the averaged value = arithmetic average of a sample):
$$E(x) \equiv \langle x\rangle \equiv \langle x\rangle_x \overset{\text{loosely}}{=} \langle x\rangle = \int x\,p(x)\,dx \qquad\text{or}\qquad \sum_i x_i p_i$$
Example. Let's gamble: if one particular face is thrown, you will win 5 CZK; if anything else is thrown, you will lose 1 CZK. Is this game fair?
Yes, the mean win = (1/6)·5 + (5/6)·(−1) = 0.
Variance, fluctuation, dispersion, mean square deviation (MSD):
$$\mathrm{Var}(x) = \langle(x - \langle x\rangle)^2\rangle = \langle\Delta x^2\rangle = \langle x^2\rangle - \langle x\rangle^2$$
where $\Delta x = x - \langle x\rangle$. $\sigma = \sqrt{\mathrm{Var}\,x}$ is called the standard deviation.
Example. Let the distribution of u be uniform in the interval [0, 1); on a computer, e.g., rnd(0). Calculate the expectation and the variance.
$\langle u\rangle = 1/2$, $\mathrm{Var}(u) = 1/12$
Correlation coefficient
$$r(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}}$$
Example. Let $u_1$ and $u_2$ be two independent random variables with uniform distribution in [0, 1]. Calculate:
a) $r(u_1, -u_1)$
b) $r(u_1^2, u_1^2)$
c) $r(u_1, u_2 + u_1)$ (see Maple)
a) $-1$, b) $1$, c) $1/\sqrt{2}$
[plot/matnum2r.sh]
tab 1 100000 | tabproc "rnd(0)" "rnd(0)" | tabproc A A+B | lr
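The examples above are easy to check by a Monte Carlo estimate; the following Python sketch (illustrative only, mirroring the tabproc pipeline above) reproduces ⟨u⟩ = 1/2, Var u = 1/12 and r(u₁, u₂ + u₁) ≈ 1/√2.

import numpy as np

rng = np.random.default_rng(3)
u1 = rng.random(1_000_000)
u2 = rng.random(1_000_000)

print(u1.mean(), u1.var())     # ~ 0.5 and ~ 1/12 ≈ 0.0833

def corr(a, b):
    """Correlation coefficient r(a, b) = Cov(a, b) / sqrt(Var a * Var b)."""
    da, db = a - a.mean(), b - b.mean()
    return (da * db).mean() / np.sqrt((da**2).mean() * (db**2).mean())

print(corr(u1, -u1))           # a) -1
print(corr(u1, u2 + u1))       # c) 1/sqrt(2) ≈ 0.707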
Function of a random variable
Let x be a real random variable with distribution $p(x)$, and let $f(x)$ be a real function. The quantity (observable) $f(x)$ has the distribution
$$p_f(y) = \sum_{x:\,f(x)=y} \frac{p(x)}{|f'(x)|}$$
where the sum is over all roots x of the equation $f(x) = y$.
Example. Let u be uniform in $u \in [0, 1)$. Calculate the distribution function of $t = -\ln u$.
$\exp(-t)$: e.g., the time of atom decay for k = 1
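This example is the basis of inverse-transform sampling; a short Python sketch (illustrative, with k = 1) compares a histogram of t = −ln u with the density exp(−t).

import numpy as np

rng = np.random.default_rng(4)
u = rng.random(1_000_000)
t = -np.log(u)                         # distributed with density exp(-t), t >= 0

hist, edges = np.histogram(t, bins=np.linspace(0.0, 6.0, 61), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - np.exp(-centers))))   # small: the histogram follows exp(-t)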
Sum of random variables
Let x and y be two continuous random variables with distribution $p(x, y)$. The distribution of $x + y$ is
$$p_{x+y}(z)\,dz = \iint_{x+y\in(z,\,z+dz)} p(x, y)\,dx\,dy \overset{y:=z-x}{=} \int p(x, z - x)\,dx\,dz \quad\Rightarrow\quad p_{x+y}(z) = \int p(x, z - x)\,dx$$
Now, let $p(x, y) = p_1(x)\,p_2(y)$ (two independent distributions). Then
$$p_{x+y}(z) = \int p_1(x)\,p_2(z - x)\,dx \equiv (p_1 * p_2)(z)$$
$p_1 * p_2$ is called the convolution.
Discrete example: let's generate two random variables (e.g., roll two dice). What is the distribution of the sum of points?
p(2) = 1/36, p(3) = 2/36, . . . , p(7) = 6/36, . . . , p(12) = 1/36
Example. Calculate the distribution of $u_1 - u_2$.
0 for |x| > 1, 1 − |x| otherwise
[plot/matnum2conv.sh]
tab 1 100000 | tabproc "rnd(0)-rnd(0)" | histogr -1.5 1.5 .1 | plot -
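Both examples can be reproduced numerically; the sketch below (illustrative) convolves the two dice distributions with numpy.convolve and histograms u₁ − u₂, which follows the triangular density 1 − |x|.

import numpy as np

# Discrete convolution: distribution of the sum of two dice
die = np.full(6, 1 / 6)                    # p_i = 1/6 for faces 1..6
psum = np.convolve(die, die)               # probabilities of the sums 2..12
print(dict(zip(range(2, 13), psum.round(4))))   # 1/36, 2/36, ..., 6/36, ..., 1/36

# Continuous case: u1 - u2 has the triangular density 1 - |x| on (-1, 1)
rng = np.random.default_rng(5)
d = rng.random(1_000_000) - rng.random(1_000_000)
hist, edges = np.histogram(d, bins=np.linspace(-1.5, 1.5, 31), density=True)
print(hist.round(2))                       # ~ 0 outside (-1, 1), ~ 1 near x = 0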
Function of a random variable: mean value
Mean value of the quantity f:
$$\langle f\rangle = \int f(x)\,p(x)\,dx$$
Or, based on the new random variable $f = f(x)$:
$$\langle f\rangle = \int y\,p_f(y)\,dy$$
Both mean values are the same:
$$\langle f\rangle = \int f(x)\,p(x)\,dx \overset{\text{subst. } y=f(x)}{=} \int y\,\frac{p(x)}{f'(x)}\,dy = \int y\,p_f(y)\,dy$$
where in the 2nd integral x = the root of the equation $f(x) = y$; for simplicity we assume that there is only one root and that the function f is increasing.
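A numerical cross-check (an added sketch, not from the slides): for u uniform in [0, 1) and f(u) = −ln u, averaging f(u) directly and averaging the transformed random variable give the same value, ⟨f⟩ = 1.

import numpy as np

rng = np.random.default_rng(6)
u = rng.random(1_000_000)

f_direct = np.mean(-np.log(u))        # <f> = integral f(x) p(x) dx, sampled in x
y = -np.log(rng.random(1_000_000))    # samples of the new random variable f = f(u)
f_via_pf = y.mean()                   # <f> = integral y p_f(y) dy, sampled in y
print(f_direct, f_via_pf)             # both ~ 1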
Sum of independent random variables
The mean value and the variance of independent random variables are additive.
The verbose way, using the convolution:
$$E(x + y) = \int z\,p_{x+y}(z)\,dz = \iint z\,p_1(x)\,p_2(z - x)\,dx\,dz \overset{y:=z-x}{=} \iint (x + y)\,p_1(x)\,p_2(y)\,dx\,dy = \langle x\rangle_1 + \langle y\rangle_2 = E(x) + E(y)$$
Directly:
$$E(x + y) = \iint p_1(x)\,p_2(y)\,(x + y)\,dx\,dy = \iint p_1(x)\,p_2(y)\,x\,dx\,dy + \iint p_1(x)\,p_2(y)\,y\,dx\,dy = \int p_1(x)\,x\,dx + \int p_2(y)\,y\,dy = E(x) + E(y)$$
$$\mathrm{Var}(x + y) = \langle(\Delta x + \Delta y)^2\rangle_{x,y} = \langle(\Delta x)^2\rangle_{x,y} + 2\langle\Delta x\,\Delta y\rangle_{x,y} + \langle(\Delta y)^2\rangle_{x,y} = \mathrm{Var}(x) + \mathrm{Var}(y)$$
[show/convol.sh]
[plot/randomwalk.sh]
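The additivity is easy to check numerically; the sketch below (illustrative) adds independent uniform and exponential samples and compares the mean and the variance of the sum with the sums of the individual means and variances.

import numpy as np

rng = np.random.default_rng(7)
x = rng.random(1_000_000)               # uniform: mean 1/2, variance 1/12
y = rng.exponential(1.0, 1_000_000)     # exponential: mean 1, variance 1
s = x + y

print(s.mean(), x.mean() + y.mean())    # E(x+y) = E(x) + E(y)
print(s.var(),  x.var()  + y.var())     # Var(x+y) = Var(x) + Var(y) (independence needed)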
Central limit theorem
The sum of n equal independent distributions with a finite mean value and variance converges for n → ∞ to the Gaussian distribution with the mean value n⟨x⟩ and variance n Var x.
Example. Let us consider a discrete distribution b: p(−1/2) = p(1/2) = 1/2. Let us approximate the sum of n such distributions:
n = 1: p(−1/2) = 1/2, p(1/2) = 1/2, Var $b$ = 1/4
n = 2: p(−1) = 1/4, p(0) = 1/2, p(1) = 1/4, Var $b_2$ = 2/4
n = 3: p(±3/2) = 1/8, p(±1/2) = 3/8, Var $b_3$ = 3/4
Let n be even (for simplicity). Then for k = −n/2 .. n/2:
$$p(k) = \binom{n}{n/2 + k}\,2^{-n} \approx \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{k^2}{2\sigma^2}\right), \qquad \sigma^2 = \mathrm{Var}(b_n) = \frac{n}{4}$$
where we have used the Stirling formula $n! \approx n^n e^{-n}\sqrt{2\pi n}$.
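The convergence can be visualized by a simulation; this Python sketch (illustrative) sums n = 100 copies of the ±1/2 distribution b and compares the sample mean and variance of the sum with 0 and n/4, and its histogram with the corresponding Gaussian.

import numpy as np

rng = np.random.default_rng(8)
n, samples = 100, 200_000
b = rng.choice([-0.5, 0.5], size=(samples, n))   # n copies of the distribution b
s = b.sum(axis=1)                                # the sum b_n

print(s.mean(), s.var())                         # ~ 0 and ~ n/4 = 25
hist, edges = np.histogram(s, bins=np.arange(-20, 21, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
sigma2 = n / 4
gauss = np.exp(-centers**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
print(np.max(np.abs(hist - gauss)))              # small: the histogram is nearly Gaussian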
Central limit theorem – another verification
Write $p(n, k) = \binom{n}{n/2+k}\,2^{-n}$ as above. The ratio of neighbouring terms gives
$$\frac{\binom{n}{n/2+1}}{\binom{n}{n/2}} = \frac{n!}{(\frac{n}{2}-1)!\,(\frac{n}{2}+1)!} \Bigg/ \frac{n!}{(\frac{n}{2})!\,(\frac{n}{2})!} = \frac{\frac{n}{2}}{\frac{n}{2}+1}$$
so that
$$\ln p(n, 1) = \ln p(n, 0) + \ln\frac{\frac{n}{2}}{\frac{n}{2}+1} \approx \ln p(n, 0) - \frac{2}{n}$$
The next term:
$$\ln p(n, 2) = \ln p(n, 1) + \ln\frac{\frac{n}{2}-1}{\frac{n}{2}+2} \approx \ln p(n, 1) - \frac{6}{n}$$
and in general
$$\ln p(n, k) \approx \ln p(n, 0) - \frac{2}{n}\sum_{j=1}^{k}(2j-1), \qquad \sum_{j=1}^{k}(2j-1) \approx \int_0^k (2k-1)\,dk = k(k-1) \approx k^2$$
Similarly for negative k. In the limit of large k and n:
$$p(n, k) \approx p(n, 0)\exp\!\left(-\frac{k^2}{n/2}\right)$$
After normalization we get the same result again.
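The approximation can be checked directly; the following sketch (illustrative) compares the exact p(n, k) = C(n, n/2 + k)·2⁻ⁿ with p(n, 0)·exp(−k²/(n/2)) for n = 100.

from math import comb, exp

n = 100

def p(n, k):
    """Exact probability p(n, k) = C(n, n/2 + k) * 2**(-n)."""
    return comb(n, n // 2 + k) / 2**n

for k in (0, 2, 5, 10):
    print(k, p(n, k), p(n, 0) * exp(-k**2 / (n / 2)))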
Weights
A weighted average (the weights $w_i$ are not necessarily normalized):
$$\bar x = \frac{\sum_i x_i w_i}{\sum_i w_i}$$
Let us know $x_i$ (independent random variables) with errors (standard deviations, uncertainties) $\sigma_i$. Which weights are the best? Those giving the lowest σ!
We will derive the result for two quantities:
$$x = w\,x_1 + (1 - w)\,x_2$$
$$\sigma^2(x) = \langle(x - \langle x\rangle)^2\rangle = \langle(w\,\Delta x_1 + (1 - w)\,\Delta x_2)^2\rangle = w^2\sigma_1^2 + (1 - w)^2\sigma_2^2$$
The minimum is for
$$w = \frac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2}, \qquad 1 - w = w_2 = \frac{1/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}$$
Consequently (this can be generalized to more variables)
$$w_i = \frac{1}{\sigma_i^2}$$
But one must be careful if only estimates of $\sigma_i$ are known.
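A small numerical illustration (an added sketch): combining two measurements with σ₁ = 1 and σ₂ = 3 using the weights wᵢ = 1/σᵢ² gives a smaller spread than the plain average.

import numpy as np

rng = np.random.default_rng(10)
true_value = 5.0
sigma1, sigma2 = 1.0, 3.0
x1 = true_value + sigma1 * rng.standard_normal(100_000)
x2 = true_value + sigma2 * rng.standard_normal(100_000)

w1, w2 = 1 / sigma1**2, 1 / sigma2**2
weighted = (w1 * x1 + w2 * x2) / (w1 + w2)   # inverse-variance weighted average
plain = 0.5 * (x1 + x2)

print(weighted.std(), plain.std())           # ~ 0.95 vs ~ 1.58
print(1 / np.sqrt(w1 + w2))                  # predicted sigma of the weighted average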
The method of least squares
$\vec x_i$ = independent variables (n vectors of any dimension, i = 1..n)
$y_i$ = dependent variables (real numbers)
$1/\sigma_i^2$ = weights
$\vec a$ = parameters (p real parameters written as a vector), $p \le n$ (typically $p \ll n$)
We are looking for a function $f_{\vec a}(\vec x)$ dependent on the p parameters $\vec a$ which describes the data $(\vec x_i, y_i)$ (fitting, correlation, regression). The parameters $\vec a$ are determined so that the sum of the squared deviations is minimal:
$$\min_{\vec a} S^2, \qquad S^2 = \sum_i \left(\frac{f_{\vec a}(\vec x_i) - y_i}{\sigma_i}\right)^{\!2}$$
Theorem (Gauss–Markov): for a function $f_{\vec a}$ linearly dependent on $\vec a$, the above solution is the:
Best (gives the smallest variance of the estimated $\vec a$)
Linear (the assumptions)
Unbiased ($\langle\vec a\rangle$ is correct)
Estimate (BLUE).
Example. For $f_a(x) = a$ (a constant) and $\sigma_i = 1$, find the estimate of a.
$a = \bar y$ (the arithmetic average)
The results are: (1) the estimate of $\vec a$, (2) the estimates of the errors, (3) the correlations between the parameters.
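For the example above, minimizing S² with f_a(x) = a indeed returns the arithmetic average of the yᵢ; a short check (illustrative sketch) using numpy.polyfit with a degree-0 polynomial:

import numpy as np

rng = np.random.default_rng(11)
x = np.arange(10.0)
y = 2.0 + rng.standard_normal(10)   # data scattered around a constant

a = np.polyfit(x, y, deg=0)[0]      # least-squares fit of a constant
print(a, y.mean())                  # identical: the estimate of a is the average of y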
Estimates
We usually have a sample of a random variable, e.g., 100× rolled dice.
Estimate of the expectation value = the arithmetic average (averaged value):
$$\langle x\rangle \approx \bar x_n \equiv \frac{1}{n}\sum_{i=1}^{n} x_i$$
Let's calculate the variance of $\bar x_n$:
$$\mathrm{Var}(\bar x_n) = \langle(\bar x_n - \langle x\rangle)^2\rangle = \left\langle\left(\frac{\sum_i \Delta x_i}{n}\right)^{\!2}\right\rangle = \frac{\sigma^2}{n}$$
where $\sigma^2 = \mathrm{Var}\,x$. We assumed that the events are independent, $\langle\Delta x_i\,\Delta x_j\rangle = 0$ for $i \ne j$. Now, let us estimate $\sigma^2$:
$$\left\langle\sum_i\left(x_i - \frac{1}{n}\sum_j x_j\right)^{\!2}\right\rangle = n\left\langle\left[\left(1 - \frac{1}{n}\right)\Delta x_1 - \frac{\Delta x_2}{n} - \cdots\right]^2\right\rangle = (n - 1)\,\sigma^2$$
Consequently, the estimates are (NB: n − 1 = the number of degrees of freedom):
$$\sigma^2 \approx \frac{\sum_i \Delta x_i^2}{n-1} = \frac{\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2}{n-1}, \qquad \mathrm{Var}(\bar x_n) \approx \frac{\frac{1}{n}\sum_i x_i^2 - \left(\frac{1}{n}\sum_i x_i\right)^2}{n-1}, \qquad \sigma(\bar x_n) \approx \sqrt{\mathrm{Var}(\bar x_n)}$$
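These estimates correspond to the ddof=1 convention in NumPy; the sketch below (illustrative) evaluates them for a sample and compares with numpy's built-in functions.

import numpy as np

rng = np.random.default_rng(12)
x = rng.random(1000)        # a sample of n values of a uniform random variable
n = len(x)

mean = x.sum() / n                                     # estimate of <x>
sigma2 = ((x**2).sum() - x.sum()**2 / n) / (n - 1)     # estimate of sigma^2
var_mean = sigma2 / n                                  # estimate of Var(x_n)

print(mean, sigma2, np.var(x, ddof=1))   # sigma2 equals numpy's ddof=1 variance
print(np.sqrt(var_mean))                 # standard error sigma(x_n) of the average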
Terminology notes
Thus, the square of the sample standard deviation∗, $\frac{\sum_i x_i^2 - \frac{1}{n}(\sum_i x_i)^2}{n-1}$, is an unbiased estimate† of the squared standard deviation, σ², of the random variable x.
The random variable x may represent a measurement which is subject to uncertainty; an estimate of the standard deviation (e.g., the one above) is called the standard uncertainty in metrology.
Similarly, $\frac{\frac{1}{n}\sum_i x_i^2 - (\frac{1}{n}\sum_i x_i)^2}{n-1}$ is an unbiased estimate of $\sigma(\bar x_n)^2$. Its square root is referred to as the standard error or standard uncertainty of the average; do not forget that the average itself is an unbiased estimate of ⟨x⟩.
For large n, it follows from the central limit theorem that:
$\langle x\rangle \in (\bar x_n - \sigma(\bar x_n),\ \bar x_n + \sigma(\bar x_n))$ with probability 68 %
$\langle x\rangle \in (\bar x_n - 2\sigma(\bar x_n),\ \bar x_n + 2\sigma(\bar x_n))$ with probability 95 %
$\langle x\rangle \in (\bar x_n - 3\sigma(\bar x_n),\ \bar x_n + 3\sigma(\bar x_n))$ with probability 99.7 %
In physics, $\sigma(\bar x_n)$ is provided and called the "error"; in biology, economics, and engineering, $2\sigma(\bar x_n)$ is common instead (95 % level of confidence).
∗ or corrected sample standard deviation, to stress the division by n − 1 instead of n
† or estimator, to stress that it is an algorithm
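The quoted probabilities can be verified by a simulation; the sketch below (illustrative) repeats the averaging many times and counts how often ⟨x⟩ lies within ±σ(x̄ₙ), ±2σ(x̄ₙ) and ±3σ(x̄ₙ).

import numpy as np

rng = np.random.default_rng(13)
n, repeats = 100, 20_000
x = rng.random((repeats, n))                 # uniform samples, <x> = 0.5

means = x.mean(axis=1)
errs = x.std(axis=1, ddof=1) / np.sqrt(n)    # sigma(x_n) for each repetition

for c in (1, 2, 3):
    covered = np.abs(means - 0.5) < c * errs
    print(c, covered.mean())                 # ~ 0.68, 0.95, 0.997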
Linear or linearizable parameters
$$f_{\vec a}(\vec x) = f_{\vec 0}(\vec x) + \sum_{j=1}^{p} a_j f_j(\vec x), \qquad f_j(\vec x) = \frac{\partial f(\vec x)}{\partial a_j}$$
Without a loss of generality we can assume $f_{\vec 0}(\vec x_i) = 0$.
Let all the weights be the same. The data with the errors are:
$$y_i = \sum_{j=1}^{p} a_j f_j(\vec x_i) + \delta y_i, \qquad \langle\delta y_i\rangle = 0, \quad \langle\delta y_i\,\delta y_j\rangle = \sigma^2\delta_{ij}$$
where $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ for $i \ne j$ (the Kronecker delta).
$$S^2 = \sum_i \left(\sum_{j=1}^{p} a_j f_j(\vec x_i) - y_i\right)^{\!2}$$
We are looking for the minimum, hence the derivatives should be zero:
$$\frac{1}{2}\,\frac{\partial S^2}{\partial a_k} = \sum_i f_k(\vec x_i)\left(\sum_{j=1}^{p} a_j f_j(\vec x_i) - y_i\right) = (A\cdot\vec a - \vec b)_k = 0$$
Let $F_{ki} = f_k(\vec x_i)$ (a p × n matrix); then $A = F\cdot F^{\mathsf T}$ and $\vec b = F\cdot\vec y$.
Linear or linearizable parameters II
$$A\cdot\vec a = \vec b, \qquad \vec a = A^{-1}\cdot\vec b = A^{-1}\cdot F\cdot\vec y$$
Errors of the estimates and the correlations of the parameters (RULE: the summation is over pairs of the same indices):
$$\mathrm{Cov}(a_i, a_j) = \langle\Delta a_i\,\Delta a_j\rangle = \sum \left\langle A^{-1}_{i\alpha} F_{\alpha k}\,\delta y_k\; A^{-1}_{j\beta} F_{\beta l}\,\delta y_l \right\rangle = \sum A^{-1}_{i\alpha} F_{\alpha k}\, A^{-1}_{j\beta} F_{\beta l}\,\sigma^2\delta_{kl} = \sum A^{-1}_{i\alpha} A_{\alpha\beta} A^{-1}_{j\beta}\,\sigma^2 = A^{-1}_{ij}\,\sigma^2$$
If σ is not known, it can be estimated by (in analogy with the arithmetic average, where p = 1):
$$\sigma^2 = \frac{S^2}{n - p}$$
Note: apparently the simplest case is when the functions $f_j$ are orthogonal (perpendicular); then A is diagonal and the parameters are not correlated. This is difficult to fulfill for a nonlinear (but locally linearized) estimate.
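The formulas of the last two slides translate directly into a few lines of NumPy; the sketch below (illustrative only, fitting y = a₁ + a₂x to synthetic noisy data) builds F, solves A·a = b, and estimates σ² = S²/(n − p) and Cov(aᵢ, aⱼ) = A⁻¹ᵢⱼ σ².

import numpy as np

rng = np.random.default_rng(14)
n, p = 50, 2
x = np.linspace(0.0, 10.0, n)
y = 1.0 + 0.5 * x + 0.3 * rng.standard_normal(n)   # data with errors, sigma = 0.3

F = np.vstack([np.ones(n), x])       # F_ki = f_k(x_i); here f_1 = 1, f_2 = x (p x n)
A = F @ F.T                          # A = F F^T
b = F @ y                            # b = F y
a = np.linalg.solve(A, b)            # parameter estimates, ~ [1.0, 0.5]
print(a)

S2 = np.sum((F.T @ a - y)**2)        # sum of squared deviations
sigma2 = S2 / (n - p)                # estimate of sigma^2
cov = np.linalg.inv(A) * sigma2      # Cov(a_i, a_j) = A^{-1}_{ij} sigma^2
print(np.sqrt(np.diag(cov)))         # errors (standard deviations) of a_1 and a_2
print(np.sqrt(sigma2))               # ~ 0.3, the assumed noise level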