approximate probability that the yield level will be less than or equal to 250 MPa is
$P[Y \le 250] = 0.257/2 = 0.1285$, or almost 13%.
1.2.7 The Transformation of Multi-Dimensional Random Variables
In many cases, one has to deal with distributions of random variables that are functions of
other basic random variables, that is, they are derived through transformation of the basic
variables. As an example, let $(X,Y)$ denote a two-dimensional random vector, or a point in $\mathbb{R}^2$, the
$(x,y)$ plane, which through the functional transformation

$$U = f(X,Y), \qquad V = g(X,Y) \tag{1.82}$$

produces another random vector $(U,V)$. If the probability density $p(x,y)$ of $(X,Y)$ is given, the
density $q(u,v)$ of $(U,V)$ is sought.
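The probability that $(X,Y)$ has a value within the rectangle
$\{x_1 < X \le x_1+\Delta x_1,\; y_1 < Y \le y_1+\Delta y_1\}$, i.e. has a value that is close to $(x_1,y_1)$, is represented by the
integral

$$P[x_1 < X \le x_1+\Delta x_1,\; y_1 < Y \le y_1+\Delta y_1] = \int_{x_1}^{x_1+\Delta x_1}\!\int_{y_1}^{y_1+\Delta y_1} p(x,y)\,dy\,dx \approx p(x_1,y_1)\,\Delta x_1\,\Delta y_1$$

which corresponds to the volume of the column shown in Fig. 1.14. Through the presumed one-to-one
coordinate transformation

$$u = f(x,y), \quad v = g(x,y), \qquad \text{with the inverse} \qquad x = h(u,v), \quad y = k(u,v) \tag{1.83}$$

the volume of the corresponding column in the $(u,v)$ plane is given by

$$p\big(h(u_1,v_1),k(u_1,v_1)\big)\,|J|\,\Delta u\,\Delta v$$

where $|J|$ denotes the absolute value of the functional determinant, the Jacobian, of the inverse
transformation, Eq. (1.83), that is,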
Fig. 1.14 The Probability Density of (X,Y)
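$$J = \frac{\partial(h,k)}{\partial(u,v)} = \begin{vmatrix} \partial h/\partial u & \partial h/\partial v \\ \partial k/\partial u & \partial k/\partial v \end{vmatrix} \tag{1.84}$$

where, by supposition, the inverse transformation functions $h(u,v)$ and $k(u,v)$ possess continuous
first-order partial derivatives. The probability that the pair $(U,V)$ has a value within the
transformed "rectangle" $\{u_1 < U \le u_1+\Delta u,\; v_1 < V \le v_1+\Delta v\}$ is therefore approximately

$$P[u_1 < U \le u_1+\Delta u,\; v_1 < V \le v_1+\Delta v] \approx p\big(h(u_1,v_1),k(u_1,v_1)\big)\,|J|\,\Delta u\,\Delta v$$

and the probability density $q(u,v)$ is given as

$$q(u,v) = p\big(h(u,v),k(u,v)\big)\,|J| \tag{1.85}$$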
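As a minimal numerical sketch of Eq. (1.85), the function below evaluates $q(u,v)$ for given $p$, $h$ and $k$. It uses Python with the standard library only, and the function names are illustrative. The Jacobian of the inverse map is approximated by central differences, which is a convenience assumed here; an analytic Jacobian is preferable when available. The check uses $U = X+Y$, $V = X-Y$ for two independent standard normal variables. For this map $|J| = 1/2$, and $U$ and $V$ are known to be independent normal variables with variance 2 each.

```python
import math

def transformed_density(p, h, k, u, v, eps=1e-6):
    """q(u, v) = p(h(u, v), k(u, v)) * |J|, Eq. (1.85).

    The Jacobian J of the inverse transformation is approximated by
    central differences (illustrative choice, not the only option).
    """
    h_u = (h(u + eps, v) - h(u - eps, v)) / (2 * eps)
    h_v = (h(u, v + eps) - h(u, v - eps)) / (2 * eps)
    k_u = (k(u + eps, v) - k(u - eps, v)) / (2 * eps)
    k_v = (k(u, v + eps) - k(u, v - eps)) / (2 * eps)
    jac = h_u * k_v - h_v * k_u          # determinant of Eq. (1.84)
    return p(h(u, v), k(u, v)) * abs(jac)

def std_normal_pair(x, y):
    """Joint density of two independent N(0, 1) variables."""
    return math.exp(-(x * x + y * y) / 2) / (2 * math.pi)

def h_inv(u, v):
    return (u + v) / 2                    # x = h(u, v) for U = X + Y, V = X - Y

def k_inv(u, v):
    return (u - v) / 2                    # y = k(u, v)

q = transformed_density(std_normal_pair, h_inv, k_inv, 0.7, -0.3)
exact = math.exp(-(0.7**2 + 0.3**2) / 4) / (4 * math.pi)   # N(0, 2) x N(0, 2) density
print(q, exact)
```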
Example 1.21
The distribution of two basic random variables X and Y is given. A new random variable
U is formed as (1) the sum, $U = X+Y$, and (2) the product, $U = XY$, of the basic variables. The
distribution function of each of the two U's is sought.
Solution:
By introducing an auxiliary variable $V = X$, the problem can be treated as a transformation
exercise. Therefore, let

(1) $U = X+Y$, $V = X$. Then

$$x = h(u,v) = v, \qquad y = k(u,v) = u - v$$

and

$$J = \begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1, \qquad |J| = 1$$

whereby the probability density of the two variables U and V is equal to $p(v,u-v)$. The density
$R(u)$ of the variable $U = X+Y$ must then be obtained as the marginal density of the
two-dimensional distribution of the random vector $(U,V)$, by integration over v from $-\infty$ to $\infty$:

$$R(u) = \int_{-\infty}^{\infty} p(v,u-v)\,dv$$

If the two basic variables X and Y are independent, then $p(x,y) = f_1(x)f_2(y)$ and

$$R(u) = \int_{-\infty}^{\infty} f_1(v)\,f_2(u-v)\,dv$$
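(2) $U = XY$, $V = X$, $x = h(u,v) = v$, $y = k(u,v) = u/v$

and

$$J = \begin{vmatrix} 0 & 1 \\ 1/v & -u/v^2 \end{vmatrix} = -\frac{1}{v}, \qquad |J| = \frac{1}{|v|}$$

The density of U is again obtained by integration over v, that is,

$$R(u) = \int_{-\infty}^{\infty} p\!\left(v,\frac{u}{v}\right)\frac{dv}{|v|}$$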
Notice that $x = 0$ must be excluded, as it represents the y-axis in the xy-plane, which is
transformed into the u-axis, $v = 0$, in the uv-plane. Since both these axes carry zero
probability mass, this does not affect the calculation.
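The two marginal integrals of Example 1.21 can be checked numerically. The sketch below uses Python with the standard library only, and its function names are illustrative. It evaluates the integrals with a midpoint rule, assuming for illustration that X and Y are independent and uniform on (0, 1). For that case the exact results are known: the sum has the triangular density $R(u) = u$ on [0, 1] and $2-u$ on [1, 2], and the product has $R(u) = -\log_e u$ on (0, 1).

```python
import math

def density_of_sum(f1, f2, u, lo, hi, n=2000):
    """R(u) = integral of f1(v) f2(u - v) dv over [lo, hi] (midpoint rule).

    Assumes independent X, Y and that f1 vanishes outside [lo, hi].
    """
    dv = (hi - lo) / n
    return dv * sum(f1(lo + (i + 0.5) * dv) * f2(u - (lo + (i + 0.5) * dv)) for i in range(n))

def density_of_product(f1, f2, u, lo, hi, n=2000):
    """R(u) = integral of f1(v) f2(u / v) / |v| dv over [lo, hi]; the line v = 0 is skipped."""
    dv = (hi - lo) / n
    total = 0.0
    for i in range(n):
        v = lo + (i + 0.5) * dv
        if v != 0.0:
            total += f1(v) * f2(u / v) / abs(v)
    return total * dv

def uniform01(x):
    """Density of a Uniform(0, 1) variable."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

r_sum = [density_of_sum(uniform01, uniform01, u, 0.0, 1.0) for u in (0.5, 1.0, 1.5)]
r_prod = density_of_product(uniform01, uniform01, 0.25, 0.0, 1.0)
expected_prod = -math.log(0.25)
print(r_sum, r_prod, expected_prod)
```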
Example 1.22
Let Y be a random variable that is constructed as the weighted sum of n random variables
$X_i$, $i = 1,2,\dots,n$. Thus

$$Y = \sum_{i=1}^{n} a_i X_i$$

where the constants $a_i$, the weights, have real values. Find the expected value and the variance
of Y.
Solution:
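Obtaining the moments of Y by the method of transformed variables with multidimensional
distributions, as in the above example, would seem to be insurmountable work. It is therefore
more advantageous to obtain the moments directly from their definition. Thus

$$E[Y] = E[a_1X_1 + a_2X_2 + \cdots + a_nX_n] = a_1E[X_1] + a_2E[X_2] + \cdots + a_nE[X_n] = \sum_{i=1}^{n} a_i E[X_i] \tag{1.86}$$

since the operator $E[\cdot]$ is linear. The second central moment, the variance, is obtained as
follows,

$$\mathrm{Var}[Y] = E\big[(Y - E[Y])^2\big] = E\Big[\Big(\sum_{i=1}^{n} a_i (X_i - E[X_i])\Big)^2\Big] = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}[X_i,X_j] \tag{1.87a}$$

or, by introducing the correlation coefficients $\rho_{ij}$ and the standard deviations $\sigma_i$,

$$\mathrm{Var}[Y] = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\rho_{ij}\,\sigma_i\sigma_j \tag{1.87b}$$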
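If the random variables $\{X_i\}$ are pairwise uncorrelated, $\mathrm{Cov}[X_i,X_j] = 0$ for $i \ne j$, and

$$\mathrm{Var}[Y] = \sum_{i=1}^{n} a_i^2\sigma_i^2 \tag{1.87c}$$

An important special case of Eq. (1.87c) is when the $X_i$ variables are mutually independent, since independent variables are also uncorrelated.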
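In matrix form, Eqs. (1.86) and (1.87a) read $E[Y] = \mathbf{a}^T\boldsymbol{\mu}$ and $\mathrm{Var}[Y] = \mathbf{a}^T\mathbf{C}\,\mathbf{a}$, where $\mathbf{C}$ is the covariance matrix. The short NumPy sketch below evaluates both. The weights, means, standard deviations and correlation coefficients are arbitrary illustration values, not data from the text. It also shows how ignoring the correlations, as in Eq. (1.87c), changes the variance.

```python
import numpy as np

def weighted_sum_moments(a, mean, cov):
    """Mean and variance of Y = sum_i a_i X_i, Eqs. (1.86) and (1.87a)."""
    a = np.asarray(a, dtype=float)
    mean_y = float(a @ np.asarray(mean, dtype=float))     # sum_i a_i E[X_i]
    var_y = float(a @ np.asarray(cov, dtype=float) @ a)   # sum_i sum_j a_i a_j Cov[X_i, X_j]
    return mean_y, var_y

a = [2.0, -1.0, 0.5]                 # weights (illustrative)
mu = [1.0, 3.0, 4.0]                 # means of X_1, X_2, X_3
sd = np.array([1.0, 2.0, 0.5])       # standard deviations
rho = np.array([[1.0, 0.3, 0.0],     # correlation coefficients rho_ij
                [0.3, 1.0, -0.2],
                [0.0, -0.2, 1.0]])
cov = rho * np.outer(sd, sd)         # Cov[X_i, X_j] = rho_ij sigma_i sigma_j, Eq. (1.87b)

m, v = weighted_sum_moments(a, mu, cov)
_, v_uncorr = weighted_sum_moments(a, mu, np.diag(sd**2))   # uncorrelated case, Eq. (1.87c)
print(m, v, v_uncorr)
```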
1.2.8 Approximate Moments and Distributions
When dealing with random variables that have complicated functional relationships with
the basic variables, it is often possible to evaluate their statistical properties approximately,
with sufficient accuracy for most practical problems.
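Consider a random variable Y that is given as a reasonably well-behaved function
$Y = g(X)$ of the random variable X, which has a coefficient of variation $V_X$ that is not too large.
Then the deviation of X from its mean value will not be too large either, and by expanding
$g(x)$ into a Taylor series about the mean value $\mu_X$, the following approximation is obtained:

$$Y = g(X) \approx g(\mu_X) + (X-\mu_X)\,g'(\mu_X) + \tfrac{1}{2}(X-\mu_X)^2\,g''(\mu_X) + \cdots \tag{1.88}$$

in which the derivatives are denoted by

$$g'(\mu_X) = \left.\frac{dg}{dx}\right|_{x=\mu_X}, \qquad g''(\mu_X) = \left.\frac{d^2g}{dx^2}\right|_{x=\mu_X}$$

Taking the expected value of both sides of the above equation, and noting that $E[X-\mu_X] = 0$ and
$E[(X-\mu_X)^2] = \sigma_X^2$, the following relation is found,

$$E[Y] \approx g(\mu_X) + \tfrac{1}{2}\,g''(\mu_X)\,\sigma_X^2 \tag{1.89}$$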
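The evaluation of the variance of both sides is somewhat more cumbersome. By
applying Eq. (1.87a) in connection with Eq. (1.88), with $a_1 = g'(\mu_X)$, $X_1 = X-\mu_X$, $a_2 = \tfrac{1}{2}g''(\mu_X)$
and $X_2 = (X-\mu_X)^2$, the following expression is obtained:

$$\mathrm{Var}[Y] \approx \big(g'(\mu_X)\big)^2\sigma_X^2 + g'(\mu_X)\,g''(\mu_X)\,E\big[(X-\mu_X)^3\big] + \tfrac{1}{4}\big(g''(\mu_X)\big)^2\,\mathrm{Var}\big[(X-\mu_X)^2\big]$$

or

$$\mathrm{Var}[Y] \approx \big(g'(\mu_X)\big)^2\sigma_X^2 \tag{1.90}$$

by neglecting the last two higher-order terms on the assumption that the coefficient of variation
$V_X$ is small. For instance, if $|V_X| \ll 1$, then
$\mathrm{Var}[((X-\mu_X)/\mu_X)^2] \ll \mathrm{Var}[(X-\mu_X)/\mu_X]$, or $\mathrm{Var}[(X-\mu_X)^2] \ll \mu_X^2\,\mathrm{Var}[X] = \mu_X^2\sigma_X^2$.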
Most often, only the first two terms of the expansion on the right-hand side of Eq.
(1.88) are needed to obtain a reasonably accurate value for the mean, and for the variance only
the leading term is needed. Thus,

$$E[Y] = E[g(X)] \approx g(E[X]) = g(\mu_X), \qquad \mathrm{Var}[Y] = \mathrm{Var}[g(X)] \approx \mathrm{Var}[X]\,\big(g'(\mu_X)\big)^2 \tag{1.91}$$
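The quality of these approximations can be judged on a case with a known answer. The Python sketch below takes $g(X) = X^2$ and compares the approximations with the exact moments, assuming for illustration that X is normal with $\mu_X = 10$ and $\sigma_X = 1$ ($V_X = 0.1$). The exact moments are then $E[X^2] = \mu^2 + \sigma^2$ and $\mathrm{Var}[X^2] = 4\mu^2\sigma^2 + 2\sigma^4$. The derivatives are taken by central differences, and the function name is illustrative.

```python
def approx_moments(g, mu, sigma, h=1e-3):
    """Taylor-series approximations of E[g(X)] and Var[g(X)] about mu.

    Returns (first-order mean, second-order mean, first-order variance);
    g' and g'' are approximated by central differences.
    """
    g0, gp, gm = g(mu), g(mu + h), g(mu - h)
    d1 = (gp - gm) / (2 * h)                 # g'(mu)
    d2 = (gp - 2 * g0 + gm) / h**2           # g''(mu)
    mean_first = g0                          # Eq. (1.91)
    mean_second = g0 + 0.5 * d2 * sigma**2   # Eq. (1.89)
    var_first = d1**2 * sigma**2             # Eqs. (1.90)-(1.91)
    return mean_first, mean_second, var_first

def square(x):
    return x * x

mu, sigma = 10.0, 1.0
m1, m2, v1 = approx_moments(square, mu, sigma)
exact_mean = mu**2 + sigma**2                    # valid for normal X
exact_var = 4 * mu**2 * sigma**2 + 2 * sigma**4  # valid for normal X
print(m1, m2, exact_mean, v1, exact_var)
```

Here the second-order mean is exact, while the first-order variance misses only the $2\sigma^4$ term, which is about 0.5% of the exact variance for $V_X = 0.1$.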
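Similar approximations are also possible for multidimensional functional relationships.
For instance, let

$$Y = g(X_1, X_2, X_3, \dots, X_n)$$

The second-order approximation for the mean is obtained by

$$E[Y] \approx g(\mu_1,\mu_2,\dots,\mu_n) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 g}{\partial x_i\,\partial x_j}\,\mathrm{Cov}[X_i,X_j] \tag{1.92}$$

where the derivatives are evaluated at the mean point $(\mu_1,\dots,\mu_n)$, and noting that
$\mathrm{Cov}[X_i,X_i] = \mathrm{Var}[X_i] = \sigma_i^2$. The first-order approximation for the variance is

$$\mathrm{Var}[Y] \approx \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial g}{\partial x_i}\,\frac{\partial g}{\partial x_j}\,\mathrm{Cov}[X_i,X_j] \tag{1.93}$$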
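When the random variables $\{X_i\}$ are mutually independent, the following simpler
expressions are obtained:

$$E[Y] \approx g(\mu_1,\dots,\mu_n) + \frac{1}{2}\sum_{i=1}^{n}\frac{\partial^2 g}{\partial x_i^2}\,\sigma_i^2 \tag{1.94}$$

and

$$\mathrm{Var}[Y] \approx \sum_{i=1}^{n}\left(\frac{\partial g}{\partial x_i}\right)^2\sigma_i^2 \tag{1.95}$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance of the random variable $X_i$, and the derivatives are
evaluated at the mean point.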
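Eqs. (1.94) and (1.95) only need the first and second partial derivatives at the mean point. The NumPy sketch below obtains them by central differences, which is an illustrative choice; the function name and the test function are hypothetical. It is checked on the product $Y = X_1X_2$ of two independent variables. For that product the exact mean is $\mu_1\mu_2$, and the exact variance $\mu_2^2\sigma_1^2 + \mu_1^2\sigma_2^2 + \sigma_1^2\sigma_2^2$ differs from Eq. (1.95) only by the last, higher-order term.

```python
import numpy as np

def approx_moments_independent(g, means, sds, h=1e-3):
    """Mean by Eq. (1.94) and variance by Eq. (1.95) for independent X_i.

    g takes a 1-D NumPy array; derivatives at the mean point are central differences.
    """
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(sds, dtype=float)
    g0 = g(mu)
    mean_y, var_y = g0, 0.0
    for i in range(mu.size):
        step = np.zeros_like(mu)
        step[i] = h
        gp, gm = g(mu + step), g(mu - step)
        d1 = (gp - gm) / (2 * h)              # dg/dx_i
        d2 = (gp - 2 * g0 + gm) / h**2        # d^2 g / dx_i^2
        mean_y += 0.5 * d2 * sd[i] ** 2
        var_y += d1**2 * sd[i] ** 2
    return float(mean_y), float(var_y)

def product(x):
    return x[0] * x[1]

mean_y, var_y = approx_moments_independent(product, [2.0, 3.0], [0.2, 0.6])
print(mean_y, var_y)
```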
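Finally, by series expansion (cf. Eq. (1.88)), a simple approximation for the distribution
of Y can be obtained. By retaining the first two terms on the right-hand side of Eq. (1.88),
the following linear approximation of Y is obtained:

$$Y \approx g(\mu_X) - \mu_X\,g'(\mu_X) + X\,g'(\mu_X)$$

which is of the form $Y = a + bX$, with $a = g(\mu_X) - \mu_X g'(\mu_X)$ and $b = g'(\mu_X)$. Then, if X has the density
function $p(x)$, the density of Y is

$$q(y) = p\!\left(\frac{y-a}{b}\right)\frac{1}{|b|}$$

or

$$q(y) \approx \frac{1}{|g'(\mu_X)|}\;p\!\left(\mu_X + \frac{y - g(\mu_X)}{g'(\mu_X)}\right) \tag{1.96}$$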
Example 1.23
In Fig. 1.15, the internal stress distribution in a cross section of a reinforced concrete
beam in pure bending is shown, just before breaking. Now let the ultimate breaking stress of the
concrete correspond to the random variable C with $\mu_C = 28$ MPa and a coefficient of variation
$V_C = 0.25$, and let the yield stress of the reinforcing steel bars correspond to the random variable S
with $\mu_S = 308$ MPa and $V_S = 0.1$. The cross section of the beam has a so-called balanced design,
that is, in the ultimate state, the concrete and the steel will reach their ultimate limits
simultaneously. The mean value and the variance of the breaking moment of the beam are to be
determined, given that the cross-sectional area of the steel reinforcement is $A_S = 942$ mm².

Fig. 1.15 The Ultimate Breaking State of a RC Beam
Solution:
The ultimate breaking moment of a RC beam with a balanced design can be expressed in terms of the
steel and concrete strengths (cf. Fig. 1.15). Thus $M = g(S,C)$ is a function of the two random variables
S and C, which can be taken as independent due to the nature of the problem, i.e. the strength of
the steel has nothing to do with the strength of the concrete.
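First, the partial derivatives must be evaluated:

$$g(s,c) = 5.28\cdot10^5\,s - 1400\,s^2/c \quad \text{(Nmm)}$$

$$g'_s = 5.28\cdot10^5 - 2800\,s/c, \qquad g''_{ss} = -2800/c$$

$$g'_c = 1400\,s^2/c^2, \qquad g''_{cc} = -2800\,s^2/c^3$$

At the mean point, $s = \mu_S$ and $c = \mu_C$, this gives $g''_{ss} = -2800/28 = -10^2$ and
$g''_{cc} = -2800\cdot308^2/28^3 = -1.21\cdot10^4$.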
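By the expressions in Eqs. (1.94) and (1.95), it is now possible to obtain the necessary information
about the ultimate breaking moment:

$$g(\mu_S,\mu_C) = 5.28\cdot10^5\cdot308 - 1400\cdot308^2/28 = 157.9\cdot10^6 \ \text{Nmm}$$

$$\tfrac{1}{2}\big(g''_{ss}\sigma_S^2 + g''_{cc}\sigma_C^2\big) = -\big(2800/28\cdot(0.1\cdot308)^2 + 2800\cdot308^2/28^3\cdot(0.25\cdot28)^2\big)/2 = -0.34\cdot10^6 \ \text{Nmm}$$

$$\mu_M = 157.9 - 0.34 = 157.6 \ \text{kNm}$$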
The variance of the breaking moment is obtained from Eq. (1.95):

$$\sigma_M^2 = \big(5.28\cdot10^5 - 2800\cdot308/28\big)^2(0.1\cdot308)^2 + \big(1400\cdot308^2/28^2\big)^2(0.25\cdot28)^2 = (15.35\cdot10^6)^2 \ (\text{Nmm})^2$$

and the coefficient of variation is $V_M = \sigma_M/\mu_M = 15.35/157.6 = 0.0974$. Therefore, the fundamental
assumption for the approximation is fulfilled, that is, the coefficient of variation is sufficiently
small ($V \le 0.1$).
If a reliable design value for the moment is defined as the mean value minus two standard
deviations, $M_D = \mu_M - 2\sigma_M = 157.6 - 2\cdot15.35 = 126.9$ kNm, the probability of failure can be computed as
$p_f = P[M \le M_D]$. Assuming that the moment is a normally distributed variable, the probability of
failure can be obtained from the standardized variable $U = (M-\mu_M)/\sigma_M$ (cf. Ex. 1.12), that is,

$$p_f = P[U \le -2] = 1 - P[U \le 2] = 1 - 0.9772 = 0.0228$$

In other words, if 1000 beams were made according to the above conditions, about 23 of them could
be expected to fail.
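The numbers of Example 1.23 can be reproduced with a few lines of Python using the standard library only. The moment function $g(s,c)$ and its derivatives are taken from the solution above. The script evaluates Eqs. (1.94) and (1.95) at the mean point. It then computes the failure probability under the same normality assumption as the text. It gives $\mu_M \approx 157.5$ kNm and $\sigma_M \approx 15.36$ kNm, which differ from the text only in the last digit. The text rounds intermediate results, for example $g(\mu_S,\mu_C)$ to $157.9\cdot10^6$ Nmm, before subtracting.

```python
import math

# Data of Example 1.23 (MPa); standard deviations from the coefficients of variation
mu_s, v_s = 308.0, 0.10
mu_c, v_c = 28.0, 0.25
sd_s, sd_c = v_s * mu_s, v_c * mu_c

def moment(s, c):
    """Breaking moment g(s, c) of the balanced RC section in Nmm, as given in the example."""
    return 5.28e5 * s - 1400.0 * s**2 / c

# Derivatives of g at the mean point, as listed in the solution
g_s = 5.28e5 - 2800.0 * mu_s / mu_c
g_c = 1400.0 * mu_s**2 / mu_c**2
g_ss = -2800.0 / mu_c
g_cc = -2800.0 * mu_s**2 / mu_c**3

mean_m = moment(mu_s, mu_c) + 0.5 * (g_ss * sd_s**2 + g_cc * sd_c**2)   # Eq. (1.94)
sd_m = math.sqrt(g_s**2 * sd_s**2 + g_c**2 * sd_c**2)                  # Eq. (1.95)
cov_m = sd_m / mean_m

m_design = mean_m - 2.0 * sd_m                    # mean minus two standard deviations
z = (m_design - mean_m) / sd_m                    # standardized design value (-2)
pf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z
print(f"mu_M = {mean_m / 1e6:.1f} kNm, sigma_M = {sd_m / 1e6:.2f} kNm, "
      f"V_M = {cov_m:.4f}, M_D = {m_design / 1e6:.1f} kNm, p_f = {pf:.4f}")
```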
1.2.9 Characteristic Functions
The concept of characteristic functions of a random variable can be very useful in many
applications. The characteristic function of the random variable X is defined as the function,
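$$\varphi(\omega) = E\big[e^{i\omega X}\big]$$

in which ω is a real variable and i is the imaginary unit. The expected value of the function
$\exp(i\omega X)$ is obtained in the usual manner by integration,

$$\varphi(\omega) = \int_{-\infty}^{\infty} e^{i\omega x}\,p(x)\,dx \tag{1.97}$$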
The two functions $\varphi(\omega)$ and $p(x)$ bear close resemblance to functions that form Fourier transform
pairs (for an overview of Fourier transforms and Fourier integrals, see Chapter 7). In order for
$p(x)$ to have a Fourier transform, it must satisfy Dirichlet's condition, that is, be absolutely
integrable, or $\int_{-\infty}^{\infty}|p(x)|\,dx < \infty$. Now, since $\int_{-\infty}^{\infty}|p(x)|\,dx = \int_{-\infty}^{\infty}p(x)\,dx = 1$, Dirichlet's condition is
fulfilled, and the function $\varphi(-\omega)$ is the same as the Fourier transform of the probability density
$p(x)$. By way of the inverse Fourier transform, the probability density $p(x)$ can be obtained from
the characteristic function,
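$$p(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega x}\,\varphi(\omega)\,d\omega \tag{1.98}$$

that is, the probability density $p(x)$ is the inverse transform of $\varphi(-\omega)$, or its representation in the
x-domain.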
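If the random variable X has a finite mean value, then Eq. (1.97) can be differentiated
with respect to ω, and

$$\varphi'(\omega) = i\int_{-\infty}^{\infty} x\,e^{i\omega x}\,p(x)\,dx$$

If ω = 0, then from the above relation,

$$\varphi'(0) = i\int_{-\infty}^{\infty} x\,p(x)\,dx = i\,E[X]$$

and

$$E[X] = -i\,\varphi'(0)$$

If the k-th moment of X is finite, then through repeated differentiation of Eq. (1.97), one finally
obtains for the k-th derivative,

$$\varphi^{(k)}(\omega) = E\big[i^k X^k e^{i\omega X}\big] \tag{1.99}$$

or, with ω = 0,

$$E[X^k] = \frac{\varphi^{(k)}(0)}{i^k} = (-i)^k\,\varphi^{(k)}(0) \tag{1.100}$$

Thus the k-th moment of a random variable equals the k-th derivative of its characteristic
function at the point ω = 0, divided by $i^k$.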
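Eqs. (1.97) and (1.100) can be tried out numerically. The NumPy sketch below makes two illustrative choices: it computes $\varphi(\omega)$ from a density by trapezoidal quadrature, and it estimates $\varphi'(0)$ and $\varphi''(0)$ by central differences. As an example it uses the exponential density $p(x) = e^{-x}$, $x \ge 0$, whose moments $E[X] = 1$ and $E[X^2] = 2$ are known. The function names are hypothetical.

```python
import numpy as np

def char_fn(pdf, x, omega):
    """phi(omega) = integral of exp(i omega x) p(x) dx, Eq. (1.97), trapezoidal rule on grid x."""
    f = np.exp(1j * omega * x) * pdf(x)
    return complex(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def moments_from_cf(phi, h=1e-3):
    """E[X] and E[X^2] by Eq. (1.100), with central differences at omega = 0."""
    d1 = (phi(h) - phi(-h)) / (2 * h)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2
    return (d1 / 1j).real, (d2 / 1j**2).real

def exp_pdf(t):
    return np.exp(-t)                    # exponential density on the grid (x >= 0)

x = np.linspace(0.0, 60.0, 600_001)     # the tail beyond x = 60 is negligible

def phi_exp(w):
    return char_fn(exp_pdf, x, w)

m1, m2 = moments_from_cf(phi_exp)
print(m1, m2)
```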
Example 1.24
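The probability density of the Gaussian distribution is

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Find the moments of the distribution.
Solution:
Firstly, the variable X will be standardized, that is,

$$U = (X-\mu)/\sigma, \qquad X = \sigma U + \mu$$

The characteristic functions of U and X are then related through the expression

$$\varphi_X(\omega) = E\big[e^{i\omega(\sigma U+\mu)}\big] = E\big[e^{i\sigma\omega U}\big]\,e^{i\mu\omega} = e^{i\mu\omega}\,\varphi_U(\sigma\omega)$$

As $p(u) = \exp(-u^2/2)/\sqrt{2\pi}$, the characteristic function $\varphi_U(\omega)$ is easily obtained,

$$\varphi_U(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega u}\,e^{-u^2/2}\,du = e^{-\omega^2/2}$$

Therefore,

$$\varphi_X(\omega) = \exp\!\left(i\mu\omega - \tfrac{1}{2}\sigma^2\omega^2\right) \tag{1.101}$$

and, by Eq. (1.100), $E[X] = \mu$, $E[X^2] = \mu^2+\sigma^2$, $E[X^3] = \mu^3+3\mu\sigma^2$, $E[X^4] = \mu^4+6\mu^2\sigma^2+3\sigma^4$, and so on.
It can be observed that all moments of the normal distribution are functions of μ and σ only.
That is, the normal distribution is completely characterized by its mean and variance.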
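The moments of Example 1.24 can also be generated mechanically with exact rational arithmetic. The Python sketch below assumes the moment generating function $M(t) = E[e^{tX}] = \exp(\mu t + \sigma^2t^2/2)$ of the normal distribution. It builds the power series of $M(t)$ from that of $\log_e M(t)$, using the standard recurrence $nb_n = \sum_{k=1}^{n} k\,a_k\,b_{n-k}$ for the exponential of a power series. $E[X^j]$ is then read off as $j!$ times the coefficient of $t^j$. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def moments_from_log_mgf(coeffs, order):
    """Raw moments E[X^j], j = 0..order, from log M(t) = sum_k coeffs[k] t^k (coeffs[0] = 0).

    If B(t) = exp(A(t)), then B' = A'B, i.e. n b_n = sum_{k=1}^{n} k a_k b_{n-k};
    E[X^j] = j! b_j is the coefficient of t^j / j!.
    """
    a = list(coeffs) + [0] * (order + 1)
    b = [Fraction(1)] + [Fraction(0)] * order
    for n in range(1, order + 1):
        b[n] = sum(k * a[k] * b[n - k] for k in range(1, n + 1)) / n
    return [factorial(j) * b[j] for j in range(order + 1)]

def normal_moments(mu, var, order=4):
    """Moments of N(mu, var) from log M(t) = mu t + var t^2 / 2."""
    return moments_from_log_mgf([0, Fraction(mu), Fraction(var) / 2], order)

print(normal_moments(2, 3))        # E[X^0], ..., E[X^4] for mu = 2, sigma^2 = 3
print(normal_moments(0, 1, 6))     # standard normal: 1, 0, 1, 0, 3, 0, 15
```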
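The log-characteristic function should also be mentioned:

$$\psi(\omega) = \log_e\varphi(\omega) \tag{1.102}$$

It can be used for obtaining the so-called semi-invariants or cumulants $\kappa_s$, first introduced by the
Danish mathematician T. N. Thiele (see, for instance, Cramér [41]). Indeed,

$$\psi(\omega) = \sum_{s=1}^{\infty}\kappa_s\frac{(i\omega)^s}{s!}, \qquad \kappa_s = \frac{\psi^{(s)}(0)}{i^s} \tag{1.103}$$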
The semi-invariants of the random variable X are thus related to the central moments as follows,

$$\kappa_1 = m_1 = \mu_X = \mu_1, \quad \kappa_2 = \mu_2 = \sigma_X^2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2, \quad \dots \tag{1.104}$$
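Eq. (1.104) translates directly into code once the central moments are known. The Python sketch below, using the standard library only, computes the first four semi-invariants of a discrete distribution from its probabilities. As an illustrative check it uses a binomial distribution with n = 10 and p = 0.3. Its cumulants are known to be $np$, $np(1-p)$, $np(1-p)(1-2p)$ and $np(1-p)(1-6p(1-p))$. The function name is hypothetical.

```python
import math

def cumulants_from_pmf(values, probs):
    """First four semi-invariants from the central moments of a discrete law, Eq. (1.104)."""
    mean = sum(x * p for x, p in zip(values, probs))
    mu2, mu3, mu4 = (sum((x - mean) ** r * p for x, p in zip(values, probs)) for r in (2, 3, 4))
    return mean, mu2, mu3, mu4 - 3.0 * mu2**2

n, p = 10, 0.3                                   # binomial example (illustrative)
ks = range(n + 1)
probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in ks]
kappa = cumulants_from_pmf(ks, probs)
expected = (n * p, n * p * (1 - p), n * p * (1 - p) * (1 - 2 * p),
            n * p * (1 - p) * (1 - 6 * p * (1 - p)))
print(kappa, expected)
```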
Example 1.25
From the characteristic function $\varphi(\omega)$, another important function, the moment
generating function, can be obtained by replacing the argument ω by $-it$, where i is the imaginary
unit. Thus a real function $M(t)$ is obtained, i.e.

$$M(t) = \varphi(-it) = E\big[e^{tX}\big] \tag{1.105}$$

Using the Taylor series expansion of the exponential function,

$$e^{tx} = 1 + tx + \frac{(tx)^2}{2!} + \frac{(tx)^3}{3!} + \frac{(tx)^4}{4!} + \cdots$$

the moment generating function is expanded as follows if the first n moments are finite:

$$M(t) = 1 + E[X]\,t + E[X^2]\frac{t^2}{2!} + \cdots + E[X^n]\frac{t^n}{n!} + o(t^n) \tag{1.106}$$

Thus, the j-th moment of the random variable X is the coefficient of $t^j/j!$ in Eq. (1.106),

$$E[X^j] = M^{(j)}(0) \tag{1.107}$$

The characteristic function itself can be expanded in the same manner (cf. Eq. (1.100)),

$$\varphi(\omega) = 1 + \sum_{j=1}^{n} E[X^j]\frac{(i\omega)^j}{j!} + o(\omega^n) \tag{1.108}$$
Example 1.26
The Poisson distribution for a discrete random variable X can be very useful when
dealing with random arrivals, for instance the number of accidents occurring at a certain road section or
of earthquake waves arriving at a specific site. The discrete variable X has a Poisson
distribution if it can assume the values $x_k = k$, where the k's are non-negative integers, with
the probability

$$P[X = k] = \frac{\lambda^k}{k!}\,e^{-\lambda}, \qquad k = 0,1,2,\dots \tag{1.109}$$

in which λ is a positive real constant. Obtain the characteristic function $\varphi(\omega)$ and the
semi-invariants of the Poisson distribution.
Solution:
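By Eqs. (1.97) and (1.33),

$$\varphi(\omega) = \sum_{k=0}^{\infty} e^{i\omega k}\,\frac{\lambda^k}{k!}\,e^{-\lambda} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{(\lambda e^{i\omega})^k}{k!}$$

so

$$\varphi(\omega) = \exp\!\big[\lambda\,(e^{i\omega}-1)\big] \tag{1.110}$$

Then

$$\psi(\omega) = \log_e\varphi(\omega) = \lambda\,(e^{i\omega}-1) = \lambda\sum_{s=1}^{\infty}\frac{(i\omega)^s}{s!}$$

Also, by Eq. (1.103),

$$\kappa_s = \lambda, \qquad s = 1,2,3,\dots$$

so the mean and the second and third central moments of the Poisson distribution are all equal to λ, cf. Eq. (1.104).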
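Both results of Example 1.26 can be checked numerically. The Python sketch below uses the standard library only, and $\lambda = 4$ is an arbitrary illustration value. It sums the series for $\varphi(\omega)$ term by term and compares it with the closed form of Eq. (1.110). It then estimates $\kappa_1$ and $\kappa_2$ from $\psi = \log_e\varphi$ by central differences at ω = 0. The function names are hypothetical.

```python
import cmath
import math

def poisson_cf_series(lam, omega, kmax=200):
    """phi(omega) = sum_k exp(i omega k) P[X = k], summed directly from Eq. (1.109)."""
    total, pk = 0j, math.exp(-lam)          # pk = P[X = 0]
    for k in range(kmax):
        total += pk * cmath.exp(1j * omega * k)
        pk *= lam / (k + 1)                 # P[X = k + 1] from P[X = k]
    return total

def poisson_cf_closed(lam, omega):
    """Closed form of Eq. (1.110)."""
    return cmath.exp(lam * (cmath.exp(1j * omega) - 1.0))

lam, h = 4.0, 1e-3
gap = max(abs(poisson_cf_series(lam, w) - poisson_cf_closed(lam, w)) for w in (-2.0, 0.3, 1.7))

def psi(w):
    """Log-characteristic function, Eq. (1.102); the principal branch suffices near w = 0."""
    return cmath.log(poisson_cf_series(lam, w))

kappa1 = ((psi(h) - psi(-h)) / (2 * h) / 1j).real
kappa2 = ((psi(h) - 2 * psi(0.0) + psi(-h)) / h**2 / 1j**2).real
print(gap, kappa1, kappa2)
```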
The characteristic function, Eq. (1.97), can easily be extended to describe more than one
random variable. Let X, Y and Z be three random variables. Then the joint characteristic
function is given by

$$\varphi_{XYZ}(\omega_1,\omega_2,\omega_3) = E\big[e^{i(\omega_1X+\omega_2Y+\omega_3Z)}\big] \tag{1.111}$$

and the log-characteristic function is given by

$$\psi_{XYZ}(\omega_1,\omega_2,\omega_3) = \log_e\varphi_{XYZ}(\omega_1,\omega_2,\omega_3) \tag{1.112}$$
The joint characteristic function can be obtained by integration of the joint density in the usual
manner, cf. Eq. (1.97),

$$\varphi_{XYZ}(\omega_1,\omega_2,\omega_3) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{i(\omega_1x+\omega_2y+\omega_3z)}\,p(x,y,z)\,dx\,dy\,dz \tag{1.113}$$

Thus the joint characteristic function is, but for the sign of the exponent, the triple
Fourier transform of the joint density function. The inverse relation is obtained in the same way
as for the one-dimensional characteristic function, cf. Eq. (1.98),
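$$p(x,y,z) = \frac{1}{(2\pi)^3}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-i(\omega_1x+\omega_2y+\omega_3z)}\,\varphi_{XYZ}(\omega_1,\omega_2,\omega_3)\,d\omega_1\,d\omega_2\,d\omega_3 \tag{1.114}$$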
The marginal characteristic functions can be defined and expressed in the same manner as
marginal densities, Eq. (1.55). For instance, putting $\omega_2 = \omega_3 = 0$ in Eq. (1.113), the one-dimensional
characteristic function is obtained through double integration (over y and z), i.e.

$$\varphi_X(\omega) = E\big[e^{i\omega X}\big] = \varphi_{XYZ}(\omega,0,0) \tag{1.115}$$
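Now consider the sum variable $Z = aX + bY$. The characteristic function of Z is

$$\varphi_Z(\omega) = E\big[e^{i\omega Z}\big] = E\big[e^{i(a\omega X + b\omega Y)}\big]$$

The last exponential is the same as in Eq. (1.111), where $\omega_1$ and $\omega_2$ have been replaced by $a\omega$
and $b\omega$ (and $\omega_3 = 0$). Therefore,

$$\varphi_Z(\omega) = \varphi_{XY}(a\omega, b\omega)$$

If the two variables X and Y are independent, then

$$\varphi_{XY}(\omega_1,\omega_2) = E\big[e^{i\omega_1X}\,e^{i\omega_2Y}\big] = E\big[e^{i\omega_1X}\big]\,E\big[e^{i\omega_2Y}\big]$$

and

$$\varphi_{XY}(\omega_1,\omega_2) = \varphi_X(\omega_1)\,\varphi_Y(\omega_2) \tag{1.116}$$

Using the inverse transform, Eq. (1.114), it is easy to show that the converse statement is also true.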
Example 1.27
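The two discrete random variables X and Y are independent and Poisson-distributed (Ex.
1.26), with the parameters α and β, respectively. What is the distribution of the sum variable
$Z = X + Y$?
Solution:
From Ex. 1.26, Eq. (1.110),

$$\varphi_X(\omega) = \exp\!\big[\alpha(e^{i\omega}-1)\big], \qquad \varphi_Y(\omega) = \exp\!\big[\beta(e^{i\omega}-1)\big]$$

By Eq. (1.116),

$$\varphi_Z(\omega) = \varphi_{XY}(\omega,\omega) = \varphi_X(\omega)\,\varphi_Y(\omega) = \exp\!\big[(\alpha+\beta)(e^{i\omega}-1)\big]$$

so the sum variable Z is also Poisson-distributed, with the parameter $(\alpha+\beta)$. It can be proven that
the converse statement is also true.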
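The result of Example 1.27 can also be confirmed directly in the x-domain, without characteristic functions. For independent X and Y, the probabilities of $Z = X+Y$ are the discrete convolution of those of X and Y. The Python sketch below uses the standard library only. The values α = 1.5 and β = 2.5 and the function names are arbitrary illustration choices.

```python
import math

def poisson_pmf(lam, kmax):
    """P[X = k], k = 0..kmax, Eq. (1.109), built by the recursion P[k] = P[k-1] lam / k."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def pmf_of_sum(px, py):
    """P[Z = z] = sum_x P[X = x] P[Y = z - x] for independent X and Y (discrete convolution)."""
    n = min(len(px), len(py))
    return [sum(px[x] * py[z - x] for x in range(z + 1)) for z in range(n)]

alpha, beta, kmax = 1.5, 2.5, 40
pz = pmf_of_sum(poisson_pmf(alpha, kmax), poisson_pmf(beta, kmax))
target = poisson_pmf(alpha + beta, kmax)          # Poisson with parameter alpha + beta
max_diff = max(abs(a - b) for a, b in zip(pz, target))
print(max_diff)
```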
1.2.10 Laws of Large Numbers
In many cases the result or outcome of an action, which is repeated a great number of
times, will gradually tend to a pattern that can be predicted, at least intuitively. For instance,
when a coin is tossed many times, heads will come up in 50% of the tosses on average. In other
words, the proportion of heads will tend to 1/2 as the number of tosses becomes
sufficiently large. As another example, consider a basket that contains 100 yellow and 300 white
golf balls. If a ball is taken out at random, there is no obvious pattern to be noticed. However,
when a large number of balls have been taken out of the basket, one would expect the ratio of
yellow to white golf balls thus extracted to be 1/3. It can be useful to formulate this simple
probabilistic model as a theorem that reflects this behaviour. Actually, it is this theorem,
called the law of large numbers, that makes probability theory useful as a basic tool
for statistical evaluations.
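Let $X_i$, $i = 1,2,\dots,n$, be independent random variables, all with the mean value μ and
variance σ². By forming the statistical average, a new random variable Y is formed,

$$Y = \frac{1}{n}\sum_{i=1}^{n} X_i$$

The mean value of Y is easily obtained,

$$E[Y] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu$$

and the variance, by Eq. (1.87c),

$$\mathrm{Var}[Y] = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[X_i] = \frac{\sigma^2}{n}$$

The law of large numbers may then be stated as follows. For any prescribed constant δ > 0, no
matter how small,

$$\lim_{n\to\infty} P\big[|Y-\mu| \ge \delta\big] = 0 \tag{1.117}$$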
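that is, the probability that Y and the mean μ are apart by as much as δ tends to zero for a
sufficiently large number of variables. The proof is easily carried out by using the Chebyshev
inequality, Eq. (1.81), that is,

$$P\big[|Y-\mu| \ge \delta\big] \le \frac{\mathrm{Var}[Y]}{\delta^2} = \frac{\sigma^2}{n\,\delta^2}$$

so

$$\lim_{n\to\infty} P\big[|Y-\mu| \ge \delta\big] \le \lim_{n\to\infty}\frac{\sigma^2}{n\,\delta^2} = 0$$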
The law of large numbers provides a theoretical counterpart of the interpretation of
probability as the favourable fraction that was discussed in section 1.1.1, Eq. (1.1). For this
purpose, consider first a discrete random variable X that can only assume two different values,
with probabilities p and $q = 1-p$, and is therefore a Bernoulli-type variable, Eq. (1.30). Without
loss of generality, the two values can be taken as 0 and 1. Its probability distribution can be
conveniently expressed by the following formula,

$$P[X = k] = p^k(1-p)^{1-k}, \qquad k = 0 \text{ or } 1$$

Its probability density function can be written using the delta function, Eq. (1.33),

$$p(x) = (1-p)\,\delta(x) + p\,\delta(x-1)$$

The expectation of any function $g(X)$ of a Bernoulli variable is then easily obtained by
integration, that is,

$$E[g(X)] = g(1)\,p + g(0)\,(1-p)$$

For instance, the k-th moment is $E[X^k] = p$ for $k \ge 1$, so μ = p. Now, if a long sequence of n Bernoulli
trials is carried out with j successes ($X = 1$ counting as a success), the law of large numbers states that

$$\lim_{n\to\infty} P\!\left[\left|\frac{j}{n} - p\right| \ge \delta\right] = 0 \tag{1.118}$$

That is, the probability that the favourable fraction j/n differs from the actual probability p
by more than a small prescribed number δ > 0 tends to zero for a large number of trials n.
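For Bernoulli trials, the probability in Eq. (1.118) can be computed exactly from the binomial distribution of j. It can then be compared with the Chebyshev bound $p(1-p)/(n\delta^2)$ used in the proof. The Python sketch below uses the standard library only. p = 1/2 and δ = 1/20 are illustration values, and the function names are hypothetical. Rational arithmetic keeps the test $|j/n - p| \ge \delta$ exact, and the binomial probabilities are evaluated in log form to avoid overflow for large n.

```python
import math
from fractions import Fraction

def prob_deviation(n, p, delta):
    """Exact P[|j/n - p| >= delta] for j successes in n Bernoulli trials (p, delta: Fractions)."""
    lp, lq = math.log(float(p)), math.log(float(1 - p))
    total = 0.0
    for j in range(n + 1):
        if abs(Fraction(j, n) - p) >= delta:
            log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                       + j * lp + (n - j) * lq)
            total += math.exp(log_pmf)
    return total

def chebyshev_bound(n, p, delta):
    """Var[j/n] / delta^2 = p(1 - p) / (n delta^2), the bound used in the proof."""
    return float(p * (1 - p) / (n * delta**2))

p, delta = Fraction(1, 2), Fraction(1, 20)
rows = [(n, prob_deviation(n, p, delta), chebyshev_bound(n, p, delta)) for n in (100, 400, 1600)]
for n, exact, bound in rows:
    print(n, exact, bound)
```

The exact probabilities fall off much faster than the bound. This is expected: Chebyshev's inequality holds for every distribution with the given variance, so it is necessarily conservative for any particular one.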
A distinction is sometimes made as to whether the convergence implied by Eq. (1.118) is
strong or weak. The weak law of large numbers as stated above is a special case of Khinchine's
theorem. That theorem states that the random variable Y, the average value of n IID random
variables $\{X_i\}$ with a finite common mean μ, converges to μ in probability even if the variances,
$\mathrm{Var}[X_i]$, are infinite, that is,

$$\lim_{n\to\infty} P\big[|Y-\mu| \ge \delta\big] = 0 \quad \text{for every } \delta > 0 \tag{1.119}$$

The strong law of large numbers, on the other hand, implies that the convergence is stronger, as
the name indicates, that is,

$$P\Big[\lim_{n\to\infty} Y = \mu\Big] = 1 \tag{1.120}$$

which states that Y converges to μ with probability one.
The weak law of large numbers can be proved by applying the characteristic function of
the common variable X. By Eq. (1.108), the first two terms of the expansion of the characteristic
function are given by

$$\varphi(\omega) = 1 + i\mu\omega + o(\omega)$$

The first derivative at ω = 0 is iμ, and the derivative is also continuous at ω = 0. The log-characteristic function, Eq.
(1.102), is $\psi(\omega) = \log_e\varphi(\omega)$, and by Eqs. (1.103) and (1.104) its first derivative at ω = 0 is iμ. Now let
$\varphi^*(\omega)$ be the characteristic function of the average value Y. Since Y is the sum of the n independent
variables $X_i/n$, $\varphi^*(\omega) = [\varphi(\omega/n)]^n$, cf. Eq. (1.116). This can be interpreted by studying the
log-characteristic function $\psi^*(\omega) = \log_e\varphi^*(\omega)$,

$$\psi^*(\omega) = n\,\psi\!\left(\frac{\omega}{n}\right) = \omega\,\frac{\psi(\omega/n) - \psi(0)}{\omega/n}$$

since $\psi(0) = 0$. In the limit, as n goes to infinity, the last fraction becomes the first derivative of the
log-characteristic function taken at ω = 0, that is iμ, so the whole expression tends to iμω. As the
exponential function is continuous, it follows that

$$\lim_{n\to\infty}\varphi^*(\omega) = e^{i\mu\omega}$$

which is the characteristic function of a random variable that is a constant, equal to μ! Therefore,
the characteristic function of the random variable Y degenerates to the form where Y is equal to
μ, such that Y tends to assume this value in the limit.
1.2.11 The Central Limit Theorem
Whereas the law of large numbers states that the average value of a number of IID
variables $X_i$, $i = 1,2,\dots,n$, tends to become a constant for large values of n, this does not mean that
the sum of the variables approaches a constant multiple of n in the limit. On the contrary, the
sum $S_n = X_1+X_2+X_3+\cdots+X_n$ shows increasing variability as n grows, and the
spread of the distribution of $S_n$ becomes larger and larger. The mean value of $S_n$ is obviously
equal to nμ and the variance is nσ². Thus, for finite values of μ ≠ 0 and σ, as n becomes very
large, the centre of the distribution of $S_n$ moves off to infinity with a very large spread.
The above observation about the variable $S_n$ may seem disappointing. However, it can
nevertheless be useful to have some idea about the shape of the distribution of $S_n$. Actually, it
can be shown that the standardized sum has a universal limiting shape, regardless of the distribution of the
summands $X_i$, subject to reasonable restrictions. This remarkable feature of $S_n$ is the kernel of
the Central Limit Theorem, which can be stated as follows. Let $X_i$, $i = 1,2,\dots,n$, be a sequence of
independent and identically distributed random variables, each with mean μ and variance σ².
Denote by $S_n$ the sum of the variables. Then the standardized variable

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$

will be asymptotically normally distributed, that is, as n goes to infinity, the distribution function
of $Z_n$ approaches the standardized normal distribution $N(z)$ with mean 0 and variance 1,

$$\lim_{n\to\infty} P[Z_n \le z] = N(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-u^2/2}\,du \tag{1.121}$$

for any value of z. Hence, approximately for large n,

$$P[S_n \le s] \approx N\!\left(\frac{s - n\mu}{\sigma\sqrt{n}}\right) \tag{1.121b}$$
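To prove this important theorem, it is assumed that the standard deviation σ of the common basic
variable $X_i$ is finite and not equal to zero. Then the characteristic function of the distribution of
$X_i - \mu$ can be expressed as follows (Eq. (1.108)),

$$\varphi_{X-\mu}(\omega) = 1 - \frac{\sigma^2\omega^2}{2} + o(\omega^2)$$

Now form the variables

$$Y_i = \frac{X_i - \mu}{\sigma\sqrt{n}}$$

The characteristic function of $Z_n = Y_1+Y_2+Y_3+\cdots+Y_n$ is obtained from the characteristic function
of the $Y_i$'s,

$$\varphi_Y(\omega) = \varphi_{X-\mu}\!\left(\frac{\omega}{\sigma\sqrt{n}}\right) = 1 - \frac{\omega^2}{2n} + o\!\left(\frac{\omega^2}{n}\right)$$

and by raising it to the n-th power,

$$\varphi_{Z_n}(\omega) = \left[1 - \frac{\omega^2}{2n} + o\!\left(\frac{\omega^2}{n}\right)\right]^n$$

Now, recalling that $e^{-x} = \lim_{n\to\infty}(1 - x/n)^n$, and noting that $n\,o(\omega^2/n)$ becomes zero in the limit for
an arbitrary value of ω, the characteristic function of $Z_n$ is obtained by going to the limit, i.e.

$$\lim_{n\to\infty}\varphi_{Z_n}(\omega) = e^{-\omega^2/2}$$

which is the characteristic function of a standardized normal variable, cf. Eq. (1.101) with μ = 0 and
σ = 1. Since a distribution is uniquely determined by its characteristic function, this implies that in the
limit $Z_n$ has the standardized normal distribution.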
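The limit in the proof can be observed numerically. The Python sketch below assumes, for illustration only, that the $X_i$ are uniform on [−1/2, 1/2]. Then $\sigma^2 = 1/12$ and $\varphi_{X-\mu}(\omega) = \sin(\omega/2)/(\omega/2)$. The sketch evaluates $\varphi_{Z_n}(\omega) = [\varphi_{X-\mu}(\omega/(\sigma\sqrt{n}))]^n$ and records its largest deviation from $e^{-\omega^2/2}$ on a few values of ω. The function names are hypothetical.

```python
import math

def cf_centered_uniform(omega):
    """Characteristic function of X - mu for X uniform on [-1/2, 1/2]: sin(omega/2) / (omega/2)."""
    return 1.0 if omega == 0.0 else math.sin(omega / 2) / (omega / 2)

def cf_standardized_sum(omega, n):
    """phi_Zn(omega) = [phi_{X-mu}(omega / (sigma sqrt(n)))]^n for the uniform example."""
    sigma = math.sqrt(1.0 / 12.0)
    return cf_centered_uniform(omega / (sigma * math.sqrt(n))) ** n

grid = (0.5, 1.0, 2.0, 3.0)
errs = [max(abs(cf_standardized_sum(w, n) - math.exp(-w * w / 2)) for w in grid)
        for n in (1, 10, 100, 1000)]
print(errs)
```

The deviation shrinks roughly in proportion to 1/n, in line with the neglected $o(\omega^2/n)$ terms.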
The Central Limit Theorem has been proved above for a sum of a large number of random
contributions that are identically distributed. This condition can be relaxed, and the Central Limit
Theorem is also valid for a large sum of small random contributions that are independent but can
have different probability laws. As pointed out by Kolmogorov [5], perhaps the greatest service
rendered by the classical Russian school of mathematics is the work done by Chebyshev and his
students, Markov, Bernstein and Lyapunov, to formulate, expand and generalize the conditions
for the laws of large numbers and the Central Limit Theorem. Under fairly general conditions,
Lyapunov and Bernstein could show that for an arithmetic mean of random variables,

$$Y = (X_1 + X_2 + \cdots + X_n)/n \tag{1.122}$$
where the $X_i$ are random variables with bounded individual variances, not necessarily
independent and not necessarily with identical probability distributions,
(1.123)
where the error function,
(1.124)
which is sometimes used in the literature to represent the normal distribution, has been
introduced for future reference. Finally, it can be useful to state the conditions for the above
result, Eq. (1.121), for independent variables with different probability distributions. In this
form, the theorem is often referred to as Lyapunov's theorem [71]. If, for a sequence of
mutually independent random variables $X_1, X_2, \dots, X_k, \dots$, a positive constant δ can be found such
that, for $n\to\infty$,

$$\frac{1}{B_n^{2+\delta}}\sum_{k=1}^{n} E\big[|X_k - A_k|^{2+\delta}\big] \to 0 \tag{1.125}$$

where

$$A_k = E[X_k], \quad B_k^2 = \mathrm{Var}[X_k], \quad A_n = \sum_{k=1}^{n} A_k, \quad B_n^2 = \sum_{k=1}^{n} B_k^2 \tag{1.126}$$

then, as $n\to\infty$,

$$P\!\left[\frac{\sum_{k=1}^{n} X_k - A_n}{B_n} \le z\right] \to N(z) \tag{1.127}$$
Example 1.28
The sum of n real numbers is calculated by rounding each number off to the nearest
integer in the sum. If the round-off error for each number is uniformly distributed over the
interval [−0.5, 0.5], find the probability distribution of the round-off error of the sum itself.
Solution:
Letting $X_i$ denote the random value of the round-off error, the mean value is obviously
equal to zero, as the density is equal to one in the interval [−0.5, 0.5] and thus symmetric about zero.
The variance of the round-off error, on the other hand, is given by the integral

$$\sigma^2 = \int_{-0.5}^{0.5} x^2\,dx = \frac{1}{12}$$

The probability that the sum of the round-off errors, denoted by $S_n$, is less than any given number
approaches the normal distribution for large values of n, cf. Eq. (1.121b). Hence,

$$P[S_n \le s] \approx N\!\left(\frac{s}{\sqrt{n/12}}\right) = N\!\left(s\sqrt{12/n}\right)$$
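For this example the exact distribution of $S_n$ is also known. $S_n + n/2$ is a sum of n independent Uniform(0, 1) variables, whose distribution function is the Irwin-Hall formula $F(x) = \frac{1}{n!}\sum_{k=0}^{\lfloor x\rfloor}(-1)^k\binom{n}{k}(x-k)^n$. The Python sketch below uses the standard library only, and its function names are illustrative. It evaluates this formula with exact rational arithmetic and compares it with the normal approximation above.

```python
import math
from fractions import Fraction

def sum_roundoff_cdf_exact(n, s):
    """P[S_n <= s] for S_n = sum of n iid U(-1/2, 1/2) errors (Irwin-Hall, shifted by n/2)."""
    x = Fraction(s) + Fraction(n, 2)          # S_n + n/2 is a sum of n U(0, 1) variables
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * math.comb(n, k) * (x - k) ** n for k in range(math.floor(x) + 1))
    return float(total / math.factorial(n))

def sum_roundoff_cdf_normal(n, s):
    """Normal approximation P[S_n <= s] ~ N(s sqrt(12 / n)), cf. Eq. (1.121b)."""
    z = s * math.sqrt(12.0 / n)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for n, s in ((12, 1.0), (48, 2.0)):
    print(n, s, sum_roundoff_cdf_exact(n, s), sum_roundoff_cdf_normal(n, s))
```

Already for n = 12, the exact and approximate probabilities agree to within a few thousandths.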
Example 1.29 Kolmogorov's Law of Fragmentation
Many physical problems have to do with the fragmentation of a larger piece of material
into many small pieces, and also with the size of grains or particles in a large collection. A typical
example is the size of gold particles in a large sample of gold sand. Many studies of such
problems have shown that the logarithm of the dimension of the grains or fragmented pieces
from a large sample will approximately follow the normal distribution. That is, if a typical
dimension of a particle is called $D = e^Y$, then $Y = \log_e D$ is normally distributed, with

$$p(y) = \frac{1}{\sigma_Y\sqrt{2\pi}}\exp\!\left(-\frac{(y-\mu_Y)^2}{2\sigma_Y^2}\right)$$
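whereby the lognormal distribution has the following probability density

$$p(d) = \frac{1}{d\,\sigma_Y\sqrt{2\pi}}\exp\!\left(-\frac{(\log_e d-\mu_Y)^2}{2\sigma_Y^2}\right), \qquad d > 0 \tag{1.128}$$

Using the moment generating function for $Y = \log_e D$, Eqs. (1.101) and (1.105),

$$E[D^k] = E[e^{kY}] = M(k) = \exp\!\left(k\mu_Y + \tfrac{1}{2}\sigma_Y^2k^2\right)$$

the mean value, k = 1, is

$$\mu_D = \exp\!\left(\mu_Y + \tfrac{1}{2}\sigma_Y^2\right) \tag{1.129}$$

and the variance

$$\sigma_D^2 = E[D^2] - (E[D])^2 = \big(\exp(\sigma_Y^2) - 1\big)\exp\!\left(2\mu_Y + \sigma_Y^2\right) \tag{1.130}$$

The median $M_{1/2}$ of D, Ex. 1.13, is easily obtained by noting that

$$P[D \le M_{1/2}] = P[e^Y \le M_{1/2}] = P[Y \le \log_e M_{1/2}] = \tfrac{1}{2}$$

whereby $\log_e M_{1/2} = \mu_Y$, so $M_{1/2} = \exp(\mu_Y)$.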
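Eqs. (1.129) and (1.130) can be cross-checked independently of the moment generating function. The approach is to integrate $e^{kY}$ against the normal density of Y numerically. The Python sketch below uses the standard library only. The parameters $\mu_Y = 0.5$ and $\sigma_Y = 0.4$ are illustration values, and the function names are hypothetical.

```python
import math

def lognormal_summary(mu_y, sigma_y):
    """Mean, variance and median of D = exp(Y), Y ~ N(mu_y, sigma_y^2), Eqs. (1.129)-(1.130)."""
    mean_d = math.exp(mu_y + 0.5 * sigma_y**2)
    var_d = (math.exp(sigma_y**2) - 1.0) * math.exp(2.0 * mu_y + sigma_y**2)
    median_d = math.exp(mu_y)
    return mean_d, var_d, median_d

def moment_by_quadrature(k, mu_y, sigma_y, n=4000):
    """E[D^k] = E[exp(kY)], midpoint rule over mu_y +/- 12 sigma_y (independent check)."""
    lo = mu_y - 12.0 * sigma_y
    dy = 24.0 * sigma_y / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        density = math.exp(-0.5 * ((y - mu_y) / sigma_y) ** 2) / (sigma_y * math.sqrt(2.0 * math.pi))
        total += math.exp(k * y) * density
    return total * dy

mu_y, sigma_y = 0.5, 0.4
mean_d, var_d, median_d = lognormal_summary(mu_y, sigma_y)
m1 = moment_by_quadrature(1, mu_y, sigma_y)
m2 = moment_by_quadrature(2, mu_y, sigma_y)
print(mean_d, m1, var_d, m2 - m1**2, median_d)
```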
In a short paper, Kolmogorov [111] (reproduced by permission of Prof. Dr. A. N.
Shiryaev) presented and formalized this interesting probabilistic behaviour, which may be
referred to as Kolmogorov's law of fragmentation. This work has perhaps not received the
attention that it deserves. In the following, a translation that loosely follows the German text is
presented. Since the paper is very short and compact, without many explanations, further
explanations by the author and Dr Miguel de Icaza have been added [88].
Following Kolmogorov, let N(t) be the total number of particles at integer times t,
t = 0, 1, 2, 3, ..., due to fragmentation of a larger piece of material (rock, gold, etc.). Further,
N(r,t) is the total number of particles that at time t have a typical dimension (diameter, volume,
weight, etc.) not exceeding r. Note that the fragmentation of a piece of material of size r can only yield
particles that are smaller. Now assume that between times t and t+1, the probability that a
single particle of size r is fragmented into n smaller particles is $p_n$. Introduce the relative ratio of
the dimension of each of the n fragmented particles to the dimension of the parent piece r as
$k_i = r_i/r$. Obviously, $k_i$ is a value of a random variable $K_i$, and the joint probability distribution can be
written as follows

$$F_n(a_1,a_2,\dots,a_n) = P[K_1 \le a_1,\,K_2 \le a_2,\,\dots,\,K_n \le a_n] \tag{1.131}$$

where the n fragmented particles have been numbered according to increasing dimensions,
$r_1 \le r_2 \le r_3 \le \dots \le r_n$, so the distribution Eq. (1.131) is only defined for $0 \le a_1 \le a_2 \le \dots \le a_n \le 1$. Now, the
average number of particles of dimension not exceeding kr, called Q(k), that are produced by one single
particle of an arbitrary dimension r between times t and t+1 is obviously

$$Q(k) = \sum_{n} p_n\big[F_n(k,1,\dots,1) + F_n(k,k,1,\dots,1) + \cdots + F_n(k,k,\dots,k)\big] \tag{1.132}$$
where each term in the sum is the probability $p_n$ times the probabilities that the first i of the ordered
particles, i = 1, ..., n, have dimensions not exceeding kr. This last statement, Eq. (1.132), is perhaps not as
clear to everybody as it was to Kolmogorov. The following explanation may be given. In
addition to the above event that a single particle of dimension r is broken up between times t and
t+1, define the subevents: (a) J, which is the event that precisely j particles were produced
having $K \le k$, and (b) $J^*$, the event that at least j ($\ge j$) particles were
produced having $K \le k$, in both cases given that in all n particles were produced. Obviously, the
two events J and $(J+1)^*$ are disjoint events, and

$$J + (J+1)^* = J^*$$

Therefore, the conditional probability of the event J, called $q_j$, is given by

$$P[J \mid n] = q_j = P[J^* \mid n] - P[(J+1)^* \mid n] \tag{1.133}$$
The probability $q_j$ is then given by Eq. (1.133) and the distribution Eq. (1.131), that is,

$$q_i = F_n(\underbrace{k,\dots,k}_{i},1,\dots,1) - F_n(\underbrace{k,\dots,k}_{i+1},1,\dots,1) \tag{1.134}$$

where $F_n(\underbrace{k,\dots,k}_{i+1},1,\dots,1)$ is defined as zero if i = n. It is now possible to write Q(k) as

$$Q(k) = \sum_{n} p_n\,\bar{N}_n(k) \tag{1.135}$$

where

$$\bar{N}_n(k) = \sum_{i=1}^{n} i\,q_i = \sum_{i=1}^{n} F_n(\underbrace{k,\dots,k}_{i},1,\dots,1) \tag{1.136}$$

is the conditional mean value of the number of particles with $K \le k$, given that exactly n
particles were produced, and $q_i$ is the probability of having exactly i particles for which $K \le k$, Eq.
(1.134). By introducing this result, i.e. the last sum in Eq. (1.136), into Eq. (1.135), the original
Kolmogorov equation, Eq. (1.132), is obtained.
The following reasonable assumptions are now made:
(a) the probabilities $p_n$ and the distribution $F_n$ depend neither on the absolute dimensions of
the particles, nor on the history of the fragmentation, that is, how a particle of any size at an
earlier stage was broken up, nor on the destiny of other particles;
(b) the expectation Q(1), that is, the average number of particles produced by fragmentation of
one single particle of size r during the time interval [t, t+1], without regard to the size of the
fragmented particles, is both finite and larger than one;
(c) the integral

$$\int_0^1 |\log_e k|^3\,dQ(k)$$

is finite;
(d) at the beginning, t = 0, a certain number n(0) of particles is at hand, which can have an
arbitrary size distribution N(r,0).
Under these assumptions, the expectation or the average number of particles at time t is

$$E[N(t)] = n(0)\,\big(Q(1)\big)^t \tag{1.137}$$
Now form the ratio

$$T(x,t) = \frac{E[N(e^x,t)]}{E[N(t)]} \tag{1.138}$$

where the function $N(e^x,t)$ represents the number of particles at time t that have a dimension
not exceeding $r = e^x$, with x as a real variable. (As pointed out by Kolmogorov, it is completely irrelevant
what the dimension really is.)
The real task is now to estimate the function T(x,t). From the assumptions (a) and (b), it
follows that

$$E[N(r,t+1)] = \int_0^1 E\big[N(r/k,\,t)\big]\,dQ(k) \tag{1.139}$$

that is, the average number of particles at time t+1, E[N(r,t+1)], is obtained by summing over all
particles with a dimension not exceeding r that are produced from the available particles at time t, letting k
run through all values between 0 and 1. Eq. (1.139) needs further explanation, since it is far from
obvious. Take any partition $\{k_1,k_2,\dots,k_m\}$ such that $0 < \varepsilon \le k_1 \le k_2 \le k_3 \le \cdots \le k_m = 1$, ε
being a very small number. $E[N(r/k_{s+1},t)]$ is the average number of particles at time t with a dimension not
exceeding $r/k_{s+1}$. However, these particles are also counted or included in the number $E[N(r/k_s,t)]$. Therefore, the
particles E[N(r,t)] are counted or included in all the numbers $E[N(r/k_s,t)]$, s = 1, 2, 3, ..., m.
Now, from the E[N(r,t)] particles, $E[N(r,t)]\,Q(1) = E[N(r/k_m,t)]\,Q(k_m)$ particles are produced between t and t+1;
from the $E[N(r/k_{m-1},t) - N(r/k_m,t)]$ particles, $E[N(r/k_{m-1},t) - N(r/k_m,t)]\,Q(k_{m-1})$ particles are produced;
...
from the $E[N(r/k_1,t) - N(r/k_2,t)]$ particles, $E[N(r/k_1,t) - N(r/k_2,t)]\,Q(k_1)$ particles are produced.
Summing up,

$$E[N(r,t+1)] \approx E[N(r/k_m,t)]\,Q(k_m) + \sum_{s=1}^{m-1} E\big[N(r/k_s,t) - N(r/k_{s+1},t)\big]\,Q(k_s)$$

By making the partition finer and finer, with ε → 0+, the above sum tends to the integral,
whereby Eq. (1.139) is obtained.
Now put

$$Q(k) = Q(1)\,S(\xi), \qquad \xi = \log_e k, \quad 0 < k < 1, \quad -\infty < \xi < 0 \tag{1.140}$$

Then, by Eqs. (1.137), (1.138) and (1.139),

$$T(x,t+1) = \int_{-\infty}^{0} T(x-\xi,\,t)\,dS(\xi) \tag{1.141}$$

By Eqs. (1.138) and (1.140), the two functions S(x) and $T(x,0) = N(e^x,0)/n(0)$ fulfil all
the conditions of probability distribution functions. If $x \to -\infty$ (i.e. $k \to 0$), both S(x) and T(x,0) tend
to zero. If $k \to 1$, then $x \to 0$ and both functions tend to one. If x > 0, S(x) = S(0) = 1, and the upper limit 0
in integrals involving S(x) can be replaced by ∞. The two functions, therefore, have the same
character as ordinary probability distribution functions, even if they should not be considered as
such in the strictest sense. From the recurrence equation, Eq. (1.141), the same applies to the
function T(x,t) for any integer t > 0.
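From the assumption (c), it follows that the integral

$$\int_{-\infty}^{0} |\xi|^3\,dS(\xi)$$

is finite. It is now possible to apply Lyapunov's theorem, Eq. (1.127), which allows the interpretation that, for $t\to\infty$,

$$T(x,t) \approx N\!\left(\frac{x - At}{B\sqrt{t}}\right) \tag{1.142}$$

in which

$$A = \int_{-\infty}^{0}\xi\,dS(\xi), \qquad B^2 = \int_{-\infty}^{0}(\xi - A)^2\,dS(\xi) \tag{1.143}$$

Through Eq. (1.140), these integrals can be replaced by

$$A = \frac{1}{Q(1)}\int_0^1 \log_e k\,dQ(k), \qquad B^2 = \frac{1}{Q(1)}\int_0^1(\log_e k - A)^2\,dQ(k) \tag{1.144}$$

in which A and B² represent the mean value and the variance of the variable $\log_e K$. By Eq.
(1.142), the logarithm of the particle dimension is thus asymptotically normally distributed, so the
dimension itself is lognormal.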
As in many of the above derivations, the final result Eq. (1.132) is far from being obvious. An
explanation may be sought as follows. The recurrence equation Eq. (1.141) is a kind of a
Stieltjes integral for two distribution functions, [71]. In fact, consider two independent random
variables U and V and their sum W=U+V, ( cf. Ex. 1.21). Then
(1.145)
If $S(x)$ and $T(x,t)$ are considered to be two such distribution functions, that is, $T(x,0)$ is the distribution function of the variable $U$ and $S(x)$ that of $V$, then by Eqs. (1.141) and (1.145), $T(x,1)$ is the distribution function of the variable $U+V_1$, $T(x,2)$ is the distribution function of the variable $U+V_1+V_2$, and $T(x,n)$ is the distribution function of the variable $U+V_1+V_2+\cdots+V_n$, where $V_1, V_2, \ldots, V_n$ are independent and distributed like $V$. Introducing the mean values and the variances of the individual variables and of the sum divided by $n$ (for large $n$),
$$A_k = A = E[V_k] \quad\text{and}\quad A_n/n = E[U+V_1+V_2+\cdots+V_n]/n = E[U]/n + A \;\to\; A$$
and
$$B_k^2 = B^2 = \mathrm{Var}[V_k] \quad\text{and}\quad B_n^2/n = \mathrm{Var}[U+V_1+V_2+\cdots+V_n]/n = \mathrm{Var}[U]/n + B^2 \;\to\; B^2$$
where $A$ and $B$ are given by the integrals of Eq. (1.143). By setting the arbitrary number $\delta$ in Lyapunov's condition, Eq. (1.125), equal to one, it suffices to show that
$$E[|V_k - A_k|^3] \le \alpha < \infty$$
where $\alpha$ is a finite positive constant. From condition (c) it follows that this third absolute moment is finite, so the conditions for Lyapunov's theorem are fulfilled, and it has been shown that $T(x,t)$ has the distribution Eq. (1.142).
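For $n$ independent, identically distributed terms (ignoring the single initial term $U$), the Lyapunov ratio with $\delta = 1$, $\sum_k E[|V_k - A_k|^3]/B_n^3$ with $B_n^2 = nB^2$, reduces to $\rho/(B^3\sqrt{n})$, where $\rho = E[|V - A|^3]$; it tends to zero as soon as $\rho$ is finite. The sketch below evaluates $\rho$ numerically under an illustrative assumption not made in the text: $k$ uniform on (0, 1), so that $V = \log_e k = -E$ with $E$ exponentially distributed, $A = -1$ and $B = 1$. For this case the integral also has the closed form $12/e - 2$, which serves as a check.

```python
import math


def rho_uniform_ratio(n_grid=200_000, upper=40.0):
    """E|V - A|^3 for V = log_e k, k ~ Uniform(0, 1) (illustrative assumption).

    Then V = -E with E ~ Exp(1), A = -1 and E|V - A|^3 = E|E - 1|^3, which is
    evaluated by the midpoint rule on [0, upper]; the tail beyond upper = 40
    is negligible because of the factor e^{-x}.
    """
    h = upper / n_grid
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * h
        total += abs(x - 1.0) ** 3 * math.exp(-x)
    return total * h


def lyapunov_ratio(n, rho, b=1.0):
    """Lyapunov ratio for n i.i.d. terms with delta = 1: n*rho / (n*b^2)^(3/2)."""
    return n * rho / (n * b * b) ** 1.5


rho = rho_uniform_ratio()
print(f"rho = {rho:.6f}, closed form 12/e - 2 = {12 / math.e - 2:.6f}")
for n in (1, 10, 100, 1000):
    print(n, round(lyapunov_ratio(n, rho), 4))  # decays like 1/sqrt(n)
```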
The Kolmogorov law of fragmentation has proved very useful in many different and
often unrelated problems. For instance, the law of fragmentation has been used for the study of
stress release in earthquake fracture zones, and in soil mechanics studies, [129], [131].
1.2.12 Distributions of Extreme Values
In Exs. 1.9 and 1.18, the probability distribution of the largest value of a collection of sample values was obtained, Eq. (1.37). Distributions of extreme values from a population, or from a series of measurements or observations of certain physical quantities, arise in many applications concerned with assessing or evaluating the maxima of these quantities. Maximum annual wind speeds at a given location, the maximum earthquake magnitudes to be expected in certain areas and the lowest strength values to be expected in a material are but a few examples of problems that have to be addressed through the probability distributions of the extreme values of the quantities involved. Because of the importance of this field of application of probability theory to engineers and other scientists, a brief overview of the distributions of extreme values is presented below, following the classical approach, [57], [73], [87], [176] and [239].
By Eq. (1.68) it was shown that the probability distribution of the largest, or extreme, value $Y_n$ of a collection of $n$ random variables $X_i$, $i = 1,2,\ldots,n$, that are independent and identically distributed with the parent distribution $F(x)$ is given by the $n$-th power of the parent distribution, that is,
$$Y_n = \max\{X_1, X_2, \ldots, X_n\}, \qquad P[Y_n \le y] = Q_n(y) = (F(y))^n \qquad (1.68)$$
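Eq. (1.68) is easy to check by simulation. The sketch below assumes, purely for illustration, an exponential parent $F(x) = 1 - e^{-x}$ and compares the fraction of simulated maxima below $y$ with $F(y)^n$.

```python
import math
import random


def empirical_max_cdf(n, y, trials=50_000, seed=2):
    """Estimate P[Y_n <= y] for Y_n = max of n i.i.d. Exp(1) variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y_n = max(rng.expovariate(1.0) for _ in range(n))
        if y_n <= y:
            hits += 1
    return hits / trials


n, y = 10, 3.0
parent_cdf = 1.0 - math.exp(-y)  # F(y) for the assumed Exp(1) parent
p_sim = empirical_max_cdf(n, y)
print(f"simulated P[Y_n <= y] = {p_sim:.4f}")
print(f"F(y)**n               = {parent_cdf ** n:.4f}")
```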
Under certain assumptions, it can be shown that as $n$ grows very large, $n \to \infty$, $Q_n(y)$ asymptotically approaches a distribution that depends only on a few characteristic features of the parent distribution $F(x)$. An asymptotic extreme value distribution therefore has the common property that its mean value and variance may depend on $n$, but its form is independent of $n$.
It is customary to introduce the so-called attraction coefficients $\alpha_n$ and $\beta_n$ to facilitate the convergence implied above, such that
(1.146)
(1.147)
Equation (1.146) can be shown to have, in general, three different solutions, or types of limiting distribution functions for large $n$, which are governed by a few simple fundamental characteristics of the parent distribution $F(x)$. (1) If the random variable $X$ is unlimited and $1-F(x)$ is asymptotically equivalent to the exponential function for large $x$, an asymptotic extreme value distribution of the 1st kind is obtained. (2) If $X$ has a lower limit $\alpha_1$ such that $X-\alpha_1$ can only take positive values, and $1-F(x)$ is asymptotically equivalent to $c\,x^{-k}$ for large $x$, an asymptotic extreme value distribution of the 2nd kind is obtained. (3) The asymptotic extreme value distribution of the 3rd kind is obtained when $X-\alpha_1$ can only take negative values and $1-F(x)$ is asymptotically equivalent to $c\,(\alpha_1-x)^k$ for values of $x$ close to the upper limit $\alpha_1$. It is a remarkable result that, regardless of the diversity of parent distributions encountered in applications, only three distinct extreme value distributions emerge from the limiting process.
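The three kinds of convergence can be observed numerically. The sketch below picks three parent distributions for illustration only: an exponential parent (1st kind, $\alpha_n = \log_e n$, $\beta_n = 1$), a Pareto parent with $1-F(x) = x^{-3}$ (2nd kind, $\alpha_n = 0$, $\beta_n = n^{1/3}$) and a uniform parent on (0, 1) (3rd kind with upper limit 1 and $k = 1$, $\alpha_n = 1$, $\beta_n = 1/n$). In each case $F^n(\alpha_n + \beta_n x)$ approaches the corresponding limit $Q_i(x)$ as $n$ grows.

```python
import math


def normalized_max_cdf(parent, n, x, k=3.0):
    """F(alpha_n + beta_n * x)**n for three illustrative parent distributions.

    exponential: 1 - F(x) = e^{-x}, x > 0     -> alpha_n = log_e n, beta_n = 1
    pareto:      1 - F(x) = x^{-k}, x >= 1    -> alpha_n = 0, beta_n = n^{1/k}
    uniform:     F(x) = x on (0, 1)           -> alpha_n = 1, beta_n = 1/n (k = 1)
    """
    if parent == "exponential":
        y = math.log(n) + x
        cdf = 1.0 - math.exp(-y) if y > 0 else 0.0
    elif parent == "pareto":
        y = n ** (1.0 / k) * x
        cdf = 1.0 - y ** (-k) if y >= 1.0 else 0.0
    elif parent == "uniform":
        y = 1.0 + x / n
        cdf = min(max(y, 0.0), 1.0)
    else:
        raise ValueError(f"unknown parent: {parent}")
    return cdf ** n


limits = {
    "exponential": lambda x: math.exp(-math.exp(-x)),  # 1st kind
    "pareto": lambda x: math.exp(-x ** -3.0),          # 2nd kind, k = 3
    "uniform": lambda x: math.exp(-(-x)),              # 3rd kind, k = 1, x < 0
}
points = {"exponential": 0.5, "pareto": 1.5, "uniform": -0.5}

for parent, limit in limits.items():
    x = points[parent]
    for n in (10, 1000, 100_000):
        print(parent, n, round(normalized_max_cdf(parent, n, x), 5), round(limit(x), 5))
```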
Table 1.2 shows the main properties of the above limiting distribution types, together with the attraction coefficients as they emerge from the limiting process. For the first distribution, the attraction coefficient $\alpha_n$, also called the localization parameter, increases linearly with $\log_e n$, and so does the expected value, whereas the spread ($\beta_n = \beta_1$) is unaffected. For the second distribution, the localization parameter is unaffected by larger $n$, but the scale parameter $\beta_n$, and with it the standard deviation and the distance of the expected value above $\alpha_1$, increases proportionally with $n^{1/k}$. For the third distribution, the localization parameter is again unaffected, whereas $\beta_n$, and with it the standard deviation and the distance of the expected value below the upper limit $\alpha_1$, decreases proportionally with $n^{-1/k}$.
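Table 1.2 Properties of Asymptotic Extreme Value Distributions (after C. Dyrbye et al. [57]).
$$
\begin{array}{c|c|c|c|c|c}
\text{Type} & Q_i(x) & \alpha_n & \beta_n & \mu_n-\alpha & \sigma_n^2 \\ \hline
\text{I} & \exp[-e^{-x}] & \alpha_1+\beta_1\log_e n & \beta_1 & \beta_n\gamma^{*} & \beta_n^2\pi^2/6 \\
\text{II} & \exp[-x^{-k}] & \alpha_1 & \beta_1 n^{1/k} & \beta_n\Gamma(1-1/k) & \beta_n^2[\Gamma(1-2/k)-\Gamma^2(1-1/k)] \\
\text{III} & \exp[-(-x)^{k}] & \alpha_1 & \beta_1 n^{-1/k} & -\beta_n\Gamma(1+1/k) & \beta_n^2[\Gamma(1+2/k)-\Gamma^2(1+1/k)]
\end{array}
$$
*) $\gamma = 0.5772\ldots$ is Euler's constant.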
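The type I row can be checked exactly for an exponential parent $F(x) = 1 - e^{-x}$, chosen here only for illustration ($\alpha_1 = 0$, $\beta_1 = 1$). The maximum of $n$ such variables can be written as a sum of independent exponential spacings, which gives $E[Y_n] = 1 + 1/2 + \cdots + 1/n$ and $\mathrm{Var}[Y_n] = 1 + 1/2^2 + \cdots + 1/n^2$. The mean therefore grows like $\log_e n + \gamma$, while the variance tends to the constant $\pi^2/6$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def max_exp_moments(n):
    """Exact mean and variance of the maximum of n i.i.d. Exp(1) variables.

    Uses the representation of the maximum as a sum of independent
    exponential spacings with rates n, n-1, ..., 1.
    """
    mean = sum(1.0 / i for i in range(1, n + 1))
    var = sum(1.0 / i ** 2 for i in range(1, n + 1))
    return mean, var


for n in (10, 1000, 100_000):
    m, v = max_exp_moments(n)
    # mean - log_e n tends to gamma; variance tends to pi^2/6 (beta_1 = 1)
    print(n, round(m - math.log(n), 6), round(v, 6))
print("limits:", round(EULER_GAMMA, 6), round(math.pi ** 2 / 6, 6))
```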
The three extreme value distributions thus established have the following properties. For convenience, the distributions for large positive values (maxima) are presented. The corresponding distributions for minima are easily obtained through the change of variables
$$\min\{x_i\} = -\max\{-x_i\} \qquad (1.148)$$
The extreme value distributions for the minima are thus completely analogous. The first distribution is called either the Gumbel distribution or the Fisher-Tippett type I distribution, after the statisticians who played a major part in the development of the theory of extreme values, and has the form
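$$Q(y) = \exp\left\{-\exp\left[-\frac{y-\alpha}{\beta}\right]\right\}, \qquad -\infty < y < \infty \qquad (1.149)$$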
whereas the probability density function is given by
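$$q(y) = \frac{1}{\beta}\exp\left[-\frac{y-\alpha}{\beta} - \exp\left(-\frac{y-\alpha}{\beta}\right)\right] \qquad (1.150)$$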
This distribution has the mean value and variance
$$E[Y] = \alpha + \beta\gamma \quad\text{and}\quad \mathrm{Var}[Y] = \frac{\beta^2\pi^2}{6} \qquad (1.151)$$
with the main parameters $\alpha$ and $\beta > 0$ and the additional constant $\gamma = 0.5772\ldots$, the so-called Euler's constant. The Gumbel distribution is widely applicable to problems involving extreme values that can take both negative and positive values. However, it can also be applied to variables that are positive only, as its negative tail is very weak. It has been used, for example, to predict maximum wind speeds, floods and maximum earthquake magnitudes. It can also be shown that the requirement that all the random variables $\{X_i\}$ have the same parent distribution can be relaxed, still giving the Gumbel distribution as the limiting distribution, [87].
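The sketch below evaluates the Gumbel distribution function and density in the $(\alpha, \beta)$ parameterization that matches Eq. (1.151), checks the mean and variance by numerical integration, and inverts Eq. (1.151) to obtain method-of-moments estimates $\beta = \sqrt{6\,\mathrm{Var}[Y]}/\pi$ and $\alpha = E[Y] - \gamma\beta$. The parameter values are hypothetical, and the moment fit is shown as one simple option rather than a recommended estimator.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_cdf(y, alpha, beta):
    """Gumbel distribution for maxima: Q(y) = exp{-exp[-(y - alpha)/beta]}."""
    return math.exp(-math.exp(-(y - alpha) / beta))


def gumbel_pdf(y, alpha, beta):
    """Density q(y) = (1/beta) exp[-z - exp(-z)] with z = (y - alpha)/beta."""
    z = (y - alpha) / beta
    return math.exp(-z - math.exp(-z)) / beta


def numeric_mean_var(pdf, lo, hi, n=200_000):
    """Mean and variance of a density by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    m1 = m2 = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        w = pdf(y) * h
        m1 += w * y
        m2 += w * y * y
    return m1, m2 - m1 * m1


def gumbel_fit_moments(mean, var):
    """Method-of-moments parameters obtained by inverting Eq. (1.151)."""
    beta = math.sqrt(6.0 * var) / math.pi
    return mean - EULER_GAMMA * beta, beta


alpha, beta = 20.0, 3.0  # hypothetical parameter values
mean, var = numeric_mean_var(lambda y: gumbel_pdf(y, alpha, beta),
                             alpha - 10 * beta, alpha + 60 * beta)
print(mean, alpha + EULER_GAMMA * beta)
print(var, beta ** 2 * math.pi ** 2 / 6)
print(gumbel_fit_moments(mean, var))  # recovers (alpha, beta)
```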
It is often convenient to study the standardized Gumbel distribution, which is introduced by the coordinate transformation
$$z = \frac{y-\alpha}{\beta}$$
whereby
$$Q(z) = \exp\{-\exp(-z)\}, \qquad -\infty < z < \infty \qquad (1.152)$$
The form of the standardized distribution is shown in Fig. 1.16.
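Because the standardized form (1.152) has a closed-form inverse, quantiles are straightforward: solving $Q(z) = p$ gives $z_p = -\log_e(-\log_e p)$, and in the original units $y_p = \alpha + \beta z_p$. The function names below are hypothetical helpers for illustration.

```python
import math


def std_gumbel_cdf(z):
    """Standardized Gumbel distribution, Eq. (1.152)."""
    return math.exp(-math.exp(-z))


def std_gumbel_quantile(p):
    """Inverse of Eq. (1.152): the z with Q(z) = p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(-math.log(p))


def gumbel_quantile(p, alpha, beta):
    """Quantile in the original units via y = alpha + beta * z."""
    return alpha + beta * std_gumbel_quantile(p)


for p in (0.5, 0.9, 0.98):
    z = std_gumbel_quantile(p)
    print(p, round(z, 4), round(std_gumbel_cdf(z), 12))  # round trip returns p
```

Note that $Q(0) = e^{-1} \approx 0.368$: the mode $z = 0$ lies below the median, reflecting the skewness of the distribution.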
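Finally, the Gumbel distribution is sometimes used as a distribution for minimum values, $Y = \min\{X_1, X_2, X_3, \ldots\}$, in which case it takes the form
$$Q(y) = 1 - \exp\left\{-\exp\left[\frac{y-\alpha}{\beta}\right]\right\}$$
with the mean value and variance
$$E[Y] = \alpha - \beta\gamma \quad\text{and}\quad \mathrm{Var}[Y] = \frac{\beta^2\pi^2}{6} \qquad (1.153)$$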
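The minimum form follows from the maximum form through Eq. (1.148). Writing $Y = \min\{X_i\} = -W$ with $W = \max\{-X_i\}$, and taking $W$ to be Gumbel (maxima) with location $-\alpha$ and scale $\beta$, gives $P[Y \le y] = P[W \ge -y] = 1 - Q_W(-y)$ and $E[Y] = -E[W] = \alpha - \beta\gamma$. The sketch checks this numerically for hypothetical parameter values.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_max_cdf(x, alpha, beta):
    """Gumbel distribution for maxima, Eq. (1.149)."""
    return math.exp(-math.exp(-(x - alpha) / beta))


def gumbel_min_cdf(y, alpha, beta):
    """Gumbel distribution for minima, the form preceding Eq. (1.153)."""
    return 1.0 - math.exp(-math.exp((y - alpha) / beta))


alpha, beta = 5.0, 2.0  # hypothetical parameter values
pairs = []
for y in (-2.0, 3.0, 5.0, 8.0):
    direct = gumbel_min_cdf(y, alpha, beta)
    via_maxima = 1.0 - gumbel_max_cdf(-y, -alpha, beta)  # 1 - Q_W(-y)
    pairs.append((direct, via_maxima))
    print(y, round(direct, 10), round(via_maxima, 10))

# E[Y] = -E[W] = -(-alpha + beta*gamma), which equals alpha - beta*gamma
mean_min = -(-alpha + EULER_GAMMA * beta)
print(mean_min, alpha - EULER_GAMMA * beta)
```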
Fig. 1.16 The standardized Gumbel distribution
The second distribution, often called the Fisher-Tippett type II or the Fréchet distribution, after the French statistician and mathematician, has the form
$$Q(y) = \exp\left\{-\left(\frac{y-\alpha}{\beta}\right)^{-k}\right\}, \qquad y > \alpha,\ \beta, k > 0 \qquad (1.154)$$
Usually the lower limit of $y$ is set at 0. The mean value and variance are easily derived:
$$E[Y] = \alpha + \beta\,\Gamma(1-1/k),\ k>1 \quad\text{and}\quad \mathrm{Var}[Y] = \beta^2\{\Gamma(1-2/k) - \Gamma^2(1-1/k)\},\ k>2 \qquad (1.155)$$
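Eq. (1.155) can be checked by inverse-transform sampling. Solving $Q(y) = u$ in Eq. (1.154) gives $y = \alpha + \beta(-\log_e u)^{-1/k}$ for $u$ uniform on (0, 1). The parameter values below are hypothetical; $k = 4$ is chosen so that the variance exists and the sample mean settles reasonably quickly despite the heavy upper tail.

```python
import math
import random


def frechet_sample(alpha, beta, k, rng):
    """Draw from Eq. (1.154) by inverse transform: y = alpha + beta*(-log_e u)**(-1/k)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, and log_e(0) is undefined
        u = rng.random()
    return alpha + beta * (-math.log(u)) ** (-1.0 / k)


alpha, beta, k = 0.0, 2.0, 4.0  # hypothetical values; k > 2 so the variance exists
rng = random.Random(3)
draws = [frechet_sample(alpha, beta, k, rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
exact_mean = alpha + beta * math.gamma(1.0 - 1.0 / k)
exact_var = beta ** 2 * (math.gamma(1.0 - 2.0 / k) - math.gamma(1.0 - 1.0 / k) ** 2)
print(f"sample mean {sample_mean:.4f} vs Eq. (1.155) {exact_mean:.4f}")
print(f"Eq. (1.155) variance {exact_var:.4f}")
```

The sample variance is not compared here: for $k = 4$ the fourth moment is infinite, so the sample variance converges only slowly.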
The Fréchet distribution is convenient when the extreme values are positive only. It has found use in wind engineering, for example to predict maximum wind speeds, [4]. Often the coefficient $\alpha$ is disregarded, so as to have a two-parameter distribution only. In this form the distribution is written as follows:
$$Q(y) = \exp\left[-\left(\frac{y}{\beta}\right)^{-k}\right], \qquad y > 0 \qquad (1.156)$$
The expectation and the coefficient of variation are then given as follows:
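$$E[Y] = \beta\,\Gamma(1-1/k), \qquad V_Y = \frac{\sqrt{\mathrm{Var}[Y]}}{E[Y]} = \sqrt{\frac{\Gamma(1-2/k)}{\Gamma^2(1-1/k)} - 1}, \quad k > 2 \qquad (1.157)$$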
which shows that the coefficient of variation depends only on the second parameter $k$, making it possible to calculate $k$ if the coefficient of variation of the data is known (see Ex. 1.32). The general form of the Fréchet distribution is shown in Fig. 1.17.
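Since the coefficient of variation in Eq. (1.157) depends on $k$ alone and decreases as $k$ increases, $k$ can be recovered from a given coefficient of variation by a simple one-dimensional root search; $\beta$ then follows from the mean. The sketch uses bisection on an assumed bracket $2 < k \le 1000$; the input values are hypothetical and are not those of Ex. 1.32.

```python
import math


def frechet_cov(k):
    """Coefficient of variation of the two-parameter Frechet distribution, Eq. (1.157)."""
    if k <= 2.0:
        raise ValueError("the variance, and hence the coefficient of variation, requires k > 2")
    g1 = math.gamma(1.0 - 1.0 / k)
    g2 = math.gamma(1.0 - 2.0 / k)
    return math.sqrt(max(g2 / (g1 * g1) - 1.0, 0.0))


def frechet_k_from_cov(v, lo=2.0001, hi=1000.0, tol=1e-10):
    """Solve frechet_cov(k) = v by bisection; the coefficient of variation decreases with k."""
    if not frechet_cov(hi) < v < frechet_cov(lo):
        raise ValueError("v lies outside the bracket [cov(hi), cov(lo)]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frechet_cov(mid) > v:
            lo = mid  # variation still too large: k must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


def frechet_beta_from_mean(mean, k):
    """Scale parameter from E[Y] = beta * Gamma(1 - 1/k), Eq. (1.157)."""
    return mean / math.gamma(1.0 - 1.0 / k)


k_hat = frechet_k_from_cov(0.30)                 # hypothetical coefficient of variation
beta_hat = frechet_beta_from_mean(25.0, k_hat)   # hypothetical mean value
print(k_hat, beta_hat)
```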
There is a relation between the Gumbel distribution (type I) and the Fréchet distribution (type II) similar to that between the normal and lognormal distributions. If $Y$ is a Fréchet variable with the parameters $\beta_F$ and $k$, then $Z = \log_e Y$ is a Gumbel variable with the parameters $\alpha = \log_e \beta_F$ and $\beta_G = 1/k$, whereby
$$Q_Z(z) = \exp\left\{-\exp\left[-k\,(z - \log_e \beta_F)\right]\right\} \qquad (1.158)$$
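The relation can be verified directly: $P[\log_e Y \le z] = P[Y \le e^z] = \exp[-(e^z/\beta_F)^{-k}]$, which coincides with the Gumbel distribution function with $\alpha = \log_e \beta_F$ and $\beta_G = 1/k$. A short numerical check with hypothetical parameters:

```python
import math


def frechet_cdf(y, beta_f, k):
    """Two-parameter Frechet distribution, Eq. (1.156), for y > 0."""
    return math.exp(-(y / beta_f) ** (-k))


def gumbel_cdf(z, alpha, beta_g):
    """Gumbel distribution for maxima, Eq. (1.149)."""
    return math.exp(-math.exp(-(z - alpha) / beta_g))


beta_f, k = 2.0, 3.0  # hypothetical Frechet parameters
alpha, beta_g = math.log(beta_f), 1.0 / k
checks = []
for z in (-1.0, 0.0, 0.7, 2.0):
    lhs = frechet_cdf(math.exp(z), beta_f, k)  # P[log_e Y <= z] = P[Y <= e^z]
    rhs = gumbel_cdf(z, alpha, beta_g)
    checks.append((lhs, rhs))
    print(z, lhs, rhs)
```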
The third distribution, often called the Fisher-Tippett type III or the Weibull distribution, after the Swedish engineer and statistician, has the form
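$$Q(y) = \exp\left\{-\left(\frac{\alpha-y}{\beta}\right)^{k}\right\}, \qquad y \le \alpha,\ \beta, k > 0 \qquad (1.159)$$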
Fig. 1.17 The Fréchet distribution with the parameters β = 1 and k = 3
but it is more often used for the distribution of minima in the form (cf. Eq. (1.148))
$$Q(y) = 1 - \exp\left[-\frac{(y-\alpha)^k}{\beta^k}\right], \qquad \beta, k > 0,\ \alpha > -\infty \qquad (1.160)$$
Its density function is given by
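$$q(y) = \frac{k}{\beta}\left(\frac{y-\alpha}{\beta}\right)^{k-1}\exp\left[-\left(\frac{y-\alpha}{\beta}\right)^{k}\right], \qquad y \ge \alpha \qquad (1.161)$$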
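The mean value and variance of the Weibull distribution are found to be
$$E[Y] = \alpha + \beta\,\Gamma(1+1/k) \quad\text{and}\quad \mathrm{Var}[Y] = \beta^2\{\Gamma(1+2/k) - \Gamma^2(1+1/k)\} \qquad (1.162)$$
in which $\Gamma(x)$ is the Gamma function with the basic properties $\Gamma(1+x) = x\,\Gamma(x)$ and $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.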
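Eq. (1.162) can be checked by numerical integration of the survival function $S(t) = 1 - Q(\alpha + t) = \exp[-(t/\beta)^k]$, using $E[Y-\alpha] = \int_0^\infty S(t)\,dt$ and $E[(Y-\alpha)^2] = \int_0^\infty 2t\,S(t)\,dt$. The parameter values are hypothetical; Python's `math.gamma` supplies $\Gamma(x)$.

```python
import math


def weibull_survival(t, beta, k):
    """1 - Q(alpha + t) for Eq. (1.160), with t = y - alpha >= 0."""
    return math.exp(-((t / beta) ** k))


def weibull_moments_numeric(beta, k, n=200_000):
    """Mean and variance of Y - alpha by the midpoint rule applied to the survival function."""
    upper = beta * 40.0 ** (1.0 / k)  # S(upper) = e^{-40}: the remaining tail is negligible
    h = upper / n
    m1 = m2 = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        s = weibull_survival(t, beta, k)
        m1 += s * h
        m2 += 2.0 * t * s * h
    return m1, m2 - m1 * m1


alpha, beta, k = 10.0, 3.0, 2.5  # hypothetical parameter values
m, v = weibull_moments_numeric(beta, k)
exact_mean = alpha + beta * math.gamma(1 + 1 / k)
exact_var = beta ** 2 * (math.gamma(1 + 2 / k) - math.gamma(1 + 1 / k) ** 2)
print(alpha + m, exact_mean)
print(v, exact_var)
print(math.gamma(0.5), math.sqrt(math.pi))  # Gamma(1/2) = sqrt(pi)
```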
Setting the lower limit for y equal to zero ("=0) gives the following simpler version of the
distribution
$$Q(y) = 1 - \exp(-\lambda y^k), \qquad y \ge 0 \qquad (1.163)$$
where $\lambda = 1/\beta^k$. Two important special cases are found when $k = 1$ and $k = 2$. In the first case,
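$$Q(y) = 1 - \exp(-\lambda y), \qquad q(y) = \lambda\exp(-\lambda y), \qquad y \ge 0 \qquad (1.164)$$
which is the exponential distribution. In the second case ($k = 2$),
$$Q(y) = 1 - \exp\left[-\frac{y^2}{2c^2}\right], \qquad q(y) = \frac{y}{c^2}\exp\left[-\frac{y^2}{2c^2}\right], \qquad 0 \le y < \infty \qquad (1.165)$$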
where $c$ is the mode of the distribution (cf. Ex. 1.13). This is the classical Rayleigh distribution, which has found widespread use in various applications (cf. Ex. 2.3).
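Both special cases can be read off the general result (1.162) with $\alpha = 0$ and $\beta = \lambda^{-1/k}$. For $k = 1$ the mean is $1/\lambda$; for $k = 2$ with $\lambda = 1/(2c^2)$, i.e. $\beta = \sqrt{2}\,c$, the mean is $\sqrt{2}\,c\,\Gamma(3/2) = c\sqrt{\pi/2}$. The sketch checks these values and locates the maximum of the Rayleigh density on a grid, which should sit at $y = c$. The numbers are hypothetical.

```python
import math


def weibull_mean_zero_origin(lam, k):
    """Mean of Eq. (1.163) via Eq. (1.162) with alpha = 0 and beta = lam**(-1/k)."""
    beta = lam ** (-1.0 / k)
    return beta * math.gamma(1.0 + 1.0 / k)


def rayleigh_pdf(y, c):
    """Eq. (1.165): q(y) = (y / c^2) exp[-y^2 / (2 c^2)]."""
    return y / c ** 2 * math.exp(-y * y / (2.0 * c * c))


lam = 0.25
mean_exp = weibull_mean_zero_origin(lam, 1.0)       # k = 1: exponential distribution
print(mean_exp, 1.0 / lam)

c = 1.5                                             # hypothetical mode
lam2 = 1.0 / (2.0 * c * c)                          # k = 2 reproduces Eq. (1.165)
mean_rayleigh = weibull_mean_zero_origin(lam2, 2.0)
print(mean_rayleigh, c * math.sqrt(math.pi / 2.0))

# Maximum of the density on a fine grid over (0, 10): should be close to y = c
grid = [i * 1e-4 for i in range(1, 100_000)]
mode = max(grid, key=lambda y: rayleigh_pdf(y, c))
print(mode)
```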
In its form for minima, the Weibull distribution has found use in materials science in dealing with material strength, but it can also be used in applications involving maxima, for example for wind speed distributions. In Section 4.4, various forms of the Weibull distribution are shown in connection with the distribution of extreme peaks of a random signal.
The above presentation of the extreme value distributions has been both short and superficial. It is included only for reference, that is, to give the reader easy access to the main formulas. For further details, the reader should consult Gumbel, [73], or any modern text on extreme value statistics, [87], [176]. Some interesting applications and an overview of recent developments in the theory of extreme values are found in the proceedings of a conference commemorating the centenary of Emil Gumbel's birth, [68].