Chapter IV
Probability Distributions and Their Applications
4.1 DISCRETE DISTRIBUTIONS
4.1.1 Binomial distribution
Consider a discrete time scale. At each point on this time scale, an event may either occur or not occur. Let the probability of the event occurring be $p$ for every point on the time scale, with the occurrence of the event at any point independent of the history of any prior occurrences or non-occurrences. The probability of an occurrence at the $i$th point on the time scale is thus $p$ for $i = 1, 2, \ldots$. A process having these properties is said to be a Bernoulli process.
As an example of a Bernoulli process, consider that during any year the probability of the maximum flow exceeding 10,000 cubic feet per second (cfs) on a particular river is $p$. Common terminology for a flow exceeding a given value is an exceedance. Further consider that the peak flow in any year is independent from year to year (a necessary condition for the process to be a Bernoulli process). Let $q = 1 - p$ be the probability of not exceeding 10,000 cfs. We can neglect the probability of a peak of exactly 10,000 cfs since the peak flow rate is a continuous random variable, so that probability is zero. In this example, the time scale is discrete with the points nominally 1 year apart. We can now make certain probabilistic statements about the occurrence of a peak flow in excess of 10,000 cfs (an exceedance).
For example, the probability of an exceedance occurring in year 3 and not in year 1 or 2 is $qqp$ since the process is independent from year to year. The probability of (exactly) one exceedance in any 3-year period is $pqq + qpq + qqp$ since the exceedance could occur in the first, second, or third year. Thus the probability of (exactly) one exceedance in three years is $3pq^2$. In a similar manner, the probability of 2 exceedances in 5 years can be found from the summation of the terms $ppqqq, pqqpq, \ldots, qqqpp$. Each of these terms is equivalent to $p^2 q^3$, and the number of terms equals the number of ways of arranging 2 items (the $p$'s) among 5 items (the $p$'s and $q$'s). Therefore the total number of terms is $\binom{5}{2}$, or 10, so that the probability of exactly 2 exceedances in 5 years is $10 p^2 q^3$.
This result can be generalized: the probability of $X = x$ exceedances in $n$ years is $\binom{n}{x} p^x q^{n-x}$. The result is applicable to any Bernoulli process, so the probability of $X = x$ occurrences of an event in $n$ independent trials, if $p$ is the probability of an occurrence in a single trial, is given by:
$$f_X(x; n, p) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$
This equation is known as the binomial distribution. The binomial distribution and
the Bernoulli process are not limited to a time scale. Any process that may occur with
probability p at discrete points in time or space or in individual trials may be a
Bernoulli process and follow the binomial distribution.
The cumulative binomial distribution is
$$F_X(x; n, p) = \sum_{i=0}^{x} \binom{n}{i} p^i q^{n-i}, \qquad x = 0, 1, 2, \ldots, n$$
and gives the probability of x or fewer occurrences of an event in n independent trials
if the probability of an occurrence in any trial is p.
Continuing the above example, the probability of fewer than 3 exceedances in 5 years is
$$F_X(2; 5, p) = \sum_{i=0}^{2} \binom{5}{i} p^i q^{5-i} = f_X(0; 5, p) + f_X(1; 5, p) + f_X(2; 5, p)$$
The mean and variance of the binomial distribution are
$$E(X) = np, \qquad \operatorname{var}(X) = npq$$
The coefficient of skew is $(q - p)/\sqrt{npq}$, so that the distribution is symmetrical for $p = q$, skewed to the right for $q > p$, and skewed to the left for $q < p$.
The binomial distribution has an additive property. That is, if $X$ has a binomial distribution with parameters $n_1$ and $p$, and $Y$ has a binomial distribution with parameters $n_2$ and $p$, then $Z = X + Y$ has a binomial distribution with parameters $n = n_1 + n_2$ and $p$.
The binomial distribution can be used to approximate the hyper-geometric distribution if the sample selected is small in comparison to the number of items $N$ from which the sample is drawn. In this case the probability of a success is about the same for each trial.
Example: To be 90 percent sure that a design storm is not exceeded in a 10-year period, what should be the return period of the design storm?
Solution: Let $p$ be the probability of the design storm being exceeded in any year. The probability of no exceedances is given by
$$f_X(0; 10, p) = \binom{10}{0} p^0 q^{10}$$
$$0.90 = (1 - p)^{10}$$
$$p = 1 - (0.90)^{1/10} = 1 - 0.9895 = 0.0105$$
$$T = 1/p = 95 \text{ years}$$
Comment: To be 90 percent sure that a design storm is not exceeded in a 10-year period, a 95-year return period storm must be used. If a 10-year return period storm is used instead, the chance of it being exceeded is $1 - f_X(0; 10, 0.1) = 0.6513$. In general, the chance of at least one occurrence of a $T$-year event in $T$ years is $1 - f_X(0; T, 1/T) = 1 - (1 - 1/T)^T$. Therefore, for a long design life, the chance of at least one occurrence of an event with return period equal to the design life approaches $1 - 1/e$, or 0.632. Thus if the design life of a structure and its design return period are the same, the chances are very great that the capacity of the structure will be exceeded during its design life.
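These relations are easy to check numerically. The following is a minimal sketch (not from the text; the function names are ours):

```python
# Sketch of the design-storm calculations above; pure Python, no dependencies.
def required_return_period(reliability: float, n_years: int) -> float:
    """Return period T = 1/p such that (1 - p)^n = reliability."""
    p = 1.0 - reliability ** (1.0 / n_years)
    return 1.0 / p

def risk_of_exceedance(T: float, n_years: int) -> float:
    """Chance of at least one T-year event in n years: 1 - (1 - 1/T)^n."""
    return 1.0 - (1.0 - 1.0 / T) ** n_years

print(required_return_period(0.90, 10))   # ~95.4 years
print(risk_of_exceedance(10, 10))         # ~0.6513
print(risk_of_exceedance(100, 100))       # ~0.634, near 1 - 1/e
```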
4.1.2 Poisson distribution
The Poisson distribution is like the binomial distribution in that it describes
phenomena for which the average probability of an event is constant, independent of
the number of previous events. In this case, however, the system undergoes transitions
randomly from one state with n occurrences of an event to another with ( n + 1)
occurrences, in a process that is irreversible. That is, the ordering of the events cannot
be interchanged. Another distinction between the binomial and Poisson distributions
is that for the Poisson process the number of possible events should be large.
The Poisson distribution may be inferred from the identity
$$e^{-\mu} e^{\mu} = 1$$
where $\mu$ is the most probable number of occurrences of the event. If $e^{\mu}$ is expanded in a power series, the probability $p(r)$ that exactly $r$ random occurrences will take place can be inferred as the $r$th term in the series, i.e.,
$$p(r) = \frac{e^{-\mu} \mu^r}{r!} \qquad (4.1.2.1)$$
This probability distribution leads directly to the interpretation that:
$e^{-\mu}$ = the probability that an event will not occur,
$\mu e^{-\mu}$ = the probability that an event will occur exactly once,
$(\mu^2/2!)\, e^{-\mu}$ = the probability that an event will occur exactly twice, etc.
The mean and the variance of the Poisson distribution are:
$$E(X) = \mu, \qquad \operatorname{Var}(X) = \mu$$
The coefficient of skew is $\mu^{-1/2}$, so that as $\mu$ gets large, the distribution goes from a positively skewed distribution to a nearly symmetrical distribution. The cumulative Poisson probability that an event will occur $x$ times or fewer is:
$$P(\le x) = \sum_{r=0}^{x} p(r)$$
Of course, the probability that the event will occur $(x + 1)$ or more times is the complement of $P(\le x)$.
The Poisson distribution is useful for analyzing the failure of a system that consists of a large number of identical components that, upon failure, cause irreversible transitions in the system. Each component is assumed to fail independently and randomly. Then $\mu$ is the most probable number of system failures over the lifetime.
To summarize:
• The binomial distribution is useful for systems with two possible outcomes of events (failure–no failure) in cases where there is a known, finite number of (Bernoulli) trials and the ordering of the trials does not affect the outcome.
• The Poisson distribution treats systems in which randomly occurring phenomena cause irreversible transitions from one state to another.
Example: A given nuclear reactor is fueled with 200 assemblies, each of which can fail if the cladding on a fuel rod fails. If each assembly fails in an independent and random manner over the exposure time, calculate the probability of 3 assemblies failing if, on the average, 1% of the fuel assemblies are known to fail. (MacCormick, 1981, p. 34)
Solution: The mean number of assembly failures is $\mu = 200 \times 0.01 = 2$, so using equation (4.1.2.1) for $r = 3$ gives
$$P(3) = (2^3/3!)\, e^{-2} = 0.1804$$
As a check, we can use the probability of a single assembly failing, $p = 0.01$, and the binomial distribution with $n = 200$ to obtain
$$P(3) = \frac{200!}{3!(200-3)!}\, (0.01)^3 (0.99)^{200-3} = 0.1814$$
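A quick numerical confirmation of both computations, as a sketch assuming scipy is available:

```python
# Poisson approximation vs. exact binomial for the fuel-assembly example.
from scipy.stats import binom, poisson

n, p = 200, 0.01
mu = n * p                       # mean number of assembly failures = 2
print(poisson.pmf(3, mu))        # ~0.1804  (Poisson approximation)
print(binom.pmf(3, n, p))        # ~0.1814  (exact binomial)
```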
4.1.3 Hyper-geometric distribution
Drawing a random sample of size $n$ (without replacement) from a finite population of size $N$, with the elements of the population divided into two groups with $k$ elements belonging to one group, is an example of sampling from a hyper-geometric distribution. The two groups may be defective or non-defective objects, rainy or non-rainy days, success or failure of a project, etc.
The total number of possible outcomes, or ways of selecting a sample of size $n$ from $N$ objects, is $\binom{N}{n}$. The number of ways of selecting $x$ successes and $n - x$ failures from the population containing $k$ successes and $N - k$ failures is $\binom{k}{x}\binom{N-k}{n-x}$. Thus the probability is:
$$f(x; N, n, k) = \binom{k}{x}\binom{N-k}{n-x} \bigg/ \binom{N}{n} \qquad (4.1)$$
The distribution given by equation (4.1) is known as the hyper-geometric distribution, where $f(x; N, n, k)$ is the probability of obtaining $X = x$ successes in a sample of size $n$ drawn from a population of size $N$ containing $k$ successes.
The cumulative hyper-geometric distribution giving the probability of x or fewer
successes is:
$$F(x; N, n, k) = \sum_{i=0}^{x} \binom{k}{i}\binom{N-k}{n-i} \bigg/ \binom{N}{n} \qquad (4.2)$$
The natural restriction on this distribution is that the outcomes must be random and
equally likely.
The mean and variance of the hyper-geometric distribution are:
$$E(X) = nk/N$$
$$\operatorname{var}(X) = nk(N-k)(N-n) / [N^2(N-1)]$$
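As an illustrative sketch (the 30-day month with 10 rainy days is a hypothetical scenario, not from the text), scipy's hypergeometric distribution reproduces equations (4.1), (4.2) and the moments above:

```python
# Hypergeometric sampling without replacement.
from scipy.stats import hypergeom

N, k, n = 30, 10, 5              # population, successes in population, sample size
rv = hypergeom(N, k, n)          # scipy argument order: M=N, n=k, N=n
print(rv.pmf(2))                 # prob of exactly 2 successes, eq. (4.1)
print(rv.cdf(2))                 # prob of 2 or fewer successes, eq. (4.2)
print(rv.mean(), rv.var())       # nk/N and nk(N-k)(N-n)/[N^2(N-1)]
```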
4.1.4 Exponential distribution
The probability distribution of the time, $T$, between occurrences of a Poisson event can be found by noting that $\operatorname{prob}(T \le t)$ is equal to $1 - \operatorname{prob}(T > t)$. The $\operatorname{prob}(T > t)$ is equal to the probability of no occurrences in time $t$, which is $f(0; \lambda t)$, or $e^{-\lambda t}$. Thus
$$\operatorname{prob}(T \le t) = P_T(t; \lambda) = 1 - e^{-\lambda t} \qquad (4.3)$$
which is a cumulative distribution known as the exponential distribution. The probability density function is
$$p_T(t; \lambda) = \frac{dP_T(t; \lambda)}{dt} = \lambda e^{-\lambda t} \qquad (4.4)$$
and is the probability distribution of the length of the time interval between occurrences of the event. The mean and variance of the exponential distribution are $1/\lambda$ and $1/\lambda^2$, respectively.
4.1.5 Gamma distribution
The probability distribution of the time to the $n$th occurrence can be found by noting that the time to the $n$th occurrence is the sum of $n$ independent random variables, $T_1 + T_2 + \cdots + T_n$, each from the exponential distribution. The method of derived distributions can be used with the result that the probability density function of the time to the $n$th occurrence is
$$p_T(t; n, \lambda) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}, \qquad t > 0;\ \lambda > 0;\ n = 1, 2, \ldots \qquad (4.5)$$
which is the gamma distribution. The mean and variance of the gamma distribution are
$$E(T) = n/\lambda, \qquad \operatorname{Var}(T) = n/\lambda^2$$
Example: Barges arrive at a lock at an average of 4 each hour. (a) If the arrival of
barges at the lock can be considered to follow a Poisson process, what is the
probability that 6 barges will arrive in 2 hours? (b) If the lock master has just locked
through all of the barges at the lock, what is the probability he can take a 15 minute
break without another barge arriving? (c) If the operation of the lock is such that 4
barges can be locked through at once and the lock master insists that this always be
the case, what is the probability that the first barge to arrive after 4 previous barges
have been locked through will have to wait at least 1 hour before being locked
through? (Haan, 1979, p. 78)
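The text leaves this example as an exercise. One way to compute the three parts numerically is sketched below (scipy assumed; part (c) is read as requiring fewer than 3 further arrivals in the next hour, since three more barges must arrive before the next group of four can be locked through):

```python
# Barge-arrival sketch using the Poisson process with rate 4 barges/hour.
from scipy.stats import poisson

lam = 4.0                              # barges per hour
print(poisson.pmf(6, lam * 2))         # (a) 6 arrivals in 2 h, ~0.122
print(poisson.pmf(0, lam * 0.25))      # (b) no arrival in 15 min, e^-1 ~0.368
print(poisson.cdf(2, lam * 1))         # (c) at most 2 arrivals in 1 h, ~0.238
```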
4.1.6 Multinomial distribution
The binomial distribution can be generalized to include the probabilities of outcomes of several types rather than the two possible outcomes of the binomial. If the probabilities associated with each of $k$ distinct outcomes are $p_1, p_2, \ldots, p_k$, then in $n$ independent trials the probability of $x_1$ outcomes of type 1, $x_2$ outcomes of type 2, ..., $x_k$ outcomes of type $k$ is given by the multinomial distribution as
$$f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k; n, p_1, p_2, \ldots, p_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
or
$$f_{\mathbf{X}}(\mathbf{x}; n, \mathbf{p}) = n! \prod_{i=1}^{k} p_i^{x_i} / x_i!$$
where $\mathbf{X}$, $\mathbf{x}$ and $\mathbf{p}$ are $1 \times k$ vectors. Some restrictions on this distribution are
$$\sum_{i=1}^{k} p_i = 1 \quad \text{and} \quad \sum_{i=1}^{k} x_i = n$$
The mean and variance of the multinomial distribution are
$$E(X_i) = n p_i, \qquad \operatorname{var}(X_i) = n p_i (1 - p_i)$$
Problem: On a certain stream the probability that the maximum peak flow during a 1-year period will be less than 5,000 cfs is 0.2, and the probability that it will be between 5,000 cfs and 10,000 cfs is 0.4. In a 20-year period, what is the probability of 4 peak flows less than 5,000 cfs and 8 peak flows between 5,000 and 10,000 cfs? (Haan, 1979, p. 80)
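A sketch of the computation (scipy assumed; the third category, peaks above 10,000 cfs, takes the remaining probability 0.4 and the remaining 8 years):

```python
# Multinomial probability for the 20-year peak-flow problem.
from scipy.stats import multinomial

probs = [0.2, 0.4, 0.4]            # <5000, 5000-10000, >10000 cfs
counts = [4, 8, 8]                 # outcomes over the 20-year period
print(multinomial.pmf(counts, n=20, p=probs))   # ~0.043
```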
4.2 CONTINUOUS DISTRIBUTIONS
4.2.1 Normal distribution
The normal distribution is a two-parameter distribution whose density function is
$$p_X(x) = (2\pi\sigma^2)^{-1/2}\, e^{-(x-\mu)^2/2\sigma^2}, \qquad -\infty < x < \infty \qquad (4.6)$$
The parameters $\mu$ (mean) and $\sigma^2$ (variance) are denoted as location and scale parameters, respectively. The normal distribution is a bell-shaped, continuous and symmetrical distribution (the coefficient of skew is zero). If $\mu$ is held constant and $\sigma^2$ varied, the distribution changes as in Figure 4.2.1.1. If $\sigma^2$ is held constant and $\mu$ varied, the distribution does not change scale but does change location, as in Figure 4.2.1.2. A common notation for indicating that a random variable is normally distributed with mean $\mu$ and variance $\sigma^2$ is $N(\mu, \sigma^2)$.
Figure 4.2.1.1 Normal distributions with same mean and different variances
Figure 4.2.1.2 Normal distributions with same variance and different means
If a random variable $X$ is $N(\mu, \sigma^2)$ and $Y = a + bX$, the distribution of $Y$ can be shown to be $N(a + b\mu, b^2\sigma^2)$. This can be proven using the method of derived distributions. Furthermore, if the $X_i$ for $i = 1, 2, \ldots, n$ are independently and normally distributed with mean $\mu_i$ and variance $\sigma_i^2$, then $Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$ is normally distributed with
$$\mu_Y = a + \sum_{i=1}^{n} b_i \mu_i \qquad (4.7)$$
and
$$\sigma_Y^2 = \sum_{i=1}^{n} b_i^2 \sigma_i^2 \qquad (4.8)$$
Any linear function of independent normal random variables is also a normal random variable.
Example: If $X_i$ is a random observation from the distribution $N(\mu, \sigma^2)$, what is the distribution of $\bar{X} = \sum_{i=1}^{n} X_i / n$?
Solution: $\bar{X}$ is a linear function of the $X_i$ given by $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$, so that $a = 0$ and each $b_i = 1/n$. From equations (4.7) and (4.8) (the reproductive properties of the normal distribution), $\bar{X}$ is normally distributed with mean
$$\mu_{\bar{X}} = a + \sum_{i=1}^{n} b_i \mu_i = 0 + \sum_{i=1}^{n} \mu/n = n\mu/n = \mu$$
and variance
$$\sigma_{\bar{X}}^2 = \sum_{i=1}^{n} b_i^2 \sigma_i^2 = \sum_{i=1}^{n} \sigma^2/n^2 = n\sigma^2/n^2 = \sigma^2/n$$
Therefore $\bar{X}$ is $N(\mu, \sigma^2/n)$.
Standard normal distribution: The probability that $X$ is less than or equal to $x$ when $X$ is $N(\mu, \sigma^2)$ can be evaluated from
$$\operatorname{prob}(X \le x) = P_X(x) = \int_{-\infty}^{x} (2\pi\sigma^2)^{-1/2}\, e^{-(t-\mu)^2/2\sigma^2}\, dt \qquad (4.9)$$
Equation (4.9) cannot be evaluated analytically, so approximate methods of integration are required. If a tabulation of the integral were made, a separate table would be required for each value of $\mu$ and $\sigma^2$. By using the linear transformation $Z = (X - \mu)/\sigma$, the random variable $Z$ will be $N(0, 1)$. The random variable $Z$ is said to be standardized (has $\mu = 0$ and $\sigma^2 = 1$) and $N(0, 1)$ is said to be the standard normal distribution. The standard normal distribution is given by
$$p_Z(z) = (2\pi)^{-1/2}\, e^{-z^2/2}, \qquad -\infty < z < \infty \qquad (4.10)$$
and the cumulative standard normal is given by
$$\operatorname{prob}(Z \le z) = P_Z(z) = \int_{-\infty}^{z} (2\pi)^{-1/2}\, e^{-t^2/2}\, dt \qquad (4.11)$$
Figure 4.2.1.3 shows the standard normal distribution which, along with the transformation $Z = (X - \mu)/\sigma$, contains all of the information shown in Figures 4.2.1.1 and 4.2.1.2. Both $p_Z(z)$ and $P_Z(z)$ are widely tabulated. Most tables utilize the symmetry of the normal distribution so that only positive values of $z$ are shown. Tables of $P_Z(z)$ may show $\operatorname{prob}(Z \le z)$ or $\operatorname{prob}(0 \le Z \le z)$. Care must be exercised when using normal probability tables to see what values are tabulated.
Figure 4.2.1.3 Standard normal distribution
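In software the table lookup is replaced by a CDF routine. A minimal sketch (scipy assumed; the $N(50, 10^2)$ variable is hypothetical):

```python
# Standardize with Z = (X - mu)/sigma, then use the standard normal CDF.
from scipy.stats import norm

mu, sigma = 50.0, 10.0
x = 65.0
z = (x - mu) / sigma
print(norm.cdf(z))                        # prob(Z <= 1.5) ~0.9332
print(norm.cdf(x, loc=mu, scale=sigma))   # same result without standardizing
```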
By studying a standard normal table it can be seen that 68.26% of the normal distribution is within 1 standard deviation of the mean, 95.44% within 2 standard deviations, and 99.72% within 3 standard deviations. These are called the 1, 2 and 3 sigma bounds of the normal distribution. The fact that only 0.28% of the area of the normal distribution lies outside the 3 sigma bounds means that the probability of a value less than $\mu - 3\sigma$ is only 0.14%; this is the justification for using the normal distribution in some instances even though the random variable under consideration may be bounded below by $X = 0$. If $\mu$ is greater than $3\sigma$, the chance of an $X$ less than zero is in many cases negligible (though this is not always true).
4.2.2 Uniform/rectangular distribution
If a continuous random process is defined over an interval $\alpha$ to $\beta$ and the probability of an outcome of this process being in a subinterval of $\alpha$ to $\beta$ is proportional to the length of the subinterval, the process is said to be uniformly distributed over the interval $\alpha$ to $\beta$. The probability density function for the continuous uniform distribution is
$$p_X(x) = 1/(\beta - \alpha) \qquad \text{for } \alpha \le x \le \beta$$
and the cumulative distribution function is
$$P_X(x) = (x - \alpha)/(\beta - \alpha) \qquad \text{for } \alpha \le x \le \beta$$
The mean and variance of the uniform distribution are
$$E(X) = (\beta + \alpha)/2, \qquad \operatorname{var}(X) = (\beta - \alpha)^2/12$$
The method of moments yields the following estimators for the parameters $\alpha$ and $\beta$:
$$\hat{\alpha} = \bar{X} - \sqrt{3}\, S, \qquad \hat{\beta} = \bar{X} + \sqrt{3}\, S$$
The method of maximum likelihood, when applied to the uniform distribution, results in the estimators for $\alpha$ and $\beta$ being the smallest and largest sample values, respectively. That this is the case can be seen by writing out the likelihood function and then selecting those values of $\alpha$ and $\beta$ (within the constraints that $\alpha \le X \le \beta$) that maximize the function.
The uniform distribution finds its greatest application as the distribution of $P_X(X)$ for all continuous probability distributions. That is, $Y = P_X(X)$ is uniformly distributed over the interval $0 \le y \le 1$ for any continuous probability distribution. This fact is used for generating random observations from probability distributions.
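A minimal sketch of how this fact is used to generate random observations (inverse-transform sampling), shown for the exponential distribution, whose CDF inverts in closed form:

```python
# If U ~ uniform(0,1), then X = P_X^{-1}(U) has CDF P_X.
import math
import random

def exponential_sample(lam: float) -> float:
    u = random.random()                  # uniform on [0, 1)
    return -math.log(1.0 - u) / lam      # inverse of 1 - exp(-lam*x)

samples = [exponential_sample(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))       # should be near 1/lambda = 0.5
```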
Example: Use the method of moments to estimate the parameters of the uniform distribution based on the following sample: 1, 4, 3, 4, 5, 6, 7, 6, 9, 5. What are the maximum likelihood estimators for this sample? (Haan, 1979, p. 98)
Solution: By the method of moments,
$$\bar{x} = 5.00 \quad \text{and} \quad s = 2.21$$
$$\hat{\beta} = \bar{x} + \sqrt{3}\, s = 8.83$$
$$\hat{\alpha} = \bar{x} - \sqrt{3}\, s = 1.17$$
By maximum likelihood,
$$\hat{\alpha} = 1.00, \qquad \hat{\beta} = 9.00$$
Comment: This problem illustrates that the method of moments and the method of maximum likelihood do not always produce the same parameter estimates. In this case the parameters estimated by moments are not reasonable, since values of $X$ outside the limits of $\hat{\alpha}$ and $\hat{\beta}$ are present in the sample. This is a common problem when the method of moments is used to estimate the parameters of the uniform distribution for small samples. Of course, for large samples neither the moment nor the maximum likelihood estimates will be "good" if the sample is not truly a random sample from a uniform distribution.
4.2.3 Exponential distribution
The exponential density function is given by
$$p_X(x) = \lambda e^{-\lambda x}, \qquad x > 0,\ \lambda > 0$$
and the cumulative exponential by
$$P_X(x) = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}, \qquad x > 0$$
The mean and variance of the exponential distribution are
$$E(X) = 1/\lambda, \qquad \operatorname{var}(X) = 1/\lambda^2$$
The exponential distribution is positively skewed, with a skewness coefficient of 2. Both the method of moments and maximum likelihood estimation give the parameter estimate $\hat{\lambda} = 1/\bar{X}$. The exponential distribution is a special case of the gamma distribution.
4.2.4 Hypo-exponential distribution
Many processes in nature can be divided into sequential phases. If the time the process spends in each phase is independent and exponentially distributed, then the overall time is hypo-exponentially distributed. The service times for input-output operations in a computer system often possess this distribution.
Let $X_i$, $i = 1, \ldots, n$, be independent exponential random variables with respective rates $\lambda_i$, $i = 1, \ldots, n$, and suppose that $\lambda_i \ne \lambda_j$ for $i \ne j$. The random variable $S = \sum_{i=1}^{n} X_i$ is said to be a hypo-exponential random variable. To compute its probability density function, let us start with the case $n = 2$. Now,
$$f_S(t) = f_{X_1 + X_2}(t) = \int_0^t f_{X_1}(s)\, f_{X_2}(t - s)\, ds$$
$$= \int_0^t \lambda_1 e^{-\lambda_1 s}\, \lambda_2 e^{-\lambda_2 (t - s)}\, ds$$
$$= \lambda_1 \lambda_2 e^{-\lambda_2 t} \int_0^t e^{-(\lambda_1 - \lambda_2)s}\, ds$$
$$= \frac{\lambda_1}{\lambda_1 - \lambda_2}\, \lambda_2 e^{-\lambda_2 t} \left( 1 - e^{-(\lambda_1 - \lambda_2)t} \right)$$
$$= \frac{\lambda_1}{\lambda_1 - \lambda_2}\, \lambda_2 e^{-\lambda_2 t} + \frac{\lambda_2}{\lambda_2 - \lambda_1}\, \lambda_1 e^{-\lambda_1 t}$$
A similar computation yields, for $n = 3$,
$$f_S(t) = f_{X_1 + X_2 + X_3}(t) = \sum_{i=1}^{3} \lambda_i e^{-\lambda_i t} \left( \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i} \right)$$
which suggests the general result:
$$f_S(t) = f_{X_1 + \cdots + X_n}(t) = \sum_{i=1}^{n} C_{i,n}\, \lambda_i e^{-\lambda_i t}, \qquad \text{where } C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}$$
The distribution function of the hypo-exponential distribution (for $n = 2$) is:
$$F_S(t) = 1 - \frac{\lambda_2}{\lambda_2 - \lambda_1}\, e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_2 - \lambda_1}\, e^{-\lambda_2 t}, \qquad t \ge 0$$
The hazard rate (failure rate) of this distribution is given by:
$$h_S(t) = \frac{\lambda_1 \lambda_2 \left( e^{-\lambda_1 t} - e^{-\lambda_2 t} \right)}{\lambda_2 e^{-\lambda_1 t} - \lambda_1 e^{-\lambda_2 t}}$$
It is not difficult to see that this is an increasing failure rate (IFR) distribution, with the failure rate increasing from 0 to $\min\{\lambda_1, \lambda_2\}$.
Integrating both sides of the expression for $f_S$ from $t$ to $\infty$ yields the tail distribution function of $S$:
$$P\{S > t\} = \sum_{i=1}^{n} C_{i,n}\, e^{-\lambda_i t}$$
The failure rate function of $S$, $r_S(t)$, is as follows:
$$r_S(t) = \frac{\sum_{i=1}^{n} C_{i,n}\, \lambda_i e^{-\lambda_i t}}{\sum_{i=1}^{n} C_{i,n}\, e^{-\lambda_i t}}$$
If we let $\lambda_j = \min(\lambda_1, \ldots, \lambda_n)$, then it follows, upon multiplying the numerator and denominator of $r_S(t)$ by $e^{\lambda_j t}$, that
$$\lim_{t \to \infty} r_S(t) = \lambda_j$$
From the preceding, we can conclude that the remaining lifetime of a hypo-exponentially distributed item that has survived to age $t$, for large $t$, is approximately that of an exponentially distributed random variable with a rate equal to the minimum of the rates of the random variables whose sum makes up the hypo-exponential.
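A sketch of the general density, with hypothetical distinct rates, checked against simulation:

```python
# Hypo-exponential density f_S(t) = sum_i C_{i,n} lambda_i e^{-lambda_i t}.
import math
import random

def hypoexp_pdf(t, rates):
    total = 0.0
    for i, li in enumerate(rates):
        c = 1.0
        for j, lj in enumerate(rates):
            if j != i:
                c *= lj / (lj - li)      # C_{i,n} = prod_{j != i} lj/(lj - li)
        total += c * li * math.exp(-li * t)
    return total

rates = [1.0, 2.0, 5.0]                  # hypothetical distinct phase rates
print(hypoexp_pdf(0.5, rates))
# simulation check: mean of S should be sum of 1/lambda_i = 1.7
sims = [sum(random.expovariate(l) for l in rates) for _ in range(100_000)]
print(sum(sims) / len(sims))
```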
4.2.5 Erlangian distribution
The Erlangian distribution is the time-dependent form of the Poisson discrete distribution. The Erlangian distribution arises frequently in reliability engineering calculations involving random failures, i.e., those failures for which the hazard rate $\lambda(t)$ is a constant $\lambda$. The parameter $\lambda$ is the average rate of occurrence of the event. To derive the distribution, we recognize that the mean number of failures $\mu$ is the product of $\lambda$ and time $t$. The probability of exactly $r$ failures occurring in time $t$ is then given by:
$$p(r, t) = e^{-\lambda t} (\lambda t)^r / r! \qquad (4.5.1)$$
and the cumulative probability of $x$ or fewer failures is:
$$P(\le x, t) = \sum_{r=0}^{x} \frac{e^{-\lambda t} (\lambda t)^r}{r!} \qquad (4.5.2)$$
Equation (4.5.1) is useful since it permits calculation of the failure probability density $f(t)$ for the $r$th failure in $dt$ about $t$. What is required, of course, is for the system to have undergone $(r - 1)$ prior failures so that it is ready to fail for the $r$th time with a conditional probability $\lambda$. Thus the Erlangian distribution follows from equation (4.5.1) as
$$f(t) = \lambda\, p(r - 1, t) = \frac{\lambda (\lambda t)^{r-1} e^{-\lambda t}}{(r-1)!}, \qquad \lambda > 0,\ r \ge 1 \qquad (4.5.3)$$
For $r = 1$, we get the exponential distribution. In queueing theory the family is often normalized so that each of the $r$ phases has mean $1/(r\lambda)$; in that normalization all members of the family have a common mean at $t = 1/\lambda$, the mode is at $t = 0$ for $r = 1$ (exponential) and at $t = (r - 1)/(\lambda r)$ for other values of $r$, and the variance of the $r$th member of the family is $1/(r\lambda^2)$. As $r$ increases, the mode moves to the right towards $1/\lambda$ and the variance decreases to 0.
The Erlang family has a close association with the exponential distribution going beyond the fact that the first Erlang distribution is exponential. If we have $r$ random variables $X_1, X_2, \ldots, X_r$ which are independent and have a common exponential distribution with mean $1/(r\lambda)$, then the random variable $X_1 + X_2 + \cdots + X_r$ follows the $r$th Erlang distribution with parameter $r$. These distributions play an important role in queueing theory and were developed by A. K. Erlang for use in telephone systems.
4.2.6 Gamma distribution
The gamma failure probability density obeys the equation
$$f(t) = \frac{\lambda (\lambda t)^{r-1} e^{-\lambda t}}{\Gamma(r)}, \qquad \lambda > 0,\ r > 0 \qquad (4.6.1)$$
where the parameter $r$ need not be an integer. The two parameters are the shape parameter $r$ and the scale parameter $\lambda$. The shape of the distribution depends significantly upon the value of $r$, which also has an impact on the hazard rate $\lambda(t)$. In the special case that $r$ is an integer, the Erlangian distribution is recovered; in the special case that $\lambda = 0.5$ and $r = 0.5\eta$, where $\eta$ is the number of degrees of freedom, the gamma distribution becomes the chi-square distribution.
The cumulative failure probability $F(t)$ is:
$$F(t) = \frac{1}{\Gamma(r)} \int_0^{\lambda t} y^{r-1} e^{-y}\, dy = \frac{1}{\Gamma(r)}\, \gamma(r, \lambda t) \qquad (4.6.2)$$
The mean and variance of the gamma distribution are:
$$m = r/\lambda, \qquad \sigma^2 = r/\lambda^2$$
The gamma distribution is especially appropriate for systems subjected to an environment of repetitive, random shocks generated according to the Poisson distribution; the failure probability thus depends upon how many shocks the device has suffered, i.e., its age. As another application, if the mean rate of wear of a device is constant but the rate of wear is subject to random variations, then the gamma distribution should be used.
For some devices, such as those for which corrosion of metals is important, it may be appropriate to modify the two-parameter gamma distribution by introducing a time delay $\tau$ before the onset of failures begins. Then equation (4.6.1) is modified to read as:
$$f(t) = \frac{\lambda^r (t - \tau)^{r-1} e^{-\lambda(t - \tau)}}{\Gamma(r)}, \qquad t \ge \tau \qquad (4.6.3)$$
$$f(t) = 0, \qquad t < \tau$$
In such a case, the mean of the distribution becomes:
$$m = \tau + r/\lambda \qquad (4.6.4)$$
Example: Suppose that a device subjected to repetitive random shocks satisfies a gamma distribution with parameters $r = 3$ and $\lambda = 10^{-3}$/hr, and that no failures can occur until 200 hours have passed. Estimate (a) the probability of failure after the device has operated for $t = 4500$ hours and (b) its mean time to failure. (MacCormick, 1981, p. 37)
Solution: In this problem, the time displacement is $\tau = 200$ hours. Integrating equation (4.6.3) from 0 to $t$ and using equation (4.6.2) gives the cumulative probability:
$$F(4500) = \frac{1}{\Gamma(3)}\, \gamma\!\left( 3,\ 10^{-3}(4500 - 200) \right) = \frac{1}{\Gamma(3)}\, \gamma(3, 4.3) \approx 0.8$$
Using equation (4.6.4) gives mean time to failure (MTTF) $= 200 + 3/10^{-3} = 3200$ hr.
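A numerical check of this example (a sketch assuming scipy; scipy's gamma takes shape $a = r$, scale $= 1/\lambda$, and the delay $\tau$ enters as loc):

```python
# Delayed (three-parameter) gamma distribution for the shock example.
from scipy.stats import gamma

r, lam, tau = 3, 1e-3, 200.0
rv = gamma(a=r, loc=tau, scale=1.0 / lam)
print(rv.cdf(4500))       # F(4500) ~0.80
print(rv.mean())          # MTTF = tau + r/lambda = 3200 hr
```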
4.2.7 Weibull distribution
The Weibull is a very general and popular failure distribution that has been shown to apply to a large number of diverse situations. The distribution is named after Waloddi Weibull, a Swedish physicist, who used it in 1939 to represent the distribution of the breaking strength of materials. The distribution has also been used in reliability and quality control. The density function of the three-parameter Weibull distribution is given by:
$$f(x) = \frac{\alpha}{\beta - \varepsilon} \left( \frac{x - \varepsilon}{\beta - \varepsilon} \right)^{\alpha - 1} \exp\left[ -\left( \frac{x - \varepsilon}{\beta - \varepsilon} \right)^{\alpha} \right], \qquad \alpha > 0,\ \beta > 0,\ 0 \le \varepsilon \le x < \infty$$
The shape of the distribution depends primarily on the shape parameter, $\alpha$. The scale parameter is $\beta$ and the delay/displacement parameter is $\varepsilon$. If $\alpha = 1$, the Weibull distribution reduces to the exponential distribution. As $\alpha$ increases, the Weibull distribution tends to the normal distribution. For $\alpha = 2$, the distribution becomes the Rayleigh distribution. The Weibull distribution is also known as the bounded exponential distribution.
The cumulative distribution function is given by:
$$F(x) = 1 - \exp\left[ -\left( \frac{x - \varepsilon}{\beta - \varepsilon} \right)^{\alpha} \right]$$
By using the transformation $y = \left( \frac{x - \varepsilon}{\beta - \varepsilon} \right)^{\alpha}$, tables of $e^{-y}$ can be used to determine $F(x)$.
The mean and variance of the distribution are:
$$E(X) = \varepsilon + (\beta - \varepsilon)\, \Gamma(1 + 1/\alpha) \qquad (4.7.1)$$
$$\operatorname{Var}(X) = (\beta - \varepsilon)^2 \left[ \Gamma(1 + 2/\alpha) - \Gamma^2(1 + 1/\alpha) \right] \qquad (4.7.2)$$
The coefficient of skew is given by:
$$\gamma = \frac{\Gamma(1 + 3/\alpha) - 3\Gamma(1 + 2/\alpha)\Gamma(1 + 1/\alpha) + 2\Gamma^3(1 + 1/\alpha)}{\left[ \Gamma(1 + 2/\alpha) - \Gamma^2(1 + 1/\alpha) \right]^{3/2}} \qquad (4.7.3)$$
From equations (4.7.1) and (4.7.2), we can obtain
$$\beta = \mu + \sigma A(\alpha) \qquad (4.7.4)$$
$$\varepsilon = \beta - \sigma B(\alpha) \qquad (4.7.5)$$
where
$$A(\alpha) = [1 - \Gamma(1 + 1/\alpha)]\, B(\alpha) \qquad (4.7.6)$$
$$B(\alpha) = \left[ \Gamma(1 + 2/\alpha) - \Gamma^2(1 + 1/\alpha) \right]^{-1/2} \qquad (4.7.7)$$
The moment estimates for $\alpha$, $\beta$, and $\varepsilon$ can now be obtained by (i) solving equation (4.7.3) for $\hat{\alpha}$, (ii) solving (4.7.6) and (4.7.7) for $A(\alpha)$ and $B(\alpha)$, (iii) solving (4.7.4) for $\hat{\beta}$, and (iv) solving (4.7.5) for $\hat{\varepsilon}$. In Haan (1979), a table is given to simplify the calculations.
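Steps (ii)-(iv) are straightforward to code. A sketch (the $\alpha$, mean and standard deviation values are hypothetical; step (i), solving (4.7.3) for $\hat{\alpha}$, is assumed already done):

```python
# Weibull method-of-moments: recover beta-hat and epsilon-hat from alpha-hat.
from scipy.special import gamma as G

def weibull_moment_fit(alpha, mean, sd):
    B = (G(1 + 2 / alpha) - G(1 + 1 / alpha) ** 2) ** -0.5   # eq. (4.7.7)
    A = (1 - G(1 + 1 / alpha)) * B                           # eq. (4.7.6)
    beta = mean + sd * A                                     # eq. (4.7.4)
    eps = beta - sd * B                                      # eq. (4.7.5)
    return beta, eps

# hypothetical inputs: alpha already solved from the skew, eq. (4.7.3)
print(weibull_moment_fit(alpha=2.0, mean=10.0, sd=3.0))      # (~10.74, ~4.26)
```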
Figure 4.7.1 Typical Weibull density curves (the vertical and horizontal axes are $f(x)$ and $x - \varepsilon$, respectively)
Example: The lifetime $X$ in hours of a component is modeled by a Weibull distribution with $\alpha = 2$. Starting with a large number of components, it is observed that 15% of the components that have lasted 90 hours fail before 100 hours. Determine the parameter $\lambda$. (Trivedi, 2004, p. 130)
Solution: Here the Weibull distribution is written in the form $F_X(x) = 1 - e^{-\lambda x^2}$, and we are given that $P(X < 100 \mid X > 90) = 0.15$. Also,
$$P(X < 100 \mid X > 90) = \frac{P(90 < X < 100)}{P(X > 90)} = \frac{F_X(100) - F_X(90)}{1 - F_X(90)} = \frac{e^{-\lambda(90)^2} - e^{-\lambda(100)^2}}{e^{-\lambda(90)^2}}$$
Equating the two expressions and solving for $\lambda$, we get:
$$\lambda = -\ln(0.85)/1900 = 0.1625/1900 = 0.00008554$$
4.2.8 Hyper-exponential distribution
A process with sequential phases gives rise to a hypo-exponential or an Erlang distribution, depending upon whether or not the phases have identical distributions. Instead, if a process consists of alternate phases – that is, during any single experiment the process experiences one and only one of the many alternate phases – and these phases have independent exponential distributions, then the overall distribution is hyper-exponential. The density function of a $k$-phase hyper-exponential random variable is:
$$f(t) = \sum_{i=1}^{k} \alpha_i \lambda_i e^{-\lambda_i t}, \qquad t > 0,\ \lambda_i > 0,\ \alpha_i > 0,\ \sum_{i=1}^{k} \alpha_i = 1$$
and the distribution function is:
$$F(t) = \sum_i \alpha_i \left( 1 - e^{-\lambda_i t} \right), \qquad t \ge 0$$
The failure rate is:
$$h(t) = \frac{\sum_i \alpha_i \lambda_i e^{-\lambda_i t}}{\sum_i \alpha_i e^{-\lambda_i t}}, \qquad t > 0,$$
which is a decreasing failure rate, from $\sum_i \alpha_i \lambda_i$ down to $\min\{\lambda_1, \lambda_2, \ldots\}$.
The hyper-exponential distribution exhibits more variability than the exponential. CPU service-time distributions in computer systems have often been observed to possess such a distribution. Similarly, if a product is manufactured in several parallel assembly lines and the outputs are merged, then the failure density of the overall product is likely to be hyper-exponential. The hyper-exponential is a special case of the mixture distributions that often arise in practice – that is, of the form:
$$F(x) = \sum_i \alpha_i F_i(x), \qquad \sum_i \alpha_i = 1,\ \alpha_i \ge 0$$
To see how a hyper-exponential random variable might originate, imagine that a bin contains $n$ different types of batteries, with a type $j$ battery lasting for an exponentially distributed time with rate $\lambda_j$, $j = 1, 2, \ldots, n$. Suppose further that $p_j$ is the proportion of batteries in the bin that are type $j$ for each $j = 1, 2, \ldots, n$. If a battery is randomly chosen, in the sense that it is equally likely to be any of the batteries in the bin, then the lifetime of the battery selected will have the hyper-exponential distribution.
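A simulation sketch of the battery example (the rates and proportions are hypothetical):

```python
# Hyper-exponential lifetime: pick a type with probability p_j, then draw
# an exponential lifetime with that type's rate.
import random

rates = [1.0, 0.5, 0.2]          # lambda_j for each battery type
props = [0.5, 0.3, 0.2]          # proportion of each type in the bin

def battery_lifetime():
    j = random.choices(range(len(rates)), weights=props)[0]
    return random.expovariate(rates[j])

sims = [battery_lifetime() for _ in range(100_000)]
# mixture mean: sum_j p_j/lambda_j = 0.5/1 + 0.3/0.5 + 0.2/0.2 = 2.1
print(sum(sims) / len(sims))
```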
4.2.9 Double exponential distribution
The distribution was discovered by Laplace in 1774 as the form of distribution for which the likelihood function is maximized by setting the location parameter equal to the median of the observed values of an odd number of independent identically distributed random variables. A continuous random variable $X$ is said to have a generalized Laplace double exponential distribution if its probability density function is given by:
$$f(x; \theta, \lambda) = \frac{1}{2\lambda}\, e^{-|x - \theta|/\lambda}, \qquad -\infty < x < \infty \qquad (4.9.1)$$
Here $\lambda$ and $\theta$ are the two parameters of the distribution, with $\lambda > 0$ and $-\infty < \theta < \infty$. If $\theta = 0$ and $\lambda = 1$, then (4.9.1) reduces to
$$f(x) = \frac{1}{2}\, e^{-|x|}$$
This is the standard form of the double exponential distribution. It is also known as Poisson's first law of error.
The standard double exponential distribution has:
1. Mean, $E(X) = 0$
2. Variance, $\operatorname{Var}(X) = 2$
3. Mean deviation about the mean, $\eta = 1$
4. Skewness, $\beta_1 = 0$
5. Kurtosis, $\beta_2 = 6$
The distribution is symmetrical about $x = 0$. The distribution finds application in queueing theory and the theory of reliability.
4.2.10 Cauchy distribution
A continuous random variable $X$ is said to have a generalized Cauchy distribution with parameters $\theta$ and $\lambda$ if its probability density function is given by:
$$f(x; \theta, \lambda) = \frac{1}{\pi}\, \frac{\lambda}{\lambda^2 + (x - \theta)^2}, \qquad -\infty < x < \infty \qquad (4.10.1)$$
where $\lambda > 0$. This is a special type of Pearson Type VII distribution. The cumulative distribution function of $X$ is
$$F(x; \theta, \lambda) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1}\!\left( \frac{x - \theta}{\lambda} \right) \qquad (4.10.2)$$
The parameters $\theta$ and $\lambda$ are called the location and scale parameters, respectively. The distribution was discovered by Cauchy. It has a wide range of application in the theory of statistics, where its role often lies in providing counterexamples: it is often quoted as a distribution for which moments do not exist. The distribution is symmetrical about $x = \theta$, so $x = \theta$ gives the median of the distribution. The density is maximum at $x = \theta$, so that point is also the mode. The mean of the Cauchy distribution does not exist in the general sense; to the extent that a center of location can be assigned, it is at $x = \theta$. Thus the median and mode (and any such center) of the distribution coincide at the point $x = \theta$. For the Cauchy distribution, the second and higher moments about the mean do not exist. The upper and lower quartiles are $\theta \pm \lambda$. The density function has points of inflexion at $\theta \pm \lambda/\sqrt{3}$.
Remarks:
• If $\lambda = 1$, the Cauchy distribution reduces to the form
$$f(x; \theta) = \frac{1}{\pi}\, \frac{1}{1 + (x - \theta)^2}$$
• If $\frac{x - \theta}{\lambda} = z$, then (4.10.1) turns into the standard Cauchy distribution, whose pdf is
$$f(z) = \frac{1}{\pi (1 + z^2)}, \qquad -\infty < z < \infty$$
4.2.11 Beta distribution
A distribution that has both an upper and a lower bound is the beta distribution.
Generally the beta distribution is defined over the interval 0 to 1. It can, however, be
transformed to any interval a to b. If the limits of the distribution are unknown, they
become parameters of the distribution making it a four parameter rather than a two
parameter distribution. The beta density function is given by
$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \qquad 0 \le x \le 1$$
where $\alpha > 0$ and $\beta > 0$ are the two parameters of the distribution. This is the standard form of the beta distribution. The function $B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\, dx$ is called the beta function. It can be shown that
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$
The mean and variance of the beta distribution are:
$$E(X) = \frac{\alpha}{\alpha + \beta}, \qquad \operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
The mean and variance can be used to get the moment estimators for $\alpha$ and $\beta$.
The beta distribution can assume a variety of different shapes depending on the values of its parameters. If $\alpha = \beta = 1$, the beta distribution reduces to the uniform distribution on $[0, 1]$. If one parameter equals one and the other equals two, it turns into a triangular distribution. If both parameters are greater than one, the mode of the distribution is $\frac{\alpha - 1}{\alpha + \beta - 2}$. If one parameter equals unity and the other is greater than unity, then there is only one point of inflexion. The distribution is symmetrical if $\alpha = \beta$, skewed to the right if $\alpha < \beta$, and skewed to the left if $\alpha > \beta$.
Figure 4.11.1 The density functions of the beta distribution for various values of $a = \hat{\alpha}$ and $b = \hat{\beta}$
Problem: The proportion of a brand of television sets requiring service after sale during the first year of operation is a random variable having a beta distribution with $\alpha = 3$ and $\beta = 2$. What is the probability that at least 80% of the new models of this brand sold this year will require service during the first year of operation? (Islam, 2004, p. 689)
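One way to evaluate the requested probability numerically (a sketch assuming scipy):

```python
# P(X >= 0.8) for X ~ Beta(alpha=3, beta=2).
from scipy.stats import beta

print(beta.sf(0.8, 3, 2))     # survival function 1 - F(0.8), ~0.1808
```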
4.2.12 Pearson distribution
Karl Pearson developed a set of frequency curves which can be obtained as the solution of the first-order differential equation:
$$\frac{dy}{dx} = \frac{y(x - a)}{b_0 + b_1 x + b_2 x^2} \qquad (4.12.1)$$
where $a$, $b_0$, $b_1$ and $b_2$ are parameters to be calculated from the given data. By choosing appropriate values for the parameters, the above equation yields a large number of families of distributions, including the normal, beta and gamma distributions. The families of frequency functions defined by (4.12.1) are known as Pearson distributions. A Pearson distribution is completely determined by its first four moments. There are twelve types of Pearson curves: type I, type II, ..., type XII. The form of the frequency curve depends on $k$, which is equal to $b_1^2 / 4 b_0 b_2$, and $\beta_2$, which is equal to $\mu_4 / \mu_2^2$ ($\mu_2$ and $\mu_4$ are the second and fourth moments, respectively, about the mean). Figure 4.12.1 shows the type of Pearson curve for different values of $k$.
Figure 4.12.1 Type of Pearson's curves for different values of $k$
Pearson's method of fitting consists of:
a) determining the values of the first four moments of the observed distribution;
b) calculating the observed values of $\beta_1 = \mu_3^2 / \mu_2^3$, $\beta_2$ and $k$, which determine the type to which the observed distribution belongs;
c) equating the observed moments to the moments of that type of distribution expressed in terms of its parameters; and
d) solving the resulting equations for those parameters, whereupon the fitted distribution is determined.
4.2.13 Lognormal distribution
The lognormal distribution (sometimes spelled out as the logarithmic normal distribution) of a random variable $X$ is one for which the logarithm of $X$ follows a normal or Gaussian distribution. Denote $Y = \ln X$; then $Y$ has a normal or Gaussian distribution given by:
$$f(y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\, e^{-\frac{1}{2}\left( \frac{y - \mu_y}{\sigma_y} \right)^2}, \qquad -\infty < y < \infty \qquad (4.13.1)$$
Derived distribution: Since $Y = \ln X$ and $\frac{dy}{dx} = \frac{1}{x}$, the distribution of $X$ can be found as:
$$f(x) = f(y) \cdot \frac{dy}{dx} = \frac{1}{\sqrt{2\pi\sigma_y^2}}\, e^{-\frac{1}{2}\left( \frac{y - \mu_y}{\sigma_y} \right)^2} \cdot \frac{1}{x} = \frac{1}{\sqrt{2\pi x^2 \sigma_y^2}}\, e^{-\frac{1}{2}\left( \frac{\ln x - \mu_y}{\sigma_y} \right)^2} \qquad (4.13.2)$$
Note that equation (4.13.1) gives the distribution of $Y$ as a normal distribution with mean $\mu_y$ and variance $\sigma_y^2$. Equation (4.13.2) gives the distribution of $X$ as the lognormal distribution with parameters $\mu_y$ and $\sigma_y^2$.
Estimation of the parameters ($\mu_y$, $\sigma_y^2$) of the lognormal distribution:
With $Y = \ln X$,
$$\bar{y} = \frac{\sum y_i}{n}, \qquad S_y^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n - 1}$$
Chow (1954) Method:
$$\text{(1)} \quad C_v = S_x / \bar{X}$$
$$\text{(2)} \quad \bar{Y} = \frac{1}{2} \ln \frac{\bar{X}^2}{C_v^2 + 1}$$
$$\text{(3)} \quad S_y^2 = \ln(C_v^2 + 1)$$
The mean and variance of the lognormal distribution are:
$$E(X) = \exp(\mu_y + \sigma_y^2/2)$$
$$\operatorname{Var}(X) = \mu_x^2 \left[ e^{\sigma_y^2} - 1 \right]$$
The coefficient of variation of the $X$'s is:
$$C_v = \sqrt{e^{\sigma_y^2} - 1}$$
The coefficient of skew of the $X$'s is:
$$\gamma = 3C_v + C_v^3$$
Thus the lognormal distribution is skewed to the right, the skewness increasing with increasing values of $C_v$.
Reproductive properties of the lognormal distribution:
(1) If $X$ is lognormal and $Y = aX^b$, then $Y$ is lognormal with $\mu_{\ln Y} = \ln a + b\mu_{\ln X}$ and $\sigma^2_{\ln Y} = b^2 \sigma^2_{\ln X}$.
(2) (a) If $X$ and $Y$ are lognormal and $Z = XY$, then $Z$ is lognormal with parameters
$$\mu_{\ln Z} = \mu_{\ln X} + \mu_{\ln Y}, \qquad \sigma^2_{\ln Z} = \sigma^2_{\ln X} + \sigma^2_{\ln Y}$$
(b) If $Z = X/Y$ and $X$ and $Y$ are mutually independent, then $Z$ is lognormal with
$$\mu_{\ln Z} = \mu_{\ln X} - \mu_{\ln Y}, \qquad \sigma^2_{\ln Z} = \sigma^2_{\ln X} + \sigma^2_{\ln Y}$$
(3) Just as $X \sim N(\mu, \sigma^2)$ implies $\bar{X} \sim N(\mu, \sigma^2/n)$, the geometric mean $X_g = \left( \prod x_i \right)^{1/n} = (x_1 x_2 \cdots x_n)^{1/n}$ of lognormal observations is lognormal with mean $\mu_{\ln X}$ and variance $\sigma^2_{\ln X}/n$.
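A simulation sketch (numpy assumed; parameter values hypothetical) checking the moment formula and the product property (2a):

```python
# Verify E(X) = exp(mu + sigma^2/2) and that log-parameters add under products.
import numpy as np

rng = np.random.default_rng(0)
mu_y, sd_y = 1.0, 0.5
x = rng.lognormal(mean=mu_y, sigma=sd_y, size=200_000)

print(x.mean(), np.exp(mu_y + sd_y**2 / 2))               # E(X)
print(x.std(), x.mean() * np.sqrt(np.exp(sd_y**2) - 1))   # sd = mean * Cv

z = x * rng.lognormal(0.5, 0.3, size=x.size)    # product of lognormals
print(np.log(z).mean(), mu_y + 0.5)             # mu_lnZ = mu_lnX + mu_lnY
print(np.log(z).var(), sd_y**2 + 0.3**2)        # variances add
```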
The lognormal distribution arises in processes in which the change in a random variable at the $n$th step is a random proportion of the variable at the $(n-1)$st step. Another way of saying the same thing is that the lognormal distribution is needed when factors or percentages characterize the variation. Thus if $X$ represents a quantity that can vary by factors in its error, having a possible range between $X_0/f$ and $X_0 f$, where $X_0$ is some midpoint reference value and $f$ an error factor, then a lognormal is the natural distribution for describing the phenomenon.
One of the reasons that the lognormal distribution is frequently suitable for describing failures in reliability and risk analysis is that data for rarely occurring events may not be extensive, so that component failure rates may vary by factors. For example, a failure rate might be estimated at $10^{-6}$/hr to $10^{-7}$/hr if the error factor is 10. When the failure rate is expressed as $10^{-X}$, where $X$ is some exponent, use of the lognormal distribution implies that the exponent satisfies a normal distribution. Thus we can view the lognormal distribution as one for situations in which there is considerable uncertainty in the failure parameters.
Another feature of the lognormal distribution is that the skewness to higher times incorporates the general behavior of the data for unlikely phenomena, since the skewness accounts for the occurrence of infrequent but large deviate values, such as abnormally high failure rates due to batch defects, environmental degradation, and other causes.
The three-parameter lognormal distribution is obtained by fitting a normal distribution to the logarithms of $(x - \tau)$, where $\tau$ is a parameter that must be estimated from the data. Replacing $x$ in equation (4.13.2) by $x - \tau$ results in the generalized three-parameter form. Then
$$E(X) = \tau + \exp(\mu_y + \sigma_y^2/2)$$
$$\operatorname{Var}(X) = e^{2\mu_y + \sigma_y^2} \left[ e^{\sigma_y^2} - 1 \right]$$
The three-parameter lognormal distribution replaces the two-parameter version whenever there is no possibility of failure for $0 \le x \le \tau$.
4.2.14 Pareto distribution
The Italian economist Vilfredo Pareto (1848-1923) formulated an eponymous law which states that the fraction of a population with income exceeding an amount $x$ is equal to
$$C x^{-a} \qquad (4.14.1)$$
for all $x$, where $C$ and $a$ are positive constants independent of $x$ but depending on the population. Pareto believed that his law was true for all populations regardless of economic and political conditions. If $F(x)$ is the CDF of the income distribution, then (4.14.1) implies that
$$F(x) = 1 - \left( \frac{c}{x} \right)^a, \qquad x > c, \qquad (4.14.2)$$
where $c > 0$ is the minimum income. We denote the distribution given in (4.14.2) by Pareto$(a, c)$. The Pareto distribution finds application in graduating city population sizes, occurrence of natural resources, stock price fluctuations, sizes of firms, personal incomes, distribution of losses from investing, etc.
The PDF of the distribution in (4.14.2) is
$$f(x) = \frac{a c^a}{x^{a+1}}, \qquad x > c \qquad (4.14.3)$$
So a Pareto distribution has polynomial tails. The constant a is called the tail index or
Pareto constant. An advantage of the Pareto distribution over the t-distribution as a
statistical model is that the tail index of the Pareto can be any positive value whereas
for the t-distribution the tail index is a positive integer.
The survival function of a random variable $X$ is
$$P(X > x) = 1 - F(x),$$
where $F$ is the CDF of $X$. If $X$ is a loss, then the survival function of the Pareto is
$$1 - F(x) = \left( \frac{c}{x} \right)^a, \qquad x > c$$
As $x \to \infty$, the survival function of a Pareto converges to 0 at a slow polynomial rate rather than a fast exponential rate, which means that Pareto distributions have a heavy right tail, and the smaller the value of $a$, the heavier the tail. Figure 4.2.14.1 shows the survival function of a Pareto distribution with $c = 0.25$ and $a = 1.1$. For comparison, the survival functions of normal and exponential distributions, conditional on being greater than 0.25, are also shown. The normal has mean 0 and standard deviation $\sigma = 0.3113$, and the exponential has $\theta = c/a = 0.25/1.1$. These parameters were chosen so that the Pareto, normal, and exponential densities, conditional on being greater than 0.25, have the same height at 0.25, which implies that their survival functions have the same slope at 0.25, so the three survival functions start to decrease to 0 at the same rates. Notice that despite their initial rates of decrease being equal, the normal survival function converges to 0 much faster than the Pareto as $x \to \infty$. This means that extreme losses are much more likely if the loss distribution is Pareto rather than normal. The exponential survival function is intermediate between the normal and Pareto survival functions.
Figure 4.2.14.1 Survival function of a Pareto distribution (see text)
The Pareto distribution can be used to model the probability of a large loss as follows. Let $X$ be the negative of the return, so that $X > 0$ corresponds to a loss. We assume that for some $c > 0$, the distribution of $X$ conditional on $X > c$ is Pareto with parameters $c$ and $a$. The value of $c$ can be selected by the analyst and might be the smallest loss which is of real interest, such that losses smaller than $c$ are too small to be of much concern. The parameter $a$ can be estimated from a set of loss data.
Example: MLE of the Pareto index and the Hill estimator
Assume that $X_1, \ldots, X_n$ are i.i.d. Pareto$(a, c)$. The parameter $c$ is often known, and if not, one can use the minimum of $X_1, \ldots, X_n$ as an estimate of $c$. The tail index $a$ will generally not be known, but $a$ can be estimated easily by maximum likelihood, as is now shown. For simplicity, in the following we assume that $c$ is known, but $c$ would be replaced by an estimate if not. By equation (4.14.3), the likelihood function is
$$L(a) = \left( \frac{a c^a}{X_1^{a+1}} \right)\left( \frac{a c^a}{X_2^{a+1}} \right) \cdots \left( \frac{a c^a}{X_n^{a+1}} \right),$$
so the log-likelihood is
$$\log\{L(a)\} = \sum_{i=1}^{n} \left\{ \log(a) + a \log(c) - (a + 1)\log(X_i) \right\} \qquad (4.14.4)$$
Differentiating (4.14.4) with respect to $a$ and setting the derivative equal to 0 gives the equation
$$\frac{n}{a} = \sum_{i=1}^{n} \log(X_i / c)$$
Therefore the MLE of $a$ is
$$\hat{a} = \frac{n}{\sum_{i=1}^{n} \log(X_i / c)} \qquad (4.14.5)$$
If $X_1, \ldots, X_n$ are i.i.d. from a distribution with Pareto tails rather than being exactly Pareto, then one does not want to compute (4.14.5) using all of the data, but rather only the data in the tail. Otherwise, there could be sizeable bias. Therefore, one should choose a constant $c$ and use only the data greater than $c$. This estimator is called the Hill estimator. Typically, $c$ is one of the $X_i$. The Hill estimator can be written as
$$\hat{a}_{\mathrm{Hill}}(c) = \frac{n(c)}{\sum_{X_i > c} \log(X_i / c)}, \qquad (4.14.6)$$
where $n(c)$ is the number of $X_i$ greater than $c$. The difficulty is how to choose $c$ or, equivalently, $n(c)$. The Hill plot is a plot of $\hat{a}_{\mathrm{Hill}}(c)$ versus $n(c)$. We expect that $\hat{a}_{\mathrm{Hill}}(c)$ will be unstable when $n(c)$ is small, due to random variability. If $n(c)$ gets too large, one is using nearly all of the data and $\hat{a}_{\mathrm{Hill}}(c)$ may suffer from bias. One hopes that the Hill plot will show some stability for $n(c)$ neither too small nor too large, and one can then use a value of $n(c)$ in this region of stability.
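A sketch of the MLE (4.14.5) and the Hill estimator (4.14.6) on simulated Pareto data (parameters hypothetical; Pareto draws come from inverting (4.14.2)):

```python
# Hill estimator for the Pareto tail index.
import numpy as np

rng = np.random.default_rng(1)
a_true, c_true = 1.5, 0.25
x = c_true * (1 - rng.random(50_000)) ** (-1 / a_true)   # Pareto(a, c) draws

def hill(x, c):
    tail = x[x > c]
    return len(tail) / np.sum(np.log(tail / c))          # eq. (4.14.6)

print(hill(x, c_true))           # full-sample MLE, near a_true = 1.5
print(hill(x, 1.0))              # tail-only Hill estimate with larger c
```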
4.3 SYNTHESIZED DISTRIBUTIONS
Three categories of synthesized distributions for a device are considered. A mixed distribution failure model is a linear combination of two or more probability densities for all times, and a composite distribution failure model is a failure probability density that is piecewise continuous in time. A convoluted distribution arises for a device that has replacement units in standby, ready for sequential use as each unit fails.
4.3.1 Mixed distribution
If $f_i(t)$ is a failure probability density with hazard rate $\lambda_i(t)$, $i = 1$ to $I$, then the corresponding density $f(t)$ for the mixed distribution of a single-component device can be written as
$$f(t) = \sum_{i=1}^{I} k_i f_i(t) \qquad (4.3.1.1)$$
The mixing parameters $k_i$, $i = 1$ to $I$, must be such that
$$0 \le k_i \le 1 \qquad (4.3.1.2)$$
$$\sum_{i=1}^{I} k_i = 1 \qquad (4.3.1.3)$$
Example: Construct a mixed distribution failure probability density for a combination of an exponential distribution and a gamma distribution.
Solution: From the exponential and gamma probability densities given earlier in this chapter and equations (4.3.1.1) through (4.3.1.3),
$$f(t) = k \lambda_1 \exp(-\lambda_1 t) + (1 - k)\, \frac{\lambda_2^r t^{r-1} \exp(-\lambda_2 t)}{\Gamma(r)}$$
Figure 4.3.1.1 Mixed exponential and gamma distribution with $\lambda_1 = \lambda_2 = 1$, $r = 5$, and $k = 0.1$
Figure 4.3.1.1(a) shows the exponential distribution (curve 1), the gamma distribution (curve 2), and the sum (curve 3) for $k = 0.1$. Figure 4.3.1.1(b) illustrates the instantaneous failure rate $\lambda(t)$ for the mixed model. For the case shown, there initially is a period of diminishing hazard rate ($0 \le t \le t_H$) during which "weak" items in a large population would be expected to fail; at later times, phenomena such as wear cause the rate to increase.
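A sketch of this mixed density and its hazard rate $\lambda(t) = f(t)/[1 - F(t)]$, using the figure's parameter values (scipy assumed):

```python
# Mixed exponential/gamma density and hazard rate for the example above.
from scipy.stats import expon, gamma

k, lam1, lam2, r = 0.1, 1.0, 1.0, 5

def f_mixed(t):
    return k * expon.pdf(t, scale=1/lam1) + (1 - k) * gamma.pdf(t, r, scale=1/lam2)

def hazard(t):
    F = k * expon.cdf(t, scale=1/lam1) + (1 - k) * gamma.cdf(t, r, scale=1/lam2)
    return f_mixed(t) / (1.0 - F)

for t in (0.5, 2.0, 8.0):
    print(t, f_mixed(t), hazard(t))   # hazard dips early, then rises with wear
```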
Example: Failures of a given device can be classified as either sudden (catastrophic) or delayed (wear-out). Develop a mixed distribution model for the cumulative failure probability of the device.
Solution: Catastrophic failures may occur as soon as the device is exposed to an operating environment outside the maximum tolerances for operation; then the Weibull distribution with a location parameter $\tau_1 = 0$ and a shape parameter $\alpha_1 < 1$ is an appropriate model. Wear-out failures are due to aging of the device; a Weibull model with a location parameter $\tau_2 > 0$ and a shape parameter $\alpha_2 > 1$ is an appropriate failure model. From the equation of the cumulative Weibull distribution and equations (4.3.1.1) through (4.3.1.3), the cumulative failure probability is
$$F(t) = k \left\{ 1 - \exp\left[ -(t/\beta_1)^{\alpha_1} \right] \right\} + (1 - k) \left\{ 1 - \exp\left[ -\left( \frac{t - \tau_2}{\beta_2} \right)^{\alpha_2} \right] \right\}$$
for $\beta_1 > 0$, $\beta_2 > 0$ and $0 \le k \le 1$. A special case of the corresponding $f(t)$ is shown in Figure 4.3.1.2.
Figure 4.3.1.2 Mixed Weibull failure probability density with $\beta_1 = \beta_2 = 1$, $\alpha_1 = 0.5$, $\alpha_2 = 3$, $\tau_1 = 0$, $\tau_2 = 0.4$ and $k = 0.2$
4.3.2 Composite distribution
A composite failure model for a one-component system can be constructed by linking together different failure probability densities for different time intervals. Then $f_j(t)$ denotes the composite probability density function for the time interval $T_{j-1} \le t \le T_j$, where the times $T_{j-1}$ and $T_j$ are the partition parameters for the $j$th interval.
A special case of a composite distribution exists for any device that cannot fail for a finite period of time $\tau$. In such situations, the device is not sensitive to any load to which it is subjected, so $f_1(t) = 0$ for $0 \le t \le \tau$.
Example: A device is known to always fail in a random fashion while in a phased mission mode of operation consisting of three stages: $0 \le t < T_1$, $T_1 \le t < T_2$ and $t \ge T_2$. Obtain the cumulative failure probability for the device.
Solution: Three non-synchronous hazard rates, denoted as $\lambda_j$, $j = 1$ to 3, characterize the hazard rate, which can be written as
$$\lambda(t) = \lambda_1 + (\lambda_2 - \lambda_1) H(t - T_1) + (\lambda_3 - \lambda_2) H(t - T_2),$$
where $H(x) = 1$ for $x \ge 0$ and $H(x) = 0$ for $x < 0$. Using $R(t) = 1 - F(t)$ and
$$R(t) = e^{-\int_0^t \lambda(t')\, dt'}$$
the cumulative failure probability is
$$F(t) = 1 - \exp(-\lambda_1 t), \qquad 0 \le t < T_1,$$
$$F(t) = 1 - \exp\left[ -\lambda_1 t - (\lambda_2 - \lambda_1)(t - T_1) \right], \qquad T_1 \le t < T_2,$$
$$F(t) = 1 - \exp\left[ -\lambda_1 t - (\lambda_2 - \lambda_1)(t - T_1) - (\lambda_3 - \lambda_2)(t - T_2) \right], \qquad t \ge T_2$$
A composite model has the advantage that it can sometimes provide flexibility in
fitting and explaining failure data. It is really nothing more than the well-known
method of approximating a function by dividing it into a number of regions.
Intuitively, the greater the number of segments taken, the more accurate the
approximation becomes, but engineering judgment must be exercised to balance
goodness of fit and computational complexity.
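A sketch of the phased-mission result above (the rates and stage boundaries are hypothetical):

```python
# Piecewise hazard rate integrates to a piecewise-exponential F(t).
import math

def F(t, l1, l2, l3, T1, T2):
    H = lambda x: 1.0 if x >= 0 else 0.0           # unit step
    integral = (l1 * t
                + (l2 - l1) * (t - T1) * H(t - T1)
                + (l3 - l2) * (t - T2) * H(t - T2))
    return 1.0 - math.exp(-integral)

for t in (0.5, 1.5, 3.0):
    print(t, F(t, l1=0.1, l2=0.3, l3=0.2, T1=1.0, T2=2.0))
```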
4.3.3 Convoluted distribution
A device that has replacement units in standby can continue to operate provided at least one of its units has not failed. The first unit operates until failure at $t = t_1$; the $j$th unit fails at $t = t_j$. The failure probability density for the $i$th and all prior units, $f_{1,2,\ldots,i}(t)$, may be expressed in terms of that for the $(i-1)$th unit and all prior units, $f_{1,2,\ldots,(i-1)}(t)$, as the convolution of two failure probability densities:
$$f_{1,2,\ldots,i}(t) = \int_0^t f_i(t - t')\, f_{1,2,\ldots,(i-1)}(t')\, dt' \qquad (4.3.3.1)$$
In this equation, the failure probability density for the $i$th unit, $f_i(t - t')$, accounts for the system failure probability density for the time $(t - t')$ during which the $i$th unit is in operation, while $f_{1,2,\ldots,(i-1)}(t')\, dt'$ accounts for the failure probability of the $(i-1)$th unit in $dt'$ about time $t'$ after all other units $j$, $j < (i-1)$, have failed. The integration over the time of failure $t'$ of the $(i-1)$th unit ranges from 0 to $t$ because the actual time of the $i$th failure is not known.
Equation (4.3.3.1) can be rewritten in the form of nested integrals by recursively applying the equation. The result is
$$f_{12\cdots i}(t) = \int_0^t dt_{i-1}\, f_i(t - t_{i-1}) \int_0^{t_{i-1}} dt_{i-2}\, f_{i-1}(t_{i-1} - t_{i-2}) \cdots \times \int_0^{t_2} dt_1\, f_2(t_2 - t_1)\, f_1(t_1) \qquad (4.3.3.2)$$
Thus, for example, for a three-unit system, with initially two replacement units in standby, ready for use, the system probability density for failure is given by
$$f_{123}(t) = \int_0^t dt_2\, f_3(t - t_2) \int_0^{t_2} dt_1\, f_2(t_2 - t_1)\, f_1(t_1) \qquad (4.3.3.3)$$
Any distribution discussed in Section 4.2 may be substituted for $f_i(t)$ in equations (4.3.3.1) through (4.3.3.3). The $f_i(t)$ need not be identical for all $i$, but the system must be capable of functioning with any unit in operation.
Example: Calculate the failure probability density for a system consisting of $i$ identical units, all having identical constant hazard rates $\lambda$. The units are used successively.
Solution: The exponential failure model $f(t) = \lambda e^{-\lambda t}$ and equation (4.3.3.1) can be combined to give
$$f_{12\cdots i}(t) = \lambda\, \frac{(\lambda t)^{i-1}}{(i-1)!}\, e^{-\lambda t}$$
This result should not be surprising, since it is precisely the same as for the $i$th occurrence in the Poisson process.
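A numerical sketch of equation (4.3.3.1): convolving three identical exponential densities on a grid and comparing with the closed form just derived (numpy assumed):

```python
# Discrete convolution approximates the continuous convolution integral.
import numpy as np

lam, dt = 1.0, 0.01
t = np.arange(0, 10, dt)
f1 = lam * np.exp(-lam * t)                       # exponential density
f12 = np.convolve(f1, f1)[: len(t)] * dt          # two units, eq. (4.3.3.1)
f123 = np.convolve(f12, f1)[: len(t)] * dt        # three units
closed = lam * (lam * t) ** 2 * np.exp(-lam * t) / 2   # (i-1)! = 2 for i = 3
print(np.max(np.abs(f123 - closed)))              # small discretization error
```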
To summarize the results of this section:
1. A mixed distribution consists of the addition of failure distributions with variable
mixing coefficients for the different constituent probability densities.
2. A composite distribution is obtained from a set of piecewise-continuous hazard
rates, each valid over a finite interval of time; if the composite hazard rate is
continuous, so also is the composite failure probability density.
3. A convoluted distribution arises for a multi-unit system with one or more
replacement units in standby that can be switched into service instantaneously with no
switch failures.