Basic Concepts of Probability and Statistics
for Reliability Engineering
Ernesto Gutierrez-Miravete
Spring 2007
1 Introduction
1.1 Probability
Events and sample space are fundamental concepts in probability. A sample space S is
the set of all possible outcomes of an experiment whose outcome cannot be determined
in advance while an event E is a subset of S. The probability of the event E, P (E), is a
number satisfying the following axioms
0 ≤ P (E) ≤ 1
P (S) = 1
P(∪_i Ei) = Σ_i P(Ei)
where the various Ei ’s are mutually exclusive events.
One can associate with each occurrence in S a numerical value. A random variable X
is a function assigning a real number to each member of S. Random variables can adopt
discrete values or continuous values.
1.2 Discrete random variables
If X is a random variable (i.e. a function defined over the elements of a sample space) that can
take only a finite or countably infinite number of values xi ∈ RX with i = 1, 2, ..., where RX is
the range of values of the random variable, then it is a discrete random variable.
The probability of X taking the specific value xi, p(xi) = P(X = xi), is a number such that

p(xi) ≥ 0

for every i = 1, 2, ..., and

Σ_{i=1}^{∞} p(xi) = 1
The collection of pairs (xi , p(xi )), i = 1, 2, ... is called the probability distribution of X.
p(xi ) is the probability mass function of X.
Two examples of discrete random variables are:
• Number of jobs arriving at a job shop each week.
• The outcome of tossing a loaded die.
1.3 Continuous random variables
If RX is an interval rather than a discrete set then X is a continuous random variable.
The probability that X ∈ [a, b] is

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

where f(x) is the probability density function of X satisfying, for all x ∈ RX,

f(x) ≥ 0

∫_{RX} f(x) dx = 1

and f(x) = 0 if x ∉ RX.
Two examples of continuous random variables are:
• The life of a device.
• Temperature readings in a turbulent flow field.
1.4 Cumulative distribution function
The probability that X ≤ x, F(x) = P(X ≤ x), is the cumulative distribution function (CDF).
The CDF is defined as

F(x) = Σ_{i=1}^{n} p(xi)

for discrete X, where xn is the largest value in RX not exceeding x, and as

F(x) = ∫_{−∞}^{x} f(t) dt

for continuous X.
Note that if a < b then F(a) ≤ F(b), lim_{x→∞} F(x) = 1, lim_{x→−∞} F(x) = 0 and
P(a ≤ X ≤ b) = F(b) − F(a).
Exercise. Determine the probabilities of various outcomes in tossing a loaded die and
also the probability that a device has a certain life.
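As a concrete illustration of the loaded-die part of the exercise, the short sketch below (Python,
standard library only) builds the probability mass function and the cumulative distribution
function of a discrete random variable; the face probabilities are assumed values chosen only
for illustration.

# Sketch for the loaded-die exercise; the face probabilities are assumed for
# illustration only and must sum to one.
faces = [1, 2, 3, 4, 5, 6]
p = [0.10, 0.10, 0.15, 0.15, 0.20, 0.30]   # assumed loading

assert abs(sum(p) - 1.0) < 1e-12

# Probability mass function as (x_i, p(x_i)) pairs.
pmf = dict(zip(faces, p))

# Cumulative distribution function F(x) = sum of p(x_i) for x_i <= x.
def F(x):
    return sum(prob for face, prob in pmf.items() if face <= x)

print("P(X = 4) =", pmf[4])
print("F(4) = P(X <= 4) =", F(4))
print("P(2 <= X <= 5) = F(5) - F(1) =", F(5) - F(1))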
1.5 Expectation and Moment Generating Function
The expected value of a random variable X, the expectation of X, is

E(X) = Σ_i xi p(xi)

for discrete X and

E(X) = ∫_{−∞}^{∞} x f(x) dx

for continuous X. E(X) is also called the mean or the first moment of X. Generalizing,
the nth moment of X is

E(X^n) = Σ_i xi^n p(xi)

for discrete X and

E(X^n) = ∫_{−∞}^{∞} x^n f(x) dx

for continuous X.
The moment generating function of a random variable X can be defined as

ψ(t) = E(e^{tX}) = ∫ e^{tx} dF(x)

Moments of all orders for X are obtained by differentiating ψ and evaluating the derivatives at
t = 0. When the moment generating function exists it uniquely determines the distribution of X.
The variance of X, V(X) = var(X) = σ², is

σ² = E((X − E(X))²) = E(X²) − (E(X))²

The standard deviation of X is σ = √(σ²). The third and fourth moments of a distribution
are associated with its skewness and its kurtosis, respectively.
Exercises. Determine the expectation of the outcome of tossing a loaded die and the expected
life of a certain device.
Another important statistic is the covariance of two random variables X and Y , Cov(X, Y ).
This is defined as
Cov(X, Y ) = E(XY ) − E(X)E(Y )
If Cov(X, Y) = 0 the variables are said to be uncorrelated. Further, the correlation
coefficient ρ(X, Y) is defined as

ρ(X, Y) = Cov(X, Y) / (var(X) var(Y))^{1/2}
The conditional probability gives the probability that a random variable X = x given
that Y = y and is defined as
P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
Exercise. In a population of N people, N_A are color blind, N_H are female and N_AH are
color blind females. If a person chosen at random turns out to be a female, what is the
probability that she will also be color blind?
1.6 Law of Large Numbers and the Central Limit Theorem
The following limit theorems are of fundamental and practical importance. They are given
here without proof.
The strong law of large numbers states that if the random variables X1, X2, ..., Xn
are independent and identically distributed (iid) with mean µ, then

lim_{n→∞} (Σ_{i=1}^{n} Xi)/n = lim_{n→∞} X̄ = µ

with probability 1.
Furthermore, if the variance of the distribution of the Xi above is σ², the central limit
theorem states that

lim_{n→∞} P[(X̄ − µ)/(σ/√n) ≤ a] = ∫_{−∞}^{a} (1/√(2π)) e^{−x²/2} dx

In words, the theorem states that the distribution of the normalized random variable
(X̄ − µ)/(σ/√n) approaches the standard normal distribution with mean 0 and standard
deviation 1.
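A small simulation illustrates the theorem. The sketch below (Python, standard library only)
uses an exponential population and a sample size of 30; both are arbitrary choices, since any
iid population with finite variance would behave the same way.

# Illustration of the central limit theorem with an assumed (exponential)
# population; any iid population with finite variance would do.
import math
import random

random.seed(1)
mu, sigma = 1.0, 1.0          # mean and std. dev. of an Exponential(1) population
n, replications = 30, 10000   # sample size and number of sample means

standardized = []
for _ in range(replications):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    standardized.append((xbar - mu) / (sigma / math.sqrt(n)))

# Fraction of standardized means below a = 1.0 should be close to Phi(1) ~ 0.841.
a = 1.0
frac = sum(z <= a for z in standardized) / replications
print("P[(Xbar - mu)/(sigma/sqrt(n)) <= 1] ~", frac)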
2 Discrete Distributions
2.1 Bernoulli distribution
Consider an experiment consisting of n independent trials, each with two possible outcomes,
namely success and failure. If Xj = 1 for a success and Xj = 0 for a failure and the probability
of success remains constant from trial to trial, the outcome of the jth trial is governed by the
Bernoulli distribution:

pj(xj) = p(xj) = p           for xj = 1, j = 1, 2, ..., n
pj(xj) = p(xj) = 1 − p = q   for xj = 0, j = 1, 2, ..., n
pj(xj) = p(xj) = 0           otherwise
Note that E(Xj ) = p and V (Xj ) = pq.
The outcome of tossing a fair coin n times can be represented by a Bernoulli distribution
with p = q = 1/2.
2.2 Binomial distribution
The number of successes in n Bernoulli trials is a random variable X with the binomial
distribution p(x):

p(x) = C(n, x) p^x q^(n−x)   for x = 0, 1, 2, ..., n,  and p(x) = 0 otherwise

where the binomial coefficient is

C(n, x) = n! / (x!(n − x)!)
Note that E(X) = np and V (X) = npq.
Consider as an example the following situation from quality control in chip manufacture,
where the probability of finding more than 2 nonconforming chips in a sample of n = 50 is

P(X > 2) = 1 − P(X ≤ 2) = 1 − Σ_{x=0}^{2} C(50, x) p^x q^(50−x)
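The tail probability above can be evaluated directly. A minimal sketch (Python, standard
library only) is shown below; the value p = 0.02 for the probability of a nonconforming chip
is assumed purely for illustration.

# Binomial tail probability P(X > 2) for n = 50 trials; p is an assumed value.
from math import comb

n, p = 50, 0.02          # assumed probability of a nonconforming chip
q = 1.0 - p

P_le_2 = sum(comb(n, x) * p**x * q**(n - x) for x in range(3))
print("P(X > 2) =", 1.0 - P_le_2)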
2.3 Geometric distribution
The number of trials required to achieve the first success is a random variable X with the
geometric distribution p(x)
p(x) = q^(x−1) p   for x = 1, 2, ...,  and p(x) = 0 otherwise

Note that E(X) = 1/p and V(X) = q/p².
Exercise. In acceptance sampling one must determine, for example, the probability that
the first acceptable item found is the third one inspected given that 40% of items are rejected
during inspection. Determine the values of x and q and find p(x).
2.4 Poisson distribution
If α > 0, the Poisson probability mass function is
p(x) = exp(−α) α^x / x!   for x = 0, 1, 2, ...,  and p(x) = 0 otherwise

Note that E(X) = V(X) = α. The cumulative distribution function is

F(x) = Σ_{i=0}^{x} exp(−α) α^i / i!
Examples of Poisson distributed random variables include
• The number of customers arriving at a bank.
• Beeper calls to an on-call service person.
• Lead time demand in inventory systems.
3 Continuous Distributions
3.1 Uniform distribution
For a random variable X which is uniformly distributed in [a, b] the uniform probability
density function is
f(x) = 1/(b − a)   for a ≤ x ≤ b,  and f(x) = 0 otherwise
while its cumulative distribution function is

F(x) = 0                  for x < a
F(x) = (x − a)/(b − a)    for a ≤ x < b
F(x) = 1                  for x ≥ b

Note that P(x1 < X < x2) = F(x2) − F(x1) = (x2 − x1)/(b − a). Note also that
E(X) = (a + b)/2 and V(X) = (b − a)²/12.
Examples of uniformly distributed random variables could be:
• Inter arrival time for calls seeking a forklift in warehouse operations.
• Five minute wait probability for passenger at a bus stop.
• Readings from a table of random numbers.
3.2 Exponential distribution
If λ > 0, the exponential probability density function of X is
f(x) = λ exp(−λx)   for x ≥ 0,  and f(x) = 0 elsewhere

while its cumulative distribution function is

F(x) = 0 for x < 0,  and F(x) = ∫_0^x λ exp(−λt) dt = 1 − exp(−λx) for x ≥ 0

Note that E(X) = 1/λ and V(X) = 1/λ².
Examples of exponentially distributed random variables include:
• Inter arrival times of commercial aircraft at an airport.
• Life of a device.
The exponential distribution possesses the memoryless property, i.e. if s ≥ 0 and t ≥ 0
then P (X > s + t|X > s) = P (X > t). Clearly, unless there is agreement beforehand the
time one person arrives at the bank is independent of the arrival time of the next person.
Another example is that of the life of a used component which is as good as new. In the
discrete case the geometric distribution also possesses the memoryless property.
3.3 Gamma distribution
The gamma function of parameter β > 0, Γ(β), is

Γ(β) = ∫_0^∞ x^(β−1) exp(−x) dx

Note that Γ(β) = (β − 1)Γ(β − 1), and that Γ(β) = (β − 1)! when β is a positive integer.
A random variable X has a gamma probability density function with shape parameter β and
scale parameter θ if

f(x) = (βθ/Γ(β)) (βθx)^(β−1) exp(−βθx)   for x > 0,  and f(x) = 0 otherwise

The cumulative distribution function is

F(x) = 1 − ∫_x^∞ (βθ/Γ(β)) (βθt)^(β−1) exp(−βθt) dt   for x > 0,  and F(x) = 0 for x ≤ 0

Note that E(X) = 1/θ and V(X) = 1/(βθ²).
3.4 Erlang distribution
If β = k above, where k is an integer, the Erlang distribution of order k is obtained. The
cumulative distribution function is

F(x) = 1 − Σ_{i=0}^{k−1} exp(−kθx)(kθx)^i / i!   for x > 0,  and F(x) = 0 for x ≤ 0
Examples of gamma distributions occur for random variables associated with the reliability function and in the probability that a process consisting of several steps will have a
given duration.
3.5 Normal distribution
A random variable X with mean µ and variance σ² has a normal distribution (X ∼ N(µ, σ))
if its probability density function for x ∈ (−∞, ∞) is

f(x) = (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²}

The cumulative distribution function is

F(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^{−(1/2)((t−µ)/σ)²} dt
The standardized random variable Z = (X − µ)/σ has mean zero and standard deviation 1.
Its probability density function is

φ(z) = (1/√(2π)) e^{−z²/2}

and the cumulative distribution function is

Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t²/2} dt
Examples of normally distributed random variables abound. A few of them are:
• Time to perform a task.
• Time waiting in a queue.
• Lead time demand for an item.
3.6 Lognormal distribution
A random variable X has a lognormal distribution (X ∼ LN(θ, m, σ)) if its probability
density function for x > θ is given by

f(x) = (1/((x − θ)σ√(2π))) e^{−[ln((x−θ)/m)]²/(2σ²)}

where θ is the location parameter (often θ = 0) and m is the scale parameter. When θ = 0
and m = 1 one has the standard lognormal distribution.
The cumulative distribution function is

F(x) = P(X ≤ x) = Φ(ln((x − θ)/m)/σ)

where Φ is the cumulative distribution function of the standard normal distribution.
Because of its relation to the normal distribution of mean µ and variance σ², the
probability density function of the lognormal distribution with location parameter θ = 0 is
sometimes expressed as

f(x) = (1/(xσ√(2π))) e^{−(ln x − µ)²/(2σ²)}

Here, µ and σ² are the mean and variance of the random variable's logarithm. The
expected value (mean) of the lognormally distributed random variable is

E(X) = exp(µ + σ²/2)

and the variance is

var(X) = exp(2µ + 2σ²) − exp(2µ + σ²)
3.7 Weibull distribution
A random variable X associated with the three parameters −∞ < ν < ∞ (location), α > 0
(scale) and β > 0 (shape), has a Weibull distribution if its probability density function is
f(x) = (β/α)((x − ν)/α)^(β−1) exp(−((x − ν)/α)^β)   for x ≥ ν,  and f(x) = 0 otherwise
The cumulative probability distribution function is
F(x) = 0   for x < ν,  and F(x) = 1 − exp(−((x − ν)/α)^β)   otherwise
If ν = 0, the probability density function becomes
f(x) = (β/α)(x/α)^(β−1) exp(−(x/α)^β)   for x ≥ 0,  and f(x) = 0 otherwise

The corresponding cumulative distribution function is

F(x) = 0   for x < 0,  and F(x) = 1 − exp(−(x/α)^β)   otherwise
If ν = 0 and β = 1, the probability density function becomes

f(x) = (1/α) exp(−x/α)   for x ≥ 0,  and f(x) = 0 otherwise

i.e. the exponential distribution with parameter λ = 1/α.
The mean and variance of the Weibull distribution are, respectively, E(X) = ν + αΓ(1/β + 1)
and V(X) = α²(Γ(2/β + 1) − Γ(1/β + 1)²).
Examples of Weibull distributed random variables include:
• Mean time to failure of flat panel screens.
• Probability of clearing an airport runway within a given time.
3.8 Extreme Value (Gumbel) distribution
A random variable X has an Extreme Value (Gumbel) distribution (X ∼ EV(µ, β)) if
its probability density function for x ∈ (−∞, ∞) is given by

f(x) = (1/β) e^{(x−µ)/β} e^{−e^{(x−µ)/β}}

where µ is the location parameter and β is the scale parameter. When µ = 0 and β = 1 one
has the standard Gumbel distribution.
The cumulative distribution function of the standard Gumbel distribution is given by

F(x) = P(X ≤ x) = 1 − e^{−e^{x}}
This distribution has been found useful for the description of extreme events such as
floods or earthquakes.
3.9 Triangular distribution
The triangular probability density function is

f(x) = 2(x − a)/((b − a)(c − a))   for a ≤ x ≤ b
f(x) = 2(c − x)/((c − b)(c − a))   for b ≤ x ≤ c
f(x) = 0                           otherwise
while its cumulative distribution function is

F(x) = 0                               for x ≤ a
F(x) = (x − a)²/((b − a)(c − a))       for a < x ≤ b
F(x) = 1 − (c − x)²/((c − b)(c − a))   for b < x ≤ c
F(x) = 1                               for x > c
The mean E(X) = (a + b + c)/3 and the mode M = b. The median is obtained by setting
F(x) = 0.5 and solving for x.
The triangular distribution is a useful one when the only information available about the
random variable is its minimum, most likely, and maximum values.
4 Empirical Distributions
If the distribution function of a random variable can not be specified in terms of a known
distribution and field data is available, one can use an empirical distribution. Empirical
distributions can be discrete or continuous.
5 Inferences, Estimation and Test of Hypotheses
Statistical inference is a collection of methods designed to investigate the characteristics
of a certain population using only information obtained from a random sample extracted
from that population. Inference is an aid in making decisions under uncertainty
and it is the foundation of modern decision theory.
Estimation consists in the determination of the value or range of values of a parameter
of the population using the sample data. Confidence intervals with a specified degree of
confidence are used in interval estimation.
Sometimes, rather than in the value of a parameter one is interested in the validity of
a certain statement (hypothesis testing). In such cases one can encounter the following
situations:
• Accept the statement, it being true (No error is involved).
• Reject the statement, it being true (Type I error).
• Accept the statement, it being false (Type II error).
• Reject the statement, it being false (No error is involved).
One is then interested in the probabilities of incurring Type I and Type II errors
(respectively, α and β).
Two commonly used statistical inference tests in simulation modeling are the Chi-square
and Kolmogorov-Smirnov tests.
Exercise. Do some research and find out how the Chi-square and the Kolmogorov-Smirnov tests are performed.
6 Useful Probabilistic and Statistical Models
6.1 Stochastic Processes
A stochastic process takes place in a system when the state of the system changes with
time in a random manner. Many if not most natural and/or human-made processes are
stochastic processes, although in some cases the random aspects can be neglected.
6.2 Poisson Process
Often one is interested in the number of events which occur over a certain interval of time,
i.e. a counting process (N (t), t ≥ 0). A counting process is a Poisson process if it
involves
• One arrival at a time.
• Random arrivals without rush or slack periods (stationary increments).
• Independent increments.
Under these circumstances, the probability that N (t) = n for t ≥ 0 and n = 0, 1, 2, ... is
P(N(t) = n) = exp(−λt)(λt)^n / n!
This means that N (t) has a Poisson distribution with parameter α = λt. Its mean and
variance are E(N (t)) = V (N (t)) = α = λt. It can be shown that if the number of arrivals
has a Poisson distribution, the inter arrival times have an exponential distribution.
The random splitting property of Poisson processes states that if a Poisson process
N(t) = N1(t) + N2(t) with rate λ is split at random into two branches N1 and N2, with each
arrival going to the first branch with probability p and to the second with probability 1 − p,
then N1 and N2 are independent Poisson processes with rates λp and λ(1 − p). Conversely,
pooling two independent Poisson processes N1(t) and N2(t) yields a Poisson process
N(t) = N1(t) + N2(t) (random pooling property).
6.3 Markov Chains and the Kolmogorov Balance Equations
If the future probability characteristics of a system in which a stochastic process is taking
place depend only on the state of the system at the current time, one has a Markov process
or chain. The effect of the past on the future is contained in the present state of the system.
As a simple example of a Markov chain consider a machine that works until it fails
(randomly) and then resumes work once it is repaired. There are two states for this system,
namely
• The machine is busy (S0 )
• The machine is being repaired (S1 )
The system moves from state S0 to S1 at a rate λ and from S1 back to S0 at a rate µ.
Exercise. Make a graph representing the above Markov chain.
As a second example consider now a facility where two machines A and B perform an
operation. The machines fail randomly but resume work once they are repaired. The four
possible states of this system are
• Both machines are busy (S0 )
• Machine A is being repaired while B is busy (S1 )
• Machine B is being repaired while A is busy (S2 )
• Both machines are being repaired (S3 )
Now λ1 and λ2 are, respectively, the failure rates of machines A and B while µ1 and µ2 are
the corresponding repair rates.
Exercise. Make a graph representing the above Markov chain.
The Kolmogorov Balance Equations are differential equations relating the probabilities of the various states involved in a Markov chain P0 , P1 , P2 and P3 . They are obtained
by a probability balance on the states. For the second example above they are
dP0/dt = µ1 P1 + µ2 P2 − (λ1 + λ2)P0

dP1/dt = λ1 P0 + µ2 P3 − (µ1 + λ2)P1

dP2/dt = λ2 P0 + µ1 P3 − (λ1 + µ2)P2

dP3/dt = λ2 P1 + λ1 P2 − (µ1 + µ2)P3
Under steady state or equilibrium conditions the time derivatives are zero and the probabilities are then related by a system of simultaneous linear algebraic equations.
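At steady state the four balance equations (with one of them replaced by the normalization
condition P0 + P1 + P2 + P3 = 1) form a linear system that is easy to solve numerically. The
sketch below (Python with numpy) uses assumed failure and repair rates chosen only to make
the example runnable.

# Steady-state probabilities for the two-machine example; the rates are
# assumed values chosen only to make the sketch runnable.
import numpy as np

lam1, lam2 = 0.1, 0.2   # failure rates of machines A and B
mu1, mu2 = 1.0, 0.8     # repair rates of machines A and B

# Rows 1-3: balance equations for S0, S1, S2; last row: P0 + P1 + P2 + P3 = 1.
A = np.array([
    [-(lam1 + lam2), mu1,           mu2,           0.0],
    [lam1,           -(mu1 + lam2), 0.0,           mu2],
    [lam2,           0.0,           -(lam1 + mu2), mu1],
    [1.0,            1.0,           1.0,           1.0],
])
b = np.array([0.0, 0.0, 0.0, 1.0])

P = np.linalg.solve(A, b)
print("P0, P1, P2, P3 =", P)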
6.4 Queueing Systems and Little's Formula
A queueing system involves one or more servers which provide some service to customers
who arrive, line up and wait for service at a queue when all the servers are busy. Typically,
both arrival and service times are random variables. The single server queue consists of a
single server and a single queue. If the inter arrival times of customers and the service times
are exponentially distributed the resulting queue is known as the M/M/1 queue.
Inter arrival and service times in queues are often modeled probabilistically. Two
examples of queueing systems are:
• Inter arrival times of mechanics at a centralized tool crib.
• Number of mechanics arriving at a centralized tool crib per time period.
Random inter arrival and service times are often simulated using exponential distributions. However, sometimes a normal distribution or a truncated normal distribution may
be more appropriate. Gamma and Weibull distributions are also used.
An important parameter of the queueing system is the server utilization ρ given by
ρ = λ/µ
where λ is the mean arrival rate of customers from the outside world into the queueing
system and µ is the mean service rate.
The single server queue can also be regarded as a Markov chain in which the various
states are distinguished only by the number of customers waiting in the queue. Let us call
the corresponding states S0 , S1 , ..., Sn . The system can then move into state Si either from
Si−1 (if a new customer arrives before service is completed for the customer being served)
or from Si+1 if service is completed and the next customer in line begins service before any
new arrival. Let λi,j be the rate at which the system transitions from state Si to state Sj .
Exercise. Make a graph representing the Markov chain for the single teller queue.
If the queue is at steady state, the Kolmogorov equations yield
Pn = [λ0,1 λ1,2 · · · λn−1,n / (λ1,0 λ2,1 · · · λn,n−1)] P0
where Pn is the probability of encountering n customers in the system and λ0,1 = λ.
Exercise. Derive the above expression.
In investigating queueing systems one is interested in performance measures such
as the expected number of customers in the system L, the expected number of
customers in the queue Lq , the expected wait time of customers in the system W ,
and the expected wait time of customers in the queue Wq. These expectations
are related by Little's Formula. The formula simply states that
L = λW
or that
Lq = λWq
Exercise. Derive the above expression relating L and W .
A number of queueing problems have been solved yielding closed form expressions for
the above performance parameters. For instance, for the M/M/1 queue at steady state the
results are as follows:

L  = ρ/(1 − ρ)       = λ/(µ − λ)
W  = 1/(µ(1 − ρ))    = 1/(µ − λ)
Lq = ρ²/(1 − ρ)      = λ²/(µ(µ − λ))
Wq = ρ/(µ(1 − ρ))    = λ/(µ(µ − λ))
Pn = (1 − λ/µ)(λ/µ)^n
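The closed-form M/M/1 results are straightforward to evaluate. The sketch below uses
assumed arrival and service rates (λ < µ so the queue is stable) and also verifies Little's
formula L = λW numerically.

# M/M/1 steady-state performance measures; lam and mu are assumed rates
# (lam < mu so that rho < 1 and the queue is stable).
lam, mu = 2.0, 5.0
rho = lam / mu

L  = rho / (1.0 - rho)            # = lam / (mu - lam)
W  = 1.0 / (mu * (1.0 - rho))     # = 1 / (mu - lam)
Lq = rho**2 / (1.0 - rho)         # = lam**2 / (mu * (mu - lam))
Wq = rho / (mu * (1.0 - rho))     # = lam / (mu * (mu - lam))

def Pn(n):
    """Probability of finding n customers in the system."""
    return (1.0 - rho) * rho**n

print("rho =", rho, "L =", L, "W =", W, "Lq =", Lq, "Wq =", Wq)
print("Little's formula check: L =", L, " lam*W =", lam * W)
print("P0, P1, P2 =", Pn(0), Pn(1), Pn(2))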
Further, for the M/G/1 queue in which the service times have a mean of 1/µ and a
variance σ² the corresponding results are:

L  = ρ + ρ²(1 + σ²µ²)/(2(1 − ρ)) = ρ + λ²(1/µ² + σ²)/(2(1 − ρ))
W  = 1/µ + λ(1/µ² + σ²)/(2(1 − ρ))
Lq = ρ²(1 + σ²µ²)/(2(1 − ρ))     = λ²(1/µ² + σ²)/(2(1 − ρ))
Wq = λ(1/µ² + σ²)/(2(1 − ρ))
P0 = 1 − ρ
7 Matching Data with Distributions
For the sake of computational convenience in reliability analysis and modeling, raw data
that are known to consist of independent and identically distributed (i.i.d.) entries are fitted
to a theoretical distribution function. This is nowadays easily done using programs such as
Stat::Fit or Expert.Fit.
Time to failure of complex components is usually represented with the Weibull distribution. If failures are completely random, the exponential distribution is used but if failure
times fluctuate equally around a mean, the normal distribution may be more appropriate.
The lognormal distribution can also be used. For incomplete data uniform, triangular
and beta distributions are used.
Data must then be tested for independence. Some useful tools are:
• Scatter Plots. Contiguous values in a string of values of a random variable are plotted
on an x–y plane. The resulting pattern of points reveals any dependence between successive values.
• Autocorrelation Plots. The covariance of values separated by a specified lag in a string
of values of a random variable is plotted as a function of the lag.
• Runs Tests. This searches for peculiar patterns in substrings of numbers from a larger
stream.
Data must also be tested to see if they are Identically Distributed (Homogeneity Tests).
Some useful tools are:
• Histograms.
• Distribution Plots.
• Quantile-Quantile Plots.
• Kolmogorov-Smirnov.
• Chi-squared.
• Time Dependency of Distributions.
• ANOVA.
The collected data typically consist of a limited number of data values. Simulation
modeling requires large numbers of samples; therefore the data must be converted to
a frequency distribution.
Raw data can be used as input for the simulation project but this is usually not recommended
except in special cases. More commonly, once data have been tested for independence
and correlation they are converted to a form suitable for use in the simulation model. This is
done by fitting them to some distribution. Once a distribution fitting the data has been
determined, input for the simulation program is produced as random variates sampled from the
fitted distribution. The frequency distribution selected can be empirical or theoretical, discrete
or continuous. Discrete distributions are rarely used directly; instead, numerical values
of discrete probabilities are used. In effect, continuous theoretical distributions
are almost always employed.
Of the many available theoretical distributions 12 or so are commonly used in simulation
modeling. Data are fitted to theoretical distributions by identifying the theoretical distribution which best represents the data. Stat::Fit provides a ranking of distributions fitting
a particular data set together with a goodness of fit (Chi-squared or Kolmogorov-Smirnov)
diagnostic. Note also that if the fitted distribution is unbounded, values for simulation should
instead be taken from a truncated version of the selected distribution in order to avoid
unrealistic extreme values.
7.1 Physical Basis of Common Distributions
Each statistical distribution function has a physical basis. An understanding of this basis
is useful in determining candidate distributions to represent field data. Following is a brief
summary of the physical basis of selected distributions.
• Binomial. This represents the distribution of a random variable giving the number of
successes in n independent trials each yielding either success or failure with probabilities
p and 1 − p, respectively.
• Geometric. This represents the distribution of a random variable giving the number
of independent trials required in an experiment to achieve the first success.
• Poisson. This represents the distribution of a random variable giving the number of
independent events occurring within a fixed amount of time.
• Normal. This represents the distribution of a random variable which is itself the result
of the sum of component processes.
• Lognormal. This represents the distribution of a random variable which is itself the
result of the product of component processes.
• Exponential. This represents the distribution of a random variable giving the time
interval between independent events.
• Gamma. A distribution of broad applicability restricted to non-negative random
variables.
• Beta. A distribution of broad applicability restricted to bounded random variables.
• Erlang. This represents the distribution of a random variable which is itself the result
of the sum of exponential component processes.
• Weibull. This represents the distribution of a random variable giving the time to
failure of a component.
• Uniform. This represents the distribution of a random variable whose values are
completely uncertain.
• Triangular. This represents the distribution of a random variable for which only
minimum, most likely and maximum values are known.
7.2 Common Situations where Specific Distributions are Useful Representations of Collected Data
Input data for discrete event simulation (DES) models must often be created according to a specific statistical distribution. The required distribution must be identified based on how well it represents the
collected data. Following is a brief summary of the real-life situations where the distributions
mentioned above are likely to be encountered.
• Binomial. Useful when there are only two possible outcomes of an experiment which
is repeated multiple times.
• Geometric. Useful also when there are only two possible outcomes of an experiment
which is repeated multiple times.
• Poisson. Useful to represent the number of incoming customers or requests into a
system.
• Normal. Useful to represent the distribution of errors of all kinds.
• Lognormal. Useful for representation of times required to perform a given task or
accomplish a certain goal.
• Exponential. Useful to represent inter arrival times in all kinds of situations.
• Gamma. Useful also for representation of times required to perform a given task or
accomplish a certain goal but more general.
• Beta. Useful as a rough model in situations of ignorance and/or to represent the
proportion of non-conforming items in a set.
• Erlang. Useful to represent systems making simultaneous requests for attention from
a server.
• Weibull. Useful to represent the life and/or reliability of components.
• Uniform. Useful when one knows nothing about the system.
• Triangular. Useful when one knows little about the system.
7.3 Short-Cut Methods for Distribution Identification
In DES modeling the collected input data is often replaced by computer generated random
variate values which accurately represent the original data. Typically, several distributions
will be considered good candidates. A histogram of the data can provide a first inkling
about the family of distribution function(s) which can well represent the data.
Another simple test which can be used to quickly determine whether a given set of data are
adequately represented by a specific distribution is the construction of quantile-quantile
plots.
Assume that X is a random variable whose cumulative distribution function is F. The
q-quantile of X is that value γ of the random variable which satisfies the equation

F(γ) = P(X ≤ γ) = q

If n data values are arranged in increasing order, the j(n + 1)/k-largest value will be
denoted by g_{j/k}. Therefore, the median is g_{1/2}, i.e. the (n + 1)/2-th value, half-way
through the data set.
A common application of this concept is in investigating the distribution of income in
a population where the total number of households is divided into five quintiles (q =
0.2, 0.4, 0.6, 0.8 and 1.0) by increasing values of income. Specifically, here in the USA, if your
household income is more than about γ = 80,000 dollars per year then you belong in the
top quintile. One in five households is in that quintile.
Consider a collection of n values of the random variable X, xi for i = 1, 2, ..., n. If the
values are arranged according to their magnitude a new string of values is obtained which
we call yj with j = 1, 2, ..., n. The ordered value yj immediately becomes an estimate for the
(j − 1/2)/n quantile of X, i.e.

yj ≈ F⁻¹((j − 1/2)/n)
Once an appropriate family of distributions has been selected one proceeds to determine
the various distribution parameters using the collected data values. Following is a summary
of distribution parameters and their estimators for three commonly used distributions.
• Poisson. Parameter: E(X) = α. Estimator: sample mean.
• Exponential. Parameter: λ (with E(X) = 1/λ). Estimator: reciprocal of the sample mean.
• Normal. Parameters: µ and σ 2 . Estimators: sample mean and variance.
7.4 Goodness of Fit Testing
To determine the appropriateness of a given distribution in a particular situation, goodness-of-fit
tests are required. The tests verify the validity of the null hypothesis H0 which
states that the random variable X follows a specific distribution.
In this section we examine two tests commonly used for this purpose, the Chi-square
test (applicable to large samples) and the Kolmogorov-Smirnov test (applicable
to small samples and restricted to continuous distributions).
For the Chi-square test the n data points are arranged into a desired number of cells (k).
The expected number of points to fall inside the i-th cell, Ei is then given by
Ei = npi
where pi is the probability associated with that interval and is obtained from the specified
distribution.
For instance, consider the case of reliability data consisting of a total of nf failures,
binned into cells representing number of failures within time intervals of uniform duration
∆ti = ti+1 − ti . Assume that direct calculation of the failure rate (hazard function value)
yields a reasonably constant trend and that an average value is computed. Introduce then
the null hypothesis that the data is exponentially distributed with constant failure rate λ̂
estimated as the average failure rate computed from the data. The expected number of
failures inside each time bin is then given by
Ei = nf × [exp(−λ̂ti ) − exp(−λ̂ti+1 )]
Next, using the actual number of data points contained in each cell, Oi, one computes
the statistic

χ0² = Σ_{i=1}^{k} (Oi − Ei)²/Ei

To test the null hypothesis, the critical value of the statistic is determined as χ²_{α,k−s−1},
where α is the significance level, k − s − 1 is the number of degrees of freedom, and s is
the number of parameters in the candidate distribution (s = 1 in the case of the exponential
distribution). Finally, if χ0² > χ²_{α,k−s−1} then H0 is rejected, but if χ0² < χ²_{α,k−s−1} the
hypothesis cannot be rejected at the given significance level.
If the null hypothesis cannot be rejected one can then calculate a confidence interval for
the distribution parameters. For instance, considering again the case of the exponentially
distributed reliability data above, one can show that a 100(1 − α)% confidence interval for
the value of λ̂ is given by
[λ̂ ×
χ2 (2nf , α/2)
χ20 (2nf , 1 − α/2)
, λ̂ × 0
]
2nf
2nf
20
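A hedged sketch of the Chi-square procedure just described is given below (Python, standard
library only); the failure times are synthetic, and the bin width and rate estimate are arbitrary
choices, so the output only illustrates the mechanics rather than a real analysis.

# Chi-square goodness-of-fit check of exponentially distributed failure data;
# failure times, bin width and significance level are assumed for illustration.
import math
import random

random.seed(3)
times = sorted(random.expovariate(0.01) for _ in range(100))   # synthetic failure times
nf = len(times)
lam_hat = 1.0 / (sum(times) / nf)       # estimated constant failure rate

# Bin edges of uniform width covering the data.
width = 50.0
k = int(math.ceil(max(times) / width))
edges = [i * width for i in range(k + 1)]

chi2_0 = 0.0
for i in range(k):
    t_lo, t_hi = edges[i], edges[i + 1]
    O_i = sum(t_lo <= t < t_hi for t in times)                           # observed count
    E_i = nf * (math.exp(-lam_hat * t_lo) - math.exp(-lam_hat * t_hi))   # expected count
    if E_i > 0.0:
        chi2_0 += (O_i - E_i) ** 2 / E_i

print("chi-square statistic =", chi2_0, "with", k - 1 - 1, "degrees of freedom")
# The statistic would then be compared against the tabulated critical value
# chi^2_{alpha, k-s-1} (s = 1 for the exponential) at the chosen significance level.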
For the Kolmogorov-Smirnov test the n data points are also arranged in increasing order.
If possible the data are made dimensionless by dividing each value by the largest value in the
set. Then, one calculates the statistics

D+ = max_i (i/n − Ri)

D− = max_i (Ri − (i − 1)/n)

and

D = max(D+, D−)
and compares D against the critical value Dc . When D < Dc the null hypothesis H0 cannot
be rejected.
7.5 Input in the Absence of Data
Sometimes input data for DES models is just not easily available. In those instances one must
rely on related engineering data, expert opinion and physical or other limitations to produce
reasonable input values for the model. A few data values combined with the assumption of
uniform, triangular or beta distribution can provide a solid starting point for research.
7.6 Correlated Input Data
In some situations various inputs may be related to each other or the same input quantity
may exhibit autocorrelation over time. Typical examples are in inventory modeling where
demand data affect lead time data and in stock trading where buy and sell orders called to
the broker tend to arrive in bursts.
When two correlated input variables X1 and X2 are involved one uses their covariance
Cov(X1 , X2 ) or their correlation
ρ = Cov(X1, X2)/(σ1 σ2)
If the collected data for the two variables yield a value of ρ appreciably different from zero,
then one needs to generate correlated variates.
The following algorithm generates two correlated random variates with normal distributions with parameters µ1 , σ1 and µ2 , σ2 , respectively.
• Generate two independent standard normal variates Z1 and Z2 .
• Set X1 = µ1 + σ1 Z1
• Set X2 = µ2 + σ2 (ρ Z1 + √(1 − ρ²) Z2)
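The three-step algorithm above translates directly into code. A minimal sketch is shown
below; the means, standard deviations and correlation are assumed values used only for
illustration.

# Generation of a pair of correlated normal variates following the algorithm
# above; the parameter values are assumed for illustration.
import math
import random

random.seed(11)
mu1, sigma1 = 10.0, 2.0
mu2, sigma2 = 5.0, 1.0
rho = 0.7                              # desired correlation between X1 and X2

def correlated_pair():
    z1 = random.gauss(0.0, 1.0)        # independent standard normal variates
    z2 = random.gauss(0.0, 1.0)
    x1 = mu1 + sigma1 * z1
    x2 = mu2 + sigma2 * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
    return x1, x2

for x1, x2 in (correlated_pair() for _ in range(5)):
    print(f"X1 = {x1:6.2f}   X2 = {x2:6.2f}")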
If data correspond to a time series of values of a single variable X1, X2, X3, ... all
from the same distribution then one uses the lag-h autocovariance Cov(Xi, Xi+h) or the
lag-h correlation

ρh = Cov(Xi, Xi+h)/(σi σi+h)
Autoregressive order-1 (AR(1)) and exponential autoregressive order-1 (EAR(1)) models
are commonly used to generate autocorrelated time series.
The algorithm for the AR(1) model is as follows:
• Using the collected data, determine the values of the parameters µ ≈ X̄, φ = Cov(Xt, Xt+1)/S²
(the lag-1 autocorrelation) and σε² = S²(1 − φ²).
• Generate εt from a normal distribution with mean 0 and variance σε².
• Generate X1 from a normal distribution with mean µ and variance σε²/(1 − φ²).
• Set Xt = µ + φ(Xt−1 − µ) + εt.
• Repeat.
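A minimal sketch of the AR(1) recursion just listed is shown below; the values of µ, φ and the
innovation standard deviation are assumed rather than estimated from collected data.

# AR(1) time-series generation following the algorithm above; mu, phi and the
# innovation standard deviation are assumed values.
import math
import random

random.seed(5)
mu, phi = 20.0, 0.6
sigma_eps = 3.0                                              # std. dev. of the innovations

series = []
x = random.gauss(mu, sigma_eps / math.sqrt(1.0 - phi**2))    # X1 from the stationary distribution
series.append(x)
for _ in range(9):
    eps = random.gauss(0.0, sigma_eps)                       # epsilon_t ~ N(0, sigma_eps^2)
    x = mu + phi * (x - mu) + eps                            # X_t = mu + phi(X_{t-1} - mu) + eps_t
    series.append(x)

print(series)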
The algorithm for the EAR(1) model is as follows
• Using the collected data, determine the values of the parameters λ ≈ 1/X̄ and φ =
Cov(Xt , Xt+1 )/S 2 (lag-1 autocorrelation).
• Generate X1 from an exponential with mean 1/λ.
• Generate U from a uniform [0,1].
• If U ≤ φ set Xt = φXt−1 .
• If U > φ generate εt from an exponential with mean 1/λ and set Xt = φXt−1 + εt.
• Repeat.
8 Generation of Random Numbers and Pseudo-Random Numbers
Recall that for a random variable X which is uniformly distributed in [0, 1] the uniform
probability density function is
f(x) = 1   for 0 ≤ x ≤ 1,  and f(x) = 0 otherwise

while its cumulative distribution function is

F(x) = 0 for x < 0,  F(x) = x for 0 ≤ x < 1,  and F(x) = 1 for x ≥ 1
A random number (RN) stream is a collection of uniformly distributed random variables.
A truly random stream of numbers has the following characteristics:
• Uniformly distributed.
• Continuous-valued.
• E(R) = 1/2.
• σ² = 1/12.
• No autocorrelation between numbers.
• No runs.
In practice one always works with streams of pseudo random numbers (PRN). These
have approximately the same characteristics as RN’s. PRN’s are generated with a computer
using a numerical algorithm embedded in a computer program or routine. The requirements
of a good PRNG routine are:
• Fast.
• Portable.
• Long Cycle.
• Replicability.
• Produce PRN with the desired characteristics.
8.1 The Linear Congruential Method
The established algorithm for PRN generation is the linear congruential method (LCM).
More sophisticated approaches still use this method as their foundation. The fundamental
relationship of the LCM is

Xi+1 = (aXi + c) mod (m)

This means that the value of Xi+1 is the remainder left from integer division of aXi + c by
m. Note that the random numbers Ri = Xi/m obtained from the LCM are from the set
I = {0, 1/m, 2/m, ..., (m − 1)/m}.
One key feature of the method is its period (P ) (the number of numbers that can be
generated before the same number appears twice). The period is related to the values of m
and c as follows:
• If m = 2^b and c > 0, P = m = 2^b.
• If m = 2^b and c = 0, P = m/4 = 2^(b−2).
• If m is prime (e.g. m = 2^b − 1) and c = 0, P = m − 1.
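The recursion Xi+1 = (aXi + c) mod m is a one-liner in code. The sketch below uses one
commonly quoted illustrative set of constants (a = 1664525, c = 1013904223, m = 2^32),
chosen only to make the example concrete, not as a recommendation.

# Linear congruential generator X_{i+1} = (a X_i + c) mod m; the constants are
# one commonly quoted illustrative choice, not a recommendation.
a, c, m = 1664525, 1013904223, 2**32

def lcg(seed, count):
    """Return `count` pseudo-random numbers R_i = X_i / m in [0, 1)."""
    x = seed
    numbers = []
    for _ in range(count):
        x = (a * x + c) % m
        numbers.append(x / m)
    return numbers

print(lcg(seed=12345, count=5))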
8.2 The Combined Linear Congruential Method
Large simulations require large collections of PRNs and there is a need for still longer periods.
These can be obtained by the use of combined linear congruential methods (CLCM).
The fundamental theorem associated with the CLCM is L'Ecuyer's:
If Wi,1, Wi,2, ..., Wi,k are independent discrete-valued random variables with at least one
of them (say Wi,1) being uniformly distributed between 0 and m1 − 2, then

Wi = (Σ_{j=1}^{k} Wi,j) mod (m1 − 1)

is a uniformly distributed random variable between 0 and m1 − 2.
More specifically, consider the following algorithm

Xi = (Σ_{j=1}^{k} (−1)^(j−1) Xi,j) mod (m1 − 1)

where the Xi,j are produced by individual linear congruential generators, and with

Ri = Xi/m1          for Xi > 0
Ri = (m1 − 1)/m1    for Xi = 0
It can be shown that the maximum period obtained with this algorithm is

P = (m1 − 1)(m2 − 1) · · · (mk − 1) / 2^(k−1)
Example. L'Ecuyer proposed the following CLCM:

X1,j+1 = 40014 X1,j mod (2147483563)

X2,j+1 = 40692 X2,j mod (2147483399)

which produce the combined PRN

Xj+1 = (X1,j+1 − X2,j+1) mod (2147483562)

to yield

Rj+1 = Xj+1/2147483563           for Xj+1 > 0
Rj+1 = 2147483562/2147483563     for Xj+1 = 0

9 Tests for Random Numbers
Since one always works in practice with PRN streams it is necessary to check how close their
characteristics are to those of real RN streams. Assume a stream containing N PRN's has
been produced. To verify their characteristics the stream is subjected to various tests. In all
cases, one states a hypothesis about a given characteristic of the stream and then accepts
it or rejects it with a given level of significance α, where

α = P(rejecting H0 | H0 is true)

(i.e. the probability of a Type I error).
In testing for uniformity the null hypothesis H0 is

Ri ∈ U[0, 1]

while the alternative hypothesis H1 is

Ri ∉ U[0, 1]

In testing for independence the null hypothesis H0 is

the Ri are independent

while the alternative hypothesis H1 is

the Ri are not independent
9.1 Kolmogorov-Smirnov Frequency test
For this test the numbers are first arranged in increasing order
R1 < R2 < ... < RN
The test makes use of the new variables

D+ = max_i (i/N − Ri)

D− = max_i (Ri − (i − 1)/N)

and

D = max(D+, D−)
Once D has been computed, a critical value Dc is obtained from the K-S statistical
table for the desired α and the given N . Finally
• If D > Dc , H0 is rejected (H1 is accepted).
• If D ≤ Dc , H0 is not rejected (i.e. the numbers are uniformly distributed).
9.2 Chi-square Frequency test
In this test the numbers are arranged into n classes by subdividing the range [0, 1] into n
subintervals and determining how many of the numbers end up in each class i, (Oi ).
The test uses the statistic

χ0² = Σ_{i=1}^{n} (Oi − Ei)²/Ei
where Ei = N/n is the expected number of numbers in each class for a uniform distribution.
Once χ0² has been computed, a critical value χ²_{α,n−1} is obtained from the Chi-square
statistical table. Finally
• If χ0² > χ²_{α,n−1}, H0 is rejected (H1 is accepted).
• If χ0² ≤ χ²_{α,n−1}, H0 is not rejected (i.e. the numbers are uniformly distributed).
9.3 Runs Test
This test aims to detect whether there are patterns in substrings of the stream. One examines
the stream and checks whether each number is followed by a larger (+) or a smaller (−)
number. Runs are the resulting patterns of +’s and −’s. In a truly random sequence the
mean and variance of the number of up and down runs a are given by

µa = (2N − 1)/3

and

σa² = (16N − 29)/90

When N > 20 the distribution of a is close to normal so the test statistic is

Z0 = (a − µa)/σa
which has the normal distribution of mean zero and unit standard deviation (N (0, 1)).
Once Z0 has been computed a critical value zα/2 is obtained from the normal statistical
table. Finally
• If Z0 < −zα/2 or Z0 > zα/2 , H0 is rejected (H1 is accepted).
• If −zα/2 ≤ Z0 ≤ zα/2 , H0 is not rejected (i.e. the numbers are independent).
Other types of runs tests are also possible, for instance runs above and below the mean
and run lengths. For runs above and below the mean a test similar to the one above is used
but with the values of mean and variance for the number of runs b
µb = 2n1n2/N + 1/2

and

σb² = 2n1n2(2n1n2 − N)/(N²(N − 1))

where n1 and n2 are, respectively, the numbers of observations above and below the mean.
For run lengths one uses the Chi-square test to compare the observed number of runs
of given lengths against the expected number obtained in a truly independent stream.
9.4 Autocorrelation Test
This test aims to detect correlation among numbers in the stream separated by a specified
number of positions (the lag). Consider the autocorrelation test for a lag m starting at
position i. One then investigates the behavior of the numbers Ri, Ri+m, Ri+2m, .... If the
autocorrelation ρim > 0 there is positive correlation (i.e. high numbers follow high numbers
and vice versa) and if ρim < 0 one has negative correlation. The autocorrelation is estimated by

ρ̂im = [1/(M + 1)] Σ_{k=0}^{M} [Ri+km Ri+(k+1)m] − 0.25

where M is the largest integer satisfying i + (M + 1)m ≤ N. The test statistic is in this case
given by

Z0 = ρ̂im / σρ̂im

where

σρ̂im = √(13M + 7) / (12(M + 1))
Once Z0 has been computed a critical value zα/2 is obtained from the normal statistical
table. Finally
• If Z0 < −zα/2 or Z0 > zα/2 , H0 is rejected (H1 is accepted).
• If −zα/2 ≤ Z0 ≤ zα/2 , H0 is not rejected (i.e. the numbers are independent).
9.5 Gap Test
This test checks for independence by examining the pattern of gaps between successive
occurrences of a given digit in the stream. The test is performed using the Kolmogorov-Smirnov scheme.
9.6 Poker Test
This test checks for independence based on the repetition of certain digits in the sequence.
The test is performed using the Chi-square scheme.
10 Generation of Random Variates
Discrete event simulation models require as inputs the values of random variables with
specified probability distributions. Such random variables are called random variates.
Input data for DES models are collected from the field and/or produced from best available
estimates. However, the amount of data collected is rarely enough to run simulation models
and one must use the data to create PRN streams with statistical characteristics similar to
those of the original data.
So, on the one hand one needs to identify the statistical characteristics of the original
data and on the other one must be able to produce large collections of random variates
with statistical characteristics similar to those of the original data. Here we focus on the
second aspect, namely once we have determined the probability distribution applicable to
our data we proceed to generate random variate streams for use in the simulation. This is
accomplished by the inverse transform method.
10.1 The Inverse Transform Method
Given a random (or pseudo-random) number R and a random variate X,
• Determine the cumulative distribution function of X, F (X).
• Set F (X) = R.
• Solve the equation F (X) = R for X in terms of R, i.e. X = F −1 (R).
• Repeat the above for the stream of random (or pseudo-random) numbers R1 , R2 , ..., Rn
to obtain the stream of random variates X1 , X2 , ..., Xn .
Next, the formulae obtained by the inverse transform method for several commonly used
random variates are given.
10.2 Inverse Transform for the Exponential Distribution
Following are the specific steps required to obtain exponentially distributed random variates
with rate parameter λ (mean 1/λ) from a random number stream using the inverse transform method.
• F(x) = 1 − e^{−λx}.
• Set F(X) = 1 − e^{−λX} = R.
• X = −(1/λ) ln(1 − R).
• For i = 1, 2, ..., n, compute Xi = −(1/λ) ln(1 − Ri).
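The last step above is all that is needed in code. The sketch below assumes λ = 0.5 and draws
the uniform numbers from Python's built-in generator; in a simulation they would come from
the PRN stream discussed in section 8.

# Exponential random variates by the inverse transform X_i = -(1/lambda) ln(1 - R_i);
# the rate and the source of uniform numbers are assumed for illustration.
import math
import random

random.seed(2)
lam = 0.5                                    # assumed rate (mean 1/lam = 2)
R = [random.random() for _ in range(5)]      # uniform(0, 1) pseudo-random numbers

X = [-math.log(1.0 - r) / lam for r in R]
print(X)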
10.3 Inverse Transform for the Uniform Distribution
Following are the specific steps required to obtain uniformly distributed random variates
between a and b from a random number stream using the inverse transform method.
• F(x) = (x − a)/(b − a).
• Set F(X) = (X − a)/(b − a) = R.
• X = a + (b − a)R.
• For i = 1, 2, ..., n, compute Xi = a + (b − a)Ri.
10.4 Inverse Transform for the Weibull Distribution
Following are the specific steps required to obtain Weibull distributed random variates with
parameters α and β from a random number stream using the inverse transform method.
• F(x) = 1 − e^{−(x/α)^β}.
• Set F(X) = 1 − e^{−(X/α)^β} = R.
• X = α[−ln(1 − R)]^{1/β}.
• For i = 1, 2, ..., n, compute Xi = α[−ln(1 − Ri)]^{1/β}.
10.5 Inverse Transform for the Triangular Distribution
Following are the specific steps required to obtain random variates with triangular distribution between 0 and 2 with mode 1 from a random number stream using the inverse transform
method.
• F(x) = 0                    for x ≤ 0
  F(x) = x²/2                 for 0 < x ≤ 1
  F(x) = 1 − (2 − x)²/2       for 1 < x ≤ 2
  F(x) = 1                    for x > 2
• Xi = √(2Ri)                 for 0 < Ri ≤ 1/2
  Xi = 2 − √(2(1 − Ri))       for 1/2 < Ri ≤ 1
10.6 Inverse Transform for Empirical Distributions
If no appropriate distribution can be found for the data one can resort to resampling the
data. This creates an empirical distribution. A simple empirical distribution can be
produced from given data by piecewise linear approximation.
Assume the available data points (observations) are arranged in increasing order x1 , x2 , ..., xn .
Assume also that a probability is assigned to each resulting range xj − xj−1 such that the
cumulative probability of the first j intervals is cj . The associated random variate is obtained
as
Xi = xj−1 + [(xj − xj−1)/(cj − cj−1)](Ri − cj−1)

when cj−1 < Ri ≤ cj.
10.7 Inverse and Direct Transforms for the Normal Distribution
The normal distribution does not have a closed-form inverse transformation. However, the
following expression is an excellent approximation to the inverse cumulative distribution
function of the standard normal distribution.
Xi ≈ [Ri^0.135 − (1 − Ri)^0.135] / 0.1975

From the above, random variates with a normal distribution of mean µ and standard
deviation σ are readily obtained as

Xi ≈ µ + σ [Ri^0.135 − (1 − Ri)^0.135] / 0.1975
A direct transformation can be used to produce two independent standard normal variates
Z1 and Z2 from two random numbers R1 and R2 according to
Z1 = (−2 ln R1)^{1/2} cos(2πR2)

and

Z2 = (−2 ln R1)^{1/2} sin(2πR2)

Normal random variates Xi with mean µ and standard deviation σ can then be obtained
from

Xi = µ + σZi
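A minimal sketch of this direct transformation is shown below; the mean and standard
deviation are assumed values, and the uniform numbers come from Python's built-in generator
purely for illustration.

# Direct transformation of two uniform random numbers into two independent
# standard normal variates, then rescaling; mu and sigma are assumed values.
import math
import random

random.seed(4)
mu, sigma = 100.0, 15.0

def two_normals():
    r1 = 1.0 - random.random()            # in (0, 1]; avoids log(0)
    r2 = random.random()
    z1 = math.sqrt(-2.0 * math.log(r1)) * math.cos(2.0 * math.pi * r2)
    z2 = math.sqrt(-2.0 * math.log(r1)) * math.sin(2.0 * math.pi * r2)
    return mu + sigma * z1, mu + sigma * z2

print(two_normals())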
10.8 Inverse and Direct Transforms for the Lognormal Distribution
If the random variable Y has the normal distribution with mean µ and variance σ 2 , the
associated random variable X = exp(Y ) has the lognormal distribution with parameters µ
and σ 2 .
Thus, random variates with a standard lognormal distribution can be generated from the
expression

Xi ≈ exp{[Ri^0.135 − (1 − Ri)^0.135] / 0.1975}

Random variates with a lognormal distribution of parameters µ and σ are then generated by

Xi ≈ exp{µ + σ [Ri^0.135 − (1 − Ri)^0.135] / 0.1975}

10.9 Inverse Transform for Discrete Distributions
A similar procedure to the one indicated above can be used to produce discretely distributed
random variates. Since the cumulative distribution functions for discrete distributions consist
of discrete jumps separated by horizontal plateaus, lookup tables are a convenient and very
efficient method of generating inverses.
10.10 Other Methods of Generating Random Variates
When two or more random variables are added together to produce a new random variable
with a desired distribution one is using the method of convolution.
If one generates the random variate by selectively accepting or rejecting numbers from a
random number stream one is using the acceptance-rejection technique.
Detailed descriptions of these two methods as well as examples can be found in your
textbook.