Basic Concepts of Probability and Statistics for Reliability Engineering

Ernesto Gutierrez-Miravete

Spring 2007

1 Introduction

1.1 Probability

Events and sample spaces are fundamental concepts in probability. A sample space $S$ is the set of all possible outcomes of an experiment whose outcome cannot be determined in advance, while an event $E$ is a subset of $S$. The probability of the event $E$, $P(E)$, is a number satisfying the following axioms:

$$0 \le P(E) \le 1$$

$$P(S) = 1$$

$$P\Big(\bigcup_i E_i\Big) = \sum_i P(E_i)$$

where the various $E_i$'s are mutually exclusive events.

One can associate with each occurrence in $S$ a numerical value. A random variable $X$ is a function assigning a real number to each member of $S$. Random variables can take on discrete or continuous values.

1.2 Discrete random variables

If $X$ is a random variable (i.e. a function defined over the elements of a sample space) with a finite or countably infinite number of possible values $x_i \in R_X$, $i = 1, 2, \ldots$, where $R_X$ is the range of values of the random variable, then it is a discrete random variable. The probability of $X$ having a specific value $x_i$, $p(x_i) = P(X = x_i)$, is a number such that

$$p(x_i) \ge 0 \quad \text{for every } i = 1, 2, \ldots, \qquad \sum_{i=1}^{\infty} p(x_i) = 1$$

The collection of pairs $(x_i, p(x_i))$, $i = 1, 2, \ldots$ is called the probability distribution of $X$, and $p(x_i)$ is the probability mass function of $X$. Two examples of discrete random variables are:

• The number of jobs arriving at a job shop each week.
• The outcome of tossing a loaded die.

1.3 Continuous random variables

If $R_X$ is an interval rather than a discrete set then $X$ is a continuous random variable. The probability that $X \in [a, b]$ is

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

where $f(x)$ is the probability density function of $X$ satisfying, for all $x \in R_X$,

$$f(x) \ge 0, \qquad \int_{R_X} f(x)\,dx = 1$$

and $f(x) = 0$ if $x \notin R_X$. Two examples of continuous random variables are:

• The life of a device.
• Temperature readings in a turbulent flow field.

1.4 Cumulative distribution function

The probability that $X \le x$, $P(X \le x) = F(x)$, is the cumulative distribution function. The CDF is defined as

$$F(x) = \sum_{x_i \le x} p(x_i)$$

for discrete $X$ and as

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

for continuous $X$. Note that if $a < b$ then $F(a) \le F(b)$, $\lim_{x \to \infty} F(x) = 1$, $\lim_{x \to -\infty} F(x) = 0$, and $P(a \le X \le b) = F(b) - F(a)$.

Exercise. Determine the probabilities of the various outcomes in tossing a loaded die, and also the probability that a device has a certain life.

1.5 Expectation and Moment Generating Function

The expected value of a random variable $X$, the expectation of $X$, is

$$E(X) = \sum_{i} x_i p(x_i)$$

for discrete $X$ and

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

for continuous $X$. $E(X)$ is also called the mean or the first moment of $X$. Generalizing, the $n$th moment of $X$ is

$$E(X^n) = \sum_{i} x_i^n p(x_i)$$

for discrete $X$ and

$$E(X^n) = \int_{-\infty}^{\infty} x^n f(x)\,dx$$

for continuous $X$.

The moment generating function of a random variable $X$ can be defined as

$$\psi(t) = E(e^{tX}) = \int e^{tx}\,dF(x)$$

Moments of all orders for $X$ are obtained as the derivatives of $\psi$ evaluated at $t = 0$. When the moment generating function exists, it uniquely determines the distribution of $X$.

The variance of $X$, $V(X) = \mathrm{var}(X) = \sigma^2$, is

$$\sigma^2 = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2$$

The standard deviation of $X$ is $\sigma = \sqrt{\sigma^2}$. The third and fourth moments of a distribution are associated with its skewness and its kurtosis, respectively.

Exercises. Determine the expectations of the various outcomes in tossing a loaded die and that of a certain device having a certain life.
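To make the expectation and variance formulas concrete, here is a minimal Python sketch for the loaded-die exercise above; the probability mass function values are hypothetical.

# A minimal sketch for the loaded-die exercise: mean and variance of a
# discrete random variable from its probability mass function. The pmf
# values below are made up for illustration.
pmf = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.2, 6: 0.4}  # sums to 1

mean = sum(x * p for x, p in pmf.items())              # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean**2                     # E(X^2) - (E(X))^2

print(mean, variance)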
Another important statistic is the covariance of two random variables $X$ and $Y$, $\mathrm{Cov}(X, Y)$. This is defined as

$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$$

If $\mathrm{Cov}(X, Y) = 0$ the variables are said to be uncorrelated. Further, the correlation coefficient $\rho(X, Y)$ is defined as

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{(\mathrm{var}(X)\,\mathrm{var}(Y))^{1/2}}$$

The conditional probability gives the probability that a random variable $X = x$ given that $Y = y$, and is defined as

$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$$

Exercise. In a population of $N$ people, $N_A$ are color blind, $N_H$ are female and $N_{AH}$ are color blind females. If a person chosen at random turns out to be female, what is the probability that she will also be color blind?

1.6 Law of Large Numbers and the Central Limit Theorem

The following limit theorems are of fundamental and practical importance. They are given here without proof.

The strong law of large numbers states that if the random variables $X_1, X_2, \ldots, X_n$ are independent and identically distributed (iid) with mean $\mu$ then

$$\lim_{n \to \infty} \frac{\sum_{i=1}^{n} X_i}{n} = \lim_{n \to \infty} \bar{X} = \mu$$

with probability 1. Furthermore, if the variance of the distribution of the $X_i$ above is $\sigma^2$, the central limit theorem states that

$$\lim_{n \to \infty} P\left[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le a\right] = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$$

In words, the theorem states that the distribution of the normalized random variable $(\bar{X} - \mu)/(\sigma/\sqrt{n})$ approaches the standard normal distribution, with mean 0 and standard deviation 1.

2 Discrete Distributions

2.1 Bernoulli distribution

Consider an experiment consisting of $n$ independent trials, each with two possible outcomes, success and failure. If $X_j = 1$ for a success and $X_j = 0$ for a failure, and the probability of success remains constant from trial to trial, the probability of success at the $j$th trial is given by the Bernoulli distribution as follows:

$$p_j(x_j) = p(x_j) = \begin{cases} p & x_j = 1,\ j = 1, 2, \ldots, n \\ 1 - p = q & x_j = 0,\ j = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$

Note that $E(X_j) = p$ and $V(X_j) = pq$. The outcome of tossing a fair coin $n$ times can be represented by a Bernoulli distribution with $p = q = \frac{1}{2}$.

2.2 Binomial distribution

The number of successes in $n$ Bernoulli trials is a random variable $X$ with the binomial distribution

$$p(x) = \begin{cases} \binom{n}{x} p^x q^{n-x} & x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$

where

$$\binom{n}{x} = \frac{n!}{x!(n - x)!}$$

Note that $E(X) = np$ and $V(X) = npq$. Consider as an example the following situation from quality control in chip manufacture: the probability of finding more than 2 nonconforming chips in a sample of 50 is

$$P(X > 2) = 1 - P(X \le 2) = 1 - \sum_{x=0}^{2} \binom{50}{x} p^x q^{50-x}$$

2.3 Geometric distribution

The number of trials required to achieve the first success is a random variable $X$ with the geometric distribution

$$p(x) = \begin{cases} q^{x-1} p & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

Note that $E(X) = 1/p$ and $V(X) = q/p^2$.

Exercise. In acceptance sampling one must determine, for example, the probability that the first acceptable item found is the third one inspected, given that 40% of items are rejected during inspection. Determine the values of $x$ and $q$ and find $p(x)$.

2.4 Poisson distribution

If $\alpha > 0$, the Poisson probability mass function is

$$p(x) = \begin{cases} \dfrac{e^{-\alpha}\alpha^x}{x!} & x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

Note that $E(X) = V(X) = \alpha$. The cumulative distribution function is

$$F(x) = \sum_{i=0}^{x} \frac{e^{-\alpha}\alpha^i}{i!}$$

Examples of Poisson distributed random variables include:

• The number of customers arriving at a bank.
• Beeper calls to an on-call service person.
• Lead time demand in inventory systems.
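As an illustration of the binomial chip-inspection example of Section 2.2, the following sketch evaluates $P(X > 2)$ directly from the probability mass function; the nonconformance probability $p = 0.02$ is an assumed value.

from math import comb

# Sketch of the chip-inspection example: probability of more than 2
# nonconforming chips in a sample of n = 50 when each chip is
# nonconforming with (assumed) probability p.
n, p = 50, 0.02
q = 1 - p
p_at_most_2 = sum(comb(n, x) * p**x * q**(n - x) for x in range(3))
print(1 - p_at_most_2)  # P(X > 2) = 1 - P(X <= 2)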
3 Continuous Distributions

3.1 Uniform distribution

For a random variable $X$ which is uniformly distributed in $[a, b]$, the uniform probability density function is

$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

while its cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x < b \\ 1 & x \ge b \end{cases}$$

Note that $P(x_1 < X < x_2) = F(x_2) - F(x_1) = \frac{x_2 - x_1}{b - a}$. Note also that $E(X) = \frac{a+b}{2}$ and $V(X) = \frac{(b-a)^2}{12}$.

Examples of uniformly distributed random variables could be:

• Interarrival times for calls seeking a forklift in warehouse operations.
• The wait time of a passenger at a bus stop served every five minutes.
• Readings from a table of random numbers.

3.2 Exponential distribution

If $\lambda > 0$, the exponential probability density function of $X$ is

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{elsewhere} \end{cases}$$

while its cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < 0 \\ \displaystyle\int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x} & x \ge 0 \end{cases}$$

Note that $E(X) = 1/\lambda$ and $V(X) = 1/\lambda^2$. Examples of exponentially distributed random variables include:

• Interarrival times of commercial aircraft at an airport.
• The life of a device.

The exponential distribution possesses the memoryless property, i.e. if $s \ge 0$ and $t \ge 0$ then $P(X > s + t \mid X > s) = P(X > t)$. Clearly, unless there is agreement beforehand, the time one person arrives at the bank is independent of the arrival time of the next person. Another example is the life of a used component which is as good as new. In the discrete case, the geometric distribution also possesses the memoryless property.

3.3 Gamma distribution

The gamma function of parameter $\beta > 0$, $\Gamma(\beta)$, is

$$\Gamma(\beta) = \int_0^{\infty} x^{\beta-1} e^{-x}\,dx$$

Note that $\Gamma(\beta) = (\beta - 1)\Gamma(\beta - 1)$, and that $\Gamma(\beta) = (\beta - 1)!$ when $\beta$ is a positive integer. A random variable $X$ has a gamma probability density function with shape parameter $\beta$ and scale parameter $\theta$ if

$$f(x) = \begin{cases} \dfrac{\beta\theta(\beta\theta x)^{\beta-1}}{\Gamma(\beta)} e^{-\beta\theta x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

The cumulative distribution function is

$$F(x) = \begin{cases} 1 - \displaystyle\int_x^{\infty} \frac{\beta\theta(\beta\theta t)^{\beta-1}}{\Gamma(\beta)} e^{-\beta\theta t}\,dt & x > 0 \\ 0 & x \le 0 \end{cases}$$

Note that $E(X) = 1/\theta$ and $V(X) = 1/(\beta\theta^2)$.

3.4 Erlang distribution

If above $\beta = k$, where $k$ is an integer, the Erlang distribution of order $k$ is obtained. The cumulative distribution function is

$$F(x) = \begin{cases} 1 - \displaystyle\sum_{i=0}^{k-1} \frac{e^{-k\theta x}(k\theta x)^i}{i!} & x > 0 \\ 0 & x \le 0 \end{cases}$$

Examples of gamma distributions occur for random variables associated with the reliability function and in the probability that a process consisting of several steps will have a given duration.

3.5 Normal distribution

A random variable $X$ with mean $\mu$ and variance $\sigma^2$ has a normal distribution ($X \sim N(\mu, \sigma^2)$) if its probability density function on $x \in (-\infty, \infty)$ is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

The cumulative distribution function is

$$F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2}\,dt$$

The standardized random variable $Z = (X - \mu)/\sigma$ has mean 0 and standard deviation 1. Its probability density function is

$$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$$

and the cumulative distribution function is

$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$$

Examples of normally distributed random variables abound. A few of them are:

• Time to perform a task.
• Time waiting in a queue.
• Lead time demand for an item.

3.6 Lognormal distribution

A random variable $X$ has a lognormal distribution ($X \sim LN(\theta, m, \sigma)$) if its probability density function on $x \in [0, \infty)$ is given by

$$f(x) = \frac{1}{(x - \theta)\sigma\sqrt{2\pi}} e^{-\left[\ln\left(\frac{x-\theta}{m}\right)\right]^2/(2\sigma^2)}$$

where $\theta$ is the location parameter (often $= 0$) and $m$ is the scale parameter. When $\theta = 0$ and $m = 1$ one has the standard lognormal distribution. The cumulative distribution function (for $\theta = 0$) is

$$F(x) = P(X \le x) = \Phi\left(\frac{\ln(x/m)}{\sigma}\right)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.

Because of its relation with the normal distribution of mean $\mu$ and variance $\sigma^2$, the probability density function of the lognormal distribution with location parameter $\theta = 0$ is sometimes expressed as

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-(\ln x - \mu)^2/(2\sigma^2)}$$

Here, $\mu$ and $\sigma^2$ are the mean and variance of the random variable's logarithm. The expected value (mean) of the lognormally distributed random variable is

$$E(X) = e^{\mu + \sigma^2/2}$$

and the variance is

$$\mathrm{var}(X) = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$$
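The lognormal moment formulas above can be checked by simulation. The following sketch compares them against sample statistics; $\mu$ and $\sigma$ are arbitrary illustrative values.

import math
import random

# Sketch checking the lognormal mean and variance formulas against
# simulation; mu and sigma are arbitrary.
random.seed(10)
mu, sigma = 0.5, 0.4
mean_theory = math.exp(mu + sigma**2 / 2)
var_theory = math.exp(2*mu + 2*sigma**2) - math.exp(2*mu + sigma**2)

xs = [random.lognormvariate(mu, sigma) for _ in range(200_000)]
mean_sim = sum(xs) / len(xs)
var_sim = sum((x - mean_sim)**2 for x in xs) / (len(xs) - 1)
print(mean_theory, mean_sim)  # the pairs should agree closely
print(var_theory, var_sim)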
3.7 Weibull distribution

A random variable $X$ associated with the three parameters $-\infty < \nu < \infty$ (location), $\alpha > 0$ (scale) and $\beta > 0$ (shape) has a Weibull distribution if its probability density function is

$$f(x) = \begin{cases} \dfrac{\beta}{\alpha}\left(\dfrac{x-\nu}{\alpha}\right)^{\beta-1} e^{-\left(\frac{x-\nu}{\alpha}\right)^{\beta}} & x \ge \nu \\ 0 & \text{otherwise} \end{cases}$$

The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < \nu \\ 1 - e^{-\left(\frac{x-\nu}{\alpha}\right)^{\beta}} & \text{otherwise} \end{cases}$$

If $\nu = 0$, the probability density function becomes

$$f(x) = \begin{cases} \dfrac{\beta}{\alpha}\left(\dfrac{x}{\alpha}\right)^{\beta-1} e^{-(x/\alpha)^{\beta}} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

The corresponding cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-(x/\alpha)^{\beta}} & \text{otherwise} \end{cases}$$

If $\nu = 0$ and $\beta = 1$, the probability density function becomes

$$f(x) = \begin{cases} \dfrac{1}{\alpha} e^{-x/\alpha} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

i.e. the exponential distribution with parameter $\lambda = 1/\alpha$. The mean and variance of the Weibull distribution are, respectively,

$$E(X) = \nu + \alpha\Gamma\left(\frac{1}{\beta} + 1\right), \qquad V(X) = \alpha^2\left[\Gamma\left(\frac{2}{\beta} + 1\right) - \Gamma\left(\frac{1}{\beta} + 1\right)^2\right]$$

Examples of Weibull distributed random variables include:

• Time to failure of flat panel screens.
• The probability of clearing an airport runway within a given time.

3.8 Extreme Value (Gumbel) distribution

A random variable $X$ has an Extreme Value (Gumbel) distribution ($X \sim EV(\mu, \beta)$) if its probability density function on $x \in (-\infty, \infty)$ is given by

$$f(x) = \frac{1}{\beta} e^{-\frac{x-\mu}{\beta}} e^{-e^{-\frac{x-\mu}{\beta}}}$$

where $\mu$ is the location parameter and $\beta$ is the scale parameter. When $\mu = 0$ and $\beta = 1$ one has the standard Gumbel distribution. The cumulative distribution function of the standard Gumbel distribution is given by

$$F(x) = P(X \le x) = e^{-e^{-x}}$$

This distribution has been found useful for the description of extreme events such as floods or earthquakes.

3.9 Triangular distribution

The triangular probability density function is

$$f(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & a \le x \le b \\ \dfrac{2(c-x)}{(c-b)(c-a)} & b < x \le c \\ 0 & \text{otherwise} \end{cases}$$

while its cumulative distribution function is

$$F(x) = \begin{cases} 0 & x \le a \\ \dfrac{(x-a)^2}{(b-a)(c-a)} & a < x \le b \\ 1 - \dfrac{(c-x)^2}{(c-b)(c-a)} & b < x \le c \\ 1 & x > c \end{cases}$$

The mean is $E(X) = (a + b + c)/3$ and the mode is $M = b$. The median is obtained by setting $F(x) = 0.5$ and solving for $x$. The triangular distribution is a useful one when the only information available about the random variable is its minimum, most likely and maximum values.
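To close this section, the following sketch checks the Weibull moment formulas of Section 3.7 (with $\nu = 0$) and the triangular mean and median of Section 3.9 against simulation; all parameter values are arbitrary.

import math
import random

# Sketch checking the Weibull moments (nu = 0) and the triangular
# mean/median against simulation; parameter values are arbitrary.
random.seed(2)

# Weibull with scale alpha and shape beta
alpha, beta = 2.0, 1.5
w_mean = alpha * math.gamma(1 / beta + 1)
w_var = alpha**2 * (math.gamma(2 / beta + 1) - math.gamma(1 / beta + 1)**2)
ws = [random.weibullvariate(alpha, beta) for _ in range(100_000)]
w_samp_mean = sum(ws) / len(ws)
w_samp_var = sum((x - w_samp_mean)**2 for x in ws) / (len(ws) - 1)
print(w_mean, w_samp_mean)  # theoretical vs. sample mean
print(w_var, w_samp_var)    # theoretical vs. sample variance

# Triangular with minimum a, mode b, maximum c
a, b, c = 1.0, 2.0, 5.0
t_mean = (a + b + c) / 3
# Solve F(x) = 0.5 analytically, picking the branch containing the median
if (b - a) / (c - a) >= 0.5:
    t_median = a + math.sqrt(0.5 * (b - a) * (c - a))
else:
    t_median = c - math.sqrt(0.5 * (c - b) * (c - a))
ts = sorted(random.triangular(a, c, b) for _ in range(100_001))
print(t_mean, sum(ts) / len(ts))   # theoretical vs. sample mean
print(t_median, ts[len(ts) // 2])  # analytic vs. sample median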
4 Empirical Distributions

If the distribution function of a random variable cannot be specified in terms of a known distribution and field data are available, one can use an empirical distribution. Empirical distributions can be discrete or continuous.

5 Inference, Estimation and Tests of Hypotheses

Statistical inference is a collection of methods designed to investigate the characteristics of a certain population using only information obtained from a random sample extracted from that population. Inference is an aid in making decisions under uncertainty, and it is the foundation of modern decision theory.

Estimation consists in determining the value or range of values of a parameter of the population using the sample data. Confidence intervals with a specified degree of confidence are used in interval estimation.

Sometimes, rather than in the value of a parameter, one is interested in the validity of a certain statement (hypothesis testing). In such cases one can encounter the following situations:

• Accept the statement when it is true (no error is involved).
• Reject the statement when it is true (Type I error).
• Accept the statement when it is false (Type II error).
• Reject the statement when it is false (no error is involved).

One is then interested in the probabilities of incurring Type I and Type II errors (respectively, $\alpha$ and $\beta$). Two commonly used statistical inference tests in simulation modeling are the Chi-square and Kolmogorov-Smirnov tests.

Exercise. Do some research and find out how the Chi-square and Kolmogorov-Smirnov tests are performed.

6 Useful Probabilistic and Statistical Models

6.1 Stochastic Processes

A stochastic process takes place in a system when the state of the system changes with time in a random manner. Many if not most natural and human-made processes are stochastic processes, although in some cases the random aspects can be neglected.

6.2 Poisson Process

Often one is interested in the number of events which occur over a certain interval of time, i.e. a counting process $(N(t), t \ge 0)$. A counting process is a Poisson process if it involves

• One arrival at a time.
• Random arrivals without rush or slack periods (stationary increments).
• Independent increments.

Under these circumstances, the probability that $N(t) = n$ for $t \ge 0$ and $n = 0, 1, 2, \ldots$ is

$$P(N(t) = n) = \frac{e^{-\lambda t}(\lambda t)^n}{n!}$$

This means that $N(t)$ has a Poisson distribution with parameter $\alpha = \lambda t$. Its mean and variance are $E(N(t)) = V(N(t)) = \alpha = \lambda t$. It can be shown that if the number of arrivals has a Poisson distribution, the interarrival times have an exponential distribution.

The random splitting property of Poisson processes states that if $N(t) = N_1(t) + N_2(t)$ is Poisson with rate $\lambda$, then $N_1$ and $N_2$ are independent Poisson processes with rates $\lambda p$ and $\lambda(1 - p)$, where $p$ and $1 - p$ are the probabilities of the branches $N_1$ and $N_2$. Conversely, pooling two independent Poisson processes $N_1(t)$ and $N_2(t)$ yields a Poisson process $N(t) = N_1(t) + N_2(t)$ (random pooling property).

6.3 Markov Chains and the Kolmogorov Balance Equations

If the future probabilistic characteristics of a system in which a stochastic process is taking place depend only on the state of the system at the current time, one has a Markov process or chain. The effect of the past on the future is contained in the present state of the system.

As a simple example of a Markov chain, consider a machine that works until it fails (randomly) and then resumes work once it is repaired. There are two states for this system, namely

• The machine is busy ($S_0$).
• The machine is being repaired ($S_1$).

The system moves from state $S_0$ to $S_1$ at a rate $\lambda$ and from $S_1$ back to $S_0$ at a rate $\mu$.

Exercise. Make a graph representing the above Markov chain.

As a second example, consider now a facility where two machines A and B perform an operation. The machines fail randomly but resume work once they are repaired. The four possible states of this system are

• Both machines are busy ($S_0$).
• Machine A is being repaired while B is busy ($S_1$).
• Machine B is being repaired while A is busy ($S_2$).
• Both machines are being repaired ($S_3$).

Here $\lambda_1$ and $\lambda_2$ are, respectively, the failure rates of machines A and B, while $\mu_1$ and $\mu_2$ are the corresponding repair rates.

Exercise. Make a graph representing the above Markov chain.

The Kolmogorov balance equations are differential equations relating the probabilities $P_0, P_1, P_2$ and $P_3$ of the various states involved in a Markov chain. They are obtained by a probability balance on the states. For the second example above they are

$$\frac{dP_0}{dt} = \mu_1 P_1 + \mu_2 P_2 - (\lambda_1 + \lambda_2) P_0$$

$$\frac{dP_1}{dt} = \lambda_1 P_0 + \mu_2 P_3 - (\mu_1 + \lambda_2) P_1$$

$$\frac{dP_2}{dt} = \lambda_2 P_0 + \mu_1 P_3 - (\lambda_1 + \mu_2) P_2$$

$$\frac{dP_3}{dt} = \lambda_2 P_1 + \lambda_1 P_2 - (\mu_1 + \mu_2) P_3$$

Under steady state or equilibrium conditions the time derivatives are zero, and the probabilities are then related by a system of simultaneous linear algebraic equations.
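Under steady state the balance equations above reduce to a linear system. The following sketch solves it for the two-machine example, replacing the (redundant) fourth balance equation with the normalization condition; the failure and repair rates are arbitrary illustrative values.

import numpy as np

# Sketch: steady-state Kolmogorov balance equations for the two-machine
# example. Set the time derivatives to zero and replace one (dependent)
# equation with P0 + P1 + P2 + P3 = 1. Rates are arbitrary.
l1, l2, m1, m2 = 0.1, 0.2, 1.0, 0.8

A = np.array([
    [-(l1 + l2), m1,         m2,         0.0],  # balance at S0
    [l1,         -(m1 + l2), 0.0,        m2 ],  # balance at S1
    [l2,         0.0,        -(l1 + m2), m1 ],  # balance at S2
    [1.0,        1.0,        1.0,        1.0],  # normalization
])
b = np.array([0.0, 0.0, 0.0, 1.0])
P = np.linalg.solve(A, b)
print(P)  # steady-state probabilities [P0, P1, P2, P3]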
6.4 Queueing Systems and Little's Formula

A queueing system involves one or more servers which provide some service to customers who arrive, line up and wait for service in a queue when all the servers are busy. Typically, both arrival and service times are random variables. The single server queue consists of a single server and a single queue. If the interarrival times of customers and the service times are exponentially distributed, the resulting queue is known as the M/M/1 queue.

Interarrival and service times in queues are often modeled probabilistically. Two examples of queueing system quantities are:

• Interarrival times of mechanics at a centralized tool crib.
• The number of mechanics arriving at a centralized tool crib per time period.

Random interarrival and service times are often simulated using exponential distributions. However, sometimes a normal distribution or a truncated normal distribution may be more appropriate. Gamma and Weibull distributions are also used.

An important parameter of the queueing system is the server utilization $\rho$, given by

$$\rho = \frac{\lambda}{\mu}$$

where $\lambda$ is the mean arrival rate of customers from the outside world into the queueing system and $\mu$ is the mean service rate.

The single server queue can also be regarded as a Markov chain in which the various states are distinguished only by the number of customers in the system. Let us call the corresponding states $S_0, S_1, \ldots, S_n$. The system can move into state $S_i$ either from $S_{i-1}$ (if a new customer arrives before service is completed for the customer being served) or from $S_{i+1}$ (if service is completed and the next customer in line begins service before any new arrival). Let $\lambda_{i,j}$ be the rate at which the system transitions from state $S_i$ to state $S_j$.

Exercise. Make a graph representing the Markov chain for the single teller queue.

If the queue is at steady state, the Kolmogorov equations yield

$$P_n = \frac{\lambda_{n-1,n} \cdots \lambda_{1,2}\,\lambda_{0,1}}{\lambda_{n,n-1} \cdots \lambda_{2,1}\,\lambda_{1,0}}\,P_0$$

where $P_n$ is the probability of encountering $n$ customers in the system and $\lambda_{0,1} = \lambda$.

Exercise. Derive the above expression.

In investigating queueing systems one is interested in performance measures such as the expected number of customers in the system $L$, the expected number of customers in the queue $L_q$, the expected wait time of customers in the system $W$, and the expected wait time of customers in the queue $W_q$. These expectations are related by Little's formula, which simply states that

$$L = \lambda W \qquad \text{and} \qquad L_q = \lambda W_q$$

Exercise. Derive the above expression relating $L$ and $W$.

A number of queueing problems have been solved, yielding closed form expressions for the above performance parameters. For instance, for the M/M/1 queue at steady state the results are as follows:

$$L = \frac{\rho}{1-\rho} = \frac{\lambda}{\mu - \lambda}$$

$$W = \frac{1}{\mu(1-\rho)} = \frac{1}{\mu - \lambda}$$

$$L_q = \frac{\rho^2}{1-\rho} = \frac{\lambda^2}{\mu(\mu - \lambda)}$$

$$W_q = \frac{\rho}{\mu(1-\rho)} = \frac{\lambda}{\mu(\mu - \lambda)}$$

$$P_n = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n$$

Further, for the M/G/1 queue, in which the service times have a mean of $1/\mu$ and a variance $\sigma^2$, the corresponding results are

$$L = \rho + \frac{\rho^2(1 + \sigma^2\mu^2)}{2(1-\rho)}$$

$$W = \frac{1}{\mu} + \frac{\lambda(1/\mu^2 + \sigma^2)}{2(1-\rho)}$$

$$L_q = \frac{\rho^2(1 + \sigma^2\mu^2)}{2(1-\rho)} = \frac{\lambda^2(1/\mu^2 + \sigma^2)}{2(1-\rho)}$$

$$W_q = \frac{\lambda(1/\mu^2 + \sigma^2)}{2(1-\rho)}$$

$$P_0 = 1 - \rho$$
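The closed-form M/M/1 results can be evaluated and checked against Little's formula with a few lines; $\lambda$ and $\mu$ are arbitrary values with $\lambda < \mu$.

# Sketch: M/M/1 performance measures and a Little's-formula check.
lam, mu = 3.0, 5.0
rho = lam / mu

L = rho / (1 - rho)           # expected number in system
W = 1 / (mu - lam)            # expected time in system
Lq = rho**2 / (1 - rho)       # expected number in queue
Wq = lam / (mu * (mu - lam))  # expected time in queue

print(L, lam * W)    # Little's formula: L = lam * W
print(Lq, lam * Wq)  # and Lq = lam * Wq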
7 Matching Data with Distributions

For the sake of computational convenience in reliability analysis and modeling, raw data known to consist of independent and identically distributed (i.i.d.) entries are fitted to a theoretical distribution function. This is nowadays easily done using programs such as Stat::Fit or ExpertFit.

Time to failure of complex components is usually represented with the Weibull distribution. If failures are completely random, the exponential distribution is used, but if failure times fluctuate symmetrically around a mean, the normal distribution may be more appropriate. The lognormal distribution can also be used. For incomplete data, uniform, triangular and beta distributions are used.

Data must then be tested for independence. Some useful tools are:

• Scatter plots. Contiguous values in a string of values of a random variable are plotted on an $x$-$y$ plane. The resulting pattern of points is characteristic of the distribution.
• Autocorrelation plots. The covariance of values separated by a specified lag in a string of values of a random variable is plotted as a function of the lag.
• Runs tests. These search for peculiar patterns in substrings of numbers from a larger stream.

Data must also be tested to see if they are identically distributed (homogeneity tests). Some useful tools are:

• Histograms.
• Distribution plots.
• Quantile-quantile plots.
• Kolmogorov-Smirnov tests.
• Chi-square tests.
• Time dependency of distributions.
• ANOVA.

The collected data typically consist of a limited number of values. Simulation modeling requires large numbers of samples, so the data must be converted to a frequency distribution. Raw data can be used as input for the simulation project, but this is usually not recommended except in special cases. More commonly, once data have been tested for independence and correlation they are converted to a form suitable for use in the simulation model. This is done by fitting them to some distribution. Once a distribution fitting the data has been determined, input for the simulation program is produced as random variates sampled from the fitted distribution.

The frequency distribution selected can be empirical or theoretical, discrete or continuous. Discrete theoretical distributions are rarely used directly; instead, numerical values of the discrete probabilities are used. In effect, continuous theoretical distributions are almost always employed. Of the many available theoretical distributions, a dozen or so are commonly used in simulation modeling.

Data are fitted to theoretical distributions by identifying the theoretical distribution which best represents the data. Stat::Fit provides a ranking of distributions fitting a particular data set together with a goodness-of-fit (Chi-square or Kolmogorov-Smirnov) diagnostic. Note also that if the fitted distribution is unbounded, values for simulation should be taken from a truncated version of the selected distribution in order to avoid unrealistic extreme values.
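As a numerical counterpart to the independence tools above, the following sketch estimates the lag-$h$ autocorrelation of a data series; values near zero for all lags support independence. The synthetic data stand in for collected field data.

import random

# Sketch of an independence check: estimate the lag-h autocorrelation of
# a data series. Values near zero for all lags h support independence.
random.seed(9)
data = [random.random() for _ in range(500)]  # stand-in for field data

def lag_autocorrelation(xs, h):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean)**2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + h] - mean)
              for i in range(n - h)) / (n - h)
    return cov / var

print([round(lag_autocorrelation(data, h), 3) for h in (1, 2, 5, 10)])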
7.1 Physical Basis of Common Distributions

Each statistical distribution function has a physical basis. An understanding of this basis is useful in determining candidate distributions to represent field data. Following is a brief summary of the physical basis of selected distributions.

• Binomial. The distribution of a random variable giving the number of successes in $n$ independent trials, each yielding either success or failure with probabilities $p$ and $1 - p$, respectively.
• Geometric. The distribution of a random variable giving the number of independent trials required in an experiment before the first success is achieved.
• Poisson. The distribution of a random variable giving the number of independent events occurring within a fixed amount of time.
• Normal. The distribution of a random variable which is itself the result of the sum of component processes.
• Lognormal. The distribution of a random variable which is itself the result of the product of component processes.
• Exponential. The distribution of a random variable giving the time interval between independent events.
• Gamma. A distribution of broad applicability restricted to non-negative random variables.
• Beta. A distribution of broad applicability restricted to bounded random variables.
• Erlang. The distribution of a random variable which is itself the result of the sum of exponential component processes.
• Weibull. The distribution of a random variable giving the time to failure of a component.
• Uniform. The distribution of a random variable whose values are completely uncertain.
• Triangular. The distribution of a random variable for which only the minimum, most likely and maximum values are known.

7.2 Common Situations where Specific Distributions are Useful Representations of Collected Data

Input data for DES models must often be created according to a specific statistical distribution. The required distribution must be identified based on how well it represents the collected data. Following is a brief summary of the real-life situations where the distributions mentioned above are likely to be encountered.

• Binomial. Useful when there are only two possible outcomes of an experiment which is repeated multiple times.
• Geometric. Also useful when there are only two possible outcomes of an experiment which is repeated multiple times.
• Poisson. Useful to represent the number of incoming customers or requests into a system.
• Normal. Useful to represent the distribution of errors of all kinds.
• Lognormal. Useful for representing the time required to perform a given task or accomplish a certain goal.
• Exponential. Useful to represent interarrival times in all kinds of situations.
• Gamma. Also useful for representing the time required to perform a given task or accomplish a certain goal, but more general.
• Beta. Useful as a rough model in situations of ignorance and/or to represent the proportion of non-conforming items in a set.
• Erlang. Useful to represent systems making simultaneous requests for attention from a server.
• Weibull. Useful to represent the life and/or reliability of components.
• Uniform. Useful when one knows nothing about the system.
• Triangular. Useful when one knows little about the system.
7.3 Short-Cut Methods for Distribution Identification

In DES modeling the collected input data are often replaced by computer generated random variate values which accurately represent the original data. Typically, several distributions will be considered good candidates. A histogram of the data can provide a first inkling about the family of distribution functions which may represent the data well.

Another simple test which can be used to quickly determine whether a given set of data is adequately represented by a specific distribution is the construction of quantile-quantile plots. Assume that $X$ is a random variable whose cumulative distribution function is $F$. The $q$-quantile of $X$ is that value $\gamma$ of the random variable which satisfies the equation

$$F(\gamma) = P(X \le \gamma) = q$$

If $n$ data values are arranged in increasing order, the value in position $\frac{j(n+1)}{k}$ is denoted by $g_{j/k}$. Therefore, the median is $g_{1/2}$, i.e. the value in position $\frac{n+1}{2}$, half-way through the data set. A common application of this concept is in investigating the distribution of income in a population, where the total number of households is divided into five quintiles ($q = 0.2, 0.4, 0.6, 0.8$ and $1.0$) by increasing values of income. Specifically, here in the USA, if your household income is more than about $\gamma = 80{,}000$ dollars per year then you belong in the top quintile. One in five households is in that quintile.

Consider a collection of $n$ values $x_i$, $i = 1, 2, \ldots, n$, of the random variable $X$. If the values are arranged according to their magnitude, a new string of values $y_j$, $j = 1, 2, \ldots, n$, is obtained. The new variable immediately becomes an estimate of the $(j - \frac{1}{2})/n$ quantile of $X$, i.e.

$$y_j \approx F^{-1}\left(\frac{j - \frac{1}{2}}{n}\right)$$

(A computational sketch of this comparison appears at the end of this subsection.)

Once an appropriate family of distributions has been selected, one proceeds to determine the various distribution parameters using the collected data values. Following is a summary of distribution parameters and their estimators for three commonly used distributions.

• Poisson. Parameter: $E(X) = \alpha$. Estimator: the sample mean.
• Exponential. Parameter: the rate $\lambda$ (so that $E(X) = 1/\lambda$). Estimator: the reciprocal of the sample mean.
• Normal. Parameters: $\mu$ and $\sigma^2$. Estimators: the sample mean and the sample variance.
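The following sketch carries out the quantile comparison described above for a hypothetical exponential fit: the ordered data values $y_j$ are paired with the model quantiles $F^{-1}\big((j - \frac{1}{2})/n\big)$. The synthetic data stand in for collected observations.

import math
import random

# Sketch of a quantile-quantile comparison for an exponential fit. The
# synthetic data stand in for collected field data.
random.seed(4)
data = sorted(random.expovariate(0.5) for _ in range(100))
lam_hat = 1 / (sum(data) / len(data))  # exponential rate estimator

# Pair each ordered value y_j with the model quantile F^{-1}((j - 1/2)/n)
pairs = [(-math.log(1 - (j - 0.5) / len(data)) / lam_hat, y)
         for j, y in enumerate(data, start=1)]
# On a q-q plot one plots these (model quantile, sample quantile) pairs;
# a near-straight 45-degree line supports the candidate distribution.
print(pairs[:5])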
7.4 Goodness of Fit Testing

To determine the appropriateness of a given distribution in a particular situation, goodness-of-fit tests are required. The tests verify the validity of the null hypothesis $H_0$, which states that the random variable $X$ follows a specific distribution. In this section we examine two commonly used tests for this purpose: the Chi-square test (applicable to large samples) and the Kolmogorov-Smirnov test (applicable to small samples and restricted to continuous distributions).

For the Chi-square test, the $n$ data points are arranged into a desired number of cells $k$. The expected number of points falling inside the $i$-th cell, $E_i$, is then given by

$$E_i = n p_i$$

where $p_i$ is the probability associated with that interval and is obtained from the specified distribution.

For instance, consider the case of reliability data consisting of a total of $n_f$ failures, binned into cells representing the number of failures within time intervals of uniform duration $\Delta t_i = t_{i+1} - t_i$. Assume that direct calculation of the failure rate (hazard function value) yields a reasonably constant trend and that an average value is computed. Introduce then the null hypothesis that the data are exponentially distributed with constant failure rate $\hat{\lambda}$ estimated as the average failure rate computed from the data. The expected number of failures inside each time bin is then given by

$$E_i = n_f \left[e^{-\hat{\lambda} t_i} - e^{-\hat{\lambda} t_{i+1}}\right]$$

Next, using the actual number of data points contained in each cell, $O_i$, one computes the statistic

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

To test the null hypothesis, the critical value of the statistic is then determined as $\chi^2_{\alpha, k-s-1}$, where $\alpha$ is the significance level, $k - s - 1$ is the number of degrees of freedom, and $s$ is the number of parameters of the candidate distribution ($s = 1$ in the case of the exponential distribution). Finally, if $\chi_0^2 > \chi^2_{\alpha, k-s-1}$ then $H_0$ is rejected, but if $\chi_0^2 < \chi^2_{\alpha, k-s-1}$ the hypothesis cannot be rejected at the given significance level.

If the null hypothesis cannot be rejected, one can then calculate a confidence interval for the distribution parameters. For instance, considering again the case of the exponentially distributed reliability data above, one can show that a $100(1-\alpha)\%$ confidence interval for the value of $\lambda$ is given by

$$\left[\hat{\lambda} \times \frac{\chi^2(2n_f,\,\alpha/2)}{2n_f},\ \ \hat{\lambda} \times \frac{\chi^2(2n_f,\,1-\alpha/2)}{2n_f}\right]$$

For the Kolmogorov-Smirnov test the $n$ data points are also arranged in increasing order. If possible, the data are made dimensionless by dividing each value by the largest value in the set. Then one calculates the statistics

$$D^+ = \max_i \left(\frac{i}{n} - R_i\right), \qquad D^- = \max_i \left(R_i - \frac{i-1}{n}\right)$$

and

$$D = \max(D^+, D^-)$$

and compares $D$ against the critical value $D_c$. When $D < D_c$ the null hypothesis $H_0$ cannot be rejected.

7.5 Input in the Absence of Data

Sometimes input data for DES models are simply not available. In those instances one must rely on related engineering data, expert opinion, and physical or other limitations to produce reasonable input values for the model. A few data values combined with the assumption of a uniform, triangular or beta distribution can provide a solid starting point for research.

7.6 Correlated Input Data

In some situations various inputs may be related to each other, or the same input quantity may exhibit autocorrelation over time. Typical examples are inventory modeling, where demand data affect lead time data, and stock trading, where buy and sell orders called in to the broker tend to arrive in bursts.

When two correlated input variables $X_1$ and $X_2$ are involved, one uses their covariance $\mathrm{Cov}(X_1, X_2)$ or their correlation

$$\rho = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_1 \sigma_2}$$

If the collected data values for the two variables yield $\rho \gg 0$ then one needs to generate correlated variates. The following algorithm generates two correlated random variates with normal distributions with parameters $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$, respectively (a sketch follows the AR(1) algorithm below):

• Generate two independent standard normal variates $Z_1$ and $Z_2$.
• Set $X_1 = \mu_1 + \sigma_1 Z_1$.
• Set $X_2 = \mu_2 + \sigma_2\left(\rho Z_1 + \sqrt{1 - \rho^2}\, Z_2\right)$.

If the data correspond to a time series of values $X_1, X_2, X_3, \ldots$ of a single variable, all from the same distribution, then one uses the lag-$h$ autocovariance $\mathrm{Cov}(X_i, X_{i+h})$ or the lag-$h$ autocorrelation

$$\rho_h = \frac{\mathrm{Cov}(X_i, X_{i+h})}{\sigma_i \sigma_{i+h}}$$

Autoregressive order-1 (AR(1)) and exponential autoregressive order-1 (EAR(1)) models are commonly used to generate autocorrelated time series. The algorithm for the AR(1) model is as follows (see the sketch after this list):

• Using the collected data, determine the values of the parameters $\mu \approx \bar{X}$, $\phi = \mathrm{Cov}(X_t, X_{t+1})/S^2$ (the lag-1 autocorrelation) and $\sigma_\epsilon^2 = S^2(1 - \phi^2)$.
• Generate $X_1$ from a normal distribution with mean $\mu$ and variance $\sigma_\epsilon^2/(1 - \phi^2)$.
• Generate $\epsilon_t$ from a normal distribution with mean 0 and variance $\sigma_\epsilon^2$.
• Set $X_t = \mu + \phi(X_{t-1} - \mu) + \epsilon_t$.
• Repeat.
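Here is a sketch of the two generation algorithms just described, the correlated normal pair and the AR(1) series; all parameter values are arbitrary.

import random

# Sketch: a correlated pair of normal variates and an AR(1) series,
# following the algorithms above. Parameter values are arbitrary.
random.seed(5)

def correlated_pair(mu1, s1, mu2, s2, rho):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x1 = mu1 + s1 * z1
    x2 = mu2 + s2 * (rho * z1 + (1 - rho**2) ** 0.5 * z2)
    return x1, x2

def ar1_series(mu, phi, sigma_eps, n):
    # X1 from the stationary distribution, variance sigma_eps^2/(1 - phi^2)
    x = random.gauss(mu, sigma_eps / (1 - phi**2) ** 0.5)
    series = [x]
    for _ in range(n - 1):
        x = mu + phi * (x - mu) + random.gauss(0, sigma_eps)
        series.append(x)
    return series

print(correlated_pair(10, 2, 20, 4, 0.8))
print(ar1_series(5.0, 0.7, 1.0, 5))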
The algorithm for the EAR(1) model is as follows:

• Using the collected data, determine the values of the parameters $\lambda \approx 1/\bar{X}$ and $\phi = \mathrm{Cov}(X_t, X_{t+1})/S^2$ (the lag-1 autocorrelation).
• Generate $X_1$ from an exponential distribution with mean $1/\lambda$.
• Generate $U$ from a uniform distribution on $[0, 1]$.
• If $U \le \phi$, set $X_t = \phi X_{t-1}$.
• If $U > \phi$, generate $\epsilon_t$ from an exponential distribution with mean $1/\lambda$ and set $X_t = \phi X_{t-1} + \epsilon_t$.
• Repeat.

8 Generation of Random Numbers and Pseudo-Random Numbers

Recall that for a random variable $X$ which is uniformly distributed in $[0, 1]$ the uniform probability density function is

$$f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

while its cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases}$$

A random number (RN) stream is a collection of such uniformly distributed random variables. A truly random stream of numbers has the following characteristics:

• Uniformly distributed.
• Continuous-valued.
• $E(R) = \frac{1}{2}$.
• $\sigma^2 = \frac{1}{12}$.
• No autocorrelation between numbers.
• No runs.

In practice one always works with streams of pseudo-random numbers (PRN). These have approximately the same characteristics as RNs. PRNs are generated with a computer using a numerical algorithm embedded in a computer program or routine. The requirements of a good PRN generator are:

• Fast.
• Portable.
• Long cycle.
• Replicability.
• Produces PRNs with the desired statistical characteristics.

8.1 The Linear Congruential Method

The established algorithm for PRN generation is the linear congruential method (LCM); more sophisticated approaches still use this method as a foundation. The fundamental relationship of the LCM is

$$X_{i+1} = (a X_i + c) \bmod m$$

This means that the value of $X_{i+1}$ is the remainder left from the integer division of $aX_i + c$ by $m$. The integers $X_i$ lie between 0 and $m - 1$, so the numbers $R_i = X_i/m$ obtained from the LCM are from the set $I = \{0, 1/m, 2/m, \ldots, (m-1)/m\}$. One key feature of the method is its period $P$, the number of values that can be generated before the same number appears twice. For suitably chosen multipliers $a$, the maximal period is related to the values of $m$ and $c$ as follows:

• If $m = 2^b$ and $c > 0$, $P = m = 2^b$.
• If $m = 2^b$ and $c = 0$, $P = m/4 = 2^{b-2}$.
• If $m$ is prime and $c = 0$, $P = m - 1$.

8.2 The Combined Linear Congruential Method

Large simulations require large collections of PRNs, and there is a need for still longer periods. These can be obtained by the use of combined linear congruential methods (CLCM). The fundamental theorem associated with the CLCM is due to L'Ecuyer: if $W_{i,1}, W_{i,2}, \ldots, W_{i,k}$ are independent discrete-valued random variables with at least one of them (say $W_{i,1}$) uniformly distributed between 0 and $m_1 - 2$, then

$$W_i = \left(\sum_{j=1}^{k} W_{i,j}\right) \bmod (m_1 - 1)$$

is a uniformly distributed random variable between 0 and $m_1 - 2$. More specifically, consider the following algorithm:

$$X_i = \left(\sum_{j=1}^{k} (-1)^{j-1} X_{i,j}\right) \bmod (m_1 - 1)$$

where the $X_{i,j}$ are produced by linear congruential generators, and

$$R_i = \begin{cases} \dfrac{X_i}{m_1} & X_i > 0 \\ \dfrac{m_1 - 1}{m_1} & X_i = 0 \end{cases}$$

It can be shown that the maximum period obtained with this algorithm is

$$P = \frac{(m_1 - 1)(m_2 - 1)\cdots(m_k - 1)}{2^{k-1}}$$

Example. L'Ecuyer proposed the following CLCM: the two generators

$$X_{1,j+1} = 40014\,X_{1,j} \bmod 2147483563$$

$$X_{2,j+1} = 40692\,X_{2,j} \bmod 2147483399$$

produce the combined PRNG

$$X_{j+1} = (X_{1,j+1} - X_{2,j+1}) \bmod 2147483562$$

which yields

$$R_{j+1} = \begin{cases} \dfrac{X_{j+1}}{2147483563} & X_{j+1} > 0 \\ \dfrac{2147483562}{2147483563} & X_{j+1} = 0 \end{cases}$$
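The LCM and L'Ecuyer's combined generator translate directly into code; the following sketch uses the constants from the example above, with arbitrary seeds.

# Sketch: a linear congruential generator and L'Ecuyer's combined
# generator with the constants from the example above; seeds arbitrary.
def lcg(seed, a=40014, c=0, m=2147483563):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def lecuyer(seed1=12345, seed2=67890):
    m1 = 2147483563
    g1 = lcg(seed1, a=40014, c=0, m=m1)
    g2 = lcg(seed2, a=40692, c=0, m=2147483399)
    while True:
        x = (next(g1) - next(g2)) % (m1 - 1)
        yield x / m1 if x > 0 else (m1 - 1) / m1

gen = lecuyer()
print([round(next(gen), 6) for _ in range(5)])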
9 Tests for Random Numbers

Since one always works in practice with PRN streams, it is necessary to check how close their characteristics are to those of true RN streams. Assume a stream containing $N$ PRNs has been produced. To verify their characteristics the stream is subjected to various tests.

In all cases, one states a hypothesis about a given characteristic of the stream and then accepts or rejects it at a given level of significance $\alpha$, where

$$\alpha = P(\text{rejecting } H_0 \mid H_0 \text{ is true})$$

(i.e. the Type I error). In testing for uniformity, the null hypothesis $H_0$ is $R_i \sim U[0, 1]$ while the alternative hypothesis $H_1$ is $R_i \not\sim U[0, 1]$. In testing for independence, the null hypothesis $H_0$ is that the $R_i$ are independent, while the alternative hypothesis $H_1$ is that they are not.

9.1 Kolmogorov-Smirnov Frequency Test

For this test the numbers are first arranged in increasing order,

$$R_1 < R_2 < \ldots < R_N$$

The test makes use of the new variables

$$D^+ = \max_i \left(\frac{i}{N} - R_i\right), \qquad D^- = \max_i \left(R_i - \frac{i-1}{N}\right)$$

and

$$D = \max(D^+, D^-)$$

Once $D$ has been computed, a critical value $D_c$ is obtained from the K-S statistical table for the desired $\alpha$ and the given $N$. Finally:

• If $D > D_c$, $H_0$ is rejected ($H_1$ is accepted).
• If $D \le D_c$, $H_0$ is not rejected (i.e. the numbers are taken to be uniformly distributed).

9.2 Chi-square Frequency Test

In this test the numbers are arranged into $n$ classes by subdividing the range $[0, 1]$ into $n$ subintervals and determining how many of the numbers end up in each class $i$ ($O_i$). The test uses the statistic

$$\chi_0^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where $E_i = N/n$ is the expected number of numbers in each class for a uniform distribution. Once $\chi_0^2$ has been computed, a critical value $\chi^2_{\alpha, n-1}$ is obtained from the Chi-square statistical table. Finally:

• If $\chi_0^2 > \chi^2_{\alpha, n-1}$, $H_0$ is rejected ($H_1$ is accepted).
• If $\chi_0^2 \le \chi^2_{\alpha, n-1}$, $H_0$ is not rejected (i.e. the numbers are taken to be uniformly distributed).

9.3 Runs Test

This test aims to detect whether there are patterns in substrings of the stream. One examines the stream and checks whether each number is followed by a larger (+) or a smaller (−) number. Runs are the resulting patterns of +'s and −'s. In a truly random sequence the mean and variance of the number of up and down runs $a$ are given by

$$\mu_a = \frac{2N - 1}{3}, \qquad \sigma_a^2 = \frac{16N - 29}{90}$$

When $N > 20$ the distribution of $a$ is close to normal, so the test statistic is

$$Z_0 = \frac{a - \mu_a}{\sigma_a}$$

which has the normal distribution with mean zero and unit standard deviation ($N(0, 1)$). Once $Z_0$ has been computed, a critical value $z_{\alpha/2}$ is obtained from the normal statistical table. Finally:

• If $Z_0 < -z_{\alpha/2}$ or $Z_0 > z_{\alpha/2}$, $H_0$ is rejected ($H_1$ is accepted).
• If $-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}$, $H_0$ is not rejected (i.e. the numbers are taken to be independent).

Other types of runs tests are also possible, for instance runs above and below the mean, and run lengths. For runs above and below the mean, a test similar to the one above is used but with the mean and variance of the number of runs $b$ given by

$$\mu_b = \frac{2 n_1 n_2}{N} + \frac{1}{2}, \qquad \sigma_b^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - N)}{N^2 (N - 1)}$$

where $n_1$ and $n_2$ are, respectively, the numbers of observations above and below the mean. For run lengths one uses the Chi-square test to compare the observed number of runs of given lengths against the expected number obtained in a truly independent stream.

9.4 Autocorrelation Test

This test aims to detect correlation among numbers in the stream separated by a specific number of positions (the lag). Consider the autocorrelation test for a lag $m$: one investigates the behavior of the numbers $R_i$ and $R_{i+jm}$. If the autocorrelation $\rho_{im} > 0$ there is positive correlation (i.e. high numbers follow high numbers and vice versa), and if $\rho_{im} < 0$ one has negative correlation. The autocorrelation is estimated by

$$\hat{\rho}_{im} = \frac{1}{M + 1}\left[\sum_{k=0}^{M} R_{i+km} R_{i+(k+1)m}\right] - 0.25$$

where $M$ is the largest integer satisfying $i + (M + 1)m \le N$. The test statistic is in this case given by

$$Z_0 = \frac{\hat{\rho}_{im}}{\sigma_{\hat{\rho}_{im}}}$$

where

$$\sigma_{\hat{\rho}_{im}} = \frac{\sqrt{13M + 7}}{12(M + 1)}$$

Once $Z_0$ has been computed, a critical value $z_{\alpha/2}$ is obtained from the normal statistical table. Finally:

• If $Z_0 < -z_{\alpha/2}$ or $Z_0 > z_{\alpha/2}$, $H_0$ is rejected ($H_1$ is accepted).
• If $-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}$, $H_0$ is not rejected (i.e. the numbers are taken to be independent).
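The autocorrelation test of Section 9.4 can be sketched as follows; the stream, lag and starting index are illustrative, and $z_{\alpha/2} = 1.96$ corresponds to $\alpha = 0.05$.

import math
import random

# Sketch of the lag-m autocorrelation test; stream, i and m illustrative.
def autocorrelation_test(stream, i=1, m=5, z_crit=1.96):
    N = len(stream)
    M = (N - i) // m - 1  # largest M with i + (M+1)m <= N
    # 1-based indices i + k*m map to 0-based positions i + k*m - 1
    s = sum(stream[i + k*m - 1] * stream[i + (k+1)*m - 1]
            for k in range(M + 1))
    rho_hat = s / (M + 1) - 0.25
    sigma = math.sqrt(13*M + 7) / (12 * (M + 1))
    z0 = rho_hat / sigma
    return z0, abs(z0) > z_crit  # (statistic, reject H0?)

random.seed(6)
stream = [random.random() for _ in range(1000)]
print(autocorrelation_test(stream))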
9.5 Gap Test

This test checks for independence by tracking the pattern of gaps between successive occurrences of a given digit in the stream. The test is performed using the Kolmogorov-Smirnov scheme.

9.6 Poker Test

This test checks for independence based on the repetition of certain digits in the sequence. The test is performed using the Chi-square scheme.

10 Generation of Random Variates

Discrete event simulation models require as inputs the values of random variables with specified probability distributions. Such random variables are called random variates.

Input data for DES models are collected from the field and/or produced from the best available estimates. However, the amount of data collected is rarely enough to run simulation models, and one must use the data to create PRN streams with statistical characteristics similar to those of the original data. So, on the one hand one needs to identify the statistical characteristics of the original data, and on the other one must be able to produce large collections of random variates with statistical characteristics similar to those of the original data. Here we focus on the second aspect: once we have determined the probability distribution applicable to our data, we proceed to generate random variate streams for use in the simulation. This is accomplished by the inverse transform method.

10.1 The Inverse Transform Method

Given a random (or pseudo-random) number $R$ and a random variate $X$:

• Determine the cumulative distribution function of $X$, $F(X)$.
• Set $F(X) = R$.
• Solve the equation $F(X) = R$ for $X$ in terms of $R$, i.e. $X = F^{-1}(R)$.
• Repeat the above for the stream of random (or pseudo-random) numbers $R_1, R_2, \ldots, R_n$ to obtain the stream of random variates $X_1, X_2, \ldots, X_n$.

Next, the formulae obtained by the inverse transform method for several commonly used random variates are given.

10.2 Inverse Transform for the Exponential Distribution

Following are the specific steps required to obtain exponentially distributed random variates with rate $\lambda$ (mean $1/\lambda$) from a random number stream using the inverse transform method.

• $F(x) = 1 - e^{-\lambda x}$.
• Set $F(X) = 1 - e^{-\lambda X} = R$.
• $X = -\frac{1}{\lambda}\ln(1 - R)$.
• For $i = 1, 2, \ldots, n$, compute $X_i = -\frac{1}{\lambda}\ln(1 - R_i)$.

10.3 Inverse Transform for the Uniform Distribution

Following are the specific steps required to obtain uniformly distributed random variates between $a$ and $b$ from a random number stream using the inverse transform method.

• $F(x) = \frac{x - a}{b - a}$.
• Set $F(X) = \frac{X - a}{b - a} = R$.
• $X = a + (b - a)R$.
• For $i = 1, 2, \ldots, n$, compute $X_i = a + (b - a)R_i$.

10.4 Inverse Transform for the Weibull Distribution

Following are the specific steps required to obtain Weibull distributed random variates with parameters $\alpha$ and $\beta$ from a random number stream using the inverse transform method.

• $F(x) = 1 - e^{-(x/\alpha)^\beta}$.
• Set $F(X) = 1 - e^{-(X/\alpha)^\beta} = R$.
• $X = \alpha[-\ln(1 - R)]^{1/\beta}$.
• For $i = 1, 2, \ldots, n$, compute $X_i = \alpha[-\ln(1 - R_i)]^{1/\beta}$.

10.5 Inverse Transform for the Triangular Distribution

Following are the specific steps required to obtain random variates with a triangular distribution between 0 and 2 with mode 1 from a random number stream using the inverse transform method.

• The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x \le 0 \\ \dfrac{x^2}{2} & 0 < x \le 1 \\ 1 - \dfrac{(2 - x)^2}{2} & 1 < x \le 2 \\ 1 & x > 2 \end{cases}$$

• Setting $F(X) = R$ and solving gives

$$X_i = \begin{cases} \sqrt{2 R_i} & 0 < R_i \le \frac{1}{2} \\ 2 - \sqrt{2(1 - R_i)} & \frac{1}{2} < R_i \le 1 \end{cases}$$
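The inverse transform formulas above translate directly into code. The following sketch implements the exponential, Weibull and (0, 1, 2) triangular cases; parameter values are arbitrary.

import math
import random

# Sketch of the inverse transform method for three distributions.
random.seed(7)

def exponential_variate(lam, r):
    return -math.log(1 - r) / lam

def weibull_variate(alpha, beta, r):
    return alpha * (-math.log(1 - r)) ** (1 / beta)

def triangular_012_variate(r):
    # Triangular on [0, 2] with mode 1
    return math.sqrt(2 * r) if r <= 0.5 else 2 - math.sqrt(2 * (1 - r))

rs = [random.random() for _ in range(5)]
print([exponential_variate(0.5, r) for r in rs])
print([weibull_variate(2.0, 1.5, r) for r in rs])
print([triangular_012_variate(r) for r in rs])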
10.6 Inverse Transform for Empirical Distributions

If no appropriate distribution can be found for the data, one can resort to resampling the data. This creates an empirical distribution. A simple empirical distribution can be produced from given data by piecewise linear approximation. Assume the available data points (observations) are arranged in increasing order $x_1, x_2, \ldots, x_n$. Assume also that a probability is assigned to each resulting interval $[x_{j-1}, x_j]$ such that the cumulative probability of the first $j$ intervals is $c_j$. The associated random variate is obtained as

$$X_i = x_{j-1} + \frac{x_j - x_{j-1}}{c_j - c_{j-1}}(R_i - c_{j-1})$$

when $c_{j-1} < R_i \le c_j$.

10.7 Inverse and Direct Transforms for the Normal Distribution

The normal distribution does not have a closed-form inverse transformation. However, the following expression is an excellent approximation to the inverse cumulative distribution function of the standard normal distribution:

$$X_i \approx \frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975}$$

From the above, random variates with a normal distribution of mean $\mu$ and standard deviation $\sigma$ are readily obtained as

$$X_i \approx \mu + \sigma\left(\frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975}\right)$$

A direct transformation can be used to produce two independent standard normal variates $Z_1$ and $Z_2$ from two random numbers $R_1$ and $R_2$ according to

$$Z_1 = (-2\ln R_1)^{1/2}\cos(2\pi R_2)$$

and

$$Z_2 = (-2\ln R_1)^{1/2}\sin(2\pi R_2)$$

Normal random variates $X_i$ with mean $\mu$ and standard deviation $\sigma$ can then be obtained from

$$X_i = \mu + \sigma Z_i$$

10.8 Inverse and Direct Transforms for the Lognormal Distribution

If the random variable $Y$ has the normal distribution with mean $\mu$ and variance $\sigma^2$, the associated random variable $X = e^Y$ has the lognormal distribution with parameters $\mu$ and $\sigma^2$. Thus, random variates with a standard lognormal distribution can be generated from the expression

$$X_i \approx \exp\left(\frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975}\right)$$

Random variates with a lognormal distribution of parameters $\mu$ and $\sigma$ are then generated by

$$X_i \approx \exp\left[\mu + \sigma\left(\frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975}\right)\right]$$

10.9 Inverse Transform for Discrete Distributions

A procedure similar to the one indicated above can be used to produce discretely distributed random variates. Since the cumulative distribution functions of discrete distributions consist of discrete jumps separated by horizontal plateaus, lookup tables are a convenient and very efficient method of generating the inverses.

10.10 Other Methods of Generating Random Variates

When two or more random variables are added together to produce a new random variable with a desired distribution, one is using the method of convolution. If one generates the random variate by selectively accepting or rejecting numbers from a random number stream, one is using the acceptance-rejection technique. Detailed descriptions of these two methods, as well as examples, can be found in your textbook.
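Finally, here is a sketch of the direct (Box-Muller) transform of Section 10.7 and its lognormal companion of Section 10.8; $\mu$ and $\sigma$ are arbitrary values.

import math
import random

# Sketch of the Box-Muller direct transform and the lognormal variates
# it yields; mu and sigma are arbitrary.
random.seed(8)

def box_muller(r1, r2):
    z1 = math.sqrt(-2 * math.log(r1)) * math.cos(2 * math.pi * r2)
    z2 = math.sqrt(-2 * math.log(r1)) * math.sin(2 * math.pi * r2)
    return z1, z2

mu, sigma = 5.0, 2.0
r1 = 1 - random.random()  # in (0, 1], avoids log(0)
r2 = random.random()
z1, z2 = box_muller(r1, r2)
x1, x2 = mu + sigma * z1, mu + sigma * z2  # normal variates
y1, y2 = math.exp(x1), math.exp(x2)        # lognormal variates
print(x1, x2, y1, y2)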