Chapter IV: Probability Distributions and Their Applications

4.1 DISCRETE DISTRIBUTIONS

4.1.1 Binomial distribution

Consider a discrete time scale. At each point on this time scale, an event may either occur or not occur. Let the probability of the event occurring be $p$ for every point on the time scale, so that the occurrence of the event at any point on the time scale is independent of the history of any prior occurrences or non-occurrences. The probability of an occurrence at the $i$th point on the time scale is $p$ for $i = 1, 2, \ldots$. A process having these properties is said to be a Bernoulli process.

As an example of a Bernoulli process, consider that during any year the probability of the maximum flow exceeding 10,000 cubic feet per second (cfs) on a particular river is $p$. Common terminology for a flow exceeding a given value is an exceedance. Further consider that the peak flow in any year is independent from year to year (a necessary condition for the process to be a Bernoulli process). Let $q = 1 - p$ be the probability of not exceeding 10,000 cfs. We can neglect the probability of a peak of exactly 10,000 cfs: peak flow is a continuous quantity, so the probability that it equals 10,000 cfs exactly is zero. In this example, the time scale is discrete, with the points nominally 1 year apart.

We can now make certain probabilistic statements about the occurrence of a peak flow in excess of 10,000 cfs (an exceedance). For example, the probability of an exceedance occurring in year 3 and not in years 1 or 2 is $qqp$, since the process is independent from year to year. The probability of exactly one exceedance in any 3-year period is $pqq + qpq + qqp$, since the exceedance could occur in either the first, second, or third year; thus the probability of exactly one exceedance in three years is $3pq^2$. In a similar manner, the probability of 2 exceedances in 5 years can be found from the summation of the terms $ppqqq$, $pqqpq$, $\ldots$, $qqqpp$. Each of these terms is equivalent to $p^2 q^3$, and the number of terms is equal to the number of ways of arranging 2 items (the $p$'s) among 5 items (the $p$'s and $q$'s). Therefore the total number of terms is $\binom{5}{2} = 10$, so the probability of exactly 2 exceedances in 5 years is $10 p^2 q^3$.

This result can be generalized: the probability of $X = x$ exceedances in $n$ years is $\binom{n}{x} p^x q^{n-x}$. The result is applicable to any Bernoulli process, so the probability of $x$ occurrences of an event in $n$ independent trials, where $p$ is the probability of an occurrence in a single trial, is given by:

$$f_X(x; n, p) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

This equation is known as the binomial distribution. The binomial distribution and the Bernoulli process are not limited to a time scale. Any process that may occur with probability $p$ at discrete points in time or space, or in individual trials, may be a Bernoulli process and follow the binomial distribution. The cumulative binomial distribution is

$$F_X(x; n, p) = \sum_{i=0}^{x} \binom{n}{i} p^i q^{n-i}, \qquad x = 0, 1, 2, \ldots, n$$

and gives the probability of $x$ or fewer occurrences of an event in $n$ independent trials if the probability of an occurrence in any trial is $p$.
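The binomial mass and cumulative functions translate directly into a few lines of code. The following Python sketch (the annual exceedance probability $p = 0.1$ is an assumed illustrative value, and the function names are ours) reproduces the hand count of $10 p^2 q^3$:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """f_X(x; n, p): probability of exactly x occurrences in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x: int, n: int, p: float) -> float:
    """F_X(x; n, p): probability of x or fewer occurrences in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))

p = 0.1  # assumed annual exceedance probability, for illustration only
q = 1 - p
print(binom_pmf(2, 5, p))   # P(exactly 2 exceedances in 5 years)
print(10 * p**2 * q**3)     # same value, from the hand count 10*p^2*q^3
print(binom_cdf(2, 5, p))   # P(2 or fewer exceedances in 5 years)
```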
Continuing the above example, the probability of fewer than 3 exceedances in 5 years is

$$F_X(2; 5, p) = \sum_{i=0}^{2} \binom{5}{i} p^i q^{5-i} = f_X(0; 5, p) + f_X(1; 5, p) + f_X(2; 5, p)$$

The mean and variance of the binomial distribution are

$$E(X) = np, \qquad \mathrm{var}(X) = npq$$

The coefficient of skew is $(q - p)/\sqrt{npq}$, so the distribution is symmetrical for $p = q$, skewed to the right for $q > p$, and skewed to the left for $q < p$.

The binomial distribution has an additive property: if $X$ has a binomial distribution with parameters $n_1$ and $p$, and $Y$ has a binomial distribution with parameters $n_2$ and $p$, then $Z = X + Y$ has a binomial distribution with parameters $n = n_1 + n_2$ and $p$.

The binomial distribution can be used to approximate the hypergeometric distribution if the sample selected is small in comparison to the number of items $N$ from which the sample is drawn. In this case the probability of a success is about the same for each trial.

Example: What should the return period of a design storm be in order to be 90 percent sure that the storm is not exceeded in a 10-year period?

Solution: Let $p$ be the probability of the design storm being exceeded in any year. The probability of no exceedances in 10 years is

$$f_X(0; 10, p) = \binom{10}{0} p^0 q^{10} = (1 - p)^{10}$$

Setting $(1 - p)^{10} = 0.90$ gives $p = 1 - (0.90)^{1/10} = 1 - 0.9895 = 0.0105$, so $T = 1/p = 95$ years.

Comment: To be 90 percent sure that a design storm is not exceeded in a 10-year period, a 95-year return period storm must be used. If a 10-year return period storm is used instead, the chance of it being exceeded is $1 - f_X(0; 10, 0.1) = 0.6513$. In general, the chance of at least one occurrence of a $T$-year event in $T$ years is $1 - f_X(0; T, 1/T) = 1 - (1 - 1/T)^T$. Therefore, for a long design life, the chance of at least one occurrence of an event with return period equal to the design life approaches $1 - 1/e$, or 0.632. Thus, if the design life of a structure and its design return period are the same, the chances are very great that the capacity of the structure will be exceeded during its design life.
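A minimal sketch of the return-period calculation in the example above (the function name is ours; the printed values reproduce the 95-year result, the 0.6513 exceedance chance, and the $1 - 1/e$ limit):

```python
from math import e

def required_return_period(reliability: float, design_life: int) -> float:
    """Smallest T = 1/p such that (1 - p)**design_life = reliability."""
    p = 1 - reliability ** (1 / design_life)
    return 1 / p

print(required_return_period(0.90, 10))   # about 95 years
print(1 - (1 - 1 / 10) ** 10)             # 0.6513: chance a 10-yr event occurs in 10 yrs
print(1 - 1 / e)                          # 0.632: limit for long design lives
```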
4.1.2 Poisson distribution

The Poisson distribution is like the binomial distribution in that it describes phenomena for which the average probability of an event is constant, independent of the number of previous events. In this case, however, the system undergoes transitions randomly from one state with $n$ occurrences of an event to another with $(n+1)$ occurrences, in a process that is irreversible: the ordering of the events cannot be interchanged. Another distinction between the binomial and Poisson distributions is that for the Poisson process the number of possible events should be large.

The Poisson distribution may be inferred from the identity $e^{-\mu} e^{\mu} = 1$, where $\mu$ is the most probable number of occurrences of the event. If $e^{\mu}$ is expanded in a power series, the probability $p(r)$ that exactly $r$ random occurrences will take place can be inferred as the $r$th term in the series, i.e.,

$$p(r) = \frac{e^{-\mu} \mu^r}{r!} \qquad (4.1.2.1)$$

This probability distribution leads directly to the interpretation that:

$e^{-\mu}$ = the probability that the event will not occur,
$\mu e^{-\mu}$ = the probability that the event will occur exactly once,
$(\mu^2/2!)\, e^{-\mu}$ = the probability that the event will occur exactly twice, etc.

The mean and the variance of the Poisson distribution are:

$$E(X) = \mu, \qquad \mathrm{Var}(X) = \mu$$

The coefficient of skew is $\mu^{-1/2}$, so that as $\mu$ gets large the distribution goes from a positively skewed distribution to a nearly symmetrical one. The cumulative Poisson probability that an event will occur $x$ times or fewer is:

$$p(\le x) = \sum_{r=0}^{x} p(r)$$

The probability that the event will occur $(x+1)$ or more times is, of course, the complement of $p(\le x)$.

The Poisson distribution is useful for analyzing the failure of a system that consists of a large number of identical components that, upon failure, cause irreversible transitions in the system. Each component is assumed to fail independently and randomly. Then $\mu$ is the most probable number of system failures over the lifetime. To summarize:

- The binomial distribution is useful for systems with two possible outcomes of events (failure or no failure) in cases where there is a known, finite number of (Bernoulli) trials and the ordering of the trials does not affect the outcome.
- The Poisson distribution treats systems in which randomly occurring phenomena cause irreversible transitions from one state to another.

Example: A given nuclear reactor is fueled with 200 assemblies, each of which can fail if the cladding on a fuel rod fails. If each assembly fails in an independent and random manner over the exposure time, calculate the probability of 3 assemblies failing if, on average, 1% of the fuel assemblies are known to fail. (MacCormick, 1981, p. 34)

Solution: The mean number of assembly failures is $\mu = 200 \times 0.01 = 2$, so using equation (4.1.2.1) with $r = 3$ gives

$$p(3) = \frac{2^3}{3!} e^{-2} = 0.1804$$

As a check, we can use the probability of a single assembly failing, $p = 0.01$, and the binomial distribution with $n = 200$ to obtain

$$p(3) = \frac{200!}{3!\,(200-3)!} (0.01)^3 (0.99)^{197} = 0.1814$$

4.1.3 Hypergeometric distribution

Drawing a random sample of size $n$ (without replacement) from a finite population of size $N$, with the elements of the population divided into two groups of which $k$ elements belong to one group, is an example of sampling from a hypergeometric distribution. The two groups may be defective or non-defective objects, rainy or non-rainy days, success or failure of a project, etc.

The total number of possible outcomes, or ways of selecting a sample of size $n$ from $N$ objects, is $\binom{N}{n}$. The number of ways of selecting $x$ successes and $n - x$ failures from the population containing $k$ successes and $N - k$ failures is $\binom{k}{x}\binom{N-k}{n-x}$. Thus the probability is:

$$f(x; N, n, k) = \binom{k}{x} \binom{N-k}{n-x} \Big/ \binom{N}{n} \qquad (4.1)$$

The distribution given by equation (4.1) is known as the hypergeometric distribution, where $f(x; N, n, k)$ is the probability of obtaining $X = x$ successes in a sample of size $n$ drawn from a population of size $N$ containing $k$ successes. The cumulative hypergeometric distribution, giving the probability of $x$ or fewer successes, is:

$$F(x; N, n, k) = \sum_{i=0}^{x} \binom{k}{i} \binom{N-k}{n-i} \Big/ \binom{N}{n} \qquad (4.2)$$

The natural restriction on this distribution is that the outcomes must be random and equally likely.
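The reactor example can be checked numerically. The sketch below (function names are ours) compares the Poisson approximation with the exact binomial:

```python
from math import comb, exp, factorial

def poisson_pmf(r: int, mu: float) -> float:
    """p(r) = exp(-mu) * mu**r / r!, eq. (4.1.2.1)."""
    return exp(-mu) * mu**r / factorial(r)

n, p, r = 200, 0.01, 3                       # reactor example data
print(poisson_pmf(r, n * p))                 # 0.1804, Poisson with mu = 2
print(comb(n, r) * p**r * (1 - p)**(n - r))  # 0.1814, exact binomial check
```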
The mean and variance of the hypergeometric distribution are:

$$E(X) = \frac{nk}{N}, \qquad \mathrm{var}(X) = \frac{nk(N-k)(N-n)}{N^2(N-1)}$$

4.1.4 Exponential distribution

The probability distribution of the time $T$ between occurrences of the event can be found by noting that $\mathrm{prob}(T \le t)$ is equal to $1 - \mathrm{prob}(T > t)$. The quantity $\mathrm{prob}(T > t)$ is equal to the probability of no occurrences in time $t$, which is $f(0; \lambda t)$, or $e^{-\lambda t}$. Thus

$$\mathrm{prob}(T \le t) = P_T(t; \lambda) = 1 - e^{-\lambda t} \qquad (4.3)$$

which is a cumulative distribution known as the exponential distribution. The probability density function is

$$p_T(t; \lambda) = \frac{d P_T(t; \lambda)}{dt} = \lambda e^{-\lambda t} \qquad (4.4)$$

and is the probability distribution of the length of the time interval between occurrences of the event. The mean and variance of the exponential distribution are $1/\lambda$ and $1/\lambda^2$, respectively.

4.1.5 Gamma distribution

The probability distribution of the time to the $n$th occurrence can be found by noting that the time to the $n$th occurrence is the sum of $n$ independent random variables, $T_1 + T_2 + \cdots + T_n$, each from the exponential distribution. The method of derived distributions can be used, with the result that the probability density function of the time to the $n$th occurrence is

$$p_T(t; n, \lambda) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}, \qquad t > 0;\ \lambda > 0;\ n = 1, 2, \ldots \qquad (4.5)$$

which is the gamma distribution. The mean and variance of the gamma distribution are

$$E(T) = n/\lambda, \qquad \mathrm{Var}(T) = n/\lambda^2$$

Example: Barges arrive at a lock at an average rate of 4 each hour. (a) If the arrival of barges at the lock can be considered to follow a Poisson process, what is the probability that 6 barges will arrive in 2 hours? (b) If the lock master has just locked through all of the barges at the lock, what is the probability that he can take a 15-minute break without another barge arriving? (c) If the operation of the lock is such that 4 barges can be locked through at once, and the lock master insists that this always be the case, what is the probability that the first barge to arrive after 4 previous barges have been locked through will have to wait at least 1 hour before being locked through? (Haan, 1979, p. 78)

4.1.6 Multinomial distribution

The binomial distribution can be generalized to include the probabilities of outcomes of several types rather than the two possible outcomes of the binomial. If the probabilities associated with each of $k$ distinct outcomes are $p_1, p_2, \ldots, p_k$, then in $n$ independent trials the probability of $x_1$ outcomes of type 1, $x_2$ outcomes of type 2, ..., $x_k$ outcomes of type $k$ is given by the multinomial distribution as

$$f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k; n, p_1, p_2, \ldots, p_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

or

$$f_{\mathbf{X}}(\mathbf{x}; n, \mathbf{p}) = n! \prod_{i=1}^{k} \frac{p_i^{x_i}}{x_i!}$$

where $\mathbf{X}$, $\mathbf{x}$, and $\mathbf{p}$ are $1 \times k$ vectors. Some restrictions on this distribution are

$$\sum_{i=1}^{k} p_i = 1 \qquad \text{and} \qquad \sum_{i=1}^{k} x_i = n$$

The mean and variance of the multinomial distribution are

$$E(X_i) = n p_i, \qquad \mathrm{var}(X_i) = n p_i (1 - p_i)$$

Problem: On a certain stream the probability that the maximum peak flow during a 1-year period will be less than 5,000 cfs is 0.2, and the probability that it will be between 5,000 cfs and 10,000 cfs is 0.4. In a 20-year period, what is the probability of 4 peak flows less than 5,000 cfs and 8 peak flows between 5,000 and 10,000 cfs? (Haan, 1979, p. 80)
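A short numeric sketch of the multinomial computation for this problem (by the stated restrictions, the third class, peaks above 10,000 cfs, has probability $1 - 0.2 - 0.4 = 0.4$ and count $20 - 4 - 8 = 8$; the function name is ours):

```python
from math import factorial

def multinomial_pmf(xs, ps, n):
    """f(x_1..x_k; n, p_1..p_k) = n! * prod(p_i**x_i / x_i!)."""
    assert sum(xs) == n and abs(sum(ps) - 1.0) < 1e-9
    prob = float(factorial(n))
    for x, p in zip(xs, ps):
        prob *= p**x / factorial(x)
    return prob

# n = 20 years; classes: <5000 cfs, 5000-10000 cfs, >10000 cfs
print(multinomial_pmf((4, 8, 8), (0.2, 0.4, 0.4), 20))   # ~0.0428
```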
4.2 CONTINUOUS DISTRIBUTIONS

4.2.1 Normal distribution

The normal distribution is a two-parameter distribution whose density function is

$$p_X(x) = (2\pi\sigma^2)^{-1/2}\, e^{-(x-\mu)^2/2\sigma^2}, \qquad -\infty < x < \infty \qquad (4.6)$$

The parameters $\mu$ (mean) and $\sigma^2$ (variance) are denoted the location and scale parameters, respectively. The normal distribution is a bell-shaped, continuous, and symmetrical distribution (the coefficient of skew is zero). If $\mu$ is held constant and $\sigma^2$ varied, the distribution changes scale as in Figure 4.2.1.1. If $\sigma^2$ is held constant and $\mu$ varied, the distribution does not change scale but does change location, as in Figure 4.2.1.2. A common notation for indicating that a random variable is normally distributed with mean $\mu$ and variance $\sigma^2$ is $N(\mu, \sigma^2)$.

Figure 4.2.1.1 Normal distributions with the same mean and different variances

Figure 4.2.1.2 Normal distributions with the same variance and different means

If a random variable $X$ is $N(\mu, \sigma^2)$ and $Y = a + bX$, the distribution of $Y$ can be shown to be $N(a + b\mu, b^2\sigma^2)$. This can be proven using the method of derived distributions. Furthermore, if the $X_i$ for $i = 1, 2, \ldots, n$ are independently and normally distributed with means $\mu_i$ and variances $\sigma_i^2$, then $Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$ is normally distributed with

$$\mu_Y = a + \sum_{i=1}^{n} b_i \mu_i \qquad (4.7)$$

and

$$\sigma_Y^2 = \sum_{i=1}^{n} b_i^2 \sigma_i^2 \qquad (4.8)$$

Any linear function of independent normal random variables is also a normal random variable.

Example: If the $X_i$ are random observations from the distribution $N(\mu, \sigma^2)$, what is the distribution of $\bar{X} = \sum_{i=1}^{n} X_i / n$?

Solution: $\bar{X}$ is a linear function of the $X_i$ given by $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$, i.e., with $a = 0$ and $b_i = 1/n$. From equations (4.7) and (4.8) (the reproductive properties of the normal distribution), $\bar{X}$ is normally distributed with mean

$$\mu_{\bar{X}} = a + \sum_{i=1}^{n} b_i \mu_i = 0 + \sum_{i=1}^{n} \mu/n = n\mu/n = \mu$$

and variance

$$\sigma_{\bar{X}}^2 = \sum_{i=1}^{n} b_i^2 \sigma_i^2 = \sum_{i=1}^{n} \sigma^2/n^2 = n\sigma^2/n^2 = \sigma^2/n$$

Therefore $\bar{X}$ is $N(\mu, \sigma^2/n)$.

Standard normal distribution: The probability that $X$ is less than or equal to $x$ when $X$ is $N(\mu, \sigma^2)$ can be evaluated from

$$\mathrm{prob}(X \le x) = P_X(x) = \int_{-\infty}^{x} (2\pi\sigma^2)^{-1/2} e^{-(t-\mu)^2/2\sigma^2}\, dt \qquad (4.9)$$

Equation (4.9) cannot be evaluated analytically, so approximate methods of integration are required. If a tabulation of the integral were made, a separate table would be required for each pair of values of $\mu$ and $\sigma^2$. By using the linear transformation $Z = (X - \mu)/\sigma$, the random variable $Z$ will be $N(0, 1)$. The random variable $Z$ is said to be standardized (it has $\mu = 0$ and $\sigma^2 = 1$), and $N(0, 1)$ is said to be the standard normal distribution. The standard normal density is given by

$$p_Z(z) = (2\pi)^{-1/2} e^{-z^2/2}, \qquad -\infty < z < \infty \qquad (4.10)$$

and the cumulative standard normal by

$$\mathrm{prob}(Z \le z) = P_Z(z) = \int_{-\infty}^{z} (2\pi)^{-1/2} e^{-t^2/2}\, dt \qquad (4.11)$$

Figure 4.2.1.3 shows the standard normal distribution, which, along with the transformation $Z = (X - \mu)/\sigma$, contains all of the information shown in Figures 4.2.1.1 and 4.2.1.2. Both $p_Z(z)$ and $P_Z(z)$ are widely tabulated. Most tables utilize the symmetry of the normal distribution, so that only positive values of $z$ are shown. Tables of $P_Z(z)$ may show $\mathrm{prob}(Z \le z)$ or $\mathrm{prob}(0 \le Z \le z)$; care must be exercised when using normal probability tables to see which values are tabulated.
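In code, the standardizing transformation and equation (4.11) can be evaluated with the error function rather than a table. The sketch below (function names are ours) also computes the sigma bounds discussed next:

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """P_Z(z) of eq. (4.11), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """prob(X <= x) for X ~ N(mu, sigma^2), using Z = (X - mu)/sigma."""
    return std_normal_cdf((x - mu) / sigma)

for k in (1, 2, 3):   # the 1, 2 and 3 sigma bounds
    print(k, std_normal_cdf(k) - std_normal_cdf(-k))
```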
Figure 4.2.1.3 Standard normal distribution

By studying a standard normal table it can be seen that 68.27% of the normal distribution is within 1 standard deviation of the mean, 95.45% within 2 standard deviations, and 99.73% within 3 standard deviations. These are called the 1, 2 and 3 sigma bounds of the normal distribution. The fact that only 0.27% of the area of the normal distribution lies outside the 3 sigma bounds, so that the probability of a value less than $\mu - 3\sigma$ is only about 0.14%, is the justification for using the normal distribution in some instances even though the random variable under consideration may be bounded below by $X = 0$. If $\mu$ is greater than $3\sigma$, the chance of an $X$ less than zero is often negligible (this is not always true, however).

4.2.2 Uniform/rectangular distribution

If a continuous random process is defined over an interval $\alpha$ to $\beta$ and the probability of an outcome of this process being in a subinterval of $\alpha$ to $\beta$ is proportional to the length of the subinterval, the process is said to be uniformly distributed over the interval $\alpha$ to $\beta$. The probability density function for the continuous uniform distribution is

$$p_X(x) = \frac{1}{\beta - \alpha}, \qquad \alpha \le x \le \beta$$

and the cumulative distribution function is

$$P_X(x) = \frac{x - \alpha}{\beta - \alpha}, \qquad \alpha \le x \le \beta$$

The mean and variance of the uniform distribution are

$$E(X) = \frac{\beta + \alpha}{2}, \qquad \mathrm{var}(X) = \frac{(\beta - \alpha)^2}{12}$$

The method of moments yields the following estimators for the parameters $\alpha$ and $\beta$:

$$\hat{\alpha} = \bar{X} - \sqrt{3}\, S, \qquad \hat{\beta} = \bar{X} + \sqrt{3}\, S$$

The method of maximum likelihood, when applied to the uniform distribution, results in the estimators for $\alpha$ and $\beta$ being the smallest and largest sample values, respectively. That this is the case can be seen by writing out the likelihood function and then selecting those values of $\alpha$ and $\beta$ (within the constraint that $\alpha \le X \le \beta$) that maximize the function.

The uniform distribution finds its greatest application as the distribution of $P_X(x)$ for all probability density functions. That is, $\mathrm{prob}(P_X(x) \le y)$ is uniformly distributed over the interval $0 \le y \le 1$ for any continuous probability distribution. This fact is used for generating random observations from probability distributions.

Example: Use the method of moments to estimate the parameters of the uniform distribution based on the following sample: 1, 4, 3, 4, 5, 6, 7, 6, 9, 5. What are the maximum likelihood estimators for this sample? (Haan, 1979, p. 98)

Solution: By the method of moments, $\bar{x} = 5.00$ and $s = 2.21$, so

$$\hat{\beta} = \bar{x} + \sqrt{3}\, s = 8.83, \qquad \hat{\alpha} = \bar{x} - \sqrt{3}\, s = 1.17$$

By maximum likelihood, $\hat{\alpha} = 1.00$ and $\hat{\beta} = 9.00$.

Comment: This problem illustrates that the method of moments and the method of maximum likelihood do not always produce the same parameter estimates. In this case the parameters estimated by moments are not reasonable, since values of $X$ outside the limits of $\hat{\alpha}$ and $\hat{\beta}$ are present in the sample. This is a common problem when the method of moments is used to estimate the parameters of the uniform distribution for small samples. Of course, for large samples neither the moment nor the maximum likelihood estimates will be "good" if the sample is not truly a random sample from a uniform distribution.
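The example's two estimation methods are easy to reproduce. A sketch using the sample above:

```python
from math import sqrt

sample = [1, 4, 3, 4, 5, 6, 7, 6, 9, 5]
n = len(sample)
xbar = sum(sample) / n
s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))   # sample standard deviation

alpha_mom, beta_mom = xbar - sqrt(3) * s, xbar + sqrt(3) * s   # method of moments
alpha_mle, beta_mle = min(sample), max(sample)                 # maximum likelihood

print(xbar, round(s, 2))    # 5.0, 2.21
print(alpha_mom, beta_mom)  # ~1.17, ~8.83
print(alpha_mle, beta_mle)  # 1, 9
```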
4.2.3 Exponential distribution

The exponential density function is given by

$$p_X(x) = \lambda e^{-\lambda x}, \qquad x > 0,\ \lambda > 0$$

and the cumulative exponential by

$$P_X(x) = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}, \qquad x > 0$$

The mean and variance of the exponential distribution are

$$E(X) = 1/\lambda, \qquad \mathrm{var}(X) = 1/\lambda^2$$

The exponential distribution is positively skewed, with a skewness coefficient of 2. Both the method of moments and maximum likelihood estimation give the parameter estimate $\hat{\lambda} = 1/\bar{X}$. The exponential distribution is a special case of the gamma distribution.

4.2.4 Hypo-exponential distribution

Many processes in nature can be divided into sequential phases. If the time the process spends in each phase is independent and exponentially distributed, then the overall time is hypo-exponentially distributed. The service times for input-output operations in a computer system often possess this distribution.

Let $X_i$, $i = 1, \ldots, n$, be independent exponential random variables with respective rates $\lambda_i$, $i = 1, \ldots, n$, and suppose that $\lambda_i \ne \lambda_j$ for $i \ne j$. The random variable $S = \sum_{i=1}^{n} X_i$ is said to be a hypo-exponential random variable. To compute its probability density function, let us start with the case $n = 2$:

$$f_S(t) = f_{X_1 + X_2}(t) = \int_0^t f_{X_1}(s) f_{X_2}(t - s)\, ds = \int_0^t \lambda_1 e^{-\lambda_1 s} \lambda_2 e^{-\lambda_2 (t-s)}\, ds$$
$$= \lambda_1 \lambda_2 e^{-\lambda_2 t} \int_0^t e^{-(\lambda_1 - \lambda_2)s}\, ds = \frac{\lambda_1}{\lambda_1 - \lambda_2}\, \lambda_2 e^{-\lambda_2 t} \left(1 - e^{-(\lambda_1 - \lambda_2)t}\right)$$
$$= \frac{\lambda_1}{\lambda_1 - \lambda_2}\, \lambda_2 e^{-\lambda_2 t} + \frac{\lambda_2}{\lambda_2 - \lambda_1}\, \lambda_1 e^{-\lambda_1 t}$$

A similar computation yields, for $n = 3$,

$$f_S(t) = f_{X_1 + X_2 + X_3}(t) = \sum_{i=1}^{3} \lambda_i e^{-\lambda_i t} \Big( \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i} \Big)$$

which suggests the general result

$$f_S(t) = f_{X_1 + \cdots + X_n}(t) = \sum_{i=1}^{n} C_{i,n}\, \lambda_i e^{-\lambda_i t}, \qquad \text{where} \quad C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}$$

For $n = 2$, the distribution function of the hypo-exponential distribution is

$$F_S(t) = 1 - \frac{\lambda_2}{\lambda_2 - \lambda_1} e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_2 - \lambda_1} e^{-\lambda_2 t}, \qquad t \ge 0$$

and the hazard rate (failure rate) is

$$h_S(t) = \frac{\lambda_1 \lambda_2 \left( e^{-\lambda_1 t} - e^{-\lambda_2 t} \right)}{\lambda_2 e^{-\lambda_1 t} - \lambda_1 e^{-\lambda_2 t}}$$

It is not difficult to see that this is an increasing failure rate (IFR) distribution, with the failure rate increasing from 0 to $\min\{\lambda_1, \lambda_2\}$.

Integrating both sides of the expression for $f_S$ from $t$ to $\infty$ yields the tail distribution function of $S$:

$$P\{S > t\} = \sum_{i=1}^{n} C_{i,n}\, e^{-\lambda_i t}$$

The failure rate function of $S$, $r_S(t)$, is

$$r_S(t) = \frac{\sum_{i=1}^{n} C_{i,n}\, \lambda_i e^{-\lambda_i t}}{\sum_{i=1}^{n} C_{i,n}\, e^{-\lambda_i t}}$$

If we let $\lambda_j = \min(\lambda_1, \ldots, \lambda_n)$, then it follows, upon multiplying the numerator and denominator of $r_S(t)$ by $e^{\lambda_j t}$, that

$$\lim_{t \to \infty} r_S(t) = \lambda_j$$

From the preceding we can conclude that the remaining lifetime of a hypo-exponentially distributed item that has survived to age $t$ is, for large $t$, approximately that of an exponentially distributed random variable with rate equal to the minimum of the rates of the random variables whose sum makes up the hypo-exponential.
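A sketch of the general hypo-exponential density, cross-checked against a direct numerical convolution for $n = 2$ (the rates 1.0 and 3.0 and the evaluation point are assumed illustrative values):

```python
from math import exp

def hypoexp_pdf(t, rates):
    """f_S(t) = sum_i C_{i,n} lam_i exp(-lam_i t); rates must be distinct."""
    total = 0.0
    for i, li in enumerate(rates):
        c = 1.0
        for j, lj in enumerate(rates):
            if j != i:
                c *= lj / (lj - li)
        total += c * li * exp(-li * t)
    return total

l1, l2, t = 1.0, 3.0, 0.7     # assumed rates and evaluation point
m = 100_000                   # midpoint-rule convolution of the two exponentials
h = t / m
conv = sum(l1 * exp(-l1 * ((k + 0.5) * h)) * l2 * exp(-l2 * (t - (k + 0.5) * h)) * h
           for k in range(m))
print(hypoexp_pdf(t, [l1, l2]), conv)   # the two values should agree closely
```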
4.2.5 Erlangian distribution

The Erlangian distribution is the time-dependent form of the Poisson discrete distribution. It arises frequently in reliability engineering calculations involving random failures, i.e., those failures for which the hazard rate $\lambda(t)$ is a constant $\lambda$. The parameter $\lambda$ is the average rate of occurrence of the event. To derive the distribution, we recognize that the mean number of failures $\mu$ is the product of $\lambda$ and the time $t$. The probability of exactly $r$ failures occurring in time $t$ is then given by

$$p(r, t) = \frac{e^{-\lambda t} (\lambda t)^r}{r!} \qquad (4.5.1)$$

and the cumulative probability of $x$ or fewer failures is

$$p(\le x, t) = \sum_{r=0}^{x} \frac{e^{-\lambda t} (\lambda t)^r}{r!} \qquad (4.5.2)$$

Equation (4.5.1) is useful since it permits calculation of the failure probability density $f(t)$ for the $r$th failure in $dt$ about $t$. What is required, of course, is for the system to have undergone $(r-1)$ prior failures, so that it is ready to fail for the $r$th time with conditional failure rate $\lambda$. Thus the Erlangian distribution follows from equation (4.5.1) as

$$f(t) = \lambda\, p(r-1, t) = \frac{\lambda (\lambda t)^{r-1} e^{-\lambda t}}{(r-1)!}, \qquad \lambda > 0,\ r \ge 1 \qquad (4.5.3)$$

For $r = 1$ we get the exponential distribution. If the family is rescaled so that all members have a common mean at $t = 1/\lambda$ (that is, each of the $r$ phases is exponential with mean $1/(r\lambda)$), then the mode is at $t = 0$ for $r = 1$ (exponential) and at $t = (r-1)/(r\lambda)$ for other values of $r$, and the variance of the $r$th member of the family is $1/(r\lambda^2)$. As $r$ increases, the mode moves to the right toward $1/\lambda$ and the variance decreases toward 0.

The Erlang family has a close association with the exponential distribution, going beyond the fact that the first Erlang distribution is exponential: if we have $r$ independent random variables $X_1, X_2, \ldots, X_r$ with a common exponential distribution of mean $1/(r\lambda)$, then the random variable $X_1 + X_2 + \cdots + X_r$ follows the $r$th member of this normalized Erlang family. These distributions play an important role in queuing theory and were developed by A. K. Erlang for use in telephone systems.

4.2.6 Gamma distribution

The gamma failure probability density obeys the equation

$$f(t) = \frac{\lambda (\lambda t)^{r-1} e^{-\lambda t}}{\Gamma(r)}, \qquad \lambda > 0,\ r > 0 \qquad (4.6.1)$$

where the parameter $r$ need not be an integer. The two parameters are the shape parameter $r$ and the scale parameter $\lambda$. The shape of the distribution depends significantly upon the value of $r$, which also affects the hazard rate $\lambda(t)$. In the special case that $r$ is an integer, the Erlangian distribution is recovered; in the special case that $\lambda = 0.5$ and $r = 0.5\eta$, where $\eta$ is the number of degrees of freedom, the gamma distribution becomes the chi-square distribution.

The cumulative failure probability $F(t)$ is

$$F(t) = \frac{1}{\Gamma(r)} \int_0^{\lambda t} y^{r-1} e^{-y}\, dy = \frac{1}{\Gamma(r)}\, \gamma(r, \lambda t) \qquad (4.6.2)$$

The mean and variance of the gamma distribution are

$$m = r/\lambda, \qquad \sigma^2 = r/\lambda^2$$

The gamma distribution is especially appropriate for systems subjected to an environment of repetitive, random shocks generated according to the Poisson distribution; the failure probability then depends upon how many shocks the device has suffered, i.e., its age. As another application, if the mean rate of wear of a device is constant but the rate of wear is subject to random variations, then the gamma distribution should be used.

For some devices, such as those for which corrosion of metals is important, it may be appropriate to modify the two-parameter gamma distribution by introducing a time delay $\tau$ before the onset of failures begins. Then equation (4.6.1) is modified to read

$$f(t) = \frac{\lambda^r (t - \tau)^{r-1} e^{-\lambda (t - \tau)}}{\Gamma(r)}, \qquad t \ge \tau; \qquad f(t) = 0, \qquad t < \tau \qquad (4.6.3)$$

In such a case, the mean of the distribution becomes

$$m = \tau + r/\lambda \qquad (4.6.4)$$
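Equations (4.6.2) and (4.6.3) can be evaluated without tables by computing the regularized incomplete gamma function from its power series. The following sketch (our own series implementation, not a library routine) is set up with the parameter values used in the worked example that follows:

```python
from math import exp, log, lgamma

def reg_lower_gamma(r: float, x: float, tol: float = 1e-12) -> float:
    """Regularized lower incomplete gamma P(r, x), by its power series."""
    if x <= 0:
        return 0.0
    term = exp(r * log(x) - x - lgamma(r + 1))   # x^r e^{-x} / Gamma(r+1)
    total, a = term, r
    while term > tol * total:
        a += 1
        term *= x / a
        total += term
    return total

def gamma_cdf(t: float, r: float, lam: float, tau: float = 0.0) -> float:
    """F(t) of the (optionally delayed) gamma distribution, eqs. (4.6.2)-(4.6.3)."""
    return reg_lower_gamma(r, lam * (t - tau)) if t > tau else 0.0

print(gamma_cdf(4500, 3, 1e-3, tau=200))   # ~0.80
print(200 + 3 / 1e-3)                      # MTTF = tau + r/lam = 3200 hr
```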
Example: Suppose that a device subjected to repetitive random shocks satisfies a gamma distribution with parameters $r = 3$ and $\lambda = 10^{-3}$/hr, and that no failures can occur until 200 hours have passed. Estimate (a) the probability of failure after the device has operated for $t = 4500$ hours and (b) its mean time to failure. (MacCormick, 1981, p. 37)

Solution: In this problem the time displacement is $\tau = 200$ hours. Integrating equation (4.6.3) and using equation (4.6.2) gives the cumulative probability

$$F(4500) = \frac{1}{\Gamma(3)}\, \gamma\!\left(3,\ 10^{-3}(4500 - 200)\right) = \frac{1}{\Gamma(3)}\, \gamma(3, 4.3) \approx 0.8$$

Using equation (4.6.4), the mean time to failure is $\mathrm{MTTF} = 200 + 3/10^{-3} = 3200$ hr.

4.2.7 Weibull distribution

The Weibull is a very general and popular failure distribution that has been shown to apply to a large number of diverse situations. The distribution is named after Waloddi Weibull, a Swedish physicist, who used it in 1939 to represent the distribution of the breaking strength of materials. The distribution has also been used in reliability and quality control. The density function of the three-parameter Weibull distribution is given by

$$f(x) = \frac{\alpha}{\beta - \epsilon} \left( \frac{x - \epsilon}{\beta - \epsilon} \right)^{\alpha - 1} \exp\!\left[ -\left( \frac{x - \epsilon}{\beta - \epsilon} \right)^{\alpha} \right], \qquad \alpha > 0,\ \beta > 0,\ 0 \le \epsilon \le x < \infty$$

The shape of the distribution depends primarily on the shape parameter $\alpha$. The scale parameter is $\beta$ and the delay/displacement parameter is $\epsilon$. If $\alpha = 1$, the Weibull distribution reduces to the exponential distribution. As $\alpha$ increases, the Weibull distribution tends toward the normal distribution. For $\alpha = 2$, the distribution becomes the Rayleigh distribution. The Weibull distribution is also known as the bounded exponential distribution.

The cumulative distribution function is given by

$$F(x) = 1 - \exp\!\left[ -\left( \frac{x - \epsilon}{\beta - \epsilon} \right)^{\alpha} \right]$$

By using the transformation $y = \left( \frac{x - \epsilon}{\beta - \epsilon} \right)^{\alpha}$, tables of $e^{-y}$ can be used to determine $F(x)$. The mean and variance of the distribution are

$$E(X) = \epsilon + (\beta - \epsilon)\,\Gamma(1 + 1/\alpha) \qquad (4.7.1)$$

$$\mathrm{Var}(X) = (\beta - \epsilon)^2 \left[ \Gamma(1 + 2/\alpha) - \Gamma^2(1 + 1/\alpha) \right] \qquad (4.7.2)$$

The coefficient of skew is given by

$$\gamma = \frac{\Gamma(1 + 3/\alpha) - 3\Gamma(1 + 2/\alpha)\Gamma(1 + 1/\alpha) + 2\Gamma^3(1 + 1/\alpha)}{\left[ \Gamma(1 + 2/\alpha) - \Gamma^2(1 + 1/\alpha) \right]^{3/2}} \qquad (4.7.3)$$

From equations (4.7.1) and (4.7.2) we can obtain

$$\beta = \mu + \sigma A(\alpha) \qquad (4.7.4)$$

$$\epsilon = \beta - \sigma B(\alpha) \qquad (4.7.5)$$

where

$$A(\alpha) = \left[ 1 - \Gamma(1 + 1/\alpha) \right] B(\alpha) \qquad (4.7.6)$$

$$B(\alpha) = \left[ \Gamma(1 + 2/\alpha) - \Gamma^2(1 + 1/\alpha) \right]^{-1/2} \qquad (4.7.7)$$

The moment estimates for $\alpha$, $\beta$, and $\epsilon$ can now be obtained by (i) solving equation (4.7.3) for $\hat{\alpha}$, (ii) evaluating (4.7.6) and (4.7.7) for $A(\hat{\alpha})$ and $B(\hat{\alpha})$, (iii) solving (4.7.4) for $\hat{\beta}$, and (iv) solving (4.7.5) for $\hat{\epsilon}$. In Haan (1979) a table is given to simplify the calculations.

Figure 4.7.1 Typical Weibull density curves (the vertical and horizontal axes are $f(x)$ and $x - \epsilon$, respectively)

Example: The lifetime $X$ in hours of a component is modeled by a Weibull distribution with $\alpha = 2$, in the two-parameter form $F_X(x) = 1 - e^{-\lambda x^2}$. Starting with a large number of components, it is observed that 15% of the components that have lasted 90 hours fail before 100 hours. Determine the parameter $\lambda$. (Trivedi, 2004, p. 130)

Solution: $F_X(x) = 1 - e^{-\lambda x^2}$, and we are given that $P(X < 100 \mid X > 90) = 0.15$. Also,

$$P(X < 100 \mid X > 90) = \frac{P(90 < X < 100)}{P(X > 90)} = \frac{F_X(100) - F_X(90)}{1 - F_X(90)} = \frac{e^{-\lambda (90)^2} - e^{-\lambda (100)^2}}{e^{-\lambda (90)^2}}$$

Equating the two expressions and solving for $\lambda$ gives

$$\lambda = -\ln(0.85)/1900 = 0.1625/1900 = 0.00008554$$
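A quick check of the Weibull example's algebra (two-parameter form with $\alpha = 2$, as in the example):

```python
from math import exp, log

lam = -log(0.85) / (100**2 - 90**2)   # = -ln(0.85)/1900, ~8.554e-05

def weibull_cdf(x: float) -> float:
    return 1 - exp(-lam * x**2)       # alpha = 2

cond = (weibull_cdf(100) - weibull_cdf(90)) / (1 - weibull_cdf(90))
print(lam, cond)                      # cond recovers the given 0.15
```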
4.2.8 Hyper-exponential distribution

A process with sequential phases gives rise to a hypo-exponential or an Erlang distribution, depending upon whether or not the phases have identical distributions. If instead a process consists of alternate phases (that is, during any single experiment the process experiences one and only one of many alternate phases) and these phases have independent exponential distributions, then the overall distribution is hyper-exponential. The density function of a $k$-phase hyper-exponential random variable is

$$f(t) = \sum_{i=1}^{k} \alpha_i \lambda_i e^{-\lambda_i t}, \qquad t > 0,\ \lambda_i > 0,\ \alpha_i > 0,\ \sum_{i=1}^{k} \alpha_i = 1$$

and the distribution function is

$$F(t) = \sum_{i} \alpha_i \left( 1 - e^{-\lambda_i t} \right), \qquad t \ge 0$$

The failure rate is

$$h(t) = \frac{\sum_i \alpha_i \lambda_i e^{-\lambda_i t}}{\sum_i \alpha_i e^{-\lambda_i t}}, \qquad t > 0$$

which is a decreasing failure rate, from $\sum_i \alpha_i \lambda_i$ down to $\min\{\lambda_1, \lambda_2, \ldots\}$.

The hyper-exponential distribution exhibits more variability than the exponential. CPU service-time distributions in computer systems have often been observed to possess such a distribution. Similarly, if a product is manufactured on several parallel assembly lines and the outputs are merged, then the failure density of the overall product is likely to be hyper-exponential. The hyper-exponential is a special case of the mixture distributions that often arise in practice, that is, distributions of the form

$$F(x) = \sum_i \alpha_i F_i(x), \qquad \sum_i \alpha_i = 1,\ \alpha_i \ge 0$$

To see how a hyper-exponential random variable might originate, imagine that a bin contains $n$ different types of batteries, with a type $j$ battery lasting for an exponentially distributed time with rate $\lambda_j$, $j = 1, 2, \ldots, n$. Suppose further that $p_j$ is the proportion of batteries in the bin that are of type $j$ for each $j = 1, 2, \ldots, n$. If a battery is randomly chosen, in the sense that it is equally likely to be any of the batteries in the bin, then the lifetime of the battery selected will have the hyper-exponential distribution.
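The battery-bin construction just described can be illustrated by simulation. In the sketch below, the two battery types, their proportions, and their rates are assumed values:

```python
import random
from math import exp

p = [0.6, 0.4]      # assumed proportions of two battery types
lam = [1.0, 0.2]    # assumed failure rates of each type

def battery_lifetime(rng: random.Random) -> float:
    j = 0 if rng.random() < p[0] else 1   # draw a battery type
    return rng.expovariate(lam[j])        # exponential lifetime for that type

rng, n, t = random.Random(42), 200_000, 1.5
frac_alive = sum(battery_lifetime(rng) > t for _ in range(n)) / n
survival = sum(a * exp(-l * t) for a, l in zip(p, lam))   # 1 - F(t) in closed form
print(frac_alive, survival)   # the simulation should match the formula closely
```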
4.2.9 Double exponential distribution

The distribution was discovered by Laplace in 1774 as the form of distribution for which the likelihood function is maximized by setting the location parameter equal to the median of the observed values of an odd number of independent identically distributed random variables. A continuous random variable $X$ is said to have a generalized Laplace (double exponential) distribution if its probability density function is given by

$$f(x; \theta, \lambda) = \frac{1}{2\lambda}\, e^{-|x - \theta|/\lambda}, \qquad -\infty < x < \infty \qquad (4.9.1)$$

where $\lambda$ and $\theta$ are the two parameters of the distribution, with $\lambda > 0$ and $-\infty < \theta < \infty$. If $\theta = 0$ and $\lambda = 1$, then (4.9.1) reduces to

$$f(x) = \frac{1}{2}\, e^{-|x|}$$

This is the standard form of the double exponential distribution. It is also known as Poisson's first law of error. The standard double exponential distribution has:

1. Mean, $E(X) = 0$
2. Variance, $\mathrm{Var}(X) = 2$
3. Mean deviation about the mean, $\eta = 1$
4. Skewness, $\beta_1 = 0$
5. Kurtosis, $\beta_2 = 6$

The distribution is symmetrical about $x = 0$. It finds application in queuing theory and the theory of reliability.

4.2.10 Cauchy distribution

A continuous random variable $X$ is said to have a generalized Cauchy distribution with parameters $\theta$ and $\lambda$ if its probability density function is given by

$$f(x; \theta, \lambda) = \frac{1}{\pi} \frac{\lambda}{\lambda^2 + (x - \theta)^2}, \qquad -\infty < x < \infty \qquad (4.10.1)$$

where $\lambda > 0$. This is a special type of the Pearson Type VII distribution. The cumulative distribution function of $X$ is

$$F(x; \theta, \lambda) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1}\!\left( \frac{x - \theta}{\lambda} \right) \qquad (4.10.2)$$

The parameters $\theta$ and $\lambda$ are called the location and scale parameters, respectively. The distribution was discovered by Cauchy. It has a wide range of application in the theory of statistics; the role of the Cauchy distribution in statistical theory often lies in providing counterexamples, and it is often quoted as a distribution for which moments do not exist.

The distribution is symmetrical about $x = \theta$, and thus $x = \theta$ gives the median of the distribution. The density is maximum at $x = \theta$, so that point is also the mode. In the most general sense, the mean of the Cauchy distribution does not exist, but if it is taken to exist it is located at $x = \theta$; thus the mean, median, and mode of the distribution coincide at the point $x = \theta$. For the Cauchy distribution, the second and higher moments about the mean do not exist. The upper and lower quartiles are $\theta \pm \lambda$. The density function has points of inflexion at $\theta \pm \lambda/\sqrt{3}$.

Remarks:

- If $\lambda = 1$, the Cauchy distribution reduces to the form

$$f(x; \theta) = \frac{1}{\pi} \frac{1}{1 + (x - \theta)^2}$$

- If $z = (x - \theta)/\lambda$, then (4.10.1) turns into the standard Cauchy distribution, whose pdf is

$$f(z) = \frac{1}{\pi (1 + z^2)}, \qquad -\infty < z < \infty$$

4.2.11 Beta distribution

A distribution that has both an upper and a lower bound is the beta distribution. Generally the beta distribution is defined over the interval 0 to 1. It can, however, be transformed to any interval $a$ to $b$. If the limits of the distribution are unknown, they become parameters of the distribution, making it a four-parameter rather than a two-parameter distribution. The beta density function is given by

$$f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \qquad 0 \le x \le 1$$

where $\alpha > 0$ and $\beta > 0$ are the two parameters of the distribution. This is the standard form of the beta distribution. The function

$$B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1}\, dx$$

is called the beta function, and it can be shown that

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$

The mean and variance of the beta distribution are:

$$E(X) = \frac{\alpha}{\alpha + \beta}, \qquad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

The mean and variance can be used to get the moment estimators for $\alpha$ and $\beta$. The beta distribution can assume a variety of different shapes depending on the values of its parameters. If $\alpha = \beta = 1$, the beta distribution reduces to the uniform distribution on $[0, 1]$. If one parameter equals one and the other equals two, it becomes a triangular distribution. If both parameters are greater than one, the mode of the distribution is $(\alpha - 1)/(\alpha + \beta - 2)$. If one parameter equals unity and the other is greater than unity, then there is only one point of inflexion. The distribution is symmetrical if $\alpha = \beta$, skewed to the left if $\alpha > \beta$, and skewed to the right if $\alpha < \beta$.

Figure 4.11.1 The density functions of the beta distribution for various values of $a = \hat{\alpha}$ and $b = \hat{\beta}$

Problem: The proportion of a brand of television sets requiring service during the first year of operation is a random variable having a beta distribution with $\alpha = 3$ and $\beta = 2$. What is the probability that at least 80% of the new models of this brand sold this year will require service during the first year of operation? (Islam, 2004, p. 689)
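A numeric sketch of the television-set problem: $P(X \ge 0.8)$ for Beta(3, 2), evaluated by Simpson's rule (with these parameters the integral works out to about 0.18):

```python
from math import gamma

def beta_pdf(x: float, a: float, b: float) -> float:
    B = gamma(a) * gamma(b) / gamma(a + b)
    return x**(a - 1) * (1 - x)**(b - 1) / B

a, b, lo, hi, m = 3.0, 2.0, 0.8, 1.0, 1000   # m panels for Simpson's rule (even)
h = (hi - lo) / m
w = [1 if i in (0, m) else (4 if i % 2 else 2) for i in range(m + 1)]
prob = h / 3 * sum(wi * beta_pdf(lo + i * h, a, b) for i, wi in enumerate(w))
print(prob)   # P(X >= 0.8) ~ 0.1808
```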
4.2.12 Pearson distribution

Karl Pearson developed a set of frequency curves which can be obtained as solutions of the first-order differential equation

$$\frac{dy}{dx} = \frac{y(x - a)}{b_0 + b_1 x + b_2 x^2} \qquad (4.12.1)$$

where $a$, $b_0$, $b_1$, and $b_2$ are parameters to be calculated from the given data. By choosing appropriate values for the parameters, the above equation yields a large number of families of distributions, including the normal, beta, and gamma distributions. The families of frequency functions defined by (4.12.1) are known as Pearson distributions. A Pearson distribution is completely determined by its first four moments. There are twelve types of Pearson curves: type I, type II, ..., and type XII. The form of the frequency curve depends on $k = b_1^2 / 4 b_0 b_2$ and on $\beta_2 = \mu_4 / \mu_2^2$ ($\mu_2$ and $\mu_4$ are the second and fourth moments about the mean, respectively). Figure 4.12.1 shows the types of Pearson curves for different values of $k$.

Figure 4.12.1 Types of Pearson's curves for different values of $k$

Pearson's method of fitting consists of:

a) determining the values of the first four moments of the observed distribution;
b) calculating the observed values of $\beta_1 = \mu_3^2 / \mu_2^3$, $\beta_2$, and $k$, which determine the type to which the observed distribution belongs;
c) equating the observed moments to the moments of that type of distribution expressed in terms of its parameters; and
d) solving the resulting equations for those parameters, whereupon the fitted distribution is determined.

4.2.13 Lognormal distribution

The lognormal distribution (sometimes spelled out as the logarithmic normal distribution) of a random variable $X$ is one for which the logarithm of $X$ follows a normal or Gaussian distribution. Denote $Y = \ln X$; then $Y$ has a normal distribution given by:

$$f(y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left[ -\frac{1}{2} \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right], \qquad -\infty < y < \infty \qquad (4.13.1)$$

Derived distribution: Since $Y = \ln X$ and $dy/dx = 1/x$, the distribution of $X$ can be found as

$$f(x) = f(y) \cdot \frac{dy}{dx} = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left[ -\frac{1}{2} \left( \frac{\ln x - \mu_y}{\sigma_y} \right)^2 \right] \cdot \frac{1}{x} = \frac{1}{\sqrt{2\pi x^2 \sigma_y^2}} \exp\!\left[ -\frac{1}{2} \left( \frac{\ln x - \mu_y}{\sigma_y} \right)^2 \right] \qquad (4.13.2)$$

Note that equation (4.13.1) gives the distribution of $Y$ as a normal distribution with mean $\mu_y$ and variance $\sigma_y^2$. Equation (4.13.2) gives the distribution of $X$ as a lognormal distribution with parameters $\mu_y$ and $\sigma_y^2$.

Estimation of the parameters ($\mu_y$, $\sigma_y^2$) of the lognormal distribution, with $y_i = \ln x_i$:

$$\bar{y} = \frac{\sum y_i}{n}, \qquad S_y^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n - 1}$$

Chow (1954) method:

(1) $C_v = S_x / \bar{X}$
(2) $\bar{Y} = \frac{1}{2} \ln \dfrac{\bar{X}^2}{C_v^2 + 1}$
(3) $S_y^2 = \ln(C_v^2 + 1)$

The mean and variance of the lognormal distribution are:

$$E(X) = \exp(\mu_y + \sigma_y^2/2), \qquad \mathrm{Var}(X) = \mu_x^2 \left( e^{\sigma_y^2} - 1 \right)$$

The coefficient of variation of the $X$'s is

$$C_v = \sqrt{e^{\sigma_y^2} - 1}$$

and the coefficient of skew of the $X$'s is

$$\gamma = 3 C_v + C_v^3$$

Thus the lognormal distribution is skewed to the right, the skewness increasing with increasing values of $C_v$.
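The Chow (1954) steps translate directly into code. In this sketch the sample is an assumed one for illustration; the round-trip check $\exp(\mu_y + \sigma_y^2/2) = \bar{X}$ holds exactly by construction:

```python
from math import exp, log, sqrt

def lognormal_params_chow(xs):
    """Chow (1954): (mu_y, sigma_y^2) of ln X from the moments of the X-sample."""
    n = len(xs)
    xbar = sum(xs) / n
    s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    cv = s / xbar
    sigma_y2 = log(cv**2 + 1)
    mu_y = 0.5 * log(xbar**2 / (cv**2 + 1))
    return mu_y, sigma_y2

xs = [2.3, 1.1, 4.7, 3.0, 2.2, 5.9, 1.8, 2.6]   # assumed sample
mu_y, s_y2 = lognormal_params_chow(xs)
print(mu_y, s_y2)
print(exp(mu_y + s_y2 / 2), sum(xs) / len(xs))  # implied E(X) equals xbar
```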
Reproductive properties of the lognormal distribution:

(1) If $X$ is lognormal and $Y = aX^b$, then $Y$ is lognormal with $\mu_{\ln Y} = \ln a + b\,\mu_{\ln X}$ and $\sigma^2_{\ln Y} = b^2 \sigma^2_{\ln X}$.

(2) (a) If $X$ and $Y$ are lognormal and $Z = XY$, then $Z$ is lognormal with parameters

$$\mu_{\ln Z} = \mu_{\ln X} + \mu_{\ln Y}, \qquad \sigma^2_{\ln Z} = \sigma^2_{\ln X} + \sigma^2_{\ln Y}$$

(b) If $Z = X/Y$ and $X$ and $Y$ are mutually independent, then $Z$ is lognormal with

$$\mu_{\ln Z} = \mu_{\ln X} - \mu_{\ln Y}, \qquad \sigma^2_{\ln Z} = \sigma^2_{\ln X} + \sigma^2_{\ln Y}$$

(3) The geometric mean $X_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = (x_1 x_2 \cdots x_n)^{1/n}$ of a lognormal sample is lognormal with mean $\mu_{\ln X}$ and variance $\sigma^2_{\ln X}/n$, just as the arithmetic mean of $n$ observations from $N(\mu, \sigma^2)$ is $N(\mu, \sigma^2/n)$.

The lognormal distribution arises in processes in which the change in a random variable at the $n$th step is a random proportion of the variable at the $(n-1)$st step. Another way of saying the same thing is that the lognormal distribution is needed when factors or percentages characterize the variation. Thus if $X$ represents a quantity that can vary by factors in its error, having a possible range between $X_0/f$ and $X_0 f$, where $X_0$ is some midpoint reference value and $f$ an error factor, then a lognormal is the natural distribution for describing the phenomenon.

One of the reasons the lognormal distribution is frequently suitable for describing failures in reliability and risk analysis is that data for rarely occurring events may not be extensive, so component failure rates may vary by factors; for example, a failure rate may be estimated at $10^{-6}$/hr to $10^{-7}$/hr if the error factor is 10. When the failure rate is expressed as $10^{-X}$, where $X$ is some exponent, use of the lognormal distribution implies that the exponent satisfies a normal distribution. Thus we can view the lognormal distribution as one for situations in which there is considerable uncertainty in the failure parameters. Another feature of the lognormal distribution is that its skewness toward higher times incorporates the general behavior of data for unlikely phenomena, since the skewness accounts for the occurrence of infrequent but large deviate values, such as abnormally high failure rates due to batch defects, environmental degradation, and other causes.

The three-parameter lognormal distribution is obtained by fitting a normal distribution to the logarithms of $(x - \tau)$, where $\tau$ is a parameter that must be estimated from the data. Replacing $x$ in equation (4.13.2) by $x - \tau$ results in the generalized three-parameter form. Then

$$E(X) = \tau + \exp(\mu_y + \sigma_y^2/2), \qquad \mathrm{Var}(X) = e^{2\mu_y + \sigma_y^2} \left( e^{\sigma_y^2} - 1 \right)$$

The three-parameter lognormal distribution replaces the two-parameter version whenever there is no possibility of failure for $0 \le x \le \tau$.

4.2.14 Pareto distribution

The Swiss economics professor Vilfredo Pareto (1848-1923) formulated an eponymous law which states that the fraction of a population with income exceeding an amount $x$ is equal to

$$C x^{-a} \qquad (4.14.1)$$

for all $x$, where $C$ and $a$ are positive constants independent of $x$ but depending on the population. Pareto believed that his law was true for all populations regardless of economic and political conditions. If $F(x)$ is the CDF of the income distribution, then (4.14.1) implies that

$$F(x) = 1 - \left( \frac{c}{x} \right)^a, \qquad x > c \qquad (4.14.2)$$

where $c > 0$ is the minimum income. We denote the distribution given in (4.14.2) by Pareto$(a, c)$. The Pareto distribution finds application in graduating city population sizes, the occurrence of natural resources, stock price fluctuations, the size of firms, personal incomes, the distribution of losses from investing, etc.
The PDF of the distribution in (4.14.2) is

$$f(x) = \frac{a c^a}{x^{a+1}}, \qquad x > c \qquad (4.14.3)$$

So a Pareto distribution has polynomial tails. The constant $a$ is called the tail index or Pareto constant. An advantage of the Pareto distribution over the t-distribution as a statistical model is that the tail index of the Pareto can be any positive value, whereas for the t-distribution the tail index is a positive integer.

The survival function of a random variable $X$ is $P(X > x) = 1 - F(x)$, where $F$ is the CDF of $X$. If $X$ is a loss, then the survival function is

$$1 - F(x) = \left( \frac{c}{x} \right)^a, \qquad x > c$$

As $x \to \infty$, the survival function of a Pareto converges to 0 at a slow polynomial rate rather than a fast exponential rate, which means that Pareto distributions have a heavy right tail; the smaller the value of $a$, the heavier the tail. Figure 4.2.14.1 shows the survival function of a Pareto distribution with $c = 0.25$ and $a = 1.1$. For comparison, the survival functions of normal and exponential distributions, conditional on being greater than 0.25, are also shown. The normal has mean 0 and standard deviation $\sigma = 0.3113$, and the exponential has $\theta = c/a = 0.25/1.1$. These parameters were chosen so that the Pareto, normal, and exponential densities, conditional on being greater than 0.25, have the same height at 0.25, which implies that their survival functions have the same slope at 0.25, so the three survival functions start to decrease to 0 at the same rates. Notice that despite their initial rates of decrease being equal, the normal survival function converges to 0 much faster than the Pareto as $x \to \infty$. This means that extreme losses are much more likely if the loss distribution is Pareto rather than normal. The exponential survival function is intermediate between the normal and Pareto survival functions.

Figure 4.2.14.1 Survival function of a Pareto distribution (see text)

The Pareto distribution can be used to model the probability of a large loss as follows. Let $X$ be the negative of the return, so that $X > 0$ corresponds to a loss. We assume that for some $c > 0$, the distribution of $X$ conditional on $X > c$ is Pareto with parameters $c$ and $a$. The value of $c$ can be selected by the analyst and might be the smallest loss which is of real interest, such that losses smaller than $c$ are too small to be of much concern. The parameter $a$ can be estimated from a set of loss data.

Example: MLE of the Pareto index and the Hill estimator

Assume that $X_1, \ldots, X_n$ are i.i.d. Pareto$(a, c)$. The parameter $c$ is often known, and if not, one can use the minimum of $X_1, \ldots, X_n$ as an estimate of $c$. The tail index $a$ will generally not be known, but $a$ can be estimated easily by maximum likelihood, as is now shown. For simplicity, in the following we assume that $c$ is known, but $c$ would be replaced by an estimate if not. By equation (4.14.3), the likelihood function is

$$L(a) = \left( \frac{a c^a}{X_1^{a+1}} \right) \left( \frac{a c^a}{X_2^{a+1}} \right) \cdots \left( \frac{a c^a}{X_n^{a+1}} \right)$$

so the log-likelihood is

$$\log\{L(a)\} = \sum_{i=1}^{n} \left\{ \log(a) + a \log(c) - (a+1) \log(X_i) \right\} \qquad (4.14.4)$$

Differentiating (4.14.4) with respect to $a$ and setting the derivative equal to 0 gives the equation

$$\frac{n}{a} = \sum_{i=1}^{n} \log(X_i / c)$$

Therefore the MLE of $a$ is

$$\hat{a} = \frac{n}{\sum_{i=1}^{n} \log(X_i / c)} \qquad (4.14.5)$$

If $X_1, \ldots, X_n$ are i.i.d. from a distribution with Pareto tails rather than being exactly Pareto, then one does not want to compute (4.14.5) using all of the data but rather only the data in the tail.
Otherwise, there could be sizeable bias. Therefore, one should choose a constant $c$ and use only the data greater than $c$. The resulting estimator is called the Hill estimator. Typically, $c$ is one of the $X_i$. The Hill estimator can be written as

$$\hat{a}_{\mathrm{Hill}}(c) = \frac{n(c)}{\sum_{X_i > c} \log(X_i / c)} \qquad (4.14.6)$$

where $n(c)$ is the number of $X_i$ greater than $c$. The difficulty is how to choose $c$ or, equivalently, $n(c)$. The Hill plot is a plot of $\hat{a}_{\mathrm{Hill}}(c)$ versus $n(c)$. We expect that $\hat{a}_{\mathrm{Hill}}(c)$ will be unstable when $n(c)$ is small, due to random variability. If $n(c)$ gets too large, one is using nearly all of the data, and $\hat{a}_{\mathrm{Hill}}(c)$ may suffer from bias. One hopes that the Hill plot will show some stability for $n(c)$ neither too small nor too large, and one can then use a value of $n(c)$ in this region of stability.
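A sketch of the MLE (4.14.5) and the Hill estimator (4.14.6), checked on a simulated Pareto sample (the parameters $a = 1.5$, $c = 0.25$ and the threshold $2c$ are assumed illustrative choices; sampling uses inversion of (4.14.2)):

```python
import random
from math import log

def pareto_mle(xs, c):
    """Eq. (4.14.5): a_hat = n / sum(log(x_i / c)), c known."""
    return len(xs) / sum(log(x / c) for x in xs)

def hill_estimator(xs, c):
    """Eq. (4.14.6): the same formula restricted to the x_i above c."""
    tail = [x for x in xs if x > c]
    return len(tail) / sum(log(x / c) for x in tail)

rng = random.Random(1)
a, c = 1.5, 0.25                                             # assumed true parameters
xs = [c * rng.random() ** (-1 / a) for _ in range(50_000)]   # inversion of (4.14.2)
print(pareto_mle(xs, c))          # ~1.5
print(hill_estimator(xs, 2 * c))  # ~1.5: the tail of a Pareto is again Pareto
```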
A special case of the corresponding f(t) is shown in Figure 4.3.1.2. Figure 4.3.1.2 Mixed Weibull failure probability density with β1 = β 2 = 1, α1 = 0.5 , α 2 = 3 , τ 1 = 0 , τ 2 = 0.4 and k = 0.2 4.3.2 Composite distribution 98 Chapter IV: Probability Distributions and Their Applications A composite failure model for a one-component system can be constructed by linking together different failure probability densities for different time intervals. Then f j (t ) denotes the composite probability density function for time interval T j −1 ≤ t ≤ T j where the times T j −1 and T j are the partition parameters for the jth interval. A special case of a composite distribution exists for any device that cannot fail for a finite period of time τ . In such situations, the device is not sensitive to any load to which it is subjected, so f1 (t ) = 0 for 0 ≤ t ≤ τ . Example: A device is known to always fail in a random fashion while in a phased mission mode of operation consisting of three stages: 0 ≤ t < T1 , T1 ≤ t < T2 and t ≥ T2 , Obtain the cumulative failure probability for the device. Three non-synchronous hazard rates, denoted as λ j , j = 1 to 3 , charac- Solution: terize the hazard rate, which can be written as λ (t ) = λ1 + (λ 2 − λ1 ) H (t − T1 ) + (λ3 − λ 2 ) H (t − T2 ) , where H ( x) = 1 , x ≥ 0 , and H(x) = 0, x < O. Using R (t ) = 1 − F (t ) and t R(t ) = e ∫ − λ ( t ′ ) dt ′ 0 the cumulative failure probability is F (t ) = 1 − exp(−λ1t ), 0 ≤ t < T1 , = 1 − exp[− λ1t − (λ2 − λ1 )(t − T1 )] , T1 ≤ t ≤ T2 = 1 − exp[− λ1t − (λ 2 − λ1 )(t − T1 ) − (λ3 − λ 2 )(t − T2 )], t ≥ T2 A composite model has the advantage that it can sometimes provide flexibility in fitting and explaining failure data. It is really nothing more than the well-known method of approximating a function by dividing it into a number of regions. Intuitively, the greater the number of segments taken, the more accurate the approximation becomes, but engineering judgment must be exercised to balance goodness of fit and computational complexity. 4.3.3 Convoluted distribution 99 Chapter IV: Probability Distributions and Their Applications A device that has replacement units in standby can continue to operate provided at least one of its units has not failed. The first unit operates until failure at t = t1 ; the j th unit fails at t = t j . The failure probability density for the ith and all prior units, f 1, 2,K,i (t ) , may be expressed in terms of that for the (i − 1) unit and all prior units, , f 1, 2,K,(i −1) (t ) , as the convolution of two failure probability densities: t f1, 2,K,i (t ) = ∫ f i (t − t ′) f12K(i −1) (t ′)dt ′ (4.3.3.1) 0 In this equation, the failure probability density for the i th unit, f i (t − t ′) , accounts for the system failure probability density for the time (t - t') during which the i th unit is in operation, while the f 12K( i −1) (t ′) dt ′ accounts for the failure probability of the (i − 1) th unit in dt' about time t' after all other units j, j < (i − 1) , have failed. The integration over the time of failure t' of the (i − 1) th unit ranges from 0 to 1 because the actual time of the i th failure is not known. Equation (4.3.3.1) can be rewritten in the form of nested integrals by recursively applying the equation. 
The result is t ti −1 0 0 f 12Ki (t ) = ∫ dt i −1 f i (t − t i −1 ) ∫ t2 dt i − 2 f i −1 (t i −1 − t i − 2 ) L × ∫ dt1 f 2 (t 2 − t1 ) f 1 (t1 ) (4.3.3.2) 0 Thus, for example, for a three-unit system, with initially two replacement units in standby, ready for use, the system probability density for failure is given by t t2 0 0 f123 (t ) = ∫ dt 2 f 3 (t − t 2 ) ∫ dt1 f 2 (t 2 −t1 ) f1 (t1 ) (4.3.3.3) Any distribution discussed in Section 4-2 may be substituted for f i (t ) in equations (4.3.3.1) through (4.3.3.3). The f i (t ) need not be identical for all i , but the system must be capable of functioning with any unit in operation. Example: Calculate the failure probability density for a system consisting of i identical units, all having identical constant hazard rates λ . The units are used successively. Solution: Equation f (t ) = λe − λt for the exponential failure model and Eq. (4.3.3.1) can be combined to give 100 Chapter IV: Probability Distributions and Their Applications f12Ki (t ) = λ (λt ) i −1 −λt e (i − 1)! This result should not be surprising since it is precisely the same as for the ith occurrence in the Poisson process. To summarize the results of this section: 1. A mixed distribution consists of the addition of failure distributions with variable mixing coefficients for the different constituent probability densities. 2. A composite distribution is obtained from a set of piecewise-continuous hazard rates, each valid over a finite interval of time; if the composite hazard rate is continuous, so also is the composite failure probability density. 3. A convoluted distribution arises for a multi-unit system with one or more replacement units in standby that can be switched into service instantaneously with no switch failures. 101