Preface

This is the seventh volume in the series 'Handbook of Statistics' started by the late Professor P. R. Krishnaiah to provide comprehensive reference books in different areas of statistical theory and applications. Each volume is devoted to a particular topic in statistics; the present one is on 'Quality Control and Reliability', a modern branch of statistics dealing with the complex problems in the production of goods and services, maintenance and repair, and management and operations. The accent is on quality and reliability in all these aspects.

The leading chapter in the volume is written by W. Edwards Deming, a pioneer in statistical quality control, who spearheaded the quality control movement in Japan and helped the country in its rapid industrial development during the post-war period. He gives a 14-point program for management to keep a country on the ascending path of industrial development.

Two main areas of concern in practice are the reliability of the hardware and of the process control software. The estimation of hardware reliability and its uses is discussed under a variety of models for reliability by R. A. Johnson in Chapter 3, M. Mazumdar in Chapter 4, L. F. Pau in Chapter 15, H. L. Harter in Chapter 22, A. P. Basu in Chapter 23, and S. Iyengar and G. Patwardhan in Chapter 24. The estimation of software reliability is considered by F. B. Bastani and C. V. Ramamoorthy in Chapter 2 and T. A. Mazzuchi and N. D. Singpurwalla in Chapter 5. The main concepts and theory of reliability are discussed in Chapters 10, 12, 13, 14 and 21 by F. Proschan in collaboration with P. J. Boland, F. Guess, R. E. Barlow, G. Mimmack, E. El-Neweihi and J. Sethuraman. Chapter 6 by N. R. Chaganty and K. Joag-dev, Chapter 7 by B. W. Woodruff and A. H. Moore, Chapter 9 by S. S. Gupta and S. Panchapakesan, Chapter 11 by M. C. Bhattacharjee and Chapter 16 by W. J. Padgett deal with some statistical inference problems arising in reliability theory.
Several aspects of quality control of manufactured goods are discussed in Chapter 17 by F. B. Alt and N. D. Smith, in Chapter 18 by B. Hoadley, in Chapter 20 by M. Csörgő and L. Horváth and in Chapter 19 by P. R. Krishnaiah and B. Q. Miao.

All the chapters are written by outstanding scholars in their fields of expertise and I wish to thank all of them for their excellent contributions. Special thanks are due to Elsevier Science Publishers B.V. (North-Holland) for their patience and cooperation in bringing out this volume.

C. R. Rao

Contributors

F. B. Alt, Dept. of Management Science & Stat., University of Maryland, College Park, MD 20742, USA (Ch. 17)
F. B. Bastani, Dept. of Computer Science, University of Houston, University Park, Houston, TX 77004, USA (Ch. 2)
A. P. Basu, Dept. of Statistics, University of Missouri-Columbia, 328 Math. Science Building, Columbia, MO 65201, USA (Ch. 23)
M. C. Bhattacharjee, Dept. of Mathematics, New Jersey Inst. of Technology, Newark, NJ 07102, USA (Ch. 11)
H. W. Block, Dept. of Mathematics & Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA (Ch. 8)
P. J. Boland, Dept. of Mathematics, University College, Belfield, Dublin 4, Ireland (Ch. 10)
R. E. Barlow, Operations Research Center, University of California, Berkeley, CA 94720, USA (Ch. 13)
N. R. Chaganty, Math. Dept., Old Dominion University, Hampton Blvd., Norfolk, VA 23508, USA (Ch. 6)
M. Csörgő, Dept. of Mathematics & Statistics, Carleton University, Ottawa, Ontario, Canada K1S 5B6 (Ch. 20)
W. Edwards Deming, Consultant in Statistical Studies, 4924 Butterworth Place, Washington, DC 20016, USA (Ch. 1)
F. M. Guess, Department of Statistics, University of South Carolina, Columbia, South Carolina 29208, USA (Ch. 12)
S. Gupta, Dept. of Statistics, Math./Science Building, Purdue University, Lafayette, IN 47907, USA (Ch. 9)
H. L. Harter, 32 S. Wright Ave., Dayton, OH 45403, USA (Ch. 22)
B. Hoadley, Bell Laboratories, HP 1A-250, Holmdel, NJ 07733, USA (Ch. 18)
L. Horváth, Bolyai Institute, Szeged University, Aradi Vertanuk tere 1, H-6720 Szeged, Hungary (Ch. 20)
S. Iyengar, Dept. of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA (Ch. 24)
K. Joag-dev, Dept. of Mathematics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (Ch. 6)
R. A. Johnson, Dept. of Statistics, 1210 West Dayton Street, Madison, WI 53706, USA (Ch. 3)
M. Mazumdar, Dept. of Industrial Engineering, University of Pittsburgh, Benedum Hall 1048, Pittsburgh, PA 15260, USA (Ch. 4)
T. A. Mazzuchi, c/o N. D. Singpurwalla, Operations Research & Statistics, George Washington University, Washington, DC 20052, USA (Ch. 5)
B. Miao, Dept. of Math. & Stat., University of Pittsburgh, Pittsburgh, PA 15260, USA (Ch. 19)
G. M. Mimmack, c/o F. Proschan, Statistics Department, Florida State University, Tallahassee, FL 32306, USA (Ch. 14)
A. H. Moore, AFIT/ENC, Wright-Patterson AFB, OH 45433, USA (Ch. 7)
E. El-Neweihi, Dept. of Math., Stat. & Comp. Sci., University of Illinois, Chicago, IL 60680, USA (Ch. 21)
W. J. Padgett, Math. & Stat. Department, University of South Carolina, Columbia, SC 29208, USA (Ch. 16)
G. Patwardhan, Dept. of Mathematics, Pennsylvania State University at Altoona, Altoona, PA 16603, USA (Ch. 24)
S. Panchapakesan, Mathematics Department, Southern Illinois University, Carbondale, IL 62901, USA (Ch. 9)
L. F. Pau, 7 Route de Drize, CH 1227 Carouge, Switzerland (Ch. 15)
F. Proschan, Statistics Department, Florida State University, Tallahassee, FL 32306, USA (Ch. 10, 12, 13, 14, 21)
C. V. Ramamoorthy, Dept. of Electrical Engineering & Comp. Sci., University of California at Berkeley, Berkeley, CA 94720, USA (Ch. 2)
T. H. Savits, Dept. of Mathematics & Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA (Ch. 8)
J. Sethuraman, Dept. of Statistics, Florida State University, Tallahassee, FL 32306, USA (Ch. 22)
N. D. Singpurwalla, Operations Research & Statistics, George Washington University, Washington, DC 20052, USA (Ch. 5)
N. D. Smith, Dept. of Management Sci. & Stat., University of Maryland, College Park, MD 20742, USA (Ch. 17)
B. Woodruff, Directorate of Mathematical & Inf. Service, AFOSR/NM, Bolling Air Force Base, DC 20332, USA (Ch. 7)

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 1-6

1

Transformation of Western Style of Management*

W. Edwards Deming

1. The crisis of Western industry

The decline of Western industry, which began in 1968 and 1969, a victim of competition, has reached little by little a stage that can only be characterized as a crisis. The decline is caused by Western style of management, and it will continue until the cause is corrected. In fact, the decline may be ready for a nose dive. Some companies will die a natural death, victims of Charles Darwin's inexorable law of survival of the fittest. In others, there will be awakening and conversion of management.

What happened? American industry knew nothing but expansion from 1950 till around 1968. American goods had the market. Then, one by one, many American companies awakened to the reality of competition from Japan. Little by little, one by one, the manufacture of parts and materials moves out of the Western world into Japan, Korea, Taiwan, and now Brazil, for reasons of quality and price. More business is carried on now between the U.S. and the Pacific Basin than across the Atlantic Ocean.

A sudden crisis like Pearl Harbor brings everybody out in full force, ready for action, even if they have no idea what to do. But a crisis that creeps in catches its victims asleep.

2. A declining market exposes weaknesses

Management in an expanding market is fairly easy. It is difficult to lose when business simply drops into the basket. But when competition presses into the market, knowledge and skill are required for survival. Excuses ran out.
By 1969, the comptroller and the legal department began to take charge for survival, fighting a defensive war, backs to the wall. The comptroller does his best, using only visible figures, trying to hold the company in the black, unaware of the importance for management of figures that are unknown and unknowable. The legal department fights off creditors and predators that are on the lookout for an attractive takeover. Unfortunately, management by the comptroller and the legal department only brings further decline.

* Parts of this chapter are extracts from the author's book Out of the Crisis (Center for Advanced Engineering Study, Massachusetts Institute of Technology, 1985).

3. Forces that feed the decline

The decline is accelerated by the aim of management to boost the quarterly dividend, and to maximize the price of the company's stock. Quick returns, whether by acquisition, or by divestiture, or by paper profits or by creative accounting, are self-defeating. The effect in the long run erodes investment and ends up as just the opposite to what is intended.

A far better plan is to protect investment by plans and methods by which to improve product and service, accepting the inevitable decrease in costs that accompanies improvement of quality and service, thus reversing the decline, capturing the market with better quality and lower price. As a result, the company stays in business and provides jobs and more jobs.

For years, price tag and not total cost of use governed the purchase of materials and equipment. Numerical goals and M.B.O. have made their contribution to the decline. A numerical goal outside the capability of a system can be achieved only by impairment or destruction of some other part of the company. Work standards more than double costs of production. Worse than that, they rob people of their pride of workmanship. Quotas of production are a guarantee of poor quality. Exhortations are directed at the wrong people.
They should be directed at the management, not at the workers.

Other forces are still more destructive.

(1) Lack of constancy of purpose to plan product and service that will have a market and keep the company in business, and provide jobs.
(2) Emphasis on short-term profits: short-term thinking (just the opposite from constancy of purpose to stay in business), fed by fear of unfriendly takeover, and by push from bankers and owners for dividends.
(3) Personal review system, or evaluation of performance, merit rating, annual review, or annual appraisal, by whatever name, for people in management, the effects of which are devastating.
(4) Mobility of management; job hopping from one company to another.
(5) Use of visible figures only for management, with little or no consideration of figures that are unknown or unknowable.

Peculiar to industry in the United States:

(6) Excessive medical costs.
(7) Excessive costs of liability.*

* Eugene L. Grant, interview in the journal Quality, Chicago, March 1984.

Anyone could add more inhibitors. One, for example, is the choking of business by laws and regulations; also by legislation brought on by groups of people with special interests, the effect of which is too often to nullify the work of standardizing committees of industry, government, and consumers.

Still another force is the system of detailed budgets which leave a division manager no leeway. In contrast, the manager in Japan is not bothered by detail. He has complete freedom except for one item; he cannot transfer to other uses his expenditure for education and training.

4. Remarks on evaluation of performance, or the so-called merit rating

Many companies in America have systems by which everyone in management or in research receives from his superiors a rating every year. Some government agencies have a similar system. The merit system leads to management by fear. The effect is devastating.
- It nourishes short-term performance, annihilates long-term planning, builds fear, demolishes teamwork; nourishes rivalry and politics.
- It leaves people bitter, others despondent and dejected, some even depressed, unfit for work for weeks after receipt of rating, unable to comprehend why they are inferior. It is unfair, as it ascribes to the people in a group differences that may be caused largely if not totally by the system that they work in.

The idea of a merit rating is alluring. The sound of the words captivates the imagination: pay for what you get; get what you pay for; motivate people to do their best, for their own good.

The effect of the merit rating is exactly the opposite of what the words promise. Everyone propels himself forward, or tries to, for his own good, on his own life preserver. The organization is the loser.

Moreover, a merit rating is meaningless as a predictor of performance, whether in the same job or in one that he might be promoted into. One may predict performance only for someone that falls outside the limits of differences attributable to the system that the people work in.

5. Modern principles of leadership

Modern principles of leadership will in time replace the annual performance review. The first step in a company will be to provide education in leadership. This education will include the theory of variation, also known as statistical theory. The annual performance review may then be abolished. Leadership will take its place. Suggestions follow.

(1) Institute education in leadership; obligations, principles, and methods.
(2) More careful selection of the people in the first place.
(3) Better training and education after selection.
(4) A leader, instead of being a judge, will be a colleague, counseling and leading his people on a day-to-day basis, learning from them and with them.
(5) A leader will discover who if any of his people is (a) outside the system on the good side, (b) outside on the poor side, (c) belonging to the system. The calculations required are fairly simple if numbers are used for measures of performance. Ranking of people (outstanding down to unsatisfactory) that belong to the system violates scientific logic and is ruinous as a policy. In the absence of numerical data, a leader must make subjective judgment. A leader will spend hours with every one of his people. They will know what kind of help they need. There will sometimes be incontrovertible evidence of excellent performance, such as patents, publication of papers, invitations to give lectures. People that are on the poor side of the system will require individual help. Monetary reward for outstanding performance outside the system, without other, more satisfactory recognition, may be counterproductive.
(6) The people of a group that form a system will all be subject to the company's formula for privileges and for raises in pay. This formula may involve (e.g.) seniority. It is important to note that privilege will not depend on rank within the system. (In bad times, there may be no raise for anybody.)
(7) Figures on performance should be used not to rank the people in a group that fall within the system, but to assist the leader to accomplish improvement of the system. These figures may also point out to him some of his own weaknesses.
(8) Have a frank talk with every employee, up to three or four hours, at least once a year, not for criticism, but to learn from each of them about the job and how to work together.

The day is here when anyone deprived of a raise or of any privilege through misuse of figures for performance (as by ranking the people in a group) may with justice file a grievance. Improvement of the system will help everybody, and will decrease the spread between the figures for the performances of people.

6. Other obstacles

(1) Hope for quick results (instant pudding).
(2) The excuse that 'our problems are different'.
(3) Inept teaching in schools of business.
(4) Failure of schools of engineering to teach statistical theory.
(5) Statistical teaching centres fail to prepare students for the needs of industry. Students learn statistical theory for enumerative studies, then see it applied in class and in textbooks, without justification or explanation, to analytic problems. They learn to calculate estimates of standard errors of the result of an experiment and in other analytic problems where there is no such thing as a standard error. They learn tests of hypothesis, null hypothesis, and probability levels of significance. Such calculations and the underlying theory are excellent mathematical exercises, but they provide no basis for action, no basis for evaluation of the risk of prediction of the results of the next experiment, nor of tomorrow's product, which is the only question of interest in a study aimed at improvement of performance of a process or of a product.
(6) The supposition by management that the work-force could turn out quality if they would apply full force their skill and effort. The fact is that nearly everyone in Western industry, management and work-force, is impeded by barriers to pride of workmanship.
(7) Reliance on QC-Circles, employee involvement, employee participation groups, quality of work life, anything to get rid of the problems of people. These shams, without management's participation, deteriorate and break up after a few months.

The big task ahead is to get the management involved in management for quality and productivity. The work-force has always been involved. There will then be quality of work life, pride of workmanship, and quality. Applications of techniques within the system as it exists often accomplish great improvements in quality, productivity and reduction of waste.

7. Remarks on use of visible figures

The comptroller runs the company on visible figures. This is a sure road to decline. Why? Because the most important figures for management are not visible: they are unknown and unknowable. Do courses in finance teach students the importance of the unknown and unknowable loss

- from a dissatisfied customer?
- from a dissatisfied employee, one that, because of correctible faults of the system, cannot take pride in his work?
- from the annual rating on performance, the so-called merit rating?
- from absenteeism (purely a function of supervision)?

Do courses in finance teach their students about the increase in productivity that comes from people that can take pride in their work? Unfortunately, the answer is no.

8. Condensation of the 14 points for management

There is now a theory of management. No one can say now that there is nothing about management to teach. If experience by itself would teach management how to improve, then why are we in this predicament? Everyone doing his best is not the answer that will halt the decline. It is necessary that everyone know what to do; then for everyone to do his best.

The 14 points apply anywhere, to small organizations as well as to large ones, to the service industry as well as to manufacturing.

(1) Create constancy of purpose toward improvement of product and service, with the aim to excel in quality of product and service, to stay in business, and to provide jobs.
(2) Adopt the new philosophy. We are in a new economic age, created by Japan. Transformation of Western style of management is necessary to halt the continued decline of industry.
(3) Cease dependence on inspection to achieve quality. Eliminate the need for inspection on a mass basis by building quality into the product in the first place.
(4) End the practice of awarding business on the basis of price tag.
Purchasing must be combined with design of product, manufacturing, and sales, to work with the chosen supplier, the aim being to minimize total cost, not initial cost.
(5) Improve constantly and forever every activity in the company, to improve quality and productivity, and thus constantly decrease costs. Improve design of product.
(6) Institute training on the job, including management.
(7) Institute supervision. The aim of supervision should be to help people and machines and gadgets to do a better job.
(8) Drive out fear, so that everyone may work effectively for the company.
(9) Break down barriers between departments. People in research, design, sales, and production must work as a team, to foresee problems of production and in use that may be encountered with the product or service.
(10) Eliminate slogans, exhortations, and targets for the work force asking for fewer defects and new levels of productivity. Such exhortations only create adversarial relationships, as the bulk of the causes of low quality and low productivity belong to the system and thus lie beyond the power of the work force.
(11) Eliminate work standards that prescribe numerical quotas for the day. Substitute aids and helpful supervision.
(12a) Remove the barriers that rob the hourly worker of his right to pride of workmanship. The responsibility of supervisors must be changed from sheer numbers to quality.
(b) Remove the barriers that rob people in management and in engineering of their right to pride of workmanship. This means, inter alia, abolishment of the annual or merit rating and of management by objective.
(13) Institute a vigorous program of self-improvement and education.
(14) Put everybody in the company to work in teams to accomplish the transformation. Teamwork is possible only where the merit rating is abolished, and leadership put in its place.

9. What is required for change?

The first step is for Western management to awaken to the need for change.
It will be noted that the 14 points as a package, plus removal of the deadly diseases and obstacles to quality, are the responsibility of management. Management in authority will explain by seminars and other means to a critical mass of people in the company why change is necessary, and that the change will involve everybody. Everyone must understand the 14 points, the deadly diseases, and the obstacles. Top management and everyone else must have the courage to change. Top management must break out of line, even to the point of exile amongst their peers.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 7-25

2

Software Reliability

F. B. Bastani and C. V. Ramamoorthy

1. Introduction

Process control systems, such as nuclear power plant safety control systems, air-traffic control systems and ballistic missile defense systems, are embedded computer systems. They are characterized by severe reliability, performance and maintainability requirements. The reliability criterion is particularly crucial since any failures can be catastrophic. Hence, the reliability of these systems must be accurately measured prior to actual use.

The theoretical basis for methods of estimating the reliability of the hardware is well developed (Barlow and Proschan, 1975). In this paper we discuss methods of estimating the reliability of process control software.

Program proving techniques can, in principle, establish whether the program is correct with respect to its specification or whether it contains some errors. This is the ideal approach since there is no physical deterioration or random malfunction in software. However, the functions expected of process control systems are usually so complex that the specifications themselves can be incorrect and/or incomplete, thus limiting the applicability of program proofs.
One approach is to use statistical methods in order to assess the reliability of the program based on the set of test cases used. Since the early 1970s, several models have been proposed for estimating software reliability and some related parameters, such as the mean time to failure (MTTF), residual error content, and other measures of confidence in the software. These models are based on three basic approaches to estimating software reliability.

Firstly, one can observe the error history of a program and use this in order to predict its future behavior. Models in this category are applicable during the testing and debugging phase. It is often assumed that the correction of errors does not introduce any new errors. Hence, the reliability of the program increases and, therefore, these models are often called reliability growth models. A problem with these models is the difficulty in modelling realistic testing processes. Also, they cannot incorporate program proofs, cannot be applied prior to the debugging phase and have to be modified significantly in order to be applicable to programs developed using iterative enhancement.

The second approach attempts to predict the reliability of a program on the basis of its behavior for a sample of points taken from its input domain. These software reliability models are applicable during the validation phase (Ramamoorthy and Bastani, 1982; TRW, 1976). Errors found during this phase are not corrected. In fact, if errors are discovered the software may be rejected. The size of the sample required for a given confidence in the reliability estimate can be reduced by using some knowledge about the relationship between different points in the input domain. However, general modelling of the nature of the input domain results in mathematically intractable derivations.
The third method which can be used to estimate software reliability is based on error seeding (Mills, 1973; Schick and Wolverton, 1978). In this approach the program is seeded with artificial errors without the knowledge of the team responsible for testing and debugging the software. At the conclusion of the testing and debugging phase, the correctness of the program is estimated by comparing the number of artificial and actual errors found by the test team.

The rest of this paper is organized as follows: Section 2 defines software reliability and classifies some of the models which have been proposed over the past several years. Section 3 discusses the concept of error size and testing process. It states the assumptions of software reliability growth models and reviews error-counting and non-error-counting models. Section 4 discusses the measurement of software reliability/correctness using Nelson's model (TRW, 1976) and an input domain based model (Ramamoorthy and Bastani, 1979). Section 5 summarizes the paper and outlines some research issues in this area.

2. Definition and classification

In this section we first give a formal definition of software reliability and then present a classification of the models proposed for estimating the reliability of a program.

2.1. Definition

Software reliability has been defined as the probability that a software fault which causes deviation from the required output by more than the specified tolerances, in a specified environment, does not occur during a specified exposure period (TRW, 1976). Thus, the software needs to be correct only for inputs for which it is designed (specified environment). Also, if the output is correct within the specified tolerances in spite of an error, then the error is ignored. This may happen in the evaluation of complicated floating point expressions where many approximations are used (e.g., polynomial approximations for cosine, sine, etc.).
It is possible that a failure may be due to errors in the compiler, operating system, microcode or even the hardware. These failures are ignored in estimating the reliability of the application program. However, the estimation of the overall system reliability will include the correctness of the supporting software and the reliability of the hardware.

In some cases it may be desirable to classify software faults into several categories, ranging from trivial errors (e.g., minor misspellings on a hardcopy output) to catastrophic errors (e.g., resulting in total loss of control). Then, one could specify different reliability requirements for the various types of faults. Most software reliability models can be easily adapted for errors in a given class by merely ignoring other types of errors when using the model. However, this decreases the confidence in the reliability estimate since the sample size available for estimating the parameters of the model is reduced.

The exposure period should be independent of extraneous factors like machine execution time, programming environment, etc. For many applications the appropriate unit of exposure period is a run corresponding to the selection of a point from the input domain (specified environment) of the program. However, for some programs (e.g., an operating system), it is difficult to determine what constitutes a 'run'. In such cases, the unit of exposure period is time. One has to be careful in measuring time in these cases (Musa, 1975). For example, if a multiuser, interactive data base system is being accessed by five users, should the exposure period be five times the observed time? This may be reasonable if the system is not saturated since then five users are likely to generate approximately five times as much work in the observed time as would a single user. However, this is not true if the system is saturated.
Thus, we have:

(1) $R(i)$ = reliability over $i$ runs = $P\{\text{no failure over } i \text{ runs}\}$

or

(2) $R(t)$ = reliability over $t$ seconds = $P\{\text{no failure in interval } [0, t)\}$.

($P\{E\}$ denotes the probability of the event $E$.)

Definition (1) leads to an intuitive measure of software reliability. Assuming that inputs are selected independently according to some probability distribution function, we have:

$$R(i) = [R(1)]^i = R^i, \quad \text{where } R = R(1).$$

We can define the reliability, $R$, as follows:

$$R = 1 - \lim_{n \to \infty} \frac{n_f}{n},$$

where $n$ = number of runs and $n_f$ = number of failures in $n$ runs. This is the operational definition of software reliability. We can estimate the reliability of a program by observing the outcomes (success/failure) of a number of runs under its operating environment. If we observe $n_f$ failures out of $n$ runs, the estimate of $R$, denoted by $\hat{R}$, is:

$$\hat{R} = 1 - \frac{n_f}{n}.$$

This method of estimating $R$ is the basis of the Nelson model (TRW, 1976).

2.2. Classification

In this subsection we present a classification of some of the software reliability models proposed over the past fifteen years. The classification scheme is based on the three different methods of estimating software reliability discussed in Section 1. The main features of a model serve as a subclassification.

After a program has been coded, it enters a testing and debugging phase. During this phase, the implemented software is tested till an error is detected. Then the error is located and corrected. The error history of the program is defined to be the realization of a sequence of random variables $T_1, T_2, \ldots, T_n$, where $T_i$ denotes the time spent in testing the program after the $(i-1)$-th error was corrected till the $i$-th error is detected. One class of software reliability models attempts to predict the reliability of a program on the basis of its error history. It is frequently assumed that the correction of errors does not introduce any new errors.
Hence, the reliability of the program increases, and therefore such models are called software reliability growth models.

Software reliability growth models can be further classified according to whether they express the reliability in terms of the number of errors remaining in the program or not. These constitute error-counting and nonerror-counting models, respectively.

Error-counting models estimate both the number of errors remaining in the program as well as its reliability. Both deterministic and stochastic models have been proposed. Deterministic models assume that if the model parameters are known then the correction of an error results in a known increase in the reliability. This category includes the Jelinski-Moranda (1972), Shooman (1972), Musa (1975), and Schick-Wolverton (1978) models. The general Poisson model (Angus et al., 1980) is a generalization of these four models. Stochastic models include Littlewood's Bayesian model (Littlewood, 1980a) which models the (usual) case where larger errors are detected earlier than smaller errors, and the Goel-Okumoto Nonhomogeneous Poisson Process Model (NHPP) (Goel and Okumoto, 1979a) which assumes that the number of faults to be detected is a random variable whose observed value depends on the test and other environmental factors. Extensions to the Goel-Okumoto NHPP model have been proposed by Ohba (1984) and Yamada et al. (Yamada et al., 1983; Yamada and Osaki, 1985).

The number of errors remaining in the program is useful in estimating the maintenance cost. However, with these models it is difficult to incorporate the case where new errors may be introduced in the program as a result of imperfect debugging. Further, for some of these models the reliability estimate is unstable if the estimate of the number of remaining errors is low (Forman and Singpurwalla, 1977; Littlewood and Verrall, 1980b).

Nonerror-counting models only estimate the reliability of the software.
The Jelinski-Moranda geometric de-eutrophication model (Moranda, 1975) and a simple model used in the Halden project (Dahl and Lahti, 1978) are deterministic models in this category. Stochastic models consider the situation where different errors have different effects on the failure rate of the program. The correction of an error results in a stochastic increase in the reliability. Examples include a stochastic input domain based model (Ramamoorthy and Bastani, 1980), Littlewood and Verrall's Bayesian model (Littlewood and Verrall, 1973), and the Musa-Okumoto logarithmic model (Musa and Okumoto, 1984).

All the models described above treat the program as a black box. That is, the reliability is estimated without regard to the structure of the program. The validity of their assumptions usually increases as the size of the program increases. Since programs for critical control systems may be of medium size only, these models are mainly used to obtain a preliminary estimate of the software reliability.

Several variants of software reliability growth models can be obtained by considering various orthogonal factors such as:
(1) the development of calendar time expressions for predictions of MTTF, stopping time, etc. (Musa, 1975; Musa and Okumoto, 1984);
(2) the consideration of the time spent in locating and correcting errors; this aspect is modelled as a Markov process by Trivedi and Shooman (1975); and
(3) the possibility of imperfect debugging, including the introduction of new errors (Goel and Okumoto, 1979b).

The second class of software reliability models, called sampling models, estimates the reliability of a program on the basis of its behavior for a set of points selected from its input domain. These models are especially attractive for estimating the reliability of programs developed for critical applications, such as air-traffic control programs, which must be shown to have a high reliability prior to actual use.
At the end of the testing and debugging phase, the software is subjected to a large amount of testing in order to assess its reliability. Errors found during this phase are not corrected. In fact, if errors are discovered then the software may be rejected.

One sampling model is the Nelson model developed at TRW (1976). It assumes that the software is tested with test cases having the same distribution as the actual operating environment. The operational definition discussed earlier is used to obtain the reliability estimate. The only disadvantage of the Nelson model is that a large number of test cases is required in order to have a high confidence in the reliability estimate. The approach developed in (Ramamoorthy and Bastani, 1979) reduces the number of test cases by exploiting the nature of the input domain of the program. An important feature of this model is that the testing need not be random; any type of test-selection strategy can be used. However, the model is mathematically difficult and hard to validate experimentally.

The third approach to assessing software reliability is to insert several known errors into the program prior to the testing and debugging phase. At the end of this phase the number of errors remaining in the program can be computed on the basis of the number of known and unknown errors detected. Models based on this approach have been proposed by Mills and Basin (Mills and Basin, 1973; Schick and Wolverton, 1978) and, more recently, by Duran and Wiorkowski (1981). The major problem is that it is difficult to select errors which have the same distribution (such as ease of detectability) as the actual errors in the program. An alternate approach is to let two different teams independently debug a program and then estimate the number of errors remaining in the program on the basis of the number of common and disjoint errors found by them.
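The two-team idea can be illustrated with a small sketch. The text does not say which estimator is used, so the code below assumes a simple capture-recapture (Lincoln-Petersen style) estimate: if team A finds `found_a` errors, team B finds `found_b`, and `found_both` are common to both, the total is estimated as `found_a * found_b / found_both`. The function names and the example counts are hypothetical.

```python
def estimate_total_errors(found_a: int, found_b: int, found_both: int) -> float:
    """Capture-recapture style estimate of the total number of errors.

    Assumption (not from the source): both teams detect errors independently
    and every error is equally easy to detect.
    """
    if found_both <= 0:
        raise ValueError("at least one error must be found by both teams")
    return found_a * found_b / found_both


def estimate_remaining_errors(found_a: int, found_b: int, found_both: int) -> float:
    """Estimated number of errors that neither team has found."""
    total = estimate_total_errors(found_a, found_b, found_both)
    distinct_found = found_a + found_b - found_both
    return max(total - distinct_found, 0.0)


# Hypothetical counts: team A finds 20 errors, team B finds 15, 10 in common.
print(estimate_remaining_errors(20, 15, 10))
```

Under this assumed estimator, a larger overlap between the two teams lowers the estimated total, so errors that are easy for both teams to find pull the estimate down.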
Besides the extra cost, this method may underestimate the number of errors remaining in the program since many errors are easy to detect and, hence, are more likely to be detected by both teams. DeMillo, Lipton and Sayward (1978) discuss a related technique called 'program mutation' for systematically seeding errors into a program.

In this section we have classified many software reliability models without describing them in detail. References (Bologna and Ehrenberger, 1978; Dahl and Lahti, 1978; Schick and Wolverton, 1978; Tal, 1976; Ramamoorthy and Bastani, 1982; Goel and Okumoto, 1985) contain a detailed survey of most of these models. In the next two sections we discuss a few software reliability growth models and sampling models, respectively.

3. Software reliability growth models

In this section we first discuss the concepts of error size and testing process. We develop a general framework for software reliability growth models using these concepts. Then we briefly discuss some error-counting and nonerror-counting models. The section concludes with a discussion on the practical application of such models.

3.1. Error sizes

A program $P$ maps its input domain, $I$, into its output space, $O$. Each element in $I$ is mapped to a unique element in $O$ if we assume that the state variables (i.e., output variables whose values are used during the next run, as in process control software) are considered a part of both $I$ and $O$. Software reliability models used during the development phase are intimately concerned with the size of an error. This is defined as follows:

DEFINITION. The size of an error is the probability that an element selected from $I$ according to the test case selection criterion results in failure due to that error.

An error is easily detected if it has a large size since then it affects many input elements. Similarly, if it has a small size, then it is relatively more difficult to detect the error.
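A short numerical sketch makes the link between error size and ease of detection concrete. It assumes that test inputs are drawn independently, so that each run exposes an error of size `size` with probability `size`; the function names are illustrative only.

```python
def detection_probability(size: float, runs: int) -> float:
    """Probability that an error of the given size causes at least one
    failure in `runs` independent test runs: 1 - (1 - size)^runs."""
    if not 0.0 <= size <= 1.0:
        raise ValueError("error size is a probability and must lie in [0, 1]")
    return 1.0 - (1.0 - size) ** runs


def expected_runs_to_detect(size: float) -> float:
    """Mean number of independent runs until the first failure
    (mean of a geometric distribution with success probability `size`)."""
    if not 0.0 < size <= 1.0:
        raise ValueError("size must lie in (0, 1]")
    return 1.0 / size


# Compare a large and a small error over the same number of runs.
for size in (0.1, 0.001):
    print(size, detection_probability(size, 100), expected_runs_to_detect(size))
```

Under these assumptions an error of size 0.1 is almost certain to be exposed within 100 runs, while an error of size 0.001 usually is not.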
The size of an error depends on the way the inputs are selected. Good test case selection strategies, like boundary value testing, path testing and range testing, magnify the size of an error since they exercise error-prone constructs. Likewise, the observed (effective) error size is lower if the test cases are randomly chosen from the input domain.

We can generalize the notion of 'error size' by basing it on the different methods of observing programs. For example, an error has a large size visually if it can be easily detected by code reading. Similarly, an error is difficult to detect by code review if it has a small size (e.g., when only one character is missing).

The development phase is assumed to consist of the following cycle:
(1) The program is tested till an error is found;
(2) The error is corrected and step (1) is repeated.

As we have noted above, the error history of a program depends on the testing strategy employed, so that the reliability models must consider the testing process used. This is discussed in the following subsection.

3.2. Testing process

As a simple example of a case where the error history is strongly dependent on the testing process used, consider a program which has three paths, thus partitioning the input domain into three disjoint subsets. If each input is considered as equally likely, then initially errors are frequently detected. As these are corrected, the interval between error detections increases since fewer errors remain. If a path is tested 'well' before testing another path, then whenever a switch is made to a new path the error detection rate increases. Similarly, if we switch from random testing to boundary value testing, the error detection rate can increase.

The major assumption of all software reliability growth models is:

ASSUMPTION. Inputs are selected randomly and independently from the input domain according to the operational distribution.
This is a very strong assumption and will not hold in general, especially in the case of process control software where successive inputs are correlated in time during system operation. For example, if an input corresponds to a temperature reading then it cannot change very rapidly. To complicate the issue further, most process control software systems maintain a history of the input variables. The input to the program is not only the current sensor inputs, but also their history. This further reduces the validity of the above assumption. The assumption is necessary in order to keep the analysis and data requirements simple. However, it is possible to relax it as follows:

ASSUMPTION. Inputs are selected randomly and independently from the input domain according to some probability distribution (which can change with time).

This means that the effective error size varies with time even though the program is not changed. This permits a straightforward modelling of the testing process as discussed in the following subsection.

3.3. General growth model

Let
$j$ = number of failures experienced;
$k$ = number of runs since the $j$-th failure;
$T_j(k)$ = testing process for the $k$-th run after $j$ failures;
$V_j(k)$ = size of residual errors for the $k$-th run after $j$ failures; this can be random.

Now,

$$P\{\text{success on the } k\text{-th run} \mid j \text{ failures}\} = 1 - V_j(k) = 1 - f(T_j(k))\,\lambda_j$$

where
$\lambda_j$ = error size under operational inputs; this can be a random variable; $0 \le \lambda_j \le 1$; and
$f(T_j(k))$ = severity of the testing process relative to the operational inputs; $0 \le f(T_j(k)) \le 1/\lambda_j$.

Hence,

$$R_j(k \mid \lambda_j) = P\{\text{no failure over } k \text{ runs} \mid \lambda_j\} = \prod_{i=1}^{k} P\{\text{no failure on the } i\text{-th run} \mid \lambda_j\},$$

since successive test cases have independent failure probabilities. Hence,

$$R_j(k \mid \lambda_j) = \prod_{i=1}^{k} \left[1 - f(T_j(i))\,\lambda_j\right],$$

i.e.,

$$R_j(k) = E_{\lambda_j}\!\left[\prod_{i=1}^{k} \left[1 - f(T_j(i))\,\lambda_j\right]\right]$$

where $E_{\lambda_j}[\cdot]$ is the expectation over $\lambda_j$.
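The discrete-time relation can be evaluated numerically. The sketch below is only an illustration: it computes the conditional reliability as the product of $1 - f(T_j(i))\lambda_j$ over the $k$ runs, and approximates the expectation over $\lambda_j$ by averaging over samples drawn from a user-supplied sampler. The function names, the `severity` sequence and the sampler are hypothetical; the source does not prescribe any particular distribution for $\lambda_j$.

```python
from typing import Callable, Sequence


def conditional_reliability(lam: float, severity: Sequence[float]) -> float:
    """R_j(k | lambda_j): product over the k runs of 1 - f(T_j(i)) * lambda_j.

    `severity[i]` plays the role of f(T_j(i+1)); k is len(severity).
    """
    reliability = 1.0
    for f_i in severity:
        p_fail = f_i * lam
        if not 0.0 <= p_fail <= 1.0:
            raise ValueError("f * lambda must be a probability in [0, 1]")
        reliability *= 1.0 - p_fail
    return reliability


def expected_reliability(
    severity: Sequence[float],
    sample_lambda: Callable[[], float],
    draws: int = 10_000,
) -> float:
    """Monte Carlo approximation of R_j(k) = E[R_j(k | lambda_j)]."""
    total = 0.0
    for _ in range(draws):
        total += conditional_reliability(sample_lambda(), severity)
    return total / draws


# Three operational runs (severity 1) with a known error size of 0.1.
print(conditional_reliability(0.1, [1.0, 1.0, 1.0]))
```

Passing a sampler that always returns the same value reduces the expectation to the conditional formula, which gives a quick sanity check of the averaging step.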
For cases where it is difficult to identify 'runs', such as operating systems and real-time process control systems, it is simpler to work in continuous time. The above relation becomes:

$$R_j(t) = E_{\lambda_j}\!\left[e^{-\lambda_j \int_0^t f(T_j(s))\,ds}\right]$$

where
$\lambda_j$ = failure rate after the $j$-th failure; $0 \le \lambda_j \le \infty$;
$T_j(s)$ = testing process at time $s$ after the $j$-th failure;
$f(T_j(s))$ = severity of the testing process relative to the operational distribution; $0 \le f(T_j(s)) \le \infty$.

REMARKS.
(1) As we have noted above, $f(T_j(\cdot))$ is the severity of the testing process relative to the operational distribution, where the testing severity is the ratio of the probability that a run based on the test case selection strategy detects an error to the probability that a failure occurs on a run selected according to the operational distribution. Obviously, during the operational phase, $f(T_j(\cdot)) = 1$. In general it is difficult to determine the severity of the test cases, and most models assume that $f(T_j(\cdot)) = 1$. However, for some testing strategies we can quantify $f(T_j(\cdot))$. For example, in functional testing, the severity increases as we switch to new functions since these are more likely to contain errors than functions which have already been tested.
(2) Even the weaker assumption is difficult to justify for programs developed using incremental top-down or bottom-up integration (Myers, 1978), since the input domain keeps on changing. Further, the assumption ignores other methods of debugging programs, such as code reviews, static analysis, program proofs, etc.
(3) In the continuous case, the time is the CPU time (Musa, 1975).
(4) Software reliability growth models can be applied (in principle) to any type of software. However, their validity increases as the size of the software and the number of programmers involved increase.
(5) This process is a type of doubly stochastic process; these processes were originally studied by Cox in 1955 (Cox and Lewis, 1966).

3.4. Error-counting models

These models attempt to estimate the software reliability in terms of the estimated number of errors remaining in the program. The Jelinski-Moranda model (1972) was the first error-counting model. The Shooman model (1972) underwent some changes and is now similar to the Jelinski-Moranda model. The Schick-Wolverton model (1978) extended the Jelinski-Moranda model by incorporating a factor representing the severity of the test cases. The Musa model (1975) is equivalent to the Jelinski-Moranda model. However, it is better developed and is the first model to insist on execution time data rather than the calendar time data used in the earlier models.

These early models assumed that all the errors had the same error rate. This is clearly unsatisfactory since one would expect that errors which are detected later should have smaller (operational) error rates than those which are detected earlier. This is rectified by Littlewood's model (1980a), which incorporates the case where the failure rate of successive errors is stochastically decreasing. The Goel-Okumoto NHPP model (1979a) makes another departure from the other models by treating the number of faults to be detected as a random variable instead of a fixed unknown constant.

Two additional assumptions made by most error-counting models are:
(a) The failure rates of the errors remaining in the program are independent, identically distributed random variables.
(b) The program failure rate is the sum of the individual failure rates.

Taken together these assumptions are not true in general since the error distribution across modules is often skewed (Myers, 1978), so that a few complex, error-prone modules contain a large proportion of the errors. Since there is likely to be a considerable overlap in the elements (in the input domain) affected by such closely related errors, the removal of each error, except the last error, decreases the failure rate by less than its own failure rate. This can result in an incorrect estimate of the reliability of the program since each detected error would be perceived as having a failure rate smaller than its actual failure rate. Further, a common testing strategy is to direct subsequent test cases at the module in which an error was most recently detected till sufficient confidence is restored in its correctness. However, this would mean that the failure rates are no longer independent and identically distributed.

In order to illustrate models in this category, we now present the details of the general Poisson model (GPM) discussed in (Angus et al., 1980). It generalizes the Jelinski-Moranda linear de-eutrophication model, the Shooman model, and the Schick-Wolverton model. The key parts of the Musa model are also generalized by this model. The inputs to the model are (1) $t_1, t_2, \ldots, t_n$, where $t_j$ is the time required to detect the $j$-th failure after the error(s) causing the $(j-1)$-th failure has (have) been corrected, and (2) $m_1, m_2, \ldots, m_n$, where $m_j$ is the number of errors fixed as a result of the $j$-th failure. The GPM assumes that

$$f(T_j(s)) = \alpha s^{\alpha - 1}, \qquad \lambda_j = (N - M_j)\,\phi,$$

where $N$ is the number of errors originally present, $M_j = \sum_{i=1}^{j} m_i$, and $\alpha$, $\phi$ are constants. Hence

$$R_j(t) = e^{-\phi (N - M_j)\, t^{\alpha}}.$$

The assumptions of the GPM are as follows:
(1) consecutive inputs have independent failure probabilities,
(2) all errors have the same disjoint failure rate $\phi$,
(3) the severity of the testing process is proportional to a power of the elapsed CPU time,
(4) no new errors are introduced.

Assumption (1) has already been discussed above.
Assumption (2) is a major drawback of these models (Littlewood, 1980a): earlier errors are likely to have a larger failure rate since they are detected more easily. Assumption (3) depends to a large extent on the testing strategy used. Intuitively, as time increases, the severity of the testing increases (Schick and Wolverton, 1978). Assumption (4) is not true in general and can lead to invalid estimates (Angus et al., 1980). Musa (1975) partly overcomes this by estimating the total number of errors to be eventually detected.

The Maximum Likelihood Estimates (MLE) for the parameters of the model can be derived as follows:

$$\text{failure PDF}_j(t) = -\frac{dR_j(t)}{dt} = \phi (N - M_j)\,\alpha\, t^{\alpha - 1}\, e^{-\phi (N - M_j)\, t^{\alpha}}.$$

The likelihood function is

$$L = \prod_{j=1}^{n} \text{PDF}_{j-1}(t_j).$$

Hence, the log likelihood function is

$$\log L = n \log \phi + n \log \alpha + \sum_{j=1}^{n} \log (N - M_{j-1}) + (\alpha - 1) \sum_{j=1}^{n} \log t_j - \phi \sum_{j=1}^{n} (N - M_{j-1})\, t_j^{\alpha}.$$

The MLEs can be computed by numerically solving the equations obtained by equating the partial derivatives of $\log L$ with respect to $N$, $\alpha$, and $\phi$ to 0. The final equations are as follows:

$$\sum_{j=1}^{n} \frac{1}{\hat{N} - M_{j-1}} - \hat{\phi} \sum_{j=1}^{n} t_j^{\hat{\alpha}} = 0,$$

$$\frac{n}{\hat{\alpha}} + \sum_{j=1}^{n} \log t_j - \sum_{j=1}^{n} \hat{\phi}\, (\hat{N} - M_{j-1})\, t_j^{\hat{\alpha}} \log t_j = 0,$$

$$\frac{n}{\hat{\phi}} - \sum_{j=1}^{n} (\hat{N} - M_{j-1})\, t_j^{\hat{\alpha}} = 0.$$

These are discussed further in (Angus et al., 1980).

3.5. Nonerror-counting models

These models only estimate the reliability of the software. They consider the effect of a debugging action on the error size or on the failure rate without concern as to the number of errors detected at a time. For example, in the Jelinski-Moranda geometric de-eutrophication model, we have

$$\lambda_j = \lambda_{j-1} D$$

where $\lambda_j$ is the error rate and $D$ is a constant to be estimated. An interesting observation is that the estimates of the parameters of this model may exist even in cases where those of the linear de-eutrophication model do not exist, i.e., fail to converge (Dahl and Lahti, 1978; Tal, 1976). Similarly, for the Littlewood-Verrall Bayesian model (1973) we have

$$\lambda_j \le_{st} \lambda_{j-1},$$

i.e., $\lambda_j$ is stochastically smaller than $\lambda_{j-1}$. This models the case where there is a possibility that a debugging action may introduce new errors into the program. For the stochastic input domain based model (Ramamoorthy and Bastani, 1980) we have:

$$\lambda_{j-1} - \lambda_j \sim \lambda_{j-1} X,$$

where $\lambda_j$ is the error size and $X$ is a random variable having a piecewise continuous distribution. This models the case where errors detected later have (stochastically) smaller sizes than those detected earlier.

In order to illustrate models in this category, we present details of the Musa-Okumoto logarithmic model (1984). The inputs to the model are $t_1, t_2, \ldots, t_n$, where $t_j$ is the time (not interval) at which the $j$-th error was detected. In this model:

$$f(T_j(s)) = 1, \qquad \lambda(t) = \frac{\lambda_0}{\lambda_0 \theta t + 1}.$$

Thus, the model assumes that the failure rate decreases continuously over the testing and debugging phase, rather than at discrete points corresponding to error correction times. Further, the rate of decrease in $\lambda(t)$ itself decreases with time, thus modelling the decrease in the size of errors detected as debugging proceeds. Hence,

$$R_j(t) = e^{-\int_{t_j}^{t_j + t} \lambda(s)\,ds} = \left\{ \frac{\lambda_0 \theta t_j + 1}{\lambda_0 \theta (t_j + t) + 1} \right\}^{1/\theta}.$$

From this, the failure probability density function is

$$\text{failure PDF}_j(t) = \lambda(t_j + t)\, e^{-\int_{t_j}^{t_j + t} \lambda(s)\,ds}.$$

Hence,

$$L = \left\{ \prod_{j=1}^{n} \lambda(t_j) \right\} e^{-\int_0^{t_n} \lambda(s)\,ds}.$$

Taking the logarithm of the likelihood function, we get

$$\log L = n \log \lambda_0 - \sum_{j=1}^{n} \log(\lambda_0 \theta t_j + 1) - \frac{1}{\theta} \log(\lambda_0 \theta t_n + 1).$$

Setting the derivatives of $\log L$ with respect to $\lambda_0$ and $\theta$ to 0 yields two equations which can be solved numerically for the maximum likelihood estimates of $\lambda_0$ and $\theta$, i.e., $\hat{\lambda}_0$ and $\hat{\theta}$:

$$\frac{n}{\hat{\lambda}_0} - \sum_{j=1}^{n} \frac{\hat{\theta}\, t_j}{\hat{\lambda}_0 \hat{\theta} t_j + 1} - \frac{t_n}{\hat{\lambda}_0 \hat{\theta} t_n + 1} = 0,$$

$$-\sum_{j=1}^{n} \frac{\hat{\lambda}_0\, t_j}{\hat{\lambda}_0 \hat{\theta} t_j + 1} + \frac{1}{\hat{\theta}^2} \log(\hat{\lambda}_0 \hat{\theta} t_n + 1) - \frac{\hat{\lambda}_0\, t_n}{\hat{\theta}\,(\hat{\lambda}_0 \hat{\theta} t_n + 1)} = 0.$$

Experience has shown that this model is more accurate than the earlier model proposed by Musa (1975). Further discussions concerning the application of the new model appear in (Musa and Okumoto, 1984).

3.6. Summary

We can view $\lambda$ as a random walk process in the interval $(0, e)$.
Each time the program is changed (due to error corrections or other modifications) $\lambda$ changes. In the formulation of the general model, $\lambda_j$ denotes the state of $\lambda$ after the $j$-th change to the program. Let $Z_j$ denote the time between failures after the $j$-th change. $Z_j$ is a random variable whose distribution depends on $\lambda_j$. In all the above continuous (discrete) time models, we have assumed that this distribution is the exponential (geometric) distribution with parameter $\lambda_j$, provided that $f(T_j(\cdot)) = 1$. We do not know anything about the random walk process of $\lambda$ other than a sample of times between failures. Hence, one approach is to construct a model for $\lambda$ and fit the parameters of the model to the sample data. Then we assume that the future behavior of $\lambda$ can be predicted from the behavior of the model. Some of the models for $\lambda$ which have been developed are as follows:

General Poisson Model (Angus et al., 1980): The set of possible states is $(0, e/N, 2e/N, \ldots, e)$; $\lambda_j = (N - j)\,e/N$; the parameters are $e$ and $N$; there is a finite number of states.

Geometric De-Eutrophication Model (Moranda, 1975): The set of possible states is $(e, ed, ed^2, ed^3, \ldots)$, where $d < 1$; $\lambda_j = e d^{\,j}$; the parameters are $e$ and $d$; there is an infinite (although countable) number of states.

Stochastic (Input Domain) Model (Ramamoorthy and Bastani, 1980): The state is continuous over the interval $(0, e)$; $\lambda_j = \lambda_{j-1} - \Delta_j$, where $\Delta_j \sim \lambda_{j-1} X$, $X \sim \beta(r, s)$; the parameters are $r$ and $s$.

An alternative approach is the Bayesian approach advocated by Littlewood (1979). In this method, we postulate a prior distribution for each of $\lambda_1, \lambda_2, \ldots, \lambda_j$. Then, based on the sample data, we compute the posterior distribution of $\lambda_{j+1}$. Some additional discussions appear in (Ramamoorthy, 1980).

Over 50 different software reliability growth models have been proposed so far. These models yield widely varying predictions for the same set of failure data (Abdel-Ghaly et al., 1986).
Further, any given model gives reasonable predictions for one set of data and incorrect predictions for other sets of data. This has led some researchers to propose that for each project several models should be used and then goodness-of-fit tests should be performed prior to selecting a model that is valid for the given set of failure data (Goel, 1985; Abdel-Ghaly et al., 1986).

A basic problem with all software reliability growth models is that their assumption that errors are detected as a result of random testing is not true for modern software development methods. Models which have been validated using data gathered over a decade ago are not necessarily valid for current projects that use more systematic methods and tools.

As an analogy, consider the task of reviewing a technical paper. There are (at least) three major types of errors which can creep into a manuscript. These are (1) spelling, typographical, and other context independent errors, (2) grammatical, organization, style, and other context dependent errors, and (3) correctness of equations, significance of the contribution, and other technical errors. Context dependent errors can be detected by random testing (i.e., by selecting anyone familiar with the language to review the paper), while three carefully selected referees are vastly superior to a thousand randomly selected referees in their ability to detect technical errors. Also, the failure process observed when all the errors are detected by human beings (testing) is different from that observed when automated tools such as spelling and grammar checkers are used.

Similarly, in software development we now have tools that can detect most context independent errors (syntax errors, incorrect procedure calls, etc.) and context dependent errors (undefined variables, invalid pointers, inaccessible code segments, etc.). These tools include strongly typed languages and their compilers, data flow analyzers, etc.
The remaining errors are generally the result of misunderstanding of specifications. These are best detected by formal code review and walk-through, simulation, verification where possible, and systematic testing which can be either incremental bottom-up or top-down and which emphasizes error-prone regions of the input domain, such as boundary and special value points. Again, the failure process when these methods are used is completely different from that obtained when only random testing is used.

In summary, software reliability growth models treat the program as a black box. That is, the reliability is estimated without regard to the structure of the program, the number of procedures which have been formally proved/derived, etc. Their assumption regarding random testing is generally not valid for modern program development methods. Experience shows that with systematic validation techniques, errors are initially detected in quick succession with an abrupt transition to an (almost) error free state. Thus, these models can only be used for obtaining an approximate estimate of the reliability of programs.

4. Sampling models

Software developed for critical applications, like air-traffic control, must be shown to have a high reliability prior to actual use. Since the possibility of specification errors exists, program testing must be used in addition to program proofs. At the end of the development phase, the software is subjected to a large amount of testing in order to estimate its reliability. Errors found during this phase are not corrected. In fact, if errors are discovered the software may be rejected (Ramamoorthy, 1979).

In this section we discuss methods of measuring the reliability of a program based on the sample selected. We first discuss Nelson's method (MacWilliams, 1973; Nelson, 1978; TRW, 1976) and then a model for estimating the correctness probability of a program based on its input domain.

4.1. The Nelson model

This model (TRW, 1976) is based on the operational definition of software reliability given earlier. It is the only model whose theoretical foundations are sound. However, it suffers from a number of practical drawbacks:
(1) In order to have a high confidence in the reliability estimate, a large number of test cases must be used.
(2) It does not take into account 'continuity' in the input domain. For example, if the program is correct for a given test case, then it is likely that it is correct for all test cases executing the same sequence of statements.
(3) It assumes random sampling of the input domain. Thus, it cannot take advantage of testing strategies which have a higher probability of detecting errors, e.g., boundary value testing. Further, for most real-time control systems, the successive inputs are correlated if the inputs are sensor readings of physical quantities, like temperature, which cannot change rapidly. In these cases we cannot perform random testing.
(4) It does not consider any complexity measure of the program, e.g., number of paths, statements, etc. Generally, a complex program should be tested more than a simple program for the same confidence in the reliability estimate.

In order to overcome these drawbacks, the model has been extended (Nelson, 1978) as follows: The input domain is divided into several equivalence classes. The division can be based on paths or, when the number of paths is too large, on some other criterion (e.g., program sub-functions). It is assumed that there is some continuity among the elements in an equivalence class, i.e., if the program executes correctly for an input from the $j$-th equivalence class, then it will execute correctly for any randomly selected input from the same equivalence class with probability $1 - b_j$, where $b_j \ll 1$. Then:

$$R(1) = \sum_{j=1}^{m} P_j\,(1 - b_j)$$

where $m$ = number of equivalence classes; and $P_j$ = probability of selecting an input from the $j$-th equivalence class during actual operation.
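As a rough illustration of the extended model, the sketch below evaluates the single-run reliability as a weighted sum over equivalence classes, assuming the estimate has the form $R(1) = \sum_j P_j (1 - b_j)$ and that the program has run correctly on a test case from every class. The function name and the numerical values of $P_j$ and $b_j$ are made up for the example.

```python
from typing import Sequence


def extended_nelson_reliability(
    class_probs: Sequence[float], class_b: Sequence[float]
) -> float:
    """Single-run reliability R(1) = sum_j P_j * (1 - b_j).

    class_probs[j]: operational probability P_j of the j-th equivalence class.
    class_b[j]:     b_j, the residual failure probability within that class
                    after a successful test case from it.
    """
    if len(class_probs) != len(class_b):
        raise ValueError("one b_j is needed for every equivalence class")
    if abs(sum(class_probs) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    if any(not 0.0 <= b <= 1.0 for b in class_b):
        raise ValueError("each b_j must lie in [0, 1]")
    return sum(p * (1.0 - b) for p, b in zip(class_probs, class_b))


# Hypothetical program with three equivalence classes (e.g. three paths).
r1 = extended_nelson_reliability([0.5, 0.3, 0.2], [0.0, 0.01, 0.05])
print(r1)        # single-run reliability
print(r1 ** 10)  # reliability over 10 independent runs, using R(i) = R^i
```

The result is only as good as the chosen $b_j$ values, which must be supplied from outside the model.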
DISCUSSION. This model is a big improvement over the original model. Some comments are:
(1) The assignment of values to $b_j$ is ad hoc; no theoretical justification is given for the assignment (Nelson, 1978).
(2) The model uses only one type of complexity measure, namely, number of paths, functions, etc. However, it does not consider the relative complexity of each path, function, etc.

Many other interesting aspects of the Nelson model are discussed in (TRW, 1976).

4.2. Input domain based model

This model is discussed in detail in (Ramamoorthy and Bastani, 1979). It removes most of the objections to the Nelson model. The price is the increased complexity of the model.

The model was developed for assessing the quality of critical real-time process control programs. In such systems no failures should be detected during the reliability estimation phase, so that the reliability estimate is one. Hence, the important metric of concern is the confidence in the reliability estimate. This model provides an estimate of the conditional probability that the program is correct for all possible inputs given that it is correct for a given set of inputs. The basic assumption is that the outcome of each test case provides at least some stochastic information about the behavior of the program for points which are close to the test point.

The model uses the concept of probabilistic equivalence classes, which is defined as follows: $E$ is a probabilistic equivalence class if $E$ is a subset of $I$, where $I$ is the input domain of the program $P$, and $P$ is correct for all elements in $E$, with probability $P\{E \mid X_1, \ldots, X_d\}$, if $P$ is correct for each $X_i$ in $E$, $i = 1, \ldots, d$. Then, $P\{I \mid X\}$ is the correctness probability of $P$ based on the set of test cases $X$. (Obviously, the program must be correct for each element in $X$.)
Probabilistic equivalence classes are derived from the requirements specification and the program source code in order to minimize control flow errors. A suggested selection criterion (Ramamoorthy and Bastani, 1979) is: Let $E$ be a probabilistic equivalence class. $X$ is in $E$ if an error in the program which affects any element in $E$ can affect $X$, and vice versa. The results of this classification scheme are:
(1) It includes all paths without loops since distinct paths differ in at least one statement.
(2) Multiple conditions are treated separately since an error in one condition need not affect the other conditions.
(3) Loops are restricted to a finite number of repetitions.

In order to further minimize control flow errors, these classes should be intersected with classes derived from the requirements specification (Weyuker and Ostrand, 1980).

Finally, we can estimate the correctness probability of the program using the continuity assumption, namely, closely related points in the input domain are 'correlated' with respect to the implementation of the function. This is true in general for algebraic programs where errors usually affect an interval of nearby points. These regions correspond to high probability equivalence classes, such as those formed on the basis of program paths. A specific model is developed in (Ramamoorthy and Bastani, 1979). The main result of this model is

$$P\{\text{program is correct for all points in } [a, a + V] \mid \text{it is correct for test cases having successive distances } x_j,\ j = 1, \ldots, n - 1\} = e^{-\lambda V} \prod_{j=1}^{n-1} \frac{2}{1 + e^{-\lambda x_j}},$$

where $\lambda$ is a parameter which is deduced from some measure of the complexity of the source code.

DISCUSSION. The advantages of this model are:
(1) Any test case selection strategy can be used. This will minimize the testing effort since we can choose test cases which exercise error-prone constructs.
(2) It does not assume random sampling.
(3) It takes into account the complexity of the program: a simple program is tested less than a complicated program for the same correctness probability.

The model also yields the optimal testing strategy to be used. Specifically, for algebraic programs the test cases should be spread out over the input domain for higher correctness probability. The disadvantages of the model are:
(1) It is relatively expensive to determine the equivalence classes and their complexity.
(2) Incorporation of more general continuity assumptions (e.g., boundary value relationships) results in mathematically intractable derivations.

4.3. Summary

The models discussed in this section are especially attractive for medium size programs whose reliability cannot be accurately estimated by using reliability growth models. These models also have the advantage of considering the structure of the program. This enables the joint use of program proving and testing in order to validate the program and assess its reliability (Long et al., 1977).

5. Conclusion

We first defined software reliability and discussed three methods of measuring it. Then we developed a general framework for software reliability growth models using the concepts of error size and testing process. We distinguished between error-counting and nonerror-counting models. If only the reliability estimate is required, then the nonerror-counting models are preferable since they can model the debugging process more realistically. Error-counting models should be used when an estimate of the number of remaining errors is needed. This may be required if resources have to be allocated for the maintenance phase (assuming that the average resource per error correction is known). It is also possible to estimate the number of errors remaining in a program by using error seeding techniques. Finally, we briefly discussed two sampling models, namely, the Nelson model and its extension and an input domain based model.
At the present time no specific software reliability model has found wide acceptance. This is partly due to the cost involved in gathering failure data and partly because of the difficulty in modelling the testing process. In the following, we outline a method combining well established proof procedures with software reliability estimation methods. It is particularly suitable for critical process control systems.
(1) During the testing and debugging phase at least two different software reliability growth models should be used, primarily to help the manager make decisions such as when to stop testing. Goodness-of-fit tests should be performed in order to select the model which is most appropriate for the failure data obtained from the project.
(2) After the reliability growth models indicate that the reliability objective has been achieved, a sampling model is used in order to get a more accurate estimate of the reliability of the program.
(a) First, equivalence classes are determined based on the paths in the program using the selection criterion discussed in Section 4.2. Boundary value and range testing are performed in order to ensure that the classes are chosen properly.
(b) If the path corresponding to each equivalence class can be verified (e.g., by using symbolic execution) then the correctness probability of the class is 1.
(c) If the correctness of the path cannot be verified, then the degree of the equivalence class is estimated. Next, as many test cases as necessary are used so as to achieve a desired confidence in the correctness of the software.
During the first decade of software reliability research the major emphasis was on developing models based on various assumptions. This resulted in the proliferation of models, most of which were neither used nor validated. Currently the consensus appears to be that perhaps there is no single model which can be applied to all types of projects.
Hence, one area of active research is to investigate whether a set of models can be combined so as to achieve more accurate reliability estimates for various situations. Other research topics include (1) developing methods of analyzing the confidence in the predictions of a model, and (2) using software reliability theory to assist with the management of a project throughout its life cycle.

References

Abdel-Ghaly, A. A., Chan, P. Y. and Littlewood, B. (1986). Evaluation of competing software reliability predictions. IEEE Trans. Softw. Eng. 12(9).
Angus, J. E., Schafer, R. E. and Sukert, A. (1980). Software reliability model validation. In: Proc. Annu. Rel. and Maintainability Symp., San Francisco, CA, Jan. 1980, 191-199.
Barlow, R. E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New York.
Bologna, S. and Ehrenberger, W. (1978). Applicability of statistical models for reactor safety software verification. Unpublished report.
Cox, D. R. and Lewis, P. A. W. (1966). The Statistical Analysis of Series of Events. Methuen, London.
Dahl, G. and Lahti, J. (1978). Investigation of methods for production and verification of computer programmes with high requirements for reliability. OECD Halden Reactor Project, Preliminary Report.
DeMillo, R. A., Lipton, R. J. and Sayward, F. G. (1978). Hints on test data selection: Help for the practicing programmer. Computer (IEEE), April, 34-41.
Duran, J. W. and Wiorkowski, J. J. Capture-recapture sampling for estimating software error content. IEEE Trans. Softw. Eng. 7(1).
Forman, E. H. and Singpurwalla, N. D. (1977). An empirical stopping rule for debugging and testing computer software. J. Amer. Stat. Ass. 72, 750-757.
Goel, A. L. and Okumoto, K. (1979a). A time-dependent error-detection rate model for software reliability and other performance measures. IEEE Trans. Rel. 28(3), 206-211.
Goel, A. L. and Okumoto, K. (1979b). A Markovian model for reliability and other performance measures for software systems. In: Proc. Nat. Comput. Conf., New York 48, 767-774.
Goel, A. L. (1985). Software reliability models: Assumptions, limitations, and applicability. IEEE Trans. Softw. Eng. 11(12), 1411-1423.
Jelinski, Z. and Moranda, P. (1972). Software reliability research. In: W. Freiberger, ed., Statistical Computer Performance Evaluation. Academic Press, New York, 465-484.
Littlewood, B. and Verrall, J. L. (1973). A Bayesian reliability growth model for computer software. J. Roy. Stat. Soc. 22(3), 332-346.
Littlewood, B. (1979). How to measure software reliability and how not to... IEEE Trans. Rel. 28, 103-110.
Littlewood, B. (1980a). A Bayesian differential debugging model for software reliability. In: Proc. COMPSAC '80, Chicago, IL, 511-519.
Littlewood, B. and Verrall, J. L. (1980b). On the likelihood function of a debugging model for computer software reliability. Dep. Math., City Univ., London.
Long, A. B. et al. (1977). A methodology for the development and validation of critical software for nuclear power plants. In: Proc. 1st Int. Conf. Comp. Softw. & Appl. (COMPSAC '77), Chicago, IL.
MacWilliams, W. H. (1973). Reliability of large real-time control software systems. In: Rec. 1973 IEEE Symp. Comput. Softw. Rel., New York, 1-6.
Mills, H. D. (1973). On the development of large reliable software. In: Rec. IEEE Symp. Comp. Softw. Rel., New York, 155-159.
Moranda, P. B. (1975). Prediction of software reliability during debugging. In: Proc. 1975 Annu. Rel. and Maintainability Symp., Washington, DC, 327-332.
Musa, J. D. (1975). A theory of software reliability and its applications. IEEE Trans. Softw. Eng. 1(3), 312-327.
Musa, J. D. and Okumoto, K. (1984). A logarithmic Poisson execution time model for software reliability measurement. In: Proc. 7th Int. Conf. Softw. Eng., Orlando, FL, 230-237.
Myers, G. J. (1978). The Art of Software Testing. Wiley, New York.
Nelson, E. (1978). Estimating software reliability from test data. Microelectronics and Reliability 17, 67-74.
Ohba, M. (1984). Software reliability analysis models. IBM J. Res. Develop. 28, 428-443.
Ramamoorthy, C. V. and Bastani, F. B. (1979). An input domain based approach to the quantitative estimation of software reliability. In: Proc. Taipei Sem. on Softw. Eng., Taipei.
Ramamoorthy, C. V. and Bastani, F. B. (1980). Modelling of the software reliability growth process. In: Proc. COMPSAC '80, Chicago, IL, 161-169.
Ramamoorthy, C. V. and Bastani, F. B. (1982). Software reliability--Status and perspectives. IEEE Trans. Softw. Eng. 8(4), 354-371.
Schick, G. J. and Wolverton, R. W. (1978). An analysis of competing software reliability models. IEEE Trans. Softw. Eng. 4(2), 104-120.
Shooman, M. L. (1972). Probability models for software reliability prediction. In: W. Freiberger, ed., Statistical Computer Performance Evaluation. Academic Press, New York, 485-502.
Tal, J. (1976). Development and evaluation of software reliability estimators. UTEC SR 77-013, Univ. of Utah, Elect. Eng. Dep., Salt Lake City, UT.
Trivedi, A. K. and Shooman, M. L. (1975). A many-state Markov model for the estimation and prediction of computer software performance parameters. In: Proc. 1975 Int. Conf. Rel. Softw., Los Angeles, CA, 208-220.
TRW Defense and Space Systems Group (1976). Software Reliability Study. Rep. No. 76-2260.1-9-5, TRW, Redondo Beach, CA.
Weyuker, E. J. and Ostrand, T. J. (1980). Theories of program testing and the application of revealing subdomains. IEEE Trans. Softw. Eng. 6(3), 236-246.
Yamada, S., Ohba, M. and Osaki, S. (1983). S-shaped reliability growth modeling for software error detection. IEEE Trans. Rel. 32, 475-478.
Yamada, S. and Osaki, S. (1985). Software reliability growth modeling: Models and applications. IEEE Trans. Softw. Eng. 11(12), 1431-1437.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7 © Elsevier Science Publishers B.V.
(1988) 27-54

Stress-Strength Models for Reliability

Richard A. Johnson

1. Introduction

It is a well accepted fact that the strength of a manufactured unit is a variable quantity that should be modeled as a random variable. This fact forms the basis for all of reliability modeling. A second source of variability may also have to be taken into account. When ascertaining the reliability of equipment or the viability of a material, it is also necessary to take into account the stress conditions of the operating environment. That is, uncertainty about the actual environmental stress to be encountered should be modeled as random. The terminology stress-strength model makes explicit that both stress and strength are treated as random variables.

Let X be the stress placed on a unit by its operating environment. In many applications, X is taken to represent the maximum value attained by a critical kind of stress. Lloyd and Lipow (1962) describe an application where X is the maximum chamber pressure generated by the ignition of a solid propellant in a rocket engine. Kececioglu (1972) discusses a case where a torsion stress is the most critical type of stress for a rotating steel shaft on a computer. Typically, the stress variable is the most difficult to model accurately because of the lack of sufficient data.

In the simplest stress-strength model, X is the stress placed on the unit by the operating environment and Y is the strength of the unit. A unit is able to perform its intended function if its strength is greater than the stress imposed upon it. In this context, we define reliability (R) as

R = probability that the unit performs its task satisfactorily.

That is, reliability is the probability that the unit is strong enough to overcome the stress. Let the stress X have continuous distribution $F(x)$ and strength Y have continuous distribution $G(y)$. When X and Y can be treated as independent,

\[ R = \int F(y)\,dG(y) = \int [1 - G(x)]\,dF(x) = P[Y > X]. \]

This model, first considered by Birnbaum (1956), has found an increasing number of applications in civil, mechanical and aerospace engineering. The following examples help to delineate the versatility of the model.

EXAMPLE 1.1 (Rocket engines). Let X represent the maximum chamber pressure generated by ignition of a solid propellant, and Y be the strength of the rocket chamber. Then R is the probability of a successful firing of the engine.

EXAMPLE 1.2 (Comparing two treatments). A standard design for the comparison of two drugs is to assign Drug A to one group of subjects and Drug B to another group. Denote by X and Y the remission times with Drug A and Drug B, respectively. Inferences about $R = P[Y > X]$, based on the remission time data $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$, are of primary interest to the experimenter. Although the name 'stress-strength' is not appropriate in the present context, our target of inference is the parameter R which has the same structure as in Example 1.1.

EXAMPLE 1.3 (Threshold response model). A unit, say a receptor in the human eye, operates only if it is stimulated by a source whose random magnitude, Y, is greater than a (random) lower threshold, X, for the unit. Here $P[Y > X] = P[\text{unit operates}]$ is again of the form described above in the stress-strength context.

2. Nonparametric inference about stress-strength reliability

Let the data consist of a random sample of size m of stresses $X_1, X_2, \ldots, X_m$ and an independent random sample of size n of strengths $Y_1, Y_2, \ldots, Y_n$. Birnbaum (1956) proposed the point estimate

\[ \hat R = U/(mn), \qquad U = \text{number of pairs } (X_i, Y_j) \text{ with } Y_j > X_i. \]

Alternatively, we can express $\hat R$ as

\[ \hat R = \int_{-\infty}^{\infty} F_m(y)\,dG_n(y) \]

where $F_m(\cdot)$ and $G_n(\cdot)$ are the empirical cdfs of the X's and Y's, respectively. Now

\[ E(\hat R) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} P[Y_j > X_i] = P[Y > X] = R \]

so $\hat R$ is an unbiased estimator of reliability. Under the assumption that the underlying cdfs $F(\cdot)$ and $G(\cdot)$
are continuous, the order statistics $X_{(1)} \le \cdots \le X_{(m)}$ and $Y_{(1)} \le \cdots \le Y_{(n)}$ are complete sufficient statistics so that $\hat R$ is the unique uniform minimum variance unbiased estimator of R. Another equivalent expression for $\hat R$ takes advantage of the relation between the Mann-Whitney and Wilcoxon forms of the two-sample rank statistics. That is,

\[ U = mn + \frac{m(m+1)}{2} - \sum_{i=1}^{m} \operatorname{rank}(X_i), \]

where the ranks are taken in the combined sample. Owen, Craswell and Hansen (1964) point out that $\hat R$ remains unbiased even if $F(\cdot)$ and $G(\cdot)$ are not continuous.

It is also possible to obtain a distribution-free lower confidence bound on $R = P[Y > X]$ based on $\hat R$. First note that

\[ \hat R - R = \int F_m(y)\,dG_n(y) - \int F(y)\,dG(y) = \int [F_m(y) - F(y)]\,dG_n(y) + \int F(y)\,[dG_n(y) - dG(y)]. \tag{2.1} \]

Birnbaum and McCarthy (1958) bound the right hand side of relation (2.1) by

\[ \sup_x [F_m(x) - F(x)] + \sup_y [G(y) - G_n(y)] = D_m^- + D_n^+ \]

where $D_m^+$ ($D_m^-$) is the Smirnov statistic based on a random sample of size m. Consequently,

\[ P[\hat R - R \le c] \ge P[D_m^- + D_n^+ \le c] \tag{2.2} \]

so a conservative lower confidence bound can be obtained for R from the distribution of $D_m^- + D_n^+$. If c is selected so that $1 - \alpha \le P[D_m^- + D_n^+ \le c]$, then

\[ P[\hat R - c \le R] \ge 1 - \alpha. \tag{2.3} \]

Thus $\hat R - c$ is a conservative $100(1-\alpha)\%$ lower confidence bound for R.

Under the transformation $Z = F(X)$, $F_m(x) - F(x) = H_m(z) - z$ where $H_m(z) = (\text{number of } F(X_i) \le z)/m$. Since the $Z_i = F(X_i)$ are independent uniform random variables, the distribution of $D_m^+$, and that of $D_m^-$, are free of $F(\cdot)$ and $G(\cdot)$. Furthermore, $P[D_m^- < d] = P[D_m^+ < d]$ and it is well known that $P[D_m^+ < d] - L(d\sqrt{m}) \to 0$ uniformly in d, where $L(z) = 1 - \exp(-2z^2)$. This suggests the approximation

\[ P[D_m^- + D_n^+ \le c] \approx \int_0^c L((c-u)\sqrt{n})\,dL(u\sqrt{m}) = 1 - \frac{n\,e^{-2mc^2}}{m+n} - \frac{m\,e^{-2nc^2}}{m+n} - \frac{2\sqrt{2\pi}\,mnc\,e^{-2mnc^2/(m+n)}}{(m+n)^{3/2}} \cdot \frac{1}{\sqrt{2\pi}} \int_{-2mc/\sqrt{m+n}}^{2nc/\sqrt{m+n}} e^{-t^2/2}\,dt. \tag{2.4} \]
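Approximation (2.4) is easy to evaluate numerically when the published tables are not at hand. The Python sketch below is an illustration only, not the procedure of any of the cited authors: the function names `coverage_approx` and `solve_c` are hypothetical, and the closed form coded here is the one obtained by carrying out the integral $\int_0^c L((c-u)\sqrt{n})\,dL(u\sqrt{m})$ with $L(z) = 1 - \exp(-2z^2)$.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def coverage_approx(c, m, n):
    """Approximate P[D_m^- + D_n^+ <= c] using the closed form of (2.4)."""
    if c <= 0:
        return 0.0
    total = m + n
    phi = NormalDist().cdf
    # The two single-exponential terms of (2.4).
    tail = (n * exp(-2 * m * c * c) + m * exp(-2 * n * c * c)) / total
    # Integral of exp(-t^2/2) between the limits of (2.4), via the normal cdf.
    gauss = sqrt(2 * pi) * (phi(2 * n * c / sqrt(total))
                            - phi(-2 * m * c / sqrt(total)))
    cross = (2 * m * n * c * exp(-2 * m * n * c * c / total)
             / total ** 1.5) * gauss
    return 1.0 - tail - cross


def solve_c(confidence, m, n, tol=1e-10):
    """Bisection for the c at which the approximate coverage reaches `confidence`."""
    lo, hi = 0.0, 2.0  # D_m^- + D_n^+ can never exceed 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if coverage_approx(mid, m, n) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    m, n = 20, 30
    c = solve_c(0.90, m, n)
    print("c =", round(c, 4), " c*sqrt(m+n) =", round(c * sqrt(m + n), 4))
```

With m = 20 and n = 30 and a 90% confidence level, the solved value of $c\sqrt{m+n}$ should come out close to 2.69, so $\hat R - c$ is then the conservative lower bound of (2.3).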
Since $D_m^-$ and $D_n^+$ are independent, both $P[D_m^- + D_n^+ \le c] = P[D_n^- + D_m^+ \le c]$ and the approximation are symmetric in the sample sizes m and n. Owen, Craswell and Hansen (1964) present tables, based on approximation (2.4), which extend those presented by Birnbaum and McCarthy (1958). The tables are entered, using the confidence level and $\lambda$ where $\lambda = m/(n+m)$, to obtain $\delta = c\sqrt{m+n}$ in our notation. Note that their upper bound on $P[X > Y]$ yields our lower bound on $R = P[Y > X]$.

EXAMPLE 2.1. Suppose we have m = 20 values of maximum rocket pressures and n = 30 observations on the strength of the chambers. Counting cases of strict inequality in our samples, we obtain U = 591 so $\hat R = 591/((30)(20)) = 0.985$ and $\lambda = 20/50 = 0.4$. For 90% confidence, $\delta = 2.69289 = c\sqrt{m+n}$ so c = 0.381 and $\hat R - c = 0.985 - 0.381 = 0.604$ is the 90% lower confidence bound on R.

Govindarajulu (1968) reports investigating two-sided confidence bounds based on the inequality

\[ |\hat R - R| \le \sup_x |F_m - F| + \sup_y |G_n - G| = D_m + D_n \]

that follows directly from (2.1). However, the resulting intervals were very wide so he suggests employing the large sample normal approximation for $\hat R$.

THEOREM 2.1. If $m, n \to \infty$,

\[ \sqrt{\min(m,n)}\;\frac{\hat R - R}{\sigma} \xrightarrow{\mathcal{L}} N(0, 1) \]

where

\[ \sigma^2 = 2\min(m,n) \left[ \frac{1}{m} \iint_{x<y} F(x)[1 - F(y)]\,dG(x)\,dG(y) + \frac{1}{n} \iint_{x<y} G(x)[1 - G(y)]\,dF(x)\,dF(y) \right]. \]

The asymptotic normality follows directly from the representation as a U-statistic. Invoking Van Dantzig's bound $\operatorname{Var}(\hat R) \le R(1-R)/\min(m,n) \le 1/(4\min(m,n))$, one can choose c to satisfy

\[ c = \frac{\Phi^{-1}(1-\gamma)}{2\sqrt{\min(m,n)}} \tag{2.5} \]

where $\Phi^{-1}(\cdot)$ is the inverse of the standard normal cdf. Then $\hat R - c$ is the $100(1-\gamma)\%$ lower confidence bound on R. For the equal sample size situation, Govindarajulu (1968) shows the 95% confidence bound takes $\sqrt{m+n}\,c = 1.17$ whereas $\sqrt{m+n}\,c = 2.93$ for the Birnbaum and McCarthy approach. Alternatively, $\sigma^2$ in Theorem 2.1 can be estimated by replacing F by $F_m$ and G by $G_n$.
\[ \hat\sigma^2 = \min(m,n) \left\{ \frac{1}{n}\left[ \int F_m^2(x)\,dG_n(x) - \left( \int F_m(x)\,dG_n(x) \right)^2 \right] + \frac{1}{m}\left[ \int G_n^2(x)\,dF_m(x) - \left( \int G_n(x)\,dF_m(x) \right)^2 \right] \right\}. \]

Sen (1967) gives essentially the same result as Govindarajulu although he derives slightly different estimates of $\sigma^2$. One of his estimates of $\sigma^2$ can be described in terms of ranks. Let $R_1 < \cdots < R_m$ be the ordered ranks of the $X_i$ and $S_1 < \cdots < S_n$ the ordered ranks of the $Y_j$ in the combined sample. Set

\[ S_{10}^2 = \frac{1}{m-1} \left[ \sum_{i=1}^{m} (R_i - i)^2 - m\left( \bar R - \frac{m+1}{2} \right)^2 \right], \qquad S_{01}^2 = \frac{1}{n-1} \left[ \sum_{j=1}^{n} (S_j - j)^2 - n\left( \bar S - \frac{n+1}{2} \right)^2 \right]. \]

The rank estimator of $\sigma^2$ is

\[ \hat\sigma^2 = \left( \frac{S_{10}^2}{m n^2} + \frac{S_{01}^2}{n m^2} \right) \min(m,n). \]

The normality of $\hat R$ should be a reasonable approximation if $m, n \ge 50$ unless the reliability in question is extremely high. In this latter case, a conservative bound can be obtained based on (2.2) but using the exact distribution of $D_m^- + D_n^+$ rather than the approximation (2.4).

The nonparametric approach has one serious drawback. In return for its distribution-free property, it is not possible to establish high reliability with even moderate sample sizes.

3. Parametric inference procedures

Given any parametric families $\{F(x \mid \theta_1),\ \theta_1 \in \Theta_1\}$ for stress and $\{G(y \mid \theta_2),\ \theta_2 \in \Theta_2\}$ for strength, the reliability is

\[ R_{\theta_1,\theta_2} = \int F(x \mid \theta_1)\,dG(x \mid \theta_2). \tag{3.1} \]

Among the numerous choices for stress and strength distributions, only a few special cases are amenable to exact small sample inference procedures. We first treat the normal and then the Weibull stress-strength models before discussing the general case.

3.1. Normal theory stress-strength models

Suppose $F(\cdot)$ is $N(\mu_1, \sigma_1^2)$ and $G(\cdot)$ is $N(\mu_2, \sigma_2^2)$. Then

\[ R = P[Y > X] = P[Y - X > 0] = \Phi\left( \frac{\mu_2 - \mu_1}{\sqrt{\sigma_1^2 + \sigma_2^2}} \right) \]

where $\Phi(\cdot)$ is the standard normal cdf with pdf $\varphi(\cdot)$. Without further assumptions, it is not possible to obtain exact confidence procedures.

3.1.1. The general normal-normal model

Let $X_1, X_2, \ldots, X_m$ be a random sample of stress values and $Y_1, Y_2, \ldots, Y_n$ be an independent random sample of strengths.
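Given such samples, the simplest use of the normal-theory formula $R = \Phi\big((\mu_2 - \mu_1)/\sqrt{\sigma_1^2 + \sigma_2^2}\big)$ of Section 3.1 is to plug in sample moments. The following Python sketch does exactly that and nothing more. The function names are hypothetical, and the divisor-m (maximum likelihood) variance is an assumption made here for illustration; it is a point estimate only and carries no confidence statement.

```python
from math import sqrt
from statistics import NormalDist, fmean, pvariance


def normal_reliability(mu1, sigma1, mu2, sigma2):
    """R = P[Y > X] for independent X ~ N(mu1, sigma1^2), Y ~ N(mu2, sigma2^2)."""
    return NormalDist().cdf((mu2 - mu1) / sqrt(sigma1 ** 2 + sigma2 ** 2))


def plug_in_reliability(stress, strength):
    """Plug sample means and divisor-m variances into the normal formula."""
    x_bar, y_bar = fmean(stress), fmean(strength)
    var_x = pvariance(stress, x_bar)      # sum (x - x_bar)^2 / m
    var_y = pvariance(strength, y_bar)    # sum (y - y_bar)^2 / n
    return normal_reliability(x_bar, sqrt(var_x), y_bar, sqrt(var_y))


if __name__ == "__main__":
    stress = [4.1, 5.3, 4.8, 5.9, 5.0]        # made-up illustrative values
    strength = [8.2, 9.1, 7.7, 8.8, 9.4, 8.0]
    print(plug_in_reliability(stress, strength))
```

Because the estimate depends on the data only through the two means and two variances, it inherits the full strength of the normality assumption in both tails.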
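Downton (1973) obtained an expression for the uniform minimum variance unbiased (UMVU) estimate of R by conditioning the indicator $I[X < Y]$ on the complete sufficient statistic $\bar x$, $\bar y$, $s_1^2 = \sum (x_i - \bar x)^2/(m-1)$, $s_2^2 = \sum (y_j - \bar y)^2/(n-1)$. In particular

\[ \tilde R = \frac{1}{B\!\left(\tfrac12, \tfrac{m-2}{2}\right) B\!\left(\tfrac12, \tfrac{n-2}{2}\right)} \int_{-1}^{1} \int_{-1}^{d(v)} (1 - v^2)^{(n-4)/2} (1 - u^2)^{(m-4)/2}\,du\,dv \tag{3.2} \]

with

\[ d(v) = \frac{(\bar y - \bar x)\sqrt{m}}{s_1 (m-1)} + v\,\frac{s_2 (n-1)}{s_1 (m-1)} \sqrt{\frac{m}{n}}. \]

We take $\tilde R = 0$ if $d(v) \le -1$ for all $|v| \le 1$, and $\tilde R = 1$ if $d(v) \ge 1$ for all $|v| \le 1$.

When the sample sizes m and n are both large, confidence bounds for R can be set using the approximate normality of $\bar x$, $\hat\sigma_1^2 = \sum (x_i - \bar x)^2/m$, $\bar y$ and $\hat\sigma_2^2 = \sum (y_j - \bar y)^2/n$. The maximum likelihood estimator of reliability is

\[ \hat R = \Phi\left( \frac{\bar y - \bar x}{\sqrt{\hat\sigma_1^2 + \hat\sigma_2^2}} \right). \]

Since

\[ \frac{\bar Y - \bar X}{\sqrt{\hat\sigma_1^2 + \hat\sigma_2^2}} - \frac{\mu_2 - \mu_1}{\sqrt{\sigma_1^2 + \sigma_2^2}} = \frac{(\bar Y - \mu_2) - (\bar X - \mu_1)}{\sqrt{\sigma_1^2 + \sigma_2^2}} - \frac{1}{2}\,\frac{\mu_2 - \mu_1}{(\sigma_1^2 + \sigma_2^2)^{3/2}} \left[ (\hat\sigma_1^2 - \sigma_1^2) + (\hat\sigma_2^2 - \sigma_2^2) \right] + o_p\!\left( \frac{1}{\sqrt{\min(m,n)}} \right) \tag{3.3} \]

we obtain the following asymptotic result.

THEOREM 3.1. If $m, n \to \infty$ and $m/(m+n) \to \lambda$ ($0 < \lambda < 1$), then

\[ \sqrt{m+n} \left( \frac{\bar Y - \bar X}{\sqrt{\hat\sigma_1^2 + \hat\sigma_2^2}} - \frac{\mu_2 - \mu_1}{\sqrt{\sigma_1^2 + \sigma_2^2}} \right) \xrightarrow{\mathcal{L}} N(0, \sigma_R^2) \tag{3.4} \]

where $\sigma_R^2$ can be estimated by

\[ \hat\sigma_R^2 = \frac{m+n}{\hat\sigma_1^2 + \hat\sigma_2^2} \left[ \frac{\hat\sigma_1^2}{m} + \frac{\hat\sigma_2^2}{n} + \frac{(\bar y - \bar x)^2}{2(\hat\sigma_1^2 + \hat\sigma_2^2)^2} \left( \frac{m-1}{m^2}\,\hat\sigma_1^4 + \frac{n-1}{n^2}\,\hat\sigma_2^4 \right) \right]. \tag{3.5} \]

As a consequence of Theorem 3.1, an approximate $100(1-\alpha)\%$ lower confidence bound for R is given by

\[ \Phi\left( \frac{\bar y - \bar x}{\sqrt{\hat\sigma_1^2 + \hat\sigma_2^2}} - z_\alpha \frac{\hat\sigma_R}{\sqrt{m+n}} \right) \tag{3.6} \]

where $1 - \alpha = \Phi(z_\alpha)$.

3.1.2. Equal variances: $\sigma_1^2 = \sigma_2^2$

Suppose it can be assumed that the stress and strength distributions have equal variances. Estimating the common $\sigma^2$ by

\[ s_p^2 = \frac{\sum_{i=1}^{m} (x_i - \bar x)^2 + \sum_{j=1}^{n} (y_j - \bar y)^2}{m+n-2} \]

leads to the non-central t-variable

\[ T = \frac{\bar Y - \bar X}{\left[ s_p^2 \left( \frac{1}{m} + \frac{1}{n} \right) \right]^{1/2}} \]

whose noncentrality parameter is $\delta = (\mu_2 - \mu_1)/[\sigma (1/m + 1/n)^{1/2}]$. Since $P_\delta[T \le t]$ is monotonically decreasing in $\delta$, a $100(1-\alpha)\%$ lower confidence bound for $\delta$ is given by $\underline{\delta}$ where

\[ P_{\underline{\delta}}[T \le t_{\mathrm{obs}}] = 1 - \alpha \tag{3.7} \]

(see Lehmann (1959), Corollary 3, p. 80 and also p. 223-224).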
Next, $R = \Phi\big( \delta \sqrt{1/m + 1/n} / \sqrt{2} \big)$ is increasing in $\delta$, so the $100(1-\alpha)\%$ lower confidence bound on R is

\[ \underline{R} = \Phi\left( \frac{\underline{\delta} \sqrt{\frac{1}{m} + \frac{1}{n}}}{\sqrt{2}} \right) \tag{3.8} \]

where $\underline{\delta}$ is a solution to (3.7). Govindarajulu (1967) gives two-sided limits.

3.1.3. Known stress distribution

Mazumdar (1970) derived minimum variance unbiased estimates of R when the stress distribution is known and when $\sigma_2^2$ is either known or unknown. Downton (1973) gives an alternative integral expression for the estimator. Church and Harris (1970) suggested a closely related estimator and derive the large sample approximate confidence interval. In the notation of Section 3.1.1, their $100(1-\alpha)\%$ lower confidence bound is asymptotically equivalent to

\[ \Phi\left( \frac{\bar y - \mu_1}{\sqrt{\sigma_1^2 + \hat\sigma_2^2}} - z_\alpha \frac{\hat\sigma_R}{\sqrt{n}} \right) \tag{3.9} \]

where

\[ \hat\sigma_R^2 = \frac{1}{\sigma_1^2 + \hat\sigma_2^2} \left[ \hat\sigma_2^2 + \frac{(\bar y - \mu_1)^2 (n-1)\,\hat\sigma_2^4}{2n (\sigma_1^2 + \hat\sigma_2^2)^2} \right]. \]

The fact that $\mu_1$ and $\sigma_1$ are known does not seem to lead to exact inference procedures. Mazumdar (1970) does obtain an exact, but inefficient, confidence bound by introducing m pseudo random numbers for the first sample.

3.1.4. Some sample size considerations

Owen, Craswell and Hansen (1964) also treat paired observations and cases where the variances and covariances are known. They then obtain some upper bounds on the sample size required to achieve specified confidence bounds. We present an extension of their approach. For independent samples, when $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $R = \Phi\big( (\mu_2 - \mu_1)/(\sqrt{2}\,\sigma) \big)$ and $\sigma^2$ is estimated by $s_p^2$. Given a fixed precision c and reliability R, required sample sizes can be obtained by solving

\[ 1 - \alpha = P[\hat R - c < R] = P\left[ \frac{\bar Y - \bar X}{\sqrt{2}\,s_p} < z_{(1-R-c)} \right] = P\left[ T(\delta_{m,n}) < \sqrt{2} \left( \frac{1}{m} + \frac{1}{n} \right)^{-1/2} z_{(1-R-c)} \right] \tag{3.10} \]

where $T(\delta_{m,n})$ is a non-central t-variable with $m+n-2$ degrees of freedom and non-centrality parameter

\[ \delta_{m,n} = \sqrt{2} \left( \frac{1}{m} + \frac{1}{n} \right)^{-1/2} z_{1-R}. \]

Note that the sample sizes m and n enter the non-centrality parameter, the degrees of freedom and the percentile $\sqrt{2}\,(1/m + 1/n)^{-1/2} z_{(1-R-c)}$. The values of m and n do, however, enter (3.10) symmetrically. In an application, the solutions m, n must be maximized over the range of R of interest.
Owen, Craswell and Hansen (1964) give a table of values for the case of equal sample sizes.

3.2. Exponential and Weibull distributions with equal shape

When the stress and strength distributions are both Weibull, and their shape parameters are equal,

\[ R_{\theta_1,\theta_2,p} = 1 - \int_0^\infty e^{-(x/\theta_1)^p}\,\frac{p}{\theta_2} \left( \frac{x}{\theta_2} \right)^{p-1} e^{-(x/\theta_2)^p}\,dx = \frac{1}{1 + (\theta_1/\theta_2)^p}. \tag{3.11} \]

This Weibull expression includes the negative exponential distribution, when p = 1, and the Rayleigh distribution, when p = 2. Unless the common shape parameter is known, only large sample approaches to inference are available.

When both distributions are negative exponential, some exact procedures are available. With independent random samples, the likelihood is

\[ \theta_1^{-m}\,e^{-\sum_{i=1}^{m} x_i/\theta_1}\;\theta_2^{-n}\,e^{-\sum_{j=1}^{n} y_j/\theta_2} \tag{3.12} \]

and the maximum likelihood estimator of $R = 1/(1 + \theta_1/\theta_2)$ is $\hat R = 1/(1 + \bar X/\bar Y)$. The bias is relatively small and $(m+n)(\hat R - \tilde R) = O(1)$ where $\tilde R$ is the UMVU estimator. Since $(\bar X/\theta_1)/(\bar Y/\theta_2)$ is distributed $F_{2m,2n}$, a $100(1-\alpha)\%$ lower confidence bound on R is given by

\[ \left( 1 + \frac{\bar X}{\bar Y}\,F_{2n,2m}(\alpha) \right)^{-1} < R \tag{3.13} \]

where $F_{2n,2m}(\alpha)$ is the upper $\alpha$-th point of the $F_{2n,2m}$-distribution. Alternatively, since $(n/m)F_{2n,2m}/(1 + (n/m)F_{2n,2m})$ has a beta distribution with parameters n and m, the lower confidence bound can also be expressed as

\[ \left( 1 + \frac{m \bar X}{n \bar Y} \cdot \frac{\eta_{1-\alpha}}{1 - \eta_{1-\alpha}} \right)^{-1} < R \]

where $\eta_{1-\alpha}$ is the $100(1-\alpha)$-th percentile of the beta distribution. The case of known stress parameter, $\theta_1$, can be treated by the same methods. Basu (1980) considers the Marshall-Olkin bivariate exponential distribution.

3.3. General parametric families

Given point estimates $\hat\theta_1$ and $\hat\theta_2$, the point estimate of R,

\[ \hat R = \int F_{\hat\theta_1}(x)\,dG_{\hat\theta_2}(x), \tag{3.14} \]

can usually be evaluated by numerical methods. Notice that $\hat R$ is the MLE if $\hat\theta_1$ and $\hat\theta_2$ are MLE's. Except for the normal and exponential cases, confidence bounds must be based on large sample theory.
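Before turning to the large sample theory, it may help to see that the exact exponential bound (3.13) of Section 3.2 needs only a single beta (equivalently F) percentile. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the beta percentile is found by bisection rather than taken from tables, and the beta cdf is computed from the standard identity that, for positive integers a and b, it equals the probability that a binomial variable with a + b - 1 trials and success probability x is at least a.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b (binomial-sum identity)."""
    trials = a + b - 1
    return sum(comb(trials, j) * x ** j * (1 - x) ** (trials - j)
               for j in range(a, trials + 1))


def beta_quantile_int(prob, a, b, tol=1e-12):
    """Percentile of Beta(a, b) by bisection on the cdf."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exponential_reliability_bound(stress, strength, alpha=0.05):
    """Return (MLE of R, 100(1-alpha)% lower confidence bound) for exponential data."""
    m, n = len(stress), len(strength)
    ratio = (sum(stress) / m) / (sum(strength) / n)   # X-bar / Y-bar
    r_mle = 1.0 / (1.0 + ratio)
    eta = beta_quantile_int(1.0 - alpha, n, m)        # percentile of Beta(n, m)
    f_upper = (m / n) * eta / (1.0 - eta)             # upper alpha point of F(2n, 2m)
    return r_mle, 1.0 / (1.0 + ratio * f_upper)


if __name__ == "__main__":
    stress = [0.8, 2.1, 0.3, 1.6, 0.9]                 # made-up illustrative values
    strength = [6.0, 11.5, 3.2, 9.8, 14.1, 7.7]
    print(exponential_reliability_bound(stress, strength))
```

As a quick sanity check on the conversion, with m = n = 1 the beta distribution is uniform, so the implied upper 5% point of $F_{2,2}$ is 0.95/0.05 = 19, the familiar tabled value.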
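Suppose

\[ \sqrt{m}\,(\hat\theta_1 - \theta_1) \xrightarrow{\mathcal{L}} N(0, I_1^{-1}(\theta_1)) \]

independent of $\hat\theta_2$ and

\[ \sqrt{n}\,(\hat\theta_2 - \theta_2) \xrightarrow{\mathcal{L}} N(0, I_2^{-1}(\theta_2)) \]

where $I_1(\theta_1)$ and $I_2(\theta_2)$ are the Fisher information matrices for the stress and strength distributions, respectively. Then, if the derivatives are smooth,

\[ \sqrt{m+n}\,(R_{\hat\theta_1,\hat\theta_2} - R_{\theta_1,\theta_2}) = \sqrt{\frac{m+n}{m}}\,\sqrt{m}\,(\hat\theta_1 - \theta_1)'\,\frac{\partial}{\partial\theta_1} \int F_{\theta_1}(x)\,dG_{\theta_2}(x) + \sqrt{\frac{m+n}{n}}\,\sqrt{n}\,(\hat\theta_2 - \theta_2)'\,\frac{\partial}{\partial\theta_2} \int F_{\theta_1}(x)\,dG_{\theta_2}(x) + O_p\!\left( \frac{1}{\sqrt{\min(m,n)}} \right). \tag{3.15} \]

Under suitable regularity conditions, including the interchange of integration and differentiation,

\[ a_{\theta_1} = \frac{\partial}{\partial\theta_1} \int F_{\theta_1}(x)\,dG_{\theta_2}(x) = \int_{-\infty}^{\infty} \int_{-\infty}^{x} \frac{\partial f(u \mid \theta_1)/\partial\theta_1}{f(u \mid \theta_1)}\,f(u \mid \theta_1)\,g(x \mid \theta_2)\,du\,dx = \int_{-\infty}^{\infty} [1 - G_{\theta_2}(u)]\,\frac{\partial f(u \mid \theta_1)/\partial\theta_1}{f(u \mid \theta_1)}\,f(u \mid \theta_1)\,du, \]

\[ b_{\theta_2} = \int_{-\infty}^{\infty} F_{\theta_1}(x)\,\frac{\partial g(x \mid \theta_2)/\partial\theta_2}{g(x \mid \theta_2)}\,g(x \mid \theta_2)\,dx. \tag{3.16} \]

Notice that $a_{\theta_1}$ and $b_{\theta_2}$ are expressions for the covariance of score functions.

THEOREM 3.2. If $m, n \to \infty$ and $m/(m+n) \to \lambda$ ($0 < \lambda < 1$), then

\[ \sqrt{m+n}\,(R_{\hat\theta_1,\hat\theta_2} - R_{\theta_1,\theta_2}) \xrightarrow{\mathcal{L}} N(0, \sigma_{R,\lambda}^2) \tag{3.17} \]

where $\sigma_{R,\lambda}^2$ may be estimated as

\[ \hat\sigma_{R,\lambda}^2 = \frac{1}{\lambda}\,a_{\hat\theta_1}'\,I_1^{-1}(\hat\theta_1)\,a_{\hat\theta_1} + \frac{1}{1-\lambda}\,b_{\hat\theta_2}'\,I_2^{-1}(\hat\theta_2)\,b_{\hat\theta_2}. \]

As a consequence of Theorem 3.2 an approximate large sample $100(1-\alpha)\%$ lower confidence bound on $R_{\theta_1,\theta_2}$ is given by

\[ R_{\hat\theta_1,\hat\theta_2} - z_\alpha\,\hat\sigma_{R,\lambda}/\sqrt{m+n}. \tag{3.18} \]

3.4. Drawback of the parametric model

Only moderate sample sizes are required for estimates of $\mu_1$, $\mu_2$ and $\sigma$ ($= \sigma_1 = \sigma_2$) in the normal model. However, estimates of reliability and the lower confidence bound make strong use of the assumptions that the upper tail of the stress distribution and the lower tail of the strength distribution are normal. If the sample sizes are not large enough to produce observations in these tails, we cannot even check this assumption. If a small fraction of the population of units contains major defects of material or workmanship, even a moderate sample of strengths will not show these 'rare' causes of failure. In this situation, use of an assumed parametric form for the stress distribution will typically lead to estimates of $P[Y > X]$ which are, incorrectly, very high.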
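The effect of a small defective subpopulation can be made concrete with a short calculation. In the Python sketch below all numbers are invented for illustration (stress $N(5, 1)$, sound units with strength $N(10, 1)$, and a 1% fraction of defective units with strength $N(4, 1)$), and the function names are hypothetical. It compares the reliability reported by the nominal normal model, which is what a moderate sample of sound units would suggest, with the reliability of the contaminated population, using the fact that $P[Y > X]$ for a mixture of strengths is the corresponding mixture of probabilities.

```python
from math import sqrt
from statistics import NormalDist


def normal_reliability(mu_x, sd_x, mu_y, sd_y):
    """P[Y > X] for independent normal stress X and normal strength Y."""
    return NormalDist().cdf((mu_y - mu_x) / sqrt(sd_x ** 2 + sd_y ** 2))


def contaminated_reliability(mu_x, sd_x, mu_y, sd_y, eps, mu_def, sd_def):
    """P[Y > X] when a fraction eps of units has strength N(mu_def, sd_def^2)."""
    sound = normal_reliability(mu_x, sd_x, mu_y, sd_y)
    defective = normal_reliability(mu_x, sd_x, mu_def, sd_def)
    return (1 - eps) * sound + eps * defective


if __name__ == "__main__":
    nominal = normal_reliability(5.0, 1.0, 10.0, 1.0)
    actual = contaminated_reliability(5.0, 1.0, 10.0, 1.0, 0.01, 4.0, 1.0)
    print("nominal unreliability:", 1 - nominal)
    print("actual unreliability: ", 1 - actual)
```

With these made-up figures the nominal model reports a failure probability of roughly 2 in 10,000, while the contaminated population fails closer to 8 times in 1,000, so the nominal estimate of reliability is too optimistic by more than an order of magnitude on the failure scale.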
Even without such extreme departures from the postulated models, tail areas remain very difficult to estimate. The choice between normal, Weibull or lognormal tails can change the estimated reliability by several orders of magnitude when R is extremely large.

4. Stress-strength models for system reliability

System models have been discussed by Bhattacharyya and Johnson (1974, 1975, 1977), McCarthy and Orringer (1975), and Chandra and Owen (1975). Bhattacharyya and Johnson (1975) study the situation where a system, consisting of k components, functions when at least s ($1 \le s \le k$) of the components survive a common shock of random magnitude. This formulation includes all series, k-out-of-k, and parallel, 1-out-of-k, systems.

EXAMPLE 4.1. A panel consisting of k identical solar cells maintains an adequate power output if at least s of the cells are active during the duration of the mission. The external force interfering with the operation of the cells may be extreme temperatures and the strength of a cell, in this context, may be taken as its capacity to withstand the external temperatures.

Under an 'identical component' model, the strengths of the components are assumed to be independent and identically distributed random variables with cdf $G(y)$. The stress, common to all components, is a random variable having cdf $F(x)$. The system reliability is then a function of $F(\cdot)$ and $G(\cdot)$. In particular, the reliability of an s-out-of-k system, $R_{s,k}$, is given by

\[ R_{s,k} = \sum_{j=s}^{k} \binom{k}{j} \int_{-\infty}^{\infty} [1 - G(x)]^j\,G^{k-j}(x)\,dF(x) = 1 - \int_{-\infty}^{\infty} \mathcal{B}[G(x)]\,dF(x) \tag{4.1} \]

where $\mathcal{B}(\cdot)$ is the cdf of the beta distribution having density $\propto u^{k-s}(1-u)^{s-1}$.

4.1. Nonparametric estimation of system reliability

Let $X_1, \ldots, X_m$ be a random sample from $F(\cdot)$ and $Y_1, \ldots, Y_n$ be a random sample from $G(\cdot)$ where $F(\cdot)$ and $G(\cdot)$ are assumed to be continuous. Replacing $F(\cdot)$
and $G(\cdot)$, in (4.1), by the empirical cdf's $F_m(\cdot)$ and $G_n(\cdot)$, gives rise to the intuitive estimator

\[ R^*_{s,k} = 1 - \int_{-\infty}^{\infty} \mathcal{B}[G_n(x)]\,dF_m(x) = \int_{-\infty}^{\infty} F_m(x)\,d\mathcal{B}[G_n(x)] = \frac{1}{m} \sum_{i=1}^{n} \left[ \mathcal{B}\!\left( \frac{i}{n} \right) - \mathcal{B}\!\left( \frac{i-1}{n} \right) \right] (S_{(i)} - i) \tag{4.2} \]

where $S_{(1)} \le S_{(2)} \le \cdots \le S_{(n)}$ are the ordered ranks of the Y's in the combined sample. Bhattacharyya and Johnson (1975) also derive the UMVU estimator as a generalized U-statistic based on the kernel

\[ h(x_1; y_1, \ldots, y_k) = \begin{cases} 1 & \text{if } s \text{ or more of } y_1, \ldots, y_k \text{ exceed } x_1, \\ 0 & \text{otherwise.} \end{cases} \tag{4.3} \]

After some simplification, the UMVU estimator $\tilde R_{s,k}$ can be expressed as

\[ \tilde R_{s,k} = \frac{1}{m \binom{n}{k}} \sum_{i=k-s+1}^{n-s+1} \binom{i-1}{k-s} \binom{n-i}{s-1} (S_{(i)} - i). \tag{4.4} \]

Note that $\tilde R_{s,k}$ is similar in form to $R^*_{s,k}$ but that it has the feature of a trimmed mean. Bhattacharyya and Johnson (1977) establish the following large sample result.

THEOREM 4.1. Let $m, n \to \infty$ with $m/(m+n) \to \lambda$ ($0 < \lambda < 1$). Then, pointwise,

\[ (m+n)(R^*_{s,k} - \tilde R_{s,k}) = O(1) \]

and

\[ \sqrt{m+n}\,(\tilde R_{s,k} - R_{s,k}) \xrightarrow{\mathcal{L}} N\!\left( 0,\ \frac{\sigma_1^2}{\lambda} + \frac{\sigma_2^2}{1-\lambda} \right) \]

where

\[ \sigma_1^2 = \operatorname{Var}_F[\mathcal{B}(G(X))] = \int \mathcal{B}^2(G)\,dF - \left[ \int \mathcal{B}(G)\,dF \right]^2, \]
\[ \sigma_2^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} b[G(x)]\,b[G(y)]\,\{ G(\min(x,y)) - G(x)G(y) \}\,dF(x)\,dF(y), \tag{4.5} \]

and $b(u)$ is the pdf associated with $\mathcal{B}$.

From Theorem 4.1 we conclude that a large sample $100(1-\alpha)\%$ lower confidence bound for $R_{s,k}$ is given by

\[ \tilde R_{s,k} - \frac{z_\alpha}{\sqrt{m+n}} \left[ \frac{\hat\sigma_1^2}{\lambda} + \frac{\hat\sigma_2^2}{1-\lambda} \right]^{1/2} \tag{4.6} \]

where $\hat\sigma_1^2$, $\hat\sigma_2^2$ are obtained by replacing F and G by $F_m$ and $G_n$ in the expressions for $\sigma_1^2$, $\sigma_2^2$. Clearly $R^*_{s,k}$ could replace $\tilde R_{s,k}$ in the confidence bound (4.6).

When the stress distribution F is known, the intuitive estimator has the form

\[ R^*_{s,k}(F) = \sum_{i=1}^{n} \left[ \mathcal{B}\!\left( \frac{i}{n} \right) - \mathcal{B}\!\left( \frac{i-1}{n} \right) \right] F(Y_{(i)}) \tag{4.7} \]

and the UMVU estimator is

\[ \tilde R_{s,k}(F) = \frac{1}{\binom{n}{k}} \sum_{i=k-s+1}^{n-s+1} \binom{i-1}{k-s} \binom{n-i}{s-1} F(Y_{(i)}). \tag{4.8} \]

Bhattacharyya and Johnson (1977) also establish

\[ \sqrt{n}\,(\tilde R_{s,k}(F) - R_{s,k}) \xrightarrow{\mathcal{L}} N(0, \sigma_2^2), \qquad n\,(R^*_{s,k}(F) - \tilde R_{s,k}(F)) = O(1), \tag{4.9} \]

pointwise, so confidence bounds similar to (4.6) are immediate. When $F(\cdot)$ is known, the $100(1-\alpha)\%$ lower confidence bound on $R_{s,k}$ is

\[ \tilde R_{s,k}(F) - z_\alpha\,\hat\sigma_2/\sqrt{n}. \tag{4.10} \]

4.2. Exponential distributions for stress and strength

When $F(x) = 1 - e^{-x/\theta_1}$ and $G(x) = 1 - e^{-x/\theta_2}$,

\[ R_{s,k} = 1 - \frac{k!}{(s-1)!} \prod_{j=s}^{k} \frac{1}{j + \theta_2/\theta_1} = 1 - \frac{1}{B(s, k-s+1)} \sum_{j=0}^{k-s} (-1)^j \binom{k-s}{j} \frac{1}{s + j + \theta_2/\theta_1} \tag{4.11} \]

where the last expression is obtained by expanding the product into partial fractions. Here $B(s, k-s+1)$ is the beta function.

We note that $\left( \sum_{i=1}^{m} X_i,\ \sum_{j=1}^{n} Y_j \right)$ is a complete sufficient statistic and $(s+j)^{-1} u[(s+j)X_1 - Y_1]$, where $u(x) = 1\,(0)$ if $x > (\le)\ 0$, is an unbiased estimator of $(s + j + \theta_2/\theta_1)^{-1}$. The Rao-Blackwell method leads to the UMVU estimator but its form is complicated and depends on the hypergeometric function of the second kind.

The maximum likelihood estimator, $\hat R_{s,k}$, has the considerably simpler form

\[ \hat R_{s,k} = 1 - \frac{k!}{(s-1)!} \prod_{j=0}^{k-s} \frac{1}{j + s + \bar Y/\bar X}. \tag{4.12} \]

Asymptotically, $\hat R_{s,k}$ is normally distributed.

THEOREM 4.2. Let $m, n \to \infty$ and $m/(m+n) \to \lambda$, $0 < \lambda < 1$; then

\[ \sqrt{m+n}\,(\hat R_{s,k} - R_{s,k}) \xrightarrow{\mathcal{L}} N(0, \sigma_R^2) \]

where

\[ \sigma_R^2 = \frac{(\theta_2/\theta_1)^2}{\lambda(1-\lambda)}\,(1 - R_{s,k})^2 \left[ \sum_{j=s}^{k} \frac{1}{j + \theta_2/\theta_1} \right]^2. \tag{4.13} \]

As a consequence of Theorem 4.2, lower confidence bounds are obtained using $\hat R_{s,k}$ to estimate R and $\bar Y/\bar X$ to estimate $\theta_2/\theta_1$ in the expression for $\sigma_R^2$.

The asymptotic relative efficiency of the nonparametric estimator (4.2) or (4.4), versus the exponential maximum likelihood estimator (4.12), is given by

\[ e = \frac{(\theta_2/\theta_1)^2\,(1 - R_{s,k})^2 \left[ \sum_{j=s}^{k} (j + \theta_2/\theta_1)^{-1} \right]^2}{(1-\lambda)\sigma_1^2 + \lambda\sigma_2^2}. \tag{4.14} \]

Bhattacharyya and Johnson (1975) tabulate values of e.

4.3. Further generalizations of the s-out-of-k

The foregoing results are concerned with the reliability of an s out of k system where the underlying assumptions are that the component strengths $Y_1, \ldots, Y_k$ are iid random variables and all the components are subjected to a common random stress X which is independent of the Y's. We outline here some extensions of the model for representing the reliability structure of more complex systems.

(a) Non-identical component strength distributions. When the components of a system are of different structure, the assumption of identical strength distributions may not be realistic. This is often the case with systems having standby components. Suppose that out of the k components, $k_1$ are of one category and their strengths can be reasonably assumed to have a common distribution $G_1$. The remaining $k_2 = k - k_1$ components are of a different category and their common strength distribution is denoted by $G_2$. All the k components are exposed to a common stress X having the distribution F, and the system operates successfully if at least s of the k components withstand the stress. This corresponds to the same structure function (4.3). Here, however, $Y_1, \ldots, Y_{k_1}$ are iid $G_1$, $Y_{k_1+1}, \ldots, Y_k$ are iid $G_2$ and X is distributed as F. The system reliability is a functional of the triplet $(F, G_1, G_2)$ and it can be formally expressed as

\[ R = \sum\sum \binom{k_1}{j_1} \binom{k_2}{j_2} \int (1 - G_1)^{j_1}\,G_1^{k_1 - j_1}\,(1 - G_2)^{j_2}\,G_2^{k_2 - j_2}\,dF \tag{4.15} \]

where the sum extends over $0 \le j_1 \le k_1$, $0 \le j_2 \le k_2$ such that $s \le j_1 + j_2 \le k$. When F, $G_1$ and $G_2$ are exponential with the scale parameters $\theta$, $\beta_1$ and $\beta_2$, the integral in (4.15) can be simplified to a linear function of terms of the form $[a_1\beta_1 + a_2\beta_2 + \theta]^{-1}$ where the known constants $a_1$ and $a_2$ vary from term to term. With independent random samples $\{X_1, \ldots, X_m\}$, $\{Y_{11}, \ldots, Y_{1n_1}\}$ and $\{Y_{21}, \ldots, Y_{2n_2}\}$ from F, $G_1$ and $G_2$ respectively, one can easily obtain the maximum likelihood estimator of R. The UMVU estimator can also be worked out along the lines of Section 4.2.

Nonparametric estimators of R can be constructed by either of the two procedures. For instance, a nonparametric estimator $R^*$ is obtained by replacing F, $G_1$ and $G_2$ in (4.15) by the empirical cdfs. Alternatively, defining the kernel function

\[ h(X_1; Y_{11}, \ldots, Y_{1k_1}; Y_{21}, \ldots, Y_{2k_2}) = \begin{cases} 1 & \text{if at least } s \text{ of the } (k_1 + k_2)\ Y\text{'s exceed } X_1, \\ 0 & \text{otherwise,} \end{cases} \tag{4.16} \]

and averaging h over all $m \binom{n_1}{k_1} \binom{n_2}{k_2}$ choices of the ordered subscripts, one obtains the UMVU estimator of R.

EXAMPLE 4.2.
Consider a system with $k = 2$ and $s = 1$ where the two components have strength distributions $G_1$ and $G_2$ and are subjected to a common stress with distribution $F$. From (4.15) with $k_1 = k_2 = 1$, we obtain

$$R = \int (1 - G_1) G_2 \, dF + \int G_1 (1 - G_2) \, dF + \int (1 - G_1)(1 - G_2) \, dF = 1 - \int G_1 G_2 \, dF.$$

The nonparametric UMVU estimator, based on random samples $\{X_1, \ldots, X_m\}$, $\{Y_{11}, \ldots, Y_{1n_1}\}$ and $\{Y_{21}, \ldots, Y_{2n_2}\}$ from $F$, $G_1$ and $G_2$ respectively, is given by

$$R_{NP} = (T_1 + T_2 + T_3)/(m n_1 n_2)$$

where $T_1$, $T_2$ and $T_3$ are the numbers of triplets $\{X_i, Y_{1j_1}, Y_{2j_2}\}$ satisfying $(Y_{1j_1} < X_i < Y_{2j_2})$, $(Y_{2j_2} < X_i < Y_{1j_1})$ and $(X_i < Y_{1j_1}, X_i < Y_{2j_2})$, respectively. The estimator based on the empirical cdfs is given by

$$\hat{R}^* = 1 - \int G_{1n_1} G_{2n_2} \, dF_m = 1 - [m n_1 n_2]^{-1} \sum_{i=1}^{m} (Q_i - i)(Q_i' - i)$$

where $Q_i$ is the rank of the $i$-th order statistic $X_{(i)}$ within the combined $X$ and $Y_1$ samples, and $Q_i'$ is the rank of $X_{(i)}$ within the combined $X$ and $Y_2$ samples.

(b) Subsystems with independent stresses. In a more complex situation a system may consist of a number of independent subsystems performing different functions. Within each subsystem, the components have independent and identically distributed strengths and are subjected to a common stress so that each subsystem has the structure of an $s$ out of $k$ stress-strength model. The strength and stress distributions as well as $s$ and $k$ may vary among the subsystems. The following diagram illustrates such a system where the two subsystems A and B are serially connected.

[Fig. 1. Serially connected subsystems with independent stresses: subsystem A (2 out of 3) followed by subsystem B (1 out of 2).]

Subsystem A functions when at least two of the three components survive the stress $X$. The component strengths are iid with distribution $G_1$ and the common stress $X$ has distribution $F_1$. Similarly, subsystem B has the structure of a 1 out of 2 stress-strength model where the strength and stress distributions are $G_2$ and $F_2$ respectively.
The system reliability $R$ is given by $R = R^{A}_{2,3} R^{B}_{1,2}$, where the factors on the right-hand side are the stress-strength reliability functions for the subsystems and have the same form as given in (4.1). Using the methods of Section 4.1, one can obtain the UMVU estimator for each of $R^{A}_{2,3}$ and $R^{B}_{1,2}$ and, due to independence, their product gives the nonparametric UMVU estimator $\hat{R}$ of $R$. The limiting normal distribution of $\hat{R}$ and the form of the asymptotic variance can then be obtained from the subsystem results.

(c) Binomial data on components. Often, components are tested under the random stress conditions that prevail, and only the number of survivors is recorded rather than the measurements of stresses and strengths. In the context of a single component stress-strength model where our objective is to estimate the probability $R_1 = P[Y > X] = \int (1 - G) \, dF$, the present sampling process yields the count $Z_n$, which is the number of pairs $(X_i, Y_i)$, $i = 1, \ldots, n$, such that $Y_i > X_i$. The numerical measurements of $Y_i$ and $X_i$ are not recorded. The problem then reduces to estimating a binomial probability from the number of successes in $n$ trials. More generally, consider a system consisting of $c$ subsystems where each subsystem has the structure of a single component stress-strength model. The system reliability is then a function $R = g(p_1, p_2, \ldots, p_c)$ where $p_i = \int (1 - G_i) \, dF_i$, $G_i$ and $F_i$ are the strength and stress distributions for the $i$-th subsystem, and the functional form of $g$ is determined by the manner in which the system is structured. Methods of estimating the system reliability from binomial count data have been developed by Myhre and Saunders (1968), Madansky (1965), Easterling (1972), and many others. The stress-strength formulation of the model loses its distinctive features when only the count data are recorded and the subsystems have single components.
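As a rough illustration of the count-data setting in (c), the sketch below forms only the plug-in point estimate of $R = g(p_1, \ldots, p_c)$ from survivor counts. The series and parallel forms of $g$, the function names and the counts are assumptions made for illustration; the confidence-limit methods of the papers cited above are not reproduced here.

```python
from math import prod


def survival_fraction(survivors, trials):
    """Binomial point estimate of p_i = P[Y > X] for one subsystem."""
    if trials <= 0 or not 0 <= survivors <= trials:
        raise ValueError("need 0 <= survivors <= trials and trials > 0")
    return survivors / trials


def series_reliability(ps):
    """g for a series structure (assumed): every subsystem must survive."""
    return prod(ps)


def parallel_reliability(ps):
    """g for a parallel structure (assumed): at least one subsystem survives."""
    return 1.0 - prod(1.0 - p for p in ps)


# Hypothetical (survivors, trials) counts for c = 2 subsystems.
counts = [(18, 20), (45, 50)]
p_hat = [survival_fraction(z, n) for z, n in counts]

print("estimated p_i:", p_hat)
print("series estimate:", series_reliability(p_hat))
print("parallel estimate:", parallel_reliability(p_hat))
```

The plug-in step treats the subsystems as independent, which matches the independent-stress structure described in (b) but not a system whose components share one stress.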
For a $k$ ($\ge 2$) component stress-strength system where all the components are exposed to a common stress $X$ in their natural operating environment, some care is needed in using binomial count data of the component failures for estimating the system reliability. Intuitively, one might interpret the reliability of an $s$ out of $k$ system as the probability of obtaining $s$ or more successes in $k$ Bernoulli trials and proceed to estimate this binomial probability from the count data. In this process, one would be estimating the functional

$$\theta(F, G) = \sum_{j=s}^{k} \binom{k}{j} R_1^{j} (1 - R_1)^{k-j} \qquad (4.17)$$

where $R_1 = \int (1 - G) \, dF$. This is not the same as the system reliability for an $s$ out of $k$ system, which is given by

$$R_{s,k} = \sum_{j=s}^{k} \binom{k}{j} \int (1 - G)^{j} G^{k-j} \, dF. \qquad (4.18)$$

Notice, in particular, when $s = k$ and $k \ge 2$,

$$\theta(F, G) = \left[ \int (1 - G) \, dF \right]^{k} < \int (1 - G)^{k} \, dF = R_{k,k}.$$

Bhattacharyya (1977) explores estimation procedures in this context. He considers data in the form of failure counts when $m$ components are subjected to a common stress, and this experiment is repeated $n$ times. Efficiencies are also calculated relative to the exponential model.

5. Extensions of the basic stress-strength model

Two recent developments merit further attention.

5.1. Stochastic process formulation

A more sophisticated stress-strength model allows the stress, $X(t)$, and strength, $Y(t)$, to vary over time. Specifically, let $\{X(t): t > 0\}$ and $\{Y(t): t > 0\}$ be independent stochastic processes. Consonant with our initial formulation of the stress-strength model in Section 1, we would define reliability for the period $(0, t_0]$ as

$$R_1(t_0) = P\left[ \inf_{t \le t_0} Y(t) > \sup_{t \le t_0} X(t) \right]. \qquad (5.1)$$

Alternative definitions are also plausible. We could require only that current strength exceed the maximum stress thus far encountered:

$$R_2(t_0) = P\left[ Y(t) > \sup_{s \le t} X(s), \text{ all } t \le t_0 \right]. \qquad (5.2)$$

Even less stringent, the requirement could be that current strength exceeds current stress:

$$R_3(t_0) = P[\, Y(t) > X(t), \text{ all } t \le t_0 \,].$$
(5.3) Using definition (5.3), Basu and Ebrahimi (1983) consider the case where X(t) and Y(t) are brownian motion processes with means /Zl, #a and covariances tr~ min(s, t), tr22min(s, t). They show that R . A . Johnson 46 R3(to) = q~ ( /~2- ~1 \(~? + ,r~)to/ (5.4) which is of the same form as the normal theory model in Section (3.1). Expression (5.4) would not be expected to apply for large to since R(to)>_. 0.5 all to, when ['/'2 > /21" 5.2. Stress-strength models with covariates Strength can usually only be determined by testing a unit to destruction. However, it is often possible to measure covariates of strength without damaging the unit. EXAMPLE 5.1. A 2 X 4 to be used in the frame of a house has bending strength Y which can be observed only by destructive testing. Yet stiffness (the modulus of elasticity), which can be used to predict strength, is easily measured by a non-destructive test. Data for some species suggest that strength is related to stiffness Z according to the linear relation Y = ~z + f12z + e2 where e2 is distributed N(0, ~rzZ). For a specimen whose stiffness is z, the conditional reliability becomes B.z R(X) = PIE> X[z] = r'z- ~]A1] "~ (5.5) . EXAMPLE 5.2. Refer to Example 1.2 where the purpose is to compare remission time X using Drug A with remission time Y using Drug B. Suppose that the age z of the subject influences the remission time. We postulate the linear regression relations X= cq + fllz + e l , Y = c~2 + ~2z + e z where e 1 is distributed N(0, a2) independent of e2 which is distributed N(0, a~). For a new subject of given age z, we should provide information about / r(z) = e[r> Xlz] = ~1 ~ 2 + (¢~2 -- fl,)Z']. (5.6) \ The models in Example 5.1 and Example 5.2 were introduced by Bhattacharyya and Johnson (1981). Initially, we consider the more general model where X and Y may depend on possible different covariates. Set Z 1 = [ Z l l , Z12, . . . , Zlql It and Z,2 = [Z21 , .722 . . . . . 
Z2q2] t Stress-strength models for reliability 47 and assume X l z 1 ~ N(~ 1 + ~'IZl, 0"?) independent of rrz: ~ N(~: + / ~ z : , a~). We are then interested in making inferences concerning the reliability R(Zl, Z2)= P[ Y > X'Zl, z2] = ~ ( ~ 2 - O~-+--[J'2z--~2-''lZl) • (5.7) Some exact inference procedures are available when the variances are equal. Set a 2 = a ~ = a 2 so R(Z,, Z2) = ~ C x 2 - cq + '2Z2 v/~ -- fl'lZl) We have available, data of the form (Xl, z , , ) , (x2, z~2) . . . . . (Xm, zl,,,), (Y1, Z21), (Y2, z22) ..... (Y,, Z2,). Given the covariate values Zao, Z2o we note that is normally distributed with mean ~2 + P~Z2o - ~ - P' zlo and standard deviation Coa where = - - "+ -- "1- ( Z l 0 -- ~ 1 ) t m n Em Z j = 1 (Zlj -- ~ I ) ( Z l j -- ~ 1 ) t ]1 (ZlO -- Z l ) 1 (Z2o - ~2). Here ^ ~1 and ^ f12 a r e the least squares estimators. Also (m + n - qa - q2 - 2) se = ~ (xj - ~ - ]~11(Zlj- ~i)) 2 j=l + Z (y; - y - ~ ( z 2 j - ~2)) 2 • j=l (5.8) 48 R. A. Johnson is independently distributed as O"2 X#2+ ,. - 2 -- q ! - q2" Consequently, At T= T + ~2(Z2o - ~2) CoS has a non-central t-distribution with m + n - 2 - ql noncentrality parameter. - - q2 degrees of freedom and q = ~ + t ~ Z o - ~, - tr, Z~o CoO" A lower 95% confidence bound, r/, is obtained by solving Fn(Tobs) 0.95 for r/. Consequently, a 95% lower confidence bound for R(z~o, Z2o) is given by = R(zlo, Z2o)= ~(Cotl/x/~). (5.9) Gutmann, Johnson, Bhattacharyya and Reiser (1988) discuss the unequal variance case. 6. Bayesian inference procedures Given the random sample X1 . . . . , Xm from f(" q01) and an independent random sample from g(" I 02), together with a prior density p(O,, Oz), in principle one can obtain the posterior distribution h(01, 02[Xl . . . . , Xm, Y,, "" ", Yn) = p(01, 02) f i f(xil01) (-~ g(y, 102) i= I j= 1 (6.1) for (01, 02)- This distribution could then be transformed to the posterior distribution of Ro," o2 = ~ F(yl 01) dG(y] 02). 
Enis and Geisser (1971) obtained analytical results for negative exponential distributions and for normal distributions with equal variances. 6.1. Bayesian analysis with exponential distributions Enis and Geisser (1971) assume that the negative exponential scale parameters 01 and 02 are independent, a priori. In particular they make the choice of conjugate prior distributions pa(01 ) ~ O-s, -1 e - c,/O,, P2(02) oc 0z-s2- 1 e - c2/o2 (6.2) Combining the likelihood (3.12) of the samples of sizes m and n, we obtain the joint posterior density ..... h(O,, 02I~, Y)~:\~I ...... 1 e-(c'+m~>/°'\~l l e-C.... ~>/o2 (6.3) Stress-strength modelsfor reliability Transforming to r = 02/(01 + 0 2 ) and v = 01 a + 0 2 produces the marginal posterior distribution of R. h(r)ocrm+S~-l(1 - r)n+s2 1(1 - 1 49 and then integrating out v cr)--(m+n+sl+s2) (6.4) where c= c 2 - c I + ny - m 2 (6.5) <1. C2 + n y The transformed variable p = (1 - r)/(1 - cr) has a beta distribution with parameters n + s 2 and m + s ~ so P[r<R]=PIp< 1 - 1 _ r]_ (6.6) 1 B(n + 82, m + sl) f(l -- E)/(1 -- c_r) Jo u,+S2- 1(1 _ u)m+sl- 1 du. A 100(1 - ~)% Bayesian lower bound on R is given by 1 -- ~]1--o: r - 1 - cql (6.7) _~ where qa - ~ is the 100(1 - ~)-th percentile of the beta distribution with parameters n + s 2 and m + s I . Comparing (6.7) with the alternative form for the bound below (3.13), we see that the choice of 'informationless' priors, s a - - s 2 = 0 and c I = c 2 -- 0, leads to the same bound as the classical procedure. 6.2. Bayesian analysis with normal distributions For the case of independent samples, Enis and Geisser (1971) restrict their treatment to normal populations with equal variances. They employ the conjugate prior P(#l, #2, O')~O'-(b+3)exp { -~12Cr2 (bco+cl(l.tl-ml)2+C2(liz-m2)2)} (6.8) where b, c o, c l, c 2 > 0. The likelihood is 1 (2/1;)TM + n)/2 x exp 1 o.rn + n --[(m + n - 2)sp2 + m(/~ I - 2)z + n(#2 _ .~)2]}. 2ff 2 (6.9) 50 R. A. 
Johnson Since the reliability R = ~(6) where 6 = (/t 2 - I ~ ) / v / 2 a , determine the joint posterior distribution of b and (m + n posterior density for (/~,/~2, a) can then be written as h(#l, ~2, tr)m t r - ( b + l + m + " ) e x p it is convenient to - 2)sz/tr 2. The joint {1[ - ~ a 2 bco + (m + n - 2)Sp2 + m¢l(-X - ml) 2 + nc2(Y__- _m2)2]t m + c1 n = c2 .o._lexp { (c I + m ) ( • rr 1 exp { A) mx+clml~2~ 20.2 I'll 2a 2 #2 . . . . . . . c2 + n "] .) CZ + m ,610, • / ) F r o m (6.10) it is readily seen that, a posteriori, [/(ny+_c2m2) (_my+ clm,) ~5 given [ a"~N~x n+c2 m+c 1 l(m+ ~r~ , ~ 1 +C 1 n+c 2 )) independent of t~ Z = 0,2 (m + n - 2)s 2 + bc 0 + mcl(Y - ml)2/(m + cl) + nc2(y - m2)/(n + c2) 0-2 w h i c h is d i s t r i b u t e d 2 + n + b" Setting as ~(m c =1 I - -1 + - - 1 m + C1 1 , Fl "[- C 2 k . [. n~+c2m2 . . L m ~ + c~m~-] / . - n -J- C 2 the joint posterior distribution of b, z is 1 427zc e_C(6_k~)2/2 z(m+n+b)/2- I e-Z~2 2(m+n+b)/2mWH+b F() 2 _ _ _ _ //,,/2v, m + c~ d / Stress-strength models for reliability 51 and the marginal posterior distribution of b is given by h(blxl, ..., 1 x m, Yl . . . . (C- lk2 + ,Yn) 1) -(m+n+b)12 • ~ [xf2c- 'k6(1 e - ~)2/2C + c - 'k2) - lizlg F(½(m + n + b + j ) ) s-o (6.11) r ( j + 1) Although expression (6.11) is rather tbrmidable, lower bounds on ~5 can be obtained via one-dimensional numerical integration. In addition to their usual interpretation as information from earlier samples, some guidance in the choice of parameters is provided by the prior expected value Ep[R] = e I tb < m2 - ml 0 +C 1 1 +-- I+C Enis and Geisser (1971) show that the choice of a vague prior oc tr- ~ produces a posterior distribution whose expectation E ( R ) is closer to 0.5 than is the maximum likelihood estimator• Finally, it should be remarked that they treat the slightly more general case of estimating P [ a I X 1 + a 2 X 2 + • • • + apXp > 0] and that one of their formulations includes paired stress-strength data. 6.3. 
The Bayesian stress-strength model in risk analysis

The stress-strength reliability model is also an integral part of many risk analyses. At the component level, for instance, it may be necessary to make an assessment of the reliability of a motor operated valve in a nuclear power plant. This application of the stress-strength model has one dominant feature. Little or no data are available on either the critical stress or even on the strength of the component. With regard to estimating the strength distribution, one method is to gather expert opinions from several persons. The elicited information could be in the form of percentiles such as the 10-th, 50-th and 90-th percentiles. A lognormal, or other distribution, could be fit to each person's percentiles. These must then be combined, possibly in a weighted fit, to provide an estimated strength distribution. Estimation of the stress distribution is usually approached via mathematical models which convert phenomena like earthquake magnitude to the stress on a component located at a given site. Random quantities, like ground motion from the earthquake and parameters of the structure housing the component, are then introduced. The resulting process is studied by simulation to produce an estimated stress distribution. The component reliability, $R = P[Y > X]$, given an earthquake, can then be estimated using the estimated stress and strength distributions determined above. Mensing (1984) provided the following example.

EXAMPLE 6.1. One important component in the operation of a nuclear power plant is the steam generator. In a study of the risk that earthquakes pose to a nuclear power plant, it is necessary to assess the ability of the generator to withstand the stresses imposed by the ground motion due to an earthquake. Almost no data exist for estimating the strength of steam generators, with respect to ground motion, so expert opinions were elicited.
It was first determined that the most likely cause of generator failure would be failure of its supports. Five experts were asked to estimate the 10-th, 50-th and 90-th percentiles for the strength of the steam generator supports. Their responses are summarized in Table 5.1 where the strength variable is the peak acceleration in ft/sec^2.

Table 5.1
Expert opinions concerning percentiles of the strength distributions (ft/sec^2)

            Percentile
Expert   10-th   50-th   90-th
1        80.71   96.85   103.31
2        77.48   83.94    96.85
3        29.06   59.72    96.85
4        19.37   29.06    48.43
5        32.28   43.58    61.34

Assuming the strength of the generator supports can be approximated by a lognormal distribution, a weighted least squares procedure was used to estimate the mean, $\theta$, and standard deviation, $\delta$, of the natural logarithm of the strength distribution. The resulting estimates are $\hat{\theta} = 4.06$ and $\hat{\delta} = 0.29$. Mathematical modeling can be used to estimate the distribution of stress at the base of the steam generator. Suppose the stress distribution is modeled by a lognormal distribution where the natural logarithm of stress has mean $\theta_s = 2.32$ ft/sec^2 and standard deviation $\delta_s = 0.40$ ft/sec^2. Then, it is clear that the reliability of the steam generator is nearly 1.0. Specifically, $R = P[\ln Y > \ln X] = \Phi(3.52) = 0.99978$. Since the primary source of information about the random variation in stress and strength is expert opinion and engineering judgement, it is a more difficult problem to obtain lower bounds for $R$. In the context of the nuclear power plant, the lower bound on $R$ converts to an upper bound on the probability of failure and subsequent radioactive release. Some attempts have been made to quantify the uncertainty experts have in formulating their opinions and to use this quantified uncertainty to develop bounds for the probability of failure. See Bohn et al. (1983) for more information.
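The figure $\Phi(3.52)$ in Example 6.1 can be checked with a short calculation. The sketch below assumes that log-stress and log-strength are independent normal variables, so that their difference is normal with mean $\theta - \theta_s$ and variance $\delta^2 + \delta_s^2$; the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_reliability(theta_strength, delta_strength, theta_stress, delta_stress):
    """Return (z, R) with R = P[ln Y > ln X] = Phi(z), assuming independence."""
    z = (theta_strength - theta_stress) / math.sqrt(
        delta_strength ** 2 + delta_stress ** 2
    )
    return z, normal_cdf(z)


# Values quoted in Example 6.1.
z, reliability = lognormal_reliability(4.06, 0.29, 2.32, 0.40)
print(f"z = {z:.2f}, R = {reliability:.5f}")
```

Because $R$ sits so far in the tail, small changes in $\delta$ or $\delta_s$ move the failure probability $1 - R$ by large relative amounts, which is one reason bounds based on expert opinion are hard to obtain.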
A risk analysis of a system is considerably more complicated than for a single component. With a nuclear power plant, failure can occur in numerous ways. From a fault-tree analysis, each separate failure path is determined. Data are typically available on some component strengths but it is mostly expert opinion that must be combined in order to obtain an estimate of the failure path probabilities and, ultimately, the system reliability. The calculation of an estimate of system reliability can involve as many as 300 to 400 components and the probability of an accident sequence is calculated from, say, a multivariate normal distribution. In this setting it is possible to include a stress, such as an earthquake, as a common stress to numerous components.

References

Basu, A. (1980). The estimation of P[X < Y] for distributions useful in life testing. Naval Res. Log. Quart. 3, 383-392.
Basu, A. and Ebrahimi, N. (1983). On computing the reliability of stochastic systems. Statistics and Probability Letters 1, 265-268.
Bhattacharyya, G. K. (1977). Reliability estimation from survivor count data in a stress-strength setting. IAPQR Transactions - Journal of the Indian Association for Productivity, Quality and Reliability 2, 1-15.
Bhattacharyya, G. K. and Johnson, R. A. (1974). Estimation of reliability in a multicomponent stress-strength model. J. Amer. Statist. Assoc. 69, 966-70.
Bhattacharyya, G. K. and Johnson, R. A. (1975). Stress-strength models for system reliability. Proc. Symp. on Reliability and Fault-tree Analysis, SIAM, 509-32.
Bhattacharyya, G. K. and Johnson, R. A. (1977). Estimation of system reliability by nonparametric techniques. Bulletin of the Mathematical Society of Greece (Memorial Volume), 94-105.
Bhattacharyya, G. K. and Johnson, R. A. (1981). Stress-strength models for reliability: Overview and recent advances. Proc. 26th Design of Experiments Conference, 531-546.
Bhattacharyya, G. K. and Johnson, R. A. (1983).
Some reliability concepts useful in materials testing. Reliability in the Acquisitions Process. Marcel Dekker, New York, 115-131.
Birnbaum, Z. W. (1956). On a use of the Mann-Whitney statistic. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, 13-17.
Birnbaum, Z. W. and McCarthy, R. C. (1958). A distribution free upper confidence bound for P(Y < X) based on independent samples of X and Y. Ann. Math. Statist. 29, 558-62.
Bohn, M. P. et al. (1983). Application of the SSMRP methodology to the seismic risk at the Zion Nuclear Power Plant, NUREG/CR-3428, Nuclear Regulatory Commission, Nov.
Chandra, S. and Owen, D. B. (1975). On estimating the reliability of a component subject to several different stresses (strengths). Naval Res. Log. Quart. 22, 31-40.
Church, J. D. and Harris, B. (1970). The estimation of reliability from stress-strength relationship. Technometrics 12, 49-54.
Downton, F. (1973). The estimation of P(Y < X) in the normal case. Technometrics 15, 551-558.
Easterling, R. (1972). Approximate confidence limits for system reliability. J. Amer. Statist. Assoc. 67, 220-22.
Enis, P. and Geisser, S. (1971). Estimation of the probability that Y < X. J. Amer. Statist. Assoc. 66, 162-68.
Govindarajulu, Z. (1967). Two sided confidence limits for P(Y < X) for normal samples of X and Y. Sankhyā B 29, 35-40.
Govindarajulu, Z. (1968). Distribution-free confidence bounds for P(X < Y). Ann. Inst. Statist. Math. 20, 229-38.
Guttman, I., Johnson, R. A., Bhattacharyya, G. K. and Reiser, B. (1988). Confidence limits for stress-strength models with explanatory variables. Technometrics (in press).
Lehmann, E. (1959). Testing Statistical Hypotheses. Wiley, New York.
Kececioglu, D. (1972). Reliability analysis of mechanical components and systems. Nuclear Eng. Des. 9, 257-290.
Lloyd, D. K. and Lipow, M. (1962). Reliability, Management, Methods and Mathematics. Prentice-Hall, Englewood Cliffs, NJ.
Madansky, A. (1965).
Approximate confidence limits for the reliability of series and parallel systems. Technometrics 7, 495-503.
Mazumdar, M. (1970). Some estimates of reliability using interference theory. Naval Res. Log. Quart. 17, 159-65.
McCarthy, J. F. and Orringer, O. (1975). Some approaches to assessing failure probabilities of redundant structures. Composite Reliability, ASTM STP 580, American Society for Testing and Materials, 5-31.
Mensing, R. (1984). Personal communication.
Myhre, J. M. and Saunders, S. C. (1968). On confidence limits for the reliability of systems. Ann. Math. Statist. 39, 1463-72.
Owen, D. B., Craswell, K. J. and Hanson, D. L. (1964). Nonparametric upper confidence bounds for P(Y < X) and confidence limits for P(Y < X) when X and Y are normal. J. Amer. Statist. Assoc. 59, 906-24.
Sen, P. K. (1967). A note on asymptotically distribution-free confidence bounds for P(X < Y) based on two independent samples. Sankhyā A 29, 95-102.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 55-72

Approximate Computation of Power Generating System Reliability Indexes

M. Mazumdar

1. Introduction

An electric power system is a massive energy conversion and transmission facility. Its function is to convert chemical, nuclear, or kinetic potential into a more useful electrical potential and transmit electrical energy to its consumers. Power systems tend to have generation concentrated in specific locations, whereas demand is spread over a large geographic region. The problem of providing power to widely scattered demands from remote generating stations is solved by the electric utility companies through a three-tiered system. Elements of this system are: power generation subsystem, transmission subsystem, and distribution subsystem. In the power generation system, electric power is produced from a number of different types of generating plants (fossil, nuclear, hydroelectric, etc.).
Transmission systems carry large amounts of power over long distances at a high voltage level. From the transmission sources, distribution systems carry the load to a service area by forming a fine network. The reliability of an electric power system has been defined as the probability of providing the user with continuous service of satisfactory quality [8]. By satisfactory quality, it is meant that the frequency and the voltage of the power supply remain within prescribed bounds. There are several reasons why reliability is very important to the electric power industry. The public has grown accustomed to a very reliable supply of electricity, and it would not accept lower standards. The occurrence of power failure is expensive to the customer as well as to the utilities. The social costs of power failure have also been well documented. There has been increasing concern in recent years about the risks to public health and safety associated with the different energy sources that are used to produce electricity. In relation to nuclear power, these risks are largely contingent on the probability and severity of infrequent system failures. Therefore, reliability considerations have come to play a major role in the planning, design, operation and maintenance of electric power plants. To achieve a high degree of reliability at the customer's level, it is necessary that each of the three components of the power system (generation, transmission and distribution) provide an even higher degree of reliability.

The performance of electric power systems is influenced by a large number of random phenomena. First, the demand for electric power has a large stochastic component, which is strongly influenced by weather. The outdoor equipment, such as transmission lines, is subject to natural causes, e.g., storms, lightning and floods, as well as to inadvertent damage caused by people and animals. The equipment used to generate and transmit electricity fails randomly.
The time to restore the failed equipment is also a random variable. It is thus necessary to construct probability models which can be used to predict the performance of the power systems as they are influenced by these random variables. These probability models are used to compute standard reliability parameters such as mean time to failure, availability, etc., as well as reliability indexes which are special to the electric utility industry. Concurrently, one needs to pay attention to proper collection and analysis of 'outage' data so that one has appropriate confidence in the output of these reliability studies. Early studies in power system reliability evaluation were confined to determination of generating system reserve capacity. Only comparatively recently have such studies been extended to cover the transmission and distribution systems. Consequently, the state of the art with regard to generation reliability models is much more advanced than that for the transmission and distribution systems. Reliability models play an important role in the determination of required installation generation reserves of a given electric utility company, and in its long-term planning for generation capacity expansion. The quantity determined here is the amount of installed reserve capacity required such that the probability of load loss does not exceed a prescribed small amount. These studies thus help the planner in scheduling generating unit additions as the load grows over time. Such models also play an important role in the evaluation of expected generation system production costs. An electric utility system typically consists of many generating units of different capacities, availabilities and operating costs. Because of such differences, not all the units within a system experience equal utilization over time, and units with high running costs are pressed into operation only if the load is high and/or the cheaper units have failed at the time.
Therefore, the computation of expected overall production costs of a utility system needs to account for the stochastic characteristics of the generating unit failures and the system load. Power generation reliability models include only the generating units within a given system, and the rest of the system is assumed to be perfectly reliable. Thus, according to these models, a system failure occurs when the total power generated by the system falls short of the system load. In order to contrast the available system capacity with the demand, two sets of models are required, one for the states (e.g., failed and non-failed) of the generating units, and the other for load variations in a given system. When one combines these two stochastic models, one arrives at an overall system model whose solution provides the required reliability indexes which can then be used as engineering tools for planning and operating decisions.

In this paper, we will confine ourselves to an examination of the computational aspects of two important power generation system reliability indexes which are used in power system planning and production costing evaluation. In particular, we provide additional results on the use of an effective approximation scheme which was proposed in a recent paper [14]. This scheme uses a transformation proposed by Esscher [10] for computing actuarial risks. Section 2 describes the reliability models used for the power generating system in connection with the determination of the risk due to load loss and the expected production cost. We define here the reliability indexes of interest, and point out the difficulties in their computation. Section 3 gives a description of Esscher's approximation method as well as that of a more common approximation known in the recent power system literature as the method of cumulants [15].
We derive here the necessary formulas in connection with the application of Esscher's method. Section 4 provides the numerical estimates of the accuracy of this method by applying it to several prototype systems. Section 5 states the conclusions. For a more detailed discussion of generation reliability models and their uses, the reader is referred to the monographs by Billinton [3], Billinton, Ringlee and Wood [4], Endrenyi [8], and Sullivan [16].

2. Generating system model and the reliability indexes

Generating unit representation

It is assumed that the generating system under consideration is composed of $n$ independent generating units. That is, they can fail, and be repaired, independent of failures and repairs of the other units. This is usually a realistic assumption except when a single boiler supplies steam to several turbogenerators through a common header. In the course of their operation, generating units may suffer complete failures or partial failures, where they lose a part of their capacity. If a simple two-state model is assumed for the generating unit, such that it alternates between two operating states, 'up' (operating) or 'down' (under repair), a measure of the unit performance is given by its unavailability, which is defined as follows [2]:

$$\bar{A} = \frac{\text{Mean downtime}}{\text{Mean downtime} + \text{Mean uptime}}. \qquad (1)$$

The index $\bar{A}$ measures the fraction of the time that a unit is unavailable for service during periods when it is not on planned outage. Endrenyi [8] has shown that this index is meaningful even when maintenance of short duration is concerned, provided that maintenance itself does not contribute to failure. In the power system vocabulary, the term used for unavailability is the forced outage rate (FOR), which unfortunately is a misnomer, since the index represents a pure number and not a rate. The FOR is defined as

$$\text{FOR} = \frac{\text{Forced outage time}}{\text{Forced outage time} + \text{In-service time}}, \qquad (2)$$

where the times appearing in the numerator and denominator refer to a reasonably long period of observation. The index (2) is equivalent to (1) when the period under question is long enough. The above definition of the forced outage rate or the unavailability assumes that the generating unit has only two states: operating at 100% capacity or completely failed. The intermediate capacity states are usually accounted for by defining an index called the equivalent forced outage rate (EFOR), which is given by the following equation:

$$\text{EFOR} = \frac{\text{FOH} + \text{EFOH}}{\text{SH} + \text{FOH}}$$

where FOH, EFOH and SH denote respectively full forced outage hours, equivalent forced outage hours and service hours. The quantity, equivalent forced outage hours, is obtained by multiplying the actual partial outage hours for each partial outage event by the fractional capacity reduction and then summing the products. The introduction of the index EFOR enables one to approximate a unit with several capacity states by one having only two states. In this two-state equivalent representation, the index EFOR estimates the long-run probability of being fully out and the quantity (1 - EFOR) estimates the long-run probability that it is available at full capacity. Data on EFOR are presented for a variety of sizes and types of generating units in reports published by the Edison Electric Institute; see, for example, [7].

Load models

An hourly load duration curve is obtained by first plotting on the vertical axis the power demand forecasted for each hour in a planning period in chronological order along the horizontal axis. The load duration curve (LDC) is then obtained by reordering the demands in descending order of magnitude. Here, the number of days on which the load exceeds a given value is plotted as an abscissa with the forecasted load value as the ordinate. Assume that the forecasted peak demand occurs for one hour during each of the days in a 20-day planning period.
Then, one can say that the peak load occurred in a fraction equal to 1/24 of the planning period. Figure 1 shows that the system load was expected to be above 100 MW during 50% of the time. When the abscissa is normalized to 1, the figure can be read to denote the fraction of the time the load is expected to be above a given value. Thus it is possible to give a probabilistic interpretation to the load duration curve. Read along the horizontal axis, the curve yields the survivor function of the load when it is treated as a random variable: it gives the probability that the observed load will exceed a specified value, as denoted by the ordinate.

Fig. 1. Hourly load duration curve: An example (abscissa in hours, from 0 to 480, normalized to 1 for LOLP calculations; ordinate marked at 75 MW, 100 MW and 150 MW).

In some studies on generation reliability, notably when unit production costs are evaluated, it is the practice to merge the individual generating unit failure models and the load probability distribution by defining the so-called equivalent load duration curve, abbreviated as ELDC [15]. This definition rests on the observation that the outages caused by plant unreliability can be thought of as additions to the true load on the system. Suppose that all $n$ units within a given system are candidates for operation to meet a given load, $L$. Then

$$\text{Available capacity} = c_1 + c_2 + \cdots + c_n - (X_1 + X_2 + \cdots + X_n + L),$$

where $c_i$ is the installed capacity of unit $i$, and $X_i$ is the capacity on outage for unit $i$. Notice that the quantity $(X_1 + X_2 + \cdots + X_n + L)$ plays the role of an equivalent load that confronts the $n$ units of the system. A curve which shows the proportion of times that the observed equivalent load will exceed given specified values is called the equivalent load duration curve (ELDC).
It is clear from the foregoing discussion that separate sets of such curves can be drawn for all the $n$ individual units of the system.

Loss of load probability index

Two different sets of generation reliability indexes are used by the electric utility industry: one in the context of long-range planning and the other for short-term operational planning. The former provides inputs to decisions in generation expansion planning and the scheduling of new unit additions. The latter indexes are of use to the operating engineer in the daily operation of a power system. The loss of load probability (LOLP) index is used in the long-range planning context, and it measures the probability that a given system's available capacity is insufficient to meet the system peak load on a given day. It estimates the fraction of time the utility system will have a generation deficit, with no consideration given to the magnitude of the deficit.

Consider a system consisting of $n$ units such that the installed capacity of unit $i$ is $c_i$ and its (equivalent) forced outage rate is $p_i$, $i = 1, 2, \ldots, n$. Define $X_i$ as the unavailable capacity, or the capacity on outage, for unit $i$ on a given day. We assume that $X_1, X_2, \ldots, X_n$ is a sequence of independent random variables. Thus $X_i$ is a random variable with distribution

$$X_i = \begin{cases} c_i & \text{with probability } p_i, \\ 0 & \text{with probability } 1 - p_i. \end{cases} \tag{3}$$

Let $L$ denote the system peak load. Then the loss-of-load probability (LOLP) index is measured by

$$\text{LOLP} = \Pr\{X_1 + X_2 + \cdots + X_n > c_1 + c_2 + \cdots + c_n - L\}. \tag{4}$$

In the situation where the LOLP index is being estimated for future time periods, as is typically done in power generation planning, the forecasted peak load will be uncertain and regarded as a random variable. We usually regard $L$ as normally distributed with mean $\mu$ and variance $\sigma^2$, its distribution being independent of the $X_i$ random variables.
If the peak load is regarded as known, $\sigma^2 = 0$ and $L = \mu$, but otherwise $\sigma^2 > 0$, and departures from normality may also be anticipated. Let $Y$ denote the deviation of the peak load from its mean $\mu$. Then we can also express (4) as follows:

$$\text{LOLP} = \Pr\{X_1 + X_2 + \cdots + X_n + Y > z\}, \tag{5}$$

where $z = c_1 + c_2 + \cdots + c_n - \mu$, and $Y$ is normally distributed with mean 0 and variance $\sigma^2$. The electric utilities in the United States plan their operation so as to meet a targeted value of the LOLP index of the order of $10^{-4}$. Thus, the LOLP measure represents the extreme tail probability in the distribution of $X_1 + X_2 + \cdots + X_n$.

Production costing index

For the evaluation of the expected operating costs of a given utility, we assume, somewhat simplifying the real-life situation, that (a) there are $n$ units in the system, (b) the units are brought into operation in accordance with some specified measure of economy of operation (e.g., marginal cost), and (c) the unit $i$, in decreasing order of economy of operation, has capacity $c_i$ and EFOR $p_i$, $i = 1, 2, \ldots, n$. Let $U$ denote the system load, and let $\bar{F}(x) = \Pr\{U > x\}$. Thus $\bar{F}(x)$ represents the load duration curve or LDC. Consider now the $i$-th unit in the loading order and let $W_i$ denote the energy unserved (i.e., the unmet demand) after it has been loaded. Let, as before, $X_i$ denote the unavailable capacity for unit $i$, whose probability distribution is given by (3). We define

$$C_i = c_1 + c_2 + \cdots + c_i, \quad i = 1, 2, \ldots, n, \tag{6}$$

$$Z_i = U + X_1 + X_2 + \cdots + X_i, \quad i = 1, 2, \ldots, n. \tag{7}$$

Thus, $Z_i$ represents the equivalent load on the first $i$ units. Let $g_i(\cdot)$ and $G_i(\cdot)$ denote the probability density and distribution functions, respectively, of $Z_i$. Clearly,

$$W_i = \begin{cases} 0 & \text{if } Z_i \le C_i, \\ Z_i - C_i & \text{if } Z_i > C_i. \end{cases} \tag{8}$$

Thus,

$$E(W_i) = \int_{C_i}^{\infty} (z - C_i)\, g_i(z)\, dz. \tag{9}$$

Now denote by $e_i$ the energy produced by unit $i$.
Then it follows from (9) that

$$\begin{aligned} E(e_i) &= E(W_{i-1}) - E(W_i) \\ &= \int_{C_{i-1}}^{\infty} (z - C_{i-1})\, g_{i-1}(z)\, dz - \int_{C_i}^{\infty} (z - C_i)\, g_i(z)\, dz \\ &= \int_{C_{i-1}}^{\infty} \bar{G}_{i-1}(z)\, dz - \int_{C_i}^{\infty} \bar{G}_i(z)\, dz \\ &= (1 - p_i) \int_{C_{i-1}}^{C_i} \bar{G}_{i-1}(z)\, dz, \quad i = 1, 2, \ldots, n, \end{aligned} \tag{10}$$

where $\bar{G}_i(z) = 1 - G_i(z)$, $i = 1, 2, \ldots, n$. In the above, we interpret $C_0 = 0$ and $\bar{G}_0(x) = \bar{F}(x)$, the load duration curve. The last step uses the relation $\bar{G}_i(z) = (1 - p_i)\bar{G}_{i-1}(z) + p_i \bar{G}_{i-1}(z - c_i)$, which follows from (3) and (7). The development of (10) is due to Baleriaux et al. [1]. We define the capacity factor for unit $i$ to be

$$\text{CF}(i) = \frac{E(e_i)}{c_i}, \quad i = 1, 2, \ldots, n. \tag{11}$$

This index gives the ratio of the expected output to the maximum possible output for each unit. An accurate estimate of this index is needed by the utilities for the purposes of evaluating expected system operating costs and optimizing generation planning.

Computational difficulties

In its planning process, a given utility needs to compute the LOLP and CF indexes for various combinations of the system load and mix of generating units. Thus it is necessary that an inexpensive method of computation be used for computing these indexes. Examining (4), we observe that when the $c_i$'s and the $p_i$'s are all different, at least $2^n$ arithmetic operations will be required to evaluate one LOLP index. Thus, the total number of arithmetic operations in the computation of one LOLP index varies exponentially with the number of generating units in the system, and it might become prohibitively large for large values of $n$. From (10), we observe that the expected energy output of a given unit is proportional to an average LOLP value over a range of $z$ between $C_{i-1}$ and $C_i$. Thus, it is not feasible for a power system planner to engage in a direct computation of (4) or (10), and he has to resort to approximations which require much less computer time.

3. Approximate procedures

Method of cumulants

From an uncritical application of the central limit theorem, one could have made the convenient assumption that the distribution of $X_1 + X_2 + \cdots + X_n$ in (5) or the survivor function $\bar{G}_{i-1}(x)$ in (10) will be approximately normal. While this assumption may not cause problems when computing probabilities near the central region of the probability distribution, the 'tail' probabilities may be inaccurately estimated. A typical generation mix within a given utility usually contains several large units and otherwise mostly small units, thus violating the spirit of the Lindeberg condition [11] of the central limit theorem. An approach to the problem of near-normality is that of making small corrections to the normal distribution approximation by using asymptotic expansions (Edgeworth or Gram-Charlier) based on the central limit theorem. Use of these expansions in evaluating power generating system reliability indexes has come to be known in the recent power-system literature as the method of cumulants. For details on the use of these expansions in the computation of LOLP, see [13], and for their use in computing the capacity factor index, see [5].

In the evaluation of LOLP, one first obtains the cumulants of $X_1 + X_2 + \cdots + X_n + Y$ by summing the corresponding cumulants of the $X_i$'s and of $Y$. These are then used in the appropriate Edgeworth or Gram-Charlier expansion. Similarly, for the purpose of evaluating $E(e_i)$ in (10), one first obtains the cumulants of $Z_i$ for each $i = 1, 2, \ldots, n$, by summing up the cumulants of $X_1, X_2, \ldots, X_i$ and $U$. Next, one writes the formal expansion for $\bar{G}_i(x)$ using these cumulants up to a given order. Finally, one integrates the series term by term in (10) to obtain an approximation for $E(e_i)$. Caramanis et al. [5] have made a detailed investigation of this approximation in the computation of the capacity factor indexes. Their results have cast favorable light on the efficiency of this method.
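
The two steps described above for LOLP (sum the cumulants, then evaluate an Edgeworth series) can be sketched as follows. This is a minimal illustration rather than the procedure of [13]: the function names are hypothetical, the units follow the two-state model (3), the peak-load deviation $Y$ is taken as normal with standard deviation `sigma`, and the series is truncated after the fourth cumulant. The two-state cumulant formulas used are the standard ones for a scaled Bernoulli variable.

```python
import math


def two_state_cumulants(c, p):
    """First four cumulants of X = c with probability p, 0 otherwise."""
    q = 1.0 - p
    return (c * p,
            c**2 * p * q,
            c**3 * p * q * (1.0 - 2.0 * p),
            c**4 * p * q * (1.0 - 6.0 * p * q))


def edgeworth_lolp(units, z, sigma=0.0):
    """Approximate Pr{X_1 + ... + X_n + Y > z} by a four-cumulant Edgeworth series.

    units: list of (capacity, forced outage rate) pairs (illustrative input format).
    sigma: standard deviation of the normal load deviation Y (assumption: mean 0).
    """
    k = [0.0, 0.0, 0.0, 0.0]
    for c, p in units:
        for j, kj in enumerate(two_state_cumulants(c, p)):
            k[j] += kj                  # cumulants of independent terms add
    k[1] += sigma**2                    # a normal Y contributes only to the variance
    t = (z - k[0]) / math.sqrt(k[1])
    g1 = k[2] / k[1]**1.5               # skewness coefficient
    g2 = k[3] / k[1]**2                 # excess kurtosis coefficient
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    normal_tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    correction = phi * (g1 / 6.0 * (t**2 - 1.0)
                        + g2 / 24.0 * (t**3 - 3.0 * t)
                        + g1**2 / 72.0 * (t**5 - 10.0 * t**3 + 15.0 * t))
    return normal_tail + correction
```

Because the series is a correction to the normal tail and not a probability distribution in its own right, the returned value can fall slightly outside [0, 1] far out in the tail, which is one symptom of the inaccuracy discussed above.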
Esscher's approximation: Computation of LOLP

We illustrate this method first with respect to the computation of LOLP. We assume that the peak load is non-random and known, i.e., $\sigma = 0$. As demonstrated in [14], this is the worst case for the peak load distribution insofar as the relative accuracy of the different approximation methods is concerned. We use the symbols $F_i$ and $F^*$ to denote the distribution functions of the random variables $X_i$ and $X_1 + X_2 + \cdots + X_n$, respectively. The moment generating functions of $F_i$ and $F^*$ are respectively given by

$$\hat{F}_i(s) = e^{s c_i} p_i + (1 - p_i), \tag{12}$$

$$\hat{F}^*(s) = \prod_{i=1}^{n} \hat{F}_i(s) = e^{\psi(s)}, \text{ say.} \tag{13}$$

In order to provide a notation which covers the continuous as well as discrete variables, we use the symbol $F(dx)$ to denote the 'density' of the distribution function $F(\cdot)$ (see Feller [11, p. 139] for a mathematical explanation of the symbol $F(dx)$). We now define, for some $s$,

$$V_i(dx) = \frac{e^{sx} F_i(dx)}{\hat{F}_i(s)}. \tag{14}$$

Further, let $V^*$ denote the convolution of $V_1, V_2, \ldots, V_n$. With these definitions, it is seen that the LOLP index may be expressed as follows:

$$\text{LOLP} = \int_z^{\infty} F^*(dx) = \hat{F}^*(s) \int_z^{\infty} e^{-sx}\, V^*(dx). \tag{15}$$

We now choose $s$ such that $z$ equals the mean of $V^*(\cdot)$. Thus, although in practical applications $z$ will lie in the right-hand tail of $F^*(\cdot)$, it will now be at the center of the d.f. $V^*(\cdot)$. We also expect $V^*(\cdot)$ to be much closer to the normal distribution in its central portion than in the tails. Thus, in the second integral of (15), the integration is being done in a region where $V^*(\cdot)$ can be accurately approximated by a normal distribution or an expansion of the Edgeworth type. The effect of the multiplier $e^{-sx}$ for $s > 0$ is to de-emphasize the contribution of $V^*(dx)$ for values of $x$ in the tail.
Esscher's approximation technique consists in replacing $V^*(dx)$ by an appropriate normal distribution or an Edgeworth expansion, and evaluating (15). It can be shown [9] that, corresponding to a given $s$, the first four cumulants of $V^*(\cdot)$ are given by

$$\psi'(s) = \sum_{i=1}^{n} \frac{p_i c_i}{p_i + (1 - p_i)\, e^{-s c_i}}, \tag{16a}$$

$$\psi''(s) = \sum_{i=1}^{n} \frac{c_i^2\, p_i (1 - p_i)\, e^{-s c_i}}{\left[p_i + (1 - p_i)\, e^{-s c_i}\right]^2}, \tag{16b}$$

$$\psi'''(s) = \sum_{i=1}^{n} \frac{c_i^3\, p_i (1 - p_i)\, e^{-s c_i} \left[-p_i + (1 - p_i)\, e^{-s c_i}\right]}{\left[p_i + (1 - p_i)\, e^{-s c_i}\right]^3}, \tag{16c}$$

$$\psi^{(4)}(s) = \sum_{i=1}^{n} \frac{c_i^4\, p_i (1 - p_i)\, e^{-s c_i} \left[p_i^2 - 4 p_i (1 - p_i)\, e^{-s c_i} + (1 - p_i)^2 e^{-2 s c_i}\right]}{\left[p_i + (1 - p_i)\, e^{-s c_i}\right]^4}. \tag{16d}$$

In applying Esscher's approximation, we first solve the equation (in $s$):

$$\psi'(s) = z. \tag{17}$$

Call this unique root $s_0$. We now replace $V^*(dx)$ in (15) by the normal density or an appropriate Edgeworth expansion. For a random variable $X$ whose first four cumulants are $k_1$, $k_2$, $k_3$ and $k_4$, its density $F(dx)$ is approximated by the Edgeworth expansion [6] formula as follows:

$$F(dx) \simeq k_2^{-1/2} \left[\varphi(t) - \frac{\gamma_1}{3!}\, \varphi^{(3)}(t) + \frac{\gamma_2}{4!}\, \varphi^{(4)}(t) + \frac{10 \gamma_1^2}{6!}\, \varphi^{(6)}(t)\right] dx, \tag{18}$$

where

$$\varphi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}, \quad \varphi^{(j)}(t) = \frac{d^j \varphi(t)}{dt^j}, \quad t = \frac{x - k_1}{k_2^{1/2}}, \quad \gamma_1 = \frac{k_3}{k_2^{3/2}}, \quad \gamma_2 = \frac{k_4}{k_2^2}.$$

Now, if we replace $V^*(dx)$ in (15) by the appropriate normal and Edgeworth expansions (18) using first and second order terms, the following formulas result:

$$\text{LOLP} = \Pr\{X_1 + X_2 + \cdots + X_n > z\} \simeq \text{LOLP}_1 = e^{\psi(s_0) - s_0 z}\, E_0(u), \tag{19a}$$

$$\text{LOLP}_2 = \text{LOLP}_1 \left[1 - \frac{\gamma_1}{6}\, v\right], \tag{19b}$$

$$\text{LOLP}_3 = \text{LOLP}_2 + \text{LOLP}_1 \left[\frac{\gamma_2}{24}\, u v + \frac{\gamma_1^2}{72} \left(u^3 v - \frac{3u}{w}\right)\right], \tag{19c}$$

where

$$u = s_0 \sqrt{\psi''(s_0)}, \quad E_0(u) = e^{u^2/2}\left[1 - \Phi(u)\right] \quad \left(\Phi(u) = \int_{-\infty}^{u} \varphi(t)\, dt\right),$$

$$\gamma_1 = \frac{\psi'''(s_0)}{\left[\psi''(s_0)\right]^{3/2}}, \quad \gamma_2 = \frac{\psi^{(4)}(s_0)}{\left[\psi''(s_0)\right]^2}, \quad w = \sqrt{2\pi}\, E_0(u), \quad v = u^3 - \frac{u^2 - 1}{w}, \quad \psi(s_0) = \log_e \hat{F}^*(s_0).$$

Esscher's approximation: Computation of unit capacity factors

A typical load distribution curve is multimodal, and it cannot be approximated by a standard distribution.
For the purpose of applying the present approximation, we discretize the load duration curve into a distribution having probability masses at a given number (say, $m$) of discrete points. That is, we obtain a discrete approximation of the load duration curve where the load points are $l_1, l_2, \ldots, l_m$ with the corresponding probabilities $r_1, r_2, \ldots, r_m$, where $r_j = \Pr\{U = l_j\}$. With this approximation, one can evaluate $\bar{G}_{i-1}(x)$ in (10) as follows:

$$\bar{G}_{i-1}(x) = \Pr\{U + X_1 + X_2 + \cdots + X_{i-1} > x\} = \sum_{j=1}^{m} \Pr\{X_1 + X_2 + \cdots + X_{i-1} > z_j\}\, r_j, \tag{20}$$

where $z_j = x - l_j$. The expression $\Pr\{X_1 + X_2 + \cdots + X_{i-1} > z_j\}$ can be evaluated using the formulas given in (19). It can be seen from (16b) that $\psi'(s)$ is an increasing function of $s$, and we have defined $s_0$ to be the root of the equation $\psi'(s) = z$. From (16a), we observe that $\psi'(0) = \sum_{i=1}^{n} c_i p_i$. Thus, in (20), if $z_j < E[X_1 + X_2 + \cdots + X_{i-1}]$, $s_0(z_j)$ will be negative. Now consider equation (15). If $s_0 < 0$, the effect of the multiplier $e^{-sx}$ is to amplify the error in the approximation of $V^*(dx)$ for large $x$, a clearly undesirable situation. Thus, it appears appropriate in this situation to express

$$\Pr\{X_1 + X_2 + \cdots + X_{i-1} > z_j\} = 1 - \Pr\{X_1 + X_2 + \cdots + X_{i-1} \le z_j\}, \tag{21}$$

and use Esscher's method on the right-hand side of (21). We define

$$\overline{\text{LOLP}} = \Pr\{X_1 + X_2 + \cdots + X_n \le z\}. \tag{22}$$

Corresponding to (19), we obtain the following approximations for $\overline{\text{LOLP}}$:

$$\overline{\text{LOLP}} \simeq \overline{\text{LOLP}}_1 = e^{\psi(s_0) - s_0 z}\, \bar{E}_0(u), \tag{23a}$$

$$\overline{\text{LOLP}}_2 = \overline{\text{LOLP}}_1 \left(1 - \frac{\gamma_1}{6}\, v'\right), \tag{23b}$$

$$\overline{\text{LOLP}}_3 = \overline{\text{LOLP}}_2 + \overline{\text{LOLP}}_1 \left[\frac{\gamma_2}{24}\, u v' + \frac{\gamma_1^2}{72} \left(u^3 v' - \frac{3u}{w'}\right)\right], \tag{23c}$$

where

$$\bar{E}_0(u) = e^{u^2/2}\, \Phi(u), \quad w' = -\sqrt{2\pi}\, \bar{E}_0(u), \quad v' = u^3 - \frac{u^2 - 1}{w'}.$$

For the purpose of evaluating $E(e_i)$ in (10), the integration can be done using an appropriate numerical integration routine after evaluating $\bar{G}_{i-1}(x)$ for as many points as the quadrature formula requires.
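
The structure of this calculation, that is, the load mixture (20) followed by numerical integration of (10) and the ratio (11), can be sketched as below. This is an illustrative sketch only: the function names and the input format are hypothetical, a trapezoidal rule is used for the quadrature, and the outage probabilities are computed exactly by enumeration, standing in for the Esscher formulas (19) and (23), which is practical only for a handful of units.

```python
def outage_pmf(units):
    """Exact pmf of X_1 + ... + X_k as {outage MW: probability}."""
    pmf = {0: 1.0}
    for cap, p in units:
        nxt = {}
        for x, prob in pmf.items():
            nxt[x] = nxt.get(x, 0.0) + (1.0 - p) * prob        # unit available
            nxt[x + cap] = nxt.get(x + cap, 0.0) + p * prob    # unit on outage
        pmf = nxt
    return pmf


def equivalent_load_survivor(load_points, pmf, x):
    """G-bar_{i-1}(x) = sum_j r_j * Pr{outage > x - l_j}, as in (20)."""
    return sum(r * sum(prob for out, prob in pmf.items() if out > x - l)
               for l, r in load_points)


def capacity_factors(load_points, units, steps=200):
    """CF(i) from (10) and (11); units are listed in loading order.

    load_points: list of (l_j, r_j) pairs; units: list of (c_i, p_i) pairs.
    """
    factors, lower = [], 0.0
    for i, (cap, p) in enumerate(units):
        pmf = outage_pmf(units[:i])          # outages of the first i-1 units
        h = cap / steps
        vals = [equivalent_load_survivor(load_points, pmf, lower + k * h)
                for k in range(steps + 1)]
        integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
        factors.append((1.0 - p) * integral / cap)
        lower += cap                         # C_i = C_{i-1} + c_i
    return factors


# Hypothetical three-unit system and three-point load distribution.
example_units = [(100, 0.10), (50, 0.20), (25, 0.05)]
example_load = [(60.0, 0.5), (120.0, 0.3), (160.0, 0.2)]
example_cf = capacity_factors(example_load, example_units)
```

A useful sanity check on any implementation of (10) is the limiting case of a constant load above the total installed capacity: every unit is then needed whenever it is available, so CF(i) reduces to $1 - p_i$.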
In the numerical work reported in Section 4, we used the trapezoidal rule for numerical integration.

4. Numerical results

This section applies the formulas obtained in the preceding section to two prototype systems. System A is the prototype generating system provided by the Reliability Test System Task Force of the IEEE Power Engineering Application of Probabilistic Methods Subcommittee [12]. Table 1 gives the assumed generation mix of the 32 units comprising the system: their installed capacities and FOR's.

Table 1
Unit power ratings for a prototype generating system, and their assumed FOR's (System A)

Unit size (MW)   Number of units   Forced outage rate
12               5                 0.02
20               4                 0.10
50               6                 0.01
76               4                 0.02
100              3                 0.04
155              4                 0.04
197              3                 0.05
350              1                 0.08
400              2                 0.12
Total            32

Table 2 provides a comparison of the estimated LOLP corresponding to different values of the system margin obtained with the use of Esscher's approximation formulas (19) and the method of cumulants. For the latter method, we use the Edgeworth expansion formula keeping terms up to the first four cumulants only. Usually, such expansions are sufficient to provide close enough approximations in cases where the use of such an expansion is appropriate. We also display in this table the exact LOLP values for benchmarking and comparison. Figure 2 shows the percentage relative errors resulting from using the two approximations for a wide range of values of the system margin.
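
The kind of comparison just described can be sketched for the normal-based formula (19a) as follows. This is a minimal sketch, not the code used for the reported results: the function names are hypothetical, the exact benchmark is obtained here by unit-by-unit convolution on a 1 MW grid (an assumption about how one might compute it, not a statement of how the published exact values were obtained), and the root of (17) is found by bisection. The unit data are those of Table 1.

```python
import math

# Generation mix of System A (Table 1): (unit size MW, number of units, FOR).
SYSTEM_A = [(12, 5, 0.02), (20, 4, 0.10), (50, 6, 0.01), (76, 4, 0.02),
            (100, 3, 0.04), (155, 4, 0.04), (197, 3, 0.05), (350, 1, 0.08),
            (400, 2, 0.12)]


def expand(mix):
    """List one (capacity, FOR) pair per physical unit."""
    return [(c, p) for c, n, p in mix for _ in range(n)]


def outage_distribution(units):
    """Exact pmf of X_1 + ... + X_n on a 1 MW grid (integer capacities assumed)."""
    total = sum(c for c, _ in units)
    pmf = [0.0] * (total + 1)
    pmf[0] = 1.0
    for c, p in units:
        new = [(1.0 - p) * v for v in pmf]
        for x in range(c, total + 1):
            new[x] += p * pmf[x - c]
        pmf = new
    return pmf


def exact_lolp(pmf, z):
    """Pr{X_1 + ... + X_n > z} for an integer margin z."""
    return sum(pmf[int(z) + 1:])


def psi_derivs(units, s):
    """psi(s), psi'(s), psi''(s) from (13), (16a) and (16b), for s >= 0."""
    psi = d1 = d2 = 0.0
    for c, p in units:
        a = math.exp(-s * c)
        d = p + (1.0 - p) * a
        psi += s * c + math.log(d)      # log(p e^{sc} + 1 - p), written stably
        d1 += c * p / d
        d2 += c * c * p * (1.0 - p) * a / (d * d)
    return psi, d1, d2


def solve_s0(units, z):
    """Root of psi'(s) = z, equation (17), by bisection (psi' is increasing)."""
    mean = psi_derivs(units, 0.0)[1]
    if not mean < z < sum(c for c, _ in units):
        raise ValueError("z must lie between the mean outage and total capacity")
    lo, hi = 0.0, 1e-3
    while psi_derivs(units, hi)[1] < z:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psi_derivs(units, mid)[1] < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def esscher_lolp_19a(units, z):
    """LOLP_1 = exp(psi(s0) - s0 z) E_0(u), equation (19a)."""
    s0 = solve_s0(units, z)
    psi, _, d2 = psi_derivs(units, s0)
    u = s0 * math.sqrt(d2)
    e0 = math.exp(0.5 * u * u) * 0.5 * math.erfc(u / math.sqrt(2.0))
    return math.exp(psi - s0 * z) * e0
```

Evaluating `esscher_lolp_19a(expand(SYSTEM_A), z)` and `exact_lolp(outage_distribution(expand(SYSTEM_A)), z)` over a grid of margins gives a comparison of the same kind as the first two columns of Table 2; the sketch covers only the right tail ($s_0 > 0$).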
Table 2
Comparison of algorithms for LOLP estimation (System A). An entry a(-b) denotes a x 10^(-b).

                           Esscher's approximation
z (MW)   Exact value(a)   (19a)        (19b)        (19c)        Cumulants
500      1.23(-1)         1.35(-1)     1.26(-1)     1.23(-1)     1.19(-1)
600      6.21(-2)         7.75(-2)     7.45(-2)     7.22(-2)     6.30(-2)
700      4.25(-2)         4.23(-2)     4.16(-2)     4.03(-2)     3.48(-2)
800      2.47(-2)         2.19(-2)     2.19(-2)     2.13(-2)     2.18(-2)
900      1.16(-2)         1.07(-2)     1.09(-2)     1.06(-2)     1.34(-2)
1000     4.34(-3)         4.94(-3)     5.07(-3)     4.95(-3)     6.82(-3)
1100     2.35(-3)         2.13(-3)     2.20(-3)     2.16(-3)     2.74(-3)
1200     7.91(-4)         8.51(-4)     8.85(-4)     8.75(-4)     8.57(-4)
1300     4.01(-4)         3.12(-4)     3.24(-4)     3.24(-4)     2.10(-4)
1400     1.02(-4)         1.03(-4)     1.07(-4)     1.08(-4)     4.07(-5)
1500     4.04(-5)         3.01(-5)     3.11(-5)     3.16(-5)     6.29(-6)
1600     8.06(-6)         7.69(-6)     7.85(-6)     7.99(-6)     7.78(-7)
1700     1.58(-6)         1.69(-6)     1.72(-6)     1.73(-6)     7.77(-8)
1800     2.91(-7)         3.19(-7)     3.23(-7)     3.23(-7)     6.28(-9)
1900     4.69(-8)         5.16(-8)     5.23(-8)     5.21(-8)     4.12(-10)
2000     7.25(-9)         7.07(-9)     7.08(-9)     6.90(-9)     2.21(-11)
2100     8.43(-10)        8.29(-10)    8.49(-10)    8.49(-10)    9.63(-13)
2200     9.27(-11)        8.29(-11)    9.98(-11)    1.45(-10)    3.44(-14)
2300     7.97(-12)        6.27(-12)    5.54(-12)    1.68(-12)    0

(a) Excerpted from [12].

Table 2 and Figure 2 impress one with the accuracy of Esscher's approximation in the region of our interest, i.e., for values of LOLP in the range between $10^{-3}$ and $10^{-5}$ and beyond. There is very little difference between the three formulas, and perhaps the formula (19b) represents the overall best choice. The method of cumulants does not fare too badly in the probability range $10^{-1}$ to $10^{-3}$; but below this range, the Esscher approximations appear to be decidedly superior to the method of cumulants. Similar comparisons for several other systems are given in a research report [9].
The results of this report as well as those given in [14] show that Esscher's method, while very accurate, is also speedy enough to be adopted in routine utility practice.

Fig. 2. Graph of relative error for the Esscher and cumulants approximations for LOLP (System A). (Horizontal axis: percentage relative error, from -80 to 80; vertical axis: z in units of 100 MW, from 5 to 21, annotated with the exact LOLP values; plotted series: equations (19a), (19b), (19c), and cumulants.)

For the purpose of evaluating the accuracy of Esscher's approximation in providing production costing expressions, we use the data provided by Caramanis et al. [5] with respect to a second synthetic system, referred to as the EPRI system D. Tables 3 and 4 give respectively the capacity mix of the system with the associated FOR's and the load duration curve. Table 5 gives the derived probability distribution $(l_j, r_j)$ obtained from Table 4. Here, $l_j$ is the interval midpoint for the $j$-th load class interval in Table 4, and $r_j$ is the associated probability mass obtained from differencing. Table 6 gives the estimated capacity factors using the three versions of Esscher's approximation, based on the normal and the first and second order Edgeworth expansions. These estimates are compared with a numerical analytic algorithm (denoted by SC-16), which is considered an industry benchmark, and P3, an algorithm based on the method of cumulants.

Table 3
EPRI system D. Unit power ratings in loading order

Power rating (MW)   No. of units   Availability(a)
1200                6              0.85320
800                 1              0.85320
800                 2              0.75910
600                 6              0.78750
400                 7              0.87420
200                 56             0.92564
50                  96             0.76000

(a) Availability = 1 - FOR.
Table 4
EPRI system D. Description of the LDC

Load (MW)   Load duration     Load (MW)   Load duration
x           value F̄(x)        x           value F̄(x)
0.0         1.000000          18944.0     0.469293
12288.0     1.000000          19456.0     0.457412
12800.0     0.974227          19968.0     0.445531
13312.0     0.962347          20480.0     0.409888
13824.0     0.911022          20992.0     0.392067
14336.0     0.855419          21504.0     0.350484
14848.0     0.804035          22016.0     0.320782
15360.0     0.752947          22528.0     0.291080
15872.0     0.677207          23040.0     0.243557
16384.0     0.635624          23552.0     0.190093
16896.0     0.570880          24064.0     0.124749
17408.0     0.522756          24576.0     0.071285
17920.0     0.493054          25088.0     0.035643
18432.0     0.475233          25600.0     0.0

Table 5
Discrete version of LDC (EPRI system D)

Interval   Load (MW)   Probability   Interval   Load (MW)   Probability
(j)        (l_j)       (r_j)         (j)        (l_j)       (r_j)
1          12544       0.025773      14         19200       0.011881
2          13056       0.011880      15         19712       0.011881
3          13568       0.051325      16         20224       0.035643
4          14080       0.055603      17         20736       0.017821
5          14592       0.051384      18         21248       0.041583
6          15104       0.051088      19         21760       0.029702
7          15616       0.075740      20         22272       0.029702
8          16128       0.041583      21         22784       0.047523
9          16640       0.064744      22         23296       0.053464
10         17152       0.048124      23         23808       0.065344
11         17664       0.029702      24         24320       0.053464
12         18176       0.017821      25         24832       0.035642
13         18688       0.005940      26         25344       0.035643

The latter two algorithms are considered to be the best in their respective categories by Caramanis et al. [5]. When one regards the values provided by SC-16 as benchmark values, as Caramanis et al. [5] do, one observes that Esscher's method provides excellent approximations to the capacity factors for each unit in the loading order of EPRI System D. In particular, the LD-2 and LD-3 approximations uniformly outperform the method of cumulants. We conjecture that the performance of the Esscher approximation will be more convincingly superior to the method of cumulants for systems with lower unit FOR values.

Table 6
Comparison of algorithms for capacity factors (EPRI system D)

                                Esscher's approximation(b)
Unit no.    SC-16(a)   P3(a)    LD-1     LD-2     LD-3
1-7         0.853      0.853    0.853    0.853    0.853
8-9         0.759      0.759    0.759    0.759    0.759
10-13       0.788      0.787    0.788    0.788    0.788
14          0.787      0.787    0.787    0.787    0.787
15          0.786      0.787    0.786    0.786    0.786
16          0.870      0.874    0.871    0.870    0.870
17          0.866      0.874    0.868    0.866    0.866
18          0.861      0.864    0.862    0.861    0.861
19          0.852      0.844    0.854    0.852    0.852
20          0.841      0.831    0.844    0.841    0.841
21          0.827      0.816    0.831    0.827    0.827
22          0.809      0.799    0.815    0.810    0.810
23-28       0.809      0.800    0.816    0.809    0.809
29          0.758      0.753    0.766    0.759    0.759
30-33       0.717      0.716    0.726    0.718    0.718
34-39       0.634      0.641    0.642    0.634    0.633
40          0.578      0.590    0.585    0.578    0.578
41-44       0.544      0.557    0.549    0.544    0.544
45-49       0.492      0.503    0.496    0.492    0.492
50          0.463      0.471    0.467    0.464    0.463
51-55       0.439      0.442    0.442    0.439    0.439
56-59       0.403      0.399    0.406    0.403    0.403
60          0.382      0.376    0.386    0.382    0.382
61-67       0.344      0.334    0.349    0.344    0.344
68-69       0.294      0.283    0.300    0.295    0.295
70          0.276      0.264    0.281    0.276    0.276
71-78       0.213      0.207    0.220    0.213    0.213
79-89       0.116      0.118    0.121    0.116    0.116
90          0.102      0.106    0.107    0.102    0.102
91-99       0.092      0.097    0.096    0.092    0.092
100         0.082      0.088    0.086    0.082    0.081
101-102     0.079      0.085    0.083    0.079    0.078
103-109     0.070      0.078    0.074    0.070    0.070
110         0.063      0.071    0.067    0.063    0.063
111-114     0.059      0.067    0.062    0.059    0.059
115-125     0.048      0.056    0.050    0.048    0.048
126         0.040      0.048    0.042    0.040    0.040
127-131     0.036      0.044    0.038    0.036    0.036
132         0.033      0.041    0.034    0.033    0.033
133-138     0.029      0.036    0.031    0.029    0.029
139         0.026      0.033    0.027    0.026    0.026
140         0.025      0.032    0.026    0.025    0.025
141-150     0.021      0.026    0.021    0.020    0.020
151-159     0.014      0.019    0.014    0.014    0.014
160         0.012      0.015    0.012    0.012    0.011
161-162     0.011      0.014    0.011    0.011    0.011
163-164     0.010      0.013    0.011    0.011    0.011
165         0.009      0.012    0.010    0.009    0.009
166         0.009      0.012    0.009    0.009    0.009
167         0.009      0.011    0.009    0.009    0.009
168         0.008      0.010    0.008    0.008    0.008
169         0.008      0.010    0.008    0.008    0.008
170         0.008      0.009    0.008    0.008    0.007
171         0.007      0.009    0.007    0.007    0.007
172         0.007      0.006    0.007    0.007    0.007
173         0.007      0.000    0.007    0.007    0.007
174         0.006      0.000    0.006    0.006    0.006

(a) Excerpted from [5].
(b) LD-1, LD-2, LD-3: Esscher's approximations using normal and first and second order Edgeworth expansions.

5. Summary and conclusions

Reliability of electrical power supply is of utmost importance to the public. To ensure adequate and reliable power supply, the electric power industry spends a considerable effort in long-term generation planning. In this connection, several reliability indexes are used by the power system planners. The loss-of-load probability (LOLP) index for a power generating system measures the probability that system load exceeds its available capacity. Direct numerical computation of this index proves infeasible, and one needs to resort to approximate methods. We adapt an approximation scheme proposed by Esscher in an actuarial context for evaluating the LOLP index. Numerical results given in this article demonstrate that this approximation is very accurate. A second problem considered is estimating the capacity factors of various units which experience different rates of utilization within the system. These indexes are used to determine the expected operating costs of an electric utility company. The computation of these indexes involves difficulties similar to those of LOLP. Here, also, for a typical system evaluated, Esscher's method provides very accurate results.

References

[1] Baleriaux, E., Jamoville, E. and Fr. Linard de Guertechin (1967).
Simulation de l'exploitation d'un parc de machines thermiques de production d'électricité couplé à des stations de pompage. Revue E (SRBE ed.) 5, 3-24.
[2] Barlow, R. E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart and Winston, New York.
[3] Billinton, R. (1970). Power System Reliability Evaluation. Gordon and Breach, New York.
[4] Billinton, R., Ringlee, R. J. and Wood, A. J. (1973). Power System Reliability Calculations. MIT Press, Cambridge, MA.
[5] Caramanis, M., Stremmel, J. V., Fleck, W. and Daniel, S. (1983). Probabilistic production costing. International Journal of Electrical Power and Energy Systems 5, 75-86.
[6] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ.
[7] EEI Equipment Availability Task Force (1976). Report on equipment availability for the ten-year period, 1966-1975. Edison Electric Institute, New York.
[8] Endrenyi, J. (1978). Reliability Modeling in Electric Power Systems. Wiley, New York.
[9] Electric Power Research Institute (1985). Large-deviation approximation to computation of generating-system reliability and production costs. EPRI EL-4567, Palo Alto, CA.
[10] Esscher, F. (1932). On the probability function in the collective theory of risk. Skandinavisk Aktuarietidskrift 15, 175-195.
[11] Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. II, 2nd ed. Wiley, New York.
[12] IEEE reliability test system (1979). A report prepared by the Reliability Test System Task Force of the Application of Probability Methods Subcommittee. IEEE Transactions on Power Apparatus and Systems 98, 2047-2064.
[13] Levy, D. J. and Kahn, E. P. (1982). Accuracy of the Edgeworth expansion of LOLP calculations in small power systems. IEEE Transactions on Power Apparatus and Systems 101, 986-994.
[14] Mazumdar, M. and Gaver, D. P. (1984). On the computation of power-generating system reliability indexes.
Technometrics 26, 173-185.
[15] Stremmel, J. P., Jenkins, R. T., Babb, R. A. and Bayless, W. D. (1980). Production costing using the cumulant method of representing the equivalent load curve. IEEE Transactions on Power Apparatus and Systems 99, 1947-1953.
[16] Sullivan, R. L. (1976). Power Systems Planning. McGraw-Hill, New York.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 73-98

Software Reliability Models

Thomas A. Mazzuchi and Nozer D. Singpurwalla

1. Introduction

In the past ten years or so, there has been considerable effort in what has been termed software reliability modeling. The generally accepted definition of software reliability is 'the probability of failure-free operation of a computer program in a specified environment for a specified period of time' (Musa and Okumoto, 1982). This area has begun to receive much attention for several reasons. Today the computer is used in many vital areas where a failure could mean costly, even catastrophic, consequences. Due to the recent advances in hardware modeling and technology, the main cause of computer system failure would be in the software sector. At the other end of the spectrum, software production is costly and time consuming, with much of the time and cost being devoted to testing, correcting and retesting the software. The software producer needs to know the benefits of testing and must be able to present some tangible evidence of software quality.

The issues concerning the quality and performance of software which are of interest to the statistician (see Barlow and Singpurwalla, 1985) are:
(1) The quantification and measurement of software reliability.
(2) The assessment of the changes in software reliability over time.
(3) The analysis of software failure data.
(4) The decision of whether to continue or stop testing the software.

The problem of software reliability is different from that of hardware reliability for several reasons.
The cause of software failures is human error, not mechanical or electrical imperfection, or the wearing of components. Also, once all the errors are removed, the software is 100% reliable and will continue to be so. Furthermore, unlike hardware errors, there is no process which generates failures. Rather, software 'bugs' which are in the program due to human error are uncovered by certain program inputs, and it is these inputs which are randomly generated as part of some operational environment.

Research supported by Contract N00014-85-K-0202, Project NR 347-128-410, Office of Naval Research and Grant DAAG 29-84-K-0160, U.S. Army Research Office.

A more formal discussion of the software failure process is given in Musa and Okumoto (1982). A computer program is a 'set of complete instructions (operations with operands specified) that executes within a single computer some major function', and undergoes several runs, where a run is associated with 'the accomplishment of a user function'. Each run is characterized by its input variables, an input variable being 'any data element that exists external to the run and is used by the run or any externally-initiated interrupt'. The environment of a computer program is the complete set of input variables for each run and the probability of occurrence of each input during operation. A failure is 'a departure of program operation from program requirements' and is usually described in terms of the output variables, which are 'any data element that exists external to the run and is set by the run or any interrupt generated by the run and intended for use by another program'. A fault or bug is the 'defect in implementation that is associated with a failure'. The 'act or set of acts of omission or commission by an implementor or implementors that results in a fault' is an error. For a more in-depth treatment of software terminology the reader is referred to Musa and Okumoto (1982).
For further clarification of types of software errors and their causes see Amster and Shooman (1975).

Software reliability models may be classified by their attributes (Musa and Okumoto, 1982; Shanthikumar, 1983) or the phase of the software life cycle where they may be used (Ramamoorthy and Bastani, 1982). The latter approach will be used here. There are four main phases of the software life cycle: testing and debugging phase, validation phase, operational phase, and maintenance phase. Currently no models exist for use in the maintenance phase and thus this phase will not be discussed.

2. Models of the testing and debugging phase

In the testing and debugging phase the software is tested for errors. In this phase an attempt is made to correct any bugs which are discovered. The discovery of a software bug is a function of its size (the probability that an input will result in the bug's discovery) and the testing intensity, which reflects the way in which inputs are selected. Another issue in this phase is the treatment of the error correction process. The simpler models assume all errors are corrected with certainty and without introducing new errors, while others account for imperfect debugging. Models in this phase may be classified into two main categories: error counting models and non-error counting models. Models may be further classified by their approach (Bayesian or classical), their treatment of the effect of error removal on reliability (deterministic or stochastic) and their consideration of the time it takes to find and fix software bugs.

2.1. Error counting models

Error counting models are based on the assumption that

(A1) The failure rate of the software at any point in time is a function of the residual number of errors in the program.

Thus effort centers around estimating the residual number of errors and using this to obtain other reliability measures.
Furthermore, deterministic models are based on the assumption that

(A2) Conditional on the model parameters, the correction of an error results in a known improvement in the reliability of the software.

The simplest, most cited and most criticized model is that of Jelinski-Moranda (henceforth JM). In addition to (A1) and (A2) this model is based on the assumptions

(JM1) Each undetected error contributes an equal amount to the failure rate of the software, which is proportional to the number of remaining errors,
(JM2) Conditional on the model parameters, the times between successive software failures are independent,
(JM3) Once discovered, errors are removed in a minimal amount of time without introducing any new errors.

Given the above, the reliability function for $T_i$, the time between the $(i-1)$st and $i$th failure, is given by

$$R(t_i \mid \phi, N) = \exp\{-\phi (N - i + 1) t_i\} \quad \text{for } i = 1, \ldots, N, \tag{2.1.1}$$

where $N$ is the initial number of software bugs and $\phi$ is the failure rate contribution of an individual error. This model may be used to make inference about the software once estimates of $N$ and $\phi$ are obtained. Given that the software is tested and $n \le N$ software failures have occurred, the parameters $N$ and $\phi$ can be estimated by maximum likelihood techniques. The estimates are obtained as the simultaneous solution to

$$\hat{\phi} = \frac{n}{\hat{N} T - \sum_{i=1}^{n} (i-1) t_i}, \tag{2.1.2a}$$

$$\sum_{i=1}^{n} \frac{1}{\hat{N} - i + 1} = \frac{n}{\hat{N} - \frac{1}{T}\sum_{i=1}^{n} (i-1) t_i}, \tag{2.1.2b}$$

where $T = \sum_{i=1}^{n} t_i$.

The authors note that the above model assumes equal amounts of testing in all periods. They suggest normalizing the time scale by using a time dependent parameter $e(t)$, the exposure rate. This parameter would reflect the testing intensity at any time. The model could thus be modified by normalizing the time between failures as $t_i^* = \int_{t'_{i-1}}^{t'_i} e(u)\,du$, where $t'_i$ is the time of the $i$th failure.

Shooman (1972) develops a model similar to JM and further elaborates on the notion of 'testing intensity'.
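The JM reliability function (2.1.1) and the likelihood equations (2.1.2a)-(2.1.2b) can be sketched numerically as follows. This is a minimal illustration, not the authors' procedure: the function names `jm_reliability` and `jm_mle`, the use of bisection, and the treatment of $N$ as a real number are all assumptions made here. The early return relies on the fact that, as $N \to \infty$, the difference between the two sides of (2.1.2b) keeps one sign unless $\sum (i-1)t_i / T > (n-1)/2$.

```python
import math

def jm_reliability(t, i, N, phi):
    """R(t | phi, N) of (2.1.1) for the i-th interfailure time."""
    return math.exp(-phi * (N - i + 1) * t)

def jm_mle(times):
    """Solve (2.1.2a)-(2.1.2b) for (N_hat, phi_hat) by bisection on N.

    times: interfailure times t_1, ..., t_n (hypothetical input format).
    Returns (inf, 0.0) when (2.1.2b) has no finite root.
    """
    n = len(times)
    T = sum(times)
    s = sum((i - 1) * t for i, t in enumerate(times, start=1))
    # No sign change in (2.1.2b) unless later intervals tend to be longer.
    if s / T <= (n - 1) / 2:
        return math.inf, 0.0

    def g(N):
        # Left side minus right side of (2.1.2b).
        lhs = sum(1.0 / (N - i + 1) for i in range(1, n + 1))
        return lhs - n / (N - s / T)

    lo = n - 1 + 1e-9          # g(lo) > 0: the i = n term dominates
    hi = float(n)
    while g(hi) > 0:           # expand until the root is bracketed
        hi *= 2
    for _ in range(200):       # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    N_hat = 0.5 * (lo + hi)
    phi_hat = n / (N_hat * T - s)   # (2.1.2a)
    return N_hat, phi_hat
```

With interfailure times that lengthen as testing proceeds, for example `jm_mle([1, 1, 2, 2, 3])`, the routine returns a finite pair; with times that shorten, such as `[16, 8, 4, 2, 1]`, it reports an infinite estimate of $N$.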
Shooman suggests treating the number of corrected software bugs as a continuous function of debugging time, say $\varepsilon(\tau)$. The function $\varepsilon(\tau)$ would relate the cumulative number of corrected errors, the number of program instructions and the debugging time. Once $\varepsilon(\tau)$ was established, future software questions, such as when to stop testing, could be answered. Analogous to (2.1.1), the reliability function for software which has undergone $\tau$ months of debugging is

$$R(t \mid N, I, \varepsilon(\tau)) = \exp\{-C (N/I - \varepsilon(\tau)) t\} \tag{2.1.3}$$

where $I$ is the total number of program instructions and $C$ is an unknown constant. In Shooman (1973) the function $\varepsilon(\tau)$ is defined simply as 'the total number of errors corrected by time $\tau$ normalized with respect to $I$'. This assumption essentially makes the Shooman and JM models different in notation only. Shooman (1973) and (1975), however, suggest a different technique for estimating $C$ (and thus $\phi$) and $N$. The debugging process is divided into $k$ intervals of lengths $H_1, \ldots, H_k$. The end of the $i$th debugging interval is denoted $\tau_i$. In the $i$th interval $n_i$ failures are recorded, but they are not fixed until the end of the interval. A similar approach is undertaken using the JM model in Lipow (1974). The parameters $C$ and $N$ may be obtained by the method of moments by choosing times $\tau_i < \tau_j$ such that $\varepsilon(\tau_i) < \varepsilon(\tau_j)$ and solving

$$\frac{H_i}{n_i} = \frac{1}{C[N/I - \varepsilon(\tau_i)]}, \tag{2.1.4a}$$

$$\frac{H_j}{n_j} = \frac{1}{C[N/I - \varepsilon(\tau_j)]}, \tag{2.1.4b}$$

or by using the method of maximum likelihood estimation and solving

$$\hat{C} = \frac{\sum_{i=1}^{k} n_i}{\sum_{j=1}^{k} [\hat{N}/I - \varepsilon(\tau_j)] H_j}, \tag{2.1.5a}$$

$$\hat{C} = \frac{\sum_{i=1}^{k} n_i / [\hat{N}/I - \varepsilon(\tau_i)]}{\sum_{j=1}^{k} H_j}. \tag{2.1.5b}$$

If large sample theory is applicable, asymptotic variances of the MLE's are obtained as

$$\operatorname{var}(\hat{C}) \approx \frac{\hat{C}^2}{\sum_{i=1}^{k} n_i}, \tag{2.1.6a}$$

$$\operatorname{var}(\hat{N}) \approx \frac{I^2}{\sum_{i=1}^{k} n_i / [\hat{N}/I - \varepsilon(\tau_i)]^2}. \tag{2.1.6b}$$

Shick and Wolverton (1973) actually specify and incorporate a testing intensity in their modification of the JM model by assuming the failure rate of the software is a linear function of testing time.
The resulting distribution for the interfailure times is the Rayleigh distribution

$$R(t_i \mid \phi, N) = \exp\{-\phi (N - i + 1) t_i^2 / 2\} \tag{2.1.7}$$

and the resulting MLE's for $N$ and $\phi$ are obtained by solving

$$\hat{N} = \frac{2n/\hat{\phi} + \sum_{i=1}^{n} (i-1) t_i^2}{\sum_{i=1}^{n} t_i^2}, \tag{2.1.8a}$$

$$\hat{\phi} = \frac{\sum_{i=1}^{n} 2/(\hat{N} - i + 1)}{\sum_{i=1}^{n} t_i^2}. \tag{2.1.8b}$$

Several alterations of this have appeared. Wagoner (1973) fit a Weibull distribution to software failure data using least squares estimation for parameters. Lipow (1974) suggested using a linear term which would be a function of the most recent failure time. Shick and Wolverton (1978) discuss the use of a parabolic function to model testing intensity. Sukert (1976) also adapted the model to include the case of more than one failure occurring in a debugging interval.

Musa (1975) was the first to point out that software reliability models should be based on execution time rather than calendar time. Musa's model is essentially the same as the JM model but he attempts to model the debugging process in a more realistic fashion. The model undergoes some alterations in Musa (1979). Here, the expected net number of corrected software bugs is expressed as an exponential function of execution time, and the fault correction occurrence rate is assumed proportional to the failure occurrence rate. The reliability of software tested for $\tau$ units of execution time is $R(t) = \exp\{-t/T\}$ where $T$, the mean time to failure (in execution time), is given by

$$T = T_0 \exp\left\{\frac{C\tau}{M_0 T_0}\right\}. \tag{2.1.9}$$

In the above, $T_0$ is the mean time to failure before debugging, $M_0$ is the total number of possible software failures in the maintained life of the software and $C$ is a testing compression factor. The value $T_0$ is further expressed by $T_0 = 1/(f K N_0)$, where $f$ is a ratio of average instruction execution rate to the number of instructions, called the linear execution frequency, and $K$ is an error exposure ratio relating error exposure frequency to linear execution frequency.
The value $N_0$ is the initial number of software errors in the program and is related to $M_0$ by $M_0 = N_0/B$. The parameter $B$ is called the fault reduction factor. This gives the model the additional characteristic of being able to handle the possibility of more than one error being found at one time or the possibility of imperfect debugging. The value $C$ is a ratio relating the rate of failures during testing to that during use. From the parameter relationships, two central measures are obtained. The additional number of software errors which need to be corrected to increase the mean time to failure for the program from $T_1$ to $T_2$ is given as

$$\Delta m = M_0 T_0 \left[\frac{1}{T_1} - \frac{1}{T_2}\right] \tag{2.1.10}$$

and the additional execution time required to increase the mean time to failure from $T_1$ to $T_2$ is given as

$$\Delta \tau = \frac{M_0 T_0}{C} \log(T_2/T_1). \tag{2.1.11}$$

Musa derives an execution to calendar time conversion by pointing out that testing time is a function of three limited resources: failure identification personnel (I), failure correction personnel (F) and computer time (C). The resource expenditure associated with a change in mean time to failure is approximated by

$$\Delta \chi_k = \theta_k \Delta\tau + \mu_k \Delta m \tag{2.1.12}$$

for $k = I, F, C$, where $\Delta\tau$ and $\Delta m$ are the additional execution time needed and the additional errors corrected to bring about the change, and $\theta_k$ and $\mu_k$ are the average resource expenditure rates per execution time and per failure respectively. Assuming resources remain constant throughout testing, the testing phase may be divided into three distinct phases. In each phase only one of the resources is limiting and the other two are not fully utilized. Thus the additional calendar time required to increase the mean time to failure from $T_1$ to $T_2$ is given as

$$\Delta t = M_0 T_0 \sum_{k} \frac{1}{P_k \rho_k} \left[ \mu_k \left( \frac{1}{T_{k1}} - \frac{1}{T_{k2}} \right) + \frac{\theta_k}{C} \log\left( \frac{T_{k2}}{T_{k1}} \right) \right] \tag{2.1.13}$$

where $k = C, F, I$ corresponding to the appropriate resource limiting phase, $P_k$ is the amount of resource available, $\rho_k$ is the resource utilization factor, and $\theta_k$ and
$\mu_k$ are as previously defined. The quantities $T_{k1}$ and $T_{k2}$ are the mean times to failure at the boundaries of each resource limiting phase. These boundaries are at the present and desired mean time to failure and the transition points which lie in this range. The mean time to failure for a transition point is derived as

$$T_{kk'} = \frac{C[P_k \rho_k \mu_{k'} - P_{k'} \rho_{k'} \mu_k]}{[P_{k'} \rho_{k'} \theta_k - P_k \rho_k \theta_{k'}]} \tag{2.1.14}$$

for $(k, k') = (C, F), (F, I), (I, C)$. Musa notes that it is generally true that $\theta_F = 0$ and $\rho_I = 1$, and discusses a method for obtaining $\rho_F$ by treating the failure correction process as a truncated $M/M/P_F$ queueing model.

Most of the parameters of Musa's model must be obtained from past data on similar projects. The parameters $M_0$ and $T_0$ (and thus $K$ and $N_0$) may be obtained by using maximum likelihood techniques. The MLE's are obtained by solving

$$\hat{T}_0 = \frac{C}{n} \sum_{i=1}^{n} \left[ 1 - \frac{i-1}{\hat{M}_0} \right] \tau_i, \tag{2.1.15a}$$

$$\sum_{i=1}^{n} \frac{1}{\hat{M}_0 - i + 1} = \frac{C}{\hat{M}_0 \hat{T}_0} \sum_{i=1}^{n} \tau_i, \tag{2.1.15b}$$

where $\tau_i$, $i = 1, \ldots, n$, is the execution time between the $(i-1)$st and $i$th failure. An exact expression for the variance of $\hat{T}_0$ is obtained as

$$\operatorname{Var}(\hat{T}_0) = T_0^2 / n \tag{2.1.16}$$

yielding a coefficient of variation of $1/n^{1/2}$. Though an exact expression for the variance of $\hat{M}_0$ is not available, confidence bounds for $M_0$ are obtained using Chebychev's inequality. Based on the distribution of the failure moment statistic $\gamma = M_0/n - 1/\Delta\psi$, where $\Delta\psi = \psi(M_0 + 1) - \psi(M_0 + 1 - n)$ and $\psi$ is the digamma function, a $(1 - \alpha)100\%$ confidence interval for $M_0$ is obtained by determining the values of $M_0$ which correspond to the values of $\gamma$ such that

$$\gamma = \bar{\gamma} \pm \frac{\mathrm{SD}(\gamma)}{\alpha^{1/2}} \tag{2.1.17}$$

where

$$\bar{\gamma} = \frac{M_0}{n} - \frac{1}{\Delta\psi}, \qquad \mathrm{SD}(\gamma) = \frac{1}{(\Delta\psi)^2} \left[ -\Delta\psi' - \frac{(\Delta\psi)^2}{n} \right]^{1/2}$$

with $\Delta\psi' = \psi'(M_0 + 1) - \psi'(M_0 + 1 - n)$, and $\psi'$ is the trigamma function.

The Musa model was one of the first to suggest that the number of software failures was governed by a Poisson distribution. Another model which adopted this approach was the Generalized Poisson Model (GPM) of Angus, Schafer and
Sukert (1980). This model is also based on the JM assumptions but includes the additional assumption that the severity of the testing process is proportional to an unknown power of elapsed test time. In the $i$th debugging time interval of length $H_i$, the number of errors observed, $N_i$, is given by a Poisson distribution with mean value $E[N_i] = \phi (N - M_{i-1}) H_i^{\alpha}$, where $M_{i-1}$ is the number of errors removed before the start of the $i$th debugging interval and $\alpha$ is an unknown constant. As in the debugging scenario of Shooman, it is assumed that if bugs are corrected they are corrected at the conclusion of the debugging interval. Parameters $N$, $\phi$ and $\alpha$ may be obtained by solving the maximum likelihood equations. In Ramamoorthy and Bastani (1982) these are given for the case $H_i = t_i$, the time between the $(i-1)$st and $i$th failure, as

$$\sum_{i=1}^{n} \frac{1}{N - M_{i-1}} - \sum_{i=1}^{n} \phi t_i^{\alpha} = 0, \tag{2.1.18a}$$

$$\frac{n}{\alpha} + \sum_{i=1}^{n} \log t_i - \sum_{i=1}^{n} \phi (N - M_{i-1}) t_i^{\alpha} \log t_i = 0, \tag{2.1.18b}$$

$$\frac{n}{\phi} - \sum_{i=1}^{n} (N - M_{i-1}) t_i^{\alpha} = 0. \tag{2.1.18c}$$

The extra parameter gives the GPM flexibility but also difficulties in terms of parameter estimation. Once the parameter estimates are obtained they may be used with the model to draw conclusions regarding the software. One important expression obtained in Angus, Schafer and Sukert (1980) is the expected time until the removal of an additional $k \le N - M$ faults given $M$ faults have already been removed. The expression is

$$\hat{\tau}_k = \hat{\alpha}^{-1} \Gamma(\hat{\alpha}^{-1}) \sum_{i=M+1}^{M+k} \{\hat{\phi}[\hat{N} - i + 1]\}^{-1/\hat{\alpha}} \tag{2.1.19}$$

where $\Gamma(\cdot)$ is the gamma function and $\hat{\alpha}$, $\hat{\phi}$ and $\hat{N}$ are the MLE's of $\alpha$, $\phi$ and $N$. The use of least squares estimates is also discussed by the aforementioned authors.

There has been much comparison and criticism of the early models in terms of their assumptions and their parameter estimation.
(See for example Forman and Singpurwalla (1977), Shick and Wolverton (1978), Forman and Singpurwalla (1979), Sukert (1979), Musa (1979), Littlewood (1979), Littlewood (1980a), Littlewood (1980b), Angus, Schafer and Sukert (1980), Littlewood (1981a), Littlewood (1981b), Keiller, Littlewood, Miller and Sofer (1982), Musa and Okumoto (1982), Ramamoorthy and Bastani (1982), Stefanski (1982), Singpurwalla and Meinhold (1983), Langberg and Singpurwalla (1985)). The parameter estimation of the JM model has been most criticized. Forman and Singpurwalla (1977) and (1979), Littlewood and Verrall (1981) and Joe and Reid (1983) have all illustrated that the solution of the maximum likelihood equations for the JM model can produce unreasonably large, even non-finite, estimates for $N$. In Forman and Singpurwalla (1977) the authors found that when $n$ is small relative to $N$, the likelihood function of $N$ is very unstable and may not have a finite optimum. Littlewood and Verrall (1981) found that the estimate of $N$ is finite if and only if

$$\frac{\sum_{i=1}^{n} (i-1) t_i}{\sum_{i=1}^{n} (i-1)} > \frac{\sum_{i=1}^{n} t_i}{n}. \tag{2.1.20}$$

The authors note that violation of the above implies that no reliability growth is taking place as a result of the debugging process. In Joe and Reid (1983) the authors show that $\hat{N}$ is an unsatisfactory point estimate because its median is negatively biased and can be infinite with substantial probability. The authors advocate the use of likelihood interval estimates.

Forman and Singpurwalla (1977) and (1979) develop an estimation procedure to insure against unreasonably large estimates. They propose a stopping rule based on the comparison of the relative likelihood function for $N$ to the 'approximate normal relative likelihood' for $N$:

$$R_{\text{normal}}(N) = \exp\{-\tfrac{1}{2}(N - \hat{N})^2 / \operatorname{Var}(\hat{N})\} \tag{2.1.21}$$

where

$$\operatorname{Var}(\hat{N}) = n \Bigg/ \left[ n \sum_{i=1}^{n} \frac{1}{(\hat{N} - i + 1)^2} - \left( \sum_{i=1}^{n} \frac{1}{\hat{N} - i + 1} \right)^2 \right].$$
The above function may be used to give an indication of the appropriateness of the large sample theory for estimating $N$. When appropriate, plots of the relative likelihood function and that of $R_{\text{normal}}(N)$ compare favorably. Thus to get a meaningful estimate of $N$, the authors suggest the following stopping rule. After testing the software to $n$ failures:

(1) Compute $\hat{N}$, the MLE of $N$, using (2.1.2a) and (2.1.2b).
(2) If $\hat{N} \approx n$ go to step 3; if not, continue testing until another failure occurs and return to step 1.
(3) Compute the relative likelihood function for $N$ and compare it with $R_{\text{normal}}(N)$. If plots of the two functions display a large discrepancy, this estimate is misleading. Continue testing until another failure occurs, then go to step 1. If the plots are in good agreement, stop testing.

Furthermore, if the large sample theory appears appropriate, then inference concerning $N$ (and in an analogous manner $\phi$) may be obtained using the normal distribution.

Meinhold and Singpurwalla (1983) suggest the adoption of the Bayesian point of view when considering the likelihood function of the JM model. In so doing, the conclusion to be drawn from ridiculous parameter estimates is that it is the method of inference, specifically maximum likelihood estimation, rather than the model that needs to be questioned. A Bayesian approach to inference on $N$ and $\phi$ is discussed.

Goel and Okumoto (1979) treat the cumulative number of software failures by time $t$, $N(t)$, as a nonhomogeneous Poisson process with mean value function

$$m(t) = a(1 - e^{-bt}) \tag{2.1.22}$$

where the unknown constants $a$ and $b$ represent the expected number of failures eventually discovered and the occurrence rate of an individual error respectively. Thus for any $t \ge 0$

$$\Pr\{N(t) = n \mid a, b\} = \frac{[a(1 - e^{-bt})]^n}{n!} e^{-a(1 - e^{-bt})} = \operatorname{poim}(n : a(1 - e^{-bt})), \quad n = 0, 1, 2, \ldots. \tag{2.1.23}$$
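The Goel-Okumoto mean value function (2.1.22) and the Poisson probabilities (2.1.23) are easy to evaluate directly. The sketch below is illustrative only: the function names `go_mean` and `go_pmf` and the parameter values in the usage note are hypothetical choices, not taken from the chapter.

```python
import math

def go_mean(t, a, b):
    """Mean value function m(t) = a(1 - exp(-b t)) of (2.1.22)."""
    return a * (1.0 - math.exp(-b * t))

def go_pmf(n, t, a, b):
    """Pr{N(t) = n | a, b} of (2.1.23), a Poisson pmf with mean m(t)."""
    m = go_mean(t, a, b)
    if m == 0.0:
        # At t = 0 no failure has occurred with probability one.
        return 1.0 if n == 0 else 0.0
    # Work on the log scale to avoid overflow in m**n and n!.
    return math.exp(n * math.log(m) - m - math.lgamma(n + 1))
```

For instance, with the assumed values `a = 30` and `b = 0.05`, `go_mean(t, 30, 0.05)` rises from 0 toward 30 as `t` grows, and `sum(go_pmf(n, 20.0, 30, 0.05) for n in range(200))` is numerically equal to one.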
From (2.1.23) the distribution for the total error content is $\operatorname{poim}(n : a)$, and the conditional distribution of the number of remaining errors at time $t'$, $\bar{N}(t') = N(\infty) - N(t')$, is

$$\Pr\{\bar{N}(t') = n' \mid N(t') = n, a, b\} = \operatorname{poim}(n' : a e^{-bt'}), \quad n' = 0, 1, 2, \ldots. \tag{2.1.24}$$

The reliability function for the interfailure time $T_i$ is given by

$$R(t_i \mid t'_{i-1}, a, b) = \exp\{-a[e^{-b t'_{i-1}} - e^{-b(t'_{i-1} + t_i)}]\} \tag{2.1.25}$$

where $t'_i = \sum_{j=1}^{i} t_j$ is the time until the $i$th failure. Thus in contrast to (JM2), software interfailure times are not independent. Also note that due to this dependence, the Goel-Okumoto model is of the stochastic type. Estimators of $a$ and $b$ are obtained via the solution of the maximum likelihood equations

$$n/a = 1 - \exp\{-b t'_n\}, \tag{2.1.26a}$$

$$n/b = \sum_{k=1}^{n} t'_k + a t'_n \exp\{-b t'_n\}. \tag{2.1.26b}$$

A $(1 - \alpha)100\%$ confidence region for $a$ and $b$ may be established using the approximation

$$L(\hat{a}, \hat{b} \mid t'_1, \ldots, t'_n) - L(a, b \mid t'_1, \ldots, t'_n) \approx \tfrac{1}{2} \chi^2_{2;\alpha} \tag{2.1.27}$$

where $L$ denotes the log-likelihood. Goel and Okumoto (1980) also discuss the use of the asymptotic normality of $\hat{a}$ and $\hat{b}$ for constructing confidence intervals. Here, model results are based on execution rather than calendar time. This approach represents an extension of the basic model derived in Schneidewind (1975) and is itself extended in Shanthikumar (1981) using a nonhomogeneous Markov process.

A combination of the Musa model and the Goel and Okumoto model is given in Musa and Okumoto (1984). This model combines the use of execution time with the analytical ease of the nonhomogeneous Poisson process. Furthermore, the authors define the failure intensity in such a way as to reflect the fact that errors with larger size are found earlier.
If $\lambda_0$ and $\theta$ are the initial failure intensity and the rate of reduction in the normalized failure intensity per failure, the failure intensity is defined in terms of execution time as

$$\lambda(\tau) = \lambda_0 e^{-\theta m(\tau)} \tag{2.1.28}$$

where $m(\tau)$ is the mean value function for $N(\tau)$. Given the above, the mean value function is given by

$$m(\tau) = \frac{1}{\theta} \log(\lambda_0 \theta \tau + 1) \tag{2.1.29}$$

and the distribution for $N(\tau)$ is given by $\operatorname{poim}(n : (1/\theta) \log(\lambda_0 \theta \tau + 1))$. Expressions analogous to (2.1.22)-(2.1.25) are obtained by substituting $(1/\theta) \log(\lambda_0 \theta \tau + 1)$ for $a(1 - e^{-bt})$. Musa and Okumoto obtain further functions of interest by exploiting the relationship between the time until the $i$th failure, $T'_i$, and the number of failures in a given time. Using this notion

$$P\{T'_i \le \tau\} = \sum_{j=i}^{\infty} \frac{[m(\tau)]^j}{j!} e^{-m(\tau)} \tag{2.1.30}$$

and

$$P\{T'_i \le \tau \mid N(\tau_1) = n_1\} = \sum_{j=i}^{\infty} \frac{[m(\tau) - m(\tau_1)]^{j - n_1}}{(j - n_1)!} e^{-[m(\tau) - m(\tau_1)]} \tag{2.1.31}$$

where $T'_i = \sum_{j=1}^{i} T_j$ is the time of the $i$th software failure.

Maximum likelihood estimation is discussed both for the case where failure times are used and for the case where numbers of failures are used. The complexity of the estimation procedure is reduced by estimating the parameter $\phi = \lambda_0 \theta$ and solving for $\lambda_0$ and $\theta$ by choosing the mean number of failures equal to the number of software failures encountered. When the software is tested for a time $\tau_e$ and $n$ failures are recorded at times $\tau'_1, \ldots, \tau'_n$, $\hat{\phi}$ may be obtained by solving

$$\frac{n}{\hat{\phi}} - \sum_{i=1}^{n} \frac{\tau'_i}{\hat{\phi} \tau'_i + 1} - \frac{n \tau_e}{(\hat{\phi} \tau_e + 1) \log(\hat{\phi} \tau_e + 1)} = 0. \tag{2.1.32}$$

Given $\hat{\phi}$, estimates $\hat{\lambda}_0$ and $\hat{\theta}$ may be obtained by setting $m(\tau_e) = (1/\hat{\theta}) \ln[\hat{\phi} \tau_e + 1] = n$, thus $\hat{\theta} = (1/n) \ln[\hat{\phi} \tau_e + 1]$ and $\hat{\lambda}_0 = \hat{\phi}/\hat{\theta}$. When the software is tested over an interval $[0, x_p]$ and this is partitioned into intervals $(0, x_1], (x_1, x_2], \ldots, (x_{p-1}, x_p]$ with $n'_i$ denoting the number of failures recorded in $(0, x_i]$, $i = 1, \ldots$
, $p$, then the maximum likelihood equation for $\hat{\phi}$ is given as

$$\sum_{i=1}^{p} n_i \, \frac{\dfrac{x_i}{\hat{\phi} x_i + 1} - \dfrac{x_{i-1}}{\hat{\phi} x_{i-1} + 1}}{\log(\hat{\phi} x_i + 1) - \log(\hat{\phi} x_{i-1} + 1)} - \frac{n'_p x_p}{(\hat{\phi} x_p + 1) \log(\hat{\phi} x_p + 1)} = 0 \tag{2.1.33}$$

where $n_i = n'_i - n'_{i-1}$. Again using the same approach as before, $\hat{\theta}$ and $\hat{\lambda}_0$ may be obtained as $\hat{\theta} = (1/n'_p) \log[\hat{\phi} x_p + 1]$ and $\hat{\lambda}_0 = \hat{\phi}/\hat{\theta}$. In the above model, times are in terms of execution time rather than calendar time. The conversion to calendar time follows the developments in Musa (1975).

The Musa (1975) model was also one of the first models to address the notion of imperfect debugging. Goel and Okumoto (1979) suggested the use of a Markov process to model imperfect debugging. Kremer (1982) uses a multidimensional birth-death process to account for imperfect debugging and the introduction of new errors as the result of debugging. Kremer (K) begins by assuming that the failure rate of the software is a product of its fault content and an exposure rate, $h(t)$. To account for imperfect debugging he further assumes

(K) When a failure occurs, the repair effort is instantaneous and results in one of three mutually exclusive outcomes:
(i) the fault content is reduced by 1 with probability $p$;
(ii) the fault content remains unchanged with probability $q$;
(iii) the fault content is increased by 1 with probability $r$.

Thus the author defines a birth-death process with birth rate $r h(t)$ and death rate $p h(t)$. A multidimensional process is defined with $X(t)$ denoting the fault content of the software at time $t$ and $N(t)$ the number of failures to time $t$. Though reliability measures are obtained from $N(t)$, the failure rate of the software is a function of $X(t)$, which is changing in a stochastic manner. Given the initial fault content of $N$, the expected number of faults in the program at time $t$ is

$$E[X(t) \mid N, p, r] = N e^{-\rho(t)} \tag{2.1.34}$$

where $\rho(t) = (p - r) \int_0^t h(u)\,du$, and the expected number of failures by time $t$ is

$$E[N(t) \mid N, p, r] = \begin{cases} \dfrac{N}{p - r}\left[1 - e^{-\rho(t)}\right], & p \ne r, \\[2ex] N \displaystyle\int_0^t h(u)\,du, & p = r. \end{cases} \tag{2.1.35}$$
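The expectations (2.1.34) and (2.1.35) can be evaluated in closed form once $h(t)$ is specified. The sketch below assumes a constant exposure rate $h(t) = h$, which is a simplification introduced here for illustration and not something the chapter imposes; the function names are likewise hypothetical.

```python
import math

def kremer_expected_faults(t, N, p, r, h):
    """E[X(t) | N, p, r] of (2.1.34), assuming constant h(t) = h."""
    rho = (p - r) * h * t            # rho(t) = (p - r) * integral of h
    return N * math.exp(-rho)

def kremer_expected_failures(t, N, p, r, h):
    """E[N(t) | N, p, r] of (2.1.35), assuming constant h(t) = h."""
    if math.isclose(p, r):
        return N * h * t             # the p = r branch of (2.1.35)
    rho = (p - r) * h * t
    return N / (p - r) * (1.0 - math.exp(-rho))
```

Under these assumptions the two quantities satisfy $E[X(t)] = N - (p - r)E[N(t)]$, since each failure changes the fault content by $-(p - r)$ on average; this identity gives a quick consistency check on any implementation.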
Thus in the life of the software (if $p > r$) the expected number of failures will be $N/(p - r)$. Thus $p - r$ is similar to Musa's constant $B$. Given $n$ failures obtained by time $t_0$, conditional expectations may be obtained as

$$E[X(t_0 + t) \mid N, p, r, N(t_0) = n] = [N - (p - r)n] e^{-\rho(t_0, t)} \tag{2.1.36}$$

where $\rho(t_0, t) = \rho(t_0 + t) - \rho(t_0)$. Using (2.1.36) the conditional expectation for the number of failures in $(t_0, t_0 + t]$ is

$$E[N(t_0 + t) - N(t_0) \mid N, p, r, N(t_0) = n] = \begin{cases} \dfrac{N - (p - r)n}{p - r}\left[1 - e^{-\rho(t_0, t)}\right], & \text{if } p \ne r, \\[2ex] N \displaystyle\int_{t_0}^{t_0 + t} h(u)\,du, & \text{if } p = r. \end{cases} \tag{2.1.37}$$

The birth-death differential-difference equations may be solved for $P_m(t) = P\{X(t) = m\}$ as

$$P_0(t) = [\alpha(t)]^N, \tag{2.1.38a}$$

$$P_m(t) = \sum_{j=0}^{\min(N, m)} \binom{N}{j} \binom{N + m - j - 1}{N - 1} (\alpha(t))^{N - j} (\beta(t))^{m - j} (1 - \alpha(t) - \beta(t))^{j} \tag{2.1.38b}$$

where

$$\alpha(t) = 1 - \frac{1}{e^{\rho(t)} + A(t)} \quad \text{and} \quad \beta(t) = 1 - \frac{e^{\rho(t)}}{e^{\rho(t)} + A(t)}$$

and $A(t) = \int_0^t r h(u) e^{\rho(u)}\,du$. From these, the reliability of a program tested for $t_0$ units of time may be obtained as

$$R(t \mid N, p, r) = \sum_{m} P_m(t_0) [S_{t_0}(t)]^m \tag{2.1.39}$$

where $S_{t_0}(t) = \exp\{-\int_{t_0}^{t_0 + t} h(u)\,du\}$ is the reliability attribute of each remaining fault. Given $n$ failures by time $t_0$ the reliability may be expressed as

$$R(t \mid N, p, r, N(t_0) = n) = \sum_{m} \tilde{P}_m(t_0) [S_{t_0}(t)]^{N - m} \tag{2.1.40}$$

where $\tilde{P}_m(t_0) = P\{X(t_0) = N - m \mid N(t_0) = n\}$ and is given by

$$\tilde{P}_m(t_0) = \sum_{\substack{i - k = m \\ i + j + k = n}} \frac{n!}{i!\,j!\,k!}\, p^i q^j r^k.$$

This model is dependent on the parameters $N$, $p$, $q$, $r$ and $h(t)$. Maximum likelihood estimates may be used for $N$, $p$, $q$, $r$ and the parameters of $h(t)$. The amount of data required and the accuracy of the estimates have not been investigated. Estimates of $p$, $q$ and $r$ could be obtained from experience or best prior guesses. The author also suggests a Bayesian approach for estimating $h(t)$, which closely resembles that pursued in Littlewood (1981).

The models of Goel and Okumoto (1979) and Musa and Okumoto (1984) represent a step towards a Bayesian analysis of the problem.
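As an illustration of Kremer's conditional reliability (2.1.40), the sketch below reads it as a mixture over the trinomial outcomes of the $n$ repairs: $i$ faults removed, $j$ unchanged, $k$ added, leaving $N - i + k$ faults. That reading, the constant exposure rate $h$, the restriction to $n \le N$, and the function name are assumptions of this example rather than statements from the chapter.

```python
import math

def kremer_conditional_reliability(t, N, n, p, q, r, h):
    """Reliability over (t0, t0 + t] given n failures by t0, as in (2.1.40).

    Assumes constant exposure rate h and n <= N, so the fault content
    N - i + k stays non-negative for every repair outcome.
    """
    S = math.exp(-h * t)             # per-fault survival S_{t0}(t)
    total = 0.0
    for i in range(n + 1):           # i repairs removed a fault
        for k in range(n - i + 1):   # k repairs added a fault
            j = n - i - k            # j repairs left the content unchanged
            coef = math.factorial(n) // (
                math.factorial(i) * math.factorial(j) * math.factorial(k)
            )
            weight = coef * p**i * q**j * r**k
            total += weight * S ** (N - i + k)
    return total
```

Under the same assumptions the double sum collapses, by the multinomial theorem, to $S^N (p/S + q + rS)^n$, which offers an independent check; with perfect debugging ($p = 1$) it reduces to $S^{N-n}$.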
In Singpurwalla and Kyparisis (1984) a fully Bayesian approach is taken using the nonhomogeneous Poisson process with failure intensity function $\lambda(t) = (\beta/\alpha)(t/\alpha)^{\beta - 1}$ for $t \ge 0$. Due to the resemblance of $\lambda(t)$ to the failure rate function of the Weibull distribution, the model is referred to as the Weibull process. Thus $N(t)$ again is assumed to be a nonhomogeneous Poisson process with mean value function $m(t) = (t/\alpha)^{\beta}$. In the true Bayesian context, uncertainty concerning $\alpha$ and $\beta$ is expressed by their respective prior densities

$$g_0(\alpha) = \frac{1}{\alpha_0}, \quad 0 < \alpha \le \alpha_0, \tag{2.1.41a}$$

$$f_0(\beta) = \frac{\Gamma(k_1 + k_2)}{\Gamma(k_1)\Gamma(k_2)} \frac{(\beta - \beta_1)^{k_1 - 1} (\beta_2 - \beta)^{k_2 - 1}}{(\beta_2 - \beta_1)^{k_1 + k_2 - 1}}, \quad 0 \le \beta_1 < \beta < \beta_2;\ k_1, k_2 > 0. \tag{2.1.41b}$$

For convenience it is assumed that the prior distributions for $\alpha$ and $\beta$ are independent. Posterior inference concerning the number of future failures in an interval or the time until the next failure may be obtained once the posterior distributions of $\alpha$ and $\beta$ are computed. The posterior distribution of $\beta$ is of interest in its own right as it may be used to assess the extent of reliability growth. Reliability growth would be taking place if $\beta \in (0, 1)$; by observing the posterior density one may examine the extent to which this is true.

Posterior analysis is conducted both for the case where only the number of failures per interval is recorded and for the case where the actual failure times are recorded. In both cases the posterior distributions of $\alpha$ and $\beta$ are intractable. An approximation is given for the posterior of $\beta$. Due to the intractability of the posterior distributions of $\alpha$ and $\beta$, posterior inference concerning the number of failures in future intervals and the time to the next failure is conducted numerically via a computer code described in Kyparisis, Soyer and Daryanani (1984).

When only the number failed in each interval is recorded over a period $[0, x_p]$, the posterior distribution of $N_k$, the number of failures in $(x_{k-1}, x_k]$, $k = p + 1, p + 2, p + 3, \ldots$,
is given by

$$\Pr\{N_k = n_k \mid n_1, \ldots, n_p\} = \int_{\alpha = 0}^{\alpha_0} \int_{\beta = \beta_1}^{\beta_2} \frac{[m(x_k) - m(x_{k-1})]^{n_k}}{n_k!} \exp\{-[m(x_k) - m(x_{k-1})]\}\, g_1(\alpha, \beta \mid n_1, \ldots, n_p)\, d\alpha\, d\beta \tag{2.1.42}$$

where $g_1(\alpha, \beta \mid n_1, \ldots, n_p)$ is the joint posterior density of $\alpha$ and $\beta$. The approximate marginal posterior density of $\beta$ is obtained as

$$g_1(\beta \mid n_1, \ldots, n_p) \propto (\beta - \beta_1)^{k_1 - 1} (\beta_2 - \beta)^{k_2 - 1}\, S(\beta)^{1/\beta}\, \frac{\Gamma(n - 1/\beta)}{\beta} \prod_{i=1}^{p} \left[ \frac{x_i^{\beta} - x_{i-1}^{\beta}}{S(\beta)} \right]^{n_i} \tag{2.1.43}$$

where $S(\beta) = \sum_{i=1}^{p} (x_i^{\beta} - x_{i-1}^{\beta})$ and $n = \sum_{i=1}^{p} n_i$. The approximate posterior distribution for $\beta$ is based on the approximation

$$\int_0^{\alpha_0} \alpha^{-n\beta} \exp\left\{ -\frac{S(\beta)}{\alpha^{\beta}} \right\} d\alpha \approx \frac{\Gamma(n - 1/\beta)}{\beta\, S(\beta)^{n - 1/\beta}} \tag{2.1.44}$$

which works well if $\alpha_0 \gg S(\beta)^{1/\beta}$.

When the software is tested over a period $(0, T)$ and failure times $t'_1 \le t'_2 \le \cdots \le t'_n$ are recorded, then the joint posterior distribution of $\alpha$ and $\beta$ is given by

$$g_2(\alpha, \beta \mid t'_1, \ldots, t'_n) \propto (\beta - \beta_1)^{k_1 - 1} (\beta_2 - \beta)^{k_2 - 1} (\beta/\alpha)^n \prod_{i=1}^{n} (t'_i/\alpha)^{\beta - 1} \exp\{-(t'_n/\alpha)^{\beta}\} \tag{2.1.45}$$

and the marginal posterior of $\beta$ is given by

$$g_2(\beta \mid t'_1, \ldots, t'_n) \propto (\beta - \beta_1)^{k_1 - 1} (\beta_2 - \beta)^{k_2 - 1} \beta^{n - 1}\, \Gamma(n - 1/\beta)\, (t'_n)^{1 - n\beta} \prod_{i=1}^{n} (t'_i)^{\beta - 1} \tag{2.1.46}$$

using an approximation similar to (2.1.44), which works well provided $\alpha_0 \gg t'_n$. Posterior inference concerning the number of failures in future intervals may be obtained using (2.1.42) in conjunction with (2.1.45). Posterior inference concerning $Z_k$, the time to the $(n + k)$th failure from $t'_n$, is obtained by noting that given $\alpha$ and $\beta$, the failure times $(t'_1/\alpha)^{\beta}, (t'_2/\alpha)^{\beta}, \ldots$ can be viewed as being generated from a homogeneous Poisson process. The posterior conditional distribution of $Z_k$ given $t'_1, \ldots, t'_n$ is obtained from

$$\Pr\{Z_k \le z \mid t'_1, \ldots, t'_n\} = \int_{\beta_1}^{\beta_2} \int_0^{\alpha_0} \int_0^{v(t'_n, z)} \frac{v^{k - 1} e^{-v}}{(k - 1)!}\, dv\; g_2(\alpha, \beta \mid t'_1, \ldots, t'_n)\, d\alpha\, d\beta \tag{2.1.47}$$

where $v(t'_n, z) = \left( \dfrac{t'_n + z}{\alpha} \right)^{\beta} - \left( \dfrac{t'_n}{\alpha} \right)^{\beta}$.

Littlewood (1980) also initiates a Bayesian approach to error counting, but expresses uncertainty about the software's performance through $\lambda_i$, the failure rate of the software given $i - 1$ failures have occurred.
This Littlewood model embraces the assumptions of the JM model except for (JM1). Arguing that errors with the largest size (and thus greater failure contribution) will be discovered first, Littlewood instead views $\lambda_i = \phi_1 + \phi_2 + \cdots + \phi_{N - i + 1}$, where $\phi_i$ is the failure contribution of the $i$th remaining error. Uncertainty about the $\phi_i$ is expressed via the prior distribution

$$f(\phi) = \frac{\beta^{\alpha} \phi^{\alpha - 1} e^{-\beta\phi}}{\Gamma(\alpha)}, \quad \phi \ge 0, \tag{2.1.48}$$

which is denoted $\phi \sim G(\alpha, \beta)$. Because the uncertainty is the same for all $\phi_i$, $i = 1, \ldots, N$, initially, the prior distributions will all be identical. The failure contribution for an error which has not been observed by the $(i - 1)$st failure is given by $\phi_i \sim G(\alpha, \beta + t'_{i-1})$ where, as usual, $t'_{i-1}$ is the time of the $(i - 1)$st failure. Thus the uncertainty about the failure rate of the software after the $(i - 1)$st failure is expressed via $\lambda_i \sim G((N - i + 1)\alpha, \beta + t'_{i-1})$. The reliability function of $T_i$ may be expressed as

$$R(t_i \mid \alpha, \beta) = \left[ \frac{\beta + t'_{i-1}}{\beta + t'_{i-1} + t_i} \right]^{(N - i + 1)\alpha}, \tag{2.1.49}$$

a Pareto distribution. Unlike the exponential distribution, the Pareto distribution permits the possibility of very large error free intervals. Also it is interesting to note that the failure rate function given by

$$\lambda(t_i) = (N - i + 1)\alpha / (\beta + t'_{i-1} + t_i) \tag{2.1.50}$$

displays a decreasing failure rate, and this property can be shown to be independent of the prior distribution for the $\phi_i$.

Littlewood discusses the use of (2.1.48) and (2.1.49) in determining other reliability measures. The author suggests the use of maximum likelihood estimation (similar to that used in the JM model) in order to obtain estimates of $N$, $\alpha$ and $\beta$. A purely Bayesian approach would determine the parameters from elicited prior information.

All models thus far have ignored the time required to find and correct software errors.
While this keeps the model derivation simple, it may not be adequate and does not enable the measurement of an important reliability parameter, availability. Shooman and Trivedi (1976) introduced the use of the Markov process to account for the time to find and correct software bugs in large software systems. The thrust of this analysis is to estimate availability rather than reliability. In Kim, Kim, and Park (1982) (KKP) this model is developed and extended. As with the JM model it is assumed that the failure rate of the software is directly proportional to the number of errors and that each error contributes an equal amount to the failure rate. To account for the debugging process the following additional assumption is made

(KKP) When a failure occurs, errors are corrected perfectly with rate $\mu_0$, or are corrected but with the addition of a new error with rate $\mu_1$.

Given the above assumptions, the solutions of the differential-difference equations for $p_n(t) = P\{N(t) = n\}$ when the computer is up, and $q_n(t) = P\{N(t) = n\}$ when the computer is down, are given by

$$p_N(t) = \frac{A_N + \mu_0 + \mu_1}{A_N - B_N} e^{A_N t} + \frac{B_N + \mu_0 + \mu_1}{B_N - A_N} e^{B_N t}, \tag{2.1.51a}$$

$$p_{N-k}(t) = \frac{(\phi\mu_0)^k N!}{(N - k)!} \sum_{j=0}^{k} \left\{ \frac{(A_{N-j} + \mu_0 + \mu_1) e^{A_{N-j} t}}{\prod_{i=0, i \ne j}^{k} (A_{N-j} - A_{N-i}) \prod_{i=0}^{k} (A_{N-j} - B_{N-i})} + \frac{(B_{N-j} + \mu_0 + \mu_1) e^{B_{N-j} t}}{\prod_{i=0, i \ne j}^{k} (B_{N-j} - B_{N-i}) \prod_{i=0}^{k} (B_{N-j} - A_{N-i})} \right\}, \quad k = 1, \ldots, N, \tag{2.1.51b}$$

and

$$q_{N-k}(t) = \frac{\phi (\phi\mu_0)^k N!}{(N - k - 1)!} \sum_{j=0}^{k} \left\{ \frac{e^{A_{N-j} t} - e^{-(\mu_0 + \mu_1) t}}{\prod_{i=0, i \ne j}^{k} (A_{N-j} - A_{N-i}) \prod_{i=0}^{k} (A_{N-j} - B_{N-i})} + \frac{e^{B_{N-j} t} - e^{-(\mu_0 + \mu_1) t}}{\prod_{i=0, i \ne j}^{k} (B_{N-j} - B_{N-i}) \prod_{i=0}^{k} (B_{N-j} - A_{N-i})} \right\}, \quad k = 0, 1, \ldots, N - 1, \tag{2.1.52}$$

where

$$A_{N-k},\ B_{N-k} = \tfrac{1}{2} \left\{ -[\mu_0 + \mu_1 + (N - k)\phi] \pm \sqrt{[\mu_0 + \mu_1 + (N - k)\phi]^2 - 4 (N - k) \phi \mu_0} \right\}. \tag{2.1.53}$$

Once estimates of $N$, $\phi$, $\mu_0$ and $\mu_1$ are obtained, the availability of the system is given by $\sum_{k=0}^{N} p_{N-k}(t)$. The authors specify no means for estimating the parameters; however, $N$ and $\phi$ could be estimated using methods applied to the JM model, while $\mu_0$ and $\mu_1$ could be estimated from past experience or from correction times.

2.2.
Non-error counting models

Non-error counting models are not designed to provide estimates of the number of residual errors; they provide estimates only of the effects of the residual errors on software reliability. Deterministic models are represented by the Halden Project model (Dahil and Lahti, 1978) and a modification of the JM model called the Jelinski-Moranda Geometric De-Eutrophication model, presented in Moranda (1975, 1979). This model was designed to handle the case where groups of errors are removed at one time, but can also be used to account for the case where larger size errors are removed first, as in Littlewood (1980) and Musa and Okumoto (1984). The model assumes that $\lambda_i = D k^{i-1}$, where $D$ is the initial detection rate and $k$ is the ratio of the $i$th failure rate to the $(i-1)$st. These parameters may be estimated from the maximum likelihood equations

$$\frac{\sum_{i=1}^{n} i k^i t_i}{\sum_{i=1}^{n} k^i t_i} = \frac{n+1}{2}, \tag{2.2.1a}$$

$$D = \frac{kn}{\sum_{i=1}^{n} k^i t_i}. \tag{2.2.1b}$$

Moranda also suggests using this formulation in conjunction with the nonhomogeneous Poisson process. Sukert (1977) generalizes the model to include more than one failure per debugging interval.

In Littlewood and Verrall (1973) a stochastic Bayesian model is presented. In this approach the authors attempt to model the debugging behavior of the programmer or programmers involved. As each error is encountered, it is the intent of the programmer to correct the error and thus increase the reliability of the software. Though this is always the intent, it is not always achieved. Often new errors are created which reduce the reliability of the software. To model this situation in a Bayesian context, Littlewood and Verrall suggest expressing the uncertainty about $\lambda_i$ by assuming a priori that $\lambda_i \sim G(\alpha, \psi(i))$, where $\psi(i)$ is an increasing function indicating the complexity of the program and the quality of the programmer.
Defining $\psi(i)$ as an increasing function of $i$ incorporates the assumption that the programmer's intent is always to improve the software's reliability, since $\psi(i) > \psi(i-1)$ implies that

$$P\{\lambda_i < l\} \geq P\{\lambda_{i-1} < l\} \tag{2.2.2}$$

for $l > 0$. The above implies that the $\lambda_i$ are stochastically ordered. Combining the usual assumption that, given $\lambda_i$, the variables $T_i$, $i = 1, \ldots, n$, are independent exponential random variables with the prior distributions for $\lambda_i$, the posterior reliability for $T_i$ can be obtained as

$$R(t_i) = \left[\frac{\psi(i)}{\psi(i) + t_i}\right]^{\alpha}, \tag{2.2.3}$$

which is a Pareto distribution. The authors suggest trying several parametric families for $\psi(i)$, notably $\psi(i) = \beta_0 + \beta_1 i$ and $\psi(i) = \beta_0 + \beta_1 i^2$. The authors do discuss the possibility of using a prior distribution for $\alpha$, but Littlewood (1980) suggests maximum likelihood estimation for the model parameters, thus making this model a hybrid approach.

Further analysis along the lines of modeling the stochastic ordering of the $\lambda_i$ is pursued in Ramamoorthy and Bastani (1980). Specifically, these models are referred to as the mixed gamma model and the stochastic input domain model.

Bayesian time series analysis is used to assess software reliability growth and other reliability parameters in Horigome, Singpurwalla and Soyer (1984) (HSS) and Singpurwalla and Soyer (1985). The authors assume a power law relationship between $T_i$ and $T_{i-1}$, where $T_i$ is defined as the failure time at the $i$th testing stage (note that if a testing stage consists of testing to the first system failure, then $T_i$ is as previously defined). The relationship assumed is

$$T_i = T_{i-1}^{\theta_i}\, \delta_i, \tag{2.2.4}$$

where $\theta_i$ reflects the effects of the changes made as a result of the $(i-1)$st stage of testing and $\delta_i$ is an error term to account for uncertainty. Note that reliability growth will have taken place as a result of changes made in the $(i-1)$st stage of testing if $\theta_i > 1$; $\theta_i = 1$ indicates no improvement and $\theta_i < 1$ indicates reliability decay.
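As a small numerical illustration of the power law relationship, the sketch below iterates the recursion both on the original scale and on the logarithmic scale, where it becomes linear. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from the cited papers.

```python
import math


def failure_times(t0, thetas, deltas):
    """Iterate T_i = T_{i-1} ** theta_i * delta_i starting from T_0 = t0."""
    times = [t0]
    for theta, delta in zip(thetas, deltas):
        times.append(times[-1] ** theta * delta)
    return times


def log_failure_times(y0, thetas, epsilons):
    """Iterate the log-scale form Y_i = theta_i * Y_{i-1} + eps_i."""
    ys = [y0]
    for theta, eps in zip(thetas, epsilons):
        ys.append(theta * ys[-1] + eps)
    return ys


# Hypothetical inputs: a starting failure time above 1, growth/decay
# coefficients theta_i, and multiplicative error terms delta_i.
t0 = 2.0
thetas = [1.2, 1.1, 0.9, 1.3]
deltas = [1.05, 0.95, 1.10, 1.00]

times = failure_times(t0, thetas, deltas)
logs = log_failure_times(math.log(t0), thetas, [math.log(d) for d in deltas])

for i, (t, y) in enumerate(zip(times, logs)):
    print(f"stage {i}: T = {t:.4f}, log T = {y:.4f}")
```

The two recursions agree stage by stage, which is why inference for this model is usually carried out on the logarithms of the failure times.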
The model is developed based on the following assumptions:

(HSS1) The variables $T_i$, $i = 1, \ldots, n$, are lognormally distributed with $T_i \geq 1$ assumed for all $i$, so that $\theta_i > 1$ corresponds to growth.
(HSS2) The values $\delta_i$, $i = 1, \ldots, n$, are lognormally distributed with known parameters 0 and $\sigma_1^2$.
(HSS3) The quantities $\theta_i$, $i = 1, \ldots, n$, are exchangeable and are distributed according to some distribution $G$ with density $g$.

Taking the logarithm of both sides of (2.2.4) yields

$$Y_i = \theta_i Y_{i-1} + \varepsilon_i, \tag{2.2.5}$$

where $Y_i = \log T_i$ and $\varepsilon_i = \log \delta_i$ are normally distributed, the latter with mean 0 and variance $\sigma_1^2$. The sequence $\{Y_i\}$ is thus given by a first order autoregressive process with a random coefficient $\theta_i$. By assuming further that $\theta_i \sim N(\lambda, \sigma_2^2)$ where $\sigma_2^2$ is known and $\lambda \sim N(\mu, \sigma_3^2)$ with $\mu$ and $\sigma_3^2$ known, the following posterior results are obtained.

(i) $(\lambda \mid y_1, \ldots, y_n) \sim N(\mu_n, \sigma_n^2)$ with

$$\mu_n = \sigma_n^2 \left( \frac{\mu}{\sigma_3^2} + \sum_{i=1}^{n} \frac{y_i y_{i-1}}{W_{i-1}} \right), \qquad \sigma_n^2 = \left( \frac{1}{\sigma_3^2} + \sum_{i=1}^{n} \frac{y_{i-1}^2}{W_{i-1}} \right)^{-1}, \qquad W_{i-1} = \sigma_2^2 y_{i-1}^2 + \sigma_1^2;$$

(ii) $(\theta_n \mid y_1, \ldots, y_n) \sim N\left( \dfrac{\sigma_1^2 \mu_n + \sigma_2^2 y_n y_{n-1}}{W_{n-1}},\ \dfrac{\sigma_1^2 (W_{n-1} \sigma_2^2 + \sigma_1^2 \sigma_n^2)}{W_{n-1}^2} \right)$;

(iii) $(Y_{n+1} \mid y_1, \ldots, y_n) \sim N(\mu_n y_n,\ \sigma_n^2 y_n^2 + W_n)$;

(iv) $(\theta_{n+1} \mid y_1, \ldots, y_n) \sim N(\mu_n,\ \sigma_n^2 + \sigma_2^2)$.

Note that $\sigma_2^2$ reflects the views about the consistency of policies regarding modifications and design changes made. Using the above, posterior inference can be obtained for any relevant quantity. For example, Bayes probability intervals can be constructed for the next failure time, or reliability growth at each stage can be assessed by plotting $E[\theta_i \mid y_1, \ldots, y_i]$ vs. $i$. Overall, reliability growth can be examined via $E[\lambda \mid y_1, \ldots, y_i]$, $i = 1, \ldots, n$. In Singpurwalla and Soyer (1985) this basic model is extended by assuming various dependence structures for the sequence $\{\theta_i\}$. Three additional models are developed using the structure of the Kalman Filter Model.

2.3. Model unification

Though highly criticized, the JM model remains central to the topic of software reliability.
Langberg and Singpurwalla (1985) provide an alternative motivation for the JM model using shock models. Stefanski (1982) provides another motivation for the JM model using renewal theoretic arguments. Both works allude to the centrality of the model. Langberg and Singpurwalla further provide a unification of software reliability models by illustrating that many other well known models, such as Littlewood-Verrall (1973) and Goel and Okumoto (1979), can be obtained by specifying prior distributions for the parameters of the JM model. Extensions to the basic Bayes model and a discussion of the use of posterior modes as point estimates are given in Jewell (1985).

3. Models of the validation phase

When a decision is made to stop testing the software (see Forman and Singpurwalla, 1977; Okumoto and Goel, 1979, 1980; Krten and Levy, 1980; Shanthikumar and Tufekci, 1981, 1983; Koch and Kubat, 1983; Chow and Schechner, 1985, for decision criteria), the software enters the validation phase. In this phase the software undergoes intensive testing in its operational environment with the goal of obtaining some measurement of its reliability. Software errors are not corrected in this phase and, in fact, a software failure could result in the rejection of the software.

Nelson (1978) introduced a simple reliability estimate based on probabilistic laws. Letting $\varepsilon_r$ denote the size of the remaining errors in the program and noting that errors are not removed, the number of runs until a software failure is a geometric random variable with parameter $\varepsilon_r$. Thus the maximum likelihood estimate of $\varepsilon_r$ can be used to determine an estimate of reliability. This is given as

$$\hat{R} = 1 - n_f / n, \tag{3.1}$$

where $n$ is the total number of sample runs and $n_f$ is the number of sample runs which ended in failure. The above model suffers from several drawbacks (Ramamoorthy and Bastani, 1982) stemming from its simplicity.
(1) A large number of sample runs is required to obtain meaningful estimates.
(2) The model is based on the assumption that inputs are randomly selected from the input domain and thus does not consider the correlation of runs from adjacent segments of the input domain.
(3) The model does not consider any measure of complexity of the program.

Extensions to the basic model have attempted to reduce the number of sample runs by specifying equivalence classes for the input domain (Nelson, 1978; Ramamoorthy and Bastani, 1979). This goal is achieved at the cost of an increase in model complexity.

Crow and Singpurwalla (1984) address the issue of correlation of inputs using a Fourier series model. The authors observe that in many cases software failures occur in clusters and thus the usual assumption that the times between failures are independent may not be valid. Rather, they assume that the time between failures is given by

$$T_i = f(i) + \varepsilon_i, \tag{3.2}$$

where $\varepsilon_i$ is a disturbance term with mean 0 and constant variance and $f(i)$ is some cyclical trend. To identify the cyclical pattern (if any) with which failures occur, the authors fit the Fourier series model

$$f(i) = \alpha_0 + \sum_{j=1}^{q} \left[ \alpha(k_j) \cos\frac{2\pi}{n} k_j i + \beta(k_j) \sin\frac{2\pi}{n} k_j i \right], \tag{3.3}$$

where $n$ (the number of observed times between failures) is assumed odd, $q = (n-1)/2$ and $k_j = j$, $j = 1, \ldots, q$. Using the method of least squares the model parameters are obtained as

$$\hat{\alpha}_0 = \frac{1}{n} \sum_{i=1}^{n} t_i, \tag{3.4a}$$

$$\hat{\alpha}(k_j) = \frac{2}{n} \sum_{i=1}^{n} t_i \cos\frac{2\pi}{n} k_j i, \quad j = 1, \ldots, q, \tag{3.4b}$$

$$\hat{\beta}(k_j) = \frac{2}{n} \sum_{i=1}^{n} t_i \sin\frac{2\pi}{n} k_j i, \quad j = 1, \ldots, q. \tag{3.4c}$$

The spectrogram is used to identify the period of the series, and thus the clustering behavior. A parsimonious model may also be obtained by using only those weights $\hat{\alpha}(k_j)$ and $\hat{\beta}(k_j)$ for which $\rho^2(k_j) = \hat{\alpha}^2(k_j) + \hat{\beta}^2(k_j)$ is large. This model was applied to three sets of failure data from each of two software systems.
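The least squares computation for the Fourier series model can be sketched in a few lines. The function names and the seven data values below are hypothetical and serve only as an illustration; they are not the data analyzed by Crow and Singpurwalla. Because the model with $q = (n-1)/2$ harmonics has exactly $n$ coefficients, the full fit reproduces the observations, and it is the relative size of the weights $\rho^2(k_j)$ that carries the information about cycles.

```python
import math


def fourier_coefficients(times):
    """Least squares estimates of alpha_0, alpha(k_j), beta(k_j) for odd n."""
    n = len(times)
    if n % 2 == 0:
        raise ValueError("the model assumes an odd number of observations")
    q = (n - 1) // 2
    alpha0 = sum(times) / n
    alpha, beta = [], []
    for k in range(1, q + 1):
        # Observations are indexed i = 1, ..., n as in the text.
        alpha.append(2.0 / n * sum(
            t * math.cos(2.0 * math.pi * k * i / n)
            for i, t in enumerate(times, start=1)))
        beta.append(2.0 / n * sum(
            t * math.sin(2.0 * math.pi * k * i / n)
            for i, t in enumerate(times, start=1)))
    return alpha0, alpha, beta


def fitted_value(alpha0, alpha, beta, i, n):
    """Evaluate the fitted cyclical trend f(i)."""
    total = alpha0
    for k, (a, b) in enumerate(zip(alpha, beta), start=1):
        angle = 2.0 * math.pi * k * i / n
        total += a * math.cos(angle) + b * math.sin(angle)
    return total


# Hypothetical times between failures (n = 7, so q = 3).
data = [5.0, 1.0, 1.5, 6.0, 0.5, 2.0, 7.0]
alpha0, alpha, beta = fourier_coefficients(data)

# Squared amplitudes; large values point to dominant cycles.
rho_squared = [a * a + b * b for a, b in zip(alpha, beta)]
fitted = [fitted_value(alpha0, alpha, beta, i, len(data))
          for i in range(1, len(data) + 1)]
print(rho_squared)
```

A parsimonious fit in the sense described above would keep only the harmonics whose entry in `rho_squared` is large and drop the rest.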
The model was found to adequately represent the failure behavior. One potential problem of the model is that, because the weights $\alpha(k_j)$ and $\beta(k_j)$ multiply trigonometric functions, negative values of $f(i)$ may be produced. When such is the case, the authors interpret this as an indication of a very small time between failures. Though the intent of the authors in this paper is data analysis, the model can be used to predict future times between failures and future failure clusters. Also, by specifying a distributional form for $\varepsilon_i$ (such as the usual normal assumption), inference can be made.

4. Models of the operational phase

Models in this phase are used to illustrate the behavior of the software in its operating environment. Both Littlewood (1979) and Cheung (1980) obtain the software reliability by assuming the software program is divided into modules. Cheung suggests a combination of deterministic properties of the structure of the software with the stochastic properties of module failure behavior, via a Markov process. He assumes:

(C1) Reliabilities of the modules are independent.
(C2) Transfer of control among program modules is a Markov process.
(C3) The program begins and ends with a single module, denoted $N_1$ and $N_n$ respectively.

The state space is divided into $N_1, \ldots, N_n, C, F$, where the $N_i$ are the modules, $C$ indicates successful completion, and $F$ indicates an encountered failure. States $C$ and $F$ are absorbing. Transition probabilities from $N_i$ to $N_j$ ($i \neq j$) are given by $R_i p_{ij}$, where $R_i$ is the reliability of module $i$ and $p_{ij}$ is the usual transition probability from module $i$ to module $j$. The transition probability from $N_i$ to $F$ is $1 - R_i$ and the transition probability from $N_n$ to $C$ is given by $R_n$. Thus the reliability of the software is obtained as the probability of being absorbed into state $C$ given that the initial state is $N_1$.
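The absorption probability just described can be computed by solving a small linear system. The sketch below is a minimal illustration under stated assumptions: the three-module program, its module reliabilities, and its transfer probabilities are hypothetical, and the helper names `solve_linear_system` and `absorption_probabilities` are introduced here for illustration only. For each module $N_i$ it finds the probability $x_i$ of eventually reaching $C$, using $x_i = \sum_j R_i p_{ij} x_j$ for $i < n$ and $x_n = R_n$.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def absorption_probabilities(reliabilities, transfer):
    """Probability of reaching state C from each module N_1, ..., N_n."""
    n = len(reliabilities)
    # Q holds the transitions among modules: Q[i][j] = R_i * p_ij.
    q = [[reliabilities[i] * transfer[i][j] for j in range(n)]
         for i in range(n)]
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)]
                 for i in range(n)]
    # Only the final module can move to C, with probability R_n.
    rhs = [0.0] * (n - 1) + [reliabilities[-1]]
    return solve_linear_system(i_minus_q, rhs)


# Hypothetical three-module program; module 3 is the terminal module.
R = [0.99, 0.98, 0.97]
P = [[0.0, 0.6, 0.4],
     [0.2, 0.0, 0.8],
     [0.0, 0.0, 0.0]]

x = absorption_probabilities(R, P)
software_reliability = x[0]  # start in module N_1
print(round(software_reliability, 6))
```

Solving the system directly avoids forming a matrix inverse explicitly; the first component of the solution is the probability of absorption into $C$ when the program starts in $N_1$.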
The reliability is obtained as

$$R = S(1, n) R_n, \tag{4.1}$$

where $S(i, j)$ is the $(i, j)$th entry in the matrix $S = (I - Q)^{-1}$ and $Q$ is the transition matrix of the process with the rows and columns of $C$ and $F$ deleted. The module reliabilities $R_i$ may be determined before system integration by the techniques of Section 2 or 3. Transition probabilities may be estimated by running test cases. Cheung further discusses the use of this model in determining testing strategies and the expected error cost of the software. The latter may be used in place of system reliability in determining the acceptance of the software.

Littlewood (1979) assumes a semi-Markov process and takes into account the time spent in each module. The model further incorporates two sources of failure: within-module failure with rate $\nu_i$, $i = 1, \ldots, n$, and failure associated with the transfer from module $i$ to module $j$, which is given with rate $\lambda_{ij}$ ($i \neq j$). Assuming that these individual failure rates are small in comparison to the switching rates between modules, Littlewood states that the failure point process of the integrated program is asymptotically a Poisson process with rate parameter

$$\frac{\sum_{i,j} \Pi_i p_{ij} (\mu_{ij} \nu_i + \lambda_{ij})}{\sum_{i,j} \Pi_i p_{ij} \mu_{ij}}. \tag{4.2}$$

In the above, $\Pi = (\Pi_1, \ldots, \Pi_n)$ is the equilibrium vector of the imbedded Markov chain, and $\mu_{ij}$ is the expected sojourn time in module $i$ before transferring to module $j$. An estimate of overall program availability is given as

$$\frac{\sum_{i,j} \Pi_i p_{ij} \mu_{ij}}{\sum_{i,j} \Pi_i p_{ij} [\mu_{ij} + \mu_{ij} \nu_i m_i + \lambda_{ij} m_{ij}]}, \tag{4.3}$$

where $m_i$ and $m_{ij}$ are the expected downtimes due to failure in module $i$ and due to transfer from module $i$ to module $j$, respectively. As with Cheung's model, individual module failure rates can be obtained before interfacing takes place, and all other parameter values may be estimated from test cases or experience with similar programs. Estimation of expected costs of failures is also discussed by Littlewood.

5.
Closing comments

Though there is a large body of literature on software reliability (see Schick and Wolverton, 1978; Ramamoorthy and Bastani, 1982; Shanthikumar, 1983), several issues remain. First, there is a lack of models for the validation, operational and maintenance phases of the software. Additional models are needed to address such issues as software design and testing criteria for release of software. Furthermore, the vast number of models for the testing and development phase has left the user somewhat confused. Criteria for comparison and selection of software models need to be developed, as is done initially in Musa and Okumoto (1982), Keiller, Littlewood, Miller and Sofer (1982), Iannino, Musa, Okumoto and Littlewood (1984), and Singpurwalla and Soyer (1985).

References

Amster, S. J. and Shooman, M. L. (1975). Software reliability: An overview. In: E. Barlow, J. B. Fussell and N. D. Singpurwalla, eds., Reliability and Fault Tree Analysis: Theoretical and Applied Aspects of System Reliability and Safety Assessment. SIAM, Philadelphia, PA, 455-485.
Angus, J. E., Schafer, R. E. and Sukert, A. (1980). Software reliability model validation. Proceedings of the 1980 Annual Reliability and Maintainability Symposium, 191-198.
Barlow, R. E. and Singpurwalla, N. D. (1985). Assessing the reliability of computer software and computer networks: An opportunity for partnership with computer scientists. The American Statistician 39, 88-94.
Cheung, R. C. (1980). A user-oriented software reliability model. IEEE Transactions on Software Engineering 6, 118-125.
Chow, C. and Schechner, Z. (1985). On simple statistical stopping rules for software debugging processes. Technical Report, Columbia University.
Crow, L. H. and Singpurwalla, N. D. (1984). An empirically developed Fourier series model for describing software failures. IEEE Transactions on Reliability 33, 176-183.
Dahil, D. and Lahti, J.
(1978). Investigation of methods for production and verification of computer programs with high requirements for reliability. OECD Halden Reactor Project Preliminary Report.
Forman, E. H. and Singpurwalla, N. D. (1977). An empirical stopping rule for debugging and testing computer software. Journal of the American Statistical Association 72, 750-757.
Forman, E. H. and Singpurwalla, N. D. (1979). Optimal time intervals for testing hypotheses on computer software errors. IEEE Transactions on Reliability 28, 250-253.
Goel, A. L. (1980). Software error detection model with application. The Journal of Systems and Software 1, 243-249.
Goel, A. L. (1980). A summary of the discussion on 'An analysis of competing software reliability models'. IEEE Transactions on Software Engineering 6, 501-502.
Goel, A. L. and Okumoto, K. (1979). Time-dependent error-detection rate model for software reliability and other performance measures. IEEE Transactions on Reliability 28, 206-211.
Goel, A. L. and Okumoto, K. (1979). A Markovian model for reliability and other performance measures. Proceedings of the National Computer Conference, 769-774.
Horigome, M., Singpurwalla, N. D. and Soyer, R. (1984). A Bayes empirical Bayes approach for (software) reliability growth. In: L. Bilard, ed., Computer Science and Statistics: Proceedings of the 16th Symposium on the Interface. North-Holland, Amsterdam, 45-56.
Iannino, A., Musa, J. D., Okumoto, K. and Littlewood, B. (1984). Criteria for software reliability model comparisons. IEEE Transactions on Software Engineering 10, 687-691.
Jelinski, Z. and Moranda, P. (1972). Software reliability research. In: W. Freiberger, ed., Statistical Computer Performance Evaluation. Academic Press, New York, 465-484.
Jewell, W. S. (1985). Bayesian extensions to a basic model of software reliability. Technical Report, Operations Research Center, University of California in Berkeley.
Joe and Reid (1983). Estimating the number of faults in a system. Submitted to JASA.
Keiller, P. A., Littlewood, B., Miller, D. R. and Sofer, A. (1982). On the quality of software reliability prediction. In: J. K. Skwirzynski, ed., Electronic Systems Effectiveness and Life Cycle Costing. Springer, New York, 441-460.
Kim, J. H., Kim, Y. H. and Park, C. J. (1982). A modified Markov model for the estimation of computer software performance. Operations Research Letters 1, 253-257.
Koch, H. S. and Kubat, P. (1983). Optimal release time of computer software. IEEE Transactions on Software Engineering 9, 323-327.
Kremer, W. (1983). Birth-death and bug counting. IEEE Transactions on Reliability 32, 37-46.
Krten, O. J. and Levy, J. (1980). Software modeling from optimal field energy. Proceedings of the Annual Reliability and Maintainability Symposium, 410-414.
Kyparisis, J. and Singpurwalla, N. D. (1984). Bayesian inference for the Weibull process. In: L. Bilard, ed., Computer Science and Statistics: Proceedings of the 16th Symposium on the Interface. North-Holland, Amsterdam, 57-64.
Kyparisis, J., Soyer, R. and Daryanani, S. (1984). Computer programs for inference from the Weibull process. Institute for Reliability and Risk Analysis Technical Report, The George Washington University, Washington, DC.
Langberg, N. and Singpurwalla, N. D. (1985). Unification of some software reliability models via the Bayesian approach. SIAM Journal on Scientific and Statistical Computing 6, 781-790.
Lipow, M. (1974). Some variations of a model for software time-to-failure. Correspondence ML-742260.1, TRW Systems Group.
Littlewood, B. (1979). How to measure software reliability and how not to. IEEE Transactions on Reliability 28, 103-110.
Littlewood, B. (1979). Software reliability model for modular program structure. IEEE Transactions on Reliability 28, 241-246.
Littlewood, B. (1980). The Littlewood-Verrall model for software reliability compared with some rivals. The Journal of Systems and Software 1, 251-258.
Littlewood, B. (1980).
Theories of software reliability: How good are they and how can they be improved. IEEE Transactions on Software Engineering 6, 489-500.
Littlewood, B. (1981). A critique of the Jelinski-Moranda model for software reliability. Proceedings of the 1981 Annual Reliability and Maintainability Symposium, 357-362.
Littlewood, B. (1981). Stochastic reliability growth: a model for fault-removal in computer-programs and hardware design. IEEE Transactions on Reliability 30, 313-320.
Littlewood, B. and Verrall, J. L. (1973). A Bayesian reliability growth model for computer software. Applied Statistics 22, 332-346.
Littlewood, B. and Verrall, J. L. (1981). Likelihood function of a debugging model for computer software reliability. IEEE Transactions on Reliability 30, 145-148.
Meinhold, R. J. and Singpurwalla, N. D. (1983). Bayesian analysis of a commonly used model for describing software failures. The American Statistician 32, 168-173.
Moranda, P. B. (1975). Prediction of software reliability during debugging. Proceedings of the 1975 Annual Reliability and Maintainability Symposium, 327-332.
Moranda, P. B. (1979). Event-altered rate models for general reliability analysis. IEEE Transactions on Reliability 28, 376-381.
Musa, J. D. (1975). A theory of software reliability and its application. IEEE Transactions on Software Engineering 1, 312-327.
Musa, J. D. (1979). Validity of execution-time theory of software reliability. IEEE Transactions on Reliability 28, 181-191.
Musa, J. D. and Okumoto, K. (1982). Software reliability models: Concepts, classification, comparisons, and practice. In: J. K. Skwirzynski, ed., Electronic Systems Effectiveness and Life Cycle Costing. Springer, New York, 395-423.
Musa, J. D. and Okumoto, K. (1984). A logarithmic Poisson execution time model for software reliability measurement. Proceedings of the 1984 Reliability and Maintainability Symposium.
Nelson, E. (1978). Estimating software reliability from test data. Microelectron. Reliab. 17, 67-74.
Okumoto, K. and Goel, A. L. (1979). Optimal release time for software systems. Proceedings of COMPSAC, 500-503.
Okumoto, K. and Goel, A. L. (1980). Optimal release time for software systems based on reliability and cost criteria. Journal of Systems and Software 1, 315-318.
Petroski, C. M. (1984). A survey of software reliability. Student Report, The George Washington University.
Ramamoorthy, C. V. and Bastani, F. B. (1979). An input domain based approach to the quantitative estimation of software reliability. Proceedings of the Taipei Seminar on Software Engineering, Taipei, Taiwan.
Ramamoorthy, C. V. and Bastani, F. B. (1980). Modeling the software reliability growth process. Proceedings of COMPSAC, Chicago, IL, 161-169.
Ramamoorthy, C. V. and Bastani, F. B. (1982). Software reliability-status and perspectives. IEEE Transactions on Software Engineering 8, 354-371.
Schick, G. J. and Wolverton, R. W. (1978). An analysis of competing software reliability models. IEEE Transactions on Software Engineering 4, 104-120.
Schick, G. J. and Wolverton, R. W. (1973). Assessment of software reliability. Proceedings Operations Research, Physica, Werzberg-Wein, 395-422.
Schneidewind, N. F. (1975). An analysis of computer processes in computer software. Proceedings of the International Conference on Reliable Software, 337-346.
Shanthikumar, J. G. (1981). A general software reliability model for performance prediction. Microelectron. Reliab. 27, 671-682.
Shanthikumar, J. G. (1983). Software reliability models: A review. Microelectron. Reliab. 23, 903-943.
Shanthikumar, J. G. and Tufekci, S. (1981). Optimal release time using generalized decision trees. Proceedings of the Fourteenth Annual Hawaii International Conference on System Sciences, 58-65.
Shanthikumar, J. G. and Tufekci, S. (1983). Application of a software reliability model to describe software release time. Microelectron. Reliab. 23, 41-59.
Shooman, M. L. (1972).
Probabilistic models for software reliability prediction. In: W. Freiberger, ed., Statistical Computer Performance Evaluation. Academic Press, New York, 485-502.
Shooman, M. L. (1973). Operational testing and software reliability estimation during program development. Record of the 1973 IEEE Symposium on Computer Software Reliability, 51-57.
Shooman, M. L. (1975). Software reliability: Measurement and models. Proceedings of the 1975 Annual Reliability and Maintainability Symposium, 485-489.
Shooman, M. L. and Trivedi, A. K. (1976). A many state Markov model for computer software performance parameters. IEEE Transactions on Reliability 25, 66-68.
Singpurwalla, N. D. and Soyer, R. (1985). Assessing (software) reliability growth using a random coefficient autoregressive process and its ramifications. To appear in IEEE Transactions on Software Engineering.
Sukert, A. N. (1977). An investigation of software reliability models. Proceedings of the Annual Reliability and Maintainability Symposium, 78-84.
Sukert, A. N. (1979). Empirical validation of three software prediction models. IEEE Transactions on Reliability 28, 199-205.
Stefanski, L. A. (1982). An application of renewal theory to software reliability. Proceedings of the Twenty-Seventh Conference on the Design of Experiments in Army Research Development Testing. ARO Report 82-2, 101-118.
Wagoner, W. L. (1973). The final report on a software reliability measurement study. Report TOR0074-(41221)-1, The Aerospace Corp., El Segundo, CA.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 99-111

6

Dependence Notions in Reliability Theory

Narasinga R. Chaganty and Kumar Joag-dev

1. Introduction

The concepts of stochastic dependence play an important role in many statistical applications.
Although in reliability theory it is rare that new dependence concepts are created, the well known concepts such as Markov dependence, total positivity, stochastic monotonicity and some others related to positive dependence are quite important. The study of their significance and relevance in reliability theory is the main object of the present chapter. The definitions and some immediate consequences of the concepts which we use in the following have already appeared in two articles in the Handbook: Boland and Proschan (this Volume, Chapter 10), and Joag-dev (see Vol. 4, Chapter 4). We briefly review these for the sake of completeness.

Part I

The first part of our study will consist of the effects of dependence on the classification of life distributions according to the properties of aging. Most of these concepts originate in the bivariate case and, due to its importance and simplicity, we will study this case in more detail. The major sources for the material covered in this part are the articles by Freund (1961), Harris (1970), Brindley and Thompson (1972), Shaked (1977) and the book by Barlow and Proschan (1981).

1.1. Definitions

Let $(X, Y)$ be a pair of real valued random variables defined on a fixed probability space. The joint distribution function and the marginals of $(X, Y)$ will be denoted by $F_{X,Y}$, $F_X$ and $F_Y$, and the corresponding density functions by $f_{X,Y}$, $f_X$, $f_Y$ respectively. We write $I(A)$ for the indicator of an event $A$.

Many of the concepts of positive and negative dependence can be defined in terms of conditions on covariances of functions restricted to certain classes. Thus the conditions

(a) $\mathrm{Cov}[X, Y] \geq 0$,
(b) $\mathrm{Cov}[g_1(X), h_1(Y)] \geq 0$, where $g_1$ and $h_1$ are nondecreasing, (1.1)
(c) $\mathrm{Cov}[g_2(X, Y), h_2(X, Y)] \geq 0$, where $g_2$ and $h_2$ are co-ordinatewise nondecreasing,

define successively (strictly) stronger positive dependence conditions.
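Conditions (a) and (b) can be checked exactly for a distribution on a small finite support. The sketch below is only an illustration: the joint probability mass function on $\{0, 1, 2\}^2$ and the helper names are hypothetical. On a finite support every nondecreasing function is a constant plus a nonnegative combination of threshold indicators $I(x > c)$, so by bilinearity of the covariance it suffices to verify (b) for such indicators.

```python
from fractions import Fraction


def expectation(pmf, func):
    """Expected value of func(x, y) under a joint pmf {(x, y): prob}."""
    return sum(prob * func(x, y) for (x, y), prob in pmf.items())


def covariance(pmf, g, h):
    """Cov[g(X), h(Y)] computed exactly from the joint pmf."""
    mean_gh = expectation(pmf, lambda x, y: g(x) * h(y))
    mean_g = expectation(pmf, lambda x, y: g(x))
    mean_h = expectation(pmf, lambda x, y: h(y))
    return mean_gh - mean_g * mean_h


# Hypothetical joint pmf with extra mass on the diagonal x == y.
support = [0, 1, 2]
pmf = {(x, y): Fraction(1, 5) if x == y else Fraction(1, 15)
       for x in support for y in support}

# Condition (a): covariance of X and Y themselves.
cov_xy = covariance(pmf, lambda x: x, lambda y: y)

# Condition (b), checked on threshold indicators I(x > c), I(y > d).
thresholds = [0, 1]
indicator_covs = {
    (c, d): covariance(pmf,
                       lambda x, c=c: int(x > c),
                       lambda y, d=d: int(y > d))
    for c in thresholds for d in thresholds
}

print(cov_xy)
print(indicator_covs)
```

Exact rational arithmetic is used so that the sign of each covariance is not affected by rounding.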
Condition (b) is known as positive quadrant dependence (PQD); it can be seen to be equivalent to

(b') $\mathrm{Cov}[I(X > x), I(Y > y)] \geq 0$.

Condition (c) is known as association. A condition stronger than (c), known as positive regression dependence, is obtained by requiring

(d) $E[f_1(X) \mid Y = y]$ to be nondecreasing in $y$, for every nondecreasing function $f_1$.

Note that this condition is non-symmetric. A condition known as 'monotone likelihood ratio' or 'totally positive of order 2 (TP$_2$)' is even stronger and is given by

(e) $f_{X,Y}(x_2, y_2) f_{X,Y}(x_1, y_1) \geq f_{X,Y}(x_2, y_1) f_{X,Y}(x_1, y_2)$ for $x_2 > x_1$ and $y_2 > y_1$.

Some of the concepts above have multivariate analogs. We mention some of these. Corresponding to PQD, two non-equivalent multivariate generalizations can be described. The first one is called 'positive upper orthant dependence' (PUOD) and the second one is labeled 'positive lower orthant dependence' (PLOD). These are defined by the conditions

$$P[X_i \geq x_i,\ i = 1, \ldots, k] \geq \prod_{i=1}^{k} P[X_i \geq x_i] \quad \text{for every } x = (x_1, \ldots, x_k) \in \mathbb{R}^k, \tag{1.2}$$

$$P[X_i \leq x_i,\ i = 1, \ldots, k] \geq \prod_{i=1}^{k} P[X_i \leq x_i] \quad \text{for every } x \in \mathbb{R}^k. \tag{1.3}$$

The condition of 'association' for $X = (X_1, \ldots, X_k)$, which is stronger than PUOD and PLOD, is given by

$$\mathrm{Cov}[g_k(X), h_k(X)] \geq 0, \tag{1.4}$$

for every pair of co-ordinatewise nondecreasing functions $g_k, h_k : \mathbb{R}^k \to \mathbb{R}$. This condition was first introduced and studied by Esary, Proschan and Walkup (1967) (see Boland and Proschan's article in this volume). A version of regression dependence similar to (d) above would be to require, for every $i = 1, \ldots, k$,

$$E[f_1(X_i) \mid X_j = x_j,\ j = 1, \ldots, (i-1)] \tag{1.5}$$

to be nondecreasing in each $x_j$, for every nondecreasing $f_1$. This is sometimes known as 'positive regression dependence in sequence'. It can be shown that this implies association. The property of association is important for obtaining bounds on the survival probabilities of coherent systems.
For example, for a series system whose component lives are denoted by $T_i$, the system life is $\min_i T_i$ and association provides the bound

$$P[\min_i T_i > t] \geq \prod_i P[T_i > t]. \tag{1.6}$$

A similar bound can be obtained for a parallel system. These two bounds can be combined to obtain bounds for a general coherent system.

An analog of the TP$_2$ dependence given in (e) above is obtained by imposing this condition on every pair of the arguments of the joint density in $\mathbb{R}^k$, while the other arguments are kept fixed. This condition, known as MTP$_2$, implies association (see for example Barlow and Proschan, 1981).

Finally, some of these conditions, with appropriate changes, may be used to define negative dependence. For example, see Block, Savits and Shaked (1982) and Joag-dev and Proschan (1983). The components of a vector $X = (X_1, X_2, \ldots, X_k)$ are negatively associated if for every nonempty subset $A$ of $\{1, 2, \ldots, k\}$ and every pair of co-ordinatewise nondecreasing functions $g$ and $h$, $\mathrm{Cov}(g(X_A), h(X_{\bar{A}}))$ is nonpositive, where $\bar{A}$ denotes the complement of $A$. Negative dependence is relevant in systems defined in closed environments. For example, a given number of species competing in an ecosystem with a fixed amount of resources may have their life lengths negatively associated.

1.2. Dependence and aging classification

We adopt the usual notation. A life distribution function $F$ is said to be increasing failure rate (IFR) if the ratio $r(x) = f(x)/\bar{F}(x)$ is nondecreasing in $x$. We say $F$ is decreasing failure rate (DFR) if $r(x)$ is nonincreasing in $x$. Here $f$ is the density corresponding to $F$ and $\bar{F} = 1 - F$ is the survival function. The function $r(x)$ is known as the failure rate. The distribution function $F$ is said to be increasing failure rate on the average (IFRA) if $[\bar{F}(x)]^{1/x}$ is nonincreasing in $x \geq 0$, and $F$ is new better than used (NBU) if $\bar{F}(x + y) \leq \bar{F}(x)\bar{F}(y)$ for all $x, y \geq 0$. Let $X$, $Y$ be the life-lengths of two components.
We examine some dependence relations which have interpretations in terms of failure rates. First note that the IFR property is equivalent to having $\bar{F}$ log concave. Thus the conditional failure rate $r(x \mid Y = y)$ being increasing in $x$ for every $y$ is equivalent to having the conditional survival function $\bar{F}_c(x \mid y)$ log concave in $x$ for every $y$. Suppose now that $r(x \mid y)$ is decreasing in $y$ for every $x$, in addition to the conditional IFR property. This would imply

$$-\frac{\partial}{\partial y} r(x \mid y) = \frac{\partial^2}{\partial y\, \partial x} \left( \log \bar{F}_c(x \mid y) \right) \geq 0, \tag{1.7}$$

or equivalently that $\bar{F}_c(x \mid Y = y)$, considered as a function of $x$ and $y$, is TP$_2$. Note that if the joint density $f_{X,Y}(x, y)$ is TP$_2$, so is the conditional density $f_c(x \mid y)$. Moreover, this implies that $\bar{F}_c(x \mid y)$ is TP$_2$. This is analogous to the univariate case, where log-concavity of the density implies IFR.

Another quantity of interest is the 'mean residual life', $m(x)$, which is the conditional expectation of the remaining life at age $x$. This is given by

$$m(x) = \int_x^{\infty} (t - x) f(t)\, dt \Big/ \bar{F}(x) = \int_x^{\infty} \bar{F}(t)\, dt \Big/ \bar{F}(x). \tag{1.8}$$

The life distribution $F$ is said to be increasing mean residual life (IMRL) if $m(x)$ is increasing in $x \geq 0$. We say $F$ is decreasing mean residual life (DMRL) if $m(x)$ is decreasing in $x \geq 0$. To obtain the monotone behaviour of the conditional mean residual life $m_c(x \mid y)$, it can be shown that it suffices to have

$$h(x, y) = \int_x^{\infty} (t - x) f_c(t \mid y)\, dt \tag{1.9}$$

be TP$_2$. Again it can be shown that this condition is weaker than that needed for the monotonicity of $r(x \mid y)$. These results and some extensions were derived by Shaked (1977). In the same article, Shaked (1977) also introduced the concept of dependence by total positivity (DTP) for bivariate distributions. Recently Lee (1985a) generalized the DTP concepts to the multivariate case and obtained a number of inequalities and monotonicity properties of the conditional hazard rate and mean residual life functions of some multivariate distributions satisfying the DTP property.
In a subsequent paper Lee (1985b) introduced the concept of dependence by reverse regular (DRR) rule, which is the mirror image of DTP, and studied the relationship of DRR with other concepts of negative dependence.

Harris (1970) defined the IHR (increasing hazard rate) property for a multivariate distribution by requiring

(a) $\bar F(x + t\mathbf{1})/\bar F(x)$ nonincreasing in $x$, and
(b) $P[X > u \mid X > x]$ nondecreasing in $x$ for every fixed vector $u$. (1.10)

The geometric interpretation of (b) has prompted its name 'right corner set increasing' (RCSI). Condition (a) is clearly a 'wear out' condition while, as we shall see, (b) describes positive dependence. Brindley and Thompson (1972) studied the class of distributions where only (a) is satisfied. In order to distinguish between these two classes based on the aging property, the one satisfying (a) is called IFR, while the subclass with the additional requirement of (b) is called IHR (H is for hazard or Harris!). In both cases the classes can be seen to be closed under (a) taking subsets, (b) unions of independent sets of variables, and (c) taking minimums over subsets. Note that the importance of minimums stems from their role in series systems. Both definitions, when restricted to the univariate case, yield the usual IFR distribution. For the univariate case (b) is trivially satisfied.

To see that RCSI implies positive dependence, let $K$ and $M$ be arbitrary subsets (not necessarily disjoint) of $\{1, 2, \ldots, n\}$. Denoting the appropriate subvectors by $x_K$, $x_M$, etc., it can be seen readily that (1.10b) implies that

$$P[X_M > u_M \mid X_K > x_K] \tag{1.11}$$

is a coordinatewise nondecreasing function of $x_K$, for every fixed $u_M$. Repeated application of condition (1.11) with a singleton $K$ yields

$$\bar F(x) \ge \prod_{i=1}^{n} \bar F_i(x_i), \tag{1.12}$$

which is PUOD.

It would be worthwhile to mention examples of distributions where the above dependency concepts are manifested in a natural way.
If the components are independent, then most of the conditions are trivially satisfied, and hence we consider those having dependent components. Let $U$, $X_1$, $X_2$ be independent random variables. Consider $Y_1 = \min(U, X_1)$, $Y_2 = \min(U, X_2)$. Such functions determine the life of a system where the component corresponding to $U$ is connected in series. These functions are also important when $U$ represents the arrival time of a shock which disables the components corresponding to $X_1$, $X_2$. This model, when $U$, $X_1$, $X_2$ each have an exponential distribution, was studied by Marshall and Olkin (1967). They also studied its multivariate analog where different shocks disable $2, 3, \ldots, n$ components. It should be noted that the property of association is preserved due to the fact that the minimum of random variables is a coordinatewise increasing function.

Gumbel (1960) discussed a simple bivariate model whose survival function is given by

$$\bar G(x, y) = \exp(-x - y - bxy), \qquad x, y \ge 0, \tag{1.13}$$

where $0 \le b \le 1$. It is clear that the marginals are exponential and, since $(X, Y)$ has negative regression dependence, the model is appropriate only when the two variables have such dependence.

Freund (1961) describes a bivariate model of a two component system where the joint survival function is the same as that of two independent exponentially distributed random variables with failure rates $\alpha$ and $\beta$, as long as both components have not failed. Upon failure of one item the failure rate of the other component is changed to $\alpha'$ or to $\beta'$, respectively. The joint survival probability function can be written as

$$\bar F(x, y) = \begin{cases} e^{-(\alpha+\beta)x}\left[\dfrac{\beta - \beta'}{\alpha + \beta - \beta'}\, e^{-(\alpha+\beta)(y-x)} + \dfrac{\alpha}{\alpha + \beta - \beta'}\, e^{-\beta'(y-x)}\right], & x \le y, \\[2ex] e^{-(\alpha+\beta)y}\left[\dfrac{\alpha - \alpha'}{\alpha + \beta - \alpha'}\, e^{-(\alpha+\beta)(x-y)} + \dfrac{\beta}{\alpha + \beta - \alpha'}\, e^{-\alpha'(x-y)}\right], & y \le x. \end{cases} \tag{1.14}$$

The marginal distributions are not exponential but are certain mixtures of exponentials, and the nature of the dependence is determined by the relative magnitudes of the parameters. In fact,

$$\bar F_1(x) = \frac{\alpha - \alpha'}{\alpha + \beta - \alpha'}\, e^{-(\alpha+\beta)x} + \frac{\beta}{\alpha + \beta - \alpha'}\, e^{-\alpha' x} \tag{1.15}$$

and

$$\bar F_2(y) = \frac{\beta - \beta'}{\alpha + \beta - \beta'}\, e^{-(\alpha+\beta)y} + \frac{\alpha}{\alpha + \beta - \beta'}\, e^{-\beta' y}. \tag{1.16}$$

It is easy to verify that $F_1(x)$ is IFR if and only if $\alpha < \alpha'$ and $F_2(y)$ is IFR if and only if $\beta < \beta'$.

Part II

The second part of our study deals with dependence concepts relevant to the models which consider repair and replacement of the components of a system. These dependence concepts arise from the study of the theory of stochastic processes. Some of the classical types of stochastic processes characterized by different dependence relationships are Markov processes, renewal processes and Markov renewal processes. The latter includes the previous two as special cases. Dependence relations such as total positivity, association and stochastic monotonicity, studied in Part I, occur naturally among these processes. Needless to say, the vast number of results in the study of the above processes have wide applications in reliability theory. In the next few sections we shall examine some of these processes and their applicability in characterizing the failure rate of the life distributions of systems, as well as in obtaining bounds on some other quantities of interest in reliability theory.

The organization of this part is as follows. In Section 2.1, we define totally positive Markov processes and discuss some useful theorems related to these processes. A concept weaker than total positivity is stochastic monotonicity, that is, all totally positive Markov processes are stochastically monotone but not vice versa. This is discussed in Section 2.2. Many of the models in reliability theory which consider replacement of items as they fail can be delineated by a renewal process.
The renewal function is defined as the expected number of items replaced up to a given instant of time. We can obtain lower and upper bounds for the renewal function when the life distribution of the items is assumed to be in one of the reliability classes of life distributions. These results are discussed in the last section, Section 2.3.

2.1. Totally positive Markov processes

DEFINITION 1. A stochastic process $\{X_t, t \in [0, \infty)\}$ is said to be a Markov process with state space $S$ if for any $t, s \ge 0$ and $j$ in $S$,

$$P[X_{t+s} = j \mid X_u;\ u \le t] = P[X_{t+s} = j \mid X_t]. \tag{2.1}$$

The Markov process is said to be a time-homogeneous Markov process when the conditional probability

$$P[X_{t+s} = j \mid X_t = i] = P_s(i, j) \tag{2.2}$$

is independent of $t \ge 0$, for all $i, j$ in $S$ and $s \ge 0$. The collection of matrices $P_t = (P_t(i, j))$, $t > 0$, is simply called the transition function of the Markov process.

DEFINITION 2. A Markov process with transition function $P_t$ is said to be totally positive (TP) if, for all $i_1 < i_2 < \cdots < i_n$ and $j_1 < j_2 < \cdots < j_n$, the determinant

$$P_t\begin{pmatrix} i_1, \ldots, i_n \\ j_1, \ldots, j_n \end{pmatrix} = \begin{vmatrix} P_t(i_1, j_1) & \cdots & P_t(i_1, j_n) \\ \vdots & & \vdots \\ P_t(i_n, j_1) & \cdots & P_t(i_n, j_n) \end{vmatrix} \tag{2.3}$$

is strictly positive when $t > 0$, for all $n \ge 1$. If (2.3) holds for $n \le r$, we say that the Markov process is totally positive of order $r$ ($TP_r$).

When the state space $S$ is a countable set and the parameter set is the set of integers, the Markov process is known as a Markov chain. The Markov chain is said to be time-homogeneous if the transition function $P_n$ is independent of $n$, in which case we simply write $P$. The Markov chain is totally positive if $P$ satisfies condition (2.3). Karlin and McGregor (1959a, b) have shown that several Markov chains and Markov processes are indeed totally positive, the most prominent one being the birth and death process. An excellent treatise on totally positive Markov chains and totally positive Markov processes, together with applications in several domains of mathematics, including reliability theory, is given in Karlin (1964).
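As a small illustration of Definition 2 for a Markov chain, the following sketch checks the second-order minors of a one-step transition matrix. The function name `is_tp2` and the example matrices are hypothetical, chosen only for this illustration. By default the sketch tests for nonnegative minors, a weaker convention than the strict positivity required in (2.3); this is an assumption made because a one-step birth and death matrix contains zero entries, and strict positivity can be requested with `strict=True`.

```python
from itertools import combinations


def is_tp2(P, strict=False, tol=1e-12):
    """Check every 2x2 minor of the matrix P (a list of rows).

    With strict=False the minors must be nonnegative (an assumed, weaker
    convention); with strict=True they must be strictly positive, as in (2.3).
    """
    n_rows, n_cols = len(P), len(P[0])
    for i1, i2 in combinations(range(n_rows), 2):      # i1 < i2
        for j1, j2 in combinations(range(n_cols), 2):  # j1 < j2
            minor = P[i1][j1] * P[i2][j2] - P[i1][j2] * P[i2][j1]
            if minor < -tol or (strict and minor <= 0.0):
                return False
    return True


# Hypothetical birth and death chain on the states 0, 1, 2, 3.
birth_death = [
    [0.7, 0.3, 0.0, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.0, 0.2, 0.5, 0.3],
    [0.0, 0.0, 0.2, 0.8],
]
# A chain that tends to jump between the extremes fails the condition.
flip_flop = [[0.1, 0.9], [0.9, 0.1]]

print(is_tp2(birth_death), is_tp2(flip_flop))
```

Checking higher-order minors would follow the same pattern with larger index subsets; only order two is shown to keep the sketch short.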
Typical of the results of Karlin (1964) are the following theorems regarding inheritance of the TP character.

THEOREM 3. Let the transition matrix $P$ of a Markov chain $\{X_k, k \ge 0\}$ be $TP_r$. Define, for $i > j$,

$$Q(n, i) = P[\,j < X_k \le i,\ 1 \le k \le n - 1,\ X_n = j \mid X_0 = i\,]. \tag{2.4}$$

Then $Q(n, i)$ is $TP_r$ for $n \ge 0$ and $i > j$.

The TP property is also prevalent when the initial state of the Markov chain is fixed. We state this in the theorem below.

THEOREM 4. Assume the hypothesis of Theorem 3. Define, for $j > i$,

$$Q_1(n, j) = P[\,i \le X_k < j,\ 1 \le k \le n - 1,\ X_n = j \mid X_0 = i\,]. \tag{2.5}$$

Then $Q_1$ is $TP_r$ in the variables $n \ge 0$ and $j > i$.

Theorem 4 was used by Brown and Chaganty (1983) to show that the first passage time distribution from an initial state to a higher state in a birth and death process is IFR. This result was also obtained by Keilson (1979) and by Derman, Ross and Schechner (1979) using other methods. Another application of Theorem 4 is given by Assaf, Shaked and Shanthikumar (1985). They have shown that the time to failure of some systems which are subject to shocks and damages, which are not necessarily nonnegative, is IFR.

2.2. Stochastic monotonicity in Markov processes

A useful notion weaker than total positivity is stochastic monotonicity. This concept was introduced by Kalmykov (1962) and later was discussed in detail by Veinott (1965), Daley (1968), O'Brien (1972) and Kirstein (1976). A detailed study of stochastic monotonicity in Markov processes can be found in the book by Keilson (1979). Stochastic monotonicity is a structural property of the Markov process. The random variables in such processes are associated, and this connection gives rise to many interesting inequalities in reliability theory. We define below stochastic monotonicity for Markov chains and then extend the definition to Markov processes.

DEFINITION 5. A Markov chain $\{X_k, k \ge 0\}$ is said to be stochastically monotone if $X_{k+1}$ given $X_k = i$ is stochastically larger than $X_{k+1}$ given $X_k = j$, for all $k \ge 0$ and $i > j$.

The extension of the stochastic monotone property to continuous time Markov processes is straightforward.

DEFINITION 6. A time-homogeneous Markov process $\{X_t, t \ge 0\}$ is said to be stochastically monotone if $X_t$ given $X_0 = x_1$ is stochastically larger than $X_t$ given $X_0 = x_2$, for all $t > 0$ and $x_1 > x_2$.

Numerous Markov processes are indeed stochastically monotone. These include Markov diffusion processes. More generally, the class of totally positive Markov processes is a proper subset of the class of stochastically monotone Markov processes. Stochastically monotone Markov chains with partially ordered state spaces were introduced by Kamae, Krengel and O'Brien (1977), and their applications to problems in reliability theory were studied by Brown and Chaganty (1983). We discuss these after introducing some notation.

Let $S$ be a countable set with a partial ordering denoted by $\ge$. A subset $C$ of $S$ is said to be an increasing set if $i$ belongs to $C$ and $j \ge i$ implies $j$ is in $C$. A time-homogeneous Markov chain $\{X_n, n \ge 0\}$ with state space $S$ is said to be stochastically monotone if, for $j \ge i$, the transition probability from $j$ to $C$ is at least as large as that from $i$ to $C$, for all increasing sets $C$. The Markov chain is said to have monotone paths if $P(X_{n+1} \ge X_n) = 1$ for all $n \ge 0$. The following theorem characterizes the class of IFRA distributions in terms of stochastically monotone Markov chains.

THEOREM 7. Let $S$ be a partially ordered countable set. Let $\{X_n, n \ge 0\}$ be a stochastically monotone Markov chain with monotone paths and state space $S$. Let $C$ be an increasing subset of $S$ with finite complement. Then the first passage time from state $i$ to the set $C$ is IFRA.
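For a chain on the totally ordered states $0, 1, \ldots, m$, the stochastic monotonicity of Definition 5 amounts to every upper tail sum of a row of the transition matrix being nondecreasing in the starting state. The sketch below checks this for a finite matrix; the function name and the example matrices are hypothetical and serve only as an illustration.

```python
def is_stochastically_monotone(P, tol=1e-12):
    """Check that row i+1 of P is stochastically larger than row i.

    Comparing consecutive rows is enough because the stochastic order
    is transitive. States are assumed to be labelled 0, 1, ..., m in
    increasing order.
    """
    n = len(P)
    for i in range(n - 1):
        tail_low = 0.0   # P(next state >= k | current state i)
        tail_high = 0.0  # P(next state >= k | current state i + 1)
        for k in range(n - 1, -1, -1):
            tail_low += P[i][k]
            tail_high += P[i + 1][k]
            if tail_high < tail_low - tol:
                return False
    return True


# Hypothetical deterioration chain: from a worse state, a worse next state is more likely.
wear_chain = [
    [0.7, 0.3, 0.0, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.0, 0.2, 0.5, 0.3],
    [0.0, 0.0, 0.2, 0.8],
]
# Counterexample: starting low makes a high next state more likely.
alternating = [[0.1, 0.9], [0.9, 0.1]]

print(is_stochastically_monotone(wear_chain), is_stochastically_monotone(alternating))
```

On a partially ordered state space the same idea applies with increasing sets in place of upper tails, which this sketch does not attempt.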
Shaked and Shanthikumar (1984) generalized the above theorem by removing the restriction that the complement of $C$ is finite. As a converse to Theorem 7 we have the following result.

THEOREM 8. Every IFRA distribution in discrete time is either the first passage time distribution to an increasing set for a stochastically monotone Markov chain with monotone paths on a partially ordered finite set, or the limit of a sequence of such distributions.

Analogous theorems in the continuous time frame also hold. The above theorems were used by Brown and Chaganty (1983) to show that the convolution of two IFRA distributions is IFRA. Various other applications of the above theorems to shock models in reliability theory and to sampling with and without replacement can also be found in Brown and Chaganty (1983).

Stochastically monotone Markov chains also take an important place in obtaining optimum control limit rules. The following formulation is due to Derman (1963). Suppose that a system is inspected at regular intervals of time and that after each inspection it is classified into one of $(m + 1)$ states denoted by $0, 1, 2, \ldots, m$. A control limit rule $l$ simply says: replace the system if the observed state is one of the states $k, k + 1, \ldots, m$ for some predetermined state $k$. The state $k$ is called the control limit of $l$. Let $X_n$ denote the observed state of the system in use at time $n \ge 0$. We assume that $\{X_n, n \ge 0\}$ is a stationary Markov chain. Let $c(j)$ denote the cost incurred when the system is in state $j$. Let $L$ denote the class of all possible control limit rules. For $l \in L$, the asymptotic expected average cost is defined as $A(l) = \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} c(X_t)$. The following theorem was proved by Derman (1963).

THEOREM 9. Let the Markov chain $\{X_n, n \ge 0\}$ be stochastically monotone. Then there exists a control limit rule $l^*$ such that

$$A(l^*) = \min_{l \in L} A(l). \tag{2.6}$$

2.3.
Renewal theory in reliability

Let $\{X_i, i \ge 1\}$ be a sequence of nonnegative, independent and identically distributed random variables. Let $S_n = X_1 + \cdots + X_n$ be the $n$th partial sum and let $N_t$ be the maximum value of $n$ for which $S_n \le t$. In the context of reliability theory we can think of the $X_i$'s as the life times of items being replaced. The partial sum $S_n$ represents the time at which the $n$th renewal takes place and $N_t$ is the number of renewals that will have occurred by time $t$. The dependent process $\{N_t, t \ge 0\}$ is known as a renewal process. The aim of renewal theory is to derive properties of certain random variables associated with $N_t$ from the knowledge of the distribution function $F$ of $X_i$. In this section we shall discuss the important results when the underlying distribution $F$ is assumed to belong to one of the reliability classes of life distributions. For an extensive study of the general theory of renewal processes we refer the reader to the expository article by Smith (1958) and to the books by Cox (1962), Feller (1966) and Karlin and Taylor (1975).

The renewal function $M(t) = E[N_t]$ plays a central role in reliability, especially in maintenance models. It is useful to get bounds on $M(t)$ for finite $t$, since in most cases computing $M(t)$ may be difficult. One such bound is given by $M(t) \ge t/\mu_1 - 1$, where $\mu_1$ is the mean of $F$. Under the additional assumption that $F$ is IFR, Obretenov (1974) obtained the following sharper bound:

$$M(t) \ge \frac{t}{\mu_1} + \frac{\delta}{\mu_1} - 1, \tag{2.7}$$

where $\delta = \lim_{n \to \infty} \mu_{n+1}/((n + 1)\mu_n)$ and $\mu_n = E(X^n)$. Barlow and Proschan (1964), while studying replacement policies when the life distribution of the unit is IFR, obtained the following lower and upper bounds for the renewal random variable $N_t$.

THEOREM 10. Let $R(t) = -\log \bar F(t)$. If $F$ is IFR with mean $\mu_1$, then

(a) $P(N_t \ge n) \le \sum_{j=n}^{\infty} \dfrac{(t/\mu_1)^j}{j!} \exp(-t/\mu_1)$ for $0 \le t < \mu_1$,

(b) $P(N_t \ge n) \ge \sum_{j=n}^{\infty} \dfrac{(nR(t/n))^j}{j!} \exp(-nR(t/n))$ for $t \ge 0$, $n \ge 1$.

Under weaker conditions on $F$ we have the following theorem.

THEOREM 11. Let $R(t) = -\log \bar F(t)$. If $F$ is NBU with finite mean, then

(a) $P(N_t \ge n) \le \sum_{j=n}^{\infty} \dfrac{(R(t))^j}{j!} \exp(-R(t))$,

(b) $M(h) \le M(t + h) - M(t)$,

(c) $\operatorname{Var}(N_t) \le M(t)$,

for $t \ge 0$, $h \ge 0$, $n \ge 1$.

The reverse inequalities in the above theorem are valid for $F$ new worse than used (NWU), that is, $\bar F(x + y) \ge \bar F(x)\bar F(y)$ for all $x, y \ge 0$.

In a two paper series Brown (1980, 1981) obtained nice properties for the renewal function $M(t)$ when the underlying distribution $F$ is assumed to be DFR or IMRL. Let $Z(t) = S_{N(t)+1} - t$ denote the forward recurrence time at time $t$ and $A(t) = t - S_{N(t)}$ the renewal age at $t$. The following theorem can be found in Brown (1980, 1981).

THEOREM 12. (a) If the underlying distribution $F$ of the renewal process is DFR, then the renewal density $M'(t)$ exists on $(0, \infty)$ and is decreasing, that is, $M(t)$ is concave. Furthermore, $Z(t)$ and $A(t)$ are both stochastically increasing in $t \ge 0$.
(b) If $F$ is IMRL then $M(t) - t/\mu_1$ is increasing in $t \ge 0$ and $E[\phi(Z(t))]$ is increasing in $t \ge 0$ for increasing convex functions $\phi$.

In the case where $F$ is IMRL, Brown (1981) provides counterexamples to show that $Z(t)$ is not necessarily stochastically increasing, $E[A(t)]$ is not necessarily increasing and $M(t)$ need not be concave. An example of Berman (1978) shows that the analogous results do not hold for IFR and DMRL distributions.

As an application of Theorem 12, Brown (1980) obtained sharp bounds for the renewal function $M(t)$ for $F$ IMRL, with improved bounds for $F$ DFR. These results are given in the next theorem.

THEOREM 13. Let $\mu_n = E(X^n)$, $n \ge 1$. Let $U(t) = t/\mu_1 + \mu_2/(2\mu_1^2)$. Let $\mu_{k+2}$ be finite for some $k \ge 0$. If $F$ is IMRL then

$$U(t) \ge M(t) \ge U(t) - \min_{0 \le i \le k} d_i t^{-i}, \tag{2.8}$$

where the constant $d_i$ is a simple function of $\mu_1, \ldots, \mu_{i+2}$. Furthermore, if $F$ is DFR then

$$U(t) \ge M(t) \ge U(t) - \min_{0 \le i \le k} \alpha_i d_i t^{-i}, \tag{2.9}$$

where $\alpha_0 = 1$ and $\alpha_i = (i/(i + 1))^i$ for $i \ge 1$.

Marshall and Proschan (1972) obtained the following characterization of the NBU class of life distributions in terms of the renewal process $N_t$.

THEOREM 14. The distribution function $F$ is NBU (NWU) if and only if $N(s + t) \ge (\le)\ N(s) * N(t)$ for all $s, t \ge 0$, where $*$ denotes the convolution operation.

Esary, Marshall and Proschan (1973) established the following IFRA property for the renewal process, while studying some shock models.

THEOREM 15. Let $\{N_t, t \ge 0\}$ be a renewal process. Then $P[N_t \ge k]^{1/k}$ is decreasing in $k \ge 1$, that is, $N_t$ possesses the discrete IFRA property.

References

Assaf, D., Shaked, M. and Shanthikumar, J. G. (1985). First passage times with PF_r densities. Journal of Appl. Prob. 22, 185-196.
Barlow, R. E. and Proschan, F. (1964). Comparison of replacement policies, and renewal theory implications. Ann. Math. Statist. 35, 577-589.
Barlow, R. E. and Proschan, F. (1981). Statistical Theory of Reliability and Life Testing. To Begin With, Silver Spring, Maryland.
Berman, M. (1978). Regenerative multivariate point processes. Adv. Appl. Probability 10, 411-430.
Block, H. W., Savits, T. H. and Shaked, M. (1982). Some concepts of negative dependence. Ann. of Probability 10, 765-772.
Brindley, E. C. Jr. and Thompson, W. A. Jr. (1972). Dependence and aging aspects of multivariate survival. Journal of Amer. Stat. Assoc. 67, 822-830.
Brown, M. (1980). Bounds, inequalities, and monotonicity properties for some specialized renewal processes. Ann. of Probability 8, 227-240.
Brown, M. (1981). Further monotonicity properties for specialized renewal processes. Ann. of Probability 9, 891-895.
Brown, M. and Chaganty, N. R. (1983).
On the first passage time distribution for a class of Markov Chains. Ann. of Probability 11, 1000-1008.
Cox, D. R. (1962). Renewal Theory. Methuen, London.
Daley, D. J. (1968). Stochastically monotone Markov chains. Z. Wahrsch. verw. Gebiete 10, 305-317.
Derman, C. (1963). On optimal replacement rules when changes of state are Markovian. In: Richard Bellman, ed., Mathematical Optimization Techniques. Univ. of California Press, 201-210.
Derman, C., Ross, S. M. and Schechner, Z. (1979). A note on first passage times in birth and death and negative diffusion processes. Unpublished manuscript.
Esary, J. D., Marshall, A. W. and Proschan, F. (1973). Shock models and wear processes. Ann. of Probability 1, 627-649.
Esary, J. D., Proschan, F. and Walkup, D. W. (1967). Association of random variables, with applications. Ann. Math. Stat. 38, 1466-1474.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications, Vol. II. Wiley, New York.
Freund, J. E. (1961). A bivariate extension of the exponential distribution. Journal of Amer. Stat. Assoc. 56, 971-977.
Gumbel, E. J. (1960). Bivariate exponential distributions. Journal of Amer. Stat. Assoc. 55, 698-707.
Harris, R. (1970). A multivariate definition for increasing hazard rate distribution functions. Ann. Math. Statist. 41, 713-717.
Joag-dev, K. and Proschan, F. (1983). Negative association of random variables with applications. Ann. Statist. 11, 286-295.
Karlin, S. (1964). Total positivity, absorption probabilities and applications. Trans. Amer. Math. Soc. 111, 33-107.
Karlin, S. and McGregor, J. (1959a). Coincidence properties of birth and death processes. Pacific Journal of Math. 9, 1109-1140.
Karlin, S. and McGregor, J. (1959b). Coincidence probabilities. Pacific Journal of Math. 9, 1141-1164.
Karlin, S. and Taylor, H. M. (1975). A First Course in Stochastic Processes, 2nd edition. Academic Press, New York.
Kalmykov, G. I. (1962).
On the partial ordering of one-dimensional Markov processes. Theor. Prob. Appl. 7, 456-459.
Kamae, T., Krengel, U. and O'Brien, G. C. (1977). Stochastic inequalities on partially ordered spaces. Ann. of Probability 5, 899-912.
Keilson, J. (1979). Markov Chain Models--Rarity and Exponentiality. Springer, New York.
Kirstein, B. M. (1976). Monotonicity and comparability of time homogeneous Markov processes with discrete state space. Math. Operations Forschung. Stat. 7, 151-168.
Lee, Mei-Ling Ting (1985a). Dependence by total positivity. Ann. of Probability 13, 572-582.
Lee, Mei-Ling Ting (1985b). Dependence by reverse regular rule. Ann. of Probability 13, 583-591.
Marshall, A. W. and Proschan, F. (1972). Classes of distributions applicable in replacement, with renewal theory implications. Proceedings of the 6th Berkeley Symposium on Math. Stat. and Prob. I. Univ. of California Press, Berkeley, CA, 395-415.
Marshall, A. W. and Olkin, I. (1967). A multivariate exponential distribution. Journal of Amer. Stat. Assoc. 62, 30-44.
Obretenov, A. (1974). An estimation for the renewal function of an IFR distribution. In: Colloq. Math. Soc. Janos Bolyai 9. North-Holland, Amsterdam, 587-591.
O'Brien, G. (1972). A note on comparisons of Markov processes. Ann. of Math. Stat. 43, 365-368.
Shaked, M. (1977). A family of concepts of dependence for bivariate distributions. Journal of Amer. Stat. Assoc. 72, 642-650.
Shaked, M. and Shanthikumar, J. G. (1984). Multivariate IFRA properties of some Markov jump processes with general state space. Preprint.
Smith, W. L. (1958). Renewal theory and its ramifications. J. Roy. Statist. Soc., Series B 20, 243-302.
Veinott, A. F. (1965). Optimal policy in a dynamic, single product, nonstationary inventory model with several demand classes. Operations Research 13, 761-778.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 113-120

7

Application of Goodness-of-Fit Tests in Reliability

B. W. Woodruff and A. H. Moore

1. Introduction

Prior to using a probability model to represent the population underlying data, it is important to test the adequacy of the model. One way to do this is by a goodness-of-fit test. However, one must make an initial selection of models to be tested. Several avenues are available for an initial screening of the data. One could construct histograms, frequency polygons or more sophisticated non-parametric density estimates [4, 23]. Another very useful initial screening device is a probability plot on special graph paper, available for a variety of common distributions used in life testing. Nelson [19] gives extensive coverage of the use of probability plots in his book on reliability theory. After one has selected a model to be tested further, an initial screening of the model could be done by the $\chi^2$ goodness-of-fit test discussed below. If the $\chi^2$ test rejects at a suitable significance level, then one can proceed to test other reasonable models. However, if one fails to reject the model, then one should consider, if possible, other more powerful goodness-of-fit tests.

2. $\chi^2$ goodness-of-fit tests

This classical test is an almost universal goodness-of-fit test since it can be applied to discrete, continuous or mixed distributions, with grouped or ungrouped data, and with the model completely specified or with the parameters estimated. It can also be adapted for use with censored data or truncated distributions. The test is an approximate test since the sample statistic is only asymptotically $\chi^2$ distributed. Several authors have shown it to have lower power than other applicable tests. In applying the test, the data must be grouped into intervals. Since different statisticians may group the data differently, this may lead to a change in the reject or accept decision, and hence the test is not unique. It also requires moderate to large sample sizes.

2.1. $\chi^2$ test procedure

$H_0$: $F(x) = F_0(x)$, $H_A$: $F(x) \ne F_0(x)$.

Take a random (or censored) sample from the unknown distribution and divide the support set into a set of $k$ subsets. Now, under the null hypothesis, determine the expected number of observations in each subset, denoted by $E_i$ ($i = 1, \ldots, k$). The observed number of sample observations in each subset is denoted by $O_i$. A usual rule is to choose the subsets so that the expected number of observations in each subset is greater than or equal to 5. The test statistic is

$$\hat\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$

We reject $H_0$ if $\hat\chi^2 > \chi^2_{\alpha,\,k-p-1}$, the critical value at significance level $\alpha$ of the $\chi^2$ distribution with $k - p - 1$ degrees of freedom, where $p$ is the number of parameters estimated in the specification of the null hypothesis $F_0(x)$.

3. Graphical techniques

A probability plot is a very useful way to provide a preliminary examination of how well a particular distribution fits the data. It is fast and easy to use and can provide parameter and percentile estimates of the distribution. It can be applied to complete and censored data and to grouped data. There are probability graph papers for normal, lognormal, exponential, Weibull, extreme-value and chi-square distributions. Weibull graph paper may be used for the Rayleigh distribution by assuming the shape parameter is two.

3.1. Procedure for graphical techniques

(i) Order the observations from smallest to largest, $x_{(i)}$ ($1 \le i \le n$).
(ii) Assign the value of the cdf at each order statistic, $F(x_{(i)})$. A reasonable value of the cdf at the $i$th order statistic is its median rank, approximately $(i - 0.3)/(n + 0.4)$. Exact tables of median ranks are available for the smaller values of $i$ and $n$ (where $n$ is the sample size). Harter [8, 10] recently wrote several papers where he studied various plotting positions.
(iii) Plot the values of $x_{(i)}$ vs. $F(x_{(i)})$ on the probability paper.

The papers are constructed so that if a particular distribution fits the data, then the graph will be approximately a straight line. A curved line would indicate that the chosen distribution is inadequate to model the sample.
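The graphical procedure can be sketched numerically as follows, as a minimal illustration rather than a replacement for probability paper. The function names and the sample values are hypothetical. The exponential case is used as the example because, for the exponential cdf $F(x) = 1 - \exp(-x/\theta)$, the transformed ordinate $-\ln(1 - F)$ is linear in $x$, which is what exponential probability paper encodes.

```python
import math


def median_rank_positions(n):
    """Approximate median ranks (i - 0.3) / (n + 0.4) for i = 1, ..., n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def exponential_plot_points(sample):
    """Return (x_(i), -ln(1 - F_i)) pairs for an exponential probability plot.

    If the exponential model fits, the points fall near a straight line
    through the origin whose slope estimates 1 / (mean life).
    """
    ordered = sorted(sample)                      # step (i): order the data
    ranks = median_rank_positions(len(ordered))   # step (ii): plotting positions
    return [(x, -math.log(1.0 - p)) for x, p in zip(ordered, ranks)]  # step (iii)


# Hypothetical failure times (hours), used only to show the call pattern.
failure_times = [12.0, 95.0, 31.0, 210.0, 58.0, 140.0, 77.0, 22.0]
for x, y in exponential_plot_points(failure_times):
    print(f"{x:8.1f}  {y:6.3f}")
```

Other probability papers follow the same pattern with a different transformation of the plotting positions, for example the normal quantile function for normal paper.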
Probability plots could also uncover mixtures of distributions in modeling the sample. Mardia [11] states: 'The importance of the graphical method should not be underestimated and it is always worthwhile to supplement a test procedure with a plot.'

4. Modified goodness-of-fit test

Goodness-of-fit tests based on the empirical distribution function (EDF) fall into two categories: (a) tests where the probability model to be tested is completely specified, in which case a single table may be used for all continuous distributions for each test statistic, and (b) tests where the parameters are estimated, called modified goodness-of-fit tests, in which case a different table must be used for each family of distributions. Occasions where the null hypothesis may be completely specified are rare, and that case, with one exception, will not be pursued further in this paper. If one foolishly used tables for the completely specified case when the parameters are estimated, then the actual $\alpha$ error would be much smaller than the specified value, biasing the test so strongly towards acceptance that it is almost equivalent to accepting $H_0$ without testing. See Lawless [12] for an extensive coverage of goodness-of-fit tests.

4.1. Modified test statistics based on EDF

To use a modified goodness-of-fit test based on the EDF, one has to choose a family of cdfs of the form $F[(x - c)/\theta]$, where $c$ is a location parameter and $\theta$ is a scale parameter. The estimators of the nuisance parameters must be scale and location invariant. Usual estimators having this property are maximum likelihood estimators. When the estimators are inserted in the cdf, we will denote the cdf evaluated at each order statistic under the null hypothesis, $F_0[(x_{(i)} - \hat c)/\hat\theta]$, by $\hat F_i$. Consider the following test statistics:

(i) The Kolmogorov-Smirnov statistic $\hat K$:
$\hat K = \max(D^+, D^-)$, where $D^+ = \operatorname{l.u.b.}_{1 \le i \le n}\,(i/n - \hat F_i)$ and $D^- = \operatorname{l.u.b.}_{1 \le i \le n}\,(\hat F_i - (i - 1)/n)$.

(ii) The Anderson-Darling statistic $\hat A^2$:
$\hat A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\,[\ln \hat F_i + \ln(1 - \hat F_{n+1-i})]$.

(iii) The Cramer-von Mises statistic $\hat W^2$:
$\hat W^2 = \sum_{i=1}^{n} [\hat F_i - (2i - 1)/2n]^2 + 1/(12n)$.

(iv) The Kuiper statistic $\hat V$:
$\hat V = D^+ + D^-$.

(v) The Watson statistic $\hat U^2$:
$\hat U^2 = \hat W^2 - n(\bar F - 1/2)^2$, where $\bar F = \sum_{i=1}^{n} \hat F_i / n$.

When the parameters are estimated by location and scale invariant estimators, the null distribution of the test statistic, and hence its percentage points, do not depend on $c$ and $\theta$. However, in using the tables, one must use the same estimators as were used in the construction of the table. The table of critical values and the power of the test are affected by the invariant estimators chosen.

4.2. Normal (and lognormal)

Mardia [11] gave an extensive discussion of tests of univariate and multivariate normality. Many of the techniques discussed are applicable to other distributions. In his Table 1, he summarized the main univariate test statistics. If the distribution is a two-parameter lognormal, then by taking the logarithm of each observation we obtain a sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. If in a test for normality with the transformed data we accept $H_0$, then we are accepting that the original data were lognormal with parameters $\mu$ and $\sigma^2$.

Lilliefors [13] derived, by Monte Carlo simulation, tables for a modified goodness-of-fit test for the normal using the Kolmogorov-Smirnov (KS) statistic, and pointed out the difference in the percentage points between the modified test and the standard test for a completely specified $H_0$. He tabled critical values for $n = 4(1)20(5)30$ for significance levels $\alpha = 0.01, 0.05(0.05)0.20$. He performed a power study for $n = 10$ and $20$ with $\alpha = 0.05$ and $\alpha = 0.10$ using four alternate distributions. In the power study he demonstrated that the modified KS test had considerably higher power than the $\chi^2$ test.
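The Monte Carlo approach to tabulating a modified KS test can be sketched as below. This is a rough illustration under stated assumptions, not a reproduction of the published tables: the function names, the number of replications and the seed are hypothetical choices, and the parameters are estimated by the sample mean and the sample standard deviation with divisor $n - 1$. Because these estimators are location and scale invariant, samples may be drawn from the standard normal without loss of generality.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def modified_ks_normal(sample):
    """KS statistic max(D+, D-) with the normal mean and sd estimated from the sample."""
    ordered = sorted(sample)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))  # stdev uses the n - 1 divisor
    d_plus = max((i + 1) / n - fitted.cdf(x) for i, x in enumerate(ordered))
    d_minus = max(fitted.cdf(x) - i / n for i, x in enumerate(ordered))
    return max(d_plus, d_minus)


def simulate_critical_value(n, alpha, reps=2000, seed=12345):
    """Estimate the upper-alpha critical value of the modified KS statistic."""
    rng = random.Random(seed)
    simulated = sorted(
        modified_ks_normal([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    # Empirical (1 - alpha) quantile of the simulated null distribution.
    index = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return simulated[index]


print(simulate_critical_value(n=20, alpha=0.05))
```

With only a couple of thousand replications the estimate carries visible simulation error; published tables rest on far larger runs and sometimes on smoothing across sample sizes.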
Green and Hegazy [6] derived tables for the modified goodness-of-fit test for the normal, among other distributions, using the Cramer-von Mises (CvM) and Anderson-Darling (AD) statistics for $n = 5, 10, 20, 40, 80, 160$. Their power study showed improved power over other known tests.

4.3. Exponential and Rayleigh distributions

Lilliefors [14] derived tables for a modified KS goodness-of-fit test for the exponential distribution with unknown mean. He tabled critical values for $n = 3(1)20(5)30$ and for significance levels $0.01, 0.05(0.05)0.20$ and of an n = 10, 20, and 50. He conducted a power study for two alternative distributions. Woodruff et al. [24] and Bush et al. [2] derived tables for modified KS, CvM and AD tests for the two-parameter negative exponential (Weibull with shape parameter 1.0) for $n = 5(1)15(5)30$ and significance levels as above. A power study was done for seven alternate distributions. It was shown that the CvM test had the highest power for most of the alternative distributions studied when the null hypothesis was the two-parameter negative exponential. Woodruff et al. [24] and Bush et al. [2] also derived tables for the Rayleigh distribution (Weibull shape parameter 2.0) for the same sample sizes and significance levels given above. The papers by Woodruff and Bush also studied a range of other Weibull shape parameters, $0.5(0.5)4.0$. A second power study with seven alternate distributions showed that the AD statistic was the most powerful when the null distribution was a Weibull with shape parameter 3.5. A relationship between critical values and the inverse of the shape parameter was presented for the range of shape parameters studied.

4.4. Extreme-value and Weibull distributions

Nancy Mann [16] used the fact that two-parameter Weibull distributions (with known location parameter) may be transformed, by taking the logarithm of the observations, to the extreme-value distribution.
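The logarithmic transformation can be checked numerically. In the sketch below the function names and the numerical values are hypothetical, and the parameterization is an assumption made for the illustration: the Weibull cdf is taken as $1 - \exp(-(x/\eta)^{\beta})$ and the smallest extreme-value cdf as $1 - \exp(-\exp((y - u)/b))$. Under that assumption, $y = \ln x$ has location $u = \ln \eta$ and scale $b = 1/\beta$.

```python
import math


def weibull_cdf(x, shape, scale):
    """Two-parameter Weibull cdf, 1 - exp(-(x / scale) ** shape), for x > 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def smallest_extreme_value_cdf(y, location, scale):
    """Smallest extreme-value cdf, 1 - exp(-exp((y - location) / scale))."""
    return 1.0 - math.exp(-math.exp((y - location) / scale))


def weibull_to_extreme_value_params(shape, scale):
    """Location and scale of ln(X) when X is Weibull(shape, scale)."""
    return math.log(scale), 1.0 / shape


# Hypothetical Weibull parameters used only to demonstrate the identity.
shape, scale = 2.0, 3.0
location, ev_scale = weibull_to_extreme_value_params(shape, scale)
for x in (0.5, 1.0, 2.5, 6.0):
    left = weibull_cdf(x, shape, scale)
    right = smallest_extreme_value_cdf(math.log(x), location, ev_scale)
    print(f"x = {x:4.1f}  Weibull cdf = {left:.6f}  extreme-value cdf of ln x = {right:.6f}")
```

The unknown Weibull shape thus becomes a scale parameter after the transformation, which is what allows location and scale invariant estimators to be used.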
After the transformation, one has a family with unknown scale and location parameters. By deriving the variance-covariance matrix of the standardized order statistics from the extreme-value distribution, she was able to obtain best linear unbiased (BLUE) and best linear invariant (BLIE) estimators for the unknown parameters, and hence estimates of the parameters of the original Weibull distribution. It should be noted that the estimators of the parameters of the extreme-value distribution are invariant scale and location parameter estimators.

In a following paper [17], she derived a goodness-of-fit test for the extreme-value distribution of smallest values. Accepting the smallest extreme-value distribution as the model for the transformed data is equivalent to accepting the Weibull distribution as the model for the original data. The test is not an EDF test, but several papers based on the EDF followed that used the same principle of transforming the Weibull into the extreme-value distribution.

Littell et al. [15] derived, by Monte Carlo techniques, tables of critical values for the modified KS, CvM and AD statistics for the extreme-value distribution for n = 10(5)40 and $\alpha$ = 0.01, 0.05(0.05)0.20. They used maximum likelihood estimators for the parameters. A power study compared the three new goodness-of-fit tests with several earlier ones.

In a later paper, Chandra et al. [3] derived tables of critical values for modified goodness-of-fit statistics for the KS and Kuiper tests of fit to the extreme-value distribution with unknown parameters. The unknown parameters were estimated by the method of maximum likelihood.

4.5. Gamma distribution

Woodruff et al. [25] derived tables of percentage points for the modified KS, AD and CvM statistics for goodness-of-fit tests for the gamma distribution with unknown scale and location parameters and known shape parameter, for n = 5(5)30 and $\alpha$ = 0.01, 0.05(0.05)0.20. A power study indicated that, for larger sample sizes, the CvM was the most powerful of the three tests. The equation $C = a_0 + a_1(1/\beta^2)$ describes the form of the relationship between the critical values $C$ and the shape parameter $\beta$ derived for each of the statistics studied. Again ML estimators were used.

4.6. Logistic distribution

Woodruff et al. [26] derived tables of critical values for the modified KS, AD and CvM goodness-of-fit statistics for the logistic distribution with unknown scale and location parameters. ML estimators were used to obtain estimates of the unknown parameters. The statistics were tabled for sample sizes n = 5(5)30 and significance levels $\alpha$ = 0.01, 0.05(0.05)0.20. A power study indicated quite good power against uniform and exponential alternatives. The modified KS test had lower power than the other two tests studied.

4.7. Pareto distribution

Porter and Moore [20] derived tables of critical values for the modified KS, AD, and CvM goodness-of-fit statistics for the Pareto distribution with unknown location and scale parameters and known shape parameter. Best linear unbiased estimators were used to obtain the parameter estimates. The critical values were tabled for sample sizes n = 5(5)30, significance levels $\alpha$ = 0.01, 0.05(0.05)0.20 and Pareto shape parameters 0.5(0.5)4.0. The powers were investigated for eight alternative distributions. A functional relation between the critical values of the test statistics and the Pareto shape parameters was derived.

4.8. Laplace distribution

Yen and Moore [28] derived tables of critical values for the modified AD and CvM goodness-of-fit statistics for the Laplace distribution. The critical values were tabled for sample sizes n = 5(5)50 and significance levels $\alpha$ = 0.01, 0.05(0.05)0.20. The AD test generally yielded higher power than the CvM test.

5. Modifications of the EDF

One way to improve the power of a goodness-of-fit test is to improve the non-parametric estimate of the distribution function.
Harter, Khamis and Lamb [7] modified the definition of the cdf at the ith order statistic to obtain a (modified) KS test statistic for the case where the probability model to be tested is completely specified. They showed that the test obtained in this fashion is substantially more powerful than the usual KS tests for small to moderate sample sizes. Harter [9] also developed asymptotic formulas for the critical values of the above test statistic.

6. New modified goodness-of-fit tests

New goodness-of-fit tests for symmetric alternatives were obtained by Moore et al. [18], Woodruff et al. [27] and Yen and Moore [29] for the normal, uniform, and Laplace distributions, respectively. A reflection technique, in which the data points are reflected about an invariant estimate of the mean, is used to double the sample size. The new sample is used to obtain a better estimate of the distribution function to be used in the goodness-of-fit statistics. New tables were derived for the KS, AD and CvM statistics. The new goodness-of-fit statistics are still invariant with respect to a change in scale or location parameters. Extensive power studies showed that the new tests yielded considerably higher power for sample sizes greater than or equal to 25 for all symmetric or nearly symmetric alternative distributions. For non-symmetric alternative distributions, the new tests showed a decrease in power. This was expected, since Schuster [21] showed that the reflection technique gives a poorer estimate of the distribution function in this case.

7. Likelihood ratio tests

When a goodness-of-fit test fails to reject two families of distributions, one can use a likelihood ratio test to discriminate between them. Bain [1] gives extensive coverage of likelihood ratio tests. He lists the test statistics to be used to discriminate between normal vs. two-parameter exponential, normal vs. double exponential, normal vs.
Cauchy, Weibull vs. lognormal, and extreme-value vs. normal. For large samples, the asymptotic likelihood ratio test could be used. For small samples from other distributions, Monte Carlo techniques can be used to obtain the percentage points of the sample statistic for the likelihood ratio test.

References

[1] Bain, L. J. (1978). Statistical Analysis of Reliability and Life-Testing Models (Theory and Methods). Marcel Dekker, New York and Basel.
[2] Bush, J. G., Woodruff, B. W., Moore, A. H. and Dunne, E. J. (1983). Modified Cramér-von Mises and Anderson-Darling tests for Weibull distribution with unknown location and scale parameters. Commun. Statist.--Theor. Meth. A 12, 2463-2476.
[3] Chandra, M., Singpurwalla, N. and Stephens, M. A. (1981). Kolmogorov statistics for tests of fit for the extreme-value and Weibull distributions. J. Amer. Statist. Assoc. 76, 729-731.
[4] Devroye, L. and Györfi, L. (1985). Non-Parametric Density Estimation: The L1 View. Wiley, New York.
[5] Durbin, J. (1975). Kolmogorov-Smirnov tests when parameters are estimated with applications to tests of exponentiality and tests on spacings. Biometrika 62, 5-22.
[6] Green, J. R. and Hegazy, Y. A. S. (1976). Powerful modified goodness-of-fit tests. J. Amer. Statist. Assoc. 71, 204-209.
[7] Harter, H. L., Khamis, H. T. and Lamb, R. E. (1984). Modified Kolmogorov-Smirnov tests of goodness-of-fit. Commun. Statist.--Simula. Computa. 13, 293-323.
[8] Harter, H. L. (1984). Another look at plotting positions. Commun. Statist.--Theor. Meth. 13, 1613-1633.
[9] Harter, H. L. (1984). Asymptotic formulas for critical values of a modified Kolmogorov test statistic. Communications in Statistics B 13, 719-721.
[10] Harter, H. L. and Wiegand, R. P. (1985). A Monte Carlo study of plotting positions. Commun. Statist.--Simula. Computa. 14, 317-343.
[11] Krishnaiah, P. R. (1980). Handbook of Statistics 1. North-Holland, Amsterdam.
[12] Lawless, J. F. (1982).
Statistical Models and Methods for Lifetime Data. Wiley, New York.
[13] Lilliefors, H. W. (1967). On the Kolmogorov test for normality with mean and variance unknown. J. Am. Statist. Assoc. 62, 143-147.
[14] Lilliefors, H. W. (1969). On the Kolmogorov test for the exponential distribution with mean unknown. J. Am. Statist. Assoc. 64, 387-389.
[15] Littell, R. D., McClave, J. T. and Offen, W. W. (1979). Goodness-of-fit tests for the two-parameter Weibull distribution. Commun. Statist.--Simula. Computa. B 8, 257-269.
[16] Mann, N. R. (1968). Point and interval estimation procedures for the two-parameter Weibull and extreme-value distributions. Technometrics 10, 231-256.
[17] Mann, N. R., Scheuer, E. M. and Fertig, K. W. (1973). A new goodness-of-fit test for the two-parameter Weibull or extreme-value distribution with unknown parameters. Communications in Statistics 2, 383-400.
[18] Moore, A. H., Ream, T. J. and Woodruff, B. W. A new goodness-of-fit test for normality with mean and variance unknown. (Submitted for publication.)
[19] Nelson, W. (1982). Applied Life Data Analysis. Wiley, New York.
[20] Porter, J. E., Moore, A. H. and Coleman, J. W. Modified Kolmogorov, Anderson-Darling and Cramér-von Mises tests for the Pareto distribution with unknown location and scale parameters. (Submitted for publication.)
[21] Schuster, E. F. (1975). Estimating the distribution function of a symmetric distribution. Biometrika 62, 631-635.
[22] Stephens, M. A. (1977). Goodness-of-fit for the extreme-value distribution. Biometrika 64, 583-588.
[23] Tapia, R. A. and Thompson, J. R. (1978). Nonparametric Probability Density Estimation. The Johns Hopkins University Press, Baltimore and London.
[24] Woodruff, B. W., Moore, A. H., Dunne, E. J. and Cortes, R. (1983). A modified Kolmogorov-Smirnov test for Weibull distributions with unknown location and scale parameters. IEEE Transactions on Reliability 32, 209-213.
[25] Woodruff, B. W., Viviano, P. J., Moore, A. H. and Dunne, E. J.
(1984). Modified goodness-of-fit tests for gamma distributions with unknown location and scale parameters. IEEE Transactions on Reliability 33, 241-245.
[26] Woodruff, B. W., Moore, A. H., Yoder, J. D. and Dunne, E. J. (1986). Modified goodness-of-fit tests for logistic distribution with unknown location and scale parameters. Commun. Statist.--Simula. Computa. 15(1), 77-83.
[27] Woodruff, B. W., Woodbury, L. B. and Moore, A. H. A new goodness-of-fit test for the uniform with unspecified parameters. (Submitted for publication.)
[28] Yen, V. C. and Moore, A. H. Modified goodness-of-fit tests for the Laplace distribution. (Submitted for publication.)
[29] Yen, V. C. and Moore, A. H. New modified goodness-of-fit tests for the Laplace distribution. (Submitted for publication.)

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 121-129

Multivariate Nonparametric Classes in Reliability

Henry W. Block* and Thomas H. Savits*

1. Introduction

This paper is a sequel to the survey paper of Hollander and Proschan (1984), who examine univariate nonparametric classes and methods in reliability. In this paper we examine multivariate nonparametric classes and methods in reliability.

Hollander and Proschan (1984) describe the various univariate nonparametric classes in reliability. The classes of adverse aging described include the IFR, IFRA, NBU, NBUE and DMRL classes. The dual classes of beneficial aging are also covered. Several new univariate classes have been introduced since that time. One that we will briefly mention is the HNBUE class, since we are aware of several multivariate generalizations of this class.

The univariate classes in reliability are important in applications concerning systems where the components can be assumed to be independent. In this case the components are often assumed to experience wearout or beneficial aging of a similar type.
For example, it is often reasonable to assume that components have increasing failure rate (IFR). In making this IFR assumption it is implicit that each component separately experiences wear and that no interactions among components can occur. However, in many realistic situations, adverse wear on one component will promote adverse wear on other components. From another point of view, a common environment will cause components to behave similarly. In either situation, it is clear that an assumption of independence of the components would not be valid. Consequently, multivariate concepts of adverse or beneficial aging are required.

Multivariate nonparametric classes were proposed as early as 1970. For background and references, as well as some discussion of univariate classes with multivariate generalizations in mind, see Block and Savits (1981). In the present paper we shall describe only a few fundamental developments prior to 1981 and focus on developments since then. The coverage will not be exhaustive but will emphasize the topics which we feel are most important.

Section 2 deals with multivariate nonparametric classes. In Section 2.1 multivariate IFRA is discussed, with emphasis on the Block and Savits (1980) class. Multivariate NBU is covered in Section 2.2 and multivariate NBUE classes are mentioned in Section 2.3. New developments in multivariate IFR are considered in Section 2.4, and in Section 2.5 the topics of multivariate DMRL and HNBUE are touched on.

Familiarity with the univariate classes is assumed. The basic reference for the IFR, IFRA, NBU and NBUE classes is Barlow and Proschan (1981). See also Block and Savits (1981). For information on the DMRL class see Hollander and Proschan (1984). The HNBUE class is relatively recent and the best references are the original articles.

* Supported by Grant No. AFOSR-84-0113 and ONR Contract N00014-84-K-0084.
See, for example, Klefsjö (1982) and the references contained there.

2. Multivariate nonparametric classes

Many multivariate versions of the univariate classes were proposed using generalizations of various failure rate functions. These multivariate classes were extensively discussed in Block and Savits (1981). Other classes were proposed by attempting to imitate univariate definitions in a multivariate setting. (See also Block and Savits, 1981.) One of the most important of these extensions was due to Block and Savits (1980), who generalized the IFRA class. This multivariate class was proposed to parallel the developments of the univariate case, where the IFRA class possesses many important closure properties.

As in the univariate case, the following multivariate class of IFRA, designated the MIFRA class, satisfies important closure properties. First, as in the univariate case, monotone systems with MIFRA lifetimes have MIFRA lifetimes, and independent sums of MIFRA lifetimes are MIFRA. From the multivariate point of view, subfamilies of MIFRA are MIFRA, conjunctions of independent MIFRA are MIFRA, scaled MIFRA lifetimes are MIFRA, and various other properties are satisfied. We discuss this extension first since several other classes have been defined using similar techniques.

2.1. Multivariate IFRA

Using a characterization of the univariate IFRA class in Block and Savits (1976), the following definition can be made.

DEFINITION 2.1.1. Let $T = (T_1, \dots, T_n)$ be a nonnegative random vector of lifetimes. The random vector $T$ is said to be MIFRA if

$$E^{\alpha}[h(T)] \le E[h^{\alpha}(T/\alpha)]$$

for all continuous nonnegative nondecreasing functions $h$ and all $0 < \alpha \le 1$.

This definition, as mentioned above, implies all of the properties one would desire for a multivariate analog of the univariate IFRA class.
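As a hedged numerical illustration of the inequality in Definition 2.1.1, the sketch below evaluates both sides in closed form for one particular choice: $h(t) = \min(t_1, \dots, t_n)$ and independent Weibull components with common shape at least 1 and unit scale (such components are IFR, hence IFRA, and independent IFRA lifetimes are MIFRA). The function name `mifra_gap` and the parameter choices are assumptions made here; checking a single $h$ illustrates the definition but does not verify it.

```python
from math import gamma


def mifra_gap(alpha, shape=2.0, n_components=2):
    """Return E[h^alpha(T/alpha)] - E^alpha[h(T)] for h = min.

    Assumption: T has independent Weibull(shape, scale 1) components, so
    min(T) is Weibull(shape) with scale n_components ** (-1 / shape).
    """
    scale = n_components ** (-1.0 / shape)

    def moment(r):
        # r-th moment of a Weibull(shape, scale) random variable.
        return scale ** r * gamma(1.0 + r / shape)

    left = moment(1.0) ** alpha              # E^alpha[h(T)]
    right = alpha ** (-alpha) * moment(alpha)  # E[(min(T) / alpha)^alpha]
    return right - left


if __name__ == "__main__":
    for a in (0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
        print(a, mifra_gap(a))
```

The gap is nonnegative on the whole grid and vanishes at $\alpha = 1$, where the two sides of the definition coincide.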
Part of the reason for this is that the definition is equivalent to many other properties which are both theoretically and intuitively appealing. The statements and proofs of these results are given below; the form in which they are presented is influenced by the paper of Marshall and Shaked (1982), who defined a similar MNBU class.

NOTES. (1) Obviously in Definition 2.1.1 we need only consider $h$ defined on $\mathbb{R}^n_+ = \{x \mid x \ge 0\}$. Hence all of the functions and sets mentioned below are assumed to be Borel measurable in $\mathbb{R}^n_+$.
(2) We say a function $g$ is homogeneous (subhomogeneous) on $\mathbb{R}^n_+$ if $\alpha g(t) = (\le)\; g(\alpha t)$ for all $0 \le \alpha \le 1$, $0 \le t$.
(3) $A$ is an upper set if $x \in A$ and $x \le y$ implies $y \in A$.

THEOREM 2.1.2. The following conditions are all equivalent to $T$ being MIFRA.
(i) $P^{\alpha}\{T \in A\} \le P\{T \in \alpha A\}$ for all open upper sets $A$ in $\mathbb{R}^n_+$, all $0 < \alpha \le 1$.
(ii) $P^{\alpha}\{T \in A\} \le P\{T \in \alpha A\}$ for all upper sets $A$ in $\mathbb{R}^n_+$, all $0 < \alpha \le 1$ (i.e. $E^{\alpha}(\varphi(T)) \le E(\varphi^{\alpha}(T/\alpha))$ for all nonnegative, binary, nondecreasing $\varphi$ on $\mathbb{R}^n_+$).
(iii) $E^{\alpha}(h(T)) \le E(h^{\alpha}(T/\alpha))$ for all nonnegative, nondecreasing $h$ on $\mathbb{R}^n_+$, all $0 < \alpha \le 1$.
(iv) For all nonnegative, nondecreasing, subhomogeneous $h$ on $\mathbb{R}^n_+$, $h(T)$ is IFRA.
(v) For all nonnegative, nondecreasing, homogeneous $h$ on $\mathbb{R}^n_+$, $h(T)$ is IFRA.

PROOF. (i) ⇒ (ii). By Theorem 3.3 of Esary, Proschan and Walkup (1967), for an upper set $A$ and any $\varepsilon > 0$ there is an open upper set $A_\varepsilon$ such that $A \subset A_\varepsilon$ and $P\{T \in \alpha A_\varepsilon\} \le P\{T \in \alpha A\} + \varepsilon$. Thus

$$P^{\alpha}\{T \in A\} \le P^{\alpha}\{T \in A_\varepsilon\} \le P\{T \in \alpha A_\varepsilon\} \le P\{T \in \alpha A\} + \varepsilon.$$

(ii) ⇒ (iii). Let $h_k$, $k = 1, 2, \dots$, be an increasing sequence of increasing step functions such that $\lim_{k \to \infty} h_k = h$. Specifically take

$$h_k(t) = \begin{cases} \dfrac{i-1}{2^k} & \text{if } \dfrac{i-1}{2^k} \le h(t) < \dfrac{i}{2^k},\quad i = 1, 2, \dots, k2^k, \\[2mm] k & \text{if } h(t) \ge k, \end{cases}$$

i.e.

$$h_k(t) = \frac{1}{2^k} \sum_{i=1}^{k2^k} I_{A_{i,k}}(t),$$

where $I_{A_{i,k}}$ is the indicator function of the upper set $A_{i,k} = \{t \mid h(t) \ge i/2^k\}$. Thus we need only prove the result for functions of the form

$$h(t) = \sum_{i=1}^{m} a_i I_{A_i}(t), \quad a_i \ge 0,$$

where $A_1, \dots, A_m$ are upper sets, since the remainder follows by the monotone convergence theorem. We have

$$E^{\alpha}\Bigl(\sum_{i=1}^{m} a_i I_{A_i}(T)\Bigr) = \Bigl[\sum_{i=1}^{m} a_i P\{T \in A_i\}\Bigr]^{\alpha} \le \Bigl[\sum_{i=1}^{m} a_i P^{1/\alpha}\{T \in \alpha A_i\}\Bigr]^{\alpha} = \Bigl[\sum_{i=1}^{m} \Bigl\{\int a_i^{\alpha} I_{A_i}(t/\alpha)\, dF(t)\Bigr\}^{1/\alpha}\Bigr]^{\alpha}$$

$$\le \int \Bigl(\sum_{i=1}^{m} a_i I_{A_i}(t/\alpha)\Bigr)^{\alpha} dF(t) = E\Bigl(\Bigl[\sum_{i=1}^{m} a_i I_{A_i}(T/\alpha)\Bigr]^{\alpha}\Bigr),$$

where the last inequality is due to Minkowski.

(iii) ⇒ Def. Obvious.

Def. ⇒ (i). From Esary, Proschan and Walkup (1967), for any open upper set $A$ there exist nonnegative, nondecreasing, continuous functions $h_k$ such that $h_k \uparrow I_A$. Then apply the monotone convergence theorem.

(iii) ⇒ (iv). Let $h$ be nonnegative, nondecreasing and subhomogeneous. Then

$$P^{\alpha}\{h(T) > t\} = E^{\alpha}\bigl(I_{(t,\infty)}(h(T))\bigr) \le E\bigl(I_{(t,\infty)}(h(T/\alpha))\bigr) \le E\bigl(I_{(t,\infty)}(\tfrac{1}{\alpha} h(T))\bigr) = P\{h(T) > \alpha t\},$$

where the first inequality follows from (iii) and the second by the subhomogeneity.

(iv) ⇒ (v). Obvious.

(v) ⇒ (i). Let $A$ be an open upper set and define

$$h(t) = \begin{cases} \sup\{\theta > 0 : \tfrac{1}{\theta} t \in A\} & \text{if } \{\theta > 0 : \tfrac{1}{\theta} t \in A\} \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$

Then $h$ is nonnegative, nondecreasing and homogeneous. Thus

$$P^{\alpha}\{T \in A\} = P^{\alpha}\{h(T) > 1\} \le P\{h(T) > \alpha\} = P\{T \in \alpha A\}.$$

NOTE 2.1.3. The following two alternate conditions could also have been added to the above list of equivalent conditions (provided F(0) = 1).
(vi) $P^{\alpha}\{T \in A\} \le P\{T \in \alpha A\}$ for each set of the form $A = \bigcup_{i=1}^{m} A_i$, where $A_i = \{x \mid x > x_i\}$, $x_i \in \mathbb{R}^n_+$, and for all $0 < \alpha \le 1$.
(vii) For each $k = 1, 2, \dots$, for each $a_{ij}$, $i = 1, \dots, k$, $j = 1, \dots, n$, $0 \le a_{ij} \le \infty$, and for each coherent life function $\tau$ of order $kn$, $\tau(a_{11}T_1, a_{12}T_2, \dots, a_{1n}T_n, a_{21}T_1, \dots, a_{kn}T_n)$ is IFRA. (See Block and Savits (1980) for a definition of coherent life function and for some details of the proof.)

In conjunction with the preceding result, the following lemma makes it easy to demonstrate that a host of different lifetimes are MIFRA.

LEMMA 2.1.4. Let $T$ be MIFRA and $\psi_1, \dots, \psi_m$ be any continuous, subhomogeneous functions of $n$ variables. Then if $S_i = \psi_i(T)$ for $i = 1, \dots, m$, $S = (S_1, \dots, S_m)$ is MIFRA.

PROOF.
This follows easily by considering a nonnegative, increasing, continuous function $h$ of $m$ variables and applying the MIFRA property of $T$ and the monotonicity of the $\psi_i$.

COROLLARY 2.1.5. Let $\tau_1, \dots, \tau_m$ be coherent life functions and $T$ be MIFRA. Then $(\tau_1(T), \dots, \tau_m(T))$ is MIFRA.

PROOF. Since coherent life functions are homogeneous this follows easily.

EXAMPLE 2.1.6. Let $X_1, \dots, X_n$ be independent IFRA lifetimes and $\emptyset \ne S_i \subset \{1, 2, \dots, n\}$, $i = 1, \dots, m$. Since it is not hard to show that independent IFRA lifetimes are MIFRA, it follows that $T_i = \min_{j \in S_i} X_j$, $i = 1, \dots, m$, are MIFRA.

Since many different types of multivariate IFRA can be generated in the above way, the example shows that any of these are MIFRA. See Esary and Marshall (1979), where various types of multivariate IFRA of the type in this example are defined. See Block and Savits (1982) for relationships among these various definitions. Multivariate shock models with multivariate IFRA properties have been treated in Marshall and Shaked (1979) and in Savits and Shaked (1981).

2.2. Multivariate NBU

As with all of the multivariate classes, the need for each of them is evident because of the usefulness of the corresponding univariate class. The only difference is that in the multivariate case the independence of the components is lacking. In particular, the concept of NBU is fundamental in discussing maintenance policies in a single-component system. For a multicomponent system where components are dependent, marginally the components satisfy the univariate NBU property under various maintenance protocols. However, a joint concept describing the interaction of all the components is necessary. Hence multivariate NBU concepts are required.

Most of the earliest definitions of multivariate NBU (see for example Buchanan and Singpurwalla, 1977) consisted of various generalizations of the defining property of the univariate NBU class.
For a survey of these see definitions (1)-(5) of Section 5 of Block and Savits (1981). For shock models which satisfy these definitions see Marshall and Shaked (1979), Griffith (1982), Ebrahimi and Ghosh (1981) and Klefsjö (1982). Other definitions involving generalizations of properties of univariate NBU distributions are given by (7)-(9) of the same reference. These are similar to definitions used by Esary and Marshall (1979) to define multivariate IFRA distributions.

Definitions (7) and (8) of the Block and Savits (1981) reference represent a certain type of definition and bear repeating here. The vector $T$ is said to be multivariate NBU if:

$\tau(T_1, \dots, T_n)$ is NBU for all $\tau$ in a certain class of life functions; (2.2.1)

There exist independent NBU $X_1, \dots, X_k$ and life functions $\tau_i$, $i = 1, \dots, n$, in a certain class such that $T_i = \tau_i(X)$, $i = 1, \dots, n$. (2.2.2)

El-Neweihi, Proschan and Sethuraman (1983) have considered a special case of (2.2.2) where the $\tau_i$ are minimums and have related this case to some other definitions, including the special case of (2.2.1) where $\tau$ is any minimum.

As shown in Theorem 2.1.2, definitions involving increasing functions can be given equivalently in terms of upper (or open upper) sets. Two multivariate NBU definitions which were given in terms of upper sets were those of El-Neweihi (1981) and Marshall and Shaked (1982). These are respectively:

For every upper set $A \subset \mathbb{R}^n_+$ and for every $0 < \alpha < 1$,
$$P\{T \in A\} \le P\{\min(T'/\alpha,\; T''/(1-\alpha)) \in A\} \qquad (2.2.3)$$
where $T$, $T'$, $T''$ are independent and have the same distribution.

For every upper set $A \subset \mathbb{R}^n_+$ and for every $\alpha > 0$, $\beta > 0$,
$$P\{T \in (\alpha+\beta)A\} \le P\{T \in \alpha A\}\, P\{T \in \beta A\}. \qquad (2.2.4)$$

Relationships among these definitions are given in El-Neweihi (1981).

A more restrictive definition than either of the above has been given in Berg and Kesten (1984):

For every upper $A, B \subset \mathbb{R}^n$,
$$P(T \in A + B) \le P(T \in A)\, P(T \in B). \qquad (2.2.5)$$

This definition was shown to be useful in percolation theory as well as reliability theory.
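To make (2.2.5) concrete, the sketch below checks it on one restricted family of upper sets, the closed orthants $A = \{x \mid x \ge a\}$ and $B = \{x \mid x \ge b\}$, whose sum is $A + B = \{x \mid x \ge a + b\}$. The components are assumed independent Weibull with unit scale, and the function names are hypothetical. Passing the check on orthants does not establish (2.2.5) for all upper sets; it only illustrates the inequality and shows how a decreasing failure rate breaks it.

```python
from math import exp


def orthant_survival(corner, shape):
    """P(T_1 >= c_1, ..., T_n >= c_n) for independent Weibull(shape, scale 1) T_i."""
    return exp(-sum(c ** shape for c in corner))


def inequality_holds_on_orthants(a, b, shape):
    """Check P(T in A + B) <= P(T in A) P(T in B) for orthants with corners a, b."""
    sum_corner = [x + y for x, y in zip(a, b)]  # corner of the orthant A + B
    left = orthant_survival(sum_corner, shape)
    right = orthant_survival(a, shape) * orthant_survival(b, shape)
    return left <= right * (1.0 + 1e-12)  # small tolerance for rounding


if __name__ == "__main__":
    print(inequality_holds_on_orthants([1.0, 0.5], [0.2, 2.0], shape=2.0))
    print(inequality_holds_on_orthants([1.0, 1.0], [1.0, 1.0], shape=0.5))
```

With shape 2 the inequality holds because $(a_i + b_i)^2 \ge a_i^2 + b_i^2$; with shape 0.5 the components have decreasing failure rate and the second call reports a violation.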
A general framework has been given by Marshall and Shaked (1984). It involves generalizations of the concept (2.2.1), called taking the C-closure of $\mathcal{F}$, and of the concept (2.2.2), called C-generating from $\mathcal{F}$ (where $\mathcal{F}$ is the class of univariate NBU lifetimes in (2.2.1) and (2.2.2)). Many of the previous NBU definitions are organized within this framework. A similar remark applies when the classes $\mathcal{F}$ are exponential, IFR, IFRA and NBUE. See Marshall and Shaked (1984).

2.3. Multivariate NBUE

Along with the multivariate NBU versions of Buchanan and Singpurwalla (1977) are integrated versions of these definitions. These authors give three versions of multivariate NBUE. The relations among these and their closure properties are discussed in Ebrahimi and Ghosh (1981). Furthermore, the latter authors relate these multivariate NBUE definitions to four definitions of multivariate NBU (i.e. definitions (1)-(4) of Section 5 of Block and Savits (1981)). Some other multivariate NBUE classes are mentioned by Block and Savits (1981) and Marshall and Shaked (1984). One extension of a univariate characterization of the NBUE class mentioned in Block and Savits (1978) has been proposed by Savits (1983).

2.4. Multivariate IFR

Perhaps the most important univariate concept in reliability is that of increasing failure rate. One reason for this is that this idea describes the wearout of a component in a very simple and compelling way. Many engineers, biologists and actuaries find this description fundamental. The monotonicity of the failure rate function is simple and intuitive and occurs in many physical situations. It is also crucial in the multicomponent case where the components are dependent. Several authors have attempted to describe the action of the failure rates increasing for n components simultaneously. These cases were discussed in Block and Savits (1981) and in the references contained therein.
A recent definition of multivariate IFR was given by Savits (1985) and is in the spirit of the classes defined by Block and Savits (1980) and Marshall and Shaked (1982). For shock models involving multivariate IFR concepts see Ghosh and Ebrahimi (1981).

It is shown in Savits (1985) that a univariate lifetime $T$ is IFR if and only if $E[h(x, T)]$ is log concave in $x$ for all functions $h(x, t)$ which are log concave in $(x, t)$ and are nondecreasing in $t$ for each fixed $x \ge 0$. This leads to the following multivariate definition.

DEFINITION 2.4.1. Let $T$ be a nonnegative random vector. Then $T$ has an MIFR distribution if $E[h(x, T)]$ is log concave in $x$ for all functions $h(x, t)$ which are log concave in $(x, t)$ and nondecreasing in $t \ge 0$ for each fixed $x \ge 0$.

This class enjoys many closure properties. Among these are that all marginals are MIFR, conjunctions of independent MIFR are MIFR, convolutions of MIFR are MIFR, scaled MIFR are MIFR, nonnegative nondecreasing concave functions of MIFR are MIFR, and weak convergence preserves MIFR. See Savits (1985) for details.

From these results it follows that the multivariate exponential distribution of Marshall and Olkin (1967) is MIFR, as are all distributions with log concave densities. Since the multivariate folded normal has a log concave density, it also is MIFR.

The technique used in Definition 2.4.1 for the MIFR class extends to other multivariate classes. In particular, if we replace log concave with log subhomogeneous, we get the same multivariate IFRA class as in Definition 2.1.1; if we replace log concave with log subadditive, we get a new multivariate NBU class which is between that of (2.2.3) and (2.2.4). For more details see Savits (1983, 1985).

2.5. Multivariate DMRL and HNBUE

Few definitions of multivariate DMRL have been discussed in the literature, although E. El-Neweihi has privately communicated one to us.
Since developments are premature with respect to this class we will not go into details.

Multivariate extensions of the HNBUE class have been proposed by Basu and Ebrahimi (1981) and Klefsjö (1980). The extensions of the former authors are similar in spirit to the multivariate NBUE classes of Ghosh and Ebrahimi (1981). The latter author's definition extends the univariate definition by replacing the univariate exponential distribution with the bivariate Marshall and Olkin (1967) distribution, and considers various multivariate versions of the definition. Basu and Ebrahimi (1981) show relationships among their definitions and Klefsjö's, give some closure properties and also point out relations with multivariate NBUE classes.

References

Barlow, R. E. and Proschan, F. (1981). Statistical Theory of Reliability and Life Testing: Probability Models. To Begin With, Silver Spring, MD.
Basu, A. P. and Ebrahimi, N. (1981). Multivariate HNBUE distributions. University of Missouri-Columbia, Technical Report #110.
Berg, J. and Kesten, H. (1984). Inequalities with applications to percolation and reliability. Unpublished report.
Block, H. W. and Savits, T. H. (1976). The IFRA closure problem. Ann. Prob. 4, 1030-1032.
Block, H. W. and Savits, T. H. (1978). Shock models with NBUE survival. J. Appl. Prob. 15, 621-628.
Block, H. W. and Savits, T. H. (1980). Multivariate increasing failure rate average distributions. Ann. Prob. 8, 793-801.
Block, H. W. and Savits, T. H. (1981). Multivariate classes in reliability theory. Math. of O.R. 6, 453-461.
Block, H. W. and Savits, T. H. (1982). The class of MIFRA lifetimes and its relation to other classes. NRLQ 29, 55-61.
Buchanan, B. and Singpurwalla, N. D. (1977). Some stochastic characterizations of multivariate survival. In: C. P. Tsokos and I. Shimi, eds., The Theory and Appl. of Reliability, Vol. I, Academic Press, New York, 329-348.
Ebrahimi, N. and Ghosh, M. (1981).
Multivariate NBU and NBUE distributions. The Egyptian Statistical Journal 25, 36-55.
El-Neweihi, E. (1981). Stochastic ordering and a class of multivariate new better than used distributions. Comm. Statist.-Theor. Meth. A 10(16), 1655-1672.
El-Neweihi, E., Proschan, F. and Sethuraman, J. (1983). A multivariate new better than used class derived from a shock model. Operations Research 31, 177-183.
Esary, J. D. and Marshall, A. W. (1979). Multivariate distributions with increasing hazard rate averages. Ann. Prob. 7, 359-370.
Esary, J. D., Proschan, F. and Walkup, D. W. (1967). Association of random variables, with applications. Ann. Math. Stat. 38, 1466-1474.
Ghosh, M. and Ebrahimi, N. (1981). Shock models leading to increasing failure rate and decreasing mean residual life survival. J. Appl. Prob. 19, 158-166.
Griffith, W. (1982). Remarks on a univariate shock model with some bivariate generalizations. NRLQ 29, 63-74.
Hollander, M. and Proschan, F. (1984). Nonparametric concepts and methods in reliability. In: P. R. Krishnaiah and P. K. Sen, eds., Handbook of Statistics, Vol. 4, Elsevier, Amsterdam.
Klefsjö, B. (1980). Multivariate HNBUE. Unpublished report.
Klefsjö, B. (1982). NBU and NBUE survival under the Marshall-Olkin shock model. IAPQR Transactions 7, 87-96.
Klefsjö, B. (1982). The HNBUE and HNWUE classes of life distributions. NRLQ 29, 331-344.
Marshall, A. W. and Olkin, I. (1967). A generalized bivariate exponential distribution. J. Appl. Prob. 4, 291-302.
Marshall, A. W. and Shaked, M. (1979). Multivariate shock models for distributions with increasing hazard rate average. Ann. Prob. 7, 343-358.
Marshall, A. W. and Shaked, M. (1982). A class of multivariate new better than used distributions. Ann. Prob. 10, 259-264.
Marshall, A. W. and Shaked, M. (1984). Multivariate new better than used distributions. Unpublished report.
Savits, T. H. and Shaked, M. (1981). Shock models and the MIFRA property. Stoch. Proc. Appl. 11, 273-283.
Savits, T. H. (1983).
Multivariate life classes and inequalities. In: Y. L. Tong, ed., Inequalities on Probability and Statistics, IMS, Hayward, CA.
Savits, T. H. (1985). A multivariate IFR class. J. Appl. Prob. 22, 197-204.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 131-156

Selection and Ranking Procedures in Reliability Models*

Shanti S. Gupta and S. Panchapakesan

1. Introduction

Situations abound in practice where the aim of the statistical analyst is to compare two or more populations in some fashion with a view to ranking them or selecting the best one(s) among them. For example, a purchasing firm may want to determine which one of several competing suppliers of components for a certain computer is producing the highest quality product. Typically, the populations that are compared will be life length distributions of the components from the competing manufacturers. The best population could be defined as the one with the largest mean life or with the largest quantile (percentile) of a given order. In such situations, the classical tests of homogeneity are not designed to answer efficiently several possible questions of interest to the experimenter. Selection and ranking procedures were initially devised in the early 1950's to provide the analyst with appropriate tools to answer these questions. Most of the investigations in the last thirty-odd years have adopted one or the other of two basic formulations. One of them is the so-called indifference zone (IZ) formulation of Bechhofer (1954) and the other is the subset selection (SS) approach of Gupta (1956).

Our main purpose in this paper is to describe some important selection procedures that are relevant to reliability models. Selection procedures are available in the literature for various parametric families of distributions. Many of these distributions serve as appropriate models for the life length of a unit.
However, we will be concerned with only a few of these, such as the exponential, gamma, and Weibull distributions. Besides some nonparametric and distribution-free procedures, we emphasize selection procedures for restricted families of distributions such as the increasing failure rate (IFR) and increasing failure rate on the average (IFRA) families, which are of importance in reliability problems. In dealing with these procedures, we mainly use the SS approach.

* This research was supported by the Office of Naval Research Contract N00014-84-C-0167 at Purdue University. Reproduction in whole or in part is permitted for any purpose of the United States Government.

In the last three decades and more, the literature on selection and ranking procedures has grown enormously. Several books have appeared dealing exclusively with selection and ranking procedures. Of these, the monograph of Bechhofer, Kiefer and Sobel (1968) deals with sequential procedures with special emphasis on the Koopman-Darmois family. Gibbons, Olkin and Sobel (1977) deal with methods and techniques mostly under the IZ formulation. Gupta and Panchapakesan (1979) provide a comprehensive survey of the developments in the field of ranking and selection, with a special chapter on Guide to Tables. They deal with all aspects of the problem and provide an extensive bibliography. Büringer, Martin and Schriever (1980) and Gupta and Huang (1981) have discussed some specific aspects of the problem. A fairly comprehensive categorized bibliography is provided by Dudewicz and Koo (1982). For a critical review and an assessment of developments in subset selection theory and techniques, reference may be made to Gupta and Panchapakesan (1985).

Section 2 discusses the formulation of the basic problem of selecting the best population using the IZ and SS approaches. Section 3 deals with selection from gamma, exponential and Weibull populations.
Procedures for different generalized goals are discussed using both IZ and SS approaches. Nonparametric procedures are discussed in Section 4 for selecting in terms of $\alpha$-quantiles. This section also discusses procedures for Bernoulli distributions. These serve as distribution-free procedures for selecting from life distributions in terms of reliability at an arbitrarily chosen time. Procedures for selection from restricted families of distributions are described in Section 5. These include procedures for IFR and IFRA families in particular. A brief discussion of selection in comparison with a standard or control follows in Section 6.

2. Selection and ranking procedures

Let $\pi_1, \ldots, \pi_k$ be $k$ given populations, where $\pi_i$ has the associated distribution function $F_{\theta_i}$, $i = 1, \ldots, k$. The $\theta_i$ are real-valued parameters taking values in the set $\Theta$. It is assumed that the $\theta_i$ are unknown. The ordered $\theta_i$ are denoted by $\theta_{[1]} \le \theta_{[2]} \le \cdots \le \theta_{[k]}$ and the (unknown) population $\pi_i$ associated with $\theta_{[i]}$ by $\pi_{(i)}$, $i = 1, \ldots, k$. The populations are ranked according to their $\theta$-values. To be specific, $\pi_{(j)}$ is defined to be better than $\pi_{(i)}$ if $i < j$. No prior information is assumed regarding the true pairing between $(\theta_1, \ldots, \theta_k)$ and $(\theta_{[1]}, \ldots, \theta_{[k]})$.

2.1. Indifference zone (IZ) formulation

The goal in the basic problem in the IZ approach is to select the best population, namely, the one associated with $\theta_{[k]}$. A procedure is required to choose one of the populations. A correct selection (CS) is a selection of population(s) satisfying the goal. Here CS corresponds to choosing the best population. Any selection procedure is required to guarantee a minimum probability of a correct selection (PCS).
In the IZ formulation, this requirement is that, for any rule $R$,

$$P(\mathrm{CS} \mid R) \ge P^* \quad \text{whenever} \quad \delta(\theta_{[k]}, \theta_{[k-1]}) \ge \delta^*, \tag{2.1}$$

where $P(\mathrm{CS} \mid R)$ denotes the PCS using $R$, and $\delta(\theta_{[k]}, \theta_{[k-1]})$ is an appropriate measure of separation of the best population $\pi_{(k)}$ from the next best $\pi_{(k-1)}$. The constants $P^*$ and $\delta^*$ are specified by the experimenter in advance. The statistical problem is to define a selection rule, which really consists of a sampling rule, a stopping rule for sampling, and a decision rule. If we consider taking a single sample of fixed size $n$ from each population, then the minimum value of $n$ is determined subject to (2.1). A crucial step involved in this is to evaluate the infimum of the PCS over $\Omega_{\delta^*} = \{\theta = (\theta_1, \ldots, \theta_k) : \delta(\theta_{[k]}, \theta_{[k-1]}) \ge \delta^*\}$. Any configuration of $\theta$ where this infimum is attained is called a least favorable configuration (LFC). Between two valid (i.e. satisfying (2.1)) single sample procedures, the sample size $n$ is an obvious criterion for efficiency comparison. The region $\Omega_{\delta^*}$ is called the preference zone. No requirement is made regarding the PCS when $\theta$ belongs to the complement of $\Omega_{\delta^*}$ which, in fact, is the indifference zone.

2.2. Subset selection (SS) approach

In the SS approach for selecting the best population, the goal is to select a nonempty subset of the $k$ populations which includes the best population. The size of the selected subset is not fixed in advance; it is rather determined by the data themselves. Selection of any subset consistent with the goal (i.e. including the best population) is a correct selection. It is required that, for any rule $R$,

$$P(\mathrm{CS} \mid R) \ge P^* \quad \text{for all } \theta \in \Omega, \tag{2.2}$$

where $\Omega = \{\theta\}$ is the whole parameter space. It should be noted that there is no indifference zone specification in this formulation. As is to be expected, a crucial step is the evaluation of the infimum of the PCS over $\Omega$.
Any subset selection rule that satisfies (2.2) meets the criterion of validity. Denoting the selected subset by $S$ and its size by $|S|$, the expected value of $|S|$ serves as a reasonable measure for efficiency comparison between valid procedures. Besides $E(|S|)$, possible performance characteristics include $E(|S|) - \mathrm{PCS}$ and $E(|S|)/\mathrm{PCS}$. The former represents the expected number of non-best populations included in the selected subset. As an overall measure, one can also consider the supremum of $E(|S|)$ over $\Omega$.

2.3. Some general remarks

The probability requirement, (2.1) or (2.2) as the case may be, is usually referred to as the basic probability requirement, or the $P^*$-requirement, or the $P^*$-condition. There are several modifications and generalizations of the basic goal and requirements on the procedures in both IZ and SS formulations. These will be described as the necessity arises during our discussion of several procedures. For details on these aspects of the problem, reference may be made to Gupta and Panchapakesan (1979).

Suppose that the best population is the one associated with the largest $\theta_i$. A procedure $R$ is said to be monotone if the probability of selecting $\pi_i$ is at least as large as that of selecting $\pi_j$ whenever $\theta_i \ge \theta_j$.

2.4. Two types of subset selection rules

Let $T_i$ be the statistic associated with the sample from $\pi_i$ ($i = 1, \ldots, k$) with distribution function $F(x, \theta_i)$; the $\theta_i$ are the parameters to be ranked. Most of the rules that have been studied in the literature are of one of the following types:

$$R_1\text{: Select } \pi_i \text{ if and only if } T_i \ge \max_{1 \le j \le k} T_j - d \tag{2.3}$$

and

$$R_2\text{: Select } \pi_i \text{ if and only if } T_i \ge c \max_{1 \le j \le k} T_j \tag{2.4}$$

where $d > 0$ and $c \in (0, 1)$ are to be determined so that the $P^*$-requirement is satisfied. These rules $R_1$ and $R_2$ have been typically proposed when $\theta_i$ is a location and a scale parameter, respectively. When $\theta_i$ is neither a location nor a scale parameter (e.g.
a noncentrality parameter), usually one of these two rules has been proposed depending on the nature of the support of $T_i$. Most of the rules that are discussed in this paper come under one of these two types. Treatment of $R_1$ and $R_2$ in the location and scale case, respectively, is given in Gupta (1965). The following properties hold for $R_1$ in the location case and $R_2$ in the scale case.

(1) The procedure is monotone (Gupta, 1965).
(2) If the distribution $F(x, \theta)$ possesses a density $f(x, \theta)$ having a monotone likelihood ratio (MLR) in $x$, then $E(|S|)$ is maximized when $\theta_1 = \cdots = \theta_k$ and this maximum is $kP^*$ (Gupta, 1965).
(3) Under the MLR assumption, the rule is minimax when the loss is measured by $|S|$ or the number of non-best populations selected (Berger, 1979).
(4) In a fairly large class of rules, the procedure is minimax when the loss is measured by the maximum probability of including a non-best population (Berger and Gupta, 1980).

A comprehensive unified theory is due to Gupta and Panchapakesan (1972), who have considered a class of rules which includes $R_1$ and $R_2$ as special cases; see Gupta and Panchapakesan (1979, Section 11.2). Gupta and Huang (1980) have obtained an optimal rule in the class of rules for which the PCS is at least $\gamma$ by minimizing the supremum of $E(|S|)$.

3. Selection from parametric families

Numerous parametric models are employed in the analysis of life length data and in problems connected with the modeling of aging or failure processes. Among univariate models, a few particular distributions, namely, the exponential, Weibull, and gamma, stand out in view of their proven usefulness in a wide range of situations. Of course, these distributions are related to each other.
In this section, we will discuss a few typical procedures for selection from these populations.

3.1. Selection from gamma populations

Let $\pi_1, \ldots, \pi_k$ denote $k$ given gamma populations with density functions

$$f(x, \theta_i) = \frac{1}{\Gamma(\alpha)\,\theta_i^{\alpha}}\, x^{\alpha - 1} \exp(-x/\theta_i), \quad x > 0;\ \theta_i, \alpha > 0;\ i = 1, \ldots, k, \tag{3.1}$$

with a common known shape parameter $\alpha$. For the goal of selecting a subset containing the best population, namely, the one associated with $\theta_{[k]}$, Gupta (1963a) proposed a rule based on the sample means $\bar{X}_i$, $i = 1, \ldots, k$, arising from $n$ independent observations from each population. The rule of Gupta (1963a) is

$$R_3\text{: Select } \pi_i \text{ if and only if } \bar{X}_i \ge c \max_{1 \le j \le k} \bar{X}_j \tag{3.2}$$

where $c$ is the largest number with $0 < c < 1$ for which the $P^*$-requirement is met. The LFC is given by $\theta_1 = \cdots = \theta_k$ and the constant $c$ is determined by

$$\int_0^\infty G_\nu^{k-1}(x/c)\, g_\nu(x)\, dx = P^*, \tag{3.3}$$

where $\nu = n\alpha$, and $G_\nu$ and $g_\nu$ are the cdf and the density, respectively, of a standardized gamma random variable (i.e. with $\theta = 1$) with shape parameter $\nu$. Gupta (1963a) has tabulated the values of $c$ for $\nu = 1(1)25$, $k = 2(1)11$, and $P^* = 0.75, 0.90, 0.95, 0.99$. Tables 1a and 1b are excerpted from the tables of Gupta (1963a) and they provide $c$-values for $k = 2(1)11$, $\nu = 1(1)20$, and $P^* = 0.90$ and $0.95$, respectively.

Table 1a
Values of the constant c of Rule R3 satisfying equation (3.3); P* = 0.90

 ν    k=2    k=3    k=4    k=5    k=6    k=7    k=8    k=9    k=10   k=11
 1    0.111  0.072  0.059  0.052  0.047  0.044  0.041  0.039  0.038  0.036
 2    0.244  0.183  0.159  0.145  0.135  0.128  0.123  0.119  0.116  0.113
 3    0.327  0.260  0.232  0.215  0.203  0.195  0.188  0.183  0.178  0.174
 4    0.386  0.317  0.286  0.268  0.255  0.246  0.239  0.232  0.228  0.223
 5    0.430  0.360  0.329  0.310  0.297  0.287  0.279  0.273  0.268  0.263
 6    0.466  0.396  0.364  0.345  0.332  0.321  0.313  0.307  0.301  0.296
 7    0.494  0.426  0.394  0.374  0.361  0.350  0.342  0.336  0.330  0.325
 8    0.519  0.451  0.419  0.400  0.386  0.376  0.367  0.360  0.355  0.350
 9    0.539  0.472  0.441  0.422  0.408  0.398  0.389  0.382  0.376  0.371
10    0.558  0.492  0.460  0.441  0.428  0.417  0.409  0.402  0.396  0.391
11    0.573  0.508  0.478  0.459  0.445  0.434  0.426  0.419  0.414  0.408
12    0.588  0.524  0.493  0.474  0.461  0.450  0.442  0.435  0.429  0.424
13    0.600  0.537  0.507  0.488  0.475  0.465  0.456  0.450  0.444  0.439
14    0.612  0.550  0.520  0.502  0.488  0.478  0.470  0.463  0.457  0.452
15    0.622  0.561  0.532  0.514  0.500  0.490  0.482  0.475  0.469  0.464
16    0.632  0.572  0.543  0.525  0.511  0.501  0.493  0.486  0.481  0.476
17    0.641  0.582  0.553  0.535  0.522  0.512  0.504  0.497  0.491  0.486
18    0.649  0.591  0.562  0.544  0.532  0.522  0.514  0.507  0.501  0.496
19    0.657  0.599  0.571  0.553  0.540  0.531  0.523  0.516  0.510  0.506
20    0.664  0.607  0.579  0.562  0.549  0.539  0.531  0.525  0.519  0.514

Table 1b
Values of the constant c of Rule R3 satisfying equation (3.3); P* = 0.95

 ν    k=2    k=3    k=4    k=5    k=6    k=7    k=8    k=9    k=10   k=11
 1    0.053  0.035  0.028  0.025  0.023  0.021  0.020  0.019  0.018  0.018
 2    0.156  0.119  0.104  0.095  0.089  0.085  0.082  0.079  0.076  0.074
 3    0.233  0.188  0.168  0.156  0.148  0.142  0.138  0.134  0.131  0.128
 4    0.291  0.242  0.220  0.206  0.197  0.190  0.184  0.180  0.176  0.173
 5    0.336  0.285  0.261  0.247  0.237  0.229  0.223  0.218  0.214  0.210
 6    0.372  0.320  0.296  0.281  0.271  0.263  0.256  0.251  0.247  0.243
 7    0.403  0.350  0.326  0.310  0.300  0.291  0.285  0.279  0.275  0.271
 8    0.428  0.376  0.351  0.336  0.325  0.316  0.310  0.304  0.300  0.296
 9    0.451  0.399  0.374  0.358  0.347  0.339  0.332  0.326  0.322  0.317
10    0.471  0.419  0.394  0.378  0.367  0.359  0.352  0.346  0.341  0.337
11    0.488  0.437  0.412  0.396  0.385  0.377  0.370  0.364  0.359  0.355
12    0.504  0.453  0.428  0.413  0.402  0.393  0.386  0.380  0.376  0.371
13    0.518  0.468  0.443  0.428  0.417  0.408  0.401  0.395  0.390  0.386
14    0.531  0.481  0.457  0.442  0.430  0.422  0.415  0.409  0.404  0.400
15    0.543  0.494  0.470  0.454  0.443  0.434  0.428  0.422  0.417  0.413
16    0.554  0.505  0.481  0.466  0.455  0.446  0.439  0.434  0.429  0.424
17    0.564  0.516  0.492  0.477  0.466  0.457  0.450  0.445  0.440  0.436
18    0.574  0.526  0.502  0.487  0.476  0.468  0.461  0.455  0.450  0.446
19    0.582  0.535  0.512  0.497  0.486  0.477  0.470  0.465  0.460  0.456
20    0.591  0.544  0.520  0.506  0.495  0.486  0.480  0.474  0.469  0.465

Depending on the physical nature of the problem, we may be interested in selecting the population associated with $\theta_{[1]}$, which is the best population now. In this case, the procedure analogous to $R_3$ is

$$R_4\text{: Select } \pi_i \text{ if and only if } \bar{X}_i \le \frac{1}{c'} \min_{1 \le j \le k} \bar{X}_j \tag{3.4}$$

where $0 < c' < 1$ is the largest number for which the $P^*$-condition is met. The constant $c'$ is given by

$$\int_0^\infty [1 - G_\nu(c' x)]^{k-1}\, g_\nu(x)\, dx = P^*$$

where $\nu = n\alpha$. The values of the constant $c'$ have been tabulated for $\nu = 1(1)25$, $k = 2(1)11$, and $P^* = 0.75, 0.90, 0.95, 0.99$ by Gupta and Sobel (1962b), who have studied rule $R_4$ in the context of selecting from $k$ normal populations the one with the smallest variance in a companion paper (1962a).

It is known that the gamma family $\{F(x, \theta)\}$, with a common shape parameter, is stochastically increasing in $\theta$, i.e., $F(x, \theta_i)$ and $F(x, \theta_j)$ are distinct for $\theta_i \ne \theta_j$, and $F(x, \theta_i) \ge F(x, \theta_j)$ for all $x$ when $\theta_i < \theta_j$. This implies that ranking them in terms of $\theta$ is equivalent to ranking in terms of the $\alpha$-quantile for any $0 < \alpha < 1$.

3.2. Selection from exponential (one-parameter) populations

We first note that this is a special case of gamma populations with densities $f(x, \theta_i)$ in (3.1) with $\alpha = 1$.
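Because the exponential case gives an integer value $\nu = n$ in (3.3), the constant $c$ of rule $R_3$ is easy to check numerically. The following is a minimal sketch, not the method used to build the published tables: it assumes an integer shape $\nu$ (so that the gamma cdf has the closed Erlang form), integrates the left side of (3.3) with a composite Simpson rule on a truncated range, and finds $c$ by bisection. All function names are hypothetical.

```python
import math

def gamma_cdf_int(x, v):
    """CDF at x of a standard gamma variable with INTEGER shape v (Erlang form)."""
    if x <= 0.0:
        return 0.0
    log_x = math.log(x)
    # 1 - sum_{j<v} x^j e^{-x} / j!, with each term computed on the log scale
    tail = sum(math.exp(j * log_x - x - math.lgamma(j + 1)) for j in range(v))
    return max(0.0, 1.0 - tail)

def gamma_pdf(x, v):
    """Density at x of a standard gamma variable with shape v."""
    if x <= 0.0:
        return 1.0 if (x == 0.0 and v == 1) else 0.0
    return math.exp((v - 1) * math.log(x) - x - math.lgamma(v))

def pcs_at_lfc(c, k, v, steps=8000):
    """Left side of (3.3), by composite Simpson integration on [0, upper]."""
    upper = v + 12.0 * math.sqrt(v) + 20.0  # assumed truncation point; tail is negligible
    h = upper / steps                        # steps must be even
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * gamma_cdf_int(x / c, v) ** (k - 1) * gamma_pdf(x, v)
    return total * h / 3.0

def solve_c(k, v, p_star, iters=30):
    """Largest c in (0, 1) with pcs_at_lfc(c) >= p_star; the PCS decreases in c."""
    lo, hi = 1e-6, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pcs_at_lfc(mid, k, v) >= p_star:
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    # For k = 2 and v = 1 the integral equals 1 / (1 + c), so c = 1/9 at P* = 0.90.
    print(round(solve_c(2, 1, 0.90), 3))
```

For $k = 2$ and $\nu = 1$ the integral in (3.3) reduces to $1/(1 + c)$, which gives $c = 1/9$ at $P^* = 0.90$ and $c = 1/19$ at $P^* = 0.95$, in agreement with the first entries of Tables 1a and 1b. The fixed grid is an assumption that becomes inaccurate for very small $c$; a finer grid or an adaptive rule would be needed there.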
Thus the rules $R_3$ and $R_4$ are applicable. Now consider a life testing situation where a sample of $n$ items from each population is put on test and the sample is censored (type II) at the $r$th failure. Let $X_{i1} < X_{i2} < \cdots < X_{ir}$ denote the $r$ complete lives in the sample from $\pi_i$, $i = 1, \ldots, k$. Define

$$T_i = \sum_{j=1}^{r} X_{ij} + (n - r) X_{ir}, \quad i = 1, \ldots, k. \tag{3.5}$$

The $T_i$ are the so-called total life statistics. It is well known that $2T_i/\theta_i$ has a chi-square distribution with $2r$ degrees of freedom. In other words, $T_i$ has a gamma distribution with scale parameter $\theta_i$ and shape parameter $r$. Thus, for selecting the population with the largest mean life $\theta_i$, the procedure $R_3$ (stated in terms of the $T_i$) will be

$$R_3\text{: Select } \pi_i \text{ if and only if } T_i \ge c \max_{1 \le j \le k} T_j \tag{3.6}$$

where $c$ is given by (3.3) with $\nu = r$.

3.3. Selection from two-parameter exponential distributions

Let $\pi_i$ have density

$$f(x; \theta_i, \sigma) = \frac{1}{\sigma} \exp\left\{-\frac{x - \theta_i}{\sigma}\right\}, \quad x > \theta_i;\ \theta_i, \sigma > 0;\ i = 1, \ldots, k. \tag{3.7}$$

The density (3.7) provides a model for life length data when we assume a minimum guaranteed life $\theta_i$, which is here a location parameter. It is assumed that all the $k$ populations have a common scale parameter $\sigma$. The $\theta_i$ are unknown and our interest is in selecting the population associated with the largest $\theta_i$. We will discuss some procedures under the IZ formulation.

Consider the generalized goal of selecting a subset of fixed size $s$ so that the $t$ best populations ($1 \le t \le s < k$) are included in the selected subset. This generalized goal was introduced by Desu and Sobel (1968). The special case of $t = s$, namely, that of choosing $t$ populations so that they are the $t$ best, was considered originally by Bechhofer (1954). When $s = t = 1$, we get the basic goal of selecting the best population.
The probability requirement is that

$$\mathrm{PCS} \ge P^* \quad \text{whenever} \quad \theta_{[k-t+1]} - \theta_{[k-t]} \ge \theta^* > 0 \tag{3.8}$$

where $\theta^*$ and $P^*$ are specified in advance and a correct selection occurs when a subset of $s$ populations is selected consistent with the goal. Also, for a meaningful problem, we should have $1/\binom{k}{t} < P^* < 1$. In describing several procedures, we will adopt either the generalized goal or one of its special cases. We will consider the two cases of known and unknown $\sigma$ separately.

Case A: Known $\sigma$. We can assume without loss of generality that $\sigma = 1$. Let $X_{ij}$, $j = 1, \ldots, n$, denote a sample of $n$ observations from $\pi_i$, $i = 1, \ldots, k$. Define $Y_i = \min_{1 \le j \le n} X_{ij}$, $i = 1, \ldots, k$. Raghavachari and Starr (1970) considered the goal of selecting the $t$ best populations (i.e. $1 \le s = t < k$) and they studied the 'natural' rule

$$R_5\text{: Select the } t \text{ populations associated with } Y_{[k-t+1]}, \ldots, Y_{[k]}. \tag{3.9}$$

The LFC for this rule is given by

$$\theta_{[1]} = \cdots = \theta_{[k-t]}; \quad \theta_{[k-t+1]} = \cdots = \theta_{[k]}; \quad \theta_{[k-t+1]} - \theta_{[k-t]} = \theta^*. \tag{3.10}$$

The minimum sample size required to satisfy (3.8) is the smallest integer $n$ for which

$$(1 - e^{-n\theta^*})^{k-t} + (k - t)\, e^{n\theta^* t}\, I(e^{-n\theta^*}; t + 1, k - t) \ge P^* \tag{3.11}$$

where

$$I(z; \alpha, \beta) = \int_0^z u^{\alpha - 1}(1 - u)^{\beta - 1}\, du, \quad \alpha, \beta > 0;\ 0 \le z \le 1. \tag{3.12}$$

Equivalently, we need the smallest integer $n$ such that

$$n\theta^* \ge -\log v, \tag{3.13}$$

where $v$ ($0 < v < 1$) is the solution of the equation

$$(1 - v)^{k-t} + (k - t)\, v^{-t}\, I(v; t + 1, k - t) = P^*. \tag{3.14}$$

Raghavachari and Starr (1970) have tabulated the $v$-values for $k = 2(1)15$, $t = 1(1)k - 1$, and $P^* = 0.90, 0.95, 0.975, 0.99$. In particular, for selecting the best population, the equation (3.14) reduces to

$$(vk)^{-1}[1 - (1 - v)^k] = P^*. \tag{3.15}$$

For the generalized goal, Desu and Sobel (1968) studied the following rule $R_6$.
$$R_6\text{: Select the } s \text{ populations associated with } Y_{[k-s+1]}, \ldots, Y_{[k]}.$$

Given $n$, $k$, $t$, $\theta^*$, and $P^*$, they have shown that the smallest $s$ for which the probability requirement (3.8) is satisfied is the smallest integer $s$ such that

$$\binom{s}{t} \ge P^* \binom{k}{t} e^{-nt\theta^*}. \tag{3.16}$$

It should be pointed out that Desu and Sobel (1968) have obtained general results for location parameter families. They have also considered the dual problem of selecting a subset of size $s$ ($s \le t$) so that all the selected populations are among the $t$ best.

Case B: Unknown $\sigma$. In this case, we consider the basic goal of selecting the best population. Since $\sigma$ is unknown, it is not possible to determine in advance the sample size needed for a single sample procedure in order to guarantee the $P^*$-condition. This is similar to the situation that arises in selecting the population with the largest mean from several normal populations with a common unknown variance. For this latter problem, Bechhofer, Dunnett and Sobel (1954) proposed a non-elimination type two-stage procedure in which the first stage samples are utilized purely for estimating the variance without eliminating any population from further consideration. A similar procedure was proposed by Desu, Narula and Villarreal (1977) for selecting the best exponential population. Kim and Lee (1985) have studied an elimination type two-stage procedure analogous to that of Gupta and Kim (1984) for the normal means problem. In their procedure, the first stage is used not only to estimate $\sigma$ but also to possibly eliminate non-contenders. Their Monte Carlo study shows that, when $\theta_{[k]} - \theta_{[k-1]}$ is sufficiently large, the elimination type procedure performs better than the other type of procedure in terms of the expected total sample size.

The procedure $R_7$ of Kim and Lee (1985) consists of two stages as follows.

Stage 1:
Take $n_0$ independent observations from each $\pi_i$ ($1 \le i \le k$), and compute $Y_i^{(1)} = \min_{1 \le j \le n_0} X_{ij}$, and a pooled estimate $\hat{\sigma}$ of $\sigma$, namely,

$$\hat{\sigma} = \sum_{i=1}^{k} \sum_{j=1}^{n_0} \left(X_{ij} - Y_i^{(1)}\right) \Big/ \{k(n_0 - 1)\}.$$

Determine a subset $I$ of $\{1, \ldots, k\}$ defined by

$$I = \left\{ i : Y_i^{(1)} \ge \max_{1 \le j \le k} Y_j^{(1)} - \left(2k(n_0 - 1)\hat{\sigma} h / n_0 - \theta^*\right)^+ \right\},$$

where the symbol $a^+$ denotes the positive part of $a$, and $h$ ($> 0$) is a design constant to be determined.
(a) If $I$ has only one element, stop sampling and assert that the population associated with $Y_{[k]}^{(1)}$ is the best.
(b) If $I$ has more than one element, go to the second stage.

Stage 2: Take $N - n_0$ additional observations $X_{ij}$ from each $\pi_i$ for $i \in I$, where

$$N = \max\{n_0, \langle 2k(n_0 - 1)\hat{\sigma} h / \theta^* \rangle\},$$

and the symbol $\langle y \rangle$ denotes the smallest integer equal to or greater than $y$. Then compute, for the overall sample, $Y_i = \min_{1 \le j \le N} X_{ij}$ and choose the population associated with $\max_{i \in I} Y_i$ as the best.

The constant $h$ used in the procedure $R_7$ is given by

$$\int_0^\infty \frac{\{1 - (1 - \alpha(x))^k\}^2}{k^2 \alpha^2(x)}\, f_\nu(x)\, dx = P^* \tag{3.17}$$

where $\alpha(x) = \exp(-hx)$ and $f_\nu(x)$ is the chi-square density with $\nu = 2k(n_0 - 1)$ degrees of freedom. The $h$-values have been tabulated by Kim and Lee (1985) for $P^* = 0.95$, $k = 2(1)5(5)20$, and $n_0 = 2(1)30$.

3.4. Selection from Weibull distributions

Let $\pi_i$ have a two-parameter Weibull distribution given by the cdf

$$F_i(x) \equiv F(x; \theta_i, c_i) = 1 - \exp\{-(x/\theta_i)^{c_i}\}, \quad x > 0;\ \theta_i, c_i > 0;\ i = 1, \ldots, k. \tag{3.18}$$

The $c_i$ and $\theta_i$ are unknown. Kingston and Patel (1980a, b) have considered the problem of selecting from Weibull distributions in terms of their reliabilities (survival probabilities) at an arbitrary but specified time $L > 0$. The reliability at $L$ for $F_i$ ($i = 1, \ldots, k$) is given by

$$p_i = 1 - F_i(L) = \exp\{-(L/\theta_i)^{c_i}\}. \tag{3.19}$$

We can without loss of generality assume that $L = 1$ because the observed failure times can be scaled so that $L = 1$ time unit. Further, letting $(\theta_i)^{c_i} = \lambda_i$, we get $p_i$
$= \exp\{-\lambda_i^{-1}\}$. Obviously, ranking the populations in terms of the $p_i$ is equivalent to ranking in terms of the $\lambda_i$, and the best population is the one associated with $\lambda_{[k]}$, the largest $\lambda_i$.

Kingston and Patel (1980a) considered the problem of selecting the best one under the IZ formulation using the natural procedure based on estimates of the $\lambda_i$ constructed from type II censored samples. They also considered the problem of selecting the best in terms of the $\alpha$-quantiles for a given $\alpha \in (0, 1)$, $\alpha \ne 1 - e^{-1}$, in the case where $\theta_1 = \cdots = \theta_k = \theta$ (unknown). The $\alpha$-quantile of $F_i$ is given by $\xi_i = \theta[-\log(1 - \alpha)]^{1/c_i}$ so that ranking in terms of the $\alpha$-quantiles is equivalent to ranking in terms of the shape parameter. It should be noted that the ranking of the $c_i$ is in the same order as that of the associated $\xi_i$ if $\alpha < 1 - e^{-1}$, and is in the reverse order if $\alpha > 1 - e^{-1}$. The procedures discussed above are based on maximum likelihood estimators as well as simplified linear estimators (SLE) considered by Bain (1978, p. 265). For further details on these procedures, see Kingston and Patel (1980a).

In another paper, Kingston and Patel (1980b) considered the goal of selecting a subset of restricted size. This formulation, usually referred to as the restricted subset selection (RSS) approach, is due to Gupta and Santner (1973) and Santner (1975). In the usual SS approach of Gupta (1956), it is possible that the procedure selects all the $k$ populations. In the RSS approach, we restrict the size of the selected subset by specifying an upper bound $m$ ($1 \le m \le k - 1$); the size of the selected subset is still a random variable taking on values $1, 2, \ldots, m$. Thus it is a generalization of the usual approach ($m = k$). However, in doing so, an indifference zone is introduced. The selection goal can be more general than selecting the best.
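The two ranking equivalences used above for Weibull populations, namely (3.19) with $L = 1$ and the quantile expression $\xi_i = \theta[-\log(1 - \alpha)]^{1/c_i}$ for a common $\theta$, can be checked numerically. The sketch below is only an illustration: the parameter values are hypothetical and the function names are not taken from Kingston and Patel.

```python
import math

def weibull_reliability(theta, shape, t=1.0):
    """Survival probability exp{-(t/theta)^shape} of the Weibull cdf (3.18)."""
    return math.exp(-((t / theta) ** shape))

def weibull_quantile(theta, shape, alpha):
    """alpha-quantile theta * [-log(1 - alpha)]^(1/shape)."""
    return theta * (-math.log(1.0 - alpha)) ** (1.0 / shape)

def rank_order(values):
    """Indices of the values sorted from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Hypothetical scale and shape parameters for k = 3 populations.
thetas = [1.5, 2.0, 0.8]
shapes = [2.0, 0.5, 1.0]

# Ranking by lambda_i = theta_i^{c_i} matches ranking by reliability at L = 1.
lambdas = [th ** c for th, c in zip(thetas, shapes)]
reliabilities = [weibull_reliability(th, c) for th, c in zip(thetas, shapes)]
same_order = rank_order(lambdas) == rank_order(reliabilities)

# With a common theta, quantiles follow the shape order below 1 - 1/e
# and the reverse order above it.
common_theta = 1.0
low_quantiles = [weibull_quantile(common_theta, c, 0.30) for c in shapes]
high_quantiles = [weibull_quantile(common_theta, c, 0.90) for c in shapes]
```

Here `same_order` is true, `low_quantiles` is ordered like `shapes`, and `high_quantiles` is ordered in reverse, since $0.30 < 1 - e^{-1} < 0.90$.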
We now consider a generalized goal in the RSS approach for selection from Weibull populations, namely, to select a subset of the $k$ given populations not exceeding $m$ in size such that the selected subset contains at least $s$ of the $t$ best populations. As before, the populations are ranked in terms of their $\lambda$-values. Note that $1 \le s \le \min(t, m) \le k$. The probability requirement now is that

$$\mathrm{PCS} \ge P^* \quad \text{whenever} \quad \lambda = (\lambda_1, \ldots, \lambda_k) \in \Omega_{\lambda^*}, \tag{3.20}$$

where

$$\Omega_{\lambda^*} = \{\lambda : \lambda^* \lambda_{[k-t]} \le \lambda_{[k-t+1]},\ \lambda^* \ge 1\}. \tag{3.21}$$

When $t = s = m$ and $\lambda^* > 1$, the problem reduces to selecting the $t$ best populations using the IZ formulation. When $s = t < m = k$ and $\lambda^* = 1$, the problem reduces to selecting a subset of random size containing the $t$ best populations (the usual SS approach). Thus the RSS approach integrates the formulations of Bechhofer (1954), Gupta (1956), and Desu and Sobel (1968). General theory under the RSS approach is given by Santner (1975).

Returning to the Weibull selection problem with the generalized RSS goal, Kingston and Patel (1980b) studied a procedure based on type II censored samples from each population. It is defined in terms of the maximum likelihood estimators (or the SLE estimators) $\hat{\lambda}_i$. This procedure is

$$R_8\text{: Include } \pi_i \text{ in the selected subset if and only if } \hat{\lambda}_i \ge \max\{\hat{\lambda}_{[k-m+1]},\ c\,\hat{\lambda}_{[k-t+1]}\}, \tag{3.22}$$

where $c \in [0, 1]$ is suitably chosen to satisfy (3.20). Let $n$ denote the common sample size and consider censoring each sample at the $r$th failure. For given $k$, $r$, $n$, $s$, $t$, and $m$, we have three quantities associated with the procedure $R_8$, namely, $P^*$, $c$, and $\lambda^* > 0$. Given two of these, one can find the third; however, the solution may not be admissible. For example, for some $P^*$ and $\lambda^*$, there may not be a constant $c \in [0, 1]$ so that (3.20) is satisfied unless $m = k$. Kingston and Patel (1980b) have given a few tables of $\lambda^*$-values for selected values of other constants.
Their table values are based on Monte Carlo techniques and the choice of SLE's.

4. Nonparametric and distribution-free procedures

Parametric families of distributions serve as life models in situations where there are strong reasons to select a particular family. For example, the model may fit data on hand well, or there may be a good knowledge of the underlying aging or failure process that indicates the appropriateness of the model. But there are many situations in which it becomes desirable to avoid strong assumptions about the model. Nonparametric or distribution-free procedures are important in this context.

Gupta and McDonald (1982) have surveyed nonparametric selection and ranking procedures applicable to one-way classification, two-way classification, and paired-comparison models. These procedures are based on rank scores and/or robust estimators such as the Hodges-Lehmann estimator. For the usual types of procedures based on ranks, the LFC is not always the one corresponding to identical distributions. Since all these nonparametric procedures are relevant in the context of selection from life length distributions, the reader is best referred to the survey papers of Gupta and McDonald (1982), Gupta and Panchapakesan (1985), and Chapters 8 and 15 of Gupta and Panchapakesan (1979).

There have been some investigations of subset selection rules based on ranks while still assuming that the distributions associated with the populations are known. This is appealing especially in situations in which the order of the observations is more readily available than the actual measurements themselves due, perhaps, to excessive cost or other physical constraints.
Under this setup, Nagel (1970), Gupta, Huang and Nagel (1979), Huang and Panchapakesan (1982), and Gupta and Liang (1987) have investigated locally optimal subset selection rules which satisfy the validity criterion that the infimum of the PCS is $P^*$ when the distributions are identical. They have used different optimality criteria in some neighborhood of an equiparameter point in the parameter space. An account of these rules is given in Gupta and Panchapakesan (1985).

Characterizations of life length distributions are provided in many situations by so-called restricted families of distributions, which are defined by partial order relations with respect to known distributions. Well-known examples of such families are those with increasing (decreasing) failure rate and increasing (decreasing) failure rate average. Selection procedures for such families will be discussed in the next section. In the remaining part of this section, we will be mainly concerned with nonparametric procedures for selection in terms of a quantile and selection from several Bernoulli distributions. Though the Bernoulli selection problem could have been discussed under parametric models, it is discussed here to emphasize the fact that we can use the Bernoulli selection procedures as distribution-free procedures for selecting from unknown continuous (life) distributions in terms of reliability at any arbitrarily chosen time point $L$.

4.1. Selection in terms of quantiles

Let $\pi_1, \ldots, \pi_k$ be $k$ populations with continuous distributions $F_i(x)$, $i = 1, \ldots, k$, respectively. Given $0 < \alpha < 1$, let $x_\alpha(F)$ denote the $\alpha$th quantile of $F$. It is assumed that the $\alpha$-quantiles of the $k$ populations are unique. The populations are ranked according to their $\alpha$-quantiles. The population associated with the largest $\alpha$-quantile is defined to be the best. Rizvi and Sobel (1967) proposed a procedure for selecting a subset containing the best.
Let $n$ denote the common size of the samples from the given populations and assume $n$ to be sufficiently large so that $1 \le (n + 1)\alpha \le n$. Let $r$ be a positive integer such that $r \le (n + 1)\alpha < r + 1$. It follows that $1 \le r \le n$. Let $Y_{j,i}$ denote the $j$th order statistic in the sample from $\pi_i$, $i = 1, \ldots, k$. The procedure of Rizvi and Sobel (1967) is

$$R_9\text{: Select } \pi_i \text{ if and only if } Y_{r,i} \ge \max_{1 \le j \le k} Y_{r-c,j} \tag{4.1}$$

where $c$ is the smallest integer with $1 \le c \le r - 1$ for which the $P^*$-condition is satisfied. For the procedure $R_9$, the infimum of the PCS is attained when the distributions $F_1, \ldots, F_k$ are identical and it is shown by Rizvi and Sobel (1967) that $c$ is the smallest integer with $1 \le c \le r - 1$ satisfying

$$\int_0^1 G_{r-c}^{k-1}(u)\, dG_r(u) \ge P^* \tag{4.2}$$

where

$$G_r(u) = \frac{n!}{(r - 1)!\,(n - r)!} \int_0^u x^{r-1}(1 - x)^{n-r}\, dx, \quad 0 \le u \le 1. \tag{4.3}$$

Rizvi and Sobel have shown that the maximum permissible value of $P^*$ such that a $c$-value satisfying (4.2) exists is $P_1 = P_1(n, \alpha, k)$ given by

$$P_1 = \sum_{i=0}^{k-1} (-1)^i \binom{k-1}{i} \binom{n}{r} \Big/ \binom{n(i+1)}{r}. \tag{4.4}$$

A short table of $P_1$-values is given by Rizvi and Sobel for $\alpha = 0.5$ and $k = 2(1)10$. The $n$-values range from 1 in steps of 2 to a value (depending on $k$) for which $P_1$ gets very close to 1. Also given by them is a table of the largest value of $r - c$ for $\alpha = 1/2$ (which means that $r = (n + 1)/2$), $k = 2(1)10$, $n = 5(10)95(50)495$, and $P^* = 0.75, 0.90, 0.95, 0.975, 0.99$. For the IZ approach to this selection problem, see Sobel (1967).

4.2. Distribution-free procedures using Bernoulli model

Let $\pi_1, \ldots, \pi_k$ be $k$ populations with the associated continuous (life) distributions $F_1, \ldots, F_k$, respectively. The reliability of $\pi_i$ at $L$ is $p_i = 1 - F_i(L)$. Let $X_{ij}$, $j = 1, \ldots, n$, be sample observations from $\pi_i$, $i = 1, \ldots, k$. Define

$$Y_{ij} = \begin{cases} 1 & \text{if } X_{ij} > L, \\ 0 & \text{otherwise,} \end{cases} \quad i = 1, \ldots, k;\ j = 1, \ldots, n. \tag{4.1}$$

The $Y_{i1}, \ldots, Y_{in}$ are independent and identically distributed Bernoulli random variables with success probability $p_i$, $i = 1, \ldots, k$.
We are interested in selecting the population associated with the largest $p_i$. Gupta and Sobel (1960) proposed a subset selection rule based on $Y_i = \sum_{j=1}^{n} Y_{ij}$, $i = 1, \ldots, k$. Their rule is

R10: Select $\pi_i$ if and only if
\[ Y_i \ge \max_{1 \le j \le k} Y_j - D, \tag{4.2} \]

where D is the smallest nonnegative integer for which the $P^*$-requirement is met. An interesting feature of Procedure R10 is that the infimum of the PCS occurs when $p_1 = \cdots = p_k = p$ (say) but it is not independent of their common value p. For k = 2, Gupta and Sobel (1960) showed that the infimum takes place when p = 1/2. When k > 2, the common value $p_0$ for which the infimum takes place is not known. However, it is known that this common value $p_0 \to 1/2$ as $n \to \infty$.

An improvement in the situation is provided by Gupta, Huang and Huang (1976) who investigated conditional selection rules and, using the conditioning argument, obtained a conservative value of d. Their conditional procedure is

R11: Select $\pi_i$ if and only if
\[ Y_i \ge \max_{1 \le j \le k} Y_j - D(t), \tag{4.3} \]

given $T = \sum_{i=1}^{k} Y_i = t$, where D(t) > 0 is chosen to satisfy the $P^*$-condition. An exact result for the infimum of the PCS is obtained only for k = 2; in this case, the infimum is attained when $p_1 = p_2 = p$ and is independent of the common value p. For k > 2, Gupta, Huang and Huang (1976) obtained a conservative value for D(t) and also for D of Rule R10. They have shown that $\inf P(\mathrm{CS} \mid R_{11}) \ge P^*$ if D(t) is chosen such that
\[ D(t) = \begin{cases} d(t) & \text{for } k = 2, \\ \max\{d(r) : r = 0, 1, \ldots, \min(t, 2n)\} & \text{for } k > 2, \end{cases} \tag{4.4} \]
where d(r) is defined as the smallest value such that
\[ N(2; d(r), r, n) \ge \begin{cases} P^* \binom{2n}{r} & \text{for } k = 2, \\ \left[1 - (1 - P^*)(k-1)^{-1}\right] \binom{2n}{r} & \text{for } k > 2, \end{cases} \tag{4.5} \]
and
\[ N(k; d(t), t, n) = \sum \binom{n}{s_1} \cdots \binom{n}{s_k}, \]
with the summation taken over the set of all nonnegative integers $s_i$ such that $\sum_{i=1}^{k} s_i = t$ and $s_k \ge \max_{1 \le j \le k-1} s_j - d(t)$. A conservative constant d for Procedure R10 is given by $d = \max_{0 \le t \le kn} d(t)$.
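The conservative constants in (4.4) and (4.5) involve only finite sums of binomial coefficients, so they can be computed exactly. The sketch below is an illustration under the assumption that d(r) ranges over the nonnegative integers 0, 1, ..., n; the function names are hypothetical and are not taken from Gupta, Huang and Huang (1976).

```python
from math import comb

def n2(d, r, n):
    """N(2; d, r, n): sum of C(n, s1) * C(n, s2) over s1 + s2 = r, s2 >= s1 - d."""
    total = 0
    for s1 in range(max(0, r - n), min(n, r) + 1):
        s2 = r - s1
        if s2 >= s1 - d:
            total += comb(n, s1) * comb(n, s2)
    return total

def d_of_r(r, n, k, p_star):
    """Smallest nonnegative integer d satisfying (4.5) for the given r."""
    target = (1 - (1 - p_star) / (k - 1)) * comb(2 * n, r)
    for d in range(n + 1):
        if n2(d, r, n) >= target:
            return d
    return n  # d = n always satisfies (4.5); reached only through rounding

def big_d(t, n, k, p_star):
    """Conservative D(t) of (4.4)."""
    if k == 2:
        return d_of_r(t, n, k, p_star)
    return max(d_of_r(r, n, k, p_star) for r in range(min(t, 2 * n) + 1))

def select_r11(counts, n, p_star):
    """Apply rule R11 to the success counts Y_1, ..., Y_k (0-based indices)."""
    k = len(counts)
    d_t = big_d(sum(counts), n, k, p_star)
    top = max(counts)
    return [i for i, y in enumerate(counts) if y >= top - d_t]
```

For instance, `select_r11([3, 1, 4], n=5, p_star=0.90)` returns the indices of the retained populations; the population with the largest count is always among them.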
Gupta, Huang and Huang (1976) have tabulated the smallest value d(t) satisfying (4.5) for k = 2, 4(1)10, n = 1(1)10, t = 1(1)20, and $P^*$ = 0.75, 0.90, 0.95, 0.99. They have also tabulated the d-values (conservative) for Procedure R10 for $P^*$ = 0.75, 0.90, 0.95, 0.99, and n = 1(1)4 when k = 3(1)15, and n = 5(1)10 when k = 3(1)5.

Under the IZ formulation, one can use the procedure of Sobel and Huyett (1957) for selecting the population associated with the largest $p_i$ which guarantees a minimum PCS $P^*$ whenever $p_{[k]} - p_{[k-1]} \ge \Delta^* > 0$. Based on samples of size n from each population, their procedure based on the $Y_i$ defined in (4.1) is

R12: Select the population associated with the largest $Y_i$, using randomization to break ties, if any. (4.6)

The sample size required is the smallest n for which the PCS $\ge P^*$ when $p_{[1]} = \cdots = p_{[k-1]} = p_{[k]} - \Delta^*$, the LFC in this case. Sobel and Huyett (1957) have tabulated the sample sizes (exact and approximate) for k = 2, 3, 4, 10; $\Delta^*$ = 0.05(0.05)0.50, and $P^*$ = 0.50, 0.60, 0.75(0.05)0.95, 0.99. When n is large, the normal approximation to the PCS yields
\[ n \approx c^2 (1 - \Delta^{*2}) / (4 \Delta^{*2}), \tag{4.7} \]
where $c = c(k, P^*)$ is the constant satisfying
\[ \int_{-\infty}^{\infty} \Phi^{k-1}(x + c) \, \varphi(x) \, dx = P^*, \tag{4.8} \]
and $\Phi$ and $\varphi$ denote, respectively, the cdf and density of the standard normal distribution. The c-value can be obtained from tables of Bechhofer (1954), Gupta (1963b), Milton (1963) and Gupta, Nagel and Panchapakesan (1973) for several selected values of k and $P^*$.

The Bernoulli selection problem has applications to the drug selection problem and to clinical trials. This fact has spurred much research on selection procedures using sampling schemes such as the play-the-winner (PW) sampling rule (introduced by Robbins, 1952 and 1956) and the vector-at-a-time (VT) rule with a variety of stopping rules.
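As an aside on the large-sample approximation (4.7) and (4.8) for rule R12, the constant c(k, P*) can also be found numerically instead of from tables. The sketch below is one possible way to do so, assuming Simpson's rule on [-8, 8] and bisection on [0, 10] are adequate (it requires P* > 1/k); the function names are illustrative only.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()

def pcs_integral(c, k, lo=-8.0, hi=8.0, panels=2000):
    """Left-hand side of (4.8) by composite Simpson's rule."""
    h = (hi - lo) / panels
    total = 0.0
    for m in range(panels + 1):
        x = lo + m * h
        w = 1 if m in (0, panels) else (4 if m % 2 else 2)
        total += w * STD_NORMAL.cdf(x + c) ** (k - 1) * STD_NORMAL.pdf(x)
    return total * h / 3

def solve_c(k, p_star, tol=1e-8):
    """Bisection for c(k, P*); the integral is increasing in c."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pcs_integral(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def approx_sample_size(k, p_star, delta_star):
    """Approximation (4.7), rounded up to an integer sample size."""
    c = solve_c(k, p_star)
    return ceil(c * c * (1 - delta_star**2) / (4 * delta_star**2))
```

For k = 2 the integral in (4.8) reduces to $\Phi(c/\sqrt{2})$, which gives a convenient check on the numerical solution.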
One of the main considerations in many of these procedures is to design the sampling rule so as to minimize the expected total number of observations and/or the expected number of observations from the worst population. Some of these procedures suffer from one drawback or another. For excellent reviews and comprehensive assessments of these (and other) procedures, reference should be made to Bechhofer and Kulkarni (1982), Büringer, Martin and Schriever (1980), Gupta and Panchapakesan (1979, Sections 4.2 through 4.6), and Hoel, Sobel and Weiss (1975). For corresponding developments in subset selection theory, see Gupta and Panchapakesan (1979, Section 13.2).

5. Selection from restricted families of distributions

A restricted family of probability distributions is defined by a partial order relation with respect to a known distribution. As we have pointed out earlier, such families provide characterizations of life length distributions. Selection rules for such restricted families were first considered by Barlow and Gupta (1969). We define below the binary partial order relations (<) that have been used in studying selection procedures. These are partial orderings in the sense that they enjoy only reflexivity and transitivity properties, that is, (1) F < F for all distributions F, and (2) F < G, G < H implies F < H. Note that F < G and G < F do not necessarily imply $F \equiv G$.

DEFINITION 5.1.
(1) F is said to be convex with respect to G ($F <_c G$) if and only if $G^{-1}F(x)$ is convex on the support of F.
(2) F is said to be star-shaped with respect to G ($F <_* G$) if and only if F(0) = G(0) = 0, and $G^{-1}F(x)/x$ is increasing in $x \ge 0$ on the support of F.
(3) F is said to be r-ordered with respect to G ($F <_r G$) if and only if F(0) = G(0) = 1/2 and $G^{-1}F(x)/x$ is increasing (decreasing) in x positive (negative).
(4) F is said to be tail-ordered with respect to G ($F <_t G$) if and only if F(0) = G(0) = 1/2 and $G^{-1}F(x) - x$ is increasing on the support of F.

It is well-known that convex ordering implies star ordering. Further, when $G(x) = 1 - e^{-x}$ ($x \ge 0$), $F <_c G$ is equivalent to saying that F has an increasing failure rate (IFR) and $F <_* G$ is equivalent to saying that F has an increasing failure rate on the average (IFRA). Of course, if F is IFR, then it is also IFRA. IFR distributions were first studied in detail by Barlow, Marshall and Proschan (1963) and IFRA distributions by Birnbaum, Esary and Marshall (1966). The r-ordering was investigated by Lawrence (1975). Doksum (1969) used the tail-ordering. The convex ordering and s-ordering (not defined here) have been studied by van Zwet (1964). Without the assumption of the common median zero, Definition 5.1-(4) has been used by Bickel and Lehmann (1979) to define an ordering by spread with the germinal concept attributed to Brown and Tukey (1946). Saunders and Moran (1978) have also perceived this kind of ordering (called ordering by dispersion by them) in the context of a neurobiological problem.

Gupta and Panchapakesan (1974) have defined a general partial ordering through a class of real-valued functions, which provides a unified way to handle selection problems for star-ordered and tail-ordered families. Their ordering is defined as follows.

DEFINITION 5.2. Let $\mathcal{H} = \{h(x)\}$ be a class of real-valued functions h(x). Let F and G be distributions such that F(0) = G(0). F is said to be $\mathcal{H}$-ordered with respect to G ($F <_{\mathcal{H}} G$) if $G^{-1}F(h(x)) \ge h(G^{-1}F(x))$ for all $h \in \mathcal{H}$ and all x on the support of F.

It is easy to see that we get star-ordering and tail-ordering as special cases of $\mathcal{H}$-ordering by taking $\mathcal{H} = \{ax, \; a \ge 1\}$, F(0) = G(0) = 0, and $\mathcal{H} = \{x + b, \; b \ge 0\}$, F(0) = G(0) = 1/2, respectively. Hooper and Santner (1979) have used a modified definition of $\mathcal{H}$-ordering.
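To make Definition 5.1-(2) concrete, the sketch below checks numerically, on a finite grid, whether $G^{-1}F(x)/x$ is nondecreasing when G is the unit exponential. The Weibull example and all function names are assumptions chosen for illustration; a grid check of this kind can suggest, but not prove, star-ordering.

```python
import math

def exp_quantile(p):
    """G^{-1}(p) for the unit exponential G(x) = 1 - exp(-x)."""
    return -math.log(1.0 - p)

def weibull_cdf(shape):
    """Return F(x) = 1 - exp(-x**shape), a Weibull cdf with unit scale."""
    return lambda x: 1.0 - math.exp(-x**shape)

def is_star_shaped(cdf, g_inv, xs, tol=1e-12):
    """Check that g_inv(cdf(x)) / x is nondecreasing over the grid xs (x > 0)."""
    ratios = [g_inv(cdf(x)) / x for x in xs]
    return all(b >= a - tol for a, b in zip(ratios, ratios[1:]))

grid = [0.1 * i for i in range(1, 31)]  # 0.1, 0.2, ..., 3.0
ifra_like = is_star_shaped(weibull_cdf(2.0), exp_quantile, grid)
not_ifra = is_star_shaped(weibull_cdf(0.5), exp_quantile, grid)
```

Here $G^{-1}F(x) = x^{\lambda}$ for a Weibull shape $\lambda$, so the ratio is $x^{\lambda - 1}$, which is increasing exactly when $\lambda \ge 1$; the two grid checks reflect this.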
For some useful probability inequalities involving $\mathcal{H}$-ordering, see Gupta, Huang and Panchapakesan (1984).

5.1. Selection in terms of quantiles from star-ordered distributions

Let $\pi_1, \ldots, \pi_k$ have the associated absolutely continuous distributions $F_1, \ldots, F_k$, respectively. All the $F_i$ are star-shaped with respect to a known continuous distribution G. The population having the largest $\alpha$-quantile ($0 < \alpha < 1$) is defined as the best population. It is assumed that the best population is stochastically larger than any of the other populations. Under this setup, Barlow and Gupta (1969) proposed a procedure for selecting a subset containing the best. Let $T_{j,i}$ denote the jth order statistic in a sample of n independent observations from $\pi_i$, $i = 1, \ldots, k$, where n is assumed to be large enough so that $j \le (n+1)\alpha < j+1$ for some j. The Barlow-Gupta procedure is

R13: Select $\pi_i$ if and only if
\[ T_{j,i} \ge c \max_{1 \le r \le k} T_{j,r}, \tag{5.1} \]

where $c = c(k, P^*, n, j)$ is the largest number in (0, 1) for which the $P^*$-condition is satisfied. The constant c is given by
\[ \int_0^{\infty} G_j^{k-1}(x/c) \, g_j(x) \, dx = P^*, \tag{5.2} \]
where $G_j$ denotes the cdf of the jth order statistic in a sample of n observations from G, and $g_j$ is the corresponding density function. The values of c satisfying (5.2) are tabulated by Barlow, Gupta and Panchapakesan (1969) in the special case of exponential G, i.e. for selecting from IFRA populations, for $P^*$ = 0.75, 0.90, 0.95, 0.99, and the following values of k, n, and j: (i) j = 1, k = 2(1)11 (in this case, c is independent of n), (ii) k = 2(1)6, j = 2(1)n, and n = 5(1)10 or 12 or 15 depending on k. Table 2a is excerpted from the tables of Barlow, Gupta and Panchapakesan (1969).
It gives the values of c for $P^*$ = 0.90, 0.95, k = 2(1)5, n = 5(1)12, and j such that $j \le (n+1)/2 < j+1$ (i.e. appropriate for selection in terms of median).

Table 2a
Values of the constant c of Rule R13 satisfying equation (5.2) for selecting the IFRA distribution with the largest median; $G(x) = 1 - e^{-x}$, $x \ge 0$, $j \le (n+1)/2 < j+1$, $P^*$ = 0.90 (top entry), 0.95 (bottom entry)

  n      k = 2      k = 3      k = 4      k = 5
  5     0.32197    0.25464    0.22607    0.20924
        0.22871    0.18353    0.16388    0.15215
  6     0.32397    0.25665    0.22808    0.21123
        0.23045    0.18521    0.16551    0.15377
  7     0.38021    0.31045    0.27994    0.26164
        0.28527    0.23611    0.21406    0.20068
  8     0.38198    0.31228    0.28179    0.26351
        0.28692    0.23774    0.21568    0.20229
  9     0.42434    0.35398    0.32257    0.30353
        0.32973    0.27855    0.25515    0.24079
 10     0.42587    0.35559    0.32422    0.30519
        0.33121    0.28005    0.25665    0.24228
 11     0.45939    0.38927    0.35750    0.33808
        0.36592    0.31377    0.28958    0.27461
 12     0.46071    0.39069    0.35896    0.33956
        0.36724    0.31512    0.29094    0.27597

Table 2b
Values of the constant d of Rule R14 satisfying equation (5.4) for selecting the IFRA distribution with the smallest median; $G(x) = 1 - e^{-x}$, $x \ge 0$, $j \le (n+1)/2 < j+1$, $P^*$ = 0.90 (top entry), 0.95 (bottom entry)

  n      k = 2      k = 3      k = 4      k = 5
  5     0.32197    0.23711    0.19983    0.17752
        0.22871    0.17100    0.14516    0.12953
  6     0.32397    0.23881    0.20134    0.17891
        0.23045    0.17244    0.14643    0.13060
  7     0.38021    0.29477    0.25597    0.23226
        0.28527    0.22441    0.19623    0.17883
  8     0.38198    0.29636    0.25744    0.23365
        0.28692    0.22585    0.19755    0.18007
  9     0.42434    0.33988    0.30072    0.27650
        0.32972    0.26775    0.23845    0.22014
 10     0.42587    0.34131    0.30208    0.27779
        0.33121    0.26909    0.23971    0.22134
 11     0.45939    0.37647    0.33748    0.31315
        0.36592    0.30378    0.27399    0.25521
 12     0.46071    0.37775    0.33871    0.31433
        0.36724    0.30501    0.27516    0.25634
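Constants of the kind shown in Table 2a can in principle be recomputed from the defining equation for c in rule R13 with exponential G. The following sketch is an assumption-laden illustration rather than the method of Barlow, Gupta and Panchapakesan (1969): it substitutes $u = 1 - e^{-x}$ so that the integral runs over [0, 1], integrates by Simpson's rule, and finds c by bisection. Function names and numerical settings are hypothetical.

```python
from math import comb

def order_stat_cdf(v, j, n):
    """P(at least j of n uniform(0, 1) draws are <= v)."""
    return sum(comb(n, i) * v**i * (1 - v)**(n - i) for i in range(j, n + 1))

def pcs_r13(c, k, n, j, panels=4000):
    """Integral of G_j^{k-1}(x/c) g_j(x) dx for G(x) = 1 - exp(-x).

    With u = G(x): G(x/c) = 1 - (1 - u)**(1/c), and g_j(x) dx becomes the
    Beta(j, n - j + 1) density in u.
    """
    h = 1.0 / panels
    total = 0.0
    for m in range(panels + 1):
        u = m * h
        w = 1 if m in (0, panels) else (4 if m % 2 else 2)
        v = 1.0 - (1.0 - u) ** (1.0 / c)
        dens = j * comb(n, j) * u**(j - 1) * (1 - u)**(n - j)
        total += w * order_stat_cdf(v, j, n)**(k - 1) * dens
    return total * h / 3

def solve_c_r13(k, n, j, p_star, tol=1e-7):
    """Largest c in (0, 1) with PCS >= P*; the PCS is decreasing in c."""
    lo, hi = 1e-3, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pcs_r13(mid, k, n, j) >= p_star:
            lo = mid  # condition still holds, so a larger c may be possible
        else:
            hi = mid
    return lo
```

For j = 1 and k = 2 the integral equals 1/(1 + c) for every n, so c = (1 - P*)/P*, which provides a simple check. A call such as `solve_c_r13(k=2, n=5, j=3, p_star=0.90)` can then be compared with the corresponding entry of Table 2a.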
For the selection of the population with the smallest $\alpha$-quantile (assumed to be stochastically smaller than any other $F_i$) the analogous procedure is

R14: Select $\pi_i$ if and only if
\[ d \, T_{j,i} \le \min_{1 \le r \le k} T_{j,r}, \tag{5.3} \]

where $d = d(k, P^*, n, j)$ is the largest number in (0, 1) satisfying the $P^*$-condition and is given by
\[ \int_0^{\infty} [1 - G_j(xd)]^{k-1} g_j(x) \, dx = P^*, \tag{5.4} \]
where $G_j$ and $g_j$ are defined as in (5.2). Barlow, Gupta and Panchapakesan (1969) have tabulated the values of d in the case of exponential G for $P^*$ = 0.75, 0.90, 0.95, 0.99 and the following values of k, n, and j: (i) j = 1, k = 2(1)11 (d is independent of n), (ii) k = 2(1)6, j = 2(1)n, n = 5(1)12 for k = 6, and n = 5(1)15 for other k values. Table 2b is excerpted from the tables of Barlow, Gupta and Panchapakesan (1969). It gives the values of d for $P^*$ = 0.90, 0.95, k = 2(1)5, n = 5(1)12, and j such that $j \le (n+1)/2 < j+1$ (i.e. appropriate for selection in terms of median).

Suppose that G is the Weibull distribution with cdf $G(x) = 1 - \exp\{-(x/\theta)^{\lambda}\}$, $x \ge 0$, and $\theta, \lambda > 0$. It is assumed that $\lambda$ is known. Then it is easy to see that the new constant $c_1$ is given by $c_1 = c^{1/\lambda}$, where c is the constant in the exponential case ($\lambda = 1$).

Another interesting special case of G is the half-normal distribution obtained by folding $N(0, \sigma^2)$ at the origin, where $\sigma$ is assumed to be known. The class of distributions which are star-shaped with respect to this folded normal is a subclass of IFRA distributions. Selection in terms of quantiles in this case has been considered by Gupta and Panchapakesan (1975), who have tabulated the constant c associated with R13 for k = 2(1)10, n = 5(1)10, j = 1(1)n, and $P^*$ = 0.75, 0.90, 0.95, 0.99.

5.2. Selection in terms of medians from tail-ordered distributions

Barlow and Gupta (1969) considered also the selection of the population with the largest median (assumed to be stochastically larger than other populations) from a set of distributions $F_i$, $i = 1, \ldots, k$, which have lighter tails than a specified distribution G with G(0) = 1/2. This means that, for each i, $F_i$ centered at its median $\Delta_i$ is r-ordered with respect to G, and $(d/dx)F_i(x + \Delta_i)|_{x=0} \ge (d/dx)G(x)|_{x=0}$. This definition of $F_i$ having a lighter tail than G used by them implies that $F_i$ centered at $\Delta_i$ is tail-ordered with respect to G. The procedure of Barlow and Gupta (1969) has been shown by Gupta and Panchapakesan (1974) to work for this wider class defined using tail-ordering. Actually, Gupta and Panchapakesan have also shown a generalized version of this by considering tail-ordering of $F_i$ and G when both are centered at their respective $\alpha$-quantiles. For selection in terms of medians, the procedure of Barlow and Gupta is

R15: Select $\pi_i$ if and only if
\[ T_{j,i} \ge \max_{1 \le r \le k} T_{j,r} - D, \qquad j \le (n+1)/2 < j+1, \tag{5.5} \]

where the $T_{j,r}$ are defined as in the case of the procedure R13, and the appropriate constant $D = D(k, P^*, n) > 0$ is given by
\[ \int_{-\infty}^{\infty} G_j^{k-1}(t + D) \, g_j(t) \, dt = P^*. \tag{5.6} \]
Here, $G_j$ and $g_j$ are the cdf and the density of the jth order statistic in a sample of n independent observations from G. The values of D are given by Gupta and Panchapakesan (1974) in the special case where G is the logistic distribution, $G(x) = [1 + e^{-x}]^{-1}$, for k = 2(1)10, n = 5(2)15, and $P^*$ = 0.75, 0.90, 0.95, 0.99.

Using the $\mathcal{H}$-ordering (Definition 5.2) with the functions h satisfying certain properties, Gupta and Panchapakesan (1974) have discussed a class of procedures for selecting the best (i.e. the one which is stochastically larger than any other, assumed to exist) of k distributions $F_i$, $i = 1, \ldots, k$, which are $\mathcal{H}$-ordered with respect to G.
The procedures R13 and R15 are special cases of their procedure.

Hooper and Santner (1979) considered selection of good populations in terms of $\alpha$-quantiles for star- and tail-ordered distributions using the RSS approach. Let $\pi_i$ have the distribution $F_i$ and let $F_{[i]}$ denote the distribution having the ith smallest $\alpha$-quantile. Denoting the $\alpha$-quantile of any distribution F by $x_\alpha(F)$, $\pi_i$ is called a good population if $x_\alpha(F_i) > c^* x_\alpha(F_{[k-t+1]})$, $0 < c^* < 1$, in the case of star-ordered families, and if $x_\alpha(F_i) > x_\alpha(F_{[k-t+1]}) - d^*$, $d^* > 0$, in the case of tail-ordered families. The goal of Hooper and Santner (1979) is to select a subset of size not exceeding m ($1 \le m \le k-1$) that contains at least one good population. They have also considered the problem of selecting a subset of fixed size s so as to include at least r good populations ($r \le t$, $r \le s < k - t + r$) using the IZ approach.

Selection of one or more good populations as a goal is a relaxation from that of selecting the best population(s). A good population is defined suitably to reflect the fact that it is 'nearly' as good as the best. In some form or other it has been considered by several authors; mention should be made of Fabian (1962), Lehmann (1963), Desu (1970), Carroll, Gupta and Huang (1975), and Panchapakesan and Santner (1977). A discussion of this can be found in Gupta and Panchapakesan (1985, Section 4.2).

5.3. Selection from convex ordered distributions

Let $\pi_1, \ldots, \pi_k$ have absolutely continuous distributions $F_1, \ldots, F_k$, respectively, of which one is assumed to be stochastically larger than the rest. This distribution, denoted by $F_{[k]}$, is defined to be the best. It is assumed that $F_{[k]} <_c G$, where G is a known continuous distribution. All distributions in the context are assumed to have the positive real line as the support. Let $X^{(i)}_{j,n}$ ($Y_{j,n}$) denote the jth order statistic in a random sample of size n from $F_i$ (G). Considering samples of size n from $F_1, \ldots, F_k$ each censored at the rth failure, define
\[ T_i = \sum_{j=1}^{r} a_j X^{(i)}_{j,n}, \qquad i = 1, \ldots, k, \tag{5.7} \]
where
\[ a_j = gG^{-1}\!\left(\frac{j-1}{n}\right) - gG^{-1}\!\left(\frac{j}{n}\right), \quad j = 1, \ldots, r-1, \qquad a_r = gG^{-1}\!\left(\frac{r-1}{n}\right), \tag{5.8} \]
and g is the density associated with G. If $G(y) = 1 - e^{-y}$, $y \ge 0$, then $a_1 = \cdots = a_{r-1} = 1/n$, and $a_r = (n - r + 1)/n$. Consequently, $nT_i = \sum_{j=1}^{r-1} X^{(i)}_{j,n} + (n - r + 1) X^{(i)}_{r,n}$, the well-known total life statistic until the rth failure from $F_i$.

Now, for selecting a subset containing $F_{[k]}$, Gupta and Lu (1979) proposed the rule

R16: Select $\pi_i$ if and only if
\[ T_i \ge c \max_{1 \le j \le k} T_j, \tag{5.9} \]

where c is the largest number in (0, 1) satisfying the $P^*$-condition. They have shown that, if $a_j \ge 0$ for $j = 1, \ldots, r$, $a_r \ge c$, and $g(0) \le 1$, then
\[ \inf_{\Omega} P(\mathrm{CS} \mid R_{16}) = \int_0^{\infty} G_T^{k-1}(y/c) \, dG_T(y), \tag{5.10} \]
where $G_T$ is the distribution of $T = \sum_{j=1}^{r} a_j Y_{j,n}$, and $\Omega$ is the space of all k-tuples $(F_1, \ldots, F_k)$ such that there is one among them which is stochastically larger than the others and is convex with respect to G. Thus, the constant $c = \min(a_r, c^*)$, where $c^*$ is the solution for c obtained by equating the right-hand side of (5.10) to $P^*$. For the special case of $G(y) = 1 - e^{-y}$, $y \ge 0$, we get $c = \min(c^*, (n - r + 1)/n)$. This special case is a slight generalization of the results of Patel (1976).

6. Comparison with a standard or control

Although the experimenter is generally interested in selecting the best of k ($\ge 2$) competing categories, in some situations even the best one among them may not be good enough to warrant its selection. Such a situation arises when the goodness of a population is defined in comparison with a standard (known) or a control population. For convenience, we may refer to either one as the control.

Let $\pi_1, \ldots, \pi_k$ be the k (experimental) populations with associated distribution functions $F(x, \theta_i)$, $i = 1, \ldots, k$, respectively. The $\theta_i$ are unknown.
Let $\theta_0$ be the specified standard or the unknown parameter associated with the control population $\pi_0$ whose distribution function is $F(x, \theta_0)$. Several different goals have been considered in the literature. For example, one may want to select the best experimental population (i.e. the one associated with $\theta_{[k]}$, the largest $\theta_i$) provided that it is better than the control (i.e. $\theta_{[k]} > \theta_0$), and not to select any of them otherwise. An alternative goal is to select a subset (of random size) of the k populations which includes all those populations that are better than the control. Some of the early papers dealing with these problems are Paulson (1952), Dunnett (1955), and Gupta and Sobel (1958).

One can define a good population in different ways using comparison with a control. For example, $\pi_i$ may be called good if $\theta_i > \theta_0 + \Delta$, or $|\theta_i - \theta_0| \le \Delta$ for some $\Delta > 0$. Several procedures have been investigated with the goal of selecting good populations or those better than the control and these will not be described here. A good account of these can be had from Gupta and Panchapakesan (1979, Chapter 20). A review of subset selection procedures in this context, including recent developments, is contained in Gupta and Panchapakesan (1985).

An important aspect of the recent developments is the so-called isotonic procedures which become relevant in the situations where it is known that $\theta_1 \le \theta_2 \le \cdots \le \theta_k$ although the values of the $\theta_i$ are unknown. This is typical, for example, of experiments involving different dose levels of a drug so that the treatment effects will have a known ordering. Suppose that a population $\pi_i$ is defined to be good if $\theta_i \ge \theta_0$ and bad otherwise. For the goal of selecting all the good populations, any reasonable procedure R should have the property: If R selects $\pi_i$ then it selects all populations $\pi_j$ for j > i. This is the isotonic behavior of R.
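The isotonic behavior described above can be illustrated with a generic sketch. The code below is not any of the published isotonic selection procedures: it computes a nondecreasing least-squares fit by the pool-adjacent-violators algorithm and then applies a hypothetical threshold rule (the `margin` parameter is an assumption introduced here). Because the fitted values are nondecreasing, the selected set automatically contains every population with a larger index than any selected one.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block holds [mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

def select_good(estimates, theta0, margin=0.0):
    """Hypothetical rule: keep pi_i when its isotonic estimate >= theta0 - margin."""
    fit = pava(estimates)
    return [i for i, f in enumerate(fit) if f >= theta0 - margin]
```

In practice the margin would be chosen to meet a probability requirement such as the $P^*$-condition; how that constant is determined depends on the model and is not shown here.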
Naturally, one would consider procedures based on isotonic estimators of the $\theta_i$. Such procedures have been recently studied by Gupta and Yang (1984) in the case of normal means (common variance $\sigma^2$, known or unknown), by Gupta and Huang (1984) in the case of binomial populations with success probabilities $\theta_i$, and by Gupta and Leu (1986) in the case of two-parameter exponential populations with guarantee times (location parameters) $\theta_i$ and common (known or unknown) scale parameter. All these papers deal with both cases of known and unknown $\theta_0$.

7. Concluding remarks

In the preceding sections, we have described several selection procedures that have special significance in reliability studies. However, we have confined our attention to the classical type procedures since they are of common interest to a wide variety of users. We have also generally restricted ourselves to single-stage procedures. There is ample literature on two-stage and sequential procedures. Further, we have not discussed decision-theoretic formulations and Bayes and empirical Bayes procedures. There have been substantial developments in these regards, especially using the subset selection approach, in the last ten years. For a comprehensive survey of developments until the late 1970's, we refer to Gupta and Panchapakesan (1979). A critical review of developments in the subset selection theory including very recent developments is given by Gupta and Panchapakesan (1985).

References

Bain, L. (1978). Statistical Analysis of Reliability and Life-Testing Models, Theory and Methods. Marcel Dekker, New York.
Barlow, R. E. and Gupta, S. S. (1969). Selection procedures for restricted families of distributions. Ann. Math. Statist. 40, 905-917.
Barlow, R. E., Gupta, S. S. and Panchapakesan, S. (1969). On the distribution of the maximum and minimum of ratios of order statistics. Ann. Math. Statist. 40, 918-934.
Barlow, R. E., Marshall, A. W. and Proschan, F. (1963). Properties of probability distributions with monotone hazard rate. Ann. Math. Statist. 34, 375-389.
Bechhofer, R. E. (1954). A single-sample multiple decision procedure for ranking means of normal populations with known variances. Ann. Math. Statist. 25, 16-39.
Bechhofer, R. E., Dunnett, C. W. and Sobel, M. (1954). A two-sample multiple-decision procedure for ranking means of normal populations with a common unknown variance. Biometrika 41, 170-176.
Bechhofer, R. E., Kiefer, J. and Sobel, M. (1968). Sequential Identification and Ranking Procedures (with special reference to Koopman-Darmois populations). The University of Chicago Press, Chicago.
Bechhofer, R. E. and Kulkarni, R. V. (1982). Closed adaptive sequential procedures for selecting the best of k ≥ 2 Bernoulli populations. In: S. S. Gupta and J. O. Berger, eds., Statistical Decision Theory and Related Topics--III, Vol. 1, Academic Press, New York, 61-108.
Berger, R. L. (1979). Minimax subset selection for loss measured by subset size. Ann. Statist. 7, 1333-1338.
Berger, R. L. and Gupta, S. S. (1980). Minimax subset selection rules with applications to unequal variance (unequal sample size) problems. Scand. J. Statist. 7, 21-26.
Bickel, P. J. and Lehmann, E. L. (1979). Descriptive statistics for nonparametric models IV. Spread. In: Jana Jureckova, ed., Contributions to Statistics: Jaroslav Hajek Memorial Volume, Reidel, Boston, 3-40.
Birnbaum, Z. W., Esary, J. D. and Marshall, A. W. (1966). A stochastic characterization of wear-out for components and systems. Ann. Math. Statist. 37, 816-825.
Brown, G. and Tukey, J. W. (1946). Some distributions of sample means. Ann. Math. Statist. 17, 1-12.
Büringer, H., Martin, H. and Schriever, K.-H. (1980). Nonparametric Sequential Selection Procedures. Birkhäuser, Boston, MA.
Carroll, R. J., Gupta, S. S. and Huang, D.-Y. (1975). On selection procedures for the t best populations and some related problems. Comm. Statist. 4, 987-1008.
Desu, M. M. (1970). A selection problem. Ann. Math. Statist. 41, 1596-1603. Desu, M. M., Narula, S. C. and Villarreal, B. (1977). A two-stage procedure for selecting the best of k exponential distributions. Comm. Statist. A--Theory Methods 6, 1223-1230. Desu, M. M. and Sobel, M. (1968). A fixed-subset size approach to a selection problem. Biometrika 55, 401-410. Corrections and amendments: 63 (1976), 685. Doksum, M. (1969). Starshaped transformations and the power of rank tests. Ann. Math. Statist. 40, 1167-1176. Dudewicz, E. J. and Koo, J. O. (1982). The Complete Categorized Guide to Statistical Selection and Ranking Procedures. Series in Mathematical and Management Sciences, Vol. 6, American Sciences Press, Columbus, OH. Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. J. Amer. Statist. Assoc. 50, 1096-1121. Fabian, V. (1962). On multiple decision methods for ranking population means. Ann. Math. Statist. 33, 248-254. Gibbons, J. D., Olkin, I. and Sobel, M. (1977). Selecting and Ordering Populations: A New Statistical Methodology. Wiley, New York. Gupta, S. S. (1956). On a decision rule for a problem in ranking means. Mimeograph Series No. 150, Institute of Statistics, University of North Carolina, Chapel Hill, NC. Gupta, S. S. (1963a). On a selection and ranking procedure for gamma populations. Ann. Inst. Statist. Math. 14, 199-216. Gupta, S. S. (1963b). Probability integrals of the multivariate normal and multivariate t. Ann. Math. Statist. 34, 792-828. Gupta, S. S. (1965). On some multiple decision (selection and ranking) rules. Technometrics 7, 225-245. Gupta, S. S. and Huang, D.-Y. (1980). A note on optimal subset selection procedures. Ann. Statist. 8, 1164-1167. Gupta, S. S. and Huang, D.-Y. (1981). Multiple Decision Theory: Recent Developments. Lecture Notes in Statistics, Vol. 6, Springer, New York. Gupta, S. S., Huang, D.-Y. and Huang, W.-T. (1976). 
On ranking and selection procedures and tests of homogeneity for binomial populations. In: S. Ikeda, T. Hayakawa, H. Hudimoto, M. Okamoto, Selection and ranking procedures in reliability models 155 M. Siotani and S. Yamamoto, eds., Essays in Probability and Statistics, Shinko Tsusho Co. Ltd., Tokyo, Japan, Chapter 33, 501-533. Gupta, S. S., Huang, D.-Y. and Nagel, K. (1979). Locally optimal subset selection procedures based on ranks. In: J. S. Rustagi, ed., Optimizing Methods in Statistics, Academic Press, New York, 251-260. Gupta, S. S., Huang, D.-Y. and Panchapakesan, S. (1984). On some inequalities and monotonicity results in selection and ranking theory. In: Y. L. Tong, ed., Inequalities in Statistics and Probability, IMS Lecture Notes--Monograph Series, Vol. 5, 211-217. Gupta, S. S., Huang, W. T. (1984). On isotonic selection rules for binomial populations better than a standard. In: A. M. Abuammoh, E. A. Ali, E. A. El-Neweihi and M. Q. E1-Osh, eds., Developments in Statistics and lts Applications, King Sand Univ. Library, Riyadh, 89-112. Gupta, S. A. and Kim, W.-X. (1984). A two-stage elimination type procedure for selecting the largest of several normal means with a common unknown variance. In: T. J. Santner and A. C. Tamhane, eds., Design of Experiments: Ranking and Selection, Marcel Dekker, New York, 77-93. Gupta, S. S. and Leu, L.-Y. (1986). Isotonic procedures for selecting populations better than a standard: two-parameter exponential distributions. In: A. P. Basu, ed., Reliability and Quality Control, Elsevier Science Publishers B.V., Amsterdam, 167-183. Gupta, S. S. and Liang, T.-C. (1987). Locally optimal subset selection rules based on ranks under joint type II censoring. Statistics and Decisions 5, 1-13. Gupta, S. S. and Lu, M.-W. (1979). Subset selection procedures for restricted families of probability distributions. Ann. Inst. Statist. Math. 31, 253-252. Gupta, S. S. and McDonald, G. C. (1982). 
P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7 © Elsevier Science Publishers B.V.
(1988) 157-174

The Impact of Reliability Theory on Some Branches of Mathematics and Statistics

Philip J. Boland and Frank Proschan*

* Research supported by the Air Force Office of Scientific Research Grant AFOSR 82-K-0007.

0. Introduction

It is obvious that reliability theory has used a great variety of mathematical and statistical tools to help achieve needed results. These include: total positivity, majorization and Schur functions, renewal theory, Bayesian statistics, isotonic regression, Markov and semi-Markov processes, stochastic comparisons and bounds, convexity theory, rearrangement inequalities, optimization theory; the list is almost endless. The question now arises: Has reliability theory reciprocated, that is, has reliability theory made any contributions to the development of any of the mathematical and statistical disciplines listed above? The answer is a definite Yes. In this article we shall show that in the course of solving reliability problems, theoreticians have developed new results in some of the disciplines above, of direct value to the discipline and having application in other branches of statistics and mathematics.

1. Total positivity and Pólya frequency functions

A function $K(x, y)$ of two real variables ranging over linearly ordered sets $X$ and $Y$ respectively is said to be totally positive of order $r$ (TP$_r$) if for all $1 \le m \le r$, $x_1 < x_2 < \cdots < x_m$, $y_1 < y_2 < \cdots < y_m$ ($x_i \in X$, $y_j \in Y$), we have the inequalities

$$K\begin{pmatrix} x_1, x_2, \ldots, x_m \\ y_1, y_2, \ldots, y_m \end{pmatrix} = \begin{vmatrix} K(x_1, y_1) & K(x_1, y_2) & \cdots & K(x_1, y_m) \\ K(x_2, y_1) & K(x_2, y_2) & \cdots & K(x_2, y_m) \\ \vdots & \vdots & & \vdots \\ K(x_m, y_1) & K(x_m, y_2) & \cdots & K(x_m, y_m) \end{vmatrix} \ge 0.$$

Typically, $X$ is an interval of the real line, or a countable set of discrete values on the real line such as the set of all integers or the set of nonnegative integers; similarly for $Y$. When $X$ or $Y$ is a set of integers, we may use the term 'sequence' rather than 'function'.
If a TP$_r$ function $K(x, y)$ is a probability density in one of the variables, say $x$, with respect to a $\sigma$-finite measure $\mu(x)$ for each fixed value of $y$, and is expressible as a function $K(x, y) = f(x - y)$ of the difference of $x$ and $y$, then $f$ is said to be a Pólya frequency function (or density) of order $r$ (PF$_r$). The argument of $f$ traverses the real line. If the argument is confined to the integers we shall speak of a Pólya frequency sequence of order $r$ (PF$_r$ sequence). Note that if $f$ is a density function on $\mathbb{R}$, then $K(x, y) = f(x - y)$ is TP$_2$ if and only if the family of density functions $\{f(x - y) : y \in Y\}$ has the monotone likelihood ratio property. Many totally positive kernels (functions) may be generated by the judicious use of the following convolution result:

THEOREM 1.1. If $K$ is TP$_r$, $L$ is TP$_s$ and $\mu$ is a $\sigma$-finite measure, then the convolution $M(x, y) = \int K(x, z) L(z, y)\, d\mu(z)$ is TP$_{\min(r, s)}$.

PROOF. The result follows from the 'Basic Composition Formula' (see Karlin (1968) for a proof):

$$M\begin{pmatrix} x_1, x_2, \ldots, x_m \\ y_1, y_2, \ldots, y_m \end{pmatrix} = \int \cdots \int_{z_1 < z_2 < \cdots < z_m} K\begin{pmatrix} x_1, \ldots, x_m \\ z_1, \ldots, z_m \end{pmatrix} L\begin{pmatrix} z_1, \ldots, z_m \\ y_1, \ldots, y_m \end{pmatrix} d\mu(z_1) \cdots d\mu(z_m),$$

whenever $M(x, y) = \int K(x, z) L(z, y)\, d\mu(z)$ converges absolutely with respect to the $\sigma$-finite measure $\mu$.

An important feature of totally positive functions is their variation diminishing property. Suppose that $K(x, y)$ is TP$_r$ and that $h(y)$ changes sign $j$ times where $j \le r - 1$. Let $g(x) = \int K(x, y) h(y)\, d\mu(y)$, an absolutely convergent integral with $\mu$ a $\sigma$-finite measure. Then $g(x)$ changes sign at most $j$ times. Moreover, if $g(x)$ actually changes sign $j$ times, then $g(x)$ must have the same arrangement of signs as $h(y)$ does, as $x$ and $y$ traverse their respective domains from left to right. The variation diminishing property is actually equivalent to the (inequalities) definition we have given of TP$_r$ (see Karlin and Proschan, 1960; Karlin, 1968, Chapter 5).
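The TP$_2$ condition for a translation kernel $K(x, y) = f(x - y)$ can be probed numerically by evaluating all second-order minors on a finite grid. The sketch below is only an illustration under stated assumptions: the function name `is_tp2_on_grid`, the grid, and the two example densities (standard normal, which is PF$_2$, and standard Cauchy, which is not log-concave) are choices made here and do not come from the text. A grid check can refute TP$_2$ but cannot prove it.

```python
import math
from itertools import combinations


def is_tp2_on_grid(kernel, xs, ys, tol=1e-12):
    """Return True if kernel is nonnegative and every 2x2 minor
    K(x1,y1)K(x2,y2) - K(x1,y2)K(x2,y1), x1 < x2, y1 < y2, is >= -tol
    on the given grid (a necessary check only, not a proof)."""
    if any(kernel(x, y) < 0 for x in xs for y in ys):
        return False
    for x1, x2 in combinations(sorted(xs), 2):
        for y1, y2 in combinations(sorted(ys), 2):
            minor = kernel(x1, y1) * kernel(x2, y2) - kernel(x1, y2) * kernel(x2, y1)
            if minor < -tol:
                return False
    return True


def normal_kernel(x, y):
    # f(x - y) with f the standard normal density (log-concave, hence PF_2)
    return math.exp(-0.5 * (x - y) ** 2) / math.sqrt(2.0 * math.pi)


def cauchy_kernel(x, y):
    # f(x - y) with f the standard Cauchy density (not log-concave)
    return 1.0 / (math.pi * (1.0 + (x - y) ** 2))


grid = [-3, -2, -1, 0, 1, 2, 3]
normal_ok = is_tp2_on_grid(normal_kernel, grid, grid)
cauchy_ok = is_tp2_on_grid(cauchy_kernel, grid, grid)
```

For the normal kernel every minor is positive because the exponent difference equals $(x_2 - x_1)(y_2 - y_1) > 0$; for the Cauchy kernel the minor at $x = (0, 1)$, $y = (-3, -2)$ is proportional to $1/100 - 1/85 < 0$.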
The theory of totally positive kernels and Pólya frequency functions has been extensively applied in several domains of mathematics, statistics, economics, and mechanics. To give but a few examples in the theory of reliability: the techniques of total positivity have been useful in developing properties of life distributions with monotone failure rates and in the study of some notions of component dependence, Pólya frequency functions of order 2 have been helpful in determining optimal inspection policies, and the variation diminishing property has been used in establishing characteristics of certain shock models. Reliability theory has in turn, however, been the motivating force behind some important developments in the theory of total positivity itself. A good example is the following result (see Karlin and Proschan, 1960):

THEOREM 1.2. Let $f_1, f_2, \ldots$ be any sequence of densities of nonnegative random variables, where each $f_i$ is PF$_r$. Then the $n$-fold convolution $g(n, x) = f_1 * f_2 * \cdots * f_n(x)$ is TP$_r$ in the variables $n$ and $x$, where $n$ ranges over $1, 2, \ldots$ and $x$ traverses the positive real line.

A similar total positivity result for the first passage time probabilities of the partial sum process can be proved in the more general case when the random variables range over the whole real line.

THEOREM 1.3. Let $f_1, f_2, \ldots$ be any sequence of PF$_r$ densities of random variables $X_1, X_2, \ldots$ respectively, which are not necessarily nonnegative. For $x$ positive and $n = 1, 2, \ldots$, let $h(n, x)$ be the first passage probability that the partial sums $X_1 + \cdots + X_i$ first reach the level $x$ at $i = n$. Then $h(n, x)$ is TP$_r$, where $n$ ranges over $1, 2, \ldots$, and $x$ traverses the positive real line.
Theorems 1.2 and 1.3 were initially inspired by certain models in reliability and inventory theory, and these results in turn motivated Karlin (1964) to characterize new classes of totally positive kernels and develop new applications in (for example) discrete Markov chains. Typical of the results of Karlin (1964) are the following two propositions (see also Karlin, 1968, p. 43):

PROPOSITION 1.4. Let $\mathcal{P}$ be a temporally homogeneous TP$_r$ Markov chain (the transition probability matrix $P$ is TP$_r$) whose state space is the set of nonnegative integers. Then the $n$-step transition function $P^{n}_{ij}$ is TP$_r$ in the variables $0 \le j < \infty$ and $n \ge 0$.

PROPOSITION 1.5. Let $\mathcal{P}$ be a TP$_r$ Markov chain. Let $F^{n}_{i, j_0}$ denote the probability that the first passage into the set of states $\le j_0$ occurs at the $n$th transition when the initial state of the process is $i > j_0$. Then $F^{n}_{i, j_0}$ is TP$_r$ in the variables $n \ge 1$ and $i > j_0$.

We now briefly trace the development leading to Theorems 1.2 and 1.3, beginning with a basic problem in reliability theory that Black and Proschan (1959) consider. (See also Proschan (1960) and Barlow and Proschan (1965) for related problems.) Suppose that a system is required to operate for the period $[0, t_0]$. When a component fails, it is immediately replaced by a spare component of the same type if one is available. The system fails if no spare is available. Only the originally supplied spares may be used for replacement during the period $[0, t_0]$. Assume that the system uses $k$ different types of components. At time 0, for each $i = 1, \ldots, k$ there are $d_i$ 'positions' in the system which are filled with components of type $i$. By 'position $(i, j)$' we mean the $j$th location in the system where a component of type $i$ is used. Components of the same type in different positions may be subject to varying stresses, and so we assume that the life of a component in position $(i, j)$ has density function $f_{ij}$.
Each replacement has the same life distribution as its predecessor, and component lives are assumed to be mutually independent. Let $P_i(n_i)$ be the reliability during $[0, t_0]$ of the $i$th part of the system (that is, the subsystem consisting of the $d_i$ components of type $i$), assuming that $n_i$ spares of type $i$ are available for replacement. The problem is to determine the 'spares kit' $n = (n_1, \ldots, n_k)$ which will maximize the reliability of the system $P(n) = \prod_{i=1}^{k} P_i(n_i)$ during $[0, t_0]$ subject to a cost constraint of the form $\sum_{i=1}^{k} c_i n_i \le C$ (where $c_i > 0$ for all $i = 1, \ldots, k$). A vector $n^0 = (n_1^0, n_2^0, \ldots, n_k^0)$ is an undominated spares allocation if whenever $P(n) > P(n^0)$, then $\sum_{i=1}^{k} c_i n_i > \sum_{i=1}^{k} c_i n_i^0$.

Black and Proschan (1959) consider methods for quickly generating families of undominated spares allocations, which can then be used to solve (approximately) the above problem. One of their procedures is to start with the cheapest cost allocation $(0, 0, \ldots, 0)$, and successively generate more expensive allocations as follows: If the present allocation is $n$, determine the index $i_0$ for which $[\log P_i(n_i + 1) - \log P_i(n_i)]/c_i$ ($i = 1, \ldots, k$) is a maximum (in the case of ties the lowest such index is taken). The next allocation is then $n' = (n_1, \ldots, n_{i_0 - 1}, n_{i_0} + 1, n_{i_0 + 1}, \ldots, n_k)$. Black and Proschan observe that the procedures they describe generate undominated allocations if each $P_i(n)$ is log concave in $n$. They are able to verify this directly in the case where the component lives in the $i$th part of the system are exponentially distributed with parameter $\lambda_i$. Note that $\log P_i(n)$ is concave in $n$ if and only if $P_i(n + 1)/P_i(n)$ is a decreasing function of $n$, or equivalently that $P_i(n)$ is a PF$_2$ sequence. Let $N_{ij}$ for $j = 1, \ldots, d_i$ be the random variable indicating the number of replacements of type $i$ needed at position $(i, j)$ in the interval $[0, t_0]$.
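The incremental procedure of Black and Proschan described above can be sketched for the exponential case. The sketch rests on assumptions made here for illustration: with exponential lives and immediate replacement, the number of failures of type $i$ in $[0, t_0]$ is Poisson with mean $d_i \lambda_i t_0$, so $P_i(n)$ is a Poisson distribution function; the function names, the stopping rule (stop when the selected next spare would exceed the budget), and the numerical values are hypothetical.

```python
import math


def poisson_cdf(n, mean):
    """P(Poisson(mean) <= n): reliability of a part that has n spares."""
    term = math.exp(-mean)
    total = term
    for j in range(1, n + 1):
        term *= mean / j
        total += term
    return total


def greedy_spares(means, costs, budget):
    """Generate the sequence of allocations, starting from (0, ..., 0).

    At each step add one spare of the type i maximizing
    [log P_i(n_i + 1) - log P_i(n_i)] / c_i, lowest index on ties.
    Stopping when the chosen step exceeds the budget is an assumption
    of this sketch, not part of the procedure in the text.
    """
    k = len(means)
    n = [0] * k
    spent = 0.0
    kits = [(tuple(n), spent)]
    while True:
        best_i, best_gain = 0, -math.inf
        for i in range(k):
            gain = (math.log(poisson_cdf(n[i] + 1, means[i]))
                    - math.log(poisson_cdf(n[i], means[i]))) / costs[i]
            if gain > best_gain:  # strict inequality keeps the lowest index on ties
                best_i, best_gain = i, gain
        if spent + costs[best_i] > budget:
            return kits
        n[best_i] += 1
        spent += costs[best_i]
        kits.append((tuple(n), spent))


# Hypothetical data: mean numbers of failures d_i * lambda_i * t_0 for two types.
kits = greedy_spares(means=[2.0, 0.5], costs=[1.0, 1.0], budget=3.0)
```

With these values the procedure visits $(0,0)$, $(1,0)$, $(2,0)$, $(2,1)$; a brute-force comparison over all kits of cost at most 3 confirms that $(2,1)$ has the highest system reliability among them.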
Proschan (1960) is able to show that if $f_{ij}(t)$ satisfies the monotone likelihood ratio property for translations (equivalently, that $f_{ij}(t)$ is a PF$_2$ function), then $f_{ij}^{(n)}(t)$ is a TP$_2$ function in the variables $n$ and $t$ (where $f_{ij}^{(n)}$ is the $n$-fold convolution of $f_{ij}$ with itself). Judiciously using Theorem 1.1 on convolutions of totally positive functions, one is then able to show that $\mathrm{Prob}(N_{ij} = n)$ is a PF$_2$ sequence and finally that $P_i(n) = \mathrm{Prob}(N_{i1} + \cdots + N_{id_i} \le n)$ is a PF$_2$ sequence. The key tool is of course to show that when $f(t)$ is a PF$_2$ function, then $f^{(n)}(t)$ is TP$_2$ in $n$ and $t$. Theorems 1.2 and 1.3 are natural generalizations of this result. One may further generalize the 'spares kit' procedure above to show that when each life distribution function $F_{ij}$ (for position $(i, j)$) is IFR (equivalently, that $\bar{F} = 1 - F$ is PF$_2$), then the procedure generates undominated allocations (Barlow and Proschan, 1965).

2. Association of random variables

The notion of associated random variables is one of the most valuable contributions to statistics that has been generated as a result of reliability theory considerations. We consider two random variables to be in some sense associated if they are positively correlated, that is, $\mathrm{cov}(S, T) \ge 0$. A stronger requirement is $\mathrm{cov}(f(S), g(T)) \ge 0$ for all nondecreasing $f$ and $g$. Finally, if $\mathrm{cov}(f(S, T), g(S, T)) \ge 0$ for all $f$ and $g$ nondecreasing in each argument, we have a still stronger version of association. Esary, Proschan and Walkup (1967) generalize this strongest version of association to the multivariate case in defining random variables $T_1, \ldots, T_n$ to be associated if $\mathrm{cov}(f(T), g(T)) \ge 0$ for all nondecreasing functions $f$ and $g$ for which the covariance in question exists.
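The covariance condition in this definition can be evaluated exactly in a small case by enumerating all states. The sketch below assumes independent binary components (which form an associated set) and uses three illustrative nondecreasing functions; the names `covariance`, `series`, `parallel`, `two_out_of_three` and the probabilities are hypothetical choices made here, not taken from the text.

```python
from itertools import product


def covariance(f, g, probs):
    """Exact cov(f(X), g(X)) for independent binary X_i with P(X_i = 1) = probs[i]."""
    e_f = e_g = e_fg = 0.0
    for state in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for x, p in zip(state, probs):
            weight *= p if x == 1 else 1.0 - p
        e_f += weight * f(state)
        e_g += weight * g(state)
        e_fg += weight * f(state) * g(state)
    return e_fg - e_f * e_g


def series(state):
    # functions only if every component functions (nondecreasing)
    return min(state)


def parallel(state):
    # functions if at least one component functions (nondecreasing)
    return max(state)


def two_out_of_three(state):
    # functions if at least two components function (nondecreasing)
    return 1 if sum(state) >= 2 else 0


probs = (0.9, 0.6, 0.3)
cov_series_parallel = covariance(series, parallel, probs)
cov_series_two = covariance(series, two_out_of_three, probs)
# A decreasing function of the same variables gives a nonpositive covariance.
cov_series_dual = covariance(series, lambda s: 1 - parallel(s), probs)
```

Here $E[\min] = 0.162$ and $E[\max] = 0.972$, so the first covariance is $0.162 \times 0.028 = 0.004536$, and replacing the parallel structure by its complement flips the sign.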
Equivalent definitions of associated random variables result if the functions $f$ and $g$ are taken to be increasing and either (i) binary or (ii) bounded and continuous. Association of random variables satisfies the following desirable multivariate properties:

(P1) Any subset of a set of associated random variables is a set of associated random variables.
(P2) If two sets of associated random variables are independent of one another, then the union of the two sets is a set of associated random variables.
(P3) Any set consisting of a single random variable is a set of associated random variables.
(P4) Increasing functions of associated random variables are associated.
(P5) A limit in distribution of a sequence of sets of associated random variables is a set of associated random variables.

Note that properties P3 and P2 imply that any set of independent random variables is associated. This fact, together with property P4, enables one to generate many practical examples of associated random variables. In the special case of binary random variables, one can readily show that the binary random variables $X_1, \ldots, X_n$ are associated if and only if $1 - X_1, 1 - X_2, \ldots, 1 - X_n$ are associated. Many interesting applications may be obtained as a consequence of the following result about associated random variables:

THEOREM 2.1. Let $T_1, \ldots, T_n$ be associated, and let $S_i = f_i(T)$, where $f_i$ is a nondecreasing function, for each $i = 1, \ldots, k$. Then

$$P[S_1 \le s_1, \ldots, S_k \le s_k] \ge \prod_{i=1}^{k} P[S_i \le s_i]$$

and

$$P[S_1 > s_1, \ldots, S_k > s_k] \ge \prod_{i=1}^{k} P[S_i > s_i]$$

for all $s = (s_1, \ldots, s_k) \in \mathbb{R}^k$.

The following two corollaries are immediate consequences of this theorem.

COROLLARY 2.2 (Robbins, 1954). Let $T_1, \ldots, T_n$ be independent random variables, and let $S_i = \sum_{j=1}^{i} T_j$ be the $i$th partial sum for $i = 1, \ldots, n$. Then

$$P[S_1 \le s_1, \ldots, S_n \le s_n] \ge \prod_{i=1}^{n} P(S_i \le s_i)$$

for all $s = (s_1, \ldots, s_n) \in \mathbb{R}^n$.

COROLLARY 2.3. Let $T_{[1]}, \ldots, T_{[n]}$ be the order statistics in a random sample $T_1, \ldots, T_n$. Then

$$P[T_{[i_1]} \le t_{i_1}, \ldots, T_{[i_k]} \le t_{i_k}] \ge \prod_{j=1}^{k} P[T_{[i_j]} \le t_{i_j}]$$

and

$$P[T_{[i_1]} > t_{i_1}, \ldots, T_{[i_k]} > t_{i_k}] \ge \prod_{j=1}^{k} P[T_{[i_j]} > t_{i_j}]$$

for every choice of $1 \le i_1 < \cdots < i_k \le n$ and $t_{i_1} < \cdots < t_{i_k}$.

Marshall and Olkin (1967) consider the multivariate exponential distribution with joint distribution function $F$ and joint survival function

$$\bar{F}(s_1, \ldots, s_m) = P[S_1 > s_1, \ldots, S_m > s_m] = \exp\Big[-\sum_{i=1}^{m} \lambda_i s_i - \sum_{i<j} \lambda_{ij} \max(s_i, s_j) - \sum_{i<j<k} \lambda_{ijk} \max(s_i, s_j, s_k) - \cdots - \lambda_{12 \cdots m} \max(s_1, s_2, \ldots, s_m)\Big].$$

They point out that if this is the distribution of the random variables $S_1, \ldots, S_m$, then there exist independent exponential random variables $T_1, \ldots, T_n$ such that $S_j = \min\{T_i : i \in A_j\}$ where $A_j \subset \{1, 2, \ldots, n\}$. The random variables $S_1, \ldots, S_m$ are associated, and therefore, using Theorem 2.1, we can show that

$$F(s_1, \ldots, s_m) \ge \prod_{i=1}^{m} F_i(s_i) \quad \text{and} \quad \bar{F}(s_1, \ldots, s_m) \ge \prod_{i=1}^{m} [1 - F_i(s_i)],$$

where $F_i$ is the marginal distribution of $S_i$. The multivariate exponential distribution is also useful in studying shock models.

Another application of Theorem 2.1 can be made in the case of analysis of variance in which two hypotheses are tested using the same error variance for each test. Consider the case in which the effects of both rows and columns are to be tested. The standard procedure is to calculate three quadratic forms, $q_1, q_2, q_3$, which are independently distributed as $\chi^2$ with $n_1$, $n_2$, and $n_3$ degrees of freedom respectively, where $q_1$ represents the sum of squares between rows, $q_2$ the sum of squares between columns, and $q_3$ the error sum of squares. The likelihood ratio test statistics for testing the two hypotheses are $F_1 = (q_1/n_1)/(q_3/n_3)$ and $F_2 = (q_2/n_2)/(q_3/n_3)$.
The probability of making no errors of the first kind is $P[F_1 \le F_{1\alpha}, F_2 \le F_{2\alpha}]$, where $F_{1\alpha}$ ($F_{2\alpha}$) is the $100\alpha$ per cent point of the distribution of $F_1$ ($F_2$). Kimball (1951) proves

$$P[F_1 \le F_{1\alpha}, F_2 \le F_{2\alpha}] > P[F_1 \le F_{1\alpha}]\, P[F_2 \le F_{2\alpha}],$$

or in other words that the chance of no errors of the first kind is greater following the standard experimental procedure than if separate experiments had been performed. This result is an immediate consequence of Theorem 2.1 once it is observed that $F_1$ and $F_2$ are nondecreasing functions of the associated random variables $q_1, q_2, q_3^{-1}$.

The concept of associated random variables has proved to be a useful tool in various areas of operations research. Shogan (1977) uses properties of associated random variables to construct bounds for the stochastic activity duration of a PERT network. Heidelberger and Inglehart (1979) use association to construct a set of sufficient conditions which guarantee that the dependent simulations of a stochastic system produce a variance reduction over independent simulations. Niu (1981) makes use of association in studying queues with dependent interarrival and service times.

The notion of association of random variables is just one of many notions of multivariate dependence. Lehmann (1966) introduces several concepts of bivariate dependence, the strongest of which is TP$_2$ dependence ($(S, T)$ are TP$_2$ dependent if the joint probability density (or in the discrete case joint frequency function) $f(s, t)$ is totally positive of order 2). For a discussion concerning the relationship among several notions of multivariate dependence see Barlow and Proschan (1981). Newman and Wright (1981) obtain limit theory results for sequences of associated random variables.

In applications it is often easier to verify that one of the alternative notions which imply association holds, instead of verifying association directly. For example, if $T = (T_1, \ldots, T_n)$ has density $f(t_1, \ldots, t_n)$ which is TP$_2$ in every pair of variables when the remaining variables are kept fixed and which is everywhere positive on a rectangular support, then $T_1, \ldots, T_n$ are associated (see Kemperman, 1977). Pitt (1982) proves the following important characterization of association for the multivariate normal case (for a simpler proof see also Joag-dev, Perlman and Pitt (1983)):

THEOREM 2.4. Let $T = (T_1, \ldots, T_n)$ be multivariate normal. Then $T_1, \ldots, T_n$ are associated if and only if $\mathrm{cov}(T_i, T_j) \ge 0$ for all $i, j = 1, \ldots, n$.

A related result of particular importance in statistical mechanics is the FKG inequality. Let $T = (T_1, \ldots, T_n)$ be a random vector with density $f(t_1, \ldots, t_n)$. For $s = (s_1, \ldots, s_n)$ and $t = (t_1, \ldots, t_n)$, let

$$s \vee t = (\max(s_1, t_1), \max(s_2, t_2), \ldots, \max(s_n, t_n))$$

and

$$s \wedge t = (\min(s_1, t_1), \min(s_2, t_2), \ldots, \min(s_n, t_n)).$$

$f$ is said to satisfy the FKG condition (or be multivariate totally positive of order 2 (MTP$_2$)) if

$$f(s \vee t)\, f(s \wedge t) \ge f(s)\, f(t) \quad \text{for all } s, t \in \mathbb{R}^n.$$

The FKG inequality, obtained by Fortuin, Kasteleyn and Ginibre (1971), says that if the density $f$ of $T$ satisfies the FKG condition, then $T_1, \ldots, T_n$ are associated. For an excellent discussion of the application of the FKG inequality in statistics see Kemperman (1977).

The notion of association of random variables which Esary, Proschan and Walkup (1967) develop has its origins in a problem of Esary and Proschan (1963) concerning coherent structures. Moore and Shannon (1956) investigate the reliability of relay circuits and show that arbitrarily reliable circuits can be constructed from arbitrarily unreliable relays. They prove that if $h(p)$ is the probability of closure of a relay network plotted as a function of the common probability $p$ of the closure of a single relay, then

$$p(1 - p)\, h'(p) > h(p)(1 - h(p)) \quad \text{for } 0 < p < 1.$$
Therefore $h(p)$ is s-shaped (crosses the diagonal at most once and always from below), a property which is crucial in constructing relay circuits of arbitrarily high reliability. Birnbaum, Esary and Saunders (1961) generalize this result of Moore and Shannon to coherent structures of independent components with identical reliability. Esary and Proschan (1963) in turn generalize to coherent structures with independent components not necessarily of the same reliability. The main tool in their paper is the following specialized version of an inequality of Tchebichev (see Hardy, Littlewood and Pólya, 1952), which may be regarded as a 'forerunner' to the definition of association of random variables:

THEOREM 2.5. Let $X_1, \ldots, X_n$ be independent binary random variables. Let $f_i(X)$, $i = 1, 2$, be increasing functions. Then $\mathrm{cov}[f_1(X), f_2(X)] \ge 0$.

Esary and Proschan also use Theorem 2.5 to construct upper and lower bounds for the reliability of a coherent structure in terms of the minimal paths and minimal cut sets of the structure.

3. Renewal theory

Renewal theory has its origins in the study of self-renewing aggregates and especially in actuarial science. Today we view the subject more generally as the study of functions of independent identically distributed nonnegative random variables which represent the successive intervals between renewals of a process. The theory is applied to a wide variety of fields such as risk analysis, counting processes, fatigue analysis, inventory theory, queuing theory, traffic flow, and reliability theory. We will summarize a few of the more important and basic ideas in renewal theory (for a more complete treatment consult Smith (1958), Cox (1962), Feller (1966), Ross (1970), or Karlin and Taylor (1975)) and then indicate some of the contributions to this area arising from reliability theory.
By a renewal process we will mean a sequence of independent identically distributed nonnegative random variables $X_1, X_2, \ldots$, which are not all zero with probability one. We let $F$ be the distribution function of $X_1$, and $F^{(k)}$ will denote the $k$-fold convolution of $F$ with itself. The $k$th partial sum $S_k = X_1 + \cdots + X_k$ is the $k$th renewal point and has distribution function $F^{(k)}$. For convenience we define $F^{(0)}$ by $F^{(0)}(t) = 1$ for $t \ge 0$ and zero otherwise. Renewal theory is primarily concerned with the number $N(t)$ of renewals in the interval $[0, t]$. $N(t)$, the renewal random variable, is the maximum value of $k$ for which $S_k \le t$, with the understanding that $N(t) = 0$ if $X_1 > t$. It is clear that $P(N(t) = n) = F^{(n)}(t) - F^{(n+1)}(t)$ and $P(N(t) \ge n) = F^{(n)}(t)$. The process $\{N(t) : t \ge 0\}$ is known as a renewal counting process. The renewal function $M(t)$ is defined to be the expected number of renewals in $[0, t]$, that is, $M(t) = E(N(t))$. Since

$$M(t) = E(N(t)) = \sum_{k=1}^{\infty} k\, P[N(t) = k] = \sum_{k=1}^{\infty} P[N(t) \ge k],$$

it follows that $M(t) = \sum_{k=1}^{\infty} F^{(k)}(t)$ and moreover that

$$M(t) = \int_0^t [1 + M(t - x)]\, dF(x)$$

(this latter identity being known as the fundamental renewal equation). In spite of the fact that a closed functional form for $M(t)$ is known for only a few special distributions $F$, the renewal function $M(t)$ plays a central role in renewal theory.

If $F$ is the distribution function of $X_1$, $F$ is nonlattice if there exists no $h > 0$ such that the range of $X_1$ is contained in $\{h, 2h, 3h, \ldots\}$. The following basic results were proved in the early stages of renewal theory development.

THEOREM 3.1. If $F$ has mean $\mu_1$, then $N(t)/t \to 1/\mu_1$ almost surely as $t \to \infty$.

THEOREM 3.2. Let $F$ have mean $\mu_1$. Then
(i) $M(t) \ge t/\mu_1 - 1$ for all $t \ge 0$;
(ii) (Blackwell) if $F$ is nonlattice, $\lim_{t \to \infty} [M(t + h) - M(t)] = h/\mu_1$ for any $h > 0$;
(iii) if $F$ is nonlattice with second moment $\mu_2 < +\infty$, $M(t) = t/\mu_1 + \mu_2/(2\mu_1^2) - 1 + o(1)$ as $t \to \infty$.
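The series representation $M(t) = \sum_{k \ge 1} F^{(k)}(t)$ can be checked numerically in one of the few cases with a closed form. The sketch below assumes, purely for illustration, that $F$ is the Erlang distribution with two exponential stages of rate `rate` (an assumption not made in the text), for which $F^{(k)}$ is a gamma distribution with integer shape $2k$ and $M(t) = \lambda t/2 - 1/4 + e^{-2\lambda t}/4$. The constant $-1/4$ agrees with $\mu_2/(2\mu_1^2) - 1$ in Theorem 3.2 (iii), since $\mu_1 = 2/\lambda$ and $\mu_2 = 6/\lambda^2$. Function names are hypothetical.

```python
import math


def erlang_cdf(shape, rate, t):
    """P(Gamma(shape, rate) <= t) for integer shape, via the Poisson tail identity."""
    x = rate * t
    term = math.exp(-x)
    below = term  # running value of P(Poisson(x) <= j)
    for j in range(1, shape):
        term *= x / j
        below += term
    return 1.0 - below


def renewal_function_series(rate, t, tol=1e-12):
    """M(t) = sum_{k>=1} F^(k)(t) when F is Erlang with 2 stages of the given rate."""
    total, k = 0.0, 1
    while True:
        f_k = erlang_cdf(2 * k, rate, t)  # k-fold convolution is Gamma(2k, rate)
        total += f_k
        if f_k < tol:  # terms decrease to zero in k, so truncation is safe
            return total
        k += 1


def renewal_function_closed(rate, t):
    """Closed form for the same Erlang-2 renewal function."""
    return rate * t / 2.0 - 0.25 + 0.25 * math.exp(-2.0 * rate * t)


m_series = renewal_function_series(1.0, 5.0)
m_closed = renewal_function_closed(1.0, 5.0)
```

At $\lambda = 1$, $t = 5$ both routes give approximately 2.25001, already close to the asymptote $t/\mu_1 - 1/4 = 2.25$.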
Note that important as these results may be, they are, with the exception of Theorem 3.2 (i), asymptotic in nature. In their comparison of replacement policies for stochastically failing units, Barlow and Proschan (1964) obtain several new renewal theory inequalities. An age replacement policy is one whereby a unit is replaced upon failure or at age $T$, a specified constant, whichever comes first. Under a block replacement policy a unit is replaced upon failure and at times $T, 2T, 3T, \ldots$. It is assumed that failures occur independently and that the replacement time is negligible. There are advantages for both types of policy, and hence it is of interest to compare the two types stochastically with respect to numbers of failures, planned replacements and removals (a removal is a failure or a planned replacement). In many situations it will be assumed that the life distribution of a unit belongs to a monotone class such as the IFR (DFR) class ($F$ is IFR (DFR) if it has increasing (decreasing) failure rate). It is clear that the evaluation of replacement policies depends heavily on the theory of renewal processes. Suppose we let $N(t)$ indicate the number of renewals in $[0, t]$ due to replacements at failure, $N_B^*(t)$ be the number of failures in $[0, t]$ under a block policy, and $N_A^*(t)$ the number of failures in $[0, t]$ under an age policy. Barlow and Proschan (1964) prove the following result stochastically comparing these random variables:

THEOREM 3.3. If $F$ is IFR (DFR), then

$$P(N(t) \ge n) \ge (\le)\ P(N_A^*(t) \ge n) \ge (\le)\ P(N_B^*(t) \ge n)$$

for $t \ge 0$ and $n = 0, 1, 2, \ldots$.

The following bounds on the renewal function $M(t) = E(N(t))$ are an immediate consequence:

COROLLARY 3.4. If $F$ is IFR (DFR), then
(i) $M(t) \ge (\le)\ E(N_A^*(t)) \ge (\le)\ E(N_B^*(t))$;
(ii) $M(t) \ge (\le)\ k M(t/k)$, $k = 1, 2, \ldots$;
(iii) $M(t) \le (\ge)\ t/\mu_1$;
(iv) $M(h) \le (\ge)\ M(t + h) - M(t)$ for all $h, t \ge 0$.
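Parts (ii) to (iv) of Corollary 3.4 can be illustrated on a single IFR example with a known renewal function. The sketch below assumes the two-stage Erlang distribution with rate `rate`, which is IFR and has $M(t) = \lambda t/2 - 1/4 + e^{-2\lambda t}/4$ and $\mu_1 = 2/\lambda$; this distribution, the function names, and the test grid are choices made here, not part of the text, and a finite grid check is an illustration rather than a proof.

```python
import math


def erlang2_renewal(rate, t):
    """Closed-form renewal function M(t) for two-stage Erlang lifetimes."""
    return rate * t / 2.0 - 0.25 + 0.25 * math.exp(-2.0 * rate * t)


def corollary_holds(rate, times, ks=(1, 2, 3, 4), eps=1e-12):
    """Check (ii) M(t) >= k M(t/k), (iii) M(t) <= t/mu1,
    (iv) M(h) <= M(t + h) - M(t) on a grid of time points."""
    mu1 = 2.0 / rate
    for t in times:
        m_t = erlang2_renewal(rate, t)
        if m_t > t / mu1 + eps:  # (iii)
            return False
        for k in ks:  # (ii)
            if m_t < k * erlang2_renewal(rate, t / k) - eps:
                return False
        for h in times:  # (iv): superadditivity of M
            if erlang2_renewal(rate, h) > erlang2_renewal(rate, t + h) - m_t + eps:
                return False
    return True


grid_ok = corollary_holds(1.0, [0.0, 0.5, 1.0, 2.0, 5.0])
```

For this example the inequalities can also be seen algebraically: with $a = e^{-2\lambda t}$ and $b = e^{-2\lambda h}$, $M(t + h) - M(t) - M(h) = (1 - a)(1 - b)/4 \ge 0$.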
By considering the number of failures and the number of removals per unit of time as the duration of the replacement operation becomes indefinitely large, Barlow and Proschan (1964) obtain the following simple useful bounds on the renewal function for any $F$, and an improvement on these bounds for the IFR (DFR) case (these bounds were conjectured by Bazovsky (1962)):

THEOREM 3.5. (i) $M(t) \ge t \big/ \int_0^t \bar{F}(x)\, dx - 1 \ge t/\mu_1 - 1$ for all $t \ge 0$.
(ii) If $F$ is IFR (DFR), then $M(t) \le (\ge)\ t F(t) \big/ \int_0^t \bar{F}(x)\, dx \le (\ge)\ t/\mu_1$ for all $t \ge 0$.

As a consequence of this result, it follows that when $F$ is IFR the expected numbers of failures per unit of time under block and age replacement policies do not differ by more than $1/T$ in the limit as $t \to \infty$.

Feller (1948) shows that $\lim_{t \to \infty} \mathrm{Var}(N(t))/M(t) = \sigma^2/\mu_1^2$, where $\sigma^2$ is the variance of $F$; this limit is at most 1 when $F$ is IFR. Barlow and Proschan (1964) partially generalize this result in proving the following:

THEOREM 3.6. If $F$ is IFR (DFR), then $\mathrm{Var}(N(t)) \le (\ge)\ M(t)$, and this inequality is sharp.

The renewal theory implications of the work of Barlow and Proschan (1964) provide the key tool in the probabilistic interpretation of Miner's rule given by Birnbaum and Saunders (1968) and Saunders (1970). Miner's rule (Miner, 1945) is a deterministic formula extensively used in engineering practice for the cumulative damage due to fatigue. Prior to the work of Birnbaum and Saunders, Miner's rule was supported by empirical evidence but had very little theoretical justification. Birnbaum and Saunders investigate models for stochastic crack growth with incremental extensions having an increasing failure rate distribution. The result that for an IFR distribution function $F$ the inequality $t/\mu_1 - 1 \le M(t) \le t/\mu_1$ holds is used to prove that $\tau/\mu_1 - 1 \le \gamma \le \tau/\mu_1$, where $\mu_1$ is the expected crack increment per cycle, $\tau$ is the expected crack length at which failure occurs and $\gamma$ is the expected number of loading cycles to failure.
This in turn is used to show that under certain conditions of dependence on load, Miner's rule does yield the mathematical expectation of fatigue life. Saunders (1970) extends some of these results by weakening the model assumptions, in particular by relaxing the IFR assumption for the crack growth to the assumption that $F$ be new better than used in expectation (NBUE), that is, $\mu_1 \ge \int_0^\infty \bar{F}(t + x)/\bar{F}(t)\, dx$ for all $t \ge 0$ such that $\bar{F}(t) > 0$.

Marshall and Proschan (1972) determine the largest classes of life distributions for which age and block replacement policies diminish, either stochastically or in expected value, the number of failures in service. In doing so, they give the first systematic treatment of the NBU, NWU, NBUE and NWUE classes of life distributions, which are now widely used in statistics. A life distribution function $F$ is new better than used (NBU) if $\bar{F}(x + y) \le \bar{F}(x)\bar{F}(y)$ for all $x, y \ge 0$. The new worse than used (NWU) class of distributions is similarly defined by reversing the order in this inequality. In their investigation they obtain important renewal quantity inequalities, many of which generalize results from Barlow and Proschan (1964). For example, they show that if $F$ is NBU (NWU) then $\mathrm{Var}\, N(t) \le (\ge)\ M(t)$ and $M(h) \le (\ge)\ M(t + h) - M(t)$ for all $h, t \ge 0$, while if $F$ is NBUE (NWUE) then $M(t) \le (\ge)\ t/\mu_1$. The following interesting characterization of the NBU class in terms of the renewal random variable is obtained. Let $*$ denote convolution.

THEOREM 3.7. $N(s) * N(t) \overset{st}{\le} (\overset{st}{\ge})\ N(s + t)$ for all $s, t \ge 0$ $\Leftrightarrow$ $F$ is NBU (NWU).

Straub (1970) is interested in bounding the probability that the total amount of insurance claims arising in a fixed period of time does not exceed the amount $t$ of premiums collected. Letting $F(t)$ be the distribution function for the individual claims amount, Straub desires bounds for $\bar{F}^{(n)}(t) = P(N(t) < n)$.
Here we may interpret $N(t)$ as the maximum value of $k$ such that the first $k$ claims sum to a total $\le t$. Motivated by the use of tools in reliability theory, and in particular by the work of Barlow and Marshall on bounds for classes of monotone distributions, Straub establishes the following important result (see Barlow and Proschan, 1981):

THEOREM 3.8. Let $F$ be a continuous distribution function with hazard function $R(t) = -\log \bar{F}(t)$.
(a) If $F$ is NBU (NWU), then

$$P(N(t) < n) \ge (\le)\ \sum_{j=0}^{n-1} \frac{(R(t))^j}{j!}\, e^{-R(t)} \quad \text{for } t \ge 0,\ n = 1, 2, \ldots.$$

(b) If $F$ is IFR (DFR), then

$$P(N(t) < n) \le (\ge)\ \sum_{j=0}^{n-1} \frac{[nR(t/n)]^j}{j!}\, e^{-nR(t/n)} \quad \text{for } t \ge 0,\ n = 1, 2, \ldots.$$

The bounds for the renewal function established by Barlow and Proschan (1964) motivate Marshall (1973) to investigate the existence of 'best' linear bounds for $M(t)$ ('best' is interpreted to mean the sharpest bounds which when iterated in the fundamental renewal equation converge monotonically to $M(t)$ for all $t$).

Esary, Marshall and Proschan (1973) establish properties of the survival function of a device subject to shocks and wear. One of their principal tools is the result that $[F^{(k)}(x)]^{1/k}$ is decreasing in $k = 1, 2, \ldots$, for any distribution function $F$ such that $F(x) = 0$ for $x < 0$. This result, which is equivalent to the following property of the renewal random variable $N(t)$, can be used to demonstrate monotonicity properties of first passage time distributions for certain Markov processes.

THEOREM 3.9. Let $N(t)$ denote the number of renewals in $[0, t]$ for a renewal process. Then $[P(N(t) \ge k)]^{1/k}$ is decreasing in $k = 1, 2, \ldots$.

Another class of monotone distributions used for modeling in reliability theory is the increasing mean residual life (IMRL) class. Let $X_1$ have life distribution $F$. Then $F$ is IMRL if $E(X_1 - t \mid X_1 > t)$ is nondecreasing in $t \ge 0$. A DFR distribution function $F$ with finite mean $\mu_1$ is IMRL.
Mixtures of DFR distributions are DFR, and DFR distributions are used to model the lifetimes of units which improve with age, such as blast furnaces and work-hardening materials. Keilson (1975) shows that a large class of first passage time distributions for Markov processes are DFR. Brown (1980, 1981) proves some very nice renewal quantity results for the DFR and IMRL classes, among which is the following:

THEOREM 3.10. (a) If $F$ is DFR, then the renewal function $M(t)$ is concave.
(b) If $F$ is IMRL, then $M(t) - (t/\mu_1 - 1)$ is increasing in $t \ge 0$. (Note however that $M(t)$ is not necessarily concave.)

4. Majorization and Schur functions

The theory of inequalities has played a fundamental role in developing new results in reliability theory. In attempting to compare and establish bounds for probability distributions and systems, workers in reliability have been discovering new inequalities. Many of these inequalities are of a general nature and can be presented using the techniques of majorization and Schur functions.

Given a vector $x = (x_1, \dots, x_n)$, let $x_{[1]} \le x_{[2]} \le \cdots \le x_{[n]}$ denote an increasing rearrangement of $x_1, \dots, x_n$. The vector $x$ is said to majorize the vector $y$ (we write $x >_m y$) if
$$\sum_{i=j}^{n} x_{[i]} \ge \sum_{i=j}^{n} y_{[i]} \quad \text{for } j = 2, \dots, n \qquad \text{and} \qquad \sum_{i=1}^{n} x_{[i]} = \sum_{i=1}^{n} y_{[i]}.$$
Hardy, Littlewood, and Pólya (1952) show that $x >_m y$ if and only if there exists a doubly stochastic matrix $\Pi$ such that $y = x\Pi$. Schur functions are real valued functions which are monotone with respect to the partial ordering of majorization. A function $h$ with the property that $x >_m y \Rightarrow h(x) \ge (\le)\ h(y)$ is called Schur-convex (Schur-concave). A convenient characterization of Schur-convexity (-concavity) is provided by the Schur-Ostrowski condition, which states that a differentiable permutation invariant function $h$ defined on $\mathbb{R}^n$ is Schur-convex (Schur-concave) if and only if
$$(x_i - x_j)\left(\frac{\partial h}{\partial x_i} - \frac{\partial h}{\partial x_j}\right) \ge (\le)\ 0 \quad \text{for all } i, j \text{ and } x \in \mathbb{R}^n.$$
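The definition of majorization and the Hardy, Littlewood and Pólya characterization can be made concrete with a few lines of code. The following is a minimal sketch; the function names, the tolerance and the example vector and matrix are illustrative choices, not taken from the text.

```python
def majorizes(x, y, tol=1e-12):
    """Return True if x majorizes y, using the increasing rearrangement."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs, ys = sorted(x), sorted(y)
    # The totals must agree.
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    # Sums of the largest entries of x must dominate those of y.
    for j in range(1, len(xs)):
        if sum(xs[j:]) < sum(ys[j:]) - tol:
            return False
    return True


def times_matrix(x, matrix):
    """Return the row vector y = x * matrix."""
    n = len(x)
    return [sum(x[i] * matrix[i][j] for i in range(n)) for j in range(n)]


x = [1.0, 2.0, 6.0]
# A doubly stochastic matrix that averages the first two coordinates.
pi = [[0.5, 0.5, 0.0],
      [0.5, 0.5, 0.0],
      [0.0, 0.0, 1.0]]
y = times_matrix(x, pi)

print(y, majorizes(x, y), majorizes(y, x))
# The sum of squares is Schur-convex, so it should not increase from x to y.
print(sum(v * v for v in x) >= sum(v * v for v in y))
```

Averaging coordinates with a doubly stochastic matrix makes the vector less spread out, which is exactly what the partial sums in the definition detect.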
For an excellent and extensive treatment of the theory of majorization, the reader should consult Marshall and Olkin (1979).

A $k$ out of $n$ system is a system with $n$ components which functions if and only if $k$ or more of the components function. Systems of this type are frequently encountered in practice. A one out of $n$ system is a parallel system and an $n$ out of $n$ system is a series system. We assume that the $n$ components of the system function independently. Let $h_k(p)$ denote the reliability of a $k$ out of $n$ system in which the component reliabilities are given by $p = (p_1, \dots, p_n)$. Computing the reliability function $h_k(p)$ is often difficult, particularly when a large number of unlike component probabilities are involved. Some interesting inequalities with applications in other areas of statistics have resulted from efforts to obtain more computable bounds for the system reliability $h_k(p)$. For component reliability $p_i$ we define the corresponding component hazard $R_i$ by $R_i = -\log p_i$. Pledger and Proschan (1971) obtain the following comparisons for $h_k(p)$:

THEOREM 4.1. Let $R = (R_1, \dots, R_n)$ be a vector of component hazards which majorizes $R' = (R'_1, \dots, R'_n)$, a second vector of component hazards. Then for the corresponding component reliability vectors $p$ and $p'$ (note that $\prod_{i=1}^n p_i = \prod_{i=1}^n p'_i$ since $\sum_{i=1}^n R_i = \sum_{i=1}^n R'_i$) we have
$$h_k(p) \ge h_k(p') \quad \text{for } k = 1, \dots, n-1$$
and $h_n(p) = h_n(p')$ (that is, the two systems are equally good in series).

Considering the particular case where $R'_1 = \cdots = R'_n$, one obtains the useful bound $h_k(p_1, \dots, p_n) \ge h_k(p_G, \dots, p_G)$ for $k = 1, \dots, n$, where $p_G$ is the geometric mean $(\prod_{i=1}^n p_i)^{1/n}$.

Although a large collection of theory and methods exists for order statistics from a single underlying distribution, a relatively small set of results is available for the case of order statistics from underlying heterogeneous distributions.
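To illustrate the geometric mean bound that follows from Theorem 4.1, the reliability $h_k(p)$ of a $k$ out of $n$ system with unlike independent components can be computed by a simple recursion over the components. This is a sketch only; the function names and the example reliabilities are hypothetical, and the component reliabilities are assumed to be strictly positive so that the geometric mean is well defined.

```python
import math


def k_out_of_n_reliability(k, p):
    """P(at least k of the independent components function)."""
    # dist[m] = probability that exactly m of the components seen so far work
    dist = [1.0]
    for pi in p:
        nxt = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            nxt[m] += prob * (1.0 - pi)   # this component fails
            nxt[m + 1] += prob * pi       # this component works
        dist = nxt
    return sum(dist[k:])


def geometric_mean(p):
    """Geometric mean of strictly positive reliabilities."""
    return math.exp(sum(math.log(pi) for pi in p) / len(p))


p = [0.95, 0.8, 0.6, 0.5]
p_g = geometric_mean(p)
for k in range(1, len(p) + 1):
    unlike = k_out_of_n_reliability(k, p)
    alike = k_out_of_n_reliability(k, [p_g] * len(p))
    print(f"k={k}: h_k(p)={unlike:.4f}  h_k(p_G,...,p_G)={alike:.4f}")
```

For $k = n$ (the series system) the two values agree up to rounding, since both equal the product of the component reliabilities.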
Inasmuch as the time to failure of a $k$ out of $n$ system of independent components with respective life distributions $F_1, \dots, F_n$ corresponds to the $(n-k+1)$th order statistic from the set of underlying heterogeneous distributions $\{F_1, \dots, F_n\}$, results about $k$ out of $n$ systems may be interpreted in terms of order statistics from heterogeneous distributions. Let us assume that $Y_i$ ($Y'_i$) is an observation from distribution $F_i$ ($F'_i$) and that $R_i(x) = -\log \bar F_i(x)$ ($R'_i(x) = -\log \bar F'_i(x)$) is the corresponding hazard function for $i = 1, \dots, n$. The ordered observations are denoted by $Y_{[1]} \le \cdots \le Y_{[n]}$ ($Y'_{[1]} \le \cdots \le Y'_{[n]}$). A random variable $Y$ is stochastically larger than $Y'$ ($Y \ge_{st} Y'$) if $F_Y(x) \le F_{Y'}(x)$ for all $x$. In the realm of order statistics, Theorem 4.1 yields the following result:

THEOREM 4.2. Let $(R_1(x), \dots, R_n(x)) >_m (R'_1(x), \dots, R'_n(x))$ for all $x \ge 0$. Then $Y_{[1]} =_{st} Y'_{[1]}$ and $Y_{[k]} \ge_{st} Y'_{[k]}$ for $k = 2, \dots, n$.

Pledger and Proschan (1971) obtain further results of this type for the case of proportional hazards. We say that the distributions $F_1, \dots, F_n, F'_1, \dots, F'_n$ have proportional hazards with constants of proportionality $\lambda_1, \dots, \lambda_n, \lambda'_1, \dots, \lambda'_n$ if $R_i(x) = \lambda_i R(x)$ and $R'_i(x) = \lambda'_i R(x)$ for some hazard function $R(x)$ and all $i = 1, \dots, n$. A consequence of Theorem 4.2 is the following:

COROLLARY 4.3. Let $F_1, \dots, F_n, F'_1, \dots, F'_n$ have proportional hazard functions with $\lambda_1, \dots, \lambda_n, \lambda'_1, \dots, \lambda'_n$ as constants of proportionality. If $(\lambda_1, \dots, \lambda_n) >_m (\lambda'_1, \dots, \lambda'_n)$, then $Y_{[1]} =_{st} Y'_{[1]}$ and $Y_{[k]} \ge_{st} Y'_{[k]}$ for $k = 2, \dots, n$.

Proschan and Sethuraman (1976) generalize Corollary 4.3 and show that under the same stated conditions, $Y = (Y_1, \dots, Y_n) \ge_{st} Y' = (Y'_1, \dots, Y'_n)$ ($Y \ge_{st} Y'$ if and only if $f(Y) \ge_{st} f(Y')$ for all real valued increasing functions $f$ of $n$ variables).
For more on stochastic ordering the interested reader should consult Kamae, Krengel and O'Brien (1977). Proschan and Sethuraman apply their result to study the robustness of standard estimates of the parameter $\lambda$ in an exponential distribution ($F(x) = 1 - e^{-\lambda x}$) when the observations actually come from a set of heterogeneous exponential distributions.

Other comparisons for $k$ out of $n$ systems are given by Gleser (1975), and Boland and Proschan (1983). While investigating the distribution of the number of successes in independent but not necessarily identical Bernoulli trials, Hoeffding shows that
$$1 \ge h_k\Big(1, \dots, 1,\ \sum_{i=1}^n p_i - \Big[\sum_{i=1}^n p_i\Big],\ 0, \dots, 0\Big) \ge h_k(p_1, \dots, p_n)$$
whenever $\sum_{i=1}^n p_i \ge k$, and
$$0 = h_k\Big(1, \dots, 1,\ \sum_{i=1}^n p_i - \Big[\sum_{i=1}^n p_i\Big],\ 0, \dots, 0\Big) \le h_k(p_1, \dots, p_n) \le h_k(\bar p, \dots, \bar p)$$
whenever $\sum_{i=1}^n p_i \le k - 1$. Here $\bar p = \sum_{i=1}^n p_i / n$ and $[\sum_{i=1}^n p_i]$ is the integer part of $\sum_{i=1}^n p_i$. Gleser generalizes this in showing the following:

THEOREM 4.4. $h_k(p)$ is Schur convex in the region where $\sum_{i=1}^n p_i \ge k + 1$ and Schur concave in the region where $\sum_{i=1}^n p_i \le k - 2$.

In further research on the reliability of $k$ out of $n$ systems, Boland and Proschan (1983) show the following related result:

THEOREM 4.5. $h_k(p)$ is Schur convex in $[(k-1)/(n-1), 1]^n$ and Schur concave in $[0, (k-1)/(n-1)]^n$.

Theorems 4.4 and 4.5 represent inequalities which have practical use in the study of $k$ out of $n$ systems. However it should be clear that they are of more general interest and have applications in particular in the areas of order statistics and independent Bernoulli trials.

Barlow and Proschan (1965) show that the mean life of a series system with IFR components is greater than or equal to the mean life of a similar system with exponential components, assuming component mean lives match in the two systems. The reverse ordering is shown to hold in the parallel case.
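Returning to Theorem 4.5, its content can be spot-checked numerically: inside $[(k-1)/(n-1), 1]^n$, making the component reliabilities more alike (while keeping their sum fixed) should not increase $h_k$, and inside $[0, (k-1)/(n-1)]^n$ it should not decrease it. The sketch below is an illustration only; the helper names and the example vectors are hypothetical, and a single equalizing transfer between two components is used to produce a vector that is majorized by the original one.

```python
def k_out_of_n_reliability(k, p):
    """P(at least k of the independent components function)."""
    dist = [1.0]  # dist[m] = P(exactly m working components so far)
    for pi in p:
        nxt = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            nxt[m] += prob * (1.0 - pi)
            nxt[m + 1] += prob * pi
        dist = nxt
    return sum(dist[k:])


def equalizing_transfer(p, i, j, eps):
    """Move eps from the larger of p[i], p[j] to the smaller one.

    With 0 <= eps <= |p[i] - p[j]| / 2 the result is majorized by p.
    """
    hi, lo = (i, j) if p[i] >= p[j] else (j, i)
    if not 0.0 <= eps <= (p[hi] - p[lo]) / 2.0:
        raise ValueError("eps must lie between 0 and half the gap")
    q = list(p)
    q[hi] -= eps
    q[lo] += eps
    return q


n, k = 4, 2
threshold = (k - 1) / (n - 1)   # equals 1/3 here

p_high = [0.4, 0.5, 0.7, 0.9]   # every entry >= threshold
q_high = equalizing_transfer(p_high, 0, 3, 0.2)
p_low = [0.05, 0.1, 0.2, 0.3]   # every entry <= threshold
q_low = equalizing_transfer(p_low, 0, 3, 0.1)

print(k_out_of_n_reliability(k, p_high), k_out_of_n_reliability(k, q_high))
print(k_out_of_n_reliability(k, p_low), k_out_of_n_reliability(k, q_low))
```

In the first pair the more spread-out vector gives the larger reliability (Schur convexity); in the second pair the ordering is reversed (Schur concavity).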
Solovyev and Ushakov (1967) extend these results to include comparisons with systems of degenerate and truncated exponential distributions. Marshall and Proschan (1970) more generally show that if the life distributions $F_i$ and $G_i$ of corresponding components of a pair of series systems satisfy $\int_0^t \bar F_i(x)\,dx \ge \int_0^t \bar G_i(x)\,dx$ for all $t \ge 0$, then the same kind of inequality holds for the system life distribution. Similarly they show that the domination $\int_t^\infty \bar F(x)\,dx \ge \int_t^\infty \bar G(x)\,dx$ for all $t \ge 0$ is preserved under the formation of parallel systems, and that both of these types of domination are preserved under convolutions.

Marshall and Proschan (1970) are (implicitly) working with the concept of continuous majorization (see Marshall and Olkin (1979)). We say the life distribution function $F$ majorizes the life distribution function $G$ (written $F >_m G$) if $\mu_F = \int_0^\infty \bar F(x)\,dx = \int_0^\infty \bar G(x)\,dx = \mu_G$ and $\int_t^\infty \bar F(x)\,dx \ge \int_t^\infty \bar G(x)\,dx$ for all $t \ge 0$. As a by-product of their work on the mean life of series and parallel systems, Marshall and Proschan establish the following result in the theory of majorization.

THEOREM 4.6. Suppose that $F_i >_m G_i$ for each $i = 1, \dots, n$, where $F_i$ and $G_i$ are life distribution functions. Let $F(t) = F_1 * \cdots * F_n(t)$ and $G(t) = G_1 * \cdots * G_n(t)$ be $n$-fold convolutions, with respective means $\mu_F$ and $\mu_G$. Then $F >_m G$.

Many elementary inequalities of general interest have been generated through optimization problems in reliability theory. Derman, Lieberman and Ross (1972) consider the problem of how to assemble $J$ systems with $n$ different components in order to maximize the expected number of functioning systems. They extend a basic inequality of Hardy, Littlewood, and Pólya and 'rediscover' (their extension is a special case of a result of Lorentz (1953)) the following inequality:

THEOREM 4.7. Let $F(x_1, \dots, x_n)$ be a joint distribution function. If $x_i^1 \le \cdots \le x_i^J$ for $i = 1, \dots, n$, then
$$\sum_{j=1}^{J} F(x_1^j, \dots, x_n^j) \ \ge\ \sum_{j=1}^{J} F(x_1^j, x_2^{\pi_2(j)}, \dots, x_n^{\pi_n(j)})$$
whenever $\pi_i$ ($i = 2, \dots, n$) are permutations of $1, 2, \dots, J$.

References

Barlow, R. E. and Proschan, F. (1964). Comparison of replacement policies, and renewal theory implications. Ann. Math. Statist. 35, 577-589.
Barlow, R. E. and Proschan, F. (1965). Mathematical Theory of Reliability. Wiley, New York.
Barlow, R. E. and Proschan, F. (1981). Statistical Theory of Reliability and Life Testing. To Begin With, Silver Spring, MD.
Bazovsky, I. (1962). Study of maintenance cost optimization and reliability of shipboard machinery. ONR Contract No. Nonr-374000(00) (FBM), United Control Corp., Seattle, WA.
Birnbaum, Z. W., Esary, J. D. and Saunders, S. C. (1961). Multi-component systems and structures and their reliability. Technometrics 3, 55-77.
Birnbaum, Z. W. and Saunders, S. C. (1968). A probabilistic interpretation of Miner's rule. SIAM J. App. Math. 16, 637-652.
Black, G. and Proschan, F. (1959). On optimal redundancy. Oper. Res. 7, 581-588.
Boland, P. J. and Proschan, F. (1983). The reliability of k out of n systems. Ann. Prob. 11, 760-764.
Boland, P. J. and Proschan, F. (1984). An integral inequality with applications to order statistics. To appear.
Brown, M. (1980). Bounds, inequalities, and monotonicity properties for some specialized renewal processes. Ann. Probability 8, 227-240.
Brown, M. (1981). Further monotonicity properties for specialized renewal processes. Ann. Probability 9, 891-895.
Cox, D. R. (1982). Renewal Theory. Wiley, New York.
Derman, C., Lieberman, G. J. and Ross, S. M. (1972). On optimal assembly of systems. Nav. Res. Log. Quart. 19, 569-574.
Esary, J. D., Marshall, A. W. and Proschan, F. (1973). Shock models and wear processes. Ann. Prob. 1, 627-649.
Esary, J. D. and Proschan, F. (1963). Coherent structures of non-identical components. Technometrics 5, 191-209.
Esary, J. D., Proschan, F. and Walkup, D. W. (1967).
Association of random variables, with applications. Ann. Math. Stat. 38, 1466-1474.
Feller, W. (1948). On probability problems in the theory of counters. Courant Anniversary Volume. Interscience, New York.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications, Vol. II. Wiley, New York.
Fortuin, C. M., Kasteleyn, P. W. and Ginibre, J. (1971). Correlation inequalities on some partially ordered sets. Comm. Math. Phys. 22, 89-103.
Gleser, L. (1975). On the distribution of the number of successes in independent trials. Ann. Prob. 3, 182-188.
Hardy, G. H., Littlewood, J. E. and Pólya, G. (1952). Inequalities. Cambridge University Press, New York.
Heidelberger, P. and Iglehart, D. L. (1979). Comparing stochastic systems using regenerative simulation with common random numbers. Adv. Appl. Prob. 11, 804-819.
Hoeffding, W. (1956). On the distribution of the number of successes in independent trials. Ann. Math. Stat. 27, 713-721.
Joag-dev, K., Perlman, M. D. and Pitt, L. D. (1983). Association of normal random variables and Slepian's inequality. Ann. Prob. 11, 451-455.
Kamae, T., Krengel, U. and O'Brien, G. L. (1977). Stochastic inequalities on partially ordered spaces. Ann. Probab. 5, 899-912.
Karlin, S. (1964). Total positivity, absorption probabilities and applications. Trans. Amer. Math. Soc. 111, 33-107.
Karlin, S. (1968). Total Positivity. Stanford University Press, Stanford, CA.
Karlin, S. and Proschan, F. (1960). Pólya type distributions of convolutions. Ann. Math. Stat. 31, 721-736.
Karlin, S. and Taylor, H. M. (1975). A First Course in Stochastic Processes, 2nd edition. Academic Press, New York.
Keilson, J. (1975). Systems of independent Markov components and their transient behavior. In: R. E. Barlow, J. B. Fussell and N. D. Singpurwalla, eds., Reliability and Fault Tree Analysis. SIAM, Philadelphia, PA, 351-364.
Kemperman, J. H. B. (1977). On the FKG-inequality for measures on a partially ordered space. Indag. Math. 39, 313-331.
Kimball, A. W. (1951). On dependent tests of significance in the analysis of variance. Ann. Math. Stat. 22, 600-602.
Lehmann, E. L. (1966). Some concepts of dependence. Ann. Math. Stat. 37, 1137-1153.
Lorentz, G. G. (1953). An inequality for rearrangements. Amer. Math. Mon. 60, 176-179.
Marshall, A. W. and Olkin, I. (1967). A multivariate exponential distribution. J. Amer. Stat. Assoc. 62, 30-44.
Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications. Academic Press, New York.
Marshall, A. W. and Proschan, F. (1970). Mean life of series and parallel systems. J. App. Prob. 7, 165-174.
Marshall, A. W. and Proschan, F. (1972). Classes of distributions applicable in replacement, with renewal theory implications. In: L. LeCam, J. Neyman and E. L. Scott, eds., Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol. I, University of California Press, Berkeley, CA, 395-415.
Marshall, K. T. (1973). Linear bounds on the renewal function. SIAM J. App. Math. 24, 245-250.
Miner, M. A. (1945). Cumulative damage in fatigue. J. Appl. Mech. 12, A159-A164.
Moore, E. F. and Shannon, C. E. (1956). Reliable circuits using less reliable relays. J. Franklin Institute 262, part I 191-208 and part II 281-297.
Newman, C. M. and Wright, A. L. (1981). An invariance principle for certain dependent sequences. Ann. Prob. 9, 671-675.
Niu, S. C. (1981). On queues with dependent interarrival and service times. Nav. Res. Log. Quart. 28, 497-501.
Pitt, L. D. (1982). Positively correlated normal random variables are associated. Ann. Prob. 10, 496-499.
Pledger, G. and Proschan, F. (1971). Comparisons of order statistics and of spacings from heterogeneous distributions. In: J. S. Rustagi, ed., Optimizing Methods in Statistics. Academic Press, New York, 89-113.
Proschan, F. (1960). Pólya Type Distributions in Renewal Theory, with an Application to an Inventory Problem. Prentice-Hall, Englewood, NJ.
Proschan, F. and Sethuraman, J. (1976). Stochastic comparisons of order statistics from heterogeneous populations, with applications in reliability theory. J. Mult. Anal. 6, 608-616.
Robbins, H. (1954). A remark on the joint distribution of cumulative sums. Ann. Math. Stat. 25, 614-616.
Ross, S. M. (1970). Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.
Saunders, S. C. (1970). A probabilistic interpretation of Miner's rule. II. SIAM J. App. Math. 19, 251-265.
Shogan, A. W. (1977). Bounding distributions for a stochastic PERT network. Networks 7, 359-381.
Solovyev, A. D. and Ushakov, I. A. (1967). Some estimates for systems with components 'wearing out'. (In Russian). Avtomat. i Vycisl. Tehn. 6, 38-44.
Smith, W. L. (1968). Renewal theory and its ramifications. J. Roy. Statist. Soc., Series B 20, 243-302.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7 © Elsevier Science Publishers B.V. (1988) 175-213

11

Reliability Ideas and Applications in Economics and Social Sciences

M. C. Bhattacharjee*

0. Introduction and summary

0.1. In recent times, Reliability theoretic ideas and methods have been used successfully in several other areas of investigation, with a view towards exploiting concepts and tools which have their roots in Reliability Theory in other settings to draw useful conclusions.
For a purely illustrative list of some of these areas and corresponding problems which have been so addressed, one may mention: demography (bounds on the 'Malthusian parameter', reproductive value and other related parameters in population growth models, useful when the age-specific birth and death-rates are unknown or subject to error: Barlow and Saboia (1973)), queueing theory (probabilistic structure of and bounds on the stationary waiting time and queue lengths in single server queues: Kleinrock (1975), Bergmann and Stoyan (1976), Köllerström (1976), Daley (1983)) and economics ('inequality of distribution' and associated problems: Chandra and Singpurwalla (1981), Klefsjö (1984), Bhattacharjee and Krishnaji (1985)). In each of these problems, the domain of primary concern and immediate reference is not the lifelengths of physical devices or systems of such components, or their failure-logic structure per se, but some phenomenon, possibly random, evolving in time and space. Nevertheless, the basic reason behind the success of cross-fertilization of ideas and methods in each of the examples listed above is that the concepts and tools which owe their origin to traditional Reliability theory are in principle applicable to non-negative (random) variables and (stochastic) processes generated by such variables.

0.2. Rather than attempt to provide a bibliography of all known applications of Reliability in widely diverse areas, our purpose in this paper is more modest. We review recent work on such applications to some problems in economics and social sciences, which is illustrative of the non-traditional applications of Reliability ideas that are finding increasing use. In Section 1, 'social choice functions' and the celebrated 'impossibility theorem' of Arrow (1951) are considered as an application of 'monotone-structure' ideas.

* Work done while the author was visiting the University of Arizona.
Section 2 considers 'voting games' and 'power indices', which are among the best known quantitative models of group behavior in political science, to show they can be modeled via the theory of structure functions. Besides providing new viewpoints and alternative proofs of well known classic results which these situations illustrate, reliability ideas can also lead to new insights. Sections 3 and 4, which exploit appropriate parametric and nonparametric 'life distribution' ideas, are in the latter category. Section 3 considers alternatives to the traditional Lorenz-coefficient and Gini-index for measuring 'inequality of distribution' in economics by exploiting mean residual life and TTT-transform concepts. Section 4 describes an approach to modeling some aspects of the 'economics of innovation and R & D rivalry' by considering the 'reliability characteristics' of the time to innovation of a technologically feasible product or process among a competing group of entrepreneurs or firms which are in the race to be the first to innovate.

In each of the four themes, a summary of the problem formulation and basic results of interest precedes the reliability analogies and arguments which can be brought to bear on the problems. No detailed proofs are given except for Arrow's theorem (Section 1.2), taken from an unpublished technical report whose succinct arguments are reviewed to illustrate how the reliability approach can be constructive in clarifying the role of underlying assumptions and in providing an alternative insight. The role of interpretation of appropriate reliability theoretic concepts and results in such an interplay cannot be minimized, and such interpretations are interspersed throughout our presentation. The format is mainly expository in nature, although some results are new. In each section, we also indicate some possible directions of further development that would be interesting from the point of view of the themes addressed and that of reliability theory and applications.

1. The 'Impossibility Theorem' of Arrow

1.1. Arrow (1951) considered the problem of aggregating 'individual preference orderings' to form a 'social preference ordering'. In the conceptual framework of social decision making and particularly in the context of voting theory, his celebrated 'impossibility theorem' is a landmark result which essentially states that there is no social preference ordering which obeys two reasonable axioms and four conditions that one would expect all reasonable ways of aggregating individual preferences to a collective one to satisfy. Pechlivanides (1975), in a paper investigating some aspects of social decision structures, has given an alternative proof of Arrow's theorem using coherent-structure arguments of reliability theory. This proof appears to have remained unpublished, and we believe it is a very apt illustration of the reliability arguments for many modeling problems in the social sciences. His arguments are somewhat succinct; we will review and amplify them.

Before reviewing Pechlivanides' proof, we take up a brief description and formal statement of Arrow's theorem, which may not be entirely familiar to reliability researchers. Central to this is the idea of a preference ordering $R$ among the elements $x, y, \dots$ of a finite set $F$. $R$ is a relation among the elements of $F$ such that for any $x, y \in F$, we say: $x\,R\,y$ iff $x$ is at least as preferred as $y$. Such a relation $R$ is required to satisfy the two axioms:

(A1) Transitivity: For all $x, y, z \in F$; $x\,R\,y$ and $y\,R\,z$ $\Rightarrow$ $x\,R\,z$.
(A2) Connectedness: For all $x, y \in F$; either $x\,R\,y$ or $y\,R\,x$ or both.

Technically $R$ is a complete pre-order on $F$; it is analogous to a relation such as 'at least as tall as' among a set of persons. Notice that we can have both $x\,R\,y$ and $y\,R\,x$ but $x \ne y$.
For a given $F$, it is sometimes easier to understand the relation $R$ through two other relations $P$, $I$ defined as: $x\,P\,y$ $\Leftrightarrow$ $x$ is strictly preferred to $y$; while $x\,I\,y$ $\Leftrightarrow$ $x$ and $y$ are equally preferred (indifference). Then note, (i) $x\,R\,y \Leftrightarrow y \not P\, x$, i.e., $x\,R\,y$ is the negation of $y\,P\,x$, and (ii) the axiom (A2) says: either $x\,P\,y$ or $y\,P\,x$ or $x\,I\,y$.

Now consider a society $S = \{1, 2, \dots, n\}$ of $n$ individuals (voters), $n \ge 2$, and a finite set $A$ of alternatives consisting of $k$ choices (candidates/policies/actions), $k > 2$. Each individual $i \in S$ has a personal preference ordering $R_i$ on $A$ satisfying the axioms (A1) and (A2). The problem is to aggregate all the individual preferences into a choice for $S$ as a whole. To put it another way, since $R_i$ indicates how $i$ 'votes', an 'election' $\mathcal{E}$ is a complete set of 'votes' (formally, $\mathcal{E} = \{R_i : i \in S\}$), and the result of any such election must amalgamate its elements (i.e., the individual voter-preferences) in a reasonable manner into a well-defined collective preference of the society $S$. Such a result can be thought of as another relation $R^*$ on $A$ which, to be reasonable, must again satisfy the same two axioms (A1) and (A2) with $F = A$. Arrow conceptualizes the definition of a 'voting system' as the specification of a social preference ordering $R^*$ given $S$, $A$. There are many possible $R^*$ that one can define, including highly arbitrary ones such as $R^* = R_i$ for some $i \in S$ (such an individual $i$, if it exists, is called a 'dictator'). To model real-world situations, we need to exclude such unreasonable voting systems and confine ourselves to those $R^*$ which satisfy some intuitive criteria of fairness and consistency. Arrow visualized four such conditions, namely:

(C1) (Well-definedness). A voting system $R^*$ must be capable of a decision. For any pair of alternatives $a$, $b$, there exists an 'election' for which the society prefers $a$ to $b$. [$R^*$ must be defined on the set of all $n$-tuples $\mathcal{E} = (R_1, \dots, R_n)$ of individual preferences and is such that for all $a, b$ in $A$, either $a\,R^*\,b$ or $a \not R^* b$, and there exists an $\mathcal{E}$ such that $b \not R^* a$.]

(C2) (Independence of Irrelevant Alternatives). $R^*$ must be invariant under addition or deletion of alternatives. [If $A' \subset A$ and $\mathcal{E} = \{R_i : i \in S\}$ is any election, then $R^*|_{A'}$ should depend only on $\{R_i|_{A'} : i \in S\}$, where $R_i|_{A'}$ ($R^*|_{A'}$, respectively) is the restriction of $R_i$ ($R^*$, respectively) to $A'$.]

(C3) (Positive Responsiveness). An increasing (i.e., nondecreasing) preference for an alternative between two elections does not decrease its social preference. [Formally, given $S$ and $A$, let $\mathcal{E} = \{R_i : i \in S\}$ and $\mathcal{E}' = \{R'_i : i \in S\}$ be two elections. If there exists an $a \in A$ such that
(i) $a\,R_i\,a' \Rightarrow a\,R'_i\,a'$ for all $i \in S$ and $a' \ne a$;
(ii) for all pairs $(a', b') \in A \times A$ with $a' \ne a$, $b' \ne a$, $a' \ne b'$: $\{(a', b') : a'\,R_i\,b'\} = \{(a', b') : a'\,R'_i\,b'\}$,
then $a\,R^*\,a' \Rightarrow a\,R^{*\prime}\,a'$ for all $a' \ne a$. In other words, if each voter looks on $a \in A$ at least as favorably under $\mathcal{E}'$ as he does under $\mathcal{E}$, and if the individual preferences between any other pair of alternatives remain the same under both elections, then the society looks on $a$ at least as favorably under $\mathcal{E}'$ as it does under $\mathcal{E}$.]

(C4) (No Dictator). There is no individual whose preference ('vote') always coincides with the social preference regardless of the other individual preferences. [There does not exist $i \in S$ with $R^* = R_i$, i.e., such that for all $(a, b)$, $a\,R_i\,b \Rightarrow a\,R^*\,b$ and $a \not R_i\, b \Rightarrow a \not R^* b$.]

Call a voting system (social preference ordering) $R^*$ admissible iff it satisfies the axioms (A1), (A2) and the conditions (C1)-(C4). Arrow's impossibility theorem then claims that for a society of at least two individuals and more than two alternatives, an admissible voting system does not exist.

1.2. The 'reliability' argument. The traditional proof of Arrow's theorem depends heavily on the properties of complete pre-orders.
To see the relevance of reliability ideas for proving Arrow's theorem, Pechlivanides imagines the society $S$ as a system and each voter $i \in S$ as one of its components. For every pair $(a, b)$ of alternatives with $a \ne b$, associate a binary variable $x_i : A^2 \to \{0, 1\}$, where $A^2 = \{(a, b) : a \in A,\ b \in A,\ a \ne b\}$ is the set $A \times A$ devoid of its diagonal, by
$$x_i(a, b) = 1 \ \text{ if } a\,R_i\,b, \qquad = 0 \ \text{ if } a \not R_i\, b. \tag{1.1}$$
Relative to $b$, every $x_i(a, b)$ is a vote for $a$ if $x_i(a, b) = 1$ and is a vote against $a$ if it equals zero. Thus $x_i$ defines $i$'s vote and is an equivalent description of his individual preference ordering $R_i$. The vote-vector $x = (x_1, \dots, x_n) : A^2 \to \{0, 1\}^n$ is an equivalent description of an election $\mathcal{E} = (R_1, \dots, R_n)$. A voting system (social preference ordering) $R^*$ is similarly equivalent to specifying a social choice function $F_A : A^2 \to \{0, 1\}$ such that
$$F_A(a, b) = 1 \ \text{ if } a\,R^*\,b, \qquad = 0 \ \text{ if } a \not R^* b. \tag{1.2}$$
Each $x_i(a, b) = 1$ or $0$ ($F_A(a, b) = 1$ or $0$, respectively) according as the individual $i$ (society $S$, respectively) does not/does prefer $b$ to $a$. Formally, Arrow's result is then:

IMPOSSIBILITY THEOREM (Arrow). There does not exist a social choice function $F_A$ satisfying (A1), (A2) and (C1)-(C4).

To argue that the two axioms and four conditions are collectively inconsistent, the first step is to show:

LEMMA 1. (C1)-(C3) hold $\Leftrightarrow$ $F_A = \phi(x)$ for some monotone structure function $\phi$.

PROOF. Recall that a monotone structure function in reliability theory is any function $\phi : \{0, 1\}^n \to \{0, 1\}$ such that $\phi$ is non-decreasing in each argument and $\phi(\mathbf{0}) = 0$, $\phi(\mathbf{1}) = 1$, where $\mathbf{0} = (0, \dots, 0)$ and $\mathbf{1} = (1, \dots, 1)$ (viz., Barlow and Proschan, 1975). First note (C2) $\Rightarrow$ $F_A(a, b)$ depends only on $(a, b)$ and not on all of $A$. Hence we will simply write $F$ for $F_A$. The condition (C1) $\Rightarrow$ $F(a, b) = \phi(x(a, b))$ for all $(a, b) \in A^2$, for some binary structure function $\phi$. Next, (C3) $\Rightarrow$ this $\phi(x)$ is monotone non-decreasing in each coordinate $x_i$.
Finally (C1) and (C3) together $\Rightarrow$ $\phi(\mathbf{0}) = 0$, $\phi(\mathbf{1}) = 1$; viz., since by (C1) there exist vote-vectors $x_0$ and $x_1$ such that $\phi(x_0) = 0$, $\phi(x_1) = 1$, by the monotonicity hypothesis (C3) for $\phi$ we get
$$0 \le \phi(\mathbf{0}) \le \phi(x_0) = 0, \qquad 1 = \phi(x_1) \le \phi(\mathbf{1}) \le 1.$$
Thus the conditions (C1)-(C3) imply $F = \phi(x)$ for some monotone structure function $\phi$. The converse is trivial. $\square$

The axioms (A1) and (A2) for voting systems, translated to requirements on the social choice function $F(a, b) = \phi(x(a, b))$, become:

(A1) Transitivity: $F(a, b) = 1 = F(b, c) \Rightarrow F(a, c) = 1$.
(A2) Connectedness: $F(a, b) = 1$ or $0$.

Consider a pair of alternatives $(a, b) \in A^2$ such that $F(a, b) = \phi(x(a, b)) = 1$. Borrowing the terminology of reliability theory, we will say
$$P(a, b) := \{i \in S : x_i(a, b) = 1\} = \{i \in S : a\,R_i\,b\} \tag{1.3}$$
is an $(a, b)$-path. Similarly if $F(a, b) = 0$, call the set of individuals
$$C(a, b) := \{i \in S : x_i(a, b) = 0\} = \{i \in S : b\,P_i\,a\} \tag{1.4}$$
an $(a, b)$-cut. Thus an $(a, b)$-path ($(a, b)$-cut, respectively) is any coalition, i.e., subset of individuals, whose common 'non-preference of $b$ relative to $a$' ('preference of $b$ over $a$', respectively) is inherited by the whole society $S$. Obviously such paths (cuts) always exist, since the whole society $S$ is always a path as well as a cut for every pair of alternatives. When the relevant pair of alternatives $(a, b)$ is clear from the context, we drop the prefix $(a, b)$ for simplicity and just refer to (1.3) and (1.4) as path and cut. A minimal path (cut) is a coalition of which no proper subset is a path (cut).

To return to the main proof, notice that Lemma 1 limits the search for social choice functions $F = \phi(x)$ to those monotone structure functions $\phi$ which satisfy (A1), (A2) and (C4). A social choice function satisfies the connectedness axiom (A2) iff for every pair of alternatives $(a, b)$ there exists either a path or a cut, according as $F(a, b) = 1$ or $0$, whose members' common vote agrees with the social choice $F(a, b)$.
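The notions of monotone structure function, minimal path and minimal cut can be explored by brute force for small societies. The sketch below is illustrative only: it uses the standard reliability convention that a coalition is a path if a unanimous vote of 1 by its members (with everyone else voting 0) yields $\phi = 1$, and a cut if a unanimous vote of 0 by its members (with everyone else voting 1) yields $\phi = 0$. Simple majority and a dictatorship by voter 0 are assumed examples, and all function names are hypothetical.

```python
from itertools import combinations, product


def is_monotone_structure(phi, n):
    """Check phi(0,...,0) = 0, phi(1,...,1) = 1 and coordinatewise monotonicity."""
    if phi((0,) * n) != 0 or phi((1,) * n) != 1:
        return False
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]   # raise coordinate i from 0 to 1
                if phi(y) < phi(x):
                    return False
    return True


def minimal_sets(phi, n, member_vote, target):
    """Minimal coalitions whose unanimous member_vote forces phi == target."""
    found = []
    for size in range(1, n + 1):
        for coalition in combinations(range(n), size):
            x = tuple(member_vote if i in coalition else 1 - member_vote
                      for i in range(n))
            is_superset = any(set(q) <= set(coalition) for q in found)
            if phi(x) == target and not is_superset:
                found.append(coalition)
    return found


def minimal_paths(phi, n):
    return minimal_sets(phi, n, member_vote=1, target=1)


def minimal_cuts(phi, n):
    return minimal_sets(phi, n, member_vote=0, target=0)


def majority(x):
    """Simple majority rule (an assumed example of a structure function)."""
    return 1 if 2 * sum(x) > len(x) else 0


def dictator(x):
    """Voter 0 decides alone (an assumed example)."""
    return x[0]


print(is_monotone_structure(majority, 3))
print(minimal_paths(majority, 3), minimal_cuts(majority, 3))
print(minimal_paths(dictator, 3), minimal_cuts(dictator, 3))
```

For three voters under majority rule, every pair of voters is both a minimal path and a minimal cut, whereas under a dictatorship the single voter 0 is the only minimal path and the only minimal cut.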
The transitivity axiom (A1), that $F(a, b) = 1 = F(b, c) \Rightarrow F(a, c) = 1$ for each triple of alternatives $(a, b, c)$, can be similarly translated as: for each of the pairs $(a, b)$, $(b, c)$, $(a, c)$ there exists a path, not necessarily the same, which allows the cycle of alternatives $a, b, c$ to pass. Let $\mathcal{M}$ be the class of monotone structure functions and set
$$\mathcal{F} := \{\phi \in \mathcal{M} : \text{no two paths are disjoint}\},$$
$$\mathcal{F}^* := \{\phi \in \mathcal{M} : \text{intersection of all paths is nonempty}\},$$
$$\mathcal{D} := \{\phi \in \mathcal{M} : \phi = \phi^d\},$$
where $\phi^d$ is the dual-structure function $\phi^d(x) := 1 - \phi(\mathbf{1} - x)$. $\mathcal{F}$ ($\mathcal{F}^*$, respectively) are those monotone structures for which there is at least one common component shared by any two paths (all paths, respectively). $\mathcal{D}$ is the class of self-dual monotone structures, for which every path (cut) is also a cut (path). Clearly $\mathcal{F}^* \subset \mathcal{F}$. Also $\mathcal{D} \subset \mathcal{F}$; for if not, then there exist two paths $P_1$, $P_2$ (which are also cuts by self-duality) which are disjoint, so that we then have a cut $P_1$ disjoint from a path $P_2$. This contradicts the fact that any two coalitions of which one is a path and the other a cut must have at least one common component, for otherwise it would be possible for a structure $\phi$ to fail ($\phi(x) = 0$) and not fail ($\phi(x) \ne 0$) simultaneously, violating the well-definedness condition (C1). Thus
$$\mathcal{D} \cup \mathcal{F}^* \subset \mathcal{F}. \tag{1.5}$$

To see if there is an admissible social choice function $F$, we are asking if there exists a $\phi \in \mathcal{M}$ satisfying (A1), (A2) and (C4). To check that the answer is no, the underlying argument is as follows. First check
$$\text{(A2)} \Rightarrow \phi \in \mathcal{D} \tag{1.6}$$
and hence $\phi \in \mathcal{F}$ by (1.5). Which of the structures satisfying (A2) also satisfy (A1)? We show this is precisely $\mathcal{F}^*$, i.e., claim
$$\mathcal{F} \cap \text{(A1)} = \mathcal{F}^* \tag{1.7}$$
so that any admissible $F = \phi(x) \in \mathcal{F}^*$. The final step is to show that the property defining $\mathcal{F}^*$ and the no-dictator hypothesis (C4) are mutually inconsistent.

The following outlines the steps of the argument.
For any pair (a, b) of alternatives, the society S obeying axiom (A2) must either decide 'b is not preferred to a' ($F(a, b) = \phi(x(a, b)) = 1$) or its negation 'b is preferred to a' ($F(a, b) = \phi(x(a, b)) = 0$). If the individual votes $x(a, b)$ result in either of these two social choices, as they must, the dual response $\mathbf{1} - x(a, b)$ (which changes every individual vote in $x(a, b)$ to its negation) must induce the other; i.e., for each x,

$\phi(x) = 0$ (1, resp.) $\Leftrightarrow \phi(\mathbf{1} - x) = 1$ (0, resp.) $\Leftrightarrow \phi^d(x) = 0$ (1, resp.) $= \phi(x)$.

Thus (A2) restricts us to $\mathcal{D}$, the class of self-dual monotone structures.

To argue (1.7), consider a $\phi \in \mathcal{F}^*$, the class of structures whose paths all share a common component. If $i_0$ is a component (individual) common to all paths for all pairs of alternatives, then $\{i_0\}$ is necessarily a cut; i.e., systems in $\mathcal{F}^*$ have a singleton cut $\{i_0\}$. Since this component $i_0$ obeys the transitivity axiom, so does $\phi$. Thus systems in $\mathcal{F}^*$ satisfy (A1), so that together with $\mathcal{F}^* \subset \mathcal{F}$ we see that $\mathcal{F}^*$ is contained in $\mathcal{F} \cap$ (A1). One thus has only to argue the reverse inclusion: systems in $\mathcal{F}$ obeying transitivity must be in $\mathcal{F}^*$. Consider any such system $\phi \in \mathcal{F}$ and the set of all of its paths for all alternative pairs (a, b).

(i) If there is only a single path, then $\phi \in \mathcal{F}^*$ trivially, and hence satisfies (A1) since $\mathcal{F}^*$ does.

(ii) If there are exactly two paths in all, then membership in $\mathcal{F}$ and in $\mathcal{F}^*$ coincide; so again $\phi \in \mathcal{F}^*$, satisfying (A1).

(iii) If there are at least three paths, choose any three, say $P^1, P^2, P^3$. Let $i^*(1, 2)$ be a component in $P^1 \cap P^2$. Suppose, if possible, $i^*(1, 2) \notin P^3$. Then there exist distinct components $i^*(2, 3)$, $i^*(1, 3)$ in $P^2 \cap P^3$ and $P^1 \cap P^3$ respectively. Choose the component-votes (individual preference orderings) of these components, and the system-votes (social choices) by appropriate choices of the votes for the remaining components in the three paths, for an arbitrary but fixed cycle of alternatives (a, b, c) as shown in Table 1 (for simplicity, the component preferences and votes are generically denoted by P and $x(\cdot, \cdot)$, suppressing the individual identity subscript.
Thus for $i^*(1, 2)$, the preference $P = P_{i^*(1,2)}$, $x(a, b) = x_{i^*(1,2)}(a, b)$, etc.).

Table 1

Paths     | Common component | Individual preference | Equivalent component-vote | Suitable choices of votes for other components in | Corresponding social choice
P^1, P^2  | i*(1, 2)         | a P b P c             | x(c, b) = x(b, a) = 0     | P^1                                               | F(c, b) = 0
P^2, P^3  | i*(2, 3)         | c P a P b             | x(b, a) = x(a, c) = 0     | P^2                                               | F(b, a) = 0
P^1, P^3  | i*(1, 3)         | b P c P a             | x(a, c) = x(c, b) = 0     | P^3                                               | F(a, c) = 0

Since $F = \phi(x)$ is self-dual, we have $F(a, b) = 1 - F(b, a)$ for all $(a, b) \in A^2$; viz., $x_i(a, b) = 1 - x_i(b, a)$ for all $i \in S$ and all (a, b); hence $F(a, b) = \phi(x(a, b)) = \phi^d(x(a, b)) = 1 - \phi(\mathbf{1} - x(a, b)) = 1 - \phi(x(b, a)) = 1 - F(b, a)$. Hence, for the cycle of alternatives (a, b, c), from the last column of the above table we have $F(b, c) = 1 = F(c, a)$ but $F(b, a) = 0$, contradicting the transitivity axiom (A1). Hence all three paths must share a common component.

In the spirit of the above construction, an inductive argument can now similarly show that if there are (j + 1) paths in all and if every set of j paths has a common component, then so does the set of all (j + 1) paths, j = 1, 2, ..., if (A1) is to hold. Thus there is a component common to all paths, i.e., $\phi \in \mathcal{F}^*$, the class of structures whose paths all share a common component. Let $i^*$ be such a component. Since $i^*$ belongs to every path, it is a one-component cut. It is also a one-component path, by the self-duality of $\phi$. That $\{i^*\}$ is both a path and a cut says

$x_{i^*} = 1\,(0) \Rightarrow \phi(x) = 1\,(0)$,

irrespective of the votes $x_i$ of all other individuals $i \in S$, $i \ne i^*$. Hence $i^*$ is a dictator. But this contradicts (C4). □

The problem of aggregation is vacuous unless there are at least two individual components ($n \ge 2$). Notice also the role of the assumption that there are at least three choices ($k > 2$ alternatives), which places the transitivity axiom in perspective. There are real-life voting systems (social choice functions) which do not satisfy (A1).
One such example is the majority system $R^*$ such that

$a R^* b \Leftrightarrow N(a, b) \ge N(b, a)$,

where $N(a, b) = \{\#\text{ of voters } i \in S \text{ with } a R_i b\} = \sum_{i=1}^{n} x_i(a, b)$. Since each individual is a one-component self-dual system (viz., $x_i(a, b) = 1 - x_i(b, a)$ for all (a, b)), the social choice function F corresponding to the majority voting system $R^*$ is

$F(a, b) = \phi(x(a, b)) = 1\,(0) \Leftrightarrow \sum_{i=1}^{n} x_i(a, b) \ge (<)\ \tfrac{1}{2} n$.

Thus F is the so-called (m, n)-structure $\phi$ in reliability theory, where

$m = [\tfrac{1}{2} n] + 1$ if n is odd, $m = \tfrac{1}{2} n$ if n is even.

This $F = \phi(x)$ is monotone, indeed a coherent structure; but F and the corresponding voting system $R^*$ are not transitive, since with three choices (a, b, c) we may have a majority ($\ge n/2$) of voters not preferring 'c to b' and 'b to a' but strictly less than a majority not preferring 'c to a'. Formally, $\sum_{i=1}^{n} x_i(a, b) \ge n/2$ and $\sum_{i=1}^{n} x_i(b, c) \ge n/2$ but $\sum_{i=1}^{n} x_i(a, c) < n/2$; correspondingly $F(a, b) = F(b, c) = 1$ but $F(a, c) = 0$. The non-transitivity of majority systems is a telling example of the impossibility of meeting conflicting requirements each of which is desirable by itself.

Pechlivanides (ibid.) also shows that if we replace axiom (A1) by symmetry of components (i.e., require $\phi(x)$ to be permutation-invariant in the coordinates of x) but retain all other assumptions in Arrow's theorem, the only possible resulting structures are the odd-majority systems. In this sense, a majority voting system with an odd number ($n = 2m + 1$) of voters is a reasonable system. While transitivity is essentially a consistency requirement, the symmetry hypothesis is an assumption of irrelevance of the identity of individuals, in that any mutual exchange of their identities does not affect the collective choice. One can ponder the implications of the trade-off between these assumptions for any theory of democratic behavior for social decision making.

1.3.
The monotone structures $\phi$ in Lemma 1 are referred to as coherent structures in Pechlivanides (1975). In accepted contemporary use (viz., Barlow and Proschan, 1975), however, coherence requires replacing the assumption $\phi(\mathbf{0}) = 0$, $\phi(\mathbf{1}) = 1$ for monotone structures by the assumption that all components are 'relevant'. A component (voter) $i \in S$ is irrelevant if its (the person's) functioning or non-functioning (individual preference for or against an alternative) does not affect the system's performance (social choice), i.e., $\phi(x)$ is constant in $x_i$; equivalently

$\phi(1_i, x) - \phi(0_i, x) = 0$, all x,

where $(0_i, x) := (x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$ and $(1_i, x)$ is defined similarly. Hence $\phi(\cdot_i, x)$ is the social choice given i's vote, $i \in S$. Thus,

$i \in S$ is relevant $\Leftrightarrow \phi(1_i, x) - \phi(0_i, x) \ne 0$, some x $\Leftrightarrow \phi(1_i, x(a, b)) - \phi(0_i, x(a, b)) \ne 0$, some (a, b),

when relevance is translated in terms of social choice given i's vote; while

$i \in S$ is a dictator $\Leftrightarrow \phi(1_i, x(a, b)) = 1$, $\phi(0_i, x(a, b)) = 0$, all (a, b).

Let

$S_{a,b} = \{i \in S : \phi(1_i, x(a, b)) - \phi(0_i, x(a, b)) = 0\}$.

Then the set of dictators, if any, is

$D = \{i \in S : \phi(1_i, x) - \phi(0_i, x) \ne 0, \text{ all } x\} = \bigcap_{(a,b) \in A^2} S_{a,b}^{c}$,

while the set of irrelevant components is

$D_0 = \{i \in S : \phi(1_i, x) - \phi(0_i, x) = 0, \text{ all } x\} = \bigcap_{(a,b) \in A^2} S_{a,b}$.

Note that $\phi$ is coherent $\Leftrightarrow$ $\phi$ is coordinatewise monotone nondecreasing and $D_0 = \emptyset$ (empty); while the 'no dictator hypothesis' holds $\Leftrightarrow D = \emptyset$. In the context of the social choice problem, we may call $D_0$ the set of 'dummy' voters, those whose individual preferences are of no consequence for the social choice. An assumption of no dummies ($D_0$ empty), which together with (C1)-(C3) then leads to a coherent social choice function $F = \phi(x)$, would require that for every individual there is some pair of alternatives (a, b) for which the social preference agrees with his own.
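The definitions of D and D_0 above can be checked mechanically on small examples. The following Python sketch is only an illustration under stated assumptions: the helper `classify_voters` and the two toy structure functions are hypothetical names introduced here, voters are indexed from 0, and the enumeration is over all vote-vectors x rather than over pairs of alternatives.

```python
from itertools import product

def classify_voters(phi, n):
    """Return (dictators, dummies) of a binary structure function phi on {0,1}^n.

    Voter i is counted as a dictator if phi(1_i, x) - phi(0_i, x) != 0 for all x,
    and as a dummy (irrelevant) if the difference is 0 for all x.
    """
    dictators, dummies = [], []
    for i in range(n):
        diffs = []
        for x in product((0, 1), repeat=n):
            with_yes = x[:i] + (1,) + x[i + 1:]   # (1_i, x)
            with_no = x[:i] + (0,) + x[i + 1:]    # (0_i, x)
            diffs.append(phi(with_yes) - phi(with_no))
        if all(d != 0 for d in diffs):
            dictators.append(i)
        if all(d == 0 for d in diffs):
            dummies.append(i)
    return dictators, dummies

def first_voter_rules(x):
    """Hypothetical structure in which voter 0 alone decides."""
    return x[0]

def majority_of_three(x):
    """2-out-of-3 majority structure."""
    return int(sum(x) >= 2)

print(classify_voters(first_voter_rules, 3))
print(classify_voters(majority_of_three, 3))
```

In the first toy structure voter 0 is the only dictator and the other two voters are dummies, so the structure is monotone but not coherent; the majority structure has neither dictators nor dummies.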
By contrast, Arrow's no-dictator hypothesis is the other side of the coin: for every individual there is some (a, b) for which his preference is immaterial as a determinant of the society's choice. While the coherence assumption of reliability theory has yielded rich dividends for modeling aging/wear and tear of physical systems, it is also clear that the 'no dummy' interpretation of the 'all components are relevant' assumption is certainly not an unreasonable one to require of social choice functions. What are the implications, for traditional reliability theory, of replacing the condition of relevance of each component for coherent structures by the no-dictator hypothesis? Conversely, in the framework of social choice, it may be interesting to pursue the ramifications of replacing the no-dictator hypothesis (C4) by the condition of 'no dummy voters'. These are themes which we will not pursue here, but which may lead to new insights.

2. Voting games and political power

We turn to 'voting games' as another illustration of the application of reliability ideas in other fields. Of interest to political scientists, these are among the better known mathematical models of group behavior which attempt to explain the processes of decision for or against an issue in the social setting of a committee of n persons and formalize the notion of political power. For an excellent overview of the literature and recent research in this area, see Lucas (1978), Deegan and Packel (1978), and Straffin (1978), all in Brams, Lucas and Straffin (1978a).

2.1. The model and basic results.

Denote a committee of n persons by N. Elements of N are called players. We can take $N = \{1, 2, \ldots, n\}$ without loss of generality. A coalition is any subset S of players, $S \subset N$. Each player votes yes or no, i.e., for or against the proposition.
A winning (blocking) coalition is any coalition whose individual yes (no)-votes collectively ensure that the committee passes (fails) the proposition. Let W be the set of winning coalitions and $v : 2^N \to \{0, 1\}$ the binary coalition-value function

$v(S) = 1$ if $S \in W$ (S winning), $v(S) = 0$ if $S \notin W$ (S not winning).    (2.1)

Formally, a simple voting game G (also referred to as a simple game) is an ordered pair $G = (N, W)$ such that (i) $\emptyset \notin W$, $N \in W$ and (ii) $S \in W$, $S \subset T \Rightarrow T \in W$ (if everyone votes 'no' ('yes'), the proposition fails (wins); and any coalition containing a winning coalition is also a winning coalition); or, equivalently, an ordered pair (N, v) where (i) $v(\emptyset) = 0$, $v(N) = 1$ and (ii) v is nondecreasing.

The geometry and analysis of winning coalitions in voting games, as conceptual models of real-life committee situations, provide insights into the decision processes involved within a group behavior setting for accepting or rejecting a proposition. The theoretical framework invoked for such analysis is that of multiperson cooperative games, of which the games G are a special class. To formulate notions of political power, we view a measure of an individual player's ability to influence the result of a voting game G as a measure of such power. Two such power indices have been advanced. To describe these we need the notions of a pivot and a swing.

For any permutation ordering $\pi = (\pi(1), \ldots, \pi(n))$ of the players $N = \{1, \ldots, n\}$, let $J_i(\pi) = \{j \in N : \pi(j) \text{ precedes } \pi(i)\}$ be the set of predecessors of i. The player i is a pivot in $\pi$ if $J_i(\pi) \notin W$ but $J_i(\pi) \cup \{i\} \in W$; i.e., player i is a pivot if i's vote is decisive in the sense that, given the votes are cast sequentially in the order $\pi$, his vote turns a losing coalition into a winning one. A coalition S is a swing for i if $i \in S$, $S \in W$ but $S \setminus \{i\} \notin W$; i.e., if his vote is critical in turning a winning coalition into a losing one by changing his vote.
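To make the notions of pivot and swing concrete, here is a minimal Python sketch on a toy game. The game itself (three players, a coalition wins iff it contains player 1 and at least one other player) and the function names are hypothetical choices made only for illustration.

```python
from itertools import combinations, permutations

PLAYERS = (1, 2, 3)

def is_winning(coalition):
    """Hypothetical toy rule: win iff player 1 plus at least one other player assent."""
    members = set(coalition)
    return 1 in members and len(members) >= 2

def swings(i):
    """All coalitions S with i in S, S winning, and S minus {i} not winning."""
    found = []
    for size in range(1, len(PLAYERS) + 1):
        for coalition in combinations(PLAYERS, size):
            if i in coalition and is_winning(coalition) \
                    and not is_winning(set(coalition) - {i}):
                found.append(coalition)
    return found

def pivot_count(i):
    """Number of orderings in which i turns its set of predecessors into a winning coalition."""
    count = 0
    for order in permutations(PLAYERS):
        predecessors = set(order[:order.index(i)])
        if not is_winning(predecessors) and is_winning(predecessors | {i}):
            count += 1
    return count

for player in PLAYERS:
    print(player, swings(player), pivot_count(player))
```

In this toy game player 1 has three swings and is the pivot in four of the six orderings, while players 2 and 3 each have one swing and are pivots in one ordering each.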
Then we have the following two power indices for each player $i \in N$:

(Shapley-Shubik)
$\Phi_i := P(i \text{ is pivotal when all permutations are equiprobable}) = \sum \frac{(s - 1)!\,(n - s)!}{n!}$,    (2.2)

where $s := |S|$ is the number of voters in S and the sum is over all S such that S is a swing for i.

(Banzhaf)
$\beta_i :=$ proportion of swings for i among all coalitions in which i votes 'yes' $= \frac{\gamma_i}{2^{n-1}}$,    (2.3)

where $\gamma_i$ is the number of swings for i; dividing instead by the total number of swings gives the normalized form $\gamma_i / \sum_{j \in N} \gamma_j$. The Banzhaf power index also has a probability interpretation that we shall see later (Section 2.4).

If the indicator variable

$x_i = 1$ if player i votes 'yes', $x_i = 0$ if player i votes 'no',    (2.4)

denotes i's vote and $C_1(x) = \{i : x_i = 1\}$ is the coalition of assenting players for a realization $x = (x_1, \ldots, x_n)$ of the $2^n$ such voting configurations, then the outcome function $\psi : \{0, 1\}^n \to \{0, 1\}$ of the voting game is

$\psi(x) = v(C_1(x))$,

where v is as defined in (2.1), and tells us whether the proposition passes or fails in the committee. Note that $\psi$ models the decision structure in the committee given its rules, i.e., given the winning coalitions. In the stochastic version of a simple game, the voting configuration $X = (X_1, \ldots, X_n)$ is a random vector whose joint distribution determines the voting function

$\nu := E\,\psi(X) = P\{\psi(X) = 1\}$,

the win probability of the proposition in the voting game. Sensitivity of $\nu$ to the parameters of the distribution of X captures the effects of individual players' and their different possible coalitions' voting attitudes on the collective committee decision for a specified decision structure $\psi$. When the players act independently with probabilities $p = (p_1, \ldots, p_n)$ of voting 'yes', the voting function is

$\nu = h(p)$    (2.5)

for some $h : [0, 1]^n \to [0, 1]$.
The function h is called Owen's multilinear extension and satisfies (Owen, 1981):

$h(p) = p_i\, h(1_i, p) + (1 - p_i)\, h(0_i, p)$,
$h_i(p) := \frac{\partial h}{\partial p_i} = h(1_i, p) - h(0_i, p)$,    (2.6)

since the outcome function can be seen to obey the decomposition

$\psi(x) = x_i\, \psi(1_i, x) + (1 - x_i)\, \psi(0_i, x)$,    (2.7)

where $(\cdot_i, x)$ is the same as x except that $x_i$ is specified, and $h(\cdot_i, p) := E\,\psi(\cdot_i, X) = h(p_1, \ldots, p_{i-1}, \cdot, p_{i+1}, \ldots, p_n)$. These identities are reminiscent of well known results in reliability theory on the reliability function of coherent structures of independent components, a theme we return to in Section 2.2.

If, as a more realistic description of voting behavior, one wants to drop the assumption of independent players, the modeling choices become too wide to draw meaningful conclusions. The problem of assigning suitable joint distributions to the voting configuration $X = (X_1, \ldots, X_n)$ which would capture and mimic some of the essence of real-life voting situations has been considered by Straffin (1978a) and others. Straffin assumes the players to be homogeneous in the sense that they have a common 'yes' voting probability p chosen randomly in [0, 1]. Thus, according to Straffin's homogeneity assumption, the players agree to select, collectively or through a third party, a random number p in the unit interval and then, given the choice of p, vote independently. The fact that p has a prior, in this case the uniform distribution, makes $(X_1, \ldots, X_n)$ mutually dependent with joint distribution

$P(X_{\pi(1)} = \cdots = X_{\pi(k)} = 1,\ X_{\pi(k+1)} = \cdots = X_{\pi(n)} = 0) = \frac{k!\,(n - k)!}{(n + 1)!}$    (2.8)

for any permutation $(\pi(1), \ldots, \pi(n))$ of the players. (2.8) is a description of homogeneity of the players which Straffin uses to formulate (i) a power index and (ii) an agreement index, which is a measure of the extent to which a player's vote and the outcome function coincide.
He also considers the relationship between these indices corresponding to the uniform prior and the prior $f(p) = \text{const}\cdot p(1 - p)$; results we will find more convenient to describe in a more general format in the next section.

2.2. Implications of the reliability framework for voting games.

From the above discussions, it is clear that voting games are conceptually equivalent to systems of components in reliability theory. Table 2 is a list of the dual interpretations of several theoretical concepts in the two contexts:

Table 2

Voting games                 | Reliability structures
player                       | component
committee                    | system
winning (losing) coalition   | path (cut)
blocking coalition           | complement of a cut
outcome function             | structure function
voting function              | reliability function
multilinear extension        | reliability function with independent components

Thus every voting game has an equivalent reliability network representation and can consequently be analysed using methods of the latter. As an illustration consider the following:

EXAMPLE. The simple game (N, W) with a five-player committee $N = \{1, 2, 3, 4, 5\}$ and winning coalitions W as the sets

(1,3,5), (2,3,5), (1,2,3,5), (1,3,4,5), (1,2,3,4,5),
(1,4,5), (2,4,5), (1,2,4,5), (2,3,4,5).

This voting game is equivalent to a coherent structure

[Figure: components 1 and 2 in parallel, followed in series by components 3 and 4 in parallel, followed in series by component 5]

of two parallel subsystems of two components each and a fifth component, all in series. We see that to win in the corresponding voting game, a proposition must pass through each of two subcommittees with a '50% majority wins' voting rule and then also be passed by the chairperson (component 5). The voting function of this game when committee members vote 'yes' independently with a probability p (i.e., the version of Owen's multilinear extension in the i.i.d. case) is thus given by the reliability function $h(p) = p^3 (2 - p)^2$ of the above coherent structure.
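The five-player example can be verified by brute force. The sketch below is an illustrative check, not part of the original argument: it encodes the series-parallel structure as an outcome function `psi` (a name chosen here), enumerates the winning coalitions, and computes the i.i.d. voting function by summing over all 32 voting configurations with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def psi(x):
    """Outcome function: (1 or 2) and (3 or 4) and 5, for a 0/1 tuple of players 1..5."""
    x1, x2, x3, x4, x5 = x
    return int((x1 or x2) and (x3 or x4) and x5)

def winning_coalitions():
    """Coalitions of assenting players (1-based labels) for which the proposition passes."""
    return [tuple(i + 1 for i, vote in enumerate(x) if vote)
            for x in product((0, 1), repeat=5) if psi(x)]

def h(p):
    """Voting function for i.i.d. players: sum of psi(x) p^k (1-p)^(5-k) over all x."""
    total = Fraction(0)
    for x in product((0, 1), repeat=5):
        k = sum(x)
        total += psi(x) * p**k * (1 - p)**(5 - k)
    return total

coalitions = winning_coalitions()
print(len(coalitions), sorted(coalitions, key=len))
p = Fraction(1, 3)
print(h(p), p**3 * (2 - p)**2)
```

The enumeration yields nine winning coalitions, four of them with three players, and the summed voting function agrees with $p^3(2 - p)^2$ at every rational p tried.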
The minimal path sets of this structure are the smallest possible winning coalitions, which are the four 3-player coalitions in W. Since the minimal cut sets are (1, 2), (3, 4) and (5), their complements (3,4,5), (1,2,5), (1,2,3,4) are the minimal blocking coalitions, which are the smallest possible coalitions B with veto power in the sense that their complements $N \setminus B$ are not winning coalitions.

To pursue the reliability analogy further, we proceed as follows. Although it is not the usual way, we may look at a voting game (N, W) as the social choice problem of Section 1 when there are only two alternatives $A = \{a, b\}$. Set a = fail the proposition, and b = pass the proposition. Player i's personal preference ordering $R_i$ is then defined by

$a R_i b$ ($a \not R_i\, b$) $\Leftrightarrow$ i does not (does) prefer b to a $\Leftrightarrow$ i votes no (yes).

If $x_i$ is i's 'vote' as in (2.4) and $y_i = y_i(a, b) = 1$ or $0$ according as $a R_i b$ or $a \not R_i\, b$ (as in Section 1) is the indicator of preference, then $y_i = 1 - x_i$, $i \in N$, and clearly

$\psi(x) = 0$ (1) $\Leftrightarrow$ proposition fails (passes) $\Leftrightarrow \phi(\mathbf{1} - x) = \phi(y) = 1$ (0),

where $\phi$ is the social choice and $\psi$ the outcome function. Hence $\psi(x) = 1 - \phi(\mathbf{1} - x) = \phi^d(x) = \phi(x)$, since $\phi$ is self-dual. Thus $\psi = \phi$ and hence $\psi$ is also self-dual. The latter in particular implies the existence of a player who must be present in every winning coalition (viz. (1.7)). With the choice set restricted to two alternatives, Arrow's condition (C1) is trivial, condition (C2) of irrelevant alternatives is vacuously true, and so is the transitivity axiom (A1). Since $\psi = \phi$, the condition (C1) says $\psi(x)$ must be defined for all x, while axiom (A2) says $\psi$ is binary. The condition of positive responsiveness (C3) holds $\Leftrightarrow$ all supersets of winning coalitions are winning, which is built into the definition of a voting game. Lemma 1 thus implies:

LEMMA 2. The outcome function $\psi$ of a voting game is a monotone structure function.
$\psi$ is a coherent structure iff there are no 'dummies'.

The first part of the above result is due to Ramamurthy and Parthasarathy (1984). The social choice function analogy of the outcome function and its coherence in the absence of dummies is new. A dummy player is one whose exclusion from a winning coalition does not destroy the winning property of the reduced coalition, i.e.,

$i \in N$ is a dummy $\Leftrightarrow$ ($i \in S$, $S \in W \Rightarrow S \setminus \{i\} \in W$).

Equivalently, i is not a dummy iff there is a swing S for i. The coherence conclusion in Lemma 2 holds since in a voting game the 'no dummy hypothesis' says all components are relevant in the equivalent reliability network, viz. for any $i \in N$,

i is relevant $\Leftrightarrow$ there exists $x^0$ such that $\psi(1_i, x^0) - \psi(0_i, x^0) \ne 0$
$\Leftrightarrow S_0 := \{j \in N : j \ne i,\ x^0_j = 1\} \cup \{i\}$ is a swing for i
$\Leftrightarrow$ player i is not a dummy.

An equivalent characterization of a dummy $i \in N$ is that i belongs to no minimal winning coalition. On the other hand, in the social choice scenario of Section 1, a player $i \in N$ is a dictator if $\{i\}$ is a winning as well as a blocking coalition.

When the players act independently in a stochastic voting game, we recognize the identities (2.6), (2.7) on the outcome function and Owen's multilinear extension as reproducing standard decomposition results in coherent structure theory, as they must. The voting function h(p), being a monotone (coherent) structure's reliability function, must be coordinatewise monotone:

$p \le p' \Rightarrow h(p) \le h(p')$,

which has been independently recognized in the voting game context (Owen, 1982). The Banzhaf power index (2.3) is none other than the structural importance of components in $\psi$. Since research in voting games and in reliability structures has evolved largely independently, this general lack of recognition of their dualism has been the source of some unnecessary duplication of effort. Every result in either theory has a dual interpretation in the other, although they may not be equally meaningful in both contexts.
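The identification of the Banzhaf index with structural importance can be illustrated on the five-player example of this section. The Python sketch below is a hedged illustration: `psi`, `banzhaf` and `voting_importance` are names introduced here, players are indexed from 0, and the check simply compares the swing proportion with $h_i(p) = h(1_i, p) - h(0_i, p)$ evaluated at $p = 1/2$ for i.i.d. players.

```python
from fractions import Fraction
from itertools import product

N_PLAYERS = 5

def psi(x):
    """Outcome function of the example: (1 or 2) and (3 or 4) and 5."""
    x1, x2, x3, x4, x5 = x
    return int((x1 or x2) and (x3 or x4) and x5)

def banzhaf(i):
    """Swings for player i (0-based) divided by the 2^(n-1) coalitions containing i."""
    swing_count = 0
    for rest in product((0, 1), repeat=N_PLAYERS - 1):
        with_i = rest[:i] + (1,) + rest[i:]
        without_i = rest[:i] + (0,) + rest[i:]
        swing_count += psi(with_i) - psi(without_i)
    return Fraction(swing_count, 2 ** (N_PLAYERS - 1))

def voting_importance(i, p):
    """h_i(p) = h(1_i, p) - h(0_i, p) when the other players vote yes i.i.d. with probability p."""
    total = Fraction(0)
    for rest in product((0, 1), repeat=N_PLAYERS - 1):
        k = sum(rest)
        weight = p**k * (1 - p)**(N_PLAYERS - 1 - k)
        with_i = rest[:i] + (1,) + rest[i:]
        without_i = rest[:i] + (0,) + rest[i:]
        total += weight * (psi(with_i) - psi(without_i))
    return total

half = Fraction(1, 2)
for i in range(N_PLAYERS):
    print(i + 1, banzhaf(i), voting_importance(i, half))
```

Each subcommittee member has index 3/16 and the chairperson 9/16, in agreement with the structural importances at $p = 1/2$.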
The following are some further well known reliability ideas in the context of independent or i.i.d. components which have appropriate and interesting implications for voting games. With the exception of 2.2.1 below, we believe the impact of these ideas has not yet been recognized in the literature on voting games with independent or i.i.d. players.

2.2.1. The reliability importance

$\nu_i = E\{\psi(1_i, X) - \psi(0_i, X)\}$    (2.9)

measures how crucial i's vote is in a game with outcome function $\psi$ and random voting probabilities. As an index of i's voting power, $\nu_i$ is defined for any stochastic voting configuration X and has been used by Straffin within the homogeneity framework ($(X_1, \ldots, X_n)$ conditionally i.i.d. given p). We may call $\nu_i$ the voting importance of i. If the players are independent, then

$\nu_i = h_i(p)$

in the notation of Section 2.1 (viz. (2.6)). Thus, e.g., in the stochastic unanimity game where all players must vote yes to pass a proposition, the player least likely to vote in favor has the most voting importance. Similarly, in other committee decision structures, one can use $\nu_i$ to rank the players in order of their voting importance. For a game with i.i.d. players, i's voting importance becomes the function $\nu_i = h_i(p)$, where $h_i(p) = h(1_i, p) - h(0_i, p)$ and $h(\cdot_i, p)$, $h(p)$ denote the corresponding versions of $h(\cdot_i, \mathbf{p})$, $h(\mathbf{p})$ respectively when $\mathbf{p} = (p, \ldots, p)$. Since in this case $h'(p) = \sum_{i \in N} h_i(p)$, one can also use the proportional voting importance

$\nu_i^* = \frac{\nu_i}{\sum_{j \in N} \nu_j} = \frac{h_i(p)}{h'(p)}$

as a normalized power index in the i.i.d. case.

2.2.2. The fault-tree-analysis algorithm of reliability theory will systematically enumerate the smallest cut sets and hence the minimal blocking coalitions of a voting game through its reliability network representation. The dual event-tree algorithm will similarly produce all minimal winning coalitions, the Banzhaf power indices and the voting importances.

2.2.3.
S-shapedness of the voting function for i.i.d. players with no dummies. This follows from the Moore-Shannon inequality (Barlow and Proschan, 1965)

$p(1 - p)\, \frac{dh}{dp} \ge h(p)(1 - h(p))$

for the reliability function of a coherent structure with i.i.d. components. The implications of this fact in the voting game context are probably not well known. In particular, the S-shapedness of the voting function implies that among all committees of a given size n, the k-out-of-n structures (100k/n% majority voting games) have the sharpest rate of increase of the probability of a committee of n i.i.d. players passing a bill as the players' common yes-voting probability increases.

2.2.4. Component duplication is more effective than system duplication. This property of a structure function implies that replicating committees is less effective, in the sense of resulting in a smaller outcome/voting function, than replicating committee members by subcommittees (modules) which mimic the original committee structure $\psi$. This may be useful in the context of designing representative bodies when such choices are available.

2.2.5. Composition of coherent structures. Suppose a voting game (N, W) has no dummies and is not a unanimity game (series structure) or its dual (any single yes vote is enough: parallel structure). Suppose each player in this committee N with structure $\psi$ is replaced by a subcommittee whose structure replicates the original committee, and this process is repeated k times, k = 1, 2, .... With i.i.d. players, the voting function $h_k(p)$ of the resulting expanded committee is then the reliability function of the k-fold composition of the coherent structure $\psi$, which has the property

$h_k(p) \downarrow 0$, $= p_0$, or $\uparrow 1$ according as $p <$, $=$ or $> p_0$, as $k \uparrow \infty$

(Barlow and Proschan, 1965), where $p_0$ is the unique value satisfying $h(p_0) = p_0$, guaranteed by S-shapedness.
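The limiting behaviour of repeated composition is easy to observe numerically. The sketch below assumes, purely for illustration, a 2-out-of-3 majority committee, whose i.i.d. voting function is $h(p) = 3p^2 - 2p^3$ with fixed point $p_0 = 1/2$; the function names are chosen here. It iterates h to obtain $h_k$ and also spot-checks the Moore-Shannon inequality on a grid.

```python
def h(p):
    """Voting function of a 2-out-of-3 majority committee with i.i.d. players."""
    return 3 * p**2 - 2 * p**3

def h_prime(p):
    """Derivative of h."""
    return 6 * p - 6 * p**2

def h_k(p, k):
    """Voting function after k rounds of replacing each player by a copy of the committee
    (k = 1 is the original committee)."""
    for _ in range(k):
        p = h(p)
    return p

def moore_shannon_holds(p, tol=1e-12):
    """Check p(1-p) h'(p) >= h(p)(1 - h(p)) up to floating-point tolerance."""
    return p * (1 - p) * h_prime(p) >= h(p) * (1 - h(p)) - tol

for start in (0.4, 0.5, 0.6):
    print(start, [round(h_k(start, k), 6) for k in (1, 2, 5, 10, 20)])
print(all(moore_shannon_holds(j / 100) for j in range(101)))
```

Starting below 1/2 the iterates fall toward 0, starting above 1/2 they rise toward 1, and at exactly 1/2 they stay put, which is the three-way conclusion stated above for this particular structure.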
When we interpret the above for voting games, the first conclusion is perhaps not surprising, although the role of the critical value $p_0$ is not fully intuitive. The other two run counter to crude intuition; particularly the last one, which says that by expanding the original committee through enough repeated compositions, one can almost ensure winning any proposition which is sufficiently attractive individually. The dictum 'too many cooks spoil the broth' does not apply here.

2.2.6. Compound voting games and modular decomposition. If $(N_j, W_j)$, j = 1, 2, ..., k, are simple games with pairwise disjoint player sets and (M, V) is a simple game with $|M| = k$ players, the compound voting game (N, W) is defined as the game with $N = \bigcup_{j=1}^{k} N_j$ and

$W = \{S \subset N : \{j \in M : S \cap N_j \in W_j\} \in V\}$.

(M, V) is called the master game and $(N_j, W_j)$ the modules of the compound game (N, W). The combinatorial aspects of compound voting games have been extensively studied. Considering the equivalent reliability networks, it is clear however that if the component games $(N_j, W_j)$ have structures $\psi_j$, j = 1, ..., k, and the master game (M, V) has structure $\phi$, then the compound voting game (N, W) has structure

$\psi = \phi(\psi_1, \ldots, \psi_k)$.

Conversely, the existence of some $\phi, \psi_1, \ldots, \psi_k$ satisfying this representation for a given $\psi$ can be taken as an equivalent definition of the corresponding master game, component subgames and the accompanying player sets as the modular sets of the original voting game. E.g., in the 5-player example at the beginning of this section, clearly both subcommittees $J_1 = \{1, 2\}$, $J_2 = \{3, 4\}$ are modular sets and the corresponding parallel subsystems are the subgame modules.
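The compound-game definition can be checked against the 5-player example directly. In the following sketch the module labels "A", "B", "C" and all function names are hypothetical: the two subcommittees are 'any single yes suffices' modules, the chairperson is a one-player module, and the master game is unanimity among the three modules.

```python
from itertools import combinations

def subsets(players):
    """All coalitions (as frozensets) of the given players."""
    players = sorted(players)
    return [frozenset(c) for r in range(len(players) + 1)
            for c in combinations(players, r)]

# Modules (N_j, W_j): a coalition wins module j iff it contains at least one member of N_j.
MODULE_PLAYERS = {"A": {1, 2}, "B": {3, 4}, "C": {5}}

def module_wins(j, coalition):
    return len(coalition & MODULE_PLAYERS[j]) >= 1

def master_wins(winning_modules):
    """Master game (M, V): unanimity among the three modules."""
    return winning_modules == set(MODULE_PLAYERS)

def compound_wins(coalition):
    """S is winning iff the set of modules won by S is winning in the master game."""
    won = {j for j in MODULE_PLAYERS if module_wins(j, coalition)}
    return master_wins(won)

def direct_wins(coalition):
    """Winning rule of the example stated directly: (1 or 2) and (3 or 4) and 5."""
    return bool(coalition & {1, 2}) and bool(coalition & {3, 4}) and 5 in coalition

all_coalitions = subsets(range(1, 6))
print(all(compound_wins(s) == direct_wins(s) for s in all_coalitions))
print(sum(compound_wins(s) for s in all_coalitions))
```

Both descriptions give the same nine winning coalitions, which is the statement $\psi = \phi(\psi_1, \psi_2, \psi_3)$ for this example.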
Ramamurthy and Parthasarathy (1983) have recently exploited the results on modular decomposition of coherent systems to investigate voting games in relation to their component subgames (modules) and to decompose a compound voting game into its modular factors (player sets obtained by intersecting maximal modular sets or their complements with each other). Modular factors decompose a voting game into its largest disjoint modules. The following is typical of the results which can be derived via coherent structure arguments (Ramamurthy and Parthasarathy, 1983).

THREE MODULES THEOREM. Let $J_i$, i = 1, 2, 3, be coalitions in a voting game (N, W) with a structure $\psi$ such that $J_1 \cup J_2$, $J_2 \cup J_3$ are both modular. Then each $J_i$ is modular, i = 1, 2, 3, and $\bigcup_{i=1}^{3} J_i$ is either itself modular or the full committee N. The modules $(J_i, \psi_i)$, i = 1, 2, 3, which appear in $(N, \psi)$ are either in series or in parallel, i.e., the three-player master game is either a unanimity game, or a trivial game where the only blocking coalition is the full committee.

2.3. The usual approach in modeling coherent structures of dependent components is to assume the components are associated (Barlow and Proschan, 1975). By contrast, the prevalent theoretical approach in voting games when the players are not independent, as suggested by Straffin (1978), assumes a special form of dependence according to (2.8). One can show that (2.8) implies $X_1, \ldots, X_n$ are associated. Thus voting game results under Straffin's model and its generalized version suggest an approach for modeling dependent coherent structures. These results are necessarily stronger than those that can be derived under the associatedness hypothesis alone. The remarkable insight behind Straffin's homogeneity assumption is that it amounts to the voting configuration X being a finite segment of a special sequence of exchangeable variables.
The effect of this assumption is that the probability of any voting pattern $x = (x_1, \ldots, x_n)$ depends only on the sizes of the assenting and dissenting coalitions and not on the identity of the players, as witness (2.8). One can reproduce this homogeneity of players through an assumption more general than Straffin's. Ramamurthy and Parthasarathy (1984) exploit appropriate reliability ideas to generalize many results of Straffin and others, by considering the following weakening of Straffin's assumption.

GENERAL HOMOGENEITY HYPOTHESIS. The random voting configuration $X = (X_1, \ldots, X_n)$ is a finite segment of an infinite exchangeable sequence.

Since $X_1, X_2, \ldots$ are binary, by de Finetti's well known theorem the voting configuration's joint distribution has a representation

$P(X_{\pi(1)} = \cdots = X_{\pi(k)} = 1,\ X_{\pi(k+1)} = \cdots = X_{\pi(n)} = 0) = \int_0^1 p^k (1 - p)^{n-k}\, dF(p)$    (2.10)

for some prior distribution F on [0, 1]; and the votes $X_1, \ldots, X_n$ are conditionally independent given the 'yes' voting probability p. Straffin's homogeneity assumption corresponds to a uniform prior for p, leading to (2.8).

For a stochastic voting game defined by its outcome (structure) function $\psi$, consider the power index

$\nu_i := E\{\psi(1_i, X) - \psi(0_i, X)\}$,

defined in (2.9), and the agreement indices

$A_i := P\{X_i = \psi(X)\}$,
$\rho_i := \mathrm{cov}(X_i, \psi(X))$,
$\sigma_i := \int_0^1 \mathrm{cov}(X_i, \psi(X) \mid p)\, dF(p)$.

Also, let

$b := \mathrm{cov}(P, h(P))$.

Here P is the randomized probability of voting 'yes' with prior F in (2.10). Note that b and $\sigma_i$ are defined only under the general homogeneity assumption, while $\nu_i$, $A_i$ and $\rho_i$ are well defined for every joint distribution of the voting configuration X. Recall that a power index measures the extent of change in the voting game's outcome as a consequence of a player's switching his vote, and an agreement index measures the extent of coincidence of a player's vote and the final outcome.
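The indices just defined can be computed exactly for small games when the prior F is a Beta distribution with integer parameters, since the moments $E[P^j (1 - P)^l]$ are then ratios of rising factorials. The sketch below is an illustration under that assumption (the Beta prior, the function names and the 0-based player index are choices made here). As a consistency check it computes $\sigma_i$ from $\int p(1 - p) h_i(p)\, dF(p)$, which equals the conditional covariance integral when the votes are conditionally independent given p, and compares $\rho_i$ with $\sigma_i + b$, as the law of total covariance requires.

```python
from fractions import Fraction
from itertools import product

def rising(a, k):
    """Rising factorial a(a+1)...(a+k-1), with rising(a, 0) = 1."""
    out = Fraction(1)
    for j in range(k):
        out *= a + j
    return out

def beta_moment(a, b, j, l):
    """E[P^j (1-P)^l] for P ~ Beta(a, b)."""
    return rising(a, j) * rising(b, l) / rising(a + b, j + l)

def indices(psi, n, i, a, b):
    """Exact nu, pi_i, A_i, rho_i, sigma_i and b for player i under a Beta(a, b) prior."""
    pi = beta_moment(a, b, 1, 0)                 # E X_i
    nu = Fraction(0)                             # E psi(X)
    e_x_psi = Fraction(0)                        # E X_i psi(X)
    e_p_h = Fraction(0)                          # E[P h(P)]
    agree = Fraction(0)                          # P(X_i = psi(X))
    for x in product((0, 1), repeat=n):
        k = sum(x)
        prob = beta_moment(a, b, k, n - k)
        nu += prob * psi(x)
        e_x_psi += prob * x[i] * psi(x)
        agree += prob * int(x[i] == psi(x))
        e_p_h += beta_moment(a, b, k + 1, n - k) * psi(x)
    sigma = Fraction(0)                          # integral of p(1-p) h_i(p) dF(p)
    for rest in product((0, 1), repeat=n - 1):
        k = sum(rest)
        delta = psi(rest[:i] + (1,) + rest[i:]) - psi(rest[:i] + (0,) + rest[i:])
        sigma += beta_moment(a, b, k + 1, n - k) * delta
    return {"nu": nu, "pi": pi, "A": agree, "rho": e_x_psi - pi * nu,
            "sigma": sigma, "b": e_p_h - pi * nu}

def majority_of_three(x):
    return int(sum(x) >= 2)

print(indices(majority_of_three, 3, 0, 1, 1))   # uniform prior (Straffin's case)
print(indices(majority_of_three, 3, 0, 2, 3))
```

With the uniform prior the three-player majority game gives $\nu = 1/2$, $\rho_i = 1/6$, $\sigma_i = 1/15$ and $b = 1/10$ for each player.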
Thus any measure of mutual dependence between two variables reflecting the voting attitudes of a player and of the whole committee, respectively, qualifies as an agreement index. An analysis of the interrelationships of these indices provides an insight into the interactions between players' individual level of command over the game and the extent to which they are in tune with the committee decision and ride the decisive bandwagon. The agreement index $A_i$ is due to Rae (1979). Under (2.8), $\nu_i$ becomes Straffin's power index and $\sigma_i$ is proportional to an agreement index also considered by Straffin. Note that all the coefficients are non-negative. This is clear for $\nu_i$ and $A_i$, and follows for $\rho_i$, $\sigma_i$ and b from standard facts for associated r.v.s (Barlow and Proschan, 1975), association being weaker than the general homogeneity (GH) hypothesis. The interesting results under the assumption of general homogeneity (Ramamurthy and Parthasarathy, 1984) are

$\rho_i = \sigma_i + b$,
$\sum_{i \in N} \sigma_i \ge \int_0^1 h(p)(1 - h(p))\, dF(p)$,
$E X_i = \tfrac{1}{2} \Rightarrow A_i = 2\sigma_i + 2b + \tfrac{1}{2}$.    (2.11)

The equality in the second assertion holds only under Straffin's homogeneity (SH) assumption. This assertion follows by noting that $\sigma_i = \int_0^1 p(1 - p)\, h_i(p)\, dF(p)$ under GH and $h'(p) = \sum_i h_i(p)$, using termwise integration by parts in $\sum_i \sigma_i$ with uniform prior to conclude the equality, and invoking the S-shapedness of h(p) for the bound. The above relations in particular imply:

(i) Under GH, i is a dummy $\Leftrightarrow \sigma_i = 0$. If the odds of each player voting yes and no are equal under GH, i.e., if the marginal probability $P(X_i = 1) = \tfrac{1}{2}$, then we also have: i is a dummy $\Leftrightarrow \rho_i = b \Leftrightarrow A_i = 2b + \tfrac{1}{2}$. Thus, since b is in a sense the minimal affinity between a player's vote and the committee's decision, Straffin suggests using $2\sigma_i$ ($= A_i - 2b - \tfrac{1}{2}$) as an agreement index.

(ii) Let w and $l = 2^n - w$ be the numbers of winning and losing coalitions.
Since $h_i(\tfrac12) = \beta_i$ (structural importance = Banzhaf power index) and $h(\tfrac12) = w/2^n$, taking $F$ as a point-mass at $\tfrac12$, (2.11) gives

$$\sum_{i \in N} \beta_i \ge 2^{-2(n-1)}\, w\, l.$$

Without the equal odds condition, the last relation in (2.11) has a more general version that we may easily develop. Let $\pi_i := \int_0^1 p\, dF(p) = E X_i$ be the marginal probability of $i$ voting yes under general homogeneity, and let $v := E\psi(X)$ denote the game's value. Then

$$A_i = \sum_{j=0}^{1} P(X_i = \psi(X) = j) = E\, X_i \psi(1_i, X) + E\,(1 - X_i)(1 - \psi(0_i, X))$$
$$= E\, X_i \psi(X) + E\,(1 - X_i)(1 - \psi(X))$$
$$= 2 \operatorname{cov}(X_i, \psi(X)) + E\psi(X)\{2 E X_i - 1\} + 1 - E X_i$$
$$= 2\rho_i + v(2\pi_i - 1) + (1 - \pi_i) = 2\rho_i + \{\pi_i v + (1 - \pi_i)(1 - v)\},$$

which reduces to the stated relationship whenever $\pi_i = \tfrac12$. Notice that the convex combination term in braces, which measures the marginal contribution to $A_i$ of a player's voting probability $\pi_i$, depends on the game's value $v$ via an interaction term unless $\pi_i = \tfrac12$.

2.4. Influence indices and stochastic compound voting games. There are some interesting relationships among members of a class of voting games via their power and agreement indices. In the spirit of (2.10), consider a compound voting game consisting of the two game modules (i) a voting game $G = (N, W)$ with $N = \{1, \ldots, n\}$, and (ii) a simple majority voting game $G_m = (N_m, W_m)$ of $(2m + 1)$ players with

$$N_m = \{n+1, \ldots, n+2m, n+2m+1\}, \qquad W_m = \{S \subset N_m : |S| \ge m + 1\},$$ (2.12)

i.e., any majority coalition (at least $(m + 1)$ players) wins. Replacing the player $(n + 2m + 1)$ in the majority game by the game $G = (N, W)$, define the compound game $G^*_m = (N^*, W^*)$, where

$$N^* = N \cup N_m \setminus \{n+2m+1\} = \{1, \ldots, n, n+1, \ldots, n+2m\},$$
$$W^* = \{S \subset N^* : \text{either } |S \setminus N| \ge m + 1, \text{ or } |S \setminus N| \ge m \text{ and } S \cap N \in W\}.$$ (2.13)

$G^*$ models the situation where the player $(n + 2m + 1)$ in the majority game $G_m$ is bound by the wishes of a constituency $N$, as determined by the outcome of the constituency voting game $G = (N, W)$, which he represents in the committee $N_m$.
The winning coalitions in the composite game $G^*$ are those which either have enough members to win the majority game $G_m$, or are a single vote short of winning $G_m$ when the player representing the constituency $N$ is not counted but contain a winning coalition for the constituency game $G = (N, W)$. The winning coalitions in the latter category are precisely those $S$ such that (i) $|S \setminus N| = m$, i.e., every player $i \notin S \setminus N$ of the majority game $G_m$ is a swing for $S \setminus N$, and (ii) the players of $S$ belonging to $N$ also win the constituency voting game $G$. With an i.i.d. voting configuration, if $h_i(p)$ and $h_i^*(p)$ respectively denote the voting importance of $i \in N$ in $G$ and $G^*$, then clearly

$$h_i^*(p) = \binom{2m}{m} p^m (1 - p)^m h_i(p), \qquad i \in N.$$ (2.14)

Under general homogeneity, the class of priors

$$F_{a,b}(p) = \frac{(a + b - 1)!}{(a - 1)!\,(b - 1)!} \int_0^p u^{a-1} (1 - u)^{b-1}\, du, \qquad a > 0,\ b > 0,$$

which leads to the voting configuration distribution

$$P(X_1 = \cdots = X_k = 1,\ X_{k+1} = \cdots = X_n = 0) = \frac{a^{(k)}\, b^{(n-k)}}{(a + b)^{(n)}},$$ (2.15)

where $a^{(k)} = a(a+1)\cdots(a+k-1)$, can reflect different degrees of mutual dependence (tendency of alignments and formation of voting blocks) of players for different choices of $a$, $b$. Player $i$'s vote $X_i$ in the model (2.15) is described by the result of the $i$-th drawing in the well known Polya-urn model, which starts with $a$ white and $b$ black balls and adds a ball of the same color as the one drawn in successive random drawings. For any voting game $G$ with a Polya-urn prior $F_{a,b}$, denote the associated influence indices of power/agreement by writing $\nu_i = \nu_i(G: a, b)$, etc. Notice that Straffin's original homogeneity assumption corresponds to the prior $F_{1,1}$.
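As a small numerical illustration of (2.15), the following sketch evaluates the probability of a voting pattern in two ways: from the rising-factorial formula, and by stepping through the urn draws one at a time. The function names are hypothetical and integer values of a, b are assumed; exact rational arithmetic is used so that the two computations can be compared directly.

```python
from fractions import Fraction


def rising(x, k):
    """Rising factorial x^(k) = x (x+1) ... (x+k-1); equals 1 when k == 0."""
    out = Fraction(1)
    for j in range(k):
        out *= x + j
    return out


def pattern_probability(a, b, n, k):
    """P(first k of n votes are 'yes', the rest 'no') from the formula in (2.15)."""
    return rising(a, k) * rising(b, n - k) / rising(a + b, n)


def urn_probability(a, b, n, k):
    """Same probability, obtained by following the Polya-urn draws one by one."""
    white, black, prob = Fraction(a), Fraction(b), Fraction(1)
    for draw in range(n):
        total = white + black
        if draw < k:   # 'yes' vote: a white ball is drawn and one white is added
            prob *= white / total
            white += 1
        else:          # 'no' vote: a black ball is drawn and one black is added
            prob *= black / total
            black += 1
    return prob


if __name__ == "__main__":
    # Uniform prior (a = b = 1): agrees with the integral of p (1-p)^2 over [0, 1].
    print(pattern_probability(1, 1, 3, 1))
    print(urn_probability(2, 3, 5, 2) == pattern_probability(2, 3, 5, 2))
```

With a = b = 1 the formula reduces to k!(n-k)!/(n+1)!, which is the pattern probability under Straffin's homogeneity assumption.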
Using $\nu_i(G: a, b) = \int_0^1 h_i(p)\, dF_{a,b}(p)$ and (2.14), Ramamurthy and Parthasarathy (1984) have shown:

$$\nu_i(G: 1, 1) = \varphi_i, \qquad \sigma_i(G: a, b) = \frac{ab}{(a + b)(a + b + 1)}\, \nu_i(G: a + 1, b + 1),$$

and, in the framework of the compound voting game $G^*$ in (2.13),

$$\nu_i(G: m + 1, m + 1) = (2m + 1)\, \nu_i(G^*: 1, 1), \qquad i \in N,$$ (2.16)

extending the corresponding results of Straffin (1978), which can be recovered from the above by setting $a = b = m = 1$. The second assertion above shows that the apparently distinct influence notions of 'agreement' and 'power' are not unrelated, and one can capture either one from the other by modifying the degree of dependence among the voters as modeled by $(a, b)$ to $(a + 1, b + 1)$ or $(a - 1, b - 1)$ as may be appropriate. The first assertion states the equivalence of the Shapley-Shubik index with voting importance under a uniform prior (Straffin's power index), while the third assertion shows a relationship between voting importances in the compound game in (2.13) and the corresponding constituency game under an appropriate choice of voter-dependence in the two games. Notice that $\nu_i(G: m + 1, m + 1) \to \beta_i$, the Banzhaf power index in the constituency game, since the case of players voting yes or no independently with equal odds ($p = \tfrac12$) can be obtained by letting $m \to \infty$ in the prior $F_{m+1, m+1}$. Hence by (2.16), in the composite game $G^*_m$ with $(2m + 1)$ players,

$$(2m + 1)\, \nu_i(G^*_m: 1, 1) \to \beta_i \quad \text{as } m \to \infty, \qquad i \in N,$$

i.e., Straffin's power index in the compound game $G^*$ multiplied by the number of players approaches the Banzhaf power index (structural importance) in the constituency game $G = (N, W)$. The priors $F_{a,b}$, under the general homogeneity hypothesis, reflect progressively less and less voter interdependence with increasing $(a, b)$ and thus in this sense also model the maximum possible such dependence under Straffin's homogeneity when $a = b = 1$, the minimal values for a Polya urn.
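The first assertion and the limiting statement can be checked numerically on a small example. The sketch below is illustrative only: it assumes a weighted majority game (weights and quota chosen arbitrarily, not taken from the text) and computes the voting importance of a player as the probability of being crucial when the other players' votes follow the Polya-urn law (2.15). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rising(x, k):
    """Rising factorial x^(k) = x (x+1) ... (x+k-1)."""
    out = Fraction(1)
    for j in range(k):
        out *= x + j
    return out


def swing_sizes(weights, quota, i):
    """Sizes |S| of the coalitions S of other players for which player i is crucial."""
    others = [j for j in range(len(weights)) if j != i]
    for size in range(len(others) + 1):
        for coalition in combinations(others, size):
            w = sum(weights[j] for j in coalition)
            if w < quota <= w + weights[i]:   # S loses, S plus i wins
                yield size


def voting_importance(weights, quota, i, a, b):
    """nu_i(G: a, b): probability that i is crucial under the Polya-urn prior F_{a,b}."""
    m = len(weights) - 1   # number of other players
    return sum(rising(a, k) * rising(b, m - k) / rising(a + b, m)
               for k in swing_sizes(weights, quota, i))


def banzhaf(weights, quota, i):
    """beta_i: fraction of the other players' vote patterns in which i is crucial."""
    m = len(weights) - 1
    return Fraction(sum(1 for _ in swing_sizes(weights, quota, i)), 2 ** m)


if __name__ == "__main__":
    weights, quota = [2, 1, 1], 3   # illustrative game, an assumption of this sketch
    for i in range(3):
        print(i, voting_importance(weights, quota, i, 1, 1), banzhaf(weights, quota, i))
```

For a = b = 1 the urn weights reduce to k!(m-k)!/(m+1)!, the Shapley-Shubik weights, and for large a = b the value approaches the Banzhaf index, in line with the two extreme cases described in the text.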
To emphasize the conceptual difference as well as the similarity of the Shapley-Shubik and Banzhaf indices of power, we may note that they are the two extreme cases of the voting importance $\nu_i$ (viz. (2.9)), corresponding to $a = b = 1$ and the limiting case $a = b \to \infty$. It is interesting to contrast the probability interpretations of the Shapley-Shubik and Banzhaf power indices. A player $i \in N$ is crucial if, given the others' votes, his vote makes the difference between winning or losing the proposition in the committee. While the Shapley-Shubik index $\varphi_i$ in (2.2) is the probability that $i \in N$ is crucial under Straffin's homogeneity (players' votes are conditionally i.i.d. given $p$), the Banzhaf index $\beta_i$ in (2.3) is the probability that $i$ is crucial when the players choose 'yes'-voting probabilities $p_i$, $i \in N$, independently and the $p_i$, $i \in N$, are uniformly distributed. The probability of individual-group agreement under this independence assumption is $\beta_i \cdot (1) + (1 - \beta_i) \cdot (\tfrac12) = \tfrac12 (1 + \beta_i)$. The right hand side can be used as an agreement index. These results are due to Straffin (1978).

2.5. While we have argued that several voting game concepts and results are variants of system reliability ideas in a different guise, others, and in particular the general homogeneity assumption and its implications, may contain important lessons for reliability theory. For example, in systems in which the status of some or all components may not be directly observable except via perfect or highly reliable monitors, such as hazardous components in a nuclear installation, the agreement indices can serve as alternative or surrogate indices of reliability importance of inaccessible components. The general homogeneity assumption in system reliability would amount to considering coherent structures of exchangeable components, a strengthening of the concept of associatedness as a measure of component dependence; an approach which we believe has not been fully exploited and which should lead to more refined results than under associatedness of components alone.

3. 'Inequality' of distribution of wealth

3.1. One of the chief concerns of development economists is the measurement of inequality of income or other economic variables distributed over a population, reflecting the degree of disparity in ownership of wealth among its members. The usual tool kit used by economists to measure such inequality of distribution is the well known Lorenz curve and the Gini index for the relevant distribution of income or other similar variables, traditionally assumed to follow a log-normal distribution, for which there is substantial empirical evidence and some theoretical arguments. Some studies however have questioned the universality of the log-normal assumption; see e.g., Salem and Mount (1974), MacDonald and Ransom (1979). Mukherjee (1967) has considered some stochastic models leading to gamma distributions for wealth variables such as landholding. Bhattacharjee and Krishnaji (1985) have considered a model for the landholding process across generations, allowing for acquisition and disposal of land in each generation and where ownership is inherited, to argue that the equilibrium distribution of landholding, when it exists, must be NWU ('new worse than used') in the sense of reliability theory, i.e., the excess residual holding $X - t \mid X > t$ over any threshold $t$ stochastically dominates the original landholding variable $X$ in the population. The NWU property is a fairly picturesque description of the relative abundance of 'rich' landowners (those holding $X > t$) compared to the total population of landowners across the entire size scale. In practice, even stronger evidence of disparity has been found.
In an attempt to empirically model the distribution of landholdings in India, it has been found (Bhattacharjee and Krishnaji, 1985) that the log-gamma and/or the DFR gamma laws provide a better approximation to the landholding data for each state in India based on National Sample Survey (NSS) figures.

Table 3
Landholding in the State of W. Bengal, India (1961-1962) and model estimates

Landholding size (acres)    NSS     Lognormal    DFR gamma    Loggamma on (1, ∞)
0-1                         1896    2285         1832
1-5                         1716    1350         1745         1794
5-10                         482     333          515          422
10-20                        164     189          165          132
>20                           39     138           40           52

Table 3 is typical of the relatively better approximations provided by the gamma and the log-gamma on $(1, \infty)$ relative to the log-normal. While the log-gamma is known to have an eventually decreasing failure rate, the estimated shape parameters of the gammas were all less than one, typically around $\tfrac12$ for every state, and hence all had decreasing failure rates. For landholdings, the NWU argument and the empirical DFR evidence above (everywhere with gammas, or in the long range as with the log-gamma) are suggestive of the possibility of exploiting reliability ideas. If $X \ge 0$ is the amount of wealth, such as land, owned with distribution $F$, it is then natural to invoke appropriate life-distribution concepts for the holding distribution $F$ in an attempt to model the degree of inequality present in the pattern of ownership of wealth. The residual holding $X - t \mid X > t$ in excess of $t$, with distribution $F_t(x) = 1 - \{\bar F(t + x)/\bar F(t)\}$, and the mean residual holding $g(t) := E(X - t \mid X > t)$ correspond respectively to the notions of the residual life and the mean residual life in reliability theory. In particular, the extent of wealth which the 'rich' command is described by the behavior of $g(t)$ for large values of $t$.
More generally, the nature of $F_t$ and the excess average holding $g(t)$ over an affluence threshold $t$, as a function of the threshold, provide a more detailed description of the pattern of ownership across different levels of affluence in the population. Using the above interpretations of $F_t$ and $g(t)$, the notion of skew and heavy tailed distributions of wealth as being symptomatic of the social disparity of ownership can be captured in fairly picturesque ways, with varying degrees of strength, by the different anti-aging classes (DFR, IMRL, NWU, NWUE) of 'life distributions' well known in reliability theory. For example, a holding distribution $F$ is DFR (decreasing failure rate: $F_t$ stochastically increasing in $t$) if the proportion of the progressively 'rich' with residual holding in excess of any given amount increases with the level of affluence. The other weaker anti-aging hypotheses, IMRL (increasing mean residual life: $g(t) \uparrow$), NWU (new worse than used: $F_t \ge_{\mathrm{st}} F$, all $t$) and NWUE (new worse than used in expectation: $g(t) \ge g(0+)$), can be similarly interpreted as weaker descriptions of disparity. Motivated by these considerations, Bhattacharjee and Krishnaji (1985) have suggested using

$$I_1 = g^*/\mu, \quad \text{where } g^* = \lim_{t \to \infty} g(t),\ \mu = g(0+); \qquad I_2 = \lim_{t \to \infty} \frac{E(X \mid X > t)}{t} = 1 + \lim_{t \to \infty} \frac{g(t)}{t},$$ (3.1)

when they exist, as indices of inequality in the distribution of wealth. They also consider a related measure $I_0 = g^* - \mu = \mu(I_1 - 1)$, which is a variant of $I_1$ but is not dimension free as $I_1$, $I_2$ are. The assumption that the limits in (3.1) exist is usually not a real limitation in practice. In particular, the existence of $g^* \le \infty$ is automatic under the IMRL and DFR assumptions, with $g^*$ finite for reasonably nice subfamilies such as the DFR gammas.
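To make the quantities in (3.1) concrete, the following sketch computes finite-threshold sample analogues: the empirical mean residual holding g(t), the ratio g(t)/mean, and E(X | X > t)/t = 1 + g(t)/t. It is only an illustration with hypothetical function names and made-up data; it is not an estimator proposed in the text, and the limits in (3.1) are replaced by values at a fixed threshold t.

```python
def mean_residual_holding(holdings, t):
    """Empirical g(t): average excess X - t over the holdings that exceed t."""
    excess = [x - t for x in holdings if x > t]
    if not excess:
        raise ValueError("no holding exceeds the threshold t")
    return sum(excess) / len(excess)


def finite_threshold_ratios(holdings, t):
    """Sample analogues of I_1 and I_2 at a fixed threshold t > 0.

    Returns (g(t) / sample mean, 1 + g(t) / t); the sample mean stands in
    for mu = g(0+), which assumes non-negative holdings.
    """
    mu = sum(holdings) / len(holdings)
    g = mean_residual_holding(holdings, t)
    return g / mu, 1 + g / t


if __name__ == "__main__":
    data = [1, 2, 3, 4, 10]   # made-up holdings, for illustration only
    print(mean_residual_holding(data, 2))
    print(finite_threshold_ratios(data, 2))
```

Plotting these ratios against increasing thresholds t gives a rough picture of how the mean residual holding behaves in the tail, which is what the indices in (3.1) summarize.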
More generally, the holding distributions for which $g^* \le \infty$ ($g^* < \infty$ respectively) exists form the family of 'age-smooth' life distributions, which are those $F$ for which the residual-life hazard function $-\ln \bar F_t(x)$ converges on $[0, \infty]$ ($(0, \infty]$ respectively) for each $x$ as $t \to \infty$ (Bhattacharjee, 1986). $I_1$ and $I_2$ are indicators of aggregate inequality of the distribution of wealth in two different senses. $I_1$ measures the relative preponderance of the wealth of the super-rich, while $I_2$ indicates in a sense how rich they are. The traditional index of aggregate inequality, on the other hand, as measured by the classical Gini index (Lorenz measure) $G$, can be expressed as

$$G = P(Y > X) - P(Y \le X) = 1 - 2 \int_0^\infty F_1(x)\, dF(x),$$ (3.2)

where $X$ is the amount of wealth with holding distribution $F$ and $Y$ has the so called 'share distribution' $F_1(x) := \mu^{-1} \int_0^x t\, dF(t)$, the share of total wealth owned by those holding at most $x$. A somewhat pleasantly surprising but not fully understood feature of the three indices $I_1$, $I_2$ and $G$ is that they turn out to be monotone increasing in the coefficient of variation for many holding distributions $F$. Such is the case with $G$ under log-normal, $I_1$ under gamma and $I_2$ under log-gamma (Bhattacharjee and Krishnaji, 1985). Note also that whenever the holding distribution is anti-aging in the DFR, IMRL, NWU or NWUE sense, the coefficient of variation (c.v.) is at least one (Barlow and Proschan, 1975); a skewness feature aptly descriptive of the disproportionate share of the rich. Recently the author has considered other inequality indices which share this monotonicity in c.v.
under weak anti-aging hypotheses and have re-examined the appropriateness of 11, 12 and measures of aggregate inequality to show (Bhattacharjee, 1986a): (i) The non-trivial case 1 < 12 < m, implies I~ = ~ necessarily and then 12 = (1 + r/:) lim ~,'(t) t~ ~ 11(0 (3.3) where t/ is the coefficient of variation of the holding distribution F, 11(0 = g(t)/l~ = S ~ ff(u) d u / # f f ( t ) ~ I~ = ~ and IFl(t) is the inequality function 11( 0 computed for the share distribution F 1 associated with F. Reliability applications in economics 201 (ii) The ratio of the hazard functions of the holding and share distributions converge to 12: 12 = lim l n ( 1 - F(t)) ' ~ ln(1 - El(t)) (3.4) Clearly 11 ~> l if the holding distribution F is N W U E , with equality iff F is exponential. Similarly by (3.1) I z >/1 with equality iff g(t) = o(t) or, an equivalent condition on hazard functions via (3.4). The question, when 11 and I 2 are finite so as to be meaningful for purposes of comparison across populations has the following answers (Bhattacharjee, 1986a): (iii) 11 < ~ ~ 1 - F(ln x) is ( - p)-varying, for some p • (0, 0o ]. F is strictly N W U E ~ I 1 > 1. (iv) For any holding distribution F, I <~ I 2 <<.00. The different possibilities are characterized by (a) I f F is D F R , then 12 = 1 ~:~ the residual holding scaled by its mean converges to exponential, i.e., e ( x > t + xg(t) [X > t) ~ e - x . This condition is necessary for I 2 = 1, without the D F R hypothesis. (b) 1 < 12 < oo . ~ the "excess holding factor' over an affluence threshold t converges to the Pareto distribution: P(flX> with ~ = & l ( & - t)~x -~, 1). (c) 12 = ~ ¢:~ P ( X - t > x i X > t) ~ t/(t + x) as t ~ 0o. Notice that the distribution on the right hand side is D F R with infinite mean. 
The n.s.c. in (iii) is the condition of generalized regular variation (Feller, 1966; Seneta, 1976): a real valued function $h(x)$ on the half-line is regularly varying if $h(xy)/h(y)$ converges as $y \to \infty$, and then $h(xy)/h(y) \to x^{\alpha}$ for some $\alpha \in [-\infty, \infty]$. With an obvious interpretation of $x^{\alpha}$ when $\alpha = \pm\infty$, such an $h(x)$ is called $\alpha$-varying.

3.2. The Lorenz curve and TTT-transform. While $I_1$, $I_2$ and the classical Gini index are all aggregate measures of inequality, it is also useful to have a more dynamic measure of inequality which will describe the variation of the disparity of ownership with changing levels of affluence. This is classically modeled by the Lorenz curve

$$L(p) = \mu^{-1} \int_0^p F^{-1}(u)\, du, \qquad 0 \le p \le 1,$$

where $\mu$ is the average holding and $F^{-1}(u) = \inf\{t : F(t) \ge u\}$. It measures the proportion of total wealth owned by the poorest $100p\%$ of the population, and is thus a variant of the share distribution $F_1$ in (3.2), namely $L(F(t)) = F_1(t)$. As remarked earlier, the ratio $g(t)/\mu$ of the mean residual holding to the average holding can also serve such a purpose. The Lorenz curve $L$ and its inverse $L^{-1}$ are both distribution functions on the unit interval. The relevance of reliability ideas for modeling inequality and the relationships of the Lorenz curve to some well known functionals of life distributions were first indicated by Chandra and Singpurwalla (1981) and further studied by Klefsjö (1984). If

$$W(p) := \mu^{-1} \int_0^{F^{-1}(p)} \bar F(t)\, dt$$

is the scaled total time on test (TTT) transform of the holding distribution $F$, viewed as a life distribution with mean $\mu$, and $V := \int_0^1 W(p)\, dp$ is the cumulative TTT-transform, then

$$L(p) = W(p) - (1 - p)\, \mu^{-1} F^{-1}(p), \qquad V = 1 - G$$

(Chandra and Singpurwalla, 1981), where the Gini index

$$G = 1 - 2 \int_0^\infty F_1(t)\, dF(t) = 1 - 2 \int_0^1 L(p)\, dp = 2 \int_0^1 \{p - L(p)\}\, dp$$ (3.5)

is scale-equivalent to the area bounded by the diagonal and the Lorenz curve, as is well known. Based on a random sample with order statistics $X_{(1)}, X_{(2)}, \ldots$
, X(,) from F, the estimated sample Lorenz curve and the Gini-statistic ~'wl / n G.=: j=,j(n -j)(X(j+I)n (n - 1) Z j _ , X(j)) X(j) are similarly related to the total time on test statistic and its cumulative version L. Go=I-V = W. - (n - i) i) j i X(:) , n. Chandra and Singpurwalla (1981), Klefsj0 (1984) and Taillie (1981) have used partial orderings of life distributions to compare the Lorenz curves of holding distributions which are so ordered. For the partial ordering notions Reliability applications in economics 203 (i) H <c F if F - IH is convex, (ii) H < . F if F - 1H is star-shaped (F- ~H(t)/t is increasing, (iii) H <.T F if ( F - 1 / H - 1 ) is increasing, (iv) H < m F if ~x~ { i f ( t ) - H(t)} dt>~ 0, all x > 0, with equality at x = 0; they show, H <oF or H<.TF ~ L~I(p)<.TLT--'(p) H<cF ~ L~'(p)<cLFl(p), H<m F ~ L r ( p ) <~L~r(p) ~ LF(p)<~LI_I(p) , (3.6) In particular taking H to be exponential, the distribution F in (i) above corresponds to DFR, (ii)to D F R A and (iv)to H N W U E (Klefsj6, 1982). Reversing the roles of H and F leads to the dual aging classes. (3.6) implies that L ( p ) <~p + (1 - p)ln(1 - p ) , (3.7) the Lorenz curve of the exponential whenever the holding distribution is H N W U E with a finite mean. This bound obviously remains valid for the smaller class of NWU and D F R distributions for which we have earlier found some theoretical and empirical evidence respectively as plausible models of landholding distributions. In a more general vein, Klefsj6 (1984) remarks that in the spirit of (3.5); contrasting the Lorenz curve against the uniform distribution on (0, 1), the quantities Jk =:(k + Lk=:k(k- 1)fo'pk-l{p-L(p)}dp, 1) f o ~ ( 1 - p ) k k>~ 1, 2 { p _ L ( p ) } d p , k>~2, (3.8) can be used as generalized indices of inequality. The Gini-index is the special case G = J~ = L 2. 
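The sample quantities of this subsection are straightforward to compute. The sketch below is a minimal illustration, with hypothetical function names, of the sample Lorenz curve, the Gini statistic in its usual order-statistic form (sum over j of j(n-j)(X_(j+1) - X_(j)), divided by (n-1) times the sample total), the exponential Lorenz curve appearing as the bound in (3.7), and a midpoint-rule evaluation of G = 2 ∫ {p - L(p)} dp from (3.5). The numerical integration scheme is a choice made for this sketch, not something taken from the text.

```python
import math


def sample_lorenz(xs):
    """Points (i/n, L_n(i/n)), i = 0..n, of the sample Lorenz curve."""
    s = sorted(xs)
    n, total = len(s), sum(s)
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(s, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini_statistic(xs):
    """G_n = sum_{j=1}^{n-1} j (n-j) (X_(j+1) - X_(j)) / ((n-1) * sum_j X_(j))."""
    s = sorted(xs)
    n = len(s)
    numerator = sum(j * (n - j) * (s[j] - s[j - 1]) for j in range(1, n))
    return numerator / ((n - 1) * sum(s))


def exponential_lorenz(p):
    """Lorenz curve of the exponential distribution: p + (1 - p) ln(1 - p)."""
    return 1.0 if p >= 1.0 else p + (1.0 - p) * math.log(1.0 - p)


def gini_from_lorenz(lorenz, steps=10000):
    """Midpoint-rule approximation of G = 2 * integral_0^1 {p - L(p)} dp."""
    h = 1.0 / steps
    return 2.0 * h * sum((k + 0.5) * h - lorenz((k + 0.5) * h) for k in range(steps))


if __name__ == "__main__":
    print(sample_lorenz([1, 3]))
    print(gini_statistic([1, 1, 1, 1]), gini_statistic([0, 0, 0, 1]))
    print(gini_from_lorenz(exponential_lorenz))
```

The statistic is 0 when all holdings are equal and 1 when a single member owns everything; for the exponential Lorenz curve the integral gives a Gini index of one half.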
Notice in view of (3.7), we have Jk >t O, L k >~ 0 for all anti-aging holding distributions F or their 'aging' duals; and J~ = L k = 0 only in the egaliterian case L ( p ) = p where everybody owns the same amount of wealth (F is degenerate). By expressing Jk as Klefsj6 (1984) implicitly notes that Jk can be interpreted as the excess over k - 1 of the ratio of the mean life of a parallel system of (k + 1) i.i.d, components with life distribution F to that of a similar system with exponential lives. Similarly, we note M. C. Bhattacharjee 204 Lk = k ; ( l - u ) ~ - 1 ( 1 - W(u))du= 1 - # 1 ffk(t) dt measures the relative advantage of a component with life F against a series system of k such i.i.d, components as measured by the difference of the corresponding mean lives as a fraction of the component mean life. These interpretations bring to a sharper focus the relationships of the notion of 'inequality of distribution' in economics to measures of system effectiveness in reliability. 3.3. Applications to statistical analysis of lifelengths. The reliability approach to modeling 'inequality of distributions' suggest applications to reliability inference. Using weak convergence of the empirical Lorenz process {L~(t): 0 ~< t ~< 1), L,(t)=:~ =:0 L, -L(t) } if j - 1 < t ~ < -j, n n if t = 0, to a process related to Brownian bridge (Goldie, 1977), it is thus possible to construct a test of exponentiality--a theme of central interest in reliability and life testing. However the difficulty of evaluating the exact distribution of L,(t) to determine the critical points of the goodness-of-fit test based on the sample Lorenz curve has in practice required simulation even in large samples (Gail and Gatswirth, 1978). 
In contrast, the critical cut-off values of the corresponding test based on the sample TTT-process $\mathcal{W}_n(t) := \sqrt{n}\,\{W_n(j/n) - W(t)\}$, $0 \le t \le 1$ (Barlow and Campo, 1975), are the usual Kolmogorov-Smirnov statistics, since under the null hypothesis of exponentiality ($W(t) = t$), $\mathcal{W}_n(t)$ converges exactly to the Brownian bridge. If the alternatives belong to a more restricted family, such as the well known non-parametric life distribution classes in reliability, then there are other possibilities. Klefsjö (1983) has used a variant of the aggregate inequality index $L_k$ in (3.8) to construct a test of exponentiality against HNBUE (HNWUE) alternatives. His test statistic is based on an estimate of $B_k := kL_k - (k - 1)$, noting that $B_k \ge (\le)\ 0$ if $F$ is HNBUE (HNWUE), with $B_k = 0$ only if $F$ is exponential. Estimation and tests of monotonicity and of a turning point of the mean residual life function $g(t)$ have been considered by Hollander and Proschan (1975) and Guess and Proschan (1983). Our inequality indices $I_1$ and $I_2$ suggest a related open problem: estimation and tests for $I_1$, $I_2$, which are parameters descriptive of the tail behavior of the mean residual life. The question of estimating $I_1$ is well defined within the family of age-smooth life distributions (Bhattacharjee, 1986). On the other hand, the domains of attraction results (Bhattacharjee, 1986a) described earlier, which characterize possible values of $I_2$, imply that estimating $I_2$ and testing $I_2 = 1$ against $1 < I_2 < \infty$ are problems of independent interest for reliability theory.

4. R & D rivalry and the economics of innovation

4.1. Innovations and accompanying technological breakthroughs have changed the lot of mankind throughout history, and noticeably more so in the present century at an accelerating pace.
Since technological change affects market structure through altering the means of production, economists began to be interested in the subject of technical advance around the fifties. Although there are some earlier references to the economic aspects of technological advance (Taussig, 1915; Hicks, 1932), the stage for serious inquiry on the economics of such advance was set by Schumpeter (1961, 1964, 1975), who emphasized the role of innovation as an economic activity. Since then, the recognition of technical advance as a major source of economic growth has been the subject of many studies, mostly empirical. These studies deal with empirical relationships of industrial innovations to firm size and concentration as indicators of market structure, the 'technology-push' and 'demand-pull' factors (Arrow, 1962) as incentives for innovation, and such other relevant variables. Collectively they point to the need for a conceptual framework, and recently an economic theory of technical advance has begun to emerge (Kamien and Schwartz, 1982). In this view, the economic agents are firms or entrepreneurs, and an act of product- or process-innovation straddles all activities from basic research through invention to development, production, distribution and collection of consequent revenues, against the backdrop of industrial rivalry in the competition to gain market supremacy. Schumpeter recognized that acts of invention and innovational entrepreneurship are distinct, as are the corresponding risks; and it is only the latter which can lead to the diffusion of benefits of invention to its ultimate consumers. Innovation and entrepreneurship in this framework are viewed as a race to be the first, with the incentive of commanding extraordinary profits at least until imitators appear, when such monopoly profits will begin to be eroded.
The 'Schumpeterian hypothesis' that the opportunity to realize monopoly profits spurs invention and that the presence of some monopoly power has a similar effect, the latter also stressed by Galbraith (1952), forms the basis of a modern economic theory of technical advance. The accent is on competition through innovation rather than through price alone, and is thus contrary to the traditional tenets of the western economic doctrine of 'perfect competition', which would eliminate any excess profit of an innovation by immediate imitation.

4.2. The presence of identified or potential rivals who are in the race to be the first to innovate constitutes the major source of uncertainty for an entrepreneur. It is this aspect of innovational (R & D) rivalry, on which reliability ideas can be brought to bear, that is of interest to us. Even within the context of such applications, there are a host of issues in modeling the economics of innovation which can be so addressed within the Schumpeterian framework. Kamien and Schwartz (1982) provide a definitive account of contemporary research on the economics of technical advance, where reliability researchers will recognize the potential to exploit reliability ideas through modeling the uncertainty associated with innovational rivalry and the possible duration of monopoly between successful innovation and rivals' imitation. These ideas do not appear to have been explicitly recognized and are only implicit in Kamien and Schwartz (1982). We will consider one such model to focus on the relevance of reliability concepts in modeling the economics of technical advance, which may lead to deeper insights into the role of innovational rivalry as a determinant of technological progress.
In this simplified model of innovation as an economic activity under the Schumpeter scenario, our entrepreneur or firm has either only one product (economic 'good') or none at all (breaking in as a newcomer), and is competing against rivals to develop an innovation. We assume there is no essential resource constraint and no major uncertainty important enough to warrant stochastic modelling of the entrepreneur's time to complete development. Any desired completion time $\tau$ can be achieved by spending a required amount $C(\tau)$, representing the net present value of the cost stream incurred to complete development at time $\tau$. Although it is usual to assume that $0 < C(\tau)$ is convex decreasing, for our purposes the latter assumption is unnecessary, and only assuming $C(0)$ sufficiently large to prevent instantaneous development will suffice. Assume a market growth rate $\gamma$; $\gamma >$, $=$ or $< 0$ according as the market is growing, stationary or decreasing. The development process is assumed to be contractual in the sense that innovation will be seen through to its completion by the entrepreneur as well as the rivals, either as a pioneer or as an imitator. The entrepreneur has only an incomplete knowledge about the rivals' introduction time $T$, reflected by its d.f. $H(t) = P(T \le t)$, about which more will be said later. The current rate of the entrepreneur's return $r(t; \tau, T)$ at time $t$ depends not only on when the innovation is introduced in the market but also on whether our entrepreneur is a winner succeeding first or an imitator of the rivals. Let this be $r_0$ (receipt on the current good) until introduction of the innovation changes it to $r_1$ or $\rho_0$ according as some rival or the entrepreneur succeeds first. These rates remain in effect until the moment both the innovating pioneer and the imitator appear. Once the entrepreneur and the rivals are both in the market, the former's rate of return changes again.
The current value of its contribution to the total return is a function $P(\tau, T)$, the current capitalized value of the stream of future receipts, which depends on $\tau$ and $T$ typically through $|\tau - T|$: the lag between innovation and imitation. The structure of $P$ also depends on whether the rivals win ($\tau \ge T$; correspondingly $P =: P_1(\cdot)$, say) or imitate ($T > \tau$, when $P =: P_0(\cdot)$). Accordingly,

$$P(\tau, T) = P_0(T - \tau) \ \text{ if } \tau < T, \qquad P(\tau, T) = P_1(\tau - T) \ \text{ if } \tau \ge T;$$

and the flow of receipts can be schematically described as below:

    τ < T (rival imitates):   rate r_0 on [0, τ),   rate ρ_0 on [τ, T),   capitalized value P_0 from T = max(τ, T)
    τ ≥ T (rival precedes):   rate r_0 on [0, T),   rate r_1 on [T, τ),   capitalized value P_1 from τ = max(τ, T)

The expected net present value of the entrepreneur's returns, with a market interest rate $i$, as a consequence of the decision to choose an introduction time $\tau$ is

$$U(\tau) = \int_0^\infty E\{e^{-(i-\gamma)t}\, r(t; \tau, T)\}\, dt + E\{e^{-(i-\gamma)\max(\tau, T)}\, P(\tau, T)\}$$
$$= \int_0^\tau e^{-(i-\gamma)t} \{r_0 \bar H(t) + r_1 H(t)\}\, dt + \rho_0 \int_\tau^\infty e^{-(i-\gamma)t}\, \bar H(t)\, dt + e^{-(i-\gamma)\tau} \int_0^\tau P_1(\tau - t)\, dH(t) + \int_\tau^\infty e^{-(i-\gamma)t}\, P_0(t - \tau)\, dH(t).$$ (4.1)

The optimal introduction time $\tau^*$ is of course the solution which maximizes the expected value of profit

$$V(\tau) = U(\tau) - C(\tau).$$ (4.2)

While $\tau^* = 0$ can be ruled out by taking $C(0)$ to be sufficiently large, it is possible to have $\tau^* = \infty$ (best not to undertake development at all), depending on the relative values of the economic parameters. In the remaining cases there is a finite economically best introduction time. It is usual, but not necessary, to have $\rho_0 \ge r_0 \ge r_1$ and $P_0' \ge 0$, $P_1' \le 0$, which are easily interpreted: (i) rival precedence, should it occur, does not increase the rate of return from the old good, which further increases if the entrepreneur succeeds first; (ii) in the post-innovation-cum-imitation period, the greater is the lag of rival entry, if we succeed first (the greater is the lag in our following, if the rivals succeed first), the greater (the smaller) is our return from the remaining market.
Various special cases may occur within these constraints, e.g., rivals' early success may make our current good obsolete ($r_1 = 0$); or the entrepreneur may be a new entrant with no current good to be replaced ($r_0 = r_1 = 0$). Sensitivity of the optimal introduction time to these and other parameters in the model is of obvious economic interest and is easily derived (Kamien and Schwartz, 1982).

4.3. Intensity of rivalry as a reliability idea and its implications. What interests us more is how the speed of development, as reflected by the economically best introduction time $\tau^*$, is affected by the extent of innovational rivalry which is built into the rivals' introduction time distribution $H$. Kamien and Schwartz (1982) postulate

$$\bar H(t) := P(T > t) = e^{-h\Lambda(t)}$$

and propose $h > 0$ as a degree of innovational hazard. To avoid confusion with the notion of hazard in reliability theory, we call $h$ the intensity of innovational rivalry. Setting $F(t) = 1 - e^{-\Lambda(t)}$, it is clear that

$$\bar H(t) = \bar F^{\,h}(t),$$ (4.3)

i.e., the rival introduction time d.f. $H$ belongs to a family of distributions with proportional hazards, which are of considerable interest in reliability. We may think of $F$ as the distribution of the rivals' development time under unit rivalry ($h = 1$) for judging how fast the rivals may complete development, as indicated by $H$. Since the hazard function $\Lambda_H(t) := -\ln \bar H(t)$ is a measure of the time-varying innovational risk of rival pre-emption, the proportional hazards hypothesis $\Lambda_H(t) = h\Lambda(t)$ in (4.3) says that the effects of time and rivalry on the entrepreneur's innovational hazards are separable and multiplicative. If $F$ has a density and correspondingly a hazard rate (i.e., 'failure rate') $\lambda(t)$, then so does $H$, with failure rate $h\lambda(t)$.
It is the innovational rate of hazard at time $t$ from the viewpoint of our entrepreneur; and by the standard reliability theoretic interpretation of failure rates, the conditional probability of rivals' completion soon after $t$, given completion has not occurred within time $t$, is $P(T \le t + \delta \mid T > t) = h\delta\lambda(t) + o(\delta)$. As the intensity of rivalry increases by a factor $c$, from $h$ to $ch$, this probability, for each fixed $t$ and small $\delta$, also increases essentially by the same factor $c$. To examine the effect of the intensity of rivalry on the speed of development, assume that having imitators is preferable to being one ($P_0 > P_1$) and, as a simplifying assumption, that the corresponding rewards are independent of the 'innovation-imitation lag' ($P_0' = P_1' = 0$). By (4.1) and (4.2), the optimal introduction time $\tau^*$ is then the implicit solution of

\[ \frac{\partial V}{\partial \tau} = e^{-(i-\gamma)\tau}\bigl[\{r_0 - p_0 + h(P_1 - P_0)\lambda(\tau)\}\bar F^h(\tau) + \{r_1 - (i-\gamma)P_1\}\{1 - \bar F^h(\tau)\}\bigr] - C'(\tau) = 0, \tag{4.4} \]

satisfying the second derivative condition for a maximum at $\tau^*$. (4.4) defines $\tau^* = \tau^*(h)$ implicitly as a function of the rivalry intensity. Kamien and Schwartz (1982) show that if

\[ \lambda(t) \uparrow \quad\text{and}\quad \lambda(t)/\Lambda(t) \downarrow \ \text{in } t, \tag{4.5} \]

then either (i) $\tau^*(h) \uparrow$ or (ii) $\tau^*(h)$ is initially $\downarrow$ and then $\uparrow$ in $h$. The crux of their argument is the following. If $\tau_0(h)$ is implicitly defined by the equation

\[ \lambda(\tau)\{\Lambda^{-1}(\tau) - h\} = \{p_0 - r_0 + r_1 - (i-\gamma)P_1\}/(P_0 - P_1), \tag{4.6} \]

where $\Lambda^{-1}(\tau)$ denotes the reciprocal $1/\Lambda(\tau)$, i.e., the condition for the left hand side of (4.4) to have a local extremum as a function of $h$, then $\tau^*(h)$ is decreasing, stationary or increasing in $h$ according as $\tau^*(h) >$, $=$ or $< \tau_0(h)$. Accordingly, since (4.5) implies that $\tau_0(h)$ is decreasing in $h$, either $\tau^*(h)$ behaves according to one of the two possibilities mentioned, or (iii) $\tau^*(h) < \tau_0(h)$ for all $h \ge 0$. The last possibility can be ruled out by the continuity of $V = V(\tau, h)$ in (4.2), $V(0, h) < 0$, $V(\tau^*, h) > 0$ and the condition $P_0 > P_1$. Which one of the two possibilities obtains of course depends on the model parameters.
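The origin of condition (4.6) can be sketched as follows. This is only an illustrative derivation under the assumptions stated in the text ($P_0$ and $P_1$ constant, $\bar H = \bar F^h = e^{-h\Lambda}$, and $\Lambda(\tau) > 0$); the shorthand $A$, $B$, $D$ and $g$ is introduced here for convenience and does not appear in the original. Only the bracketed term of (4.4) depends on $h$, so it suffices to differentiate that term.

```latex
\begin{align*}
g(h) &= \{A + hD\lambda(\tau)\}\,e^{-h\Lambda(\tau)} + B\,\{1 - e^{-h\Lambda(\tau)}\},
 \qquad A = r_0 - p_0,\quad B = r_1 - (i-\gamma)P_1,\quad D = P_1 - P_0, \\
\frac{\partial g}{\partial h}
 &= e^{-h\Lambda(\tau)}\bigl[D\lambda(\tau)\{1 - h\Lambda(\tau)\} - (A - B)\Lambda(\tau)\bigr] = 0 \\
\Longleftrightarrow\quad
\lambda(\tau)\Bigl\{\frac{1}{\Lambda(\tau)} - h\Bigr\}
 &= \frac{A - B}{D}
 = \frac{p_0 - r_0 + r_1 - (i-\gamma)P_1}{P_0 - P_1}.
\end{align*}
```

The last line agrees with (4.6) when $\Lambda^{-1}(\tau)$ there is read as the reciprocal of the hazard function.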
In case (i), the optimal introduction time $\tau^*(h)$ increases with increasing rivalry, and the absence of rivalry ($h = 0$) yields the smallest such optimal introduction time. The other case (ii), in which, depending on the rates of return and other relevant parameters, there may be an intermediate degree of rivalry for which the optimal development is quickest possible, is certainly not obvious a priori and highlights the non-intuitive effects of rivalry on decisions to innovate.

4.4. Further reliability ramifications.

From a reliability point of view, Kamien and Schwartz's assumption (4.5) says

\[ F \in \{\mathrm{IFR}\} \cap \mathcal{L} \tag{4.7} \]

and hence so does $H$, where $\mathcal{L}$ is the set of life distributions with a log-concave hazard function. The IFR hypothesis is easy to interpret. It says that the composite rivals' residual time to development is stochastically decreasing, so that if they have not succeeded so far, then completion of their development within any additional deadline becomes more and more likely with elapsed time. This reflects the accumulation of efforts positively reinforcing the chances of success in future. The other condition, that $F$ and thus $H$ also has a log-concave hazard function, lends itself less readily to such interpretation; it essentially restricts the way in which the time-dependent component of the entrepreneur's innovational hazard from competing rivals grows with time $t$. The proportional hazards model (4.3) can accommodate different configurations of market structure as special cases, an argument clearly in its favor. By (4.3), as $h \to 0$, $P(T > t) \to 1$ for all $t > 0$, and in the limiting case $T$ is an improper r.v. with all its mass at infinity. Thus $h = 0$ corresponds to absence of rivalry. Similarly, as $h \to \infty$, $P(T > t) \to 0$ for all $t > 0$; in the limit the composite rivals' appearance is immediate and this prevents the possibility of entrepreneurial precedence.
If our entrepreneur had a head start with no rivals until a later time when rivals appear with a very large $h$, then even if our entrepreneur innovates first, his supernormal profits from innovation will, with high probability, be eliminated by rival imitation within a very short time as a consequence of the high rivalry intensity $h$; the lag shrinks to instantaneous imitation as $h$ approaches infinity. In this sense the case $h = \infty$ reflects the traditional economists' dream of 'perfect competition'. Among the remaining, more realistic possibilities $0 < h < \infty$, Barzel (1968) distinguishes between moderate and intense rivalry, the latter corresponding to the situation when the intensity of rivalry exceeds the market growth rate ($h > \gamma$). If rivalry is sufficiently intense, no development becomes best ($h \gg \gamma \Rightarrow \tau^*(h) = \infty$). In other cases, the intense rivalry and non-rivalrous solutions provide vividly contrasting benchmarks to understand the innovation process under varying degrees of moderate to intense rivalry. Our modeling to illustrate the use of reliability ideas has been limited to a relatively simplified situation. It is possible to introduce other variations and features of realism such as modification of rivals' effort as a result of the entrepreneur's early success, budget constraints, non-contractual development which allows the option of stopping development under rival precedence, and game theoretic formulations which incorporate technical uncertainty. There is now a substantial literature on these various aspects of innovation as an economic process (DasGupta and Stiglitz, 1980, 1980a; Kamien and Schwartz, 1968, 1971, 1972, 1974, 1975, 1982; Lee and Wilde, 1980; Loury, 1979). It appears to us that there are many questions, interesting from a reliability application viewpoint, which can be profitably asked and would lead to a deeper understanding of the economics of innovation.
Even in the context of the present model, which captures the essence of the innovating process under risk of rivalry, there are many such questions. For example, what kind of framework for R & D rivalry and market mechanisms leads to the rival entry model (4.3)? Stochastic modeling of such mechanisms would be of obvious interest. Note that the exponential: $\bar H(t) = e^{-ht}$, $\lambda(t) = 1$; the Weibull: $\bar H(t) = e^{-ht^{\alpha}}$, $\lambda(t) = \alpha t^{\alpha - 1}$; and the extreme-value distributions: $\bar H(t) = \exp\{-h(e^{\alpha t} - 1)\}$, $\lambda(t) = \alpha e^{\alpha t}$, all satisfy (4.3) and (4.7), the latter for $\alpha \ge 1$. A related open question is the following. Suppose the rival introduction time satisfies (4.3) but its distribution $F$ under unit rivalry ($h = 1$) is unknown. Under what conditions, interesting from a reliability point of view with an appropriate interpretation in the context of rivalry, does there exist a finite maximin introduction time $\tau^*(h)$ and what, if any, is a least favorable distribution $F^*$ of time to rival entry? Such a pair $(\tau^*(h), F^*)$, for which

\[ \max_{\tau} \min_{F} V(\tau, h; F) = \min_{F} \max_{\tau} V(\tau, h; F) = V(\tau^*(h), h; F^*), \]

would indicate the entrepreneur's best economic introduction time within any specified regime of rivalry when he has only an incomplete knowledge of the benchmark distribution $F$. Here $V(\tau, h; F)$ is the total expected reward given by (4.2) and (4.1) under (4.3). The proportional hazards model (4.3) aggregates all sources of rivalry, from existing firms or potential new entrants. This is actually less of a criticism than it appears because, in the entrepreneur's perception, only the distribution of composite rival entry time matters. It is possible to introduce technical uncertainty in the model by recognizing that the effort, usually parametrized through cost, required to successfully complete development is also subject to uncertainties (Kamien and Schwartz, 1971).
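The claim that the exponential, Weibull and extreme-value families fit the proportional hazards form (4.3) and satisfy condition (4.5) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the grid and the tolerance are illustrative assumptions, and the check is only over a finite grid rather than a proof.

```python
import math

# Baseline cumulative hazard Lambda(t) and hazard rate lambda(t) under unit
# rivalry (h = 1) for the three families named in the text.
def exponential():
    return (lambda t: t), (lambda t: 1.0)

def weibull(alpha):
    return (lambda t: t ** alpha), (lambda t: alpha * t ** (alpha - 1))

def extreme_value(alpha):
    return (lambda t: math.exp(alpha * t) - 1.0), (lambda t: alpha * math.exp(alpha * t))

def rival_survival(cum_hazard, h, t):
    """H-bar(t) = exp(-h * Lambda(t)), which equals F-bar(t) ** h as in (4.3)."""
    return math.exp(-h * cum_hazard(t))

def satisfies_4_5(cum_hazard, rate, grid, tol=1e-12):
    """Grid check of (4.5): lambda nondecreasing and lambda/Lambda nonincreasing."""
    rates = [rate(t) for t in grid]
    ratios = [rate(t) / cum_hazard(t) for t in grid]
    increasing = all(b >= a - tol for a, b in zip(rates, rates[1:]))
    decreasing = all(b <= a + tol for a, b in zip(ratios, ratios[1:]))
    return increasing and decreasing

grid = [0.1 * k for k in range(1, 51)]  # hypothetical grid t = 0.1, ..., 5.0

families = {
    "exponential": exponential(),
    "Weibull, alpha = 2": weibull(2.0),
    "extreme value, alpha = 1": extreme_value(1.0),
    "Weibull, alpha = 0.5": weibull(0.5),  # decreasing failure rate, should fail (4.5)
}
for name, (Lam, lam) in families.items():
    print(name, satisfies_4_5(Lam, lam, grid))
```

The Weibull case with shape below one is included only as a contrast: its failure rate decreases, so the IFR half of (4.5) fails, in line with the restriction to shape at least one stated in the text.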
Suppose there are $n$ competitors including our entrepreneur, the rivals are independent, and let $G(z)$ be the probability that any rival completes development with an effort no more than $z$. If $z(t)$ is the cumulative rival effort up to time $t$, then the probability that at least one of the rivals will succeed by time $t$ is

\[ P(t) = 1 - \{1 - G(z(t))\}^{n-1}. \]

This leads to (4.3) with $F(t) = G(z(t))$, $H = P$ and intensity $h = n - 1$, the number of rivals. We note this provides one possible answer to the question of modeling rivalry described by (4.3). What other alternative mechanisms can also lead to (4.3)? If the effort distribution $G$ has a 'failure rate' (intensity of effort) $r(z)$, then the innovational hazard function and rate are

\[ \Lambda_H(t) = (n - 1)\int_0^{z(t)} r(u)\,du, \qquad \lambda_H(t) = (n - 1)\, z'(t)\, r(z(t)), \tag{4.8} \]

which show how technical uncertainty can generate market uncertainty. If our entrepreneur's effort distribution is also $G(z)$ and independent of the rivals, then note that the role of each player in the innovation game is symmetric and each faces the hazard rate (4.8), since from the perspective of each competitor the other $(n - 1)$ rivals are i.i.d. and in series. It would clearly be desirable to remove the i.i.d. assumption for greater realism, in so far as a rival's effort and spending decisions are often dictated by those of others. Some of the effects of an innovation may be irreversible. Computers and information processing technology, which have now begun to affect every facet of human life, are clearly a case in point. Are these impacts or their possible irreversibility best for the whole society? None of the above formulations can address this issue, a question not in the purview of economists and quantitative modeling alone; nor do they dispute its relevance.
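The passage from the effort distribution $G$ to the proportional hazards form (4.3) with $h = n - 1$ can be illustrated with a small numerical sketch. The exponential effort distribution, the linear effort path and all parameter values below are hypothetical choices made only for this illustration; they are not taken from the original.

```python
import math

def effort_cdf(z, rate=1.0):
    """Hypothetical effort distribution G: exponential with constant intensity r(z) = rate."""
    return 1.0 - math.exp(-rate * z)

def cumulative_effort(t, speed=2.0):
    """Hypothetical cumulative rival effort path z(t) = speed * t."""
    return speed * t

def prob_some_rival_done(t, n, rate=1.0, speed=2.0):
    """P(t) = 1 - {1 - G(z(t))}^(n-1): at least one of the n-1 i.i.d. rivals is done by t."""
    g = effort_cdf(cumulative_effort(t, speed), rate)
    return 1.0 - (1.0 - g) ** (n - 1)

def hazard_function(t, n, rate=1.0, speed=2.0):
    """Lambda_H(t) of (4.8); for constant r(z) = rate the integral is rate * z(t)."""
    return (n - 1) * rate * cumulative_effort(t, speed)

def hazard_rate(t, n, rate=1.0, speed=2.0):
    """lambda_H(t) = (n - 1) * z'(t) * r(z(t)); here z'(t) = speed and r = rate."""
    return (n - 1) * speed * rate

# With n = 4 competitors the rivalry intensity is h = 3:
t, n = 0.7, 4
unit_survival = 1.0 - effort_cdf(cumulative_effort(t))  # F-bar(t) under unit rivalry
print(1.0 - prob_some_rival_done(t, n))                 # H-bar(t)
print(unit_survival ** (n - 1))                         # F-bar(t) ** h
print(math.exp(-hazard_function(t, n)))                 # exp(-Lambda_H(t))
```

The three printed quantities coincide up to rounding, which is the series-system reading of (4.3): the first rival to finish determines the rival introduction time.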
What they can and do provide is an understanding of the structure and evolution of the innovating process as a risky enterprise, and it is here that reliability ideas may be able to play a more significant role than hitherto in explaining rivalry and its impacts on the economics of innovation. In turn the measurable parameters of such models and their consequences can then serve as signposts for an informed debate on the wider questions of social relevance of an innovation.

References

Arrow, K. J. (1951). Social Choice and Individual Values. Wiley, New York.
Arrow, K. J. (1962). Economic welfare and the allocation of resources for invention. In: R. R. Nelson, ed., The Rate and Direction of Inventive Activity. Princeton University Press, Princeton, NJ.
Barlow, R. E. and Campo, R. (1975). Total time on test processes and applications to failure data analysis. In: R. E. Barlow, J. Fussell and N. D. Singpurwalla, eds., Reliability and Fault Tree Analysis, SIAM, Philadelphia, PA, 451-481.
Barlow, R. E. and Saboia, J. L. M. (1973). Bounds and inequalities in the rate of population growth. In: F. Proschan and R. J. Serfling, eds., Reliability and Biometry, Statistical Analysis of Lifelengths, SIAM, Philadelphia, PA, 129-162.
Barlow, R. E. and Proschan, F. (1965). Mathematical Theory of Reliability. Wiley, New York.
Barlow, R. E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart and Winston, New York.
Barzel, Y. (1968). Optimal timing of innovation. Review of Economics and Statistics 50, 348-355.
Bergmann, R. and Stoyan, D. (1976). On exponential bound for the waiting time distribution in GI/G/1. J. Appl. Prob. 13(2), 411-417.
Bhattacharjee, M. C. and Krishnaji, N. (1985). DFR and other heavy tail properties in modeling the distribution of land and some alternative measures of inequality. In: J. K. Ghosh, ed., Statistics: Applications and New Directions, Indian Statistical Institute, Eka Press, Calcutta, 100-115.
Bhattacharjee, M. C. (1986). Tail behaviour of age-smooth failure distribution and applications. In: A. P. Basu, ed., Reliability and Statistical Quality Control, North-Holland, Amsterdam, 69-86.
Bhattacharjee, M. C. (1986a). On using Reliability Concepts to Model Aggregate Inequality of Distributions. Technical Report, Dept. of Mathematics, University of Arizona, Tucson.
Brams, S. J., Lucas, W. F. and Straffin, P. D., Jr. (eds.) (1978). Political and Related Models. Modules in Applied Mathematics: Vol. 2, Springer, New York.
Chandra, M. and Singpurwalla, N. D. (1981). Relationships between some notions which are common to reliability and economics. Mathematics of Operations Research 6, 113-121.
Daley, D. (ed.) (1983). Stochastic Comparison Methods for Queues and Other Processes. Wiley, New York.
Deegan, J., Jr. and Packel, E. W. (1978). To the (Minimal Winning) Victors go the (Equally Divided) Spoils: A New Power Index for Simple n-Person Games. In: S. J. Brams, W. F. Lucas and P. D. Straffin, Jr., eds., Political and Related Models. Springer-Verlag, New York, 239-255.
DasGupta, P. and Stiglitz, J. (1980). Industrial structure and the nature of innovative activity. Economic Journal 90, 266-293.
DasGupta, P. and Stiglitz, J. (1980a). Uncertainty, industrial structure and the speed of R & D. Bell Journal of Economics 11, 1-28.
Feller, W. (1966). Introduction to Probability Theory and Applications. 2nd ed. Wiley, New York.
Gail, M. H. and Gastwirth, J. L. (1978). A scale-free goodness-of-fit test for the exponential distribution based on the Lorenz curve. J. Amer. Statist. Assoc. 73, 787-793.
Galbraith, J. K. (1952). American Capitalism. Houghton and Mifflin, Boston.
Goldie, C. M. (1977). Convergence theorems for empirical Lorenz curves and their inverses. Advances in Appl. Prob. 9, 765-791.
Guess, F., Hollander, M. and Proschan, F. (1983). Testing whether Mean Residual Life Changes Trend. FSU Technical Report #M665, Dept. of Statistics, Florida State University, Tallahassee.
Hicks, J. R. (1932). The Theory of Wages. Macmillan, London.
Hollander, M. and Proschan, F. (1975). Tests for the mean residual life. Biometrika 62, 585-593.
Kamien, M. and Schwartz, N. (1968). Optimal induced technical change. Econometrica 36, 1-17.
Kamien, M. and Schwartz, N. (1971). Expenditure patterns for risky R & D projects. J. Appl. Prob. 8, 60-73.
Kamien, M. and Schwartz, N. (1972). Timing of innovations under rivalry. Econometrica 40, 43-60.
Kamien, M. and Schwartz, N. (1974). Risky R & D with rivalry. Annals of Economic and Social Measurement 3, 276-277.
Kamien, M. and Schwartz, N. (1975). Market structure and innovative activity: A survey. J. Economic Literature 13, 1-37.
Kamien, M. and Schwartz, N. (1982). Market Structure and Innovation. Cambridge University Press, London.
Klefsjö, B. (1982). The HNBUE and HNWUE class of life distributions. Naval Res. Logist. Qrtly. 29, 331-344.
Klefsjö, B. (1983). Testing exponentiality against HNBUE. Scandinavian J. Statist. 10, 65-75.
Klefsjö, B. (1984). Reliability interpretations of some concepts from economics. Naval Res. Logist. Qrtly. 31, 301-308.
Kleinrock, L. (1975). Queueing Systems, Vol. 1. Theory. Wiley, New York.
Köllerström, J. (1976). Stochastic bounds for the single server queue. Math. Proc. Cambridge Phil. Soc. 80, 521-525.
Lucas, W. F. (1978). Measuring power in weighted voting systems. In: S. J. Brams, W. F. Lucas and P. D. Straffin, Jr., eds., Political Science and Related Models. Springer, New York, 183-238.
Lee, T. and Wilde, L. (1980). Market structure and innovation: A reformulation. Qrtly. J. of Economics 94, 429-436.
Loury, G. C. (1979). Market structure and innovation. Qrtly. J. of Economics XCIII, 395-410.
Macdonald, J. B. and Ransom, M. R. (1979). Functional forms, estimation techniques and the distribution of income. Econometrica 47, 1513-1525.
Mukherjee, V. (1967). Type III distribution and its stochastic evolution in the context of distribution of income, landholdings and other economic variables. Sankhyā A 29, 405-416.
Owen, G. (1982). Game Theory. 2nd edition. Academic Press, New York.
Pechlivanides, P. M. (1975). Social Choice and Coherent Structures. Unpublished Tech. Report # ORC 75-14, Operations Research Center, University of California, Berkeley.
Rae, D. (1979). Decision rules and individual values in constitutional choice. American Political Science Review 63.
Ramamurthy, K. G. and Parthasarathy, T. (1983). A note on factorization of simple games. Opsearch 20(3), 170-174.
Ramamurthy, K. G. and Parthasarathy, T. (1984). Probabilistic implications of the assumption of homogeneity in voting games. Opsearch 21(2), 81-91.
Salem, A. B. Z. and Mount, T. D. (1974). A convenient descriptive model of income distribution. Econometrica 42, 1115-1127.
Schumpeter, J. A. (1961). Theory of Economic Development. Oxford University Press, New York.
Schumpeter, J. A. (1964). Business Cycles. McGraw-Hill, New York.
Schumpeter, J. A. (1975). Capitalism, Socialism and Democracy. Harper and Row, New York.
Seneta, E. (1976). Regularly Varying Functions. Lecture Notes in Math. 508, Springer, New York.
Straffin, P. D., Jr. (1978). Power indices in politics. In: S. J. Brams, W. F. Lucas and P. D. Straffin, Jr., eds., Political Science and Related Models. Springer, New York, 256-321.
Straffin, P. D., Jr. (1978a). Probability models for power indices. In: P. C. Ordeshook, ed., Game Theory and Political Science, University Press, New York.
Taillie, C. (1981). Lorenz ordering within the generalized gamma family of income distributions. In: C. Taillie, P. P. Ganapati and B. A. Baldessari, eds., Statistical Distributions in Scientific Work. Vol. 6. Reidel, Dordrecht/Boston, 181-192.
Taussig, F. W. (1915). Innovation and Money Makers. Macmillan, New York.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 215-224

Mean Residual Life: Theory and Applications*

Frank Guess and Frank Proschan

1. Introduction and summary

The mean residual life (MRL) has been used as far back as the third century A.D. (cf. Deevey (1947) and Chiang (1968)). In the last two decades, however, reliabilists, statisticians, and others have shown intensified interest in the MRL and derived many useful results concerning it. Given that a unit is of age t, the remaining life after time t is random. The expected value of this random residual life is called the mean residual life at time t. Since the MRL is defined for each time t, we also speak of the MRL function. (See Section 2 for a more formal definition.) The MRL function is like the density function, the moment generating function, or the characteristic function: for a distribution with a finite mean, the MRL completely determines the distribution via an inversion formula (e.g., see Cox (1962), Kotz and Shanbhag (1980), and Hall and Wellner (1981)). Hall and Wellner (1981) and Bhattacharjee (1982) derive necessary and sufficient conditions for an arbitrary function to be an MRL function. These authors recommend the use of the MRL as a helpful tool in model building. Not only is the MRL used for parametric modeling but also for nonparametric modeling. Hall and Wellner (1981) discuss parametric uses of the MRL. Large nonparametric classes of life distributions such as decreasing mean residual life (DMRL) and new better than used in expectation (NBUE) have been defined using MRL. Barlow, Marshall and Proschan (1963) note that the DMRL class is a natural one in reliability.
Brown (1983) studies the problem of approximating increasing mean residual life (IMRL) distributions by exponential distributions. He mentions that certain IMRL distributions, '... arise naturally in a class of first passage time distributions for Markov processes, as first illuminated by Keilson'. See Barlow and Proschan (1965) and Hollander and Proschan (1984) for further comments on the nonparametric use of MRL. A fascinating aspect of MRL is its tremendous range of applications. For example, Watson and Wells (1961) use MRL in studying burn-in. Kuo (1984) presents further references on MRL and burn-in in his Appendix 1, as well as a brief history on research in burn-in. Actuaries apply MRL to setting rates and benefits for life insurance. In the biomedical setting researchers analyze survivorship studies by MRL. See Elandt-Johnson and Johnson (1980) and Gross and Clark (1975). Morrison (1978) mentions that IMRL distributions have been found useful as models in the social sciences for the lifelengths of wars and strikes. Bhattacharjee (1982) observes that MRL functions occur naturally in other areas such as optimal disposal of an asset, renewal theory, dynamic programming, and branching processes. In Section 2 we define more formally the MRL function and survey some of the key theory. In Section 3 we discuss further its wide range of applications.

* Research sponsored by the Air Force Office of Scientific Research, AFSC, USAF, under Grant AFOSR 85-C-0007.

2. Theory of mean residual life

Let $F$ be a life distribution (i.e., $F(t) = 0$ for $t < 0$) with a finite first moment. Let $\bar F(t) = 1 - F(t)$. $X$ is the random life with distribution $F$. The mean residual life function is defined as

\[ m(t) = \begin{cases} E[X - t \mid X > t] & \text{for } \bar F(t) > 0, \\ 0 & \text{for } \bar F(t) = 0, \end{cases} \tag{2.1} \]

for $t \ge 0$. Note that we can express

\[ m(t) = \int_0^\infty \frac{\bar F(x + t)}{\bar F(t)}\,dx = \frac{\int_t^\infty \bar F(u)\,du}{\bar F(t)} \]

when $\bar F(t) > 0$. If $F$ also has a density $f$ we can write

\[ m(t) = \int_t^\infty u f(u)\,du \Big/ \bar F(t) - t. \]
Like the failure rate function (recall that it is defined as $r(t) = f(t)/\bar F(t)$ when $\bar F(t) > 0$), the MRL function is a conditional concept. Both functions are conditioned on survival to time $t$. While the failure rate function at $t$ provides information about a small interval after time $t$ ('just after $t$', see p. 10 of Barlow and Proschan (1965)), the MRL function at $t$ considers information about the whole interval after $t$ ('all after $t$'). This intuition explains the difference between the two. Note that it is possible for the MRL function to exist but for the failure rate function not to exist (e.g., consider the standard Cantor ternary function, see Chung (1974), p. 12). On the other hand, it is possible for the failure rate function to exist but the MRL function not to exist (e.g., consider modifying the Cauchy density to yield $f(t) = 2/\{\pi(1 + t^2)\}$ for $t \ge 0$). Both the MRL and the failure rate functions are needed in theory and in practice. When $m$ and $r$ both exist the following relationship holds between the two:

\[ m'(t) = m(t)\,r(t) - 1. \tag{2.2} \]

See Watson and Wells (1961) for further comments on (2.2) and its uses. If the failure rate is a constant ($> 0$) the distribution is an exponential. If the MRL is a constant ($> 0$) the distribution is also an exponential. Let $\mu = E(X)$. If $F(0) = 0$ then $m(0) = \mu$. If $F(0) > 0$ then $m(0) = \mu/\bar F(0) > \mu$. For simplicity in discussions and definitions in this section, we assume $F(0) = 0$. Let $F$ be right continuous (not necessarily continuous). Knowledge of the MRL function completely determines the reliability function as follows:

\[ \bar F(t) = \begin{cases} \dfrac{m(0)}{m(t)} \exp\Bigl\{-\displaystyle\int_0^t \frac{du}{m(u)}\Bigr\} & \text{for } 0 \le t < F^{-1}(1), \\ 0 & \text{for } t \ge F^{-1}(1), \end{cases} \tag{2.3} \]

where $F^{-1}(1) \overset{\text{def}}{=} \sup\{t \mid F(t) < 1\}$. Cox (1962) assigns as an exercise the demonstration that MRL determines the reliability. Meilijson (1972) gives an elegant, simple proof of (2.3).
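The inversion formula (2.3) and the relationship (2.2) can be checked numerically on a simple case. The sketch below is an illustration only: the linear MRL $m(t) = a + bt$ with $a = 2$, $b = 0.5$, the Simpson-rule integration and the function names are assumptions made here, not part of the original. For this $m$, formula (2.3) can also be integrated by hand to give $\bar F(t) = \{a/(a + bt)\}^{1 + 1/b}$, which serves as the comparison value.

```python
import math

def survival_from_mrl(m, t, steps=2000):
    """Inversion formula (2.3): F-bar(t) = m(0)/m(t) * exp(-integral_0^t du/m(u)).

    The integral is approximated with the composite Simpson rule (steps must be
    even); m is assumed positive on [0, t], i.e. t < F^{-1}(1).
    """
    if t == 0:
        return 1.0
    width = t / steps
    total = 1.0 / m(0.0) + 1.0 / m(t)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) / m(k * width)
    integral = total * width / 3.0
    return m(0.0) / m(t) * math.exp(-integral)

def failure_rate_from_mrl(m, t, eps=1e-6):
    """(2.2) rearranged as r(t) = (1 + m'(t)) / m(t), with m' by central difference."""
    derivative = (m(t + eps) - m(t - eps)) / (2.0 * eps)
    return (1.0 + derivative) / m(t)

# Hypothetical linearly increasing MRL m(t) = a + b*t.
a, b = 2.0, 0.5
linear_mrl = lambda t: a + b * t
closed_form = lambda t: (a / (a + b * t)) ** (1.0 + 1.0 / b)

print(survival_from_mrl(linear_mrl, 3.0), closed_form(3.0))
print(failure_rate_from_mrl(linear_mrl, 3.0), (1.0 + b) / (a + b * 3.0))

# A constant MRL gives back the exponential survival function.
print(survival_from_mrl(lambda t: 4.0, 2.0), math.exp(-2.0 / 4.0))
```

The last line reflects the remark in the text that a constant MRL characterizes the exponential distribution.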
Kotz and Shanbhag (1980) derive a generalized inversion formula for distributions that are not necessarily life distributions. Hall and Wellner (1981) have an excellent discussion of (2.3) along with further references. A natural question to ask is: what functions are MRL functions? A characterization is possible which answers this. By a function $f$ being increasing (decreasing) we mean that $x \le y$ implies $f(x) \le (\ge)\ f(y)$.

THEOREM 2.1. Consider the following conditions:
(i) $m : [0, \infty) \to [0, \infty)$.
(ii) $m(0) > 0$.
(iii) $m$ is right continuous (not necessarily continuous).
(iv) $d(t) \overset{\text{def}}{=} m(t) + t$ is increasing on $[0, \infty)$.
(v) When there exists $t_0$ such that $m(t_0^-) \overset{\text{def}}{=} \lim_{t \uparrow t_0} m(t) = 0$, then $m(t) = 0$ holds for $t \in [t_0, \infty)$. Otherwise, when there does not exist such a $t_0$ with $m(t_0^-) = 0$, then $\int_0^\infty 1/m(u)\,du = \infty$ holds.
A function $m$ satisfies (i)-(v) if and only if $m$ is the MRL function of a nondegenerate at 0 life distribution.

See Hall and Wellner (1981) for a proof. See Bhattacharjee (1982) for another characterization. Note that condition (ii) rules out the degenerate at 0 distribution. For (iv) note that $d(t)$ is simply the expected time of death (failure) given that a unit has survived to time $t$. Theorem 2.1 delineates which functions can serve as MRL functions, and hence, provides models for lifelengths. We restate several bounds involving MRL from Hall and Wellner (1981). Recall $a^+ = a$ if $a \ge 0$, otherwise $a^+ = 0$.

THEOREM 2.2. Let $F$ be nondegenerate. Let $\mu_r = E X^r \le \infty$ for $r > 1$.
(i) $m(t) \le (F^{-1}(1) - t)^+$ for all $t$. Equality holds if and only if $F(t) = F((F^{-1}(1))^-)$ or 1.
(ii) $m(t) \le (\mu/\bar F(t)) - t$ for all $t$. Equality holds if and only if $F(t) = 0$.
(iii) $m(t) < (\mu_r/\bar F(t))^{1/r} - t$ for all $t$.
(iv) $m(t) \ge (\mu - t)^+/\bar F(t)$ for $t < F^{-1}(1)$. Equality holds if and only if $F(t) = 0$.
(v) $m(t) > [\mu - F(t)(\mu_r/F(t))^{1/r}]/\bar F(t) - t$ for $t < F^{-1}(1)$.
(vi) $m(t) \ge (\mu - t)^+$ for all $t$.
Equality holds if and only if $F(t) = 0$ or 1.

Various nonparametric classes of life distributions have been defined using MRL. (Recall, for simplicity we assume $F(0) = 0$ and the mean is finite for these definitions.)

DEFINITION 2.3. DMRL. A life distribution $F$ has decreasing mean residual life if its MRL $m$ is a decreasing function.

DEFINITION 2.4. NBUE. A life distribution $F$ is new better than used in expectation if $m(0) \ge m(t)$ for all $t \ge 0$.

DEFINITION 2.5. IDMRL. A life distribution $F$ has increasing then decreasing mean residual life if there exists $\tau \ge 0$ such that $m$ is increasing on $[0, \tau)$ and decreasing on $[\tau, \infty)$.

Each of the classes above has an obvious dual class associated with it, i.e., increasing mean residual life (IMRL), new worse than used in expectation (NWUE), and decreasing then increasing mean residual life (DIMRL), respectively. The DMRL class models aging that is adverse (e.g., wearing occurs). Barlow, Marshall and Proschan (1963) note that the DMRL class is a natural one in reliability. See also Barlow and Proschan (1965). The older a DMRL unit is, the shorter is the remaining life on the average. Chen, Hollander and Langberg (1983) contains an excellent discussion of the uses of the DMRL class. Burn-in procedures are needed for units with IMRL. E.g., integrated circuits have been observed empirically to have decreasing failure rates; and thus they satisfy the less restrictive condition of IMRL. Investigating job mobility, social scientists refer to IMRL as inertia. See Morrison (1978) for example. Brown (1983) studies approximating IMRL distributions by exponentials. He comments that certain IMRL distributions, '... arise naturally in a class of first passage time distributions for Markov processes, as first illuminated by Keilson'. Note that DMRL implies NBUE. The NBUE class is a broader and less restrictive class.
Hall and Wellner (1981) show for NBUE distributions that the coefficient of variation $\sigma/\mu \le 1$, where $\sigma^2 = \mathrm{Var}(X)$. They also comment on the use of NBUE in renewal theory. Bhattacharjee (1984b) discusses a new notion, age-smoothness, and its relation to NBUE for choosing life distribution models for equipment subject to eventual wear. Note that burn-in is appropriate for NWUE units. For relationships of DMRL, IMRL, NBUE, and NWUE with other classes used in reliability see the survey paper Hollander and Proschan (1984). The IDMRL class models aging that is initially beneficial, then adverse. Situations where it is reasonable to postulate an IDMRL model include:
(i) Length of time employees stay with certain companies: An employee with a company for four years has more time and career invested in the company than an employee of only two months. The MRL of the four-year employee is likely to be longer than the MRL of the two-month employee. After this initial IMRL (this is called 'inertia' by social scientists), the processes of aging and retirement yield a DMRL period.
(ii) Life lengths of humans: High infant mortality explains the initial IMRL. Deterioration and aging explain the later DMRL stage.
See Guess (1984) and Guess, Hollander, and Proschan (1983) for further examples and discussion. Bhattacharjee (1983) comments that Gertsbakh and Kordonskiy (1969) graph the MRL function of a lognormal distribution that has a 'bath-tub' shaped MRL (i.e., DIMRL). Hall and Wellner (1981) characterize distributions with MRL's that have linear segments. They use this characterization as a tool for choosing parametric models. Morrison (1978) investigates linearly IMRL. He states and proves that if $F$ is a mixture of exponentials then $F$ has linearly IMRL if and only if the mixing distribution, say $G$, is a gamma. Howell (1984) studies and lists other references on linearly DMRL. In renewal theory MRL arises naturally also.
For a renewal process with underlying distribution $F$, let $G(t) = \bigl(\int_0^t \bar F(u)\,du\bigr)/\mu$. $G$ is the limiting distribution of both the forward and the backward recurrence times. See Cox (1962) for more details. Also if the renewal process is in equilibrium then $G$ is the exact distribution of the recurrence times. $\bar G(t) = (m(t)\bar F(t))/\mu$. The failure rate of $G$, $r_G$, is inversely related to the MRL of $F$, $m_F$. I.e., $r_G(t) = 1/m_F(t)$. Note, however, that $r_F(t) \ne 1/m_F(t)$ is usually the case. See Hall and Wellner (1981), Rolski (1975), Meilijson (1972), and Watson and Wells (1961) for related discussions. Kotz and Shanbhag (1980) establish a stability result concerning convergence of an arbitrary sequence of MRL functions to a limiting MRL function. (See also Bhattacharjee (1982).) They show an analogous stability result for hazard measures. (When the failure rate for $F$ exists and $\nu_F$ is $F$'s hazard measure, then $\nu_F(B) = \int_B r_F(t)\,dt$ for $B$ a Borel set.) Their results imply that MRL functions can provide more stable and reliable information than hazard measures when assessing noncontinuous distributions from data. In a multivariate setting, Lee (1985) shows the effect of dependence by total positivity on MRL functions.

3. Applications of mean residual life

A mean is easy to calculate and explain to a person not necessarily skilled in statistics. To calculate the empirical MRL function, one does not need calculus. Details of computing the empirical MRL follow. Let $X_1, X_2, \ldots, X_n$ be a random sample from $F$. For simpler initial notation, we assume first no ties. Later we allow for ties. Order the observations as

\[ X_{1n} < X_{2n} < \cdots < X_{nn}. \tag{3.1} \]

Let $X_{0n} = 0$. The empirical MRL function is defined as

\[ m_n(t) = \frac{\sum_{i=k+1}^{n} (X_{in} - t)}{n - k} \quad \text{for } t \in [X_{kn}, X_{(k+1)n}), \tag{3.2} \]

and $k = 0, 1, \ldots, n - 1$. $m_n(t) = 0$ for $t \ge X_{nn}$.
Note that (3.2) is simply

\[ m_n(t) = \frac{\text{Total time on test observed after } t}{\text{Number of units observed after } t}. \tag{3.3} \]

The empirical MRL function at 0, $m_n(0) = \bar X_n \overset{\text{def}}{=} \bigl(\sum_{i=1}^{n} X_i\bigr)/n$, is just the usual sample mean when no unit fails at time 0. If a unit fails at 0 then $m_n(0) > \bar X_n$. If ties exist let

\[ 0 = \tilde X_{0l} < \tilde X_{1l} < \tilde X_{2l} < \cdots < \tilde X_{ll} \tag{3.4} \]

be the distinct ordered times of failure,

\[ n_i = \text{number of observed failures at time } \tilde X_{il}, \qquad s_i = n - \sum_{j=0}^{i} n_j \tag{3.5} \]

for $i = 0, 1, \ldots, l \le n$. Note that $n_i > 0$, $i = 1, \ldots, l$, while $n_0 = 0$ is allowed.

\[ m_n(t) = \begin{cases} \dfrac{\sum_{i=k+1}^{l} n_i (\tilde X_{il} - t)}{s_k} & \text{for } t \in [\tilde X_{kl}, \tilde X_{(k+1)l}), \\ 0 & \text{for } t \ge \tilde X_{ll}, \end{cases} \tag{3.6} \]

for $k = 0, 1, \ldots, l - 1$. Note that (3.6) is simply notation for (3.3). We illustrate in the following example.

EXAMPLE 3.1. Bjerkedal (1960) studies the lifelengths of guinea pigs injected with different amounts of tubercle bacilli. Guinea pigs are known to have a high susceptibility to human tuberculosis, which is one reason for choosing this species. We describe the only study (M) in which animals in a single cage are under the same regimen. The regimen number is the common log of the number of bacillary units in 0.5 ml of the challenge solution, e.g., regimen 4.3 corresponds to $2.2 \times 10^4$ bacillary units per 0.5 ml ($\log_{10}(2.2 \times 10^4) = 4.342$). Table 3.1 presents the data from regimen 5.5 and the empirical MRL.

Table 3.1
Empirical mean residual life in days at the unique times of death for the 72 guinea pigs under regimen 5.5. We include the empirical MRL at time 0 also.

    Number    Time of    Empirical     |   Number    Time of    Empirical
    of ties   death      MRL           |   of ties   death      MRL
    n_i       X~_il      m_n(X~_il)    |   n_i       X~_il      m_n(X~_il)
    ---------------------------------------------------------------------
    0           0        141.85        |   1         123        114.92
    1          43        100.24        |   1         126        116.40
    1          45         99.64        |   1         128        119.17
    1          53         92.97        |   1         137        114.96
    2          56         92.66        |   1         138        119.14
    1          57         93.05        |   1         139        123.76
    1          58         93.46        |   1         144        124.70
    1          66         86.80        |   1         145        130.21
    1          67         87.16        |   1         147        135.33
    1          73         82.47        |   1         156        133.76
    1          74         82.80        |   1         162        135.75
    1          79         79.10        |   1         174        132.00
    2          80         80.79        |   1         178        137.14
    3          81         84.15        |   1         179        146.62
    1          82         84.69        |   1         184        153.42
    2          83         86.90        |   1         191        159.73
    1          84         87.59        |   1         198        168.00
    1          88         85.26        |   1         211        172.22
    1          89         85.98        |   1         214        190.38
    2          91         87.55        |   1         243        184.43
    2          92         90.40        |   1         249        208.17
    1          97         87.34        |   1         329        153.80
    2          99         89.40        |   1         380        128.50
    2         100         92.83        |   1         403        140.67
    1         101         94.18        |   1         511         49.00
    3         102        100.94        |   1         522         76.00
    1         103        102.80        |   1         598          0.00
    1         104        104.79        |
    1         107        104.88        |
    1         108        107.13        |
    1         109        109.55        |
    1         113        109.07        |
    1         114        111.79        |
    1         118        111.64        |
    1         121        112.67        |

Graphs of MRL provide useful information not only for data analysis but also for presentations. Commenting on fatigue longevity and on preventive maintenance, Gertsbakh and Kordonskiy (1969) recommend the MRL function as another helpful tool in such analyses. They graph the MRL for different distributions (e.g., Weibull, lognormal, and gamma). Hall and Wellner (1979) graph the empirical MRL for Bjerkedal's (1960) regimen 4.3 and regimen 6.6 data. Bryson and Siddiqui (1969) illustrate the graphical use of the empirical MRL on survival data from chronic granulocytic leukemia patients. Using the standard Kaplan-Meier estimator (e.g., see Lawless (1982), Nelson (1982), or Miller (1980)), Chen, Hollander, and Langberg (1983) graph the empirical MRL analogue for censored lifetime data. Gertsbakh and Kordonskiy (1969) note that estimation of MRL is more stable than estimation of the failure rate.
Statistical properties of estimated means are better than those of estimated derivatives (which enter into failure rates). Yang (1978) shows that the empirical MRL is uniformly strongly consistent. She establishes that $m_n$, suitably standardized, converges weakly to a Gaussian process. Hall and Wellner (1979) require less restrictive conditions to apply these results. They derive and illustrate the use of simultaneous confidence bands for $m$. Yang (1978) comments that for $t > 0$, $m_n(t)$ is a slightly biased estimator. Specifically, $E(m_n(t)) = m(t)(1 - F^n(t))$. Note, however, that $\lim_{n \to \infty} E(m_n(t)) = m(t)$. Thus, for larger samples $m_n(t)$ is practically unbiased. See also Gertsbakh and Kordonskiy (1969). Yang (1977) studies estimation of the MRL function when the data are randomly censored.

For parametric modeling Hall and Wellner (1981) use the empirical MRL plot. They observe that the empirical MRL function is a helpful addition to other life data techniques, such as total time on test plots, empirical (cumulative) failure rate functions, etc. The MRL plot detects certain aspects of the distribution more readily than other techniques. See Hall and Wellner (1981), Hall and Wellner (1979), and Gertsbakh and Kordonskiy (1969) for further comments.

When a parametric approach seems inadvisable, the MRL function can still be used as a nonparametric tool. Broad classes defined in terms of MRL allow a more flexible approach while still incorporating preliminary information. For example, to describe a wear process, a DMRL is appropriate. When newly developed components are initially produced, many may fail early (such early failure is called infant mortality and this early stage is called the debugging stage). Another subgroup tends to last longer. Depending on information about this latter subgroup, we suggest IMRL (e.g., lifelengths of integrated circuits) or IDMRL (e.g., more complicated systems where there are infant mortality, useful life, and wear out stages).
Objective tests exist for these and other classes defined in terms of MRL. E.g., see Hollander and Proschan (1984) and Guess, Hollander and Proschan (1983). To describe 'burn-in' the MRL is a natural function to use. Kuo's (1984) Appendix 1 presents an excellent brief introduction to burn-in problems and applications of MRL.

Actuaries apply MRL to setting rates and benefits for life insurance. In the biomedical setting researchers analyze survivorship studies by MRL. For example, see Elandt-Johnson and Johnson (1980) and Gross and Clark (1975).

Social scientists use IMRL for studies on job mobility, length of wars, duration of strikes, etc. See Morrison (1978). In economics MRL arises also. Bhattacharjee and Krishnaji (1981) present applications of MRL for investigating landholding. Bhattacharjee (1984a) uses NBUE for developing optimal inventory policies for perishable items with random shelf life and variable supply.

Bhattacharjee (1982) observes MRL functions occur naturally in other areas such as optimal disposal of an asset, renewal theory, dynamic programming, and branching processes.

Acknowledgements

We thank Dr. J. Travis, Department of Biological Sciences, and Dr. D. Meeter, Department of Statistics, Florida State University, for the Deevey (1947) reference. We are also grateful to Dr. M. Bhattacharjee, Indian Institute of Management, Calcutta, and to Dr. M. Hollander, Department of Statistics, Florida State University for discussions on MRL.

References

Barlow, R. E., Marshall, A. W. and Proschan, F. (1963). Properties of probability distributions with monotone hazard rate. Ann. Math. Statist. 34, 375-389.
Barlow, R. E. and Proschan, F. (1965). Mathematical Theory of Reliability. Wiley, New York.
Barlow, R. E. and Proschan, F. (1981). Statistical Theory of Reliability and Life Testing. To Begin With, Silver Springs, MD.
Bhattacharjee, M. C. (1984a).
Ordering policies for perishable items with unknown shelf life/variable supply distribution. Indian Institute of Management, Calcutta, Technical Report.
Bhattacharjee, M. C. (1984b). Tail behavior of age-smooth failure distributions and applications. Indian Institute of Management, Calcutta, Technical Report.
Bhattacharjee, M. C. (1983). Personal communication.
Bhattacharjee, M. C. (1982). The class of mean residual lives and some consequences. SIAM J. Algebraic Discrete Methods 3, 56-65.
Bhattacharjee, M. C. and Krishnaji, N. (1981). DFR and other heavy tail properties in modelling the distribution of land and some alternative measures of inequality. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference.
Bjerkedal, T. (1960). Acquisition of resistance in guinea pigs infected with different doses of virulent tubercle bacilli. Amer. J. Hygiene 72, 130-148.
Brown, M. (1983). Approximating IMRL distributions by exponential distributions, with applications to first passage times. Ann. Probab. 11, 419-427.
Bryson, M. C. and Siddiqui, M. M. (1969). Some criteria for aging. J. Amer. Statist. Assoc. 64, 1472-1483.
Chen, Y. Y., Hollander, M. and Langberg, N. A. (1983). Tests for monotone mean residual life, using randomly censored data. Biometrics 39, 119-127.
Chiang, C. L. (1968). Introduction to Stochastic Processes in Biostatistics. Wiley, New York.
Chung, K. L. (1974). A Course in Probability Theory, 2nd ed. Academic Press, New York.
Cox, D. R. (1962). Renewal Theory. Methuen, London.
Deevey, E. S. (1947). Life tables for natural populations of animals. Quarterly Review of Biology 22, 283-314.
Elandt-Johnson, R. C. and Johnson, N. L. (1980). Survival Models and Data Analysis. Wiley, New York.
Gertsbakh, I. B. and Kordonskiy, K. B. (1969). Models of Failure. Springer, New York.
Gross, A. J. and Clark, V. A. (1975). Survival Distributions: Reliability Applications in the Biomedical Sciences.
Wiley, New York.
Guess, F. (1984). Testing whether mean residual life changes trend. Ph.D. dissertation, Department of Statistics, Florida State University.
Guess, F., Hollander, M. and Proschan, F. (1983). Testing whether mean residual life changes trend. Florida State University Department of Statistics Report M665. (Air Force Office of Scientific Research Report 83-160).
Hall, W. J. and Wellner, J. A. (1979). Estimation of mean residual life. University of Rochester Department of Statistics Technical Report.
Hall, W. J. and Wellner, J. A. (1981). Mean residual life. In: M. Csörgő, D. A. Dawson, J. N. K. Rao and A. K. Md. E. Saleh, eds., Statistics and Related Topics, North-Holland, Amsterdam, 169-184.
Hollander, M. and Proschan, F. (1984). Nonparametric concepts and methods in reliability. In: P. R. Krishnaiah and P. K. Sen, eds., Handbook of Statistics, Vol. 4, Nonparametric Methods, North-Holland, Amsterdam.
Howell, I. P. S. (1984). Small sample studies for linear decreasing mean residual life. In: M. S. Abdel-Hameed, J. Quinn and E. Çinlar, eds., Reliability Theory and Models, Academic Press, New York.
Keilson, J. (1979). Markov Chain Models: Rarity and Exponentiality. Springer, New York.
Kotz, S. and Shanbhag, D. N. (1980). Some new approaches to probability distributions. Adv. in Appl. Probab. 12, 903-921.
Kuo, W. (1984). Reliability enhancement through optimal burn-in. IEEE Trans. Reliability 33, 145-156.
Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. Wiley, New York.
Lee, M. T. (1985). Dependence by total positivity. Ann. Probab. 13, 572-582.
Meilijson, I. (1972). Limiting properties for the mean residual lifetime function. Ann. Statist. 1, 354-357.
Miller, R. G. (1981). Survival Analysis. Wiley, New York.
Morrison, D. G. (1978). On linearly increasing mean residual lifetimes. J. Appl. Probab. 15, 617-620.
Nelson, W. (1982). Applied Life Data Analysis. Wiley, New York.
Rolski, T. (1975). Mean residual life.
Bulletin of the International Statistical Institute, Book 4 (Proceedings of the 40th Session), 266-270.
Swartz, G. B. (1973). The mean residual lifetime function. IEEE Trans. Reliability 22, 108-109.
Watson, G. S. and Wells, W. T. (1961). On the possibility of improving the mean useful life of items by eliminating those with short lives. Technometrics 3, 281-298.
Yang, G. L. (1978). Estimation of a biometric function. Ann. Statist. 6, 112-116.
Yang, G. (1977). Life expectancy under random censorship. Stochastic Process. Appl. 6, 33-39.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 225-249

Life Distribution Models and Incomplete Data*

Richard E. Barlow and Frank Proschan

0. Introduction

In this paper our objective is to introduce life distribution models and to discuss methods useful for analyzing failure data, especially incomplete data. We show how to express the likelihood functions for general distributions and incomplete data. The likelihood function tends to be fairly flat for incomplete data. For this reason the maximum likelihood estimator may be of limited value. It is therefore especially important in this situation to assess a prior distribution for parameters and plot the posterior distribution or its contours.

Inference based on the exponential model is discussed for general sampling plans. Parameter estimators and credibility intervals are derived for special cases.

The Weibull distribution is a very useful model for life distribution studies and also for the analysis of strength data. For these reasons, we describe failure mechanisms leading to a Weibull life distribution model. Contour plotting methods for analyzing life data based on a Weibull distribution are also given.

1. Likelihood

In this section we present a unified way of analyzing incomplete data for a large number of failure distribution models.
We often assume that the failure distribution $F$ is absolutely continuous with density $f$ and failure rate
$$r(x) = \frac{f(x)}{\bar F(x)}, \tag{1.1}$$
where $\bar F(x) = 1 - F(x)$.

* This research was supported by the Air Force Office of Scientific Research (AFSC), USAF, under Grant AFOSR-77-3179 with the University of California. Reproduction in whole or in part is permitted for any purpose of the United States Government.

We call
$$R(x) = \int_0^x r(u)\,du \tag{1.2}$$
the hazard function associated with $F$. For general $F$, define
$$R(x) = -\ln \bar F(x) \tag{1.3}$$
so that $\bar F(x) = \exp[-R(x)]$. Note that when $F$ has a density $f$,
$$\frac{d}{dx}\left[-\ln \bar F(x)\right] = \frac{f(x)}{\bar F(x)} = r(x),$$
so that (1.2) and (1.3) agree in this case. From (1.1) and (1.3) we see that
$$f(x) = r(x)\,e^{-R(x)}. \tag{1.4}$$
For a discussion of these fundamental concepts, their inter-relationships and illustrations in the case of well known distributions, see Barlow and Proschan (1975), Chapter 3.

Suppose now we observe $n$ independent lifetimes $x_1, x_2, \ldots, x_n$ corresponding to a given failure rate function, $r$. The joint density is
$$\prod_{i=1}^{n} f(x_i) = \left[\prod_{i=1}^{n} r(x_i)\right] \exp\left[-\sum_{i=1}^{n} R(x_i)\right]. \tag{1.5}$$
The likelihood as a function of the failure rate function for data $D = (x_1, x_2, \ldots, x_n)$ is then
$$L(r(u), u \ge 0 \mid D) = \left[\prod_{i=1}^{n} r(x_i)\right] \exp\left[-\sum_{i=1}^{n} R(x_i)\right]. \tag{1.6}$$

EXAMPLE 1.1. The time-transformed exponential model. Suppose the survival function is of the form
$$\bar F(x \mid \lambda) = e^{-\lambda R_0(x)} \tag{1.7}$$
where it is assumed that $R_0$ is known and differentiable but $\lambda$ is unknown. By (1.2) we may write $\lambda R_0(x) = \int_0^x \lambda r_0(u)\,du$. It follows that the hazard function and the failure rate function are assumed known up to the parameter $\lambda$. Another way to view the model is to consider time $x$ to be transformed by the function $R_0(\cdot)$. For this reason (1.7) is called the time-transformed exponential model. Let $x_1, x_2, \ldots, x_n$ be $n$ independent observations given $\lambda$ from this model.
The likelihood is
$$L(\lambda \mid D) = \lambda^n \left[\prod_{i=1}^{n} r_0(x_i)\right] \exp\left[-\lambda \sum_{i=1}^{n} R_0(x_i)\right]. \tag{1.8}$$
We conclude that $\sum_{i=1}^{n} R_0(x_i)$ and $n$ are jointly sufficient for $\lambda$. If we use the gamma prior for $\lambda$,
$$\pi(\lambda) = \frac{b^a \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)},$$
we obtain as the posterior density for $\lambda$:
$$\pi(\lambda \mid D) = \frac{\left[b + \sum_{i=1}^{n} R_0(x_i)\right]^{a+n} \lambda^{a+n-1} \exp\left\{-\lambda\left[b + \sum_{i=1}^{n} R_0(x_i)\right]\right\}}{\Gamma(a+n)}. \tag{1.9}$$
Inference proceeds exactly as for the exponential model, except that observation $x_i$ of the exponential model is replaced by its time-transformed value $R_0(x_i)$. This is valid assuming only that $R_0(\cdot)$ is continuous.

1.1. The general sampling plan

In many practical life testing situations, the lifetime data collected are incomplete. This may be due to the sampling plan itself or due to the unplanned withdrawal of test units during the test. (For example, in a medical experiment, one or more of the subjects may leave town, or suffer an accident, etc.)

We now describe one type of sampling plan. Suppose unit $i$ having lifetime distribution $F$ is observed over an interval of time starting at age 0 and ending at a random or nonrandom age. Termination of observation occurs in either one of the following two ways:
(1) The $i$th unit is withdrawn or lost from observation at age $l_i \ge 0$; $l_i$ may be random or nonrandom.
(2) The $i$th unit fails at age $X_i$, where $X_i$ is a random variable.
In addition, we require a technical assumption regarding the 'stopping rule'; i.e., a prescription for determining when to stop observation:
(3) Suppose unit lifetime, $X$, depends on an unknown parameter (or parameters) $\theta$. Observation on a unit may stop before unit lifetime is observed. Let STOP be a rule or set of instructions which determines when observation of a unit stops. STOP is noninformative relative to $\theta$, that is, STOP provides no additional information about $\theta$ other than that contained in the data.

It is important to remark that the 'stopping rule' is not necessarily the same as the 'stopping time'.
To understand assumption (3), consider the sampling plan: put $n$ items on life test and stop testing at the $k$th observed failure. In this case, the stopping rule depends only on $k$ and is clearly independent of life distribution parameters since $k$ is fixed in advance of testing. Suppose we stop testing at time $t_0$. Since $t_0$ is fixed in advance of testing, the stopping rule is again independent of life distribution parameters. For these sampling plans, the likelihood, up to a constant of proportionality, depends only on the life distribution model and the observed data. This proportionality constant depends on the stopping rule, but not on the unknown parameter.

1.2. Examples of informative stopping rules

Records are routinely kept on failures (partial or otherwise) and maintenance actions on critical units such as airplane engines. Should a relatively new type of unit start exhibiting problems earlier than anticipated, this may trigger early withdrawal of units. If this happens, the stopping rule, which is contingent on performance, may also be informative relative to life distribution parameters. This fact needs to be considered when calculating the likelihood and analyzing the data.

The second example illustrates another case where assumption (3) is violated. Suppose lifetime $X$ is exponential with failure rate $\lambda$ and the random withdrawal time, $W$, is also exponential with parameter $\phi$. We observe the minimum of $X$ and $W$. Furthermore, suppose that $X$ given $\lambda$ and $W$ given $\phi$ are judged independent. Then the likelihood given an observed failure at $x$ is
$$L(\lambda, \phi \mid x) = \lambda e^{-\lambda x} e^{-\phi x}.$$
If $\lambda$ and $\phi$ are judged a priori independent then the posterior density of $\lambda$ is
$$\pi(\lambda \mid x) \propto \lambda e^{-\lambda x} \pi(\lambda)$$
where $\pi$ is the prior density for $\lambda$. However, if $\lambda$ and $\phi$ are judged dependent with joint prior $\pi(\lambda, \phi)$, then the posterior density is
$$\pi(\lambda \mid x) \propto \lambda e^{-\lambda x} \int_0^\infty e^{-\phi x} \pi(\lambda, \phi)\,d\phi.$$
The factor $\int_0^\infty e^{-\phi x} \pi(\lambda, \phi)\,d\phi$, contributed by the stopping rule, depends on $\lambda$.
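The effect of a dependent prior in the second example can be checked numerically. The sketch below is a toy illustration under stated assumptions: the continuous joint prior is replaced by a discrete prior on a two-by-two grid of hypothetical $(\lambda, \phi)$ values, and the function names are invented for the example. With an independent prior the withdrawal factor cancels on normalization; with a dependent prior it does not.

```python
import math


def posterior_of_lambda(joint_prior, x):
    """Posterior of lambda after one observed failure at age x.

    joint_prior maps (lam, phi) pairs to prior probabilities; the likelihood
    of a failure at x is lam * exp(-lam * x) * exp(-phi * x).
    """
    weights = {}
    for (lam, phi), prob in joint_prior.items():
        likelihood = lam * math.exp(-lam * x) * math.exp(-phi * x)
        weights[lam] = weights.get(lam, 0.0) + likelihood * prob
    total = sum(weights.values())
    return {lam: w / total for lam, w in weights.items()}


def posterior_ignoring_withdrawal(prior_lambda, x):
    """Posterior proportional to lam * exp(-lam * x) * prior(lam)."""
    weights = {lam: lam * math.exp(-lam * x) * p for lam, p in prior_lambda.items()}
    total = sum(weights.values())
    return {lam: w / total for lam, w in weights.items()}


# Hypothetical grid values and prior weights, chosen only for illustration.
independent = {(0.5, 0.1): 0.25, (0.5, 1.0): 0.25, (2.0, 0.1): 0.25, (2.0, 1.0): 0.25}
dependent = {(0.5, 0.1): 0.40, (0.5, 1.0): 0.10, (2.0, 0.1): 0.10, (2.0, 1.0): 0.40}
marginal = {0.5: 0.5, 2.0: 0.5}  # both joint priors share this marginal for lambda

x = 1.0
print(posterior_ignoring_withdrawal(marginal, x))
print(posterior_of_lambda(independent, x))  # agrees with the line above
print(posterior_of_lambda(dependent, x))    # differs: the stopping rule is informative
```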
There is an important case not covered by the General Sampling Plan, namely when it is known that a unit has failed within some time interval but the exact time of failure is unknown.

The following simple example illustrates the way in which incomplete data can arise.

EXAMPLE 1.2. Operating data are collected on an airplane part for a fleet of airplanes. A typical age history for several engines is shown in Figure 1.1. The crosses indicate the observed ages at failure. Ordered withdrawal times (nonfailure times) are indicated by short vertical lines. In our example, units 2 and 4 fail at respective times $x_{(1)}$ and $x_{(2)}$ while observation on units 1 and 3 is terminated without failure at times $l_{(2)}$ and $l_{(1)}$ respectively.

[Fig. 1.1. Age of airplane part at failure or withdrawal. Unit number plotted against age $u$, with marks at $x_{(1)}$, $l_{(1)}$, $x_{(2)}$, $l_{(2)}$.]

It is important to note that all data are plotted against the age axis. Figure 1.2 illustrates how events may have occurred in calendar time. For example, units 1 and 3 had not failed at the end of the calendar record.

[Fig. 1.2. Calendar records for airplane parts. Units 1 to 4 shown between the start and the end of the calendar record.]

1.3. Total time on test

The total time on test is an important statistic for the exponential model.

DEFINITION 1.3. The total time on test $T$ is the total of the periods of observation of all the units undergoing test. Excluded from this statistic are any periods following death or withdrawal or preceding observation. Specifically, the periods being totalled include only those in which a death or a withdrawal of a unit under observation can be observed.

Let $n(u)$ be the number of units observed to be operating at age $u$. The observed function $n(u)$, $u \ge 0$, for Example 1.2 is displayed in Figure 1.3.

[Fig. 1.3. Number of units in operation $n(u)$ as a function of age $u$, with steps at $x_{(1)}$, $l_{(1)}$, $x_{(2)}$, $l_{(2)}$.]
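Definition 1.3 and the step function $n(u)$ can be made concrete with a short sketch. It assumes, as in Example 1.2, that every unit enters observation at age 0; the function names and the numerical ages are hypothetical and serve only to illustrate the calculation. The total time on test up to age $t$ is computed in two ways: by capping each unit's exit age at $t$, and by integrating the step function $n(u)$ over $[0, t]$.

```python
def number_on_test(exit_ages, u):
    """n(u): number of units still under observation at age u."""
    return sum(1 for age in exit_ages if age > u)


def total_time_on_test(exit_ages, t):
    """Each unit contributes its age at failure or withdrawal, capped at t."""
    return sum(min(age, t) for age in exit_ages)


def total_time_by_integration(exit_ages, t):
    """Integrate the step function n(u) over [0, t], one flat piece at a time."""
    points = sorted({0.0, t, *[age for age in exit_ages if age < t]})
    total = 0.0
    for left, right in zip(points, points[1:]):
        total += number_on_test(exit_ages, left) * (right - left)
    return total


# Hypothetical ages for the four units of Example 1.2:
# failures at x(1) = 1 and x(2) = 3, withdrawals at l(1) = 2 and l(2) = 5.
exit_ages = [1.0, 2.0, 3.0, 5.0]
t = 4.0  # an age between x(2) and l(2)

print(total_time_on_test(exit_ages, t))
print(total_time_by_integration(exit_ages, t))
```

Both routes give the same total for any set of exit ages, which is the content of representing the total time on test as an integral of $n(u)$.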
From Figure 1.3 we may readily calculate the total time on test $T(t)$ corresponding to any $t$, $0 \le t \le l_{(2)}$:
$$T(t) = \int_0^t n(u)\,du. \tag{1.10}$$
For example, for $t$ such that $x_{(2)} < t < l_{(2)}$, we obtain from Figure 1.3:
$$T(t) = \int_0^t n(u)\,du = 4x_{(1)} + 3(l_{(1)} - x_{(1)}) + 2(x_{(2)} - l_{(1)}) + (t - x_{(2)}).$$
After simplifying algebraically, we obtain
$$T(t) = x_{(1)} + l_{(1)} + x_{(2)} + t. \tag{1.11}$$
Note that the resulting expression, given in (1.11), can be obtained directly, since $x_{(1)}$ and $x_{(2)}$ represent the observed lifetimes of the 2 units that are observed to fail, $l_{(1)}$ represents the observed age of withdrawal of the unit first withdrawn from observation, and finally $t$ represents the age of the second unit at the instant $t$ specified.

Although in this small example the directly calculated expression (1.11) for total time on test is simpler, Equation (1.10) is an important identity, since it yields the total time on test accumulated by age $t$ in terms of the (varying) number of units on test at each instant during the interval $[0, t]$ for any data set in which the ages at death or withdrawal are observed. Thus it is a general formula applicable in a great variety of problems in which data may be incomplete.

Although $n(u)$ is a step function, the integral representation in (1.10) is advantageous, since it is compact, mathematically tractable, and applicable in a great variety of incomplete data situations. Of course, $\int_0^\infty n(u)\,du < \infty$ in practical problems since observation ultimately ceases in order to analyze the data in hand.

1.4. The likelihood function for incomplete data

All recorded data are necessarily discrete. Likewise real world life distribution models should also be discrete. Continuous life distribution models are convenient approximations to real world life distributions. However, it is most convenient to define initially the likelihood concept in the context of discrete models.
For our purposes, we find it preferable to define the likelihood concept for the General Sampling Plan in the context of a discrete model. Computation of the likelihood function is an intermediate step between specification of the prior distribution on the space $\Theta$ and computation of the posterior distribution on $\Theta$ given observed data $D$.

Suppose temporarily that the life distribution is discrete, i.e., failures can occur only at times $1, 2, \ldots$; similarly, withdrawals can occur only at these time points. Suppose that the probability of failure of a given unit at $x$ is $p(x \mid \theta)$. Suppose $k$ failures are observed at times $x_s$, $s = 1, \ldots, k$, and $m$ withdrawals are observed at times $l_t$, $t = 1, \ldots, m$. Failure and withdrawal times need not be distinct. All observations are assumed statistically independent, given parameters. Withdrawal times are produced by a stopping rule which is noninformative concerning $\theta$. For example, the stopping rule might specify that we observe a unit until failure or until withdrawal, whichever comes first, where withdrawal time is specified in advance. For this model, the probability of the observed outcome is
$$p(D \mid \theta) = \prod_{s=1}^{k} p(x_s \mid \theta) \prod_{t=1}^{m} \bar P(l_t \mid \theta), \tag{1.12}$$
where $\bar P(u_j \mid \theta) \stackrel{\text{def}}{=} \sum_{i=1}^{\infty} p(u_{j+i} \mid \theta)$ represents the probability that a specified unit fails at age $u_{j+1}$ or later, given the parameter is $\theta$. Note that the first product corresponds to the $k$ failures at respective ages $x_1, \ldots, x_k$, while the second product corresponds to the $m$ withdrawals at respective ages $l_1, \ldots, l_m$.

Another way to model withdrawal is to suppose there exists a random withdrawal age $W$ such that $P[W = t] = q(t)$, $t = 1, 2, \ldots$, with $W$ independent of unit lifetimes and of $\theta$. Under this model, we suppose that we observe
$$\min(X, W) = \begin{cases} X & \text{if } X \le W, \\ W & \text{if } X > W. \end{cases}$$
Now for observed data $D = \{x_1, \ldots, x_k, l_1, \ldots, l_m\}$, the probability of the observed outcome given parameter $\theta$ is
$$p(D \mid \theta) = \prod_{t=1}^{m} q(l_t) \prod_{s=1}^{k} \bar Q(x_s) \prod_{s=1}^{k} p(x_s \mid \theta) \prod_{t=1}^{m} \bar P(l_t \mid \theta), \tag{1.13}$$
where $\bar Q(u_j) \stackrel{\text{def}}{=} \sum_{i=1}^{\infty} q(u_{j+i})$ represents the probability that $W > u_j$. Note that (1.12) and (1.13) differ only by a factor that does not depend on $\theta$. Thus, relative to calculating the MLE of $\theta$, the two models for withdrawals (withdrawal deterministic or withdrawal random) do not differ essentially.

There are many practical testing situations in which withdrawals occur as a result of chance mechanisms unrelated to the parameter $\theta$ of the lifetime distribution. For example, concluding the collection of data at a specified chronological time has the effect of withdrawing from observation those units still alive at that point in time. In Figure 1.2, this phenomenon is illustrated by units 1 and 3. Other chance mechanisms causing withdrawal at a random age result from human errors and accidents. The net effect of the various stopping rules that are unrelated to the value of the parameter $\theta$ is summarized in the factor $g(\mathbf{x}, \mathbf{l})$ in the expression for the probability of the observed outcome:
$$p(D \mid \theta) = g(\mathbf{x}, \mathbf{l}) \prod_{s=1}^{k} p(x_s \mid \theta) \prod_{t=1}^{m} \bar P(l_t \mid \theta). \tag{1.14}$$

DEFINITION 1.4. The likelihood, $L(\theta \mid D)$, is the probability of the observed outcome, $p(D \mid \theta)$, considered as a function of the parameter $\theta$ given the data, $D$. In the case of a continuous model, the corresponding likelihood will have this interpretation relative to a discrete probability approximation.

It follows from (1.14) that
$$L(\theta \mid D) \propto \prod_{s=1}^{k} p(x_s \mid \theta) \prod_{t=1}^{m} \bar P(l_t \mid \theta). \tag{1.15}$$
From Bayes' Theorem, it is clear that we need not know $g(\mathbf{x}, \mathbf{l})$ in order to compute the posterior density of $\theta$.

In this subsection, we have thus far confined our discussion to the case of discrete time life distributions since the basic concepts are easier to grasp in this case.
However, in the case of continuous time life distributions, the likelihood concept is equally relevant, and in fact the expression for the likelihood $L(\theta \mid D)$ assumes a rather elegant form if we use $n(u)$, the number on test function. In the continuous case, $p(x \mid \theta)$ is replaced by the probability density element $f(x \mid \theta)$.

THEOREM 1.5. Given the failure rate, independent observations are made under the General Sampling Plan. Let $x_1, x_2, \ldots, x_k$ denote the $k$ observed failure ages. Let $n(u)$ denote the number of units under observation at age $u$, $u \ge 0$, and $r(u)$ denote the failure rate function of the unit at age $u$. Then the likelihood of the failure rate function $r(u)$, having observed data $D$ described above, is given by
$$L(r(u), u \ge 0 \mid D) \propto \begin{cases} \left[\prod_{s=1}^{k} r(x_s)\right] \exp\left[-\int_0^\infty n(u) r(u)\,du\right], & k \ge 1, \\ \exp\left[-\int_0^\infty n(u) r(u)\,du\right], & k = 0. \end{cases} \tag{1.16}$$

PROOF. To justify (1.16), we first note that the underlying random events are the ages at failure or withdrawal. Thus the likelihood of the observed outcome is specified by the likelihood of the failure ages and survivals until withdrawal. By Assumption (3) of the General Sampling Model, we need not include any factor contributed by the stopping rule, since the stopping rule does not depend on the failure rate function $r(\cdot)$.

To calculate the likelihood, we use the fact that given $r(\cdot)$,
$$f(x \mid r(\cdot)) = r(x) \exp\left[-\int_0^x r(u)\,du\right].$$
(See (1.4).) Specifically, if a unit is observed from age 0 until it is withdrawn at age $l_t$, without having failed during the interval $[0, l_t]$, a factor $\exp\left[-\int_0^{l_t} r(u)\,du\right]$ is contributed to the likelihood. Thus, if no units fail during the test (i.e., $k = 0$), the likelihood of the observed outcome is proportional to the expression given in (1.16) for $k = 0$.

On the other hand, if a unit is observed from age 0 until it fails at age $x_s$, a factor
$$r(x_s) \exp\left[-\int_0^{x_s} r(u)\,du\right]$$
is contributed to the likelihood.
The exponential factor corresponds to the survival of the unit during $[0, x_s]$, while $r(x_s)$ represents the rate of failure at age $x_s$. (Note that if we had retained the differential element '$dx$', the corresponding expression $r(x_s)\,dx$ would approximate an actual probability: the conditional probability of a failure during the interval $(x_s, x_s + dx)$ given survival to age $x_s$.)

The likelihood expression in (1.16) corresponding to the outcome $k \ge 1$ now is clear. The exponential factor corresponds to the survival intervals of both units that failed under observation and units that were withdrawn before failing:
$$\int_0^\infty n(u) r(u)\,du = \sum_{s} \int_0^{x_s} r(u)\,du + \sum_{t} \int_0^{l_t} r(u)\,du,$$
where the first sum is taken over units that failed while the second sum is taken over units that were withdrawn. The upper limit '$\infty$' is for simplicity and introduces no technical difficulty, since $n(u) \equiv 0$ after observation ends. □

The likelihood (1.16) applies for any absolutely continuous life distribution. In the important special case of an exponential life distribution model, $f(x \mid \lambda) = \lambda e^{-\lambda x}$, the likelihood of the observed outcome takes the simpler form
$$L(\lambda \mid D) \propto \begin{cases} \lambda^k \exp\left[-\lambda \int_0^\infty n(u)\,du\right], & k \ge 1, \\ \exp\left[-\lambda \int_0^\infty n(u)\,du\right], & k = 0. \end{cases} \tag{1.17}$$
The following theorem is obvious from (1.17).

THEOREM 1.6. Assume that the test plan satisfies Assumptions (1), (2) and (3) of the General Sampling Plan. Assume that $k$ failures and the number of units operating at age $u$, $n(u)$, $u \ge 0$, are observed and that the model is the exponential density $f(x \mid \lambda) = \lambda e^{-\lambda x}$. Then
(a) $k$ and $T = \int_0^\infty n(u)\,du$ together constitute a sufficient statistic for $\lambda$;
(b) $k/T$ is the MLE for $\lambda$.

Note that the MLE, $k/T$, for $\lambda$ represents the number of observed failures divided by the total time on test.

The maximum likelihood estimator is the mode of the posterior density corresponding to a uniform prior (over an interval containing the MLE). A uniform prior is often a convenient reference prior.
Under suitable circumstances, the analyst's actual posterior distribution will be approximately what it would have been had the analyst's prior been uniform. To ignore the departure from uniformity, it is sufficient that the analyst's actual prior density change gently in the region favored by the data and also that the prior density not too strongly favor some other region. This result is rigorously expressed in the Principle of Stable Estimation [see Edwards, Lindman and Savage (1963)]. DeGroot (1970), pages 198-201, refers to this result under the name of precise measurement.

EXAMPLE 1.7. The exact likelihood can be calculated explicitly for specified stopping rules. Suppose that withdrawal times are determined in advance. Then the likelihood is
$$L(r(u), u \ge 0 \mid D) = \left[\prod_{s=1}^{k} n(x_s^-) r(x_s)\right] \exp\left[-\int_0^\infty n(u) r(u)\,du\right] \tag{1.18}$$
where $n(x_s^-)$ is the number surviving just prior to the observed failure at age $x_s$. To see this consider the airplane engine data in Example 1.2. Using Figure 1.3 as a guide, the likelihood will have the following factors:

1. For the interval $[0, x_{(1)}]$ we have the contribution
$$4 r(x_{(1)}) \exp\left[-\int_0^{x_{(1)}} 4 r(u)\,du\right]$$
corresponding to the probability that all 4 units survive to $x_{(1)}$ and the first failure occurs at $x_{(1)}$.

2. For the interval $(x_{(1)}, l_{(1)}]$ we have the contribution
$$\exp\left[-\int_{x_{(1)}}^{l_{(1)}} 3 r(u)\,du\right]$$
corresponding to the probability that the remaining 3 units survive this interval.

3. For the interval $(l_{(1)}, x_{(2)}]$ we have the contribution
$$2 r(x_{(2)}) \exp\left[-\int_{l_{(1)}}^{x_{(2)}} 2 r(u)\,du\right]$$
corresponding to the probability that the remaining 2 units survive to $x_{(2)}$ and the failure occurs at $x_{(2)}$.

4. For the interval $(x_{(2)}, l_{(2)}]$ we have the contribution
$$\exp\left[-\int_{x_{(2)}}^{l_{(2)}} r(u)\,du\right]$$
corresponding to the conditional probability that the remaining unit survives to age $l_{(2)}$.

Multiplying together these conditional probabilities, we obtain a likelihood having the form shown in (1.18).

2.
Parameter estimators and credible intervals

In the previous section we saw how to calculate the likelihood function for general life distributions. This is required in order to calculate the posterior distribution. Calculation and possibly graphical display of the posterior density would conceivably complete our data analysis.

If we assume a life density $p(x \mid \theta)$ and $\pi(\theta)$ is the prior, then $p(x, \theta) = p(x \mid \theta)\pi(\theta)$ is the joint density and
$$p(x) = \int_\Theta p(x \mid \theta)\pi(\theta)\,d\theta$$
is the marginal or predictive density. Given data $D$ and the posterior density $\pi(\theta \mid D)$, the predictive density is
$$p(x \mid D) = \int_\Theta p(x \mid \theta)\pi(\theta \mid D)\,d\theta.$$
If asked to give the probability of survival until time $t$, we would calculate
$$P(X > t \mid D) = \int_t^\infty p(x \mid D)\,dx.$$

EXAMPLE 2.1. For the exponential density $\lambda e^{-\lambda x}$, $k$ observed failures, $T$ total time on test, and the General Sampling Plan, the likelihood is proportional to $\lambda^k e^{-\lambda T}$. For the natural conjugate prior,
$$\pi(\lambda) = \frac{b^a \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)},$$
the posterior density is
$$\pi(\lambda \mid k, T) = (b + T)^{a+k} \lambda^{a+k-1} e^{-(b+T)\lambda} / \Gamma(a + k).$$
In this case the probability of survival until time $t$ is
$$P(X > t \mid k, T) = \int_0^\infty e^{-\lambda t} \pi(\lambda \mid k, T)\,d\lambda = \left(\frac{b + T}{b + t + T}\right)^{a+k}. \tag{2.1}$$

2.1. Bayes estimators

We will need the following notation:
$$E[\theta] = \int_\Theta \theta\,\pi(\theta)\,d\theta \quad \text{and} \quad E[\theta \mid D] = \int_\Theta \theta\,\pi(\theta \mid D)\,d\theta.$$
Of course, $E[\theta]$ is the mean of the prior distribution while $E[\theta \mid D]$ is the mean of the posterior distribution.

We wish to select a single value as representing our 'best' estimator of the unknown parameter $\theta$. To define the best estimator we must specify a criterion of goodness (or equivalently, of poorness). Statisticians measure the poorness of an estimator $\hat\theta$ by the expected 'loss' resulting from their estimator $\hat\theta$. One very popular loss function is squared error loss: specifically, having observed data $D$ and determined the posterior density $\pi(\theta \mid D)$, the expected squared error loss is given by
$$E[(\theta - \hat\theta)^2 \mid D]; \tag{2.2}$$
the expectation is calculated with respect to the posterior density $\pi(\theta \mid D)$.
We choose a point estimator $\hat\theta$ so as to minimize the expected squared error loss in (2.2); i.e., we choose $\hat\theta$ to satisfy

$$\min_a E[(\theta - a)^2 \mid D] = E[(\theta - \hat\theta)^2 \mid D] . \tag{2.3}$$

To find the minimizing value $\hat\theta$, we add and subtract $E(\theta \mid D)$ in the loss function to obtain

$$E[(\theta - a)^2 \mid D] = E[(\theta - E(\theta \mid D))^2 \mid D] + [E(\theta \mid D) - a]^2 ;$$

the cross term vanishes because $E[\theta - E(\theta \mid D) \mid D] = 0$. Only the second term on the right hand side depends on $a$, so to minimize it we set $a = E(\theta \mid D)$, which then represents the solution to (2.3). The resulting estimator, $E(\theta \mid D)$, the mean of the posterior, is called the Bayes estimator with respect to squared error loss.

THEOREM 2.2. The Bayes estimator of a parameter $\theta$ with respect to squared error loss is the mean $E(\theta \mid D)$ of the posterior density.

Another loss function in popular use is the absolute value loss function:

$$E[\,|\hat\theta - \theta|\, \mid D] . \tag{2.4}$$

To find the minimizing estimator using this criterion, we choose $\hat\theta$ to satisfy:

$$\min_a E[\,|\theta - a|\, \mid D] = E[\,|\theta - \hat\theta|\, \mid D] . \tag{2.5}$$

It is easy to show:

THEOREM 2.3. The Bayes estimator of a parameter $\theta$ with respect to the absolute value loss function is the median of the posterior density. Specifically, the estimator $\hat\theta$ satisfies

$$\int_0^{\hat\theta} \pi(\theta \mid D)\, d\theta = \int_{\hat\theta}^\infty \pi(\theta \mid D)\, d\theta = \tfrac{1}{2} . \tag{2.6}$$

Of course, the prior density and the loss function enter crucially in determining a 'best' estimator. However, no matter what criterion is used, all the information concerning the unknown parameter $\theta$ is contained in the posterior density. Thus, a graph of $\pi(\theta \mid D)$ is more informative than any single parameter of the posterior density, whether it be the mean, the median, the mode, a quartile, etc.

EXAMPLE 2.4. Assume that lifetime is governed by the exponential model, $\theta^{-1} e^{-x/\theta}$. Suppose we conjecture that $E[\theta \mid k, T]$, for a sampling plan with $k$, $T$ sufficient, is linear in $T$ for fixed $k$. It turns out that such a linear relationship holds if and only if we use as our prior the natural conjugate prior:
$$\pi(\theta) = \frac{b^a\, \theta^{-(a+1)}\, e^{-b/\theta}}{\Gamma(a)} .$$

(See Diaconis and Ylvisaker (1979) for a proof of this result and for more general results of this kind.) The corresponding Bayes estimator with respect to squared error loss is

$$E[\theta \mid k, T] = \frac{b + T}{a + k - 1} . \tag{2.7}$$

However, the natural conjugate prior would not be appropriate if we believed, for example, that $\theta$ could assume values only in two disjoint intervals. Under this belief, a bimodal prior density would be more natural, and the corresponding estimator $E[\theta \mid D]$ would very likely be difficult to obtain in closed form such as in (2.7). However, $E[\theta \mid D]$ could be computed by numerical integration.

There are many other functions of unknown parameters for which we may want the Bayes estimator with respect to squared error loss. For example, we may wish to estimate the probability of survival until age $t$ for the exponential model; i.e., estimate

$$g(\theta) = \exp\left[-\frac{t}{\theta}\right] . \tag{2.8}$$

It is easy to show in this case that

$$\hat g = E[g(\theta) \mid D] = \int_0^\infty e^{-t/\theta}\, \pi(\theta \mid D)\, d\theta \tag{2.9}$$

is the Bayes estimator. If $\pi(\theta)$ is the natural conjugate prior, then it is easy to verify that

$$\hat g = \left(\frac{b + T}{b + t + T}\right)^{a+k} ,$$

i.e., this is the Bayes estimator of the probability of survival to age $t$ given total time on test $T$ and $k$ observed failures. Note that this $\hat g$ is precisely the marginal probability of survival until time $t$.

2.2. Credible intervals

As we have seen, Bayes estimators correspond to certain functions of the posterior distribution such as the mean, the mode, etc. A credible set or interval is another way of presenting a partial description of the posterior distribution. Specifically, we choose a set $C$ on the positive axis (since we are dealing with lifetime) such that

$$\int_C \pi(\theta \mid D)\, d\theta = 1 - \alpha . \tag{2.10}$$

Such a set $C$ is called a Bayesian $(1 - \alpha)100$ percent credible set (or credible interval if $C$ is an interval) for $\theta$. Obviously, the set $C$ is not uniquely determined.
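The non-uniqueness of $C$ can be seen in a small numerical example. The Python sketch below is illustrative only: it assumes a gamma posterior for a failure rate with integer shape 5 and rate 600 (hypothetical values of the kind produced in Example 2.1), and the function names are our own. It builds two different sets that each satisfy (2.10) with $\alpha = 0.10$, an equal-tail interval and a one-sided interval starting at zero, and compares their lengths.

```python
import math

def erlang_cdf(x, shape, rate):
    """CDF of a gamma density with integer shape and the given rate."""
    if x <= 0:
        return 0.0
    term, total = 1.0, 1.0  # j = 0 term of sum_{j < shape} (rate * x)^j / j!
    for j in range(1, shape):
        term *= rate * x / j
        total += term
    return 1.0 - math.exp(-rate * x) * total

def erlang_quantile(p, shape, rate, tol=1e-12):
    """Solve erlang_cdf(x) = p for x by bisection."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, shape, rate) < p:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def credible_intervals(shape, rate, alpha=0.10):
    """Two different (1 - alpha) credible intervals for the same posterior."""
    equal_tail = (erlang_quantile(alpha / 2, shape, rate),
                  erlang_quantile(1 - alpha / 2, shape, rate))
    one_sided = (0.0, erlang_quantile(1 - alpha, shape, rate))
    return equal_tail, one_sided

# Hypothetical posterior: shape a + k = 5, rate b + T = 600.
equal_tail, one_sided = credible_intervals(shape=5, rate=600.0)
length_equal_tail = equal_tail[1] - equal_tail[0]
length_one_sided = one_sided[1] - one_sided[0]
```

Both intervals carry posterior probability 0.90, yet they have different lengths, which is why a further criterion is needed to single out one credible set.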
It would seem desirable to choose the set $C$ to be as small (e.g., least length, area, volume) as possible. To achieve this, we seek a constant $c_{1-\alpha}$ and a corresponding set $C$ such that

$$C = \{\theta \mid \pi(\theta \mid D) \ge c_{1-\alpha}\} \tag{2.11}$$

and

$$\int_C \pi(\theta \mid D)\, d\theta = 1 - \alpha . \tag{2.12}$$

A set $C$ satisfying (2.11) and (2.12) is called a highest posterior density credible set (Box and Tiao, 1973). In general, $C$ would have to be determined numerically with the aid of a computer.

For the exponential model $\lambda e^{-\lambda x}$, the natural conjugate prior is the gamma density. Since the gamma density is a generalization of the chi-square density, we recall the definition of the latter so that we can make use of it to determine credible intervals for the failure rate of the exponential.

DEFINITION 2.5. A random variable, $\chi^2(n)$, having density

$$f_{\chi^2(n)}(x) = \frac{x^{n/2 - 1} \exp\left[-\frac{x}{2}\right]}{2^{n/2}\, \Gamma(n/2)} \quad\text{for } x \ge 0,\ n = 1, 2, \ldots, \tag{2.13}$$

is called a chi-square random variable with $n$ degrees of freedom (d.f.).

A table of percentage points of the chi-square distribution may be found in Pearson and Hartley (1958). In addition, chi-square programs are available for more extensive calculations using electronic computers and programmable calculators.

It is easy to verify that the $\chi^2$ random variable with $2n$ d.f. is distributed as $2(Y_1 + Y_2 + \cdots + Y_n)$, where $Y_1, Y_2, \ldots, Y_n$ are independent, exponentially distributed random variables with mean one. Thus, we obtain the following result useful in computing credibility intervals for the failure rate of the exponential model with corresponding natural conjugate prior.

THEOREM 2.6. Let $k$ failures and total time on test $T$ be observed under sampling assumptions (1), (2) and (3) (Section 1) for the exponential model $\lambda e^{-\lambda x}$. Let $\lambda$ have the posterior density corresponding to the natural conjugate prior

$$\pi(\lambda) = \frac{b^a \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)}$$

with $a$ an integer.
Then

$$P\left[\frac{\chi^2_{\alpha/2}[2(a + k)]}{2(b + T)} \le \lambda \le \frac{\chi^2_{1-\alpha/2}[2(a + k)]}{2(b + T)} \,\Big|\, D\right] = 1 - \alpha , \tag{2.14}$$

where $\chi^2_\beta(n)$ is the $100\beta$ percentage point of a chi-square distribution with $n$ d.f.; i.e.,

$$\int_0^{\chi^2_\beta(n)} f_{\chi^2(n)}(x)\, dx = \beta .$$

REMARK. Because of the lack of symmetry of the $\chi^2$ density, the interval in (2.14) is not the highest posterior density credible interval.

PROOF. It is easy to verify that $(b + T)\lambda$ given the data has a gamma density,

$$\frac{u^{a+k-1} e^{-u}}{\Gamma(a + k)} ,$$

corresponding to the density of $Y_1 + \cdots + Y_{a+k}$, where the $Y$'s are independent unit exponential random variables. Hence

$$2\lambda(b + T) \overset{\mathrm{st}}{=} 2(Y_1 + \cdots + Y_{a+k}) ,$$

where $\overset{\mathrm{st}}{=}$ denotes stochastic equality; i.e., $2\lambda(b + T)$ has a chi-square density with $2(a + k)$ d.f. □

COROLLARY 2.7. For $2(a + k)$ large (say $2(a + k) > 30$), the normal approximation provides the approximate credibility statement

$$P\left[\frac{(a + k) + (a + k)^{1/2} z_{\alpha/2}}{b + T} \le \lambda \le \frac{(a + k) + (a + k)^{1/2} z_{1-\alpha/2}}{b + T} \,\Big|\, D\right] \approx 1 - \alpha , \tag{2.15}$$

where $z_\alpha$ satisfies $\int_{-\infty}^{z_\alpha} \varphi(u)\, du = \alpha$ and $\varphi(u) = (1/\sqrt{2\pi})\, e^{-u^2/2}$ is the normal density with mean 0 and variance 1.

PROOF. Since the $\chi^2(2n)$ random variable can be written as

$$\chi^2(2n) = 2(Y_1 + Y_2 + \cdots + Y_n) ,$$

where $Y_1, Y_2, \ldots, Y_n$ are independent unit exponentials, the Central Limit Theorem (e.g., Hoel, Port and Stone, 1971) applies. Note that $E\chi^2(2n) = 2n$ and $\mathrm{Var}[\chi^2(2n)] = 4n$. Thus,

$$\frac{\chi^2(2n) - 2n}{2\sqrt{n}}$$

is approximately normal with mean 0 and variance 1 by the Central Limit Theorem. □

COROLLARY 2.8. Let $k$ failures and $T$ total time on test be observed under the General Sampling Plan assumptions (1), (2) and (3) (Section 1), for the exponential model $\theta^{-1} e^{-x/\theta}$. Let $\theta$ have the natural conjugate prior with integer $a$. Then

$$P\left[\frac{2(b + T)}{\chi^2_{1-\alpha/2}[2(a + k)]} \le \theta \le \frac{2(b + T)}{\chi^2_{\alpha/2}[2(a + k)]} \,\Big|\, D\right] = 1 - \alpha . \tag{2.16}$$

PROOF. Since $\theta$ has the natural conjugate prior distribution for the model $\theta^{-1} e^{-x/\theta}$, $\lambda = 1/\theta$ has the natural conjugate prior for the model $\lambda e^{-\lambda x}$. Hence (2.16) follows from (2.14). □

3.
The Weibull distribution

Whenever possible, the choice of a life distribution model should be based on the underlying failure mechanisms. Simple structures composed of statistically independent components have been used to derive life distribution models valid when the number of structural components is very large. Suppose a structure of $n$ components fails as soon as $k$ components fail. If also component lifetimes are judged identically distributed and independent, then there are only two possible limiting structure life distributions in the sense that there exist sequences of normalizing constants $\{a_n\}_{n=1}^\infty$, $\{\lambda_n\}_{n=1}^\infty$ such that for all real $x$,

$$\lim_{n \to \infty} P\{\lambda_n(\xi_{k,n} - a_n) \le x\}$$

exists, where $\xi_{k,n}$ denotes the structure lifetime. The limit is either

$$\frac{1}{(k - 1)!} \int_0^{[\lambda(x - a)]^\alpha} e^{-u} u^{k-1}\, du , \quad \alpha, \lambda > 0,\ x \ge a \ge 0 , \tag{3.1}$$

or

$$\frac{1}{(k - 1)!} \int_0^{\exp[\lambda(x - a)]} e^{-u} u^{k-1}\, du , \quad -\infty < x < \infty,\ -\infty < a < \infty,\ \lambda > 0 \tag{3.2}$$

(Smirnov, 1952). In both cases $a$ is a location parameter and $\lambda$ is a scale parameter while $\alpha$ and $k$ are shape parameters. If $k = 1$, then (3.1) becomes

$$W(x \mid a, \lambda, \alpha) = 1 - \exp\{-[\lambda(x - a)]^\alpha\} , \quad x \ge a \ge 0 , \tag{3.1'}$$

the Weibull distribution, and (3.2) becomes

$$\Lambda(x \mid a, \lambda) = 1 - \exp\{-e^{\lambda(x - a)}\} , \quad -\infty < x < \infty . \tag{3.2'}$$

Thus, if $X$ is the structure lifetime, then either $X$ or $\exp(X)$ has a Weibull distribution. The failure rate for the Weibull distribution of (3.1') is

$$r_W(x) = \alpha \lambda^\alpha (x - a)^{\alpha - 1} \quad\text{for } x \ge a , \tag{3.3}$$

and 0 elsewhere. In the second case it is

$$r_\Lambda(x) = \lambda \exp[\lambda(x - a)] . \tag{3.4}$$

For all parameter values, (3.4) is increasing in $x$. Hence, if we wish to allow the possibility that the failure rate may be decreasing we must choose the Weibull model, (3.1'), with $\alpha < 1$. The Weibull model appears to furnish an adequate fit for some strand lifetime data with estimated values of $\alpha$ less than 4. On the other hand, it has been empirically observed that for strength data, estimates for $\alpha$ using the Weibull model are often large ($> 27$ in some cases).
This suggests that (3.2') may provide a better model for strand strength data.

3.1. Inference for the Weibull distribution

The Weibull life distribution model has three parameters: $a$, $\lambda$, and $\alpha$. The parameter $a \ge 0$ is a threshold value for lifetime; before time $a$ we expect to see no failures. If there is no physical reason to justify a positive threshold value, the analyst should use the two parameter Weibull model. The simplest model compatible with prior knowledge concerning physical processes will often provide the most insight.

The Weibull density is

$$f(x \mid a, \alpha, \lambda) = \alpha \lambda^\alpha (x - a)^{\alpha - 1} e^{-[\lambda(x - a)]^\alpha} \quad\text{for } x \ge a \tag{3.5}$$

and 0 elsewhere. Usually we wish to quantify our uncertainty about a particular aspect of the life distribution, such as the probability of surviving $x$ hours. For the three parameter Weibull model, this is given by

$$\bar F(x \mid a, \lambda, \alpha) = \exp\{-[\lambda(x - a)]^\alpha\} . \tag{3.6}$$

It is clearly sufficient to assess our uncertainty concerning $a$, $\lambda$, and $\alpha$.

Suppose data are obtained under the General Sampling Plan (Section 1). Let $x_1, x_2, \ldots, x_k$ denote the unordered observed failure ages and $n(u)$ the number surviving until age $u$. Then by Theorem 1.6 in Section 1, the likelihood is given by

$$L(a, \alpha, \lambda \mid D) \propto \alpha^k \lambda^{k\alpha} \left[\prod_{i=1}^{k} (x_i - a)^{\alpha - 1}\right] \exp\left[-\lambda^\alpha \int_a^\infty \alpha\, n(u)\, (u - a)^{\alpha - 1}\, du\right]$$

for $a \le x_i$ and $\alpha, \lambda > 0$. Suppose there are $m$ withdrawals and we pool observed failure and loss times and relabel them as

$$0 \equiv t_{(0)} \le t_{(1)} \le t_{(2)} \le \cdots \le t_{(k+m)} \le t .$$

Then, for $a \le x_i$, $i = 1, 2, \ldots, k$, we have

$$\int_a^\infty n(u)\, (u - a)^{\alpha - 1}\, du = \sum_{i=1}^{k+m} (n - i + 1) \int_{t_{(i-1)}}^{t_{(i)}} (u - a)^{\alpha - 1}\, du + (n - k - m) \int_{t_{(k+m)}}^{t} (u - a)^{\alpha - 1}\, du . \tag{3.7}$$

Observation is confined to the age interval $[0, t]$.

Two important deductions can be made from (3.7):

1. The only sufficient statistic for all three parameters (or for $\alpha$ and $\lambda$ alone when $a = 0$) is the entire data set.

2.
No natural conjugate family of priors is available for all three parameters (or for $\alpha$ and $\lambda$ alone when $a = 0$). Consequently, the posterior distribution must be computed using numerical integration [see Diaconis and Ylvisaker (1979)].

For most statistical investigations, $a$ and perhaps also $\alpha$ would be considered nuisance parameters. By matching our joint prior density in $a$, $\lambda$ and $\alpha$ with the likelihood (3.7), we can calculate the posterior density, $\pi(a, \lambda, \alpha \mid D)$. For example, if $a$ is considered a nuisance parameter, then we would calculate the marginal density on $\lambda$ and $\alpha$ as

$$\pi(\alpha, \lambda \mid D) = \int_0^\infty \pi(a, \alpha, \lambda \mid D)\, da .$$

3.2. Credibility regions for two parameter models

Let $\pi(\alpha, \lambda \mid D)$ be the posterior density for a two parameter model such as the Weibull model above with scale parameter $\lambda$ and shape parameter $\alpha$. To find the so-called 'highest posterior density' credibility region for $\alpha$ and $\lambda$ simultaneously (Section 2), we find a constant $c(\beta)$ by sequential search such that

$$R = \{(\alpha, \lambda) \mid \pi(\alpha, \lambda \mid D) \ge c(\beta)\} \tag{3.8}$$

and

$$\iint_R \pi(\alpha, \lambda \mid D)\, d\alpha\, d\lambda = \beta .$$

The region $R$ defined above is a $\beta(100)$ percent credibility region for $\alpha$ and $\lambda$. For unimodal densities such regions are bounded by a single curve $C$ which does not intersect itself (i.e., a 'simply connected region').

To illustrate the use of Weibull credibility regions we have computed credibility regions corresponding to the data in Tables 3.1 and 3.2. Twenty-one pressure vessels were put on life test at 68% of their ultimate mean burst stress. A pressure vessel is filled with a gas or liquid and provides a source of mechanical energy. They are used on space satellites and other space vehicles. After 13488 hours of testing, 5 failures were recorded. After an additional 7080 hours of testing, an additional 4 failures were recorded.
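The sequential search for $c(\beta)$ in (3.8) can be sketched on a grid. The Python code below is a minimal illustration, not the computation behind the published contours: it uses the five failure ages recorded in the first 13488 hours (Table 3.1) with the 16 survivors treated as withdrawn at 13488 hours, sets $a = 0$, measures time in units of $10^4$ hours, and assumes a prior that is uniform over a bounded grid of $(\alpha, \lambda)$ values. The grid limits, the time unit and all function names are assumptions made for the example.

```python
import math

FAILURES = [0.4000, 0.5376, 0.7320, 0.8616, 0.9120]  # Table 3.1, in units of 10^4 hours
N_UNITS, T_END = 21, 1.3488                          # survivors withdrawn at 13488 hours

def log_likelihood(alpha, lam):
    """Two-parameter Weibull log-likelihood (a = 0) for the censored sample."""
    k = len(FAILURES)
    # alpha * integral of n(u) * u^(alpha - 1) du equals the sum of z_j^alpha over all units
    exposure = sum(x ** alpha for x in FAILURES) + (N_UNITS - k) * T_END ** alpha
    return (k * math.log(alpha) + k * alpha * math.log(lam)
            + (alpha - 1.0) * sum(math.log(x) for x in FAILURES)
            - lam ** alpha * exposure)

def grid_posterior(alphas, lams):
    """Posterior cell probabilities under a prior that is uniform on the grid."""
    logp = [[log_likelihood(a, l) for l in lams] for a in alphas]
    peak = max(max(row) for row in logp)
    weights = [[math.exp(v - peak) for v in row] for row in logp]
    total = sum(sum(row) for row in weights)
    return [[v / total for v in row] for row in weights]

def hpd_threshold(post, beta):
    """Cell probability c such that the cells with probability >= c hold mass >= beta."""
    mass = 0.0
    for p in sorted((v for row in post for v in row), reverse=True):
        mass += p
        if mass >= beta:
            return p
    return 0.0

alphas = [0.05 + 0.05 * i for i in range(120)]  # shape grid, 0.05 to 6.00 (assumed range)
lams = [0.01 + 0.01 * j for j in range(200)]    # scale grid, 0.01 to 2.00 (assumed range)
post = grid_posterior(alphas, lams)
c90 = hpd_threshold(post, 0.90)
region = [(a, l) for a, row in zip(alphas, post)
          for l, p in zip(lams, row) if p >= c90]
```

Here `region` lists the grid points inside an approximate 90 percent highest posterior density region; plotting its boundary would give a contour of the kind described in this section. A finer grid, or a different prior range, would change the numbers.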
Table 3.1
Ordered failure ages of pressure vessels life tested at 68% of mean rupture strength (n = 21, observation to 13488 hours)

Number of failure    Age at failure (hours)
1                    4000
2                    5376
3                    7320
4                    8616
5                    9120

Table 3.2
Ordered failure ages of pressure vessels life tested at 68% of mean rupture strength (failures between 13488 hours and 20568 hours)

Number of failure    Age at failure (hours)
1                    14400
2                    16104
3                    20231
4                    20233

Figure 3.1 displays credibility contours for $\alpha$ and $\lambda$ after 13488 hours of testing and again after 20568 hours of testing. The posterior densities were computed relative to uniform priors. The posterior density computed after 20568 hours could also be interpreted as the result of using the posterior (calculated on the basis of Table 3.1 and a flat prior) as the new prior for the data in Table 3.2. A qualitative measure of the information gained by an additional year of testing can be deduced by comparing the initial (dark) contours and the tighter (light) contours in Figure 3.1.

[Figure 3.1: contours after 13488 hours and after 20568 hours.]
Fig. 3.1. Highest probability density contours for $\alpha$ and $\lambda$ for Kevlar/epoxy pressure vessel life test data. The pressure vessels were tested at 68% stress level.

To predict pressure vessel life at the 68% stress level, we can numerically compute

$$P[X > t \mid D] = \int_0^\infty \int_0^\infty e^{-(\lambda t)^\alpha}\, \pi(\alpha, \lambda \mid D)\, d\alpha\, d\lambda ,$$

where $\pi(\alpha, \lambda \mid D)$ must be numerically computed using the given data, $D$. If the mean life

$$\theta = \frac{\Gamma\left(1 + \frac{1}{\alpha}\right)}{\lambda}$$

or the standard deviation of life are of interest, their posterior densities can be computed by making a change of variable and integrating out the nuisance parameter. For example, if $a = 0$ in the Weibull model and we are interested in the mean life, $\theta$, we can use the Weibull density in terms of $\alpha$ and $\theta$,

$$f(x \mid \alpha, \theta) = \alpha \left[\frac{\Gamma\left(1 + \frac{1}{\alpha}\right)}{\theta}\right]^\alpha x^{\alpha - 1} \exp\left\{-\left[\frac{\Gamma\left(1 + \frac{1}{\alpha}\right) x}{\theta}\right]^\alpha\right\} ,$$

to compute the joint posterior density $\pi(\alpha, \theta \mid D)$. The prior for $\alpha$ and $\lambda$ must be replaced by the induced prior for $\alpha$ and $\theta$. This may be accomplished by a change of variable and by computing the appropriate Jacobian. The marginal posterior density of $\theta$ is then

$$\pi(\theta \mid D) = \int_0^\infty \pi(\alpha, \theta \mid D)\, d\alpha .$$

This can then be used to obtain credibility intervals on $\theta$.

4. Notes and references

4.1. Section 1

In the General Sampling Plan we needed to assume that any stopping rules used were noninformative concerning the failure distribution. The need for this assumption was pointed out by Raiffa and Schlaifer (1961). Examples of informative stopping rules were given by Roberts (1967) in the context of two stage sampling of biological populations to estimate population size (so-called capture-recapture sampling).

4.2. Section 2: Unbiasedness

The posterior mean is a Bayes estimator of a parameter, say $\theta$, with respect to squared error loss. It is also a function of the data. An estimator, $\hat\theta(D)$, is called unbiased in the sample theory sense if

$$E_D[\hat\theta(D) \mid \theta] = \theta$$

for each $\theta \in \Theta$. No Bayes estimator (based on a corresponding proper prior) can be unbiased in the sample theory sense (Bickel and Blackwell, 1967).

Most unbiased estimators are in fact inadmissible in the sample theory sense with respect to squared error loss. For example, $\hat\theta(D) = T/k$ is a sample theory unbiased estimator for the mean of the density $\theta^{-1} e^{-x/\theta}$. However, it is inadmissible in the sense that there exists another estimator $c\hat\theta(D)$ with $c \ne 1$ such that, for all $\theta$,

$$E_D\{[c\hat\theta(D) - \theta]^2 \mid \theta\} < E_D\{[\hat\theta(D) - \theta]^2 \mid \theta\} .$$

To find this $c$, consider $Y = \hat\theta(D)/\theta$ and note $EY = 1$.
Then we need only find $c$ such that $E[(cY - 1)^2 \mid \theta]$ is minimum. This occurs for $c_0 = EY/EY^2$, which is clearly not 1 because $EY = 1$ while $EY^2 = 1 + \mathrm{Var}(Y) > 1$. Hence $\hat\theta(D)$ is sample theory inadmissible.

Sample theory unbiasedness is not a viable criterion. For large $k$, $\hat\theta(D) = T/k$ will be approximately the same as our Bayes estimator. However, $T/k$ is not recommended for small $k$.

Since tables of the chi-square distribution have in the past been more accessible than tables of the gamma distribution, we have given the chi-square special treatment. However, with modern computing facilities, we really only need to use the more general gamma distribution.

4.3. Confidence intervals

A $(1 - \alpha)100\%$ confidence interval in the sample theory sense is one such that if the experiment is repeated infinitely often (and the interval recomputed each time) then $(1 - \alpha)100\%$ of the time the interval will cover the fixed unknown true parameter $\theta$. Since confidence intervals do not produce a probability distribution on the parameter space for $\theta$, they cannot provide the basis for action in the decision theory sense; i.e., a decision maker cannot use a sample theory confidence interval to compute an expected utility function which can then be maximized over his set of possible decisions.

If for $\lambda e^{-\lambda x}$ we choose the improper prior $\pi(\lambda) = 1/\lambda$, then the chi-square $(1 - \alpha)100\%$ credible intervals and the sample theory $(1 - \alpha)100\%$ confidence intervals agree. Unfortunately, such improper credible intervals can be shown to violate certain rules of logical behavior. Lindley (personal communication) provides the following simple illustration of this fact for the exponential model $\lambda e^{-\lambda x}$.

Suppose $n$ units are put on test and we stop at the first failure, so that $T = nX_{(1)}$. Now $T$ given $\lambda$ also has density $\lambda e^{-\lambda x}$, so that $(\ln 2)/T$ is a 50% improper upper credible limit on $\lambda$; i.e.,

$$P\left[\lambda \le \frac{\ln 2}{T} \,\Big|\, T,\ \pi(\lambda) = \frac{1}{\lambda}\right] = 0.50 . \tag{4.1}$$

Suppose now that $T$ is observed and we accept the probability statement (4.1).
Consider the following hypothetical bet.

(i) If $\lambda < (\ln 2)/T$ we lose the amount $e^{-T}$;
(ii) If $\lambda \ge (\ln 2)/T$ we win $e^{-T}$.

We can pretend that the true $\lambda$ is somehow revealed and bets are paid off. If we believe statement (4.1), then given $T$ such a bet is certainly fair. Now let us compute our expected gain before $T$ is observed (preposterior analysis). This is easily seen to be (conditional on $\lambda$)

$$-\int_0^{(\ln 2)/\lambda} \lambda e^{-\lambda t} e^{-t}\, dt + \int_{(\ln 2)/\lambda}^\infty \lambda e^{-\lambda t} e^{-t}\, dt = \frac{\lambda}{1 + \lambda}\left[2^{-1/\lambda} - 1\right] ,$$

which is negative for all $\lambda > 0$. Note that this is what we subjectively expect, since as (improper) Bayesians, every probability (and presumably even an improper prior) is subjective. The contradiction lies in the observation that

1. conditional on $\lambda$ and prior to observing $T$, our expected winnings are negative for all $\lambda$;
2. conditional on $T$, our expected loss is zero (using the improper prior $\pi(\lambda) = 1/\lambda$).

The source of the contradiction is that we have not measured our uncertainty for all events by probability. For example, we have assigned the value $\infty$ to the event $\lambda < \lambda_0$ for all $\lambda_0 > 0$; i.e.,

$$\int_0^{\lambda_0} \pi(\lambda)\, d\lambda = \int_0^{\lambda_0} (1/\lambda)\, d\lambda = \infty .$$

We can prove that for any set of uncertainty statements that are not probabilistically based (relative to proper distributions), a system of bets can be constructed which will result in the certain loss of money. A bet consists of paying $pz < z$ dollars to participate with the understanding that if an event $E$ occurs you win $z$ dollars and otherwise you win nothing.

4.4. Section 3

The Weibull distribution is one of several extreme value distributions. See Barlow and Proschan (1975), Chapter 8, for a more advanced discussion of extreme value distributions.

Acknowledgements

We would like to acknowledge Dennis Lindley for his perceptive comments and criticisms of an earlier draft.
Thanks are also due to Colleen Postmus and Mariko Kubik for typing many versions previous to this one.

References

Barlow, R. E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New York.
Bickel, P. J. and Blackwell, D. (1967). A note on Bayes estimates. Ann. Math. Statist. 38, 1907-1911.
Box, G. E. P. and Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA.
DeGroot, M. H. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
Diaconis, P. and Ylvisaker, D. (1979). Conjugate priors for exponential families. Ann. Statist. 7, 269-281.
Edwards, W., Lindman, H., and Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Rev. 70, 193-242.
Hoel, P. G., Port, S. C., and Stone, C. J. (1971). Introduction to Probability Theory. Houghton Mifflin, Boston, MA.
Lindley, D. V. (1978). The Bayesian approach. Scandinavian J. Statist. 5, 1-26.
Pearson, E. S. and Hartley, H. O. (1958). Biometrika Tables for Statisticians. Vol. 1. The University Press, Cambridge, England.
Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory. Harvard Business School, Boston, MA.
Roberts, H. V. (1967). Informative stopping rules and inferences about population size. J. Amer. Statist. Assoc. 62, 763-775.
Smirnov, N. V. (1952). Limit distributions for the terms of a variational series. Trans. Math. Soc. Ser. 1, 1-64.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 251-280

Piecewise Geometric Estimation of a Survival Function*

Gillian M. Mimmack and Frank Proschan

1. Introduction and summary

The problem of estimating survival probabilities from incomplete data is well known in the fields of reliability, medicine, biometry and actuarial science. The general situation is described as follows.
The variable of interest is the lifespan of some unit: the investigator wishes to estimate the probability of survival beyond any given time. To this end, n identical units are placed 'on test'. Each item is either observed until failure, resulting in an uncensored observation, or is removed from the test before failure, resulting in a censored observation. Thus the data available consist of a number of lifelengths and a number of truncated lifelengths: the statistical problem is to estimate the probability distribution of the lifelengths.

The various statistical approaches to the problem can generally be classified according to the restrictiveness of the model assumed and the type of information utilized. At one extreme are purely parametric procedures, which involve assuming that the underlying life distribution belongs to a specific parametric family. These procedures utilize interval information. The Bayesian estimator described by Susarla and Van Ryzin (1976) makes allowance for both parametric and nonparametric models: the type of information utilized depends on the assumptions about the prior distribution. As our approach to the problem is neither parametric nor Bayesian, we do not consider these procedures further but concentrate on nonparametric procedures.

Nonparametric procedures range in sophistication from the well-known actuarial estimator, which is a step function constructed from ordinal information alone, to the piecewise polynomial estimators of Whittemore and Keller (1983) that utilize interval information. The most widely used nonparametric estimators are those of Kaplan and Meier (1958) and Nelson (1969). These estimators are also step functions constructed from ordinal information. Their properties are described by Efron (1967), Breslow and Crowley (1974), Petersen (1977), Aalen (1976, 1978), Kitchin, Langberg and Proschan (1983), Nelson (1972), Fleming and Harrington (1979), and Chen, Hollander and Langberg (1982).

* Research supported by the Air Force Office of Scientific Research, AFSC, USAF, under Grant AFOSR 82-K-0007.

One of the by-products of the estimation process is an estimate of the failure rate function: here, another issue is raised. It is evident that survival function estimators that are step functions do not provide useful failure rate function estimators: Miller (1981) mentions smoothing the Kaplan-Meier estimator for this reason and summarizes the development of other survival function estimators that may be obtained by considering a special case of the regression model of Cox (1972). These estimators generally correspond to failure rate function estimators that are step functions and utilize at most part (but not all) of the interval information contained in the data. Whittemore and Keller (1983) give several more refined failure rate function estimators that are step functions and utilize full interval information. They also describe even more complex estimators that utilize full interval information: however, these are not computationally convenient compared with their simpler estimators. It seems, from their work, that a successful rival of the Kaplan-Meier estimator should be only marginally more complex than it (so as to be computationally convenient and yet yield a useful failure rate function estimator) and also should utilize more than ordinal information.

In Section 2, we propose an estimator that not only provides a reasonable failure rate function estimator but also utilizes interval information. Moreover, it is computationally simple. Our estimator is a discrete counterpart of two versions of a continuous estimator proposed independently by Kitchin, Langberg and Proschan (1983) and Whittemore and Keller (1983).
The motivation for the construction of our estimator is the same as that of the former authors, and our model is the discrete version of theirs: in contrast, the latter authors assume the more restrictive model of random censorship and obtain their estimator by the method of maximum likelihood. This provides an alternative method of deriving our estimator.

The remaining sections are concerned with properties of our estimator. As this presentation is expository, proofs are omitted: Mimmack (1985) provides proofs. In Section 3, we explore the asymptotic properties of our estimator under increasingly restrictive models. Our estimator is strongly consistent and asymptotically normal under conditions more general than those typically assumed. Section 4 deals with the relationships among our estimator, the Kaplan-Meier estimator, and the above-mentioned estimator of Kitchin et al. and Whittemore and Keller. The section ends with an example using real data. In Section 5, we continue the comparison of the new estimator and the Kaplan-Meier estimator: since the properties of the new estimator are expected to resemble those of its continuous counterparts, we discuss the implications of simulation studies designed to investigate the small sample behaviour of these estimators. We also present the results of a Monte Carlo pilot study designed to investigate the small sample properties of our estimator.

2. Preliminaries

In this section we formulate the problem in statistical terms and define our estimator. Let $X$ denote the lifelength of a randomly chosen unit, where $X$ has distribution function $G$. Suppose that $n$ identical items are placed on test. The resultant sample consists of the pairs $(Z_1, \delta_1), \ldots, (Z_n, \delta_n)$, where $Z_i$ represents the time for which unit $i$ is observed and $\delta_i$ indicates whether unit $i$ fails while under observation or is removed from the test before failure.
Symbolically, for $i = 1, \ldots, n$, we have

$X_i$ = lifelength of unit $i$, where $X_i$ has distribution $G$,
$Y_i$ = time to censorship of unit $i$,
$Z_i = \min(X_i, Y_i)$,
$\delta_i = I(X_i \le Y_i)$.

$(X_1, Y_1), \ldots, (X_n, Y_n)$ are assumed to be independent random pairs. Elements of a pair $X_i$ and $Y_i$, where $i = 1, \ldots, n$, are not assumed to be independent.

We assume that the lifelength and censoring random variables are discrete. Let $\mathcal{X} = \{x_1, x_2, \ldots\}$ denote the set of possible values of $X$ and $\mathcal{Y} = \{y_1, y_2, \ldots\}$ denote the union of the sets of possible values of $Y_1, Y_2, \ldots$, where $\mathcal{Y} \subseteq \mathcal{X}$. The survival probabilities of interest are denoted $P(X > x_k)$, $k = 1, 2, \ldots$, where

$$P(X > x_k) = \bar G(x_k) = 1 - G(x_k) , \quad k = 1, 2, \ldots .$$

It is evident that this formulation differs from that of the model of random censorship which is generally assumed in the literature, and in particular, by Whittemore and Keller (1983). These authors assume that the lifelength and censoring random variables are continuous, that the corresponding pairs $X_i$ and $Y_i$, where $i = 1, 2, \ldots$, are independent, and that the censoring random variables are identically distributed. Although Kaplan and Meier (1958) assume only independence between corresponding lifelength and censoring random variables, Breslow and Crowley (1974), Petersen (1977), Aalen (1976, 1978), and others (all of whom describe the properties of the Kaplan-Meier estimator) assume also that the censoring random variables are identically distributed. Our formulation is the discrete counterpart of that of Kitchin, Langberg and Proschan (1983): likewise, our estimator is the discrete counterpart of theirs.

Before describing our estimator, we give the notation required. Let $n_1$ be the random number of distinct uncensored observations in the sample and let $t_1 < t_2 < \cdots < t_{n_1}$ denote these distinct observed failure times, with $t_0 = 0$.
Let $n_2$ be the random number of distinct censored observations in the sample and let $s_1 < s_2 < \cdots < s_{n_2}$ denote these times, with $s_0 = 0$. Let $D_i$ be the number of failures observed at time $t_i$:

$$D_i = \sum_{j=1}^{n} I(Z_j = t_i,\ \delta_j = 1) \quad\text{for } i = 1, \ldots, n_1 .$$

Let $C_i$ be the number of censored observations equal to $s_i$:

$$C_i = \sum_{j=1}^{n} I(Z_j = s_i,\ \delta_j = 0) \quad\text{for } i = 1, \ldots, n_2 .$$

Let $\bar F_n(t) = 1 - F_n(t)$ denote the proportion of observations that exceed $t$:

$$\bar F_n(t) = \frac{1}{n} \sum_{j=1}^{n} I(Z_j > t) \quad\text{for } t \in [0, \infty) .$$

Let $F_n^1(t)$ denote the proportion of failures observed at or before $t$:

$$F_n^1(t) = \frac{1}{n} \sum_{j=1}^{n} I(Z_j \le t,\ \delta_j = 1) \quad\text{for } t \in [0, \infty) .$$

Let $T_i$ be a measure of the total time on test in the interval $(t_{i-1}, t_i]$:

$$T_i = \#\{m : t_{i-1} < x_m \le t_i\}\,(n \bar F_n(t_i) + D_i) + \sum_{k:\ t_{i-1} < s_k \le t_i} \#\{m : t_{i-1} < x_m \le s_k\}\, C_k \quad\text{for } i = 1, \ldots, n_1 ,$$

where $\#A$ denotes the cardinality of the set $A$. (If failure and censoring random variables are lattice random variables, then $T_i$ is the total time on test in $(t_{i-1}, t_i]$. In general, however, $T_i$ increases by one unit whenever an item on test survives an interval of the form $(x_{j-1}, x_j]$, where $t_{i-1} \le x_{j-1} < x_j \le t_i$, irrespective of the distance between $x_{j-1}$ and $x_j$.)

We now construct our estimator. Expressing the survival function $\bar G$ in terms of the failure rates $P(X = x_k \mid X \ge x_k)$, $k = 1, 2, \ldots$, we have

$$P(X > x_k) = \prod_{j=1}^{k} \left[1 - P(X = x_j \mid X \ge x_j)\right] \quad\text{for } k = 1, 2, \ldots . \tag{2.1}$$

It is evident from (2.1) that we may estimate our survival function at $x_k$ from estimates of the failure rates at $x_1, x_2, \ldots, x_k$. In the experimental situation, failures are not observed at all the times $x_1, x_2, \ldots$, so specific information about the failure rates at many of the possible failure times is not available. Having observed failures at $t_1, t_2, \ldots, t_{n_1}$, we find it simple to estimate the failure rates $P(X = t_i \mid X \ge t_i)$, $i = 1, \ldots, n_1$.
However, the question of how to estimate the failure rates at the intervening possible failure times requires special consideration. One approach, that of Kaplan and Meier (1958), Nelson (1969) and others, is to estimate the failure rates at these intervening times as zero since no failures are observed then. However, not observing failures at some possible failure times may be a result of being in an experimental situation rather than evidence of very small failure rates at these times, so we discard this approach and consider nonzero estimates. It is reasonable to assume that the underlying process possesses an element of continuity in that adjacent failure rates do not differ radically from one another. Thus we consider using the estimate of the failure rate at $t_i$ to estimate the failure rate at each of the possible failure times between $t_{i-1}$ and $t_i$, where $i = 1, \ldots, n_1$. We are therefore assuming that our approximating distribution has a constant failure rate between the times at which failures are observed, that is,
$$P(X = x_k \mid X \ge x_k) = q_i \quad \text{for } t_{i-1} < x_k \le t_i,\ i = 1, \ldots, n_1, \tag{2.2}$$
where $q_i = P(X = t_i \mid X \ge t_i)$ for $i = 1, \ldots, n_1$. Substituting (2.2) into (2.1), we obtain
$$P(X > x_k) = (1 - q_i)^{\#\{m:\ t_{i-1} < x_m \le x_k\}} \prod_{j=1}^{i-1} (1 - q_j)^{\#\{m:\ t_{j-1} < x_m \le t_j\}} \quad \text{for } t_{i-1} < x_k \le t_i,\ i = 1, \ldots, n_1. \tag{2.3}$$
We note that the property of having constant failure rate on $\mathcal{X}$ characterizes a family of geometric distributions defined on $\mathcal{X}$. In particular, the failure rates $q_1, \ldots, q_{n_1}$ identify $n_1$ geometric distributions $G_1, \ldots, G_{n_1}$ defined on $\mathcal{X}$. The survival functions $\bar{G}_1, \ldots, \bar{G}_{n_1}$ have the geometric form
$$\bar{G}_i(x_k) = (1 - q_i)^k \quad \text{for } k = 1, 2, \ldots \text{ and } i = 1, \ldots, n_1. \tag{2.4}$$
Inspection of (2.3) and (2.4) reveals that our estimating function is constructed from the geometric survival functions $\bar{G}_1, \ldots$
, $\bar{G}_{n_1}$, where $\bar{G}_i$ is used in the interval $(t_{i-1}, t_i]$, $i = 1, \ldots, n_1$. Consequently, the estimator (2.3) is called the Piecewise Geometric Estimator (PEGE). It remains to define estimators of the failure rates $q_1, \ldots, q_{n_1}$. This was originally done by separately obtaining the maximum likelihood estimators of the parameters of $n_1$ truncated geometric distributions; the procedure is outlined at a later stage because it utilizes the geometric structure of (2.3) and therefore provides further motivation for the name 'PEGE'. A more straightforward but less appealing approach is to obtain the maximum likelihood estimates of $q_1, \ldots, q_{n_1}$ directly: denoting by $L$ the likelihood of the sample, substituting (2.3) into the expression for $L$ and differentiating yields the unique maximum likelihood estimates
$$\hat{q}_i = D_i / T_i, \quad i = 1, \ldots, n_1.$$
Substituting $\hat{q}_1, \ldots, \hat{q}_{n_1}$ into (2.3), we finally obtain our estimator, formally defined as follows.

DEFINITION 2.1. The Piecewise Geometric Estimator (PEGE) of the survival function of the lifelength random variable $X$ is defined as follows:
$$\hat{P}(X > x_k) = \begin{cases} 1 & \text{for } x_k < 0 \text{ or } n_1 = 0, \\[4pt] (1 - D_i/T_i)^{\#\{m:\ t_{i-1} < x_m \le x_k\}} \displaystyle\prod_{j=1}^{i-1} (1 - D_j/T_j)^{\#\{m:\ t_{j-1} < x_m \le t_j\}} & \text{for } t_{i-1} < x_k \le t_i,\ i = 1, \ldots, n_1,\ n_1 > 0, \\[4pt] \displaystyle\prod_{j=1}^{n_1} (1 - D_j/T_j)^{\#\{m:\ t_{j-1} < x_m \le t_j\}} & \text{for } x_k > t_{n_1},\ n_1 > 0. \end{cases}$$

The alternative derivation of the PEGE emphasizes its geometric structure: it turns out that $\hat{q}_1, \ldots, \hat{q}_{n_1}$ defined above are maximum likelihood estimators of the parameters of the truncated geometric distributions $G_1^*, \ldots, G_{n_1}^*$ defined below.
For $i = 1, \ldots, n_1$ we formulate the following definitions. Let $N_i = \#\{m : t_{i-1} < x_m \le t_i\}$ be the number of possible times of failure in the interval $(t_{i-1}, t_i]$ and let $X_i^*$ be the number of possible times of failure that a unit of age $t_{i-1}$ survives, that is, $X_i^*$ = number of trials to failure of a unit of age $t_{i-1}$, where the possible values of $X_i^*$ are assumed to be $1, 2, \ldots, N_i, N_i^+$. The distribution $G_i^*$ of $X_i^*$ is then given by
$$\bar{G}_i^*(k) = (1 - q_i)^k \quad \text{for } k = 1, 2, \ldots, N_i, \qquad \bar{G}_i^*(N_i^+) = 0.$$
The information available for estimating $q_i$ consists of $n\bar{F}_n(t_{i-1})$ observations on $X_i^*$: of these, $D_i$ are equal to $N_i$, $n\bar{F}_n(t_i)$ are equal to $N_i^+$, and for all $s_j$ in the interval $(t_{i-1}, t_i]$, $C_j$ are equal to the number $\#\{m : t_{i-1} < x_m \le s_j\}$. The resultant maximum likelihood estimator of $q_i$ is precisely $\hat{q}_i$ defined above.

It is evident that the estimators $\hat{q}_1, \ldots, \hat{q}_{n_1}$ have the form of the usual maximum likelihood estimator of a geometric parameter, that is,
$$\text{Estimated failure rate} = \frac{\text{number of failures observed}}{\text{total time on test}}.$$
Moreover, we note that this is the form of the failure rate estimators in the intervals $(t_0, t_1], \ldots, (t_{n_1}, \infty)$ defined for the Piecewise Exponential Estimator (PEXE) of Kitchin, Langberg and Proschan (1983). In terms of our notation (modified for continuity), the PEXE is defined as follows:
$$P^*(X > t) = \begin{cases} 1 & \text{for } t < 0 \text{ or } n_1 = 0, \\[4pt] \exp[-(t - t_{i-1})\hat{\lambda}_i] \displaystyle\prod_{j=1}^{i-1} \exp[-(t_j - t_{j-1})\hat{\lambda}_j] & \text{for } t_{i-1} < t \le t_i,\ i = 1, \ldots, n_1,\ n_1 > 0, \\[4pt] \displaystyle\prod_{j=1}^{n_1} \exp[-(t_j - t_{j-1})\hat{\lambda}_j] & \text{for } t > t_{n_1},\ n_1 > 0, \end{cases} \tag{2.5}$$
where
$$\hat{\lambda}_i = 1/\tau_i \quad \text{and} \quad \tau_i = \int_{t_{i-1}}^{t_i} n\bar{F}_n(u)\,du \quad \text{for } i = 1, \ldots, n_1.$$
For $i = 1, \ldots, n_1$, $\hat{\lambda}_i$ is the failure rate in the interval $(t_{i-1}, t_i]$ and $\tau_i$ is the total time on test in this interval.
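The failure rate estimates $D_i/T_i$ and the PEGE of Definition 2.1 can be sketched in a few lines of Python. The sketch below is illustrative only: it assumes the possible failure times form the integer lattice $x_m = m$ (so that $\#\{m : a < x_m \le b\} = b - a$), and the function name `pege`, its arguments and the toy sample are hypothetical, not part of the original text.

```python
from collections import Counter


def pege(times, deltas):
    """Sketch of the PEGE, assuming the lattice of possible times is x_m = m.

    times  : observed Z_j = min(X_j, Y_j), positive integers
    deltas : 1 if observation j is a failure (X_j <= Y_j), 0 if censored
    Returns the distinct failure times, the estimates D_i / T_i, and a
    function giving the estimated P(X > x) for integer x.
    """
    fails = Counter(z for z, d in zip(times, deltas) if d == 1)  # D_i
    cens = Counter(z for z, d in zip(times, deltas) if d == 0)   # C_k
    t = sorted(fails)            # t_1 < ... < t_{n_1}
    q_hat = []
    prev = 0                     # t_0 = 0
    for ti in t:
        n_i = ti - prev          # N_i: possible failure times in (t_{i-1}, t_i]
        survivors = sum(1 for z in times if z > ti)   # n * Fbar_n(t_i)
        total = n_i * (survivors + fails[ti])
        # Units censored at s_k in (t_{i-1}, t_i] contribute s_k - t_{i-1} each.
        total += sum((s - prev) * c for s, c in cens.items() if prev < s <= ti)
        q_hat.append(fails[ti] / total)               # D_i / T_i
        prev = ti

    def survival(x):
        if x < 0 or not t:
            return 1.0
        s, left = 1.0, 0
        for ti, q in zip(t, q_hat):
            if x <= left:
                break
            s *= (1.0 - q) ** (min(x, ti) - left)
            left = ti
        return s

    return t, q_hat, survival


# Toy sample (hypothetical): failures at 2, 3, 5; censored units at 3 and 6.
t, q_hat, S = pege([2, 3, 3, 5, 6], [1, 0, 1, 1, 0])
print(t, q_hat, [round(S(x), 4) for x in range(1, 7)])
```

For this toy sample the total times on test are $T_1 = 10$, $T_2 = 4$ and $T_3 = 4$, so the estimated failure rates are 0.1, 0.25 and 0.25, and the estimate decreases at every lattice point up to the last observed failure rather than only at the observed failure times.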
The PEXE is a piecewise exponential function because its construction is based on the assumption of constant failure rate between observed failures: just as a constant discrete failure rate characterizes a geometric distribution, so a constant continuous failure rate characterizes an exponential distribution. Thus the PEGE is the discrete counterpart of the PEXE.

Returning to our introductory discussion about the desirable features of survival function estimators, we now compare the PEGE with other estimators in terms of these and other features. First, the PEGE is intuitively pleasing because it reflects the continuity inherent in any life process. The Kaplan-Meier and other estimators that are step functions do not have this property. Second, we note that the PEGE utilizes interval information from both censored and uncensored observations. It is therefore more sophisticated than the Kaplan-Meier and Nelson estimators. Moreover, none of the estimators of Whittemore and Keller utilizes more information than does the PEGE. Third, the PEGE provides a simple, useful estimator of the failure rate function. While this estimator is naive compared with the nonlinear estimators of Whittemore and Keller, the PEGE has the advantage of being simple enough to calculate by hand; moreover, it requires only marginally more computational effort than does the Kaplan-Meier estimator.

Regarding the applicability of the PEGE, we note that use of the PEGE is not restricted to discrete distributions because it can be easily modified by linear interpolation or by being defined as continuous wherever necessary. This is theoretically justified by the fact that the integer part of an exponential random variable has a geometric distribution: by defining the PEGE to be continuous, we are merely defining a variant of the PEXE. The properties of this estimator follow immediately from those of the PEXE.
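The fact used above, that the integer part of an exponential random variable is geometric, can be checked directly. In the sketch below, $W$, $\lambda$ and $q$ are illustrative symbols not used in the text: $W$ is exponential with rate $\lambda$, and $q = 1 - e^{-\lambda}$.

```latex
\begin{align*}
P(\lfloor W \rfloor \ge k) &= P(W \ge k) = e^{-\lambda k} = (1 - q)^k,
  \qquad k = 0, 1, 2, \ldots, \\
P(\lfloor W \rfloor = k) &= (1 - q)^k - (1 - q)^{k+1} = q\,(1 - q)^k .
\end{align*}
```

So, on a unit lattice, a constant continuous failure rate $\lambda$ corresponds to a constant discrete failure rate $q = 1 - e^{-\lambda}$.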
Finally, apart from being intuitively pleasing, the form of the PEGE allows reasonable estimates of both the survival function and its percentiles. The Kaplan-Meier estimator is known to overestimate because of its step function form. We show in a later section that the PEGE tends to be less than the Kaplan-Meier estimator, and therefore the PEGE may be more accurate than the Kaplan-Meier estimator. Whittemore and Keller give some favourable indications in this respect. They define three survival function estimators that have constant failure rate between observed failure times. One of these is the PEXE, modified for ties in the data: the form of the failure rate estimator is the same as the form of the PEGE failure rate estimator. Specifically, for $i = 1, \ldots, n_1$,
$$\text{Estimated failure rate in } (t_{i-1}, t_i] = \frac{\text{number of failures observed at } t_i}{\text{total time on test during } (t_{i-1}, t_i]}. \tag{2.6}$$
The second of these estimators is defined instead on intervals of the form $[t_{i-1}, t_i)$: for $i = 1, \ldots, n_1$, the failure rate estimator has the form
$$\text{Estimated failure rate in } [t_{i-1}, t_i) = \frac{\text{number of failures observed at } t_{i-1}}{\text{total time on test during } [t_{i-1}, t_i)}. \tag{2.7}$$
The third of these estimators is obtained from the average of the two failure rate estimators described by (2.6) and (2.7). In a simulation study to investigate the small sample properties of these three estimators, Whittemore and Keller find that the first estimator tends to underestimate the survival function while the second tends to overestimate the survival function. From these results, we expect the PEGE to underestimate the survival function and its percentiles. Whittemore and Keller do not record further results for the first two estimators; however, they do indicate that, in terms of bias at extreme percentiles, variance and mean square error, the third estimator tends to be better than the Kaplan-Meier estimator.
The implications for the discrete version of the third estimator are that, in terms of bias, variance and mean square error, it will compare favourably with the Kaplan-Meier estimator. An unanswered question is whether the performance of this estimator is so superior to the performance of the PEGE as to warrant the additional computational effort required for the former.

3. Asymptotic properties of the PEGE

This section treats the asymptotic properties of the PEGE and of the corresponding failure rate function estimator. The properties of primary interest are those of consistency and asymptotic normality; secondary issues are asymptotic bias and asymptotic correlation. Initially considering a very general model, we obtain the limiting function of the PEGE and show that the sequences $\{\hat{P}_n(X > x_k)\}_{k=1}^{\infty}$ and $\{\hat{P}_n(X = x_k \mid X \ge x_k)\}_{k=1}^{\infty}$ converge in distribution to Gaussian sequences. We then explore the effects of making various assumptions about the lifelength and censoring random variables. Under the most general model, the PEGE is not consistent and the failure rate estimators are not asymptotically uncorrelated: a sufficient condition for consistency is independence between corresponding lifelength and censoring random variables, and a sufficient condition for asymptotically independent failure rate estimators is that the censoring random variables be identically distributed. However, it is not necessary to impose both of these conditions in order to ensure both consistency and asymptotic independence of the failure rate estimators: relaxing the condition of independent lifelength and censoring random variables, we give conditions under which both desirable properties are obtained.
Before investigating the asymptotic properties of the PEGE, we describe the theoretical framework of the problem, give some notation, and present a preliminary result that facilitates the exploration of the asymptotic properties of the PEGE.

The probability space $(\Omega, \mathcal{A}, P)$ on which all of the lifelength and censoring random variables are defined is envisaged as the infinite product probability space that may be constructed in the usual way from the sequence of probability spaces corresponding to the sequence of independent random pairs $(X_1, Y_1), (X_2, Y_2), \ldots$. Thus $\Omega$ consists of all possible sequences of pairs of outcomes corresponding to pairs of realizations in $\mathcal{X} \times \mathcal{Y}$: the first member of each pair corresponds to failure at a particular time and the second member of each pair corresponds to censorship at a particular time. That is, for each $\omega$ in $\Omega$, $k = 1, 2, \ldots$ and $j = 1, 2, \ldots$,
$$(X_i, Y_i)(\omega) = (X_i(\omega), Y_i(\omega)) = (x_k, y_j)$$
if the $i$th element of the infinite sequence $\omega$ is the pair of outcomes corresponding to failure at $x_k$ and censorship at $y_j$. The argument $\omega$ is omitted wherever possible.

Two conditions are imposed on the random pairs $(X_1, Y_1), (X_2, Y_2), \ldots$:

(A1) There is a distribution function $F$ such that
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} P(Z_i \le x_k) = F(x_k) \quad \text{for } k = 1, 2, \ldots.$$

(A2) There is a subdistribution function $F^1$ such that
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} P(Z_i \le x_k,\ \delta_i = 1) = F^1(x_k) \quad \text{for } k = 1, 2, \ldots.$$

It is evident that a sufficient condition for (A1) and (A2) is that the censoring random variables be identically distributed.

Definitions of symbols used in this section are given below. Assumptions (A1) and (A2) ensure the existence of the limits defined. Let
$$P_{ki} = P(Z_i = x_k,\ \delta_i = 1) \quad \text{for } k = 1, 2, \ldots \text{ and } i = 1, \ldots, n,$$
$$R_{ki} = P(Z_i = x_k,\ \delta_i = 0) \quad \text{for } k = 1, 2, \ldots \text{ and } i = 1, \ldots, n,$$
$$P_k = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} P_{ki} = F^1(x_k) - F^1(x_{k-1}) \quad \text{for } k = 1, 2, \ldots,$$
$$R_k = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} R_{ki} \quad \text{for } k = 1, 2, \ldots.$$

The proposition below is fundamental: it asserts that, with probability one, as the sample size increases to infinity, at least one failure is observed at every possible value of the lifelength random variable. First, we need a definition.

DEFINITION 3.1. Let $\Omega^* \subset \Omega$ be the set of infinite sequences which contain, for each possible failure time, at least one element corresponding to the outcome of observing failure at that time, that is,
$$\Omega^* = \{\omega : (\forall k)(\exists n)\ X_n(\omega) = x_k,\ Y_n(\omega) \ge x_k\}.$$

PROPOSITION 3.2. $P(\Omega^*) = 1$.

The proposition is proven by showing that the set of infinite sequences that do not contain at least one element corresponding to the outcome of observing failure at each possible failure time $x_k$ has probability zero, that is,
$$P\Bigl(\lim_{n \to \infty} \bigcap_{i=1}^{n} \{X_i = x_k,\ Y_i \ge x_k\}^c\Bigr) = 0 \quad \text{for } k = 1, 2, \ldots.$$
As the pairs $(X_1, Y_1), (X_2, Y_2), \ldots$ are independent, this is equivalent to proving the following equality:
$$\lim_{n \to \infty} \prod_{i=1}^{n} \bigl(1 - P(X_i = x_k,\ Y_i \ge x_k)\bigr) = 0 \quad \text{for } k = 1, 2, \ldots. \tag{3.1}$$
Since $\prod_{i=1}^{\infty} (1 - p_i) = 0$ if and only if $\sum_{i=1}^{\infty} p_i = \infty$, where $\{p_i\}_{i=1}^{\infty}$ is any sequence of probabilities, and since (A2) implies that
$$\sum_{i=1}^{\infty} P(X_i = x_k,\ Y_i \ge x_k) = \infty \quad \text{for } k = 1, 2, \ldots,$$
we have (3.1).

The importance of the preceding proposition lies in the simplifications it allows. It turns out that, on $\Omega^*$ and for $n$ large enough, the PEGE may be expressed in simple terms of functions that have well-known convergence properties. Since $P(\Omega^*) = 1$, we need consider the asymptotic properties of the PEGE on $\Omega^*$ alone: these properties are easily obtained from those of the well-known functions.

In order to express the PEGE in this convenient way, we view the estimation procedure in an asymptotic context. Suppose $\omega$ is chosen arbitrarily from $\Omega^*$. Then, for each $k$, there is an $N$ (depending on $k$ and $\omega$) such that $X_i(\omega) = x_j$ and $Y_i(\omega) \ge x_j$ for $j = 1, \ldots,$
$k$ and some $i \le N$. Consequently, for $n \ge N$, the smallest $k$ distinct observed failure times $t_1, \ldots, t_k$ are merely $x_1, \ldots, x_k$, and, since the set of possible censoring times is contained in $\mathcal{X}$, the smallest $k$ distinct observed times are also $x_1, \ldots, x_k$. The first $k$ intervals between observed failure times are simply $(0, x_1], (x_1, x_2], \ldots, (x_{k-1}, x_k]$, and the function $T_{i,n}$ defined on the $i$th interval is given by the number of units on test just before the end of the $i$th interval, that is,
$$T_{i,n} = n\bar{F}_n(x_i^-) = n\bar{F}_n(x_{i-1}) \quad \text{for } i = 1, \ldots, k \text{ and } n \ge N. \tag{3.2}$$
Likewise, we express the function $D_{i,n}$ defined on the $i$th interval in terms of the empirical subdistribution function $F_n^1$ as follows:
$$D_{i,n} = n\bigl[F_n^1(x_i) - F_n^1(x_{i-1})\bigr] \quad \text{for } i = 1, \ldots, k \text{ and } n \ge N. \tag{3.3}$$
As the PEGE is a function of $D_{i,n}$ and $T_{i,n}$, it can be expressed in terms of the empirical functions $\bar{F}_n$ and $F_n^1$. Specifically, on $\Omega^*$, for any choice of $k$, there is an $N$ such that
$$\hat{P}_n(X > x_k) = \prod_{i=1}^{k} \left(1 - \frac{F_n^1(x_i) - F_n^1(x_{i-1})}{\bar{F}_n(x_{i-1})}\right) \quad \text{for } n \ge N.$$
Consequently, taking the limit of each side and using Proposition 3.2, we have
$$P\left[\lim_{n \to \infty} \hat{P}_n(X > x_k) = \lim_{n \to \infty} \prod_{i=1}^{k} \left(1 - \frac{F_n^1(x_i) - F_n^1(x_{i-1})}{\bar{F}_n(x_{i-1})}\right) \text{ for } k = 1, 2, \ldots\right] = 1.$$
In exploring the asymptotic behaviour of the PEGE, therefore, we consider the behaviour of the limiting sequence of the sequence
$$\left\{\prod_{i=1}^{k} \left(1 - \frac{F_n^1(x_i) - F_n^1(x_{i-1})}{\bar{F}_n(x_{i-1})}\right)\right\}_{k=1}^{\infty}.$$
The proofs of the results that follow are omitted in the interest of brevity.

The most general model we consider is that in which only conditions (A1) and (A2) are imposed. The following theorem identifies the limits of the sequences $\{\hat{P}_n(X = x_k \mid X \ge x_k)\}_{n=1}^{\infty}$ and $\{\hat{P}_n(X > x_k)\}_{n=1}^{\infty}$ for $k = 1, 2, \ldots$ and establishes that the sequences $\{\hat{P}_n(X = x_k \mid X \ge x_k)\}_{k=1}^{\infty}$ and $\{\hat{P}_n(X > x_k)\}_{k=1}^{\infty}$ converge to Gaussian sequences.

THEOREM 3.3. (i) With probability 1,
$$\lim_{n \to \infty} \hat{P}_n(X = x_k \mid X \ge x_k) = \frac{F^1(x_k) - F^1(x_{k-1})}{\bar{F}(x_{k-1})} \quad \text{for } k = 1, 2, \ldots.$$
(ii) With probability 1, fi(Fl(xi)-Fl(xi-1)) lira /~.(X > xk) = 1- ~~ i= 1 (iii) Let kl, . . . , kM be kl < k2 < " " < kM. Then M (P~(X = XkllX>~ xk, ) . . . . . ~_ for k = 1, 2 . . . . . F ( x i - l) arbitrarily chosen integers ffn(X = XkM]X>~ XkM)) is AN such (1) /~*, - Z* n where ~,* = (P~,/~(xk, _ , ) . . . . . q-1 ~q~ = PkqPkr 2 i=1 P,~,JF(xkM- ,)) , r--, 2 (~kinkj ~- ~kM+ki, kj ~[- aki, kM+kj -~ ~kM+ki, kM+k j) j=l /(~(x,~q_ ~ ) ~ ( ~ , _ ,)y r--1 "]- Pkr 2 (ffkM + kq,ki q- ~kM + kq, kM + ki)/((F(xkr-1 ))2F(xkq 1)) i=1 q-I + Pk~ ~ (ok,~ + a~.+,~.,D/(~(xk ,)(?(xk ,))2) i=1 + akM + k,. kr/(ff(Xkq - 1))F(Xkr-,) for q < r. that , 263 Piecewise geometric estimation of a survivalfunction lim 1 ~ P ~ , , ( 1 n~:x~ n -lim 1 ~ n~oo P~.;) for q = r, q = 1. . . . , M , i=1 n Pl,.,iPk~,, forq<r,q= 1. . . . . M, i= 1 r= 1,...,M, - l i m 1 ~ Pkq_M. iRkr. i r= n~oo n forq =M+ 1, . . . , 2M, i= 1 r = 1, . . . , M , lim 1 ~ R~q_M,i(l_ Rk~_M,;) f o r q = r , q = M + n~oo n l, i= 1 .... 2M, - lim 1 ~ n~oo n i= 1 Rkq_M,iRk,_m,i forq<r,q-M+ 2M, r = M + (iv) Let k I . . . . , kM be kl<k2<...<k M. Then M arbitrarily (/~.(X> xk, ) . . . . , P . ( X > XkM)) is A N chosen integers (1) p**, - Z** . (1 - P,./F(x,_ ,)1 . . . . . \i=l Z** / ..... M;r=l ..... kq kr Cry** ---- 1--[ (1 -- P / i f ( x , _ 1)) ~ I i=1 such that , ) M, (1 -- e j / ~ ' ( x j _ 1)) j=l kq ' Z l=1 2M. l-I (1 - P / i f ( x , _ 1)) , i=1 ].tJqr f q = l 1. . . . . n where p** = 1. . . . . kr 20"/*m/[(1 m=l -- e l / f f ( X l _ 1)(1 - Pro~if(x,,,_ 1))1 for q <~ r. It is evident from the theorem above that the P E G E is a strongly consistent estimator of the underlying survival function if and only if F l ( x k ) - F l ( x k - 1) _ P ( X : xk) ff(Xk_l) for k = 1, 2 . . . . . (3.4) P(X>/x~) The theorems below give conditions under which this equality holds. As for correlation, it is evident from the structure of the P E G E that any two elements of the sequence {Pn(X> xg))ff= 1 are correlated. 
Consequently the matrix 2~** G. M . M i m m a c k and F. Proschan 264 cannot be reduced to a diagonal matrix under even the most stringent conditions. However it turns out that, under certain conditions, the asymptotic correlation between pairs of the sequence {/Sn(X = xklX>~ x~)}ff= 1 is z e r o - - t h a t is, 1;* is a diagonal matrix. The following theorem shows that independence between lifelength and censoring random variables results in strongly consistent (and therefore asymptotically unbiased) estimators. However any pair in the sequence {/~n(X= xklY>~xk)}2= 1 is asymptotically correlated in this case. Since the matrices ~2" and Z** have the same form as in the theorem above, they are not explicitly defined below. THEOREM 3.4. Suppose (i) the random variables X i and Y,. are independent for i = 1, 2 , . . . , and (ii) there is a distribution function H such that 1 n lim -- Z P(Y¢<<-x~)=H(x*) n~°° fork= 1, 2 , . . . . n i=1 Then k (iii) F l ( x k ) = ~ P ( X = x i ) H ( x i_ 1) and ff(x~) = P ( X > x k ) H ( x ~ ) for k = 1, 2 . . . . i=1 (iv) with probability 1, nlim/Sn(X>xk)=G(xk) for k= 1,2 ..... (v) (/Sn(X = Xk, IX>~xk,), . . . , /S,~(X = xk,~lX>~xk,,,)) is AN (1) ~*, - 22* , n where k~ < k2 < "'" < k M are arbitrarily chosen integers and i,* = ( P ( x = xk~lX>_, x k , ) . . . . . P ( X = xk,~bX>-- x,,~,)). (vi) (/Sn(X> xk,), . . . , f i n ( X > XkM)) is AN (1) ~**, - 22** , n where k~ < k 2 < " ' < k M are arbitrarily chosen integers and ** = ( e ( x > x < ) , . . . , P ( X > XkM)) . A sufficient condition for (A1), (A2) and assumption (ii) of the preceding theorem is that the censoring random variables be identically distributed. In this case the failure rate estimators are asymptotically independent and the matrix Z** Piecewise geometric estimation of a survival function 265 is somewhat simplified: The conditions of the following corollary define the model of random censorship widely assumed in the literature. COROLLARY 3.5. 
Suppose (i) the random variables $X_i$ and $Y_i$ are independent for $i = 1, 2, \ldots$, and (ii) the random variables $Y_1, Y_2, \ldots$ are identically distributed. Then

(iii) with probability 1, $\lim_{n \to \infty} \hat{P}_n(X > x_k) = \bar{G}(x_k)$ for $k = 1, 2, \ldots$;

(iv) $(\hat{P}_n(X = x_{k_1} \mid X \ge x_{k_1}), \ldots, \hat{P}_n(X = x_{k_M} \mid X \ge x_{k_M}))$ is AN$\bigl(\mu^*, \tfrac{1}{n}\Sigma^*\bigr)$, where $\mu^* = (P(X = x_{k_1} \mid X \ge x_{k_1}), \ldots, P(X = x_{k_M} \mid X \ge x_{k_M}))$, $\Sigma^* = \{\sigma^*_{qr}\}_{q=1,\ldots,M;\ r=1,\ldots,M}$, and
$$\sigma^*_{qr} = \begin{cases} P(X = x_{k_q} \mid X \ge x_{k_q})\,P(X > x_{k_q} \mid X \ge x_{k_q}) / \bar{F}(x_{k_q - 1}) & \text{for } q = r, \\ 0 & \text{for } q \ne r; \end{cases}$$

(v) $(\hat{P}_n(X > x_{k_1}), \ldots, \hat{P}_n(X > x_{k_M}))$ is AN$\bigl(\mu^{**}, \tfrac{1}{n}\Sigma^{**}\bigr)$, where $\mu^{**} = (P(X > x_{k_1}), \ldots, P(X > x_{k_M}))$, $\Sigma^{**} = \{\sigma^{**}_{qr}\}_{q=1,\ldots,M;\ r=1,\ldots,M}$, and
$$\sigma^{**}_{qr} = P(X > x_{k_q})\,P(X > x_{k_r}) \sum_{i=1}^{k_q} \frac{P(X = x_i \mid X \ge x_i)}{\bar{F}(x_{i-1})\,P(X > x_i \mid X \ge x_i)} \quad \text{for } q \le r.$$

Having dealt with the most restrictive case in which the lifelength and censoring random variables are assumed to be independent, we now consider relaxing this condition. It turns out that independence between corresponding lifelength and censoring random variables is not necessary for asymptotic independence between pairs of the sequence of failure rate estimators: if the censoring random variables are assumed to be identically distributed but not necessarily independent of the corresponding lifelength random variables, then the failure rate estimators are asymptotically independent. However, both the survival function and failure rate estimators are asymptotically biased. The following corollary expresses these facts formally.

COROLLARY 3.6. Suppose (i) the random variables $Y_1, Y_2, \ldots$ are identically distributed. Then

(ii) $P_k = P(Z = x_k,\ \delta = 1)$ and $\bar{F}(x_k) = P(Z > x_k)$ for $k = 1, 2, \ldots$;

(iii) $(\hat{P}_n(X = x_{k_1} \mid X \ge x_{k_1}), \ldots, \hat{P}_n(X = x_{k_M} \mid X \ge x_{k_M}))$ is AN$\bigl(\mu^*, \tfrac{1}{n}\Sigma^*\bigr)$, where $\mu^* = (P_{k_1}/\bar{F}(x_{k_1 - 1}), \ldots, P_{k_M}/\bar{F}(x_{k_M - 1}))$, $\Sigma^* = \{\sigma^*_{ij}\}_{i=1,\ldots,M;\ j=1,\ldots,M}$, and
$$\sigma^*_{ij} = \begin{cases} P_{k_i}\bigl(1 - P_{k_i}/\bar{F}(x_{k_i - 1})\bigr) / \bigl(\bar{F}(x_{k_i - 1})\bigr)^2 & \text{for } i = j, \\ 0 & \text{for } i \ne j; \end{cases}$$

(iv) $(\hat{P}_n(X > x_{k_1}), \ldots, \hat{P}_n(X > x_{k_M}))$ is AN$\bigl(\mu^{**}, \tfrac{1}{n}\Sigma^{**}\bigr)$, where
$$\mu^{**} = \left(\prod_{i=1}^{k_1} \bigl(1 - P_i/\bar{F}(x_{i-1})\bigr), \ldots, \prod_{i=1}^{k_M} \bigl(1 - P_i/\bar{F}(x_{i-1})\bigr)\right),$$
$\Sigma^{**} = \{\sigma^{**}_{jl}\}_{j=1,\ldots,M;\ l=1,\ldots,M}$, and
$$\sigma^{**}_{jl} = \prod_{i=1}^{k_j} \bigl(1 - P_i/\bar{F}(x_{i-1})\bigr) \prod_{m=1}^{k_l} \bigl(1 - P_m/\bar{F}(x_{m-1})\bigr) \sum_{r=1}^{k_j} \frac{P_r}{\bigl(\bar{F}(x_{r-1})\bigr)^2 \bigl(1 - P_r/\bar{F}(x_{r-1})\bigr)} \quad \text{for } j \le l.$$

The corollaries above give sufficient (rather than necessary) conditions for the two desirable properties of (i) consistency and (ii) asymptotic independence between pairs of the sequence of failure rate estimators $\{\hat{P}_n(X = x_k \mid X \ge x_k)\}_{k=1}^{\infty}$. The final corollaries show that both of the conditions of Corollary 3.5 are not necessary for these two desirable properties: the conditions specified in these corollaries are not so stringent as to require that corresponding censoring and lifelength random variables be independent (as in Corollary 3.5), but rather that they be related in a certain way.

COROLLARY 3.7. If the random variables $Y_1, Y_2, \ldots$ are identically distributed, then with probability 1, $\lim_{n \to \infty} \hat{P}_n(X > x_k) = \bar{G}(x_k)$ for $k = 1, 2, \ldots$ if and only if
$$P(Y_i \ge x_k \mid X = x_k) = P(Y_i \ge x_k \mid X \ge x_k) \quad \text{for } k = 1, 2, \ldots \text{ and } i = 1, 2, \ldots.$$

COROLLARY 3.8. Suppose (i) the random variables $Y_1, Y_2, \ldots$ are identically distributed, and (ii) $P(Y_i \ge x_k \mid X = x_k) = P(Y_i \ge x_k \mid X \ge x_k)$ for $k = 1, 2, \ldots$ and $i = 1, 2, \ldots$. Then

(iii) $(\hat{P}_n(X = x_{k_1} \mid X \ge x_{k_1}), \ldots, \hat{P}_n(X = x_{k_M} \mid X \ge x_{k_M}))$ is AN$\bigl(\mu^*, \tfrac{1}{n}\Sigma^*\bigr)$, where $\mu^* = (P(X = x_{k_1} \mid X \ge x_{k_1}), \ldots, P(X = x_{k_M} \mid X \ge x_{k_M}))$, $\Sigma^* = \{\sigma^*_{ij}\}_{i=1,\ldots,M;\ j=1,\ldots,M}$, and
$$\sigma^*_{ij} = \begin{cases} P(X = x_{k_i} \mid X \ge x_{k_i})\,P(X > x_{k_i} \mid X \ge x_{k_i}) / \bar{F}(x_{k_i - 1}) & \text{for } i = j, \\ 0 & \text{for } i \ne j; \end{cases}$$

(iv) $(\hat{P}_n(X > x_{k_1}), \ldots, \hat{P}_n(X > x_{k_M}))$ is AN$\bigl(\mu^{**}, \tfrac{1}{n}\Sigma^{**}\bigr)$, where $\mu^{**} = (P(X > x_{k_1}), \ldots, P(X > x_{k_M}))$, $\Sigma^{**} = \{\sigma^{**}_{jl}\}_{j=1,\ldots,M;\ l=1,\ldots,M}$, and
$$\sigma^{**}_{jl} = P(X > x_{k_j})\,P(X > x_{k_l}) \sum_{i=1}^{k_j} \frac{P(X = x_i \mid X \ge x_i)}{\bar{F}(x_{i-1})\,P(X > x_i \mid X \ge x_i)} \quad \text{for } j \le l.$$
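The role of the condition in Corollary 3.7 can be traced in one line. The sketch below is an illustration rather than the omitted proof. It assumes the setting of Corollary 3.6(ii), so that $P_k = P(Z = x_k, \delta = 1)$ and $\bar{F}(x_{k-1}) = P(Z > x_{k-1})$; it writes $Y$ for a generic censoring variable; it assumes that the possible values are ordered $x_1 < x_2 < \cdots$, so that $Z > x_{k-1}$ exactly when $X \ge x_k$ and $Y \ge x_k$; and it assumes that all conditioning events have positive probability.

```latex
\begin{align*}
\frac{P_k}{\bar F(x_{k-1})}
  &= \frac{P(X = x_k,\ Y \ge x_k)}{P(X \ge x_k,\ Y \ge x_k)} \\
  &= \frac{P(X = x_k)}{P(X \ge x_k)} \cdot
     \frac{P(Y \ge x_k \mid X = x_k)}{P(Y \ge x_k \mid X \ge x_k)} .
\end{align*}
```

The second factor equals 1 exactly when the condition of Corollary 3.7 holds. In that case the limiting failure rate $P_k/\bar{F}(x_{k-1})$ coincides with the true failure rate $P(X = x_k)/P(X \ge x_k)$ at every $x_k$, which is what consistency of the PEGE requires.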
The last two corollaries are of special interest because they deal with consistency and asymptotic independence in the case of dependent lifelength and censoring random variables, a situation that is not generally considered despite its obvious practical significance. Desu and Narula (1977), Langberg, Proschan and Quinzi (1981) and Kitchin, Langberg and Proschan (1983) consider the continuous version of the model specified in the last two corollaries. The condition specifying the relationship between lifelength and censoring random variables is in fact a mild one: re-expressing it, we have the following condition:
$$\frac{P(X = x_k \mid X \ge x_k,\ Y_i \ge x_k)}{P(X \ge x_k \mid X \ge x_k,\ Y_i \ge x_k)} = \frac{P(X = x_k)}{P(X \ge x_k)} \quad \text{for } k = 1, 2, \ldots \text{ and } i = 1, 2, \ldots.$$
This condition specifies that the failure rate among those under observation at any particular age is the same as the failure rate of the whole population of that age. It is evident both intuitively and mathematically that this is a fundamental assumption inherent in the process of estimating a life distribution from incomplete data: if this assumption could not be made, the data available would be deemed inadequate for estimating the life distribution. Formally, it is the fact that the condition is both necessary and sufficient for consistency that indicates that it is minimal for the estimation process. It is clear, therefore, that the last two corollaries play an important role in estimation in the context of a practical model more general than the statistically convenient, but unnecessarily restrictive, model of random censorship.

4. The PEGE compared with rivals

In Section 1 we motivate the construction of the PEGE by describing some desirable properties of nonparametric survival function estimators and then mentioning that the commonly used estimator of Kaplan and Meier (1958) does not fare well in terms of these properties. We now compare the PEGE with the Kaplan-Meier estimator.
We begin with the most obvious desirable features of survival function estimators and then consider statistical and mathematical properties. In comparing the two estimators, we find that the issue of continuity arises and that the PEXE enters the comparison. The section ends with an example using real data. The subsequent section continues the comparison: we discuss the results of simulation studies.

The Kaplan-Meier estimator (KME) of the survival function of the lifelength random variable $X$ is defined as follows:
$$\tilde{P}(X > t) = \begin{cases} 1 & \text{for } n_1 = 0, \text{ or } t < t_1,\ n_1 \ge 1, \\[4pt] \displaystyle\prod_{j=1}^{i-1} \bigl(1 - D_j / n\bar{F}_n(t_j^-)\bigr) & \text{for } t_{i-1} \le t < t_i,\ i = 2, \ldots, n_1,\ n_1 \ge 2, \\[4pt] \displaystyle\prod_{j=1}^{n_1} \bigl(1 - D_j / n\bar{F}_n(t_j^-)\bigr) & \text{for } t \ge t_{n_1},\ n_1 \ge 1. \end{cases}$$

To the prospective user of a survival function estimator, two fundamental questions are, firstly, does the estimating function have the appearance of a survival function, and secondly, is it easy to compute? Considering the second question first, we observe that calculating the PEGE involves only marginally more effort than calculating the KME. Therefore, both estimators are accessible to users equipped with only hand calculators. The first question is a deeper one. If the sample is small or if there are many ties among the uncensored observations in a large sample, the KME has only a few steps and consequently appears unrealistic. The PEGE, in contrast, reflects the continuity inherent in any life process by decreasing at every possible failure time, not only at the observed failure times. As the number of distinct uncensored observations increases, both the PEGE and the KME become smoother: the many steps of the KME do allow it the appearance of a survival function, except possibly at the right extreme, since there is no way of extrapolating very far beyond the range of observation if the KME is used. (There are several ways of extrapolating from the PEGE.)
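For comparison with the PEGE, the KME defined above can be computed with a short sketch. The function name `kaplan_meier`, its arguments and the toy sample are hypothetical and serve only as an illustration; the number at risk just before $t_j$, $n\bar{F}_n(t_j^-)$, is computed as the count of observations with $Z \ge t_j$.

```python
from collections import Counter


def kaplan_meier(times, deltas):
    """Sketch of the KME as a right-continuous step function.

    times  : observed Z_j = min(X_j, Y_j)
    deltas : 1 if observation j is a failure, 0 if censored
    Returns the distinct failure times and a function giving P(X > x).
    """
    fails = Counter(z for z, d in zip(times, deltas) if d == 1)  # D_j
    t = sorted(fails)            # t_1 < ... < t_{n_1}
    steps = []
    s = 1.0
    for tj in t:
        at_risk = sum(1 for z in times if z >= tj)   # n * Fbar_n(t_j^-)
        s *= 1.0 - fails[tj] / at_risk
        steps.append(s)

    def survival(x):
        out = 1.0                # value before the first observed failure
        for tj, sj in zip(t, steps):
            if x >= tj:
                out = sj         # the estimate drops only at observed failures
        return out

    return t, survival


# Toy sample (hypothetical): failures at 2, 3, 5; censored units at 3 and 6.
t, S = kaplan_meier([2, 3, 3, 5, 6], [1, 0, 1, 1, 0])
print(t, [round(S(x), 4) for x in range(1, 7)])
```

On this toy sample the estimate is flat between observed failure times (for example, it takes the same value at 3 and at 4), which is the step-function behaviour discussed in the text.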
At face value, therefore, the PEGE is at least as attractive as the KME.

A related consideration is whether the estimator provides a realistic estimate of the failure rate function. The KME, being a step function, does not. The seriousness of this omission becomes more apparent when the KME failure rate function is examined from a user's point of view: if an item of age $t$ has a (perhaps large) chance of failing at its age, then claiming that a slightly older (or slightly younger) item cannot fail at its age seems unreasonable, particularly when it becomes evident that the claim is made on the grounds that none of the items on test happened to fail just after (or just before) time $t$. Intuitively, or from a frequentist's point of view, the very fact that one of the items on test failed at time $t$ makes it less likely that another item in the sample will fail soon after $t$, because the observed failure times should be scattered along the appropriate range according to the distribution function. Clearly, then, the gaps between observed failure times are a result of the fact that the sample is finite and are not indicative of zero (or very small) failure rates. The PEGE, on the other hand, is constructed so that a failure at time $t$, say, affects the failure rate in the gap before $t$. Thus the PEGE compensates for the lack of observations at the possible (but unobserved) failure times. The resultant failure rate function, being a step function, is still naive, but it does at least take into account the continuity of life processes and it does provide reasonable estimates of the failure rates at all possible failure times.

A more aesthetic, but nonetheless important, issue is that of information loss. Here the PEGE is again at an advantage. Although interval information about the uncensored observations is used in spacing out the successive values of the KME, the failure rate estimators utilize only ordinal information.
Moreover, the only information utilized from the censored observations is their positioning relative to the uncensored observations. Thus the information lost by the KME is of both the ordinal and interval types. In contrast, the PEGE failure rate estimators use interval information from all the observations: in particular, the positions of censored observations are taken into account precisely. In terms of information usage, then, the PEGE is far more desirable than the KME.

An apparently attractive feature of the KME is that its values are invariant under monotone transformation of the scale of measurement. The PEGE is not invariant under even linear transformation. However, in the light of the discussion about information loss, it is evident that the KME's invariance, and the PEGE's lack thereof, are results of their levels of sophistication rather than properties that can be used for comparison.

Having noted that the step function form of the KME is not pleasing, we now point out that it is also responsible for a statistical defect, namely, that the KME tends to overestimate the underlying survival function and its percentiles. The fact that the KME consistently overestimates suggests that its form is inappropriate.

Some indications about the bias of the PEGE are given by considering the relationship between the PEGE and the KME. Under certain conditions (for example, if there are no ties among the uncensored observations), the PEGE and the KME interlace: within each failure interval, the PEGE crosses the KME once from above. This is not true in general, however. It turns out that the KME may have large steps in the presence of ties. In the case of the PEGE, however, the effect of the ties is damped and the PEGE decreases slowly relative to the KME.
In general, therefore, it is possible to relate the PEGE and the KME only in a one-sided fashion: specifically, the PEGE at any observed failure time is larger than the KME at that time. Examples have been constructed to show that, in general, the PEGE cannot be bounded from above by the KME. The following theorem relates $\hat{P}$ (the PEGE) and $\tilde{P}$ (the KME).

THEOREM 4.1. (i) $\hat{P}(X > t_i) \ge \tilde{P}(X > t_i)$ for $i = 1, \ldots, n_1$.
(ii) If $n\bar{F}_n(t_{j-1}) / (n\bar{F}_n(t_{j-1}) + W_{j-1}) \le D_j / D_{j-1}$ for $j = 2, \ldots, i$, where $W_j$ denotes the number of censored observations at $t_j$ for $j = 1, \ldots, n_1$, then $\hat{P}(X > t_i) \le \tilde{P}(X > t_i-)$ for $i = 1, \ldots, n_1$.

It is evident that the condition in (ii) is met if there are no ties among the uncensored observations: this is likely if the sample is small. From the relationships in the theorem, we infer that the bias of the PEGE is likely to be of the same order of magnitude as that of the KME. Further indications about bias are given later.

Having considered some of the practical and physical features of the PEGE and the KME, we turn briefly to asymptotic properties, briefly because the PEGE and the KME are asymptotically equivalent, that is,

$$P\Big[(\forall k)\ \lim_{n \to \infty} \hat{P}_n(X > x_k) = \lim_{n \to \infty} \tilde{P}_n(X > x_k)\Big] = 1.$$

The practical implication of this is that there is little reason for strong preference of either the PEGE or the KME if the sample is very large.

We now compare the models assumed in using the KME and the PEGE. In the many studies of the KME, the most general model includes the assumption of independence between corresponding life and censoring random variables. Our most general model does not include this assumption.
However, this difference is not important because the assumption of independence is used only to facilitate the derivation of certain asymptotic properties of the KME: in fact, the definition of the KME does not depend on this assumption, and the KME and the PEGE are asymptotically equivalent under the conditions of the most general model of the PEGE. Therefore this assumption is not necessary for using the KME.

The other difference between the models assumed is that the PEGE is designed specifically for discrete life and censoring distributions while the Kaplan-Meier model makes no stipulations about the supports of these distributions. However, distinguishing between continuous and discrete random variables in this context is merely a statistical convention: in fact, time to occurrence of some event is always measured along a continuous scale, and the set of observable values is always countable because it is defined by the precision of measurement. Since the process of estimating a life distribution requires measurements, it always entails the assumption of a discrete distribution: whether the support of the estimator is continuous or discrete depends on the way the user perceives the scale of measurement. In practice, therefore, there are no differences between the models underlying the PEGE and the KME: the PEGE is appropriate whenever the KME is, and vice versa.

Having pointed out that the PEGE may be used for estimating continuous survival functions, and having introduced the PEXE as the continuous counterpart of the PEGE, we compare the two. First we note that the PEXE is the continuous version of the PEGE because the construction of each is based on the assumption of constant failure rate between distinct observed failure times.
The forms of the estimators differ because of the difference in the ways of expressing discrete and continuous survival functions in terms of failure rates. The PEGE and the PEXE are equally widely applicable since a minor modification of the PEXE can be made to allow for ties. (This estimator is defined in Whittemore and Keller (1983).) The relationship between the PEGE and the modified PEXE, and their positioning relative to the KME, is summarized by the following theorem and the succeeding relationship. Let $P^{**}(X > t)$ denote the modified PEXE of the survival probability $P(X > t)$ for $t > 0$; as in Theorem 4.1, $\hat{P}$ denotes the PEGE and $\tilde{P}$ the KME.

THEOREM 4.2. (i) $\hat{P}(X > t) < P^{**}(X > t)$ for $t > 0$.
(ii) If $n\bar{F}_n(t_{j-1}) / (n\bar{F}_n(t_{j-1}) + W_{j-1}) \le D_j / D_{j-1}$ for $j = 2, \ldots, i$, where $W_j$ denotes the number of censored observations at $t_j$ for $j = 1, \ldots, n_1$, then $P^{**}(X > t_i) \le \tilde{P}(X > t_i-)$ for $i = 1, \ldots, n_1$.

From Theorems 4.1(i) and 4.2(i), we have

$$\tilde{P}(X > t_i) \le \hat{P}(X > t_i) < P^{**}(X > t_i) \quad \text{for } i = 1, \ldots, n_1.$$

Consequently, if the condition in (ii) above is met (as it is when there are no ties among the uncensored observations), both the PEGE and the PEXE interlace with the KME: in each interval of the form $(t_{i-1}, t_i]$, the PEGE and the PEXE cross the KME once from above. Practical experience suggests that the condition in (ii) above is not a stringent one: even though this condition is violated in many of the data sets considered to date, the PEGE and the PEXE still interlace with the KME in the manner described. Another indication from practical experience is that the difference between the PEXE and the PEGE is negligible, even in small samples.

Finally, we present an example using the data of Freireich et al. (1963). The data are the remission times of 21 leukemia patients who have received 6-MP (a mercaptopurine used in the treatment of leukemia). The ordered remission times in weeks are: 6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+. The PEGE and the KME are presented in Figure 1. (Since the PEGE and the PEXE differ by at most 0.09, only the PEGE appears.)

[Figure 1: plot of the PEGE and the KME for the remission data]

The graphs illustrate the smoothness of the PEGE in contrast with the jagged outline of the KME. The KME and the PEGE interlace even though the condition in Theorems 4.1(ii) and 4.2(ii) is violated. Since the PEGE is only slightly above the KME at the observed failure times and the PEGE crosses the KME early in each failure interval, the KME is considerably larger than the PEGE by the end of each interval. This behaviour is typical. We infer that the PEGE certainly does not overestimate: it may even tend to underestimate.

We conclude that the PEGE (and the modified PEXE) have significant advantages over the KME, particularly in the cases of large samples containing many ties and small samples. It is only in the case of a large sample spread over a large range that the slight increase in computational effort required for the PEGE might merit using the KME because the PEGE and the KME are likely to be very similar.

5. Small sample properties of the PEGE

In this section we give some indications of the small sample properties of the PEGE by considering three simulation studies. In the first study, Kitchin (1980) compares the small sample properties of the PEXE with those of the KME.
In the second study, Whittemore and Keller (1983) consider the small sample behaviour of a number of estimators: we extract the results for the KME and a particular version of the PEXE. In the third study, we make a preliminary comparison of the KME and the PEGE.

We expect the behaviour of the piecewise exponential estimators to resemble that of the PEGE because piecewise exponential estimators are continuous versions of the PEGE and, moreover, piecewise exponential estimators and the PEGE are similar when the underlying life distribution is continuous. The piecewise exponential estimator considered by Whittemore and Keller is denoted $\bar{F}_{Q4}$. It is constructed by averaging the PEXE failure rate function estimator with a variant of the PEXE failure rate function estimator; that is, $\bar{F}_{Q4}$ is the same as the PEXE except that the PEXE failure rate estimators $\lambda_1^-, \ldots, \lambda_{n_1}^-$ are replaced by the failure rate estimators $\lambda_1^*, \ldots, \lambda_{n_1}^*$, defined as follows:

$$\lambda_i^* = \tfrac{1}{2}\big(\lambda_i^- + \lambda_{i-1}^+\big) \quad \text{for } i = 1, \ldots, n_1,$$

where

$$\lambda_i^- = D_i / \text{total time on test in } (t_{i-1}, t_i] \quad \text{for } i = 1, \ldots, n_1,$$

$$\lambda_i^+ = D_i / \text{total time on test in } [t_i, t_{i+1}) \quad \text{for } i = 0, \ldots, n_1 - 1,$$

$$\lambda_{n_1}^+ = \begin{cases} D_{n_1} / \text{total time on test in } [t_{n_1}, \infty) & \text{if } \max Z_i > t_{n_1}, \\ \ldots & \text{otherwise.} \end{cases}$$

Although Whittemore and Keller include in their study the two estimators $\bar{F}_{Q1}$ and $\bar{F}_{Q2}$ constructed from $\lambda_1^-, \ldots, \lambda_{n_1}^-$ and $\lambda_1^+, \ldots, \lambda_{n_1}^+$ respectively, they present the results for the hybrid estimator $\bar{F}_{Q4}$ alone because they find that $\bar{F}_{Q1}$ tends to be negatively biased and $\bar{F}_{Q2}$ tends to be positively biased.

The same model is assumed in all three studies. The model is that of random censorship: corresponding life and censoring random variables are independent and the censoring random variables are identically distributed.
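To make the two kinds of estimator being compared more concrete, the following Python sketch computes the KME by the standard product-limit formula and a piecewise exponential estimate whose rate on each interval $(t_{i-1}, t_i]$ is $D_i$ divided by the total time on test in that interval, as in the definition of the rates above. It is applied to the 6-MP remission times quoted in Section 4. This is only an illustration under stated assumptions: the function names are hypothetical, ties are handled simply by letting $D_i$ exceed one, and the second function is not claimed to coincide with the modified PEXE of Whittemore and Keller or with the PEGE.

```python
from math import exp

# 6-MP remission times in weeks (Freireich et al., 1963), as listed in Section 4.
# event = 1 marks an observed failure, event = 0 a censored time ("+" in the text).
TIMES = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
EVENTS = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]


def distinct_failure_times(times, events):
    """Sorted distinct uncensored times t_1 < ... < t_{n_1}."""
    return sorted({t for t, e in zip(times, events) if e == 1})


def deaths_at(times, events, t):
    """D_i: number of uncensored observations equal to t."""
    return sum(1 for z, e in zip(times, events) if z == t and e == 1)


def kaplan_meier(times, events):
    """Product-limit estimate of P(X > t_i) at each distinct failure time."""
    surv, estimate = 1.0, {}
    for t in distinct_failure_times(times, events):
        # Observations censored at t are still counted as at risk at t.
        at_risk = sum(1 for z in times if z >= t)
        surv *= 1.0 - deaths_at(times, events, t) / at_risk
        estimate[t] = surv
    return estimate


def piecewise_exponential(times, events):
    """exp(-cumulative hazard), with rate D_i / (total time on test) on (t_{i-1}, t_i]."""
    cum_hazard, prev, estimate = 0.0, 0.0, {}
    for t in distinct_failure_times(times, events):
        # Each item still on test after t_{i-1} contributes the time it spends in (t_{i-1}, t_i].
        total_time_on_test = sum(min(z, t) - prev for z in times if z > prev)
        rate = deaths_at(times, events, t) / total_time_on_test
        cum_hazard += rate * (t - prev)
        estimate[t] = exp(-cum_hazard)
        prev = t
    return estimate


km = kaplan_meier(TIMES, EVENTS)
pex = piecewise_exponential(TIMES, EVENTS)
for t in km:
    print(f"t = {t:2d}   KME = {km[t]:.4f}   piecewise exponential = {pex[t]:.4f}")
```

At every observed failure time the piecewise exponential value printed here is at least as large as the KME value, which is the ordering at failure times described in Section 4.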
Whittemore and Keller generate 200 samples in each of the 6 × 3 × 4 = 72 situations that result from considering six life distributions (representing failure rate functions that are constant, linearly increasing, exponentially increasing, decreasing, U-shaped, and discontinuous), three levels of censoring (P(Y < X) ≈ 0, 0.55, 0.76), and four sample sizes (n = 10, 25, 50, 100). Kitchin obtains 1000 samples in each of a variety of situations: he considers four life distributions (Exponential, Weibull with parameter 2, Weibull with parameter ½ and Uniform), three levels of censoring (P(Y < X) = 0, 0.5, 0.67), and four sample sizes (n = 10, 20, 50). Kitchin's study is broader than that of Whittemore and Keller in that Kitchin considers Exponential, Weibull and Uniform censoring distributions while Whittemore and Keller consider only Exponential censoring distributions. Kitchin apparently produces the greater variety of sampling conditions because his results vary slightly according to the model, while Whittemore and Keller find so much similarity in the results from the various distributions that they record only the results from the Weibull distribution.

The conclusions we draw from the two studies are similar. Regarding mean squared error (MSE), both Kitchin and Whittemore and Keller find that, in general:
(i) The MSE of the exponential estimator is smaller than that of the KME.
(ii) As the level of censoring increases, the increase in the MSE is smaller for the exponential estimator than for the KME.
Kitchin reports that (i) and (ii) are not always true of the PEXE and the KME: the exceptional cases occur in the tails of the distributions.

The conclusions about bias are not so straightforward.
Whittemore and Keller find that the PEXE tends to be negatively biased while Kitchin reports that the bias of the PEXE is a monotone increasing function of time: examining his figures, we find that the bias tends to be near zero at some point between the 40th and 60th percentiles except when the life and censoring distributions are Uniform. (In this case, the bias is positive only after the 90th percentile.) We conclude that Whittemore and Keller merely avoid detailed discussion of bias. Regarding the hybrid estimator, we find in the figures recorded some suggestions of the tendencies observed in the PEXE: specifically, monotone increasing bias and a tendency for underestimation when the sample size is small and censoring is heavy. Whether this behaviour is typical of the PEGE also remains to be seen.

In considering the magnitude of the bias of the estimators, we find the following.
(i) Both Kitchin and Whittemore and Keller report that the bias of the KME is negligible except in the right tail of the distribution and in the case of a very small sample (n = 10) and heavy censoring.
(ii) The PEXE is considerably more biased than the KME.
(iii) The bias of $\bar{F}_{Q4}$ is negligible except in the case of a very small sample and heavy censoring.
(iv) The bias of each estimator increases as the censoring becomes heavier and it decreases as the sample size increases.

In view of these two studies, we conclude, firstly, that the PEGE is likely to compare favourably with the KME in terms of MSE, and secondly, that the PEGE is likely to be considerably more biased than the KME. We expect that the discrete counterpart of $\bar{F}_{Q4}$ performs well in terms of both MSE and bias. Since the bias of this estimator is likely to be small, adjustment for its presumed tendency to increase monotonically is deemed an unnecessary complication.
In the pilot study we generate three collections of data, each consisting of 100 samples of size 10, from independent Geometric life and censoring distributions. In each case the life distribution has parameter p = exp(-0.1). The censoring distributions are chosen so as to produce three levels of censoring: setting p = exp(-λ), where λ = 0.00001, 0.1, 0.3, yields the censoring probabilities P(Y < X) = 0, 0.475, 0.711 respectively. The conventions followed for extrapolation in the range beyond the largest observed failure time are as follows:

$$\tilde{P}(X > k) = \begin{cases} \tilde{P}(X > t_{n_1}) & \text{for } t_{n_1} \le k < s_{n_2}, \\ 0 & \text{for } k \ge s_{n_2} \ge t_{n_1}, \end{cases}$$

$$\hat{P}(X > k) = \hat{P}(X > t_{n_1})\,(1 - \hat{q}_{n_1})^{k - t_{n_1}} \quad \text{for } k \ge t_{n_1}.$$

This definition of the KME ($\tilde{P}$) rests on the assumption that the largest observation is uncensored, while the definition of the PEGE ($\hat{P}$) results from assuming that the failure rate after the largest observed failure time is the same as the failure rate in the interval $(t_{n_1 - 1}, t_{n_1}]$, estimated by $\hat{q}_{n_1}$. Our conventions for extrapolation differ from those of Kitchin and of Whittemore and Keller. Consequently our results involving right-hand tail probabilities differ from theirs: a preliminary indication is that our extrapolation procedures result in estimators that are more realistic than theirs.

Although the size of the study precludes reaching more than tentative conclusions, we observe several tendencies. Tables 1(a), 2(a) and 3(a) contain the estimated bias and mean squared error (MSE) for the KME and the PEGE of P(X > k) for k = $\xi_p$, where $\xi_p$ is the pth percentile of the underlying life distribution and p = 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99.

Table 1
Results of pilot study using 100 samples of size 10, Geometric (p = exp(-0.1)) life distribution, Geometric (p = exp(-0.00001)) censoring distribution and P(Y < X) ≈ 0

                    Estimated bias           Estimated MSE
Percentile        PEGE         KME        PEGE         KME

(a) Survival function estimators
         1     -0.0184     -0.0018      0.0078      0.0101
         5     -0.0184     -0.0018      0.0078      0.0101
        10     -0.0137      0.0123      0.0118      0.0145
        20     -0.0172      0.0092      0.0161      0.0182
        30     -0.0253     -0.0053      0.0194      0.0225
        40     -0.0293     -0.0118      0.0255      0.0279
        50     -0.0351     -0.0196      0.0271      0.0278
        60     -0.0347     -0.0159      0.0223      0.0257
        70     -0.0318     -0.0185      0.0176      0.0212
        80     -0.0283     -0.0187      0.0108      0.0133
        90     -0.0199     -0.0167      0.0047      0.0060
        95     -0.0096     -0.0028      0.0028      0.0049
        99      0.0029     -0.0011      0.0006      0.0009

(b) Percentile estimators
         1        0.00        0.63        0.00        1.63
         5        0.21        0.63        0.35        1.63
        10        0.21       -0.37        1.69        1.37
        20        0.08       -0.32        3.00        2.38
        30        0.28       -0.10        4.48        3.88
        40        0.16       -0.79        6.20        5.71
        50        0.53       -0.08        9.57        9.72
        60        0.62       -1.31       13.70       14.37
        70        1.35       -2.28       20.43       22.82
        80        1.87       -3.34       35.23       35.96
        90        2.29       -4.87       82.53       95.37
        95        2.20       -1.53      130.22      140.17
        99        5.01      -18.53      577.47      481.19

Table 2
Results of pilot study using 100 samples of size 10, Geometric (p = exp(-0.1)) life distribution, Geometric (p = exp(-0.1)) censoring distribution and P(Y < X) ≈ 0.475

                    Estimated bias           Estimated MSE
Percentile        PEGE         KME        PEGE         KME

(a) Survival function estimators
         1     -0.0223     -0.0018      0.0077      0.0101
         5     -0.0223     -0.0018      0.0077      0.0101
        10     -0.0207      0.0106      0.0124      0.0157
        20     -0.0215      0.0094      0.0170      0.0208
        30     -0.0282     -0.0042      0.0244      0.0300
        40     -0.0432     -0.0037      0.0407      0.0502
        50     -0.0509     -0.0230      0.0475      0.0601
        60     -0.0564     -0.0442      0.0430      0.0634
        70     -0.0553     -0.0800      0.0333      0.0603
        80     -0.0368     -0.0707      0.0229      0.0413
        90     -0.0060     -0.0590      0.0124      0.0151
        95      0.0082     -0.0401      0.0082      0.0049
        99      0.0149     -0.0091      0.0033      0.0001

(b) Percentile estimators
         1        0.00        0.80        0.00        3.36
         5        0.19        0.80        0.33        3.36
        10       -0.34       -0.20        1.66        2.76
        20       -0.09        0.08        3.69        5.36
        30        0.38        0.80        7.40        9.84
        40        0.10        0.64       12.62       17.24
        50        0.77        1.43       20.97       25.21
        60       -0.20        0.62       34.24       37.26
        70       -0.67       -1.44       64.85       36.28
        80       -0.88       -2.73      128.02       52.21
        90       -1.23       -8.92      302.31      121.66
        95       -0.60      -14.92      561.06      264.70
        99       -2.30      -31.92     1497.30     1060.98

From these tables we make the following observations.
(i) The MSE of the PEGE is generally smaller than that of the KME. The exceptions occur in the right-hand tail of the distribution and under conditions of moderate and heavy censoring.
(ii) The MSE of each estimator increases as censoring increases, except in the right-hand tail.
(iii) The disparity in the MSE of the two estimators becomes more marked as the censoring increases; that is, the MSE of the PEGE increases by relatively little as the censoring increases.
(iv) The difference in the MSE of the two estimators is smallest near the median of the distribution.
(v) Both the KME and the PEGE generally exhibit negative bias: the magnitude of the bias of each estimator is greatest around the median of the distribution.
(vi) The magnitude of the bias of the KME is consistently smaller than that of the PEGE only when there is no censoring. Under conditions of moderate and heavy censoring, the KME is less biased than the PEGE only at percentiles to the left of the median: to the right of the median, the PEGE is considerably less biased than the KME.
(vii) As censoring increases, the magnitude of the bias of the KME increases faster than does that of the PEGE.
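The censoring probabilities quoted for the pilot study can be checked in closed form. The sketch below assumes, since the parametrization is not restated here, that a Geometric variable with parameter p has P(X > k) = p^k on the support {1, 2, ...}, and that an observation is censored only when Y < X strictly. Under these assumptions P(Y < X) = (1 - b)a / (1 - ab), where a and b are the life and censoring parameters. The function name is hypothetical.

```python
from math import exp


def geometric_censoring_probability(life_rate, censor_rate):
    """P(Y < X) for independent Geometric X and Y on {1, 2, ...} with
    P(X > k) = exp(-life_rate * k) and P(Y > k) = exp(-censor_rate * k)."""
    a = exp(-life_rate)    # parameter p of the life distribution
    b = exp(-censor_rate)  # parameter p of the censoring distribution
    # P(Y < X) = sum over k >= 1 of P(Y = k) P(X > k) = sum of b^(k-1) (1 - b) a^k
    return (1.0 - b) * a / (1.0 - a * b)


for lam in (0.00001, 0.1, 0.3):
    prob = geometric_censoring_probability(0.1, lam)
    print(f"lambda = {lam}: P(Y < X) = {prob:.3f}")
```

Rounded to three decimals the three values are 0.000, 0.475 and 0.711, which agree with the censoring levels stated for Tables 1 to 3 and so lend some support to the assumed parametrization.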
Tables 1(b), 2(b) and 3(b) contain the estimated bias and MSE for the Kaplan-Meier (KM) and piecewise geometric (PG) estimators of the percentiles $\xi_p$, p = 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99.

Table 3
Results of pilot study using 100 samples of size 10, Geometric (p = exp(-0.1)) life distribution, Geometric (p = exp(-0.3)) censoring distribution and P(Y < X) ≈ 0.711

                    Estimated bias           Estimated MSE
Percentile        PEGE         KME        PEGE         KME

(a) Survival function estimators
         1     -0.0230     -0.0018      0.0077      0.0101
         5     -0.0230     -0.0018      0.0077      0.0101
        10     -0.0370      0.0033      0.0171      0.0185
        20     -0.0582     -0.0273      0.0301      0.0508
        30     -0.0714     -0.0479      0.0437      0.0704
        40     -0.1150     -0.1011      0.0705      0.1257
        50     -0.1232     -0.1443      0.0709      0.1382
        60     -0.1006     -0.2421      0.0594      0.1273
        70     -0.0702     -0.2286      0.0456      0.0711
        80     -0.0347     -0.1775      0.0321      0.0341
        90      0.0032     -0.0907      0.0187      0.0082
        95      0.0173     -0.0498      0.0125      0.0025
        99      0.0206     -0.0091      0.0043      0.0001

(b) Percentile estimators
         1        0.10        0.87        0.52        3.27
         5        0.24        0.87        0.68        3.27
        10       -0.41       -0.13        1.37        2.53
        20       -0.08        0.52        3.22        7.86
        30        0.29        0.76        7.19        8.82
        40       -0.20       -0.10       15.16        9.56
        50        0.48        0.16       28.06       10.86
        60       -0.47       -2.38       50.99       16.66
        70       -0.78       -4.91       90.72       36.07
        80       -1.11       -8.54      167.67       84.44
        90       -1.68      -15.53      357.58      252.63
        95       -1.25      -21.53      619.71      474.99
        99       -3.34      -38.53     1508.06     1496.01

From these tables we make the following observations.
(i) With a few exceptions, the PG percentile estimator is less biased than the KM percentile estimator.
(ii) Both estimators tend to be negatively biased.
(iii) At each level of censoring, the bias of the PG percentile estimator is negligible for percentiles smaller than the 70th, and it is acceptably small for larger percentiles, except perhaps the 99th percentile.
In contrast, the KM percentile estimators are almost unbiased only for percentiles smaller than the 60th: to the right of the 60th percentile the bias tends to be very much larger than that of the PG estimators. This tendency is particularly noticeable in the case of heavy censoring.
(iv) The MSE of the PG percentile estimator is smaller than that of the KM percentile estimator only in certain ranges, viz.: p ≤ 70 for heavy censoring, p ≤ 40 for moderate censoring, and 5 ≤ p ≤ 95 for no censoring. Since the PG percentile estimator is almost unbiased outside these ranges, the large MSE must be the result of having large variance.

On the basis of the observations involving the survival function estimators, we conclude that the small sample behaviour of the PEGE resembles that of the PEXE: specifically, when there is little or no censoring, the PEGE compares favourably with the KME in terms of MSE but not in terms of bias. We expect that this is true irrespective of the level of censoring when the sample size is larger. It remains to be seen whether inversion of this general behaviour is typical when the sample size is very small and censoring is heavy. It is evident that increased censoring affects the bias and the MSE of the PEGE less than it affects the bias and the MSE of the KME.

Our conclusions about the percentile estimators are even more tentative because of the lack of results involving the behaviour of percentile estimators. The fact that the PG percentile estimator is almost unbiased even in the presence of heavy censoring, and even as far to the right as the 95th percentile, is of considerable interest because the KM extrapolation procedures are clearly inadequate for estimating extreme right percentiles.
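The step from "almost unbiased" to "large variance" in observation (iv) rests on the identity MSE = bias² + variance, which holds exactly for simulated estimates when the variance is computed with the number of samples as divisor. As a rough illustration (a sketch that is not part of the original study, with a hypothetical function name), the snippet below applies the identity to the 99th percentile entries tabulated for heavy censoring in Table 3(b).

```python
def variance_from_bias_and_mse(bias, mse):
    """Variance implied by the decomposition MSE = bias**2 + variance."""
    return mse - bias ** 2


# 99th percentile under heavy censoring, Table 3(b): estimated bias and MSE.
pg_bias, pg_mse = -3.34, 1508.06   # piecewise geometric (PG) percentile estimator
km_bias, km_mse = -38.53, 1496.01  # Kaplan-Meier (KM) percentile estimator

pg_variance = variance_from_bias_and_mse(pg_bias, pg_mse)
km_variance = variance_from_bias_and_mse(km_bias, km_mse)

print(f"PG: bias^2 = {pg_bias ** 2:8.2f}, implied variance = {pg_variance:8.2f}")
print(f"KM: bias^2 = {km_bias ** 2:8.2f}, implied variance = {km_variance:8.2f}")
```

On these figures nearly all of the KM mean squared error is squared bias, while nearly all of the PG mean squared error is variance, so two similar MSE values conceal quite different behaviour.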
Regarding the MSE, we note that, under conditions of moderate or heavy censoring, any estimator of the larger percentiles is expected to vary considerably because there are likely to be very few observations in this range. The ad hoc extrapolation procedure for the KM is expected to cause the estimators of the extreme right percentiles to exhibit large negative bias and little variation. In view of these considerations and the accuracy of the PG percentile estimators, we conclude that the fact that the MSE of the PG percentile estimator of the larger percentiles is greater than that of the KM percentile estimator is not evidence of a breakdown in the reliability and efficiency of the PG percentile estimator.

The general indications of our pilot study are that the PEGE and the discrete version of $\bar{F}_{Q4}$ are attractive alternatives to the KME. In view of the resemblance between the properties of the PEGE and those of the PEXE, the results for $\bar{F}_{Q4}$ portend well for the new discrete estimator: we expect it to be almost unbiased and to be not only more efficient than the KME but also more stable under increased censoring. Moreover, we expect the corresponding percentile estimator to have these desirable properties also because it is likely to behave at least as well as the PG percentile estimator. The properties involving relative efficiency are of considerable importance because relative efficiency is a measure of the relative quantities of information utilized by the estimators being compared. This interpretation of relative efficiency, and the fact that heavy censoring is often encountered in engineering problems, makes $\bar{F}_{Q4}$ and its discrete counterpart even more attractive.

References

Aalen, O. (1976). Nonparametric inference in connection with multiple decrement models. Scandinavian J. Statist. 3, 15-27.
Aalen, O. (1978). Nonparametric estimation of partial transition probabilities in multiple decrement models. Ann. Statist.
6, 534-545.
Breslow, N. and Crowley, J. (1974). A large sample study of the life table and product limit estimators under random censorship. Ann. Statist. 2, 437-453.
Chen, Y. Y., Hollander, M. and Langberg, N. (1982). Small-sample results for the Kaplan-Meier estimator. J. Amer. Statist. Assoc. 77, 141-144.
Cox, D. R. (1972). Regression models and life tables. J. Roy. Statist. Soc. Ser. B 34, 187-202.
Desu, M. M. and Narula, S. C. (1977). Reliability estimation under competing causes of failure. In: I. Shimi and C. P. Tsokos, eds., The Theory and Applications of Reliability I. Academic Press, New York.
Efron, B. (1967). The two sample problem with censored data. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability Vol. IV. University of California Press, Berkeley, CA, 831-853.
Fleming, T. R. and Harrington, D. P. (1979). Nonparametric estimation of the survival distribution in censored data. Technical Report No. 8, Section of Medical Research Statistics, Mayo Clinic, Rochester, MN.
Freireich, E. J. et al. (1963). The effect of 6-Mercaptopurine on the duration of steroid-induced remission in acute leukemia. Blood 21, 699-716.
Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53, 457-481.
Kitchin, J. (1980). A new method for estimating life distributions from incomplete data. Unpublished doctoral dissertation, Florida State University.
Kitchin, J., Langberg, N. and Proschan, F. (1983). A new method for estimating life distributions from incomplete data. Statist. and Decisions 1, 241-255.
Langberg, N., Proschan, F. and Quinzi, A. J. (1981). Estimating dependent life lengths, with applications to the theory of competing risks. Ann. Statist. 9, 157-167.
Miller, R. G. (1981). Survival Analysis. Wiley, New York.
Mimmack, G. M. (1985). Piecewise geometric estimation of a survival function. Unpublished doctoral dissertation, Florida State University.
Nelson, W. (1969).
Hazard plotting for incomplete failure data. J. Quality Technology 1, 27-52.
Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics 14, 945-966.
Peterson, A. V. (1977). Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. J. Amer. Statist. Assoc. 72, 854-858.
Susarla, V. and Van Ryzin, J. (1976). Nonparametric Bayesian estimation of survival curves from incomplete observations. J. Amer. Statist. Assoc. 71, 897-902.
Umholtz, R. L. (1984). Estimation of the exponential parameter for discrete data. Report, Aberdeen Proving Ground.
Whittemore, A. S. and Keller, J. B. (1983). Survival estimation with censored data. Stanford University Technical Report No. 69.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 281-311

Applications of Pattern Recognition in Failure Diagnosis and Quality Control

L. F. Pau

1. Introduction

Through its compliance with, and implications on, design, manufacturing, quality control, testing, operations and maintenance (Figures 1 and 2), the field of technical diagnostics has wide-ranging consequences in all technical fields; some of the measures hereof are:
- system availability;
- system survivability;
- safety;
- production yield;
- quality;
- failure tolerance;
- system activation delays;
- costs (lifetime, operation);
- maintenance;
- warranties.

[Fig. 1. Relation between failure diagnosis, reliability or degradation processes, safety and maintenance. The diagram connects, through failure and repair rates, the failure-free system $E_0$ (safe and available state), the failed states $E_i$ (unsafe and non-available; system failure not diagnosed) and the diagnosed states (safe and non-available; system failure diagnosed). If the repair is instantaneous ($\mu = +\infty$), if there is no detection delay ($t_m + t_d = 0$), and if the diagnostic system itself never fails, the asymptotic system availability in the stationary case is
$$A = \mathrm{Prob}(\text{UUT not failed})_{t \to +\infty} = \prod_{i=1}^{N-1} \frac{\mu_i}{\lambda_i + \mu_i}.$$
More general formulae may be derived, especially for finite repair times, and more general degradation processes.]

We define here technical diagnostics as the field dealing with all methods, processes, devices and systems whereby one can detect, localize, analyse and monitor failure modes of a system, i.e., defects and degradations (see Section 2). It is at this stage essential to stress that, whereas system reliability and safety theories are concerned with a priori assessments of the probability that the system will perform a required task under specified conditions, without failure, for a specified period of time, the field of failure diagnosis is essentially focusing on a posteriori and on-line processing and acquisition of all monitoring information for later decision making.

Failure diagnosis has itself evolved from the utilization of stand-alone tools (e.g. calibers), to heuristic procedures, later codified into maintenance manuals. At a later stage, automatic test systems and non-destructive testing instruments, based on specific test sequences and sensors, have assisted the diagnosis; examples are: rotating machine vibration monitoring, signature analysis, optical flaw detection, ultrasonics, ferrography, wear sensors, process parameters, thermography, etc. More recently, however, there have been implementations and research on evolved diagnostic processes, with heavier emphasis on sensor integration, signal/image processing, software and communications.
Finally, research is carried out on automated failure diagnosis, and on expert systems to accumulate and structure failure symptoms and diagnostic strategies (e.g. avionics, aircraft engines, software).

Although the number of application areas and the complexity of diagnostic systems have increased, there is still a heavy reliance on 'ad-hoc' or heuristic approaches to basing decisions on diagnostic information. But a number of fundamental diagnostic strategies have emerged from these approaches, which can be found to be common to these very diversified applications.

After having introduced in Section 2 a number of basic concepts in technical diagnostics, we will review in Section 3 some of the measurement problems. The basic diagnostic strategies will be summarized in Section 4. Areas for future research and progress will be proposed in Section 5.

2. Basic concepts in technical diagnostics

Although they may have parts in common, we will essentially distinguish between the system for which a diagnosis is sought (system/unit/process under test: UUT), and the diagnostic system. The basic events in technical diagnostics are well defined in terminology standards; they are: failure, defect, degradation, condition.

[Fig. 3. Confusion matrix of estimated failure mode versus actual failure mode $E_i$, i = 0, 1, ..., N-1, covering catastrophic and degradation failures and the stages of failure detection, failure localization and failure diagnosis.]

A failure mode is then the particular manner in which an omission of expected occurrence or performance of task or mission happens; it is thus a combination of failures, defects, and degradations. For a given task or mission, the N possible failure modes will be noted $E_0, E_1, \ldots, E_{N-1}$, where $E_0$ is the no-failure operating mode fulfilling all technical specifications. $\hat{E}$ is the failure mode identified by the diagnostic system.

2.1. The basic troubleshooting process (Figure 3)

2.1.1.
Failure detection: This is the act of identifying the presence or absence of a non-specified failure mode in a specified system carrying out a given task or mission, or manufactured to a given standard.

2.1.2. Failure localization: If the outcome of failure detection is positive, then failure localization designates the material, structures, components, processes, systems or programs which have had a failure.

2.1.3. Failure diagnosis: The act or process of identifying a failure mode E upon an evaluation of its signs and symptoms, including monitoring information. The diagnostic process therefore carries out a breakdown of failure detection into individual failure modes.

2.1.4. Failure analysis: The process of retrieving via adequate sensors all possible information, measurements, and non-destructive observations, altogether called diagnostic information, about the life of the system prior and up to the failure; it is also a method whereby to correlate this information.

2.1.5. Failure monitoring: This is the act of observing indicative change of equipment condition or functional measurements, as warnings for possible needed corrections.

2.2. Performance of a diagnostic process

As a decision operator, any diagnostic system can make errors; each of the following errors or performances can be specified either for a specific failure mode, or in the expected sense over the set of all possible failure modes $E_0, \ldots, E_{N-1}$. The probabilities 2.2.1-2.2.4 can be derived from the confusion matrix (Figure 3). The overall effect of these performances is to affect system availability, with or without test system linked to UUT (Pau, 1987b).

2.2.1. Probability of incorrect diagnosis: This is the probability of diagnosing a failure mode different from the actual one, with everything else equal.

2.2.2.
Probability of reject (or miss, or non-detection): This is the probability of taking no decision (diagnosis or detection) when a failure mode is actually present.
2.2.3. Probability of false alarm: The probability of diagnosing that a failure mode is present, when in fact none is present (except the normal condition E_0).
2.2.4. Probability of correct detection: The probability of detecting correctly a failure mode to be present, when it actually is (E_0 excepted); when there is only one possible failure mode E_1, it is the complement to one of the probability of false alarm.
2.2.5. Failure coverage: This is the conditional probability that, given there exists a failure mode of the UUT, this system is able to recover automatically and continue operations. The process of automatic reconfiguration and redundancy management has the purpose of improving the coverage and making the system fault-tolerant.
2.2.6. Measurement time t_m: This is the total time required for acquiring all diagnostic information (except a priori information) required for the failure detection, localization and diagnosis. This time may be divided into subsequences, and estimated in expected value.
2.2.7. Detection (or diagnosis) delay t_d: This is the total time required to process and analyze the diagnostic information, and also to display or transmit the failure mode as determined. This time may be divided into subsequences, and estimated in expected value.
2.2.8. Forecasting capability t_f: This is the lead time with which the occurrence of a specific failure mode (E_0 excepted) can be forecast, given a confidence interval or margin.
2.2.9. Risks and costs: Costs/risks are attached to each diagnosed failure mode Ê, obtained as a result of the decision process; they include: testing costs, maintenance/repair costs, safety risks, lost usage costs, warranties, yields.

3. Sensors and diagnostic information
3.1.
Degradation processes
Thorough knowledge of the failure mode occurrence process, and not only of the normal operating mode E_0, is an absolute must. It requires the understanding of all physical effects, as well as of software errors, besides design, operations, human factors and procedures (Figure 4). Failure modes may also occur because of interactions with other UUTs (machines, or communication nodes working together).
The result of this knowledge is the derivation or inference of:
• categorized lists of failure modes, and their duration or extent;
• lists of features or characteristic symptoms for detection and diagnosis, with measurement ranges;
• priorities among categorized failure modes vs.:
  - probabilities (availability),
  - safety (critical events),
  - timing (triggering, windowing, etc.),
  - fault-effect models (e.g. error propagation, stress-fracture relations).
This information is also used for the selection of sensors for technical diagnostics.

[Fig. 4. Main causes for a degradation. Figure labels: degradation processes; design; software world; physical world (physics, chemistry, materials); operations; human factors; procedures.]

3.2. Sensors for technical diagnostics

It is important to distinguish between two classes of diagnostic sensors:
- passive sensors, with no interaction of probing energy with the UUT;
- active sensors, with interaction of probing energy with the UUT, perturbing the operations; this is carried out by personnel, automatic test systems, programmable systems, or other probing means.
In turn, the measurement process is either destructive, or non-destructive for the UUT.
Needless to say, there is a very wide range of sensors, described and reported in the technical diagnostics, measurement, and non-destructive testing literature. These sensors are generally used in sequence. We will give below one application area into which much sensor development research is going, and refer the reader to the References for other fields.

EXAMPLE. Integrated circuits diagnosis. See Figure 5.

Active sensors, non-destructive:
- Electrical signature analysis
- Logic testing
- Micromanipulator probes (after removing die coat)
- Nematic LCD to highlight operating circuit paths
- LCD displays for comparative circuit nodal analysis
- Soft failure testing (alpha)
- Electron beam microscopy
- X-ray analysis
Active sensors, destructive:
- Capacitive discharges
- Dynamic and monitored accelerated testing/burn-in
- Humidity, vibration, EMC testing
- Mechanical abrasion with ultrasonic probe
- Radiation testing
- Laser melt
- Photoresist etching
Passive sensors, non-destructive:
- Visual inspection
- Electron microscopy
- Electrical pin-to-pin characterization
- Leak testing
- Auger analysis
- Infrared thermography
- Freon boiling of hot spot
- LCD to detect changes in electrical field
Passive sensors, destructive:
- Passive accelerated testing/burn-in
- Storage reliability testing
Fig. 5. Sensors and measurement processes for the diagnosis of integrated circuits.

3.3. Data fusion and feature extraction

3.3.1. Data fusion: In evolved diagnostic systems, it is realized that efficient diagnosis cannot, in many cases, be based on the acquisition of one single measurement only, possibly with one single sensor only (Pau, 1987a). Another fundamental approach is to strive towards the acquisition of the measurement(s) by monitoring throughout the entire system life, including manufacturing, testing, operations, maintenance, modifications. In order to cover those two requirements, evolved diagnostic systems are based on sensor diversity, which in addition increases the global sensor reliability and reduces the vulnerability (Figure 6).
3.3.2. Feature extraction: The features are then those combined symptoms derived jointly from different sensors, these measurements being combined together by an operation called feature extraction, to increase their usefulness for diagnosis. Data fusion from diverse sensors usually leads to much improved features, and to monitoring capabilities over the entire system life.
3.3.3. Sensor diversity: The diversity is in terms of:
- measurement processes,
- design,
- location,
- acquisition rate, bandwidth, gain, wavelength, etc.,
- environmental exposure,
with possible sensor redundancies (active, passive) and distributed sensor control.

3.4. Measurement problems in technical diagnostics

In addition to the classical issues of calibration, measurement stability, process consistency/stability, environment, and noise, the specific concerns are:
3.4.1. Observability: This is a possible property of dynamic systems which expresses the ability to infer or estimate the system condition at a given past instant in time, from quantified records of all measurements made on it at later points in time. This property does not hold for most UUTs, first because of missing measurements/data, and next because of time dependent changes of the system condition which, in general, cannot be modelled.
3.4.2. Accessibility to measurement points: One of the main limitations to observability is bad accessibility of the main test or measurement points because of inadequate design, and the insufficient number of such measurements. Another source of limitation is inadequate selection of the measurement sampling frequency (spatial or temporal or optical), so that fine features revealing incipient failures go unnoticed. Measurement delays t_m are also a problem.
3.4.3.
Effect of control elements and protective elements: The observability is further reduced for some parts of the UUT because of:
- physical protection: hybrid/composite structures, coatings, multilayers, casings;
- protection and failure recovery systems: protection networks, fault-tolerant parts, active spares, safing modes;
- control elements: feedback controllers, limiters, and measurement effects due to the detection delay t_d.
3.4.4. Sensor-UUT interactions: In the case of electrical and mechanical measurements, impedance and bandwidth mismatch are introduced at the interface level, resulting in signal distortion features which do not originate in system failure modes. In the case of human observations, sources of observation errors are many, as expected. In the case of active sensors, it is essential to understand and model as well as possible:
- the propagation of the probing energy into the UUT and the interaction with the defects or failures;
- the inverse problem, of how defect and failure features propagate to the sensor.
EXAMPLE. Effects of intrinsic fracture energy on brittle fractures vs. ductile fracture under plasticity, external and internal chemistry, and structural loadings. This leads to complex crack kinetics, and ductile vs. brittle process models.
3.4.5. Support structure: The support structure, casing or board may, by its properties or behavior, interfere with both the sensor and UUT, e.g. because of mechanical impedance, electromagnetic interference (EMI), etc.
3.4.6. Distortion: This is a classical problem in measurement, but added difficulties result from the fact that the sensors themselves cannot be properly modelled outside their normal operating bandwidth, whereas true measurements on systems which fail are likely to be characterized by extremely large bandwidths. Such large bandwidths also conflict with low noise and infrequent calibration requirements.

[Fig. 6. Feature extraction and data fusion with sensor diversity. Diversity by: sensor/measurement type, location, design, environment, data acquisition (bandwidth, gain, wavelength, data rate), software; with possible redundancies (active, passive, software), and distributed control. Sensor measurement type 1: signals (analog; digital; radiation). Sensor measurement type 2: images, electromagnetic waves. Sensor measurement type 3: human text; procedures; software, behavior. All three feed the feature extraction of diagnostic information.]

3.4.7. Sensor reliability: Failure analysis and diagnosis are only possible if the sensors of all kinds survive system failures; this may require sensor redundancy (physical or analytical), separate power supplies, and different technologies and computing resources. Besides sensor and processor reliability, short reaction time t_m and good feature extraction are conflicting hardware requirements, all of which contribute to increased costs which in turn limit the extent of possible implementations. Any diagnostic subsystem, and any UUT subsystem which can be activated separately, should be equipped with a time meter, unit or cycle counter.
3.4.8. Data transmission errors: Whether the UUT is autonomous or not, analog or digital multiplexing will often be used, followed by data transmission, e.g. on a common bus or local network. These transmission links may themselves generate errors and fail. However, if the data acquisition rate is slow under good operating conditions, data transmission sometimes becomes irrelevant: on-site temporary data storage is then a convenient solution.

3.5. Research on sensors for diagnosis

The main trends are:
- development of cheap and reliable distributed sensor arrays (acoustic imaging, fiber optic sensors, distributed position sensors, accelerometers, ...);
- sensor integration and measurement fusion, to enhance the detection and diagnosis capabilities (vibrations/pressure, temperature/pressure, optical/temperature, pressure/acceleration/flow);
- in-built analog-to-digital, or optoelectronic conversion;
- in-built digital data error-detecting-correcting circuits;
- software controlled calibration;
- better impedance matching of active sensors;
- noise suppression.
Moreover, there is increased attention given to the processing of unstructured verbal/written reports and actions by human operators: even if expressed in plain language, they will often reveal essential diagnostic features.

4. Diagnostic processes

As already mentioned in Section 1, there appear to exist essentially a few fundamental diagnostic processes. The discovery of those amidst the technicalities of specific implementations has actually led to substantial achievements across different application areas (e.g. from mechanical to control systems, from software to mechanical processes). We will therefore review the:
- diagnostic strategies;
- diagnostic system architectures controlled by these strategies (active and passive sensors);
- test generation.

4.1. Diagnostic strategies S (Figure 7)

4.1.1. Diagnostic strategies S are always sequential, in at least one of the following aspects:
4.1.1.1. UUT configuration D: Diagnosis is sequentially applied to:
- units/components;
- systems obtained by stepwise integration of these units/components;
- automata, software modules, operating systems obtained by stepwise integration of the UUT with other interfacing systems (sensors, displays, controls, etc.), the selection being guided by the diagnostic strategy.
4.1.1.2.
Diagnostic information Y: The diagnosis uses increasing numbers of diagnostic measurements coming from a diversity of sensors, the selection being guided by S; when active sensors are considered, the diagnostic measurements are the results of the probing, as applied to successive UUT decompositions D.
4.1.1.3. A priori/learning information I: The diagnosis uses increasing amounts of a priori/learning information, the retrieval being guided by S; this information set I includes data on the degradation process (see Section 3.1).
As a result, a diagnostic strategy S is a sequential search process in the product set (D × Y × I): it is clear that UUT parts registration and data labelling are both needed, besides timing information.
4.1.2. There are essentially three basic diagnostic strategies S:
4.1.2.1. Failure mode removal by analysis and inspection: The detection, diagnosis, localization and removal of the failure mode which has occurred are carried out in sequence; the removal affects, among others: requirements, design, control, usage, parts, repair, programs, etc.
4.1.2.2. Validation: Diagnosis cannot be considered complete until the UUT has been demonstrated to solve the requirements that were set out in the UUT specifications; validation consists in verifying that these are met.
4.1.2.3. Exploring the operational envelope: The external specifications define the operational envelope within which the UUT must perform correctly in mode E_0. These performance limits, while representative of the real-world process, are not necessarily accurate, and quite different system states may occur. These strategies S therefore explore the behavior under circumstances not given as performance requirements, including 'severe' operating environments.
4.1.3.
Diagnostic strategy assessment: The assessment is done in terms of the expected risk attached to a random failure mode E, as estimated in terms of the various performance criteria listed in Section 2.2.
4.1.4. Example: classification of software testing strategies S: The known software testing techniques can be classified into the 3 classes of Section 4.1.2; see Figure 8.

1. Failure removal:
- Sensitized path testing
- Fault seeding
- Hardware/software test points and monitoring software
- Code analyzers
- Dynamic test probes, injection of test patterns of bits
2. Validation:
- Proof-of-correctness
- Program verification by predicate testing
- Proof-of-loops
- Validation using a representation in a specification language
- Validation by simulation
3. Exploring the operational envelope:
- Endurance tests
- Derivation of tests outside the specifications, by a specification language
- Automatic test case generation
- Behavior of specific routines in extreme cases
- Stress tests (inputs, time), saturation tests
Fig. 8. Classification of software testing strategies S.

4.2. Diagnostic system architectures

The diagnostic strategies S to be implemented control the utilization of and access to: UUT configuration D, diagnostic information Y, failure models and a priori information I, all of which are part of the diagnostic system. The failure mode Ê is determined by the final diagnostic decision unit. Especially important in the diagnostic system architecture are the sequential set-up vs. D, Y, I with backtrackings, and the:
4.2.1.
Measurement/diagnostic information unit: This senses diagnostic information by active and passive sensors, and performs a parametric UUT identification by adjusting a parametric model of the UUT; the estimated parameters are fed into the diagnostic decision unit. If these parameters are all measurable, the diagnosis is called external; if they are only observable (and estimated by e.g. modal analysis, Kalman filter, or error-detection-correction), the diagnosis is called internal.
4.2.2. Failure model unit: For a given UUT configuration D, operational environment, and set of other learning information I, this unit identifies and prioritizes the possible failure modes E_0, E_1, ..., E_{N-1} (e.g. critical parts, active routines, fracture locations). A failure mode effect model (FMEA analysis) is then adjusted to a usage model of the UUT (incorporating e.g. fatigue, ductility, heating, cumulative failures, cumulative contents of registers) to derive predicted parameter values for all possible failure modes E_0, E_1, ..., E_{N-1}, and the potential effects on the UUT performances.
Note that under a sequential diagnostic strategy S, a whole hierarchy of models, with corresponding adjustment factors (environment, specification of parts, usage), is needed; these models usually take the simple form of multi-entry tables stored in read-only memories (e.g. fault dictionaries).
EXAMPLE. Sneak circuit analysis (failure mode identification). This is, for electronic circuits, a systematic review of electric current and logic paths down to the components and logic statements, to detect latent paths, timing errors, software errors, hardware failures. It uses essentially the specifications and nodal/topological network analysis, in addition to state diagrams for the logic.
EXAMPLE. Failures of bearings (FMEA analysis). See Figure 10.

[Fig. 10. Failure modes of bearings (FMEA analysis). Failure modes E_1, ..., E_{N-1} listed: fatigue of rolling elements/tracks; wear; cage failures; lubrication starvation, contamination. Examples of feature parameters listed: vibration parameters; fiber optic inspection; shock pulses; radial position changes in shaft position/deflection; frictional losses; temperature changes.]

4.2.3. Diagnostic decision unit (Figure 11). This decision logic determines the likely failure mode Ê among E_0, E_1, ..., E_{N-1}, from the estimated and predicted parameters, taking account of the cost/risk/time factors. This process, which may also derive classification features from these data, is essentially a pattern recognition process (signals, images, coded data, text, symbols, logic invariants); the simplest case is straightforward comparison (template matching) between estimated and predicted parameters (including event counts).
When the diagnostic decision is used for the prediction of the remaining UUT life, and passive sensors only are used, one would use the term non-destructive evaluation (NDE) instead of technical diagnostics.
Extensions to the above are required within the context of knowledge based systems or expert systems for diagnostics (Pau, 1986).

4.3. Test generation

This is the process whereby the active sensors, controlled by the diagnostic strategy S, select and apply specific types of probing energy to the UUT. These processes can be classified according to two criteria:
(i) functional testing (by cause-effect tables) vs. structural testing (by sensitizing probing energy);
(ii) deterministic vs. random (by noise, Monte Carlo simulation, random events).
The possible failure modes, and the corresponding probing signals generated by the active sensors, will usually be determined by the failure model unit (Section 4.2.2). However, the difficult design/selection issue to be resolved is whether these test signals can also detect other failure modes than those which they should characterize.
Test generation design will have both to minimize these overlaps, and to find minimum test sequences to energize all hypothesized failure modes.

4.4. Design considerations for diagnostic system architectures

These architectures must meet conflicting criteria, which are essentially:
- maximum diagnostic system reliability, because it must in general be larger than the UUT reliability;
- relative diagnostic system cost vs. UUT cost;
- ease of use for human operators;
- the diagnostic system must be either faster or more intelligent;
- updating capabilities and traceability;
- simultaneous design of the UUT and diagnostic system.

4.5. Statistical pattern recognition methods used

The diagnostic decision (Section 4.2.3 and Figure 2) is explicitly a pattern classification problem, as already stated (Pau, 1981). In the case where the measurements Y are restricted to numerical values (signals, data), the statistical pattern recognition methods (Fukunaga, 1972; Sebestyen, 1962) apply (Saeks and Liberty, 1977; Pau, 1981a, b; Rasmussen and Rouse, 1981).
In view of the requirements of the previous sections (especially 4.4), the standard methods used at each stage of the diagnostic decision (Section 2.1) are listed below, after Figure 12.

Features are selected and priority ranked among the following:
1. User traffic (demand)
2. Off-line teletraffic measurements and statistics on:
   - each route or link (flows and intensities)
   - around each traffic node (input-output measurements)
3. On-line teletraffic measurements for:
   - flow control
   - congestion control/windowing
   - routing
   - protocol use and interrupts
4. Hardware, software node condition monitoring
5. Error correction, propagation anomalies compensation, and disruption of links
6. Test and monitoring unit condition
7. Protection of transmission links carrying diagnostic information
Fig. 12. Features for data communications network tests and monitoring.

Failure detection
- Sequential hypothesis testing (Wald, 1947).
- Non-parametric sequential testing (Pau, 1978; Fu, 1968; Wald, 1947).
- Hypothesis testing (shift of the mean, variance) (Clark et al., 1975; Sebestyen, 1962).
- Bayes classification (Fukunaga, 1972).
- Discriminant analysis (Fukunaga, 1972; Sebestyen, 1962).
- Nearest neighbor classification rule (Fukunaga, 1972; Devijver, 1979).
- Sensor/observation error compensation (Pau and Kittler, 1980).

Failure localization
- Graph search algorithms (Saeks and Liberty, 1977; Rasmussen and Rouse, 1981; Slagle and Lee, 1971).
- Branch-and-bound algorithms (Narendra and Fukunaga, 1977).
- Dynamic programming (Pau, 1981a; Bellman, 1966).
- Logical inference (Pau, 1984).

Failure diagnosis
- Correspondence analysis (Pau, 1981a; Hill, 1974; Section 5).
- Discriminant analysis (Van de Geer, 1971; Benzecri, 1977).
- Canonical analysis (Hartman, 1960; Benzecri, 1977).
- Nearest neighbor classification rule (Fukunaga, 1972; Devijver, 1979).
- Knowledge based or expert systems for diagnostics (Pau, 1986).

Failure analysis
- Variance analysis, correlation analysis (Van de Geer, 1971).
- Principal components analysis (Pau, 1981a; Van de Geer, 1971; Chien and Fu, 1967).
- Scatter analysis (Van de Geer, 1971; Everitt, 1974).
- Clustering procedures, e.g. dynamic clusters algorithm (Pau, 1981a; Everitt, 1974; Hartigan, 1975).
- Multivariate probability density estimation (Parzen, kernel functions, k-nearest neighbour estimators) (Fukunaga, 1972; Devijver, 1979; Parzen, 1962).
- Multivariate sampling plans (Pau et al., 1983).

Failure monitoring
- Statistics of level crossings, especially two-level crossings (Saeks and Liberty, 1977; Pau, 1981a).
- Spectral analysis and FFT (Chen, 1982).
- Kalman estimation (Pau, 1981a, 1977).
- Recursive least-squares estimators.
- Linear prediction ARMA, ARIMA estimators (Chen, 1982).
- Knowledge based or expert systems for failure monitoring (Pau, 1986).

5. Example: Correspondence analysis and its application

The problem is to diagnose defective machines among 33 machines, each described by 4 measurements, while deriving a sequential diagnostic strategy S and satisfying, in that order, three detection criteria: (α) maximum vibration level, (β) minimum flow, (γ) minimum electricity consumption.

5.1. Method

5.1.1. Introduction and problem analysis
(a) The case is set up as a clustering problem, where each of the 33 machines considered is described by measurement attributes (vibration level, operating time, electricity consumption, flow). The raw data are given in Figure 13. Some essential characteristics of this problem are the following:
(i) the answer requested is to reduce the number of alternatives for the diagnosis and failure location;
(ii) it is obvious, for technical reasons, that the four attributes are correlated;
(iii) the number of attributes measured on each machine is fairly small, and all observations are real valued and non-negative.

Machine   Vibration level   Operating time   Electricity consumption   Flow
no.       PRIC              TIME             CONS                      WATR
 1        509               74               1.5                       114
 2        425               80               1.5                       110
 3        446               72               1.6                       135
 4        564               65               1.6                       118
 5        547               53               1.8                       140
 6        450               68               1.6                       135
 7        473               65               1.6                       130
 8        484               56               1.7                       115
 9        456               68               1.6                       130
10        488               72               1.6                       114
11        530               55               1.7                       135
12        477               76               1.5                       110
13        589               53               1.6                       130
14        534               61               1.4                       122
15        536               57               1.7                       110
16        494               72               1.5                       135
17        425               65               1.8                       120
18        555               53               1.7                       125
19        543               57               1.6                       120
20        515               68               1.5                       130
21        452               76               1.5                       112
22        547               68               1.5                       120
23        421               76               1.4                       130
24        498               68               1.6                       120
25        467               65               1.7                       130
26        595               50               1.8                       135
27        414               68               1.7                       125
28        431               66               1.7                       110
29        452               72               1.5                       115
30        408               77               1.6                       119
31        478               59               1.8                       110
32        395               76               1.5                       120
33        543               57               1.5                       135
Fig. 13. Raw data of machine diagnosis case (Section 5).
However, the parameters of these relations are unknown and they can only be inferred from the sample of 33 machines.
(b) These characteristics justify the use of multivariate statistical analysis, and of correspondence analysis in particular because of its joint use of information about the machines and about the diagnostic measurements. The main steps of correspondence analysis are the following (Pau, 1981a; Chen, 1982):
Step 1. First, infer from the data estimated correlations between machines and between diagnostic measurements, a reduced set of independent feature measurements, according to which the 33 alternative machines may be ranked. As far as this step is concerned, and this step only, correspondence analysis is comparable to factor analysis (Van de Geer, 1971; Hartman, 1960), although the two differ in the remaining steps.
Step 2. Next, interpret the nature of these statistically independent feature measurements, by indicating the contribution to each of these by the original attribute measurements, and determine the diagnosis in terms of these features.
Step 3. Thereafter, rank the feature measurements by decreasing contributions to the reconstruction of the original 33 × 4 evaluation measurements; the best feature measurement (e.g. the first) is, in correspondence analysis, the one maximizing the variance in that direction; in other words, this is the feature measurement which produces the ranking with the highest possible discrimination among the 33 machines, thus reducing the doubt of the repairman.
Step 4. Finally, recommend for failure location those machines which get the most favorable ranking (in terms of the interpretation) on the first feature axis, and possibly also on the second axis.
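Steps 1 to 4 above can be illustrated with a minimal sketch, which is not the computation used by the author. It assumes NumPy, uses a singular value decomposition of the standardized residuals (a standard way of carrying out correspondence analysis, swapped in here for illustration), and assumes that the four runs of 33 values in Figure 13 are the columns PRIC, TIME, CONS, WATR in that order. The function name `correspondence_analysis` and all variable names are hypothetical.

```python
import numpy as np

# Raw data of Figure 13, read column by column (assumed order: PRIC, TIME, CONS, WATR).
PRIC = [509, 425, 446, 564, 547, 450, 473, 484, 456, 488, 530, 477, 589, 534, 536, 494, 425,
        555, 543, 515, 452, 547, 421, 498, 467, 595, 414, 431, 452, 408, 478, 395, 543]
TIME = [74, 80, 72, 65, 53, 68, 65, 56, 68, 72, 55, 76, 53, 61, 57, 72, 65,
        53, 57, 68, 76, 68, 76, 68, 65, 50, 68, 66, 72, 77, 59, 76, 57]
CONS = [1.5, 1.5, 1.6, 1.6, 1.8, 1.6, 1.6, 1.7, 1.6, 1.6, 1.7, 1.5, 1.6, 1.4, 1.7, 1.5, 1.8,
        1.7, 1.6, 1.5, 1.5, 1.5, 1.4, 1.6, 1.7, 1.8, 1.7, 1.7, 1.5, 1.6, 1.8, 1.5, 1.5]
WATR = [114, 110, 135, 118, 140, 135, 130, 115, 130, 114, 135, 110, 130, 122, 110, 135, 120,
        125, 120, 130, 112, 120, 130, 120, 130, 135, 125, 110, 115, 119, 110, 120, 135]
K = np.column_stack([PRIC, TIME, CONS, WATR]).astype(float)  # 33 x 4 incidence table


def correspondence_analysis(k):
    """Return (inertia per axis, machine coordinates G, measurement coordinates F)."""
    p = k / k.sum()                      # contingency table p(i, j)
    row_w = p.sum(axis=1)                # weights p(i, .)
    col_w = p.sum(axis=0)                # weights p(., j)
    expected = np.outer(row_w, col_w)    # independence model, removed below
    resid = (p - expected) / np.sqrt(expected)
    u, sv, vt = np.linalg.svd(resid, full_matrices=False)
    inertia = sv ** 2                    # one value per feature axis, decreasing
    G = u * sv / np.sqrt(row_w)[:, None]     # machines, weighted variance = inertia
    F = vt.T * sv / np.sqrt(col_w)[:, None]  # measurements, same scaling
    return inertia, G, F


inertia, G, F = correspondence_analysis(K)           # Step 1: feature axes
share = inertia / inertia.sum()                       # Step 3: share of inertia per axis
ranking = np.argsort(G[:, 0]) + 1                     # Step 4: machine numbers along axis 1
print("share of inertia per axis:", np.round(share, 3))
print("measurement coordinates on axis 1 (Step 2):", np.round(F[:, 0], 4))
print("machines ordered along axis 1:", ranking)
```

The sign of each axis returned by the decomposition is arbitrary, so the ordering along axis 1 has to be read together with the measurement coordinates `F[:, 0]` before deciding which end of the ranking is the "most favorable" one.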
(c) One essential advantage of this approach is that the decision maker will be provided with a two-dimensional chart, which he may easily interpret, and on which he may spot by eye, in a straightforward manner, the final reduced set of candidate machines. Also, apart from the number of feature measurements used in Step 4, no additional assumption is needed, because unsupervised multivariate statistical analysis is used. The effect of linear transformation and rescaling of the initial data is indicated in Section 5.1.2.6.

5.1.2. Theory and use of correspondence analysis (Chen, 1982; Hill, 1974; Pau, 1981a)

5.1.2.1. Notation. Let $k(I, J)$ be the incidence table of non-negative numbers, representing the attribute measurements $j \in J$, $j = 1, 2, 3, 4$, on the machines $i \in I$, $i = 1, \ldots, 33$. The marginals are defined as follows:
$$k(i, \cdot) \triangleq \sum_{j} k(i, j), \qquad k(\cdot, j) \triangleq \sum_{i} k(i, j).$$
It is convenient to operate on the contingency table $p(I, J)$, rather than on the incidence table $k(I, J)$:
$$p(i, j) \triangleq k(i, j) \Big/ \sum_{m, n} k(m, n),$$
with corresponding marginals $p(i, \cdot)$ and $p(\cdot, j)$. $r$ will be the number of feature measurements extracted; here $r \le 4$.
5.1.2.2. Concepts and principles of interpretation. Generalizing the classical partition of a contingency table by a $\chi^2$ test (Pearson), correspondence analysis yields natural clusters made of rows $i \in I$ and columns $j \in J$ which go together to form natural groups in the feature measurement space. Their construction is essentially based upon geometrical proximities between rows $i \in I$ and/or columns $j \in J$; these proximities may be identified by visual inspection, if only two feature measurements are considered, by building coordinate axes for all machines $i \in I$ and attribute measurements $j \in J$. Such representations, called maps, are precious tools for visual clustering, and thus for diagnosing causality relations between measurements and machines.
By construction, all the effects of statistically dependent rows and columns such that
$$k(i, j) = k(i, \cdot)\, k(\cdot, j)$$
will be removed. Equivalent machines will thus appear immediately as having very close representations on the maps.
The machine space $I$ is provided with a distance measure, called the $\chi^2$ metric, defined by
$$d^2(i_1, i_2) = \sum_{j} p(\cdot, j)\, [x(i_1, j) - x(i_2, j)]^2, \qquad x(i, j) \triangleq \frac{p(i, j)}{p(i, \cdot)\, p(\cdot, j)} - 1.$$
Moreover, each machine $i \in I$ and each measurement $j \in J$ are assigned the weights $p(i, \cdot)$ and $p(\cdot, j)$, respectively, for all variance computations using the $\chi^2$ metric.
5.1.2.3. Theory of correspondence analysis: summary (Pau, 1981a; Chen, 1982; Hill, 1974).
(a) Correspondence analysis, or as it is also called, Fisher's canonical analysis of contingency tables, amounts to looking for vectors $F = {}^{t}(F(1), \ldots, F(\mathrm{Card}(J)))$ and $G = {}^{t}(G(1), \ldots, G(\mathrm{Card}(I)))$, where $\mathrm{Card}(\cdot)$ is the number of elements in the set, such that when the functions $f$, $g$ of the random variables $(Y, X) = (j, i)$ are defined by the relations
$$f(Y) = F(j), \qquad g(X) = G(i),$$
then the correlation between the random variables $f(Y)$, $g(X)$ is maximum. Correspondence analysis can be applied to non-negative incidence tables $k(I, J)$, as well as to contingency tables $p(I, J)$; the former will be considered in the following.
(b) Let $k(I, \cdot)$ and $k(\cdot, J)$ be the diagonal matrices of row and column totals, assuming none to be zero. The sequence of operations
$$F^{(1)} = (k(\cdot, J))^{-1}\, {}^{t}k(I, J)\, G^{(1)},$$
$$G^{(2)} = (k(I, \cdot))^{-1}\, k(I, J)\, F^{(1)},$$
$$F^{(2)} = (k(\cdot, J))^{-1}\, {}^{t}k(I, J)\, G^{(2)}, \quad \text{etc.},$$
in which new vectors $F^{(m)}$, $G^{(m)}$ are successively derived from an initial vector $G^{(1)}$, is referred to here as the $\mathrm{Co}(k(I, J))$ algorithm corresponding to the tableau $k(I, J)$.
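The sequence of operations in (b) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the small table `TABLE` is made up for the example, the function names are hypothetical, and the re-centering and rescaling of G after each pass is an assumption added here (it is not stated in the text) so that the iteration does not settle on the trivial constant vector.

```python
# Made-up 4 x 3 incidence table, for illustration only.
TABLE = [[20, 5, 2],
         [15, 10, 4],
         [4, 12, 14],
         [2, 6, 18]]


def weighted_center_and_scale(g, weights):
    """Center g on its weighted mean, scale to unit weighted variance; return (g, norm)."""
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, g)) / total
    centered = [x - mean for x in g]
    norm = (sum(w * x * x for w, x in zip(weights, centered)) / total) ** 0.5
    return [x / norm for x in centered], norm


def one_pass(k, g):
    """One step G -> F -> G: F = k(.,J)^-1 tk G, then G = k(I,.)^-1 k F."""
    rows, cols = range(len(k)), range(len(k[0]))
    row_tot = [sum(k[i]) for i in rows]
    col_tot = [sum(k[i][j] for i in rows) for j in cols]
    f = [sum(k[i][j] * g[i] for i in rows) / col_tot[j] for j in cols]
    g_next = [sum(k[i][j] * f[j] for j in cols) / row_tot[i] for i in rows]
    return f, g_next


def reciprocal_averaging(k, iterations=200):
    """Iterate the passes from an arbitrary non-constant start vector G(1)."""
    row_tot = [sum(row) for row in k]
    g, _ = weighted_center_and_scale([float(i) for i in range(len(k))], row_tot)
    lam, f = 0.0, []
    for _ in range(iterations):
        f, g_next = one_pass(k, g)
        # Added assumption: re-center and rescale; the scale factor estimates
        # the leading non-trivial eigenvalue of one full pass.
        g, lam = weighted_center_and_scale(g_next, row_tot)
    return lam, f, g


lam, f, g = reciprocal_averaging(TABLE)
print("eigenvalue estimate:", lam)
print("row coordinates G:", g)
print("column coordinates F:", f)
```

The fixed number of iterations is a simplification; a convergence test on successive values of `lam` would be the more careful choice, and the sketch does not handle a table whose rows are all proportional (the scale factor would then be zero).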
(c) Its eigenvectors, as defined below, are the solutions of the correspondence analysis problem, and the coordinates of the individuals and measurements in the feature space are simply $F(j, n) = F_n^*(j)$, $G(i, n) = G_n^*(i)$, where $n = 1, \ldots, \mathrm{Min}(\mathrm{Card}(I), \mathrm{Card}(J))$, and $F_n^*$, $G_n^*$ are the eigenvectors of rank $n$ of the algorithm $\mathrm{Co}(k(I, J))$, when ranked by decreasing eigenvalues $\lambda_n$.

(d) Each triple $(\rho, F^*, G^*)$ is an eigensolution if:

$$\rho\, G^* = (k(I, \cdot))^{-1}\, k(I, J)\, F^*, \qquad \rho\, F^* = (k(\cdot, J))^{-1}\, {}^{t}k(I, J)\, G^*, \qquad \rho = \sqrt{\lambda}.$$

5.1.2.4. Computational formulas.

(1) Define the dimension $1 \le r \le \mathrm{Min}(\mathrm{Card}(I), \mathrm{Card}(J))$ of the feature space after data compression.

(2) (a) $G_n^*$ and $\lambda_n = \rho_n^2$ are respectively the $(n + 1)$st column eigenvector and associated eigenvalue of the symmetrical semi-definite matrix $S = [s_{jl}]$:

$$s_{jl} = \sum_{i \in I} \frac{p(i, j)\, p(i, l)}{p(i, \cdot)\, \sqrt{p(\cdot, j)\, p(\cdot, l)}}, \qquad j, l \in J,$$

which has $\lambda_0 = 1$ as largest eigenvalue (for $\lambda_0$, all the coordinates of $G_0^*$ are equal);

(b) These eigenvectors $G_n^* = [G_n^*(i),\ i = 1, \ldots, \mathrm{Card}(I)]$ are ranked by decreasing eigenvalues $1 \ge \lambda_1 \ge \cdots \ge \lambda_r > 0$. They are the factor axes of the cluster $N(I)$.

(3) The factor axes $F_n^*$ of the cluster $N(J)$ are associated to the same eigenvalues $\lambda_n$, and

$$F_n^* = \frac{1}{\sqrt{\lambda_n}}\, (p(\cdot, J))^{-1}\, {}^{t}p(I, J)\, G_n^*, \qquad (p(\cdot, J))^{-1}\, {}^{t}p(I, J) = [\, p(j, i) / p(\cdot, j) \,], \quad i = \text{row};\ j = \text{column}.$$

(4) (a) The coordinate $G(i, n)$, $n = 1, \ldots, r$, of the individual $i \in I$ on the factor axis $G_n^*$ is $G_n^*(i)$.
(b) The coordinate $F(j, n)$, $n = 1, \ldots, r$, of the measurement $j \in J$ on the factor axis $F_n^*$ is $F_n^*(j)$.
(c) Both individuals $i \in I$ and measurements $j \in J$ may then be displayed in the same $r$-dimensional feature space, with basis vectors $G_n^*$, $n = 1, \ldots, r$.
(d) $$G(i, n) = \frac{1}{\sqrt{\lambda_n}}\, \frac{1}{p(i, \cdot)} \sum_{j \in J} p(i, j)\, F(j, n), \qquad i \in I,\ n = 1, \ldots, r.$$

(5) Data reconstruction formula:

$$p(i, j) = p(i, \cdot)\, p(\cdot, j) \Big[ 1 + \sum_{n = 1, \ldots, r} \sqrt{\lambda_n}\, F(j, n)\, G(i, n) \Big].$$

5.1.2.5.
Contributions, and interpretations of the factor axes representing the feature measurements. On a map, the squared Euclidean distance $D$ between rows and/or columns has the same value as the $\chi^2$ distance between the corresponding profiles, and

$$\lambda_n = \sum_{j} p(\cdot, j)\, (F(j, n))^2 = \sum_{i} p(i, \cdot)\, (G(i, n))^2, \qquad n = 1, \ldots, r, \qquad \tau_n = \lambda_n / \mathrm{Trace}(S).$$

This justifies the following definitions:
(i) $p(i, \cdot)\, (G(i, n))^2\, \mathrm{Sign}(G(i, n))$ is the contribution of the row/machine $i$ to the factor axis $n$ of inertia $\lambda_n$;
(ii) $p(\cdot, j)\, (F(j, n))^2\, \mathrm{Sign}(F(j, n))$ is the contribution of the column/measurement $j$ to the factor axis $n$ of inertia $\lambda_n$.
The rule is then to interpret the feature axis $n$ with reference only to those machines and measurements which have the largest (or smallest) contributions to that axis.

5.1.2.6. Effect of rescaling the data k(I, J). If the attribute measurement $k(i, j)$ is rescaled by a factor $a_j > 0$, and if the modified $x$ coordinates are noted $x_a$, then

$$x_a(i, j) = (x(i, j) + 1) \left[ 1 + (a_j - 1)\, \frac{p(i, j)}{p(i, \cdot)} \right]^{-1} - 1.$$

If we assume $a_j$ small and $\mathrm{Card}(J)$ large, we may replace $p(i, j)$ by its expected value and get the approximation

$$x_a(i, j) \approx \left[ 1 + \frac{a_j - 1}{\mathrm{Card}(J)} \right]^{-1} (x(i, j) + 1) - 1.$$

As a consequence, the modified $\chi^2$ distance becomes

$$d_a^2(i_1, i_2) = a_j \left[ 1 + \frac{a_j - 1}{\mathrm{Card}(J)} \right]^{-2} d^2(i_1, i_2).$$

In other words, if one attribute measurement $j \in J$ is rescaled, essentially only the point representing this measurement will be moved, whereas all distances in the machine space $I$ will be multiplied by the same factor. Consequently, rescaling does not affect the relative positions of the machines, and the machine diagnosis procedure still applies.

Fig. 14. Coordinates of all measurements and machines (Section 5).

1. Coordinates F of the measurements

M             1          2          3
F(PRIC)   -0.03785   -0.00886    0.00010
F(CONS)    0.04187    0.02053   -0.08758
F(WATR)    0.05526    0.06180    0.00058
F(TIME)    0.17734   -0.05025    0.00032

2. Coordinates G of the machines

M             1          2          3
G(L26)    -0.11505    0.02421   -0.00150
G(L13)    -0.10726    0.00762    0.00304
G(L18)    -0.09264    0.00778   -0.00159
G(L15)    -0.08407   -0.03511   -0.00472
G(L19)    -0.07633   -0.00993    0.00033
G(L5)     -0.06924    0.05048   -0.00186
G(L4)     -0.06350   -0.03973    0.00142
G(L33)    -0.05833    0.03013    0.00587
G(L11)    -0.05656    0.04041   -0.00033
G(L14)    -0.05310   -0.01005    0.00656
G(L8)     -0.04395    0.00195   -0.00599
G(L22)    -0.03896   -0.03516    0.00438
G(L31)    -0.03345   -0.01782   -0.01013
G(L20)    -0.00388    0.00380    0.00537
G(L24)    -0.00200   -0.01753    0.00005
G(L1)      0.00459   -0.05232    0.00285
G(L10)     0.01442   -0.04054   -0.00104
G(L7)      0.01917    0.02893    0.00087
G(L25)     0.02446    0.03182   -0.00243
G(L16)     0.03331    0.01714    0.00617
G(L12)     0.03458   -0.05806    0.00130
G(L28)     0.03717   -0.01573   -0.00833
G(L9)      0.04593    0.02954    0.00064
G(L29)     0.04765   -0.02400    0.00103
G(L17)     0.05156    0.02187   -0.00978
G(L6)      0.05731    0.04714    0.00146
G(L21)     0.06003   -0.04308    0.00087
G(L3)      0.07663    0.03911    0.00179
G(L27)     0.08129    0.03521   -0.00539
G(L2)      0.10110   -0.04922   -0.00005
G(L23)     0.11189    0.02595    0.00689
G(L30)     0.11780   -0.00506   -0.00246
G(L32)     0.12948    0.00688    0.00054

3. Eigenvalues and inertia

r    Eigenvalue     Inertia    Cumulative
1    0.4629E-02     82.07%      82.07%
2    0.9931E-03     17.61%      99.68%
3    0.1817E-04      0.32%     100.00%

4. Eigenvectors

 0.84848   -0.47204   -0.23851    0.01960
 0.31096    0.81047   -0.49587    0.02369
 0.04857    0.02989    0.03164   -0.99787
 0.42548    0.34558    0.83441    0.05752

5.2. Case results

Following the procedure presented in Section 5.1, the theory of which was summarized in Section 5.1.2, we will in the following interpret the numerical results obtained, which are displayed in the companion Figures 14, 15 and 16.

5.2.1. Step 1: Computation of the feature measurements.
First, $r = 3$ feature measurements are extracted; they are the eigenvectors $G_1^*$, $G_2^*$, $G_3^*$.

5.2.2. Step 2: Interpretation of the feature measurements.

(a) They are obvious from the reading of the computed contributions of the machines and measurements to $G_1^*$, $G_2^*$, and $G_3^*$ (see Figure 14).
(i) $G_1^*$: The first feature measurement opposes the operating time (contribution = 0.304E-02) to the vibration level (contribution = -0.103E-02), while the flow has a weaker contribution, of the same sign as that of the operating time; this first feature measurement is thus the vibration level per unit of operating time.
(ii) $G_2^*$: The second feature measurement opposes the flow (contribution = 0.691E-03) to the operating time (contribution = -0.244E-03); the second feature measurement is thus the flow required for running the machine.
(iii) $G_3^*$: The third feature measurement isolates the electricity consumption alone (contribution = 0.181E-04); this means that it has only a minor impact on the machine diagnosis problem.

(b) The goals are to fulfill, in the given order, the following diagnostic criteria:
($\alpha$) maximize the vibration level per unit operating time, thus select machines with large positive contributions and coordinates on $G_1^*$;
($\beta$) minimize the flow, thus select machines with large positive contributions and coordinates on $G_2^*$;
($\gamma$) minimize the electricity consumption, thus select machines with large positive contributions and coordinates on $G_3^*$.

5.2.3. Step 3: Ranking the feature measurements. The numerical results from Figure 14 yield:
$\lambda_1$, eigenvalue of $G_1^*$ = 0.4629E-02, or $\tau_1$ = 82.07%,
$\lambda_2$, eigenvalue of $G_2^*$ = 0.993E-03, or $\tau_2$ = 17.61%,
$\lambda_3$, eigenvalue of $G_3^*$ = 0.181E-04, or $\tau_3$ = 0.32%.
Here, it is obvious that the machine diagnosis would essentially rely on the first feature measurement (vibration level per unit of operating time) and possibly, to a lesser extent, on the second (flow).
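The inertia percentages quoted in Step 3 can be checked with a few lines of arithmetic. The sketch below assumes that each $\tau_n$ is the share of $\lambda_n$ in the sum of the three eigenvalues listed in Figure 14; this normalization is an assumption made for the check, and the variable names are illustrative only.

```python
# Eigenvalues as printed in Figure 14 (four significant digits).
eigenvalues = [0.4629e-02, 0.9931e-03, 0.1817e-04]

total = sum(eigenvalues)

# Share of each factor axis in the total inertia, in percent (assumed
# normalization: sum of the three listed eigenvalues).
shares = [100.0 * lam / total for lam in eigenvalues]

# Cumulative percentages, axis by axis.
cumulative = []
running = 0.0
for share in shares:
    running += share
    cumulative.append(running)

for n, (share, cum) in enumerate(zip(shares, cumulative), start=1):
    print(f"axis {n}: {share:.2f}% of inertia, cumulative {cum:.2f}%")
```

Under this assumption the first two axes together carry more than 99% of the inertia, which is consistent with reducing the three-criteria problem to two criteria.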
Our three-criteria problem has been reduced to a two-criteria problem with $G_1^*$ as the leading diagnostic criterion, to be maximized.

5.2.4. Step 4: Machine diagnosis.

(a) Looking at the machines in the first quadrant of Figure 16, one sees that the non-dominated points according to the two criteria ($\alpha$) and ($\beta$) are 32, 23, 27, 3, 30, 2.

[Fig. 16. Map of all 4 measurements and 33 machines.]

(b) Because we want the criterion ($\alpha$) to dominate, we will have to make an ordering within these non-dominated solutions. Figure 15, which contains the contributions of the machines to $G_1^*$, Figure 14, which contains their coordinates, and last but not least the map of Figure 16, give us, according to the rule ($\alpha$), the solution: Diagnose as defective machine #32; if not: #30; if not: #23; if not: #2; if not: #27; if not: #3; etc. However, the first machine in this sequence that also has a large positive contribution to $G_2^*$ (flow) according to criterion ($\beta$) is Machine 27, and the next Machine 3, or Machine 6.
Machines 30 and 2 have negative contributions to $G_2^*$, and should be eliminated.

(c) By visual clustering, one could select right away the machines by the original criteria of minimizing the vibration level, the operating time, the electricity consumption, or the flow per se, by looking at the factor map of Figure 16 for the machines which are close to the points representing these criteria/measurements:
(i) Max vibration level: Machines 14, 19, 31, 24, 8, 20, 13, 18, close to PRIC.
(ii) Min operating time: Machines 2, 21, 1, 10, close to TIME.
(iii) Min electricity consumption: Machines 17, 16, close to CONS.
(iv) Min flow: Machines 6, 3, 27, 25, 11, close to WATR.
Notice the large differences between the previous selections (a), (b) according to criteria ($\alpha$) and ($\beta$), and the latter ones (c).

5.2.5. Conclusion. Because of the significant contributions of $G_1^*$ and $G_2^*$, and because of the removal of correlated effects, we recommend the following reduced diagnosis of defective machines: Machines 32, 23, 27, 3 (in that order, the first being the most likely to have failed).

References

The bibliography on statistical and pattern recognition approaches to failure diagnosis is enormous, and scattered across many sections of the technical literature, often within the context of specific applications. Therefore, in addition to a few recent references of a general nature, a number of major public conferences dealing to a substantial extent with technical diagnostics are listed. Neither list is by any means complete; both are indicated to serve as starting points.

Bellman, R. (1966). Dynamic programming, pattern recognition and location of faults in complex systems. J. Appl. Probab. 3, 268-280.
Benzécri, J. P. (1977). L'Analyse des Données, Vol. 1 & 2. Dunod, Paris.
Chen, C. H. (1982). Digital Waveform Processing and Recognition. CRC Press, Boca Raton, FL.
Chien, Y. T. and Fu, K. S. (1967). On the generalized Karhunen-Loève expansion. IEEE Trans. Inform. Theory 13, 518-520.
Clark, R. N. et al. (1975). Detecting instrument malfunctions in control systems. IEEE Trans. Aerospace Electron. Systems 11 (4).
Collacott, R. A. (1976). Mechanical Fault Diagnosis and Condition Monitoring. Chapman & Hall, London.
Devijver, P. A. (1979). New error bounds with the nearest neighbor rule. IEEE Trans. Inform. Theory 25, 749-753.
Everitt, B. (1974). Cluster Analysis. Wiley, New York.
Fu, K. S. (1968). Sequential Methods in Pattern Recognition and Machine Learning. Academic Press, New York.
Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition. Academic Press, New York.
Hartigan, J. A. (1975). Clustering Algorithms. Wiley, New York.
Hartman, H. (1960). Modern Factor Analysis. University of Chicago Press, Chicago, IL.
Hill, M. O. (1974). Correspondence analysis: a neglected multivariate method. Appl. Statist. Ser. C 23 (3), 340-354.
IEEE Spectrum (1981). Special issue on reliability, October 1981.
IMEKO (1980). TC-10: Glossary of terms and definitions recommended for use in technical diagnostics and condition-based maintenance. IMEKO Secretariat, Budapest.
Narendra, P. M. and Fukunaga, K. (1977). A branch and bound algorithm for feature subset selection. IEEE Trans. Comput. 26, 917-922.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065-1076.
Pau, L. F. (1977). An adaptive signal classification procedure: application to aircraft engine monitoring. Pattern Recognition 9 (3), 121-130.
Pau, L. F. (1978). Classification du signal par tests séquentiels non-paramétriques. In: Proc. Conf. Reconnaissance des formes et traitement des images. INRIA, Rocquencourt, pp. 159-168.
Pau, L. F. (1981a). Failure Diagnosis and Performance Monitoring. Marcel Dekker, New York.
Pau, L. F. (1981b). Applications of pattern recognition to failure analysis and diagnosis. In: K. S. Fu, ed., Applications of Pattern Recognition. CRC Press, Boca Raton, FL, Chapter 5.
Pau, L. F. (1984). Failure detection processes by an expert system and hybrid pattern recognition. Pattern Recognition Lett. 2, 419-425.
Pau, L. F. (1986). A survey of expert systems for failure diagnosis, test generation and maintenance. Expert Systems J. 3 (2), 100-111.
Pau, L. F. (1987a). Knowledge representation approaches in sensor fusion. In: Proc. IFAC World Congress. Pergamon Press, Oxford.
Pau, L. F. (1987b). System availability in presence of an imperfect test and monitoring system. IEEE Trans. Aerospace Electron. Systems 23 (5), 625-633.
Pau, L. F. and Kittler, J. (1980). Automatic inspection by lots in the presence of classification errors. Pattern Recognition 12 (4), 237-241.
Pau, L. F., Toghrai, C. and Chen, C. H. (1983). Multivariate sampling plans in quality control: a numerical example. IEEE Trans. Reliability 32 (4), 359-365.
Rasmussen, J. and Rouse, W. B. (Editors) (1981). Human Detection and Diagnosis of System Failures. NATO Conference series, Vol. 15, Series 3. Plenum Press, New York.
Saeks, R. and Liberty, S. (1977). Rational Fault Analysis. Marcel Dekker, New York.
Sebestyen, G. (1962). Decision Making Processes in Pattern Recognition. MacMillan, New York.
Slagle, J. R. and Lee, R. C. T. (1971). Application of game tree searching techniques to sequential pattern recognition. Comm. ACM 14 (2), 103-110.
Van de Geer, J. P. (1971). Introduction to Multivariate Analysis for the Social Sciences. Freeman, San Francisco, CA.
Wald, A. (1947). Sequential Analysis. Wiley, New York.

Conferences

IEEE Automatic Testing Conference (AUTOTESTCON).
IEEE International Test Conference (Cherry Hill).
IEEE/AIAA Annual Reliability and Maintainability Conferences.
IEEE/IFIP International Conferences on Fault-Tolerant Computing.
IEEE Reliability Physics.
IEEE/ASME/AIAA American Automatic Control Conference.
ASME (American Society of Mechanical Engineers) International Conference on Non-destructive Testing.
ASNT (American Society for Non-destructive Testing) Annual QUALTEST Conference.
ASNT (American Society for Non-destructive Testing) Topical Conferences.
IFAC (International Federation on Automatic Control), SAFECOMP (Safe Computing) Conference.
IMEKO (International Measurement Confederation) International Conference on Technical Diagnostics.
IBE Conf. Ltd, International Conference on Terotechnology, England.
BINDT (British Institute of NDT), Annual Conference on Non-destructive Testing, England.
EFMS (European Federation of Maintenance Societies), European Maintenance Congress.
Mechanical Failure Prevention Group (MFPG), National Bureau of Standards, Conference on Detection, Diagnosis and Prognosis.
ISTFA (International Society for Testing and Failure Analysis), Annual Testing and Failure Analysis Conference.
IFS Publ., International Conference on Automated Inspection and Product Control, England.
NETWORK Ltd, Annual Conference on Automatic Testing, England.
Institute of Environmental Sciences, Annual Conference, USA.
ASM (American Society of Metals), International Conference on Non-destructive Evaluation in the Nuclear Industry.
ESPRIT-supported Conferences on Expert Systems for Failure Diagnosis.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 313-331

16

Nonparametric Estimation of Density and Hazard Rate Functions when Samples are Censored*

W. J. Padgett

1. Introduction

A common and very old problem in statistics is the estimation of an unknown probability density function. In particular, the problem of nonparametric probability density estimation has been studied for many years. Summaries of results on nonparametric density estimation based on complete (uncensored) random samples have been listed recently by several authors, including Fryer [18], Tapia and Thompson [52], Wertz and Schneider [60], and Bean and Tsokos [2].
Also, a review of results for censored samples has been given by Padgett and McNichols [39]. In addition to its importance in theoretical statistics, nonparametric density estimation has been utilized in hazard analysis, life testing, and reliability, as well as in the areas of nonparametric discrimination and high energy physics [20].

* This work was supported by the U.S. Air Force Office of Scientific Research and Army Research Office.

The purpose of this article is to present the different types of nonparametric density estimates that have been proposed for the situation in which the sample data are censored or incomplete. This type of data arises in many life testing situations and is common in survival analysis problems (see Lagakos [25] and Kalbfleisch and Prentice [21], for example). In many of these situations, some observations may be censored or truncated from the right, referred to as right-censorship. This occurs often in medical trials when the patients may enter treatment at different times and then either die from the disease under investigation or leave the study before its conclusion. A similar situation may occur in industrial life testing when items are removed from the test at random times for various reasons. It is of interest to be able to estimate nonparametrically the unknown density of the lifetime random variable from this type of data without ignoring or discarding the right-censored information. The development of such nonparametric density estimators has only occurred in the past six or seven years and the avenues of investigation have been similar to those for the complete sample case, except that the problems are generally more difficult mathematically. The various types of estimators from right-censored samples that have been proposed in the literature will be indicated and briefly discussed here.
They include histogram-type estimators, kernel-type estimators, maximum likelihood estimators, Fourier series estimators, and Bayesian estimators. In addition, since the hazard rate function estimation problem is closely related to the density estimation problem, various types of nonparametric hazard rate estimators from right-censored data will be briefly mentioned. Due to their computational simplicity and other properties, the kernel-type density estimators will be emphasized, and some examples will be given in Section 7. Before beginning the discussion of the various estimators, in the next section the required definitions and notation will be presented.

2. Notation and preliminaries

Let $X_1^0, X_2^0, \ldots, X_n^0$ denote the true survival times of $n$ items or individuals which are censored on the right by a sequence $U_1, U_2, \ldots, U_n$ which in general may be either constants or random variables. It is assumed that the $X_i^0$'s are nonnegative independent identically distributed random variables with common unknown distribution function $F^0$. For the problem of density estimation, it is assumed that $F^0$ is absolutely continuous with density $f^0$. The corresponding hazard rate function is defined by $r^0 = f^0 / (1 - F^0)$. The observed right-censored data are denoted by the pairs $(X_i, \Delta_i)$, $i = 1, \ldots, n$, where

$$X_i = \min\{X_i^0, U_i\}, \qquad \Delta_i = \begin{cases} 1 & \text{if } X_i^0 \le U_i, \\ 0 & \text{if } X_i^0 > U_i. \end{cases}$$

Thus, it is known which observations are times of failure or death and which ones are censored or loss times. The nature of the censoring mechanism depends on the $U_i$'s:
(i) If $U_1, \ldots, U_n$ are fixed constants, the observations are time-truncated. If all $U_i$'s are equal to the same constant, then the case of Type I censoring results.
(ii) If all $U_i = X_{(r)}^0$, the $r$th order statistic of $X_1^0, \ldots, X_n^0$, then the situation is that of Type II censoring.
(iii) If $U_1, \ldots, U_n$ constitute a random sample from a distribution $H$ (which is usually unknown) and are independent of $X_1^0, \ldots, X_n^0$, then $(X_i, \Delta_i)$, $i = 1, 2, \ldots, n$, is called a randomly right-censored sample.

The random censorship model (iii) is attractive because of its mathematical convenience. Many of the estimators discussed later are based on this model. Assuming (iii), $\Delta_1, \ldots, \Delta_n$ are independent Bernoulli random variables and the distribution function $F$ of each $X_i$, $i = 1, \ldots, n$, is given by $1 - F = (1 - F^0)(1 - H)$. Under the Koziol and Green [24] model of random censorship, which is the proportional hazards assumption of Cox [7], it is assumed that there is a positive constant $\beta$ such that $1 - H = (1 - F^0)^{\beta}$. Then by a result of Chen, Hollander and Langberg [6], the pairs $(X_i^0, U_i)$, $i = 1, \ldots, n$, follow the proportional hazards model if and only if $(X_1, \ldots, X_n)$ and $(\Delta_1, \ldots, \Delta_n)$ are independent. This Koziol-Green model of random censorship arises in several situations (Efron [11], Csörgő and Horváth [8], Chen, Hollander and Langberg [6]). Note that $\beta$ is a censoring coefficient since $a = P(X_i^0 \le U_i) = (1 + \beta)^{-1}$, which is the probability of an uncensored observation.

Based on the censored sample $(X_i, \Delta_i)$, $i = 1, \ldots, n$, a popular estimator of the survival probability $S^0(t) = 1 - F^0(t)$ at $t \ge 0$ is the product-limit estimator, proposed by Kaplan and Meier [22] as the 'nonparametric maximum likelihood estimator' of $S^0$. This estimator was shown to be 'self-consistent' by Efron [11]. Let $(Z_i, \Delta_i')$, $i = 1, \ldots, n$, denote the ordered $X_i$'s along with their corresponding $\Delta_i$'s. A value of the censored sample will be denoted by the corresponding lower case letters $(x_i, \delta_i)$ or $(z_i, \delta_i')$ for the unordered or ordered sample, respectively. The product-limit estimator of $S^0$ is defined by [11]

$$\hat P_n(t) = \begin{cases} 1, & 0 \le t \le Z_1, \\ \displaystyle\prod_{i=1}^{k-1} \left( \frac{n - i}{n - i + 1} \right)^{\Delta_i'}, & t \in (Z_{k-1}, Z_k],\ k = 2, \ldots, n, \\ 0, & t > Z_n. \end{cases}$$
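The product-limit estimator just defined can be evaluated directly from the ordered sample. The following is a minimal sketch in plain Python, assuming no tied observation times; the function name `product_limit` and the four-point sample are hypothetical and serve only as an illustration.

```python
def product_limit(times, deltas, t):
    """Product-limit estimate of the survival probability at time t.

    times  : observed values X_i (failure or censoring times)
    deltas : indicators, 1 for an observed failure, 0 for a censored value
    Assumes no ties among the observed times.
    """
    pairs = sorted(zip(times, deltas))      # ordered sample (Z_i, Delta_i')
    n = len(pairs)
    if t > pairs[-1][0]:                    # beyond the largest observation
        return 0.0
    surv = 1.0
    for i, (z, d) in enumerate(pairs, start=1):
        if z >= t:                          # t lies in (Z_{k-1}, Z_k]: stop at k
            break
        if d == 1:                          # censored values contribute factor 1
            surv *= (n - i) / (n - i + 1)
    return surv


# Hypothetical sample: the second observation is right-censored.
times = [3.0, 5.0, 7.0, 9.0]
deltas = [1, 0, 1, 1]
values = [product_limit(times, deltas, t) for t in (3.0, 4.0, 8.0, 10.0)]
```

With this convention the estimate is left-continuous and drops to zero after the largest observation, whether or not that observation is censored.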
Denote the product-limit estimator of $F^0(t)$ by $\hat F_n(t) = 1 - \hat P_n(t)$, and let $s_j$ denote the jump of $\hat P_n$ (or $\hat F_n$) at $Z_j$, that is,

$$s_j = \begin{cases} 1 - \hat P_n(Z_2), & j = 1, \\ \hat P_n(Z_j) - \hat P_n(Z_{j+1}), & j = 2, \ldots, n - 1, \\ \hat P_n(Z_n), & j = n. \end{cases} \tag{2.1}$$

Note that $s_j = 0$ if and only if $\Delta_j' = 0$, $j < n$, that is, if $Z_j$ is a censored observation. The product-limit estimator has played a central role in the analysis of censored survival data (Miller [36]), and its properties have been studied extensively by many authors, for example, Breslow and Crowley [4], Földes, Rejtő and Winter [15], and Wellner [59]. Many of the nonparametric density estimators from right-censored data are naturally based on the product-limit estimator, beginning with the histogram-type and kernel-type estimators.

3. Histogram and kernel estimators

One of the simplest nonparametric estimators of the density function for randomly right-censored samples is the histogram estimator. Although they are simple to compute, histogram estimators are not smooth and are generally not suited to sophisticated inference procedures. Estimation of the density function and hazard rate of survival time based on randomly right-censored data was apparently first studied by Gehan [19]. The life table estimate of the survival function was used to estimate the density $f^0$ as follows: The observations $(x_i, \delta_i)$, $i = 1, \ldots, n$, were grouped into $k$ fixed intervals $[t_1, t_2), [t_2, t_3), \ldots, [t_k, \infty)$, with the finite widths denoted by $h_i = t_{i+1} - t_i$, $i = 1, \ldots, k - 1$.
Letting $n_i'$ denote the number of individuals alive at time $t_i$, $l_i$ be the number of individuals censored (lost or withdrawn from the study) in the interval $[t_i, t_{i+1})$, and $d_i$ be the number of individuals dying or failing in the $i$th interval (where time to death or failure is recorded from time of entry into the study), define $\hat q_i = d_i / n_i$ and $\hat p_i = 1 - \hat q_i$, where $n_i = n_i' - \tfrac{1}{2} l_i$. Therefore, $\hat q_i$ is an estimate of the probability of dying or failing in the $i$th interval, given exposure to risk in the $i$th interval. Let $\hat\Pi_i = \hat p_{i-1} \hat\Pi_{i-1}$, where $\hat\Pi_1 \equiv 1$. Gehan's estimate of $f^0$ at the midpoint $t_{mi}$ of the $i$th interval is then

$$\hat f(t_{mi}) = \frac{\hat\Pi_i - \hat\Pi_{i+1}}{h_i} = \frac{\hat\Pi_i\, \hat q_i}{h_i}, \qquad i = 1, \ldots, k - 1.$$

An expression for estimating the large sample approximation to the variance of $\hat f(t_{mi})$ was also given in [19].

Using the product-limit estimator $\hat F_n$ of $F^0$, Földes, Rejtő, and Winter [16] defined a histogram estimator of $f^0$ on a specified interval $[0, T]$, $T > 0$. For integer $n > 0$, let $0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_{v_n}^{(n)} = T$ be a partition of $[0, T]$ into $v_n$ subintervals $I_i^{(n)}$, where

$$I_i^{(n)} = \begin{cases} [t_{i-1}^{(n)}, t_i^{(n)}), & 1 \le i < v_n, \\ [t_{v_n - 1}^{(n)}, T], & i = v_n. \end{cases}$$

Then their histogram estimator is

$$\hat f_n(x) = \frac{\hat F_n(t_i^{(n)}) - \hat F_n(t_{i-1}^{(n)})}{t_i^{(n)} - t_{i-1}^{(n)}}, \qquad x \in I_i^{(n)}. \tag{3.1}$$

If $x \notin [0, T]$, $\hat f_n(x)$ is either undefined or defined arbitrarily. Notice that if none of the observations are censored, $\hat F_n$ reduces to the empirical distribution function, and (3.1) becomes the usual histogram estimator with respect to the given partition. The strong uniform consistency of $\hat f_n$ on $[0, T]$ was proven by Földes, Rejtő, and Winter [16] under some conditions on the partition, provided that $f^0$ was continuous on $[0, T]$ and $H(T^-) < 1$, where $H(T^-)$ denotes the limit from the left of $H$ at $T$. This last condition is common in obtaining consistency properties under random right-censorship and ensures that uncensored observations can be obtained from the entire interval of interest.
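The histogram estimator (3.1) only needs the product-limit estimate at the partition points. The sketch below, in plain Python, assumes no tied observation times and uses the left-continuous product-limit convention of Section 2; the names `product_limit` and `histogram_density`, the sample, and the partition are hypothetical choices made for illustration.

```python
def product_limit(times, deltas, t):
    """Product-limit estimate of the survival probability at t (no ties)."""
    pairs = sorted(zip(times, deltas))
    n = len(pairs)
    if t > pairs[-1][0]:
        return 0.0
    surv = 1.0
    for i, (z, d) in enumerate(pairs, start=1):
        if z >= t:
            break
        if d == 1:
            surv *= (n - i) / (n - i + 1)
    return surv


def histogram_density(times, deltas, edges, x):
    """Histogram estimate (3.1) at x for the partition `edges` of [0, T]."""
    if not edges[0] <= x <= edges[-1]:
        raise ValueError("x lies outside [0, T]")
    last = len(edges) - 2
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        # Intervals are half-open, except the last one, which is closed.
        if lo <= x < hi or (i == last and x == hi):
            # Increment of F_n = 1 - P_n over the interval.
            mass = product_limit(times, deltas, lo) - product_limit(times, deltas, hi)
            return mass / (hi - lo)


# Hypothetical censored sample and partition of [0, 8].
times = [3.0, 5.0, 7.0, 9.0]
deltas = [1, 0, 1, 1]
edges = [0.0, 4.0, 8.0]
f_first = histogram_density(times, deltas, edges, 2.0)
f_second = histogram_density(times, deltas, edges, 6.0)

# With no censoring the estimate is the usual histogram height.
f_plain = histogram_density([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], [0.0, 2.5, 5.0], 1.0)
```

In the uncensored example, two of four observations fall in an interval of width 2.5, so the height agrees with the ordinary histogram.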
Burke and Horváth [5] defined general density estimators which included histogram-type and kernel-type estimators with appropriate choices of the defining functions. They also obtained asymptotic distribution results for these estimators. In fact, their results were obtained for the more general situation of the $k$ independent competing risks model. When $k = 2$, this reduces to the random right-censorship model.

The histogram estimator can be obtained as a special case of the kernel density estimators. The kernel-type estimators have been perhaps the most popular estimators in practice due to their relative computational simplicity, smoothness, and other properties. Kernel-type estimators from randomly right-censored data have been studied only since around 1978, beginning with the work of Blum and Susarla [3]. The investigation of kernel estimators for right-censored samples has been attempted along the same lines as for the complete sample case. However, due to mathematical difficulties introduced by the censoring, some of the analogous theory to the complete sample case has not yet been obtained.

Blum and Susarla [3] generalized the complete sample results of Rosenblatt [45] concerning maximum deviation of density estimates by the kernel method. To define the Blum-Susarla density estimator, let $\{h_n\}$ be a positive sequence, called the bandwidth sequence, such that $\lim_{n \to \infty} h_n = 0$, and let $N^+(x)$ denote the number of observed $X_i$'s that are greater than $x$. From these quantities define the estimator $H_n^*(x)$, in which $[A]$ denotes the indicator function of the event $A$. By a modification of the product-limit estimator, it can be shown that $H_n^*$ is a good estimate of $H^* = 1 - H$. For a kernel function $K$ satisfying certain conditions, the Blum-Susarla density estimator is given by

$$f_n^*(x) = [n h_n H_n^*(x)]^{-1} \sum_{j=1}^{n} K\!\left( \frac{x - X_j}{h_n} \right) [\Delta_j = 1]. \tag{3.2}$$

For example, $K$ can be a bounded density function with support in the interval $[-A, A]$ for some $A > 0$ and absolutely continuous on $[-A, A]$ with derivative $K'$ which is square integrable on $[-A, A]$. By following standard arguments,

$$(f^0 H^*)_n(x) = (n h_n)^{-1} \sum_{j=1}^{n} K\!\left( \frac{x - X_j}{h_n} \right) [\Delta_j = 1]$$

and $H_n^*(x)$ can be shown to be good estimators of $f^0(x) H^*(x)$ and $H^*(x)$, respectively. This motivates the use of (3.2) as an estimator of $f^0(x)$. Blum and Susarla also obtained limit theorems for the maximum over a finite interval of a normalized deviation of the density estimator (3.2). These results are useful for goodness-of-fit tests and tests of hypotheses about the unknown lifetime density $f^0$.

It was conjectured by Blum and Susarla [3] that the kernel-type estimator

$$\hat f_n(x) = h_n^{-1} \int_{-\infty}^{\infty} K\!\left( \frac{x - t}{h_n} \right) dF_n^*(t)$$

behaved in the same way as $f_n^*$, where $F_n^*$ was an estimator of $F^0$ such as the product-limit estimator. In fact, Földes, Rejtő, and Winter [16] proved uniform almost sure convergence of $\hat f_n$ to $f^0$ when $F_n^*$ was taken to be $\hat F_n$. Specifically, one of their results was that $\sup_{a < x < b} |\hat f_n(x) - f^0(x)| \to 0$ almost surely as $n \to \infty$ provided $f^0$ was bounded and had a bounded derivative on $(a, b)$, $-\infty \le a < b \le \infty$, $K$ was right-continuous and of bounded variation, $h_n (n / \log n)^{1/8} \to \infty$, and $H(T_{F^0}) < 1$, where $T_{F^0} = \sup\{x : F^0(x) < 1\}$. Again, the last condition ensured that observed lifetimes in the entire support of $F^0$ would be available.

It should be noted that if no censoring is present, then

$$\hat f_n(x) = h_n^{-1} \int_{-\infty}^{\infty} K\!\left( \frac{x - t}{h_n} \right) d\hat F_n(t) \tag{3.3}$$

reduces to the Parzen [43] estimator. McNichols and Padgett [32] wrote (3.3) in the form

$$\hat f_n(x) = h_n^{-1} \sum_{j=1}^{n} s_j\, K\!\left[ \frac{x - Z_j}{h_n} \right], \tag{3.4}$$

where $s_j$ is given by (2.1). They considered the mean, variance, and mean squared error of (3.4) under the Koziol-Green model of random censorship described in Section 2. This model allowed the expected value of $\hat f_n(x)$ to be evaluated by using the independence of $(X_1, \ldots, X_n)$ and $(\Delta_1, \ldots, \Delta_n)$.
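The kernel estimator (3.4) weights each ordered observation by the jump of the product-limit estimator at that point. The following plain-Python sketch assumes no tied observation times and uses a standard Gaussian kernel purely for illustration (the text does not prescribe a kernel); the names `pl_jumps`, `gaussian_kernel` and `kernel_density` and the sample values are hypothetical.

```python
import math


def pl_jumps(times, deltas):
    """Ordered times Z_j and jumps s_j of the product-limit estimator (2.1)."""
    pairs = sorted(zip(times, deltas))
    n = len(pairs)
    surv = 1.0                        # value of the estimator at Z_j
    z, s = [], []
    for j, (t, d) in enumerate(pairs, start=1):
        if j == n:
            jump = surv               # remaining mass goes to the largest value
        elif d == 1:
            jump = surv / (n - j + 1)
        else:
            jump = 0.0                # censored observations carry no jump
        z.append(t)
        s.append(jump)
        surv -= jump
    return z, s


def gaussian_kernel(u):
    """Standard normal density, used here as an example kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, times, deltas, h):
    """Kernel density estimate (3.4) at x with fixed bandwidth h."""
    z, s = pl_jumps(times, deltas)
    return sum(sj * gaussian_kernel((x - zj) / h) for zj, sj in zip(z, s)) / h


# Hypothetical sample: the second observation is right-censored.
times = [3.0, 5.0, 7.0, 9.0]
deltas = [1, 0, 1, 1]
z, s = pl_jumps(times, deltas)
f_at_6 = kernel_density(6.0, times, deltas, h=1.5)
```

When every observation is uncensored the jumps all equal $1/n$, so the sketch reduces to the ordinary Parzen estimator, as noted in the text.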
In particular, if $K$ is a Borel function such that $\sup |K(t)| < \infty$, $\int_{-\infty}^{\infty} |K(t)|\, dt < \infty$, $\lim_{|t| \to \infty} |t K(t)| = 0$, and $\int_{-\infty}^{\infty} K(t)\, dt = 1$, then

$$E[\hat f_n(x)] = a\, h_n^{-1} \int_0^{\infty} g_n(t)\, f(t)\, K\!\left( \frac{x - t}{h_n} \right) dt + (1 - a)\, p_n(a)\, h_n^{-1}\, E\!\left[ K\!\left( \frac{x - Z_n}{h_n} \right) \right], \tag{3.5}$$

where

$$a = (1 + \beta)^{-1}, \qquad b = 1 - a, \qquad p_n(a) = \prod_{i=1}^{n-1} \frac{n - i + b}{n - i + 1},$$

$g_n(t)$ is a weighted sum over $j = 1, \ldots, n$ of the terms $[1 - F(t)]^{n-j} [F(t)]^{j-1}$, with weights given by generalized binomial coefficients in $n$ and $b$ (see [32]), $F = 1 - (1 - H)(1 - F^0)$, and $f$ is a density for $F$. Furthermore, it was shown that if $h_n \to 0$, then $\lim_{n \to \infty} E[\hat f_n(x)] = f^0(x)$, $x > 0$. Thus, under the Koziol-Green model, $\hat f_n(x)$ is asymptotically unbiased for $f^0(x)$, similar to the complete sample case (the conditions on $K$ and $h_n$ are those imposed by Parzen [43]). Second moment convergence was also obtained under the conditions that $n h_n \to \infty$ and $b = P(\text{a censored observation}) < 1$, in addition to the conditions required for asymptotic unbiasedness above [32].

For the kernel estimator (3.4), it is desirable to allow the data to play a role in how much smoothing is done. Since, for a fixed $n$, $h_n$ is the 'smoothing constant', it would be reasonable to allow $h_n$ to be a function of the right-censored sample. McNichols and Padgett [35] considered this type of modification, which extends the work of Wagner [54] to censored data. This modified kernel estimator is

$$\hat f_n(x) = \Gamma_n^{-1} \sum_{j=1}^{n} s_j\, K\!\left[ \frac{x - Z_j}{\Gamma_n} \right], \tag{3.6}$$

where $\Gamma_n = \Gamma_n(X_1, \ldots, X_n)$ is some function of the censored data. For this estimator it was shown that if $H(T_{F^0}) < 1$, $K$ has bounded variation, $\lim_{|x| \to \infty} x K(x) = 0$, $\Gamma_n \to 0$ in probability (almost surely), and $n^{1/2} (\log \log n)^{-1/2}\, \Gamma_n \to \infty$ in probability (almost surely), then $\hat f_n(x) \to f^0(x)$ in probability (almost surely) at each $x$ for which $f^0$ is continuous. One choice of $\Gamma_n$ satisfying the above conditions is as follows: If $\gamma_n = [n^{\alpha}]$, $\tfrac{1}{2} < \alpha < 1$, where $[\cdot]$ denotes the greatest integer function, let $D_{jn}$ be the distance from $Z_j$ to its $\gamma_n$-nearest neighbor among $Z_1, \ldots, Z_{j-1}, Z_{j+1}, \ldots, Z_n$, $1 \le j \le n$, and select $\Gamma_n$ to be $D_{jn}$ with probability $s_j$.

The practical choice of the bandwidth $h_n$ for a given censored sample is a problem which must be addressed in order to calculate the kernel estimator. For complete samples, several 'data-based' procedures for selecting a 'good' value of $h_n$ for a given set of data have been proposed (see Scott and Factor [46], for example). Among these procedures when samples are right-censored, the maximum likelihood approach seems to be feasible. This will be discussed further in Section 6.

With the exception of the expressions for the mean, $E[\hat f_n(x)]$, in (3.5) and for $E[\hat f_n^2(x)]$ under the Koziol-Green model [32], very little has been done concerning the small-sample properties of $\hat f_n$ or any of the other kernel-type density estimators in the censored data case. Padgett and McNichols [40] have performed Monte Carlo simulations for several parametric families of lifetime distributions, uniform and exponential censoring distributions, several kernel functions, and several bandwidths to determine the small-sample behavior of $\hat f_n$ with respect to bias and mean squared error.

For estimating the hazard rate function $r^0$ from randomly right-censored data, Földes, Rejtő, and Winter [16] considered estimators of the form

$$\hat r_n(x) = \frac{\hat f_n(x)}{1 - \hat F_n(x) + 1/n}, \qquad x \ge 0,$$

where $\hat f_n$ denoted either their histogram estimator (3.1) or their kernel-type estimator (3.3). The $1/n$ in the denominator simply prevents dividing by zero. Strong consistency results for $\hat r_n$ similar to those for (3.1) and (3.3) were proven.

McNichols and Padgett [34] considered the kernel-type estimator of $r^0$ given by

$$\hat r_n(x) = h_n^{-1} \int K\!\left( \frac{x - t}{h_n} \right) [1 - \hat F_n(t)]^{-1}\, d\hat F_n(t), \qquad x \ge 0 \text{ such that } F(x) < 1,$$

under the Koziol-Green model of random censorship.
Expressions for $E[r_n(x)]$ and $\mathrm{var}[r_n(x)]$ were obtained, and it was shown that $r_n(x)$ was asymptotically unbiased and converged in mean square and in probability to $r^0(x)$, extending Watson and Leadbetter's [55, 56] results.

Tanner and Wong [50] also studied a kernel-type estimator of $r^0$ based on the ordered censored sample $(Z_i, \Delta_i)$, $i = 1, \ldots, n$, given by

$$\hat r(x) = \sum_{j=1}^{n} (n - j + 1)^{-1} \Delta_j K_{h_n}(x - Z_j), \qquad x \ge 0 \text{ such that } F(x) < 1,$$

where $K$ was a symmetric integrable kernel with $K_h(y) = h^{-1} K(y/h)$. They derived expressions for $E[\hat r(x)]$ and $\mathrm{var}[\hat r(x)]$ and proved, under the conditions on $K$ stated by Watson and Leadbetter [55, 56], that $\hat r(x)$ was asymptotically unbiased if $h_n \to 0$ and $n h_n \to \infty$. The conditions assumed here were essentially the same as those required by McNichols and Padgett [34], except for the proportional hazards (Koziol-Green) model assumption, which gave somewhat different expressions for the mean and variance. The asymptotic variance was also obtained, and Hájek's projection method was used to establish asymptotic normality under conditions on $K$, $F^0$, $H$, and $h_n$.

Tanner and Wong [51] studied a class of estimators of the same general form as $\hat r(x)$ with $K_h$ replaced by $K_\theta$, where $\theta$ was a positive-valued 'smoothing vector' chosen to maximize a likelihood function. Hence, for this estimator the smoothing parameters were chosen based on the observed data.

Tanner [49] considered a modified kernel-type estimator of $r^0$ in the form

$$\tilde r_n(x) = (2R_k)^{-1} \sum_{i=1}^{n} \frac{\Delta_i}{n - i + 1}\, K((x - Z_i)/2R_k),$$

where $R_k$ was the distance from $x$ to the $k$th nearest of the uncensored observations among $X_1, \ldots, X_n$. This estimator allowed the data to play a role in determining the degree of smoothing that would occur in the estimate.
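The fixed-bandwidth estimator of Tanner and Wong [50] described above weights each uncensored ordered observation by the reciprocal of the number of items still at risk. A minimal sketch follows, under two assumptions that are not taken from the source: the scaled kernel is $K_h(y) = h^{-1}K(y/h)$, and $K$ is the standard normal density. The function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the symmetric kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_hazard(x, z, delta, h, kernel=gaussian_kernel):
    """Kernel hazard rate estimate at x from an ordered censored sample.

    z must be sorted in increasing order; delta[j] = 1 marks an uncensored
    observation. The j-th ordered observation (j = 1, ..., n) gets weight
    delta_j / (n - j + 1), i.e. one over the size of the risk set.
    """
    n = len(z)
    total = 0.0
    for j, (zj, dj) in enumerate(zip(z, delta), start=1):
        if dj:
            total += kernel((x - zj) / h) / (h * (n - j + 1))
    return total


# Illustrative call on made-up data: the first observation is censored.
hazard_at_2 = kernel_hazard(2.0, [1.0, 2.0], [0, 1], h=0.5)
```

Censored observations contribute nothing to the sum directly; they enter only through the risk-set sizes $n - j + 1$ of the later uncensored observations.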
Assuming that $S^0$ and $f^0$ were continuous in a neighborhood about $x$, $k = [n^{\alpha}]$, $\tfrac{1}{2} < \alpha < 1$, where $[\,\cdot\,]$ was the greatest integer function, that $K$ had bounded variation and compact support on the interval $[-1, 1]$, and that $r^0$ was continuous at $x$, it was shown that $\tilde r_n(x)$ was strongly consistent.

Blum and Susarla [3] considered the estimator (in the notation of Equation (3.2))

$$r_n^*(x) = \frac{(f^0 H^*)_n(x)}{S_n^*(x)}, \qquad x \ge 0,$$

where $S_n^*(x) = (\text{number of } Z_j\text{'s} > x)/n$. This estimator was also of the kernel type, and limiting results similar to those stated for the density estimator (3.2) were obtained for $r_n^*$.

Ramlau-Hansen [44] used martingale techniques to treat the general multiplicative intensity model. His results are very general and include the kernel estimators of hazard rate functions of Földes, Rejtő and Winter [16] and Yandell [61]. The martingale techniques yielded local asymptotic properties of many of the hazard rate estimators in a simpler manner than classical procedures.

Finally, in a recent paper Liu and Van Ryzin [26] obtained a histogram estimator of the hazard rate function from randomly right-censored data based on spacings in the order statistics. They showed the estimator to be uniformly consistent in a bounded interval and asymptotically normal under suitable conditions. An efficiency comparison of their estimator with the kernel estimator of hazard rate was also given. Also, Liu and Van Ryzin [27] gave the large sample theory for the normalized maximal deviation of a hazard rate estimator under random censoring which was based on a histogram estimate of the subsurvival density of the uncensored observations.

4. Likelihood methods

One approach to estimating a density function nonparametrically is that of maximum likelihood. Nonparametric maximum likelihood estimates of a probability density function do not exist in general.
That is, the likelihood function for a complete sample is unbounded over the class of all possible densities. However, by suitably restricting the class of densities, a nonparametric maximum likelihood estimator (MLE) may be found within the restricted class. For complete samples, the maximum likelihood estimator of a density $g$ was given by Barlow, Bartholomew, Bremner and Brunk [1] if $g$ was assumed to be either decreasing (nonincreasing) or unimodal with known mode. Wegman [57, 58] assumed unimodality with unknown mode, found the MLE of the density, and studied its properties for complete samples.

McNichols and Padgett [33] studied maximum likelihood estimation of decreasing or unimodal densities based on arbitrarily right-censored data. The censoring variables $U_1, \ldots, U_n$ could be either constants or continuous random variables. They first assumed that $f^0$ was decreasing (nonincreasing) on $[0, \infty)$ and let $F_D$ be the set of distributions with decreasing left-continuous densities on $[0, \infty)$. For the ordered censored observations $(z_i, \delta_i)$, $i = 1, \ldots, n$, the likelihood function was written as

$$L(f^0) = \prod_{i=1}^{n} [f^0(z_i)]^{\delta_i} [S^0(z_i)]^{1 - \delta_i},$$

where $S^0 = 1 - F^0$. It was shown that a maximum likelihood estimator of $f^0$ must be a step function. The estimator was found by maximizing the likelihood function $L(f^0)$ over $F_D$ subject to the decreasing density constraint. Equivalently, writing $y_j$ for the height of the step function on $(z_{j-1}, z_j]$, the constrained optimization problem to be solved was

$$\max_{y_1, \ldots, y_n} \; \prod_{i=1}^{n} y_i^{\delta_i} \Big[1 - \sum_{j=1}^{i} y_j (z_j - z_{j-1})\Big]^{1 - \delta_i}$$

subject to

(i) $y_1 \ge y_2 \ge \cdots \ge y_n \ge 0$,
(ii) $\sum_{j=1}^{n} y_j (z_j - z_{j-1}) \le 1$,

where $z_0 = 0$. This function to be maximized was shown to be concave and the problem was shown to have a unique solution, say $y_1^*, \ldots, y_n^*$. Then any density of the form

$$f^*(x) = \begin{cases} 0, & x \le 0, \\ y_j^*, & z_{j-1} < x \le z_j, \; j = 1, \ldots, n + 1, \\ 0, & x > z_{n+1}, \end{cases}$$

was a maximum likelihood estimator of $f^0$, where $y_{n+1}^*$, some value less than or equal to $y_n^*$, and $z_{n+1}$ $(> z_n)$ were chosen so that

$$1 - \sum_{j=1}^{n} y_j^* (z_j - z_{j-1}) = y_{n+1}^* (z_{n+1} - z_n).$$

Similarly, $f^0$ was estimated by maximum likelihood assuming that $f^0$ was increasing (nondecreasing) on $[0, M]$, $M > 0$ known. Then, if $M$ denoted the known mode of the unknown unimodal density, the two maximum likelihood estimators on $[0, M]$ and on $(M, \infty)$ found as above could be combined to estimate the unimodal density. If $f^0$ was assumed to be unimodal with unknown mode $M$, then McNichols and Padgett [33] applied the above procedure for known mode, assuming $z_{j-1} < M < z_j$ for each $j = 1, \ldots, n$, obtaining $n$ solutions for $f^0$. These $n$ solutions gave $n$ corresponding values of the likelihood function. The maximum likelihood estimator of $f^0$ was then taken to be the solution with the largest of the $n$ likelihood values, analogous to Wegman's [57, 58] procedure for complete samples.

Another approach to the problem of nonparametric maximum likelihood estimation of a density from complete samples was proposed by Good and Gaskins [20]. This method allowed any smooth integrable function on the interval of interest $(a, b)$ (which may be finite or infinite) as a possible estimator, but added a 'penalty function' to the likelihood. The penalty function penalized a density for its lack of smoothness, so that a very 'rough' density would have a smaller likelihood than a 'smooth' density and, hence, would not be admissible. De Montricher, Tapia, and Thompson [9] proved the existence and uniqueness of the maximum penalized likelihood estimator (MPLE) for complete samples.
Lubecke and Padgett [30] assumed that the sample was arbitrarily right-censored, (Xi, Ai), i = 1, ..., n, and showed the existence and uniqueness of a solution to the problem: maximize L(g) subject to g(t) >/0 for all t e O, (4.1) f g(t) dt= 1, and g e H(f2), where L(g) : f i [g(x,)] ~' [1 - G(xi) ]' - ~' exp[ - ¢(g)], i=1 f2 is a finite or infinite interval, H(f2) is a manifold, and G is the distribution function for density g. In particular, letting u = gl/2 and using Good and Gaskins' [20] first penalty function, the problem (4.1) becomes: maximize L(u) = f i [u(x;)] ~' I 1f i=1 x, u2(t) dt ]1/20 - ~,) --oo (4.2) where x i>O, i= 1. . . . . n, ~o Ue(t) dt= 1, and u(t)>>.O, t>O. L e t x i = x iand 6 _ i = b / , i = 1 , . . . , n , a n d d e f i n e f i ( x ) = u ( l x l ) f o r x e R \ { 0 } and ~(0)= limx~o+ u(x). Then define the following problem: maximize L(fi)= f i Iil=l [fx ]1/21 a,-) [~(x,.)] ~i 2 - ~2(t)dt -oo × exp [ - 2 a f _ ~ (~'(t))2 dtl , (4.3) 324 W.J. Padgett where ~ _ ~ f i z ( t ) d t = 2 , ~Hs-{g~Hl(-~,~): g(x)=g(-x)}, and H i ( - ~ , ~ ) is the Sobolev space of real-valued functions such that the function and its first derivative are square integrable. If u* solves (4.3), then it can be shown that u*(t) = u*(t), t>~ O, and u*(t) = O, t < 0, solves (4.2). Lubecke and Padgett [30] showed that a solution to (4.3) was a function K* which solves the linear integral equation ~ ( t ) = C(t; x, ~, )~) + (8~2)-'/2 f/E ,1,, ,; ~ {z~-(x~.) I( . . . . . ](Izl) l × sinh [(2/2~) 1/2 (t - z)]fi~(z) dz, (4.4) where the forcing function is of the form - bi(2~)~)-1/2 [exp(-(2/2e) 1/z It C(t; x, ~, 2) - 1 { i~_ I; 1 x;I) ~(xi) + exp(-(A/2~) 1/2 It + x;I)] c;(1 - hi) [exp(- (2/2o~)1/2t) + exp((2/2~)l/2t)]~, I,I = 1 ~2~(x,) for a 2 > 0. The integral equation (4.4) can be transformed to a second-order differential equation whose solution fi* can be numerically obtained. 
Then $(u^*)^2$ is the MPLE of the density $f^0$ based on the first penalty function of Good and Gaskins.

The nonparametric maximum likelihood estimation of the hazard rate function $r^0$ based on the arbitrarily right-censored sample $(X_i, \Delta_i)$, $i = 1, 2, \ldots, n$, was considered by Padgett and Wei [41] in the class of increasing failure rate (IFR) distributions. The techniques of order restricted inference were used to obtain the estimator, following an argument similar to that of Marshall and Proschan [31] for the complete sample case. The maximizer of the likelihood function of $r^0$ subject to the IFR condition was obtained in closed form and was found to be a nondecreasing step function. Small sample properties of their estimator were indicated by a Monte Carlo study.

Mykytyn and Santner [37] considered the same problem of maximum likelihood estimation of $r^0$ under arbitrary right censorship assuming either IFR, decreasing failure rate (DFR), or U-shaped failure rate. Their estimator was essentially equivalent to Padgett and Wei's estimator and was shown to be consistent by using a total time on test transform. This estimator was maximum likelihood in the Kiefer-Wolfowitz sense.

Friedman [17] also considered maximum likelihood estimation from survival data. Let $n$ survival times be observed over a time period divided into $I(n)$ intervals and assume that the hazard rate function of the time to failure of individual $j$, $r_j(t)$, is constant and equal to $r_{ij} > 0$ on the $i$th interval. The maximum likelihood estimate $\hat\lambda$ of the vector $\lambda = \{\log r_{ij} : j = 1, \ldots, n; \; i = 1, \ldots, I(n)\}$ gave a simultaneous estimate of the hazard rate function. Friedman gave conditions for the existence of $\hat\lambda$ and studied the asymptotic properties of linear functionals of $\hat\lambda$ in the general case when the true hazard rate is not a step function.
This piecewise smooth estimate of the hazard rate can be regarded as giving piecewise smooth density estimates.

5. Some other methods

Nonparametric density estimators based on Fourier series representations have been proposed for censored data. Kimura [23] considered the problem of estimating density functions and cumulatives by using estimated Fourier series. A method for generating a useful class of orthonormal families was first developed for the complete sample case and the results were then generalized to the case of censored data. Variance expressions for the quantity $\int_{-\infty}^{\infty} \varphi(x)\, d\hat P_n(x)$ were obtained, where $\varphi$ was chosen so that the variance existed and $\hat P_n$ was the product-limit estimator. Finally, Monte Carlo simulation was used to test the methods developed.

Tarter [53] obtained a new maximum likelihood estimator of the survival function $S^0$ by using Fourier series estimators of the probability densities of the uncensored observations and censored observations separately. That is, the density estimates were $\hat f$ and $\tilde f$, obtained from the $n_1$ observed uncensored $X_i$'s and the $n_2$ observed censored $X_i$'s, respectively, where $n_1 + n_2 = n$. It was shown that as $n \to \infty$ the new likelihood estimator approached the product-limit estimator from above. It should be noted that the series-type density estimators $\hat f$ and $\tilde f$ used here were obtained by the usual complete-sample formulas.

The final series-type estimator to be mentioned here is the general estimator of the density in the $k$ competing risks model of Burke and Horváth [5]. It could be considered as a Fourier-type estimator by appropriate choices of the form of the defining functions.

Another method that has been used for estimating hazard rate and density functions is that of Bayesian nonparametric estimation.
Since the work of Ferguson [12, 13], many authors have been concerned with the Bayesian nonparametric estimation of a distribution function or related functions with respect to the Dirichlet process or other random probability measures as prior distributions. For censored data, Susarla and Van Ryzin [47, 48] considered the estimation of the survival function with respect to Dirichlet process priors, while Ferguson and Phadia [14] used neutral to the right processes as prior distributions.

Padgett and Wei [42] obtained Bayesian nonparametric estimators of the survival function, density function, and hazard rate function of the lifetime distribution using pure jump processes as prior distributions on the hazard rate function, assuming an increasing hazard rate. Both complete and right-censored samples were considered. The pure jump process prior was appealing because it had an intuitive physical interpretation as shocks occurring randomly in time that caused the hazard rate to increase by a constant small amount at each shock, which also closely approximated the (random) increasing failure rate by a (random) step function.

Dykstra and Laud [10] also considered a prior distribution on the hazard rate function in order to produce smooth nonparametric Bayes estimators. Their prior was an extended gamma process and the posterior distribution was found for right-censored data. The Bayes estimators of the survival and hazard rate functions with respect to a squared error loss were obtained in terms of a one-dimensional integral.

Lo [28, 29] estimated densities and hazard rates, as well as other general rate functions, from a Bayesian nonparametric approach by constructing a prior random density as a convolution of a kernel function with the Dirichlet random probability. His estimator of the density with respect to squared error loss was essentially a mixture of an initial or prior guess at the density and a sample probability density function.
His technique can be used for complete or censored samples.

6. Numerical examples of some kernel density estimators

Of the many types of nonparametric density estimators available, probably the most often used in practice are the kernel-type estimators. They are relatively simple to calculate and can produce smooth, pleasing results. In this section numerical examples will be given for the kernel estimator (3.4) and the modified estimator (3.6) with the nearest neighbor-type procedure for selecting $\Gamma_n$.

One problem in using kernel density estimators is that of how to choose the 'best' value of the bandwidth $h_n$ to use with a given set of data. This question has been addressed in the complete sample case by several authors (see Scott and Factor [46], for example), and 'data-based' choices of $h_n$ have been proposed using maximum likelihood, mean squared error, or other criteria. For the estimator (3.4) no expressions for the mean squared error for finite sample sizes exist at present, except for those very complicated ones given by McNichols and Padgett [32] under the Koziol-Green model. Hence, selection of $h_n$ to minimize mean squared error does not seem to be feasible. However, Monte Carlo simulation results of Padgett and McNichols [40] indicate that at each $x$ there is a value of $h_n$ which minimizes the estimated mean squared error of $\hat f_n(x)$ in (3.4). Similar results were also obtained in [40] for the Blum-Susarla estimator $f_n^*(x)$ defined by (3.2). These simulation results indicated a range of values of $h_n$ which gave small estimated mean squared errors of $\hat f_n(x)$ and $f_n^*(x)$ at fixed $x$.

The maximum likelihood criterion for selecting $h_n$ for a given censored sample is feasible for $\hat f_n$ but does not seem to be tractable, even using numerical methods, for $f_n^*$ due to the complications introduced by the term $H^*(x)$ in the likelihood expression. The maximum likelihood approach will be used in the following example for $\hat f_n$.
Following a similar approach to expressions (2.8) and (2.9) of Scott and Factor [46], consider choosing $h_n$ to be a value of $h \ge 0$ which maximizes the likelihood

$$L(h) = \prod_{i=1}^{n} [\hat f_n(z_i)]^{\delta_i} \Big[\int_{z_i}^{\infty} \hat f_n(u)\,du\Big]^{1 - \delta_i}. \qquad (6.1)$$

Obviously, by definition of $\hat f_n$, the maximum of (6.1) is $+\infty$ at $h = 0$. Hence, the following modified likelihood criterion is considered:

$$\max_{h \ge 0} \; L_1(h) = \prod_{k=1}^{n} [\hat f_{nk}(z_k)]^{\delta_k} \Big[\int_{z_k}^{\infty} \hat f_{nk}(u)\,du\Big]^{1 - \delta_k}, \qquad (6.2)$$

where

$$\hat f_{nk}(x) = h^{-1} \sum_{j=1,\, j \ne k}^{n} s_j K((x - z_j)/h).$$

For the standard normal kernel $K(u) = (2\pi)^{-1/2} \exp(-u^2/2)$, the logarithm of (6.2) becomes

$$\log L_1(h) = -\Big(\sum_{k=1}^{n} \delta_k\Big) \log h + \sum_{k=1}^{n} \delta_k \log\Big[\sum_{j \ne k} s_j (2\pi)^{-1/2} \exp(-(z_k - z_j)^2/2h^2)\Big] + \sum_{k=1}^{n} (1 - \delta_k) \log\Big[\sum_{j \ne k} s_j \big(1 - \Phi((z_k - z_j)/h)\big)\Big], \qquad (6.3)$$

where $\Phi$ denotes the standard normal distribution function. An approximate (local) maximum of (6.3) with respect to $h$ can be easily found by numerical methods for a given set of censored observations, and this estimated $h$, denoted by $\hat h_n$, can be used in (3.4) to calculate $\hat f_n(x)$.

Table 1
Failure times (in millions of operations) of switches

  z_i    δ_i      z_i    δ_i      z_i    δ_i      z_i    δ_i
 1.151    0      1.667    1      2.119    0      2.547    1
 1.170    0      1.695    1      2.135    1      2.548    1
 1.248    0      1.710    1      2.197    1      2.738    0
 1.331    0      1.955    0      2.199    0      2.794    1
 1.381    0      1.965    1      2.227    1      2.883    0
 1.499    1      2.012    0      2.250    0      2.883    0
 1.508    0      2.051    0      2.254    1      2.910    1
 1.543    0      2.076    0      2.261    0      3.015    1
 1.577    0      2.109    1      2.349    0      3.017    1
 1.584    0      2.116    0      2.369    1      3.793    0

Fig. 1. Density estimates for switch data (curves shown: $\hat f_n$ with $\alpha = 0.75$, $\hat f_n$ with $\alpha = 0.60$, and $\hat f_n$ with $h = 0.18$).

For this example of the density estimation procedure given by (6.3) and (3.4), the life test data for $n = 40$ mechanical switches reported by Nair [38] are used. Two failure modes, A and B, were recorded and Nair estimated the survival function of mode A, assuming the random right-censorship model.
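The bandwidth search described above can be sketched numerically. The code below is an illustration under stated assumptions, not the computation used in the chapter: it takes the weights $s_j$ to be the product-limit jumps (the definition (2.1) is outside this passage), writes the leave-one-out likelihood for a standard normal kernel with the censored terms expressed through normal tail probabilities, and replaces a proper optimizer by a simple grid search over $[0.05, 1]$. The data are the Table 1 values; all function names are hypothetical, and the grid maximizer need not coincide with the value reported in the text.

```python
import math

# Table 1: ordered failure times (millions of operations) and indicators
# (1 = failure mode A observed, 0 = censored or failure mode B).
Z = [1.151, 1.170, 1.248, 1.331, 1.381, 1.499, 1.508, 1.543, 1.577, 1.584,
     1.667, 1.695, 1.710, 1.955, 1.965, 2.012, 2.051, 2.076, 2.109, 2.116,
     2.119, 2.135, 2.197, 2.199, 2.227, 2.250, 2.254, 2.261, 2.349, 2.369,
     2.547, 2.548, 2.738, 2.794, 2.883, 2.883, 2.910, 3.015, 3.017, 3.793]
DELTA = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
         1, 1, 1, 0, 1, 0, 0, 0, 1, 0,
         0, 1, 1, 0, 1, 0, 1, 0, 0, 1,
         1, 1, 0, 1, 0, 0, 1, 1, 1, 0]


def product_limit_jumps(delta):
    """Assumed form of the weights s_j: jumps of the product-limit estimator."""
    n = len(delta)
    survival = 1.0
    jumps = []
    for j, d in enumerate(delta):
        jump = survival * d / (n - j)
        jumps.append(jump)
        survival -= jump
    return jumps


def normal_cdf(x):
    """Standard normal distribution function, stable in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def log_l1(h, z, delta, s):
    """Leave-one-out log likelihood for bandwidth h with a normal kernel."""
    n = len(z)
    total = 0.0
    for k in range(n):
        others = [j for j in range(n) if j != k]
        if delta[k]:
            # Density term: leave-one-out kernel estimate at z_k.
            term = sum(s[j] * math.exp(-(z[k] - z[j]) ** 2 / (2.0 * h * h))
                       for j in others) / (h * math.sqrt(2.0 * math.pi))
        else:
            # Survival term: integral of the leave-one-out estimate over (z_k, inf).
            term = sum(s[j] * normal_cdf((z[j] - z[k]) / h) for j in others)
        if term <= 0.0:
            return float("-inf")  # guard against numerical underflow
        total += math.log(term)
    return total


S = product_limit_jumps(DELTA)
GRID = [i / 100.0 for i in range(5, 101)]  # h = 0.05, 0.06, ..., 1.00
best_h = max(GRID, key=lambda h: log_l1(h, Z, DELTA, S))
print("grid maximizer of the modified log likelihood:", best_h)
```

A grid search was chosen only for transparency; any one-dimensional optimizer could be substituted, keeping in mind that the text refers to a local rather than a global maximum.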
Table 1 shows the 40 observations with corresponding $\delta_i$ values, where $\delta_i = 1$ indicates failure mode A and $\delta_i = 0$ denotes a censored value (or failure mode B). Using these data, the function $\log L_1(h)$ had a maximum in the interval $[0, 1]$ at $\hat h_{40} \approx 0.18$. Hence, $\hat f_{40}$ was computed from (3.4) with bandwidth 0.18. This estimate is shown in Figure 1. This maximum likelihood approach to selecting $h_n$ does not produce the smoothest estimate, but is one criterion that can be used.

Shown also in Figure 1 are the modified kernel estimates calculated from (3.6) with the '$\gamma_n$-nearest neighbor' calculation of $\Gamma_n$ for the smoothing parameter values $\alpha = 0.60$ and $0.75$. The estimate was also calculated for $\alpha = 0.55$, but was very close to the fixed bandwidth estimate $\hat f_{40}$ with $h = 0.18$ and, hence, is not shown. The modified estimator (3.6) with $\alpha = 0.75$ is pleasingly smooth, but with the small sample and only 17 uncensored observations, the value of $\alpha = 0.60$ might be a compromise between the very smooth ($\alpha = 0.75$) and somewhat rough ($\alpha = 0.55$) estimates.

References

[1] Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972). Statistical Inference Under Order Restrictions. Wiley, New York.
[2] Bean, S. J. and Tsokos, C. P. (1980). Developments in nonparametric density estimation. Internat. Statist. Rev. 48, 215-235.
[3] Blum, J. R. and Susarla, V. (1980). Maximal deviation theory of density and failure rate function estimates based on censored data. In: P. R. Krishnaiah, ed., Multivariate Analysis V. North-Holland, Amsterdam, New York, 213-222.
[4] Breslow, N. and Crowley, J. (1974). A large sample study of the life table and product limit estimates under random censorship. Ann. Statist. 2, 437-453.
[5] Burke, M. and Horváth, L. (1982). Density and failure rate estimation in a competing risks model. Preprint, Dept. of Math. and Statist., University of Calgary, Canada.
[6] Chen, Y. Y., Hollander, M.
and Langberg, N. A. (1982). Small sample results for the Kaplan-Meier estimator. J. Amer. Statist. Assoc. 77, 141-144.
[7] Cox, D. R. (1972). Regression models and life-tables. J. Roy. Statist. Soc. Ser. B 34, 187-220.
[8] Csörgő, S. and Horváth, L. (1981). On the Koziol-Green model for random censorship. Biometrika 68, 391-401.
[9] De Montricher, G. F., Tapia, R. A. and Thompson, J. R. (1975). Nonparametric maximum likelihood estimation of probability densities by penalty function methods. Ann. Statist. 3, 1329-1348.
[10] Dykstra, R. L. and Laud, P. (1981). A Bayesian nonparametric approach to reliability. Ann. Statist. 9, 356-367.
[11] Efron, B. (1967). The two sample problem with censored data. In: Proc. Fifth Berkeley Symp. Math. Statist. Prob. Vol. 4, 831-853.
[12] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-230.
[13] Ferguson, T. S. (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2, 615-629.
[14] Ferguson, T. S. and Phadia, E. G. (1979). Bayesian nonparametric estimation based on censored data. Ann. Statist. 7, 163-186.
[15] Földes, A., Rejtő, L. and Winter, B. B. (1980). Strong consistency properties of nonparametric estimators for randomly censored data, I: The product-limit estimator. Periodica Mathematica Hungarica 11, 233-250.
[16] Földes, A., Rejtő, L. and Winter, B. B. (1981). Strong consistency properties of nonparametric estimators for randomly censored data, Part II: Estimation of density and failure rate. Periodica Mathematica Hungarica 12, 15-29.
[17] Friedman, M. (1982). Piecewise exponential models for survival data with covariates. Ann. Statist. 10, 101-113.
[18] Fryer, M. J. (1977). A review of some non-parametric methods of density estimation. J. Inst. Math. Appl. 20, 335-354.
[19] Gehan, E. (1969). Estimating survival functions from the life table. J. Chron. Dis. 21, 629-644.
[20] Good, I. J. and Gaskins, R. A. (1971).
Nonparametric roughness penalties for probability densities. Biometrika 58, 255-277.
[21] Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York.
[22] Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53, 457-481.
[23] Kimura, D. K. (1972). Fourier Series Methods for Censored Data. Ph.D. Dissertation, University of Washington.
[24] Koziol, J. A. and Green, S. B. (1976). A Cramér-von Mises statistic for randomly censored data. Biometrika 63, 465-473.
[25] Lagakos, S. W. (1979). General right censoring and its impact on the analysis of survival data. Biometrics 35, 139-156.
[26] Liu, R. Y. C. and Van Ryzin, J. (1984). A histogram estimator of the hazard rate with censored data. Ann. Statist.
[27] Liu, R. Y. C. and Van Ryzin, J. (1984). The asymptotic distribution of the normalized maximal deviation of a hazard rate estimator under random censoring. Colloquia Mathematica Societatis János Bolyai, Debrecen (Hungary).
[28] Lo, A. Y. (1978). On a class of Bayesian nonparametric estimates. I: Density estimates. Dept. of Math. and Statist. Tech. Rep., University of Pittsburgh.
[29] Lo, A. Y. (1978). Bayesian nonparametric method for rate function. Dept. of Math. and Statist. Tech. Rep., University of Pittsburgh.
[30] Lubecke, A. M. and Padgett, W. J. (1985). Nonparametric maximum penalized likelihood estimation of a density from arbitrarily right-censored observations. Comm. Statist.-Theory Meth.
[31] Marshall, A. W. and Proschan, F. (1965). Maximum likelihood estimation for distributions with monotone failure rate. Ann. Math. Statist. 36, 69-77.
[32] McNichols, D. T. and Padgett, W. J. (1981). Kernel density estimation under random censorship. Statistics Tech. Rep. No. 74, University of South Carolina.
[33] McNichols, D. T. and Padgett, W. J. (1982).
Maximum likelihood estimation of unimodal and decreasing densities on arbitrarily right-censored data. Comm. Statist.-Theory Meth. 11, 2259-2270.
[34] McNichols, D. T. and Padgett, W. J. (1983). Hazard rate estimation under the Koziol-Green model of random censorship. Statistics Tech. Rep. No. 79, University of South Carolina.
[35] McNichols, D. T. and Padgett, W. J. (1984). A modified kernel density estimator for randomly right-censored data. South African Statist. J. 18, 13-27.
[36] Miller, R. G. (1981). Survival Analysis. Wiley, New York.
[37] Mykytyn, S. and Santner, T. A. (1981). Maximum likelihood estimation of the survival function based on censored data under hazard rate assumptions. Comm. Statist.-Theory Meth. A 10, 1369-1387.
[38] Nair, V. N. (1984). Confidence bands for survival functions with censored data: A comparative study. Technometrics 26, 265-275.
[39] Padgett, W. J. and McNichols, D. T. (1984). Nonparametric density estimation from censored data. Comm. Statist.-Theory Meth. 13, 1581-1611.
[40] Padgett, W. J. and McNichols, D. T. (1984). Small sample properties of kernel density estimators from right-censored data. Statistics Tech. Rep. No. 102, University of South Carolina.
[41] Padgett, W. J. and Wei, L. J. (1980). Maximum likelihood estimation of a distribution function with increasing failure rate based on censored observations. Biometrika 67, 470-474.
[42] Padgett, W. J. and Wei, L. J. (1981). A Bayesian nonparametric estimator of survival probability assuming increasing failure rate. Comm. Statist.-Theory Meth. A 10, 49-63.
[43] Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065-1076.
[44] Ramlau-Hansen, H. (1983). Smoothing counting process intensities by means of kernel functions. Ann. Statist. 11, 453-466.
[45] Rosenblatt, M. (1976). On the maximal deviation of k-dimensional density estimates. Ann. Probab. 4, 1009-1015.
[46] Scott, D. W. and Factor, L. E. (1981).
Monte Carlo study of three data-based nonparametric probability density estimators. J. Amer. Statist. Assoc. 76, 9-15.
[47] Susarla, V. and Van Ryzin, J. (1976). Nonparametric Bayesian estimation of survival curves from incomplete observations. J. Amer. Statist. Assoc. 71, 897-902.
[48] Susarla, V. and Van Ryzin, J. (1978). Large sample theory for a Bayesian nonparametric survival curve estimator based on censored samples. Ann. Statist. 6, 755-768.
[49] Tanner, M. A. (1983). A note on the variable kernel estimator of the hazard function from randomly censored data. Ann. Statist. 11, 994-998.
[50] Tanner, M. A. and Wong, W. H. (1983). The estimation of the hazard function from randomly censored data by the kernel method. Ann. Statist. 11, 989-993.
[51] Tanner, M. A. and Wong, W. H. (1983). Data-based nonparametric estimation of the hazard function with applications to model diagnostics and exploratory analysis. J. Amer. Statist. Assoc.
[52] Tapia, R. A. and Thompson, J. R. (1978). Nonparametric Probability Density Estimation. The Johns Hopkins Univ. Press, Baltimore, MD.
[53] Tarter, M. E. (1979). Trigonometric maximum likelihood estimation and application to the analysis of incomplete survival information. J. Amer. Statist. Assoc. 74, 132-139.
[54] Wagner, T. (1975). Nonparametric estimates of probability densities. IEEE Trans. Inform. Theory 21, 438-440.
[55] Watson, G. S. and Leadbetter, M. R. (1964). Hazard analysis I. Biometrika 51, 175-184.
[56] Watson, G. S. and Leadbetter, M. R. (1964). Hazard analysis II. Sankhyā Ser. A 26, 110-116.
[57] Wegman, E. J. (1970). Maximum likelihood estimation of a unimodal density function. Ann. Math. Statist. 41, 457-471.
[58] Wegman, E. J. (1970). Maximum likelihood estimation of a unimodal density, II. Ann. Math. Statist. 41, 2160-2174.
[59] Wellner, J. (1982). Asymptotic optimality of the product limit estimator. Ann. Statist. 10, 595-602.
[60] Wertz, W.
and Schneider, B. (1979). Statistical density estimation: A bibliography. Internat. Statist. Rev. 47, 155-175.
[61] Yandell, B. S. (1982). Nonparametric inference for rates and densities with censored serial data. Biostatistics Program Tech. Rep., University of California, Berkeley.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 333-351

Multivariate Process Control

Frank B. Alt and Nancy D. Smith

Introduction

There are many situations in which it is necessary to simultaneously monitor two or more correlated quality characteristics. Such problems are referred to as multivariate quality control problems. To illustrate the need for a multivariate approach, consider a manufacturing plant where the product is plastic film. The usefulness of the film depends on its transparency ($X_1$) and its tear resistance ($X_2$). It is assumed that these two quality characteristics are jointly distributed as a bivariate normal. The standard values are: $\mu_{01} = 90$, $\mu_{02} = 30$, $\sigma_{01} = 9$ and $\sigma_{02} = 3$. Furthermore, it has been determined that there is a negative correlation of $\rho_0 = -0.3$ between these two characteristics. These values can be displayed in a $(2 \times 1)$ vector of means, denoted by $\boldsymbol{\mu}_0$, and a $(2 \times 2)$ covariance matrix, denoted by $\Sigma_0$:

$$\boldsymbol{\mu}_0 = \begin{pmatrix} \mu_{01} \\ \mu_{02} \end{pmatrix} = \begin{pmatrix} 90 \\ 30 \end{pmatrix}; \qquad \Sigma_0 = \begin{pmatrix} \sigma_{01}^2 & \rho_0 \sigma_{01} \sigma_{02} \\ \rho_0 \sigma_{01} \sigma_{02} & \sigma_{02}^2 \end{pmatrix} = \begin{pmatrix} 81 & -8.1 \\ -8.1 & 9 \end{pmatrix}.$$

A sample of, say, ten items is drawn from the process at regular intervals and measurements are obtained on both variables. For the time being, attention will be focused on monitoring the process means. One approach would be to ignore the correlation between the characteristics and monitor each process mean separately. For each sample of size ten, an estimate of $\mu_{01}$, denoted by $\bar X_1$, is obtained and plotted against time on an $\bar X$-chart with the following control limits:

$$\mathrm{UCL}_1 = \mu_{01} + 3(\sigma_{01}/\sqrt{n}) = 98.54,$$
$$\mathrm{CL}_1 = \mu_{01} = 90, \qquad (1)$$
$$\mathrm{LCL}_1 = \mu_{01} - 3(\sigma_{01}/\sqrt{n}) = 81.46.$$
Since 3-sigma limits were used in determining (1), the type I error for this chart equals 0.0027. Another $\bar{x}$-chart would be set up to monitor the process mean of the tear resistance variable. The limits are:

$$\mathrm{UCL}_2 = 32.85, \qquad \mathrm{CL}_2 = 30, \qquad \mathrm{LCL}_2 = 27.15. \tag{2}$$

If both sample means plot within their respective control limits, the process is deemed to be in control. The use of separate $\bar{x}$-charts is equivalent to plotting $(\bar{x}_1, \bar{x}_2)$ on a single chart formed by superimposing one $\bar{x}$-chart over the other, as shown in Figure 1. If the pair of sample means plots within the rectangular control region, the process is considered to be in control.

[Fig. 1. The elliptical and rectangular control regions. Horizontal axis: $\bar{x}_1$, with LCL1 and UCL1 marked; vertical axis: $\bar{x}_2$, with LCL2 and UCL2 marked; Regions A, B and C are labelled.]

The use of separate control charts or the equivalent rectangular region can be very misleading. It will be shortly demonstrated that the true control region is elliptical in nature, and the process is judged out of control only if the pair of means $(\bar{x}_1, \bar{x}_2)$ plots outside this elliptical region. However, if the rectangular region is used, it may be erroneously concluded that both process means are in control (Region A), that one is out of control and the other is in (Region B), or that both are out of control (Region C). The degree of correlation between the two variables affects the size of these regions and their respective errors. Furthermore, the probability that both sample means will plot within the elliptical region when the process is in control is exactly $1 - \alpha$, whereas with the rectangular region, this probability is at least $1 - \alpha$.

Although the use of separate charts to individually monitor each process mean suffers from the weakness of ignoring the correlation between the variables, these $\bar{x}$-charts can sometimes assist in determining which process mean is out of control.
When $\bar{x}$-charts are used in this supplemental fashion, it is recommended that the type I error rate of each one be set equal to $\alpha/p$, where $p$ is the number of variables and $\alpha$ is the overall type I error. When $p = 2$ and $\alpha = 0.0054$, the type I error of each chart would be set at 0.0027, which implies 3-sigma limits as used in equations (1) and (2).

In the sequel, control charts will be presented for both Phase I and Phase II, with the presentation for Phase II being first. In both cases, the charts are referred to as multivariate Shewhart charts.

Phase II control charts

In some instances, estimates of $\boldsymbol{\mu}_0$ and $\Sigma_0$ may be derived from such a large amount of past data that these values may be treated as parameters and not their corresponding estimates. Duncan [10] states that the values for the parameters could also have been selected by management to attain certain objectives. These are referred to as standard or target values. Phase II comprises both scenarios.

Control charts for the mean

When there is only one quality characteristic, which is normally distributed with mean $\mu_0$ and standard deviation $\sigma_0$, the probability is $(1 - \alpha)$ that a sample mean will fall between $\mu_0 \pm z_{\alpha/2}(\sigma_0/\sqrt{n})$, where $z_{\alpha/2}$ is the standard normal percentile such that $P(Z > z_{\alpha/2}) = \alpha/2$. This is the basis for the control charts presented in equations (1) and (2). It is customary to use 3.0 for $z_{\alpha/2}$, in which case $\alpha = 0.0027$. Therefore, if an $\bar{x}$ falls outside the control limits, there is very strong evidence that assignable causes of variation are present.

Suppose random samples of a given size are taken from a process at regular intervals and an $\bar{x}$-chart is used to determine whether or not the process mean is at the standard value $\mu_0$. This is equivalent to repeated significance tests of the form $H_0\colon \mu = \mu_0$ vs. $H_1\colon \mu \neq \mu_0$.
Furthermore, instead of using an $\bar{x}$-chart with upper and lower control limits, one could use a control chart with only an upper control limit on which values of $[\sqrt{n}(\bar{x} - \mu_0)/\sigma_0]^2$ are plotted. In this case, $\mathrm{UCL} = \chi^2_{1,\alpha}$, where $\chi^2_{1,\alpha}$ denotes the upper-$\alpha$ percentile of the chi-square distribution with one degree of freedom. Note that $\chi^2_{1,0.0027} = 9.0$. Admittedly, the simplicity of construction of the $\chi^2$-chart is offset somewhat by the fact that runs above and below the mean will be harder to detect since they are intermingled. However, the hypothesis testing viewpoint and the $\chi^2$-chart concept provide the foundation for extending the univariate case to the multivariate case. The univariate hypothesis on the mean is rejected if

$$[\sqrt{n}(\bar{x} - \mu_0)/\sigma_0]^2 = n(\bar{x} - \mu_0)(\sigma_0^2)^{-1}(\bar{x} - \mu_0) > \chi^2_{1,\alpha}. \tag{3}$$

A natural generalization is to reject $H_0\colon \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs. $H_1\colon \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$ if

$$\chi_0^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)' \Sigma_0^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu}_0) > \chi^2_{p,\alpha}, \tag{4}$$

where $\bar{\mathbf{x}}$ denotes the $(p \times 1)$ vector of sample means and $\Sigma_0^{-1}$ is the inverse of the $(p \times p)$ variance-covariance matrix. For the case of two quality characteristics,

$$\chi_0^2 = n(1 - \rho_0^2)^{-1}\left[(\bar{x}_1 - \mu_{01})^2\sigma_{01}^{-2} + (\bar{x}_2 - \mu_{02})^2\sigma_{02}^{-2} - 2\rho_0\sigma_{01}^{-1}\sigma_{02}^{-1}(\bar{x}_1 - \mu_{01})(\bar{x}_2 - \mu_{02})\right], \tag{5}$$

which is the equation of an ellipse centered at $(\mu_{01}, \mu_{02})$. Thus, for two quality characteristics, a control region could be constructed which is the interior and boundary of such an ellipse. If a particular vector of sample means plots outside the region, the process is said to be out of control and visual inspection may reveal which characteristic is responsible for this condition. Refer to Figure 1.

When there are two or more quality characteristics, the vector of process means could be monitored by using a control chart with $\mathrm{UCL} = \chi^2_{p,\alpha}$. If $\chi_0^2 > \mathrm{UCL}$, the process is deemed out of control and an assignable cause would be sought. It may be possible to determine this by the supplemental use of individual $\bar{x}$-charts where the type I error of each chart is set equal to $\alpha/p$.
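The bivariate statistic in equation (5) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a general implementation: the function names are hypothetical, only $p = 2$ is handled, and the upper control limit uses the closed form $P(\chi^2_2 > x) = e^{-x/2}$, which holds for two degrees of freedom only. The sample means used are those of the first sample in Table 1.

```python
import math

def chi2_stat_bivariate(xbar, mu0, sigma0, rho0, n):
    """Equation (5): n (xbar - mu0)' Sigma0^{-1} (xbar - mu0) for p = 2."""
    z1 = (xbar[0] - mu0[0]) / sigma0[0]
    z2 = (xbar[1] - mu0[1]) / sigma0[1]
    return n * (z1 * z1 + z2 * z2 - 2.0 * rho0 * z1 * z2) / (1.0 - rho0 ** 2)

def chi2_upper_point_2df(alpha):
    """Upper-alpha point of chi-square with 2 d.f. (closed form for 2 d.f. only)."""
    return -2.0 * math.log(alpha)

ucl = chi2_upper_point_2df(0.0054)
stat = chi2_stat_bivariate((91.32, 29.57), (90.0, 30.0), (9.0, 3.0), -0.3, 10)
print("UCL:", round(ucl, 2), "statistic:", round(stat, 2), "signal:", stat > ucl)
```

Working with the standardized deviations `z1` and `z2` avoids forming and inverting the covariance matrix explicitly, which is convenient for two variables but does not extend to general $p$.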
By Bonferroni's inequality, the probability that each of the sample means plots within its respective control limits when each process mean is at the standard value is at least $1 - \alpha$. Refer to Alt [2, 3].

The $\chi^2$-chart has associated with it an operating characteristic (OC) curve or, equivalently, a power curve. The power is the probability of detecting a shift in the process mean on the first sample taken after the occurrence of the shift. Let $\pi(\lambda)$ denote the power of the chart. Then

$$\pi(\lambda) = P(\chi'^2_{p,\lambda} > \chi^2_{p,\alpha}), \tag{6}$$

where $\chi'^2_{p,\lambda}$ denotes the noncentral chi-square random variable with $p$ degrees of freedom and noncentrality parameter $\lambda = n(\boldsymbol{\mu} - \boldsymbol{\mu}_0)' \Sigma_0^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_0)$. For $p = 2$,

$$\lambda = n(1 - \rho_0^2)^{-1}\left[(\mu_1 - \mu_{01})^2\sigma_{01}^{-2} + (\mu_2 - \mu_{02})^2\sigma_{02}^{-2} - 2\rho_0\sigma_{01}^{-1}\sigma_{02}^{-1}(\mu_1 - \mu_{01})(\mu_2 - \mu_{02})\right]. \tag{7}$$

The power is strictly increasing in $\lambda$ for fixed significance level $\alpha$ and fixed sample size $n$. Wiener [25] presents tables of the noncentrality parameter $\lambda$ for significance levels ($\alpha$) of 0.100, 0.050, 0.025, 0.010, 0.005 and 0.001, degrees of freedom equal to 1(1)30(2)50(5)100, and for power values ($\pi$) of 0.10(0.02)0.70(0.01)0.99. For example, suppose $\alpha = 0.005$, $n = 10$, $\rho_0 = -0.4$, and it is important to detect a shift of magnitude 0.5 standard deviations in the mean of each variable. Then $\pi(\lambda) = 0.42$. If $\rho_0 = -0.2$, the power decreases to 0.28. When there are two positively correlated characteristics and one of the process standard deviations ($\sigma_1$) can be adjusted, Alt et al. [6] found that the power is not a monotonically decreasing function of $\sigma_1$ as it is in the univariate case.

A fundamental assumption in the development of the $\chi^2$-chart is that the underlying distribution of the quality characteristics is multivariate normal. In the univariate case, the effect of nonnormality on the control limits of $\bar{x}$-charts was studied by Schilling and Nelson [24].
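The power values quoted in the example above can be reproduced, at least approximately, without tables. The sketch below is restricted to $p = 2$ and rests on two standard facts that are not spelled out in the text: a noncentral chi-square variable is a Poisson($\lambda/2$) mixture of central chi-squares with $p + 2j$ degrees of freedom, and for even degrees of freedom the central chi-square tail probability is a finite sum. All function names are illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square_df > x) for even df, using the finite Poisson-type sum."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def power_bivariate(lam, alpha, terms=200):
    """Equation (6) for p = 2: P(noncentral chi-square(2, lam) > UCL)."""
    ucl = -2.0 * math.log(alpha)       # upper-alpha point for 2 d.f.
    weight = math.exp(-lam / 2.0)      # Poisson(lam/2) mass at j = 0
    power = 0.0
    for j in range(terms):
        power += weight * chi2_sf_even_df(ucl, 2 + 2 * j)
        weight *= (lam / 2.0) / (j + 1)
    return power

def noncentrality_bivariate(shift_sd, rho0, n):
    """Equation (7), with the mean shifts expressed in standard-deviation units."""
    d1, d2 = shift_sd
    return n * (d1 * d1 + d2 * d2 - 2.0 * rho0 * d1 * d2) / (1.0 - rho0 ** 2)

power_a = power_bivariate(noncentrality_bivariate((0.5, 0.5), -0.4, 10), 0.005)
power_b = power_bivariate(noncentrality_bivariate((0.5, 0.5), -0.2, 10), 0.005)
print(round(power_a, 2), round(power_b, 2))
```

The truncation at 200 mixture terms is an arbitrary but generous choice for noncentrality values of the size considered here.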
Table 1
Control chart data for the mean: standards given

Sample number    $\bar{x}_1$    $\bar{x}_2$    $\chi_0^2$
1                91.32          29.57          0.32
2                91.14          29.01          1.10
3                93.71          27.60          6.71
4                91.02          30.81          1.15
5                88.09          29.63          0.84
6                93.62          29.44          1.67
7                84.52          31.10          4.09
8                90.22          28.76          1.80
9                90.21          29.32          0.54
10               87.67          30.00          0.74
11               94.25          30.31          2.89
12 (a)           95.37          30.31          4.44
13 (a)           96.50          30.31          6.34
14 (a)           97.62          30.31          8.58
15 (a)           98.75 (c)      30.31          11.16 (e)
16 (a)           99.87 (c)      30.31          14.09 (e)
17 (a)           101.00 (c)     30.31          17.36 (e)
18 (a)           102.12 (c)     30.31          20.97 (e)
19 (a)           103.25 (c)     30.31          24.93 (e)
20 (b)           95.37          30.68          5.39
21 (b)           96.50          31.06          8.78
22 (b)           97.62          31.43          13.07 (e)
23 (b)           98.75 (c)      31.81          18.25 (e)
24 (b)           99.87 (c)      32.18          24.32 (e)
25 (b)           101.00 (c)     32.56          31.29 (e)
26 (b)           102.12 (c)     32.93 (d)      39.14 (e)
27 (b)           103.25 (c)     33.31 (d)      47.90 (e)

UCL1 = 98.54    CL1 = 90.00    LCL1 = 81.46
UCL2 = 32.85    CL2 = 30.00    LCL2 = 27.15
UCL = $\chi^2_{2,0.0054}$ = 10.44

(a) For these samples, $\mu_{01}$ was increased in increments of $0.125\sigma_{01}$ from 91.1250 to 99.0000.
(b) For these samples, $\mu_{01}$ was increased in increments of $0.125\sigma_{01}$ and $\mu_{02}$ was increased in increments of $0.125\sigma_{02}$ from 30.3750 to 33.0000.
(c) These values of $\bar{x}_1$ plot outside the control limits stated in equation (1).
(d) These values of $\bar{x}_2$ plot outside the control limits stated in equation (2).
(e) For these samples, $\chi_0^2 > \mathrm{UCL} = 10.44$.

By minimizing the average run length of an out-of-control process for a large fixed value of the average run length of an in-control process, Alt and Deutsch [4] determined the sample size ($n$) and control chart constant ($\chi^2_{p,\alpha}$) when there are two quality characteristics. They found that (i) for a relatively large positive correlation, a larger sample size is needed to detect large positive shifts in the means than small positive shifts, and (ii) a larger sample size is needed to detect shifts for $\rho > 0$ than when $\rho < 0$.
Montgomery and Klatt [21, 22] present a cost model for a multivariate quality control procedure to determine the optimal sample size, sampling frequency, and control chart constant.

Although Hotelling [15, 16] proposed the use of the $\chi^2$ random variable in a control chart setting for the testing of bombsights, he did not actually use $\chi^2$-control charts since the variance-covariance matrix ($\Sigma_0$) was unknown. His papers are primarily devoted to the case of $\Sigma_0$ unknown.

To illustrate the use of the $\chi^2$-control chart, consider the data listed in Table 1 for the plastic film extruding plant described in the Introduction. The sample size is ten. To assess the impact of changes in either one or both process means, note that $\mu_{01}$ was increased by increments of $0.125\sigma_{01}$ for data sets 12 through 19, while $\mu_{01}$ and $\mu_{02}$ were each increased by increments of $0.125\sigma_{0i}$, $i = 1, 2$, for data sets 20 through 27. Since the type I error was set equal to 0.0054, $\mathrm{UCL} = \chi^2_{2,0.0054} = 10.44$.

[Fig. 2. Plot of $\chi_0^2$ (vertical axis) against sample number (horizontal axis), with the UCL marked.]

The $\chi^2$-control chart is illustrated in Figure 2. When only $\mu_{01}$ was changed (sample numbers 12 to 19), the value of the test statistic ($\chi_0^2$) exceeded the UCL as soon as $\mu_{01}$ was increased by at least 0.5 standard deviations (sample numbers 15 to 19). Furthermore, when $\mu_{01}$ and $\mu_{02}$ were simultaneously altered (sample numbers 20 to 27), $\chi_0^2 > \mathrm{UCL}$ as soon as each process mean had been increased by 0.375 standard deviations (sample numbers 22 to 27).

The control limits for the individual control charts were presented in equations (1) and (2). For sample numbers 12 to 19, the $\bar{x}$-chart for transparency ($X_1$) performed as well as the $\chi^2$-chart. This result is not surprising since the process mean for this variable alone increased. However, when both process means were increased (sample numbers 20 to 27), the individual charts did not perform as well as the $\chi^2$-chart.
Specifically, the $\bar{x}$-chart for transparency did not detect an increase until $\mu_{01}$ had increased by at least 0.5 standard deviations, and the $\bar{x}$-chart for tear resistance did not plot out of control until $\mu_{02}$ had increased by at least 0.875 standard deviations.

Control charts for process dispersion (Phase II)

In the univariate case, even if the process mean is at the standard value but the process standard deviation has increased, the end result is a greater fraction of nonconforming product. This is illustrated in Montgomery [20]. Thus, it is important to monitor both the mean and the variability of a process. Methods for tracking process dispersion are presented in this section. The case of one quality characteristic is reviewed first.

To determine whether the process variance is at the standard value ($\sigma_0^2$), several different control charts can be used. All of the control charts assume that a random sample of size $n$ is available and that the characteristic is normally distributed.

For small sample sizes ($n \le 10$), the range chart is the one most frequently used to monitor process dispersion. It can be shown that $E(R) = \sigma_0 d_2$ and $\mathrm{Var}(R) = d_3^2\sigma_0^2$. Since most of the distribution of $R$ is contained in the interval $E(R) \pm 3[\mathrm{Var}(R)]^{1/2}$, the control limits for the $R$-chart are as follows:

$$\mathrm{UCL} = \sigma_0(d_2 + 3d_3) = D_2\sigma_0, \qquad \mathrm{CL} = \sigma_0 d_2, \qquad \mathrm{LCL} = \sigma_0(d_2 - 3d_3) = D_1\sigma_0. \tag{8}$$

Values of $d_2$, $d_3$, $D_1$, and $D_2$ are presented in Table M of Duncan [10] for $n = 2$ to $n = 25$. Duncan [10] also gives details for constructing a percentage point chart based on the distribution of $W = R/\sigma_0$.

Another chart that makes use only of the first two moments of the sample statistic is the $S$-chart, where $S$ denotes the sample standard deviation with a divisor of $(n - 1)$. It is known that $E(S^2) = \sigma_0^2$ and $E(S) = \sigma_0 c_4$, where

$$c_4 = \left[\frac{2}{n - 1}\right]^{1/2} \frac{\Gamma(n/2)}{\Gamma((n - 1)/2)}. \tag{9}$$

Thus, $\mathrm{Var}(S) = E(S^2) - [E(S)]^2 = \sigma_0^2(1 - c_4^2)$.
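The constant $c_4$ of equation (9) is easy to evaluate directly rather than read from a table. The following sketch uses the log-gamma function from the Python standard library to avoid overflow for large $n$; the function name `c4` and the choice of $n = 10$, $\sigma_0 = 9$ (the transparency variable) are illustrative.

```python
import math

def c4(n):
    """Equation (9): E(S) / sigma0 for a normal sample of size n."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

n, sigma0 = 10, 9.0
mean_s = sigma0 * c4(n)                        # E(S)
sd_s = sigma0 * math.sqrt(1.0 - c4(n) ** 2)    # square root of Var(S)
print(round(c4(n), 4), round(mean_s, 2), round(sd_s, 3))
```

For $n = 10$ and $\sigma_0 = 9$ the computed mean agrees with the center line $\mathrm{CL}_1 = 8.75$ listed for the $S$-chart in Table 2.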
Since most of the probability distribution of $S$ is within 3 standard deviations of $E(S)$, the control limits for the $S$-chart are as follows:

$$\mathrm{UCL} = \sigma_0\left[c_4 + 3\sqrt{1 - c_4^2}\right] = B_6\sigma_0, \qquad \mathrm{CL} = \sigma_0 c_4, \qquad \mathrm{LCL} = \sigma_0\left[c_4 - 3\sqrt{1 - c_4^2}\right] = B_5\sigma_0. \tag{10}$$

Table 2
Control chart data for process dispersion: standards given

Sample number   $R_1$    $R_2$    $s_1$    $s_2$   $s_{12}$   $|S|$      $|S|^{1/2}$   $W^*$
1               28.49    5.94     9.61     1.89    -6.93      282.03     16.79         3.38
2               33.58    9.40     9.96     2.96    -13.61     681.87     26.11         0.48
3               18.96    11.92    5.27     3.52    -3.56      330.91     18.19         4.46
4               23.20    10.76    6.98     3.04    7.15       397.65     19.94         4.25
5               41.65    6.70     12.04    1.99    -0.92      575.45     23.99         5.15
6               24.60    11.04    8.39     3.18    5.21       683.31     26.14         2.57
7               12.39    9.62     3.60     2.89    2.80       100.03     10.00         10.39
8               22.12    12.10    9.12     3.97    -15.83     1060.62    32.57         1.78
9               23.96    12.66    6.89     3.78    -1.07      677.84     26.04         3.09
10              26.19    7.35     8.99     2.92    1.92       687.64     26.22         1.37
11              33.87    7.61     9.53     2.47    -3.70      539.46     23.23         0.83

Univariate control limits
                Transparency ($x_1$)                         Tear resistance ($x_2$)
R-chart         UCL1 = 49.22   CL1 = 27.70   LCL1 = 6.18     UCL2 = 16.41   CL2 = 9.23   LCL2 = 2.06
S-chart         UCL1 = 15.02   CL1 = 8.75    LCL1 = 2.48     UCL2 = 5.01    CL2 = 2.92   LCL2 = 0.83
$S^2$-chart     UCL1 = 226.44                                UCL2 = 25.16

Multivariate control limits
$|S|^{1/2}$-chart (probability limits)   UCL = 51.95   LCL = 6.60
$|S|^{1/2}$-chart (3-sigma limits)       UCL = 47.17   CL = 22.90   LCL = 0.00
$W^*$-chart                              UCL = 12.38 ($\alpha = 0.01$)

Values of $c_4$, $B_5$, and $B_6$ are presented in Table M of Duncan [10] for $n = 2$ to $n = 25$.

A variation of the $S$-chart is the sigma chart, on which are plotted values of the sample standard deviation where the divisor is $n$. In this case, the upper and lower control limits are given by $\sigma_0\left[c_2 \pm 3\sqrt{(n - 1 - nc_2^2)/n}\right]$, where $c_2 = c_4\sqrt{(n - 1)/n}$.

A control chart can also be based on the unbiased sample variance, $S^2$.
Since $(n - 1)S^2/\sigma_0^2$ is distributed as a chi-square random variable with $(n - 1)$ degrees of freedom, it follows that

$$P\left[\sigma_0^2\chi^2_{n-1,\,1-(\alpha/2)}/(n - 1) \le S^2 \le \sigma_0^2\chi^2_{n-1,\,\alpha/2}/(n - 1)\right] = 1 - \alpha.$$

The control limits for the $S^2$-chart are as follows:

$$\mathrm{UCL} = \sigma_0^2\chi^2_{n-1,\,\alpha/2}/(n - 1), \qquad \mathrm{LCL} = \sigma_0^2\chi^2_{n-1,\,1-(\alpha/2)}/(n - 1). \tag{11}$$

However, Guttman, Wilks, and Hunter [11] point out that it is customary to use only an upper control limit; specifically, $\mathrm{UCL} = \sigma_0^2\chi^2_{n-1,\,\alpha}/(n - 1)$.

Note that the $S^2$-chart is equivalent to repeated tests of significance of the form $H_0\colon \sigma^2 = \sigma_0^2$ vs. $H_1\colon \sigma^2 \neq \sigma_0^2$, where the critical region for this test is equivalent to the regions above the UCL and below the LCL, as stated in equation (11). The power of the test is given by

$$\pi(\lambda) = 1 - P\left[\lambda^{-2}\chi^2_{n-1,\,1-(\alpha/2)} \le \chi^2_{n-1} \le \lambda^{-2}\chi^2_{n-1,\,\alpha/2}\right],$$

where $\lambda = \sigma_1/\sigma_0$. Operating characteristic curves for this test are presented in Bowker and Lieberman [8] for $\alpha = 0.05$ and 0.01. This test is significantly affected when the assumption of sampling from a normal distribution is violated.

Summary statistics and the control limits for all three univariate charts ($R$-chart, $S$-chart, and $S^2$-chart) are recorded in Table 2. For the $S^2$-chart, the type I error of each chart was set equal to 0.0027, where $\chi^2_{9,\,0.0027} = 25.16$. When the data were generated, there was no intentional increase in $\sigma_{01}$ or $\sigma_{02}$. Thus, it is not surprising that all of the sample measures of dispersion plot in control.

In the multivariate case, attention thus far has been focused on monitoring the process mean vector. It is also desired that the covariance matrix of the process remain at the standard value $\Sigma_0$. To check this, a random sample of size $n$ is obtained and the value of some sample statistic is determined from the $(p \times n)$ data matrix. Let $S$ denote the $(p \times p)$ sample variance-covariance matrix:

$$S = \begin{pmatrix} s_1^2 & s_{12} & \cdots & s_{1p} \\ s_{12} & s_2^2 & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{1p} & s_{2p} & \cdots & s_p^2 \end{pmatrix},$$

where the diagonal elements are the sample variances and the off-diagonal elements are the sample covariances. For the case of two quality characteristics:

$$S = \begin{pmatrix} s_1^2 & s_{12} \\ s_{12} & s_2^2 \end{pmatrix} = \frac{1}{n - 1}\begin{pmatrix} \sum_{k=1}^{n}(x_{1k} - \bar{x}_1)^2 & \sum_{k=1}^{n}(x_{1k} - \bar{x}_1)(x_{2k} - \bar{x}_2) \\ \sum_{k=1}^{n}(x_{1k} - \bar{x}_1)(x_{2k} - \bar{x}_2) & \sum_{k=1}^{n}(x_{2k} - \bar{x}_2)^2 \end{pmatrix}.$$

Recall that the sample correlation coefficient for the $i$th and $j$th variables, denoted by $r_{ij}$, is defined as $r_{ij} = s_{ij}/(s_i s_j)$.

The sample generalized variance, denoted by $|S|$, is a widely used scalar measure of multivariate dispersion. For two variables, $|S| = s_1^2 s_2^2 - s_{12}^2 = s_1^2 s_2^2(1 - r_{12}^2)$. A geometrical interpretation of $|S|$ for two variables will now be presented. Let $D$ denote the $(2 \times n)$ data matrix after centering:

$$D = \begin{pmatrix} x_{11} - \bar{x}_1 & x_{12} - \bar{x}_1 & \cdots & x_{1n} - \bar{x}_1 \\ x_{21} - \bar{x}_2 & x_{22} - \bar{x}_2 & \cdots & x_{2n} - \bar{x}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{d}_1' \\ \mathbf{d}_2' \end{pmatrix}.$$

Note that $S = (n - 1)^{-1}DD'$. Specifically, $s_i^2 = (n - 1)^{-1}\mathbf{d}_i'\mathbf{d}_i$, $i = 1, 2$, $s_{12} = (n - 1)^{-1}\mathbf{d}_1'\mathbf{d}_2$, and $r_{12} = \mathbf{d}_1'\mathbf{d}_2/\left(\sqrt{\mathbf{d}_1'\mathbf{d}_1}\sqrt{\mathbf{d}_2'\mathbf{d}_2}\right)$, which is the cosine of the angle $\theta$ between $\mathbf{d}_1$ and $\mathbf{d}_2$. Thus, $\sin^2\theta = 1 - r_{12}^2$ and $|S| = s_1^2 s_2^2\sin^2\theta$. However, the square of the area of the parallelogram formed by using $\mathbf{d}_1$ and $\mathbf{d}_2$ as principal edges is $(n - 1)^2 s_1^2 s_2^2\sin^2\theta$. It follows that $|S| = (n - 1)^{-2}(\text{area})^2$. This result generalizes to $p$ variables as follows: $|S| = (n - 1)^{-p}(\text{volume})^2$.

Johnson and Wichern [18] point out the following properties of the generalized sample variance: (i) the volume will increase as the length of any deviation vector ($\mathbf{d}_i$) increases; (ii) for deviation vectors of fixed length, the volume will increase until the deviation vectors are at right angles to each other; (iii) if one of the sample variances is small, the volume will be small; (iv) if one of the deviation vectors lies nearly in the hyperplane formed by the others, the volume will be small; and (v) distinctly different covariance matrices can have the same generalized variance.
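The geometrical interpretation above can be checked numerically. The following sketch computes $|S|$ for two variables in two ways: from the determinant $s_1^2 s_2^2 - s_{12}^2$, and from the squared area of the parallelogram spanned by the deviation vectors. The data values are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def generalized_variance_two_ways(x1, x2):
    """Return |S| from the determinant and from (n-1)^(-2) * (area)^2."""
    n = len(x1)
    d1 = [a - sum(x1) / n for a in x1]   # deviation vector for variable 1
    d2 = [b - sum(x2) / n for b in x2]   # deviation vector for variable 2
    s1_sq = dot(d1, d1) / (n - 1)
    s2_sq = dot(d2, d2) / (n - 1)
    s12 = dot(d1, d2) / (n - 1)
    det_s = s1_sq * s2_sq - s12 ** 2
    cos_theta = dot(d1, d2) / math.sqrt(dot(d1, d1) * dot(d2, d2))
    area_sq = dot(d1, d1) * dot(d2, d2) * (1.0 - cos_theta ** 2)
    return det_s, area_sq / (n - 1) ** 2

# Hypothetical observations on two characteristics (n = 4).
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
det_s, from_area = generalized_variance_two_ways(x1, x2)
print(det_s, from_area)
```

Both routes give the same value up to rounding error, which is the content of the identity $|S| = (n - 1)^{-2}(\text{area})^2$.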
In view of the last property, it is recommended that any procedure based on $|S|$ be accompanied by the appropriate univariate procedures to monitor dispersion.

The first chart to be considered is the $|S|^{1/2}$-chart, which is the multivariate analogue of the $S$-chart. The first approach makes use of the distributional properties of $|S|^{1/2}$. When $p = 2$, Hoel [14] has shown that $2(n - 1)|S|^{1/2}/|\Sigma_0|^{1/2}$ is distributed as $\chi^2_{2n-4}$. By pivoting on this expression, it follows that control limits for the $|S|^{1/2}$-chart are as follows:

$$\mathrm{UCL} = |\Sigma_0|^{1/2}\chi^2_{2n-4,\,\alpha/2}/2(n - 1), \qquad \mathrm{LCL} = |\Sigma_0|^{1/2}\chi^2_{2n-4,\,1-(\alpha/2)}/2(n - 1), \tag{12}$$

where $|\Sigma_0|^{1/2} = \sigma_{01}\sigma_{02}\sqrt{1 - \rho_0^2}$. For the plastic film example, $|\Sigma_0|^{1/2} = 25.76$. Thus, for each random sample of size $n$, $|S|^{1/2} = (s_1^2 s_2^2 - s_{12}^2)^{1/2} = s_1 s_2\sqrt{1 - r_{12}^2}$ is computed. If $|S|^{1/2} > \mathrm{UCL}$ or $|S|^{1/2} < \mathrm{LCL}$, the dispersion of the process is deemed to be out of control and assignable causes are sought. Although the exact distribution of $|S|^{1/2}$ for $p > 2$ is unknown, several approximations are available and discussed in Alt [1].

The second approach utilizes only the first two moments of $|S|^{1/2}$ and the property that most of the probability distribution of $|S|^{1/2}$ is within three standard deviations of its expected value. Since $|S|$ is distributed as $(n - 1)^{-p}|\Sigma_0|\prod_{k=1}^{p}\chi^2_{n-k}$, where the chi-square random variables are independent, it follows that

$$E(|S|^r) = (n - 1)^{-pr}\,2^{pr}\,|\Sigma_0|^r\prod_{k=1}^{p}\Gamma(r + (n - k)/2)/\Gamma((n - k)/2).$$

Thus

$$E(|S|^{1/2}) = |\Sigma_0|^{1/2}(2/(n - 1))^{p/2}\,\Gamma(n/2)/\Gamma((n - p)/2) = |\Sigma_0|^{1/2}b_3 \tag{13}$$

and

$$E(|S|) = |\Sigma_0|(n - 1)^{-p}\prod_{k=1}^{p}(n - k) = |\Sigma_0|b_1. \tag{14}$$

Now

$$\mathrm{Var}(|S|^{1/2}) = E(|S|) - [E(|S|^{1/2})]^2 = |\Sigma_0|(b_1 - b_3^2).$$
(15)

Since the upper and lower control limits are given by $E(|S|^{1/2}) \pm 3\sqrt{\mathrm{Var}(|S|^{1/2})}$, it follows that the control limits for a $|S|^{1/2}$-chart are given by

$$\mathrm{UCL} = |\Sigma_0|^{1/2}\left(b_3 + 3\sqrt{b_1 - b_3^2}\right), \qquad \mathrm{CL} = |\Sigma_0|^{1/2}b_3, \qquad \mathrm{LCL} = |\Sigma_0|^{1/2}\left(b_3 - 3\sqrt{b_1 - b_3^2}\right). \tag{16}$$

When $p = 1$, $b_3 = c_4$ as stated in equation (9), $b_1 = 1$, $|\Sigma_0|^{1/2} = \sigma_0$, and the control limits presented in equation (16) reduce to those stated in equation (10). When $p = 2$, $b_1 = (n - 2)/(n - 1)$ and $b_3 = (2/(n - 1))[\Gamma(n/2)/\Gamma((n - 2)/2)]$. Furthermore, when $n = 10$, $b_1 = b_3 = 0.889$ and $b_1 - b_3^2 = 0.099$. Thus $\mathrm{UCL} = 1.831|\Sigma_0|^{1/2}$, $\mathrm{CL} = 0.889|\Sigma_0|^{1/2}$, and $\mathrm{LCL} = 0$ since the computed lower limit is negative.

The final chart to monitor process dispersion in the multivariate case is the analogue of the $S^2$-chart, which was equivalent to repeated tests of significance. Anderson [7] shows that the likelihood ratio test of $H_0\colon \Sigma = \Sigma_0$ vs. $H_1\colon \Sigma \neq \Sigma_0$, modified to be unbiased (the power of the test is greater than or equal to the significance level), is based on the following statistic:

$$W^* = -p(n - 1) - (n - 1)\ln(|S|) + (n - 1)\ln(|\Sigma_0|) + (n - 1)\,\mathrm{tr}(\Sigma_0^{-1}S), \tag{17}$$

where $\mathrm{tr}(\Sigma_0^{-1}S)$ is the sum of the diagonal elements of $\Sigma_0^{-1}S$. When $p = 2$, $\mathrm{tr}(\Sigma_0^{-1}S) = (1 - \rho_0^2)^{-1}[(s_1^2/\sigma_{01}^2) + (s_2^2/\sigma_{02}^2) - 2\rho_0(s_{12}/\sigma_{01}\sigma_{02})]$. Anderson shows that $W^*$ is asymptotically distributed as $\chi^2_{p(p+1)/2}$. Although an improved asymptotic approximation is also presented, the upper 5% and 1% points of the exact distribution of $W^*$ have been tabulated and appear in [7] for $p = 2(1)10$ and various values of $(n - 1)$. For $p = 2$ and $(n - 1) = 9$, $H_0$ is rejected at the 5% level if $W^* > 8.52$ and at the 1% level if $W^* > 12.38$. For successive random samples of size $n$, the process dispersion is considered to be out of control if the values of $W^*$ exceed the UCL.

When there are multiple characteristics, three procedures have been presented for monitoring the variability of a process.
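The constants $b_1$ and $b_3$ of equations (13) and (14) and the statistic $W^*$ of equation (17) can be evaluated with a short script. The sketch below is restricted to $p = 2$ for $W^*$ and uses illustrative function names; the sample values are those of sample 1 in Table 2, for which the table lists $W^* = 3.38$.

```python
import math

def b1(n, p):
    """Equation (14): E|S| / |Sigma0|."""
    prod = 1.0
    for k in range(1, p + 1):
        prod *= (n - k) / (n - 1)
    return prod

def b3(n, p):
    """Equation (13): E|S|^(1/2) / |Sigma0|^(1/2)."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - p) / 2.0)
    return (2.0 / (n - 1)) ** (p / 2.0) * math.exp(log_ratio)

def w_star_bivariate(s1, s2, s12, sigma0, rho0, n):
    """Equation (17) for p = 2, from sample standard deviations and covariance."""
    s01, s02 = sigma0
    det_s = (s1 * s2) ** 2 - s12 ** 2
    det_sigma0 = (s01 * s02) ** 2 * (1.0 - rho0 ** 2)
    trace = ((s1 / s01) ** 2 + (s2 / s02) ** 2
             - 2.0 * rho0 * s12 / (s01 * s02)) / (1.0 - rho0 ** 2)
    return (n - 1) * (-2.0 + math.log(det_sigma0 / det_s) + trace)

n, p = 10, 2
ucl_factor = b3(n, p) + 3.0 * math.sqrt(b1(n, p) - b3(n, p) ** 2)
w1 = w_star_bivariate(9.61, 1.89, -6.93, (9.0, 3.0), -0.3, n)
print(round(b1(n, p), 3), round(b3(n, p), 3), round(ucl_factor, 3), round(w1, 2))
```

Because the inputs are the rounded values printed in Table 2, the computed $W^*$ should be expected to agree with the tabled value only to about two decimal places.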
Although $|S|^{1/2}$ is plotted on each of the first two charts, the distinction is that the control limits for the first chart are probability limits (equation (12)) while the control limits for the second chart are 3-sigma limits (equation (16)). The third procedure is based on the modified likelihood ratio test, and values of $W^*$ (equation (17)) are plotted on a control chart with the upper control limit determined by a specified significance level. For $\alpha = 0.01$, $\mathrm{UCL} = 12.38$.

Summary statistics for all three charts are recorded in Table 2. The statistics for all three charts plot in control. It is concluded that the variability of the process is in control.

Although the range chart was used to monitor the variability of each quality characteristic, the multivariate analogue was not presented since it is relatively intractable.

Phase I control charts

In Phase II, control charts are used to determine whether the process is in control at the standard values ($\boldsymbol{\mu}_0$, $\Sigma_0$). During the initial stages of process surveillance, $\boldsymbol{\mu}_0$ and $\Sigma_0$ are usually unknown and must be estimated from preliminary samples taken when the process is believed to be in control. These preliminary samples are referred to as rational subgroups, and $m$ is used to denote the number of subgroups.

When there is one quality characteristic, the procedure ordinarily used to construct the Phase I control chart limits is to replace the standard values in the Phase II charts by unbiased estimates obtained from the $m$ rational subgroups. For example, $\mu_{01}$ in equation (1) would be replaced by the average of the sample means obtained from each rational subgroup, and any one of several measures of variability would be used in place of $\sigma_{01}$.
However, Hillier [13] and Yang and Hillier [26] have developed a two-stage procedure using probability limits for determining whether the data for the first $m$ subgroups came from a process that was in control (Stage I) and whether future subgroup data from this process exhibit statistical control (Stage II). This was extended to the multivariate case by Alt et al. [5].

Stage I control limits for the mean

For each of the $m$ subgroups, a random sample of size $n$ is obtained and the $(p \times 1)$ vector of sample means ($\bar{\mathbf{x}}_i$) is calculated, as is the $(p \times p)$ sample variance-covariance matrix ($S_i$). If statistical control existed within each subgroup, then unbiased estimates of the process mean vector and the process variance-covariance matrix are given by

$$\bar{\bar{\mathbf{x}}} = (1/m)\sum_{i=1}^{m}\bar{\mathbf{x}}_i \qquad \text{and} \qquad \bar{S} = (1/m)\sum_{i=1}^{m}S_i,$$

respectively. For the plastic film extrusion process, $m$ was chosen to be 10. The elements of the sample mean vectors and covariance matrices are recorded in Table 3.

Table 3
Statistics for control charts for the mean (Phase I, Stage I)

Subgroup $i$   $\bar{x}_{1,i}$   $\bar{x}_{2,i}$   $s^2_{1,i}$   $s^2_{2,i}$   $s_{12,i}$   $T^2_{0,1}$   $T^2_{0,1}$ (a)
1              90.37             29.36             75.53         6.78          -3.88        1.97          0.13
2              87.70             30.04             99.82         11.71         -16.16       2.78          0.59
3              93.03             28.85             100.21        2.96          3.52         2.53          1.36
4              102.19            32.08             70.64         5.37          -6.11        22.94*        -
5              90.10             29.73             166.94        4.39          -14.30       1.20          0.00
6              90.01             31.29             72.34         14.80         -10.74       1.60          3.47
7              87.58             30.24             48.42         11.87         -9.04        2.61          0.76
8              99.71             32.64             109.69        3.31          -5.84        20.94*        -
9              88.85             29.96             157.11        8.09          -25.76       1.74          0.17
10             92.32             28.03             9.08          7.07          -0.87        7.05          3.28

Pooled statistics and UCL

Subgroups             $\bar{\bar{x}}_1$   $\bar{\bar{x}}_2$   $\bar{s}_1^2$   $\bar{s}_2^2$   $\bar{s}_{12}$   $c(m, n, p)$   UCL
1-10                  92.19               30.22               90.98           7.64            -8.92            1.82           13.60
1-3, 5-7, 9, 10       89.99 (a)           29.69 (a)           91.18 (a)       8.46 (a)        -9.65 (a)        1.77 (a)       13.51

(a) Revised value after subgroups 4 and 8 have been excluded.

When standard values for $\boldsymbol{\mu}$ and $\Sigma$ are available, the test statistic is stated in equation (4). If the standard values are replaced by their unbiased estimates, the
resulting statistic is:

$$T^2_{0,1} = n(\bar{\mathbf{x}}_i - \bar{\bar{\mathbf{x}}})'\,\bar{S}^{-1}(\bar{\mathbf{x}}_i - \bar{\bar{\mathbf{x}}}), \qquad i = 1, 2, \ldots, m. \tag{18}$$

For the case of two quality characteristics,

$$T^2_{0,1} = (n/\det(\bar{S}))\left[(\bar{x}_{1,i} - \bar{\bar{x}}_1)^2\,\bar{s}_2^2 + (\bar{x}_{2,i} - \bar{\bar{x}}_2)^2\,\bar{s}_1^2 - 2(\bar{x}_{1,i} - \bar{\bar{x}}_1)(\bar{x}_{2,i} - \bar{\bar{x}}_2)\,\bar{s}_{12}\right], \tag{19}$$

where $\det(\bar{S}) = \bar{s}_1^2\bar{s}_2^2 - \bar{s}_{12}^2$, $\bar{s}_1^2 = (1/m)\sum_{i=1}^{m}s^2_{1,i}$, $\bar{s}_2^2 = (1/m)\sum_{i=1}^{m}s^2_{2,i}$, and $\bar{s}_{12} = (1/m)\sum_{i=1}^{m}s_{12,i}$. Alt et al. [5] show that $T^2_{0,1}$ is distributed as $c_1(m, n, p)F_{p,\,mn-m-p+1}$, where

$$c_1(m, n, p) = p(m - 1)(n - 1)/(mn - m - p + 1). \tag{20}$$

To determine whether the process was in control when the first $m$ subgroups were obtained, the $m$ values of $T^2_{0,1}$ are plotted on a chart with $\mathrm{UCL} = c_1(m, n, p)F_{p,\,mn-m-p+1,\,\alpha}$ and $\mathrm{LCL} = 0$. If $T^2_{0,1}$ for one or more of the $m$ initial subgroups plots out of control, the corresponding subgroups are discarded and the first stage control limits are recalculated on the basis of the remaining subgroups.

This procedure is illustrated for the plastic film extrusion process; the summary statistics are recorded in Table 3. To simulate an out-of-control process, each process mean was increased by one standard deviation for subgroups 4 and 8. Note that $T^2_{0,1}$ exceeded the UCL (with $\alpha = 0.001$) for these two subgroups. As a consequence, these subgroups were discarded, $\bar{\bar{\mathbf{x}}}$ and $\bar{S}$ were recomputed, and new control limits were determined using $m = 8$. For the remaining eight subgroups, the recalculated values of $T^2_{0,1}$ are less than the revised UCL. The process appears to be in control with respect to its mean.

For the case when $p = 1$, $\mathrm{UCL} = ((m - 1)/m)F_{1,\,m(n-1),\,\alpha}$ and $T^2_{0,1} = n(\bar{x}_i - \bar{\bar{x}})^2/\bar{s}^2$, where $\bar{s}^2$ was previously defined as the average of the sample variances obtained from each subgroup. Since $F_{1,\,m(n-1),\,\alpha} = t^2_{m(n-1),\,\alpha/2}$, it follows that

$$1 - \alpha = P\left[(\bar{X}_i - \bar{\bar{X}})^2/\bar{S}^2 \le ((m - 1)/m)F_{1,\,m(n-1),\,\alpha}\right] = P\left[|\bar{X}_i - \bar{\bar{X}}| \le \sqrt{\bar{S}^2((m - 1)/m)}\;t_{m(n-1),\,\alpha/2}\right] = P\left[\bar{\bar{X}} - A_4\sqrt{\bar{S}^2} \le \bar{X}_i \le \bar{\bar{X}} + A_4\sqrt{\bar{S}^2}\right], \tag{21}$$

where $A_4 = \sqrt{(m - 1)/m}\;t_{m(n-1),\,\alpha/2}$. Thus, the multivariate result reduces to the univariate result previously obtained by Yang and Hillier [26]. Furthermore, Bonferroni intervals for the individual characteristics are obtained by using $A_4 = \sqrt{(m - 1)/m}\;t_{m(n-1),\,\alpha/2p}$. For $p = 2$, the upper and lower control limits for each variable are given by $\bar{\bar{x}} \pm A_4\sqrt{\bar{s}^2}$. Setting $m = 10$, $n = 10$, and $\alpha = 0.001$ yields the following control limits: $\mathrm{UCL}_1 = 124.81$, $\mathrm{UCL}_2 = 39.67$, $\mathrm{LCL}_1 = 59.57$, $\mathrm{LCL}_2 = 20.77$. Although each process mean had increased by one standard deviation for subgroups 4 and 8 and this was detected by $T^2_{0,1}$, these increases failed to show up on the univariate charts.

Stage II control limits for the mean

After the Stage I upper control limit has been revised and the test statistics for the remaining subgroups do not exceed this upper control limit, a Stage II control chart is started for future subgroups. Let $\bar{\mathbf{x}}_f$ denote the $(p \times 1)$ vector of sample means for a future subgroup. Substituting $\bar{\mathbf{x}}_f$ for $\bar{\mathbf{x}}_i$ in equation (18) yields the Stage II test statistic:

$$T^2_{0,2} = n(\bar{\mathbf{x}}_f - \bar{\bar{\mathbf{x}}})'\,\bar{S}^{-1}(\bar{\mathbf{x}}_f - \bar{\bar{\mathbf{x}}}), \tag{22}$$

where $\bar{\bar{\mathbf{x}}}$ and $\bar{S}$ are obtained from Stage I. It is shown in Alt et al. [5] that $T^2_{0,2}$ is distributed as $c_2(m, n, p)F_{p,\,mn-m-p+1}$, where

$$c_2(m, n, p) = p(n - 1)(m + 1)/(mn - m - p + 1). \tag{23}$$

In order to determine whether the mean remains in control during Stage II, values of $T^2_{0,2}$ for each future subgroup are plotted on a control chart with $\mathrm{UCL} = c_2(m, n, p)F_{p,\,mn-m-p+1,\,\alpha}$ and $\mathrm{LCL} = 0$. If $T^2_{0,2}$ exceeds the UCL, an assignable cause is sought. Yang and Hillier [26] suggest that $\bar{\bar{\mathbf{x}}}$, $\bar{S}$ and the UCL be updated fairly often in the beginning, with less updating after the process has stabilized.
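The Stage I and Stage II computations for two characteristics can be sketched as follows. Two assumptions are worth flagging: the function names are hypothetical, and the $F$ percentile is obtained from the closed form $P(F_{2,\nu} > x) = (1 + 2x/\nu)^{-\nu/2}$, which is valid only when the numerator degrees of freedom equal 2 (that is, $p = 2$). The inputs are the pooled statistics and subgroup 4 of Table 3.

```python
import math

P = 2  # number of quality characteristics; the closed-form F point requires p = 2

def f_upper_point_num2(nu, alpha):
    """Upper-alpha point of F(2, nu), from P(F > x) = (1 + 2x/nu)^(-nu/2)."""
    return (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)

def t2_bivariate(xbar, center, s1_sq, s2_sq, s12, n):
    """Equation (19): n (xbar - center)' S^{-1} (xbar - center) for p = 2."""
    d1, d2 = xbar[0] - center[0], xbar[1] - center[1]
    det = s1_sq * s2_sq - s12 ** 2
    return n * (d1 * d1 * s2_sq + d2 * d2 * s1_sq - 2.0 * d1 * d2 * s12) / det

def stage1_ucl(m, n, alpha):
    """UCL = c1(m, n, p) * F_{p, mn-m-p+1, alpha}, with c1 from equation (20)."""
    nu = m * n - m - P + 1
    return P * (m - 1) * (n - 1) / nu * f_upper_point_num2(nu, alpha)

def stage2_ucl(m, n, alpha):
    """UCL = c2(m, n, p) * F_{p, mn-m-p+1, alpha}, with c2 from equation (23)."""
    nu = m * n - m - P + 1
    return P * (n - 1) * (m + 1) / nu * f_upper_point_num2(nu, alpha)

# Subgroup 4 of Table 3 against the pooled statistics for subgroups 1-10.
t2_sub4 = t2_bivariate((102.19, 32.08), (92.19, 30.22), 90.98, 7.64, -8.92, 10)
ucl_stage1 = stage1_ucl(10, 10, 0.001)
ucl_stage2 = stage2_ucl(8, 10, 0.005)
print(round(t2_sub4, 2), round(ucl_stage1, 2), round(ucl_stage2, 2))
```

Since the inputs are rounded table entries, small discrepancies in the second decimal place relative to the tabled statistics are to be expected.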
The $T^2_{0,2}$-chart can be supplemented by charts for each quality characteristic. For $p = 2$, the upper and lower control limits for each variable are $\bar{\bar{x}} \pm A_4^*\sqrt{\bar{s}^2}$, where $A_4^* = \sqrt{(m + 1)/m}\;t_{m(n-1),\,\alpha/2p}$.

The summary statistics for the plastic film extrusion process are recorded in Table 4.

Table 4
Statistics for control charts for the mean (Phase I, Stage II)

Subgroup $f$   $\bar{x}_{1,f}$   $\bar{x}_{2,f}$   $T^2_{0,2}$
1              90.38             29.55             0.03
2              86.98             29.81             1.05
3              92.39             29.98             1.03
4              88.91             34.22             26.40*
5              89.61             29.44             0.13
6              91.02             29.44             0.14
7              87.96             29.16             1.20
8              103.31            33.11             50.88*
9              86.13             29.35             2.39
10             85.76             30.69             2.37

UCL = $c_2(8, 10, 2)F_{2,\,71,\,0.005}$ = 13.03

Since each mean was increased by one standard deviation for $f = 4$ and 8, it is not surprising that $T^2_{0,2} > \mathrm{UCL}$ (with $\alpha = 0.005$) for these two subgroups. Since the test statistics for eight more subgroups have plotted in control, $\bar{\bar{\mathbf{x}}}$ and $\bar{S}$ may be recomputed using the sixteen subgroups where $T^2_{0,1}$ and $T^2_{0,2}$ plotted in control.

Control charts for process dispersion (Phase I)

The procedure used to monitor the mean of a multivariate process during Phase I is based on probability limit charts for Stages I and II. However, this approach will not be employed to monitor the dispersion since the methodology has not been completed at this time. Rather, the course used will correspond to the univariate method of replacing the population parameter ($\sigma_0$) by an unbiased estimate obtained from $m$ rational subgroups.

When there is only one quality characteristic, the $R$-chart is frequently used to analyze the variability of past data. If $R_i$ denotes the range of each subgroup and $\bar{R} = (1/m)\sum_{i=1}^{m}R_i$, then an unbiased estimate of $\sigma_0$ is $\bar{R}/d_2$. The Phase I control limits for the $R$-chart are obtained by substituting this unbiased estimate for $\sigma_0$ in equation (8). Usually, the control limits are written as $\mathrm{UCL} = D_4\bar{R}$ and $\mathrm{LCL} = D_3\bar{R}$. Values of $D_3$ and $D_4$ can be found in Table M of Duncan [10].
Another possibility for analyzing variability is the S-chart. Let $\bar{S}$ denote the average of the sample standard deviations from the m subgroups. Then the control limits for an S-chart are UCL = $B_4\bar{S}$ and LCL = $B_3\bar{S}$, where $B_3 = 1 - (3/c_4)\sqrt{1-c_4^2}$ and $B_4 = 1 + (3/c_4)\sqrt{1-c_4^2}$. Values of $B_3$ and $B_4$ are tabulated in Duncan. The $B_3$ and $B_4$ constants used in an S-chart are obtained by substituting the unbiased estimate, $\bar{S}/c_4$, for $\sigma_0$ in equation (10) and simplifying.

Another alternative to control process variability is the $S^2$-chart, where $\bar{S}^2$ is the average of the sample variances. Note that $\bar{S} \ne \sqrt{\bar{S}^2}$ since $\bar{S}$ is the average of the sample standard deviations. When the unbiased estimate ($\bar{S}^2$) of $\sigma_0^2$ is substituted in equation (11), the following Phase I control limits are obtained:

$$UCL = \bar{S}^2\,\chi^2_{n-1,\,\alpha/2}/(n-1), \qquad LCL = \bar{S}^2\,\chi^2_{n-1,\,1-(\alpha/2)}/(n-1). \tag{24}$$

It is customary to use only an upper control limit where the percentage point is $\chi^2_{n-1,\,\alpha}$.

By using the summary statistics recorded in Table 3, control limits can be determined for the S and $S^2$-charts. For the transparency variable ($X_1$), $\bar{S}_1 = 9.15$; for the tear resistance variable ($X_2$), $\bar{S}_2 = 2.68$. From Table M of Duncan, it is seen that, for n = 10, $B_3 = 0.284$ and $B_4 = 1.716$. For $X_1$, the control limits are $UCL_1 = 15.70$ and $LCL_1 = 2.60$; for $X_2$, $UCL_2 = 4.60$ and $LCL_2 = 0.76$. Since none of the points falls outside the S-control limits for either variable, the variability of the process is deemed to be under control during this preliminary period of 10 subgroups. For the $S^2$-charts, $UCL_1 = (90.98)(25.16)/9 = 254.34$ and $UCL_2 = (7.64)(25.16)/9 = 21.36$. Again, none of the points exceed the upper control limits and the same conclusion would be reached.

When there are multiple quality characteristics, two variations of the $|S|^{1/2}$-chart were presented for monitoring process dispersion during Phase II.
The first was a probability limit chart with control limits as stated in equation (12). These particular limits are applicable only when p = 2. Let $|S^*|^{1/2}$ denote the average of the square roots of the generalized sample variances. That is, $|S^*|^{1/2} = (1/m)\sum_{i=1}^{m}|S_i|^{1/2}$. Since $|S^*|^{1/2}/b_3$ is an unbiased estimate of $|\Sigma_0|^{1/2}$, Phase I control limits are as follows:

$$UCL = |S^*|^{1/2}\,\chi^2_{2n-4,\,\alpha/2}/[2b_3(n-1)], \qquad LCL = |S^*|^{1/2}\,\chi^2_{2n-4,\,1-\alpha/2}/[2b_3(n-1)]. \tag{25}$$

The constant $b_3$ was defined in equation (13).

The other Phase II chart for $|S|^{1/2}$ used the 3-sigma limits stated in equation (16) and was appropriate for any number of quality characteristics. Thus, the Phase I limits are obtained by substituting $|S^*|^{1/2}/b_3$ for $|\Sigma_0|^{1/2}$ in equation (16). The resulting limits are:

$$UCL = |S^*|^{1/2}\left[1 + (3/b_3)\sqrt{b_1-b_3^2}\right], \qquad CL = |S^*|^{1/2}, \qquad LCL = |S^*|^{1/2}\left[1 - (3/b_3)\sqrt{b_1-b_3^2}\right]. \tag{26}$$

When p = 1, the above control limits for a $|S^*|^{1/2}$-chart are identical to those for the S-chart with the $B_3$ and $B_4$ factors stated in equation (23).

Another procedure that could be used for investigating process dispersion during Phase I is obtained from equation (17), which was the likelihood ratio statistic for testing $H_0$: $\Sigma = \Sigma_0$. To obtain the corresponding Phase I procedure, unbiased estimates of $|\Sigma_0|$ and $\Sigma_0^{-1}$ are needed. Let $|S_0|$ denote the average of the generalized sample variances from the m subgroups. That is, $|S_0| = (1/m)\sum_{i=1}^{m}|S_i|$. By using the result stated in equation (14), it can be shown that $|S_0|/b_1$ is an unbiased estimate of $|\Sigma_0|$. Let $S_i^{-1}$ denote the inverse of the sample variance-covariance matrix for subgroup i, i = 1, ..., m. Kshirsagar [19] shows that $(n-p-2)S_i^{-1}/(n-1)$ is an unbiased estimate of $\Sigma_0^{-1}$. Thus, if $S_*^{-1} = (1/m)\sum_{i=1}^{m}S_i^{-1}$, then $(n-p-2)S_*^{-1}/(n-1)$ is an unbiased estimate of $\Sigma_0^{-1}$ obtained from the m rational subgroups.
The Phase I procedure is obtained by substituting $|S_0|/b_1$ for $|\Sigma_0|$ and $(n-p-2)S_*^{-1}/(n-1)$ for $\Sigma_0^{-1}$ in equation (17). The revised values of $W_i^*$, i = 1, ..., m, would still be plotted on a control chart with UCL = $\chi^2_{p(p+1)/2,\,\alpha}$.

The control limit factors used during Phase I for both one and more than one quality characteristic were independent of the number of subgroups. Some authors argue that these factors should also be a function of m, the number of subgroups. Such factors are presented in Alt [1].

The 3-sigma $|S^*|^{1/2}$-chart will be used to investigate the variability of the plastic film process. The value of $|S_i|^{1/2}$ for each of the initial ten subgroups can be obtained from the summary statistics presented in Table 3. For example,

$$|S_1|^{1/2} = \sqrt{(75.53)(6.78) - (-3.88)^2} = 22.29.$$

For ease of reference, these values are recorded in Table 5. It was previously stated that $b_1 = b_3 = 0.889$ when n = 10. Thus, UCL = (21.46)(2.06) = 44.21 and LCL = (21.46)(0.06) = 1.29. Since none of the values of $|S_i|^{1/2}$ falls outside the control limits, it appears that the variability of the process is under control.

Table 5
Statistics for 3-sigma $|S^*|^{1/2}$-chart

Subgroup i          1      2      3      4      5      6      7      8      9     10
$|S_i|^{1/2}$     22.29  16.85  30.13  18.50  22.97  30.91  22.21  18.15  24.66   7.96

$|S^*|^{1/2}$ = 21.46    UCL = 44.21    LCL = 1.29

Other approaches

The control charts presented in this paper were Shewhart charts. When there is one quality characteristic, the cumulative sum (CUSUM) control chart has smaller average run lengths than the Shewhart chart when used to detect small shifts in the process mean. Recently, three multivariate CUSUM charts have been proposed. Let $\mu_0$ denote the standard value of the process mean and $\Sigma$ the variance-covariance matrix. Crosier [9] defines

$$C_n = \left[(S_{n-1} + X_n - \mu_0)'\,\Sigma^{-1}(S_{n-1} + X_n - \mu_0)\right]^{1/2}$$

and proposes the following CUSUM scheme:

$$S_n = 0 \ \text{ if } C_n \le k, \qquad S_n = (S_{n-1} + X_n - \mu_0)(1 - k/C_n) \ \text{ if } C_n > k,$$

where $S_0 = 0$ and k > 0. Crosier's scheme signals when $S_n'\,\Sigma^{-1}S_n > h^2$.

Healy [12] developed a CUSUM procedure based on the sequential probability ratio test. Let $\delta$ denote the shift from $\mu_0$ that is important to detect. Define $D = \sqrt{\delta'\,\Sigma^{-1}\delta}$ and $a' = \delta'\,\Sigma^{-1}/D$. Then Healy's scheme has

$$S_n = \max\{S_{n-1} + a'(x_n - \mu_0) - 0.5\,D,\ 0\}.$$

Healy's scheme signals when $S_n > L$, where L is an appropriately chosen constant. Healy also presents a CUSUM scheme for detecting a shift in the covariance matrix. He shows that this CUSUM is equivalent to a CUSUM sponsored by Pignatiello et al. [23] for detecting a shift in the mean.

Jackson [17] presents an overview of principal components and its relation to quality control as well as several other recent developments, such as Andrews plots.

References

[1] Alt, F. B. (1973). Aspects of multivariate control charts. M.S. thesis, Georgia Institute of Technology, Atlanta.
[2] Alt, F. B. (1982). In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 1, Wiley, New York, 294-300.
[3] Alt, F. B. (1985). In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 1, Wiley, New York, 110-122.
[4] Alt, F. B. and Deutsch, S. J. (1978). Proc. Seventh Ann. Meeting, Northeast Regional Conf. Amer. Inst. Decision Sci., 109-112.
[5] Alt, F. B., Goode, J. J., and Wadsworth, H. M. (1976). Ann. Tech. Conf. Trans. ASQC, 170-176.
[6] Alt, F. B., Walker, J. W., and Goode, J. J. (1980). Ann. Tech. Conf. Trans. ASQC, 754-759.
[7] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed., Wiley, New York.
[8] Bowker, A. H. and Lieberman, G. J. (1959). Engineering Statistics. Prentice-Hall, Englewood Cliffs, NJ.
[9] Crosier, R. B. (1986).
Technometrics 28, 187-194.
[10] Duncan, A. J. (1974). Quality Control and Industrial Statistics. 4th ed. Richard D. Irwin, Homewood, IL.
[11] Guttman, I. and Wilks, S. S. (1965). Introductory Engineering Statistics. Wiley, New York.
[12] Healy, J. D. (1987). Technometrics. To appear.
[13] Hillier, F. S. (1969). J. Qual. Tech. 1, 17-26.
[14] Hoel, P. G. (1937). Ann. Math. Stat. 8, 149-158.
[15] Hotelling, H. (1947). In: C. Eisenhart, H. Hastay, and W. A. Wallis, eds., Techniques of Statistical Analysis, McGraw-Hill, New York, 111-184.
[16] Hotelling, H. (1951). Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, CA, 23-41.
[17] Jackson, J. E. (1985). Commun. Statist.-Theor. Meth. 14, 2657-2688.
[18] Johnson, R. A. and Wichern, D. W. (1982). Applied Multivariate Statistical Analysis. Prentice-Hall, Englewood Cliffs, NJ.
[19] Kshirsagar, A. M. (1972). Multivariate Analysis. Marcel Dekker, New York.
[20] Montgomery, D. C. (1985). Introduction to Statistical Quality Control. Wiley, New York.
[21] Montgomery, D. C. and Klatt, P. J. (1972). Manag. Sci. 19, 76-89.
[22] Montgomery, D. C. and Klatt, P. J. (1972). AIIE Trans. 4, 103-110.
[23] Pignatiello, J. J., Runger, G. C. and Korpela, K. S. (1986). Truly multivariate CUSUM charts. Working Paper # 86-024, College of Engineering, University of Arizona, Tucson, AZ.
[24] Schilling, E. G. and Nelson, P. R. (1976). J. Qual. Tech. 8.
[25] Wiener, H. L. (1975). A Fortran program for rapid computations involving the non-central chi-square distribution. NRL Memorandum Report 3106, Washington, DC.
[26] Yang, C.-H. and Hillier, F. S. (1970). J. Qual. Tech. 2, 9-16.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 353-373

QMP/USP
A Modern Approach to Statistical Quality Auditing

Bruce Hoadley

1. Introduction and summary

An important activity of Quality Assurance is to conduct quality audits of manufactured, installed and repaired products. These audits are highly structured inspections done continually on a sampling basis. During a time interval called a rating period, samples of product are inspected for conformance to engineering and manufacturing requirements. Defects found are accumulated over a rating period and then compared to a quality standard established by quality engineers. The quality standard is a current target value for defects per unit, which reflects a trade-off between manufacturing cost, maintenance costs, customer need, and quality improvement opportunities and resources. The comparison to the standard is called rating and is done statistically. The output of rating is an exception report, which guides quality improvement activities.

For the purpose of sampling and rating, product and tests are organized into strata called rating classes. Specific examples of rating classes are: (i) a functional test audit for digital hybrid integrated circuits, (ii) a workmanship audit for peripheral switching frames.

The purpose of this chapter is to describe the Universal Sampling Plan (USP) and the Quality Measurement Plan (QMP), which were implemented throughout Western Electric in late 1980. USP and QMP are modern methods of audit sampling and rating respectively. They replaced methods that evolved from the work of Shewhart, Dodge and others, starting in the 1920's and continuing through to the middle 1950's [5-8]. More generally, USP and QMP provide a modern foundation for sampling inspection theory.

This chapter is a summary of the material published in [1-4, 10, 12-17]. Those papers considered primarily attribute data in the form of defects, defectives or weighted defects (called demerits in [8]). Here, we consider Poisson defects only.
However, the general case can be transformed into the Poisson case via the concept of equivalent defects [1, p. 229].

1.1. Summary of USP

The first step in a quality audit is to select samples of all the rating classes to be inspected. The cost per period of these inspections cannot exceed an inspection budget. The traditional audit sampling consisted of six sampling curves developed by Dodge and Torrey [8]. The curves provide sample size as a function of production. Each product is assigned a curve based on criteria such as complexity and homogeneity. To quote Dodge and Torrey, 'These are empirical curves chosen after careful consideration of the varied classes of product to which they were to be applied as well as of the quantities of production to be encountered.' There is no known theoretical foundation for the curves.

The traditional sampling plan did not account for many factors that relate to sampling. For example, (i) cost of auditing, (ii) field cost of defects, (iii) quality history, (iv) statistical operating characteristics of rating, (v) audit budget constraints. USP provides a theoretical foundation for audit sampling, which accounts for all these factors.

The fundamental concept of USP is that more extensive audits provide more effective feedback, which results in better quality. The cost benefit is less field maintenance cost; but, this must be compared to the larger audit costs. We assume that the field maintenance cost affected by the audit is

[Production] × [Defects sent to the field per unit produced] × [Field maintenance cost per defect sent to the field].

The audit affects the second quantity in this expression via a feedback mechanism, Figure 1. Under this feedback model, the production process is a controlled stochastic process. When the process is at the standard level (one on an index scale), there is a small probability per period that the process will change to a substandard level.
Given this substandard level, there is a probability per period that the audit will detect this change. This probability depends on the audit sample size. When detection occurs, management acts and forces the process back towards the standard. This phenomenon is empirically observed in audit data. So, sample size affects average long run quality, because it affects detection probability.

[Fig. 1. The USP feedback model. Quality (defect index) over time, at the standard level or at the substandard level b.]

In this feedback model, we ignore the possibility of false detection when the process is at standard. There is an implicit assumption that the cost of false detection is large and therefore, the producer's risk is small. When detection occurs, management is supposed to act. Such action frequently involves the expenditure of substantial resources on a quality improvement program. So the whole audit strategy is founded on the integrity of exception reporting. Otherwise, management would pay little attention to the results.

Our model for the total audit cost per period per product is

[Inspection cost per unit] × [Number of units inspected].

The USP sample sizes for all products are determined jointly by minimizing the audit costs plus the field maintenance costs subject to an inspection budget constraint.
The approximate solution to this problem is the following formula for e = [expected number of defects in the audit sample per period]:

$$e = \begin{cases} 0 \ \text{(i.e., no audit)} & \text{when } \sqrt{PBrsN} < 1, \\ \sqrt{PBrsN} & \text{otherwise,} \end{cases}$$

where

e = Expected number of defects in the audit sample, given standard quality (called expectancy) = ns,
n = Sample size,
s = Standard defects per unit,
N = Production,
r = $C_f/C_a$,
$C_f$ = Field maintenance cost per defect,
$C_a$ = Audit cost per unit of expectancy,
P = Process control factor = Probability per period that the process will change to a substandard level, b,
B = Budget control factor (monotonically related to the Lagrange multiplier associated with the budget constraint).

Note that we express the solution in terms of the expectancy in the sample, not the sample size itself. Expectancy is the natural metric for describing the size of an audit, because the detection power depends on the sample size through expectancy. Five switching frames with a standard of one defect per unit generate as much information as 5000 transistors with a standard of 0.001 defects per unit. For most applications, the expectancy in the sample ranges from 1 to 10, whereas sample sizes range from 5 to 10000.

EXAMPLE. Test audit for a small business system. This example illustrates USP in its simplest form. For a test audit, the quality measurement is total test defects. The standard is s = 0.06 defects per unit. An analysis of past audit data yielded P = 0.04. Economic analyses yielded $C_a$ = \$430 and $C_f$ = \$280, so r = \$280/\$430 = 0.65. If B = 2 (used by Western Electric), then the expectancy formula is

$$e = \sqrt{2(0.04)(0.65)(0.06)N} = (0.056)\sqrt{N}.$$

The sample size version of this formula is

$$n = e/(0.06) = (0.93)\sqrt{N}.$$

If the production is N = 2820, then $e = (0.056)\sqrt{2820} = 3.0$. The sample size is 3.0/(0.06) = 50. Under the traditional plan [8], the sample size would have been $\sqrt{2}\sqrt{N} = \sqrt{5640} = 75$, a 50 percent increase.
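The arithmetic of this example can be checked with a short Python sketch. It evaluates only the square-root branch of the expectancy formula (the no-audit case is not modeled), and the function names are illustrative rather than part of USP.

```python
import math


def usp_expectancy(P, B, r, s, N):
    """Approximate USP expectancy e = sqrt(P * B * r * s * N)."""
    return math.sqrt(P * B * r * s * N)


def sample_size(e, s):
    """Sample size n = e / s, since e = n * s."""
    return e / s


# Values from the small business system example.
P, B, s, N = 0.04, 2.0, 0.06, 2820
r = 0.65  # C_f / C_a = 280 / 430, rounded as in the text

e = usp_expectancy(P, B, r, s, N)
n = sample_size(e, s)
traditional = math.sqrt(2 * N)  # traditional sample size quoted in the text

print(round(e, 1))         # 3.0
print(round(n))            # 49; the text rounds e to 3.0 first and reports 50
print(round(traditional))  # 75
```

The small difference between 49 and 50 comes only from rounding e before dividing by s.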
USP: a foundation for sampling inspection

The general concept of USP is to select product inspections which minimize inspection costs plus field maintenance costs subject to an inspection budget constraint. The tradeoff between inspection and maintenance costs is a result of the feedback model of Figure 1. These concepts are fundamental and can be applied to handle many sampling inspection complexities beyond those presented in this chapter. The following complexities have been treated in Bell Laboratories and Bell Communications Research memoranda by the author and others:

1. Demerits. Defects are weighted by demerits according to their seriousness. Demerits can be transformed into equivalent defects [1, p. 229].

2. Pass through. An in-process audit can detect defects that cannot pass through to the field because of subsequent processing. So the audit does not prevent field maintenance costs associated with those defects. For details see [12, 13].

3. Clustering. The standard defects per unit, s, can be very large; e.g., s = 10 for the installation audit of a whole switching system (a cluster). In this case, true quality may vary from system to system; so, the Poisson assumption does not hold. The audit expectancy is then increased by a cluster factor to account for the between-cluster heterogeneity.

4. Fractional coverage. Sometimes it makes sense to inspect only a fraction of a unit of product. For example, a fraction of the connections on a frame of wired equipment. In this case, the decision variable is two-dimensional: number of frames and fraction of each frame.

5. Attribute rating. Sometimes attributes of products are rated rather than products themselves. For example, solder connections on all product at a factory is an attribute rating class. Another example is product labeling. When a frame is inspected, data for several attribute rating classes is generated. The decision variable is now multi-dimensional: number of frames and fractions of the frame for all the attributes rated.

6. Reliability. For reliability audits the decision variable is two-dimensional: number of units and time on test. Also, quality is defined by the failure rate curve rather than defects per unit. BELLCORE-STD-200 [14] is a reliability audit plan based on these concepts.

7. Lot-by-lot acceptance sampling. Mathematically, there is no difference between an audit and lot-by-lot acceptance sampling. An audit period is analogous to a lot and an exception report is analogous to a rejected lot. Often, acceptance sampling is effective because of feedback rather than the screening of rejected lots. Application [10] of QMP/USP to acceptance sampling yields a plan that has many features in common with MIL-STD-105D, but also some important differences.

8. Skip lot acceptance sampling. Here the decision variable is two-dimensional: fraction of lots to sample and sample size per lot. This is the right approach when there is a large inspection setup cost for each lot. BELLCORE-STD-100 [12, 13, 16] is a lot-by-lot/skip lot plan based on these concepts.

No doubt the list goes on and on. For example, the application of QMP/USP to sequential and multi-stage sampling has not been investigated.

1.2. Summary of QMP

After the product samples are chosen, they are inspected for conformance to engineering and manufacturing requirements. These inspections produce data in the form of defects. QMP is a method of analyzing a time series of defect data. The details of QMP are in [1, 12, 15, 17].

As an introduction to QMP, consider Figure 2. This is a comparison of the QMP reporting format (a) with the old T-rate reporting format (b), which is based on the Shewhart control chart [6]. Each year is divided into eight periods.
In Figure 2b, the T-rate is plotted each period and measures the difference between the observed and standard defect rates in units of sampling standard deviation (given standard quality). The idea is that if the T-rate is, e.g., less than minus two or three, then the hypothesis of standard quality is rejected.

The T-rate is simple, but it has problems. For example, it does not measure quality. A T-rate of −6 does not mean that quality is twice as bad as when the T-rate is −3. The T-rate is only a measure of statistical evidence with respect to the hypothesis of standard quality. Also, implicit in the use of the T-rate is the assumption of Normality. For small sample sizes, the Normal distribution is a poor model for the distribution of defects.

QMP was designed to alleviate the problems with the T-rate and to use modern statistics. Under QMP, a box and whisker plot (Figure 2a) is plotted each period. The box plot is a graphical representation of the posterior distribution of current population quality on an index scale. The index value one is the standard on the index scale and the value two means twice as many defects as expected under the quality standard. The posterior probability that the population index is larger than the top whisker is 0.99. The top of the box, the bottom of the box and the bottom whisker correspond to probabilities of 0.95, 0.05, 0.01 respectively.

[Fig. 2. QMP vs. the T-rate (Shewhart Control Chart). Panel (a): QMP box and whisker plots by rating period, 1977 and 1978. Panel (b): T-rate by rating period, 1977 and 1978.]

For the Western Electric application of QMP, exceptions are declared when either the top of the box or the top of the whisker are below standard (i.e., greater than one on the index scale). This makes the producer's risk small, as explained in Section 1.1.

The posterior distribution of current population quality is derived under the assumption that population quality varies at random from period to period.
This random process has unknown process average and process variance. These two unknown parameters have a joint prior distribution, which describes variation across product.

The heavy 'dot' is a Bayes estimate of the process average; the 'x' is the observed value in the current sample; and the 'dash' is the posterior mean of the current population index and is called the Best Measure of current quality. This is like an empirical Bayes estimate: a shrinkage towards the process average. The process averages ('dots') are joined to show trends.

Although the T-rate chart and the QMP chart sometimes convey similar messages, there are differences. The QMP chart provides a measure of quality; the T-rate chart does not. For example, in period 6, 1978 both charts imply that the quality is substandard, but the QMP chart also implies that the population index is somewhere between one and three. Comparing period 6, 1977 with period 4, 1978 reveals similar T-rates, but QMP box plots with different messages.

The QMP chart is a modern control or feedback chart for defect rates. However, the important outputs of QMP are the estimated process distribution (sometimes called the prior distribution) and the posterior distribution of current quality. In other decision making contexts, such as Bayesian acceptance sampling [11], these distributions could be used to optimally inspect quality into the product via the screening of rejected lots [10]. So QMP provides a practical tool for applying Bayesian acceptance sampling plans.

1.3. The QMP and USP models in perspective

For the USP model, population quality is either at the standard level (1) or at the substandard level (b), Figure 1. For the QMP model, population quality varies at random from period to period, with an unknown process average and process variance. The two models seem to be inconsistent. But, there is a reason for the difference.
The QMP model is used primarily for statistical inference (the posterior distribution of current population quality). This inference should be robust to the real behavior of the population quality process. The population quality process could be very complex and contain elements of (i) random variation, (ii) random walks, (iii) drifts, (iv) auto-correlation, and (v) feedback from out of control signals. But, no matter what the process, it has a long run average and a long run variance. The simple QMP model captures the first-order essence of any process. So, the QMP inference has a kind of first-order robustness.

On the other hand, the reason for an audit is to provide a monitoring tool to guide quality improvement programs. Therefore, the allocation of inspection resources to the many audits should be based on a model of these monitoring and quality improvement activities, e.g., the USP model.

The link between the two models is the USP process control factor, P, which is defined as the probability per period of a change to the substandard level, b. QMP is used to estimate this factor by the formula

P = Conditional probability that the population quality in the next period will be worse than b, given all the data through the current period.

2. USP details

This section contains the important elements of the derivation in [4].

2.1. General theory

For a given product, define

e = Expectancy of the audit,
A(e) = Audit cost for audit of size e,
S(e) = Savings in field maintenance cost due to an audit of size e; S(0) = 0,
(dS/dA)(e) = S'(e)/A'(e).

We assume:
(i) S'(e) and A'(e) > 0 exist for e > 0.
(ii) (dS/dA)(e) is monotonically decreasing for e > 0.

We deal with many products simultaneously; so, for product i, we use the subscript i. The general USP problem is to select $e_i$, i = 1, ..., I, to minimize $\sum_i [A_i(e_i) - S_i(e_i)]$ subject to the constraints:
(i) $e_i \ge 0$,
(ii) $\sum_i A_i(e_i) \le M$.
From Kuhn-Tucker theory [9], there exists a Lagrange multiplier, λ, so that the optimal $e_i$'s satisfy:

$$A_i'(e_i) - S_i'(e_i) + \lambda A_i'(e_i) \ge 0, \quad i = 1, \ldots, I, \tag{1}$$
$$e_i\left[A_i'(e_i) - S_i'(e_i) + \lambda A_i'(e_i)\right] = 0, \quad i = 1, \ldots, I, \tag{2}$$
$$\sum_i A_i(e_i) \le M, \tag{3}$$
$$e_i \ge 0, \quad i = 1, \ldots, I, \tag{4}$$
$$\lambda \ge 0. \tag{5}$$

For λ ≥ 0, define

$$e_i(\lambda) = \begin{cases} 0 & \text{if } S_i'(0)/A_i'(0) < 1 + \lambda, \\ \text{solution to } S_i'(e_i)/A_i'(e_i) = 1 + \lambda & \text{otherwise.} \end{cases}$$

A very simple algorithm for solving the problem is:
1. Choose a value for λ.
2. If $\sum_i A_i(e_i(\lambda)) = M$, stop; otherwise increase or decrease λ according to whether $\sum_i A_i(e_i(\lambda))$ is greater or less than M.

2.2. USP application

Most of the notation used in this section is defined in Section 1.1.

The audit cost function: A(e). We assume the linear audit cost function $A(e) = C_a e$.

The audit savings function: S(e). Let F(e) denote the field maintenance cost associated with an audit of size e. Then S(e) = F(0) − F(e). Now, according to Section 1.1,

$$F(e) = \theta(e)\,s\,C_f\,N,$$

where θ(e) = Process average (on an index scale) that results from an audit of size e.

Recall that θ(e) arises from the quality behavior model described in Figure 1. When the process is at the standard level, there is a probability, P, per period of a change to the substandard level (b). So, the expected waiting time until this change is E[Y] = 1/P. When the process is at the substandard level (b), we assume that in each period, the audit detects the substandard level if the number of defects, x, exceeds an acceptance number¹, c. We assume that x has a Poisson distribution with mean n·s·b = e·b; so, the expected waiting time until detection depends on e and we define D(e) by E[Z] = 1/D(e). Note that D(e) can be interpreted as an average detection power. Hence,

$$\theta(e) = \left[\frac{E(Y)}{E(Y)+E(Z)}\right](1) + \left[\frac{E(Z)}{E(Y)+E(Z)}\right] b = 1 + \left[\frac{P}{P+D(e)}\right](b-1).$$
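To make the feedback model concrete, here is a minimal Python sketch of D(e) and θ(e). It assumes a fixed integer acceptance number c, so that D(e) is simply a Poisson tail probability and the waiting time until detection is geometric; the values of b and c are hypothetical, and P = 0.04 is borrowed from the example in Section 1.1.

```python
import math


def poisson_tail(mean, c):
    """P(X > c) for X ~ Poisson(mean), with integer acceptance number c >= 0."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for k in range(1, c + 1):
        term *= mean / k    # P(X = k) obtained from P(X = k - 1)
        cdf += term
    return 1.0 - cdf


def detection_power(e, b, c):
    """D(e): per-period probability of detecting the substandard level b."""
    return poisson_tail(e * b, c)


def process_average(e, P, b, c):
    """theta(e) = 1 + [P / (P + D(e))] (b - 1)."""
    return 1.0 + P / (P + detection_power(e, b, c)) * (b - 1.0)


# b and c are hypothetical choices for illustration.
P, b, c = 0.04, 2.0, 6
for e in (0.0, 1.0, 3.0, 10.0):
    print(e, detection_power(e, b, c), process_average(e, P, b, c))
```

With no audit (e = 0) the detection power is zero and θ(0) = b; as e grows, D(e) approaches one and θ(e) falls towards 1 + P(b − 1)/(1 + P).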
Putting all this together yields

$$S(e) = [\theta(0) - \theta(e)]\,s\,C_f\,N.$$

¹ When QMP is used for detection, the acceptance number is a random variable.

Analysis. From the general theory, e(λ) is often the solution to

$$\frac{S'(e)}{A'(e)} = 1 + \lambda.$$

For the USP application, this equation is

$$\frac{P + D(e)}{\sqrt{D'(e)\,D'(0)}} = \left[\frac{(b-1)\,P\,r\,N\,s}{D'(0)\,(1+\lambda)}\right]^{1/2}.$$

Furthermore, the condition $[S'(0)/A'(0)] < 1 + \lambda$ is

$$\left[\frac{(b-1)\,P\,r\,N\,s}{D'(0)\,(1+\lambda)}\right]^{1/2} < P/D'(0).$$

2.3. Approximate detection power function

The simplest approximate average detection power function which satisfies condition (ii) of the general theory is

$$D(e) = \frac{e}{e+a}.$$

For this case,

$$e(\lambda) = \max\left\{0,\ \frac{\sqrt{BPrNs} - aP}{1+P}\right\},$$

where

$$B = \frac{a(b-1)}{1+\lambda}.$$

B is called the budget control factor and is monotonically related to the Lagrange multiplier, λ, associated with the budget constraint. For practical application of USP, we assume that P is small and use the simple approximate solution

$$e = \begin{cases} 0 & \text{when } \sqrt{PBrsN} < 1, \\ \sqrt{PBrsN} & \text{otherwise.} \end{cases}$$

For the Western Electric application, B = 2 is often used.

3. QMP details

3.1. Data format

For rating period t [t = 1, ..., T (current period)], the audit data is of the form

$n_t$ = Audit sample size,
$x_t$ = Defects observed in the audit sample,
$e_t$ = Expected defects in the sample when the quality standard is met (called expectancy) = $s\,n_t$,

where s = Standard defects per unit. In practice, defectives or weighted defects are sometimes used as the quality measure. These cases can be treated via a transformation [1, p. 229] to equivalent defects.

We express the defect rate as a multiple of the standard defect rate; i.e., with the index $I_t = x_t/e_t$. So $I_t = 2$ means that we observed twice as many defects as expected.

3.2. Statistical foundations of QMP

The formulas used for computing the QMP box plots shown in Figure 2a were derived by an approximate Bayesian analysis of a statistical model [1]. The assumptions of the model are:

(1) $x_t$ is the observed value of a random variable, $X_t$, whose sampling distribution is Poisson with mean = $n_t\lambda_t$, where $\lambda_t$ is the true defect rate per unit. For convenience, we reparameterize $\lambda_t$ on an index scale as $\theta_t$ = True quality index = $\lambda_t/s$. So the standard value of $\theta_t$ is 1.

(2) $\theta_t$, t = 1, ..., T, is a random process (or random sample) from a Gamma distribution with θ = process average, $\gamma^2$ = process variance, which are unknown. Assumption (2) makes this a parametric empirical Bayes model.

(3) θ and $\gamma^2$ have a joint prior distribution. The physical interpretation of this prior is that each product has its own value of θ and $\gamma^2$ and these vary at random across products. Assumption (3) makes this a Bayes empirical Bayes model. We never specify the form of this joint prior, because, in our heuristic derivation, only its moments are used.

This is now a full Bayesian model. It specifies the joint distribution of all variables. The quality rating in QMP is based on the conditional or posterior distribution of $\theta_T$ given $x = (x_1, \ldots, x_T)$.

3.3. Posterior distribution of current quality

The exact posterior distribution of $\theta_T$ is computationally impractical. So we approximate the posterior mean and variance of $\theta_T$. The complex approximate formulas given in the Appendix are those published in [1]. They resulted from a lengthy fine tuning process conducted over 20000 audit data sets during a two year trial of QMP. Improved QMP formulas are published in [12] and derived in [15]. In this section, we provide only the structure of the formulas.

The posterior mean is approximately

$$E[\theta_T \mid x] = \hat{\theta}_T = \hat{\omega}_T\,\hat{\theta} + (1 - \hat{\omega}_T)\,I_T,$$

where

$$\hat{\theta} = E(\theta \mid x), \qquad \hat{\omega}_T = E(\omega_T \mid x), \qquad \omega_T = \frac{\theta/e_T}{\theta/e_T + \gamma^2}.$$

The posterior mean, $\hat{\theta}_T$, is a weighted average of the estimated process average, $\hat{\theta}$, and the sample index, $I_T$. It is the dynamics of the weight, $\hat{\omega}_T$, that causes the Bayes estimate to work so well.
For any $t$, the sampling variance of $I_t$ (under the Poisson assumption) is $\theta_t/e_t$. The forecasted value of this is $E[\theta_t/e_t] = \theta/e_t$. So the weight, $\omega_T$, is

$$\omega_T = \frac{[\text{Forecasted sampling variance}]}{[\text{Forecasted sampling variance}] + [\text{Process variance}]}.$$

If the process is stable, relative to the sampling variance, then the process variance is relatively small and the weight is mostly on the process average; but if the process is unstable, then the process variance is relatively large and the weight is mostly on the current sample index. The reverse is true of the sampling variance. If it is large (e.g., small expectancy), then the current data is weak and the weight is mostly on the process average; but, if the sampling variance is small (e.g., large expectancy), then the weight is mostly on the current sample index. In other words, $\omega_T$ is monotonically increasing with the ratio of sampling variance to process variance.

The posterior variance of $\theta_T$ is approximately

$$V[\theta_T \mid x] = \hat V_T = (1 - \hat\omega_T)\,\hat\theta_T/e_T + \hat\omega_T^2\,V(\theta \mid x) + (\hat\theta - I_T)^2\,V(\omega_T \mid x).$$

If the process average and variance were known, then the posterior variance of $\theta_T$ would be $(1 - \omega_T)\,\hat\theta_T/e_T$, which is estimated by the first term in $\hat V_T$. But since the process average and variance are unknown, the posterior variance has two additional terms. One contains the posterior variance of the process average and the other contains the posterior variance of the weight.

The first term dominates. A large $\hat\omega_T$ (relatively stable process), a small $\hat\theta_T$ (good current quality) and a large $e_T$ (large audit) all tend to make the posterior variance of $\theta_T$ small.

If $\hat\omega_T$ is small, then the second term is negligible. This is because the past data is not used much, so the uncertainty about the process average is irrelevant.

If the current sample index is far from the process average, then the third term can be important. This is because outlying observations add to our uncertainty.
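The behavior of the weight can be checked numerically in the idealized case where the process average and variance are known. The following is a minimal sketch under that assumption; the function names and the numbers are hypothetical and are not part of the QMP algorithm, which replaces the known values by the estimates given in the Appendix.

```python
def shrinkage_weight(theta, gamma2, e_T):
    """Weight on the process average: sampling variance / (sampling + process variance)."""
    sampling_var = theta / e_T  # forecasted sampling variance of the sample index
    return sampling_var / (sampling_var + gamma2)


def known_parameter_posterior(theta, gamma2, e_T, index_T):
    """Posterior mean and variance of current quality when theta and gamma2 are known."""
    w = shrinkage_weight(theta, gamma2, e_T)
    mean = w * theta + (1.0 - w) * index_T  # weighted average of process average and index
    variance = (1.0 - w) * mean / e_T       # first term of the posterior variance
    return w, mean, variance


# Hypothetical inputs: process at standard, expectancy 5, observed index 3.
w, mean, variance = known_parameter_posterior(theta=1.0, gamma2=0.2, e_T=5.0, index_T=3.0)
print(w, mean, variance)
```

With these inputs the sampling variance equals the process variance, so the weight is one half. The result agrees with the exact Gamma-Poisson conjugate update (prior shape 5 and rate 5, 15 observed defects on an expectancy of 5), which is a useful sanity check on the structure of the formula.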
If the process average and variance were known, then the posterior distribution would be Gamma, so we approximate the form of the posterior distribution by a Gamma. The parameters of the fitted Gamma distribution are

$$\alpha = \text{shape parameter} = \hat\theta_T^2/\hat V_T, \qquad \tau = \text{scale parameter} = \hat V_T/\hat\theta_T.$$

And the approximate posterior distribution function is

$$\Pr[\theta_T \le z \mid x] = G_\alpha(z/\tau) = \int_0^{z/\tau} \frac{1}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-x}\, dx.$$

3.4. QMP box and whisker plot

For the box and whisker plots shown in Figure 2a, let I99%, I95%, I05%, and I01% denote the top whisker, top of box, bottom of box, and bottom whisker respectively. These percentiles are formally defined, for example, by $1 - G_\alpha(\mathrm{I95\%}/\tau) = 0.95$, etc. So, a posteriori, there is a 95 percent chance that $\theta_T$ is larger than I95%.

3.5. Exception reporting

For QMP, there are two kinds of exceptions.

(a) Alert: I95% > 1 but I99% ≤ 1; i.e., $0.95 < \Pr[\theta_T > 1 \mid x] \le 0.99$.

(b) Below Normal: I99% > 1; i.e., $0.99 < \Pr[\theta_T > 1 \mid x]$.

Products which meet these conditions are highlighted in an exception report.

3.6. USP process control factor, P

As mentioned in Section 1.3, QMP is used to estimate the USP process control factor, $P$, with the expression

$$P = \Pr[\theta_{T+1} > b \mid x].$$

We approximate the conditional distribution of $\theta_{T+1}$ (quality in the next period), given $x$, by a Gamma distribution fitted by the method of moments. The general form of the moments is

$$E[\theta_{T+1} \mid x] = \hat\theta, \qquad V[\theta_{T+1} \mid x] = \hat V_{T+1} = E[\gamma^2 \mid x] + V[\theta \mid x]$$

(see the Appendix for detailed formulas). If the process average and variance were known, then the conditional mean and variance of $\theta_{T+1}$ would be simply $\theta$ and $\gamma^2$. But since they are unknown, we use Bayes estimates of $\theta$ and $\gamma^2$ and add the term $V[\theta \mid x]$ to the conditional variance to account for the uncertainty in our estimate of $\theta$. The parameters of the fitted Gamma distribution are

$$\alpha_1 = \text{shape parameter} = \hat\theta^2/\hat V_{T+1}, \qquad \tau_1 = \text{scale parameter} = \hat V_{T+1}/\hat\theta.$$

So, $P = 1 - G_{\alpha_1}(b/\tau_1)$.

3.7.
QMP dynamics

The Best Measure and the box plot percentiles are nonlinear functions of all the data. So the dynamic behavior of these results is interesting.

[Figure 3: panel (a), T-rate chart; panel (b), QMP box charts on the index scale.]
Fig. 3. Dynamics of sudden degradation for expectancy = 5.

Dynamics of sudden degradation

Since QMP uses a long run average, it is natural to ask about responsiveness of the box plot to sudden change. If there is a sudden degradation of quality, Quality Assurance would like to detect it.

The history data in Figure 3a is a typical history for a product which is meeting the quality standard. The expectancy of five is average for Western Electric audits. The history is plotted on a T-rate chart along with six possible values for the current T-rate (labeled A through F). So, the current period is anywhere from standard (T-rate = 0) to well below standard (Index = 3.24, T-rate = -5).

Figure 3b shows the six possible current results plotted in QMP box plot form. The box plot labeled A is the result of combining current result A with the past five periods. The box plot labeled F is the result of combining current result F with the same past history.

As you can see, the QMP result becomes Alert at about T-rate = -3 and becomes Below Normal at about T-rate = -4. For the T-rate method of rating, you would have a Below Normal at T-rate = -3. The good past history has the effect of tempering the result of a T-rate = -3.

It is informative to study the relative behavior of the current sample index, process average and Best Measure as you go from current value A to F. The current index changes a lot (from 1.00 to 3.24) and the process average changes a little (from 1.00 to 1.38), both in a linear way. The Best Measure also changes substantially but in a nonlinear way. It changes slowly at first and then speeds up.
This is because the weight on the process average is changing from 0.71 to 0.32. The weight changes, because as the current data becomes more inconsistent with the past, the process is becoming more unstable, while the sampling variance is changing slowly in proportion to the process average.

Bogie contour plot

For a fixed past history and current expectancy, there is a Below Normal Bogie for the current sample index. If the sample index is worse than the Bogie, then the product is Below Normal. Figure 4 is a contour plot of the Bogie for an expectancy of five. The axes are the mean and variance of the five past values of the sample index; i.e., $\bar I = \frac{1}{5}\sum_{t=1}^{5} I_t$ and $S^2$, the variance of the same five values about $\bar I$, based on $\sum_{t=1}^{5}(I_t - \bar I)^2$, where $I_t$ is the sample index in past period $t$. For given values of $\bar I$ and $S^2$, we used a standard pattern of $I_t$'s to compute the Bogie. The results are insensitive to the pattern. The dashed curve is an upper bound for $S^2$.

To see how the contour plot works, consider an example. Suppose $\bar I = 0.8$ and $S^2 = 0.7$. The point (0.8, 0.7) falls on the contour labeled 2.6. This means that if the current sample index exceeds 2.6, then the product will be Below Normal. The contour labeled 2.6 is the set of all pairs $(\bar I, S^2)$ that yield a Bogie of 2.6.

[Figure 4: contour plot; horizontal axis PAST MEAN, vertical axis past variance; contours labeled by Bogie value.]
Fig. 4. Below Normal Bogie contour plot for expectancy = 5.

This contour plot summarizes the Below Normal behavior of QMP for an expectancy of five. As $\bar I$ gets larger than one, the Bogie gets smaller. If $\bar I$ exceeds 1.6, then the Bogie is smaller than 2.34, which corresponds to a T-rate of -3. In this case, QMP Below Normal triggers earlier than a T-rate of -3.

For $\bar I$ less than 1.4, as $S^2$ gets larger, the Bogie gets smaller. This is because large $S^2$ implies large process variance, which makes an observed deviation more likely to be significant.
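The index and T-rate values quoted together in this section can be cross-checked with a short calculation. The sketch below assumes that the T-rate is the standardized difference $(e - x)/\sqrt{e}$, so that T-rate $= \sqrt{e}\,(1 - I)$; this definition is not restated in this section and the function names are hypothetical, but the assumed relation reproduces the quoted pairs for an expectancy of five.

```python
import math


def t_rate_from_index(index, expectancy):
    """Assumed relation: T-rate = sqrt(e) * (1 - I)."""
    return math.sqrt(expectancy) * (1.0 - index)


def index_from_t_rate(t_rate, expectancy):
    """Inverse of the assumed relation: I = 1 - T-rate / sqrt(e)."""
    return 1.0 - t_rate / math.sqrt(expectancy)


# Pairs quoted in the text for expectancy = 5
print(round(t_rate_from_index(3.24, 5.0), 1))   # index 3.24
print(round(index_from_t_rate(-3.0, 5.0), 2))   # T-rate -3
```

Under this assumption an index of 3.24 gives a T-rate of about -5, and a T-rate of -3 gives an index of about 2.34, matching the values used for Figures 3 and 4.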
For very small $S^2$, as you move from $\bar I = 0$ to $\bar I = 1$, the Bogie increases from 2.6 (T-rate = -3.6) to 2.9 (T-rate = -4.2). This is an apparent paradox. The better the process average, the less cushion the producer gets. This is not a paradox, but an important characteristic of QMP. With QMP we are making an inference about current quality, not long-run quality. If we have a stable past with $\bar I = 0.2$, and we suddenly get a sample index of 2.7, then this is very strong evidence that the process has changed and is worse than standard. If we have a stable past with $\bar I = 1$, and we suddenly get a sample index of 2.7, then the evidence of change is not as strong as with $\bar I = 0.2$. The weight we put on the past data depends on how consistent the past is with the present.

The Bogie contour plots provide the engineer with a manual tool to forecast the number of defects that will be allowed by the end of a period.

Statistical jitter

Figure 5 illustrates statistical jitter in the T-rate. The expectancies are about 0.1, so the T-rate jitters every time a defect occurs. The small expectancies are revealed by the long box plots. Period 8, 1977 was Below Normal for the T-rate, but normal for QMP.

[Figure 5: charts for periods 1 to 8 of 1977 and 1978.]
Fig. 5. Statistical jitter in the T-rate.

Appendix

QMP formulas

The QMP formulas are derived in [1]. In this appendix, we state the formulas in the notation of [1, Section 4.5]. For rating period$^2$ $t$ [$t = 1, \ldots, T$ (current period)], the raw audit data is of the form: $n_t$ = sample size, $x_t$ = defects observed in the audit sample. The mean and variance of $x_t$ given standard quality ($E_{st}$ and $V_{st}$) are the same, because $x_t$ is Poisson.
So, x t = Equivalent defects = d e f e c t s , e t = Equivalent expentancy = e x p e c t a n c y . Let x denote the set o f data, { x t, t = 1. . . . , T } . In Q M P , the prior distribution o f the process average manifests itself as 'prior data', which we denote x o = e o = 1. N o w for t = 0, I, . . . , T, c o m p u t e the following: Sample index : It xt/e t , Weighting factors f o r computing process average and variance: ft -- el et , 1 + et/4 gt 2.5 + (1.5)e t + (0.22)et2 Corresponding weights: P, = ft, t qt = gt gt" t N o w let Y, denote ~ rt=o and c o m p u t e the following: b = (2 p,/t). 2 The formulas also apply to lot-by-lot acceptance sampling data. Q M P / U S P - - - A modern approach to statistical quality auditing Degrees o f freedom: df= 2 [ 2 qt(1/e,)] 2 _ 1. qt2(1/e 3 + 2/e if) Total observed variance: ( 1 4 . 4 ) a 2 + (df + 1) Y, qt(It - O) 2 Se 9+df Estimated average sampling variance: a 2 = E q,(It/e,). Variance ratio: R = S2/G 2 . F, G, and H" a = 4.5 + ~ d f , B = T(i), T(O) = 1, i=0 l, B F- B-1 a = - 1 RF H= [~][( a l- 1)(1) + 1]. Current sampling variance: d = bleT. Sampling variance ratio: rT = a~.la 2 . Process variance: ~ 2 = F S 2 _ (72 = (FR - 1)a z . 371 372 B. Hoadley Weights: = o.2/(o.2 + ~2) = IlFR. Best measure of current quality: 07- = ~ T ~ + (1 - & r ) I r . Posterior variance of current quality (07-): V.r=(l_69r)(Or/er)+692EptZ ~2+ + 2 rr( O^ - IT) 2 [(r T - 1 ) ~ + G. 1] 4 Posterior variance of future quality (Or+ 1): Vr+~ = [HS 2 - a 2] [1 + ~ p f ] + 0 Z(p~/et). Posterior distribution of Or: ^2 ^ Q(z) = P r [ O r > zlx] = 1 - G~,(z/z), where 1 G=(y) = f Yo r(~) - - 1 e-X dx = Gamma X a- c.d.f. Posterior distribution of 07-+ 1: A2 ~1= 0 IV~+ ~ , ~ ~ Vr+ ~l b P(z) = P r [ 0 r + 1 :> zl x] = 1 - G~,(Z/Zl). References [1] Hoadley, B. (1981). The Quality Measurement Plan. Bell System Technical J. 60 (2), 215-271. [2] Hoadley, B. (1981). Empirical Bayes analysis and display of failure rates. 
In: Proceedings of the IEEE 31st Electronic Components Conference, May 11-13, 1981, Atlanta, GA, pp. 499-505.
[3] Hoadley, B. (1986). Quality Measurement Plan. Encyclopedia of Statistical Sciences, Vol. 7. Wiley, New York.
[4] Hoadley, B. (1981). The Universal Sampling Plan. In: 35th Annual Quality Congress Transactions of ASQC, May 27-29, 1981, San Francisco, CA, pp. 80-87.
[5] Shewhart, W. A. (1958). Nature and origin of standards of quality. Bell System Technical J. 37 (1), 1-22.
[6] Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Product. Van Nostrand, New York.
[7] Dodge, H. F. (1928). A method of rating manufactured product. Bell System Technical J. 7, 350-368.
[8] Dodge, H. F. and Torrey, M. N. (1956). A check inspection and demerit rating plan. Indust. Qual. Control 13 (1), 1-8.
[9] Hillier, F. S. and Lieberman, G. J. (1974). Operations Research. Holden-Day, San Francisco, CA, Chapter 18.
[10] Buswell, G. and Hoadley, B. (1985). QMP/USP: A modern alternative to MIL-STD-105D. Naval Logistics Quart. 32 (1), 95-111.
[11] Guthrie, D. Jr. and Johns, M. V. Jr. (1959). Bayes acceptance sampling procedures for large lots. Ann. Math. Statist. 30, 896-925.
[12] Bell Communications Research (1985). BELLCORE-STD-100 and STD-200 inspection resource allocation plans. Technical Reference TR-TSY-000016 Issue 1.
[13] Brush, G. G., Guyton, D. A., Hoadley, B., Huston, W. B. and Senior, R. A. (1984). BELL-STD-100: An inspection resource allocation plan. IEEE Communications Society Global Telecommunications Conference, November 26-29, 1984, Atlanta, GA.
[14] Guyton, D. A. and Hoadley, B. (1985). BELLCORE-STD-200 system reliability test sampling plan. In: Proceedings of the Annual Reliability and Maintainability Symposium, January 22-24, 1985, pp. 426-431.
[15] Hoadley, B. (1984). QMP theory and algorithms.
Bell Communications Research Released Technical Memorandum TM-TSY-000238, October 26, 1984 (available from author).
[16] Hoadley, B. (1986). The theory of BELLCORE-STD-100: An inspection resource allocation plan. In: Transactions of the International Conference on Reliability and Quality Control. North-Holland, Amsterdam.
[17] Guyton, D. A. and Tang, J. (1986). Reporting Current Quality and Trends: T Plots. In: Proceedings of the 40th Annual Quality Congress, Anaheim, CA, May 1986.

P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7
© Elsevier Science Publishers B.V. (1988) 375-402

Review About Estimation of Change Points

P. R. Krishnaiah and B. Q. Miao

1. Introduction

Suppose that we have a sequence of observations $x_1, \ldots, x_N$ with distribution functions $F_1, \ldots, F_N$ respectively. Generally the subscripts of the $x$'s may be considered as time, but one should note that the observations may not necessarily be taken at equally spaced times. For example, $x_1, x_2, x_3, x_4$ may be the unemployment rates of a nation in March, June, September and December of 1986, while $x_5, x_6, \ldots$ are the rates of successive months of 1987. A moment $\tau$ is said to be a change point in the sequence if $F_{\tau+1}$ is vastly different from $F_\tau$ in some way. The precise nature of change is determined by the problem considered.

Such situations occur in a wide range of practical endeavor. In quality control one takes successive observations of the process to see if something happened causing the quality of the items produced to deviate from its pre-set standard value. In econometrics, the variables reflecting the financial situation may change drastically after a crash of the stock market. In the study of growth in biology, it is commonly assumed that there exists a log linear relationship between the size of two body parts and that this relationship persists throughout the stable growth period.
A structural shift in this relationship may indicate that a new phase may be of considerable interest (see Huxley, 1972; Oshumi, 1960).

Although the case in which at most one change is allowed is by far the most important, situations arise where several changes occur. So in the most general setting we have the observations $x_1, \ldots, x_N$ grouped into non-overlapping sets

$$\{x_1, \ldots, x_{\tau_1}\},\ \{x_{\tau_1+1}, \ldots, x_{\tau_2}\},\ \ldots,\ \{x_{\tau_{q-1}+1}, \ldots, x_N\}$$

such that within each group the distributions of observations remain relatively stable, while abrupt changes (in some sense) occur at $\tau_1, \ldots, \tau_{q-1}$, which are the change points. When two or more change points are allowed, the problem becomes vastly complicated, though in a number of cases the methods developed in the one-change case can also be applied with some modifications.

The statistical inference problem about a change point model consists of: (1) determining whether any change point exists in the sequence; (2) estimating the number and position(s) of the change point(s), and other quantities of interest which are related to the change, for example, the magnitude of the jump of the mean.

In a way, the classical two-sample and multi-sample problem can be considered as a special case of the general change point problem described above. The important difference lies in that in the classical case the possible positions of change are precisely known in advance, while in the above formulation, the most important question is to determine these possible positions. Page (1954) first proposed and studied such a formulation.

One frequently-used formulation of the change point problem is as follows. Consider

$$x(t) = \mu(t) + e(t), \qquad 0 < t \le 1, \tag{1.1}$$

where $x(t)$ is the observation taken at time $t$, $e(t)$ is the random error, $E\,e(t) = 0$, and $\mu(t)$ is an unknown left-continuous and piecewise smooth function. A point $t_0 \in (0, 1]$ satisfying $\mu(t_0) \ne \mu(t_0 + 0)$ is said to be a jump change point.
If $\mu(t_0) = \mu(t_0 + 0)$ but $(d/dt)\mu(t_0 - 0) \ne (d/dt)\mu(t_0 + 0)$, then $t_0$ is called a first order continuous change point, usually abbreviated to 'continuous change point'.

This formulation is nonparametric in nature because the unknown $\mu(t)$ is not assumed to have any specific form. In order to develop a more fruitful theory, one imposes the restriction that $\mu$ belongs to some parametric class. An important example is the segmented regression model:

$$\mu(t) = \begin{cases} \beta_1' h_1(t), & 0 < t \le t_0, \\ \beta_2' h_2(t), & t_0 < t \le 1, \end{cases} \tag{1.2}$$

where $\beta_j \in R^{p_j}$, $j = 1, 2$, are unknown vectors of regression coefficients, and $h_j$ is a continuous function taking values in $R^{p_j}$, $j = 1, 2$. In the sequel we use $\beta'$ to denote the transpose of the vector $\beta$, while $\beta$ is often considered as a column vector. If $\beta_1' h_1(t_0) \ne \beta_2' h_2(t_0)$, $t_0$ is a jump change point; otherwise it is a continuous change point, or not a change point at all. We can also consider the case that $x(t)$ is multi-dimensional; then $\beta_1, \beta_2$ above are matrices rather than vectors.

One should note that in the above formulation, the emphasis is on the possible change of the mean, which is undoubtedly the most important type of the change point problem. For such a problem the general formulation given at the beginning of this section can easily be put in the form of (1.1). We may take $x(i/N)$ as $x_i$, $i = 1, \ldots, N$. Since the observations may not be taken at equally spaced moments, the variable $t$ in the model (1.1) cannot in general be understood on a uniform time scale.

As mentioned above, the random error process $(e(t),\ 0 < t \le 1)$ is assumed to be centered at 0. An often-made assumption is that it is an independent process, though models in which $e(t)$ has some simple dependence structure can also be studied. In the sequel we shall always stick to the independence assumption.
A much-studied case is that $e(t) \sim N(0, \sigma^2)$ ($\sigma^2$ unknown) and $h_j(t)$, $j = 1, 2$, are polynomials of $t$, of which the linear case is by far the most important. If $h_j(t)$, $j = 1, 2$, are linear and $\beta_1' h_1(t_0) \ne \beta_2' h_2(t_0)$, the model is called a switch regression model. This model is investigated by many authors (see Quandt, 1958, 1960; Quandt and Ramsey, 1978; Robison, 1968, and others). Hudson (1966) and other authors discussed the estimation and hypothesis testing of a continuous change point using maximum likelihood (ML) and least squares (LS) techniques.

While estimators of change positions can be obtained by these methods, the distributions of the estimators are usually very complex, and a precise determination of them is out of the question even in the simplest case. Some asymptotics are possible. When the two regression sections in (1.2) intersect, denote the MLE of the abscissa of the intersection $\gamma$ by $\hat\gamma$. Feder (1975) proved the asymptotic normality of $\hat\gamma$. Hinkley (1971) derived an asymptotic distribution for $\hat\gamma$ which gives a better fit to its sampling distribution for moderate sample sizes. Inference about $\gamma$ is also proposed. When these regression sections are parallel, the asymptotic distribution of the MLE $\hat\tau$ of the jump change point is derived by Hinkley (1970) by random-walk considerations. Unfortunately, $\hat\tau$ turns out to be inconsistent.

Besides MLE and LSE, Bayesian methods play an active role. Suppose that the positions of changes obey an arbitrarily specified a priori probability distribution appropriate to the special case being studied, and assume that the jumps of the mean are independently and normally distributed random variables with mean 0. Chernoff and Zacks (1964) derived a Bayesian estimator of the current mean $\mu_N$ for an a priori uniform distribution on the whole real line using a quadratic loss function. This approach is extended to the one-parameter exponential family of distributions (see Kander and Zacks, 1966).
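Returning to the least squares estimation of a jump change point mentioned above, the simplest case can be sketched directly. The code below is an illustration only, assuming a piecewise constant mean with a single jump, unit weights and independent errors with equal variance (in which case, under normality, least squares and maximum likelihood coincide); the function name and the data are hypothetical.

```python
def least_squares_jump(x):
    """Return (tau, rss): number of observations before the jump that minimizes the RSS."""
    n = len(x)
    best_tau, best_rss = None, float("inf")
    for tau in range(1, n):  # at least one observation on each side of the jump
        left, right = x[:tau], x[tau:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        rss = (sum((v - mean_left) ** 2 for v in left)
               + sum((v - mean_right) ** 2 for v in right))
        if rss < best_rss:
            best_tau, best_rss = tau, rss
    return best_tau, best_rss


# Hypothetical series: mean near 0 for four observations, then near 2.
x = [0.1, -0.2, 0.0, 0.3, 2.1, 1.9, 2.2, 1.8]
tau_hat, rss = least_squares_jump(x)
print(tau_hat, rss)
```

The search is exhaustive over all admissible positions, which reflects the fact that the criterion is not differentiable in the change position. For this series the minimum is attained with four observations before the jump.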
Bhattacharyya and Johnson (1968) proposed an optimal invariant test for certain location shift alternatives. Numerous authors in this field used Bayesian methods under various assumptions on the model.

A large portion of results in this field are derived under the assumption that the number $q$ of change points is known. Situations occur in which $q$ is unknown and is to be estimated. In a small-sample setting, this problem does not lend itself to a satisfactory treatment. Some asymptotics are proposed. Vostrikova (1981) investigated this problem in the multivariate case, suggested a binary segmentation procedure to estimate $q$, and proved that these estimates are consistent. Pettitt (1980) suggested another ad hoc sequential procedure by the cumulative sum (CUSUM) method. Krishnaiah, Miao, Subramanyam and Zhao (1986), (1987a,b) considered large sample properties of change point estimators, and obtained the MLE of $q$ and the positions of change points by model selection considerations. These estimators are proved to be consistent. Under the assumption of independence, normality and the constraint $\mu_1 \ge \cdots \ge \mu_N$, Krishnaiah, Miao and Zhao also derived the MLE of the number and the positions of change points, and proved their consistency. Later, Yin (1986) proposed a consistent estimator by a nonparametric approach; Chen (1987) and Miao (1987) obtained the asymptotic distributions of these estimators for some simple types of change points.

The MLE $\hat\gamma$ of the intersection $\gamma$ of two regression curves is discussed in Section 2, and various non-Bayesian estimators for the jump change model are presented in Section 3. Section 4 is devoted to Bayesian methods, and in the last section the estimates of the positions and the number of change points in the large sample case are discussed.

Some other methods are proposed to study the estimates of change points. Among them are dynamic programming and smooth approximation, to name a few.

2.
The estimate of the intersection of regression curves

2.1. Weighted least squares estimation

Let $x_i = x(t_i)$, $i = 1, \ldots, N$, be observations drawn from model (1.1) and (1.2) with only one change point $\alpha$. Here the continuity assumption

$$\beta_1' h_1(\alpha) = \beta_2' h_2(\alpha) \tag{2.1}$$

plays the role of a constraint under which the parameters are estimated. Let $w_k$, $k = 1, \ldots, N$, be a given set of positive real numbers, called weights. Set

$$Q(\beta, \alpha) = \sum_{k=1}^{\tau(\alpha)} w_k (x_k - \beta_1' h_1(t_k))^2 + \sum_{k=\tau(\alpha)+1}^{N} w_k (x_k - \beta_2' h_2(t_k))^2, \tag{2.2}$$

where $\tau(\alpha)$ is an integer such that $t_{\tau(\alpha)} \le \alpha < t_{\tau(\alpha)+1}$. Suppose that the unknown $\alpha$ belongs to a given set, say $A$. For convenience, we set

$$H(m) = \begin{pmatrix} H_1(m) & 0 \\ 0 & H_2(m) \end{pmatrix},$$

where $H_1(m)$ is the $m \times p_1$ design matrix with rows $h_1'(t_k)$, $1 \le k \le m$, and $H_2(m)$ is the $(N - m) \times p_2$ design matrix with rows $h_2'(t_k)$, $m < k \le N$,

$$W_1 = \mathrm{diag}(w_1, \ldots, w_m), \quad W_2 = \mathrm{diag}(w_{m+1}, \ldots, w_N), \quad W = \mathrm{diag}(w_1, \ldots, w_N),$$

$$B(\alpha) = (h_1'(\alpha), -h_2'(\alpha)) \quad \text{and} \quad \beta = (\beta_1', \beta_2')'.$$

Rewriting the continuity assumption as $B(\alpha)\beta = 0$, we have the following problem: find $\hat\alpha$, $\hat\beta$ such that

$$Q(\hat\beta, \hat\alpha) = \min \{ Q(\beta, \alpha) : \beta \in R^{p_1 + p_2},\ \alpha \in A,\ B(\alpha)\beta = 0 \}. \tag{2.3}$$

A solution of this problem is called a weighted least squares estimate (WLSE).

The WLSE can be found in two steps. First, fix $\alpha$ (so that $\tau = \tau(\alpha)$ is also fixed) and find the WLSE of $\beta$ under the constraint (2.1). The constrained estimate and the weighted residual sum of squares are given by

$$\hat\beta(\alpha) = \hat\beta(\tau) - \{B(\alpha)(H'(\tau) W H(\tau))^{-} B'(\alpha)\}^{-1} \{H'(\tau) W H(\tau)\}^{-} B'(\alpha) B(\alpha) \hat\beta(\tau) \tag{2.4}$$

and

$$Q(\alpha) = Q_1(\tau) + Q_2(\tau) + (B(\alpha)\hat\beta(\tau))^2 \{B(\alpha)(H'(\tau) W H(\tau))^{-} B'(\alpha)\}^{-1}, \tag{2.5}$$

where $\hat\beta(\tau)$ denotes the unconstrained WLSE of $\beta$ and $Q_1(\tau)$, $Q_2(\tau)$ are the sums of the first $\tau$ and the last $N - \tau$ squares of weighted residuals. Set

$$Q(\tau) = Q_1(\tau) + Q_2(\tau). \tag{2.6}$$

Then $Q(\tau)$ is the total weighted residual sum of squares when $\tau$ is fixed. The next step is to find the $\hat\alpha \in A$ minimizing $Q(\alpha)$ defined by (2.5):

$$Q(\hat\alpha) = \min_{\alpha \in A} Q(\alpha). \tag{2.7}$$

2.2.
Classification of $\hat\alpha$

Let

$$g(\alpha) = \frac{d}{du}(\hat\beta_1' h_1(u))\Big|_{u=\alpha} - \frac{d}{du}(\hat\beta_2' h_2(u))\Big|_{u=\alpha}. \tag{2.8}$$

To facilitate the calculation of $\hat\alpha$, Hudson (1966) classified the WLSE of $\alpha$ into three types as shown in Table 1.

Table 1
Classification of WLSE $\hat\alpha$

                            $g(\hat\alpha) \ne 0$    $g(\hat\alpha) = 0$
$\hat\alpha \ne t_i$        Type One                 Type Three
$\hat\alpha = t_i$          Type Two                 Type Two

Let $\beta_1^*(i)$ and $\beta_2^*(i)$ be the unconstrained WLSE computed from $t_1, \ldots, t_i$ and $t_{i+1}, \ldots, t_N$ respectively, and let $\alpha^*(i)$ be the point (or points) of intersection of $\beta_1^*(i)' h_1(t)$ and $\beta_2^*(i)' h_2(t)$ which lies in $(t_i, t_{i+1})$.

THEOREM 2.1 (Hudson, 1966). (i) If $\hat\alpha$ is of Type One, and $\hat\alpha \in (t_i, t_{i+1})$, then $\hat\beta_j = \beta_j^*(i)$, $j = 1, 2$, and $\hat\alpha = \alpha^*(i)$, i.e. $\beta_1^*(i)' h_1(\alpha^*(i)) = \beta_2^*(i)' h_2(\alpha^*(i))$.
(ii) If $\hat\alpha$ is of Type Two, $\hat\alpha = t_i$ for some $i$, then $\hat\beta = \hat\beta(t_i)$ and $Q(\hat\beta, \hat\alpha) = Q(t_i) \ge Q(i)$.
(iii) That $\hat\alpha$ is of Type Three implies $Q(\hat\beta, \hat\alpha) \ge Q(i)$, if $t_i < \hat\alpha < t_{i+1}$.

2.3. Determination of $\hat\alpha$

If $i$ is unknown, but we know the joint is of Type One, we can find $\hat\alpha$ as follows. For a value of $i$, find $\beta_j^*(i)$ as before. Find whether or not the curves join at at least one point $\alpha^*(i)$ in the right place, i.e.

$$t_i < \alpha^*(i) < t_{i+1}. \tag{2.9}$$

If (2.9) holds, put $T(i) = P_1^*(i) + P_2^*(i)$, where $P_j^*(i)$, $j = 1, 2$, are the local residual sums of squares. If the curves do not join, or if (2.9) is not satisfied, put $T(i) = \infty$. Carry out the computation for all relevant values of $i$. Finally, choose the critical value of $i$ for which $T(i)$ is minimized. This procedure is based on the definition of Type One and the theorem above.

If we know that the joint is of Type Two, i.e. $\hat\alpha = t_i$ for some $i$, we can easily find the remaining parameters since we now have a model which is linear in these parameters. We get the estimate by solving the least squares problem with the linear constraint $\beta_1' h_1(t_i) = \beta_2' h_2(t_i)$.
One way of doing this is to find the unconstrained local least squares estimates $\beta_j^*(i)$, and then make the relevant adjustment (see Gallant and Fuller, 1973; Hudson, 1966). If $i$ is unknown, we have to carry out the above computation for all possible values of $i$, and choose the critical value of $i$ for which the overall residual sum of squares is minimized.

Finally, suppose the joint is of Type Three. That is,

$$t_i < \hat\alpha < t_{i+1}, \qquad \hat\beta_1' h_1(\hat\alpha) = \hat\beta_2' h_2(\hat\alpha)$$

and

$$\frac{d}{du}(\hat\beta_1' h_1(u))\Big|_{u=\hat\alpha} - \frac{d}{du}(\hat\beta_2' h_2(u))\Big|_{u=\hat\alpha} = 0.$$

In this case, depending on practical needs, the regression curve may be assumed to consist of one smooth curve, or it may contain two segments of smooth curves. The two cases are handled separately. Refer to Hudson (1966), and Gallant and Fuller (1973).

In practice, usually we have no information concerning the type of a joint. Since Hudson's classification is exhaustive, it suffices to try all three types in order to arrive at the overall solution. We have to do so since there might be two or more change points in the given set $A$, and the joints might belong to different types. This can be done step by step using Hudson's theorem. The previous discussion can be extended to models in which $r$ pieces of segmented regression curves are present.

2.4. Maximum likelihood estimation

Now we assume that the $e_t$'s in (1.1) are normal. The logarithm of the likelihood function, $\log L(\beta, \sigma_1^2, \sigma_2^2, \alpha)$, can easily be written down. We want to find its maximizing point subject to the restriction (2.1). Since the function $\log L(\beta, \sigma_1^2, \sigma_2^2, \alpha)$ is not differentiable with respect to $\alpha$, we first maximize $\log L(\beta, \sigma_1^2, \sigma_2^2, \alpha)$ for a given $\alpha$. We have

$$\log L(\alpha) = \sup \{ \log L(\beta, \sigma_1^2, \sigma_2^2, \alpha) : \beta \in R^{p_1+p_2},\ \sigma_i^2 > 0,\ i = 1, 2,\ B(\alpha)\beta = 0 \}$$
$$= \log L(\hat\beta(\alpha), \hat\sigma_1^2(\alpha), \hat\sigma_2^2(\alpha), \alpha) = -\frac{N}{2}\log 2\pi - \frac{N}{2} - \frac{\tau}{2}\log \hat\sigma_1^2(\alpha) - \frac{N - \tau}{2}\log \hat\sigma_2^2(\alpha).$$

Take any $\hat\alpha \in A$ such that

$$\log L(\hat\alpha) = \max_{\alpha \in A} \log L(\alpha),$$

i.e.

$$\hat\alpha = \arg\min_{\alpha \in A} \left( \frac{\tau}{2}\log \hat\sigma_1^2(\alpha) + \frac{N - \tau}{2}\log \hat\sigma_2^2(\alpha) \right).$$

Thus, the MLE is given by

$$(\hat\beta(\hat\alpha),\ \hat\sigma_1^2(\hat\alpha),\ \hat\sigma_2^2(\hat\alpha),\ \hat\alpha). \tag{2.10}$$

Finally, (2.10) is not different from the general MLE except that $\hat\sigma_1^2(\hat\alpha)$ and $\hat\sigma_2^2(\hat\alpha)$ are restricted by (2.1).

When $A$ is an infinite set, e.g. an interval in $[t_1, t_N]$, complications arise. Robison (1964) decomposed $A$ into a finite number of disjoint sets

$$A_\tau = \{\alpha \in A,\ \tau(\alpha) = \tau\}, \qquad \tau \in J,$$

and used a reparametrization of the model based on constraint (2.1) and polynomial functions in order to solve (2.10). If $\sigma_1^2 = \sigma_2^2$ a priori, then $\hat\alpha$ is the same as the ordinary least squares estimate.

2.5. Asymptotic distribution of MLE of intersection

Consider the model which is a special case of (1.1):

$$x_t = \begin{cases} \alpha_1 + \beta_1 u_t + \varepsilon_t, & t = 1, \ldots, \tau, \\ \alpha_2 + \beta_2 u_t + \varepsilon_t, & t = \tau + 1, \ldots, N, \end{cases} \tag{2.11}$$

where the arguments $u_1, \ldots, u_N$ are ordered, i.e. $u_1 < u_2 < \cdots < u_N$, the error terms $\varepsilon_1, \ldots, \varepsilon_N$ are independent $N(0, \sigma^2)$ and the parameters $\alpha_1, \alpha_2, \beta_1, \beta_2$ and $\tau$ are unknown. Let $\gamma$ be the abscissa of the intersection of the two regression lines of model (2.11); then it is easy to see that

$$\gamma = (\alpha_2 - \alpha_1)/(\beta_1 - \beta_2). \tag{2.12}$$

Further, it is assumed here that $u_\tau < \gamma < u_{\tau+1}$. That is to say, the overall regression function is continuous, and only the tangent changes on the two sides of $\gamma$.

Important problems to be considered are estimating $\gamma$ and making inferences about $\gamma$. Generally, the MLE $\hat\gamma$ of $\gamma$ can be obtained more easily. In order to estimate and make inferences for $\gamma$, one way is to calculate the distribution of $\hat\gamma$. Unfortunately, the explicit form of the distribution of $\hat\gamma$ is hard to obtain. So we turn to the asymptotic distribution of $\hat\gamma$. Feder (1975a) proved the asymptotic normality of $\hat\gamma$ under the more general conditions that the $\varepsilon_t$'s are iid r.v.'s with mean 0, variance $\sigma^2$, and finite $(2 + \delta)$-moment for some $\delta > 0$, and that the number of unknown change points is finite, but may be larger than one.
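For the two-line model (2.11), the Type One search of Section 2.3 can be sketched as follows. This is an illustration under stated assumptions, not Hudson's published algorithm: unit weights, straight lines on both sides, at least two points per segment, and hypothetical function names. Splits whose fitted lines do not join in the right interval are skipped, which plays the role of setting $T(i) = \infty$.

```python
def fit_line(u, x):
    """Ordinary least squares fit x = a + b*u; returns (a, b, rss)."""
    n = len(u)
    u_bar = sum(u) / n
    x_bar = sum(x) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_ux = sum((ui - u_bar) * (xi - x_bar) for ui, xi in zip(u, x))
    b = s_ux / s_uu
    a = x_bar - b * u_bar
    rss = sum((xi - a - b * ui) ** 2 for ui, xi in zip(u, x))
    return a, b, rss


def type_one_intersection(u, x):
    """Search all splits; return (tau, gamma, rss) for the best admissible join, or None."""
    best = None
    n = len(u)
    for tau in range(2, n - 1):                # tau points on the left, n - tau >= 2 on the right
        a1, b1, rss1 = fit_line(u[:tau], x[:tau])
        a2, b2, rss2 = fit_line(u[tau:], x[tau:])
        if b1 == b2:                           # parallel lines never intersect
            continue
        gamma = (a2 - a1) / (b1 - b2)          # abscissa of the intersection, as in (2.12)
        if not (u[tau - 1] < gamma < u[tau]):  # the join must fall in the right place, as in (2.9)
            continue
        total = rss1 + rss2                    # sum of the local residual sums of squares
        if best is None or total < best[2]:
            best = (tau, gamma, total)
    return best


# Hypothetical noiseless data: x = 1 + u up to u = 5, then x = -10 + 3u; the lines meet at 5.5.
u = [float(k) for k in range(1, 11)]
x = [1.0 + uk if uk <= 5 else -10.0 + 3.0 * uk for uk in u]
print(type_one_intersection(u, x))
```

With noisy data the same search applies unchanged; when no split yields an admissible join the function returns None, and the other joint types of Section 2.2 would then have to be examined.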
Although $\hat\gamma$ is asymptotically normally distributed, an empirical study of the distribution of $\hat\gamma$ suggests that for moderate sample sizes, the normal approximation is inadequate. So it is necessary to find some other approximate distribution which fits better in case the sample size is not large. This was considered by Hinkley (1969). He discussed the relation between the two maximum likelihood estimates $\hat\gamma$ and $\tilde\gamma$ of $\gamma$, the first with constraint (2.1) and the second unconstrained. From the asymptotic normality of $\tilde\gamma$, he deduced the asymptotic normality and an approximate expression for the distribution of $\hat\gamma$.

If the two regression lines in the model (2.11) are parallel, especially $\beta_1 = \beta_2 = 0$ and $\alpha_1 \ne \alpha_2$, the MLE $\hat\tau$ of the change point $\tau$ can be deduced. An asymptotic distribution of $\hat\tau$ is derived by Hinkley (1969) using the random walk technique. Unfortunately, his asymptotic distribution is too complicated to be of practical significance.

It is an interesting and important problem to find an estimate of a change point $\gamma$ for which an asymptotic distribution can be determined explicitly, since the construction of a confidence interval for $\gamma$ depends upon such an estimate. In some simple models containing jump or slope change points, Chen (1987) and Krishnaiah and Miao (1987) obtained such estimates; see Section 5.

3. Non-Bayesian estimates of change points in jump change model

3.1. Weighted least squares estimation (WLSE)

Consider model (1.1). Let $w_t$, $t = 1, \ldots, N$, be a given set of weights, and let

$$Q(\beta, \tau) = \sum_{k=1}^{\tau} w_k (x_k - \beta_1' h_1(t_k))^2 + \sum_{k=\tau+1}^{N} w_k (x_k - \beta_2' h_2(t_k))^2 \tag{3.1}$$

be the weighted sum of squares. $(\hat\beta', \hat\tau)'$ is called a WLSE of $\beta$ and $\tau$ if

$$Q(\hat\beta, \hat\tau) = \min \{ Q(\beta, \tau) : \beta_j \in R^{p_j},\ j = 1, 2,\ \tau \in J \}, \tag{3.2}$$

where $\beta = (\beta_1', \beta_2')'$. If $w_t = 1$ for $t = 1, \ldots, N$, then $(\hat\beta', \hat\tau)'$ is the ordinary least squares estimator. A WLSE can be calculated in two steps.
First fix $\tau$ and find $\hat\beta(\tau)$ such that

$$Q(\tau) = Q(\hat\beta(\tau), \tau) = \min\{Q(\beta, \tau):\ \beta_j \in \mathbb{R}^{p_j},\ j = 1, 2\}.$$

In the next step take $\hat\tau$ as the solution which minimises $Q(\tau)$, i.e.,

$$Q(\hat\tau) = \min_{\tau \in J} Q(\tau) = \min_{\tau \in J}\left(Q_1(\tau) + Q_2(\tau)\right), \tag{3.3}$$

where $Q_1(\tau)$ and $Q_2(\tau)$ are the weighted residual sums of squares of the first $\tau$ and the last $N - \tau$ observations, respectively. That is,

$$\hat\tau = \arg\min_{\tau \in J} Q(\tau).$$

Generally, an explicit expression for $\hat\tau$ is not easy to obtain. The same procedure can be used when $x_t$ is a $p \times 1$ vector observation.

3.2. Maximum likelihood estimation (MLE)

Assume that $\{\varepsilon(t)\}$ is independent and normally distributed in model (1.1). The logarithm of the likelihood is given by

$$\log L(\beta, \sigma_1^2, \sigma_2^2, \tau) = -\frac{N}{2}\log 2\pi - \frac{1}{2}\left[\tau\log\sigma_1^2 + (N-\tau)\log\sigma_2^2\right] - \frac{1}{2}\left[\sigma_1^{-2}\sum_{k=1}^{\tau}\left(x_k - \beta_1' h_1(t_k)\right)^2 + \sigma_2^{-2}\sum_{k=\tau+1}^{N}\left(x_k - \beta_2' h_2(t_k)\right)^2\right]. \tag{3.4}$$

Proceeding along the same lines as (3.3), we can obtain an MLE $(\hat\beta, \hat\sigma_1^2, \hat\sigma_2^2, \hat\tau)$ of $(\beta, \sigma_1^2, \sigma_2^2, \tau)$ in two steps. The first step is to fix $\tau$ and find $\hat\beta(\tau)$, $\hat\sigma_i^2(\tau)$, $i = 1, 2$, such that

$$\log L(\tau) \equiv \log L(\hat\beta(\tau), \hat\sigma_1^2(\tau), \hat\sigma_2^2(\tau), \tau) = \max\{\log L(\beta, \sigma_1^2, \sigma_2^2, \tau):\ \beta_j \in \mathbb{R}^{p_j},\ \sigma_i^2 > 0,\ i = 1, 2\}$$

$$= -\frac{N}{2}\log(2\pi) - \frac{1}{2}\left[\tau\log\hat\sigma_1^2(\tau) + (N-\tau)\log\hat\sigma_2^2(\tau)\right] - \frac{N}{2}.$$

Then find $\hat\tau$ such that

$$L(\hat\tau) = \max_{\tau \in J} L(\tau)$$

or, equivalently,

$$\hat\tau = \arg\min_{\tau \in J}\left[\tau\log\left(\tau^{-1}S_1(\tau)\right) + (N-\tau)\log\left((N-\tau)^{-1}S_2(\tau)\right)\right], \tag{3.5}$$

where $S_1(\tau)$ is the residual sum of squares of the first $\tau$ observations, so that $\tau^{-1}S_1(\tau)$ is the corresponding variance estimate, and $S_2(\tau)$ is that of the last $N - \tau$ observations.

This procedure extends easily to multi-dimensional observations and to the case of more than one change point. Suppose the integers $k_1, \dots, k_q$ satisfy $0 = k_0 < k_1 < \dots < k_q < k_{q+1} = N$. Then $\pi = (k_1, \dots, k_q)$ is called a partition of $[1, N]$, and

$$K_q^{(N)} = \{\pi = (k_1, \dots, k_q):\ 0 < k_1 < \dots < k_q < N\}. \tag{3.6}$$

Consider the following $p$-dimensional model:

$$x_t = \mu_j + \varepsilon_t \quad \text{for } k_{j-1} < t \le k_j,\ j = 1, \dots, q+1, \tag{3.7}$$

where $\{\varepsilon_t,\ t = 1, \dots, N\}$ is a sequence of independent $p \times 1$ normal variables such that $\varepsilon_t \sim N_p(0, \Lambda_t)$, $\Lambda_t > 0$. We consider two cases.

Case (i). $\Lambda_j = \Lambda > 0$ for $j = 1, \dots, N$. Let $X = (x_1, \dots, x_N)$ be a $p \times N$ observation matrix. Each $\pi \in K_q$, say $\pi = (k_1, \dots, k_q)$, corresponds to the model

$$M_\pi:\quad E x_i = \mu_j,\ k_{j-1} < i \le k_j,\ j = 1, \dots, q+1; \qquad \mu_j \neq \mu_{j+1},\ j = 1, \dots, q.$$

To find the MLE under $M_\pi$, we proceed as follows. The supremum of the log likelihood function is given by

$$\log L(X, \pi, \Lambda) = -\frac{Np}{2}c - \frac{N}{2}\log|\Lambda| - \frac{N}{2}\sum_{j=1}^{q+1}\operatorname{trace}\left(\Lambda^{-1}A_j(N)\right), \tag{3.8}$$

where $|\Lambda|$ denotes the determinant of the matrix $\Lambda$, and

$$A_j(N) = N^{-1}\sum_{t=k_{j-1}+1}^{k_j}\left(x_t - \bar x_{k_{j-1}k_j}\right)\left(x_t - \bar x_{k_{j-1}k_j}\right)', \tag{3.9}$$

$$\bar x_{\alpha\beta} = \frac{1}{\beta - \alpha}\sum_{t=\alpha+1}^{\beta} x_t. \tag{3.10}$$

Let

$$A_\pi(N) = \sum_{j=1}^{q+1} A_j(N). \tag{3.11}$$

For fixed $\pi$, the MLE $\hat\Lambda$ of $\Lambda$ is

$$\hat\Lambda = A_\pi(N), \tag{3.12}$$

and

$$\log L(\pi) = \sup_{\Lambda > 0}\log L(X, \pi, \Lambda) = -\frac{N}{2}\log|A_\pi(N)| - \frac{Np}{2} - \frac{Np}{2}c. \tag{3.13}$$

Next, find $\hat\pi = (\hat k_1, \dots, \hat k_q)$ such that

$$\log L(\hat\pi) = \max_{\pi \in K_q}\log L(\pi). \tag{3.14}$$

Case (ii). $\Lambda_j > 0$, $j = 1, 2, \dots, N$. Proceeding along the same lines as (3.12)-(3.14), we get for the model ($\pi = (k_1, \dots, k_q) \in K_q$)

$$M_\pi:\quad E x_i = \mu_j,\ \operatorname{Var} x_i = \Lambda_j,\ k_{j-1} < i \le k_j,\ j = 1, \dots, q+1; \qquad \mu_j \neq \mu_{j+1},\ j = 1, \dots, q,$$

$$\log L(\pi) = \sup\{\log L(X, \pi, \Lambda_1, \dots, \Lambda_{q+1}):\ \Lambda_i > 0,\ i = 1, \dots, q+1\} = -\frac{N}{2}\sum_{j=1}^{q+1}\alpha_j\log|A_j(N)| - \frac{Np}{2} - \frac{Np}{2}c, \tag{3.15}$$

where

$$\alpha_j = (k_j - k_{j-1})/N \tag{3.16}$$

and $k_j - k_{j-1} > p$. Finally, find $\hat\pi = (\hat k_1, \dots, \hat k_q)$ such that

$$\log L(\hat\pi) = \max_{\pi \in K_q}\log L(\pi), \tag{3.17}$$

i.e. $\hat\pi = \arg\max_{\pi \in K_q}\log L(\pi)$.

3.3. A type of maximum likelihood ratio estimation (MLRE)

Consider the model

$$x_t = \begin{cases} \mu_1 + \varepsilon_t, & t = 1, \dots, \tau, \\ \mu_2 + \varepsilon_t, & t = \tau+1, \dots, N, \end{cases} \tag{3.18}$$

where $\tau$ is unknown, $\mu_1 \neq \mu_2$, and $\varepsilon_t$, $t = 1, \dots, N$, are iid with the common distribution $N(0, \sigma^2)$; $\sigma^2$ is unknown. Consider the null hypothesis

$$H_0:\ E x_t = \mu_2,\quad t = 1, \dots, N,$$

against the alternative

$$H_1:\ E x_t = \begin{cases} \mu_1, & t = 1, \dots, \tau, \\ \mu_2, & t = \tau+1, \dots, N, \end{cases}$$

where $\tau$ is unknown. Write

$$\bar x_\tau = \tau^{-1}\sum_{t=1}^{\tau} x_t, \qquad \bar x_\tau^{*} = (N-\tau)^{-1}\sum_{t=\tau+1}^{N} x_t, \tag{3.19}$$

$$W_\tau = (N-2)^{-1}\left[\sum_{i=1}^{\tau}(x_i - \bar x_\tau)^2 + \sum_{i=\tau+1}^{N}(x_i - \bar x_\tau^{*})^2\right]. \tag{3.20}$$

The standardized difference between the observations before and after the change point is

$$y_\tau = \left(\tau(N-\tau)/N\right)^{1/2}\left(\bar x_\tau - \bar x_\tau^{*}\right); \tag{3.21}$$

then $V(\tau) = \left(\tau(N-\tau)/N\right)^{1/2}\left(\bar x_\tau - \bar x_\tau^{*}\right)/\sqrt{W_\tau}$ has a $t$-distribution with $N - 2$ degrees of freedom under $H_0$. The likelihood ratio test for unknown $\tau$ is based on the maximum $t$-statistic

$$V(\hat\tau) = \max_{1 \le \tau \le N-1} V(\tau), \tag{3.22}$$

i.e. $\hat\tau = \arg\max_{1 \le \tau \le N-1} V(\tau)$.

This method was extended to the multi-dimensional case by Srivastava and Worsley (1986). We need only note that if we substitute $(x - \bar x_\tau)(x - \bar x_\tau)'$ and $(x - \bar x_\tau^{*})(x - \bar x_\tau^{*})'$ for $(x - \bar x_\tau)^2$ and $(x - \bar x_\tau^{*})^2$ respectively, then under $H_0$, $y_\tau' W_\tau^{-1} y_\tau$ is a Hotelling $T^2$-statistic. Take $\hat\tau$ such that

$$H(\hat\tau) = \max_{\tau}\frac{\tau(N-\tau)}{N}\left(\bar x_\tau - \bar x_\tau^{*}\right)' W_\tau^{-1}\left(\bar x_\tau - \bar x_\tau^{*}\right) = \max_{\tau} H(\tau), \tag{3.23}$$

i.e. $\hat\tau = \arg\max_{\tau} H(\tau)$.

3.4. Cumulative sum estimation (CUSUME)

Consider the model: $x_1, \dots, x_N$ are independent 0-1 random variables such that

$$P(x_i = 0) = 1 - P(x_i = 1), \qquad P(x_i = 1) = \begin{cases} \theta_0, & i = 1, \dots, \tau, \\ \theta_1, & i = \tau+1, \dots, N, \end{cases} \tag{3.24}$$

where $\tau$ is unknown. Let

$$S_t = \sum_{i=1}^{t} x_i, \quad t = 1, \dots, N, \tag{3.25}$$

$$v_t = N S_t - t S_N. \tag{3.26}$$

If $\theta_0 > \theta_1$, then it would be reasonable to estimate $\tau$ by the quantity $t$ maximizing $v_t$. If there are several such $t$'s, we take

$$\hat\tau_1 = \inf\{t_0:\ v_{t_0} \ge v_t,\ t = 1, \dots, N\}. \tag{3.27}$$

Similar estimates are then defined when it is known in advance that $\theta_0 < \theta_1$, or $\theta_0 \neq \theta_1$, as follows:

$$\hat\tau_2 = \sup\{t_0:\ v_{t_0} \le v_t,\ t = 1, \dots, N\}, \quad \text{for } \theta_0 < \theta_1, \tag{3.28}$$

$$\hat\tau_3 = \inf\{t_0:\ |v_{t_0}| \ge |v_t|,\ t = 1, \dots, N\}, \quad \text{for } \theta_0 \neq \theta_1. \tag{3.29}$$
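The three cumulative sum estimates (3.27)-(3.29) can be sketched directly from the definitions of $S_t$ and $v_t$ in (3.25)-(3.26). The block below is a minimal illustration under the assumption that the data are given as a Python list of 0/1 values; the function name `cusum_change_point` and the `direction` keyword are hypothetical labels chosen here, not notation from the cited papers.

```python
from itertools import accumulate


def cusum_change_point(x, direction="decrease"):
    """Cumulative sum estimate of the change point for 0-1 data.

    direction="decrease": theta_0 > theta_1, first maximiser of v_t      (3.27)
    direction="increase": theta_0 < theta_1, last minimiser of v_t       (3.28)
    direction="either":   theta_0 != theta_1, first maximiser of |v_t|   (3.29)
    Returned values are 1-based time indices t.
    """
    n = len(x)
    s = list(accumulate(x))                                    # S_t, t = 1..N
    v = [n * s[t - 1] - t * s[-1] for t in range(1, n + 1)]    # v_t = N*S_t - t*S_N
    times = range(1, n + 1)
    if direction == "decrease":
        target = max(v)
        return min(t for t in times if v[t - 1] == target)
    if direction == "increase":
        target = min(v)
        return max(t for t in times if v[t - 1] == target)
    if direction == "either":
        target = max(abs(val) for val in v)
        return min(t for t in times if abs(v[t - 1]) == target)
    raise ValueError("direction must be 'decrease', 'increase' or 'either'")
```

Because $v_t$ is computed in integer arithmetic, ties among maximisers or minimisers are detected exactly, which is what the inf and sup in (3.27)-(3.29) rely on.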
Under the null hypothesis $H_0$ (there is no change in probability) versus the alternative $H_1$ (there is a change in probability at an unknown time), the statistics $V_m^{+}$, $V_m^{-}$ and $V_m$ have the same distributions as the null distributions of $m(N-m)D^{+}_{m,N-m}$, $m(N-m)D^{-}_{m,N-m}$ and $m(N-m)D_{m,N-m}$, respectively, the multi-dimensional extension of the Kolmogorov-Smirnov two-sample statistics, where

$$V_m^{+} = (v_{\hat\tau_1} \mid S_N = m), \qquad V_m^{-} = (v_{\hat\tau_2} \mid S_N = m), \qquad V_m = (v_{\hat\tau_3} \mid S_N = m).$$

Pettitt (1980) indicated that the CUSUME and MLE of $\tau$ are asymptotically equivalent. Monte Carlo simulations show that in many cases $\Pr\{\text{CUSUME} = \text{MLE}\}$ is approximately one. Hinkley (1971) also considered the change point for independent normal random variables by means of the cumulative sum of sequential residual errors.

4. Bayesian estimate

In this section we discuss the change point problem from the Bayesian viewpoint. We only discuss the case of the jump model, since the treatment of the continuous model is similar. Suppose we have a jump model with at most one jump point:

$$x_t = \begin{cases} \alpha_1 + \beta_1 u_t + \varepsilon_t, & t = 1, \dots, \tau, \\ \alpha_2 + \beta_2 u_t + \varepsilon_t, & t = \tau+1, \dots, N, \end{cases} \tag{4.1}$$

where $u_1 < u_2 < \dots < u_N$, $(\alpha_1, \beta_1) \neq (\alpha_2, \beta_2)$ and $\tau$ is unknown. A more general form is

$$x_t = \begin{cases} \alpha_{10} + \alpha_{11}u_{t1} + \dots + \alpha_{1q_1}u_{tq_1} + \varepsilon_t, & t = 1, \dots, \tau, \\ \alpha_{20} + \alpha_{21}u_{t1} + \dots + \alpha_{2q_2}u_{tq_2} + \varepsilon_t, & t = \tau+1, \dots, N. \end{cases} \tag{4.2}$$

For the sake of simplicity, we only consider the model (4.1). Further, we assume that $\tau$ obeys some specified prior probability distribution and that the $\varepsilon_t$'s are independent and normally distributed:

$$\varepsilon_t \sim \begin{cases} N(0, \sigma_1^2), & t = 1, \dots, \tau, \\ N(0, \sigma_2^2), & t = \tau+1, \dots, N. \end{cases}$$

The Bayesian estimators of $\tau$ are usually defined by (i) the posterior mode of $\tau$ or (ii) the value minimizing the expected posterior loss for the quadratic loss function $(\tau - \mu)^2$ with respect to the set $J$ of admissible values of $\tau$.
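The two Bayesian estimators just described are easy to compare once posterior probabilities over the admissible set $J$ are available. The sketch below assumes the posterior is supplied as unnormalised non-negative weights, one per candidate value of $\tau$; how those weights are obtained is not shown. The function name `bayes_change_point_estimates` and its arguments are hypothetical.

```python
def bayes_change_point_estimates(candidates, weights):
    """Return (posterior mode, minimiser over J of expected quadratic loss).

    candidates: admissible change points (the set J), as a list of numbers.
    weights:    unnormalised posterior weights, one per candidate
                (assumption: non-negative and not all zero).
    """
    total = sum(weights)
    post = [w / total for w in weights]            # normalised posterior probabilities

    # (i) posterior mode: the candidate with the largest posterior probability
    mode = max(zip(candidates, post), key=lambda pair: pair[1])[0]

    # (ii) candidate minimising the expected posterior quadratic loss
    def risk(mu):
        return sum(p * (tau - mu) ** 2 for tau, p in zip(candidates, post))

    best = min(candidates, key=risk)
    return mode, best
```

With candidates 2, 3, 4, 5 and weights 1, 1, 1, 5, the posterior mode is 5, while the expected quadratic loss over $J$ is smallest at 4 (the posterior mean is 4.25), so the two estimators need not coincide.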
Now we give a more detailed description of the results of Schulze (1982), which generalize the results of Smith (1975), Ferreira (1975), Holbert and Broemeling (1977) and Chin Choy and Broemeling (1980). We also mention that Chernoff and Zacks (1964), Kander and Zacks (1966), Bhattacharyya and Johnson (1968), Gardner (1969), MacNeill (1971), Sen and Srivastava (1973) and Booth and Smith (1982) also investigated these problems within a Bayesian framework.

First, Schulze (1982) considered improper prior distributions by assuming:

(i) The parameters $\theta = (\alpha, \beta)$, $\sigma^2$ and $\tau$ are all independently distributed.
(ii) The parameters $\theta_1$, $\theta_2$ are uniformly distributed over $\mathbb{R}^2$.
(iii) The variances $\sigma_1^2$, $\sigma_2^2$ are independently distributed with improper densities $p_0(\sigma_1^2) = (\sigma_1^2)^{v_1}$ and $p_0(\sigma_2^2) = (\sigma_2^2)^{v_2}$, where $v_1$ and $v_2$ are given integers, for example, $v_1 = v_2 = -1$.
(iv) Specified a priori probabilities $p_0(\tau)$, $\tau \in J$, are given.

THEOREM 4.1 (Schulze, 1982). Under (i)-(iv) the prior densities are proportional to

$$p_0(\theta, \sigma^2, \tau) \propto p_0(\tau)(\sigma_1^2)^{v_1}(\sigma_2^2)^{v_2},$$

and the corresponding posterior probabilities $p_x(\tau)$ for the change point are

$$p_x(\tau) \propto C^1(x, \tau)\,p_0(\tau), \tag{4.3}$$

where $x = (x_1, \dots, x_N)$,

$$C^1(x, \tau) = \left|(\theta_1, \theta_2)'(\theta_1, \theta_2)\right|^{-1/2}\,\Gamma\!\left(\frac{\tau - 2v_1 - 4}{2}\right) S_1(\tau)^{-(\tau - 2v_1 - 4)/2}\,\Gamma\!\left(\frac{N - \tau - 2v_2 - 4}{2}\right) S_2(\tau)^{-(N - \tau - 2v_2 - 4)/2}, \tag{4.4}$$

and $S_1(\tau)$ and $S_2(\tau)$ denote the residual sums of squares of the least squares estimates $\hat\theta_j$, $j = 1, 2$, based on the observations $t = 1, \dots, \tau$ and $t = \tau+1, \dots, N$, respectively.

To define a proper prior distribution with respect to the parameters $\theta_j$, $\sigma_j^2$, $j = 1, 2$, Schulze (1982) made another assumption:

(v) Conditional on $\tau = t$, the parameters $\theta_j$, $\sigma_j^{-2}$, $j = 1, 2$, are independently distributed as normal-gamma variables $N\Gamma(2, T_1(t))$ and $N\Gamma(2, T_2(t))$ with parameters $T_j(t) = (r_j(t), G_j(t), \theta_j, S_j(t))$, $j = 1, 2$, where $r_j(t) \ge 3$ and $G_j(t)$ is a positive definite matrix of order 2.
(For the definition of the $N\Gamma(p, T)$ distribution see Humak (1977, A, 2.38, p. 491).)

THEOREM 4.2 (Schulze, 1982). Under (iv) and (v), the prior densities are

$$p_0(\theta, \sigma^2, \tau) \propto p_0(\tau)\,p_{10}(\theta_1, \sigma_1^{-2} \mid \tau)\,p_{20}(\theta_2, \sigma_2^{-2} \mid \tau), \tag{4.5}$$

where $p_{j0}(\theta_j, \sigma_j^{-2} \mid \tau)$ denote the densities of $N\Gamma(2, T_j(\tau))$-distributions, $j = 1, 2$, and the corresponding posterior probabilities for $\tau$ are obtained in the form

$$p_x(\tau) \propto C^2(x, \tau)\,p_0(\tau), \tag{4.6}$$

where

$$C^2(x, \tau) = \prod_{j=1}^{2}\frac{\Gamma\!\left((r_{jx}(\tau) - 2)/2\right) \times S_j(\tau)^{(r_j(\tau)-2)/2} \times |G_j(\tau)|^{1/2}}{\Gamma\!\left((r_j(\tau) - 2)/2\right) \times S_{jx}(\tau)^{(r_{jx}(\tau)-2)/2} \times |G_{jx}(\tau)|^{1/2}}, \tag{4.7}$$

$$r_{1x}(\tau) = r_1(\tau) + \tau, \qquad r_{2x}(\tau) = r_2(\tau) + N - \tau,$$

$$G_{jx}(\tau) = G_j(\tau) + (\theta_1, \theta_2)'(\theta_1, \theta_2),$$

$$S_{jx}(\tau) = S_j(\tau) + \sum_{i} x_i^2 + \theta_j' G_j(\tau)\theta_j - \theta_{jx}(\tau)' G_{jx}(\tau)\theta_{jx}(\tau),$$

$$\theta_{jx}(\tau)' = \left[\theta_j' G_j(\tau) + X_j(\tau)'(\theta_1, \theta_2)\right] G_{jx}^{-1}(\tau), \quad j = 1, 2,$$

$$X_1(\tau) = (x_1, \dots, x_\tau)', \qquad X_2(\tau) = (x_{\tau+1}, \dots, x_N)',$$

the sum in $S_{jx}(\tau)$ running over the observations of the $j$-th segment.

In order to find the estimate $\hat\tau$, one possible choice is

$$\hat\tau = \arg\max_{\tau \in J} p_x(\tau) = \arg\max_{\tau \in J} C^j(x, \tau)\,p_0(\tau), \quad j = 1, 2. \tag{4.8}$$

Note that we have to calculate all $p_x(\tau)$ and thus all values $C^j(x, \tau)$, $j = 1, 2$, according to formula (4.8). But for each fixed $\tau \in J$, the calculation of $C^j(x, \tau)$, especially of $C^2(x, \tau)$, is quite complicated. Most of the effort is devoted to searching for an optimal $\hat\tau$.

Another choice of the estimate is defined by minimizing the expected posterior loss function:

$$\hat\mu = \arg\min_{\mu \in J} R(\mu), \qquad R(\mu) = \sum_{\tau \in J}(\tau - \mu)^2 p_x(\tau) = E_{p_x(\tau)}(\tau - \mu)^2. \tag{4.9}$$

It is well known that $E(\tau \mid x) = \sum_{\tau \in J}\tau\,p_x(\tau) = \arg\min_{\mu \in \mathbb{R}} R(\mu)$. But $E(\tau \mid x)$ is not necessarily an element of $J$. So we can take

$$\hat\mu = \arg\min_{\mu \in J}\left(\mu - E(\tau \mid x)\right)^2,$$

i.e., the estimate is the point in $J$ which is nearest to the posterior expectation of $\tau$. Notice that

$$E(\tau \mid x) = \sum_{\tau \in J}\tau\,p_x(\tau) = \frac{\sum_{\tau \in J}\tau\,C^j(x, \tau)\,p_0(\tau)}{\sum_{\tau \in J} C^j(x, \tau)\,p_0(\tau)}. \tag{4.10}$$

This requires the calculation of $p_x(\tau)$, and thus of $C^j(x, \tau)$, for all $\tau \in J$ with $p_0(\tau) > 0$. Therefore, the two methods presented above are comparable.

5.
Large sample properties of the estimates of change points

In the sequel we consider the multivariate jump change model. Let $X(t)$ be an independent $p$-dimensional process on $(0, 1]$ such that

$$X(t) = \mu(t) + V(t), \quad 0 < t \le 1, \tag{5.1}$$

where $\mu(t)$: $p \times 1$ is a non-random left-continuous step function and $V(t)$: $p \times 1$ is an independent normal process with mean vector 0 and covariance matrix $\Lambda_j > 0$ in the $j$-th horizontal segment. Denote all the jump points of $\mu(t)$ by $t_1, \dots, t_q$, i.e. $\mu(t_j) \neq \mu(t_j + 0)$, $j = 1, \dots, q$, where $0 < t_1 < \dots < t_q < 1$. $t_1, \dots, t_q$ are called the change points of the process $X(t)$.

Assume that $N$ samples are drawn from $X(t)$ at equally spaced $t$, say $X(j/N)$, $j = 1, \dots, N$. We are going to find a set of numbers, say $\pi^{(N)} = (k_1^{(N)}, \dots, k_q^{(N)})$, such that

$$E X(i/N) = \mu_j, \quad \operatorname{Var} X(i/N) = \Lambda_j, \quad \text{for } k_{j-1}^{(N)} < i \le k_j^{(N)},\ j = 1, 2, \dots, q+1, \tag{5.2}$$

where $k_0^{(N)} = 0$, $k_{q+1}^{(N)} = N$, and $\mu_j \neq \mu_{j+1}$. The number $q$ of change points may be assumed known or unknown, but it is known that $q$ is less than a given constant $L$. Let

$$\tilde K_L^{(N)} = \{\pi^{(N)} = (k_1^{(N)}, \dots, k_l^{(N)});\ 0 < k_1^{(N)} < \dots < k_l^{(N)} < N,\ l = 1, \dots, L\} = \bigcup_{q=1}^{L} K_q^{(N)}, \tag{5.3}$$

where $K_q^{(N)}$ is defined by (3.6). For simplicity, we omit the superscript $N$ of $X$, $k$, $\pi$ etc., for example, $k_j \equiv k_j^{(N)}$, $\pi \equiv \pi^{(N)}$, $K_q \equiv K_q^{(N)}$. Further, define $X_j = X(j/N)$. We must always keep in mind that these quantities depend on $N$.

By (3.13), for given $\pi = (k_1, \dots, k_q) \in K_q$, under model $M_\pi$, we get

$$\sup_{\Lambda > 0}\log L(X, \pi, \Lambda) = -\frac{N}{2}\log|A_\pi(N)| + b_1 \equiv G_\pi^{(1)} + b_1. \tag{5.4}$$

Similarly to (5.4), for given $\pi = (k_1, \dots, k_q) \in K_q$, under $M_\pi$, it follows that

$$\sup\{\log L(X, \pi, \Lambda_1, \dots, \Lambda_{q+1}):\ \Lambda_i > 0,\ i = 1, \dots, q+1\} = -\frac{N}{2}\sum_{j=1}^{q+1}\alpha_j\log|A_j(N)| + b_2 \equiv G_\pi^{(2)} + b_2, \tag{5.5}$$

where $b_1$ and $b_2$ are constants independent of $\pi$ and $\Lambda_j$, and $A_\pi(N)$, $A_j(N)$ and $\alpha_j$ are defined by (3.11), (3.9) and (3.16), respectively.

5.1.
Estimate of change points when q is known

(1) Assume that $\Lambda_1 = \dots = \Lambda_{q+1}$ is known a priori. Take $\hat\pi = (\hat k_1, \dots, \hat k_q)$ from $K_q$ such that

$$G_{\hat\pi}^{(1)} = \max_{\pi \in K_q} G_\pi^{(1)}, \quad \text{i.e.} \quad \hat\pi = \arg\max_{\pi \in K_q} G_\pi^{(1)}. \tag{5.6}$$

(2) Assume we have no prior information about $\Lambda_j$, $j = 1, \dots, q+1$. Take $\hat\pi = (\hat k_1, \dots, \hat k_q)$ from $K_q$ such that

$$G_{\hat\pi}^{(2)} = \max_{\pi \in K_q} G_\pi^{(2)}, \quad \text{i.e.} \quad \hat\pi = \arg\max_{\pi \in K_q} G_\pi^{(2)}. \tag{5.7}$$

Then we have

THEOREM 5.1 (Krishnaiah, Miao and Zhao, 1986). $(\hat k_1/N, \dots, \hat k_q/N)$ is a strongly consistent estimate of $(t_1, \dots, t_q)$.

5.2. Estimate of change points when q is unknown

Let $\{C_N^{(j)}\}$, $\{D_N^{(j)}\}$, $j = 1, 2$, be two sequences such that

$$N \gg D_N^{(1)} \gg C_N^{(1)} \gg \log N, \tag{5.8}$$

$$N \gg D_N^{(2)} \gg C_N^{(2)} \gg \log^2 N. \tag{5.9}$$

Hereafter $a_N \gg b_N$ means $\lim_{N \to \infty} a_N/b_N = \infty$. Suppose the number of change points is less than some known constant $L$. Consider two cases.

(i) All the $\Lambda_j$ are equal. Let

$$Q_\pi^{(1)} = -\frac{N}{2}\log|A_\pi(N)| - \#(\pi)\,C_N^{(1)}, \quad \pi \in \tilde K_L, \tag{5.10}$$

where $\#(\pi)$ denotes the number of cut-off points in $\pi$. Take $\hat\pi = (\hat k_1, \dots, \hat k_{\hat l}) \in \tilde K_L$ such that

$$Q_{\hat\pi}^{(1)} = \max_{\pi \in \tilde K_L} Q_\pi^{(1)}. \tag{5.11}$$

(ii) There is no prior information about $\Lambda_1, \dots, \Lambda_{q+1}$ except $q \le L$. Let

$$Q_\pi^{(2)} = -\frac{N}{2}\sum_{j=1}^{q+1}\alpha_j\log|A_j(N)| - \#(\pi)\,C_N^{(2)}, \quad \pi \in \tilde K_L. \tag{5.12}$$

Take $\hat\pi = (\hat k_1, \dots, \hat k_{\hat l}) \in \tilde K_L$ such that

$$Q_{\hat\pi}^{(2)} = \max_{\pi \in \tilde K_L} Q_\pi^{(2)}. \tag{5.13}$$

In both cases $\hat k_1, \dots, \hat k_{\hat l}$ can be grouped into sets $M_1, M_2, \dots$ by the following procedure. Let $\hat k_1$ be an element of $M_1$. For $\hat k_2$, if $\hat k_2 - \hat k_1 < D_N^{(i)}$, then $\hat k_2 \in M_1$, otherwise $\hat k_2 \in M_2$, where $i = 1, 2$ corresponds to case (i) or (ii), respectively. Continuing this procedure, for $\hat k_t \in M_j$ we get

$$\hat k_{t+1} \in \begin{cases} M_j & \text{if } \hat k_{t+1} - \hat k_t < D_N^{(i)}, \\ M_{j+1} & \text{otherwise,} \end{cases} \quad i = 1, 2.$$

Thus, $\hat k_1, \dots, \hat k_{\hat l}$ are grouped into sets $M_1, \dots, M_{\hat q}$. Here we note that the $M_j$ all depend on $N$. Krishnaiah, Miao, Subramanyam and Zhao (1987) proved this result:

THEOREM 5.2. With probability one for large $N$, $(\hat q, \hat k_{M_1}/N, \dots, \hat k_{M_{\hat q}}/N) \to (q, t_1, \dots, t_q)$, where $\hat k_{M_j}$ is any element of $M_j$.

5.3.
Local likelihood estimation (LLE)

The previous results are difficult to put into practical use if the number of change points is rather large. In this connection Krishnaiah, Miao and Zhao (1987) developed a new method, the so-called local likelihood method, to estimate the position and the number of change points. This procedure is computationally feasible. We now introduce this procedure.

Consider model (5.1). For every $k$, $k = m, \dots, N - m$, we construct

$$A_k(N) = \tfrac{1}{2}\left(A_{1k}(N) + A_{2k}(N)\right), \tag{5.14}$$

$$A_{1k}(N) = m^{-1}\sum_{i=k-m+1}^{k}\left(x_i - \bar x_{k-m+1,k}\right)\left(x_i - \bar x_{k-m+1,k}\right)', \tag{5.15}$$

$$A_{2k}(N) = m^{-1}\sum_{i=k+1}^{k+m}\left(x_i - \bar x_{k,k+m}\right)\left(x_i - \bar x_{k,k+m}\right)', \tag{5.16}$$

$$B_k(N) = (2m)^{-1}\sum_{i=k-m+1}^{k+m}\left(x_i - \bar x_{k-m+1,k+m}\right)\left(x_i - \bar x_{k-m+1,k+m}\right)', \tag{5.17}$$

$$G_N(k) = m\log|A_k(N)| - m\log|B_k(N)|, \tag{5.18}$$

where $\bar x_{\alpha\beta}$ is defined by (3.10).

When all the $\Lambda_j$ are equal to $\Lambda$, take $m = m_N$ which satisfies (5.8). Define

$$D_N = \{k:\ k = m, m+1, \dots, N - m,\ -G_N(k) > C_N^{(1)}\}, \tag{5.19}$$

$$k_{1N} = \min\{k:\ k \in D_N\}, \qquad D_{1N} = \{k:\ k \in D_N,\ k - k_{1N} < 3m\}, \tag{5.20}$$

$$k_{2N} = \min\{k:\ k \in D_N - D_{1N}\}, \qquad D_{2N} = \{k:\ k \in D_N - D_{1N},\ k - k_{2N} < 3m\}. \tag{5.21}$$

Continuing this procedure, we obtain

$$D_N = D_{1N} + \dots + D_{\hat q N}, \tag{5.22}$$

where each $D_{jN}$, $j = 1, \dots, \hat q$, is not empty. Put

$$\hat t_j = \frac{1}{2N}\left\{k_{jN} + \max(k:\ k \in D_{jN})\right\}, \quad j = 1, \dots, \hat q. \tag{5.23}$$

THEOREM 5.3 (Krishnaiah, Miao and Zhao, 1987b). $(\hat q, \hat t_1, \dots, \hat t_{\hat q})$ is a strongly consistent estimate of $(q, t_1, \dots, t_q)$.

Proceeding along the same lines as above, we can obtain a number of results concerning this estimate. For example, we have

THEOREM 5.4. Suppose that there is no prior information about $\Lambda_j$, $j = 1, \dots, q+1$. Let

$$G_N(k) = \frac{m}{2}\log|A_{1k}(N)| + \frac{m}{2}\log|A_{2k}(N)| - m\log|B_k(N)|, \tag{5.24}$$

where $A_{\gamma k}(N)$, $\gamma = 1, 2$, and $B_k(N)$ are defined in (5.15)-(5.17). Suppose $m = m_N$ satisfies (5.9). Define $D_N$, $D_{jN}$, $\hat t_j$, $j = 1, \dots, \hat q$, by (5.19)-(5.23); then $(\hat q, \hat t_1, \dots, \hat t_{\hat q})$ is a strongly consistent estimator of $(q, t_1, \dots, t_q)$.

5.4. MLE of change points with restricted condition in mean

Bartholomew (1959) first proposed the following testing problem. Suppose $X_1, \dots, X_N$ are independently normally distributed, $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \dots, N$, and $\sigma_i^2$, $i = 1, \dots, N$, are known. It is desired to test whether $X_1, \dots, X_N$ have the same mean when the rank order of these means is known. He introduced a test statistic, but did not consider the estimation problem. Our local likelihood method is especially suited to estimation of this type, whether the variances are equal or not. The case where only one change point exists is investigated by Sen and Srivastava (1975) and Holbert and Broemeling (1977), among others.

In the model (5.1), let $p = 1$, let $X_i = X(i/N)$, $i = 1, \dots, N$, be independent normal variables, and let the $\mu_j$'s, defined by (5.2), satisfy

$$\mu_1 > \mu_2 > \dots > \mu_{q+1}, \qquad \operatorname{Var} X(i/N) = \lambda_j,\ k_{j-1} < i \le k_j,\ j = 1, \dots, q+1.$$

Take a positive integer $m = m_N < N$ which will be defined below. For $k = m, m+1, \dots, N - m$, we assume that

$$E X_{k-m+1} = \dots = E X_k = \mu^{(1)}, \qquad E X_{k+1} = \dots = E X_{k+m} = \mu^{(2)},$$

and

$$\operatorname{Var} X_{k-m+1} = \dots = \operatorname{Var} X_k = \lambda^{(1)}, \qquad \operatorname{Var} X_{k+1} = \dots = \operatorname{Var} X_{k+m} = \lambda^{(2)},$$

where $\lambda^{(i)}$, $\mu^{(i)}$, $i = 1, 2$, may not be equal to $\lambda_i$, $\mu_i$, $i = 1, 2$, respectively.

Case (i). All the $\lambda_j$'s are equal to $\lambda$. The logarithm of the likelihood ratio statistic for testing the null hypothesis $H_k$: $\mu^{(1)} = \mu^{(2)}$ against the alternative $K_k$: $\mu^{(1)} > \mu^{(2)}$ is given by

$$G_N(k) = \log\left(A_k(N)B_k^{-1}(N)\right) I\left(\bar x_{k-m+1,k} > \bar x_{k,k+m}\right),$$

where $\bar x_{ij}$ is defined by (3.10), $I(A)$ denotes the indicator of a set $A$, and $A_k(N)$ and $B_k(N)$ are defined by (5.14) and (5.17). Take $m = m_N$ and $C_N^{(1)}$ to satisfy (5.8). Define $D_N$, $D_{jN}$, $\hat t_j$, $j = 1, \dots, \hat q$, by (5.19)-(5.23). Then Krishnaiah, Miao and Zhao (1986) proved the following

THEOREM 5.5. Under case (i), $(\hat q, \hat t_1, \dots, \hat t_{\hat q})$ is a strongly consistent estimate of $(q, t_1, \dots, t_q)$.
Case (ii). The only thing known about the $\lambda$'s is that $\lambda_i > 0$, $i = 1, \dots, q$. By the same methodology, we have

THEOREM 5.6. Let $m = m_N$ and a positive number $C_N^{(2)}$ satisfy (5.9). Define $D_N$, $D_{jN}$, $\hat t_j$, $j = 1, \dots, \hat q$, by (5.19)-(5.23). Then $(\hat q, \hat t_1, \dots, \hat t_{\hat q})$ obtained from the above procedure is a strongly consistent estimate of $(q, t_1, \dots, t_q)$.

5.5. Non-parametric estimation

Quite a lot of papers have appeared that handle the change point problem by nonparametric methodology. Since in this book Csörgő and Horváth have made a detailed survey of this subject, we shall content ourselves with some supplementary remarks.

Yin (1986) proposed a method to search for the change points by comparisons made locally. Specifically, he considered the model (1.1), in which the non-random function $f$ may have discontinuity points $t_1, \dots, t_q$ of the first type, which he defined as the change points of the model. The function $f$ is supposed to obey the Lipschitz condition within each interval $[a, b] \subset (0, 1]$ not containing $t_1, \dots, t_q$. Suppose that we have observed $x(i/N)$, $1 \le i \le N$. Choose a positive integer $m = m_N$ appropriately, and define

$$D_{Nk} = \frac{1}{m}\left[x\!\left(\frac{k+1}{N}\right) + \dots + x\!\left(\frac{k+m}{N}\right)\right] - \frac{1}{m}\left[x\!\left(\frac{k-m}{N}\right) + \dots + x\!\left(\frac{k-1}{N}\right)\right], \quad k = m+1, \dots, N - m. \tag{5.25}$$

Intuitively it is clear that when $k/N \approx t_i$ for some $i$, $|D_{Nk}|$ tends to be large. Otherwise it will be smaller. This simple observation suggests the following procedure: choose $h_N > 0$, $N = 1, 2, \dots$, and define

$l_1$ = the smallest $k$ such that $|D_{Nk}| - (k/N)h_N = \max_j\left(|D_{Nj}| - (j/N)h_N\right)$,

$l_2$ = the smallest $k$ such that $|D_{Nk}| - (k/N)h_N = \max\{|D_{Nj}| - (j/N)h_N:\ |j - l_1| > 4m_N\}$,

$l_s$ = the smallest $k$ such that $|D_{Nk}| - (k/N)h_N = \max\{|D_{Nj}| - (j/N)h_N:\ |j - l_i| > 4m_N,\ 1 \le i \le s - 1\}$.

The following theorem is true.

THEOREM 5.7 (Yin, 1986). Suppose that $m_N/N \to 0$, $h_N \to 0$, and $m_N/(Nh_N) \to 0$.
Then with probability one we have
(i) $|l_j/N - t_j| \le 2m_N/N$, $1 \le j \le q$, for $N$ large,
(ii) $D_{Nl_j} \to f(t_j + 0) - f(t_j - 0)$, the jump at $t_j$,
(iii) $D_{Nl_j} = O(h_N)$, $j > q$.

Based on this theorem, if we choose $C_N = |h_N\log h_N|$ and pick those integers $\hat k_1, \dots, \hat k_{\hat q}$ such that $|D_{N\hat k_i}| > C_N$, $1 \le i \le \hat q$, then with probability one the number $\hat q$ of such integers tends to $q$, the number of change points, and $\hat k_1/N, \dots, \hat k_{\hat q}/N$ (for such $\hat k$) tend to the change points $t_1, \dots, t_q$.

Chen (1987) considered the case where at most one change is allowed, and $f$ is a step function:

$$f(t) = \begin{cases} a, & 0 < t \le t_0, \\ a + \theta, & t_0 < t \le 1, \end{cases} \tag{5.26}$$

where $a$, $\theta$ and $t_0$ are unknown. In this simple case Chen derived the asymptotic distribution of the test statistic under the null hypothesis that no change point exists.

THEOREM 5.8 (Chen, 1987). Suppose that $x_1, \dots, x_N$ are iid with a common normal distribution $N(a, \sigma^2)$. Let $m = m_N$, $N = 1, 2, \dots$, be positive integers such that

$$\lim_{N \to \infty} m/N = 0, \qquad \lim_{N \to \infty}(\log N)^2/m = 0.$$

Let

$$Y_k = (2m)^{-1/2}\left\{\sum_{i=k+1}^{k+m} x_i - \sum_{i=k-m+1}^{k} x_i\right\}, \quad k = m, m+1, \dots, N - m,$$

$$\xi_N = \max\{|Y_k|:\ k = m, \dots, N - m\},$$

A_N(x) = [2 log(3N/2m - 3)]^{-1/2} {x