Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Pattern recognition wikipedia , lookup
Inductive probability wikipedia , lookup
Convolutional neural network wikipedia , lookup
History of artificial intelligence wikipedia , lookup
Linear belief function wikipedia , lookup
Knowledge representation and reasoning wikipedia , lookup
Personal knowledge base wikipedia , lookup
Catastrophic interference wikipedia , lookup
114 I I I I I I I I I I I I I I I I I I I Er go: A Graphical Environment for Constructing Bayesian Belief Networks Ingo Beinlich, M.D. Edward Herskovits, M.D. Noetic Systems Incorporated Abstract We describe an environment that considerably simplifies the process of generating Bayesian belief networks. The system has been implemented on readily available, inexpensive hardware, and provides clarity and high performance. We present an introduction to Bayesian belief networks, discuss algorithms for inference with these networks, and delineate the classes of problems that can be solved with this paradigm. We then describe the hardware and software that constitute the system, and illustrate Ergo's use with several examples. Introduction A graphical representation of probabilistic relationships among variables, called a Bayesian belief network, has been independently defined by several researchers. There is thus a plethora of synonyms for Bayesian belief networks [Pearl 1987], such as causal nets [Good 1961a; Good 1961b], probabilistic causal networks [Cooper 1984], influence diagrams [Howard 1984; Shachter 1986], and causal networks [Lauritzen 1988]. We shall use the term belief network in this paper. Belief networks provide a conceptual framework for constructing expert systems [Cooper 1989; Horvitz 1988]; they function as platforms for knowledge acquisition [Chavez 1989; Beckerman 1989; Lehmann 1988] and for normative probabilistic in ference [Lauritzen 1988; Pearl 1986; Suermondt 1988]. This representational power is particularly helpful when used in complex domains, in which conclusions, inter mediate states, and evidence are related by extensive interactions, making knowl edge acquisition and knowledge-base maintenance difficult. A belief network clearly specifies the variables, associations, and probabilities relevant to a particular do main, and can thus form the basis for communication and consensus formation among experts [Bonduelle 1987]. Belief networks are primarily characterized as being used for classification; this class of problems excludes planning, yet includes many interesting problems, such as constraint satisfaction, that traditionally have been solved with classical AI meth ods. Recent research has shown that any influence diagram used for decision mak ing can be transformed to an equivalent belief network [Cooper 1988]. A belief-network is a finite directed acyclic graph in which nodes represent the vari ables of interest, and arcs from parent nodes to child nodes represent a probabilistic association among the child and its parents. Probabilities are attached to nodes and 115 to arcs in a belief network; these probabilities capture the uncertainty inherent to the relationships among the variables. In particular, for each node Xi with a set of par ents 1ri, there is a conditional probability distribution P(xi I 1ri); for each Xi without parents, there is a prior probability distribution P(xv. Conditional probabilities in belief networks can be interpreted as "if-then" rules in the construction of probabilistic expert systems. In this sense, a rule in a belief net work is a conditional probability of the form P(Xi I y1,y2, .. ,yn), where Xi and y1,y2, ,yn are variables with known values. Each variable has an associated set of rules, which is the collection of conditional probabilities for each possible combination of values that this variable and its parents can assume. . . . . The prior and conditional probabilities explicitly represented in a belief network are sufficient for computing any probability of the form P(B 14>), where e and 4> are members of the power set of this belief network's variables. The key feature of the belief-network paradigm is its explicit characterization of conditional independence among variables, which in turn decreases the number of probabilities required to capture the full joint distribution. Inference Algorithms The practicality of Bayesian reasoning has been debated in the literature for several decades [Buchanan 1985, pp. 235-237; Cooper 1989; Rich 1983]. In particular, slow in ference times and the need for a large number of probabilities have been cited. However, several algorithms have been developed during the last ten years that greatly increase the efficiency of Bayesian inference. They share the additional ad vantage of operating directly on the graphical network structure; thus, the same graph used for knowledge acquisition can be immediately and directly used for vali dation of that knowledge, leading to rapid model refinement. Algorithms for belief-network inference, such as those developed by Pearl [Pearl 1986; Suermondt 1988] and Lauritzen and Spiegelhalter [Lauritzen 1988], allow the user to instantiate values for some nodes, after which these algorithms compute posterior distributions over the remaining nodes. These methods thus provide a simple yet general mechanism whereby the user may enter evidence and determine the ensuing implications. Pearl's algorithm implements a local message-passing system for probability updates in singly connected networks. To cope with multiply connected networks, the method of cutset conditioning has been proposed [Pearl 1988, pp. 204-210] and im plemented [Suermondt 1990]. This technique results in a time complexity propor tional to the product of the size of the network, the number of cutset instantiations and (without special scheduling techniques) the size of the evidence set. The num ber of cutset instantiations is exponential in the number of undirected cycles, and for many applications this will result in impractical running times. I I I I I I I I I I I I I I I I I I I 116 I I I I I I I The Lauritzen-Spiegelhalter algorithm rearranges the network into a tree by form ing clusters of nodes (cliques). The complexity of evidence propagation using this al gorithm is linear in the number of cliques, and proportional to the size of the largest clique in the network. The size of a clique is exponential in the number of nodes in that clique. As a node always forms a clique with at least its parents, the maximum number of parents over all nodes in the network is an important determinant of the running time. In contrast to Pearl's algorithm, observing evidence makes inference faster by simplifying the tree of cliques. Approximate algorithms are also being developed [Chavez 1989; Henrion 1988; Shachter 1989]; they allow the user to trade inference time for precision or accuracy. As the sizes of implemented problems increase, the use of these algorithms will be come more prevalent. ----- Rsla I Figure 1 I A graph-editor window in Ergo. The user may edit networks in multiple windows using familiar Macintosh tools I I I I I I I I I I Ergo Although belief networks provide an intuitive medium for knowledge acquisition and inference in probabilistic expert systems, graphical tools are required for the cre ation and manipulation of any nontrivial network. Ergo provides the necessary en vironment for rapid prototyping and implementation of probabilistic expert sys tems. Ergo contains: (1) a graph editor, (2) a probability editor, and (3) a numerical engine. The Graph Editor The graph editor provides a drawing environment for manipulating the network structure (Figure 1). Nodes and edges are created and deleted much as they would be in a mouse-based drawing program; nodes (along with their corresponding condi tional-probability matrices) and arcs may be copied and pasted either within a win- 117 dow or among windows. This facility allows the user to construct networks incre mentally; several subnetworks can be developed individually, and then merged into a larger structure using the copy and paste features. Furthermore, a library of proto typical structures could be designed once and then imported as needed, further modularizing system design and implementation. I I I I The Probability editor At any time during network construction the user may choose to enter probabilities for any node or change its characteristics by using the probability-editor (Figure 2). For each node its name, number of values, and a label for each value is displayed. Probabilities are edited in a tabular format, and can be pasted from other networks or other applications (such as spread-sheets or statistical packages). I I I Figure2 A probability-editor win dow in Ergo. For each node its name, number of values, and their labels can be edited. Probabilities are shown with the values of their parents. I I I I I The Numerical Engine A node is instantiated by selecting one of its values from a pop-up menu. This evi dence can be propagated automatically, or at the user's request after several instanti ations. Results for each node are shown either as histograms or as numerical values in a pop-up menu. We have implemented an efficient version of the Lauritzen-Spiegelhalter (LS) algo rithm. A network is first triangulated using a modified version of maximum cardi nality search [Tarjan 1985] that -allows for disconnected graphs. The graph is then compiled to a tree (or forest) of cliques as described by Pearl [Pearl 1988, p. 113]. Probability updating in the clique tree follows the scheme described by Lauritzen and Spiegelhalter. Our method of evidence absorption differs from that proposed in their paper in two respects: I I I I I I I 118 I I • I I I I I I I • Clique potentials incompatible with new evidence are removed from the cliques and are not set to zero as proposed by Lauritzen and Spiegelhalter. The speed of the following update step increases for any evidence on a variable with more than two values. Consider for example a clique with nodes A, B, and C, where each node has 3 values. This clique has 27 potentials describing its probability dis tribution. Now assume that node C is observed to have value c1. All potentials in the clique (ABC) with values c2 and C3 for Care incompatible with this evidence and are removed. This step takes at most 27 operations (to check the consistency of each potential). The resulting clique (AB I C=c1) now has only nine potentials. During updating, this clique must propagate its values to its parent and later to its children; the number of steps for propagation is proportional to the number of potentials in the clique. Assume that this clique has only one child (a clique can have at most one parent); propagation then requires 2*9 = 18 steps. The total number of steps involved in updating this clique is 27+18 = 45 steps. In compari son, setting clique potentials to zero involves at most 27 steps, but the clique size remains constant; thus, the update takes a total of 27 + 2* 27 = 81 steps. This ben efit increases with the number of children. The existing clique tree is modified to accommodate new evidence, instead of be ing completely rebuilt as Lauritzen and Spiegelhalter suggest. This approach is similar to the join-tree method described by Jensen et al. [Jensen 1988], and obvi ates an initialization step before every update. I 0.6340 True 0.3660 Felse I Inference in Ergo. In this exam ple the node Dyspnea has been instantiated to True. The evi dence is propagated through the network and the resulHng poste rior distributions are shown as histograms. The user may also select any node to view its distri bution numerically. I I I I I I I I Figure 3 X-Ray Dylpnea Refining the model After cycling between network construction and evaluation, the user may save the network to a file, print its structure, or export the graph to any of several Macintosh drawing or writing programs. In addition, the user may choose to export a text-for mat description of the network so that the file can be read by other programs. 119 Using a paradigm of iterative construction and evaluation, a group of users can de velop subnetworks in parallel and then merge them in a common window. This paradigm owes its feasibility to the modularity conferred on belief networks by the explicit representation of conditional independence. Performance We first describe an implementation of the Asia belief network, which was pre sented in [Lauritzen 1988], and is shown in our figures. A fictitional piece of qualita tive knowledge is considered: Patients might have tuberculosis, lung cancer, or bronchitis. These diseases might cause a positive finding on a chest X-ray or short ness of breath (dyspnea). The prior probabilities of the diseases are influenced by a history of smoking and a recent visit to Asia. Using Ergo on a Macintosh II, we drew and specified the network in a few minutes. Network compilation took less than a second, and resulted in a triangulated graph identical to the full moral graph in the paper by Lauritzen and Spiegelhalter. Evidence absorption and propagation also was instantaneous. We further tested Ergo with the ALARM network (see Figure 4), which has been used to analyze the complexity and behavior of different inference algorithms [Beinlich 1989]. This network contains37 nodes (most of them having three or more possible I I I I I I I I I I I I I I •• -- 1 --- .... ... miJ ...-C02 Figure 4 The ALARM network as created in Ergo I I I I I 120 I I I I I I I I values), and 46 arcs. The process of triangulation and clique produced a clique tree with a total of 1,440 potentials. For a single piece of evidence, the total time to return posterior probability distributions for all37 nodes required approximately 250 msec. This value is an upper bound on inference time, as the sizes of the cliques in the clique tree decrease with increasing size of the evidence set. Implementation The numerical engine is written in the C programming language, and is readily portable to any UNIX-based environment as a library that can be linked to other ap plications. The engine works as a function that takes as input a vector of instanti ated nodes, and returns a vector of probability distributions over the remaining nodes. A network produced in Ergo can be saved as a resource for use in other pro grams. On the Macintosh such a resource can be incorporated into a Hypercard stack. This resource allows the user to construct, using the Hypertalk progr amming language, a customized user interface that runs a Bayesian expert system independently of Ergo. The stack fully supports instantiation and inference, and may be distributed to any users with computers that run Hypercard. I I I I I I Summary Ergo demonstrates the feasibility of constructing and using nontrivial Bayesian ex pert systems; it combines an intuitive user interface with an efficient and portable numeric engine, and is implemented on readily available hardware. These features combine to make Ergo a powerful system for rapid prototyping of belief networks. References Beinlich, I. A., Suermondt, H.J., Chavez, R.M., Cooper, G.F. (1989) The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proceedings of the Second European Conference on Artificiallnteiiigence in Medicine. London, 247-256. Bonduelle, Y. (1987). Aggregating expert opinions by reconciling sources of disagreement. Ph. D. thesis, Department of Engineering-Economic Systems, Stanford University, Stanford, California. Buchanan, B. G., Shortliffe, E. H., ed. (1985). Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Reading, Massachusetts: Addison-Wesley. I Chavez, R. M. (1989) I Cooper, G. F. (1984). NESTOR: A computer-based medical diagnostic aid that integrates causal and probabilistic knowledge. Ph.D. Thesis, Department of Medical Information Sciences, Stanford University, Stanford, California. I Cooper, G. F. (1988) A method for using belief-network algorithms to solve decision-network problems. In Proceedings of the AAAI Workshop on Uncertainty in Artificial Intelligence. Minneapolis, Minnesota, 55-63. I I Hypermedia and randomized algorithms for medical expert systems. In Proceedings of the Thirteenth Annual Symposium on Computer Applications in Medical Care. Washington, DC, 171-177. 121 Cooper, G. F. (1989) Current research directions in the development of expert systems based on belief networks. Applied Stochastic Models and Data Analysis. 5: 39-52. Good, I. J. (1961a) A causal calculus (I). The British Journal of Philosophy of Science. 11: 305-318. Good, I. J. (1961b) A causal calculus (II). The British Journal of Philosophy of Science. 12: 43-51. Heckennan, D. E., Horvitz, E. J., and Nathwani, B. N. (1989) Update on the pathfinder project. In Proceedings of the Thirteenth Symposium on Computer Applications in Medical Care. Washington, DC, 203-207. Henrion, M. (1988) Propagation of uncertainty by probabilistic logic sampling in Bayes' networks. In Uncertainty in Artificiallnteiligence 2. Amsterdam: Elsevier Science Publishers. Horvitz, E. ]., Breese, J.S., Henrion, M. (1988) Decision theory in expert systems and artificial intelligence. Journal of Approximate Reasoning. 2: 247-302. Howard, R. A., Matheson, J.E. (1984) Readings on the Principles and Applications of Decision Analysis. Menlo Park, California: Strategic Decisions Group. I I I I I I Jensen, F. V., Olesen, G. 0., Andersen, 5. K. (1988). An algebra of Bayesian belief universes for knowledge based systems. RR-88-25. Institute of Electronic Systems, Aalborg U niversity, Denmark, July 1988. I Lauritzen, S. L., Spiegelhalter, D. ]. (1988) Local computations with probabilities on graphical structures and their applications to expert systems. Journal of the Royal Statistical Society B. 50: 157-224. I Lehmann, H. P. (1988) Knowledge acquisition for probabilistic expert systems. In Proceedings of the Twelfth Symposium on Computer Applications in Medical Care. Washington, DC, 73-77. Pearl, J. (1986) Fusion, Propagation, and Structuring in Belief Networks. Artificial Intelligence. 29: 241-288. Pearl, J. (1987) Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence. 32: 245-257. Pearl, J. (1988) Probabilistic Reasoning in Inteiligent Systems: Mateo, California: Morgan Kaufmann. Networks of Plausible Inference. S a n I I I Shachter, R. D. (1986) Evaluating influence diagrams. Operations Research. 34: 871-882. I Shachter, R. D., Peot, M. A. (1989) Simulation approaches to general probabilistic inference on belief networks. In Proceedings of the Fifth Workshop on Uncertainty in Artificial Intelligence. Windsor, Ontario, 311-318. I Suermondt, H. J. (1988) Updating probabilities in multiply connected belief networks. In Proceedings of the Fourth Workshop on Uncertainty in Artificial Intelligence. Minneapolis, Minnesota, 335343 . I Suermondt, H. J., Cooper, G. F. (1990) Probabilistic inference in multiply connected belief networks using loop cutsets. International Journal of Approximate Reasoning (to appear). I Rich, E. (1983) Artificial Intelligence. New York: McGraw-Hill. · Tarjan, R. E. (1985) Decomposition by clique separators. Discrete Mathematics. 55: 221-232. I I I I