Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
George Mason University Department of Systems Engineering and Operations Research GMU Bayesian Decision Theory Instructor: Kathryn Blackmond Laskey George Mason University Department of Systems Engineering and Operations Research [email protected] http://www.ite.gmu.edu/~klaskey ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 1 - George Mason University Department of Systems Engineering and Operations Research GMU This workshop is dedicated to the memory of journalist Danny Pearl, brutally murdered in Pakistan in February 2002, and to the pioneering research of his father Judea Pearl. Judea Pearl’s research has the potential to create unprecedented advances in our ability to anticipate and prevent future terrorist incidents. A portion of any compensation derived from this workshop will be donated to the foundation to honor Danny Pearl: Pearl Family Foundation: c/o Wall Street Journal P.O. Box 300 Princeton, New Jersey 08543 ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 2 - George Mason University GMU Department of Systems Engineering and Operations Research Software for Bayesian Networks and Decision Graphs • • • A number of COTS software tools are now available for Bayesian networks and decision graphs The CD supplied with this seminar contains a set of examples constructed in the Netica® package. A limited functionality version of Netica® is included on the CD. All the examples can be run directly from the CD. To purchase a full functionality version of Netica® visit http://www.norsys.com. Examples on the following pages are annotated with the name of the Netica® file or files containing a Bayesian network model for the example ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 3 - George Mason University GMU Department of Systems Engineering and Operations Research Agenda 1. 2. 3. 4. 5. 6. Introduction Basics of decision theory and graphical models Knowledge engineering and model development Multi-entity Bayesian networks and decision graphs Inference using Bayes Rule Combining expert knowledge and data 7. Conclusion ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 4 - George Mason University GMU Department of Systems Engineering and Operations Research Models and Representations • Engineers, scientists, and policy analysts construct models to represent systems – We use the models to answer questions about the system – Goal: Build “good enough” models » “Good enough” depends on purpose for which model is used » Simplifications and inaccuracies don’t matter if they don’t materially affect results • Representations are approximations – Restricted set of variables – Unrealistic simplifications – Untested assumptions • Models are constructed from: – Past data on system or related systems – Judgment of subject matter experts – Judgment of experienced model builders ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 5 - George Mason University Department of Systems Engineering and Operations Research GMU Observations Representation Real World Actions ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 6 - George Mason University Department of Systems Engineering and Operations Research GMU ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 7 - George Mason University GMU Department of Systems Engineering and Operations Research The Fusion Challenge • Fusion is the process of incorporating information from different sources into a single “fused” representation • Why fusion is difficult: – – – – Vast quantities of sensor information Real-time processing requirements Restrictions on weight, communication bandwidth Need to integrate physical and geometrical models with qualitative knowledge – Noisy, unreliable, ambiguous data – Active attempts at deception – Requirement for robustness to new or little-known threat types and/or behavior patterns • Why fusion is important: – Features that are meaningless in isolation are definitive in combination Data, data everywhere, and not the time to think… ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 8 - George Mason University GMU Department of Systems Engineering and Operations Research The Systems Engineering Challenge • Systems engineering is the process of transforming a user need into an operational system that meets that need • Why systems engineering is difficult: – – – – – – – – Increasing complexity of systems Moving from stovepipe to system of systems Constant improvement cycle Need to predict performance of integrated systems made up of legacy, COTS and newly developed subsystems Need to predict performance of new and untested technologies Need to do trade studies among large numbers of options with varying degrees of technology maturity and performance uncertainty Increased reliance on models and simulations with varying degrees of fidelity and resolution Increasing cost and schedule pressure • Why systems engineering is important: – An effective and efficient systems engineering process produces quality systems in a cost-effective manner – Poor systems engineering leads to spectacular performance debacles and cost overruns ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 9 - George Mason University GMU Department of Systems Engineering and Operations Research Paradigm Shift in Computing • Old paradigm: Algorithms running on Turing machines – Deterministic steps transform inputs into outputs – Result is either right or wrong – Semantics based on Boolean logic • New paradigm: Economy of software agents executing on a physical symbol system – Agents make decisions (deterministic or stochastic) to achieve objectives – “Program” is replaced by dynamic system in which solution quality improves over time – Semantics based on decision theory / game theory / stochastic processes • Hardware realizations of physical symbol systems – – – – Physical systems minimize action Decision theoretic systems maximize utility / minimize loss Hardware realization of physical symbol system maps action to utility Programming languages are replaced by specification / interaction languages – Software designer specifies goals, rewards and information flows – Unified theory spans sub-symbolic to cognitive levels • Old paradigm is limiting case of new paradigm ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 10 - George Mason University GMU Department of Systems Engineering and Operations Research Requirements for New Paradigm Logic • • • • Embrace uncertainty Perform plausible reasoning Learn from experience Incorporate observation, historical data, expert knowledge • Explore multiple alternatives • Replace rote procedure with focus on attaining objectives • Trade off multiple objectives ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 11 - George Mason University GMU Department of Systems Engineering and Operations Research Agenda 1. 2. 3. 4. 5. 6. 7. Introduction Basics of decision theory and graphical models Knowledge engineering and model development Multi-entity Bayesian networks and decision graphs Inference and optimization algorithms Combining expert knowledge and data Conclusion ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 12 - George Mason University GMU Department of Systems Engineering and Operations Research The GOOD-D Process • Bayesian decision theory is based on the GOOD-D process • Good decision makers: – Think carefully about their Goals – Explore multiple Options – Predict the possible Outcomes of the decision and the likelihood of each outcome under each of the options they are considering – Weigh uncertainties and trade off different goals to Decide which option best serves their goals – Do it! Implement a sound and effective plan to carry through on the decision and monitor success • Computational models to support GOOD-D process should be grounded in new-paradigm theory ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 13 - George Mason University GMU Department of Systems Engineering and Operations Research Decision Graph • Utility variables represent decision maker’s goals • Action variables represent options available to decision maker • World state variables represent outcomes of the decision and other relevant aspects of world state • Arrows represent influences – Cause and effect relationships between actions and world state variables – Statistical associations among world state variables – How utility for the decision maker depends on state of the world Probabilities: P(Has disease) = .6 Disease Expected utilities: Outcome Utility Treat: .690 + .490 = 90 Don't treat: .60 + .4100 = 40 Best action: Treat patient Action Outcome Disease Outcome Utility Treat Has disease 90 Treat Doesn't have disease Don't treat Has disease Don't treat Doesn't have disease Disease free Side effects Disease free Side effects Long illness No side effects Disease free No side effects ©Kathryn Blackmond Laskey June 2002 90 0 100 Decision graphs are also called influence diagrams BDT Workshop - 14 - George Mason University GMU • Department of Systems Engineering and Operations Research Decision Theory: Criticisms and Rebuttals Argument: – Bayesians insert subjectivity into analyses • Rebuttal: – Every analysis depends on assumptions – Bayesians make subjective assumptions explicit in a way that leaves them open for debate and analysis – Bayesian theory provides a principled, theoretically justifiable way to integrate informed expert judgment with scientific theory and statistical data • Argument: – You just can’t put a numerical value on some things, such as human lives • Rebuttal: – Every time you get into your car you are implicitly putting a value on your own and others’ lives – Every time the airline allows a person to pass through a metal detector a value is being placed on human lives – Bayesian decision theory provides a scientifically sound way to make these tradeoffs explicit and open to public debate – Bayesian decision theory provides a language for public discourse to improve society’s ability to make sound policy judgments • Argument: – If anybody can have any prior they want, how can we justify our model output? • Rebuttal: – – – – Bayesian decision theory allows for reasonable people to disagree When there is disagreement the political process must operate to determine policy Decision theoretic models can inject sound science into the political debate With enough data non-dogmatic Bayesians will come to agree ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 15 - George Mason University Decision Graph: An Example GMU • • • Department of Systems Engineering and Operations Research Maria is visiting a friend when she suddenly begins sneezing. "Oh dear, I'm getting a cold," she thinks. “I had better not visit Grandma.” Then she notices scratches on the furniture. She sighs in relief. "I'm not getting a cold! It's only my cat allergy acting up!” Plausible inference The evidence for cat allergy “explains away” sneezing and cold is no longer needed as an explanation Cat_Nearby True 3.00 Cat_Nearby False 97.0 True 15.2 Cat_Nearby False 84.8 True 91.5 False 8.53 Allergic_Reaction Cold True 3.22 True 8.00 Allergic_Reaction False 96.8 Cold 92.0 False True 20.8 True 51.7 Allergic_Reaction False 79.2 Cold False 48.3 True 88.4 True 14.4 False 11.6 False 85.6 Scratches_on_Furniture Sneezing True 2.77 True 15.3 Scratches_on_Furniture Sneezing False 97.2 False 84.7 True 9.95 True 100 HealthOfGrandmother Scratches_on_Furniture Sneezing False 90.1 False 0 True 100 True 100 HealthOfGrandmother False 0 False 0 VisitGrandmother Go 117.000 VisitGrandmother StayHome 100.000 Go 73.2510 VisitGrandmother StayHome 100.000 Go 110.592 StayHome 100.000 ©Kathryn Blackmond Laskey June 2002 HealthOfGrandmother Pleasure Pleasure Pleasure 3 2 1 Netica® file Cat_DG.dne BDT Workshop - 16 - George Mason University Department of Systems Engineering and Operations Research What is a Decision Graph? GMU • Both a knowledge representation and a computational architecture – Represents knowledge about entities and their interactions – Modular elements with defined interconnections – Computation can exploit loosely coupled structure for efficiency • Used for inference and/or decision problems – Infer likely values of some variables from other variables – Choose policy that best achieves decision maker’s objectives ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 17 - George Mason University GMU • • • Department of Systems Engineering and Operations Research Types of Influence Links into chance variables from other chance variables are called relevance links. A relevance arc indicates that the value of one variable is relevant to the probability distribution of the other variable. Links from decision variables into chance variables are called influence links. An influence link means that the decision affects, or influences, the outcome of the chance variable. Links into decision variables are called information links. An information link means that the quantity will be known at the time the decision is made. – Decision variables are ordered in time – In standard influence diagrams, a decision variable and all its information predecessors are (implicit) information predecessors to all future decision variables (no forgetting) • • Relevance Link Influence Link Information Link Links from chance or decision variables into value variables represent functional links. Value variables may not be parents to decision or chance variables. If there is more than one terminal value variable, the total value is the sum of all terminal value variables. Bayesian network is a decision graph having only world state variables Value Links Notation for value variables varies. Some packages use rounded boxes, others diamonds, others hexagons. ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 18 - George Mason University GMU Department of Systems Engineering and Operations Research Specifying a BN Variable Parent Variable Values Cold Allergic_Reaction True False 8% 92% Cat_Nearby = True 75% 25% Cat_Nearby = False 1% 99% 99% 1% 5% 95% 3% 97% Cat_Nearby = True 60% 40% Cat_Nearby = False 1% 99% Allergic_Reaction = True Sneezing Cold = True Cold = False Cold = True Allergic_Reaction = False Cold = False Cat_Nearby Scratches_On_ Furniture • Simplicity – – • Scalability – – • 31 probabilities required to specify general distribution for 5 binary variables 8 probabilities needed to specify this model Over 1030 probabilities required to specify general distribution for 50 variables with 4 values per variable About 9000 probabilities required to specify BN for 50 variables, 4 values per variable, 3 parents per variable, no local structure Tractability – – General probabilistic inference is exponential in number of variables Inference in singly-connected BN is linear in number of variables ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 19 - George Mason University GMU • Department of Systems Engineering and Operations Research Directed Graphs Represent Dependence Graphs with nodes and edges provide a powerful tool for visualizing dependencies between variables – Variables are represented by nodes – Direct dependencies between variables are represented by edges connecting nodes – Absence of an edge between 2 variables means no direct dependency • Example: Rain or a sprinkler could cause the pavement in front of the house to be wet. This could cause a passerby to slip and fall, and could also ruin my new shoes Sprinkler Rain Pavement Fall ©Kathryn Blackmond Laskey – Rain and Sprinkler are independent, but are dependent conditional on Pavement – Fall and Shoes are dependent (if a fall is more likely, so are ruined shoes) but are independent given Pavement – Sprinkler and Fall are dependent, but are independent given Pavement Shoes June 2002 BDT Workshop - 20 - George Mason University GMU Department of Systems Engineering and Operations Research Causation and Explaining Away • Alternate causes of an event are often (approximately) independent given some generally understood context. They become dependent when a common effect is observed. (Knowing that one cause is true "explains away" other potential causes) • Learning about an effect that could be caused by either of two variables introduces an informational dependence between them • Informational dependence is different from causal dependence – We can change an effect by intervening to change the cause » We can make the car start if the battery is dead by putting in a new battery BatteryOK SolenoidOK – We cannot change the state of a variable by changing one on which it depends informationally » If the car won’t start and we learn that the solenoid is bad we infer that the battery is probably OK » But we cannot make the battery OK by destroying the solenoid CarStarts ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 21 - George Mason University GMU Department of Systems Engineering and Operations Research Exercise: Multi-Sensor Fusion • Several blind men are congregating around an elephant, arguing loudly over what they are feeling. Some of them, feeling the legs, shout: “It is a tree!” Others, feeling the tail, shout: “It is a rope!” Still others, feeling the ears, shout: “It is a blanket!” Others, feeling the trunk, shout: “It is a snake!” • Use a Bayesian network to show that a fusion system that integrates information from all these sources, each looking at a different aspect of the elephant, can reach a conclusion that none of them is able to reach on the basis of his own information. Netica® file BlindManAndElephant.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 22 - George Mason University GMU Department of Systems Engineering and Operations Research Exercise: Develop a Bayesian Network The temperature warning light on a piece of equipment is designed to come on when the engine overheats. The light will also flash if the temperature sensor is broken. If the engine has overheated, it may be because a belt is broken or because the temperature in the engine room is too high. A high temperature in the engine room might cause the sensor to malfunction, and may itself be caused by a broken air conditioner. Either a broken belt or a broken air conditioner could be caused by a company history of poor equipment maintenance. An overheated engine might cause product defects. – Draw a Bayesian network representing the uncertainties faced by a technician in diagnosing the reason the light is on. – Use a Bayesian network software package to enter probabilities for this network. Use your judgment to choose reasonable probability distributions. – Use your Bayesian network to answer the following questions » What is the probability that the engine has overheated? » What is the probability that the engine has overheated given that the warning light is on? » What is the probability that the engine has overheated given that the warning light is on, and the air conditioner is broken? » What is the probability that there are product defects? » What is the probability that there are product defects given that the light is on, the air conditioner is OK, and the maintenance history is poor? Netica® file Eqpt_Diagnosis.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 23 - George Mason University GMU Department of Systems Engineering and Operations Research Example: Develop a Decision Graph • If the belt is replaced it is unlikely to break regardless of maintenance practice. Replacing the belt has a cost and so do product defects. – Add a decision variable and value variables to your graph – Use your model to analyze the decision of whether to replace the belt – Use your model to analyze the decision of whether to replace the belt when the warning light comes on Netica® file Eqpt_Diagnosis_DG.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 24 - George Mason University GMU Department of Systems Engineering and Operations Research Action, Causation and Information • Actions have two aspects – Actions can change the world (intervening actions) » Decision to change the belt – Actions can provide information about the state of the world (nonintervening actions) » Decision to check warning light • Intervening action – – – – Normal time evolution of variable is “interrupted” Agent sets the value of variable to desired value Variables causally downstream are affected by change Variables causally upstream are unaffected • Non-intervening action – Variable is information predecessor to future actions – No other changes • Every action has aspects of both – When we take an action our knowledge is updated to include the fact that we took the action (but we may forget actions we took) – Observing the world changes it (we often do not model the impact of actions whose impact is negligible) ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 25 - George Mason University GMU Department of Systems Engineering and Operations Research Example: Value of Information • You are deciding whether to go on a picnic (D=g) or stay home (D=h). Your utility depends on the whether it rains (W=r) or shines (W=s): – u(g,r) = 0 you go and get wet – u(g,s) = 10 you go and have fun – u(h,r) = 3 you stay dry indoors – u(h,s) = 4 you stay home and it’s nice • You can call for a weather forecast before you make your decision • The forecast is for rain (r*) or shine (s*) – P(F=r* | W=r) = .9 – P(F=s* | W=s) = .8 Forecas t Weathe r Utility Call? Go? • What is the optimal decision if the probability of rain is 0.1? What if the probability is 0.8? • Is it to your benefit to call for a weather forecast before making your decision? ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 26 - George Mason University GMU • Department of Systems Engineering and Operations Research VOI as Function of Prior Probability This graph shows the expected utility of three policies as a function of the prior probability of rain: – g: go no matter what – h: stay home no matter what – f: go only if the forecast says no rain • • • When line for f is above other lines information is valuable Difference is called value of information Summary: (costless) – Collecting information is useful if it might change your decision – The difference between your expected utility with and without the information is called the expected value of information – Costless information has positive value for 4/13<p<16/17 – Costly information has value when value is greater than cost of Low-cost information is worth collecting for some P(rain) information High-cost information is never worth collecting ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 27 - George Mason University GMU • Department of Systems Engineering and Operations Research Variable-Level Independence Independence assumptions are an important tool for simplifying knowledge elicitation and inference – We have seen how Bayesian networks use independence to reduce the number of parameters required to specify a probability distribution – Independence assumptions also reduce the computational complexity of inference • Bayesian networks are not expressive enough to encode all independence assumptions that can be exploited to: – simplify knowledge elicitation – reduce computations for inference • Additional types of independence which can simplify knowledge elicitation and inference: – Independence of causal influence (ICI) » The mechanism by which one parent variable causes child variable is independent of • • values of other parent variables and mechanism by which they cause the child – Context-specific independence (CSI) » Variables may be independent of other variables in some contexts but not others – Local expressions » The probability distribution of a variable given its parents can be specified as any parameterized function of the variable’s parents » For many local expressions, the probability distribution of the child depends on the parents only through a sufficient statistic ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 28 - George Mason University GMU Department of Systems Engineering and Operations Research Example of Noisy-OR • Sneezing can be caused by an allergy (A) , a cold (C), or dust (D) in the air • The Noisy-OR model: – Each cause may or may not "trigger" the effect. A C D » Only causes that are true can trigger the effect » Causes operate "noisily" - if true, they may or may not trigger the effect – There is a "trigger probability" associated with each cause. » Allergy triggers sneezing with probability pA = .6 » Cold triggers sneezing with probability pC = .9 S » Dust triggers sneezing with probability pD = .3 – Basic assumptions of the "noisy-OR" model: • » Effect occurs if one or more of its causes has triggered it » Whether one cause has triggered the effect is independent of whether another cause has triggered the effect ©Kathryn Blackmond Laskey June 2002 • Noisy-OR reduces 8 probability judgments to 3 (or 4 with “leak probability”) If there were 10 parents, it would reduce 1024 probability judgments to 10 (or 11 with “leak probability”) BDT Workshop - 29 - George Mason University GMU • Department of Systems Engineering and Operations Research Context-Specific Independence Activity of military unit depends on the weather (wind and precipitation) and time of day Wind Precipitation Time of Day When there are many parents context-specific independence can provide orders of magnitude savings in elicitation Activity • Context-specific independence reduces number of distributions for “Activity” from 8 to 4: Wind=Windy Wind=Calm Wet Dry Wet Dry Night Day Night Day P( Activity Σ = ΤMoveΥ) = .9 .7 .5 Calm, we t we athe r or windy, dry nights .1 reduce vi sibil ity: O K driving condi ti ons Be st to trave l on a calm, dry night:good dri ving condi ti ons and l ow vi si bil ity Poor driving conditions in wi ndy, we t weather, but l ow visibil ity. Most vi si ble i n dry dayli ght. Example due to Suzanne Mahoney ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 30 - George Mason University GMU Department of Systems Engineering and Operations Research Agenda 1. 2. 3. 4. 5. 6. 7. Introduction Basics of decision theory and graphical models Knowledge engineering and model development Multi-entity Bayesian networks and decision graphs Inference using Bayes Rule Combining expert knowledge and data Conclusion ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 31 - George Mason University GMU Department of Systems Engineering and Operations Research Importance of Sound Knowledge Engineering Process • Bayesian networks are increasing in popularity • Applications are growing more complex • A formal, repeatable process for knowledge engineering is becoming more important – Early work on elicitation of probability models (1970’s) focused on eliciting single probabilities or univariate probability distributions – Early work in BNs tended to assume that structure elicitation was relatively straightforward – As models become more complex the KE process must be managed • Knowledge elicitation for large Bayesian networks is a problem in systems engineering ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 32 - George Mason University GMU Department of Systems Engineering and Operations Research Systems Engineering and the System Lifecycle • Systems engineering is the technical and managerial process by which a user need is translated into an operational system • System life cycle evolves through predictable phases Retire Requirements Requirements Operate Analyze Test Analyze Design Build Test Design Build Operate Retire Spiral Lifecycle Model Waterfall Lifecycle Model ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 33 - George Mason University GMU Department of Systems Engineering and Operations Research Spiral Model of BN Engineering • Goal of knowledge engineering – Discovery and construction of appropriate model – Not extraction of pre-existing model • Spiral model is necessary for systems in which requirements are discovered as development progresses • Spiral KE – Construct series of prototype models – Explore behavior of prototype model on sample problems – Evaluate prototypes and restructure as necessary • KE changes both expert and elicitor – Understanding of expert and elicitor deepen as KE proceeds – Improves communication between elicitor and expert Requirements Operate Test Analyze Design ©Kathryn Blackmond Laskey June 2002 Build BDT Workshop - 34 - George Mason University GMU Department of Systems Engineering and Operations Research Applying Spiral BN Engineering • Begin with a small subproblem – – – – Self-contained Can be completed in short time Interesting in its own right Reasonably representative of global problem • Build and test model for subproblem – Look for common structures and processes that will recur – Think about more efficient ways to structure KE – Develop and document conventions (“style guide”) to be followed as models are expanded • Iteratively expand to more complex problems ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 35 - George Mason University GMU Department of Systems Engineering and Operations Research Model Construction Process • What are the variables? – World state variables represent possible outcomes – Decision variables represent available options – Utility variables represent attributes of value • What are the possible values of world state and decision variables? • What is the graph structure? • What is the structure of the local distributions for world state variables? • What are the local probability distributions for world state variables? • What are the local utility functions for the utility variables? ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 36 - George Mason University GMU Department of Systems Engineering and Operations Research The Participants • Naive view – Put problem experts and modeling experts in a room together and magic will happen” • Realistic view – Pure “problem experts” and pure “modeling experts” will talk past each other – Modeling experts must learn about the problem and problem experts must learn what models can do – This process can be time consuming and frustrating – Team will be more productive if both sides expect and tolerate this process • Training – The most productive way of training modelers and problem experts is to construct very simple models of stylized domain problems – Goal is understanding and NOT realism or accuracy! – Beware: the training phase can seem pointless and frustrating – It is important to get expert buy-in ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 37 - George Mason University GMU Department of Systems Engineering and Operations Research Selecting a Subproblem • Initial model or expansion of existing model • Characteristics – – – – Manageable size Interesting in its own right Path to expansion Risk mitigation • How to restrict – Focus or target variables - variables of direct interest to the decision maker » Restrict to subset of variables of interest » Restrict to subset of values – Evidence variables - variables for which information will be available; used to draw inferences about the focus variables » Restrict to subset of evidence sources – Context variables - variables that will be assumed known and will be set to definite values » Restrict to subset of contextual conditions (sensing conditions, background casual conditions; assignment of objects to sensors; number of objects) ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 38 - George Mason University GMU Department of Systems Engineering and Operations Research Managing Knowledge Acquisition • Record rationale for modeling decisions • Develop “style guide” to maintain consistency across multiple subproblems – Naming conventions – Variable definitions – Modeling conventions • Enforce configuration management – History of model versions – Protocols for making and logging changes to current model – Rationale for changes • Develop protocol for testing models – Record of test results traced to model changes and rationale ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 39 - George Mason University GMU Department of Systems Engineering and Operations Research Model Agility • Requirement: rapid adaptation of model to a new situation • Support for model agility – Libraries of reusable model fragments – Documentation of stable and changeable aspects of model – Development of data sources for inputs to changeable model components » Protocols for data collection and maintenance » Protocols for importing data into knowledge base – Automated support for propagating impact of changes ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 40 - George Mason University GMU Department of Systems Engineering and Operations Research Model Evaluation • Model walk-through – Present completed model to "fresh" experts and/or modelers – Evaluate all components of model • Sensitivity analysis – Measure effect of one variable on another – Compare with expert intuition to evaluate model – Evaluate whether additional modeling is needed • Case-based evaluation – Run model on set of test cases » Cases to check local model fragments (component testing) » Cases to test behavior of global model (whole-model testing) – Compare with expert judgment or “ground truth” – Important issue: selection of test cases ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 41 - George Mason University GMU Department of Systems Engineering and Operations Research Agenda 1. 2. 3. 4. 5. 6. 7. Introduction Basics of decision theory and graphical models Knowledge engineering and model development Multi-entity Bayesian networks and decision graphs Inference using Bayes Rule Combining expert knowledge and data Conclusion ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 42 - George Mason University GMU Department of Systems Engineering and Operations Research Limits of Standard BN / DG • “One size fits all” model – All problem instances involve: » Same set of variables » Same states for variables » Same relationships between variables – Only “evidence” (instantiated variables) varies from instance to instance – All potentially relevant explanations are explicitly represented • In complex, open-world problems: – Number of actors and relationships to each other not fixed in advance – Attribution of evidence to actors may not be known in advance – Situation evolves in time – Need to represent only the most important explanations ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 43 - George Mason University GMU Department of Systems Engineering and Operations Research Maria’s Continuing Saga… • Variation 1: – Tran is sneezing and saw scratches – Tran was recently exposed to a cold and probably is not allergy prone • Variation 2: – Tran saw scratches – Maria did not see scratches – Tran is in room with Maria • Variation 3: – Tran and Maria both are sneezing, are allergy prone, and saw scratches – Tran and Maria are a continent apart ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 44 - George Mason University Department of Systems Engineering and Operations Research GMU Variation 1 NearCat(Maria) True 91.4 False 8.63 • • AllergyProne(Maria) True 100 False 0 Add background variables to specialize model to different individuals Still a “one size fits all” model ExposedToCold(Maria) True 15.7 False 84.3 AllergicReaction(Mar... True 88.2 False 11.8 SeesScratches(Mar... True 100 False 0 ColdStatus(Maria) True 15.0 False 85.0 Sneezing(Maria) True 100 False 0 Health(Grandmother1) Exposed_to_Cold(Tran) True 100 False 0 Cold_Status(Tran) True 97.3 False 2.69 Allergy_Prone(Tran) True 5.71 False 94.3 Visit(Maria,Grandmother1) Go 109.977 StayHome 100.000 Pleasure(Maria,Grandmother1) Allergic_Reaction(Tran) True 4.43 False 95.6 Sneezing(Tran) True 100 False 0 Health(Grandmother2) Near_Cat(Tran) True 65.5 False 34.5 Sees_Scratches(Tran) True 100 False 0 Visit(Tran,Grandmother2) Go 27.6902 StayHome 100.000 Pleasure(Tran,Grandmother2) Netica® file Cat_SSN_Variation1.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 45 - George Mason University GMU Department of Systems Engineering and Operations Research Variation 2 Near(Maria,Tran) True 100 False 0 Loc(Tran) ExposedToCold(Tran) True 100 False 0 ColdStatus(Tran) True 97.0 False 2.98 AllergyProne(Tran) True 5.99 False 94.0 Loc(Cat1) Near(Tran,Cat1) True 91.4 False 8.59 Loc(Maria) Near(Maria,Cat1) True 91.3 False 8.68 AllergicReaction(Tran) True 5.66 False 94.3 Sneezing(Tran) True 100 False 0 AllergyProne(Maria) True 100 False 0 ExposedToCold(Maria) True 15.8 False 84.2 AllergicReaction(Maria) True 88.1 False 11.9 Sees_Scratches(Tran) True 100 False 0 SeesScratches(Maria) True 54.9 False 45.1 ColdStatus(Maria) True 15.0 False 85.0 Sneezing(Maria) True 100 False 0 Health(Grandmother1) Visit(Maria,Grandmother1) Go 109.950 StayHome 100.000 • • Pleasure(Maria,Grandmother1) Decision graph has replicated sub-parts Different kinds of entities (cats, people) Netica® file Cat_SSN_Variation2.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 46 - George Mason University GMU Department of Systems Engineering and Operations Research Variation 3 Done Wrong Near(Maria,Tran) True 0 False 100 Loc(Tran) ExposedToCold(Tran) True 33.8 False 66.2 ColdStatus(Tran) True 35.9 False 64.1 AllergyProne(Tran) True 100 False 0 Loc(Cat1) Near(Tran,Cat1) True 49.5 False 50.5 Loc(Maria) Near(Maria,Cat1) True 49.5 False 50.5 AllergicReaction(Tran) True 51.0 False 49.0 Sneezing(Tran) True 100 False 0 AllergyProne(Maria) True 100 False 0 ExposedToCold(Maria) True 33.8 False 66.2 AllergicReaction(Maria) True 51.0 False 49.0 Sees_Scratches(Tran) True 100 False 0 SeesScratches(Maria) True 100 False 0 ColdStatus(Maria) True 35.9 False 64.1 Sneezing(Maria) True 100 False 0 Health(Grandmother1) Visit(Maria,Grandmother1) Go 89.1446 StayHome 100.000 • • Pleasure(Maria,Grandmother1) Variation 2 model gets wrong answer if Maria and Tran are not near each other and both are near cats! We need to be able to hypothesize additional cats if and when necessary Netica® file Cat_SSN_Variation2.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 47 - George Mason University Department of Systems Engineering and Operations Research Variation 3 Done Right GMU (…but what a mess!) Cat Other Type(Cat2) 92.2 7.77 Near(Tran,Cat2,T1) True 42.9 False 57.1 Cat Other Loc(Cat2,T1) Loc(Tran,T1) Near(Maria,Tran,T1) True 0 False 100 Loc(Maria,T1) Type(Cat1) 92.2 7.77 Near(Maria,Cat2,T1) True 42.9 False 57.1 Loc(Cat1,T1) Near(Tran,Cat1:Cat,T1) True 42.7 False 57.3 ColdStatus(Tran,T0) Cold 13.8 Exposed 13.0 Healthy 73.2 ColdStatus(Tran,T1) Cold 18.0 Exposed 8.79 Healthy 73.2 AllergyProne(Tran) True 100 False 0 Near(Tran,Cat1,T1) True 42.9 False 57.1 Near(Maria,Cat1,T1) True 42.9 False 57.1 Near(Tran,Cat2:Cat,T1) True 42.7 False 57.3 Near(Maria,Cat1:Cat,... True 42.7 False 57.3 Near(Tran,{c}:Cat,T1) True 85.3 False 14.7 Near(Maria,{c}:Cat,T1) True 85.3 False 14.7 AllergicReaction(Tran,T1) True 82.8 False 17.2 Sneezing(Tran,T1) True 100 False 0 Near(Maria,Cat2:Cat,... True 42.7 False 57.3 ColdStatus(Maria,T0) Cold 13.8 Exposed 13.0 Healthy 73.2 AllergyProne(Maria) True 100 False 0 ColdStatus(Maria,T1) Cold 18.0 Exposed 8.79 Healthy 73.2 AllergicReaction(Maria,... True 82.8 False 17.2 SeesScratches(Tran,... True 100 False 0 SeesScratches(Maria,... True 100 False 0 Sneezing(Maria,T1) True 100 False 0 Visit(Maria,Grandmother,... Go 107.033 StayHome 99.9999 • Health(Grandmother,T1) Pleasure(Maria,Grandmother,T1) This model gets the “right answer” on all the variations Netica® file Cat_SSN_Variation3.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 48 - George Mason University Department of Systems Engineering and Operations Research GMU The Solution Near(h,:Cat,t) True 2.76 False 97.2 AllergyProne(h) True 5.00 False 95.0 Loc(x,t) Spatial Loc(y,t) Fragment AllergicReaction(h,t) True 1.10 False 98.9 SeesScratches(h,t) True 2.63 False 97.4 Cats & Allergies Fragment AllergicReaction(h,t) True 1.13 False 98.9 ColdStatus(h,t) Cold 12.0 Exposed 0 Healthy 88.0 Sneezing(h,t) True 17.2 False 82.8 Sneezing Fragment True False Near(x,y,t) 3.04 97.0 ColdStatus(h1,t) Cold 8.35 Exposed 9.82 Healthy 81.8 Visit(h1,h2,t) Go 116.651 StayHome 100.000 True False Near(x,c,t) 3.00 97.0 Type(c) 50.0 50.0 Near(x,c:Cat,t) True 1.50 False 98.5 Hypothesis Management Fragment Value Fragment Near(x,{c}:Cat,t) True 2.48 False 97.5 ColdStatus(h,t-1) Cold 8.35 Exposed 9.82 Healthy 81.8 Colds&Time Fragment Health(h2,t) Pleasure(h1,h2,t) • Cat Other ColdStatus(h,t) Cold 8.35 Exposed 9.82 Healthy 81.8 Specify model in pieces and let the computer compose them ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 49 - George Mason University GMU Department of Systems Engineering and Operations Research Multi-Entity Bayesian Networks and Decision Graphs • Represent knowledge as model fragments – Implicitly represents complete and consistent model of domain and anticipated situations – No a priori bound on #entities, #relevant relationships, #observations • Compose fragments dynamically into situation specific network (SSN) – A situation is a snapshot of the world at an instant of time – A situation-specific network is an ordinary, finite Bayesian network or decision graph constructed from the MEDG knowledge base using network construction operators • Use SSN to compute response to query – General purpose algorithm – Approximates the “correct answer” encoded by the knowledge base – Models with special structure can be solved with special-case algorithms • Use expert-guided Bayesian learning to update knowledge patterns over time ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 50 - George Mason University GMU Department of Systems Engineering and Operations Research Model Construction • Simpler models give same results as more complex model on problems for which they are adequate • We want to construct “good enough” model for our situation • Model constructor builds situation-specific DG from knowledge base implicitly encoding infinite-dimensional DG ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 51 - George Mason University GMU Department of Systems Engineering and Operations Research Functional Architecture for Model Construction System Suggestors BN/DG Fragment KB • Retrieve model fragments • Match variables • Attach evidence to variables Alert Messages • Combine fragments into situation-specific model • Update inferences and decisions Model Workspace Streaming Evidence ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 52 - George Mason University Department of Systems Engineering and Operations Research MEDGs and Object-Oriented Representation GMU • Entities are represented as objects • Probability part of MEDG expresses uncertainty about entities – – – – • Attribute uncertainty Existence uncertainty (false alarm) Type and subtype uncertainty (discrimination within a type hierarchy) Reference uncertainty (association) Value and action part of MEDG represents objectives and plans of software agent Physical Object Near Mammal Location Cat Human Time Composition Physical Object Association Inheritance ©Kathryn Blackmond Laskey Physical Object Physical Object June 2002 BDT Workshop - 53 - George Mason University GMU Department of Systems Engineering and Operations Research MEBN / MEDG Value Added Inference Collaborative Environment (ICE) • MEBN software developed by Information Extraction & Transport, Inc. • BMD/ICE development sponsored by MDA • SPARTA engineers are using ICE to develop fusion architecture for missile defense Modeling with standard Bayesian Networks Effort Modeling with richer representation for managing and reasoning w/ Uncertainty Problem Complexity # of entities, # interactions, spatio-temporal variables, ... ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 54 - George Mason University GMU Department of Systems Engineering and Operations Research Agenda 1. 2. 3. 4. 5. 6. 7. Introduction Basics of decision theory and graphical models Knowledge engineering and model development Multi-entity Bayesian networks and decision graphs Inference using Bayes Rule Combining expert knowledge and data Conclusion ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 55 - Department of Systems Engineering and Operations Research George Mason University GMU • • Bayesian Inference Bayesian inference is a theory of rational belief dynamics Uncertainty about state of world is represented by a probability distribution over states – Probability is a rational agent’s degree of belief about uncertain states of the world • • Beliefs are updated over time by conditioning on new information about the world The rule for updating beliefs with new information is called Bayes rule: P(H2 | E ) P(E | H1 ) P(H1) P(H1 | E) P(E | H 2 ) P(H1) Posterior odds ratio • Prior odds ratio Likelihood ratio Evidence increases odds of hypothesis H2 relative to hypothesis H1 if the evidence is more likely under H2 than under H1 – Posterior probability can increase even with unlikely evidence if it is more unlikely under alternate hypothesis – Posterior probability can fail to increase even with likely evidence if it is more likely under alternate hypothesis • Bayesian theory justifies the scientific process – The best way to confirm a hypothesis is to enumerate many plausible alternatives and find evidence to disconfirm them ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 56 - George Mason University GMU Department of Systems Engineering and Operations Research Graphical View of Bayes Rule * * * * * * * * * *** * * * * * * * * * * * * * * * * * * * * * * * * * * * * ~ 1 million Zappists * * ~ 1 million non-Zappists • Province Province Northeastern 12.5 Other 87.5 Northeastern Other 63.2 36.8 Religion Religion Zappist 11.9 Other 88.1 Zappist Other 100 0 Depravia has a population of 40 million people – 5 million in Northeastern province and 35 million in the rest of Depravia • • The Zappist fundamentalists make up 60% of the population of the Northeastern province and 5% of the population of the rest of Depravia What is the probability that Frangolina is from the Northern province of Depravia if: – All we know about her is that she lives somewhere in Depravia? – We learn that she is a fundamentalist Zappist? Netica® file ZappistProvince.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 57 - George Mason University GMU • Department of Systems Engineering and Operations Research Rare Events and the False Positive Index Diagnostic tests can be evaluated by sensitivity and specificity – Sensitivity is probability condition will be detected if present – Specificity is probability condition will not be detected if not present – We can increase sensitivity by adjusting the threshold for declaring a positive, but at the cost of decreasing specificity • The false positive index is the number of false positives for every true positive • The chart plots false positive index against base rate for a test with 3 different thresholds The best way to increase both sensitivity and specificity is to integrate information from multiple sources Bayesian networks do this in a principled way Increasing the accuracy of single-source sensors may not help if single source cannot observe all relevant features • • • ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 58 - George Mason University GMU • • • • • Department of Systems Engineering and Operations Research Soft Evidence Ten year old Leslie comes downstairs complaining of a headache and chills. You put your hand on her forehead and decide that she probably has a fever. What's the probability she's sick enough to stay home from school? We draw a Bayesian network with variables S (sick?), H (headache?), F (fever?), C (chills?). How do we process the evidence that her forehead "feels hot?" We could add a “virtual” child of F called T (touch forehead). We would assign possible values (e.g., very hot, hot, warm, cool) for T and assess P(T | F) for each value of F and T Why should we have to assess P(F|T) for values of T that did not occur? We can achieve the same effect using “soft evidence” for F – Soft evidence represents the relative likelihood of “Leslie’s feel” if she has a fever and if she doesn’t – An 8 to 1 ratio could be entered as (0.8,0.1), (0.4,0.05) or any pair of values with an 8 to 1 ratio S H F C T S H F (f) = .8 T_ (f) = .1 T C Netica® file Leslie.dne ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 59 - George Mason University Inference by Local Distributed Computation GMU • Department of Systems Engineering and Operations Research Distributed inference algorithms perform fast computation – Algorithm computes probability distribution of non-evidence variables given evidence variables using Bayes rule – Tractable exact inference on probability distributions of complex phenomena – Approximate inference makes even the most difficult problems feasible • Graph separation provides the basis for defining communication architecture for fusion systems – Compile Bayesian network into computational representation for inference – Determine information requirements at variables and required flows along links – Analyze tradeoffs in accuracy and computation/communication AB A F BEC B E G ECG C EGF D CGH H DC ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 60 - George Mason University GMU Department of Systems Engineering and Operations Research Inference Algorithms • Exact, graph theory based – Pearl's tree algorithm and its modifications – Junction tree algorithm – Influence diagram reduction • Exact, factorization based – Symbolic probabilistic inference – Bucket elimination • Approximate – Monte Carlo simulation – Variational methods – Various special case approaches All these algorithms solve the canonical inference problem: find the posterior probability distribution for some variable(s) given direct or virtual evidence about other variable(s) ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 61 - George Mason University GMU Department of Systems Engineering and Operations Research Agenda 1. 2. 3. 4. 5. 6. 7. Introduction Basics of decision theory and graphical models Knowledge engineering and model development Multi-entity Bayesian networks and decision graphs Inference using Bayes Rule Combining expert knowledge and data Conclusion ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 62 - George Mason University GMU Department of Systems Engineering and Operations Research The Learning Problem • Why learning? – – – – • Knowledge elicitation is difficult Experts can be biased There may be no experts for some problems In many applications data are plentiful Learning tasks – Learn a Bayesian network from observations alone – Combine expert knowledge and data • Structure of a learning algorithm – Method for searching over structures – Method for evaluating “goodness” of structures » Bayesian learner compares Structure1 and Structure2 by ratio of posterior probabilities P(Data | Structure1) / P(Data | Structure2) – Method for estimating parameters given structure » Bayesian learner computes posterior distribution of parameters given structure – Choice of output » Single most probable structure and estimated probability table? » Sample of structures from posterior distribution of structures, expected value of probability table, standard deviations of probabilities? » Other possibilities ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 63 - Department of Systems Engineering and Operations Research George Mason University GMU Raw Material: The Observations • Easiest case: – Random sample of cases from the network to be learned – Each case contains an observed value for all variables in the network A A B ? C D B C E D E • Complexities: A 0 1 0 1 1 1 ? 0 1 1 B 1 1 1 0 0 1 0 0 1 1 C 0 0 1 1 1 0 1 1 1 0 D 1 1 1 1 0 1 0 0 0 1 E 1 1 0 0 1 0 1 0 0 1 – Missing observations: Some variables are not observed in some cases – Hidden variables: Some variables are not observed in any cases – Non-random sampling: Sampled cases are not representative of the population for which a Bayesian network is being learned A B A ? C D ©Kathryn Blackmond Laskey E B C D E A 0 1 0 1 1 1 ? 0 1 1 B ? ? ? ? ? ? ? ? ? ? June 2002 C 0 0 ? 1 1 0 1 1 1 ? D 1 1 1 ? 0 1 0 0 0 1 E 1 1 0 0 1 0 1 0 0 1 BDT Workshop - 64 - Department of Systems Engineering and Operations Research George Mason University GMU Learning a Single Probability • Objective: From a sample of student responses to a problem, learn the probability that a similar student will answer correctly If we know nothing about the probability a priori then we may consider all probabilities equally likely (uniform distribution) Suppose 7 of 10 students answer correctly The posterior distribution has a Beta distribution with parameters 8 and 4 The posterior expected value is 8/(8+4) = 0.67 • • • • Correct(1) Correct(10) True 50.0 False 50.0 True 50.0 False 50.0 Correct(10) True 100 False 0 True False Correct(1) 100 0 Correct(2) Correct(9) 50.0 0 to 0.1 10.0 False 50.0 0.1 to 0.2 10.0 0.2 to 0.3 10.0 0.3 to 0.4 10.0 0.4 to 0.5 10.0 0.5 to 0.6 10.0 0.6 to 0.7 10.0 0.7 to 0.8 10.0 0.8 to 0.9 10.0 Correct(8) 50.0 False 50.0 50.0 False 50.0 True False PercentCorrect True True True 0.9 to 1 True False Correct(9) 100 0 PercentCorrect 0 to 0.1 0+ 0.1 to 0.2 .014 0.2 to 0.3 0.34 0.3 to 0.4 2.33 0.4 to 0.5 8.21 0.5 to 0.6 18.3 0.6 to 0.7 27.8 0.7 to 0.8 27.6 0.8 to 0.9 14.3 0.9 to 1 1.15 0.67 ± 0.13 Correct(3) True 50.0 False 50.0 10.0 True False 0.5 ± 0.29 Correct(8) 100 0 True False Correct(4) Correct(7) True 50.0 False 50.0 True 50.0 False 50.0 True False Correct(7) True 0 False 100 Correct(5) Correct(6) True 50.0 True 50.0 False 50.0 False 50.0 True False Prior distribution: All percentages are equally likely (uniform distribution) June 2002 True False Correct(3) 100 0 Correct(4) 100 0 Correct(5) 0 100 Posterior distribution: Beta(8,4) distribution with expected value 67% The possible values for a probability are any number between 0 and 1. Netica® approximates the posterior distribution with finitely many “bins” ©Kathryn Blackmond Laskey Correct(6) 100 0 Correct(2) 0 100 Netica® file Learning.dne BDT Workshop - 65 - George Mason University GMU • • • Department of Systems Engineering and Operations Research Virtual Counts and Expert/Data Combination Suppose 6 out of 10 students in a second sample answered correctly Our new posterior distribution is also a Beta distribution with parameters 14 and 8 The rule for updating: – Prior Beta distribution has “virtual count” of 8 for True and 4 for False – Data contains 6 true values and 4 false values – Add the virtual counts to the actual counts to obtain the posterior virtual counts of 14 and 8 • Uniform distribution is a Beta distribution with virtual count of 1 for True and 1 for False – Posterior probability for True is (1 + #True)/(2 + #Sample) • We can assess virtual counts from the expert and combine them with data – Larger virtual counts mean less uncertainty – Expected value for a state is its virtual count divided by total virtual count • The same idea works when there are more than two states if we use the Dirichlet distribution instead of the Beta distribution Why not just use observed frequencies to estimate probabilities? ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 66 - George Mason University Department of Systems Engineering and Operations Research GMU Triplot • A triplot is a convenient tool for visualizing the combination of prior and data to form a posterior distribution. It plots the prior, posterior and “normalized likelihood” on the same axes. • Triplot for vague but on target expert – Expert distribution is spread out but likely to be near 70% True – 5 out of 25 observations are True – Posterior expected value is 23/30 = 77% True 6 5 4 Prior (Beta(3,2)) NLK (Beta(21,6)) Post. (Beta(23,7)) 3 2 1 • “Blind” application of updating formula can be a bad idea if expert is wrong • Adding “pinch of probability” on hypothesis that model is wrong adds robustness • Serious conflict should trigger an alert! ©Kathryn Blackmond Laskey 0.98 0.91 0.84 0.77 0.70 0.63 0.56 0.49 0.42 0.35 0.28 0.21 0.14 0.07 0.00 0 June 2002 BDT Workshop - 67 - Department of Systems Engineering and Operations Research George Mason University The “Pinch of Probability” GMU 10 9 8 7 Prior (Beta(60,40)) NLK (Beta(21,6)) Post. (Beta(80,45)) 6 5 4 3 2 1 0.98 0.91 0.84 0.77 0.70 0.63 0.56 0.49 0.42 0.35 0.28 0.21 0.14 0.07 0.00 0 Overconfident but approximately correct expert With 1% “pinch” on uniform prior 7 6 5 Prior (Beta(10,30)) NLK (Beta(21,6)) Post. (Beta(30,35)) 4 3 2 1 0.98 0.91 0.84 0.77 0.70 0.63 0.56 0.49 0.42 0.35 0.28 0.21 0.14 0.07 0.00 0 Overconfident and wrong expert ©Kathryn Blackmond Laskey With 1% “pinch” on uniform prior June 2002 BDT Workshop - 68 - George Mason University GMU Department of Systems Engineering and Operations Research Learning Structure • We can represent uncertainty about the structure of a Bayesian network as a probability distribution over structures – Assign prior probability to each structure and to local distributions conditional on structure – Apply Bayes rule to obtain posterior distribution for structure and parameters given data • “Natural Occam’s Razor” – When sample sizes are relatively small Bayesian learning tends to favor simpler models (models with fewer arcs and simpler local distributions) – As observations accumulate Bayesian learning will adjust the number of parameters as needed to account for the data – In general Bayesian learning tends to converge to the simplest model that is consistent with the observations ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 69 - George Mason University GMU Department of Systems Engineering and Operations Research Agenda 1. 2. 3. 4. 5. 6. 7. Introduction Basics of decision theory and graphical models Knowledge engineering and model development Multi-entity Bayesian networks and decision graphs Inference using Bayes Rule Combining expert knowledge and data Conclusion ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 70 - George Mason University GMU Department of Systems Engineering and Operations Research Application Areas • BNs and DGs are increasingly being applied to a broad array of problems – Automated and mixed initiative multi-source fusion – Automated and mixed initiative data mining – Models and simulations for systems engineering trade studies and system design – Agent models for agent-based computing – Cognitive modeling – Corporate and government strategic planning – Biometrics – Cyber-security – Decision support for many applications – Human genome project • How could BNs and DGs be applied in your area of work? ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 71 - George Mason University GMU Department of Systems Engineering and Operations Research Summary • BNs and DGs move from hand-crafted special purpose models to genuine open-world reasoning capability – MEDGs are logic for new-paradigm computing • General-purpose language for formulating and specifying scientific, engineering and policy models – Specify knowledge via modular components with well-defined interconnections – Uncertainty and observability intrinsic elements of the representation – Combine physical, logical, and subjective knowledge in a unified and logically defensible manner – Represent both causal and correlational knowledge • Modular representation for composing complex probability models from manageable sub-units – First-principles domain model – Decision theory provides approach to trading off expressive power against complexity in » » » » Knowledge engineering Model construction Inference Learning – Unified knowledge modeling methodology & process spans sub-symbolic through cognitive and social/organizational levels • • Application experience to date shows strong promise There are many open research issues ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 72 - George Mason University Department of Systems Engineering and Operations Research GMU References • Bayesian networks and decision graphs – Charniak, E. Bayesian Networks without Tears, AI Magazine, 1993. – Jensen, F. Bayesian Networks and Decision Graphs Springer-Verlag, 2001. – Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988. The first book popularizing Bayesian networks. – Clemen, R. Making Hard Decisions • Knowledge engineering – Special issue of IEEE Transactions on Knowledge and Data Engineering: Probability Models: Where do the Numbers Come From?, 2000. • Constructing BNs and DGs from modular and repeatable components – Glesner, S. and D. Koller (1995) Constructing Flexible Dynamic Belief Networks from FirstOrder Probabilistic Knowledge Bases, in ECSQARU ’95, pp. 217-226. – Laskey, K.B. , Mahoney, S.M. and Wright, E. (2001) Hypothesis Management in SituationSpecific BN Construction. In Koller, D. and Breese, J., Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference, San Francisco, CA: Morgan Kaufmann. • Combining expert knowledge and data – D. Heckerman. A tutorial on learning Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, March, 1995. – Laskey, K.B. and Mahoney, S.M. Knowledge and Data Fusion in Probabilistic Networks submitted to Journal of Machine Learning Research special issue on knowledge and data fusion • Web site for my GMU course – contains lecture notes, additional references, exercises, useful links – http://ite.gmu.edu/~klaskey/CompProb ©Kathryn Blackmond Laskey June 2002 BDT Workshop - 73 -