“Lost in the Middle of Nowhere”
Graduate Student Presentation
M. J. Gravier

Learning Bayesian Network Structure from Distributed Data
R. Chen, K. Sivakumar, H. Kargupta
SIAM International Conference on Data Mining 2003

Overview
• What is a Bayesian network?
• What problem is addressed?
• What is the contribution?

Bayesian Networks
• “...state-of-the-art representation of probabilistic knowledge.”
• Graphical diagrams
• Probabilistic degrees of dependency
• Efficient representation of a joint probability distribution
Sun-Me Lee and Patricia Abbott, “Bayesian networks for knowledge discovery in large datasets: basics for nurse researchers,” Journal of Biomedical Informatics, 36 (2003):389-399.

Simple Bayesian Network
[Diagram nodes: Day after rock concert (X1), Poor exam grade (X2), Mega headache (X3)]
“Structure Learning”: discovering relationships by
- a dependence analysis method (constraint satisfaction problem, often based on hypothesis testing)
- a search and score method (basically an optimization problem)

Advantages of BN
• Domain expert knowledge
• Simple to understand
• Captures interactions
• Flexible re: missing information
• Less influenced by sample size

Disadvantages of BN
• Need conditional probabilities
• Lack of software
• Computational complexity

Typical Centralized Data
[Diagram: Sites 1-5 and a central Database]

What if it's Decentralized?
• How do you create your Bayesian network model in this environment?
• Different data at each site
[Diagram: Sites 1-5, no central database]
Issues:
- variable data can all be in one site
- variable data may be in two or more sites
- bandwidth

Collective Learning
1. Local learning
2. Sample selection
3. Cross learning
4. Combination of the results

1. Local Learning
• Local variable: since all the information is available locally, the normal local scoring method works
• But what about non-local variables?

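Note on local scoring: the slides do not say which score the authors use, so the following is only a minimal sketch, assuming a BIC-style decomposable score over discrete data. The function name `local_bic_score` and the toy records are hypothetical and invented for illustration. It shows why a variable whose parents all live at the same site can be scored with that site's data alone.

```python
import math
from collections import Counter

def local_bic_score(records, child, parents):
    """BIC-style score of `child` given `parents`, using one site's data only.

    records: list of dicts mapping variable name -> discrete value.
    Assumption: BIC is an illustrative choice, not necessarily the paper's score.
    """
    n = len(records)
    # Counts of (parent configuration, child value) and of parent configuration alone.
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    parent_cfg = Counter(tuple(r[p] for p in parents) for r in records)

    # Maximum-likelihood log-likelihood: sum of N_jk * log(N_jk / N_j).
    loglik = sum(c * math.log(c / parent_cfg[cfg]) for (cfg, _), c in joint.items())

    # Free parameters: (child states - 1) per observed parent configuration.
    child_states = len({r[child] for r in records})
    n_params = (child_states - 1) * len(parent_cfg)
    return loglik - 0.5 * n_params * math.log(n)

# Hypothetical toy data: X1 = day after rock concert, X3 = mega headache.
toy_records = (
    [{"X1": 1, "X3": 1}] * 18 + [{"X1": 1, "X3": 0}] * 2
    + [{"X1": 0, "X3": 0}] * 18 + [{"X1": 0, "X3": 1}] * 2
)

score_without_parent = local_bic_score(toy_records, "X3", [])
score_with_parent = local_bic_score(toy_records, "X3", ["X1"])
# A search-and-score learner would keep the parent set with the higher score.
print(score_with_parent > score_without_parent)
```

In this toy data the score with X1 as a parent of X3 is higher than the score with no parents, so a search-and-score method would keep that link. When a parent is stored at another site, the counts above cannot be formed locally, which is the problem the next slides address.
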
Cross Variables
• Some local and some non-local parents
– local links can be found
– problem with cross links
• Local learning may find U(local) → Y(local) instead of U(local) → Z(non-local) → Y(local)
[Diagram: nodes U, Z, Y split across Site 1 and Site 2]

2. Sample Selection
• Rank samples by their likelihood under the local model – low probabilities are evidence of cross relationships
• Send “keys” for samples ranked below threshold ρ from each site to a central site

3. Cross Learning
• Keys from step 2 used to create a BN of cross relationships
• ρ selection is critical – try two different levels and retain common cross links as a noise reduction method
• Cross learning eliminates hidden variables

4. Combination
• Combine local & cross load BNs
• All local BNs assembled, then cross links added with cross load BN
• Finds missing cross links for cross variables
• Eliminates extra local links (hidden variable problem)

Experimental Validation
• ALARM network model
– on-line monitoring of ICU patients
– widely used BN benchmark
• Characteristics
– 37 nodes
– 5 cross variables
– 15,000 samples

Experimental Results
• Learned correct structure
• All cross links detected
• ~10% of all samples transmitted

Conclusion
• Collective learning method learned same BN as centralized method
• Small data transmission requirement
• First approach to learn BN structure from heterogeneous data

Questions?
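
Note on steps 2 and 3: the selection and noise-reduction ideas can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: ρ is treated here as a likelihood cutoff, and the names `select_keys`, `common_cross_links`, the sample keys, and the example links are all hypothetical. Learning the cross BN itself at the central site is outside the sketch.

```python
def select_keys(local_likelihood, rho):
    """Step 2 sketch: return keys of samples that the local model explains poorly.

    local_likelihood: dict mapping sample key -> likelihood under the local BN.
    rho: cutoff (assumed here to be a likelihood threshold).
    Keys are returned with the least likely sample first.
    """
    low = [(lik, key) for key, lik in local_likelihood.items() if lik < rho]
    return [key for _, key in sorted(low)]

def common_cross_links(links_at_rho_a, links_at_rho_b):
    """Step 3 sketch: keep only cross links found at both threshold levels."""
    return set(links_at_rho_a) & set(links_at_rho_b)

# Hypothetical likelihoods computed at one site.
likelihoods = {"k1": 0.30, "k2": 0.02, "k3": 0.001, "k4": 0.25}
keys_to_send = select_keys(likelihoods, rho=0.05)

# Hypothetical cross links learned centrally at two different rho levels.
links_low_rho = [("U", "Z"), ("Z", "Y")]
links_high_rho = [("U", "Z"), ("Z", "Y"), ("A", "B")]
stable_links = common_cross_links(links_low_rho, links_high_rho)

print(keys_to_send)
print(sorted(stable_links))
```

Only the keys of poorly explained samples leave each site, which is consistent with the small fraction of samples transmitted in the reported experiment. Intersecting the links found at two ρ levels discards links that appear at only one level.
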