MIND: Models in decision making & data @nalysis
Enza Messina and Francesco Archetti

Main Activities

Research Areas
o Machine Learning Algorithms
o Probabilistic and Relational Models
o Optimization Under Uncertainty

Applicative Domains
o World Wide Web
o Life Sciences
o Ambient Intelligence
o Finance

Team (Faculty, Post Doc, PhD Students, Others): Francesco Archetti, Enza Messina, Guglielmo Lulli, Elisabetta Fersini, Daniele Toscani, Ilaria Giordani, Cristina Elena Manfredotti, Gaia Arosio, Irene Sberna, Francesca Bargna

Statistical Learning and Relational Data
- Traditional learning methods follow the classical statistical inference formulation: instances are assumed to be independent and identically distributed (i.i.d.). This assumption does not reflect the real world.
- We need a solution able to deal with relationships and, more generally, with uncertainty.
[Diagram: SL (statistical learning) = probabilistic models + learning techniques; SRL (statistical relational learning) = probabilistic models + learning techniques + relational representation]

Machine Learning and Relational Data
Traditional learning approaches:
- work well with flat representations
- use fixed-length attribute-value vectors
- assume an independent (IID) sample
Flattening relational data causes problems:
- introduces statistical skew
- loses relational structure (incapable of detecting link-based patterns)
- must fix attributes in advance
[Figure: relational tables (Patient, Contact) flattened into a single table]

Machine Learning and Relational Data
- Bayesian nets use a propositional representation.
- The real world has objects that are related to each other.
- These "instances" are not independent.
[Figure: ground Bayesian network over Intelligence, Difficulty and Grade, with nodes Intell_Jane, Intell_George, Diffic_CS101, Diffic_Geo101, Grade_Jane_CS101 (A), Grade_George_CS101 (C), Grade_George_Geo101 (B). Source: Daphne Koller, 2003]

Probabilistic Relational Models
- Integrate uncertainty with the relational model
- Convenient language for specifying complex models
- "Web of influence": subtle and intuitive reasoning
- Framework for incorporating heterogeneous data by connecting related entities (relation uncertainty is also considered)
New problems: relational clustering and collective
classification

Heterogeneous Information
[Figure: heterogeneous information example; surviving labels: Gene, Cluster, Exp. type, GCN4, HSF, Lipid, Exp. cluster, Endoplasmatic, Level]

Open Problems: Inference and Learning

Some Applications
- Document Analysis
- Life Sciences
- Ambient Intelligence

Document Analysis

Document Analysis: The Web Case
Relational representation of instances for enhancing:
- Web Document Classification
- Web Document Ranking
Enhancing the document representation for inducing traditional learning algorithms
[Figure: document linked to related instances ri1, ri2, ri3 through relevance (rlv) values]

Document Analysis: The Web Case
Learning Models for Relational Data:
- Relational Clustering
  1. Constraint Learning
  2. Objective Function Adaptation
- Relational Classification: Probabilistic Relational Models with Relational Uncertainty
[Schema diagram: entities Document and Link; attributes shown: document_id (key), class, Rlv, #origin_ref, #destination_ref]

Document Analysis: E-Forensics
JUdicial MAnagement by Digital Libraries Semantics
- Information Extraction
- Hearing Summarization
- Emotion Recognition
[Example extraction form: Proceedings n° ……..; Accused Name XXXXXX; Witness Name KKKKKK; Prosecutor Name - Lawyer Name YYYYYY ZZZZZZ; Meeting Date 1989; Meeting Location Civitanova Marche]

Recent Publications

Journal Papers
- E. Fersini, E. Messina, F. Archetti, A probabilistic relational approach for web document clustering, to appear in Journal of Information Processing and Management.
- E. Fersini, E. Messina, F. Archetti, Enhancing Web Page Classification using Visual Block Analysis, to appear in Journal of Information Processing and Management.

Conference Papers
- F. Archetti, G. Arosio, E. Fersini, E. Messina, Emotion recognition in judicial domain: a multilayer SVM approach, Lecture Notes in Artificial Intelligence, Machine Learning and Data Mining, Leipzig 2009.
- E. Fersini, E. Messina, F. Archetti, Probabilistic relational models with relational uncertainty: an early study in web page classification, IEEE WI-IAT Workshop, 2009.
- F. Archetti, G. Arosio, E. Fersini, E. Messina, Audio-based Emotion Recognition for Advanced Automatic Retrieval in Judicial Domain, Proc.
ICT4JUSTICE, 1st Int. Conf. on ICT Solutions for Justice, Greece, 2008.
- F. Archetti, E. Fersini, E. Messina, Granular modeling of web document: impact on information retrieval systems, Tenth International Workshop on Web Information and Data Management (WIDM 2008).
- F. Archetti, E. Fersini, P. Campanelli, E. Messina, "A Hierarchical Document Clustering Environment Based on the Induced Bisecting k-Means", LNCS, Flexible Query Answering Systems, 2006.

Life Sciences

Relational clustering
Find a partition of a given set of instances using additional information coming from the relationships between instances.
This is a SEMI-SUPERVISED LEARNING METHOD in which relations can be represented by pair-wise constraints on some of the instances (specifying whether two instances should be in the same cluster or in different clusters). Two approaches:
• Constraint Learning
• Modify the distance measure or the objective function used in clustering

Systems Biology Applications
- Learning gene regulatory networks
  [Figure: gene on DNA with control and coding regions; transcription to single-strand RNA; regulatory modules]
- Modelling the pharmacology of cancer
  [Figure labels: human cancer, gene expression, drug activity]
- Gene-drug interaction: identification of a drug treatment for a given cell line based on both the drug activity pattern and the gene expression profile
Collaborations

Recent Publications

Journal Papers
- E. Messina, M. Sanguineti eds, Special Issue on OR and data mining for biological data, Computers and OR, to appear.
- F. Archetti, I. Giordani, L. Vanneschi, Genetic Programming for Anticancer Therapeutic Response Prediction using the NCI-60 Dataset, to appear in Computers and Operations Research, 2009.
- L. Vanneschi, F. Archetti, M. Castelli, I. Giordani, Classification of Oncologic Data with Genetic Programming, to appear in Journal of Artificial Evolution and Applications, 2009.
- G. Lulli, M. Romauch, A Mathematical Program to Refine Gene Regulatory Networks, Discrete Applied Mathematics, 157 (10), 2009.
- F. Archetti, S. Lanzeni, E.
Messina, Graph Models and Mathematical Programming in Biochemical Networks Analysis and Metabolic Engineering Design, Computers & Mathematics with Applications, Vol. 55, n. 5, pp. 970-983, 2008.
- S. Lanzeni, E. Messina, F. Archetti, Towards metabolic networks phylogeny using Petri Net-based expansional analysis, BMC Systems Biology, 2007.

Conference Papers
- F. Archetti, I. Giordani, D. Mari, E. Messina, G. Ogliari, A Systems Biology Approach to oral anticoagulation therapy, SysBioHealth Symposium, 2008.
- I. Giordani, L. Vanneschi, E. Fersini, "Modelling the Relationship between the Microarray Data of the NCI-60 Anticancer Dataset with Therapeutic Responses by Genetic Programming", SysBioHealth Symposium (ISBN: 978-88-903154-0-4), 2007.
- E. Fersini, C. Manfredotti, E. Messina, F. Archetti, "Relational Clustering for Gene Expression Profiles and Drug Activity Pattern Analysis", SysBioHealth Symposium (ISBN: 978-88-903154-0-4), 2007.
- F. Archetti, S. Lanzeni, E. Messina, L. Vanneschi, Genetic Programming for Computational Pharmacokinetics in Drug Discovery and Development, Genetic Programming and Evolvable Machines, vol. 8 (4), 2007.
- F. Archetti, S. Lanzeni, E. Messina, L. Vanneschi, "Genetic Programming and other Machine Learning approaches to predict Median Oral Lethal Dose (LD50) and Plasma Protein Binding levels (%PPB) of drugs", Lecture Notes in Computer Science, EvoBIO 2007.

Submitted Papers
- Archetti, Giordani, Messina, Mauri, A new clustering approach for learning transcriptional regulatory networks, submitted to Int. Journal of Data Mining and Bioinformatics.
- F. Archetti, S. Lanzeni, G. Lulli, E. Messina, A mathematical model for optimal functional disruption of biochemical networks, submitted to Journal of Mathematical Modelling and Algorithms.
- E. Fersini, C. Manfredotti, E. Messina, F. Archetti, Relational K-Means for Gene Expression Profiles and Drug Activity Pattern Analysis, submitted to Int. Journal of Mathematical Modelling and Algorithms.
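
The relational clustering slide in the Life Sciences part describes relations as pair-wise constraints and mentions modifying the clustering objective. The sketch below illustrates that general idea with a penalised k-means in which every violated must-link or cannot-link constraint adds a fixed cost to an assignment. It is an assumption-laden illustration, not the group's Relational K-Means: the function names, the `weight` parameter and the greedy one-point-at-a-time reassignment are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violations(i: int, c: int, labels: List[int],
               must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> int:
    """Count constraints involving point i that break if i joins cluster c."""
    count = 0
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            count += labels[other] != c   # partner sits in another cluster
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            count += labels[other] == c   # partner sits in the same cluster
    return count


def constrained_kmeans(points: Sequence[Point], init_centers: Sequence[Point],
                       must_link: Sequence[Pair] = (),
                       cannot_link: Sequence[Pair] = (),
                       weight: float = 1.0, n_iter: int = 50):
    """k-means whose assignment cost is distance + weight * violated constraints."""
    centers = [list(c) for c in init_centers]
    k = len(centers)
    # Start from the plain nearest-centre assignment.
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    for _ in range(n_iter):
        changed = False
        for i, p in enumerate(points):
            best = min(range(k), key=lambda c: sq_dist(p, centers[c])
                       + weight * violations(i, c, labels, must_link, cannot_link))
            if best != labels[i]:
                labels[i] = best
                changed = True
        # Recompute each centre as the mean of its current members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:
            break
    return labels, centers
```

With `weight=0` (or no constraints) this reduces to ordinary k-means from the given initial centres. Larger weights make constraints harder to violate, at the price of results that depend on the order in which points are visited.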
Pharmacogenomics
Application: predicting drug response to oral anticoagulation therapy (OAT)
- Grouping (profiling) patients based on their clinical and genotypic features in order to suggest the correct drug dosage
- Risks to balance: haemorrhagic risk and thrombotic risk
- Data on more than 1000 patients:
  - Clinical and therapeutic data: patients' personal data, medical diagnosis, therapy, INR and dosage measurements
  - Genetic data: polymorphisms of two genes, CYP2C9 and VKORC1, that contribute to differences in patients' response
In collaboration with: (partner not named in the text)

Inference and Decision Problems
Dynamic State Space Model
- State: a vector of variables, some of which are not observable
- State estimation: observation -> belief
- Action selection: belief -> action, chosen from a set of possible actions given a belief state distribution
- Transition model: p(x_t | x_{t-1}, a_t)
- Observation model: p(z_t | x_t)
The goal is to track the (hidden) state of a system as it evolves over time from sequentially arriving (noisy or ambiguous) observations.

Ambient Intelligence

Multi-target tracking
- Multi-target tracking: finding the tracks of an unknown number of moving targets from noisy observations.
- Track: the sequence of states travelled by a target, which needs to be estimated (we deal with on-line problems).
- Requires data association: particle filters (PF) that track objects individually lack a consistent way to resolve the ambiguities that arise when associating objects with measurements.
- Exploiting relations can improve the efficiency of the tracker.
- Monitoring relations can be a goal in itself.
- We model the transition probability of the system with a Relational Dynamic Bayesian Network (RDBN).
In collaboration with: (partner not named in the text)

The main research topics we propose:
- A new representation modelling not only objects but also their relations (exploiting relations can improve the efficiency of the tracker).
- A new computational strategy based on a family of Sequential Monte Carlo methods called Relational Particle Filter
- Statistical techniques for the detection of anomalous behaviours

Wireless Sensor Networks
Bayesian abstractions for virtual sensing through low-cost data aggregation and network-wide anomaly detection
- Modelling Cluster Heads (CH) as nodes of a Bayesian network (BN)
- Inference to recover sensor values even in the presence of temporary faults:
  - lack of communication (sensor failure or sleep)
  - outliers due to sensor malfunctioning
[Figure: wireless sensor network (WSN) with cluster heads CH1-CH5 and a sink, mapped to a BN]

Transportation & Logistics
Data, Models, Decisions
In collaboration with: (partners not named in the text)
[Figure: flow network between an origin (orig_f) and a destination (dest_f); the remaining labels and formula are not recoverable]

Recent Publications

Journal Papers
- F. Archetti, M. Frigerio, E. Messina, D. Toscani, IKNOS - Inference and Knowledge in Networks of Sensors, to appear in Int. Journal of Sensor Networks, 2009.
- F. Chiti, R. Fantacci, F. Archetti, E. Messina, D. Toscani, An integrated Communications Framework for Context aware Continuous Monitoring with Body Sensor Networks, IEEE Journal on Selected Areas in Communications, Vol. 27, No. 4, pp. 379-386, 2009.
- P. Dell'Olmo, A. Iovanella, G. Lulli, B. Scoppola, Exploiting Incomplete Information to manage multiprocessor tasks with variable arrival rates, Computers and Operations Research, Vol. 35, no. 5, 2008.
- G. Andreatta, G. Lulli, A Multi-period TSP with Stochastic Regular and Urgent Demands, European Journal of Operational Research, 2008.
- D. Bertsimas, G. Lulli, A. Odoni, The ATFM Problem: An Integer Optimization Approach, Integer Programming and Combinatorial Optimization, LNCS 5035, 2008.
- K.F. Doerner, W.J. Gutjahr, R.F. Hartl, G. Lulli, Stochastic Local Search Procedures for the Probabilistic Two-Day Vehicle Routing Problem, Advances in Computational Intelligence in Transportation and Logistics (A. Fink, F. Rothlauf Eds.), Springer Series on Studies in Computational Intelligence, pp. 153-168, 2008.
- G. Lulli, S.
Sen, A Heuristic Algorithm for Stochastic Integer Program with Complete Recourse, European Journal of Operational Research, 2006.

Conference Papers
- C. Manfredotti, Modeling and Inference with RDBNs, Canadian Artificial Intelligence Conference, Graduate Student Symposium, May 2009.
- C. Manfredotti, E. Messina, F. Archetti, Improving Multiple Target Tracking with RDBNs, working paper presented at AIROWinter 2009, International Conference of the Italian Operations Research Society, January 2009.
- F. Archetti, E. Messina, D. Toscani, M. Frigerio, KOINOS - Knowledge from observations and inference in networks of sensors, Proceedings of IASTED International Conference on Sensor Networks, 2008.
- F. Archetti, C. Manfredotti, M. Matteucci, E. Messina, D. G. Sorrenti, Multiple Hypothesis Markov Chains For On-Line Anomaly Detection in Traffic Video Surveillance, Proceedings ICDP 2006: Imaging for Crime Detection and Prevention, 13-14 June 2006.
- F. Archetti, C.E. Manfredotti, E. Messina, D. G. Sorrenti, Foreground-to-ghost Discrimination in Single-difference Pre-processing, Lecture Notes in Computer Science: Advanced Concepts for Intelligent Vision Systems, ACIVS'06, 263-274, 2006.

Submitted Papers
- D. Toscani, F. Archetti, E. Messina, M. Frigerio, F. Chiti, R. Fantacci, SIFNOS - Statistical Inference and Filtering in Networks of Sensors, submitted to IEEE Journal on Selected Areas in Communications - Simple WSN Solutions, 2009.

Ambient Intelligence: Currently Active Projects
- LENVIS: Localised environmental and health information services for all (EU-FP7)
- LIMNOS: Logistics and Informatics for Mobility and Network OptimiSation (MIUR). In collaboration with SAL Lab.
- INSYEME: Integrated Systems for Emergencies (MIUR - FIRB)
- GREIS: Gestione del Risparmio Energetico attraverso Informazioni di Sicurezza (energy-saving management through security information) (MIUR). In collaboration with NOMADIS Lab.
- H-CIM: Health Care through Intelligent Monitoring (MIUR)

Financial Time Series

Dynamic State Space Models for Scenario Generation
Regime Switching Models
- Observations: prices S_t
- Hidden variable: regime x_t
- Transition model: p(x_t | x_{t-1}), a Markov chain
- Observation model: p(z_t | x_t) with z_t = S_t, a mixture of Gaussians (autoregressive process)
- Resulting model: (autoregressive) hidden Markov model

Recent Publications
- Messina, E., Toscani, D., Hidden Markov models for scenario generation, IMA Journal of Management Mathematics, Vol. 4, pp. 379-401, 2008.

Perspectives
- Extend state space models to more general Relational Dynamic Bayesian Networks, to account not only for prices but also for "exogenous" economic factors and unstructured information
- Algorithms for managing risk and tracking portfolios using all available evidence and taking all uncertainties into account
- Markets are good at gathering information from many heterogeneous sources and combining it appropriately; we would expect the same from models

Projects & Collaborations
- PRIN 2007 "Probabilistic Models for representing uncertainty in portfolio optimization problems" (with Università di Bergamo and Università della Calabria)
- Collaboration with Brunel University and CARISMA Research Centre

A cooperation network for research projects and student mobility
- University of Toronto
- Brunel University
- CARISMA Research Center
- Norwegian University of Science and Technology
- Aachen University
- Hungarian Academy of Sciences
- Massachusetts Institute of Technology
- Centre of Research and Technology Hellas
- TXT e-Solutions
- Siemens
- Project Automation
- Aegate Ltd
- OptiRisk
- Astra Zeneca
- DELOS
- Comerson
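
The regime-switching model of the Financial Time Series slides (hidden regime x_t, Markov transition model, observation model given the regime) follows the same predict-then-update belief recursion as the state estimation slide. The sketch below shows that recursion for a discrete set of regimes. It is a simplified illustration under stated assumptions: each regime emits from a single Gaussian rather than the mixture of Gaussians or autoregressive process named on the slide, observations are treated as returns, and all names and numbers (`filter_regimes`, the example transition matrix) are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filter_regimes(observations: Sequence[float],
                   prior: Sequence[float],
                   transition: Sequence[Sequence[float]],
                   means: Sequence[float],
                   stds: Sequence[float]) -> List[List[float]]:
    """Return the filtered belief p(x_t | z_1..z_t) after each observation.

    transition[i][j] is p(x_t = j | x_{t-1} = i).
    Assumption: one Gaussian observation density per regime.
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for z in observations:
        # Predict: push the previous belief through the Markov chain.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update: weight each regime by the likelihood of the observation.
        weighted = [predicted[j] * gaussian_pdf(z, means[j], stds[j])
                    for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
    return history


if __name__ == "__main__":
    # Hypothetical two-regime example: regime 0 is calm, regime 1 is turbulent.
    beliefs = filter_regimes(
        observations=[0.001, -0.002, 0.08, -0.09],
        prior=[0.5, 0.5],
        transition=[[0.95, 0.05], [0.10, 0.90]],
        means=[0.0, 0.0],
        stds=[0.01, 0.05],
    )
    for t, b in enumerate(beliefs, start=1):
        print(t, [round(p, 3) for p in b])
```

Small observations keep most of the belief on the calm regime, and a large move shifts it to the turbulent one. A production version would work in log space to avoid underflow on long series and would handle the case where every regime assigns zero likelihood to an observation.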