Download Metodi Decisionali per l`e

MIND
Models in decision making & data @nalysis
Enza Messina and Francesco Archetti
Main Activities
Research Areas
o Machine Learning Algorithms
o Probabilistic and Relational Models
o Optimization Under Uncertainty
Application Domains
o World Wide Web
o Life Sciences
o Ambient Intelligence
o Finance
Faculty:
Post Doc:
PhD Students:
Others:
Francesco Archetti
Enza Messina
Guglielmo Lulli
Elisabetta Fersini
Daniele Toscani
Ilaria Giordani
Cristina Elena Manfredotti
Gaia Arosio
Irene Sberna
Francesca Bargna
Statistical Learning and Relational Data
- Traditional learning methods are consistent with the classical
statistical inference problem formulation
- They assume that instances are independent and identically distributed (i.i.d.),
but this does not reflect the real world!
=> We need a solution able to deal with relationships and
with uncertainty in more general terms
[Diagram: SL (statistical learning) combines Probabilistic Models and Learning Techniques; SRL (statistical relational learning) combines Probabilistic Models, Learning Techniques and a Relational Representation]
Machine Learning and Relational Data
Traditional learning approaches
- work well with flat representations
- fixed length attribute-value vectors
- assume independent (IID) sample
[Figure: relational Patient and Contact data flattened into a single table]
Problems:
– introduces statistical skew
– loses relational structure
• incapable of detecting link-based patterns
– must fix attributes in advance
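A minimal sketch of the flattening problem, assuming a hypothetical Patient/Contact relation (the record layout, field names and values are illustrative and do not come from the slides). Joining a one-to-many relation into fixed-length rows repeats the patients that have many contacts and drops the identity of the linked patient.

```python
from collections import Counter

# Hypothetical relational data: patients and the contacts between them.
patients = {"p1": {"age": 34}, "p2": {"age": 51}, "p3": {"age": 29}}
contacts = [("p1", "p2"), ("p1", "p3")]  # (patient, contacted patient)

def flatten(patients, contacts):
    """Join Patient with Contact into fixed-length attribute-value rows."""
    rows = []
    for src, dst in contacts:
        rows.append({
            "patient": src,
            "age": patients[src]["age"],
            "contact_age": patients[dst]["age"],  # the link target's id is lost
        })
    return rows

rows = flatten(patients, contacts)
counts = Counter(row["patient"] for row in rows)
# p1 is now counted twice (statistical skew), while p2 and p3 have no row
# of their own and the link structure cannot be rebuilt from the rows.
print(counts)
```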
Machine Learning and Relational Data


- Bayesian nets use propositional representation
- Real world has objects, related to each other
[Figure: ground Bayesian network in which Intelligence and Difficulty determine Grade. Nodes shown: Intell_Jane, Intell_George, Diffic_CS101, Diffic_Geo101, Grade_Jane_CS101 (A), Grade_George_CS101 (C), Grade_George_Geo101 (B). These “instances” are not independent.]
Daphne Koller, 2003
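A small sketch of why the grade "instances" are not independent, assuming binary variables and made-up conditional probabilities (none of the numbers come from the slides). Jane and George take the same course, so their grades share the Difficulty parent; enumerating the joint shows that observing Jane's grade changes the belief about George's.

```python
from itertools import product

# Illustrative CPDs (assumed numbers).
P_INTELL_HIGH = 0.5
P_DIFF_HARD = 0.5
# P(grade = A | intelligence, difficulty), keyed by (intell, diff).
P_GRADE_A = {(1, 0): 0.9, (1, 1): 0.5, (0, 0): 0.5, (0, 1): 0.1}

def bern(p, value):
    """Probability of a binary value under a Bernoulli(p) variable."""
    return p if value == 1 else 1.0 - p

def joint(i_jane, i_george, diff, g_jane, g_george):
    """Joint probability of one full assignment of the ground network."""
    p = bern(P_INTELL_HIGH, i_jane) * bern(P_INTELL_HIGH, i_george)
    p *= bern(P_DIFF_HARD, diff)
    p *= bern(P_GRADE_A[(i_jane, diff)], g_jane)
    p *= bern(P_GRADE_A[(i_george, diff)], g_george)
    return p

def prob(event):
    """Sum the joint over all assignments that satisfy `event`."""
    return sum(joint(*a) for a in product((0, 1), repeat=5) if event(*a))

p_george_a = prob(lambda ij, ig, d, gj, gg: gg == 1)
p_jane_a = prob(lambda ij, ig, d, gj, gg: gj == 1)
p_both_a = prob(lambda ij, ig, d, gj, gg: gj == 1 and gg == 1)
p_george_a_given_jane_a = p_both_a / p_jane_a

# The marginal and the conditional differ, so the two grades are dependent.
print(p_george_a, p_george_a_given_jane_a)
```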
Probabilistic Relational Models

- Integrate uncertainty with relational model
- Convenient language for specifying complex models
- “Web of influence”: subtle & intuitive reasoning
- Framework for incorporating heterogeneous data by connecting
related entities (consider also relation uncertainty)
New problems:
- Relational clustering
- Collective classification
[Figure: heterogeneous information feeding a LEARNER; example model over gene expression data with the labels Gene Cluster, Exp. type, Exp. cluster, Level, GCN4, HSF, Lipid, Endoplasmatic]

Open Problems: Inference and Learning
Inference
Some Applications
- Document Analysis
- Life Sciences
- Ambient Intelligence
Document Analysis
The Web Case
Relational instances representation for enhancing:
- Web Document Classification
- Web Document Ranking
Enhancing document representation for
inducing traditional learning algorithms
[Figure: a document with relevance attribute Rlv linked to related instances ri1, ri2, ri3]
Document Analysis
The Web Case

Learning Models for Relational Data:
- Relational Clustering
1. Constraint Learning
2. Objective Function Adaptation
- Relational Classification:
Probabilistic Relational Models with Relational Uncertainty
[Schema: Document (♦ document_id, class); Link (Rlv, #origin_ref, #destination_ref)]
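The Document/Link schema can be sketched as plain records. The relevance-weighted neighbour vote below is a hypothetical stand-in that only illustrates how link relevance (Rlv) can influence a class decision; it is not the probabilistic relational model of the slides, and `doc_class`, `rlv` and `neighbour_vote` are names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Document:
    document_id: str
    doc_class: Optional[str]  # "class" is a Python keyword, hence doc_class

@dataclass
class Link:
    origin_ref: str
    destination_ref: str
    rlv: float  # relevance of the link, assumed to lie in [0, 1]

def neighbour_vote(doc_id, documents, links):
    """Relevance-weighted class vote over the documents linked from doc_id."""
    scores = defaultdict(float)
    for link in links:
        if link.origin_ref != doc_id:
            continue
        target = documents[link.destination_ref]
        if target.doc_class is not None:
            scores[target.doc_class] += link.rlv
    return max(scores, key=scores.get) if scores else None

docs = {
    "d1": Document("d1", None),
    "d2": Document("d2", "sport"),
    "d3": Document("d3", "finance"),
    "d4": Document("d4", "finance"),
}
links = [Link("d1", "d2", 0.9), Link("d1", "d3", 0.3), Link("d1", "d4", 0.4)]
predicted = neighbour_vote("d1", docs, links)
print(predicted)
```

With these illustrative weights the single highly relevant link outweighs the two weak ones, which an unweighted majority vote would not capture.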
Document Analysis
E-Forensics

JUdicial MAnagement by Digital Libraries Semantics
Information Extraction
Hearing Summarization
Proceedings n°: ……..
Accused Name: XXXXXX
Witness Name: KKKKKK
Prosecutor Name: -
Lawyer Name: YYYYYY, ZZZZZZ
Meeting Date: 1989
Meeting Location: Civitanova Marche
Emotion Recognition
Recent Publications
Journal Papers
E. Fersini, E. Messina, F. Archetti, A probabilistic relational approach for web document clustering, to appear in Journal of
Information Processing and Management.
E. Fersini, E. Messina, F. Archetti, Enhancing Web Page Classification using Visual Block Analysis, to appear in Journal of
Information Processing and Management.
Conference Papers
F. Archetti, G. Arosio, E. Fersini, E. Messina, Emotion recognition in judicial domain: a multilayer SVM approach, Lecture Notes
in Artificial Intelligence, Machine Learning and Data Mining, Leipzig, 2009.
E. Fersini, E. Messina, F. Archetti, Probabilistic relational models with relational uncertainty: an early study in web page
classification, IEEE WI-IAT Workshop, 2009.
F. Archetti, G. Arosio, E. Fersini, E. Messina, Audio-based Emotion Recognition for Advanced Automatic Retrieval in Judicial
Domain, Proc. ICT4JUSTICE, 1st Int. Conf. on ICT Solutions for Justice, Greece, 2008.
F. Archetti, E. Fersini, E. Messina, Granular modeling of web document: impact on information retrieval systems, Tenth
International Workshop on Web Information and Data Management – WIDM 2008
F. Archetti, E. Fersini, P. Campanelli, E. Messina, "A Hierarchical Document Clustering Environment Based on the Induced
Bisecting k-Means" LNCS Flexible Query Answering Systems, 2006.
Life Sciences
Relational clustering
Find a partition of a given set of instances using additional
information coming from the relationships between instances.
SEMI-SUPERVISED LEARNING METHOD
where relations can be represented by pair-wise constraints on
some of the instances (specifying whether two instances should be in
the same or in different clusters)
• Constraint Learning
• Modify the distance measure or the objective function in clustering
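One way to read "modify the objective function" is to add a penalty for every violated pair-wise constraint to the usual k-means distortion. The sketch below assumes that reading; the function names and the penalty weight `w` are illustrative and are not taken from the authors' method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def constrained_objective(points, labels, centroids,
                          must_link, cannot_link, w=1.0):
    """k-means distortion plus a penalty w per violated pair-wise constraint."""
    distortion = sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels))
    violated = sum(1 for i, j in must_link if labels[i] != labels[j])
    violated += sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return distortion + w * violated

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
centroids = [(0, 0.5), (5, 5.5)]
must_link = [(0, 1)]      # instances 0 and 1 should share a cluster
cannot_link = [(1, 2)]    # instances 1 and 2 should be separated

good = constrained_objective(points, [0, 0, 1, 1], centroids,
                             must_link, cannot_link, w=10.0)
bad = constrained_objective(points, [0, 1, 1, 1], centroids,
                            must_link, cannot_link, w=10.0)
print(good, bad)
```

A clustering algorithm would then search for the assignment that minimises this value instead of the plain distortion.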
Systems Biology Applications
Learning gene regulatory networks
[Figure: a gene on DNA with control and coding regions; transcription produces single-strand RNA; regulatory modules]
Modelling the pharmacology of cancer
[Figure: human cancer data combining gene expression and drug activity]
Collaborations
Gene-drug interaction: identification of a drug treatment for a given cell
line based both on drug activity pattern and gene expression profile
Recent Publications
Journal Papers
E. Messina, M. Sanguineti eds, Special Issue on OR and data mining for biological data, Computers and OR, to appear.
F. Archetti, I. Giordani, L. Vanneschi, Genetic Programming for Anticancer Therapeutic Response Prediction using the NCI-60 Dataset, to appear in
Computers and Operations Research, 2009.
L. Vanneschi, F. Archetti, M. Castelli, I. Giordani, Classification of Oncologic Data with Genetic Programming, to appear in Journal of Artificial
Evolution and Applications, 2009.
G. Lulli, M. Romauch: A Mathematical Program to Refine Gene Regulatory Networks, Discrete Applied Mathematics, 157 (10), 2009.
F. Archetti, S. Lanzeni, E. Messina, Graph Models and Mathematical Programming in Biochemical Networks Analysis and Metabolic Engineering
Design, Computers & Mathematics with Applications, Vol. 55, n. 5, pp. 970-983, 2008.
S. Lanzeni, E. Messina, F. Archetti, Towards metabolic networks phylogeny using Petri Net-based expansional analysis, BMC Systems Biology 2007.
Conference Papers
F. Archetti, I. Giordani, D. Mari, E. Messina, G. Ogliari, A Systems Biology Approach to oral anticoagulation therapy, SysBioHealth Symposium, 2008.
I. Giordani, L. Vanneschi, E. Fersini. “Modelling the Relationship between the Microarray Data of the NCI-60 Anticancer Dataset with Therapeutic
Responses by Genetic Programming”. SysBioHealth Symposium (ISBN: 978-88-903154-0-4), 2007.
E. Fersini, C. Manfredotti, E. Messina, F. Archetti. “Relational Clustering for Gene Expression Profiles and Drug Activity Pattern Analysis”.
SysBioHealth Symposium (ISBN: 978-88-903154-0-4), 2007.
F. Archetti, S. Lanzeni, E. Messina, L. Vanneschi, Genetic Programming for Computational Pharmacokinetics in Drug Discovery and Development.
Genetic Programming and Evolvable Machines, vol 8 (4), 2007.
F. Archetti, S. Lanzeni, E. Messina, L. Vanneschi "Genetic Programming and other Machine Learning approaches to predict Median Oral Lethal Dose
(LD50) and Plasma Protein Binding levels (%PPB) of drugs" Lecture Notes in Computer Sciences, EvoBIO 2007.
Submitted Papers
Archetti, Giordani, Messina, Mauri, A new clustering approach for learning transcriptional regulatory networks, submitted to Int. Journal of Data
Mining and Bioinformatics.
F. Archetti, S. Lanzeni, G. Lulli, E. Messina, A mathematical model for optimal functional disruption of biochemical networks, submitted to Journal of
Mathematical Modelling and Algorithms.
E. Fersini, C. Manfredotti, E. Messina, F. Archetti, Relational K-Means for Gene Expression Profiles and Drug Activity Pattern Analysis, submitted to
Int. Journal of Mathematical Modelling and Algorithms.
Pharmacogenomics Application:
Predict drug response to oral anticoagulation therapy (OAT)
Grouping (profiling) patients based on their clinical and genotypic features
in order to suggest the correct drug dosage
[Figure labels: Haemorrhagic risk, Thrombotic risk]
Data on more than 1000 patients:
- Clinical and therapeutic data: personal patient
data, medical diagnosis, therapy, INR and dosage
measurements
- Genetic data: polymorphisms of two genes,
CYP2C9 and VKORC1, that contribute to differences in
patients’ response.
Inference and Decision Problems
[Diagram: observation → State Estimation → belief → Action Selection → action]
Dynamic State Space Model
State: a vector of variables, some of which are not observable
A set of possible actions given a belief state distribution
Transition Model $p(x_t \mid x_{t-1}, a_t)$
Observation Model $p(z_t \mid x_t)$
Tracking the (hidden) state of a system as it evolves over time from
sequentially arriving (noisy or ambiguous) observations
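The estimation and decision loop can be sketched as a discrete Bayes filter: predict the belief with the transition model p(x_t | x_{t-1}, a_t), correct it with the observation model p(z_t | x_t), then pick an action from the belief. The two-state system, the probability tables and the threshold rule below are assumptions made for this example only.

```python
# States: 0 = ok, 1 = fault. All tables are illustrative assumptions.
TRANSITION = {  # TRANSITION[action][i][j] = p(x_t = j | x_{t-1} = i, a_t)
    "wait":   [[0.9, 0.1], [0.0, 1.0]],
    "repair": [[1.0, 0.0], [0.8, 0.2]],
}
OBSERVATION = {  # OBSERVATION[z][j] = p(z_t = z | x_t = j)
    "normal": [0.8, 0.3],
    "alarm":  [0.2, 0.7],
}

def predict(belief, action):
    """Propagate the belief through the transition model."""
    t = TRANSITION[action]
    n = len(belief)
    return [sum(belief[i] * t[i][j] for i in range(n)) for j in range(n)]

def update(belief, z):
    """Weight the belief by the observation likelihood and renormalise."""
    unnorm = [b * l for b, l in zip(belief, OBSERVATION[z])]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def select_action(belief, threshold=0.5):
    """Toy decision rule: repair when a fault is more likely than not."""
    return "repair" if belief[1] > threshold else "wait"

belief = [0.5, 0.5]
belief = update(predict(belief, "wait"), "alarm")
action = select_action(belief)
print(belief, action)
```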
Ambient Intelligence
Multi-target tracking
Multi-target tracking: finding the tracks of an unknown number of moving targets
from noisy observations.
Track: the sequence of “states” travelled by a target, which needs to be estimated
(we deal with on-line problems).
Requires data association: particle filters (PF) that track objects individually lack a consistent way to
resolve the ambiguities that arise in associating objects with measurements
- Exploiting relations can improve the efficiency of the tracker
- Monitoring relations can be a goal in itself
We model the transition probability of the system with a Relational Dynamic Bayesian Network (RDBN).
The main research topics we propose:
- A new representation modelling not only objects but also their relations
(i.e. exploiting relations can improve the efficiency of the tracker).
- A new computational strategy based on a family of Sequential Monte
Carlo methods called Relational Particle Filter
- Statistical techniques for the detection of anomalous behaviours
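For orientation, a plain bootstrap particle filter for a single target in one dimension is sketched below. It is the standard Sequential Monte Carlo baseline, not the Relational Particle Filter proposed above, and the motion model (unit drift), the noise levels and the function names are assumptions of this example.

```python
import math
import random

def particle_filter_step(particles, z, rng, q_std=0.5, r_std=1.0, drift=1.0):
    """One predict / weight / resample step of a bootstrap particle filter."""
    # Predict: sample each particle from the transition model p(x_t | x_{t-1}).
    predicted = [x + drift + rng.gauss(0.0, q_std) for x in particles]
    # Weight: evaluate the observation model p(z_t | x_t) up to a constant.
    weights = [math.exp(-0.5 * ((z - x) / r_std) ** 2) for x in predicted]
    if sum(weights) == 0.0:
        # Every particle is far from z: fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles with probability proportional to their weight.
    return rng.choices(predicted, weights=weights, k=len(predicted))

def run(steps=30, n=500, seed=0):
    """Simulate a target, track it, and return (true state, estimate)."""
    rng = random.Random(seed)
    truth = 0.0
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(steps):
        truth += 1.0 + rng.gauss(0.0, 0.5)   # hidden state evolves
        z = truth + rng.gauss(0.0, 1.0)      # noisy observation arrives
        particles = particle_filter_step(particles, z, rng)
    estimate = sum(particles) / len(particles)
    return truth, estimate

truth, estimate = run()
print(truth, estimate)
```

With several targets, each filter of this kind would also need to decide which measurement belongs to which target, which is the data association problem.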
Wireless Sensor Networks
- Bayesian abstractions for virtual sensing through low cost data aggregation and network-wide anomaly detection
- Modelling Cluster Heads as nodes of a BN
- Inference to know sensor values also in presence of temporary faults:
  - Lack of communication (sensor failure or sleep)
  - Outlier due to sensor malfunctioning
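A minimal sketch of virtual sensing, assuming that the readings of two cluster heads are jointly Gaussian (a two-node linear-Gaussian BN). When one cluster head is silent, its value is inferred from the other. The parameters are made up for illustration and the function name is hypothetical.

```python
def conditional_gaussian(mu_a, mu_b, var_a, var_b, cov_ab, b_value):
    """Mean and variance of A given B = b_value for a bivariate Gaussian."""
    gain = cov_ab / var_b
    mean = mu_a + gain * (b_value - mu_b)
    var = var_a - gain * cov_ab
    return mean, var

# Assumed temperature statistics (degrees Celsius) for two cluster heads.
# CH1 is silent; CH2 reports 23.0.
mean_ch1, var_ch1 = conditional_gaussian(
    mu_a=20.0, mu_b=21.0, var_a=4.0, var_b=4.0, cov_ab=3.2, b_value=23.0
)
print(mean_ch1, var_ch1)
```

The same conditional distribution can flag an outlier: a reading from CH1 that lies many standard deviations away from `mean_ch1` is suspicious.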
[Figure: WSN whose cluster heads CH1 to CH5 report to a sink and are modelled as nodes of a BN]
Transportation & Logistics
[Diagram: Data → Models → Decisions]
Recent Publications
Journal Papers
F. Archetti, M. Frigerio, E. Messina, D. Toscani, IKNOS - Inference and Knowledge in Networks of Sensors, to appear in Int. Journal of Sensor
Networks, 2009.
F. Chiti, R. Fantacci, F. Archetti, E. Messina, D. Toscani, An integrated Communications Framework for Context aware Continuous Monitoring
with Body Sensor Networks, IEEE Journal on Selected Areas in Communications, Vol.27, No.4, pp. 379-386, 2009.
P. Dell’Olmo, A. Iovanella, G. Lulli, B. Scoppola, Exploiting Incomplete Information to manage multiprocessor tasks with variable arrival rates,
Computers and Operations Research, Vol. 35, no 5, 2008.
G. Andreatta, G. Lulli, A Multi-period TSP with Stochastic Regular and Urgent Demands, European Journal of Operational Research, 2008.
D. Bertsimas, G. Lulli, A. Odoni, The ATFM Problem: An Integer Optimization Approach, Integer Programming and Combinatorial Optimization,
LNCS 5035, 2008.
K.F. Doerner, W. J. Gutjahr, R.F. Hartl, G. Lulli, Stochastic Local Search Procedures for the Probabilistic Two-Day Vehicle Routing Problem,
Advances in Computational Intelligence in Transportation and Logistics (A. Fink, F. Rothlauf Eds. )- Springer Series on Studies in Computational
Intelligence, pp. 153-168, 2008.
G. Lulli, S. Sen, A Heuristic Algorithm for Stochastic Integer Program with Complete Recourse, European Journal of Operational Research, 2006.
Conference Papers
C. Manfredotti, Modeling and Inference with RDBNs, Canadian Artificial Intelligence Conference, Graduate Student Symposium, May, 2009.
C. Manfredotti, E. Messina, F. Archetti, Improving Multiple Target Tracking with RDBNs, working paper presented at AIROWinter 2009,
International Conference of the Italian Operations Research Society, January, 2009.
F. Archetti, E. Messina, D. Toscani, M. Frigerio, KOINOS - Knowledge from observations and inference in networks of sensors, Proceedings of
IASTED International Conference on Sensor Networks, 2008.
F. Archetti, C. Manfredotti, M. Matteucci, E. Messina and D. G. Sorrenti, Multiple Hypothesis Markov Chains For On-Line Anomaly Detection in
Traffic Video Surveillance, Proceedings ICDP 2006: Imaging for Crime Detection and Prevention, 13-14 June 2006.
F. Archetti, C.E. Manfredotti, E. Messina, and D. G. Sorrenti, Foreground-to-ghost Discrimination in Single-difference Pre-processing, Lecture Notes
in Computer Science: Advanced Concepts for Intelligent Vision Systems, ACIVS’06, 263-274, 2006.
Submitted Papers
D. Toscani, F. Archetti, E. Messina, M. Frigerio, F. Chiti, R. Fantacci. SIFNOS – Statistical Inference and Filtering in Networks of Sensors.
Submitted to IEEE Journal on Selected Areas in Communications - Simple WSN Solutions, 2009.
Ambient Intelligence
Currently active Projects
LENVIS - Localised environmental and health information services for all (EU-FP7)
LIMNOS Logistics and Informatics for Mobility and Network OptimiSation (MIUR)
In collaboration with SAL Lab.
INSYEME – Integrated Systems for Emergencies (MIUR - FIRB)
GREIS - Gestione del Risparmio Energetico attraverso Informazioni di Sicurezza (MIUR)
In collaboration with NOMADIS Lab.
H-CIM Health Care through Intelligent Monitoring (MIUR)
Financial Time Series
Dynamic State Space Models for
Scenario Generation
Regime Switching Models
- Observations: prices $S_t$
- Hidden var.: Regime $x_t$
Transition Model $p(x_t \mid x_{t-1})$: Markov Chain
Observation Model $p(z_t \mid x_t)$, with $z_t = S_t$: Mixture of Gaussians
(Autoregressive Process)
(Autoregressive) Hidden Markov Model
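A minimal sketch of scenario generation with a regime-switching model: the hidden regime follows a Markov chain and each regime emits a Gaussian log-return that is compounded into a price path. This is a simplification (one Gaussian per regime, no autoregressive term), and the two regimes, their parameters and the function name are assumptions for illustration.

```python
import math
import random

def simulate_scenario(steps, trans, means, stds, s0=100.0, x0=0, seed=None):
    """Sample one (regimes, prices) scenario from a regime-switching model."""
    rng = random.Random(seed)
    regime, price = x0, s0
    regimes, prices = [], []
    for _ in range(steps):
        # Transition model p(x_t | x_{t-1}): one row of the Markov chain.
        regime = rng.choices(range(len(trans)), weights=trans[regime])[0]
        # Observation model: Gaussian log-return with regime-specific parameters.
        log_return = rng.gauss(means[regime], stds[regime])
        price *= math.exp(log_return)
        regimes.append(regime)
        prices.append(price)
    return regimes, prices

# Assumed regimes: 0 = calm, 1 = turbulent.
TRANS = [[0.95, 0.05], [0.10, 0.90]]
MEANS = [0.0005, -0.001]
STDS = [0.01, 0.03]

regimes, prices = simulate_scenario(250, TRANS, MEANS, STDS, seed=42)
print(prices[-1])
```

Calling the function with different seeds yields a set of scenarios.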
Recent Publications
Messina, E., Toscani, D., Hidden Markov models for scenario generation, IMA Journal of Management Mathematics, Vol. 4, pp. 379-401, 2008.
Perspectives

- Extend state space models to more general Relational Dynamic
Bayesian Networks to account not only for prices but also for
“exogenous” economic factors and unstructured information
- Algorithms for managing risk and tracking portfolios using all available
evidence and taking into account all uncertainties
Markets are good at gathering information from many heterogeneous sources and
combining it appropriately; we would expect the same from models
Projects & Collaborations
PRIN 2007 ”Probabilistic Models for representing uncertainty in portfolio optimization problems”
(with Università di Bergamo and Università della Calabria)
Collaboration with Brunel University and CARISMA Research Centre.
A cooperation network for research
projects and student mobility
- University of Toronto
- Brunel University
- CARISMA Research Center
- Norwegian University of Science and Technology
- Aachen University
- Hungarian Academy of Sciences
- Massachusetts Institute of Technology
- Centre of Research and Technology Hellas
- TXT e-Solutions
- Siemens
- Project Automation
- Aegate Ltd
- OptiRisk
- Astra Zeneca
- DELOS
- Comerson