Towards a spatio-temporal data mining query language
Roberto Trasarti
PhD Thesis Advancement (January 2009)
Supervisors: Fosca Giannotti (ISTI-CNR), Chiara Renso (ISTI-CNR), Dino Pedreschi
Committee: Giorgio Ghelli, Ugo Montanari
Summary of the work carried out in the second year (2008)
In line with the objectives of the thesis proposal, a theoretical framework for progressively mining and
querying both movement data and patterns has been developed. The proposal is based on an algebraic
framework, referred to as the 2W Model, that defines the knowledge discovery process as a progressive
combination of mining and querying operators. The 2W Model provides the underlying
procedural semantics for a language called MO-DMQL, which allows mining objectives to be progressively
refined. MO-DMQL extends conventional SQL in two respects: a pattern definition mechanism
and the capability to manipulate raw data and discovered patterns uniformly. In addition, an innovative
computational engine, DAEDALUS, has been developed to process MO-DMQL statements. The
expressiveness and usefulness of the MO-DMQL language, as well as the computational capabilities of
DAEDALUS, are qualitatively evaluated by means of a case study based on a massive dataset of GPS tracks
of private vehicles in Milan.
The candidate's work on this line of research, which represents the main objective of the thesis, has been
pursued in collaboration with the supervisors, but Trasarti's role has been central, both in the language
design task and in the design, implementation and optimization of the system. In the third year, the focus
will be on some challenging issues. Foremost is the identification of a compact 2W Model algebra consisting
of a fixed, minimal set of operators. This is useful in two respects: it makes it possible to express the
required patterns via suitable combinations of such basic operators, rather than relying on an arbitrary
number of task-oriented mining operators, and it supports the development of a solid theoretical background
concerning expressiveness and complexity results. In addition, developing strategies for optimizing
processing plans is expected to improve the overall performance of the proposed MO-DMQL engine.
As a side effect of the research on the main topic of the PhD thesis, the candidate has contributed to three
other lines of research, inspired and/or made possible by the availability of MO-DMQL:
• Athena: a reasoning engine for the semantic interpretation of movement behavior.
An approach to movement understanding and the analysis of semantic trajectories, based on a synergy
of knowledge discovery techniques and ontologies. The proposed reasoning framework is built on
DAEDALUS.
• Location Prediction: a new prediction method in a mobility context.
A method for learning a predictor for the next location of a moving object, using the movements of
all moving objects in a certain area. The method has been developed as an add-on to DAEDALUS, to
show that the proposed architecture is modular and extensible with new data mining models.
• SMA: a New Technique for Mining Frequent Sequences Under Regular Expressions.
This work addresses the problem of mining frequent sequences that satisfy a given regular
expression.
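To make the problem statement in the last item concrete, the following is a minimal brute-force sketch in Python. It is not the SMA technique itself, whose details are not given here; it is a naive baseline under stated assumptions: each symbol is a single character, candidates are contiguous subsequences, and the function name, the `max_len` bound and the toy database are all hypothetical.

```python
import re
from collections import Counter


def frequent_regex_sequences(database, pattern, min_support, max_len=4):
    """Naive baseline: find contiguous subsequences that fully match
    `pattern` and occur in at least `min_support` input sequences."""
    regex = re.compile(pattern)
    support = Counter()
    for sequence in database:
        seen = set()  # count each candidate at most once per input sequence
        for start in range(len(sequence)):
            last = min(start + max_len, len(sequence))
            for end in range(start + 1, last + 1):
                candidate = sequence[start:end]
                if regex.fullmatch(candidate):
                    seen.add(candidate)
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}


# Hypothetical toy database: one string per sequence, one character per symbol.
database = ["abcb", "abb", "acb", "bab"]
result = frequent_regex_sequences(database, r"a[bc]*b", min_support=2)
print(result)
```

Enumerating every candidate before checking the constraint is exactly the cost that a dedicated technique would aim to avoid; the sketch only fixes what "frequent" and "satisfying a regular expression" mean together.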
These three works extend the expressiveness of MO-DMQL in different ways, by enlarging the set of tools
that can be used in the language. Athena is an example of an application-oriented layer that can be built on
top of the DMQL, while the other two methods are examples of novel data mining models/patterns that can
be easily plugged into the DMQL (which also validates its logical and physical extensibility). As such, the
value of these contributions in the candidate's PhD experience lies essentially in assessing the
expressiveness and efficiency of the language and engine that are the main goal of the thesis, besides
demonstrating the candidate's ability to contribute effectively to collaborative research.
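The location-prediction idea listed above, learning from the movements of all objects in an area to predict one object's next location, can be illustrated with a simple stand-in. The sketch below is a first-order frequency model, not the candidate's method, and it does not use DAEDALUS or MO-DMQL; the function names and the sample trajectories are hypothetical.

```python
from collections import Counter, defaultdict


def learn_transitions(trajectories):
    """Count, for each location, how often each location directly follows it,
    pooling the trajectories of all moving objects."""
    transitions = defaultdict(Counter)
    for trajectory in trajectories:
        for current, following in zip(trajectory, trajectory[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequently observed successor of `current`,
    or None if `current` was never seen with a successor."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]


# Hypothetical trajectories, each a sequence of visited locations.
trajectories = [
    ["home", "station", "office"],
    ["home", "station", "gym"],
    ["cafe", "station", "office"],
]
model = learn_transitions(trajectories)
prediction = predict_next(model, "station")
print(prediction)
```

A model of this kind ignores time and longer movement history; it is shown only to indicate the shape of a predictor that could be plugged in as an additional mining model.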
Recommendations
The committee positively evaluates the progress of the candidate's research, in light of the
developments in the design and implementation of the data mining query language; it also appreciates the
side results obtained by the candidate, recognizing that these are largely due to the availability of an
integrated environment on which to build. In light of the candidate's presentation, and in view of the
completion of the thesis work, the committee nevertheless wished to stress the importance of keeping the
objective defined in the proposal (a spatio-temporal data mining query language) central, giving due weight
to all of its aspects (design, implementation, optimization, etc.) and structuring the final manuscript
consistently with this objective. From this perspective, the design of the DMQL could usefully be the topic
of a seminar given by the candidate during the year.
Pisa, 19 January 2009