Towards a spatio-temporal data mining query language
Roberto Trasarti
PhD Thesis Advancement (January 2009)
Supervisors: Fosca Giannotti (ISTI-CNR), Chiara Renso (ISTI-CNR), Dino Pedreschi
Committee: Giorgio Ghelli, Ugo Montanari

Summary of the work carried out in the second year (2008)

Adhering to the objectives of the thesis proposal, the candidate has developed a theoretical framework for progressively mining and querying both movement data and patterns. The proposal is based on an algebraic framework, referred to as the 2W Model, that defines the knowledge discovery process as a progressive combination of mining and querying operators. The 2W Model provides the underlying procedural semantics for a language called MO-DMQL, which allows users to progressively refine their mining objectives. MO-DMQL extends conventional SQL in two respects: a pattern definition mechanism, and the capability to manipulate both raw data and unveiled patterns uniformly. In addition, an innovative computational engine, DAEDALUS, capable of processing MO-DMQL statements, has been developed. The expressiveness and usefulness of MO-DMQL, as well as the computational capabilities of DAEDALUS, are evaluated qualitatively through a case study based on a massive dataset of GPS tracks of private vehicles in Milan.

The candidate's work on this line of research, which represents the main objective of the thesis, has been carried out in collaboration with the supervisors, but Trasarti's role has been central, both in designing the language and in designing, implementing and optimizing the system.

In the third year, the focus will be on several challenging issues. Foremost is the identification of a compact 2W Model algebra, consisting of a fixed, minimal set of operators. This is useful in two respects: it makes it possible to express the required patterns through suitable combinations of such basic operators, rather than relying on an arbitrary number of task-oriented mining operators, and it provides a solid theoretical basis for expressiveness and complexity results. In addition, developing strategies for optimizing processing plans would improve the overall performance of the MO-DMQL engine.

As a side effect of the research on the main topic of the PhD thesis, the candidate has contributed to three other lines of research, inspired and/or made possible by the availability of MO-DMQL:
• Athena: a reasoning engine for the semantic interpretation of movement behavior. An approach to movement understanding and to the analysis of semantic trajectories, based on a synergy between knowledge discovery techniques and ontologies; the proposed reasoning framework is built on DAEDALUS.
• Location Prediction: a new prediction method in the mobility context. A method for learning a predictor of the next location of a moving object, using the movements of all moving objects in a given area. The method has been developed as an add-on to DAEDALUS, to show that the proposed architecture is modular and extensible with new data mining models.
• SMA: a New Technique for Mining Frequent Sequences Under Regular Expression. This work addresses the problem of mining frequent sequences that satisfy a given regular expression.

These three works extend the expressiveness of MO-DMQL in different ways, enlarging the set of tools that can be used in the language. Athena is an example of an application-oriented layer that can be built on top of the DMQL, while the other two methods are examples of novel data mining models/patterns that can easily be plugged into the DMQL (which also serves to validate its logical and physical extensibility).
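To make the SMA problem statement concrete, here is a minimal sketch of the problem definition only, not of the SMA technique itself (whose details are not described in this report). It uses a naive level-wise baseline: frequent subsequences are grown one symbol at a time with support pruning, and the regular expression is checked only afterwards. Assumptions, all hypothetical: sequences are strings of single-character location labels, the support of a pattern is the number of database sequences containing it as a possibly gapped subsequence, and the function names are illustrative.

```python
import re
from typing import Dict, List


def is_subsequence(pattern: str, sequence: str) -> bool:
    """True if the symbols of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `symbol in it` advances the shared iterator, so order is enforced.
    return all(symbol in it for symbol in pattern)


def support(pattern: str, database: List[str]) -> int:
    """Number of database sequences that contain pattern as a subsequence."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def mine_regex_constrained(database: List[str], regex: str,
                           min_support: int, max_length: int = 10) -> Dict[str, int]:
    """Hypothetical baseline: frequent subsequences whose full text matches regex."""
    compiled = re.compile(regex)
    alphabet = sorted(set("".join(database)))
    # Level 1: frequent single symbols; only these can extend frequent patterns.
    frequent_symbols = [s for s in alphabet if support(s, database) >= min_support]
    level = {s: support(s, database) for s in frequent_symbols}
    result: Dict[str, int] = {}
    length = 1
    while level:
        # The regex is applied as a post-filter: it is not anti-monotone,
        # so it cannot be used to prune candidates in this naive scheme.
        for pattern, count in level.items():
            if compiled.fullmatch(pattern):
                result[pattern] = count
        if length == max_length:
            break
        # Every frequent (k+1)-pattern has a frequent k-prefix (anti-monotonicity
        # of support), so extending frequent k-patterns is complete.
        next_level: Dict[str, int] = {}
        for pattern in level:
            for symbol in frequent_symbols:
                candidate = pattern + symbol
                count = support(candidate, database)
                if count >= min_support:
                    next_level[candidate] = count
        level = next_level
        length += 1
    return result


if __name__ == "__main__":
    trips = ["ABCD", "ACBD", "ABD", "BCD"]  # toy trajectories over locations A..D
    print(mine_regex_constrained(trips, r"A.*D", min_support=2))
```

On the toy trajectories, the constraint `A.*D` keeps only frequent patterns that start at location A and end at D. Because the regex check happens after support counting, the baseline explores many candidates that can never satisfy the constraint; pushing the constraint into the search is the kind of cost that dedicated techniques aim to avoid.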
The value of these side contributions to the candidate's PhD experience lies essentially in assessing the expressiveness and efficiency of the language that constitutes the main goal of the thesis, besides demonstrating the candidate's ability to contribute effectively to collaborative research.

Recommendations

The committee evaluates positively the progress of the candidate's research, in light of the developments in the design and implementation of the data mining query language; it also appreciates the side results obtained by the candidate, recognizing that these are largely due to the availability of an integrated environment on which to build. In light of the candidate's presentation, and with a view to completing the thesis work, the committee nevertheless wishes to stress the importance of keeping central the objective defined in the proposal (a spatio-temporal data mining query language), developing all of its aspects (design, implementation, optimization, etc.) and structuring the final manuscript consistently with this objective. In this perspective, the design of the DMQL could usefully be the subject of a seminar given by the candidate during the year.

Pisa, 19 January 2009