Modeling and Language Support for the Management of PBMS
Manolis Terrovitis, Panos Vassiliadis, Spiros Skiadopoulos, Elisa Bertino, Barbara Catania, Anna Maddalena

Outline
  Introduction
  Modeling of data and patterns
  Query operators
  Summary and future work

Motivation
  Huge amounts of data are produced.
  Interesting knowledge has to be detected and extracted.
  Knowledge extraction techniques (i.e., data mining) are not sufficient:
    Huge amounts of results (clusters, association rules, decision trees, etc.)
    Arbitrary modeling of results

Motivation (cont'd)
  We need to be able to manipulate the knowledge discovered!
  The basic requirements:
    A generic and homogeneous model for patterns.
    Well-defined query operators.
    Efficient storage.

Patterns and PBMS [Rizzi et al., ER 2003]
  Patterns are compact, semantically rich representations of raw data.
    Clusters, association rules, decision trees, etc.
  Pattern Base Management System (PBMS)
    Patterns are treated as first-class citizens
    Pattern-based queries
    Approximate mapping between patterns and raw data

Contributions
  We formally define the logical foundations for pattern management
  We present a pattern specification language
  We introduce queries and query operators

Outline
  Introduction
  Modeling of data and patterns
  Query operators
  Summary and future work

PBMS architecture
  [Figure: PBMS architecture. Labels: Pattern Space (Pattern Types, Pattern Classes, Patterns, with "instance of" and "member of" links), Intermediate Results, Intermediate Mappings, Data Mining Algorithms, Pattern Recognition Algorithms, Data Space (DB1, DB2).]

The patterns
  Patterns hold information about:
    the data source
    the structure of the pattern
    the relation between the structure and the source, expressed as an approximate logical formula.
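The components that a pattern holds can be pictured as a small record. The following is only a minimal sketch, not the specification language of the talk: the class name `Pattern`, its field names, the `represents` method, and the helper `in_disk` are hypothetical, and the sample values are borrowed from the cluster example in the deck.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

Record = Mapping[str, Any]


@dataclass
class Pattern:
    """Hypothetical record holding the components of a pattern."""
    pid: int                                    # pattern identifier
    source: str                                 # name of the data source
    structure: Record                           # structure of the pattern
    formula: Callable[[Record, Record], bool]   # relates a source tuple to the structure

    def represents(self, t: Record) -> bool:
        # Approximate test: does the formula hold for tuple t?
        return self.formula(t, self.structure)


def in_disk(t: Record, s: Record) -> bool:
    # Assumed formula for a disk-shaped cluster over (Age, Salary).
    dx = t["Age"] - s["CENTER"]["X"]
    dy = t["Salary"] - s["CENTER"]["Y"]
    return dx ** 2 + dy ** 2 <= s["RAD"] ** 2


p = Pattern(
    pid=337,
    source="EMP",
    structure={"CENTER": {"X": 21, "Y": 1200}, "RAD": 12},
    formula=in_disk,
)
```

Keeping the formula as a callable next to the structure means a tuple can be tested against the pattern without consulting the raw data, which is the point of the approximate pattern-to-data mapping.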

Pattern: cluster example
  Pid        337
  Structure  [CENTER: [X: 21, Y: 1200], RAD: 12]
  Data       EMP: {[Age, Salary]}
  Formula    $(t.Age - 21)^2 + (t.Salary - 1200)^2 \le 12^2$ where $t \in EMP$

Pattern type: example
  Name              Disk
  Structure Schema  [CENTER: [X: real, Y: real], RAD: real]
  Data Schema       REL: {[X: real, Y: real]}
  Formula Schema    $(t.X - CENTER.X)^2 + (t.Y - CENTER.Y)^2 \le RAD^2$ where $t \in REL$

The formula
  An intensional description of the pattern-data relation
    Pros: efficiency, more intuitive results
    Cons: accuracy

Intensional vs. extensional
  [Figure: intensional vs. extensional representation, plotted over Age and Salary axes.]

The formula (cont'd)
  The formula is a predicate $fp(x, y)$ where $x \in Source$, $y \in Structure$
  Expressiveness: functions and predicates
  Safety: range restriction. Queries employing the formula are n-depth domain independent.

Outline
  Introduction
  Modeling of data and patterns
  Query operators
  Summary and future work

Query Operators
  Query operator classes:
    Database operators
    Pattern base operators
    Crossover database operators
    Crossover pattern base operators

Crossover Operators
  Exact evaluation, via the intermediate mappings
  Approximate evaluation, via the formula
  [Figure: Data Space and Pattern Space. A pattern (PID, data, formula, structure) is linked to the data by arrows labeled "Exact" and "Approximation".]

Crossover Operators
  Database
    Drill-Through: Which data are represented by these patterns?
    Data-Covering: Which data from this dataset can be represented by this pattern?
  Pattern Base
    Pattern-Covering: Which of these patterns represent this dataset?

Query Example
  Drill-through({ p | p intersects q })
  [Figure: patterns p and q plotted over Age and Salary axes.]

Outline
  Introduction
  Modeling of data and patterns
  Query operators
  Summary and future work

Summary
  Formal specification of basic PBMS concepts
  Investigation of the representation of the pattern-data relation
  Formal definition of query operators

Future Work
  Query language
  Generic similarity measures
  Efficient implementation of intermediate mappings
  Statistical measures for the patterns
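
The crossover operators and the query example from the deck can be sketched in a few lines. This is an illustration under stated assumptions rather than the formal operator definitions: the class `Disk`, the function names, the `mappings` dictionary standing in for the intermediate mappings, and all sample values are hypothetical. In particular, pattern-covering is read here as "the formula holds for every tuple of the dataset", which is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    """Hypothetical instance of the Disk pattern type."""
    pid: int
    cx: float
    cy: float
    rad: float

    def formula(self, x, y):
        # (t.X - CENTER.X)^2 + (t.Y - CENTER.Y)^2 <= RAD^2
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.rad ** 2

    def intersects(self, other):
        # Two disks intersect when the distance between their centers
        # is at most the sum of their radii.
        dist2 = (self.cx - other.cx) ** 2 + (self.cy - other.cy) ** 2
        return dist2 <= (self.rad + other.rad) ** 2


def data_covering(pattern, dataset):
    """Approximate evaluation via the formula: tuples the pattern can represent."""
    return [t for t in dataset if pattern.formula(*t)]


def pattern_covering(patterns, dataset):
    """Assumed reading: patterns whose formula holds for every tuple."""
    return [p for p in patterns if all(p.formula(*t) for t in dataset)]


def drill_through(patterns, mappings):
    """Exact evaluation via intermediate mappings (pid -> set of tuples)."""
    result = set()
    for p in patterns:
        result |= mappings.get(p.pid, set())
    return result


# Hypothetical sample data over (Age, Salary).
q = Disk(pid=1, cx=30, cy=30, rad=5)
p1 = Disk(pid=2, cx=36, cy=30, rad=3)
p2 = Disk(pid=3, cx=60, cy=60, rad=2)
mappings = {2: {(35, 30), (37, 31)}, 3: {(60, 61)}}

# Drill-through({ p | p intersects q })
answer = drill_through([p for p in (p1, p2) if p.intersects(q)], mappings)
```

With these sample values only `p1` intersects `q`, so the drill-through returns the two tuples mapped to `p1`. The contrast between `data_covering` (formula only) and `drill_through` (stored mappings) mirrors the approximate and exact evaluation paths.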