Will Data Mining Change the Functions of DBMS?
Jiawei Han
DAIS (Data And Information Systems) Lab
University of Illinois at Urbana-Champaign

Will DM Be Integrated with DB Functions?
- DM: Already a functional component of DBMS
  - Microsoft/SQLServer: Analysis Manager
  - IBM/DB2 & IntelligentMiner
  - Oracle: Data Mining Package
- But will DM be “intruding” into DBMS, i.e., be integrated with essential DBMS functions?
  - Indexing
  - Data integration
  - Data cleaning
  - Query processing

Indexing by Data Mining
- Indexing graphs? # of subgraphs: exponential!
  - Chemical Informatics/bioinformatics …
  - Discriminative frequent graph patterns (SIGMOD’04)
- Indexing subsequences?
  - Shopping sequence, DNA/protein sequence (SDM’05)
- When is discriminative frequent pattern indexing useful?
  - Complex objects, big (object) queries
[Figure: sample database and query graph, panels (a)-(c)]

Data Cleaning by Data Mining
- Load messy data into a structured database?
  - Inconsistent data: age = “1946”?
  - Field misalignments
  - Glitches of data: completely messed up inputs
  - Missing/unmatched delimiters: XML, HTML data
  - Big field: BLOB, CLOB, multimedia and text
- Data mining
  - Data cleaning by distribution/outlier analysis
  - Dependency/correlation analysis
  - Schema-directed or schema “discovery”

Data Integration by Data Mining
- Linking and mining cross-over multiple data relations
  - Cross-mine (Classification across multiple data relations: ICDE’04)
- Search across heterogeneous databases
  - Object identification/merge, reference reconciliation (Alon’s group)
- Mining across heterogeneous DBs
- Personalizing data from heterogeneous sources

Query Processing by Data Mining
- Query plan refinement based on query execution history
- Better query planning by investigating additional data statistics
  - Current optimizer: key/foreign key, cardinality, # distinct values
  - Additional information:
    - Strong dependency/correlation
    - Histogram, dense vs. sparse regions, etc.
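
To make the point about dependency/correlation statistics concrete, here is a minimal sketch of how a strong dependency between two columns can mislead a selectivity estimate built only from per-column statistics. Everything in it is an assumption made for illustration and is not taken from the slides: the synthetic table, the column names `city` and `state`, and the helper functions. It models no particular optimizer; it only contrasts an estimate that multiplies per-column selectivities (the independence assumption) with the selectivity actually observed.

```python
import random


def build_rows(n=10_000, seed=0):
    """Build a synthetic (city, state) table in which state is fully
    determined by city, i.e. a strong dependency between the columns.
    The data and column names are hypothetical."""
    rng = random.Random(seed)
    city_to_state = {"Chicago": "IL", "Urbana": "IL", "Austin": "TX", "Dallas": "TX"}
    cities = list(city_to_state)
    picks = (rng.choice(cities) for _ in range(n))
    return [(city, city_to_state[city]) for city in picks]


def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


def independence_estimate(rows, city, state):
    """Estimate the selectivity of (city = ? AND state = ?) by multiplying
    the two per-column selectivities, as if the columns were independent."""
    s_city = selectivity(rows, lambda r: r[0] == city)
    s_state = selectivity(rows, lambda r: r[1] == state)
    return s_city * s_state


def actual_selectivity(rows, city, state):
    """Selectivity of the conjunctive predicate measured on the data."""
    return selectivity(rows, lambda r: r[0] == city and r[1] == state)


rows = build_rows()
estimated = independence_estimate(rows, "Urbana", "IL")
actual = actual_selectivity(rows, "Urbana", "IL")

# Because state depends on city, the state predicate filters nothing extra,
# so the independence-based estimate undercounts the matching rows.
print(f"independence estimate: {estimated:.3f}")
print(f"actual selectivity:    {actual:.3f}")
```

In this toy table the `state` predicate is redundant once `city` is fixed, so the independence-based estimate comes out well below the measured selectivity. A mined dependency such as city → state is the kind of additional information that could correct the estimate.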

Conclusions
- DBers have been “invading” DM and have made great contributions
- It is time to consider that DM may invade DBMS to enhance its functionality
- General philosophy
  - Invisible data mining
  - Google is doing this successfully for page ranking
  - Can we do it to enhance DBMS?
- You can do better if you know your data better!