Some Interesting Problems
Rakesh Agrawal
IBM Almaden Research Center

Foundations
* What is data mining
  * A collection of techniques?
  * A set of composable operations (a la Relational Algebra)?
* Hints:
  * Inductive Databases (Mannila)
  * Relational Calculus + Statistical Quantifiers (Imielinski)

Privacy Implications
* Can we build accurate data models while preserving privacy of individual records?
* Hints:
  * Randomization (Agrawal & Srikant): Replace x by x+y where y is drawn from a known distribution
  * Anonymization (Crypto literature)

Web Mining: Beyond Click Streams
* Mining knowledge bases from the web
  * Completeness
  * Accuracy
  * Malicious Spam
* Hints: Brin's Book experiment etc. etc.

Web Mining: Beyond hrefs
* What other social behaviors exist on the web and how to make use of them?
* Hints: Viral marketing paper in this conf etc. etc.

Actionable Patterns
* Principled use of domain knowledge for
  * discarding uninteresting patterns
  * performance
* Hints: Papers in the recent KDD conferences

Simultaneous mining over multiple data types
* Not just
  * Relational tables
  * Time series
  * Textual documents
* But patterns across all of them

Some more problems
* Online, incremental algorithms over data streams
* When to retire the past data
* Long sequential patterns
* Discovering richer patterns (trees and dags)
* Automatic, data-dependent selection of algorithm parameters

What not to work on?
* The field is too young!
* Too early to say we don't need new algorithms
* Let every flower bloom!!!
  * Impressive results of the PVSM algorithm
* Emphasize evaluation and benchmarks
* Interesting research issues

Applications most likely to benefit from data mining
* Web applications (I think)
* Bioinformatics (I hope!)

Inhibitors
* Insufficient skill base (Education)
* Usability

The true delight is in the finding out, rather than in the knowing.
Isaac Asimov
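
The randomization hint under Privacy Implications (replace x by x+y, where y is drawn from a known distribution) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `randomize` and `estimate_moments`, the Gaussian noise, the noise level, and the synthetic "ages" data are all hypothetical. The sketch recovers only the mean and variance of the original values from the perturbed ones; it is a simplified moment-based illustration, not the distribution-reconstruction procedure of Agrawal & Srikant.

```python
import random
import statistics


def randomize(values, noise_std, rng):
    """Return each x replaced by x + y, with y drawn from N(0, noise_std**2)."""
    return [x + rng.gauss(0.0, noise_std) for x in values]


def estimate_moments(perturbed, noise_std):
    """Estimate mean and variance of the original values from perturbed data.

    Because the noise is zero-mean and independent of x:
      E[x + y] = E[x]   and   Var[x + y] = Var[x] + noise_std**2.
    """
    mean_est = statistics.fmean(perturbed)
    # Subtract the known noise variance; clamp at zero for small samples.
    var_est = max(statistics.pvariance(perturbed) - noise_std ** 2, 0.0)
    return mean_est, var_est


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical sensitive attribute (synthetic data, not from the talk).
ages = [rng.uniform(20, 60) for _ in range(50_000)]
noisy_ages = randomize(ages, noise_std=15.0, rng=rng)

true_mean, true_var = statistics.fmean(ages), statistics.pvariance(ages)
mean_est, var_est = estimate_moments(noisy_ages, noise_std=15.0)
print(f"mean:     true={true_mean:.2f} estimated={mean_est:.2f}")
print(f"variance: true={true_var:.2f} estimated={var_est:.2f}")
```

Each individual record is obscured by noise with a standard deviation of 15, yet the aggregate statistics remain estimable because the noise distribution is known. That is the sense in which a model can be accurate without access to the true individual values.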