Some Interesting Problems
Rakesh Agrawal
IBM Almaden Research Center

Foundations
- What is data mining?
  - A collection of techniques?
  - A set of composable operations (à la Relational Algebra)?
- Hints:
  - Inductive Databases (Mannila)
  - Relational Calculus + Statistical Quantifiers (Imielinski)

Privacy Implications
- Can we build accurate data models while preserving the privacy of individual records?
- Hints:
  - Randomization (Agrawal & Srikant): replace x by x + y, where y is drawn from a known distribution
  - Anonymization (crypto literature)
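
The randomization hint can be illustrated with a small sketch. It assumes, for illustration only, that the noise y is drawn from a uniform distribution on [-a, a] (a known, zero-mean distribution) and that the model only needs the mean of an attribute. The helper names `randomize` and `estimate_mean` and the synthetic ages data are hypothetical. Each released value x + y hides the individual x, yet because E[x + y] = E[x] + E[y] and E[y] is known, the aggregate can still be estimated. This is not Agrawal & Srikant's reconstruction procedure, which recovers the whole distribution of x; it only shows the perturbation step and the simplest aggregate that survives it.

```python
import random
import statistics


def randomize(values, half_width, seed=None):
    """Perturb each value x as x + y, with y ~ Uniform(-half_width, half_width).

    The noise distribution is public; only the individual draws of y stay secret.
    (Uniform noise is an illustrative assumption.)
    """
    rng = random.Random(seed)
    return [x + rng.uniform(-half_width, half_width) for x in values]


def estimate_mean(perturbed, noise_mean=0.0):
    """Estimate the original mean: E[x + y] = E[x] + E[y], so subtract the known E[y]."""
    return statistics.fmean(perturbed) - noise_mean


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "ages" data set, for illustration only.
    ages = [rng.randint(20, 70) for _ in range(100_000)]
    noisy = randomize(ages, half_width=30, seed=7)
    print("first true values: ", ages[:3])
    print("first noisy values:", [round(v, 1) for v in noisy[:3]])
    print("true mean:     ", round(statistics.fmean(ages), 2))
    print("estimated mean:", round(estimate_mean(noisy), 2))
```

With a noise half-width of 30, individual noisy ages can be far from the true ones, while the two means printed at the end should agree closely for a large sample. Wider noise gives more privacy per record but needs more records for the same accuracy, since the variance of the estimate grows with the noise variance.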

Web Mining: Beyond Click Streams
- Mining knowledge bases from the web
  - Completeness
  - Accuracy
  - Malicious spam
- Hints:
  - Brin’s book experiment
  - etc. etc.

Web Mining: Beyond hrefs
- What other social behaviors exist on the web, and how can we make use of them?
- Hints:
  - Viral marketing paper in this conference
  - etc. etc.

Actionable Patterns
- Principled use of domain knowledge for
  - discarding uninteresting patterns
  - improving performance
- Hints:
  - Papers in recent KDD conferences

Simultaneous mining over multiple data types
- Not just
  - Relational tables
  - Time series
  - Textual documents
- But patterns across all of them

Some more problems
- Online, incremental algorithms over data streams
- When to retire past data
- Long sequential patterns
- Discovering richer patterns (trees and DAGs)
- Automatic, data-dependent selection of algorithm parameters

What not to work on?
- The field is too young!
  - Too early to say we don’t need new algorithms
    - Let every flower bloom!!!
    - Impressive results of the PVSM algorithm
  - Emphasize evaluation and benchmarks
    - Interesting research issues

Applications most likely to benefit from data mining
- Web applications (I think)
- Bioinformatics (I hope!)

Inhibitors
- Insufficient skill base (education)
- Usability

"The true delight is in the finding out, rather than in the knowing."
Isaac Asimov