Some Interesting Problems
Data Mining: Next 10 Years
Rakesh Agrawal
IBM Almaden Research Center
Foundations
  What is data mining?
    A collection of techniques?
    A set of composable operations (a la Relational Algebra)?
  Hints:
    Inductive Databases (Mannila)
    Relational Calculus + Statistical Quantifiers (Imielinski)
Privacy Implications
  Can we build accurate data models while preserving privacy of individual records?
  Hints:
    Randomization (Agrawal & Srikant): Replace x by x+y where y is drawn from a known distribution
    Anonymization (Crypto literature)
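A minimal Python sketch of the randomization hint, under assumptions that are not in the slides: synthetic "age" records, uniform noise on [-30, 30], and a hypothetical helper named `randomize`. It only illustrates that aggregate statistics (mean and variance) can still be estimated from the perturbed values because the noise distribution is known. It is not the distribution-reconstruction procedure of the cited work.

```python
import random
import statistics

NOISE_SCALE = 30.0  # assumed half-width of the uniform noise, chosen for illustration


def randomize(values, noise_scale, rng):
    """Replace each x by x + y, with y drawn from Uniform(-noise_scale, noise_scale)."""
    return [x + rng.uniform(-noise_scale, noise_scale) for x in values]


rng = random.Random(0)

# Synthetic "true" records (assumption: ages roughly normal around 40).
ages = [rng.gauss(40, 10) for _ in range(20000)]

# What the data miner actually sees: individually perturbed records.
noisy = randomize(ages, NOISE_SCALE, rng)

# The noise has zero mean, so the sample mean of the noisy data estimates
# the mean of the original data.
est_mean = statistics.fmean(noisy)

# Uniform(-a, a) has variance a**2 / 3. Since the noise is independent of the
# data, subtracting it from the noisy variance estimates the original variance.
est_var = statistics.pvariance(noisy) - NOISE_SCALE**2 / 3

print(f"true mean {statistics.fmean(ages):.2f}, estimated {est_mean:.2f}")
print(f"true variance {statistics.pvariance(ages):.2f}, estimated {est_var:.2f}")
```

Each individual record is shifted by up to 30 in either direction, yet the two aggregate estimates stay close to the true values when the sample is large.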
Web Mining: Beyond Click Streams
  Mining knowledge bases from the web
    Completeness
    Accuracy
    Malicious Spam
  Hints:
    Brin’s Book experiment
    etc. etc.
Web Mining: Beyond hrefs
  What other social behaviors exist on the web and how to make use of them?
  Hints:
    Viral marketing paper in this conf
    etc. etc.
Actionable Patterns
  Principled use of domain knowledge for
    discarding uninteresting patterns
    performance
  Hints:
    Papers in the recent KDD conferences
Simultaneous mining over multiple data types
  Not just
    Relational tables
    Time series
    Textual documents
  But patterns across all of them
Some more problems
  Online, incremental algorithms over data streams
  When to retire the past data
  Long sequential patterns
  Discovering richer patterns (trees and DAGs)
  Automatic, data-dependent selection of algorithm parameters
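One possible reading of the first two bullets, sketched in Python: an online mean that is updated one item at a time and retires past data gradually through exponential decay. The class name `DecayedMean` and the `decay` parameter are illustrative assumptions, not something the slides specify. A fixed-size sliding window would be an equally reasonable retirement policy.

```python
class DecayedMean:
    """Incremental mean over a stream where older items fade out geometrically.

    decay = 1.0 keeps all history (plain running mean);
    smaller values forget the past faster.
    """

    def __init__(self, decay):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.weight = 0.0  # decayed count of items seen so far
        self.total = 0.0   # decayed sum of items seen so far

    def update(self, x):
        """Fold one new item into the summary in O(1) time and memory."""
        self.weight = self.decay * self.weight + 1.0
        self.total = self.decay * self.total + x
        return self.total / self.weight


stream = [0.0, 0.0, 10.0]

keep_all = DecayedMean(decay=1.0)
forgetful = DecayedMean(decay=0.5)
for item in stream:
    mean_all = keep_all.update(item)
    mean_recent = forgetful.update(item)

print(mean_all, mean_recent)
```

With `decay=0.5` the most recent item dominates the estimate, so the summary tracks changes in the stream without storing or rescanning old records.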
What not to work on?
  The field is too young!
    Too early to say we don’t need new algorithms
    Let every flower bloom!!!
    Impressive results of the PVSM algorithm
  Emphasize evaluation and benchmarks
  Interesting research issues
Applications most likely to benefit from data mining
  Web applications (I think)
  Bioinformatics (I hope!)
Inhibitors
  Insufficient skill base (Education)
  Usability
Grand Challenge
  Find
    What’s there
    What has changed
  Across sovereign data repositories
The true delight is in the finding out, rather than in the knowing.
Isaac Asimov