Some Interesting Problems

Data Mining: Next 10 Years
Rakesh Agrawal
IBM Almaden Research Center
What is data mining
A collection of techniques?
A set of composable operations (à la Relational Algebra)?
Inductive Databases (Mannila)
Relational Calculus + Statistical Quantifiers
Privacy Implications
Can we build accurate data models while preserving the privacy of individual records?
Randomization (Agrawal & Srikant): Replace x by x+y, where y is drawn from a known distribution
Anonymization (Crypto literature)
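To make the randomization idea concrete, here is a minimal sketch, assuming zero-mean Gaussian noise whose standard deviation is public. It illustrates only that aggregate statistics (mean and variance) remain recoverable after each x is replaced by x+y. It is a simple moment-based illustration, not the reconstruction procedure of Agrawal & Srikant. The function names, the noise level, and the synthetic "ages" data are all hypothetical.

```python
import random
import statistics


def randomize(values, noise_std, rng):
    """Replace each x by x + y, with y drawn from Normal(0, noise_std)."""
    return [x + rng.gauss(0.0, noise_std) for x in values]


def estimate_original_stats(perturbed, noise_std):
    """Estimate mean and variance of the original values from perturbed ones.

    Since y has mean 0 and is independent of x:
        E[x + y]   = E[x]
        Var[x + y] = Var[x] + noise_std ** 2
    """
    mean = statistics.fmean(perturbed)
    # Subtract the known noise variance; clamp at 0 against sampling error.
    variance = max(statistics.pvariance(perturbed) - noise_std ** 2, 0.0)
    return mean, variance


# Synthetic data (assumption): 20,000 ages, uniform between 20 and 60.
rng = random.Random(0)
ages = [rng.uniform(20, 60) for _ in range(20000)]

# Each individual record is masked by noise with a known distribution.
noisy_ages = randomize(ages, noise_std=15.0, rng=rng)

# The miner sees only noisy_ages and the noise parameters.
est_mean, est_var = estimate_original_stats(noisy_ages, noise_std=15.0)
print(f"true mean {statistics.fmean(ages):.2f}, estimated {est_mean:.2f}")
print(f"true var  {statistics.pvariance(ages):.2f}, estimated {est_var:.2f}")
```

Individual values in noisy_ages can be far from the true ages, yet the aggregate estimates stay close because the noise distribution is known and averages out over many records.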
Web Mining: Beyond Click
Mining knowledge bases from the web
Malicious Spam
Brin’s Book experiment
etc. etc.
Web Mining: Beyond hrefs
What other social behaviors exist on the web, and how can we make use of them?
Viral marketing paper in this conference
etc. etc.
Actionable Patterns
Principled use of domain knowledge for discarding uninteresting patterns
Papers in the recent KDD conferences
Simultaneous mining over multiple data types
Not just
Relational tables
Time series
Textual documents
But patterns across all of them
Some more problems
Online, incremental algorithms over data
When to retire the past data
Long sequential patterns
Discovering richer patterns (trees and dags)
Automatic, data-dependent selection of algorithm parameters
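As one possible illustration of the first two items (online, incremental computation and retiring past data), here is a minimal sketch of a running mean with exponential forgetting. Exponential decay is only one of several ways to retire old data (sliding windows are another), and the slides do not prescribe any. The class name and the decay parameter are hypothetical.

```python
class DecayedMean:
    """Mean over a stream, updated in O(1) per item, that forgets old data.

    Each update multiplies the weight of all earlier items by `decay`,
    so an item seen k steps ago contributes with weight decay ** k.
    decay = 1.0 keeps everything (ordinary running mean).
    """

    def __init__(self, decay):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.weight = 0.0        # sum of the weights of all items seen
        self.weighted_sum = 0.0  # sum of weight * value

    def update(self, x):
        """Fold one new observation into the summary and return the mean."""
        self.weight = self.decay * self.weight + 1.0
        self.weighted_sum = self.decay * self.weighted_sum + x
        return self.weighted_sum / self.weight


# The stream changes level; the decayed summary follows the recent data.
tracker = DecayedMean(decay=0.5)
for value in [0.0] * 100 + [10.0] * 10:
    current = tracker.update(value)
print(f"decayed mean after the shift: {current:.3f}")
```

The summary never stores the stream itself, which is the property that makes such algorithms usable over unbounded data. The decay rate is exactly the kind of parameter that ideally would be chosen from the data rather than by hand.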
What not to work on?
The field is too young!
Too early to say we don’t need new
Let every flower bloom!!!
Impressive results of the PVSM algorithm
Emphasize evaluation and benchmarks
Interesting research issues
Applications most likely to benefit from data mining
Web applications (I think)
Bioinformatics (I hope!)
Insufficient skill base (Education)
Grand Challenge
What’s there
What has changed
Across sovereign data repositories
The true delight is in the finding out, rather than in the knowing.
Isaac Asimov