Mainlining Data Mining
Jim Gray, Microsoft
Panel talk at ICDE 2000, San Diego, 2 Mar 2000

Is data mining still a niche technology?
• 97,363 items on Northern Light re “data mining”
• 9,075,288 items re “data base” or “database”
• Is 100,000 items a niche? (OR: 14K, XML: 250K)
• Today, data mining tools are for experts (statisticians).
  (Decision Trees, Clusters, K-means, Neural nets…)
• High tech and High Touch, a.k.a. consulting and license fees.
  And the vendors like it that way.
• Claim that you MUST understand the technology to use it.

But... The Petabytes are Coming!!
• We will be/are drowning in data/email/web…
• Abstraction & categorization are key technologies
• But,
  – They have to work.
  – They have to be trivial to learn.
• Successful ubiquitous data mining (clustering/classifiers…)
  – Mail filters/classifiers
  – Resume readers
  – Shopping recommendations, community finders
  – Web search engines

Key technical/research issues for transition to the mainstream?
PROCESS PROBLEMS:
• Getting data into the tool is hell
• Scrubbing data is hell
• Then comes the easy part: mining
• Then comes the really hard part: visualization and understanding
• Most of us:
  – Can’t understand neural nets (that’s bad).
  – Can’t understand statistics (that’s a fact).

Key technical/research issues for transition to the mainstream?
Opportunities: It’s not just numbers
• Text mining
• Time series
• Domain specific
  – Web logs
  – Protein patterns
  – Spatial (e.g. geology, astronomy)
  – Image

New opportunities for KDM?
• Make data capture/scrub/import trivial
• Provide intuitive manipulation interfaces
• Provide simpler analysis concepts
  – support/confidence concept
  – precision/recall ranking
  – pivot & rollup & cube
• Provide interactive visual data explorer.
• Case in point: I have yet to see a nice data cube visualizer.
[Figure: data cube of Sum by Make, Year, and Color (RED, WHITE, BLUE), with faces By Make, By Year, By Color, By Make & Year, By Make & Color, By Color & Year]

Research challenges that will impact data mining?
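The “simpler analysis concepts” named in the talk (support/confidence, precision/recall, pivot & rollup & cube) each come down to a few lines of arithmetic. Below is a minimal Python sketch, assuming small in-memory toy data. The baskets, the car-sales rows, and the function names (`support`, `confidence`, `precision_recall`, `data_cube`) are hypothetical illustrations, not code from the talk. The cube function follows the usual definition of a CUBE: one aggregate for every subset of the grouping columns, with "ALL" marking a rolled-up dimension, which gives the faces and edges sketched in the data cube figure.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical helpers for illustration; names and data are assumptions.

def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= set(b)) / len(baskets)

def confidence(baskets, lhs, rhs):
    """Rule lhs -> rhs: support(lhs ∪ rhs) / support(lhs).
    Assumes lhs occurs in at least one basket."""
    return support(baskets, set(lhs) | set(rhs)) / support(baskets, lhs)

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

def data_cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; "ALL" marks a rolled-up dimension."""
    cube = defaultdict(int)
    for row in rows:
        # Each row contributes to 2^len(dims) cells, one per grouping.
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(row[d] if d in kept else "ALL" for d in dims)
                cube[key] += row[measure]
    return dict(cube)

if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "beer"},
               {"bread", "milk", "eggs"}, {"milk"}]
    print(support(baskets, {"bread", "milk"}))        # 0.5
    print(confidence(baskets, {"bread"}, {"milk"}))   # about 0.667
    print(precision_recall({1, 2, 3, 4}, {2, 4, 5}))  # (0.5, about 0.667)

    sales = [
        {"make": "Chevy", "year": 1994, "color": "RED", "units": 5},
        {"make": "Chevy", "year": 1995, "color": "WHITE", "units": 3},
        {"make": "Ford", "year": 1994, "color": "BLUE", "units": 4},
    ]
    cube = data_cube(sales, ["make", "year", "color"], "units")
    print(cube[("ALL", "ALL", "ALL")])    # 12
    print(cube[("Chevy", "ALL", "ALL")])  # 8
    print(cube[("ALL", 1994, "ALL")])     # 9
```

The naive cube loop updates 2^N cells per input row (8 here, for three dimensions), which is fine for a handful of dimensions. For a distributive aggregate such as SUM, larger systems usually compute the coarser groupings from the finer ones instead of rescanning the rows.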
• Simpler analysis concepts
• Visualization tools to navigate data
• Better algorithms = Better answers