Data Mining Technologies for Digital Libraries and Web Information Systems
Ramakrishnan Srikant
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120, USA
[email protected]
In the first half of the talk, I will discuss data mining technologies that can improve browsing and searching. Consider the problem of merging documents from different categorizations (taxonomies) into a single master categorization. Current classifiers ignore the implicit similarity information present in the source categorizations. I will show that incorporating this information into the classification model can substantially improve classification accuracy [1]. Next, I will demonstrate a novel search technology that treats numbers as first-class objects and thus yields dramatically better results than current Web search engines when searching over product descriptions or other number-rich documents [2].
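
To make the idea of numbers as first-class objects concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm of [2]: the helper names (extract_numbers, distance, search) and the relative-distance scoring rule are assumptions made for this example. Each query number is matched to the closest number in a document, and documents are ranked by the total mismatch.

```python
import re

# Matches integers and simple decimals such as "256" or "2.4".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return every numeric literal found in the text as a float."""
    return [float(token) for token in NUMBER.findall(text)]


def distance(query_numbers, doc_numbers):
    """Sum, over the query numbers, the relative distance to the closest
    number in the document (hypothetical scoring rule for illustration)."""
    if not doc_numbers:
        return float("inf")  # a document without numbers cannot match
    total = 0.0
    for q in query_numbers:
        total += min(abs(q - d) / max(abs(q), 1e-9) for d in doc_numbers)
    return total


def search(query, documents):
    """Rank documents from the best to the worst numeric match."""
    query_numbers = extract_numbers(query)
    return sorted(
        documents,
        key=lambda doc: distance(query_numbers, extract_numbers(doc)),
    )


documents = [
    "Laptop, 512 GB disk, 16 GB memory, 1.8 GHz",
    "Laptop, 256 GB disk, 8 GB memory, 2.4 GHz",
    "Desktop, 1000 GB disk, 32 GB memory, 3.2 GHz",
]
best = search("250 GB 8 GB 2.5 GHz", documents)[0]
```

A keyword engine would find no exact match for "250" or "2.5" in any of these descriptions, whereas the numeric ranking above places the 256 GB, 2.4 GHz laptop first.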
The second half of the talk will focus on privacy. I will give a brief introduction to the field of private information retrieval [3], which allows users to retrieve documents without the library learning which document was retrieved. I will then cover the exciting new research area of privacy preserving data mining [4] [5], which allows us to build accurate data mining models without access to precise information in individual data records, thus finessing the potential conflict between privacy and data mining.
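
As an illustration of how retrieval can hide the requested item, the following Python sketch implements a simple two-server XOR scheme in the spirit of [3]. It assumes two non-colluding servers that hold identical copies of a database of bits; the function names are hypothetical. Communication here is linear in the database size, so this is the basic idea only, not the more communication-efficient constructions.

```python
import secrets


def make_queries(n, i):
    """Client: choose a uniformly random subset of positions for server 1,
    and the same subset with position i toggled for server 2."""
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}  # symmetric difference flips membership of i only
    return s1, s2


def answer(database, subset):
    """Server: return the XOR of the database bits at the given positions."""
    bit = 0
    for j in subset:
        bit ^= database[j]
    return bit


def retrieve(database, i):
    """Client: combine both answers. Every position except i appears in
    both subsets or in neither, so it cancels and only bit i remains."""
    s1, s2 = make_queries(len(database), i)
    return answer(database, s1) ^ answer(database, s2)


database = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [retrieve(database, i) for i in range(len(database))]
```

Each server sees a uniformly random subset on its own, so neither can tell which position the user wanted unless the two servers compare their queries.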
References
1. Rakesh Agrawal and Ramakrishnan Srikant. On catalog integration. In Proc. of the Tenth Int'l World Wide Web Conference, Hong Kong, May 2001.
2. Rakesh Agrawal and Ramakrishnan Srikant. Searching with numbers. In Proc. of the Eleventh Int'l World Wide Web Conference, Honolulu, Hawaii, May 2002.
3. Benny Chor, Oded Goldreich, Eyal Kushilevitz, and Madhu Sudan. Private information retrieval. In IEEE Symposium on Foundations of Computer Science, pp. 41-50, 1995.
4. Rakesh Agrawal and Ramakrishnan Srikant. Privacy preserving data mining. In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 439-450, Dallas, Texas, May 2000.
5. Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke. Privacy preserving mining of association rules. In Proc. of the 8th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, July 2002.