Data Mining Technologies for Digital Libraries and Web Information Systems

Ramakrishnan Srikant
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120, USA
[email protected]

In the first half of the talk, I will discuss data mining technologies that can result in better browsing and searching. Consider the problem of merging documents from different categorizations (taxonomies) into a single master categorization. Current classifiers ignore the implicit similarity information present in the source categorizations. I will show that by incorporating this information into the classification model, classification accuracy can be substantially improved [1]. Next, I will demonstrate novel search technology that treats numbers as first-class objects, and thus yields dramatically better results than current Web search engines when searching over product descriptions or other number-rich documents [2].

The second half of the talk will focus on privacy. I will give a brief introduction to the field of private information retrieval [3], which allows users to retrieve documents without the library identifying which document was retrieved. I will then cover the exciting new research area of privacy preserving data mining [4] [5], which allows us to build accurate data mining models without access to precise information in individual data records, thus finessing the potential conflict between privacy and data mining.

References

1. Rakesh Agrawal and Ramakrishnan Srikant. On catalog integration. In Proc. of the Tenth Int'l World Wide Web Conference, Hong Kong, May 2001.
2. Rakesh Agrawal and Ramakrishnan Srikant. Searching with numbers. In Proc. of the Eleventh Int'l World Wide Web Conference, Honolulu, Hawaii, May 2002.
3. Benny Chor, Oded Goldreich, Eyal Kushilevitz, and Madhu Sudan. Private information retrieval. In IEEE Symposium on Foundations of Computer Science, pp. 41-50, 1995.
4. Rakesh Agrawal and Ramakrishnan Srikant. Privacy preserving data mining. In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 439-450, Dallas, Texas, May 2000.
5. Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke. Privacy preserving mining of association rules. In Proc. of the 8th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, July 2002.
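
Illustrative note. To make the private information retrieval idea from the abstract concrete, the following is a minimal sketch of the simple two-server XOR construction. It is not taken from the talk. It assumes two non-colluding servers that each hold an identical copy of the library, and documents padded to equal length. All function names (`make_queries`, `answer`, `retrieve`) are hypothetical. Each server sees only a uniformly random subset of indices, so neither can tell which document was requested.

```python
import secrets


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(n, index):
    """Client side: build one query per server.

    Server A gets a uniformly random subset of {0, ..., n-1}.
    Server B gets the same subset with `index` toggled (added if
    absent, removed if present). Each subset on its own is uniformly
    random, so it reveals nothing about `index`.
    """
    subset_a = {i for i in range(n) if secrets.randbits(1)}
    subset_b = subset_a ^ {index}  # symmetric difference toggles `index`
    return subset_a, subset_b


def answer(database, subset):
    """Server side: XOR together every requested record."""
    result = bytes(len(database[0]))  # all-zero buffer of record length
    for i in subset:
        result = xor_bytes(result, database[i])
    return result


def retrieve(database, index):
    """Run the whole exchange; both servers are simulated locally here."""
    query_a, query_b = make_queries(len(database), index)
    reply_a = answer(database, query_a)  # would run on server A
    reply_b = answer(database, query_b)  # would run on server B
    # Records in both subsets cancel out; only database[index] survives.
    return xor_bytes(reply_a, reply_b)


if __name__ == "__main__":
    library = [b"doc-zero", b"doc-one!", b"doc-two!", b"doc-tri!"]
    print(retrieve(library, 2))
```

In this sketch each query is as long as the library has documents, so communication grows linearly with the library size. The work cited as [3] studies schemes that reduce this cost.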