A Conversation with Professor Zhongzhi Shi

Zhongzhi Shi
Chinese Academy of Sciences
Beijing 100090, China
[email protected]

1. Please share with us your view on the history and important milestones of the Chinese KDD research and application areas.

Knowledge Discovery from Data (KDD), or data mining, is a broad area that integrates techniques from several fields, including machine learning, statistics, pattern recognition, artificial intelligence, and database systems, for the analysis of large volumes of data. A large number of data mining algorithms rooted in these fields have been developed to perform different data analysis tasks. In China, we can divide the history of KDD into three milestones: one related to machine learning algorithms, one to integrated knowledge discovery from datasets, and one to distributed and parallel KDD.

• Machine Learning Algorithms
The First Chinese Conference on Machine Learning was held at Huang-shan Mountain in 1987 and was chaired by Professors Qingsheng Cai, Zhongzhi Shi and Shifu Chen. At that time, Chinese researchers mainly focused on individual learning algorithms, such as inductive learning, analogical learning, case-based learning, explanation-based learning, genetic learning, and connectionist learning. The book “Principles of Machine Learning”, written by Zhongzhi Shi and published by International Academic Publishers in 1992, reflects the progress of machine learning at that time.

• Integrated Knowledge Discovery from Data
In 1989, Piatetsky-Shapiro and Fayyad presented the terminology “Knowledge Discovery from Databases”. The book “Advances in Knowledge Discovery and Data Mining”, published in 1996, and another book, “Using the Data Warehouse”, published in China in 1994, helped promote KDD research in China. During that time, rough sets, SVM, and ensemble learning all became very hot research topics, and many papers on these topics were published. Special conferences dedicated to these areas were set up.
The book “Knowledge Discovery” (First Edition) by Zhongzhi Shi, published by Tsinghua University Press in 2002, gave a summary of the above research areas.

• Distributed and Parallel KDD
Multi-agent systems, the semantic Web, Grid and Cloud Computing provide good platforms for KDD and data mining. Other Internet-related topics and data sources, including the Internet of Things, video, image, TV, and medical data, all provide important resources for data mining. The book “Knowledge Discovery” (Second Edition) by Zhongzhi Shi, published by Tsinghua University Press in 2011, gave a summary of the research on distributed and parallel KDD.

2. Please describe your expertise and contribution to KDD.

I work in the Key Lab of Intelligent Information Processing, Institute of Computing Technology, the Chinese Academy of Sciences. Our lab specializes in the following research areas:

• Attribute Theory in Learning System, 1990
• Memory networks for representation of case-based reasoning, 1992
• A decision-tree learning algorithm based on bias shift, 1998
• Bayesian network based latent semantic analysis algorithm for semi-supervised text mining, 2001
• HyperSurface Classifier algorithms, 2002
• Tolerance granular space model, 2005
• Distributed data mining on agent grid, 2007
• Feature binding model Bayesian Link Field Network, 2008
• Formal definition of partition entropy and quasi-distance, 2009. The quasi-distance possesses three properties: symmetry, the triangle law, and the minimum reachable. These properties ensure that the quasi-distance naturally lends itself as an external measure for clustering validation.
• Fusing semantic aspects for image annotation and retrieval, 2010
• A parallel incremental extreme SVM classifier, 2011
• A two-phase cross-domain transfer learning method for text classification, 2011
• Data mining tools, such as MSMiner (2000)
• Cloud computing platform based data mining tools: PDMiner (2008), COMS (2010)

3. Please share with us your view on the future of KDD both in China and the world.

In my view, the following topics are important for the future development of KDD:

• How to discover knowledge from cross-media datasets or unstructured information?
• Research on parallel and distributed methods and algorithms for massive data mining.
• Statistical learning theory.
• Transfer learning.
• Applications, such as search engines, recommendation systems, opinion systems, and mining environments for data centers combined with cloud computing.

About the author:
Professor Zhongzhi Shi is a professor at the Institute of Computing Technology, Chinese Academy of Sciences. He graduated from the Graduate University of Chinese Academy of Sciences in 1968. His research interests include intelligence science, machine learning, multi-agent systems, semantic Web and image processing. Professor Shi has published 14 monographs, 15 books and more than 450 research papers in journals and conferences. He won a 2nd-Grade National Award for Science and Technology Progress of China in 2002, and two 2nd-Grade Awards for Science and Technology Progress of the Chinese Academy of Sciences in 1998 and 2001, respectively. He is a fellow of CCF and CAAI, a senior member of IEEE, a member of AAAI and ACM, and Chair of WG 12.2 of IFIP. He serves as Editor-in-Chief of the Series on Intelligence Science and Editor-in-Chief of the International Journal of Intelligence Science.
http://www.intsci.ac.cn/shizz/

SIGKDD Explorations, Volume 13, Issue 2, Pages 89-90