Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
120 International Journal of Information Retrieval Research, 3(4), 120-140, October-December 2013 Interactive Visual Analytics of Databases and Frequent Sets Carson K.S. Leung, University of Manitoba, Winnipeg, Canada Christopher L. Carmichael, University of Manitoba, Winnipeg, Canada Patrick Johnstone, University of Manitoba, Winnipeg, Canada David Sonny Hung-Cheung Yuen, University of Manitoba, Winnipeg, Canada ABSTRACT In numerous real-life applications, large databases can be easily generated. Implicitly embedded in these databases is previously unknown and potentially useful knowledge such as frequently occurring sets of items, merchandise, or events. Different algorithms have been proposed for managing and retrieving useful information from these databases. Various algorithms have also been proposed for mining these databases to find frequent sets, which are usually presented in a lengthy textual list. As “a picture is worth a thousand words”, the use of visual representations can enhance user understanding of the inherent relationships among the mined frequent sets. Many of the existing visualizers were not designed to visualize these mined frequent sets. In this journal article, an interactive visual analytic system is proposed for providing visual analytic solutions to the frequent set mining problem. The system enables the management, visualization, and advanced analysis of the original transaction databases as well as the frequent sets mined from these databases. Keywords: Association Rule Mining, Data Management, Data Mining, Data Visualization, Frequent Set Mining, Information Retrieval, Interactive Technologies, Retrieval and Visualization of the Mining Results, Visual Analytics INTRODUCTION As technology advances, large volumes of data— such as (i) structured data in relational or transactional databases and (ii) semi-structured data in text documents or the World Wide Web—can be generated easily. Embedded within these data is potentially useful knowledge that professionals, researchers, students, and practitioners want to discover. This calls for data mining (Frawley et al., 1991), which aims to “retrieve” or discover implicit, previously unknown and potentially useful information or knowledge from large volumes of data. A common data mining task is frequent set mining (Agrawal et al., 1993), which analyzes the data to find frequently occurring sets of items (e.g., frequently collocated events, frequently purchased bundles of merchandise DOI: 10.4018/ijirr.2013100107 Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited. International Journal of Information Retrieval Research, 3(4), 120-140, October-December 2013 121 products). These frequent sets serve as building blocks for many other data mining tasks such as the mining of association rules, correlation, sequences, episodes, emerging patterns, web access patterns, maximal patterns, closed frequent sets, and constrained patterns (Pasquier et al., 1999; Pei et al., 2000; Lakshmanan et al., 2003; Leung et al., 2007; Kumar et al., 2012; Leung et al., 2012). Moreover, these frequently occurring sets of items can be used in mining tasks like classification (e.g., associative classification (Liu, 2009)). Frequent sets can also answer many questions that help users make important decisions for real-life applications in different domains such as health care, bioinformatics, social science, as well as business. For example, knowing the sets of frequently purchased merchandise helps store managers make intelligent business decisions like item shelving, finding the sets of popular elective courses helps students select the combination of courses they wish to take, and discovering the sets of frequently occurring patterns in genes helps professionals and researchers get a better understanding of certain biomedical or social behaviours of human beings. Frequent set mining has drawn the attention of many researchers as it has played important roles in many data mining tasks and has contributed to various real-life applications. Since the introduction of the frequent set mining problem (Agrawal et al., 1993), numerous algorithms (Han et al., 2007; Cheng & Han, 2009) have been proposed to mine frequent sets from databases. Most of these algorithms return the mining results in textual forms such as a very long unsorted list of frequent sets of items. Presenting a large number of frequent sets in such a conventional lengthy list does not lead to ease of understanding. Consequently, users may not easily discover the useful knowledge that is embedded in the databases. As “a picture is worth a thousand words”, a visual representation matches the power of the human visual and cognitive system. Hence, having a visual representation of the frequent sets makes it easier for users (e.g., professionals, researchers, students, practitioners) to view and analyze the mining results when compared to presenting a lengthy textual list of frequent sets of items. This leads to visual analytics, which is the science of analytical reasoning supported by interactive visual interfaces (Thomas & Cook, 2005; Keim et al., 2008; Keim et al., 2009a; Keim et al., 2009b). Since numerous frequent set mining algorithms (which analyze large volumes of data to find frequent sets of items) have been proposed, what we need are interactive systems for visualizing the mining results so that we can take advantage of both worlds (i.e., combine advanced data analysis with visualization). Many existing visualization systems were built to visualize data or the mining results (i.e., the input or output of the knowledge discovery process, respectively). Existing visualization systems for the latter mostly show the knowledge discovered for other data mining tasks—such as groups of similar objects (for clustering), decision trees (for classification), and rules (for association rule mining)—rather than frequently occurring sets of items (for frequent set mining). In the current journal article, we present an interactive visual analytic system (iVAS) for effective management, advanced analysis, and interactive visualization of large databases—from which frequent sets of items can be mined. This journal article is an expansion of our book chapter about visual analytics and interactive technologies on data, text, and web mining (Leung & Carmichael, 2011). Here, we enhance and update the research found in the book chapter, and incorporate recent developments and literature. In addition, new materials in the current journal article include (i) usage of glyph icons for intuitive representation of database transactions and the frequent sets mined from the database, (ii) application of the concepts of closed frequent sets and constraints to reduce visual representation space and computation. Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited. 19 more pages are available in the full version of this document, which may be purchased using the "Add to Cart" button on the publisher's webpage: www.igi-global.com/article/interactive-visual-analytics-ofdatabases-and-frequent-sets/109665 Related Content On the Design and Implementation of Interactive XML Applications Jeff Brown, Rebecca Brown, Chris Velado and Ron Vetter (2013). Information Retrieval Methods for Multidisciplinary Applications (pp. 19-30). www.irma-international.org/chapter/design-implementation-interactive-xmlapplications/75898/ An Affinity Based Complex Artificial Immune System Wei Wang, Mei Chang, Xiaofei Wang and Zheng Tang (2013). Modern Library Technologies for Data Storage, Retrieval, and Use (pp. 165-177). www.irma-international.org/chapter/affinity-based-complex-artificialimmune/73773/ Design of a Least Cost (LC) Vertical Search Engine based on Domain Specific Hidden Web Crawler Sudhakar Ranjan and Komal Kumar Bhatia (2017). International Journal of Information Retrieval Research (pp. 19-33). www.irma-international.org/article/design-of-a-least-cost-lc-vertical-searchengine-based-on-domain-specific-hidden-web-crawler/177280/ Answering Top-k Keyword Queries on Relational Databases Myint Myint Thein and Mie Mie Su Thwin (2012). International Journal of Information Retrieval Research (pp. 36-57). www.irma-international.org/article/answering-top-keyword-queriesrelational/78313/ Ranking Tagged Resources Using Social Semantic Relevance Anjali Thukral, Hema Banati and Punam Bedi (2011). International Journal of Information Retrieval Research (pp. 15-34). www.irma-international.org/article/ranking-tagged-resources-usingsocial/64169/