120 International Journal of Information Retrieval Research, 3(4), 120-140, October-December 2013
Interactive Visual Analytics of
Databases and Frequent Sets
Carson K.S. Leung, University of Manitoba, Winnipeg, Canada
Christopher L. Carmichael, University of Manitoba, Winnipeg, Canada
Patrick Johnstone, University of Manitoba, Winnipeg, Canada
David Sonny Hung-Cheung Yuen, University of Manitoba, Winnipeg, Canada
ABSTRACT
In numerous real-life applications, large databases can be easily generated. Implicitly embedded in these
databases is previously unknown and potentially useful knowledge such as frequently occurring sets of items,
merchandise, or events. Different algorithms have been proposed for managing and retrieving useful information from these databases. Various algorithms have also been proposed for mining these databases to find
frequent sets, which are usually presented in a lengthy textual list. As “a picture is worth a thousand words”,
the use of visual representations can enhance user understanding of the inherent relationships among the
mined frequent sets. Many of the existing visualizers were not designed to visualize these mined frequent sets.
In this journal article, an interactive visual analytic system is proposed for providing visual analytic solutions
to the frequent set mining problem. The system enables the management, visualization, and advanced analysis
of the original transaction databases as well as the frequent sets mined from these databases.
Keywords:
Association Rule Mining, Data Management, Data Mining, Data Visualization, Frequent Set
Mining, Information Retrieval, Interactive Technologies, Retrieval and Visualization of the
Mining Results, Visual Analytics
INTRODUCTION
As technology advances, large volumes of data—
such as (i) structured data in relational or transactional databases and (ii) semi-structured data in
text documents or the World Wide Web—can be
generated easily. Embedded within these data is
potentially useful knowledge that professionals,
researchers, students, and practitioners want to
discover. This calls for data mining (Frawley et
al., 1991), which aims to “retrieve” or discover
implicit, previously unknown and potentially
useful information or knowledge from large
volumes of data. A common data mining task is
frequent set mining (Agrawal et al., 1993), which
analyzes the data to find frequently occurring
sets of items (e.g., frequently collocated events,
frequently purchased bundles of merchandise
DOI: 10.4018/ijirr.2013100107
Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
products). These frequent sets serve as building
blocks for many other data mining tasks such
as the mining of association rules, correlation,
sequences, episodes, emerging patterns, web
access patterns, maximal patterns, closed frequent sets, and constrained patterns (Pasquier
et al., 1999; Pei et al., 2000; Lakshmanan et al.,
2003; Leung et al., 2007; Kumar et al., 2012;
Leung et al., 2012). Moreover, these frequently
occurring sets of items can be used in mining
tasks like classification (e.g., associative classification (Liu, 2009)). Frequent sets can also
answer many questions that help users make
important decisions for real-life applications
in different domains such as health care, bioinformatics, social science, and business.
For example, knowing the sets of frequently
purchased merchandise helps store managers
make informed business decisions such as item
shelving; finding the sets of popular elective
courses helps students select the combination of
courses they wish to take; and discovering the
sets of frequently occurring patterns in genes
helps professionals and researchers get a better
understanding of certain biomedical or social
behaviours of human beings.
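To make the notion of a frequent set concrete, the following minimal Python sketch counts how many transactions contain each set of items and keeps the sets that meet a minimum support threshold. It is an illustration only, not the algorithm used by the system described in this article: the function name `frequent_sets`, the parameter `minsup`, and the toy shopping database are all assumptions introduced here, and the exhaustive enumeration is exponential in transaction length, so it suits only tiny examples.

```python
from collections import Counter
from itertools import combinations


def frequent_sets(transactions, minsup):
    """Return {itemset: support} for itemsets found in at least minsup transactions.

    Brute-force illustration: every non-empty subset of every transaction
    is counted once per transaction that contains it.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: sup for itemset, sup in counts.items() if sup >= minsup}


# Hypothetical toy transaction database (not taken from the article).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

result = frequent_sets(transactions, minsup=2)

# Print the frequent sets, smallest first, with their support counts.
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Practical miners avoid this exhaustive enumeration by pruning the search space, but the quantity they compute for each reported set is the same support count.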
Frequent set mining has drawn the attention of many researchers as it has played
important roles in many data mining tasks and
has contributed to various real-life applications.
Since the introduction of the frequent set mining problem (Agrawal et al., 1993), numerous
algorithms (Han et al., 2007; Cheng & Han,
2009) have been proposed to mine frequent
sets from databases. Most of these algorithms
return the mining results in textual forms such
as a very long unsorted list of frequent sets of
items. Presenting a large number of frequent sets
in such a lengthy list makes the results hard
to understand. Consequently, users
may not easily discover the useful knowledge
that is embedded in the databases.
As “a picture is worth a thousand words”,
a visual representation matches the power of
the human visual and cognitive system. Hence,
a visual representation of the frequent
sets makes it easier for users (e.g., professionals,
researchers, students, practitioners) to view and
analyze the mining results than
a lengthy textual list of frequent sets
of items does. This leads to visual analytics, which
is the science of analytical reasoning supported
by interactive visual interfaces (Thomas &
Cook, 2005; Keim et al., 2008; Keim et al.,
2009a; Keim et al., 2009b). Since numerous
frequent set mining algorithms (which analyze
large volumes of data to find frequent sets of
items) have been proposed, what we need are
interactive systems for visualizing the mining
results so that we can take advantage of both
worlds (i.e., combine advanced data analysis
with visualization).
Many existing visualization systems were
built to visualize data or the mining results (i.e.,
the input or output of the knowledge discovery
process, respectively). Existing visualization systems for the latter mostly show the
knowledge discovered for other data mining
tasks—such as groups of similar objects (for
clustering), decision trees (for classification),
and rules (for association rule mining)—rather
than frequently occurring sets of items (for frequent set mining). In the current journal article,
we present an interactive visual analytic system
(iVAS) for effective management, advanced
analysis, and interactive visualization of large
databases—from which frequent sets of items
can be mined.
This journal article is an expansion of our
book chapter about visual analytics and interactive technologies on data, text, and web mining
(Leung & Carmichael, 2011). Here, we enhance
and update the research found in the book
chapter, and incorporate recent developments
and literature. In addition, new materials in the
current journal article include (i) usage of glyph
icons for intuitive representation of database
transactions and the frequent sets mined from
the database, and (ii) application of the concepts of
closed frequent sets and constraints to reduce
visual representation space and computation.
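The space saving from closed frequent sets can be illustrated with a short Python sketch. It relies on the standard definition, under which a frequent set is closed if no proper superset has the same support, and it treats a constraint as a simple predicate on itemsets. The names `closed_sets` and `frequent`, the toy supports, and the example constraint are assumptions made for illustration; this is not the article's implementation.

```python
def closed_sets(frequent):
    """Keep only the frequent sets with no proper superset of equal support."""
    return {
        itemset: sup
        for itemset, sup in frequent.items()
        if not any(
            itemset < other and sup == other_sup  # '<' tests for a proper subset
            for other, other_sup in frequent.items()
        )
    }


# Hypothetical frequent sets with their support counts.
frequent = {
    frozenset({"a"}): 3,
    frozenset({"b"}): 2,
    frozenset({"a", "b"}): 2,
}

closed = closed_sets(frequent)

# Hypothetical user constraint: show only the sets that contain item "b".
constrained = {itemset: sup for itemset, sup in closed.items() if "b" in itemset}

for itemset, sup in closed.items():
    print(sorted(itemset), sup)
```

Here {b} is dropped because its superset {a, b} has the same support, so two sets need to be displayed instead of three. The constraint narrows the display further to {a, b} alone.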