The interaction between KM and DM is also shown by the current efforts on the construction of
automated systems for filtering association rules learned from medical transaction databases.
The availability
of a formal ontology allows the ranking of association rules by clarifying which rules confirm
available medical knowledge, which are surprising but plausible, and, finally, which ones should be
filtered out (Raj et al., 2008).
/xvi
Other approaches, such as association and classification rules, which combine
the declarative nature of rules with the availability of learning mechanisms including inductive
logic
programming, hold great potential for effectively merging DM and KM (Amini et al., 2007).
/xvi
A first challenge in discovering knowledge from local patterns in SAGE data is to perform the local
pattern
extractions. Recall that only a few years ago it was impossible to mine such patterns in large
datasets, and
only association rules with a rather high frequency threshold were used (Becquet et al., 2002).
/262
This chapter gives a summary of our recent experience in mining of transcriptomic data. The
chapter
accentuates the potential of genomic background knowledge stored in various formats such as
free texts,
ontologies, pathways, links among biological entities, etc. It shows the ways in which
heterogeneous
background knowledge can be preprocessed and subsequently applied to improve various
learning and
data mining techniques. In particular, the chapter demonstrates an application of background
knowledge
in the following tasks:
• Relational descriptive analysis
• Constraint-based knowledge discovery
• Feature selection and construction (and its impact on classification accuracy)
• Quantitative association rule mining
/269
Association rule (AR) mining can overcome these drawbacks; however, transcriptomic data
represent
a difficult mining context for association rules. First, the data are high-dimensional (they typically
contain several thousand attributes), which calls for an algorithm scalable in the number of
attributes.
Second, expression values are typically quantitative variables. This variable type further
increases
computational demands and, moreover, may result in an output with a prohibitive number of
redundant
rules. Third, the data are often noisy, which may also cause a large number of rules of little
significance.
In this section we discuss the above-mentioned bottlenecks and present results of mining
association
rules using an alternative approach to quantitative association rule mining. We also demonstrate
a way
in which background genomic knowledge can be used to prune the search space and reduce
the number
of derived rules.
/284
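As a concrete note on the quantitative-variable bottleneck above, here is a minimal Python sketch (my illustration, not the chapter's pipeline; the gene names and the quantile threshold are invented) of the usual workaround: binarize each expression attribute into over-expression items so that a Boolean association rule miner can be applied.

# Minimal sketch (not the authors' pipeline): turn quantitative expression
# values into Boolean "items" so a standard association rule miner can be used.
samples = {
    "s1": {"geneA": 8.1, "geneB": 0.4, "geneC": 5.2},
    "s2": {"geneA": 7.9, "geneB": 0.3, "geneC": 1.1},
    "s3": {"geneA": 2.2, "geneB": 6.8, "geneC": 5.0},
}

def binarize(samples, high_quantile=0.75):
    # Emit one transaction per sample, containing a 'gene=HIGH' item for
    # every gene whose expression reaches the chosen quantile across samples.
    genes = sorted({g for values in samples.values() for g in values})
    transactions = {}
    for sample_id, values in samples.items():
        items = set()
        for g in genes:
            column = sorted(s[g] for s in samples.values())
            cutoff = column[int(high_quantile * (len(column) - 1))]
            if values[g] >= cutoff:
                items.add(f"{g}=HIGH")
        transactions[sample_id] = items
    return transactions

print(binarize(samples))   # one set of 'gene=HIGH' items per sample

The quantitative association rules discussed in the next excerpt avoid exactly this lossy discretization by operating on the numeric values directly.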
To avoid this discretization step, the authors of (Georgii, 2005) investigate the use of quantitative
association
rules, i.e., association rules that operate directly on numeric data and can represent the
cumulative
effects of variables.
/284
Association Rule: A rule, such as an implication or correlation, which relates elements co-occurring
within a dataset.
/291
Types of learning desired include classification learning to help with classifying unseen
examples, association
learning to determine any association among features (largely statistical) and clustering to
seek groups of examples that belong together.
/301
Instance (case, record): A single object of the world from which a model will be learned, or on
which a model will be used.
/349
The ever-increasing number of electronic patients’ records, specialized medical databases, and
various
computer-stored clinical files provides an unprecedented opportunity for automated and semiautomated
discovery of patterns, trends, and associations in medical data.
/351
Several methods are available for the integration of domain knowledge into DM in various
applications, using different representation methods. For example, ontologies are used in the
preprocessing
phase to reduce dimensionality, in the mining phase to improve clustering and association
rules, and in the post-processing phase to evaluate discovered patterns (Maedche et al., 2003).
/355
This is an unusual finding, since several medical studies demonstrate association between
OSA and diabetes mellitus (Punjabi & Beamer, 2005).
/370
According to these questions, various data mining tasks can be formulated. The descriptive tasks,
associations or segmentation (subgroup discovery), are used if the main purpose of the data
mining
is to find some relation between attributes or examples.
/382
The best-known GUHA procedure is ASSOC (Hájek & Havránek, 1978), which mines
for
association rules corresponding to various relations of two Boolean attributes. These rules have
much
stronger expressive power than "classical" association rules based on confidence and support; see
e.g.
(Agrawal et al., 1996).
/383
The STULONG data have been used during the Discovery Challenge workshops organized at
the ECML
conferences 2002, 2003 and 2004. About 25 different analyses, reported at these workshops,
covered
a variety of data mining tasks: segmentation, subgroup discovery, mining for association rules of
different
types, mining for sequential and episode rules, classification, regression.
/394
Association Rules: A relation between two Boolean attributes. It is defined by a condition
concerning
the contingency table of these attributes. Usually, this condition is given by lower thresholds for
confidence
and support.
/397
JI [08] Data Mining and Medical Knowledge Management, Cases and Applications, Petr Berka, Jan Rauch,
Djamel Abdelkader Zighed, IGI Global, 2009.
The important point here is that the association
of a proximity relationship over the domain of a variable can be seen
as a very creative activity. More importantly, the choice of proximity relationship
can play a significant role in the resolution of conflicting information.
/9
Data mining analyzes data previously collected; it is non-experimental.
There are several different data mining products. The most common are conditional
rules or association rules.
/31
At first glance, association rules seem to imply a causal or cause-effect
relationship. That is:
A customer’s purchase of both sausage and beer causes the customer
to also buy hamburger.
/31
The most popular market basket association rule development method
identifies rules of particular interest by screening for joint probabilities (associations)
above a specified threshold.
/32
Association rules are used to aid in making retail decisions. However, simple
association rules may lead to errors. Errors might occur either if causality is
recognized where there is none, or if the direction of the causal relationship
is wrong [20, 35].
/33
The problem of entity association is at the core of information mining
techniques. In this work we propose an approach that links the similarity of two
knowledge entities to the effort required to fuse them in one. This is implemented
as an iterative updating process.
/123
Associations between these patterns
are then found by applying a data mining technique based on rough set analysis.
Further work and applications to discover knowledge about patterns in
sequences are currently in process.
/159
Abstract. We say that there is an association between two sets of items when the
sets are likely to occur together in transactions. In information retrieval, an association
between two keyword sets means that they co-occur in a record or document.
In databases, an association is a rule latent in the databases whereby an attribute
set can be inferred from another.
Generally, the number of associations may be large, so we look for those that are
particularly strong. Maximal association rules were introduced by [3, 4], and there
is only one maximal association.
Rough set theory has been used successfully for data mining. By using this
theory, rules that are similar to maximal associations can be found. However, we
show that the rough set approach to discovering knowledge is much simpler than
the maximal association method.
/163
An association is said to exist between two sets of items when a transaction
containing one set is likely to also contain the other.
/163
In a database like this, the number of associations may be large. For example,
from a record “Canada, Iran, USA, crude, ship” we may discover a
number of associations such as
/164
Now standard association rules are based on the notion of frequent sets
of attributes which appear in many documents. We are concerned here with
maximal association rules, which are based on frequent maximal sets of attributes.
/164
Association rules are based on the notion of frequent sets of attributes which
appear in many documents. Maximal association rules are based on frequent
maximal sets of attributes which appear maximally in many documents. The
regular association rule X → Y means that if X then Y (with some confidence).
/182
Association rules (ARs) emerged in the domain of market basket analysis
and provide a convenient and effective way to identify and represent certain
dependencies between attributes in a database.
/203
The idea of association rule (AR) mining already dates back to Hájek et al.
(see e.g. [17, 18, 19]). Its application to market basket analysis gained high
popularity soon after its re-introduction by Agrawal et al. [1] at the beginning
of the 1990s. The straightforwardness of the underlying ideas as well as the
increasing availability of transaction data from shops certainly helped to this
end.
/205
Association rules can be rated by a number of quality measures, among
which support and confidence stand out as the two essential ones.
/206
Moreover, as information structures representing
associations such as synonymy, specification, and generalization between
linguistic terms seem to pop up in many domains that require the semantical
representation of language (such as information retrieval, and natural
language processing techniques like machine translation) and under a variety
of different names (apart from taxonomies one also encounters thesauri and
ontologies), this application is steadily gaining in importance.
/210
Data mining includes several kinds of technologies such as association rule
analysis, classification, clustering, sequential patterns, etc. In this chapter, we
focus on association rule mining, since it has been applied in many fields and is considered an important
method for discovering associations among data [23].
/225
An association rule is an expression X ⇒ Y meaning that if X occurs, then
Y occurs at the same time, where X and Y are sets of items, X ⊂ I, Y ⊂ I,
and X ∩ Y = Ø.
/225
A lot of research has been carried out in the past by using association
rules to build more accurate classifiers. The idea behind these integrated approaches
is to focus on a limited subset of association rules.
/253
As mentioned above, association and classification rules are the two main
learning algorithms in associative classification. The study of association rules
is focused on using exhaustive search to find all rules in data that satisfy
user-specified minimum support and minimum confidence criteria. On the
other hand, classification rules aim to discover a small set of rules to form
an accurate classifier.
/254
Association rules will search
globally for all rules that satisfy minimum support and minimum confidence
norms. They will therefore contain the full set of rules, which may incorporate
important information. The richness of the rules gives this technique the potential
of reflecting the true classification structure in the data [17].
/255
The research presented in this chapter focused on the integration of supervised
and unsupervised learning. In doing so, a modified version of the CBA algorithm,
which can be used to build classifiers based on association rules, has
been proposed.
/264
JI [10] Intelligent Data Mining, Techniques and Applications, Da Ruan, Guoqing Chen, Etienne E. Kerre,
Geert Wets, Springer, 2005.
Two key statistics
in association rule mining are support and confidence, which measure the number of
cases (i.e., the database transactions in association rule mining) that contain a rule’s
antecedent and consequent parts and the number of cases that contain the consequent
part among those containing the antecedent part, respectively.
/5
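To make the two statistics concrete, here is a small self-contained Python sketch (transactions invented for illustration, not from the book) that counts them exactly as described: support from the cases containing both parts, confidence from the consequent's share among the cases containing the antecedent.

# Illustrative sketch of support and confidence for a rule X -> Y.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "diaper"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Among transactions containing the antecedent, the fraction that
    # also contain the consequent.
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    return sum(consequent <= t for t in with_antecedent) / len(with_antecedent)

X, Y = {"bread", "butter"}, {"milk"}
print(support(X | Y, transactions))    # 0.25
print(confidence(X, Y, transactions))  # 0.5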
Different data mining tasks are discussed in the literature [4, 63, 50], such as regression,
classification, association rule mining, and clustering.
/54
Many studies have shown the limits of the support/confidence framework
used in Apriori-like algorithms to mine association rules.
/75
This approach
is based on the search for classification rules, which are association rules
whose consequent is a class label (that is to say, class association rules).
/76
An association
rule is a rule A → B, where A and B are two sets of items (also called itemsets) such
that A ≠ ∅, B ≠ ∅, and A ∩ B = ∅, meaning that, given a database D of transactions
(where each transaction is a set of items), whenever a transaction T contains A, then
T probably contains B also [3].
/76
Two other differences should be noted between associative classification and decision
trees. The first one is that association rules were built primarily for boolean
data [5], where only the presence of attributes is of interest for the user. The second,
linked to the previous one, is that the usual measures of the interest of a rule,
such as the confidence or the lift, take into account only the cases covered by the
rule. Thereafter, various algorithms have been developed to build association rules
between categorical and numerical attributes (see, for example, [21, 25]).
/77
An association rule is
defined by a database DB, a nonempty set A ⊂ 𝒜 (A is an itemset) called the antecedent,
and a nonempty set B ⊂ 𝒜 called the consequent, such that A ∩ B = ∅. We denote a rule
by A →_DB B (with DB written over the arrow), or simply A → B when there is no possible confusion.
/84
An association rule on a
given database is described by two itemsets. One can also speak about the contingency
table of an association rule, which leads us to the notion of descriptor system.
/85
JI [11] Data Mining, Special Issue in Annals of Information Systems, Robert Stahlbock, Sven F. Crone,
Stefan Lessmann, Springer, Nov. 2009.
David W. Cheung's technique for updating association rules in
large databases [5],
/102
JI [12] Data Mining In Time Series Databases, Mark Last, Abraham Kandel, Horst Bunke, World Scientific
Publishing, 2004.
Problem statement
The problem of mining association rules over market
basket analysis was introduced in (Agrawal,
Imielinski, & Swami, 1993; Agrawal & Srikant,
1994). The problem consists of finding associations
between items or itemsets in transactional
data. The data is typically retail sales in the form
of customer transactions, but can be any data that
can be modeled into transactions.
/33
It is known that algorithms for discovering association
rules generate an overwhelming number of
those rules.
/33
Since the introduction of association rules a decade
ago and the launch of the research in efficient frequent
itemset mining, the development of effective
approaches for mining large transactional databases
has been the focus of many research studies.
/53
This chapter examines the problem of mining
association patterns (Agrawal, Imielinski & Swami,
1993) from data sets with skewed support
distributions. Most of the algorithms developed so
far rely on the support-based pruning strategy to
prune the combinatorial search space.
/58
A better approach will be to have a measure
that can efficiently identify useful patterns even
at low levels of support and can be used to automatically
remove spurious patterns during the
association mining process. Omiecinski (2003)
recently introduced a measure called all-confidence
as an alternative to the support measure.
/59
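Omiecinski's all-confidence has a compact form: for an itemset X it is supp(X) divided by the largest support of any single item in X, so a pattern only scores high when every item in it predicts the rest well. A minimal sketch with toy data (my illustration, not the chapter's):

# Sketch of the all-confidence measure:
# all_confidence(X) = supp(X) / max over items i in X of supp({i}).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"caviar", "champagne"},
]

def supp(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

def all_confidence(itemset, transactions):
    max_single = max(supp({i}, transactions) for i in itemset)
    return supp(itemset, transactions) / max_single

# A low-support but tightly associated pair still scores high:
print(all_confidence({"caviar", "champagne"}, transactions))  # 1.0
# A pair dominated by one very frequent item scores lower:
print(all_confidence({"milk", "bread"}, transactions))        # 0.5 / 0.75 ≈ 0.67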
Besides all-confidence (Omiecinski, 2003),
other measures of association have been proposed
to extract interesting patterns in large data sets.
For example, Brin, Motwani, and Silverstein
(1997) introduced the interest measure and χ2 test
to discover patterns containing highly dependent
items. However, these measures do not possess
the desired anti-monotone property.
/60
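For comparison, the interest measure mentioned above (often called lift) relates the observed joint probability to the value expected under independence. A quick sketch with made-up probabilities:

# Sketch of the interest (lift) measure: P(X and Y) / (P(X) * P(Y)).
# Values near 1 suggest independence; larger values suggest positive dependence.
def interest(p_xy, p_x, p_y):
    return p_xy / (p_x * p_y)

print(interest(p_xy=0.03, p_x=0.10, p_y=0.12))   # 2.5  (co-occur 2.5x more than expected)
print(interest(p_xy=0.012, p_x=0.10, p_y=0.12))  # 1.0  (consistent with independence)

As the excerpt notes, unlike support, this measure does not possess the anti-monotone property, so it cannot be used on its own to prune the combinatorial search space.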
This chapter introduces a data mining method for the discovery of association rules from images
of scanned paper documents.
/176
JI [13] Data Mining Patterns, New Methods And Applications, Pascal Poncelet, Florent Masseglia,
Maguelonne Teisseire, IGI Global, 2008.
Data mining is a process concerned with uncovering patterns, associations,
anomalies and statistically significant structures in data (Fayyad et al., 1996). It
typically refers to the case where the data is too large or too complex to allow either
a manual analysis or analysis by means of simple queries.
/49
It is worth noting here that there is another class of algorithms that may also
be used to deliver nuggets: association rule algorithms (Agrawal, Imielinski &
Swami, 1993; Agrawal Mannila, Srikant, Toivonen & Verkamo, 1996). They were
developed for transaction data (also known as basket data). This type of data
contains information on transactions, for example, showing items that have been
purchased together. Association rule mining algorithms deliver a set of association
rules, often containing all associations between items above certain support and
confidence thresholds. The association rules are generally of the form “customers
that purchase bread and butter also get milk, with 98 % confidence.” This type of
rule is not constrained to have a particular value as output, or indeed to refer to any
particular attribute. Delivering all association rules in transactional data is a suitable
approach, since transactional data tends to contain few associations. Classification
datasets, however, tend to contain many associations, so delivering all association
rules for a classification dataset results in output of overwhelming size. Also
classification datasets often contain many numeric continuous attributes, and
association rule induction algorithms are not designed to cope with this type of data. Therefore,
although association rules can be used for classification (Bayardo, 1997;
Liu, Hsu & Ma, 1998), and even for partial classification or nugget discovery (Ali
et al., 1997), work is required to adapt the association algorithms to cope with
classification data, and with the problem of partial classification or nugget discovery.
/74
Also, the following association rule algorithms were chosen:
GRI: Generalised Rule Induction (Mallen & Bramer, 1995) is described as an
association rule algorithm, although it could also be considered as a partial
classification algorithm. It builds a table of the best N association rules, as
ranked by the J measure, where N is a parameter set by the user. In GRI the
output attribute can be chosen, and each rule produced can be used as a nugget
describing that output. They contain binary partitions for numeric attributes and
tests on a simple value for categorical attributes.
Apriori: The Apriori algorithm (Agrawal et al., 1993, 1996) is the most
prominent association rule algorithm. Pre-discretisation of numeric attributes
is necessary, since the algorithm can only handle categorical attributes. A
simple equal width discretisation scheme was used for this. The output of this
algorithm is not constrained to rules for a particular attribute, hence only the
nuggets relating to the class under scrutiny need to be analysed for the task of
nugget discovery. The Apriori rules contain simple value tests for categorical
attributes.
/89
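The equal-width pre-discretisation mentioned for Apriori above is simple to state: split each numeric attribute's range into k equally wide intervals and replace the values by interval labels. A minimal sketch (k and the data are illustrative):

# Minimal equal-width discretisation sketch, as used to prepare numeric
# attributes for a categorical-only miner such as Apriori.
def equal_width_bins(values, k):
    # Return a bin index in [0, k-1] for every value.
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0          # guard against a constant attribute
    return [min(int((v - lo) / width), k - 1) for v in values]

ages = [23, 31, 45, 52, 61, 38, 29]
print(equal_width_bins(ages, k=3))        # [0, 0, 1, 2, 2, 1, 0]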
The data clustering approach
was implemented in association with hierarchical clustering and graph-theoretical
techniques, and the network performance is illustrated using
several benchmark problems.
/231
Techniques for data mining include mining
association rules, data classification, generalization, clustering, and searching for
patterns (Chen, Han, & Yu, 1996). The focus of data mining is to reveal information
that is hidden and unexpected, as there is little value in finding patterns and
relationships that are already intuitive. By discovering hidden patterns and relationships
in the data, data mining enables users to extract greater value from their data
than simple query and analysis approaches.
/262
There are many data mining techniques being proposed in the literature. The
most common ones, which will be included in this section, are association rules
(Agrawal & Srikant, 1994), sequential patterns (Agrawal & Srikant 1995; Srikant
& Agrawal, 1996; Zaki 1998), classification (Agrawal, Ghosh, Imielinski, Iyer, &
Swami, 1992; Alsabti, Ranka, & Singh, 1998; Mehta, Agrawal & Rissanen, 1996;
Shafer, Agrawal, & Mehta, 1996), and clustering (Aggarwal, Procopiuc, Wolf, Yu,
& Park, 1999; Agrawal, Gehrke, Gunopulos, & Raghavan, 1998; Cheng, Fu &
Zhang, 1999; Guha, Rastogi & Shim, 1998; Ng & Han, 1994; Zhang, Ramakrishnan
& Livny, 1996).
/269
An association rule is a rule that implies certain association relationships
among a set of objects (such as "occur together" or "one implies the other") in a
database. Given a set of transactions, where each transaction is a set of items, an
association rule is an expression of the form X ⇒ Y, where X and Y are sets of items.
The intuitive meaning of such a rule is that transactions of the database which
contain X tend to also contain Y. An example of an association rule is "25% of transactions
that contain instant noodles also contain Coca-Cola; 3% of all transactions contain
both of these items". Here 25% is called the confidence of the rule and 3% the
support of the rule. The problem is to find all association rules that satisfy user-specified
minimum support and minimum confidence constraints.
/269
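The 25% confidence / 3% support figures above can be reproduced mechanically. The sketch below builds a hypothetical 100-transaction database matching them and checks the rule against user-specified minimum support and confidence, exactly as the problem statement requires.

# Sketch: verify the 25% confidence / 3% support example against a
# hypothetical 100-transaction database constructed to match those figures.
def evaluate_rule(X, Y, transactions, min_support, min_confidence):
    n = len(transactions)
    both = sum((X | Y) <= t for t in transactions)
    with_x = sum(X <= t for t in transactions)
    support = both / n
    confidence = both / with_x if with_x else 0.0
    return support, confidence, support >= min_support and confidence >= min_confidence

transactions = ([{"noodles", "cola"}] * 3     # both items
                + [{"noodles"}] * 9           # antecedent only
                + [{"bread"}] * 88)           # neither
print(evaluate_rule({"noodles"}, {"cola"}, transactions,
                    min_support=0.03, min_confidence=0.25))
# (0.03, 0.25, True)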
JI [14] Data Mining, A Heuristic Approach, Hussein Aly Abbass, Ruhul Amin Sarker, Charles S. Newton,
Idea Group Publishing, 2002.
In the paper "Latent Semantic Space for Web Clustering" by I-Jen Chiang,
T.Y. Lin, Hsiang-Chun Tsai, Jau-Min Wong, and Xiaohua Hu, latent semantic
space, in the form of some geometric structure in combinatorial topology and
hypergraph view, has been proposed for unstructured document clustering.
Their clustering work is based on a novel view that term associations of a given
collection of documents form a simplicial complex, which can be decomposed
into connected components at various levels. An agglomerative method for
finding geometric maximal connected components for document clustering is
proposed. Experimental results show that the proposed method can effectively
solve polysemy and term dependency problems in the field of information
retrieval.
/vi
The chapter "Naïve Rules Do Not
Consider Underlying Causality" by Lawrence J. Mazlack argues that it is
important to understand when association rules have causal foundations in
order to avoid naïve decisions and to increase the perceived utility of rules with
causal underpinnings.
/viii
In his first paper, "Definability of Association Rules and Tables of Critical
Frequencies," Jan Rauch presents a new intuitive criterion of definability of
association rules based on tables of critical frequencies, which are introduced
as a tool for avoiding complex computation related to the association rules
corresponding to statistical hypothesis tests.
/viii
In the paper "Using Association Rules for Classification from Databases
Having Class Label Ambiguities: A Belief Theoretic Method" by S.P. Subasingha,
J. Zhang, K. Premaratne, M.L. Shyu, M. Kubat, and K.K.R.G.K.
Hewawasam, a classification algorithm that combines a belief theoretic technique
and a partitioned association mining strategy is proposed, to address both
the presence of class label ambiguities and the unbalanced distribution of classes
in the training data.
/ix
Classification Association Rule Mining (CARM) is the technique that utilizes
association mining to derive classification rules. A typical problem with
CARM is the overwhelming number of classification association rules that may
be generated. The paper “Mining Efficiently Significant Classification Associate
Rules” by Yanbo J. Wang, Qin Xin, and Frans Coenen addresses the
issues of how to efficiently identify significant classification association rules
for each predefined class. Both theoretical and experimental results show that
the proposed rule mining approach, which is based on a novel rule scoring and
ranking strategy, is able to identify significant classification association rules
in a time efficient manner.
/x
Association rules [3] describe the co-occurrence among data items in a large
amount of collected data. They have been profitably exploited for classification
purposes [8, 11, 19]. In this case, rules are called classification rules and their
consequent contains the class label. Classification rule mining is the discovery
of a rule set in the training dataset to form a model of data, also called
classifier. The classifier is then used to classify new data for which the class
label is unknown.
/1
Data items in an association rule are unordered. However, in many application
domains (e.g., web log mining, DNA and proteome analysis) the
order among items is an important feature.
/1
To deal with the generation of a large solution set, in the context
of association rule mining a significant effort has been devoted to define concise
representations for frequent itemsets and association rules.
/27
For association rules, concise representations have been proposed
based on closed and generator itemsets [22, 23, 33]. In the context of associative
classification, compact representations for associative classification rules
have been proposed based on generator itemsets [7] and free-sets [15].
/28
Let us recall some examples to illustrate the main intuition. The association
that consists of "wall" and "street" denotes some financial notions that
have meaning beyond the two nodes, "wall" and "street". This is similar to
the notion of an open segment (v0, v1) that represents a one-dimensional geometric
object, a 1-simplex, which carries information beyond the two end points.
/62
The notion of association rules was introduced by Agrawal et al. [1] and has
been demonstrated to be useful in several domains [4, 5], such as retail sales
transaction databases. In the theory, two standard measures, called support and
confidence, are often used. For documents, the orders of keywords or directions
of rules are not essential. Our focus will be on the support; a set of items that
meets the support is often referred to as a frequent itemset; we will call them
associations (undirected association rules) so as to indicate the emphasis on their
meaning more than the phenomenon of frequency.
/62
A lot of data mining research has been
focusing on the development of algorithms for performing different tasks, i.e.
clustering, association and classification [1,2,5,13,15,16,19,20,24,28,30], and
on their applications to diverse domains.
/166
The Common Warehouse Model for Data Mining (CWM DM) [22] proposed
by the Object Management Group, introduces a CWM Data Mining
metamodel integrated by the following conceptual areas: a core Mining metamodel
and metamodels representing the data mining subdomains of Clustering,
Association Rules, Supervised, Classification, Approximation, and
Attribute Importance.
/166
The goal of this step is to determine relationships
among variables. In this phase, both statistical methods (e.g. discriminant
analysis, clustering, and regression analysis) and data-oriented methods (e.g.
neural networks, decision trees, association rules) can be used.
/167
The Task Model defines all the
data mining tasks to be done in the project. The approach here is that a
task model is first defined in terms of types of problems (e.g. clustering
instead of K-means, association instead of Apriori, ...) and then refined
in some iterations by a data mining expert.
/170
This chapter proposes a fuzzy data-mining algorithm for extracting
both association rules and membership functions from quantitative transactions.
/179
Data mining is most commonly used in attempts to induce association rules
from transaction data. Transaction data in real-world applications, however,
usually consist of quantitative values. Designing a sophisticated data-mining
algorithm able to deal with various types of data presents a challenge to
workers in this research field.
/179
In [4], we proposed a mining approach that integrated fuzzy-set concepts with
the Apriori mining algorithm [1] to find interesting itemsets and fuzzy association
rules in transaction data with quantitative values.
/180
Wang and Bridges used GAs to tune
membership functions for intrusion detection systems based on similarity of
association rules [11]. Kaya and Alhajj [6] proposed a GA-based clustering
method to derive a predefined number of membership functions for getting
a maximum profit within an interval of user specified minimum support values.
/180
A new algorithm named Compressed Binary Mine (CBMine) for mining
association rules and frequent patterns is presented in this chapter.
/198
Mining association rules in transaction databases has been demonstrated to
be useful and technically feasible in several application areas [14,18,21], particularly
in retail sales, and it becomes more important every day in applications
that use document databases [11, 16, 17]. Although research in this area has
been going on for more than a decade, mining such rules is still one
of the most popular methods in knowledge discovery and data mining.
/198
Various algorithms have been proposed to discover large itemsets [2, 3, 6,
9, 11, 19]. Of all of them, Apriori has had the biggest impact [3], since its
general conception has been the base for the development of new algorithms
to discover association rules.
/198
The discovery of large itemsets (the first step of the process) is computationally
expensive. The generation of association rules (the second step) is the
easier of the two. The overall performance of mining association rules depends
on the first step; for this reason, the comparative results that we
present for our algorithm cover only the first step.
/198
An association rule is an
implication of the form X ⇒ Y , where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The
association rule X ⇒ Y holds in the database D with certain quality and a
support s, where s is the proportion of transactions in D that contain X ∪Y .
Some quality measures have been proposed, although these are not considered
in this work.
/198
The first step in the discovery of association rules is to find each set of
items (called an itemset) that has a co-occurrence rate above the minimum support.
An itemset with at least the minimum support is called a large itemset or a
frequent itemset. In this chapter, as in others, the term frequent itemset will
be used. The size of an itemset is the number of items contained in the
itemset, and an itemset containing k items is called a k-itemset. For example,
{beer, diaper} can be a frequent 2-itemset. If an itemset is frequent and no proper
superset of it is frequent, we say that it is a maximally frequent itemset.
/199
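The definitions above (frequent itemset, k-itemset, maximally frequent itemset) translate directly into code. The brute-force sketch below is not CBMine, just the definitions applied to a toy database: it enumerates the frequent itemsets and then keeps the maximal ones.

# Brute-force sketch of the definitions above (not an efficient miner):
# enumerate frequent itemsets, then keep those with no frequent proper superset.
from itertools import combinations

transactions = [frozenset(t) for t in
                [{"beer", "diaper"}, {"beer", "diaper", "chips"},
                 {"beer", "bread"}, {"diaper", "bread"}]]
min_support = 0.5
items = sorted(set().union(*transactions))

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

frequent = [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(frozenset(c)) >= min_support]

maximal = [f for f in frequent
           if not any(f < g for g in frequent)]

print(sorted(map(sorted, frequent)))  # [['beer'], ['beer', 'diaper'], ['bread'], ['diaper']]
print(sorted(map(sorted, maximal)))   # [['beer', 'diaper'], ['bread']]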
Naïve association rules may result if the underlying causality of the
rules is not considered. The greatest impact on the decision value quality of association
rules may come from treating association rules as causal statements without
understanding whether there is, in fact, underlying causality.
/213
One of the cornerstones of data mining is the development of association rules.
Association rules' greatest impact is in helping to make decisions. One measure
of the quality of an association rule is its relative decision value. Association
rules are often constructed using simplifying assumptions that lead to naïve
results and consequently naïve and often wrong decisions. Perhaps the greatest
area of concern about the decision value is treating association rules as
causal statements without understanding whether there is, in fact, underlying
causality.
/213
Data mining analyzes data previously collected; it is non-experimental.
There are several different data mining products. The most common are conditional
rules or association rules. Conditional rules are most often drawn from
induced trees while association rules are most often learned from tabular data.
IF Age < 20 THEN vote frequency is: often, with {belief = high}
IF Age is old THEN Income < $10,000, with {belief = 0.8}
Fig. 2. Conditional rules

Customers who buy beer and sausage also tend to buy hamburger, with {confidence = 0.7}, in {support = 0.15}
Customers who buy strawberries also tend to buy whipped cream, with {confidence = 0.8}, in {support = 0.2}
Fig. 3. Association rules
/219
At first glance, conditional and association rules seem to imply a causal
or cause-effect relationship. That is:
A customer’s purchase of both sausage and beer causes the customer
to also buy hamburger.
/219
The
association rule’s confidence measure is simply an estimate of conditional
probability. The association rule’s support indicates how often the joint occurrence
happens (the joint probability over the entire data set). The strength of
any causal dependency may be very different from that of a possibly related
association value.
/220
The most popular market basket association rule development method
identifies rules of particular interest by screening for joint probabilities (associations)
above a specified threshold.
/221
Association rules are used to aid in making retail decisions. However, simple
association rules may lead to errors. Errors might occur either if causality is
recognized where there is none, or if the direction of the causal relationship
is wrong [18, 33].
/221
One of the cornerstones of data mining is the development of association
rules. Association rules' greatest impact is in helping to make decisions. One
measure of the quality of an association rule is its relative decision value. Association
rules are often constructed using simplifying assumptions that lead
to naïve results and consequently naïve and often wrong decisions. Perhaps
the greatest area of concern is treating association rules as causal statements
without understanding whether there is, in fact, underlying causality.
/226
Data mining analyzes non-experimental data previously collected. There
are several different data mining products. The most common are conditional
rules or association rules. Conditional rules are most often drawn from induced
trees while association rules are most often learned from tabular data.
/241
Many (if not all) DM techniques can be viewed in terms of the data compression
approach. For example, association rules and pruned decision trees
can be viewed as ways of providing compression of parts of the data. Clustering
can also be considered as a way of compressing the dataset. There is
a connection with the Bayesian theory for modeling the joint distribution –
any compression scheme can be viewed as providing a distribution on the set
of possible instances of the data.
/255
Piatetsky-Shapiro in Wu et al. [40] gives a good example that characterizes
the whole area of current DM research: "we see many papers proposing incremental
refinements in association rules algorithms, but very few papers
describing how the discovered association rules are used". DM is a fundamentally
application-oriented area motivated by business and scientific needs to
make sense of mountains of data [40]. A DMS is generally used to support
or perform some task(s) for human beings in an organizational environment (see
Fig. 8), both having their own desires related to the DMS. Further, the organization has
its own environment that has its own interests related to the DMS, for example
that the privacy of people is not violated.
/267
Finding useful rules is an important task of knowledge discovery in data. Most
of the researchers on knowledge discovery focus on techniques for generating
patterns, such as classification rules, association rules, etc., from a data set.
/289
This chapter concerns theoretical foundations of association rules. We deal
with more general rules than the classical association rules related to market baskets.
Various theoretical aspects of association rules are introduced and several classes
of association rules are defined. Implicational and double implicational rules are
examples of such classes. It is shown that there are practically important theoretical
results related to particular classes. The results concern deduction rules in logical calculi
of association rules, fast evaluation of rules corresponding to statistical hypothesis
tests, missing information, and definability of association rules in classical predicate
calculi.
/314
The goal of this chapter is to contribute to the theoretical foundations of data
mining. We deal with association rules. We are, however, interested in more
general rules than the classical association rules [1] inspired by market baskets.
We understand an association rule as an expression ϕ ≈ ψ where ϕ and ψ
are Boolean attributes derived from columns of an analysed data matrix. The
intuitive meaning of the association rule ϕ ≈ ψ is that the Boolean attributes ϕ
and ψ are associated in a way corresponding to the symbol ≈. The symbol ≈ is called a 4ft-quantifier. It is
associated with a condition related to the (fourfold)
contingency table of ϕ and ψ in the analysed data matrix.
/314
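To make the 4ft-quantifier idea concrete, the sketch below computes the fourfold contingency table of two Boolean attributes and evaluates one classical GUHA-style quantifier, founded implication, under the condition a/(a+b) ≥ p and a ≥ Base. The data matrix, the thresholds, and the choice of this particular quantifier are my illustration, not the chapter's.

# Sketch: fourfold (4ft) contingency table of Boolean attributes phi, psi over
# a data matrix, plus one example 4ft-quantifier (founded implication:
# a / (a + b) >= p  and  a >= Base). Data and thresholds are illustrative.
def fourfold(phi, psi):
    a = sum(f and s for f, s in zip(phi, psi))              # phi and psi
    b = sum(f and not s for f, s in zip(phi, psi))          # phi and not psi
    c = sum((not f) and s for f, s in zip(phi, psi))        # not phi and psi
    d = sum((not f) and (not s) for f, s in zip(phi, psi))  # neither
    return a, b, c, d

def founded_implication(phi, psi, p=0.9, base=3):
    a, b, _, _ = fourfold(phi, psi)
    return a >= base and a / (a + b) >= p

phi = [1, 1, 1, 1, 0, 0, 1, 0]   # rows of the data matrix where phi holds
psi = [1, 1, 1, 0, 0, 1, 1, 0]
print(fourfold(phi, psi))                             # (4, 1, 1, 2)
print(founded_implication(phi, psi, p=0.8, base=3))   # True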
Various aspects of association rules of the form ϕ ≈ ψ were studied:
• Logical calculi whose formulae correspond to association rules were defined.
Some of these calculi are straightforward modifications of classical
predicate calculi [2], the others are simpler [12].
• Logical aspects of calculi of association rules, e.g. decidability, deduction
rules, and definability in classical predicate calculus, were investigated; see
namely [2, 12, 13].
• Association rules that correspond to statistical hypothesis tests were defined
and studied [2].
• Several approaches to the evaluation of association rules in data with missing
information were investigated [2, 7].
• Software tools for mining all kinds of such association rules were implemented
and applied [3, 5, 15].
/316
This chapter concerns theoretical aspects of association rules. It was shown
that most of the theoretically interesting and practically important results concerning
association rules are related to classes of association rules. The goal of
this chapter is to give an overview of important classes of association rules
and their properties. Already published results are mentioned and new
results are introduced.
/316
Broadly CARM algorithms
can be categorised into two groups according to the way that the CRs are
generated (a brief sketch of the two-stage idea follows this list):
• Two stage algorithms where a set of CARs are produced first (stage 1),
which are then pruned and placed into a classifier (stage 2). Examples
of this approach include CBA [38] and CMAR [36]. CBA (Classification
Based on Associations), developed by Liu et al. in 1998, is an Apriori [2]
based CARM algorithm, which (1) applies its CBA-GR procedure for CAR
generation; and (2) applies its CBA-CB procedure to build a classifier
based on the generated CARs. CMAR (Classification based on Multiple
Association Rules), introduced by Li, Han and Pei in 2001, is similar to CBA
but generates CARs through an FP-tree [27] based approach.
• Integrated algorithms where the classifier is produced in a single processing
step. Examples of this approach include TFPC [15,18], and induction systems
such as FOIL [46], PRM and CPAR [53]. TFPC (Total From Partial
Classification), proposed by Coenen et al. in 2004, is an Apriori-TFP [16]
based CARM algorithm, which generates CARs through efficiently constructing
both P-tree and T-tree set enumeration tree structures. (TFPC
may be obtained from http://www.csc.liv.ac.uk/~frans/KDD/Software.) FOIL
(First Order Inductive Learner) is an inductive learning algorithm for
generating CARs developed by Quinlan and Cameron-Jones in 1993. This
algorithm was later developed by Yin and Han to produce the PRM (Predictive
Rule Mining) CAR generation algorithm. PRM was then further
developed, by Yin and Han in 2003 to produce CPAR (Classification based
on Predictive Association Rules).
/450
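As a rough illustration of the two-stage idea, and not the actual CBA or CMAR implementations, the sketch below takes a set of already-mined class association rules, orders them by confidence, support, and antecedent size, and classifies a new case with the first rule it satisfies, falling back to a default class.

# Rough sketch of a two-stage CARM classifier (not CBA/CMAR themselves).
# Stage 1 is assumed to have produced CARs as (antecedent, class, support, confidence).
def build_classifier(cars):
    # Precedence: higher confidence, then higher support, then shorter antecedent.
    return sorted(cars, key=lambda r: (-r[3], -r[2], len(r[0])))

def classify(classifier, case, default):
    for antecedent, cls, _support, _confidence in classifier:
        if antecedent <= case:
            return cls
    return default

cars = [(frozenset({"age=young", "student=yes"}), "buys", 0.10, 0.90),
        (frozenset({"income=low"}), "does_not_buy", 0.20, 0.70)]
classifier = build_classifier(cars)
print(classify(classifier, frozenset({"age=young", "student=yes", "income=low"}),
               default="does_not_buy"))
# 'buys' (the higher-confidence rule fires first)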
We present here an abstract model in which the data preprocessing and
data mining proper stages of the Data Mining process are described as two different
types of generalization. In the model, the data mining and data preprocessing
algorithms are defined as certain generalization operators. We use our framework to
show that only three Data Mining operators, the classification, clustering, and association
operators, are needed to express all Data Mining algorithms for classification,
clustering, and association, respectively. We are also able to show formally that the
generalization that occurs in the preprocessing stage is different from the generalization
inherent to the data mining proper stage.
/469
This gives us only three data mining generalization operators
to consider: classification, clustering, and association.
/476
Data mining includes a number of different tasks, such as association rule
mining, classification, and clustering. This paper studies how to learn support
vector machines.
/518
We use a modified association rule mining (ARM) technique to
extract the interesting rules from the training data set and use a belief theoretic
classifier based on the extracted rules to classify the incoming feature vectors. The
ambiguity modelling capability of belief theory enables our classifier to perform better
in the presence of class label ambiguities.
/539
JI [15] Data Mining, Foundations and Practice, Tsau Young Lin, Ying Xie, Anita Wasilewska, Churn-Jung
Liau, Springer, 2008.
Dong-Peng
et al. described one such application where the implementations of decision trees and
association rules in WEKA [9] are applied in a risk analysis problem in banking, for
which the data was suitably prepared [10]. Another example in this volume is the paper
by Giuffrida et al., in which the Apriori algorithm for association rule mining is used on
an online advertising personalization problem [11].
/3
The applications cover tasks such as clustering (e.g., [15]), classification
(e.g., [13,14]), regression (e.g., [6]), information retrieval (e.g., [8]) and extraction
(e.g., [7]), association mining (e.g., [10,11]) and sequence mining (e.g., [12,16]). Many
research fields are also covered, including neural networks (e.g., [5]), machine learning
(e.g., SVM [13]), data mining (e.g., association rules [10,11]), statistics (e.g., logistic
[13] and linear regression [6]), and evolutionary computation (e.g., [4,14]). The wider the
range of tools that is mastered by a data analyst, the better the results he/she may obtain.
/4
The purpose of association analysis is to figure out hidden associations and useful
rules contained in the database; these rules can then be used to infer and judge
unknown data from already known information [6].
/38
At present, there is a great deal of research which has combined the decision trees
method with other methods such as association rules, Bayesian, Neural Network and
Support Vector Machine [10].
/42
Moreover, a high-support product may have some other temporal restrictions (e.g., it
may go off the market); thus, it may be necessary to dismiss association rules
associated with it. They introduce the concept of temporal support.
/53
They propose a method to discover cyclic association rules. The same problem
was also considered by others; Verma and Vyas [5] propose an efficient algorithm to
discover calendar-based association rules: the rule "egg → coffee" has strong support
in the early morning but much smaller support during the rest of the day. Zimbrao et al.
[6] extend the seasonality concept just mentioned to include also the concept of product
lifespan during rule generation and application.
/53
Additional work includes enhancing the actionability
of pattern mining in traditional data mining techniques such as association rules [6],
multi-objective optimization in data mining [23], role model-based actionable pattern
mining [27], cost-sensitive learning [28] and postprocessing [29], etc.
/106
JI [17] Applications of Data Mining in E-Business and Finance , Carlos Soares, Yonghong Peng, Jun Meng,
Takashi Washio, Zhi-Hua Zhou, The authors and IOS Press, 2008.
Association rules: Association rules with high confidence and support define a
different kind of pattern. As before, records that do not follow these rules are considered
outliers. The power of association rules is that they can deal with data of
different types. However, Boolean association rules do not provide enough quantitative
and qualitative information. Ordinal association rules, defined by (Maletic
and Marcus, 2000, Marcus et al., 2001), are used to find rules that give more information
(e.g., ordinal relationships between data elements). The ordinal association
rules yield special types of patterns, so this method is, in general, similar
to the pattern-based method. This method can be extended to find other kinds of
associations between groups of data elements (e.g., statistical correlations).
/24
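A minimal sketch of the ordinal association rule idea as used here for error detection (my simplified reading of Maletic and Marcus, with invented data): for a pair of numeric fields, measure how often an ordinal relationship such as a ≤ b holds, and if it holds with high confidence, flag the records that break it as potential errors.

# Sketch of ordinal-association-rule-style error detection (illustrative data;
# a simplified reading of the Maletic & Marcus idea, not their algorithm).
records = [
    {"id": 1, "start_year": 1990, "end_year": 1995},
    {"id": 2, "start_year": 1985, "end_year": 1999},
    {"id": 3, "start_year": 2001, "end_year": 1998},   # suspicious record
    {"id": 4, "start_year": 1970, "end_year": 1980},
]

def ordinal_rule_outliers(records, a, b, min_confidence=0.7):
    # If 'a <= b' holds in at least min_confidence of the records,
    # return the records violating it as potential errors.
    holding = [r for r in records if r[a] <= r[b]]
    confidence = len(holding) / len(records)
    if confidence >= min_confidence:
        return [r for r in records if r[a] > r[b]]
    return []

print(ordinal_rule_outliers(records, "start_year", "end_year"))
# flags record 3, which violates a rule holding with confidence 0.75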
The term association rule was first introduced by (Agrawal et al., 1993) in the context
of market-basket analysis. Association rules of this type are also referred to in
the literature as classical or Boolean association rules. The concept was extended in
other studies and experiments. Of particular interest to this research are the quantitative
association rules (Srikant et al., 1996) and ratio rules (Korn et al., 1998) that
can be used for the identification of possible erroneous data items with certain modifications.
In previous work we argued that another extension of the association rule,
the ordinal association rule (Maletic and Marcus, 2000, Marcus et al., 2001), is more
flexible, general, and very useful for the identification of errors. Since this is a recently
introduced concept, it is briefly defined.
/26
Although various discretization methods are available, they are tuned to different
types of learning, such as decision tree learning, decision rule learning, naive-Bayes
learning, Bayes network learning, clustering, and association learning. Different
types of learning have different characteristics and hence require different strategies
of discretization. It is important to be aware of the learning context when
designing or employing discretization methods. It is unrealistic to pursue a universally
optimal discretization approach that can be blind to its learning context.
/111
The traditional application of association rules is market basket analysis, see for
instance (Brijs et al., 1999). Since then, the technique has been applied to other
kinds of data, such as:
• Census data (Brin et al., 1997A, Brin et al., 1997B)
• Linguistic data for writer evaluation (Aumann and Lindell, 2003)
• Insurance data (Castelo and Giudici, 2003)
• Medical diagnosis (Gamberger et al., 1999)
One of the first generalizations, which still has applications in the field of market
basket analysis, is the consideration of temporal or sequential information, such as
the date of purchase. Applications include:
• Market basket data (Agrawal and Srikant, 1995)
• Causes of plan failures (Zaki, 2001)
• Web personalization (Mobasher et al., 2002)
• Text data (Brin et al., 1997A, Delgado et al., 2002)
• Publication databases (Lee et al., 2001)
/307
Neural networks have been used extensively in data mining for a wide variety of
problems in business, engineering, industry, medicine, and science. In general, neural
networks are good at solving common data mining problems such as
classification, prediction, association, and clustering. This section provides a short
overview of the application areas.
/436
With association techniques, we are interested in the correlation or relationship
among a number of variables or objects. Association is used in several ways. One use,
as in market basket analysis, is to help identify the consequent items given a set of
antecedent items. An association rule in this way is an implication of the form: IF X,
THEN Y, where X is a set of antecedent items and Y is the set of consequent items. This
type of association rule has been used in a variety of data mining tasks including
credit card purchase analysis, merchandise stocking, insurance fraud investigation,
/436
Association/Pattern Recognition applications:
• defect recognition (Kim and Kumara, 1997)
• facial image recognition (Dai and Nakano, 1998)
• frequency assignment (Salcedo-Sanz et al., 2004)
• graph or image matching (Suganthan et al., 1995; Pajares et al., 1998)
• image restoration (Paik and Katsaggelos, 1992; Sun and Yu, 1995)
• image segmentation (Rout et al., 1998; Wang et al., 1992)
• landscape pattern prediction (Tatem et al., 2002)
• market basket analysis (Evans, 1997)
• object recognition (Huang and Liu, 1997; Young et al., 1997; Li and Lee, 2002)
• on-line marketing (Changchien and Lu, 2001)
• pattern sequence recognition (Lee, 2002)
• semantic indexing and searching (Chen et al., 1998)
/437
We present three important classes of neural network
models: Feedforward multilayer networks, Hopfield networks, and Kohonen's self-organizing
maps, which are suitable for a variety of problems in pattern association,
pattern classification, prediction, and clustering.
/438
Data Mining, as presently understood, is a broad term, including search for “association rules”,
classification, regression, clustering and similar. Here we shall restrict ourselves to search
for “rules” in a rather general sense, namely general dependencies valid in given data and
expressed by formulas of a formal logical language.
/541
The study of logical aspects of Data Mining is interesting and useful: it gives an exact abstract
approach to “association rules” based on the notion of (generalized) quantifiers, important
classes of quantifiers, deductive properties of associations expressed using such quantifiers
as well as other results not mentioned here (as e.g. results on computational complexity).
Hopefully the present chapter will help the reader to enjoy this.
/549
Data Mining is mainly concerned with methodologies for extracting patterns from large data
repositories. There are many Data Mining methods, each of which accomplishes a limited set of tasks
and produces a particular enumeration of patterns over data sets. The main tasks of Data Mining
which have already been discussed in previous sections are: i) Clustering, ii) Classification,
iii) Association Rule Extraction, iv) Time Series, v) Regression, and vi) Summarization.
/613
There are also some other well-known approaches and measures for evaluating association
rules (a small sketch of the rule-template idea follows this list):
• Rule templates are used to describe a pattern for those attributes that can appear in the
left- or right-hand side of an association rule. A rule template may be either inclusive or
restrictive. An inclusive rule template specifies desirable rules that are considered to be
interesting. On the other hand a restrictive rule template specifies undesirable rules that are
considered to be uninteresting. Rule pruning can be done by setting support, confidence
and rule size thresholds.
• Dong and Li’s interestingness measure (Dong and Li, 1998) is used to evaluate the importance
of an association rule by considering its unexpectedness in terms of other association
rules in its neighborhood. The neighborhood of an association rule consists of association
rules within a given distance.
• Gray and Orlowska's interestingness (Gray and Orlowska, 1998) is used to evaluate the confidence
of associations between sets of items in the extracted association rules. Though
support and confidence have been shown to be useful for characterizing association rules,
interestingness contains a discriminator component that gives an indication of the independence
of the antecedent and consequent.
• Peculiarity (Zhong et al., 1999) is a distance-based measure of rules interestingness. It
is used to determine the extent to which one data object differs from other similar data
objects.
• Closed Association Rules Mining. It is widely recognized that the larger the set of frequent
itemsets, the more association rules are presented to the user, many of which turn out to be
redundant. However it is not necessary to mine all frequent itemsets to guarantee that all
non-redundant association rules will be found. It is sufficient to consider only the closed
frequent itemsets (Zaki and Hsiao, 2002, Pasquier et al., 1999, Pei et al., 2000). The set
of closed frequent itemsets can guarantee completeness even in dense domains and all
non-redundant association rules can be defined on it. CHARM is an efficient algorithm
for closed association rules mining.
/623
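The rule-template idea from the first bullet above is easy to operationalize; the sketch below uses an invented template syntax (a pair of allowed attribute-name sets for the left- and right-hand sides): inclusive templates keep rules matching a desirable pattern, restrictive templates drop rules matching an undesirable one, and support, confidence, and rule-size thresholds prune the rest.

# Sketch of rule-template filtering plus threshold pruning. The template
# syntax (a pair of attribute-name sets) and the example rules are invented.
from typing import NamedTuple, Set

class Rule(NamedTuple):
    antecedent: Set[str]   # attribute names on the left-hand side
    consequent: Set[str]   # attribute names on the right-hand side
    support: float
    confidence: float

def matches(rule, template):
    lhs_attrs, rhs_attrs = template
    return rule.antecedent <= lhs_attrs and rule.consequent <= rhs_attrs

def filter_rules(rules, inclusive, restrictive, min_sup, min_conf, max_size):
    kept = []
    for r in rules:
        if not any(matches(r, t) for t in inclusive):
            continue                      # not covered by any "interesting" template
        if any(matches(r, t) for t in restrictive):
            continue                      # explicitly uninteresting
        if r.support < min_sup or r.confidence < min_conf:
            continue
        if len(r.antecedent) + len(r.consequent) > max_size:
            continue
        kept.append(r)
    return kept

rules = [Rule({"diagnosis", "age"}, {"treatment"}, 0.05, 0.85),
         Rule({"patient_id"}, {"treatment"}, 0.02, 0.99)]
inclusive = [({"diagnosis", "age", "sex"}, {"treatment"})]
restrictive = [({"patient_id"}, {"treatment"})]
print(filter_rules(rules, inclusive, restrictive,
                   min_sup=0.03, min_conf=0.7, max_size=4))   # keeps only the first rule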
A preliminary exploratory analysis can give indications on how to code the explanatory
variables, in order to maximize their predictive power. In order to reach this objective we have
employed statistical measures of association between pairs of variables, such as chi-squared
based measures and statistical measures of dependence, such as Goodman and Kruskal’s
(see (Giudici, 2003) for a systematic comparison of such measures).
/647
Rare cases are often of special interest. This is especially true in the context of
Data Mining, where one often wants to uncover subtle patterns that may be hidden
in massive amounts of data. Examples of mining rare cases include learning word
pronunciations (Van den Bosch et al., 1997), detecting oil spills from satellite images
(Kubat et al., 1998), predicting telecommunication equipment failures (Weiss
and Hirsh, 1998) and finding associations between infrequently purchased supermarket
items (Liu et al., 1999). Rare cases warrant special attention because they pose
significant problems for Data Mining algorithms.
/747
The three rare cases will be more difficult to detect and generalize from because they
contain fewer data points. A second important unsupervised learning task is association
rule mining, which looks for associations between items (Agarwal et al., 1993).
Groupings of items that co-occur frequently, such as milk and cookies, will be considered
common cases, while other associations may be extremely rare. For example,
mop and broom will be a rare association (i.e., case) in the context of supermarket
sales, not because the items are unlikely to be purchased together, but because neither
item is frequently purchased in a supermarket (Liu et al., 1999).
/748
Table 53.1. Data Mining tasks and used techniques
Data Mining Tasks Data Mining Techniques
Classification induction, neural networks, genetic algorithms
Association Apriori, statistics, genetic algorithms
Clustering neural networks, induction, statistics
Regression induction, neural networks, statistics
Episode discovery induction, neural networks, genetic algorithms
Summarization induction, statistics
/1012
Association rule learning
The standard algorithm for association rule induction is Apriori, which is implemented in
the workbench. Two other algorithms implemented in Weka are Tertius, which can extract
first-order rules, and Predictive Apriori, which combines the standard confidence and support
statistics into a single measure.
/1273
JI [18] Data Mining and Knowledge Discovery Handbook, Oded Maimon, Lior Rokach, Springer, 2nd,
2010.
The three main areas of data mining are (a) classification,
(b) clustering, and (c) association rule mining
(Dunham, 2003). A brief review is given of the methods
discussed in this article.
/7
Association rule mining (ARM) considers market-basket
or shopping-cart data, that is, the items purchased
on a particular visit to the supermarket. ARM first
determines the frequent sets, which have to meet a
certain support level.
/7
merchandising, both to analyze patterns of preference
across products, and to recommend products to consumers
based on other products they have selected. An association
rule expresses the relationship that one product is
often purchased along with other products. The number of
possible association rules grows exponentially with the
number of products in a rule, but constraints on confidence
and support, combined with algorithms that build
association rules with itemsets of n items from frequent itemsets with
n-1 items, reduce the effective search space.
/45
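The level-wise trick above (building candidates with n items only from frequent itemsets with n-1 items) is the heart of Apriori's pruning. A minimal sketch of the join and prune steps, a simplification with toy data rather than any specific implementation:

# Sketch of Apriori-style candidate generation: join frequent (n-1)-itemsets
# that share a prefix, then prune candidates having an infrequent (n-1)-subset.
from itertools import combinations

def generate_candidates(frequent_prev):
    # frequent_prev: a set of frozensets, all of the same size n-1.
    prev = sorted(sorted(s) for s in frequent_prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            if prev[i][:-1] == prev[j][:-1]:               # join step
                cand = frozenset(prev[i]) | frozenset(prev[j])
                if all(frozenset(sub) in frequent_prev      # prune step
                       for sub in combinations(sorted(cand), len(cand) - 1)):
                    candidates.add(cand)
    return candidates

frequent_2 = {frozenset(x) for x in
              [("beer", "diaper"), ("beer", "chips"), ("chips", "diaper")]}
print(generate_candidates(frequent_2))   # {frozenset({'beer', 'chips', 'diaper'})}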
Association Rules: Used to associate items in a database
sharing some relationship (e.g., co-purchase information).
Often takes the form "if this, then that," such as,
"If the customer buys a handheld videogame, then the
customer is likely to purchase batteries."
/48
Association Rule Mining (ARM) is concerned with how
items in a transactional database are grouped together. It
is commonly known as market basket analysis, because
it can be likened to the analysis of items that are frequently
put together in a basket by shoppers in a market.
/59
Association rule mining (Agrawal, Imielinski, & Swami,
1993) has been proposed for understanding the relationships
among items in transactions or market baskets. For
instance, if a customer buys butter, what is the chance that
he/she buys bread at the same time? Such information may
be useful for decision makers to determine strategies in a
store.
/65
Association Rule: A kind of rule in the form X ⇒ Ij, where
X is a set of some items and Ij is a single item not in X.
/69
Association Rule: A rule of the form A ⇒ B meaning
"if the set of items A is present in a transaction, then the
set of items B is likely to be present too". A typical example
constitutes associations between items purchased at a
supermarket.
/73
Association rules, introduced by Agrawal, Imielinski
and Swami (1993), provide useful means to discover
associations in data. The problem of mining association
rules in a database is defined as finding all the association
rules that hold with more than a user-given minimum
support threshold and a user-given minimum confidence
threshold. According to Agrawal, Imielinski and
Swami, this problem is solved in two steps:
1. Find all frequent itemsets in the database.
2. For each frequent itemset I, generate all the association
rules I' ⇒ I \ I', where I' ⊂ I.
/150
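Step 2 above, generating all rules I' ⇒ I \ I' from a frequent itemset I, can be written down directly. The sketch below uses toy itemset supports and omits the usual confidence-based pruning optimizations:

# Sketch of step 2: from a frequent itemset I with known subset supports,
# emit every rule I' => I \ I' whose confidence meets the threshold.
from itertools import combinations

supports = {                      # illustrative supports of frequent itemsets
    frozenset({"bread"}): 0.6,
    frozenset({"butter"}): 0.5,
    frozenset({"milk"}): 0.5,
    frozenset({"bread", "butter"}): 0.4,
    frozenset({"bread", "milk"}): 0.45,
    frozenset({"butter", "milk"}): 0.3,
    frozenset({"bread", "butter", "milk"}): 0.3,
}

def rules_from_itemset(itemset, min_conf):
    I = frozenset(itemset)
    for size in range(1, len(I)):
        for lhs in map(frozenset, combinations(I, size)):
            conf = supports[I] / supports[lhs]   # conf(I' => I \ I') = supp(I) / supp(I')
            if conf >= min_conf:
                yield lhs, I - lhs, conf

for lhs, rhs, conf in rules_from_itemset({"bread", "butter", "milk"}, min_conf=0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
# prints the two rules meeting the threshold:
#   ['bread', 'butter'] => ['milk'] 0.75
#   ['butter', 'milk'] => ['bread'] 1.0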
Association Rule: A pair of frequent itemsets (A, B),
where the ratio between the support of the A ∪ B itemset and of the A itemset
is greater than a predefined threshold, denoted minconf.
/153
Association Rule: A rule in the form of “if this, then
that.” It states a statistical correlation between the occurrence
of certain attributes in a database.
/164
Association rule mining is a type of data mining that correlates
one set of items or events with another set of items
or events. It employs association or linkage analysis,
searching transactions from operational systems for interesting
patterns with a high probability of repetition.
/272
In classical association analysis, records in a transactional
database contain only items. Although transactions
occur under certain contexts, such as time, place,
customers, and so forth, such contextual information has
been ignored in classical association rule mining, due to
the fact that such rule mining was intratransactional in
nature. However, when we talk about intertransactional
associations across multiple transactions, the contexts of
occurrence of transactions become important and must be
taken into account.
/653