Class Based Rule Mining using Ant Colony Optimization
Bijaya Kumar Nanda (1) & Gyanesh Das (2)
(1) Department of ICT, F.M. University, Balasore, Odisha
(2) Department of ENTC, DRIEMS
Abstract – Ant colony optimization (ACO) can be applied to the data mining field to extract rule-based classifiers. This paper gives an overview of two important data mining tasks, association rule mining (ARM) and classification, followed by a brief overview of a classifier based on ant colony optimization that combines association rule mining and supervised classification.
ISSN (Print) : 2319 – 2526, Volume-2, Issue-1, 2013
Special Issue of International Journal on Advanced Computer Theory and Engineering (IJACTE)
I. INTRODUCTION
Classification rule discovery and association rule mining are two important data mining techniques. Association rule mining discovers all rules in the training set that satisfy the minimum support and confidence thresholds, while classification rule mining discovers a set of rules for predicting the class of unseen data. In this paper we discuss a classification algorithm that combines the ideas of association rule mining and supervised classification using ACO. It performs class based association rule mining, or associative classification, in which the consequent of an association rule is always a class label.
The technique integrates classification with association rule mining to discover high quality rules for improving the performance of the resulting classifier. ACO is used to mine only an appropriate subset of class association rules instead of exhaustively searching for all possible rules. The mining process stops when the discovered rule set achieves a minimum coverage threshold. Strong association rules are discovered based on confidence and support, and these rules are used to classify unseen data. This integration aims to find more accurate and compact rules from the training set.
The rest of this paper is organized as follows. Section II presents the basic ideas of association rule mining and classification. Section III describes the integration of association rule mining and classification using ACO. Finally, Section IV concludes the paper.
II. ASSOCIATIVE RULES MINING AND ASSOCIATIVE CLASSIFICATION
There are different data mining techniques, including supervised classification, association rule mining (or market basket analysis), unsupervised clustering, web data mining, and regression. One of these techniques is classification. The goal of classification is to build a model of the training data that can correctly predict the class of unseen or test objects. The input of this model learning process is a set of objects along with their classes (supervised training data). Once a predictive model is built, it can be used to predict the class of test objects whose class is not known. To measure the accuracy of the model, the available dataset is divided into a training set and a test set. The training set is used to build the model and the test set is used to measure its accuracy. Problems from a wide range of domains can be cast as classification problems, so there is always a need for algorithms that build comprehensible and accurate classifiers.
Association rule mining (ARM) is another important data mining technique. It is used to find strong and interesting relationships among the data items present in a set. A typical example of ARM is market basket analysis. In market basket analysis each record contains a list of items purchased by a customer, and we are interested in finding the sets of items that are frequently purchased together. The objective is to discover interesting purchasing habits of customers. The sets of items occurring together can be written as association rules in the form of “IF THEN” statements. The IF part is called the antecedent of the rule and the THEN part contains the consequent of the rule. In ARM the antecedent and consequent are sets of data items called item-sets. An item-set that contains k items is called a k-item-set. An association rule is written as A => B, where A and B are sets of items. There are different real-world applications of ARM, including market basket analysis, customer segmentation, electronic commerce, medicine, web mining, finance, and bioinformatics.
Coverage is the percentage of the dataset that is (correctly) covered by a set of rules. The coverage threshold is also specified by the user.
Class based rule mining faces several challenges. First, rule generation is based on frequent item-set mining, which takes a long time for large databases because of the large number of items and samples. Second, it generates a large number of rules, and redundant rules included in the classifier increase the time needed to classify objects. Different approaches have been developed for associative classification. An algorithm was proposed by B. Liu et al. It has three main steps: rule discovery, rule selection and classification. The rule discovery process mines all rules from the training dataset whose consequent is a class label. These rules are called class association rules. The rule selection process selects a subset of the discovered rules on the basis of their predictive accuracy to build a classifier, using the confidence measure to select rules. Higher confidence rules usually give higher predictive accuracy. Finally, the classification process classifies unseen data samples. An unseen data sample is assigned the class of the matching rule that has the highest confidence value. The basic problem with this approach is that it mines all possible rules that satisfy the minimum support and confidence thresholds, which is computationally very expensive for large databases.
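The classification step described above, in which an unseen sample receives the class of the highest-confidence rule that matches it, can be sketched as follows. This is a minimal illustration only: the `Rule` type, the dictionary encoding of samples and the `default` fallback for samples matched by no rule are assumptions made for this sketch and are not taken from the algorithm of Liu et al.

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule record used only for this sketch."""
    antecedent: Dict[str, str]  # attribute -> required value
    label: str                  # class predicted by the rule
    confidence: float


def matches(rule: Rule, sample: Dict[str, str]) -> bool:
    """A rule matches when every item of its antecedent appears in the sample."""
    return all(sample.get(attr) == value for attr, value in rule.antecedent.items())


def classify(rules: List[Rule], sample: Dict[str, str], default: str) -> str:
    """Return the class of the highest-confidence matching rule.

    `default` is an assumed fallback for samples that no rule matches.
    """
    candidates = [rule for rule in rules if matches(rule, sample)]
    if not candidates:
        return default
    return max(candidates, key=lambda rule: rule.confidence).label


rules = [
    Rule({"weather": "cold"}, "stay_in", 0.9),
    Rule({"weather": "cold", "wind": "low"}, "go_out", 0.7),
]
prediction = classify(rules, {"weather": "cold", "wind": "low"}, default="unknown")
```

Both rules match the example sample, and the first one wins because its confidence is higher.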
In ARM two factors are used to measure the importance of a rule. The first is support, which is the ratio (or percentage) of transactions in which an item-set appears with respect to the total number of transactions. The second is confidence, which is the ratio of the number of transactions that contain all items in both the antecedent and the consequent to the number of transactions that contain all items in the antecedent. The aim of ARM is to find all rules whose support and confidence are greater than the minimum support and confidence thresholds specified by the user. The support and confidence of a rule X => Y are calculated according to Equations (1.1) and (1.2).
$$\mathrm{Support}(X \Rightarrow Y) = P(X \cup Y) \qquad (1.1)$$
$$\mathrm{Confidence}(X \Rightarrow Y) = P(Y \mid X) \qquad (1.2)$$
where P(X ∪ Y) is the probability that a transaction contains X and Y together and P(Y | X) is the probability of Y given X. In other words, support is the probability that a randomly selected transaction from the database contains all items in the antecedent and the consequent, whereas confidence is the probability that a randomly selected transaction contains all the items in the consequent given that it contains all the items in the antecedent.
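As a small worked illustration of the two measures, the sketch below estimates support and confidence by counting over a list of transactions. The function names and the toy shopping data are assumptions made for this example; probabilities are estimated as relative frequencies.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_support(transactions: List[FrozenSet[str]],
                 antecedent: FrozenSet[str],
                 consequent: FrozenSet[str]) -> float:
    """Support of X => Y: transactions containing X and Y together."""
    return support(transactions, antecedent | consequent)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Confidence of X => Y: among transactions with X, the share that also has Y."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    with_both = sum(1 for t in with_antecedent if consequent <= t)
    return with_both / len(with_antecedent)


transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
x, y = frozenset({"bread"}), frozenset({"milk"})
s = rule_support(transactions, x, y)   # 2 of 4 transactions contain bread and milk
c = confidence(transactions, x, y)     # 2 of the 3 bread transactions contain milk
```

A rule bread => milk would pass a minimum support of 0.4 and a minimum confidence of 0.6 on this toy data.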
Another class based ARM algorithm, called “classification based on multiple class association rules”, was proposed by W. Li et al. It uses multiple rules for classifying an unseen data sample. To classify a test sample, the algorithm collects a small set of high confidence rules that match the test sample and analyzes the correlation among these rules to assign the class label. It also uses a tree structure for storing rules to improve the efficiency of rule retrieval during classification. The algorithm generates all possible association rules.
Class based rule mining is a specific kind of ARM in which we are interested in finding class based association rules. A class based association rule is a rule whose consequent is always a class label. This takes advantage of ARM for finding interesting relationships among items in the dataset. The support and confidence measures are used to find important rules: we are interested in those association rules that satisfy the minimum support and confidence thresholds specified by the user. The basic problem is that of mining association rules from large amounts of data. The dataset used to build the class association rules contains a set of transactions described by a set of attributes. Each transaction belongs to a predetermined class. A class association rule is represented as X => C, where X is a list of items and C is the class label.
The class based rule mining algorithm uses ACO to find interesting relationships among data items. It uses its evolutionary capability to efficiently find the more interesting subsets of association rules. It does not exhaustively search for all possible association rules as conventional ARM approaches do. In each generation of the algorithm, a number of rules that satisfy the minimum support and confidence thresholds are selected for the final classifier. After each generation, pheromone values are updated in such a way that better rules can be extracted in the following generations. The final discovered rule set is the predictive model and is used to classify unseen test samples. The ACO-based algorithm shown below discovers unordered rule lists.
A general association rule mining approach can predict any attribute, not just the class attribute, and can predict the values of more than one attribute. Another difference is that class based association rules are normally used together as a set for classifying unseen test cases. Here a further factor, coverage, is used together with the support and confidence measures.
The algorithm also has a different rule construction process, a different pheromone update formula and a different procedure for classifying unseen test cases.
Discovered_RuleList = {};  /* initialize the rule list with empty set */
TrainingSet = {all training samples};
Initialize min_support, min_confidence, min_coverage;
Initialize No_ants;  /* initialize the maximum number of ants */
FOR EACH CLASS C IN THE TRAINING SET
    Rule_Set_Class = {};  /* initialize the rule set of the selected class with empty set */
    Initialize pheromone value of all trails;
    Initialize the heuristic values;
    Calculate the support of all 1-itemset (item => C) of the training set;
    IF (support(item) < min_support)
        Set the pheromone value of all those items to 0;
    END IF
    g = 1;  /* generation count */
    WHILE (g != no_attributes && coverage < min_coverage)
        Temp_Rule_Set_Class = {};
        t = 1;  /* counter for ants */
        DO
            Ant_t constructs a class based association rule with a maximum g number of items in the rule;
            t = t + 1;
        WHILE (t <= no_ants);
        FOR EACH RULE CONSTRUCTED BY THE ANTS
            IF (support(Rule) >= min_support AND confidence(Rule) >= min_confidence)
                Insert the rule in Temp_Rule_Set_Class;
            END IF
        END FOR
        Sort all the rules in Temp_Rule_Set_Class according to confidence and then support;
        Insert the rules one by one from Temp_Rule_Set_Class into Rule_Set_Class until coverage of Rule_Set_Class is greater than or equal to min_coverage;
        Update pheromones;
        g = g + 1;  /* increment generation count */
    END WHILE
    Insert Rule_Set_Class in Discovered_RuleList
END FOR
Pruning discovered rule set;
Output: Final classifier;
III. OVERVIEW OF THE ALGORITHM
In this section we describe the steps of the ACO-based approach in detail.
3.1 General Description
The approach finds a set of association rules from a training set to form a classifier. It does not mine all possible association rules but only a subset of them. Conventional association rule mining algorithms mine all possible rules, which is computationally expensive for large databases. The rules are selected on the basis of support and confidence. Each rule is of the form:
IF (item1 AND item2 AND …) THEN class
Each item is an attribute-value pair. An example of an item is “weather = cold”: the attribute’s name is “weather” and “cold” is one of its possible values. The consequent of each association rule is a class label from the set of classes present in the training dataset. Only the “=” operator is used because the algorithm deals only with categorical attributes. The algorithm that searches for the rules is ACO based. The search space is defined in the form of a graph, where each node of the graph represents a possible value of an attribute. Rules are discovered for each class separately. A temporary set of rules is discovered during each generation of the algorithm and inserted into a set of rules reserved for the selected class label. This process continues until the coverage of the rule set of the selected class is greater than or equal to a minimum coverage threshold specified by the user. When the rule set of the selected class has sufficient rules to satisfy the minimum coverage threshold, rules are generated for another class. The algorithm stops when the rules of all classes have been generated. The final classifier contains the rules of all classes.
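The rule form and the coverage stopping condition described above can be illustrated with a short sketch. It assumes that an antecedent is stored as a dictionary of attribute-value pairs and that the coverage of a class-specific rule set is measured over the training samples of that class; both choices, and the names `covers` and `class_coverage`, are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Antecedent = Dict[str, str]            # attribute -> value, e.g. {"weather": "cold"}
Sample = Tuple[Dict[str, str], str]    # (attribute values, class label)


def covers(antecedent: Antecedent, attributes: Dict[str, str]) -> bool:
    """True when every attribute-value pair of the antecedent holds for the sample."""
    return all(attributes.get(attr) == value for attr, value in antecedent.items())


def class_coverage(rule_set: List[Antecedent], samples: List[Sample], target: str) -> float:
    """Share of the samples of class `target` matched by at least one rule.

    Assumption: the denominator is the number of samples of the target class.
    """
    of_class = [attrs for attrs, label in samples if label == target]
    if not of_class:
        return 0.0
    covered = sum(1 for attrs in of_class if any(covers(r, attrs) for r in rule_set))
    return covered / len(of_class)


samples = [
    ({"weather": "cold", "wind": "high"}, "stay"),
    ({"weather": "cold", "wind": "low"}, "stay"),
    ({"weather": "hot", "wind": "low"}, "go"),
    ({"weather": "mild", "wind": "low"}, "stay"),
]
rule_set = [{"weather": "cold"}]       # IF (weather = cold) THEN stay
coverage = class_coverage(rule_set, samples, "stay")
```

With a minimum coverage threshold above this value, the search for the class "stay" would continue into the next generation.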
At the start of the algorithm, the discovered rule set is empty and the user-defined parameters are initialized: minimum support, minimum confidence, minimum coverage and the number of ants used by the algorithm. Because the association rules of each class are mined separately, the first step is to select a class from the set of remaining classes. The pheromone values and heuristic values on the links between items (attribute-value pairs) are then initialized. The pheromone values on incoming links to all items that do not satisfy the minimum support threshold are set to zero. The value of zero ensures that these items cannot be selected by ants during the rule construction process. The generation count g is set to 1.
The generation count controls the maximum number of items that an ant can add to the antecedent part of the rule it is constructing. For example, when g = 2 an ant can add a maximum of two items to its rule antecedent. This means that in the first generation we mine one-length association rules only. In the second generation we try to build two-length rules, but we may not be able to reach length two in some cases if the support of all candidate items is below the minimum threshold. Similarly we have third, fourth and subsequent generations. The maximum value of the generation count is the number of attributes in the dataset excluding the class attribute.
3.2 Rule Construction
Each ant constructs a single-item rule in the first generation. In the second generation each ant tries to construct a rule with two items. Similarly we have three-item rules in the third generation, and so on. Rules with a maximum of k items are generated in the kth generation, where k is at most the number of attributes in the training set excluding the class attribute.
3.3 Pheromone Initialization
The pheromone values on all edges are initialized before the start of the WHILE loop for each new class. The edges between all items are initialized with the same amount of pheromone. The initial pheromone is:
$$\tau_{ij}(t=1) = \frac{1}{\sum_{i=1}^{a} b_i} \qquad (1.3)$$
where a is the total number of attributes in the training set excluding the class attribute and b_i is the number of possible values in the domain of attribute a_i. The pheromone values of all items that do not satisfy the minimum support threshold are set to zero.
3.4 Selection of an Item
An ant incrementally adds items to the antecedent part of the rule that it is constructing. When an item (i.e. an attribute-value pair) has been included in the rule, no other value of that attribute can be considered. The probability of selecting an item for the current partial rule is given by Equation (1.4):
$$P_{ij}(t) = \frac{\tau_{ij}(g)\,\eta_{ij}(c)}{\sum_{i=1}^{a} x_i \sum_{j=1}^{b_i} \tau_{ij}(g)\,\eta_{ij}(c)} \qquad (1.4)$$
where τ_ij(g) is the amount of pheromone associated with the link between item_i and item_j in the current generation, and η_ij(c) is the value of the heuristic function on the link between item_i and item_j for the currently selected class. The total number of attributes in the training dataset is a, x_i is a binary variable that is set to 1 if attribute A_i has not been used by the current ant and to 0 otherwise, and b_i is the number of possible values in the domain of attribute A_i. The denominator normalizes the τ_ij(g) η_ij(c) value of each possible choice by the sum of the τ_ij(g) η_ij(c) values of all possible choices. Items with higher pheromone and heuristic values are more likely to be selected.
3.5 Heuristic Function
The heuristic value of an item indicates the quality or attractiveness of that item and is used to guide the process of item selection. We use a correlation based heuristic function that calculates the correlation of candidate items with the last item (attribute-value pair) chosen by the current ant. The heuristic function is:
$$\eta_{ij} = \frac{|\mathrm{item}_i, \mathrm{item}_j, \mathrm{class}_k|}{|\mathrm{item}_i, \mathrm{class}_k|} \cdot \frac{|\mathrm{item}_j, \mathrm{class}_k|}{|\mathrm{item}_j|} \qquad (1.5)$$
The most recently chosen item is item_i, and item_j is the item being considered for addition to the rule. The component |item_i, item_j, class_k| is the number of uncovered training samples having item_i and item_j with class label k, the class for which the ants are constructing rules. This value is divided by the number of uncovered training samples that have item_i with class_k to find the correlation between item_i and item_j.
The other component of the heuristic function indicates the overall importance of item_j in determining class_k. The factor |item_j, class_k| is the number of uncovered training samples having item_j with class_k, and it is divided by |item_j|, the number of uncovered training samples having item_j. The heuristic function thus considers the relationship between the items to be added to the rule and also takes into consideration the overall distribution of the item to be added. As rules are built for a specific class label, the heuristic function depends on the class chosen by the ant.
The heuristic function reduces the irrelevant search space during the rule construction process in order to better guide the ant in choosing the next item of its rule antecedent. It assigns a value of zero to combinations of items that do not occur together for a given class, thus restricting the search space for the ants. It can therefore be very useful for high-dimensional search spaces.
3.6 Rule Construction Stoppage
An ant can add more items to its rule in every generation; for example, if the generation counter is three, it can add a maximum of three items to the antecedent of the rule it is constructing. The rule construction process stops in two cases: first, if the value of the generation counter is equal to the total number of attributes present in the dataset (excluding the class attribute), and second, if in any generation the coverage of the rule set of that particular class reaches the minimum coverage threshold.
3.7 Quality of a Rule
The quality of a rule is calculated on the basis of its confidence, which is calculated as:
$$Q = \frac{TP}{Covered} \qquad (1.6)$$
Here Covered is the number of training samples that match the rule antecedent, and TP is the number of training samples that match the antecedent of the rule and whose class is also the same as the consequent of the rule. If the confidence value is high, the rule is considered more accurate. This value is also used for updating the pheromone values.
3.8 Pheromone Update
The pheromone values are updated after each generation so that in the next generation ants can make use of this information in their search. The amount of pheromone on links between items occurring in those rules which satisfy the minimum support threshold but whose confidence is below the minimum required confidence (and which were hence removed from the temporary rule set) is updated according to Equation (1.7):
$$\tau_{ij}(g+1) = (1-\rho)\,\tau_{ij}(g) + \left(1 - \frac{1}{1+Q}\right)\tau_{ij}(g) \qquad (1.7)$$
where τ_ij(g) is the pheromone value between item_i and item_j in the current generation, ρ represents the pheromone evaporation rate and Q is the quality of the rule constructed by an ant. The pheromones of these rules are increased so that in the next generation ants can explore more of the search space instead of searching around those rules which are already inserted in the discovered rule set. This pheromone strategy increases the diversity of the search by focusing on new, unexplored areas of the search space. The pheromone update on the items occurring in those rules which are rejected due to low confidence but which have sufficient support is done in two steps. First a percentage of the pheromone value is evaporated, and then a percentage of the pheromone (depending upon the quality of the rule) is added. If the rule is good then the items of that rule become more attractive in the next generation and are more likely to be chosen by ants. Pheromones are evaporated to encourage exploration and to avoid early convergence. The pheromone values of other rules are updated by normalizing: each pheromone value is divided by the sum of all pheromone values of its competing items. If the quality of a rule is good and there is a pheromone increase on the items used in the rule, then the competing items become less attractive in the next generation due to normalization. The reverse is true if the quality of the rule is not good.
3.9 Rule Selection Process
After all the ants have constructed their rules during a generation, these rules are placed in a temporary set. The rules are checked against the minimum support and confidence criteria, and those which do not fulfill them are removed. The next step is to insert these rules into the rule set reserved for the discovered rules of the current class. A rule is moved from the temporary rule set to the rule set of the current class only if it is found to enhance the quality of the latter set. For this purpose the top rule of the temporary rule set, called R1, is removed. This rule R1 is compared, one by one, with all the rules already present in the discovered rule set of the selected class. The comparison continues until a rule from the discovered rule set satisfies the criterion described below, or until there are no more rules left in the discovered rule set with which R1 can be compared. In the latter case, when no rule in the discovered rule set fulfills the criterion, R1 is inserted into the discovered rule set. If a rule in the discovered rule set fulfills the criterion, then R1 is rejected and further comparison of R1 is stopped. The criterion is as follows. Let the compared rule of the discovered rule set be called R2. If R2 is more general than R1 and the confidence of R2 is higher than or equal to that of R1, then R2 satisfies the criterion to reject the inclusion of R1. If R2 is exactly the same as R1, the criterion is also satisfied. The logic of this criterion is that, since R2 is already in the rule set, any data sample that matches R1 is also matched by R2, and since we assign the class label of the highest confidence rule, the data sample will always be classified by R2 and R1 will not increase the coverage of the rule set.
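The numeric building blocks of Section III, namely the heuristic of Equation (1.5), the selection probability of Equation (1.4), the rule quality of Equation (1.6), the pheromone update of Equation (1.7) and the rejection criterion of Section 3.9, can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: pheromone and heuristic values are held in dictionaries keyed by the candidate item for a fixed current item, undefined ratios are treated as zero, and "more general" is read as "antecedent is a proper subset". All function and type names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, str]                    # (attribute, value)
Sample = Tuple[Dict[str, str], str]       # (attribute values, class label)
Rule = Tuple[FrozenSet[Item], float]      # (antecedent items, confidence)


def has_item(attrs: Dict[str, str], item: Item) -> bool:
    return attrs.get(item[0]) == item[1]


def heuristic(uncovered: List[Sample], item_i: Item, item_j: Item, target: str) -> float:
    """Eq. (1.5): (|i,j,k| / |i,k|) * (|j,k| / |j|) counted on uncovered samples."""
    n_ik = sum(1 for a, c in uncovered if c == target and has_item(a, item_i))
    n_ijk = sum(1 for a, c in uncovered
                if c == target and has_item(a, item_i) and has_item(a, item_j))
    n_j = sum(1 for a, _ in uncovered if has_item(a, item_j))
    n_jk = sum(1 for a, c in uncovered if c == target and has_item(a, item_j))
    if n_ik == 0 or n_j == 0:
        return 0.0  # assumption: an undefined ratio counts as zero attractiveness
    return (n_ijk / n_ik) * (n_jk / n_j)


def selection_probabilities(tau: Dict[Item, float], eta: Dict[Item, float],
                            used_attributes: Set[str]) -> Dict[Item, float]:
    """Eq. (1.4): normalise tau * eta over items whose attribute is still unused."""
    weights = {j: tau[j] * eta[j] for j in tau if j[0] not in used_attributes}
    total = sum(weights.values())
    if total == 0:
        return {}
    return {j: w / total for j, w in weights.items()}


def rule_quality(true_positives: int, covered: int) -> float:
    """Eq. (1.6): Q = TP / Covered."""
    return true_positives / covered if covered else 0.0


def updated_pheromone(tau_ij: float, rho: float, q: float) -> float:
    """Eq. (1.7): evaporation followed by a quality-dependent increase."""
    return (1 - rho) * tau_ij + (1 - 1 / (1 + q)) * tau_ij


def blocks_insertion(r2: Rule, r1: Rule) -> bool:
    """Section 3.9: R2 rejects R1 if identical, or more general with confidence >= R1."""
    (ant2, conf2), (ant1, conf1) = r2, r1
    if ant2 == ant1:
        return True
    return ant2 < ant1 and conf2 >= conf1  # `<` is the proper-subset test
```

An ant would draw its next item from the distribution returned by `selection_probabilities`, for example with `random.choices`, and stop once the generation limit g is reached.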
3.10 Discovered Rule Set
When the coverage of the discovered rule set of the selected class reaches the coverage threshold, we stop the rule discovery process for that class. This process is repeated for all classes. The final discovered rule set (or list) contains the discovered rules of all classes. A new test case unseen during training is assigned the class label of the rule that covers the test sample and has the highest confidence among all rules covering it. This is implemented by keeping the rules sorted (from highest to lowest) on the basis of their confidence. For a test case the rules are checked one by one in the order of their sorting, and the first rule whose antecedent matches the new test sample is fired and the class predicted by the rule’s consequent is assigned to the sample. If none of the discovered rules is fired, the sample is assigned the majority class of the training set, which is the default class of the classifier.
IV. CONCLUSION
In this paper, class based rule mining using an ACO algorithm was discussed, which combines two primary data mining paradigms, classification and association rule mining. It is a supervised learning approach for discovering association rules. ACO is used to find the most suitable set of association rules. ACO searches only a subset of association rules to form an accurate classifier instead of exhaustively searching all possible association rules in the dataset. The set of discovered rules is evaluated after each generation of the algorithm. Better rules are generated in subsequent generations by adjusting the pheromone values.
V. REFERENCES
[1] G. Chen, H. Liu, L. Yu, Q. Wei, and X. Zhang, “A new approach to classification based on association rule mining,” Decision Support Systems, Vol. 42, No. 2, pp. 674–689, 2006.
[2] B. Liu, W. Hsu, and Y. Ma, “Integrating classification and association rule mining,” in Proceedings of 4th International Conference on Knowledge Discovery and Data Mining, pp. 80–86, 1998.
[3] W. Li, “Classification based on multiple association rules,” MSc Thesis, Simon Fraser University, April 2001.
[4] R.S. Parpinelli, H.S. Lopes, and A.A. Freitas, “Data mining with an ant colony optimization algorithm,” IEEE Transactions on Evolutionary Computation, Vol. 6, No. 4, pp. 321–332, Aug. 2002.
[5] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, 2006.
[6] M. Dorigo and T. Stützle, Ant Colony Optimization. Cambridge, MA: MIT Press, 2004.
[7] A. Freitas, “Survey of evolutionary algorithms for data mining and knowledge discovery,” in A. Ghosh, S. Tsutsui (Eds.), Advances in Evolutionary Computation, Springer-Verlag, pp. 151–160, 2001.
[8] W. Li, J. Han, and J. Pei, “CMAR: Accurate and efficient classification based on multiple class-association rules,” in Proceedings of IEEE International Conference on Data Mining (ICDM ’01), pp. 369–376, 2001.
[9] M. Dorigo, G. Di Caro, and L.M. Gambardella, “Ant algorithms for discrete optimization.”