VLSI Implementation of Flexible Architecture for Decision Tree Classification in Data Mining
Dr K Venkatesh Sharma1,a, Behailu Shewandagn2,b, Shankar Nayak Bhukya3,c
1,a Asst Professor, School of Computing, Jimma Institute of Technology, Jimma University
2,b Asst Professor, School of Computing, Jimma Institute of Technology, Jimma University
3,c Asst Professor, SVIT, Hyderabad
a) [email protected]
b) [email protected]
c) [email protected]
Abstract. Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous increase in the size of the data being collected and analyzed. Classification is one of the main problems in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high accuracy while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for Decision Tree classification in data mining using the C4.5 algorithm.
Keywords: Data mining, Decision tree classification, C4.5 algorithm
1. Introduction
Data mining, also known as knowledge discovery in databases, is the process of extracting interesting and valuable patterns and associations from large data sets. Data mining combines tools from statistics and artificial intelligence (such as neural networks and machine learning) with database management to analyse large data sets. Data mining techniques can be applied to many fields such as business (insurance, banking, retail), science and research (astronomy, medicine), and government security (detection of criminals and terrorists).
Decision trees are among the most powerful and widely accepted approaches in knowledge discovery and data mining, the science and technology of exploring large and complex data sets in order to discover useful patterns from a training dataset. Nowadays the area of data mining has gained greater significance because it enables modelling and information extraction from the huge amounts of data available. Practitioners and theoreticians are continually searching for techniques to make the process more efficient, cost-effective and accurate. Decision trees, originally developed on the basis of decision theory and statistics, are also very useful tools in other areas such as data mining, information retrieval, machine learning, and pattern recognition.
So far, many algorithms have been introduced with different models. Classification algorithms may be
classified as follows.
• Distance based algorithms
• Statistical algorithms
• Neural networks
• Genetic algorithms.
• Decision tree algorithms
K-Nearest Neighbor is one of the well-known distance based algorithms used in classification [5]. It finds the class of a data item by making use of Euclidean or Hamming distance measures, or any other similarity measure.
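As a small illustration (ours, not part of the proposed architecture), the two distance measures and a nearest-neighbour vote can be sketched in Python as follows:

import math

def euclidean_distance(a, b):
    # Distance between two numeric feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    # Number of positions at which two categorical vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)

def knn_classify(query, training, labels, k=3, dist=euclidean_distance):
    # Predict the majority class among the k nearest training items.
    ranked = sorted(range(len(training)), key=lambda i: dist(query, training[i]))
    nearest = [labels[i] for i in ranked[:k]]
    return max(set(nearest), key=nearest.count)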
Statistical classification algorithms commonly include regression and Bayesian algorithms. Bayesian algorithms calculate the class value using Bayes' theorem on probabilities. Regression first determines whether the model is linear or nonlinear and then calculates the class value from the fitted coefficients.
Classification can also be performed using neural networks and genetic algorithms [6]. Neural network algorithms learn weights for the attributes in the data set and predict the class of a data item. John Holland developed genetic algorithms in 1975 [2]. These algorithms mimic the process of natural evolution and use directed random search techniques for numerical optimization. A genetic algorithm uses crossover and mutation operators to evolve the population that best classifies the data set.
Decision tree algorithms make use of a series of if-then rules to build a tree from the given data items. ID3, C4.5, CART and SPRINT are well-known decision tree algorithms used for the tree building process. In this paper we propose a decision tree classifier using the C4.5 algorithm. In the next section we discuss the decision tree model and some of these algorithms.
2. BACKGROUND
Decision tree classification is an important tool in data mining. This section presents various
algorithms used for the implementation of decision tree classification.
The first proposed decision tree classification algorithm is ID3 (Iterative Dichotomiser 3), developed by Quinlan Ross (Quinlan, 1986 and 1987) [1]. Hunt's algorithm is the basis of this algorithm. It chooses the splitting attribute using the information gain measure and can only accept categorical attributes when building the tree structure. ID3 cannot be used where large amounts of data or too much noise are present in the training data set; for this reason it does not provide results as accurate as expected, and a thorough pre-processing of the data is required before using the ID3 algorithm to build a decision tree.
Quinlan Ross proposed the C4.5 algorithm in 1993; it is more accurate than ID3 and avoids the disadvantages of the ID3 algorithm introduced by Quinlan Ross in 1986 and 1987 [1]. For finding the splitting attribute it makes use of the gain ratio impurity measure. C4.5 can handle both continuous and categorical attributes, and it has a superior method for tree pruning that decreases misclassification errors caused by large amounts of data or too much noise in the training data set. A minimal sketch of both splitting measures is given below.
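For reference, the following Python sketch illustrates the two splitting measures, the ID3 information gain and the C4.5 gain ratio. It is our own illustration of the standard definitions, not the hardware implementation, and the function names are ours.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # ID3 criterion: entropy reduction obtained by splitting on one attribute.
    gain = entropy(labels)
    for value in set(row[attr_index] for row in rows):
        subset = [labels[i] for i, row in enumerate(rows) if row[attr_index] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def gain_ratio(rows, labels, attr_index):
    # C4.5 criterion: information gain normalised by the split information.
    split_info = entropy([row[attr_index] for row in rows])
    return information_gain(rows, labels, attr_index) / split_info if split_info else 0.0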
Breiman developed the CART (Classification and Regression Trees) algorithm for decision tree classification in 1984. CART can be used to build both classification and regression trees [1]. CART uses binary splitting of attributes for constructing classification trees. The disadvantages of the CART algorithm are: (1) it does not use combinations of variables; (2) the tree formed by this algorithm is sometimes unreliable; and (3) the tree structure may be unstable, since a small change in the data set may give a different tree. Although the tree built by this algorithm produces the finest split at each node, the resulting tree is sometimes not globally optimal.
The SLIQ (Supervised Learning In Quest) algorithm was developed by Mehta in 1996 [1] and can be used for decision tree classification. Like C4.5 and CART, this algorithm can handle both continuous and categorical attributes. In the tree growth phase SLIQ uses a pre-sorting technique, which decreases the cost of evaluating numeric attributes in the data set. The disadvantage of this algorithm is that it uses a memory-resident class list data structure, which places memory restrictions on the size of the data.
R. Narayanan et al. [9], in the decision tree induction process, first isolate the compute-intensive kernel, the Gini score calculation. They then rearrange the computations to reduce the hardware complexity. To minimize the bandwidth requirements of the DTC architecture, a bitmapped index structure is used for storing the class IDs. This was the first paper to present the implementation of a classification algorithm in hardware. The design is built on an FPGA platform, whose reconfigurable nature gives the user sufficient flexibility, allowing for modified architectures customized to a specific problem and input data size [10]. The most important property of FPGAs that is useful in this design is their ability to scale upward easily as process technology allows for ever larger gate counts. Their architecture provides a 5.58x speedup compared to software implementations.
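The exact bitmap layout used in [9] is not reproduced here; purely as an illustration of the idea, class IDs can be stored as one bit-vector per class, so that counting the records of a class inside a partition becomes a population count over masked bits:

def build_class_bitmaps(class_ids, num_classes):
    # One bit-vector per class: bit r of bitmaps[c] is 1 iff record r has class c.
    # Illustrative sketch only; the concrete structure in [9] may differ.
    bitmaps = [0] * num_classes
    for record, cls in enumerate(class_ids):
        bitmaps[cls] |= 1 << record
    return bitmaps

def count_class_in_partition(bitmaps, partition_mask, cls):
    # Number of records of class `cls` inside the partition selected by the mask.
    return bin(bitmaps[cls] & partition_mask).count("1")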
3. ARCHITECTURE
A. Decision Tree Classification
Data mining techniques can be applied to a variety of data sets, and the information produced by them can be characterized in many different ways. Decision tree structures are widely used to organize classification techniques, and a decision tree makes explicit the sequence of steps that leads to a classification. In general a decision tree starts with a root (parent) node from which the tree growth begins. An algorithm then evaluates each attribute and finds the path along which the tree grows. Typically the decision at each node is expressed through if-else statements, and the process is repeated until the tree reaches a leaf node.
Figure 1. Decision tree model
In a decision tree structure the leaf nodes represent the class labels. In decision tree classification the selection of the root node is very important: the root node should be the attribute that provides the shortest path to the class labels (leaf nodes). The algorithm then recursively repeats the calculation to find the attribute that is most suitable for the next node.
After the selection of the root node, the algorithm adds an arc to the root for each branch, and then adds nodes to the new branches using the steps of the algorithm. If the tree pruning condition is reached during this process, branching stops and the final branch is labeled with one of the class values. Many algorithms can be used for this tree building process; they differ in the splitting criteria used to find a suitable node at each split. Entropy and the Gini index are the most commonly used splitting criteria.
B. System Architecture
In the architecture shown in figure 2, the main components are a Block-RAM, a split info module and a find-minimum-Gini-gain module. The Block-RAM is used to store the input instance values. The split info module consists of comparators, counters, adders, multipliers and memory elements as needed; it computes the Gini gain value for each attribute. These values are given to the find-minimum-Gini-gain module, which determines which attribute has the lowest value and outputs the corresponding attribute name.
Block RAM (BRAM) is a memory primitive used to generate area- and performance-optimized memories using the embedded block RAM resources in Xilinx FPGAs. It cannot be used to implement other functions such as digital logic, so BRAM is a dedicated memory. It is a dual-port memory consisting of several kilobits of RAM, and depending on the FPGA family there may be many such blocks available. They are fixed RAM modules, usually 9 Kbit or 18 Kbit in size. If a small RAM is implemented with a block RAM, the rest of the space in the block is wasted; therefore block RAMs are normally used for large memories, and distributed RAM for small memories or FIFOs. Block RAM is synchronous. In this design the Block-RAM is used to store the data inputs from the PC.
Figure 2. Proposed architecture
The split info calculation sub-block contains two important modules, the counter pattern module and the Gini gain module. The counter pattern module counts the instances in the input training dataset. It takes as input the attribute values of each instance and the value of the class label. If an instance's attribute value is the same as the attribute value checked by the corresponding Gini computing unit, the Ri class-value memory is written at the position indicated by the class value; otherwise, the Li class-value memory is written accordingly. Hence, each memory position holds the frequency with which that class value occurs over all instances. After all the input data have been read, the values of each memory are added and multiplied with each other in order to produce the final output of the sub-system. A behavioural sketch of this counting step is given below.
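As a behavioural illustration only (the hardware uses comparators, counters and class-value memories rather than Python dictionaries), the counting step described above can be sketched as follows; the names Li and Ri follow the description of the two class-value memories.

from collections import defaultdict

def counter_pattern(instances, class_labels, attr_index, checked_value):
    # Software model of the counter pattern module for one checked attribute value.
    # Ri holds class frequencies of instances whose attribute equals the checked
    # value; Li holds class frequencies of all remaining instances.
    Ri = defaultdict(int)
    Li = defaultdict(int)
    for row, label in zip(instances, class_labels):
        if row[attr_index] == checked_value:
            Ri[label] += 1
        else:
            Li[label] += 1
    return Li, Ri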
The find-minimum block is used to select the least value from a set of values. It first finds the root node value from the outputs of the split info sub-system; this value is then stored in memory. After that it identifies the branches from the other values received from the split info sub-system.
The Gini value is a numerical quantity that measures the inequality (impurity) of a data set. Computing the Gini score for a candidate split involves counting the occurrences of each class in each of the partitions. The details of the Gini calculation can be illustrated as follows. Assume that there are R records in the present node, and that there are only two different class ID values, so that there are only two partitions into which the node can be split. The algorithm works over the R records and counts the number of records belonging to each partition. The Gini value for each partition is then given by
\[ \mathrm{Gini}_i \;=\; 1 - \sum_{j=0}^{1} \left(\frac{R_{ij}}{R_i}\right)^{2} \]
where Ri is the total number of instances in partition i and Rij is the number of instances in partition i that hold the class value j. The Gini value of the total split is then computed as the weighted average of the Gini values of the partitions, i.e.,
\[ \mathrm{Gini}_{\mathrm{total}} \;=\; \sum_{i=0}^{1} \frac{R_i}{R}\,\mathrm{Gini}_i \]
The Rij values are stored in a count matrix in memory. The partitions are formed on the basis of a splitting criterion that depends on the value of a particular attribute. Each attribute is a feasible candidate for being the split attribute, so this process of finding the best split has to be carried out over all attributes. Categorical attributes have a limited number of different class ID values, so there is little advantage in further optimizing the Gini value computation for such attributes. A worked software sketch of this computation follows.
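To make these formulas concrete, here is a minimal software sketch (an illustration of the arithmetic only, not the hardware datapath); the function names are ours, and the example corresponds to splitting the Table 1 data on Humidity, where the (Yes, No) counts are (3, 4) for High and (6, 1) for Normal.

def gini_partition(counts):
    # Gini_i = 1 - sum_j (Rij / Ri)^2 for one partition's class counts.
    Ri = sum(counts)
    if Ri == 0:
        return 0.0
    return 1.0 - sum((Rij / Ri) ** 2 for Rij in counts)

def gini_split(count_matrix):
    # Weighted Gini of the whole split; count_matrix[i][j] = Rij.
    R = sum(sum(row) for row in count_matrix)
    return sum((sum(row) / R) * gini_partition(row) for row in count_matrix)

# (Yes, No) counts for Humidity = High and Humidity = Normal in Table 1
print(gini_split([[3, 4], [6, 1]]))  # approximately 0.367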
The C4.5 algorithm is used to construct a decision tree, given a set of non-categorical attributes C1, C2, .., Cn, the categorical (class) attribute C, and a training data set T of records.
Function C4.5 (R: set of non-categorical attributes,
               C: the categorical attribute,
               S: training data)
returns a decision tree structure;
begin
If S is empty, then return a single node with value Failure;
If all of the instances present in S have the same value for the categorical attribute, then return a single node with that value;
If R is empty, then return a single node whose value is the most common value of the categorical attribute found in the instances of S
   (note: if this case occurs, some records will be improperly classified);
Let D be the attribute with the highest Gain(D, S) compared to all other attributes in R;
Let {di | i = 1, 2, .., p} represent the values of attribute D;
Let {Si | i = 1, 2, .., p} represent the subsets of S consisting respectively of the instances with value di for attribute D;
Return a tree with root D and branches d1, d2, .., dp leading respectively to the subtrees
   C4.5(R - {D}, C, S1), C4.5(R - {D}, C, S2), .., C4.5(R - {D}, C, Sp);
end C4.5;
Figure 3. The C4.5 tree construction algorithm
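For readers who prefer an executable form, the following is a minimal Python sketch of the same recursion as figure 3; it assumes information gain as the Gain(D, S) measure and performs no pruning, and helper names such as c45, gain and majority are ours rather than the paper's.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr):
    # Information gain of splitting the rows (dicts with a 'class' key) on attr.
    labels = [r["class"] for r in rows]
    g = entropy(labels)
    for v in set(r[attr] for r in rows):
        sub = [r["class"] for r in rows if r[attr] == v]
        g -= (len(sub) / len(rows)) * entropy(sub)
    return g

def majority(rows):
    return Counter(r["class"] for r in rows).most_common(1)[0][0]

def c45(attrs, rows):
    # Sketch of Function C4.5(R, C, S) from figure 3; returns a nested dict tree.
    if not rows:
        return "Failure"
    classes = set(r["class"] for r in rows)
    if len(classes) == 1:
        return classes.pop()                      # all instances share one class value
    if not attrs:
        return majority(rows)                     # no attributes left: most common class
    d = max(attrs, key=lambda a: gain(rows, a))   # attribute with highest Gain(D, S)
    tree = {d: {}}
    for v in set(r[d] for r in rows):
        subset = [r for r in rows if r[d] == v]
        tree[d][v] = c45([a for a in attrs if a != d], subset)
    return tree

Calling c45 on the 14 instances of Table 1 with the attributes Outlook, Temperature, Humidity and Wind reproduces the tree of figure 4, with Outlook at the root.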
4. RESULTS AND DISCUSSIONS
In this section we describe the results obtained through our work. The work was done in Xilinx 13.2 with the device XC5VLX110T. The example data set used for checking the architecture is shown in Table 1; it contains 14 instances, 4 attributes and one class value. The C4.5 algorithm is applied to this data set and the output is obtained accordingly. The steps of the algorithm are shown in figure 3, which describes each step and from which the output can be reached easily.
TABLE 1: Sample data

Outlook    Temperature  Humidity  Wind    Play ball
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Rain       Cool         Normal    Strong  No
Overcast   Cool         Normal    Strong  Yes
Sunny      Mild         High      Weak    No
Sunny      Cool         Normal    Weak    Yes
Rain       Mild         Normal    Weak    Yes
Sunny      Mild         Normal    Strong  Yes
Overcast   Mild         High      Strong  Yes
Overcast   Hot          Normal    Weak    Yes
Rain       Mild         High      Strong  No
Total: 14 instances
Quinlan developed the C4.5 algorithm for inducing classification models, also known as decision trees, from a training data set. CLS (Concept Learning System), developed by Hunt, is the basic principle followed by all decision tree induction algorithms.
We are given a set of instances, each with the same structure, consisting of a number of attribute/value pairs. One of these attributes represents the class (category) of the instance. The problem is to construct a decision tree that, on the basis of answers to questions about the non-category attributes, correctly predicts the value of the category attribute. Commonly the category attribute takes only the values {true, false}, {success, failure}, or something equivalent.
The final classification tree is shown in figure 4. Using the C4.5 algorithm we first find the information gain of each attribute in order to select the root node; attribute selection for the root node is the basic step in constructing a decision tree. Entropy and information gain are used in this attribute selection process, by which the C4.5 algorithm decides which attribute becomes a node of the decision tree, and so on. The result obtained through our work is shown in figure 5. A worked calculation of the attribute selection for the root node on the Table 1 data follows.
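Assuming the standard base-2 entropy, a worked sketch of the root-node selection on the Table 1 data (9 Yes, 5 No) is:

\[ \mathrm{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \]
\[ \mathrm{Gain}(S,\mathrm{Outlook}) \approx 0.247,\quad \mathrm{Gain}(S,\mathrm{Humidity}) \approx 0.152,\quad \mathrm{Gain}(S,\mathrm{Wind}) \approx 0.048,\quad \mathrm{Gain}(S,\mathrm{Temperature}) \approx 0.029 \]

Outlook has the highest information gain, which is consistent with its selection as the root node in figure 4.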
Figure 4. Final tree structure
The final tree structure is shown in figure 4. It takes Outlook as the root node, with branches Sunny, Rain and Overcast. From Sunny it goes to the node Humidity, whose two branches are High and Normal; from High it reaches the leaf node NO and from Normal the leaf node YES. Similarly, from the branch Rain it goes to the node Wind, whose two branches are Weak and Strong; from the Weak branch it reaches the leaf node YES and from the Strong branch the leaf node NO. From the branch Overcast it directly reaches the leaf node YES.
This architecture can be used to build a decision tree for any application in the data mining process. It is a flexible architecture that can be used with any data set having 4 attributes, one class value and any number of attribute values.
Figure 5. Final result
5. CONCLUSION
Decision tree induction is one of the classification techniques used in decision support systems and machine learning. With the decision tree technique, the input data set is recursively partitioned using Hunt's method or a breadth-first greedy technique until each partition is pure, i.e. belongs to the same class. The decision tree model is preferred over other classification algorithms because it is an eager learning algorithm and is easy to implement. This paper presents an implementation of the widely used C4.5 decision tree classification method, together with a system implementing the split info node's computation using the counter pattern and the Gini impurity classification method.
REFERENCES
[1] Matthew N. Anyanwu & Sajjan G. Shiva, "Comparative Analysis of Serial Decision Tree Classification Algorithms."
[2] M. Joshi, G. Karypis, and V. Kumar, "ScalParC: A new scalable and efficient parallel classification algorithm for mining large datasets," Proceedings of the 11th International Parallel Processing Symposium (IPPS), 1998.
[3] Z. K. Baker, V. K. Prasanna, "Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs," 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 3-12, 2005.
[4] Y. H. Wen, J. W. Huang, M. S. Chen, "Hardware-Enhanced Association Rule Mining with Hashing and Pipelining," IEEE Transactions on Knowledge and Data Engineering, pp. 784-795, 2008.
[5] M. Estlick, M. Leeser, J. Theiler, J. J. Szymanski, "Algorithmic transformations in the implementation of K-means clustering on reconfigurable hardware," Proceedings of the ACM/SIGDA 9th International Symposium on Field Programmable Gate Arrays, pp. 103-110, 2001.
[6] X. Wang, M. Leeser, "K-means Clustering for Multispectral Images Using Floating-Point Divide," 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 151-162, 2007.
[7] M. Papadonikolakis, C. Bouganis, G. Constantinides, "Performance comparison of GPU and FPGA architectures for the SVM training problem," International Conference on Field-Programmable Technology, pp. 388-391, 2009.
[8] S. Sun, J. Zambreno, "Mining Association Rules with Systolic Trees," International Conference on Field Programmable Logic and Applications, pp. 143-148, 2008.
[9] R. Narayanan, D. Honbo, G. Memik, A. Choudhary, J. Zambreno, "An FPGA Implementation of Decision Tree Classification," Design, Automation and Test in Europe Conference and Exhibition, pp. 45, 2007.
[10] A. Choudhary, R. Narayanan, B. Ozisikyilmaz, G. Memik, J. Zambreno, J. Pisharath, "Optimizing data mining workloads using hardware accelerators," Proceedings of the Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW), 2007.