Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Vlsi Implementation Of Flexible Architecture For Decision Tree Classification In Data Mining Dr K Venkatesh Sharma1, Behailu Shewandagn2, Shankar Nayak Bhukya3 1,a Asst Professor, School of Computing, Jimma Institute of Technology, Jimma University,. Asst Professor, School of Computing, Jimma Institute of Technology, Jimma University 3,c Shankar Nayak Bhukya, Asst Professor, SVIT, Hyderabad. 2,b a) [email protected] b) [email protected] c) [email protected] Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm. Abstract. The Keywords: Data mining, Decision tree classification, c4.5 algorithm 1. Introduction Data mining, also known as knowledge discovery in database, is the process of obtaining attractive and valuable patterns and associations in large data sets. Data mining combines utensils from statistics and artificial intelligence (such as neural networks and machine learning) with database management to analyse large data sets. Data mining techniques can be applied to many fields such as business (insurance, banking, retail), science and research (astronomy, medicine), and government security (detection of criminals and terrorists). Decision trees are the most dominant and accepted approaches in knowledge discovery and data mining. It is the science and technology of exploring huge and composite data sets in order to find out useful patterns form training dataset. Now days the area of data mining achieves greater significance because it enables the modelling and information mining from the huge amounts of data available. Practitioners and theoreticians are incessantly in the hunt for techniques to make the process more proficient, cost-effective and accurate. Originally Decision trees are implemented in basis of decision theory and statistics, are very much useful tools in other areas such as data mining, information mining, machine learning, and pattern detection. So far, many algorithms have been introduced with different models. Classification algorithms may be classified as follows. Distance based algorithms • Statistical algorithms • Neural networks • Genetic algorithms. • Decision tree algorithms K-Nearest Neighbor is one of the well known distance based algorithms used in clustering in the decision tree classification technique [5]. It finds the class of a data item by making use of Euclidian and Hamming distance measures or any similarity measures. Statistical classification algorithms commonly include Regression and Bayesian algorithms. Bayesian algorithms calculate the class value using the probability theorem. In regression, it first determines whether the model is linear or nonlinear. After that it calculates the class value with the coefficients. Clustering can also be done using Neural Networks and Genetic algorithms [6]. Neural Network algorithms calculate weights of attributes in the data set and predict the class of the data item. John Holland developed Genetic algorithms in 1975[2]. This algorithm uses nature evolution process and directed random search techniques used in numerical optimization technique. Genetic algorithm uses cross over and mutation techniques to produce the best population to classify the data set. Decision tree algorithms make use of a series of if-else-then rules to build a tree from a the given data items. ID3, C4.5, CART and sprint are the well known decision tree algorithms used for tree building process. In this paper we propose a decision tree classification using c4.5 algorithm. In the next section we will discuss the decision tree model and some of the algorithms. 2. BACK GROUND Decision tree classification is an important tool in data mining. This section presents various algorithms used for the implementation of decision tree classification. First proposed decision tree classification algorithm is IDE3 (Iterative Dichotomiser 3) which was developed by Quinlan Ross(Quinlan, 1986 and 1987)[1]. Hunt’s algorithm is the base for this algorithm. It chooses the splitting attribute using the concept of information gain measure. It can only accept categorical attributes in the process of making a tree structure. IDE3 cannot be used in an atmosphere where large amounts of data or too much noise is present in training data set. Because of this reason it doesn’t provide correct result as expected. So a thorough pre-processing of data is required before using IDE3 algorithm for building a decision tree structure. Quinlan Ross proposed C4.5 algorithm in 1993, which is more perfect than IDE3 algorithm ie it avoids the disadvantages of IDE3 algorithm, introduced by Quinlan Ross in 1986 and 1987[1]. For finding out the splitting attribute it makes use of gain ratio impurity method. C4.5 can take both continuous and categorical attributes. It has a superior method for tree pruning that decreases misclassification errors due large amounts of data or too much noise in the training data set. Breiman in 1984 developed CART (Classification and regression trees) algorithm for decision tree classification. CART can be used to build both classifications and regressions trees[1]. CART uses binary splitting of attributes for constructing classification trees. The disadvantages of the CART algorithm are (1) it does not use combinations of variables. (2) Tree formed through this algorithm is sometimes unreliable. (3) Tree structures may be unbalanced – a small change in the data set may give different trees. Though the tree build through this algorithm is most advantageous and produces finest split at each node but sometimes the tree is not globally optimal. SLIQ (Supervised Learning In Queues) algorithm was developed by Mehta in 1996[1]. This algorithm can be used for decision tree classification. Like C4.5 and CART, this algorithm can handle both continuous and categorical attributes. In the tree growth phase SLIQ uses a pre-sorting technique, which decreases the cost of evaluation of numeric attributes in the data set. The disadvantage of this algorithm is it uses a class list data structure that is memory resident, which causes memory restrictions on the data. R. Narayanan, in this paper, in the decision tree induction process they first isolate the compute intensive kernel, called Gini score calculation [9]. For reducing the hardware complexity then they rearranges the computations. To minimize the band width requirements of the DTC architecture it uses a bitmapped index structure for storing class IDs. This is the first paper that presents the implementation of a classification algorithm in hardware. The design is build on FPGA platform, as their reconfigurable nature provides the user sufficient flexibility, allowing for modified architectures customized to a specific problem and input data size[10]. The most important property of FPGAs that is very much useful in this design is their ability to scale upward easily as process technology allows for ever larger gate counts. This architecture provides 5.58 times speed up as compared to software implementations 3. ARCHITECTURE A. To Decision Tree Classification Data mining techniques can be applied to a variety of data sets and Information produced by data mining techniques can be characterized in many different ways. Decision tree structures are widely used to organize classification techniques. Decision trees can be used to find out the steps that reaches a classification. In general decision trees are starts with a root node or parent node from which the tree growth begins. Then it evaluates each attribute using an algorithm and finds out the path through which the tree grows. Typically Decision making process is done through if else statements. The process is repeated until the tree reaches the leaf node. Figure 1 Decision Tree Model In a decision tree structure the leaf node represent the class label. In decision tree classification the selection of root node is very important. The root node must be the node, which provide the shortest path to the class label or leaf node. Then the algorithm recursively make calculations to find out attribute, which is suitable for the next node. After the selection of the root node the algorithm adds arcs to the root for each branch. Then it adds nodes to new branches by using the steps in the algorithm. During the processes if the tree pruning condition is reached, the branching stops and the final branch is labeled with one of the class value. Many algorithms can be used for this tree building processes. These algorithms are differ s in their splitting criteria used for finding suitable nodes at each split. Entropyand gini index are the most commonly used spitting criteria for different algorithms. B. System Architecture In this architecture shown in figure 2 the main components are Block-RAM, split info module and a find min Gini gain module. Block-RAM is used to store the instances(input) values. Split info module consists of comparators, counters, adders, multipliers and memory elements according to the need. It computes the Gini gain values for each attribute. This values are given to the find minimum Gini gain module. This module find out which has lower value and gives the attribute name according to the lower value at the output. Block Random access memory (BRAM) is an highly developed memory constructor that is used to generate area and performance-optimized memories with the help of embedded block RAM resources in Xilinx FPGAs. It cannot be used to implement other functions like digital logic. So BRAM is a dedicated memory. It is a two port memory consisting of several kbits of RAM. Depending on how advance the FPGA, there may be several of them. Block-RAM is used to store the data inputs from pc. They are fixed RAM modules. Usually they are available in 9 Kbits or 18 Kbits in size. If a small RAM is implemented with a block RAM then its wastes the rest of the space in RAM. Usually block RAMs are used for large sized memories and distributed RAM for small sized memories or FIFO's. Block RAM memory is synchronous. Block-RAM is used to store the data inputs from pc. Figure2 Proposed Architecture The Split Info calculation sub block contains two important modules, the counter pattern module and the Gini Gain module. The counter pattern module calculates the number of instances in the input training dataset. It takes as input the instances’ value(s) of the attributes and the value of the class label. In case the instances’ attribute values are the same as the checked attribute value from the equivalent Gini computing machine the Ri class value memory is written in the position that the class value points, otherwise, the Li class value memory is written accordingly. Hence, in each memory position, the written data are equal to the frequency of the class value exists in all instances. After all the input data are read, the values of each memory are added and multiplied with each other in order to find the final output of the sub system. This block is used to get the least values from a set of values. This block firstly finds out the root node value from the output of the split info sub system. Then this value is given to the memory to store that value. After that it finds out the branches from the other values got from the split info sub system. The Gini value is a numerical quantity which calculates the inequality of a data set. Computing the Gini score for a typical split involves calculating the occurrence of each class in each of the partitions. The particulars of the Gini calculation can be verified by the following pattern. Assume that there are R records in the present node. Also, assume that there are only two different values of class IDs, hence there is only two partitions into which the root node can be split. The algorithm works over the R records and calculates the number of records belonging to different partitions. The Gini value for each partition is then given by 1 𝐺𝑖𝑛𝑖𝑖 = 1 − ∑ 𝑅𝑖𝑗/(𝑅𝑖 ∗ 𝑅𝑖) 𝑗=0 Where Ri is the number of total number of instances in partition i. Rij represent the number of instances that holds the class value j. The Gini value of the total split is then computed using the weighted average of the Gini values for each partition, i.e., 1 𝑅𝑖 Gini total=∑𝑖=0 𝑅 ∗ 𝐺𝑖𝑛𝑖𝑖 The Rij values are stored in a count matrix, in memory. The partitions are formed based on splitting criteria, which depends on the value of a particular attribute. Each attribute is a feasible nominee for being the split attribute. Hence this process of calculating the finest split has to be done over all attributes. Categorical attributes have a limited number of different class ID values, so there is little advantage in optimizing Gini value computation for such attributes. The C4.5 algorithm is used to construct a decision tree, given a set of non-categorical attributes C1, C2, ..,Cn, the categorical attribute C, and a training data set T of records. Function C4.5 (R: set of non-categorical attributes, C: set of the categorical attribute, S: training data) gives a decision tree structure; start If S is vacant, then produce a single node with value Failure; If all of the instances present in S have the same value for the categorical attribute, then produce a single node with that value; If R is vacant , then produce a single node with a value which is the most common value of the categorical attribute that are found in instances of S Note:Ifthe records are improperly classified then errors will be there. Let D be the attribute with highestGain(D,S) compared to all other attributes. Let {di| i=1,2, .., p} represents the values of attribute D. Let {Si| i=1,2, .., p} represents subsets of S consisting respectively of instances with value di for attribute D. Generate a tree with root as D and branches as d1, d2, ..,dp. C4.5(R-{D}, C, S1), C4.5(R-{D}, C, S2), .., C4.5(R-{D}, C, Sp), end C4.5; . Figure 3 Algorithm 4. RESULTS AND DISCUSSIONS In this section we describe the results obtained through our work. We did our work in Xilinx 13.2 version with device XC5VLX110T. The example data set used for checking the architecture is shown in table 1. It contains 14 instances, 4 attributes and one class value. For doing this, c4.5 algorithm is used and according to that the output is obtained. Steps used in the algorithm are shown in figure 3. It describes each step of the algorithm and using those steps we can reach the output easily. 5. TABLE 1 : Sample Data Outlook Temperature Humidity Wind Play ball Sunny Hot High Weak No Sunny Hot High Strong No Overcast Hot High Weak Yes Rain Mild High Weak Yes Rain Cool Normal Weak Yes Rain Cool Normal Strong No Overcast Cool Normal Strong Yes Sunny Mild High Weak No Sunny Cool Normal Weak Yes Rain Mild Normal Weak Yes Sunny Mild Normal Strong Yes Overcast Mild High Strong Yes Overcast Hot Normal Weak Yes Rain High Strong No Mild Total 14 Quinlan developed C4.5 algorithm for inducing Classification Models, also known as Decision Trees, from training data set. CLS (concept learning system) is the basic principle followed by all Decision tree induction algorithms. Hunt developed this principle. We are given with a set of instances. Each instance has the same structure. It consists of a number of attribute/value pairs. One of these attributes is used to represents the class value of the instance. The problem lies in the constructing process of a decision tree on the basis of answers to questions about the non-category attributes find out correctly the value of class. Commonly the category attribute takes only the values {true, false}, or {success, failure}, or something equivalent. The final classified tree is shown in figure 4. Using the c4.5 algorithm we first find out the information gain of each attribute for selecting the root node. Attribute selection for root node is the basic step in constructing a decision tree. Entropy and Information Gain is used in process of attribute selection, using attribute selection,C45 algorithm select which attribute will be selected to become a node of the decision tree and so on. The result obtained through our work is shown in figure 5 Figure 4. Final tree structure The final tree structure is shown in figure 5. It takes outlook as root node and its branches are sunny, rainy and overcast. From sunny it goes to the node humidity and the two branches are high and normal. Then from high it goes to the leaf node NO and from normal it goes to the leaf node YES. Similarly from the branch rainy it goes to the branch node windy. The two branches are true and false. From false branch it goes to the leaf node YES and from true branch it goes to the leaf node NO. From the branch overcast it directly reaches the leaf node YES. This architecture can be used for building decision tree for any applications in the data mining process. It is a flexible architecture, which can be used for any data set with 4 attributes, one class value and any number of attribute values. Figure 5. Final result 5. CONCLUSION Decision tree induction is one of the classification techniques used in decision support systems and machine learning process. With the help of decision tree technique the input data set is recursively partitioned using Hunt’s method or breadth-first greedy technique until each partition is pure or belong to the same class. Decision tree model is preferred among other classification algorithms because it is an eager learning algorithm and easy to implement. This paper presents an implementation of the widelyused c4.5 Decision tree classification method as well as a system implementing the split info node’s computation, using counter pattern and the Gini impurity classification method. REFERENCES [1] Matthew N. Anyanwu&Sajjan G. Shiva.Comparative Analysis of Serial Decision Tree Classification Algorithms. [2] M. Joshi, G. Karypis, and V. Kumar.ScalParC: A new scalable and efficient parallel classification algorithm for mining large datasets. In Proceedings of the 11th International ParallelProcessing Symposium (IPPS), 1998. [3] Z. K. Baker, V. K. Prasanna, “Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs,” Field-Programmable Custom Computing Machines, 13th Annual IEEE Symposium on FieldProgrammable Custom Computing Machines (FCCM), pp. 3-12, 2005. [4] Y. H. Wen, J. W. Huang, M. S. Chen, “Hardware-Enhanced Association Rule Mining with Hashing and Pipelining,” IEEE Transactions on Knowledge and Data Engineering, pp. 784-795, 2008. [5] M. Estlick, M. Leeser, J. Theiler, J. J. Szymanski, “Algorithmic transformations in the implementation of K-means clustering on reconfigurable hardware,” Proceedings of the ACM/SIGDA 9th international symposium on Field programmable gate arrays, pp. 103-110, 2001. [6] X. Wang, M. Leeser, “K-means Clustering for Multispectral Images Using Floating-Point Divide,” FieldProgrammable Custom Computing Machines, 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 151-162, 2007. [7] M. Papadonikolakis, C. Bouganis, G. Constantinides, “Performance comparison of GPU and FPGA architectures for the SVM training problem,” International Conference on Field- Programmable Technology, pp.388-391, 2009. [8] S. Sun, J. Zambreno, “Mining Association Rules with systolic trees,” International Conference on Field Programmable Logic and Applications, pp.143-148, 2008. [9] R. Narayanan, D. Honbo, G. Memik, A.Choudhary, J. Zambreno, “An FPGA Implementation of Decision Tree Classification,” Design, Automation and Test in Europe Conference and Exhibition, pp. 45, 2007. [10] A. Choudhary, R. Narayanan, B. Ozisikyilmaz, G. Memik, J. Zambreno, J. Pisharath, “Optimizing data mining workloads using hardware accelerators,” Proceedings of the Workshop of Computer Architecture Evaluation using Commercial Workloads (CAECW), 2007.