International Journal of Artificial Intelligence and Applications for Smart Devices Vol. 4, No. 1 (2016), pp. 9-32
http://dx.doi.org/10.14257/ijaiasd.2016.4.1.02
ISSN: 2288-6710 IJAIASD

A Survey on Issues of Decision Tree and Non-Decision Tree Algorithms

Kishor Kumar Reddy C1 and Vijaya Babu2
1, 2 Department of Computer Science and Engineering, K L University, Guntur
1 [email protected], [email protected]

Abstract

Decision tree and non-decision tree approaches are used effectively in many diverse areas such as speech recognition, radar signal classification, satellite signal classification, medical diagnosis, remote sensing, expert systems, weather forecasting and so on. Even though classification has been studied widely in the past, many of the algorithms are designed only for memory-resident data, thus limiting their suitability for mining large data sets. The volume of data in databases is growing to quite large sizes, in the number of attributes, instances and class labels alike. Learning a decision tree from a very large set of records in a database is a complex and usually very slow process, often beyond the capabilities of existing computers. This paper attempts to summarize the proposed approaches, tools, etc. for decision tree learning, with emphasis on optimization of the constructed trees and on handling large datasets. Further, we also discuss and summarize various non-decision tree approaches such as Neural Networks, Support Vector Machines, Naive Bayes and so on.

Key-Words: Classification, Decision Tree, Expert Systems, Pattern Recognition, Remote Sensing

1. Introduction

Recent advances in data collection methodologies, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques.
Huge amounts of data are being collected daily from major scientific projects such as Geographical Information Systems, hospital information systems, the Human Genome Project, the Hubble Space Telescope, stock trading, computerized sales records and many other sources. In addition, researchers, scientists, students and practitioners from more diverse disciplines than ever before are attempting to use automated methods to analyze their data. As the quantity and variety of data available to data exploration methods increase, there is a commensurate need for robust, efficient, accurate and versatile data exploration methods. Supervised learning methods attempt to discover the relationship between the source attributes and the target attribute [1-2]. Classification is a significant problem in the emerging field of data mining and can be described as follows: the input data (training set) comprises multiple records, each having multiple attributes followed by a class label [141][182]. The goal of classification is to analyze the input data and to develop an accurate model for each class using the features present in the data. The class descriptions are used to classify future test data for which the class labels are unknown; they can also be used to develop a better understanding of each class in the data [1-2]. Decision trees have been found to be among the best algorithms for data classification, providing good accuracy in many real-world applications [3-6][141][182]. Researchers, academicians, practitioners and scientists have developed various decision tree algorithms over a period of time, with enhancements in performance and in the ability to handle various types of data [7-15].
In this paper, we present and summarize various decision tree approaches [141][182] such as ID3 [39], C4.5 [8], CART [9], SLIQ [121], SPRINT [122][126], SLEAS [177-179], SLGAS [180], ISLIQ [181] and so on. Further, we also discuss various non-decision tree approaches such as Rule Based Classifiers [141], Nearest Neighbour Classifiers [141], Bayesian Classifiers [141], Artificial Neural Networks (ANN) [141] and Support Vector Machines [141]. The rest of this paper is laid out as follows: Section 2 presents the representation of decision trees. Section 3 presents various splitting criteria for decision tree algorithms. Section 4 gives a detailed description of various pruning techniques and recent improvements. Section 5 provides a comparison and summary of various decision tree and non-decision tree approaches. Section 6 surveys various software tools for learning decision tree and non-decision tree models. Section 7 discusses real-time applications of decision tree and non-decision tree approaches. Section 8 concludes the paper, followed by the references.

2. Decision Tree Representation

A decision tree is a classification model applied to existing data; it scales to high-dimensional data and does not require any domain knowledge [1-6]. A decision tree can be described as a combination of mathematical and statistical techniques for the categorization and generalization of a dataset. A decision tree is a schematic tree-shaped diagram used to determine a course of action or show statistical probability. Each branch of the decision tree represents a possible decision, which may lead to another attribute test until it ends in a leaf node. Each leaf of the decision tree represents a target value, while input variables are tested at each internal node. The main learning process of a decision tree is the recursive splitting of the source set into subsets until the leaf level is reached [8].
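The recursive splitting process described above can be sketched in a few lines of Python. This is a minimal illustration, not any of the surveyed algorithms; the dict-based tree encoding and function names are our own, and the attribute to split on is taken in a fixed order rather than chosen by a splitting criterion:

```python
# Minimal sketch of recursive decision tree growth: split the source set into
# subsets until each subset is homogeneous (a leaf) or no attributes remain.
from collections import Counter

def majority_class(rows):
    # rows is a list of (attribute_dict, class_label) pairs
    return Counter(label for _, label in rows).most_common(1)[0][0]

def grow_tree(rows, attributes):
    labels = {label for _, label in rows}
    if len(labels) == 1 or not attributes:      # homogeneous, or nothing left to split on
        return {"leaf": majority_class(rows)}
    attr = attributes[0]                        # a real learner would pick the best split here
    node = {"split": attr, "children": {}}
    for value in {r[attr] for r, _ in rows}:
        subset = [(r, c) for r, c in rows if r[attr] == value]
        node["children"][value] = grow_tree(subset, attributes[1:])
    return node

def classify(tree, record):
    # follow the branch matching each tested attribute until a leaf is reached
    while "leaf" not in tree:
        tree = tree["children"][record[tree["split"]]]
    return tree["leaf"]
```

A real algorithm such as ID3 or CART differs mainly in how `attr` is chosen (entropy, gini index, etc.) and in how the grown tree is pruned afterwards.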
A decision tree classifier divides the training set into partitions recursively, so that all or most of the records in a partition belong to a single class. The algorithm starts with the entire dataset at the root node. The dataset is partitioned into subsets according to a splitting criterion, using statistical measures such as entropy, the gini index, the gain ratio and so on [6][8-16]. The splitting is carried out by choosing the attribute that best separates the remaining samples of the node into the individual classes. Decision tree generation consists of two phases [141][182]:

Tree construction: at the start, all the training examples are at the root; examples are then partitioned recursively based on selected attributes.

Tree pruning: branches that reflect noise or outliers are identified and removed.

Decision trees are applied in decision analysis problems to identify the most suitable strategy for reaching a concrete goal. Decision trees provide classification rules and can be understood easily. However, the accuracy of decision trees has always been a problem [1-2][4]. Several advantages of decision tree-based classification have been pointed out [141][182]:

1. Knowledge acquisition from pre-classified examples circumvents the bottleneck of acquiring knowledge from a domain expert.

2. Tree methods are exploratory as opposed to inferential. They are also nonparametric. As only a few assumptions are made about the model and the data distribution, trees can model a wide range of data distributions.

3. Hierarchical decomposition implies better use of the available features and computational efficiency in classification.

4. As opposed to some statistical methods, tree classifiers can treat uni-modal as well as multi-modal data in the same fashion.

5.
Trees can be used with the same ease in deterministic as well as incomplete problems.

6. Trees perform classification by a sequence of simple, easy-to-understand tests whose semantics are intuitively clear to domain experts. The decision tree formalism itself is intuitively appealing.

For these and other reasons, decision tree methodology can provide an important tool in every data mining researcher's and practitioner's tool box. In fact, many existing data mining products are based on constructing decision trees from data. Naturally, decision makers prefer a less complex decision tree, as it is considered more comprehensible. Furthermore, according to Breiman et al., the tree complexity has a crucial effect on its accuracy. The tree complexity is explicitly controlled by the stopping criteria used and the pruning method employed.

3. Splitting Criterions in Decision Tree

3.1. Gini Index [9][141][182]

Step 1: Calculate the G_info value for the class label, shown in equation (1):

G_info = 1 - \sum_{i=1}^{M} P_i^2    (1)

where P_i is the probability that a tuple in the given dataset belongs to class C_i, and M is the number of class labels.

Step 2: Calculate the G_info_D value for every attribute, shown in equation (2):

G_info_D = \sum_{j=1}^{N} P_j (1 - \sum_{i=1}^{M} P_{ij}^2)    (2)

Step 3: The Gini Index is obtained as the difference between the G_info and G_info_D values, shown in equation (3):

Gini Index = G_info - G_info_D    (3)

Step 4: The maximum Gini Index value is taken as the best split point and becomes the root node, shown in equation (4):

Best Split point = Maximum(Gini Index)    (4)

3.2. Entropy [7][141][182]

Step 1: Calculate the Attribute Entropy value, shown in equation (5):

Attribute Entropy = - \sum_{j=1}^{N} P_j \sum_{i=1}^{M} P_{ij} \log_2 P_{ij}    (5)

Step 2: Calculate the Class Entropy value for every attribute, shown in equation (6):
Class Entropy = - \sum_{i=1}^{M} P_i \log_2 P_i    (6)

Step 3: The Entropy (information gain) is obtained as the difference between the Class Entropy and the Attribute Entropy, shown in equation (7):

Entropy = Class Entropy - Attribute Entropy    (7)

Step 4: The maximum Entropy value is taken as the best split point and becomes the root node, shown in equation (8):

Best Split point = Maximum(Entropy)    (8)

3.3. Gain Ratio [7][141][182]

Step 1: Calculate the Entropy using equation (7).

Step 2: Compute the Split Info for each and every attribute, shown in equation (9):

Split Info = - \sum_{i=1}^{N} (|T_i| / |T|) \log_2 (|T_i| / |T|)    (9)

Step 3: Calculate the Gain Ratio using equation (10):

Gain Ratio = Entropy / Split Info    (10)

Step 4: The maximum Gain Ratio value is taken as the best split point and becomes the root node, shown in equation (11):

Best Split point = Maximum(Gain Ratio)    (11)

4. Decision Tree Pruning [141][182]

Decision trees are often large and complex, and as a result they may become inaccurate and incomprehensible. The causes of overfitting in decision trees are noisy data, the unavailability of training samples for one or more classes, and insufficient training instances to represent the target function. Decision tree pruning removes one or more subtrees from a decision tree. It makes overfitted trees more accurate in classifying unseen data. Various methods have been proposed for decision tree pruning. These methods selectively replace a subtree with a leaf if this does not reduce classification accuracy over a pruning data set. Pruning may increase the number of classification errors on the training set, but it improves classification accuracy on unseen data [141][182].

4.1. Cost Complexity Pruning

Cost complexity pruning is used in the CART system [9].
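The three splitting criteria above (gini index, entropy/information gain, and gain ratio) can be sketched in Python as follows. This is a minimal illustration of equations (1)-(11), not code from any of the cited systems; the function names and list-based data layout are our own:

```python
# Splitting criteria sketched from equations (1)-(11): each function scores a
# candidate split of `labels` induced by the attribute `values` of the records.
from collections import Counter
from math import log2

def gini(labels):
    """G_info of equation (1): 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Class Entropy of equation (6)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def partition(values, labels):
    """Group the class labels by attribute value (one group per branch)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return groups

def gini_index(values, labels):
    """Equation (3): impurity reduction achieved by splitting on the attribute."""
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in partition(values, labels).values())
    return gini(labels) - weighted

def info_gain(values, labels):
    """Equation (7): class entropy minus the attribute (conditional) entropy."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in partition(values, labels).values())
    return entropy(labels) - weighted

def gain_ratio(values, labels):
    """Equation (10): information gain normalised by the split info of eq. (9)."""
    n = len(labels)
    split_info = -sum(len(g) / n * log2(len(g) / n)
                      for g in partition(values, labels).values())
    return info_gain(values, labels) / split_info if split_info else 0.0
```

The attribute whose score is highest (equations (4), (8) and (11)) is selected as the split at the current node.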
Starting with the initial unpruned tree constructed from the complete training set, this algorithm constructs a chain of progressively smaller pruned trees by replacing one or more subtrees with the best possible leaves. The method prunes those subtrees that give the lowest increase in error on the training data. The cost complexity of a tree is defined as its misclassification rate on the training data plus the number of leaves in the tree multiplied by a parameter α [17-19].

4.2. Reduced-Error Pruning

Reduced-error pruning, proposed by Quinlan, is a bottom-up approach in which non-leaf subtrees are replaced with the best possible leaf nodes if these replacements reduce the classification error on the pruning data set [19]. The process continues towards the root node as long as pruning decreases the error. The process yields the smallest and most accurate decision tree with respect to the test data [17-20].

4.3. Critical Value Pruning

Mingers proposed critical value pruning, which uses the information gathered during tree construction [20]. It sets a threshold, called the critical value, to select a node for pruning. Various measures such as gain, information gain, etc. can be used to select the best attribute at the test node. If the value of the selection criterion is smaller than this threshold, the subtree is replaced with the best possible leaf. However, if the subtree contains at least one node with a value greater than the threshold, the subtree cannot be pruned [17-18].

4.4. Minimum Error Pruning

Niblett and Bratko proposed minimum-error pruning, which is a bottom-up approach [17]. To get the error estimate of a subtree to be pruned, the errors of its children are estimated. The dynamic error of a node t is calculated as the weighted sum of the static errors of its children. If the dynamic error of t is less than its static error, t is pruned and replaced with the best possible leaf.

4.5. Pessimistic Error Pruning

Pessimistic error pruning, a top-down approach proposed by Quinlan, uses error rate estimates to make pruning decisions for subtrees, similar to cost complexity pruning [19]. It calculates classification errors on the training data and does not require a separate pruning set. Since classification errors estimated from the training set cannot provide the best pruning results for unseen data, this pruning technique assumes that each leaf misclassifies a certain fraction of instances. To reflect these errors, it adds a continuity correction for the binomial distribution to the derived training error of a subtree. However, as the corrected misclassification estimate for a subtree is expected to be optimistic, the algorithm also calculates the standard error. Quinlan recommends pruning a subtree if its corrected error estimate is lower than that of the node by at least one standard error [18-20].

4.6. Error-Based Pruning

Error-based pruning is the default pruning method of the well-known C4.5 decision tree algorithm. Instead of using a pruning set, it uses error estimates. The method assumes that the errors are binomially distributed and calculates error estimates from the training data. The number of instances covered by a leaf of the tree is used to estimate the errors. A bottom-up approach is used: if the number of predicted errors for a leaf is not greater than the sum of the predicted errors of the leaf nodes of the corresponding subtree, then the subtree is replaced with that leaf [18].

4.7. Minimum Description Length Pruning

Mehata et al., and Quinlan and Rivest, utilized the MDL principle for decision tree pruning [21-22]. The minimum description length principle used here states that a classifier that compresses the data is a preferable inducer.
The MDL pruning method selects the decision tree that requires the fewest bits to represent it. The size of the decision tree is measured as the number of bits required for encoding it. The method searches for the decision tree that maximally compresses the data [18].

4.8. Optimal Pruning

An optimal pruning algorithm constructs the smallest pruned tree with maximum classification accuracy on the training data. Breiman et al. [9] first suggested a dynamic programming solution for optimal pruning. Bohanec and Bratko [23] introduced an optimal pruning algorithm called OPT, which gives a better solution. Almuallim [24] proposed an enhancement to OPT called OPT-2. It is also based on dynamic programming, is flexible in various aspects and is easy to implement.

4.9. Improvements to Pruning Algorithms

Matti Kääriäinen [25] analyzed reduced error pruning and proposed a new method for obtaining generalization error bounds for pruned decision trees. Error-based pruning has been blamed for a general effect of under-pruning. Hall et al. [26] proved that if the certainty factor CF is appropriately set for the data set, error-based pruning constructs trees that are essentially steady in size, in spite of the amount of training data. The CF determines the upper limit of the probability of an error at a leaf. Oates and Jenson [27] presented improvements to reduced error pruning to handle problems with large data sets. Macek and Lhotsk [28] presented a technique for pruning decision trees based on the complexity measure of a tree and its error rate. The technique utilizes the Gaussian complexity averages of a decision tree to compute the classification error rate. Frank [29] enhanced the performance of the standard decision tree pruning algorithm using the statistical significance of observations. Bradford et al. [30] proposed pruning decision trees with respect to misclassification cost and loss.
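Reduced-error pruning (Section 4.2), which several of the works above build on, can be sketched as follows. This is a minimal illustration under our own dict-based tree encoding, not any of the cited implementations:

```python
# Bottom-up reduced-error pruning: replace a subtree with its best possible
# leaf whenever that does not increase the error on a separate pruning set.
from collections import Counter

# Trees are nested dicts: {"split": attr, "children": {value: subtree}} or {"leaf": label}.
def predict(tree, record):
    while "leaf" not in tree:
        tree = tree["children"].get(record[tree["split"]], {"leaf": None})
    return tree["leaf"]

def errors(tree, pruning_set):
    return sum(predict(tree, r) != y for r, y in pruning_set)

def majority_label(pruning_set):
    return Counter(y for _, y in pruning_set).most_common(1)[0][0]

def reduced_error_prune(tree, pruning_set):
    if "leaf" in tree or not pruning_set:
        return tree
    # prune the children first (bottom-up), each on its share of the pruning set
    for value, sub in tree["children"].items():
        subset = [(r, y) for r, y in pruning_set if r[tree["split"]] == value]
        tree["children"][value] = reduced_error_prune(sub, subset)
    candidate = {"leaf": majority_label(pruning_set)}
    # replace the subtree with a leaf if that does not increase pruning-set error
    if errors(candidate, pruning_set) <= errors(tree, pruning_set):
        return candidate
    return tree
```

The pruning set must be held out from training; using training data here would almost never trigger a replacement.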
Scott [31] proposed algorithms for size-based penalties and subadditive penalties. Bradley and Lovell [32] proposed a pruning technique that is sensitive to the relative costs of data misclassification. They implemented two cost-sensitive pruning algorithms by extending the pessimistic error pruning and minimum error pruning techniques. Cai et al. [33] proposed cost-sensitive decision tree pruning, CC4.5, to deal with misclassification cost in the decision tree. It provides three cost-sensitive pruning methods to handle misclassification cost. Mansour [34] proposed pessimistic decision tree pruning based on tree size. A graphical frontier-based pruning (FBP) algorithm was proposed by Huo et al. [35], which provides a full spectrum of information while pruning the tree. The FBP algorithm starts from the leaf nodes and proceeds towards the root node with a local greedy approach. The authors further proposed a combination of FBP and cross validation. In decision tree learning, pre-pruning handles noise and post-pruning handles the problem of overfitting. Fürnkranz [36] proposed two algorithms to combine pre-pruning and post-pruning operations. A method for pruning oblique decision trees was proposed by Shah and Sastry [37].

4.10. Comparison of Pruning Methods

Empirical comparative analysis is one of the important methods for comparing the performance of the available algorithms. Quinlan [19] examined and empirically compared cost complexity pruning, reduced error pruning and pessimistic pruning on several data sets. These methods demonstrated significant improvement in terms of the size of the tree.
Cost complexity pruning tends to produce smaller trees than reduced error pruning or pessimistic error pruning, whereas in terms of classification accuracy, reduced error pruning is somewhat superior to cost complexity pruning. Floriana et al. [18] presented a comparative analysis of six well-known pruning methods. Each method was critically reviewed and its performance tested. The paper studies the theoretical foundations, computational complexity, and strengths and weaknesses of the pruning methods. According to this analysis, reduced-error pruning outperforms the other methods. In addition, MEP, CVP and EBP tend to under-prune, whereas reduced-error pruning tends to over-prune. Similarly, Mingers [19] analyzed five pruning methods with four different splitting criteria, providing an analysis based on the size and accuracy of the tree. This work showed that minimum-error pruning is extremely sensitive to the number of classes in the data and is the least accurate method. Pessimistic error pruning performs badly on certain datasets and needs to be handled with care. The critical value, cost complexity and reduced-error pruning methods consistently produced trees with low error rates on all the data sets. He further clarified that there is no evidence of a relation between the splitting criterion and the pruning method. Windeatt [17] presented an empirical comparison of pruning methods for ensemble classifiers, showing that error-based pruning performs best for ensembles. From the above studies we can conclude that reduced error pruning and cost complexity pruning are the most promising pruning methods compared to the other available methods.

5. Comparison of Algorithms [40-95][132-139][141][182]

5.1. CHAID Algorithm [38]

CHAID (Chi-squared Automatic Interaction Detector) is a fundamental decision tree learning algorithm. It was developed by Gordon V. Kass in 1980 [38].
CHAID is easy to interpret and handle, and can be used for classification and for the detection of interactions between variables. CHAID is an extension of the AID (Automatic Interaction Detector) and THAID (Theta Automatic Interaction Detector) procedures. It works on the principle of adjusted significance testing. After detecting an interaction between variables, it selects the best attribute for splitting the node, so that each child node consists of a collection of homogeneous values of the selected attribute. The method can handle missing values. It does not employ any pruning method.

5.2. HUNT'S Algorithm [39]

Hunt's algorithm generates a decision tree in a top-down, divide-and-conquer fashion. If the sample data contains more than one class, an attribute test is used to split the data into smaller subsets. Hunt's algorithm maintains an optimal split at every stage, according to some threshold value, in a greedy fashion.

5.3. ID3 Algorithm [39]

ID3 stands for Iterative Dichotomiser 3. It builds the tree in a top-down fashion, starting from a set of objects and a specification of properties. At each node of the tree, a property is tested and the results are used to partition the object set. This process is carried out recursively until the set in a given subtree is homogeneous with respect to the classification criterion, in other words, until it contains objects belonging to the same category. It then becomes a leaf node. At each node, the property to test is chosen based on information-theoretic criteria that seek to maximize information gain and minimize entropy. In simpler terms, the property that divides the candidate set into the most homogeneous subsets is tested. The order in which attributes are chosen determines how complicated the tree is. ID3 uses entropy to determine the most informative attribute. ID3 does not use a pruning method.
It cannot handle numeric attributes or missing attribute values [39].

5.4. C4.5 Algorithm [8][100][127-129]

The C4.5 algorithm is an improvement over the ID3 algorithm. The algorithm uses the gain ratio as its splitting criterion [8]. It can accept data with categorical or numerical values. To handle continuous values it generates a threshold and then splits between the values above the threshold and those equal to or below it. The default pruning method is error-based pruning. As missing attribute values are not used in the gain calculations, the algorithm easily handles missing values.

5.5. CART Algorithm [9]

Classification and regression trees (CART), proposed by Breiman et al., constructs binary trees [9]. The word binary implies that a node in the decision tree can only be split into two groups. CART uses the gini index as the impurity measure for selecting attributes. The attribute with the largest reduction in impurity is used to split the node's records. It can accept data with categorical or numerical values and can also handle missing attribute values. It uses cost-complexity pruning and can also generate regression trees.

5.6. C5.0 Algorithm

The C5.0 algorithm is an extension of the C4.5 algorithm, which in turn is an extension of ID3. It is a classification algorithm that applies to big data sets, and it is better than C4.5 in terms of speed, memory and efficiency. The C5.0 model works by splitting the sample on the field that provides the maximum information gain. Each sample subset obtained from a split is then split again, usually on a different field, and the process repeats until the sample subsets cannot be split any further. Finally, the lowest-level splits are re-examined, and those sample subsets that make no remarkable contribution to the model are rejected. C5.0 easily handles multi-valued attributes and missing attribute values.

5.7.
Evolutionary Techniques

A decision tree is called optimal if it correctly classifies the data set and has a minimal number of nodes. The standard decision tree algorithms use a local greedy search with information gain as the target function to split the data set. The decision trees generated by these methods are efficient in classification accuracy, but they often suffer from excessive complexity. The construction of an optimal decision tree is known to be an NP-complete problem. This fact motivates the use of genetic algorithms, which provide a global search through the space in many directions simultaneously. The genetic algorithm is used to handle combinatorial optimization problems. Different authors have proposed methodologies that integrate genetic algorithms and decision tree learning in order to evolve optimal decision trees. Although the methods differ, the goal is the same: to obtain optimal decision trees [3].

5.8. GATree [96]

A. Papagelis and D. Kalles proposed GATree, genetically evolved decision trees. Genetic algorithms normally use binary strings as their initial populations, but GATree uses binary decision trees, each consisting of one decision node with two leaves. To construct such initial trees, a random attribute is selected; if that attribute is nominal, one of its possible values is selected at random, and in the case of a continuous attribute, an integer value in its minimum-to-maximum range is selected at random. Thus the size of the search space is reduced. To perform crossover, two arbitrary nodes are selected from the population of subtrees and the nodes of those subtrees are swapped. Since a predicted class value depends only on the leaves, the crossover operator does not affect the consistency of the decision trees.
To perform mutation, an arbitrary node of a selected tree is chosen and its test value is substituted with a new arbitrarily chosen value. If the chosen node is a leaf, the assigned class is substituted with a new arbitrarily chosen class. Validation is performed after crossover and mutation to obtain the final decision tree. The fitness function for evaluation is the percentage of instances in the test data set correctly classified by the decision tree. The results show compact and equally accurate decision trees compared with standard decision tree algorithms.

5.9. GAIT Algorithm [97]

Z. Fu proposed the GAIT algorithm [97]. The algorithm constructs a set of different decision trees from different subsets of the original data set by applying the decision tree algorithm C4.5 to small samples of the data. The genetic algorithm uses these trees as its initial population. The selection operation selects decision trees from the pool by a random selection mechanism. The crossover operation exchanges subtrees between parent trees, whereas mutation exchanges subtrees or leaves inside the same tree. The fitness criterion for evaluation is the classification accuracy. Validation on the fitness function is performed after crossover and mutation to obtain final decision trees that are smaller in size [98-99].

5.10. Fuzzy Decision Tree [101-106]

The fuzzy decision tree combines high comprehensibility with the elegant performance of fuzzy systems. Fuzzy sets and fuzzy logic permit the modelling of language-related uncertainties. In addition, they provide a symbolic framework for knowledge comprehensibility, the capability to represent fine knowledge details, and the ability to deal with noise and inexact data. The tree construction procedure for a fuzzy decision tree is similar to the standard decision tree construction algorithm. The splitting criteria are based on fuzzy boundaries, but the inference procedures of fuzzy decision trees are different.
Fuzzy decision trees make fuzzy decisions at each branching point. This makes calculating the best split somewhat difficult if the attributes are continuous-valued or multi-valued. Constructing smaller fuzzy decision trees is valuable, as they contain more information in their internal nodes [101]. Enhancements to fuzzy decision tree algorithms are as follows. Zeidler and Schlosser proposed the use of membership functions to discretize the attributes for handling continuous-valued attributes [102]. Janikow optimized the fuzzy component of a fuzzy decision tree using a genetic algorithm [103]. Myung Won Kim et al. proposed an algorithm that determines an appropriate set of membership functions for each attribute [104]. The algorithm uses histogram analysis, with an application of the genetic algorithm to tune the initial set of membership functions. A fuzzy decision tree is then constructed with the given set of membership functions. Fajfer and Janikow described bottom-up fuzzy partitioning in fuzzy decision trees, a complement to the top-down technique [105]. The proposed algorithm is useful for partitioning continuous-valued attributes into fuzzy sets. Guetova et al. proposed incremental fuzzy decision trees. The algorithm obtains results equivalent to non-incremental methods [106].

5.11. Parallel and Distributed Algorithms [107-120]

Parallelization is a renowned, conventional means to speed up classification tasks with large amounts of data and complex programs. In data mining applications, the growing size of datasets leads us to seek computationally efficient, parallel and scalable algorithms, with the objective of obtaining optimal accuracy in a reasonable amount of time using parallel processors. The algorithms work in parallel, using multiple processors to construct a single reliable model [107-120].
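As a toy illustration of parallelism at a single tree node, the candidate split attributes can be scored concurrently and the best one selected afterwards. This sketch uses Python threads and our own illustrative function names; it shows the structure of attribute-level parallelism, not any of the surveyed systems:

```python
# Attribute-level parallelism: score every candidate split attribute
# concurrently, then pick the attribute with the lowest weighted impurity.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(records, labels, attr):
    """Weighted gini impurity of partitioning the records on one attribute."""
    groups = {}
    for r, y in zip(records, labels):
        groups.setdefault(r[attr], []).append(y)
    n = len(labels)
    return attr, sum(len(g) / n * gini(g) for g in groups.values())

def best_attribute(records, labels, attributes, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = pool.map(lambda a: split_impurity(records, labels, a), attributes)
        return min(scored, key=lambda t: t[1])[0]   # lowest impurity wins
```

In a real system the workers would be processes or cluster nodes and the data would be partitioned by record or by attribute, as discussed below.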
Kazuto et al. explained two methods for parallelizing a decision tree algorithm: intra-node and inter-node parallelization [107]. Intra-node parallelization applies parallel processing within a single node, whereas inter-node parallelization applies parallel processing across multiple nodes. Intra-node parallelism is further classified into record parallelism, attribute parallelism and their combination. The authors implemented and tested these four types of parallelizing methods with four kinds of test data. The performance analysis from these experiments states that there is a relation between the characteristics of the data and the parallelizing methods; a combination of the various parallelizing approaches is the most effective parallel method. Kufrin proposed a framework for decision tree construction on shared and distributed memory multiprocessors [108]. The method builds parallel decision trees that overcome the limitations of serial decision trees on large-scale training data. Narlikar proposed a parallel structure of a decision tree-based classifier for memory-resident datasets on an SMP. The structure uses two types of divide-and-conquer parallelism, intra-node and inter-node parallelization, with lightweight Pthreads [109]. Experimental verification on large datasets signifies that the space and time performance of the tree construction algorithm scales with the data size and the number of processors. Joshi et al. proposed ScalParC, a scalable parallel classification algorithm for mining large datasets with decision trees, using MPI on a Cray T3D system [110]. This implementation confirms the scalability and efficiency of ScalParC for a wide range of training sets and a wide range of processors. Hall et al. presented combining decision trees learned in parallel [111]. The proposed algorithm builds decision trees from n disjoint data subsets of a complete dataset in parallel and then converts them into rules to combine into a single rule set.
The experiments on two datasets illustrate an improvement of around 40% in the number of rules generated by the decision tree. Zaki et al. proposed a parallel algorithm for building decision trees on shared-memory multiprocessors and verified that it achieves good speedup [112]. Srivastava et al. presented two parallel formulations of decision tree induction, synchronous tree induction and partitioned tree induction, and proposed a hybrid method that combines the best features of both formulations [113]. The experimental results illustrate high speedups and scalability in processing. Kazuto et al. proposed a parallel decision tree algorithm on a PC cluster [114]. Plain parallelization of a decision tree is not efficient due to load imbalance; the proposed algorithm improves on it through data redistribution. The parallel algorithm's performance on benchmark data demonstrates a speedup of 3.5 in the best case and comparable performance even in the worst case. Jin and Agrawal proposed parallel decision tree construction with memory and communication efficiency [115]. The approach achieves very low communication volume, avoids sorting the training records during execution, and combines shared-memory and distributed-memory parallelization. Jin et al. proposed the use of SMP machines with a set of techniques, including full replication, full locking, fixed locking, optimized full locking and cache-sensitive locking, for the parallelization of data mining algorithms [116]. The results state that among full replication, optimized full locking and cache-sensitive locking there is no clear winner; any of these three techniques can outperform the others depending upon the machine and dataset.
These three techniques do, however, perform considerably better than the other two. In decision tree construction, combining different techniques is found to be critical for obtaining high performance. Li Wenlong et al. proposed a combination-based parallel decision tree algorithm; distributed decision tree learning algorithms have been proposed along similar lines [117]. Jie Ouyang et al. proposed chi-square-test-based decision tree induction in a distributed environment [118]. Kanishka Bhaduri et al. proposed distributed decision tree induction in peer-to-peer systems [119]. Bin Liu et al. proposed data mining in a distributed data environment [120].

5.12. SLIQ Algorithm [121]:

SLIQ (Supervised Learning In Quest) was developed by the Quest team at the IBM Almaden Research Center to handle both numeric and categorical datasets [121]. A SLIQ decision tree classifier recursively divides the training set into partitions so that most or all of the records in a partition belong to the same class. The algorithm starts with the entire dataset at the root node. The dataset is partitioned into subsets according to a splitting criterion based on the gini index. The attribute containing the split point that maximizes the reduction in impurity or, equivalently, has the minimum gini index, is used to split the node. The value of a split point depends upon how well it separates the classes: the splitting is carried out by choosing the attribute that best separates the remaining samples of the node into the individual classes. SLIQ eliminates the need to sort the data at every node of the decision tree; instead, the training data are sorted only once for each numeric attribute, at the beginning of the tree-growth phase.

5.13. SPRINT Algorithm [122][126]:

Shafer et al. proposed SPRINT, which provides scalability and parallelism and removes memory restrictions [122]. It achieves parallelism by its design, which allows multiple processors to work together.
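The presorting idea shared by SLIQ and SPRINT, sorting each numeric attribute once into an attribute list and then scanning it with running class counts instead of re-sorting at every node, can be sketched as follows. This is a simplified single-attribute illustration, not the original implementations, and the record layout is invented:

```python
def make_attribute_list(records, attr):
    """One-time sort into (value, class label, record index) triples."""
    return sorted((r[attr], r["class"], i) for i, r in enumerate(records))

def best_gini_split(attr_list, classes):
    """One pass over a presorted attribute list, keeping running class counts."""
    n = len(attr_list)
    total = {c: 0 for c in classes}
    for _, c, _ in attr_list:
        total[c] += 1
    left = {c: 0 for c in classes}
    best = (float("inf"), None)  # (weighted gini, threshold)
    for i, (value, c, _) in enumerate(attr_list[:-1]):
        left[c] += 1
        if value == attr_list[i + 1][0]:
            continue  # a split can fall only between distinct values
        nl, nr = i + 1, n - i - 1
        gini_left = 1 - sum((left[k] / nl) ** 2 for k in classes)
        gini_right = 1 - sum(((total[k] - left[k]) / nr) ** 2 for k in classes)
        score = nl / n * gini_left + nr / n * gini_right
        threshold = (value + attr_list[i + 1][0]) / 2
        best = min(best, (score, threshold))
    return best
```

The lowest weighted gini over all attributes' lists determines the split, and because partitioned lists stay in sorted order, no re-sorting is needed at the child nodes.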
In this algorithm, attribute lists are created as in SLIQ [121]. Initially, an attribute list is created for each attribute in the dataset. The entries in these lists, called attribute records, consist of an attribute value, a class label and the index of the record. The initial lists for continuous attributes are sorted by attribute value. If the complete data does not fit in memory, the attribute lists are kept on disk; thus memory restrictions are removed. The initial lists formed from the training set are associated with the root of the classification tree. As the algorithm executes, the tree is grown and nodes are split to create new children; the attribute lists for each node are partitioned and associated with the children. The order of the records in a list is maintained during partitioning, so the partitioned lists never require re-sorting. The algorithm uses the gini index as its splitting criterion. The results demonstrate good scale-up and speedup on large datasets. The size-up performance is also good, because the communication cost for exchanging split points and count matrices does not change as the training set size increases.

5.14. Rain Forest Algorithm [123]:

Gehrke et al. proposed a framework called RainForest that provides an approach for implementing scalability in decision tree algorithms with large datasets [123]. RainForest refines some initial steps of the decision tree construction algorithm. It creates only one attribute list for all categorical attributes jointly, and it creates histograms for the splitting information, thereby avoiding an additional scan. The refinement applies only up to this step; thereafter the conventional decision tree algorithm proceeds.

5.15. CLOUDS Algorithm [124]:

Alsabti et al. proposed CLOUDS, a decision tree classifier for large datasets [124].
The proposed algorithm samples the splitting points of numeric attributes and then uses an estimation procedure to narrow the search space of the best split. CLOUDS reduces computational and I/O complexity compared to benchmark classifiers while maintaining quality in terms of accuracy and tree size.

5.16. BOAT Algorithm [125]:

Gehrke et al. proposed BOAT, an approach for optimistic decision tree construction [125]. It uses a small subset of the data for initial decision tree construction and refines it to construct the final tree. With only two scans of the training data it can construct several levels of the decision tree, and thus it is claimed to be faster by a factor of three. It can handle insertion and deletion of data in dynamic databases, making it the first scalable incremental decision tree approach.

5.17. SLEAS Algorithm [177-179]:

Recently, Kishor Kumar Reddy et al. proposed SLEAS (Supervised Learning using Entropy as Attribute Selection measure) to handle both numeric and categorical datasets [177]. The SLEAS decision tree classifier recursively divides the training set into partitions so that most or all of the records in a partition belong to the same class. The algorithm starts with the entire dataset at the root node. The dataset is partitioned into subsets according to a splitting criterion based on entropy. The attribute containing the split point that maximizes the reduction in impurity, i.e., the entropy-based information gain, is used to split the node. The value of a split point depends upon how well it separates the classes: the splitting is carried out by choosing the attribute that best separates the remaining samples of the node into the individual classes. SLEAS eliminates the need to sort the data at every node of the decision tree; instead, the training data are sorted only once for each numeric attribute, at the beginning of the tree-growth phase.

5.18.
SLGAS Algorithm [180]:

Recently, Kishor Kumar Reddy et al. proposed SLGAS (Supervised Learning using Gain Ratio as Attribute Selection measure) to handle both numeric and categorical datasets [180]. The SLGAS decision tree classifier recursively divides the training set into partitions so that most or all of the records in a partition belong to the same class. The algorithm starts with the entire dataset at the root node. The dataset is partitioned into subsets according to a splitting criterion based on the gain ratio. The attribute containing the split point that maximizes the reduction in impurity or, equivalently, has the maximum gain ratio, is used to split the node. The value of a split point depends upon how well it separates the classes: the splitting is carried out by choosing the attribute that best separates the remaining samples of the node into the individual classes. SLGAS eliminates the need to sort the data at every node of the decision tree; instead, the training data are sorted only once for each numeric attribute, at the beginning of the tree-growth phase.

5.19. ISLIQ Algorithm [181]:

Recently, Kishor Kumar Reddy et al. proposed ISLIQ (Improved Supervised Learning in Quest) to handle both numeric and categorical datasets [181]. The ISLIQ decision tree classifier recursively divides the training set into partitions so that most or all of the records in a partition belong to the same class. The algorithm starts with the entire dataset at the root node. The dataset is partitioned into subsets according to a splitting criterion based on the gini index. The major drawback of SLIQ is that split points are computed wherever there is a change in the class label, which increases the computational complexity when the number of records is large.
To overcome this, ISLIQ computes an interval range and evaluates split points based on that range. The attribute containing the split point that maximizes the reduction in impurity or, equivalently, has the minimum gini index, is used to split the node. The value of a split point depends upon how well it separates the classes: the splitting is carried out by choosing the attribute that best separates the remaining samples of the node into the individual classes. ISLIQ eliminates the need to sort the data at every node of the decision tree; instead, the training data are sorted only once for each numeric attribute, at the beginning of the tree-growth phase.

5.20. KNN [1][141]:

KNN is among the oldest non-parametric classification algorithms. To classify an unknown example, the distance (using some distance measure, e.g., Euclidean) from that example to every training example is measured. The k smallest distances are identified, and the most represented class among these k neighbors is taken as the output class label. The value of k is normally determined using a validation set or cross-validation.

5.21. Naive Bayes [1][141]:

The Naive Bayes classifier works on a simple but comparatively intuitive concept; in some cases it is seen to outperform considerably more complex algorithms. It is based on the Bayes rule of conditional probability: it makes use of all the attributes contained in the data and analyses them individually, as though they were equally important and independent of each other.

5.22. Support Vector Machines [1][141]:

Support Vector Machines are supervised learning methods used for classification as well as regression.
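A toy illustration of the kernel idea that makes SVMs powerful (a hand-picked quadratic feature map, not a real SVM implementation): one-dimensional points that no single threshold can separate become linearly separable once they are lifted into a higher-dimensional space.

```python
def quadratic_map(x):
    """A simple feature map phi(x) = (x, x**2), analogous to a quadratic kernel."""
    return (x, x * x)

# Class "inner" sits near zero and class "outer" lies on both sides of it,
# so no single threshold on x alone separates the two classes.
points = [(-3.0, "outer"), (-2.0, "outer"), (-0.5, "inner"),
          (0.5, "inner"), (2.0, "outer"), (3.0, "outer")]

def classify(x, boundary=2.0):
    """After the lift, a straight line in the (x, x**2) plane separates the
    classes: here the separating hyperplane is simply x**2 = boundary."""
    _, x_squared = quadratic_map(x)
    return "inner" if x_squared < boundary else "outer"
```

Every point in `points` is classified correctly by this single linear boundary in the lifted space, which is exactly the effect a kernel achieves without computing the lifted coordinates explicitly.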
The advantage of Support Vector Machines is that they can make use of certain kernels to transform the problem, so that linear classification techniques can be applied to non-linear data. Applying the kernel equations arranges the data instances within the multi-dimensional space in such a way that there is a hyperplane separating data instances of one kind from those of another. The kernel equations may be any function that transforms linearly non-separable data in one domain into another domain where the instances become linearly separable. Kernel equations may be linear, quadratic, Gaussian, or anything else that achieves this purpose. Once the data is divided into two distinct categories, the aim is to find the best hyperplane separating the two types of instances. This hyperplane is important because it decides the target variable value for future predictions; we should choose the hyperplane that maximizes the margin between the support vectors on either side of the plane. Support vectors are those instances that lie either on the separating planes on each side, or slightly on the wrong side. One important thing to note about Support Vector Machines is that the classification problem needs to be binary. Even when it is not, Support Vector Machines handle it as though it were, completing the analysis through a series of binary assessments on the data.

5.23. Neural Networks [1][141]:

The back-propagation algorithm performs learning on a multilayer feed-forward neural network. The inputs correspond to the attributes measured for each training sample. The inputs are fed simultaneously into the units making up the input layer. The weighted outputs of these units are, in turn, fed simultaneously to a second layer of neuron-like units, known as a hidden layer.
The hidden layer's weighted outputs can be input to another hidden layer, and so on. The number of hidden layers is arbitrary, although in practice usually only one is used. The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction for the given samples. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. The multilayer perceptron (MLP) utilizes a supervised learning technique for training the network.

6. Decision Tree Learning Software

Various decision tree software packages are available to researchers working in data mining. Some of the prominent packages employed for data analysis, along with commonly used datasets for decision tree learning, are discussed below.

6.1. WEKA [140]:

The WEKA (Waikato Environment for Knowledge Analysis) workbench is a set of data mining tools developed by the machine learning group at the University of Waikato, New Zealand. WEKA versions supporting the Windows, Linux and Mac operating systems are available [140]. It provides various association, classification and clustering algorithms, in addition to pre-processors such as filters and attribute selection algorithms. For decision tree learning, WEKA provides J48 (a Java implementation of C4.5), SimpleCart and Random Forest among its prominent tree classifiers. In J48, trees can be constructed with error-based pruning (EBP), reduced-error pruning (REP), or no pruning. The input data file is in .arff (attribute relation file format) format. The source code is available to the user.

6.2. C4.5 [2]

C4.5 version C4.5.8, developed by Quinlan, supports Unix-based operating systems only [2]. The package consists of programs for the decision tree generator, the rule generator, the decision tree interpreter and the production rule interpreter. The decision tree generator expects two input files, with extensions .names and .data. The .names file provides information about the attributes and classes.
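For illustration, a .names file for a hypothetical small weather dataset might look like the following (attribute and class names invented; in the C4.5 format the first entry lists the classes, each declaration ends with a period, and text after a vertical bar is treated as a comment):

```
play, dont_play.                | the two class labels

outlook:      sunny, overcast, rain.
temperature:  continuous.
humidity:     continuous.
windy:        true, false.
```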
The .data file contains the actual attribute values together with their class. The source code of C4.5 is available to the user.

6.3. OC1

OC1 is an oblique decision tree classifier by Murthy [136]. Various splitting criteria are available with this package. The OC1 software can be used to create both standard axis-parallel decision trees and oblique trees. The source code is available to the user.

6.4. GATree [96]

GATree is an evolutionary decision tree system by Papagelis and Kalles [96]. GATree works on the Windows operating system; an evaluation version is available on request from the authors. Various parameters, such as the number of generations, population size, and crossover and mutation probabilities, can be set to generate decision trees.

7. Applications of Decision Trees [141]

The decision tree algorithm has applications in all walks of life [141-176] [182]. The application areas are listed below:

Business: Virine and Rapley proposed the use of decision trees in the visualization of probabilistic business models [142]. Yang et al. proposed the use of decision trees in customer relationship management [143]. Zhang et al. proposed the use of decision trees in credit scoring for credit card users [144].

E-Commerce: A good online catalog is essential for the success of an e-commerce web site; Sung et al. described the use of decision trees for the construction of online catalog topologies [145].

Energy Modelling: Energy modelling for buildings is one of the important tasks in building design. Zhun et al. proposed a decision tree method for building energy demand modelling [146].

Image Processing: MacArthur et al. proposed the use of decision trees in content-based image retrieval [147]. Park et al. proposed perceptual grouping of 3-D features in aerial images using a decision tree classifier [148].
Intrusion Detection: Sinclair et al. proposed decision trees with genetic algorithms to automatically generate rules for an intrusion detection expert system [149]. Abbes et al. proposed protocol analysis in intrusion detection using decision trees [150].

Medical Research: Medical research and practice are important areas of application for decision tree techniques. Stasis et al. proposed decision tree algorithms for heart sound diagnosis [151]. Lenic et al. focused on decision tree methods that can support physicians in diagnosing mitral valve prolapse [152]. Kokol et al. introduced decision trees as part of intelligent systems that help physicians [153]. Dong et al. proposed evaluating skin condition using a decision tree [154]. Kennedy and Adams proposed a decision tree to help in selecting a brain-computer interface device for patients who are cognitively intact but unable to move or communicate [155]. Hui and GaiLiping proposed the analysis of complex diseases through statistical estimation of diagnoses with genetic markers, based on decision tree analysis [156].

Intelligent Vehicles: Finding the lane boundaries of the road is an important task in the development of intelligent vehicles. Gonzalez and Ozguner proposed lane detection for intelligent vehicles using a decision tree [157].

Object Recognition: Freixenet et al. proposed the use of decision trees for color feature selection in object recognition for outdoor scenes [158].

Reliability Engineering: Claudio and Rocco proposed approximate reliability expressions for network reliability using a decision tree approach [159]. Assaf and Dugan proposed a method that establishes a dynamic reliability model and generates a diagnostic model for a system using decision trees [160].
Remote Sensing: Remote sensing is a strong application area for pattern recognition work with decision trees. Simard et al. proposed decision tree-based classification of land cover categories in remote sensing [161]. Palaniappan et al. proposed a binary tree with a genetic algorithm for land cover classification [162].

Space Applications: The portfolio allocation problem is pervasive in research activities, and the use of past experience in this area with the help of decision trees has been recommended; Manvi et al. suggested the use of decision trees in NASA space missions [163].

Speech Recognition: Amit and Murua proposed speech recognition using randomized decision trees [164]. Bahl et al. proposed a tree-based statistical language model for natural language speech recognition, which predicts the next word spoken based on the previous words spoken [165]. Yamagishi et al. proposed decision trees for speech synthesis [166].

Software Development: Selby and Porter proposed decision trees for software resource analysis [167]. Khoshgoftaar et al. proposed decision trees for software quality classification [168].

Steganalysis: Geetha et al. proposed an evolving decision tree-based system for detecting audio stego anomalies [169].

Text Processing: Diao et al. introduced decision trees for text categorization [170].

Traffic and Road Detection: Wu et al. proposed the use of decision trees in analysing, predicting and guiding traffic flow [171]. Jeong and Nedevschi proposed intelligent road region detection based on decision trees in highway and rural environments [172].

Video Processing: Jaser et al. proposed automatic sports video classification with decision trees [173]. Cen and Cosman explained decision trees for error concealment in video decoding [174].

Web Applications: Bonchi et al. proposed decision trees for intelligent web caching [175]. Chen et al. presented a decision tree learning approach to diagnosing failures in large Internet sites [176].

8.
Conclusion

In this paper, we have presented a multi-disciplinary survey of work on constructing decision trees and non-decision-tree classifiers for a given dataset. The main goal is to provide an overview of existing work on decision tree and non-decision tree approaches, and a taste of their usefulness, to newcomers as well as practitioners in the emerging field of data mining and knowledge discovery. We also hope that overviews like this can help researchers, academicians, scientists and practitioners avoid redundant, ad hoc effort to a great extent. We have presented various properties of decision trees, splitting criteria, pruning methodologies, decision tree algorithms and non-decision tree algorithms. Further, we presented various decision tree and non-decision tree learning software packages and real-time applications of decision trees. From the survey, the following observations emerge:

1. The performance of a decision tree strongly depends on the type of attribute selection measure used, such as entropy, gini index, gain ratio and so on.
2. The performance of a decision tree strongly depends on the dataset, i.e., the number of attributes and instances and the training and test data.
3. The performance of decision trees increases to a great extent after pruning.
4. The time taken to build a decision tree decreases when it is executed in parallel.
5. The rules generated by decision trees are easy to understand.

References

[1] T. M. Mitchell, “Machine Learning”, The McGraw-Hill Companies, Inc., (1997).
[2] J. R. Quinlan, “C4.5: Programs for Machine Learning”, San Francisco, CA: Morgan Kaufmann, (1993).
[3] S. K.
Murthy, “Automatic construction of decision trees from data: a multi-disciplinary survey”, Data Mining and Knowledge Discovery, vol. 2, no. 4, (1998), pp. 345-389.
[4] E. Alpaydin, “Introduction to Machine Learning”, Prentice-Hall of India, (2005).
[5] S. Ruggieri, “Efficient C4.5”, IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, (2002), pp. 438-444.
[6] M. Ben-Bassat, “Use of distance measure, information measures and error bounds on feature evaluation”, In Sreerama Murthy, vol. 1, pp. 9-11.
[7] M. Last and O. Maimon, “A compact and accurate model for classification”, IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 203-215.
[8] B. Hwan Jun, C. Soo Kim, H.-Y. Song and J. Kim, “A new criterion in selection and discretization of attributes for the generation of decision trees”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 12, pp. 1371-1375.
[9] L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, “Classification and Regression Trees”, Wadsworth International Group, Belmont, California.
[10] S. K. Murthy, S. Kasif and S. Salzberg, “A system for induction of oblique decision trees”, Journal of Artificial Intelligence Research, vol. 2, pp. 1-33.
[11] R. S. Mantaras, “A distance based attribute selection measure for decision tree induction”, Technical Report, Machine Learning, vol. 6, pp. 81-92.
[12] B. Chandra, R. Kothari and P. Paul, “A new node splitting measure for decision tree construction”, Pattern Recognition, Elsevier, vol. 43, pp. 2725-2731.
[13] E. Rounds, “A combined nonparametric approach to feature selection and binary decision tree design”, Pattern Recognition, vol. 12, pp. 313-317.
[14] P. E. Utgoff and J. A. Clouse, “A Kolmogorov-Smirnoff metric for decision tree induction”, Tech. Rep. No. 96-3, Dept. of Computer Science, University of Massachusetts, Amherst.
[15] J. K. Martin, “An exact probability metric for decision tree splitting and stopping”, Machine Learning, vol. 28, no. 2-3, pp. 257-29.
[16] W. L. Buntine and T.
Niblett, “A further comparison of splitting rules for decision-tree induction”, Machine Learning, vol. 8, pp. 75-85.
[17] T. Windeatt and G. Ardeshir, “An empirical comparison of pruning methods for ensemble classifiers”, Proc. of the 4th International Conference on Advances in Intelligent Data Analysis, Cascais, Portugal, (2001), pp. 208-217.
[18] F. Esposito, D. Malerba and G. Semeraro, “A comparative analysis of methods for pruning decision trees”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, (1997), pp. 476-491.
[19] J. R. Quinlan, “Simplifying decision trees”, International Journal of Man-Machine Studies, vol. 27, pp. 221-234.
[20] J. Mingers, “An empirical comparison of pruning methods for decision tree induction”, Machine Learning, vol. 3, (1989), pp. 227-243.
[21] M. Mehta, J. Rissanen and R. Agrawal, “MDL-based decision tree pruning”, Proc. of the 1st International Conference on Knowledge Discovery in Databases and Data Mining, Montreal, Canada, (1995), pp. 216-221.
[22] J. R. Quinlan and R. L. Rivest, “Inferring decision trees using the minimum description length principle”, Inform. Comput., vol. 80, (1989), pp. 227-248.
[23] I. Bratko and M. Bohanec, “Trading accuracy for simplicity in decision trees”, Machine Learning, vol. 15, (1994), pp. 223-250.
[24] H. Almuallim, “An efficient algorithm for optimal pruning of decision trees”, Artificial Intelligence, vol. 83, no. 2, pp. 347-362.
[25] M. Kaariainen, “Learning small trees and graphs that generalize”, Report, University of Helsinki, Finland, Series of Publications, Helsinki, (2004).
[26] L. Hall, R. Collins, K. W. Bowyer and R. Banfield, “Error-based pruning of decision trees grown on very large data sets can work!”, Proc. of the 14th IEEE International Conference on Tools with Artificial Intelligence, (2002), pp. 233-238.
[27] T. Oates and D.
Jensen, “Toward a theoretical understanding of why and when decision tree pruning algorithms fail”, Proc. of the Sixteenth National Conference on Artificial Intelligence, (1999), pp. 372-378.
[28] J. Macek and L. Lhotská, “Gaussian complexities based decision tree pruning”, Cybernetics and Systems 2004, Austrian Society for Cybernetic Studies, Vienna, (2004), pp. 713-718.
[29] E. Frank, “Pruning Decision Trees and Lists”, Doctoral Thesis, University of Waikato, (2000).
[30] J. P. Bradford, C. Kunz, R. Kohavi, C. Brunk and C. E. Brodley, “Pruning decision trees with misclassification costs”, European Conference on Machine Learning, (1998), pp. 131-136.
[31] C. Scott, “Tree pruning with subadditive penalties”, IEEE Transactions on Signal Processing, vol. 53, no. 12, pp. 4518-4525.
[32] A. P. Bradley and B. C. Lovell, “Cost-sensitive decision tree pruning: use of the ROC curve”, Eighth Australian Joint Conference on Artificial Intelligence, Canberra, Australia, (1995) November, pp. 1-8.
[33] J. Cai, J. Durkin and Q. Cai, “CC4.5: cost-sensitive decision tree pruning”, Proc. of the Data Mining Conference, Skiathos, Greece, (2005), pp. 239-245.
[34] Y. Mansour, “Pessimistic decision tree pruning based on tree size”, Proc. of the 14th International Conference on Machine Learning, (1997), pp. 195-201.
[35] X. Huo, S. Bum Kim, K.-L. Tsui and S. Wang, “FBP: A frontier-based tree-pruning algorithm”, INFORMS Journal on Computing, vol. 18, no. 4, (2006), pp. 494-505.
[36] J. Fürnkranz, “Pruning algorithms for rule learning”, Machine Learning, vol. 27, (1997), pp. 139-172.
[37] S. Shah and P. S. Sastry, “New algorithms for learning and pruning oblique decision trees”, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 29, no. 4, pp. 494-505.
[38] G. V. Kass, “An exploratory technique for investigating large quantities of categorical data”, Applied Statistics, vol. 29, no. 2, pp. 119-127.
[39] J. R.
Quinlan, “Induction of decision trees”, Machine Learning, vol. 1, no. 1, pp. 81-106.
[40] R. Kohavi, “Feature subset selection as search with probabilistic estimates”, Proc. of the AAAI Fall Symposium on Relevance, (1994), pp. 122-126.
[41] R. Caruana and D. Freitag, “Greedy attribute selection”, Proc. of the 11th International Conference on Machine Learning, (1994), pp. 28-36.
[42] M. Last, A. Kandel, O. Maimon and E. Eberbach, “Anytime algorithm for feature selection”, Proc. of the Second International Conference on Rough Sets and Current Trends in Computing, (2000), pp. 532-539.
[43] S. Wu and P. A. Flach, “Feature selection with labelled and unlabelled data”, in M. Bohanec, D. Mladenic and N. Lavrac, editors, ECML/PKDD'02 Workshop on Integrating Aspects of Data Mining, Decision Support and Meta-Learning, (2002), pp. 156-167.
[44] H. Yuan, S.-S. Tseng, W. Gangshan and Z. Fuyan, “A two-phase feature selection method using both filter and wrapper”, Proc. of the IEEE International Conference on Systems, Man and Cybernetics, (1999), pp. 132-136.
[45] K. Grabczewski and N. Jankowski, “Feature selection with decision tree criterion”, Proc. of the Fifth International Conference on Hybrid Intelligent Systems, (2005) November 6-9, pp. 212-217.
[46] J. Bins and B. A. Draper, “Feature selection from huge feature sets”, Proc. of the International Conference on Computer Vision, Vancouver, (2001), pp. 159-165.
[47] C. Guerra-Salcedo, S. Chen, D. Whitley and S. Smith, “Fast and accurate feature selection using hybrid genetic strategies”, Proc. of the Congress on Evolutionary Computation, (1999), pp. 177-184.
[48] J.-A. Landry, L. Da Costa and T. Bernier, “Discriminant feature selection by genetic programming: towards a domain independent multi-class object detection system”, Journal of Systemics, Cybernetics and Informatics, vol. 1, no. 3, (2006), pp. 76-81.
[49] J. Bala, J. Huang, H. Vafaie, K. DeJong and H.
Wechsler, “Hybrid learning using genetic algorithms and decision trees for pattern classification”, Proc. of the IJCAI Conference, Montreal, (1995), pp. 719-724.
[50] G. Legrand and N. Nicoloyannis, “Feature selection and preferences aggregation”, Machine Learning and Data Mining in Pattern Recognition, Springer, Heidelberg, (2005), pp. 203-217.
[51] M. A. Hall and L. A. Smith, “Feature subset selection: a correlation based filter approach”, Proc. of the International Conference on Neural Information Processing and Intelligent Information Systems, (1997), pp. 855-858.
[52] W. Duch, J. Biesiada, T. Winiarski, K. Grudzinski and K. Grabczewski, “Feature selection based on information theory filters and feature elimination wrapper methods”, Proc. of the International Conference on Neural Networks and Soft Computing, Advances in Soft Computing, (2002), pp. 173-176.
[53] M. A. Hall, “Correlation-based feature selection for discrete and numeric class machine learning”, Proc. of the International Conference on Machine Learning, Stanford University, CA, Morgan Kaufmann Publishers, (2000), pp. 359-366.
[54] H. Yuan, S.-S. Tseng, W. Gangshan and Z. Fuyan, “A two-phase feature selection method using both filter and wrapper”, Proc. of the IEEE International Conference on Systems, Man and Cybernetics, pp. 132-136.
[55] P. Luca Lanzi, “Fast feature selection with genetic algorithms: a filter approach”, Proc. of the 1997 IEEE International Conference on Evolutionary Computation, pp. 537-540.
[56] G. H. John, “Robust decision trees: Removing outliers from databases”, Proc. of the First ICKDDM, (1995), pp. 174-179.
[57] A. Arning, R. Agrawal and P. Raghavan, “A linear method for deviation detection in large databases”, KDDM 1996, pp. 164-169.
[58] I. Guyon, N. Matic and V.
Vapnik, “Discovering informative patterns and data cleaning”, Advances in Knowledge Discovery and Data Mining, AAAI, (1996), pp. 181-203.
[59] D. Gamberger and N. Lavrac, “Conditions for Occam's razor applicability and noise elimination”, in M. van Someren and G. Widmer, editors, Proceedings of the 9th European Conference on Machine Learning, Springer, (1997), pp. 108-123.
[60] E. M. Knorr and R. T. Ng, “A unified notion of outliers: properties and computation”, Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, (1997).
[61] E. M. Knorr and R. T. Ng, “Algorithms for mining distance-based outliers in large datasets”, Proceedings of the 24th VLDB Conference, (1998) August 24-27, pp. 392-403.
[62] D. Tax and R. Duin, “Outlier detection using classifier instability”, Proceedings of the Workshop on Statistical Pattern Recognition, Sydney, (1998).
[63] C. E. Brodley and M. A. Friedl, “Identifying mislabeled training data”, Journal of Artificial Intelligence Research, vol. 11, (1999), pp. 131-167.
[64] S. Weisberg, “Applied Linear Regression”, John Wiley and Sons, (1985).
[65] D. Gamberger, N. Lavrac and C. Groselj, “Experiments with noise filtering in a medical domain”, Proc. of the 16th International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, (1999), pp. 143-151.
[66] S. Schwarm and S. Wolfman, “Cleaning data with Bayesian methods”, Final project report for University of Washington Computer Science and Engineering CSE574, (2000) March 16.
[67] S. Ramaswamy, R. Rastogi and K. Shim, “Efficient algorithms for mining outliers from large data sets”, ACM SIGMOD Record, vol. 29, no. 2, (2000) June, pp. 427-438.
[68] V. Raman and J. M. Hellerstein, “An interactive framework for data transformation and cleaning”, Technical report, University of California, Berkeley, California, (2000) September.
[69] J. Kubica and A. Moore, “Probabilistic noise identification and data cleaning”, Third IEEE International Conference on Data Mining, (2003) November 19-22.
[70] S. Verbaeten and A. V.
Assche, “Ensemble methods for noise elimination in classification problems”, Multiple Classifier Systems, Springer, (2003).
[71] J. A. Loureiro, L. Torgo and C. Soares, “Outlier detection using clustering methods: a data cleaning application”, Proceedings of the KDNet Symposium on Knowledge-based Systems for the Public Sector, Bonn, Germany, (2004).
[72] H. Xiong, G. Pande, M. Stein and V. Kumar, “Enhancing data analysis with noise removal”, IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, (2006) March, pp. 304-319.
[73] S. Kim, N. Wook Cho, B. Kang and S. Ho Kang, “Fast outlier detection for very large log data”, Expert Systems with Applications, Elsevier, vol. 38, (2011), pp. 9587-9596.
[74] S. Hido, Y. Tsuboi, H. Kashima, M. Sugiyama and T. Kanamori, “Statistical outlier detection using direct density ratio estimation”, Knowledge and Information Systems, vol. 26, no. 2, (2011), pp. 309-336.
[75] G. John and P. Langley, “Static versus dynamic sampling for data mining”, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, (1996), pp. 367-370.
[76] F. Provost, D. Jensen and T. Oates, “Efficient progressive sampling”, Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, ACM Press, (1999), pp. 23-32.
[77] V. Patil and R. S. Bichkar, “A hybrid evolutionary approach to construct optimal decision trees with large data sets”, Proc. of IEEE ICIT06, Mumbai, (2006) December 15-17, pp. 429-433.
[78] R. J. Little and D. B. Rubin, “Statistical Analysis with Missing Data”, John Wiley and Sons, New York, (1987).
[79] G. Batista and M. C. Monard, “An analysis of four missing data treatment methods for supervised learning”, Applied Artificial Intelligence, vol. 17, (2003), pp. 519-533.
[80] J. H. Friedman, J. Louis Bentley and R. A. Finkel, “An algorithm for finding best matches in logarithmic expected time”, ACM Transactions on Mathematical Software, vol. 3, (1977), pp. 209-226.
[81] J. R. Quinlan, “Unknown attribute values in induction”, Machine Learning, vol. 1, (1986), pp. 81-106.
[82] R. J. Kuligowski and A. P. Barros, “Using artificial neural networks to estimate missing rainfall data”, Journal of the American Water Resources Association, vol. 34, no. 6, (1998).
[83] L. L. Brockmeier, J. D. Kromrey and C. V. Hines, “Systematically missing data and multiple regression analysis: an empirical comparison of deletion and imputation techniques”, Multiple Linear Regression Viewpoints, vol. 25, (1998), pp. 20-39.
[84] A. J. Abebe, D. P. Solomatine and R. G. W. Venneker, “Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events”, Hydrological Sciences Journal, vol. 45, no. 3, (2000), pp. 425-436.
[85] S. Sinharay, H. S. Stern and D. Russell, “The use of multiple imputation for the analysis of missing data”, Psychological Methods, vol. 4, pp. 317-329.
[86] K. Khalil, M. Panu and W. C. Lennox, “Groups and neural networks based streamflow data infilling procedures”, Journal of Hydrology, vol. 241, (2001), pp. 153-176.
[87] B. Bhattacharya, D. L. Shrestha and D. P. Solomatine, “Neural networks in reconstructing missing wave data in sedimentation modelling”, Proceedings of the 30th IAHR Congress, Thessaloniki, Greece, (2003) August 24-29.
[88] F. Fessant and S. Midenet, “Self-organizing map for data imputation and correction in surveys”, Neural Computing & Applications, vol. 10, (2002), pp. 300-310.
[89] C. M. Musil, C. B. Warner, P. K. Yobas and S. L. Jones, “A comparison of imputation techniques for handling missing data”, Western Journal of Nursing Research, vol. 24, no. 7, (2002), pp. 815-829.
[90] H. Junninen, H. Niska, K. Tuppurainen, J. Ruuskanen and M. Kolehmainen, “Methods for imputation of missing values in air quality data sets”, Atmospheric Environment, vol. 38, (2004), pp.
2895-2907.
[91] M. Subasi, E. Subasi and P. L. Hammer, “New imputation method for incomplete binary data”, Rutcor Research Report, (2009) August.
[92] A. Mohammad Kalteh and P. Hjorth, “Imputation of missing values in a precipitation-runoff process database”, Hydrology Research, vol. 40, no. 4, (2009), pp. 420-432.
[93] R. M. Daniel and M. G. Kenward, “A method for increasing the robustness of multiple imputation”, Computational Statistics and Data Analysis, Elsevier, doi 10.1016/j.csda.2011.10.006, (2011).
[94] P. and B., “Multiple imputation of missing data with genetic algorithms based techniques”, International Journal on Computer Applications, Special Issue on Evolutionary Computation in Optimisation Techniques, vol. 2, (2010), pp. 74-78.
[95] G. M. Weiss, “The Effect of Small Disjuncts and Class Distribution on Decision Tree Learning”, Doctoral thesis, Rutgers, The State University of New Jersey, New Brunswick, (2003).
[96] A. Papagelis and D. Kalles, “GATree: genetically evolved decision trees”, Proc. of the 12th International Conference on Tools with Artificial Intelligence, (2000), pp. 203-206.
[97] Z. Fu and F. Mae, “A computational study of using genetic algorithms to develop intelligent decision trees”, Proc. of the 2001 IEEE Congress on Evolutionary Computation, vol. 2, (2001), pp. 1382-1387.
[98] N. and E. Tazaki, “Genetic programming combined with association rule algorithm for decision tree construction”, Proc. of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, vol. 2, (2000), pp. 746-749.
[99] Y. Kornienko and A. Borisov, “Investigation of a hybrid algorithm for decision tree generation”, Proc. of the Second IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, (2003), pp. 63-68.
[100] Z.-H. Zhou and Y. Jiang, “NeC4.5: neural ensemble based C4.5”, IEEE Transactions on Knowledge and Data Engineering, vol.
16, no. 6, (2004), pp. 770-773.
[101] C. Z. Janikow, “Fuzzy decision trees: issues and methods”, IEEE Transactions on Systems, Man, and Cybernetics, vol. 28, no. 1, (1998), pp. 1-14.
[102] Zeidler and M. Schlosser, “Continuous-valued attributes in fuzzy decision trees”, Proc. of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, (1996), pp. 395-400.
[103] C. Z. Janikow, “A genetic algorithm method for optimizing the fuzzy component of a fuzzy decision tree”, in S. Pal and P. Wang, editors, Genetic Algorithms for Pattern Recognition, CRC Press, pp. 253-282.
[104] M. Won Kim, J. Geun Lee and C. Min, “Efficient fuzzy rule generation based on fuzzy decision tree for data mining”, Proceedings of the IEEE International Fuzzy Systems Conference, Seoul, Korea, (1999) August 22-25.
[105] M. Fajfer and C. Z. Janikow, “Bottom-up fuzzy partitioning in fuzzy decision trees”, Proceedings of the 19th International Conference of the North American Fuzzy Information Processing Society, (2000), pp. 326-330.
[106] M. Guetova, S. Holldobler and H.-P. Storr, “Incremental fuzzy decision trees”, International Conference on Fuzzy Sets and Soft Computing in Economics and Finance, St. Petersburg, Russia, (2004).
[107] K. Kubota, H. Sakai, A. Nakase and S. Oyanagi, “Parallelization of decision tree algorithm and its performance evaluation”, Proceedings of the Fourth International Conference on High Performance Computing in the Asia-Pacific Region, vol. 2, (2000), pp. 574-579.
[108] R. Kufrin, “Decision trees on parallel processors”, Machine Intelligence and Pattern Recognition, Elsevier, vol. 20, (1997), pp. 279-306.
[109] G. J. Narlikar, “A parallel, multithreaded decision tree builder”, Technical report, School of Computer Science, Carnegie Mellon University, (1998).
[110] M. V. Joshi, G. Karypis and V.
Kumar, “ScalParC: a new scalable and efficient parallel classification algorithm for mining large datasets”, Proceedings of the International Parallel Processing Symposium, (1998), pp. 573-579.
[111] L. O. Hall, N. Chawla and K. W. Bowyer, “Combining decision trees learned in parallel”, Distributed Data Mining Workshop at the International Conference on Knowledge Discovery and Data Mining, (1998), pp. 77-83.
[112] M. J. Zaki, C. T. Ho and R. Agrawal, “Parallel classification for data mining on shared-memory multiprocessors”, IEEE International Conference on Data Engineering, (1999), pp. 198-205.
[113] A. Srivastava, E. H. Han, V. Kumar and V. Singh, “Parallel formulations of decision-tree classification algorithms”, Data Mining and Knowledge Discovery: An International Journal, vol. 3, no. 3, (1999), pp. 237-261.
[114] K. Kubota, A. Nakase and S. Oyanagi, “Implementation and performance evaluation of dynamic scheduling for parallel decision tree generation”, Proceedings of the 15th International Parallel and Distributed Processing Symposium, (2001), pp. 1579-1588.
[115] R. Jin and G. Agrawal, “Communication and memory efficient parallel decision tree construction”, Proceedings of the Third SIAM Conference on Data Mining, (2003).
[116] R. Jin, G. Yang and G. Agrawal, “Shared memory parallelization of data mining algorithms: techniques, programming interface, and performance”, IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 10, (2004), pp. 71-89.
[117] L. Wenlong and X. Changzheng, “Parallel decision tree algorithm based on combination”, IEEE International Forum on Information Technology and Applications (IFITA), Kunming, (2010) July 16-18, pp. 99-101.
[118] J. Ouyang, N. Patel and I. K. Sethi, “Chi-square test based decision trees induction in distributed environment”, IEEE International Conference on Data Mining Workshops, ICDMW '08, (2008) December 15-19, pp. 477-485.
[119] K. Bhaduri, R. Wolff, C. Giannella and H.
Kargupta, “Distributed decision-tree induction in peer-to-peer systems”, Statistical Analysis and Data Mining, John Wiley and Sons, vol. 1, no. 2, (2008) June.
[120] B. Liu, S.-G. Cao, X.-L. Jim and Z.-H. Zhi, “Data mining in distributed data environment”, International Conference on Machine Learning and Cybernetics (ICMLC), vol. 1, (2010) July 11-14, pp. 421-426.
[121] M. Mehta, R. Agrawal and J. Rissanen, “SLIQ: a fast scalable classifier for data mining”, Proceedings of the Fifth International Conference on Extending Database Technology, Avignon, France, (1996), pp. 18-32.
[122] J. C. Shafer, R. Agrawal and M. Mehta, “SPRINT: a scalable parallel classifier for data mining”, Proceedings of the 22nd VLDB Conference, (1996), pp. 544-555.
[123] J. Gehrke, R. Ramakrishnan and V. Ganti, “RainForest: a framework for fast decision tree construction of large datasets”, Proceedings of the Conference on Very Large Databases (VLDB), (1998), pp. 416-427.
[124] K. Alsabti, S. Ranka and V. Singh, “CLOUDS: a decision tree classifier for large datasets”, Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD-98), (1998), pp. 2-8.
[125] J. Gehrke, V. Ganti, R. Ramakrishnan and W. Loh, “BOAT: optimistic decision tree construction”, Proc. of the SIGMOD Conference, (1999), pp. 169-180.
[126] P. Chan and S. J. Stolfo, “Toward parallel and distributed learning by meta-learning”, Working Notes of the AAAI Workshop on Knowledge Discovery in Databases, pp. 227-240.
[127] L. Todorovski and S. Dzeroski, “Combining multiple models with meta decision trees”, Proc. of the Fourth European Conference on Principles of Data Mining and Knowledge Discovery, (2000), pp. 54-64.
[128] L. Todorovski and S. Dzeroski, “Combining classifiers with meta decision trees”, Machine Learning, vol. 50, no. 3, (2003), pp. 223-249.
[129] B. Zenko, L.
Todorovski and S. Dzeroski, “A comparison of stacking with meta decision trees to bagging, boosting, and stacking with other methods”, Proceedings of the 2001 IEEE International Conference on Data Mining, (2001), pp. 669-670.
[130] A. L. Prodromidis, P. K. Chan and S. J. Stolfo, “Meta-learning in distributed data mining systems: issues and approaches”, in H. Kargupta and P. Chan, editors, Advances in Distributed Data Mining, AAAI Press, (2000), pp. 81-113.
[131] S. Stolfo, W. Fan, W. Lee, A. Prodromidis and P. Chan, “Credit card fraud detection using meta-learning: issues and initial results”, Working Notes of the AAAI Workshop on AI Approaches to Fraud Detection and Risk Management, (1997).
[132] S. Rasoul Safavian and D. Landgrebe, “A survey of decision tree classifier methodology”, IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, no. 3, (1991), pp. 660-674.
[133] L. Rokach and O. Maimon, “Top-down induction of decision trees classifiers: a survey”, IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, vol. 35, no. 4, (2005), pp. 476-487.
[134] P. E. Utgoff, “Incremental induction of decision trees”, Machine Learning, vol. 4, (1989), pp. 161-186.
[135] R. Reynolds and H. Al-Shehri, “The use of cultural algorithms with evolutionary programming to guide decision tree induction in large databases”, Proceedings of the 1998 IEEE International Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, Anchorage, AK, USA, (1998), pp. 441-546.
[136] S. K. Murthy, S. Kasif, S. Salzberg and R. Beigel, “OC1: randomized induction of oblique decision trees”, Proceedings of the Eleventh National Conference on Artificial Intelligence, Washington, DC, AAAI Press, (1993) July 11-15, pp. 322-327.
[137] R. Setiono and H.
Liu, “A connectionist approach to generating oblique decision trees”, IEEE Transactions on Systems, Man, and Cybernetics, vol. 29, no. 3, (1999).
[138] V. S. Iyengar, “HOT: heuristics for oblique trees”, Proceedings of the Eleventh International Conference on Tools with Artificial Intelligence, IEEE Press, (1999), pp. 91-98.
[139] E. Cantu-Paz and C. Kamath, “Inducing oblique decision trees with evolutionary algorithms”, IEEE Transactions on Evolutionary Computation, vol. 7, no. 1, (2003), pp. 54-68.
[140] I. H. Witten and E. Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, Morgan Kaufmann, (2005).
[141] J. Han, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, (2001).
[142] L. Virine and L. Rapley, “Visualization of probabilistic business models”, Proceedings of the 2003 Winter Simulation Conference, vol. 2, (2003), pp. 1779-1786.
[143] Q. Yang, J. Yin, C. X. Ling and T. Chen, “Postprocessing decision trees to extract actionable knowledge”, Proceedings of the Third IEEE International Conference on Data Mining, Florida, USA, (2003).
[144] D. Zhang, X. Zhou, S. C. H. Leung and J. Zheng, “Vertical bagging decision trees model for credit scoring”, Expert Systems with Applications, Elsevier Publishers, vol. 37, (2010), pp. 7838-7843.
[145] W.-K. Sung, D. Yang, S.-M. Yiu, D. W. Cheung, W.-S. Ho and T.-W. Lam, “Automatic construction of online catalog topologies”, IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, vol. 32, no. 4, (2002).
[146] Z. Yu, F. Haghighat, B. C. M. Fung and H. Yoshino, “A decision tree method for building energy demand modeling”, Energy and Buildings, vol. 42, (2010), pp. 1637-1646.
[147] S. D. MacArthur, C. E. Brodley, A. C. Kak and L. S. Broderick, “Interactive content-based image retrieval using relevance feedback”, Computer Vision and Image Understanding, (2002), pp. 55-75.
[148] K. Park, K. Mu Lee and S.
Uk Lee, “Perceptual grouping of 3D features in aerial image using decision tree classifier”, Proceedings of the 1999 International Conference on Image Processing, vol. 1, (1999), pp. 31-35.
[149] C. Sinclair, L. Pierce and S. Matzner, “An application of machine learning to network intrusion detection”, Proceedings of the 15th Annual Computer Security Applications Conference, (1999), pp. 371-377.
[150] T. Abbes, A. Bouhoula and M. Rusinowitch, “Protocol analysis in intrusion detection using decision tree”, Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, (2004), pp. 404-408.
[151] C. Stasis, E. N. Loukis, S. A. Pavlopoulos and D. Koutsouris, “Using decision tree algorithms as a basis for a heart sound diagnosis decision support system”, Proceedings of the 4th Annual IEEE Conference on Information Technology Applications in Biomedicine, UK, (2003), pp. 354-357.
[152] M. Lenic, P. Povalej, M. Zorman, V. Podgorelec, P. Kokol and L. Lhotska, “Multimethod machine learning approach for medical diagnosing”, Proceedings of the 4th Annual IEEE Conference on Information Technology Applications in Biomedicine, UK, (2003), pp. 195-198.
[153] P. Kokol, M. Zorman, V. Podgorelec and S. Hleb Babie, “Engineering for intelligent systems”, Proceedings of the 1999 IEEE International Conference on Systems, Man, and Cybernetics, vol. 6, (1999), pp. 306-311.
[154] M. Dong, R. Kothari, M. Visschert and S. B. Hoatht, “Evaluating skin condition using a new decision tree induction algorithm”, Proceedings of the International Joint Conference on Neural Networks, vol. 4, (2001), pp. 2456-2460.
[155] P. R. Kennedy and K. D. Adams, “A decision tree for brain-computer interface devices”, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 11, no. 2, (2003), pp. 148-150.
[156] L. Hui and G.
Liping, “Statistical estimation of diagnosis with genetic markers based on decision tree analysis of complex disease”, Computers in Biology and Medicine, vol. 39, (2009), pp. 989-992.
[157] J. Pablo Gonzalez and U. Ozguner, “Lane detection using histogram-based segmentation and decision trees”, Proceedings of IEEE Intelligent Transportation Systems, (2000), pp. 346-351.
[158] J. Freixenet, X. Llado, J. Marti and X. Cufi, “Use of decision trees in color feature selection: application to object recognition in outdoor scenes”, Proceedings of the International Conference on Image Processing, vol. 3, (2000), pp. 496-499.
[159] C. M. Rocco S., “Approximate reliability expressions using a decision tree approach”, Proceedings of the Annual Reliability and Maintainability Symposium (RAMS), (2004), pp. 116-121.
[160] T. Assaf and J. Bechta Dugan, “Diagnostic expert systems from dynamic fault trees”, Annual Reliability and Maintainability Symposium (RAMS), (2004), pp. 444-450.
[161] M. Simard, S. S. Saatchi and G. De Grandi, “The use of decision tree and multiscale texture for classification of JERS-1 SAR data over tropical forest”, IEEE Transactions on Geoscience and Remote Sensing, vol. 38, no. 5, (2000).
[162] F. Zhu Palaniappan, X. Zhuang and Y. Zhao Blanchard, “Enhanced binary tree genetic algorithm for automatic land cover classification”, Proceedings of the International Geoscience and Remote Sensing Symposium, (2000), pp. 688-692.
[163] R. Manavi, C. Weisbin, W. Zimmerman and G. Rodriguez, “Technology portfolio options for NASA missions using decision trees”, Proceedings of the IEEE Aerospace Conference, Big Sky, Montana, (2002), pp. 115-126.
[164] Y. Amit and A. Murua, “Speech recognition using randomized relational decision trees”, IEEE Transactions on Speech and Audio Processing, vol. 9, no. 4, (2001), pp. 333-341.
[165] L. R.
Bahl, P. F. Brown, P. V. De Souza and R. L. Mercer, “A tree-based statistical language model for natural language speech recognition”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 7, (1989), pp. 1001-1008.
[166] J. Yamagishi, M. Tachibana, T. Masuko and T. Kobayashi, “Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, (2004), pp. 5-8.
[167] R. W. Selby and A. A. Porter, “Learning from examples: generation and evaluation of decision trees for software resource analysis”, IEEE Transactions on Software Engineering, vol. 14, (1988), pp. 1743-1757.
[168] T. M. Khoshgoftaar, N. Seliya and Y. Liu, “Genetic programming-based decision trees for software quality classification”, Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, (2003), pp. 374-383.
[169] S. Geetha, N. N. Ishwarya and N. Kamaraj, “Evolving decision tree rule based system for audio stego anomalies detection based on Hausdorff distance statistics”, Information Sciences, Elsevier, (2010), pp. 2540-2559.
[170] L. Diao, K. Hu, Y. Lu and C. Shi, “Boosting simple decision trees with Bayesian learning for text categorization”, IEEE Robotics and Automation Society, Proc. of the 4th World Congress on Intelligent Control and Automation, Shanghai, China, pp. 321-325.
[171] B. Wu, W.-J. Zhou and W.-D. Zhang, “The applications of data mining technologies in dynamic traffic prediction”, IEEE Intelligent Transportation Systems, vol. 1, (2003), pp. 396-401.
[172] P. Jeong and S. Nedevschi, “Intelligent road detection based on local averaging classifier in real-time environments”, Proc. of the 12th International Conference on Image Analysis and Processing.
[173] E. Jaser, J. Kittler and W.
Christmas, “Hierarchical decision making scheme for sports video categorization with temporal postprocessing”, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2004), pp. 908-913.
[174] S. Cen and P. C. Cosman, “Decision trees for error concealment in video decoding”, IEEE Transactions on Multimedia, vol. 5, no. 1, (2003), pp. 1-7.
[175] F. Bonchi, F. Giannotti, G. Manco, C. Renso, M. Nanni, D. Pedreschi and S. Ruggieri, “Data mining for intelligent web caching”, Proceedings of the International Conference on Information Technology: Coding and Computing, (2001).
[176] M. Chen, A. Zheng, J. Lloyd, M. Jordan and E. Brewer, “Failure diagnosis using decision trees”, Proceedings of the International Conference on Autonomic Computing.
[177] K. Kumar Reddy C, V. Babu B and C. H. Rupa, “SLEAS: supervised learning using entropy as attribute selection measure”, International Journal of Engineering and Technology, (2014), pp. 2053-2060.
[178] K. Kumar Reddy C, C. H. Rupa and B. Vijaya Babu, “A pragmatic methodology to predict the presence of snow/no-snow using supervised learning methodologies”, International Journal of Applied Engineering Research, (2014), pp. 11381-11394.
[179] K. Kumar Reddy C, C. H. Rupa and V. Babu, “SPM: a fast and scalable model for predicting snow/no-snow”, World Applied Sciences Journal, (2014), pp. 1561-1570.
[180] K. Kumar Reddy C, C. H. Rupa and V. Babu, “SLGAS: supervised learning using gain ratio as attribute selection measure to nowcast snow/no-snow”, International Review on Computers and Software, (2015).
[181] K. Kumar Reddy C, V. Babu and C. H. Rupa, “ISIQ: improved supervised learning using in quest to nowcast snow/no-snow”, WSEAS Transactions on Computers, (2015).
[182] A. K. Pujari, “Data Mining Techniques”, Universities Press, (2004).

Authors

C.
Kishor Kumar Reddy obtained his B.Tech in Information Technology from JNTU Anantapur in 2011 and his M.Tech in Computer Science and Engineering from JNTU Hyderabad in 2013, and is currently pursuing a Ph.D. in Computer Science and Engineering at K L University, Guntur. He has published 28 papers in international conferences and journals indexed in the Scopus and DBLP databases. He is a member of IEEE, CSI, ISTE, IAENG, IACSIT, and IAPA, and an editorial board member of IJAEIM.

Dr. B. Vijaya Babu is presently working as a Professor in the CSE department of K L University, Andhra Pradesh. He obtained his Ph.D. (CSSE) degree from A U College of Engineering, Andhra University, Visakhapatnam, Andhra Pradesh in 2012. He has about 20 years of teaching experience in various positions at private engineering colleges in Andhra Pradesh. His research area is knowledge engineering/data mining, and he has published about 30 research papers in international/Scopus-indexed journals. He is a life member of professional bodies such as ISTE.

Copyright ⓒ 2016 SERSC