International Journal of Artificial Intelligence and Applications for Smart Devices
Vol.4, No.1 (2016), pp.9-32
http://dx.doi.org/10.14257/ijaiasd.2016.4.1.02
A Survey on Issues of Decision Tree and Non-Decision
Tree Algorithms
Kishor Kumar Reddy C1 and Vijaya Babu2
1,2Department of Computer Science and Engineering, K L University, Guntur
1[email protected], 2[email protected]
Abstract
Decision tree and non-decision tree approaches are effectively used in many diverse
areas such as speech recognition, radar signal classification, satellite signal
classification, medical diagnosis, remote sensing, expert systems, and weather forecasting
and so on. Even though classification has been studied widely in the past, many of the
algorithms are designed only for memory-resident data, thus limiting their suitability for
data mining large data sets. The volume of data in databases is growing to quite large
sizes, in the number of attributes, instances, and class labels. Decision tree learning
from a very large set of records in a database is a complex task and is usually a very
slow process, often beyond the capabilities of existing computers. This paper is
an attempt to summarize the proposed approaches, tools etc. for decision tree learning
with emphasis on optimization of constructed trees and handling large datasets. Further,
we also discuss and summarize various non-decision tree approaches such as Neural
Networks, Support Vector Machines, and Naive Bayes.
Keywords: Classification, Decision Tree, Expert Systems, Pattern Recognition, Remote Sensing
1. Introduction
Recent advances in methodologies of collecting the data, storage and processing
technology are providing a unique challenge and opportunity for automated data
exploration techniques. Huge amounts of data are being collected daily from major
scientific projects such as the Human Genome Project and the Hubble Space Telescope,
from Geographical Information Systems, hospital information systems, stock trading,
computerized sales records, and many other sources. In addition, researchers, scientists,
students and practitioners from more diverse disciplines than ever before are attempting to
use automated methods to analyze their data. As the quantity and variety of data available
to data exploration methods increases, there is a commensurate need for robust, efficient,
accurate and versatile data exploration methods.
Supervised learning methods attempt to discover the relationship between the source
attributes and the target attribute [1-2]. Classification is a significant problem in the
emerging field of data mining and can be described as follows: the input data (training
set) comprises multiple records, each having multiple attributes followed by class
labels [141][182]. The goal of classification is to analyze the input data and to develop an
accurate model for each class using the features present in the data. The class descriptions
are used to classify future test data for which the class labels are unknown. They can also
be used to develop a better understanding of each class in the data [1-2].
Decision trees have been found to be among the best algorithms for data classification,
providing good accuracy in many real-world applications [3-6][141][182]. Researchers,
academicians, practitioners and scientists have developed various decision tree algorithms
over a period of time with enhancement in performance and ability to handle various
types of data [7-15]. In this paper, we present and summarize various decision tree
approaches [141][182] such as ID3 [39], C4.5 [8], CART [9], SLIQ [121], SPRINT
[122][126], SLEAS [177-179], SLGAS [180], and ISLIQ [181]. Further, we also
discuss various non-decision tree approaches such as Rule Based Classifiers [141],
Nearest Neighbour Classifiers [141], Bayesian Classifiers [141], Artificial Neural
Networks (ANN) [141], and Support Vector Machines [141].
The rest of this paper is laid out as follows: Section 2 describes the representation of
decision trees. Section 3 presents various splitting criteria for decision tree algorithms.
Section 4 gives a detailed description of various pruning techniques and recent
improvements. Section 5 provides a comparison and summary of various decision tree
and non-decision tree approaches. Section 6 covers various software tools for learning
decision tree and non-decision tree models. Section 7 discusses real-time applications of
decision tree and non-decision tree approaches. Section 8 concludes the paper, followed
by references.
2. Decision Tree Representation
A decision tree is a classification model applied to existing data; it handles high-dimensional
data, is scalable, and does not require any domain knowledge [1-6]. A decision tree can be
described as a combination of mathematical and statistical techniques for the categorization
and generalization of a dataset. A decision tree is a schematic tree-shaped diagram used to
determine a course of action or show statistical probability. Each branch of the decision tree
represents a possible decision, which may lead to another attribute test, until it ends with a
leaf node. A leaf of the decision tree represents a target value, while the input variables are
tested along the branches. The main learning process of a decision tree is the recursive
splitting of the source set into subsets until the leaf level is reached [8].
A decision tree classifier divides the training set into partitions recursively, so that
all or most of the records in a partition belong to the same class. The algorithm starts with
the entire dataset at the root node. The dataset is partitioned into subsets using a splitting
criterion based on statistical measures such as entropy, gini index, gain ratio, and so on
[6][8-16]. The splitting is carried out by choosing the attribute that best separates the
remaining samples of the node into the individual classes.
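As a minimal illustration of this recursive partitioning (a sketch written for this survey, not code from any of the algorithms discussed below; the equality-test splits, Gini impurity, and depth limit are simplifying assumptions), the process can be written in Python as follows:

    from collections import Counter

    def gini(labels):
        # Gini impurity of a list of class labels: 1 - sum_i p_i^2
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def best_split(rows, labels, attribute_indices):
        # Try every (attribute, value) equality test and keep the one that
        # yields the lowest weighted impurity of the two partitions.
        best = None
        for a in attribute_indices:
            for v in set(r[a] for r in rows):
                left = [i for i, r in enumerate(rows) if r[a] == v]
                right = [i for i in range(len(rows)) if i not in set(left)]
                if not left or not right:
                    continue
                score = (len(left) * gini([labels[i] for i in left]) +
                         len(right) * gini([labels[i] for i in right])) / len(rows)
                if best is None or score < best[0]:
                    best = (score, a, v, left, right)
        return best

    def grow_tree(rows, labels, attribute_indices, max_depth=5):
        # Stop when the node is pure, no attributes remain, or depth runs out.
        if len(set(labels)) == 1 or not attribute_indices or max_depth == 0:
            return Counter(labels).most_common(1)[0][0]   # leaf: majority class
        split = best_split(rows, labels, attribute_indices)
        if split is None:
            return Counter(labels).most_common(1)[0][0]
        _, a, v, left, right = split
        return {"attribute": a, "value": v,
                "left": grow_tree([rows[i] for i in left], [labels[i] for i in left],
                                  attribute_indices, max_depth - 1),
                "right": grow_tree([rows[i] for i in right], [labels[i] for i in right],
                                   attribute_indices, max_depth - 1)}

A real learner would add the pruning phase described next and the more refined splitting criteria of Section 3.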
Decision tree generation consists of two phases [141] [182]:
Tree construction, which includes:
 At start, all the training examples are at the root.
 Partition examples recursively based on selected attributes.
Tree pruning, which includes:
 Identify and remove branches that reflect noise or outliers.
Decision trees are applied in decision analysis problems to identify the most suitable
strategy for reaching a concrete goal. Decision trees provide classification rules and can
be understood easily. However, the accuracy of decision trees has always been a problem
[1-2] [4].
Several advantages of decision tree-based classification have been pointed out
[141][182].
1. Knowledge acquisition from pre-classified examples circumvents the bottleneck of
acquiring knowledge from a domain expert.
2. Tree methods are exploratory as opposed to inferential. They are also
nonparametric. As only a few assumptions are made about the model and the data
distribution, trees can model a wide range of data distributions.
3. Hierarchical decomposition implies better use of available features and
computational efficiency in classification.
4. As opposed to some statistical methods, tree classifiers can treat uni-modal as well
as multi-modal data in the same fashion.
5. Trees can be used with the same ease in deterministic as well as incomplete
problems.
6. Trees perform classification by a sequence of simple, easy-to-understand tests
whose semantics are intuitively clear to domain experts. The decision tree
formalism itself is intuitively appealing.
For these and other reasons, decision tree methodology provides an important tool
in the tool box of every data mining researcher and practitioner. In fact, many existing
data mining products are based on constructing decision trees from data.
Naturally, decision makers prefer a less complex decision tree, since it is considered more
comprehensible. Furthermore, according to Breiman et al., tree complexity has a
crucial effect on accuracy. Tree complexity is explicitly controlled by
the stopping criteria used and the pruning method employed.
3. Splitting Criteria in Decision Trees
3.1. Gini Index [9][141][182]
Step 1: Calculate the G_Info value for the class label, shown in equation (1):

    G_Info = 1 - \sum_{i=1}^{M} P_i^2                                        (1)

where P_i is the probability that a tuple in the given dataset belongs to class C_i, and M is
the number of class labels.

Step 2: Calculate the G_InfoD value for every attribute, shown in equation (2):

    G_InfoD = \sum_{j=1}^{N} P_j \left( 1 - \sum_{i=1}^{M} P_i^2 \right)     (2)

Step 3: The Gini Index is obtained by finding the difference between the G_Info and
G_InfoD values, shown in equation (3):

    Gini Index = G_Info - G_InfoD                                            (3)

Step 4: The maximum Gini Index value is considered as the best split point and is
the root node, shown in equation (4):

    Best Split Point = Maximum(Gini Index)                                   (4)
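The four steps above can be sketched directly in Python (an illustration of equations (1)-(4) written for this survey; the toy records and the equality-based partitions are assumptions):

    from collections import Counter

    def g_info(class_labels):
        # Equation (1): 1 minus the sum of squared class probabilities
        n = len(class_labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(class_labels).values())

    def g_info_d(attribute_values, class_labels):
        # Equation (2): weighted impurity of the partitions induced by the attribute
        n = len(class_labels)
        total = 0.0
        for v in set(attribute_values):
            subset = [c for a, c in zip(attribute_values, class_labels) if a == v]
            total += (len(subset) / n) * g_info(subset)
        return total

    def gini_index(attribute_values, class_labels):
        # Equation (3): reduction in impurity achieved by splitting on the attribute
        return g_info(class_labels) - g_info_d(attribute_values, class_labels)

    # Equation (4): the attribute with the maximum Gini Index becomes the split.
    rows = [("sunny", "high", "no"), ("rainy", "high", "yes"), ("rainy", "low", "yes")]
    classes = [r[2] for r in rows]
    scores = {i: gini_index([r[i] for r in rows], classes) for i in (0, 1)}
    best_attribute = max(scores, key=scores.get)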
3.2. Entropy [7][141][182]
Step 1: Calculate the Attribute Entropy value, shown in equation (5):

    Attribute Entropy = \sum_{j=1}^{N} P_j \left( - \sum_{i=1}^{M} P_i \log_2 P_i \right)   (5)

Step 2: Calculate the Class Entropy value for every attribute, shown in equation (6):

    Class Entropy = - \sum_{i=1}^{M} P_i \log_2 P_i                                          (6)

Step 3: The Entropy is obtained by finding the difference between the Class Entropy
value and the Attribute Entropy, shown in equation (7):

    Entropy = Class Entropy - Attribute Entropy                                              (7)

Step 4: The maximum Entropy value is considered as the best split point and is the
root node, shown in equation (8):

    Best Split Point = Maximum(Entropy)                                                      (8)
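Equations (5)-(8) admit the same kind of sketch (again an illustration written for this survey, not code from the cited algorithms):

    import math
    from collections import Counter

    def class_entropy(class_labels):
        # Equation (6): -sum_i P_i log2 P_i over the class distribution
        n = len(class_labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(class_labels).values())

    def attribute_entropy(attribute_values, class_labels):
        # Equation (5): expected class entropy after partitioning on the attribute
        n = len(class_labels)
        total = 0.0
        for v in set(attribute_values):
            subset = [c for a, c in zip(attribute_values, class_labels) if a == v]
            total += (len(subset) / n) * class_entropy(subset)
        return total

    def information_gain(attribute_values, class_labels):
        # Equation (7): Class Entropy minus Attribute Entropy
        return class_entropy(class_labels) - attribute_entropy(attribute_values, class_labels)

    # Equation (8): the attribute with the maximum value is taken as the best split.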
3.3. Gain Ratio [7][141][182]
Step 1: Calculate Entropy using equation (7).

Step 2: Compute Split Info for each and every attribute, shown in equation (9):

    Split Info = - \sum_{i=1}^{N} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}   (9)

Step 3: Calculate Gain Ratio using equation (10):

    Gain Ratio = Entropy / Split Info                                          (10)

Step 4: The maximum Gain Ratio value is considered as the best split point and is
the root node, shown in equation (11):

    Best Split Point = Maximum(Gain Ratio)                                     (11)
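Equations (9)-(11) can then be sketched by reusing the information_gain function from the previous sketch (an illustration only; the guard against a zero Split Info is our own assumption):

    import math
    from collections import Counter

    def split_info(attribute_values):
        # Equation (9): -sum_i (|T_i|/|T|) log2(|T_i|/|T|) over the attribute's partitions
        n = len(attribute_values)
        return -sum((c / n) * math.log2(c / n) for c in Counter(attribute_values).values())

    def gain_ratio(attribute_values, class_labels):
        # Equation (10): information gain (equation (7), previous sketch) over Split Info
        si = split_info(attribute_values)
        if si == 0.0:            # attribute takes a single value: no useful split
            return 0.0
        return information_gain(attribute_values, class_labels) / si

    # Equation (11): choose the attribute whose gain ratio is maximal as the split point.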
4. Decision Tree Pruning [141][182]
Decision trees are often large and complex, and as a result they may become inaccurate
and incomprehensible. The causes of overfitting in decision trees are noisy data, the
unavailability of training samples for one or more classes, and insufficient training
instances to represent the target function. Decision tree pruning removes one or more
subtrees from a decision tree. It makes overfitting trees more accurate in classifying
unseen data. Various methods have been proposed for decision tree pruning. These
methods selectively replace a subtree with a leaf, if it does not reduce classification
accuracy over pruning data set. Pruning may increase the number of classification errors
on the training set but it improves classification accuracy on unseen data [141][182].
4.1. Cost Complexity Pruning
Cost complexity pruning is used in the CART system [9]. Starting with an initial
unpruned tree constructed from the complete training set, this algorithm builds a
chain of progressively smaller pruned trees by replacing one or more subtrees with their best
possible leaves. The method prunes those subtrees that give the lowest increase in error
on the training data. The cost complexity of a tree is defined as the ratio of the number of
correctly classified instances to misclassified instances in the training data, plus the number of
leaves in that tree multiplied by some parameter α [17-19].
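For reference, the cost-complexity measure is commonly written in the standard CART form (stated here for convenience; this is the usual textbook formulation rather than a quotation from [9]):

    R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|

where R(T) is the misclassification error of tree T on the training data, |\tilde{T}| is the number of leaves of T, and \alpha \ge 0 trades tree size against training error; increasing \alpha produces the chain of progressively smaller trees mentioned above.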
4.2. Reduced-Error Pruning
Reduced error pruning, proposed by Quinlan, is a bottom-up approach in which
non-leaf subtrees are replaced with the best possible leaf nodes if these replacements do not
increase the classification error on the pruning data set [19]. The process continues towards the
root node for as long as pruning does not increase the error. The process yields the smallest and
most accurate decision tree with respect to the pruning data [17-20].
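The idea can be sketched in Python over the nested-dictionary trees grown in the Section 2 sketch (a simplified illustration written for this survey; the tree encoding and the predict helper are assumptions, not part of Quinlan's description):

    from collections import Counter

    def predict(tree, row):
        # Leaves are plain class labels; internal nodes test one attribute for equality.
        while isinstance(tree, dict):
            tree = tree["left"] if row[tree["attribute"]] == tree["value"] else tree["right"]
        return tree

    def errors(tree, rows, labels):
        return sum(1 for r, y in zip(rows, labels) if predict(tree, r) != y)

    def majority(labels):
        return Counter(labels).most_common(1)[0][0] if labels else None

    def reduced_error_prune(tree, rows, labels):
        # Bottom-up: prune the children first, then try to replace this subtree by a
        # leaf holding the majority class of the pruning examples that reach it.
        if not isinstance(tree, dict) or not rows:
            return tree
        a, v = tree["attribute"], tree["value"]
        left_idx = [i for i, r in enumerate(rows) if r[a] == v]
        right_idx = [i for i in range(len(rows)) if i not in set(left_idx)]
        tree["left"] = reduced_error_prune(tree["left"], [rows[i] for i in left_idx],
                                           [labels[i] for i in left_idx])
        tree["right"] = reduced_error_prune(tree["right"], [rows[i] for i in right_idx],
                                            [labels[i] for i in right_idx])
        leaf = majority(labels)
        if errors(leaf, rows, labels) <= errors(tree, rows, labels):
            return leaf   # replacing the subtree does not increase pruning-set error
        return tree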
4.3. Critical Value Pruning
Mingers proposed critical value pruning, which uses the information gathered during
tree construction [20]. It sets a threshold called a critical value to select a node for
pruning. Various measures such as gain, info gain etc. can be used to select the best
attribute at the test node. If the value of the selection criterion is smaller than this threshold
value, the subtree is replaced with the best possible leaf. However, if the subtree contains at
least one node having a value greater than the threshold, the subtree cannot be pruned [17-18].
4.4. Minimum Error Pruning
Niblett and Bratko proposed minimum-error pruning, which is a bottom-up approach
[17]. To obtain the error estimate of a subtree to be pruned, the errors of its children are
estimated. The dynamic error of a node t is calculated as the weighted sum of the static errors of
its children. If the dynamic error of t is less than its static error, t is pruned and
replaced with the best possible leaf.
4.5. Pessimistic Error Pruning
Pessimistic error pruning, a top-down approach proposed by Quinlan, uses error rate
estimates to make decisions about pruning subtrees, similar to cost complexity
pruning [19]. It calculates classification errors on the training data and does not require a
separate pruning set. Since the classification errors estimated from training set cannot
provide best pruning results for unseen data, this pruning technique assumes that each leaf
classifies a certain fraction of instances with error. To reflect these errors, it adds a
continuity correction for binomial distribution to the derived training error of a subtree.
However, as the corrected misclassification estimation by a subtree is expected to be
optimistic, the algorithm calculates standard error. Quinlan recommends pruning a subtree
if its corrected estimate of error is lower than that for the node by at least one standard
error [18-20].
4.6. Error-Based Pruning
Error-based pruning is the default pruning method for the well-known C4.5 decision
tree algorithm. Instead of using a pruning set it uses error estimates. The method assumes
that the errors are binomially distributed and calculates error estimates from the training
data. The number of instances covered by the leaf of the tree is used to estimate errors. A
Copyright ⓒ 2016 SERSC
13
International Journal of Artificial Intelligence and Applications for Smart Devices
Vol. 4, No. 1 (2016)
bottom-up approach is used for error-based pruning. If the number of predicted errors for
a leaf is not greater than the sum of the predicted errors for the leaf nodes of the corresponding
subtree, then the subtree is replaced with that leaf [18].
4.7. Minimum Description Length Pruning
Mehta et al., and Quinlan and Rivest, utilized the MDL principle for decision tree pruning
[21-22]. The minimum description length principle used here states that a classifier that
compresses the data is a preferable inducer. The MDL pruning method selects the decision
tree that requires the fewest bits to represent it, where the size of a decision tree is
measured as the number of bits required to encode it. The method thus searches
for the decision tree that maximally compresses the data [18].
4.8. Optimal Pruning
An optimal pruning algorithm constructs the smallest pruned tree with maximum
classification accuracy on the training data. Breiman et al. [9] first suggested a dynamic
programming solution for optimal pruning. Bohanec and Bratko [23]
introduced an optimal pruning algorithm called OPT which gives a better solution.
Almuallim [24] proposed an enhancement to OPT called OPT-2, which is also based on
dynamic programming, is flexible in various aspects, and is easy to implement.
4.9. Improvements to Pruning Algorithms
Matti Kääriäinen [25] analyzed reduced error pruning and proposed a new method for
obtaining generalization error bounds for pruned decision trees. Error-based
pruning has been blamed for the general effect of under-pruning. Hall et al., [26] proved
that if the certainty factor (CF) value is appropriately set for the data set, error-based
pruning constructs trees that are essentially steady in size, regardless of the amount of
training data. The CF determines the upper limit of the probability of an error at a leaf.
Oates and Jensen [27] presented improvements to reduced error pruning to handle
problems with large data sets.
Macek and Lhotská [28] presented a technique for pruning decision trees based on
the complexity measure of a tree and its error rate. The technique utilizes the Gaussian
complexity averages of a decision tree to compute the error rate of classification. Frank
[29] enhanced the performance of the standard decision tree pruning algorithm by using the
statistical significance of observations. Bradford et al. [30] proposed
pruning decision trees with misclassification cost with respect to loss. Scott [31] proposed
algorithms for size-based penalties and subadditive penalties. Bradley and Lovell [32]
proposed a pruning technique that is sensitive to the relative costs of data
misclassification. They implemented two cost-sensitive pruning algorithms, by extending
pessimistic error pruning and minimum error pruning technique.
Cai et al. [33] proposed a cost-sensitive decision tree pruning method, CC4.5, to deal with
misclassification cost in the decision tree. It provides three cost-sensitive pruning methods
to handle misclassification cost in the decision tree. Mansour [34] proposed
pessimistic decision tree pruning based on tree size. A graphical frontier-based pruning
(FBP) algorithm is proposed by Huo et al., [35] which provides a full spectrum of
information while pruning the tree. The FBP algorithm starts from leaf nodes and
proceeds towards root node with local greedy approach. The authors further proposed
a combination of FBP and cross-validation. In decision tree learning, pre-pruning
handles noise and post-pruning handles the problem of overfitting. Fürnkranz [36]
proposed two algorithms that combine pre-pruning and post-pruning operations. A method for
pruning oblique decision trees was proposed by Shah and Sastry [37].
4.10. Comparison of Pruning Methods
Empirical comparative analysis is one of the important methods for comparing the
performance of the various available algorithms. Quinlan [19] examined and empirically
compared cost complexity pruning, reduced error pruning, and pessimistic pruning on
several data sets. These methods demonstrated significant improvement in terms of the
size of the tree. Cost complexity pruning tends to produce smaller trees than reduced error
pruning or pessimistic error pruning, whereas in terms of classification accuracy, reduced
error pruning is somewhat superior to cost complexity pruning.
Esposito et al. [18] presented a comparative analysis of six well-known pruning
methods. Each method was critically reviewed and its performance tested.
The paper studies the theoretical foundations, computational complexity, and
strengths and weaknesses of the pruning methods. According to this analysis, reduced-error
pruning outperforms the other methods. In addition, MEP, CVP, and EBP tend to
under-prune, whereas reduced-error pruning tends to over-prune. Similarly, Mingers [20]
analyzed five pruning methods with four different splitting criteria. The author
provided an analysis based on the size and accuracy of the tree. This work showed that
minimum-error pruning is extremely sensitive to the number of classes in the data and is
the least accurate method. Pessimistic error pruning performs badly on certain datasets and needs to
be handled with care. The critical value, cost complexity, and reduced-error pruning methods
consistently produced trees with low error rates on all the data sets. He further
clarified that there is no evidence of a relation between the splitting criterion and the pruning
method. Windeatt [17] presented an empirical comparison of pruning methods for ensemble
classifiers, showing that error-based pruning performs best for ensemble classifiers.
From the above studies we can conclude that reduced error pruning and cost complexity
pruning are the most promising pruning methods compared to the other available
methods.
5. Comparison of Algorithms [40-95][132-139][141][182]
5.1. CHAID Algorithm [38]:
CHAID (Chi-squared Automatic Interaction Detector) is a fundamental decision tree
learning algorithm. It was developed by Gordon V Kass in 1980 [38]. CHAID is easy to
interpret, easy to handle and can be used for classification and detection of interaction
between variables. CHAID is an extension of the AID (Automatic Interaction Detector)
and THAID (Theta Automatic Interaction Detector) procedures. It works on the principle of
adjusted significance testing. After detecting interactions between variables, it selects the
best attribute for splitting the node, so that each child node holds a homogeneous collection
of values of the selected attribute. The method can handle missing values. It
does not employ any pruning method.
5.2. Hunt's Algorithm [39]:
Hunt's algorithm generates a decision tree using a top-down, divide-and-conquer
approach. If the records at a node contain more than one class, an attribute test is used to split
the data into smaller subsets. Hunt's algorithm selects a locally optimal split at every stage,
according to some threshold value, in a greedy fashion.
5.3. ID3 Algorithm [39]:
ID3 stands for Iterative Dichotomiser 3. It builds the tree in a top down fashion,
starting from a set of objects and a specification of properties. At each node of the tree, a
property is tested and the results used to partition the object set. This process is
recursively done till the set in a given sub-tree is homogeneous with respect to the
classification criteria - in other words it contains objects belonging to the same category.
This then becomes a leaf node. At each node, the property to test is chosen based on
information theoretic criteria that seek to maximize information gain and minimize
entropy. In simpler terms, that property is tested which divides the candidate set in the
most homogeneous subsets. The order in which attributes are chosen determines how
complicated the tree is. ID3 uses entropy to determine the most informative attribute. ID3
does not use a pruning method, and it cannot handle numeric attributes or missing attribute
values [39].
5.4. C4.5 Algorithm [8][100][127-129]:
The C4.5 algorithm is an improvement over the ID3 algorithm. The algorithm uses gain ratio
as its splitting criterion [8]. It can accept data with categorical or numerical values. To handle
continuous values it generates a threshold and then splits between the values above the
threshold and the values equal to or below the threshold. The default pruning method is
error-based pruning. As missing attribute values are not utilized in the gain calculations, the
algorithm can easily handle missing values.
5.5. CART Algorithm [9]:
Classification and regression tree (CART) proposed by Breiman et al., constructs
binary trees [9]. The word binary implies that a node in a decision tree can only be split
into two groups. CART uses gini index as impurity measure for selecting attribute. The
attribute with the largest reduction in impurity is used for splitting the node's records. It
can accept data with categorical or numerical values and also handle missing attribute
values. It uses cost-complexity pruning. It can also generate regression trees.
5.6. C5.0 Algorithm:
The C5.0 algorithm is an extension of the C4.5 algorithm, which is itself an extension of ID3.
It is a classification algorithm that can be applied to big data sets and is better than C4.5 in
terms of speed, memory usage, and efficiency. The C5.0 model works by splitting the sample
based on the field that provides the maximum information gain. Each sample subset obtained
from a previous split is then split again, usually on a different field, and the process continues
until the sample subsets cannot be split any further. Finally, the lowest-level splits are examined,
and those sample subsets that do not contribute noticeably to the model are rejected. C5.0 easily
handles multi-valued attributes and missing attribute values.
5.7. Evolutionary Techniques:
A decision tree is called optimal if it correctly classifies the data set and has a minimal
number of nodes. Decision tree algorithms use a local greedy search with
information gain as the target function to split the data set. The decision trees generated by
these methods achieve good classification accuracy, but they often suffer from the
disadvantage of excessive complexity. Construction of an optimal decision tree is known to be
an NP-complete problem. This fact motivates the use of genetic algorithms, which provide a
global search through the space in many directions simultaneously. The genetic algorithm is used to
handle combinatorial optimization problems. Different authors have proposed a use of
methodologies that integrates genetic algorithms and decision tree learning in order to
evolve optimal decision trees. Although the methods are different the goal is to obtain
optimal decision trees [3].
5.8. GATree [96]:
A. Papagelis and D. Kalles proposed GATree, which evolves decision trees genetically.
Genetic algorithms normally use binary strings as their initial populations, but GATree uses
binary decision trees, each consisting of one decision node with two leaves. To construct such
initial trees, a random attribute is selected; if that attribute is nominal, one of its possible values
is selected at random, and in the case of a continuous attribute, an integer value from its
minimum-to-maximum range is selected at random. Thus the size of the search space is reduced.
Two arbitrary nodes from the population of trees are selected and the subtrees rooted at those
nodes are swapped to
perform crossover operation. In view of the fact that a predicted class value depends just
on leaves, the crossover operator does not affect the decision trees consistency. An
arbitrary node of a preferred tree is selected and it substitutes the node’s test-value with a
new arbitrary chosen value to perform mutation. In case if the arbitrary node is a leaf, it
substitutes the installed class with a new arbitrary chosen class. Validation is performed
after crossover and mutation to get final decision tree. The fitness function for evaluation
is percentage of correctly classified instances on the test data set by the decision tree. The
results show compact and equally accurate decision trees as compared to standard
decision tree algorithms.
5.9. GAIT Algorithm [97]:
Z. Fu proposed GAIT [97] algorithm. The algorithm constructs a set of different
decision trees from different subsets of the original data set by using a decision tree
algorithm C4.5, on small samples of the data. The genetic algorithm uses these trees as its
initial populations. The selection operation selects decision trees from the pool by a random
selection mechanism. The crossover operation exchanges subtrees between the parent
trees, whereas mutation exchanges subtrees or leaves inside the same tree. The fitness
criterion for evaluation is the classification accuracy. Validation on the fitness function is
performed after crossover and mutation to obtain final decision trees that are smaller in size
[98-99].
5.10. Fuzzy Decision Tree [101-106]:
The fuzzy decision tree combines the elevated comprehensibility of decision trees with the
elegant performance of fuzzy systems. Fuzzy sets and fuzzy logic permit the modelling of
language-related uncertainties. In addition to this, it provides a symbolic outline for
knowledge comprehensibility, the capability to represent fine knowledge details, and the
ability to deal with problems of noise and inexact data. The tree construction
procedure for a fuzzy decision tree is similar to the decision tree construction algorithm. The
splitting criteria are based on fuzzy boundaries but the procedures for inference are
dissimilar in fuzzy decision tree. The fuzzy decision trees have fuzzy decisions at each
branching point. It makes calculating the best split difficult to some extent if the attributes
are continuous valued or multivalue. Constructing smaller fuzzy decision trees is valuable
as they contain more information in internal nodes [101].
The enhancements to the fuzzy decision tree algorithms are as follows. Zeidler and
Schlosser proposed use of membership function to discretize the attributes for handling
continuous valued attributes [102]. Janikow optimized the fuzzy component of a fuzzy
decision tree using a genetic algorithm [103]. Myung Won Kim et al., proposed an
algorithm that determines an appropriate set of membership functions for each attribute
[104]. The algorithm uses histogram analysis with application of the genetic algorithm to
tune the initial set of membership functions. A fuzzy decision tree with given set of
membership functions is constructed. Fajfer and Janikow described bottom-up fuzzy
partitioning in fuzzy decision trees, a complement of the top-down technique [105]. The
proposed algorithm is useful for partitioning continuous-valued attributes into fuzzy sets.
Guetova et al. proposed incremental fuzzy decision trees. The algorithm achieves results
equivalent to those of non-incremental methods [106].
5.11. Parallel and Distributed Algorithms [107-120]
Parallelization is a well-known, conventional means of speeding up classification tasks
involving large amounts of data and complex programs. In data mining applications, the
growing size of datasets leads us to seek computationally efficient, parallel, and scalable
algorithms, with the objective of obtaining optimal accuracy in a reasonable amount of time
on parallel processors. The algorithms work in parallel, using multiple processors to construct
a single reliable model [107-120].
Kazuto et al. explained two methods for parallelizing decision tree algorithms: intra-node
and inter-node parallelization [107]. Intra-node parallelization performs parallel processing
within a single node, and inter-node parallelization performs parallel processing among
multiple nodes. Intra-node parallelism is further classified into record parallelism, attribute
parallelism, and their combination. The authors implemented and experimented with these
four types of parallelizing methods on four kinds of test data. The performance analysis from
these experiments shows that there is a relation between the characteristics of the data and the
parallelizing methods, and that the combination of the various parallelizing approaches is the
most effective parallel method.
Kufrin proposed a framework for decision tree construction on shared and distributed
memory multiprocessor [108]. The method builds parallel decision trees that overcome
the limitations of serial decision trees on large-scale training data. Narlikar proposed a parallel
structure of a decision tree-based classifier for memory-resident datasets on SMP machines. The
structure uses two types of divide-and-conquer parallelism, intra-node and inter-node
parallelization, with lightweight Pthreads [109]. Experimental verification
on large datasets signifies that the space and time performance of the tree construction
algorithm scales with the data size and number of processors.
Joshi et al., proposed ScalparC, a scalable parallel classification algorithm for mining
large datasets with decision trees using MPI on Cray T3D system [110]. This
implementation confirms scalability and efficiency of ScalparC for wide range of training
set and wide range of processors.
Hall et al., presented combining decision trees learned in parallel [111]. The proposed
algorithm builds decision trees with n disjoint data subsets of a complete dataset in
parallel and after that converts them into rules to combine into a single rule set. The
experiments on two datasets illustrate that there is enhancement of around 40% in
quantity of rules generated by decision tree.
Zaki et al., proposed parallel algorithm for building decision tree on shared memory
multiprocessors and it was verified that it achieves good speedup [112]. Srivastava et al.
presented two parallel formulations for decision tree induction as synchronous tree
induction and partitioned tree induction [113]. Authors proposed a hybrid method that
implements the high quality features of these formulations. The experimental results
illustrate the high speedups and scalability in processing.
Kazuto et al. proposed a parallel decision tree algorithm on a PC cluster [114]. Plain
parallelization of a decision tree is not efficient due to load imbalance; the proposed
algorithm improves on this with data redistribution. The parallel algorithm's performance on
benchmark data demonstrates that it provides a speed improvement of 3.5 times in the best
case and equal performance even in the worst case. Jin and Agrawal
proposed parallel decision tree construction with memory and communication efficiency
[115]. The approach achieves a very low communication volume, avoids sorting the training
records during execution, and combines shared-memory and distributed-memory
parallelization.
Jin et al., proposed use of SMP machines with a chain of techniques that includes full
replication, full locking, fixed locking, optimized full locking and cache sensitive locking
for parallelization of data mining algorithms [116]. The results show that among full
replication, optimized full locking, and cache-sensitive locking there is no clear winner;
any of these three techniques can outperform the others depending upon the machine and the
dataset, and all three perform considerably better than the remaining two techniques. In
decision tree construction, combining different techniques is found to be critical for obtaining
high performance. Li Wenlong et al. proposed a parallel decision tree algorithm based on
combination; similarly, distributed decision tree learning algorithms have been proposed [117].
Jie Ouyang et al. proposed Chi-Square test based decision tree induction in a distributed
environment [118]. Kanishka Bhaduri et al. proposed distributed decision tree induction in
peer-to-peer systems [119]. Bin Liu et al. proposed data mining in a distributed data
environment [120].
5.12. SLIQ Algorithm [121]:
The SLIQ (Supervised Learning in Quest) was developed by the Quest team at the
IBM Almaden Research Center to handle both numeric and categorical datasets [121]. A
SLIQ decision tree classifier recursively divides the training set into partitions so that
most or all of the records are of a similar class. The algorithm starts with the entire dataset
at the root node. The dataset is partitioned into subsets using a splitting criterion based on the
gini index. The attribute containing the split point that maximizes the reduction in
impurity or, equivalently, has the minimum gini index, is used to split the node. The value
of a split point depends upon how well it separates the classes. The splitting is carried out
for the dataset by choosing the attribute that will best separate the remaining samples of
the nodes apportioned into the individual classes. SLIQ eliminates the need to sort the
data at every node of the decision tree; instead, the training data are sorted only once for
every numeric attribute at the beginning of the tree-growth phase.
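The presorting idea can be illustrated with a short Python sketch (our own simplification, not the original SLIQ implementation; the class-list bookkeeping that SLIQ uses to map records to tree nodes is omitted):

    def build_attribute_lists(rows, labels, numeric_attributes):
        # One list per numeric attribute, holding (value, class label, record index),
        # sorted by value exactly once before tree growth begins.
        lists = {}
        for a in numeric_attributes:
            lists[a] = sorted(
                ((row[a], labels[i], i) for i, row in enumerate(rows)),
                key=lambda entry: entry[0])
        return lists

    # During tree growth, candidate split points for attribute a are found by a single
    # scan of lists[a], updating class histograms as each record is passed, so no
    # re-sorting is ever needed at the individual nodes.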
5.13. SPRINT Algorithm [122][126]:
Shafer et al., proposed SPRINT that provides scalability, parallelism and removes
memory restriction [122]. It achieves parallelism by its design, which allows multiple
processors to work together. In this algorithm, lists are created as in SLIQ [121].
Initially, an attribute list is created for each attribute in the data set. The entries in
these lists, called attribute records, consist of an attribute value, a class label, and the
index of the record. Initial lists for continuous attributes are sorted by attribute value. If
the complete data does not fit in memory, attribute lists are kept on disk, and memory
restrictions are thereby solved. The initial lists formed from the training set are linked with the
root of the classification tree. As the algorithm executes, the tree is grown and nodes are
split to create new children; the attribute lists of each node are partitioned and associated
with the children. The order of the records in the lists is maintained during partitioning, and
thus the partitioned lists never require resorting. The algorithm uses the gini index as splitting
criteria. The results demonstrate good scale up and speedup on large data set. The size up
performance is also good because the communication cost for exchanging split points and
count matrices does not change as the training set size is increased.
5.14. Rain Forest Algorithm [123]:
Gehrke et al. proposed a framework called RainForest that provides an approach for
implementing scalability in decision tree algorithms for large data sets [123]. RainForest
refines some of the initial steps of the decision tree construction algorithm. The algorithm
creates only one attribute list for all categorical attributes jointly, and it creates histograms
for the splitting information, thereby avoiding an additional scan. The refinement applies only
up to this step; afterwards, the remaining part of the conventional decision tree algorithm proceeds.
5.15. CLOUDS Algorithm [124]:
Alsabti et al. proposed CLOUDS, a decision tree classifier for large datasets [124]. The
proposed algorithm samples the splitting points of numeric attributes, followed by an
estimation procedure to narrow the search space of the best split. CLOUDS reduces
computational and I/O complexity as compared to benchmark classifiers while maintaining
quality in terms of accuracy and tree size.
5.16. BOAT Algorithm [125]:
Gehrke et al. proposed BOAT, an approach for optimistic decision tree construction
[125]. It uses a small subset of the data for initial decision tree construction and refines it to
construct the final tree. With only two scans of the training data it can construct several levels
of the decision tree, and thus it is claimed to be faster by a factor of three. It can handle
insertion and deletion of data in dynamic databases and is thus the first scalable incremental
decision tree approach.
5.17. SLEAS Algorithm [177-179]:
Recently Kishor Kumar Reddy et al., proposed SLEAS (Supervised Learning using
Entropy as Attribute Selection measure) to handle both numeric and categorical datasets
[177]. SLEAS decision tree classifier recursively divides the training set into partitions so
that most or all of the records are of a similar class. The algorithm starts with the entire
dataset at the root node. The dataset is partitioned by splitting the criteria into subsets
according to the entropy. The attribute containing the split point that maximizes the
reduction in impurity or, equivalently, has the maximum entropy, is used to split the node.
The value of a split point depends upon how well it separates the classes. The splitting is
carried out for the dataset by choosing the attribute that will best separate the remaining
samples of the nodes apportioned into the individual classes. SLEAS eliminates the need
to sort the data at every node of the decision tree; instead, the training data are sorted only
once for every numeric attribute at the beginning of the tree-growth phase.
5.18. SLGAS Algorithm [180]:
Recently Kishor Kumar Reddy et al., proposed SLGAS (Supervised Learning using
Gain Ratio as Attribute Selection measure) to handle both numeric and categorical
datasets [180]. SLGAS decision tree classifier recursively divides the training set into
partitions so that most or all of the records are of a similar class. The algorithm starts with
the entire dataset at the root node. The dataset is partitioned into subsets using a splitting
criterion based on the gain ratio. The attribute containing the split point that maximizes
the reduction in impurity or, equivalently, has the maximum gain ratio, is used to split the
node. The value of a split point depends upon how well it separates the classes. The
splitting is carried out for the dataset by choosing the attribute that will best separate the
remaining samples of the nodes apportioned into the individual classes. SLGAS
eliminates the need to sort the data at every node of the decision tree; instead, the training
data are sorted only once for every numeric attribute at the beginning of the tree-growth
phase.
5.19. ISLIQ Algorithm [181]:
Recently Kishor Kumar Reddy et al., proposed ISLIQ (Improved Supervised Learning
in Quest) to handle both numeric and categorical datasets [181]. ISLIQ decision tree
classifier recursively divides the training set into partitions so that most or all of the
records are of a similar class. The algorithm starts with the entire dataset at the root node.
The dataset is partitioned into subsets using a splitting criterion based on the gini index.
The major drawback of SLIQ is that split points are computed whenever there is a change in
the class label. This increases the computational complexity when the number of records is
large. To overcome this, ISLIQ computes an interval range, and split points are evaluated
based on the interval range. The attribute containing the split point that maximizes the
reduction in impurity or, equivalently, has the maximum gini index, is used to split the
node. The value of a split point depends upon how well it separates the classes. The
splitting is carried out for the dataset by choosing the attribute that will best separate the
remaining samples of the nodes apportioned into the individual classes. ISLIQ eliminates
the need to sort the data at every node of the decision tree; instead, the training data are
sorted only once for every numeric attribute at the beginning of the tree-growth phase.
5.20. KNN [1] [141]:
KNN is considered among the oldest non-parametric classification algorithms. To
classify an unknown example, the distance (using some distance measure, e.g., Euclidean)
from that example to every training example is measured. The k smallest distances
are identified, and the most represented class among these k nearest neighbours is taken as the
output class label. The value of k is normally determined using a validation set or using
cross-validation.
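A minimal Python sketch of the procedure (an illustration for this survey, using Euclidean distance and a plain majority vote):

    import math
    from collections import Counter

    def knn_predict(train_rows, train_labels, query, k=3):
        # Distance from the query to every training example (Euclidean here).
        distances = sorted(
            (math.dist(query, row), label) for row, label in zip(train_rows, train_labels))
        # Majority vote among the k nearest neighbours.
        k_nearest = [label for _, label in distances[:k]]
        return Counter(k_nearest).most_common(1)[0][0]

    # The value of k itself would normally be tuned on a validation set.
    print(knn_predict([(1, 1), (1, 2), (8, 9)], ["a", "a", "b"], (0, 1), k=3))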
5.21. Naive Bayes [1][141]:
The Naïve Bayes classifier works on a simple but intuitive concept, and in some cases it even
outperforms far more complex algorithms. It makes use of the variables contained in the data
sample by observing them individually, independent of each other. The Naïve Bayes
classifier is based on the Bayes rule of conditional probability. It makes use of all the
attributes contained in the data, and analyses them individually as though they were equally
important and independent of each other.
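As a toy illustration of this independence assumption (a categorical Naïve Bayes written for this survey, without the smoothing used in practical implementations; the weather-style records are invented):

    from collections import Counter

    def naive_bayes_predict(rows, labels, query):
        n = len(labels)
        scores = {}
        for c, count in Counter(labels).items():
            # P(c) * prod_j P(x_j | c), each factor estimated by simple counting
            score = count / n
            for j, value in enumerate(query):
                matches = sum(1 for r, y in zip(rows, labels) if y == c and r[j] == value)
                score *= matches / count
            scores[c] = score
        return max(scores, key=scores.get)

    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    print(naive_bayes_predict(rows, labels, ("rainy", "mild")))   # prints "yes"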
5.22. Support Vector Machines [1][141]:
Support Vector Machines are supervised learning methods used for classification, as
well as regression. The advantage of Support Vector Machines is that they can make use
of certain kernels in order to transform the problem, such that we can apply linear
classification techniques to non-linear data. Applying the kernel equations arranges the
data instances within the multi-dimensional space in such a way that there is a hyper-plane
separating the data instances of one kind from those of another.
The kernel equations may be any function that transforms the linearly non-separable
data in one domain into another domain where the instances become linearly separable.
Kernel equations may be linear, quadratic, Gaussian, or anything else that achieves this
particular purpose. Once we manage to divide the data into two distinct categories, our
aim is to get the best hyper-plane to separate the two types of instances. This hyper-plane
is important because it decides the target variable value for future predictions. We should
decide upon a hyper-plane that maximizes the margin between the support vectors on
either side of the plane. Support vectors are those instances that are either on the
separating planes on each side, or a little on the wrong side.
One important thing to note about Support Vector Machines is that the classification task
needs to be binary. Even if the data is not binary, Support Vector Machines
handle it as though it is, completing the analysis through a series of binary
assessments on the data.
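In practice such models are rarely coded by hand; the short example below uses scikit-learn (an assumption of this illustration, not a tool discussed by the original authors) to fit a kernel SVM on data that is not linearly separable:

    from sklearn.svm import SVC

    # Tiny XOR-like problem: not linearly separable in the original space.
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]

    # An RBF kernel implicitly maps the points to a space where a separating
    # hyper-plane exists; C controls the margin/error trade-off.
    clf = SVC(kernel="rbf", C=1.0, gamma=2.0)
    clf.fit(X, y)
    print(clf.predict([[0, 1], [1, 1]]))   # the RBF kernel separates the XOR pattern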
5.23. Neural Networks [1][141]:
The back propagation algorithm performs learning on a multilayer feed-forward neural
network. The inputs correspond to the attributes measured for each training sample. The
inputs are fed simultaneously into a layer of units making up the input layer. The weighted
outputs of these units are, in turn, fed simultaneously to a second layer of neuron-like
units, known as a hidden layer. The hidden layer's weighted outputs can be input to
another hidden layer, and so on. The number of hidden layers is arbitrary, although in
practice usually only one is used. The weighted outputs of the last hidden layer are input
to the units making up the output layer, which emits the network's prediction for the given
samples. Except for the input nodes, each node is a neuron (or processing element) with a
nonlinear activation function. The multilayer perceptron (MLP) utilizes a supervised learning
technique for training the network.
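As an illustration only, scikit-learn's multilayer perceptron, which is trained by back propagation, can be used as follows (the layer size, activation, and solver are arbitrary assumptions):

    from sklearn.neural_network import MLPClassifier

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]

    # One hidden layer of 8 units with a nonlinear activation; the weights are
    # learned by propagating the prediction error back through the network.
    net = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                        solver="lbfgs", max_iter=5000, random_state=1)
    net.fit(X, y)
    print(net.predict([[1, 0], [1, 1]]))   # typically recovers the XOR pattern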
6. Decision Tree Learning Software
Various decision tree software packages are available to researchers working in data mining.
Some of the prominent packages employed for the analysis of data and some of the
commonly used data sets for decision tree learning are discussed below.
6.1. WEKA [140]:
The WEKA (Waikato Environment for Knowledge Analysis) workbench is a set of data
mining tools developed by the machine learning group at the University of Waikato, New
Zealand. WEKA versions supporting the Windows, Linux, and Mac operating systems are
available [140]. It provides various association, classification, and clustering algorithms;
in addition, it provides pre-processors such as filters and attribute selection algorithms.
For decision tree learning, WEKA provides J48 (a Java implementation of C4.5),
SimpleCart, and Random Forest as some of its prominent tree classifiers. In J48
we can construct trees with error-based pruning (EBP), reduced-error pruning (REP), or no
pruning at all. The input data file is in .arff (attribute relation file format) format. The source
code is available to the user.
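For example, an .arff file can also be read from Python (a side note of this survey, assuming SciPy and pandas are installed and that a file named weather.arff exists locally):

    from scipy.io import arff
    import pandas as pd

    # Parse the ARFF header and data section into a structured array plus metadata.
    data, meta = arff.loadarff("weather.arff")
    frame = pd.DataFrame(data)

    print(meta.names())   # attribute names declared with @attribute
    print(frame.head())   # the rows listed after @data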
6.2. C4.5 [2]
C4.5 Version C4.5.8 developed by Quinlan supports Unix based operating systems
only [2]. The package consists of programs for decision tree generator, the rule generator,
the decision tree interpreter and the production rule interpreter. The decision tree
generator expects two input files, with extensions .names and .data, as input. The
.names file provides information about the attributes and classes. The .data file contains the
actual attribute values with their class. The source code of C4.5 is available to the user.
6.3. OC1
OC1 is an oblique decision tree classifier by Murthy [136]. Various splitting criteria
are also available with this package. The OC1 software can also be used to create both
standard axis-parallel decision trees and oblique trees. The source code is available to the
user.
6.4. GATree [96]
GATree is an evolutionary decision tree learner by Papagelis and Kalles [96]. GATree works
on the Windows operating system. The evaluation version of GATree is available on request to
the authors. Here we can set various parameters like generations, populations, crossover
and mutation probability etc. to generate decision trees.
7. Applications of Decision Trees [141]
The decision tree algorithm has applications in all walks of life [141-176] [182]. The
application areas are listed below:
Business:
Virine and Rapley proposed use of decision trees in visualization of probabilistic
business models [142]. Yang et al., proposed use of decision tree in customer relationship
management [143]. Zhang et al. proposed use of decision tree in credit scoring for credit
card users [144].
E-Commerce:
A good online catalog is essential for the success of an e-commerce web site; Sung et al.
mentioned the use of decision trees for the construction of online catalog topologies [145].
Energy Modelling:
Energy modelling for buildings is one of the important tasks in building design. Zhun
et al., proposed decision tree method for building energy demand modelling [146].
Image Processing:
Macarthur et al., proposed use of decision tree in content-based image retrieval [147].
Park et al. proposed perceptual grouping of 3-D features in aerial image using decision
tree classifier [148].
Intrusion Detection:
Sinclair et al., proposed decision trees with genetic algorithms to automatically
generate rules for an intrusion detection expert system [149]. Abbes et al., proposed
protocol analysis in intrusion detection using decision tree [150].
Medical Research:
Medical research and practice are the important areas of application for decision tree
techniques. Stasis et al. proposed decision tree algorithms for heart sound diagnosis
[151]. Lenic et al. focused on decision tree methods that can support physicians in
medical diagnosis in the case of mitral valve prolapse [152]. Kokol et al. introduced
decision trees as part of intelligent systems that help physicians [153]. Dong et al.,
proposed evaluating skin condition using a decision tree [154]. Kennedy and Adams
proposed decision tree to help out in selecting a brain computer interface device for
patients who are cognitively intact but unable to move or communicate [155]. Hui and
GaiLiping proposed analysis of complex diseases by statistical estimation of diagnosis
with genetic markers based on decision tree analysis [156].
Intelligent Vehicles:
The job of finding the lane boundaries of the road is an important task in the development
of intelligent vehicles. Gonzalez and Ozguner proposed lane detection for intelligent
vehicles by using decision tree [157].
Object Recognition:
Freixenet et al., proposed use of decision trees for color feature selection in object
recognition for outdoor scenes [158].
Reliability Engineering:
Claudio and Rocco proposed approximate reliability expressions for network reliability
using a decision tree approach [159]. Assaf and Dugan proposed method that establishes a
dynamic reliability model and generates a diagnostic model using decision trees for a
system [160].
Remote Sensing:
Remote sensing is a strong application area for pattern recognition work with decision
trees. Simard et al., proposed decision tree-based classification for land cover categories
in remote sensing [161]. Palaniappan et al., proposed binary tree with genetic algorithm
for land cover classification [162].
Space Application:
The portfolio allocation problem is pervasive across research activities, and the use of past
experience in this area with the help of decision trees has been recommended. Manvi et al.
suggested the use of decision trees in NASA space missions [163].
Speech Recognition:
Amit and Murua proposed speech recognition using randomized decision trees [164].
Bahl et al. proposed a tree-based statistical language model for natural language speech
recognition, which predicts the next word spoken, based on previous word spoken [165].
Yamagishi et al. proposed decision trees for speech synthesis [166].
Software Development:
Selby and Porter proposed decision trees for software resource analysis [167].
Khoshgoftaar et al., proposed decision trees for software quality classification [168].
Steganalysis:
Geetha et al. proposed an evolving decision tree based system for detecting audio stego
anomalies [169].
Text Processing:
Diao et al., introduced decision trees for text categorization [170].
Traffic and Road Detection:
Wu et al. proposed the use of decision trees in analysing, predicting, and guiding traffic
flow [171]. Jeong and Nedevschi proposed intelligent road region detection based on a
decision tree in highway and rural environments [172].
Video Processing:
Jaser et al. proposed automatic sports video classification with decision trees [173]. Cen and
Cosman explained decision trees for error concealment in video decoding [174].
Web Applications:
Bonchi et al. proposed decision trees for intelligent web caching [175]. Chen et al. presented
a decision tree learning approach to diagnosing failures in large Internet sites [176].
8. Conclusion
In this paper, we have presented a multi-disciplinary survey of work on constructing
decision tree and non-decision tree classifiers for a given dataset. The main goal is to provide
an overview of existing work on decision trees and non-decision tree approaches, and a taste
of their usefulness, to newcomers as well as practitioners in the emerging field of data mining
and knowledge discovery. We also hope that overviews like this can help, to a great extent, to
avoid redundant, ad hoc effort by researchers, academicians, scientists, and practitioners.
We have presented various properties of decision trees, splitting criteria, pruning
methodologies, decision tree algorithms, and non-decision tree algorithms. Further, we presented
various decision tree and non-decision tree learning software tools and real-time applications of
decision trees.
As per the observations from the survey it is found that:
1. The performance of a decision tree strongly depends on the type of attribute selection measure
used, such as entropy, gini index, gain ratio, and so on.
2. The performance of a decision tree strongly depends on the dataset, i.e., the number of attributes
and instances and the choice of training and test data.
3. The performance of decision trees increases to a great extent after pruning.
4. The time taken by a decision tree decreases when it is executed in parallel.
5. The rules generated by the decision trees are easy to understand.
References
[1] T. M. Mitchell, "Machine Learning", The McGraw-Hill Companies, Inc., (1997).
[2] J. R. Quinlan, "C4.5: Programming for Machine Learning", San Francisco, CA: Morgan Kaufman, (1993).
[3] S. K. Murthy, "Automatic construction of decision trees from data: a multi-disciplinary survey", Data Mining and Knowledge Discovery, vol. 2, no. 4, (1998), pp. 345-389.
[4] E. Alpaydin, "Introduction to Machine Learning", Prentice-Hall of India, (2005).
[5] S. Ruggieri, "Efficient C4.5", IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, (2002), pp. 438-444.
[6] M. Ben-Bassat, "Use of distance measure, information measures and error bounds on feature evaluation", In Sreerama Murthy, vol. 1, pp. 9-11.
[7] M. Last and O. Maimon, "A compact and accurate model for classification", IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 203-215.
[8] B. Hwan Jun, C. Soo Kim, H.-Y. Song and J. Kim, "A new criterion in selection and discretization of attributes for the generation of decision trees", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 12, pp. 1371-1375.
[9] L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, "Classification and Regression Trees", Wadsworth International Group, Belmont, California.
[10] S. K. Murthy, S. Kasif and S. Salzberg, "A system for induction of oblique decision trees", Journal of Artificial Intelligence Research, vol. 2, pp. 1-33.
[11] R. S. Mantaras, "A distance based attribute selection measure for decision tree induction", Technical Report, Machine Learning, vol. 6, pp. 81-92.
[12] B. Chandra, R. Kothari and P. Paul, "A new node splitting measure for decision tree construction", Pattern Recognition, Elsevier, vol. 43, pp. 2725-2731.
[13] E. Rounds, "A combined nonparametric approach to feature selection and binary decision tree design", Pattern Recognition, vol. 12, pp. 313-317.
[14] P. E. Utgoff and J. A. Clouse, "A Kolmogorov-Smirnoff metric for decision tree induction", Tech. Rep. No. 96-3, Dept. Comp. Science, University of Massachusetts, Amherst.
[15] J. K. Martin, "An exact probability metric for decision tree splitting and stopping", Machine Learning, vol. 28, no. 2-3, pp. 257-29.
[16] W. L. Buntine and T. Niblett, "A further comparison of splitting rules for decision-tree induction", Machine Learning, vol. 8, pp. 75-85.
[17] T. Windeatt and G. Ardeshir, "An empirical comparison of pruning methods for ensemble classifiers", Proc. of the 4th International Conference on Advances in Intelligent Data Analysis, Cascais, Portugal, (2001), pp. 208-217.
[18] F. Esposito, D. Malerba and G. Semeraro, "A comparative analysis of methods for pruning decision trees", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, (1997), pp. 476-491.
[19] J. R. Quinlan, "Simplifying decision trees", International Journal of Man Machine Studies, vol. 27, pp. 221-234.
[20] J. Mingers, "An empirical comparison of pruning methods for decision tree induction", Machine Learning, vol. 3, (1989), pp. 227-243.
[21] M. Mehta, J. Rissanen and R. Agrawal, "MDL-based decision tree pruning", Proc. of the 1st International Conference on Knowledge Discovery in Databases and Data Mining, Montreal, Canada, (1995), pp. 216-221.
[22] J. R. Quinlan and R. L. Rivest, "Inferring decision trees using the minimum description length principle", Inform. Comput., vol. 80, (1989), pp. 227-248.
[23] I. Bratko and M. Bohanec, “Trading accuracy for simplicity in decision trees”, Machine Learning, vol. 15,
(1994), pp. 223-250.
[24] H. Allamullim, “An efficient algorithm for optimal pruning of decision trees”, Artificial Intelligence,
vol. 83, no. 2, pp. 347-362.
[25] M. Kaariainen, “Learning small trees and graphs that generalize”, A Report, University of Helsinki,
Finland Series of Publications Helsinki, (2004).
[26] L. Hall, R. Collins, K. W. Bowyer and R. Banfield, “Error-Based pruning of decision trees grown on
very large data sets can work!”, Proc. of the 14th IEEE International Conference on Tools with Artificial
Intelligence, (2002), pp. 233-238.
[27] T. Oates and D. Jensen, “Toward a theoretical understanding of why and when decision tree pruning
algorithms fail”, Proc. of the Sixteenth National Conference on Artificial Intelligence, (1999), pp. 372-378.
[28] J. Macek and L. Lhotsk, “Gaussian complexities based decision tree pruning”, Cybernetics and Systems
2004, Austrian Society for Cybernetics Studies Vienna, (2004), pp. 713-718.
[29] E. Frank, “Pruning Decision Trees and List”, A Doctoral Thesis Submitted to University of Waikato,
(2000).
[30] J. P. Bradford, C. Kunz, R. Kohavi, C. Brunk and C. E. Brodley, “Pruning decision trees with
misclassification costs”, European Conference on Machine Learning, (1998), pp. 131-136.
[31] C. Scott, “Tree pruning with subadditive penalties”, IEEE Transactions On Signal Processing, vol. 53,
no. 12, pp. 4518-4525.
[32] A. P. Bradley and B. C. Lovell, “Cost-sensitive decision tree pruning: use of the ROC curve”, In Eighth
Australian Joint Conference on Artificial Intelligence, Canberra, Australia, (1995) November, pp. 1-8.
[33] J. Cai, J. Durkin and Q. Cai, “CC4.5: cost-sensitive decision tree pruning”, Proc. of Data Mining
Conference, Skiathos, Greece, (2005), pp. 239-245.
[34] Y. Mansour, “Pessimistic decision tree pruning based on tree size”, Proceedings of 14th International
Conference on Machine Learning, (1997), pp. 195-201.
[35] X. Huo, S. Bum Kim, K.-L. Tsui and S. Wang, “FBP: A frontier-based tree-pruning algorithm”,
INFORMS Journal on Computing, vol. 18, no. 4, (2006), pp. 494-505.
[36] J. Fürnkranz, “Pruning algorithms for rule learning”, Machine Learning, vol. 27, (1997), pp. 139-172.
[37] S. Shah and P. S. Sastry, “New algorithms for learning and pruning oblique decision trees”, IEEE
Transactions on Systems, Man, And Cybernetics— Part C: Applications and Reviews, vol. 29, no. 4, pp.
494-505.
[38] G. V. Kass, “An exploratory technique for investigating large quantities of categorical data”, Applied
Statistics, vol. 29, no. 2, pp. 119-127.
[39] J. R. Quinlan, “Induction of decision trees”, Machine Learning, vol. 1-1, pp. 81-106.
[40] R. Kohavi, “Feature subset selection as search with probabilistic estimates”, Proc. the AAAI Fall
Symposium on Relevance, (1994), pp. 122-126.
[41] R. Caruana and D. Freitag, “Greedy attribute selection”, Proc. of the 11th International Conference on
Machine Learning, (1994), pp. 28-36.
[42] M. Last, A. Kandel, O. Maimon and E. Eberbach, “Anytime algorithm for feature selection”, Proc. of
Second International Conference on Rough Sets and Current Trends in Computing, (2000), pp. 532-539.
[43] S. Wu and P. A. Flach, “Feature selection with labelled and unlabelled data”, Marko Bohanec, Dunja
Mladenic, and Nada Lavrac, editors, ECML/PKDD'02 workshop on Integrating Aspects of Data Mining,
Decision Support and Meta-Learning, (2002), pp. 156-167.
[44] H. Yuan, S.-S. Tseng, W. Gangshan and Z. Fuyan, “A two-phase feature selection method using both
filter and wrapper”, Proc. of IEEE International Conference on Systems, Man and Cybernetics, (1999),
pp. 132-136.
[45] K. Grabczewski and N. Jankowski, “Feature selection with decision tree criterion”, Proc. of Fifth
International Conference on Hybrid Intelligent Systems, (2005) November 6-9, pp. 212-217.
[46] J. Bins and B. A. Draper, “Feature selection from huge feature sets”, Proceedings of International
Conference on Computer Vision, Vancouver, (2001), pp. 159-165.
[47] C. Guerra-Salcedo, S. Chen, D. Whitley and S. Smith, “Fast and accurate feature selection using hybrid
genetic strategies”, Proceedings of the Congress on Evolutionary Computation, (1999), pp. 177-184.
[48] J.-A. Landry, L. Da Costa and T. Bernier, “Discriminant feature selection by genetic programming:
towards a domain independent multi-class object detection system”, Journal of Systemics, Cybernetics
and Informatics, vol. 1, no. 3, (2006), pp. 76-81.
[49] J. Huang Bala, H. Vafaie, K. DeJong and H. Wechsler, “Hybrid learning using genetic algorithms and
decision trees for pattern classification”, Proc. of the IJCAI conference, Montreal, (1995), pp. 719-724.
[50] G. Legrand and N. Nicoloyannis, “Feature selection and preferences aggregation”, Machine Learning
and Data Mining in Pattern Recognition, Springer Heidelberg, (2005), pp. 203-217.
[51] M. A. Hall and L. A. Smith, “Feature subset selection: a correlation based filter approach”, Proceedings
of International Conference on Neural Information Processing and Intelligent Information Systems
1997, (1997), pp. 855-858.
[52] W. Duch, J. Biesiada, T. Winiarski, K. Grudzinski and K. Gr. Abczewski, “Feature selection based on
information theory filters and feature elimination wrapper methods”, Proceedings of the International
Conference on Neural Networks and Soft Computing, Advances in Soft Computing, (2002), pp. 173-176.
[53] M. A. Hall, “Correlation-based feature selection for discrete and numeric class machine learning”, Proc.
of International Conference on Machine Learning, Stanford University, CA. Morgan Kaufmann
Publishers, (2000), pp. 359-366.
[54] H. Yuan, S.-S. Tseng, W. Gangshan and Z. Fuyan, “A two-phase feature selection method using both
filter and wrapper”, Proc. of IEEE International Conference on Systems, Man and Cybernetics, pp. 132-136.
[55] P. Luca Lanzi, “Fast feature selection with genetic algorithms: a filter approach”, Proc. of 1997 IEEE
International Conference on Evolutionary Computation, pp. 537-540.
[56] G. H. John, “Robust decision trees: Removing outliers from databases”, In Proc. of the First ICKDDM,
(1995), pp. 174-179.
[57] A. Arning, R. Agrawal and P. Raghavan, “A linear method for deviation detection in large databases”,
KDDM 1996, pp. 164-169.
[58] I. Guyon, N. Matic and V. Vapnik, “Discovering informative patterns and data cleaning”, Advances in
knowledge discovery and data mining, AAAI, (1996), pp. 181-203.
[59] G. D. Gamberger and N. Lavrac, “Conditions for Occam's Razor applicability and noise elimination”,
Marteen van Someren and Gerhard Widmer, editors, Proceedings of the 9th European Conference on
Machine Learning, Springer, (1997), pp. 108-123.
[60] E. M. Knorr and R. T. Ng. “A unified notion of outliers: properties and computation”, Proceedings of
3rd International Conference on Knowledge Discovery and Data Mining, (1997).
[61] E. M. Knorr and R. T. Ng, “Algorithms for mining distance-based outliers in large datasets”,
Proceedings 24th VLDB, (1998), pp. 392-403, 24-27.
[62] D. Tax and R. Duin, “Outlier detection using classifier instability”, Proceedings of the workshop
Statistical Pattern Recognition, Sydney, (1998).
[63] E. Brodley and M. A. Fried, “Identifying mislabeled training data”, Journal of Artificial Intelligence
Research, vol. 11, (1999), pp. 131-167.
[64] S. Weisberg, “Applied Linear Regression”, John Wiley and Sons, (1985).
[65] D. Gamberger, N. Lavrac and C. Groselj, “Experiments with noise filtering in a medical domain”, In
Proc. 16th ICML, Morgan Kaufman, San Francisco, CA, (1999), pp. 143-151.
[66] S. Schwarm and S. Wolfman, “Cleaning data with Bayesian methods”, Final project report for
University of Washington Computer Science and Engineering CSE574, (2000) March 16.
[67] S. Ramaswamy, R. Rastogi and K. Shim. “Efficient algorithms for mining outliers from large data sets”,
ACM SIGMOD, vol. 29, no. 2, (2000) June, pp. 427-438.
[68] V. Raman and J. M. Hellerstein, “An interactive framework for data transformation and cleaning”,
Technical report University of California Berkeley, California, (2000) September.
[69] J. Kubica and A. Moore, “Probabilistic noise identification and data cleaning”, Third IEEE International
Conference on Data Mining, (2003) November 19-22.
[70] V. Verbaeten and A. V. Assche, “Ensemble methods for noise elimination in classification problems. In
Multiple Classifier Systems”, Springer, (2003).
[71] J. A. Loureiro, L. Torgo and C. Soares, “Outlier detection using clustering methods: a data cleaning
application”, Proceedings of KDNet Symposium on Knowledge-based Systems for the Public Sector,
Bonn, Germany, (2004).
[72] H. Xiong, G. Pande, M. Stein and V. Kumar, “Enhancing Data analysis with noise removal”, IEEE
Transaction on knowledge and Data Engineering, vol. 18, no. 3, (2006) March, pp. 304-319.
[73] S. Kim, N. Wook Cho, B. Kang and S. Ho Kang, “Fast outlier detection for very large log data”, Expert
Systems with Applications, Elsevier, vol. 38, (2011), pp. 9587-9596.
[74] S. Hido, Y. Tsuboi, H. Kashima, M. Sugiyama and T. Kanamori, “Statistical outlier detection using
direct density ratio estimation”, Knowledge and Information Systems, vol. 26, no. 2, (2011), pp. 309-336.
[75] G. John and P. Langley, “Static Versus Dynamic Sampling for Data Mining”, In Proceedings of the
Second International Conference on Knowledge Discovery and Data Mining 1996, AAAI Press, pp.
367-370.
[76] F. Provost, D. Jensen and T. Oates, “Efficient Progressive sampling”, Proceedings of the Fifth
International Conference on Knowledge Discovery and Data Mining, ACM Press, (1999), pp. 23-32.
[77] V. Patil and R. S. Bichkar, “A hybrid evolutionary approach to construct optimal decision trees with
large data sets”, In Proc. IEEE ICIT06 Mumbai, (2006), December 15-17, pp. 429-433.
[78] R. J. Little and D. B. Rubin, “Statistical Analysis with Missing Data”, John Wiley and Sons, New York,
(1987).
[79] G. Batista and M. C. Monard, “An analysis of four missing data treatment methods for supervised
learning”, Applied Artificial Intelligence, vol. 17, (2003), pp. 519-533.
[80] J. H. Friedman, J. Louis Bentley and R. A. Finkel, “An algorithm for finding best matches in logarithmic
expected time”, ACM Transactions on Mathematical Software, vol. 3, (1977), pp. 209-226.
[81] J. R. Quinlan, “Unknown attribute values in induction”, Journal of Machine Learning, vol. 1, (1986), pp.
81-106.
[82] R. J. Kuligowski and A. P. Barros, “Using artificial neural Networks to estimate missing rainfall data”,
Journal AWRA, vol. 34, no. 6, 14 (1998).
[83] L. L. Brockmeier, J. D. Kromrey and C. V. Hines, “Systematically Missing Data and Multiple
Regression Analysis: An Empirical Comparison of Deletion and Imputation Techniques”, Multiple
Linear Regression Viewpoints, vol. 25, (1998), pp. 20-39.
[84] A. J. Abebe, D. P. Solomatine and R. G. W. Venneker, “Application of adaptive fuzzy rule-based
models for reconstruction of missing precipitation events”, Hydrological Sciences Journal, vol. 45, no.
3, pp. 425-436.
[85] S. Sinharay, H. S. Stern and D. Russell, “The use of multiple imputations for the analysis of missing
data”, Psychological Methods, vol. 4, pp. 317-329.
[86] K. Khalil, M. Panu and W. C. Lennox, “Groups and neural networks based stream flow data infilling
procedures”, Journal of Hydrology, vol. 241, (2001), pp. 153-176.
[87] B. Bhattacharya, D. L. Shrestha and D. P. Solomatine, “Neural networks in reconstructing missing wave
data in sedimentation modelling”, In the Proceedings of the 30th IAHR Congress, Thessaloniki, Greece,
(2003) August 24-29.
[88] F. Fessant and S. Midenet, “Self-organizing map for data imputation and correction in surveys”, Neural
Comput. Appl., vol. 10, (2002), pp. 300-310.
[89] C. M. Musil, C. B. Warner, P. K. Yobas and S. L. Jones, “A comparison of imputation techniques for
handling missing data”, Weston Journal of Nursing Research, vol. 24, no. 7, (2002), pp. 815-829.
[90] H. Junninen, H. Niska, K. Tuppurainen, J. Ruuskanen and M. Kolehmainen, “Methods for imputation of
missing values in air quality data sets”, Atmos. Environ., vol. 38, (2004), pp. 2895-2907.
[91] M. Subasi, E. Subasi and P. L. Hammer, “New Imputation Method for Incomplete Binary Data”, Rutcor
Research Report, (2009) August.
[92] A. Mohammad Kalteh and P. Hjorth, “Imputation of Missing values in precipitation-runoff process
database”, Journal of Hydrology research, vol. 40, no. 4, pp. 420-432.
[93] R. M. Daniel, M. G. Kenward, “A method for increasing the robustness of multiple imputation”,
Computational Statistics and Data Analysis, doi 10.1016/j.csda.2011.10.006, (2011) Elsevier.
[94] P. and B., “Multiple Imputation of Missing Data with Genetic Algorithms based Techniques”,
International Journal on Computer Applications Special Issue on Evolutionary Computation in
Optimisation Techniques, vol. 2, (2010), pp. 74-78.
[95] G. Mitchell Weiss, “The Effect of Small Disjuncts and Class Distribution on Decision Tree Learning”,
A Doctoral Thesis Submitted to the Graduate School, New Brunswick Rutgers, The State University of
New Jersey, (2003).
[96] P. and D. Kalles, “GATree: Genetically evolved decision trees”, Proc. 12th International Conference on
Tools with Artificial Intelligence, (2000), pp. 203-206.
[97] Z. Fu and F. Mae, “A computational study of using genetic algorithms to develop intelligent decision
trees”, Proc. of the 2001 IEEE Congress On Evolutionary Computation, vol. 2, (2001), pp. 1382-1387.
[98] N. and E. Tazaki, “Genetic programming combined with association rule algorithm for decision tree
construction”, Proc. of fourth International Conference on Knowledge-Based Intelligent Engineering
Systems and Allied Technologies, vol. 2, (2000), pp. 746-749.
[99] Y. Kornienko and A. Borisov, “Investigation of a hybrid algorithm for decision tree generation”, Proc.
of the Second IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing
Systems: Technology and Applications, (2003), pp. 63-68.
[100] Z.-H. Zhou and Y. Jiang, “NeC4.5: Neural ensemble based C4.5”, IEEE Transactions On Knowledge
And Data Engineering, vol. 16, no. 6, (2004), pp. 770-773.
[101] C. Z. Janikow, “Fuzzy decision trees: Issues and methods”, IEEE Transactions on Systems, Man, and
Cybernetics, vol. 28, no. 1, pp. 1-14.
[102] Zeidler and M. Schlosser, “Continuous-valued attributes in fuzzy decision trees”, Proc. of the
International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, (1996), pp. 395-400.
[103] C. Z. Janikow, “A genetic algorithm method for optimizing the fuzzy component of a fuzzy decision
tree”, In Genetic Algorithms for Pattern Recognition, editors S. Pal and P. Wang, CRC Press, pp. 253-282.
[104] M. Won Kim, J. Geun Lee and C. Min, “Efficient fuzzy rule generation based on fuzzy decision tree for
data mining”, Proceeding of IEEE International Fuzzy Systems Conference Seoul, Korea, (1999), 22-25.
[105] M. Fajfer and C. Z. Janikow, “Bottom-up fuzzy partitioning in fuzzy decision trees”, Proceedings of
19th International Conference of the North American Fuzzy Information Processing Society, (2000), pp.
326-330.
[106] M. Guetova, S. Holldobler and H.-P. Storr, “Incremental fuzzy decision trees”, International Conference
on Fuzzy Sets and Soft Computing in Economics and Finance, St. Petersburg, Russia, (2004).
[107] K. Kubota, H. Sakai, A. Nakase and S. Oyanagi, “Parallelization of decision tree algorithm and its
performance evaluation”, Proceedings of The Fourth International Conference on High Performance
Computing in the Asia-Pacific Region, vol. 2, (2000), pp. 574 -579.
[108] R. Kufrin, “Decision trees on parallel processors”, Machine Intelligence and Pattern Recognition,
Elsevier, vol. 20, (1997), pp. 279-306.
[109] G. J. Narlikar, “A parallel, multithreaded decision tree builder”, A Technical Report, School of
Computer Science, Carnegie Mellon University, (1998).
[110] M. V. Joshi, G. Karypis and V. Kumar, “Scalparc: A new scalable and efficient parallel classification
algorithm for mining large datasets”, Proceedings of the International Parallel Processing Symposium,
(1998), pp. 573-579.
[111] L. O. Hall, N. Chawla and K. W. Bowyer, “Combining decision trees learned in parallel”, Distributed
Data Mining Workshop at International Conference of Knowledge Discovery and Data Mining, (1998),
pp. 77-83.
[112] M. J. Zaki, C.T. Ho and R. Agrawal, “Parallel classification for data mining on shared-memory
multiprocessors”, IEEE International Conference on Data Engineering, (1999), pp. 198-205.
[113] A. Srivastava, E. H. Han, V. Kumar and V. Singh, “Parallel formulations of decision-tree classification
algorithms”, Data Mining and Knowledge Discovery: An International Journal, vol. 3, no. 3, (1999), pp.
237-261.
[114] K. Kubota, A. Nakase and S. Oyanagi, “Implementation and performance evaluation of dynamic
scheduling for parallel decision tree generation”, Proceedings of the 15th International Parallel and
Distributed Processing Symposium, (2001), pp. 1579-1588.
[115] R. Jin and G. Agrawal, “Communication and memory efficient parallel decision tree construction”,
Proceedings of Third SIAM Conference on Data Mining, (2003).
[116] R. Jin, G. Yang and G. Agrawal, “Shared memory parallelization of data mining algorithms: techniques,
programming interface, and performance”, IEEE Transactions On Knowledge And Data Engineering,
vol. 16, no. 10, (2004), pp. 71-89.
[117] L. Wenlong and X. Changzheng, “Parallel Decision Tree Algorithm Based on Combination”, IEEE
International Forum on Information Technology and Applications (IFITA) Kunming, (2010) July 16-18,
pp. 99-101.
[118] J. Ouyang, N. Patel and I. K. Sethi, “Chi-Square Test Based Decision Trees Induction in Distributed
Environment”, IEEE International Conference on Data Mining Workshops, ICDMW '08, (2008)
December 15-19, pp. 477-485.
[119] K. Bhaduri, R. Wolff, C. Giannella and H. Kargupta, “Distributed Decision-Tree Induction in Peer-to-Peer Systems”, Statistical Analysis and Data Mining, John Wiley and Sons, vol. 1, no. 2, (2008) June.
[120] B. Liu, S.-G. Cao, X.-L. Jim and Z.-H. Zhi, “Data mining in distributed data environment”, International
Conference on Machine Learning and Cybernetics (ICMLC), vol. 1, (2010) July 11-14, pp. 421-426.
[121] M. Mehta, R. Agrawal and J. Rissanen, “SLIQ: A fast scalable classifier for data mining”, Proceedings
of the Fifth international Conference on Extending Database Technology, Avignon, France, (1996), pp.
18-32.
[122] J. Shafer, R. Agrawal and M. Mehta, “SPRINT: A scalable parallel classifier for data mining”, Proceedings
of the 22nd VLDB Conference, (1996), pp. 544-555.
[123] J. Gehrke, R. Ramakrishnan and V. Ganti, “Rainforest-A framework for fast decision tree construction
of large datasets”, Proceedings of Conference on Very Large Databases (VLDB), (1998), pp. 416-427.
[124] K. Alsabti, S. Ranka and V. Singh, “CLOUDS: a decision tree classifier for large datasets”, Proceedings
of Conference on Knowledge Discovery and Data Mining (KDD-98), (1998), pp. 2-8.
[125] J. Gehrke, V. Ganti, R. Ramakrishnan and W. Loh, “BOAT-optimistic decision tree construction”, Proc.
of Conference SIGMOD, (1999), pp. 169-180.
[126] P. Chan and S. J. Stolfo, “Toward parallel and distributed learning by meta-learning”, In Working Notes
AAAI Work. Knowledge Discovery in Databases, pp. 227-240.
[127] L. Todorovski and Dzeroski, “Combining multiple models with meta decision trees”, Proc. of the Fourth
European Conference on Principles of Data Mining and Knowledge Discovery, (2000), pp. 54-64.
[128] L. Todorovski and Dzeroski, “Combining classifiers with meta decision trees”, Machine Learning, vol.
50, no. 3, (2003), pp. 223-249.
[129] B. Zenko, L. Todorovski and Dzeroski, “A comparison of stacking with meta decision trees to bagging,
boosting, and stacking with other methods”, Proceedings of the 2001 IEEE International Conference on
Data Mining, (2001), pp. 669-670.
[130] A. L. Prodromidis, P. K. Chan and S. J. Stolfo, “Meta-learning in distributed data mining systems:
Issues and approaches”, editors Hillol Kargupta and Philip Chan, Book on Advances of Distributed Data
Mining AAAI press, (2000), pp. 81-113.
[131] S. Stolfo, W. Fan, W. Lee, A. Prodromidis and P. Chan, “Credit Card Fraud Detection Using
Metalearning: Issues and Initial Results”, In working notes of AAAI Workshop on AI Approaches to
Fraud Detection and Risk Management, (1997).
[132] S. Rasoul Safavian and D. Landgrebe, “A survey of decision tree classifier methodology”, IEEE
Transaction on Systems, Man, and Cybernetics, vol. 21, no. 3, (1991), pp. 660-674.
[133] L. Rokach and O. Maimon, “Top-down induction of decision trees classifiers-a survey”, IEEE
Transactions on Systems, Man, And Cybernetics-Part C: Applications And Reviews, vol. 35, no. 4,
(2005), pp. 476-487.
[134] P. E. Utgoff, “Incremental induction of decision trees”, Machine Learning, vol. 4, (1989), pp. 161-186.
[135] R. Reynolds and Hasan Al-Shehri, “The use of cultural algorithms with evolutionary programming to
guide decision tree induction in large databases”, Proceedings of the 1998 IEEE International conference
on Evolutionary Computation, at IEEE World Congress on Computational Intelligence at Anchorage,
AK, USA, (1998), pp. 441-546.
[136] S. K. Murthy, S. Kasif, S. Salzberg and R. Beigel, “OC1: Randomized induction of oblique decision
trees”, Proceeding Eleventh National Conference on Artificial Intelligence, Washington, DC, 11-15th,
AAAI Press, (1993) July, pp. 322-327.
[137] R. Setiono and H. Liu, “A connectionist approach to generating oblique decision trees”, IEEE
Transactions On Systems, Man, And Cybernetics, vol. 29, no. 3, (1999).
[138] V. S. Iyengar, “HOT: Heuristics for oblique trees”, Proceedings of Eleventh International Conference on
Tools with Artificial Intelligence, IEEE Press, (1999), pp. 91-98.
[139] E. Cantu-Paz and C. Kamath, “Inducing oblique decision trees with evolutionary algorithms”, IEEE
Transactions on Evolutionary Computation, vol. 7, no. 1, (2003), pp. 5-68.
[140] I. H. Witten and E. Frank, “Data Mining Practical Machine Learning Tools and Techniques”, Morgan
Kaufmann, (2005).
[141] J. Han, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, (2001).
[142] L. Virine and L. Rapley, “Visualization of probabilistic business models”, Proceedings of the 2003
Winter Simulation Conference, vol. 2, (2003), pp. 1779-1786.
[143] Q. Yang, J. Yin, C. X. Ling and T. Chen, “Post processing decision trees to extract actionable
knowledge”, Proceedings of the Third IEEE International Conference on Data Mining, Florida, USA,
(2003).
[144] D. Zhang, X. Zhou, S. C. H. Leung and J. Zheng, “Vertical bagging decision trees model for credit
scoring”, Expert Systems with Applications, Elsevier Publishers, vol. 37, (2010), pp. 7838-7843.
[145] W.-K. Sung, D. Yang, S.-M. Yiu, D. W. Cheung, W.-S. Ho and T.-W. Lam, “Automatic construction of
online catalog topologies”, IEEE Transactions on Systems, Man, And Cybernetics—Part C:
Applications and Reviews, vol. 32, no. 4, (2002).
[146] Z. Yu, F. Haghighat, B. C. M. Fung and H. Yoshino, “A decision tree method for building energy
demand modeling”, International Journal of Energy and Buildings, vol. 42, pp. 1637-1646.
[147] S. D. MacArthur, C. E. Brodley, A. C. Kak and L. S. Broderick, “Interactive content based image
retrieval using relevance feedback”, Computer Vision and Image Understanding, (2002), pp. 55-75.
[148] K. Park, K. Mu Lee and S. Uk Lee, “Perceptual grouping of 3D features in aerial image using decision
tree classifier”, Proceedings of 1999 International Conference on Image Processing, vol. 1, (1999), pp.
31-35.
[149] C. Sinclair, L. Pierce and S. Matzner, “An application of machine learning to network intrusion
detection”, Proceedings of 15th Annual Computer Security Applications Conference, (1999), pp. 371-377.
[150] T. Abbes, A. Bouhoula and M. Rusinowitch, “Protocol analysis in intrusion detection using decision
tree”, Proceedings of the International Conference on Information Technology: Coding and Computing,
IEEE, (2004), pp. 404-408.
[151] C. Stasis, E. N. Loukis, S. A. Pavlopoulos and D. Koutsouris, “Using decision tree algorithms as a basis
for a heart sound diagnosis decision support system”, Proceedings of the 4th Annual IEEE Conference
on Information Technology Applications in Biomedicine, UK, (2003), pp. 354-357.
[152] M. Lenic, P. Povalej, M. Zorman, V. Podgorelec, P. Kokol and L. Lhotska, “Multimethod machine
learning approach for medical diagnosing”, Proceedings of the 4th Annual IEEE Conf on Information
Technology Applications in Biomedicine, UK, (2003), pp. 195-198.
[153] P. Kokol, M. Zorman, V. Podgorelec and S. Hleb Babie, “Engineering for intelligent systems”,
Proceedings of 1999 IEEE International Conference on Systems, Man, and Cybernetics, vol. 6, pp. 306-311.
[154] M. Dong, R. Kothari, M. Visschert and S. B. Hoatht, “Evaluating skin condition using a new decision
tree induction algorithm”, Proceedings International Joint Conference on Neural Networks, vol. 4,
(2001), pp. 2456-2460.
[155] P. R. Kennedy and K. D. Adams, “A decision tree for brain-computer interface devices”, IEEE
Transactions On Neural Systems And Rehabilitation Engineering, vol. 11, no. 2, (2003), pp. 148-150.
[156] L. Hui and G. Liping, “Statistical estimation of diagnosis with genetic markers based on decision tree
analysis of complex disease”, International Journal of Computers in Biology and Medicine, vol. 39,
(2009), pp. 989-992.
[157] J. Pablo Gonzalez and U. Ozguner, “Lane detection using histogram-based segmentation and decision
trees”, Proceedings of IEEE Intelligent Transportation Systems, (2000), pp. 346-351.
[158] J. Freixenet, X. Lladb, J. Marti and X. Cufi, “Use of decision trees in color feature selection. application
to object recognition in outdoor scenes”, Proceedings of International Conference on Image Processing,
vol. 3, (2000), pp. 496-499.
[159] C. M. Rocco S., “Approximate reliability expressions using a decision tree approach”, Proceedings of
Annual Symposium - RAMS Reliability and Maintainability, (2004), pp. 116-121.
[160] T. Assaf and J. Bechta Dugan, “Diagnostic expert systems from dynamic fault trees”, Annual
Symposium-RAMS Reliability and Maintainability, (2004), pp. 444-450.
[161] M. Simard, S. S. Saatchi and G. De Grandi, “The use of decision tree and multiscale texture for
classification of jers-1 sar data over tropical forest”, IEEE Transactions On Geoscience And Remote
Sensing, vol. 38, no. 5, (2000).
[162] F. Zhu Palaniappan, X. Zhuang and Y. Zhao Blanchard, “Enhanced binary tree genetic algorithm for
automatic land cover classification”, Proceedings of International Geoscience and Remote Sensing
Symposium, (2000), pp. 688-692.
[163] R. Manavi, C. Weisbin, W. Zimmerman and G. Rodriguez, “Technology portfolio options for NASA
missions using decision trees”, Proceedings of IEEE Aerospace Conference, Big Sky, Montana, (2002),
pp. 115-126.
[164] Y. Amit and A. Murua, “Speech recognition using randomized relational decision trees”, IEEE
Transactions On Speech And Audio Processing, vol. 9, no. 4, (2001), pp. 333-341.
[165] L. R. Bahl, P. F. Brown, P. V. De Souza and R. L. Mercer, “A tree-based statistical language model for
natural language speech recognition”, IEEE Transactions On Acoustics, Speech, And Signal Processing,
vol. 37, no. 7, (1989), pp. 1001-1008.
[166] J. Yamagishi, M. Tachibana, T. Masuko and T. Kobayashi, “Speaking style adaptation using context
clustering decision tree for hmm-based speech synthesis”, Proceedings of IEEE International
Conference on Acoustics, Speech and Signal Processing, vol. 1, (1989), pp. 5-8.
[167] R. W. Selby and A. A. Porter, “Learning from examples: generation and evaluation of decision trees for
software resource analysis”, IEEE Transactions on Software Engineering, vol. 14, (1988), pp. 1743-1757.
[168] T. M. Khoshgoftaar, N. Seliya and Y. Liu, “Genetic programming-based decision trees for software
quality classification”, Proceedings of 15th IEEE International Conference on Tools with Artificial
Intelligence, (2003), pp. 374-383.
[169] S. Geetha, N. N. Ishwarya and N. Kamaraj, “Evolving decision tree rule based system for audio stego
anomalies detection based on Hausdorff distance statistics”, Information Sciences Elsevier Publisher,
(2010), pp. 2540-2559.
[170] L. Diao, K. Hu, Y. Lu and C. Shi, “Boosting simple decision trees with Bayesian learning for text
categorization”, IEEE Robotics and Automation Society, Proc. of the 4th World Congress on Intelligent
Control and Automation, Shanghai, China, pp. 321-325.
[171] B. Wu, W.-J. Zhou and W.-D. Zhang, “The applications of data mining technologies in dynamic traffic
prediction”, IEEE Intelligent Transportation Systems, vol. 1, (2003), pp. 396-401.
[172] P. Jeong and S. Nedevschi, “Intelligent road detection based on local averaging classifier in real-time
environments”, Proc. of the 12th International Conference on Image Analysis and Processing.
[173] E. Jaser, J. Kittler and W. Christmas, “Hierarchical decision making scheme for sports video
categorization with temporal postprocessing”, Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, (2004), pp. 908-913.
[174] Song Cen and Pamela C. Cosman, “Decision trees for error concealment in video decoding”, IEEE
Transactions on Multimedia, vol. 5, no. 1, (2003), pp. 1-7.
[175] F. Bonchi, F. Giannotti, G. Manco, C. Renso, M. Nanni, D. Pedreschi and S. Ruggieri, “Data mining for
intelligent web caching”, Proceedings of International Conference on Information Technology: Coding
and computing, (2001).
[176] M. Chen, A. Zheng, J. Lloyd, M. Jordan and E. Brewer, “Failure diagnosis using decision trees”,
Proceedings of the International Conference on Autonomic Computing.
[177] K. Kumar Reddy C, V. Babu B and C. H. Rupa, “SLEAS: Supervised Learning using Entropy as
Attribute Selection Measure”, International Journal of Engineering and Technology, (2014), pp. 2053-2060.
[178] K. Kumar Reddy C, C. H. Rupa and B. Vijaya Babu, “A Pragmatic Methodology to Predict the Presence
of Snow/No-Snow using Supervised Learning Methodologies”, International Journal of Applied
Engineering Research, (2014), pp. 11381-11394.
[179] K. Kumar Reddy C, C. H. Rupa and V. Babu, “SPM: A Fast and Scalable Model for Predicting
Snow/No-Snow”, World Applied Sciences Journal, (2014), pp. 1561-1570.
[180] K. Kumar Reddy C, C. H. Rupa and V. Babu, “SLGAS: Supervised Learning using Gain Ratio as
Attribute Selection Measure to Nowcast Snow/No-Snow”, International Review on Computers and
Software, (2015).
[181] K. Kumar Reddy C, V. Babu and C. H. Rupa, “ISIQ: Improved Supervised Learning using in Quest to
Nowcast Snow/No-Snow”, WSEAS Transactions on Computers, (2015).
[182] A. K. Pujari, “Data Mining Techniques”, Universities Press, (2004).
Authors
C. Kishor Kumar Reddy obtained his B.Tech in Information Technology from JNTU
Anantapur in 2011 and his M.Tech in Computer Science and Engineering from JNTU Hyderabad
in 2013, and is currently pursuing a Ph.D. in Computer Science and Engineering at K L
University, Guntur. He has published 28 papers in international conferences and
journals indexed in the Scopus and DBLP databases. He is a member of IEEE, CSI, ISTE,
IAENG, IACSIT and IAPA, and an editorial board member of IJAEIM.
Dr. B. Vijaya Babu is presently working as a Professor in the CSE department of K L
University, Andhra Pradesh. He obtained his Ph.D. (CSSE) degree from A U College of
Engineering, Andhra University, Visakhapatnam, Andhra Pradesh in 2012. He has about
20 years of teaching experience in various positions at private engineering colleges in
Andhra Pradesh. His research area is Knowledge Engineering/Data Mining, and he has
published about 30 research papers in various international/Scopus-indexed journals. He is a
life member of professional bodies such as ISTE.