4th International Conference on System Modeling & Advancement in Research Trends (SMART)
College of Computing Sciences and Information Technology (CCSIT), Teerthanker Mahaveer University, Moradabad
[2015]
A Systematic Review of Classification Techniques
and Implementation of ID3 Decision Tree Algorithm
Arohi Gupta 1, Surbhi Gupta 2, Deepika Singh 3
1 Research Scholar, College of Computing Sciences & Information Technology, TMU, Moradabad, India
2 Research Scholar, College of Computing Sciences & Information Technology, TMU, Moradabad, India
3 Assistant Professor, College of Computing Sciences & Information Technology, TMU, Moradabad, India
1 [email protected]
2 [email protected]
3 [email protected]
Abstract— Data mining is a knowledge discovery process that analyzes data and generates useful information and patterns from it, which assist in decision making in an organization. Classification is a supervised learning technique of data mining that consists of a set of predefined classes, on the basis of which new objects are classified. Classification builds a classifier or model from a training dataset and uses it to classify new data. In this research paper, we discuss the classification techniques proposed in the literature and present a detailed study of the decision tree based data mining algorithms ID3 and C4.5. We also present a comparative study of various classification algorithms along with their advantages and disadvantages.
Keywords— Data Mining, Classification, Decision Tree, Neural Network, K-Nearest Neighbor, Naïve Bayesian
I. INTRODUCTION
The development of information technology has generated a large number of databases and huge amounts of data in various areas. Research in databases and information technology has given rise to approaches for storing and manipulating this precious data for further decision making. Data mining refers to extracting or mining knowledge from large amounts of data; in other words, data mining is a process of extracting useful information and patterns from huge data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction, or data pattern analysis [1][9]. Classification is one of the supervised learning techniques for mining knowledge from vast amounts of data. In classification we find a model that describes and distinguishes data classes, using a training dataset whose class labels are known. This model can then be used to predict the class of objects whose class label is unknown. In the literature the classification method is subdivided into a number of techniques for classifying data with the correct class labels. Some of these techniques, as proposed by researchers, are: decision tree based methods, Bayesian classifiers, neural network based classifiers, lazy learners, support vector machines, and rule based methods [1][25].
The decision tree based classification method [34] is a graphical representation of the data point attributes and is one of the simplest methods for building a classifier model. A decision tree is represented using nodes, branches and leaves, where each node denotes a test, each branch represents an outcome of the test, and leaves represent classes. Such a tree can be converted to classification rules [8]. Another method of classification, the naive Bayes classifier [35], is a simple probabilistic classification method based on applying Bayes theorem with strong independence assumptions; a model based on this classifier would more precisely be called an independent feature model [10]. Classification can also be carried out using a neural network [36], or artificial neural network, a computational model inspired by biological neural systems that detects patterns and makes predictions [3]. Another approach to classification proposed in the literature is the use of lazy learners [33]. K-nearest neighbor is a type of instance-based learning, or lazy learning algorithm, which classifies objects based on the closest training examples in the feature space; in the k-nearest neighbor algorithm, the function is only approximated locally and all computation is deferred until classification [11]. One of the strongest methods for building classifiers is the support vector machine (SVM). Support vector machines [31] can classify both linear and non-linear data; they can transform the original training data into higher dimensions by using non-linear mapping [8]. Finally, the rule based classification technique [37] uses a collection of "if-then" rules for classifying the dataset [25].
In this paper our aim is to review the state of the art of the existing classification algorithms and to present the advantages and disadvantages of the various classification algorithms, so as to make a comparison among them. The rest of this research paper is organized as follows: Section II gives an overview of the different classification techniques. Section III presents a detailed study of the ID3 and C4.5 algorithms. Section IV is a comparative study of the classification algorithms. Section V gives the conclusion and future aspects for proposing an efficient classification algorithm on the basis of the already proposed classification algorithms.

Fig. 1 Proposed taxonomy for the classification algorithms
II. STATE OF ART OF THE CLASSIFICATION TECHNIQUES
In classification a model or classifier is constructed to predict categorical labels. Consider, for example, loan application data that can help a bank loan officer determine whether a loan applicant is safe or risky for the bank; the categorical class labels for the loan application data are "safe" and "risky". Data classification is a two-step process. In the first, learning step, a classifier or model is built using a training dataset whose class labels are known. In the second step, the model is used for classification, where the accuracy of the classifier is estimated using a test dataset [8][1]. If the accuracy is considered acceptable, the classifier can be used to classify future data tuples whose class labels are unknown. Some typical applications of classification are target marketing, medical diagnosis, credit approval, and fraud detection [8].
As stated above, a number of classification techniques have been proposed in the literature. To start the description of the taxonomy, we present our proposal in Fig. 1, where we have categorized the classification algorithms. The classification process is mainly divided into six different categories, named Decision tree based method, Bayesian classifiers, Neural network based classifiers, Lazy learners, Support vector machines, and Rule based method [1][25].
A. Decision Tree
A decision tree is a classifier that can be viewed as a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. Decision trees can easily be converted to classification rules [8]. In data mining, decision tree structures are a common way to organize classification schemes. Classification using a decision tree is performed by routing from the root node until a leaf node is reached [12]. Algorithms used for decision tree induction include ID3 [13], C4.5 [32], C5.0 [5] and CART [38].
Fig. 2 Decision Tree Model
1) Decision Tree Algorithm
Generate_decision_tree: generate a decision tree from the training tuples of data partition D.
Input: Data partition D (a set of training tuples and their associated class labels), attribute_list, and an Attribute_selection_method.
Output: A decision tree.
Method:
1. Create a node N.
2. If the tuples in D are all of the same class C, then return N as a leaf node labelled with the class C.
3. If attribute_list is empty, then return N as a leaf node labelled with the majority class in D.
4. Apply Attribute_selection_method(D, attribute_list) to find the best splitting criterion.
5. Label node N with the splitting criterion.
6. If the splitting attribute is discrete-valued and multiway splits are allowed, then remove the splitting attribute from attribute_list.
7. For each outcome j of the splitting criterion:
   a. Let Dj be the set of data tuples in D satisfying outcome j.
   b. If Dj is empty, then attach a leaf labelled with the majority class in D to node N.
   c. Else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N.
8. Return N [8].
The recursive partitioning stops only when one of the following terminating conditions is true:
- All of the tuples in partition D belong to the same class [8]; or
- There are no remaining attributes on which the tuples may be further partitioned. In this case, majority voting is employed: node N is converted into a leaf and labelled with the most common class in D. Alternatively, the class distribution of the node tuples may be stored [8]; or
- There are no tuples for a given branch, that is, a partition Dj is empty. In this case, a leaf is created with the majority class in D [8].
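To make the recursive procedure above concrete, the following is a minimal Java sketch of decision tree induction over categorical attributes. It is illustrative only, not the paper's implementation: the tuple layout (maps from attribute names to values, with the class label stored under the key "class") and the bestAttribute() placeholder are assumptions, and a real attribute selection method would use a measure such as the information gain described in Section III.

import java.util.*;

// A TreeNode is either a leaf (label = class) or an internal node
// (label = splitting attribute, children keyed by attribute value).
class TreeNode {
    String label;
    Map<String, TreeNode> children = new HashMap<>();
}

class DecisionTree {
    // D: training tuples as attribute-name -> value maps, class label under "class"
    static TreeNode generate(List<Map<String, String>> D, Set<String> attrs) {
        TreeNode N = new TreeNode();                               // create node N
        String first = D.get(0).get("class");
        if (D.stream().allMatch(t -> first.equals(t.get("class")))) {
            N.label = first;                                       // all of class C
            return N;
        }
        if (attrs.isEmpty()) {
            N.label = majorityClass(D);                            // majority voting
            return N;
        }
        String A = bestAttribute(D, attrs);                        // splitting criterion
        N.label = A;
        Set<String> rest = new HashSet<>(attrs);
        rest.remove(A);                                            // discrete-valued split
        Map<String, List<Map<String, String>>> parts = new HashMap<>();
        for (Map<String, String> t : D)
            parts.computeIfAbsent(t.get(A), k -> new ArrayList<>()).add(t);
        // recurse on each partition Dj produced by an outcome j of the split
        for (Map.Entry<String, List<Map<String, String>>> e : parts.entrySet())
            N.children.put(e.getKey(), generate(e.getValue(), rest));
        return N;
    }

    static String majorityClass(List<Map<String, String>> D) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> t : D) counts.merge(t.get("class"), 1, Integer::sum);
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    static String bestAttribute(List<Map<String, String>> D, Set<String> attrs) {
        // placeholder: a real implementation would pick the attribute with the
        // highest information gain (Section III) or another selection measure
        return attrs.iterator().next();
    }
}

Note that this sketch only creates branches for attribute values that actually occur in D, so an empty partition Dj never arises here; the pseudocode's empty-branch case matters when a branch is created for every possible value of the splitting attribute.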
B. Naive Bayesian
Bayesian classifiers are statistical classifiers that can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes theorem. The naive Bayes method is also called idiot's Bayes, simple Bayes, and independence Bayes. It is very easy to construct, needing no complicated iterative parameter estimation schemes [4]. Naive Bayes classifiers use all the attributes and are based on the following two assumptions:
1. Attributes are equally important.
2. Attributes are statistically independent (class conditional independence): knowing the value of one attribute says nothing about the value of another.
The naive Bayesian classifier works as follows. Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, …, xn). Suppose there are m classes C1, C2, …, Cm. Classification derives the maximum posteriori, i.e., the maximal P(Ci|X), which follows from Bayes theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)    (Eq. 1)

Since P(X) is constant for all classes, only the numerator needs to be maximized:

P(Ci|X) ∝ P(X|Ci) P(Ci)    (Eq. 2)

Naive Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero. To avoid this problem, the Laplacian correction (Laplace estimator) technique is used. The corrected probability estimates will be close to their uncorrected counterparts, yet the zero probability value is avoided [8].
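As an illustration, the following minimal Java sketch classifies a tuple of categorical attribute values by maximizing P(X|Ci)P(Ci) of Eq. 2 in log space, with the Laplacian correction applied to every conditional probability. The data layout (String arrays whose last entry is the class label) is an assumption made for the example, not the paper's code.

import java.util.*;

class NaiveBayes {
    // train: tuples as String arrays whose last entry is the class label
    // x: the attribute values of the tuple to classify
    static String classify(List<String[]> train, String[] x) {
        int n = x.length;
        Map<String, Integer> classCount = new HashMap<>();
        for (String[] t : train) classCount.merge(t[n], 1, Integer::sum);

        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> c : classCount.entrySet()) {
            // log P(Ci); logs are used so small products do not underflow
            double score = Math.log(c.getValue() / (double) train.size());
            for (int k = 0; k < n; k++) {
                Set<String> values = new HashSet<>();
                int match = 0;
                for (String[] t : train) {
                    values.add(t[k]);                       // distinct values of attribute k
                    if (t[n].equals(c.getKey()) && t[k].equals(x[k])) match++;
                }
                // Laplacian correction: add-one counts keep P(xk|Ci) non-zero
                score += Math.log((match + 1.0) / (c.getValue() + values.size()));
            }
            if (score > bestScore) { bestScore = score; best = c.getKey(); }
        }
        return best;    // the class maximizing P(X|Ci)P(Ci), as in Eq. 2
    }
}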
C. Neural Network
An artificial neural network (ANN), also called a neural network (NN), is one of the newest signal processing technologies. It is a mathematical or computational model based on biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase [1]. After training is complete the parameters are fixed. If there is a lot of data and the problem is poorly understood, an ANN model can be accurate, and the non-linear characteristics of the ANN give it great flexibility in achieving an input-output map [3].
The components of an ANN are the neuron (or node, or unit), input links, output links, and weights. Each unit performs a simple process:
1. Receives n inputs
2. Multiplies each input by its weight
3. Applies an activation function to the sum of the results
4. Outputs the result
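These four steps can be sketched in a few lines of Java; the sigmoid activation and the example weights below are illustrative choices, not prescribed by the paper.

class Neuron {
    static double output(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++)
            sum += inputs[i] * weights[i];         // steps 1-2: weighted inputs are summed
        return 1.0 / (1.0 + Math.exp(-sum));       // step 3: sigmoid activation
    }

    public static void main(String[] args) {
        double[] x = {0.5, 0.2}, w = {0.8, -0.4};  // illustrative inputs and weights
        System.out.println(output(x, w, 0.1));     // step 4: output the result
    }
}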
D. K-Nearest Neighbor
Nearest neighbor classifiers are based on learning by analogy, in which a given test tuple is compared with training tuples that are similar to it. All training tuples are stored in an n-dimensional pattern space, because each tuple represents a point in an n-dimensional space [1][8]. When given an unknown tuple, a k-nearest neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple; these k training tuples are the k nearest neighbors of the unknown tuple. Closeness is defined in terms of a distance metric, such as the Euclidean distance [6][1]. The Euclidean distance between two points or tuples, say X1 = (x11, x12, …, x1n) and X2 = (x21, x22, …, x2n), is

dist(X1, X2) = √( Σi (x1i − x2i)² )    (Eq. 3)

We normalize the values of each attribute before using Eq. 3 (min-max normalization). In k-nearest neighbor classification, the unknown tuple is assigned the most common class among its k nearest neighbors [8]. For categorical attributes we compare the corresponding value of the attribute in tuple X1 with that in tuple X2: if the two are identical, the difference between the two is taken as 0; if the two are different, the difference is considered to be 1 [8].
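A minimal Java sketch of this procedure is given below. It assumes the attributes are numeric and already min-max normalized; a categorical attribute would instead contribute 0 or 1 to the sum in Eq. 3, as described above. The data layout is illustrative.

import java.util.*;

class KNN {
    // X: normalized training tuples, y: their class labels, q: query tuple
    static String classify(double[][] X, String[] y, double[] q, int k) {
        Integer[] idx = new Integer[X.length];
        for (int i = 0; i < X.length; i++) idx[i] = i;
        // order training tuples by Euclidean distance to the query
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(X[i], q)));
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++) votes.merge(y[idx[i]], 1, Integer::sum);
        // the unknown tuple gets the most common class among its k neighbors
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Euclidean distance of Eq. 3
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}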
E. Support Vector Machines
Support Vector Machines (also known as maximum margin classifiers) simultaneously minimize the empirical classification error and maximize the geometric margin [23], so they do not depend on the dimensionality of the feature space and can therefore efficiently handle high dimensional data [22][23]. They are based on structural risk minimization [23], whose basic concept is to find the hypothesis for which the lowest true error is guaranteed [22]. SVMs have strong regularization properties, where regularization refers to the generalization of the model to new data. Support vector machines were designed as a tool to solve supervised learning classification problems [29][30]. SVMs map input vectors to a higher dimensional space where a maximal separating hyperplane is constructed. Two parallel hyperplanes are constructed, one on each side of the hyperplane that separates the data; the separating hyperplane is the one that maximizes the distance between the two parallel hyperplanes. The assumption is that the larger the margin, or distance between these parallel hyperplanes, the better the generalization error of the classifier will be [23][31].
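The following toy Java sketch, which is not from the paper, trains a linear SVM by stochastic sub-gradient descent on the regularized hinge loss (in the style of the Pegasos algorithm). Minimizing this loss is one standard way of finding the maximum-margin separating hyperplane described above; kernels and multi-class extensions are omitted.

import java.util.Random;

class LinearSVM {
    // X: training tuples, y: labels in {+1, -1}; returns weights w with bias last
    static double[] train(double[][] X, int[] y, double lambda, int epochs) {
        int d = X[0].length;
        double[] w = new double[d + 1];             // w[d] is the bias b
        Random rnd = new Random(0);
        for (int t = 1; t <= epochs * X.length; t++) {
            int i = rnd.nextInt(X.length);
            double eta = 1.0 / (lambda * t);        // decaying learning rate
            double margin = w[d];
            for (int j = 0; j < d; j++) margin += w[j] * X[i][j];
            for (int j = 0; j < d; j++) w[j] *= (1 - eta * lambda);  // regularize w
            if (y[i] * margin < 1) {                // hinge loss active: point inside margin
                for (int j = 0; j < d; j++) w[j] += eta * y[i] * X[i][j];
                w[d] += eta * y[i];
            }
        }
        return w;   // classify a new x by the sign of w.x + b
    }
}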
F. Rule Based Method
The rule based classification technique uses a collection of "if-then" rules for classifying the dataset [25]. For example, consider the rule R1, given as

R1: IF Manual Checkup = Pass AND Year = Valid THEN Issue = Yes

The 'IF' part of a rule is called the rule antecedent or precondition, and the 'THEN' part is the rule consequent. In the rule above we are predicting the pollution-under-control certificate for a vehicle. If all the attribute tests in the rule (i.e., manual checkup and year) hold true for the given tuple, we say that the rule is satisfied and that the rule covers the tuple. There are two parameters for assessing a rule R: the rule coverage and the rule accuracy [8].
Consider a dataset D, where |D| denotes the number of tuples in D. Let ncovers be the number of tuples covered by rule R and ncorrect be the number of tuples correctly classified by R. The coverage and accuracy of R are then defined as

coverage(R) = ncovers / |D|    (Eq. 4)

accuracy(R) = ncorrect / ncovers    (Eq. 5)
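As a sketch, the two measures can be computed for rule R1 above as follows; the Java tuple layout and the attribute keys ("MC", "Year", "Issue") are illustrative assumptions, not the paper's code.

import java.util.*;

class RuleMetrics {
    // the antecedent of R1: Manual Checkup = Pass AND Year = Valid
    static boolean covers(Map<String, String> t) {
        return "Pass".equals(t.get("MC")) && "Valid".equals(t.get("Year"));
    }

    static void assess(List<Map<String, String>> D) {
        int ncovers = 0, ncorrect = 0;
        for (Map<String, String> t : D) {
            if (covers(t)) {
                ncovers++;                                     // rule covers the tuple
                if ("Yes".equals(t.get("Issue"))) ncorrect++;  // consequent also holds
            }
        }
        System.out.println("coverage(R1) = " + (double) ncovers / D.size());  // Eq. 4
        System.out.println("accuracy(R1) = " + (double) ncorrect / ncovers);  // Eq. 5
    }
}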
III. DECISION TREE BASED ALGORITHM
As stated above in Section II, here we show the implementation of the decision tree based algorithm ID3 in Java and calculate the entropy and information gain. Further in this section, we give a brief description of C4.5 and C5.0.

A. ID3
ID3 uses information gain as its attribute selection measure. ID3 (Iterative Dichotomiser 3) is a decision tree learning algorithm used for the classification of objects with an iterative inductive approach. It uses a greedy top-down search to build the tree that will decide the decision rules [16][14]. Let D be a database of training tuples and let N be a node. The attribute with the highest information gain is chosen as the splitting attribute for node N. This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or impurity in these partitions. Such an approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple, but not necessarily the simplest, tree is found [8]. The expected information needed to classify a tuple in D is given by [8]

Info(D) = −Σi pi log2(pi)    (Eq. 6)

Info(D) is also known as the entropy of D. InfoA(D) is the expected information required to classify a tuple from D based on the partitioning by an attribute A, and is given by [8]

InfoA(D) = Σj (|Dj| / |D|) × Info(Dj)    (Eq. 7)

The smaller the expected information still required, the greater the purity of the partitions. Information gain is defined as the difference between the original information requirement and the new requirement [8]:

Gain(A) = Info(D) − InfoA(D)    (Eq. 8)
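A minimal Java sketch of these three quantities, using the same illustrative tuple layout as the decision tree sketch in Section II (class label under the key "class"; for Database D below this would be the Issue attribute), is:

import java.util.*;

class InfoGain {
    // Eq. 6: Info(D) = -sum_i pi log2(pi) over the class distribution of D
    static double info(List<Map<String, String>> D) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> t : D) counts.merge(t.get("class"), 1, Integer::sum);
        double h = 0;
        for (int c : counts.values()) {
            double p = (double) c / D.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Eq. 7: expected information after partitioning D on attribute A
    static double infoA(List<Map<String, String>> D, String A) {
        Map<String, List<Map<String, String>>> parts = new HashMap<>();
        for (Map<String, String> t : D)
            parts.computeIfAbsent(t.get(A), k -> new ArrayList<>()).add(t);
        double h = 0;
        for (List<Map<String, String>> Dj : parts.values())
            h += (double) Dj.size() / D.size() * info(Dj);
        return h;
    }

    // Eq. 8: Gain(A) = Info(D) - InfoA(D)
    static double gain(List<Map<String, String>> D, String A) {
        return info(D) - infoA(D, A);
    }
}

For Database D below, for example, the Issue class distribution is 4 Yes and 5 No, so Info(D) = −(4/9)log2(4/9) − (5/9)log2(5/9) ≈ 0.991 bits.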
Experiments to evaluate the performance of the algorithm with continuous valued attributes and missing attribute values reveal that ID3 does not give acceptable results for continuous valued attributes but works well on certain data sets with missing values [14][15].
The entropy and information gain have been calculated for the data shown in Table I below [16]; Fig. 4 is a screenshot of the calculated values.

TABLE I
DATABASE D

Name | Fuel   | Category | Kilometers  | Service | Year    | Manual Checkup (MC) | Issue
-----|--------|----------|-------------|---------|---------|---------------------|------
Riva | Petrol | Two      | Not Covered | No      | Valid   | Pass                | Yes
Sita | Petrol | Two      | Covered     | Yes     | Invalid | Pass                | No
Puru | Petrol | Four     | Covered     | No      | Valid   | Fail                | No
Riya | Petrol | Four     | Covered     | Yes     | Valid   | Pass                | Yes
Neha | Diesel | Four     | Covered     | No      | Invalid | Fail                | No
Ram  | Diesel | Three    | Covered     | No      | Valid   | Fail                | No
Ekta | Diesel | Three    | Not Covered | No      | Valid   | Pass                | Yes
Saya | CNG    | Three    | Not Covered | No      | Valid   | Pass                | Yes
Ajay | Petrol | Four     | Covered     | Yes     | Invalid | Fail                | No
Fig. 4 Calculated Entropy and Information Gain for Database D

The decision tree for this database, based on the calculated entropy and information gain, is shown in Fig. 5.

Fig. 5 Decision Tree for Database D

B. C4.5
The information gain measure is biased toward tests with many outcomes. C4.5 [24], a successor of ID3, uses an extension of information gain known as the gain ratio as its attribute selection measure, which attempts to overcome this bias [8]. When all attributes are binary, the gain ratio criterion has been found to give considerably smaller decision trees [13]. It applies a kind of normalization to information gain using a split information value, defined analogously with Info(D) as

SplitInfoA(D) = −Σj (|Dj| / |D|) × log2(|Dj| / |D|)    (Eq. 9)

This value differs from information gain, which measures the information with respect to classification that is acquired based on the same partitioning. The gain ratio is defined as

GainRatio(A) = Gain(A) / SplitInfoA(D)    (Eq. 10)

The attribute with the maximum gain ratio is selected as the splitting attribute [8].
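As a small illustrative extension of the ID3 sketch in Section III-A (again an assumption, not the paper's code), Eqs. 9 and 10 can be computed as:

import java.util.*;

class GainRatio {
    // Eq. 9: split information of partitioning D on attribute A
    static double splitInfo(List<Map<String, String>> D, String A) {
        Map<String, Integer> sizes = new HashMap<>();
        for (Map<String, String> t : D) sizes.merge(t.get(A), 1, Integer::sum);
        double s = 0;
        for (int nj : sizes.values()) {
            double p = (double) nj / D.size();
            s -= p * Math.log(p) / Math.log(2);
        }
        return s;
    }

    // Eq. 10: gain ratio, reusing the gain() helper from the InfoGain sketch
    static double gainRatio(List<Map<String, String>> D, String A) {
        return InfoGain.gain(D, A) / splitInfo(D, A);
    }
}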
C. C5.0
C4.5 was superseded in 1997 by the commercial system See5/C5.0 (C5.0 for short) [17]. The C4.5 algorithm follows the rules of the ID3 algorithm, and similarly the C5.0 algorithm follows the rules of the C4.5 algorithm. C5.0 provides feature selection, cross validation and reduced error pruning facilities. The C5.0 algorithm has many features, such as [5]:
1. A large decision tree can be viewed as a set of rules, which is easy to understand [5].
2. It handles noise and missing data [5].
3. The problems of overfitting and error pruning are addressed by the C5.0 algorithm [5].
4. The C5.0 classifier can anticipate which attributes are relevant and which are not relevant for classification [5].
IV. COMPARATIVE ANALYSIS OF CLASSIFICATION ALGORITHMS
In this section we study the advantages and disadvantages of various classification algorithms, as summarized in Table III.
TABLE III
ADVANTAGES AND DISADVANTAGES OF CLASSIFICATION ALGORITHMS

ID3
  Advantages:
  - Very simple [21].
  - Easy to implement.
  - Quite a simple process.
  - Running time increases only linearly with the complexity of the problem.
  Disadvantages:
  - Does not guarantee an optimal solution.
  - Does not give acceptable results for continuous data and missing data [15].
  - Takes more memory.
  - Has a long searching time.

C4.5
  Advantages:
  - Avoids overfitting the data [2].
  - Faster than ID3 [2].
  - More memory efficient than ID3 [2].
  - Handles missing and continuous attributes [2].
  - Can determine how deeply to grow a decision tree.
  - Improved computational efficiency.
  Disadvantages:
  - Data may be over-fitted or over-classified [21].
  - Empty branches [21].
  - Insignificant branches [21].
  - Susceptible to noise [21].

C5.0
  Advantages:
  - Faster than C4.5 [5][18].
  - Uses less memory than C4.5 during ruleset construction [18].
  - Gets similar results with smaller decision trees [5].
  - Supports boosting, which improves the trees and gives more accuracy.
  - C5.0 rulesets are easier to understand [19].
  - Lower error rates on unseen cases [5][18].
  - Solves the problem of overfitting [5].
  Disadvantages:
  - For applications with very many cases, C5.0 may crash with a message like "segmentation fault" [19].
  - The use of case weighting does not guarantee that the classifier will be more accurate for unseen cases with higher weights [19].

Neural Network
  Advantages:
  - High tolerance to noisy data.
  - Ability to classify untrained patterns.
  - Well-suited for continuous-valued inputs and outputs.
  - Successful on a wide array of real-world data.
  - Algorithms are inherently parallel.
  - Techniques have recently been developed for the extraction of rules from trained neural networks.
  Disadvantages:
  - Long training time.
  - Requires a number of parameters typically best determined empirically.
  - Poor interpretability.

K-Nearest Neighbor
  Advantages:
  - Easy to understand [7][17].
  - Easy to implement [7][17].
  - Training is very fast [7].
  - Robust to noisy training data [7].
  - Particularly well suited for multimodal classes as well as applications in which an object can have many class labels [17].
  Disadvantages:
  - Memory limitation [7].
  - Being a lazy supervised learning algorithm, it runs slowly [7].
  - Expensive, particularly for large training sets [17].

Naive Bayesian
  Advantages:
  - Simple to construct a classifier.
  - Requires a small amount of training data to estimate the parameters necessary for classification.
  - Even if the naive Bayes assumptions do not hold, a naive Bayes classifier still often performs surprisingly well in practice.
  - Good performance [7].
  Disadvantages:
  - Requires a very large number of records to obtain good results [7].
  - It is instance-based, or lazy, in that it stores all of the training samples [7].

Support Vector Machines
  Advantages:
  - Can efficiently handle non-linear data.
  - Can handle multi-class problems.
  - By introducing the kernel, SVMs gain flexibility in the choice of the form of the threshold separating solvent from insolvent companies [26].
  - No assumptions about the functional form of the transformation [26].
  - Provide a good out-of-sample generalization [26].
  - Deliver a unique solution, since the optimality problem is convex [26].
  Disadvantages:
  - The marginal contribution of each financial ratio to the score is variable [26].
  - The lack of transparency of results [26].
  - The choice of the kernel [27].
  - Extension to multiclass problems [28].
  - Long training time [28].
  - Selection of parameters [28].

Rule Based Method
  Advantages:
  - Easy for people to understand [39][25].
  - Rule learning systems outperform decision tree learners on many problems [25][40][20].
  Disadvantages:
  - When data contains uncertainty, the algorithm cannot process the uncertainty properly [25].
V. CONCLUSIONS
Data mining is a wide area that integrates techniques from various fields. These techniques can be based on supervised or unsupervised learning methods. One of the supervised learning based methods, called classification, for mining data patterns has been reviewed in this paper. The important task of the classification process is to classify new and unseen samples correctly. These classification algorithms can be implemented on different types of data sets, such as patient data, financial data, and student data. Each technique has its own pros and cons, as given in the paper; based on the required conditions, a suitable one can be selected. This paper has dealt with the various classification techniques used in data mining, and a detailed study of the ID3 and C4.5 decision tree based algorithms has been conducted.
REFERENCES
[1] S. Neelamegam and E. Ramaraj, "Classification algorithms in data mining: An overview," International Journal of P2P Network Trends and Technology (IJPTT), vol. 4, issue 8, Sep. 2013.
[2] A. S. Galathiya, A. P. Ganatra, and C. K. Bhensdadia, "Classification with an improved decision tree algorithm," International Journal of Computer Applications (0975-8887), vol. 46, no. 23, May 2012.
[3] Nikita Jain and Vishal Srivastava, "Data mining techniques: a survey paper," IJRET: International Journal of Research in Engineering and Technology, eISSN: 2319-1163, pISSN: 2321-7308.
[4] Raj Kumar and Rajesh Verma, "Classification algorithms for data mining: A survey," International Journal of Innovations in Engineering and Technology (IJIET), vol. 1.
[5] Rutvija Pandya and Jayati Pandya, "C5.0 algorithm to improved decision tree with feature selection and reduced error pruning," International Journal of Computer Applications (0975-8887), vol. 117, no. 16, May 2015.
[6] Thair Nu Phyu, "Survey of classification techniques in data mining," Proceedings of the International MultiConference of Engineers and Computer Scientists 2009, vol. I, IMECS 2009, March 18-20, 2009, Hong Kong.
[7] S. Archana and K. Elangovan, "Survey of classification techniques in data mining," International Journal of Computer Science and Mobile Applications, vol. 2, issue 2, Feb. 2014, pp. 65-71, ISSN: 2321-8363.
[8] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition.
[9] Ed Colet, "Clustering and classification: Data mining approaches."
[10] Kavitha Murugeshan and Neeraj RK, "Discovering patterns to produce effective output through text mining using naïve Bayesian algorithm," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, vol. 2, issue 6, May 2013.
[11] Dorina Kabakchieva, "Predicting student performance by using data mining methods for classification," Bulgarian Academy of Sciences, Cybernetics and Information Technologies, vol. 13, no. 1, Sofia, 2013, Print ISSN: 1311-9702, Online ISSN: 1314-4081, DOI: 10.2478/cait-2013-0006.
[12] Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner, "Decision Trees — What Are They?"
[13] J. R. Quinlan, "Induction of decision trees," Machine Learning 1: 81-106, 1986, Kluwer Academic Publishers, Boston.
[14] Anand Bahety, "Extension and evaluation of ID3 — decision tree algorithm," University of Maryland, College Park.
[15] Rupali Bhardwaj and Sonia Vatta, "Implementation of ID3 algorithm," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, issue 6, June 2013, ISSN: 2277-128X.
[16] Rupali Bhardwaj and Sonia Vatta, "Issuing of pollution under control certificate using ID3 algorithm," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, issue 5, May 2013, ISSN: 2277-128X.
[17] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg, "Top 10 algorithms in data mining," Knowledge and Information Systems (2008) 14:1-37, DOI: 10.1007/s10115-007-0114-2.
[18] The RuleQuest Research website. [Online]. Available: http://rulequest.com/see5-comparison.html
[19] The RuleQuest Research website. [Online]. Available: http://www.rulequest.com/see5-unix.html
[20] S. M. Weiss and N. Indurkhya, "Reduced complexity rule induction," in IJCAI, 1991, pp. 678-684.
[21] Sonia Singh and Priyanka Gupta, "Comparative study of ID3, CART and C4.5 decision tree algorithm: A survey," International Journal of Advanced Information Science and Technology (IJAIST), vol. 27, no. 27, July 2014.
[22] Thorsten Joachims, "Text categorization with support vector machines: Learning with many relevant features," 10th European Conference on Machine Learning, Chemnitz, Germany, vol. 1398, April 21-23, 1998, Proceedings, pp. 137-142.
[23] Durgesh K. Srivastava and Lekha Bhambhu, "Data classification using support vector machine," Journal of Theoretical and Applied Information Technology, © 2005-2009 JATIT.
[24] J. R. Quinlan, "Improved use of continuous attributes in C4.5," Journal of Artificial Intelligence Research, vol. 4 (1996), pp. 77-90.
[25] Biao Qin, Yuni Xia, Sunil Prabhakar, and Yicheng Tu, "A rule-based classification algorithm for uncertain data," IEEE International Conference on Data Engineering.
[26] Laura Auria and Rouslan A. Moro, "Support vector machines (SVM) as a technique for solvency analysis," Berlin, August 2008.
[27] Christopher J. C. Burges, "A tutorial on support vector machines for pattern recognition," Kluwer Academic Publishers, Boston.
[28] Shigeo Abe, Support Vector Machines for Pattern Classification.
[29] Himani Bhavsar and Mahesh H. Panchal, "A review on support vector machine for data classification," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 1, issue 10, December 2012.
[30] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, 2000.
[31] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[32] J. R. Quinlan, "Improved use of continuous attributes in C4.5," Journal of Artificial Intelligence Research 4 (1996), pp. 77-90, submitted 10/95, published 3/96.
[33] T. G. Dietterich, "Ensemble methods in machine learning," Lecture Notes in Computer Science, vol. 1857, pp. 1-15, 2000.
[34] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
[35] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in National Conf. on Artificial Intelligence, 1992, pp. 223-228.
[36] R. Andrews, J. Diederich, and A. Tickle, "A survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge Based Systems, vol. 8, no. 6, pp. 373-389, 1995.
[37] W. W. Cohen, "Fast effective rule induction," in Proc. of the 12th Intl. Conf. on Machine Learning, 1995, pp. 115-123.
[38] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Belmont, CA: Wadsworth, 1984.
[39] J. Catlett, "Megainduction: A test flight," in ML, 1991, pp. 596-599.
[40] G. Pagallo and D. Haussler, "Boolean feature discovery in empirical learning," Machine Learning, vol. 5, pp. 71-99, 1990.