4th International Conference on System Modeling & Advancement in Research Trends (SMART), College of Computing Sciences and Information Technology (CCSIT), Teerthanker Mahaveer University, Moradabad, 2015

A Systematic Review of Classification Techniques and Implementation of ID3 Decision Tree Algorithm

Arohi Gupta (1), Surbhi Gupta (2), Deepika Singh (3)
1, 2 Research Scholar, College of Computing Sciences & Information Technology, TMU, Moradabad, India
3 Assistant Professor, College of Computing Sciences & Information Technology, TMU, Moradabad, India
1 [email protected], 2 [email protected], 3 [email protected]

Abstract— Data mining is a knowledge discovery process that analyzes data and generates useful information and patterns from it, which assist decision making in an organization. Classification is a supervised learning technique of data mining that uses a set of predefined classes to label new objects. A classifier, or model, is built from a training dataset and is then used to classify new data. In this paper we discuss the classification techniques proposed in the literature and present a detailed study of the decision-tree-based data mining algorithms ID3 and C4.5. We also present a comparative study of the various classification algorithms along with their advantages and disadvantages.

Keywords— Data Mining, Classification, Decision Tree, Neural Network, K-Nearest Neighbor, Naïve Bayesian

I. INTRODUCTION

The development of information technology has generated large databases and huge volumes of data in many areas. Research in databases and information technology has given rise to approaches for storing and manipulating this precious data for further decision making. Data mining refers to extracting or mining knowledge from large amounts of data; in other words, it is the process of extracting useful information and patterns from huge data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction, or data pattern analysis [1][9].

Classification is one of the supervised learning techniques for mining knowledge from vast amounts of data. In classification we find a model that describes and distinguishes data classes, using a training dataset whose class labels are known. This model can then be used to predict the class of objects whose class label is unknown. In the literature, classification is subdivided into a number of techniques for labeling data with the correct classes. Some of these techniques, as proposed by researchers, are: decision-tree-based methods, Bayesian classifiers, neural-network-based classifiers, lazy learners, support vector machines, and rule-based methods [1][25].

The decision tree [34] based classification method is a graphical representation of the data point attributes, and it is one of the simplest methods for building a classifier model. A decision tree is represented using nodes, branches, and leaves, where each node denotes a test, each branch represents an outcome of the test, and leaves represent classes. Such a tree can be converted to classification rules [8]. Another method of classification, the naive Bayes classifier [35], is a simple probabilistic classification method based on applying Bayes theorem with strong independence assumptions.
The model based on this classifier would more precisely be called an independent feature model [10]. Classification can also be carried out using a neural network [36], or artificial neural network, a model inspired by biological systems that detects patterns and makes predictions [3]. Another approach proposed in the literature uses lazy learners [33]. K-nearest neighbor is a type of instance-based learning, or lazy learning algorithm, which classifies objects based on the closest training examples in the feature space. In the k-nearest neighbor algorithm, the function is only approximated locally and all computation is deferred until classification [11]. One of the strongest methods for building classifiers is the support vector machine (SVM). Support vector machines [31] can classify both linear and non-linear data: they can transform the original training data into higher dimensions using a non-linear mapping [8]. Finally, the rule-based classification technique [37] uses a collection of "if-then" rules for classifying the dataset [25].

In this paper our aim is to review the state of the art of the existing classification algorithms and to present the advantages and disadvantages of the various algorithms so as to make a comparison among them. The rest of this paper is organized as follows: Section II is an overview of the different classification techniques. Section III is a detailed study of the ID3 and C4.5 algorithms. Section IV is a comparative study of classification algorithms. Section V gives conclusions and future directions for proposing an efficient classification algorithm on the basis of the already proposed ones.

II. STATE OF THE ART OF THE CLASSIFICATION TECHNIQUES

In classification, a model or classifier is constructed to predict categorical labels. Consider, for example, loan application data that can help a bank loan officer decide whether an applicant is safe or risky for the bank; the categorical class labels here are "safe" and "risky". Data classification is a two-step process. In the first (learning) step, a classifier or model is built using a training dataset whose class labels are known. In the second step, the model is used for classification: the accuracy of the classifier is estimated using a test dataset [8][1]. If the accuracy is considered acceptable, the classifier can be used to classify future data tuples whose class labels are unknown. Typical applications of classification are target marketing, medical diagnosis, credit approval, and fraud detection [8].

As stated above, a number of classification techniques have been proposed in the literature. Fig. 1 shows our proposed taxonomy, in which the classification algorithms are grouped into six categories: decision-tree-based methods, Bayesian classifiers, neural-network-based classifiers, lazy learners, support vector machines, and rule-based methods [1][25].

Fig. 1 Proposed taxonomy for the classification algorithms
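The two-step process above can be made concrete with a small sketch. This is not from the paper (which names no library); it assumes the scikit-learn package, with its bundled Iris data standing in for a real training set.

```python
# Minimal sketch of the two-step classification process: (1) learn a model
# from tuples with known labels, (2) estimate its accuracy on held-out tuples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 1 (learning): build the classifier from the training dataset.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 (classification): estimate accuracy on the test dataset; if it is
# acceptable, the model may classify future tuples whose labels are unknown.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```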
A. Decision Tree

A decision tree is a classifier that can be viewed as a flowchart-like tree structure in which each internal (non-leaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a class label. The topmost node in the tree is the root node. Decision trees can easily be converted to classification rules [8]. In data mining, decision tree structures are a common way to organize classification schemes, and classification of a tuple is performed by routing it from the root node until it arrives at a leaf [12]. Well-known decision tree algorithms are ID3 [13], C4.5 [32], C5.0 [5], and CART [38].

Fig. 2 Decision Tree Model

1) Decision Tree Algorithm (Generate_decision_tree): Generate a decision tree from the training tuples of data partition D.

Input: Data partition D (a set of training tuples and their associated class labels); attribute_list; Attribute_selection_method.
Output: A decision tree.

Method [8]:
1. Create a node N.
2. If the tuples in D are all of the same class C, return N as a leaf node labelled with class C.
3. If attribute_list is empty, return N as a leaf node labelled with the majority class in D.
4. Apply Attribute_selection_method(D, attribute_list) to find the best splitting criterion, and label node N with it.
5. If the splitting attribute is discrete-valued and multiway splits are allowed, remove the splitting attribute from attribute_list.
6. For each outcome j of the splitting criterion: let Dj be the set of tuples in D satisfying outcome j; if Dj is empty, attach a leaf labelled with the majority class in D to node N; otherwise attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N.
7. Return N.

The recursive partitioning stops only when one of the following terminating conditions is true [8]:
- All of the tuples in partition D belong to the same class.
- There are no remaining attributes on which the tuples may be further partitioned. In this case majority voting is employed: node N is converted into a leaf labelled with the most common class in D (alternatively, the class distribution of the node's tuples may be stored).
- There are no tuples for a given branch, that is, a partition Dj is empty. In this case a leaf is created with the majority class in D.
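The paper reports a Java implementation; the following is a compact, illustrative Python sketch of Generate_decision_tree, using information gain (Eq. 6-8 in Section III) as the attribute selection method. Names such as generate_tree and best_attribute are ours, not the authors'.

```python
# Illustrative sketch of Generate_decision_tree with information gain.
import math
from collections import Counter

def entropy(rows, target):
    # Eq. 6: Info(D) = -sum(p_i * log2(p_i)) over the classes in the partition.
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_attribute(rows, attributes, target):
    # Attribute_selection_method: pick the attribute with the highest gain.
    base = entropy(rows, target)
    def gain(a):
        parts = Counter(r[a] for r in rows)
        remainder = sum((m / len(rows)) *
                        entropy([r for r in rows if r[a] == v], target)
                        for v, m in parts.items())
        return base - remainder                     # Eq. 8
    return max(attributes, key=gain)

def generate_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                       # all tuples of one class C
        return labels[0]
    if not attributes:                              # attribute_list is empty
        return Counter(labels).most_common(1)[0][0]  # majority class in D
    a = best_attribute(rows, attributes, target)    # splitting criterion
    node = {a: {}}
    # One branch per observed outcome of the test (observed outcomes are
    # never empty, so the empty-partition case needs no special handling).
    for v in {r[a] for r in rows}:
        subset = [r for r in rows if r[a] == v]
        rest = [x for x in attributes if x != a]    # remove splitting attribute
        node[a][v] = generate_tree(subset, rest, target)
    return node
```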
B. Naive Bayesian

Bayesian classifiers are statistical classifiers that can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes theorem. The naive Bayes method, also called idiot's Bayes, simple Bayes, or independence Bayes, is very easy to construct and needs no complicated iterative parameter estimation schemes [4]. Naive Bayes classifiers use all the attributes and rest on two assumptions:
1. Attributes are equally important.
2. Attributes are statistically independent (class conditional independence): knowing the value of one attribute says nothing about the value of another.

The naive Bayesian classifier works as follows. Let D be a training set of tuples and their associated class labels, with each tuple represented by an n-dimensional attribute vector X = (x1, x2, ..., xn), and suppose there are m classes C1, C2, ..., Cm. Classification derives the maximum a posteriori class, i.e., the class maximizing P(Ci|X), which follows from Bayes theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)    (Eq. 1)

Since P(X) is constant for all classes, only the numerator needs to be maximized:

P(Ci|X) ∝ P(X|Ci) P(Ci)    (Eq. 2)

Naive Bayesian prediction requires each conditional probability to be non-zero; otherwise the predicted probability collapses to zero. To avoid this problem, the Laplacian correction (Laplace estimator) is used. The corrected probability estimates remain close to their uncorrected counterparts, yet zero probability values are avoided [8].
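A minimal Python sketch of Eq. 1-2 with the Laplacian correction, for categorical attributes; the function names are illustrative and not from the paper.

```python
# Naive Bayesian classifier with add-one (Laplace) smoothing.
from collections import Counter, defaultdict

def train_nb(rows, target):
    class_counts = Counter(r[target] for r in rows)   # counts per class Ci
    value_counts = defaultdict(Counter)               # (class, attr) -> values
    domains = defaultdict(set)                        # attr -> observed values
    for r in rows:
        for a, v in r.items():
            if a != target:
                value_counts[(r[target], a)][v] += 1
                domains[a].add(v)
    return class_counts, value_counts, domains

def predict_nb(x, class_counts, value_counts, domains):
    total = sum(class_counts.values())
    def score(c):
        s = class_counts[c] / total                   # P(Ci)
        for a, v in x.items():
            # Laplacian correction: add 1 to each count so that no
            # conditional probability P(xk|Ci) in Eq. 2 is ever zero.
            s *= ((value_counts[(c, a)][v] + 1) /
                  (class_counts[c] + len(domains[a])))
        return s
    return max(class_counts, key=score)               # maximum posteriori class
```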
C. Neural Network

An artificial neural network (ANN), also called a neural network (NN), is one of the newer signal processing technologies. It is a mathematical or computational model inspired by biological neural networks: it consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on the external or internal information that flows through the network during the learning phase [1]; after training is complete, the parameters are fixed. When there is a lot of data and the problem is poorly understood, an ANN model can be accurate, and the non-linear characteristics of an ANN give it great flexibility in approximating the input-output map [3].

The components of an ANN are neurons (nodes or units), input links, output links, and weights. Each unit performs a simple process:
1. Receives n inputs.
2. Multiplies each input by its weight.
3. Applies an activation function to the sum of the results.
4. Outputs the result.

D. K-Nearest Neighbor

Nearest-neighbor classifiers are based on learning by analogy: a given test tuple is compared with training tuples that are similar to it. All training tuples are stored in an n-dimensional pattern space, because each tuple represents a point in an n-dimensional space [1][8]. When given an unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to it; these are the unknown tuple's k nearest neighbors. Closeness is defined in terms of a distance metric, such as Euclidean distance [6][1]. The Euclidean distance between two points or tuples X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n) is

dist(X1, X2) = sqrt( Σ_i (x1i − x2i)² )    (Eq. 3)

The values of each attribute are normalized (min-max normalization) before Eq. 3 is applied. The unknown tuple is then assigned the most common class among its k nearest neighbors [8]. For categorical attributes, the corresponding values of the attribute in tuples X1 and X2 are compared: if the two are identical, the difference between them is taken as 0; if they are different, the difference is taken as 1 [8].
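An illustrative sketch of the procedure just described (min-max normalization, the Euclidean distance of Eq. 3, and a majority vote among the k closest training tuples), for numeric attributes; not the authors' implementation.

```python
# k-nearest-neighbor classification with min-max normalization.
import math
from collections import Counter

def min_max(train):
    # Per-attribute minimum and maximum over the training tuples.
    lo = [min(col) for col in zip(*train)]
    hi = [max(col) for col in zip(*train)]
    return lo, hi

def normalize(point, lo, hi):
    # Min-max normalization, applied before computing Eq. 3.
    return [(v - l) / (h - l) if h > l else 0.0
            for v, l, h in zip(point, lo, hi)]

def euclidean(a, b):
    # Eq. 3: dist(X1, X2) = sqrt(sum((x1i - x2i)^2)).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, labels, query, k=3):
    lo, hi = min_max(train)
    train_n = [normalize(p, lo, hi) for p in train]
    q = normalize(query, lo, hi)
    nearest = sorted(range(len(train_n)),
                     key=lambda i: euclidean(train_n[i], q))[:k]
    # Assign the most common class among the k nearest neighbors.
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Example: knn_predict([[1, 2], [2, 3], [8, 9]], ["a", "a", "b"], [1.5, 2.5])
```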
E. Support Vector Machines

Support vector machines (also known as maximum margin classifiers) simultaneously minimize the empirical classification error and maximize the geometric margin [23]; because of this they do not depend on the dimensionality of the feature space and can therefore efficiently handle high-dimensional data [22][23]. They are based on structural risk minimization [23], whose basic idea is to find the hypothesis that guarantees the lowest true error [22]. SVMs also have strong regularization properties, regularization here referring to how well the model generalizes to new data. Support vector machines were designed as a tool for solving supervised learning classification problems [29][30]. An SVM maps the input vectors to a higher-dimensional space where a maximal separating hyperplane is constructed: two parallel hyperplanes are constructed on each side of the hyperplane that separates the data, and the separating hyperplane is the one that maximizes the distance between these two parallel hyperplanes. The assumption is that the larger the margin between the parallel hyperplanes, the better the generalization error of the classifier [23][31].

F. Rule Based Method

The rule-based classification technique uses a collection of "if-then" rules for classifying the dataset [25]. For example, consider the rule R1:

R1: IF Manual Checkup = Pass AND Year = Valid THEN Issue = Yes

The 'IF' part of a rule is called the rule antecedent or precondition, and the 'THEN' part is the rule consequent. The rule above predicts whether a pollution-under-control certificate is issued for a vehicle. If all the attribute tests in the rule (here, manual checkup and year) hold true for a given tuple, we say that the rule is satisfied and that the rule covers the tuple. Two parameters are used to assess a rule R: its coverage and its accuracy [8]. Consider a dataset D, where |D| denotes the number of tuples in D. Let n_covers be the number of tuples covered by rule R and n_correct the number of tuples correctly classified by R. Then

coverage(R) = n_covers / |D|    (Eq. 4)
accuracy(R) = n_correct / n_covers    (Eq. 5)
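A worked check of Eq. 4 and Eq. 5 for rule R1, using the nine tuples of database D from Table I in Section III (the pollution-under-control data the rule refers to); a sketch, not code from the paper.

```python
# Coverage (Eq. 4) and accuracy (Eq. 5) of rule R1 on database D.
D = [  # (Manual Checkup, Year, Issue) columns of Table I, Section III
    ("Pass", "Valid", "Yes"), ("Pass", "Invalid", "No"), ("Fail", "Valid", "No"),
    ("Pass", "Valid", "Yes"), ("Fail", "Invalid", "No"), ("Fail", "Valid", "No"),
    ("Pass", "Valid", "Yes"), ("Pass", "Valid", "Yes"), ("Fail", "Invalid", "No"),
]

# A tuple is covered when the antecedent (MC = Pass AND Year = Valid) holds;
# it is correctly classified when the consequent (Issue = Yes) also holds.
covered = [t for t in D if t[0] == "Pass" and t[1] == "Valid"]
correct = [t for t in covered if t[2] == "Yes"]

print("coverage(R1) = %d/%d" % (len(covered), len(D)))       # 4/9
print("accuracy(R1) = %d/%d" % (len(correct), len(covered))) # 4/4
```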
III. DECISION TREE BASED ALGORITHM

As stated in Section II, here we show the implementation of the decision-tree-based algorithm ID3 (implemented in Java) and calculate the information gain and entropy for a sample database. Further in this section we give a brief description of C4.5 and C5.0.

A. ID3

ID3 (Iterative Dichotomiser 3) is a decision tree learning algorithm used for classifying objects with an iterative inductive approach, and it uses information gain as its attribute selection measure. It employs a greedy top-down search to build the tree that yields the decision rules [16][14]. Let D be a database of tuples and N a node. The attribute with the highest information gain is chosen as the splitting attribute for node N: this attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness, or impurity, in those partitions. Such an approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple, though not necessarily the simplest, tree is found [8]. The expected information needed to classify a tuple in D is given by [8]

Info(D) = − Σ_{i=1..m} p_i log2(p_i)    (Eq. 6)

Info(D) is also known as the entropy of D. Info_A(D) is the expected information required to classify a tuple from D based on the partitioning by attribute A, and is given by [8]

Info_A(D) = Σ_{j=1..v} (|Dj| / |D|) × Info(Dj)    (Eq. 7)

The smaller the expected information still required, the greater the purity of the partitions. Information gain is defined as the difference between the original information requirement and the new requirement [8]:

Gain(A) = Info(D) − Info_A(D)    (Eq. 8)

Experiments evaluating the performance of the algorithm with continuous-valued attributes and missing attribute values reveal that ID3 does not give acceptable results for continuous-valued attributes, while it works well on certain datasets with missing values [14][15].

The entropy and information gain have been calculated for the data shown in Table I below [16]; Fig. 4 is a screenshot of the calculated values (a sketch reproducing this calculation appears at the end of this section).

TABLE I
DATABASE D

Name | Fuel   | Category | Kilometers  | Service | Year    | Manual Checkup (MC) | Issue
Riva | Petrol | Two      | Not Covered | No      | Valid   | Pass | Yes
Sita | Petrol | Two      | Covered     | Yes     | Invalid | Pass | No
Puru | Petrol | Four     | Covered     | No      | Valid   | Fail | No
Riya | Petrol | Four     | Covered     | Yes     | Valid   | Pass | Yes
Neha | Diesel | Four     | Covered     | No      | Invalid | Fail | No
Ram  | Diesel | Three    | Covered     | No      | Valid   | Fail | No
Ekta | Diesel | Three    | Not Covered | No      | Valid   | Pass | Yes
Saya | CNG    | Three    | Not Covered | No      | Valid   | Pass | Yes
Ajay | Petrol | Four     | Covered     | Yes     | Invalid | Fail | No

Fig. 4 Calculated Entropy and Information Gain for Database D

The decision tree for this database, based on the calculated entropy and information gain, is shown in Fig. 5.

Fig. 5 Decision Tree for Database D

B. C4.5

The information gain measure is biased toward tests with many outcomes. C4.5 [24], a successor of ID3, uses an extension of information gain known as gain ratio as its attribute selection measure, which attempts to overcome this bias [8]. When all attributes are binary, the gain ratio criterion has been found to give considerably smaller decision trees [13]. It applies a kind of normalization to information gain using a split information value, defined analogously to Info(D) as

SplitInfo_A(D) = − Σ_{j=1..v} (|Dj| / |D|) log2(|Dj| / |D|)    (Eq. 9)

It differs from information gain, which measures the information with respect to classification acquired based on the same partitioning. The gain ratio is defined as

GainRatio(A) = Gain(A) / SplitInfo_A(D)    (Eq. 10)

The attribute with the maximum gain ratio is selected as the splitting attribute [8].

C. C5.0

C4.5 was superseded in 1997 by a commercial system, See5/C5.0 (C5.0 for short) [17]. C4.5 follows the rules of the ID3 algorithm, and similarly C5.0 follows the rules of C4.5. C5.0 provides feature selection, cross-validation, and reduced-error pruning facilities, and it has many features [5]:
1. A large decision tree can be viewed as a set of rules, which is easy to understand [5].
2. It copes with noise and missing data [5].
3. The problems of overfitting and error pruning are addressed by the C5.0 algorithm [5].
4. The C5.0 classifier can anticipate which attributes are relevant and which are not relevant to the classification [5].
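The paper computes the Section III.A values in Java (shown in Fig. 4); the following Python sketch reproduces the Eq. 6-8 calculation for database D, under the assumption that the Table I transcription above is faithful.

```python
# Entropy (Eq. 6) and information gain (Eq. 7, Eq. 8) for database D.
import math
from collections import Counter

COLS = ["Fuel", "Category", "Kilometers", "Service", "Year", "MC", "Issue"]
ROWS = [
    ("Petrol", "Two",   "Not Covered", "No",  "Valid",   "Pass", "Yes"),
    ("Petrol", "Two",   "Covered",     "Yes", "Invalid", "Pass", "No"),
    ("Petrol", "Four",  "Covered",     "No",  "Valid",   "Fail", "No"),
    ("Petrol", "Four",  "Covered",     "Yes", "Valid",   "Pass", "Yes"),
    ("Diesel", "Four",  "Covered",     "No",  "Invalid", "Fail", "No"),
    ("Diesel", "Three", "Covered",     "No",  "Valid",   "Fail", "No"),
    ("Diesel", "Three", "Not Covered", "No",  "Valid",   "Pass", "Yes"),
    ("CNG",    "Three", "Not Covered", "No",  "Valid",   "Pass", "Yes"),
    ("Petrol", "Four",  "Covered",     "Yes", "Invalid", "Fail", "No"),
]
D = [dict(zip(COLS, row)) for row in ROWS]

def info(rows):
    # Eq. 6, taken over the class attribute "Issue".
    counts = Counter(r["Issue"] for r in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain(rows, attr):
    # Eq. 7 (expected information after splitting on attr), then Eq. 8.
    n = len(rows)
    by_value = Counter(r[attr] for r in rows)
    info_a = sum((m / n) * info([r for r in rows if r[attr] == v])
                 for v, m in by_value.items())
    return info(rows) - info_a

print("Info(D) = %.3f" % info(D))   # 4 Yes / 5 No gives about 0.991
for a in COLS[:-1]:
    print("Gain(%s) = %.3f" % (a, gain(D, a)))
```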
IV. COMPARATIVE ANALYSIS OF CLASSIFICATION ALGORITHMS

In this section we summarize the advantages and disadvantages of the various classification algorithms.

TABLE III
ADVANTAGES AND DISADVANTAGES OF CLASSIFICATION ALGORITHMS

ID3
Advantages: Very simple [21]; easy to implement; quite a simple process; running time increases only linearly with the complexity of the problem.
Disadvantages: Does not guarantee an optimal solution; does not give acceptable results for continuous data and missing data [15]; takes more memory; has a long searching time.

C4.5
Advantages: Avoids overfitting the data [2]; faster than ID3 [2]; more memory-efficient than ID3 [2]; handles missing and continuous attributes [2]; determines how deeply to grow a decision tree; improved computational efficiency.
Disadvantages: Data may be over-fitted or over-classified [21]; empty branches [21]; insignificant branches [21]; susceptible to noise [21].

C5.0
Advantages: Faster than C4.5 [5][18]; uses less memory than C4.5 during ruleset construction [18]; gets similar results with smaller decision trees [5]; supports boosting, which improves the trees and gives more accuracy; C5.0 rulesets are easier to understand [19]; lower error rates on unseen cases [5][18]; solves the problem of overfitting [5].
Disadvantages: For applications with very many cases, C5.0 may crash with a message like "segmentation fault" [19]; the use of case weighting does not guarantee that the classifier will be more accurate for unseen cases with higher weights [19].

Neural Network
Advantages: High tolerance to noisy data; ability to classify untrained patterns; well suited for continuous-valued inputs and outputs; successful on a wide array of real-world data; the algorithms are inherently parallel; techniques have recently been developed for extracting rules from trained neural networks.
Disadvantages: Long training time; requires a number of parameters that are typically best determined empirically; poor interpretability.

K-Nearest Neighbor
Advantages: Easy to understand [7][17]; easy to implement [7][17]; training is very fast [7]; robust to noisy training data [7]; particularly well suited for multimodal classes as well as applications in which an object can have many class labels [17].
Disadvantages: Memory limitation [7]; as a lazy learning algorithm it runs slowly [7]; expensive, particularly for large training sets [17]; it is instance-based, or lazy, in that it stores all of the training samples [7].

Naive Bayesian
Advantages: Simple to construct; requires only a small amount of training data to estimate the parameters necessary for classification; even when the naive Bayes assumptions do not hold, a naive Bayes classifier still often performs surprisingly well in practice; good performance [7].
Disadvantages: Requires a very large number of records to obtain good results [7].

Support Vector Machines
Advantages: Can efficiently handle non-linear data; can handle multi-class problems; by introducing the kernel, SVMs gain flexibility in the choice of the form of the threshold separating, for example, solvent from insolvent companies [26]; no assumptions about the functional form of the transformation [26]; provide good out-of-sample generalization [26]; deliver a unique solution, since the optimality problem is convex [26].
Disadvantages: The marginal contribution of each financial ratio to the score is variable [26]; lack of transparency of results [26]; the choice of the kernel [27]; extension to multi-class problems [28]; long training time [28]; selection of parameters [28].

Rule Based Method
Advantages: Easy for people to understand [39][25]; rule learning systems outperform decision tree learners on many problems [25][40][20].
Disadvantages: When the data contains uncertainty, the algorithm cannot process the uncertainty properly [25].

V. CONCLUSIONS

Data mining is a wide area that integrates techniques from various fields, based on either supervised or unsupervised learning. This paper has reviewed classification, a supervised-learning-based method for mining data patterns. The essential task of the classification process is to classify new, unseen samples correctly. Classification algorithms can be applied to different kinds of datasets, such as patient data, financial data, and student data. Each technique has its own pros and cons, as given in the paper, and the appropriate one can be selected according to the conditions at hand. This paper has surveyed the classification techniques used in data mining and presented a detailed study of the ID3 and C4.5 decision-tree-based algorithms.

REFERENCES

[1] S. Neelamegam, E. Ramaraj, "Classification algorithm in data mining: An overview," International Journal of P2P Network Trends and Technology (IJPTT), Vol. 4, Issue 8, Sep. 2013.
[2] A. S. Galathiya, A. P. Ganatra, C. K. Bhensdadia, "Classification with an improved decision tree algorithm," International Journal of Computer Applications (0975-8887), Vol. 46, No. 23, May 2012.
[3] Nikita Jain, Vishal Srivastava, "Data mining techniques: a survey paper," IJRET: International Journal of Research in Engineering and Technology, eISSN: 2319-1163, pISSN: 2321-7308.
[4] Raj Kumar, Rajesh Verma, "Classification algorithms for data mining: A survey," International Journal of Innovations in Engineering and Technology (IJIET), Vol. 1.
[5] Rutvija Pandya, Jayati Pandya, "C5.0 algorithm to improved decision tree with feature selection and reduced error pruning," International Journal of Computer Applications (0975-8887), Vol. 117, No. 16, May 2015.
[6] Thair Nu Phyu, "Survey of classification techniques in data mining," Proceedings of the International MultiConference of Engineers and Computer Scientists 2009, Vol. I, IMECS 2009, March 18-20, 2009, Hong Kong.
[7] S. Archana, K. Elangovan, "Survey of classification techniques in data mining," International Journal of Computer Science and Mobile Applications, Vol. 2, Issue 2, Feb. 2014, pp. 65-71, ISSN: 2321-8363.
[8] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition.
[9] Ed Colet, "Clustering and classification: Data mining approaches."
[10] Kavitha Murugeshan, Neeraj RK, "Discovering patterns to produce effective output through text mining using naïve Bayesian algorithm," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Vol. 2, Issue 6, May 2013.
[11] Dorina Kabakchieva, "Predicting student performance by using data mining methods for classification," Cybernetics and Information Technologies, Bulgarian Academy of Sciences, Vol. 13, No. 1, Sofia, 2013, Print ISSN: 1311-9702, Online ISSN: 1314-4081, DOI: 10.2478/cait-2013-0006.
[12] "Decision trees, what are they?" in Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner.
[13] J. R. Quinlan, "Induction of decision trees," Machine Learning 1: 81-106, 1986, Kluwer Academic Publishers, Boston.
[14] Anand Bahety, "Extension and evaluation of ID3 decision tree algorithm," University of Maryland, College Park.
[15] Rupali Bhardwaj, Sonia Vatta, "Implementation of ID3 algorithm," International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, Issue 6, June 2013, ISSN: 2277 128X.
[16] Rupali Bhardwaj, Sonia Vatta, "Issuing of pollution under control certificate using ID3 algorithm," International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, Issue 5, May 2013, ISSN: 2277 128X.
[17] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, Dan Steinberg, "Top 10 algorithms in data mining," Knowledge and Information Systems (2008) 14:1-37, DOI: 10.1007/s10115-007-0114-2.
[18] The RuleQuest Research website. [Online]. Available: http://rulequest.com/see5-comparison.html
[19] The RuleQuest Research website. [Online]. Available: http://www.rulequest.com/see5-unix.html
[20] S. M. Weiss, N. Indurkhya, "Reduced complexity rule induction," in IJCAI, 1991, pp. 678-684.
[21] Sonia Singh, Priyanka Gupta, "Comparative study of ID3, CART and C4.5 decision tree algorithm: a survey," International Journal of Advanced Information Science and Technology (IJAIST), Vol. 27, No. 27, July 2014.
[22] Thorsten Joachims, "Text categorization with support vector machines: Learning with many relevant features," 10th European Conference on Machine Learning, Chemnitz, Germany, LNCS Vol. 1398, April 21-23, 1998, pp. 137-142.
[23] Durgesh K. Srivastava, Lekha Bhambhu, "Data classification using support vector machine," Journal of Theoretical and Applied Information Technology, 2005-2009 JATIT.
[24] J. R. Quinlan, "Improved use of continuous attributes in C4.5," Journal of Artificial Intelligence Research, Vol. 4 (1996), pp. 77-90.
[25] Biao Qin, Yuni Xia, Sunil Prabhakar, Yicheng Tu, "A rule-based classification algorithm for uncertain data," IEEE International Conference on Data Engineering.
[26] Laura Auria, Rouslan A. Moro, "Support vector machines (SVM) as a technique for solvency analysis," Berlin, August 2008.
[27] Christopher J. C. Burges, "A tutorial on support vector machines for pattern recognition," Kluwer Academic Publishers, Boston.
[28] Shigeo Abe, Support Vector Machines for Pattern Classification.
[29] Himani Bhavsar, Mahesh H. Panchal, "A review on support vector machine for data classification," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Vol. 1, Issue 10, December 2012.
[30] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, 2000.
[31] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[32] J. R. Quinlan, "Improved use of continuous attributes in C4.5," Journal of Artificial Intelligence Research 4 (1996), pp. 77-90, submitted 10/95, published 3/96.
[33] T. G. Dietterich, "Ensemble methods in machine learning," Lecture Notes in Computer Science, Vol. 1857, pp. 1-15, 2000.
[34] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
[35] P. Langley, W. Iba, K. Thompson, "An analysis of Bayesian classifiers," in National Conference on Artificial Intelligence, 1992, pp. 223-228.
[36] R. Andrews, J. Diederich, A. Tickle, "A survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, Vol. 8, No. 6, pp. 373-389, 1995.
[37] W. W. Cohen, "Fast effective rule induction," in Proc. of the 12th Intl. Conf. on Machine Learning, 1995, pp. 115-123.
[38] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and Regression Trees, Belmont, CA: Wadsworth, 1984.
[39] J. Catlett, "Megainduction: A test flight," in ML, 1991, pp. 596-599.
[40] G. Pagallo, D. Haussler, "Boolean feature discovery in empirical learning," Machine Learning, Vol. 5, pp. 71-99, 1990.