CST: NEW SLICING TECHNIQUES TO IMPROVE CLASSIFICATION ACCURACY

OMAR A. A. SHIBA
Computer Department, Faculty of Science, University of Sebha, Libya
E-mail: [email protected]

ABSTRACT

The classification problem is one of the core topics in data mining. Within the artificial intelligence community, classification has been demonstrated to be an important modality in diverse problem-solving situations. The goal of this paper is to improve classification accuracy in data mining. The paper achieves this goal by introducing a new algorithm based on slicing techniques, called the Case Slicing Technique (CST). The idea is based on slicing cases with respect to a slicing criterion. The paper also compares the proposed approach against other approaches to the classification problem, namely ID3, K-NN and Naive Bayes, and presents experimental results for CST on three real-world datasets. The main results show that the proposed approach is a promising classification method for decision-making in classification problems.

Keywords: Data Mining, Case-Slicing Technique, Case Retrieval, Classification Accuracy.

1. INTRODUCTION

One of the problems addressed by machine learning is data classification. Since the 1960s, many algorithms for data classification have been proposed [1]. The classification problem is defined as follows. The input data, referred to as the training set, contains a plurality of records, each of which has multiple attributes or features. Each example in the training set is tagged with a class label, which may be either categorical or quantitative; classification with a quantitative class label is referred to as the regression-modeling problem. The training set is used to build a model of the classification attribute based upon the other attributes, and this model is then used to predict the value of the class label for the test set. Well-known techniques for classification include decision trees, such as the Induction of Decision Tree algorithm (ID3) [2] [3], K-Nearest Neighbor techniques [4] [5], and Bayesian learners such as Naive Bayes (NB) [6].

This paper investigates whether a slicing technique can be used successfully for the classification problem, and how the slicing technique compares as a classification approach against the ID3, K-NN and Naive Bayes approaches. The paper is organized as follows: the next section presents the selected classification algorithms. Section 3 describes the case slicing technique. Section 4 presents experimental results that compare the performance of the algorithms. Finally, the conclusion is presented in Section 5.

2. SELECTED CLASSIFICATION ALGORITHMS

This section gives only a brief description of the selected classification methods; more information can be found in [2], [4] and [6] for ID3, K-NN and NB respectively.

2.1 INDUCTION OF DECISION TREE ALGORITHM (ID3)

ID3 is an algorithm introduced by Quinlan for inducing a classification model, also called a decision tree, from data. We are given a set of records, each with the same structure, consisting of a number of attribute/value pairs. One of these attributes represents the category of the record. The problem is to determine a decision tree that, on the basis of answers to questions about the non-category attributes, correctly predicts the value of the category attribute. Usually the category attribute takes only the values {true, false}, {success, failure}, or something equivalent; in any case, one of its values will mean failure [2] [3].

The basic ideas behind ID3 are the following. In the decision tree, each node corresponds to a non-categorical attribute and each arc to a possible value of that attribute; a leaf of the tree specifies the expected value of the categorical attribute for the records described by the path from the root to that leaf. (This defines what a decision tree is.) At each node, the tree should branch on the non-categorical attribute that is most informative among the attributes not yet considered on the path from the root. (This establishes what a "good" decision tree is.) Entropy is used to measure how informative a node is. (This defines what we mean by "good"; the notion was introduced by Claude Shannon in information theory [7].)
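To make the entropy-based attribute selection concrete, the following sketch computes the entropy of a labelled record set and the information gain of splitting on one attribute, which is the quantity ID3 maximizes at each node. This is an illustrative sketch, not code from the paper; the function and variable names are our own.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        total = len(labels)
        counts = Counter(labels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(records, labels, attribute):
        """Entropy reduction obtained by splitting on `attribute`.
        `records` is a list of dicts mapping attribute names to values."""
        partitions = {}
        for record, label in zip(records, labels):
            partitions.setdefault(record[attribute], []).append(label)
        remainder = sum((len(p) / len(labels)) * entropy(p)
                        for p in partitions.values())
        return entropy(labels) - remainder

    # ID3 branches on the attribute with the highest information gain:
    # best = max(attributes, key=lambda a: information_gain(records, labels, a))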
2.2 K-NEAREST NEIGHBOR (K-NN)

The basic idea of the K-Nearest Neighbor algorithm (K-NN) is to compare every attribute of every case in the set of similar cases with the corresponding attribute of the input case. A numeric function is used to compute the value of each comparison; the algorithm then selects the case with the highest comparison value and retrieves it [4]. K-NN assumes each case x = {x_1, x_2, ..., x_n, x_c} is defined by a set of n (numeric or symbolic) features, where x_c is x's class value. Given a query q and a case library L, K-NN retrieves the set of q's k most similar (i.e., least distant) cases in L and predicts their weighted-majority class as the class of q [5]. Distance in K-NN is defined as in equation (1) below (a sketch of this distance computation is given after Section 2.3):

    distance(x, q) = \sqrt{ \sum_{f=1}^{n} w_f \cdot difference(x_f, q_f)^2 }    (1)

where w_f is the parameterized weight value assigned to feature f, as in equation (2):

    w_f = P(C \mid i_f)    (2)

That is, the weight of feature f for a class C is the conditional probability that a case is a member of C given i_f, the value of f, where P(C \mid i_f) is defined in equation (3); the difference between x and q is calculated as in equation (4):

    P(C \mid i_f) = \frac{ |\{ \text{instances containing } i_f \text{ with class } C \}| }{ |\{ \text{instances containing } i_f \}| }    (3)

    difference(x_f, q_f) = \begin{cases} |x_f - q_f| & \text{if } f \text{ is numeric} \\ 0 & \text{if } f \text{ is symbolic and } x_f = q_f \\ 1 & \text{otherwise} \end{cases}    (4)

In its unweighted form, K-NN assigns equal weights to all features (i.e., w_f = 1 for every f).

2.3 NAÏVE BAYES (NB)

The estimates of the probability masses are used as input for a Naïve Bayes classifier. This classifier simply computes the conditional probabilities of the different classes given the values of the attributes, and then selects the class with the highest conditional probability. If an instance is described by n attributes a_i (i = 1, ..., n), then the class v assigned to the instance, out of the set of possible classes V, is chosen according to the maximum a posteriori (MAP) criterion, so the Naive Bayes classifier can be defined as in equation (5):

    v = \arg\max_{v_j \in V} \; p(v_j) \prod_{i=1}^{n} p(a_i \mid v_j)    (5)

The conditional probabilities in this formula are obtained from estimates of the probability mass function computed on the training data. This Bayes classifier minimizes the probability of classification error under the assumption that the attribute values are independent given the class [6].
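As referenced in Section 2.2, the following sketch implements the distance of equations (1)-(4) and a majority-vote retrieval step. It is an illustrative sketch, not code from the paper; it uses the unweighted variant (w_f = 1) by default, and the names are our own.

    import math

    def difference(x_f, q_f):
        """Per-feature difference, as in equation (4)."""
        if isinstance(x_f, (int, float)) and isinstance(q_f, (int, float)):
            return abs(x_f - q_f)          # numeric feature
        return 0.0 if x_f == q_f else 1.0  # symbolic feature

    def distance(x, q, weights=None):
        """Weighted distance between cases x and q, as in equation (1).
        Defaults to the unweighted variant w_f = 1 for every feature."""
        if weights is None:
            weights = [1.0] * len(x)
        return math.sqrt(sum(w * difference(xf, qf) ** 2
                             for w, xf, qf in zip(weights, x, q)))

    def knn_classify(query, library, k=3):
        """Predict the majority class among the k least-distant cases.
        `library` is a list of (features, class_label) pairs."""
        neighbours = sorted(library, key=lambda case: distance(case[0], query))[:k]
        votes = [label for _, label in neighbours]
        return max(set(votes), key=votes.count)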
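Likewise, a minimal sketch of the MAP rule in equation (5) for categorical attributes, with p(v) and p(a_i | v) estimated by frequency counts on the training data. The Laplace smoothing used here to avoid zero probabilities is our own implementation choice, not something specified in the paper.

    from collections import Counter, defaultdict

    def train_naive_bayes(records, labels):
        """Count the statistics needed for p(v) and p(a_i | v)."""
        class_counts = Counter(labels)
        value_counts = defaultdict(Counter)  # value_counts[(i, v)][a_i] = count
        for record, label in zip(records, labels):
            for i, value in enumerate(record):
                value_counts[(i, label)][value] += 1
        return class_counts, value_counts

    def nb_classify(record, class_counts, value_counts, alpha=1.0):
        """Return arg max_v p(v) * prod_i p(a_i | v), as in equation (5)."""
        total = sum(class_counts.values())
        best_class, best_score = None, -1.0
        for label, count in class_counts.items():
            score = count / total  # prior p(v)
            for i, value in enumerate(record):
                seen = value_counts[(i, label)]
                # Laplace-smoothed estimate of p(a_i | v)
                score *= (seen[value] + alpha) / (count + alpha * (len(seen) + 1))
            if score > best_score:
                best_class, best_score = label, score
        return best_class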
3. THE CASE SLICING ALGORITHM

Conceptually, the proposed method is a variation of the nearest neighbor algorithms and is called the Case Slicing Technique (CST). It compares new cases to the training cases in the data file, computing the similarity between the new cases and the training cases in order to classify the new cases. The proposed method is a classification technique based on slicing [9] [10] [11]. Case slicing means that we are interested in automatically obtaining the portion (the features) of the case responsible for specific parts of the solution of the case at hand. By slicing the case with respect to its important features, we obtain a new case with a small number of features, or with only the important features. The proposed approach consists of a database together with three calculation modules:

- a discretization module, to compute discrete values for continuous values;
- a feature-weighting module, to assign a weight to each feature;
- a distance-computation module, to compute the distance between each pair of cases.

3.1 SLICING TECHNIQUE

The objective of the slicing technique is to optimize the similarity matching so as to achieve the best classification results. The proposed approach adapts the slicing technique that has been used for programming languages, slicing the cases by selecting a subset of features with respect to the selected slicing criterion. The case classification algorithm based on slicing is shown in Figure 1 below; a runnable sketch of the same pipeline is given after Section 3.1.1.

Algorithm: Algorithm for Case Classification
Input: User's input problem specification
Output: Classified case
Begin
  While True do
    Discrete_Continuous_Values()          {convert continuous values into discrete values}
    Assign_Weights_Cases()                {assign weights to each feature in each case}
    Slicing_Cases_wrt_Slicing_Criterion() {slice the cases with respect to the important features}
    Calculate_Distance()                  {find the distance between each two cases}
    Closer_Case_Searching()               {find the closest case to the case at hand}
    Return_Classified_Case()              {assign a class label to the new case}
  Enddo
End.

Fig. 1. Case classification algorithm based on slicing

3.1.1 A FORMAL DESCRIPTION OF THE CASE SLICING TECHNIQUE

This section gives a formal description of the Case Slicing Technique in a basic version, allowing for a detailed investigation of the approach. Let

S = {C_1, C_2, C_3, ..., C_n} be the set of cases in the case base, with S ≠ ∅;
C_i = {f_1, f_2, f_3, ..., f_n}, where n is the number of features in C_i;
λ = {C_s | C_s is a sliced case}, i.e. the set of all cases that contain one or more important feature(s);
I = {if_1, if_2, ..., if_n}, where n is the number of important features in I.

Then I ⊆ C_i ⊆ S and I ⊆ C_s ⊆ λ.
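As referenced in Section 3.1, the following is a minimal sketch of the Figure 1 pipeline. The equal-width discretization, the mismatch-count distance, and all names are our own assumptions; the paper does not fix these implementation details.

    def discretize(value, low, high, bins=5):
        """Map a continuous value onto a discrete bin (equal-width binning;
        the paper does not specify its discretization scheme)."""
        if high == low:
            return 0
        ratio = (value - low) / (high - low)
        return min(int(ratio * bins), bins - 1)

    def slice_case(features, important_features):
        """Keep only the features named by the slicing criterion."""
        return [features[i] for i in important_features]

    def cst_classify(query, case_base, important_features):
        """Classify `query` by the class of the closest sliced case.
        `case_base` is a list of (features, class_label) pairs with
        already-discretized features; the distance counts mismatching
        sliced features, standing in for the weighted distance."""
        sliced_query = slice_case(query, important_features)

        def dist(case):
            sliced = slice_case(case[0], important_features)
            return sum(1 for a, b in zip(sliced, sliced_query) if a != b)

        closest = min(case_base, key=dist)
        return closest[1]

    # Example: features 0 and 2 form the slicing criterion.
    cases = [([1, 0, 3], "sick"), ([0, 1, 3], "healthy"), ([1, 1, 0], "sick")]
    print(cst_classify([1, 0, 0], cases, important_features=[0, 2]))  # -> sick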
4. EXPERIMENTAL RESULTS

This section presents the results of several practical experiments that examine the performance of the proposed approach and of the selected classification algorithms on real-world problems.

4.1 SELECTED DATASETS

Three real-world datasets, widely used in the machine-learning field, were used for the evaluation of the case slicing technique. The three medical datasets, Cleveland Heart Disease (CLEV), Breast Cancer (BCO) and Hepatitis Domain (HEPA), were chosen from the UCI Machine Learning Repositories and Domain Theories [12]. Table 1 presents the main characteristics of these datasets, where B, C and D denote Boolean, continuous and discrete attributes respectively.

Table 1. Characteristics of the selected datasets

Dataset   No. of records   Type & no. of attributes   No. of classes
CLEV      303              9D, 6C (15)                2
BCO       699              13B, 6C (19)               2
HEPA      155              13B, 6C (19)               2

4.2 EMPIRICAL RESULTS

We evaluated the performance of the case slicing technique by comparing it against the ID3, Naive Bayes and K-Nearest-Neighbor classifiers on the three selected datasets. These datasets are a good choice for testing and evaluating the slicing technique because they contain a good mixture of continuous, discrete and Boolean features. In all the experiments reported here we used 10-fold cross-validation, which consists of randomly dividing the data into 10 equally sized subgroups and performing ten different experiments. We separated one group, along with its original labels, as the validation set; another group was taken as the starting training set; the remainder of the data was taken as the test set. Each experiment consists of ten runs of this procedure, and the overall averages are the results reported here (a sketch of such a cross-validation loop is given at the end of this section). The criterion for choosing the best classification approach is the highest classification accuracy. The results, given in Table 2, list the classification accuracies achieved by each approach on each of the datasets.

Table 2. Classification accuracy (%) achieved by the different classification algorithms

Method   CLEV    BCO     HEPA
ID3      71.20   94.30   67.74
NB       83.40   96.40   86.30
K-NN     71.20   97.10   92.90
CST      96.00   99.30   97.00
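As referenced above, here is a minimal sketch of a 10-fold cross-validation loop. It implements a plain rotation in which each fold is held out once for testing while the other nine train the model; the paper's additional split into a validation group and a starting training group is not reproduced, so treat the details as assumptions.

    import random

    def ten_fold_accuracy(cases, labels, train, classify):
        """Average test accuracy over 10 random, equally sized folds.
        `train(cases, labels)` returns a model; `classify(model, case)`
        returns a predicted class label."""
        indices = list(range(len(cases)))
        random.shuffle(indices)
        folds = [indices[i::10] for i in range(10)]
        scores = []
        for fold in folds:
            held_out = set(fold)
            train_idx = [i for i in indices if i not in held_out]
            model = train([cases[i] for i in train_idx],
                          [labels[i] for i in train_idx])
            correct = sum(classify(model, cases[i]) == labels[i] for i in fold)
            scores.append(correct / len(fold))
        return sum(scores) / len(scores)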
5. CONCLUSION

This paper has presented and discussed the Case Slicing Technique (CST), a new slicing-based approach to improving the classification task in data mining. CST was supported with experiments on three datasets, and the experiments show that using CST indeed improves classification accuracy. The paper also gave a brief description of a number of common classification algorithms that are used either in data mining or in AI in general, and presented a comparison between the proposed method and the other selected classification algorithms on medical domains. The proposed technique achieved competitive results, with a very high classification accuracy.

REFERENCES

[1] Thamar, S. and Olac, F., "Improving Classifier Accuracy Using Unlabeled Data", in Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2001), Marbella, Spain, Sept. 2001.
[2] Quinlan, J. R., "Induction of Decision Trees", Machine Learning, Vol. 1, No. 1, pp. 81-106, 1986.
[3] Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, CA, 1993.
[4] Xiaoli, Q., A Case-Based Reasoning System for Bearing Design, Master's thesis, Drexel University, June 1999.
[5] Wettschereck, D. and Aha, D. W., "Weighting Features", in Proceedings of the 1st International Conference on Case-Based Reasoning (ICCBR-95), pp. 347-358, 1995.
[6] Mitchell, T. M., Machine Learning, McGraw-Hill, New York, 1997.
[7] Cover, T. M. and Thomas, J. A., Elements of Information Theory, Wiley, 1991.
[8] Thamar, S. and Olac, F., "Improving Classification Accuracy of Large Test Sets Using the Ordered Classification Algorithm", in Proceedings of IBERAMIA-02, Lecture Notes in Computer Science, 2002.
[9] Weiser, M., "Program Slicing", IEEE Transactions on Software Engineering, Vol. 10, No. 4, pp. 352-357, 1984.
[10] Vasconcelos, W. W., "Slicing Knowledge-Based Systems: Techniques and Applications", Knowledge-Based Systems Journal, Elsevier, Vol. 13, pp. 177-198, 2000.
[11] Omar A. Shiba, "Using Code Slicing Technique To Improve Case Classification: A Suggested Case Slicing Approach", International Journal of Applied Computing & Informatics, Vol. 2, No. 1, pp. 24-37, 2003.
[12] Murphy, P. M., UCI Repositories of Machine Learning and Domain Theories [online], University of California, Irvine, 1996. Available: http://www.isc.uci.edu/~mlearn/MLRepository.html [accessed 12 Nov. 2001].