Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 4, NO. 3, AUGUST 2013 129 Mining Complex Data Streams: Discretization, Attribute Selection and Classification Dewan Md. Farid and Chowdhury Mofizur Rahman Department of Computer Science and Engineering, United International University, Dhaka-1209, Bangladesh Email: [email protected] Abstract—Due to the large volume of data set as well as complex and dynamic properties of data instances, several data mining algorithms have been applied for mining complex data streams in the last decades. Now a day, knowledge extraction from data streams is getting more complex because the structure of the data instance does not match the attribute values when considering the tabulated data, texts, web, images or videos etc. In this paper, we address some difficulties of mining complex data streams such as dealing with continuous attributes, input attribute selection, and classifier construction. The proposed discretization algorithm finds the possible cut points in continuous attributes using information gain heuristic and naïve Bayesian classifier that can separate the class distributions. We evaluate the proposed algorithms on several benchmark data sets from UCI machine learning repository. The experimental results demonstrate that the proposed method improves the quality of discretization of continuous attributes and scales up the classification accuracy for different types of classification problem. Index Terms—cut point, contradictory example, data stream, data mining, interval border, redundant attribute I. INTRODUCTION Mining complex data stream is the process of extracting knowledge from large volume of data set that come from real life applications. In data stream classification technique, we need to consider many issues such as discretization of continuous attributes, important input attributes selection from training dataset, removing contradictory examples from training dataset, and classifier construction. The contradictory examples appear more than once in the training data set with different class labels. It confuses the learning algorithms, so these examples should be avoided or labeled correctly before learning. A. Motivation To develop highly adaptive system that allows selflearning from streaming data (collected in real-time sensory data, video, speech, advanced industrial applications, intrusion detection etc) is becoming a Manuscript received October 17, 2012; accepted March 3, 2013. Corresponding author: Dewan Md. Farid Email: [email protected] ©2013 ACADEMY PUBLISHER doi:10.4304/jait.4.3.129-135 challenging research issue for intelligent computational researchers. Data mining is the process of analyzing data from large stores of data to discover patterns and summarizing it into useful information. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD). In classification problems of supervised learning most of the learning algorithms only deal with the discretized or symbolic attribute values. So it is necessary to transform the continuous attribute into discretized attribute [1]-[4]. The discretized intervals can be treated in a similar way to nominal values during learning and classification. The classification accuracy of data mining algorithm is depending on the quality of discretization of continuous attributes. This paper significantly extends our previous work on supervised discretization of continuous attributes in data mining [29] in several ways. First, in our previous work, we did not consider the attribute selection process. Second, we have proposed an algorithm for handling the contradictory training instances in this paper. Third, more discussions are added for improved readability, clarity, and analytical enrichment. The discretization algorithm finds the possible cut points in continuous attributes that can separate the class distributions, and then considers the best cut point as an interval border with information gain heuristic and naïve Bayesian classifier. We have successfully tested the proposed algorithms on a number of benchmark problems from UCI machine learning repository. The proposed discretization algorithm can also select the redundant attributes from training dataset and also improve the data stream classification accuracy. B. Dealing with Continuous Attributes The objective of dealing with continuous attribute is to discretize the continuous attribute into a number of intervals (i.e., real number or integers). Ideally, each interval of the discretization should contain only the data instances that belong to the same class. It is very important to find the right places to set up the interval borders [5]. The simplest technique is to place the interval borders of continuous attribute between each adjacent pair of attribute values that are not classified into the same class. Suppose the pair of adjacent values on attribute Ai are Ai1 and Ai2, “A = (Ai1+Ai1)/2” can be taken as an interval border. 130 JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 4, NO. 3, AUGUST 2013 Existing discretization methods partition the attribute range into two or several intervals using a single or a set of cut points, which are categorized into top-down or bottom-up approach. Top-down discretization starts with a single interval that encompasses the entire value range, and then repeatedly splits it into sub-intervals until some stopping criterion is satisfied. It gives a list of k-1 boundary points. From each boundary point, a binary partition is built by splitting the population at this point. The point which optimizes a given criterion is then retained as the cut point. The process stops if no improvement can be done. Bottom-up discretization starts with each value in a separate interval, and then repeatedly merges adjacent intervals until a stopping criterion is satisfied. The information gain heuristic and naïve Bayesian classifier are well known discretization methods that enhance the performance of a classifier [6]-[9]. The information gain technique finds the most informative border to split the values of the continuous attribute. The maximum gain value always considers the cut point (the midpoint) between two attribute values of different classes. In C4.5 algorithm, each of the possible cut point is not the midpoint between the two nearest values, rather than the greatest value in the entire dataset that does not exceed the midpoint. The naïve Bayesian classifier also uses for discretization of continuous attributes by constructing a probability curve for each class in the dataset. When the curves for every class have been constructed, interval borders are placed on each of those points where the leading curves are different from its two sides. A few other methods such as equal distance division, grouping, k-nearest neighbors, and fuzzy borders are also applied for dealing with continuous attributes. C. Input Attribute Selection Effective and useful input attributes selection from the training dataset is a form of search. Input attribute selection involves the selection of a subset of attributes d from a total of D original attributes of training dataset, based on a given optimization principle that improves the performance of classifier. Ideally, attribute selection methods search through the subsets of attributes, and try to find the best ones among the competing 2N candidate subsets according to some evaluation function. In complex classification domains, input attribute selection is very important, because irrelevant and redundant attributes may lead to complex classification model as well as reduce the classification accuracy [10]-[16]. Sometimes, input attributes of training dataset may contain false correlations, which hamper the classification process. Some attributes in the training dataset may be redundant, because the information they add is contained in other attributes. Also, some extra attributes can increase the computational time, and can have impact on the classification accuracy. D. Classification Classification maps data into predefined groups or classes. It is often referred to as supervised learning in ©2013 ACADEMY PUBLISHER data mining, because the classes are determined before examining the data instances. Classification creates a function from training dataset. The training dataset consist of number of attributes, and desired output. The output of the function can be a continuous value, or can predict a class label of the input object. The task of the classification is to predict the value of the function for any valid input object after having seen only a small number of training examples [30]-[31]. E. Structure of the Paper The remainder of the paper is organized as follows. Section II describes the related works, ID3 classifier, and naïve Bayesian classifier. Section III presents the proposed algorithms. Section IV describes the experimental results based on a number of widely used benchmark datasets from the UCI machine learning repository [17]. Finally, Section V presents our conclusions with future works. II. STATE OF THE ART The decision tree (DT) and naïve Bayesian (NB) classifiers are powerful and popular tools for classification problem in supervised learning, which are widely used in many applications in the fields of data mining, information retrieval, image processing, bioinformatics, stock prediction, and weather forecasting etc. DT is easy to implement and requires little prior knowledge. In DT the successive division of the set of training examples proceeds until all the subsets consists of examples of a single class. DT can be constructed from training dataset with many attributes. In NB classifier the prior and conditional probabilities are calculated from a given training dataset and then these probabilities are used to classify the known or unknown examples. A. Related Works In 1993, Fayyad and Irani [18] used an information criterion called minimum description length principle cut (MDLPC) based on information gain for discretization of continuous attributes. Also in 1993, Van de Merckt [19] proposed a criterion that took into account the homogeneity of the classes and also the point density. In 1995, Dougherty et al. [20] proposed supervised and unsupervised methods for discretization of continuous attributes by using entropy based and statistics based methods to find the relationship between the intervals and the classes. Supervised methods are only applicable when the datasets are divided into classes, which refer to the class information when selecting discretization cut points, but unsupervised methods do not use the class information. In 1995, Pfahringer [21] used entropy to select a large number of candidate splitting points and employed a best first search with a minimum description length heuristic to determine a good discretization. In 1998, Perner and Trautzsch [22] applied decision tree learning for discretization of continuous attributes, which discretized the attribute according to class values were calculated. JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 4, NO. 3, AUGUST 2013 In 2000, Bay introduced univariate method to discretize an attribute without reference to attributes other than the class, which considered relationships among attributes during discretization [23]. In 2000, Hsu et al. [24] used mapping function to discretize continuous attributes as it was needed during classification time. In 2001, Ishibuchi et al [25] applied fuzzy discretization that created a fuzzy mapping function using an initial primary method and then used other primary methods to adjust the initial cut points. In 2005, Boulle [26] used bottom-up discretization to optimize naïve Bayes’ classifier. It used a standard discretization model distributed according to a three-stage prior, which is Bayes optimal for a given set of examples to be discretized. B. Information Gain Heuristic The ID3 algorithm applies information gain heuristic for attribute selection from training dataset [27]. The attribute with the highest information gain value is chosen as the splitting attribute for node N in decision tree. This attribute minimizes the information needed to classify the examples in the resulting partitions and reflects the least randomness or “impurity” in these partitions. The expected information needed to classify an example in training dataset D is given by m Info( D) = −∑ pi log 2 ( pi ) (1) i Where pi is the probability that an arbitrary example in training dataset D belongs to class Ci and is estimated by |Ci,D|/|D|. A log function to the base 2 is used, because the information is encoded in bits. Info(D) is just the average amount of information needed to identify the class label of an example in dataset D. Partitioning (e.g., where a partition m a y contain a c o l l e c t i o n of examples from different classes rather than f r o m a single class) needs in following information to produce an exact classification of the examples 131 where i = 1,…,n-1 is a possible cut point, if Ai and Ai+1 have been taken by different class values in the training dataset. The information gain heuristic check each of the possible cut points and find the best splitting cut point. It’s a top-down discretization approach. It produces very large number of interval borders, if the attribute is not very informative. C. Naïve Bayesian Classifier The Naïve Bayesian (NB) classifier is a simple probabilistic classifier [28]. NB classifier is based on probability models that incorporate strong independence assumptions, which often have no bearing in reality, hence are (deliberately) naïve. A more descriptive term for the underlying probability model would be independent feature model. Furthermore the probability model can be derived using Bayes’ Theorem. NB classifier estimates the class-conditional probability by assuming that the attributes are conditionally independent, given the class label c. The conditional independence assumption can be formally stated as follows: n P( A | C = c) = ∏ P( Ai | C = c) Where each attribute set A = {A1,A2,….,An} consists of n attribute values. With the conditional independence assumption, instead of computing the class-conditional probability for every combination of A, only estimate the conditional probability of each Ai, given C. The latter approach is more practical because it does not require a very large training set to obtain a good estimate of the probability. To classify a test example, the Naïve Bayesian classifier computes the posterior probability for each class C. n P(C | A) = P(C )∏ P( Ai | C ) i =1 P( A) v | Di | * Info( Di ) j =1 | D | InfoA ( D) = −∑ (3) The information gain heuristic adopted in ID3 classifier can be used to find the most informative border to split the value domain of the continuous attribute, when the continuous attribute values are in ascending order. The maximum information gain is always considered at a cut point or the midpoint between the values taken by two examples of different classes. Each attribute value of the formula “A = (Ai +Ai+1)/2” ©2013 ACADEMY PUBLISHER (5) (2) The term |Dj|/|D| acts as the weight of the jth partition. InfoA(D) is the expected information required to classify an example from training dataset D based on the partitioning by A. The information gain is defined as the difference between the original information requirement and the new requirement, that is, Gain( A) = Info( D) − Info A ( D) (4) i =1 Since P(A) is fixed for every A, it is sufficient to choose the class that maximizes the numerator term, n P(C )∏ P( Ai | C ) (6) i =1 The naïve Bayesian classifier has several advantages. It is easy to use, and unlike other classification approaches, only one scan of the training data is required. The naïve Bayesian classifier can easily handle missing attribute values by simply omitting the probability when calculating the likelihoods of membership in each class. According to Bayesian formula: 132 JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 4, NO. 3, AUGUST 2013 P(C j | A) = P( A | Cj ) P(C j ) m ∑ P( A | C ) P(C ) k =1 k follows: (7) k Where P(Cj|A) is the probability of an example belonging to class Cj. If the example takes the continuous attributes then the probability P(Cj|A) of the example taking attribute value if it is classified in the class Cj. Given P(Cj) and P(Cj|A), a probability curve can be constructed for each class Cj: f j ( A) = P( A | C j ) P(C j ) (8) When the curves for every class have been constructed, the interval borders are placed on each of those points where the leading curves are different on its two sides. III. PROPOSED METHODS In this section, we introduce algorithms for discretization of continuous attributes and handling the contradictory examples in training data set. The proposed discretization algorithm improved the quality of discretization of continuous attribute, which scaled up the classification rates in classification problems. A. Discretization Algorithm Dealing with a continuous domain means discretization of the numerical domain into a certain number of intervals. The discretized intervals can be treated in a similar way to nominal values during induction and deduction. This section presents the algorithm that produce remarkably accurate rules for difficult problem domains, and also the problems contain continuous attributes. Given a training data set, N indicates the number of the examples in the training set, C = {C1, C2,…,Cm}, is the number of the classes that the examples in training set are classified into a particular class Cj, A= {A1, A2,…,An} is the number of attributes used to describe each example in training set, and each attribute Ai contains the following attribute values {Ai1, Ai2,…,Aih}. For each continuous attribute Ai in training dataset, sort the attribute values {Ai1, Ai2,…,Aih} in ascending order. Then find the minimum (Min) and maximum (Max) attribute value for each class value Cj in training dataset and select the cut points in the attributes values based on the Min and Max values of each class Cj. Next find the information gain value and probability P(Cj|A) value on each cut point, and select the best cut point with maximum values of information gain and probability. This cut point is selected as an interval border. If the cut point does not satisfy maximum information gain and probability value, then select the next cut point where information gain value and probability value are maximum at the same point. The main procedure of proposed algorithm is described as ©2013 ACADEMY PUBLISHER Algorithm I: Discretization of continuous attributes. Input: N, number of training examples. Ai, continuous attributes. Cj, class values in training dataset. Output: Interval borders in Ai. Procedure: 1. for each continuous attribute Ai in training dataset do 2. Sort the values of continuous attribute Ai in ascending order. 3. for each class Cj in training dataset do 4. Find the minimum (Min) attribute value of Ai for Cj. 5. Find the maximum (Max) attribute value of Ai for Cj. 6. endfor 7. Find the cut points in the continuous attributes values based on the Min and Max values of each class Cj. 8. Find the information gain on each cut point, and select the cut point with maximum information gain value. 9. Find the probability P(Cj|A) on each cut point and select the cut point with maximum probability value. 10. if both the cut point using the maximum information gain value and the maximum probability value is same then it can be taken as an interval border else consider the next cut point, where information gain value and probability value are maximum at the same point. 11. endfor If the attribute values of a continuous attribute are same for all the classes in the training dataset that cannot separate the class distributions, then these attributes are redundant. So, we can remove the redundant attributes from the training dataset. B. Algorithm for Handling Contradictory Examples Contradictory example is the example that appears more than once in the training dataset with different class labels. Contradictory examples confuse the learning algorithms, so these examples should be avoided or labeled correctly before learning. Redundant examples are the multiple copies of the same example in the training dataset. It does not create problem, if they do not form contradictions, but redundancy can change the decision trees produced by ID3 algorithm. For mining complex data, it is better to remove redundancy from training data by keeping only a unique example in the training dataset. By doing so, it not only saves the space of storage in dataset but also speeding significantly the learning process. The main procedure of handling contradictory example is described as follows: Algorithm II: Handling Contradictory Examples. JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 4, NO. 3, AUGUST 2013 Input: N, number of training examples. Cj, class values in training dataset. Output: Correctly labeled contradictory examples. Procedure: 1. Search the multiple copies of same example in training examples N, if found then keeps only one unique example in training examples N. 2. Search the contradictory examples in training data; whose attribute values are same, but class value is different. 3. If found contradictory example in training data; then find the similarity of attribute values of contradictory example with each other examples in training data. 4. Find the majority of the similarity of attributes value among the examples in training data. 5. Replace the class value of contradictory example by examples’ majority class value in training dataset having similar attributes. 6. Step 2 to 5 will be repeated until all the contradictory examples in training data are correctly labeled. IV. EXPERIMENTAL ANALYSIS In this section, we discuss some of the benchmark problems from UCI machine learning repository that have been used for experimental analysis. A. Benchmark Problems From UCI Repository Data set is the set of data items, which is a very basic concept of data mining or machine learning. A data set is a collection of examples. Each example consists of a number of attributes that is classified into a particular class. Following describes the six benchmark datasets that ware used for experimental analysis. 1) Soybean (Small) Database: A small subset of the original soybean database, which contains 47 examples, 35 attributes, and 4 class values. There is no missing attribute value in this dataset. 2) Breast Cancer Dataset: This dataset was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia that includes 201 instances of one class and 85 instances of another class. The instances are described by 9 attributes. 3) Iris Plants Database: This is one of the best known databases in the pattern recognition literature. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2 classes. There are 4 attributes in this dataset. 4) Heart Disease Databases: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. 5) Australian Credit Approval: This data set concerns credit card applications, which contains ©2013 ACADEMY PUBLISHER 133 690 instances with 14 attributes. All attribute names and values have been changed to meaningless symbols to protect confidentiality of the data. 6) Ecoli Data Set: This data set contains 336 instances, 8 attributes, and 6 classes for classification. B. Experimental Result All the experiments have been done using the above six datasets from UCI machine learning repository [17] laid out in Table I. In all, the six datasets have 71 continuous attributes. TABLE I DATASETS AND THEIR SUMMARY Dataset Continuous Attributes Size Class Values Soybean 35 47 4 Breast 9 286 2 Iris 4 150 3 Heart 10 270 2 Australian 6 690 2 Ecoli 7 336 6 Total = 71 We have discretized each continuous attribute of the above shown datasets using proposed algorithm I, and then removed the redundant attributes from training datasets. After that using proposed algorithm II, we have removed the redundant examples from training datasets, and correctly labeled the class value of contradictory examples in training datasets. Table II shows the comparison of classification rates of each dataset after discretization of continuous attributes using proposed discretization approach, discretization by ID3 classifier, and discretization by naïve Bayesian (NB) classifier. The results show that proposed discretization method improves the classification rates with compare to ID3 classifier and NB classifier. TABLE II Classification rate (%) for each dataset Dataset Proposed Approach ID3 NB Soybean 100 93 90 Breast 95 79 75 Iris 89 85 83 Heart 77 63 59 Australian 83 80 77 Ecoli 93 87 91 Fig. 1, shows the comparison of accuracy of information gain heuristic, Bayesian classifier, and proposed algorithm for each data set. Figure 1. Comparison of the accuracy of methods for each data set. 134 JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 4, NO. 3, AUGUST 2013 V. CONCLUSION In this paper, we have introduced new algorithms for dealing with continuous attributes and contradictory examples in training dataset, which improved the classification accuracy for different types of classification problems. It also removed the redundant attributes and examples from training dataset. The proposed discretization algorithm used information gain heuristic and naïve Bayesian classifier to improve the quality of discretization. It found the possible cut points in continuous attribute that can separate the class distributions, and also considered the best cut point as an interval border with maximum information gain value and maximum probability value. The proposed algorithm for handling contradictory examples in training dataset replaces the class value of contradictory example by examples’ majority class value in training dataset having similar attributes. The proposed algorithms have been tested on a number of benchmark problems from UCI machine learning repository, and the experimental results proved that it improved the quality of discretization and classification rate. The future works focus on applying the proposed methods in real life problem domains of classification task of supervised learning and clustering task of unsupervised learning. ACKNOWLEDGMENT Support for this research received from Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh. REFERENCES [1] H. Liu, F. Hussain, C. L. Tan, and M. Dash, “Discretization: An enabling technique,” Data Mining and Knowledge Discovery, Vol. 6, No. 4, 2002, pp. 393-423. [2] A. Kusiak, “Feature transformation methods in data mining,” IEEE Trans. On Electronics Packaging Manufacturing, Vol. 24, No. 3, 2001, pp. 214-221. [3] D. A. Zighed, S. Rabaseda, and R. Rakotomalala, “Discretization methods in supervised learning,” Encyclopedia of Computer Science and Technology, Vol. 40, 1998, pp. 35-50. [4] J. Y. Ching, A. K. C. Wong, and K. C. C. Chan, “Classdependent discretization for inductive learning from continuous a n d m i x e d m o d e d a t a ,” IEEE Trans. On Pattern Analysis and Machine Intelligence, Vol. 17, No. 7, 1995, pp. 641-651. [5] M. R. Chmielewski, and J. W. GrzymalanBusse, “Global discretization of continuous attributes as pre-processing for machine learning,” In 3rdInternational Workshop on Rough Sets and Soft Computing, 1994, pp. 294-301. [6] U. M. Fayyed, and K. B. Irani, “On the handling of continuous-valued attributes in decision tree generation,” Machine Learning, Vol. 8, 1992, pp. 87-102. [7] J. R. Quinlan, “Improved use of continuous attributes in C4.5,” Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 77-90. [8] I. Kononenko, “naïve Bayesian classifier and continuous attributes,” Informatics 16, 1992, pp. 53-54. ©2013 ACADEMY PUBLISHER [9] M. J. Pazzani, “An iterative improvement approach for the discretization of numeric attributes in Bayesian classifiers,” In Proc. of the 1st International Conference on Knowledge Discovery and Data Mining, 1995, pp. 228-233. [10] A. Tsymbal, S. Puuronen, and D. W. Patterson, “Ensemble feature selection with the simple Bayesian classification,” Information Fusion, vol. 4, no. 2, pp. 87–100, 2003. [11] L. S. Oliveira, R. Sabourin, R. F. Bortolozzi, and C. Y. Suen, “Feature selection using multi-objective genetic algorithms for handwritten digit recognition,” In Proc. of the 16th International Conference on Pattern Recognition (ICPR 2002), Quebec: IEEE Computer Society, pp. 568–571, 2002. [12] C. G. Salcedo, S. Chen, D. Whitley, and S. Smith, “Fast and accurate feature selection using hybrid genetic strategies,” In Proc. of the Genetic and Evolutionary Computation Conference, pp. 1–8, 1999. [13] S. Chebrolu, A. Abraham, and J. P. Thomas, “Feature deduction and ensemble design of intrusion detection systems,” Computer & Security, vol. 24, no. 4, pp. 295– 307, 2004. [14] A. H. Sung and S. Mukkamala, “Identifying important features for intrusion detection using support vector machines and neural networks,” In Proc. of International Symposium on Applications and the Internet (SAINT 2003), pp. 209–217, 2003. [15] P. V. W. Radtke, R. Sabourin, and T. Wong, “Intelligent feature extraction for ensemble of classifiers,” In Proc. of the 8th International Conference on Document Analysis and Recognition (ICDAR 2005), Seoul: IEEE Computer Society, pp. 866–870, 2005. [16] W. K. Lee and S. J. Stolfo, “A framework for constructing features and models for intrusion detection systems,” ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000. [17] Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. [18] U. M. Fayyad, and K. B. Irani, “Multi-internal discretization of continuous-valued attributes for classification learning,” In Proc. Of the 13th International Joint Conf. on Artificial Intelligence, Morgan Kaufmann, San Mateo, CA, 1993, pp. 1022-1027. [19] T. Van de Merckt, “Decision trees in numerical attribute spaces,” In Proc. Of the 13th International Joint Conf. on Artificial Intelligence, Morgan Kaufmann, San Mateo, CA, 1993, pp. 1013-1021. [20] J. Dougherty, R. Kohavi, and M. Sahami, “Supervised and unsupervised discretization of continuous features,” In Proc. Of the 12th International Conference on Machine Learning, 1995, pp. 194-202. [21] Bernhard Pfahringer, “Compression-based discretization of continuous attributes,” ICML 1995, 1995, pp. 456-463. [22] P. Perner, and S. Trautzsh, “Multi-interval discretization methods for decision tree learning,” In Proc. of Advances in Pattern Recognition, Joint IAPR International Workshops SSPRS98, 1998, pp. 475-482. [23] S. D. Bay, “Multivariate discretization of continuous th variables for set mining,” In Proc. of t he 6 ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining, 2000, pp. 315-319. [24] C. N. Hus, H. J. Huang, and T. T. Wong, “Why discretization work for naïve Bayesian classifier,” In Proc. Of the 17th International Conference on Machine Learning, 2000, pp. 309-406. [25] H. Ishibuchi, T. Yamamoto, and T. Nakashima, JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 4, NO. 3, AUGUST 2013 [26] [27] [28] [29] [30] [31] 135 “Fuzzy data mining: Effect of fuzzy discretization,” In the 2001 IEEE International Conference on Data Mining, 2001. M. Boulle, “A Bayes optimal approach for partitioning the values of categorical attributes,” Journal of Machine Learning Research, Vol. 6, 2005, pp. 1431-1452. J. R. Quinlan, “Induction of Decision Tree,” Machine Learning, Vol. 1, pp. 81-106, 1986. P. Langley, “Induction of recursive Bayesian classifier,” In Proc. of the European Conference on Machine Learning, 1993, pp. 153-164. D. M. Farid, “Improve the Quality of Supervised Discretization of Continuous Valued Attributes in Data Mining,” In Proc. of the 14th International Conference on Computer and Information Technology (ICCIT 2011), 2224 December 2011, Dhaka, Bangladesh, pp. 61-64, and IEEE Xplore Digital Archive. A. Biswas, D. M. Farid, and C. M. Rahman, “A New Decision Tree Learning Approach for Novel Class Detection in Concept Drifting Data Stream Classification,” Journal of Computer Science and Engineering (JCSE-UK), Vol. 14, Issues 1, July 2012, pp. 1-8. D. M. Farid, and C. M. Rahman, “Novel Class Detection in Concept-Drifting Data Stream Mining Employing Decision Tree,” 7th International Conference on Electrical and Computer Engineering (ICECE 2012), 2022 December 2012, Dhaka, Bangladesh, pp. 630-633, and IEEE Xplore Digital Archive. Dr. Dewan Md. Farid is a post doctoral fellow in the Department of Computer Science and Digital Technology, Northumbria University, Newcastle upon Tyne, UK. He was an Assistant Professor in the Department of Computer Science and Engineering, United International University, Bangladesh. He received B.Sc. in Computer Science and Engineering from Asian University of Bangladesh in 2003, M.Sc. in Computer Science and Engineering from United International University, Bangladesh in 2004, and Ph.D. in Computer Science and Engineering from Jahangirnagar University, Bangladesh in 2012. Dr. Farid has published 1 book chapter, 10 international journals and 12 international conferences in the field of data mining, machine learning, and intrusion detection. He has participated and presented his papers in international conferences at Malaysia, Portugal, Italy, and France. He is a member of IEEE and IEEE Computer Society. He worked as a visiting researcher at ERIC Laboratory, University Lumière Lyon 2 – France from 01-092009 to 30-06-2010. He received Senior Fellowship I & II awarded by National Science & Information and Communication Technology (NSICT), Ministry of Science & Information and Communication Technology, Government of Bangladesh, in 2008 and 2011 respectively. ©2013 ACADEMY PUBLISHER Professor Dr. Chowdhury Mofizur Rahman had his B.Sc. (EEE) and M.Sc. (CSE) from Bangladesh University of Engineering and Technology (BUET) in 1989 and 1992 respectively. He earned his Ph.D. from Tokyo Institute of Technology in 1996 under the auspices of Japanese Government scholarship. Prof Chowdhury is presently working as the Pro Vice Chancellor and acting treasurer of United International University (UIU), Dhaka, Bangladesh. He is also one of the founder trustees of UIU. Before joining UIU he worked as the head of Computer Science & Engineering department of Bangladesh University of Engineering & Technology which is the number one technical public university in Bangladesh. His research area covers Data Mining, Machine Learning, AI and Pattern Recognition. He is active in research activities and published around 100 technical papers in international journals and conferences. He was the Editor of IEB journal and worked as the moderator of NCC accredited centers in Bangladesh. He worked as the organizing chair and program committee member of a number of international conferences held in Bangladesh and abroad. At present he is acting as the coordinator from Bangladesh for EU sponsored cLINK project. Prof Chowdhury has been working as the external expert member for Computer Science departments of a number of renowned public and private universities in Bangladesh. He is actively contributing towards the national goal of converting the country towards Digital Bangladesh.