International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR)
ISSN 2249-6831
Vol. 3, Issue 4, Oct 2013, 159-172
© TJPRC Pvt. Ltd.

CLASSIFICATION OF DEFORESTATION FACTORS USING DATA MINING TECHNIQUES

S. JYOTHI1, K. SARITHA2 & K. R. MANJULA3
1 Professor, Department of Computer Science, Sri Padmavati Women’s University, Tirupati, Andhra Pradesh, India
2 Research Scholar, Department of Computer Science, Sri Padmavati Women’s University, Tirupati, Andhra Pradesh, India
3 Assistant Professor, Department of CSE, SASTRA University, Thanjavur, Tamil Nadu, India

ABSTRACT

Data mining techniques have been widely used for extracting knowledge from large amounts of data. Monitoring deforestation is of the utmost importance for developing countries. Classification of deforestation is one of the primary objectives in the analysis of remotely sensed data. The present study focuses on obtaining accurate results when monitoring deforestation and forest degradation using classification techniques. In this paper, an experiment has been set up on different classification algorithms to compare their results. To evaluate the results, we used the WEKA open source tool, which is a collection of machine learning algorithms covering different processing tasks such as classification, association and clustering. The main aim of this paper is a comparative study of the classification algorithms to find the best algorithm for our data set.

KEYWORDS: Deforestation, Data Mining, Classification, WEKA

INTRODUCTION

Knowledge discovery in databases is the nontrivial process of identifying valid, novel, useful and ultimately understandable patterns in data [7]. The process of automatic classification based on data patterns obtained from a data set is referred to as data mining [5].
Classification is one of the data mining tasks. Its objective is to build a model on a training data set in order to predict the class of future objects whose class label is not known [2][13]. There are many classification algorithms, for example classification based on decision trees, Bayesian classification based on statistics, and classification based on neural networks [4]. Geospatial data mining is a process of geographic knowledge discovery in which spatial datasets are mined with data mining algorithms to identify interesting and previously unknown but potentially useful patterns [14].

The conversion of forest land into non-forested land, either directly or indirectly, is referred to as deforestation. Most deforestation happens due to human intervention or activities such as urbanization, the construction of roads along forest areas, the improvement or shifting of agricultural land, the mining of resources available in forests, and the logging of wood for fuel or other purposes. These are the major causes of deforestation. Data mining techniques are applied for classifying the factors of deforestation. In this paper, classification techniques are applied to our data, and the performance of each classifier is measured and compared to identify the best classification technique.

PROBLEM DOMAIN

The study area covers 5000 square kilometers and includes Chittoor, Kadapa and Nellore districts. The boundary lies between the lower left corner at East 78 " Longitude and E 13 " Latitude and the upper right corner at N 79 39" Longitude and N 14 33" Latitude, with an area of 15,379 square kilometers for Kadapa district, which includes 51 Mandals and three revenue divisions. The geographical area of Chittoor district lies between 12 37" to 14 18" N Latitude and 78 33" to 79 55" E Longitude. The district area is 13,076 square kilometers, divided administratively into three revenue divisions and 46 Mandals.
The data is derived from Manjula et al. [10][11] and consists of maps and tables prepared for the association technique. The data set consists of 5 attributes and 99 instances. The classification problem involves the factors of deforestation, namely Agriculture, Built-up, Mining and Roads, which cover the bulk of the forest area, and is used for finding the best algorithm for our data set. Figure 1 and Figure 2 represent the maps of the study area.

Figure 1: Map of Study Area

Figure 2: Topographical Map of Study Area

CLASSIFICATION TECHNIQUES

Classification of data is a very typical task in data mining. There is a large number of classifiers that are used to classify data, such as Bayes net, function based, rule based and decision tree classifiers. The goal of classification is to predict the correct value of a designated discrete class variable, given a vector of predictors or attributes [6]. In this paper we load our data into the WEKA machine learning tool to analyze the performance of different classification techniques.

Comparison of Classification Algorithms

Bayesian Methods

Bayesian methods are one of the classification techniques in data mining. In this paper two main Bayesian methods are used, namely Naive Bayes and Bayesian networks, both of which are implemented in the WEKA software for classification.

A Bayes classifier can be defined as an independent feature model: a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions. Bayes' rule is applied to calculate the probability of each class given the observed attributes. There are several models that make different assumptions suitable for Naive Bayes [12][8].
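For reference, the independence assumption described above is usually written as follows. This is the standard textbook form rather than a formula taken from the study; the symbols $c$ (a class such as AR or ABM) and $x_1, \dots, x_n$ (the attribute values of one instance) are notation introduced here for illustration.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The classifier assigns the class $\hat{c}$ with the largest product, so only the class priors and the per-attribute conditional probabilities need to be estimated from the training data.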
Evaluation on Training Set

Time taken to build model: 0 seconds

Correctly Classified Instances      99    100 %
Incorrectly Classified Instances     0      0 %
Kappa statistic                      1
Mean absolute error                  0.0194
Root mean squared error              0.0521
Relative absolute error             10.1457 %
Root relative squared error         16.922 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0 10  0 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Stratified Cross-Validation

Time taken to build model: 0 seconds

Correctly Classified Instances      98    98.9899 %
Incorrectly Classified Instances     1     1.0101 %
Kappa statistic                      0.9867
Mean absolute error                  0.0247
Root mean squared error              0.0743
Relative absolute error             12.8792 %
Root relative squared error         24.0768 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0  9  1 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Naive Bayesian

The Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions. An advantage of the Naive Bayes classifier is that it requires only a small amount of training data to estimate the parameters necessary for classification.
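The counting behind a Naive Bayes classifier can be sketched in a few lines. The example below is a minimal illustration for categorical attributes with Laplace smoothing; it is not WEKA's implementation, and the function names, the two toy attributes and the four toy rows are all hypothetical and are not taken from the study's data set.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the largest log prior plus summed log likelihoods."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Laplace smoothing keeps unseen values from giving zero probability.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (agriculture level, road present) -> class label.
rows = [("high", "yes"), ("high", "yes"), ("high", "no"), ("low", "no")]
labels = ["AR", "AR", "A", "A"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "yes")))
```

Because every probability is estimated from simple counts, a handful of instances per class is enough to fit the model, which is the property noted in the paragraph above.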
Evaluation on Training Set

Time taken to build model: 0 seconds

Correctly Classified Instances      98    98.9899 %
Incorrectly Classified Instances     1     1.0101 %
Kappa statistic                      0.9867
Mean absolute error                  0.0339
Root mean squared error              0.0861
Relative absolute error             17.6942 %
Root relative squared error         27.9577 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0  9  1 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Stratified Cross-Validation

Time taken to build model: 0 seconds

Correctly Classified Instances      95    95.9596 %
Incorrectly Classified Instances     4     4.0404 %
Kappa statistic                      0.9458
Mean absolute error                  0.0401
Root mean squared error              0.1055
Relative absolute error             20.8635 %
Root relative squared error         34.2137 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  3  0  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0  9  1 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Decision Tree

A decision tree is a flow chart like hierarchical tree structure consisting of a root, a set of internal nodes, and terminal nodes called leaves. The root node and the internal nodes represent decision stages, and each terminal node represents a final classification. The classification process provides a set of rules that determine the path starting from the root node and ending at one terminal node. Each terminal node represents one class label for the object being classified.

Decision trees are powerful classification algorithms. Popular decision tree algorithms include Quinlan’s ID3, C4.5, C5, and Breiman et al.’s CART [3]. As the name implies, this technique recursively separates observations into branches to construct a tree for the purpose of improving the prediction accuracy. Most decision tree classifiers perform classification in two phases: tree-growing (or building) and tree-pruning. The tree is built in a top-down manner. During this phase the data are recursively partitioned until all the data items in a partition belong to the same class label. In the tree-pruning phase the fully grown tree is cut back in a bottom-up fashion to prevent over-fitting, which improves the prediction and classification accuracy of the tree. Compared to other data mining techniques, the decision tree is widely applied in various areas since it is robust to data scales or distributions.

J48 is an open source Java implementation of the C4.5 algorithm in the WEKA data mining tool. C4.5 is an algorithm developed by Ross Quinlan for inducing classification rules in the form of decision trees from a set of given examples. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier [15]. C4.5 is an evolution and refinement of Quinlan's basic ID3 algorithm that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on.
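During the tree-growing phase, C4.5 chooses the attribute to split on by its gain ratio, that is, the information gain of the split divided by the entropy of the split itself. The sketch below illustrates that criterion for one categorical attribute. It is a simplified illustration and not the J48 source code; the function names and the toy values are hypothetical.

```python
from collections import Counter
import math


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split entropy."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels inside each branch of the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy attribute columns against the same class labels.
labels = ["AR", "AR", "A", "A"]
road_present = ["yes", "yes", "no", "no"]   # separates the classes perfectly
mining_present = ["yes", "no", "yes", "no"]  # carries no class information
print(gain_ratio(road_present, labels), gain_ratio(mining_present, labels))
```

An attribute whose branches are pure scores highest, so it is placed nearer the root; an attribute that leaves the class mix unchanged scores zero and is not chosen.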
Evaluation on Training Set

Time taken to build model: 0.02 seconds

Correctly Classified Instances      98    98.9899 %
Incorrectly Classified Instances     1     1.0101 %
Kappa statistic                      0.9867
Mean absolute error                  0.0042
Root mean squared error              0.0459
Relative absolute error              2.197 %
Root relative squared error         14.9017 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0  9  1 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Stratified Cross-Validation

Time taken to build model: 0 seconds

Correctly Classified Instances      96    96.9697 %
Incorrectly Classified Instances     3     3.0303 %
Kappa statistic                      0.96
Mean absolute error                  0.0081
Root mean squared error              0.0817
Relative absolute error              4.2058 %
Root relative squared error         26.4947 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0  9  1 |  g = ABR
  0  0  0  0  0  0  2  5 |  h = BR

K Nearest Neighbour

A nearest neighbor classifier assumes that all instances correspond to points in an n-dimensional space. During learning, all instances are remembered. When a new point is classified, the k nearest points to the new point are found and are used, each with a weight, to determine the class value of the new point. To increase accuracy, greater weights are given to closer points [9].

Evaluation on Training Set

Time taken to build model: 0 seconds

Correctly Classified Instances      99    100 %
Incorrectly Classified Instances     0      0 %
Kappa statistic                      1
Mean absolute error                  0.0019
Root mean squared error              0.0045
Relative absolute error              1.0033 %
Root relative squared error          1.4611 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0 10  0 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Stratified Cross-Validation

Time taken to build model: 0 seconds

Correctly Classified Instances      98    98.9899 %
Incorrectly Classified Instances     1     1.0101 %
Kappa statistic                      0.9867
Mean absolute error                  0.0051
Root mean squared error              0.0438
Relative absolute error              2.635 %
Root relative squared error         14.2084 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  1  0  0  0  0  0  9  0 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Artificial Neural Networks

Artificial Neural Networks (ANN) are one of the classification methods in data mining. To employ neural network based classifiers, the Multi-Layer Perceptron (MLP) is used (Figure 3). MLP is a feed forward technique that builds a model to map input data to output data. The hidden part of an MLP can include several layers between input and output. The structure of the MLP is shown in Figure 3 [1].

Figure 3: Multilayer Neural Network
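The feed forward mapping from input to output can be illustrated with a short sketch. The code below only performs the forward pass of a small fully connected network; training by back-propagation is omitted. The sigmoid activation, the layer sizes and every weight value are assumptions chosen for illustration and do not come from the study.

```python
import math


def sigmoid(x):
    """Logistic activation squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass the input through every layer in order and return the output."""
    activation = inputs
    for weights, biases in layers:
        activation = layer(activation, weights, biases)
    return activation


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2])
output = ([[1.2, -0.7]], [0.05])
print(forward([1.0, 0.0], [hidden, output]))
```

Adding more `(weights, biases)` pairs to the list passed to `forward` corresponds to adding further hidden layers between input and output.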
Evaluation on Training Set

Time taken to build model: 1 seconds

Correctly Classified Instances      99    100 %
Incorrectly Classified Instances     0      0 %
Kappa statistic                      1
Mean absolute error                  0.0089
Root mean squared error              0.0152
Relative absolute error              4.627 %
Root relative squared error          4.9526 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0 10  0 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Stratified Cross-Validation

Time taken to build model: 0.86 seconds

Correctly Classified Instances      98    98.9899 %
Incorrectly Classified Instances     1     1.0101 %
Kappa statistic                      0.9866
Mean absolute error                  0.0126
Root mean squared error              0.0502
Relative absolute error              6.5723 %
Root relative squared error         16.2596 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  1  0  0  0  0  9  0 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Simple CART

CART stands for Classification And Regression Trees; it is a data exploration and prediction algorithm. CART is a classification method that uses historical data to construct decision trees, and the trees so obtained are then used to classify new data. The number of classes must be known a priori in order to use CART. To build decision trees, CART uses a so-called learning sample, which is a set of historical data with pre-assigned classes for all observations [16].
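CART is commonly described as growing binary trees by picking, at each node, the split that minimizes the weighted Gini impurity of the two child nodes. The sketch below shows that selection for a single categorical attribute. It is an illustration of the general idea rather than WEKA's SimpleCart code, and the function names and toy learning sample are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger for a more mixed node."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_binary_split(values, labels):
    """Try each test 'attribute == v' and return (v, weighted Gini) of the best."""
    total = len(labels)
    best = (None, float("inf"))
    for v in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x == v]
        right = [y for x, y in zip(values, labels) if x != v]
        if not left or not right:
            continue  # a split that sends everything one way is useless
        score = (len(left) / total) * gini(left) + (len(right) / total) * gini(right)
        if score < best[1]:
            best = (v, score)
    return best


# Hypothetical learning sample: one attribute column and pre-assigned classes.
road_present = ["yes", "yes", "no", "no"]
labels = ["AR", "AR", "A", "A"]
print(best_binary_split(road_present, labels))
```

The pre-assigned class labels of the learning sample are what make the impurity computable, which is why the classes must be known before the tree is built.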
Evaluation on Training Set

Time taken to build model: 0.05 seconds

Correctly Classified Instances      99    100 %
Incorrectly Classified Instances     0      0 %
Kappa statistic                      1
Mean absolute error                  0
Root mean squared error              0
Relative absolute error              0 %
Root relative squared error          0 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  0  3  0  0  0  0  0 |  c = R
  0  0  0  3  0  0  0  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0 10  0 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

Stratified Cross-Validation

Time taken to build model: 0.02 seconds

Correctly Classified Instances      93    93.9394 %
Incorrectly Classified Instances     6     6.0606 %
Kappa statistic                      0.9185
Mean absolute error                  0.0163
Root mean squared error              0.1106
Relative absolute error              8.4774 %
Root relative squared error         35.8587 %
Total Number of Instances           99

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 |  a = ABM
  0 42  0  0  0  0  0  0 |  b = AR
  0  3  0  0  0  0  0  0 |  c = R
  0  0  0  0  0  0  3  0 |  d = ABMR
  0  0  0  0  6  0  0  0 |  e = AMR
  0  0  0  0  0 11  0  0 |  f = A
  0  0  0  0  0  0 10  0 |  g = ABR
  0  0  0  0  0  0  0  7 |  h = BR

EXPERIMENTAL RESULTS

In this section, we present the results of the different classification algorithms and analyze their performance to verify the effectiveness of each algorithm. The aim of this work is to find the best algorithm for our data set. The algorithms are also compared between the training set and cross-validation evaluations. Table 1 summarizes the computation time, the correctly classified instances, the kappa statistic and error measures such as MAE and RMSE for each classification algorithm evaluated on the training set.
Regarding computation time, Bayes Net, IBK and Naive Bayes build their models fastest, but fast computation does not guarantee accuracy: Naive Bayes gives the weakest results on kappa, the error measures and the confusion matrix, whereas Bayes Net shows the best performance when cross-checked against the same measures. Leaving computation time aside, an analysis of kappa and the other measures shows that algorithms such as Bayes Net, MLP and Simple CART play a major role in providing the best accuracy.

Kappa is a chance-corrected measure of agreement between the predicted and the actual classes. If kappa = 1, there is perfect agreement; if kappa = 0, there is no agreement beyond what chance would produce; a value greater than 0 means that the classifier is doing better than chance. Mean absolute error (MAE) is the sum of the absolute errors divided by the number of predictions. Root mean squared error (RMSE) is the square root of the sum of squared errors divided by the number of predictions; it measures the differences between the values predicted by a model and the values actually observed. Smaller values of RMSE indicate a model with better accuracy. So, the lower the MAE and RMSE, the better the prediction and accuracy.

Using Training Set

Table 1: Accuracy Results of all Methods in Training Set

Algorithm      Bayes Net   Naive Bayes   J48       IBK      MLP      SimpleCART
Time (s)       0           0             0.02      0        1        0.05
Correctly      99          98            98        99       99       99
Incorrectly    0           1             1         0        0        0
Kappa          1           0.9867        0.9867    1        1        1
MAE            0.0194      0.0339        0.0042    0.0019   0.0089   0
RMSE           0.0521      0.0861        0.0459    0.0045   0.0152   0
RAE %          10.1457     17.6942       2.197     1.0033   4.627    0
RRSE %         16.922      27.9577       14.9017   1.4611   4.9526   0

Figure 4: Comparison of Classifiers Based on Accuracy

Figure 4 compares the accuracy of the classifiers in several aspects. The graph confirms that Bayes Net, IBK, MLP and Simple CART show the best performance results.
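The kappa, MAE and RMSE definitions used in this section can be checked with a short script. The sketch below is an illustration and not the WEKA evaluation code; the function names are hypothetical. The confusion matrix is the J48 stratified cross-validation matrix reported earlier, for which the reported kappa is 0.96.

```python
import math


def cohen_kappa(matrix):
    """Chance-corrected agreement computed from a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


def mae(predicted, actual):
    """Mean absolute error: sum of absolute errors over number of predictions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error: square root of the mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# J48 cross-validation confusion matrix (rows: actual class, columns: predicted).
j48_cv = [
    [17, 0, 0, 0, 0, 0, 0, 0],
    [0, 42, 0, 0, 0, 0, 0, 0],
    [0, 0, 3, 0, 0, 0, 0, 0],
    [0, 0, 0, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 6, 0, 0, 0],
    [0, 0, 0, 0, 0, 11, 0, 0],
    [0, 0, 0, 0, 0, 0, 9, 1],
    [0, 0, 0, 0, 0, 0, 2, 5],
]
print(round(cohen_kappa(j48_cv), 4))  # approximately 0.96, as reported for J48
```

With 96 of 99 instances on the diagonal, the observed agreement is about 0.970 and the chance agreement about 0.243, which gives the kappa of roughly 0.96 shown in the J48 cross-validation output.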
Table 2 shows the summary results of the various classification techniques using k-fold cross-validation. In the present case, the performance analysis indicates that computation time is high for the MLP and Simple CART algorithms, whereas it is lower for the other learning algorithms, namely Bayes Net, Naive Bayes, J48 and IBK. For most of these faster algorithms, however, the kappa statistic, MAE, RMSE and confusion matrix indicate lower accuracy. IBK (k-NN) is the exception and shows highly accurate results in all aspects: it is perfect on the training set, and its performance under cross-validation approaches the perfect level. Interestingly, if the performance analysis is done independently of time, the MLP algorithm also shows good accuracy in both cases.

Using Cross-Validation

Table 2: Accuracy Results of all Methods in Cross-Validation

Algorithm      Bayes Net   Naive Bayes   J48       IBK       MLP       SimpleCART
Time (s)       0           0             0         0         0.86      0.02
Correctly      98          95            96        98        98        93
Incorrectly    1           4             3         1         1         6
Kappa          0.9867      0.9458        0.96      0.9867    0.9866    0.9185
MAE            0.0247      0.0401        0.0081    0.0051    0.0126    0.0163
RMSE           0.0743      0.1055        0.0817    0.0438    0.0502    0.1106
RAE %          12.8792     20.8635       4.2058    2.035     6.5723    8.4774
RRSE %         24.5768     34.2137       26.4947   14.2084   16.2596   35.8587

Figure 5: Comparison of Classifiers Based on Performance

Figure 5 describes the performance of each classifier. To evaluate the performance of the different methods, we made a detailed analysis of the above characteristics. Under cross-validation IBK achieves the best classification performance, whereas Simple CART, which was perfect on the training set, falls to 93 correctly classified instances.

CONCLUSIONS

A variety of classification methods has been applied and tested on deforestation data. Our main aim is to find the best algorithm for our data set. For this purpose, we compare the performance results of different classification algorithms in WEKA, a machine learning tool.
Selecting the best algorithm is an important step towards obtaining accurate results. No single algorithm among those observed is best in every respect: some algorithms, such as Bayes Net, MLP and Simple CART, yield the best accuracy, but their computation time, MAE and RMSE differ in each case. The results and findings of the present study may be used for developing a new algorithm that reflects the best properties of the different classification algorithms. So, to obtain optimal results for our data set, we propose as future work a hybrid algorithm containing the best properties of the above algorithms.

ACKNOWLEDGEMENTS

The authors are grateful to the UGC for providing the funds for our work.

REFERENCES

1. D. A. Avellaneda, et al., "Natural Texture Classification: A Neural Network Models Benchmark," 2009, pp. 325-329.
2. R. Bayardo, "Brute-force mining of high-confidence classification rules", Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97), AAAI Press, Newport Beach, CA, United States, August 1997, pp. 123-126.
3. Bernhard Pfahringer, Geoffrey Holmes and Richard Kirkby, "Optimizing the Induction of Alternating Decision Trees", Proceedings of the Fifth Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2001, pp. 477-487.
4. Chung-Chian Hsu, Sheng-Hsuan Wang, "An integrated framework for visualized and exploratory pattern discovery in mixed data", IEEE Transactions on Knowledge and Data Engineering, 2006, 18(2): 161-173.
5. Desouza, K.C. (2001), Artificial intelligence for healthcare management. In Proceedings of the First International Conference on Management of Healthcare and Medical Technology, Enschede, Netherlands: Institute for Healthcare Technology Management.
6. Daniel Grossman and Pedro Domingos (2004), "Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood", In Proceedings of the 21st International Conference on Machine Learning, Banff, Canada.
7. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., From Data Mining to Knowledge Discovery: An Overview. In Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (eds.), Advances in Knowledge Discovery and Data Mining, MIT Press, 1-36, Cambridge, 1996.
8. Y. Herdiyeni, et al., "A Bayesian network approach for image similarity," International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME), 2009, pp. 1-6.
9. T.M. Mitchell, Machine Learning, McGraw-Hill Companies, USA, 1997.
10. K.R. Manjula, Dr. S. Jyothi, S. Anand Kumar Varma, "Analysing the factors of deforestation using GIS".
11. K.R. Manjula, Dr. S. Jyothi, S. Anand Kumar Varma, Dr. S. Vijaya Kumar, "Construction of Spatial Dataset from Remote Sensing using GIS for Deforestation Study", International Journal of Computer Applications (0975 – 8887), Volume 31, No. 10, October 2011.
12. G. Qiang, "An Effective Algorithm for Improving the Performance of Naive Bayes for Text Classification," Second International Conference on Computer Research and Development, 2010, pp. 699-701.
13. J. Quinlan, "C4.5: Programs for machine learning", San Mateo, CA: Morgan Kaufmann, 1993.
14. Shekhar, S., et al., Trends in Spatial Data Mining, in Data Mining: Next Generation Challenges and Future Directions, AAAI Press, 2004.
15. http://www.c4.5-Wikipedia, the free encyclopedia.
16. http://www.CART-Wikipedia, the free encyclopedia.