International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR)
ISSN 2249-6831
Vol. 3, Issue 4, Oct 2013, 159-172
© TJPRC Pvt. Ltd.

CLASSIFICATION OF DEFORESTATION FACTORS USING DATA MINING TECHNIQUES

S. JYOTHI¹, K. SARITHA² & K. R. MANJULA³
¹Professor, Department of Computer Science, Sri Padmavati Women's University, Tirupati, Andhra Pradesh, India
²Research Scholar, Department of Computer Science, Sri Padmavati Women's University, Tirupati, Andhra Pradesh, India
³Assistant Professor, Department of CSE, SASTRA University, Thanjavur, Tamil Nadu, India
ABSTRACT
Data mining techniques have been widely used for extracting knowledge from large amounts of data. Monitoring deforestation is of utmost importance for developing countries, and classification of deforestation is one of the primary objectives in the analysis of remotely sensed data. The present study focuses on obtaining accurate results for deforestation and forest degradation using classification techniques. In this paper, an experiment is set up on different classification algorithms to compare their results. To evaluate the results, we used the open source WEKA tool, a collection of machine learning algorithms covering processing tasks such as classification, association and clustering. The main aim of this paper is a comparative study of the classification algorithms to find the best algorithm for our data set.
KEYWORDS: Deforestation, Data Mining, Classification, WEKA
INTRODUCTION
Knowledge discovery in databases is the nontrivial process of identifying valid, novel, useful and ultimately understandable patterns in data [7]. The process of automatic classification based on patterns obtained from a data set is referred to as data mining [5]. Classification is one of the data mining tasks; its objective is to build a model on a training data set that predicts the class of future objects whose class label is not known [2][13]. There are many classification algorithms, for example, classification based on decision trees, Bayesian classification based on statistics, and classification based on neural networks [4]. Geospatial data mining is the process of discovering geographic knowledge from spatial data sets mined with data mining algorithms, in order to identify interesting, previously unknown but potentially useful patterns [14].

The conversion of forest land into non-forested land, whether direct or indirect, is referred to as deforestation. Most deforestation is caused by human intervention and activities such as expanding urbanization, constructing roads along forest areas, improving or shifting agricultural land, mining the resources available in forests, and logging wood for fuel or other purposes. These are the major causes of deforestation.

In this paper, data mining classification techniques are applied to classify the factors of deforestation; the performance of each classifier is measured and compared in order to identify the best classification technique for our data.
PROBLEM DOMAIN
The study area covers 5,000 square kilometres across the Chittoor, Kadapa and Nellore districts. The boundary of Kadapa district lies between a lower left corner at 78° E longitude and 13° N latitude and an upper right corner at 79° 39' E longitude and 14° 33' N latitude, with an area of 15,379 square kilometres comprising 51 mandals and three revenue divisions. The geographical area of Chittoor district lies between 12° 37' and 14° 18' N latitude and 78° 33' and 79° 55' E longitude; the district covers 13,076 square kilometres, divided administratively into three revenue divisions and 46 mandals. The data is derived from Manjula et al. [10][11] and consists of maps and tables produced with an association technique. The data set consists of 5 attributes and 99 instances. The classification problem involves the factors of deforestation, namely Agriculture, Built-up, Mining and Roads, which cover the bulk of the forest area, and is used to analyse the best algorithm for our data set. Figures 1 and 2 show the maps of the study area.
Figure 1: Map of Study Area
CLASSIFICATION TECHNIQUES
Classification is a very typical task in data mining. A large number of classifiers can be used to classify data, such as Bayes net, function-based, rule-based and decision tree classifiers. The goal of classification is to predict the correct value of a designated discrete class variable, given a vector of predictors or attributes [6]. In this paper we run our data through the WEKA machine learning tool to analyse the performance of the different classification techniques.
Figure 2: Topographical Map of Study Area
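As a concrete illustration of this experimental setup, the following is a minimal sketch of how the two evaluation protocols used throughout this paper (evaluation on the training set and stratified 10-fold cross-validation) can be driven through the WEKA Java API. The file name deforestation.arff, the random seed, and the position of the class attribute are assumptions for illustration, not details taken from the paper.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.BayesNet;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DeforestationEval {
    public static void main(String[] args) throws Exception {
        // Load the 99-instance, 5-attribute deforestation data set
        // ("deforestation.arff" is a placeholder file name).
        Instances data = DataSource.read("deforestation.arff");
        // Assume the factor-combination label (A, AR, ABM, ...) is the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        Classifier cls = new BayesNet(); // swap in any of the classifiers compared below

        // Protocol 1: evaluation on the training set.
        cls.buildClassifier(data);
        Evaluation trainEval = new Evaluation(data);
        trainEval.evaluateModel(cls, data);
        System.out.println(trainEval.toSummaryString("=== Evaluation on training set ===", false));
        System.out.println(trainEval.toMatrixString());

        // Protocol 2: stratified 10-fold cross-validation on a fresh classifier.
        Evaluation cvEval = new Evaluation(data);
        cvEval.crossValidateModel(new BayesNet(), data, 10, new Random(1));
        System.out.println(cvEval.toSummaryString("=== Stratified cross-validation ===", false));
        System.out.println(cvEval.toMatrixString());
    }
}

The kappa statistic, MAE, RMSE, relative errors and confusion matrices reported in the following subsections are statistics of the kind printed by these summary calls.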
Comparison of Classification Algorithms
Bayesian Methods
Bayesian methods are among the classification techniques in data mining. In this paper two main Bayesian methods are used, namely Naive Bayes and Bayesian networks, as implemented in the WEKA software.
A Bayes classifier can be defined as an independent feature model: a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions. Bayes' rule is applied to calculate the likelihood of each class. Several models make different assumptions that fit the Naive Bayes framework [12][8].
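In standard notation (a sketch, not reproduced from the paper): for a class c and an attribute vector (x_1, ..., x_n), Bayes' rule gives the posterior, the independence assumption factorizes the likelihood, and the classifier predicts the class maximizing the result:

\[
P(c \mid x_1,\dots,x_n)
= \frac{P(c)\,P(x_1,\dots,x_n \mid c)}{P(x_1,\dots,x_n)}
\;\propto\; P(c)\prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c}\, P(c)\prod_{i=1}^{n} P(x_i \mid c).
\]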
Evaluation on Training Set
Time taken to build model: 0 seconds
Correctly Classified Instances: 99 (100 %)
Incorrectly Classified Instances: 0 (0 %)
Kappa statistic: 1
Mean absolute error: 0.0194
Root mean squared error: 0.0521
Relative absolute error: 10.1457 %
Root relative squared error: 16.922 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0 10  0 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Stratified Cross-Validation
Time taken to build model: 0 seconds
Correctly Classified Instances: 98 (98.9899 %)
Incorrectly Classified Instances: 1 (1.0101 %)
Kappa statistic: 0.9867
Mean absolute error: 0.0247
Root mean squared error: 0.0743
Relative absolute error: 12.8792 %
Root relative squared error: 24.0768 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0  9  1 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Naive Bayesian
The Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions. An advantage of the Naive Bayes classifier is that it only requires a small amount of training data to estimate the parameters necessary for classification.
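In the harness sketched earlier, Naive Bayes is a one-line swap. The kernel-estimator option shown below is an illustrative WEKA setting, not a configuration reported by the authors:

import weka.classifiers.bayes.NaiveBayes;

public class NaiveBayesSetup {
    // Returns a Naive Bayes classifier for use in the evaluation harness above.
    public static NaiveBayes build() {
        NaiveBayes nb = new NaiveBayes();
        // Optional: model numeric attributes with kernel density estimates
        // instead of a single Gaussian per class (can help on small samples).
        nb.setUseKernelEstimator(true);
        return nb;
    }
}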
Evaluation on Training Set
Time taken to build model: 0 seconds
Correctly Classified Instances: 98 (98.9899 %)
Incorrectly Classified Instances: 1 (1.0101 %)
Kappa statistic: 0.9867
Mean absolute error: 0.0339
Root mean squared error: 0.0861
Relative absolute error: 17.6942 %
Root relative squared error: 27.9577 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0  9  1 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Stratified Cross-Validation
Time taken to build model: 0 seconds
Correctly Classified Instances: 95 (95.9596 %)
Incorrectly Classified Instances: 4 (4.0404 %)
Kappa statistic: 0.9458
Mean absolute error: 0.0401
Root mean squared error: 0.1055
Relative absolute error: 20.8635 %
Root relative squared error: 34.2137 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  3  0  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0  9  1 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Decision Tree
A decision tree is a flow-chart-like hierarchical tree structure consisting of a root, a set of internal nodes, and terminal nodes called leaves. The root node and the internal nodes are linked into decision stages, while the terminal nodes represent the final classification. The classification process provides a set of rules that determine a path starting from the root node and ending at one terminal node, and each terminal node represents one class label for the object being classified.

Decision trees are powerful classification algorithms. Popular decision tree algorithms include Quinlan's ID3, C4.5 and C5, and Breiman et al.'s CART [3]. As the name implies, this technique recursively separates observations into branches to construct a tree, with the purpose of improving prediction accuracy. Most decision tree classifiers perform classification in two phases: tree growing (or building) and tree pruning. Tree building is done in a top-down manner; during this phase the tree is recursively partitioned until all data items belong to the same class label. In the tree-pruning phase, the fully grown tree is cut back in a bottom-up fashion to prevent over-fitting, which improves the prediction and classification accuracy of the algorithm. Compared to other data mining techniques, decision trees are widely applied in various areas since they are robust to data scales and distributions.
J48 is an open source Java implementation of the C4.5 algorithm in the WEKA data mining tool. C4.5 is an algorithm developed by Ross Quinlan for generating decision trees, and is an extension and improvement of Quinlan's basic ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier [15]. Quinlan introduced C4.5 for inducing classification rules in the form of decision trees from a set of given examples. C4.5 is an evolution and refinement of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on.
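The grow-then-prune behaviour described above is controlled in WEKA's J48 mainly by two parameters. The values below are WEKA's defaults written out explicitly, as a sketch rather than settings reported in the paper:

import weka.classifiers.trees.J48;

public class J48Setup {
    // Returns a C4.5-style tree: grown top-down, then pruned bottom-up.
    public static J48 build() {
        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f); // pruning confidence; smaller values prune more heavily
        tree.setMinNumObj(2);            // minimum number of instances per leaf
        return tree;
    }
}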
Evaluation on Training Set
Time taken to build model: 0.02 seconds
Correctly Classified Instances: 98 (98.9899 %)
Incorrectly Classified Instances: 1 (1.0101 %)
Kappa statistic: 0.9867
Mean absolute error: 0.0042
Root mean squared error: 0.0459
Relative absolute error: 2.197 %
Root relative squared error: 14.9017 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0  9  1 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Stratified Cross-Validation
Time taken to build model: 0 seconds
Correctly Classified Instances: 96 (96.9697 %)
Incorrectly Classified Instances: 3 (3.0303 %)
Kappa statistic: 0.96
Mean absolute error: 0.0081
Root mean squared error: 0.0817
Relative absolute error: 4.2058 %
Root relative squared error: 26.4947 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0  9  1 | g = ABR
  0  0  0  0  0  0  2  5 | h = BR
K Nearest Neighbour
A nearest neighbour classifier assumes that all instances correspond to points in an n-dimensional space. During learning, all instances are stored. When a new point is to be classified, the k points nearest to it are found and used, with weights, to determine the class value of the new point. To increase accuracy, greater weights are given to closer points [9].
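In WEKA this classifier is IBk. The snippet below is a sketch of how distance weighting gives closer points greater influence; k = 3 and inverse-distance weighting are illustrative choices, not the paper's settings:

import weka.classifiers.lazy.IBk;
import weka.core.SelectedTag;

public class KnnSetup {
    // Returns a k-NN classifier whose votes are weighted by 1/distance,
    // so closer neighbours influence the predicted class more strongly.
    public static IBk build() {
        IBk knn = new IBk();
        knn.setKNN(3);
        knn.setDistanceWeighting(
                new SelectedTag(IBk.WEIGHT_INVERSE, IBk.TAGS_WEIGHTING));
        return knn;
    }
}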
Evaluation on Training Set
Time taken to build model: 0 seconds
Correctly Classified Instances: 99 (100 %)
Incorrectly Classified Instances: 0 (0 %)
Kappa statistic: 1
Mean absolute error: 0.0019
Root mean squared error: 0.0045
Relative absolute error: 1.0033 %
Root relative squared error: 1.4611 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0 10  0 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Stratified Cross-Validation
Time taken to build model: 0 seconds
Correctly Classified Instances: 98 (98.9899 %)
Incorrectly Classified Instances: 1 (1.0101 %)
Kappa statistic: 0.9867
Mean absolute error: 0.0051
Root mean squared error: 0.0438
Relative absolute error: 2.635 %
Root relative squared error: 14.2084 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  1  0  0  0  0  0  9  0 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Artificial Neural Networks
Artificial Neural Networks (ANNs) are another classification method in data mining. To employ neural-network-based classifiers, a Multi-Layer Perceptron (MLP) is used. An MLP is a feed-forward technique that builds a model mapping input data to output data; its hidden part can include various layers between the input and output layers. The structure of an MLP is shown in Figure 3 [1].

Figure 3: Multilayer Neural Network
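WEKA's MultilayerPerceptron implements this feed-forward model. The settings below are WEKA's defaults written out explicitly, as a sketch rather than the configuration reported by the authors:

import weka.classifiers.functions.MultilayerPerceptron;

public class MlpSetup {
    // Returns a feed-forward MLP trained by backpropagation.
    public static MultilayerPerceptron build() {
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("a");  // "a" = (attributes + classes) / 2 hidden units
        mlp.setLearningRate(0.3);
        mlp.setMomentum(0.2);
        mlp.setTrainingTime(500);  // number of training epochs
        return mlp;
    }
}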
Evaluation on Training Set
Time taken to build model: 1 second
Correctly Classified Instances: 99 (100 %)
Incorrectly Classified Instances: 0 (0 %)
Kappa statistic: 1
Mean absolute error: 0.0089
Root mean squared error: 0.0152
Relative absolute error: 4.627 %
Root relative squared error: 4.9526 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0 10  0 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Stratified Cross-Validation
Time taken to build model: 0.86 seconds
Correctly Classified Instances: 98 (98.9899 %)
Incorrectly Classified Instances: 1 (1.0101 %)
Kappa statistic: 0.9866
Mean absolute error: 0.0126
Root mean squared error: 0.0502
Relative absolute error: 6.5723 %
Root relative squared error: 16.2596 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  1  0  0  0  0  9  0 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Simple CART
The CART algorithm (Classification And Regression Trees) is a data exploration and prediction algorithm. It is a classification method that constructs decision trees from historical data; the resulting decision trees are then used to classify new data. The number of classes must be known a priori in order to use CART. CART builds its decision trees from a so-called learning sample: a set of historical data with pre-assigned classes for all observations [16].
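WEKA's SimpleCart (bundled with WEKA 3.6 and installable as a package in later releases) follows this grow-and-prune scheme. The options below are its defaults written out as a sketch:

import weka.classifiers.trees.SimpleCart;

public class CartSetup {
    // Returns a CART learner: the tree is grown on the learning sample and
    // then cut back by cost-complexity pruning with internal cross-validation.
    public static SimpleCart build() throws Exception {
        SimpleCart cart = new SimpleCart();
        cart.setOptions(new String[] {
                "-M", "2",  // minimum number of instances at terminal nodes
                "-N", "5"   // folds of internal cross-validation used for pruning
        });
        return cart;
    }
}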
Evaluation on Training Set
Time taken to build model: 0.05 seconds
Correctly Classified Instances: 99 (100 %)
Incorrectly Classified Instances: 0 (0 %)
Kappa statistic: 1
Mean absolute error: 0
Root mean squared error: 0
Relative absolute error: 0 %
Root relative squared error: 0 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  0  3  0  0  0  0  0 | c = R
  0  0  0  3  0  0  0  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0 10  0 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
Stratified Cross-Validation
Time taken to build model: 0.02 seconds
Correctly Classified Instances: 93 (93.9394 %)
Incorrectly Classified Instances: 6 (6.0606 %)
Kappa statistic: 0.9185
Mean absolute error: 0.0163
Root mean squared error: 0.1106
Relative absolute error: 8.4774 %
Root relative squared error: 35.8587 %
Total Number of Instances: 99

=== Confusion Matrix ===
  a  b  c  d  e  f  g  h   <-- classified as
 17  0  0  0  0  0  0  0 | a = ABM
  0 42  0  0  0  0  0  0 | b = AR
  0  3  0  0  0  0  0  0 | c = R
  0  0  0  0  0  0  3  0 | d = ABMR
  0  0  0  0  6  0  0  0 | e = AMR
  0  0  0  0  0 11  0  0 | f = A
  0  0  0  0  0  0 10  0 | g = ABR
  0  0  0  0  0  0  0  7 | h = BR
EXPERIMENTAL RESULTS
In this section, we present the results of the different classification algorithms and analyse their performance to verify the effectiveness of each algorithm. The aim of this work is to identify the best algorithm for our data set, so the algorithms are also compared between the training-set and cross-validation evaluations. Table 1 summarises the computational time, correctly classified instances, kappa statistic and error measures (MAE, RMSE, RAE, RRSE) evaluated on the training set for each classification algorithm. Regarding computational time, Bayes Net, IBK and Naive Bayes compute fastest, but Naive Bayes also obtains the worst results when we examine the kappa statistic, the error measures and the confusion matrix; it should also be noted that Bayes Net shows the best performance when cross-checked against the same criteria.

Setting computation time aside, an analysis of kappa and the other measures shows that algorithms such as Bayes Net, MLP and Simple CART play the major role in providing the best accuracy. Kappa is a chance-corrected measure of agreement between the predicted and actual classes: Kappa = 1 indicates perfect agreement, Kappa = 0 indicates no agreement, and values greater than 0 mean the classifier is doing better than chance. The mean absolute error is the sum of the absolute errors divided by the number of predictions. The root mean squared error is the square root of the sum of squared errors divided by the number of predictions; it measures the differences between the values predicted by a model and the values actually observed. Smaller RMSE values indicate a model with better accuracy, so lower MAE and RMSE mean better prediction and accuracy.
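Written out in standard notation (a sketch consistent with the definitions above), with p_o the observed agreement, p_e the agreement expected by chance, and y_i and \hat{y}_i the actual and predicted values over n predictions:

\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}.
\]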
Using Training Set
Table 1: Accuracy Results of all Methods in Training Set
Algorithm        Time (s)   Correctly   Incorrectly   Kappa    MAE      RMSE     RAE %     RRSE %
Bayes Net        0          99          0             1        0.0194   0.0521   10.1457   16.922
Naive Bayesian   0          98          1             0.9867   0.0339   0.0861   17.6942   27.9577
J48              0.02       98          1             0.9867   0.0042   0.0459   2.197     14.9017
IBK              0          99          0             1        0.0019   0.0045   1.0033    1.4611
MLP              1          99          0             1        0.0089   0.0152   4.627     4.9526
Simple CART      0.05       99          0             1        0        0        0         0

Figure 4: Comparison of Classifiers Based on Accuracy
Figure 4 compares the accuracy of the classifiers in several respects. The graph shows that Bayes Net, IBK, MLP and Simple CART deliver the best performance results.

Table 2 summarises the results of the various classification techniques under k-fold cross-validation. Here, the performance analysis indicates that computation time is high for the MLP and Simple CART algorithms, whereas it is lower for the other learning algorithms (Bayes Net, Naive Bayes, J48 and IBK); however, for several of these, measures such as the kappa statistic, MAE, RMSE and the confusion matrix indicate lower accuracy. IBK (k-nearest neighbour) shows highly accurate results in all respects: compared to all the algorithms it shows the best accuracy, and although not flawless, its cross-validation performance approaches the perfect level. Interestingly, if the performance analysis is done independently of time, the MLP algorithm shows good accuracy in both cases.
Using Cross-Validation
Table 2: Accuracy Results of all Methods in Cross-Validation
Algorithm        Time (s)   Correctly   Incorrectly   Kappa    MAE      RMSE     RAE %     RRSE %
Bayes Net        0          98          1             0.9867   0.0247   0.0743   12.8792   24.5768
Naive Bayesian   0          95          4             0.9458   0.0401   0.1055   20.8635   34.2137
J48              0          96          3             0.96     0.0081   0.0817   4.2058    26.4947
IBK              0          98          1             0.9867   0.0051   0.0438   2.035     14.2084
MLP              0.86       98          1             0.9866   0.0126   0.0502   6.5723    16.2596
Simple CART      0.02       93          6             0.9185   0.0163   0.1106   8.4774    35.8587
Figure 5: Comparison of Classifiers Based on Performance
Figure 5 depicts the performance of each classifier. To evaluate the performance of the different methods, we carried out a detailed analysis of the above characteristics, from which we can see that Simple CART achieves the best classification performance.
CONCLUSIONS
A variety of classification methods has been applied and tested on the deforestation data. Our main aim was to identify the best algorithm for our data set. For this purpose, we compared the performance of different classification algorithms in the WEKA machine learning tool. Selecting the best algorithm is important for obtaining accurate results, and no single observed algorithm dominates: some algorithms, such as Bayes Net, MLP and Simple CART, yield the best results, but their computation time, MAE and RMSE differ in each case. The results and findings of the presented study may be used to develop a new algorithm that reflects the best properties of the different classification algorithms. To obtain optimal results for our data set, we therefore propose as future work a hybrid algorithm combining the best properties of the above algorithms.
ACKNOWLEDGEMENTS
The authors are grateful to the UGC for providing the funds for our work.
REFERENCES
1. D. A. Avellaneda, et al., "Natural Texture Classification: A Neural Network Models Benchmark," 2009, pp. 325-329.

2. R. Bayardo, "Brute-force mining of high-confidence classification rules," Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97), AAAI Press, Newport Beach, CA, United States, August 1997, pp. 123-126.

3. Bernhard Pfahringer, Geoffrey Holmes and Richard Kirkby, "Optimizing the Induction of Alternating Decision Trees," Proceedings of the Fifth Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2001, pp. 477-487.

4. Chung-Chian Hsu and Sheng-Hsuan Wang, "An integrated framework for visualized and exploratory pattern discovery in mixed data," IEEE Transactions on Knowledge and Data Engineering, 2006, 18(2): 161-173.

5. K. C. Desouza, "Artificial intelligence for healthcare management," Proceedings of the First International Conference on Management of Healthcare and Medical Technology, Enschede, Netherlands: Institute for Healthcare Technology Management, 2001.

6. Daniel Grossman and Pedro Domingos, "Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood," Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004.

7. U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery: An Overview," in U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.), Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, 1996, pp. 1-36.

8. Y. Herdiyeni, et al., "A Bayesian network approach for image similarity," International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME), 2009, pp. 1-6.

9. T. M. Mitchell, Machine Learning, McGraw-Hill, USA, 1997.

10. K. R. Manjula, S. Jyothi, and S. Anand Kumar Varma, "Analysing the factors of deforestation using GIS."

11. K. R. Manjula, S. Jyothi, S. Anand Kumar Varma, and S. Vijaya Kumar, "Construction of Spatial Dataset from Remote Sensing using GIS for Deforestation Study," International Journal of Computer Applications (0975-8887), Volume 31, No. 10, October 2011.

12. G. Qiang, "An Effective Algorithm for Improving the Performance of Naive Bayes for Text Classification," Second International Conference on Computer Research and Development, 2010, pp. 699-701.

13. J. Quinlan, C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann, 1993.

14. S. Shekhar, et al., "Trends in Spatial Data Mining," in Data Mining: Next Generation Challenges and Future Directions, AAAI Press, 2004.

15. C4.5 algorithm, Wikipedia, the free encyclopedia.

16. CART (Classification and Regression Trees), Wikipedia, the free encyclopedia.