International Journal of Computer Trends and Technology (IJCTT) - volume4 Issue5–May 2013
A Survey Report on RFM Pattern Matching Using
Efficient Multi-Set HPID3 Algorithm
Priyanka Rani1, Nitin Mishra2 ,Samidha Diwedi Sharma3
Abstract— With the increasing competition in the retail industry, the
main focus of a superstore is to classify valuable customers
accurately and quickly among a large volume of data. In this
paper a survey report on the proposed HPID3 classification
algorithm is presented. It creates an innovative decision tree
structure in order to predict future outcomes. This technique
proves better than the existing efficient classification techniques
that have been proposed in recent times. In the proposed
technique, the concept of Recency, Frequency and Monetary (RFM) is
introduced, which is usually used by marketing investigators to
develop marketing rules and strategies and to find important
patterns. The conventional ID3 algorithm is modified by first
horizontally splitting the sample RFM dataset; mathematical
calculations are then applied to construct the decision tree and
to predict future customer behaviours by pattern matching. The
proposed algorithm significantly reduces the processing time, the
mean absolute error, the relative absolute error, and the memory
space required for processing the dataset and constructing the
decision tree. The technique works optimally for both small and
large databases. The performance of the proposed algorithm is
analysed on the standard RFM dataset and compared with the
conventional ID3 algorithm using the Weka data mining tool,
where the results are better for the proposed technique.
Keywords— Data mining, ID3, HPID3, RFM, customer
classification, Decision tree
I. INTRODUCTION
Data mining is the exploration and analysis of large quantities
of data in order to discover valid, novel, potentially useful,
and ultimately understandable patterns in data. Exploiting
large volumes of data for superior decision making by looking
for interesting patterns in the data has become a main task in
today’s business environment. Data classification is one of the
most widely used technologies in data mining. Its main
purpose is to build a classification model through which the
data records in the database can be mapped to a particular
subclass. Classification is essential for organizing data and
retrieving information correctly and rapidly. At present, the
decision tree has become an important data mining method. It
is a more general data classification function approximation
algorithm based on machine learning. There exist many
methods to do decision analysis. Each method has its own
advantages and disadvantages. In machine learning, decision
tree learning is one of the most popular techniques for making
classification decisions in pattern recognition. The basic
learning approach of a decision tree is a greedy algorithm,
which uses a recursive top-down construction of the tree structure.
The decision tree as a classification method has the advantage
of processing large amounts of data quickly and accurately.
ID3 algorithm is one of the most widely used mathematical
algorithms for building the decision tree. It was invented by J.
Ross Quinlan and uses Information Theory invented by
Shannon. It builds the tree from the top down, with no
backtracking. ID3 algorithm is an example of Symbolic
Learning and Rule Induction. It is also a supervised learner
which means it looks at examples like a training data set to
make its decisions. The key feature of ID3 is choosing
information gain as the standard for testing attributes for
classification and then expanding the branches of decision tree
recursively until the entire tree has been built completely. The
main objective of this paper is to predict future customer
behaviours from customer purchasing R-F-M dataset using
HPID3 algorithm. Firstly the historical data is collected from
which we can predict customer behaviours or customer
purchasing patterns and then the entire dataset is partitioned
horizontally. Using R–F–M attributes of customer transaction
dataset classification rules are discovered by HPID3 algorithm
in which divide-and-conquer strategy is used and the decision
tree is pruned to avoid the overfitting problem. The mathematical
calculation for entropy and information gain is performed on
each attribute in multiple partitions. At the end, the
information gains of the same attribute in multiple partitions
are added together to find out the total information gain of a
particular attribute in the given training dataset. The attribute
with the largest information gain is selected as the root of the
decision tree and their values as its branches. After that
information gain of the remaining attributes is again
calculated with respect to selected attribute. This process
continues recursively until it comes to a conclusion at the
decision tree's leaf node. In this way the proposed system will
minimize information content needed and the average depth of
the generated decision tree.
II. RELATED WORK
In 2011 Wei Jianping did research on VIP Customer
Classification Rule Based on RFM Model. The main objective
of this paper is to expose the most important VIP customers
from all customers, using data mining based on rough set
technology. An RFM model is used in classification of VIP
customers, which achieves the simplification and extraction of
rules. Finally he concluded that the proposed work provides
one new method for the fast determination of customer rank,
which is helpful for enterprises to develop more feasible
marketing programs for different customers [1]. In 2010 Hui-Chu Chang proposed an EL-RFM model (for E-Learning
RFM) to measure learners' learning motive force. The model
is an approximate RFM model & its aim is to measure
students' learning motive forces, and then the instructor can
understand and analyse students' behaviours. He concluded that the
instructor can use the EL-RFM model to quantify learners'
learning behaviours even though the learning environment is
the Internet [2]. In 2010 Ya-Han Hu, Fan Wu and Tzu-Wei Yeh
proposed a tree structure, named RFMP-tree, to
compress and store entire transactional database and a new
RFMP-growth algorithm to discover all RFM-patterns from
RFMP-tree. The experimental results showed that the
proposed method can not only significantly reduce the size of
outputted patterns, but also retain more meaningful results to
users [3]. In 2011 Xiaojing Zhou, Zhuo Zhang and Yin Lu
reviewed five kinds of customer segmentation (CS) methods
used in traditional market segmentation and analysed their
efficiency and applicability: CS based on vital statistics,
CS based on lifestyle, CS based on the way of acting, CS
based on the value of corporate profit, and CS based on
profit [4]. In 2011 Chaohua Liu investigated the scope of
customer value and brought forward a specific definition of
customer value. He analysed customer value according to
current value, potential value and customer loyalty, and
proposed a model to classify customers into sixteen categories
based on a Self-Organizing Map [5]. In 2010 Mei-Ping Xie,
Wei-Ya Zhao proposed a decision tree model to analyze the
main factors that affect Customers’ satisfaction degree such as
the factors of price, quality, marketing and promotion. Finally,
the author concluded that if one company wants to improve its
Customers’ Satisfaction degree, managers should try their best
to improve the level of technology and the employees’
professional skills, innovate their products continuously to
satisfy more customers' needs [6]. In 2010 Mehdi Bizhani and
Mohammad Jafar Tarokh defined a new RFM approach based on the
banking industry to analyze transactional behavior; it is used
in both the preprocessing and segmentation phases. Segmentation
based on RFM helps to achieve a better understanding of groups
of customers. For segmentation, the ULVQ (unsupervised learning
vector quantization) algorithm along with thirteen clustering
validation indices is applied to find more exact and
understandable results, and a framework is proposed to find
exact and valuable segments. The result of this analysis helps
the bank better plan its marketing and CRM issues. After
calculating RFM scores for every POS, the result is analyzed to
find the interesting segment of shops; the result also flags a
considerable group of inactive POSs [7]. In 2010 Liu Yuxun and
Xie Niuniu proposed a new decision tree algorithm based on
attribute importance. The improved algorithm uses attribute
importance to increase the information gain of attributes that
have fewer values, and compares ID3 with the improved ID3 by an
example [8]. In 2010 Liu Jiale and Duhuiying proposed a
framework to analyze and calculate customer value as defined by
the RFM model. By evaluating
customer value, the author concluded that identifying high-value customers for the airlines is extremely important for
airlines to make proper marketing strategies and realize
precise marketing. The analysis of the experimental data shows
that the improved ID3 algorithm has better classification
accuracy than ID3 and yields more reasonable, more effective
and more objective classification rules [9]. In 2010 the ID3
algorithm was applied to an intrusion detection system, where
it produced false alarms and missed alarms. To address this
fault, an improved decision tree algorithm was proposed: the
decision tree is created after the collected data are classified
correctly, so the tree is not high and has few branches.
Experimental results showed the effectiveness of the algorithm;
the false alarm rate and omission rate decreased, increasing the
detection rate and reducing the space
consumption [10]. In 2011 Imas Sukaesih Sitanggang, Razali
Yaakob, Norwati Mustapha, Ahmad Ainuddin B Nuruddin
proposed a new spatial decision tree algorithm based on the
ID3 algorithm for discrete features represented in points, lines
and polygons. The new formula for spatial information gain is
proposed using spatial measures for point, line and polygon
features. Empirical result demonstrated that the proposed
algorithm can be used to join two spatial objects in
constructing spatial decision trees on small spatial dataset.
The proposed algorithm has been applied to the real spatial
dataset consisting of point and polygon features. The result is
a spatial decision tree with 138 leaves and the accuracy is
74.72% [11]. In 2011 Jun-Hui Liu and Na Li proposed an
optimized ID3 algorithm based on attribute importance and a
convex function. Compared with the traditional ID3 algorithm,
the optimized one possesses higher average classification accuracy.
Meanwhile, it has fewer decision leaves, which reduces its
complexity. Especially when the data set is bigger, the
efficiency and performance of the optimized ID3 algorithm are
better and its superiority is more obvious [12]. In 2011
Xingwen Liu, Dianhong Wang, Liangxiao Jiang, Fenxiong
Chen and Shengfeng Gan , used the improved information
gain based on dependency degree of condition attributes on
decision attribute as a heuristic for selecting the optimal
splitting attribute, in order to overcome the above-stated
shortcoming of the traditional ID3 algorithm. Experiments
proved that the tree size and classification accuracy of the
decision trees generated by the improved algorithm are superior
to the ID3 algorithm [13]. In 2011 Liu Zhongtao and Wang Hong
reduced the drawback of ID3's attribute dependency towards
many-valued attributes by optimizing the decision tree through
a twice-computed information gain, improving the information
gain with respect to the interests of the users, and building
on the calculation properties of information gain in ID3. The
experiment proved that, compared with the traditional method,
the optimized ID3 provides higher accuracy and counting speed;
in addition, the constructed decision tree has a lower average
number of leaves [14]. In
2010 I-Cheng Yeh, King-Jang Yang and Tao-Ming Ting used a
Bernoulli sequence in probability theory to derive the
formula that can estimate the probability that one customer
will buy at the next time, and the expected value of the total
number of times that the customer will buy in the future. It
introduced a comprehensive methodology to discover the
knowledge for selecting targets for direct marketing from a
database. This methodology leads to more efficient and
accurate selection procedures than the existing ones. In the
empirical part he examined a case study, blood transfusion
service, to show that the methodology has greater predictive
accuracy than traditional RFM approaches [15]. In the RFM
model, recency (R) is, in general, defined as the interval from
the time of the latest consumption to the present, frequency
(F) is the number of consumptions within a certain period, and
monetary (M) is the amount of money spent within a certain
period. An earlier study showed that customers with bigger R
and F values are more likely to make a new trade with the
enterprise; moreover, the bigger M is, the more likely the
corresponding customers are to respond to the enterprise's
products and services again [19]. While the three parameters
are considered equally important in [17], they are unequally
weighted due to the characteristics of industry in [18]. In [17],
each of the R, F, M dimensions is divided into five equal parts
and customers are clustered into 125 groups according to their
R, F, M values. Consequently, the high potential groups (or
customers) can be easily identified. In [18], the RFM model is
utilized in profitability evaluation and a weighted-based
evaluation function was proposed. The spatial decision tree
algorithm of [21] considers not only the attributes of the
object to be classified but also the attributes of neighbouring
objects; however, it makes no distinction between thematic
layers and takes into account only one spatial
relationship [21]. The decision tree from spatial data was also
proposed as in [22]. The approach for spatial classification
used in [22] is based on both (1) non-spatial properties of the
classified objects and (2) attributes, predicates and functions
describing spatial relation between classified objects and other
features located in the spatial proximity of the classified
objects. Reference [23] discusses another spatial decision tree
algorithm namely SCART (Spatial Classification and
Regression Trees) as an extension of the CART method. The
CART (Classification and Regression Trees) is one of most
commonly used systems for induction of decision trees for
classification proposed by Brieman et. al. in 1984. The
SCART considers the geographical data organized in thematic
layers, and their spatial relationships.
III. EXISTING ID3 ALGORITHM
ID3 algorithm is one of the most widely used mathematical
algorithms for building the decision tree. It was invented by J.
Ross Quinlan and uses Information Theory invented by
Shannon. It builds the tree from the top down, with no
backtracking. The key feature of ID3 is choosing information
gain as the standard for testing attributes for classification and
then expanding the branches of decision tree recursively until
the entire tree has been built completely.
Input:
R, a set of attributes.
C, the class attributes.
S, data set of tuples.
Output:
The Decision tree
Procedure:
1. If R is empty then
2. Return the leaf having the most frequent value in data set S.
3. Else if all tuples in S have the same class value then
4. Return a leaf with that specific class value.
5. Else
6. Determine attribute A with the highest information gain in
S.
7. Partition S in m parts S (a1), ..., S (am) such that a1, ..., am
are the different values of A.
8. Return a tree with root A and m branches labeled a1...am,
such that branch i contains ID3(R − {A}, C, S (ai)).
9. End if
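The procedure above can be sketched as a small recursive Python function. This is an illustrative reimplementation, not the authors' code: rows are assumed to be dicts with discrete attribute values, and all function and variable names are my own.

```python
import math
from collections import Counter

def info(labels):
    """Bits needed to classify a list of class labels (Shannon entropy)."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    if not attributes:                 # steps 1-2: R empty -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:          # steps 3-4: pure node -> class-value leaf
        return labels[0]
    # step 6: pick the attribute with the highest information gain
    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            sub = [r[target] for r in rows if r[a] == v]
            remainder += len(sub) / len(rows) * info(sub)
        return info(labels) - remainder
    best = max(attributes, key=gain)
    # steps 7-8: one branch per value of the best attribute, built recursively
    rest = [a for a in attributes if a != best]
    return (best, {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in set(r[best] for r in rows)})
```

The returned tree is a nested (attribute, {value: subtree}) pair, with class labels at the leaves, mirroring "root A and m branches labelled a1…am" in step 8.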
Basic Concepts
Definition 1
Information:
Assume there are two classes, P and N. Let dataset S contain p
elements of class P and n elements of class N. The amount of
information needed to decide if an arbitrary instance in S
belongs to P or N is defined as

I(p, n) = -(p / (p + n)) log2(p / (p + n)) - (n / (p + n)) log2(n / (p + n))
Definition 2
Entropy:
Suppose A is an attribute having v different values
{a1, a2, …, av}. Using this attribute, S can be divided into v
subsets {S1, S2, …, Sv}. If Si contains pi examples of P and ni
examples of N, the entropy, or the expected information needed
to classify objects in all subtrees Si, is

E(A) = Σ (i = 1 to v) ((pi + ni) / (p + n)) I(pi, ni)
Definition 3
Gain:
The encoding information that would be gained by branching on
attribute A is

GAIN(A) = I(p, n) - E(A)
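The three definitions translate directly into code. A minimal two-class sketch (function names and the example counts are mine, chosen only to illustrate the formulas):

```python
import math

def information(p, n):
    """I(p, n): bits needed to decide whether an instance is in class P or N."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # 0 * log2(0) is taken as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

def entropy(partitions):
    """E(A): expected information after splitting S by attribute A.
    `partitions` is a list of (p_i, n_i) pairs, one per value of A."""
    p = sum(pi for pi, ni in partitions)
    n = sum(ni for pi, ni in partitions)
    return sum((pi + ni) / (p + n) * information(pi, ni)
               for pi, ni in partitions)

def gain(p, n, partitions):
    """GAIN(A) = I(p, n) - E(A)."""
    return information(p, n) - entropy(partitions)

# Example: 9 positive / 5 negative instances, split into three subsets by A
print(gain(9, 5, [(2, 3), (4, 0), (3, 2)]))
```

ID3 evaluates `gain` for every candidate attribute and branches on the largest value.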
IV. RECENCY FREQUENCY MONETARY
The RFM is the most frequently adopted segmentation
technique; it contains three measures (recency, frequency
and monetary) and is used to analyse the behaviour of a customer
and then make predictions based on the behaviour in the
database. Segmentation divides markets into groups with
similar customer needs and characteristics that are likely to
exhibit similar purchasing behaviours. Due to the
effectiveness of RFM in marketing, some data mining
techniques have been carried out with RFM. The most
common techniques are: (1) clustering and (2) classification.
Clustering based on RFM attributes provides information on
customers' actual marketing levels. Classification based on
RFM attributes provides valuable information for managers
to predict future customer behaviour.
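The clustering use of RFM is often quintile-based, as in the 125-group scheme of [17] discussed in Section II: each of R, F, M is divided into five equal-frequency parts. A toy sketch of that scoring (the data and column names are invented; pandas' `qcut` is assumed available):

```python
import pandas as pd

# Toy customer summary: recency in days (smaller = better), frequency, monetary
df = pd.DataFrame({
    "customer": list("ABCDEFGHIJ"),
    "recency":   [3, 40, 12, 90, 7, 60, 25, 5, 150, 33],
    "frequency": [12, 2, 7, 1, 9, 3, 5, 11, 1, 4],
    "monetary":  [900, 80, 450, 30, 700, 120, 300, 850, 20, 200],
})

# Five equal-frequency bins per dimension; recency labels are reversed so
# that the most recent customers get the highest score.
df["R"] = pd.qcut(df["recency"], 5, labels=[5, 4, 3, 2, 1]).astype(int)
df["F"] = pd.qcut(df["frequency"].rank(method="first"), 5,
                  labels=[1, 2, 3, 4, 5]).astype(int)  # rank breaks ties
df["M"] = pd.qcut(df["monetary"], 5, labels=[1, 2, 3, 4, 5]).astype(int)

# 5 x 5 x 5 = 125 possible segments; "555" = recent, frequent, high-spending
df["segment"] = df["R"].astype(str) + df["F"].astype(str) + df["M"].astype(str)
print(df[["customer", "segment"]])
```

From these segments, the high-potential groups (e.g. "555") can be identified directly, which is the clustering use described above.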
Definition:
Recency : Recency is commonly defined by the time period
from the last activity to now. A customer having a high score
of recency implies that he or she is more likely to make a
repeated action.
Monetary: Monetary is commonly defined by the total
amount of money spent during a specified period of time.
A higher monetary score indicates that the customer is more
important.
Frequency: Frequency is commonly defined by the number
of activities made in a certain time period. A higher frequency
score indicates greater customer loyalty: the customer has great
demand for the product and is more likely to purchase the
products repeatedly.
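Under these definitions, R, F and M can be derived from a raw transaction log in a few lines. A sketch with invented toy data; only the three definitions above come from the paper:

```python
from datetime import date

# Invented toy transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("C1", date(2013, 4, 28), 120.0),
    ("C1", date(2013, 5, 10), 60.0),
    ("C2", date(2013, 2, 14), 300.0),
    ("C1", date(2013, 5, 20), 45.0),
    ("C2", date(2013, 5, 1), 80.0),
]
today = date(2013, 5, 31)

# Accumulate (latest purchase date, activity count, total spent) per customer
totals = {}
for cust, when, amount in transactions:
    last, freq, money = totals.get(cust, (date.min, 0, 0.0))
    totals[cust] = (max(last, when), freq + 1, money + amount)

# Recency = days since last activity, Frequency = count, Monetary = total spent
rfm = {c: ((today - last).days, freq, money)
       for c, (last, freq, money) in totals.items()}
print(rfm)
```

The resulting (R, F, M) triples are exactly the attributes the HPID3 decision tree is built on.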
ADVANTAGES OF RFM METHODOLOGY
1. It not only determines the existence of a pattern but also
checks whether it conforms to the recency & monetary
constraints.
2. It measures when people buy, how often they buy and how
much they buy.
3. It allows marketers to test marketing actions to smaller
segments of customers, and direct larger actions only towards
those customer segments that are predicted to respond
profitably.
4. Company can easily classify their customers who are
important & valuable.
5. It can increase a company's profit in a short time.
6. It acts as a common tool to develop marketing strategies.
7. It facilitates to choose which customers to target with an
offer.
8. It helps in identifying significant and valuable customers.
9. It is used to identify customers and analyse customer
profitability.
10. Firms can get much benefit from the adoption of RFM,
encompassing increased response rates, lowered order cost
and greater profit.
V. PROPOSED SYSTEM
Firstly the RFM dataset is collected from which we can
predict customer behaviours or customer purchasing patterns
and then the entire dataset is partitioned horizontally. Using
R–F–M attributes of customer transaction dataset
classification rules are discovered by HPID3 algorithm in
which divide-and-conquer strategy is used and the decision
tree is pruned to avoid the overfitting problem. The mathematical
calculation for entropy and information gain is performed on
each attribute in multiple partitions. At the end, the
information gains of the same attribute in multiple partitions
are added together to find out the total information gain of a
particular attribute in the given training dataset. The attribute
with the largest information gain is selected as the root of the
decision tree and their values as its branches. After that
information gain of the remaining attributes is again
calculated with respect to selected attribute. This process
continues recursively until it comes to a conclusion at the
decision tree's leaf node. In this way the proposed system will
minimize information content needed and the average depth of
the generated decision tree.
Step 1: Data Preparation for RFM Modeling
The first step is to collect RFM dataset from which we can
predict customer behaviours, or customer purchasing patterns.
Using RFM model we can select customers who are more
likely to buy. In this way, we can improve marketing
efficiency and maximize profit.
Step 2: Horizontally partitioning the dataset
After pre-processing, the dataset is partitioned horizontally
into multiple partitions.
Step 3: Develop Decision Tree using HPID3 algorithm
Using R–F–M attributes of dataset classification rules are
discovered by HPID3 algorithm in which divide-and-conquer
strategy is used and the decision tree is pruned to avoid the
overfitting problem. It calculates the overall entropy and information
gains of all attributes. The attribute with the highest
information gain is chosen to make the decision. So, at each
node of tree, horizontal partition decision tree chooses one
attribute that most effectively splits the training data into
subsets with the best cut point, according to the entropy and
information gain.
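The core of this step — computing information gain inside each horizontal partition and then summing the per-partition gains to rank the attributes — can be sketched as follows. The helper names are mine; the paper publishes no code, so this is only an illustration of the aggregation it describes:

```python
import math
from collections import Counter

def info(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def partition_gain(rows, attr, target):
    """Information gain of `attr` within one horizontal partition (dict rows)."""
    labels = [r[target] for r in rows]
    split = 0.0
    for v in set(r[attr] for r in rows):
        sub = [r[target] for r in rows if r[attr] == v]
        split += len(sub) / len(rows) * info(sub)
    return info(labels) - split

def total_gain(partitions, attr, target):
    """HPID3-style score: sum of the attribute's gain over all partitions."""
    return sum(partition_gain(p, attr, target) for p in partitions)

def best_attribute(partitions, attributes, target):
    """Attribute with the largest total information gain -> root of the tree."""
    return max(attributes, key=lambda a: total_gain(partitions, a, target))
```

For example, with partitions built from R–F–M rows, `best_attribute(partitions, ["R", "F", "M"], "class")` selects the root attribute, and the process repeats on the remaining attributes for each branch.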
Step 4: Targeting Segment Selection
The next step is to perform profit analysis by selecting the
segment with higher profit or higher response rate. Targeting
segments are selected so as to maximize profit.
Step 5: Fetching RFM values
Using the RFM model, RFM values are fetched and customers who
are more likely to buy can be selected.
Step 6: Pattern matching
After fetching the RFM values, frequent patterns are obtained
by sequentially matching them in the dataset.
Figure 1: Phases of the proposed system (collect historical data
→ data preprocessing → horizontally partition the dataset →
develop decision tree using HPID3 algorithm → targeting segment
selection → fetch RFM values → apply pattern matching → frequent
patterns).
The Proposed Algorithm:
1. Horizontally partition the dataset into multiple partitions
P1, P2, …, Pn.
2. Each partition contains R, a set of attributes A1, A2, …, AR.
3. C, the class attribute, contains c class values C1, C2, …, Cc.
4. For each partition Pi, where i = 1 to n, do
If R is empty then
Return a leaf node with the most frequent class value.
Else if all transactions in T(Pi) have the same class then
Return a leaf node with that class value.
Else
Calculate the expected information needed to classify the given
sample for each partition Pi individually.
Calculate the entropy of each attribute (A1, A2, …, AR) of each
partition Pi.
Calculate the information gain of each attribute (A1, A2, …, AR)
of each partition Pi.
End If
End For
5. Calculate the total information gain of each attribute over
all partitions (TotalInformationGain()).
6. ABestAttribute ← MaxInformationGain()
7. Let V1, V2, …, Vm be the values of ABestAttribute.
ABestAttribute partitions P1, P2, …, Pn into m partitions:
P1(V1), P1(V2), …, P1(Vm)
P2(V1), P2(V2), …, P2(Vm)
…
Pn(V1), Pn(V2), …, Pn(Vm)
8. Return the tree whose root is labelled ABestAttribute and has
m edges labelled V1, V2, …, Vm, such that for every i the edge
Vi goes to the tree built recursively on the tuples having value
Vi.
9. End.
VI. CONCLUSIONS
In this paper the concept of Recency, Frequency and Monetary
(RFM) is introduced, which is usually used by marketing
investigators to develop marketing strategies and to find
important patterns. The conventional ID3 algorithm is modified
by horizontally splitting the sample of the customer purchasing
RFM dataset, after which classification rules are discovered to
predict future customer behaviours by pattern matching. Both
algorithms have been tested on the blood transfusion service
centre RFM dataset. The experimental results show that the
proposed HPID3 algorithm is more effective than conventional
ID3 in terms of accuracy and processing speed.
REFERENCES
[1] Wei Jianping, "Research on Customer Classification Rule Extraction based on RFM Model and Rough Set", 2010 IEEE.
[2] Mei-Ping Xie, Wei-Ya Zhao, "The Analysis of Customers' Satisfaction Degree Based On Decision Tree Model", 2010 IEEE.
[3] Xiaojing Zhou, Zhuo Zhang, Yin Lu, "Review of Customer Segmentation method in CRM", 2011 IEEE.
[4] Liu Yuxun, Xie Niuniu, "Improved ID3 Algorithm", 2010 IEEE.
[5] Wei Jianping, "Research on VIP Customer Classification Rule Based on RFM Model", 2011 IEEE.
[6] Xingwen Liu, Dianhong Wang, Liangxiao Jiang, Fenxiong Chen, Shengfeng Gan, "A Novel Method for Inducing ID3 Decision Trees Based on Variable Precision Rough Set", 2011 IEEE.
[7] Imas Sukaesih Sitanggang, Razali Yaakob, Norwati Mustapha, Ahmad Ainuddin B Nuruddin, "An Extended ID3 Decision Tree Algorithm for Spatial Data", 2011 IEEE.
[8] Hnin Wint Khaing, "Data Mining based Fragmentation and Prediction of Medical Data", 2011 IEEE.
[9] Derya Birant, "Data Mining Using RFM Analysis", Dokuz Eylul University, Turkey.
[10] Hui-Chu Chang, "Developing EL-RFM Model for Quantification Learner's Learning Behavior in Distance Learning", Department of Electrical Engineering, National Taiwan University, 2010 IEEE.
[11] Ya-Han Hu, Fan Wu, Tzu-Wei Yeh, "Considering RFM-Values of Frequent Patterns in Transactional Databases", 2010 IEEE.
[12] Chaohua Liu, "Customer Segmentation and Evaluation Based On RFM, Cross-selling and Customer Loyalty", 2011 IEEE.
[13] Liu Jiale, Duhuiying, "Study on Airline Customer Value Evaluation Based on RFM Model", 2010 International Conference on Computer Design and Applications (ICCDA 2010).
[14] Zahra Tabaei, Mohammad Fathian, "Developing W-RFM Model for Customer Value: An Electronic Retailing Case Study", Department of Industrial Engineering, Iran University of Science and Technology.
[15] Chen Jin, Luo De-lin, Mu Fen-xiang, "An Improved ID3 Decision Tree Algorithm", Proceedings of the 2009 4th International Conference on Computer Science & Education.
[16] "Research and Improvement on ID3 Algorithm in Intrusion Detection System", 2010 Sixth International Conference on Natural Computation (ICNC 2010), 2010 IEEE.
[17] J. Miglautsch, "Thoughts on RFM scoring", Journal of Database Marketing, Vol. 6, 2000.
[18] C.-Y. Tsai and C.-C. Chiu, "A purchase-based market segmentation methodology", Expert Systems with Applications, Vol. 27, 2004.
[19] J. Wu and Z. Lin, "Research on Customer Segmentation Model by Clustering", Proceedings of the 7th ACM International Conference on Electronic Commerce (ICEC), 2005.
[20] C.-H. Cheng and Y.-S. Chen, "Classifying the segmentation of customer value via RFM model and RS theory", Expert Systems with Applications, Vol. 36, Issue 3, Part 1, April 2009.
[21] K. Zeitouni and C. Nadjim, "Spatial Decision Tree – Application to Traffic Risk Analysis", ACS/IEEE International Conference, IEEE, 2001.
[22] K. Koperski, J. Han and N. Stefanovic, "An efficient two-step method for classification of spatial data", Symposium on Spatial Data Handling, 1998.
[23] N. Chelghoum, Z. Karine and B. Azedine, "A Decision Tree for Multi-Layered Spatial Data", Symposium on Geospatial Theory, Processing and Applications, Ottawa, 2002.
[24] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., San Diego, USA: Morgan Kaufmann, 2006.