International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 5, May 2012)
Comparison of Decision Tree and ANN Techniques for
Data Classification
K. Nandhini¹, S. Saranya²
¹Head, Computer Science, Dr. NGP Arts and Science College, Coimbatore-48
²Research Scholar (MPhil), Department of Computer Science, Dr. NGP Arts and Science College, Coimbatore-48
¹[email protected]
²[email protected]
Abstract— As far as the development of welfare is concerned, many factors have to be constantly monitored to achieve reliable solutions in various fields, particularly in the education domain. The first and foremost goal of any educational system is to continually maintain and increase graduation rates. To do this well, the performance and attitude of pupils towards their education should be carefully monitored. Generally, performance analysis and monitoring involve gathering both formal and informal data to help the decision-making process of a domain achieve its goals. It examines several perspectives on a problem and proposes a solution based on what is discovered in the data. The core function of a performance analysis model is classification, and various classification techniques are used to improve the accuracy and reliability of prediction. This paper compares the decision tree algorithms J4.8 and ID3 with the Cascade-Correlation (CC) algorithm of ANN. The Cascade-Correlation algorithm has several advantages: it learns very quickly, the network determines its own size and topology, and it retains its structure even when the training set changes. The methodology extracts highly useful, reliable and novel patterns from the dataset, and the patterns obtained with the decision tree algorithms J4.8 and ID3 are compared with those of the ANN for result prediction in the problem domain.

Keywords—ANN, CC, Data Mining, ID3, J4.8, Text Classification.

I. INTRODUCTION

Data mining techniques, when compared with earlier methodologies, prove to be effective and reliable for retrieving information from a database and providing a provable solution to the respective problem domain, eliminating the redundancy that occurs with other techniques.

The main concept that contributes to data mining is the knowledge discovery process, a sequential process intended to provide a reliable prediction model for the defined problem domain. In simple words, it can be defined as the process of extracting potentially useful, valid and novel patterns from the data warehouse for resolving a problem domain. Data mining is used in several applications such as banking, insurance, healthcare and education. In the case of education, data mining plays a vital role in monitoring, assessing, evaluating and predicting academic performance in institutions period by period, by employing various techniques on each dataset obtained.

Educational Data Mining (EDM) is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand pupils and the settings in which they learn. A key area of EDM is mining pupils' performance; another is mining enrollment data. The areas of EDM application include providing feedback to support instructors, making recommendations for pupils, predicting pupil performance, pupil modeling, detecting undesirable pupil behavior, grouping pupils, constructing courseware, and planning and scheduling. Recently, many methodologies using data mining have been developed for higher education. The main aim of such a methodology is to determine how accurately it predicts a pupil's performance based on the attributes and profile present. Performance monitoring involves tests which provide information that is useful for pupils and teachers when making decisions for the future, so this technique is employed to predict accuracy, which is a key constraint in EDM [1, 12].

Data mining involves various techniques and algorithms to accomplish different processes and tasks, all of which attempt to build a model of the data.
All these algorithms play a common role: they help to determine a model for the problem domain based on the data fed into the system. A data mining model can be either predictive or descriptive in nature. A predictive model makes a prediction about the values of data using known results from various types of data; predictive data mining tasks include classification, regression, time series analysis and prediction. Prediction may also be used to indicate a specific type of data mining function. A descriptive model identifies patterns or relationships in data; unlike the predictive model, it serves as a way to explore the properties of the data examined, not to predict new properties.

Classification techniques map the data into predefined classes [4]. Classification is also known as supervised learning, because the classes are determined before examining the data. Examples of classification applications are bank loan approval and identifying credit risks. This paper is one such application of classification, designed for performance assessment by applying classification techniques and their algorithms. The most commonly used technique in classification is the decision tree. A decision tree is a tree in which each internal node is a test on an attribute, each branch is a test outcome, and each leaf node holds a class label. Based on the classification task employed on the datasets, the results are derived. This technique provides maximum accuracy in predicting the future academic results of students.

As a whole, performance analysis depends upon the work and assignments done by the pupil in the respective semester, their attitude towards academics, and the marks secured through a sound method of questioning and testing their skills in various forms, which helps to predict exactly which pupils may need some extra attention and coaching in the course of their education. The model developed helps to achieve a precise solution for performance compilation that yields faster results and meets the demands and needs of both the institution and the pupils' welfare.

II. METHODOLOGY

The data used for this research is the pupil data of a computer science department [2]. The information includes age, gender, religion, country, and academic details such as the institution last studied, the course taken, the course currently being undergone, marks awarded, personal skills, key skills and other skills. This paper compares two decision tree algorithms, J4.8 and ID3, with the cascade-correlation algorithm of ANN.

Prediction is simply predicting a future value rather than a current state. The Cascade-Correlation (CC) and decision tree algorithms J4.8 and ID3 are applied to pupils' percentage assessment data to predict whether the pupils are eligible for higher studies. The predictor variables are the demographic profile, the pupils' UG percentage, and the parents' educational qualification. The comparison results show that the ANN is able to produce more accurate results than the decision tree algorithms J4.8 and ID3 [3, 5].

A. Artificial Neural Network

Neural network models in artificial intelligence are usually referred to as Artificial Neural Networks (ANN). These are essentially simple mathematical models defining a function f : X → Y, or a distribution over X or over both X and Y, but sometimes the models are also intimately associated with a particular learning algorithm or learning rule [2, 7, 10].

Artificial Neural Networks act as a "black box" approach to problem solving. The CC algorithm of ANN used in this paper acts as such a black box for providing the solution.

Fig 1: Layers of ANN

An ANN is a type of information processing network whose architecture is inspired by the structure of biological neural systems. Knowledge is acquired by the network through a learning process.
An ANN consists of three layers of nodes, namely input nodes, hidden nodes and output nodes, as represented in Figure 1. Input is fed into the multilayered input nodes, processed by the hidden nodes (the black box), and the output is generated from the output nodes. The input to individual neural network nodes must be numeric and fall in the closed interval from 0 to 1. Each pupil attribute must therefore be normalized; for example, age is divided by 100, while the pupils' gender and race are identified by binary inputs. In this case, information such as demographic data, academic profile and personal profile should be transformed into the required 0-1 range. After the hidden logic has been established on the inputs fed into the network, the output indicates whether a profile is qualified or disqualified [6, 12].
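The normalization just described can be sketched in a few lines of Python. The field names and scaling choices below are illustrative assumptions, not the authors' actual preprocessing code:

```python
# Sketch of mapping pupil attributes into the [0, 1] inputs an ANN expects.
# Age is scaled by dividing by 100, and gender becomes a binary input, as
# described above; the dictionary keys are hypothetical.

def normalize_pupil(pupil):
    """Return a numeric feature vector with every value in [0, 1]."""
    return [
        pupil["age"] / 100.0,                    # age divided by 100
        1.0 if pupil["gender"] == "F" else 0.0,  # binary input for gender
        pupil["ug_percentage"] / 100.0,          # UG percentage, 0-100 scale
    ]

sample = {"age": 21, "gender": "F", "ug_percentage": 78.5}
print(normalize_pupil(sample))  # → [0.21, 1.0, 0.785]
```

Every value in the resulting vector lies in the closed interval [0, 1], so it can be fed directly to the input nodes.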
B. Cascade-Correlation Algorithm

Cascade-Correlation is a supervised learning algorithm for Artificial Neural Networks (ANN). The algorithm begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure. The cascade-correlation architecture learns very quickly, it determines its own size and topology, and it requires no back-propagation of error signals through the connections of the network.

Algorithm steps:
1. CC starts with a minimal network consisting only of an input and an output layer. The two layers are fully connected.
2. Train all the connections ending at an output unit with a usual learning algorithm until the error of the net no longer decreases.
3. Generate the so-called candidate units. Every candidate unit is connected with all input units and with all existing hidden units. There are no weights between the pool of candidate units and the output units.
4. Try to maximize the correlation between the activation of the candidate units and the residual error of the net by training all the links leading to a candidate unit. Learning takes place with an ordinary learning algorithm. The training is stopped when the correlation score no longer improves.
5. Choose the candidate unit with the maximum correlation, freeze its incoming weights, and add it to the net. To change the candidate unit into a hidden unit, generate links between the selected unit and all the output units. Since the weights leading to the new hidden unit are frozen, a new permanent feature detector is obtained. Loop back to step 2.

C. Decision Tree

A decision tree is a tree-shaped structure that represents sets of decisions, and these decisions generate rules for the classification of a dataset. Trees can be developed to arbitrary accuracy, with validation data sets used to avoid fitting spurious detail. They are easy to understand and modify. Moreover, the tree representation yields explicit, easy-to-understand rules for each cluster of pupils' performance. A decision tree represents knowledge in the form of IF-THEN rules: one rule can be created for each path from the root to a leaf, and the leaf node holds the class prediction. This paper compares the two decision tree algorithms ID3 and J4.8 with ANN.

The decision tree approach is most useful in classification problems. With this technique a tree is constructed to model the classification process. The attributes in the database schema that are used to label nodes in the tree, and around which the divisions take place, are called the splitting attributes. Associated with the ordering of the attributes is the number of splits to take. With some attributes the domain is small, so the number of splits is obvious from the domain; if the domain is continuous or has a large number of values, the number of splits to use is not easily determined.

D. ID3

ID3, an early technique by the influential Ross Quinlan that shaped a large part of the research on decision trees, is useful to look at in order to understand basic decision tree construction.

Splitting Criteria:
A fundamental part of any algorithm that constructs a decision tree from a dataset is the method by which it selects attributes at each node of the tree.

Entropy:
A measure from information theory used in the ID3 algorithm, and in many others used in decision tree construction, is entropy. Informally, the entropy of a dataset can be considered a measure of how disordered it is. It has been shown that entropy is related to information, in the sense that the higher the entropy, or uncertainty, of some data, the more information is required in order to completely describe that data [8].
In building a decision tree, we aim to decrease the entropy
of the dataset until we reach leaf nodes at which point the
subset that we are left with is pure, or has zero entropy and
represents instances all of one class (all instances have the
same value for the target attribute).
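The entropy measure and the per-attribute gain that ID3 maximizes can be illustrated with a short Python sketch. This is a minimal illustration on invented pupil records, not the authors' implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return base - remainder

# Hypothetical records: UG-percentage band vs. eligibility for higher studies.
pupils = [
    {"ug": "high", "eligible": "yes"},
    {"ug": "high", "eligible": "yes"},
    {"ug": "low",  "eligible": "no"},
    {"ug": "low",  "eligible": "no"},
]
print(entropy([p["eligible"] for p in pupils]))    # → 1.0 (maximally mixed)
print(information_gain(pupils, "ug", "eligible"))  # → 1.0 (split leaves pure subsets)
```

A pure subset has zero entropy, so a split that fully separates the classes recovers the whole bit of uncertainty as gain.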
The entropy of a dataset S with respect to one attribute, in this case the target attribute, is given by the following calculation:

Entropy(S) = - Σi=1..C pi log2(pi)

where pi is the proportion of instances in the dataset that take the ith value of the target attribute, which has C different values. This probability measure gives an indication of how uncertain we are about the data. A log2 measure is used because it represents how many bits would be needed to specify the class (the value of the target attribute) of a random instance.

Algorithm steps:
If all the instances have the same value for the target attribute, return a decision tree that is simply this value (not really a tree, more of a stump).
Else:
1. Compute the gain values for all attributes, select the attribute with the highest value, and create a node for that attribute.
2. Make a branch from this node for every value of the attribute.
3. Assign all possible values of the attribute to the branches.
4. Follow each branch by partitioning the dataset to only those instances in which the value of the branch is present, and then go back to step 1.

E. J4.8

J4.8 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set S = s1, s2, ... of already classified samples. Each sample si = x1, x2, ... is a vector where x1, x2, ... represent attributes or features of the sample. The training data is augmented with a vector C = c1, c2, ... where c1, c2, ... represent the class to which each sample belongs. J48 is an open-source Java implementation of the C4.5 algorithm in the Weka data mining tool.

At each node of the tree, J4.8 chooses the one attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. Its criterion is the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data. The attribute with the highest normalized information gain is chosen to make the decision, and the J4.8 algorithm then recurses on the smaller sublists.

This algorithm has a few base cases:
- All the samples in the list belong to the same class. When this happens, the algorithm simply creates a leaf node for the decision tree saying to choose that class.
- None of the features provide any information gain. In this case, J4.8 creates a decision node higher up the tree using the expected value of the class.
- An instance of a previously unseen class is encountered. Again, J4.8 creates a decision node higher up the tree using the expected value.

Algorithm steps:
1. Check for the base cases.
2. For each attribute a, find the normalized information gain from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of the node.

III. RESULT DISCUSSION

The aim of the prediction analysis is to improve the academic status of the pupils. The ANN prediction algorithm is analyzed to find the pupils who are eligible for higher studies using the following method [9, 11]. The efficiency of the algorithms is measured by taking pupil information such as the pupils' UG percentage, board of study and parents' educational qualification and, depending on these details, finding the candidates eligible to enter higher studies by comparing the J4.8 and ID3 algorithms with the ANN algorithm.

Here a dataset of 2000 pupils was collected from the Department of Computer Science at Dr. NGP Arts and Science College. The algorithms are analyzed in the following terms:
1. Whether the pupils are eligible for higher studies.
2. The time taken to derive the tree.
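The kind of evaluation carried out here — deriving an entropy-based split from pupil records and timing the derivation — can be sketched as follows. The records and attribute names are invented for illustration; they are not the paper's 2000-pupil dataset:

```python
# Toy sketch: pick the best splitting attribute for "eligible for higher
# studies" by weighted entropy, and time the derivation.
import math
import time
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_split(rows, attrs, target):
    """Return the attribute whose split yields the lowest weighted entropy."""
    def weighted_entropy(attr):
        rem = 0.0
        for v in {r[attr] for r in rows}:
            sub = [r[target] for r in rows if r[attr] == v]
            rem += (len(sub) / len(rows)) * entropy(sub)
        return rem
    return min(attrs, key=weighted_entropy)

rows = [
    {"ug": "high", "board": "state",   "eligible": "yes"},
    {"ug": "high", "board": "central", "eligible": "yes"},
    {"ug": "low",  "board": "state",   "eligible": "no"},
    {"ug": "low",  "board": "central", "eligible": "no"},
]

start = time.perf_counter()
attr = best_split(rows, ["ug", "board"], "eligible")
elapsed = time.perf_counter() - start
print("best splitting attribute:", attr)  # → best splitting attribute: ug
print(f"derived in {elapsed:.6f} s")
```

Here "ug" perfectly separates the eligible and ineligible records, so it is chosen over "board"; the timing call mirrors the "time taken to derive the tree" measurement used in the comparison.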
Table 1. Comparison of ID3, J4.8 and CC Algorithms

Algorithms | Number of Male Students | Number of Female Students | Time
ID3        | 900                     | 1100                      | 3 seconds
J48        | 876                     | 925                       | 2 seconds
CC         | 1000                    | 950                       | 1 second
Fig 2: Eligibility Criteria for Male and Female (bar chart of the male, female and time values from Table 1 for ID3, J48 and CC)

The above table clearly shows the numbers of male and female pupils who are eligible for higher studies. Comparing the decision tree algorithms ID3 and J48 with the ANN (CC), the ANN finds more pupils entering higher studies than ID3 and J48. It is therefore concluded that the CC algorithm is more accurate than the other two decision tree algorithms.

IV. CONCLUSION

This paper compared two classification techniques, Artificial Neural Networks and decision trees, with respect to data classification. The techniques were applied to institutional data to predict the pupils' status. The efficiency of the algorithms was analyzed based on their accuracy and the time taken for processing. The results clearly show that the time taken by CC is less than that of the other two algorithms.

REFERENCES

[1] Alaa El-Halees, "Mining Students' Data to Analyze Learning Behavior: A Case Study", Department of Computer Science, Islamic University of Gaza.
[2] Anupama Kumar S. and Vijayalakshmi M. N., "A Novel Approach in Data Mining Techniques for Educational Data", Proc. 2011 3rd International Conference on Machine Learning and Computing (ICMLC 2011), Singapore, 26-28 Feb 2011.
[3] Delavari, N. (2005) "Application of Enhanced Analysis Model for Data Mining in Higher Educational System", IEEE.
[4] Han, J. and Kamber, M. (2001) "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers.
[5] Ibrahim, Z. and Rusli, D. (2007) "Predicting Students' Academic Performance: Comparing Artificial Neural Network, Decision Tree and Linear Regression", 21st Annual SAS Forum.
[6] Kotsiantis, S. B. et al., "Efficiency of Machine Learning Techniques in Predicting Students' Performance in Distance Learning Systems", Proc. Recent Advances in Mechanics and Related Fields.
[7] Luan, J. (2004) "Data Mining Applications in Higher Education", SPSS Executive Report; "Data Mining and Knowledge Management in Higher Education – Potential Applications", presentation at AIR Forum, Toronto, Canada, 2002.
[8] Mierle, K., Laven, K., Roweis, S. and Wilson, G. (2005) "Mining Student CVS Repositories for Performance Indicators".
[9] Naeimeh Delavari, Mohammad Reza Beikzadeh and Somnuk Phon-Amnuaisuk, "Application of Enhanced Analysis Model for Data Mining Process in Higher Educational System", ITHET 6th Annual International Conference, Juan Dolio, Dominican Republic, July 2005.
[10] Shaeela Ayesha et al., "Data Mining Model for Higher Education System", European Journal of Scientific Research, Vol. 43, No. 1 (2010).
[11] Varun Kumar and Anupama Chanda, "An Empirical Study of the Applications of Data Mining Techniques in Higher Education", (IJACSA) International Journal of Advanced Computer Science and Applications, March 2011.
[12] Waiyamai, K., "Improving the Quality of Graduate Students by Data Mining", Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand, 2003.