Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 5, May 2012) Comparison of Decision Tree and ANN Techniques for Data Classification K.Nandhini1, S.Saranya2 1 2 Head- Computer Science, Dr.NGP Arts and Science College, Coimbatore-48 Research Scholar (MPhil), Department of Computer Science, Dr.NGP Arts and Science College, Coimbatore-48 1 [email protected] [email protected] 2 In simple words it can be defined as the process of extracting the potentially useful, valid and novel patterns from the data warehouse for resolving a problem domain. Data mining is being used in several applications like banking, insurance, hospital and studies. In case of education, Data mining plays a vital role in monitoring, assessing, evaluating and predicting the academic performance in the institutions period by period consecutively by employing various techniques on each dataset obtained. Abstract— As the development of the welfare is concerned, many factors had to be constantly monitored to achieve reliable solutions in various fields particularly in the education domain. The first and foremost goal of any educational system is to continually maintain and increase the graduation rates periodically. To make this work better the performance and attitude of the pupil towards the education should be carefully monitored. Generally, performance analysis and monitoring involves gathering both formal and informal data to help decision making process of any domain to achieve their goals. It eliminates several perspectives on a problem and proposes a solution based on the data what is discovered. The core and primary function of performance analysis model is classification. Various techniques of classification are used to improve the accuracy and reliability of prediction. This paper compares decision tree algorithms J4.8 and ID3 with Cascade-Correlation (CC) algorithm of ANN. The Cascade-Correlation algorithms has several advantages that it learns the population very quickly, and also the network and topology determines its own size and it retains the structures even when the training set changes. This methodology extracts highly useful, reliable and novel patterns from the dataset and the patterns obtained are compared by means of decision tree algorithms J4.8, ID3 with ANN for result prediction to resolve the problem domain. Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand pupil’s, and the settings which they learn in. A key area of EDM is mining pupil’s performance. Another key area is mining enrollment data. The areas of EDM application are: Providing feedback for supporting instructors, Recommendations for pupil’s, Predicting pupil performance, Pupil modeling and detecting undesirable pupil’s behaviors, Grouping pupil’s, constructing courseware, Planning and scheduling. Recently many methodologies have been developed using data mining in educational systems for higher education. The main motto of this methodology is to determine how accurately it predicts the pupil’s performance based on the attributes and profile present. Performance monitoring involves tests which provide information that is useful for pupil’s and teachers to take decisions for future purpose. So this technique is being employed to predict the accuracy which is a key constraint in EDM [1, 12]. Keywords—ANN, CC, Data Mining, ID3, J4.8, Text Classification. I. INTRODUCTION Data mining techniques when compared with the earlier methodologies proves to be the effective and reliable one for retrieving information from the database and providing a provable solution to the respective problem domain by eliminating the redundancy that takes place with other techniques. The main concept that contributes to Data mining is the Knowledge discovery process, a sequential process which prone to provide a reliable prediction model for the defined problem domain. Data mining involves various techniques and algorithms to accomplish different process/tasks, in which all of these algorithms attempt to build a model to the data. 323 International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 5, May 2012) All these algorithms play a common role which helps to determine a model for the problem domain based on the data fed into the system. Data mining model can be created either predictive or descriptive in nature. A predictive model makes a prediction about the values of data using known results from various types of data. Predictive model of data mining tasks include classification, regression, time series analysis and prediction. Prediction may also be used to indicate a specific type of data mining function. A descriptive model identifies patterns or relationship in data. Unlike the predictive model, a descriptive model serves as a way new properties. marks obtained by a perfect method of questioning and testing their skills in various sorts that helps to predict exactly which pupil may need some extra attention and coaching in the course of their education. The model developed helps to achieve a precise solution for performance compilation that yields faster results and meets the demands and needs of both the institution and the pupils’ welfare. II. METHODOLOGY The data used for this research is the pupil data of computer science department [2]. The information such as age, gender, religion, country, academic details such as institution last studied, course taken, course undergoing now, marks awarded, personal skills, key skills and other skills. This paper compares two decision tree algorithms J4.8 and ID3 with cascade-correlation algorithm of ANN. Prediction is simply predicting a future value rather than a current state. The Cascade Correlation (CC) and decision tree algorithms J4.8 and ID3 are applied on pupils’ percentage assessment data to predict whether the pupils’ are eligible for higher studies. The predictor variables are Demographic profile, pupils’ UG percentage, Parents Educational Qualification. The comparison results proved that the ANN is able to produce accurate results compared with the decision tree algorithms J4.8 and ID3 [3,5]. A. Artificial Neural Network Neural network models in artificial intelligence are usually referred to as Artificial Neural Networks (ANN), these are essentially simple mathematical models defining a function or a f : x y distribution over X or both X and Y, but sometimes models are also intimately associated with a particular learning algorithm or learning rule[2,7,10]. Classification techniques map the data into predefined classes [4]. It is also known as supervised learning, because the classes are determined before examining the data. Examples of classification applications are bank loan and identifying credit risks. This paper is one such application of classification which is designed for performance assessment by applying the classification techniques and its algorithms respectively. The most commonly used technique in classification is the decision tree. A decision tree is a tree where internal node is a test attribute, tree branch is the test outcome and leaf node is the class label or class. Based on the classification task employed on the datasets, the results are derived. This technique provides maximum accuracy in the prediction of academic results of the students in the future. Artificial Neural Networks acts as a ―black box‖ approach to problem solving. CC algorithm of ANN is used in this paper which acts as a black box for providing the solution. Fig 1: Layers of ANN ANN is a type of information processing network whose architecture is inspired by the structure of biological neural system. Knowledge is acquired by the network through a learning process. As a whole, the Performance analysis depends upon the work and assignments done by the pupil in the respective semester, their attitude towards the academics, secured 324 International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 5, May 2012) ANN consists of three layers of the nodes namely Input node, Hidden node and Output node as represented in figure 1. Input is fed into the multilayered input nodes which are then processed by the hidden nodes (black box) and the output is generated from the output nodes. The input to individual neural network nodes must be numeric and fall in the closed interval range from 0 to 1. Each attribute of pupil’s must be normalized such as age must be divided by 100. While the pupils’ gender and race are identified by binary inputs. In this case, the information such as demographic data, academic profile and personal profile should be transformed into the required range like 0-1. After the hidden logic being established on the inputs fed into the network, the output will return whether it is a qualified or disqualified profile[6,12]. C. Decision Tree A decision tree is tree-shaped structure that represents sets of decisions. These decisions generate rules for the classification of a dataset. Trees develop arbitrary accuracy and use validation data sets to avoid spurious detail. They are easy to understand and modify. Moreover, the tree representative is more explicit, easy-to understand rules for each cluster of pupils’ performance. The decision tree represents the knowledge in the form of IF-THEN rules. Each rule can be created for each path from the root to a leaf. The leaf node holds the class prediction. This paper compares two decision tree algorithms ID3 and J4.8 with ANN. The decision tree approach is most useful in classification problems. With this technique a tree is constructed to model the classification process. Attributes in the database schema that will be used to label nodes in the tree and around which the division will take place are called the splitting attributes. Associated with the ordering of the attributes is the number of splits to take. With some attributes the domain is small, so the number of splits is obvious based on the domain. If the domain is continuous or has a large number of values, the number of splits to use is not easily determined. D. ID3 An early technique by the influential Ross Quinlan that influenced a large part of the research on Decision Trees is useful to look at in order to understand basic decision tree construction. Splitting Criteria: A fundamental part of any algorithm that constructs a decision tree from a dataset is the method in which it selects attributes at each node of the tree. Entropy: A measure used from Information Theory in the ID3 algorithm and many others used in decision tree construction is that of Entropy. Informally, the entropy of a dataset can be considered to be how disordered it is. It has been shown that entropy is related to information, in the sense that the higher the entropy, or uncertainty, of some data, then the more information is required in order to completely describe that data [8]. B. Cascade Correlation Algorithm Cascade-Correlation is a supervised learning algorithm for Artificial Neural Networks (ANN). This algorithm begins with a minimal network, then automatically trains and adds new hidden units one by one creating a multilayer structure. The cascade correlation architecture learns very quickly and it determines its own size and topology. It requires no back-propagation of error signals through the connections of the network. Algorithm steps: 1. CC starts with a minimal network consisting only of an input and an output layer. Both layers are fully connected. 2. Train all the connections ending at an output unit with a usual learning algorithm until the error of the net no longer decreases. 3. Generate the so-called candidate units. Every candidate unit is connected with all input units and with all existing hidden units. Between the pool of candidate units and the output units there are no weights. 4. Try to maximize the correlation between the activation of the candidate units and the residual error of the net by training all the links leading to a candidate unit. Learning takes place with an ordinary learning algorithm. The training is stopped when the correlation scores no longer improves. 5. Choose the candidate unit with the maximum correlation freeze its incoming weights and add it to the net. To change the candidate unit into a hidden unit, generate links between the selected unit and all the output units. Since the weights leading to the new hidden unit are frozen, a new permanent feature detector is obtained. Loop back to step 2. In building a decision tree, we aim to decrease the entropy of the dataset until we reach leaf nodes at which point the subset that we are left with is pure, or has zero entropy and represents instances all of one class (all instances have the same value for the target attribute). 325 International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 5, May 2012) The entropy of a dataset S, with respect to one attribute, in this case the target attribute, with the following calculation: Entropy(s) All the samples in the list belong to the same class. When this happens, it simply creates a leaf node for the decision tree saying to choose that class. None of the features provide any information gain. In this case, J4.8 creates a decision node higher up the tree using the expected value of the class. Instance of previously-unseen class encountered. Again, J4.8 creates a decision node higher up the tree using the expected value. Algorithm steps: 1. Check for base cases 2. For each attribute a Find the normalized information gain from splitting on a 3. Let a_best be the attribute with the highest normalized information gain 4. Create a decision node that splits on a_best 5. Recurse on the sub lists obtained by splitting on a_best, and add those nodes as children of node Where, Pi is the proportion of instances in the dataset that take the ith value of the target attribute, which has C different values. This probability measures give us an indication of how uncertain we are about the data. And use a log2 measure as this represents how many bits we would need to use in order to specify what the class (value of the target attribute) is of a random instance. Algorithm steps: If all the instances have the same value for the target attribute then return a decision tree that is simply this value (not really a tree - more of a stump). Else 1. Compute Gain values for all attributes and select an attribute with the highest value and create a node for that attribute. 2. Make a branch from this node for every value of the attribute 3. Assign all possible values of the attribute to branches. 4. Follow each branch by partitioning the dataset to be only instances whereby the value of the branch is present and then go back to 1. E. J4.8 J4.8 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set S = s1,s2,... of already classified samples. Each sample si = x1,x2,... is a vector where x1,x2,... represent attributes or features of the sample. III. RESULT DISCUSSION The prediction analysis is to improve the academic status of the pupils’. The ANN prediction algorithm is analyzed to find out the pupils’ who are eligible for higher studies by using the following methods [9,11]. 1. The efficiency of the algorithm is measured by taking the pupil’s information like pupils’ UG percentage, board of study, parent’s educational qualification depending on these details, to finds the eligible candidates who are entering for their higher studies by comparing the J4.8, ID3 algorithm with ANN algorithm. Here 2000 pupil’s data set were collected from the Department of Computer Science in Dr. NGP Arts and Science College. The algorithm is analyzed in the following terms: 1. Whether the pupil’s are eligible for higher studies. 2. The time taken to derive the tree The training data is augmented with a vector C = c1,c2,... where c1,c2,... represent the class to which each sample belongs. J48 is an open source Java implementation of the C4.5 algorithm in the weka data mining tool. At each node of the tree, J4.8 chooses one attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. Its criterion is the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data. The attribute with the highest normalized information gain is chosen to make the decision. The J4.8 algorithm then recurses on the smaller sub lists. This algorithm has a few base cases. 326 Algorithms Number of Male Students Number of Female Students Time ID3 900 1100 3 seconds J48 876 925 2 seconds CC 1000 950 1 seconds International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 5, May 2012) Table 1.Comparison of ID3, J4.8 and CC Algorithms [6] Kotsiantis and etel S.B., ―Efficiency of machine learning techniques in predicting students’ performance in distance learning systems‖, proc, Recent advances in machanics and related fields. [7] Luan J. (2004) ―Data Mining Applications in Higher Education‖, SPSS Exec. Report, ―Data Mining and Knowledge Management in Higher Education – Potential Applications‖. Presentation at AIR Forum Toronto, Canada,2002. [8] Mierle, K. Laven, K., Roweis, S., Wilson, G.(2005) ―Mining Students CVS Repositories for Performance Indicators‖. [9] Naeimeh Delavari and Mohammad Reza Beikzadeh and Somnuk Phon-Amnuaisuk. ―Application of Enhanced Analysis Model for Data Mining Process in Higher Educational System‖ ITHET 6 th Annual International Conference.Juan Dolio Dominican Republic. July, 2005. 1200 1000 800 Male 600 Fem ale 400 Tim e 200 0 ID3 J48 CC Fig 2: Eligibility Criteria for Male and Female [10] Shaeela Ayesha and et al, ―Data Mining Model for Higher Education System‖, European Journal of Scientific Research, Vol.43 No.1(2010). The above table clearly specifies the numbers of male and female pupils are eligible for higher studies. By comparing the decision tree algorithms ID3, J48 with ANN (CC), the ANN finds more pupils enter into higher studies than the ID3 and J48. Therefore it is clear that CC algorithm is more accurate than the other two decision tree algorithms. IV. CONCLUSION This paper analogous two classification techniques which are, Artificial Neural Network and Decision Tree are used with respect to data classification. These techniques are applied in this paper on the institutional data for predicting the pupils’ status. The efficiency of the algorithms has been analyzed based on their accuracy and time taken for its processing. The result clearly shows that the time taken by CC less than the other two algorithms. [11] Varun Kumar, Anupama Chanda, An Emprical Study of the Applications of Data Mining Techniques in Higher Education, (IJACSA) International Journal of Advanced Computer Science and Applications, March 2011. [12] Waiyamai K, ―Improving the Quality of Graduate Students by Data Mining‖. Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand,2003. REFERENCES [1] Alaa el-halees, mining students data to analyze learning behavior:, a case study, Department of Computer Science, Islamic University of Gaza. [2] Anupama Kumar S., Dr. Vijayalakshmi M.N., ―A Novel Approach in Data Mining Techniques for Educational Data‖, Proc 2011 3rd International Conference on Machine Learning and Computing‖ (ICMLC 2011), Singapore, 26th – 28th Feb 2011. [3] Delaveri. N. (2005) ―Application of Enhanced Analysis Model for Data Mining in Higher Educational System‖, IEEE. [4] Han, J., Kamber, M. (2001) ―Data Mining: Concepts and Techniques‖. Morgan Kaufmann Publishers. [5] Ibrahim, Z. and Rusli, D. (2007) ―Predicting Student’s Academic Performance: Comparing Artificial Neural Network, Decision Tree and Linear Regression‖ 21st Annual SAS forum. 327