International Journal On Advanced Computer Theory And Engineering (IJACTE)

Educational Data Mining – A New Approach to the Education Systems

Patil Sameer G. (1), Barahate Sachin R. (2)
(1) Department of Computer Engineering, Yadavrao Tasgaonkar College of Engg. & Management, Mumbai, India
(2) Dept. of Information Technology, Padmbhushan Vasatdada Pratishtan College of Engg., Mumbai, India

Abstract: Data mining techniques are analytical tools that can be used to extract meaningful knowledge from large data sets. Data mining is an interdisciplinary field that draws on multiple disciplines. It has numerous applications in business and industry, but its newest area of application is the education sector. One of the biggest challenges that higher education systems face today is improving the quality of education and of managerial decisions. Managerial decision making becomes more intricate as the complexity of educational entities increases. Educational institutes seek more efficient technology to support decision-making procedures and to formulate better management plans. This can be achieved by utilizing valuable implicit knowledge that is currently unknown. This knowledge is hidden in educational data sets and can be extracted through data mining techniques. Artificial neural networks are one of the promising data mining tools. They perform better than many traditional data mining techniques, so they can be used to narrow the knowledge gap that exists in higher learning institutes. Educational data mining through artificial neural networks will help to enhance traditional educational procedures. This research presents the capabilities of data mining from the perspective of the higher education system by proposing a systematic roadmap for higher learning institutions to enhance their current decision processes.
It also aims at applying data mining techniques to discover new explicit knowledge for improving the quality of education.

I. INTRODUCTION:

There is valuable information hidden in data. Because the underlying data is generated much faster than it can be processed and made sense of, this information often remains buried and untapped. It becomes virtually impossible for individuals or groups with limited resources, especially technological ones, to find and gain any insight from the data.

The term "data mining" is more popular than the longer term "knowledge discovery in databases" (KDD). Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories. A field at the intersection of computer science and statistics, it attempts to discover patterns in large data sets using methods from artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Furthermore, finding patterns and relationships can also support the prediction of future outcomes.

The importance of data mining has been established for business applications, criminal investigations, biomedicine and, more recently, counter-terrorism. Most retailers, for example, employ data mining practices to uncover customer buying patterns; Amazon.com uses purchase history to make product recommendations to shoppers. Data mining can be applied wherever there is an abundance of data available for and in need of analysis.

II. REVIEW OF LITERATURE:

Data mining is a field influenced by many disciplines, including databases, information retrieval, statistics, algorithms, and machine learning. Data mining can be either predictive or descriptive [7].
A predictive model makes predictions about values of data using known results found from different data. Tasks such as classification, prediction, and regression come under predictive data mining. A descriptive model identifies patterns or relationships in data. Tasks such as clustering and association rules come under descriptive data mining [7].

Data mining encompasses tools and techniques for extracting, or mining, knowledge from large amounts of data. Many other terms carry a similar or slightly different meaning, such as knowledge mining from databases, knowledge extraction, data pattern analysis, data archaeology, and data dredging. Another popularly used term is "Knowledge Discovery in Databases", or KDD.

ISSN (Print): 2319-2526, Volume -5, Issue -1, 2016 18

Classification is one of the most common data mining tasks and is applicable in almost all fields. Classification maps data into predefined groups or classes. It is often called supervised learning because the classes are determined before examining the data. Data mining provides several algorithms for classification, but a good classification algorithm should satisfy certain criteria: good predictive accuracy, fast working speed, robustness, scalability, and so on. Traditional algorithms may not work as required for given data. Hence, there is a need to introduce new techniques in the field of data mining. Artificial Neural Network (ANN) is one such field which can be applied for data mining functions [6].

In this work, we focus on the data classification function, using traditional data mining techniques as well as artificial neural network techniques.
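As an illustration of classification as supervised learning, the following minimal sketch trains a Gaussian naive Bayes classifier on labelled records and measures predictive accuracy on records held out from training. Naive Bayes is used here only as a familiar traditional technique. The student features (attendance percentage and internal test score), the class labels "pass" and "fail", the data values, and all function names are hypothetical and are not taken from the data of this work.

```python
import math
from collections import defaultdict


def train_naive_bayes(records, labels):
    """Estimate a prior and per-feature Gaussian parameters for each class."""
    grouped = defaultdict(list)
    for features, label in zip(records, labels):
        grouped[label].append(features)

    model = {}
    total = len(records)
    for label, rows in grouped.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / total, means, variances)
    return model


def predict(model, features):
    """Return the class with the highest log posterior for one record."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, records, labels):
    """Fraction of records whose predicted class matches the known class."""
    correct = sum(predict(model, r) == y for r, y in zip(records, labels))
    return correct / len(labels)


# Hypothetical training data: (attendance %, internal test score).
train_x = [(92, 78), (85, 81), (88, 69), (95, 90),
           (40, 35), (55, 42), (48, 50), (60, 38)]
train_y = ["pass", "pass", "pass", "pass", "fail", "fail", "fail", "fail"]

# Hypothetical held-out records, not seen during training.
test_x = [(90, 75), (50, 40), (83, 88), (45, 30)]
test_y = ["pass", "fail", "pass", "fail"]

model = train_naive_bayes(train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The classes are fixed before any record is examined, which is what distinguishes this supervised setting from descriptive tasks such as clustering. Accuracy is computed only on records withheld from training, so it reflects how the model handles previously unseen data.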
We will take several datasets, apply traditional as well as ANN techniques for classification, and finally compare the results of the two approaches. This will show which approach classifies more accurately and will also help in selecting a better algorithm for classification.

III. PROBLEM STATEMENT:

Data mining is the process of discovering hidden messages, patterns and knowledge within large amounts of data and of making predictions for outcomes or behaviors. Many application areas, such as banking, the retail industry and marketing, fraud detection, computer auditing, biomedical and DNA analysis, telecommunications, and the financial industry, have already been advanced through the robust techniques of data mining. Another application area that can take advantage of data mining techniques is the educational system, especially higher learning institutes.

One of the most important concerns in the higher education system is quality objectives. Higher learning institutes face many problems that keep them from achieving their quality objectives. Several of these problems stem from a knowledge gap: the lack of significant knowledge about the main educational processes, such as counseling, planning, registration, evaluation, and marketing. The main idea is that the hidden patterns, associations, and anomalies discovered by data mining techniques can help bridge this knowledge gap in higher learning institutions. The knowledge discovered by data mining techniques would enable higher learning institutions to make better decisions, plan more effectively in directing students, predict individual behaviors with higher accuracy, and allocate resources and staff more effectively. This improves the effectiveness and efficiency of the processes, thus maintaining the quality of education.

IV. PROPOSED SYSTEM:

For educational data mining, we have to choose from the available data mining algorithms depending on the type of task to be done. For the best results, selecting and implementing the appropriate data mining algorithm is the main task, and this process is not straightforward. As data mining is an interdisciplinary field, it provides multiple algorithms for doing the same task, but the accuracy of each algorithm may vary. Hence the selection of an appropriate algorithm for a given data mining task must be done carefully. So the idea of the proposed system is to demonstrate that data mining using an artificial neural network can be more accurate than some traditional data mining techniques. In addition, in this research engineering students are classified on criteria based on their learning environments, and the necessary information is provided to improve their class of performance.

V. PHASES OF THE PROPOSED SYSTEM

[Figure 5.1: Selection of Data Mining Technique]

1. State the Problem and Collect Data: Most data-based modeling studies are performed in a particular application domain. Hence, domain-specific knowledge and experience are usually necessary to come up with a meaningful problem statement. Selection of related data is concerned with how the data are generated and collected. In general, there are two distinct possibilities. The first is when the data-generation process is under the control of an expert (modeler): this approach is known as a designed experiment. The second possibility is when the expert cannot influence the data-generation process: this is known as the observational approach. An observational setting, namely random data generation, is assumed in most data mining applications. It is also important to make sure that the data used for estimating a model and the data used later for testing and applying the model come from the same, unknown, sampling distribution. If this is not the case, the estimated model cannot be successfully used in a final application of the results.

2.
Data Preprocessing: The data collected from the industry and other sources are complex and contain noisy, missing, and inconsistent values. The data are preprocessed to improve their quality and make them fit for the data mining task. The data are transformed into appropriate formats to support meaningful analysis. Some additional attributes are derived using the acquired knowledge to support the mining process. Generally, a good preprocessing method provides an optimal representation for a data mining technique by incorporating a priori knowledge in the form of application-specific scaling and encoding.

3. Apply Algorithm: Apply a traditional data mining algorithm and the back-propagation neural network algorithm to the preprocessed data and evaluate the accuracy of each algorithm. Accuracy is the most important factor in evaluating any data mining model, so select the model that gives better accuracy. Data classification techniques are to be used as per this model. Naive Bayesian classification and decision tree classifiers are examples of traditional data mining algorithms, while the back-propagation algorithm is an ANN technique.

4. Evaluate Algorithm: Classification and prediction methods can be compared and evaluated according to the following criteria:

Predictive Accuracy: This refers to the ability of the model to correctly predict the class label of new or previously unseen data.
Speed: This refers to the computation costs involved in generating and using the model.
Robustness: This is the ability of the model to make correct predictions given noisy data or data with missing values.
Scalability: This refers to the ability of the learned model to perform efficiently on large amounts of data.
Interpretability: This refers to the level of understanding and insight that is provided by the learned model.

VI. CONCLUSION

This work is an effort to enhance the traditional educational process via a strategic roadmap of data mining functionalities. The proposed EDM model is used to analyze the current work on data mining in education and to identify existing gaps and further work. The main contribution of this work is a discussion of how various data mining techniques can be applied to educational data and what new explicit knowledge or models are discovered. These models can be either predictive or descriptive. The rules obtained from each model can be translated into plain text for setting new strategies and plans to improve managerial decision making.

REFERENCES:

[1] Prof. Barahate Sachin, Prof. Shelake Vijay, "A Survey and Future of Data Mining in Educational Field", IEEE 978-1-4673-0471-9, 2012.
[2] Prof. Sonal Kadu, Prof. Sheetal Dhande, "Effective Data Mining Through Neural Networks", IJARCSSE, Volume 2, Issue 3, March 2012, ISSN: 2277 128X.
[3] Dr. Yashpal Singh, Alok Singh Chauhan, "Neural Networks in Data Mining", Journal of Theoretical and Applied Information Technology, 2009.
[4] Haykin S., Neural Networks, Prentice Hall International Inc., 1999.
[5] Hongjun Lu, Rudy Setiono, and Huan Liu, "Effective Data Mining Using Neural Networks", IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6.
[6] Svein Nordbotten, "Data Mining with Neural Networks", Svein Nordbotten and Associates, Bergen, 2006.
[7] M.H. Dunham, S. Sridhar, "Data Mining: Introductory and Advanced Topics", Pearson Education, 2007, ISBN 81-7758-785-4.
[8] Han J., Kamber M., "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2001.
[9] Witten, Ian H. and Frank, Eibe, "Data Mining: Practical Machine Learning Tools and Techniques", Academic Press, 2000.
[10] Mrs. Bharati M. Ramageri, "Data Mining Techniques and Applications", IJCSE, Vol. 1, No. 4, 301-305.
[11] "EducationalDataMining.org". 2012. Retrieved 2012-09-16.
[12] R. Baker, K. Yacef, "The State of Educational Data Mining in 2009: A Review and Future Visions", Journal of Educational Data Mining, Volume 1, Issue 1, 3-17.
[13] C. Romero, S. Ventura, E. Garcia, "Data Mining in Course Management Systems: MOODLE Case Study and Tutorial", Computers and Education, 51(1), 368-384.
[14] C. Romero, S. Ventura, "Educational Data Mining: A Review of the State-of-the-Art", IEEE Transaction on Systems, 40(6), 601-618, 2010.
[15] Umesh Kumar Pandey, Brijesh Kumar Bhardwaj, Saurabh Pal, "Data Mining as a Torch Bearer in Education Sector", Technical Journal of LBSIMDS.
[16] Al-Radaideh, A. Qasem, E. M. Al-Shawakfa and M.I. Al-Najjar, "Mining Student Data Using Decision Trees", International Arab Conference on Information Technology, 2006.