International Journal of Multidisciplinary Research and Development
Online ISSN: 2349-4182, Print ISSN: 2349-5979
www.allsubjectjournal.com
Volume 3; Issue 3; March 2016; Page No. 43-48; (Special Issue)

Data Mining In Education

Selvapriya M, Dr. J Komala Lakshmi
Research Scholar, Department of Computer Science, S.N.R SONS College (Autonomous), Coimbatore, Tamil Nadu, India

Abstract
Knowledge Discovery in Databases (KDD) is the process of finding knowledge in massive amounts of data, and data mining is the core of this process. Data mining can be used to mine understandable, meaningful patterns from large databases, and these patterns may then be converted into knowledge. Data mining extracts the information and patterns within the KDD process that support crucial decision-making. The KDD process comprises selection, transformation, mining and interpretation of results. This paper reviews the knowledge discovery perspective in data mining and consolidates the different areas of data mining together with its techniques and methods. Educational data mining (EDM) is an emerging discipline that focuses on applying data mining tools and techniques to educationally related data. The discipline focuses on analyzing educational data to develop models for improving learning experiences and institutional effectiveness. EDM methods are useful for measuring the performance of students, assessing students and studying students’ behavior. In recent years, educational data mining has become more successful on many educational statistics problems, owing to greatly increased computing power and improved data mining algorithms. This paper describes how to apply the main data mining methods, such as prediction, classification, relationship mining, clustering, and social area networking, to educational data.

Keywords: Decision, Knowledge, Mining, Selection, Transformation, Warehouse, educational data mining, academic analytics, learning analytics, institutional effectiveness

1. Introduction
Knowledge Discovery in Databases (KDD) is the process of finding useful knowledge in large datasets. Data preparation, pattern search, knowledge evaluation and refinement are the steps of KDD. The process of knowledge discovery consists of the data cleaning, data integration, data selection, transformation, data mining and pattern evaluation phases. In short, data mining (DM) is the process of deriving patterns from large databases [3]. DM analyses large datasets to extract hidden patterns, such as groups of similar data records found using clustering. These patterns are then used for machine learning and predictive analysis. Query languages or graphical user interfaces are required to express DM requests and to present the discovered information, so that results obtained from the DM engine become understandable and usable for end users.

2. Data Mining Trends
Data mining was introduced in the 1990s and combines many disciplines, such as database management systems (DBMS), statistics, artificial intelligence (AI), and machine learning (ML) [4]. Data mining produces useful patterns by applying algorithmic methods to observational data. Data mining algorithms show the best results for numerical data, but with the emergence of statistics and machine learning techniques, algorithms have been developed to mine non-numerical data and relational databases [5]. DM applications have been successfully implemented in various fields such as health care, finance, retail, telecommunication, fraud detection, risk analysis and education [8]. Increasing complexity in these fields and improvements in technology pose new challenges to DM.

Fig 1: The steps of extracting knowledge from data.

3. Educational Data Mining
The educational data mining community [16] defines educational data mining as follows: “Educational Data Mining (EDM) is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the setting which they learn in”.

EDM focuses on the collection, archiving, and analysis of data related to students’ learning and assessment. The application of data mining in an educational system is an iterative cycle of hypothesis formation, testing, and refinement, as shown in Fig 2 [10]. Discovered knowledge should enter the loop of the system and guide, facilitate, and enhance learning as a whole. The system is used for turning data into knowledge as well as for filtering mined knowledge for decision making.

Fig 2: The cycle of applying data mining in educational system

4. EDM Methods
Romero and Ventura [11, 12] and Baker categorize methods in educational data mining into the following general categories:
(1) Prediction
(2) Clustering
(3) Relationship mining
(4) Discovery with models
(5) Distillation of data for human judgment

The first three categories of educational data mining methods are largely acknowledged to be universal across types of data mining. The fourth and fifth categories achieve particular prominence within educational data mining.

4.1 Prediction
The goal of prediction is to develop a model which can infer a single aspect of the data (the predicted variable) from some combination of other aspects of the data (the predictor variables). Prediction has two key uses within educational data mining. In the first, we study the features of the model used for prediction; this is mainly used for analyzing students’ performance. In the second, the output values are predicted based on the context. Prediction can be classified into three types: classification, regression and density estimation.

4.1.1 Classification
Classification consists of assigning a class label to a set of unclassified cases. In supervised classification, the set of possible classes is known in advance.
In unsupervised classification, the set of possible classes is not known in advance; after classification we can try to assign a name to each class. Unsupervised classification is called clustering. Some popular classification methods include logistic regression, support vector machines and decision trees. A decision tree algorithm produces a tree-like model. From the tree it is then easy to generate rules of the form IF condition THEN outcome. It is a predictive model in which an instance is classified by following the path of satisfied conditions from the root until a leaf is reached, and the leaf corresponds to a class label. Some of the best-known decision tree algorithms are C4.5 and ID3.

4.1.2 Regression
Regression is a data mining function that predicts a number. A regression task begins with a data set in which the target values are known. Regression models are tested by computing various statistics that measure the difference between the predicted values and the expected values. Some popular regression methods within educational data mining include linear regression, neural networks and support vector machine regression. Real-world educational data mining problems often cannot be predicted with a single simple model, so more complex approaches that combine several techniques are required to forecast values. Neural networks, too, can create both classification and regression models.

4.2 Clustering
Clustering is the process of grouping physical or abstract objects into classes of similar objects [14]. It partitions a set of data (or objects) into a set of meaningful sub-classes, called clusters, and helps users understand the natural grouping or structure in a data set. Clusters are formed so that objects within a cluster have high similarity to one another but are very dissimilar to objects in other clusters.
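As an illustration of how such clusters can be formed, the following is a minimal sketch of k-means (Lloyd's algorithm), one common partitioning method. It is not taken from the paper: the student records, the two attributes (attendance and sessional marks) and the choice of initial centroids are all hypothetical and chosen only to keep the example small and deterministic.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical records: (attendance %, sessional mark %), invented for illustration.
students = [(95, 88), (90, 92), (85, 80), (40, 35), (50, 42), (45, 30)]
labels, centroids = k_means(students, centroids=[students[0], students[3]])
print(labels)     # cluster index assigned to each student
print(centroids)  # final cluster centres
```

With these invented values the two groups are well separated, so the result does not depend much on the starting centroids; on real data the outcome of k-means generally depends on initialization and on the chosen number of clusters.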
Each cluster formed can be viewed as a class of objects, from which rules can be derived [7]. Applying clustering in education can help in finding academic trends and in analyzing students’ performance in class. In educational data mining, clustering has been used to group students according to their behavior [2]. For example, schools could be clustered together (to investigate similarities and differences between schools), and students could be clustered together (to investigate similarities and differences between students). Some well-known clustering methods are k-means and the expectation-maximization algorithm (EM clustering) [9].

4.3 Relationship Mining
In relationship mining, the goal is to discover relationships between variables in a data set with a large number of variables. This may take the form of attempting to find out which variables are most strongly associated with a single variable of particular interest. Broadly, relationship mining is classified into four types: association rule mining, correlation mining, sequential pattern mining, and causal data mining.

4.3.1 Association rule mining
Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An association rule is an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅, with I denoting the set of all items. Association rule mining has been applied in educational data mining for finding students’ mistakes that often occur together while solving exercises [15], finding relationships in learners’ behavior patterns [17], diagnosing students’ learning problems and offering students advice [13], etc.

4.3.2 Correlation mining
In correlation mining, the goal is to find (positive or negative) linear correlations between variables. Correlation analysis is used to find the most strongly correlated attributes.

5. The Various Data Mining Areas

5.1 Web Mining
Web mining is the application of data mining to discover patterns from the Web, in the form of data collected from online information databases, hyperlinks, and digital data. The data mining techniques used in web mining are classification (supervised learning) and clustering (unsupervised learning) [19, 20].

5.2 Ubiquitous data mining
Increasing computational capacity and the emergence of the latest electronic devices have led to the ubiquitous or pervasive computing paradigm [21]. Ubiquitous computing environments give rise to Ubiquitous Data Mining (UDM).

5.3 Data mining using multimedia
Multimedia data includes images, video, audio, and animation. Data mining techniques applied to multimedia data include rule-based decision tree classification, artificial neural networks, instance-based learning algorithms, support vector machines, association rule mining and clustering methods [23].

5.4 Spatial data mining
Spatial data includes astronomical data and data related to space technology. Spatial data mining includes the use of spatial warehouses, spatial data cubes, spatial OLAP, and clustering methods [25].

5.5 Emergence of Data mining in other fields
Other data mining areas include visualization, medical, pattern, wireless networks, and association rule based mining.

6. Performance Improvement in Education Sector

6.1.1 Data Mining Techniques in Education
Applying data mining techniques to educational data for knowledge discovery is significant to educational organizations as well as to students. Knowledge derived from data supports educational decision support systems. Educational data mining enhances our understanding of learning by finding educational trends, which helps in improving student performance, course selection, in-house training and faculty development. Using linear regression analysis [26], factors such as mother’s education and family income have been found to correlate with students’ academic performance.
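To make the idea of relating a factor to academic performance concrete, here is a minimal sketch of simple linear regression by ordinary least squares, together with the Pearson correlation coefficient. It is an illustration under assumptions, not the analysis performed in [26]: the predictor, the marks and the function name `fit_line` are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # strength of the linear relationship
    return slope, intercept, r


# Hypothetical data: a numeric factor per student and the final mark (%).
factor = [1, 2, 3, 4, 5]
marks = [52, 54, 61, 63, 70]
slope, intercept, r = fit_line(factor, marks)
print(f"mark = {intercept:.1f} + {slope:.1f} * factor, r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship in the sample; it does not by itself show that the factor causes the difference in performance.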
DM combines machine learning, statistics and visualization techniques to discover and extract knowledge. Questionnaires and feedback forms are often used to collect data on students’ attitudes towards educational patterns or trends, their interest in technologies, and the teaching methodologies followed; the collected data is then analyzed using techniques such as decision trees and neural networks. There are different mining models, such as Decision Trees, Naïve Bayes, Support Vector Machines, Linear Regression, Minimum Description Length, K-means, and O-Cluster. Using these models, one can obtain student behavior patterns, course behavior patterns, predictions of student retention, predictions of course suitability, and personalized intervention strategies [27].

6.1.2 Statistics and visualization
According to Tsantis & Castellani [31], students’ log histories and usage statistics are helpful in evaluating an e-learning system. Information visualization techniques [28] can be used to graphically represent student data collected by web-based educational systems, such as the technologies a student is most interested in or the interest shown in solving questionnaires. Visualization techniques can also be applied to conversations among online groups, social networking websites, etc. These techniques also help instructors, who can manipulate the generated graphical representations to gain an understanding of their learners and their interests.

6.2 Web mining
Srivastava et al. [32] state, “Web mining is used to extract knowledge from web data”. In web mining, useful information is extracted from the contents of web documents; web usage mining is another technique, which discovers meaningful patterns from data generated by client-server transactions on one or more web localities.

6.2.1 Clustering, classification and outlier detection
Clustering and classification are both methods for grouping data. Clustering is unsupervised and classification is supervised.
Classification and prediction are also related techniques. Classification predicts class labels, whereas prediction predicts continuous-valued functions. An outlier is an observation that is unusually large or small relative to the other values in a dataset. According to Chen and Liu [35], a decision tree algorithm (C5.0) and data cube technology are used for managing classroom processes. Induction analysis helps in identifying potential student groups having similar characteristics. Talavera and Gaudioso [36] propose mining student data using clustering to discover patterns reflecting user behaviors.

6.2.2 Adaptive and intelligent web-based educational systems
Tang et al. [37] introduce the concept of data clustering for web-based learning to help in solving learner-based problems. They find clusters of students with similar learning characteristics based on the sequence and the contents of the pages they visited.

6.3 Association rule mining
Association rule mining is a popular method for finding relationships between sets of items in large databases. Here, one or more attributes of a dataset are associated with each other using IF-THEN statements.

6.3.1 Particular web-based courses
Ha et al. [38] perform web page navigational structure analysis on web-based virtual classrooms, e-learning portals and web pages navigated by learners.

6.3.2 Adaptive and intelligent web-based educational systems
Lu uses fuzzy association rules in a personalized e-learning material recommender system. Fuzzy matching rules are used to discover associations between a student’s requirements and a list of learning materials [38]. Romero et al. [39] propose using grammar-based genetic programming with optimization techniques to provide feedback to the authors who designed the courses, based on relationships derived from students’ usage information.

6.4 Text mining
In text mining, mining is done on text data; it is related to web content mining.
It is an interdisciplinary area involving machine learning and data mining, statistics, information retrieval and natural language processing [28, 40]. Text mining can work with unstructured or semi-structured datasets such as full-text documents, HTML files, emails, etc.

6.5 Web-based educational systems
Data mining and text mining technologies are used in web-based educational systems for shared learning. Text mining is applied to discussion boards for expanded correspondence analysis. Learners select the category that best represents their comment, and the system provides peer evaluations of learners’ comments.

6.5.1 Well-known learning content management systems
Dringus and Ellis [41, 42] propose using text mining as a strategy for assessing conversations in asynchronous discussion forums. Text mining techniques also help in evaluating the progress of a thread or of user group discussions. Data can be retrieved from PDF interactive multimedia productions to help evaluate multimedia presentations for statistical purposes and to extract relevant data [38, 39]. Web-based educational systems collect large amounts of student data from web log histories, which can be further analyzed to derive meaningful patterns [43].

6.5.2 Adaptive and intelligent web-based educational systems
Tang et al. [37] propose constructing a personalized web-based application in which mining can be done on both the framework and the structure of the courseware. Keyword-driven text mining algorithms are used to select articles for distance learning students.

7. Conclusion
Educational Data Mining is an emerging field related to several well-established areas of research, including e-learning, web mining and text mining. Data mining techniques are used to analyze educational data and extract useful information from large amounts of data. This paper presents a review of KDD and basic data mining techniques so as to integrate research in this area.
The KDD field is concerned with the development of methods and techniques that make data meaningful. In the education sector, software and visualization techniques can be developed using data mining techniques which not only predict students’ performance in examinations but also help to cluster those students who need special attention in their studies. Knowledge Discovery in Databases results in better decision-making related to the latest technologies useful in classroom teaching as well as to faculty enhancement programs and in-house training. Using data mining techniques, we can obtain refined data from distributed databases. Data mining is an efficient tool for improving institutional effectiveness and student learning. Knowledge acquired through Educational Data Mining not only helps teachers to manage their classes and improve their teaching skills and students’ learning processes, but also provides feedback to institutions to improve their infrastructure and quality. To make this approach successful and to increase its scope, more data can be collected from educational institutions and queries can be performed on it. Using techniques like decision trees, we can predict the class result of students based on the attributes taken. Decision tree classifiers are applied to students’ data to predict their performance in the class result. These techniques help in identifying students who are short of attendance and have shown poor performance in sessional examinations. The main benefit of these techniques is the knowledge gathered from students’ academic performance. Another helpful technique is K-means clustering, through which we can cluster students based on attributes such as their class performance, sessional marks and attendance in class. Centroids are calculated from the educational data set for K clusters. This enhances the decision-making approach used to monitor the performance of students.
As the number of clusters K is increased, accuracy improves on large datasets and K-means can find a better grouping of the data. It also helps us to cluster those students who need special attention. This review of the knowledge discovery perspective in data mining should be helpful in finding useful patterns in educational data sets.

8. Reference
1. Fan Jianhua, Li Deyi. An Overview of Data Mining and Knowledge Discovery, J. of Comput. Sci. & Technol. 1998; 13(4).
2. Amershi S, Conati C. Automatic Recognition of Learner Groups in Exploratory Learning Environments, Proceedings of ITS 2006, 8th International Conference on Intelligent Tutoring Systems, 2006. Tavel, P. Modeling and Simulation Design. AK Peters Ltd, 2007.
3. Sang Jun Lee, Keng Siau. A review of data mining techniques, Industrial Management & Data Systems 2001; 101(1):41-46.
4. Piatetsky-Shapiro, Gregory. The Data-Mining Industry Coming of Age. IEEE Intelligent Systems, 2000.
5. Venkatadri M, Dr. Lokanatha C Reddy. A Review on Data mining from Past to the Future. International Journal of Computer Applications. 2011; 15(7), ISSN: 0975-8887.
6. Brijesh Kumar Baradwaj, Saurabh Pal. Mining Educational Data to Analyze Students’ Performance. International Journal of Advanced Computer Science and Applications. 2011; 2(6):63-69.
7. Beck JE, Mostow J. How who should practice: Using learning decomposition to evaluate the efficacy of different types of practice for different types of students, In Proceedings of the 9th International Conference on Intelligent Tutoring Systems, 2008, pp. 353-362.
8. Dharminder Kumar, Deepak Bhardwaj. Rise of Data Mining: Current and Future Application Areas. IJCSI International Journal of Computer Science Issues. 2011; 8(5):1, ISSN (Online): 1694-0814.
9. Dempster A, Laird N, Rubin D. Maximum Likelihood estimation from incomplete data via EM Algorithm. Journal of the Royal Statistical Society. 1977; 39(1):1-38.
10. Romero C, Ventura S. Educational data mining: A survey from 1995 to 2005, Expert Systems with Applications 2007; 33(1):135-146.
11. Romero C, Ventura S, Bra P, Castro C. Discovering prediction rules in AHA! courses, In Proceedings of the International Conference on User Modeling, 2003, pp. 25-34.
12. Romero C, Ventura S, Espejo PG, Hervas C. Data Mining Algorithms to Classify Students, In Proceedings of the 1st International Conference on Educational Data Mining, 2008, pp. 8-17.
13. Hwang GJ, Hsiao CL, Tseng CR. A computer-assisted approach to diagnosing student learning problems in science courses. Journal of Information Science and Engineering. 2003; 19:229-248.
14. Jain AK, Murty MN, Flynn PJ. Data clustering: A review, ACM Computing Surveys 1999; 31(3):264-323.
15. Merceron A, Yacef K. Mining student data captured from a web-based tutoring tool: Initial exploration and results, Journal of Interactive Learning Research 2004; 15(4):319-346.
16. www.educationaldatamining.org.
17. Yu P, Own C, Lin L. On learning behavior analysis of web based interactive environment, In Proceedings of the implementing curricular change in engineering education, Oslo, Norway, 2001, pp. 1-10.
18. International Journal of Data Mining & Knowledge Management Process (IJDKP). 2014; 4(5).
19. Soumen Chakrabarti. Data Mining for hypertext: A tutorial survey, SIGKDD Explorations, 2000; 1(2).
20. Ming-Syan Chen, Jiawei Han. Data Mining: An overview from a Database Perspective, IEEE Transactions on Knowledge and Data Engineering, 1996; 8(6).
21. Hsu J. Data Mining Trends and Developments: The Key Data Mining Technologies and Applications for the 21st Century, The Proceedings of the 19th Annual Conference for Information Systems Educators (ISECON 2002), ISSN: 1542-7382, 2002. Available Online: http://colton.byuh.edu/isecon/2002/224b/Hsu.pdf
22. Shonali Krishnaswamy. Towards Situation awareness and Ubiquitous Data Mining for Road Safety: Rationale and Architecture for a Compelling Application, Proceedings of Conference on Intelligent Vehicles and Road Infrastructure, 2005, 16-17. Available at: http://www.csse.monash.edu.au/~mgaber/CameraReady
23. Kotsiantis S, Kanellopoulos D, Pintelas P. Multimedia mining. WSEAS Transactions on Systems 2004; 3:3263-3268.
24. Abdulvahit Torun, Şebnem Düzgün. Using spatial data mining techniques to reveal vulnerability of people and places due to oil transportation and accidents: A case study of Istanbul Strait, ISPRS Technical Commission II Symposium, Vienna. Addison Wesley, 1st edition.
25. Ying Zhang, Samia Oussena, Tony Clark, Hyeonsook Kim. Use Data Mining to improve student retention in higher education: A case study.
26. Lukasz A Kurgan, Petr Musilek. A survey of Knowledge Discovery and Data Mining process models, The Knowledge Engineering Review 2006; 21(1):1-24.
27. Sachin RB, Vijay MS. A Survey and Future Vision of Data Mining in Educational Field, Second International Conference on Advanced Computing & Communication Technologies (ACCT), Rohtak, Haryana, ISBN 978-1-4673-0471-9, 2012, pp. 96-100.
28. Tsantis L, Castellani J. Enhancing Learning Environments through Solution-based Knowledge Discovery Tools: Forecasting for Self-Perpetuating Systemic Reform. Journal of Special Education Technology. 2001; 16(4):39-52.
29. Jaideep Srivastava, Prasanna Desikan, Vipin Kumar. Web Mining - Concepts, Applications & Research Directions, AHPCRC Technical Report, Chapter 3, pp. 51-53.
30. Haixun Wang, Wei Wang, Jiong Yang, Philip S. Yu. Clustering by Pattern Similarity in Large Data Sets, In Proceedings of the 2002 ACM SIGMOD international conference on Management of data, ACM, 2002, pp. 394-405.
31. Liu, Chen-Chung. Knowledge discovery from web portfolios: tools for learning performance assessment. Diss. 2001.
32. Talavera Luis, Elena Gaudioso. Mining student data to characterize similar behavior groups in unstructured collaboration spaces. In Proceedings of the Artificial Intelligence in Computer Supported Collaborative Learning Workshop at the ECAI 2004, pp. 17-23.
33. Tang Tiffany Ya, Gordon McCalla. Student modeling for a web-based learning environment: a data mining approach. In AAAI/IAAI, 2002, pp. 967-968.
34. Ha S, Bae S, Park S. Web mining for distance education. In IEEE international conference on management of innovation and technology, 2000, pp. 715-719.
35. Cristobal Romero, Sebastian Ventura, Paul De Bra. Knowledge Discovery with Genetic Programming for Providing Feedback to Courseware Authors, User Modeling and User-Adapted Interaction 2004; 14:425-464.
36. Vishal Gupta, Gurpreet S Lehal. A Survey of Text Mining Techniques and Applications. Journal of Emerging Technologies in Web Intelligence. 2009; 1(1).
37. Laurie P Dringus, Timothy Ellis. Using data mining as a strategy for assessing asynchronous discussion forums, Computers & Education 2005; 45:141-160.
38. M'hammed Abdous, Wu He, Cherng-Jyh Yen. Using Data Mining for Predicting Relationships between Online Question Theme and Final Grade, Educational Technology & Society, 2012; 15(3):77-88. Available Online at www.sciencedirect.com.
39. Agathe Merceron, Kalina Yacef. Educational Data Mining: a Case Study, Supporting Learning through Intelligent and Socially Informed Technology. Proceedings of the 12th International Conference on Artificial Intelligence in Education, AIED 2005, 18-22, Amsterdam, the Netherlands.

Mrs. Selvapriya M completed her Bachelor of Computer Applications at Hindusthan College of Arts & Science, Coimbatore. She completed her M.C.A at Hindusthan College of Arts & Science, Coimbatore, and was awarded her M.Phil in Data Mining by Hindusthan College of Arts & Science, Coimbatore. She has worked as an Assistant Professor at Hindusthan College of Arts & Science, Coimbatore, for seven years. She is currently a regular part-time Research Scholar in the Department of Computer Science at SNR SONS COLLEGE, Coimbatore, Tamil Nadu, India, working towards her Ph.D.

Mrs. J. Komala Lakshmi completed her B.Sc Mathematics at Seetha Lakshmi Achi College for Women, Pallathur, in 1995. She completed her M.C.A at J. J. College of Arts and Sciences, Pudhukottai, in 1998, and was awarded her M.Phil in Computer Science by Bharathiar University in 2008. She was awarded the Silver medal by the institution for academic excellence in her M.C.A. She has ten years of teaching experience in collegiate service. She is currently working as an Assistant Professor in the Department of Computer Science, SNR SONS COLLEGE, Coimbatore. She has presented papers at three international conferences and has published papers in six international journals. She has also contributed a book publication for computer science students. She was a Graduate Student Member of IEEE. She received recognition from Who’s Who in the World 2011, 28th Edition, for her notable contribution to society.

Biography
I, M. Selvapriya, completed my Bachelor of Computer Applications at Hindusthan College of Arts & Science, Coimbatore. I completed my M.C.A. at Hindusthan College of Arts & Science and was awarded an M.Phil in Data Mining by Hindusthan College of Arts & Science, Coimbatore. I have 8 years of teaching experience as an Assistant Professor in the Department of Computer Technology & Information Technology at Hindusthan College of Arts & Science, Coimbatore. I am currently pursuing a part-time Ph.D as a Research Scholar in the Department of Computer Science, SNR Sons College, Coimbatore. I have published in 2 international journals and have participated in and presented papers at various workshops, seminars and conferences. My research areas include data mining.