
Data Mining Technology in e-Learning

Abdel-Badeeh M. Salem
Computer Science Department, Faculty of Computer & Information Sciences
Ain Shams University, Abbassia, Cairo, Egypt
E-mail: [email protected]

Abstract

Data mining technology deals with the discovery of hidden knowledge, unexpected patterns and new rules from large databases. It is currently regarded as the key element of a much more elaborate process called "knowledge discovery in databases" (KDD). From the artificial intelligence point of view, the term KDD refers to the whole process of extracting knowledge from data. Recently, researchers have begun to investigate various data mining methods to help teachers improve the capabilities of e-learning systems. These methods allow them to discover new knowledge based on students' usage data, which makes knowledge extraction one of the most promising application areas. This talk presents the application of data mining techniques and concepts in e-learning systems.

1. Introduction

The knowledge discovery process (KDD) and data mining aim to extract useful information and discover hidden patterns from huge amounts of data that statistical approaches cannot discover. It is a multidisciplinary field of research that includes machine learning, databases, statistics, expert systems, visualization, high performance computing, rough sets, neural networks, and knowledge representation. Some of the most useful data mining tasks and methods are statistics, visualization, clustering, classification and association rule mining [1].

Recently, researchers have begun to investigate various data mining methods to help instructors and administrators improve e-learning systems [2]. These methods discover new, interesting and useful knowledge based on students' usage data.
Some of the main e-learning problems to which data mining techniques have been applied are: assessing students' learning performance; providing course adaptation and learning recommendations based on students' learning behavior; evaluating learning material and web-based educational courses; providing feedback to both teachers and students of e-learning courses; and detecting atypical student learning behavior.

2. KDD and Data Mining Methodology

The Knowledge Discovery in Databases process involves the following: (a) using the database along with any required selection, preprocessing, subsampling, and transformation of it; (b) applying data mining methods (algorithms) to enumerate patterns from it; and (c) evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge. The data mining component of the KDD process is concerned with the algorithmic means by which patterns are extracted and enumerated from data. The overall KDD process includes the evaluation and possible interpretation of the mined patterns to determine which patterns can be considered new knowledge. The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user. A brief description of each step follows [1]:

• Define the goal of the KDD process: developing an understanding of the application domain and the relevant prior knowledge, and identifying the goal of the KDD process from the customer's viewpoint.
• Selection of target dataset: creating a target data set, selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.
• Data cleaning and preprocessing: the following tasks are performed: (a) removing noise if appropriate, (b) collecting the necessary information to model or account for noise, (c) deciding on strategies for handling missing data fields, and (d) accounting for time-sequence information and known changes.
• Data reduction and transformation: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.
• Matching the goals of the KDD process to a particular data mining method, e.g. summarization, regression, classification and clustering.
• Exploratory analysis and model and hypothesis selection: choosing the mining algorithm(s) and selecting the method(s) to be used for searching for data patterns. This step includes (a) deciding which models and parameters might be appropriate (models for categorical data are different from models of vectors) and (b) matching a particular data mining method with the overall criteria of the KDD process (the end user might be more interested in understanding the model than in its predictive capabilities).
• Data mining: searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering.
• Interpreting mined patterns: this step involves visualization of the extracted patterns and models, or visualization of the data given the extracted models, possibly returning to any of steps 1 through 7 for further iteration.
• Acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This step includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

3. Data Mining Tasks

Data mining is supported by a host of models that capture the character of data in several different ways.

• Clustering: The key objective is to find natural groupings (clusters) in high-dimensional data. Clustering is an example of unsupervised learning, and it is a part of pattern recognition.
• Regression models: These originate from standard regression analysis and its applied part known as system identification. The underlying idea is to construct a linear or nonlinear function.
• Classification: This concerns learning that classifies data into predetermined categories. The term originates from pattern recognition, in which a vast number of classifiers have been developed.
• Summarization: This is an approach to characterizing data via a small number of features/attributes. In the simplest scenario one can think of the mean and standard deviation as two extremely compact descriptors of the data. This technique is often applied in interactive exploratory data analysis and automated report generation.
• Link analysis: This is concerned with determining relationships (dependencies) between fields in a database. In a particular case we may be interested in determining the correlation between variables.
• Sequence analysis: This type of analysis is geared toward problems of modeling sequential data. Pertinent models embrace time series analysis, time series models, and temporal neural networks.

4. Data Mining Techniques

This section presents a brief account of the well-known data mining techniques [3].

4.1. Neural Networks

Neural networks (NN) are inspired by biological models of brain functioning. They are capable of learning from examples and generalizing the acquired knowledge. Due to these abilities, neural networks are widely used to find nonlinear relations which otherwise could not be unveiled due to analytical constraints.
The learned knowledge is hidden in their structure, so it cannot easily be extracted and interpreted. The structure of the multilayer perceptron (MLP), i.e. the number of hidden layers and the number of neurons, determines its capacity, while the knowledge about the relations between input and output data is stored in the weights of the connections between neurons. The values of the weights are updated in a supervised training process using a set of known and representative input-output data samples.

4.2. Support Vector Machines (SVM)

SVMs are a new learning-by-example paradigm for classification and regression problems [4]. SVMs have demonstrated significant efficiency when compared with neural networks. Their main advantage lies in the structure of the learning algorithm, which consists of a constrained quadratic optimization problem (QP), thus avoiding the local minima drawback of NN. The approach has its roots in statistical learning theory (SLT) and provides a way to build "optimum classifiers" according to an optimality criterion referred to as the maximal margin criterion. An interesting development in SLT is the introduction of the Vapnik-Chervonenkis (VC) dimension, which is a measure of the complexity of the model. Equipped with a sound mathematical background, support vector machines treat both the problem of how to minimize complexity in the course of learning and how high generalization might be attained. This trade-off between complexity and accuracy led to a range of principles for finding the optimal compromise. The work of Vapnik and co-authors has shown the generalization error to be bounded by the sum of the training error and a term depending on the VC dimension of the learning machine, leading to the formulation of the structural risk minimization (SRM) principle.
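As a point of reference, one commonly quoted form of such a bound for binary classification is sketched below. The notation is an assumption introduced here and is not taken from the text above: R(f) is the expected risk, R_emp(f) the training error measured on n samples, h the VC dimension of the learning machine, and the inequality is stated to hold with probability at least 1 - η.

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

The second term grows with h and shrinks as n grows, which is one way to read the trade-off between complexity and accuracy described above.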
By minimizing this upper bound, which typically depends on the margin of the classifier, the resulting algorithms achieve high generalization in the learning process.

4.3. Clustering

Clustering techniques apply when the instances of data are to be divided into natural groups. The classical clustering technique is k-means, where the number of clusters, the parameter k, is specified before the algorithm is applied. First, k points are chosen at random as cluster centers. All instances are assigned to their closest cluster center according to the Euclidean distance metric. Next, the centroid, or mean, of the instances in each cluster is calculated. These centroids are taken to be the new cluster centers for their respective clusters. The whole process is repeated with the new cluster centers. Iteration continues until the same points are assigned to each cluster in consecutive rounds. At this point the cluster centers have stabilized and will remain the same [3]. There are many variants of clustering, even for the k-means algorithm, depending upon the method of choosing the initial centers.

4.4. Association rule mining

Association rule mining is one of the most well-studied data mining tasks. It discovers relationships among attributes in databases, producing if-then statements concerning attribute values [5]. An association rule X ⇒ Y expresses that in those transactions in the database where X occurs, there is a high probability of having Y as well. X and Y are called the antecedent and consequent of the rule, respectively. The strength of such a rule is measured by its support and confidence. The confidence of the rule is the percentage of transactions containing X that also contain the consequent Y. The support of the rule is the percentage of transactions in the database that contain both the antecedent and the consequent.
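The two measures just defined can be illustrated with a minimal sketch. The session data and the helper names `support` and `confidence` are hypothetical, introduced only for this example; the values are returned as fractions rather than percentages.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, x | y) / support_x


# Hypothetical learning sessions: each one is the set of resources a student used.
sessions = [
    frozenset({"lecture_1", "quiz_1"}),
    frozenset({"lecture_1", "quiz_1", "forum"}),
    frozenset({"lecture_1", "forum"}),
    frozenset({"quiz_1"}),
]

# Rule {lecture_1} => {quiz_1}
rule_support = support(sessions, {"lecture_1", "quiz_1"})         # 2 of 4 sessions
rule_confidence = confidence(sessions, {"lecture_1"}, {"quiz_1"})  # 2 of 3 sessions with lecture_1
print(rule_support, rule_confidence)
```

In this toy data the rule holds in two of the four sessions, and in two of the three sessions that contain the antecedent.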
Association rule mining has been applied to e-learning systems for traditional association analysis (finding correlations between items in a dataset). An efficient algorithm to discover these association rules was first introduced in [5]. The algorithm constructs a candidate set of frequent item sets of length k, counts the number of occurrences, keeps only the frequent ones, then constructs a candidate set of item sets of length k+1 from the frequent item sets of smaller length. It continues iteratively until no candidate item set can be constructed. This construction relies on the property that every subset of a frequent item set must also be frequent. The rules are then generated from the frequent item sets with probabilities attached to them indicating the likelihood (called support) that the association occurs. We use this idea of association rules to train our recommender agent to build a model representing the web page access behavior or associations between on-line learning activities.

4.5. Rough sets

Rough set theory was proposed as a new approach to vague concept description from incomplete data. It is one of the most useful techniques in many real-life applications such as medicine, pharmacology, engineering, banking and market analysis. The theory provides a powerful foundation for revealing and discovering important structures in data and for classifying complex objects. One of its main advantages is that it does not need any preliminary or additional information about the data. Information about rough set software for data analysis was given in [6]. In our research group at Ain Shams, a rough set-based medical system for mining patient data for predictive rules to determine thrombosis disease was developed [6]. This system aims to search for patterns specific or sensitive to thrombosis disease.
This system reduced the number of attributes that describe thrombosis disease from 60 to 16 significant attributes, and extracted decision rules, by applying decision algorithms, that can help young physicians to predict thrombosis disease.

4.6. Genetic Algorithms

Many classification models have been proposed in the literature, such as distributed algorithms, restricted search, data reduction algorithms, parallel algorithms, neural networks, decision trees and genetic algorithms. These approaches either cause loss of accuracy or cannot effectively uncover the data structure. Genetic algorithms (GA) provide an approach to learning that is based loosely on simulated evolution. The GA methodology hinges on a population of potential solutions, and as such exploits the mechanisms of natural selection well known in evolution. Rather than searching from general to specific hypotheses or from simple to complex, GA generates successive hypotheses by repeatedly mutating and recombining parts of the best currently known hypotheses. The algorithm operates by iteratively updating a pool of hypotheses (the population). On each iteration, the members of the population are evaluated according to a fitness function. A new generation is then produced by probabilistically selecting the fittest individuals from the current population. Some of these selected individuals are carried forward into the next generation unchanged; others are used as the basis for creating new offspring by applying genetic operations such as crossover and mutation.

In our research group we developed a hybrid classifier that integrates the strengths of genetic algorithms and decision trees. The algorithm was applied to a medical database of 20 MB for predicting thrombosis disease [7]. The results show that our classifier is a very promising tool for thrombosis disease prediction in terms of predictive accuracy.

5. Applications of data mining techniques in e-learning

This section presents the applications of different data mining methods and tasks in the e-learning domain [8].

5.1. Application of association rule mining in web-based education systems

Association rule mining has been applied to web-based education systems for the following tasks:

• Building recommender agents that can recommend on-line learning activities or shortcuts.
• Diagnosing student learning problems and offering students advice.
• Guiding the learner's activities automatically and recommending learning materials.
• Determining which learning materials are the most suitable to be recommended to the user.
• Identifying attributes characterizing patterns of performance disparity between various groups of students.
• Discovering interesting relationships in students' usage information in order to provide feedback to the course author.
• Finding relationships in learners' behaviour patterns.
• Finding students' mistakes that often accompany each other.
• Guiding the search for the best-fitting transfer models of student learning.
• Optimizing the content of the e-learning portal by determining what most interests the user.

5.2. Information Visualization

Information visualization is a branch of computer graphics and user interfaces concerned with the presentation of interactive or animated digital images so that users can understand data [9]. These techniques facilitate the analysis of large amounts of information by representing the data in some visual display. Normally, large quantities of raw instance data are represented or plotted as spreadsheet charts, scatter plots and 3D representations. Information visualization can be used to graphically render complex, multidimensional student tracking data collected by web-based educational systems [10]. The information visualized in e-learning can concern, for example, complementary assignments, admitted questions and exam scores.
Visualization tools enable instructors to manipulate the generated graphical representations, which allows them to gain an understanding of their learners and become aware of what is happening in distance classes. The most common specific visualization tools in the educational domain are:

• CourseVis visualizes data from a Java on-line distance course inside WebCT.
• GISMO uses Moodle students' tracking data as source data and generates graphical representations that can be explored by course instructors.
• Listen tool browses vast student-tutor interaction logs from Project LISTEN's automated Reading Tutor.

5.3. Clustering

Clustering is a process of grouping objects into classes of similar objects [11]. It is an unsupervised classification or partitioning of patterns (observations, data items, or feature vectors) into groups or subsets (clusters) based on their locality and connectivity within an n-dimensional space. In e-learning, clustering has been used for:

• Finding clusters of students with similar learning characteristics, in order to promote group-based collaborative learning and to provide incremental learner diagnosis.
• Discovering patterns reflecting user behaviors, and supporting collaboration management by characterizing groups with similar behavior in unstructured collaboration spaces.
• Grouping students and personalized itineraries for courses based on learning objects.
• Grouping students in order to give them differentiated guidance according to their skills and other characteristics.
• Grouping tests and questions into related groups based on the data in the score matrix.
• Grouping users based on time-framed navigation sessions.

5.4. Classification

A classifier is a mapping from a (discrete or continuous) feature space X to a discrete set of labels Y [12]. Classification or discriminant analysis predicts class labels.
This is supervised classification: a collection of labeled (preclassified) patterns is provided, and the problem is to label a newly encountered, still unlabeled pattern. In e-learning, classification has been used for:

• Discovering potential student groups with similar characteristics and reactions to a specific pedagogical strategy.
• Predicting students' performance and their final grade.
• Detecting students' misuse or students playing around.
• Predicting students' performance as well as assessing the relevance of the attributes involved.
• Grouping students as hint-driven or failure-driven and finding students' common misconceptions.
• Identifying learners with little motivation and finding remedial actions in order to lower drop-out rates.
• Predicting course success.

5.5. Sequential Pattern Mining (SPM)

SPM is a more restrictive form of association rule mining in which the order of the accessed items is taken into account. It tries to discover whether the presence of a set of items is followed by another item in a time-ordered set of sessions or episodes [13]. The applications of sequential patterns in e-learning can be summarized as follows:

• Evaluating learners' activities, with the results used to adapt and customize resource delivery.
• Discovering behavioral patterns and comparing them with the expected patterns specified by the instructor that describe an ideal learning path.
• Giving an indication of how best to organize the educational web space and making suggestions to learners who share similar characteristics.
• Generating personalized activities for different groups of learners.
• Supporting the evaluation and validation of learning site designs.
• Identifying interaction sequences indicative of problems and patterns that are markers of success.

5.6. Text Mining

Text mining (TM) can be viewed as an extension of data mining to text data, and it is closely related to web content mining.
Its methods can work with unstructured or semi-structured data sets such as full-text documents, HTML files and emails [14]. In e-learning, text mining techniques can be used for the following:

• Grouping documents according to their topics and similarities and providing summaries.
• Finding and organizing material using semantic information.
• Supporting editors when gathering and preparing materials.
• Evaluating the progress of a discussion thread to see what each contribution adds to the topic.
• Supporting collaborative learning and discussion boards with evaluation between peers.
• Identifying the main blocks of multimedia presentations.
• Selecting articles and automatically constructing e-textbooks and personalized courseware.
• Detecting the conversation focus of threaded discussions, classifying topics and estimating the technical depth of contributions.

5.7. Applying data mining tools for learning management systems

Nowadays, data mining tools are normally designed more for power and flexibility than for simplicity. The general, public-domain and specific educational data mining tools are briefly listed below:

• General and specific data mining tools and frameworks, e.g. DBMiner, SPSS Clementine, DB2 and Intelligent Miner.
• Public domain mining tools, e.g. Weka and Keel.
• Specific educational data mining tools:
  o Tools for association and pattern and text mining [15], e.g. the Mining tool, EPRules, Simulog, Sequential Mining tool, O3R and KAON.
  o Tools for statistics and visualization [16], e.g. Synergo/ColAT, GISMO, Listen tool and TADA-Ed.
  o Tools for association and classification [17], e.g. MultiStar and CIECoF.
  o Tools for learning paths and performance [18], e.g. Tool and MINEL.

6. Conclusions

The paper discusses the application of data mining techniques in e-learning tasks and domains.
The following techniques are discussed from the e-learning perspective: visualization, clustering, classification, sequential pattern mining, and text mining. Data mining techniques can enhance on-line education for educators as well as learners. While some tools that use data mining techniques to help educators and learners are being developed, the research is still in its infancy. Data mining techniques are a very promising approach to the analysis of the data on student activities and behavior accumulated by learning management systems. Most of the current data mining tools are too complex for educators to use; their features go well beyond the scope of what an educator might require.

7. References

[1] Cios, K. J., Pedrycz, W. and Swiniarski, R. W. Data Mining Methods for Knowledge Discovery. Kluwer, 1998.
[2] Romero, C. and Ventura, S. Data Mining in E-Learning. Southampton, UK: WIT Press, 2006.
[3] Witten, I. H. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed., Elsevier, 2005.
[4] Cortes, C. and Vapnik, V. "Support vector networks", Machine Learning, vol. 20, pp. 273–297, 1995.
[5] Agrawal, R., Imielinski, T. and Swami, A. Mining association rules between sets of items in large databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, pp. 207–216, Washington, D.C., May 1993.
[6] Salem, A. M. and Mahmoud, S. A. "Mining Patient Data Based on Rough Set Theory to Determine Thrombosis Disease", Proceedings of the First International Conference on Intelligent Computing and Information Systems (ICICIS 2002), pp. 291–296, Cairo, Egypt, June 24–26, 2002.
[7] Salem, A. M. and Mahmoud, A. M. "A Hybrid Genetic Algorithm-Decision Tree Classifier", Proceedings of the 3rd International Conference on New Trends in Intelligent Information Processing and Web Mining, Zakopane, Poland, pp. 221–232, June 25, 2003.
[8] Romero, C., Ventura, S. and García, E. Data mining in course management systems: Moodle case study and tutorial. Computers & Education, 2007.
[9] Spence, R. Information Visualization. Addison-Wesley, 2001.
[10] Cadez, I., Heckerman, D. and Meek, C. Visualization of navigation patterns on a web site using model based clustering. In ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD'00), pp. 280–284, Boston, USA, August 2000.
[11] Jain, A. K., Murty, M. N. and Flynn, P. J. Data clustering: A review. ACM Computing Surveys, 31(3), pp. 264–323, 1999.
[12] Duda, R. O., Hart, P. E. and Stork, D. G. Pattern Classification. Wiley Interscience, 2000.
[13] Agrawal, R. and Srikant, R. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, pp. 3–14, 1995.
[14] Feldman, R. and Sanger, J. The Text Mining Handbook. Cambridge University Press, 2006.
[15] Zaïane, O. and Luo, J. Web usage mining for a better web-based learning environment. In Proceedings of the Conference on Advanced Technology for Education, Banff, Alberta, pp. 60–64, 2001.
[16] Mazza, R. and Milani, C. Exploring usage analysis in learning systems: Gaining insights from visualisations. In Workshop on Usage Analysis in Learning Systems at the 12th International Conference on Artificial Intelligence in Education, New York, USA, pp. 1–6, 2005.
[17] Silva, D. and Vieira, M. Using data warehouse and data mining resources for ongoing assessment in distance learning. In IEEE International Conference on Advanced Learning Technologies, Kazan, Russia, pp. 40–45, 2002.
[18] Bellaachia, A., Vommina, E. and Berrada, B. Minel: A framework for mining e-learning logs. In Proceedings of the Fifth IASTED International Conference on Web-based Education, Mexico, pp. 259–263, 2006.
