
Using Data Warehouse and Data Mining Resources for Ongoing Assessment of Distance Learning

Daniela Resende Silva¹
E-mail: [email protected]
Marina Teresa Pires Vieira
E-mail: [email protected]
Department of Computer Sciences
UFSCar - Federal University of São Carlos
Rod. Washington Luís, Km 235, Caixa Postal 676
13565-905 / São Carlos – SP – Brazil
Phone/Fax: (55 16) 260-8232

¹ MPhil scholarship-CAPES/Brazil
0-473-08801-0/01 $20.00 © 2002 IEEE

Abstract

This paper discusses the use of Data Warehouse and Data Mining resources to aid in assessing the learning of students enrolled in distance courses. It presents the information considered relevant for this assessment, the modeling of a data warehouse to store that information, and the MultiStar environment, which supports knowledge discovery in the data warehouse.

1. Introduction

A variety of applications have benefited from Data Warehousing technology [1, 2, 3] to support management analyses, which can be obtained through Data Mining [4]. The joint use of Data Warehousing and Data Mining techniques is a trend in KDD (Knowledge Discovery in Databases) applied to data warehouses, referred to herein as KDW (Knowledge Discovery in Data Warehouse), since the data in a warehouse are better prepared for data mining.

This paper discusses how data warehouse and data mining resources can be used to assess distance learning, and proposes the MultiStar environment for KDW to support this assessment. Several studies focus on supporting student assessment, among them [5, 6] and [7]. Some studies apply data mining resources to Web log information [8, 9, 10, 11]. The work proposed herein takes an approach to the ongoing assessment of distance learning that differs from the existing ones, while drawing on some aspects of the studies cited above.

Section 2 provides a set of information to guide the implementation of ongoing assessment of learning in distance learning environments, while Section 3 briefly discusses the modeling of a data warehouse based on this set of information. Section 4 presents a knowledge discovery application over this data warehouse using the MultiStar environment, and Section 5 presents our conclusions.

2. Ongoing Assessment of Distance Learning

The teaching-learning process naturally produces information about the status of a student's activities in a course. Studying this information and making decisions based on it characterize the ongoing assessment of the learner. In most computational environments for distance learning that involve some kind of student assessment, this is done by collecting the student's interactions with the environment (the student's actions). Analyzing a student's history of interactions can reveal how the way he conducts his studies influences how much he profits from the course.

Today there is a wide range of environments available for distance courses. To identify how these environments assess the student's assimilation of content, we surveyed the environments most frequently cited in the literature, as documented in [12]. The survey identified five mechanisms that support the ongoing assessment of distance learning:
− tracking of the student's actions;
− redirection through evaluation;
− records of messages from lists;
− records of messages from forums;
− records of messages from chats.

The results of this survey show a tendency for these environments to track some student activities in order to monitor learning. Most of them keep a small set of information that tracks the path the student has taken during the course. This set varies from one environment to another, according to criteria not disclosed by their designers. Although there is no standard set of requirements for assessing the student's learning, two types of information clearly emerge to guide the implementation of ongoing assessment of learning in distance learning environments:
− Information about the student's actions and communication [13]. This information can aid in understanding how the student's interactions with the environment and with other course participants influence his learning. Two types of student interaction can be identified:
   − Student-Person Interactions: those in which the student interacts with other course participants, such as the teacher, the assistant teacher or another student, through some communication mechanism. For these interactions it is useful to know, for instance, the subject of the message and the mechanism employed (chat, email, list, forum, etc.).
   − Student-Material Interactions: those in which the student interacts with the didactic material (content pages, tests, exercises, etc.). For these interactions it is useful to know, for example, how much time was spent on them, whether the interaction was a download or an upload, which discipline the material belongs to, which link was used to access the material, etc.
− Information about the student's activities in the course [8]. This kind of information, which depends on a rule established by the teacher, strongly influences the determination of whether or not the student has actually learned. Each activity proposed by the teacher may have a result: for instance, participation or not in a conference, the grade given for an assignment, and so on. This type of information depends on the activities proposed for the course and on how the teacher has chosen to validate them, i.e., the criterion used to decide whether or not the student has carried them out.

3. Ongoing Assessment of Distance Learning using Data Warehouse Resources

The information relevant to the ongoing assessment of distance learning can be stored in a data warehouse to support management decisions. This study explores the use of such a data warehouse for applying data mining techniques, allowing patterns of student behavior to be identified and thereby supporting decision making in the ongoing assessment of the student.

In this work, the modeling of the data warehouse follows the fact constellation schema [2], incorporating generalization hierarchies for the fact or dimension tables of the data warehouse. Figure 1 shows part of the data warehouse developed from the information discussed in the previous section. The gray boxes in the figure represent fact tables, i.e., tables that store information about a subject on which measures (or facts) are defined (highlighted in bold). The remaining boxes represent the dimension tables, which store the values that determine the fact table measures. A fact table together with its dimension tables is called a Star Schema. Parts A and B of Figure 1 represent two star schemas.

Information about the activities developed by the student during the course can be stored in the data warehouse as illustrated in part A of Figure 1, while information about student-person interactions can follow the model shown in part B of Figure 1. The PersonalInteraction fact table shown in part B is specialized into 4 different interactions: InteractionViaChat, InteractionViaEmail, InteractionViaList and InteractionViaForum. The semantics of this hierarchy is reflected in the measures and dimensions of the specialized facts: each of these fact tables contains all the dimensions and measures of PersonalInteraction. In analytical terms, each specialized fact can be used to examine both the dimensions and measures common to all personal interactions and the information specific to its own type of interaction (via chat, email, list or forum), considering only the instances pertinent to that fact table. The PersonalInteraction fact table itself is used when one wishes to analyze measures and attributes common to all types of personal interaction.

An analysis of Figure 1 reveals that the stars of the Activity and PersonalInteraction facts have common dimensions: Student, Course, Discipline, Institution, Group and Time. Joining these two stars forms a constellation with two facts that share six dimensions. Besides avoiding the duplication of data, this union means in practice that the measures and dimensions of the two facts can be analyzed jointly, crossing information about the interactions and activities developed by the students. One such analysis, for example, is to check whether the students' interactions influence their performance in the course activities.

Figure 1. Fact constellation schema for the Activity and PersonalInteraction facts.

Figure 2 illustrates the fact constellation schema of the data warehouse developed to assess distance learning. A fact constellation is a collection of stars. In addition to the information about activities and personal interactions, this data warehouse contains the following information:
− the student's interaction (access) with the didactic material (centered on the StudentMaterialInteraction fact table), involving the attributes DurationOfTheAccess, LinkOfTheMaterialAccessed, TypeOfAccess (download or upload), etc.;
− the tests the student has taken (centered on the Test fact table), with the attributes Grade, NumberOfIncorrectlyAnsweredQuestions, etc.;
− whether the student has been approved upon conclusion of a discipline (centered on the Approval fact table), with the attributes Dropped-out?, Passed?, TemporarilySuspended?, etc.

For legibility, Figure 2 groups the Student, Course, Discipline, Institution, Time and Group dimensions, which are shared by all the facts, into a single entity to avoid cluttering the diagram with links. The data warehouse in Figure 2 shows various indirect relationships among the fact tables. This opens up a wide range of possibilities for combining measures and dimensions in analyses, e.g.:
− analyze whether there is a relation between a student's score, his personal interactions and his access to the didactic material (involving the Test, PersonalInteraction and StudentMaterialInteraction facts);
− verify the influence of factors such as communication and study on learning (involving the PersonalInteraction and StudentMaterialInteraction facts);
− discover whether the type of connection a student has influences the number of times he accesses the environment (involving the Student dimension and the StudentMaterialInteraction fact);
− find activities that are more effective for given courses, age groups, levels of schooling, etc. (involving the Course and Student dimensions and the Activity fact).

These analyses can be made using the environment for Knowledge Discovery in Data Warehouses (KDW) described in the following section.

Figure 2. Fact constellation for ongoing assessment.

4. A KDW Application for Assessment of Distance Learning

Commercial tools can be used to carry out management analyses in the data warehouse presented in the previous section; however, they support only simple analyses, i.e., those using a single fact and its dimension tables, e.g., identifying the profiles of students more prone to dropping out of a course (involving the Student dimension table and the Approval fact table). Yet important analyses that can be performed in this warehouse require comparing different aspects of the student's learning process; examples of this type of analysis were given in the previous section. To support such broad analyses, i.e., those involving more than one fact (star), an environment for knowledge discovery called MultiStar was developed [14]. This environment allows the user to select the information to which data mining tasks will be applied, and provides resources for recognizing fact constellations and handling generalization hierarchies.

By recognizing fact constellations, MultiStar allows analyses involving facts that belong to the same constellation, i.e., facts that share dimensions. Its treatment of generalization hierarchies, which involve inheritance relationships among the fact or dimension tables of a data warehouse, does not require the user to understand the concept on which they are based: for a generalization hierarchy between fact or dimension tables, the characteristics inherited from the parent tables are displayed automatically in the child tables, making the hierarchies clear to the user. With regard to fact constellations, when a dimension or measure is selected, MultiStar allows only the fact tables related directly or indirectly to the selected information to be selected.

Figures 3 and 4 exemplify the use of the MultiStar environment for knowledge discovery in the data warehouse of Figure 2, showing how information can be selected and mined in this environment. Field 1 of Figure 3 represents the fact tables of Figure 2 which, when expanded (fields 2, 3 and 4), show the attributes representing the subjects under analysis in each fact table (called measures or facts) and information about the related dimension tables.

Figure 3. MultiStar: selecting information.

The purpose of the data selection process illustrated in Figure 3 is to support an analysis of the influence of chat interactions on the student's activities. Thus, the following were selected in the data warehouse: the Student dimension, common to the Activity (field 2), Approval (field 3) and PersonalInteraction (field 4) fact tables; the TypeOfInteraction and Reply? measures of the PersonalInteraction fact table; the Passed? measure of the Approval fact table; and the Accomplished? measure of the Activity fact table. The analysis was restricted to students of the ATA Institution during the period from 1999 to 2001. This led to the creation of filters (field 5) on the attribute Name of the Institution dimension (field 6) and on the attribute Year of the Time dimension (field 7), both attributes of dimensions common to the three fact tables. The selected information is stored in a data cube² called 'Interactions and Activities', which contains all the attributes of the Student dimension table (as shown in Figure 1) and the measures cited above.

² A data cube [4] is a structure composed of dimensions and facts organized to facilitate analyses of the data.

Once the data have been selected, MultiStar provides resources for applying data mining tasks so that patterns can be extracted from them. Figure 4 shows the interface for applying data mining to the data selected in Figure 3. In Field 1 of Figure 4, the user selects the cube to be analyzed (here, the 'Interactions and Activities' cube). Field 2 shows the attributes of the selected cube (dimensions and measures). The user must choose one attribute from each dimension of the cube (here, the attribute TypeOfConnection of the Student dimension table). These attributes, together with the measures of the cube (in our example, Accomplished? from the Activity fact table, Passed? from the Approval fact table, and TypeOfInteraction and Reply? from the PersonalInteraction fact table), compose a view to be mined. Field 5 shows the selected cube filter. A mining task is selected in Field 3, and the parameters for this task are defined in Field 4. The data mining tasks available in the environment are Association [15], Classification [16] and Clustering [17]. Each of these tasks allows the data to be analyzed from a different standpoint.

Figure 4. MultiStar: mining data.

The data mining task chosen was Classification, with the purpose of classifying the student according to the measure Passed?. When this task is performed, MultiStar presents the patterns it finds in textual form. The patterns resulting from the classification task are expressed as rules, as in the example below:

IF Accomplished? = yes
   and TypeOfConnection = superfast
   and TypeOfInteraction = chat
   and Reply? = yes
THEN Passed? = yes

For each rule found, the environment indicates the number of cases in which the rule occurs and the rule's degree of reliability.

5. Conclusions

This paper discussed the information relevant to the ongoing assessment of learning in computational distance learning environments and proposed a solution to aid in this assessment through data warehouse and data mining resources. The modeling of a data warehouse was presented to illustrate the information identified, together with the MultiStar environment, which allows knowledge discovery in this data warehouse. In the next version of the environment, the authors intend to present the results of the data mining tasks in a more intuitive form, using graphic resources. An intelligent tutor could also be developed to automatically guide the student in his learning process, based on the results of the data mining tasks applied to the data warehouse discussed herein.

6. References

[1] W.H. Inmon, Building the Data Warehouse, 2nd edition, John Wiley & Sons, 1996.
[2] R. Kimball, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons, 1996.
[3] R. Kimball, L. Reeves, M. Ross and W. Thornthwaite, The Data Warehouse Lifecycle Toolkit, Wiley Computer Publishing, 1998.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 1st edition, Morgan Kaufmann, New York, 2000.
[5] K. Nurmela, E. Lehtinen and T. Palonen, Evaluating CSCL Log Files by Social Network Analysis, In: Computer Support for Collaborative Learning, Stanford, USA, 1999, Proceedings, pp. 434-441.
[6] M. Rahkila and M. Karjalainen, Evaluation of Learning in Computer Based Education Using Log Systems, In: 29th ASEE/IEEE Frontiers in Education Conference, San Juan, Puerto Rico, 1999, Proceedings, pp. 16-21.
[7] S.L. Tanimoto, Towards an Ontology for Alternative Assessment in Education, Meeting of the IEEE Learning Technology Standards Committee, Pittsburgh, USA, 1998.
[8] J. Pei, J. Han, B. Mortazavi-Asl and H. Zhu, Mining Access Patterns Efficiently from Web Logs, In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, 2000, Proceedings, pp. 396-407.
[9] O.R. Zaiane, M. Xin and J. Han, Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs, In: Advances in Digital Libraries Conference, Santa Barbara, USA, 1998, Proceedings, pp. 19-29.
[11] B. Mortazavi-Asl, Discovering and Mining User Web-Page Traversal Patterns, MPhil Dissertation, Simon Fraser University, 1999, p. 93.
[12] D.R. Silva and M.T.P. Vieira, An Ongoing Assessment Model in Distance Learning, In: Internet and Multimedia Systems and Applications, Honolulu, USA, 2001, Proceedings.
[13] C. Vrasidas and M.S. McIsaac, Factors Influencing Interaction in an Online Course, The American Journal of Distance Education, v. 13, n. 3, 1999.
[14] D.R. Silva, A Tool for Knowledge Discovery using Data Warehousing and its Application on the Ongoing Assessment of Distance Learning, MPhil Dissertation, Department of Computer Science, UFSCar, São Carlos, Brazil, 2002, 108p. (in Portuguese)
[15] R. Agrawal, T. Imielinski and A. Swami, Mining Associations between Sets of Items in Massive Databases, In: ACM SIGMOD International Conference on the Management of Data, New York, USA, 1993, Proceedings, ACM Press, pp. 207-216.
[16] J.R. Quinlan, Induction of Decision Trees, Machine Learning, 1:81-106, 1986.
[17] P. Cheeseman and J. Stutz, Bayesian Classification (AutoClass): Theory and Results, In: Advances in Knowledge Discovery in Databases, AAAI Press, 1995, pp. 61-83.
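To make the joint analysis of the Activity and PersonalInteraction stars in Section 3 more concrete, the following minimal Python sketch crosses the two facts on the shared Student dimension. It is an illustration under stated assumptions, not MultiStar's implementation: the six shared dimensions come from Figure 1, while the fact-specific dimensions ActivityKind and Mechanism, the row layout, the function names and the toy data are all hypothetical.

```python
from collections import defaultdict

# The six shared dimensions come from Figure 1; "ActivityKind" and
# "Mechanism" are hypothetical fact-specific dimensions added for contrast.
SHARED_DIMS = {"Student", "Course", "Discipline", "Institution", "Group", "Time"}
FACT_DIMS = {
    "Activity": SHARED_DIMS | {"ActivityKind"},
    "PersonalInteraction": SHARED_DIMS | {"Mechanism"},
}


def shared_dimensions(fact_a: str, fact_b: str) -> set[str]:
    """Dimensions through which two facts of the constellation can be crossed."""
    return FACT_DIMS[fact_a] & FACT_DIMS[fact_b]


def cross_by_student(activity_rows, interaction_rows):
    """Cross both facts on the shared Student dimension.

    Returns {student: (number_of_interactions, share_of_accomplished_activities)}
    for every student that has at least one activity row.
    """
    interactions = defaultdict(int)
    for row in interaction_rows:
        interactions[row["Student"]] += 1

    accomplished = defaultdict(int)
    total = defaultdict(int)
    for row in activity_rows:
        total[row["Student"]] += 1
        if row["Accomplished"]:
            accomplished[row["Student"]] += 1

    return {s: (interactions[s], accomplished[s] / total[s]) for s in sorted(total)}


# Toy fact rows (hypothetical data), keyed here only by the Student dimension.
activity_rows = [
    {"Student": "s1", "Accomplished": True},
    {"Student": "s1", "Accomplished": False},
    {"Student": "s2", "Accomplished": True},
]
interaction_rows = [
    {"Student": "s1", "Mechanism": "chat"},
    {"Student": "s1", "Mechanism": "forum"},
]
summary = cross_by_student(activity_rows, interaction_rows)
```

Crossing on Student alone is the simplest case; the same pattern would apply to any subset of the shared dimensions, for example grouping by Student and Time together.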
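The generalization hierarchy of Section 3, in which InteractionViaChat and the other specialized facts contain all the dimensions and measures of PersonalInteraction, maps naturally onto class inheritance. The sketch below uses Python dataclasses as an analogy for how inherited attributes can be displayed automatically in child tables, as Section 4 describes for MultiStar. Only the table names come from the text; every attribute name and helper function is hypothetical.

```python
from dataclasses import dataclass, fields


# Table names follow part B of Figure 1; the attribute names are hypothetical.
@dataclass
class PersonalInteraction:
    student: str
    time: str
    subject: str  # subject of the message, common to every mechanism


@dataclass
class InteractionViaChat(PersonalInteraction):
    reply: bool  # hypothetical chat-specific measure


@dataclass
class InteractionViaForum(PersonalInteraction):
    forum_topic: str  # hypothetical forum-specific attribute


def visible_attributes(fact_cls) -> list[str]:
    """Attributes displayed for a fact table: inherited ones first, then its own."""
    return [f.name for f in fields(fact_cls)]


def own_attributes(fact_cls) -> list[str]:
    """Attributes declared by the table itself, excluding those of its parent."""
    parent = fact_cls.__mro__[1]
    inherited = {f.name for f in fields(parent)} if parent is not object else set()
    return [name for name in visible_attributes(fact_cls) if name not in inherited]


# A chat record is also a PersonalInteraction, so analyses of the common
# attributes can treat all specialized facts uniformly.
chat = InteractionViaChat("s1", "2000-03", "exam dates", True)
```

Because dataclasses list base-class fields before subclass fields, the child table shows the common attributes first and its specific ones after, which mirrors the split between the general PersonalInteraction analysis and the per-mechanism analyses.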
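The selection and rule reporting in Section 4 can be sketched as follows: the cube is filtered to the ATA Institution for the years 1999 to 2001, and the example rule is then evaluated over the selected records. This sketch assumes that the "number of cases" of a rule means the records matching its IF part and that its "degree of reliability" means the fraction of those records that also match the THEN part (rule confidence). It does not induce rules, which the paper's Classification task does following [16]; it only evaluates a given rule. The flat row layout, the helper names and the toy data (including the institution "XYZ") are hypothetical.

```python
# Hypothetical flat view of the 'Interactions and Activities' cube: one dict per
# student record, with the attributes and filter columns named in Section 4.
def make_row(institution, year, accomplished, connection, interaction, reply, passed):
    return {
        "Institution": institution, "Year": year,
        "Accomplished?": accomplished, "TypeOfConnection": connection,
        "TypeOfInteraction": interaction, "Reply?": reply, "Passed?": passed,
    }


def apply_filters(rows, institution="ATA", first_year=1999, last_year=2001):
    """Keep rows of one institution within an inclusive range of years."""
    return [
        r for r in rows
        if r["Institution"] == institution and first_year <= r["Year"] <= last_year
    ]


def rule_statistics(rows, conditions, conclusion):
    """Return (cases covered by the IF part, fraction of them matching THEN)."""
    covered = [r for r in rows if all(r[a] == v for a, v in conditions.items())]
    attribute, value = conclusion
    hits = sum(1 for r in covered if r[attribute] == value)
    return len(covered), (hits / len(covered) if covered else 0.0)


# The example rule from Section 4.
RULE_IF = {"Accomplished?": "yes", "TypeOfConnection": "superfast",
           "TypeOfInteraction": "chat", "Reply?": "yes"}
RULE_THEN = ("Passed?", "yes")

rows = [  # toy data, not results from the paper
    make_row("ATA", 2000, "yes", "superfast", "chat", "yes", "yes"),
    make_row("ATA", 2001, "yes", "superfast", "chat", "yes", "no"),
    make_row("ATA", 2000, "no", "superfast", "chat", "yes", "yes"),
    make_row("ATA", 1998, "yes", "superfast", "chat", "yes", "yes"),
    make_row("XYZ", 2000, "yes", "superfast", "chat", "yes", "yes"),
    make_row("ATA", 1999, "yes", "superfast", "chat", "yes", "yes"),
]
selected = apply_filters(rows)
cases, reliability = rule_statistics(selected, RULE_IF, RULE_THEN)
```

With these toy rows, the filters keep four records, the IF part covers three of them, and two of those three have Passed? = yes, so the rule's reliability under this interpretation is two thirds.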
