Download slides - Anjo Anjewierden

Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes

Anjo Anjewierden, Bas Kollöffel and Casper Hulshof

Anjo Anjewierden
http://anjo.blogs.com
Department of Instructional Technology
Faculty of Behavioural Sciences
University of Twente
The Netherlands

Overview
• Motivation
• Classification of educational chats
• Methods for automated analysis
• Experiment
• Results
• Conclusions

Motivation
• Chats can structure collaborative learning
  – Doing vs. doing and discussing with other learners
• Current use of chats is limited to
  – Logging the messages for later analysis
• Our goals related to chat analysis
  – Provide adaptive feedback based on on-line analysis of the chats
  – Make the learner part of the simulation by visualising her actions and behaviour (e.g. through avatars)

Approach
• Define models by which messages can be classified
  – One model is based on term usage
  – Another model is based on the grammar
  – Later we want to combine the models to find "semantic patterns"
• By applying the models to each message of a particular chat, the message can be assigned a class
  – Aggregation of class assignments over time is what an avatar can visualise

Inquiry learning

Learning environment
• Both learners see the same simulation on two different screens
• One learner can run the simulation
• Learners use chat to discuss:
  – Simulations to run, variable settings, etc.
  – Interpretation of the results of simulations
  – Which answer to give to a question
  – etc.

Classifications of chats
• Which functions should we distinguish in chat messages?
• We use a classification proposed by Gijlers and De Jong (2005):
  – Regulative: planning, monitoring, agreeing, etc.
  – Domain: transformative
  – Technical: about the learning environment
  – Social: greetings, compliments and other off-task messages

Examples
• Regulative:
  – Ok // Yes // Next
  – I think the answer is 3
  – Perhaps we should try again
• Domain:
  – The momentum becomes negative
  – Speed of the red ball is 2 m/s
• Technical:
  – Move the mouse to the right
• Social:
  – Well done partner

Data used
• Chats collected by Nadira Saab for her Ph.D. research (University of Amsterdam, 2005)
• Domain: simulations related to collisions (e.g. momentum for elastic and inelastic collisions)
• Language: Dutch
• 78 chat sessions
• 16879 chat messages

Data normalisation
• Messages are extremely noisy
  – Misspellings (accidental and on purpose)
  – Chat language (w8 = wait)
  – See paper for Dutch examples
• Messages have been manually corrected to obtain words that can be found in the dictionary
  – Grammar has not been corrected

Types of features
• For each class one can define
  – Characterising terms (domain: speed, increases)
  – Grammatical patterns:
    • the speed increases (<article> <noun> <verb>)
    • I think (<personal pronoun> <verb>)
  – Both terms and syntactic patterns are used by humans to classify the messages
• Data mining
  – Discover the terms and patterns automatically

Words as features
• Each word in a message is a feature
  – Order is not taken into account
  – Smileys, !, ?, integers are separate words
• Example
  – The answer is 5!!!! :-)
  – Features: { answer, is, the, #, !, <smiley> } (where # is any integer)

Grammar as features
• Each message is parsed by a part-of-speech (POS) tagger
  – Determines the role words play in a message (noun, verb, etc.)
• POS-sequences are a feature, if:
  1. They occur at least 20 times, and
  2. They do not fully overlap a longer sequence
• Example:
  1. the speed: {<article>, <noun>, <article> <noun>}
  2. Remove full overlaps: {<article> <noun>}

Naive Bayes classifier
• Standard Naive Bayes classifier is used
  – Once for the word features
  – Once for the grammar features
• See paper for technical details

Experiment
• Four researchers each classified 400 messages
  – Randomly selected with a bias towards longer messages (nearly all short messages are regulative)
  – 1280 unique messages were classified
• An expert manually checked whether the classifications were "correct"
• The result was used to create two classification models (words, grammar) using Naive Bayes

Results by demonstration

Conclusions
• Automatic classification of messages
  – Naive Bayes works surprisingly well
    • Even for a small feature set per item (chat)
    • And for a large number of features over all items
  – Sufficiently accurate for
    • The classes we used
    • Visualising aggregated learner behaviour through avatars
• Misspellings are a source of concern

Future work
• Combining manual and automatic classification
  – Started: see interaction classification tool
  – Can speed up chat coding in general (also for research)
• Find "semantic patterns" in chats
  – Based on combining information from the word and grammar models
  – Relate these "semantic patterns" to learner actions in the simulation environment

Thank you!
• And thanks to
  – Nadira Saab
  – Hannie Gijlers
  – Petra Hendrikse
  – Sylvia van Borkulo
  – Jan van der Meij
  – Wouter van Joolingen
  – and the anonymous reviewers
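
Illustrative sketch (word features and Naive Bayes). The "Words as features" and "Naive Bayes classifier" slides can be made concrete with a small Python sketch. It is not the authors' implementation: the slides refer to the paper for technical details, so the tokenisation pattern, the Laplace smoothing, and the names `word_features` and `NaiveBayes` are assumptions chosen for illustration. The training messages are the English examples from the "Examples" slide, whereas the real data set was Dutch.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed token pattern: a smiley, an integer, '!' or '?', or a plain word.
TOKEN = re.compile(r"[:;]-?[)(DPp]|\d+|[!?]|[^\W\d_]+")
SMILEY = re.compile(r"[:;]-?[)(DPp]")

def word_features(message):
    """Return the unordered set of word features for one chat message."""
    features = set()
    for token in TOKEN.findall(message):
        if SMILEY.fullmatch(token):
            features.add("<smiley>")
        elif token.isdigit():
            features.add("#")  # any integer maps to the same feature
        else:
            features.add(token.lower())
    return features

class NaiveBayes:
    """Naive Bayes over feature sets, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        for features, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)

    def classify(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for feature in features:
                if feature in self.vocabulary:  # unseen features are ignored
                    seen = self.feature_counts[label][feature]
                    score += math.log((seen + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Example messages taken from the "Examples" slide.
TRAINING = [
    ("Ok", "regulative"),
    ("I think the answer is 3", "regulative"),
    ("Perhaps we should try again", "regulative"),
    ("The momentum becomes negative", "domain"),
    ("Speed of the red ball is 2 m/s", "domain"),
    ("Move the mouse to the right", "technical"),
    ("Well done partner", "social"),
]

model = NaiveBayes()
model.train((word_features(text), label) for text, label in TRAINING)
print(model.classify(word_features("The speed of the ball becomes negative")))
```

With only seven training messages the model is a toy; the experiment described in the slides used 1280 manually classified messages.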
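
Illustrative sketch (grammar features). The two rules on the "Grammar as features" slide (a POS-sequence must occur at least 20 times and must not fully overlap a longer sequence) can be sketched as follows. This is one possible reading of the slide, not the authors' code: the helper names, the `max_len` cap on sequence length, and the choice to count every occurrence are assumptions, and the messages are assumed to be already tagged by a POS tagger.

```python
from collections import Counter

def spans(tags, max_len=4):
    """Yield (start, end) for every contiguous span of at most max_len tags."""
    for start in range(len(tags)):
        for end in range(start + 1, min(start + max_len, len(tags)) + 1):
            yield start, end

def frequent_sequences(tagged_messages, min_count=20, max_len=4):
    """Rule 1: keep POS-sequences that occur at least min_count times."""
    counts = Counter()
    for tags in tagged_messages:
        for start, end in spans(tags, max_len):
            counts[tuple(tags[start:end])] += 1
    return {seq for seq, n in counts.items() if n >= min_count}

def grammar_features(tags, frequent, max_len=4):
    """Rule 2: drop occurrences fully covered by a longer frequent sequence."""
    kept = [(s, e) for s, e in spans(tags, max_len)
            if tuple(tags[s:e]) in frequent]
    features = set()
    for s, e in kept:
        covered = any(s2 <= s and e <= e2 and (e2 - s2) > (e - s)
                      for s2, e2 in kept)
        if not covered:
            features.add(tuple(tags[s:e]))
    return features

# Hypothetical corpus: the tag sequence for "the speed", seen 20 times.
corpus = [["<article>", "<noun>"]] * 20
frequent = frequent_sequences(corpus)
print(grammar_features(["<article>", "<noun>"], frequent))
```

For the slide's example "the speed", the candidates <article>, <noun> and <article> <noun> are all frequent in this corpus, and only <article> <noun> survives the overlap rule.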
