Download coursework2014b - School of Computing

COMP3776 Data Mining and Text Analytics
Lecturer: Eric Atwell, School of Computing, University of Leeds

Coursework2014b: 5-minute Presentation on a WEKA Data-mining challenge (summative, 15% of module grading)

Students should work in PAIRS: choose your own partners (see notice in Long Room). Each PAIR will make a joint 5-minute presentation, and both partners get the same grade.

DEADLINE: Presentations to DM+TA Challenge Workshop, Tuesday 09/12/14, week 11

1. Learning objectives

This exercise will enable you to
- understand how to apply algorithms, resources and techniques for implementing and evaluating data mining and text analytics in a practical exercise using the WEKA toolkit;
- understand theory and terminology used in WEKA classifiers;
- summarise and present your understanding to a peer audience;
- reflect on what you have learnt.

2. The task for the DM+TA Challenge Workshop

Chapter 17 of the Data Mining textbook (Witten et al. 2011) presents a number of Tutorial Exercises for the Weka Explorer. Section 17.5 has some exercises in Document Classification, including use of the StringToWordVector filter, and experiments with some news datasets: ReutersCorn-train.arff and ReutersGrain-train.arff are training sets for classifying news stories about corn and grain; ReutersCorn-test.arff and ReutersGrain-test.arff are the corresponding test files you must use for evaluation.

Your challenge is: find the best Classifiers for the ReutersGrain and ReutersCorn datasets, giving the highest accuracy score when trained on the train.arff files and evaluated on the test.arff files.

In your Presentations, you must outline your
- Methods: what you did to find the best Classifier for each dataset;
- Results: the highest accuracy score you achieved for each dataset;
- Reflection: what did you learn from this exercise?

You MUST keep to time: five minutes maximum per pair of students.
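The train-then-test protocol described above can be sketched in a few lines of plain Python. This is only an illustration of the idea (turn each document into word counts, fit a classifier on the training documents, then report accuracy on separate test documents). It is not WEKA and not the required method: the hand-rolled multinomial Naive Bayes, the function names (tokenize, train_naive_bayes, predict, accuracy) and the toy documents are all assumptions made up for this sketch, and none of the data comes from the Reuters files.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace: a crude stand-in for a word-vector filter."""
    return text.lower().split()


def train_naive_bayes(docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}  # label -> Counter of word frequencies
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of (text, label) pairs whose predicted label matches the true label."""
    correct = sum(1 for text, label in docs if predict(model, text) == label)
    return correct / len(docs)


# Hypothetical toy data, NOT taken from the Reuters files.
train_docs = [
    ("grain exports rise as wheat harvest grows", "grain"),
    ("wheat and barley shipments increase", "grain"),
    ("bank raises interest rates", "other"),
    ("oil prices fall on weak demand", "other"),
]
test_docs = [
    ("wheat harvest shipments", "grain"),
    ("interest rates fall", "other"),
]

model = train_naive_bayes(train_docs)       # fit on the training set only
test_accuracy = accuracy(model, test_docs)  # score on the held-out test set
print(f"Accuracy on held-out test set: {test_accuracy:.1%}")
```

The point of keeping the two lists separate is the same as in the challenge: the classifier never sees the test documents during training, so the reported accuracy reflects performance on unseen stories.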
There will be Prizes for the best accuracy scores: the Grain Classifier Prize, and the Corn Classifier Prize.

3. Marking scheme

At the DM+TA Challenge Workshop, as each pair presents, I will assess:
- Methods: what you did to find the best Classifier for each dataset (0-6 marks);
- Results: the highest accuracy score you achieved for each dataset (0-3 marks);
- Reflection: what did you learn from this exercise? (0-3 marks).

In addition:
- Presentation: overall interest and appeal of the presentation (0-3 marks).

TOTAL grade: up to 15 marks.

4. Recording the presentations

I hope to record all presentations, using the Lecture Capture facilities in the lecture theatre. The School of Computing may want to display your presentations, for example on Open Days and/or Applicant Days. If you do NOT want your presentation on public view, EMAIL ME ([email protected])

5. References

I. Witten et al. 2011. Data Mining (3rd edition). Elsevier. Ch.17 Tutorial Exercises for the Weka Explorer: https://moodle.umons.ac.be/pluginfile.php/43703/mod_resource/content/2/WekaTutorial.pdf
