COMP3776 Data Mining and Text Analytics.
Lecturer: Eric Atwell, School of Computing, University of Leeds
Coursework2014b: 5-minute Presentation on a WEKA Data-mining challenge
(summative, 15% of module grading)
Students should work in PAIRS: choose your own partners (see notice in Long Room).
Each PAIR will make a joint 5-minute presentation, and both partners get the same grade.
DEADLINE: Presentations to DM+TA Challenge Workshop, Tuesday 09/12/14, week 11
1. Learning objectives
This exercise will enable you to:
- understand how to apply algorithms, resources and techniques for implementing and evaluating
data mining and text analytics in a practical exercise using the WEKA toolkit;
- understand theory and terminology used in WEKA classifiers;
- summarise and present your understanding to a peer audience;
- reflect on what you have learnt.
2. The task for the DM+TA Challenge Workshop
Chapter 17 of the Data Mining textbook (Witten et al. 2011) presents a number of Tutorial Exercises for the
Weka Explorer. Section 17.5 has some exercises in Document Classification, including use of the
StringToWordVector filter, and experiments with some news datasets: ReutersCorn-train.arff and
ReutersGrain-train.arff are training sets for classifying news stories about corn and grain;
ReutersCorn-test.arff and ReutersGrain-test.arff are corresponding test files you must use for evaluation.
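As a rough illustration of the idea behind a word-vector filter such as StringToWordVector, the Python sketch below turns short texts into 0/1 word-presence vectors. It is not WEKA's implementation: the function names, the tokenisation rule and the toy documents are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Collect the sorted set of words seen in the given documents."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def to_word_vector(text, vocabulary):
    """Return a 0/1 vector: 1 where the vocabulary word occurs in the text."""
    words = set(tokenize(text))
    return [1 if word in words else 0 for word in vocabulary]


# Hypothetical toy documents, not taken from the Reuters files.
train_docs = ["Corn prices rose", "Grain exports fell"]
vocabulary = build_vocabulary(train_docs)

# Words outside the training vocabulary (here "sharply") are ignored.
vector = to_word_vector("Corn exports rose sharply", vocabulary)
```

Building the vocabulary from the training texts only, and reusing it for new texts, keeps the attribute set the same for both training and test data.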
Your challenge is: find the best Classifiers for the ReutersGrain and ReutersCorn datasets, giving the
highest accuracy score when trained on the train.arff files and evaluated on the test.arff files. In your
Presentations, you must outline your Methods: what you did to find the best Classifier for each dataset; and
your Results: the highest accuracy score you achieved for each dataset; and Reflection: what did you learn
from this exercise? You MUST keep to time: five minutes maximum per pair of students.
There will be Prizes for the best accuracy scores: the Grain Classifier Prize, and the Corn Classifier Prize.
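The accuracy score used to compare Classifiers can be sketched in a few lines of Python: it is the fraction of test instances whose predicted class matches the true class. This is only an illustration; the labels, the classifier names and the predictions below are invented for the example and do not come from the Reuters files or from WEKA.

```python
def accuracy(predicted, actual):
    """Fraction of test instances whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical true labels for five test stories (1 = about grain, 0 = not).
actual = [1, 0, 0, 1, 0]

# Hypothetical predictions from two candidate classifiers on the same test set.
candidates = {
    "classifier_a": [1, 0, 0, 0, 0],
    "classifier_b": [1, 1, 1, 1, 0],
}

# Score every candidate on the test labels and pick the highest accuracy.
scores = {name: accuracy(preds, actual) for name, preds in candidates.items()}
best = max(scores, key=scores.get)
```

In this toy case classifier_a gets 4 of 5 test stories right and classifier_b gets 3 of 5, so classifier_a is selected.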
3. Marking scheme
At the DM+TA Challenge Workshop, as each pair presents, I will assess:
Methods: what you did to find the best Classifier for each dataset (0-6 marks);
Results: the highest accuracy score you achieved for each dataset (0-3 marks);
Reflection: what did you learn from this exercise? (0-3 marks);
Presentation: overall interest and appeal of the presentation (0-3 marks).
TOTAL grade: up to 15 marks.
4. Recording the presentations
I hope to record all presentations, using the Lecture Capture facilities in the lecture theatre. The School of
Computing may want to display your presentations, for example on Open Days and/or Applicant Days. If
you do NOT want your presentation on public view, EMAIL ME ([email protected])
5. References
I Witten et al. 2011. Data Mining (3rd edition). Elsevier. Ch.17 Tutorial Exercises for the Weka Explorer:
https://moodle.umons.ac.be/pluginfile.php/43703/mod_resource/content/2/WekaTutorial.pdf