Code: IS53023B
Name: Data Mining
Level: 6
Credits: 15
Prerequisites:

Aims
The course introduces machine learning techniques used to discover knowledge or hidden patterns in potentially large volumes of data. Practical data mining is introduced through assignments consisting of theoretical exercises, algorithm implementation, and the use of a data mining suite for knowledge discovery. The module also offers pointers towards new developments in the field, such as multimedia data mining, in particular text, web, and music data mining.

Learning Outcomes
On successful completion of this module, students will have demonstrated the ability to:

Subject-related Knowledge
K1. Explain the various phases of knowledge discovery in data.
K2. Describe and explain the representative machine learning and statistical techniques that provide data mining algorithms for supervised learning, unsupervised clustering and association rule learning.
K3. Explain the methods of performance evaluation for classification, estimation, prediction, clustering and association analysis models.
K4. Compare and contrast predictive data mining models in order to select and apply the optimal ones.

Subject-related Skills
S1. Construct programs that implement data mining algorithms in a programming language such as Java or a scripting language such as PHP.
S2. Develop complete processes of knowledge discovery in data using a data mining suite.
S3. Apply data mining algorithms in areas such as fraud detection, banking, estate value and product quality estimation, medical diagnosis and research, and consumer analytics.

Transferable Skills
T1. Construct predictive tools for decision support systems.
T2. Construct document categorisation tools for web page recommendation systems and email spam detection systems, using data mining strategies such as classification, unsupervised clustering and association analysis.
T3. Construct classification tools for music and image retrieval systems, using data mining strategies such as supervised learning and unsupervised clustering.

Syllabus
1. Motivation for data mining.
2. Data mining strategies: supervised learning, unsupervised clustering and association analysis.
3. Representative machine learning and statistical algorithms for data mining: classification, estimation and prediction algorithms for building models such as decision trees, neural networks, logistic and linear regression, regression trees and Bayesian classifiers; segmentation algorithms based on agglomerative hierarchical clustering, centre-based cluster building (k-means) and self-organising maps; and market basket analysis algorithms.
4. Evaluation of the performance of predictive models.
5. Knowledge discovery in data processes.
6. Data pre-processing: treatment of missing values, data normalisation, data sampling, selection of the best predictive attributes.
7. Data warehousing.
8. Case studies using implemented data mining algorithms and a data mining suite, supporting [S1-S3].
9. Trends in data mining related to [T2-T3].

Main texts:
• Richard Roiger and Michael Geatz, “Data Mining, a tutorial-based primer”, Addison Wesley, 2003
• Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann, 2006

Recommended titles:
• Ian Witten and Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, Morgan Kaufmann, 2005
• Margaret Dunham, “Data Mining: Introductory and Advanced Topics”, Prentice Hall, 2002
• Sheldon Ross, “Introductory Statistics”, Elsevier Academic Press, 2005
• Pang-Ning Tan, Michael Steinbach and Vipin Kumar, “Introduction to Data Mining”, Addison Wesley, 2005

Assessment

Assignments
A1: Specific exercises supporting lecture material [unassessed].
A2: Homework consisting of optional further reading on specific themes [unassessed].
A3: Practical coursework: a combination of exercises and practical implementation of data mining processes based on the techniques introduced [assessed]. Worth 20% of overall mark.

Examination
E1: Unseen examination, 2 hr 15 min, 3 questions out of 5, testing theoretical elements of the syllabus. Worth 80% of overall mark.

Resource/Timetabling
2 hours lecture, 1 hour lab