COURSE SPECIFICATION

NAME OF COURSE: Data Mining Techniques and Applications
STATUS (main, optional, Free Choice): Main
LEVEL (F,A,P,1,2,3,M): 3
COURSE CODE: CS455 / IS455
UNIT VALUE: 4
Department offering course: Computer Science Information Systems
Course Co-ordinator:
Degree Programmes in which to be offered: Information Systems, Computer Science
Pre-requisites: IS240/CS240 Databases
Indicate whether a new course or name of course being replaced: New
TERMS TAUGHT: Spring
Date of course commencement: February
Total Contact Hours: 75
Lectures/Tutorials: 45
Practicals: 30

AIMS OF THE COURSE:
The overall aim of this course is to introduce students to modern data mining techniques and their use in business and other application areas. In particular, the course explores the basic concepts, principles and techniques of data mining, online analytic processing and data warehousing, with emphasis on both technical and practical issues. The course gives students an understanding of how to evaluate and compare data mining solutions so that they can use them effectively in practice. The course also equips students with hands-on experience and skills in conducting a data mining project using a data mining software tool, and/or constructing a data warehouse.

SSST Course Specification March 2012 (Page 1)

INTENDED LEARNING OUTCOMES:
By the end of the course the students will be able to:
1. Understand basic concepts and principles of data mining and data warehousing
2. Understand state-of-the-art approaches and techniques in data mining and visualisation
3. Develop a working application using a commercial data mining/data warehousing software tool

LEARNING AND TEACHING STRATEGIES TO BE USED:
1. Lectures introduce theoretical and conceptual materials from the recommended textbooks
2. Tutorials explore the application of theory and concepts
3. Laboratory sessions provide exercises to apply the theory and use data mining/warehousing software tools
4. Group project enables students to develop team-work skills and apply what they have learnt in the course to a practical problem

ASSESSMENT CRITERIA (SHOULD LINK EXPLICITLY TO INTENDED LEARNING OUTCOMES):
1. 4 major topics tests (40%)
2. Written final exam (50%)
3. Group project (10%)

TRANSFERABLE SKILLS AND OTHER ATTRIBUTES:
1. Oral and written presentation: ability to express ideas clearly and precisely
2. Critical thinking: ability to analyse data in databases and evaluate possible solutions for real life applications
3. Team work: ability to collaborate in applications development projects
4. Group discussions: ability to participate in group discussions on a given subject

LEARNING AND TEACHING STRATEGIES USED:
1. Personal and team projects to read, evaluate and present knowledge
2. Practical classes and application on real life data

ASSESSMENT CRITERIA (SHOULD LINK EXPLICITLY TO INTENDED LEARNING OUTCOMES):
1. Projects
2. Projects and examinations
3. Projects
4. Practical classes

COURSE OUTLINE/SYLLABUS:
• Introduction: Concept of data mining. Data mining and KDD. Data mining process. Major data mining tasks. Data mining approaches. Overview of data mining solutions. Importance of evaluation. Evolution of data mining. Promises and challenges.
• Understanding Data: Data and data sets. Data types. Data quality. Data pre-processing. Data summarisation and visualisation.
• Data Mining Techniques I: Problem of cluster detection. Proximity measures. Basic clustering methods: K-means and Agglomeration. Validation and evaluation of clusters. Overview of other types of clustering methods. Clustering in practice.
• Data Mining Techniques II: Problem of classification. Decision tree induction approach: ID3 and other tree induction solutions. Nearest neighbour approach: kNN and PEBLS methods. Statistical approach: Naïve Bayes method. Overview of other classification approaches. Evaluation of classifiers.
Problem of overfitting and solutions. Classification in practice.
• Data Mining Techniques III: Problem of association rule discovery. Apriori algorithms for Boolean, generalised and quantitative association rules. Evaluation of association rules. Other types of association rules. Association rule discovery in practice.
• Data Mining Projects: Data mining project life cycle. The industry standard: CRISP-DM guideline for data mining. A case study on customer segmentation.
• Data Mining software tool WEKA: Overview of WEKA functions. Data pre-processing in WEKA. WEKA visualisation facilities for data and patterns. WEKA data mining functions. WEKA evaluation parameters. Choosing the best data mining solutions in WEKA. Overview of data mining software tools.
• Data Warehousing: Goals and characteristics. Differences between data warehouse, data mart and database. Data warehouse architectures. Metadata and management. Data loading and integration. Various data warehousing technologies in industry. Design issues for building a data warehouse.
• Online Analytic Processing (OLAP): Concepts of multidimensional cubes. Hierarchies of data abstraction. Operations over multidimensional cubes. OLAP approaches. OLAP models and systems. Limitations and constraints of OLAP.
• Putting Everything Into Perspective: Database, data warehouse, query, OLAP, data mining and decision support. Data, information and knowledge in enterprises. Application areas of data mining. Ethical and professional issues regarding data mining.

KEY TEXTS AND/OR OTHER LEARNING MATERIALS:
Recommended Texts:
• Hongbo Du, Data Mining Techniques and Applications, ISBN 9781844808915.
• Marakas G.M., Modern Data Warehousing, Mining, and Visualization: Core Concepts, Prentice Hall, 2003, ISBN-10: 0131014595.
Additional Reading:
• Rob P. and Coronel C., Database Systems: Design, Implementation and Management, Thomson Course Technology, 2004, ISBN 0-619-21323-X.
• Berry M.J.A. and Linoff G., Data Mining Techniques for Marketing, Sales and Customer Support, 2nd ed., Wiley, 2004, ISBN 0471470643.