Philadelphia University
Faculty of Information Technology
Department of Computer Information Systems
First semester, 2008/2009

Course Syllabus

Course Title: Data Mining and Data Warehousing
Course code: 760463
Course Level: 4
Course prerequisite(s) and/or corequisite(s):
Lecture Time:
Credit hours: 3

Academic Staff Specifics

Name: Dr. Fadi Fayez
Rank: Senior Lecturer
Office Number and Location: 7302
Tel. No.: +962-2-6374444 Ext: 513
Office Hours: Mon & Wed (11:00 – 13:00)
E-mail Address:

Course/module description:

The module equips students with the knowledge and skills necessary to design and implement a data warehouse or a data mining algorithm using Oracle or any other appropriate programming language. Students are expected to become familiar with the common data mining tasks and techniques, principles of dimensional data modeling, techniques for extraction of data from source systems, data transformation methods, data staging, and data warehouse architecture and infrastructure. Issues such as preprocessing the data, discretisation, rule pruning, cross-validation, inductive bias, and prediction are included. Students will design and develop a simple data mining prototype using the Oracle data mining package or any appropriate tools.

Course/module objectives:

1. To provide the student with an understanding of the concepts of data warehousing and data mining.
2. To study the dimensional modeling technique for designing a data warehouse.
3. To study data warehouse architectures, OLAP, and the project planning aspects of building a data warehouse.
4. To explain the knowledge discovery process.
5. To describe the data mining tasks and study their well-known techniques.
6. To develop an understanding of the role played by knowledge in a diverse range of intelligent systems.
7. To test real data sets using popular data mining tools such as WEKA.

Course/module components

Books (title, author(s), publisher, year of publication):

1. Tan, P-N., Steinbach, M., and Kumar, V. Introduction to Data Mining. Addison Wesley, 2005.
2. Han, J. and Kamber, M. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.

Software(s):

1. WEKA
2. Oracle data warehousing package

Teaching methods:

Duration: 16 weeks, 48 hours in total
Lectures: 32 hours (2 hours per week)
Tutorials: Approximately 1 per week

Learning Outcomes:

Knowledge and understanding
1. To provide a brief introduction to general issues of data warehousing and data mining.
2. To understand different architectures and mining techniques.

Cognitive skills (thinking and analysis)
1. Introduce students to the role and function of data warehousing and data mining.
2. Understand the theoretical background of data mining tasks and techniques.

Communication skills (personal and academic)
1. Explain the stages and processes of different data mining techniques.
2. Be able to work effectively with others and to carry out projects in groups.

Practical and subject specific skills (Transferable Skills)
1. To learn mining and warehouse techniques through the use of different tools (e.g. ORACLE, WEKA).
2. To learn the evaluation techniques of data mining and data warehousing.
Assessment instruments

Short reports and/or presentations, and/or short research projects
Quizzes, homework
Final examination: 50 marks

Allocation of Marks

Assessment Instruments                                      Mark
First examination                                           20%
Second examination                                          20%
Final examination (50 marks)                                50%
Reports, research projects, quizzes, homework, projects     10%
Total                                                       100%

Course/module academic calendar

Week (6): First examination
Week (11): Second examination
Week (15): Specimen examination (Optional)
Week (16): Final Examination

Basic and support material to be covered, weeks (1) to (16), in the order listed (tutorials and assignments are shown with the material they accompany):

Course Overview: Course introduction; Knowledge discovery process; Why data mining
Introduction: What is data mining; Motivation and challenges of data mining; Data mining tasks
Types of Data: Data set types; Data mining applications; Tutorial 1
Data quality; Data preprocessing (aggregation, sampling, dimensionality reduction, feature selection, feature creation, discretisation, transformation)
Measuring the similarity and dissimilarity between simple attributes and between data objects; Tutorials 2, 3: WEKA
Proximity measures; Issues in proximity calculation; Tutorial 4: Exploring the IRIS data set
Data Mining Techniques: Mining association rules; Association rule mining; Apriori algorithm; Frequent Pattern Growth algorithm
Rule-based Classification: What is classification; Decision trees (ID3, C4.5); Rule induction (RIPPER algorithm); Tutorials 5, 6: WEKA
Data Mining Techniques: Rule-based classification; Associative classification (CBA, MMAC); Rule pruning (REP, database coverage)
Data Mining Techniques: Statistical classification (Naïve Bayes); Issues in classification (overfitting and cross-validation); Evaluation methods in classification; Tutorial 7: Assignment 1
Data Mining Techniques: Other classification approaches; Regression; Neural networks; Genetic algorithms
Cluster analysis: Partitioning methods (K-means); Hierarchical methods (BIRCH and CURE)
Outlier analysis: Preliminaries; Statistical approaches
Density-based methods; Tutorial 8: WEKA
Case Study: Text Categorisation; Tutorial 9: Using associative classification for text categorisation
Data Warehousing: Why data warehouse?; Basic concepts related to data warehousing; OLTP and OLAP; Data Cube; Data Warehouse implementation; Tutorial 10: review questions
Data Warehousing: Data Warehouse modeling; Warehouse views; Data Warehouse Architectures; Tutorial 11: review questions
Applications on Data Warehouse: Case Study; Tutorial 12: Data warehousing in Oracle
Data Warehousing; Tutorial 13: Data warehousing in Oracle
Review and Final Exam

Homework/reports and their due dates:

Expected workload:

On average, students need to spend 2 hours of study and preparation for each 50-minute lecture/tutorial.

Attendance policy:

Absence from lectures and/or tutorials shall not exceed 15%. Students who exceed the 15% limit without a medical or emergency excuse acceptable to and approved by the Dean of the relevant college/faculty shall not be allowed to take the final examination and shall receive a mark of zero for the course. If the excuse is approved by the Dean, the student shall be considered to have withdrawn from the course.

Module references

Books
1. Connolly, T. and Begg, C. Database Systems: A Practical Approach to Design, Implementation, and Management. Addison Wesley, fourth edition, 1999.
2. Witten, I. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann, 2001.

Journals

Websites
1. ---- WEKA
2. Oracle data warehousing package