Philadelphia University
Faculty of Information Technology
Department of Computer Science
Second Semester, 2008/2009

Course Syllabus

Course Title: Data Warehousing and Data Mining
Course code: 760463
Course Level:
Course prerequisite(s) and/or corequisite(s):
Lecture Time:
Credit hours: 3

Academic Staff Specifics
Name: Dr. Fadi Fayez
Rank: Senior Lecturer
Office Number and Location: IT 7302, Tel. No. +962-2-6374444 Ext: 513
Office Hours: Mon & Wed (11:00 – 13:00)
E-mail Address:

Course module description:
The module equips students with the knowledge and skills necessary to design and implement a data warehouse or a data mining algorithm using Oracle or any other appropriate programming language. Students are expected to become familiar with the common data mining tasks and techniques, principles of dimensional data modeling, techniques for extracting data from source systems, data transformation methods, data staging, and data warehouse architecture and infrastructure. Issues such as data preprocessing, discretisation, rule pruning, cross validation, inductive bias, and prediction are included. Students will design and develop a simple data mining prototype using the Oracle data mining package or any other appropriate tool.

Course module objectives:
1- To provide the student with an understanding of the concepts of data warehousing and data mining
2- To study the dimensional modeling technique for designing a data warehouse
3- To study data warehouse architectures, OLAP and the project planning aspects in building a data warehouse
4- To explain the knowledge discovery process
5- To describe the data mining tasks and study their well-known techniques
6- To develop an understanding of the role played by knowledge in a diverse range of intelligent systems
7- To test real data sets using popular data mining tools such as WEKA

Course/module components
Books (title, author(s), publisher, year of publication)
1.
Title: Introduction to Data Mining
Author: Tan, P-N, Steinbach, M., Kumar
Publisher: Addison Wesley, 2005
2.
Title: Data Mining: Concepts and Techniques
Author: Han, J. and Kamber, M.
Publisher: Morgan Kaufmann, 2006

Software(s)
1. WEKA
2. Oracle data warehousing package

Teaching methods:
Duration: 16 weeks, 48 hours in total
Lectures: 32 hours (2 hours per week)
Tutorials: Approximately 1 per week

Learning outcomes:
Knowledge and understanding
1. To provide a brief introduction to general issues of Data Warehouse and Data Mining.
2. To provide students with a clear understanding of the different architectures and mining techniques.
Cognitive skills (thinking and analysis)
1. Introduce students to the role and function of Data Warehouse and Data Mining.
2. Understand the theoretical background of data mining tasks and techniques.
Communication skills (personal and academic)
1. Explain the stages and processes of different data mining techniques.
2. Be able to work effectively with others, and to carry out projects in groups.
Practical and subject specific skills (Transferable Skills)
1. To learn mining and warehouse techniques through the use of different tools (e.g. ORACLE, WEKA).
2. To learn the evaluation techniques of data mining and data warehouse.

Assessment instruments
Short reports and/or presentations, and/or short research projects
Quizzes; homework
Final examination: 50 marks

Allocation of Marks
Assessment Instruments                                       Mark
First examination                                            15
Second examination                                           15
Final examination                                            50
Reports, research projects, quizzes, homework, projects      20
Total                                                        100

Documentation and academic honesty
Documentation style (with illustrative examples)
Protection by copyright
Avoiding plagiarism.

Course/module academic calendar

Week: (1) (2) (3) (4) (5) (6) First examination (7) (8) (9) (10) (11) Second examination (12) (13) (14) (15) Specimen examination (16) Final Exam

Basic and support material to be covered:
Course Overview: Introduction; Knowledge discovery process; Why data warehouse & data mining
Data Warehouse: Why data warehouse?; OLTP and OLAP; Data Cube; Data Warehouse modeling; Warehouse views; Data Warehouse Architectures; Data Warehouse implementation
Data preprocessing: Why preprocess the data?; Data cleaning; Data integration and transformation
Data reduction; Dimensionality reduction; Data compression
Feature extraction; Discretization and concept hierarchy generation
Applications on Data Warehouse; Case Study
What is data mining; Motivation and challenges of data mining; Data mining tasks; Types of data; Data set types
Data mining applications
Data quality; Data preprocessing: Aggregation, sampling, dimensionality reduction, feature selection, feature creation, discretisation, transformation; Measuring the similarity and dissimilarity between: simple attributes, data objects
Tutorials: WEKA
Proximity measures; Issues in proximity calculation
Tutorial: Exploring the IRIS data set
Data Mining Techniques: Mining association rules: Association rule mining; Apriori algorithm
Frequent Pattern Growth algorithm; Rule based Classification
What is classification: Decision trees: ID3, C4.5; Rule induction: RIPPER algorithm
Tutorial: WEKA
Data Mining Techniques: Rule based Classification
Associative classification (CBA, MMAC)
Rule Pruning: REP, database coverage
Data Mining Techniques: Statistical classification: Naïve Bayes; Issues in Classification: Overfitting and cross-validation; Evaluation methods in Classification
Tutorial
Data Mining Techniques: Other classification approaches: Regression; Neural networks; Genetic algorithms
Cluster analysis: Partitioning methods (K-means); Hierarchical methods (BIRCH and CURE)
Outlier analysis: Preliminaries;
Statistical approaches; Density-based methods
Tutorial 8: WEKA
Case Study: Text Categorization
Tutorial: Using associative classification for text categorization
Review questions and Final Exam

Homework/reports and their due dates
Assignment 1
Assignment 2

Expected workload:
On average, students need to spend 2 hours of study and preparation for each 50-minute lecture/tutorial.

Attendance policy:
Absence from lectures and/or tutorials shall not exceed 15%. Students who exceed the 15% limit without a medical or emergency excuse acceptable to and approved by the Dean of the relevant college/faculty shall not be allowed to take the final examination and shall receive a mark of zero for the course. If the excuse is approved by the Dean, the student shall be considered to have withdrawn from the course.

Module references
Books
1. Connolly, T., and Begg, C. Database Systems: A Practical Approach to Design, Implementation, and Management, Addison Wesley, fourth edition, 1999.
2. Witten, I., and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann, 2001.
Journals
Websites
3. WEKA
4. Oracle data warehousing package