Philadelphia University
Faculty of Information Technology
Department of Computer Information Systems
First semester, 2008/2009
Course Syllabus
Course Title: Data Mining and Data Warehousing
Course code: 760463
Course Level: 4
Course prerequisite(s) and/or corequisite(s):
Lecture Time:
Credit hours: 3
Academic Staff Specifics
Name: Dr. Fadi Fayez
Rank: Senior Lecturer
Office Number and Location: 7302
Tel. No.: +962-2-6374444, Ext: 513
Office Hours: Mon & Wed (11:00 – 13:00)
E-mail Address:
Course/module description:
The module equips students with the knowledge and skills necessary to design and
implement a data warehouse or a data mining algorithm using Oracle or any other
appropriate programming language. Students are expected to become familiar with
the common data mining tasks and techniques, principles of dimensional data
modeling, techniques for extraction of data from source systems, data transformation
methods, data staging, data warehouse architecture and infrastructure. Issues such as
preprocessing the data, discretisation, rule pruning, cross validation, inductive bias,
and prediction are included. Students will design and develop a simple data mining
prototype using the Oracle data mining package or any other appropriate tool.
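As an illustration of two of the issues named above, discretisation and cross validation, the following is a minimal Python sketch. It is not part of the module materials, which use WEKA and Oracle. The function names equal_width_bins and k_fold_indices are hypothetical, and equal-width binning is only one of several possible discretisation schemes.

```python
# Illustrative sketch only: the names below are hypothetical and are not
# taken from the course materials, WEKA, or Oracle.

def equal_width_bins(values, n_bins):
    """Discretise numeric values into n_bins equal-width intervals.

    Returns one bin index (0 .. n_bins - 1) per input value.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would otherwise land in bin n_bins, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k-fold cross validation.

    Every item appears in exactly one test fold; fold sizes differ by at most 1.
    """
    indices = list(range(n_items))
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Usage: discretise a made-up numeric attribute, then list the folds.
bins = equal_width_bins([1.0, 2.0, 3.0, 4.0, 5.0], 2)
for train, test in k_fold_indices(5, 2):
    print("train:", train, "test:", test)
```

In practice the data would normally be shuffled before the folds are formed; the sketch keeps the original order so that the split is easy to follow.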
Course/module objectives:
1. To provide the student with an understanding of the concepts of data warehousing
and data mining
2. To study the dimensional modeling technique for designing a data warehouse
3. To study data warehouse architectures, OLAP and the project planning aspects in
building a data warehouse
4. To explain the knowledge discovery process
5. To describe the data mining tasks and study their well-known techniques
6. To develop an understanding of the role played by knowledge in a diverse range
of intelligent systems.
7. To test real data sets using popular data mining tools such as WEKA
Course/module components
Books (title, author(s), publisher, year of publication)
1. Tan, P-N., Steinbach, M., Kumar, V. Introduction to Data Mining. Addison Wesley, 2005.
2. Han, J. and Kamber, M. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.
Software(s)
1. WEKA
2. Oracle data warehousing package
Teaching methods:
Duration: 16 weeks, 48 hours in total
Lectures: 32 hours (2 hours per week)
Tutorials: Approximately 1 per week
Learning Outcomes:
• Knowledge and understanding
1. To provide a brief introduction to general issues of Data Warehouse and Data Mining.
2. To understand different architectures and mining techniques.
• Cognitive skills (thinking and analysis)
1. Introduce students to the role and function of Data Warehouse and Data Mining.
2. Understand the theoretical background of data mining tasks and techniques.
• Communication skills (personal and academic)
1. Explain the stages and processes of different data mining techniques.
2. Be able to work effectively with others, and to carry out projects in groups.
• Practical and subject specific skills (Transferable Skills)
1. To learn mining and warehouse techniques through the use of different tools (e.g. ORACLE, WEKA).
2. To learn the evaluation techniques of data mining and data warehouse.
Assessment instruments
• Short reports and/or presentations, and/or short research projects
• Quizzes, homework
• Final examination: 50 marks
Allocation of Marks
Assessment Instruments                                        Mark
First examination                                             20%
Second examination                                            20%
Final examination: 50 marks                                   50%
Reports, research projects, quizzes, homework, projects       10%
Total                                                         100%
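The allocation above amounts to a weighted sum of the four assessment components. A minimal Python sketch, assuming each component is scored as a fraction between 0 and 1: the weights come from the allocation of marks, while the component names and the example scores are invented for illustration.

```python
# Weights are taken from the allocation of marks; the dictionary keys and the
# example scores are hypothetical and serve only as an illustration.
WEIGHTS = {
    "first_exam": 20,
    "second_exam": 20,
    "final_exam": 50,
    "coursework": 10,  # reports, research projects, quizzes, homework, projects
}


def total_mark(fractions):
    """Combine per-component scores (fractions in [0, 1]) into a mark out of 100."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Made-up student: 75% and 50% on the two examinations, 50% on the final,
# full marks on coursework.
example = {"first_exam": 0.75, "second_exam": 0.5, "final_exam": 0.5, "coursework": 1.0}
print(total_mark(example))  # 15 + 10 + 25 + 10 = 60.0
```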
Course/module academic calendar
Table columns: Week; Basic and support material to be covered; Homework/reports and their due dates

Week
(1) to (5)
(6) First examination
(7) to (10)
(11) Second examination
(12) to (14)
(15) Specimen examination (Optional)
(16) Final Examination

Basic and support material to be covered (in teaching order)
Course Overview
• Course Introduction
• Knowledge discovery process
• Why data mining
Introduction
• What is data mining
• Motivation and challenges of data mining
• Data mining tasks
• Types of Data
• Data set types
• Data mining applications
Tutorial 1
Data quality
Data preprocessing:
• Aggregation, sampling, dimensionality reduction, feature selection, feature creation, discretisation, transformation
Measuring the similarity and dissimilarity between:
• Simple attributes, data objects
Tutorials 2, 3: WEKA
Proximity measures
Issues in proximity calculation
Tutorial 4: Exploring the IRIS data set
Data Mining Techniques
Mining association rules
• Association rule mining
• Apriori algorithm
• Frequent Pattern Growth algorithm
Rule based Classification
• What is classification
• Decision trees: ID3, C4.5
• Rule induction: RIPPER algorithm
Tutorials 5, 6: WEKA
Data Mining Techniques
Rule based Classification
• Associative classification (CBA, MMAC)
Rule Pruning: REP, database coverage
Data Mining Techniques
• Statistical classification: Naïve Bayes
• Issues in Classification: Overfitting and cross-validation
• Evaluation methods in Classification
Tutorial 7: Assignment 1
Data Mining Techniques
Other classification approaches
• Regression
• Neural networks
• Genetic algorithms
Cluster analysis
• Partitioning methods (K-means)
• Hierarchical methods (BIRCH and CURE)
Outlier analysis
• Preliminaries
• Statistical approaches
• Density-based methods
Tutorial 8: WEKA
Case Study: Text Categorisation
Tutorial 9: Using associative classification for text categorisation
Data Warehousing
• Why data warehouse?
• Basic concepts related to data warehousing
• OLTP and OLAP
• Data Cube
• Data Warehouse implementation
Tutorial 10: review questions
Data Warehousing
• Data Warehouse modeling
• Warehouse views
• Data Warehouse Architectures
Tutorial 11: review questions
Applications on Data Warehouse
Case Study
Tutorial 12: Data warehousing in Oracle
Data Warehousing
Tutorial 13: Data warehousing in Oracle
Review and Final Exam
Expected workload:
On average students need to spend 2 hours of study and preparation for each 50-minute
lecture/tutorial.
Attendance policy:
Absence from lectures and/or tutorials shall not exceed 15%. Students who exceed the 15%
limit without a medical or emergency excuse acceptable to and approved by the Dean of
the relevant college/faculty shall not be allowed to take the final examination and shall
receive a mark of zero for the course. If the excuse is approved by the Dean, the student
shall be considered to have withdrawn from the course.
Module references
Books
1. Connolly, T. and Begg, C. Database Systems: A Practical Approach to Design, Implementation, and Management. Addison Wesley, fourth edition, 1999.
2. Witten, I. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann, 2001.
Journals
Websites
1. ---- WEKA
2. Oracle data warehousing package