Philadelphia University
Faculty of Information Technology
Department of Computer Science
Second Semester, 2008/2009
Course Syllabus
Course Title: Data Warehousing and Data Mining
Course Code: 760463
Course Level:
Course Prerequisite(s) and/or Corequisite(s):
Lecture Time:
Credit Hours: 3
Academic Staff Specifics
Name: Dr. Fadi Fayez
Rank: Senior Lecturer
Office Number and Location: IT 7302
Tel. No.: +962-2-6374444, Ext. 513
Office Hours: Mon & Wed (11:00–13:00)
E-mail Address:
Course module description:
The module equips students with the knowledge and skills needed to design and
implement a data warehouse or a data mining algorithm using Oracle or another
appropriate tool or programming language. Students are expected to become familiar
with common data mining tasks and techniques, the principles of dimensional data
modeling, techniques for extracting data from source systems, data transformation
methods, data staging, and data warehouse architecture and infrastructure. Issues such
as data preprocessing, discretisation, rule pruning, cross-validation, inductive bias,
and prediction are also covered. Students will design and develop a simple data mining
prototype using the Oracle data mining package or another appropriate tool.
Course module objectives:
1- To provide students with an understanding of the concepts of data warehousing
and data mining
2- To study the dimensional modeling technique for designing a data warehouse
3- To study data warehouse architectures, OLAP, and the project-planning aspects of
building a data warehouse
4- To explain the knowledge discovery process
5- To describe the data mining tasks and study their well-known techniques
6- To develop an understanding of the role played by knowledge in a diverse range of
intelligent systems
7- To experiment with real data sets using popular data mining tools such as WEKA
Course/module components
Books (title, author(s), publisher, year of publication)
1. Title: Introduction to Data Mining
Author: Tan, P.-N., Steinbach, M., and Kumar, V.
Publisher: Addison Wesley, 2005
2. Title: Data Mining: Concepts and Techniques
Author: Han, J., and Kamber, M.
Publisher: Morgan Kaufmann, 2006
Software(s)
1. WEKA
2. Oracle data warehousing package
Teaching methods:
Duration: 16 weeks, 48 hours in total
Lectures: 32 hours (2 hours per week)
Tutorials: approximately 1 per week
Learning outcomes:
Knowledge and understanding
1. To provide a brief introduction to general issues of data warehousing and data mining.
2. To provide students with a clear understanding of the different architectures and mining
techniques.

Cognitive skills (thinking and analysis)
1. Introduce students to the role and function of data warehousing and data mining.
2. Understand the theoretical background of data mining tasks and techniques.

Communication skills (personal and academic)
1. Explain the stages and processes of different data mining techniques.
2. Work effectively with others and carry out projects in groups.

Practical and subject-specific skills (transferable skills)
1. Learn mining and warehousing techniques through the use of different tools (e.g.,
Oracle, WEKA).
2. Learn the evaluation techniques of data mining and data warehousing.
Assessment instruments
- Short reports and/or presentations, and/or short research projects
- Quizzes; homework
- Final examination: 50 marks

Allocation of Marks
Assessment Instrument                                        Mark
First examination                                              15
Second examination                                             15
Final examination                                              50
Reports, research projects, quizzes, homework, projects        20
Total                                                         100
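As a worked illustration of the allocation above, the following Python sketch adds up one student's component marks. Only the maxima (15, 15, 50, 20) come from the table; the component names (first_exam, second_exam, final_exam, coursework), the range check, and the sample marks are hypothetical, and the department's own grading procedure governs.

```python
# Maximum marks per assessment instrument, taken from the "Allocation of Marks" table.
# The dictionary keys are hypothetical names chosen for this sketch.
MAX_MARKS = {
    "first_exam": 15,
    "second_exam": 15,
    "final_exam": 50,
    "coursework": 20,  # reports, research projects, quizzes, homework, projects
}


def course_total(scores: dict[str, float]) -> float:
    """Sum a student's component marks after checking each against its maximum."""
    if set(scores) != set(MAX_MARKS):
        raise ValueError(f"expected exactly these components: {sorted(MAX_MARKS)}")
    for name, mark in scores.items():
        if not 0 <= mark <= MAX_MARKS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_MARKS[name]}, got {mark}")
    return sum(scores.values())


# The four instruments together account for the full 100 marks.
assert sum(MAX_MARKS.values()) == 100

# Example with made-up marks for one student.
print(course_total({"first_exam": 12, "second_exam": 10, "final_exam": 38, "coursework": 17}))
```

With the sample marks shown, the total is 77 out of 100; a mark above a component's maximum is rejected rather than silently added.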
Documentation and academic honesty
- Documentation style (with illustrative examples)
- Protection by copyright
- Avoiding plagiarism
Course/module academic calendar
Basic and support material to be covered
Week column (in order): (1), (2), (3), (4), (5), (6), First examination, (7), (8), (9), (10),
(11), Second examination, (12), (13), (14), (15), Specimen examination, (16), Final Exam
Topics (in the order listed):
- Course Overview: Introduction; Knowledge discovery process; Why data warehouse & data mining
- Data Warehouse: Why data warehouse?; OLTP and OLAP; Data Cube; Data Warehouse modeling;
  Warehouse views; Data Warehouse Architectures; Data Warehouse implementation
- Data preprocessing: Why preprocess the data?; Data cleaning; Data integration and transformation
- Data reduction; Dimensionality reduction; Data compression
- Feature extraction; Discretization and concept hierarchy generation
- Applications on Data Warehouse; Case Study
- What is data mining?; Motivation and challenges of data mining; Data mining tasks;
  Types of data; Data set types
- Data mining applications
- Data quality; Data preprocessing: aggregation, sampling, dimensionality reduction, feature
  selection, feature creation, discretisation, transformation; Measuring the similarity and
  dissimilarity between simple attributes and between data objects
- Tutorial: WEKA
- Proximity measures; Issues in proximity calculation
- Tutorial: Exploring the IRIS data set
- Data Mining Techniques: Mining association rules: Association rule mining; Apriori algorithm
- Frequent Pattern Growth algorithm; Rule-based classification
- What is classification?; Decision trees: ID3, C4.5; Rule induction: RIPPER algorithm
- Tutorial: WEKA
- Data Mining Techniques: Rule-based classification: Associative classification (CBA, MMAC);
  Rule pruning: REP, database coverage
- Data Mining Techniques: Statistical classification: Naïve Bayes; Issues in classification:
  Overfitting and cross-validation; Evaluation methods in classification
- Tutorial
- Data Mining Techniques: Other classification approaches: Regression; Neural networks;
  Genetic algorithms
- Cluster analysis: Partitioning methods (K-means); Hierarchical methods (BIRCH and CURE)
- Outlier analysis: Preliminaries; Statistical approaches; Density-based methods;
  Tutorial 8: WEKA
- Case Study: Text Categorization
- Tutorial: Using associative classification for text categorization
- Review questions and Final Exam
Homework/reports and their due dates
- Assignment 1
- Assignment 2
Expected workload:
On average, students need to spend 2 hours of study and preparation for each 50-minute
lecture or tutorial.
Attendance policy:
Absence from lectures and/or tutorials shall not exceed 15%. Students who exceed the 15%
limit without a medical or emergency excuse acceptable to and approved by the Dean of
the relevant college/faculty shall not be allowed to take the final examination and shall
receive a mark of zero for the course. If the excuse is approved by the Dean, the student
shall be considered to have withdrawn from the course.
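The 15% limit can be turned into a concrete number of sessions. This Python sketch assumes, following the Teaching methods section, about 48 contact sessions (32 lecture hours plus roughly one tutorial per week for 16 weeks) and that absences are counted per session; the session counts and the helper names max_allowed_absences and exceeds_limit are assumptions for illustration, and the approved count is the one applied by the faculty.

```python
# Assumed contact sessions: 32 lecture hours plus about one tutorial per week
# for 16 weeks (the syllabus gives the tutorial count only approximately).
LECTURE_SESSIONS = 32
TUTORIAL_SESSIONS = 16
ABSENCE_LIMIT_PERCENT = 15


def max_allowed_absences(total_sessions: int, limit_percent: int = ABSENCE_LIMIT_PERCENT) -> int:
    """Largest whole number of missed sessions that does not exceed the limit."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return total_sessions * limit_percent // 100


def exceeds_limit(absences: int, total_sessions: int) -> bool:
    """True when absences are more than 15% of the total sessions."""
    return absences * 100 > total_sessions * ABSENCE_LIMIT_PERCENT


total = LECTURE_SESSIONS + TUTORIAL_SESSIONS
print(max_allowed_absences(total))
print(exceeds_limit(7, total), exceeds_limit(8, total))
```

Under these assumptions, 15% of 48 sessions is 7.2, so a student may miss at most 7 sessions; an eighth absence exceeds the limit.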
Module references
Books
1. Connolly, T., and Begg, C. Database Systems: A Practical Approach to Design,
Implementation, and Management. Addison Wesley, fourth edition, 1999.
2. Witten, I., and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques
with Java Implementations. San Francisco: Morgan Kaufmann, 2001.
Journals
Websites
3. WEKA
4. Oracle data warehousing package