COURSE SPECIFICATION
NAME OF COURSE: Data Mining Techniques and Applications
STATUS (main, optional, Free Choice): Main
LEVEL (F, A, P, 1, 2, 3, M): 3
COURSE CODE: CS455 / IS455
UNIT VALUE: 4
Department offering course: Computer Science, Information Systems
Course Co-ordinator:
Degree Programmes in which to be offered: Information Systems, Computer Science
Pre-requisites: IS240/CS240 Databases
Indicate whether a new course or name of course being replaced: New
TERMS TAUGHT: Spring
Date of course commencement: February
Total Contact Hours: 75
Lectures/Tutorials: 45
Practicals: 30
AIMS OF THE COURSE:
The overall aim of this course is to introduce students to modern data mining techniques and their use in business and other application areas. In particular, the course explores the basic concepts, principles and techniques of data mining, online analytic processing and data warehousing, with emphasis on both technical and practical issues. The course gives students an understanding of how to evaluate and compare data mining solutions so that they can be used effectively in practice. It also equips students with hands-on experience and skills in conducting a data mining project using a data mining software tool and/or constructing a data warehouse.
SSST Course Specification
March 2012
(Page 1)
INTENDED LEARNING OUTCOMES
By the end of the course the students will be able to:
1. Understand basic concepts and principles of data mining and data warehousing
2. Understand state-of-the-art approaches and techniques in data mining and visualisation
3. Develop a working application using a commercial data mining/data warehousing software tool
LEARNING AND TEACHING STRATEGIES TO BE USED:
1. Lectures introduce theoretical and conceptual materials from the recommended textbooks
2. Tutorials explore the application of theory and concepts
3. Laboratory sessions provide exercises to apply the theory and use data mining/warehousing software tools
4. Group project enables students to develop team-work skills and apply what they have learnt in the course to a practical problem
ASSESSMENT CRITERIA (SHOULD LINK EXPLICITLY TO INTENDED LEARNING OUTCOMES):
1. Four major topic tests (40%)
2. Written final exam (50%)
3. Group project (10%)
TRANSFERABLE SKILLS AND OTHER ATTRIBUTES
1. Oral and written presentation: ability to express ideas clearly and precisely
2. Critical thinking: ability to analyse data in databases and evaluate possible solutions for real life applications
3. Team work: ability to collaborate in applications development projects
4. Group discussions: ability to participate in group discussions on a given subject
LEARNING AND TEACHING STRATEGIES USED:
1. Personal and team projects to read, evaluate and present knowledge.
2. Practical classes and application on real life data
ASSESSMENT CRITERIA (SHOULD LINK EXPLICITLY TO INTENDED LEARNING OUTCOMES):
1. Projects
2. Projects and examinations
3. Projects
4. Practical classes
COURSE OUTLINE/SYLLABUS:
• Introduction: Concept of data mining. Data mining and KDD. Data mining process. Major data mining tasks. Data mining approaches. Overview of data mining solutions. Importance of evaluation. Evolution of data mining. Promises and challenges.
• Understanding Data: Data and data sets. Data types. Data quality. Data pre-processing. Data summarisation and visualisation.
• Data Mining Techniques I: Problem of cluster detection. Proximity measures. Basic clustering methods: K-means and Agglomeration. Validation and evaluation of clusters. Overview of other types of clustering methods. Clustering in practice.
• Data Mining Techniques II: Problem of classification. Decision tree induction approach: ID3 and other tree induction solutions. Nearest neighbour approach: kNN and PEBLS methods. Statistical approach: Naïve Bayes method. Overview of other classification approaches. Evaluation of classifiers. Problem of overfitting and solutions. Classification in practice.
• Data Mining Techniques III: Problem of association rule discovery. Apriori algorithms for Boolean, generalised and quantitative association rules. Evaluation of association rules. Other types of association rules. Association rule discovery in practice.
• Data Mining Projects: Data mining project life cycle. The industry standard: CRISP-DM guideline for data mining. A case study on customer segmentation.
• Data Mining software tool WEKA: Overview of WEKA functions. Data pre-processing in WEKA. WEKA visualisation facilities for data and patterns. WEKA data mining functions. WEKA evaluation parameters. Choosing the best data mining solutions in WEKA. Overview of data mining software tools.
• Data Warehousing: Goals and characteristics. Differences between data warehouse, data mart and database. Data warehouse architectures. Metadata and management. Data loading and integration. Various data warehousing technologies in industry. Design issues for building a data warehouse.
• Online Analytic Processing (OLAP): Concepts of multidimensional cubes. Hierarchies of data abstraction. Operations over multidimensional cubes. OLAP approaches. OLAP models and systems. Limitations and constraints of OLAP.
• Putting Everything Into Perspective: Database, data warehouse, query, OLAP, data mining and decision support. Data, information and knowledge in enterprises. Application areas of data mining. Ethical and professional issues regarding data mining.
KEY TEXTS AND/OR OTHER LEARNING MATERIALS:
Recommended Texts:
• Hongbo Du, Data Mining Techniques and Applications, ISBN 9781844808915.
• Marakas G.M., Modern Data Warehousing, Mining, and Visualization: Core Concepts, Prentice Hall, 2003, ISBN-10: 0131014595.
Additional Reading:
• Rob P. and Coronel C., Database Systems: Design, Implementation and Management, Thomson Course Technology, 2004, ISBN 0-619-21323-X.
• Berry M.J.A. and Linoff G., Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 2nd ed., Wiley, 2004, ISBN 0471470643.