Code: IS53023B
Name: Data Mining
Level: 6
Credits: 15
Prerequisites:
Aims
The course introduces machine learning techniques used to discover
knowledge or hidden patterns in potentially large volumes of data.
Practical data mining is introduced through assignments consisting of theoretical
exercises, algorithm implementation, and use of a data mining suite for
knowledge discovery. The module also offers pointers towards new
developments in the field, such as multimedia data mining, in particular text, web,
and music data mining.
Learning Outcomes
On successful completion of this module, students will have demonstrated the ability to:
Subject-related Knowledge
K1. Explain the various phases of knowledge discovery in data
K2. Describe and explain representative machine learning and statistical techniques
that provide data mining algorithms for supervised learning, unsupervised clustering
and association rule learning
K3. Explain the methods of performance evaluation for the classification,
estimation, prediction, clustering and association analysis models
K4. Compare and contrast predictive data mining models in order to select and
apply the optimal ones
Subject-related Skills
S1. Construct programs that implement data mining algorithms in a programming
language such as Java or a scripting language such as PHP
S2. Develop complete processes of knowledge discovery in data utilising a data
mining suite
S3. Apply data mining algorithms in various areas such as fraud detection, banking,
estate value and product quality estimation, medical diagnosis and research, and
consumer analytics
Transferable Skills
T1. Construct predictive tools involved in decision support systems
T2. Construct document categorization tools involved in web page
recommendation systems and email spam detection systems, by utilising data
mining strategies such as classification, unsupervised clustering and association
analysis
T3. Construct classification tools involved in music and image retrieval systems,
utilising data mining strategies such as supervised learning and unsupervised
clustering
Syllabus
1. Motivation for data mining.
2. Data mining strategies: supervised learning, unsupervised clustering and
association analysis.
3. Representative machine learning and statistical algorithms for data
mining: classification, estimation and prediction algorithms for building
models such as decision trees, neural networks, logistic and linear regression,
regression trees and Bayesian classifiers; segmentation algorithms based on
agglomerative hierarchical clustering, centroid-based clustering (k-means) and
self-organising maps; and market basket analysis algorithms.
4. Evaluation of performance of predictive models.
5. Knowledge discovery in data processes.
6. Data pre-processing: treatment of missing values, data normalisation, data
sampling, selection of the best predictive attributes.
7. Data warehousing.
8. Case studies using implemented data mining algorithms and a data
mining suite, supporting [S1-S3].
9. Trends in data mining related to [T2-T3].
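Syllabus items 3 and 6 can be illustrated with a minimal sketch that combines mean imputation of missing values, min-max normalisation, and k-means clustering. It is written in Python rather than the Java or PHP named in S1, and the function names, the sample data, and the choice to initialise centroids with the first k points are all illustrative assumptions, not taken from the module materials.

```python
from math import dist
from statistics import mean


def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def min_max_normalise(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def k_means(points, k, iterations=100):
    """Basic k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-attribute data set with one missing value.
raw = [[1.0, 2.0], [9.0, 8.0], [1.5, 1.0], [8.0, None], [2.0, 1.5], [8.5, 9.0]]
imputed = impute_mean(raw)
normalised = min_max_normalise(imputed)
labels, centroids = k_means(normalised, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Normalising before clustering keeps one attribute with a large numeric range from dominating the Euclidean distance. A data mining suite would normally supply more robust initialisation and stopping rules than this sketch does.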
Main texts:
• Richard Roiger and Michael Geatz, “Data Mining: A Tutorial-Based Primer”,
Addison Wesley, 2003
• Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”,
Morgan Kaufmann, 2006
Recommended titles:
• Ian Witten and Eibe Frank, “Data Mining: Practical Machine Learning Tools and
Techniques”, Morgan Kaufmann, 2005
• Margaret Dunham, “Data Mining: Introductory and Advanced Topics”, Prentice
Hall, 2002
• Sheldon Ross, “Introductory Statistics”, Elsevier Academic Press, 2005
• Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, “Introduction to Data
Mining”, Addison Wesley, 2005
Assessment
Assignments
A1: Specific exercises supporting lecture material [unassessed].
A2: Homework consisting of optional further reading on specific themes [unassessed].
A3: Practical Coursework: Combination of exercises and practical implementation of
data mining processes based on the techniques introduced [assessed].
Worth 20% of overall mark.
Examination
E1: Unseen examination of 2 hr 15 min, answering 3 questions out of 5, testing
theoretical elements of the syllabus. Worth 80% of overall mark.
Resource/Timetabling
2 hours lecture, 1 hour lab