Title of Module: AIM-3 / SDADM
Advanced Information Management III – Scalable Data Analysis and Data Mining
CP (ECTS): 6
Short Name: MINF-SE-DIMA-AIM3-SDADM.S12
Person Responsible for Module: Prof. Dr. Volker Markl
Secr.: EN 7
e-mail address: [email protected]
Module Description
1. Qualification Aims
Recent advances in technology have led to a rapid growth of big data, which in turn creates the need
for cost-efficient and scalable analysis algorithms. In this course, concepts for the scalable analysis of big data
sets will be presented and applied using open source technologies. Participants of this module will
gain an in-depth understanding of concepts and methods as well as practical experience in the area
of scalable data analysis and data mining.
The course is principally designed to impart:
technical skills 50%, method skills 30%, system skills 10%, social skills 10%
2. Content
The focus of this module is to become familiar with different parallel processing platforms and
paradigms and to understand their suitability for different kinds of data mining problems. To this end,
students will learn how to adapt popular data mining and standard machine learning algorithms, such
as Naïve Bayes, K-Means clustering or PageRank, to scalable processing paradigms. They will
subsequently gain practical experience in implementing them on parallel processing platforms
such as Apache Hadoop, Stratosphere and Apache Giraph.
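As an illustration of what adapting an algorithm to a scalable processing paradigm can look like, the following minimal sketch expresses one K-Means iteration as a map phase, a grouping step and a reduce phase. It is plain, in-memory Python and not code for Apache Hadoop, Stratosphere or Apache Giraph; the function names (map_assign, shuffle, reduce_mean, kmeans_iteration) and the sample data are assumptions made for this example, not part of the course material.

```python
from collections import defaultdict


def map_assign(point, centroids):
    """Map phase: emit (index of nearest centroid, (point, 1))."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(range(len(centroids)),
                  key=lambda i: sq_dist(point, centroids[i]))
    return nearest, (point, 1)


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_mean(values):
    """Reduce phase: sum the (vector, count) pairs of one cluster and return the mean."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for vec, n in values:
        for d in range(dim):
            total[d] += vec[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One K-Means step; centroids without assigned points are kept unchanged."""
    groups = shuffle(map_assign(p, centroids) for p in points)
    return [reduce_mean(groups[i]) if i in groups else tuple(centroids[i])
            for i in range(len(centroids))]


# Hypothetical sample data: two obvious clusters in the plane.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)
```

The map step only needs one point and the current centroids, and the reduce step only needs the values of one cluster, which is what makes this formulation a candidate for parallel execution. On a real platform the grouping step would be performed by the framework rather than by a helper function.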
3. Module Components
Course Name: Advanced Information Management 3 – Scalable Data Analysis and Data Mining
Course Type: IV
Weekly hours per semester: 4
CP (acc. to ECTS): 6
Compulsory (C) / Compulsory Elective (CE): CE
Semester (WiSe / SoSe): SoSe
4. Description of Teaching and Learning Methods
This "integrated course" (Integrierte Veranstaltung, IV) consists of lectures on key concepts and
exercise sessions with smaller and larger exercises, in particular one complex task to be completed as
teamwork. This includes working through one of the key topics based on the students' own literature
research, giving a short presentation and developing an implementation. Active contribution to all parts
of the course is essential, as there will be a final presentation of the complex exercise by all members
of the course.
5. Prerequisites for Participation
Prerequisites: The material covered in the basic modules MPGI 1-5 of the Bachelor curriculum in
Computer Science at TU Berlin, including MPGI 5 ("Datenbanksysteme", Database Systems), as well as good
Java programming skills are required. A basic understanding of Probability and Statistics as well as
Linear Algebra is helpful.
The AIM-3 / SDADM course will be taught in English, so fluency in English is required.
6. Target Group of Module
This course addresses Master's students with a focus on database systems and information
management, after the first (Master's) term, in Computer Science (Major Field: System Engineering),
Computer Engineering (Major Fields: Information Systems and Software Engineering) and Industrial
Engineering. If capacity is sufficient, it is a compulsory elective module for all students. AIM-3 /
SDADM is also open to the remaining diploma students in the areas mentioned.
(Compulsory elective module in the Master's program in Computer Science, major field System Engineering;
in Computer Engineering, major fields Information Systems and Software Engineering; and in the
Master's program in Industrial Engineering (IuK program).)
7. Work Requirements and Credit Points
Course Type | Calculation | Hours
Plenary sessions of this integrated course | 15*4 | 60
Preparation / Consolidation (including literature work and seminar presentation) | | 60
Exercises / Practice | | 30
Final report & presentation (as preparation for oral exam) | | 30
Total | | 180
8. Module Examination and Grading Procedures
The grade is determined by an oral examination. To be admitted to this final exam, a participant must
complete all required tasks during the course: seminar work and active participation in home/lab
exercises, including the final report and presentation.
9. Duration of Module
The module can be completed within 1 semester.
10. Number of Participants
The lab capacity limits this course to max. 30 participants.
11. Enrolment Procedures
Students are required to register via the DIMA course registration tool before the start of the first
lecture (http://www.dima.tu-berlin.de/). Within the first six weeks after commencement of the lecture,
students will have to register for the course at QISPOS (university examination protocol tool) and ISIS
(course organization tool) in addition to the registration at the DIMA course registration tool.
12. Recommended Reading, Lecture Notes
Scriptum available in paper: no
Slides of the lecture: yes
Web page: http://www.dima.tu-berlin.de
Recommended Reading:
- Anand Rajaraman, Jeffrey David Ullman: Mining of Massive Datasets (Free Online:
http://infolab.stanford.edu/~ullman/mmds/book.pdf)
- Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques.
- Tom White: Hadoop: The Definitive Guide.
For each topic during this course additional research papers and reports will be used.
13. Other Information
This module is offered each summer term.