Title of Module: AIM-3 / SDADM Advanced Information Management III – Scalable Data Analysis and Data Mining
Person Responsible for Module: Prof. Dr. Volker Markl
Short Name: MINF-SE-DIMA-AIM3-SDADM.S12
CP (ECTS): 6
Secr.: EN 7
E-mail address: [email protected]

Module Description

1. Qualification Aims
Recent advances in technology have led to rapid growth of big data, creating a need for cost-efficient and scalable analysis algorithms. This course presents concepts for the scalable analysis of big data sets and applies them using open-source technologies. Participants will gain an in-depth understanding of concepts and methods as well as practical experience in scalable data analysis and data mining.
The course is principally designed to impart:
- technical skills: 50%
- method skills: 30%
- system skills: 10%
- social skills: 10%

2. Content
The focus of this module is to become familiar with different parallel processing platforms and paradigms and to understand their suitability for different kinds of data mining problems. To that end, students will learn how to adapt popular data mining and standard machine learning algorithms, such as Naïve Bayes, k-means clustering, and PageRank, to scalable processing paradigms. They will then gain practical experience in implementing these algorithms on parallel processing platforms such as Apache Hadoop, Stratosphere, and Apache Giraph.

3. Module Components
Course name: Advanced Information Management 3 – Scalable Data Analysis and Data Mining
Course type: IV (integrated course)
Weekly hours per semester: 4
CP (acc. to ECTS): 6
Compulsory (C) / Compulsory Elective (CE): CE
Semester (WiSe / SoSe): SoSe

4. Description of Teaching and Learning Methods
This "integrated course" (Integrierte Veranstaltung, IV) consists of lectures on key concepts and exercise sessions with smaller and larger exercises, in particular one complex task to be completed in teams.
This task includes working out one of the key topics through independent literature research, giving a short presentation, and developing an implementation. Active contribution to all parts of the course is essential, since all members of the course will take part in a final presentation of the complex exercise.

5. Prerequisites for Participation
Knowledge of the material covered in the basic modules MPGI 1–5 of the Bachelor's curriculum in Computer Science at TU Berlin, in particular MPGI 5 ("Datenbanksysteme", Database Systems), as well as good Java programming skills, is required. A basic understanding of probability and statistics as well as linear algebra is helpful. The AIM-3 / SDADM course is taught in English; fluency in English is therefore required.

6. Target Group of Module
This course addresses Master's students with a focus on database systems and information management who have completed their first Master's semester in Computer Science (major field: System Engineering), Computer Engineering (major fields: Information Systems and Software Engineering), and Industrial Engineering (if capacity is sufficient: compulsory elective module for all students). AIM-3 / SDADM is also open to the remaining Diplom students in these areas.
(Compulsory elective module in the Master's program in Computer Science (Informatik), specialization System Engineering; in Computer Engineering (Technische Informatik), specializations Information Systems and Software Engineering; and in the Master's program in Industrial Engineering (Wirtschaftsingenieurwesen), program IuK.)

7. Work Requirements and Credit Points
Course type: calculation = hours
- Plenary sessions of this integrated course: 15 × 4 = 60 hours
- Preparation / consolidation (including literature work and seminar presentation): 60 hours
- Exercises / practice: 30 hours
- Final report and presentation (as preparation for the oral exam): 30 hours
- Total: 180 hours

8. Module Examination and Grading Procedures
The grade is determined by an oral examination.
To be admitted to this final exam, participants must complete all required tasks during the course: seminar work and active participation in the home and lab exercises, including the final report and presentation.

9. Duration of Module
The module can be completed within one semester.

10. Number of Participants
Lab capacity limits this course to a maximum of 30 participants.

11. Enrolment Procedures
Students must register via the DIMA course registration tool (http://www.dima.tu-berlin.de/) before the first lecture. In addition, within the first six weeks after lectures begin, students must register for the course in QISPOS (the university examination protocol tool) and ISIS (the course organization tool).

12. Recommended Reading, Lecture Notes
Printed lecture notes (Scriptum): no
Lecture slides: yes
Web page: http://www.dima.tu-berlin.de
Recommended reading:
- Anand Rajaraman, Jeffrey David Ullman: Mining of Massive Datasets (free online: http://infolab.stanford.edu/~ullman/mmds/book.pdf)
- Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques
- Tom White: Hadoop: The Definitive Guide
For each topic, additional research papers and reports will be used.

13. Other Information
This module is offered every summer term.