Data Mining, Part 2
Published on AIT-Budapest’s web site (http://www.ait-budapest.com)
Course Title:
Data Mining 2: Applications
Instructors:
András A. Benczúr, Róbert Pálovics, Eszter Friedman, Ancsa Hannák, Johannes Wachs
Duration:
Weeks 8-14, 2x2 hours, 2 credits
Short Description of the Course:
"What data scientists do is make discoveries while swimming in data", as described by the Harvard Business
Review. In the second part of the course, we learn advanced techniques including kernel methods, recommender
systems, and network centrality, and get an introduction to Big Data tools such as Hadoop. During the course,
we will have guest lectures by data scientists from companies in the Budapest area. Students will have the option
to define their own data mining projects and work in teams during the semester.
Aim of the Course:
The aim of the course is to discuss advanced data mining techniques, together with useful knowledge from related
disciplines, in support of real-world data mining projects, especially in bioinformatics. By the end of the course,
students will be able to analyze biological (genomic, microarray, pathway, protein, chemical) data sets using complex
data mining methods.
Prerequisites:
The course requires basic knowledge of data mining (see also the course Data Mining: Models and Algorithms).
Background in probability theory, linear algebra and programming is important.
Detailed Program and Class Schedule:
Advanced classification methods: Bagging, boosting, AdaBoost.
More models and algorithms for classification: neural networks, linear separation methods, support vector machines (SVM).
Random forest.
Recommender systems. Collaborative filtering. Implicit and explicit recommendation.
Dimensionality reduction by spectral methods, singular value decomposition, low-rank approximation.
Search engines, web information retrieval, PageRank and network mining.
Distributed data processing systems, data processing with Hadoop.
Text mining, natural language processing.
Selected topics connected to student projects (e.g. mining biological, scientific, and social media data).
Final test.
Method of Instruction:
Handouts, presentations, IPython Notebooks, relevant research papers, web page, course mailing list and Wiki.
Regular weekly office hours for consultations.
Textbooks:
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Addison-Wesley, 2006.
Jure Leskovec, Anand Rajaraman, Jeff Ullman: Mining of Massive Datasets
http://www.mmds.org/
Instructors’ Bios:
András Benczúr (born 1969) is the head of the Big Data Lab at the Institute for Computer Science and Control of
the Hungarian Academy of Sciences (MTA SZTAKI).
He received his Ph.D. from MIT (US) in 1997. His primary research areas are information retrieval, data mining
and algorithms. He won a "Yahoo! Faculty Research Grant" in 2006. Benczúr's group won 1st place at the KDD
Cup of the ACM in 2007 and 2nd place at RecSys Challenge 2014.
He is the author or co-author of more than 50 refereed research papers with over 500 citations. He has served as
coordinator and/or principal researcher of several national and international information retrieval and data mining
projects, including the collaboration with TU Berlin to develop Apache Flink.
Johannes Wachs is a PhD student at Central European University's Center for Network Science. His master's degree is in
mathematics and he has also worked in finance. He studies public contracting markets using network methods. He
is interested in patterns that emerge when actors are corrupt. He is also affiliated with the Government
Transparency Institute, a non-profit that researches corruption in government using quantitative methods. He was
born in Germany, grew up in the US, and has been living in Hungary since 2009.