Data Mining, Part 1
Published on AIT-Budapest’s web site (http://www.ait-budapest.com)
Course Title:
Data Mining 1: Models and Algorithms
Instructors:
András A. Benczúr, Róbert Pálovics, Eszter Friedman, Ancsa Hannák, Johannes Wachs
Duration:
Weeks 1-7, 2x2 hours, 2 credits
Short Description of the Course:
Harvard Business Review has called data scientist "the sexiest job of the 21st century." In the first part of the course,
we learn the basics of understanding data and predicting its unknown properties.
We give a general introduction to data analysis, modeling, and algorithms of data mining. The course provides a
good base for its follow-up course, Data Mining Applications. Lectures are supplemented by computer exercises
and student projects in small teams.
Aim of the Course:
The aim of the course is to provide a basic but comprehensive introduction to data mining. By the end of the
course, students will be able to choose the right algorithms for data science problems and to build, implement, and
evaluate data mining models.
Prerequisites:
The course requires basic knowledge of calculus, probability theory, and linear algebra. Knowledge of graphs and
basic algorithms is an advantage. Basic programming skills are also required.
Detailed Program and Class Schedule:
Motivations for data mining. Examples of application domains.
Analyzing data: preparation and exploration.
Models and algorithms for classification.
Introduction to the IPython Notebook and Python-based data mining software packages. Classification with
scikit-learn.
Basics of classification. Concepts of training and prediction. Measuring the quality of classification models
and comparing them.
Types of variables, measuring similarity and distances. The k-nearest neighbor classifier.
Decision trees, naive Bayes. The concepts of model overfitting and underfitting. Midterm test.
Basics of cluster analysis. Partitioning clustering algorithms, k-means, k-medoids.
Hierarchical clustering algorithms.
Introduction to frequent itemset mining. Applications for finding association rules. Level-wise algorithms,
Apriori.
Final test.
Method of instruction:
Handouts, presentations, IPython Notebooks, relevant research papers, web page, course mailing list, and wiki.
Regular weekly office hours for consultations.
Textbooks:
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Addison-Wesley, 2006.
Jure Leskovec, Anand Rajaraman, Jeff Ullman: Mining of Massive Datasets, http://www.mmds.org/.
Instructors’ Bios:
András Benczúr (born 1969) is the head of the Big Data Lab at the Institute for Computer Science and Control of
the Hungarian Academy of Sciences (MTA SZTAKI).
He received his Ph.D. degree from MIT (USA) in 1997. His primary research areas are information retrieval, data mining
and algorithms. He won a "Yahoo! Faculty Research Grant" in 2006. Benczúr's group won 1st place at the KDD
Cup of the ACM in 2007 and 2nd place at RecSys Challenge 2014.
He is the author or co-author of more than 50 refereed research papers with over 500 citations. He has served as
coordinator and/or principal researcher of several national and international information retrieval and data mining
projects, including the collaboration with TU Berlin to develop Apache Flink.
Johannes Wachs is a PhD student at Central European University's Center for Network Science. His master's degree is in
mathematics, and he has also worked in finance. He studies public contracting markets using network methods. He
is interested in patterns that emerge when actors are corrupt. He is also affiliated with the Government
Transparency Institute, a non-profit that researches corruption in government using quantitative methods. He was
born in Germany, grew up in the US, and has been living in Hungary since 2009.