CS 591.003

Title: Introduction to Data Mining
Instructor: Abdullah Mueen
Time: 12:30 pm - 1:45 pm
Room: Centennial Engineering Center B146A
Office Hours: Tuesday and Thursday, 10:00 am - 12:00 pm

Description: This course covers data mining topics from the basic to the advanced level. Topics include data cleaning, clustering, classification, outlier detection, association rule discovery, tools and technologies for data mining, and algorithms for mining complex data such as graphs, text and sequences. Students will work on a data mining project to gain hands-on experience. The course learning objectives include:
• Learning basic data mining algorithms and their applications
• Learning about the tools and technologies available for analyzing various types of data
• Gaining hands-on experience in cleaning, managing and processing complex data

Book: Data Mining: Concepts and Techniques, 3rd ed., by Jiawei Han, Micheline Kamber and Jian Pei. The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, July 2011. ISBN 978-0123814791.

Grading: There will be two exams: a midterm on the topics from weeks 1-7 and a final on the remainder of the topics. The exams are worth 25% each. Students will pick group projects and apply mining algorithms; the project is worth 20%. There will be three to five homework sets, together worth 10% of the grade. There will be four assignments worth 5% each. Homework will focus on understanding the algorithms and techniques. The assignments will apply different data mining techniques to real data selected by the instructor.

Lecture Schedule: A tentative weekly distribution of topics is given below. The schedule will be rearranged for holidays and exams.
Week 1: Ch. 1, 2: What is Data Mining? Types of Data.
Week 2: Ch. 3: Data Preprocessing. Cleaning, Integration, Reduction and Transformation
Week 3: Ch. 6, 7: Mining Frequent Patterns (FP), Associations and Correlations. Apriori, FP Tree
Week 4: Ch. 8: Basic Classification: Decision Tree, Bayes Classifier, Rule Based, Goodness measures
Week 5: Ch. 8, 9: Advanced Classification: Boosting, Bagging, Random Forest, Lazy Learners, FP based classification
Week 6: Ch. 10: Basic Clustering: Hierarchical, Partitioning, Density-based, Grid-based
Week 7: Ch. 11: Advanced Clustering: Subspace clustering, Co-clustering, Fuzzy clustering, Expectation-Maximization clustering
Week 8: Ch. 12: Outlier Detection: Statistical and Proximity based methods
Week 9: Ch. 13: Mining Complex Data Types: Sequences (real and discrete)
Week 10: Mining Complex Data Types: Graphs and Trees
Week 11: Mining Complex Data Types: Text, Logs, Reviews
Week 12: Ch. 4, 5: Data Mining Systems: Data warehousing, Data cubing, Business Intelligence systems
Week 13: Data Mining Tools: Weka, Vowpal-wabbit, Pivot-tables, Matlab Statistics Toolbox
Week 14: Web Mining: Web search, Computational advertising, User behavior modeling, Fraud detection

Project: Each group will do one project. A group can have at most two students; students in the CS 491 section can form groups of three. A project consists of two phases with equal weights.
1. Data Preprocessing and Cleaning: Each group will propose a data source or pick a dataset from a given list. Each group will propose data mining tasks, a set of algorithms/tools and success measures. Groups will clean the data for their projects and submit written proposals by the end of the 8th week.
2. Implementation and Presentation: Each group will implement the project and write up the methods and results in the final project report. The groups will present and demonstrate their projects in class or in a poster session.

Exam: The exams will be comprehensive, covering basic concepts. The questions will deterministically test the student's knowledge of the algorithms.