UNM Computer Science

CS 591.003
Title: Introduction to Data Mining
Instructor: Abdullah Mueen
Time: 12:30 pm - 1:45 pm
Room: Centennial Engineering Center B146A
Office Hours: Tuesday and Thursday, 10:00 am - 12:00 pm
Description: This course covers data mining topics from the basic to the advanced level.
Topics include data cleaning, clustering, classification, outlier detection, association rule
discovery, tools and technologies for data mining, and algorithms for mining complex data
such as graphs, text and sequences. Students will work on a data mining project to gain
hands-on experience.
The course learning objectives include:
• Learning basic data mining algorithms and their applications
• Learning about the tools and technologies available for analyzing various types of data
• Gaining hands-on experience in cleaning, managing and processing complex data
Book: Data Mining: Concepts and Techniques, 3rd ed., by Jiawei Han, Micheline Kamber
and Jian Pei. The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann
Publishers, July 2011. ISBN 978-0123814791
Grading: There will be two exams: a midterm on the topics from weeks 1-7 and a final on
the remainder of the topics. Each exam is worth 25%. Students will form groups, pick
projects and apply mining algorithms; the project is worth 20%. There will be three to
five homework sets, together worth 10% of the grade. There will also be four assignments
worth 5% each (20% in total). Homework will focus on understanding the algorithms and
techniques. The assignments will involve applying different data mining techniques to
real data selected by the instructor.
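As a quick check of the grading arithmetic: the two exams (25% each), the project (20%), homework (10%) and the four assignments (5% each, 20% in total) add up to 100%. The Python sketch below computes a weighted course grade from per-component scores. It rests on two hypothetical conventions that are not part of the course policy: the component names, and scoring each component as a fraction between 0 and 1. The syllabus also does not say how individual homework or assignment scores are combined within a component.

```python
# Component weights (percent of the final grade), taken from the grading policy.
# Homework is 10% in total; the four assignments are 5% each, i.e. 20% in total.
WEIGHTS = {
    "midterm": 25,
    "final": 25,
    "project": 20,
    "homework": 10,
    "assignments": 20,
}


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade on a 0-100 scale.

    `scores` maps each component name to the student's score for that
    component as a fraction between 0 and 1. This input format is a
    hypothetical convention used only for illustration.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Each component contributes its weight scaled by the fraction earned.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# The listed components account for the whole grade.
assert sum(WEIGHTS.values()) == 100
```

For example, a student scoring 0.8 on the midterm, 0.9 on the final, full marks on the project, 0.7 on homework and 0.85 on the assignments would earn 20 + 22.5 + 20 + 7 + 17 = 86.5 points.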
Lecture Schedule: A tentative weekly distribution of topics is given below. The schedule
may be rearranged to accommodate holidays and exams.
Week 1: Ch. 1, 2: What is Data Mining? Types of Data.
Week 2: Ch. 3: Data Preprocessing: Cleaning, Integration, Reduction and Transformation
Week 3: Ch. 6, 7: Mining Frequent Patterns (FP), Associations and Correlations: Apriori, FP-Tree
Week 4: Ch. 8: Basic Classification: Decision Tree, Bayes Classifier, Rule-Based, Goodness Measures
Week 5: Ch. 8, 9: Advanced Classification: Boosting, Bagging, Random Forest, Lazy Learners, FP-Based Classification
Week 6: Ch. 10: Basic Clustering: Hierarchical, Partitioning, Density-Based, Grid-Based
Week 7: Ch. 11: Advanced Clustering: Subspace Clustering, Co-Clustering, Fuzzy Clustering, Expectation-Maximization Clustering
Week 8: Ch. 12: Outlier Detection: Statistical and Proximity-Based Methods
Week 9: Ch. 13: Mining Complex Data Types: Sequences (Real and Discrete)
Week 10: Mining Complex Data Types: Graphs and Trees
Week 11: Mining Complex Data Types: Text, Logs, Reviews
Week 12: Ch. 4, 5: Data Mining Systems: Data Warehousing, Data Cubing, Business Intelligence Systems
Week 13: Data Mining Tools: Weka, Vowpal Wabbit, Pivot Tables, MATLAB Statistics Toolbox
Week 14: Web Mining: Web Search, Computational Advertising, User Behavior Modeling, Fraud Detection
Project:
Each group will do one project. A group can have at most two students; groups in the
CS 491 section can have up to three students. A project consists of two phases of equal
weight.
1. Data Preprocessing and Cleaning: Each group will propose a data source or pick a
dataset from a given list. Each group will also propose data mining tasks, a set of
algorithms/tools and success measures. Groups will clean the data for their projects and
submit written proposals by the end of the 8th week.
2. Implementation and Presentation: Each group will implement the project and write up
the methods and results in a final project report. Groups will present and demonstrate
their projects in class or in a poster session.
Exam:
The exams will be comprehensive and will cover the basic concepts. The questions will
test students' knowledge of the algorithms deterministically, with well-defined answers.