CSC 4740 / 6740 Data Mining
Fall 2016

Welcome!

Instructor: Yubao Wu
Office: 25 Park Place, Suite 737
Phone: 404-413-6125 (office)
E-mail: [email protected]
Website: http://www.robwu.net/teaching
Office Hours: 4:00 pm - 5:30 pm, Wednesday; 3:30 pm - 5:00 pm, Friday; or by appointment

Classroom and Date
Classroom: Petit Science Center 230
Date/Time: Monday/Wednesday, 10:00 am - 11:45 am

Textbook
Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann Publishers, 2011. (ISBN: 978-0123814791)

References
- Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006. (ISBN: 0-321-32136-7)
- Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN: 0-262-08290-X)
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN: 0-387-95284-5)

Course Content
Basic data mining techniques
- Association rule mining
- Sequential Patterns
- Classification and Prediction
- Clustering and Outlier Detection
- Regression
- Pattern Interestingness
- Dimensionality Reduction
- ……
Big data mining applications
- Web data mining
- Bioinformatics
- Social networks
- Text mining
- Visualization
- Financial data analysis
- Software Engineering
- ……

Course Requirements
Prerequisite: CSC 3410 Data Structures
The department will strictly enforce all prerequisites. Students without proper prerequisites will be dropped from the class, without any prior notice, at any time during the semester.
Course requirements:
- Basic theoretical principles
- Practical hands-on experience
- Assignments
- Mid-term Exam
- Final Exam
- Research Project

Assignments and Exams
- Mid-Term Exam: Open Textbook
- Final Exam: Open Textbook
- The problems for CSC 4740 and CSC 6740 may be different.

Research Projects
- CSC 4740: One or two undergraduate students form a group. Each group does a project and submits one project report.
- CSC 6740: Each graduate student does a project and submits one project report.
- A research project should discover interesting relationships within a significant amount of data.

Some project ideas (examples only; it is best to propose your own):
- Statistical Computing (speed up traditional statistical methods, such as correlation computation)
- Data Mining in Business Applications (customer segmentation, accounting, marketing)
- Literature Survey
- Mining Biological Datasets
- Social Network Analysis
- Your own ideas

Project proposal (2 - 4 pages, ACM SIGKDD or IEEE ICDM template):
- Title, project idea, survey of related work, data source, key algorithms/technology, and what you expect to submit at the end of the semester.

Final report (6 - 12 pages, ACM SIGKDD or IEEE ICDM template):
- A comprehensive description of your project: project idea, extended survey of related work, detailed algorithm/technology, specific implementation, and key results.
- What worked, what did not work, what surprised you, and why.

CSC 4740 project submissions:
- Project Proposal
- Final Report
- Software, user manual, and sample dataset

CSC 6740 project submissions:
- Project Proposal
- Final Report
- Software, user manual, and sample dataset
- Slides

Final presentation:
- In the last few classes, each graduate student presents his/her project to the rest of the class.
- About 15 minutes of presentation plus 2 minutes of questions.

Checkpoints:
- Proposal (due Sep 21): ~ 1 month
- Final Report (due Dec 5): ~ 2 months

Class Policy:
- Attendance: Students are required to attend all classes.
- Academic honesty: Plagiarism will result in a score of zero on the test or project. The instructor has the right to make a decision.
- Assignments and Projects: They must be handed in on time and will not be accepted when past due.
- Withdrawals: Tuesday, Oct 11 is the last day to withdraw and possibly receive a W.
- Make-ups: Require the instructor's special permission.
Grading Policy:

Component        CSC 4740   CSC 6740
Mid-term Exam    25%        20%
Final Exam       25%        20%
Assignments      30%        25%
Project          15%        30%
Attendance       5%         5%

Letter grades:

A+  [97, 100]    A   [93, 97)    A-  [90, 93)
B+  [87, 90)     B   [83, 87)    B-  [80, 83)
C+  [77, 80)     C   [73, 77)    C-  [70, 73)
D   [60, 70)     F   [0, 60)

If a student's score is no less than 97, an A+ will be given. The scores may be adjusted if the average is low.

Tentative Course Outline and Schedule:

Chapters 1 - 3 (Introduction; Getting to Know Your Data; Data Preprocessing): Aug. 22, 24
Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods: Aug. 29, 31, Sep. 7
Chapter 8 Classification: Basic Concepts: Sep. 12
Chapter 9 Classification: Advanced Methods: Sep. 14, 19, 21
Project Proposal Due: 6 pm eastern time, Sep. 21
Chapter 10 Cluster Analysis: Basic Concepts and Methods: Sep. 26, 28, Oct. 5, 10
Mid-term Exam: Oct. 3
Chapter 11 Advanced Cluster Analysis: Oct. 12, 17, 19, 24
Chapter 13 Data Mining Trends and Research Frontiers: Oct. 26, 31, Nov. 2, 7, 9, 14
Project Presentations: Nov. 16, 28, 30
Final Exam: Dec. 5
Research Project Due: 6 pm eastern time, Dec. 8

KDD References

Data mining and KDD
- Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
- Journals: ACM-KDD, Data Mining and Knowledge Discovery, KDD Explorations

Database systems
- Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
- Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.

AI & Machine Learning
- Conferences: Machine learning (ICML), AAAI, IJCAI, COLT (Learning Theory), etc.
- Journals: Machine Learning, Artificial Intelligence, etc.

Statistics
- Conferences: Joint Stat. Meeting, etc.
- Journals: Annals of Statistics, etc.

Bioinformatics
- Conferences: ISMB, RECOMB, PSB, CSB, BIBE, etc.
- Journals: J. of Computational Biology, Bioinformatics, PLoS Computational Biology, etc.

Questions?
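
As an aside on the grading policy stated earlier, the weights and letter-grade intervals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each component is assumed to be scored on a 0 - 100 scale, no rounding is applied, and the possible adjustment for a low class average is not modeled. The names `WEIGHTS`, `CUTOFFS`, `weighted_score`, and `letter_grade` are hypothetical and are not part of the course materials.

```python
# Component weights in percent, taken from the grading policy.
WEIGHTS = {
    "CSC 4740": {"midterm": 25, "final": 25, "assignments": 30,
                 "project": 15, "attendance": 5},
    "CSC 6740": {"midterm": 20, "final": 20, "assignments": 25,
                 "project": 30, "attendance": 5},
}

# Lower bound of each letter-grade interval, highest first.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"), (0, "F"),
]


def weighted_score(course, scores):
    """Combine component scores (assumed 0-100 each) using the course weights."""
    weights = WEIGHTS[course]
    return sum(weights[name] * scores[name] for name in weights) / 100.0


def letter_grade(score):
    """Map a final score in [0, 100] to the letter whose interval contains it."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, letter in CUTOFFS:
        if score >= lower_bound:
            return letter


# Hypothetical component scores, for illustration only.
example = {"midterm": 80, "final": 90, "assignments": 95,
           "project": 85, "attendance": 100}

for course in WEIGHTS:
    total = weighted_score(course, example)
    print(course, total, letter_grade(total))
```

With these made-up scores the same student lands at 88.75 in CSC 4740 and 88.25 in CSC 6740, because the project carries twice the weight in the graduate section.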