Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data Mining Techniques Instructor: Ruoming Jin Fall 2006 1 Welcome! Instructor: Ruoming Jin Homepage: www.cs.kent.edu/~jin/ Office: 264 MCS Building Email: [email protected] Office hour: Mondays and Wednesdays (10:00AM to 11:00AM) or by appointment 2 Overview Homepage: www.cs.kent.edu/~jin/datamining.html Time: 11:00-12:15PM Monday and Wednesday Place: MSB 276 Prerequisite: none Preferred: Database, AI, Machine Learning, Statistics, Algorithms, and Data Structures 3 Overview Textbook: Introduction to Data Mining – Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley References Data Mining --- Concepts and techniques, by Han and Kamber, Morgan Kaufmann, 2001. (ISBN:1-55860-489-8) Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN:0-262-08290-X) The Elements of Statistical Learning --- Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN:0-387-95284-5) Mining the Web --- Discovering Knowledge from Hypertext Data, by Chakrabarti, Morgan Kaufmann, 2003. (ISBN:1-55860-7544) 4 Overview Grading scheme Paper Presentation and discussion Project 35% Attendance and participation 15% No homework No exam 50% 5 Overview (Presentation) Paper presentation One per student Research paper(s) List of recommendations (will be available by the end of second week) Your own pick (upon approval) Three parts Review of research ideas in the paper Debate (Pros/Cons) Questions and comments from audience Class participation: One question/comment per student 6 Overview (Presentation) Order of presentation: assigned by instructor The presentation will start from late October or early November You need make your choice and send it to me by Sep. 22nd! You need submit three drafts before the final presentation First draft due on Oct. 14th Second draft due on Oct. 21th Final slides due one day before your presentation. I will provide feedback and suggestion for each draft Note that I do expect the complete presentation slides in the first draft 7 Overview (Project) Project (due Dec 3rd) One project: One or Two students Some suggestion will be available shortly Checkpoints The project will focus on visualizing data mining algorithm. Proposal: title and goal (due Oct 7th) Outline of approach (due Oct 7th) Implementation (due Dec 3rd) Evaluation (due Dec 3rd) Documentation (duce Dec 10rd) Each group will have a short presentation and demo (20 minutes) Each group will provide a five-page document on the project 8 Topics Scope:Data Mining Topics: Association Rule Sequential Patterns Graph Mining Clustering and Outlier Detection Classification and Prediction Regression Pattern Interestingness Dimensionality Reduction … 9 Topics Applications Bioinformatics Web mining Text mining Visualization Financial data analysis Intrusion detection … 10 KDD References Data mining and KDD (SIGKDD: CDROM) Journal: Data Mining and Knowledge Discovery, KDD Explorations Database systems (SIGMOD: CD ROM) Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc. Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc. AI & Machine Learning Conferences: Machine learning (ICML), AAAI, IJCAI, COLT (Learning Theory), etc. Journals: Machine Learning, Artificial Intelligence, etc. 11 KDD References Statistics Conferences: Joint Stat. Meeting, etc. Journals: Annals of statistics, etc. Bioinformatics Conferences: ISMB, RECOMB, PSB, CSB, BIBE, etc. Journals: J. of Computational Biology, Bioinformatics, etc. Visualization Conference proceedings: CHI, ACM-SIGGraph, etc. Journals: IEEE Trans. visualization and computer graphics, etc. 12