Syllabus
Graduate Program in Software
CSIS 536: Data Mining

Instructor
  Name: Chih Lai, Ph.D.
  E-mail: [email protected] (please include "CS536" in the subject field of your e-mail for ease of categorization)
  WWW: http://personal1.stthomas.edu/clai/
  Voice: 651-962-5573
  Fax: 651-962-5543
  Mailing stop: Mail #OSS301, University of St. Thomas, 2115 Summit Avenue, St. Paul, MN 55105-1079
  Office: OSS 308
  Office Hours: 3:30 – 5:00 PM Wednesday. Also by prior appointment.
  Class Rooms / Hours: 5:45 – 9:00 PM Wednesday, Room OSS 333

© Copyright 2007 by Chih Lai, University of St. Thomas

Textbooks
  Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2006.

Highly Recommended Textbooks
  Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall, 2002.
  Data Mining: Practical Machine Learning Tools and Techniques with Java Implementation (Second Edition), by Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005.

Others
  Data Mining: Concepts, Models, Methods, and Algorithms, by Mehmed Kantardzic, John Wiley & Sons, 2003.
  Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2005.

Course Description
In this course, we will discuss suitable data models, data preparation, and different methods and algorithms to discover new knowledge from large amounts of raw data. Topics include:
  (1) Data warehousing and data cleaning
  (2) Association rule and market basket analysis
  (3) Decision tree classification and customer behavior prediction
  (4) Data clustering and market segmentation
  (5) Temporal, spatial, and graph analysis
  (6) Data mining tools and frameworks
If time permits, we will also discuss:
  (1) Inductive and analytical learning
  (2) Genetic algorithms and programming
Note: This is an advanced technical course that emphasizes the fundamentals of data mining algorithms and research issues. This is **NOT** a high-level end-user tool-training class.

Prerequisite
CSIS530 is required, and some programming experience may help. Patience and a fresh brain are highly recommended.

Course Research Project
You will conduct a data mining project in a team of 3 to 4 people. You are all required to participate to the maximum of your ability in the project. The instructor will offer general guidelines. Check the class schedule for the due dates of the project plan and final report. See the attached WORD document for detailed project requirements and submission guidelines.

Suggestions on Presentations
  Motivations: Give examples of why the problem is interesting and important
  Technical contents: Use examples to show how the techniques work
  Discussion: Pros and cons, performance studies
  References: Background and related work
  Issues/impacts related to information ethics and privacy

Tentative Class Schedule
  No  Date  Topics
            Materials / References
   1  1/31  Introduction to Data Mining
            Chapter 1
   2  2/7   Data Warehouse: Schema, Indexing, TDC / BUC Algorithms
            Chapter 2
   3  2/14  Association Rule, Support, Confidence, Apriori, Improvements
            Chapter 6.1 – 6.2 / Others
   4  2/21  Association Rule, FP-Tree, Sequential and Cyclic Rules
            Chapter 6.2 / Research Papers
   5  2/28  Multi-level Associations, Quantitative Rules, Constraints-Mining
            Chapter 6.3 – 6.6
   6  3/7   Classification, Apriori, Naïve Bayes Theorem, Zero-Frequency, Laplace Estimator, Decision Tree
            Chapter 7.4, 7.6 / Others
   7  3/14  Entropy, Information Gain/Ratio/Bias, Gini Index, Missing/Numeric Values, Overfitting, Tree Pruning, Sequential Covering
            Chapter 7.3, 7.7, 7.9 / Others
      3/21  Spring Break, NO Class
   8  3/28  Mid-term exam (5:45 – 8:00 PM)
   9  4/4   Clustering, Outliers, Data Transformation, Similarity Measurement
            Chapter 8.1 – 8.3
  10  4/11  Partitioning Clustering, k-means, k-medoids, MST Algorithm
            Chapter 8.4 – 8.5 / Others
  11  4/18  Hierarchical Clustering, Single/Complete Link, BIRCH, CURE
            Chapter 8.5 – 8.6 / Others
  12  4/25  Density Clustering, DBSCAN, Neural Network, Genetic Algorithm, Spatial Rules, Inductive / Deductive Learning
            Chapter 8.6 / Others
  13  5/2   Project presentation
            Notes prepared by teams
  14  5/9   Final Exam
  Project plan due (date not given)

Exams
There will be two exams for this class. The exams are in class and closed-book. They will be based primarily on the materials covered in class but will include some research-type questions as well.

Grading
  Homework assignments  10%
  Project               30%
  Midterm exam          30%
  Final exam            30%

Letter grades will be assigned approximately as follows:
  A, A-       80% – 100%
  B+, B, B-   70% – 80%
  C+, C, C-   60% – 70%
  F           Below 60%
*** The final distribution may be adjusted based on class performance.
*** Students who do NOT take the exam(s) or who miss the project presentation will receive an "F" grade.

Resources
Computing Resources
  OSS 327 Computer Lab. Please check your UST e-mail account regularly.
Support Staff
  Instructor Chih Lai, for questions regarding the materials covered in class and for design and implementation clarification.
  GPS Lab assistant Marius Tegomeh (962-5517, [email protected]), for questions on using the equipment in Room 327.

Attendance Policy
Course attendance is expected, but no grade is given for it. Students who miss sessions are responsible for all information in those sessions. Students who need to miss presentations or exams due to unavoidable conflicts must arrange in advance with the instructor to make up the session.

Course Assignments
Homework will be assigned from time to time during the semester to reinforce the concepts and techniques discussed in class. Assignments will be collected on the specified due dates. NO late submissions will be accepted without a proper reason.

Where to Find References?
Data mining and KDD (SIGKDD)
  Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
  Journals: Data Mining and Knowledge Discovery, KDD Explorations
Database systems (SIGMOD)
  Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
  Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.
AI & Machine Learning
  Conferences: Machine Learning (ML), AAAI, IJCAI, COLT (Learning Theory), etc.
  Journals: Machine Learning, Artificial Intelligence, etc.
Statistics
  Conferences: Joint Stat. Meeting, etc.
  Journals: Annals of Statistics, etc.
Visualization
  Conference proceedings: CHI, ACM-SIGGraph, etc.
  Journals: IEEE Trans. Visualization and Computer Graphics, etc.

Resources on Information Ethics
Fule, P., & Roddick, J. F. (2004). Detecting privacy and ethical sensitivity in data mining results. ACM International Conference Proceeding Series, Vol. 56, 159-166.
Thuraisingham, B. (2002). Data mining, national security, privacy and civil liberties. SIGKDD Explorations, 4(2), 1-5.
Danna, A., & Gandy, O. H., Jr. (2002). All that glitters is not gold: digging beneath the surface of data mining. Journal of Business Ethics, 40(4), 373-386.
Wahlstrom, K., & Roddick, J. F. (2001). On the impact of knowledge discovery and data mining. 2nd Australian Institute of Computer Ethics Conference (Canberra, 2001).
Computer Ethics, Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/entries/ethics-computer/
CSIS 550, Legal Issues in Technology