Syllabus
Graduate Program in Software
CSIS 536: Data Mining

Instructor
Name: Chih Lai, Ph.D.
E-mail: [email protected]
(please include “CS536” in the subject field of your e-mail for easy categorization)
WWW: http://personal1.stthomas.edu/clai/
Voice: 651-962-5573
Fax: 651-962-5543
Office: OSS 308
Mailing stop: Mail #OSS301, University of St. Thomas, 2115 Summit Avenue, St. Paul, MN 55105-1079
Office Hours: 3:30 – 5:00 PM Wednesday
Also by prior appointment.

Class Rooms / Hours
5:45 – 9:00 PM Wednesday, Room OSS 333

© Copyright 2007 by Chih Lai, University of St. Thomas

Textbooks
Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, Morgan Kaufmann 2006.
Highly Recommended Textbooks
Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall 2002.
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementation (Second Edition), by Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005.

Others
Data Mining: Concepts, Models, Methods, and Algorithms, by Mehmed Kantardzic, John Wiley & Sons, 2003.
Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison Wesley, 2005.

Course Description
In this course, we will discuss suitable data models, data preparation, and different methods and algorithms to discover new knowledge from large amounts of raw data. Topics include: (1) data warehousing and data cleaning, (2) association rules and market basket analysis, (3) decision tree classification and customer behavior prediction, (4) data clustering and market segmentation, (5) temporal, spatial, and graph analysis, and (6) data mining tools and frameworks.
If time permits, we will also discuss (1) inductive and analytical learning and (2) genetic algorithms and programming.

Note: This course is an advanced technical course that emphasizes the fundamentals of data mining algorithms and research issues.
This is **NOT** a high-level end-user tool-training class.

Prerequisite
CSIS 530 is required; some programming experience may help.
Patience and a fresh brain are highly recommended.
Course Project
Course Research Project
You will conduct a data mining project in a team of 3–4 people. You are all required to participate in the project to the maximum of your ability. The instructor will offer general guidelines.
Check the class schedule for the due dates of the project plan and the final report.
See the attached Word document for detailed project requirements and submission guidelines.

Suggestions on Presentations
Motivations: give examples of why the problem is interesting and important
Technical contents: use examples to show how the techniques work
Discussion: pros & cons
Performance studies
References: background and related work
Issues/impacts related to information ethics and privacy

Tentative Class Schedule
No. | Date | Topics | Materials / References
1 | 1/31 | Introduction to Data Mining | Chapter 1
2 | 2/7 | Data Warehouse: Schema, Indexing, TDC / BUC Algorithms | Chapter 2
3 | 2/14 | Association Rule, Support, Confidence, Apriori, Improvements | Chapter 6.1 – 6.2 / Others
4 | 2/21 | Association Rule, FP-Tree, Sequential and Cyclic Rules | Chapter 6.2 / Research Papers
5 | 2/28 | Multi-level Associations, Quantitative Rules, Constraints-Mining | Chapter 6.3 – 6.6
6 | 3/7 | Classification, Apriori, Naïve Bayes Theorem, Zero-Frequency, Laplace Estimator, Decision Tree | Chapter 7.4, 7.6 / Others
7 | 3/14 | Entropy, Information Gain/Ratio/Bias, Gini Index, Missing/Numeric Values, Overfitting, Tree Pruning, Sequential Covering | Chapter 7.3, 7.7, 7.9 / Others
  | 3/21 | Spring Break, NO Class!!!!! |
8 | 3/28 | Mid-term exam (5:45–8:00 PM) |
9 | 4/4 | Clustering, Outliers, Data Transformation, Similarity Measurement | Chapter 8.1 – 8.3
10 | 4/11 | Partitioning Clustering, k-means, k-medoids, MST Algorithm | Chapter 8.4 – 8.5 / Others
11 | 4/18 | Hierarchical Clustering, Single/Complete Link, BIRCH, CURE | Chapter 8.5 – 8.6 / Others
12 | 4/25 | Density Clustering, DBSCAN, Neural Network, Genetic Algorithm, Spatial rules, Inductive / Deductive Learning | Chapter 8.6 / Others
13 | 5/2 | Project presentation | Notes prepared by teams
14 | 5/9 | Final Exam |
*Project plan due

Exams and Grading
Exams
There will be two exams for this class. The exams are in class and closed-book. The exams will be based primarily on the material covered in class but will include some research-type questions as well.

Grading
Homework assignments: 10%
Project: 30%
Midterm exam: 30%
Final exam: 30%

Letter grades will be assigned approximately as follows:
80% – 100%: A, A-
70% – 80%: B+, B, B-
60% – 70%: C+, C, C-
Below 60%: F
*** The final distribution may be adjusted based on class performance.
*** Students who do NOT take the exam(s) or who miss the project presentation will receive an “F” grade.

Attendance Policy
Course attendance is expected, but no grade is given for it. Students who miss sessions are responsible for all information in those sessions. Students who need to miss presentations or exams due to unavoidable conflicts must arrange in advance with the instructor to make up the session.

Resources
Computing Resources
OSS 327 Computer Lab
Please check your UST e-mail account regularly.
Support Staff
Instructor Chih Lai for questions about the material covered in class and for design and implementation clarifications.
GPS Lab assistant Marius Tegomeh (962-5517, [email protected]) for questions on using the equipment in Room 327.
Course Assignments
Homework will be assigned from time to time during the semester to reinforce the concepts and techniques discussed in class. Assignments will be collected on the specified due dates. NO late submissions will be accepted without a valid reason.
Where to Find References?
Data mining and KDD (SIGKDD)
Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
Journals: Data Mining and Knowledge Discovery, KDD Explorations
Database systems (SIGMOD)
Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.
AI & Machine Learning
Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), etc.
Journals: Machine Learning, Artificial Intelligence, etc.
Statistics
Conferences: Joint Stat. Meeting, etc.
Journals: Annals of Statistics, etc.
Visualization
Conference proceedings: CHI, ACM SIGGRAPH, etc.
Journals: IEEE Trans. Visualization and Computer Graphics, etc.

Resources on Information Ethics
Fule, P., & Roddick, J. F. (2004). Detecting privacy and ethical sensitivity in data mining results. ACM International Conference Proceeding Series, Vol. 56, 159-166.
Thuraisingham, B. (2002). Data mining, national security, privacy and civil liberties. SIGKDD Explorations, 4(2), 1-5.
Danna, A., & Gandy, O. H., Jr. (2002). All that glitters is not gold: digging beneath the surface of data mining. Journal of Business Ethics, 40(4), 373-386.
Wahlstrom, K., & Roddick, J. F. (2001). On the impact of knowledge discovery and data mining. 2nd Australian Institute of Computer Ethics Conference (Canberra, 2001).
Computer Ethics, Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/entries/ethics-computer/
CSIS 550, Legal Issues in Technology