CSC 4740 / 6740 Fall 2016
Data Mining
Instructor: Yubao Wu
Welcome!
Instructor: Yubao Wu
Office: 25 Park Place Suite 737
Phone: 404-413-6125 (office)
E-mail: [email protected]
Website: http://www.robwu.net/teaching
Office Hours: Wednesday, 4:00 pm - 5:30 pm; Friday, 3:30 pm - 5:00 pm; or by appointment
Classroom and Date
Classroom: Petit Science Center 230
Date/Time: Monday/Wednesday, 10:00 am - 11:45 am
Textbook
Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann Publishers, 2011. (ISBN: 978-0123814791)
References
• Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006. (ISBN: 0-321-32136-7)
• Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN: 0-262-08290-X)
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN: 0-387-95284-5)
Course Content
Basic data mining techniques
• Association Rule Mining
• Sequential Patterns
• Classification and Prediction
• Clustering and Outlier Detection
• Regression
• Pattern Interestingness
• Dimensionality Reduction
• ……
Big data mining applications
• Web data mining
• Bioinformatics
• Social networks
• Text mining
• Visualization
• Financial data analysis
• Software engineering
• ……
Course Requirements
Prerequisite: CSC 3410 Data Structures
The department will strictly enforce all prerequisites. Students
without proper prerequisites will be dropped from the class,
without any prior notice, at any time during the semester.
Course Requirements:
• Basic theoretical principles
• Practical hands-on experience
  - Assignments
  - Mid-term Exam
  - Final Exam
  - Research Project
Assignments and Exams
Mid-Term Exam: Open Textbook
Final Exam: Open Textbook
The problems for CSC 4740 and CSC 6740 may be different.
Research Projects
CSC 4740:
One or two undergraduate students form a group.
Each group does a project and submits one project report.
CSC 6740:
Each graduate student does a project and submits one
project report.
Research Projects
A project discovers interesting relationships within a significant amount of data.
Some project ideas (examples only; it is best to propose your own)
• Statistical Computing (speeding up traditional statistical methods, such as correlation computation)
• Data Mining in Business Applications (customer segmentation, accounting, marketing)
• Literature Survey
• Mining Biological Datasets
• Social Network Analysis
• Your own ideas
Research Projects
• Project proposal (2 - 4 pages, ACM SIGKDD or IEEE ICDM template)
  - Title, project idea, survey of related work, data source, key algorithms/technology, and what you expect to submit at the end of the semester.
• Final report (6 - 12 pages, ACM SIGKDD or IEEE ICDM template)
  - A comprehensive description of your project.
  - Project idea, extended survey of related work, detailed algorithm/technology, specific implementation, key results.
  - What worked, what did not work, what surprised you, and why.
Research Projects
CSC 4740:
• Project Proposal
• Final Report
• Software, user manual, and sample dataset
CSC 6740:
• Project Proposal
• Final Report
• Software, user manual, and sample dataset
• Slides
Research Projects
Final presentation
• In the last few classes, each graduate student presents his/her project to the rest of the class.
• About 15 minutes of presentation + 2 minutes of questions
Checkpoints
• Proposal (due Sep. 21): ~ 1 month
• Final Report (due Dec. 5): ~ 2 months
Class Policy:
• Attendance: Students are required to attend all classes.
• Academic honesty: Plagiarism will result in a score of zero on the test or project. The instructor has the right to make a decision.
• Assignments and Projects: They must be handed in on time and will not be accepted when past due.
• Withdrawals: Tuesday, Oct. 11 is the last day to withdraw and possibly receive a W.
• Make-ups: Require the instructor's special permission.
Grading Policy:
Component       CSC 4740   CSC 6740
Mid-term Exam   25%        20%
Final Exam      25%        20%
Assignments     30%        25%
Project         15%        30%
Attendance      5%         5%

A+ [97, 100]   A [93, 97)   A- [90, 93)
B+ [87, 90)    B [83, 87)   B- [80, 83)
C+ [77, 80)    C [73, 77)   C- [70, 73)
D  [60, 70)
F  [0, 60)
If a student's score is no less than 97, an A+ will be given.
The scores may be adjusted if the average is low.
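The arithmetic behind the grading policy can be sketched roughly as follows. This is only an illustration, assuming each component is scored out of 100; the names `WEIGHTS`, `CUTOFFS`, `weighted_score`, and `letter_grade` are hypothetical, and the sketch does not model any adjustment the instructor may apply.

```python
# Component weights in percent, taken from the grading policy.
WEIGHTS = {
    "CSC 4740": {"midterm": 25, "final": 25, "assignments": 30,
                 "project": 15, "attendance": 5},
    "CSC 6740": {"midterm": 20, "final": 20, "assignments": 25,
                 "project": 30, "attendance": 5},
}

# Lower bounds of the letter-grade intervals, highest first.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"), (0, "F"),
]


def weighted_score(course, scores):
    """Combine per-component scores (assumed 0-100) into one overall score."""
    weights = WEIGHTS[course]
    return sum(weights[name] * scores[name] for name in weights) / 100


def letter_grade(score):
    """Map an overall score in [0, 100] to its letter grade."""
    for lower_bound, letter in CUTOFFS:
        if score >= lower_bound:
            return letter
    raise ValueError("score must be between 0 and 100")


if __name__ == "__main__":
    # Hypothetical component scores, for illustration only.
    example = {"midterm": 88, "final": 92, "assignments": 95,
               "project": 90, "attendance": 100}
    for course in WEIGHTS:
        total = weighted_score(course, example)
        print(course, total, letter_grade(total))
```

Because the project carries twice the weight in CSC 6740, the same component scores can yield different overall scores in the two courses.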
Tentative Course Outline and Schedule:
Topic                                                                 Date(s)
Chapter 1 Introduction;
  Chapter 2 Getting to Know Your Data;
  Chapter 3 Data Preprocessing                                        Aug. 22, 24
Chapter 6 Mining Frequent Patterns, Associations, and Correlations:
  Basic Concepts and Methods                                          Aug. 29, 31, Sep. 7
Chapter 8 Classification: Basic Concepts                              Sep. 12
Chapter 9 Classification: Advanced Methods                            Sep. 14, 19, 21
Project Proposal Due                                                  6 pm Eastern Time, Sep. 21
Chapter 10 Cluster Analysis: Basic Concepts and Methods               Sep. 26, 28, Oct. 5, 10
Mid-term Exam                                                         Oct. 3
Chapter 11 Advanced Cluster Analysis                                  Oct. 12, 17, 19, 24
Chapter 13 Data Mining Trends and Research Frontiers                  Oct. 26, 31, Nov. 2, 7, 9, 14
Project Presentations                                                 Nov. 16, 28, 30
Final Exam                                                            Dec. 5
Research Project Due                                                  6 pm Eastern Time, Dec. 8
KDD References
• Data mining and KDD
  - Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
  - Journals: ACM-KDD, Data Mining and Knowledge Discovery, KDD Explorations
• Database systems
  - Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
  - Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.
• AI & Machine Learning
  - Conferences: Machine Learning (ICML), AAAI, IJCAI, COLT (Learning Theory), etc.
  - Journals: Machine Learning, Artificial Intelligence, etc.
• Statistics
  - Conferences: Joint Stat. Meeting, etc.
  - Journals: Annals of Statistics, etc.
• Bioinformatics
  - Conferences: ISMB, RECOMB, PSB, CSB, BIBE, etc.
  - Journals: J. of Computational Biology, Bioinformatics, PLoS Computational Biology, etc.
Questions?