CIS600/CSE 690: Applied Data Mining
CLASS SCHEDULE AND COVERAGE
(1) Material covered on August 31 and September 2 (two meetings)
August 31 (M): Introduction to KDD and DM; DM phases; CRISP model; four
functionalities.
September 2 (W): Two meetings. Examples of the four functionalities (classification,
prediction, association rules, clustering); brief descriptions of SKICAT,
cancer classification, and software engineering; the classification problem;
development and use of classification models; decision trees;
Quinlan’s algorithm using information gain; Example 6.1 (Han’s
book) to discuss decision tree construction.
September 9 (W): Two meetings. Review of classification; stopping rules for tree
construction; classification error; tree pruning; training, validation,
and generalization errors; model selection and assessment.
Case studies: SKICAT; cancer classification; diabetes; software
module criticality.
September 10 (Th, optional): Review session, 4:30-5:30 pm.
September 14 (M): Association rules; itemsets and frequent itemsets; Apriori property;
support and confidence of rules.
September 23 (W): Quiz No. 1; RapidMiner presentation and demonstration.
September 30 (W): Project: classification and association rules using (i) diabetes data and (ii)
heart disease data. RapidMiner project report due in class on October
14.
October 5 (M): Introduction to clustering.
October 7 (W): Clustering for diabetes and heart disease data using RapidMiner
(optional class).
October 12, 14: RapidMiner project report due in class on October 14; introduction to
prediction modeling using regression analysis; course review.
October 15: Course review (optional); time: TBA.
October 16 (F): Examination No. 1, 5-7:30 pm; room: TBA.
October 19, 21: Neural networks for classification and prediction; case studies.
October 26, 28: Radial basis functions; case studies from “Lessons Learned”.
October 29: Optional review class.
The above represents about 9-10 weeks of class meetings; coverage for the remaining classes will
be finalized later.
NOTE (tentative): Quiz No. 2, November 3 (M); Exam No. 2, November 13, 2009 (F).
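As a small illustration of the support and confidence measures listed for the September 14 meeting, the sketch below computes both for one rule. The transactions and item names are hypothetical toy data, not course material, and the helper names `support` and `confidence` are chosen here for illustration only.

```python
# Hypothetical toy market-basket data (an assumption, not from the course).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = frozenset(antecedent) | frozenset(consequent)
    return count(both, transactions) / n_antecedent


# Rule {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)
```

In this toy data, three of the five transactions contain both diapers and beer, and four contain diapers, so the rule has support 3/5 and confidence 3/4.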
CIS600/CSE 690: Analytical Data Mining
Quiz No. 1
September 23, 2009 (Wednesday), about 3:50-4:20 pm
Coverage:
Part A: KDD/DM (similar to Assignment No. 1) (15%)
1. KDD/DM description, goals of DM.
2. CRISP model; description of each phase.
3. Description of the four functionalities in DM; classification; prediction; association
rules; clustering.
Part B: Classification (similar to Assignment No. 2) (50%)
1. Classification; the two steps: development and use of a classification model.
2. Decision tree induction from training data.
3. Quinlan’s algorithm using information gain criterion.
4. Develop a tree for given data set.
5. Interpretation of decision tree.
6. Classification error
7. Tree pruning; pros and cons.
Part C: Some basic concepts (15%)
1. Supervised and unsupervised learning.
2. Training, validation and test data
3. Theoretical behavior of training, validation and test errors versus model complexity.
4. Model selection and assessment.
Part D: Case studies (20%)
1. Importance of SKICAT application; significance of data mining results; contribution to
science.
2. Diabetes classification problem; problem description, approach for classification (no
details of radial basis functions); interpretation of classification results.
3. Microarray data classification for cancer type; problem description; goal of study;
interpretation and significance of results.
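
For review of Part B, item 3, the following is a minimal sketch of the information gain criterion used to choose a splitting attribute. The four-row data set and the attribute names are hypothetical, made up for illustration (this is not Example 6.1 from Han's book), and only the attribute-selection step is shown, not a full tree-building algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy training data (an assumption, not course data).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

Here "windy" separates the classes perfectly, so its gain equals the full entropy of the target (1 bit), while "outlook" leaves each branch evenly mixed and gains nothing; the attribute with the largest gain would be chosen for the root split.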