CIS600/CSE 690: Applied Data Mining
CLASS SCHEDULE AND COVERAGE (1)

Material covered on August 31 and September 2 (two meetings)

August 31 (M): Introduction to KDD and DM; DM phases; CRISP model; four functionalities.

September 2 (W): Two meetings. Examples of the four functionalities (classification; prediction; association rules; clustering); brief description of SKICAT, cancer classification, software engineering; classification problem; development and use of classification models; decision tree; Quinlan’s algorithm using information gain; Example 6.1 (Han’s book) to discuss decision tree construction.

September 9 (W): Two meetings. Review classification; stopping rules for tree construction; classification error; tree pruning; training, validation and generalization errors; model selection and assessment. Case studies: SKICAT; cancer classification; diabetes; software module criticality.

September 10 (Thursday, optional): Review session, 4:30-5:30pm.

September 14 (M): Association rules; itemsets and frequent itemsets; a-priori property; support and confidence of rules.

September 23 (W): Quiz No. 1; RapidMiner presentation and demonstration.

September 30 (W): Project: classification and association rules using (i) diabetes data, (ii) heart disease data. RapidMiner project report due in class on Oct 14.

October 5 (M): Introduction to clustering.

October 7 (W): Clustering for diabetes and heart disease data using RapidMiner (optional class).

October 12, 14: RapidMiner project report due in class on Oct 14; introduction to prediction modeling using regression analysis; course review.

October 15: Course review. Optional; time: TBA.

October 16 (F): Examination No. 1, 5-7:30pm, room: TBA.

October 19, 21: Neural networks for classification and prediction; case studies.

October 26, 28: Radial basis functions; case studies from “Lessons Learned”.

October 29: Optional review class.

The above represents about 9-10 weeks of class meetings; coverage for other classes to be finalized later.
NOTE: Tentative: Quiz No. 2, November 3 (M); Exam No. 2, November 13, 2009 (F).

CIS600/CSE 690: Analytical Data Mining
Quiz No. 1
September 23, 2009 (Wednesday), about 3:50-4:20pm

Coverage:

Part A: KDD/DM (similar to assignment No. 1) (15%)
1. KDD/DM description; goals of DM.
2. CRISP model; description of each phase.
3. Description of the four functionalities in DM: classification; prediction; association rules; clustering.

Part B: Classification (similar to assignment No. 2) (50%)
1. Classification; the two steps of development and use of a classification model.
2. Decision tree introduction from training data.
3. Quinlan’s algorithm using information gain criterion.
4. Develop a tree for a given data set.
5. Interpretation of decision tree.
6. Classification error.
7. Tree pruning; pros and cons.

Part C: Some basic concepts (15%)
1. Supervised and unsupervised learning.
2. Training, validation and test data.
3. Theoretical behavior of training, validation and test errors versus model complexity.
4. Model selection and assessment.

Part D: Case studies (20%)
1. Importance of SKICAT application; significance of data mining results; contribution to science.
2. Diabetes classification problem; problem description; approach for classification (no details of radial basis functions); interpretation of classification results.
3. Microarray data classification for cancer type; problem description; goal of study; interpretation and significance of results.
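
Study aid for Part B, items 3 and 4: the information gain criterion picks, at each node, the attribute whose split most reduces the entropy of the class labels. The Python sketch below is only an illustration under stated assumptions, not the course's reference solution. The function names (entropy, information_gain, best_attribute) and the four-row toy data set are hypothetical and do not come from Han's Example 6.1.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    # Group the class labels by the value each row takes on the attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """Attribute with the largest information gain (the root test for this data)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy training data, made up for illustration only.
toy_rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

if __name__ == "__main__":
    for attr in ("outlook", "windy"):
        print(attr, information_gain(toy_rows, attr, "play"))
    print("split on:", best_attribute(toy_rows, ["outlook", "windy"], "play"))
```

In this toy data "outlook" separates the two classes perfectly, so its gain equals the full entropy of the class labels (1 bit), while "windy" leaves each branch evenly mixed and has zero gain. Building a full tree repeats the same selection on each branch's subset of rows until a stopping rule applies.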