CSci 8980: Data Mining (Fall 2002)

Instructor: Vipin Kumar
Time and Place: Mon Wed 4:00PM - 5:15PM, Phys 143
Office Hours: TBA
Instructor's Office: 5-215 (EE/CSci Bldg)
This page address: http://www.cs.umn.edu/~ptan/dmclass/index.html

Course Overview:

The last decade has seen an explosive growth in database technology and the amount of data collected. This has created an unprecedented opportunity for "data mining", which is a process of efficient supervised or unsupervised discovery of interesting information hidden in the data. Some of the common tasks in data mining are classification, discovery of association rules, clustering, and discovery of sequential patterns.

This course will provide a rapid and vigorous introduction to the field of data mining, as well as provide extensive hands-on experience via research projects. The course will consist of about 7 weeks of lectures followed by 7 weeks of project work by students on selected data sets such as the Earth Science data, Web data, network intrusion data, market-basket data and text data.

Course outline (First 7 weeks):

Introduction
  o What is data mining?
  o Intro to Data Mining Tasks (Classification, Clustering, Association Rules, Sequential Patterns, Regression, Deviation Detection)

Classification Algorithms
  o Decision-Tree Based Approach (e.g. C4.5)
  o Rule-set Based Approach
  o Bayesian Approach: Naive and Bayesian Networks
  o Instance Based classifiers (e.g. k-Nearest Neighbor)
  o Neural Network Based Approach

Clustering
  o k-means, k-medoids methods
  o Hierarchical methods
  o Graph based methods, including shared-nearest neighbor methods
  o Density-based methods

Association Rule Discovery
  o Apriori Principle and its extensions
  o Sequential Associations
  o Handling attributes with continuous values and type hierarchies
  o Interestingness measures

Preprocessing
  o Feature Selection
  o Dimensionality Reduction
  o Data Cleaning

Commercial and Scientific Applications

Background Required: General background in Computer Science (algorithms, etc.), and Motivation to Learn.

Workload and Grading Scheme: Four to six homeworks (25%), mid-term exam (25%), project/paper/presentation (40%), and class participation (10%)

Textbook: Fundamentals of Data Mining, P.N. Tan, M. Steinbach, V. Kumar (a draft version will be available at http://www.cs.umn.edu/~ptan/dmclass)