Spring 2013
Statistics 702: Data Mining Statistical Methods

I. General Information

Lecture: TTh 12:30-1:45 pm, GMCS 325
Course web page: http://blackboard.sdsu.edu and rohan.sdsu.edu/~jjfan/sta702
Instructor: Juanjuan Fan
Office: GMCS 519
E-mail: [email protected]
Office Hours: T 1:45-2:30 pm, Th 9:00-9:45 am

Textbook: Classification and Regression Trees by Breiman, Friedman, Olshen, and Stone (1984), Chapman & Hall/CRC

References:
1. The Elements of Statistical Learning, 2nd ed. by Hastie, Tibshirani, Friedman (2009), Springer
2. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning, by Izenman (2008), Springer
3. Data Mining with R: Learning with Case Studies, by L. Torgo (2010), Chapman and Hall/CRC
4. Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, by G. Williams (2011), Springer
5. The Art of R Programming: A Tour of Statistical Software Design, by N. Matloff (2011), No Starch Press

Prerequisites: Stat 670B, Stat 510, and Graduate Student status

Grading:
Assignments and group project: 20%
Midterm exams: 40%
Class participation and presentation: 15%
Final project: 25%

Note that any regrading request on assignments or exams has to be made within one week after the paper is returned.

Exams and projects: Late papers will not be accepted. NO early or makeup exams are given - no exceptions.

II. Other Information

1. All assignments and solutions (for close-ended problems) will be posted on blackboard. There will be one group project.
2. There will be two in-class exams on Tuesday, March 5 and Thursday, April 18.
3. There will be a final project (in lieu of a final exam) so that you can apply the data mining techniques to real data. You will be asked to present your findings during the last two weeks of classes. In addition, a written report will be due by Thursday, May 16 by 12:30pm.
4. Due to concerns of computer viruses, I ask that you hand in all your assignments/reports as a hard copy. No electronic copies will be accepted.

III. Course Content

Introduction to data mining and introduction to R, cluster analysis, nearest neighbors, classification and regression trees (CART), bagging, random forests, and boosting.