Spring 2013
Statistics 702: Data Mining Statistical Methods
I. General Information
Lecture:
TTh 12:30-1:45 pm, GMCS 325
Course web page: http://blackboard.sdsu.edu
and rohan.sdsu.edu/~jjfan/sta702
Instructor:
Juanjuan Fan
Office: GMCS 519
E-mail: [email protected]
Office Hours: T 1:45-2:30 pm, Th 9:00-9:45 am
Textbook:
Classification and Regression Trees
by Breiman, Friedman, Olshen, and Stone (1984), Chapman & Hall/CRC
References:
1. The Elements of Statistical Learning, 2nd ed.
by Hastie, Tibshirani, Friedman (2009), Springer
2. Modern Multivariate Statistical Techniques: Regression, Classification,
and Manifold Learning, by Izenman (2008), Springer
3. Data Mining with R: Learning with Case Studies, by L. Torgo (2010),
Chapman and Hall/CRC
4. Data Mining with Rattle and R: The Art of Excavating Data for
Knowledge Discovery, by G. Williams (2011), Springer
5. The Art of R Programming: A Tour of Statistical Software Design,
by N. Matloff (2011), No Starch Press
Prerequisites: Stat 670B, Stat 510, and graduate student status
Grading:
Assignments and group project: 20%
Midterm exams: 40%
Class participation and presentation: 15%
Final project: 25%
Any regrading request for an assignment or exam must be made
within one week after the paper is returned.
Exams and projects:
Late papers will not be accepted.
NO early or makeup exams are given - no exceptions.
II. Other Information
1. All assignments and solutions (for closed-ended problems) will be posted on Blackboard. There
will be one group project.
2. There will be two in-class exams, on Tuesday, March 5, and Thursday, April 18.
3. There will be a final project (in lieu of a final exam) so that you can apply data mining
techniques to real data. You will be asked to present your findings during the last two weeks
of classes. In addition, a written report will be due on Thursday, May 16, by 12:30 pm.
4. Due to concerns about computer viruses, I ask that you hand in all your assignments and
reports as hard copies. No electronic copies will be accepted.
III. Course Content
Introduction to data mining and introduction to R, cluster analysis, nearest neighbors, classification and regression trees (CART), bagging, random forests, and boosting.