MATH 574M: Introduction to Statistical Machine Learning
• Course Web Page: http://www.math.arizona.edu/~hzhang/math574.html
• Instructor: Hao Helen Zhang, [email protected], ENR2 S323
• Lecture Hours and Location: Tue, Thu 9:30-10:45pm, Phys-Atoms Sci. (PAS) 224
• Office Hours: Tuesday 2-4pm (ENR2 S323), or by appointment
• Prerequisites: MATH 464, MATH 466 (or MATH 363), or their equivalents; R software
Course Description:
• Goal: With rapid advances in information technology, huge amounts of scientific and commercial data have been generated in various fields. For example, the human genome database project has collected gigabytes of data on the human genetic code. The World Wide Web provides another example, with billions of web pages of textual and multimedia information used by millions of people. This course covers modern data science techniques, including basic statistical learning theory and its applications. A variety of data mining methodologies, algorithms, and software tools will be introduced, with emphasis on both conceptual and computational aspects. Applications in bioinformatics, genomics, text mining, social networks, and other areas will be covered. The course emphasizes statistical analysis, methodology, and theory in modern machine learning. It is intended for students who want to practice state-of-the-art machine learning tools and algorithms and also understand the theoretical principles and statistical properties that underlie them. Topics include regression, classification, clustering, dimension reduction, and high-dimensional analysis.
• Software: All computational problems and the project must be completed in the R programming language. R can be downloaded at http://cran.r-project.org/. Some tutorial material is available on the course website.
• Textbook: The Elements of Statistical Learning, 2nd Edition (Hastie, Tibshirani, Friedman, 2009).
A free electronic version of the book can be downloaded from the course page or at http://statweb.stanford.edu/~tibs/ElemStatLearn/.
• Reference: Principles and Theory for Data Mining and Machine Learning (Clarke, Fokoué, Zhang, 2009)
• Other Online Short Courses: The authors Hastie and Tibshirani are teaching a free online course on statistical learning this winter quarter using their new book. See https://class.stanford.edu/courses/HumanitiesScience/StatLearning/Winter2014/about.
Topics:
1. Overview and Introduction, Application Examples, Tutorial on R
   • What is data mining? Connections to Statistics, Data Science, and Computer Science
2. Supervised learning
   • Statistical decision theory, Loss functions, Risk minimization, Consistency, Bias-variance trade-off
3. Regression I: parametric models
   • Linear model theory, Classical model selection methods, Modern shrinkage methods, LASSO-type methods
4. Classification I: model-based methods
   • Binary problems: Discriminant analysis, Logistic regression; Multiclass problems
5. Regression II: nonparametric models
   • Basis expansion, Regularization, Splines, Generalized additive models (GAM), Tree-based methods
6. Classification II: modern large-margin methods
   • Concept of large margin, Support vector machines (SVM), Kernel methods
7. Tree-based methods, Ensemble methods
   • CART and MART, Random forests, Boosting and bagging
8. Unsupervised learning
   • k-means clustering, Principal component analysis (PCA)
Homework & General Instructions:
• Turn in all HW through D2L.
• Submit R code as ***.r files that can be executed with the function source().
• Save output files as PDF.
• Name HW files as “hw1 Last First Prob1.r” and “hw1 Last First output.pdf”.
• It is OK to turn in multiple files.
• It is OK to discuss HW with your classmates, but identical solutions are not acceptable.
Grades: Letter grades are based on homework (65%) and the final project (35%).
Questions and Suggestions?