MATH 574M: Introduction to Statistical Machine Learning
• Course Web Page: http://www.math.arizona.edu/~hzhang/math574.html
• Instructor: Hao Helen Zhang, [email protected], ENR2 S323
• Lecture Hours and Location: Tue, Thu 9:30-10:45pm, Phys-Atoms Sci. (PAS) 224
• Office Hours: Tuesday 2-4pm (ENR2 S323), or by appointment
• Prerequisite: MATH 464, MATH 466 (or MATH 363), or their equivalents; R software
Course Description:
• Goal: With rapid advances in information technology, huge amounts of scientific and
commercial data have been generated in various fields. For example, the human genome
database project has collected gigabytes of data on the human genetic code. The World
Wide Web provides another example, with billions of web pages consisting of textual and
multimedia information that is used by millions of people.
This course covers modern data science techniques, including basic statistical
learning theories and their applications. A variety of data mining methodologies, algorithms, and software tools will be introduced, with emphasis on both conceptual and computational aspects. Applications in bioinformatics, genomics, text mining, social networks, and
so on will be covered.
This course emphasizes statistical analysis, methodology, and theory in modern
machine learning. It is intended for students who want to practice state-of-the-art machine
learning tools and algorithms, and also to understand the theoretical principles and statistical properties that underlie the algorithms. The topics include regression, classification, clustering,
dimension reduction, and high-dimensional analysis.
• Software: All computational problems and the project are to be completed using the R
programming language. The software can be downloaded at http://cran.r-project.org/. Some
tutorial material can be downloaded at our course website.
• Textbook: The Elements of Statistical Learning, 2nd Edition (Hastie, Tibshirani, Friedman
2009). A free electronic version of the book can be downloaded at our course page, or at
http://statweb.stanford.edu/~tibs/ElemStatLearn/.
• Reference: Principles and Theory for Data Mining and Machine Learning (Clarke, Fokoué,
Zhang, 2009)
• Other Online Short Courses: The authors Hastie and Tibshirani are teaching a free online
course on statistical learning this winter quarter using their new book. Check out the website
https://class.stanford.edu/courses/HumanitiesScience/StatLearning/Winter2014/about.
Topics:
1. Overview and Introduction, Application Examples, Tutorial on R
• What is data mining? Connection to Statistics, Data Science, and Computer Science
2. Supervised learning
• Statistical decision theory, Loss function, Risk minimization, Consistency, bias-variance
trade-off
3. Regression I: parametric models
• Linear model theory, Classical model selection methods, Modern shrinkage methods,
LASSO-type methods
4. Classification I (model-based methods)
• Binary problems: Discriminant analysis, Logistic regression; Multiclass problems
5. Regression II: nonparametric models
• Basis expansion, regularization, splines, Generalized additive models (GAM), tree-based
methods
6. Classification II (modern large-margin methods)
• Concept of large margin, Support vector machines (SVM), kernel methods
7. Tree-based methods, Ensemble methods
• CART and MART, random forest, Boosting and bagging
8. Unsupervised learning
• k-means clustering, Principal component analysis (PCA)
Homework & General Instruction:
• Turn in all HW through D2L.
• R code is turned in as a ***.r file that can be executed with the function source().
• Output files should be saved in PDF.
• HW files are named as “hw1 Last First Prob1.r”, “hw1 Last First output.pdf”.
• It is OK to turn in multiple files.
• It is OK to discuss HW with your classmates, but identical solutions are not acceptable.
Grades: A letter grade is given, based on homework (65%) and the final project (35%).
Questions and Suggestions?