Statistical Learning and Data Mining
15.077 (ESD.753J)
Spring 2016 (Welsch)
Tentative Schedule
Texts:
1. Rice, J.A., Mathematical Statistics and Data Analysis (with CD Data Sets), 3rd ed., Duxbury, 2007
(ISBN 978-0-534-39942-9). [R]
2. Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning: Data Mining,
Inference and Prediction, Springer, 2nd ed., 2009 (ISBN 978-0-387-84857-0). [H]. This text is
available on the Internet for about $25.
If #1 seems a little hard to follow in the beginning, then you might try:
3. Tamhane, A.C., Dunlop, D.D., Statistics and Data Analysis: From Elementary to Intermediate,
Pearson, 2000 (ISBN 9780137444267).
For material relating to the bootstrap and resampling approach to statistics, look at:
4. Chihara, L. and Hesterberg, T., Mathematical Statistics with Resampling and R, Wiley, 2011
(ISBN 978-1-118-02985-5).
5. Lock, R. et al., Statistics: Unlocking the Power of Data, Wiley, 2013 (ISBN 978-0-470-60187-7).
If you need proofs of some theorems and a more advanced discussion than #1 try:
6. Casella, G. and Berger, R.L., Statistical Inference, 2nd ed., Cengage Learning, 2002 (ISBN
9780534243128).
The Hastie et al. book varies in level. Resources at a lower level are:
7. Shmueli, G., Patel, N., and Bruce, P., Data Mining for Business Intelligence: Concepts,
Techniques, and Applications in Microsoft Excel with XLMiner, 2nd ed., Wiley, 2010
(ISBN 978-0-470-52682-8).
8. James, G., Witten, D., Hastie, T., and Tibshirani, R., An Introduction to Statistical Learning with
Applications in R, Springer, 2013 (ISBN 978-1-4614-7138-7).
For something between #2 and #7 or #8 you might look at:
9. Tan, P., Steinbach, M., Kumar, V., Introduction to Data Mining, Addison-Wesley, 2006 (ISBN
9780321321367).
Some applications of the material in this class to finance are contained in:
10. Ruppert, D., Statistics and Data Analysis for Financial Engineering, Springer, 2011 (ISBN
978-1-4419-7786-1). Online through the MIT Libraries website.
For use of the bootstrap in finance see:
11. Michaud, Richard and Michaud, Robert, Efficient Asset Management: A Practical Guide to Stock
Portfolio Optimization and Asset Allocation, 2nd ed., Oxford University Press, 2008 (ISBN 9780195331912).
Date (L#)   Topics                                                Reading

Data Analysis and Statistical Inference

Feb.
2 T         No recitation
3 W (1)     What statisticians do and how they think?             Notes
8 M (2)     Sampling and statistical distributions                R 6, 7.1-7.3, 7.5-7.6
9 T         Rec.: Computing: graphics                             R 9.8-9.9, 10.2.3, 10.3, 10.6-10.8
10 W (3)    Estimation, confidence intervals, and the bootstrap   R 8.1, 8.3-8.5, 8.7, 8.9, 10.4.6
15 M        MIT vacation, class moved to Tuesday
16 T (4)    Hypothesis testing; likelihood ratios;                R 9.1-9.6, 9.10, 8.2,
            goodness-of-fit tests; permutation tests              4.6 (pages 161-163)
16 T        No recitation since Monday classes
17 W (5*)   Bayesian inference, MCMC, and Gibbs sampling          R 3.5.2 (pages 94, 95), 8.6
22 M (6)    Linear regression and smoothing                       R 4.4.2, 14.1-14.7
23 T        Rec.: Estimation and testing; Bayes methods
24 W (7)    Regression diagnostics; residuals                     Notes, R 10.4.2-10.4.3,
            and influential data                                  10.5
29 M (8)    Regression diagnostics; robust regression             Notes, R 10.4.4-10.4.5,
            and collinearity                                      14.8

March
1 T         Rec.: Regression and diagnostics
2 W (9*)    Comparing two samples; non-parametric methods.        R 11.1-11.5
            Experimental design
7 M (10)    Analysis of categorical data                          R 13.1-13.7
8 T         Rec.: Two samples and categorical data
9 W (11)    Analysis of variance; approximate methods             R 12.1-12.4, 4.6,
                                                                  H 18.7.1-18.7.2
14 M (12)   Guest lecture and/or software tutorials (SIP week)
15 T        Recitation
16 W (13*)  Guest lecture and/or software tutorials (SIP week)
21 M        No class. Spring Vacation
22 T        No recitation. Spring Vacation
23 W        No class. Spring Vacation

Data Mining

28 M (14)   Learning from Data                                    H 1, 2
29 T        Rec.: ANOVA and data-mining computing
30 W (15*)  Model assessment                                      H 7.1-7.7, 7.10-7.11

April
4 M (16)    Regression selection: Ridge, PCR, PLS, LAR            H 3.1-3.6, 3.9, 18.1
5 T         Rec.: Regression selection
6 W (17)    Discriminant analysis; Logistic regression            H 4.1-4.4, 18.3.2, 18.4,
                                                                  18.4.1
11 M (18*)  Generalized additive models and trees: CART           H 9.1-9.2
12 T        Rec.: Classification, logistic reg., and CART
13 W (19)   Examination handed out and due at 4pm on April 16
            (open book)
            No lecture and can start exam during class.
18 M        Vacation, no class
19 T        Vacation, no recitation
20 W (20)   Support vector machines; support vector               H 4.5, 12.1-12.2, 12.3.1-12.3.2,
            regression                                            12.3.6, 12.3.8, 18.3.3
            Project Proposal Due
25 M (21*)  Neural networks                                       H 11.1, 11.3-11.8, 11.10
26 T        Rec.: Midterm solutions, SVM
27 W (22)   Cluster analysis: k-means, hierarchical clustering,   H 13.1-13.2 (omit 13.2.2-13.2.3),
            biclustering                                          13.3 (omit 13.3.3), 14.3 (omit 14.3.9)

May
2 M (23)    Bagging and boosting; AdaBoost, random forests;       H 8.5-8.9, 10.1-10.14,
            Expectation and maximization (EM)                     11.9, 15.1-15.3
3 T         Rec.: Clustering and ensemble methods
4 W (24*)   Collaborative filtering and affinity analysis         H 14.2
9 M (25)    Catch-up and selected project presentations
10 T        Rec.: Affinity analysis and project consultations
11 W (26)   Selected project presentations
            Project report due

* Denotes tentative homework due dates.
11/30/2015