Spring 2016 (Welsch)
Statistical Learning and Data Mining 15.077 (ESD.753J)
Tentative Schedule

Texts:

1. Rice, J.A., Mathematical Statistics and Data Analysis (with CD Data Sets), 3rd ed., Duxbury, 2007 (ISBN 978-0-534-39942-9). [R]
2. Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed., Springer, 2009 (ISBN 978-0-387-84857-0). [H] This text is available on the Internet for about $25.

If #1 seems a little hard to follow in the beginning, you might try:

3. Tamhane, A.C. and Dunlop, D.D., Statistics and Data Analysis: From Elementary to Intermediate, Pearson, 2000 (ISBN 978-0-13-744426-7).

For material relating to the bootstrap and resampling approach to statistics, look at:

4. Chihara, L. and Hesterberg, T., Mathematical Statistics with Resampling and R, Wiley, 2011 (ISBN 978-1-118-02985-5).
5. Lock, R. et al., Statistics: Unlocking the Power of Data, Wiley, 2013 (ISBN 978-0-470-60187-7).

If you need proofs of some theorems and a more advanced discussion than #1, try:

6. Casella, G. and Berger, R.L., Statistical Inference, 2nd ed., Cengage Learning, 2002 (ISBN 978-0-534-24312-8).

The Hastie et al. book varies in level. Resources at a lower level are:

7. Shmueli, G., Patel, N., and Bruce, P., Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Excel with XLMiner, 2nd ed., Wiley, 2010 (ISBN 978-0-470-52682-8).
8. James, G., Witten, D., Hastie, T., and Tibshirani, R., An Introduction to Statistical Learning with Applications in R, Springer, 2013 (ISBN 978-1-4614-7138-7).

For something between #2 and #7 or #8, you might look at:

9. Tan, P., Steinbach, M., and Kumar, V., Introduction to Data Mining, Addison-Wesley, 2006 (ISBN 978-0-321-32136-7).

Some applications of the material in this class to finance are contained in:

10. Ruppert, D., Statistics and Data Analysis for Financial Engineering, Springer, 2011 (ISBN 978-1-4419-7786-1). Available online through the MIT Libraries website.

For use of the bootstrap in finance, see:

11. Michaud, Richard and Michaud, Robert, Efficient Asset Management: A Practical Guide to Stock Portfolio Optimization and Asset Allocation, 2nd ed., Oxford University Press, 2008 (ISBN 978-0-19-533191-2).

Schedule (date, lecture number (L#), topic, reading):

Data Analysis and Statistical Inference

Feb. 2 T: No recitation.
Feb. 3 W (1): What statisticians do and how they think. Reading: Notes.
Feb. 8 M (2): Sampling and statistical distributions. Reading: R 6, 7.1-7.3, 7.5-7.6.
Feb. 9 T: Rec.: Computing: graphics. Reading: R 9.8-9.9, 10.2.3, 10.3, 10.6-10.8.
Feb. 10 W (3): Estimation, confidence intervals, and the bootstrap. Reading: R 8.1, 8.3-8.5, 8.7, 8.9, 10.4.6.
Feb. 15 M: MIT vacation; class moved to Tuesday.
Feb. 16 T (4): Hypothesis testing; likelihood ratios; goodness-of-fit tests; permutation tests. Reading: R 9.1-9.6, 9.10, 8.2, 4.6 (pages 161-163).
Feb. 16 T: No recitation, since Monday classes meet today.
Feb. 17 W (5*): Bayesian inference, MCMC, and Gibbs sampling. Reading: R 3.5.2 (pages 94-95), 8.6.
Feb. 22 M (6): Linear regression and smoothing. Reading: R 4.4.2, 14.1-14.7.
Feb. 23 T: Rec.: Estimation and testing; Bayes methods.
Feb. 24 W (7): Regression diagnostics; residuals and influential data. Reading: Notes; R 10.4.2-10.4.3, 10.5.
Feb. 29 M (8): Regression diagnostics; robust regression and collinearity. Reading: Notes; R 10.4.4-10.4.5, 14.8.
March 1 T: Rec.: Regression and diagnostics.
March 2 W (9*): Comparing two samples; non-parametric methods; experimental design. Reading: R 11.1-11.5.
March 7 M (10): Analysis of categorical data. Reading: R 13.1-13.7.
March 8 T: Rec.: Two samples and categorical data.
March 9 W (11): Analysis of variance; approximate methods. Reading: R 12.1-12.4, 4.6; H 18.7.1-18.7.2.
March 14 M (12): Guest lecture and/or software tutorials (SIP week).
March 15 T: Recitation.
March 16 W (13*): Guest lecture and/or software tutorials (SIP week).
March 21 M: No class. Spring vacation.
March 22 T: No recitation. Spring vacation.
March 23 W: No class. Spring vacation.

Data Mining

March 28 M (14): Learning from Data. Reading: H 1, 2.
March 29 T: Rec.: ANOVA and data-mining computing.
March 30 W (15*): Model assessment. Reading: H 7.1-7.7, 7.10-7.11.
April 4 M (16): Regression selection: ridge, PCR, PLS, LAR. Reading: H 3.1-3.6, 3.9, 18.1.
April 5 T: Rec.: Regression selection.
April 6 W (17): Discriminant analysis; logistic regression. Reading: H 4.1-4.4, 18.3.2, 18.4, 18.4.1.
April 11 M (18*): Generalized additive models and trees: CART. Reading: H 9.1-9.2.
April 12 T: Rec.: Classification, logistic regression, and CART.
April 13 W (19): Examination handed out (open book), due at 4 pm on April 16. No lecture; the exam may be started during class.
April 18 M: Vacation, no class.
April 19 T: Vacation, no recitation.
April 20 W (20): Support vector machines; support vector regression. Project proposal due. Reading: H 4.5, 12.1-12.2, 12.3.1-12.3.2, 12.3.6, 12.3.8, 18.3.3.
April 25 M (21*): Neural networks. Reading: H 11.1, 11.3-11.8, 11.10.
April 26 T: Rec.: Midterm solutions; SVM.
April 27 W (22): Cluster analysis: k-means, hierarchical clustering, biclustering. Reading: H 13.1-13.2 (omit 13.2.2-13.2.3), 13.3 (omit 13.3.3), 14.3 (omit 14.3.9).
May 2 M (23): Bagging and boosting; AdaBoost, random forests; expectation-maximization (EM). Reading: H 8.5-8.9, 10.1-10.14, 11.9, 15.1-15.3.
May 3 T: Rec.: Clustering and ensemble methods.
May 4 W (24*): Collaborative filtering and affinity analysis. Reading: H 14.2.
May 9 M (25): Catch-up and selected project presentations.
May 10 T: Rec.: Affinity analysis and project consultations.
May 11 W (26): Selected project presentations. Project report due.

* Denotes tentative homework due dates.