Statistical Computing

Duration: 216 hours (6 ECTS)
Language: English
Entry requirements: English level B1 (Common European Framework of Reference for Languages); BSc degree in Physics, Mathematics or Computer Science. A degree in Biology is also acceptable.

About the course
The course provides an overview of, and an introduction to, up-to-date methodology and techniques for non-linear statistical analysis of multidimensional data. Some methods and approaches are discussed in detail; computational experiments and lab projects devoted to real data analysis support the class work.

Outline of content
1. A short introductory sub-course on the foundations of probability theory and some classical issues of statistics.
2. Bulky data in multidimensional space; the idea of a metric space; a brief outline of the geometry and topology of metric spaces.
3. Multidimensional data visualization: what can one see, and how? What usually remains invisible?
4. Principal component analysis. Factors of divergence.
5. Clustering. What is a clustering strategy; the curse of dimensionality.
6. Hierarchical clustering. A choice of rules to control hierarchical clustering.
7. K-means. How to choose a proper value for K? Discernibility of classes: when to stop.
8. Elastic map technique to visualize multidimensional data and (nonlinear) clustering.

Educator
Michael Sadovsky, Doctor Habilitatus in Biophysics, Leading Researcher at ICM SB RAS; quarter-time Professor, Department of Applied Mathematics and Computer Security, Siberian Federal University
E-mail: [email protected]

Course description
The tremendous (faster than exponential) growth in the volume of data that experts must analyze raises the problem of developing and implementing relevant and adequate methods and techniques for the task. This demand drives significant progress in pure and applied mathematics and in related disciplines (for example, software design and algorithmic solutions).
Hence, the course introduces the problem of bulky data analysis and explores, on a reasonable scale, the relevant issues in mathematics and related topics.

Course aims
The aim of the course is to present an introduction to up-to-date ideas, approaches and techniques for multidimensional data analysis. The course also aims to give students a critical understanding of current technical implementations of some of these methods and techniques, and to train students in their application.

Objectives
The objectives of the course are:
1) to give students an understanding of the concept of data analysis and visualization;
2) to provide students with up-to-date knowledge of some methods and techniques of clustering, data visualization, and extraction and retrieval of patterns of interdependence;
3) to provide students with a comprehensive understanding of the constraints, advantages and problem points of these methods;
4) to familiarize students with some software packages and toolkits used to put these methods into the practice of data analysis.

Learning outcomes
By the end of the course, students will be able
1) to identify and classify the main phenomena and basic peculiarities in multidimensional datasets, in order to select appropriate and most efficient methods of analysis;
2) to apply hierarchical classification, PCA, K-means, mean-shift, and the elastic map technique, where necessary;
3) to provide a sketch of an interpretation of the results of multidimensional data treatment.

Attendance Policy
Students are expected to attend classes regularly, since consistent attendance offers the most effective opportunity, open to all students, to gain command of the concepts and materials of the course. Absences for various reasons are nevertheless permissible, provided that students take a consultation and do the necessary class work at home (or on their own).
Such “hidden extramural” activity must not exceed a quarter of the total course time.

Assessments and Assessment Methods
The course assessment assignments will include (with the draft scheme of the student’s grade):
1. Short-response questionnaire, ≤ 10 % (exchangeable with item # 2);
2. Class participation, ≤ 15 % (exchangeable with item # 1);
3. Practically oriented class/home mini-projects, 35 %;
4. Oral examination (full course), 40 %.

Recommended Reading and Other Handy Skills (optional)
1) Computational Statistics (textbook). Springer, ISBN 978-0-387-98144-4. Basic reading: chapters 6, 7, 10, 12, 16.
2) Saito T., Yadohisa H. Data Analysis of Asymmetric Structures: Advanced Approaches in Computational Statistics. CRC Press, 2004. ISBN 9780824753986.
3) James G., Witten D., Hastie T., Tibshirani R. An Introduction to Statistical Learning. Springer, New York Heidelberg Dordrecht London, 2013. ISBN 978-1-4614-7137-0; ISBN 978-1-4614-7138-7 (eBook); DOI 10.1007/978-1-4614-7138-7.

Further reading (yet tentative)
1) Fukunaga K. Introduction to Statistical Pattern Recognition. Elsevier Inc., 1990. ISBN 978-0-08-047865-4.
2) Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Cambridge University Press, 2014. The book is available free of charge at http://www.mmds.org/#ver21 (and, happily, entirely officially!).
3) Aggarwal N., Aggarwal K. An Improved K-means Clustering Algorithm For Data Mining. LAP LAMBERT Academic Publishing, 2012. ISBN-13: 978-3659216657.
4) Wu J. Advances in K-means Clustering: A Data Mining Thinking (Springer Theses: Recognizing Outstanding Ph.D. Research). Springer-Verlag New York, LLC, 2012. ISBN-13: 9783642298066.
5) Jajuga K., Sokolowski A., Bock H.-H. (Eds.) Classification, Clustering, and Data Analysis. Springer, 2002. ISBN 3-540-43691-X.
6) Bishop C. Pattern Recognition and Machine Learning. Springer-Verlag New York, 2009. ISBN 978-0-387-31073-2.

A reasonable level of programming skill is welcome.
Special Features
Statistics is an important tool in various areas of science, ranging from biology to sociology. It covers concepts and methods for drawing inferences from empirical data with a given level of reliability and confidence. As a branch of mathematics, statistics has strong connections to applications and offers a chance to grow across a variety of specific fields of knowledge and expertise.
