Stony Brook University
CSE 51d Data Science Fundamentals
Tentative Syllabus

Lecture 1: Introduction
• Introduction to Data Science
• Big Data products
• Scoping Projects, Asking good questions
• The Data Mining process
• Course introduction

Lecture 2: Data Preparation
• Basic Data Types
• Data Analytics Major Building Blocks: A Bird's Eye View
• Data Collection, Cleaning, Integration, Storage

Lecture 3-4: Statistics
• Statistics basics, Experiment design, Pitfalls
• Observational and Longitudinal studies
• Sampling
• Probability Distributions
• Hypothesis Testing
• Significance

Lecture 5: Optimization
• Convexity, Convex functions
• Convex Optimization: unconstrained/constrained
• Gradient Descent and variants

Lecture 6-8: Statistical Learning
• Statistical Models and Likelihood, Likelihood Principle, MLE foundations
• Machine Learning Concepts & Tasks: Classification, Regression, Clustering, Dimensionality Reduction
• Machine Learning: Specific algorithms

Lecture 9-10: Visualization
• Visualization fundamentals
• Charts, Graphs, Infographics
• Interactive Visualization
• Summarization

Lecture 11-13: Data Mining
• Data Mining Concepts & Tasks: Association Rules, Similarity Search, Cluster Analysis, Outlier Analysis
• Data Mining: Specific algorithms

Lecture 14-17: Data: Unstructured vs. Structured
• Time-Series Analysis
• Text Mining/Information Retrieval
• Network Analysis
• Spatial Data Mining

Lecture 18-19: Matrix Methods
• Matrix Factorization: Models and Algorithms
• Matrix-Vector Product & Applications

Lecture 20-21: Computing at Scale
• Memory, Parallelization, Map-Reduce
• Hadoop, Pig, HBase, Hive
• Spark, Spark SQL

Lecture 22: Data Science in the Real-World
• Data Journalism
• Provenance, Privacy, Ethics, Governance

Books:
• Data Mining – The Textbook http://link.springer.com/book/10.1007/978-3-319-14142-8 (FREE)
• Mining Massive Datasets http://mmds.org/ (FREE)
• The Elements of Statistical Learning http://statweb.stanford.edu/~tibs/ElemStatLearn/
• Doing Data Science http://columbiadatascience.com/doing-data-science/
• Data Science for Business http://www.data-science-for-biz.com/ (FREE)
• An Introduction to Information Retrieval http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf
• The Visual Display of Quantitative Information http://www.edwardtufte.com/tufte/books_vdqi
• Convex Optimization http://web.stanford.edu/~boyd/cvxbook/