Stony Brook University
CSE 51d Data Science Fundamentals
Tentative Syllabus
Lecture 1: Introduction
Introduction to Data Science
Big Data products
Scoping Projects, Asking good questions
The Data Mining process
Course introduction
Lecture 2: Data Preparation
Basic Data Types
Data Analytics Major Building Blocks: A Bird's Eye View
Data Collection, Cleaning, Integration, Storage
Lecture 3-4: Statistics
Statistics basics, Experiment design, Pitfalls
Observational and Longitudinal studies
Sampling
Probability Distributions
Hypothesis Testing
Significance
Lecture 5: Optimization
Convexity, Convex functions
Convex Optimization: unconstrained/constrained
Gradient Descent and variants
Lecture 6-8: Statistical Learning
Statistical Models and Likelihood, Likelihood Principle, MLE foundations
Machine Learning Concepts & Tasks:
Classification, Regression, Clustering, Dimensionality Reduction
Machine Learning: Specific algorithms
Lecture 9-10: Visualization
Visualization fundamentals
Charts, Graphs, Infographics
Interactive Visualization
Summarization
Lecture 11-13: Data Mining
Data Mining Concepts & Tasks:
Association Rules, Similarity Search, Cluster Analysis, Outlier Analysis
Data Mining: Specific algorithms
Lecture 14-17: Data: Unstructured vs. Structured
Time-Series Analysis
Text Mining/Information Retrieval
Network Analysis
Spatial Data Mining
Lecture 18-19: Matrix Methods
Matrix Factorization: Models and Algorithms
Matrix-Vector Product & Applications
Lecture 20-21: Computing at Scale
Memory, Parallelization, MapReduce
Hadoop, Pig, HBase, Hive
Spark, Spark SQL
Lecture 22: Data Science in the Real World
Data Journalism
Provenance, Privacy, Ethics, Governance
Books:
• Data Mining: The Textbook
  http://link.springer.com/book/10.1007/978-3-319-14142-8
• (FREE) Mining of Massive Datasets
  http://mmds.org/
• (FREE) The Elements of Statistical Learning
  http://statweb.stanford.edu/~tibs/ElemStatLearn/
• Doing Data Science
  http://columbiadatascience.com/doing-data-science/
• Data Science for Business
  http://www.data-science-for-biz.com/
• (FREE) Introduction to Information Retrieval
  http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf
• The Visual Display of Quantitative Information
  http://www.edwardtufte.com/tufte/books_vdqi
• Convex Optimization
  http://web.stanford.edu/~boyd/cvxbook/