ABET Course Syllabus

Course Title: Introduction to Data Science
Course Number: CS 461
Total Credit: 4
Coordinator: Russ Abbott
Contact Hours: 4 hours/week

Course Information
This course is an elective for both the BS and MS programs.

a) Catalog Description
Tools and techniques for extracting information from typically massive amounts of data and then visualizing the results. Lecture 3 hours, recitation/activity 1 hour.

b) Prerequisite
CS 312 and at least one of the programming paradigms courses.

Course Goals
At the end of the course, students are able to:
• Understand and identify appropriate uses of relational and non-relational databases.
• Use data mining tools such as MapReduce (Hadoop).
• Understand the issues involved in statistical modeling, including, for example, overfitting.
• Understand and use basic statistical modeling algorithms: nearest neighbor, decision trees, regression, k-means, and multi-dimensional scaling.
• Design and implement intuitive ways to visualize data.

These course goals contribute to the success of Student Learning Outcomes (SLOs) 1, 2, 3, 5, 7, and 9:
• SLO1. Students will be able to apply concepts and techniques from computing and mathematics to both theoretical and practical problems.
• SLO2. Students will be able to demonstrate fluency in at least one programming language and acquaintance with at least three more.
• SLO3. Students will have a strong foundation in the design, analysis, and application of many types of algorithms.
• SLO5. Students will have the training to analyze problems and identify and define the computing requirements appropriate to their solutions.
• SLO7. Students will be able to communicate effectively orally and in writing.
• SLO9. Students will have the ability to analyze the local and global impact of computing on individuals and society.
Major Topics Covered in the Course:

Part 0: Introduction
• Examples, data science articulated, history and context, technology landscape

Part 1: Data Manipulation, at Scale
• Databases and the relational algebra
• Parallel databases, parallel query processing, in-database analytics
• MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages
• Key-value stores and NoSQL; tradeoffs of SQL and NoSQL
• Entity resolution, record linkage, data cleaning

Part 2: Analytics
• Basic statistical modeling, experiment design, introduction to machine learning, overfitting
• Supervised learning: overview, simple nearest neighbor, decision trees/forests, regression
• Unsupervised learning: k-means, multi-dimensional scaling
• Graph Analytics: PageRank, community detection, recursive queries, iterative processing
• Text Analytics: latent semantic analysis
• Collaborative Filtering: slope-one

Part 3: Communicating Results
• Visualization, data products, visual data analytics
• Provenance, privacy, ethics, governance

Recitation sections
Hands-on activities are critical components of computer science courses that have significant programming components. Each week students do a project related to the week’s material. During the recitation section, students describe and explain their work. Explaining what one has done helps develop a deeper understanding of it. Besides pushing them to deepen their understanding, the explanation requirement helps students develop presentation skills they will need after graduation.

Textbook
Howe, Bill (2013) Introduction to Data Science, Coursera

References
Abu-Mostafa, Yaser, Malik Magdon-Ismail, and Hsuan-Tien Lin (2012) Learning from Data. AML Book.
Barber, David (2012) Bayesian Reasoning and Machine Learning. Cambridge University Press.
Downey, Allen B. (2013) Think Bayes. O’Reilly Media.
Foreman, John W. (2013) DataSmart: Using Data Science to Transform Information into Insight. Wiley.
Han, Jiawei, Micheline Kamber, and Jian Pei (2011) Data Mining: Concepts and Techniques. Morgan Kaufmann.
Murphy, Kevin P. (2012) Machine Learning: A Probabilistic Perspective. MIT Press.
O’Neil, Cathy and Rachel Schutt (2013) Doing Data Science: Straight Talk from the Frontline. O’Reilly Media.
Rajaraman, Anand and Jeffrey David Ullman (2012) Mining of Massive Datasets. Cambridge University Press.
Ratner, Bruce (2011) Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data. CRC Press.
Richert, Willi and Luis Pedro Coelho (2013) Building Machine Learning Systems with Python. Packt Publishing.
Siegel, Eric (2013) Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Wiley.
Stone, James V. (2013) Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis. Sebtel Press.
Witten, Ian H., Eibe Frank, and Mark A. Hall (2011) Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

Assessment
[(i) Chengyu and I will send you the list of courses to which this section is applicable. (ii) We will include the necessary assignments/projects/rubrics that will be applied in this course and that give the data for the direct measures described in the assessment plan.]

Academic Integrity
Cheating will not be tolerated. Anyone cheating or helping someone else cheat will receive a grade of F for the course and will be reported to the proper authorities.

ADA Statement
Reasonable accommodation will be provided to any student who is registered with the Office of Students with Disabilities and requests needed accommodation.
