ABET Course Syllabus
Course Title: Introduction to Data Science
Course Number: CS 461
Total Credit:
Russ Abbott
Contact Hours: 4 hours/week
Course Information
This course is an elective for both the BS and MS programs.
a) Catalog Description
Tools and techniques for extracting information from typically massive amounts of
data and then visualizing the results. Lecture 3 hours, recitation/activity 1 hour.
b) Prerequisite
CS 312 and at least one of the programming paradigms courses.
Course Goals
At the end of the course, students are able to
Understand and identify appropriate uses of relational and non-relational databases.
Use data mining tools such as MapReduce (Hadoop).
Understand the issues involved in statistical modeling, including, for example, overfitting.
Understand and use basic statistical modeling algorithms such as nearest neighbor,
decision trees, regression, k-means, and multi-dimensional scaling.
Design and implement intuitive ways to visualize data.
These course goals contribute to the success of Student Learning Outcomes (SLOs)
1, 2, 3, 5, 7, and 9:
SLO1. Students will be able to apply concepts and techniques from computing
and mathematics to both theoretical and practical problems.
SLO2. Students will be able to demonstrate fluency in at least one
programming language and acquaintance with at least three more.
SLO3. Students will have a strong foundation in the design, analysis, and
application of many types of algorithms.
SLO5. Students will have the training to analyze problems and identify and
define the computing requirements appropriate to their solutions.
SLO7. Students will be able to communicate effectively orally and in writing.
SLO9. Students will have the ability to analyze the local and global impact of
computing on individuals and society.
Major Topics Covered in the Course:
Part 0: Introduction
 Examples, data science articulated, history and context, technology landscape
Part 1: Data Manipulation, at Scale
 Databases and the relational algebra
 Parallel databases, parallel query processing, in-database analytics
 MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages
 Key-value stores and NoSQL; tradeoffs of SQL and NoSQL
 Entity resolution, record linkage, data cleaning
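As an illustration of the MapReduce topic in Part 1, the following is a minimal single-process sketch of the map, shuffle, and reduce steps applied to word counting. It does not use Hadoop or any cluster; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data science"]))
```

In a real MapReduce system the map and reduce calls would run in parallel on different machines; the sketch only shows how the work is divided between the phases.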
Part 2: Analytics
 Basic statistical modeling, experiment design, introduction to machine learning, overfitting
 Supervised learning: overview, simple nearest neighbor, decision trees/forests,
 Unsupervised learning: k-means, multi-dimensional scaling
 Graph Analytics: PageRank, community detection, recursive queries, iterative
 Text Analytics: latent semantic analysis
 Collaborative Filtering: slope-one
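As an illustration of the k-means topic in Part 2, the following is a minimal pure-Python sketch of the standard iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its cluster. The function names, the fixed random seed, and the sample points are assumptions made for this example, not course material.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, clusters) after at most `iterations` rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # no change means convergence
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(1, 1), (1, 2), (2, 1), (50, 50), (50, 51), (51, 50)]
    centers, groups = k_means(data, k=2)
    print(centers)
    print(groups)
```

The result depends on the initial centroids, so practical implementations usually run several random restarts and keep the best clustering.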
Part 3: Communicating Results
 Visualization, data products, visual data analytics
 Provenance, privacy, ethics, governance
Recitation sections
Hands-on activities are critical components of computer science courses that have
significant programming components. Each week students do a project related to the
week’s material. During the recitation section, students describe and explain their
work. Explaining what one has done helps develop a deeper understanding of it.
Besides pushing them to deepen their understanding, the explanation requirement
helps students develop presentation skills they will need after graduation.
Howe, Bill (2013) Introduction to Data Science, Coursera
Abu-Mostafa, Yaser, Malik Magdon-Ismail, and Hsuan-Tien Lin (2012) Learning
from Data. AMLBook.
Barber, David (2012) Bayesian Reasoning and Machine Learning. Cambridge
University Press.
Downey, Allen B. (2013) Think Bayes. O’Reilly Media.
Foreman, John W. (2013) Data Smart: Using Data Science to Transform Information
into Insight. Wiley.
Han, Jiawei, Micheline Kamber, and Jian Pei (2011) Data Mining: Concepts and
Techniques. Morgan Kaufmann.
Murphy, Kevin P. (2012) Machine Learning: A Probabilistic Perspective. MIT Press.
O’Neil, Cathy and Rachel Schutt (2013) Doing Data Science: Straight Talk from the
Frontline. O’Reilly Media.
Rajaraman, Anand and Jeffrey David Ullman (2012) Mining of Massive Datasets.
Cambridge University Press.
Ratner, Bruce (2011) Statistical and Machine-Learning Data Mining: Techniques for
Better Predictive Modeling and Analysis of Big Data. CRC Press.
Richert, Willi and Luis Pedro Coelho (2013) Building Machine Learning Systems with
Python. Packt Publishing.
Siegel, Eric (2013) Predictive Analytics: The Power to Predict Who Will Click, Buy,
Lie, or Die. Wiley.
Stone, James V. (2013) Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis.
Sebtel Press.
Witten, Ian H., Eibe Frank, and Mark A. Hall (2011) Data Mining: Practical Machine
Learning Tools and Techniques. Morgan Kaufmann.
[(i) Chengyu and I will send you the list of courses to which this section applies. (ii) We
will include the assignments, projects, and rubrics used in this course that provide the
data for the direct measures described in the assessment plan.]
Academic Integrity
Cheating will not be tolerated. Anyone cheating or helping someone else cheat will
receive a grade of F for the course and will be reported to the proper authorities.
ADA Statement
Reasonable accommodation will be provided to any student who is registered
with the Office of Students with Disabilities and requests needed