DSC 421 Big Data

DSC 421 Big Data (3, 0, 3). Manipulation, storage, and analysis of large-scale data; large-scale distributed filesystems such as HDFS (Hadoop Distributed File System); large-scale databases, including SQL and NoSQL; MapReduce algorithm design.

Sample Texts:
Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press, 2011.
Tom White, Hadoop: The Definitive Guide, 3rd Edition, O’Reilly Media, 2012.

Prerequisites:
DSC 411 (Data Mining)
CSC 450 (Database Management Systems)

Objectives and Outcomes:
1. Explain how large-scale distributed filesystems work.
2. Analyze and select appropriate database solutions from SQL and NoSQL options.
3. Design and implement algorithms for MapReduce systems.
4. Apply data mining techniques to big data sets.

Topics:
1. What is Big Data?
2. Large-Scale Distributed Filesystems
3. Developing MapReduce Algorithms
4. How MapReduce Works
5. Locality Sensitive Hashing
6. Link Analysis
7. Analysis of Massive Graphs
8. Large-Scale Machine Learning
9. Scaling Up Relational Databases
10. NoSQL Databases
11. Mining Data Streams
12. Big Data Case Studies

Coursework:
Programming assignments
Team projects
Presentations
Midterm and final exams