CSCI5570
Large Scale Data Processing Systems
Course Overview
Instructor: Prof. James Cheng
Course Webpage
Check course webpage regularly
http://www.cse.cuhk.edu.hk/~jcheng/5570.html
Remark: I prefer to put the course webpage under
my own directory to make off-campus access easier.
Topics Overview
Topic                                         Tentative Schedule
Distributed Data Analytics Systems            Weeks 1-3
Distributed Database Systems                  Weeks 4-5
NoSQL                                         Weeks 6-7
NewSQL                                        Weeks 8-9
Distributed Graph Processing Systems          Weeks 9-10
Distributed Data Storage Systems              Weeks 11-12
Distributed Stream Processing Systems         Weeks 12-13
Other Large Scale Data Processing Systems     ???
Distributed Data Analytics Systems
Focus on state-of-the-art big data platforms, widely
adopted by industry (e.g., Hadoop, Spark) or best in
research (e.g., Naiad, Husky)
Fundamental concepts of big data analytics systems
Applications (too ad hoc to teach them all, but you can try
them out with the course project):
Data collection, data extraction, data cleaning, …
Machine learning (e.g., classification, clustering, recommendation,
feature selection, dimensionality reduction …)
OLAP, data cube
Data mining
Graph analytics (including social network analysis)
Similarity search (e.g., scalable locality sensitive hashing)
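To make the last application concrete, here is a minimal sketch of locality sensitive hashing for text similarity, using MinHash signatures with banding. It is an illustration under stated assumptions, not material from the course: the function names, the shingle size, and the band/row settings are all hypothetical choices, and a scalable version would distribute the buckets across machines.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def shingles(text, k=3):
    """Return the set of character k-grams of text (assumes len(text) >= k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def _hash(seed, item):
    """A seeded 64-bit hash; each seed acts as a different hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(items, num_hashes=32):
    """Keep the minimum hash per seed; similar sets agree on many minima."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]

def candidate_pairs(docs, bands=16, rows=2):
    """Bucket each band of the signature; docs sharing a bucket are candidates."""
    buckets = defaultdict(set)
    for name, text in docs.items():
        sig = minhash_signature(shingles(text), bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs
```

Only documents that collide in at least one band are compared in full, which is what avoids the all-pairs comparison on large collections. More rows per band makes a collision require closer agreement between signatures, so fewer pairs become candidates.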
Distributed Database Systems
Fundamental concepts of distributed database systems,
prerequisite to NoSQL and NewSQL, as well as other
distributed data processing systems
Parallel query processing
Distributed query processing
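As a small illustration of parallel query processing, the sketch below evaluates a `GROUP BY key, SUM(value)` query by hash-partitioning the rows, aggregating each partition independently, and merging the partial results. It is a sketch under assumptions: threads stand in for worker nodes, and all function names are hypothetical rather than taken from any particular system.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_by_key(rows, num_partitions):
    """Hash-partition (key, value) rows so equal keys land in one partition."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[zlib.crc32(key.encode()) % num_partitions].append((key, value))
    return parts

def local_sum(rows):
    """Per-worker step: an ordinary single-node GROUP BY key, SUM(value)."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)

def parallel_group_sum(rows, num_partitions=4):
    """Partition, aggregate each partition in its own worker, then merge."""
    parts = partition_by_key(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(local_sum, parts))
    # Partitions hold disjoint key sets, so merging is a plain union.
    result = {}
    for partial in partials:
        result.update(partial)
    return result
```

Because every row with a given key is routed to the same partition, no worker needs to see another worker's data, and the final merge never has to combine two partial sums for the same key.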
NoSQL/NewSQL
Relational databases are the foundation of
western civilization, but now is the era of NoSQL
databases
NoSQL databases, such as MongoDB,
Cassandra, CouchDB, etc., are rapidly taking
large shares of the market from traditional
vendors such as Oracle
Must learn for big data analytics
NewSQL databases try to combine the pros of
both traditional DBMS and NoSQL
Distributed Graph Processing Systems
Graph data: web graphs, online social networks,
mobile communication networks, financial
networks, biological networks, neural networks, …
Distributed systems that make the analysis of
these large scale graphs/networks possible
Key techniques and algorithms for large scale
graph data processing
Distributed Data Storage Systems
How to store massive volumes of different types
of data, retrieve them, and update them
efficiently?
How to handle consistency issues?
How to handle availability issues?
Distributed Stream Processing Systems
Streaming data has become common today, e.g.,
tweets, news feeds, …
How to analyze such massive high-speed data in
real time?
Key techniques and applications
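One basic technique for real-time analysis is to keep an aggregate over a sliding time window and update it incrementally as events arrive. The sketch below counts items (say, hashtags in a tweet stream) over the last `window` seconds. It is a single-machine illustration under assumptions: the class name is hypothetical and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Counts items seen in the last `window` seconds of an event stream."""

    def __init__(self, window):
        self.window = window
        self.events = deque()    # (timestamp, item) pairs, oldest first
        self.counts = Counter()  # item -> occurrences inside the window

    def add(self, timestamp, item):
        """Ingest one event, then evict everything older than the window."""
        self.events.append((timestamp, item))
        self.counts[item] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # The window covers the half-open interval (now - window, now].
        while self.events and self.events[0][0] <= now - self.window:
            _, old_item = self.events.popleft()
            self.counts[old_item] -= 1
            if self.counts[old_item] == 0:
                del self.counts[old_item]

    def top(self, n):
        """Return the n most frequent items currently in the window."""
        return self.counts.most_common(n)
```

Each event is appended once and evicted once, so the cost per event stays constant on average no matter how long the stream runs. Memory grows with the number of events inside the window, not with the length of the stream.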
Reading List
A list of papers for each topic (except for the older topics
such as Relational Database Systems and Distributed
Database Systems) will be released weekly
Reference
Database Systems – The Complete Book
•Second edition (Prentice Hall)
•Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom
Reference
Database Management Systems
•Third edition
•Raghu Ramakrishnan, Johannes Gehrke
Assessment Criteria
Short (Bring-Home) Quizzes: 50%
Select 5 topics and read papers from the reading list for
those topics.
Select one paper for each of the 5 topics and write a review
of it. The review should include a summary of the paper,
3 strong points and 3 weak points, and more detailed
comments and suggestions.
Make an appointment with me to discuss these 5
papers. You should show me that you have a deep
understanding of the work.
Assessment Criteria
Course Project: 50%
Done either individually or in a group of two students
Students may choose to do one of the following:
•develop an application, a library package, or a sub-system based
on an existing system (e.g., Spark, Husky, Hadoop, Storm)
•improve an existing system (by either improving its performance
in some aspect or adding new functionality)
•develop a new system (prototype) for large scale data
processing
Students have high flexibility to explore different things, but
must first get our approval of their project proposal
(it must be finalized by 1 p.m. on Sept 29, so talk to us before then)
Some suggested projects will be posted after the first lab