SE 611723: Advanced DBMS
Shadi Aljawarneh
Course Description
Database management systems are standard tools that enable the storage and retrieval of data within modern
information systems. Units introducing database concepts are now an accepted part of most computer science courses.
These introductory units tend to concentrate on the use of relational database systems. This advanced module, in
contrast, deals with implementation aspects of relational systems and tests the candidates’ knowledge of current
enhancements to relational database systems, as well as object-oriented, RDF, and XML database systems.
Course Aims
The overall aims of this course are to help you:
1. Develop your knowledge and understanding of the underlying principles of Relational Database Management Systems
2. Build up your capacity to learn advanced DBMS features
3. Develop your competence in enhancing database models using distributed databases
4. Build up your capacity to implement and maintain an efficient database system using emerging trends.
Course Objectives
Upon completion of the course, you should be able to:
1. Describe the basic concepts of relational database design.
2. Explain database implementation and tools.
3. Describe SQL and the database system catalog.
4. Describe the process of database query processing and evaluation.
5. Discuss the concepts of transaction management.
6. Explain database security and authorization.
7. Describe the design of distributed databases and big data systems.
8. Know how to design with DB, XML and RDF.
9. Describe the basic concepts of data warehousing and data mining.
10. Discuss emerging database models, technologies and applications.
Reading material will consist primarily of research papers. All students will have to present a research paper of their
choice, either from the list below or other papers subject to instructor’s approval. There will also be two exams
(midsem/endsem), assignments, and a course project.
Anyone who does an exceptional course project that has the potential to be a publishable paper is eligible for a straight
Excellent grade. Otherwise the grading breakup would be midsem 30, endsem 40, project 20 and assignments plus
seminar presentation 10 (the breakup of these will depend on whether we have individual or joint seminars, which
depends on the final enrollment).
Assignments: To be decided.
Project: The project should be implementation oriented, although a literature survey is also acceptable as a project.
(You may still need to do some literature survey to figure out your project though.) Projects should be done in groups
of 2.
A basic project will take any of the papers we study in the course, or other related papers, and implement the
algorithms in the paper, and do a very basic performance study. However, I would expect most projects to improve
upon existing techniques.
A more advanced project would take a problem specification for which no solution is publicly available, figure out how
to solve it, and implement the solution.
Textbook (for background material only)
Database System Concepts, 6th Ed.
Avi Silberschatz, Hank Korth, and S. Sudarshan. McGraw Hill, 2010.
Database Design and Implementation
1. (Oct 13)
Part 1: Relational Databases.
Related papers, not required reading:



Prasan's full thesis
The Volcano Optimizer Generator: Extensibility and Efficient Search.
Goetz Graefe, William J. McKenna
ICDE 1993: 209-218
The Cascades Framework for Query Optimization.
Goetz Graefe
IEEE Data Eng. Bull. 18(3): 19-29 (1995)
2. (Oct 13)
Part 2: Database Design
3. (Oct 20)
Data Storage and Querying
Related papers, not required reading:


4. (Oct 27)
Optimizing Nested Queries with Parameter Sort Orders
Ravindra Guravannavar, Ramanujam H.S., S. Sudarshan
Talk (ppt)
Reducing Order Enforcement Cost in Complex Query Plans
Ravindra Guravannavar and S. Sudarshan, ICDE 2007
Tech Report (@arxiv.org)
Talk (ppt)
Database Security & Authorization
From the textbook “Advanced Database Management System” by the National Open University of Nigeria
Execution strategies for SQL subqueries
Mostafa Elhemali, Cesar A. Galindo-Legaria, Torsten Grabs, Milind Joshi
SIGMOD Conference 2007: 993-1004
(Talk from SIGMOD 07 (ppt))
Related papers, not required reading:
Query Processing for SQL Updates
Cesar A. Galindo-Legaria, Stefano Stefani, Florian Waas
SIGMOD Conference 2004: 844-849
Talk (ppt)
Adaptive Query Processing
5. (Nov 3)
Eddies: Continuously Adaptive Query Processing,
Avnur and Hellerstein, SIGMOD 2000.
(Talk taken from http://web.cs.wpi.edu/~cs561/s05/talks/eddy-sigmod00cs561.ppt)
(Adaptive Query Processing using Eddies (ppt) by Amol Deshpande) (Jan 25, 2011)
Talk (ppt)
Massively Parallel Data Management Systems (a.k.a. Big Data Systems)
Background reading: The parallel database chapter and the distributed database chapter from DB Concepts.
Slides: Chapter 18: Parallel Databases, and Chapter 19: Distributed Databases (plus 3PC, not available on book site)
6. (Nov 10, 17)
Unit 1: Object Oriented Database
Unit 2: Database and XML
Unit 3: Introduction To Data Warehousing
Unit 4: Introduction to Data Mining
7. (Nov 24)
Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber,
OSDI 2006
Video of talk by Jeff Dean: Local mp4 copy OR on video.google.com
Talk (ppt)
Related papers, not required reading:


8. (Nov 24)
Megastore: Providing Scalable, Highly Available Storage for Interactive
Services,
Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin,
James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim
Yushprakh
CIDR 2011
You can also read about the Google App Engine Datastore API, an API
in Python.
PNUTS: Yahoo!'s Hosted Data Serving Platform,
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein,
Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana
Yerneni.
VLDB (industry track) 2008.
VLDB Talk by Brian Cooper (ppt)
Related papers, not required reading:

9. (Nov 24)
Database implementation on S3 (Brantner et al., SIGMOD 2008)
Asynchronous view maintenance for VLSD databases
Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu
Ramakrishnan, SIGMOD 2009
Talk (odp) and (pdf)
Old talk from 2010 (ppt)
Related papers, not required reading:

10. (Nov 24)
The Megastore paper (see above), to understand how it does
asynchronous maintenance of indices.
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib,
Simon Weaver, and Jingren Zhou,
VLDB 2008
Related papers, not required reading:


Hive - a petabyte scale data warehouse using Hadoop,
A. Thusoo, J. S. Sarma, N. Jain, Shao Zheng, P. Chakka, Zhang Ning, S.
Antony, Liu Hao, and R. Murthy,
ICDE 2010
Pig Latin: A Not-So-Foreign Language for Data Processing
Talk (pptx)
Chris Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar and Andrew
Tomkins
SIGMOD 2008, Talk (ppt)
Week of 18-23:
Midsemester Exam
11. (Dec 1)
IR and DB
12 and 13. (Dec 8)
Guest lecture on HSearch by Abinasha Karana, Founder and CTO Bizosys,
Bangalore
Talk (pdf)
Keyword Searching and Browsing in Databases using BANKS
Gaurav Bhalotia, Charuta Nakhe, Arvind Hulgeri, Soumen Chakrabarti and S.
Sudarshan, ICDE 2002
Talk (ppt)
Related papers, not required reading:

Bidirectional Expansion For Keyword Search on Graph Databases,
Varun Kacholia, Shashank Pandit, Soumen Chakrabarti, S. Sudarshan,
Rushi Desai and Hrishikesh Karambelkar, VLDB 2005
(Talk: ppt)
Big Data (again)
14. (Dec 15)
Spanner: Google's Globally-Distributed Database
James C. Corbett et al., OSDI 2012
Talk (pptx) by Sagar Chordia
Column Stores
15. (Dec 22)
Column-stores vs. row-stores: how different are they really?
Daniel J. Abadi, Samuel Madden, Nabil Hachem:
SIGMOD Conference 2008: 967-980
Talk (pdf) (source files) by Subhro Bhattacharyya and Souvik Pal
Talk from 2010 and Talk from 2011 (ppt) by Paresh Modak and Souman Mandal.
See also VLDB 09 tutorial on column stores by Harizopoulos, Abadi and Boncz
Streaming Data
16. (Dec 29)
Monitoring Streams - A New Class of Data Management Applications,
Donald Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon
Lee, Greg Seidman, Michael Stonebraker, Nesime Tatbul, Stanley B. Zdonik
VLDB 2002: 215-226
Talk (pptx) by Joydip Datta and Debarghya Majumdar
You must also read this talk: (PODS 2002 talk by Motwani)
Related papers, not required reading



17. (Dec 29)
Big Data yet again (Self Study)
Talk (pptx) by Ajay Gupta, Vinit Deodhar
Aurora: A New Model and Architecture for Data Stream Management.
Abadi, D. J., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee,
S., Stonebraker, M., Tatbul, N., and Zdonik, S.
The VLDB Journal 12 (2003), 120-139.
Abadi, D., Ahmad, Y., Balazinska, M., Cetintemel, U., Cherniack, M.,
Hwang, J.-H., Lindner, W., Maskey, A. S., Rasin, A., Ryvkina, E., Tatbul,
N., Xing, Y., and Zdonik, S. The Design of the Borealis Stream
Processing Engine. In Proceedings of the 2nd Conference on Innovative
Data Systems Research (CIDR) (Jan. 2005), pp. 277-289.
Models and issues in data stream systems,
Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer
Widom, PODS 2002
(PODS 2002 talk by Motwani)
Physically Independent Stream Merging
Badrish Chandramouli, David Maier and Jonathan Goldstein
ICDE 2012
Talk (pdf) by Amol Bhangdiya and Pushkar Khadilkar
Declarative Data Processing (outside of databases)
18. (Jan 5)
19. (Jan 5)
Calvin: Fast Distributed Transactions for Partitioned Database Systems
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip
Shao, and Daniel J. Abadi.
SIGMOD 2012
Talk (pptx) by K. V. Mahesh and Abhishek Gupta
Declarative Networking
Boon Thau Loo, Tyson Condie, Minos Garofalakis, David E. Gay, Joseph M.
Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion
Stoica
CACM 52(11), Nov 2009
Talk (pptx) by Harsh Vardhan and Sandeep Joshi
Scalability for Virtual Worlds
Nitin Gupta, Alan J. Demers, Johannes Gehrke, Philipp Unterbrunner, Walker M. White
ICDE 2009
Talk (pptx) by Pratik Patre and Biplab Kar
Talk (ppt) by Siddharth Chinoy and Zibran Shaikh
Related papers, not required reading:


Distributed Databases
SEMMO: A Scalable Engine for Massively Multiplayer Online Games
(Demonstration Paper) Nitin Gupta, Alan Demers, and Johannes Gehrke,
SIGMOD 2008
Database Research Opportunities in Computer Games
Walker White, Christoph Koch, Nitin Gupta, Johannes Gehrke, and Alan Demers,
SIGMOD Record, September 2007.
RDF Database
20. (Jan 12)
21. (Jan 12)
RDF-3X: a RISC-style Engine for RDF
Thomas Neumann, Gerhard Weikum, VLDB 2008
Discussion on future of data management
Talk (pptx) by Pankaj Vanwari