* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download select
Survey
Document related concepts
Transcript
Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan [email protected] Sandor Heman @ cwi.nl Jennie Zhang @ cwi.nl Romulo Goncalves @cwi.nl Administrative details • The website evolves as during the course • Exam material is marked explicitly • Lab work deadlines are strict • Email is the preferred way to communicate • Tomorrow the assistants will be available in person between 11:00-12:00, room REC-P.123 Relational systems • A database system should simplify the organization, validation, sharing, and bookkeeping of information • Prerequisite knowledge – Relational data model and algebra – Data structures (B-tree, hash) – Operating system concepts – Using a SQL database system • What is your practical experience? [Ruby on Rails expertise needed] Applications • Bread-and-butter applications? – Web-shop – Banking systems – Inventory systems – Production systems – Shopping systems – Government systems – Health systems – Multimedia systems – Science systems … Advanced Applications • Bread-and-butter applications ??? – Banking systems • What happens if you install a stock trading system which should handle >100K transactions/minute • How to derive trading advice using compute intensive applications • How to warn thousands of users about their trading opportunity – …. Need for parallel, distributed main-memory database technology… Advanced application requirements • Bread-and-butter applications – Inventory applications • How to install a battlefield inventory systems • How to deliver goods just in time? • How to keep track of moving objects/persons ? • … need for sensor-based database support and RFID tags … need for a new DBMS ?… Advanced Applications • Production systems – How to interact with component suppliers – How to manage the production workflow – How to avoid bad production steps – How to maintain a database with 12000 tables (SAP) • … need for interoperability between autonomous systems… datamining and knowledge discovery… Advanced Applications • Health information systems – How to monitor your health over 30 years – How to enable quick response to a heart attack • …need for interoperable database systems … The Ambient Home HELP The Ambient Home HELP 911 called MonetDB DataCell MonetDB DataCell A Shared Tuple Space using an SQL DBMS nucleus 911 called Advanced applications • Government systems – Security • Biometric data management issues, finger/image matching – Public safety • Forensics, manipulate complex objects using proprietary algorithms • …need for extensible database technology…need to support unstructured data… Advanced Applications • Science systems – The new accelerator in CERN • how to handle >1PTByte files – The Sloan Digital Skyserver schema is 200 pages and the catalogued data 2.5Tb • How to query this efficiently – ..need for P2P and … a novel way to organize data… LOFAR central processor specs • Streaming Data – Input: 320 Gbit/s – Internally within correlator: 20 Tbit/s – Into storage: 25 Gbit/s = 250 TByte/day – Final products: 1-3 TByte/day • High Performance Computing – Correlation: 15 Tflops – Pre processing and filtering: 5 Tflops – Off-line processing (calibration, analysis): 5-10 Tflops – Visualisation, control, scheduling etc: 2 Tflops • Storage – On-line temporal storage: 500 TByte – Archive: PByte range of data stored in Grid Technological challenges • Data is often not structured as tables – XML and XQuery • Data does not always fit on one system – Distributed and parallel databases • Querying is more like world-wide searching – Continuous and streaming queries • A database tells more than facts – Datamining and knowledge discovery Code bases • Database management systems are BIG software systems – Oracle, SQL-server, DB2 >1 M lines – PostgreSQL 300K lines – MySQL 500 K lines – MonetDB 200-800 K lines – SQLite 40K lines • Programmer teams for DBMS kernels range from a few to a few hundred Performance components • • • • • Hardware platform Data structures Algebraic optimizer SQL parser Application code – What is the total cost of execution ? – How many tasks can be performed/minute ? – How good is the optimizer? – What is the overhead of the datastructures ? Not all are equal Gaining insight • Study the code base (inspection + profiling) – Often not accessible outside development lab • Study individual techniques (data structures + simulation) – Focus of most PhD research in DBMS • Detailed knowledge becomes available, but ignores the total cost of execution. • Study as a functional black box – Analyse a small application framework The Jack The Ripper Project • Study the snippet of the database technology and design an XQuery and SQL application • What is the schema? • What are the queries? • What are unorthodox solutions? Learning points • My poor knowledge on relational database? Read the chapters on SQL and relational algebra. Knowledge on data structures comes in handy. • Database systems are much more than administrative bookkeeping systems Learning points – Advanced application challenge the technology provided by a DBMS – Many techniques do not easily scale in size, complexity, functionality – Effectiveness of a DBMS is determined by many tightly interlocked components