Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Concurrency control wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Access wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Relational model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
A Survey of Techniques Used to Reduce the Semantic Gap Between DBMS Storage Manager and Storage Subsystems Biplob Kumar Debnath Nagapramod Mandagere 1 Semantic Gap Standard interfaces to storage subsystem virtualizes storage as a simple linear array of fixed-size logical blocks (+) Compatibility (+) Case of implementation (-) Hide useful information DBMS storage managers do a lot to optimize storage access, without sufficient information Storage subsystem front-ends do a lot to optimize under the covers, without enough info about application Wouldn’t it be better if we can reduce the semantic gap? 2 Contributions Classification of Existing Approaches Comparison of Existing Approaches Research Directions Recommendations 3 Outline Background on Database Storage Manager Motivation Classification overview Comparison of Different approaches Future Research Directions Conclusion 4 Background: Database Storage Manager Interface between DMBS and disk storage is in terms of fixed sized disk pages Same Layout on disk/memory Multi-level storage hierarchy Different optimal access on each level Figure is taken from [1] 5 Data Layout in Disk 6 Slide is taken from [1] NSM in Memory Hierarchy Optimized for full record access Slow for partial access Wastes I/O Bandwidth Low spatial locality at CPU cache Figure is taken from [1] 7 Importance Pitfalls in Database Storage Managers DISK DATABASE Page layout is Relational tables are LINEAR TWO-DIMENSIONAL When a query requests a subset of attributes, the database must read ALL attributes even though only a fraction is needed However, with tables of small number of attributes and online transaction processing (OLTP) workload, such pitfall was not 8 considered as a major concern in commercial DBMSs. Motivation - New Applications Peta-scale data size Data administration becomes challenging High-dimensional tables (e.g., 500 attributes in a single table is common) NSM data layout is completely inefficient A radical change in the query workload as many queries ask only about very small part of the table (e.g., decision support queries) Need a radical change in the data layout 9 Classification Overview Server Side Modifications Storage Side Modifications Specialized Database Machine New Storage Devices 10 Classification of Existing Approaches Server Side Modifications DSM PAX Fractured Mirrors Atropos LVM Storage Side Modifications Semantic Disk Object based Storage Device Specialized Hardware Database Machine Active Disk New Storage Devices MEMS Object based Storage Devices 11 Decomposition Storage Model (DSM) Partition original table into n 1-attribute sub-tables Each sub-table stored separately in NSM pages 12 Slide is taken from [1] DSM in Memory Hierarchy Optimized for partial-access Slow for full-record access Reconstructing full record incurs random I/O Slide is taken from [1] 13 Comparison of NSM & DSM Page Layout Cache Memory Performance Disk Memory Performance Full-Record Access Full-Record Access Partial-Record Access Partial-Record Access NSM √ x √ x DSM x √ x √ 14 Fractured Mirrors Two copies – one in DSM & one in NSM Requires Double Space 15 Figure is taken from [1] Disk Basics Access time = seek time + rotation delay + transfer time Seek Time = 5-15 ms Rotational delay = 4 ms on average for 7400 rpm drive Transfer time = 1 ms for 8MB/sec Track Aligned Access to save Rotational Time 16 Figure is taken from [11] Motivation Behind LVM Semi-sequential access Track 1 2 Read 1 3 4 5 6 7 Process 1 Skew Head-0 1 2 3 4 5 6 7 Head-1 12 13 14 8 9 10 11 Head-3 16 17 17 19 20 21 15 17 Atropos Logical Volume Manager LVM exposes low level storage characteristics to the applications Applications uses this info for efficient data layout 18 Figure is taken from [3] DB Aware Semantic Disk Find correlations among the blocks. Storage systems have a very high level knowledge of the DBMS working behavior. Uses WAL entries to find correlations among blocks Uses block relations knowledge for better Caching Layout Reliability Security guarantees With some help from DBMS side it can improve performance even better. access time of a particular block or table access counts, # of queries that access table within a fixed time 19 access co-relations for tables and indices Reference [4] Database Machine Explored extensively in late 1970s and late 1980s Classified in 4 categories: Processor per head Processor per track Processor per disk Multi-processor cache Reasons for failure: Use special-purpose hardware. Performance was impressive for only scan, projections operations. Same performance can be achieved by smart indexing techniques. CPU was sitting idle. Communication overhead Database vendors did not agree to rewrite their legacy code base. 20 Reference [6] Active Disk Got attention from the research community again in 1990s. Leverage storage technology advancements and parallel algorithms improvement. Widespread use of disk arrays that use a large number of disks working in parallel. Advancement in parallel algorithms Serial communications advancement Disks are embedded with a processor, memory, cache, and network connection. Disk can perform some computation in addition to storing data. How to map all basic database operations on Active Disk system with proper low-level primitives. 21 Reference [5] MEMS Storage 22 Figure is taken from [1] 2-D Database Access in MEMS MEMS is 2-D Full area is divided into grids Activate only select heads Figure is taken from [1] 23 Object-based Storage Device (OSD) Traditional OSD • Same hardware, new interface Slide is taken from [10] OSD Objects An object is a logical unit File-like access methods Attributes describes the characteristics Security policies authorizes access Variable size and can store any type of data Application can decide what should go into object Dynamically shrink and grow OSD: Shared attributes Traditional Slide is taken from [10] OSD OSD for Databases Figure is taken from [2] 27 Comparison 28 Possible Research Areas Finding efficient layout for indexes. Exploring new possibilities assuming large cache is available in storage devices Making changes to support database aware semantic disk approach. Explore OSD 29 OSD + DBMS Research Issues Deciding upon the object and attribute representations Defining the interface layer between DBMS and OSD Defining a geometry-aware read/write functions in the OSD side Changing the storage manager at the DBMS to be light weight to only communicate with OSD Modifying the query optimizer in the DBMS to take care of a new set of possible query plans that utilize OSD 30 Conclusion It is possible to reduce the semantic gap between DBMS and Storage subsystems. For short term solution, Semantic disk or Atropos like LVM looks promising. For Long term solution, OSD looks very promising. 31 Thank You !!!! 32 Comments / Questions ? 33 References [1] A. Ailamkai, Database Architectures for New Hardware (invited tutorial), VLDB Toronto, Canada, August 2004. [2] Steve Schlosser, Sami Iren. Database Storage Management with Object-Based Storage Devices, DAMON 2005 [3]Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks, In 3rd USENIX Conference on File and Storage Technologies FAST 04, CA. March 2004 [4]Muthian Sivathanu, Lakshmi N Bairavasundaram, et al, Database Aware Sematicaly Smart Storage, FAST 2005. [5]E. Riedel, C. Faloutsos, and D. Nagle. Active Disk Archittecture for Databases. Technical Report CMU-CS-00-145, Carniegie Mellon University, April 2000. [6]K. Keeton. Computer Architecture Support for Database Applications, PhD thesis, University of California at Berkeley, 1999. [7]R. Ramamurthy, D. J. Dewitt, and Q. Su. A Case for Fractured Mirrors, In Proc. VLDB, 2002 [8]A. Ailamaki, D. J. Dewitt, M. D. Hill, and M. Skounakis. Weaving Relations for Cache Performance. In Proc. VLDB, 2001. [9]S. Schlosser et al. , On Multidimensioanla data & Moderndisk, Fast 2005. [10] E. Riedel, Object Bases Storage Device basics, SNIA Tutorial, April 2005. [11] http://www.cs.duke.edu/courses/spring01/cps110/slides/disk-fs.pdf 34 Modern Disk & Multidimensal data 35 Figures are taken from [9]