
In-Memory Database
전준민, 정주성, 이한민, 곽하녹

Table of Contents
1. Introduction
2. Disk Resident DB vs In-Memory DB
3. Column Store
4. Durability
5. Data Overflow
6. Products of IMDB
7. Optimization Aspects on IMDB

1. Introduction
• What is an In-Memory Database (IMDB)?
• Architecture
• Rise of the IMDB
• Applications
• Myths about IMDB

What is an In-Memory Database (IMDB)?
• An in-memory database system is a database management system that stores data entirely in main memory.

Architecture
• Fast data access
• Algorithms optimized for main memory
• Efficient memory usage
• Durability

Rise of the IMDB
• Multicore processors
• Cheaper and bigger memories
• Demand for fast databases

Applications
• Low-latency, high-volume systems

Myths about IMDB
• Given the same amount of RAM, disk DBs can perform at the same speed as IMDBs (by using caching technology).
• If a RAM disk is created and a traditional disk DB is deployed on it, it delivers the same performance as an in-memory database.
• write on disk
• buffer manager
• indexes for disk
• redundant data

2. Disk Resident DB (DRDB) vs In-Memory DB (IMDB)
• DRDB vs IMDB: Overview
• Indexes
• Concurrency Control

DRDB vs IMDB: Overview [1]

|               | DRDB                          | IMDB                            |
|---------------|-------------------------------|---------------------------------|
| File I/O      | Carries file I/O burden       | No file I/O burden              |
| Storage Usage | Assumes storage is abundant   | Uses storage more efficiently   |
| Algorithms    | Algorithms optimized for disk | Algorithms optimized for memory |
| CPU Cycles    | More CPU cycles               | Fewer CPU cycles                |
| Persistence   | Non-volatile                  | Volatile                        |
| Lock          | Fine locks                    | Coarse locks                    |

Indexes: B+-Tree in DRDB [2]
• Redundant data is kept in some index structures to reduce I/O.

Indexes: T-Tree in IMDB [3]
• Indexes in IMDBs focus on reducing memory consumption and CPU cycles.
• In the mid-1980s, Lehman and Carey proposed the T-tree as an index structure for main memory databases.
• T-tree indexes are more efficient than B-trees in that they require less memory space and fewer CPU cycles.
• The T-tree evolved from AVL trees and B-trees.

Indexes: Hash indexes in IMDB
• Hash indexes are used in key-value based in-memory databases (cache servers) such as Redis and Memcached.

Concurrency Control
• In DRDBs, locking granules are low-level (fine-grained).
  • To reduce contention
  • To increase parallelism
• In IMDBs, locks are coarse-grained thanks to fast processing.
  • Locking granules such as a relation or an entire database
  • No need to look up a hash table
  • Serial scheduling is enough in most cases

3. Column Store
• What is Column Store?
• Benefits of Column Store
• Delta Storage

What is Column Store?
• Column Store
  • stores data tables as columns of data rather than as rows of data

Benefits of Column Store [4]
• Column stores are more suitable for IMDBs than row stores
  • Better parallelism
  • Better compression
  • Faster data access
    • Using parallel processing
    • Especially for aggregations

Benefits of Column Store: Parallelism [5]
• Column storage can easily be separated into equal parts, which leads to effective parallel processing.
• Highly parallelized scan operations are available, which are faster than indexed searches.
• The row store cannot compete if processing is set-oriented and requires column operations, and most applications are based on set-oriented processing rather than direct tuple access.
• Highly parallelized scan operations using column stores are faster than using just ordinary indexes.

Benefits of Column Store: Compression
• Column stores allow highly efficient compression because the columns contain only a few distinct values.

Delta Storage [6]
• Since writing to compressed column stores in real time is inefficient, delta storage techniques are used.
• Delta Storage
  • optimized for write operations
• Main Storage
  • compressed column store

Delta Storage
• INSERT
  • An INSERT statement inserts a new record into the delta storage. The merge process moves the record from delta to main.
• DELETE
  • A DELETE statement selects the record and marks it as invalid by setting a flag (for main or delta). The merge process deletes the record from memory once no open transaction is active for it anymore.
• UPDATE
  • An UPDATE statement inserts a new version of the record. The merge process moves the latest version from delta to main. Old versions are deleted once no open transaction is active for them anymore.

Delta Storage: Simplified View of Insert-Only Approach
[Figure: simplified view of the insert-only approach]

Delta Storage
• The merge process starts when the delta storage grows big enough.

4. Durability
• Logging and Checkpointing
• Command Logging
• NVM Logging

Durability
• Durability is difficult to support in IMDBs
• Many IMDBs have added durability via the following mechanisms
  • Checkpoints
  • Transaction logging

Checkpointing
• Checkpoints in DRDB
  • Bring pages on disk up to date
  • Reduce the work of recovery
• Checkpoints in IMDB
  • Make a copy of the data on disk (snapshot)
  • Truncate the logs

Logging and Checkpointing
[Figure: logging and checkpointing architecture. Labels: Transaction, Memory Tablespace, Memory Log Buffer, log sync, REDO Log File, Checkpoint Image File, Physical Disk]
• Problem
  • Log I/O becomes the bottleneck
  • How long do we need to keep the log?
    • Until the next checkpoint

Logging and Checkpointing [7]
• TPC-C benchmarking on DRDBs (New Order transaction)
  • Logging takes up a non-trivial portion
  • Larger portion for IMDBs

Command Logging [8]
• Light-weight, coarse-grained logging technique
  • Logical logging
• Advantages
  • Writes substantially fewer bytes per transaction than physical logging
  • Reduces run-time overhead
• Disadvantages
  • Slow recovery
    • Failures that require recovery to ensure system availability are much less frequent
• 1.5X higher throughput than a main-memory optimized implementation of physical logging

NVM Logging [9]
• NVM (Non-Volatile Memory)
  • low read/write latency like DRAM
  • persistent writes like SSD

|                  | DRAM | NAND Flash | NVM  |
|------------------|------|------------|------|
| Byte-Addressable | Yes  | No         | Yes  |
| Capacity         | 1X   | 4X         | 2-4X |
| Latency          | 1X   | 400X       | 3-5X |

NVM+DRAM Architecture
• The DBMS relies on both DRAM and NVM

5. Data Overflow
• Anti-caching
• Project Siberia

Data Overflow
• Datasets may not fit in DRAM
• IMDB solutions
  • Anti-caching
  • Project Siberia

Anti-caching [10]
• Used in H-Store
• Cold data is moved to disk in a safe manner
• Bloom filter used for tracking data
• Manages cold data by maintaining an LRU chain

Anti-caching
• Fine-grained eviction
  • eviction is performed at tuple level, not page level
• Non-blocking fetches
  • a transaction that accesses evicted data is simply aborted and then restarted at a later point

Project Siberia [11]
• Used in Hekaton
• Automatically and transparently maintains cold data on cheaper secondary storage
• Allows more data to fit in memory
• Log-based management of cold data

6. Products of IMDB
• H-Store / VoltDB
• Hekaton
• SAP HANA
• In-memory NoSQL Databases

H-Store / VoltDB
• Distributed row-based in-memory relational database
• Targeted at high-performance OLTP processing
• Light-weight logging strategy
• Anti-caching

Hekaton
• Memory-optimized OLTP engine
• Fully integrated into Microsoft SQL Server
• Multi-version concurrency control
• Project Siberia

SAP HANA
• A distributed in-memory database featured for the integration of OLTP and OLAP
• Provides rich data analytics functionality by offering multiple query language interfaces (e.g., standard SQL, SQLScript, MDX, WIPE, FOX and R)
• Three-level column-oriented unified table structure

In-memory NoSQL Databases
• RAMCloud
  • Distributed in-memory key-value store, featured for low latency, high availability and high memory utilization
• Bitsy
  • Embeddable in-memory graph database that implements the Blueprints API, with ACID guarantees on transactions based on the optimistic concurrency mode

Comparison of IMDB Systems [12]

| Category             | System   | Data Model            | Workloads         | Indexes                        | Fault Tolerance                     | Memory Overflow                |
|----------------------|----------|-----------------------|-------------------|--------------------------------|-------------------------------------|--------------------------------|
| Relational Databases | H-Store  | relation (row)        | OLTP              | hashing, B+-tree, binary tree  | command logging, checkpoint, replica | anti-caching                   |
| Relational Databases | Hekaton  | relation (row)        | OLTP              | latch-free hashing, Bw-tree    | logging, checkpoint, replica        | Project Siberia                |
| Relational Databases | SAP HANA | relation, graph, text | OLTP, OLAP        | timeline index                 | logging, checkpoint, standby server | table/partition-level swapping |
| NoSQL Databases      | RAMCloud | key-value             | object operations | hashing                        | logging, replica                    | N/A                            |
| Graph Databases      | Bitsy    | N/A                   | OLTP              | optimistic concurrency control | logging, backup                     | N/A                            |

7. Optimization Aspects on IMDB

Optimization Aspects on IMDB [12]

| Aspects             | Concerns                                      | Related Work                                                                                      |
|---------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------|
| Index               | cache consciousness, time/space efficiency    | T-Tree, CSS-Trees, CSB+-Trees, BD-Tree                                                            |
| Data Layout         | cache consciousness, space efficiency         | columnar layout, HANA Hybrid Store, log structure                                                 |
| Concurrency Control | overhead, correctness                         | virtual snapshot, transaction memory, MVCC                                                        |
| Query Processing    | code locality, time efficiency                | stored procedure, JIT compilation, sorting                                                        |
| Fault Tolerance     | durability, correlated failures, availability | group commit and log coalescing, NVM, command logging, remote logging                             |
| Data Overflow       | locality, paging, hot/cold classification     | anti-caching, Hekaton Siberia, data compression, virtual memory management, pointer swizzling     |

References
[1] Garcia-Molina, Hector, and Kenneth Salem. "Main memory database systems: An overview." Knowledge and Data Engineering, IEEE Transactions on 4.6 (1992): 509-516.
[2] Comer, Douglas. "Ubiquitous B-tree." ACM Computing Surveys (CSUR) 11.2 (1979): 121-137.
[3] Lehman, Tobin J., and Michael J. Carey. "A study of index structures for main memory database management systems." Conference on Very Large Data Bases. Vol. 294. 1986.
[4] Abadi, Daniel J., Samuel R. Madden, and Nabil Hachem. "Column-stores vs. row-stores: how different are they really?" Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008.
[5] Plattner, Hasso. "A common database approach for OLTP and OLAP using an in-memory column database." Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. ACM, 2009.
[6] Färber, Franz, et al. "The SAP HANA Database--An Architecture Overview." IEEE Data Eng. Bull. 35.1 (2012): 28-33.
[7] Harizopoulos, Stavros, et al. "OLTP through the looking glass, and what we found there." Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008.
[8] Malviya, Nirmesh, et al. "Rethinking main memory OLTP recovery." Data Engineering (ICDE), 2014 IEEE 30th International Conference on. IEEE, 2014.
[9] DeBrabant, Justin, et al. "A Prolegomenon on OLTP Database Systems for Non-Volatile Memory." Proceedings of the VLDB Endowment 7.14 (2014).
[10] DeBrabant, Justin, et al. "Anti-caching: A new approach to database management system architecture." Proceedings of the VLDB Endowment 6.14 (2013): 1942-1953.
[11] Eldawy, Ahmed, Justin Levandoski, and Paul Larson. "Trekking through Siberia: Managing cold data in a memory-optimized database." Proceedings of the VLDB Endowment 7.11 (2014).
[12] Zhang, Hao, et al. "In-memory big data management and processing: A survey." (2015).
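
Sketch: Delta Storage operations

The INSERT, DELETE, UPDATE and merge behaviour described on the Delta Storage slides (section 3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SAP HANA's implementation: the class `DeltaMainColumn`, its method names and the `merge_threshold` parameter are hypothetical, the column holds a single attribute, dictionary encoding stands in for compression, and no transactions are open, so invalidated versions can be dropped at merge time.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaMainColumn:
    """One column split into a compressed main store and a write-optimized delta.

    All names here are hypothetical and chosen for illustration only.
    """
    merge_threshold: int = 4                           # assumed merge trigger size
    dictionary: list = field(default_factory=list)     # sorted distinct values (main)
    main_codes: list = field(default_factory=list)     # positions into the dictionary
    main_valid: list = field(default_factory=list)     # validity flags for main
    delta_values: list = field(default_factory=list)   # uncompressed appended values
    delta_valid: list = field(default_factory=list)    # validity flags for delta

    def insert(self, value):
        # INSERT: append to the delta; main is never written directly.
        self.delta_values.append(value)
        self.delta_valid.append(True)
        if len(self.delta_values) >= self.merge_threshold:
            self.merge()

    def delete(self, value):
        # DELETE: only flag matching records as invalid, in main and in delta.
        for i, code in enumerate(self.main_codes):
            if self.dictionary[code] == value:
                self.main_valid[i] = False
        for i, v in enumerate(self.delta_values):
            if v == value:
                self.delta_valid[i] = False

    def update(self, old, new):
        # UPDATE: invalidate the old versions, insert new versions into the delta.
        count = self.scan().count(old)
        self.delete(old)
        for _ in range(count):
            self.insert(new)

    def scan(self):
        # Read path: valid records from main followed by valid records from delta.
        main = [self.dictionary[c]
                for c, ok in zip(self.main_codes, self.main_valid) if ok]
        delta = [v for v, ok in zip(self.delta_values, self.delta_valid) if ok]
        return main + delta

    def merge(self):
        # Merge: rebuild the compressed main from all valid records, empty the delta.
        live = self.scan()
        self.dictionary = sorted(set(live))
        code_of = {v: i for i, v in enumerate(self.dictionary)}
        self.main_codes = [code_of[v] for v in live]
        self.main_valid = [True] * len(live)
        self.delta_values = []
        self.delta_valid = []
```

With `merge_threshold=3`, for instance, three inserts trigger a merge that leaves the delta empty, and a following `update` only flags the old version in main and appends the new version to the delta. A real system would also keep invalidated versions alive while open transactions can still see them, which this sketch leaves out.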
