Download In Memory Database

In Memory Database Jordan Easley, Kelby Chase, Nirmal Dharmaratne Outline -History -Introduction -Features -Parallelism -Concurrent Operations -Reliability and Scalability -Long term data -Disadvantages -Differences between Oracle TimesTen and SAP HANA Brief History • • • • • Development first started in 1988 by White Cross Systems, Limited. Relational IMDS started in 1993, released in 1993 by Perihelion Software. Times-ten acquired by Oracle in 2005. SAP HANA announced in 2011 - the first full in-memory database system. Only recently RAM is cheap enough to be viable What is In-Memory Database? • • A database management system that stores data entirely on the main memory. Designed for use with big data and high level analysis. Disk Storage • Operates by magnetising a film of magnetic material on a spinning disk. To retrieve data, the head assembly moves to the data location while the disk is spinning. o Access time is limited by variations in physical location of data. o RAM • Random Access Memory o o Data stored using transistors and capacitors that hold a charge. Allows data to be accessed in any order, regardless of location. RAM Speed • • IMDS has o 4x faster read speeds o 400x faster writing speeds Memory Requirements • • A well designed in-memory system can require less than a 100K of memory. This is much smaller than traditional databases which require from 100s of kilobytes up to many megabytes. OLAP • Online Analytical Processing o o Uses data cubes - databases containing every possible combination of dimension, measure, and hierarchy. Stores pre-processed aggregate data so that it isn't calculated for every query. Features of some in memory databases -No Aggregate tables -Column oriented structure -Limited projections -Insert only data management -No caching -On-the-fly extensibility -Optimization -Indexing -Bulk load mode Aggregate Data • • • • In traditional database design aggregate data is stored in separate tables to speed up queries. As data scales, more and more aggregate data is added to speed up queries. This adds complexity to the system. • • SAP HANA has decreased code complexity by eliminating aggregate data in favor of an in-memory systems higher speed and compute all aggregates need "on-the-fly". This method provides the same or better performance as traditional systems while making system maintenance a lot easier Column Oriented Structure • A database management system that stores data tables as sections of columns of data rather than rows of data. Column Oriented Structure • • • Does not scan each row left-to-right Retrieves data only from the specified columns Each entity has an index corresponding to its row. Column oriented serialization • • row oriented databases serialize all values of in a row together 1,Smith,Joe,40000; 2,Jones,Mary,50000; 3,Johnson,Cathy,44000; column oriented databases serialize all values of a column together. 1,2,3; Smith,Jones,Johnson; Joe,Mary,Cathy; 40000,50000,44000; Column vs. Row based structure • Row based database o o • Traditionally used when a system needs all or most of an entire row. $SQL = "SELECT * from Customers WHERE Fname = 'John' AND Lname = 'Smith'" Column based database o o o Used for high level analysis and functions $SQL = "SELECT AVG(Salary) FROM Employees" Typically takes longer to enter new records Limited Projections • • Instead of grabbing all information it only takes needed data. Limited projections are part of what make Column base design possible. Insert Only • Data management that does not update or delete records Does not use processing time on deleting or updating. o Allows companies to archive data. o Caching • • Traditional databases may cache the results of frequently queried data in RAM Caching is not existent in an in-memory database o Again, it just adds complexity to the system On-The-Fly Extensibility • Allows the adding of new columns to an existing database without the rewriting of all pages using the database. Optimization • In-memory systems are considered more optimized than traditional systems • On-disk databases have to keep redundant data for the sake of indexing and caching, and use CPU power to maintain the cache. Indexing Techniques •Hash indexes •T-tree indexes Hash indexes •They specify how many buckets are to be allocated for the table's hash index. T-tree indexes • A typical disk-based RDBMS uses B+-tree indexes to reduce the amount of disk I/O needed to accomplish indexed look-up of data files. • T-tree indexes are more economical than Btrees in that they require less memory space and fewer CPU cycles T-tree cont. • T-tree consists of a group of T-tree nodes . • Each node consists of a group of 64 index • • • keys, each of which points to a separate table row. Node holds an array of keys and three pointers: one to parent node and two to right and left child respectively Hold only the value of the left-most key in each index node. Other values can be accessed by accessing the data rows themselves. • • • The T-tree search algorithm knows with just two comparisons whether the value it is searching for is on the current node or elsewhere in memory. With every new index node pointer indirection, its search area is cut in half They shrink the complexity of code and reduce code path length by removing unnecessary computations and taking advantage of pointer traversal where T*-tree -shows the search path used to locate a range of rows that contain the value ‘58’ in the column designated as the index key. T tree vs B tree Bulk load mode • Bulk load mode allows the insert of large amounts of data without the overhead of each individual transaction. Reduction of Layers •In most in-memory databases, you can run processing on the database layer in contrast to traditional database where it is at least 3 tiered application( Web, Presentation,Processing) •This is because in in-memory you don't have to deal with the network transfer of data over to your middle tiers, thus speeding up the Parallelism • • Parallelism is the use of multiple cores within a application. Allows multi-core processing of a single query. Concurrent operations • • Concurrency is not that different from traditional databases Since in-memory databases are faster than traditional ones, transactions complete quicker making lock contention not as important as previously. Reliability of in-memory systems • • While main memory is more volatile than disk memory and thus less reliable, a well designed in-memory system can roll back to the last successful transaction on a system crash Even with this assurance most companies still feel the need to regular backups on disk Scalability of In-Memory Systems • • • In-memory systems have been proven to scale far beyond the terabyte size range. In a test an in-memory database grew to 1.17 TB and 15.54 billion rows. In this test performance remained consistent as the database grew showing near linear scalability. Long term data storage • In-memory systems are not considered useful for storing unused or rarely used data o o SAP calls this passive data Passive data may be moved to secondary storage during downtime Disadvantages of In-Memory systems -Cost -Higher energy use -Requires more floor space Why is there a need for a new system -Data half life- the amount of time in which a set of data is useful. -Latency- the time delay experienced by a system before data is available for use. -Traditional systems reaching limits Differences between SAP HANA and Oracle TimesTen -Oracle TimesTen -Partial in-memory database -More of a huge cache system -better system of concurrency for users -SAP HANA -Fully functioning in-memory database -Uses column based storage -Can use multithreading for individual queries http://www.youtube.com/watch?v=4vKOb6HAz- SAP HANA Feature Benefit Multicore CPU Large memory footprint Greater computation power Faster than disk Row and column store Faster aggregation with column store Compression Highly dependent on actual data used Partitioning Analysis of large data sets Complex computations No aggregate tables Nonmaterialized views Flexible modeling No data duplication Insert only on delta Fast data loads Oracle TimesTen Features Benefits Low latency Microsecond response time Multi-user concurrency Durability and Persistence Transactional parallel replication Supports SQL and PL/SQL via ODBC, JDBC, ODP.NET, OCI and Pro*C/C++ Real time performance Consistent response time Automated database failover Zero data loss Supports OLTP and analytic workloads Hardware -Both Oracle TimesTen and SAP HANA use the same hardware -Intel CPU -40 Cores -1 TB of RAM So In Conclusion... • • • • In Memory Computing has become a big deal because the prices of RAM have fallen in recent years. Main memory is much faster than disk storage, so database design can be different. Although still more expensive than traditional design, In Memory Database becomes very efficient with large amounts of data, especially when using it for large-scale analysis. For now, SAP is ahead of the pack in IMDS.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download In Memory Database